Text_Highlighter is a class for syntax highlighting. The main idea is to
simplify the creation of subclasses that implement syntax highlighting for a
particular language. Subclasses do not implement any new functionality; they
just provide syntax highlighting rules. The rule sources are in XML format.
To create a highlighter for a language, there is no need to code a new class
manually. Simply describe the rules in an XML file and use
Text_Highlighter_Generator to create a new class.

This document does not contain a formal description of the API: it is very
simple, and I believe that some code examples are sufficient.

Highlighter XML source
======================

Creating a new syntax highlighter begins with describing the highlighting
rules. There are two basic elements: block and region. A block is just a
portion of text that matches a regular expression and is highlighted with a
single color. A keyword is an example of a block. A region is defined by two
regular expressions: one for the start of the region and another for its end.
The main difference from a block is that a region can contain blocks and
regions (including same-named regions). An example of a region is a group of
statements enclosed in curly brackets (this is used in many languages, for
example PHP and C). Also, the characters matching the start and end of a
region may be highlighted with their own color, and the region contents with
another.

Blocks and regions may be declared as contained. Contained blocks and regions
can only appear inside regions. If a region or a block is not declared as
contained, it can appear both at the top level and inside regions. A block or
region declared as not-contained can only appear at the top level.

For any region, a list of blocks and regions that can appear inside this
region can be specified.

In this document, the term "color group" is used. Chunks of text assigned to
the same color group will be highlighted with the same color. Note that in
versions prior to 0.5.0 color groups were referred to as CSS classes, but
since 0.5.0 HTML is no longer the only supported output format, so "color
group" is the more appropriate term.

The top-level element is <highlight>. The attribute lang is required and
denotes the name of the language. Its value is used as a part of the generated
class name and must contain only letters, digits and underscores. The optional
attribute case, when given the value yes, makes the language case sensitive
(the default is case insensitive). Allowed subelements are:

* <authors>: Information about the authors of the file.
    - <author>: Information about a single author of the file. May be used
      multiple times, one per author.
        - name="...": Author's name. Required.
        - email="...": Author's email address. Optional.

* <default>: Default color group.
    - innerGroup="...": Color group name. Required.

* <region>: Region definition.
    - name="...": Region name. Required.
    - innerGroup="...": Default color group of the region contents. Required.
    - delimGroup="...": Color group of the start and end of the region.
      Optional, defaults to the value of the innerGroup attribute.
    - start="...", end="...": Regular expressions matching the start and end
      of the region. Required. Regular expression delimiters are optional,
      but if you need to specify a delimiter, use /. Delimiters are only
      needed when specifying regular expression modifiers, such as m or U.
      Examples: \/\* or /$/m.
    - contained="yes": Marks the region as contained.
    - never-contained="yes": Marks the region as not-contained.
    - <contains>: Elements allowed inside this region.
        - all="yes": The region can contain any other region or block
          (except not-contained ones). May be used multiple times.
        - <but>: Do not allow certain regions or blocks.
            - region="...": Name of a region not allowed within the current
              region.
            - block="...": Name of a block not allowed within the current
              region.
        - region="...": Name of a region allowed within the current region.
        - block="...": Name of a block allowed within the current region.
    - <onlyin>: Only allow this region within certain regions. May be used
      multiple times.
        - region="...": Name of the parent region.

* <block>: Block definition.
    - name="...": Block name. Required.
    - innerGroup="...": Color group of the block contents. Optional. If not
      specified, the color group of the parent region or the default color
      group will be used. One would only want to omit this attribute if
      there are keyword groups (see below) inherited from this block, and no
      special highlighting should apply when the block does not match a
      keyword.
    - match="...": Regular expression matching the block. Required. Regular
      expression delimiters are optional, but if you need to specify a
      delimiter, use /. Delimiters are only needed when specifying regular
      expression modifiers, such as m or U. Examples: #|\/\/ or /$/m.
    - contained="yes": Marks the block as contained.
    - never-contained="yes": Marks the block as not-contained.
    - <onlyin>: Only allow this block within certain regions. May be used
      multiple times.
        - region="...": Name of the parent region.
    - multiline="yes": Marks the block as multi-line. By default, whole
      blocks are assumed to reside on a single line, which makes things
      faster. If you need to declare a multi-line block, use this attribute.
    - <partGroup>: Assigns another color group to a part of the block that
      matched a subpattern.
        - index="n": Subpattern index. Required.
        - innerGroup="...": Color group name. Required.

      This is an example from the CSS highlighter: the measure is matched as
      a whole, but the measurement units are highlighted with a different
      color:

          <block name="measure" match="\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)"
                 innerGroup="number" contained="yes">
              <onlyin region="property"/>
              <partGroup index="1" innerGroup="string" />
          </block>

* <keywords>: Keyword group definition. Keyword groups are useful when you
  want to highlight some of the words that match a block with a different
  color. Keywords are defined as literal matches, not regular expressions.
  For example, you have a block named identifier that matches a general
  identifier, and you want to highlight reserved words (which match this
  block as well) with a different color. To do so, you inherit a keyword
  group "reserved" from the "identifier" block.
    - name="...": Keyword group name. Required.
    - ifdef="...", ifndef="...": Conditional declaration. See the
      description of conditional declarations below.
    - inherits="...": Inherited block name. Required.
    - innerGroup="...": Color group of the keyword group. Required.
    - case="yes|no": Overrides the case sensitivity of the language.
      Optional, defaults to the global value.
    - <keyword>: Single keyword definition.
        - match="...": The keyword. Note: this is not a regular expression,
          but a literal match (possibly case insensitive).

Note that for backward compatibility the element partClass is an alias for
partGroup, and the attributes innerClass and delimClass are aliases for
innerGroup and delimGroup, respectively.

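To tie the element descriptions together, here is a minimal sketch of a rules
file, held in a PHP string and only checked for well-formedness with
SimpleXML. The language name "mini", the color group names and the patterns
are made up for illustration, and the file has not been run through
Text_Highlighter_Generator.

```php
<?php
// Hypothetical rules for an imaginary language called "mini".
$xml = <<<'XML'
<highlight lang="mini" case="yes">
    <authors>
        <author name="Jane Doe" email="jane@example.com"/>
    </authors>
    <default innerGroup="code"/>
    <region name="block" delimGroup="brackets" innerGroup="code"
            start="\{" end="\}">
        <contains all="yes"/>
    </region>
    <block name="identifier" match="[a-z_]\w*" innerGroup="identifier"/>
    <keywords name="reserved" inherits="identifier" innerGroup="reserved">
        <keyword match="if"/>
        <keyword match="else"/>
    </keywords>
</highlight>
XML;

// simplexml_load_string() returns false when the XML is not well-formed.
$doc = simplexml_load_string($xml);
if ($doc === false) {
    exit("The rules are not well-formed XML\n");
}

// List what was declared: the language name and the reserved words.
echo 'Language: ', (string) $doc['lang'], "\n";
foreach ($doc->keywords->keyword as $keyword) {
    echo 'Keyword:  ', (string) $keyword['match'], "\n";
}
```

Saved to a file, the XML part would be the input for the class generation
step.
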
Conditional declarations allow enabling or disabling certain highlighting
rules at runtime. For example, the Java highlighter has a very long list of
keywords matching the Java standard classes. Finding a match in this list can
take a long time. For that reason, the corresponding keyword group is declared
with the "ifdef" attribute:

    <keywords name="builtin" inherits="identifier" innerClass="builtin"
              case="yes" ifdef="java.builtins">
        <keyword match="AbstractAction" />
        <keyword match="AbstractBorder" />
        <keyword match="AbstractButton" />
        ...
        <keyword match="_Remote_Stub" />
        <keyword match="_ServantActivatorStub" />
        <keyword match="_ServantLocatorStub" />
    </keywords>

This keyword group will only be enabled when "java.builtins" is passed as an
element of the "defines" option:

    $options = array(
        'defines' => array(
            'java.builtins',
        ),
        'numbers' => HL_NUMBERS_TABLE,
    );
    $highlighter = Text_Highlighter::factory('java', $options);

The "ifndef" attribute has the reverse meaning.

Currently, the "ifdef" and "ifndef" attributes are only supported for the
<keywords> element.

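The rule behind "ifdef" and "ifndef" can be sketched in a few lines. The
helper below is hypothetical and is not the package's implementation; it only
restates the behaviour described above: a group with "ifdef" is active when
the name is among the defines, and a group with "ifndef" is active when it is
not.

```php
<?php
// Hypothetical helper, not part of Text_Highlighter.
// $ifdef and $ifndef are the attribute values (null when the attribute is
// absent); $defines is the list passed in the "defines" option.
function exampleGroupEnabled($ifdef, $ifndef, array $defines)
{
    if ($ifdef !== null && !in_array($ifdef, $defines, true)) {
        return false; // ifdef: the name must be defined
    }
    if ($ifndef !== null && in_array($ifndef, $defines, true)) {
        return false; // ifndef: the name must not be defined
    }
    return true;
}

$defines = array('java.builtins');
var_dump(exampleGroupEnabled('java.builtins', null, $defines)); // bool(true)
var_dump(exampleGroupEnabled('java.builtins', null, array()));  // bool(false)
var_dump(exampleGroupEnabled(null, 'java.builtins', $defines)); // bool(false)
```
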
Creating the XML description of the highlighting rules is the most
complicated part of the process. To generate the class, you need just a few
lines of code:

    require_once 'Text/Highlighter/Generator.php';
    $generator = new Text_Highlighter_Generator('php.xml');
    $generator->generate();
    $generator->saveCode('PHP.php');

Command-line class generation tool
==================================

The example from the previous section looks pretty simple, but it does not
handle any errors which may occur while parsing the XML source. The package
provides a command-line script that makes generating classes even simpler and
takes care of possible errors. It is called generate (on Unix/Linux) or
generate.bat (on Windows). This script can process multiple files in one run,
and can also read XML from standard input and write the generated code to
standard output.

-x filename, --xml=filename
    Source XML file. Multiple input files can be specified, in which case
    each -x option must be followed by -p unless -d is specified.
-p filename, --php=filename
    Destination PHP file. Defaults to stdout. If specified multiple times,
    each -p must follow -x.
-d dirname, --dir=dirname
    Default destination directory. File names will be taken from the XML
    input ("lang" attribute of the <highlight> tag).

Read from php.xml, write to PHP.php:

    generate -x php.xml -p PHP.php

Read from php.xml, write to standard output:

    generate -x php.xml

Read from php.xml, write to PHP.php, read from xml.xml, write to XML.php:

    generate -x php.xml -p PHP.php -x xml.xml -p XML.php

Read from php.xml, write to /some/dir/PHP.php, read from xml.xml, write to
/some/dir/XML.php (assuming that xml.xml contains <highlight lang="xml">, and
php.xml contains <highlight lang="php">):

    generate -x php.xml -x xml.xml -d /some/dir/

Text_Highlighter supports renderers. Using renderers, you can get output in
different formats. Two renderers are included in the package:

    - HTML renderer. Generates HTML output. A style sheet should be linked
      to the document to display colored text.

    - Console renderer. Can be used to output highlighted text to
      color-capable terminals, either directly or through less -r.

Renderers are subclasses of Text_Highlighter_Renderer. A renderer should
override at least two methods: acceptToken and getOutput. Overriding other
methods is optional, depending on the nature of the renderer's output and the
details of code preprocessing, if any:

void reset()
    Resets the renderer state. This method is called every time before a new
    source file is highlighted.

string preprocess(string $code)
    Preprocesses the code. Can be used, for example, to normalize whitespace
    before highlighting. Returns the preprocessed string.

void acceptToken(string $group, string $content)
    The core method of the renderer. The highlighter passes chunks of text
    to this method in $content, and the color group in $group.

void finalize()
    Signals the renderer that no more tokens are available.

mixed getOutput()
    Returns the generated output.

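As an illustration of the renderer contract, here is a minimal sketch of a
custom renderer that wraps every chunk in [group:content]. The class name
Example_Renderer_Brackets is hypothetical. The stand-in base class, including
its reset() and finalize() methods, is an assumption that exists only so the
sketch runs without the package installed; with the package available, load
the real Text_Highlighter_Renderer instead and check the method names against
it.

```php
<?php
// Stand-in base class (assumption), used only when the real one is absent.
if (!class_exists('Text_Highlighter_Renderer')) {
    class Text_Highlighter_Renderer
    {
        public function reset() {}
        public function preprocess($code) { return $code; }
        public function acceptToken($group, $content) {}
        public function finalize() {}
        public function getOutput() {}
    }
}

// Hypothetical renderer: collects tokens as "[group:content]".
class Example_Renderer_Brackets extends Text_Highlighter_Renderer
{
    private $output = '';

    // Start from a clean state before each new source.
    public function reset()
    {
        $this->output = '';
    }

    // Receive one chunk of text together with its color group.
    public function acceptToken($group, $content)
    {
        $this->output .= '[' . $group . ':' . $content . ']';
    }

    // Hand back everything collected so far.
    public function getOutput()
    {
        return $this->output;
    }
}

// Drive the renderer by hand, in the order a highlighter would.
$renderer = new Example_Renderer_Brackets();
$renderer->reset();
$renderer->acceptToken('reserved', 'echo');
$renderer->acceptToken('code', ' ');
$renderer->acceptToken('string', '"hi"');
$renderer->finalize();
echo $renderer->getOutput(), "\n";
```

Only acceptToken and getOutput carry real logic here; reset is overridden so
that the same renderer object can be reused for several sources.
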
Setting renderer options
------------------------

Renderers accept an optional constructor argument: an array of options. The
elements of this array are renderer-specific.

The HTML renderer produces HTML output with optional line numbering. The
renderer itself does not provide information about the actual colors of the
highlighted text. Instead, <span class="hl-XXX"> is used, where XXX is
replaced with the color group name (hl-var, hl-string, etc.). It is up to you
to create a CSS stylesheet. If the 'use_language' option is passed with a
value evaluating to true, class names will be formatted as "LANG-hl-XXX",
where LANG is the language name as defined in the highlighter XML source
("lang" attribute of the <highlight> tag), in lower case.

There are 3 special CSS classes:

    hl-main   - applies to the whole output or to the right table column,
                depending on the 'numbers' option
    hl-gutter - applies to the left column of the table
    hl-table  - applies to the whole table

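The class naming rule can be restated as a small function. This helper is
hypothetical and is not taken from the renderer's source; it only mirrors the
two formats described above, "hl-XXX" and "LANG-hl-XXX".

```php
<?php
// Hypothetical helper, not part of Text_Highlighter.
// Builds the CSS class name for a color group; when a language name is
// given (the 'use_language' case), it is lower-cased and used as a prefix.
function exampleCssClass($group, $language = null)
{
    $class = 'hl-' . $group;
    if ($language === null) {
        return $class;
    }
    return strtolower($language) . '-' . $class;
}

echo exampleCssClass('string'), "\n";     // hl-string
echo exampleCssClass('var', 'PHP'), "\n"; // php-hl-var
```

A stylesheet then needs one rule per class name produced this way, plus rules
for the special classes where they apply.
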
The HTML renderer accepts the following options (all of them optional):

    * numbers - line numbering style.
        0                - no numbering (default)
        HL_NUMBERS_LI    - use <ol></ol> for line numbering
        HL_NUMBERS_TABLE - create a 2-column table, with line numbers in the
                           left column and highlighted text in the right
                           column

    * tabsize - tabulation size. Defaults to 4.

Example:

    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);

The console renderer produces output for display on a color-capable terminal,
either directly or through less -r, using ANSI escape sequences. By default,
this renderer only highlights the most common color groups. Additional colors
can be specified using the 'colors' option. This renderer also accepts the
'numbers' option (a boolean value) and the 'tabsize' option.

Example:

    require_once 'Text/Highlighter/Renderer/Console.php';
    $colors = array(
        'prepro' => "\033[35m",
        'types'  => "\033[32m",
    );
    $options = array(
        'colors' => $colors,
    );
    $renderer = new Text_Highlighter_Renderer_Console($options);

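ANSI color escape sequences have the following format:

    ESC[#;#;...;#m

where ESC is the character with ASCII code 27 (033 octal, 0x1B hexadecimal)
and each # is one of the following:

     0  normal display
     1  bold on
     4  underline (mono only)
     5  blink on
     7  reverse video on
     8  nondisplayed (invisible)
    30  black foreground
    31  red foreground
    32  green foreground
    33  yellow foreground
    34  blue foreground
    35  magenta foreground
    36  cyan foreground
    37  white foreground
    40  black background
    41  red background
    42  green background
    43  yellow background
    44  blue background
    45  magenta background
    46  cyan background
    47  white background
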
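Values for the 'colors' option can be written by hand, as in the example
above, or assembled from the numeric codes. The helper below is a
hypothetical convenience function, not part of the package; it joins the
codes with semicolons between ESC[ and the final m, as in the standard ANSI
format.

```php
<?php
// Hypothetical helper, not part of Text_Highlighter.
// Builds an ANSI escape sequence from a list of numeric attribute codes.
function exampleAnsiSequence(array $codes)
{
    return "\033[" . implode(';', $codes) . 'm';
}

$boldMagenta = exampleAnsiSequence(array(1, 35)); // bold on, magenta foreground
$reset       = exampleAnsiSequence(array(0));     // back to normal display

// On a color-capable terminal this prints the word in bold magenta.
echo $boldMagenta, 'prepro', $reset, "\n";
```

A sequence built this way, such as $boldMagenta, can be used as a value in
the $colors array in place of a literal like "\033[35m".
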
How to use Text_Highlighter class
=================================

Creating a highlighter object
-----------------------------

To create a highlighter for a certain language, use the
Text_Highlighter::factory() method:

    require_once 'Text/Highlighter.php';
    $hl = Text_Highlighter::factory('php');

The actual output is produced by a renderer:

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl = Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);

Note that for backward compatibility it is possible to use a highlighter
without setting a renderer. If no renderer is set, the HTML renderer will be
used by default. In this case, you should pass the options as the second
parameter to the factory method. The following example works exactly like the
previous one:

    require_once 'Text/Highlighter.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
    );
    $hl = Text_Highlighter::factory('php', $options);

And finally, do the highlighting and get the output:

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl = Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);
    $html = $hl->highlight(file_get_contents('example.php'));

# vim: set autoindent tabstop=4 shiftwidth=4 softtabstop=4 tw=78: */