Options
Gelex provides an option mechanism for controlling
the way the scanner is generated. Options are specified either in
the declarations section
of the gelex input file or from the command
line.
Declaration Options
Options are specified in the declarations section
using unindented lines beginning with %option
followed by a whitespace-separated list of options. More than one
%option directive can be used if
necessary. Most options are given simply as names, optionally
preceded by the word no
(with no intervening whitespace) to negate their meaning.
The following is an excerpt from a scanner description file:
%option ecs meta-ecs case-insensitive nodefault
%option nowarn outfile="my_scanner.e"
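The directive syntax described above can be illustrated with a small Python sketch. This is not gelex's own parser: the function name `parse_option_lines` and the dictionary representation are hypothetical, chosen only to show how unindented `%option` lines, the `no` prefix, and `name="value"` options fit together.

```python
import shlex


def parse_option_lines(lines):
    """Collect options from unindented %option lines (illustrative only)."""
    options = {}
    for line in lines:
        # Directives must be unindented, so skip lines starting with whitespace.
        if line[:1].isspace():
            continue
        parts = line.split(None, 1)
        if not parts or parts[0] != "%option":
            continue
        rest = parts[1] if len(parts) > 1 else ""
        # shlex.split separates on whitespace and strips the double quotes.
        for word in shlex.split(rest):
            if "=" in word:
                name, value = word.split("=", 1)
                options[name] = value
            elif word.startswith("no"):
                # "no" directly before a name negates the option.
                options[word[2:]] = False
            else:
                options[word] = True
    return options
```

Applied to the two lines of the excerpt, this sketch would record `ecs`, `meta-ecs` and `case-insensitive` as enabled, `default` and `warn` as disabled, and `outfile` as `my_scanner.e`.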
The gelex options have the following meanings:
- backup
- Generate backing-up information to
standard output. This is a list of scanner states which
require backing up and the input characters on which they
do so. By adding rules one can remove backing-up states.
If all backing-up states are eliminated and the full
option is used, the generated scanner will run faster.
Only users who wish to squeeze every last CPU cycle out
of their scanners need worry about this option. [default:
nobackup]
- case-insensitive
- nocase-sensitive
- Generate a case-insensitive scanner. The
case of ASCII letters (letters a to z) given in the gelex input
patterns will be ignored, and tokens in the generated
scanner input will be matched regardless of case. The
matched text given in text will keep its original case. [default: case-sensitive]
- nodefault
- Cause the default
rule (that unmatched scanner
input is echoed to the standard output) to be suppressed.
If the scanner encounters input that does not match any
of its rules, it aborts with an error. This option is
useful for finding holes in a scanner's rule set.
[default: default]
- ecs
- Direct gelex to construct
equivalence classes, i.e. sets of characters which have
identical lexical properties (for example, if the only
appearance of digits in the gelex input is in
the character class [0-9] then the digits '0', '1', ..., '9' will all be
put in the same equivalence class). Equivalence classes
usually give dramatic reductions in the final
table/object file sizes (typically a factor of 2-5) and
are pretty cheap performance-wise (one array look-up per
character scanned). [default: ecs]
- full
- Specify that the full scanner tables
should be generated - gelex should not compress the tables
by taking advantage of similar transition functions for
different states. The result is large but fast. [default:
nofull]
- line
- Direct gelex to generate code for
line and column counting. [default: noline]
- meta-ecs
- Direct gelex to construct
meta-equivalence classes, which are sets of equivalence
classes (or characters, if equivalence classes are not
being used) that are commonly used together.
Meta-equivalence classes are often a big win when using
compressed tables, but they have a moderate performance
impact (one or two "if" tests and one array
look-up per character scanned). This option does not make
sense together with option full since there is no opportunity for
meta-equivalence classes if the table is not being
compressed. [default: meta-ecs]
- outfile="filename"
- Direct gelex to write the scanner
class to the file filename instead of the standard output.
- position
- Direct gelex to generate code for
position counting (i.e. the number of characters read
since the beginning of the input source). [default: noposition]
- post-action
- Specify that the feature post_action should be called after each semantic action.
[default: nopost-action]
- post-eof-action
- Specify that the feature post_eof_action should be called after each end-of-file (i.e. <<EOF>>) semantic action. [default: nopost-eof-action]
- pre-action
- Specify that the feature pre_action should be called before each semantic action.
[default: nopre-action]
- pre-eof-action
- Specify that the feature pre_eof_action should be called before each end-of-file (i.e. <<EOF>>) semantic action. [default: nopre-eof-action]
- reject
- Specify that the feature reject is used in semantic actions. [default: noreject]
- utf8
- Unicode characters appearing in patterns
other than (b:r)
are internally converted to their UTF-8 sequence of bytes.
Character classes are converted accordingly to the equivalent
patterns. This option is useful when the file to be scanned
is using the UTF-8 encoding. However, it produces larger
scanners (size of generated tables and size of the resulting
executable), with no visible speed improvement (when measured
with the Eiffel parser of gec scanning 20,000 Eiffel
class text files) compared to the default mode which is also
capable of scanning Unicode characters.
- nowarn
- Suppress warning messages. [default: warn]
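To make the idea behind the ecs option more concrete, here is a minimal Python sketch of how characters can be grouped into equivalence classes. It is a simplification, not gelex's actual algorithm: it assumes the patterns have already been reduced to a list of character sets, and the function name `equivalence_classes` is hypothetical. Two characters end up in the same class when they belong to exactly the same character sets.

```python
def equivalence_classes(alphabet, char_classes):
    """Group characters that appear in exactly the same character classes."""
    groups = {}
    for ch in alphabet:
        # The signature records, for each class, whether ch belongs to it.
        signature = tuple(ch in cc for cc in char_classes)
        groups.setdefault(signature, []).append(ch)
    # Each distinct signature is one equivalence class.
    return list(groups.values())


# Digits only appear in [0-9], so they all share one equivalence class.
classes = equivalence_classes("0123456789abc+", [set("0123456789"), set("+")])
```

With one table column per equivalence class instead of one per character, the scanner needs the extra array look-up mentioned above to map each input character to its class.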
Most of these options can also be specified from the command line.
The following is the gelex command-line usage message:
gelex [--version][--help][-bcefhimsVwxz?][-a size][--array-size=size][--backup]
[--ecs][--[no]full][--case-insensitive][--inspect-actions[=(yes|no)]][--meta-ecs]
[--outfile=filename][-o filename][--pragma=[no]line][--nodefault][--nowarn] filename
The command-line options have the following meanings:
- -V
- --version
- Print gelex version number to the
standard output and exit.
- -h
- -?
- --help
- Print gelex usage message to the
standard output and exit.
- -a size
- --array-size=size
- Some Eiffel compilers have
difficulty processing large manifest arrays. This option
directs gelex to split manifest arrays with more
than size elements into several smaller arrays. This
option can be disabled by setting size to 0.
[default: 200]
- -b
- --backup
- Equivalent to backup.
- -c
- --nofull
- Equivalent to nofull.
- -e
- --ecs
- Equivalent to ecs.
- -f
- --full
- Equivalent to full.
- -i
- --case-insensitive
- Equivalent to case-insensitive.
- -z
- --inspect-actions[=(yes|no)]
- The generated code uses an inspect
instruction to find out which action to execute. This is the
default. Otherwise, a binary search implemented with if
instructions is used.
- -m
- --meta-ecs
- Equivalent to meta-ecs.
- -o filename
- --outfile=filename
- Equivalent to outfile="filename".
- --pragma=[no]line
- The code generated for the semantic actions includes (or
omits) an indication of the line number where each action originally
appeared in the input description file.
- -s
- --nodefault
- Equivalent to nodefault.
- -w
- --nowarn
- Equivalent to nowarn.
- -x
- Write each semantic action into a separate routine. The
default is to write all actions into the same routine,
which can become too large for C back-end compilers to
handle.
- --
- Mark the end of the options. Useful when the scanner
description filename begins with the character '-'.
- filename
- Name of the gelex input file containing the scanner
description. By convention, gelex input filenames
have the extension '.l',
as in my_scanner.l. Following this convention is not required,
though.
Note that options specified on the command-line
will override those specified in the %option directives of
the input file.
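The precedence rule can be sketched in Python as follows. This is only an illustration of the documented behavior, not gelex code: the names `FLAG_EFFECTS` and `effective_options` are hypothetical, the table covers only the flags documented above as equivalent to a declaration option, and any other argument is simply ignored.

```python
# Flags documented above as equivalent to a declaration option.
FLAG_EFFECTS = {
    "-b": ("backup", True), "--backup": ("backup", True),
    "-c": ("full", False), "--nofull": ("full", False),
    "-e": ("ecs", True), "--ecs": ("ecs", True),
    "-f": ("full", True), "--full": ("full", True),
    "-i": ("case-insensitive", True),
    "--case-insensitive": ("case-insensitive", True),
    "-m": ("meta-ecs", True), "--meta-ecs": ("meta-ecs", True),
    "-s": ("default", False), "--nodefault": ("default", False),
    "-w": ("warn", False), "--nowarn": ("warn", False),
}


def effective_options(declared, argv):
    """Return the declared options with command-line flags applied on top."""
    merged = dict(declared)  # start from the %option settings
    for arg in argv:
        if arg in FLAG_EFFECTS:
            name, value = FLAG_EFFECTS[arg]
            merged[name] = value  # the command line wins on conflict
    return merged
```

For instance, if the input file declares `%option full` and gelex is run with `-c`, this sketch reports `full` as disabled, matching the rule that the command line takes precedence.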