Scanner Description File

The gelex input file consists of three sections, separated by a line with just %% in it:

declarations
%%
rules
%%
user code

Comments follow Eiffel conventions (they start with -- and extend to the end of the line) and have no effect on the description's semantics.
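
As an illustration of the three-section layout, the following Python sketch splits a description into its declarations, rules, and user code. It is not part of gelex: the function name split_sections is hypothetical, and the sketch assumes that a separator is any line whose only content, apart from trailing whitespace, is %% and that everything after the second separator belongs to the user code.

```python
def split_sections(text):
    """Split a scanner description into (declarations, rules, user_code).

    Illustrative only: this models the layout described above and is
    not how gelex itself reads its input.
    """
    parts = [[]]
    for line in text.splitlines():
        # Assumption: a separator line holds just %% (trailing blanks allowed).
        # Only the first two separators split the file.
        if line.rstrip() == "%%" and len(parts) < 3:
            parts.append([])
        else:
            parts[-1].append(line)
    # Missing trailing sections are reported as empty strings.
    while len(parts) < 3:
        parts.append([])
    return tuple("\n".join(p) for p in parts)


if __name__ == "__main__":
    sample = "DIGIT  [0-9]\n%%\n{DIGIT}+  action\n%%\nend\n"
    declarations, rules, user_code = split_sections(sample)
    print(repr(declarations))
    print(repr(rules))
    print(repr(user_code))
```

A description with only one %% line yields an empty third element, which mirrors the case where the user code section is left out.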

Declarations section

The declarations section contains declarations of options and start conditions (both explained elsewhere), simple name definitions that simplify the scanner specification, and Eiffel code to be copied to the beginning of the generated scanner class.

Name definitions

Name definitions have the form:

name  definition

The name is a word beginning with a letter and followed by zero or more letters, digits, or underscores. It must appear at the beginning of the line and is case-insensitive, which means that the three following lines are equivalent:

name  definition
NAME  definition
nAmE  definition

The definition begins at the first non-whitespace character following the name and continues to the end of the line. The definition can subsequently be referred to using {name}, which will expand to (definition). For example,

DIGIT  [0-9]
ID     [a-z][a-z0-9]*

defines DIGIT to be a regular expression which matches a single digit, and ID to be a regular expression which matches a letter followed by zero-or-more letters-or-digits. A subsequent reference to:

{DIGIT}+"."{DIGIT}*

is identical to:

([0-9])+"."([0-9])*

and matches one-or-more digits followed by a dot followed by zero-or-more digits.

Note that a definition can itself refer to other definitions. For example:

ID     {LETTER}({LETTER}|{DIGIT})*
DIGIT  [0-9]
LETTER [a-z]

Then a subsequent reference to:

{ID}

is identical to:

(([a-z])(([a-z])|([0-9]))*)
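
The expansion rule can be modelled with a short Python sketch. This is an illustration rather than gelex's own implementation: parse_definitions and expand are hypothetical names, every {name} is replaced wherever it occurs (including inside quoted strings or character classes, which the sketch does not treat specially), and undefined or circular names are not handled.

```python
import re

NAME = r"[A-Za-z][A-Za-z0-9_]*"


def parse_definitions(lines):
    """Map each lower-cased name to its definition text."""
    defs = {}
    for line in lines:
        # The name must start the line; the definition runs from the first
        # non-whitespace character after it to the end of the line.
        m = re.match(r"(" + NAME + r")[ \t]+(\S.*)$", line)
        if m:
            defs[m.group(1).lower()] = m.group(2)
    return defs


def expand(pattern, defs):
    """Replace every {name} with (definition), recursively."""
    def repl(m):
        # Names are case-insensitive, so look them up in lower case.
        return "(" + expand(defs[m.group(1).lower()], defs) + ")"
    return re.sub(r"\{(" + NAME + r")\}", repl, pattern)


if __name__ == "__main__":
    defs = parse_definitions([
        "ID     {LETTER}({LETTER}|{DIGIT})*",
        "DIGIT  [0-9]",
        "LETTER [a-z]",
    ])
    print(expand('{DIGIT}+"."{DIGIT}*', defs))  # ([0-9])+"."([0-9])*
    print(expand("{ID}", defs))                 # (([a-z])(([a-z])|([0-9]))*)
```

Applied literally, the rule wraps every reference in parentheses, including the outermost one; that outermost pair does not change what a standalone {ID} matches.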

Unlike start conditions, name definitions have their own name space. Names that are otherwise used as feature names in the generated scanner class can therefore be used as definition names without ambiguity.

Eiffel declarations

The declarations section may also contain Eiffel code to be copied verbatim to the beginning of the generated scanner class. The Eiffel text must be enclosed between two unindented lines containing the marks %{ and %}, as in the following example:

%{
class MY_SCANNER

inherit

    YY_COMPRESSED_SCANNER_SKELETON

create

    make
%}

Gelex does not generate the note clause, class header, formal generics, obsolete clause, inheritance clause, or creation clause. As the example above shows, Eiffel declarations are used to specify these clauses so that the generated scanner class is syntactically and semantically correct. Here, the name of the generated class is MY_SCANNER and its creation procedure is make, a routine inherited from class YY_COMPRESSED_SCANNER_SKELETON. This class contains the pattern-matching engine, a Deterministic Finite Automaton (DFA), which is optimized for size, hence the name of the class. It also provides numerous facilities such as the routine scan for analyzing a given input text. The generated scanner class has to inherit from one such skeleton class to work properly. The alternatives are YY_FULL_SCANNER_SKELETON, whose DFA is optimized for speed rather than space, and YY_INTERACTIVE_SCANNER_SKELETON, whose DFA can deal with interactive input such as input from the keyboard.

If several of these Eiffel blocks appear in the declarations section, they are all copied to the generated scanner class in their order of appearance in the input file.

Note that if the Eiffel code contains Unicode characters, the input file should be encoded in UTF-8 and start with a byte order mark (BOM).
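
One way to check or produce such a file outside gelex is sketched below in Python. The helper names has_utf8_bom and with_utf8_bom are hypothetical; the sketch relies only on the standard codecs.BOM_UTF8 constant and the standard "utf-8-sig" codec, which prepends the BOM when encoding.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the raw file contents start with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)


def with_utf8_bom(text: str) -> bytes:
    """Encode text as UTF-8 with a leading BOM."""
    return text.encode("utf-8-sig")


if __name__ == "__main__":
    raw = with_utf8_bom("%{\n-- caf\u00e9\n%}\n")
    print(has_utf8_bom(raw))
    print(has_utf8_bom("plain".encode("utf-8")))
```

The resulting bytes can be written to the description file in binary mode so that no further encoding step alters them.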

Rules section

The rules section of the gelex input contains a series of rules of the form:

pattern action

where the pattern can be indented and the action must begin on the same line. A further description of patterns and actions is provided in other chapters.

User code section

Finally, the user code section is copied verbatim to the end of the generated scanner class. Gelex generates neither the invariant clause nor the end keyword that closes the class, so this section is used to specify them and also to define features called from the semantic actions. The section is optional (if it is missing, the second %% in the input file may be omitted too), but it is highly recommended, if only to supply the end of the generated scanner class and thus ensure that the class is syntactically correct.

Note that if the Eiffel code in this user code section contains Unicode characters, the input file should be encoded in UTF-8 and start with a byte order mark (BOM).

Names of implementation features in the inherited classes YY_*_SCANNER_SKELETON are prefixed by yy. As a consequence, user-declared feature names beginning with this prefix should be avoided.


Copyright © 1997-2019, Eric Bezault
mailto:ericb@gobosoft.com
http://www.gobosoft.com
Last Updated: 25 September 2019
