Matching Rules

When the generated scanner is run, it analyzes its input looking for strings which match any of its patterns. If it finds more than one match, it takes the one matching the most text (for trailing context rules, this includes the length of the trailing part, even though it will then be returned to the input). If it finds two or more matches of the same length, the rule listed first in the gelex description file is chosen.
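
As an illustration of these two rules, here is a minimal sketch of a description file. The class name KEYWORD_SCANNER and both patterns are hypothetical, chosen only for this example; the overall layout follows the MY_SCANNER example given later in this section.

```eiffel
%{
class KEYWORD_SCANNER
        -- Hypothetical scanner used only to illustrate rule selection.

inherit

    YY_COMPRESSED_SCANNER_SKELETON

create

    make
%}
%%
if          io.put_string ("keyword%N")
[a-z]+      io.put_string ("identifier%N")
%%

end
```

On the input iffy, the first rule can match only the two characters if, whereas the second rule matches all four characters, so the second rule is chosen. On the input if alone, both rules match two characters, so the rule listed first (the keyword rule) is chosen. Swapping the order of the two rules would therefore make the keyword rule unreachable.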

Once the match is determined, the text corresponding to the match (called the token) is made available through function text (or alternatively unicode_text or utf8_text), and its length through function text_count from class YY_SCANNER. The action corresponding to the matched pattern is then executed (a more detailed description of actions follows), and then the remaining input is scanned for another match.
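
The following sketch shows how an action might use text and text_count. The class name NUMBER_SCANNER and the pattern are assumptions made for this illustration, and the output routines are the ones reached through io in standard Eiffel libraries.

```eiffel
%{
class NUMBER_SCANNER
        -- Hypothetical scanner used only to illustrate `text' and `text_count'.

inherit

    YY_COMPRESSED_SCANNER_SKELETON

create

    make
%}
%%
[0-9]+      {
                    -- `text' is the token just matched and
                    -- `text_count' is its length.
                io.put_string (text)
                io.put_string (" has length ")
                io.put_integer (text_count)
                io.put_new_line
            }
%%

end
```

With such a rule, the input 2019 would be reported as a single token of length 4, since the longest possible match is taken.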

Default Rule

If no match is found, then the default rule is executed: the next character in the input is considered matched and copied to the standard output. Thus, the simplest legal gelex input has an empty rules section:

%{
class ...
%}
%%
%%
    ...

which generates a scanner that simply copies its input (one character at a time) to its output. The semantic action of the default rule can be overridden by redefining the feature default_action, which is inherited from class YY_SCANNER.
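
A possible redefinition is sketched below. The class name STRICT_SCANNER is hypothetical, and the sketch assumes that text holds the unmatched character at the time default_action is executed; it is meant as an illustration rather than as the recommended way to report errors.

```eiffel
%{
class STRICT_SCANNER
        -- Hypothetical scanner that reports unmatched input.

inherit

    YY_COMPRESSED_SCANNER_SKELETON
        redefine
            default_action
        end

create

    make
%}
%%
%%

feature -- Scanning

    default_action
            -- Report the unmatched character on standard error
            -- instead of copying it to standard output.
            -- (Assumes `text' contains that character.)
        do
            io.error.put_string ("unmatched input: ")
            io.error.put_string (text)
            io.error.put_new_line
        end

end
```

Because the rules section is empty here, every input character falls through to the default rule, so this scanner would report each character instead of echoing it.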

Note that if %option nodefault or the command-line option -s is specified, the default rule is disabled: when the scanner encounters input that does not match any of its rules, it aborts with an error. This option is useful for finding holes in a scanner's rule set. The default rule can then be simulated by adding the following rule at the end of the rules section:

.|\n    default_action

End-of-file Rules

When the scanner receives an end-of-file indication from its input buffer, it checks the wrap function. If wrap returns false, it is assumed that the function has gone ahead and set up the scanner to point to another input buffer using set_input_buffer, and scanning continues. If it returns true, there are no further files to process. By default, wrap returns true, but this routine can be redefined as in the following example:

%{
class MY_SCANNER

inherit

    YY_COMPRESSED_SCANNER_SKELETON
        rename
            ...
        redefine
            wrap, ...
        end

create

    make
%}
...
%%
...
%%

    wrap: BOOLEAN
            -- Should current scanner terminate when end of file is reached?
        do
            if other_file_available then
                set_input_buffer (new_file_buffer (other_file))
            else
                Result := True
            end
        end

    ...

end 

Note that in either case, the start condition remains unchanged; it does not revert to INITIAL.

The special rule <<EOF>> indicates actions which are to be taken when an end-of-file is encountered and wrap returns true. The action must finish by doing one of three things:

<<EOF>> rules may not be used with other patterns; they may only be qualified with a list of start conditions. If an unqualified <<EOF>> rule is given, it applies to all start conditions which do not already have <<EOF>> actions. If a start condition has no <<EOF>> action associated with it, it will by default execute the terminate action. To specify an <<EOF>> rule for only the initial start condition, use:

<INITIAL><<EOF>>

These rules are useful for catching things like unclosed comments. An example:

%x quote
%%
...other rules for dealing with quotes...
<quote><<EOF>> {
        io.error.put_string ("unterminated quote%N")
        terminate
    }
<<EOF>> {
        if not file_list.after then
            set_input_buffer (new_file_buffer (file_list.item))
            file_list.forth
        else
            terminate
        end
    }
%%

Copyright © 2000-2019, Eric Bezault
mailto:ericb@gobosoft.com
http://www.gobosoft.com
Last Updated: 27 September 2019
