Actions

Each pattern in a rule has a corresponding action, which can be an arbitrary sequence of Eiffel instructions. The pattern ends at the first non-escaped whitespace character; the remainder of the line is its action. If the action is empty, then when the pattern is matched the input token is simply discarded. For example, here is an excerpt from the specification of a program which deletes all occurrences of "zap me" from its input:

%%
"zap me"

It will copy all other characters in the input to the output since they will be matched by the default rule. Here is a program which compresses multiple blanks and tabs down to a single blank, and throws away whitespace found at the end of a line:

%%
[ \t]+     io.put_character (' ')
[ \t]+$    -- Ignore this token.

If the action begins with a {, then the action spans till the balancing } is found, and the action may cross multiple lines. Gelex knows about Eiffel strings, characters and comments and therefore won't be fooled by braces found within them.
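As an illustration of the brace rule above, here is a minimal sketch of a multi-line action. The attribute brace_count is hypothetical and is assumed to be declared as an INTEGER in the user code section; the opening brace inside the Eiffel string should not be mistaken for part of the action's own bracing:

```eiffel
%%
"{"    {
            -- `brace_count' is a hypothetical INTEGER attribute.
            -- The '{' inside the string below is ignored when gelex
            -- looks for the balancing '}' of this action.
        brace_count := brace_count + 1
        io.put_string ("saw '{'%N")
    }
```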

An action consisting solely of a vertical bar | means "same as the action for the next rule". See below for an illustration.

Actions can include arbitrary Eiffel code. There are a number of special features, inherited from class YY_SCANNER, which can be used in actions:

append_text_to_string (a_string: STRING)
Append text at the end of a_string. For efficiency reasons, this feature can bypass the call to text and directly copy the characters from the input buffer.
append_text_substring_to_string (s, e: INTEGER; a_string: STRING)
Append text_substring at the end of a_string. For efficiency reasons, this feature can bypass the call to text_substring and directly copy the characters from the input buffer.
column: INTEGER
Column number of last token read. If it is used in any of the scanner's actions the %option line will have to be set.
echo
Copy text to the scanner's output file using output.
Empty_buffer: YY_BUFFER
Empty input buffer (once function). When input sources are not known yet at the creation time of a scanner, this input buffer can be used by default with the creation routine make_with_buffer.
flush_input_buffer
Flush the scanner's internal buffer so that the next time the scanner attempts to match a token it will first refill the buffer, unless end of file has been found.
input_buffer: YY_BUFFER
Input buffer of the scanner. By default the input buffer is filled from the standard input. To avoid unexpected behaviors, the routine set_input_buffer should be used to switch to other input buffers.
last_character: CHARACTER
Last character read by read_character.
last_token: INTEGER
Code of the last token read. When this attribute is given a non-negative value the procedure read_token stops, giving its caller (e.g. a parser routine) the opportunity to inspect this code. Each time read_token is called again it continues processing tokens from where it last left off until either last_token is given a non-negative value again or the end of the file is reached (yielding a null value). Negative values are reserved by read_token to indicate internal errors, which can occur when reject is called too many times (and hence nothing can be matched anymore) or when the option nodefault (or option -s) has been specified but the default rule is matched nevertheless.
less (n: INTEGER)
Return all but the first n characters of the current token back to the input stream, where they will be rescanned when the scanner looks for the next match. text and text_count are adjusted appropriately (e.g., text_count will now be equal to n). For example, on the input "foobar" the following will write out "foobarbar":
%%
foobar    echo; less (3)
[a-z]+    echo 
An argument of 0 to less will cause the entire current input string to be scanned again. Unless the way the scanner subsequently processes its input has been changed (using set_start_condition, for example), this will result in an endless loop.
line: INTEGER
Line number of last token read. If it is used in any of the scanner's actions the %option line will have to be set.
more
Tell the scanner that the next time it matches a rule, the corresponding token should be appended onto the current value of text rather than replacing it. For example, given the input "mega-kludge" the following will write "mega-mega-kludge" to the output:
%%
mega-      echo; more
kludge     echo 
First "mega-" is matched and echoed to the output. Then "kludge" is matched, but the previous "mega-" is still hanging around at the beginning of text so the echo for the "kludge" rule will actually write "mega-kludge".
new_file_buffer (a_file: FILE): YY_FILE_BUFFER
Create an input buffer for a_file. This routine is convenient when used with set_input_buffer.
new_string_buffer (a_string: STRING): YY_BUFFER
Create an input buffer for a_string. This routine is convenient when used with set_input_buffer.
output (a_text: like text)
Write a_text to the standard output by default. This behavior can easily be modified through redefinition.
position: INTEGER
Position of last token read (i.e. number of characters from the start of the input source). If it is used in any of the scanner's actions the %option position will have to be set.
print_last_token
Routine called at the end of read_token when debugging instructions are enabled. Print to standard error debug information about the last token read. This routine can be redefined in descendant classes to print more information. In particular, the routine token_name generated by geyacc can be used to make the debugging output more human-readable.
read_character
Read the next character from the input stream. Make the result available in last_character. For example, the following is one way to eat up C comments:
%%
"/*"  {
        -- `stop' is assumed to be a BOOLEAN attribute; reset it for each comment.
    from stop := False until stop loop
        from
            read_character
        until
            last_character = '*' or
            last_character = '%/255/'
        loop
            read_character
        end
        if last_character = '*' then
            from
                read_character
            until
                last_character /= '*'
            loop
                read_character
            end
            if last_character = '/' then
                stop := True
            end
        end
        if last_character = '%/255/' then
            io.error.put_string ("EOF in comment%N")
            stop := True
        end
    end
}
This feature should be used with care since it bypasses the pattern-matching DFA engine.
reject
Direct the scanner to proceed on to the "second best" rule which matched the input (or a prefix of the input). The rule is chosen as described in Matching Rules, and text and text_count return the appropriate values. It may either be one which matched as much text as the originally chosen rule but came later in the gelex input file, or one which matched less text. For example, the following will both count the words in the input and call the routine special whenever "frob" is seen:
%%
frob         special; reject
[^ \t\n]+    word_count := word_count + 1
%%
    word_count: INTEGER
    special is do ... end

Without the reject, occurrences of "frob" in the input would not be counted as words, since the scanner normally executes only one action per token. Multiple calls to reject are allowed, each one finding the next best choice to the currently active rule. For example, when the following scanner scans the token "abcd", it will write "abcdabcaba" to the output:

%%
a        |
ab       |
abc      |
abcd     echo; reject
.|\n     -- Eat up any unmatched character. 
(The first three rules share the fourth's action since they use the special '|' action.) reject is a particularly expensive feature in terms of scanner performance. If it is used in any of the scanner's actions, %option reject must be set, and it will slow down all of the scanner's matching. Furthermore, reject cannot be used with %option full, and this feature is only available to descendants of class YY_COMPRESSED_SCANNER_SKELETON.
set_input_buffer (a_buffer: like input_buffer)
Switch the scanner's input buffer so that subsequent tokens will come from a_buffer. This routine can be used to continue scanning another file when the end-of-file has been read, or to deal with preprocessor instructions such as #include. Its argument can be the result of one of the functions new_file_buffer or new_string_buffer. Note that switching input buffers does not change the start condition of the scanner.
set_last_token (a_token: INTEGER)
Set last_token to a_token.
set_start_condition (a_start_condition: INTEGER)
Put the scanner in the corresponding start condition. See discussion on start conditions for further details.
start_condition: INTEGER
Current start condition. This value can subsequently be used with set_start_condition to return to that start condition. See discussion on start conditions for further details.
terminate
Terminate the scanner and set last_token to 0, indicating "all done". By default, terminate is also called when an end-of-file is encountered.
text: STRING
Text of the last token read. This feature is a function which creates a new string each time it is called. Actions are hence free to alter the result of text without damaging the input buffer.
text_count: INTEGER
Length of the last token read. This feature is a function which computes the number of characters matched by the corresponding pattern. If efficiency is a concern and this function is called several times in the same action, its result can be stored in a temporary variable.
text_item (i: INTEGER): CHARACTER
Character at a given index in text. For efficiency reasons, this function bypasses the call to text and reads the character directly from the input buffer.
text_substring (s, e: INTEGER): STRING
Substring of text. This function creates a new string each time it is called. For efficiency reasons, this function bypasses the call to text and creates the substring directly from the input buffer.
unread_character (c: CHARACTER)
Put the character c back onto the input stream. It will be the next character scanned. The following action will take the current token and cause it to be rescanned enclosed in parentheses.
{
    a_text := text
    unread_character (')')
    from i := text_count until i < 1 loop
        unread_character (a_text.item (i))
        i := i - 1
    end
    unread_character ('(')
} 
Note that since each unread_character puts the given character back at the beginning of the input stream, strings must be pushed back back-to-front. An important potential problem when using unread_character is that it modifies the input buffer, so the value of text may no longer be valid afterwards. If you need the value of text after a call to unread_character (as in the above example), you must first save it elsewhere. Finally, note that you cannot put back EOF (i.e. '%/255/') to attempt to mark the input stream with an end-of-file.
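Several of the features listed above are typically combined when the scanner feeds a parser. The following excerpt is a minimal sketch, not a complete specification: the token codes Number and Identifier and their values are hypothetical, and the constant declarations use the same `is' syntax as the other examples in this chapter. Rules which call set_last_token with a non-negative value cause read_token to return; the whitespace rule leaves last_token untouched, so scanning simply continues:

```eiffel
%%
[0-9]+       set_last_token (Number)
[a-zA-Z]+    set_last_token (Identifier)
[ \t\n]+     -- Skip whitespace: no token is returned to the caller.
.            set_last_token (text_item (1).code)
%%
        -- Hypothetical token codes, chosen above the character codes.
    Number: INTEGER is 258
    Identifier: INTEGER is 259
```

The last rule returns the code of the single matched character, read with text_item so that no new string has to be created by text.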

In addition to the above routines which can be called in semantic actions, the following routines can be called after the routine read_token has returned:

end_of_file: BOOLEAN
Has the end of input buffer been reached? This is the case when last_token has been set to 0.
scanning_error: BOOLEAN
Has an error occurred during scanning? This is the case when last_token has been given a negative value. It can occur when reject is called too many times (and hence nothing can be matched anymore) or when the option nodefault (or option -s) has been specified but the default rule is matched nevertheless.
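A client of the scanner can use these two queries to drive it. The routine below is only a sketch: the name scan_all is hypothetical, and it relies solely on read_token, last_token, end_of_file and scanning_error as described in this chapter:

```eiffel
scan_all (a_scanner: YY_SCANNER) is
        -- Read tokens from `a_scanner' until the end of
        -- the input is reached or a scanning error occurs.
        -- (Hypothetical helper routine, for illustration only.)
    do
        from
            a_scanner.read_token
        until
            a_scanner.end_of_file or a_scanner.scanning_error
        loop
                -- `a_scanner.last_token' holds the code of the token
                -- just read and can be inspected here.
            a_scanner.read_token
        end
        if a_scanner.scanning_error then
            io.error.put_string ("Scanning error%N")
        end
    end
```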

Furthermore, the following routines are called before or after semantic actions if the corresponding %option directives have been specified. These routines do nothing by default but can be redefined in the generated scanner class.

pre_action
Action executed before every semantic action when %option pre-action has been specified.
post_action
Action executed after every semantic action when %option post-action has been specified.
pre_eof_action
Action executed before every end-of-file semantic action (i.e. <<EOF>>) when %option pre-eof-action has been specified.
post_eof_action
Action executed after every end-of-file semantic action (i.e. <<EOF>>) when %option post-eof-action has been specified.
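As a sketch of how such a hook might be used, the excerpt below counts every semantic action executed. The attributes word_count and action_count are hypothetical, and the class header (not shown) is assumed to list pre_action in the redefine clause of its inherit clause:

```eiffel
%option pre-action
%%
[a-z]+    word_count := word_count + 1
.|\n      -- Ignore everything else.
%%
        -- Hypothetical counters, for illustration only.
    word_count: INTEGER
    action_count: INTEGER

    pre_action is
            -- Count every semantic action about to be executed.
        do
            action_count := action_count + 1
        end
```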

Copyright 2000-2005, Eric Bezault
mailto:ericb@gobosoft.com
http://www.gobosoft.com
Last Updated: 22 February 2005
