Semantic Actions
The grammar rules for a language determine only the syntax. The semantics are determined by the semantic values associated with various tokens and groupings, and by the actions taken when various groupings are recognized. For example, the calculator calculates properly because the value associated with each expression is the proper number; it adds properly because the action for the grouping X + Y is to add the numbers associated with X and Y.
An action accompanies a syntactic rule and contains Eiffel code to be executed each time an instance of that rule is recognized. The task of most actions is to compute a semantic value for the grouping built by the rule from the semantic values associated with tokens or smaller groupings.
An action consists of Eiffel instructions surrounded by braces. Geyacc knows about Eiffel strings, characters and comments and therefore won't be fooled by braces found within them. An action can be placed at any position in the rule; it is executed at that position. Most rules have just one action at the end of the rule, following all the components. Actions in the middle of a rule are tricky and used only for special purposes.
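For example, the following sketch shows an action made of several Eiffel instructions, with braces inside a comment and inside a string. According to the paragraph above, geyacc should treat only the outer braces as the action delimiters. The sketch assumes that exp has been declared elsewhere with type INTEGER, and the "..." stands for the other alternatives of exp, as in the examples of this page.

```eiffel
exp: ...
	| exp '+' exp
		{
				-- A brace in a comment, like this one: }
			$$ := $1 + $3
			print ("{sum} = ")
			print ($$)
			print ('%N')
		}
	;
```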
The Eiffel code in an action can refer to the semantic values of the components matched by the rule with the construct $N, which stands for the value of the Nth component. The semantic value for the grouping being constructed is $$. (Geyacc translates both of these constructs into array element references when it copies the actions into the generated parser class.)
Here is a typical example:
exp: ... | exp '+' exp { $$ := $1 + $3 }
This rule constructs an exp from two smaller exp groupings connected by a plus-sign token. In the action, $1 and $3 refer to the semantic values of the two component exp groupings, which are the first and third symbols on the right-hand side of the rule. The sum is stored into $$ so that it becomes the semantic value of the addition-expression just recognized by the rule. If there were a useful semantic value associated with the '+' token, it could be referred to as $2.
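Here is a sketch extending this rule into a small expression grammar, assuming a NUMBER token whose semantic value is an INTEGER (the token name is illustrative). It shows that N simply counts the symbols of the right-hand side from the left, so in the parenthesized alternative the inner exp is $2.

```eiffel
%token <INTEGER> NUMBER
%type <INTEGER> exp

%left '+'
%left '*'

%%

exp: NUMBER		{ $$ := $1 }
	| exp '+' exp	{ $$ := $1 + $3 }
	| exp '*' exp	{ $$ := $1 * $3 }
	| '(' exp ')'	{ $$ := $2 }	-- $1 and $3 are the parentheses.
	;
```

With these precedence declarations ('*' binds tighter than '+'), 1 + 2 * 3 would evaluate to 7 and (1 + 2) * 3 to 9.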
Like entities in Eiffel, $$ is initialized to its default value at the beginning of the semantic action. This default value is the same as in Eiffel: 0 for INTEGER, False for BOOLEAN, Void for reference types, and so on. Specifying no action for a rule is equivalent to specifying an empty action {}, so the semantic value of such a rule is the default value of its type.
This is a departure from yacc and Bison: if you don't specify an action for a rule, they supply the default action { $$ := $1 }, so the value of the first symbol in the rule becomes the value of the whole rule. Furthermore, yacc and Bison have no meaningful default action for an empty rule; every empty rule must have an explicit action unless the rule's value does not matter. The geyacc behavior was deemed more appropriate in the Eiffel context. In Eiffel, all entities are initialized to their default values, and $$ can be considered the Result entity of the semantic action, so it too is initialized to its default value at the beginning of the action. Furthermore, in a typed language such as Eiffel, { $$ := $1 } would be meaningless as a default action since there is no guarantee that the type of $1 conforms to the type of $$.
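To make the difference from yacc and Bison concrete, here is a sketch using illustrative names (NUMBER, FLAG_KEYWORD, optional_count and optional_flag are not part of geyacc). The empty alternatives rely on the default values. The NUMBER alternative has to copy $1 explicitly: without an action, its value would be 0 rather than the number that was read.

```eiffel
%token <INTEGER> NUMBER
%token FLAG_KEYWORD
%type <INTEGER> optional_count
%type <BOOLEAN> optional_flag

%%

optional_count: -- Empty: no action, so the value is 0 (INTEGER default).
	| NUMBER	{ $$ := $1 }	-- Explicit copy; geyacc does not add it.
	;

optional_flag: -- Empty: no action, so the value is False (BOOLEAN default).
	| FLAG_KEYWORD	{ $$ := True }
	;
```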
Unlike yacc and Bison, geyacc does not allow $N with N zero or negative.
Actions can include arbitrary Eiffel code. There are a number of special features, inherited from class YY_PARSER, which can be used in actions:
In a simple program it may be sufficient to use the same Eiffel type for the semantic values of all constructs. This was true in the RPN and infix calculator examples. However, in most programs, there will be a need for different Eiffel types for different kinds of tokens and groupings. For example, a numeric constant may need type INTEGER or DOUBLE, while a string constant needs type STRING, and a list of identifiers might need type LINKED_LIST [STRING]. To use more than one Eiffel type for semantic values in one parser, choose the types for each symbol (terminal or nonterminal) for which semantic values are used. This is done for tokens with the %token geyacc declaration, and for groupings with the %type geyacc declaration. If the type of a semantic value has not been specified that way, it will by default be detachable ANY.
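A sketch of such declarations, assuming illustrative IDENTIFIER and identifier_list symbols and EiffelBase's LINKED_LIST (its make creation procedure and its extend feature are used to build the list). In each action, $1 and $3 get the types declared for the symbols they refer to; the ',' token has no declared type, so its value, if it were used, would be detachable ANY.

```eiffel
%token <STRING> IDENTIFIER
%type <LINKED_LIST [STRING]> identifier_list

%%

identifier_list: IDENTIFIER
		{
				-- $1 is a STRING, $$ a LINKED_LIST [STRING].
			$$ := create {LINKED_LIST [STRING]}.make
			$$.extend ($1)
		}
	| identifier_list ',' IDENTIFIER
		{
				-- $1 is the list built so far, $3 the new STRING.
			$$ := $1
			$$.extend ($3)
		}
	;
```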
Each time $$ or $N is used, its Eiffel type is determined by which symbol it refers to in the rule. In this example:
exp: ... | exp '+' exp { $$ := $1 + $3 }
$1 and $3 refer to instances of exp, so both have the Eiffel type declared for the nonterminal symbol exp (as does $$, since the rule builds an exp). If $2 were used, it would have the type declared for the terminal symbol '+'.
Occasionally it is useful to put an action in the middle of a rule. These actions are written just like usual end-of-rule actions, but they are executed before the parser even recognizes the following components.
A mid-rule action may refer to the components preceding it using $N, but it may not refer to subsequent components because it is run before they are parsed. The mid-rule action itself counts as one of the components of the rule. This makes a difference when there is another action later in the same rule (and usually there is another at the end): you have to count the actions along with the symbols when working out which number N to use in $N.
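A minimal sketch of this numbering, assuming an illustrative NUMBER token and pair nonterminal, both of type INTEGER. The mid-rule action sits between the first NUMBER and the comma, so it occupies position 3 and the second NUMBER becomes $5 rather than $4.

```eiffel
%token <INTEGER> NUMBER
%type <INTEGER> pair

%%

pair: '(' NUMBER { print ($2) } ',' NUMBER ')'
		{
				-- $1 = '(', $2 = first NUMBER, $3 = mid-rule action,
				-- $4 = ',', $5 = second NUMBER, $6 = ')'.
			$$ := $2 + $5
		}
	;
```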
The mid-rule action can also have a semantic value. The action can set its value with an assignment to $$, and actions later in the rule can refer to the value using $N. The Eiffel type for the semantic value of a mid-rule action is the same type as declared for the full grouping.
There is no way to set the value of the entire rule with a mid-rule action, because assignments to $$ do not have that effect. The only way to set the value for the entire rule is with an ordinary action at the end of the rule.
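The following sketch illustrates both points, again with illustrative names (a NUMBER token and a time nonterminal, both of type INTEGER). The mid-rule action converts hours to minutes and stores the result in its own $$, which the end-of-rule action reads back as $2. For the input 2:30, the mid-rule value would be 120 and the value of time 150. Because the mid-rule action runs right after the first NUMBER, using this rule where a lone NUMBER could also appear may cause the kind of conflict discussed below.

```eiffel
%token <INTEGER> NUMBER
%type <INTEGER> time

%%

time: NUMBER { $$ := $1 * 60 } ':' NUMBER
		{
				-- $2 is the value set by the mid-rule action, $4 the minutes.
				-- Without this assignment, the value of `time' would be 0,
				-- not $2: the mid-rule $$ belongs to the action only.
			$$ := $2 + $4
		}
	;
```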
Here is an example from a hypothetical compiler, handling a let statement that looks like let (VARIABLE) STATEMENT and serves to create a variable named VARIABLE temporarily for the duration of STATEMENT. To parse this construct, we must put VARIABLE into the symbol table while STATEMENT is parsed, then remove it afterward. Here is how it is done:
    stmt: LET '(' var ')'
            {
                $$ := new_context
                contexts.put ($$)
                $$.declare_variable ($3)
            }
        stmt
            {
                $$ := $6
                contexts.remove ($5)
            }
        ;
As soon as let (VARIABLE) has been recognized, the first action is run. It obtains from new_context a copy of the current semantic context (the list of accessible variables), records it in contexts and keeps it as its semantic value. Then it calls declare_variable to add the new variable to that context. Once the first action is finished, the embedded statement stmt can be parsed. Note that the mid-rule action is component number 5, so the stmt is component number 6. After the embedded statement is parsed, its semantic value becomes the value of the entire let statement. Then the semantic value of the earlier action, $5, is passed to contexts.remove to restore the prior list of variables. This removes the temporary let variable from the list so that it won't appear to exist while the rest of the program is parsed.
Taking action before a rule is completely recognized often leads to conflicts since the parser must commit to a parse in order to execute the action. For example, the following two rules, without mid-rule actions, can coexist in a working parser because the parser can shift the open-brace token and look at what follows before deciding whether there is a declaration or not:
    compound: '{' declarations statements '}'
        | '{' statements '}'
        ;
But when we add a mid-rule action as follows, the rules become nonfunctional:
    compound: { prepare_for_local_variables }
            '{' declarations statements '}'
        | '{' statements '}'
        ;
Now the parser is forced to decide whether to run the mid-rule action when it has read no further than the open-brace. In other words, it must commit to using one rule or the other, without sufficient information to do it correctly. (The open-brace token is what is called the look-ahead token at this time, since the parser is still deciding what to do about it.) You might think that you could correct the problem by putting identical actions into the two rules, like this:
    compound: { prepare_for_local_variables }
            '{' declarations statements '}'
        | { prepare_for_local_variables }
            '{' statements '}'
        ;
But this does not help, because geyacc does not realize that the two actions are identical. (Geyacc never tries to understand the Eiffel code in an action.) If the grammar is such that a declaration can be distinguished from a statement by the first token (which is true in C), then one solution which does work is to put the action after the open-brace, like this:
    compound: '{' { prepare_for_local_variables }
            declarations statements '}'
        | '{' statements '}'
        ;
Now the first token of the following declaration or statement, which would in any case tell geyacc which rule to use, can still do so. Another solution is to bury the action inside a nonterminal symbol which serves as a subroutine:
    subroutine: -- Empty
            { prepare_for_local_variables }
        ;

    compound: subroutine '{' declarations statements '}'
        | subroutine '{' statements '}'
        ;
Now geyacc can execute the action in the rule for subroutine without deciding which rule for compound it will eventually use. Note that the action is now at the end of its rule. Any mid-rule action can be converted to an end-of-rule action in this way, and this is what geyacc actually does to implement mid-rule actions.
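As a sketch of a similar conversion done by hand, the let rule shown earlier could be split as follows. This is a variant of the subroutine technique in which the new nonterminal, let_head (an illustrative name), also absorbs the components that precede the former mid-rule action, so that action can still refer to var as $3. A side benefit is that let_head can be given its own type, here a hypothetical CONTEXT class, instead of sharing the type declared for stmt. new_context, contexts and declare_variable are the features assumed by the original example; this is not necessarily how geyacc lays out the rules it generates internally.

```eiffel
%type <CONTEXT> let_head	-- CONTEXT is a hypothetical class.

%%

let_head: LET '(' var ')'
		{
				-- Formerly the mid-rule action, now at the end of its rule.
			$$ := new_context
			contexts.put ($$)
			$$.declare_variable ($3)
		}
	;

stmt: let_head stmt	-- Other alternatives of stmt omitted.
		{
				-- $1 is the context built by `let_head', $2 the embedded statement.
			$$ := $2
			contexts.remove ($1)
		}
	;
```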
Copyright © 1999-2019, Eric Bezault (ericb@gobosoft.com, http://www.gobosoft.com). Last Updated: 25 September 2019