Symbols, Terminal and Nonterminal

Symbols in geyacc grammars represent the grammatical classifications of the language.

A terminal symbol (also known as a token type) represents a class of syntactically equivalent tokens. You use the symbol in grammar rules to mean that a token in that class is allowed. The symbol is represented in the geyacc parser by a numeric code, and the read_token routine reports that code through last_token to indicate what kind of token has been read. You don't need to know the code value; you can use the symbol to stand for it.

A nonterminal symbol stands for a class of syntactically equivalent groupings. The symbol name is used in writing grammar rules. By convention, it should be in lower case.

Symbol names are case-insensitive words beginning with a letter and followed by zero or more letters, digits, or underscores.

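There are three ways of writing terminal symbols in the grammar: as a named token type, as a character token type (written as a character constant), or as a literal string token.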

How you choose to write a terminal symbol has no effect on its grammatical meaning. That depends only on where it appears in rules.
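As a rough illustration, the fragment below uses one terminal of each kind (named token type, character token type, literal string token) in the declarations and rules sections of a small grammar. The names NUMBER and LE and the rules themselves are hypothetical. Associating LE with "<=" inside the %token declaration is an assumption about the declaration syntax, not something stated on this page.

```
-- Named token type (hypothetical name).
%token NUMBER
-- Assumed syntax: the name LE is associated with the literal string "<=".
%token LE "<="

%%

-- "<=" is a literal string token.
comparison: sum "<=" sum
    ;

-- NUMBER is a named token type; '+' is a character token type.
sum: NUMBER
    | sum '+' NUMBER
    ;
```

Whichever spelling is chosen, the rules mean the same thing: only the position of the terminal in a rule matters.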

The value returned by read_token is always one of the terminal symbols (or 0 for end-of-input). Whichever way you write the token type in the grammar rules, you write it the same way in the definition of read_token. The numeric code for a character token type is simply the ASCII code for the character, so read_token can use the identical character constant to generate the requisite code. Each named token type becomes an integer constant feature in the parser class, so read_token can use the name to stand for the code. For a literal string token, read_token has to use the named token type associated with it.
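The following is a minimal sketch of a read_token body that produces each kind of code described above. It is not geyacc's own scanner. It assumes that last_token is an integer attribute of the enclosing class and that NUMBER and LE are named token types declared in the grammar, with LE associated with the literal string "<=". The features end_of_input, current_character, looking_at and skip are hypothetical helpers invented for the example.

```eiffel
read_token
        -- Read the next token and make its type code available in `last_token'.
        -- `end_of_input', `current_character', `looking_at' and `skip' are
        -- hypothetical helpers, not geyacc features.
    do
        if end_of_input then
                -- 0 tells the parser that the input is exhausted.
            last_token := 0
        elseif looking_at ("<=") then
                -- Literal string token: use the named token type associated with it.
            last_token := LE
            skip (2)
        elseif current_character = '+' then
                -- Character token type: the code is the character's ASCII code.
            last_token := ('+').code
            skip (1)
        elseif current_character.is_digit then
                -- Named token type: NUMBER is an integer constant of the parser class.
            last_token := NUMBER
            skip (1)
        else
                -- Any other character is passed on as a character token.
            last_token := current_character.code
            skip (1)
        end
    end
```

Note that no branch of the sketch ever assigns the code of the reserved error symbol.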

If read_token is defined in a separate class, you need to arrange for the token-type integer constant definitions to be available there. Use the -t option when running geyacc, so that it writes these constant definitions into a separate class from which the classes that need them can inherit.
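A sketch of that arrangement is shown below. MY_SCANNER and MY_TOKENS are hypothetical class names: MY_TOKENS stands for whatever class geyacc was asked to write with -t, and NUMBER for a named token type declared in the grammar. The body of read_token is reduced to a single assignment to keep the example short.

```eiffel
class MY_SCANNER
        -- Hypothetical scanner class defined separately from the parser.

inherit

    MY_TOKENS
            -- Hypothetical name of the class written by geyacc with -t;
            -- it is assumed to declare NUMBER as an integer constant.

feature -- Scanning

    last_token: INTEGER
            -- Code of the token most recently read

    read_token
            -- Read the next token (real scanning logic omitted).
        do
            last_token := NUMBER
        end

end
```

Because the constants are inherited, the scanner can refer to NUMBER by name without knowing its numeric value.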

The symbol error is a terminal symbol reserved for error recovery; it shouldn't be used for any other purpose. In particular, read_token should never return this value.

Copyright 1999, Eric Bezault
Last Updated: 10 October 1999