Introduction to Gelex

In most operating systems, files are treated as streams of individual characters. This abstraction is very powerful, since any file can be represented this way. Its disadvantage is that a programmer almost always has to impose a structure on the raw file, in other words, break the file into more meaningful components. For example, one part of a compiler takes a stream of characters from a file and groups them into units that the syntax checker understands, such as numbers, keywords and strings. This grouping is needed because the parser of the compiler does not work with a stream of characters but with a stream of symbols from the language.

Database applications and applications dealing with binary files often have a fixed format for their data, and this format is used to extract meaning from the input. The opposite is usually true for programs that read text. These programs must often break the input into words or symbols, and there is usually no set structure to the way these words or symbols are laid out. So, in order to break the input into meaningful symbols, programs that deal with text often include a stage called the lexical analysis stage or lexical scanning stage. The functions that perform this stage are referred to as lexical analyzers or lexical scanners, or scanners for short. A scanner is like a factory that takes in raw materials (characters) and produces the finished product (tokens), ready for the consumers (e.g. parsers).
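To make the factory analogy concrete, here is a minimal sketch of a hand-written scanner. It is written in Python purely for illustration and is not gelex output; the token kinds, the keyword set and the function name scan are assumptions chosen for this example.

```python
# Hypothetical keyword set, chosen only for this illustration.
KEYWORDS = {"if", "then", "else", "end"}


def scan(text):
    """Group a string of characters into (kind, lexeme) tokens."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # Whitespace separates tokens but is not a token itself.
            i += 1
        elif ch.isdigit():
            # Consume a maximal run of digits as one NUMBER token.
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append(("NUMBER", text[start:i]))
        elif ch.isalpha() or ch == "_":
            # Consume a word, then decide whether it is a keyword.
            start = i
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1
            word = text[start:i]
            kind = "KEYWORD" if word in KEYWORDS else "IDENTIFIER"
            tokens.append((kind, word))
        else:
            # Any other character becomes a one-character SYMBOL token.
            tokens.append(("SYMBOL", ch))
            i += 1
    return tokens


if __name__ == "__main__":
    for token in scan("if x1 then 42 end"):
        print(token)
```

Even for this tiny token set, most of the code is index bookkeeping rather than a statement of what the tokens look like, which is the tedium the following paragraphs address.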

Generally, writing scanners is neither difficult nor interesting for a programmer, but it can be time consuming. Fortunately, Gobo Eiffel Lex (gelex) provides programmers with a method for cleanly describing the lexical analysis stage and generating efficient lexical scanners from the description. The programmer supplies gelex with a description of the scanner needed, and gelex uses this description to produce a scanner in Eiffel. The description language is a high-level language and is much more suitable for describing scanners than Eiffel. It allows the programmer to specify how to group characters and what actions to take once a grouping is completed.
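The idea of describing a scanner as a set of groupings paired with actions can be sketched as follows. This is a Python illustration of the general technique only: it is not gelex's description syntax, and it interprets the rules at run time rather than generating Eiffel code. The names RULES and build_scanner are hypothetical, and the longest-match policy with ties going to the earlier rule is the usual convention of lex-family tools, assumed here for the sake of the example.

```python
import re

# Each rule pairs a pattern (how to group characters) with an action
# (what to do once a grouping is completed). All names are illustrative.
RULES = [
    (r"[0-9]+", lambda lexeme, out: out.append(("NUMBER", int(lexeme)))),
    (r"[A-Za-z_][A-Za-z0-9_]*", lambda lexeme, out: out.append(("WORD", lexeme))),
    (r"[ \t\n]+", lambda lexeme, out: None),  # skip whitespace
]


def build_scanner(rules):
    """Turn a list of (pattern, action) rules into a scanner function."""
    compiled = [(re.compile(pattern), action) for pattern, action in rules]

    def scanner(text):
        out = []
        pos = 0
        while pos < len(text):
            best = None
            for pattern, action in compiled:
                match = pattern.match(text, pos)
                # Keep the longest non-empty match; on a tie the earlier
                # rule wins because the comparison is strict.
                if match and match.end() > pos:
                    if best is None or match.end() > best[0].end():
                        best = (match, action)
            if best is None:
                raise ValueError(f"no rule matches at position {pos}")
            match, action = best
            action(match.group(), out)
            pos = match.end()
        return out

    return scanner


if __name__ == "__main__":
    scan = build_scanner(RULES)
    print(scan("abc 123 x9"))
```

Adding a new kind of token here means adding one rule rather than rewriting a character loop, which is the benefit a description-based tool aims for.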

Gelex is not limited to compilers. Think of all the programs on your computer that need to read files and deal with groups of characters in some way, in particular transformation filters and language tools. Almost all of these programs can be written using gelex, or gelex combined with other tools.

Gelex can save programmers significant amounts of time in developing scanners and processing the characters that make up a file. In most cases the gelex input will be easier to understand and maintain than code written directly in C and/or Eiffel, and at least as portable. There is no more need for Eiffel external features calling scanners generated in C (using lex or flex, for example), nor for Cecil in the semantic actions to call Eiffel routines from C. With gelex, programmers can write semantic actions directly in Eiffel. In addition, because scanners can usually be developed in a much shorter time with gelex than with traditional methods, it is ideal for prototyping (taking advantage of the incremental compilation provided by most Eiffel compilers) and for one-shot programs or filters.