16.33 LALR: A parser generator

Author: Arthur Norman

This code serves the same sort of purpose as the well known utilities yacc and bison, in that it accepts a specification of a contex free grammar and builds a parser for the language that it describes. The techniques used are those described in standard texts (eg Aho, Lam, Sethi and Ullman) as “LALR”.

Grammars are presented to the code here as lists. In a grammar strings will denote terminal symbols and most symbols non-terminals. A small number of special symbols will stand for classes of terminal that are especially useful.

A grammar is given as a list of rules. Each rule has a single non-terminal and then a seqence of productions for it. Each production can optionally be provided with an associated semantic action.

 GRAMMAR ::= ~(~ GTAIL  
 GTAIL   ::= ~)~         % Sequence of rules  
         |   RULE GTAIL  
 RULE    ::= ~(~ nonterminal RULETAIL  
 RULETAIL::= ~)~         % Sequence of productions  
 PRODUCTION ::= ~(~ ~(~ PT1 PT2  
 PT1     ::= ~)~         % Symbols to make one production  
         | nonterminal PT1  
         | terminal PT1  
 PT2     ::= ~)~         % Semantic actions as Lisp code  
         | lisp-s-expressoin PT2  

Before specifying a grammar it is possible to declare the precedence and associativity of some of the terminal symbols it will use. Here is an example:

 lalr_precedence ’(~.~ !:right ~^~ !:left (~*~ ~/~) (~+~ ~-~));

will arrange that the token “.” has highest precedence and that it is left associative. Next comes “^” which is right associative. Both “*” and “/” have the same precedence and both are left associative, and finally “+” and “-” are also equal in precedence but lower than “*”.

With those precedences in place one could specify a grammar by

 lalr_construct_parser ’(  
    (S  ((!:symbol))  
        ((S ~.~ S))  
        ((S ~^~ S))  
        ((S ~*~ S))  
        ((S ~/~ S))  
        ((S ~+~ S))  
        ((S ~-~ S))));

The special markers !:symbol and !:number will match any symbols or numbers – for commentary on what count as such see the discussion of the lexter later on. The strings stand for fixed tokens and by virtue of them being used in the grammer the lexer will recognise them specially.