2 LALR(1) Parsers
(require parser-tools/yacc) | package: parser-tools-lib |
syntax
(parser clause ...)
clause =
(grammar (non-terminal-id ((grammar-id ...) maybe-prec expr) ...) ...) | (tokens group-id ...) | (start non-terminal-id ...) | (end token-id ...) | (error expr) | (precs (assoc token-id ...) ...) | (src-pos) | (suppress) | (expected-SR-conflicts num) | (expected-RR-conflicts num) | (debug filename) | (yacc-output filename) maybe-prec =
| (prec token-id) assoc = left | right | nonassoc
(grammar (non-terminal-id ((grammar-id ...) maybe-prec expr) ...) ...) Declares the grammar to be parsed. Each grammar-id can be a token-id from a group-id named in a tokens declaration, or it can be a non-terminal-id declared in the grammar declaration. The optional prec declaration works with the precs declaration. The expr is a “semantic action,” which is evaluated when the input is found to match its corresponding production.
Each action is Racket code that has the same scope as its parser’s definition, except that the variables $1, ..., $i are bound, where i is the number of grammar-ids in the corresponding production. If the kth grammar symbol on the right of the production is a non-terminal, $k is bound to the result of its action; if it is a terminal, $k is bound to the value stored in the token. If the src-pos option is present in the parser, then variables $1-start-pos, ..., $i-start-pos and $1-end-pos, ..., $i-end-pos and are also available, and they refer to the position structures corresponding to the start and end of the corresponding grammar-symbol. Grammar symbols defined as empty-tokens have no $k associated, but do have $k-start-pos and $k-end-pos. Also $n-start-pos and $n-end-pos are bound to the largest start and end positions, (i.e., $i-start-pos and $i-end-pos).
An error production can be defined by providing a production of the form (error α), where α is a string of grammar symbols, possibly empty.
All of the productions for a given non-terminal must be grouped with it. That is, no non-terminal-id may appear twice on the left hand side in a parser.
(tokens group-id ...)
Declares that all of the tokens defined in each group-id—
as bound by define-tokens or define-empty-tokens— can be used by the parser in the grammar declaration. (start non-terminal-id ...)
Declares a list of starting non-terminals for the grammar.
(end token-id ...)
Specifies a set of tokens from which some member must follow any valid parse. For example, an EOF token would be specified for a parser that parses entire files and a newline token for a parser that parses entire lines individually.
(error expr)
The expr should evaluate to a function which will be executed for its side-effect whenever the parser encounters an error.
If the src-pos declaration is present, the function should accept 5 arguments:
(lambda (tok-ok? tok-name tok-value start-pos end-pos) ....) Otherwise it should accept 3:
(lambda (tok-ok? tok-name tok-value) ....) The first argument will be #f if and only if the error is that an invalid token was received. The second and third arguments will be the name and the value of the token at which the error was detected. The fourth and fifth arguments, if present, provide the source positions of that token.
In both cases, the function is allowed to accept an additional keyword argument named #:stack. This argument is a representation of the parsing automaton’s stack. This can, for example, be used to generate context-sensitive error messages as described in Generating LR syntax error messages from examples, by Clinton L. Jeffrey.
(precs (assoc token-id ...) ...) OPTIONAL
Precedence declarations to resolve shift/reduce and reduce/reduce conflicts as in yacc/bison. An assoc must be one of left, right or nonassoc. States with multiple shift/reduce or reduce/reduce conflicts (or some combination thereof) are not resolved with precedence.
(src-pos) OPTIONAL
Causes the generated parser to expect input in the form (make-position-token token start-pos end-pos) instead of simply token. Include this option when using the parser with a lexer generated with lexer-src-pos.
(debug filename) OPTIONAL
Causes the parser generator to write the LALR table to the file named filename (unless the file exists), where filename is a literal string. Additionally, if a debug file is specified, when a running generated parser encounters a parse error on some input file, after the user specified error expression returns, the complete parse stack is printed to assist in debugging the grammar of that particular parser. The numbers in the stack printout correspond to the state numbers in the LALR table file.
(yacc-output filename) OPTIONAL
Causes the parser generator to write a grammar file in approximately the syntax of yacc/bison. The file might not be a valid yacc file, because the Racket grammar can use symbols that are invalid in C.
(suppress) OPTIONAL
Causes the parser generator not to report shift/reduce or reduce/reduce conflicts.
(expected-SR-conflicts n) OPTIONAL
Causes the parser generator to expect exactly num shift/reduce conflicts, where num must be a literal number. The suppress option overrides this option.
(expected-RR-conflicts n) OPTIONAL
Causes the parser generator to expect exactly num reduce/reduce conflicts, where num must be a literal number. The suppress option overrides this option.
The result of a parser expression with one start non-terminal is a function, parse, that takes one argument. This argument must be a zero argument function, gen, that produces successive tokens of the input each time it is called. If desired, the gen may return symbols instead of tokens, and the parser will treat symbols as tokens of the corresponding name (with #f as a value, so it is usual to return symbols only in the case of empty tokens). The parse function returns the value associated with the parse tree by the semantic actions. If the parser encounters an error, after invoking the supplied error function, it will try to use error productions to continue parsing. If it cannot, it raises exn:fail:read.
If multiple non-terminals are provided in start, the parser expression produces a list of parsing functions, one for each non-terminal in the same order. Each parsing function is like the result of a parser expression with only one start non-terminal.
Each time the Racket code for a parser is compiled (e.g. when a ".rkt" file containing a parser form is loaded), the parser generator is run. To avoid this overhead place the parser into a module and compile the module to a ".zo" bytecode file.
; Use the lexer and tokens from Tokens ; and the sample input from Creating a Lexer
> (define the-parser (parser [start expr] [end EOF] [error void] [tokens basic-tokens punct-tokens] [grammar [expr [(LPAREN exprs RPAREN) $2] [(NUM) $1] [(ID) $1]] [exprs [() '()] [(expr exprs) (cons $1 $2)]]])) > (define p (open-input-string sample-input)) > (the-parser (λ () (the-lexer/tokens p))) '(lambda (a) (add_number a 42))