3 Parsing
Honu is parsed using an algorithm based primarily on operator precedence. The main focus of the operator precedence algorithm is to support infix operators. In short, the algorithm operates in the following way
1. parse an expression
2. check for a binary operator. if one is found then continue to step 3 otherwise return the expression from step 1 immediately.
3. parse another expression
4. check for a binary operator. if one is found then check if its precedence is higher than the operator found in step 2, and if so then continue parsing from step 3. if the precedence is lower or an operator is not found then build an infix expression from the left hand expression from step 1, the binary operator in step 2, and the right hand expression in step 3.
left - a function that takes the right hand side of an expression and returns the infix expression by combining the left hand side and the operator.
current - the current right hand side
precedence - represents the current precedence level
stream - stream of tokens to parse
This algorithm is illustrated with the following example. Consider the raw stream of tokens
1 + 2 * 3 - 9
left |
| current |
| precedence |
| stream | |
(lambda (x) x) |
| #f |
| 0 |
|
| |
(lambda (x) x) |
| 1 |
| 0 |
|
| |
(lambda (x) #'(+ 1 x)) |
| #f |
| 1 |
|
| |
(lambda (x) #'(+ 1 x)) |
| 2 |
| 1 |
|
| |
(lambda (x) (left #'(* 2 x))) |
| 2 |
| 2 |
|
| |
(lambda (x) (left #'(* 2 x))) |
| 3 |
| 2 |
|
| |
(lambda (x) #'(- (+ 1 (* 2 3)) x)) |
| #f |
| 1 |
|
| |
(lambda (x) #'(- (+ 1 (* 2 3)) x)) |
| 9 |
| 1 |
|
(- (+ 1 (* 2 3)) 9)
In this example + and - both have a precedence of 1 while * has a precedence of 2. Currently, precedences can be any number that can be compared with <=.
The example takes some liberties with respect to how the actual implementation works. In particular the binary operators are syntax transformers that accept the left and right hand expressions as parameters and return new syntax objects. Also when the * operator is parsed the left function for + is nested inside the new function for *.
- datum - number, string, or symbol.
5 - macro - a symbol bound to a syntax transformer.
cond x = 5: true, else: false stop - a symbol which immediately ends the current expression. these are currently , ; :
- lambda expression - an identifier followed by (id ...) followed by a block of code in braces.
add(x, y){ x + y } - function application - an expression followed by (arg ...).
f(2, 2) - list comprehension -
[x + 1: x <- [1, 2, 3]] block of code - a series of expressions wrapped in braces.
- expression grouping - any expression inside a set of parenthesis
(1 + 1) * 2