3 Parsing

Honu is parsed using an algorithm based primarily on operator precedence. The main focus of the operator precedence algorithm is to support infix operators. In short, the algorithm operates in the following way

Parsing will maintain the following registers
  • left - a function that takes the right hand side of an expression and returns the infix expression by combining the left hand side and the operator.

  • current - the current right hand side

  • precedence - represents the current precedence level

  • stream - stream of tokens to parse

This algorithm is illustrated with the following example. Consider the raw stream of tokens

 1 + 2 * 3 - 9 

left

 

current

 

precedence

 

stream

(lambda (x) x)

 

#f

 

0

 

1 + 2 * 3 - 9

(lambda (x) x)

 

1

 

0

 

+ 2 * 3 - 9

(lambda (x) #'(+ 1 x))

 

#f

 

1

 

2 * 3 - 9

(lambda (x) #'(+ 1 x))

 

2

 

1

 

* 3 - 9

(lambda (x) (left #'(* 2 x)))

 

2

 

2

 

3 - 9

(lambda (x) (left #'(* 2 x)))

 

3

 

2

 

- 9

(lambda (x) #'(- (+ 1 (* 2 3)) x))

 

#f

 

1

 

9

(lambda (x) #'(- (+ 1 (* 2 3)) x))

 

9

 

1

 

 

When the stream of tokens is empty the current register is passed as an argument to the left function which ultimately produces the expression
(- (+ 1 (* 2 3)) 9)

In this example + and - both have a precedence of 1 while * has a precedence of 2. Currently, precedences can be any number that can be compared with <=.

The example takes some liberties with respect to how the actual implementation works. In particular the binary operators are syntax transformers that accept the left and right hand expressions as parameters and return new syntax objects. Also when the * operator is parsed the left function for + is nested inside the new function for *.

An expression can be one of the following
  • datum - number, string, or symbol.
    5

  • macro - a symbol bound to a syntax transformer.
    cond x = 5: true, else: false

  • stop - a symbol which immediately ends the current expression. these are currently , ; :

  • lambda expression - an identifier followed by (id ...) followed by a block of code in braces.
    add(x, y){ x + y }

  • function application - an expression followed by (arg ...).
    f(2, 2)

  • list comprehension -
    [x + 1: x <- [1, 2, 3]]

  • block of code - a series of expressions wrapped in braces.

  • expression grouping - any expression inside a set of parenthesis
    (1 + 1) * 2