17.2 Reader Extensions
The reader layer of the Racket language can be extended through the #reader form. A reader extension is implemented as a module that is named after #reader. The module exports functions that parse raw characters into a form to be consumed by the expander layer.
The syntax of #reader is
#reader ‹module-path› ‹reader-specific›
where ‹module-path› names a module that provides read and read-syntax functions. The ‹reader-specific› part is a sequence of characters that is parsed as determined by the read and read-syntax functions from ‹module-path›.
For example, suppose that file "five.rkt" contains
"five.rkt"
#lang racket/base

(provide read read-syntax)

(define (read in) (list (read-string 5 in)))
(define (read-syntax src in) (list (read-string 5 in)))
Then, the program
#lang racket/base
'(1 #reader"five.rkt"234567 8)
is equivalent to
#lang racket/base
'(1 ("23456") 7 8)
because the read and read-syntax functions of "five.rkt" both read five characters from the input stream and put them into a string and then a list. The reader functions from "five.rkt" are not obliged to follow Racket lexical conventions and treat the continuous sequence 234567 as a single number. Since only the 23456 part is consumed by read or read-syntax, the 7 remains to be parsed in the usual Racket way. Similarly, the reader functions from "five.rkt" are not obliged to ignore whitespace, and
#lang racket/base
'(1 #reader"five.rkt" 234567 8)
is equivalent to
#lang racket/base
'(1 (" 2345") 67 8)
since the first character immediately after "five.rkt" is a space.
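To see both cases without a separate file, the sketch below calls a copy of the "five.rkt" read function directly on a string port and then lets Racket's standard read continue from wherever it stopped. Calling the function directly, rather than through #reader, is an assumption made so that the example fits in one module. five-read and read-after-five are hypothetical names.

```racket
#lang racket/base

;; Hypothetical copy of the "five.rkt" read function: it consumes
;; exactly five characters, whatever they are.
(define (five-read in)
  (list (read-string 5 in)))

;; Hypothetical helper: run the extension's reader first, then let the
;; standard reader parse the next two data from the same port.
(define (read-after-five str)
  (define in (open-input-string str))
  (define ext (five-read in))
  (list ext (read in) (read in)))
```

(read-after-five "234567 8") returns '(("23456") 7 8), and (read-after-five " 234567 8") returns '((" 2345") 67 8), matching the two programs shown earlier.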
A #reader form can be used in the REPL, too:
> '#reader"five.rkt"abcde
'("abcde")
17.2.1 Source Locations
The difference between read and read-syntax is that read is meant to be used for data, while read-syntax is meant to be used to parse programs. More precisely, a reader extension's read function is used when the enclosing stream is being parsed by Racket's read function, and its read-syntax function is used when the enclosing stream is being parsed by Racket's read-syntax function. Nothing requires read and read-syntax to parse input in the same way, but making them different would confuse programmers and tools.
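Racket's built-in functions make the distinction concrete. The following sketch reads the same text both ways. read yields a plain list, while read-syntax yields a syntax object that records the source name passed to it ('example here, an arbitrary choice) together with a line, column, position, and span. Enabling port-count-lines! on the port is what allows line and column to be recorded in addition to position.

```racket
#lang racket/base

(define text "(+ 1 2)")

;; As data: read produces a plain datum.
(define datum (read (open-input-string text)))

;; As a program: read-syntax produces a syntax object with a location.
(define stx
  (let ([in (open-input-string text)])
    (port-count-lines! in)           ; track line and column, not just position
    (read-syntax 'example in)))      ; 'example is an arbitrary source name
```

Here datum is '(+ 1 2), (syntax->datum stx) is the same list, and stx reports line 1, column 0, position 1, and span 7, the length of "(+ 1 2)".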
The read-syntax function can return the same kind of value as read, but it should normally return a syntax object that connects the parsed expression with source locations. Unlike the "five.rkt" example, the read-syntax function is typically implemented directly to produce syntax objects, and then read can use read-syntax and strip away syntax object wrappers to produce a raw result.
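The pattern described above can be sketched with a deliberately tiny, hypothetical reader that accepts a single run of lowercase letters as a symbol. word-read-syntax records the location with port-next-location and attaches it with datum->syntax, and word-read is defined by stripping the wrapper with syntax->datum. The names and the accepted syntax are illustrative only.

```racket
#lang racket/base

;; Hypothetical reader: one run of lowercase letters becomes a symbol.
(define (word-read-syntax src in)
  ;; line and column are #f unless line counting is enabled on the port
  (define-values (line col pos) (port-next-location in))
  (define m (regexp-match #px"^[a-z]+" in))   ; consumes only the match
  (unless m
    (error 'word-read-syntax "expected a lowercase word"))
  (define str (bytes->string/utf-8 (car m)))
  ;; srcloc vector: source, line, column, position, span
  (datum->syntax #f (string->symbol str)
                 (vector src line col pos (string-length str))))

;; read is derived from read-syntax by discarding the syntax wrapper.
(define (word-read in)
  (syntax->datum (word-read-syntax #f in)))
```

Reading from a port that contains "abc def" and has line counting enabled produces a syntax object for abc at line 1, column 0, position 1, with span 3, and leaves " def" in the port.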
The following "arith.rkt" module implements a reader to parse simple infix arithmetic expressions into Racket forms. For example, 1*2+3 parses into the Racket form (+ (* 1 2) 3). The supported operators are +, -, *, and /, while operands can be unsigned integers or single-letter variables. The implementation uses port-next-location to obtain the current source location, and it uses datum->syntax to turn raw values into syntax objects.
"arith.rkt"
#lang racket
(require syntax/readerr)

(provide read read-syntax)

(define (read in)
  (syntax->datum (read-syntax #f in)))

(define (read-syntax src in)
  (skip-whitespace in)
  (read-arith src in))

(define (skip-whitespace in)
  (regexp-match #px"^\\s*" in))

(define (read-arith src in)
  (define-values (line col pos) (port-next-location in))
  (define expr-match
    (regexp-match
     ; Match an operand followed by any number of
     ; operator–operand sequences, and prohibit an
     ; additional operator from following immediately:
     #px"^([a-z]|[0-9]+)(?:[-+*/]([a-z]|[0-9]+))*(?![-+*/])"
     in))

  (define (to-syntax v delta span-str)
    (datum->syntax #f v (make-srcloc delta span-str)))
  (define (make-srcloc delta span-str)
    (and line
         (vector src line (+ col delta) (+ pos delta)
                 (string-length span-str))))

  (define (parse-expr s delta)
    ; Try the lower-precedence operators first. The greedy (.*)
    ; splits at the rightmost operator, so 1-2-3 groups as
    ; (- (- 1 2) 3) rather than (- 1 (- 2 3)).
    (match (or (regexp-match #rx"^(.*)([+-])(.*)$" s)
               (regexp-match #rx"^(.*)([*/])(.*)$" s))
      [(list _ a-str op-str b-str)
       (define a-len (string-length a-str))
       (define a (parse-expr a-str delta))
       (define b (parse-expr b-str (+ delta 1 a-len)))
       (define op (to-syntax (string->symbol op-str)
                             (+ delta a-len) op-str))
       (to-syntax (list op a b) delta s)]
      [_ (to-syntax (or (string->number s)
                        (string->symbol s))
                    delta s)]))

  (unless expr-match
    (raise-read-error "bad arithmetic syntax"
                      src line col pos
                      (and pos (- (file-position in) pos))))
  (parse-expr (bytes->string/utf-8 (car expr-match)) 0))
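Operator precedence in parse-expr comes from the order of the two regexp-match attempts: the string is split at a + or - if one exists, and at a * or / only otherwise, so multiplication and division end up nested more deeply. The following datum-only sketch isolates that idea without source locations. parse-infix is a hypothetical name, and splitting at the rightmost operator (the greedy (.*)) is a design choice that makes chains such as 1-2-3 associate to the left.

```racket
#lang racket/base
(require racket/match)

;; Hypothetical datum-only version of the precedence-by-splitting idea.
(define (parse-infix s)
  (match (or (regexp-match #rx"^(.*)([+-])(.*)$" s)   ; lowest precedence first
             (regexp-match #rx"^(.*)([*/])(.*)$" s))
    [(list _ a op b)
     ;; greedy (.*) puts the split at the rightmost operator
     (list (string->symbol op) (parse-infix a) (parse-infix b))]
    [_ (or (string->number s) (string->symbol s))]))  ; operand
```

(parse-infix "1*2+3") produces '(+ (* 1 2) 3), and (parse-infix "1-2-3") produces '(- (- 1 2) 3). A non-greedy (.*?) would split at the leftmost operator instead, grouping the same input as (- 1 (- 2 3)).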
If the "arith.rkt" reader is used in an expression position, then its parse result will be treated as a Racket expression. If it is used in a quoted form, however, then it just produces a number or a list:
> #reader"arith.rkt" 1*2+3
5
> '#reader"arith.rkt" 1*2+3
'(+ (* 1 2) 3)
The "arith.rkt" reader could also be used in positions that make no sense. Since the read-syntax implementation tracks source locations, syntax errors can at least refer to parts of the input in terms of their original locations (at the beginning of the error message):
> (let #reader"arith.rkt" 1*2+3 8)
repl:1:27: let: bad syntax (not an identifier and expression for a binding) at: + in: (let (+ (* 1 2) 3) 8)
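Read errors carry locations as well. When "arith.rkt" rejects malformed input, it calls raise-read-error from syntax/readerr, which raises an exn:fail:read whose srclocs field holds the location it was given. The sketch below raises such an error directly with made-up location values and extracts the first srcloc; first-read-error-location is a hypothetical helper.

```racket
#lang racket/base
(require syntax/readerr)

;; Hypothetical helper: run thunk and return the first srcloc of any
;; read error it raises, or #f if it returns normally.
(define (first-read-error-location thunk)
  (with-handlers ([exn:fail:read?
                   (lambda (e) (car (exn:fail:read-srclocs e)))])
    (thunk)
    #f))

;; Arguments: message, source, line, column, position, span (made up).
(define loc
  (first-read-error-location
   (lambda ()
     (raise-read-error "bad arithmetic syntax" 'demo 1 27 28 5))))
```

The resulting srcloc has source 'demo, line 1, column 27, position 28, and span 5, the values passed to raise-read-error.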
17.2.2 Readtables
A reader extension’s ability to parse input characters in an arbitrary way can be powerful, but many cases of lexical extension call for a less general but more composable approach. In much the same way that the expander level of Racket syntax can be extended through macros, the reader level of Racket syntax can be composably extended through a readtable.
The Racket reader is a recursive-descent parser, and the readtable maps characters to parsing handlers. For example, the default readtable maps ( to a handler that recursively parses subforms until it finds a ). The current-readtable parameter determines the readtable that is used by read or read-syntax. Rather than parsing raw characters directly, a reader extension can install an extended readtable and then chain to read or read-syntax.
See Dynamic Binding: parameterize for an introduction to parameters.
The make-readtable function constructs a new readtable as an extension of an existing one. It accepts a sequence of specifications in terms of a character, a type of mapping for the character, and (for certain types of mappings) a parsing procedure. For example, to extend the readtable so that $ can be used to start and end infix expressions, implement a read-dollar function and use:
(make-readtable (current-readtable)
                #\$ 'terminating-macro read-dollar)
The protocol for read-dollar requires the function to accept different numbers of arguments depending on whether it is being used in read or read-syntax mode. In read mode, the parser function is given two arguments: the character that triggered the parser function and the input port that is being read. In read-syntax mode, the function must accept four additional arguments that provide the source location of the character.
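A smaller readtable extension shows the two-arity protocol on its own. In the hypothetical sketch below, ! followed by a datum reads as (not datum), and read-bang is written with case-lambda so that it accepts either two arguments (read mode) or six (read-syntax mode). Using 'non-terminating-macro rather than 'terminating-macro is an assumption chosen so that identifiers such as set!, which contain ! after their first character, still read normally.

```racket
#lang racket/base

;; Hypothetical handler: `!datum` reads as (not datum).
(define read-bang
  (case-lambda
    ;; read mode: the triggering character and the input port
    [(ch in)
     (list 'not (read in))]
    ;; read-syntax mode: also the source, line, column, and position
    [(ch in src line col pos)
     (datum->syntax #f
                    (list 'not (read-syntax src in))
                    (vector src line col pos #f))]))

;; 'non-terminating-macro: ! triggers read-bang only at the start of a
;; token, so a symbol like set! is still read as one symbol.
(define bang-readtable
  (make-readtable (current-readtable)
                  #\! 'non-terminating-macro read-bang))

;; Install the readtable and chain to the standard reader.
(define (read/bang str)
  (parameterize ([current-readtable bang-readtable])
    (read (open-input-string str))))

(define (read-syntax/bang str)
  (parameterize ([current-readtable bang-readtable])
    (read-syntax 'demo (open-input-string str))))
```

(read/bang "(set! x !y)") produces '(set! x (not y)). With 'terminating-macro instead, the ! in set! would end the symbol set and start a new (not ...) form.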
The following "dollar.rkt" module defines a read-dollar function in terms of the read and read-syntax functions provided by "arith.rkt", and it puts read-dollar together with new read and read-syntax functions that install the readtable and chain to Racket’s read or read-syntax:
"dollar.rkt"
#lang racket
(require syntax/readerr
         (prefix-in arith: "arith.rkt"))

(provide (rename-out [$-read read]
                     [$-read-syntax read-syntax]))

(define ($-read in)
  (parameterize ([current-readtable (make-$-readtable)])
    (read in)))

(define ($-read-syntax src in)
  (parameterize ([current-readtable (make-$-readtable)])
    (read-syntax src in)))

(define (make-$-readtable)
  (make-readtable (current-readtable)
                  #\$ 'terminating-macro read-dollar))

(define read-dollar
  (case-lambda
   [(ch in)
    (check-$-after (arith:read in) in (object-name in))]
   [(ch in src line col pos)
    (check-$-after (arith:read-syntax src in) in src)]))

(define (check-$-after val in src)
  (regexp-match #px"^\\s*" in) ; skip whitespace
  (let ([ch (peek-char in)])
    (unless (equal? ch #\$) (bad-ending ch src in))
    (read-char in))
  val)

(define (bad-ending ch src in)
  (let-values ([(line col pos) (port-next-location in)])
    ; At end-of-file the input ended too early, so raise
    ; exn:fail:read:eof; otherwise the next character is wrong.
    ((if (eof-object? ch)
         raise-read-eof-error
         raise-read-error)
     "expected a closing `$'"
     src
     line col pos
     (if (eof-object? ch) 0 1))))
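The choice between raise-read-eof-error and raise-read-error in bad-ending matters to callers: raise-read-eof-error raises exn:fail:read:eof, a subtype of exn:fail:read, which lets a caller tell input that ended too early apart from input that is simply malformed. The sketch below isolates a closing-$ check with the same logic; expect-closing-$ and classify are hypothetical names.

```racket
#lang racket/base
(require syntax/readerr)

;; Hypothetical stand-alone version of the closing-$ check.
(define (expect-closing-$ in src)
  (regexp-match #px"^\\s*" in)                 ; skip whitespace
  (define ch (peek-char in))
  (cond
    [(equal? ch #\$) (read-char in) (void)]    ; consume the closing $
    [else
     (define-values (line col pos) (port-next-location in))
     (if (eof-object? ch)
         (raise-read-eof-error "expected a closing `$'" src line col pos 0)
         (raise-read-error "expected a closing `$'" src line col pos 1))]))

;; Hypothetical helper: report which kind of result the check produces.
;; The eof predicate must come first, since eof errors are also read errors.
(define (classify str)
  (with-handlers ([exn:fail:read:eof? (lambda (e) 'eof)]
                  [exn:fail:read? (lambda (e) 'bad)])
    (expect-closing-$ (open-input-string str) 'demo)
    'ok))
```

(classify " $") returns 'ok, (classify "") returns 'eof, and (classify "1") returns 'bad.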
With this reader extension, a single #reader can be used at the beginning of an expression to enable multiple uses of $ that switch to infix arithmetic:
> #reader"dollar.rkt" (let ([a $1*2+3$] [b $5/6$]) $a+b$)
35/6