4.3 Languages
syntax
(define-language lang-name non-terminal-def ... maybe-binding-spec)
non-terminal-def = (non-terminal-name ...+ ::= pattern ...+) | (non-terminal-name pattern ...+) | ((non-terminal-name ...+) pattern ...+) maybe-binding-spec =
| #:binding-forms binding-pattern ... binding-pattern = pattern | binding-pattern #:exports beta | binding-pattern #:refers-to beta | binding-pattern #:...bind (id beta beta) beta = nothing | symbol | (shadow beta-sequence ...) beta-sequence = beta | ... ; literal ellipsis
A non-terminal-def comprises one or more non-terminal names (considered aliases) followed by one or more productions.
(define-language lc-lang (e ::= (e e ...) x (λ (x ...) e)) (v ::= (λ (x ...) e)) (E ::= (v ... E e ...) hole) (x y ::= variable-not-otherwise-mentioned))
It has non-terminals: e for the expression language, x and y for variables, v for values, and E for the evaluation contexts.
Non-terminals used in define-language are not bound in side-condition patterns. Duplicate non-terminals that appear outside of the binding-forms section are not constrained to be the same unless they have underscores in them.
4.3.1 Binding Forms
Typical languages provide a mechanism for the programmer to introduce new names and give them meaning. The language forms used for this (such as Racket’s let and λ) are called binding forms.
Binding forms require special treatment from the language implementer. In Redex, this treatment consists of declaring the binding forms at the time of language definition. Explicitly declaring binding forms makes safely manipulating terms containing binding simpler and easier, eliminating the need to write operations that (explicitly) respect the binding structure of the language.
When maybe-binding-spec is provided, it declares binding specifications for certain forms in the language. The binding-pattern specification is an extension of Redex’s pattern language, allowing the keywords #:refers-to, #:exports, and #:...binds to appear nested inside a binding pattern.
The language, lc-lang, above does not declare any binding specifications, despite the clear intention of λ as a binding form. To understand the consequences of not specifying any binding forms, consider the behavior of substitution on terms of lc-lang.
Passing the #:lang argument to term allows the substitute metafunction to determine the language of its arguments.
> (term (substitute (x (λ (x) (λ (y) x))) x (y y)) #:lang lc-lang) '((y y) (λ ((y y)) (λ (y) (y y))))
This call is intended to replace all free occurrences of x with (y y) in the first argument to substitute. But, because lc-lang is missing a binding forms declaration, substitute replaces all instances of x with (y y) in the term (x (λ (x) (λ (y) x))). Note that even the x that appears in what is normally a binding position has been replaced, resulting in an ill-formed lambda expression.
In order to have substitute behave correctly when substituting over terms that contain bound variables, the language lc-lang must declare its binding specification. Consider the following simplification of the lc-lang definition, this time with a binding form declaration for λ.
(define-language lc-bind (e ::= (e e) x (λ (x) e)) (v ::= (λ (x) e)) (x y ::= variable-not-otherwise-mentioned) #:binding-forms (λ (x) e #:refers-to x))
Just like Racket’s λ, in lc-bind all instances of the argument variable in the body of the lambda refer to the argument. In a binding declaration, this is specified using the #:refers-to keyword. Now the previous example has the right behavior.
> (term (substitute (x (λ (x) (λ (y) x))) x (y y)) #:lang lc-bind) '((y y) (λ (x«0») (λ (y«1») x«0»)))
Note that sometimes substitute changes the names of the bound identifiers, in this case replacing the x and y with identifiers that have « and » in their names.
The #:refers-to declaration says that, in a λ term, the e subterm has the name from the x subterm in scope.
4.3.2 Multiple Variables in a Single Scope
To generalize to the version of λ in lc-lang, we need to cope with multiple variables at once. And in order to do that, we must handle the situation where some of the names are the same. Redex’s binding support offers only one option for this, namely taking the variables in order, using the keyword shadow. It also allows us to specify the binding structure for let:
(define-language lc-bind+let (e ::= x number (λ (x ...) e) (e e) (let ([x e] ...) e)) (x ::= variable-not-otherwise-mentioned) #:binding-forms (λ (x ...) e #:refers-to (shadow x ...)) (let ([x e_x] ...) e_body #:refers-to (shadow x ...)))
This #:binding-forms declaration says that the subterm e of the λ expression refers to all of the binders in λ. Similarly, the e_body refers to all of the binders in the let expression.
> (term (substitute (let ([x 5] [y x]) (y x)) x z) #:lang lc-bind+let) '(let ((x«2» 5) (y«3» z)) (y«3» x«2»))
The intuition behind the name of the shadow form can be seen in the following example:
> (term (substitute (let ([x 1] [y x] [x 3]) x) x z) #:lang lc-bind+let) '(let ((x«4» 1) (y«5» z) (x«6» 3)) x«4»)
Because the lc-bind+let language does not require that all binders in its let form be distinct from one another, the binding forms specification must declare what happens when there is a conflict. The shadow form specifies that duplicate binders will be shadowed by earlier binders in its list of arguments. (Of course, if we were interested in modelling Racket’s let form, we’d want that term to be malformed syntax.)
It is possible to have multiple uses of #:refers-to in a single binding specification. For example, consider a language with a letrec form.
(define-language lc-bind+letrec (e ::= x number (λ (x ...) e) (e e) (let ([x e] ...) e) (letrec ([x e] ...) e)) (x ::= variable-not-otherwise-mentioned) #:binding-forms (λ (x ...) e #:refers-to (shadow x ...)) (let ([x e_x] ...) e_body #:refers-to (shadow x ...)) (letrec ([x e_x] ...) #:refers-to (shadow x ...) e_body #:refers-to (shadow x ...)))
In this binding specification the subterms corresponding to both ([x e_x] ...) and e_body refer to the bound variables (shadow x ...).
> (term (substitute (letrec ([x x]) x) x y) #:lang lc-bind+letrec) '(letrec ((x«7» x«7»)) x«7»)
> (term (substitute (letrec ([x (λ (a) (y a))] [y (λ (b) (z b))] [z a]) (x 7)) a (λ (x) 5)) #:lang lc-bind+letrec)
'(letrec ((x«8» (λ (a«11») (y«9» a«11»)))
(y«9» (λ (b«12») (z«10» b«12»)))
(z«10» (λ (x) 5)))
(x«8» 7))
4.3.3 Ellipses in Binding Forms
Some care must be taken when writing binding specifications that match patterns with ellipses. If a pattern symbol is matched underneath ellipses, it may only be mentioned under the same number of ellipses. Consider, for example, a language with Racket’s let-values binding form.
(define-language lc-bind+values (e ::= x number (λ (x ...) e) (e e) (values e ...) (let-values ([(x ...) e] ...) e)) (x ::= variable-not-otherwise-mentioned) #:binding-forms (λ (x ...) e #:refers-to (shadow x ...)) (let-values ([(x ...) e_x0] ...) e_body #:refers-to (shadow (shadow x ...) ...)))
In the binding specification for the let-values form, the bound variable, x, occurs only under a single ellipsis, thus when it is mentioned in a #:refers-to clause it is restricted to be mentioned only underneath a single ellipsis. Therefore the body of the let-values form must refer to (shadow (shadow x ...) ...) rather than (shadow x ... ...).
4.3.4 Compound Forms with Binders
So far, the nonterminals mentioned in #:refers-to have always stood directly for variables that appear in the terms. But sometimes the variables are down inside some piece of the term, or only some of the variables are relevant. The #:exports clause can be used to handle such situations.
When a binding form with an #:exports clause is mentioned, the names brought into scope are determined by recursively examining everything mentioned by that #:exports clause. Consider the following version of the lc-bind language with lists that allows for pattern matching in binding positions.
(define-language lc-bind+patterns (e ::= x number (λ (p) e) (e e) (list e ...)) (x ::= variable-not-otherwise-mentioned) (p ::= (listp p ...) x) #:binding-forms (λ (p) e #:refers-to p) (listp p ...) #:exports (shadow p ...))
In this language functions accept patterns as arguments, therefore all variables mentioned in a pattern in binding position should be bound in the body of the function. A call to the substitute metafunction shows this behavior.
> (term (substitute (x (λ ((listp w (listp x y) z)) (list z y x w))) x u) #:lang lc-bind+patterns)
'(u
(λ ((listp w«13» (listp x«14» y«15») z«16»)) (list z«16» y«15» x«14» w«13»)))
The use of the #:exports clause in the binding specification for lc-bind+patterns allows the use of nested binding patterns seen in the example. More precisely, each p may itself be a pattern that mentions any number of bound variables.
4.3.5 Binding Repetitions
In some situations, the #:exports and #:refers-to keywords are not sufficiently expressive to be able to describe the binding structure of different parts of a repeated sequence relate to each other. For example, consider the let* form. Its shape is the same as let, namely (let* ([x e] ...) e), but the binding structure is different.
In a let* form, each variable is accessible to each of the es that follow it, with all of the variables available in the body (the final e). With #:exports, we can build an expression form that has a structure like that, but we must write syntax that nests differently than let*.
(define-language lc-bind+awkward-let* (e ::= (let*-awk c e) natural x (+ e ...)) (x ::= variable-not-otherwise-mentioned) (c ::= (clause x e c) ()) #:binding-forms (let*-awk c e #:refers-to c) (clause x e c #:refers-to x) #:exports (shadow x c))
The let*-awk form binds like Racket’s let*, with each clause’s variable being active for the subsequent ones, but the syntax is different with extra nesting inside the clauses:
> (term (substitute (let*-awk (clause x y (clause y x ())) (+ x y z)) x 1) #:lang lc-bind+awkward-let*) '(let*-awk (clause x«17» y (clause y«18» x«17» ())) (+ x«17» y«18» z))
> (term (substitute (let*-awk (clause x y (clause y x ())) (+ x y z)) y 2) #:lang lc-bind+awkward-let*) '(let*-awk (clause x«19» 2 (clause y«20» x«19» ())) (+ x«19» y«20» z))
In order to get the same syntax as Racket’s let*, we need to use the #:...bind binding pattern annotation. A #:...bind can appear wherever a ... might appear, and it has the same function, namely indicating a repetition of the preceding pattern. In addition, however it comes with three extra pieces that follow the #:...bind form that describe how the binding structure inside the repetition is handled. The first part is a name that can be used by a #:refers-to outside of the repetition to indicate all of the exported variables of the sequence. The middle piece indicates the variables from a specific repetition of the ellipsis are exported to all subsequent repetitions of the ellipsis. The last piece is a beta that moves backwards through the sequence, indicating what is exported from the last repetition of the sequence to the one before, from the one before to the one before that, and then finally from the first one to the export of the entire sequence (as named by the identifier in the first position).
So, in this example, we use #:...bind to express the scope of let*.
(define-language lc-bind+let* (e ::= (let* ([x e] ...) e) natural x (+ e ...)) (x ::= variable-not-otherwise-mentioned) #:binding-forms (let* ([x e] #:...bind (clauses x (shadow clauses x))) e_body #:refers-to clauses))
It says that the name of the exported variables from the entire sequence is clauses, which means that all of the variable exported from the sequence in the second position of the let* bind variables in the body (thanks to the last #:refers-to in the example). The x in the second position following the #:...bind says that x is in scope for each of the subsequent [x e] elements of the sequence. The final (shadow clauses x) says that the variables in a subsequent clauses are exported by the current one, as well as x, which then is exported by the entire sequence.
> (term (substitute (let* ([x y] [y x]) (+ x y z)) x 1) #:lang lc-bind+let*) '(let* ((x«22» y) (y«21» x«22»)) (+ x«22» y«21» z))
> (term (substitute (let* ([x y] [y x]) (+ x y z)) y 2) #:lang lc-bind+let*) '(let* ((x«24» 2) (y«23» x«24»)) (+ x«24» y«23» z))
syntax
syntax
syntax
syntax
(define-extended-language extended-lang base-lang non-terminal-def ... maybe-binding-spec)
non-terminal-def = (non-terminal-name ...+ ::= pattern ...+) | (non-terminal-name pattern ...+) | ((non-terminal-name ...+) pattern ...+) maybe-binding-spec =
| #:binding-forms binding-declaration ...
(define-extended-language lc-num-lang lc-lang (e ::= .... ; extend the previous `e' non-terminal number +) (v ::= .... ; extend the previous `v' non-terminal number +))
extends lc-lang with two new alternatives (+ and number) for the v non-terminal, carries forward the e, E, x, and y non-terminals. Note that the meaning of variable-not-otherwise-mentioned adapts to the language where it is used, so in this case it is equivalent to (variable-except λ +) because λ and + are used as literals in this language.
The four-period ellipses indicates that the new language’s non-terminal has all of the alternatives from the original language’s non-terminal, as well as any new ones. If a non-terminal occurs in both the base language and the extension, the extension’s non-terminal replaces the originals. If a non-terminal only occurs in the base language, then it is carried forward into the extension. And, of course, define-extended-language lets you add new non-terminals to the language.
If a language has a group of multiple non-terminals defined together, extending any one of those non-terminals extends all of them.
syntax
(define-union-language L base/prefix-lang ...)
base/prefix-lang = lang-id | (prefix lang-id)
(define-language L1 (e ::= (+ e e) number)) (define-language L2 (e ::= (if e e e) true false)) (define-union-language L1-plus-L2 L1 L2)
If a language has a prefix, then all of the non-terminals from that language have the corresponding prefix in the union language. The prefix helps avoid unintended collisions between the constituent language’s non-terminals.
(define-language UT (e (e e) (λ (x) e) x)) (define-language WT (e (e e) (λ (x t) e) x) (t (→ t t) num))
(define-union-language B (ut. UT) (wt. WT))
procedure
(make-immutable-binding-hash lang [assocs]) → dict?
lang : compiled-lang? assocs : (listof pair?) = '()
Added in version 1.14 of package redex-lib.
procedure
(make-binding-hash lang [assocs]) → dict?
lang : compiled-lang? assocs : (listof pair?) = '()
Added in version 1.14 of package redex-lib.
procedure
(language-nts lang) → (listof symbol?)
lang : compiled-lang?
procedure
(compiled-lang? l) → boolean?
l : any/c
parameter
(default-language lang) → void? lang : (or/c false/c compiled-lang?)
The default-language parameter is set to the appropriate language inside judgment forms and metafunctions, and by apply-reduction-relation.
procedure
(alpha-equivalent? lang lhs rhs) → boolean?
lang : compiled-lang? lhs : any/c rhs : any/c (alpha-equivalent? lhs rhs) → boolean? lhs : any/c rhs : any/c
If the lang argument is not supplied, it defaults to the value of (default-language), which must not #f.
metafunction
(substitute val old-var new-val)
(substitute val (old-var new-val) ...)
> (define-language lc-bind (e ::= (e e) x (λ (x ...) e)) (x ::= variable-not-otherwise-mentioned) #:binding-forms (λ (x ...) e #:refers-to (shadow x ...)))
> (define-metafunction lc-bind β-reduce : (λ (x ..._1) e) e ..._1 -> e [(β-reduce (λ (x ...) e) e_x ...) (substitute e [x e_x] ...)]) > (term (β-reduce (λ (x y) (x y)) y z)) '(y z)
Note that substitute is merely a convenience metafunction. Any manually-written substitution in the correct language will also be capture-avoiding, provided that the language’s binding forms are correctly defined. However, substitute may be significantly faster.
Changed in version 1.19 of package redex-lib: Added support for simultaneous substitutions