6 Text Preprocessing
The scribble/text language provides everything from racket/base with a few changes that make it suitable as a preprocessor language:
It uses read-syntax-inside to read the body of the
module, similar to the Document Reader. This means that, by
default, all text is read in as Racket strings, and
@-forms can be used to call Racket functions and to
escape to Racket expressions.
Values of expressions are printed with a custom
output function. This function displays most values
in a similar way to display, except that it is more
convenient for preprocessor output.
6.1 Writing Preprocessor Files
The combination of the two features means that text in a
scribble/text file is read as strings, which get printed
out when the module is required; for example, when the file is
given as an argument to racket. (In these examples, the left
part shows the source input, and the right part the printed result.)
| #lang scribble/text | Programming languages should | be designed not by piling | feature on top of feature, but | blah blah blah. |
|
| → | Programming languages should | be designed not by piling | feature on top of feature, but | blah blah blah. |
|
Using @-forms, we can define and use Racket
functions.
| #lang scribble/text | @(require racket/list) | @(define Foo "Preprocessing") | @(define (3x . x) | ;; racket syntax here | (add-between (list x x x) " ")) | @Foo languages should | be designed not by piling | feature on top of feature, but | @3x{blah}. |
|
| → | Preprocessing languages should | be designed not by piling | feature on top of feature, but | blah blah blah. |
|
As demonstrated in this case, the output function simply
scans nested list structures recursively, which makes lists convenient
as function results. In addition, output prints most values
similarly to display; notable exceptions are void and
false values, which cause no output to appear. This can be used for
convenient conditional output.
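To watch the output function at work outside a preprocessor file, the sketch below calls output from a plain racket/base module and captures the text in a string port. It assumes a Racket installation that includes Scribble; render-text and errors are hypothetical names used only for illustration.

```racket
#lang racket/base
;; Sketch: assumes the Scribble library is installed (it ships with the
;; full Racket distribution). render-text and errors are hypothetical.
(require (only-in scribble/text output))

;; Run output on a value and return the printed text as a string.
(define (render-text v)
  (define port (open-output-string))
  (output v port)
  (get-output-string port))

;; As in the @errors example: (and ...) yields #f when n is 1.
(define (errors n)
  (list n " error" (and (not (= n 1)) "s")))

;; Nested lists are walked recursively; #f and (void) print nothing.
(define sentence
  (list "You have " (errors 3) " in your code, "
        (void)
        "I fixed " (errors 1) "."))

(displayln (render-text sentence))
;; prints: You have 3 errors in your code, I fixed 1 error.
```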
| #lang scribble/text | @(define (errors n) | (list n | " error" | (and (not (= n 1)) "s"))) | You have @errors[3] in your code, | I fixed @errors[1]. |
|
| → | You have 3 errors in your code, | I fixed 1 error. |
|
Using the scribble @-forms syntax, you can write
functions more conveniently too.
| #lang scribble/text | @(define (errors n) | ;; note the use of `unless' | @list{@n error@unless[(= n 1)]{s}}) | You have @errors[3] in your code, | I fixed @errors[1]. |
|
| → | You have 3 errors in your code, | I fixed 1 error. |
|
If you follow the details of the scribble reader, you may notice that in
these examples there are newline strings after each definition, yet
they do not show in the output. To make it easier to write
definitions, newlines after definitions and indentation spaces before
them are ignored.
| #lang scribble/text |
| @(define (plural n) | (unless (= n 1) "s")) |
| @(define (errors n) | @list{@n error@plural[n]}) |
| You have @errors[3] in your code, | @(define fixed 1) | I fixed @errors[fixed]. |
|
| → | You have 3 errors in your code, | I fixed 1 error. |
|
These end-of-line newline strings are not ignored when they follow
other kinds of expressions, which may lead to redundant empty lines in
the output.
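A minimal sketch of why the empty line appears, assuming Scribble is installed and using hypothetical names (render-text, mississippi, page): the value below approximates what output receives for the @count example. Each generated item already ends with "\n", and the line break in the source after @count[3]{Mississippi} contributes another one.

```racket
#lang racket/base
;; Sketch: assumes Scribble is installed. render-text, mississippi, and
;; page are hypothetical names; page approximates the @count example.
(require (only-in scribble/text output))

(define (render-text v)
  (define port (open-output-string))
  (output v port)
  (get-output-string port))

;; Each generated item ends with its own "\n", like @list{@i @str,@"\n"}.
(define (mississippi n str)
  (for/list ([i (in-range 1 (add1 n))])
    (list i " " str "," "\n")))

(define page
  (list "Start..." "\n"
        (mississippi 3 "Mississippi")
        "\n" ; the line break that follows @count[3]{Mississippi} in the source
        "... and I'm done."))

(display (render-text page))
;; The two consecutive "\n" strings produce the empty line.
```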
| #lang scribble/text | @(define (count n str) | (for/list ([i (in-range 1 (add1 n))]) | @list{@i @str,@"\n"})) | Start... | @count[3]{Mississippi} | ... and I'm done. |
|
| → | Start... | 1 Mississippi, | 2 Mississippi, | 3 Mississippi, |
| ... and I'm done. |
|
There are several ways to avoid having such empty lines in your
output. The simplest way is to arrange for the function call’s form
to end right before the next line begins, but this is often
inconvenient. An alternative is to use a @; comment, which
makes the scribble reader ignore everything that follows it up to and
including the newline. (These methods can be applied to the line that
precedes the function call too, but the results are likely to have
what looks like erroneous indentation. More about this below.)
| #lang scribble/text | @(define (count n str) | (for/list ([i (in-range 1 (+ n 1))]) | @list{@i @str,@"\n"})) | Start... | @count[3]{Mississippi | }... done once. |
| Start again... | @count[3]{Massachusetts}@; | ... and I'm done again. |
|
| → | Start... | 1 Mississippi, | 2 Mississippi, | 3 Mississippi, | ... done once. |
| Start again... | 1 Massachusetts, | 2 Massachusetts, | 3 Massachusetts, | ... and I'm done again. |
|
A better approach is to generate newlines only when needed.
| #lang scribble/text | @(require racket/list) | @(define (counts n str) | (add-between | (for/list ([i (in-range 1 (+ n 1))]) | @list{@i @str,}) | "\n")) | Start... | @counts[3]{Mississippi} | ... and I'm done. |
|
| → | Start... | 1 Mississippi, | 2 Mississippi, | 3 Mississippi, | ... and I'm done. |
|
In fact, this is common enough that the scribble/text
language provides a convenient facility: add-newlines is a
function similar to add-between that uses a newline
string as the default separator, except that false and void values are
filtered out before the separators are inserted.
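The difference from add-between shows up as soon as some items are false. This sketch assumes Scribble is installed and uses hypothetical names (render-text, items); add-between comes from racket/list.

```racket
#lang racket/base
;; Sketch: assumes Scribble is installed. render-text and items are
;; hypothetical; add-between comes from racket/list.
(require (only-in scribble/text output add-newlines)
         (only-in racket/list add-between))

(define (render-text v)
  (define port (open-output-string))
  (output v port)
  (get-output-string port))

;; Odd-numbered entries become #f, as in the (and (even? i) ...) example.
(define items
  (for/list ([i (in-range 1 5)])
    (and (even? i) (list i " Mississippi,"))))

;; add-between keeps the #f entries, so separators surround them and
;; an empty first line and an empty middle line appear.
(write (render-text (add-between items "\n")))
(newline)
;; add-newlines drops #f and void before inserting separators.
(write (render-text (add-newlines items)))
(newline)
```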
| #lang scribble/text | @(define (count n str) | (add-newlines | (for/list ([i (in-range 1 (+ n 1))]) | @list{@i @str,}))) | Start... | @count[3]{Mississippi} | ... and I'm done. |
|
| → | Start... | 1 Mississippi, | 2 Mississippi, | 3 Mississippi, | ... and I'm done. |
|
| #lang scribble/text | @(define (count n str) | (add-newlines | (for/list ([i (in-range 1 (+ n 1))]) | @(and (even? i) @list{@i @str,})))) | Start... | @count[6]{Mississippi} | ... and I'm done. |
|
| → | Start... | 2 Mississippi, | 4 Mississippi, | 6 Mississippi, | ... and I'm done. |
|
The separator can be set to any value with the #:sep keyword argument.
| #lang scribble/text | @(define (count n str) | (add-newlines #:sep ",\n" | (for/list ([i (in-range 1 (+ n 1))]) | @list{@i @str}))) | Start... | @count[3]{Mississippi}. | ... and I'm done. |
|
| → | Start... | 1 Mississippi, | 2 Mississippi, | 3 Mississippi. | ... and I'm done. |
|
6.2 Defining Functions and More
(Note: most of the tips in this section are applicable to any code
that uses the Scribble @-form syntax.)
Because the Scribble reader is uniform, you can use it in place of any
expression where it is more convenient. (By convention, we use a
plain S-expression syntax when we want a Racket expression escape, and
an @-form for expressions that render as text, which, in the
scribble/text language, is any value-producing expression.)
For example, you can use an @-form for a function that you define.
| #lang scribble/text | @(define @bold[text] @list{*@|text|*}) | An @bold{important} note. |
|
| → | An *important* note. |
This is not commonly done, since most functions that operate with text
will need to accept a variable number of arguments. In fact, this
leads to a common problem: what if we want to write a function that
consumes a number of “text arguments” rather than a single
“rest-like” body? The common solution for this is to provide the
separate text arguments in the S-expression part of an @-form.
| #lang scribble/text | @(define (choose 1st 2nd) | @list{Either @1st, or @|2nd|@"."}) | @(define who "us") | @choose[@list{you're with @who} | @list{against @who}] |
|
| → | Either you're with us, or against us. |
|
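Written without @-syntax, the call above is an ordinary two-argument application in which each text argument is a list. This sketch spells that out in a plain racket/base module; it assumes Scribble is installed, and render-text is a hypothetical helper.

```racket
#lang racket/base
;; Sketch: assumes Scribble is installed; render-text is hypothetical.
(require (only-in scribble/text output))

(define (render-text v)
  (define port (open-output-string))
  (output v port)
  (get-output-string port))

;; Roughly what @list{Either @1st, or @|2nd|@"."} reads as.
(define (choose 1st 2nd)
  (list "Either " 1st ", or " 2nd "."))

(define who "us")

;; Roughly what @choose[@list{you're with @who} @list{against @who}] reads as.
(displayln (render-text (choose (list "you're with " who)
                                (list "against " who))))
```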
You can even use @-forms with a Racket quote or quasiquote as the
“head” part to make it shorter, or use a macro to get grouping of
sub-parts without dealing with quotes.
| #lang scribble/text | @(define (choose 1st 2nd) | @list{Either @1st, or @|2nd|@"."}) | @(define who "us") | @choose[@list{you're with @who} | @list{against @who}] | @(define-syntax-rule (compare (x ...) ...) | (add-newlines | (list (list "* " x ...) ...))) | Shopping list: | @compare[@{apples} | @{oranges} | @{@(* 2 3) bananas}] |
|
| → | Either you're with us, or against us. | Shopping list: | * apples | * oranges | * 6 bananas |
|
Yet another solution is to look at the text values and split the input
arguments based on a specific token. Using match can make this
convenient; you can even specify the patterns with @-forms.
| #lang scribble/text | @(require racket/match) | @(define (features . text) | (match text | [@list{@|1st|@... | --- | @|2nd|@...} | @list{>> Pros << | @1st; | >> Cons << | @|2nd|.}])) | @features{fast, | reliable | --- | expensive, | ugly} |
|
| → | >> Pros << | fast, | reliable; | >> Cons << | expensive, | ugly. |
|
In particular, it is often convenient to split the input by lines,
identified by delimiting "\n" strings. Since this can be
useful, a split-lines function is provided.
| #lang scribble/text | @(require racket/list) | @(define (features . text) | (add-between (split-lines text) | ", ")) | @features{red | fast | reliable}. |
|
| → | red, fast, reliable. |
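The body of @features{...} arrives as a flat list with "\n" strings marking the line breaks, and split-lines regroups it into one list per line. A minimal sketch, assuming Scribble is installed; render-text and body are hypothetical names.

```racket
#lang racket/base
;; Sketch: assumes Scribble is installed; render-text and body are
;; hypothetical. add-between comes from racket/list.
(require (only-in scribble/text output split-lines)
         (only-in racket/list add-between))

(define (render-text v)
  (define port (open-output-string))
  (output v port)
  (get-output-string port))

;; Roughly the rest argument that @features{...} receives when its body
;; spans three lines: "\n" strings mark the line breaks.
(define body (list "red" "\n" "fast" "\n" "reliable"))

;; split-lines groups the items into one sublist per line.
(define lines (split-lines body))

(displayln (render-text (add-between lines ", ")))
```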
Finally, the Scribble reader accepts any expression as the head
part of an @-form, even another @-form. This makes it possible to
get a number of text bodies by defining a curried function, where each
step accepts any number of arguments. This, however, means that the
number of body expressions must be fixed.
| #lang scribble/text | @(define ((choose . 1st) . 2nd) | @list{Either you're @1st, or @|2nd|.}) | @(define who "me") | @@choose{with @who}{against @who} |
|
| → | Either you're with me, or against me. |
|
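Since @@choose{with @who}{against @who} reads as the nested application ((choose "with " who) "against " who), the curried definition can also be exercised directly. A sketch under the assumption that Scribble is installed; render-text is a hypothetical helper.

```racket
#lang racket/base
;; Sketch: assumes Scribble is installed; render-text is hypothetical.
(require (only-in scribble/text output))

(define (render-text v)
  (define port (open-output-string))
  (output v port)
  (get-output-string port))

;; Curried definition: the first call collects the first body,
;; the second call collects the second body.
(define ((choose . 1st) . 2nd)
  (list "Either you're " 1st ", or " 2nd "."))

(define who "me")

;; @@choose{with @who}{against @who} reads as this nested application.
(displayln (render-text ((choose "with " who) "against " who)))
```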
6.3 Using Printouts
Because the preprocessor language simply displays each toplevel value
as the file is run, it is possible to print text directly as part of
the output.
| #lang scribble/text | First | @display{Second} | Third |
|
| → | First | Second | Third |
Taking this further, it is possible to write functions that output
some text instead of returning values that represent the text.
| #lang scribble/text | @(define (count n) | (for ([i (in-range 1 (+ n 1))]) | (printf "~a Mississippi,\n" i))) | Start... | @count[3]@; avoid an empty line | ... and I'm done. |
|
| → | Start... | 1 Mississippi, | 2 Mississippi, | 3 Mississippi, | ... and I'm done. |
|
This can be used to produce a lot of output text, even an infinite amount.
| #lang scribble/text | @(define (count n) | (printf "~a Mississippi,\n" n) | (count (add1 n))) | Start... | @count[1] | this line is never printed! |
|
| → | Start... | 1 Mississippi, | 2 Mississippi, | 3 Mississippi, | 4 Mississippi, | 5 Mississippi, | ... |
|
However, you should be careful not to mix returning values with
printouts, as the results are rarely desirable.
| #lang scribble/text | @list{1 @display{two} 3} |
|
| → | two1  3 |
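The ordering follows from Racket's evaluation order: the arguments to list, including the display call, run before output ever sees the list. The sketch below reproduces this in a plain racket/base module by sending both kinds of printing to one string port; it assumes Scribble is installed, and render-mixed is a hypothetical name.

```racket
#lang racket/base
;; Sketch: assumes Scribble is installed; render-mixed is hypothetical.
(require (only-in scribble/text output))

;; Collect direct printouts and output's text in one string port.
(define (render-mixed)
  (define port (open-output-string))
  (parameterize ([current-output-port port])
    ;; (display "two") runs while the list is being built, before
    ;; output is called; it returns (void), which output then skips.
    (output (list "1 " (display "two") " 3")))
  (get-output-string port))

(write (render-mixed)) ; prints "two1  3"
(newline)
```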
Note that you don’t need side-effects if you want infinite output.
The output function iterates thunks and (composable)
promises, so you can create a loop that is delayed in either form.
| #lang scribble/text | @(define (count n) | (cons @list{@n Mississippi,@"\n"} | (lambda () | (count (add1 n))))) | Start... | @count[1] | this line is never printed! |
|
| → | Start... | 1 Mississippi, | 2 Mississippi, | 3 Mississippi, | 4 Mississippi, | 5 Mississippi, | ... |
|
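The same delayed structure can be given a limit, which makes it easy to check that output calls each thunk only when it reaches it. A sketch assuming Scribble is installed; render-text, count-from, and the calls counter are hypothetical, and the limit is added here only so the output terminates.

```racket
#lang racket/base
;; Sketch: assumes Scribble is installed. render-text, count-from, and
;; calls are hypothetical; a limit is added so the output terminates.
(require (only-in scribble/text output))

(define (render-text v)
  (define port (open-output-string))
  (output v port)
  (get-output-string port))

(define calls 0) ; how many times count-from has run

;; Each step holds one line plus a thunk for the remaining steps.
(define (count-from n limit)
  (set! calls (add1 calls))
  (if (> n limit)
      '()
      (list (list n " Mississippi," "\n")
            (lambda () (count-from (add1 n) limit)))))

(define steps (count-from 1 3))
(printf "steps built before output: ~a\n" calls) ; 1
(display (render-text steps))
(printf "steps built after output: ~a\n" calls)  ; 4
```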
6.4 Indentation in Preprocessed Output
An issue that can be very important in many preprocessor applications
is the indentation of the output. This can be crucial in some cases,
if you’re generating code for an indentation-sensitive language (e.g.,
Haskell, Python, or C preprocessor directives). To get a better
understanding of how the pieces interact, you may want to review the
Scribble reader section, but also remember that you can use quoted
forms to see how some form is read.
| #lang scribble/text | @(format "~s" '@list{ | a | b | c}) |
|
| → | (list "a" "\n" " " "b" "\n" "c") |
|
The Scribble reader ignores indentation spaces in its body. This is
an intentional feature, since you usually do not want an expression to
depend on its position in the source. The question, then, is how
to render output text with proper indentation. The
output function achieves that by assigning a special meaning
to lists: when a newline is part of a list’s contents, it causes the
following text to appear with indentation that corresponds to the
column position at the beginning of the list. In most cases, this
makes the output appear “as intended” when lists are used for nested
pieces of text, whether the list comes from a literal list expression,
from an expression that evaluates to a list, or from a list passed on
as a value; whether it is a toplevel expression or a nested value; and
whether it appears after spaces or after other output.
| #lang scribble/text | foo @list{1 | 2 | 3} |
|
| → | foo 1 | 2 | 3 |
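The same column rule can be checked from a racket/base module by spelling out the lists the reader would produce, including the separate string of indentation spaces it keeps inside a body. A sketch assuming Scribble is installed; render-text and wrap-block are hypothetical names.

```racket
#lang racket/base
;; Sketch: assumes Scribble is installed; render-text and wrap-block are
;; hypothetical. The lists mirror what the reader produces, including
;; the separate "  " string it keeps for relative indentation.
(require (only-in scribble/text output))

(define (render-text v)
  (define port (open-output-string))
  (output v port)
  (get-output-string port))

;; foo @list{1 <newline> 2 <newline> 3}: the inner list starts at column 4.
(displayln (render-text (list "foo " (list "1" "\n" "2" "\n" "3"))))

;; Like the block example: the body list starts at column 2, so every
;; line inside it, including a nested block, is shifted by two spaces.
(define (wrap-block . text)
  (list "begin" "\n" "  " text "\n" "end"))

(displayln (render-text (wrap-block "a" "\n" (wrap-block "b"))))
```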
| #lang scribble/text | @(define (block . text) | @list{begin | @text | end}) | @block{first | second | @block{ | third | fourth} | last} |
|
| → | begin | first | second | begin | third | fourth | end | last | end |
|
| #lang scribble/text | @(define (enumerate . items) | (add-newlines #:sep ";\n" | (for/list ([i (in-naturals 1)] | [item (in-list items)]) | @list{@|i|. @item}))) | Todo: @enumerate[@list{Install Racket} | @list{Hack, hack, hack} | @list{Profit}]. |
|
| → | Todo: 1. Install Racket; | 2. Hack, hack, hack; | 3. Profit. |
|
There are, however, cases when you need more refined control over the
output. The scribble/text language provides a few functions for such
cases. The splice function is used to group together a
number of values but avoid introducing a new indentation context.
| #lang scribble/text | @(define (block . text) | @splice{{ | blah(@text); | }}) | start | @splice{foo(); | loop:} | @list{if (something) @block{one, | two}} | end |
|
| → | start | foo(); | loop: | if (something) { | blah(one, | two); | } | end |
|
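A reduced version of the difference: the same three values, once wrapped in list and once in splice. A sketch assuming Scribble is installed; render-text is a hypothetical helper.

```racket
#lang racket/base
;; Sketch: assumes Scribble is installed; render-text is hypothetical.
(require (only-in scribble/text output splice))

(define (render-text v)
  (define port (open-output-string))
  (output v port)
  (get-output-string port))

;; A list opens a new indentation context at its starting column (4) ...
(displayln (render-text (list "foo " (list "1" "\n" "2"))))
;; ... while splice keeps the enclosing context, which starts at column 0.
(displayln (render-text (list "foo " (splice "1" "\n" "2"))))
```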
The disable-prefix function disables all indentation
printouts in its contents, including the indentation before the body
of the disable-prefix value itself. It is useful, for
example, to print out CPP directives.
| #lang scribble/text | @(define (((IFFOO . var) . expr1) . expr2) | (define (array e1 e2) | @list{[@e1, | @e2]}) | @list{var @var; | @disable-prefix{#ifdef FOO} | @var = @array[expr1 expr2]; | @disable-prefix{#else} | @var = @array[expr2 expr1]; | @disable-prefix{#endif}}) |
| function blah(something, something_else) { | @disable-prefix{#include "stuff.inc"} | @@@IFFOO{i}{something}{something_else} | } |
|
| → | function blah(something, something_else) { | #include "stuff.inc" | var i; | #ifdef FOO | i = [something, | something_else]; | #else | i = [something_else, | something]; | #endif | } |
|
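A small sketch of the same behavior: an indented body whose directive lines stay at column 0 while the surrounding lines keep their indentation. It assumes Scribble is installed; render-text and code are hypothetical names.

```racket
#lang racket/base
;; Sketch: assumes Scribble is installed; render-text and code are
;; hypothetical names.
(require (only-in scribble/text output disable-prefix))

(define (render-text v)
  (define port (open-output-string))
  (output v port)
  (get-output-string port))

;; The body list starts at column 2, so its lines are indented by two
;; spaces, except for the disable-prefix values, which start at column 0.
(define code
  (list "  "
        (list "x = 1;" "\n"
              (disable-prefix "#ifdef DEBUG") "\n"
              "log(x);" "\n"
              (disable-prefix "#endif"))))

(display (render-text code))
(newline)
```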
If there are values after a disable-prefix value on the same
line, they will get indented to the goal column (unless the output is
already beyond it).
| #lang scribble/text | @(define (thunk name . body) | @list{function @name() { | @body | }}) | @(define (ifdef cond then else) | @list{@disable-prefix{#}ifdef @cond | @then | @disable-prefix{#}else | @else | @disable-prefix{#}endif}) |
| @thunk['do_stuff]{ | init(); | @ifdef["HAS_BLAH" | @list{var x = blah();} | @thunk['blah]{ | @ifdef["BLEHOS" | @list{@disable-prefix{#}@; | include <bleh.h> | bleh();} | @list{error("no bleh");}] | }] | more_stuff(); | } |
|
| → | function do_stuff() { | init(); | # ifdef HAS_BLAH | var x = blah(); | # else | function blah() { | # ifdef BLEHOS | # include <bleh.h> | bleh(); | # else | error("no bleh"); | # endif | } | # endif | more_stuff(); | } |
|
There are cases where each line should be prefixed with some string
other than a plain indentation. The add-prefix function
causes its contents to be printed using some given string prefix for
every line. The prefix is added on top of any existing indentation,
and indentation in the contents is added after the prefix.
| #lang scribble/text | @(define (comment . body) | @add-prefix["// "]{@body}) | @comment{add : int int -> string} | char *foo(int x, int y) { | @comment{ | skeleton: | allocate a string | print the expression into it | @comment{...more work...} | } | char *buf = malloc(@comment{FIXME! | This is bad} | 100); | } |
|
| → | // add : int int -> string | char *foo(int x, int y) { | // skeleton: | // allocate a string | // print the expression into it | // // ...more work... | char *buf = malloc(// FIXME! | // This is bad | 100); | } |
|
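For a single prefix at the start of the output, the effect can be checked directly. A minimal sketch, assuming Scribble is installed and using a hypothetical render-text helper:

```racket
#lang racket/base
;; Sketch: assumes Scribble is installed; render-text is hypothetical.
(require (only-in scribble/text output add-prefix))

(define (render-text v)
  (define port (open-output-string))
  (output v port)
  (get-output-string port))

;; Every line of the contents, including the first, gets the "// " prefix.
(display (render-text (add-prefix "// " (list "first line" "\n" "second line"))))
(newline)
```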
When combining add-prefix and disable-prefix there
is an additional value that can be useful: flush. This is a
value that causes output to print the current indentation and
prefix. This makes it possible to get the “ignored as a prefix”
property of disable-prefix but only for a nested prefix.
| #lang scribble/text | @(define (comment . text) | (list flush | @add-prefix[" *"]{ | @disable-prefix{/*} @text */})) | function foo(x) { | @comment{blah | more blah | yet more blah} | if (x < 0) { | @comment{even more | blah here | @comment{even | nested}} | do_stuff(); | } | } |
|
| → | function foo(x) { | /* blah | * more blah | * yet more blah */ | if (x < 0) { | /* even more | * blah here | * /* even | * * nested */ */ | do_stuff(); | } | } |
|
6.5 Using External Files
Using additional files that contain code for your preprocessing is
trivial: the preprocessor source is still source code in a module, so
you can require additional files with utility functions.
| #lang scribble/text | @(require "itemize.rkt") | Todo: | @itemize[@list{Hack some} | @list{Sleep some} | @list{Hack some | more}] |
| itemize.rkt: | #lang racket | (provide itemize) | (define (itemize . items) | (add-between (map (lambda (item) | (list "* " item)) | items) | "\n")) |
|
| → | Todo: | * Hack some | * Sleep some | * Hack some | more |
|
Note that the at-exp language can
often be useful here, since such files need to deal with text. Using
it, it is easy to include a lot of textual content.
| #lang scribble/text | @(require "stuff.rkt") | Todo: | @itemize[@list{Hack some} | @list{Sleep some} | @list{Hack some | more}] | @summary |
| stuff.rkt: | #lang at-exp racket/base | (require racket/list) | (provide (all-defined-out)) | (define (itemize . items) | (add-between (map (lambda (item) | @list{* @item}) | items) | "\n")) | (define summary | @list{If that's not enough, | I don't know what is.}) |
|
| → | Todo: | * Hack some | * Sleep some | * Hack some | more | If that's not enough, | I don't know what is. |
|
Of course, the extreme case is to put all of your content
in a plain Racket module, using @-forms for convenience. However,
there is no need to use the preprocessor language in this case;
instead, you can (require scribble/text), which will get all
of the bindings that are available in the scribble/text
language. Using output, switching from a preprocessor file
to a Racket file is very easy; choosing one or the other depends
on whether it is more convenient to write a text file with occasional
Racket expressions or the other way around.
| #lang at-exp racket/base | (require scribble/text racket/list) | (define (itemize . items) | (add-between (map (lambda (item) | @list{* @item}) | items) | "\n")) | (define summary | @list{If that's not enough, | I don't know what is.}) | (output | @list{ | Todo: | @itemize[@list{Hack some} | @list{Sleep some} | @list{Hack some | more}] | @summary | }) |
|
| → | Todo: | * Hack some | * Sleep some | * Hack some | more | If that's not enough, | I don't know what is. |
|
However, you might run into a case where it is desirable to include a
mostly-text file from a preprocessor file. It might be because you
prefer to split the source text to several files, or because you need
to preprocess a file without even a #lang header (for
example, an HTML template file that is the result of an external
editor). For these cases, the scribble/text language
provides an include form that includes a file in the
preprocessor syntax (where the default parsing mode is text).
| #lang scribble/text | @(require racket/list) | @(define (itemize . items) | (list | "<ul>" | (add-between | (map (lambda (item) | @list{<li>@|item|</li>}) | items) | "\n") | "</ul>")) | @(define title "Todo") | @(define summary | @list{If that's not enough, | I don't know what is.}) |
| @include["template.html"] |
| template.html: | <html> | <head><title>@|title|</title></head> | <body> | <h1>@|title|</h1> | @itemize[@list{Hack some} | @list{Sleep some} | @list{Hack some | more}] | <p><i>@|summary|</i></p> | </body> | </html> |
|
| → | <html> | <head><title>Todo</title></head> | <body> | <h1>Todo</h1> | <ul><li>Hack some</li> | <li>Sleep some</li> | <li>Hack some | more</li></ul> | <p><i>If that's not enough, | I don't know what is.</i></p> | </body> | </html> |
|
(Using require with a text file in the scribble/text
language will not work as intended: using the preprocessor language
means that the text is displayed when the module is invoked, so the
required file’s contents will be printed before any of the requiring
module’s text does. If you find yourself in such a situation, it is
better to switch to a Racket-with-@-expressions file as shown
above.)