7 raco decompile: Decompiling Bytecode
The raco decompile command takes a bytecode file (which usually
has the file extension ".zo") or a source file with an
associated bytecode file (usually created with raco make) and
converts it back to an approximation of Racket code. Decompiled
bytecode is mostly useful for checking the compiler’s transformation
and optimization of the source program.
Many forms in the decompiled code, such as module,
define, and lambda, have the same meanings as
always. Other forms and transformations are specific to the rendering
of bytecode, and they reflect a specific execution model:
Top-level variables, variables defined within the module, and
variables imported from other modules are prefixed with _,
which helps expose the difference between uses of local variables
versus other variables. Variables imported from other modules,
moreover, have a suffix that indicates the source module.
Non-local variables are always accessed indirectly though an implicit
#%globals or #%modvars variable that
resides on the value stack (which otherwise contains local
variables). Variable accesses are further wrapped with
#%checked when the compiler cannot prove that the
variable will be defined before the access.
Uses of core primitives are shown without a leading _, and
they are never wrapped with #%checked.
Local-variable access may be wrapped with
#%sfs-clear, which indicates that the variable-stack
location holding the variable will be cleared to prevent the
variable’s value from being retained by the garbage collector.
Variables whose name starts with unused are never
actually stored on the stack, and so they never have
#%sfs-clear annotations. (The bytecode compiler
normally eliminates such bindings, but sometimes it cannot, either
because it cannot prove that the right-hand side produces the right
number of values, or the discovery that the variable is unused
happens too late with the compiler.)
Mutable variables are converted to explicitly boxed values using
#%box, #%unbox, and
#%set-boxes! (which works on multiple boxes at once).
A set!-rec-values operation constructs
mutually-recursive closures and simultaneously updates the
corresponding variable-stack locations that bind the closures. A
set!, set!-values, or
set!-rec-values form is always used on a local
variable before it is captured by a closure; that ordering reflects
how closures capture values in variable-stack locations, as opposed
to stack locations.
In a lambda form, if the procedure produced by the
lambda has a name (accessible via object-name)
and/or source-location information, then it is shown as a quoted
constant at the start of the procedure’s body. Afterward, if the
lambda form captures any bindings from its context, those
bindings are also shown in a quoted constant. Neither constant
corresponds to a computation when the closure is called, though the
list of captured bindings corresponds to a closure allocation when
the lambda form itself is evaluated.
A lambda form that closes over no bindings is wrapped with
#%closed plus an identifier that is bound to the
closure. The binding’s scope covers the entire decompiled output, and
it may be referenced directly in other parts of the program; the
binding corresponds to a constant closure value that is shared, and
it may even contain cyclic references to itself or other constant
closures.
A form (#%apply-values proc expr) is equivalent to
(call-with-values (lambda () expr) proc), but the run-time
system avoids allocating a closure for expr.
Some applications of core primitives are annotated with
#%in, which indicates that the JIT compiler will
inline the operation. (Inlining information is not part of the
bytecode, but is instead based on an enumeration of primitives that
the JIT is known to handle specially.) Operations from
racket/flonum and racket/unsafe/ops
are always inlined, so #%in is not shown for them.
Some applications of flonum operations from racket/flonum
and racket/unsafe/ops are annotated with
#%flonum, indicating a place where the JIT compiler
might avoid allocation for intermediate flonum results. A single
#%flonum by itself is not useful, but a
#%flonum operation that consumes a
#%flonum or #%from-flonum argument
indicates a potential performance improvement. A
#%from-flonum wraps an identifier that is bound by
let with a #%as-flonum around its value,
which indicates a local binding that can avoid boxing (when used as
an argument to an operation that can work with unboxed values).
A #%decode-syntax form corresponds to a syntax
object. Future improvements to the decompiler will convert such
syntax objects to a readable form.
7.1 API for Decompiling
Consumes the result of parsing bytecode and returns an S-expression
(as described above) that represents the compiled code.
7.2 API for Parsing Bytecode
The compiler/zo-parse module re-exports
compiler/zo-structs in addition to
zo-parse.
Parses a port (typically the result of opening a ".zo" file)
containing bytecode. Beware that the structure types used to
represent the bytecode are subject to frequent changes across Racket
versons.
The parsed bytecode is returned in a compilation-top
structure. For a compiled module, the compilation-top
structure will contain a mod structure. For a top-level
sequence, it will normally contain a seq or splice
structure with a list of top-level declarations and expressions.
The bytecode representation of an expression is closer to an
S-expression than a traditional, flat control string. For example, an
if form is represented by a branch structure that
has three fields: a test expression, a “then” expression, and an
“else” expression. Similarly, a function call is represented by an
application structure that has a list of argument
expressions.
Storage for local variables or intermediate values (such as the
arguments for a function call) is explicitly specified in terms of a
stack. For example, execution of an application structure
reserves space on the stack for each argument result. Similarly, when
a let-one structure (for a simple let) is executed,
the value obtained by evaluating the right-hand side expression is
pushed onto the stack, and then the body is evaluated. Local
variables are always accessed as offsets from the current stack
position. When a function is called, its arguments are passed on the
stack. A closure is created by transferring values from the stack to
a flat closure record, and when a closure is applied, the saved values
are restored on the stack (though possibly in a different order and
likely in a more compact layout than when they were captured).
When a sub-expression produces a value, then the stack pointer is
restored to its location from before evaluating the sub-expression.
For example, evaluating the right-hand size for a let-one
structure may temporarily push values onto the stack, but the stack is
restored to its pre-let-one position before pushing the
resulting value and continuing with the body. In addition, a tail
call resets the stack pointer to the position that follows the
enclosing function’s arguments, and then the tail call continues by
pushing onto the stack the arguments for the tail-called function.
Values for global and module-level variables are not put directly on
the stack, but instead stored in “buckets,” and an array of
accessible buckets is kept on the stack. When a closure body needs to
access a global variable, the closure captures and later restores the
bucket array in the same way that it captured and restores a local
variable. Mutable local variables are boxed similarly to global
variables, but individual boxes are referenced from the stack and
closures.
Quoted syntax (in the sense of quote-syntax) is treated like
a global variable, because it must be instantiated for an appropriate
phase. A prefix structure within a compilation-top
or mod structure indicates the list of global variables and
quoted syntax that need to be instantiated (and put into an array on
the stack) before evaluating expressions that might use them.
7.3 API for Marshaling Bytecode
Consumes a representation of bytecode and writes it to out.
Consumes a representation of bytecode and generates a byte string for
the marshaled bytecode.
7.4 Bytecode Representation
The compiler/zo-structs library defines the bytecode
structures that are produced by zo-parse and consumed by
decompile and zo-marshal.
A supertype for all forms that can appear in compiled code.
7.4.1 Prefix
Wraps compiled code. The
max-let-depth field indicates the
maximum stack depth that
code creates (not counting the
prefix array). The
prefix field describes top-level
variables, module-level variables, and quoted syntax-objects accessed
by
code. The
code field contains executable code;
it is normally a
form, but a literal value is represented as
itself.
Represents a “prefix” that is pushed onto the stack to initiate
evaluation. The prefix is an array, where buckets holding the
values for toplevels are first, then the buckets for the
stxs, then a bucket for another array if stxs is
non-empty, then num-lifts extra buckets for lifted local
procedures.
In toplevels, each element is one of the following:
a #f, which indicates a dummy variable that is used
to access the enclosing module/namespace at run time;
a symbol, which is a reference to a variable defined in the
enclosing module;
a global-bucket, which is a top-level variable (appears
only outside of modules); or
a module-variable, which indicates a variable imported
from another module.
The variable buckets and syntax objects that are recorded in a prefix
are accessed by toplevel and topsyntax expression
forms.
Represents a top-level variable, and used only in a
prefix.
Represents a top-level variable, and used only in a
prefix.
The
pos may record the variable’s offset within its module,
or it can be
-1 if the variable is always located by name.
The
phase indicates the phase level of the definition within
its module.
Wraps a syntax object in a
prefix.
7.4.2 Forms
A supertype for all forms that can appear in compiled code (including
exprs), except for literals that are represented as
themselves.
Represents a
define-values form. Each element of
ids will reference via the prefix either a top-level variable
or a local module variable.
After rhs is evaluated, the stack is restored to its depth
from before evaluating rhs.
Represents a
define-syntaxes or
define-values-for-syntax form. The
rhs expression
has its own
prefix, which is pushed before evaluating
rhs; the stack is restored after obtaining the result values.
The
max-let-depth field indicates the maximum size of the
stack that will be created by
rhs (not counting
prefix).
Represents a top-level
#%require form (but not one in a
module form) with a sequence of specifications
reqs.
The
dummy variable is used to access to the top-level
namespace.
Represents a
begin form, either as an expression or at the
top level (though the latter is more commonly a
splice form).
When a
seq appears in an expression position, its
forms are expressions.
After each form in forms is evaluated, the stack is restored
to its depth from before evaluating the form.
Represents a top-level
begin form where each evaluation is
wrapped with a continuation prompt.
After each form in forms is evaluated, the stack is restored
to its depth from before evaluating the form.
Represents a
module declaration. The
body forms use
prefix, rather than any prefix in place for the module
declaration itself (and each
syntax-body has its own prefix).
The provides and requires lists are each an
association list from phases to exports or imports. In the case of
provides, each phase maps to two lists: one for exported
variables, and another for exported syntax. In the case of
requires, each phase maps to a list of imported module paths.
The body field contains the module’s run-time code, and
syntax-body contains the module’s compile-time code. After
each form in body or syntax-body is evaluated, the
stack is restored to its depth from before evaluating the form.
The unexported list contains lists of symbols for unexported
definitions that can be accessed through macro expansion. The first
list is phase-0 variables, the second is phase-0 syntax, and the last
is phase-1 variables.
The max-let-depth field indicates the maximum stack depth
created by body forms (not counting the prefix
array). The dummy variable is used to access to the
top-level namespace.
The lang-info value specifies an optional module path that
provides information about the module’s implementation language.
The internal-module-context value describes the lexical
context of the body of the module. This value is used by
module->namespace. A #f value means that the
context is unavailable or empty. A #t value means that the
context is computed by re-importing all required modules. A
syntax-object value embeds an arbitrary lexical context.
Describes an individual provided identifier within a
mod
instance.
7.4.3 Expressions
A supertype for all expression forms that can appear in compiled code,
except for literals that are represented as themselves and some
seq structures (which can appear as an expression as long as
it contains only other things that can be expressions).
Represents a
lambda form. The
name field is a name
for debugging purposes. The
num-params field indicates the
number of arguments accepted by the procedure, not counting a rest
argument; the
rest? field indicates whether extra arguments
are accepted and collected into a “rest” variable. The
param-types list contains
num-params symbols
indicating the type of each argumet, either
'val for a normal
argument,
'ref for a boxed argument (representing a mutable
local variable), or
'flonum for a flonum argument.
The
closure-map field is a vector of stack positions that are
captured when evaluating the lambda form to create a closure.
The closure-types field provides a corresponding list of
types, but no distinction is made between normal values and boxed
values; also, this information is redundant, since it can be inferred
by the bindings referenced though closure-map.
Which a closure captures top-level or module-level variables, they
are represented in the closure by capturing a prefix (in the sense
of prefix). The toplevel-map field indicates
which top-level and lifted variables are actually used by the
closure (so that variables in a prefix can be pruned by the run-time
system if they become unused). A #f value indicates either
that no prefix is captured or all variables in the prefix should be
considered used. Otherwise, numbers in the set indicate which
variables and lifted variables are used. Variables are numbered
consecutively by position in the prefix starting from
0. Lifted variables are numbered immediately
afterward—which means that, if the prefix contains any syntax
objects, lifted-variable numbers are shifted down relative to a
toplevel by the number of syntax object in the prefix plus
one (which makes the toplevel-map set more compact).
When the function is called, the rest-argument list (if any) is pushed
onto the stack, then the normal arguments in reverse order, then the
closure-captured values in reverse order. Thus, when body is
run, the first value on the stack is the first value captured by the
closure-map array, and so on.
The max-let-depth field indicates the maximum stack depth
created by body plus the arguments and closure-captured
values pushed onto the stack. The body field is the
expression for the closure’s body.
A
lambda form with an empty closure, which is a procedure
constant. The procedure constant can appear multiple times in the
graph of expressions for bytecode, and the
code field can be
a cycle for a recursive constant procedure; the
gen-id is
different for each such constant.
Represents a
case-lambda form as a combination of
lambda forms that are tried (in order) based on the number of
arguments given.
Pushes an uninitialized slot onto the stack, evaluates
rhs
and puts its value into the slot, and then runs
body. If
flonum? is
#t, then
rhs must produce a
flonum, and the slot must be accessed by
localrefs that
expect a flonum. If
unused? is
#t, then the slot
must not be used, and the value of
rhs is not actually pushed
onto the stack (but
rhs is constrained to produce a single
value).
After rhs is evaluated, the stack is restored to its depth
from before evaluating rhs. Note that the new slot is
created before evaluating rhs.
Pushes
count uninitialized slots onto the stack and then runs
body. If
boxes? is
#t, then the slots are
filled with boxes that contain
#<undefined>.
Runs rhs to obtain count results, and installs them
into existing slots on the stack in order, skipping the first
pos stack positions. If boxes? is #t, then
the values are put into existing boxes in the stack slots.
After rhs is evaluated, the stack is restored to its depth
from before evaluating rhs.
Represents a
letrec form with
lambda bindings. It
allocates a closure shell for each
lambda form in
procs, installs each onto the stack in previously allocated
slots in reverse order (so that the closure shell for the last element
of
procs is installed at stack position
0), fills
out each shell’s closure (where each closure normally references some
other just-created closures, which is possible because the shells have
been installed on the stack), and then evaluates
body.
Skips
pos elements of the stack, setting the slot afterward
to a new box containing the slot’s old value, and then runs
body. This form appears when a
lambda argument is
mutated using
set! within its body; calling the function
initially pushes the value directly on the stack, and this form boxes
the value so that it can be mutated later.
Represents a local-variable reference; it accesses the value in the
stack slot after the first
pos slots. If
unbox? is
#t, the stack slot contains a box, and a value is extracted
from the box. If
clear? is
#t, then after the value
is obtained, the stack slot is cleared (to avoid retaining a reference
that can prevent reclamation of the value as garbage). If
other-clears? is
#t, then some later reference to
the same stack slot may clear after reading. If
flonum? is
#t, the slot holds to a flonum value.
Represents a reference to a top-level or imported variable via the
prefix array. The
depth field indicates the number
of stack slots to skip to reach the prefix array, and
pos is
the offset into the array.
If const? is #t, then the variable definitely will
be defined, and its value stays constant. If ready? is
#t, then the variable definitely will be defined (but its
value might change in the future). If const? and
ready? are both #f, then a check is needed to
determine whether the variable is defined.
Represents a reference to a quoted syntax object via the
prefix array. The
depth field indicates the number
of stack slots to skip to reach the prefix array, and
pos is
the offset into the array. The
midpt value is used
internally for lazy calculation of syntax information.
Represents a function call. The
rator field is the
expression for the function, and
rands are the argument
expressions. Before any of the expressions are evaluated,
(length rands) uninitialized stack slots are created (to be
used as temporary space).
After test is evaluated, the stack is restored to its depth
from before evaluating test.
After each of key and val is evaluated, the stack is
restored to its depth from before evaluating key or
val.
Represents a
begin0 expression.
After each expression in seq is evaluated, the stack is
restored to its depth from before evaluating the expression.
Represents a
#%variable-reference form. The
dummy field
accesses a variable bucket that strongly references its namespace (as
opposed to a normal variable bucket, which only weakly references its
namespace).
Represents a
set! expression that assigns to a top-level or
module-level variable. (Assignments to local variables are represented
by
install-value expressions.)
After rhs is evaluated, the stack is restored to its depth
from before evaluating rhs.
Represents a direct reference to a variable imported from the run-time
kernel.
7.4.4 Syntax Objects
Represents a syntax object, where wraps contain the lexical
information and tamper-status is taint information. When the
datum part is itself compound, its pieces are wrapped, too.
A supertype for lexical-information elements.
A top-level renaming.
A mark barrier.
Information about a free identifier.
A local-binding mapping from symbols to binding-set names.
Shifts module bindings later in the wrap set.
Represents a set of module and import bindings.
Represents a set of simple imports from one module within a
module-rename.
A supertype for module bindings.
A supertype for nominal paths.
Represents a simple nominal path.
Represents an imported nominal path.
Represents a phased nominal path.