10 raco decompile: Decompiling Bytecode
The raco decompile command takes a bytecode file (which usually
has the file extension ".zo") or a source file with an
associated bytecode file (usually created with raco make) and
converts it back to an approximation of Racket code. Decompiled
bytecode is mostly useful for checking the compiler’s transformation
and optimization of the source program.
Many forms in the decompiled code, such as module,
define, and lambda, have the same meanings as
always. Other forms and transformations are specific to the rendering
of bytecode, and they reflect a specific execution model:
Top-level variables, variables defined within the module, and
variables imported from other modules are prefixed with _,
which helps expose the difference between uses of local variables
versus other variables. Variables imported from other modules,
moreover, have a suffix starting with @ that indicates
the source module. Finally, imported variables with constantness
have a midfix:
:c to indicate constant shape across all instantiations,
:f to indicate a fixed value after initialization,
:p to indicate a procedure,
:P to indicate a procedure that preserves continuation
marks on return,
:t to indicate a structure type,
:mk to indicate a structure constructor,
:? to indicate a structure predicate,
:ref to indicate a structure accessor, or
:set! to indicate a structure mutator.
Non-local variables are always accessed indirectly through an implicit
#%globals or #%modvars variable that
resides on the value stack (which otherwise contains local
variables). Variable accesses are further wrapped with
#%checked when the compiler cannot prove that the
variable will be defined before the access.
Uses of core primitives are shown without a leading _, and
they are never wrapped with #%checked.
Local-variable access may be wrapped with
#%sfs-clear, which indicates that the variable-stack
location holding the variable will be cleared to prevent the
variable’s value from being retained by the garbage collector.
Variables whose name starts with unused are never
actually stored on the stack, and so they never have
#%sfs-clear annotations. (The bytecode compiler
normally eliminates such bindings, but sometimes it cannot, either
because it cannot prove that the right-hand side produces the right
number of values, or because the discovery that the variable is unused
happens too late within the compiler.)
Mutable variables are converted to explicitly boxed values using
#%box, #%unbox, and
#%set-boxes! (which works on multiple boxes at once).
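The boxing conversion can be imitated by hand at the source level. The following is only a sketch of the idea, using the ordinary box, unbox, and set-box! functions rather than the #%box, #%unbox, and #%set-boxes! forms that appear in decompiled output; the function names are made up for illustration.

```racket
#lang racket/base

;; Source-level form: `n` is mutated with set! and captured by a closure.
(define (make-counter)
  (let ([n 0])
    (lambda ()
      (set! n (+ n 1))
      n)))

;; Hand-written approximation of the boxed form: the variable's slot
;; holds a box, and every read or write goes through that box.
(define (make-counter/boxed)
  (let ([n (box 0)])
    (lambda ()
      (set-box! n (+ (unbox n) 1))
      (unbox n))))

(define c1 (make-counter))
(define c2 (make-counter/boxed))
(define plain-results (list (c1) (c1) (c1)))
(define boxed-results (list (c2) (c2) (c2)))
```

Both counters behave the same; the boxed version only makes explicit the indirection that the closure needs in order to share a mutable variable.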
A set!-rec-values operation constructs
mutually-recursive closures and simultaneously updates the
corresponding variable-stack locations that bind the closures. A
set!, set!-values, or
set!-rec-values form is always used on a local
variable before it is captured by a closure; that ordering reflects
how closures capture values in variable-stack locations, as opposed
to stack locations.
In a lambda form, if the procedure produced by the
lambda has a name (accessible via object-name)
and/or source-location information, then it is shown as a quoted
constant at the start of the procedure’s body. Afterward, if the
lambda form captures any bindings from its context, those
bindings are also shown in a quoted constant. Neither constant
corresponds to a computation when the closure is called, though the
list of captured bindings corresponds to a closure allocation when
the lambda form itself is evaluated.
A lambda form that closes over no bindings is wrapped with
#%closed plus an identifier that is bound to the
closure. The binding’s scope covers the entire decompiled output, and
it may be referenced directly in other parts of the program; the
binding corresponds to a constant closure value that is shared, and
it may even contain cyclic references to itself or other constant
closures.
A form (#%apply-values proc expr) is equivalent to
(call-with-values (lambda () expr) proc), but the run-time
system avoids allocating a closure for expr.
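As a source-level illustration of that equivalence, the call-with-values form below is what a (#%apply-values list (sum-and-product 3 4)) form in decompiled output would stand for. The helper sum-and-product is a made-up example function.

```racket
#lang racket/base

;; A function that returns two values.
(define (sum-and-product a b)
  (values (+ a b) (* a b)))

;; Source-level equivalent of (#%apply-values list (sum-and-product 3 4)):
;; the thunk produces the values, and `list` receives them as arguments.
(define via-call-with-values
  (call-with-values (lambda () (sum-and-product 3 4)) list))
```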
A define-values form may have (begin '%%inline-variant%% expr1 expr2) for its expression, in which case
expr2 is the normal result, but expr1 may be
inlined for calls to the definition from other modules. Definitions
of functions without an '%%inline-variant%% are never
inlined across modules.
Some applications of core primitives are annotated with
#%in, which indicates that the JIT compiler will
inline the operation. (Inlining information is not part of the
bytecode, but is instead based on an enumeration of primitives that
the JIT is known to handle specially.) Operations from
racket/flonum and racket/unsafe/ops
are always inlined, so #%in is not shown for them.
Function arguments and local bindings that are known to have a
particular type have names that embed the known type. For example, an
argument might have a name that starts with argflonum or a
local binding might have a name that starts with flonum to
indicate a flonum value.
A #%decode-syntax form corresponds to a syntax
object.
10.1 API for Decompiling
Consumes the result of parsing bytecode and returns an S-expression
(as described above) that represents the compiled code.
10.2 API for Parsing Bytecode
The compiler/zo-parse module re-exports
compiler/zo-structs in addition to
zo-parse.
Parses a port (typically the result of opening a ".zo" file)
containing bytecode. Beware that the structure types used to
represent the bytecode are subject to frequent changes across Racket
versions.
The parsed bytecode is returned in a compilation-top
structure. For a compiled module, the compilation-top
structure will contain a mod structure. For a top-level
sequence, it will normally contain a seq or splice
structure with a list of top-level declarations and expressions.
The bytecode representation of an expression is closer to an
S-expression than a traditional, flat control string. For example, an
if form is represented by a branch structure that
has three fields: a test expression, a “then” expression, and an
“else” expression. Similarly, a function call is represented by an
application structure that has a list of argument
expressions.
Storage for local variables or intermediate values (such as the
arguments for a function call) is explicitly specified in terms of a
stack. For example, execution of an application structure
reserves space on the stack for each argument result. Similarly, when
a let-one structure (for a simple let) is executed,
the value obtained by evaluating the right-hand side expression is
pushed onto the stack, and then the body is evaluated. Local
variables are always accessed as offsets from the current stack
position. When a function is called, its arguments are passed on the
stack. A closure is created by transferring values from the stack to
a flat closure record, and when a closure is applied, the saved values
are restored on the stack (though possibly in a different order and
likely in a more compact layout than when they were captured).
When a sub-expression produces a value, then the stack pointer is
restored to its location from before evaluating the sub-expression.
For example, evaluating the right-hand side for a let-one
structure may temporarily push values onto the stack, but the stack is
restored to its pre-let-one position before pushing the
resulting value and continuing with the body. In addition, a tail
call resets the stack pointer to the position that follows the
enclosing function’s arguments, and then the tail call continues by
pushing onto the stack the arguments for the tail-called function.
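The stack discipline described above can be sketched with a toy evaluator. This is only an illustrative model under simplifying assumptions: the structure names lit, loc, let1, and app are hypothetical stand-ins (not the compiler/zo-structs types), the stack is an immutable list so that "restoring the stack pointer" happens implicitly, and argument results are not actually stored into the reserved slots.

```racket
#lang racket/base

;; Hypothetical toy structures; these are NOT the compiler/zo-structs types.
(struct lit (v))            ; literal value
(struct loc (pos))          ; local reference: offset from the stack top
(struct let1 (rhs body))    ; single-binding let
(struct app (rator rands))  ; function call

;; The stack is an immutable list whose first element is position 0.
(define (run e stack)
  (cond
    [(lit? e) (lit-v e)]
    [(loc? e) (list-ref stack (loc-pos e))]
    [(let1? e)
     ;; The slot exists while rhs runs, so offsets inside rhs are shifted by 1.
     (define v (run (let1-rhs e) (cons 'uninitialized stack)))
     ;; Anything rhs pushed is gone; only the result occupies the new slot.
     (run (let1-body e) (cons v stack))]
    [(app? e)
     ;; One slot per argument is reserved before any subexpression runs.
     (define scratch
       (for/fold ([s stack]) ([ignored (in-list (app-rands e))])
         (cons 'uninitialized s)))
     (define f (run (app-rator e) scratch))
     (define args
       (for/list ([r (in-list (app-rands e))])
         (run r scratch)))
     (apply f args)]
    [else (error 'run "unknown expression: ~e" e)]))

;; (let ([x 5]) (+ x 10)): inside the call, x sits below two argument slots.
(define ex1
  (run (let1 (lit 5)
             (app (lit +) (list (loc 2) (lit 10))))
       '()))

;; (let ([x 5]) (let ([y (* x 2)]) (- y x)))
(define ex2
  (run (let1 (lit 5)
             (let1 (app (lit *) (list (loc 3) (lit 2)))
                   (app (lit -) (list (loc 2) (loc 3)))))
       '()))
```

Note how the offset used for x changes with context: it is 3 inside the inner right-hand side (one pending slot plus two argument slots), and 3 again in the inner body (one slot for y plus two argument slots).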
Values for global and module-level variables are not put directly on
the stack, but instead stored in “buckets,” and an array of
accessible buckets is kept on the stack. When a closure body needs to
access a global variable, the closure captures and later restores the
bucket array in the same way that it captured and restores a local
variable. Mutable local variables are boxed similarly to global
variables, but individual boxes are referenced from the stack and
closures.
Quoted syntax (in the sense of quote-syntax) is treated like
a global variable, because it must be instantiated for an appropriate
phase. A prefix structure within a compilation-top
or mod structure indicates the list of global variables and
quoted syntax that need to be instantiated (and put into an array on
the stack) before evaluating expressions that might use them.
10.3 API for Marshaling Bytecode
Consumes a representation of bytecode and writes it to out.
Consumes a representation of bytecode and generates a byte string for
the marshaled bytecode.
10.4 Bytecode Representation
The compiler/zo-structs library defines the bytecode
structures that are produced by zo-parse and consumed by
decompile and zo-marshal.
A supertype for all forms that can appear in compiled code.
10.4.1 Prefix
Wraps compiled code. The max-let-depth field indicates the
maximum stack depth that code creates (not counting the
prefix array). The prefix field describes top-level
variables, module-level variables, and quoted syntax-objects accessed
by code. The code field contains executable code;
it is normally a form, but a literal value is represented as
itself.
Represents a “prefix” that is pushed onto the stack to initiate
evaluation. The prefix is an array, where buckets holding the
values for toplevels are first, then the buckets for the
stxs, then a bucket for another array if stxs is
non-empty, then num-lifts extra buckets for lifted local
procedures.
In toplevels, each element is one of the following:
a #f, which indicates a dummy variable that is used
to access the enclosing module/namespace at run time;
a symbol, which is a reference to a variable defined in the
enclosing module;
a global-bucket, which is a top-level variable (appears
only outside of modules); or
a module-variable, which indicates a variable imported
from another module.
The variable buckets and syntax objects that are recorded in a prefix
are accessed by toplevel and topsyntax expression
forms.
Represents a top-level variable, and is used only in a
prefix.
Represents a variable imported from another module, and is used only
in a prefix. The pos field may record the variable’s offset within its
module, or it can be -1 if the variable is always located by name.
The phase field indicates the phase level of the definition within
its module. The constantness field is either 'constant, a
function-shape value, or a struct-shape value to indicate that the
variable’s value is always the same for every instantiation of its
module; 'fixed to indicate that it doesn’t change within a particular
instantiation of the module; or #f to indicate that the variable’s
value can change even for one particular instantiation of its module.
Represents the shape of an expected import, which should be a function
having the arity specified by arity. The
preserves-marks? field is true if calling the function is
expected to leave continuation marks unchanged by the time it
returns.
Represents the shape of an expected import as a structure-type
binding, constructor, etc.
Wraps a syntax object in a
prefix.
10.4.2 Forms
A supertype for all forms that can appear in compiled code (including
exprs), except for literals that are represented as
themselves.
Represents a define-values form. Each element of
ids will reference via the prefix either a top-level variable
or a local module variable.
After rhs is evaluated, the stack is restored to its depth
from before evaluating rhs.
Represents a define-syntaxes or begin-for-syntax form. The
rhs expression or the set of forms in forms has its own
prefix, which is pushed before evaluating rhs or the
forms; the stack is restored after obtaining the result values.
The max-let-depth field indicates the maximum size of the
stack that will be created by rhs (not counting
prefix). The dummy variable is used to access the enclosing
namespace.
Represents a top-level #%require form (but not one in a
module form) with a sequence of specifications reqs.
The dummy variable is used to access the top-level
namespace.
Represents a begin form, either as an expression or at the
top level (though the latter is more commonly a splice form).
When a seq appears in an expression position, its
forms are expressions.
After each form in forms is evaluated, the stack is restored
to its depth from before evaluating the form.
Represents a top-level
begin form where each evaluation is
wrapped with a continuation prompt.
After each form in forms is evaluated, the stack is restored
to its depth from before evaluating the form.
Represents a function that is bound by
define-values, where the
function has two variants.
The first variant is used for normal calls to the function. The second may
be used for cross-module inlining of the function.
Represents a
module declaration.
The provides and requires lists are each an
association list from phases to exports or imports. In the case of
provides, each phase maps to two lists: one for exported
variables, and another for exported syntax. In the case of
requires, each phase maps to a list of imported module paths.
The body field contains the module’s run-time (i.e., phase
0) code. The syntax-bodies list has a list of forms for
each higher phase in the module body; the phases are in order
starting with phase 1. The body forms use prefix,
rather than any prefix in place for the module declaration itself,
while members of lists in syntax-bodies have their own
prefixes. After each form in body or syntax-bodies
is evaluated, the stack is restored to its depth from before
evaluating the form.
The unexported list contains lists of symbols for
unexported definitions that can be accessed through macro expansion
and that are implemented through the forms in body and
syntax-bodies. Each list in unexported starts
with a phase level.
The max-let-depth field indicates the maximum stack depth
created by body forms (not counting the prefix
array). The dummy variable is used to access the
top-level namespace.
The lang-info value specifies an optional module path that
provides information about the module’s implementation language.
The internal-module-context value describes the lexical
context of the body of the module. This value is used by
module->namespace. A #f value means that the
context is unavailable or empty. A #t value means that the
context is computed by re-importing all required modules. A
syntax-object value embeds an arbitrary lexical context.
The flags field records certain properties of the module.
The 'cross-phase flag indicates that the module body is
evaluated once and the results shared across instances for all phases; such a
module contains only definitions of functions, structure types, and
structure type properties.
The pre-submodules field records module-declared
submodules, while the post-submodules field records
module*-declared submodules.
Describes an individual provided identifier within a
mod
instance.
10.4.3 Expressions
A supertype for all expression forms that can appear in compiled code,
except for literals that are represented as themselves and some
seq structures (which can appear as an expression as long as
it contains only other things that can be expressions).
Represents a lambda form. The name field is a name
for debugging purposes. The num-params field indicates the
number of arguments accepted by the procedure, not counting a rest
argument; the rest? field indicates whether extra arguments
are accepted and collected into a “rest” variable. The
param-types list contains num-params symbols
indicating the type of each argument, either 'val for a normal
argument, 'ref for a boxed argument (representing a mutable
local variable), or 'flonum for a flonum argument.
The closure-map field is a vector of stack positions that are
captured when evaluating the lambda form to create a closure.
The closure-types field provides a corresponding list of
types, but no distinction is made between normal values and boxed
values; also, this information is redundant, since it can be inferred
by the bindings referenced through closure-map.
When a closure captures top-level or module-level variables, they
are represented in the closure by capturing a prefix (in the sense
of prefix). The toplevel-map field indicates
which top-level and lifted variables are actually used by the
closure (so that variables in a prefix can be pruned by the run-time
system if they become unused). A #f value indicates either
that no prefix is captured or all variables in the prefix should be
considered used. Otherwise, numbers in the set indicate which
variables and lifted variables are used. Variables are numbered
consecutively by position in the prefix starting from
0. Lifted variables are numbered immediately
afterward—which means that, if the prefix contains any syntax
objects, lifted-variable numbers are shifted down relative to a
toplevel by the number of syntax objects in the prefix plus
one (which makes the toplevel-map set more compact).
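The numbering can be worked through with a little arithmetic. The sketch below assumes the prefix-array layout described earlier (toplevels first, then stxs, then one extra bucket when stxs is non-empty, then the lifted procedures); the function names are hypothetical and are not part of any library.

```racket
#lang racket/base

;; Position of the k-th lifted variable within the prefix array, assuming
;; the layout: toplevels, stxs, one extra bucket if stxs is non-empty, lifts.
(define (lift-array-position num-toplevels num-stxs k)
  (+ num-toplevels
     num-stxs
     (if (zero? num-stxs) 0 1)
     k))

;; Number used for the same lifted variable in a toplevel-map set:
;; lifted variables are numbered immediately after the toplevels.
(define (lift-map-number num-toplevels k)
  (+ num-toplevels k))

;; With 3 toplevels and 2 syntax objects, the first lifted variable is
;; at array position 6 but is numbered 3 in toplevel-map.
(define example-position (lift-array-position 3 2 0))
(define example-number (lift-map-number 3 0))
```

The difference between the two numbers is the number of syntax objects plus one, which is the shift mentioned above.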
When the function is called, the rest-argument list (if any) is pushed
onto the stack, then the normal arguments in reverse order, then the
closure-captured values in reverse order. Thus, when body is
run, the first value on the stack is the first value captured by the
closure-map array, and so on.
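The push order can be checked with a small model in which the stack is a list whose first element is the top. This is a sketch only; entry-stack and push-all-reversed are hypothetical helpers, and the symbols stand in for real values.

```racket
#lang racket/base

;; Pushing values in reverse order leaves the first value on top.
(define (push-all-reversed vals stack)
  (for/fold ([s stack]) ([v (in-list (reverse vals))])
    (cons v s)))

;; Build the stack seen by a closure body on entry. `rest-list` is #f
;; when the procedure takes no rest argument.
(define (entry-stack captured args rest-list stack)
  (let* ([s (if rest-list (cons rest-list stack) stack)]
         [s (push-all-reversed args s)]
         [s (push-all-reversed captured s)])
    s))

(define frame
  (entry-stack '(c0 c1) '(a0 a1) '(r0 r1) '(caller-slot)))
```

In this model, position 0 holds the first captured value, the normal arguments follow the captured values, and the rest-argument list sits just above the caller's slots.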
The max-let-depth field indicates the maximum stack depth
created by body plus the arguments and closure-captured
values pushed onto the stack. The body field is the
expression for the closure’s body.
A lambda form with an empty closure, which is a procedure
constant. The procedure constant can appear multiple times in the
graph of expressions for bytecode, and the code field can be
a cycle for a recursive constant procedure; the gen-id is
different for each such constant.
Represents a
case-lambda form as a combination of
lambda forms that are tried (in order) based on the number of
arguments given.
Pushes an uninitialized slot onto the stack, evaluates rhs
and puts its value into the slot, and then runs body. If
type is not #f, then rhs must produce a
value of the corresponding type, and the slot must be accessed by
localrefs that expect the type. If unused? is #t, then the slot
must not be used, and the value of rhs is not actually pushed
onto the stack (but rhs is constrained to produce a single
value).
After rhs is evaluated, the stack is restored to its depth
from before evaluating rhs. Note that the new slot is
created before evaluating rhs.
Pushes count uninitialized slots onto the stack and then runs
body. If boxes? is #t, then the slots are
filled with boxes that contain #<undefined>.
Runs rhs to obtain count results, and installs them
into existing slots on the stack in order, skipping the first
pos stack positions. If boxes? is #t, then
the values are put into existing boxes in the stack slots.
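A minimal sketch of that installation step, modeling the stack as a mutable vector whose index 0 is the top of the stack. The function name install-values! is hypothetical, and the list of values stands in for the results of running rhs.

```racket
#lang racket/base

;; Install `vals` into consecutive stack slots, skipping the first `pos`
;; positions. When `boxes?` is true, each slot already holds a box and
;; the value goes into the box instead of replacing the slot's content.
(define (install-values! stack pos vals boxes?)
  (for ([v (in-list vals)]
        [i (in-naturals pos)])
    (if boxes?
        (set-box! (vector-ref stack i) v)
        (vector-set! stack i v))))

;; Direct installation: slots 1 and 2 are overwritten.
(define plain-stack (vector 'a 'b 'c 'd))
(install-values! plain-stack 1 '(x y) #f)

;; Boxed installation: the boxes stay in place and their contents change.
(define boxed-stack (vector (box 0) (box 0) (box 0)))
(install-values! boxed-stack 0 '(p q) #t)
```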
After rhs is evaluated, the stack is restored to its depth
from before evaluating rhs.
Represents a letrec form with lambda bindings. It
allocates a closure shell for each lambda form in
procs, installs each onto the stack in previously allocated
slots in reverse order (so that the closure shell for the last element
of procs is installed at stack position 0), fills
out each shell’s closure (where each closure normally references some
other just-created closures, which is possible because the shells have
been installed on the stack), and then evaluates body.
Skips pos elements of the stack, setting the slot afterward
to a new box containing the slot’s old value, and then runs
body. This form appears when a lambda argument is
mutated using set! within its body; calling the function
initially pushes the value directly on the stack, and this form boxes
the value so that it can be mutated later.
Represents a local-variable reference; it accesses the value in the
stack slot after the first pos slots. If unbox? is
#t, the stack slot contains a box, and a value is extracted
from the box. If clear? is #t, then after the value
is obtained, the stack slot is cleared (to avoid retaining a reference
that can prevent reclamation of the value as garbage). If
other-clears? is #t, then some later reference to
the same stack slot may clear after reading. If type is
not #f, the slot is known to hold a specific type of value.
Represents a reference to a top-level or imported variable via the
prefix array. The depth field indicates the number
of stack slots to skip to reach the prefix array, and
pos is the offset into the array.
When the toplevel is an expression, if both const?
and ready? are #t, then the variable definitely
will be defined, its value stays constant, and the constant is
effectively the same for every module instantiation. If only
const? is #t, then the value is constant, but it
may vary across instantiations. If only ready? is
#t, then the variable definitely will be defined, but its
value may change. If const? and ready? are both
#f, then a check is needed to determine whether the
variable is defined.
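The four combinations of const? and ready? can be summarized as a small decision function. This is only a restatement of the paragraph above; the function name and the result symbols are made up for illustration.

```racket
#lang racket/base

;; Classify a toplevel reference in expression position by its flags.
(define (toplevel-access-kind const? ready?)
  (cond
    ;; Defined, constant, and effectively the same in every instantiation.
    [(and const? ready?) 'constant-across-instantiations]
    ;; Constant, but the value may vary across instantiations.
    [const? 'constant-per-instantiation]
    ;; Definitely defined, but the value may change.
    [ready? 'defined-but-may-change]
    ;; Nothing is known: a definedness check is needed.
    [else 'needs-definedness-check]))

(define kinds
  (list (toplevel-access-kind #t #t)
        (toplevel-access-kind #t #f)
        (toplevel-access-kind #f #t)
        (toplevel-access-kind #f #f)))
```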
When the toplevel is the right-hand side for
def-values, then const? is #f. If
ready? is #t, the variable is marked as immutable
after it is defined.
Represents a reference to a quoted syntax object via the
prefix array. The depth field indicates the number
of stack slots to skip to reach the prefix array, and
pos is the offset into the array. The midpt value is used
internally for lazy calculation of syntax information.
Represents a function call. The rator field is the
expression for the function, and rands are the argument
expressions. Before any of the expressions are evaluated,
(length rands) uninitialized stack slots are created (to be
used as temporary space).
After test is evaluated, the stack is restored to its depth
from before evaluating test.
After each of key and val is evaluated, the stack is
restored to its depth from before evaluating key or
val.
Represents a
begin0 expression.
After each expression in seq is evaluated, the stack is
restored to its depth from before evaluating the expression.
Represents a #%variable-reference form. The toplevel
field is #t if the original reference was to a constant local
binding. The dummy field
accesses a variable bucket that strongly references its namespace (as
opposed to a normal variable bucket, which only weakly references its
namespace); it can be #f.
Represents a set! expression that assigns to a top-level or
module-level variable. (Assignments to local variables are represented
by install-value expressions.)
After rhs is evaluated, the stack is restored to its depth
from before evaluating rhs.
Represents a direct reference to a variable imported from the run-time
kernel.
10.4.4 Syntax Objects
Represents a syntax object, where wraps contain the lexical
information and tamper-status is taint information. When the
datum part is itself compound, its pieces are wrapped, too.
A supertype for lexical-information elements.
A top-level renaming.
A mark barrier.
Information about a free identifier.
A local-binding mapping from symbols to binding-set names.
Shifts module bindings later in the wrap set.
Represents a set of module and import bindings.
Represents a set of simple imports from one module within a
module-rename.
A supertype for module bindings.
A supertype for nominal paths.
Represents a simple nominal path.
Represents an imported nominal path.
Represents a phased nominal path.