3 Automated Testing Benchmark

package: redex-benchmark

Redex’s automated testing benchmark provides a collection of buggy models and utilities to test how efficiently methods of automatic test case generation are able to find counterexamples for each bug.

The benchmark is organized by pairs of generate and check functions, as described in Running Benchmark Models. Usually these are defined on a per-module basis, using pattern based rewrites applied to existing module definitions, as described in Managing Benchmark Modules. More specifically, generate and check functions are written for an existing (non-buggy) model, and then bugs are individually added to the model; for each bug, the benchmark measures how long, on average, different generate and check pairs take to find a counterexample.

3.1 Managing Benchmark Modules

This section describes utilities for making changes to existing modules to create new ones, intended to assist in adding bugs to models and keeping buggy models in sync with changes to the original model.

syntax
(define-rewrite id from ==> to
  [#:context (context-id ...)
   #:variables (variable-id ...)
   #:once-only
   #:exactly-once])

Defines a syntax transformer bound to id, the effect of which is to rewrite syntax matching the pattern from to the result expression to. The from argument should follow the grammar of a syntax-case pattern, and to acts as the corresponding result expression. The behavior of the match is the same as syntax-case, except that all identifiers in from are treated as literals with the exception of an identifier that has the same binding as a variable-id appearing in the #:variables keyword argument, which is treated as a pattern variable. (The reverse of the situation for syntax-case, where literals must be specified instead.) The rewrite will only be applied in the context of a module form, but it will be applied wherever possible within the module body, subject to a few constraints.

The rest of the keyword arguments control where and how often the rewrite may be applied. The #:once-only option specifies that the rewrite can be applied no more than once, and the #:exactly-once option asserts that the rewrite must be applied once (and no more). In both cases a syntax error is raised if the condition is not met. The #:context option searches for syntax of the form (some-id . rest), where the binding of some-id matches that of the first context-id in the #:context list, at which point it recurs on rest but drops the first id from the list. Once every context-id has been matched, the rewrite can be applied.

syntax
(define-rewrite/compose id rw-id ...)

Defines a syntax transformer bound to id, assuming that every rw-id also binds a syntax transformer, such that id has the effect of applying all of the rw-ids.

syntax
(include/rewrite path-spec mod-id rw-id ...)

If the syntax designated by path-spec is a module, the module syntax is inlined as a submodule with the identifier mod-id. Assumes each rw-id binds a syntax transformer, and applies them to the resulting module syntax. The syntax of path-spec must be same as for include.

For example, if the contents of the file mod-fx.rkt are:

"mod-fx.rkt"
#lang racket/base

(provide f)
(define x 'X!)
(define (f x) x)

Then:

> (define-rewrite xy-rw
    x ==> y
    #:context (f)
    #:once-only)
> (require "mod-fx.rkt")
> (f 3)
3
> (include/rewrite "mod-fx.rkt" submod-fx xy-rw)
> (require (prefix-in s: 'submod-fx))
> (s:f 3)
'X!

3.2 Running Benchmark Models

struct
(struct run-results (tries time cexps))
  tries : natural-number/c
  time : natural-number/c
  cexps : natural-number/c

Returned by run-gen-and-check. Minimal results for one run of a generate and check pair.

procedure
(run-gen-and-check get-gen
check
seconds
[ #:name name
#:type type]) → run-results?
  get-gen : (-> (-> any/c))
  check : (-> any/c boolean?)
  seconds : natural-number/c
  name : string? = "unknown"
  type : symbol? = 'unknown

Repeatedly generates random terms and checks if they are counterexamples to some property defined by check (a term is considered a counterexample if check returns #f for that term). The thunk get-gen is assumed to return a fresh generator, which can then be called repeatedly to generate random terms. (The outer thunk is assumed to reset the generator, for generators that have internal state. The generator is reset each time the property is found to be false.) Each term is passed to check to see if it is a counterexample. The interval in milliseconds between counterexamples is tracked, and the process is repeated either until the time specified by seconds has elapsed or the standard error in the average interval between counterexamples is less than 10% of the average.

Returns an instance of run-results containing the total number of terms generated, the total elapsed time, and the number of counterexamples found. More detailed information can be obtained using the benchmark logging facilities, for which name is refers to the name of the model, and type is a symbol indicating the generation type used.

procedure
(run-gen-and-check/mods gen-mod-path
check-mod-path
seconds
[ #:name name]) → run-results?
  gen-mod-path : module-path?
  check-mod-path : module-path?
  seconds : natural-number/c
  name : string? = "unknown"

Just like run-gen-and-check, except that gen-mod-path and check-mod-path are module paths to a generator module and a check module, which are assumed to have the following characteristics:

A generator module provides the function get-generator, which meets the specification for the get-gen argument to run-gen-and-check, and type, which is a symbol designating the type of the generator.
A check module provides the function check, which meets the specification for the check argument to run-gen-and-check.

3.3 Logging

struct
(struct bmark-log-data (data))
data : any/c

Contains data logged by the benchmark, as described below.

Detailed information gathered during a benchmark run is logged to the current-logger, at the 'info level, with the message "BENCHMARK-LOGGING". The data field of the log message contains a bmark-log-data struct, which wraps data of the form:

log-data

(list event timestamp data-list)

Where event is a symbol that designates the type of event, and timestamp is symbol that contains the current-date of the event in ISO-8601 format. The information in data-list depends on the event, but must be in the form of a list alternating between a keyword and a datum, where the keyword is a short description of the datum.

The following events are logged (the symbol designating the event is in parentheses, and the form of the data logged for each event is shown):

Run starts ('start), logged when beginning a run with a new generate/check pair.
data-list = (list '#:model model '#:type gen)

Run completions ('finished), logged at the end of a run.

data-list

(list '#:model model '#:type gen '#:time-ms time

'#:attempts tries

'#:num-counterexamples countxmps

'#:rate-terms/s rate '#:attempts/cexp atts)

Every counterexample found ('counterexample).
  data-list =
(list '#:model model '#:type gen
      '#:counterexample term '#:iterations tries
      '#:time time)
New average intervals between counterexamples ('new-average), which are recalculated whenever a counterexample is found.
data-list =
(list '#:model model '#:type gen
'#:average avg '#:stderr err)
Major garbage collections ('gc-major).
data-list = (list '#:amount amount '#:time time)
Heartbeats ('hearbeat) are logged every 10 seconds by the benchmark as a liveness check.
data-list = (list '#:model model '#:type gen)
Timeouts ('timeout), which occur when generating or checking a single takes term longer than 5 minutes.
data-list =
(list '#:during 'check '#:term term '#:model model
'#:type gen)
| (list '#:during 'generation '#:model model '#:type gen)

procedure
(benchmark-logging-to filename thunk) → any/c
filename : string?
thunk : (-> any/c)

Intercepts events logged by the benchmark and writes the data specified by the log-data production above to filename.

parameter
(bmark-log-directory)
→ (or/c path-string? path-for-some-system? 'up 'same)
(bmark-log-directory directory) → void?
directory : (or/c path-string? path-for-some-system? 'up 'same)
= (current-directory)

Controls the directory where filename in benchmark-logging-to is located.

3.4 Plotting

Plotting and analysis tools consume data of the form produced by the benchmark logging facilities (see Logging).

TODO!

3.5 Benchmark Models

The models included in the distribution of the benchmark are in the "redex/benchmark/models" subdirectory of the redex-benchmark package. Each such subdirectory contains an info file named according to the pattern "<name>-info.rkt", defining a module that provides the function:

procedure
(all-mods)
→ (listof (list/c string? module-path? module-path?))

Returns a list of generate and check pairs for a given model or set of models, such that for each pair the first element is the name of the model, the second is a module defining a generator, and the third is a module defining a check function.

The file "redex/benchmark/models/all-info.rkt" provides an all-mods function listing all of the generate and check pairs included in the benchmark.

A command line interface is provided by the file "redex/benchmark/run-benchmark.rkt", which takes an “info” file as described above as its primary argument and provides options for running the listed tests. It automatically writes results from each run to a separate log file, all of which are located in a temporary directory. (The directory path is printed to standard out at the beginning of the run).

top ← prev up next →

1	Amb: A Redex Tutorial
2	The Redex Reference
3	Automated Testing Benchmark
	Index

3.1	Managing Benchmark Modules
3.2	Running Benchmark Models
3.3	Logging
3.4	Plotting
3.5	Benchmark Models