3 Automated Testing Benchmark
(require redex/benchmark) | package: redex-benchmark |
Redex’s automated testing benchmark provides a collection of buggy models and utilities to test how efficiently methods of automatic test case generation are able to find counterexamples for each bug.
The benchmark is organized by pairs of generate and check functions, as described in Running Benchmark Models. Usually these are defined on a per-module basis, using pattern based rewrites applied to existing module definitions, as described in Managing Benchmark Modules. More specifically, generate and check functions are written for an existing (non-buggy) model, and then bugs are individually added to the model; for each bug, the benchmark measures how long, on average, different generate and check pairs take to find a counterexample.
3.1 Managing Benchmark Modules
This section describes utilities for making changes to existing modules to create new ones, intended to assist in adding bugs to models and keeping buggy models in sync with changes to the original model.
syntax
(define-rewrite id from ==> to [#:context (context-id ...) #:variables (variable-id ...) #:once-only #:exactly-once])
Defines a syntax transformer bound to id, the effect of which is to rewrite syntax matching the pattern from to the result expression to. The from argument should follow the grammar of a syntax-case pattern, and to acts as the corresponding result expression. The behavior of the match is the same as syntax-case, except that all identifiers in from are treated as literals with the exception of an identifier that has the same binding as a variable-id appearing in the #:variables keyword argument, which is treated as a pattern variable. (The reverse of the situation for syntax-case, where literals must be specified instead.) The rewrite will only be applied in the context of a module form, but it will be applied wherever possible within the module body, subject to a few constraints.
The rest of the keyword arguments control where and how often the rewrite may be applied. The #:once-only option specifies that the rewrite can be applied no more than once, and the #:exactly-once option asserts that the rewrite must be applied once (and no more). In both cases a syntax error is raised if the condition is not met. The #:context option searches for syntax of the form (some-id . rest), where the binding of some-id matches that of the first context-id in the #:context list, at which point it recurs on rest but drops the first id from the list. Once every context-id has been matched, the rewrite can be applied.
syntax
(define-rewrite/compose id rw-id ...)
syntax
(include/rewrite path-spec mod-id rw-id ...)
"mod-fx.rkt"
#lang racket/base (provide f) (define x 'X!) (define (f x) x)
> (define-rewrite xy-rw x ==> y #:context (f) #:once-only)
> (require "mod-fx.rkt")
> (f 3) 3
> (include/rewrite "mod-fx.rkt" submod-fx xy-rw)
> (require (prefix-in s: 'submod-fx))
> (s:f 3) 'X!
3.2 Running Benchmark Models
struct
(struct run-results (tries time cexps))
tries : natural-number/c time : natural-number/c cexps : natural-number/c
procedure
(run-gen-and-check get-gen check seconds [ #:name name #:type type]) → run-results? get-gen : (-> (-> any/c)) check : (-> any/c boolean?) seconds : natural-number/c name : string? = "unknown" type : symbol? = 'unknown
Returns an instance of run-results containing the total number of terms generated, the total elapsed time, and the number of counterexamples found. More detailed information can be obtained using the benchmark logging facilities, for which name is refers to the name of the model, and type is a symbol indicating the generation type used.
procedure
(run-gen-and-check/mods gen-mod-path check-mod-path seconds [ #:name name]) → run-results? gen-mod-path : module-path? check-mod-path : module-path? seconds : natural-number/c name : string? = "unknown"
A generator module provides the function get-generator, which meets the specification for the get-gen argument to run-gen-and-check, and type, which is a symbol designating the type of the generator.
A check module provides the function check, which meets the specification for the check argument to run-gen-and-check.
3.3 Logging
struct
(struct bmark-log-data (data))
data : any/c
Detailed information gathered during a benchmark run is logged to the current-logger, at the 'info level, with the message "BENCHMARK-LOGGING". The data field of the log message contains a bmark-log-data struct, which wraps data of the form:
log-data | = | (list event timestamp data-list) |
Where event is a symbol that designates the type of event, and timestamp is symbol that contains the current-date of the event in ISO-8601 format. The information in data-list depends on the event, but must be in the form of a list alternating between a keyword and a datum, where the keyword is a short description of the datum.
- Run starts ('start), logged when beginning a run with a new generate/check pair.
data-list = (list '#:model model '#:type gen) - Run completions ('finished), logged at the end of a run.
data-list = (list '#:model model '#:type gen '#:time-ms time '#:attempts tries '#:num-counterexamples countxmps '#:rate-terms/s rate '#:attempts/cexp atts) - Every counterexample found ('counterexample).
data-list = (list '#:model model '#:type gen '#:counterexample term '#:iterations tries '#:time time) - New average intervals between counterexamples ('new-average), which are recalculated whenever a counterexample is found.
data-list = (list '#:model model '#:type gen '#:average avg '#:stderr err) - Major garbage collections ('gc-major).
data-list = (list '#:amount amount '#:time time) - Heartbeats ('hearbeat) are logged every 10 seconds by the benchmark as a liveness check.
data-list = (list '#:model model '#:type gen) - Timeouts ('timeout), which occur when generating or checking a single takes term longer than 5 minutes.
procedure
(benchmark-logging-to filename thunk) → any/c
filename : string? thunk : (-> any/c)
parameter
→ (or/c path-string? path-for-some-system? 'up 'same) (bmark-log-directory directory) → void? directory : (or/c path-string? path-for-some-system? 'up 'same)
= (current-directory)
3.4 Plotting
Plotting and analysis tools consume data of the form produced by the benchmark logging facilities (see Logging).
TODO!
3.5 Benchmark Models
The models included in the distribution of the benchmark are in the "redex/benchmark/models" subdirectory of the redex-benchmark package. Each such subdirectory contains an info file named according to the pattern "<name>-info.rkt", defining a module that provides the function:
procedure
→ (listof (list/c string? module-path? module-path?))
The file "redex/benchmark/models/all-info.rkt" provides an all-mods function listing all of the generate and check pairs included in the benchmark.
A command line interface is provided by the file "redex/benchmark/run-benchmark.rkt", which takes an “info” file as described above as its primary argument and provides options for running the listed tests. It automatically writes results from each run to a separate log file, all of which are located in a temporary directory. (The directory path is printed to standard out at the beginning of the run).