Previously, we had discussed how delta-debugging worked, and I had explained at that time that when it comes
to preserving semantics, the only options are either custom passes such as CReduce
or commandeering the generator as done by Hypothesis.
Of the two, the Hypothesis approach is actually more generalizable to arbitrary generators. Hence we will look at how it is done. For ease
of naming, I will call this approach the generator reduction approach. Note that we use the simple delta debug on the choice sequences.
This is different from Hypothesis in that Hypothesis uses a number of custom passes rather than delta debug. For further information
on Hypothesis, please see the paper by MacIver et al.1 at ECOOP.
For the generator reduction to work, we need a generator in the first place. So, we start with a rather simple generator that we discussed previously.
Contents
Important:Pyodide takes time to initialize.
Initialization completion is indicated by a red border around Run all button.
Available Packages
These are packages that refer either to my previous posts or to pure python
packages that I have compiled, and is available in the below locations. As
before, install them if you need to run the program directly on the machine.
We have a grammar describes a simple assignment language.
Using this grammar.
The context free grammar assignment_grammar generates assignment expressions. However, it tends to
use variables before they are defined. We want to avoid that. However, using only defined variables is a context sensitive feature, which we incorporate
by a small modification to the fuzzer.
We now allow only defined variables to be used for later expansion. The helper procedures defining_var is invoked
when we produce the left hand side of the variable assignment, and the defined_var is invoked when the variable is
referred to from the right hand side. Hence defined_var ensures only defined vars are used. The sync function
ensures that the definition is complete only when the assignment is finished.
Note that the modifications assume the knowledge of the <var> key in the grammar defined in the driver.
The driver now includes a context sensitive grammar in the form of pre and post functions.
As you can see, the variables used are only those that were defined earlier. So, how do we minimize such a generated string?
For the answer, we need to modify our fuzzer a bit more. We need to make it take a stream of integers which are interpreted as the choices at each step.
The choice sequence both keeps track of all choices made, and also allows one to reuse previous choices.
The driver is as follows
The choice sequence is printed out at the end. The same sequence can be used later, to produce the same string. We use this
in the next step. Now, all that we need is to hook up the predicate for ddmin, and its definitions.
First, the traditional ddmin that works on independent deltas that we defined in the previous post.
The ddmin now operates on choice sequences. So we need to convert them back to string
We also need our predicate. Note that we specialcase None in case the ints_to_string cannot successfully produce a value.
The driver tries to minimize the string if predicate returns true.
As you can see, the original string that is a 61 choice long sequence has become reduced to an 8 choice long sequence, with a corresponding
decrease in the string length. At this point, note that it is fairly magic how the approach performs. In particular, as soon as an edit is made,
the remaining choices are not interpreted as in the original string. What if we help the reducer by specifying an NOP that allows one to delete
chunks with a chance for the remaining string to be interpreted similarly?
The idea is to delete a sequence of values and replace it by a single -1 value which will cause the choice fuzzer to interpret it as fill in
with default value. The ddmin is modified as follows:
Next, we need to get our fuzzer to understand the -1 value.
We add defaults to each nonterminal, and modify the select function to take a default value.
Rebinding our grammar
The choice sequence now returns the default when it sees the -1 value.
These are all the modifications that we require.
How does this modification fare against the original without modification?
With seed 5
Another:
With seed 9
Another:
With seed 16
There does not seem to be a lot of advantage in using an NOP.
Next: How does this compare against the custom passes of Hypothesis? and how does it compare against direct delta debug and variants of HDD including Perses.