Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.
This post is the implementation of my paper Input Algebras
In my previous post on DDSet I explained how
one can abstract failure inducing inputs. The idea is that given an input such
'1+((2*3/4))' which induces a
failure in the program, one can extract a minimal
failure inducing input such as say
'((4))', which can again be abstracted to
produce the expression
((<expr>)) which precisely defines the fault
inducing input. Such an input is quite useful for the developers to understand
why a failure occurred.
However, such inputs are insufficient from a testing perspective. For example,
such a pattern fails to capture the contexts in which the doubled parenthesis
can appear, and hence induce the failure. That is, how do we use the
((<expr>)) to produce inputs such as
'1+((2*3/4))'? This is what
this post will discuss.
Note that the guarantee of inducing failures is statistical. That is, the abstract input is mined by evaluating produced inputs for failure a fixed (configurable) number of times. It is possible that some rare inputs may still fail to induce the failure. Since abstract possibly failure inducing inputs is a mouthful, let us call these abstract failure inducing inputs evocative patterns for short. In this post, we will see how to transform such an evocative pattern to an evocative grammar that is guaranteed to produce evocative inputs (that is, each input contains an evocative fragment and the derivation tree of the input contains an evocative subtree) in all contexts that are guaranteed (statistically) to induce failures.
As before, let us start with importing our required modules.
We import the following modules
We have our input that causes the failure.
We first parse the input
Then reduce input
Finally, extract the abstract pattern.
However, it does not actually tell you in what all
circumstances the failure can be reproduced. The problem is that it
represents a complete abstract input. The only general part here is
<expr> between the doubled parenthesis. So, it tells us that we can
((2 + 3)) etc. but these are not the only possible
errors. Indeed, our original error: ‘1+((2*3/4))’ does not fit this
template. So, how can we rectify this limitation?
A grammar that produces at least one evocative fragment per input generated.
The basic idea is that if we can find an abstract subtree of the evocative pattern derivation tree, such that the presence of this abstract subtree in the input guarantees the failure, then we can modify the grammar such that this abstract subtree is always present (we call the abstract subtree an evocative subtree). That is, for any input from such a grammar, at least one instance of the evocative subtree will be present. This is fairly easy to do if the generated tree contains a nonterminal of the same kind as that of the top node of the evocative subtree. Simply replace that node with the evocative subtree, and fill in the abstract nonterminals in the evocative subtree with concrete expansions.
On the other hand, this gives us an idea. What if we modify the grammar
such that at least one instance of such a nonterminal is present? Such
a grammar is called the
To produce a reaching grammar, first we need to find what nonterminals are
reachable from the expansion of a given nonterminal.
<A> is reachable from another nonterminal
<B> if and only
if one of the expansion rules for
<B> contains the nonterminal
it is reachable from one of the nonterminals in the expansion rules of
Note that it is not enough to be the same nonterminal. That is, (for e.g.)
<A> is not # reachable from
<A> if expansion rules of
<A> contain only
We can now collect it in a data structure for easier access
That is, if we want to know the nonterminals that are reachable from
we can simply do
That is, only
<integer> are reachable from the expansion of
Next, given an evocative subtree, we want to find what tokens of the grammar can actually embed such a subtree. We call the top node of an evocative subtree its characterizing node, and the nonterminal the characterizing nonterminal of the evocative subtree.
Say we assume that
<factor> is the characterizing nonterminal. Here are the
locations in grammar where one can embed the evocative subtree.
Given the insertion locations, can we produce a grammar such that we can
guarantee that at least one instance of evocative subtree can be inserted?
To do that, all we need to guarantee is that the start node in any derivation
tree can reach the characterizing nonterminal.
For now, let us call our fault
F1. Let us indicate that our start symbol
guarantees reachability of characterizing nonterminal by specializing it. So, our new
start symbol is
Next, for the start symbol to guarantee reachability to characterizing nonterminal, all that we need to ensure is that all the expansion rules of start can reach the characterizing nonterminal. On the other hand, for a guarantee that an expansion rule can reach the characterizing nonterminal, all that is required is that one of the nonterminals in that rule guarantees reachability. We start with a few helper functions
It is used as follows
Next, we need to ensure that our evocative subtree can form a unique subtree. For that, all we need to do is that all nodes in the subtree are named uniquely. Not all nodes in the evocative subtree needs unique names however. DDSet produces trees such that some nodes in the tree are left abstract. We leave these with the original node names.
We can extract a unique pattern grammar from this tree.
We can convert this to grammar, but first, to display the grammar properly, we
pattern_grammar() that wraps both calls.
We define a procedure to reverse our pattern grammar to ensure we can get back our tree.
The tree can be recovered thus.
Given the reaching grammar and the pattern grammar, we can combine them to produce the complete grammar
Here, you will notice a problem:
There are a few nonterminals such as
<integer F1> that do not have a
definition. That is, it has no expansion (not even empty expansion). So,
any rule that uses it will also by definition have no possible expansions.
This has consequences during generation, forcing us to abandon partially
constructed trees. Hence, we define a
Cleanup of the grammar
Let us make sure that we are operating on a copy of the grammar.
Next, we find the empty keys in the grammar that do not have a definition.
Now, we need to remove such empty definitions, which also means any rules that
refer to the corresponding nonterminals also have to be removed. The
remove_nonterminal function takes a nonterminal, and removes its references
from the given grammar.
When removing a rule, it can also happen that the corresponding nonterminal may be left with no further rules. Hence, we need to define finding and removing empty nonterminals from the grammar recursively.
These gives us the
At this point we are ready to define our
The new grammar is as follows
This grammar is now guaranteed to produce at least one instance of the characterizing node.
This seems to work. However, there is a wrinkle. Note that we set the
evocative subtree as this subtree
pattern, which is a
<factor> node. But why not any of the other subtrees? for example, why not
evocative_pattern itself? Indeed, the effectiveness of
our specialized grammar is dependent on finding as small a evocative subtree
as possible that fully captures the fault. So, how do we automate it?
The idea is fairly simple. We start from the root of the evocative pattern.
We know that this pattern fully captures the predicate. That is, any valid input
generated from this pattern by filling in the abstract nodes is (statistically)
guaranteed to produce the failure. Next, we move to the children of this node.
If the node is abstract, we return immediately. If however, the child is not
abstract, we let the child be the evocative subtree, and try to reproduce the
failure. If the failure is not reproduced, we move to the next child. If the
failure is reproduced we recurse deeper into the current child.
That is, we found the correct evocative subtree. We can confirm this by comparing it against the subtree we identified manually.
We save this subtree for later use.
Here is another evocative pattern, but we define a different predicate.
Check the assertion
We first parse the input as before
Then reduce input
Finally, extract the evocative pattern.
Then extract the evocative subtree
We save this evocative pattern for later use.
The new grammar is as follows
This grammar is now guaranteed to produce at least one instance of a divide by zero.
At this point, we have the ability to guarantee that a evocative
fragment is present in any inputs produced. In later posts I will discuss how
combine multiple such fragments together using
will also discuss how to ensure
at most one fragment in the input and
exactly one fragment or
exactly n fragments.
As before, the runnable source of this notebook can be found here