Important:Pyodide takes time to initialize.
Initialization completion is indicated by a red border around Run all button.
This post is the implementation of my paper Input Algebras
In my previous post on DDSet I explained how
one can abstract failure inducing inputs. The idea is that given an input such
as '1+((2*3/4))' which induces a
failure in the program, one can extract a minimal
failure inducing input such as say '((4))', which can again be abstracted to
produce the expression ((<expr>)) which precisely defines the fault
inducing input. Such an input is quite useful for the developers to understand
why a failure occurred.
However, such inputs are insufficient from a testing perspective. For example,
such a pattern fails to capture the contexts in which the doubled parenthesis
can appear, and hence induce the failure. That is, how do we use the
pattern ((<expr>)) to produce inputs such as '1+((2*3/4))'? This is what
this post will discuss.
Note that the guarantee of inducing failures is statistical. That is, the
abstract input is mined by evaluating produced
inputs for failure a fixed (configurable) number of times. It is possible that
some rare inputs may still fail to induce the failure. Since abstract possibly
failure inducing inputs is a mouthful, let us call these abstract failure
inducing inputs evocative patterns for short. In this post, we will see
how to transform such an evocative pattern to an evocative grammar that is
guaranteed to produce evocative inputs (that is, each input contains an
evocative fragment and the derivation tree of the input contains an
evocative subtree) in all contexts that are guaranteed (statistically) to
induce failures.
As before, let us start with importing our required modules.
Available Packages
These are packages that refer either to my previous posts or to pure python
packages that I have compiled, and is available in the below locations. As
before, install them if you need to run the program directly on the machine.
To install, simply download the wheel file (`pkg.whl`) and install using
`pip install pkg.whl`.
However, it does not actually tell you in what all
circumstances the failure can be reproduced. The problem is that it
represents a complete abstract input. The only general part here is
<expr> between the doubled parenthesis. So, it tells us that we can
produce ((1)), ((2 + 3)) etc. but these are not the only possible
errors. Indeed, our original error: 1+((2*3/4)) does not fit this
template. So, how can we rectify this limitation?
A grammar that produces at least one evocative fragment per input generated.
The basic idea is that if we can find an abstract subtree of the
evocative pattern derivation tree, such that the presence of this abstract
subtree in the input guarantees the failure, then we can modify the grammar
such that this abstract subtree is always present (we call the abstract
subtree an evocative subtree). That is, for any input
from such a grammar, at least one instance of the evocative
subtree will be present. This is fairly easy to do if the generated tree
contains a nonterminal of the same kind as that of the top node of the evocative subtree.
Simply replace that node with the evocative subtree, and fill in the
abstract nonterminals in the evocative subtree with concrete expansions.
Reachable Grammar
On the other hand, this gives us an idea. What if we modify the grammar
such that at least one instance of such a nonterminal is present? Such
a grammar is called the reachable_grammar.
To produce a reaching grammar, first we need to find what nonterminals are
reachable from the expansion of a given nonterminal.
A nonterminal <A> is reachable from another nonterminal <B> if and only
if one of the expansion rules for <B> contains the nonterminal <A> or
it is reachable from one of the nonterminals in the expansion rules of <B>
Note that it is not enough to be the same nonterminal. That is, (for e.g.)
<A> is not reachable from <A> if expansion rules of <A> contain only
terminal symbols.
We can now collect it in a data structure for easier access
That is, if we want to know the nonterminals that are reachable from <integer>,
we can simply do
That is, only <digit> and <integer> are reachable from the expansion of
nonterminal <integer>
Reachable positions.
Next, given an evocative subtree, we want to find what tokens of the grammar
can actually embed such a subtree. We call the top node of an evocative
subtree its characterizing node, and the nonterminal the characterizing
nonterminal of the evocative subtree.
Say we assume that <factor> is the characterizing nonterminal. Here are the
locations in grammar where one can embed the evocative subtree.
Insertion Grammar
Given the insertion locations, can we produce a grammar such that we can
guarantee that at least one instance of evocative subtree can be inserted?
To do that, all we need to guarantee is that the start node in any derivation
tree can reach the characterizing nonterminal.
For now, let us call our fault F1. Let us indicate that our start symbol
guarantees reachability of characterizing nonterminal by specializing it. So, our new
start symbol is <start F1>
Next, for the start symbol to guarantee reachability to characterizing nonterminal,
all that we need to ensure is that all the expansion rules of start can
reach the characterizing nonterminal. On the other hand, for a guarantee that an
expansion rule can reach the characterizing nonterminal, all that is required is that
one of the nonterminals in that rule guarantees reachability.
We start with a few helper functions
Defining the reachable_key()
It is used as follows
Pattern Grammar
Next, we need to ensure that our evocative subtree can form a unique subtree.
For that, all we need to do is that all nodes in the subtree are named uniquely.
Not all nodes in the evocative subtree needs unique names however. DDSet
produces trees such that some nodes in the tree are left abstract. We leave
these with the original node names.
Using
We can extract a unique pattern grammar from this tree.
We can convert this to grammar, but first, to display the grammar properly, we
define display_grammar()
Using it.
We define pattern_grammar() that wraps both calls.
Using it.
We define a procedure to reverse our pattern grammar to ensure we can
get back our tree.
The tree can be recovered thus.
Given the reaching grammar and the pattern grammar, we can combine them to
produce the complete grammar
Using it
Here, you will notice a problem:
There are a few nonterminals such as <integer F1> that do not have a
definition. That is, it has no expansion (not even empty expansion). So,
any rule that uses it will also by definition have no possible expansions.
This has consequences during generation, forcing us to abandon partially
constructed trees. Hence, we define a grammar_gc()
Cleanup of the grammar
Let us make sure that we are operating on a copy of the grammar.
Next, we find the empty keys in the grammar that do not have a definition.
Now, we need to remove such empty definitions, which also means any rules that
refer to the corresponding nonterminals also have to be removed. The
remove_nonterminal function takes a nonterminal, and removes its references
from the given grammar.
When removing a rule, it can also happen that the corresponding nonterminal
may be left with no further rules. Hence, we need to
define finding and removing empty nonterminals from the grammar recursively.
These gives us the grammar_gc()
At this point we are ready to define our atleast_one_fault_grammar()
The new grammar is as follows
This grammar is now guaranteed to produce at least one instance of the characterizing node.
This seems to work. However, there is a wrinkle. Note that we set the
evocative subtree as this subtree pattern[1][0][1][0][1][0], which is a
<factor> node. But why not any of the other subtrees? for example, why not
evocative_pattern itself? Indeed, the effectiveness of
our specialized grammar is dependent on finding as small a evocative subtree
as possible that fully captures the fault. So, how do we automate it?
The idea is fairly simple. We start from the root of the evocative pattern.
We know that this pattern fully captures the predicate. That is, any valid input
generated from this pattern by filling in the abstract nodes is (statistically)
guaranteed to produce the failure. Next, we move to the children of this node.
If the node is abstract, we return immediately. If however, the child is not
abstract, we let the child be the evocative subtree, and try to reproduce the
failure. If the failure is not reproduced, we move to the next child. If the
failure is reproduced we recurse deeper into the current child.
Usage
That is, we found the correct evocative subtree. We can confirm this by
comparing it against the subtree we identified manually.
We save this subtree for later use.
Here is another evocative pattern, but we define a different predicate.
Check the assertion
We first parse the input as before
Then reduce input
Finally, extract the evocative pattern.
Then extract the evocative subtree
prettyprinting
using
We save this evocative pattern for later use.
The new grammar is as follows
This grammar is now guaranteed to produce at least one instance of a divide by zero.
At this point, we have the ability to guarantee that a evocative
fragment is present in any inputs produced. In later posts I will discuss how
combine multiple such fragments together using and, or or negation. I
will also discuss how to ensure at most one fragment in the input and
exactly one fragment or exactly n fragments.
As before, the runnable source of this notebook can be found here
Artifacts
The runnable Python source for this notebook is available here.
The installable python wheel gatleastsinglefault is available here.