- Producing inputs with two fault inducing fragments guaranteed to be present.
Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.
This post is the implementation of my paper Input Algebras
In my previous post on inducing faults I explained the deficiency of abstract failure inducing inputs mined using DDSet, and showed how to overcome that by inserting that abstract (evocative) pattern into a grammar, producing evocative grammars that guarantee that the evocative fragment is present in any input generated. However, what if one wants to produce inputs that contain two evocative fragments? or wants to produce inputs that are guaranteed to contain at least one of them? This is what we will discuss in this post.
As before, let us start with importing our required modules.
Available PackagesThese are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.
- earleyparser-0.0.1-py2.py3-none-any.whl from "Earley Parser".
- hdd-0.0.1-py2.py3-none-any.whl from "Hierarchical Delta Debugging".
- simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".
- ddset-0.0.1-py2.py3-none-any.whl from "Simple DDSet".
- gatleastsinglefault-0.0.1-py2.py3-none-any.whl from "Specializing Context-Free Grammars for Inducing Faults".
The imported modules
Producing inputs with two fault inducing fragments guaranteed to be present.
From the previous post inducing faults we extracted two evocative subtrees
We now want to produce a grammar such that any input produced from that grammar is guaranteed to contain both evocative subtrees. First, let us extract the corresponding grammars. Here is the first one
We save this for later
Here is the second grammar.
We save this for later
Now, we want to combine these grammars. Remember that a gramamr has a set of
definitions that correspond to nonterminals, and each definition has a set of
rules. We start from the rules. If we want to combine two grammars, we need
to make sure that any input produced from the combined grammar is also parsed
by the original grammars. That is, any rule from the combined grammar should
have a corresponding rule in the original grammars. This gives us the
algorithm for combining two rules. First, we can only combine rules that have
similar base representation. That is, if ruleA is
[<A f1>, <B f2>, 'T']
<B> are nonterminals and
T is a terminal
and ruleB is
[<A f1>, <C f3>], these can’t have a combination in the
combined grammar. On the other hand, if ruleB is
[<A f3>, <B f4> 'T']
then, a combined rule of
[<A f1 & f3>, <B f2 & f4>, 'T'] can infact
represent both parent rules. That is, when combining two rules from different,
grammars, their combination is empty if they have different base
The idea for combining two definitions of nonterminals is simply using the
distributive law. A definition is simply #
A1 or B1 or C1 where
A1 etc are
rules. Now, when you want to and two defintions, you have
and(A1 or B1 or C1, A2 or B2 or C2) , and you want the
or out again.
So, this becomes
(A1 AND A2) OR (A1 AND B2) OR (A1 AND C2) OR (A2 AND B1) OR (A2 AND C1) OR (B1 AND B2) OR (B1 AND C2) OR (B2 AND C1) OR (C1 AND C2)
which is essentially that many rules.
If they have the same base representation, then we only have to deal with how
to combine the nonterminal symbols. The terminal symbols are exactly the same
in parent rules as well as combined rule. So, given two tokens, we can
combine them as follows. The
and of a refined nonterminal and a base
nonterminal is always the refined nonterminal. Otherwise, it is simply an
and() specialization of both refinements.
Next, we define combination for rules
Next, our grammars may contain multiple rules that represent the same base rule. All the rules that represent the same base rule is called a ruleset. combining two rulesets is done by producing a new ruleset that contains all possible pairs of rules from the parent ruleset.
Next, we define a few helper functions that collects all rulesets
Now, we can define the conjunction of definitions as follows.
We can now define our grammar conjunction as follows.
This grammar is now guaranteed to produce instances of both characterizing subtrees.
Producing inputs with at least one of the two fault inducing fragments guaranteed to be present.
How do we construct grammars that are guaranteed to contain at least one of
the evocative patterns? This is actually much less complicated than
The idea is simply using the distributive law. A definition is simply
A1 or B1 or C1 as before where
A1 etc are rules.
Now, when you want to
or two definitions, you have
or(A1 or B1 or C1, A2 or B2 or C2), then it simply becomes
A1 or B1 or C1 or A2 or B2 or C2
At this point, our work is essentially done. All that we need to do
is to merge any rules that potentially allow us to merge. However, there is
one constraint. When we do a disjunction, it is possible that the rules being
combined may contain common parts. In such cases, a naive disjunction will
produce rules which can lead to ambiguous grammars. So, we have to be more
careful to remove ambiguity.
For nonterminals, it is similar to
and except that the base cases differ.
or of a base nonterminal with a refined nonterminal is always the base.
What rules can be merged? Only those rules can be merged that has
a single refinement difference. That is if we have
or(<A 1> <B 5> <C>, <A 2> <B 5> <C>), then this merges to
<A or(1,2)><B 5><C>. However
or(<A 1> <B 5> <C>, <A 2> <B 6> <C>)
is not mergeable directly. For now, we leave such rules separate. However,
note that this can lead to ambiguity in grammar. In a later post, we will
show how to remove this ambiguity.
As a hint as to what we can do to remove ambiguity,
(<A a1> <B a2>) or (<A a3> <B a4>), we replace it with
(<A a1> <B a2>) and (<A a3> <B a4>) | (<A a1> <B a2>) and neg(<A a3> <B a4>) | (<A a3> <B a4>) and neg(<A a1> <B a2>)
However, we do not do that here as
neg is yet to be implemented.
or rulesets we first combine
both rulesets together then (optional) take one at a time,
and check if it can be merged with another.
Now, we can define the disjunction of definitions as follows.
This grammar is now guaranteed to produce at least one of the evocative subtrees Using it.
As before, the runnable source of this notebook can be found here
The runnable Python source for this notebook is available here.
The installable python wheel
gmultiplefaults is available here.