Contents

  1. Producing inputs with two fault inducing fragments guaranteed to be present.
    1. And Grammars
      1. Combining tokens
    2. Combining rules
      1. Combining rulesets
      2. definition conjunction
      3. grammar conjunction
    3. Producing inputs with at least one of the two fault inducing fragments guaranteed to be present.
      1. Nonterminals
    4. rules
    5. rulesets
      1. definition disjunction
      2. grammar disjunction
    6. Artifacts

Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.

This post is the implementation of my paper Input Algebras

In my previous post on inducing faults I explained the deficiency of abstract failure inducing inputs mined using DDSet, and showed how to overcome that by inserting that abstract (evocative) pattern into a grammar, producing evocative grammars that guarantee that the evocative fragment is present in any input generated. However, what if one wants to produce inputs that contain two evocative fragments? or wants to produce inputs that are guaranteed to contain at least one of them? This is what we will discuss in this post.

As before, let us start with importing our required modules.

Available Packages These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.
  1. earleyparser-0.0.1-py2.py3-none-any.whl from "Earley Parser".
  2. hdd-0.0.1-py2.py3-none-any.whl from "Hierarchical Delta Debugging".
  3. simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".
  4. ddset-0.0.1-py2.py3-none-any.whl from "Simple DDSet".
  5. gatleastsinglefault-0.0.1-py2.py3-none-any.whl from "Specializing Context-Free Grammars for Inducing Faults".

The imported modules



Producing inputs with two fault inducing fragments guaranteed to be present.

From the previous post inducing faults we extracted two evocative subtrees



We now want to produce a grammar such that any input produced from that grammar is guaranteed to contain both evocative subtrees. First, let us extract the corresponding grammars. Here is the first one



We save this for later



Here is the second grammar.



We save this for later



And Grammars

Now, we want to combine these grammars. Remember that a gramamr has a set of definitions that correspond to nonterminals, and each definition has a set of rules. We start from the rules. If we want to combine two grammars, we need to make sure that any input produced from the combined grammar is also parsed by the original grammars. That is, any rule from the combined grammar should have a corresponding rule in the original grammars. This gives us the algorithm for combining two rules. First, we can only combine rules that have similar base representation. That is, if ruleA is [<A f1>, <B f2>, 'T'] where <A> and <B> are nonterminals and T is a terminal and ruleB is [<A f1>, <C f3>], these can’t have a combination in the combined grammar. On the other hand, if ruleB is [<A f3>, <B f4> 'T'] then, a combined rule of [<A f1 & f3>, <B f2 & f4>, 'T'] can infact represent both parent rules. That is, when combining two rules from different, grammars, their combination is empty if they have different base representation.

The idea for combining two definitions of nonterminals is simply using the distributive law. A definition is simply # A1 or B1 or C1 where A1 etc are rules. Now, when you want to and two defintions, you have and(A1 or B1 or C1, A2 or B2 or C2) , and you want the or out again. So, this becomes

(A1 AND A2) OR (A1 AND B2) OR (A1 AND C2) OR
(A2 AND B1) OR (A2 AND C1) OR
(B1 AND B2) OR (B1 AND C2) OR
(B2 AND C1) OR (C1 AND C2)

which is essentially that many rules.

Combining tokens

If they have the same base representation, then we only have to deal with how to combine the nonterminal symbols. The terminal symbols are exactly the same in parent rules as well as combined rule. So, given two tokens, we can combine them as follows. The and of a refined nonterminal and a base nonterminal is always the refined nonterminal. Otherwise, it is simply an and() specialization of both refinements.



Using it.



Combining rules

Next, we define combination for rules



Using it.



Combining rulesets

Next, our grammars may contain multiple rules that represent the same base rule. All the rules that represent the same base rule is called a ruleset. combining two rulesets is done by producing a new ruleset that contains all possible pairs of rules from the parent ruleset.



Using it.



Next, we define a few helper functions that collects all rulesets



definition conjunction

Now, we can define the conjunction of definitions as follows.



Using it.



grammar conjunction

We can now define our grammar conjunction as follows.



Using it.



This grammar is now guaranteed to produce instances of both characterizing subtrees.



Producing inputs with at least one of the two fault inducing fragments guaranteed to be present.

How do we construct grammars that are guaranteed to contain at least one of the evocative patterns? This is actually much less complicated than and The idea is simply using the distributive law. A definition is simply A1 or B1 or C1 as before where A1 etc are rules. Now, when you want to or two definitions, you have or(A1 or B1 or C1, A2 or B2 or C2), then it simply becomes A1 or B1 or C1 or A2 or B2 or C2 At this point, our work is essentially done. All that we need to do is to merge any rules that potentially allow us to merge. However, there is one constraint. When we do a disjunction, it is possible that the rules being combined may contain common parts. In such cases, a naive disjunction will produce rules which can lead to ambiguous grammars. So, we have to be more careful to remove ambiguity.

Nonterminals

For nonterminals, it is similar to and except that the base cases differ. or of a base nonterminal with a refined nonterminal is always the base.



rules

What rules can be merged? Only those rules can be merged that has a single refinement difference. That is if we have or(<A 1> <B 5> <C>, <A 2> <B 5> <C>), then this merges to <A or(1,2)><B 5><C>. However or(<A 1> <B 5> <C>, <A 2> <B 6> <C>) is not mergeable directly. For now, we leave such rules separate. However, note that this can lead to ambiguity in grammar. In a later post, we will show how to remove this ambiguity. As a hint as to what we can do to remove ambiguity, if given (<A a1> <B a2>) or (<A a3> <B a4>), we replace it with

  (<A a1> <B a2>) and (<A a3> <B a4>)
| (<A a1> <B a2>) and neg(<A a3> <B a4>)
| (<A a3> <B a4>) and neg(<A a1> <B a2>)

However, we do not do that here as neg is yet to be implemented.



using



rulesets

For or rulesets we first combine both rulesets together then (optional) take one at a time, and check if it can be merged with another.



Using it.



definition disjunction

Now, we can define the disjunction of definitions as follows.



Using



grammar disjunction



Using it.



This grammar is now guaranteed to produce at least one of the evocative subtrees Using it.



As before, the runnable source of this notebook can be found here

Artifacts

The runnable Python source for this notebook is available here.

The installable python wheel gmultiplefaults is available here.