# Specializing Context-Free Grammars for Inducing Multiple Faults

## Contents

**Important:** Pyodide takes time to initialize.
Initialization completion is indicated by a red border around *Run all* button.

This post is the implementation of my paper *Input Algebras*

In my previous post on inducing faults I explained the deficiency of abstract failure inducing inputs mined using DDSet, and showed how to overcome that by inserting that abstract (evocative) pattern into a grammar, producing evocative grammars that guarantee that the evocative fragment is present in any input generated. However, what if one wants to produce inputs that contain two evocative fragments? or wants to produce inputs that are guaranteed to contain at least one of them? This is what we will discuss in this post.

As before, let us start with importing our required modules.

## Available Packages

These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.- earleyparser-0.0.1-py2.py3-none-any.whl from "Earley Parser".
- hdd-0.0.1-py2.py3-none-any.whl from "Hierarchical Delta Debugging".
- simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".
- ddset-0.0.1-py2.py3-none-any.whl from "Simple DDSet".
- gatleastsinglefault-0.0.1-py2.py3-none-any.whl from "Specializing Context-Free Grammars for Inducing Faults".

The imported modules

# Producing inputs with two fault inducing fragments guaranteed to be present.

From the previous post inducing faults we extracted two evocative subtrees

We now want to produce a grammar such that any input produced from that grammar is guaranteed to contain both evocative subtrees. First, let us extract the corresponding grammars. Here is the first one

We save this for later

Here is the second grammar.

We save this for later

## And Grammars

Now, we want to combine these grammars. Remember that a gramamr has a set of
definitions that correspond to nonterminals, and each definition has a set of
rules. We start from the rules. If we want to combine two grammars, we need
to make sure that any input produced from the combined grammar is also parsed
by the original grammars. That is, any rule from the combined grammar should
have a corresponding rule in the original grammars. This gives us the
algorithm for combining two rules. First, we can only combine rules that have
similar base representation. That is, if ruleA is `[<A f1>, <B f2>, 'T']`

where `<A>`

and `<B>`

are nonterminals and `T`

is a terminal
and ruleB is `[<A f1>, <C f3>]`

, these canâ€™t have a combination in the
combined grammar. On the other hand, if ruleB is `[<A f3>, <B f4> 'T']`

then, a combined rule of `[<A f1 & f3>, <B f2 & f4>, 'T']`

can infact
represent both parent rules. That is, when combining two rules from different,
grammars, their combination is empty if they have different base
representation.

The idea for combining two definitions of nonterminals is simply using the
distributive law. A definition is simply # `A1 or B1 or C1`

where `A1`

etc are
rules. Now, when you want to and two defintions, you have
`and(A1 or B1 or C1, A2 or B2 or C2)`

, and you want the `or`

out again.
So, this becomes

```
(A1 AND A2) OR (A1 AND B2) OR (A1 AND C2) OR
(A2 AND B1) OR (A2 AND C1) OR
(B1 AND B2) OR (B1 AND C2) OR
(B2 AND C1) OR (C1 AND C2)
```

which is essentially that many rules.

### Combining tokens

If they have the same base representation, then we only have to deal with how
to combine the nonterminal symbols. The terminal symbols are exactly the same
in parent rules as well as combined rule. So, given two tokens, we can
combine them as follows. The `and`

of a refined nonterminal and a base
nonterminal is always the refined nonterminal. Otherwise, it is simply an
`and()`

specialization of both refinements.

Using it.

## Combining rules

Next, we define combination for rules

Using it.

### Combining rulesets

Next, our grammars may contain multiple rules that represent the same base rule. All the rules that represent the same base rule is called a ruleset. combining two rulesets is done by producing a new ruleset that contains all possible pairs of rules from the parent ruleset.

Using it.

Next, we define a few helper functions that collects all rulesets

### definition conjunction

Now, we can define the conjunction of definitions as follows.

Using it.

### grammar conjunction

We can now define our grammar conjunction as follows.

Using it.

This grammar is now guaranteed to produce instances of both characterizing subtrees.

## Producing inputs with at least one of the two fault inducing fragments guaranteed to be present.

How do we construct grammars that are guaranteed to contain at least one of
the evocative patterns? This is actually much less complicated than `and`

The idea is simply using the distributive law. A definition is simply
`A1 or B1 or C1`

as before where `A1`

etc are rules.
Now, when you want to `or`

two definitions, you have
`or(A1 or B1 or C1, A2 or B2 or C2)`

, then it simply becomes
`A1 or B1 or C1 or A2 or B2 or C2`

At this point, our work is essentially done. All that we need to do
is to merge any rules that potentially allow us to merge. However, there is
one constraint. When we do a disjunction, it is possible that the rules being
combined may contain common parts. In such cases, a naive disjunction will
produce rules which can lead to ambiguous grammars. So, we have to be more
careful to remove ambiguity.

### Nonterminals

For nonterminals, it is similar to `and`

except that the base cases differ.
`or`

of a base nonterminal with a refined nonterminal is always the base.

## rules

What rules can be merged? Only those rules can be merged that has
a single refinement difference. That is if we have
`or(<A 1> <B 5> <C>, <A 2> <B 5> <C>)`

, then this merges to
`<A or(1,2)><B 5><C>`

. However `or(<A 1> <B 5> <C>, <A 2> <B 6> <C>)`

is not mergeable directly. For now, we leave such rules separate. However,
note that this can lead to ambiguity in grammar. In a later post, we will
show how to remove this ambiguity.
As a hint as to what we can do to remove ambiguity,
if given `(<A a1> <B a2>) or (<A a3> <B a4>)`

, we replace it with

```
(<A a1> <B a2>) and (<A a3> <B a4>)
| (<A a1> <B a2>) and neg(<A a3> <B a4>)
| (<A a3> <B a4>) and neg(<A a1> <B a2>)
```

However, we do not do that here as `neg`

is yet to be implemented.

using

## rulesets

For `or`

rulesets we first combine
both rulesets together then (optional) take one at a time,
and check if it can be merged with another.

Using it.

### definition disjunction

Now, we can define the disjunction of definitions as follows.

Using

### grammar disjunction

Using it.

This grammar is now guaranteed to produce at least one of the evocative subtrees Using it.

As before, the runnable source of this notebook can be found here

## Artifacts

The runnable Python source for this notebook is available here.

The installable python wheel `gmultiplefaults`

is available here.