# Conjunction, Disjunction, and Complement of Regular Grammars

## Contents

- Conjunction of two regular grammars
- Disjunction of two regular grammars
- Complement of regular grammars
- Complete
- Artifacts

**Important:** Pyodide takes time to initialize.
Initialization completion is indicated by a red border around the *Run all* button.

In the previous post, I showed how to produce a grammar out of regular expressions. In the second post, I claimed that we need a regular grammar because regular grammars have interesting properties such as being closed under intersection and complement. Now, the question is: how do we actually compute the intersection of two regular grammars? For this post, I assume that the regular expressions are in the canonical format as given in this post. That is, there are only two possible rule formats: \(A \rightarrow a B\) and \(A \rightarrow \epsilon\). Further, the canonical format requires that, within a nonterminal definition, only one rule starts with any particular terminal symbol. Refer to this post for how to convert any regular grammar to the canonical format. We start by importing the prerequisites.

## System Imports

These are available from Pyodide, but you may wish to make sure that they are installed if you are attempting to run the program directly on the machine.

- sympy

## Available Packages

These are packages that refer either to my previous posts or to pure Python packages that I have compiled, and are available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install it using `pip install pkg.whl`.

- simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".
- gatleastsinglefault-0.0.1-py2.py3-none-any.whl from "Specializing Context-Free Grammars for Inducing Faults".
- gfaultexpressions-0.0.1-py2.py3-none-any.whl from "Fault Expressions for Specializing Context-Free Grammars".
- gmultiplefaults-0.0.1-py2.py3-none-any.whl from "Specializing Context-Free Grammars for Inducing Multiple Faults".
- earleyparser-0.0.1-py2.py3-none-any.whl from "Earley Parser".
- hdd-0.0.1-py2.py3-none-any.whl from "Hierarchical Delta Debugging".
- ddset-0.0.1-py2.py3-none-any.whl from "Simple DDSet".
- rxfuzzer-0.0.1-py2.py3-none-any.whl from "iFuzzing With Regular Expressions".
- rxregular-0.0.1-py2.py3-none-any.whl from "Regular Expression to Regular Grammar".
- rxcanonical-0.0.1-py2.py3-none-any.whl from "Converting a Regular Expression to DFA using Regular Grammar".

The imported modules

There are only two patterns of rules in canonical regular grammars.

- \[A \rightarrow aB\]
- \[A \rightarrow \epsilon\]
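For concreteness, here is a small illustrative grammar in this canonical form, written in the Python dictionary format used throughout these posts (the grammar itself and its nonterminal names are my own example):

```python
# A canonical regular grammar for the language a(a|b)*: each rule is
# either [terminal, nonterminal] or [] (the epsilon rule).
G1 = {
    '<S1>': [['a', '<A1>']],
    '<A1>': [['a', '<A1>'],
             ['b', '<A1>'],
             []]            # epsilon: <A1> may end the string here.
}
```

Note the canonical restriction at work: within each definition, at most one rule starts with any given terminal symbol.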

The idea is that, when evaluating the intersection starting from the start symbols, we pair up all rules that start with the same terminal symbol. Only the intersections of these pairs exist in the resulting grammar. The intersection of

```
<A1> ::= a <B1>
```

and

```
<A2> ::= a <B2>
```

is simply

```
<and(A1,A2)> ::= a <and(B1,B2)>
```

For constructing such rules, we also need to parse the boolean expressions in the nonterminals. So, we define our grammar first.

Next, we define our expression class which is used to wrap boolean expressions and extract components of boolean expressions.

Ensure that it can parse boolean expressions in nonterminals.
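As a sketch of what such parsing involves, here is a minimal hand-rolled parser for the boolean expressions embedded in nonterminal names such as `and(A1,A2)`. It stands in for the grammar-based parser defined in the post, and its `(op, args)` tree format is my own choice:

```python
# Parse boolean expressions like 'and(A1,A2)', 'or(A,B)', 'neg(A)'
# into (op, [subtrees]) tuples; plain symbols become ('sym', [name]).
def parse_bexpr(s):
    s = s.strip()
    for op in ('and', 'or', 'neg'):
        if s.startswith(op + '(') and s.endswith(')'):
            inner = s[len(op) + 1:-1]
            # Split inner on top-level commas only, tracking parentheses.
            args, depth, start = [], 0, 0
            for i, c in enumerate(inner):
                if c == '(': depth += 1
                elif c == ')': depth -= 1
                elif c == ',' and depth == 0:
                    args.append(inner[start:i]); start = i + 1
            args.append(inner[start:])
            return (op, [parse_bexpr(a) for a in args])
    return ('sym', [s])
```

For example, `parse_bexpr('and(A1,A2)')` yields `('and', [('sym', ['A1']), ('sym', ['A2'])])`.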

## Conjunction of two regular grammars

Next, we define the conjunction of two regular grammars in the canonical format. We will define the conjunction of two definitions, and at the end discuss how to stitch them together into the complete grammar. The nice thing here is that, because our grammar is in the canonical format, conjunction, disjunction, and negation are really simple, and follow roughly the same framework.

### Conjunction of nonterminal symbols

Ensure that it works
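A minimal sketch of this naming step, assuming the `and(...)` naming convention shown earlier (the function name is illustrative, not the post's exact API):

```python
# Conjunction of two nonterminal symbols: <A1> and <A2> pair up
# into <and(A1,A2)>.
def and_nonterminals(k1, k2):
    return '<and(%s,%s)>' % (k1[1:-1], k2[1:-1])
```

For example, `and_nonterminals('<A1>', '<A2>')` yields `<and(A1,A2)>`.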

### Conjunction of rules

We only provide conjunction for those rules whose initial terminal symbols are the same, or which are both empty rules.
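The rule-level operation can be sketched as follows; the name is illustrative, and epsilon is represented directly as the empty rule `[]`:

```python
# Conjunction of two canonical rules that either both start with the
# same terminal symbol, or are both epsilon.
def and_rules(rule1, rule2):
    if rule1 == [] and rule2 == []:
        return []                        # epsilon & epsilon = epsilon
    assert rule1[0] == rule2[0]          # same starting terminal
    return [rule1[0],
            '<and(%s,%s)>' % (rule1[1][1:-1], rule2[1][1:-1])]
```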

We check to make sure that the conjunction of rules works.

### Conjunction of definitions

Checking that the conjunction of definitions works.
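A sketch of the definition-level step: pair up rules that are both epsilon or start with the same terminal, and drop rules without a partner in the other definition (names are illustrative):

```python
# Conjunction of two canonical definitions (lists of rules).
def and_definitions(d1, d2):
    new_rules = []
    for r1 in d1:
        for r2 in d2:
            if r1 == [] and r2 == []:
                new_rules.append([])     # epsilon survives only if in both
            elif r1 and r2 and r1[0] == r2[0]:
                new_rules.append([r1[0], '<and(%s,%s)>' % (
                    r1[1][1:-1], r2[1][1:-1])])
    return new_rules
```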

## Disjunction of two regular grammars

For disjunction, the strategy is the same. We line up rules in both definitions with the same starting terminal symbol, then construct the disjunction of the nonterminal parts. Unlike conjunction, however, we do not throw away the rules without partners in the other definition. Instead, they are added to the resulting definition.

### Disjunction of nonterminal symbols

Ensure that it works

### Disjunction of rules

We assume that the starting terminal symbol is the same for both rules.
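Under that assumption, the rule-level disjunction can be sketched as follows (names are illustrative; epsilon is the empty rule `[]`):

```python
# Disjunction of two canonical rules that start with the same terminal
# symbol, or are both epsilon.
def or_rules(rule1, rule2):
    if rule1 == [] and rule2 == []:
        return []                        # epsilon | epsilon = epsilon
    assert rule1[0] == rule2[0]          # same starting terminal
    return [rule1[0],
            '<or(%s,%s)>' % (rule1[1][1:-1], rule2[1][1:-1])]
```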

We check to make sure that the disjunction of rules works.

### Disjunction of definitions

We check that the disjunction of definitions works.
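A sketch of the definition-level step, relying on the canonical guarantee that each definition has at most one rule per starting terminal; paired rules are merged into an `or(...)` nonterminal, while unpaired rules are kept as-is:

```python
# Disjunction of two canonical definitions (lists of rules).
def or_definitions(d1, d2):
    first = lambda r: r[0] if r else None   # None keys the epsilon rule
    t1 = {first(r): r for r in d1}
    t2 = {first(r): r for r in d2}
    new_rules = []
    for t in t1.keys() | t2.keys():
        if t in t1 and t in t2:
            if t is None:
                new_rules.append([])        # epsilon in both
            else:
                new_rules.append([t, '<or(%s,%s)>' % (
                    t1[t][1][1:-1], t2[t][1][1:-1])])
        else:
            new_rules.append(t1[t] if t in t1 else t2[t])
    return new_rules
```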

## Complement of regular grammars

For the complement, the idea is to treat each nonterminal definition separately.

- If the nonterminal definition does not contain \(\epsilon\), we add `NT_EMPTY` to the resulting definition. If it does contain \(\epsilon\), we skip it.
- We collect all terminal symbols that start a rule in the definition. For each such rule, we add a rule that complements the nonterminal used. That is, given

```
<A> ::= a <B>
```

We produce

```
<neg(A)> ::= a <neg(B)>
```

as one of the complement rules.

- For every remaining terminal in `TERMINAL_SYMBOLS`, we add a match for any string, given by `NT_ALL_STAR` (`<.*>`).

We start by producing the complement of a single nonterminal symbol.
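A sketch of this step, assuming the `neg(...)` naming convention; `<.*>` is the `NT_ALL_STAR` symbol from above, while `<NONE>` is my own placeholder name for the empty language:

```python
NT_ALL_STAR = '<.*>'        # matches any string (from the post)
NT_NONE = '<NONE>'          # my placeholder: matches no string at all

# Complement of a single nonterminal symbol: "anything" flips to the
# empty language and vice versa; any other <A> becomes <neg(A)>.
def neg_nonterminal(k):
    if k == NT_ALL_STAR: return NT_NONE
    if k == NT_NONE: return NT_ALL_STAR
    return '<neg(%s)>' % k[1:-1]
```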

Ensure that it works

### Complement of a single rule

Ensure that it works

### Complement of a definition

We complement a definition by applying the complement to each rule in the original definition, and adding rules for any terminal symbols that did not start a rule in the original definition.
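The three steps above can be sketched as follows, over an assumed alphabet `TERMINAL_SYMBOLS`. Note that I represent the epsilon rule directly as `[]` where the post adds the `NT_EMPTY` nonterminal:

```python
TERMINAL_SYMBOLS = ['a', 'b']   # assumed alphabet for this sketch
NT_ALL_STAR = '<.*>'            # matches any string

# Complement of one canonical definition (a list of rules).
def neg_definition(d):
    new_rules = []
    # Step 1: epsilon is in the complement iff it is not in the original.
    if [] not in d:
        new_rules.append([])
    # Step 2: keep each starting terminal, complement the continuation.
    seen = set()
    for r in d:
        if r == []: continue
        seen.add(r[0])
        new_rules.append([r[0], '<neg(%s)>' % r[1][1:-1]])
    # Step 3: terminals unused in the original may be followed by anything.
    for t in TERMINAL_SYMBOLS:
        if t not in seen:
            new_rules.append([t, NT_ALL_STAR])
    return new_rules
```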

Checking the complement of a definition with an example.

## Complete

Until now, we have only produced conjunction, disjunction, and complement for definitions. When producing these, we have introduced new nonterminals in definitions that are not yet defined. For producing a complete grammar, we need to define these new nonterminals too. This is what we will do in this section. We first define a few helper procedures.

The `remove_empty_defs()` procedure recursively removes any nonterminal that
has an empty definition, that is, one of the form `"<A>" : []`. Note that this
is different from an epsilon rule, which is `"<A>" : [[]]`.
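Such a procedure can be sketched as a fixpoint loop, since removing one nonterminal can strand rules that reference it, which may in turn empty further definitions (the implementation details here are my own):

```python
# Repeatedly drop nonterminals whose definition is the empty list [],
# and then drop any rule referencing a removed nonterminal, until
# nothing changes.
def remove_empty_defs(grammar):
    g = dict(grammar)
    while True:
        empty = {k for k, rules in g.items() if rules == []}
        if not empty:
            return g
        g = {k: [r for r in rules
                 if not (len(r) == 2 and r[1] in empty)]
             for k, rules in g.items() if k not in empty}
```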

Next, we define `complete()`, which recursively computes the complex
nonterminals that are left undefined in a grammar from the simpler
nonterminal definitions.

That is, for any conjunction, disjunction, or negation of grammars, we start
at the start symbol and produce the corresponding operation in the definition
of the start symbol. Then, we check whether any new nonterminal was used in
any of the rules. If any were, we recursively define them using the
nonterminals already present in the grammar. This is very similar to
`ReconstructRules` from fault expressions for context-free grammars, but is
different enough that we define a completely new class.

We start with reconstructing a single key. For example, given the two grammars
`G1` and `G2`, with start symbols `S1` and `S2`, to compute the intersection
`G1 & G2`, we simply reconstruct `<and(S1,S2)>` from the two grammars and
recursively define any undefined nonterminals.
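The completion loop can be sketched as a worklist fixpoint. Here `define_key` stands in for the per-key construction (conjunction, disjunction, or negation of existing definitions) and is an assumed parameter, not the post's exact interface:

```python
# Starting from a complex key such as <and(S1,S2)>, build its definition,
# then keep defining any undefined nonterminals mentioned by fresh rules.
def complete(base_grammar, start, define_key):
    g = dict(base_grammar)
    to_define = [start]
    while to_define:
        key = to_define.pop()
        if key in g:
            continue
        g[key] = define_key(key, g)          # build one definition
        for rule in g[key]:
            if len(rule) == 2 and rule[1] not in g:
                to_define.append(rule[1])    # newly referenced nonterminal
    return g, start
```

For instance, with a toy `define_key` that returns `[['a', '<Y>']]` for `<X>` and `[[]]` otherwise, completing from `<X>` defines both `<X>` and `<Y>`.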

Given a complex boolean expression, construct the definition for it from the grammar rules.

Produce disjunction of grammars

Using

Produce conjunction of grammars

We now verify that it works.

Next, we come to complement.

Ensure that negation also works.

The runnable code for this post is available here.

## Artifacts

The runnable Python source for this notebook is available here.

The installable Python wheel `rxexpressions` is available here.