Contents

  1. Conjunction of two regular grammars
    1. Conjunction of nonterminal symbols
    2. Conjunction of rules
    3. Conjunction of definitions
  2. Disjunction of two regular grammars
    1. Disjunction of nonterminal symbols
    2. Disjunction of rules
    3. Disjunction of definitions
  3. Complement of regular grammars
    1. Complement of a single rule
    2. Complement of a definition
  4. Complete
  5. Artifacts


In the previous post I showed how to produce a grammar out of regular expressions. In the second post, I claimed that we need a regular grammar because regular grammars have interesting properties such as being closed under intersection and complement. Now, the question is: how do we actually compute the intersection of two regular grammars? For this post, I assume that the regular grammars are in the canonical format as given in this post. That is, there are only two possible rule formats, \(A \rightarrow a B\) and \(A \rightarrow \epsilon\). Further, the canonical format requires that at most one rule in a nonterminal definition starts with a particular terminal symbol. Refer to this post for how to convert any regular grammar to the canonical format. We start by importing the prerequisites.

System Imports: These are available from Pyodide, but you may wish to make sure that they are installed if you are attempting to run the program directly on your machine.
  1. sympy
Available Packages: These are packages that refer either to my previous posts or to pure Python packages that I have compiled, and are available at the locations below. As before, install them if you need to run the program directly on your machine. To install, simply download the wheel file (`pkg.whl`) and install it using `pip install pkg.whl`.
  1. simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".
  2. gatleastsinglefault-0.0.1-py2.py3-none-any.whl from "Specializing Context-Free Grammars for Inducing Faults".
  3. gfaultexpressions-0.0.1-py2.py3-none-any.whl from "Fault Expressions for Specializing Context-Free Grammars".
  4. gmultiplefaults-0.0.1-py2.py3-none-any.whl from "Specializing Context-Free Grammars for Inducing Multiple Faults".
  5. earleyparser-0.0.1-py2.py3-none-any.whl from "Earley Parser".
  6. hdd-0.0.1-py2.py3-none-any.whl from "Hierarchical Delta Debugging".
  7. ddset-0.0.1-py2.py3-none-any.whl from "Simple DDSet".
  8. rxfuzzer-0.0.1-py2.py3-none-any.whl from "Fuzzing With Regular Expressions".
  9. rxregular-0.0.1-py2.py3-none-any.whl from "Regular Expression to Regular Grammar".
  10. rxcanonical-0.0.1-py2.py3-none-any.whl from "Converting a Regular Expression to DFA using Regular Grammar".

The imported modules



There are only two patterns of rules in canonical regular grammars.

  1. \[A \rightarrow aB\]
  2. \[A \rightarrow \epsilon\]
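
To make this concrete, here is how such a canonical regular grammar might look in the dictionary format used in these posts (each nonterminal maps to a list of rules, and the epsilon rule is the empty list). The grammar below, for the language \(a^*b\), is only an illustrative example.

```python
# A canonical regular grammar for a*b in the dictionary format used
# in these posts. Each rule is either [terminal, nonterminal] or []
# (the epsilon rule), and no two rules in a definition start with
# the same terminal symbol.
G_ASTAR_B = {
    '<S>': [
        ['a', '<S>'],   # <S> ::= a <S>
        ['b', '<F>'],   # <S> ::= b <F>
    ],
    '<F>': [
        []              # <F> ::= epsilon
    ]
}
```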

The idea is that, when evaluating the intersection starting from the start symbols, we pair up all rules that start with the same terminal symbol. Only the intersections of these pairs exist in the resulting grammar. The intersection of

<A1> ::= a <B1>

and

<A2> ::= a <B2>

is simply

<and(A1,A2)> ::= a <and(B1,B2)>

For constructing such rules, we also need to parse the boolean expressions embedded in the nonterminal names. So, we define our expression grammar first.
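
Below is a minimal sketch of what such an expression grammar could look like. The grammar and the names EXPR_GRAMMAR and EXPR_START are assumptions for illustration; the actual grammar used in the post may differ in its details.

```python
# A sketch of a grammar for boolean expressions embedded in
# nonterminal names such as 'and(A1,A2)', 'or(A1,A2)' or 'neg(A1)'.
# EXPR_GRAMMAR and EXPR_START are illustrative names.
import string

EXPR_GRAMMAR = {
    '<start>': [['<bexpr>']],
    '<bexpr>': [
        ['and(', '<bexprs>', ')'],
        ['or(', '<bexprs>', ')'],
        ['neg(', '<bexpr>', ')'],
        ['<symbol>']],
    '<bexprs>': [['<bexpr>', ',', '<bexprs>'], ['<bexpr>']],
    '<symbol>': [['<letter>', '<symbol>'], ['<letter>']],
    '<letter>': [[c] for c in string.ascii_letters + string.digits + '_'],
}
EXPR_START = '<start>'
```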



Next, we define our expression class, which is used to wrap a boolean expression and extract its components.
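
As a rough idea of what such a wrapper provides, here is a small string-based sketch. The class name BExprSketch and its methods are hypothetical; the post's actual class parses the expression with the grammar above instead.

```python
# A hypothetical expression wrapper that exposes the operator and the
# operands of nonterminal names such as '<and(A1,A2)>'.
class BExprSketch:
    def __init__(self, nonterminal):
        self.s = nonterminal[1:-1]                 # strip '<' and '>'

    def op(self):
        # 'and', 'or', 'neg', or None for a plain symbol
        for o in ('and', 'or', 'neg'):
            if self.s.startswith(o + '('): return o
        return None

    def operands(self):
        if self.op() is None: return [self.s]
        inner = self.s[len(self.op()) + 1:-1]
        parts, depth, cur = [], 0, ''
        for c in inner:                            # split on top-level commas
            if c == '(': depth += 1
            if c == ')': depth -= 1
            if c == ',' and depth == 0:
                parts.append(cur); cur = ''
            else:
                cur += c
        parts.append(cur)
        return parts

assert BExprSketch('<and(A1,A2)>').op() == 'and'
assert BExprSketch('<and(A1,or(A2,A3))>').operands() == ['A1', 'or(A2,A3)']
```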



Ensure that it can parse boolean expressions in nonterminals.



Conjunction of two regular grammars

Next, we define the conjunction of two regular grammars in the canonical format. We will define the conjunction of two definitions, and at the end discuss how to stitch it together for the complete grammar. The nice thing here is that, because our grammars are in the canonical format, conjunction, disjunction, and negation are really simple, and follow roughly the same framework.

Conjunction of nonterminal symbols
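
A minimal sketch of what conjunction of two nonterminal symbols could look like, assuming the <and(.,.)> naming convention used above; the function name is illustrative.

```python
# Conjunction of two nonterminal symbols: '<A1>' and '<A2>' become
# '<and(A1,A2)>'. Conjoining a symbol with itself is the symbol.
def and_nonterminals(k1, k2):
    if k1 == k2: return k1
    return '<and(%s,%s)>' % (k1[1:-1], k2[1:-1])

assert and_nonterminals('<A1>', '<A2>') == '<and(A1,A2)>'
```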



Ensure that it works



Conjunction of rules

We only provide conjunction for two rules that start with the same terminal symbol, or for two epsilon rules.
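
A sketch under these assumptions is below; and_rules returns None when the two rules have no common starting terminal, and and_nonterminals is the helper sketched earlier (repeated here so the snippet is self-contained).

```python
# Conjunction of two canonical rules. Both rules must start with the
# same terminal symbol, or both must be the epsilon rule; otherwise
# the conjunction is empty (None).
def and_nonterminals(k1, k2):
    if k1 == k2: return k1
    return '<and(%s,%s)>' % (k1[1:-1], k2[1:-1])

def and_rules(rule1, rule2):
    if rule1 == [] and rule2 == []: return []      # epsilon & epsilon
    if rule1 == [] or rule2 == []: return None
    if rule1[0] != rule2[0]: return None           # different terminals
    return [rule1[0], and_nonterminals(rule1[1], rule2[1])]

assert and_rules(['a', '<B1>'], ['a', '<B2>']) == ['a', '<and(B1,B2)>']
assert and_rules(['a', '<B1>'], ['b', '<B2>']) is None
```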



We check to make sure that conjunction of rules works.



Conjunction of definitions
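
Here is a sketch of how conjunction of two definitions might look: every pair of rules that shares a starting terminal (or is a pair of epsilon rules) contributes its conjunction, and everything else is dropped. The function name and the inline helper are illustrative.

```python
# Conjunction of two definitions (lists of canonical rules).
def and_definitions(def1, def2):
    def and_rule(r1, r2):
        if r1 == [] and r2 == []: return []        # epsilon & epsilon
        if r1 == [] or r2 == [] or r1[0] != r2[0]: return None
        nt = r1[1] if r1[1] == r2[1] else \
             '<and(%s,%s)>' % (r1[1][1:-1], r2[1][1:-1])
        return [r1[0], nt]
    new_def = []
    for r1 in def1:
        for r2 in def2:
            r = and_rule(r1, r2)
            if r is not None: new_def.append(r)
    return new_def

assert and_definitions([['a', '<B1>'], []], [['a', '<B2>'], ['b', '<C2>']]) \
    == [['a', '<and(B1,B2)>']]
```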



Checking that conjunction of definitions works.



Disjunction of two regular grammars

For disjunction, the strategy is the same. We line up the rules in both definitions that have the same starting terminal symbol, then construct the disjunction of the nonterminal parts. Unlike conjunction, however, we do not throw away the rules without partners in the other definition. Instead, they are added to the resulting definition as they are.

Disjunction of nonterminal symbols
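
A minimal sketch, mirroring the conjunction case with an <or(.,.)> naming convention; the function name is illustrative.

```python
# Disjunction of two nonterminal symbols: '<A1>' and '<A2>' become
# '<or(A1,A2)>'. Disjoining a symbol with itself is the symbol.
def or_nonterminals(k1, k2):
    if k1 == k2: return k1
    return '<or(%s,%s)>' % (k1[1:-1], k2[1:-1])

assert or_nonterminals('<A1>', '<A2>') == '<or(A1,A2)>'
```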



Ensure that it works



Disjunction of rules

We assume that the starting terminal symbol is the same for both rules.
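
A sketch under that assumption; epsilon rules need no pairing here and are handled at the definition level. The names are illustrative.

```python
# Disjunction of two canonical rules that start with the same
# terminal symbol: keep the terminal and disjoin the continuations.
def or_nonterminals(k1, k2):
    if k1 == k2: return k1
    return '<or(%s,%s)>' % (k1[1:-1], k2[1:-1])

def or_rules(rule1, rule2):
    assert rule1 != [] and rule2 != [] and rule1[0] == rule2[0]
    return [rule1[0], or_nonterminals(rule1[1], rule2[1])]

assert or_rules(['a', '<B1>'], ['a', '<B2>']) == ['a', '<or(B1,B2)>']
```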



We check to make sure that disjunction of rules works.



Disjunction of definitions
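
A sketch of what disjunction of two definitions could look like: rules sharing a starting terminal are merged, and unpaired rules are carried over unchanged. The function name and the ordering of the result are illustrative.

```python
# Disjunction of two definitions (lists of canonical rules).
def or_definitions(def1, def2):
    def key(r): return r[0] if r else ''          # '' marks the epsilon rule
    d1 = {key(r): r for r in def1}
    d2 = {key(r): r for r in def2}
    new_def = []
    for t in sorted(set(d1) | set(d2)):
        if t in d1 and t in d2:
            if t == '': new_def.append([])        # epsilon | epsilon
            else:
                nt = d1[t][1] if d1[t][1] == d2[t][1] else \
                     '<or(%s,%s)>' % (d1[t][1][1:-1], d2[t][1][1:-1])
                new_def.append([t, nt])
        else:
            new_def.append(d1.get(t, d2.get(t)))  # unpaired rule is kept
    return new_def

assert or_definitions([['a', '<B1>'], []], [['a', '<B2>'], ['b', '<C2>']]) \
    == [[], ['a', '<or(B1,B2)>'], ['b', '<C2>']]
```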



We check that disjunction of definitions works.



Complement of regular grammars

For complement, the idea is to handle each of the two rule patterns separately. We take the definition of each nonterminal in turn.

  1. If the nonterminal definition does not contain an \(\epsilon\) rule, we add NT_EMPTY to the resulting definition. If it does contain one, we skip it.
  2. We collect all terminal symbols that start a rule in the definition. For each such rule, we add a rule that complements the nonterminal used. That is, given

<A> ::= a <B>

We produce

<neg(A)> ::= a <neg(B)>

as one of the complement rules.

  3. For every remaining terminal symbol in TERMINAL_SYMBOLS, we add a rule that matches any string, given by NT_ALL_STAR (<.*>).

We start by producing the complement of a single nonterminal symbol.
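
A minimal sketch, assuming the <neg(.)> naming convention; the function name is illustrative.

```python
# Complement of a nonterminal symbol: '<A>' becomes '<neg(A)>', and
# negating a negation gives back the original symbol.
def neg_nonterminal(k):
    if k.startswith('<neg(') and k.endswith(')>'):
        return '<%s>' % k[5:-2]
    return '<neg(%s)>' % k[1:-1]

assert neg_nonterminal('<A>') == '<neg(A)>'
assert neg_nonterminal('<neg(A)>') == '<A>'
```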



Ensure that it works



Complement of a single rule
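
A sketch of the rule complement described in step 2 above: the terminal is kept and the continuation nonterminal is negated. The function name is illustrative.

```python
# Complement of a single canonical rule:
#   <A> ::= a <B>   becomes   <neg(A)> ::= a <neg(B)>
def neg_rule(rule):
    assert rule != []                  # the epsilon rule is handled separately
    terminal, nonterminal = rule
    return [terminal, '<neg(%s)>' % nonterminal[1:-1]]

assert neg_rule(['a', '<B>']) == ['a', '<neg(B)>']
```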



Ensure that it works



Complement of a definition

We complement a definition by applying complement to each rule in the original definition, and then adding rules for the terminal symbols that did not start any rule in the original definition.
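
Here is a sketch that follows the three steps listed earlier. TERMINAL_SYMBOLS, NT_EMPTY, and NT_ALL_STAR are stand-ins for the post's actual names: the alphabet, a nonterminal matching only the empty string, and a nonterminal matching any string.

```python
# Complement of a definition (a sketch; names are assumptions).
import string
TERMINAL_SYMBOLS = list(string.ascii_lowercase)   # assumed alphabet
NT_EMPTY = '<EMPTY>'                              # matches only ''
NT_ALL_STAR = '<.*>'                              # matches any string

def neg_definition(definition):
    new_def = []
    if [] not in definition:                      # step 1: flip epsilon
        new_def.append([NT_EMPTY])                # acts as an epsilon alternative
    seen = set()
    for rule in definition:                       # step 2: negate continuations
        if rule == []: continue
        t, nt = rule
        seen.add(t)
        new_def.append([t, '<neg(%s)>' % nt[1:-1]])
    for t in TERMINAL_SYMBOLS:                    # step 3: remaining terminals
        if t not in seen:                         # match any continuation
            new_def.append([t, NT_ALL_STAR])
    return new_def
```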



Checking complement of a definition in an example.



Complete

Until now, we have only produced the conjunction, disjunction, and complement of definitions. When producing these, we introduced new nonterminals into the resulting rules that are not yet defined in the grammar. To produce a complete grammar, we need to define these new nonterminals too. This is what we do in this section. We first define a few helper procedures.

The remove_empty_defs() procedure recursively removes any nonterminal that has an empty definition, that is, of the form `"<A>": []`. Note that this is different from an epsilon rule, which is `"<A>": [[]]`.
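
A compact sketch (iterative rather than recursive, but to the same effect); the assumption here is that a rule referring to a removed nonterminal must itself be removed, which can in turn empty other definitions.

```python
# Remove nonterminals with empty definitions ("<A>": []), pruning any
# rule that refers to them, until a fixpoint is reached.
def remove_empty_defs(grammar):
    g = dict(grammar)
    while True:
        empty = {k for k in g if g[k] == []}
        if not empty: return g
        g = {k: [r for r in g[k]
                 if not any(tok in empty for tok in r)]
             for k in g if k not in empty}
```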



Next, we define complete(), which recursively computes the definitions of any complex nonterminals that are left undefined in a grammar, building them from the simpler nonterminal definitions.



That is, for any conjunction, disjunction, or negation of grammars, we start at the start symbol and apply the corresponding operation to the definition of the start symbol. Then, we check whether any new nonterminal was used in any of the resulting rules. If so, we recursively define them using the nonterminals already present in the grammar. This is very similar to ReconstructRules from fault expressions for context-free grammars, but is different enough that we define a completely new class.



We start with reconstructing a single key. For example, given two grammars G1 and G2 with start symbols S1 and S2, to compute the intersection of G1 & G2, we simply reconstruct <and(S1,S2)> from the two grammars, and recursively define any undefined nonterminals.
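
To give a flavor of how that can work for intersection alone, here is a sketch that drives the reconstruction with a worklist. It assumes the and_definitions() helper sketched earlier and a hypothetical parse_key() helper that splits '<and(A,B)>' into ('<A>', '<B>'); it also assumes the two grammars use distinct nonterminal names.

```python
# Reconstruct '<and(S1,S2)>' and every undefined nonterminal it
# introduces, for the intersection of two canonical grammars.
def complete_and(g1, s1, g2, s2, and_definitions, parse_key):
    grammar = {**g1, **g2}                     # assumes disjoint nonterminals
    start = '<and(%s,%s)>' % (s1[1:-1], s2[1:-1])
    worklist = [start]
    while worklist:
        key = worklist.pop()
        if key in grammar: continue
        k1, k2 = parse_key(key)                # '<and(B1,B2)>' -> '<B1>', '<B2>'
        grammar[key] = and_definitions(grammar[k1], grammar[k2])
        for rule in grammar[key]:
            for token in rule:
                if token.startswith('<') and token not in grammar:
                    worklist.append(token)
    return grammar, start
```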



Given a complex boolean expression, construct the definition for it from the grammar rules.



Produce disjunction of grammars



Using



Produce conjunction of grammars



We now verify that it works.



Next, we come to complement.



Ensure that negation also works.



The runnable code for this post is available here.

Artifacts

The runnable Python source for this notebook is available here.

The installable python wheel rxexpressions is available here.