Important:Pyodide takes time to initialize.
Initialization completion is indicated by a red border around Run all button.
In the previous post
I showed how to produce a grammar out of regular expressions. In the
second post, I
claimed that we need a regular grammar because regular grammars have more
interesting properties such as being closed under
intersection and complement. Now, the question is how do we actually do
the intersection between two regular grammars? For this post, I assume that
the regular expressions are in the canonical format as given in
this post.
That is, there are only two possible rule formats \(A \rightarrow a B\)
and \(A \rightarrow \epsilon\). Further, the canonical format requires that
there is only one rule in a nonterminal definition that starts with a
particular terminal symbol. Refer to
this post for how convert any
regular grammar to the canonical format.
We start with importing the prerequisites
System Imports
These are available from Pyodide, but you may wish to make sure that they are
installed if you are attempting to run the program directly on the machine.
sympy
Available Packages
These are packages that refer either to my previous posts or to pure python
packages that I have compiled, and is available in the below locations. As
before, install them if you need to run the program directly on the machine.
To install, simply download the wheel file (`pkg.whl`) and install using
`pip install pkg.whl`.
There are only two patterns of rules in canonical regular grammars.
\[A \rightarrow aB\]
\[A \rightarrow \epsilon\]
The idea is that when evaluating intersection of the start symbol,
pair up all rules that start with the same terminal symbol.
Only the intersection of these would exist in the resulting grammar.
The intersection of
<A1> ::= a <B1>
and
<A2> ::= a <B2>
is simply
<and(A1,A2)> ::= a <and(B1,B2)>
For constructing such rules, we also need to parse the boolean expressions
in the nonterminals. So, we define our grammar first.
Next, we define our expression class which is used to wrap boolean
expressions and extract components of boolean expressions.
Ensure that it can parse boolean expressions in nonterminals.
Conjunction of two regular grammars
Next, we define the conjunction of two regular grammars in the canonical
format. We will define the conjunction of two definitions, and at the end
discuss how to stitch it together for the complete grammar. The nice thing
here is that, because our grammar is in the canonical format, conjunction
disjunction and negation is really simple, and follows roughly the same
framework.
Conjunction of nonterminal symbols
Ensure that it works
Conjunction of rules
We only provide conjunction for those rules whose initial terminal symbols are
the same or it is an empty rule.
We check to make sure that conjunction of rules work.
Conjunction of definitions
Checking that conjunction of definitions work.
Disjunction of two regular grammars
For disjunction, the strategy is the same. We line up rules in both
definitions with same starting symbol, then construct the conjunction of the
nonterminal parts. Unlike conjunction, however, we do not throw away the rules
without partners in the other definition. Instead, they are added to the
resulting definition.
Disjunction of nonterminal symbols
Ensure that it works
Disjunction of rules
We assume that the starting terminal symbol is the same for both rules.
We check to make sure that disjunction for rules work.
Disjunction of definitions
We check that disjunction of definitions work.
Complement of regular grammars
For complement, the idea is to treat each pattern separately. We take the
definition of each nonterminal separately.
If the nonterminal definition does not contain \(\epsilon\), we add NT_EMPTY
to the resulting definition. If it contains, then we skip it.
We collect all terminal symbols that start up a rule in the definition.
For each such rule, we add a rule that complements the nonterminal used.
That is, given
<A> := a <B>
We produce
<neg(A)> := a <neg(B)>
as one of the complement rules.
For every remaining terminal in the TERMINAL_SYMBOLS, we add a match for
any string given by NT_ALL_STAR (<.*>).
We start by producing the complement of a single nonterminal symbol.
Ensure that it works
Complement of a single rule
Ensure that it works
Complement a definition
We complement a definition by applying complement to each rule in the
original definition, and adding any new rules that did not match the
original definition.
Checking complement of a definition in an example.
Complete
Until now, we have only produced conjunction, disjunction, and complement for
definitions. When producing these, we have introduced new nonterminals in
definitions that are not yet defined. For producing a complete grammar, we
need to define these new nonterminals too. This is what we will do in this
section. We first define a few helper procedures.
The remove_empty_defs() recursively removes any nonterminal that has empty
definitions. That is, of the form "<A>" : []. Note that it is different from
an epsilon rule which is "<A>" : [[]]
Next, we define complete() which recursively computes the complex
nonterminals that is left undefined in a grammar from the simpler
nonterminal definitions.
That is, for any conjunction, disjunction, or negation of grammars, we start
at the start symbol, and produce the corresponding operation in the definition
of the start symbol. Then, we check if any new new nonterminal was used in any
of the rules. If any were used, we recursively define them using the
nonterminals already present in the grammar. This is very similar to the
ReconstructRules from fault expressions
for context-free grammars, but is also different enough. Hence, we define a
completely new class.
We start with reconstructing a single key. For example, given the two grammars
G1 and G2, and their start symbols S1, and S2, to compute an intersection
of G1 & G2, we simply reconstruct <and(S1,S2)> from the two grammars, and
recursively define any undefined nonterminals.
Given a complex boolean expression, construct the definition for it from the
grammar rules.