Important:Pyodide takes time to initialize.
Initialization completion is indicated by a red border around Run all button.
In the previous post about uniform random sampling from grammars,
I mentioned that the algorithm expects an epsilon-free grammar. That is,
the grammar should contain no empty rules. Unfortunately, empty rules are
quite useful for describing languages. For example, to specify that we need
zero or more white space characters, the following definition of <spaceZ>
is the ideal representation.
So, what can we do? In fact, it is possible to transform the grammar such that
it no longer contain epsilon rules. The idea is that any rule that references
a nonterminal that can be empty can be represented by skipping in a duplicate
rule. When there are multiple such empty-able nonterminals, you need to
produce every combination of skipping them.
But first, let us tackle an easier task. We want to remove those nonterminals
that exclusively represent an empty string. E.g.
We also load a few prerequisites
System Imports
These are available from Pyodide, but you may wish to make sure that they are
installed if you are attempting to run the program directly on the machine.
sympy
Available Packages
These are packages that refer either to my previous posts or to pure python
packages that I have compiled, and is available in the below locations. As
before, install them if you need to run the program directly on the machine.
To install, simply download the wheel file (`pkg.whl`) and install using
`pip install pkg.whl`.
First, we implement removing empty keys that have empty expansions.
In the above <empty> is such a key.
Note that we still need an empty expansion inside the definition. i.e [[]].
Leaving <empty> without an expansion, i.e. [] means that <empty> can’t
be expanded, and hence we will have an invalid grammar.
That is, <empty>: [] is not a valid definition.
We can use it thus:
Now we are ready to tackle the more complex part: That of removing epsilon
rules. First, we need to identify such rules that can become empty, and
hence the corresponding keys that can become empty.
Finding empty (epsilon) rules
The idea is as follows, We keep a set of nullable nonterminals. For each
rule, we check if all the tokens in the rule are nullable (i.e in the nullable
set). If all are (i.e all(t in my_epsilons for t in r)), then, this rule
is nullable. If there are any nullable rules for a key, then the key is
nullable. We process these keys until there are no more new keys.
We can use it thus:
Now that we can find epsilon rules, we need generate all combinations of
the corresponding keys, so that we can generate corresponding rules.
The idea is that for any given rule with nullable nonterminals in it,
you need to generate all combinations of possible rules where some of
such nonterminals are missing. That is, if given
[<A> <E1> <B> <E2> <C> <E3>], you need to generate these rules.