Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.

In the previous post about uniform random sampling from grammars, I mentioned that the algorithm expects an epsilon-free grammar. That is, the grammar should contain no empty rules. Unfortunately, empty rules are quite useful for describing languages. For example, to specify that we need zero or more white space characters, the following definition of <spaceZ> is the ideal representation.



So, what can we do? In fact, it is possible to transform the grammar such that it no longer contain epsilon rules. The idea is that any rule that references a nonterminal that can be empty can be represented by skipping in a duplicate rule. When there are multiple such empty-able nonterminals, you need to produce every combination of skipping them. But first, let us tackle an easier task. We want to remove those nonterminals that exclusively represent an empty string. E.g.



We also load a few prerequisites



We import the following modules



Remove empty keys

First, we implement removing empty keys that have empty expansions. In the above <empty> is such a key. Note that we still need an empty expansion inside the definition. i.e [[]]. Leaving <empty> without an expansion, i.e. [] means that <empty> can’t be expanded, and hence we will have an invalid grammar. That is, <empty>: [] is not a valid definition.



We can use it thus:



Now we are ready to tackle the more complex part: That of removing epsilon rules. First, we need to identify such rules that can become empty, and hence the corresponding keys that can become empty.

Finding empty (epsilon) rules

The idea is as follows, We keep a set of nullable nonterminals. For each rule, we check if all the tokens in the rule are nullable (i.e in the nullable set). If all are (i.e all(t in my_epsilons for t in r)), then, this rule is nullable. If there are any nullable rules for a key, then the key is nullable. We process these keys until there are no more new keys.



We can use it thus:



Now that we can find epsilon rules, we need generate all combinations of the corresponding keys, so that we can generate corresponding rules. The idea is that for any given rule with nullable nonterminals in it, you need to generate all combinations of possible rules where some of such nonterminals are missing. That is, if given [<A> <E1> <B> <E2> <C> <E3>], you need to generate these rules.

[<A> <E1> <B> <E2> <C> <E3>]
[<A> <B> <E2> <C> <E3>]
[<A> <B> <C> <E3>]
[<A> <B> <C>]
[<A> <E1> <B> <C> <E3>]
[<A> <E1> <B> <C>]
[<A> <E1> <B> <E2> <C>]


We can use it thus:



Let us try a larger grammar. This is the JSON grammar.



Extract combinations.



Here comes the last part, which stitches all these together.



Using the complete epsilon remover.



We can now count the strings produced by the epsilon free grammar



As before, the runnable source of this notebook is here.