Remove Empty (Epsilon) Rules From a Context-Free Grammar.

Remove empty keys
Finding empty (epsilon) rules
Artifacts


In the previous post about uniform random sampling from grammars, I mentioned that the algorithm expects an epsilon-free grammar. That is, the grammar should contain no empty rules. Unfortunately, empty rules are quite useful for describing languages. For example, to specify that we need zero or more white space characters, the following definition of <spaceZ> is the ideal representation.
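One plausible way to write such a definition, in the list-of-lists grammar format used for the JSON grammar later in this post, is sketched below. The helper key `<space>` and the exact set of whitespace characters are assumptions made for illustration.

```python
# Hypothetical sketch: zero or more whitespace characters.
# An empty list [] among the expansions is the empty (epsilon) rule.
space_grammar = {
    '<start>': [['<spaceZ>']],
    '<spaceZ>': [
        [],                       # epsilon: no whitespace at all
        ['<space>', '<spaceZ>'],  # one whitespace character, then more
    ],
    '<space>': [[' '], ['\t'], ['\n']],  # assumed character set
}
```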

So, what can we do? In fact, it is possible to transform the grammar such that it no longer contains epsilon rules. The idea is that any rule that references a nonterminal that can be empty can be duplicated, with that nonterminal skipped in the duplicate. When there are multiple such empty-able nonterminals, you need to produce every combination of skipping them. But first, let us tackle an easier task. We want to remove those nonterminals that exclusively represent an empty string, such as a key <empty> whose only expansion is the empty rule.
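A small grammar of this kind might look as follows. The key names and rules here are assumptions chosen for illustration; the point is only that `<empty>` has the empty rule as its sole expansion, so it contributes nothing wherever it is referenced.

```python
# Illustrative grammar (all names are assumptions).
# <empty> can only ever expand to the empty string.
empty_grammar = {
    '<start>': [['<A>', '<empty>', '<B>']],
    '<A>': [['a']],
    '<B>': [['b']],
    '<empty>': [[]],  # exactly one expansion, and it is empty
}
```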

We also load a few prerequisites

System Imports

These are available from Pyodide, but you may wish to make sure that they are installed if you are attempting to run the program directly on the machine.

sympy

Available Packages

These are packages that refer either to my previous posts or to pure Python packages that I have compiled, and are available at the locations below. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install it using `pip install pkg.whl`.

The imported modules

Remove empty keys

First, we remove keys that have only empty expansions. In the example above, <empty> is such a key. Note that such a key still needs an empty expansion inside its definition, i.e. [[]]. Leaving <empty> without any expansion, i.e. [], means that <empty> can’t be expanded, and hence we would have an invalid grammar. That is, <empty>: [] is not a valid definition.

We can use it thus:
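A minimal sketch of this step, together with a small usage example. The function name `remove_empty_keys` and the sample grammar are assumptions for illustration, not necessarily the code of the original notebook. The sketch also assumes that the start symbol itself is not such an empty key.

```python
def remove_empty_keys(grammar):
    """Drop keys whose every expansion is empty, plus all references to them.

    Removing one such key can leave another key with only empty
    expansions, so the process repeats until nothing changes.
    """
    g = dict(grammar)
    while True:
        # A key is "empty" if it has at least one rule and all rules are [].
        empty = {k for k, rules in g.items() if rules and not any(rules)}
        if not empty:
            return g
        g = {k: [[t for t in rule if t not in empty] for rule in rules]
             for k, rules in g.items() if k not in empty}


sample = {
    '<start>': [['<A>', '<empty>', '<B>']],
    '<A>': [['a']],
    '<B>': [['b'], ['<also-empty>', 'c']],
    '<also-empty>': [['<empty>']],  # becomes empty once <empty> is removed
    '<empty>': [[]],
}
cleaned = remove_empty_keys(sample)
for key, rules in cleaned.items():
    print(key, rules)
```

Both `<empty>` and `<also-empty>` disappear, and the rules that referenced them simply lose those tokens.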

Now we are ready to tackle the more complex part: removing epsilon rules. First, we need to identify the rules that can become empty, and hence the corresponding keys that can become empty.

Finding empty (epsilon) rules

The idea is as follows. We keep a set of nullable nonterminals. For each rule, we check whether all the tokens in the rule are nullable (i.e. in the nullable set). If all are (i.e. all(t in my_epsilons for t in r)), then this rule is nullable; an empty rule is trivially nullable, which is what seeds the set. If there are any nullable rules for a key, then the key is nullable. We repeat this over the keys until no new nullable keys are found.

We can use it thus:
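A minimal sketch of this fixpoint computation, with a small usage example. The function name `find_nullable_keys` and the sample grammar are assumptions for illustration. The sketch assumes that no terminal symbol shares its name with a key of the grammar.

```python
def find_nullable_keys(grammar):
    """Return the set of keys that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for key, rules in grammar.items():
            if key in nullable:
                continue
            # all(...) over an empty rule is True, so [] makes the key nullable.
            if any(all(t in nullable for t in rule) for rule in rules):
                nullable.add(key)
                changed = True
    return nullable


sample = {
    '<start>': [['<A>', '<B>']],
    '<A>': [['a'], []],               # nullable directly
    '<B>': [['<A>', '<A>'], ['b']],   # nullable through <A>
    '<C>': [['c', '<A>']],            # never empty because of 'c'
}
print(sorted(find_nullable_keys(sample)))
```

Here `<A>` is nullable because of its empty rule, `<B>` because one of its rules consists only of `<A>`, and `<start>` because both of its tokens are nullable.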

Now that we can find epsilon rules, we need to generate all combinations of the corresponding keys, so that we can generate the corresponding rules. The idea is that for any given rule with nullable nonterminals in it, you need to generate all combinations of possible rules where some of those nonterminals are missing. That is, given [<A> <E1> <B> <E2> <C> <E3>], where <E1>, <E2>, and <E3> are nullable, you need to generate these rules:

[<A> <E1> <B> <E2> <C> <E3>]
[<A> <B> <E2> <C> <E3>]
[<A> <B> <E2> <C>]
[<A> <B> <C> <E3>]
[<A> <B> <C>]
[<A> <E1> <B> <C> <E3>]
[<A> <E1> <B> <C>]
[<A> <E1> <B> <E2> <C>]

We can use it thus:
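A minimal sketch of the combination step, applied to the rule above. The function name `rule_combinations` is an assumption for illustration. Each nullable token contributes two choices (keep it or skip it), every other token contributes one, and `itertools.product` enumerates all selections.

```python
from itertools import product


def rule_combinations(rule, nullable):
    """All variants of rule in which each nullable token is kept or skipped."""
    # Per token: (keep, skip) if it is nullable, otherwise only (keep,).
    choices = [([t], []) if t in nullable else ([t],) for t in rule]
    variants = []
    for picks in product(*choices):
        variant = [t for part in picks for t in part]
        if variant not in variants:  # repeated tokens can yield duplicates
            variants.append(variant)
    return variants


rule = ['<A>', '<E1>', '<B>', '<E2>', '<C>', '<E3>']
nullable = {'<E1>', '<E2>', '<E3>'}
variants = rule_combinations(rule, nullable)
for v in variants:
    print(v)
```

With three nullable tokens this yields 2 × 2 × 2 = 8 variants. If every token in a rule is nullable, the empty rule is among the variants, so a later step has to filter it out.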

Let us try a larger grammar. This is the JSON grammar.

jsonG = {
    "<start>": [["<json>"]],
    "<json>": [["<element>"]],
    "<element>": [["<ws>", "<value>", "<ws>"]],
    "<value>": [["<object>"], ["<array>"], ["<string>"], ["<number>"],
                ["true"], ["false"],
                ["null"]],
    "<object>": [["{", "<ws>", "}"], ["{", "<members>", "}"]],
    "<members>": [["<member>", "<symbol-2>"]],
    "<member>": [["<ws>", "<string>", "<ws>", ":", "<element>"]],
    "<array>": [["[", "<ws>", "]"], ["[", "<elements>", "]"]],
    "<elements>": [["<element>", "<symbol-1-1>"]],
    "<string>": [["\"", "<characters>", "\""]],
    "<characters>": [["<character-1>"]],
    "<character>": [["0"], ["1"], ["2"], ["3"], ["4"], ["5"], ["6"], ["7"],
                    ["8"], ["9"], ["a"], ["b"], ["c"], ["d"], ["e"], ["f"],
                    ["g"], ["h"], ["i"], ["j"], ["k"], ["l"], ["m"], ["n"],
                    ["o"], ["p"], ["q"], ["r"], ["s"], ["t"], ["u"], ["v"],
                    ["w"], ["x"], ["y"], ["z"], ["A"], ["B"], ["C"], ["D"],
                    ["E"], ["F"], ["G"], ["H"], ["I"], ["J"], ["K"], ["L"],
                    ["M"], ["N"], ["O"], ["P"], ["Q"], ["R"], ["S"], ["T"],
                    ["U"], ["V"], ["W"], ["X"], ["Y"], ["Z"], ["!"], ["#"],
                    ["$"], ["%"], ["&"], ["\""], ["("], [")"], ["*"], ["+"],
                    [","], ["-"], ["."], ["/"], [":"], [";"], ["<"], ["="],
                    [">"], ["?"], ["@"], ["["], ["]"], ["^"], ["_"], ["`"],
                    ["{"], ["|"], ["}"], ["~"], [" "], ["<esc>"]],
    "<esc>": [["\\","<escc>"]],
    "<escc>": [["\\"],["b"],["f"], ["n"], ["r"],["t"],["\""]],
    "<number>": [["<int>", "<frac>", "<exp>"]],
    "<int>": [["<digit>"], ["<onenine>", "<digits>"], ["-", "<digits>"],
              ["-", "<onenine>", "<digits>"]],
    "<digits>": [["<digit-1>"]],
    "<digit>": [["0"], ["<onenine>"]],
    "<onenine>": [["1"], ["2"], ["3"], ["4"], ["5"], ["6"], ["7"], ["8"],
                  ["9"]],
    "<frac>": [[], [".", "<digits>"]],
    "<exp>": [[], ["E", "<sign>", "<digits>"], ["e", "<sign>", "<digits>"]],
    "<sign>": [[], ["+"], ["-"]],
    "<ws>": [["<sp1>", "<ws>"], []],
    "<sp1>": [[" "],["\n"],["\t"],["\r"]],
    "<symbol>": [[",", "<members>"]],
    "<symbol-1>": [[",", "<elements>"]],
    "<symbol-2>": [[], ["<symbol>", "<symbol-2>"]],
    "<symbol-1-1>": [[], ["<symbol-1>", "<symbol-1-1>"]],
    "<character-1>": [[], ["<character>", "<character-1>"]],
    "<digit-1>": [["<digit>"], ["<digit>", "<digit-1>"]]
}
jsonS = '<start>'

Extract combinations.

Here comes the last part, which stitches all these together.
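A self-contained sketch of how the pieces might be stitched together. All function names and the sample number grammar (a cut-down version of the `<number>` part of the JSON grammar) are assumptions for illustration, not the original notebook code.

```python
from itertools import product


def remove_empty_keys(grammar):
    """Drop keys whose every expansion is empty, plus all references to them."""
    g = dict(grammar)
    while True:
        empty = {k for k, rules in g.items() if rules and not any(rules)}
        if not empty:
            return g
        g = {k: [[t for t in r if t not in empty] for r in rules]
             for k, rules in g.items() if k not in empty}


def find_nullable_keys(grammar):
    """Fixpoint: a key is nullable if one of its rules has only nullable tokens."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for key, rules in grammar.items():
            if key not in nullable and any(
                    all(t in nullable for t in rule) for rule in rules):
                nullable.add(key)
                changed = True
    return nullable


def rule_combinations(rule, nullable):
    """All variants of rule in which each nullable token is kept or skipped."""
    choices = [([t], []) if t in nullable else ([t],) for t in rule]
    return [[t for part in picks for t in part] for picks in product(*choices)]


def remove_epsilon_rules(grammar):
    """Return a grammar without empty rules for the same non-empty strings."""
    g = remove_empty_keys(grammar)
    nullable = find_nullable_keys(g)
    new_grammar = {}
    for key, rules in g.items():
        new_rules = []
        for rule in rules:
            for variant in rule_combinations(rule, nullable):
                # Skip the empty variant and anything already produced.
                if variant and variant not in new_rules:
                    new_rules.append(variant)
        new_grammar[key] = new_rules
    return new_grammar


number_grammar = {
    '<start>': [['<int>', '<frac>', '<exp>']],
    '<int>': [['1']],
    '<frac>': [[], ['.', '<int>']],
    '<exp>': [[], ['e', '<sign>', '<int>']],
    '<sign>': [[], ['+'], ['-']],
}
epsilon_free = remove_epsilon_rules(number_grammar)
for key, rules in epsilon_free.items():
    print(key, rules)
```

Note that if the start symbol is nullable, the transformed grammar can no longer produce the empty string itself; every other string of the language is preserved.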

Using the complete epsilon remover.

We can now count the strings produced by the epsilon-free grammar.

As before, the runnable source of this notebook is here.

Artifacts

The runnable Python source for this notebook is available here.

The installable python wheel cfgremoveepsilon is available here.
