Left recursion in context-free grammars is when a nonterminal when expanded,
results in the same nonterminal symbol in the expansion as the first symbol.
If the symbol is present as the first symbol in one of the expansion rules of
the nonterminal, it is called a direct left-recursion. Below is a grammar with
direct left-recursion. The nonterminal symbol
<Ds> has a direct
Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.
An indirect left-recursion occurs when the symbol is found after more than one expansion step. Here is an indirect left-recursion.
<I> has an indirect left-recursion as expanding
<I> being the first symbol again.
For context-free grammars, in many cases left-recursions are a nuisance. During parsing, left-recursion in grammars can make simpler parsers recurse infinitely. Hence, it is often useful to eliminate them.
It is fairly simple to eliminate them from context-free grammars. Here is a solution.
As before, we start with the prerequisite imports.
Available PackagesThese are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.
We need the fuzzer to generate inputs to parse and also to provide some utilities
As before, we use the fuzzingbook grammar style.
A terminal symbol has exactly one character
(Note that we disallow empty string (
'') as a terminal symbol).
We have our grammar:
First, we need to check for left recursion.
Following Moore 1, we will use the following terminology.
A symbol X is a direct left corner of a nonetrminal symbol A if there is an expansion rule A -> X alpha, where alpha is any sequence of tokens.
A left corner relation is a reflexive transitive closure of the direct left corner relation.
A symbol is directly left recursive if it is in the direct left corner of itself.
A symbol is left recursive if it has a left corner relation with itself.
A symbol is indirectly left recursive if it is left recursive but not directly left recursive.
For this algorithm to work, we need the grammar to not have epsilon productions. You can refer to my previous post for the algorithm for removing epsilon productions.
Next, to eliminate direct left recursion for the nonterminal
we repeat the following transformations until the direct left recursions
<A> are removed.
A -> A alpha_1 | ... | A alpha_2 | beta_1 | .. | beta_m
such that A -> A alpha_i is an expansion rule of A that contains a direct left recursion, and alpha_i is a sequence of tokens, and A -> beta_j represents an expansion rule without direct left recursion where beta_j is a sequence of tokens, for every beta_j, add a new expansion rule A -> beta_j A’ and for every alpha_i, add A’ -> alpha_i A’ to the grammar. Finally add A’ -> epsilon to the grammar.
Removing the indirect left-recursion is a bit more trickier. The algorithm
starts by establishing some stable ordering of the nonterminals so that
they can be procesed in order. Next, we apply an algorithm called
algorithm 1, which is as follows:
For any nonterminals Ai and Aj such that i > j in the ordering, and Aj
is a direct left corner of Ai, replace all occurrences of Aj as a direct
left corner of Ai with all possible expansions of Aj
Let us see if the grammar results in the right language
A problem with this algorithm is its exponential case behavior as Moore 2 notes. The solution that Moore offers is to order the nonterminals in the decreasing order of the number of distinct left corners.
The runnable Python source for this notebook is available here.
The installable python wheel
removeleftrecursion is available here.