Removing Left Recursion from Context Free Grammars
Left recursion in contextfree grammars is when a nonterminal when expanded,
results in the same nonterminal symbol in the expansion as the first symbol.
If the symbol is present as the first symbol in one of the expansion rules of
the nonterminal, it is called a direct leftrecursion. Below is a grammar with
direct leftrecursion. The nonterminal symbol <Ds>
has a direct
leftrecursion.
Contents
Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.
An indirect leftrecursion occurs when the symbol is found after more than one expansion step. Here is an indirect leftrecursion.
Here, <I>
has an indirect leftrecursion as expanding <I>
through <Ds>
results in <I>
being the first symbol again.
For contextfree grammars, in many cases leftrecursions are a nuisance. During parsing, leftrecursion in grammars can make simpler parsers recurse infinitely. Hence, it is often useful to eliminate them.
It is fairly simple to eliminate them from contextfree grammars. Here is a solution.
Prerequisites
As before, we start with the prerequisite imports.
Available Packages
These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.We need the fuzzer to generate inputs to parse and also to provide some utilities
As before, we use the fuzzingbook grammar style.
A terminal symbol has exactly one character
(Note that we disallow empty string (''
) as a terminal symbol).
We have our grammar:
First, we need to check for left recursion.
Definitions
Following Moore ^{1}, we will use the following terminology.

A symbol X is a direct left corner of a nonetrminal symbol A if there is an expansion rule A > X alpha, where alpha is any sequence of tokens.

A left corner relation is a reflexive transitive closure of the direct left corner relation.

A symbol is directly left recursive if it is in the direct left corner of itself.

A symbol is left recursive if it has a left corner relation with itself.

A symbol is indirectly left recursive if it is left recursive but not directly left recursive.
For this algorithm to work, we need the grammar to not have epsilon productions. You can refer to my previous post for the algorithm for removing epsilon productions.
Using it
Next, to eliminate direct left recursion for the nonterminal <A>
we repeat the following transformations until the direct left recursions
from <A>
are removed.
Given
A > A alpha_1
 ...
 A alpha_2
 beta_1
 ..
 beta_m
such that A > A alpha_i is an expansion rule of A that contains a direct left recursion, and alpha_i is a sequence of tokens, and A > beta_j represents an expansion rule without direct left recursion where beta_j is a sequence of tokens, for every beta_j, add a new expansion rule A > beta_j A’ and for every alpha_i, add A’ > alpha_i A’ to the grammar. Finally add A’ > epsilon to the grammar.
Using it
Removing the indirect leftrecursion is a bit more trickier. The algorithm
starts by establishing some stable ordering of the nonterminals so that
they can be procesed in order. Next, we apply an algorithm called Paull's
algorithm ^{1}, which is as follows:
For any nonterminals Ai and Aj such that i > j in the ordering, and Aj
is a direct left corner of Ai, replace all occurrences of Aj as a direct
left corner of Ai with all possible expansions of Aj
Using it:
Let us see if the grammar results in the right language
A problem with this algorithm is its exponential case behavior as Moore ^{2} notes. The solution that Moore offers is to order the nonterminals in the decreasing order of the number of distinct left corners.
Artifacts
The runnable Python source for this notebook is available here.
The installable python wheel removeleftrecursion
is available here.