Removing Left Recursion from Context Free Grammars
Left recursion in context-free grammars is when a nonterminal when expanded,
results in the same nonterminal symbol in the expansion as the first symbol.
If the symbol is present as the first symbol in one of the expansion rules of
the nonterminal, it is called a direct left-recursion. Below is a grammar with
direct left-recursion. The nonterminal symbol <Ds>
has a direct
left-recursion.
Contents
Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.
An indirect left-recursion occurs when the symbol is found after more than one expansion step. Here is an indirect left-recursion.
Here, <I>
has an indirect left-recursion as expanding <I>
through <Ds>
results in <I>
being the first symbol again.
For context-free grammars, in many cases left-recursions are a nuisance. During parsing, left-recursion in grammars can make simpler parsers recurse infinitely. Hence, it is often useful to eliminate them.
It is fairly simple to eliminate them from context-free grammars. Here is a solution.
Prerequisites
As before, we start with the prerequisite imports.
Available Packages
These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.We need the fuzzer to generate inputs to parse and also to provide some utilities
As before, we use the fuzzingbook grammar style.
A terminal symbol has exactly one character
(Note that we disallow empty string (''
) as a terminal symbol).
We have our grammar:
First, we need to check for left recursion.
Definitions
Following Moore 1, we will use the following terminology.
-
A symbol X is a direct left corner of a nonetrminal symbol A if there is an expansion rule A -> X alpha, where alpha is any sequence of tokens.
-
A left corner relation is a reflexive transitive closure of the direct left corner relation.
-
A symbol is directly left recursive if it is in the direct left corner of itself.
-
A symbol is left recursive if it has a left corner relation with itself.
-
A symbol is indirectly left recursive if it is left recursive but not directly left recursive.
For this algorithm to work, we need the grammar to not have epsilon productions. You can refer to my previous post for the algorithm for removing epsilon productions.
Using it
Next, to eliminate direct left recursion for the nonterminal <A>
we repeat the following transformations until the direct left recursions
from <A>
are removed.
Given
A -> A alpha_1
| ...
| A alpha_2
| beta_1
| ..
| beta_m
such that A -> A alpha_i is an expansion rule of A that contains a direct left recursion, and alpha_i is a sequence of tokens, and A -> beta_j represents an expansion rule without direct left recursion where beta_j is a sequence of tokens, for every beta_j, add a new expansion rule A -> beta_j A’ and for every alpha_i, add A’ -> alpha_i A’ to the grammar. Finally add A’ -> epsilon to the grammar.
Using it
Removing the indirect left-recursion is a bit more trickier. The algorithm
starts by establishing some stable ordering of the nonterminals so that
they can be procesed in order. Next, we apply an algorithm called Paull's
algorithm 1, which is as follows:
For any nonterminals Ai and Aj such that i > j in the ordering, and Aj
is a direct left corner of Ai, replace all occurrences of Aj as a direct
left corner of Ai with all possible expansions of Aj
Using it:
Let us see if the grammar results in the right language
A problem with this algorithm is its exponential case behavior as Moore 2 notes. The solution that Moore offers is to order the nonterminals in the decreasing order of the number of distinct left corners.
Artifacts
The runnable Python source for this notebook is available here.
The installable python wheel removeleftrecursion
is available here.