Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.
This post shows you how to convert a regular grammar to a regular expression directly – that is, without first creating an NFA or a DFA. Converting a regular grammar to a regular expression is fairly easy We only need to follow a fairly fixed set of rules.
- First, we make sure that we have a start symbol, and a single symbol that represents the nonterminal in the grammar.
- Next, we combine any production rules that end with the same nonterminal into a regular expression rule with a regular expression prefix, and the nonterminal suffix.
- Next, we handle any self repetitions by using Kleene star expressions
- Finally, we start removing nonterminals one by one until we are left with the regular expression that takes us from start to stop. We start with importing the prerequisites
Available PackagesThese are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine.
The imported modules
We define a grammar for our use.
Ensuring start and stop
The grammar should have one start symbol and exactly one stop symbol which is NT_EMPTY So, what we do is, whenever we hae a rule that contains just a terminal symbol, we append the NT_EMPTY symbol to the rule. Thus NT_EMPTY symbol becomes the final nontermainal to be expanded.
We also define an
is_nonterminal that knows about regular expressions.
Next, given a rule, we want to split it into a regex part and a nonterminal part.
Next, what we want to do is to consolidate rules that have same nonterminals to a single rule with a regular expression prefix, and the nonterminal suffix. That is:
When there is recursion, that is a rule contains a nonterminal that is the same as the nonterminal we are defining, we need to convert that to a kleene star, and add it in front of every other rule. That is,
A -> b B | c C | a A
A -> a* b B | a* c C
Finally, we start eliminating nonterminals from the grammar one by one. The idea is to choose a single nonterminal to be eliminated, and find where it is being used. For each such rules, replace that rule with a set of rules with the same prefix, and the rules of the nonterminal being eliminated as the suffix. That is, given
A -> b B | c D B -> e E | f F
Eliminating B will result in
A -> b e E # new | b f F # new | c D # B -> e E # | f F
Regular grammar to regex
Eliminate each nonterminal one by one to get our expression.
The runnable code for this post is available here.