Converting a Regular Grammar to a Regular Expression
Contents
Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.
This post shows you how to convert a regular grammar to a regular expression using the state elimination algorithm. The intuition in this algorithm is to note that every nonterminal is just a way point between two sets of states A and B. So, Eliminating one nonterminal can be achieved by linking all pairs between A and B with the equivalent regular expression transmissions. We start with importing the prerequisites
Available Packages
These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.- simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".
- gatleastsinglefault-0.0.1-py2.py3-none-any.whl from "Specializing Context-Free Grammars for Inducing Faults".
- earleyparser-0.0.1-py2.py3-none-any.whl from "Earley Parser".
- hdd-0.0.1-py2.py3-none-any.whl from "Hierarchical Delta Debugging".
- ddset-0.0.1-py2.py3-none-any.whl from "Simple DDSet".
- rxfuzzer-0.0.1-py2.py3-none-any.whl from "Fuzzing With Regular Expressions".
- rxregular-0.0.1-py2.py3-none-any.whl from "Regular Expression to Regular Grammar".
- rxcanonical-0.0.1-py2.py3-none-any.whl from "Converting a Regular Expression to DFA using Regular Grammar".
The imported modules
Let us define a grammar that we want to extract the regular expression for.
Let us first define a few helpr functions for regular expression conversion. we represent individual regular expressions by lists. So, concatenation is just list addition. An empty list represents epsilon. We justneed to take care of two meta symbols, kleene star and alternative, which is represented as below.
We also define how to convert a regular expression to its string form.
We require grammars in a particular format. In particular,
we reqire that we have only one nonterminal symbol <_>: []
as the accept state. If the grammar doesn’t conform, we can
make it conform with fix_grammar() which adds the necessary
definitions. Essentially,
if the grammar already contains <_>, then it is kept, else
a new defintion <_>: [] is created. For any rule that contains
a single terminal, we add <_> as the ending nonterminal symbol.
if it is a single nonterminal, we assert that it is <_> which
is one epsilon transition away from `k`.
Adjuscency Matrix
Using the regular grammar directly is too unwieldy. Let us first extract the states, and transitions to build an adjuscency matrix. We do not want any incoming transitions to the start state, and we do not want any outgoing transitions from the accept. So, let us also define two phony states for these.
Let us also define a helper function to display adjuscency matrix
Using it.
We can now define our conversion routine. The idea is very simple,
remove one state at a time. For any pair of states such that
<src> -> (a) -> <q> -> (b) -> <dst>, replace it with
`
Using it.
The runnable code for this post is available here.
Artifacts
The runnable Python source for this notebook is available here.
The installable python wheel rgtorx is available here.