# Converting a Regular Grammar to a Regular Expression

## Contents

**Important:** Pyodide takes time to initialize.
Initialization completion is indicated by a red border around *Run all* button.

This post shows you how to convert a regular grammar to a regular expression directly – that is, without first creating an NFA or a DFA. Converting a regular grammar to a regular expression is fairly easy We only need to follow a fairly fixed set of rules.

- First, we make sure that we have a start symbol, and a single symbol that represents the nonterminal in the grammar.
- Next, we combine any production rules that end with the same nonterminal into a regular expression rule with a regular expression prefix, and the nonterminal suffix.
- Next, we handle any self repetitions by using Kleene star expressions
- Finally, we start removing nonterminals one by one until we are left with the regular expression that takes us from start to stop. We start with importing the prerequisites

## Available Packages

These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine.- simplefuzzer-0.0.1-py2.py3-none-any.whl
- gatleastsinglefault-0.0.1-py2.py3-none-any.whl
- earleyparser-0.0.1-py2.py3-none-any.whl
- hdd-0.0.1-py2.py3-none-any.whl
- ddset-0.0.1-py2.py3-none-any.whl
- rxfuzzer-0.0.1-py2.py3-none-any.whl
- rxregular-0.0.1-py2.py3-none-any.whl
- rxcanonical-0.0.1-py2.py3-none-any.whl

The imported modules

We define a grammar for our use.

## Ensuring start and stop

The grammar should have one start symbol and exactly one stop symbol which is NT_EMPTY So, what we do is, whenever we hae a rule that contains just a terminal symbol, we append the NT_EMPTY symbol to the rule. Thus NT_EMPTY symbol becomes the final nontermainal to be expanded.

Testing it.

We also define an `is_nonterminal`

that knows about regular expressions.

Next, given a rule, we want to split it into a regex part and a nonterminal part.

## Prefix Regex

Next, what we want to do is to consolidate rules that have same nonterminals to a single rule with a regular expression prefix, and the nonterminal suffix. That is:

- convert := a
**| b****to (a|b)** - convert := <_> to := <_>
- convert <_> := \e to <_> := \e

Using it.

## Recursion (Repetition)

When there is recursion, that is a rule contains a nonterminal that is the same as the nonterminal we are defining, we need to convert that to a kleene star, and add it in front of every other rule. That is,

```
A -> b B
| c C
| a A
```

becomes

```
A -> a* b B
| a* c C
```

Using it.

Finally, we start eliminating nonterminals from the grammar one by one. The idea is to choose a single nonterminal to be eliminated, and find where it is being used. For each such rules, replace that rule with a set of rules with the same prefix, and the rules of the nonterminal being eliminated as the suffix. That is, given

```
A -> b B
| c D
B -> e E
| f F
```

Eliminating B will result in

```
A -> b e E # new
| b f F # new
| c D
# B -> e E
# | f F
```

Using it.

## Regular grammar to regex

Eliminate each nonterminal one by one to get our expression.

Using it.

The runnable code for this post is available here.