# Minimizing a Deterministic Regular Grammar or a DFA

A deterministic regular grammar is a DFA (Deterministic Finite Automaton) represented as a regular grammar. Such a grammar can have rules of the following format:

- \[A \rightarrow a B\]
- \[S \rightarrow E\]
- \[E \rightarrow \epsilon\]

where given a nonterminal \(A\) and a terminal symbol \(a\) at most one of
its rules will start with a terminal symbol \(a\).
Secondly, the \(\epsilon\) is attached to the \(E\) symbol, which is the
only termination point. If the language contains \(\epsilon\), then the
single degenerate rule, \(S \rightarrow E\) is added to the rules.
As you can see, each node has at most one
transition for a given terminal symbol. Hence, this canonical form is
equivalent to a deterministic finite state automation (**DFA**).

A deterministic regular grammar (also a DFA) is useful in many applications.
However, in most applications where a such a grammar can be used
(i.e., parsing and fuzzing) the performance of algorithms can be improved much
further by minimizing the grammar such that it has the smallest size possible
for the language it represents. This post tackles how to minimize a DFA using
the classical algorithm ^{1}. Note, it is referred to as
Hopcroft’s erroneously in wikipedia ^{2}, however Hopcroft has
a different algorithm based on partitioning.

We start with importing the prerequisites

## Contents

**Important:** Pyodide takes time to initialize.
Initialization completion is indicated by a red border around *Run all* button.

## Available Packages

These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.- simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".
- gatleastsinglefault-0.0.1-py2.py3-none-any.whl from "Specializing Context-Free Grammars for Inducing Faults".
- hdd-0.0.1-py2.py3-none-any.whl from "Hierarchical Delta Debugging".
- rxfuzzer-0.0.1-py2.py3-none-any.whl from "Fuzzing With Regular Expressions".
- ddset-0.0.1-py2.py3-none-any.whl from "Simple DDSet".
- earleyparser-0.0.1-py2.py3-none-any.whl from "Earley Parser".

The imported modules

## Minimization of the Regular Grammar

We assume that we have a DFA that is represented as a grammar where the states in the DFA are nonterminal symbols in the grammar and terminals are the transitions. However, this DFA is not necessarily minimal. That is, thre could be duplicate nonterminal symbols that could behave exactly alike. That is, if we take two such states as start symbols of the grammar, then the strings accepted by such an grammars would be exactly the same. So, the idea behind minimizing a regular grammar is to identify nonterminals that are duplicates of each other

Interestingly, Brzozowski ^{3} observed that if you reverse the
arrows in the DFA, resulting in an NFA, and then convert the NFA to DFA, then
do this again, the resulting DFA is minimal. However, this can have
exponential worst case complexity (but can be much faster in common patterns).
Hence, we look at a more common algorithm for minimization.

The idea is as follows:

- Mark the start state and all final states
- Collect all pairs of nonterminals from the grammar
- while there are pairs to be marked a. for each nonmarked pairs p,q do i. for each symbol a in alphabet do * if the pair delta(p, a) and delta(q, a) is marked, mark p,q
- Construct the reduced grammar by merging unmarked groups of pairs. We start with a matrix nonterminals, and iteratively refine them. We start by initializing our data structure.

Next, we want to update the markings. To improve the performance, we define a datastruture to keep the transitions from states.

We are now ready to update our markings.

Using it.

At this point, all that remains to be done is to merge the nonterminals in the
pairs which are indistinguished. That is, the value is `None`

.

Using it.

The runnable code for this post is available here.

## Artifacts

The runnable Python source for this notebook is available here.