A Fast Grammar Fuzzer by Compiling the Grammar
Contents
Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.
In a previous post I disucssed a simple fuzzer. While simple, the fuzzer is somewhat inefficient. This post discusses a way to speed it up – by compiling.
Available Packages
These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.The imported modules
As before, we start with a grammar.
Compiled Fuzzer
This grammar fuzzer is described in the paper Building Fast Fuzzers. The idea is to compile a grammar definition to the corresponding source code. Each nonterminal symbol becomes a procedure. First we define a few helpers.
We are going to compile the grammar, which will become a source file that we load separately. To ensure that we do not descend into quoting hell, we transform the grammar so that we store the character bytes rather than the terminals as strings.
Usage
Next, we define the class.
Convenience methods
Cheap grammar compilation
In the previous post, I described how we shift to a cheap grammar when we exhaust our budget. We use the same thing here. That is, at some point we need to curtail the recursion. Hence, we define the cheap grammar that does not contain recursion. The idea is that if we go beyond a given depth, we switch to choosing rules from the non-recursive grammar (cheap grammar).
The cheap grammar from expr grammar
Translation
Translating the nonterminals of the cheap grammar is simple because there is no recursion. We simply choose a random rule to expand.
Usage
Main grammar compilation
For recursive grammars, we need to verify that the depth of recursion is not beyond what is specified. If it has gone beyond the maximum specified depth, we expand the cheap grammar instead.
Usage
The complete driver
Using it
A Problem – Recursion
A problem with the compiled grammar fuzzer is that it relies on recursion, and Python limits the recursion depth arbitrarily (starting with just 1000). Hence, we need a solution that allows us to go past that depth.
We discussed here how to use
generators as a continuation passing trampoline. We use the same technique
again. The basic technique is to turn every function call into a yield
statement, and return the generator. A loop then translates activates
and traverses these generators.
Example
We also need to redefine our cost computation which is recursive.
Using it
This is a useful trick. But what if we do not want to use the generator hack? Turns out, there is an easier solution. The idea is to wrap the remaining computation as a continuation and return. In our case, we modify this technique slightly.
We now define our trampoline, which is different from previous.
Using
Artifacts
The runnable Python source for this notebook is available here.