The simplest grammar fuzzer in the world


Fuzzing is one of the key tools in a security researcher’s toolbox. It is simple to write a random fuzzer.
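As a rough illustration of how little is needed, here is a minimal sketch of a random fuzzer. The name random_fuzzer, its parameters, and the choice of string.printable as the alphabet are assumptions made for this example only.

```python
import random
import string

def random_fuzzer(max_length=100, alphabet=string.printable):
    """Return a random string of up to `max_length` characters drawn from `alphabet`."""
    length = random.randint(0, max_length)
    return ''.join(random.choice(alphabet) for _ in range(length))

# Show one sample; repr() makes whitespace and control characters visible.
print(repr(random_fuzzer()))
```

Every call produces unstructured noise, which is why such inputs rarely get past the first few checks of a JSON parser.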

Unfortunately, random fuzzing is not very effective for programs that accept complex input languages, such as those that expect JSON or any other structure in their input. For these programs, fuzzing can be much more effective if one has a model of their input structure. A number of such tools exist (1, 2, 3, 4). But how difficult is it to write your own grammar-based fuzzer? The interesting thing is that a grammar fuzzer is essentially a parser turned inside out: rather than consuming input, we output the tokens that the parser would have compared the input against. With that idea in mind, let us start from one of the simplest parsers (a PEG parser).

Now, all one needs is a grammar.

grammar = {
        '<start>': [['<json>']],
        '<json>': [['<element>']],
        '<element>': [['<ws>', '<value>', '<ws>']],
        '<value>': [
           ['<object>'], ['<array>'], ['<string>'], ['<number>'],
           ['true'], ['false'], ['null']],
        '<object>': [['{', '<ws>', '}'], ['{', '<members>', '}']],
        '<members>': [['<member>', '<symbol-2>']],
        '<member>': [['<ws>', '<string>', '<ws>', ':', '<element>']],
        '<array>': [['[', '<ws>', ']'], ['[', '<elements>', ']']],
        '<elements>': [['<element>', '<symbol-1-1>']],
        '<string>': [['"', '<characters>', '"']],
        '<characters>': [['<character-1>']],
        '<character>': [
            ['0'], ['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9'],
            ['a'], ['b'], ['c'], ['d'], ['e'], ['f'], ['g'], ['h'], ['i'], ['j'],
            ['k'], ['l'], ['m'], ['n'], ['o'], ['p'], ['q'], ['r'], ['s'], ['t'],
            ['u'], ['v'], ['w'], ['x'], ['y'], ['z'], ['A'], ['B'], ['C'], ['D'],
            ['E'], ['F'], ['G'], ['H'], ['I'], ['J'], ['K'], ['L'], ['M'], ['N'],
            ['O'], ['P'], ['Q'], ['R'], ['S'], ['T'], ['U'], ['V'], ['W'], ['X'],
            ['Y'], ['Z'], ['!'], ['#'], ['$'], ['%'], ['&'], ["'"], ['('], [')'],
            ['*'], ['+'], [','], ['-'], ['.'], ['/'], [':'], [';'], ['<'], ['='],
            ['>'], ['?'], ['@'], ['['], [']'], ['^'], ['_'], ['`'], ['{'], ['|'],
            ['}'], ['~'], [' '], ['\\"'], ['\\\\'], ['\\/'], ['<escaped>']],
        '<number>': [['<int>', '<frac>', '<exp>']],
        '<int>': [
           ['<digit>'], ['<onenine>', '<digits>'],
           ['-', '<digits>'], ['-', '<onenine>', '<digits>']],
        '<digits>': [['<digit-1>']],
        '<digit>': [['0'], ['<onenine>']],
        '<onenine>': [['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9']],
        '<frac>': [[], ['.', '<digits>']],
        '<exp>': [[], ['E', '<sign>', '<digits>'], ['e', '<sign>', '<digits>']],
        '<sign>': [[], ['+'], ['-']],
        '<ws>': [['<sp1>', '<ws>'], []],
        '<sp1>': [[' ']], ##[['\n'], ['\r'], ['\t'], ['\x08'], ['\x0c']],
        '<symbol>': [[',', '<members>']],
        '<symbol-1>': [[',', '<elements>']],
        '<symbol-2>': [[], ['<symbol>', '<symbol-2>']],
        '<symbol-1-1>': [[], ['<symbol-1>', '<symbol-1-1>']],
        '<character-1>': [[], ['<character>', '<character-1>']],
        '<digit-1>': [['<digit>'], ['<digit>', '<digit-1>']],
        '<escaped>': [['\\u', '<hex>', '<hex>', '<hex>', '<hex>']],
        '<hex>': [
            ['0'], ['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9'],
            ['a'], ['b'], ['c'], ['d'], ['e'], ['f'], ['A'], ['B'], ['C'], ['D'], ['E'], ['F']]
        }

The driver is as follows:
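A minimal sketch of such a driver follows. The function names unify_key and unify_rule are assumptions for this example, and a small toy grammar stands in for the JSON grammar so that the block is self-contained. The sketch mirrors a recursive-descent parser: where the parser would try an alternative, we pick one at random, and where it would match a terminal, we emit it.

```python
import random

def unify_key(grammar, key):
    """Expand `key`: choose a random rule for a nonterminal, emit a terminal as-is."""
    if key not in grammar:
        return [key]
    return unify_rule(grammar, random.choice(grammar[key]))

def unify_rule(grammar, rule):
    """Expand every token of `rule` in order and concatenate the results."""
    result = []
    for token in rule:
        result.extend(unify_key(grammar, token))
    return result

# A hypothetical toy grammar for binary strings, in the same format as `grammar` above.
toy_grammar = {
    '<start>': [['<digits>']],
    '<digits>': [['<digit>'], ['<digit>', '<digits>']],
    '<digit>': [['0'], ['1']],
}

print(''.join(unify_key(toy_grammar, '<start>')))
```

With the JSON grammar defined above, the equivalent call would be ''.join(unify_key(grammar, '<start>')).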

This grammar fuzzer can be implemented in pretty much any programming language that supports basic data structures. What if you want the derivation tree instead? The following modified fuzzer will get you the derivation tree, which can be used with fuzzingbook.GrammarFuzzer.tree_to_string.
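One way to sketch the tree-producing variant is shown here. It assumes each node is a (symbol, children) pair and that a terminal has an empty child list. The names tree_key, tree_rule, is_nonterminal, and the local tree_to_str are illustrative stand-ins, not the fuzzingbook API.

```python
import random

def is_nonterminal(symbol):
    """Assumed convention: any token written as <...> names a nonterminal."""
    return len(symbol) > 1 and symbol[0] == '<' and symbol[-1] == '>'

def tree_key(grammar, key):
    """Return a (symbol, children) node; a terminal gets an empty child list."""
    if key not in grammar:
        return (key, [])
    return (key, tree_rule(grammar, random.choice(grammar[key])))

def tree_rule(grammar, rule):
    """Expand each token of `rule` into its own subtree, preserving order."""
    return [tree_key(grammar, token) for token in rule]

def tree_to_str(tree):
    """Concatenate the terminal leaves of a derivation tree, left to right."""
    symbol, children = tree
    if children:
        return ''.join(tree_to_str(child) for child in children)
    # A childless nonterminal came from an empty rule and contributes nothing.
    return '' if is_nonterminal(symbol) else symbol

# A hypothetical toy grammar with an empty rule, to exercise that case.
toy_grammar = {
    '<start>': [['<sign>', '<digit>']],
    '<sign>': [[], ['-']],
    '<digit>': [['0'], ['1']],
}

tree = tree_key(toy_grammar, '<start>')
print(tree)
print(tree_to_str(tree))
```

The only change from the string-producing driver is that each expansion keeps its symbol and its children instead of flattening them immediately.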

Using it

We now want a way to display this tree. We first define a simple option holder class.

We can now define our default drawing options for displaying a tree. The default options include the vertical line (|), the horizontal line (–), and the marker used for the last child (+).

We want to display the tree. This is simply display_tree.

The display_tree function calls format_tree, which is defined as follows.
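The pieces described above (option holder, default options, display_tree, and format_tree) might fit together as in the following sketch. The class name Options, the attribute names V, H, and L, and the use of plain ASCII drawing characters are assumptions for this example.

```python
class Options:
    """A simple option holder: keyword arguments become attributes."""
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

# Assumed defaults: vertical line, horizontal line, and last-child marker.
OPTIONS = Options(V='|', H='-', L='+')

def format_tree(node, options=OPTIONS, prefix=''):
    """Yield one text line per descendant of `node`, depth first."""
    children = node[1]
    for i, child in enumerate(children):
        is_last = (i == len(children) - 1)
        connector = options.L if is_last else options.V
        yield prefix + connector + options.H + ' ' + child[0]
        # Below a last child there is nothing left to connect, so pad with a space.
        continuation = ' ' if is_last else options.V
        yield from format_tree(child, options, prefix + continuation + '  ')

def display_tree(node, options=OPTIONS):
    """Print the root symbol, then every line produced by format_tree."""
    print(node[0])
    for line in format_tree(node, options):
        print(line)

# A hand-written derivation tree in (symbol, children) form.
example = ('<start>', [('<sign>', [('-', [])]), ('<digit>', [('1', [])])])
display_tree(example)
```

Keeping format_tree as a generator separates layout from printing, so the same lines can be collected into a list or written to a file instead.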

We can now show the tree

The corresponding string is

One problem with the above fuzzer is that it can fail to terminate the recursion. So, what we want to do is to limit unbounded recursion to a fixed depth. Beyond that fixed depth, we want to only expand those rules that are guaranteed to terminate.

For that, we define the cost of expansion for each symbol in a grammar. The cost of a symbol is the cost of its cheapest rule.

The cost of a rule is the cost of the costliest symbol in that rule, plus one.
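These two definitions can be sketched as a pair of mutually recursive functions. The helper names symbol_cost and expansion_cost are assumptions for this example. So are two conventions: a terminal costs 0, and a symbol that is already being expanded costs infinity, which marks a rule that recurses. The result is indexed as cost[key][str(rule)].

```python
def symbol_cost(grammar, symbol, seen):
    """Cost of a symbol: the cost of its cheapest rule (0 for terminals)."""
    if symbol not in grammar:
        return 0
    if symbol in seen:
        # Already being expanded on this path, so this route is recursive.
        return float('inf')
    return min((expansion_cost(grammar, rule, seen | {symbol})
                for rule in grammar[symbol]), default=0)

def expansion_cost(grammar, rule, seen):
    """Cost of a rule: the cost of its costliest symbol, plus one."""
    return max((symbol_cost(grammar, token, seen)
                for token in rule if token in grammar), default=0) + 1

def compute_cost(grammar):
    """Map every key to a dict from str(rule) to that rule's cost."""
    cost = {}
    for key in grammar:
        cost[key] = {}
        for rule in grammar[key]:
            cost[key][str(rule)] = expansion_cost(grammar, rule, {key})
    return cost

# A hypothetical toy grammar with one recursive rule.
toy_grammar = {
    '<start>': [['<digits>']],
    '<digits>': [['<digit>'], ['<digit>', '<digits>']],
    '<digit>': [['0'], ['1']],
}

for key, rule_costs in compute_cost(toy_grammar).items():
    print(key, rule_costs)
```

In the toy grammar, the rule ['<digit>', '<digits>'] refers back to its own key and therefore costs infinity, while ['<digit>'] costs 2: one for the rule itself and one for expanding the digit.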

Here is an implementation that uses random expansions until a configurable depth (max_depth) is reached, and beyond that, uses purely non-recursive cheap expansions.

import random

class LimitFuzzer:
    def __init__(self, grammar):
        self.grammar = grammar
        self.key_cost = {}
        # compute_cost implements the cost definitions given above;
        # its result is indexed as cost[key][str(rule)].
        self.cost = compute_cost(grammar)
        # For each key, keep only its minimum cost rules. We do not minimize
        # any further here; we simply avoid the infinite cost rules.
        self.cheap_grammar = {}
        for k in self.cost:
            rules = self.grammar[k]
            min_cost = min([self.cost[k][str(r)] for r in rules])
            self.cheap_grammar[k] = [r for r in rules if self.cost[k][str(r)] == min_cost]

    def gen_key(self, key, depth, max_depth):
        # Anything that is not a key in the grammar is a terminal.
        if key not in self.grammar: return key
        if depth > max_depth:
            # Past the depth limit: allow only the cheapest rules, which must terminate.
            clst = sorted([(self.cost[key][str(rule)], rule) for rule in self.grammar[key]])
            assert clst[0][0] != float('inf')
            rules = [r for c,r in clst if c == clst[0][0]]
        else:
            rules = self.grammar[key]
        return self.gen_rule(random.choice(rules), depth+1, max_depth)

    def gen_rule(self, rule, depth, max_depth):
        return ''.join(self.gen_key(token, depth, max_depth) for token in rule)

    def fuzz(self, key='<start>', max_depth=10):
        return self.gen_key(key=key, depth=0, max_depth=max_depth)

Using it:

Iterative fuzzer

One of the problems with the above fuzzer is that we use the Python stack to keep track of the expansion tree. Unfortunately, Python limits the usable stack depth (the default recursion limit in CPython is 1000 frames), which makes it hard to generate deeply nested trees. One alternative is to handle the stack management ourselves, as we show next. First, we define an iterative version of the tree_to_string function called iter_tree_to_str() as below.
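A minimal sketch of iter_tree_to_str() follows, assuming (symbol, children) nodes and the convention that any token written as <...> is a nonterminal. The helper is_nonterminal is an assumption for this example. An explicit list plays the role of the call stack, so the depth of the tree is limited only by available memory.

```python
def is_nonterminal(symbol):
    """Assumed convention: any token written as <...> names a nonterminal."""
    return len(symbol) > 1 and symbol[0] == '<' and symbol[-1] == '>'

def iter_tree_to_str(tree):
    """Concatenate the terminal leaves of `tree` without using recursion."""
    expanded = []
    stack = [tree]
    while stack:
        symbol, children = stack.pop()
        if is_nonterminal(symbol):
            # Push children reversed so that the leftmost child is popped first.
            stack.extend(reversed(children))
        else:
            expanded.append(symbol)
    return ''.join(expanded)

# Build a tree far deeper than the default recursion limit, without recursing.
deep = ('x', [])
for _ in range(5000):
    deep = ('<expr>', [('(', []), deep, (')', [])])

result = iter_tree_to_str(deep)
print(len(result))
```

A recursive tree_to_string would raise RecursionError on this input under the default limit, whereas the loop above simply runs longer.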

You can use it as follows:

Next, we add iter_gen_key() to LimitFuzzer.

Finally, we ensure that the iterative gen_key can be called by defining iter_fuzz().
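The idea behind iter_gen_key() and iter_fuzz() can be sketched as standalone functions. To keep the block self-contained, the grammar and a hand-written cheap grammar are passed in explicitly rather than read from a LimitFuzzer instance. That signature, and the toy grammars, are assumptions for this example. Nodes are mutable [symbol, children] lists that are filled in as pending entries are processed.

```python
import random

def iter_gen_key(grammar, cheap_grammar, key, max_depth):
    """Build a derivation tree for `key` using an explicit work list."""
    root = [key, []]
    pending = [(root, 0)]  # entries are (node, depth)
    while pending:
        (symbol, children), depth = pending.pop()
        if symbol not in grammar:
            continue  # terminal: nothing to expand
        # Past the depth limit, choose only among the cheapest rules.
        rules = cheap_grammar[symbol] if depth > max_depth else grammar[symbol]
        for token in random.choice(rules):
            child = [token, []]
            children.append(child)  # child order in the tree follows the rule
            pending.append((child, depth + 1))
    return root

def iter_fuzz(grammar, cheap_grammar, key='<start>', max_depth=10):
    """Generate a tree iteratively, then flatten it iteratively as well."""
    tree = iter_gen_key(grammar, cheap_grammar, key, max_depth)
    output, stack = [], [tree]
    while stack:
        symbol, children = stack.pop()
        if symbol in grammar:
            stack.extend(reversed(children))  # leftmost child is popped first
        else:
            output.append(symbol)
    return ''.join(output)

# Hypothetical toy grammar for nested parentheses, with its cheap subset.
toy_grammar = {
    '<start>': [['<expr>']],
    '<expr>': [['(', '<expr>', ')'], ['x']],
}
toy_cheap = {
    '<start>': [['<expr>']],
    '<expr>': [['x']],
}

print(iter_fuzz(toy_grammar, toy_cheap, max_depth=5))
```

Because pending entries are popped from the end, the rightmost child is expanded first. The shape of the tree is unaffected, since children are appended in rule order before any of them is expanded.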

Using it

Artifacts

The runnable Python source for this notebook is available here.

The installable Python wheel simplefuzzer is available here.
