Compiled Fuzzer

Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.

In a previous post I disucssed a simple fuzzer. While simple, the fuzzer is somewhat inefficient. This post discusses a way to speed it up – by compiling.

Available Packages

These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.

simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".

The imported modules

As before, we start with a grammar.

Compiled Fuzzer

This grammar fuzzer is described in the paper Building Fast Fuzzers. The idea is to compile a grammar definition to the corresponding source code. Each nonterminal symbol becomes a procedure. First we define a few helpers.

We are going to compile the grammar, which will become a source file that we load separately. To ensure that we do not descend into quoting hell, we transform the grammar so that we store the character bytes rather than the terminals as strings.

Usage

Next, we define the class.

Convenience methods

Cheap grammar compilation

In the previous post, I described how we shift to a cheap grammar when we exhaust our budget. We use the same thing here. That is, at some point we need to curtail the recursion. Hence, we define the cheap grammar that does not contain recursion. The idea is that if we go beyond a given depth, we switch to choosing rules from the non-recursive grammar (cheap grammar).

The cheap grammar from expr grammar

Translation

Translating the nonterminals of the cheap grammar is simple because there is no recursion. We simply choose a random rule to expand.

Usage

Main grammar compilation

For recursive grammars, we need to verify that the depth of recursion is not beyond what is specified. If it has gone beyond the maximum specified depth, we expand the cheap grammar instead.

Usage

The complete driver

Using it

A Problem – Recursion

A problem with the compiled grammar fuzzer is that it relies on recursion, and Python limits the recursion depth arbitrarily (starting with just 1000). Hence, we need a solution that allows us to go past that depth.

We discussed here how to use generators as a continuation passing trampoline. We use the same technique again. The basic technique is to turn every function call into a yield statement, and return the generator. A loop then translates activates and traverses these generators.

class F1CPSFuzzer(F1Fuzzer):
    def gen_rule_src_cheap(self, rule, key, i, grammar):
        res = []
        for token in rule:
            if token in grammar:
                res.append('''\
yield gen_%s_cheap()''' % (self.k_to_s(token)))
            else:
                res.append('''\
result.append(%d)''' % token)
        return '\n'.join(res)

def gen_alt_src_cheap(self, key, grammar):
        rules = grammar[key]
        result = []
        result.append('''
def gen_%(name)s_cheap():
    val = random.randrange(%(nrules)s)''' % {
            'name':self.k_to_s(key),
            'nrules':len(rules)})
        for i, rule in enumerate(rules):
            result.append('''\
    if val == %d:
%s
        return''' % (i, self.add_indent(self.gen_rule_src_cheap(rule, key, i, grammar),'        ')))
        return '\n'.join(result)

class F1CPSFuzzer(F1CPSFuzzer):
    def gen_rule_src(self, rule, key, i, grammar):
        res = []
        for token in rule:
            if token in grammar:
                res.append('''\
yield gen_%s(max_depth, next_depth)''' % (self.k_to_s(token)))
            else:
                res.append('''\
result.append(%d)''' % token)
        return '\n'.join(res)

def gen_alt_src(self, key, grammar):
        rules = grammar[key]
        result = []
        result.append('''
def gen_%(name)s(max_depth, depth=0):
    next_depth = depth + 1
    if depth > max_depth:
        yield gen_%(name)s_cheap()
        return
    val = random.randrange(%(nrules)s)''' % {
            'name':self.k_to_s(key),
            'nrules':len(rules)})
        for i, rule in enumerate(rules):
            result.append('''\
    if val == %d:
%s
        return''' % (i, self.add_indent(self.gen_rule_src(rule, key, i, grammar),'        ')))
        return '\n'.join(result)

class F1CPSFuzzer(F1CPSFuzzer):
    def gen_main_src(self):
        return '''
def cpstrampoline(gen):
    stack = [gen]
    ret = None
    while stack:
        try:
            value, ret = ret, None
            res = stack[-1].send(value)
            if res is not None:
                stack.append(res)
        except StopIteration as e:
            stack.pop()
            ret = e.value
    return ret

import random
result = []
def start(max_depth):
    cpstrampoline(gen_start(max_depth))
    v = ''.join([chr(i) for i in result])
    result.clear()
    return v
        '''

def gen_fuzz_src(self):
        result = []
        cheap_grammar = self.cheap_grammar()
        for key in cheap_grammar:
            result.append(self.gen_alt_src_cheap(key, cheap_grammar))
        for key in self.grammar:
            result.append(self.gen_alt_src(key, self.grammar))
        return '\n'.join(result)

def fuzz_src(self, key='<start>'):
        result = [self.gen_fuzz_src(),
                  self.gen_main_src()]
        return ''.join(result)

def fuzzer(self, name):
        cf_src = self.fuzz_src()
        return self.load_src(cf_src, name + '_f1_fuzzer')

Example

We also need to redefine our cost computation which is recursive.

def symbol_cost_cps(grammar, symbol, seen, cache):
    if symbol in seen: return float('inf')
    lst = []
    for rule in grammar.get(symbol, []):
        if symbol in cache and str(rule) in cache[symbol]:
            lst.append(cache[symbol][str(rule)])
        else:
            e = yield expansion_cost_cps(grammar, rule, seen | {symbol}, cache)
            lst.append(e)
    v = min(lst, default=0)
    return v

def expansion_cost_cps(grammar, tokens, seen, cache):
    lst = []
    for token in tokens:
        if token not in grammar: continue
        s = yield symbol_cost_cps(grammar, token, seen, cache)
        lst.append(s)
    return max(lst, default=0) + 1

def cpstrampoline(gen):
    stack = [gen]
    ret = None
    while stack:
        try:
            value, ret = ret, None
            res = stack[-1].send(value)
            if res is not None:
                stack.append(res)
        except StopIteration as e:
            stack.pop()
            ret = e.value
    return ret

def compute_cost_cps(grammar):
    rule_cost = {}
    for k in grammar:
        rule_cost[k] = {}
        for rule in grammar[k]:
            e = cpstrampoline(expansion_cost_cps(grammar, rule, set(), rule_cost))
            rule_cost[k][str(rule)] = e
    return rule_cost

class F1CPSFuzzer(F1CPSFuzzer):
    def __init__(self, grammar):
        self.grammar = transform_bytes(grammar)
        self.cost = compute_cost_cps(self.grammar)

Using it

This is a useful trick. But what if we do not want to use the generator hack? Turns out, there is an easier solution. The idea is to wrap the remaining computation as a continuation and return. In our case, we modify this technique slightly.

class F1LFuzzer(F1Fuzzer):
    def gen_rule_src_cheap(self, rule, key, i, grammar):
        res = []
        for token in rule:
            if token in grammar:
                res.append('''\
lambda: gen_%s_cheap(),''' % (self.k_to_s(token)))
            else:
                res.append('''\
lambda: result.append(%d),''' % token)
        return '\n'.join(res)

def gen_alt_src_cheap(self, key, grammar):
        rules = grammar[key]
        result = []
        result.append('''
def gen_%(name)s_cheap():
    val = random.randrange(%(nrules)s)''' % {
            'name':self.k_to_s(key),
            'nrules':len(rules)})
        for i, rule in enumerate(rules):
            result.append('''\
    if val == %d:
        return [%s]''' % (i, self.add_indent(self.gen_rule_src_cheap(rule, key, i, grammar),'        ')))
        return '\n'.join(result)

class F1LFuzzer(F1LFuzzer):
    def gen_rule_src(self, rule, key, i, grammar):
        res = []
        for token in rule:
            if token in grammar:
                res.append('''\
lambda: gen_%s(max_depth, next_depth),''' % (self.k_to_s(token)))
            else:
                res.append('''\
lambda: result.append(%d),''' % token)
        return '\n'.join(res)

def gen_alt_src(self, key, grammar):
        rules = grammar[key]
        result = []
        result.append('''
def gen_%(name)s(max_depth, depth=0):
    next_depth = depth + 1
    if depth > max_depth:
        return [lambda: gen_%(name)s_cheap()]
    val = random.randrange(%(nrules)s)''' % {
            'name':self.k_to_s(key),
            'nrules':len(rules)})
        for i, rule in enumerate(rules):
            result.append('''\
    if val == %d:
        return [%s]''' % (i, self.add_indent(self.gen_rule_src(rule, key, i, grammar),'        ')))
        return '\n'.join(result)

We now define our trampoline, which is different from previous.

class F1LFuzzer(F1LFuzzer):
    def gen_main_src(self):
        return '''
def trampoline(gen):
    ret = None
    stack = gen
    while stack:
        cur, *stack = stack
        res = cur()
        if res is not None:
            stack = res + stack
    return

import random
result = []
def start(max_depth):
    trampoline(gen_start(max_depth))
    v = ''.join([chr(i) for i in result])
    result.clear()
    return v
        '''

def fuzz_src(self, key='<start>'):
        result = [self.gen_fuzz_src(),
                  self.gen_main_src()]
        return ''.join(result)

def fuzzer(self, name):
        cf_src = self.fuzz_src()
        return self.load_src(cf_src, name + '_f1_fuzzer')

Using

Artifacts

The runnable Python source for this notebook is available here.

A Fast Grammar Fuzzer by Compiling the Grammar

Contents

Compiled Fuzzer

Cheap grammar compilation

Translation

Main grammar compilation

A Problem – Recursion

Artifacts