Contents

  1. Compiled Fuzzer
    1. Cheap grammar compilation
      1. Translation
    2. Main grammar compilation
    3. A Problem – Recursion
    4. Artifacts

Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.

In a previous post I disucssed a simple fuzzer. While simple, the fuzzer is somewhat inefficient. This post discusses a way to speed it up – by compiling.

Available Packages These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.
  1. simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".
xxxxxxxxxx

The imported modules

x
 
import simplefuzzer as fuzzer


As before, we start with a grammar.

xxxxxxxxxx
 
EXPR_GRAMMAR = {
    "<start>": [["<expr>"]],
    "<expr>": [
        ["<term>", "+", "<expr>"],
        ["<term>", "-", "<expr>"],
        ["<term>"]],
    "<term>": [
        ["<factor>", "*", "<term>"],
        ["<factor>", "/", "<term>"],
        ["<factor>"]],
    "<factor>": [
        ["+", "<factor>"],
        ["-", "<factor>"],
        ["(", "<expr>", ")"],
        ["<integer>", ".", "<integer>"],
        ["<integer>"]],
    "<integer>": [
        ["<digit>", "<integer>"],
        ["<digit>"]],
    "<digit>": [["0"], ["1"], ["2"], ["3"], ["4"], ["5"], ["6"], ["7"], ["8"], ["9"]]
}
EXPR_START = '<start>'


Compiled Fuzzer

This grammar fuzzer is described in the paper Building Fast Fuzzers. The idea is to compile a grammar definition to the corresponding source code. Each nonterminal symbol becomes a procedure. First we define a few helpers.

xxxxxxxxxx
 
def symbol_cost(grammar, symbol, seen, cache):
    if symbol in seen: return float('inf')
    lst = []
    for rule in grammar.get(symbol, []):
        if symbol in cache and str(rule) in cache[symbol]:
            lst.append(cache[symbol][str(rule)])
        else:
            lst.append(expansion_cost(grammar, rule, seen | {symbol}, cache))
    v = min(lst, default=0)
    return v
def expansion_cost(grammar, tokens, seen, cache):
    return max((symbol_cost(grammar, token, seen, cache)
                for token in tokens if token in grammar), default=0) + 1
def compute_cost(grammar):
    rule_cost = {}
    for k in grammar:
        rule_cost[k] = {}
        for rule in grammar[k]:
            rule_cost[k][str(rule)] = expansion_cost(grammar, rule, set(), rule_cost)
    return rule_cost


We are going to compile the grammar, which will become a source file that we load separately. To ensure that we do not descend into quoting hell, we transform the grammar so that we store the character bytes rather than the terminals as strings.

xxxxxxxxxx
 
def transform_bytes(grammar):
   new_g = {}
   for k in grammar:
       new_g[k] = [
               [t if fuzzer.is_nonterminal(t) else ord(t) for t in r]
               for r in grammar[k]]
   return new_g


Usage

xxxxxxxxxx
 
g = transform_bytes(EXPR_GRAMMAR)
for k in g:
    print(k)
    for r in g[k]:
        print('  ', r)


Next, we define the class.

xxxxxxxxxx
 
class F1Fuzzer(fuzzer.LimitFuzzer):
    def __init__(self, grammar):
        self.grammar = transform_bytes(grammar)
        self.cost = compute_cost(self.grammar)


Convenience methods

xxxxxxxxxx
 
class F1Fuzzer(F1Fuzzer):
    def add_indent(self, string, indent):
        return '\n'.join([indent + i for i in string.split('\n')])
    def k_to_s(self, k): return k[1:-1].replace('-', '_')


Cheap grammar compilation

In the previous post, I described how we shift to a cheap grammar when we exhaust our budget. We use the same thing here. That is, at some point we need to curtail the recursion. Hence, we define the cheap grammar that does not contain recursion. The idea is that if we go beyond a given depth, we switch to choosing rules from the non-recursive grammar (cheap grammar).

xxxxxxxxxx
 
class F1Fuzzer(F1Fuzzer):
    def cheap_grammar(self):
        cheap_grammar = {}
        for k in self.cost:
            rules = self.grammar[k]
            if rules:
                min_cost = min([self.cost[k][str(r)] for r in rules])
                cheap_grammar[k] = [r for r in self.grammar[k] if self.cost[k][str(r)] == min_cost]
            else:
                cheap_grammar[k] = [] # (No rules found)
        return cheap_grammar


The cheap grammar from expr grammar

xxxxxxxxxx
 
expr_cg = F1Fuzzer(EXPR_GRAMMAR).cheap_grammar()
for k in expr_cg:
    print(k)
    for r in expr_cg[k]:
        print('   ', r)


Translation

Translating the nonterminals of the cheap grammar is simple because there is no recursion. We simply choose a random rule to expand.

xxxxxxxxxx
 
class F1Fuzzer(F1Fuzzer):
    def gen_rule_src_cheap(self, rule, key, i, grammar):
        res = []
        for token in rule:
            if token in grammar:
                res.append('''\
gen_%s_cheap()''' % (self.k_to_s(token)))
            else:
                res.append('''\
result.append(%d)''' % token)
        return '\n'.join(res)
    def gen_alt_src_cheap(self, key, grammar):
        rules = grammar[key]
        result = []
        result.append('''
def gen_%(name)s_cheap():
    val = random.randrange(%(nrules)s)''' % {
            'name':self.k_to_s(key),
            'nrules':len(rules)})
        for i, rule in enumerate(rules):
            result.append('''\
    if val == %d:
%s
        return''' % (i, self.add_indent(
            self.gen_rule_src_cheap(rule, key, i, grammar),'        ')))
        return '\n'.join(result)


Usage

xxxxxxxxxx
 
src = F1Fuzzer(EXPR_GRAMMAR).gen_alt_src_cheap('<expr>',
        transform_bytes(EXPR_GRAMMAR))
print(src)


Main grammar compilation

For recursive grammars, we need to verify that the depth of recursion is not beyond what is specified. If it has gone beyond the maximum specified depth, we expand the cheap grammar instead.

xxxxxxxxxx
 
class F1Fuzzer(F1Fuzzer):
    def gen_rule_src(self, rule, key, i, grammar):
        res = []
        for token in rule:
            if token in grammar:
                res.append('''\
gen_%s(max_depth, next_depth)''' % (self.k_to_s(token)))
            else:
                res.append('''\
result.append(%d)''' % token)
        return '\n'.join(res)
    def gen_alt_src(self, key, grammar):
        rules = grammar[key]
        result = []
        result.append('''
def gen_%(name)s(max_depth, depth=0):
    next_depth = depth + 1
    if depth > max_depth:
        gen_%(name)s_cheap()
        return
    val = random.randrange(%(nrules)s)''' % {
            'name':self.k_to_s(key),
            'nrules':len(rules)})
        for i, rule in enumerate(rules):
            result.append('''\
    if val == %d:
%s
        return''' % (i, self.add_indent(self.gen_rule_src(rule, key, i, grammar),'        ')))
        return '\n'.join(result)


Usage

xxxxxxxxxx
 
src = F1Fuzzer(EXPR_GRAMMAR).gen_alt_src('<expr>',
        transform_bytes(EXPR_GRAMMAR))
print(src)


The complete driver

xxxxxxxxxx
 
import types
class F1Fuzzer(F1Fuzzer):
    def gen_main_src(self):
        return '''
import random
result = []
def start(max_depth):
    gen_start(max_depth)
    v = ''.join([chr(i) for i in result])
    result.clear()
    return v
        '''
    def gen_fuzz_src(self):
        result = []
        cheap_grammar = self.cheap_grammar()
        for key in cheap_grammar:
            result.append(self.gen_alt_src_cheap(key, cheap_grammar))
        for key in self.grammar:
            result.append(self.gen_alt_src(key, self.grammar))
        return '\n'.join(result)
    def fuzz_src(self, key='<start>'):
        result = [self.gen_fuzz_src(),
                  self.gen_main_src()]
        return ''.join(result)
    def load_src(self, src, mn):
        module = types.ModuleType(mn)
        exec(src, module.__dict__)
        return module
    def fuzzer(self, name):
        cf_src = self.fuzz_src()
        return self.load_src(cf_src, name + '_f1_fuzzer')


Using it

xxxxxxxxxx
 
expr_fuzzer = F1Fuzzer(EXPR_GRAMMAR).fuzzer('expr_fuzzer')
for i in range(10):
    v = expr_fuzzer.start(10)
    print(v)


A Problem – Recursion

A problem with the compiled grammar fuzzer is that it relies on recursion, and Python limits the recursion depth arbitrarily (starting with just 1000). Hence, we need a solution that allows us to go past that depth.

We discussed here how to use generators as a continuation passing trampoline. We use the same technique again. The basic technique is to turn every function call into a yield statement, and return the generator. A loop then translates activates and traverses these generators.

xxxxxxxxxx
 
class F1CPSFuzzer(F1Fuzzer):
    def gen_rule_src_cheap(self, rule, key, i, grammar):
        res = []
        for token in rule:
            if token in grammar:
                res.append('''\
yield gen_%s_cheap()''' % (self.k_to_s(token)))
            else:
                res.append('''\
result.append(%d)''' % token)
        return '\n'.join(res)
    def gen_alt_src_cheap(self, key, grammar):
        rules = grammar[key]
        result = []
        result.append('''
def gen_%(name)s_cheap():
    val = random.randrange(%(nrules)s)''' % {
            'name':self.k_to_s(key),
            'nrules':len(rules)})
        for i, rule in enumerate(rules):
            result.append('''\
    if val == %d:
%s
        return''' % (i, self.add_indent(self.gen_rule_src_cheap(rule, key, i, grammar),'        ')))
        return '\n'.join(result)
class F1CPSFuzzer(F1CPSFuzzer):
    def gen_rule_src(self, rule, key, i, grammar):
        res = []
        for token in rule:
            if token in grammar:
                res.append('''\
yield gen_%s(max_depth, next_depth)''' % (self.k_to_s(token)))
            else:
                res.append('''\
result.append(%d)''' % token)
        return '\n'.join(res)
    def gen_alt_src(self, key, grammar):
        rules = grammar[key]
        result = []
        result.append('''
def gen_%(name)s(max_depth, depth=0):
    next_depth = depth + 1
    if depth > max_depth:
        yield gen_%(name)s_cheap()
        return
    val = random.randrange(%(nrules)s)''' % {
            'name':self.k_to_s(key),
            'nrules':len(rules)})
        for i, rule in enumerate(rules):
            result.append('''\
    if val == %d:
%s
        return''' % (i, self.add_indent(self.gen_rule_src(rule, key, i, grammar),'        ')))
        return '\n'.join(result)
class F1CPSFuzzer(F1CPSFuzzer):
    def gen_main_src(self):
        return '''
def cpstrampoline(gen):
    stack = [gen]
    ret = None
    while stack:
        try:
            value, ret = ret, None
            res = stack[-1].send(value)
            if res is not None:
                stack.append(res)
        except StopIteration as e:
            stack.pop()
            ret = e.value
    return ret
import random
result = []
def start(max_depth):
    cpstrampoline(gen_start(max_depth))
    v = ''.join([chr(i) for i in result])
    result.clear()
    return v
        '''
    def gen_fuzz_src(self):
        result = []
        cheap_grammar = self.cheap_grammar()
        for key in cheap_grammar:
            result.append(self.gen_alt_src_cheap(key, cheap_grammar))
        for key in self.grammar:
            result.append(self.gen_alt_src(key, self.grammar))
        return '\n'.join(result)
    def fuzz_src(self, key='<start>'):
        result = [self.gen_fuzz_src(),
                  self.gen_main_src()]
        return ''.join(result)
    def fuzzer(self, name):
        cf_src = self.fuzz_src()
        return self.load_src(cf_src, name + '_f1_fuzzer')


Example

xxxxxxxxxx
 
src = F1CPSFuzzer(EXPR_GRAMMAR).fuzz_src('<start>')
print(src)


We also need to redefine our cost computation which is recursive.

xxxxxxxxxx
 
def symbol_cost_cps(grammar, symbol, seen, cache):
    if symbol in seen: return float('inf')
    lst = []
    for rule in grammar.get(symbol, []):
        if symbol in cache and str(rule) in cache[symbol]:
            lst.append(cache[symbol][str(rule)])
        else:
            e = yield expansion_cost_cps(grammar, rule, seen | {symbol}, cache)
            lst.append(e)
    v = min(lst, default=0)
    return v
def expansion_cost_cps(grammar, tokens, seen, cache):
    lst = []
    for token in tokens:
        if token not in grammar: continue
        s = yield symbol_cost_cps(grammar, token, seen, cache)
        lst.append(s)
    return max(lst, default=0) + 1
def cpstrampoline(gen):
    stack = [gen]
    ret = None
    while stack:
        try:
            value, ret = ret, None
            res = stack[-1].send(value)
            if res is not None:
                stack.append(res)
        except StopIteration as e:
            stack.pop()
            ret = e.value
    return ret
def compute_cost_cps(grammar):
    rule_cost = {}
    for k in grammar:
        rule_cost[k] = {}
        for rule in grammar[k]:
            e = cpstrampoline(expansion_cost_cps(grammar, rule, set(), rule_cost))
            rule_cost[k][str(rule)] = e
    return rule_cost
class F1CPSFuzzer(F1CPSFuzzer):
    def __init__(self, grammar):
        self.grammar = transform_bytes(grammar)
        self.cost = compute_cost_cps(self.grammar)


Using it

xxxxxxxxxx
 
expr_fuzzer = F1CPSFuzzer(EXPR_GRAMMAR).fuzzer('expr_fuzzer')
for i in range(10):
    v = expr_fuzzer.start(10)
    print(repr(v))


This is a useful trick. But what if we do not want to use the generator hack? Turns out, there is an easier solution. The idea is to wrap the remaining computation as a continuation and return. In our case, we modify this technique slightly.

xxxxxxxxxx
 
class F1LFuzzer(F1Fuzzer):
    def gen_rule_src_cheap(self, rule, key, i, grammar):
        res = []
        for token in rule:
            if token in grammar:
                res.append('''\
lambda: gen_%s_cheap(),''' % (self.k_to_s(token)))
            else:
                res.append('''\
lambda: result.append(%d),''' % token)
        return '\n'.join(res)
    def gen_alt_src_cheap(self, key, grammar):
        rules = grammar[key]
        result = []
        result.append('''
def gen_%(name)s_cheap():
    val = random.randrange(%(nrules)s)''' % {
            'name':self.k_to_s(key),
            'nrules':len(rules)})
        for i, rule in enumerate(rules):
            result.append('''\
    if val == %d:
        return [%s]''' % (i, self.add_indent(self.gen_rule_src_cheap(rule, key, i, grammar),'        ')))
        return '\n'.join(result)
class F1LFuzzer(F1LFuzzer):
    def gen_rule_src(self, rule, key, i, grammar):
        res = []
        for token in rule:
            if token in grammar:
                res.append('''\
lambda: gen_%s(max_depth, next_depth),''' % (self.k_to_s(token)))
            else:
                res.append('''\
lambda: result.append(%d),''' % token)
        return '\n'.join(res)
    def gen_alt_src(self, key, grammar):
        rules = grammar[key]
        result = []
        result.append('''
def gen_%(name)s(max_depth, depth=0):
    next_depth = depth + 1
    if depth > max_depth:
        return [lambda: gen_%(name)s_cheap()]
    val = random.randrange(%(nrules)s)''' % {
            'name':self.k_to_s(key),
            'nrules':len(rules)})
        for i, rule in enumerate(rules):
            result.append('''\
    if val == %d:
        return [%s]''' % (i, self.add_indent(self.gen_rule_src(rule, key, i, grammar),'        ')))
        return '\n'.join(result)


We now define our trampoline, which is different from previous.

xxxxxxxxxx
 
class F1LFuzzer(F1LFuzzer):
    def gen_main_src(self):
        return '''
def trampoline(gen):
    ret = None
    stack = gen
    while stack:
        cur, *stack = stack
        res = cur()
        if res is not None:
            stack = res + stack
    return
import random
result = []
def start(max_depth):
    trampoline(gen_start(max_depth))
    v = ''.join([chr(i) for i in result])
    result.clear()
    return v
        '''
    def gen_fuzz_src(self):
        result = []
        cheap_grammar = self.cheap_grammar()
        for key in cheap_grammar:
            result.append(self.gen_alt_src_cheap(key, cheap_grammar))
        for key in self.grammar:
            result.append(self.gen_alt_src(key, self.grammar))
        return '\n'.join(result)
    def fuzz_src(self, key='<start>'):
        result = [self.gen_fuzz_src(),
                  self.gen_main_src()]
        return ''.join(result)
    def fuzzer(self, name):
        cf_src = self.fuzz_src()
        return self.load_src(cf_src, name + '_f1_fuzzer')


Using

xxxxxxxxxx
 
f = F1LFuzzer(EXPR_GRAMMAR)
expr_fuzzer = f.fuzzer('expr_fuzzer')
for i in range(10):
    v = expr_fuzzer.start(10)
    print(repr(v))


Artifacts

The runnable Python source for this notebook is available here.