# Search Based Fuzzing -- Computing Branch Distance

Search based fuzzing involves generating various candidate inputs for a given
program, identifying the inputs that score best on some metric of
effectiveness, and choosing them for the next iteration, so that one can
iteratively improve the fitness of the given population of inputs.
In the previous post
I discussed how you can score inputs using *approach level*.

The approach level (or approximation level) is a reasonable way to compute the
distance we have to traverse to reach a given node. However, in fuzzing, we
also find that just reaching a node is insufficient. In several cases, the
branch taken by an input execution determines whether we can make progress. That is,
given several inputs, each of which reaches a given decision node, we need to
prioritize those inputs that are closest to switching to an uncovered branch.
To do this, we use what is called the *branch distance*, first
proposed by Bogdan Korel in 1990 [^korel1990]. In this post, I will discuss how to
compute the branch distance used in the fitness score of an input for
flipping a branch condition.

As before, we start by importing the prerequisites.

## Contents

**Important:** Pyodide takes time to initialize.
Initialization completion is indicated by a red border around the *Run all* button.

## Available Packages

These packages refer either to my previous posts or to pure Python packages that I have compiled, and they are available in the locations below. As before, install them if you need to run the program directly on your machine. To install, simply download the wheel file (`pkg.whl`) and install it using `pip install pkg.whl`.

## Branch Distance

Given a set of variables (corresponding to an input) and a conditional, the branch distance measures how far the condition at the *critical branch* is from evaluating to true or false, where the critical branch is the branch at which the control flow diverged from reaching the target. That is, if an input diverged at a critical branch, the branch distance is the distance to that conditional evaluating to true or false. It is given by the following translation:

Element | Value |
---|---|
Boolean | if True then 0 else K |
a = b | if abs(a-b) = 0 then 0 else abs(a-b) + K |
a != b | if abs(a-b) != 0 then 0 else K |
a < b | if a-b < 0 then 0 else (a-b) + K |
a <= b | if a-b <= 0 then 0 else (a-b) + K |
a > b | if b-a < 0 then 0 else (b-a) + K |
a >= b | if b-a <= 0 then 0 else (b-a) + K |
a or b | min(bdistance(a), bdistance(b)) |
a and b | bdistance(a) + bdistance(b) |
not a | negation is moved inward and propagated over a |

K is a penalty constant that lets you fine-tune the penalty of being wrong. Typically, K = 1.

Once you have the branch distance, you need to **normalize** it to make it less than 1. A common formula used is

\[1 - \alpha^{-branchD}\]

where \(\alpha\) is a constant slightly larger than 1 (typically 1.001). Another [^arcuri2011] is

\[\frac{branchD}{branchD + \beta}\]

where \(branchD\) is the value to be normalized and \(\beta\) is a constant value you choose.

Finally, \(fitness = \text{approach level} + \text{normalized branch distance}\).
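As a small numeric illustration (the values here are made up), assuming the \(\frac{branchD}{branchD + \beta}\) normalization with \(\beta = 1\):

```python
def normalize(d, beta=1.0):
    # maps a non-negative branch distance into [0, 1)
    return d / (d + beta)

# hypothetical values: two decision nodes away from the target,
# with a branch distance of 3 at the critical branch
approach_level = 2
branch_d = 3
fitness = approach_level + normalize(branch_d)
print(fitness)  # -> 2.75
```

Note how the approach level forms the whole-number part of the fitness and the normalized branch distance the fractional part, so inputs are first compared by how close they get to the target, and only then by how close they are to flipping the critical branch.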

Let us consider a few examples.

Say we have the following inputs.

As per the above formula, the `bdistance` is:

Can we move it forward? Let us consider a few neighbours.

That is, as per this computation, (0, 0) is closer to flipping the branch. Let us explore the neighbours again,

and again.

At this point, we have a zero.
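The interactive cells are not reproduced here, but the hill-climb described above can be sketched as follows (a minimal sketch, assuming the predicate `a > b`, K = 1, and a made-up starting input `(0, 2)`):

```python
K = 1

def bdistance(a, b):
    # branch distance for the predicate `a > b`, from the table above
    return 0 if b - a < 0 else (b - a) + K

def neighbours(a, b):
    # one-step neighbours of the input (a, b)
    return [(a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)]

inp = (0, 2)  # hypothetical starting input; its bdistance is 3
while bdistance(*inp) > 0:
    # move to the neighbour with the smallest branch distance
    inp = min(neighbours(*inp), key=lambda n: bdistance(*n))
    print(inp, bdistance(*inp))
# the climb reaches (3, 2), where a > b holds and the distance is 0
```

Each iteration strictly decreases the branch distance, so the search terminates as soon as an input takes the desired branch.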

Can we automate this approach? Interestingly, this is quite easy. We can reuse the approach from the metacircular interpreter and change the semantics to conform to our requirements.

Here is a quick check to show that the meta-circular interpreter works as expected.
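The interpreter itself is defined in the earlier post; since that code is not reproduced here, the sketch below shows the general shape of such a meta-circular interpreter for comparisons (the class and method names are my assumptions, not necessarily the originals):

```python
import ast

class Interpreter:
    # minimal expression interpreter sketch; dispatches on the AST node type
    def walk(self, node, env):
        name = 'on_%s' % node.__class__.__name__.lower()
        return getattr(self, name)(node, env)

    def on_module(self, node, env):
        return self.walk(node.body[0], env)

    def on_expr(self, node, env):
        return self.walk(node.value, env)

    def on_name(self, node, env):
        return env[node.id]

    def on_constant(self, node, env):
        return node.value

    def on_compare(self, node, env):
        left = self.walk(node.left, env)
        right = self.walk(node.comparators[0], env)
        op = node.ops[0]
        if isinstance(op, ast.Gt): return left > right
        if isinstance(op, ast.Lt): return left < right
        if isinstance(op, ast.Eq): return left == right
        raise NotImplementedError(op)

print(Interpreter().walk(ast.parse('a > b'), {'a': 3, 'b': 2}))  # -> True
```

Here the interpreter simply mirrors Python's own comparison semantics; the branch distance version below only has to change what `on_compare` returns.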

Next, let us redefine the interesting bits according to the table we provided, reproduced below for easy reference.

Element | Value |
---|---|
Boolean | if True then 0 else K |
a = b | if abs(a-b) = 0 then 0 else abs(a-b) + K |
a != b | if abs(a-b) != 0 then 0 else K |
a < b | if a-b < 0 then 0 else (a-b) + K |
a <= b | if a-b <= 0 then 0 else (a-b) + K |
a > b | if b-a < 0 then 0 else (b-a) + K |
a >= b | if b-a <= 0 then 0 else (b-a) + K |
a or b | min(bdistance(a), bdistance(b)) |
a and b | bdistance(a) + bdistance(b) |
not a | negation is moved inward and propagated over a |

For ease of discourse, let us consider the last one first. The idea is that if
one encounters a `not` unary, then it should be moved inward to the outermost
comparison, which gets flipped. Any `and` or `or` that is encountered gets
switched.

The same also gets applied when we want to take the `false` branch of a
conditional. So, let us create a new class that, given an expression `e`,
transforms it into the equivalent of `not e`, but without the `not` in the
expression, and normalizes it. That is, we need two classes that correspond to
both distributing any internal `not` and negating a given expression.

## DistributeNot

This class normalizes any `Not` by distributing it inside.
First, the infrastructure.

When we find `module` and `expr`, there is no change, because they are
just wrapper classes.

We need two classes: `DistributeNot`, which is responsible for the
non-negated case, and `NegateDistributeNot`, which is responsible for carrying
a negated expression.

Simple things like names and constants should get translated directly by
`DistributeNot`, but should be negated by `NegateDistributeNot`.

Check that it works.

What should happen for `not a`? It should get pushed into `a`
if possible. That is, `DistributeNot` should then switch
to `NegateDistributeNot`. However, if we are starting with
`NegateDistributeNot`, then it is already carrying a negation,
so it should switch to `DistributeNot`.

Check that it works.

What should happen for `a and b`? It should get turned into
`not (a and b)`, which is `(not a) or (not b)`, but only
in `NegateDistributeNot`. For `DistributeNot`, there is no change.

Check that it works.

The `on_compare` method is simply the identity in `DistributeNot`, because we do not
expect a `not` inside the compare. The `NegateDistributeNot` switches to
its anti operation. We also do not have to `walk` inside the comparators,
because we do not expect either boolean operators or other comparators inside
comparators.

Check that it works.
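The class definitions themselves live in the runnable notebook; a compact functional sketch of the same negation-distribution idea (the names and structure here are my own, not the original classes) might look like this:

```python
import ast

# maps each comparison operator to its negation (assumed helper table)
NEGATE = {ast.Gt: ast.LtE, ast.GtE: ast.Lt, ast.Lt: ast.GtE,
          ast.LtE: ast.Gt, ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}

def distribute_not(node, negate=False):
    # returns an equivalent AST with every `not` pushed into the comparisons
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        # a `not` flips the carried polarity, like switching between
        # DistributeNot and NegateDistributeNot
        return distribute_not(node.operand, not negate)
    if isinstance(node, ast.BoolOp):
        op = node.op
        if negate:  # De Morgan: and/or get switched, operands get negated
            op = ast.Or() if isinstance(op, ast.And) else ast.And()
        return ast.BoolOp(op, [distribute_not(v, negate) for v in node.values])
    if isinstance(node, ast.Compare) and negate:
        # the outermost comparison absorbs the negation by flipping
        return ast.Compare(node.left, [NEGATE[type(node.ops[0])]()],
                           node.comparators)
    return node

tree = ast.parse('not (a > b and b > c)', mode='eval')
new = distribute_not(tree.body)
print(ast.unparse(ast.fix_missing_locations(new)))  # -> a <= b or b <= c
```

A single boolean flag plays the role of the two mutually recursive classes here; the class-based version in the notebook achieves the same switch by dispatching between `DistributeNot` and `NegateDistributeNot`.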

We can now define the branch distance conversions in the `BDInterpreter`
class. We want the comparator to have access to K, so we pass in `self`.
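The actual `BDInterpreter` builds on the interpreter infrastructure from the notebook; as a standalone sketch (simplified to variable-only operands, with hypothetical method names), the comparison rules from the table translate to:

```python
import ast

K = 1

class BDInterpreter:
    # sketch: evaluate a comparison to its branch distance (table above)
    def bdistance(self, src, env):
        node = ast.parse(src, mode='eval').body
        return self.on_compare(node, env)

    def on_compare(self, node, env):
        # simplification: assumes both operands are plain variable names
        a = env[node.left.id]
        b = env[node.comparators[0].id]
        op = node.ops[0]
        if isinstance(op, ast.Eq):
            return 0 if a == b else abs(a - b) + K
        if isinstance(op, ast.NotEq):
            return 0 if a != b else K
        if isinstance(op, ast.Lt):
            return 0 if a - b < 0 else (a - b) + K
        if isinstance(op, ast.LtE):
            return 0 if a - b <= 0 else (a - b) + K
        if isinstance(op, ast.Gt):
            return 0 if b - a < 0 else (b - a) + K
        if isinstance(op, ast.GtE):
            return 0 if b - a <= 0 else (b - a) + K
        raise NotImplementedError(op)

bd = BDInterpreter()
print(bd.bdistance('a > b', {'a': 0, 'b': 2}))  # -> 3
print(bd.bdistance('a > b', {'a': 3, 'b': 2}))  # -> 0
```

Note how, instead of returning a boolean as the meta-circular interpreter does, each comparison now returns 0 when it already holds and a positive distance otherwise.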

We need one more step: we need to run the evaluator. In the below, we
assume that we need to take the `True` branch. Hence, we use the `normal_ast` to
find how to flip from the `False` branch. If, on the other hand, you want to flip
from the `True` branch to the `False` branch of the conditional, then you need the
`negated_ast`.

Let us try to make it run. Note that the examples would typically be present in code as

```
if a > b:
    # target branch
    print('Hello')
else:
    # this branch was taken.
    print('Hi')
```

How would you use the branch distance metric? Currently (2024) it is used to further fine-tune the fitness of a given input with respect to a given condition. That is, the approach distance is used as the whole number component of the fitness, and the branch distance is used as the fractional component, and the branch distance is only computed at the particular branch where the control flow diverged from the desired path. However, this is not the only way it can be used. Korel [^korel1990] initially suggested using the branch distance directly, computing the fitness based on the entire set of boolean conditions encountered on a desired path.

Note: A particularly good resource on SBST is Simon Marcus Poulding’s Ph.D. thesis “The Use of Automated Search in Deriving Software Testing Strategies”.

[^korel1990]: Bogdan Korel, “Automated software test data generation.” IEEE Transactions on Software Engineering, 1990.

[^arcuri2011]: Andrea Arcuri, “It really does matter how you normalize the branch distance in search-based software testing”, 2011.

## Artifacts

The runnable Python source for this notebook is available here.