Search-based fuzzing involves generating candidate inputs for a given
program, scoring them using some metric of effectiveness, and choosing the
best-scoring inputs for the next iteration, so that the fitness of the
population of inputs improves iteratively.
In the previous post
I discussed how you can score inputs using approach level.
The approach level (or approximation level) is a reasonable way to compute the
distance we have to traverse to reach a given node. However, in fuzzing,
just reaching a node is often insufficient. In several cases, the
branch taken by an execution determines whether we can make progress. That is,
given several inputs, each of which reaches a given decision node, we need to
prioritize those inputs that are closest to switching to an uncovered branch.
To do this, we use what is called the branch distance, first
proposed by Bogdan Korel in 1990 [^korel1990]. In this post, I will discuss how to
compute the branch distance for scoring the fitness of an input in
flipping a branch condition.
As before, we start by importing the prerequisites
Important: Pyodide takes time to initialize.
Initialization completion is indicated by a red border around the Run all button.
Available Packages
These are packages that refer either to my previous posts or to pure Python
packages that I have compiled, and they are available at the locations below. As
before, install them if you need to run the program directly on your machine.
To install, simply download the wheel file (`pkg.whl`) and install using
`pip install pkg.whl`.
Given a set of variables (corresponding to an input) and a conditional,
the branch distance measures how close the conditional is to evaluating to
true (or false). In fuzzing, it is computed at the critical branch: the
branch where the control flow diverged from reaching the target.
It is given by the following translation:
| Element | Value |
|---------|-------|
| Boolean | if true then 0 else K |
| a = b | if abs(a-b) = 0 then 0 else abs(a-b) + K |
| a != b | if abs(a-b) != 0 then 0 else K |
| a < b | if a-b < 0 then 0 else (a-b) + K |
| a <= b | if a-b <= 0 then 0 else (a-b) + K |
| a > b | if b-a < 0 then 0 else (b-a) + K |
| a >= b | if b-a <= 0 then 0 else (b-a) + K |
| a or b | min(bdistance(a), bdistance(b)) |
| a and b | bdistance(a) + bdistance(b) |
| not a | the negation is moved inward and propagated over a |

K is a penalty constant that lets you fine-tune the penalty of being wrong. Typically K = 1.
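As a concrete reference, the translation table above can be sketched in plain
Python. This is a hedged sketch, not the interpreter we build later in the
post; the function names and the operator strings are my own choices.

```python
K = 1  # penalty constant

def bdistance(op, a, b):
    # Distance to the comparison `a op b` evaluating to true,
    # following the translation table above.
    if op == '==': return 0 if a == b else abs(a - b) + K
    if op == '!=': return 0 if a != b else K
    if op == '<':  return 0 if a < b else (a - b) + K
    if op == '<=': return 0 if a <= b else (a - b) + K
    if op == '>':  return 0 if a > b else (b - a) + K
    if op == '>=': return 0 if a >= b else (b - a) + K
    raise NotImplementedError(op)

# Compound conditions: `or` takes the minimum, `and` takes the sum.
def bdistance_or(da, db): return min(da, db)
def bdistance_and(da, db): return da + db
```

Note that a satisfied condition always has distance 0, and an unsatisfied one
has distance at least K.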
Once you have the branch distance, you need to normalize it so that it is less than 1. A common formula is
\[normalize(branchD) = 1 − 1.001^{(−branchD)}\]
Another normalization [^arcuri2011] is
\[\frac{branchD}{branchD + \beta}\]
where \(branchD\) is the value to be normalized and \(\beta\) is a constant value you choose.
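Both normalizations can be written directly (a sketch; the function names are
mine, and \(\beta\) defaults to 1 here):

```python
def normalize_exp(d):
    # normalize(branchD) = 1 - 1.001^(-branchD)
    return 1 - 1.001 ** (-d)

def normalize_ratio(d, beta=1.0):
    # branchD / (branchD + beta)
    return d / (d + beta)
```

Both map 0 to 0 and approach (but never reach) 1 as the branch distance grows.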
Can we move it forward? Let us consider a few neighbours.
That is, as per this computation, (0, 0) is closer to flipping the branch.
Let us explore the neighbours again.
And again.
At this point, we have reached a branch distance of zero.
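The neighbourhood exploration above can be approximated as follows. This is a
hedged stand-in for the missing cells: I assume a hypothetical target
condition `x == y` and step-size-one neighbours; the condition in the
original cells may differ.

```python
K = 1

def bd_eq(x, y):
    # branch distance for the hypothetical condition `x == y`
    return 0 if x == y else abs(x - y) + K

def neighbours(x, y):
    # all inputs one step away in either variable
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

def hillclimb(x, y):
    # greedily move to the neighbour with the smallest branch distance
    while bd_eq(x, y) != 0:
        best = min(neighbours(x, y), key=lambda p: bd_eq(*p))
        if bd_eq(*best) >= bd_eq(x, y):
            break  # stuck in a local minimum
        x, y = best
    return x, y
```

Because the branch distance decreases smoothly as the inputs get closer, the
greedy walk has a gradient to follow, which is exactly what makes this metric
useful for search.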
Can we automate this approach? Interestingly, this is quite easy.
We can reuse the approach in the metacircular interpreter
and change the semantics to conform to our requirement.
Here is a quick check to show that the metacircular interpreter works as expected.
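A minimal stand-in for such a metacircular interpreter, using Python's `ast`
module (a sketch; the post builds on the interpreter from the earlier post,
so the class and method names here are my own):

```python
import ast
import operator

class ExprInterpreter(ast.NodeVisitor):
    # Evaluates a boolean Python expression under an environment.
    OPS = {ast.Lt: operator.lt, ast.LtE: operator.le,
           ast.Gt: operator.gt, ast.GtE: operator.ge,
           ast.Eq: operator.eq, ast.NotEq: operator.ne}

    def __init__(self, env): self.env = env

    def eval(self, src):
        return self.visit(ast.parse(src, mode='eval').body)

    def visit_Name(self, node): return self.env[node.id]
    def visit_Constant(self, node): return node.value

    def visit_Compare(self, node):
        left = self.visit(node.left)
        right = self.visit(node.comparators[0])
        return self.OPS[type(node.ops[0])](left, right)

    def visit_BoolOp(self, node):
        vals = [self.visit(v) for v in node.values]
        return any(vals) if isinstance(node.op, ast.Or) else all(vals)

    def visit_UnaryOp(self, node):
        assert isinstance(node.op, ast.Not)
        return not self.visit(node.operand)
```

Changing the semantics to compute branch distance then amounts to replacing
the comparison and boolean operators according to the table.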
Next, let us redefine the interesting bits according to the table we provided
earlier, reproduced below for easy reference.
| Element | Value |
|---------|-------|
| Boolean | if true then 0 else K |
| a = b | if abs(a-b) = 0 then 0 else abs(a-b) + K |
| a != b | if abs(a-b) != 0 then 0 else K |
| a < b | if a-b < 0 then 0 else (a-b) + K |
| a <= b | if a-b <= 0 then 0 else (a-b) + K |
| a > b | if b-a < 0 then 0 else (b-a) + K |
| a >= b | if b-a <= 0 then 0 else (b-a) + K |
| a or b | min(bdistance(a), bdistance(b)) |
| a and b | bdistance(a) + bdistance(b) |
| not a | the negation is moved inward and propagated over a |
For ease of discourse, let us consider the last one first. The idea is that if
one encounters a not unary operator, it should be moved inward to the outermost
comparison, which gets flipped. Any and or or encountered along the way gets
switched (De Morgan's laws).
The same transformation also applies when we want to take the false branch of a
conditional. So, let us create a new class that, given an expression e,
transforms it into the equivalent of not e, but without the not in the
expression, and normalizes it. We need two classes: one that distributes
any internal not, and one that negates a given expression.
DistributeNot
This class normalizes any Not by distributing it inside.
First the infrastructure.
When we find a module or an expr, there is no change, because they are
just wrapper classes.
We need two classes: DistributeNot, which is responsible for
non-negated expressions, and NegateDistributeNot, which is responsible for
carrying a negated expression.
Simple things like names and constants should get translated directly by
the DistributeNot, but should be negated by NegateDistributeNot.
Check that it works.
What should happen for not a? It should get pushed into a
if possible. That is, DistributeNot should then switch
to NegateDistributeNot. However, if we are starting with
NegateDistributeNot, then it is already carrying a negation,
so it should switch to DistributeNot.
Check that it works
What should happen for a and b? Under negation, not (a and b) becomes
(not a) or (not b), so the operator gets switched, but only
in NegateDistributeNot. For DistributeNot, there is no change.
Check that it works
The on_compare method simply returns the node unchanged in DistributeNot because we do not
expect a not inside the compare. The NegateDistributeNot switches it to
its anti operation. We also do not have to walk inside the comparators
because we do not expect either boolean operators or other comparators inside
comparators.
Check that it works
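Putting the pieces together, the combined effect of the two classes can be
sketched with a single recursive function over Python's `ast` nodes (a
simplification under my own names; the post's classes walk the AST the same
way):

```python
import ast

# Comparison operators and their negations
FLIP = {ast.Lt: ast.GtE, ast.GtE: ast.Lt,
        ast.Gt: ast.LtE, ast.LtE: ast.Gt,
        ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}

def negate(node):
    # Rewrite `node` to the equivalent of `not node`, pushing the
    # negation inward so that no `not` remains.
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return node.operand                  # negating `not a` yields a
    if isinstance(node, ast.BoolOp):         # De Morgan: switch and/or
        op = ast.Or() if isinstance(node.op, ast.And) else ast.And()
        return ast.BoolOp(op=op, values=[negate(v) for v in node.values])
    if isinstance(node, ast.Compare):        # flip the comparison
        return ast.Compare(left=node.left,
                           ops=[FLIP[type(node.ops[0])]()],
                           comparators=node.comparators)
    return ast.UnaryOp(op=ast.Not(), operand=node)

def negate_src(src):
    tree = ast.parse(src, mode='eval')
    tree.body = negate(tree.body)
    return ast.unparse(ast.fix_missing_locations(tree))
```

For example, negating `a < b and c == d` yields `a >= b or c != d`, with no
`not` left in the expression.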
We can now define branch distance conversions in BDInterpreter class.
We want the comparator to have access to K, so we pass in self.
We need one more step: running the evaluator. In the code below, we
assume that we need to take the True branch. Hence, we use the normal_ast to
find how to flip from the False branch. If, on the other hand, you want to flip
from the True branch to the False branch of the conditional, then you need the
negated_ast.
Let us try to make it run. Note that the examples would typically be present
in code as
```python
if a > b:
    # target branch
    print('Hello')
else:
    # this branch was taken.
    print('Hi')
```
How would you use the branch distance metric? Currently (2024) it is used to
further fine-tune the fitness of a given input with respect to a given
condition. That is, the approach level is used as the whole-number
component of the fitness, and the normalized branch distance as the fractional
component; the branch distance is computed only at the particular branch
where the control flow diverged from the desired path. However, this is not
the only way it can be used. Korel [^korel1990] originally suggested using the
branch distance directly, computing the fitness from the entire set of
boolean conditions encountered on a desired path.
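The combined fitness described above can be sketched as follows. This is a
hedged sketch: `approach_level` and `branch_distance` are assumed to be
computed elsewhere, and the choice of ratio normalization with \(\beta = 1\)
is mine.

```python
def fitness(approach_level, branch_distance, beta=1.0):
    # Whole-number part: the approach level. Fractional part: the
    # normalized branch distance at the divergence point.
    return approach_level + branch_distance / (branch_distance + beta)
```

An input that diverges fewer levels away from the target, or that is closer
to flipping the critical branch, always gets a smaller (better) fitness.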
Note: A particularly good resource on SBST is Simon Marcus Poulding’s Ph.D. thesis
“The Use of Automated Search in Deriving Software Testing Strategies”.
[^korel1990]: Bogdan Korel, "Automated software test data generation." IEEE Transactions on Software Engineering, 1990.
[^arcuri2011]: Andrea Arcuri, "It really does matter how you normalize the branch distance in search-based software testing." 2011.
Artifacts
The runnable Python source for this notebook is available here.