Search Based Fuzzing involves generating various candidate inputs for a given program, identifying the inputs with the best score in some metric of effectiveness and choosing them for the next iteration so that one can iteratively improve the fitness of the given population of inputs. In the previous post I discussed how you can score inputs using approach level.

Approach level (or approximation level) is reasonable to compute the distance we have to traverse to reach a given node. However, in fuzzing, we also find that just reaching a node is insufficient. In several cases, the branch taken by an input execution determines if we can make progress. That is given several inputs, each of which reach a given decision node, we need to prioritize those inputs that are closest to switching to an uncovered branch. To do this, we use what is called the Branch Distance. This was first proposed by Bogdan Korel in 1990 [^korel1990]. In this post, I will discuss how to compute branch distance for computing the fitness score of an input in flipping a branch condition.

As before, we start by importing the prerequisites

Contents

  1. DistributeNot
  2. Artifacts

Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.

Available Packages These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.
  1. simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".
  2. pycfg-0.0.1-py2.py3-none-any.whl from "The Python Control Flow Graph".
  3. pydot-1.4.1-py2.py3-none-any.whl
  4. metacircularinterpreter-0.0.1-py2.py3-none-any.whl from "Python Meta Circular Interpreter".


Given a set of variables (corresponding to an input) and a conditional, the branch distance measures the distance to the critical branch being true or false, where the critical branch is the branch where control flow diverged from reaching the target. That is, if an input diverged at a critical branch, then the distance to the conditional being true or false. It is given by the following translation:

Element Value
Boolean if True then 0 else K
a = b if abs(a-b) = 0 then 0 else abs(a-b) + K
a != b if abs(a-b) != 0 then 0 else K
a < b if a-b < 0 then 0 else (a-b) + K
a <= b if a-b <= 0 then 0 else (a-b) + K
a > b if b-a > 0 then 0 else (b-a) + K
a > b if b-a >= 0 then 0 else (b-a) + K
a or b min(bdistance(a), bdistance(b))
a and b bdistance(a) + bdistance(b)
not a Negation is moved inward and propagated over a

K is a penalty constant which lets you fine tune the penalty of being wrong. Typically K=1

Once you have the branch distance, you need to normalize it to make it less than 1. A common formula used is

\[normalize(branchD) = 1 − 1.001^{(−branchD)}\]

Another [^arcuri2011] is

\[\frac{branchD}{branchD + \beta}\]

where \(branchD\) is the value to be normalized and \(\beta\) is a constant value you choose.

Finally, \(fitness = approach level + normalized branch distance\)

Let us consider a few examples.



say we have the following inputs



As per the above formula, the bdistance is



Can we move it forward? Let us consider a few neighbours.



That is, as per this computation, 0, 0 is closer to flipping the branch. let us explore the neighbours again



again



at this point, we have a zero



Can we automate this approach? Interestingly, this is quite easy. We can reuse the approach in metacircular interpreter and change the semantics to conform to our requirement.



Here is a quick check to show that the meta-circular interpreter works expected.



Next, let us redefine the interesting bits according to the table we provided. Provided below again for easy reference.

Element Value
Boolean if True then 0 else K
a = b if abs(a-b) = 0 then 0 else abs(a-b) + K
a != b if abs(a-b) != 0 then 0 else K
a < b if a-b < 0 then 0 else (a-b) + K
a <= b if a-b <= 0 then 0 else (a-b) + K
a > b if b-a > 0 then 0 else (b-a) + K
a > b if b-a >= 0 then 0 else (b-a) + K
a or b min(bdistance(a), bdistance(b))
a and b bdistance(a) + bdistance(b)
not a Negation is moved inward and propagated over a


For ease of discourse, let us consider the last one first. The idea is that if one encounters a not unary, then it should be moved inward to the outermost comparison, which gets flipped. Any and or or that is encountered gets switched.

The same also gets applied when we want to take the false branch of a conditional. So, let us create a new class, that given an expression e, transforms it as equivalent to not e, but without the not in the expression, and normalizes it. So, we need two classes that correspond to both distributing any internal not and negating a given expression.

DistributeNot

This class normalizes any Not by distributing it inside. First the infrastructure.



When we find module, and expr there is no change, because they are just wrapper classes



We need two classes, the DistributeNot which is responsible for non-negated and NegateDistributeNot which is responsible for carrying a negated expression.



Simple things like names and constants should get translated directly by the DistributeNot, but should be negated by NegateDistributeNot.



Check that it works.



What should happen for not a? It should get pushed into a if possible. That is, DistributeNot should then switch to NegateDistributeNot. However, if we are starting with NegateDistributeNot, then it is already carrying a negation, so it should switch to DistributeNot.



Check that it works



What should happen for a and b? It should get turned into not (a and b) which is (not a) or (not b), but only on NegateDistributeNot. For DistributeNot, there is no change.



Check that it works



The on_compare method is simply itself in DistributeNot because we do not expect a not inside the compare. The NegateDistributeNot switches to its anti operation. We also do not have to walk inside the comparators because we do not expect either boolean operators or other comparators inside comparators.



Check that it works



We can now define branch distance conversions in BDInterpreter class. we want the comparator to have access to K. So we pass in self.



We need one more step. That is, we need to run the evaluator. In the below, we assume that we need to take the True branch. Hence, we use the normal_ast to find how to flip from the False branch. If on the other hand, you want to flip from the True branch to False branch of the conditional, then you need the negated_ast.



Let us try to make it run. Note that the examples would typically be present in code as

if a>b:
   # target branch
   print('Hello') 
else:
   # this branch was taken.
   print('Hi') 


How would you use the branch distance metric? Currently (2024) it is used to further fine-tune the fitness of a given input with respect to a given condition. That is, the approach distance is used as the whole number component of the fitness, and the branch distance is used as the fractional component, and the branch distance is only computed at the particular branch where the control flow diverged from the desired path. However, this is not the only way it can be used. Korel [^korel1990] initially suggested using the branch distance directly, computing the fitness based on the entire set of boolean conditions encountered on a desired path.

Note: A particularly good resource on SBST is Simon Marcus Poulding’s Ph.D. thesis “The Use of Automated Search in Deriving Software Testing Strategies”. [^korel1990]: Bogdan Korel “Automated software test data generation.” IEEE Transactions on software engineering, 1990 [^arcuri2011]: Andrea Arcuri “It really does matter how you normalize the branch distance in search-based software testing” 2011

Artifacts

The runnable Python source for this notebook is available here.