TLDR; This tutorial describes how to design test cases that can effectively cover features of a given context-free grammar using the k-path strategy.

Note: The k-path here has nothing to do with the k-path cover from algorithms.

Definitions

For this post, we use the following terms as we have defiend previously:

  • The alphabet is the set all of symbols in the input language. For example, in this post, we use all ASCII characters as alphabet.

  • A terminal is a single alphabet symbol. For example, x is a terminal symbol.
  • A nonterminal is a symbol outside the alphabet whose expansion is defined in the grammar using rules for expansion. For example, <term> is a nonterminal symbol.
  • A rule is a finite sequence of terms (two types of terms: terminals and nonterminals) that describe an expansion of a given terminal. For example, [<term>+<expr>] is one of the expansion rules of the nonterminal <expr>.
  • A definition is a set of rules that describe the expansion of a given nonterminal. For example, [[<digit>,<digits>],[<digit>]] is the definition of the nonterminal <digits>
  • A context-free grammar is composed of a set of nonterminals and corresponding definitions that define the structure of the nonterminal. The grammar given below is an example context-free grammar.
  • A terminal derives a string if the string contains only the symbols in the terminal. A nonterminal derives a string if the corresponding definition derives the string. A definition derives the string if one of the rules in the definition derives the string. A rule derives a string if the sequence of terms that make up the rule can derive the string, deriving one substring after another contiguously (also called parsing).
  • A derivation tree is an ordered tree that describes how an input string is derived by the given start symbol. Also called a parse tree.

  • A derivation tree can be collapsed into its string equivalent. Such a string can be parsed again by the nonterminal at the root node of the derivation tree such that at least one of the resulting derivation trees would be the same as the one we started with.

Contents

  1. Definitions
  2. Syntax Based Testing
    1. K-Paths
  3. Artifacts

Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.

Available Packages These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.
  1. simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".
  2. earleyparser-0.0.1-py2.py3-none-any.whl from "Earley Parser".


Syntax Based Testing

When the abstract model of a program is a graph, to test such programs effectively, one uses the syntax based testing. Similarly, if the input domain of a program is a context-free language, to test such programs effectively, one needs to rely on syntax based testing.

As Havricov et al. 1 discusses, a high variation of inputs is desirable because it can induce a high variation in how programs behave, which is highly desirable. In particular, if some feature of the input domain is not present in the test cases, the corresponding code that processes that element may not get exercised. Finally, some elements may be processed only by the combination of input elements in specific order. In all such cases, producing inputs that cover all combinations is desirable. The traditional test design objectives 2 for syntax based testing are as follows:

  • Covering all terminal symbols in the context-free grammar
  • Covering all definitions in the context-free grammar
  • Covering all rules in the context-free grammar (subsumes the above two).

However, note that none of them actually produce inputs that are a combination of input features. Hence, to ensure that we can produce such inputs, Havricov et al. 1 introduces a new measure of test obligations called k-path.

K-Paths

A K-path is a sequence of expansions with K nonterminal symbols. A 1-path is a path where there is only a single nonterminal symbol is involved. So, a test suite designed with all 1-path criteria is same as one defined with all definitions obligation. A 2-path is a path with exactly two nonterminal symbols that are expanded consecutively. For example, say you have a definition such as

<digits>:[
               [<digit><digits>],
               [<digit>],
],

<digit>: [[1],[0]]

and a derivation tree that looks like

('<digits>', [
          ('<digit>', [('1', [])])
          ])

Such a tree is an instance of a 2-path, which is [<digits>, <digit>]. In a tree such as

('<digits>', [
          ('<digit>', [('1', [])]),
          ('<digits>',
              [('<digit>', [('1', [])])])
          ])

we haveone 3-path, which is [<digits>,<digits>, <digit>]. We also have two 2-paths [<digits>, <digit>], [<digits>,<digits>] and [<digits>, <digit>] , but there are only two unique 2-paths. So, the question is, how to compute the more complex k-paths?

We stat by defining a function parents() which takes in a grammar, and identifies possible parent nodes for a given nonterminal symbol.



That is, given the expression grammar



The parents are



Next, we define a function to compute all k-paths in a grammar.



Using it:



We can now tie both together



Using



Now that we have the k-paths, how do we obtain the test cases? For that, we first define a procedure that given a a node of the tree, and the parent nonterminal symbol (key), looks through the rules of the parent key, and identifies one of the rule which contains the root nonterminal of the # given node as a token. Given such a rule, we can embed the current node, forming a partial tree.



using it



Next, given a k-path, we want to make it into a tree.



Using it.



Let us define a method to display such trees



Using it



Filling the partial tree



Using it



Let us make it into a test case



Using it



Artifacts

The runnable Python source for this notebook is available here.

  1. Havrikov, Nikolas, and Andreas Zeller. “Systematically covering input structure.” 2019 IEEE/ACM international conference on Automated Software Engineering (ASE). IEEE, 2019.  2

  2. Paul Purdom. “A Sentence Generator for Testing Parsers.” BIT Numerical Mathematics, 1972