Grammar Coverage with KPaths for Syntax Testing
TLDR; This tutorial describes how to design test cases that can effectively cover features of a given contextfree grammar using the kpath strategy.
Note: The kpath here has nothing to do with the kpath cover from algorithms.
Definitions
For this post, we use the following terms as we have defiend previously:

The alphabet is the set all of symbols in the input language. For example, in this post, we use all ASCII characters as alphabet.
 A terminal is a single alphabet symbol.
For example,
x
is a terminal symbol.  A nonterminal is a symbol outside the alphabet whose expansion is
defined in the grammar using rules for expansion.
For example,
<term>
is a nonterminal symbol.  A rule is a finite sequence of terms (two types of terms: terminals and
nonterminals) that describe an expansion of a given terminal.
For example,
[<term>+<expr>]
is one of the expansion rules of the nonterminal<expr>
.  A definition is a set of rules that describe the expansion of a given
nonterminal.
For example,
[[<digit>,<digits>],[<digit>]]
is the definition of the nonterminal<digits>
 A contextfree grammar is composed of a set of nonterminals and corresponding definitions that define the structure of the nonterminal. The grammar given below is an example contextfree grammar.
 A terminal derives a string if the string contains only the symbols in the terminal. A nonterminal derives a string if the corresponding definition derives the string. A definition derives the string if one of the rules in the definition derives the string. A rule derives a string if the sequence of terms that make up the rule can derive the string, deriving one substring after another contiguously (also called parsing).

A derivation tree is an ordered tree that describes how an input string is derived by the given start symbol. Also called a parse tree.
 A derivation tree can be collapsed into its string equivalent. Such a string can be parsed again by the nonterminal at the root node of the derivation tree such that at least one of the resulting derivation trees would be the same as the one we started with.
Contents
Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.
Available Packages
These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.Syntax Based Testing
When the abstract model of a program is a graph, to test such programs effectively, one uses the syntax based testing. Similarly, if the input domain of a program is a contextfree language, to test such programs effectively, one needs to rely on syntax based testing.
As Havricov et al. ^{1} discusses, a high variation of inputs is desirable because it can induce a high variation in how programs behave, which is highly desirable. In particular, if some feature of the input domain is not present in the test cases, the corresponding code that processes that element may not get exercised. Finally, some elements may be processed only by the combination of input elements in specific order. In all such cases, producing inputs that cover all combinations is desirable. The traditional test design objectives ^{2} for syntax based testing are as follows:
 Covering all terminal symbols in the contextfree grammar
 Covering all definitions in the contextfree grammar
 Covering all rules in the contextfree grammar (subsumes the above two).
However, note that none of them actually produce inputs that are a combination of input features. Hence, to ensure that we can produce such inputs, Havricov et al. ^{1} introduces a new measure of test obligations called kpath.
KPaths
A Kpath is a sequence of expansions with K nonterminal symbols. A 1path is a path where there is only a single nonterminal symbol is involved. So, a test suite designed with all 1path criteria is same as one defined with all definitions obligation. A 2path is a path with exactly two nonterminal symbols that are expanded consecutively. For example, say you have a definition such as
<digits>:[
[<digit><digits>],
[<digit>],
],
<digit>: [[1],[0]]
and a derivation tree that looks like
('<digits>', [
('<digit>', [('1', [])])
])
Such a tree is an instance of a 2path, which is [<digits>, <digit>]
. In a tree such as
('<digits>', [
('<digit>', [('1', [])]),
('<digits>',
[('<digit>', [('1', [])])])
])
we haveone 3path, which is [<digits>,<digits>, <digit>]
. We also have two
2paths [<digits>, <digit>]
, [<digits>,<digits>]
and [<digits>, <digit>]
, but there are only two unique 2paths.
So, the question is, how to compute the more complex kpaths?
We stat by defining a function parents()
which takes in a grammar, and
identifies possible parent nodes for a given nonterminal symbol.
That is, given the expression grammar
The parents are
Next, we define a function to compute all kpaths in a grammar.
Using it:
We can now tie both together
Using
Now that we have the kpaths, how do we obtain the test cases? For that, we first define a procedure that given a a node of the tree, and the parent nonterminal symbol (key), looks through the rules of the parent key, and identifies one of the rule which contains the root nonterminal of the # given node as a token. Given such a rule, we can embed the current node, forming a partial tree.
using it
Next, given a kpath, we want to make it into a tree.
Using it.
Let us define a method to display such trees
Using it
Filling the partial tree
Using it
Let us make it into a test case
Using it
Artifacts
The runnable Python source for this notebook is available here.