Learning Regular Languages with the TTT Algorithm
TLDR; This tutorial is a complete implementation of the TTT algorithm for active automata learning in Python. TTT combines the discrimination tree of Kearns and Vazirani with binary search counterexample analysis from Rivest and Schapire, and adds prefix transformation and discriminator finalization to eliminate all redundant membership queries. The Python interpreter is embedded so that you can work through the implementation steps.
In my previous post, I implemented Angluin’s L* algorithm for learning regular languages from a blackbox oracle. L* uses a flat observation table to track state distinctions, which leads to redundant membership queries: when a counterexample arrives, all its suffixes are added as columns even though most distinguish no new states.
The key insight that the discrimination tree is a better data structure for this job was due to Kearns and Vazirani 1, who replaced L*’s observation table with a binary tree of discriminators. Rivest and Schapire 2 independently contributed binary search counterexample analysis, which finds the single relevant suffix in a counterexample in \(O(\log k)\) queries rather than adding all \(k\) suffixes.
TTT by Isberner, Howar and Steffen 3 adds two further refinements: prefix transformation, which keeps access sequences minimal, and discriminator finalization, which keeps the discrimination tree shallow. Together these make TTT provably redundancy-free. That is, it never makes a membership query whose answer could have been derived from earlier queries.
TTT is the algorithm of choice in practical automata learning tools such as LearnLib 4. ADT 5 extends TTT with adaptive distinguishing sequences, which can reduce resets in hardware settings, though performance differences in software engineering settings are modest.
Definitions
- Alphabet \(A\): the set of input symbols the DFA reads.
- Membership query: a string passed to the blackbox oracle. The oracle answers yes (accepted) or no (rejected).
- Equivalence query: a hypothesis grammar passed to the teacher. The teacher answers yes, or returns a counterexample string where the hypothesis and the target disagree.
- PAC oracle: a probabilistic approximation to the equivalence oracle. After \(N\) random tests without finding a counterexample, we declare the hypothesis probably approximately correct.
- Discrimination tree (DT): a binary tree whose inner nodes are discriminator suffixes and whose leaves are states. Sifting a string \(w\) through the tree classifies it to a state using one membership query per level.
- Access sequence \(acc(q)\): the shortest known string that reaches state \(q\) in the target.
- Spanning tree: a mapping from each known state to its access sequence. A dict from state to the shortest string known to reach it.
- Open transition: a transition from state \(q\) on symbol \(a\) whose target state has no access sequence yet, meaning TTT has not yet determined which state it leads to.
- Counterexample decomposition: the process of finding the split point in a counterexample, extracting a new discriminator, and splitting a leaf in the DT.
Contents
- Definitions
- Prerequisites
- The Road from L* to TTT
- The DFA Representation
- The Oracle
- The Discrimination Tree
- The Spanning Tree
- Hypothesis Construction
- Counterexample Decomposition
- Non-Redundancy
- A Note on the Equivalence Oracle
- DT Coherence After Split
- The Main Loop
- Examples
- Evaluating Model Accuracy
- Comparison with L*
- References
- Artifacts
Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.
Prerequisites
We use the Teacher and Oracle from the L* post unchanged.
The PAC equivalence oracle in Teacher is a direct drop-in for TTT.
The rest of the algorithm is completely independent of how equivalence
queries are answered.
Available Packages
These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.- simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".
- rxfuzzer-0.0.1-py2.py3-none-any.whl from "Fuzzing With Regular Expressions".
- earleyparser-0.0.1-py2.py3-none-any.whl from "Earley Parser".
- cfgrandomsample-0.0.1-py2.py3-none-any.whl from "Uniform Random Sampling of Strings from Context-Free Grammar".
- cfgremoveepsilon-0.0.1-py2.py3-none-any.whl from "Remove Empty (Epsilon) Rules From a Context-Free Grammar.".
- gatleastsinglefault-0.0.1-py2.py3-none-any.whl from "Specializing Context-Free Grammars for Inducing Faults".
- hdd-0.0.1-py2.py3-none-any.whl from "Hierarchical Delta Debugging".
- ddset-0.0.1-py2.py3-none-any.whl from "Simple DDSet".
- lstar-0.0.1-py2.py3-none-any.whl from "L* Algorithm".
Since this notebook serves both as a web notebook as well as a script
that can be run on the command line, we redefine canvas if it is not
defined already. The __canvas__ function is defined externally when
it is used as a web notebook.
The Road from L* to TTT
In L*, when the equivalence oracle returns a counterexample \(ce\) of length \(k\), the algorithm adds all \(k\) suffixes of \(ce\) as new columns. For each new column, it must re-query every existing row. Many of these new columns are, however, redundant. They do not distinguish any new pair of states, but given that they exist, the cells need to be filled, and hence you pay the price in queries.
The key insight in TTT is that a counterexample only ever witnesses one new distinction: one pair of states was wrongly merged. Hence, exactly one new discriminator is sufficient, not \(k\). Getting to this point required three independent contributions:
- Kearns and Vazirani (1994)1 replaced the observation table with a discrimination tree. A binary tree of discriminator suffixes where each leaf is a state. Sifting a string down the tree classifies it in \(O(depth)\) queries rather than \(O(|suffixes|)\).
- Rivest and Schapire (1993)2 showed that binary search over the counterexample finds the single relevant split point in \(O(\log k)\) queries, rather than adding all \(k\) suffixes.
- Isberner, Howar and Steffen (2014)3 combined these with prefix transformation (keeping access sequences minimal) and discriminator finalization (keeping the DT shallow), producing TTT.
The DFA Representation
The DFA class is similar to the one from the
RPNI post.
We also add a run() method which returns the state reached after consuming
a string, and ensure_state() which registers a state in the grammar
without allocating a new key, which is needed when manually constructing DFAs
in tests and when close_transitions (defined later) discovers states from the DT.
The DFA class can now model a deterministic finite state machine. We also add a helper to render any DFA as a Graphviz dot diagram.
Let us test it thoroughly.
The Oracle
Let us define a simple mock oracle for testing the components in isolation.
The Teacher imported from L* is the full oracle, and will be used in the
main loop.
The Discrimination Tree
The discrimination tree (DT) replaces L*’s flat observation table. Think of it as a game of 20 questions: each inner node asks: if I append suffix \(d\) to this string, does the target accept it? and routes left (no) or right (yes). Each leaf is a known state.
There are exactly two kinds of nodes: Leaf and Inner.
We test the node types before proceeding.
Sifting
To classify any string \(w\) to a state, we walk the DT from root to leaf. At each inner node labeled \(d\), we query \(member(w \cdot d)\) and go right (yes) or left (no). The leaf we land on is the state \(w\) belongs to. Each step is one membership query, so sifting costs at most \(depth(DT)\) queries, far fewer than L*’s \(O(|suffixes|)\).
We also add a helper to render a DT as a Graphviz dot diagram.
Inner nodes show their discriminator suffix; leaves show the state name.
Left edges (membership query returned False) are labelled “no”;
right edges (True) are labelled “yes”.
An empty discriminator is shown as ε (epsilon), meaning the string itself
is queried directly with nothing appended.
An optional tracer argument (a DTTracer, defined below) colours the
recorded sift path blue.
DTTracer wraps a real DT node and records every step taken during a sift.
It proxies is_leaf, discriminator, left, right, and state to the
wrapped node, but intercepts the left/right accesses to record which
edge was taken. After sift(DTTracer(root), w, oracle) returns, the tracer
holds path_nodes and path_edges as sets of id()s referencing the
unwrapped nodes, so they match what dt_to_dot sees.
We test sifting on the even-a’s example: the DT has one discriminator (the empty string) that separates even-a states from odd-a states.
single leaf: everything maps to start
two-level tree: [''] / <1> (odd) \ <start> (even)
Sifting 'a' (odd a’s) goes left to <1>; sifting 'aa' (even a’s) goes right to <start>.
Sifting 'aa' goes right.
The even-a’s DT always uses ε as its discriminator because membership of
the access sequence alone is enough to tell states apart. Most targets
produce non-empty discriminators. Consider strings over {a, b} that end
in ba. After two counterexamples the DT has two levels: the root asks
member(w + 'a') (discriminator 'a'), and the left subtree then asks
member(w + ε) to separate <start> from the accepting state <2>.
Sifting 'b' (access of <1>): member('b'+'a') = member('ba') = True
so go right -> leaf <1>.
Sifting 'ba' (access of <2>): member('ba'+'a') = member('baa') = False
so go left; then member('ba'+ε) = member('ba') = True -> go right -> leaf <2>.
Sifting '' (access of <start>): member(''+a) = member('a') = False
so go left; then member(''+ε) = member('') = False -> go left -> leaf <start>.
The Spanning Tree
If you have read the RPNI post, the spanning tree will look familiar. The RPNI Prefix Tree Acceptor (PTA) is a tree-shaped DFA where every path from root to a node spells out the string that reaches that state. The spanning tree is the dual of the PTA:
- PTA maps strings -> states. You start with examples and build states to match them.
- Spanning tree maps states -> strings. You start with states (discovered by TTT) and record the string that reaches each one.
In TTT, we never traverse the spanning tree as a tree. We only ever look up \(acc(q)\) for a given state, or add a new state with its access sequence. So the implementation reduces to a simple dict.
We add a helper to render a spanning tree as a Graphviz dot diagram. Each node shows its state name and access sequence. Edges are labelled with the character that extends the parent’s access sequence to reach the child.
We test the spanning tree.
Hypothesis Construction
At any point during learning, we have a DT and a spanning tree. Together they are enough to construct a complete hypothesis DFA. The idea is simple: to find the target state of transition \(q \xrightarrow{a} ?\), form the string \(acc(q) \cdot a\) and sift it through the DT. Whatever leaf it lands on is the state we assign as the target.
Concretely, for the even-a’s example with a two-leaf DT (discriminator '',
left = <1>, right = <start>), and spanning tree
{<start>: '', <1>: 'a'}, we sift each transition in turn.
The blue path shows the route taken through the DT.
\(\langle start \rangle \xrightarrow{a} ?\): sift '' + 'a' = 'a'.
Is 'a' + '' accepted? No (odd a’s) — go left — leaf <1>.
So <start> -a-> <1>.
\(\langle start \rangle \xrightarrow{b} ?\): sift '' + 'b' = 'b'.
Is 'b' + '' accepted? Yes (zero a’s) — go right — leaf <start>.
So <start> -b-> <start>.
\(\langle 1 \rangle \xrightarrow{a} ?\): sift 'a' + 'a' = 'aa'.
Is 'aa' + '' accepted? Yes (even a’s) — go right — leaf <start>.
So <1> -a-> <start>.
\(\langle 1 \rangle \xrightarrow{b} ?\): sift 'a' + 'b' = 'ab'.
Is 'ab' + '' accepted? No (odd a’s) — go left — leaf <1>.
So <1> -b-> <1>.
A transition is open if we have not yet determined its target: either the transition is missing from the DFA entirely, or its recorded target has no access sequence in the spanning tree. The second case arises when a sift landed on a leaf whose state is not yet in the spanning tree, meaning a new state was just discovered and needs to be registered.
close_transitions works through all known states, sifting each open
transition. When sifting discovers a state not yet in the spanning tree,
that state is added and appended to the work list so its own transitions
are processed in the same pass. This means the loop may grow as it runs:
starting with one state, it may end with the full set.
leaf_index records, for each DT leaf, which (state, char) transitions
were routed there. After a split, we use this index to find and re-sift
only the transitions that are now stale, rather than re-sifting everything.
Accepting states are determined by querying the oracle directly on each access sequence. If \(acc(q)\) is accepted by the target, then \(q\) is an accepting state.
To see what closing looks like step by step, consider the even-a’s target
with alphabet {a, b}. We start with one state <start> and a single-leaf
DT. Every transition is open because no target state has been determined yet.
Closing sifts acc(<start>) + 'a' = 'a' and acc(<start>) + 'b' = 'b'
through the single-leaf DT. Both land on <start>, so both transitions
point back to <start> – the hypothesis says everything loops.
We visualise the DT, spanning tree, and resulting DFA at this stage.
The spanning tree
DFA before the split.
After the first counterexample (say 'a') is processed, decompose splits
the DT: a new inner node with discriminator '' separates <start> (goes
right: accepts '') from new state <1> (goes left: rejects '').
Re-sifting now correctly routes 'a' to <1> and 'b' back to <start>.
Spanning tree
DFA after split
Incremental Hypothesis Update
When decompose splits a leaf \(\ell\) into an inner node, every
transition that was previously routed to \(\ell\) is now stale: it
pointed to a single state, but the DT now says some of those strings
belong to the left child and some to the right.
We could re-sift every transition in the hypothesis from scratch, but that
is wasteful. Instead, leaf_index tells us exactly which (state, char)
pairs were routed to \(\ell\). We remove only those transitions from the
DFA and re-sift only them. Every other transition remains correct.
Concretely, continuing the even-a’s example: after the DT is split on ''
and state <1> is created, the stale transitions are those that sifted to
the old single leaf – all of them, since there was only one leaf. Each is
removed and re-sifted through the new two-node DT. The transition
<start> -a-> <start> becomes <start> -a-> <1>, and all others stay the
same. We show the DFA before and after the update.
split_id is the Python id() of the leaf before it was mutated; the
caller must capture it before calling split_leaf, since the mutation
changes the object in place and the old id is the key in leaf_index.
We test hypothesis construction on the even-a’s example.
Counterexample Decomposition
When the equivalence oracle returns a counterexample \(ce\), we know the hypothesis is wrong on \(ce\). But where exactly does it go wrong?
The Split Point
Walk \(ce\) through the hypothesis. At position 0, both hypothesis and target are in \(\langle start \rangle\), so they agree. At position \(|ce|\), they disagree (that is what the counterexample means). So somewhere in between is the first point of disagreement. That is, the position \(i\) where the hypothesis first takes a wrong transition.
We find this by binary search in \(O(\log|ce|)\) queries. At each midpoint \(m\), we check whether \(acc(q_m) \cdot ce[m:]\) gives the same answer as the full counterexample. If yes, the split point is to the right; if no, it is here or to the left.
Prefix Transformation
After finding the split point \(i\), we need the string that reaches state \(q_i\) and then reads \(ce[i]\). The raw counterexample prefix \(ce[:i+1]\) would work, but we use \(acc(q_i) \cdot ce[i]\) instead. This is the prefix transformation, and it gives two guarantees:
- Correctness: \(acc(q_i)\) traces a known path through the hypothesis, so the sift is guaranteed to work even if the hypothesis is partially stale.
- Minimality: the new state gets access sequence \(acc(q_i) \cdot ce[i]\), which is the shortest possible. Using \(ce[:i+1]\) could produce a much longer access sequence, making future sifts more expensive.
Splitting a Leaf
Once we have the split point, we know:
- A leaf \(\ell\) currently represents
old_state old_stateandnew_statewere treated as identical by the hypothesis, but the counterexample proves they are different- The discriminator \(ce[i+1:]\) is the suffix that tells them apart
We mutate the leaf in place into an inner node. This is essential because other parts of the tree already hold references to \(\ell\), so replacing it with a new object would leave those references stale.
We test split_leaf.
After the split, the single leaf becomes an inner node separating <start> and <1>.
We now show update_hypothesis in action. We build the stale hypothesis
(single-leaf DT, everything loops to <start>), then split the leaf and
call update_hypothesis. The stale transition <start> -a-> <start> is
removed and re-sifted to <1>.
Capture split_id before mutating the leaf, then split it.
Now update: stale transitions are removed and re-sifted.
DFA after update:
Discriminator Finalization
The discriminator \(ce[i+1:]\) is correct but may be longer than necessary. A shorter suffix of \(ce[i+1:]\) may distinguish the two states just as well. Keeping discriminators short keeps the DT shallow, which reduces sifting costs in all future iterations.
We try suffixes from shortest to longest, stopping at the first one that distinguishes the two states. Once a candidate fails, all shorter suffixes will also fail, so we stop early.
We test discriminator finalization.
Putting Decomposition Together
We now have all the pieces. decompose finds the split point by binary
search, applies the prefix transformation, finalizes the discriminator,
and splits the leaf. One counterexample yields one new state and one new
discriminator. This is the tightest possible refinement.
Note: decompose uses hypothesis transitions only to find the split point. The actual split uses \(acc(q_i)\), which is always correct with respect to the target. This means decompose is correct even if the hypothesis is partially stale. The access sequences in the spanning tree are ground truth, independent of hypothesis quality.
We test decompose.
test 1: single symbol counterexample ‘a’
test 2: longer counterexample ‘aab’ — binary search finds split at position 0, so the new state still gets access sequence ‘a’ and discriminator is ‘b’.
test 3: two states, counterexample ‘aa’ reveals a third state. The DT gains a second level; the new state gets access sequence ‘aa’.
Worklist Growth in close_transitions
We now trace the full (a|b)*ba learning run to show how state names
emerge from the algorithm and how the worklist grows during
close_transitions. No states are pre-declared — they are created by
dfa.new_state() inside decompose as each counterexample is processed.
Step 1. Single-leaf DT, only <start> in the spanning tree.
close_transitions sifts '' + 'a' and '' + 'b'. Both land on the
only leaf <start>, so no new states are discovered.
Spanning tree and DFA after step 1.
DFA step 1 — everything loops back to <start>.
Step 2. Counterexample 'ba': the hypothesis accepts it as <start>
(which is accepting) but shouldn’t — <start> should only accept ''.
decompose creates a fresh state (call it s1) and splits the DT leaf
with discriminator 'a'. update_hypothesis re-sifts stale transitions;
sifting '' + 'b' now lands on s1 — new, so it is appended to the
worklist and its transitions are closed immediately. Worklist grows from
['<start>'] to ['<start>', s1].
Spanning tree after step 2 — s1 now has access sequence 'b'.
DFA after step 2.
Step 3. Counterexample 'ba' again — now the hypothesis rejects it
because s1 -a-> <start> is wrong (should reach an accepting state).
decompose creates s2 and splits the <start> leaf with discriminator
ε. update_hypothesis re-sifts; sifting acc(s1) + 'a' = 'ba'
lands on s2 — new, appended. Worklist grows to include s2.
Here is that sift path — the one that grows the worklist:
And sifting acc(s1) + 'b' = 'bb' lands on <start> — known, no append.
After update_hypothesis finishes, all three states are wired up.
Spanning tree after step 3 — three states, all access sequences minimal.
Final DFA — states named by the algorithm as they were discovered.
Non-Redundancy
The central claim of the TTT is that it never makes a membership
query whose answer could have been derived from earlier queries. To see
what this means concretely: in L*, if the counterexample is ba, the
algorithm adds both a and ba as new suffix columns and re-queries every
existing state against both. If a was already a column, that work is
wasted. TTT avoids this by extracting exactly one new suffix per
counterexample and routing all future classification through the DT. This
holds at every level:
- Sifting is non-redundant. Every query is \(w \cdot d\) where \(d\) was placed in the DT by a previous split that proved it necessary.
- Splitting is non-redundant. Each split adds exactly one discriminator, proven necessary by the counterexample.
- Closing is non-redundant. Each transition is sifted exactly once per iteration. Newly discovered states come with their DT position already established by the sift that found them, so no extra queries are needed.
This contrasts with L*, where adding all \(k\) suffixes of a counterexample forces re-querying every existing row against every new column, most of which add no new information.
A Note on the Equivalence Oracle
TTT assumes the equivalence oracle is exact: if it says the hypothesis is
wrong, it returns a string the hypothesis genuinely misclassifies. The
Teacher we use is a PAC oracle: it samples a finite set of strings and
declares equivalence if none expose a mistake. This is an approximation.
In principle, a false counterexample from a PAC oracle could cause TTT to create a redundant state: one that could have been merged with an existing state without changing the language the DFA accepts. The DFA would still be correct, just slightly larger than necessary. Neither TTT nor L* guarantees a globally minimal DFA under a PAC oracle: both can acquire spurious states or columns from false counterexamples, and neither performs a global minimization pass afterward. In practice this is a minor concern: the PAC oracle is unlikely to produce false counterexamples with reasonable parameters, and the language accepted by the DFA is unaffected either way.
DT Coherence After Split
After split_leaf turns a leaf into an inner node, some transitions in the
hypothesis that targeted the old leaf are now stale, because the DT now
routes some of those strings to the new child instead.
The incremental strategy finds exactly those stale transitions via
leaf_index, removes them from the DFA, and re-sifts only them. Every
other transition remains valid and costs no queries. New states discovered
during re-sifting are registered and their own transitions are closed in the
same pass.
The Main Loop
The main loop orchestrates everything:
- Build the initial hypothesis: one state, all transitions open.
- Ask the equivalence oracle. If it says yes, we are done.
- If not, decompose the counterexample to find one new state and one new discriminator.
- Incrementally update the hypothesis: re-sift only the stale transitions.
- Repeat from step 2.
The loop runs exactly \(n - 1\) times where \(n\) is the number of states in the minimal DFA, one counterexample per new state discovered.
Examples
We test TTT on three targets of increasing complexity.
target 1: strings over {a, b} with an even number of a’s
target 2: strings over {a, b} that end in ‘b’
target 3: binary strings whose value is divisible by 3
This target has no convenient regex, so we write a custom teacher. It is a good stress test: the minimal DFA has exactly 3 states (one per remainder mod 3), the alphabet is {0, 1}, and transitions are determined by how reading a bit updates the current value modulo 3. This exercises TTT on a target where the states correspond to arithmetic structure rather than string patterns.
Test this.
Evaluating Model Accuracy
We measure precision and recall by cross-fuzzing the target grammar and the inferred grammar. Precision is the fraction of strings generated by the inferred DFA that the target accepts. Recall is the fraction of strings generated by the target that the inferred DFA accepts.
The inferred DFA may contain a dead/sink state: a non-accepting state with
no exit, representing strings the target permanently rejects. Such a state
causes LimitFuzzer to loop, because the grammar has no finite derivation
from it. We remove dead states before fuzzing using fuzzer.compute_cost,
which assigns each nonterminal the minimum number of steps needed to reach
a terminal string. Any nonterminal with infinite cost is a dead state;
we remove it and all rules that reference it.
We define a match helper that wraps the Earley parser in a boolean check.
Testing Each pair is (regex, alphabet). Cases cover a range of DFA shapes: two-segment and three-segment chains, prefix-anchored, suffix-anchored, substring-containment, exact-alternation, and disjoint finite sets.
Comparison with L*
| L* | TTT | |
|---|---|---|
| Data structure | Flat observation table | Discrimination tree (Kearns-Vazirani) |
| Counterexample processing | Add all \(k\) suffixes | Binary search for 1 suffix (Rivest-Schapire) |
| Prefix transformation | No | Yes (minimal access sequences, TTT) |
| Discriminator finalization | No | Yes (shallow DT, TTT) |
| Redundant queries | Many | None: the DT structure prevents re-querying known distinctions |
| Closedness check | Explicit global scan | Lazy, local (open transitions) |
| Consistency check | Explicit global scan | Structurally prevented by DT |
The DT depth is bounded by \(n\) (one split per state), so sifting never becomes expensive. This makes TTT the preferred algorithm when membership queries are costly, as is typical when learning protocol implementations or library APIs through testing.
References
Artifacts
The runnable Python source for this notebook is available here.
-
Michael Kearns and Umesh Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994. pp. 44-58. ↩ ↩2
-
Ronald L. Rivest and Robert E. Schapire. Inference of Finite Automata Using Homing Sequences. Information and Computation, 103(2):51-73, 1993. ↩ ↩2
-
Malte Isberner, Falk Howar, and Bernhard Steffen. The TTT Algorithm: A Redundancy-Free Approach to Active Automata Learning. RV 2014. ↩ ↩2
-
Falk Howar and Bernhard Steffen. Active Automata Learning in Practice. Springer, 2022. ↩
-
Markus Frohme. Active Automata Learning with Adaptive Distinguishing Sequences. Master Thesis, TU Dortmund, 2015. https://arxiv.org/abs/1902.01139 ↩