Important:Pyodide takes time to initialize.
Initialization completion is indicated by a red border around Run all button.
One of the problems when working with Python is that, even though many
libraries use data structures making use of recursion, the recursion stack in
Python is limited. Python starts with a recursion budget of 1000 function
calls which is easy to exhaust. Such data structures include
JSON and pickle. Even simple deep copy of a Python data structure from
copy.deepcopy() makes use of recursion.
This will blow the stack on copy
The trouble is that copy() is implemented as a recursive procedure. For
example, For exmple copy_array() may be defined as follows:
This is used as follows:
As before, it easily blows the stack when given a deeply nested data
structure.
Here is a simple recipe that lets you duplicate or serialize deeply nested
data structures. The traditional way to duplicate such a data structure is to
simply turn the recursive implementation to an iterative solution as follows:
It is used as follows:
As expected, it does not result in stack exhaustion.
Another way is to use a stack. For example, we can serialize a nested array by
using the following function.
We make use of a stack: to_expand contains a stack of items that
still needs to be processed. Our results are stored in expanded.
You can use it as follows:
TLV
If you do not care about human readability of the generated instructions, you
can also go for a variant of the tag-length-value (TLV) format used for binary
serialization.
TLV Serialize
Next, we define how to serialize a deep data structure. Here is our subject.
The TLV format serializes a data structure by storing the type (tag) of the
data structure followed by the number of its child elements, finally followed
by the child elements themselves. That is, (from right to left)
'a' 'b' 'c' 'd' 2 <set> 3 <list>
represents
'a' 'b' {'c', 'd'} 3 <list>
which in turn represents the following Python data structure.
['a', 'b', {'c', 'd'}]
Let us see how it works
TLV Deserialize
To deserialize, we do the opposite.
Let us see how it works
Cyclic data structures
How do we serialize a cyclic data structure or a data structure where
a single item is present as a child of multiple items?
For example, in the below fragment, b contains two links to a.
To handle such data structures, we need to introduce naming. Let us
consider the below example.
To handle this we first need a data structure for references.
Serialize
Next we define how to convert a data structure to a format that preserves
links.
Here is how the definition looks like. As you can see,
all the nesting is eliminated using naming of data structures.
Deserialize
Next, to recreate the structure
Using it.
Reconstruct
This structure still contains references. So, we need to reconstruct the
actual data
Using it.
Generators for recursion
This i not the end of the story however. It is remarkably easy to make a
Python function to allocate its stack frames on the heap so that you
are not restricted to the arbitrary cut off of recursion limit. The answer
is generators.
Here is how it is done
With this, we can transform any of our recursive functions as below. The idea
is to change any function call to yield
Once we have this, we can use the cpstrampoline() to execute this function.
Testing
Artifacts
The runnable Python source for this notebook is available here.