Contents

  1. TLV
    1. TLV Serialize
    2. TLV Deserialize
  2. Cyclic data structures
    1. Serialize
    2. Deserialize
    3. Reconstruct
  3. Generators for recursion
    1. Artifacts

Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.

One of the problems when working with Python is that, even though many libraries use data structures making use of recursion, the recursion stack in Python is limited. Python starts with a recursion budget of 1000 function calls which is easy to exhaust. Such data structures include JSON and pickle. Even simple deep copy of a Python data structure from copy.deepcopy() makes use of recursion.



This will blow the stack on copy



The trouble is that copy() is implemented as a recursive procedure. For example, For exmple copy_array() may be defined as follows:



This is used as follows:



As before, it easily blows the stack when given a deeply nested data structure.



Here is a simple recipe that lets you duplicate or serialize deeply nested data structures. The traditional way to duplicate such a data structure is to simply turn the recursive implementation to an iterative solution as follows:



It is used as follows:



As expected, it does not result in stack exhaustion.



Another way is to use a stack. For example, we can serialize a nested array by using the following function. We make use of a stack: to_expand contains a stack of items that still needs to be processed. Our results are stored in expanded.



You can use it as follows:



TLV

If you do not care about human readability of the generated instructions, you can also go for a variant of the tag-length-value (TLV) format used for binary serialization.

TLV Serialize

Next, we define how to serialize a deep data structure. Here is our subject.



The TLV format serializes a data structure by storing the type (tag) of the data structure followed by the number of its child elements, finally followed by the child elements themselves. That is, (from right to left)

'a' 'b' 'c' 'd' 2 <set> 3 <list>

represents

'a' 'b' {'c', 'd'} 3 <list>

which in turn represents the following Python data structure.

['a', 'b', {'c', 'd'}]


Let us see how it works



TLV Deserialize

To deserialize, we do the opposite.



Let us see how it works



Cyclic data structures

How do we serialize a cyclic data structure or a data structure where a single item is present as a child of multiple items? For example, in the below fragment, b contains two links to a.



To handle such data structures, we need to introduce naming. Let us consider the below example.



To handle this we first need a data structure for references.



Serialize

Next we define how to convert a data structure to a format that preserves links.



Here is how the definition looks like. As you can see, all the nesting is eliminated using naming of data structures.



Deserialize

Next, to recreate the structure



Using it.



Reconstruct

This structure still contains references. So, we need to reconstruct the actual data



Using it.



Generators for recursion

This i not the end of the story however. It is remarkably easy to make a Python function to allocate its stack frames on the heap so that you are not restricted to the arbitrary cut off of recursion limit. The answer is generators. Here is how it is done



With this, we can transform any of our recursive functions as below. The idea is to change any function call to yield



Once we have this, we can use the cpstrampoline() to execute this function.



Testing



Artifacts

The runnable Python source for this notebook is available here.