Some Silliness with Python Iterators as Pipes
Here is how we write a normal for loop in Python.
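As a minimal illustration (the list and the variable names are placeholders, not taken from the original notebook), a plain for loop over a list might look like this:

```python
# Hypothetical sample data; any list would do.
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

visited = []
for n in numbers:
    # The loop body runs once per element, in order.
    visited.append(n)

print(visited)
```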
What if you want to operate on the list, such as squaring each element, or perhaps selecting only the values greater than 5? List comprehensions are the Pythonic solution, as below.
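For instance, squaring and selecting values greater than 5 can both be written as comprehensions. This is only a small sketch with made-up variable names:

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Transform every element.
squares = [n * n for n in numbers]

# Keep only the elements that pass a test.
large = [n for n in numbers if n > 5]

# Both at once: filter first, then transform.
large_squares = [n * n for n in numbers if n > 5]

print(large_squares)
```

The combined form is where readability starts to suffer: the transformation comes first in the text even though the filter is applied first.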
But I have always found more complex list comprehensions a bit difficult to read. Is there a better solution? Here is an attempt to adapt a UNIX shell pipeline-like solution to Python. Something like
[1,2,3,4,5,6,7,8,9] | where(_ > 5) | map(_ * _)
Here is a possible solution. What we need is a way to stitch operations on a list together. So, we define a class Chains with the __or__ dunder method redefined. What it does is connect the current object to the right-hand object in the pipeline, and set the current object as that object's source of values.
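A minimal sketch of such a class follows. The method name source and the attribute name src are assumptions made for illustration; the text only tells us that __or__ links the two objects.

```python
class Chains:
    def source(self, src):
        # Remember the upstream stage that will supply our values.
        self.src = src
        return self

    def __or__(self, right):
        # a | b: make `a` the source of `b`, then hand back `b`
        # so that further `|` operations keep extending the chain.
        return right.source(self)


a, b = Chains(), Chains()
joined = a | b
print(joined is b, b.src is a)
```

Returning the right-hand object is what lets a longer chain such as a | b | c work: each step yields the most recently attached stage.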
Source
Next, we define the source. That is, an object that is at the start of the pipeline.
We can use it as follows:
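One way to sketch the source and its use is shown here. S_ is the name this post uses later for the wrapper around the initial value; the Chains helper methods and the attribute names are assumptions.

```python
class Chains:
    def source(self, src):
        self.src = src
        return self

    def __or__(self, right):
        return right.source(self)


class S_(Chains):
    """Start of a pipeline: wraps any iterable."""

    def __init__(self, values):
        self.values = values

    def __iter__(self):
        # Hand out a fresh iterator over the wrapped values.
        return iter(self.values)


collected = []
for i in S_([1, 2, 3]):
    collected.append(i)
print(collected)
```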
Can we do better on the sink by avoiding the parentheses? Here is a possibility.
Using it:
Map
Next, we define maps.
We use it as follows.
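A possible shape for the map stage, again as a sketch: M_ is the name used later in this post, while the fn and src attributes and the generator-based __iter__ are assumptions.

```python
class Chains:
    def source(self, src):
        self.src = src
        return self

    def __or__(self, right):
        return right.source(self)


class S_(Chains):
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        return iter(self.values)


class M_(Chains):
    """Map stage: applies fn to every value pulled from the source."""

    def __init__(self, fn):
        self.fn = fn

    def __iter__(self):
        return (self.fn(v) for v in self.src)


squares = list(S_([1, 2, 3]) | M_(lambda x: x * x))
print(squares)
```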
Filter
Finally, we implement filters as follows.
This is used as follows.
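A filter stage can be sketched along the same lines. The class name F_ is a guess by analogy with S_ and M_, and the helper names are assumptions as before.

```python
class Chains:
    def source(self, src):
        self.src = src
        return self

    def __or__(self, right):
        return right.source(self)


class S_(Chains):
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        return iter(self.values)


class F_(Chains):
    """Filter stage: passes on only the values for which pred is true."""

    def __init__(self, pred):
        self.pred = pred

    def __iter__(self):
        return (v for v in self.src if self.pred(v))


large = list(S_(range(1, 10)) | F_(lambda x: x > 5))
print(large)
```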
We can also recover our original names, where and map.
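One simple way to get those names is plain aliasing. This sketch assumes stage classes named M_ and F_ and uses ordinary lambdas; it does not attempt the _ > 5 placeholder syntax from the motivating example.

```python
class Chains:
    def source(self, src):
        self.src = src
        return self

    def __or__(self, right):
        return right.source(self)


class S_(Chains):
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        return iter(self.values)


class M_(Chains):
    def __init__(self, fn):
        self.fn = fn

    def __iter__(self):
        return (self.fn(v) for v in self.src)


class F_(Chains):
    def __init__(self, pred):
        self.pred = pred

    def __iter__(self):
        return (v for v in self.src if self.pred(v))


where = F_
map = M_  # note: this shadows the built-in map in this module

result = list(S_([1, 2, 3, 4, 5, 6, 7, 8, 9])
              | where(lambda x: x > 5)
              | map(lambda x: x * x))
print(result)
```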
Pipe DSL
This is great, but can we do better? In particular, can we avoid having to specify the constructors? One way to do that is through introspection. We redefine Chains as below. (The other classes are simply redefined so that they inherit from the right Chains class.)
What we are essentially saying here is that a lambda within a list ([lambda s: ...]) is treated as a map, while one within a set ({lambda s: ...}) is treated as a filter.
It is used as follows.
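The introspection step could be sketched as follows. The isinstance checks and the way the lambda is unpacked from the list or set are assumptions about how such a dispatch might be written, and F_ is a guessed name for the filter class.

```python
class Chains:
    def source(self, src):
        self.src = src
        return self

    def __or__(self, stage):
        # Inspect the right-hand operand and wrap it if needed.
        if isinstance(stage, list):
            stage = M_(stage[0])            # [fn]   -> map
        elif isinstance(stage, set):
            stage = F_(next(iter(stage)))   # {pred} -> filter
        return stage.source(self)


class S_(Chains):
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        return iter(self.values)


class M_(Chains):
    def __init__(self, fn):
        self.fn = fn

    def __iter__(self):
        return (self.fn(v) for v in self.src)


class F_(Chains):
    def __init__(self, pred):
        self.pred = pred

    def __iter__(self):
        return (v for v in self.src if self.pred(v))


pipeline = S_([1, 2, 3, 4, 5, 6, 7, 8, 9]) | {lambda s: s > 5} | [lambda s: s * s]
result = list(pipeline)
print(type(pipeline).__name__, result)
```

Functions are hashable, so a lambda can be placed in a set literal without trouble.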
One final note here is that the final iterator object that the for loop iterates over is the M_ object from the last stage. This is a consequence of the left associativity of the | operator. That is, when we have a | b | c, this is parenthesized as (a | b) | c, which is then taken as c(b(a())). This is also the reason why we have to wrap the initial value in S_, but not any others. (Because we override __or__, only the left-hand object in a | operation needs to be of type Chains.) (We are also essentially pulling the values from previous pipe stages.)
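The grouping of | can be checked directly with a small throwaway class. Show and Plain are hypothetical helpers invented for this demonstration; they are not part of the pipeline code.

```python
class Show:
    """Records how `|` groups its operands."""

    def __init__(self, text):
        self.text = text

    def __or__(self, other):
        return Show(f"({self.text} | {other.text})")


class Plain:
    """Has a label but defines no operator methods at all."""

    def __init__(self, text):
        self.text = text


grouped = (Show("a") | Show("b") | Show("c")).text

# A plain object on the right is fine: Show.__or__ handles it.
left_is_show = (Show("a") | Plain("b")).text

# A plain object on the left fails: nothing defines `|` for it.
try:
    Plain("a") | Show("b")
    plain_on_left_works = True
except TypeError:
    plain_on_left_works = False

print(grouped, left_is_show, plain_on_left_works)
```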
If we want, we can make the last stage the required Chains type, so that we can write a | b | c | Chains(). However, for that, we need to override the only right-associative binary operator in Python, **. That is, we have to write a ** b ** c ** Chains(), and have to override __rpow__(). We will then get the object corresponding to a, and we will then have to push the values to the later pipe stages.
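A rough sketch of the push-based variant is given here, under several assumptions: the class name Push, its push method, and the rule "callable means stage, anything else means source" are all invented for illustration, and only map-like stages are handled.

```python
class Push:
    """Hypothetical push-style stage wrapping a one-argument function."""

    def __init__(self, fn=None, nxt=None):
        self.fn = fn if fn is not None else (lambda v: v)
        self.nxt = nxt
        self.items = []  # only filled on the last stage

    def push(self, value):
        value = self.fn(value)
        if self.nxt is None:
            self.items.append(value)   # end of the pipe: collect
        else:
            self.nxt.push(value)       # hand the value downstream

    def __rpow__(self, left):
        # Called for `left ** self` when `left` does not implement `**`.
        if callable(left):
            return Push(left, self)    # an earlier stage: link it in front
        for value in left:             # the source: start pushing values
            self.push(value)
        return self


end = Push()
head = [1, 2, 3] ** (lambda v: v + 1) ** (lambda v: v * v) ** end
print(end.items)
```

Because ** groups from the right, the stages are linked starting from the last one, and the source on the far left is seen only at the very end, which is why the values have to be pushed forward rather than pulled.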
Artifacts
The runnable Python source for this notebook is available here.