Important:Pyodide takes time to initialize.
Initialization completion is indicated by a red border around Run all button.
We previously saw how to incorporate
indentation sensitive parsing to combinatory parsers. There were two things
that made that solution somewhat unsatisfactory. The first is that we had to
use a lexer first, and generate lexical tokens before we could actually
parse. This is unsatisfactory because it forces us to deal with two different
kinds of grammars – the lexical grammar of tokens and the parsing grammar.
Given that we have reasonable complete grammar parsers such as
PEG parser and the Earley
parser, it would be nicer if we can reuse
these parsers somehow. The second problem is that
combinatory parsers can be difficult to debug.
So, can we incorporate the indentation sensitive parsing to more standard
parsers? Turns out, it is fairly simple to retrofit Python like parsing to
standard grammar parsers. In this post we will see how to do that for
PEG parsers. (The
PEG parser post post contains the background
information on PEG parsers.)
That is, given
if True:
if False:
x = 100
y = 200
z = 300
We want to parse it similar to
if True: {
if False: {
x = 100;
y = 200;
}
}
z = 300;
in a C like language.
As before, we start by importing our prerequisite packages.
Available Packages
These are packages that refer either to my previous posts or to pure python
packages that I have compiled, and is available in the below locations. As
before, install them if you need to run the program directly on the machine.
To install, simply download the wheel file (`pkg.whl`) and install using
`pip install pkg.whl`.
We want a stream of text that we can manipulate where needed. This stream
will allow us to control our parsing.
Next, we modify our PEG parser so that we can use the text stream instead of
the array.
PEG
Using
It is often useful to understand the parser actions. Hence, we also define
a parse visualizer as follows.
Using
As you can see, while the visualizer is helpful, it is very verbose. Hence,
use it only when you have difficulty understanding why a parse did or did not
succeed.
Indentation Based Parser
For indentation based parsing, we modify our string stream slightly. The idea
is that when the parser is expecting a new line that corresponds to a new
block (indentation) or a delimiter, then it will specifically ask for <$nl>
token from the text stream. The text stream will first try to satisfy the
new line request. If the request can be satisfied, it will also try to
identify the new indentation level. If the new indentation level is
more than the current indentation level, it will insert a new <$indent>
token into the text stream. If on the other hand, the new indentation level
is less than the current level, it will generate as many <$dedent> tokens
as required that will match the new indentation level.
IText
We will first define a small grammar to test it out.
Here is our text that corresponds to the g1 grammar.
We can now use the same parser with the text stream.
Here is a slightly more complex grammar and corresponding text
Checking if the text is parsable.
Another test
Using
Another
Using
Another
Using
Another
Using
Another
Using
Another
Using
Another
Using
Another
Using
Let us make a much larger grammar
Using
As can be seen, we require no changes to the standard PEG parser for
incorporating indentation sensitive (layout sensitive) parsing. The situation
is same for other parsers such as Earley parsing.
Artifacts
The runnable Python source for this notebook is available here.