Contents

  1. Delimited Parser
    1. Text
    2. PEG
  2. Indentation Based Parser
    1. IText
  3. Artifacts

Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.

We previously saw how to incorporate indentation sensitive parsing to combinatory parsers. There were two things that made that solution somewhat unsatisfactory. The first is that we had to use a lexer first, and generate lexical tokens before we could actually parse. This is unsatisfactory because it forces us to deal with two different kinds of grammars – the lexical grammar of tokens and the parsing grammar. Given that we have reasonable complete grammar parsers such as PEG parser and the Earley parser, it would be nicer if we can reuse these parsers somehow. The second problem is that combinatory parsers can be difficult to debug.

So, can we incorporate the indentation sensitive parsing to more standard parsers? Turns out, it is fairly simple to retrofit Python like parsing to standard grammar parsers. In this post we will see how to do that for PEG parsers. (The PEG parser post post contains the background information on PEG parsers.) That is, given

if True:
   if False:
      x = 100
      y = 200
z = 300

We want to parse it similar to

if True: {
   if False: {
      x = 100;
      y = 200;
   }
}
z = 300;

in a C like language. As before, we start by importing our prerequisite packages.

Available Packages These are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.
  1. simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".


Delimited Parser

We first define our grammar.



Text

We want a stream of text that we can manipulate where needed. This stream will allow us to control our parsing.



Next, we modify our PEG parser so that we can use the text stream instead of the array.

PEG



Using



It is often useful to understand the parser actions. Hence, we also define a parse visualizer as follows.



Using



As you can see, while the visualizer is helpful, it is very verbose. Hence, use it only when you have difficulty understanding why a parse did or did not succeed.

Indentation Based Parser

For indentation based parsing, we modify our string stream slightly. The idea is that when the parser is expecting a new line that corresponds to a new block (indentation) or a delimiter, then it will specifically ask for <$nl> token from the text stream. The text stream will first try to satisfy the new line request. If the request can be satisfied, it will also try to identify the new indentation level. If the new indentation level is more than the current indentation level, it will insert a new <$indent> token into the text stream. If on the other hand, the new indentation level is less than the current level, it will generate as many <$dedent> tokens as required that will match the new indentation level.

IText



We will first define a small grammar to test it out.



Here is our text that corresponds to the g1 grammar.



We can now use the same parser with the text stream.



Here is a slightly more complex grammar and corresponding text



Checking if the text is parsable.



Another test



Using



Another



Using



Another



Using



Another



Using



Another



Using



Another



Using



Another



Using



Another



Using



Let us make a much larger grammar



Using



As can be seen, we require no changes to the standard PEG parser for incorporating indentation sensitive (layout sensitive) parsing. The situation is same for other parsers such as Earley parsing.

Artifacts

The runnable Python source for this notebook is available here.