- The graphical routines.
- The GraphState
- Extracting the graph
- Arithmetic expressions
- Control structures
- What remains?
Important: Pyodide takes time to initialize. Initialization completion is indicated by a red border around Run all button.
We previously discussed how one can write an interpreter Python, and we made use of this machinery in generating a control flow graph. In this post, we will show how one can extract the static call-graph using the same machinery. A call-graph is a directed graph data structure that encodes the structure of function calls in a program. One can extract the call-graph either dynamically or statically. A dynamic call-grah can be constructed by following the execution path of the program on multiple inputs. Such dynamic call-graphs under-approximate the actual structure of the program to the functions that was covered by the set of input. The dual of a dynamic call-graph is a static call-graph (the subject of this post) which is constructed by statically analyzing the program. Such call-graphs overapproximate the actual call-graph of the program (the exceptions are when there are higher order functions and othe polymorphism). As in control-flow graph, a static call-graph is another abstract view of the interpreter. Static call-graphs complement control flow graphs in static analysis. They allow one to identify which functions may be impacted by a change in one function (dependencies), evaluate reachability, and also can be used to implement various traversals of the code. Note that this is a limited proof of concept. It does not implement the entire traversal other than what is required for us to get our examples working.
As before, we start with the prerequisite imports. Click on the bullets for more information on how to obtain them.
System ImportsThese are available from Pyodide, but you may wish to make sure that they are installed if you are attempting to run the program directly on the machine.
Available PackagesThese are packages that refer either to my previous posts or to pure python packages that I have compiled, and is available in the below locations. As before, install them if you need to run the program directly on the machine. To install, simply download the wheel file (`pkg.whl`) and install using `pip install pkg.whl`.
The graphical routines.
The default class for Pyodide.
CLIGraphics if you are running it from the
We use Pyodide here.
to_graph() function produces a graph from the nodes in the registry.
This is more simplified than in control-flow because there is only one kind
of node in this graph – a function.
The call-graph is a graph, and hence we need a data structure for the graph.
We also need a node class.
Given that it is a directed graph node, we need the ability to add calls and callers.
The context is really the lexical context (scope) in which a function was defined. For example, functions may be defined in the context of modules or classes, and classes may be defined in the context of modules.
We next define two convenience methods for printing any node.
The usage of our graph structure is as below:
Extracting the graph
The call-graph is essentially a source code walker, and shares the basic structure with our interpreter.
As in previous posts, we define
walk() that walks a given AST node.
A major difference from the control-flow graph is that we want to keep track
of the lexical
context in which a function is defined, but not necessarily the sequence.
walk() accepts the node and the lexical context. It then
invokes the various
on_*() functions with the same list. Since we ignore
the sequence of statements that resulted in a call, The return value is not
We also define
parse which parses a given fragment and
given a fragment, parses and walks the AST.
We first define the
Module. The module is a basic context from where calls
to functions can be made.
Similar to modules, we traverse function definitions looking for calls to other functions. Hence, we only have a basic traversal in place.
The pass statement is trivial. It does nothing but exist.
The name is just a skeleton. It has no function other than to exist.
The return statement can also have embedded function calls. Hence, we traverse the value.
A call creates a link between two functions.
Expressions can have function calls embedded. Hence, we traverse the value.
Similar to name, primitives also have no function other than to exist.
For arithmetic expressions, we need to walk on the operands.
Assign(expr* targets, expr value)
For assignment, we walk the value.
We now come to the control structures. For the
if statement, we have two
parallel paths. We first traverse the test expression, then the body of the
if branch, and the body of the else branch.
while statement is more complex than the
if statement, but the
essential idea is the same.
Break and continue are simply skeletons.
For(expr target, expr iter, stmt* body, stmt* orelse, string? type_comment)
For statement is again quite simple.
At this point, what we have is a very basic call-graph algorithm that understands a restricted subset of Python. However, a lot more is left.
The main problem is that if you have higher order functions, it is very hard to identify which functions can get called when the variable containing the function is invoked. For example, the call-graph of
def f(x): x(1)
is almost impossible to compute because
x() can be any function including
f() itself. Any polymorphic behavior (such as from objects) induce similar
For example, many objects may override various operators such as arithmetic or boolean operators, providing their own implementations. Given the dynamic nature of Python, we can’t identify such methods. That is, without full type inference, accurate call-graphs are impossible. Even given full type inference, we need accurate abstract interpretation of the program to determine possible assignments (which we do not do here).
That is, while basic call-graph construction is easier than control-flow construction, a complete call-graph is much harder than complete control-flow graph.
We note that this post is a proof-of-concept, more intended for understanding how to construct call-graphs than as a library for use in production. If you need such libraries, I recommend Pyan, which seems to work reasonably well, as well as PyCG which is more rigorous.
The runnable Python source for this notebook is available here.
V Salis, T Sotiropoulos, P Louridas, D Spinellis, and D Mitropoulos. Pycg: Practical call graph generation in python. ICSE 2021. ↩
L Yu. Empirical study of Python call graph ASE 2019. ↩
A Abadi, B Makovitzki, R Shemer, and S Tyszberowicz. NoCFG: A Lightweight Approach for Sound Call Graph Approximation. arXiv:2105.03099. 2021. ↩