My final semester at University I took a class on advanced compilers. The first project that we wrote in python exposed us to the basics of code analysis.
Pycparser is a python library which generates an abstract syntax tree(AST) for an input C file. At first glance one might think that this does a lot of the work for us, however pycparser creates a plethora of classes to denote various object types in the AST. Since python is a dynamically typed language this creates a severe headache of deciphering the pycparser library and converting everything to a string type. After that we still have to traverse the AST and generate three address code, but that's more straight forward.
Now I break up the 3 address code instructions into groups known as Basic Blocks. Each Basic Block is read from top down and grantees completion. So a 'goto' statement for example would signify an end of a basic block.
A control flow graph connects basic blocks and points them to potential next steps. Using the graphviz library for python I converted my programmatic graph to a visual representation. This is essential for debugging future issues and to have a general visual understanding of what's going on.
The goal is simple, but the execution is not: for each basic block minimize the number of redundant computations whilst maintaining code integrity.