Code Translator


Overview


During my senior year I took a class on programming language design and implementation. One of the projects had us take in a C program and output a C program written in assembly styled, or three address code. There are three main parts to successfully accomplishing this: Tokenizing; creating a tree based on a grammar; traversing the tree and building the final result. This project was first done in Python and then in Ocaml due to its tail-recursive properties.



Step 1: Creating Tokens

A token is an object that represents some sort of item in a particular line of code. On the most basic level we have 3 types of digit, letter, symbol classifications for each character. Then if possible we combine these elementary classifications into a number, string, symbol, identified , or a reserve word. A reserve word a string of characters reserved by the language such as 'int', 'if', 'else', 'printf', etc. An identifier something that can function as a variable in code. Creating a list of tokens in the chronological order of appearance.



Step 2: Tree Generation

In the same way the English language abides by a set of rules, or grammar, code does as well. So my next step was to decipher meaning from the list of tokens based on a set of Grammar rules. Using these rules we recursively build a tree that associates meaning between the tokens.



A little background info

Three address code consists of one variable being equal to of at most an operation between two other variables. For example x = z + t could be considered three address code. Where as x = z + k * f does not and would need to be further modification. To make thing simple we replace the original variables with 'local[]' and each variable would have its own index in the 'local[]' array. The one other thing that we are allowed to use are goto statements their respective labels in lieu of directional commands such as for loops, functions, etc.



Step 3: Converting Step 2's Tree Into Code
&
Wrapping Things Up


I first copied the meta statements; then declared global variables and stored the appropriate relationships in a dictionary ; then I went through function declaration. Once I was ready to define the functions, I first replaced all global variable instances with the proper global array value utilizing my global variable dictionary. Once all the global variables were properly over-written, I recursively went through each function and converted them into instructions which followed the project's design guidelines and pushed the instructions into an array which held the instructions for each function. This was done via a DFS on each function and recursively storing/creating new local variables as needed. As we recursed back I pushed new instructions into an array which stored the current functions lines of code; except for the local variable initialization line which we can't write in until we are done creating instructions for the function. At the end I looked at the amount of local variables and added a local variable initialization instruction to the master array. After which I copied over the the instructions that I stored in my function array to the master array.