Lab 8: Writing a SimpleC Compiler


The tests that you will use to test your compiler are in tests.tar. Copy tests.tar into your lab8-src directory and type "tar -xvf tests.tar". To run the tests cd into "tests" and type "testall".

These are the sources that were modified during class for local vars

Goal

In this lab you will write a compiler for the Simple C language.Your compiler will generate x86-64 assembly language that can be assembled to produce an executable file.

Introduction

You will write a compiler for the SimpleC language. The SimpleC language is a subset of the C language that only has the following types:

SimpleC Type
long
long*
char*
char**
void

Examples of the SimpleC language is the following:

------------ test1.c ---------------
void main()
{
    printf("Hello world...\n");
}

------------ test2.c ---------------
long fact(long n) {
    if (n==0) return 1;
    return n*fact(n-1);
}

void main()
{
    printf("Factorial of 5 = %d\n" , fact(5));
}
--------------- test3.c -------------
long sum(long n, long* a) {
    long i, s;
    s = 0;
    for (i=0; i<n; i=i+1) {
        s = s + a[i];
    }
    return s;
}

long main()
{
    long* a;
    a = malloc(5*8);
   
    a[0]=4; a[1]=3; a[2]=1; a[3]=7; a[4]= 6;
   
    long s;
    s = sum(5, a);
   
    printf("sum=%d\n", s);
}
---------------------------------

Language Description

The language is very similar to "C" but only uses long, long*, char, char* and void as types. NOTICE: There is no space betwen "long" or "char" and "*" .

Arithmetic Operators: +, -, *, /, %,
Logical Operators: &&, ||
Equality Operators: ==, !=
Relational Operators: <, <=, >, >=
Flow control: if, if.else, while, for, do

Downloading the Starting Sources

You can dwnload the sources from lab8-src.tar.Z. Uncompress them and untar them.

uncompress lab8-src.tar.Z
tar -xvf lab8-src.tar

Now you are ready to compile your first program. Type:

cd lab8-src
make
./scc test1.c
cat test1.c
cat test1.s

The commans scc compiles the program and generates the assembly file test1.s

Currently the scc compiler has just the bare bones of the compiler you will write. The main implementation of the compiler is in simple.y.
The file simple.y is a "yacc" file that describes the grammar of the language and it has some "actions" or "C" code that are executed when some parts of the grammar are reached.

For example, the entry in siple.y:

function:
         var_type WORD
         {
         fprintf(fasm, "\t.text\n");
         fprintf(fasm, ".globl %s\n", $2);
         fprintf(fasm, "%s:\n", $2);
     }
     LPARENT arguments RPARENT compound_statement
         {
         fprintf(fasm, "\tret\n", $2);
         }
    ;

This entry is reached when a function definition is parsed. var_type is the type of the function: long, char*, void etc. WORD is the name of the function and it can be accessed using the variable $2, that means the 2nd element in the defintion. The code between {...} is executed after the function name is parsed and before the "(". The code generates the preamble of the function that includes .text, .global and the name:.

After the whole function is parsed, the epiloge with the ret instruction is generated.

For more information about yacc see Yacc and Lex tutorial. and Lex and Yacc manuals.

Part 1 Generating code for expressions

The first step will consist in generating code for expressions. The expressions contain all the operands Arithmetic Operators: +, -, *, /, %,
Logical Operators: &&, || Equality Operators: ==, != Relational Operators: <, <=, >, >=.

Since the code generation is intended to be done in a single pass, it will be easier to generate code that simulates a stack machine. However, instead of using memory all the time for the stack, we will be using some of the registers. In this way, registers and not memory will be used in most of the cases. This registers will be saved when entering a function and restored before leaving a function. When more than 5 entries in the stack are needed then the stack can be exteneded to the execution stack.


Stack Position
Register/Memory
0
rbx
1
r10
2
r13
3
r14
4
r15
>=5
Use the execution stack


Example:

x = y + 5*(z-1)

Using a stack machine:

push y
push 5
push z
push 1
-
*
+

Using the equivalent register positions:

mov y, %rbx          # push y
mov $5, % r10       # push 5
mov z, %r13           # push z
mov $1,%r14          # push 1
subq %r14, %r13    # -
imulq %r13, %r10   #*
addq %r10,%rbx     # +

Add the code in simple.y that will generate the assembly code described above for all the epressions, Arithmetic Operators: +, -, *, /, %,
Logical Operators: &&, || Equality Operators: ==, != Relational Operators: <, <=, >, >=, variables and integer constants.

After completing this part you should be able to assemble and run test2.c


Turnin

Follow these instructions to turnin lab8 part 1:
  1. Make sure that your programs are built by typing "make". Make sure it builds and runs in one of the machines sslab01.cs, sslab02.cs  etc.
  2. Type "turnin -c cs250 -p lab8-1 lab8-src" 
  3. Type  "turnin -c cs250 -p lab8-1 -v" to make sure you have submitted the right files 

The deadline for this part is Monday November 18th 2013 at 11:59pm.


Part 2 Generating code for control flow statements

Inn this part you will generate the code for if-else/while/do-while/for etc. Alsoyou will complete the code for array derreference. After you finish this part of the project you should be able to generate assemble and run test3.c and test4.c. Also write your own tests.

Turnin

Follow these instructions to turnin lab8 part 2:
  1. Make sure that your programs are built by typing "make". Make sure it builds and runs in one of the machines sslab01.cs, sslab02.cs  etc.
  2. Type "turnin -c cs250 -p lab8-2 lab8-src" 
  3. Type  "turnin -c cs250 -p lab8-2 -v" to make sure you have submitted the right files 

The deadline for this part is Monday December 2nd at 11:59pm.