Chapter 4 Variables
In the last chapter, we’ve seen how to compile
nested expressions like (3+4)*5
by using temporary
storage for intermediate expresssions. On a high level
this is not so different from rewriting the program
ourselves as val x = 3+4; x * 5
.
So let’s support this next, and add support for variables.
4.1 Language Definition
4.1.1 Syntax
Concrete syntax grammar in BNF:
<op> ::= '+' | '-' | '*' | '/'
<atom> ::= <num>
| <ident>
| '(' <simp> ')'
| '{' <exp> '}'
<simp> ::= <atom> [<op> <atom>]*
<exp> ::= <simp>
| 'val' <ident> '=' <simp>';' <exp>
Example:
val x = 10;
x*2
Abstract syntax:
<exp> ::= <num>
| <ident>
| <exp> <op> <exp>
| val <ident> = <exp>; <exp>
Scala Definition:
abstract class Exp
case class Lit(x: Int) extends Exp
case class Prim(op: String, xs: List[Exp]) extends Exp
case class Var(x: String) extends Exp
case class Let(x: String, rhs: Exp, body: Exp) extends Exp
4.1.2 Direct Interpreter and Compiler
type Val = Int
def eval(exp: Exp): Val = exp match {
// previous cases
// ...
case Let(n, v, b) =>
val ev = eval(v)
val eb = eval(b)
???
case Var(name) => ???
}
Problem: we’re missing context information.
What can we do?
Idea 1: Substitution (inefficient, and wouldn’t work for compiler)
type Val = Int
def eval(exp: Exp): Val = exp match {
// previous cases
// ...
case Let(n, v, b) =>
val ev = eval(v)
eval(subst(b,n,ev)) // Var(name) --> Lit(ev)
case Var(name) =>
error(s"Unbound variable $name")
}
Example:
eval(Let("x", Lit(10), Prim("*", Var("x"), Lit(2)))) // => ???
Idea 2: Add an environment data structure! Make some information about the evaluation context, i.e., the surrounding program explicit.
type Val = Int
def eval(exp: Exp)(implicit env: Map[String,Val]): Val = exp match {
// previous cases
// ...
case Let(x, v, b) => eval(b)(env + (x -> eval(v)))
case Var(x) => env(x)
}
Example:
eval(Let("x", Lit(10), Prim("*", Var("x"), Lit(2)))) // => 20
Compiler:
def block(s: Code) = if (!s.contains("\n")) s else
"{\n " + s.replace("\n","\n ") + "\n }"
def trans(e: Exp)(implicit env: Map[String,Code]): Code = e match {
case Lit(x) => x.toString
case Prim(op, xs) => transPrim(op, xs.map(trans))
case Var(x) => env(x)
case Let(x,rhs,body) =>
s"val $x = ${block(trans(rhs))}" + NL +
s"${trans(body)(env + (x -> x))}"
}
Note: could have a first-cut compiler with dynamic environment.
4.2 Code Generation
4.2.1 Stack-Based Interpreter and Compiler
Interpreter:
type Val = Int
val memory = new Array[Int](MEM_SIZE)
def eval(exp: Exp, sp: Int)(env: Map[String,Int]): Unit = exp match {
// previous cases
// ...
case Let(n, v, b) =>
eval(v, sp)(env)
eval(b, sp + 1)(env + n -> sp)
memory(sp) = memory(sp + 1)
case Var(n) =>
memory(sp) = memory(env(n))
}
Example:
eval(Let("x", Lit(10), Prim("*", Ref("x"), Lit(2)))) // => 20
Compiler:
def trans(exp: Exp, sp: Int)(env: Map[String,Int]): Unit = exp match {
case Let(n, v, b) =>
trans(v, sp)(env); trans(b, sp + 1)(env + n -> sp)
println(s"memory($sp) = memory(${sp + 1})")
case Var(n) =>
println(s"memory($sp) = memory(${env(n)})")
}
Example:
trans(Let("x", Lit(10), Prim("*", Ref("x"), Lit(2)))) // x = 10; x*2
memory(0) = 10 // Lit(10), Map()
memory(1) = memory(0) // Var("x"), Map("x" -> 0)
memory(2) = 2 // Lit(2), Map("x" -> 0)
memory(1) *= memory(2) // Prim("*", ...), Map("x" -> 0)
memory(0) = memory(1) // Let("x", ...), Map()
4.2.2 Stack-Based Compiler Targeting x86-64 Registers
First straightforward impl:
def trans(e: Exp)(implicit env: Map[String,Int]): Unit = e match {
case Lit(x) => grow(); emitln(s"movq $$$x, ${mem(sp-1)} # ${sp-1}");
case Var(x) => grow(); emitln(s"movq ${mem(env(x))}, ${mem(sp-1)} # ${sp-1}");
case Prim(op,List(x,y)) => trans(x); trans(y); shrink(); transPrim(op)
case Let(x,rhs,body) =>
trans(rhs)
trans(body)(env + (x -> (sp-1)))
shrink()
emitln(s"movq ${mem(sp)}, ${mem(sp-1)} # drop ${sp}")
}
Example:
trans(Let("x", Lit(10), Prim("*", Ref("x"), Lit(2)))) // x = 10; x*2
movq $10, %rbx
movq %rbx, %rcx
movq $2, %rdi
imulq %rdi, %rcx
movq %rcx, %rbx
Works fine. But now let’s consider the interaction with registers and spilling. Example:
val x1 = 1
val x2 = 2
val x3 = 3
val x4 = 4
val x5 = 5
val x6 = 6
val x7 = 7
val x8 = 8
9
With only three available registers this yields:
movq $1, %rax # 0
movq $2, %rdx # 1
movq $3, %rcx # 2
push %rax # evict 0
movq $4, %rax # 3
push %rdx # evict 1
movq $5, %rdx # 4
push %rcx # evict 2
movq $6, %rcx # 5
push %rax # evict 3
movq $7, %rax # 6
push %rdx # evict 4
movq $8, %rdx # 7
push %rcx # evict 5
movq $9, %rcx # 8
movq %rcx, %rdx # drop 8
movq %rdx, %rax # drop 7
pop %rcx # reload 5
movq %rax, %rcx # drop 6
pop %rdx # reload 4
movq %rcx, %rdx # drop 5
pop %rax # reload 3
movq %rdx, %rax # drop 4
pop %rcx # reload 2
movq %rax, %rcx # drop 3
pop %rdx # reload 1
movq %rcx, %rdx # drop 2
pop %rax # reload 0
movq %rdx, %rax # drop 1
Problem is that all the reloading at the end is unnecessary. We’re exiting the scope so the variables can’t be used anymore. Fix: pass along reset information.
def shrink(dst: Int) = {
if (offset != dst) emitln(s"addq $$${(offset-dst)*8}, %rsp # reset stack ${offset} to $dst")
sp = dst; offset = dst
}
// put result in location 'dst' and drop everything to the right of it
def trans1(e: Exp, dst: Int)(implicit env: Map[String,Int]): Unit = e match {
case Let(x,rhs,body) =>
trans(rhs)
trans1(body,dst)(env + (x -> (sp-1)))
case Lit(x) =>
shrink(dst); emitln(s"movq $$$x, ${mem(dst)}")
case Var(x) =>
shrink(dst); if (mem(env(x)) != mem(dst)) emitln(s"movq ${mem(env(x))}, ${mem(dst)}")
case _ =>
trans(e); val src = sp-1
shrink(dst); if (mem(src) != mem(dst)) emitln(s"movq ${mem(src)}, ${mem(dst)} # drop $src to $dst")
}
// put result on the next free stack slot
def trans(e: Exp)(implicit env: Map[String,Int]): Unit = e match {
case Lit(x) => grow(); emitln(s"movq $$$x, ${mem(sp-1)} # ${sp-1}");
case Var(x) => grow(); emitln(s"movq ${mem(env(x))}, ${mem(sp-1)} # ${sp-1}");
case Prim("+",List(x,y)) => trans(x); trans(y); shrink(); transPrim(op)
case Let(x,rhs,body) =>
trans(rhs)
trans1(body,sp-1)(env + (x -> (sp-1)))
}
Now the example val x1 = 1; ...; 9
looks much better. All the reloads are removed and the final result is directly stored in the target register %rax
:
movq $1, %rax # 0
movq $2, %rdx # 1
movq $3, %rcx # 2
push %rax # evict 0
movq $4, %rax # 3
push %rdx # evict 1
movq $5, %rdx # 4
push %rcx # evict 2
movq $6, %rcx # 5
push %rax # evict 3
movq $7, %rax # 6
push %rdx # evict 4
movq $8, %rdx # 7
addq $40, %rsp # reset stack 5 to 0
movq $9, %rax
Note of course that the variables are never used so the code could be further optimized. However this requires somewhat different techniques that we’ll cover at a later point.
4.3 Parsing
Example:
val x = 10;
x*2
Can we parse this expression? Not with the strategy we were using.
Todo: show code?
We need to distinguish between keywords and identifiers. Consider:
valxyx
We need to look at the first four characters to figure out that
this isn’t the keyword val
. Then we need to backtrack. Doable,
but lookahead gets large (length of longest keyword?). Better
to read input only once and make a decision right away whether
this is a keyword or an identifier.
Let’s introduce tokenization. Idea: don’t have parser operate directly on sequence of characters as input but first group characters into a sequence of words.
Some other benefits: easy to skip whitespace, easy to add comments.
4.3.1 Tokenization
What classes of tokens do we have?
- Numbers:
[0-9]+
- Identifiers:
[a-zA-Z][a-zA-Z0-9]*
- Keywords:
val
- Operators:
+
,-
, … - Delimiters:
=
,;
And we ignore whitespace: ' ', '\n', '\r', '\t'
We also want comments (now or later?).
abstract class Token
case object EOF extends Token
case class Number(x: Int) extends Token
case class Ident(x: String) extends Token
case class Keyword(x: String) extends Token
case class Delim(x: Char) extends Token
4.3.1.1 Scanner
class Scanner(in: Reader[Char]) extends Reader[Token] {
// reader interface
var peek = getToken()
def hasNext: Boolean = peek != EOF
def hasNext(f: Token => Boolean) = f(peek)
def next() = try peek finally { peek = getToken() }
// character classes
def isAlpha(c: Char) =
('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
def isDigit(c: Char) = '0' <= c && c <= '9'
def isAlphaNum(c: Char) = isAlpha(c) || isDigit(c)
val isWhiteSpace = Set(' ','\t','\n','\r')
val isOperator = Set('+','-','*','/')
val isDelim = Set('{','}','(',')',';','=')
val isKeyword = Set("val")
// core tokenization logic
def skipWhiteSpace() =
while (in.hasNext(isWhiteSpace)) in.next() // skip white space
def getRawToken(): Token = {
if (in.hasNext(isAlpha)) {
getName()
} else if (in.hasNext(isOperator)) {
getOperator()
} else if (in.hasNext(isDigit)) {
getNum()
} else if (in.hasNext(isDelim)) {
Delim(in.next())
} else if (!in.hasNext) {
EOF
} else
abort(s"Unexpected character: ${in.peek}")
}
def getToken(): Token = {
skipWhiteSpace()
getRawToken()
}
// individual token classes
def getName() = {
val buf = new StringBuilder
while (in.hasNext(isAlphaNum))
buf += in.next()
val s = buf.toString
if (isKeyword(s)) Keyword(s) else Ident(s)
}
def getNum() = ...
def getOperator() = ...
}
4.3.1.2 Parser
class Parser(in: Reader[Token]) {
// convenience over scanner methods
def follows(c: Char) =
if (in.hasNext(_ == Delim(c))) { in.next(); true }
else false
def follows(s: String) =
if (in.hasNext(_ == Keyword(s))) { in.next(); true }
else false
def require(c: Char) =
if (in.hasNext(_ == Delim(c))) in.next()
else expected(s"'$c'")
def require(s: String) =
if (in.hasNext(_ == Keyword(s))) in.next()
else expected(s"'$s'")
// token classes
def isName(x: Token) = x match {
case Ident(x) => true
case _ => false
}
def isNum(x: Token) = x match {
case Number(x) => true
case _ => false
}
def isInfixOp(min: Int)(x: Token) = x match {
case Ident(x) => prec(x) >= min
case _ => false
}
// grammar productions
def name(): String = {
if (!in.hasNext(isName)) expected("Name")
val Ident(x) = in.next(); x
}
// e.g.: 45 | x | (...) | {...}
def atom(): Exp = {
in.peek match {
case Number(x) => in.next(); Lit(x)
case Ident(x) => in.next(); Var(x)
case Delim('(') => in.next(); val x = simpl(); require(')'); x
case Delim('{') => in.next(); val x = expr(); require('}'); x
case _ => expected("Atom"); EmptyExp
}
}
// e.g.: x + 4 * 7
def simpl(): Exp = binop(0)
def binop(min: Int): Exp = {
var res = atom()
while (in.hasNext(isInfixOp(min))) {
val op = name()
val nextMin = prec(op) + assoc(op) // + 1 for left assoc
val rhs = binop(nextMin)
res = Prim(op, res, rhs)
}
res
}
// val x: T = e; e | e
def expr(): Exp = {
in.peek match {
case Keyword("val") =>
require("val")
val nm = name()
require('=')
val rhs = simpl()
require(';')
val body = expr()
Let(nm,tpe,rhs,body)
case _ =>
simpl()
}
}
def parse() = {
val s = expr();
if (in.hasNext) expected("End of file"); // eof
s
}
}
4.4 Error Cases
Refer to a variable that isn’t defined. Interpreter will error at runtime. Compiler will currently crash during codegen. See next chapter.
4.5 Where are we?
We added (immutable) variables to our language and we added tokenization to our parser, so that it will process the input source as sequences of words instead of characters. This was necessary to support keywords like “val” without backtracking.