Chapter 9 Type Checking/Inference
Where is this going: remove syntactic distinction between boolean expressions and normal integer expressions. Remove clunkiness around unit expressions.
Syntactically everything will be an expression and the type system will distinguish expressions based on their types.
9.1 Types
9.1.1 About Types
What is a type?
- A set of values: e.g. true, false
- A set of operations on those values: e.g. !, &&, …
Why do we need types?
- Help to structure and understand a program
- Can prevent some kinds of errors
- Generate more efficient code (e.g. no storage for Unit)
9.1.2 Our Grammar, Typed
<op> ::= ['*' | '/' | '+' | '-' | '<' | '>' | '=' | '!']+
<type> ::= <ident>
<bool> ::= 'true' | 'false'
<atom> ::= <number> | <bool> | '()'
| '('<simp>')'
| <ident>
| '{'<exp>'}'
<uatom> ::= [<op>]<atom>
<simp> ::= <uatom>[<op><uatom>]*
| 'if' '('<simp>')' <simp> ['else' <simp>]
| 'while' '(' <simp> ')' <simp>
| <ident> '=' <simp>
<exp> ::= <simp>[';'<exp>]
| 'val' <ident> '=' <simp>';'<exp>
| 'var' <ident> '=' <simp>';'<exp>
9.1.3 Type Checking
The type checking step will be part of the semantic analyzer.
The key point to understand is that types represent an abstract value, and inference rules define the set of operations on these values.
Therefore, the implementation is going to be very similar to eval, trans, or analyze.
9.2 Formal Type System
9.2.1 Inference Rules
Before we look at the implementation, let’s first describe our type system in formal notation. Just as BNF is a formalism to describe syntax, type systems are typically described through systems of inference rules. Where BNF describes legal derivations of programs from grammer rules, a system of inference rules describes legal derivations of typing judgements:
Env |- e: T
Means that in the environment ‘Env’, the expression ‘e’ is of type ‘T’
This is a statement that can be proved by building a proof tree out of inference rules that formally specify the type sytem.
Env is the typing environment: it is a mapping between identifier and type.
Env,id:T is the environment that contains all information in ‘Env’ plus the mapping from ‘id’ to ‘T’
Note: Env is often represented with \(\Gamma\).
conditions
------------ [Name of the rule]
conclusion
- Lit: ‘i’ is an Int, ‘b’ is a Boolean
------------------ [Int] --------------------- [Boolean]
Env |- Lit(i): Int Env |- Lit(b): Boolean
------------------- [Unit]
Env |- Lit(()): Unit
- Unary: op is in [“+”, “-”]
Env |- e: Int
------------------------ [IntUnOp]
Env |- Unary(op, e): Int
- Prim:
- op is in [“+”, “-”, “*“,”/“]
- bop is in [“==”, “!=”, “<=”, “>=”, “<”, “>”]
Env |- e1: Int Env |- e2: Int
--------------------------------------- [IntOp]
Env |- Prim(op, e1, e2): Int
Env |- e1: Int Env |- e2: Int
----------------------------------------- [BoolOp]
Env |- Prim(bop, e1, e2): Boolean
- Immutable variables
Env |- e1: T1 Env,x:T1 |- e2: T2
-------------------------------------------- [Let]
Env |- Let(x, e1, e2): T2
Env(x) = T
------------------- [Ref]
Env |- Ref(x) : T
9.2.2 Example
Prove that the following program is of type Boolean
val x = 3; x == 4
Let(x, Int, Lit(3), Prim("==", Ref(x), Lit(4)))
Prove that the following program is of type Boolean
val x = 3; x == 4
|- Let(x, Int, Lit(3), Prim("==", Ref(x), Lit(4))): Boolean
Prove that the following program is of type Boolean
val x = 3; x == 4
-----------------------------------------------------------------[Let]
|- Let(x, Int, Lit(3), Prim("==", Ref(x), Lit(4))): Boolean
Prove that the following program is of type Boolean
val x = 3; x == 4
|- Lit(3): Int x:Int |- Prim("==", Ref(x), Lit(4)): Boolean
-----------------------------------------------------------------[Let]
|- Let(x, Int, Lit(3), Prim("==", Ref(x), Lit(4))): Boolean
Prove that the following program is of type Boolean
val x = 3; x == 4
------------------[Int]
|- Lit(3): Int x:Int |- Prim("==", Ref(x), Lit(4)): Boolean
-----------------------------------------------------------------[Let]
|- Let(x, Int, Lit(3), Prim("==", Ref(x), Lit(4))): Boolean
Prove that the following program is of type Boolean
val x = 3; x == 4
--------------------------------------------------[BoolOp]
|- Lit(3): Int x:Int |- Prim("==", Ref(x), Lit(4)): Boolean
-----------------------------------------------------------------[Let]
|- Let(x, Int, Lit(3), Prim("==", Ref(x), Lit(4))): Boolean
Prove that the following program is of type Boolean
val x = 3; x == 4
x:Int |- Ref(x): Int x:Int |- Lit(4) : Int
--------------------------------------------------[BoolOp]
|- Lit(3): Int x:Int |- Prim("==", Ref(x), Lit(4)): Boolean
-----------------------------------------------------------------[Let]
|- Let(x, Int, Lit(3), Prim("==", Ref(x), Lit(4))): Boolean
Prove that the following program is of type Boolean
val x = 3; x == 4
(x: Int)(x) = Int
----------------------[Ref]
x:Int |- Ref(x): Int x:Int |- Lit(4) : Int
--------------------------------------------------[BoolOp]
|- Lit(3): Int x:Int |- Prim("==", Ref(x), Lit(4)): Boolean
-----------------------------------------------------------------[Let]
|- Let(x, Int, Lit(3), Prim("==", Ref(x), Lit(4))): Boolean
Prove that the following program is of type Boolean
val x = 3; x == 4
(x: Int)(x) = Int
----------------------[Ref]-----------------------[Int]
x:Int |- Ref(x): Int x:Int |- Lit(4) : Int
--------------------------------------------------[BoolOp]
|- Lit(3): Int x:Int |- Prim("==", Ref(x), Lit(4)): Boolean
-----------------------------------------------------------------[Let]
|- Let(x, Int, Lit(3), Prim("==", Ref(x), Lit(4))): Boolean
There are no more obligations to prove!! That means our initial statement was true.
9.2.3 Inference Rules (cont’d)
- if:
Env |- c1: Boolean Env |- e1: T Env |- e2: T
-------------------------------------------------------------- [If]
Env |- If(c1, e1, e2): T
- Mutable variables
Env |- e1: T1 Env,x:T1 |- e2: T2
------------------------------------------- [VarDec]
Env |- VarDec(x, T1, e1, e2): T2
Env(x) = T1 Env |- e1: T1
------------------------------------ [VarAssign]
Env |- VarAssign(x, e1): T1
- while
Env |- c1: Boolean Env |- e1: Unit
-------------------------------------------- [While]
Env |- While(c1, e1): Unit
9.3 Type Checking and Type Inference
The type checking/inference step will be part of the semantic analyzer.
The key point to understand is that types represent an abstract value, and inference rules define the set of operations on these values.
Therefore, the implementation is going to be very similar to eval or analyze.
9.3.1 Syntax of Types
abstract class Type
case class BaseType(tp: String) extends Type
val IntType = BaseType("Int")
val BoolType = BaseType("Boolean")
val UnitType = BaseType("Unit")
9.3.2 A First Typer
The type checker has to resolve the type of each node:
def typeInfer(exp: Exp)(env: Map[String,Type]): Type = exp {
case Lit(x: Int) =>
IntType
case Prim("+", List(a,b)) =>
if (typeInfer(a) != IntType) error("expected Int!")
if (typeInfer(b) != IntType) error("expected Int!")
IntType
case Let(x, rhs, body) =>
val t1 = typeInfer(rhs)
typeInfer(body)(env + (x -> t1))
case If(c, thenp, elsep) =>
if (typeInfer(c) != BooleanType) error("expected Boolean!")
val t1 = typeInfer(thenp)
val t2 = typeInfer(elsep)
if (t1 != t2) error("if/else result types don't match!")
t1
}
The function typeInfer
is very similar to an interpreter, but
it operates not over concrete values but over abstract values.
Instead of computing 3+4 = 7, it computes Int+Int = Int.
And instead of computing 4+true = error it computes
Int+Boolean = error.
9.3.3 Separating Checking and Inference
How can we improve the code? There is clearly a lot of duplication that can be factored out. Let’s introduce another function, typeCheck, to take care of checking types and producing errors:
def typeCheck(exp: Exp, pt: Type)(env: Env): Unit = {
val tpe = typeInfer(exp, pt)(env)
if (tpe != pt) error(s"Expected type $pt but got $tpe", exp.pos)
}
Now we can refactor typeInfer as follows:
def typeInfer(exp: Exp)(env: Map[String,Type]): Type = exp match {
case Prim("+", List(a,b)) =>
typeCheck(a, IntType); typeCheck(b, IntType); IntType
case Let(x, rhs, body) =>
typeInfer(body)(env + (x -> typeInfer(rhs)))
case If(c, thenp, elsep) =>
typeCheck(c, BooleanType)
val t1 = typeInfer(thenp); typeCheck(elsep,t1); t1
}
9.3.4 Type-Annotating Trees
What we have so far gets the job done of checking that a program doesn’t have any type errors, but we immediately discard the computed type information.
However, type information is often useful for later phases, e.g. for the code generator to decide on how much space needs to be reserved for a given value.
We can keep track of this by adding a field in the Exp class
trait Typed {
var tpe: Type = UnknownType
def withType(pt: Type): this.type = { tpe = pt; this }
}
abstract class Exp extends Typed
and modifying typeInfer to set this for each Exp visited.
def typeInfer(exp: Exp)(env: Map[String,Type]): Type = {
val tpe = exp match { ... }
exp.withTpe(tpe); tpe
}
This way, the computed type information is available for later phases to pick up.
9.3.5 Type-Driven Transformation
It is often the case that type mismatches can be fixed by inserting certain conversions. For example, an expression e of any type can be converted to Unit by wrapping it in a Let
val dummy = e; ()
In order to do this, we need to change typeInfer to return a new Exp (with the correct type assigned) instead of only a type.
def typeInfer(exp: Exp)(env: Map[String,Type]): Exp = ...
def typeCheck(exp: Exp, pt: Type)(env: Env): Exp = {
val exp1 = typeInfer(exp, pt)(env)
if (exp1.tpe == pt) exp1 else {
if (pt == UnitType) {
val dummy = freshDummyVar()
Let(dummy, exp1, Lit(())).withType(UnitType)
} else error(s"Expected type $pt but got $tpe", exp.pos)
}
}
The body of typeInfer changes as follows:
def typeInfer(exp: Exp)(env: Map[String,Type]): Exp = exp match {
case Prim("+", List(a,b)) =>
Prim("+", List(typeCheck(a, IntType), typeCheck(b, IntType))).withType(IntType)
case Let(x, rhs, body) =>
val rhs1 = typeInfer(rhs)
val body1 = typeInfer(body)(env + (x -> rhs1.tpe))
Let(x, rhs1, body1).withType(body1.tpe)
case If(c, thenp, elsep) =>
val c1 = typeCheck(c, BooleanType)
val elsep1 = typeInfer(elsep);
val thenp1 = typeCheck(thenp, elsep.tpe);
If(c1, thenp1, elsep1).withType(elsep1.tpe)
}
The last case (for if/else) demands some explanation.
Do now: why don’t we infer the type of the then branch and then check the else part against it?
This has to do with optional else. The parser will desugar an if without else by inserting a Unit literal in the else branch. For an if with missing else, converting a Unit literal to whatever other type the then branch has would be doomed to fail. But the opposite is always possible: we really want the then branch force-converted to Unit.
9.3.6 The Unknown Type as Sentinel
Note: this becomes relevant really only once we have functions or structured data
We need to define what “T1 conforms to T2” means.
T1 conforms to T2 if:
- T1 == T2, or
- T2 is UnknownType
Code update:
// Infer now also takes expected type as suggestion
def typeInfer(exp: Tree, pt: Type): Tree
// Check if 'tp' is well-formed. For now that means that 'tp'
// is not unknown
def typeWellFormed(tp: Type): Type
// Check if 'tp' conforms to 'pt' and return the more precise Type
// The returned type should also be well-formed
def typeConforms(tp: Type, tp: Type): Type
def typeCheck(exp: Tree, pt: Type): Tree = {
// First infer
val nexp = typeInfer(exp, pt)(env)
val rtp = typeConforms(nexp.tp, pt)(env)
nexp.withType(rtp)
}
9.4 Wrap-Up
9.4.1 Interpretation With Types
Let’s update our definitional interpreter to include types.
Previously we had
type Val = Int
Now we need a more generic definition. Either
type Val = Any
or something like
abstract class Val
case class IntVal(x: Int) extends Val
case class BoolVal(x: Boolean) extends Val
case object UnitVal extends Val
if we want to be more specific.
9.4.2 New Error Cases
Either way, the implementation will now contain type casts that may fail at runtime:
def eval(exp)(env: Env): Val = exp match {
case Lit(i: Int) => IntVal(i)
case Prim(op, l, r) =>
evalPrim(op)(eval(l)(env), eval(r)(env))
// ...
}
def evalPrim(op: String)(l: Val, r: Val) = (op, l, r) match {
case ("+" , IntVal(x), IntVal(y)) => IntVal(x + y)
case ("==", IntVal(x), IntVal(y)) => BoolVal(x == y)
// ...
// runtime error if "+" used with Int and Bool
}
These potential error cases didn’t exist before.
9.4.3 Type Soundness
We would like to make sure that our type checker actually prevents these newly introduced error cases. We could formally prove this based on the typing judgements, if we want to go all the way but at least we should convince ourselves informally by inspecting each case, such as:
def eval(exp: String)(env: Map[String,Type]) = exp match {
case Prim("+" , List(IntVal(x), IntVal(y))) => IntVal(x + y)
...
}
def typeInfer(exp: Exp)(env: Map[String,Type]): Exp = exp match {
case Prim("+", List(a,b)) =>
Prim("+", List(typeCheck(a, IntType), typeCheck(b, IntType))).withType(IntType)
...
}
Here, it is easy to see that all “well-typed programs don’t go wrong”, i.e., that all error cases in the interpreter map to error cases in the type checker.
Exercise: does the converse also hold? In this case? In general?
Of course for more complicated type systems and languages things aren’t quite so easy (see work on formalizing Scala’s type system in the DOT calculus).
9.4.4 Compilation With Types
Assembly code does not have types. We need to make an “implementation” decision on how to represent the new types.
- Boolean: 0 or 1, but we still use the full register
- Unit: There are no operations using Unit, we don’t need to store it
Implementation of the operators:
val x = 1 == 4;
We could use jumps: one branch sets 0, the other sets 1.
but X86 offers us a shortcut:
set<op> %al // set %al if flags validate <op>
// like jump, there are: sete, setne, setl, etc.
movbq %al, %rax // transform the byte into the full register
We also have to modify our compilation for the If statements.
def trans(exp: Tree, sp: Int)(env: Env) = exp match {
case If(cond, tBranch, eBranch) =>
trans(cond, sp)(env) // now sp will contain 0 or 1
transJumpIfTrue(sp)("if")
// ...
What code would transJumpIfTrue generates?
cmp $0, ${regs(sp)}
jne $label
test ${regs(sp)}, ${regs(sp)} # used by gcc
jnz $label
‘test S, T’ sets the flags accordingly to S & T. So if ‘sp’ contains 1: 1 & 1 != 0 so we jump (jnz). If ‘sp’ contains 0: 0 & 0 == 0 so we don’t jump.
9.5 Where Are We?
We looked into the formalization of type checking in our language. We discussed the implementation of the type checker.