Administrative info
Instructor
Name: Vana Doufexi
E-mail: vdoufexi@cs.northwestern.edu
Office: Ford Building, #2-229
Hours: E-mail to set up appointment
Teaching Assistant
TBA
What is a compiler
A program that reads a program written in some
language and translates it into a program written in
some other language
Modula-2 to C
Java to bytecodes
COOL to MIPS code
How was the first compiler created?
Why study compilers?
Application of a wide range of theoretical techniques
Data Structures
Theory of Computation
Algorithms
Computer Architecture
Good software engineering experience
Better understanding of programming languages
Features of compilers
Correctness
preserve the meaning of the code
Speed of target code
Speed of compilation
Good error reporting/handling
Cooperation with the debugger
Support for separate compilation
Compiler structure
Use an intermediate representation (IR)
Why?
[Figure: source code -> Front End -> IR -> Back End -> target code]
Compiler Structure
Front end
Recognize legal/illegal programs
report/handle errors
Generate IR
The process can be automated
Back end
Translate IR into target code
instruction selection
register allocation
instruction scheduling
many of these problems are NP-complete, so use approximations (heuristics)
Compiler Structure
Optimization
goals
improve running time of generated code
improve space, power consumption, etc.
how?
perform a number of transformations on the IR
multiple passes
important: preserve meaning of code
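As an illustration of one such transformation, here is a minimal sketch of constant folding over a single block of straight-line code. The tuple layout (dest, op, left, right), the "const" marker, and the name fold_constants are assumptions made for this example, not a format from the course.

```python
import operator

# Assumed IR: each instruction is (dest, op, left, right),
# where operands are either int constants or variable names (str).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Fold operations whose operands are both known constants.

    Only valid for straight-line code (no jumps), which is an
    assumption of this sketch.
    """
    known = {}  # names currently known to hold a constant value
    out = []
    for dest, op, left, right in code:
        # Replace operands by their constant value when it is known.
        left = known.get(left, left)
        right = known.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            known[dest] = OPS[op](left, right)
            out.append((dest, "const", known[dest], None))
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            out.append((dest, op, left, right))
    return out

before = [("t1", "*", 2, 3), ("t2", "+", "x", "t1")]
print(fold_constants(before))
# [('t1', 'const', 6, None), ('t2', '+', 'x', 6)]
```

The folded code computes the same value for t2 as the original, which is the "preserve meaning" requirement in miniature.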
The Front End
Scanning (a.k.a. lexical analysis)
recognize "words" (tokens)
Parsing (a.k.a. syntax analysis)
check syntax
Semantic analysis
examine meaning (e.g. type checking)
Other issues:
symbol table (to keep track of identifiers)
error detection/reporting/recovery
The Scanner
Its job:
given a character stream, recognize words (tokens)
e.g. x = 1 becomes IDENTIFIER EQUAL INTEGER
collect identifier information
e.g. IDENTIFIER corresponds to a lexeme (the actual
word x) and its type (acquired from the declaration of x).
ignore white space and comments
report errors
Good news
the process can be automated
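The scanner's job can be sketched by hand in a few lines. The token names below follow the x = 1 example; the patterns, the SKIP and ERROR entries, and the function name scan are assumptions for illustration, and comment handling and type lookup are left out.

```python
import re

# Token names follow the slide's example; the patterns are assumptions.
TOKEN_SPEC = [
    ("INTEGER",    r"\d+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("EQUAL",      r"="),
    ("SKIP",       r"\s+"),   # white space is recognized, then dropped
    ("ERROR",      r"."),     # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    """Return a list of (token_kind, lexeme) pairs for text."""
    tokens = []
    for m in MASTER.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(
                f"unexpected character {lexeme!r} at position {m.start()}")
        tokens.append((kind, lexeme))
    return tokens

print(scan("x = 1"))
# [('IDENTIFIER', 'x'), ('EQUAL', '='), ('INTEGER', '1')]
```

Each token keeps its lexeme, so a later phase can still tell which identifier or which integer was seen.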
The Parser
Its job:
Check and verify syntax based on specified syntax rules
e.g. IDENTIFIER LPAREN RPAREN make up an
EXPRESSION.
Coming soon: how context-free grammars specify syntax
Report errors
Build IR
often a syntax tree
Good news
the process can be automated
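A hand-written recursive-descent parser gives a feel for what the automated tools produce. The grammar below is an assumption made up for this sketch (expr -> term (PLUS term)*, term -> INTEGER | IDENTIFIER | IDENTIFIER LPAREN RPAREN), as are the PLUS token, the tuple-based tree, and the name parse_expression.

```python
def parse_expression(tokens):
    """Parse a list of (kind, lexeme) tokens into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        # Kind of the next token, or "EOF" when the input is exhausted.
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()} at token {pos}")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    def term():
        if peek() == "INTEGER":
            return ("int", int(expect("INTEGER")))
        name = expect("IDENTIFIER")
        if peek() == "LPAREN":      # IDENTIFIER LPAREN RPAREN
            expect("LPAREN")
            expect("RPAREN")
            return ("call", name)
        return ("var", name)

    def expr():
        node = term()
        while peek() == "PLUS":
            expect("PLUS")
            node = ("add", node, term())
        return node

    tree = expr()
    if peek() != "EOF":
        raise SyntaxError(f"unexpected {peek()} at token {pos}")
    return tree

tokens = [("IDENTIFIER", "f"), ("LPAREN", "("), ("RPAREN", ")"),
          ("PLUS", "+"), ("INTEGER", "1")]
print(parse_expression(tokens))
# ('add', ('call', 'f'), ('int', 1))
```

The returned tuple plays the role of the syntax tree: syntax is checked and the IR is built in the same pass.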
Semantic analysis
Its job:
Check the meaning of the program
e.g. In x=y, is y defined before being used? Are x and y
declared?
e.g. In x=y, are the types of x and y such that you can
assign one to the other?
Meaning may depend on context
Report errors
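The two x=y checks can be sketched against a symbol table. The table layout (name -> type name), the assignability rule (same type, or int widened to float), and the name check_assignment are assumptions chosen for this example; real languages define their own rules.

```python
# Assumed rule: (target type, source type) pairs that may be assigned.
ALLOWED = {("int", "int"), ("float", "float"), ("float", "int")}

def check_assignment(target, source, symbols):
    """Return a list of error messages for the assignment target = source.

    symbols maps each declared name to its type name.
    """
    errors = []
    for name in (target, source):
        if name not in symbols:
            errors.append(f"'{name}' is not declared")
    if errors:
        return errors  # types cannot be compared without declarations
    if (symbols[target], symbols[source]) not in ALLOWED:
        errors.append(
            f"cannot assign {symbols[source]} '{source}' "
            f"to {symbols[target]} '{target}'")
    return errors

symbols = {"x": "float", "y": "int"}
print(check_assignment("x", "y", symbols))  # []
print(check_assignment("y", "x", symbols))
print(check_assignment("x", "z", symbols))
```

The same statement x=y is accepted or rejected depending on the declarations in scope, which is what "meaning may depend on context" refers to.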
IRs
Graphical
e.g. parse tree, DAG
Linear
e.g. three-address code
Hybrid
e.g. linear for blocks of straight-line code, a graph to
connect blocks
Low-level or high-level
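To connect the graphical and linear forms, here is a minimal sketch that flattens an expression tree into three-address code. The tuple tree (op, left, right), the temporary names t1, t2, ..., and the function name to_three_address are assumptions for illustration.

```python
import itertools

def to_three_address(tree):
    """Return (code, result): a list of three-address instructions and
    the name that holds the value of the whole expression."""
    code = []
    temps = (f"t{i}" for i in itertools.count(1))  # fresh temporaries

    def walk(node):
        if not isinstance(node, tuple):  # leaf: variable name or constant
            return str(node)
        op, left, right = node
        l = walk(left)
        r = walk(right)
        t = next(temps)
        code.append(f"{t} = {l} {op} {r}")  # at most one operator per line
        return t

    result = walk(tree)
    return code, result

# Tree for a + b * 2
code, result = to_three_address(("+", "a", ("*", "b", 2)))
print("\n".join(code))
# t1 = b * 2
# t2 = a + t1
```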
The scanning process
Main goal: recognize words
How? by recognizing patterns
e.g. an identifier is a sequence of letters or digits that starts
with a letter.
Lexical patterns form a regular language
Regular languages are described using regular
expressions (REs)
Can we create an automatic RE recognizer?
Yes! (Hold that thought)
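The identifier pattern above can be written directly as a regular expression. This sketch uses Python's re module and assumes ASCII letters and digits only; the name is_identifier is made up for the example.

```python
import re

# A letter, followed by any number of letters or digits (ASCII assumed).
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_identifier(word):
    """True if the whole word matches the identifier pattern."""
    return IDENTIFIER.fullmatch(word) is not None

print(is_identifier("x1"))  # True
print(is_identifier("1x"))  # False: starts with a digit
```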
The scanning process
Definition: Regular expressions (over alphabet Σ)
ε is an RE denoting {ε}
If a ∈ Σ, then a is an RE denoting {a}
If r and s are REs, then
(r) is an RE denoting L(r)
r|s is an RE denoting L(r) ∪ L(s)
rs is an RE denoting L(r)L(s)
r* is an RE denoting the Kleene closure of L(r)
Property: regular languages are closed under many operations (e.g. union, concatenation, Kleene closure)
This allows us to build complex REs.
The scanning process
Definition: Deterministic Finite Automaton
a five-tuple (Σ, S, δ, s0, F) where
Σ is the alphabet
S is the set of states
δ is the transition function (S × Σ → S)
s0 is the starting state
F is the set of final states (F ⊆ S)
Notation:
Use a transition diagram to describe a DFA
DFAs are equivalent to REs
Hey! We just came up with a recognizer!
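The five-tuple maps almost directly onto code. The sketch below encodes a DFA for the identifier pattern (a letter followed by letters or digits); the state names and the function name accepts are assumptions, and the transition function is stored as a partial map where a missing entry stands for an implicit rejecting state.

```python
import string

SIGMA = set(string.ascii_letters + string.digits)  # the alphabet
STATES = {"start", "ident"}                         # S
START = "start"                                     # s0
FINAL = {"ident"}                                   # F, a subset of S

# Transition function S x Sigma -> S as a dictionary.
# Missing (state, symbol) pairs mean "no move", i.e. reject.
DELTA = {}
for ch in string.ascii_letters:
    DELTA[("start", ch)] = "ident"
for ch in SIGMA:
    DELTA[("ident", ch)] = "ident"

def accepts(word):
    """Run the DFA on word and report whether it ends in a final state."""
    state = START
    for ch in word:
        state = DELTA.get((state, ch))
        if state is None:
            return False
    return state in FINAL

print(accepts("x1"))  # True
print(accepts("1x"))  # False
```

Recognition costs one table lookup per input character, which is why DFAs make fast scanners.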
The scanning process
Goal: automate the process
Idea:
Start with an RE
Build a DFA
How?
We can build a non-deterministic finite automaton
(Thompson's construction)
Convert that to a deterministic one
(Subset construction)
Minimize the DFA
(Hopcroft's algorithm)
Implement it
Existing scanner generator: flex
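The middle step of this pipeline, subset construction, can be sketched as follows. The NFA here is written by hand for the RE a*b (in the spirit of what Thompson's construction would produce, but simplified); its dictionary encoding, with "" standing for an ε-move, and all function names are assumptions for this example. Minimization is not shown.

```python
def epsilon_closure(nfa, states):
    """All NFA states reachable from `states` using only epsilon moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get("", ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(nfa, start, finals, alphabet):
    """Build a DFA whose states are sets of NFA states."""
    d_start = epsilon_closure(nfa, {start})
    delta, worklist, seen = {}, [d_start], {d_start}
    while worklist:
        current = worklist.pop()
        for sym in alphabet:
            moved = {t for s in current for t in nfa.get(s, {}).get(sym, ())}
            if not moved:
                continue  # no transition on sym: implicit reject
            target = epsilon_closure(nfa, moved)
            delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    # A DFA state is final if it contains any final NFA state.
    d_finals = {d for d in seen if d & finals}
    return d_start, delta, d_finals

def dfa_accepts(dfa, word):
    state, delta, finals = dfa
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in finals

# Hand-built NFA for a*b; "" labels an epsilon move.
NFA = {
    0: {"": {1, 3}},
    1: {"a": {2}},
    2: {"": {1, 3}},
    3: {"b": {4}},
}
DFA = subset_construction(NFA, start=0, finals={4}, alphabet="ab")
print(dfa_accepts(DFA, "aab"))  # True
print(dfa_accepts(DFA, "ba"))   # False
```

A generator such as flex automates this whole chain from a specification file, so in practice these steps are rarely written by hand.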