Compiler Construction Chapter number 1 slide

1
Compiler Construction
Vana Doufexi

2
Administrative info
 Instructor
 Name: Vana Doufexi
 E-mail: vdoufexi@cs.northwestern.edu
 Office: Ford Building, #2-229
 Hours: E-mail to set up appointment
 Teaching Assistant
 TBA

3
Administrative info
 Course webpage
 http://www.cs.northwestern.edu/academics/courses/322
 contains:
 news
 staff information
 lecture notes & other handouts
 homeworks & manuals
 policies, grades
 newsgroup info
 useful links
 Newsgroup
 Name: cs.322
 nntp: news.cs.northwestern.edu

4
What is a compiler
 A program that reads a program written in some
language and translates it into a program written
in some other language
 Modula-2 to C
 Java to bytecodes
 COOL to MIPS code
 How was the first compiler created?

5
Why study compilers?
 Application of a wide range of theoretical
techniques
 Data Structures
 Theory of Computation
 Algorithms
 Computer Architecture
 Good software engineering experience
 Better understanding of programming languages

6
Features of compilers
 Correctness
 preserve the meaning of the code
 Speed of target code
 Speed of compilation
 Good error reporting/handling
 Cooperation with the debugger
 Support for separate compilation

7
Compiler structure
 Use intermediate representation
 Why?
[Diagram: source code -> Front End -> IR -> Back End -> target code]

8
Compiler Structure
 Front end
 Recognize legal/illegal programs
 report/handle errors
 Generate IR
 The process can be automated
 Back end
 Translate IR into target code
 instruction selection
 register allocation
 instruction scheduling
 many of these problems are NP-complete, so use approximations

9
Compiler Structure
 Optimization
 goals
 improve running time of generated code
 improve space, power consumption, etc.
 how?
 perform a number of transformations on the IR
 multiple passes
 important: preserve meaning of code
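
As an illustration of a meaning-preserving transformation, here is a minimal sketch of constant folding. It assumes a hypothetical IR made of nested tuples (op, left, right) with integers and variable names as leaves; the slides do not prescribe this representation.

```python
def fold_constants(node):
    """Replace constant subexpressions with their value (one pass over the tree)."""
    if not isinstance(node, tuple):
        return node                       # leaf: integer constant or variable name
    op, left, right = node
    left = fold_constants(left)
    right = fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        if op == "+":
            return left + right
        if op == "*":
            return left * right
    return (op, left, right)              # cannot fold: keep the (simplified) node


# x + 2 * 3 has the same meaning as x + 6
print(fold_constants(("+", "x", ("*", 2, 3))))   # ('+', 'x', 6)
```

The transformed tree computes the same value for every x, which is the property an optimization pass must keep.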

10
The Front End
 Scanning (a.k.a. lexical analysis)
 recognize "words" (tokens)
 Parsing (a.k.a. syntax analysis)
 check syntax
 Semantic analysis
 examine meaning (e.g. type checking)
 Other issues:
 symbol table (to keep track of identifiers)
 error detection/reporting/recovery

11
The Scanner
 Its job:
 given a character stream, recognize words (tokens)
 e.g. x = 1 becomes IDENTIFIER EQUAL INTEGER
 collect identifier information
 e.g. IDENTIFIER corresponds to a lexeme (the actual
word x) and its type (acquired from the declaration of
x).
 ignore white space and comments
 report errors
 Good news
 the process can be automated
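
To make the scanner's job concrete, here is a minimal hand-written sketch that turns x = 1 into the token sequence above. The function name scan and the (token, lexeme) pair layout are assumptions of this example; comments and type information are left out.

```python
def scan(text):
    """Turn a character stream into a list of (token, lexeme) pairs."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():                  # ignore white space
            i += 1
        elif ch.isalpha():                # identifier: a letter, then letters or digits
            start = i
            while i < len(text) and text[i].isalnum():
                i += 1
            tokens.append(("IDENTIFIER", text[start:i]))
        elif ch.isdigit():                # integer literal: one or more digits
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append(("INTEGER", text[start:i]))
        elif ch == "=":
            tokens.append(("EQUAL", ch))
            i += 1
        else:                             # report errors
            raise SyntaxError(f"unexpected character {ch!r} at position {i}")
    return tokens


print(scan("x = 1"))
# [('IDENTIFIER', 'x'), ('EQUAL', '='), ('INTEGER', '1')]
```

Each token keeps its lexeme, so later phases can still tell which identifier or which number was read.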

12
The Parser
 Its job:
 Check and verify syntax based on specified syntax rules
 e.g. IDENTIFIER LPAREN RPAREN make up an
EXPRESSION.
 Coming soon: how context-free grammars specify
syntax
 Report errors
 Build IR
 often a syntax tree
 Good news
 the process can be automated
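
The following sketch shows one way a parser can check syntax and build a syntax tree at the same time. It is a small recursive-descent parser for a hypothetical two-rule grammar chosen for this example (the slides do not give a grammar): statement -> IDENTIFIER EQUAL expression, and expression -> INTEGER | IDENTIFIER | IDENTIFIER LPAREN RPAREN.

```python
def parse_statement(tokens):
    """Parse: statement -> IDENTIFIER EQUAL expression. Returns a syntax tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:                # report errors
            found = peek() or "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    def expression():
        # expression -> INTEGER | IDENTIFIER | IDENTIFIER LPAREN RPAREN
        if peek() == "INTEGER":
            return ("int", expect("INTEGER"))
        name = expect("IDENTIFIER")
        if peek() == "LPAREN":
            expect("LPAREN")
            expect("RPAREN")
            return ("call", name)
        return ("var", name)

    target = expect("IDENTIFIER")
    expect("EQUAL")
    value = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][0]}")
    return ("assign", target, value)


tokens = [("IDENTIFIER", "x"), ("EQUAL", "="), ("IDENTIFIER", "f"),
          ("LPAREN", "("), ("RPAREN", ")")]
print(parse_statement(tokens))   # ('assign', 'x', ('call', 'f'))
```

A parser generator produces code with the same effect from the grammar alone, which is why the process can be automated.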

13
Semantic analysis
 Its job:
 Check the meaning of the program
 e.g. In x=y, is y defined before being used? Are x and
y declared?
 e.g. In x=y, are the types of x and y such that you can
assign one to the other?
 Meaning may depend on context
 Report errors
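
The checks for x=y can be sketched with a symbol table. Everything below is an assumption made for illustration: the table layout (name -> type and a "defined" flag) and the compatibility rule (same type, or int assigned to float) are not taken from any particular language.

```python
def check_assignment(target, source, symbols):
    """Return a list of error messages for the assignment target = source."""
    errors = []
    for name in (target, source):
        if name not in symbols:           # are x and y declared?
            errors.append(f"{name} is not declared")
    if errors:
        return errors
    if not symbols[source]["defined"]:    # is y defined before being used?
        errors.append(f"{source} is used before being defined")
    # Hypothetical rule: types must match, except that int may widen to float.
    t, s = symbols[target]["type"], symbols[source]["type"]
    if t != s and not (t == "float" and s == "int"):
        errors.append(f"cannot assign {s} to {t}")
    return errors


symbols = {
    "x": {"type": "float", "defined": False},
    "y": {"type": "int", "defined": True},
}
print(check_assignment("x", "y", symbols))   # []
print(check_assignment("y", "x", symbols))
# ['x is used before being defined', 'cannot assign float to int']
```

None of these errors can be detected by the parser, because both assignments are syntactically identical.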

14
IRs
 Graphical
 e.g. parse tree, DAG
 Linear
 e.g. three-address code
 Hybrid
 e.g. linear for blocks of straight-line code, a graph to
connect blocks
 Low-level or high-level
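
To show how a graphical IR relates to a linear one, this sketch walks an expression tree and emits three-address code. The tuple-based tree and the temporary names t1, t2, ... are conventions assumed for the example.

```python
def to_three_address(tree):
    """Translate an expression tree into a list of three-address instructions."""
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, str):         # leaf: a variable name
            return node
        op, left, right = node
        l = emit(left)
        r = emit(right)
        counter += 1
        temp = f"t{counter}"              # fresh temporary holds this node's value
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    result = emit(tree)
    return code, result


# Tree for a + b * c
code, result = to_three_address(("+", "a", ("*", "b", "c")))
print("\n".join(code))
# t1 = b * c
# t2 = a + t1
```

Every instruction has at most one operator and three names, which is what makes the form easy to reorder and to map onto machine instructions.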

15
The scanning process
 Main goal: recognize words
 How? by recognizing patterns
 e.g. an identifier is a sequence of letters or digits that
starts with a letter.
 Lexical patterns form a regular language
 Regular languages are described using regular
expressions (REs)
 Can we create an automatic RE recognizer?
 Yes! (Hold that thought)
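
The identifier pattern above can be written directly as a regular expression. The sketch below uses Python's re module and assumes ASCII letters and digits only; is_identifier is a name made up for this example.

```python
import re

# A letter followed by any number of letters or digits.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(word):
    """True if the whole word matches the identifier pattern."""
    return IDENTIFIER.fullmatch(word) is not None


print(is_identifier("x1"))   # True
print(is_identifier("1x"))   # False
```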

16
 Definition: Regular expressions (over alphabet Σ)
 ε is an RE denoting {ε}
 If a ∈ Σ, then a is an RE denoting {a}
 If r and s are REs, then
 (r) is an RE denoting L(r)
 r|s is an RE denoting L(r) ∪ L(s)
 rs is an RE denoting L(r)L(s)
 r* is an RE denoting the Kleene closure of L(r)
 Property: regular languages are closed under many operations (e.g. union, concatenation, Kleene closure)
 This allows us to build complex REs.

17
 Definition: Deterministic Finite Automaton
 a five-tuple (Σ, S, δ, s0, F) where
 Σ is the alphabet
 S is the set of states
 δ is the transition function (S × Σ → S)
 s0 is the starting state
 F is the set of final states (F ⊆ S)
 Notation:
 Use a transition diagram to describe a DFA
 DFAs are equivalent to REs
 Hey! We just came up with a recognizer!
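
As a concrete recognizer, here is a sketch of a table-driven DFA for identifiers (a letter followed by letters or digits). The state names, the character classes used as the alphabet, and the choice to treat a missing table entry as rejection are assumptions of this example.

```python
def char_class(ch):
    """Map a character to a symbol of the DFA's alphabet."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


START = "s0"
FINAL = {"s1"}
# Transition function as a table; a missing entry means "reject".
DELTA = {
    ("s0", "letter"): "s1",
    ("s1", "letter"): "s1",
    ("s1", "digit"): "s1",
}


def accepts(word):
    """Run the DFA over word and report whether it ends in a final state."""
    state = START
    for ch in word:
        key = (state, char_class(ch))
        if key not in DELTA:
            return False
        state = DELTA[key]
    return state in FINAL


print(accepts("x1"))   # True
print(accepts("1x"))   # False
```

The loop does a constant amount of work per input character, which is why DFA-based scanners are fast.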

18
 Goal: automate the process
 Idea:
 Start with an RE
 Build a DFA
 How?
 We can build a non-deterministic finite automaton
(Thompson's construction)
 Convert that to a deterministic one
(Subset construction)
 Minimize the DFA
(Hopcroft's algorithm)
 Implement it
 Existing scanner generator: flex
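
The NFA-to-DFA step can be sketched as follows. This is a minimal subset construction over a hand-built NFA for ab|ac; it does not implement Thompson's construction or minimization, and the dictionary encoding of the NFA (with None standing for an epsilon move) is an assumption of this example.

```python
from collections import deque

EPS = None  # label used in this sketch for epsilon moves


def epsilon_closure(states, nfa):
    """All NFA states reachable from states using only epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(nfa, start, finals, alphabet):
    """Build a DFA whose states are sets of NFA states."""
    d_start = epsilon_closure({start}, nfa)
    delta = {}
    seen = {d_start}
    work = deque([d_start])
    while work:
        current = work.popleft()
        for symbol in alphabet:
            moved = set()
            for s in current:
                moved |= nfa.get((s, symbol), set())
            if not moved:
                continue                  # no transition: implicit error state
            target = epsilon_closure(moved, nfa)
            delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    d_finals = {state for state in seen if state & finals}
    return d_start, delta, d_finals


def accepts(word, d_start, delta, d_finals):
    state = d_start
    for ch in word:
        if (state, ch) not in delta:
            return False
        state = delta[(state, ch)]
    return state in d_finals


# Hand-built NFA for ab|ac: state 0 branches by epsilon into the two alternatives.
NFA = {(0, EPS): {1, 4}, (1, "a"): {2}, (2, "b"): {3},
       (4, "a"): {5}, (5, "c"): {6}}
d_start, delta, d_finals = subset_construction(NFA, 0, {3, 6}, "abc")
print(accepts("ab", d_start, delta, d_finals))   # True
print(accepts("a", d_start, delta, d_finals))    # False
```

After reading a, the DFA is in the single state {2, 5}: it tracks both alternatives at once, so no backtracking is needed.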
