1
Compiler Construction
Vana Doufexi
2
Administrative info
 Instructor
 Name: Vana Doufexi
 E-mail: vdoufexi@cs.northwestern.edu
 Office: Ford Building, #2-229
 Hours: E-mail to set up appointment
 Teaching Assistant
 TBA
3
Administrative info
 Course webpage
 http://www.cs.northwestern.edu/academics/courses/322
 contains:
 news
 staff information
 lecture notes & other handouts
 homeworks & manuals
 policies, grades
 newsgroup info
 useful links
 Newsgroup
 Name: cs.322
 nntp: news.cs.northwestern.edu
4
What is a compiler
 A program that reads a program written in some
language and translates it into a program written
in some other language
 Modula-2 to C
 Java to bytecodes
 COOL to MIPS code
 How was the first compiler created?
5
Why study compilers?
 Application of a wide range of theoretical
techniques
 Data Structures
 Theory of Computation
 Algorithms
 Computer Architecture
 Good software engineering experience
 Better understanding of programming languages
6
Features of compilers
 Correctness
 preserve the meaning of the code
 Speed of target code
 Speed of compilation
 Good error reporting/handling
 Cooperation with the debugger
 Support for separate compilation
7
Compiler structure
 Use intermediate representation
 Why?
[Diagram: source code → Front End → IR → Back End → target code]
8
Compiler Structure
 Front end
 Recognize legal/illegal programs
 report/handle errors
 Generate IR
 The process can be automated
 Back end
 Translate IR into target code
 instruction selection
 register allocation
 instruction scheduling
 many of these problems are NP-complete; use approximations (heuristics)
9
Compiler Structure
 Optimization
 goals
 improve running time of generated code
 improve space, power consumption, etc.
 how?
 perform a number of transformations on the IR
 multiple passes
 important: preserve meaning of code
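One meaning-preserving transformation can be sketched as follows. This is a minimal illustration of constant folding over a toy linear IR, assuming straight-line code only (no branches). The instruction layout `(dest, op, arg1, arg2)` and the name `fold_constants` are hypothetical choices made for this example, not part of the course material.

```python
import operator

# Toy IR (an assumption for this sketch): each instruction is
# (dest, op, arg1, arg2), where args are ints or variable names.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """One pass over straight-line code: evaluate operations on known constants."""
    known = {}   # variable name -> constant value discovered so far
    out = []
    for dest, op, a, b in code:
        # Substitute operands whose constant value is already known.
        a = known.get(a, a)
        b = known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            value = OPS[op](a, b)
            known[dest] = value
            out.append((dest, "const", value, None))
        else:
            known.pop(dest, None)   # dest no longer holds a known constant
            out.append((dest, op, a, b))
    return out

code = [("t1", "*", 2, 3), ("t2", "+", "t1", "x"), ("t3", "-", "t1", 1)]
folded = fold_constants(code)
```

The folded code computes the same values as the original, which is the property every optimization pass has to maintain.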
10
The Front End
 Scanning (a.k.a. lexical analysis)
 recognize "words" (tokens)
 Parsing (a.k.a. syntax analysis)
 check syntax
 Semantic analysis
 examine meaning (e.g. type checking)
 Other issues:
 symbol table (to keep track of identifiers)
 error detection/reporting/recovery
11
The Scanner
 Its job:
 given a character stream, recognize words (tokens)
 e.g. x = 1 becomes IDENTIFIER EQUAL INTEGER
 collect identifier information
 e.g. IDENTIFIER corresponds to a lexeme (the actual
word x) and its type (acquired from the declaration of
x).
 ignore white space and comments
 report errors
 Good news
 the process can be automated
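The scanner's job can be sketched with Python's `re` module. This is a hedged, minimal example: the token names follow the slide, while the `#` comment syntax, the `TOKEN_SPEC` table and the function name `scan` are assumptions made for illustration.

```python
import re

# Token patterns, tried in order. COMMENT syntax ('#' to end of line) is assumed.
TOKEN_SPEC = [
    ("INTEGER",    r"\d+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("EQUAL",      r"="),
    ("COMMENT",    r"#[^\n]*"),
    ("SKIP",       r"\s+"),
    ("ERROR",      r"."),          # any other single character is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    """Turn a character stream into a list of (token kind, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(text):
        kind = m.lastgroup
        if kind in ("SKIP", "COMMENT"):
            continue                # ignore white space and comments
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {m.group()!r} at offset {m.start()}")
        tokens.append((kind, m.group()))
    return tokens

print(scan("x = 1"))
```

For the input `x = 1` this yields the kinds IDENTIFIER, EQUAL, INTEGER, each paired with its lexeme.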
12
The Parser
 Its job:
 Check and verify syntax based on specified syntax rules
 e.g. IDENTIFIER LPAREN RPAREN make up an
EXPRESSION.
 Coming soon: how context-free grammars specify
syntax
 Report errors
 Build IR
 often a syntax tree
 Good news
 the process can be automated
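To make the parser's job concrete, here is a minimal hand-written recursive-descent sketch. The grammar is an assumption chosen for this example (an expression is a sum of terms, and a term is an integer, a variable, or a call `IDENTIFIER LPAREN RPAREN`). The token kind `PLUS` and the tuple-shaped tree nodes are likewise hypothetical.

```python
def parse_expression(tokens):
    """Parse a list of (kind, lexeme) tokens; return a syntax tree of nested tuples."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()} at token {pos}")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    def term():
        # term -> INTEGER | IDENTIFIER | IDENTIFIER LPAREN RPAREN
        if peek() == "INTEGER":
            return ("int", int(expect("INTEGER")))
        name = expect("IDENTIFIER")
        if peek() == "LPAREN":
            expect("LPAREN")
            expect("RPAREN")
            return ("call", name)
        return ("var", name)

    def expr():
        # expr -> term (PLUS term)*
        node = term()
        while peek() == "PLUS":
            expect("PLUS")
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() != "EOF":
        raise SyntaxError(f"unexpected {peek()} at token {pos}")
    return tree

tokens = [("IDENTIFIER", "f"), ("LPAREN", "("), ("RPAREN", ")"),
          ("PLUS", "+"), ("INTEGER", "1")]
tree = parse_expression(tokens)
```

Each function mirrors one grammar rule, which is why parser construction lends itself to automation from a grammar.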
13
Semantic analysis
 Its job:
 Check the meaning of the program
 e.g. In x=y, is y defined before being used? Are x and
y declared?
 e.g. In x=y, are the types of x and y such that you can
assign one to the other?
 Meaning may depend on context
 Report errors
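The checks for `x=y` can be sketched with a small symbol table. The table layout, the `ASSIGNABLE` rule set and the function name `check_assignment` are assumptions made for this example; a real language defines its own typing rules.

```python
# Hypothetical type rule: (target type, source type) pairs that may be assigned.
ASSIGNABLE = {("int", "int"), ("float", "float"), ("float", "int")}

def check_assignment(symbols, target, source):
    """Return a list of error messages for the statement `target = source`."""
    # Are both names declared?
    errors = [f"'{name}' is not declared"
              for name in (target, source) if name not in symbols]
    if errors:
        return errors
    # Is the source defined before being used?
    if not symbols[source]["defined"]:
        errors.append(f"'{source}' is used before being defined")
    # Are the types compatible for assignment?
    t_type, s_type = symbols[target]["type"], symbols[source]["type"]
    if (t_type, s_type) not in ASSIGNABLE:
        errors.append(f"cannot assign {s_type} to {t_type}")
    if not errors:
        symbols[target]["defined"] = True   # the target now holds a value
    return errors

symbols = {
    "x": {"type": "float", "defined": False},
    "y": {"type": "int", "defined": True},
}
print(check_assignment(symbols, "x", "y"))
```

None of these checks can be expressed by syntax rules alone, since they depend on the declarations seen earlier in the program.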
14
IRs
 Graphical
 e.g. parse tree, DAG
 Linear
 e.g. three-address code
 Hybrid
 e.g. linear for blocks of straight-line code, a graph to
connect blocks
 Low-level or high-level
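The relation between a graphical and a linear IR can be illustrated by flattening an expression tree into three-address code. This is only a sketch: the nested-tuple tree shape, the temporary names `t1`, `t2`, ... and the function name `to_three_address` are assumptions made for the example.

```python
import itertools

def to_three_address(tree):
    """Flatten a nested-tuple expression tree into three-address instructions."""
    code = []
    temps = (f"t{i}" for i in itertools.count(1))   # fresh temporary names

    def walk(node):
        if not isinstance(node, tuple):   # leaf: variable name or constant
            return node
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        dest = next(temps)
        code.append(f"{dest} = {left_name} {op} {right_name}")
        return dest

    result = walk(tree)
    return code, result

# Tree for a + b * c
code, result = to_three_address(("+", "a", ("*", "b", "c")))
print("\n".join(code))
```

Each emitted instruction has at most one operator and three names, which is what gives three-address code its name.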
15
The scanning process
 Main goal: recognize words
 How? by recognizing patterns
 e.g. an identifier is a sequence of letters or digits that
starts with a letter.
 Lexical patterns form a regular language
 Regular languages are described using regular
expressions (REs)
 Can we create an automatic RE recognizer?
 Yes! (Hold that thought)
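The identifier pattern above can be written directly as a regular expression. The sketch below uses Python's `re` syntax and assumes ASCII letters and digits only; the name `is_identifier` is made up for the example.

```python
import re

# A letter followed by any number of letters or digits.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_identifier(word):
    """True if the whole word matches the identifier pattern."""
    return IDENTIFIER.fullmatch(word) is not None

print(is_identifier("x1"), is_identifier("1x"))
```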
16
The scanning process
 Definition: Regular expressions (over alphabet Σ)
 ε is an RE denoting {ε}
 If a ∈ Σ, then a is an RE denoting {a}
 If r and s are REs, then
 (r) is an RE denoting L(r)
 r|s is an RE denoting L(r) ∪ L(s)
 rs is an RE denoting L(r)L(s)
 r* is an RE denoting the Kleene closure of L(r)
 Property: regular languages are closed under many operations
 (e.g. union, concatenation, Kleene closure)
 This allows us to build complex REs from simpler ones.
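The definition can be read as a recursive recipe for the language L(r). The sketch below computes the strings of L(r) up to a length bound, since L(r*) is infinite in general. The tuple encoding of REs (`"eps"`, `"sym"`, `"alt"`, `"cat"`, `"star"`) and the function name `language` are assumptions made for this example.

```python
def language(r, max_len):
    """Return the strings of L(r) whose length is at most max_len."""
    kind = r[0]
    if kind == "eps":                       # the empty string
        return {""}
    if kind == "sym":                       # a single alphabet symbol
        return {r[1]} if max_len >= 1 else set()
    if kind == "alt":                       # r|s denotes the union
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "cat":                       # rs denotes the concatenation
        left = language(r[1], max_len)
        right = language(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":                      # r* denotes the Kleene closure
        base = language(r[1], max_len) - {""}
        result = {""}
        frontier = {""}
        while frontier:                     # stops once all strings exceed max_len
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown RE node {kind!r}")

# a(a|1)* : a tiny identifier-like language over the alphabet {a, 1}
ident = ("cat", ("sym", "a"), ("star", ("alt", ("sym", "a"), ("sym", "1"))))
print(sorted(language(ident, 2)))
```

With a bound of 2, the example yields the strings `a`, `a1` and `aa`.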
17
The scanning process
 Definition: Deterministic Finite Automaton
 a five-tuple (Σ, S, δ, s0, F) where
 Σ is the alphabet
 S is the set of states
 δ is the transition function (S × Σ → S)
 s0 is the starting state
 F is the set of final states (F ⊆ S)
 Notation:
 Use a transition diagram to describe a DFA
 DFAs are equivalent to REs (both describe exactly the regular languages)
 Hey! We just came up with a recognizer!
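A DFA is straightforward to run in code. The sketch below encodes the five-tuple for the identifier language (a letter followed by letters or digits) and simulates it. The state names and function names are assumptions for this example, and the transition function is stored as a partial table, with a missing entry standing for an implicit error state.

```python
import string

def make_identifier_dfa():
    """Return (alphabet, states, delta, start, final) for identifiers."""
    letters = string.ascii_letters
    digits = string.digits
    sigma = set(letters + digits)
    states = {"start", "ident"}
    delta = {}
    for ch in letters:
        delta[("start", ch)] = "ident"      # first character must be a letter
    for ch in letters + digits:
        delta[("ident", ch)] = "ident"      # then any letters or digits
    return sigma, states, delta, "start", {"ident"}

def accepts(dfa, text):
    """Run the DFA on text; accept if it ends in a final state."""
    sigma, states, delta, state, final = dfa
    for ch in text:
        if (state, ch) not in delta:
            return False                    # missing entry: implicit error state
        state = delta[(state, ch)]
    return state in final

dfa = make_identifier_dfa()
print(accepts(dfa, "x1"), accepts(dfa, "1x"))
```

The simulation does a single table lookup per input character, which is why DFA-based scanners are fast.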
18
The scanning process
 Goal: automate the process
 Idea:
 Start with an RE
 Build a DFA
 How?
 We can build a non-deterministic finite automaton
(Thompson's construction)
 Convert that to a deterministic one
(Subset construction)
 Minimize the DFA
(Hopcroft's algorithm)
 Implement it
 Existing scanner generator: flex
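The RE-to-DFA pipeline can be sketched in a few dozen lines. This is a simplified illustration, not how flex is implemented: it uses a Thompson-style construction (with extra ε-edges for concatenation) followed by the subset construction, and it omits minimization. The tuple encoding of REs and all function names are assumptions made for this example.

```python
from itertools import count

def thompson(r, fresh, edges):
    """Build an NFA fragment for r; return its (start, accept) states.
    edges collects (source, label, target) triples; label None means epsilon."""
    start, accept = next(fresh), next(fresh)
    kind = r[0]
    if kind == "eps":
        edges.append((start, None, accept))
    elif kind == "sym":
        edges.append((start, r[1], accept))
    elif kind in ("cat", "alt"):
        s1, a1 = thompson(r[1], fresh, edges)
        s2, a2 = thompson(r[2], fresh, edges)
        if kind == "cat":
            edges.extend([(start, None, s1), (a1, None, s2), (a2, None, accept)])
        else:
            edges.extend([(start, None, s1), (start, None, s2),
                          (a1, None, accept), (a2, None, accept)])
    elif kind == "star":
        s1, a1 = thompson(r[1], fresh, edges)
        edges.extend([(start, None, s1), (a1, None, s1),
                      (a1, None, accept), (start, None, accept)])
    else:
        raise ValueError(f"unknown RE node {kind!r}")
    return start, accept

def epsilon_closure(states, edges):
    """All NFA states reachable from `states` using only epsilon edges."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is None and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return frozenset(closure)

def subset_construction(start, accept, edges):
    """Convert the NFA to a DFA whose states are sets of NFA states."""
    alphabet = {label for _, label, _ in edges if label is not None}
    d_start = epsilon_closure({start}, edges)
    delta, worklist, seen = {}, [d_start], {d_start}
    while worklist:
        current = worklist.pop()
        for ch in alphabet:
            moved = {dst for src, label, dst in edges
                     if src in current and label == ch}
            if not moved:
                continue
            target = epsilon_closure(moved, edges)
            delta[(current, ch)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    final = {s for s in seen if accept in s}
    return d_start, delta, final

def compile_re(r):
    edges = []
    start, accept = thompson(r, count(), edges)
    return subset_construction(start, accept, edges)

def matches(dfa, text):
    state, delta, final = dfa
    for ch in text:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in final

# a(a|1)* : a letter-like symbol followed by letter-like or digit-like symbols
dfa = compile_re(("cat", ("sym", "a"), ("star", ("alt", ("sym", "a"), ("sym", "1")))))
print(matches(dfa, "a1a"), matches(dfa, "1a"))
```

A scanner generator performs these steps once, at build time, and emits the resulting transition table as code.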
