INTERMEDIATE CODE GENERATION
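Structure of a Compiler
 The front end of a compiler can be implemented efficiently and largely automated (e.g., with scanner and parser generators)
 The back end is hard to automate, and finding an optimal solution generally requires exponential time (optimal register allocation and instruction scheduling are NP-hard problems)
 The form of the intermediate code can affect the performance of the back end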
[Figure: compiler structure. Front end: Scanner, Parser, Semantic Analysis, Intermediate Code Generation; the IR connects it to the back end: Code Optimization, Instruction Selection, Instruction Scheduling, Register Allocation]
Overview
 Goal: Generate a Machine Independent
Intermediate Form that is Suitable for Optimization
and Portability
 Facilitates retargeting: enables attaching a back
end for the new machine to an existing front end
 Enables machine-independent code optimization
Motivation
 What we have so far...
 A parse tree
 With all the program information
 Known to be correct
 Well-typed
 Nothing missing
 No ambiguities
 What we need...
 Something “Executable”
 Closer to
 An operations schedule
 Actual machine level
What We Want
 A Representation that
 Is closer to actual machine
 Is easy to manipulate
 Is target neutral (hardware
independent)
 Can be interpreted
Intermediate Languages
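 Syntax Tree.
 A syntax tree depicts the hierarchical structure of the source program.
 A DAG is more compact: identical subexpressions share a single node.
 Postfix Notation
 A linearized representation of a syntax tree.
 Edges do not appear explicitly. They can be recovered from the order of the nodes and the number of operands each operator takes.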
 Three address code
 Control Flow Graphs (CFGs)
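Postfix linearization and its recovery can be sketched in Python as follows. This is an illustration, not part of the original material: the `Node` class, the operator names, and the arity table (binary arithmetic operators plus a unary `uminus`) are assumptions. A post-order walk emits operands before their operator, and a stack rebuilds the tree, which is why the edges need not be stored.

```python
from dataclasses import dataclass
from typing import Optional

# Number of operands each operator takes (an assumption for this sketch).
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "uminus": 1}


@dataclass
class Node:
    label: str                          # operator or leaf name
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def to_postfix(node: Node) -> list[str]:
    """Post-order walk: operands first, then the operator."""
    out: list[str] = []
    if node.left is not None:
        out += to_postfix(node.left)
    if node.right is not None:
        out += to_postfix(node.right)
    out.append(node.label)
    return out


def from_postfix(tokens: list[str]) -> Node:
    """Rebuild the tree: edges are implied by token order plus operator arity."""
    stack: list[Node] = []
    for tok in tokens:
        n = ARITY.get(tok, 0)           # leaves take no operands
        if n == 2:
            right = stack.pop()
            left = stack.pop()
            stack.append(Node(tok, left, right))
        elif n == 1:
            stack.append(Node(tok, stack.pop()))
        else:
            stack.append(Node(tok))
    assert len(stack) == 1, "malformed postfix string"
    return stack[0]


# b * -c + b * -c
tree = Node("+",
            Node("*", Node("b"), Node("uminus", Node("c"))),
            Node("*", Node("b"), Node("uminus", Node("c"))))
postfix = to_postfix(tree)
print(" ".join(postfix))                # b c uminus * b c uminus * +
assert from_postfix(postfix) == tree
```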
Recall ASTs and DAGs
 Intermediate Forms of a Program:
 ASTs: Abstract Syntax Trees
 DAGs: Directed Acyclic Graphs
 What is the Expression?
[Figure: syntax tree (left) and DAG (right) for a := b * -c + b * -c; in the DAG the subexpression b * uminus c is a single node shared by both operands of +]
Representation
 Two Different Forms:
 Linked Data Structure
 Multi-Dimensional Array
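The array form can be sketched in Python under assumptions of our own: each node is a record (op, left, right) stored in a list, children are referred to by list index instead of by pointer, and leaves use a hypothetical op "id". Before creating a node, the builder looks for an identical record (the value-number method), so building the syntax tree for a := b * -c + b * -c bottom-up yields a DAG in which b * -c is stored only once. A linked representation would hold object references in the child fields instead of indices.

```python
# Array representation of a DAG: each node is a tuple (op, left, right),
# and children are referred to by their index in `nodes`.
# Leaves use op "id" with the name in `left`; unused fields are None.
Node = tuple[str, object, object]


class DagBuilder:
    def __init__(self) -> None:
        self.nodes: list[Node] = []
        self.index: dict[Node, int] = {}   # node signature -> array index

    def node(self, op: str, left: object = None, right: object = None) -> int:
        """Return the index of (op, left, right), creating it only if new."""
        key = (op, left, right)
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]


d = DagBuilder()
# a := b * -c + b * -c, built bottom-up as a parser would
lhs = d.node("*", d.node("id", "b"), d.node("uminus", d.node("id", "c")))
rhs = d.node("*", d.node("id", "b"), d.node("uminus", d.node("id", "c")))
root = d.node("assign", d.node("id", "a"), d.node("+", lhs, rhs))

for i, n in enumerate(d.nodes):
    print(i, n)
assert lhs == rhs                          # the common subexpression is one node
```

With this input the list ends up with 7 entries: b, c, uminus c and b * -c at indices 0 to 3, followed by a, the + node (both of whose children are index 3), and the assign node.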
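Abstract Syntax Trees (ASTs)
    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;
[Figure: AST for this code: root Statements with children IfStmt and AssignStmt (x = x+y); IfStmt has the condition x < y and two AssignStmt children, x = 5*y + 5*y/3 (containing two separate 5*y subtrees) and y = 5]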
Directed Acyclic Graphs (DAGs)
 Use directed acyclic graphs to represent expressions
 Use a unique node for each distinct value, so identical subexpressions (here 5*y) are shared
    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;
[Figure: DAG for this code: same shape as the AST, but the two occurrences of 5*y share a single * node, and each leaf (x, y, 5, 3) appears once]
Control Flow Graphs (CFGs)
 Nodes in the control flow graph are basic blocks
 A basic block is a sequence of statements
always entered at the beginning of the
block and exited at the end
 Edges in the control flow graph represent the control flow: an edge from block B to block C means control can pass from the end of B to the start of C
CFG
    if (x < y)
        x = 5*y + 5*y/3;
    else
        y = 5;
    x = x+y;
[Figure: CFG with four blocks: B0: if (x < y) goto B1 else goto B2; B1: x = 5*y + 5*y/3; B2: y = 5; B3: x = x+y. Edges B0 → B1, B0 → B2, B1 → B3, B2 → B3]
• Each block has a sequence of statements
• No jump from or to the middle of the block
• Once a block starts executing, it will execute till the end
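Basic blocks are usually found with the leader algorithm: the first instruction is a leader, every jump target is a leader, and every instruction immediately following a jump is a leader; each block runs from a leader up to, but not including, the next one. The Python sketch below applies this to a hand-written three-address version of the if-statement above. The string encoding of instructions and the integer jump targets are assumptions for illustration. Blocks are numbered in instruction order, so here block 1 is the else branch (B2 in the figure) and block 2 is the then branch (B1).

```python
import re

# Three-address code for the if-statement above (0-based instruction numbers;
# the encoding is illustrative, not taken from the slides).
code = [
    "if x < y goto 3",   # 0
    "y := 5",            # 1  else branch
    "goto 7",            # 2
    "t1 := 5 * y",       # 3  then branch
    "t2 := 5 * y",       # 4
    "t3 := t2 / 3",      # 5
    "x := t1 + t3",      # 6
    "x := x + y",        # 7  join point
]

JUMP = re.compile(r"goto (\d+)$")


def basic_blocks(code: list[str]) -> list[list[int]]:
    """Partition instruction indices into basic blocks using leaders."""
    leaders = {0}                               # rule 1: first instruction
    for i, instr in enumerate(code):
        m = JUMP.search(instr)
        if m:
            leaders.add(int(m.group(1)))        # rule 2: jump target
            if i + 1 < len(code):
                leaders.add(i + 1)              # rule 3: instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


def cfg_edges(code: list[str], blocks: list[list[int]]) -> set[tuple[int, int]]:
    """Edges between block numbers: jump targets plus fall-through."""
    block_of = {i: b for b, blk in enumerate(blocks) for i in blk}
    edges = set()
    for b, blk in enumerate(blocks):
        last = code[blk[-1]]
        m = JUMP.search(last)
        if m:
            edges.add((b, block_of[int(m.group(1))]))
        # Only an unconditional goto prevents falling through
        if not last.startswith("goto") and b + 1 < len(blocks):
            edges.add((b, b + 1))
    return edges


blocks = basic_blocks(code)
print(blocks)                            # [[0], [1, 2], [3, 4, 5, 6], [7]]
print(sorted(cfg_edges(code, blocks)))   # [(0, 1), (0, 2), (1, 3), (2, 3)]
```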
Objective
 Directly Generate Code From the AST or DAG as a Side Effect of the Parsing Process.
 Consider Code Below:
Each is Referred to as “3 Address Coding (3AC)”
since there are at Most 3 Addresses per Statement:
One for the Result and At Most 2 for the Operands
The Intermediate Code
Generation Machine
 A Machine with
 Infinite number of temporaries
 Simple instructions
 3-operands
 Branching
 Calls with simple calling convention
 Simple code structure
 Array of instructions
 Labels to define targets of branches.
Temporaries
 The machine has an infinite number of temporaries
 Call them t0, t1, t2, ....
 Temporaries can hold values of any type
 The type of a temporary is determined by the expression whose value it holds
 Temporaries go out of scope with the function
they are in
What is Three-Address
Coding?
 A simple type of instruction
 Three operands x, y, z (only two for a unary operator)
 Each operand could be
 A literal
 A variable
 A temporary
 General forms
x := y op z
x := op z
 Example: x + y * z becomes
t0 := y * z
t1 := x + t0
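A minimal Python sketch of producing this code from an expression tree, assuming expressions are encoded as nested tuples (op, operand, ...) with names and literals as strings. Each operator node receives a fresh temporary from a counter, which is safe because the machine has unlimited temporaries, and one instruction is emitted per operator after the code for its operands.

```python
from itertools import count


def gen_3ac(expr, code: list[str], temps) -> str:
    """Emit 3AC for expr into code; return the address holding its value.

    expr is a name or literal (str), a tuple (op, a, b) for a binary
    operator, or (op, a) for a unary one.
    """
    if isinstance(expr, str):
        return expr                          # literals and variables are addresses already
    op, *args = expr
    places = [gen_3ac(a, code, temps) for a in args]
    t = f"t{next(temps)}"                    # fresh temporary t0, t1, ...
    if len(places) == 2:
        code.append(f"{t} := {places[0]} {op} {places[1]}")
    else:
        code.append(f"{t} := {op} {places[0]}")
    return t


code: list[str] = []
result = gen_3ac(("+", "x", ("*", "y", "z")), code, count())
print("\n".join(code))
# t0 := y * z
# t1 := x + t0
```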
Types of Three Address
Statements
 Assignment Statements of Form:
 X := Y op Z
 op is a Binary Arithmetic or Logical Operation
 Assignment Instructions of Form:
 X := op Y
 op is a Unary Operation such as Unary Minus, Logical Negation, Shift or Conversion Operations
 Copy Statements of Form:
 X := Y, where the value of Y is assigned to X
 Unconditional Jump of Form:
 goto L, which transfers control to the three-address statement labeled L
Types of Three Address
Statements
 Conditional Jumps of Form:
 if x relop y goto L
 where relop is a relational operator (such as <, <=, >, >=); the jump to L is taken if x relop y is true, otherwise execution continues with the next statement
 Parameter Operations of Form:
 param a (a is a parameter of the function being called)
 call p, n (call function p with n parameters)
 return y (return from the function; the value y is optional)
 Example: the call p(a, b, c) becomes
 param a
 param b
 param c
 call p, 3
Types of Three Address
Statements
 Indexed Assignments of Form:
 X := Y[i] (Set X to i-th memory location of Y)
 X[i] := Y (Set i-th memory location of X to Y)
 Note the limit of 3 Addresses (X, Y, i)
 Cannot do: x[i] := y[j]; (4 addresses!) Instead use a temporary: t := y[j] followed by x[i] := t
 Address and Pointer Assignments of Form:
 X := & Y (X set to the Address of Y)
 X := * Y (X set to the contents pointed to by Y)
 * X := Y (Contents of X set to Value of Y)
Three Address Code
Representations
 Data structures for representing TAC can be objects or records with fields for the operator and operands. Representations include quadruples, triples, and indirect triples.
Quadruples
 In the quadruple representation, there are four
fields for each instruction: op, arg1, arg2, result
 Binary ops have the obvious representation
 Unary ops don’t use arg2
 Operators like param don’t use either arg2 or
result
 Jumps put the target label into result
 The quadruples in Fig (b) implement the three-address code in (a) for the expression a = b * -c + b * -c
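The four-field layout can be sketched in Python with a `NamedTuple`. The generator walks a tuple-encoded tree for a = b * -c + b * -c and emits one quadruple per operator. Writing unary minus as `uminus` (as in the DAG figure), the copy as `=`, and empty fields as `None` are conventions assumed for this sketch.

```python
from itertools import count
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: Optional[str]
    arg2: Optional[str]       # None for unary ops and copies
    result: Optional[str]


def gen_quads(expr, quads: list[Quad], temps) -> str:
    """Emit quadruples for expr; return the name holding its value."""
    if isinstance(expr, str):
        return expr                               # a variable is its own address
    op, *args = expr
    places = [gen_quads(a, quads, temps) for a in args]
    t = f"t{next(temps)}"
    arg2 = places[1] if len(places) == 2 else None
    quads.append(Quad(op, places[0], arg2, t))
    return t


# a = b * -c + b * -c
tree = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))
quads: list[Quad] = []
place = gen_quads(tree, quads, count(1))
quads.append(Quad("=", place, None, "a"))         # copy: result is the variable

print(f"{'':>3} {'op':<7}{'arg1':<6}{'arg2':<6}result")
for i, q in enumerate(quads):
    print(f"{i:>3} {q.op:<7}{q.arg1 or '':<6}{q.arg2 or '':<6}{q.result}")
```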
Quadruples
[Figure: (a) three-address code and (b) quadruples for a = b * -c + b * -c]
Triples
 A triple has only three fields for each instruction: op, arg1, arg2
 The result of an operation x op y is referred to by its position rather than by an explicit temporary name.
 Triples are equivalent to signatures of nodes in DAGs or syntax trees.
 Triples and DAGs are equivalent representations only for expressions; they are not equivalent for control flow.
 Ternary operations like x[i] = y require two entries in the triple structure; similarly for x = y[i].
 Moving an instruction during optimization is a problem, because other triples refer to its result by position
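A Python sketch of turning the quadruples for a = b * -c + b * -c into triples, assuming a tuple encoding (op, arg1, arg2, result): each temporary is replaced by the position of the triple that computes it (an integer), while program names stay strings. For the final copy, the target a moves into arg1, a common textbook convention assumed here. Because later triples refer to earlier ones by position, moving one of them would require renumbering every reference to it.

```python
# Quadruples (op, arg1, arg2, result) for a = b * -c + b * -c
quads = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]


def to_triples(quads):
    """Replace each temporary by the position (an int) of the triple defining it."""
    pos: dict[str, int] = {}                       # temporary -> triple index
    triples = []
    for op, a1, a2, res in quads:
        r1, r2 = pos.get(a1, a1), pos.get(a2, a2)  # names and None pass through
        if op == "=":
            triples.append((op, res, r1))          # copy: target goes in arg1
        else:
            pos[res] = len(triples)                # the result is now a position
            triples.append((op, r1, r2))
    return triples


def show(arg):
    """Render an argument: (n) for a reference to triple n, else the name."""
    if arg is None:
        return ""
    return f"({arg})" if isinstance(arg, int) else arg


triples = to_triples(quads)
for i, (op, a1, a2) in enumerate(triples):
    print(f"({i}) {op:<7}{show(a1):<5}{show(a2)}")
```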
Representations of a = (b * -c) + (b * -c)
Indirect Triples
 These consist of a listing of pointers to triples, rather than a listing of
the triples themselves.
 An optimizing compiler can move an instruction by reordering the
instruction list, without affecting the triples themselves.
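A Python sketch of indirect triples, assuming the triple table is never modified and program order is kept in a separate list of indices into it. Reordering then touches only that list; the hypothetical `respects_dependencies` check confirms that every triple still comes after the triples whose results it uses.

```python
# Triples for a = b * -c + b * -c; integer arguments refer to other triples.
triples = [
    ("uminus", "c", None),  # 0
    ("*", "b", 0),          # 1
    ("uminus", "c", None),  # 2
    ("*", "b", 2),          # 3
    ("+", 1, 3),            # 4
    ("=", "a", 4),          # 5
]

# The instruction list: program order as pointers (indices) into `triples`.
instructions = [0, 1, 2, 3, 4, 5]


def respects_dependencies(instructions, triples) -> bool:
    """True if every referenced triple executes before the triple using it."""
    seen = set()
    for i in instructions:
        for arg in triples[i][1:]:
            if isinstance(arg, int) and arg not in seen:
                return False
        seen.add(i)
    return True


# Compute both negations first: swap entries 1 and 2 of the instruction list.
# The triples, including the references (0) and (2) inside them, are untouched.
snapshot = list(triples)
instructions[1], instructions[2] = instructions[2], instructions[1]

assert triples == snapshot
print(instructions)                                  # [0, 2, 1, 3, 4, 5]
print(respects_dependencies(instructions, triples))  # True
```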
Attribute Grammar for
Assignments
 Concepts:
 Need to Introduce Temporary Variables as
Necessary to Decompose Assignment Statement
 Every Generated Line of Code Must have at Most
3 Addresses!
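A common way to write this attribute grammar gives each expression E two synthesized attributes: E.place, the name or temporary holding its value, and E.code, the three-address code that computes it, with newtemp() supplying fresh temporaries. The Python sketch below evaluates these attributes over a tuple-encoded tree for the productions S → id := E, E → E1 op E2, E → -E1 and E → id. The encoding and the production set are assumptions; every emitted line uses at most three addresses.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Attrs:
    place: str                                       # where the value lives
    code: list[str] = field(default_factory=list)    # 3AC computing it


_temps = count(1)


def newtemp() -> str:
    return f"t{next(_temps)}"


def expr(node) -> Attrs:
    """Synthesize E.place and E.code for an expression node."""
    if isinstance(node, str):                        # E -> id
        return Attrs(node)
    if node[0] == "uminus":                          # E -> - E1
        e1 = expr(node[1])
        t = newtemp()
        return Attrs(t, e1.code + [f"{t} := uminus {e1.place}"])
    op, left, right = node                           # E -> E1 op E2
    e1, e2 = expr(left), expr(right)
    t = newtemp()
    return Attrs(t, e1.code + e2.code + [f"{t} := {e1.place} {op} {e2.place}"])


def assign(target: str, node) -> list[str]:
    """S -> id := E : S.code is E.code followed by 'id := E.place'."""
    e = expr(node)
    return e.code + [f"{target} := {e.place}"]


code = assign("a", ("+", ("*", "b", ("uminus", "c")),
                         ("*", "b", ("uminus", "c"))))
print("\n".join(code))
```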
Declarations
 A Stack is Used during Procedure/Function Calls to
 Allocate Space for Variables
 This now Includes Temporaries for 3AC
 We need to Track
 Name
 Type (int, real, boolean, etc.)
 Offset (with respect to some relative address)
 Function
 enter(name, type, offset) creates a symbol-table entry
 offset: a global variable, initially 0
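A minimal Python sketch of this bookkeeping: a symbol table keyed by name, a global offset that starts at 0, and enter(name, type, offset). After each declaration the offset grows by the width of the declared type. The widths used here (int 4, real 8, boolean 1) and the `declare` helper are assumptions; real widths depend on the target machine.

```python
# Assumed widths in bytes; these are illustrative, not fixed by the slides.
WIDTH = {"int": 4, "real": 8, "boolean": 1}

symtab: dict[str, tuple[str, int]] = {}
offset = 0                                  # global, initially 0


def enter(name: str, typ: str, off: int) -> None:
    """Create a symbol-table entry recording the name's type and relative address."""
    if name in symtab:
        raise ValueError(f"{name} already declared")
    symtab[name] = (typ, off)


def declare(name: str, typ: str) -> None:
    """Hypothetical helper for a declaration: enter at the current offset, then advance."""
    global offset
    enter(name, typ, offset)
    offset += WIDTH[typ]


# t1 stands for a 3AC temporary, which also needs space
for name, typ in [("i", "int"), ("x", "real"), ("done", "boolean"), ("t1", "int")]:
    declare(name, typ)

print(symtab)   # {'i': ('int', 0), 'x': ('real', 4), 'done': ('boolean', 12), 't1': ('int', 13)}
print(offset)   # 17
```

Alignment is ignored here; a real compiler would usually round each offset up to the alignment its type requires.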
Storage Layout for Local
Names
 The type and relative address are saved in the symbol-table entry for the name.
 The width of a type is the number of storage units needed for objects of that type.
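The width can be computed recursively from the type expression. This Python sketch assumes basic widths int = 4 and real = 8 and a hypothetical type constructor `Array(n, t)` for an array of n elements of type t, whose width is n times the width of t; alignment padding is ignored.

```python
from dataclasses import dataclass
from typing import Union

BASIC_WIDTH = {"int": 4, "real": 8}     # assumed, machine dependent


@dataclass(frozen=True)
class Array:
    size: int                           # number of elements
    elem: "Type"                        # element type


Type = Union[str, Array]


def width(t: Type) -> int:
    """Number of storage units needed for an object of type t."""
    if isinstance(t, Array):
        return t.size * width(t.elem)
    return BASIC_WIDTH[t]


# int[2][3] is an array of 2 arrays of 3 ints
print(width(Array(2, Array(3, "int"))))   # 24
print(width(Array(10, "real")))            # 80
```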
Code Generation for Boolean Expressions
 Two approaches
 Numerical representation
 Implicit representation
 Numerical representation
 Use 1 to represent true, use 0 to represent
false
 For three-address code store this result in a
temporary
 For stack machine code store this result in the
stack
Code Generation for Boolean
Expressions
 Implicit representation
 Boolean expressions used in flow-of-control statements (such as if-statements and while-statements) do not have to compute a value explicitly; they only need to branch to the right instruction
 Generate code for such boolean expressions that branches to the appropriate instruction based on the result of the expression
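A Python sketch of such jumping code, following the usual scheme in which each boolean expression B is given a true label and a false label: for B1 or B2, B1 jumps to B's true label if it holds and otherwise falls into B2; for B1 and B2, B1 jumps to B's false label if it fails. No temporary ever holds a truth value. The tuple encoding, the label names, and `newlabel` are assumptions of this sketch.

```python
from itertools import count

_labels = count(1)


def newlabel() -> str:
    return f"L{next(_labels)}"


def jump(b, true: str, false: str, code: list[str]) -> None:
    """Emit code that jumps to `true` if b holds and to `false` otherwise."""
    op = b[0]
    if op == "or":
        mid = newlabel()                 # where B2 starts
        jump(b[1], true, mid, code)      # B1 true: whole expression is true
        code.append(f"{mid}:")
        jump(b[2], true, false, code)
    elif op == "and":
        mid = newlabel()
        jump(b[1], mid, false, code)     # B1 false: whole expression is false
        code.append(f"{mid}:")
        jump(b[2], true, false, code)
    else:                                # relational: (relop, x, y)
        _, x, y = b
        code.append(f"if {x} {op} {y} goto {true}")
        code.append(f"goto {false}")


# a < b or c < d and e < f   ("and" binds tighter than "or")
cond = ("or", ("<", "a", "b"), ("and", ("<", "c", "d"), ("<", "e", "f")))
code: list[str] = []
jump(cond, "Ltrue", "Lfalse", code)
print("\n".join(code))
```

The output contains a jump to the label that immediately follows it (goto L1 just before L1:); a real generator would normally let such a jump fall through instead.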
Generated Code
 Consider: a < b or c < d and e < f (and binds tighter than or)
100: if a < b goto 103
101: t1 := 0
102: goto 104
103: t1 := 1
104: if c < d goto 107
105: t2 := 0
106: goto 108
107: t2 := 1
108: if e < f goto 111
109: t3 := 0
110: goto 112
111: t3 := 1
112: t4 := t2 and t3
113: t5 := t1 or t4
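This listing can be produced mechanically. The Python sketch below numbers instructions from 100 as the slide does; for each relational test x relop y starting at address n it emits the pattern if x relop y goto n+3; t := 0; goto n+4; t := 1, and it then combines the 0/1 temporaries with `and`/`or` instructions in post-order. The tuple encoding and the `NumericBoolGen` class are assumptions of this sketch.

```python
from itertools import count


class NumericBoolGen:
    """Numerical representation: every boolean value ends up in a 0/1 temporary."""

    def __init__(self, start: int = 100) -> None:
        self.code: list[str] = []
        self.start = start
        self.temps = count(1)

    def next_addr(self) -> int:
        return self.start + len(self.code)

    def emit(self, instr: str) -> None:
        self.code.append(f"{self.next_addr()}: {instr}")

    def gen(self, b) -> str:
        """Emit code for b and return the temporary holding 0 or 1."""
        op = b[0]
        if op in ("and", "or"):
            left, right = self.gen(b[1]), self.gen(b[2])
            t = f"t{next(self.temps)}"
            self.emit(f"{t} := {left} {op} {right}")
            return t
        _, x, y = b                            # relational: (relop, x, y)
        t = f"t{next(self.temps)}"
        n = self.next_addr()
        self.emit(f"if {x} {op} {y} goto {n + 3}")
        self.emit(f"{t} := 0")
        self.emit(f"goto {n + 4}")
        self.emit(f"{t} := 1")
        return t


g = NumericBoolGen()
result = g.gen(("or", ("<", "a", "b"), ("and", ("<", "c", "d"), ("<", "e", "f"))))
print("\n".join(g.code))
```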