Intermediate code generation

  1. INTERMEDIATE CODE GENERATION
  2. Structure of a Compiler
     • The front end of a compiler is efficient and can be automated
     • The back end is generally hard to automate, and finding the optimum solution requires exponential time
     • Intermediate code generation can affect the performance of the back end
     • Figure labels: Scanner, Parser, Semantic Analysis, Intermediate Code Generation, IR, Code Optimization, Instruction Selection, Instruction Scheduling, Register Allocation
  3. Overview
     • Goal: generate a machine-independent intermediate form that is suitable for optimization and portability
     • Facilitates retargeting: enables attaching a back end for a new machine to an existing front end
     • Enables machine-independent code optimization
  5. Motivation
     • What we have so far:
       • A parse tree
       • With all the program information
       • Known to be correct: well-typed, nothing missing, no ambiguities
     • What we need:
       • Something "executable"
       • Closer to an operations schedule and to the actual machine level
  6. What We Want
     • A representation that
       • Is closer to the actual machine
       • Is easy to manipulate
       • Is target neutral (hardware independent)
       • Can be interpreted
  7. Intermediate Languages
     • Syntax tree
       • A syntax tree depicts the hierarchical structure of the source program
       • A DAG is more compact
     • Postfix notation
       • Linearized representation of a syntax tree
       • Edges do not appear explicitly; they can be recovered
     • Three-address code
     • Control Flow Graphs (CFGs)
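The claim that postfix notation drops the edges but lets them be recovered can be illustrated with a small sketch. The `Leaf` and `Node` classes and both function names are hypothetical, introduced only for this example, and only binary operators are assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    name: str


@dataclass(frozen=True)
class Node:
    op: str
    left: object   # Leaf or Node
    right: object  # Leaf or Node


def to_postfix(tree):
    """Linearize a syntax tree: operands first, operator last."""
    if isinstance(tree, Leaf):
        return [tree.name]
    return to_postfix(tree.left) + to_postfix(tree.right) + [tree.op]


def from_postfix(tokens, operators=frozenset("+-*/")):
    """Recover the tree (and so its edges) with a stack."""
    stack = []
    for tok in tokens:
        if tok in operators:
            right = stack.pop()   # the right operand was pushed last
            left = stack.pop()
            stack.append(Node(tok, left, right))
        else:
            stack.append(Leaf(tok))
    return stack.pop()


# x + y * z as a syntax tree
tree = Node("+", Leaf("x"), Node("*", Leaf("y"), Leaf("z")))
postfix = to_postfix(tree)
rebuilt = from_postfix(postfix)
```

Here `postfix` is the token list `x y z * +`, and `rebuilt` compares equal to `tree`, which is the sense in which the edges are recoverable.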
  8. Recall ASTs and DAGs
     • Intermediate forms of a program:
       • ASTs: Abstract Syntax Trees
       • DAGs: Directed Acyclic Graphs
     • What is the expression?
     • Figure: an AST with node labels assign, a, +, *, *, b, c, uminus, b, c, uminus, and a DAG with node labels assign, a, +, *, b, c, uminus
  9. Representation
     • Two different forms:
       • Linked data structure
       • Multi-dimensional array
  10. Abstract Syntax Trees (ASTs)
      • Source:
          if (x < y)
            x = 5*y + 5*y/3;
          else
            y = 5;
          x = x+y;
      • Figure: an AST with the nodes Statements, IfStmt and three AssignStmt nodes, and the operators <, +, *, / over the leaves x, y, 5 and 3; the subtree 5*y appears twice
  11. Directed Acyclic Graphs (DAGs)
      • Use directed acyclic graphs to represent expressions
      • Use a unique node for each n
      • Source:
          if (x < y)
            x = 5*y + 5*y/3;
          else
            y = 5;
          x = x+y;
      • Figure: a DAG with the nodes Statements, IfStmt and three AssignStmt nodes, and the operators <, +, *, / over a single node each for x, y, 5 and 3; the node for 5*y appears once
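One common way to get the sharing shown in the DAG figure is to look up each node by its signature before creating it. The sketch below assumes that approach; the `DagBuilder` class and its method names are hypothetical and are not taken from the slides.

```python
class DagBuilder:
    """Builds expression DAGs by reusing any node with an identical signature."""

    def __init__(self):
        self.nodes = []   # node id -> signature tuple
        self.index = {}   # signature tuple -> node id

    def _intern(self, signature):
        # Create the node only if no node with this signature exists yet.
        if signature not in self.index:
            self.index[signature] = len(self.nodes)
            self.nodes.append(signature)
        return self.index[signature]

    def leaf(self, name):
        return self._intern(("leaf", name, None))

    def op(self, operator, left, right):
        return self._intern((operator, left, right))


# 5*y + 5*y/3, the right-hand side used on the slide
dag = DagBuilder()
five, y, three = dag.leaf("5"), dag.leaf("y"), dag.leaf("3")
first = dag.op("*", five, y)
second = dag.op("*", five, y)        # reuses the node built for `first`
quotient = dag.op("/", second, three)
total = dag.op("+", first, quotient)
```

The DAG ends up with six nodes (5, y, 3, *, /, +), whereas the corresponding syntax tree needs nine because 5*y is built twice.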
  12. Control Flow Graphs (CFGs)
      • Nodes in the control flow graph are basic blocks
      • A basic block is a sequence of statements that is always entered at the beginning of the block and exited at the end
      • Edges in the control flow graph represent the control flow
  13. CFG
      • Source:
          if (x < y)
            x = 5*y + 5*y/3;
          else
            y = 5;
          x = x+y;
      • Blocks in the figure:
          B0: if (x < y) goto B1 else goto B2
          B1: x = 5*y + 5*y/3
          B2: y = 5
          B3: x = x+y
      • Each block has a sequence of statements
      • No jump from or to the middle of the block
      • Once a block starts executing, it will execute till the end
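The slides do not say how basic blocks are found. A minimal sketch, assuming the standard "leader" rule (the first instruction, every jump target, and every instruction following a jump each start a block), is given here. The tuple encoding of instructions and both function names are hypothetical.

```python
def split_into_blocks(code):
    """Return basic blocks as lists of instruction indices."""
    leaders = {0} if code else set()
    for i, instr in enumerate(code):
        if instr[0] in ("if", "goto"):
            leaders.add(instr[-1])        # a jump target starts a block
            if i + 1 < len(code):
                leaders.add(i + 1)        # so does the instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


def successors(code, blocks):
    """Return the control flow edges as block number -> successor block numbers."""
    block_of = {blk[0]: n for n, blk in enumerate(blocks)}
    edges = {}
    for n, blk in enumerate(blocks):
        last = code[blk[-1]]
        out = []
        if last[0] in ("if", "goto"):
            out.append(block_of[last[-1]])        # taken branch
        if last[0] != "goto" and blk[-1] + 1 < len(code):
            out.append(block_of[blk[-1] + 1])     # fall through
        edges[n] = out
    return edges


# The if/else example, lowered by hand; jump targets are instruction indices.
code = [
    ("if", "x < y", 3),
    ("assign", "y = 5"),
    ("goto", 4),
    ("assign", "x = 5*y + 5*y/3"),
    ("assign", "x = x + y"),
]
blocks = split_into_blocks(code)
edges = successors(code, blocks)
```

This yields four blocks, matching the four blocks of the figure, although the numbering differs: in this lowering the else branch is block 1 and the then branch is block 2, and both flow into the final block.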
  14. Objective
      • Directly generate code from the AST or DAG as a side effect of the parsing process
      • Consider code below:
  15. Each is referred to as "3 Address Coding (3AC)", since there are at most 3 addresses per statement: one for the result and at most 2 for the operands
  16. The Intermediate Code Generation Machine
      • A machine with
        • An infinite number of temporaries
        • Simple instructions
          • 3 operands
          • Branching
          • Calls with a simple calling convention
        • Simple code structure
          • Array of instructions
          • Labels to define targets of branches
  17. Temporaries
      • The machine has an infinite number of temporaries
        • Call them t0, t1, t2, ...
      • Temporaries can hold values of any type
      • The type of the temporary is derived from the generation
      • Temporaries go out of scope with the function they are in
  18. What is Three-Address Coding?
      • A simple type of instruction
      • 3 / 2 operands x, y, z
      • Each operand could be
        • A literal
        • A variable
        • A temporary
      • Forms:
          x := y op z
          x := op z
      • Example: x + y * z becomes
          t0 := y * z
          t1 := x + t0
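The translation of x + y * z into t0 and t1 can be reproduced with a short recursive walk over the expression. This is a sketch under assumptions: expressions are nested tuples `(op, left, right)` with strings as leaves, and `gen_tac` is a hypothetical helper name.

```python
import itertools


def gen_tac(expr, code, temps):
    """Append three-address statements for expr to code; return the name holding its value."""
    if isinstance(expr, str):
        return expr                      # a variable or literal needs no code
    op, left, right = expr
    left_name = gen_tac(left, code, temps)
    right_name = gen_tac(right, code, temps)
    temp = f"t{next(temps)}"             # fresh temporary: t0, t1, t2, ...
    code.append(f"{temp} := {left_name} {op} {right_name}")
    return temp


code = []
result = gen_tac(("+", "x", ("*", "y", "z")), code, itertools.count())
```

After the call, `code` holds `t0 := y * z` followed by `t1 := x + t0`, and `result` is `t1`. Each generated statement names at most three addresses.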
  19. Types of Three Address Statements
      • Assignment statements of the form:
        • X := Y op Z
        • op is a binary arithmetic or logical operation
      • Assignment instructions of the form:
        • X := op Y
        • op is a unary operation such as unary minus, logical negation, or a shift/conversion operation
      • Copy statements of the form:
        • X := Y, where the value of Y is assigned to X
      • Unconditional jump of the form:
        • goto L, which goes to the three-address statement labeled L
  20. Types of Three Address Statements
      • Conditional jumps of the form:
        • if x relop y goto L
        • relop is a relational operator, and the goto is executed if x relop y is true
      • Parameter operations of the form:
        • param a (a parameter of the function)
        • call p, n (call function p with n parameters)
        • return y (return value y from the function; optional)
      • Example:
          param a
          param b
          param c
          call p, 3
  21. Types of Three Address Statements
      • Indexed assignments of the form:
        • X := Y[i] (set X to the i-th memory location of Y)
        • X[i] := Y (set the i-th memory location of X to Y)
        • Note the limit of 3 addresses (X, Y, i)
        • Cannot do: x[i] := y[j]; (4 addresses!)
      • Address and pointer assignments of the form:
        • X := & Y (X set to the address of Y)
        • X := * Y (X set to the contents pointed to by Y)
        • * X := Y (the location pointed to by X set to the value of Y)
  22. Three Address Code Representations
      • Data structures for the representation of TAC can be objects or records with fields for the operator and operands
      • Representations include quadruples, triples and indirect triples
  23. Quadruples
      • In the quadruple representation, there are four fields for each instruction: op, arg1, arg2, result
      • Binary ops have the obvious representation
      • Unary ops don't use arg2
      • Operators like param don't use either arg2 or result
      • Jumps put the target label into result
      • The quadruples in Fig (b) implement the three-address code in (a) for the expression a = b * - c + b * - c
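Since the figure itself is not reproduced in the transcript, here is a sketch of what a quadruple table for a = b * - c + b * - c could look like. The temporary names t1 to t5, the operator spellings `minus` and `=`, and the `show` helper are assumptions made for this example, not values read from the slide.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: Optional[str]
    arg2: Optional[str]      # unused (None) for unary ops and copies
    result: Optional[str]


quads = [
    Quad("minus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("minus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad("=", "t5", None, "a"),
]


def show(q):
    """Render one quadruple as a three-address statement."""
    if q.op == "=":
        return f"{q.result} = {q.arg1}"
    if q.arg2 is None:
        return f"{q.result} = {q.op} {q.arg1}"
    return f"{q.result} = {q.arg1} {q.op} {q.arg2}"


listing = [show(q) for q in quads]
```

Because every quadruple names its own result, instructions can be reordered without rewriting their operands.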
  24. Quadruples
  25. Triples
      • A triple has only three fields for each instruction: op, arg1, arg2
      • The result of an operation x op y is referred to by its position
      • Triples are equivalent to signatures of nodes in a DAG or syntax tree
      • Triples and DAGs are equivalent representations only for expressions; they are not equivalent for control flow
      • Ternary operations like x[i] = y require two entries in the triple structure; similarly for x = y[i]
      • Moving an instruction around during optimization is a problem
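To make "referred to by its position" concrete, the sketch below encodes a = b * - c + b * - c as triples and evaluates them. The encoding is an assumption for this example: an integer operand means "the result of the triple at that position", a string operand is a variable name, and `run` is a hypothetical helper.

```python
triples = [
    ("minus", "c", None),   # (0)
    ("*", "b", 0),          # (1) second operand is the result of triple (0)
    ("minus", "c", None),   # (2)
    ("*", "b", 2),          # (3)
    ("+", 1, 3),            # (4)
    ("=", "a", 4),          # (5) store the result of triple (4) into a
]


def run(triples, env):
    """Evaluate the triples in list order; values[i] is the result of triple i."""
    values = []

    def val(arg):
        return values[arg] if isinstance(arg, int) else env[arg]

    for op, arg1, arg2 in triples:
        if op == "minus":
            values.append(-val(arg1))
        elif op == "*":
            values.append(val(arg1) * val(arg2))
        elif op == "+":
            values.append(val(arg1) + val(arg2))
        elif op == "=":
            env[arg1] = val(arg2)
            values.append(env[arg1])
        else:
            raise ValueError(f"unknown operator {op!r}")
    return env


env = run(triples, {"b": 3, "c": 4})
```

With b = 3 and c = 4 the final value of a is -24. The positional references are also why moving a triple is awkward: every later triple that mentions its position would have to be renumbered.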
  26. Representations of a = (b * - c) + (b * - c)
  27. Indirect Triples
      • These consist of a listing of pointers to triples, rather than a listing of the triples themselves
      • An optimizing compiler can move an instruction by reordering the instruction list, without affecting the triples themselves
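A small sketch of that reordering follows, under the assumption that the instruction list is a Python list of indices into a triple table and that results are stored by triple index rather than by execution position. The names `order`, `moved` and `run_indirect` are hypothetical.

```python
triples = (
    ("minus", "c", None),   # 0
    ("*", "b", 0),          # 1
    ("minus", "c", None),   # 2
    ("*", "b", 2),          # 3
    ("+", 1, 3),            # 4
    ("=", "a", 4),          # 5
)


def run_indirect(triples, order, env):
    """Execute triples in the sequence given by order (a list of triple indices)."""
    values = {}   # triple index -> computed value

    def val(arg):
        return values[arg] if isinstance(arg, int) else env[arg]

    for idx in order:
        op, arg1, arg2 = triples[idx]
        if op == "minus":
            values[idx] = -val(arg1)
        elif op == "*":
            values[idx] = val(arg1) * val(arg2)
        elif op == "+":
            values[idx] = val(arg1) + val(arg2)
        elif op == "=":
            env[arg1] = val(arg2)
        else:
            raise ValueError(f"unknown operator {op!r}")
    return env


order = [0, 1, 2, 3, 4, 5]
moved = [0, 2, 1, 3, 4, 5]   # triple 2 hoisted above triple 1; the table is untouched

original = run_indirect(triples, order, {"b": 3, "c": 4})
reordered = run_indirect(triples, moved, {"b": 3, "c": 4})
```

Both orders compute the same value for a, and only the instruction list changed. The swap is legal here because triples 1 and 2 do not depend on each other; a real optimizer would have to check that.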
  29. Attribute Grammar for Assignments
      • Concepts:
        • Need to introduce temporary variables as necessary to decompose the assignment statement
        • Every generated line of code must have at most 3 addresses!
  30. Declarations
      • Stack utilized during procedure/function calls to allocate space for variables
        • This now includes temporaries for 3AC
      • We need to track
        • Name
        • Type (int, real, boolean, etc.)
        • Offset (with respect to some relative address)
      • Function
        • enter(name, type, offset) creates a symbol table entry
        • offset is global, initially 0
  31. Storage Layout for Local Names
      • The type and relative address are saved in the symbol-table entry for the name
      • The width of a type is the number of storage units needed for objects of that type
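The bookkeeping described on these two slides can be sketched as follows. The widths in `WIDTH`, the `SymbolTable` class and the `declare` helper are assumptions for illustration; only `enter(name, type, offset)` and an offset that starts at 0 come from the slides, and alignment is ignored.

```python
WIDTH = {"int": 4, "real": 8, "boolean": 1}   # assumed widths in storage units


class SymbolTable:
    def __init__(self):
        self.entries = {}   # name -> (type, relative address)
        self.offset = 0     # next free relative address, initially 0

    def enter(self, name, type_, offset):
        """Create a symbol table entry, as in enter(name, type, offset)."""
        if name in self.entries:
            raise KeyError(f"{name} is already declared")
        self.entries[name] = (type_, offset)

    def declare(self, name, type_):
        """Enter name at the current offset, then advance by the type's width."""
        self.enter(name, type_, self.offset)
        self.offset += WIDTH[type_]


table = SymbolTable()
for name, type_ in [("i", "int"), ("x", "real"), ("flag", "boolean"), ("t0", "int")]:
    table.declare(name, type_)
```

With these assumed widths, i sits at offset 0, x at 4, flag at 12 and the temporary t0 at 13, for a total of 17 storage units. Temporaries are allocated through the same path as declared variables.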
  32. Code Generation for Boolean Expressions
      • Two approaches
        • Numerical representation
        • Implicit representation
      • Numerical representation
        • Use 1 to represent true, use 0 to represent false
        • For three-address code, store this result in a temporary
        • For stack machine code, store this result on the stack
  33. Code Generation for Boolean Expressions
      • Implicit representation
        • Boolean expressions used in flow-of-control statements (such as if-statements and while-statements) do not have to explicitly compute a value; they just need to branch to the right instruction
        • Generate code for boolean expressions that branches to the appropriate instruction based on the result of the boolean expression
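A minimal sketch of the implicit representation is given here, assuming the usual scheme in which each subexpression is translated with a "true" target and a "false" target. The tuple encoding of expressions, the label names and the `jump_code` helper are hypothetical.

```python
import itertools


def jump_code(expr, true_label, false_label, code, labels):
    """Emit jumps that reach true_label or false_label; no 0/1 value is computed."""
    kind = expr[0]
    if kind == "rel":
        code.append(f"if {expr[1]} goto {true_label}")
        code.append(f"goto {false_label}")
    elif kind == "or":
        middle = f"L{next(labels)}"       # where the right operand starts
        jump_code(expr[1], true_label, middle, code, labels)
        code.append(f"{middle}:")
        jump_code(expr[2], true_label, false_label, code, labels)
    elif kind == "and":
        middle = f"L{next(labels)}"
        jump_code(expr[1], middle, false_label, code, labels)
        code.append(f"{middle}:")
        jump_code(expr[2], true_label, false_label, code, labels)
    else:
        raise ValueError(f"unknown node {kind!r}")


# a < b or (c < d and e < f)
expr = ("or", ("rel", "a < b"), ("and", ("rel", "c < d"), ("rel", "e < f")))
code = []
jump_code(expr, "Ltrue", "Lfalse", code, itertools.count(1))
```

The output contains only conditional and unconditional jumps plus two internal labels; a true `a < b` jumps straight to `Ltrue` without evaluating the other comparisons.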
  34. Generated Code
      • Consider: a < b or c < d and e < f
          100: if a < b goto 103
          101: t1 := 0
          102: goto 104
          103: t1 := 1
          104: if c < d goto 107
          105: t2 := 0
          106: goto 108
          107: t2 := 1
          108: if e < f goto 111
          109: t3 := 0
          110: goto 112
          111: t3 := 1
          112: t4 := t2 and t3
          113: t5 := t1 or t4
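The listing above follows a regular pattern: four statements per relational test, then one statement per logical operator. The sketch below regenerates it under the assumption that expressions are nested tuples and that statement numbers start at 100; `gen_bool` is a hypothetical helper name.

```python
import itertools


def gen_bool(expr, code, temps, start=100):
    """Numerical representation: return the temporary that holds 1 (true) or 0 (false)."""
    if expr[0] == "rel":
        temp = f"t{next(temps)}"
        n = start + len(code)             # number of the next statement
        code.extend([
            f"{n}: if {expr[1]} goto {n + 3}",
            f"{n + 1}: {temp} := 0",
            f"{n + 2}: goto {n + 4}",
            f"{n + 3}: {temp} := 1",
        ])
        return temp
    op, left, right = expr
    left_temp = gen_bool(left, code, temps, start)
    right_temp = gen_bool(right, code, temps, start)
    temp = f"t{next(temps)}"
    code.append(f"{start + len(code)}: {temp} := {left_temp} {op} {right_temp}")
    return temp


# "and" binds tighter than "or": a < b or (c < d and e < f)
expr = ("or", ("rel", "a < b"), ("and", ("rel", "c < d"), ("rel", "e < f")))
code = []
result = gen_bool(expr, code, itertools.count(1))
```

The generated statements run from 100 to 113 and end with `t5 := t1 or t4`, as in the listing. Every comparison is evaluated, in contrast to the implicit representation.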
