2. Structure of a Compiler
The front end of a compiler is efficient and can be
automated
The back end is generally hard to automate, and finding
an optimal solution requires exponential time
Intermediate code generation can affect the
performance of the back end
[Figure: compiler pipeline. Front end: Scanner, Parser, Semantic Analysis, Intermediate Code Generation. The IR passes through Code Optimization. Back end: Instruction Selection, Instruction Scheduling, Register Allocation.]
3. Overview
Goal: Generate a Machine Independent
Intermediate Form that is Suitable for Optimization
and Portability
Facilitates retargeting: enables attaching a back
end for the new machine to an existing front end
Enables machine-independent code optimization
5. Motivation
What we have so far...
A parse tree
With all the program information
Known to be correct
Well-typed
Nothing missing
No ambiguities
What we need...
Something “Executable”
Closer to an operations schedule
Closer to the actual machine level
6. What We Want
A Representation that
Is closer to the actual machine
Is easy to manipulate
Is target neutral (hardware
independent)
Can be interpreted
7. Intermediate Languages
Syntax Tree
A syntax tree shows the hierarchical structure of the
source program.
A DAG is a more compact form of the same structure.
Postfix Notation
Linearized representation of a syntax tree.
Edges do not appear explicitly. They can be
recovered.
Three address code
Control Flow Graphs (CFGs)
8. Recall ASTs and DAGs
Intermediate Forms of a Program:
ASTs: Abstract Syntax Trees
DAGs: Directed Acyclic Graphs
What is the Expression?
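[Figure: an AST (first) and a DAG (second) for the same assignment, built from the nodes assign, a, +, *, b, uminus, c. The AST contains two * subtrees, each with operands b and uminus c; the DAG contains that subtree once and shares it.]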
10. Abstract Syntax Trees (ASTs)
if (x < y)
x = 5*y + 5*y/3;
else
y = 5;
x = x+y;
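[Figure: AST for the code above. A Statements root holds an IfStmt and the AssignStmt for x = x+y. The IfStmt holds the condition x < y and one AssignStmt per branch. The subexpression 5*y appears twice.]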
11. Directed Acyclic Graphs (DAGs)
Use directed acyclic graphs to represent expressions
Use a unique node for each n
if (x < y)
x = 5*y + 5*y/3;
else
y = 5;
x = x+y;
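[Figure: DAG for the code above, with the same Statements, IfStmt, and AssignStmt nodes as the AST. Repeated operands and the repeated subexpression 5*y appear once and are shared.]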
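One way to obtain a unique node per subexpression is to look up each node's signature (operator plus children) in a table before creating it. The sketch below assumes that approach; the class name DagBuilder and its fields are hypothetical and are not taken from the slides.

```python
class DagBuilder:
    """Build expression DAGs by reusing nodes with identical signatures."""

    def __init__(self):
        self.nodes = []   # node id -> signature (op, child ids...)
        self.index = {}   # signature -> node id

    def node(self, op, *children):
        key = (op, *children)
        if key not in self.index:      # create a node only the first time
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

# Right-hand side of: x = 5*y + 5*y/3
dag = DagBuilder()
five, y, three = dag.node("5"), dag.node("y"), dag.node("3")
left = dag.node("*", five, y)
right = dag.node("/", dag.node("*", five, y), three)  # reuses the 5*y node
root = dag.node("+", left, right)
print(len(dag.nodes), "nodes")
```

A plain tree for the same expression would need nine nodes, since 5, y, and 5*y would each be duplicated.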
12. Control Flow Graphs (CFGs)
Nodes in the control flow graph are basic blocks
A basic block is a sequence of statements
always entered at the beginning of the
block and exited at the end
Edges in the control flow graph represent the control flow
13. CFG
if (x < y)
x = 5*y + 5*y/3;
else
y = 5;
x = x+y;
B0: if (x < y) goto B1 else goto B2
B1: x = 5*y + 5*y/3
B2: y = 5
B3: x = x+y
• Each block has a sequence of statements
• No jump from or to the middle of the block
• Once a block starts executing, it will execute till the end
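The block properties above suggest a simple way to split a linear instruction list into basic blocks: a block starts at the first instruction, at every jump target, and right after every jump. The following is a minimal sketch under that assumption. The (text, target) encoding, where target is the index of the jump destination, and the function names are hypothetical.

```python
def find_leaders(code):
    """Return the sorted indices of instructions that start a basic block."""
    leaders = {0}                        # the first instruction starts a block
    for i, (_, target) in enumerate(code):
        if target is not None:
            leaders.add(target)          # a jump target starts a block
            if i + 1 < len(code):
                leaders.add(i + 1)       # so does the instruction after a jump
    return sorted(leaders)

def basic_blocks(code):
    """Split code into maximal runs that begin at a leader."""
    bounds = find_leaders(code) + [len(code)]
    return [code[start:end] for start, end in zip(bounds, bounds[1:])]

# The if/else example, with jumps expressed as instruction indices.
code = [
    ("if x >= y goto 3", 3),      # 0
    ("x = 5*y + 5*y/3", None),    # 1
    ("goto 4", 4),                # 2
    ("y = 5", None),              # 3
    ("x = x + y", None),          # 4
]
blocks = basic_blocks(code)
for number, block in enumerate(blocks):
    print(f"B{number}:", [text for text, _ in block])
```

The four resulting blocks correspond to the condition, the two branches, and the join statement.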
15. Each is Referred to as “3 Address Coding (3AC)”
since there are at Most 3 Addresses per Statement
One for Result and At Most 2 for Operands
16. The Intermediate Code
Generation Machine
A Machine with
Infinite number of temporaries
Simple instructions
3-operands
Branching
Calls with simple calling convention
Simple code structure
Array of instructions
Labels to define targets of branches.
17. Temporaries
The machine has an infinite number of temporaries
Call them t0, t1, t2, ....
Temporaries can hold values of any type
The type of a temporary is derived from the
instruction that generates it
Temporaries go out of scope with the function
they are in
18. What is Three-Address
Coding?
A simple type of instruction
Two or three operands: x, y, z
Each operand could be
A literal
A variable
A temporary
Forms
x := y op z
x := op z
Example: x + y * z translates to
t0 := y * z
t1 := x + t0
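The decomposition of x + y * z into temporaries can be automated with a post-order walk over the expression tree. The sketch below is one possible way to do it, not the slides' own algorithm. It borrows Python's standard ast module only as a ready-made expression parser; the names gen_tac and OPS are assumptions.

```python
import ast

# Map Python AST operator classes to the symbols used in the slides.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def gen_tac(expr):
    """Translate an arithmetic expression into three-address statements."""
    code = []

    def visit(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp):
            left = visit(node.left)      # operands first (post-order)
            right = visit(node.right)
            temp = f"t{len(code)}"       # one fresh temporary per operation
            code.append(f"{temp} := {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    visit(ast.parse(expr, mode="eval").body)
    return code

tac = gen_tac("x + y * z")
print("\n".join(tac))
```

Because every operation gets a fresh temporary, the generator never needs to decide when a temporary can be reused; that is left to later phases.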
19. Types of Three Address
Statements
Assignment Statements of Form:
X := Y op Z
op is a Binary Arithmetic or Logical Operation
Assignment Instructions of Form:
X := op Y
op is a Unary Operation such as Unary Minus, Logical
Negation, Shift/Conversion Operations
Copy Statements of Form:
X := Y where value of Y assigned to X
Unconditional Jump of Form:
goto L which goes to a three address statement labeled
with L
20. Types of Three Address
Statements
Conditional Jumps of Form:
if x relop y goto L
with relop as relational operators and the goto executed if
the x relop y is true
Parameter Operations of Form:
param a (a parameter of function)
call p, n (call function p with n parameters)
return y (return value y from function – optional)
param a
param b
param c
call p, 3
21. Types of Three Address
Statements
Indexed Assignments of Form:
X := Y[i] (Set X to i-th memory location of Y)
X[i] := Y (Set i-th memory location of X to Y)
Note the limit of 3 Addresses (X, Y, i)
Cannot do: x[i] := y[j]; (4 addresses!)
Address and Pointer Assignments of Form:
X := & Y (X set to the Address of Y)
X := * Y (X set to the contents pointed to by Y)
* X := Y (Contents of X set to Value of Y)
22. Three Address Code
Representations
Data structures for representation of TAC can be objects or
records with fields for operator and operands.
Representations include quadruples, triples and indirect
triples.
23. Quadruples
In the quadruple representation, there are four
fields for each instruction: op, arg1, arg2, result
Binary ops have the obvious representation
Unary ops don’t use arg2
Operators like param don’t use either arg2 or
result
Jumps put the target label into result
The quadruples in Fig (b) implement the three-
address code in (a) for the expression a = b * - c +
b * - c
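The figure itself is not reproduced in this text, so here is a sketch of what quadruples for a = b * - c + b * - c could look like, following the field conventions listed above. The temporary names t1 to t5, the operator name uminus, and the small run helper are assumptions added for illustration.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")

# a = b * - c + b * - c
quads = [
    Quad("uminus", "c",  None, "t1"),   # unary op: arg2 unused
    Quad("*",      "b",  "t1", "t2"),
    Quad("uminus", "c",  None, "t3"),
    Quad("*",      "b",  "t3", "t4"),
    Quad("+",      "t2", "t4", "t5"),
    Quad("=",      "t5", None, "a"),    # copy: arg2 unused
]

def run(quads, env):
    """Interpret the quadruples over a copy of env and return the result."""
    env = dict(env)
    for q in quads:
        if q.op == "uminus":
            env[q.result] = -env[q.arg1]
        elif q.op == "*":
            env[q.result] = env[q.arg1] * env[q.arg2]
        elif q.op == "+":
            env[q.result] = env[q.arg1] + env[q.arg2]
        elif q.op == "=":
            env[q.result] = env[q.arg1]
    return env

print(run(quads, {"b": 2, "c": 3})["a"])
```

Each result has an explicit name, so an instruction can be moved without rewriting the instructions that use its result.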
25. Triples
A triple has only three fields for each instruction: op, arg1,
arg2
The result of an operation x op y is referred to by its
position.
Triples are equivalent to signatures of nodes in a DAG or
syntax tree.
Triples and DAGs are equivalent representations only for
expressions; they are not equivalent for control flow.
Ternary operations like x[i] = y require two entries in the
triple structure, similarly for x = y[i].
Moving an instruction during optimization is a problem:
results are referred to by position, so every reference to a
moved instruction must be updated
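As an illustration of positional references, the sketch below encodes a = b * - c + b * - c as triples. The encoding is an assumption made for this example: an integer argument refers to the result of the triple at that position, and a string argument is a variable name. The helper run_triples is hypothetical.

```python
# a = b * - c + b * - c
triples = [
    ("uminus", "c", None),   # (0)
    ("*",      "b", 0),      # (1) uses the result of (0)
    ("uminus", "c", None),   # (2)
    ("*",      "b", 2),      # (3) uses the result of (2)
    ("+",      1,   3),      # (4)
    ("=",      "a", 4),      # (5) assign the result of (4) to a
]

def run_triples(triples, env):
    """Interpret the triples in order; results are stored by position."""
    env = dict(env)
    values = []

    def val(arg):
        return values[arg] if isinstance(arg, int) else env[arg]

    for op, arg1, arg2 in triples:
        if op == "uminus":
            values.append(-val(arg1))
        elif op == "*":
            values.append(val(arg1) * val(arg2))
        elif op == "+":
            values.append(val(arg1) + val(arg2))
        elif op == "=":
            env[arg1] = val(arg2)
            values.append(env[arg1])
    return env

print(run_triples(triples, {"b": 2, "c": 3})["a"])
```

Swapping two entries of this list would silently change what the integer arguments refer to.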
27. Indirect Triples
These consist of a listing of pointers to triples, rather than a listing of
the triples themselves.
An optimizing compiler can move an instruction by reordering the
instruction list, without affecting the triples themselves.
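A minimal sketch of this idea, with list indices standing in for pointers: the triples stay where they are, and only the separate instruction list is reordered. The integer-means-position encoding and the names order and run_indirect are assumptions for illustration.

```python
# Triples for (b * c) - (d + e); integer arguments refer to triple positions.
triples = [
    ("*", "b", "c"),   # (0)
    ("+", "d", "e"),   # (1)
    ("-", 0,   1),     # (2)
]
order = [0, 1, 2]       # instruction list: indices into triples

def run_indirect(order, triples, env):
    """Execute triples in the sequence given by the instruction list."""
    values = {}

    def val(arg):
        return values[arg] if isinstance(arg, int) else env[arg]

    for idx in order:
        op, arg1, arg2 = triples[idx]
        x, y = val(arg1), val(arg2)
        values[idx] = x * y if op == "*" else x + y if op == "+" else x - y
    return values[order[-1]]

env = {"b": 2, "c": 3, "d": 4, "e": 5}
reordered = [1, 0, 2]   # schedule (1) before (0); triples are untouched
print(run_indirect(order, triples, env), run_indirect(reordered, triples, env))
```

Both schedules compute the same value because the references inside the triples still name the same entries.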
29. Attribute Grammar for
Assignments
Concepts:
Need to Introduce Temporary Variables as
Necessary to Decompose Assignment Statement
Every Generated Line of Code Must have at Most
3 Addresses!
30. Declarations
Stack Utilized during Procedure/Function Calls to
Allocate Space for Variables
This now Includes Temporaries for 3AC
We need to Track
Name
Type (Int, real, boolean, etc.)
Offset (with respect to some relative address)
Function
enter(name, type, offset) creates a symbol table entry
offset is a global variable, initially 0
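The bookkeeping described above can be sketched as follows. The type widths in WIDTH, the dictionary-based table, and the declare helper are assumptions chosen for illustration; the slides only specify that enter records a name, a type, and an offset, and that offset starts at 0.

```python
# Assumed widths in storage units; real values depend on the target machine.
WIDTH = {"int": 4, "real": 8, "boolean": 1}

def enter(table, name, type_, offset):
    """Create a symbol table entry for name."""
    table[name] = {"type": type_, "offset": offset}

def declare(table, declarations):
    """Enter each (name, type) pair at the next free relative address."""
    offset = 0                        # offset is initially 0
    for name, type_ in declarations:
        enter(table, name, type_, offset)
        offset += WIDTH[type_]        # advance by the width of the type
    return offset                     # total space needed

symtab = {}
total = declare(symtab, [("i", "int"), ("x", "real"), ("t0", "int")])
for name, entry in symtab.items():
    print(name, entry["type"], entry["offset"])
print("total:", total)
```

Temporaries such as t0 are entered the same way as declared variables, so they receive space in the same layout.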
31. Storage Layout for Local
Names
The type and relative address are saved in the symbol-table
entry for the name.
The width of a type is the number of storage units needed for
objects of that type.
32. Code Generation for Boolean
Expressions
Two approaches
Numerical representation
Implicit representation
Numerical representation
Use 1 to represent true, use 0 to represent
false
For three-address code store this result in a
temporary
For stack machine code store this result in the
stack
33. Code Generation for Boolean
Expressions
Implicit representation
Boolean expressions used in flow-of-control
statements (such as if-statements and
while-statements) do not have to compute a
value explicitly; they only need to branch
to the right instruction
Generate code for boolean expressions
which branch to the appropriate instruction
based on the result of the boolean
expression
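One common way to realize the implicit representation is to pass each subexpression a true target and a false target and emit only jumps. The sketch below follows that scheme under stated assumptions: expressions are nested tuples, the labels Ltrue, Lfalse, L1, L2 are made up, and the output is not optimized (redundant gotos are left in).

```python
import itertools

def gen_jumps(expr, true_lbl, false_lbl, code, labels):
    """Emit jumping code for expr; no boolean value is ever stored."""
    op = expr[0]
    if op == "or":
        mid = next(labels)     # where to continue if the left side is false
        gen_jumps(expr[1], true_lbl, mid, code, labels)
        code.append(f"{mid}:")
        gen_jumps(expr[2], true_lbl, false_lbl, code, labels)
    elif op == "and":
        mid = next(labels)     # where to continue if the left side is true
        gen_jumps(expr[1], mid, false_lbl, code, labels)
        code.append(f"{mid}:")
        gen_jumps(expr[2], true_lbl, false_lbl, code, labels)
    else:                      # relational operator: (relop, x, y)
        code.append(f"if {expr[1]} {op} {expr[2]} goto {true_lbl}")
        code.append(f"goto {false_lbl}")

# a < b or c < d and e < f   (and binds tighter than or)
expr = ("or", ("<", "a", "b"),
              ("and", ("<", "c", "d"), ("<", "e", "f")))
code = []
labels = (f"L{i}" for i in itertools.count(1))
gen_jumps(expr, "Ltrue", "Lfalse", code, labels)
print("\n".join(code))
```

With this scheme the right operand of or is skipped when the left operand is true, and the right operand of and is skipped when the left operand is false.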
34. Generated Code
Consider: a < b or c < d and e < f
100: if a < b goto 103
101: t1 := 0
102: goto 104
103: t1 := 1
104: if c < d goto 107
105: t2 := 0
106: goto 108
107: t2 := 1
108: if e < f goto 111
109: t3 := 0
110: goto 112
111: t3 := 1
112: t4 := t2 and t3
113: t5 := t1 or t4