Code generator

Agenda
• Introduction
• Code Generation
• Issues in the Design of a Code Generator
• The Target Language
• Addresses in the Target Code
• Basic Blocks and Flow Graphs
• Optimization of Basic Blocks
• Next…
09/03/12 2

Introduction
• The final phase in our compiler model

Source program → Front End → intermediate code → Code Optimizer → intermediate code → Code Generator → target program

Position Of a Code Generator

Code Generation
• This phase generates the target code, which consists of
assembly code.
1. Memory locations are selected for each variable;

2. Intermediate-code instructions are translated into a sequence of assembly instructions;

3. Variables and intermediate results are assigned to machine registers.

Issues in the Design of a Code Generator
• Input to the Code Generator
• The Target Program
• Instruction Selection
• Register Allocation
• Evaluation Order

Input to the Code Generator
• The input to the code generator is
– The intermediate representation of the source program
produced by the frontend along with the symbol table.

• Choices for the IR
– Three-address representations
– Virtual machine representations
– Linear representations
– Graphical representation

The Target Program
• The instruction-set architecture of the target machine
has a significant impact on the difficulty of constructing a
good code generator that produces high-quality machine
code.
• The most common target-machine architectures are
RISC, CISC, and stack-based.
– A RISC machine typically has many registers, three-address
instructions, simple addressing modes, and a relatively simple
instruction-set architecture.

Contd…
– A CISC machine typically has few registers, two-address
instructions, a variety of addressing modes, several register
classes, variable-length instructions, and instructions with side
effects.

– In a stack-based machine, operations are done by pushing
operands onto a stack and then performing the operations on
the operands at the top of the stack.

Instruction Selection
• Instruction selection is important to obtain
efficient code
• Suppose we translate the three-address code
    x := y + z
to:
    MOV y,R0
    ADD z,R0
    MOV R0,x

• With the same scheme, a := a + 1 becomes:
    MOV a,R0
    ADD #1,R0
    MOV R0,a
    Cost = 6

• Better:
    ADD #1,a
    Cost = 3

• Better still:
    INC a
    Cost = 2
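
The choice between these sequences can be sketched in Python. This is only an illustration: the helper name `select` is hypothetical, it handles addition only, and it assumes the two-address MOV/ADD/INC instructions used in the example above.

```python
def select(dest, left, op, right):
    """Translate dest := left op right into two-address code.

    Hypothetical helper for illustration; only '+' is supported.
    """
    if op != "+":
        raise ValueError("this sketch only handles '+'")
    # Special case: a := a + 1 maps to a single increment instruction.
    if dest == left and right == "1":
        return [f"INC {dest}"]
    # Integer constants are written as immediates with a leading '#'.
    src = f"#{right}" if right.isdigit() else right
    # Special case: a := a + k can add straight into the destination.
    if dest == left:
        return [f"ADD {src},{dest}"]
    # General case: load, add, and store through register R0.
    return [f"MOV {left},R0", f"ADD {src},R0", f"MOV R0,{dest}"]
```

Checking the cheap special cases before the general load/add/store pattern is what lets the selector pick the lower-cost sequence.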

Register Allocation
• A key problem in code generation is deciding
which values to hold in which registers.
• Efficient utilization of registers is particularly important.
• The use of registers is often subdivided into two
subproblems:
1. Register allocation, during which we select the set of variables
that will reside in registers at each point in the program.
2. Register assignment, during which we pick the specific
register that a variable will reside in.

Contd…
• Finding an optimal assignment of registers to
variables is difficult, even for single-register
machines.
• Mathematically, the problem is NP-complete.

Evaluation Order
• The order in which computations are performed
can affect the efficiency of the target code.

• Some computation orders require fewer
registers to hold intermediate results than
others.

• However, picking the best order in the general
case is an NP-complete problem.

The Target Language
• A Simple Target Machine Model
• Program and Instruction Costs

A Simple Target Machine Model
• Our target computer models a three-address
machine with load and store operations,
computation operations, jump operations, and
conditional jumps.

• The underlying computer is a byte-addressable
machine with n general-purpose registers.

Contd…
• Assume the following kinds of instructions are
available:
– Load operations
– Store operations
– Computation operations
– Unconditional jumps
– Conditional jumps

Contd…
• Assume a variety of addressing modes:
– A variable name x, referring to the memory location
that is reserved for x.
– An indexed address, a(r), where a is a variable and r is a
register.
– A memory location can also be an integer indexed by a register, for
example, LD R1, 100(R2).
– Two indirect addressing modes: *r and *100(r).
– An immediate constant addressing mode, where the constant is prefixed by #.

A Simple Target Machine Model
• Example :

x = y - z ⇒ LD R1, y
            LD R2, z
            SUB R1, R1, R2
            ST x, R1

x = *p ⇒ LD R1, p
         LD R2, 0(R1)
         ST x, R2

b = a[i] ⇒ LD R1, i
           MUL R1, R1, 8
           LD R2, a(R1)
           ST b, R2

*p = y ⇒ LD R1, p
         LD R2, y
         ST 0(R1), R2

a[j] = c ⇒ LD R1, c
           LD R2, j
           MUL R2, R2, 8
           ST a(R2), R1

if x < y goto L ⇒ LD R1, x
                  LD R2, y
                  SUB R1, R1, R2
                  BLTZ R1, L
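
Two of these translations can be sketched as small Python generators. The function names `gen_binary` and `gen_array_load` are hypothetical, registers R1 and R2 are hard-wired as in the examples, and the element width of 8 bytes is taken from the `MUL R1, R1, 8` step above.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}

def gen_binary(x, y, op, z):
    """Emit code for x = y op z using registers R1 and R2."""
    return [
        f"LD R1, {y}",                 # load the first operand
        f"LD R2, {z}",                 # load the second operand
        f"{OPCODES[op]} R1, R1, R2",   # compute into R1
        f"ST {x}, R1",                 # store the result
    ]

def gen_array_load(b, a, i, width=8):
    """Emit code for b = a[i], with elements `width` bytes wide."""
    return [
        f"LD R1, {i}",                 # load the index
        f"MUL R1, R1, {width}",        # scale it to a byte offset
        f"LD R2, {a}(R1)",             # indexed load from a
        f"ST {b}, R2",                 # store into b
    ]
```

A real code generator would choose registers dynamically; fixing them here keeps the output identical to the hand-written examples.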

Program and Instruction Costs
• For simplicity, we take the cost of an instruction
to be one plus the costs associated with the
addressing modes of the operands.

• Addressing modes involving registers have zero
additional cost, while those involving a memory
location or constant have an additional
cost of one.

Contd…
• For example,
– LD R0, R1 cost = 1
– LD R0, M cost = 2
– LD R1, *100(R2) cost = 3
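
The cost rule can be sketched in Python for the simplest operand kinds. This is a partial model under stated assumptions: the function names are hypothetical, registers are assumed to be written R0, R1, and so on, and indexed or indirect operands such as *100(R2) are deliberately rejected rather than costed.

```python
import re

REGISTER = re.compile(r"R\d+$")

def operand_cost(operand):
    """Return 0 for a register, 1 for a memory name or a '#constant'."""
    if "(" in operand or operand.startswith("*"):
        raise ValueError("indexed and indirect modes are not modelled here")
    return 0 if REGISTER.match(operand) else 1

def instruction_cost(instruction):
    """One for the instruction itself plus the cost of each operand."""
    _, _, rest = instruction.partition(" ")
    operands = [o.strip() for o in rest.split(",") if o.strip()]
    return 1 + sum(operand_cost(o) for o in operands)

def program_cost(instructions):
    """Total cost of a sequence of instructions."""
    return sum(instruction_cost(i) for i in instructions)
```

Applied to the sequence MOV a,R0; ADD #1,R0; MOV R0,a, `program_cost` gives 6, matching the cost quoted in the instruction-selection example.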

Addresses in the Target Code
• Static Allocation
• Stack Allocation
• Run-Time Addresses for Names

Static Allocation
• Focus on the following three-address
statements:
– Call callee
– Return
– Halt
– Action

Static Allocation
• A call stores the return address in the callee's static area
and transfers control to the callee; a return branches back
through that saved address.

• Call
ST callee.staticArea , #here + 20
BR callee.codeArea

• Return
BR *callee.staticArea

Runtime Addresses for Names
• Assumption: a name in a three-address
statement is really a pointer to a symbol-
table entry for that name.

• Note that names must eventually be
replaced by code to access storage
locations

Contd…
• Example:
– x=0
– Suppose the symbol-table entry for x contains a relative address of 12
– x is in a statically allocated area beginning at address static
– The actual run-time address of x is static + 12
– The actual assignment: static[12] = 0
– For a static area starting at address 100: LD 112, #0
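
The address arithmetic in this example can be written out as a tiny Python sketch. The function name `runtime_address` and the dictionary used as a symbol table are illustrative assumptions, not part of the original material.

```python
def runtime_address(static_base, symbol_table, name):
    """Return the absolute address of a statically allocated name.

    symbol_table maps each name to its relative address (offset).
    """
    return static_base + symbol_table[name]

# Values from the example: x has relative address 12, static area at 100.
symbol_table = {"x": 12}
address_of_x = runtime_address(100, symbol_table, "x")
```

With these values `address_of_x` is 112, the address that appears in the final instruction of the example.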

Basic Blocks and Flow Graphs
• Basic Blocks
• Next-Use Information
• Flow Graphs
• Representation of Flow Graphs
• Loops

Basic Blocks
• Algorithm: Partitioning three-address instructions
into basic blocks.

INPUT: A sequence of three-address instructions.

OUTPUT: A list of the basic blocks for that sequence in which each
instruction is assigned to exactly one basic block.

Contd…
METHOD: First, we determine those instructions in the
intermediate code that are leaders. The rules for finding leaders are:

1. The first three-address instruction in the intermediate code is
a leader.
2. Any instruction that is the target of a conditional or
unconditional jump is a leader.
3. Any instruction that immediately follows a conditional or
unconditional jump is a leader.

Then, for each leader, its basic block consists of the leader itself and
all instructions up to, but not including, the next leader or the end of
the intermediate code.
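
The leader rules can be sketched in Python as follows. The representation is an assumption made for illustration: each instruction is a `(text, target)` pair, where `target` is the index of the jump destination or `None` for a non-jump. The sketch also assumes the usual continuation that a block runs from a leader up to, but not including, the next leader.

```python
def find_leaders(code):
    """Return the sorted indices of all leaders in `code`."""
    leaders = set()
    if code:
        leaders.add(0)                  # rule 1: first instruction
    for i, (_, target) in enumerate(code):
        if target is not None:
            leaders.add(target)         # rule 2: target of a jump
            if i + 1 < len(code):
                leaders.add(i + 1)      # rule 3: instruction after a jump
    return sorted(leaders)

def basic_blocks(code):
    """Split `code` into lists of instruction indices, one per block."""
    starts = find_leaders(code)
    ends = starts[1:] + [len(code)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]

example = [
    ("i = 1", None),            # 0
    ("t = i * 2", None),        # 1
    ("i = i + 1", None),        # 2
    ("if i <= 10 goto 1", 1),   # 3
    ("x = t", None),            # 4
]
```

For `example`, the leaders are instructions 0, 1, and 4, so every instruction lands in exactly one of three blocks.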

Contd…
• Find the leaders

Next-Use Information
• The use of a name in a three-address
statement:
– Three-address statement i assigns a value to x
– Statement j has x as an operand
– Control can flow from statement i to j along a path that has no
intervening assignments to x
– Then statement j uses the value of x computed at i .

– Say that x is live at statement i .

Next-Use Information
• Algorithm (for x=y+z) : Determining the liveness
and next-use information for each statement in a
basic block.

– INPUT: A basic block B of three-address statements. Assume
the symbol table initially shows all nontemporary variables in B
as being live on exit.

– OUTPUT: At each statement i : x = y + z in B, attach to i the
liveness and next-use information of x, y, and z .

Contd…
– METHOD: Start at the last statement in B and scan backwards
to the beginning of B. At each statement i: x = y + z in B, do the
following:
1. Attach to i the information currently found in the symbol table
regarding the next use and liveness of x, y, and z.
2. In the symbol table, set x to "not live" and "no next use."
3. In the symbol table, set y and z to "live" and the next uses of y
and z to i.
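
The backward scan can be sketched in Python. The data layout is an assumption for illustration: a block is a list of `(x, y, z)` triples meaning x = y + z, the symbol table is a dictionary from names to `(live, next_use)` pairs, and `live_on_exit` is the set of names assumed live when the block ends.

```python
def next_use_info(block, live_on_exit):
    """Return, for each statement, a dict name -> (live, next_use)."""
    table = {}
    for x, y, z in block:
        for name in (x, y, z):
            table[name] = (name in live_on_exit, None)
    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):      # scan backwards
        x, y, z = block[i]
        # Step 1: attach the current table entries to statement i.
        info[i] = {name: table[name] for name in (x, y, z)}
        # Step 2: x is redefined here, so it is dead before this point.
        table[x] = (False, None)
        # Step 3: y and z are used here (done after step 2 so that
        # x = x + z still marks x as live).
        table[y] = (True, i)
        table[z] = (True, i)
    return info

block = [("t", "a", "b"), ("u", "t", "c"), ("a", "u", "t")]
info = next_use_info(block, live_on_exit={"a", "b", "c"})
```

Here `info[0]["t"]` is `(True, 1)`: after statement 0 the temporary t is live and next used at statement 1.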

Flow Graphs
• A flow graph is a graphical depiction of a
sequence of instructions with control flow
edges
• A flow graph can be defined at the
intermediate code level or target code
level
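• Example code:
    MOV 1,R0
    MOV n,R1
    JMP L2
L1: MUL 2,R0
    SUB 1,R1
L2: JMPNZ R1,L1

• Flow graph for this code:
– B1: MOV 1,R0; MOV n,R1; JMP L2
– B2: L1: MUL 2,R0; SUB 1,R1
– B3: L2: JMPNZ R1,L1
– Edges: B1 → B3, B2 → B3, B3 → B2 (B3 otherwise falls through to the exit)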
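
A sketch of how the control-flow edges could be derived from the blocks of this example, in Python. The block representation is hypothetical: each block is a `(name, label, jump, target)` tuple in program order, where `jump` is `None`, `"always"`, or `"cond"`, and a pseudo-node `EXIT` stands for leaving the code.

```python
def flow_edges(blocks):
    """Return the set of (from, to) control-flow edges between blocks."""
    by_label = {label: name for name, label, _, _ in blocks if label}
    edges = set()
    for k, (name, _, jump, target) in enumerate(blocks):
        if jump in ("always", "cond"):
            edges.add((name, by_label[target]))      # edge to jump target
        if jump != "always":
            # Fall through to the next block, or leave the code.
            nxt = blocks[k + 1][0] if k + 1 < len(blocks) else "EXIT"
            edges.add((name, nxt))
    return edges

blocks = [
    ("B1", None, "always", "L2"),   # ends with JMP L2
    ("B2", "L1", None, None),       # falls through into L2
    ("B3", "L2", "cond", "L1"),     # ends with JMPNZ R1,L1
]
```

The edge from B3 back to B2 is the back edge that makes B2 and B3 a loop.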

Loops
• A loop is a collection of basic blocks, such
that
– All blocks in the collection are strongly
connected
– The collection has a unique entry, and the
only way to reach a block in the loop is
through the entry

Optimization of Basic Blocks
• Compile time evaluation
• Common sub-expression elimination
• Code motion
• Strength Reduction
• Dead code elimination
• Algebraic Transformations

Compile-Time Evaluation
• Expressions whose values can be pre-computed
at compile time are evaluated by the compiler instead of at run time
• Two ways:
– Constant folding
– Constant propagation

• Constant folding: evaluating an expression
with constant operands at compile time and replacing the
expression with a single value
• Example:

area := (22.0/7.0) * r ^ 2

becomes

area := 3.14286 * r ^ 2
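
Constant folding can be sketched on a small expression tree in Python. The representation is an assumption for illustration: an expression is a number, a variable name, or a tuple `(op, left, right)`; the function name `fold` is hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "^": operator.pow}

def fold(expr):
    """Replace every subexpression with constant operands by its value."""
    if not isinstance(expr, tuple):
        return expr                      # a number or a variable name
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        if op == "/" and right == 0:
            return (op, left, right)     # leave division by zero for run time
        return OPS[op](left, right)
    return (op, left, right)

# area := (22.0/7.0) * r ^ 2
area = fold(("*", ("/", 22.0, 7.0), ("^", "r", 2)))
```

Only the constant subtree 22.0/7.0 is evaluated; the part involving r is left for run time.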

• Constant propagation: replacing a variable with
the constant that was assigned to it earlier.
• Example:
pi := 3.14286
area := pi * r ^ 2

becomes

area := 3.14286 * r ^ 2
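
Constant propagation within a single basic block can be sketched in Python. The layout is assumed for illustration: a block is a list of `(dest, expr)` assignments, and an expression is a number, a variable name, or a tuple `(op, left, right)`.

```python
def substitute(expr, constants):
    """Replace variables in expr that currently hold known constants."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return (op, substitute(left, constants), substitute(right, constants))
    if isinstance(expr, str):
        return constants.get(expr, expr)
    return expr

def propagate(block):
    """Propagate constants forward through a list of assignments."""
    constants = {}
    result = []
    for dest, expr in block:
        expr = substitute(expr, constants)
        if isinstance(expr, (int, float)):
            constants[dest] = expr        # dest now holds a known constant
        else:
            constants.pop(dest, None)     # dest is no longer a known constant
        result.append((dest, expr))
    return result

block = [("pi", 3.14286), ("area", ("*", "pi", ("^", "r", 2)))]
propagated = propagate(block)
```

Forgetting a variable as soon as it is reassigned a non-constant value is what keeps the substitution safe.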

Common Sub-expression Elimination

• Local common sub-expression elimination
– Performed within basic blocks.

Before:
a := b * c
…
x := b * c + 5

After:
temp := b * c
a := temp
…
x := temp + 5
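
A local version of this transformation can be sketched in Python. The statement format `(dest, op, left, right)` and the function name are assumptions; unlike the example above, this sketch reuses the variable that first held the value instead of introducing a fresh temp, and it writes a copy as `(dest, "=", source, None)`.

```python
def eliminate_common_subexpressions(block):
    """Replace repeated (op, left, right) computations by copies."""
    available = {}                     # (op, left, right) -> name holding it
    result = []
    for dest, op, left, right in block:
        key = (op, left, right)
        if op != "=" and key in available:
            result.append((dest, "=", available[key], None))
        else:
            result.append((dest, op, left, right))
        # Assigning dest kills every expression that reads dest
        # or whose value was held in dest.
        available = {k: v for k, v in available.items()
                     if dest not in (k[1], k[2]) and v != dest}
        if op != "=" and dest not in (left, right):
            available[key] = dest
    return result

block = [("a", "*", "b", "c"), ("t1", "*", "b", "c"), ("x", "+", "t1", "5")]
optimized = eliminate_common_subexpressions(block)
```

The second computation of b * c becomes a copy from a, so the multiplication is performed only once.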

Code Motion
• Moving code from one part of the program
to other without modifying the algorithm
– Reduce size of the program
– Reduce execution frequency of the code
subjected to movement

Code Motion
• Similar to common sub-expression elimination
but with the objective to reduce code size.

Before:
If(a<b) then
z := x * 5
else
y := x * 5 + 2

After:
temp := x * 5
If(a<b) then
z := temp
else
y := temp + 2

Strength Reduction

• Replacement of an operator with a less costly one.

X = x^2 ⇒ X = x*x

Y = y*2 ⇒ Y = y+y
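
These two rewrites can be sketched as a rule on single statements in Python. The statement format `(dest, op, left, right)` and the function name `reduce_strength` are assumptions made for illustration.

```python
def reduce_strength(stmt):
    """Return a cheaper equivalent of stmt when one of the rules applies."""
    dest, op, left, right = stmt
    if op == "^" and right == 2:
        return (dest, "*", left, left)      # x^2  ->  x*x
    if op == "*" and right == 2:
        return (dest, "+", left, left)      # y*2  ->  y+y
    if op == "*" and left == 2:
        return (dest, "+", right, right)    # 2*y  ->  y+y
    return stmt                             # no rule applies
```

Statements that match no rule are returned unchanged, so the function can be mapped over a whole basic block.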

Dead Code Elimination
• Dead code is a portion of the program that will
never be executed, such as statements that follow a return in a basic block.

Before:
If(a==b)
{
b=c ;
…..
return b ;
c=0;
}

After:
If(a==b)
{
b=c ;
…..
return b ;
}
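
Removing statements that can never run can be sketched in Python for the simplest case. The assumptions are that a block is a list of statement strings with no labels inside it, and that `return` and `goto` are the only statements that unconditionally leave the block; the function name is hypothetical.

```python
def drop_unreachable(block):
    """Drop statements that follow a return or unconditional goto."""
    result = []
    for stmt in block:
        result.append(stmt)
        if stmt.startswith(("return", "goto")):
            break            # nothing after this point can execute
    return result

cleaned = drop_unreachable(["b = c", "return b", "c = 0"])
```

In the example above this removes c = 0, which sits after the return.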
