2. Overview
Intermediate code is the interface between the
front end and the back end of a compiler.
Ideally, the details of the source language are
confined to the front end and the details of the
target machine (the machine model) to the
back end.
This intermediate code serves as a bridge
between the high-level source code and
the final machine code or another target
language.
3. Overview
In a compiler, the front end translates the source
program into an intermediate representation,
and the back end generates the target code from
this intermediate representation.
Using a machine-independent intermediate
code (IC) has two benefits:
retargeting to another machine is easier;
optimization can be done on the machine-independent
code.
4. Overview
Intermediate representations span the gap between
the source and target languages:
closer to the target language than the source is;
(more or less) machine independent;
allow many optimizations to be done in a machine-independent way.
They can be implemented via syntax-directed translation, so
their generation can be folded into the parsing process.
5. Intermediate Representations
Decisions in IR design affect the speed and
efficiency of the compiler
Some important IR properties:
Ease of generation
Ease of manipulation
Procedure size
Level of abstraction
The importance of different properties varies
between compilers
Selecting an appropriate IR for a compiler is
critical
6. Types of Intermediate Languages
High Level Representations (e.g., syntax trees):
closer to the source language
easy to generate from an input program
code optimizations may not be straightforward.
Low Level Representations (e.g., three-address
code):
closer to the target machine;
easier for optimization and final code generation.
7. Intermediate Code Generation
The intermediate language can take many different forms; the
designer of the compiler decides which one to use.
A syntax tree can be used as an intermediate language.
Postfix notation can be used as an intermediate language.
Three-address code (quadruples) can be used as an
intermediate language.
We will use three-address code to discuss intermediate code
generation.
Three-address instructions are close to machine instructions, but they
are not actual machine instructions.
Some programming languages have well-defined intermediate
languages.
8. Syntax Trees
A syntax tree shows the structure of a program by abstracting
away irrelevant details from a parse tree.
Each node represents a computation to be performed;
The children of the node represent what that computation is
performed on.
Syntax trees decouple parsing from subsequent
processing.
10. Syntax Trees: Structure
Expressions:
leaves: identifiers or constants;
internal nodes are labeled with operators;
the children of a node are its operands.
Statements:
a node’s label indicates what kind of
statement it is;
the children correspond to the components
of the statement.
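The structure described above can be sketched in a few lines of Python. This is only an illustration, not a prescribed design: the `Node` class, the `leaf` and `show` helpers, and the sample `if` statement are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # For an expression the label is an operator, identifier, or constant;
    # for a statement it names the kind of statement.
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaf(name: str) -> Node:
    """Build a leaf node for an identifier or a constant."""
    return Node(name)


def show(node: Node) -> str:
    """Render the tree as a parenthesized prefix string."""
    if node.is_leaf():
        return node.label
    parts = [node.label] + [show(child) for child in node.children]
    return "(" + " ".join(parts) + ")"


# Expression a + b: the operator labels the internal node,
# and its children are the operands.
expr = Node("+", [leaf("a"), leaf("b")])

# Statement "if (a < b) x = a + b": the label gives the statement kind,
# and the children are its components (condition and body).
stmt = Node("if", [
    Node("<", [leaf("a"), leaf("b")]),
    Node("=", [leaf("x"), expr]),
])
```

Calling `show(expr)` gives `(+ a b)`, which makes the operator-over-operands shape of the tree visible in plain text.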
11. Syntax Tree
While parsing the input, a syntax tree can be constructed.
A syntax tree (abstract syntax tree) is a condensed form of a parse tree
that is useful for representing language constructs.
For example, for the string a + b, the parse tree in (a) below can
be represented by the syntax tree shown in (b);
the keywords (syntactic sugar) that appear in the parse tree
no longer appear in the syntax tree.
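(a) Parse tree: the root E has three children, E, + and E; the left E derives a and the right E derives b.
(b) Abstract syntax tree: the root + has two children, a and b.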
12. Three-Address Code
A three-address code instruction has the form: x := y op z
where x, y and z are names, constants or compiler-
generated temporaries; op is any operator.
We may also use the following notation for three-
address code (it resembles a machine code
instruction more closely):
op y, z, x apply operator op to y and z, and store the
result in x.
We use the term “three-address code” because each
statement usually contains three addresses (two for
operands, one for the result).
13. Three-Address Code
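Source statement:
a := b * -c + b * -c
Three-address code, translating each subexpression separately:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
Three-address code with the common subexpression b * -c computed only once:
t1 := - c
t2 := b * t1
t5 := t2 + t2
a := t5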
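A minimal sketch of how such code could be produced from a syntax tree follows. It is an assumption-laden illustration rather than the method used to prepare this slide: trees are written as nested tuples, and `gen_tac` and its `reuse` flag are hypothetical names. With `reuse=True`, an operation that has already been computed on the same operands is not emitted again, which is safe here only because no variable changes within a single expression.

```python
from itertools import count


def gen_tac(tree, target, reuse=False):
    """Return three-address instructions that assign `tree` to `target`.

    `tree` is a leaf name (str) or a tuple (op, child, ...), with one
    child for a unary operator and two for a binary operator.
    """
    code = []
    temps = count(1)
    seen = {}  # (op, operand names) -> temporary that holds the value

    def walk(node):
        if isinstance(node, str):
            return node  # a name or constant needs no instruction
        op, *kids = node
        args = tuple(walk(kid) for kid in kids)
        key = (op, args)
        if reuse and key in seen:
            return seen[key]  # already computed: reuse the temporary
        temp = f"t{next(temps)}"
        if len(args) == 1:
            code.append(f"{temp} := {op} {args[0]}")
        else:
            code.append(f"{temp} := {args[0]} {op} {args[1]}")
        seen[key] = temp
        return temp

    result = walk(tree)
    code.append(f"{target} := {result}")
    return code


# a := b * -c + b * -c
tree = ("+", ("*", "b", ("-", "c")), ("*", "b", ("-", "c")))
plain = gen_tac(tree, "a")
shared = gen_tac(tree, "a", reuse=True)
```

Here `plain` reproduces the six-instruction sequence above. `shared` has the same shape as the four-instruction sequence, except that this sketch numbers temporaries consecutively, so the sum is stored in t3 rather than t5.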
14. Postfix Notation
Postfix notation is a linear representation of a syntax tree.
In postfix notation, any expression can be written
unambiguously without parentheses.
In postfix notation, the operator appears after its operands,
i.e., the operator between the operands is taken out and placed
after them.
Example: Translate a ∗ d − (b + c) into postfix form.
Solution:
a d ∗ b c + −
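Because postfix notation is a linear form of the syntax tree, it can be obtained by a post-order traversal. The sketch below assumes trees written as nested tuples; `to_postfix` and `eval_postfix` are hypothetical helper names, and the stack evaluator is included only to show that the postfix form needs no parentheses.

```python
import operator


def to_postfix(tree):
    """Post-order traversal: emit the operands first, then the operator."""
    if isinstance(tree, str):
        return [tree]
    op, *kids = tree
    out = []
    for kid in kids:
        out.extend(to_postfix(kid))
    out.append(op)
    return out


def eval_postfix(tokens, env):
    """Evaluate binary-operator postfix tokens with a stack."""
    ops = {"+": operator.add, "-": operator.sub, "*": operator.mul}
    stack = []
    for tok in tokens:
        if tok in ops:
            right = stack.pop()  # the right operand was pushed last
            left = stack.pop()
            stack.append(ops[tok](left, right))
        else:
            stack.append(env[tok])  # look up the variable's value
    return stack.pop()


# a * d - (b + c)
tree = ("-", ("*", "a", "d"), ("+", "b", "c"))
postfix = to_postfix(tree)
```

For this tree, `" ".join(postfix)` is `a d * b c + -`, matching the solution above.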
15. Three-Address Code…
In three-address code:
Only one operator is allowed on the right-hand side of an
assignment, so an expression such as x + y * z cannot
appear in a single instruction.
Like postfix notation, three-address
code is a linear representation of a syntax tree.
The name three-address code comes from the fact
that such an instruction usually contains
three addresses (the two operands and the result).
For example, x + y * z is translated to:
t1 = y * z
t2 = x + t1
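The translation of x + y * z can also be produced while parsing, in the spirit of syntax-directed translation. The following is a minimal sketch under stated assumptions: it handles only identifiers, numbers, +, * and parentheses, assumes well-formed input (there is no error handling), and `translate` with its inner helpers is a hypothetical name, not part of any real tool.

```python
import re


def translate(source):
    """Parse an expression and emit three-address code during parsing.

    Returns (instructions, name holding the final value).
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[+*()]", source)
    pos = 0
    code = []
    ntemps = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def emit(op, left, right):
        # One operator per instruction; the result goes to a fresh temporary.
        nonlocal ntemps
        ntemps += 1
        temp = f"t{ntemps}"
        code.append(f"{temp} = {left} {op} {right}")
        return temp

    def factor():
        tok = advance()
        if tok == "(":
            place = expr()
            advance()  # consume the closing ")"
            return place
        return tok  # identifier or number

    def term():
        place = factor()
        while peek() == "*":
            advance()
            place = emit("*", place, factor())
        return place

    def expr():
        place = term()
        while peek() == "+":
            advance()
            place = emit("+", place, term())
        return place

    return code, expr()


code, result = translate("x + y * z")
```

Because `term` is parsed below `expr`, the multiplication is emitted first, so `code` holds the same two instructions as the slide.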
16. Data structures for three address codes
Quadruples
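Each instruction has four fields: op, arg1, arg2 and result.
Triples
Each instruction has three fields: op, arg1 and arg2. Temporaries are
not used; instead, an operand refers to the instruction that computes its value.
Indirect triples
In addition to the triples, a list of pointers to triples gives the order of the instructions.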
17. Example-1
a = b * minus c + b * minus c
Three-address code:
t1 = minus c
t2 = b * t1
t3 = minus c
t4 = b * t3
t5 = t2 + t4
a = t5
Quadruples:
     op     arg1  arg2  result
0    minus  c           t1
1    *      b     t1    t2
2    minus  c           t3
3    *      b     t3    t4
4    +      t2    t4    t5
5    =      t5          a
Triples:
     op     arg1  arg2
0    minus  c
1    *      b     (0)
2    minus  c
3    *      b     (2)
4    +      (1)   (3)
5    =      a     (4)
Indirect triples (a list of pointers, plus the triples themselves):
     instruction
35   (0)
36   (1)
37   (2)
38   (3)
39   (4)
40   (5)
     op     arg1  arg2
0    minus  c
1    *      b     (0)
2    minus  c
3    *      b     (2)
4    +      (1)   (3)
5    =      a     (4)
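The indirect-triple layout can be illustrated with a small interpreter. This is a sketch under assumptions, not part of the original example: an integer operand stands for a reference such as (0), a string operand is a variable name, constants are not supported, the pointer list uses positions 0 to 5 instead of the addresses 35 to 40, and `run` is a hypothetical helper. A commonly cited benefit of the separate pointer list is that instructions can be reordered by permuting it, without renumbering the references inside the triples.

```python
# Triples for a = b * minus c + b * minus c.
triples = [
    ("minus", "c", None),  # (0)
    ("*", "b", 0),         # (1)
    ("minus", "c", None),  # (2)
    ("*", "b", 2),         # (3)
    ("+", 1, 3),           # (4)
    ("=", "a", 4),         # (5)
]

# The pointer list: the order in which the triples are executed.
order = [0, 1, 2, 3, 4, 5]


def run(triples, order, env):
    """Interpret the triples in the given order; return the updated env."""
    env = dict(env)  # do not mutate the caller's mapping
    results = {}     # triple index -> value computed by that triple

    def value(operand):
        if isinstance(operand, int):
            return results[operand]  # reference to an earlier triple
        return env[operand]          # variable name

    for index in order:
        op, arg1, arg2 = triples[index]
        if op == "minus":
            results[index] = -value(arg1)
        elif op == "*":
            results[index] = value(arg1) * value(arg2)
        elif op == "+":
            results[index] = value(arg1) + value(arg2)
        elif op == "=":
            env[arg1] = value(arg2)
        else:
            raise ValueError(f"unknown operator: {op}")
    return env


final = run(triples, order, {"b": 3, "c": 2})
# Executing triples (2) and (3) first only changes the pointer list.
reordered = run(triples, [2, 3, 0, 1, 4, 5], {"b": 3, "c": 2})
```

Both runs leave the same value in `a`, since the reordering respects the dependencies between triples.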
18. Example continued…
Construct the quadruples and triples for the following
expression: (a + b) * (c + d) - (a + b + c)
Three-address code:
t1 = a + b
t2 = c + d
t3 = t1 * t2
t4 = t1 + c
t5 = t3 - t4
a. Quadruples
     op  arg1  arg2  result
(0)  +   a     b     t1
(1)  +   c     d     t2
(2)  *   t1    t2    t3
(3)  +   t1    c     t4
(4)  -   t3    t4    t5
b. Triples
     op  arg1  arg2
(0)  +   a     b
(1)  +   c     d
(2)  *   (0)   (1)
(3)  +   (0)   c
(4)  -   (2)   (3)
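The step from quadruples to triples is mechanical: each temporary is replaced by the position of the instruction that defines it. The sketch below assumes every result field is a temporary that is defined exactly once; a copy into a program variable, such as a = t5 in Example-1, would need separate handling. `quads_to_triples` is a hypothetical helper name.

```python
def quads_to_triples(quads):
    """Replace each temporary by a reference to its defining instruction."""
    defined_at = {}  # temporary name -> index of the quadruple defining it
    triples = []

    def ref(arg):
        # A temporary becomes "(index)"; any other operand is kept as is.
        return f"({defined_at[arg]})" if arg in defined_at else arg

    for index, (op, arg1, arg2, result) in enumerate(quads):
        triples.append((op, ref(arg1), ref(arg2)))
        defined_at[result] = index
    return triples


# Quadruples for (a + b) * (c + d) - (a + b + c)
quads = [
    ("+", "a", "b", "t1"),
    ("+", "c", "d", "t2"),
    ("*", "t1", "t2", "t3"),
    ("+", "t1", "c", "t4"),
    ("-", "t3", "t4", "t5"),
]
triples = quads_to_triples(quads)
```

Applied to the quadruples of this example, the function yields the five triples listed in part b.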