10211CS107 – COMPILER DESIGN
Category : Program Core
Credit : 3
School of Computing
Department of Computer Science & Engineering
1/28/2025 1
Course Handling Faculty :
Dr. T SAJU RAJ
Associate Professor
COURSE CONTENT
UNIT I Introduction to Compilers 9
Compilers, Analysis of the Source Program, The Phases of a Compiler, Cousins of
the Compiler, The Grouping of Phases, Compiler-Construction Tools. LEXICAL
ANALYSIS: Need and role of lexical analyzer-Lexical errors, Input Buffering -
Specification of Tokens, Recognition of Tokens, Design of a Lexical Analyzer
Generator.
COURSE CONTENT
UNIT II Syntax Analysis 9
Need and role of the parser – Context-Free Grammars – Top-Down Parsing –
Recursive Descent Parser – Predictive Parser – LL(1) Parser – Shift-Reduce Parser –
LR Parser – LR(0) Items – Construction of SLR Parsing Table – Introduction to LALR
Parser – YACC – Design of a Syntax Analyser for a Sample Language
COURSE CONTENT
UNIT III Intermediate Code Generation L – 9
Intermediate languages – Declarations – Assignment
Statements – Boolean Expressions – Case Statements –
Backpatching – Procedure calls.
COURSE CONTENT
UNIT IV Code Generation L – 9
Issues in the design of code generator – The target machine – Runtime Storage
management – Basic Blocks and Flow Graphs – Next-use Information – A simple Code
generator – DAG representation of Basic Blocks
COURSE CONTENT
UNIT V Code Optimization and Run Time Environments L – 9
Introduction– Principal Sources of Optimization – Peephole Optimization-
Optimization of basic Blocks – Introduction to Global Data Flow Analysis – Runtime
Environments – Source Language issues – Storage Organization – Storage Allocation
strategies – Access to non-local names – Parameter Passing.
Lexical Analysis
Lexical analysis is the first phase of the compiler; the lexical analyzer is also
known as a scanner.
It converts the high-level input program into a
sequence of tokens.
Lexical analysis can be implemented with
deterministic finite automata (DFA).
The output is a sequence of tokens that is sent to the
parser for syntax analysis.
Syntax Analyzer
There are three general types of parsers for grammars:
• universal,
• top-down, and
• bottom-up.
Universal parsing methods such as the Cocke-Younger-Kasami (CYK) algorithm and
Earley's algorithm can parse any context-free grammar, but they are too inefficient to use in production compilers.
The methods commonly used in compilers can be classified as being either top-
down or bottom-up.
As their names imply, top-down methods build parse trees from the top (root) to the bottom (leaves), while bottom-up methods start from the leaves and work their way up to the root.
INTERMEDIATE CODE
• Intermediate code is a representation used while translating the source code into
machine code.
• Intermediate code lies between the high-level language and the
machine language.
CODE OPTIMIZATION
Optimization is the process of
transforming a piece of code to make it
more efficient (either in terms of time
or space) without changing its output
or side effects.
➢ Optimization is a program transformation
technique which tries to improve the code by
making it consume fewer resources (e.g. CPU time,
memory) and run faster.
Why is optimization needed?
• To improve intermediate code
• Better target code
• Faster execution
• Shorter code
• Lower power consumption
• Lower complexity: time, space and cost
• Efficient memory usage
• Better performance
“Optimization”
The only difference visible to the code’s user should
be that it runs faster and/or consumes less
memory.
The name implies that you are finding an "optimal"
solution; in truth, optimization aims to improve,
not perfect, the result.
Code Optimization – Introduction
(..contd)
A code optimizing process must follow the
three rules given below:
• The output code must not, in any way, change the meaning of the
program.
• Optimization should increase the speed of the program and, if
possible, make the program demand fewer resources.
• Optimization should itself be fast and should not delay the
overall compiling process.
Code Optimization – Introduction
(..contd)
Efforts to optimize code can be made at various stages of the
compilation process.
At the beginning, users can change or rearrange the code or use better
algorithms.
After generating intermediate code, the compiler can modify the
intermediate code by improving address calculations and loops.
While producing the target machine code, the compiler can
make use of the memory hierarchy and CPU registers.
Machine-independent Optimization
This code optimization phase attempts to improve
the intermediate code to get better target code as
the output.
The part of the intermediate code which is
transformed here does not involve any CPU registers
or absolute memory locations.
Machine-dependent Optimization
➢ Machine-dependent optimization is done after
the target code has been generated, when the
code is transformed according to the target
machine architecture.
➢ It involves CPU registers and may have absolute
memory references rather than relative references.
➢ Machine-dependent optimizers try to take
maximum advantage of the memory hierarchy.
Code Optimization – Phases
➢ Global Optimization:
Transformations are applied to large program segments that
include functions, procedures and loops.
➢ Local Optimization:
Transformations are applied to small blocks of statements.
The local optimization is done prior to global optimization.
Principal Sources of Code
Optimization
➢ Common subexpression elimination
➢ Copy propagation
➢ Dead-code elimination
➢ Constant folding
Common Subexpression Elimination
Frequently a program will include several calculations of the
same value.
• For example:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t4 := 4*i
t5 := n
t6 := b[t4] + t5
This code can be optimized using common subexpression
elimination as:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t5 := n
t6 := b[t1] + t5
The common subexpression t4 := 4*i is eliminated because
its value is already available in t1
and the value of i
has not changed between the definition of t1 and the use.
Copy Propagation
Assignments of the form f := g are called copy statements, or
copies for short.
The idea behind the copy-propagation transformation is to use
g for f wherever possible after the copy statement f := g.
Copy propagation means using one variable instead of another.
This may not appear to be an improvement, but as we shall see,
it gives us an opportunity to eliminate the copied variable (x in the example below).
• For example:
x=Pi;
A=x*r*r;
The optimization using copy propagation can be done as follows: A=Pi*r*r;
Once no uses of x remain, the copy x=Pi is dead code, and x can be eliminated.
Dead-Code Elimination
A variable is live at a point in a program if its value can be used
subsequently; otherwise, it is dead at that point.
A related idea is dead or useless code, statements that compute
values that never get used.
While the programmer is unlikely to introduce any dead code
intentionally, it may appear as the result of previous
transformations.
Example:
i=0;
if(i==1)
{
a=b+5;
}
Here, the body of the if statement is dead code: i is always 0 at the test, so the condition i==1 can never be satisfied.
Constant folding
Deducing at compile time that the value of an expression is a
constant and using the constant instead is known as constant folding.
One advantage of copy propagation is that it often turns the copy
statement into dead code.
For example,
a=3.14157/2 can be replaced by
a=1.570785, thereby eliminating a division operation.
Loop Optimizations
➢ Code motion, which moves code outside a loop
➢ Induction-variable elimination, which removes redundant
induction variables from inner loops.
➢ Reduction in strength, which replaces an expensive
operation by a cheaper one, such as a multiplication by an
addition.
Loop Optimizations : Example Flow Graph
Code motion
An important modification that decreases the amount of
code in a loop is code motion.
This transformation takes an expression that yields the
same result independent of the number of times a loop is
executed (a loop-invariant computation) and places the
expression before the loop.
For example, evaluation of limit-2 is a loop-invariant computation
in the following while-statement:
while (i <= limit-2) /* statement does not change limit*/
Code motion will result in the equivalent of
t = limit-2;
while (i <= t) /* statement does not change limit or t */
Induction Variables
• Induction-variable elimination removes or replaces redundant
induction variables in an inner loop.
• It can reduce the number of additions in a loop.
• It improves both code space and run time performance.
Induction Variables
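In the flow graph of this example (figure not reproduced), j is decremented by 1 in
block B3, so the assignment
t4:=4*j
can be replaced by
t4:=t4-4.
The only problem is that t4 does not
have a value when we enter block B3 for the first time. Since the relationship
t4=4*j must hold on entry to B3, we place the initialization t4:=4*j at the
end of the block where j itself is initialized (B1).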
Reduction in Strength
• Strength reduction is used to replace an expensive
operation by a cheaper one on the target machine.
• Addition of a constant is cheaper than a multiplication.
So we can replace multiplication with an addition within
the loop.
• Multiplication is cheaper than exponentiation. So we can
replace exponentiation with multiplication within the loop.
Reduction in Strength
For example,
• Inside a loop, a multiplication by a constant can be replaced by repeated
addition: 3*4=12 equals 3+3+3+3=12, so a product such as 3*i can be
updated by adding 3 each time i increases by 1.
• x² is invariably cheaper to implement as x*x than as a call to
an exponentiation routine.
• Fixed-point multiplication or division by a power of two is
cheaper to implement as a shift (for signed division, negative values need a rounding correction).
while (i<10)
{
j = 3*i+1;
a[j] = a[j]-2;
i = i+2;
}
After strength reduction, the code will be:
s = 3*i+1;
while (i<10)
{
j = s;
a[j] = a[j]-2;
i = i+2;
s = s+6;
}
In the above code, it is cheaper to compute s=s+6 than j=3*i+1: since i increases by 2 on each iteration, 3*i+1 increases by 6.
Peephole Optimization
➢ A statement-by-statement code-generation strategy
often produces target code that contains redundant
instructions and suboptimal constructs.
➢ The quality of such target code can be improved by
applying “optimizing” transformations to the target
program.
➢ The peephole is a small, moving window on the target
program.
➢ It is characteristic of peephole optimization that each
improvement may spawn opportunities for additional
improvements.
The small set of instructions or small part of code on
which peephole optimization is performed is known
as peephole or window.
Peephole Optimization
Common techniques applied in peephole optimization:
➢ Constant folding: evaluate constant sub-expressions in advance.
➢ Strength reduction: replace slow operations with faster equivalents.
➢ Null sequences: delete useless operations.
➢ Combine operations: replace several operations with one equivalent.
➢ Algebraic laws: use algebraic laws to simplify or reorder instructions.
➢ Special case instructions: use instructions designed for special operand cases.
➢ Address mode operations: use address modes to simplify code.
Characteristics of peephole
optimizations
➢ Redundant-instructions elimination
➢ Flow-of-control optimizations
➢ Algebraic simplifications
➢ Use of machine idioms
➢ Unreachable or dead code elimination
The objectives of peephole optimization are:
1. To improve performance
2. To reduce memory footprint
3. To reduce code size
Redundant instruction elimination
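Redundant load/store: see if an obvious replacement is
possible
MOV R0, a
MOV a, R0
Here MOV R0, a stores R0 into a (source operand first), so the second instruction,
which loads a back into R0, is redundant. It can be eliminated without needing
any global knowledge of a, provided it carries no label (so no jump can reach it directly).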
Flow-of-Control Optimizations
The unnecessary jumps can be eliminated in either the
intermediate code or the target code by the following types of
peephole optimizations.
We can replace the jump sequence
goto L1
….
L1: goto L2
by the sequence
goto L2
….
L1: goto L2
If there are now no jumps to L1, then it may be
possible to eliminate the statement L1:goto L2
provided it is preceded by an unconditional
jump
JUMPS
❖ Folding Jumps to Jumps
A jump to an unconditional jump can be redirected to that jump's target address
JNE lab1
...
lab1: JMP lab2
Can be replaced by:
JNE lab2
❖Jump to Return
A jump to a return can be replaced by a return
JMP lab1
...
lab1: RET
Can be replaced by
RET
lab1 may then become dead code (if no other jump targets it)
ALGEBRAIC SIMPLIFICATION & STRENGTH
REDUCTION
• Worth recognizing single instructions with a constant operand:
A * 1 = A
A * 0 = 0
A / 1 = A
A * 2 = A + A
More delicate with floating point (e.g. A * 0 = 0 does not hold when A is NaN or infinity)
• Strength reduction:
A ^ 2 = A * A
ALGEBRAIC SIMPLIFICATION & STRENGTH
REDUCTION
x+0 = x
0+x = x
x*1 = x
1*x = x
0/x = 0 (only when x ≠ 0)
x-0 = x
b && true = b
b && false = false (when evaluating b has no side effects)
b || true = true (when evaluating b has no side effects)
b || false = b
b = 5 + a + 10 ;
Intermediate Code
tmp0 = 5 ;
tmp1 = tmp0 + a ;
tmp2 = tmp1 + 10 ;
b = tmp2 ;
Code Optimization
tmp0 = 15 ;
tmp1 = a + tmp0 ;
b = tmp1 ;
Use of Machine Idioms
➢ The target machine may have hardware instructions to implement certain
specific operations efficiently.
➢ For example, some machines have auto-increment and auto-decrement
addressing modes.
➢ These add or subtract one from an operand before or after using its value.
The use of these modes greatly improves the quality of code
when pushing or popping a stack, as in parameter passing.
These modes can also be used in code for statements like
i := i+1.
i := i+1 → i++
i := i-1 → i--
Unreachable code
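Identify code that will never be executed.
#define DEBUG 0
if (DEBUG)
{
print debugging info
}
After DEBUG is replaced by its value 0, the test in the intermediate code can become:
if (0 != 1) goto L2
print debugging info
L2:
Since 0 != 1 is always true, the jump is always taken, so the print statement is unreachable and can be eliminated.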
Optimization of Basic Blocks
The optimization process can be applied to a basic block.
While optimizing, we must not change the set of
expressions computed by the block.
There are two types of basic block optimization. These are as
follows:
1. Structure-Preserving Transformations
2. Algebraic Transformations
Structure-Preserving
Transformations:
The primary structure-preserving transformations on basic blocks are:
➢ Common sub-expression elimination
➢ Dead code elimination
➢ Renaming of temporary variables
➢ Interchange of two independent adjacent statements.
Dead code elimination
❖ It is possible that a large amount of dead (useless) code may exist in
the program.
❖ This is especially likely when variables and procedures are
introduced during construction or error correction of a program:
once declared and defined, one forgets to remove them even when they
serve no purpose.
❖ Eliminating these will optimize the code.
Renaming of temporary variables
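A statement t:=b+c, where t is a temporary name, can be
changed to u:=b+c, where u is another temporary name, provided
all uses of this instance of t are changed to u.
In this way, a basic block can be transformed into an equivalent block
called a normal-form block, in which each statement that defines a temporary defines a new temporary.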
Interchange of two independent
adjacent statements
Two statements
t1:=b+c
t2:=x+y
can be interchanged or reordered in the basic block
provided neither x nor y is t1 and neither b nor c is t2, i.e. the value of t1
does not affect the computation of t2 and vice versa.
Algebraic Transformations
In an algebraic transformation, we can change the set of expressions into an
algebraically equivalent set. Thus statements such as x := x + 0 or x := x * 1 can be
eliminated from a basic block without changing the set of expressions it computes.
Algebraic identities represent another important class of optimizations on
basic blocks.
This includes simplifying expressions or replacing expensive operation by
cheaper ones i.e. reduction in strength.
Another class of related optimizations is constant folding.
Here we evaluate constant expressions at compile time and replace the
constant expressions by their values.
Thus the expression 2*3.14 would be replaced by 6.28.
Example:
x:=x+0 can be removed
x:=y**2 can be replaced by a cheaper statement x:=y*y
• Sometimes the associative law is applied to expose
common subexpressions without changing the value computed by the
basic block.
If the source code has the assignments
a := b + c
e := c + d + b
the following intermediate code may be generated:
a := b + c
t := c + d
e := t + b
If t is not needed outside this block, the associativity and commutativity of +
let us rewrite the sequence as
a := b + c
e := a + d
which reuses the common subexpression b + c.
Loop Optimization
• Loop optimization is the process of increasing execution speed
and reducing the overheads associated with loops.
• It plays an important role in improving cache performance and
making effective use of parallel processing capabilities.
• Most of the execution time of a scientific program is spent in loops.
• Loop optimization is a machine-independent optimization.
• Decreasing the number of instructions in an inner loop improves
the running time of a program even if the amount of code
outside that loop is increased.
Frequency Reduction (Code Motion):
• In frequency reduction, the amount of code in loop is decreased.
• A statement or expression, which can be moved outside the loop body without
affecting the semantics of the program, is moved outside the loop.
Initial code:
while(i<100)
{
a = Sin(x)/Cos(x) + i;
i++;
}
Optimized code:
t = Sin(x)/Cos(x);
while(i<100)
{
a = t + i;
i++;
}
Loop Unrolling:
• Loop unrolling is a loop transformation technique that helps to optimize the
execution time of a program.
• The loop body is replicated so that the number of iterations is reduced or eliminated.
• Loop unrolling increases the program's speed by eliminating loop control
and loop test instructions.
Initial code:
for (int i=0; i<5; i++)
printf("Pankaj\n");
Optimized code:
printf("Pankaj\n");
printf("Pankaj\n");
printf("Pankaj\n");
printf("Pankaj\n");
printf("Pankaj\n");
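Loop Jamming:
• Loop jamming is combining two or more loops into a single loop.
• It reduces loop overhead at run time, since the combined loop performs only one set of loop tests and increments. It is valid when the loops have the same index range and combining them does not change the order in which dependent values are computed.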
Initial Code:
for(int i=0; i<5; i++)
a = i + 5;
for(int i=0; i<5; i++)
b = i + 10;
Optimized code:
for(int i=0; i<5; i++)
{
a = i + 5;
b = i + 10;
}
Example – three-address code
1) i = m-1
2) j = n
3) t1 = 4*n
4) v = a[t1]
5) i = i+1
6) t2 = 4*i
7) t3 = a[t2]
8) if t3<v goto 5
9) j = j-1
10) t4 = 4*j
11) t5 = a[t4]
12) if t5>v goto 9
13) if i>=j goto 23
14) t6 = 4*i
15) x = a[t6]
16) t7 = 4*i
17) t8 = 4*j
18) t9 = a[t8]
19) a[t7] = t9
20) t10 = 4*j
21) a[t10] = x
22) goto 5
23) t11 = 4*i
24) x = a[t11]
25) t12 = 4*i
26) t13 = 4*n
27) t14 = a[t13]
28) a[t12] = t14
29) t15 = 4*n
30) a[t15] = x
Example – control-flow graph
B1:
  i = m-1
  j = n
  t1 = 4*n
  v = a[t1]
B2:
  i = i+1
  t2 = 4*i
  t3 = a[t2]
  if t3<v goto B2
B3:
  j = j-1
  t4 = 4*j
  t5 = a[t4]
  if t5>v goto B3
B4:
  if i>=j goto B6
B5:
  t6 = 4*i
  x = a[t6]
  t7 = 4*i
  t8 = 4*j
  t9 = a[t8]
  a[t7] = t9
  t10 = 4*j
  a[t10] = x
  goto B2
B6:
  t11 = 4*i
  x = a[t11]
  t12 = 4*i
  t13 = 4*n
  t14 = a[t13]
  a[t12] = t14
  t15 = 4*n
  a[t15] = x
Example – GCSE
B1:
  i = m-1
  j = n
  t1 = 4*n
  v = a[t1]
B2:
  i = i+1
  t2 = 4*i
  t3 = a[t2]
  if t3<v goto B2
B3:
  j = j-1
  t4 = 4*j
  t5 = a[t4]
  if t5>v goto B3
B4:
  if i>=j goto B6
B5:
  x = t3
  a[t2] = t5
  a[t4] = x
  goto B2
B6:
  x = t3
  t14 = a[t1]
  a[t2] = t14
  a[t1] = x
Copy propagation and dead-code elimination
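B5 after global common subexpression elimination:
  x = t3
  a[t2] = t5
  a[t4] = x
  goto B2
B5 after copy propagation (t3 is used in place of x):
  x = t3
  a[t2] = t5
  a[t4] = t3
  goto B2
The copy x = t3 is now dead code, since x is no longer used (x is not live after B5).
B5 after dead-code elimination:
  a[t2] = t5
  a[t4] = t3
  goto B2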
Reduction in strength
Before:
B1:
  i = m-1
  j = n
  t1 = 4*n
  v = a[t1]
B3:
  j = j-1
  t4 = 4*j
  t5 = a[t4]
  if t5>v goto B3
B4:
  if i>=j goto B6
After (t4 = 4*j is initialized in B1 and updated by subtraction in B3):
B1:
  i = m-1
  j = n
  t1 = 4*n
  v = a[t1]
  t4 = 4*j
B3:
  j = j-1
  t4 = t4-4
  t5 = a[t4]
  if t5>v goto B3
B4:
  if i>=j goto B6
(Blocks B2, B5 and B6 are not shown.)
Removing induction variable
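Before:
B1:
  i = m-1
  j = n
  t1 = 4*n
  v = a[t1]
  t4 = 4*j
B3:
  j = j-1
  t4 = t4-4
  t5 = a[t4]
  if t5>v goto B3
B4:
  if i>=j goto B6
After (t4 now carries the value of j, so j = j-1 is removed from B3):
B1:
  i = m-1
  j = n
  t1 = 4*n
  v = a[t1]
  t4 = 4*j
B3:
  t4 = t4-4
  t5 = a[t4]
  if t5>v goto B3
B4:
  if i>=j goto B6
(The test in B4 still uses j, so it must be rewritten in terms of t4; once i is likewise replaced by t2 = 4*i, it becomes if t2>=t4 goto B6, as in the result.)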
Result
B1:
  i = m-1
  j = n
  t1 = 4*n
  v = a[t1]
  t2 = 4*i
  t4 = 4*j
B2:
  t2 = t2+4
  t3 = a[t2]
  if t3<v goto B2
B3:
  t4 = t4-4
  t5 = a[t4]
  if t5>v goto B3
B4:
  if t2>=t4 goto B6
B5:
  a[t2] = t5
  a[t4] = t3
  goto B2
B6:
  t14 = a[t1]
  a[t2] = t14
  a[t1] = t3
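Example – final result
• Another application of local common subexpression elimination (LCSE) on B1:
since j = n, the value 4*j assigned to t4 equals t1 = 4*n, so t4 = 4*j becomes
t4 = t1, and the now-unused j = n is removed.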
B1:
  i = m-1
  t1 = 4*n
  v = a[t1]
  t2 = 4*i
  t4 = t1
B2:
  t2 = t2+4
  t3 = a[t2]
  if t3<v goto B2
B3:
  t4 = t4-4
  t5 = a[t4]
  if t5>v goto B3
B4:
  if t2>=t4 goto B6
B5:
  a[t2] = t5
  a[t4] = t3
  goto B2
B6:
  t14 = a[t1]
  a[t2] = t14
  a[t1] = t3
INTRODUCTION TO GLOBAL
DATAFLOW ANALYSIS
▪ In order to do code optimization and a good job of code generation,
a compiler needs to collect information about the program as a whole
and distribute this information to each block in the flow graph.
▪ For example, a compiler could take advantage of "reaching definitions", such as knowing
where a variable like debug was last defined before reaching a given block,
in order to perform transformations. An optimizing compiler collects such information by a process
known as data-flow analysis.
▪ Data-flow information can be collected by setting up and solving systems of equations of
the form:
Out[S] = gen[S] U (In[S] - kill[S])
Out[S] = information at the end of S.
gen[S] = information generated by S.
In[S] = information entering at the beginning of S.
kill[S] = information killed (invalidated) by S.
Out[S]=gen[S] U (In[S]-kill[S])
This equation can be read as "the information at the end of a statement is either
generated within the statement, or enters at the beginning and is not killed as
control flows through the statement."
Such equations are called data-flow equations.
INTRODUCTION TO GLOBAL
DATAFLOW ANALYSIS
Points and Paths:
▪ Within a basic block, we talk of the point between two adjacent
statements, as well as the point before the first statement and after the
last.
▪ Thus, a block B1 with three assignments has four points: one before any of the assignments and
one after each of the three assignments.
Now let us take a global view and consider all the points in all the blocks.
A path from p1 to pn is a sequence of points p1, p2, …, pn such that for each i
between 1 and n-1, either
1. p_i is the point immediately preceding a statement and p_{i+1} is the point
immediately following that statement in the same block, or
2. p_i is the end of some block and p_{i+1} is the beginning of a successor block.
INTRODUCTION TO GLOBAL
DATAFLOW ANALYSIS
• A definition of a variable x is a statement that assigns, or may assign, a value to x.
• The most common forms of definition are assignments to x and
statements that read a value from an i/o device and store it in x.
• These statements certainly define a value for x, and they are referred to
as unambiguous definitions of x.
• There are certain kinds of statements that may define a value for x;
they are called ambiguous definitions.
INTRODUCTION TO GLOBAL
DATAFLOW ANALYSIS
The most usual forms of ambiguous definitions of x are:
1. A call of a procedure with x as a parameter (other than a by-value parameter), or a call of a procedure that can access
x because x is in the scope of the procedure.
2. An assignment through a pointer that could refer to x.
For example, the assignment *q:=y is a definition of x if it is possible that q
points to x. In the absence of any knowledge about where q can point, we must assume that an assignment through a pointer is a
definition of every variable.
INTRODUCTION TO GLOBAL
DATAFLOW ANALYSIS
S → id := E | S ; S | if E then S else S | do S while E
E → id + id | id
INTRODUCTION TO GLOBAL
DATAFLOW ANALYSIS
S:  d: a := b + c
gen[S] = {d}
kill[S] = Da - {d}
Out[S] = gen[S] U (In[S] - kill[S])
Observe the rules for a single assignment to
variable a. Surely that assignment is a
definition of a, say d. Thus
gen[S] = {d}
On the other hand, d "kills" all other
definitions of a, so we write
kill[S] = Da - {d}
where Da is the set of all definitions of variable a in the
program.
INTRODUCTION TO GLOBAL
DATAFLOW ANALYSIS
Under what circumstances is definition d generated by S = S1; S2? First of all, if it is
generated by S2, then it is surely generated by S. If d is generated by S1, it will reach the
end of S provided it is not killed by S2. Thus, we write
gen[S] = gen[S2] U (gen[S1] - kill[S2])
Similar reasoning applies to the killing of a definition, so we have
kill[S] = kill[S2] U (kill[S1] - gen[S2])
RUNTIME ENVIRONMENT
➢ A program as source code is merely a collection of text (code, statements, etc.),
and to make it alive, it requires actions to be performed on the target machine.
➢ A program needs memory resources to execute instructions.
➢ A program contains names for procedures, identifiers, etc., that require
mapping to actual memory locations at runtime.
➢ By runtime, we mean a program in execution. The runtime environment is a state
of the target machine, which may include software libraries, environment
variables, etc., that provide services to the processes running in the system.
ACTIVATION
➢ A program is a sequence of instructions combined into a number of
procedures.
➢ Instructions in a procedure are executed sequentially.
➢ A procedure has a start and an end delimiter, and everything inside it
is called the body of the procedure.
➢ The procedure identifier and the finite sequence of instructions inside
it make up the procedure definition.
ACTIVATION RECORD
➢ The execution of a procedure is called its activation.
➢ An activation record contains all the necessary information required to call a
procedure.
➢ An activation record may contain the following units, depending upon the
source language used.
ACTIVATION RECORD
Temporaries: stores temporary and intermediate values of an expression.
Local Data: stores local data of the called procedure.
Machine Status: stores machine status, such as registers and the program counter, before the procedure is called.
Control Link: stores the address of the activation record of the caller procedure.
Access Link: stores the information of data which is outside the local scope.
Actual Parameters: stores actual parameters, i.e., parameters which are used to send input to the called procedure.
Return Value: stores return values.
CONTROL STACK
Whenever a procedure is executed, its activation record is stored on the stack,
also known as the control stack.
When a procedure calls another procedure, the execution of the caller is suspended
until the called procedure finishes execution.
At this time, the activation record of the called procedure is stored on the stack.
Mr. K. Sankar Ganesh , Assistant Professor Department of Computer Science & Engineering
1151CS115-Compiler Design
➢ We assume that program control flows in a sequential manner and, when a
procedure is called, control is transferred to the called procedure.
➢ When a called procedure finishes executing, it returns control to the caller.
➢ This type of control flow makes it easier to represent a series of activations in
the form of a tree, known as the activation tree.
ACTIVATION TREES
To understand this concept, we take a piece of code as an example:
. . .
printf("Enter Your Name: "); scanf("%s", username); show_data(username);
printf("Press any key to continue…");
. . .
int show_data(char *user)
{
printf("Your name is %s", user); return 0;
}
. . .
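In the activation tree of this code, the activation of the calling procedure is the root; its children, from left to right, are the activations of printf, scanf, show_data and printf, and show_data in turn has one child, its own call to printf.
From this we see that procedures are executed in a depth-first manner; thus stack allocation is the most suitable form of storage for procedure activations.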
STORAGE ALLOCATION STRATEGIES
The runtime environment manages runtime memory requirements for the following entities:
Code:
It is known as the text part of a program and does not change at runtime. Its memory
requirements are known at compile time.
Procedures:
Their text part is static, but they are called in an unpredictable order. That is why stack storage
is used to manage procedure calls and activations.
Variables:
Variables are known at runtime only, unless they are global or constant. The heap memory
allocation scheme is used for managing allocation and de-allocation of memory for variables
at runtime.
The various storage allocation strategies used to allocate storage
in different data areas of memory are:
1. Static Allocation
• Storage is allocated for all data objects at compile time
2. Stack Allocation
• The storage is managed as a stack
3. Heap Allocation (a form of dynamic storage allocation)
• The storage is allocated and deallocated at runtime
from a data area known as the heap
➢ In the typical memory layout of a program (figure not reproduced), the text part of the code is allocated a fixed
amount of memory.
➢ Stack and heap memory are arranged at the extremes of the total memory allocated
to the program; they grow toward each other and shrink away from each other.
Static Allocation
➢ In a static environment (Fortran 77) there are a number
of restrictions:
▪ The sizes of data objects are known at compile time
▪ No recursive procedures
▪ No dynamic memory allocation
➢ Only one copy of each procedure activation record exists
at any time
▪ We can allocate storage at compile time
• Bindings do not change at runtime
• Every time a procedure is called, the same bindings
occur
Static Allocation
➢ Statically allocated names are bound to relocatable storage
at compile time.
➢ Storage bindings of statically allocated names never
change.
➢ The compiler uses the type of a name (retrieved from the
symbol table) to determine the storage size required.
➢ The required number of bytes (possibly aligned) is set
aside for the name.
➢ The relocatable address of the storage is fixed at compile
time.
Static Allocation
➢ Limitations:
• The size required must be known at compile time.
• Recursive procedures cannot be implemented statically.
• No data structure can be created dynamically, as all
data is static.
Stack-based Allocation
➢ In stack-based allocation, the previous restrictions are lifted (Pascal,
C, etc.)
▪ procedures are allowed to be called recursively
o Need to hold multiple activation records for the same
procedure
o Created as required and placed on the stack
✓ Each record will maintain a pointer to the record that
activated it
✓ On completion, the current record will be deleted from
the stack and control is passed to the calling record
▪ Dynamic memory allocation is allowed
▪ Pointers to data locations are allowed
Stack-based Allocation
➢ Storage is organized as a stack.
➢ Activation records are pushed and popped.
➢ Locals and parameters are contained in the activation
records for the call.
➢ This means locals are bound to fresh storage on every call.
➢ We just need a stack_top pointer.
➢ To allocate a new activation record, we just increase
stack_top.
➢ To deallocate an existing activation record, we just
decrease stack_top.
Address generation in stack allocation
➢ The position of the activation record on the stack cannot
be determined statically.
➢ Therefore the compiler must generate addresses RELATIVE
to the activation record.
➢ We generate addresses of the
form stack_top + offset
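The stack_top discipline and relative addressing can be sketched together. This is a minimal sketch over a flat list of memory cells. It assumes an address is formed as the base of the current record (the value stack_top had when that record was pushed) plus an offset fixed at compile time, which is one way to read the "stack_top + offset" form. `RuntimeStack` is a hypothetical class.

```python
class RuntimeStack:
    """A flat memory whose low end holds a stack of activation records."""

    def __init__(self, size):
        self.memory = [0] * size
        self.stack_top = 0   # first free cell
        self.frames = []     # (base, size) of each live activation record

    def push(self, record_size):
        """Allocate a record by increasing stack_top; return its base."""
        base = self.stack_top
        if base + record_size > len(self.memory):
            raise MemoryError("stack overflow")
        self.stack_top += record_size
        self.frames.append((base, record_size))
        return base

    def pop(self):
        """Deallocate the current record by decreasing stack_top."""
        _, record_size = self.frames.pop()
        self.stack_top -= record_size

    def address(self, offset):
        """Run-time address of a local at a compile-time offset."""
        base, _ = self.frames[-1]
        return base + offset


s = RuntimeStack(16)
s.push(3)              # first activation of a procedure with a 3-cell record
s.push(3)              # a recursive activation of the same procedure
print(s.address(1))    # 4: local at offset 1 in the second record
s.pop()
print(s.address(1))    # 1: the same offset now resolves in the first record
```

The compiler emits the same offset for every activation; only the base differs at run time.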
Stack Allocation Advantages and Disadvantages
Advantages:
➢ It supports recursion as memory is always allocated on
block entry.
➢ It allows data structures to be created dynamically.
➢ It allows an array declaration like A(I, J), since actual
allocation is made only at execution time. The
dimension bounds need not be known at compile time.
Disadvantages:
➢ Memory addressing has to be effected through pointers
and index registers, which may be slower than static
allocation, especially in the case of array references.
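An array A(I, J) whose bounds are known only on entry can be placed in the activation record at run time. This is a minimal sketch, assuming a hypothetical 4-byte element, an 8-byte fixed part at the start of the record, and row-major layout with 0-based indices; `enter` and `element_address` are illustrative names.

```python
ELEM_SIZE = 4     # hypothetical element size in bytes
FIXED_PART = 8    # hypothetical size of the record's fixed fields


def enter(stack_top, i, j):
    """Allocate a record holding array A(i, j); the bounds are known only now.

    Returns (array_base, new_stack_top).
    """
    array_base = stack_top + FIXED_PART
    return array_base, array_base + i * j * ELEM_SIZE


def element_address(array_base, j, row, col):
    """Row-major address of A(row, col) with 0-based indices.

    The multiplication by j happens at run time because j is not a
    compile-time constant; this arithmetic is the addressing overhead
    mentioned under the disadvantages.
    """
    return array_base + (row * j + col) * ELEM_SIZE


base, top = enter(100, 3, 5)
print(base, top, element_address(base, 5, 2, 1))  # 108 168 152
```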
Heap Allocation
Stack allocation cannot be used if:
➢ The values of the local variables must be retained when
an activation ends
➢ A called activation outlives the caller
➢ In such cases, activation records cannot be de-allocated
in last-in first-out fashion
➢ Heap allocation gives out pieces of contiguous storage
for activation records
Heap Allocation
There are two aspects of dynamic allocation:
➢ Runtime allocation and de-allocation of data structures
➢ Languages like Algol have dynamic data structures, and some part of memory
is reserved for them.
If a procedure wants to keep a value that is to be used after its activation is over, then it
cannot use the stack for that purpose. That is, a language like Pascal allows data to be
allocated under program control. Also, in certain languages a called activation may outlive
the calling procedure. In such a case a last-in-first-out queue will not work, and we require a
data structure like a heap to store the activation. The last case does not arise in languages
whose activation trees correctly depict the flow of control between procedures.
➢ Some languages do not have tree-structured allocations.
➢ In these cases, activations have to be allocated on the heap.
➢ This allows strange situations, like callee activations that live longer than
their callers’ activations.
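A callee whose local data must outlive its activation can be modelled with heap-allocated records. This is a minimal sketch, assuming records are dictionaries in a hypothetical `Heap` that frees them explicitly and in any order; `make_counter` is a hypothetical procedure whose local `count` survives after it returns.

```python
class Heap:
    """Activation records allocated on a heap and freed in any order."""

    def __init__(self):
        self.records = {}
        self.next_id = 0

    def allocate(self, **fields):
        rid = self.next_id
        self.next_id += 1
        self.records[rid] = dict(fields)
        return rid

    def free(self, rid):
        del self.records[rid]


def make_counter(heap):
    """Hypothetical procedure whose local `count` must outlive its activation."""
    rid = heap.allocate(count=0)

    def increment():
        heap.records[rid]["count"] += 1
        return heap.records[rid]["count"]

    return rid, increment


heap = Heap()
first, inc_first = make_counter(heap)
second, inc_second = make_counter(heap)
inc_first()
print(inc_first(), inc_second())  # 2 1
heap.free(first)  # freed before `second`: not last-in first-out
```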
Heap Allocation
1. Heap allocation is the most flexible allocation scheme.
2. Variables local to a procedure are allocated and de-allocated only at runtime.
3. Heap allocation is used to dynamically allocate memory to the variables and claim it
back when the variables are no longer required.
4. Except for the statically allocated memory area, both stack and heap memory can grow and
shrink dynamically and unexpectedly.
5. Therefore, they cannot be provided with a fixed amount of memory in the system.
6. Heap storage allocation supports the recursion process.
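The slides do not name a policy for handing out and reclaiming heap storage. The sketch below assumes first fit, one common way to give out contiguous pieces and merge them back when they are freed. `FirstFitHeap` is a hypothetical class working in abstract cells.

```python
class FirstFitHeap:
    """First-fit allocation over `size` cells (an assumed policy)."""

    def __init__(self, size):
        self.holes = [(0, size)]   # free (start, length) pairs, sorted by start
        self.allocated = {}        # start -> length of each live block

    def allocate(self, n):
        for i, (start, length) in enumerate(self.holes):
            if length >= n:                      # first hole that fits
                if length == n:
                    del self.holes[i]
                else:
                    self.holes[i] = (start + n, length - n)
                self.allocated[start] = n
                return start
        raise MemoryError("no hole large enough")

    def release(self, start):
        """Claim a block back and merge it with adjacent holes."""
        self.holes.append((start, self.allocated.pop(start)))
        self.holes.sort()
        merged = []
        for s, length in self.holes:
            if merged and merged[-1][0] + merged[-1][1] == s:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((s, length))
        self.holes = merged


heap = FirstFitHeap(100)
a = heap.allocate(30)
b = heap.allocate(20)
heap.release(a)
print(heap.holes)  # [(0, 30), (50, 50)]
```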
1. When a procedure refers to variables that are not local to it, such
variables are called nonlocal variables.
2. There are two types of scope rules for non-local names:
➢ Static scope
➢ Dynamic scope
ACCESS TO NON-LOCAL NAMES
Static Scope or Lexical Scope
1. Lexical scope is also called static scope. In this type of scope, the scope of a name
is determined by examining the text of the program.
2. Examples: Pascal, C and Ada are languages that use the static
scope rule.
3. These languages are also called block structured languages.
STATIC SCOPE OR LEXICAL SCOPE
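Python also resolves nonlocal names by the static rule, so it can illustrate the idea. This is a small sketch with hypothetical functions `show` and `caller`. Under dynamic scope, the same call would instead see the caller's `x`.

```python
x = "global x"


def show():
    # x is not local to show, so it is looked up in the scope that encloses
    # show in the program text (the module), not in show's caller.
    return x


def caller():
    x = "caller's x"  # local to caller; invisible to show under static scope
    return show()


print(caller())  # global x
```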
Block
▪ A block defines a new scope with a sequence of statements that
contains the local data declarations.
▪ It is enclosed within delimiters, such as { and } in C or begin and end in Pascal.
Example:
{
Declaration statements
……….
}
STATIC SCOPE OR LEXICAL SCOPE
The beginning and end of the block are specified by the delimiters. Blocks can be
nested, which means that block B2 can lie completely inside block B1.
In a block-structured language, the scope of a declaration is given by the static
rule, also called the most closely nested rule.
At a program point, the following declarations are visible:
1. The declarations that are made inside the procedure.
2. The names of all enclosing procedures.
3. The declarations of names made immediately within such procedures.
STATIC SCOPE OR LEXICAL SCOPE
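A compiler can apply the most closely nested rule with a stack of per-block tables, searching from the innermost block outward. This is a minimal sketch; `ScopedSymbolTable` and the stored strings are hypothetical.

```python
class ScopedSymbolTable:
    """One dictionary per open block; the last one is the innermost block."""

    def __init__(self):
        self.scopes = [{}]  # the outermost block

    def enter_block(self):
        self.scopes.append({})

    def exit_block(self):
        self.scopes.pop()

    def declare(self, name, info):
        self.scopes[-1][name] = info

    def lookup(self, name):
        # Search from the innermost block outward: the first match is the
        # most closely nested declaration.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name} is not declared")


table = ScopedSymbolTable()
table.declare("a", "a of B1")
table.declare("b", "b of B1")
table.enter_block()              # B2 nested inside B1
table.declare("b", "b of B2")
print(table.lookup("b"), "/", table.lookup("a"))  # b of B2 / a of B1
table.exit_block()
print(table.lookup("b"))         # b of B1
```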
1. The displayed image on the screen shows the storage for the names
corresponding to each block.
2. Thus, storage for a block-structured program can be allocated on a stack.
STATIC SCOPE OR LEXICAL SCOPE
LEXICAL SCOPE FOR NESTED PROCEDURES
If a procedure is declared inside another procedure, then that procedure is known as a
nested procedure.
A procedure pi can call its direct ancestor or the older siblings of
its direct ancestor.
Procedure main
Procedure P1
Procedure P2
Procedure P3
Procedure P4
Nesting Depth
Lexical scope can be implemented by using the nesting depth of a procedure.
The nesting depth is calculated as follows:
➢ The main program's nesting depth is 1.
➢ Each time a new procedure begins, add 1 to the nesting depth.
➢ Each time you exit from a nested procedure, subtract 1 from the depth.
➢ A variable declared in a specific procedure is associated with that procedure's
nesting depth.
LEXICAL SCOPE FOR NESTED PROCEDURE
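The counting rule above amounts to "depth of the enclosing procedure plus one". This is a minimal sketch, assuming the nesting structure is given as a hypothetical map from each procedure to the procedure it is declared in.

```python
def nesting_depths(parent):
    """Compute the nesting depth of every procedure.

    parent maps each procedure to the procedure it is declared in
    (None for the main program). The main program has depth 1, and each
    level of nesting adds 1.
    """
    depth = {}

    def depth_of(proc):
        if proc not in depth:
            enclosing = parent[proc]
            depth[proc] = 1 if enclosing is None else depth_of(enclosing) + 1
        return depth[proc]

    for proc in parent:
        depth_of(proc)
    return depth


# Hypothetical nesting: p and r are declared in main, q is declared in p.
print(nesting_depths({"main": None, "p": "main", "q": "p", "r": "main"}))
# {'main': 1, 'p': 2, 'q': 3, 'r': 2}
```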
Static Scope or Lexical Scope
The lexical scope can be implemented using access links and displays.
Access Link:
▪ An access link is a pointer kept in each activation record and used to implement
lexical scope.
▪ If procedure p is nested immediately within procedure q, then the access link in an
activation record for p points to the access link in the most recent activation record of q.
STATIC SCOPE OR LEXICAL SCOPE
Example: Consider the following piece of code and the runtime stack during execution of the program
program test;
var a: int;
procedure A;
  var d: int;
  {
    a := 1;
  }
procedure B(i: int);
  var b : int;
  procedure C;
    var k : int;
    {
      A;
    }
  {
    if (i <> 0) then B(i-1)
    else C;
  }
{
  B(1);
}
STATIC SCOPE OR LEXICAL SCOPE
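The access links for the example program can be simulated directly. The nesting depths are test 1, A and B 2, and C 3. This is a minimal sketch, assuming the standard setup rule: a callee nested directly in its caller links to the caller's record; otherwise the callee follows (caller depth minus callee depth plus 1) access links from the caller. `ActivationRecord`, `call` and `resolve` are hypothetical names.

```python
class ActivationRecord:
    """Simulated record: procedure name, nesting depth, and two links."""

    def __init__(self, proc, depth, control_link=None, access_link=None):
        self.proc = proc
        self.depth = depth
        self.control_link = control_link  # caller's record
        self.access_link = access_link    # record of the enclosing procedure


def call(caller, proc, depth):
    """Create a record for `proc` and set its access link."""
    link = caller
    for _ in range(caller.depth - depth + 1):  # zero hops if nested in caller
        link = link.access_link
    return ActivationRecord(proc, depth, caller, link)


def resolve(record, decl_depth):
    """Find the record holding a name declared at depth decl_depth."""
    for _ in range(record.depth - decl_depth):
        record = record.access_link
    return record


# Calls made by the example: test -> B(1) -> B(0) -> C -> A
prog = ActivationRecord("test", 1)
b1 = call(prog, "B", 2)
b0 = call(b1, "B", 2)
c = call(b0, "C", 3)
a = call(c, "A", 2)
print(a.access_link.proc)   # test: A is declared in test
print(resolve(c, 2) is b0)  # True: b used in C is B(0)'s b
```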
1. If access links are used to find a nonlocal name, the search can be slow, because a
chain of links may have to be followed.
2. So an optimization is used to reach the activation record holding the variable directly,
without any search.
3. A display is a global array d of pointers to activation records, indexed by
lexical nesting depth. The number of display elements can be known at
compile time.
4. d[i] is an array element that points to the most recent activation of the
block at nesting depth (or lexical level) i.
DISPLAYS
A nonlocal X is found in the following manner:
1. Use one array access to find the activation record containing X.
If the most closely nested declaration of X is at nesting depth i,
then d[i] points to the activation record containing the location for
X.
2. Use the relative address within the activation record to find X.
DISPLAYS
How to maintain display information?
EXAMPLE
Mr. K. Sankar Ganesh , Assistant Professor Department of Computer Science & Engineering
1151CS115-Compiler Design
When a procedure p at nesting depth i is called, it is set up as follows:
Save the value of d[i] in the activation record for p.
Set d[i] to point to the new activation record.
When p returns:
Reset d[i] to the display value saved in p's activation record.
Where can the display be maintained?
In registers
In statically allocated memory (data segment)
On the control stack, creating a new copy on each entry
EXAMPLE
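The save/set/restore protocol and the one-access lookup can be sketched as follows. This is a minimal sketch, assuming records are dictionaries with a hypothetical `locals` field (a dictionary key stands in for the relative address) and a `saved_d` field for the saved display entry; `Display` is a hypothetical class.

```python
class Display:
    """d[i] points to the most recent active record at nesting depth i."""

    def __init__(self, max_depth):
        self.d = [None] * (max_depth + 1)  # index 0 unused; depths start at 1

    def enter(self, record, depth):
        record["saved_d"] = self.d[depth]  # save old d[i] in the new record
        self.d[depth] = record             # d[i] now points to the new record

    def leave(self, record, depth):
        self.d[depth] = record["saved_d"]  # restore d[i] on return

    def lookup(self, name, decl_depth):
        # One array access finds the record; the name is then found inside it.
        return self.d[decl_depth]["locals"][name]


display = Display(3)
main_rec = {"locals": {"x": 1}}
p_rec = {"locals": {"y": 2}}
display.enter(main_rec, 1)
display.enter(p_rec, 2)
print(display.lookup("x", 1))   # 1, found without following any links
display.leave(p_rec, 2)
print(display.d[2])             # None: d[2] restored
```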
1. The communication medium among procedures is known as parameter
passing.
2. We first go through some basic terminology pertaining to the values in a
program.
PARAMETER PASSING
r-value
1. The value of an expression is called its r-value.
2. The value contained in a single variable also becomes an r-value if it
appears on the right-hand side of the assignment operator.
3. r-values can always be assigned to some other variable.
PARAMETER PASSING
l-value
1. The location (memory address) where an expression is stored is
known as the l-value of that expression.
2. An expression that appears on the left-hand side of an assignment operator must have an l-value.
PARAMETER PASSING
For example:
day = 1;
week = day * 7;
month = 1;
year = month * 12;
From this example, we understand that constant values like 1, 7, 12,
and variables like day, week, month and year, all have r-values.
Only variables have l-values as they also represent the memory
location assigned to them.
PARAMETER PASSING
For example:
7 = x + y; is an l-value error, as the constant 7 does not represent any
memory location.
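The distinction can be made concrete with a tiny machine model. This is a minimal sketch, assuming an environment that maps each variable name to an address (its l-value) and a store that maps addresses to values (r-values); `Machine` is a hypothetical class and constants are plain integers.

```python
class Machine:
    """env maps a name to its address (l-value); store maps an address
    to the value kept there (r-value)."""

    def __init__(self, names):
        self.env = {name: addr for addr, name in enumerate(names)}
        self.store = [0] * len(names)

    def l_value(self, expr):
        if expr not in self.env:  # constants have no location
            raise ValueError(f"{expr!r} has no l-value")
        return self.env[expr]

    def r_value(self, expr):
        # A constant is its own r-value; a variable's r-value is read
        # from the location its l-value names.
        return expr if isinstance(expr, int) else self.store[self.env[expr]]

    def assign(self, target, value):
        self.store[self.l_value(target)] = value


m = Machine(["day", "week", "month", "year"])
m.assign("day", m.r_value(1))
m.assign("week", m.r_value("day") * m.r_value(7))
print(m.r_value("week"))  # 7
try:
    m.assign(7, 0)        # like 7 = x + y;
except ValueError as err:
    print(err)            # 7 has no l-value
```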
Formal Parameters
1. Variables that take the information passed by the caller procedure are
called formal parameters.
2. These variables are declared in the definition of the called function.
PARAMETER PASSING
Actual Parameters
1. Variables whose values or addresses are being passed to the
called procedure are called actual parameters.
2. These variables are specified in the function call as arguments.
PARAMETER PASSING
Example:
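#include <stdio.h>

void fun_two(int formal_parameter)   /* formal parameter */
{
    printf("%d\n", formal_parameter);
}

void fun_one(void)
{
    int actual_parameter = 10;
    fun_two(actual_parameter);       /* actual parameter */
}

int main(void)
{
    fun_one();
    return 0;
}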
Formal parameters hold the information of the actual parameter,
depending upon the parameter passing technique used. It may be a
value or an address.
PARAMETER PASSING
Pass by Value
1. In the pass-by-value mechanism, the calling procedure passes the r-value
of actual parameters and the compiler puts that into the called
procedure’s activation record.
2. Formal parameters then hold the values passed by the calling
procedure.
3. If the values held by the formal parameters are changed, it should
have no impact on the actual parameters.
PASS BY VALUE
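Pass by value can be simulated with an explicit memory and a callee record. This is a minimal sketch, assuming memory is a Python list and the callee's activation record is a dictionary with a hypothetical `formal` slot; `call_by_value` and `assign_to_formal` are illustrative names.

```python
def call_by_value(memory, actual_addr, body):
    """Copy the actual parameter's r-value into the callee's record."""
    record = {"formal": memory[actual_addr]}  # the formal holds a copy
    body(record)
    return record["formal"]


def assign_to_formal(record):
    record["formal"] = 99  # assigning to the formal parameter


memory = [5]                                       # memory[0] holds the actual
print(call_by_value(memory, 0, assign_to_formal))  # 99: the formal changed
print(memory)                                      # [5]: the actual did not
```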
Pass by Reference
In the pass-by-reference mechanism, the l-value of the actual parameter is
copied to the activation record of the called procedure.
This way, the called procedure now has the address (memory location) of the
actual parameter, and the formal parameter refers to the same memory
location.
Therefore, if the value pointed to by the formal parameter is changed, the
impact is seen on the actual parameter, as both refer to the same memory location.
PASS BY REFERENCE
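Pass by reference can be simulated in the same style as pass by value. This is a minimal sketch, assuming memory is a Python list and the callee's record stores the actual parameter's address in a hypothetical `formal_addr` slot; `call_by_reference` and `assign_through_formal` are illustrative names.

```python
def call_by_reference(memory, actual_addr, body):
    """The callee's record holds the actual parameter's l-value (address)."""
    record = {"formal_addr": actual_addr}
    body(memory, record)


def assign_through_formal(memory, record):
    # Every use of the formal goes through the stored address.
    memory[record["formal_addr"]] = 99


memory = [5]
call_by_reference(memory, 0, assign_through_formal)
print(memory)  # [99]: the actual parameter changed
```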
Pass by Copy-restore
This parameter passing mechanism works similarly to 'pass-by-reference',
except that the changes to actual parameters are made when the called
procedure ends.
Upon function call, the values of actual parameters are copied in the
activation record of the called procedure.
PASS BY COPY-RESTORE
Formal parameters
If manipulated, formal parameters have no real-time effect on actual parameters, because
the called procedure works on copies of their values; when the called procedure ends, the
values of the formal parameters are copied into the l-values of the actual parameters.
FORMAL PARAMETERS
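Copy-restore and pass-by-reference usually agree; they differ when the callee can also reach the actual parameter another way. This is a minimal sketch, assuming a hypothetical procedure `p(x): x := x + 1; a := 10` called as `p(a)`, where `a` is a global stored at a hypothetical address `A`.

```python
A = 0  # hypothetical address of the global variable a


def p_by_reference(memory, x_addr):
    # procedure p(x): x := x + 1; a := 10
    memory[x_addr] = memory[x_addr] + 1
    memory[A] = 10


def p_by_copy_restore(memory, x_addr):
    x = memory[x_addr]   # copy in: the formal starts with the actual's value
    x = x + 1            # x := x + 1 changes only the copy
    memory[A] = 10       # a := 10 writes the global directly
    memory[x_addr] = x   # copy out on return into the actual's l-value


mem_ref, mem_cr = [1], [1]
p_by_reference(mem_ref, A)     # call p(a): x is an alias for a
p_by_copy_restore(mem_cr, A)
print(mem_ref, mem_cr)         # [10] [2]
```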
Pass by Name
Languages like Algol provide a new kind of parameter passing
mechanism that works like the preprocessor (macro expansion) in the C language.
In the pass-by-name mechanism, the name of the procedure being called
is replaced by its actual body.
Pass-by-name textually substitutes the argument expressions in a
procedure call for the corresponding parameters in the body of the
procedure so that it can now work on actual parameters, much like
pass-by-reference.
PASS BY NAME
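Pass-by-name is commonly implemented by passing a thunk, a parameterless routine that re-evaluates the argument expression each time the parameter is used. This is a minimal sketch, assuming a hypothetical procedure `p(x): i := 1; return x` called with the argument `a[i]`, and using a Python lambda as the thunk.

```python
def p_by_value(env, x):
    # procedure p(x): i := 1; return x
    env["i"] = 1
    return x


def p_by_name(env, x):
    # Same body, but x is a thunk: each use re-evaluates the argument
    # expression in the caller's environment.
    env["i"] = 1
    return x()


env = {"i": 0, "a": [10, 20]}
print(p_by_value(env, env["a"][env["i"]]))         # 10: a[i] evaluated once, with i = 0
env["i"] = 0
print(p_by_name(env, lambda: env["a"][env["i"]]))  # 20: a[i] re-evaluated after i := 1
```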