Compiler Design

7UNIT 01
1.Define Compiler and different phases of compiler with Diagram.
⇒It is a computer program that converts High level language to machine level language.
The Compiler has two modules namely the front end and the back end. Front-end
constitutes the Lexical Analyzer, semantic analyzer, syntax analyzer, and intermediate
code generator. And the rest are assembled to form the back end.
Lexical Analyzer: This phase reads the character in the source program & grouped
them in tokens. Ex. +,=, ,.
Example:
x = y + 10
Tokens
X identifier
= Assignment operator
Y identifier
+ Addition operator
10 Number

Syntax Analyzer: The next phase is called the syntax analysis or parsing. It takes the
token produced by lexical analysis as input and generates a parse tree (or syntax tree).
In this phase, token arrangements are checked against the source code grammar, i.e.
the parser checks if the expression made by the tokens is syntactically correct.
(a+b)*c
Semantic Analyzer: It verifies the parse tree, whether it’s meaningful or not. It
furthermore produces a verified parse tree. It also does type checking, Label checking,
and Flow control checking.
example,
float x = 20.2;
float y = x*30;
Intermediate Code Generator: After semantic analysis the compiler generates an
intermediate code of the source code for the target machine. It is in between the high-
level language and the machine language. This intermediate code should be generated
in such a way that it makes it easier to be translated into the target machine code.
ex.
total = count + rate * 5
Intermediate code with the help of address code method is:
t1 := int_to_float(5)
t2 := rate * t1
t3 := count + t2
total := t3

Code Optimization: The next phase does code optimization of the intermediate code.
Optimization can be assumed as something that removes unnecessary code lines, and
arranges the sequence of statements in order to speed up the program execution
without wasting resources (CPU, memory).
Example:
Consider the following code
a = intofloat(10)
b = c * a
d = e + b
f = d
Can become
b =c * 10.0
f = e+b
Code Generation: The code generator translates the intermediate code into a sequence
of (generally) relocatable machine code. Sequence of instructions of machine code
performs the task as the intermediate code would do.
Example,
a = b + 60.0
Would be possibly translated to registers.
MOVF a, R1
MULF #60.0, R2
ADDF R1, R2
2.What are the different phases in analysis of source programs?
⇒The analysis of a source program is divided into mainly three phases. They are:
Linear Analysis-
This involves a scanning phase where the stream of characters is read from left to right.
It is then grouped into various tokens having a collective meaning.
Hierarchical Analysis-
In this analysis phase, based on a collective meaning, the tokens are categorized
hierarchically into nested groups.
Semantic Analysis-
This phase is used to check whether the components of the source program are
meaningful or not.

UNIT-02
1. What is Lexical Analysis?
⇒Lexical Analysis is the very first phase in the compiler designing. A Lexer takes the
modified source code which is written in the form of sentences . In other words, it helps
you to convert a sequence of characters into a sequence of tokens. The lexical analyzer
breaks this syntax into a series of tokens. It removes any extra space or comment
written in the source code.
Programs that perform Lexical Analysis in compiler design are called lexical analyzers
or lexers.
What’s a lexeme?
A lexeme is a sequence of characters that are included in the source program according
to the matching pattern of a token. It is nothing but an instance of a token.
What’s a token?
Tokens in compiler design are the sequence of characters which represents a unit of
information in the source program.
What is Pattern?
A pattern is a description which is used by the token. In the case of a keyword which is
used as a token, the pattern is a sequence of characters.
2.Explain Lexical Analyzer Architecture and its role with suitable examples.
⇒The main task of lexical analysis is to read input characters in the code and produce
tokens.
Lexical analyzer scans the entire source code of the program. It identifies each token
one by one. Scanners are usually implemented to produce tokens only when requested
by a parser.

Here is how recognition of tokens in compiler design works-
⦁“Get next token” is a command which is sent from the parser to the lexical analyzer.
⦁On receiving this command, the lexical analyzer scans the input until it finds the next
token.
⦁It returns the token to Parser.
Lexical Analyzer skips whitespaces and comments while creating these tokens. If any
error is present, then Lexical analyzer will correlate that error with the source file and
line number.
Roles of the Lexical analyzer
Lexical analyzer performs below given tasks:
⦁Helps to identify token into the symbol table
⦁Removes white spaces and comments from the source program
⦁Correlates error messages with the source program
⦁Helps you to expands the macros if it is found in the source program
⦁Read input characters from the source program
Example of Lexical Analysis, Tokens, Non-Tokens
Consider the following code that is fed to Lexical Analyzer
#include <stdio.h>
int maximum(int x, int y) {
// This will compare 2 numbers
if (x > y)
return x;
else {
return y;
}
}
Examples of Tokens created
Lexeme Token
int Keyword
maximum Identifier
( Operator
int Keyword
x Identifier

, Operator
int Keyword
Y Identifier
) Operator
{ Operator
If Keyword
3.Specification of Tokens
⇒There are 3 specifications of tokens:
1)Strings
2) Language
3)Regular expression
1) and 2) Strings and Languages
An alphabet or character class is a finite set of symbols.
A string over an alphabet is a finite sequence of symbols drawn from that
alphabet.
A language is any countable set of strings over some fixed alphabet.
In language theory, the terms "sentence" and "word" are often used as synonyms
for "string." The length of a string s, usually written |s|, is the number of
occurrences of symbols in s. For example, a banana is a string of length six. The
empty string, denoted ε, is the string of length zero.
a. Operations on strings
The following string-related terms are commonly used:
1. A prefix of string s is any string obtained by removing zero or more symbols
from the end of string s. For example, ban is a prefix of banana.
2. A suffix of string s is any string obtained by removing zero or more symbols
from the beginning of s. For example, nana is a suffix of banana.
3. A substring of s is obtained by deleting any prefix and any suffix from s. For
example, nan is a substring of banana.
4. The proper prefixes, suffixes, and substrings of a string s are those prefixes,
suffixes, and substrings, respectively of s that are not ε or not equal to s itself.

5. A subsequence of s is any string formed by deleting zero or more not
necessarily consecutive positions of s
6. For example, baan is a subsequence of banana.
b. Operations on languages:
The following are the operations that can be applied to languages:
1. Union
2. Concatenation
3. Kleene closure f ex.∑ = { a, b } , ∑+ = { a, b, aa, ab, ba, bb,………..}
4. Positive closure
The following example shows the operations on strings: Let L={0,1} and
S={a,b,c}
3) Regular Expressions
Each regular expression r denotes a language L(r).
Here are the rules that define the regular expressions over some alphabet Σ and
the languages that those expressions denote:
1. ε is a regular expression, and L(ε) is { ε }, that is, the language whose
sole member is the empty string.
2. If ‘a’ is a symbol in Σ, then ‘a’ is a regular expression, and L(a) = {a},
that is, the language with one string, of length one, with ‘a’ in its one
position.
3. Suppose r and s are regular expressions denoting the languages L(r)
and L(s). Then,
a) (r)|(s) is a regular expression denoting the language L(r) U L(s).
b) (r)(s) is a regular expression denoting the language L(r)L(s).
c) (r)* is a regular expression denoting (L(r))*.
d) (r) is a regular expression denoting L(r).
4. The unary operator * has highest precedence and is left associative.
5. Concatenation has second highest precedence and is left associative.
6. | has lowest precedence and is left associative.

4.Design a lexical analyzer generator lex and explain it.
⇒
● Lex is a program that generates lexical analyzers. It is used with the YACC
parser generator.
● The lexical analyzer is a program that transforms an input stream into a
sequence of tokens.
● It reads the input stream and produces the source code as output through
implementing the lexical analyzer in the C program.
The function of Lex is as follows:
● Firstly, a lexical analyzer creates a program lex.1 in the Lex language. Then the
Lex compiler runs the lex.1 program and produces a C program lex.yy.c.
● Finally C compiler runs the lex.yy.c program and produces an object program
a.out.
● a.out is a lexical analyzer that transforms an input stream into a sequence of
tokens.
4.Explain any three compiler construction tools.

⇒ ⦁Scanner generators: This tool takes regular expressions as input.
For example LEX for Unix Operating System.
⦁Syntax-directed translation engines: These software tools offer an intermediate code by
using the parse tree. It has a goal of associating one or more translations with each node of
the parse tree.
⦁Parser generators: A parser generator takes a grammar as input and automatically generates
source code which can parse streams of characters with the help of a grammar.
⦁Automatic code generators: Takes intermediate code and converts them into Machine
Language.
⦁Data-flow engines: This tool is helpful for code optimization. Here, information is supplied
by the user, and intermediate code is compared to analyze any relation. It is also known as
data-flow analysis. It helps you to find out how values are transmitted from one part of the
program to another part.
5.How to convert source program to target program. Or Language processing
system.

Preprocessor:Takes skeletal source program as input and produces an extended
version of it.
Compiler: Compiles The Program
Assembler: Takes assembly lang as input and converts into machine language code.
Loader/Linker: Links all functions and files required by the object code and converts
object code into executable code.
A translator is a program that converts source code into object code. Generally,
there are three types of translator: compilers. interpreters. assemblers.
6.Input Buffering .

⇒The Input buffer is also commonly known as the input area or input block. When
referring to computer memory, the input buffer is a location that holds all incoming
information before it continues to the CPU for processing.
The lexical analyzer scans the input from left to right one character at a time. It uses two
pointers beginning with ptr(bp) and forward to keep track of the pointer of the input
scanned.
Initially both the pointers point to the first character of the input string as shown below
UNIT-03
1.Syntax Analysis.
⇒Syntax Analysis or Parsing is the second phase, i.e. after lexical analysis. It checks
the syntactic structure of the given input, i.e. whether the given input is in the correct
syntax (of the language in which the input has been written) or not. It does so by
building a data structure, called a Parse tree or Syntax tree.
The parse tree is constructed by using the predefined Grammar of the language and the
input string.
2.Parser
⇒A parser is a compiler or interpreter component that breaks data into smaller elements
for easy translation into another language. A parser takes input in the form of a
sequence of tokens, interactive commands, or program instructions and breaks them up
into parts that can be used by other components in programming.

Types of Parser:
a)Top-Down Parser
● Top-down parsing attempts to build the parse tree from root to leaf.
● The top-down parser will start from the start symbol and proceed to the string.
● It follows the leftmost derivation. In leftmost derivation, the leftmost non-terminal
in each sentential is always chosen.
● Example: LL(1)
● Recursive descent : Brute force, Backtracking
● Non- Recursive descent : Predictive Parser or without backtrack.
b)Bottom Up Parsers / Shift Reduce Parser
● Build the parse tree from leaves to root.
● Bottom-up parsing can be defined as an attempt to reduce the input string w to
the start symbol of grammar by tracing out the rightmost derivations of w in
reverse.
● Example: LR ,SLR,CLR,LALR
5.Error recovery Strategies
⇒
1.Panic mode
In this method of discovering the error, the parser discards input symbols one at a time.
This process is continued until one of the designated sets of synchronizing tokens is
found. Synchronizing tokens are delimiters such as semicolons or ends. These tokens
indicate an end of the input statement.
Advantage:
-It’s easy to use.
-The program never falls into the loop.
Disadvantage:

-This technique may lead to semantic error or runtime error in further stages.
2.Phase level recovery
In this strategy, on discovering an error parser performs original correction on the
remaining input. It can replace a prefix of the remaining input with some string. This
actually helps the parser to continue its job. The original correction can be replacing the
comma with semicolons, omission of semicolons, or, fitting missing semicolons. This
type of original correction is decided by the compiler developer.
Advantages:
-This method is used in many errors repairing compilers.
Disadvantages:
-While doing the replacement the program should be prevented from falling into an
infinite loop.
3.Error production
It requires good knowledge of common errors that might be encountered, then we can
augment the grammar for the corresponding language with error productions that
generate the erroneous constructs.
Advantages:
-Syntactic phase errors are generally recovered by error productions.
Disadvantages:
-The method is very difficult to maintain because if we change the grammar then it
becomes necessary to change the corresponding production.
-It is difficult to maintain by the developers.
4.Global correction
Given an incorrect input string x and grammar G, the algorithm itself can find a parse
tree for a related string y; such that the number of insertions, deletions, and changes of
token required to transform x into y is as low as possible. Global correction methods
increase time & space requirements at parsing time. This is simply a theoretical
concept.
Advantages:
-It makes very few changes in processing an incorrect input string.
Disadvantages:
-It is simply a theoretical concept, which is unimplementable.
5.Symbol table

In semantic errors, errors are recovered by using a symbol table for the corresponding
identifier and if data types of two operands are not compatible, automatically type
conversion is done by the compiler.
Advantages:
-It allows basic type conversion, which we generally do in real-life calculations.
Disadvantages:
-Only Implicit type conversion is possible.
UNIT-04
1.Parse Tree and Syntax Tree
Parse Tree:
⇒
● Parse tree is the graphical representation of symbols. The symbol can be terminal
or non-terminal.
● In parsing, the string is derived using the start symbol. The root of the parse tree
is that start symbol.
● Parse tree follows the precedence of operators. The deepest sub-tree traversed
First,
The parse tree follows these points:
- All leaf nodes have to be terminals.
- All interior nodes have to be non-terminals.
- In-order traversal gives the original input string.
Syntax Tree:
⇒
● A syntax tree is a tree in which each leaf node represents an operand,
while each inside node represents an operator.
● The Parse Tree is abbreviated as the syntax tree.
● Syntax trees are abstract or compact representations of parse trees.
● They are also called Abstract Syntax Trees.

Parse Tree Syntax Tree
Parse tree is a graphical
representation of the replacement
process in a derivation
Syntax tree is the compact form of a
parse
tree.
Each interior node represents a
grammar rules.
Each leaf node represents a terminal
Each interior node represents an
operator.
Each leaf node represents an operand
Parse trees provide every
characteristic information from the real
syntax.
Syntax trees do not provide every
characteristic information from the real
syntax.
Parse trees are comparatively less
dense than syntax trees.
Syntax trees are comparatively more
dense
than parse trees.
2.Syntax Directed Translation
⇒In syntax directed translation, along with the grammar we associate some informal
notations and these notations are called semantic rules.
So we can say that
1. Grammar + semantic rule = SDT (syntax directed translation)
● Every non-terminal can get one or more than one attribute or sometimes 0
attribute depending on the type of the attribute.
● The value of these attributes is evaluated by the semantic rules associated with
the production rule.
● In the semantic rule, attribute is VAL and an attribute may hold anything like a

string, a number, a memory location and a complex record
● In Syntax directed translation, whenever a construct encounters in the
programming language then it is translated according to the semantic rules
define in that particular programming language.
Production Semantic Rules
E → E + T E.node = make_node(‘+’, E.node, T.node)
E → T E.node := T.node
T → T * F T.node := make_node(‘*’, T.node, F.node)
F → (T) F.node := T.node
F —-> id F.node = make_leaf(id, id.entry)
F → num F.node = make_leaf(num, num.value)
-E.val is one of the attributes of E.
-num.val is the attribute returned by the lexical analyzer.
3.Attributes
⇒● Semantic information is stored in attributes associated with terminal and
nonterminal symbols of the grammar.
● The attributes are divided into two groups: Synthesized attributes and Inherited
attribute
1. Synthesized attributes:
● A Synthesized attribute is an attribute of the non-terminal on the left-hand
side of a production.
● Synthesized attributes represent information that is being passed up the
parse tree.
● The attribute can take value only from its children.
● For eg. let’s say A -> BC is a production of a grammar, and A’s attribute is
dependent on B’s attributes or C’s attributes then it will be a synthesized
attribute.
● To illustrate, assume the following production:
S → ABC
● If S is taking values from its child nodes (A, B, C), then it is said to be a
synthesized attribute, as the values of ABC are synthesized to S.
2. Inherited attributes:
● An attribute of a nonterminal on the right-hand side of a production is
called an inherited attribute.
● The attribute can take value either from its parent or from its siblings.

● For example, let’s say A -> BC is a production of a grammar and B’s
attribute is dependent on A’s attributes or C’s attributes then it will be
inherited attribute.
● To illustrate, assume the following production:
S → ABC
● A can get values from S, B and C. B can take values from S, A, and C. Likewise,
C can take values from S, A, and B.
4.Expansion and Reduction
⇒
Expansion : When a non-terminal is expanded to terminals as per a grammatical rule
Reduction : When a terminal is reduced to its corresponding non-terminal according to
grammar rules. Syntax trees are parsed top-down and left to right. Whenever reduction
occurs, we apply its corresponding semantic rules (actions).
5.S-attributed and L-attributed SDT
⇒
S-attributed SDT:
● If an SDT uses only synthesized attributes, it is called an S-attributed SDT.
● S-attributed SDTs are evaluated in bottom-up parsing, as the values of the
parent nodes depend upon the values of the child nodes.
● Semantic actions are placed in the rightmost place of RHS.
L-attributed SDT:

● If an SDT uses both synthesized attributes and inherited attributes with a
restriction that inherited attribute can inherit values from left siblings only, it
is called as L-attributed SDT.
● In L-attributed SDTs, a non-terminal can get values from its parent, child, and
sibling nodes. As in the following production
● S → ABC
● S can take values from A, B, and C (synthesized). A can take values from S only.
B can take values from S and A. C can get values from S, A, and B. No
non-terminal can get values from the sibling to its right.
● For example,
A -> XYZ {Y.S = A.S, Y.S = X.S, Y.S = Z.S}
is not an L-attributed grammar since Y.S = A.S and Y.S = X.S are allowed
but Y.S = Z.S violates the L-attributed SDT definition as attributed is
inheriting the value from its right sibling.
6.Three Address Code
⇒
● Three-address code is an intermediate code. It is used by the optimizing
compilers.
● In three-address code, the given expression is broken down into several separate
instructions. These instructions can easily translate into assembly language.
● Each Three address code instruction has at most three operands, or it can have 2
operands. It is a combination of assignment and a binary operator.
The characteristics of Three Address instructions are-
-They are generated by the compiler for implementing Code Optimization.
-They use a maximum of three addresses to represent any statement.
-They are implemented as a record with the address fields.
Example−
Expression a = b + c + d can be converted into the following Three Address Code.
t1 = b + c
t2 = t1 + d
a = t2
where t1 and t2 are temporary variables generated by the compiler. Most of the time a
statement includes less than three references, but it is still known as a three address
Statement.
Types of Three-Address Code Statements

1. Assignment
The types of Assignment statements are,
x = y op z and x = op y
Here,
● x, y and z are the operands.
● op represents the operator.
It assigns the result obtained after solving the right side expression of the assignment
operator to the left side operand.
x = y, value of y is assigned to x.
2. Unconditional Jump Goto
X , Here, X is the tag or label of the target statement.
On executing the statement,
● The control is sent directly to the location specified by label X.
● All the statements in between are skipped.
3. Conditional Jump
-If x relop y goto X
Here,
● x & y are the operands.
● X is the tag or label of the target statement.
● relop is a relational operator.
If the condition “x relop y” gets satisfied, then-
● The control is sent directly to the location specified by label X.
● All the statements in between are skipped.
If the condition “x relop y” fails, then-
● The control is not sent to the location specified by label X.
● The next statement appearing in the usual sequence is executed.
4. Procedure Call
-A call to the procedure P(x1, x2 … . . xn) with the parameters x1, x2 … . . xn
Param x1
Param x2
Param xn
Here, P is a function which takes x as a parameter.
5. Array Statements
−x = y[i], value of ith location of array y is assigned to x.
x[i] = y, the value of y is assigned to i-th location of array x.

There are three implementations used for three address code statements which are as
follows −
● Quadruples
● Triples
● Indirect Triples
Quadruples
Quadruple is a structure that contains at most four fields, i.e., operator, Argument 1,
Argument 2, and Result.
Operator Argument 1 Argument 2 Result
Triples
This three address code representation contains three (3) fields, i.e., one for operator
and two for arguments (i.e., Argument 1 and Argument 2)
Operator Argument 1 Argument 2
Indirect Triples
The indirect triple representation uses an extra array to list the pointers to the triples in
the desired order. This is known as indirect triple representation.

UNIT-05
1.Intermediate Code Generator
⇒Intermediate code can translate the source program into the machine program.
Intermediate code is generated because the compiler can’t generate machine code
directly in one pass.
Therefore, first, it converts the source program into intermediate code, which performs
efficient generation of machine code further.
The intermediate code can be represented in the form of postfix notation, syntax tree,
directed acyclic graph, three address codes, Quadruples, and triples.
1.Postfix Notation
⇒The ordinary (infix) way of writing the sum of a and b is with operator in the

middle : a + b
-The postfix notation for the same expression places the operator at the right end as
ab +.
-No parentheses are needed in postfix notation.
-In postfix notation the operator follows the operand.
Example – The postfix representation of the expression
(a – b) * (c + d) + (a – b)
is : (ab-)*(cd+ )+(ab-)
ab-cd+*ab-+
ab – cd + *ab -+.
2.Three Address Code
⇒A statement involving no more than three references(two for operands and one for
result) is known as three address statements.
Example – The three address code for the expression a + b * c + d :
T 1 = b * c
T 2 = a + T 1
T 3 = T 2 + d
T 1 , T 2 , T 3 are temporary variables.
3.Syntax Tree
⇒The operator and keyword nodes of the parse tree are moved to their parents and a
chain of single productions is replaced by single link in syntax tree the internal
nodes are operators and child nodes are operands. To form syntax tree put
parentheses in the expression, this way it's easy to recognize which operand should
come first.
Example –
x = (a + b * c) / (a – b * c)

`2.Translation of Statements
⇒
a)Assignments Statements:
Ex.
1. S → id := E
2. E → E1 + E2
3. E → E1 * E2
4. E → (E1)
5. E → id
For this given grammar SDT= Production rule + Semantic action
Production Rule Semantic Actions
S → id :=E {
p.value = look_up(id.name);
if p ≠ nil then
Emit (p = E.place) //GEN()
else
error;
}
E → E1 + E2 {E.place = newtemp();
Emit (E.place = E1.place '+' E2.place)
}
E → E1 * E2 {E.place = newtemp();
Emit (E.place = E1.place '*' E2.place)
}
E → (E1) {E.place = E1.place}
E → id {p = look_up(id.name);
If p ≠ nil then
Emit (p = E.place)
Else
Error;
}
b)Boolean Expressions
⇒
Ex.
1. E → E OR E
2. E → E AND E
3. E → NOT E

4. E → (E)
5. E → id relop id
6. E → TRUE
7. E → FALSE
Production rule Semantic actions
E → E1 OR E2 {E.place = newtemp();
Emit(E.place = E1.place OR E2.place)
}
E → E1 AND E2 {E.place = newtemp();
Emit (E.place = E1.place AND E2.place)
}
E → NOT E1 {E.place = newtemp();
Emit (E.place = NOT E1.place)
}
E → (E1) {E.place = E1.place}
E → id relop id2 {E.place = newtemp();
Emit (if id1.place relop id2.place goto
next_state + 3);
EMIT (E.place =0)
EMIT (goto next_state + 2)
EMIT (E.pla ce =1)
}
E → TRUE {E.place := newtemp();
Emit (E.place =1)
}
E → FALSE {E.place := newtemp();
Emit (E.place = 0)
}
UNIT-06
1.Code Optimization
⇒The code optimization in the synthesis phase is a program transformation technique,
which tries to improve the intermediate code by making it consume fewer resources (i.e.

CPU, Memory) so that faster-running machine code will result. Compiler optimizing
process should meet the following objectives :
-The optimization must be correct, it must not, in any way, change the meaning of the
program.
-Optimization should increase the speed and performance of the program.
-The compilation time must be kept reasonable.
-The optimization process should not delay the overall compiling process.
Types of Code Optimization: The optimization process can be broadly classified into two
types :
Machine Independent Optimization:
This code optimization phase attempts to improve the intermediate code to get a better
target code as the output. The part of the intermediate code which is transformed here
does not involve any CPU registers or absolute memory locations.
Machine Dependent Optimization:
Machine-dependent optimization is done after the target code has been generated and
when the code is transformed according to the target machine architecture. It involves
CPU registers and may have absolute memory references rather than relative
references. Machine-dependent optimizers put efforts to take maximum advantage of
the memory hierarchy.
2.Issues in code generation. Describe it.
⇒Code generator converts the intermediate representation of source code into a form
that can be readily executed by the machine. A code generator is expected to generate
the correct code. Designing of the code generator should be done in such a way so that
it can be easily implemented, tested, and maintained.
The following issue arises during the code generation phase:
● Input to code generator –
-The input to the code generator contains the intermediate representation of the
source program and the information of the symbol table. The source program is
produced by the front end.
-Intermediate representation has the several choices:
a) Postfix notation
b) Syntax tree
c) Three address code

-We assume the front end produces low-level intermediate representation i.e.
values of names in it can be directly manipulated by the machine instructions.
-The code generation phase needs complete error-free intermediate code as an
input requires.
● Target program:
The target program is the output of the code generator. The output can be:
a) Assembly language: It allows subprograms to be separately compiled.
b) Relocatable machine language: It makes the process of code generation
easier.
c) Absolute machine language: It can be placed in a fixed location in memory and
can be executed immediately.
● Memory Management –
-During code generation process the symbol table entries have to be mapped to
actual ip addresses and levels have to be mapped to instruction addresses.
-Mapping names in the source program to address data is co-operating done by
the front end and code generator.
-Local variables are stack allocation in the activation record while global variables
are in the static area.
● Instruction selection –
-The instruction set of the target machine should be complete and uniform.
-When you consider the efficiency of the target machine then the instruction
speed and machine idioms are important factors.
-The quality of the generated code can be determined by its speed and size.
Approaches to code generation issues:
Code generators must always generate the correct code. It is essential because of the
number of special cases that a code generator might face. Some of the design goals of
code generator are:
-Correct
-Easily maintainable
-Testable
-Efficient

3.Target Machine Architecture
⇒The target computer is a type of byte-addressable machine. It has 4 bytes to a word.
The target machine has n general purpose registers, R0, R1,...., Rn-1. It also has two-
address instructions of the form:
Op source Destination
Where, op is used as an op-code and source and destination are used as a data field.
● It has the following op-codes:
ADD (add source to destination)
SUB (subtract source from destination)
MOV (move source to destination)
● The source and destination of an instruction can be specified by the combination
of registers and memory location with address modes.
4.Register Allocation
⇒Register allocation is the process of assigning program variables to registers and
reducing the number of swaps in and out of the registers. Movement of variables across
memory is time consuming and this is the main reason why registers are used as they
are available within the memory and they are the fastest accessible storage location.
Advantages :
-Fast accessible storage

-Allows computations to be performed on them
-Deterministic as it incurs no miss
-Reduce memory traffic
-Reduces overall computation time
Disadvantages :
-Registers are generally available in small amount ( up to few hundred Kb )
-Register sizes are fixed and it varies from one processor to another
-Registers are complicated
-Need to save and restore changes during context switch and procedure calls.
There are three popular Register allocation algorithms .
Naive Register Allocation
Linear Scan Algorithm
Chaitin’s Algorithm
a)Naive Register Allocation:
-Naive (no) register allocation is based on the assumption that variables are stored in
Main Memory .
-We can’t directly perform operations on variables stored in Main Memory .
-Variables are moved to registers which allows various operations to be carried out
using ALU .
-ALU contains a temporary register where variables are moved before performing
arithmetic and logic operations .
-Once operations are complete we need to store the result back to the main memory in
this method .
-Transferring of variables to and fro from Main Memory reduces the overall speed of
execution
Advantages :
-Easy to understand operations and the flow of variables from Main memory to -
Registers and vice versa .
-Only 2 registers are enough to perform any operations .
-Design complexity is less .
Disadvantages :
-Time complexity increases as variables are moved to registers from main memory .
-Too many LOAD and STORE instructions .

-To access a variable a second time we need to STORE it to the Main Memory to
record any changes made and LOAD it again .
-This method is not suitable for modern compilers .
b)Linear Scan Algorithm
-Linear Scan Algorithm is a global register allocation mechanism .
-It is a bottom up approach .
-If n variables are live at any point of time then we require ‘n’ registers .
-In this algorithm the variables are scanned linearly to determine the live ranges of the -
variable based on which the registers are allocated .
-The main idea behind this algorithm is to allocate a minimum number of registers -
such that these registers can be used again and this totally depends upon the live range
of the variables .
-For this algorithm we need to implement live variable analysis of Code Optimization .
Disadvantages :
-Linear Scan Algorithm doesn’t take into account the “lifetime holes” of the variable .
-Variables are not live throughout the program and this algorithm fails to record the
holes in the live range of the variable .
c)Graph Coloring (Chaitin’s Algorithm) :
-Register allocation is interpreted as a graph coloring problem .
-Nodes represent the live range of the variable.
-Edges represent the connection between two live ranges .
-Assign color to the nodes such that no two adjacent nodes have the same color .
-Number of colors represents the minimum number of registers required .
A k-coloring of the graph is mapped to k registers .
Steps :
1. Choose an arbitrary node of degree less than k .
2. Push that node onto the stack and remove all of its outgoing edges .
3. Check if the remaining edges have degree less than k, if YES goto 5 else goto #
4. If the degree of any remaining vertex is less than k then push it onto the stack .
5. If there is no more edge available to push and if all edges are present in the
stack POP each node and color them such that no two adjacent nodes have the
same color.
6. Number of colors assigned to nodes is the minimum number of registers needed

5.Directed Acyclic Graph (DAG)
⇒
The Directed Acyclic Graph (DAG) is used to represent the structure of basic blocks, to
visualize the flow of values between basic blocks, and to provide optimization
techniques in the basic block. To apply an optimization technique to a basic block, a
DAG is a three-address code that is generated as the result of an intermediate code
generation.
● Directed acyclic graphs are a type of data structure and they are used to apply
transformations to basic blocks.
● The Directed Acyclic Graph (DAG) facilitates the transformation of basic blocks.
● DAG is an efficient method for identifying common sub-expressions.
● It demonstrates how the statement’s computed value is used in subsequent
statements.
Directed Acyclic Graph Characteristics :
A Directed Acyclic Graph for Basic Block is a directed acyclic graph with the following
labels on nodes.
● The graph’s leaves each have a unique identifier, which can be variable names
or constants.
● The interior nodes of the graph are labeled with an operator symbol.
● In addition, nodes are given a string of identifiers to use as labels for storing the
computed value.
● Directed Acyclic Graphs have their own definitions for transitive closure and
transitive reduction.
● Directed Acyclic Graphs have topological orderings defined.
Application of Directed Acyclic Graph:
● Directed acyclic graph determines the subexpressions that are commonly used.
● Directed acyclic graph determines the names used within the block as well as the
names computed outside the block.
● Determines which statements in the block may have their computed value
outside the block.
● Code can be represented by a Directed acyclic graph that describes the inputs
and outputs of each of the arithmetic operations performed within the code; this
representation allows the compiler to perform common subexpression elimination
efficiently.

● Several programming languages describe value systems that are linked together
by a directed acyclic graph. When one value changes, its successors are
recalculated; each value in the DAG is evaluated as a function of its
predecessors.
Algorithm for construction of Directed Acyclic Graph :
There are three possible scenarios for building a DAG on three address codes:
Case 1 – x = y op z
Case 2 – x = op y
Case 3 – x = y
Directed Acyclic Graph for the above cases can be built as follows :
Step 1 –
-If the y operand is not defined, then create a node (y).
-If the z operand is not defined, create a node for case(1) as node(z).
Step 2 –
-Create node(OP) for case(1), with node(z) as its right child and node(OP) as its left
child (y).
-For the case (2), see if there is a node operator (OP) with one child node (y).
-Node n will be node(y) in case (3).
Step 3 –
-Remove x from the list of node identifiers. Step 2: Add x to the list of attached
identifiers for node n.
Example.
T0 = a + b —Expression 1

T1 = T0 + c —-Expression 2
d = T0 + T1 —–Expression 3
6.Peephole Optimization
⇒Peephole optimization is a type of code Optimization performed on a small part of the
code. It is performed on a very small set of instructions in a segment of code.

The small set of instructions or small part of code on which peephole optimization is
performed is known as peephole or window.
It basically works on the theory of replacement in which a part of code is replaced by
shorter and faster code without a change in output. The peephole is machine-
dependent optimization.
Objectives of Peephole Optimization:
The objective of peephole optimization is as follows:
-To improve performance
-To reduce memory footprint
-To reduce code size
Peephole Optimization Techniques
● A. Redundant load and store elimination: In this technique, redundancy is
eliminated.
● Constant folding: The code that can be simplified by the user itself, is simplified.
● Strength Reduction: The operators that consume higher execution time are
replaced by the operators consuming less execution time.
● Null sequences: Useless operations are deleted.
Combine operations: Several operations are replaced by a single equivalent operation.

Compiler Design

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Compiler Design

Similar to Compiler Design (20)

Recently uploaded

Recently uploaded (20)

Compiler Design