The compiler is software that converts source code written in a high-level language into machine code. It works in two major phases - analysis and synthesis. The analysis phase performs lexical analysis, syntax analysis, and semantic analysis to generate an intermediate representation from the source code. The synthesis phase performs code optimization and code generation to create the target machine code from the intermediate representation. The compiler uses various components like a symbol table, parser, and code generator to perform this translation.
Phases of Compiler.pptx
1.
2. BASIS OF COMPARISON: ASSEMBLER vs. COMPILER
• Conversion: The assembler converts assembly code into machine code. The compiler converts the source code written by the programmer into a lower-level language.
• Input: The assembler takes assembly code as input. The compiler takes preprocessed source code as input.
• Output: The output of the assembler is binary code. The output of the compiler is assembly code, a mnemonic version of machine code.
• Examples: Assembler: GAS. Compiler: GNU C (GCC), and compilers for C#, Java, C++.
• Debugging: With an assembler, debugging is difficult. With a compiler, debugging is easy.
• Working: The assembler converts the source code to object code first; the object code is then turned into machine language with the help of linker programs. The compiler scans the entire program before translating it into machine code.
• Intelligence: The assembler is less intelligent than a compiler. The compiler is more intelligent than an assembler.
• Working phases: The assembler works in two passes over the given input: Pass I and Pass II. The compilation phases are: lexical analyzer, syntax analyzer, semantic analyzer, code optimizer, code generator, and error handler.
4. BASIS OF COMPARISON: COMPILER vs. INTERPRETER
• Function: A compiler translates the entire high-level program into machine language; the translated program is then executed. An interpreter converts the source code into an intermediate form and then translates that intermediate code into machine language as it executes.
• Scanning: The compiler scans the entire program before translating it into machine code. The interpreter scans and translates the program line by line into equivalent machine code.
• Working: The compiler takes the entire program as input. The interpreter takes a single instruction as input.
• Code generation: With a compiler, intermediate object code is generated. With an interpreter, no intermediate object code is generated.
• Execution time: The compiled program takes less execution time; the interpreted program takes more execution time.
• Examples: Compiler: C, COBOL, C#, C++, etc. Interpreter: Python, Perl, VB, PostScript, LISP, etc.
5. BASIS OF COMPARISON: COMPILER vs. INTERPRETER (continued)
• Memory requirement: The compiler requires more memory than the interpreter; the interpreter needs less memory.
• Modification: With a compiler, any modification to the program requires recompiling, i.e. rescanning, the whole program. With an interpreter, if the modified line has not yet been scanned, there is no need to retranslate the entire program.
• Speed: The compiler is faster; the interpreter is slower.
• At execution: With a compiler, there is usually no need to recompile the program at execution time (if it has not been modified). With an interpreter, the program is scanned and translated every time it is executed.
• Error detection: The compiler gives you the list of all errors after compiling the whole program. The interpreter stops translation at the point where an error occurs and continues once the error is fixed.
• Machine code: The compiler converts the entire program to machine code; once all errors are removed, execution takes place. With an interpreter, each time the program is executed, every line is checked for errors and then converted into equivalent machine code.
• Debugging: The compiler is slow for debugging because errors are displayed only after the entire program has been checked. The interpreter is good for fast debugging.
• Code version: The assembly code generated by the compiler is a mnemonic version of machine code; the output of the assembler is relocatable machine code represented in binary.
6. Compiler
• The compiler is software that converts a program written in a high-
level language (Source Language) to low-level language
(Object/Target/Machine Language).
7. • A Cross Compiler runs on a machine ‘A’ and produces code for
another machine ‘B’. It is capable of creating code for a platform
other than the one on which the compiler is running.
• Source-to-source Compiler or transcompiler or transpiler is a
compiler that translates source code written in one programming
language into the source code of another programming language.
8. • Language processing systems (using a compiler) –
• Programs go through a series of transformations so that they can
readily be used by machines.
• This is where language processing systems come in handy.
9.
10. • High-Level Language – A program that contains preprocessor
directives such as #include or #define is written in a high-level
language (HLL). High-level languages are closer to humans but far
from machines. These (#) tags are called preprocessor directives;
they direct the pre-processor about what to do.
• Pre-Processor – The pre-processor resolves all the #include
directives by inserting the named files (file inclusion) and all the
#define directives by macro expansion. It performs file inclusion,
augmentation, macro-processing, etc.
11. • Assembly Language – It is neither in binary form nor high level.
It is an intermediate state that is a combination of machine
instructions and some other useful data needed for execution.
• Assembler – For every platform (hardware + OS) there is an
assembler. Assemblers are not universal, since each platform has its
own. The assembler translates assembly language into machine code,
and its output is called an object file.
12. • Interpreter – An interpreter converts high-level language into
low-level machine language, just like a compiler, but the two differ
in how they read the input. Interpreted programs are usually slower
than compiled ones.
• Relocatable Machine Code – It can be loaded at any point in memory
and run. Addresses within the program are expressed so that the code
still works when it is moved.
13. • Loader/Linker – It converts the relocatable code into absolute
code and tries to run the program, resulting in a running program or
an error message (or sometimes both). The linker combines a variety
of object files into a single file to make it executable. The loader
then loads it into memory and executes it.
14. Phases of a Compiler
• There are two major phases of compilation, which in turn have many
parts. Each of them takes input from the output of the previous level
and works in a coordinated way.
15.
16. • Analysis Phase – An intermediate representation is created from the
given source code :
• Lexical Analyzer
• Syntax Analyzer
• Semantic Analyzer
• Intermediate Code Generator
17. • Synthesis Phase – Equivalent target program is created from the
intermediate representation. It has two parts :
• Code Optimizer
• Code Generator
18. • Symbol Table – It is a data structure used and maintained by
the compiler, consisting of all the identifiers’ names along with
their types. It helps the compiler to function smoothly by letting it
find identifiers quickly.
• The analysis of a source program is divided into mainly three phases.
They are:
• Linear Analysis-
This involves a scanning phase where the stream of characters is read
from left to right. It is then grouped into various tokens having a
collective meaning.
• Hierarchical Analysis-
In this analysis phase, based on a collective meaning, the tokens are
categorized hierarchically into nested groups.
• Semantic Analysis-
This phase is used to check whether the components of the source
program are meaningful or not.
19. • The compiler has two modules, namely the front end and the back
end. The front end consists of the lexical analyzer, syntax analyzer,
semantic analyzer, and intermediate code generator. The remaining
phases are assembled to form the back end.
20. Lexical Analyzer
• It is also called a scanner.
• It takes the output of the preprocessor (which performs file inclusion
and macro expansion) as the input which is in a pure high-level
language.
• It reads the characters from the source program and groups them
into lexemes (sequence of characters that “go together”). Each
lexeme corresponds to a token. Tokens are defined by regular
expressions which are understood by the lexical analyzer.
• It also removes comments and white space, and reports lexical
errors (e.g., erroneous characters).
21. Tokens
• A lexeme is a sequence of (alphanumeric) characters that forms a
token.
• There are some predefined rules for every lexeme to be identified as
a valid token.
• These rules are defined by grammar rules, by means of a pattern.
• A pattern explains what can be a token, and these patterns are
defined by means of regular expressions.
22. For example, in the C language, the variable declaration line
int value = 100;
contains the tokens:
int (keyword), value (identifier), = (operator), 100
(constant) and ; (symbol).
In a programming language, keywords, constants, identifiers, strings,
numbers, operators and punctuation symbols can be considered tokens.
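This grouping can be sketched as a minimal lexer in Python. The token categories and patterns below are illustrative, not a real C lexer:

```python
import re

# Hypothetical token patterns; order matters, so the keyword rule
# is listed before the general identifier rule.
TOKEN_SPEC = [
    ("KEYWORD",    r"\bint\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("CONSTANT",   r"\d+"),
    ("OPERATOR",   r"="),
    ("SYMBOL",     r";"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(code):
    tokens = []
    for m in MASTER.finditer(code):
        if m.lastgroup != "SKIP":          # drop white space
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("int value = 100;"))
# [('KEYWORD', 'int'), ('IDENTIFIER', 'value'), ('OPERATOR', '='),
#  ('CONSTANT', '100'), ('SYMBOL', ';')]
```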
23. Specifications of Tokens
• Alphabets
Any finite set of symbols is an alphabet: {0,1} is the binary
alphabet, {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the hexadecimal
alphabet, and {a-z, A-Z} is the set of English-language letters.
• Strings
Any finite sequence of symbols from an alphabet is called a string.
The length of a string is the total number of occurrences of symbols
in it; e.g., the length of the string compilerdesign is 14, denoted
|compilerdesign| = 14.
A string having no symbols, i.e. a string of zero length, is known as
the empty string and is denoted by ε (epsilon).
24. Special symbols
A typical high-level language contains the following symbols:
• Arithmetic symbols: Addition (+), Subtraction (-), Modulo (%), Multiplication (*), Division (/)
• Punctuation: Comma (,), Semicolon (;), Dot (.), Arrow (->)
• Assignment: =
• Special assignment: +=, /=, *=, -=
• Comparison: ==, !=, <, <=, >, >=
• Preprocessor: #
• Location specifier: &
• Logical: &, &&, |, ||, !
• Shift operators: >>, >>>, <<
25. Language
• A language is a set of strings over some finite set of alphabet
symbols.
• Computer languages are treated as sets, and mathematical set
operations can be performed on them.
• Finite languages can be described by means of regular expressions.
26. Regular Expressions
• The lexical analyzer needs to scan and identify only the finite set
of valid strings/tokens/lexemes that belong to the language at hand.
It searches for the patterns defined by the language rules.
• Regular expressions have the capability to express finite languages by
defining a pattern for finite strings of symbols. The grammar defined
by regular expressions is known as regular grammar. The language
defined by regular grammar is known as regular language.
27. Operations
• The various operations on languages are:
• Union of two languages L and M is written as
• L U M = {s | s is in L or s is in M}
• Concatenation of two languages L and M is written as
• LM = {st | s is in L and t is in M}
• The Kleene closure of a language L is written as
• L* = zero or more occurrences of strings from L.
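With finite languages represented as Python sets of strings, these operations can be sketched directly. The Kleene closure is truncated to a maximum length, since the full closure is infinite:

```python
# Two small finite languages (illustrative).
L = {"a", "ab"}
M = {"b", "c"}

union = L | M                               # L U M
concat = {s + t for s in L for t in M}      # LM = {st | s in L, t in M}

def kleene(lang, max_len):
    # Kleene closure L*, truncated to strings of length <= max_len.
    result = {""}                           # epsilon: zero occurrences
    frontier = {""}
    while frontier:
        frontier = {s + t for s in frontier for t in lang
                    if len(s + t) <= max_len} - result
        result |= frontier
    return result

print(sorted(union))              # ['a', 'ab', 'b', 'c']
print(sorted(concat))             # ['ab', 'abb', 'abc', 'ac']
print(sorted(kleene({"a"}, 3)))   # ['', 'a', 'aa', 'aaa']
```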
28. Notations
• If r and s are regular expressions denoting the languages L(r) and L(s),
then
• Union : (r)|(s) is a regular expression denoting L(r) U L(s)
• Concatenation : (r).(s) is a regular expression denoting L(r)L(s)
• Kleene closure : (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)
29. Precedence and Associativity
• *, concatenation (.), and | (pipe sign) are left associative
• * has the highest precedence
• Concatenation (.) has the second highest precedence.
• | (pipe sign) has the lowest precedence of all.
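A quick check of this precedence using Python's `re` module: the pattern `ab|cd*` is read as (ab)|(c(d*)), because * binds tighter than concatenation, which binds tighter than |:

```python
import re

# * applies only to d; | splits the whole pattern into two alternatives.
pat = re.compile(r"ab|cd*")

print(bool(pat.fullmatch("ab")))     # True:  left alternative
print(bool(pat.fullmatch("c")))      # True:  'c' followed by zero d's
print(bool(pat.fullmatch("cddd")))   # True:  * repeats only 'd'
print(bool(pat.fullmatch("abd")))    # False: * does not span 'ab|cd'
```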
30. Finite Automata
• A finite automaton is a recognizer for regular expressions. When an
input string is fed to a finite automaton, it changes its state for
each literal.
• The mathematical model of a finite automaton consists of:
• A finite set of states (Q)
• A finite set of input symbols (Σ)
• One start state (q0)
• A set of final states (qf)
• A transition function (δ)
• The transition function (δ) maps a state and an input symbol to a
state: δ : Q × Σ ➔ Q
31. Finite Automata Construction
• Let L(r) be a regular language recognized by a finite
automaton (FA).
• States :
• Start state
• Intermediate states
• Final state
• Transition
32. • States : States of an FA are represented by circles. The name
of the state is written inside the circle.
• Start state : The state from which the automaton starts is known as
the start state. The start state has an arrow pointed towards it.
• Intermediate states : Every intermediate state has at least two
arrows; one pointing into it and another pointing out of it.
33. • Final state : If the input string is successfully parsed, the
automaton is expected to be in this state. A final state is
represented by double circles.
• Transition : A transition from one state to another happens when a
desired symbol is found in the input. Upon a transition, the
automaton can either move to the next state or stay in the same
state. Movement from one state to another is shown as a directed
arrow pointing to the destination state. If the automaton stays in
the same state, an arrow from the state to itself is drawn.
• Example : Assume the FA accepts any three-digit binary value ending
in the digit 1: FA = (Q, Σ = {0,1}, q0, {qf}, δ)
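The example can be sketched as a table-driven DFA in Python, with δ encoded as a dictionary from (state, symbol) pairs to states. The state names are illustrative:

```python
# delta: Q x Sigma -> Q as a dict; qf is the only final state.
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q1",   # first digit read
    ("q1", "0"): "q2", ("q1", "1"): "q2",   # second digit read
    ("q2", "0"): "q3", ("q2", "1"): "qf",   # third digit decides
    # a fourth digit has no transition, so the string is rejected
}
START, FINAL = "q0", {"qf"}

def accepts(s):
    state = START
    for ch in s:
        if (state, ch) not in DELTA:        # no transition: reject
            return False
        state = DELTA[(state, ch)]
    return state in FINAL

print(accepts("101"))    # True
print(accepts("100"))    # False
print(accepts("1011"))   # False (four digits)
```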
34. Longest Match Rule
• When the lexical analyzer reads the source code, it scans the code
letter by letter; when it encounters white space, an operator symbol,
or a special symbol, it decides that a word is complete.
• For example:
int intvalue;
• While scanning up to ‘int’, the lexical analyzer cannot determine
whether it is the keyword int or the start of the identifier
intvalue.
• The Longest Match Rule states that the lexeme scanned should be
determined based on the longest match among all the tokens available.
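A minimal sketch of the rule in Python: scan the longest possible word first (the regular expression's greedy repetition gives the maximal munch), and only then decide between keyword and identifier. The keyword set is illustrative:

```python
import re

KEYWORDS = {"int", "while", "return"}
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")   # greedy: longest match

def classify(code, pos=0):
    lexeme = WORD.match(code, pos).group()
    # Longest Match Rule: read the whole word first, only then decide
    # whether it is a keyword or an identifier.
    kind = "keyword" if lexeme in KEYWORDS else "identifier"
    return kind, lexeme

print(classify("int intvalue;"))      # ('keyword', 'int')
print(classify("int intvalue;", 4))   # ('identifier', 'intvalue')
```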
35.
36. Syntax Analyzer
• It is sometimes called a parser.
• It constructs the parse tree.
• It takes all the tokens one by one and uses Context-Free Grammar to
construct the parse tree.
There are certain rules associated with the derivation tree.
• Any identifier is an expression
• Any number can be called an expression
• Performing any operations in the given expression will always result in an
expression. For example, the sum of two expressions is also an expression.
• The parse tree can be compressed to form a syntax tree
• Syntax errors can be detected at this level if the input is not in
accordance with the grammar.
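The rules above can be sketched as a tiny recursive-descent parser for a hypothetical grammar with only identifiers, numbers, and '+' (expr -> term ('+' term)*; the node shapes and names are illustrative):

```python
import re

def parse(code):
    # Token stream: identifiers, numbers, and the '+' operator.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\+", code)
    pos = 0

    def term():
        # Any identifier or number is an expression on its own.
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return ("num", tok) if tok.isdigit() else ("id", tok)

    def expr():
        # The sum of two expressions is also an expression.
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("+", node, term())
        return node

    return expr()

print(parse("value + 100 + x"))
# ('+', ('+', ('id', 'value'), ('num', '100')), ('id', 'x'))
```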
37. Semantic Analyzer
• It verifies whether the parse tree is meaningful.
• It furthermore produces a verified parse tree. It also does type
checking, label checking, and flow-control checking.
38. Intermediate Code Generator
• It generates intermediate code, a form that can be readily
translated into target machine code.
• Intermediate code is converted to machine language by the last two
phases, which are platform dependent.
• Up to intermediate code, compilation is the same for every target
platform; after that, it depends on the platform. To build a compiler
for a new platform we don’t need to build it from scratch:
• We can take the intermediate code from an already existing compiler
and build only the last two parts.
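A common intermediate representation is three-address code. A minimal sketch, assuming expressions have already been parsed into tuple trees (the tree format and temporary names are illustrative):

```python
import itertools

def gen_tac(node, code, temps):
    # Leaves (identifiers, numbers) are their own addresses.
    if node[0] in ("id", "num"):
        return node[1]
    op, left, right = node
    l = gen_tac(left, code, temps)
    r = gen_tac(right, code, temps)
    temp = f"t{next(temps)}"             # fresh temporary name
    code.append(f"{temp} = {l} {op} {r}")
    return temp

tree = ("+", ("id", "b"), ("*", ("id", "c"), ("id", "d")))  # b + c * d
code = []
gen_tac(tree, code, itertools.count(1))
print(code)   # ['t1 = c * d', 't2 = b + t1']
```

Each instruction has at most one operator on the right-hand side, which is what makes the later optimization and code-generation phases straightforward.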
39. Code Optimizer
• It transforms the code so that it consumes fewer resources and runs
faster.
• The meaning of the code being transformed is not altered.
Optimization can be categorized into two types: machine-dependent
and machine-independent.
40. Target Code Generator
• The main purpose of the target code generator is to produce code
that the machine can understand, performing register allocation,
instruction selection, etc.
• The output depends on the type of assembler. This is the final
stage of compilation.
• The optimized code is converted into relocatable machine code,
which then forms the input to the linker and loader.
• All six phases are associated with the symbol-table manager and the
error handler, as shown in the block diagram above.