Concept of compiler in details

Introduction to Compiler Construction
Prof. A. N. Kazi
Jawaharlal Darda Institute of Engineering & Technology, Yavatmal

TRANSLATORS
• A translator is a program that takes a program in one
form (input) and converts it into another form (output).
The language of the input program is called the source
language and the language of the output program is
called the target language.
Types of translators:
(1) Compilers
(2) Interpreters
(3) Assemblers

COMPILATION AND INTERPRETATION
• A compiler is a program that reads a program in one
language and translates it into an equivalent program
in another language. The translation done by a
compiler is called compilation.
• An interpreter is another common kind of language
processor. Instead of producing a target program as a
translation, an interpreter appears to directly execute
the operations specified in the source program on
inputs supplied by the user.
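The difference between the two can be sketched in a few lines of Python. This is only an illustration under assumed conventions: the expression tree format, the stack-machine instructions (LOAD, PUSH, ADD, MUL) and the function names are all hypothetical and do not come from the slides. The interpreter computes the value directly from the source form, while the compiler first produces a target program that is run separately.

```python
def interpret(tree, env):
    """Interpreter: walk the expression tree and compute its value directly."""
    if isinstance(tree, str):            # a variable name
        return env[tree]
    if isinstance(tree, (int, float)):   # a constant
        return tree
    op, left, right = tree
    l, r = interpret(left, env), interpret(right, env)
    return l + r if op == "+" else l * r


def compile_tree(tree):
    """Compiler: translate the tree into a target program (a list of
    instructions for a hypothetical stack machine) without running it."""
    if isinstance(tree, str):
        return [("LOAD", tree)]
    if isinstance(tree, (int, float)):
        return [("PUSH", tree)]
    op, left, right = tree
    last = ("ADD",) if op == "+" else ("MUL",)
    return compile_tree(left) + compile_tree(right) + [last]


def run(program, env):
    """Execute a compiled target program on the user's inputs."""
    stack = []
    for instr in program:
        if instr[0] == "LOAD":
            stack.append(env[instr[1]])
        elif instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            r, l = stack.pop(), stack.pop()
            stack.append(l + r if instr[0] == "ADD" else l * r)
    return stack.pop()


tree = ("+", "initial", ("*", "rate", 60))   # initial + rate * 60
env = {"initial": 10.0, "rate": 2.0}         # inputs supplied by the user

print(interpret(tree, env))        # 130.0
program = compile_tree(tree)       # translation happens once
print(program)
print(run(program, env))           # 130.0
```

Both routes give the same result; the compiled program can be run again on new inputs without translating the source a second time.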

Fig : Compiler and Interpreter

Compilers
• “Compilation”
– Translation of a program written in a source
language into a semantically equivalent
program written in a target language
Fig : Source Program → Compiler → Target Program, with the compiler also producing error messages; Input → Target Program → Output
• An important role of a compiler is to report errors in
the source program to the programmer.

INTERPRETER
• An interpreter is a program that appears to execute a
source program as if it were machine language.
Fig: Execution in Interpreter
Languages such as BASIC, SNOBOL and LISP can be executed
using interpreters.

A compiler is a translator program that takes a program written in a
high-level language (HLL), the source program, and translates it into an
equivalent program in machine-level language (MLL), the target program.
Fig : Language Processing System (an HLL program containing preprocessor directives such as #include< > and #define SIZE is first reduced to pure HLL)

Fig : Structure of Compiler
Fig : Execution process of source program in Compiler

ASSEMBLER
1. Programmers found it difficult to write or read programs in
machine language. They began to use a mnemonic (symbol) for
each machine instruction, which they would subsequently translate
into machine language.
2. Such a mnemonic machine language is now called an assembly
language (ALP).
3. Programs known as assemblers were written to automate the
translation of assembly language into machine language.

LOADER AND LINK-EDITOR:
• Loader : Once the assembler produces an object
program, that program must be placed into memory and
executed. The assembler could place the object program
directly in memory and transfer control to it, thereby
causing the machine language program to be executed.
A loader is the program that performs this placement.
• Linker : Combines the object program with the library
files that the source program refers to.

LIST OF COMPILERS
1. Ada compilers
2. ALGOL compilers
3. BASIC compilers
4. C# compilers
5. C compilers
6. C++ compilers
7. COBOL compilers
8. Common Lisp compilers
9. ECMAScript interpreters
10. Fortran compilers
11. Java compilers
12. Pascal compilers
13. PL/I compilers
14. Python compilers
15. Smalltalk compilers

Preprocessors, Compilers, Assemblers, and Linkers
Skeletal Source Program → Preprocessor → Source Program
Source Program → Compiler → Target Assembly Program
Target Assembly Program → Assembler → Relocatable Object Code
Relocatable Object Code + Libraries and Relocatable Object Files → Linker → Absolute Machine Code
Try for example:
gcc -v myprog.c

Lexical Analysis
• The first phase of a compiler is called lexical
analysis, scanning, or linear analysis. The lexical
analyzer reads the stream of characters making up
the source program and groups the characters into
meaningful sequences called lexemes.
• For each lexeme, the lexical analyzer produces as output a token of
the form
<token-name, attribute-value>
• The first component, token-name, is an abstract symbol that is used
during syntax analysis, and the second component, attribute-value,
points to an entry in the symbol table for this token.

Lexemes mapped into Tokens
For the assignment statement position = initial + rate * 60:
(1) position is a lexeme that is mapped into the token
<id, 1>,
where id is an abstract symbol standing for identifier and 1
points to the symbol table entry for position.
(2) The assignment symbol = is a lexeme that is mapped into
the token <=>.
(3) initial is a lexeme that is mapped into the token <id, 2>.
(4) + is a lexeme that is mapped into the token <+>.
(5) rate is a lexeme that is mapped into the token <id, 3>.
(6) * is a lexeme that is mapped into the token <*>.
(7) 60 is a lexeme that is mapped into the token <60>.
After lexical analysis, the following sequence of tokens is produced:
<id, 1> <=> <id, 2> <+> <id, 3> <*> <60>
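The mapping above can be reproduced with a small scanner. The following is a minimal sketch, not a full lexical analyzer: the pattern names, the `tokenize` and `show` helpers, and the use of a plain list as the symbol table are assumptions made for illustration only.

```python
import re

# Token patterns, tried in order. The names are illustrative.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+*]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a simple assignment statement."""
    symbol_table = []   # entry i (1-based) holds the i-th distinct identifier
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue                      # whitespace produces no token
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        else:
            tokens.append((lexeme,))      # operators and numbers: no attribute here
    return tokens, symbol_table


def show(tokens):
    """Format tokens in the <token-name, attribute-value> notation."""
    return " ".join("<" + ", ".join(str(p) for p in t) + ">" for t in tokens)


tokens, table = tokenize("position = initial + rate * 60")
print(show(tokens))   # <id, 1> <=> <id, 2> <+> <id, 3> <*> <60>
print(table)          # ['position', 'initial', 'rate']
```

A list keeps the sketch short; a production scanner would more likely use a hash table for the identifier lookup.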

Syntax Analysis
• The second phase of the compiler is syntax analysis or
parsing or hierarchical analysis.
• The parser uses the first components of the tokens
produced by the lexical analyzer to create a tree-like
intermediate representation that depicts the
grammatical structure of the token stream.
• The hierarchical tree structure generated in this phase
is called a parse tree or syntax tree.
Figure : Syntax tree for position = initial + rate * 60
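One way such a tree could be built is with a small recursive-descent parser. The sketch below assumes a toy grammar (an assignment whose right-hand side uses only + and * over names and integers) and assumes that tokens are separated by spaces; the function name `parse_assignment` and the nested-tuple tree format are hypothetical choices, not something the slides prescribe.

```python
def parse_assignment(text):
    """Parse 'name = expr' and return a tree of nested tuples
    (operator, left, right); leaves are plain strings."""
    tokens = text.split()      # assumption: tokens are space-separated
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def factor():              # factor -> name | number
        tok = advance()
        if not (tok.isidentifier() or tok.isdigit()):
            raise SyntaxError(f"expected a name or number, got {tok!r}")
        return tok

    def term():                # term -> factor ('*' factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():                # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if not target.isidentifier():
        raise SyntaxError(f"expected a name on the left, got {target!r}")
    if advance() != "=":
        raise SyntaxError("expected '='")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse_assignment("position = initial + rate * 60")
print(tree)
# ('=', 'position', ('+', 'initial', ('*', 'rate', '60')))
```

Because `term` is called from inside `expr`, the multiplication ends up lower in the tree than the addition, which is how the grammar encodes that * binds more tightly than +.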

Semantic Analysis
• The semantic analyzer uses the syntax tree and the
information in the symbol table to check the source
program for semantic consistency with the language
definition.
• It checks that the program is meaningful as well as
grammatical. (Matching of parentheses, by contrast, is a
syntactic property and is checked by the parser.)
• An important part of semantic analysis is type
checking, where the compiler checks that each
operator has matching operands.
• For example, many languages require an array index to
be an integer; the compiler must report an error if a
floating-point number is used to index an array.

Semantic Analysis
Figure : Semantic tree for position = initial + rate * 60
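Type checking on this statement can be sketched as a bottom-up walk over the syntax tree. The sketch assumes that position, initial and rate were all declared as floating-point (the slides do not state their types), that the only types are int and float, and that the tree is a nested tuple; `TYPES` and `check` are hypothetical names. Where an integer meets a float, the checker inserts an inttofloat conversion node.

```python
# Assumed declarations; a real compiler would read these from the symbol table.
TYPES = {"position": "float", "initial": "float", "rate": "float"}


def check(node):
    """Return (typed_tree, type). Inserts ('inttofloat', x) where an int
    operand meets a float operand, and rejects undeclared names."""
    if isinstance(node, str):
        if node.isdigit():
            return node, "int"                 # integer constant
        if node not in TYPES:
            raise NameError(f"undeclared identifier {node!r}")
        return node, TYPES[node]

    op, left, right = node
    left, ltype = check(left)
    right, rtype = check(right)

    if op == "=":
        if ltype == "float" and rtype == "int":
            right = ("inttofloat", right)      # widening is allowed
        elif ltype != rtype:
            raise TypeError(f"cannot assign {rtype} to {ltype}")
        return (op, left, right), ltype

    # Arithmetic operator: convert the int side if the types differ.
    if ltype != rtype:
        if ltype == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (op, left, right), "float"
    return (op, left, right), ltype


tree = ("=", "position", ("+", "initial", ("*", "rate", "60")))
typed, result_type = check(tree)
print(typed)
# ('=', 'position', ('+', 'initial', ('*', 'rate', ('inttofloat', '60'))))
```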

Intermediate Code Generation
• After syntax and semantic analysis of the source program,
many compilers generate an explicit low-level or machine-
like intermediate representation.
• The intermediate representation should have two important
properties:
a. It should be easy to produce.
b. It should be easy to translate into the target machine.
• Three-address code is one such intermediate representation.
It consists of a sequence of assembly-like instructions with
at most three operands per instruction.

Intermediate Code Generation
• Each operand can act like a register.
• The output of the intermediate code generator
consists of the three-address code sequence for
position = initial + rate * 60:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
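A sequence like this can be produced by a post-order walk over the checked syntax tree, creating one temporary per interior node. The following is a minimal sketch under assumed conventions: the nested-tuple tree format and the name `gen_three_address` are hypothetical, and the identifiers are written as id1, id2, id3 to match the slides.

```python
def gen_three_address(tree):
    """Return a list of three-address instructions for an assignment tree."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        """Emit code for node and return the name that holds its value."""
        if isinstance(node, str):          # identifier or constant: no code
            return node
        if node[0] == "inttofloat":        # unary conversion node
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node             # binary operator node
        l = walk(left)
        r = walk(right)
        temp = new_temp()
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    op, target, expr = tree
    if op != "=":
        raise ValueError("expected an assignment at the root")
    code.append(f"{target} = {walk(expr)}")
    return code


tree = ("=", "id1", ("+", "id2", ("*", "id3", ("inttofloat", "60"))))
for line in gen_three_address(tree):
    print(line)
# t1 = inttofloat(60)
# t2 = id3 * t1
# t3 = id2 + t2
# id1 = t3
```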

Code Optimization
• The machine-independent code-optimization phase
attempts to improve the intermediate code so that better
target code will result. Usually better means faster.
• Optimization aims to improve the efficiency of the code so
that the running time and memory consumption of the
target program are reduced.
• For the three-address code of position = initial + rate * 60,
the conversion of 60 from integer to floating point can be done
once at compile time, so inttofloat(60) is replaced by the
constant 60.0. Moreover, t3 is used only once, to transmit its
value to id1, so the optimizer can transform the code into the
shorter sequence:
t1 = id3 * 60.0
id1 = id2 + t1
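The two improvements used here (evaluating inttofloat on a constant at compile time, and removing a temporary that only carries a value into the final copy) can be sketched as passes over a list of instructions. This is an illustrative sketch only: the tuple format (dest, op, arg1, arg2), the "copy" pseudo-operator, and the helper names are assumptions, and a real optimizer relies on much more general data-flow analysis.

```python
import re


def is_temp(name):
    return isinstance(name, str) and re.fullmatch(r"t\d+", name) is not None


def optimize(code):
    """code: list of (dest, op, arg1, arg2) tuples. Returns a shorter list."""
    # Pass 1: fold inttofloat(constant) and substitute the constant for its temp.
    constants, folded = {}, []
    for dest, op, a, b in code:
        a, b = constants.get(a, a), constants.get(b, b)
        if op == "inttofloat" and a.isdigit():
            constants[dest] = str(float(a))        # "60" becomes "60.0"
        else:
            folded.append((dest, op, a, b))

    # Pass 2: merge "tN = x op y; v = tN" into "v = x op y" when tN is used once.
    uses = {}
    for _, _, a, b in folded:
        for arg in (a, b):
            uses[arg] = uses.get(arg, 0) + 1
    merged = []
    for dest, op, a, b in folded:
        if (op == "copy" and merged and merged[-1][0] == a
                and is_temp(a) and uses[a] == 1):
            prev = merged.pop()
            merged.append((dest,) + prev[1:])
        else:
            merged.append((dest, op, a, b))

    # Pass 3: renumber the remaining temporaries t1, t2, ... in order.
    rename, final = {}, []
    for dest, op, a, b in merged:
        a, b = rename.get(a, a), rename.get(b, b)
        if is_temp(dest):
            rename[dest] = f"t{len(rename) + 1}"
            dest = rename[dest]
        final.append((dest, op, a, b))
    return final


def show(instr):
    dest, op, a, b = instr
    if op == "copy":
        return f"{dest} = {a}"
    if op == "inttofloat":
        return f"{dest} = inttofloat({a})"
    return f"{dest} = {a} {op} {b}"


code = [
    ("t1", "inttofloat", "60", None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
for instr in optimize(code):
    print(show(instr))
# t1 = id3 * 60.0
# id1 = id2 + t1
```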

Code Generation
• The code generator takes as input an intermediate
representation of the source program and maps it
into the target language.
• If the target language is machine code, then
registers or memory locations are selected for each
of the variables used by the program.
• The intermediate instructions are then translated into sequences of
machine instructions. The F in each instruction indicates a
floating-point operation and # marks an immediate constant.
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
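A very naive code generator for this instruction style could look like the sketch below. It is only an illustration under strong assumptions: instructions arrive as (dest, op, arg1, arg2) tuples, the first operand is always a name rather than a constant, each temporary is used exactly once, and registers are handed out in increasing order and never reused. The helper names are hypothetical. With these choices the output has the same shape as the sequence above, but with R1 and R2 swapped.

```python
import re

OPCODES = {"+": "ADDF", "*": "MULF"}


def is_temp(name):
    return re.fullmatch(r"t\d+", name) is not None


def is_constant(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def generate(code):
    """Translate optimized three-address tuples into assembly-like text."""
    asm = []
    location = {}          # temporary name -> register currently holding it
    next_reg = 1

    def in_register(name):
        """Return a register holding name, loading it from memory if needed."""
        nonlocal next_reg
        if name in location:
            return location[name]
        reg = f"R{next_reg}"
        next_reg += 1
        asm.append(f"LDF {reg}, {name}")
        return reg

    for dest, op, a, b in code:
        left = in_register(a)
        right = f"#{b}" if is_constant(b) else in_register(b)
        # The result overwrites the left operand's register.
        asm.append(f"{OPCODES[op]} {left}, {left}, {right}")
        if is_temp(dest):
            location[dest] = left               # keep temporaries in registers
        else:
            asm.append(f"STF {dest}, {left}")   # program variables live in memory
    return asm


code = [("t1", "*", "id3", "60.0"), ("id1", "+", "id2", "t1")]
for line in generate(code):
    print(line)
# LDF R1, id3
# MULF R1, R1, #60.0
# LDF R2, id2
# ADDF R2, R2, R1
# STF id1, R2
```

Choosing which values stay in which registers is the hard part of real code generation; this sketch sidesteps it by never running out of registers.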

Figure : Translation of an
assignment statement

Symbol-Table Management
• The symbol table, which stores information about the
entire source program, is used by all phases of the
compiler.
• An essential function of a compiler is to record the
variable names used in the source program and collect
information about various attributes of each name.
• These attributes may provide information about the
storage allocated for a name, its type, and its scope.
• A symbol table can be implemented in one of the following ways:
(1) Linear (sorted or unsorted) list
(2) Binary search tree
(3) Hash table
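As an illustration of the hash-table option, the sketch below keeps one Python dict (a hash table) per scope and stacks them to model nested scopes. The class name, method names and the attribute fields recorded for each name (type, size, depth) are assumptions chosen for the example, not a layout given in the slides.

```python
class SymbolTable:
    """Hash-table symbol table with nested scopes (illustrative sketch)."""

    def __init__(self):
        self.scopes = [{}]                 # scopes[0] is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name, type_, size):
        """Record a name and its attributes in the innermost scope."""
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} is already declared in this scope")
        scope[name] = {"type": type_, "size": size,
                       "depth": len(self.scopes) - 1}

    def lookup(self, name):
        """Search from the innermost scope outwards; None if not found."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("rate", "float", 4)
table.enter_scope()
table.declare("rate", "int", 4)            # shadows the outer declaration
print(table.lookup("rate")["type"])        # int
table.exit_scope()
print(table.lookup("rate")["type"])        # float
print(table.lookup("position"))            # None
```

Dict lookups take constant time on average, which is the usual reason for preferring a hash table over a linear list or a binary search tree here.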
