Introduction to Compiler
Construction
Prof. A. N. Kazi
Jawaharlal Darda Institute of Engineering &
Technology, Yavatmal
TRANSLATORS
• A translator is one kind of program that takes one form
of program (input) and converts into another form
(output). The input program is called source language
and the output program is called target language.
Types of Translators are ::
(1) Compilers
(2) Interpreters
(3) Assemblers
COMPILATION AND INTERPRETATION
• A compiler is a program that reads a program in one
language and translates it into an equivalent program
in another language. The translation done by a
compiler is called compilation.
• An interpreter is another common kind of language
processor. Instead of producing a target program as a
translation, an interpreter appears to directly execute
the operations specified in the source program on
inputs supplied by the user.
COMPILATION AND INTERPRETATION
 Compiler :
 Interpreter :
Compilers
• “Compilation”
– Translation of a program written in a source
language into a semantically equivalent
program written in a target language
Compiler
Error messages
Source
Program
Target
Program
Input
Output
As an important role of a compiler is error showing to
the programmer.
INTERPRETER
• An interpreter is a program that appears to execute a
source program as if it were machine language.
Fig: Execution in Interpreter
Languages such as BASIC, SNOBOL, LISP can be translated
using interpreters
Compiler is a translator program that translates a program written in
(HLL) the source program and translate it into an equivalent program in
(MLL) the target program.
Fig : Language Processing System
HLL Consisting
#include< >
#define SIZE
Pure HLL
Fig : Structure of Compiler
Fig : Execution process of source program in Compiler
ASSEMBLER
1. Programmers found it difficult to write or read programs in
machine language. They begin to use a mnemonic (symbols) for
each machine instruction, which they would subsequently translate
into machine language.
2. Such a mnemonic machine language is now called an assembly
language.(ALP)
3. Programs known as assembler were written to automate the
translation of assembly language in to machine language.
LOADER AND LINK-EDITOR:
• Loader : Once the assembler procedures an object
program, that program must be placed into memory and
executed. The assembler could place the object program
directly in memory and transfer control to it, thereby
causing the machine language program to be execute.
• Linker : Add necessary library file that are included in
source program
LIST OF COMPILERS
1. Ada compilers
2 .ALGOL compilers
3 .BASIC compilers
4 .C# compilers
5 .C compilers
6 .C++ compilers
7 .COBOL compilers
8 .Common Lisp compilers
9. ECMAScript interpreters
10. Fortran compilers
11 .Java compilers
12. Pascal compilers
13. PL/I compilers
14. Python compilers
15. Smalltalk compilers
Preprocessors, Compilers, Assemblers,
and Linkers
Preprocessor
Compiler
Assembler
Linker
Skeletal Source Program
Source Program
Target Assembly Program
Relocatable Object Code
Absolute Machine Code
Libraries and
Relocatable Object Files
Try for example:
gcc -v myprog.c
Phases of a compiler
• Lexical Analysis
- The first phase of a compiler is called lexical
analysis or scanning or linear analysis. The lexical
analyzer reads the stream of characters making up
the source program and groups the characters into
meaningful sequences called lexemes.
For each lexeme, the lexical analyzer produces output as a token of
the form
<token-name, attribute-value>
The first component token-name is an abstract symbol that is used
during syntax analysis, and the second component attribute-value
points to an entry in the symbol table for this token.
Lexeme mapped in Tokens
1) position is a lexeme that would be mapped into a token
<id,1>.
where , id is an abstract symbol standing for identifier and 1
points to the symbol able entry for position.
(2) The assignment symbol = is a lexeme that is mapped into
the token <=>.
(3) initial is a lexeme that is mapped into the token <id, 2>.
(4) + is a lexeme that is mapped into the token <+>.
(5) rate is a lexeme that is mapped into the token <id, 3>.
(6) * is a lexeme that is mapped into the token <*>.
(7) 60 is a lexeme that is mapped into the token <60>.
The sequence of tokens produced as follows after lexical analysis.
<id, 1> <=> <id, 2> <+> <id, 3> <*> <60>
Syntax Analysis
• The second phase of the compiler is syntax analysis or
parsing or hierarchical analysis.
• The parser uses the first components of the tokens
produced by the lexical analyzer to create a tree-like
intermediate representation that depicts the
grammatical structure of the token stream.
• The hierarchical tree structure generated in this phase
is called parse tree or syntax tree.
Figure : Syntax tree for position = initial + rate * 60
Semantic Analysis
• The semantic analyzer uses the syntax tree and the
information in the symbol table to check the source
program for semantic consistency with the language
definition.
• It ensures the correctness of the program, matching of
the parenthesis is also done in this phase.
• An important part of semantic analysis is type
checking, where the compiler checks that each
operator has matching operands.
• The compiler must report an error if a floating-point
number is used to index an array.
Semantic Analysis
Figure : Semantic tree for position = initial + rate * 60
Intermediate Code Generation
• After syntax and semantic analysis of the source program,
many compilers generate an explicit low-level or machine-
like intermediate representation
• The intermediate representation have two important
properties:
a. It should be easy to produce
b. It should be easy to translate into the target machine.
Three-address code is one of the intermediate representations,
which consists of a sequence of assembly-like instructions with
three operands per instruction.
Intermediate Code Generation
• Each operand can act like a register.
• The output of the intermediate code generator
consists of the three-address code sequence for
position = initial + rate * 60
• t1 = inttofloat(60)
• t2 = id3 * t1
• t3 = id2 + t2
• id1 = t3
Code Optimization
• The machine-independent code-optimization phase
attempts to improve the intermediate code so that better
target code will result. Usually better means faster.
• Optimization has to improve the efficiency of code so
that the target program running time and consumption
of memory can be reduced.
Moreover, t3 is used only once to transmit its value to id1 so
the optimizer can transform into the shorter sequence:
t1 = id3 * 60.0
id1 = id2 + t1
Code Generation
• The code generator takes as input an intermediate
representation of the source program and maps it
into the target language.
• If the target language is machine code, then the
registers or memory locations are selected for each
of the variables used by the program.
The intermediate instructions are translated into sequences of machine
instructions.
LDF R2, id3
MULF R2, R2 , #60.0
LDF Rl, id2
ADDF Rl, Rl, R2
STF idl, Rl
Figure : Translation of an
assignment statement
Symbol-Table Management
• The symbol table, which stores information about the
entire source program, is used by all phases of the
compiler.
• An essential function of a compiler is to record the
variable names used in the source program and collect
information about various attributes of each name.
• These attributes may provide information about the
storage allocated for a name, its type, its scope.
A symbol table can be implemented in one of the following ways:
Linear (sorted or unsorted) list
Binary Search Tree
Hash table
THE GROUPING OF PHASES
THE END

Concept of compiler in details

  • 1.
    Introduction to Compiler Construction Prof.A. N. Kazi Jawaharlal Darda Institute of Engineering & Technology, Yavatmal
  • 2.
    TRANSLATORS • A translatoris one kind of program that takes one form of program (input) and converts into another form (output). The input program is called source language and the output program is called target language. Types of Translators are :: (1) Compilers (2) Interpreters (3) Assemblers
  • 3.
    COMPILATION AND INTERPRETATION •A compiler is a program that reads a program in one language and translates it into an equivalent program in another language. The translation done by a compiler is called compilation. • An interpreter is another common kind of language processor. Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user.
  • 4.
    COMPILATION AND INTERPRETATION Compiler :  Interpreter :
  • 5.
    Compilers • “Compilation” – Translationof a program written in a source language into a semantically equivalent program written in a target language Compiler Error messages Source Program Target Program Input Output As an important role of a compiler is error showing to the programmer.
  • 6.
    INTERPRETER • An interpreteris a program that appears to execute a source program as if it were machine language. Fig: Execution in Interpreter Languages such as BASIC, SNOBOL, LISP can be translated using interpreters
  • 7.
    Compiler is atranslator program that translates a program written in (HLL) the source program and translate it into an equivalent program in (MLL) the target program. Fig : Language Processing System HLL Consisting #include< > #define SIZE Pure HLL
  • 8.
    Fig : Structureof Compiler Fig : Execution process of source program in Compiler
  • 9.
    ASSEMBLER 1. Programmers foundit difficult to write or read programs in machine language. They begin to use a mnemonic (symbols) for each machine instruction, which they would subsequently translate into machine language. 2. Such a mnemonic machine language is now called an assembly language.(ALP) 3. Programs known as assembler were written to automate the translation of assembly language in to machine language.
  • 10.
    LOADER AND LINK-EDITOR: •Loader : Once the assembler procedures an object program, that program must be placed into memory and executed. The assembler could place the object program directly in memory and transfer control to it, thereby causing the machine language program to be execute. • Linker : Add necessary library file that are included in source program
  • 11.
    LIST OF COMPILERS 1.Ada compilers 2 .ALGOL compilers 3 .BASIC compilers 4 .C# compilers 5 .C compilers 6 .C++ compilers 7 .COBOL compilers 8 .Common Lisp compilers 9. ECMAScript interpreters 10. Fortran compilers 11 .Java compilers 12. Pascal compilers 13. PL/I compilers 14. Python compilers 15. Smalltalk compilers
  • 12.
    Preprocessors, Compilers, Assemblers, andLinkers Preprocessor Compiler Assembler Linker Skeletal Source Program Source Program Target Assembly Program Relocatable Object Code Absolute Machine Code Libraries and Relocatable Object Files Try for example: gcc -v myprog.c
  • 13.
    Phases of acompiler
  • 14.
    • Lexical Analysis -The first phase of a compiler is called lexical analysis or scanning or linear analysis. The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes. For each lexeme, the lexical analyzer produces output as a token of the form <token-name, attribute-value> The first component token-name is an abstract symbol that is used during syntax analysis, and the second component attribute-value points to an entry in the symbol table for this token.
  • 15.
    Lexeme mapped inTokens 1) position is a lexeme that would be mapped into a token <id,1>. where , id is an abstract symbol standing for identifier and 1 points to the symbol able entry for position. (2) The assignment symbol = is a lexeme that is mapped into the token <=>. (3) initial is a lexeme that is mapped into the token <id, 2>. (4) + is a lexeme that is mapped into the token <+>. (5) rate is a lexeme that is mapped into the token <id, 3>. (6) * is a lexeme that is mapped into the token <*>. (7) 60 is a lexeme that is mapped into the token <60>. The sequence of tokens produced as follows after lexical analysis. <id, 1> <=> <id, 2> <+> <id, 3> <*> <60>
  • 16.
    Syntax Analysis • Thesecond phase of the compiler is syntax analysis or parsing or hierarchical analysis. • The parser uses the first components of the tokens produced by the lexical analyzer to create a tree-like intermediate representation that depicts the grammatical structure of the token stream. • The hierarchical tree structure generated in this phase is called parse tree or syntax tree. Figure : Syntax tree for position = initial + rate * 60
  • 17.
    Semantic Analysis • Thesemantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition. • It ensures the correctness of the program, matching of the parenthesis is also done in this phase. • An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands. • The compiler must report an error if a floating-point number is used to index an array.
  • 18.
    Semantic Analysis Figure :Semantic tree for position = initial + rate * 60
  • 19.
    Intermediate Code Generation •After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine- like intermediate representation • The intermediate representation have two important properties: a. It should be easy to produce b. It should be easy to translate into the target machine. Three-address code is one of the intermediate representations, which consists of a sequence of assembly-like instructions with three operands per instruction.
  • 20.
    Intermediate Code Generation •Each operand can act like a register. • The output of the intermediate code generator consists of the three-address code sequence for position = initial + rate * 60 • t1 = inttofloat(60) • t2 = id3 * t1 • t3 = id2 + t2 • id1 = t3
  • 21.
    Code Optimization • Themachine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result. Usually better means faster. • Optimization has to improve the efficiency of code so that the target program running time and consumption of memory can be reduced. Moreover, t3 is used only once to transmit its value to id1 so the optimizer can transform into the shorter sequence: t1 = id3 * 60.0 id1 = id2 + t1
  • 22.
    Code Generation • Thecode generator takes as input an intermediate representation of the source program and maps it into the target language. • If the target language is machine code, then the registers or memory locations are selected for each of the variables used by the program. The intermediate instructions are translated into sequences of machine instructions. LDF R2, id3 MULF R2, R2 , #60.0 LDF Rl, id2 ADDF Rl, Rl, R2 STF idl, Rl
  • 23.
    Figure : Translationof an assignment statement
  • 24.
    Symbol-Table Management • Thesymbol table, which stores information about the entire source program, is used by all phases of the compiler. • An essential function of a compiler is to record the variable names used in the source program and collect information about various attributes of each name. • These attributes may provide information about the storage allocated for a name, its type, its scope. A symbol table can be implemented in one of the following ways: Linear (sorted or unsorted) list Binary Search Tree Hash table
  • 25.
  • 26.