This document gives an overview of the key components and phases of a compiler. A compiler translates a program written in a source language into an equivalent program in a target language. The main phases are lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation; symbol table management supports all of them. Each phase transforms the program from one representation to the next until a target program equivalent to the original source program is produced.
2. TRANSLATORS
• A translator is a program that takes a program in one
form (input) and converts it into another form
(output). The input program is written in the source language
and the output program is written in the target language.
Types of translators are:
(1) Compilers
(2) Interpreters
(3) Assemblers
3. COMPILATION AND INTERPRETATION
• A compiler is a program that reads a program in one
language and translates it into an equivalent program
in another language. The translation done by a
compiler is called compilation.
• An interpreter is another common kind of language
processor. Instead of producing a target program as a
translation, an interpreter appears to directly execute
the operations specified in the source program on
inputs supplied by the user.
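The difference between the two can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple-based expression tree, the tiny stack-machine instruction set (PUSH, LOAD, ADD, MUL) and the function names interpret, compile_expr and run are all hypothetical and do not come from the slides.

```python
def interpret(tree, env):
    """Interpreter: execute the source expression directly on the inputs."""
    if isinstance(tree, (int, float)):
        return tree
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    a, b = interpret(left, env), interpret(right, env)
    return a + b if op == "+" else a * b


def compile_expr(tree):
    """Compiler: produce a target program (stack code) without running it."""
    if isinstance(tree, (int, float)):
        return [("PUSH", tree)]
    if isinstance(tree, str):
        return [("LOAD", tree)]
    op, left, right = tree
    opcode = ("ADD",) if op == "+" else ("MUL",)
    return compile_expr(left) + compile_expr(right) + [opcode]


def run(code, env):
    """The 'machine' that later executes the compiled target program."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()


tree = ("+", "initial", ("*", "rate", 60))
env = {"initial": 10.0, "rate": 2.0}
target = compile_expr(tree)          # translation happens once
print(interpret(tree, env), run(target, env))
```

Both paths compute the same value; the compiled path separates translation from execution, so the target program can be run many times on different inputs.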
5. Compilers
• “Compilation”
– Translation of a program written in a source
language into a semantically equivalent
program written in a target language
Fig: Source Program -> Compiler -> Target Program (the compiler also emits error messages); Input -> Target Program -> Output
An important role of a compiler is to report errors in the
source program to the programmer.
6. INTERPRETER
• An interpreter is a program that appears to execute a
source program as if it were machine language.
Fig: Execution in Interpreter
Languages such as BASIC, SNOBOL, and LISP can be executed
using interpreters.
7. A compiler is a translator program that takes a program written in a
high-level language (HLL), the source program, and translates it into an
equivalent program in a machine-level language (MLL), the target program.
Fig: Language Processing System (figure labels: HLL containing directives such as #include < > and #define SIZE; pure HLL)
8. Fig : Structure of Compiler
Fig : Execution process of source program in Compiler
9. ASSEMBLER
1. Programmers found it difficult to write or read programs in
machine language. They began to use a mnemonic (symbol) for
each machine instruction, which they would subsequently translate
into machine language.
2. Such a mnemonic machine language is now called an assembly
language (ALP).
3. Programs known as assemblers were written to automate the
translation of assembly language into machine language.
10. LOADER AND LINK-EDITOR:
• Loader: Once the assembler produces an object
program, that program must be placed into memory and
executed. The assembler could place the object program
directly in memory and transfer control to it, thereby
causing the machine language program to be executed.
• Linker: Adds the necessary library files that are referenced
in the source program.
14. Lexical Analysis
- The first phase of a compiler is called lexical
analysis or scanning or linear analysis. The lexical
analyzer reads the stream of characters making up
the source program and groups the characters into
meaningful sequences called lexemes.
For each lexeme, the lexical analyzer produces output as a token of
the form
<token-name, attribute-value>
The first component token-name is an abstract symbol that is used
during syntax analysis, and the second component attribute-value
points to an entry in the symbol table for this token.
15. Lexemes Mapped into Tokens
For the source statement position = initial + rate * 60:
(1) position is a lexeme that is mapped into the token
<id, 1>,
where id is an abstract symbol standing for identifier and 1
points to the symbol table entry for position.
(2) The assignment symbol = is a lexeme that is mapped into
the token <=>.
(3) initial is a lexeme that is mapped into the token <id, 2>.
(4) + is a lexeme that is mapped into the token <+>.
(5) rate is a lexeme that is mapped into the token <id, 3>.
(6) * is a lexeme that is mapped into the token <*>.
(7) 60 is a lexeme that is mapped into the token <60>.
After lexical analysis, the following sequence of tokens is produced:
<id, 1> <=> <id, 2> <+> <id, 3> <*> <60>
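A minimal lexical analyzer for this one statement might look like the following Python sketch. The token specification, the function names tokenize and show, and the use of a plain list as the symbol table are assumptions made for illustration; a real scanner handles many more lexeme classes.

```python
import re

# Hypothetical token specification covering only the running example.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+*]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Group characters into lexemes and emit one token per lexeme."""
    symbol_table = []          # entry i (1-based) holds the name for <id, i>
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue           # whitespace separates lexemes but is not a token
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        else:
            # Operators and numbers carry no attribute here, as in <=> and <60>.
            tokens.append((lexeme,))
    return tokens, symbol_table


def show(tokens):
    """Render tokens in the <token-name, attribute-value> notation."""
    return " ".join("<" + ", ".join(str(p) for p in tok) + ">" for tok in tokens)


tokens, table = tokenize("position = initial + rate * 60")
print(show(tokens))   # <id, 1> <=> <id, 2> <+> <id, 3> <*> <60>
print(table)          # ['position', 'initial', 'rate']
```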
16. Syntax Analysis
• The second phase of the compiler is syntax analysis or
parsing or hierarchical analysis.
• The parser uses the first components of the tokens
produced by the lexical analyzer to create a tree-like
intermediate representation that depicts the
grammatical structure of the token stream.
• The hierarchical tree structure generated in this phase
is called a parse tree or syntax tree.
Figure : Syntax tree for position = initial + rate * 60
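One way to build such a tree is a small recursive-descent parser. The sketch below assumes a hypothetical grammar (assignment, then + with lower precedence than *), a token stream given as a list of strings, and nested tuples as the tree representation; none of these details are prescribed by the slides.

```python
def parse(tokens):
    """Parse  id = expr  where  expr -> term {+ term},  term -> factor {* factor}."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = peek()
        if tok is None or tok in ("=", "+", "*"):
            raise SyntaxError(f"expected an operand, got {tok!r}")
        return advance()

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if peek() != "=":
        raise SyntaxError(f"expected '=', got {peek()!r}")
    advance()
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse(["id1", "=", "id2", "+", "id3", "*", "60"])
print(tree)   # ('=', 'id1', ('+', 'id2', ('*', 'id3', '60')))
```

Because term is called from expr, the multiplication ends up deeper in the tree than the addition, which is how the tree records that rate * 60 is evaluated first.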
17. Semantic Analysis
• The semantic analyzer uses the syntax tree and the
information in the symbol table to check the source
program for semantic consistency with the language
definition.
• It checks properties that the grammar alone cannot
express. Matching of parentheses, by contrast, is already
enforced by the parser during syntax analysis.
• An important part of semantic analysis is type
checking, where the compiler checks that each
operator has matching operands.
• For example, many language definitions require an array
index to be an integer, so the compiler must report an error
if a floating-point number is used to index an array.
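The type-checking step can be sketched as a recursive walk over the syntax tree. The sketch below is an illustration under assumptions: the tree is a nested tuple, the symbol table is a plain dict mapping names to "int" or "float", the node label "index" for array access is made up, and an integer operand mixed with a floating-point one is converted by wrapping it in an inttofloat node.

```python
def check(node, types):
    """Return (typed_tree, type) or raise TypeError on a semantic error."""
    if isinstance(node, str):
        if node.isdigit():
            return node, "int"
        if node not in types:
            raise TypeError(f"undeclared identifier {node!r}")
        return node, types[node]

    op, left, right = node
    if op == "index":                      # hypothetical node: ('index', array, expr)
        _, elem_type = check(left, types)  # dict stores the array's element type
        index, index_type = check(right, types)
        if index_type != "int":
            raise TypeError("array index must be an integer")
        return (op, left, index), elem_type

    l, lt = check(left, types)
    r, rt = check(right, types)
    if op == "=":
        if lt == "float" and rt == "int":
            r = ("inttofloat", r)          # widen the value being assigned
        elif lt != rt:
            raise TypeError("cannot assign a float value to an int name")
        return (op, l, r), lt
    if lt != rt:                           # mixed arithmetic: widen the int side
        if lt == "int":
            l = ("inttofloat", l)
        else:
            r = ("inttofloat", r)
        lt = "float"
    return (op, l, r), lt


types = {"position": "float", "initial": "float", "rate": "float"}
tree = ("=", "position", ("+", "initial", ("*", "rate", "60")))
typed, _ = check(tree, types)
print(typed)
```

For the running example the only change is that the integer 60 is wrapped in inttofloat, because rate is assumed to be a floating-point name.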
19. Intermediate Code Generation
• After syntax and semantic analysis of the source program,
many compilers generate an explicit low-level or machine-like
intermediate representation.
• The intermediate representation should have two important
properties:
a. It should be easy to produce.
b. It should be easy to translate into the target machine language.
Three-address code is one of the intermediate representations,
which consists of a sequence of assembly-like instructions with
three operands per instruction.
20. Intermediate Code Generation
• Each operand can act like a register.
• The output of the intermediate code generator
consists of the three-address code sequence for
position = initial + rate * 60:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
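The sequence above can be produced by a post-order walk of the type-checked syntax tree that allocates a fresh temporary for every operation. The following Python sketch assumes a nested-tuple tree and the hypothetical function name gen_tac; it is one simple way to do it, not the only one.

```python
def gen_tac(tree):
    """Translate ('=', target, expr) into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        # Leaves (names and constants) are already valid operands.
        if isinstance(node, str):
            return node
        if node[0] == "inttofloat":
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node
        l = walk(left)
        r = walk(right)
        temp = new_temp()
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    _, target, value = tree
    code.append(f"{target} = {walk(value)}")
    return code


tree = ("=", "id1", ("+", "id2", ("*", "id3", ("inttofloat", "60"))))
for line in gen_tac(tree):
    print(line)
```

Each instruction has at most one operator on the right-hand side, which is what makes the representation easy to translate into machine instructions later.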
21. Code Optimization
• The machine-independent code-optimization phase
attempts to improve the intermediate code so that better
target code will result. Usually better means faster.
• Optimization aims to make the code more efficient, so
that the running time and memory consumption of the
target program are reduced.
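For the three-address code of position = initial + rate * 60, the
optimizer can deduce that the conversion of 60 from integer to floating
point can be done once at compile time, so the inttofloat operation can
be eliminated by replacing the integer 60 with the floating-point number 60.0.
Moreover, t3 is used only once, to transmit its value to id1, so
the optimizer can transform the code into the shorter sequence:
t1 = id3 * 60.0
id1 = id2 + t1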
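The step from the four-instruction sequence to the two-instruction one can be sketched as three small passes over the three-address code. The tuple layout (dest, op, arg1, arg2), the "copy" operator name and the function name optimize are assumptions for this illustration; production optimizers work on richer data structures such as control-flow graphs.

```python
import re
from collections import Counter


def is_temp(name):
    return re.fullmatch(r"t\d+", name) is not None


def optimize(code):
    # Pass 1: evaluate inttofloat(constant) at compile time and propagate it.
    consts, folded = {}, []
    for dest, op, a, b in code:
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "inttofloat" and a.isdigit():
            consts[dest] = str(float(a))          # "60" becomes "60.0"
            continue
        folded.append((dest, op, a, b))

    # Pass 2: if a temporary is computed and then only copied, write the
    # result straight into the copy's destination.
    uses = Counter(x for _, _, a, b in folded for x in (a, b) if x)
    merged = []
    for dest, op, a, b in folded:
        if (op == "copy" and merged and merged[-1][0] == a
                and is_temp(a) and uses[a] == 1):
            _, prev_op, prev_a, prev_b = merged.pop()
            merged.append((dest, prev_op, prev_a, prev_b))
        else:
            merged.append((dest, op, a, b))

    # Pass 3: renumber the surviving temporaries from t1 upward.
    rename, result = {}, []
    for dest, op, a, b in merged:
        a, b = rename.get(a, a), rename.get(b, b)
        if is_temp(dest):
            rename[dest] = f"t{len(rename) + 1}"
            dest = rename[dest]
        result.append((dest, op, a, b))
    return result


code = [
    ("t1", "inttofloat", "60", None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
for dest, op, a, b in optimize(code):
    print(f"{dest} = {a} {op} {b}")
```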
22. Code Generation
• The code generator takes as input an intermediate
representation of the source program and maps it
into the target language.
• If the target language is machine code, registers
or memory locations are selected for each
of the variables used by the program.
The intermediate instructions are translated into sequences of machine
instructions.
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
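A naive code generator for this instruction style can be sketched as follows. The tuple layout of the intermediate code, the function name gen_code and the register-allocation policy (hand out R1, R2, ... in ascending order and never reuse them) are assumptions for illustration, so the register numbers come out swapped relative to the listing above while the instruction sequence has the same shape.

```python
import re

OPCODES = {"+": "ADDF", "*": "MULF"}   # floating-point add and multiply


def is_const(x):
    return re.fullmatch(r"\d+(\.\d+)?", x) is not None


def is_temp(x):
    return re.fullmatch(r"t\d+", x) is not None


def gen_code(tac):
    """Map (dest, op, arg1, arg2) tuples to LDF/MULF/ADDF/STF instructions."""
    asm = []
    reg_of = {}        # temporary name -> register currently holding it
    next_reg = 1

    def operand_reg(name):
        nonlocal next_reg
        if name in reg_of:
            return reg_of[name]
        reg = f"R{next_reg}"
        next_reg += 1
        asm.append(f"LDF {reg}, {name}")      # load a variable from memory
        return reg

    for dest, op, a, b in tac:
        ra = operand_reg(a)                   # sketch assumes arg1 is not a constant
        rb = f"#{b}" if is_const(b) else operand_reg(b)
        asm.append(f"{OPCODES[op]} {ra}, {ra}, {rb}")
        if is_temp(dest):
            reg_of[dest] = ra                 # temporaries stay in registers
        else:
            asm.append(f"STF {dest}, {ra}")   # variables are stored to memory
    return asm


tac = [("t1", "*", "id3", "60.0"), ("id1", "+", "id2", "t1")]
print("\n".join(gen_code(tac)))
```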
24. Symbol-Table Management
• The symbol table, which stores information about the
entire source program, is used by all phases of the
compiler.
• An essential function of a compiler is to record the
variable names used in the source program and collect
information about various attributes of each name.
• These attributes may provide information about the
storage allocated for a name, its type, and its scope.
A symbol table can be implemented in one of the following ways:
Linear (sorted or unsorted) list
Binary Search Tree
Hash table
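Of these options, the hash table is the easiest to sketch in Python because the built-in dict is itself a hash table. The class below is a minimal illustration; the class name SymbolTable, the attribute fields type and size, and the use of a parent link to model nested scopes are assumptions, not something the slides specify.

```python
class SymbolTable:
    """One scope's names, with a link to the enclosing scope."""

    def __init__(self, parent=None):
        self.entries = {}        # name -> attribute record (hash-table lookup)
        self.parent = parent

    def declare(self, name, type_, size):
        if name in self.entries:
            raise KeyError(f"{name!r} is already declared in this scope")
        self.entries[name] = {"type": type_, "size": size}

    def lookup(self, name):
        """Search the current scope first, then the enclosing scopes."""
        scope = self
        while scope is not None:
            if name in scope.entries:
                return scope.entries[name]
            scope = scope.parent
        return None


globals_ = SymbolTable()
globals_.declare("rate", "float", 8)

block = SymbolTable(parent=globals_)
block.declare("rate", "int", 4)          # shadows the outer declaration

print(block.lookup("rate")["type"])      # int
print(globals_.lookup("rate")["type"])   # float
print(block.lookup("missing"))           # None
```

A linear list or a binary search tree could expose the same declare and lookup operations; the choice mainly affects how lookup time grows with the number of names.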