Introduction to Compilers
Mr. Sameer Prembabu Mamadapure
Information Technology Department
International Institute of Information Technology, I²IT
www.isquareit.edu.in
Compilers
 “Compilation”
 Translation of a program written in a source language
into a semantically equivalent program written in a
target language
Compiler
Error messages
Source
Program
Target
Program
Input
Output
Lexical Analyzer
Syntax Analyzer
Semantic Analyzer
Intermediate Code Generator
Machine-Independent
Code Optimizer
Code Generator
Machine-Dependent
Code Optimizer
character stream
token stream
syntax tree
syntax tree
target-machine code
Symbol
Table
Phases of a Compiler
Error Handler
intermediate representation
target-machine code
intermediate representation
Lexical Analyzer
Syntax Analyzer
Semantic Analyzer
character stream
position = initial + rate * 60
<id,1> <=> <id,2> <+> <id,3> <*> <60>
=
<id,1>
<id,2>
<id,3>
+
*
60
=
<id,1>
<id,2>
<id,3>
+
*
inttofloat
60
Example:-
Intermediate Code Generator
Machine-Independent
Code Optimizer
Code Generator
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
t1 = id3 * 60.0
id1 = id2 + t1
L R2, id3
MULF R2, #60.0
L R1, id2
ADDF R1, R2
ST id1, R1
1 position …
2 initial …
3 rate …
SYMBOL TABLE
The Phases of a Compiler
Phase Output Sample
Programmer (source code producer) Source string A=B+C;
Scanner (performs lexical analysis) Token string ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’
And symbol table with names
Parser (performs syntax analysis
based on the grammar of the
programming language)
Parse tree or abstract syntax tree ;
|
=
/ 
A +
/ 
B C
Semantic analyzer (type checking,
etc)
Annotated parse tree or abstract
syntax tree
Intermediate code generator Three-address code, quads, or RTL int2fp B t1
+ t1 C t2
:= t2 A
Optimizer Three-address code, quads, or RTL int2fp B t1
+ t1 #2.3 A
Code generator Assembly code MOVF #2.3,r1
ADDF2 r1,r2
MOVF r2,A
Peephole optimizer Assembly code ADDF2 #2.3,r2
MOVF r2,A
7
 Any compiler must perform two major tasks
 Analysis of the source program
 Synthesis of a machine-language program
The Structure of a Compiler (1)
Compiler
Analysis Synthesis
The Structure of a Compiler
8
Scanner Parser
Semantic
Routines
Code
Generator
Optimizer
Source
Program Tokens Syntactic
Structure
Symbol and
Attribute
Tables
(Used by all Phases of The Compiler)
(Character Stream)
Intermediate
Representation
Target machine code
9
Scanner Parser
Semantic
Routines
Code
Generator
Optimizer
Source
Program Tokens Syntactic
Structure
Symbol and
Attribute
Tables
(Used by all
Phases of
The Compiler)
Scanner
 The scanner begins the analysis of the source program
by reading the input, character by character, and
grouping characters into individual words and symbols
(tokens)
 RE ( Regular expression )
 NFA ( Non-deterministic Finite Automata )
 DFA ( Deterministic Finite Automata )
 LEX
(Character Stream)
Intermediate
Representation
Target machine code
10
Scanner Parser
Semantic
Routines
Code
Generator
Optimizer
Source
Program Tokens Syntactic
Structure
Symbol and
Attribute
Tables
(Used by all
Phases of
The Compiler)
Parser
 Given a formal syntax specification (typically as a context-
free grammar [CFG] ), the parse reads tokens and groups
them into units as specified by the productions of the
CFG being used.
 As syntactic structure is recognized, the parser either calls
corresponding semantic routines directly or builds a
syntax tree.
 CFG ( Context-Free Grammar )
 BNF ( Backus-Naur Form )
 GAA ( Grammar Analysis Algorithms )
 LL, LR, SLR, LALR Parsers
 YACC
(Character Stream)
Intermediate
Representation
Target machine code
11
Scanner Parser
Semantic
Routines
Code
Generator
Optimizer
Source
Program
(Character Stream)
Tokens Syntactic
Structure
Intermediate
Representation
Symbol and
Attribute
Tables
(Used by all
Phases of
The Compiler)
Semantic Routines
 Perform two functions
 Check the static semantics of each construct
 Do the actual translation
 The heart of a compiler
 Syntax Directed Translation
 Semantic Processing Techniques
 IR (Intermediate Representation)
Target machine code
12
Scanner Parser
Semantic
Routines
Code
Generator
Optimizer
Source
Program Tokens Syntactic
Structure
Symbol and
Attribute
Tables
(Used by all
Phases of
The Compiler)
Optimizer
 The IR code generated by the semantic routines is
analyzed and transformed into functionally equivalent but
improved IR code
 This phase can be very complex and slow
 Peephole optimization
 loop optimization, register allocation, code scheduling
 Register and Temporary Management
 Peephole Optimization
(Character Stream)
Intermediate
Representation
Target machine code
13
Source
Program
(Character Stream)
Scanner
Tokens
Parser
Syntactic
Structure
Semantic
Routines
Intermediate
Representation
Optimizer
Code
Generator
Code Generator
 Interpretive Code Generation
 Generating Code from Tree/Dag
 Grammar-Based Code Generator
Target machine code
14
Scanner
[Lexical Analyzer]
Parser
[Syntax Analyzer]
Semantic Process
[Semantic analyzer]
Code Generator
[Intermediate Code Generator]
Code Optimizer
Tokens
Parse tree
Abstract Syntax Tree w/ Attributes
Non-optimized Intermediate
Code
Optimized Intermediate Code
Code Optimizer
Target machine code
Thank You
www.isquareit.edu.in
P-14, Rajiv Gandhi Infotech Park, MIDC
Phase – 1, Hinjawadi, Pune – 411057, India

Introduction to Compilers | Phases & Structure

  • 1.
    Introduction to Compilers Mr.Sameer Prembabu Mamadapure Information Technology Department International Institute of Information Technology, I²IT www.isquareit.edu.in
  • 2.
    Compilers  “Compilation”  Translationof a program written in a source language into a semantically equivalent program written in a target language Compiler Error messages Source Program Target Program Input Output
  • 3.
    Lexical Analyzer Syntax Analyzer SemanticAnalyzer Intermediate Code Generator Machine-Independent Code Optimizer Code Generator Machine-Dependent Code Optimizer character stream token stream syntax tree syntax tree target-machine code Symbol Table Phases of a Compiler Error Handler intermediate representation target-machine code intermediate representation
  • 4.
    Lexical Analyzer Syntax Analyzer SemanticAnalyzer character stream position = initial + rate * 60 <id,1> <=> <id,2> <+> <id,3> <*> <60> = <id,1> <id,2> <id,3> + * 60 = <id,1> <id,2> <id,3> + * inttofloat 60 Example:-
  • 5.
    Intermediate Code Generator Machine-Independent CodeOptimizer Code Generator t1 = inttofloat(60) t2 = id3 * t1 t3 = id2 + t2 id1 = t3 t1 = id3 * 60.0 id1 = id2 + t1 L R2, id3 MULF R2, #60.0 L R1, id2 ADDF R1, R2 ST id1, R1 1 position … 2 initial … 3 rate … SYMBOL TABLE
  • 6.
    The Phases ofa Compiler Phase Output Sample Programmer (source code producer) Source string A=B+C; Scanner (performs lexical analysis) Token string ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’ And symbol table with names Parser (performs syntax analysis based on the grammar of the programming language) Parse tree or abstract syntax tree ; | = / A + / B C Semantic analyzer (type checking, etc) Annotated parse tree or abstract syntax tree Intermediate code generator Three-address code, quads, or RTL int2fp B t1 + t1 C t2 := t2 A Optimizer Three-address code, quads, or RTL int2fp B t1 + t1 #2.3 A Code generator Assembly code MOVF #2.3,r1 ADDF2 r1,r2 MOVF r2,A Peephole optimizer Assembly code ADDF2 #2.3,r2 MOVF r2,A
  • 7.
    7  Any compilermust perform two major tasks  Analysis of the source program  Synthesis of a machine-language program The Structure of a Compiler (1) Compiler Analysis Synthesis
  • 8.
    The Structure ofa Compiler 8 Scanner Parser Semantic Routines Code Generator Optimizer Source Program Tokens Syntactic Structure Symbol and Attribute Tables (Used by all Phases of The Compiler) (Character Stream) Intermediate Representation Target machine code
  • 9.
    9 Scanner Parser Semantic Routines Code Generator Optimizer Source Program TokensSyntactic Structure Symbol and Attribute Tables (Used by all Phases of The Compiler) Scanner  The scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens)  RE ( Regular expression )  NFA ( Non-deterministic Finite Automata )  DFA ( Deterministic Finite Automata )  LEX (Character Stream) Intermediate Representation Target machine code
  • 10.
    10 Scanner Parser Semantic Routines Code Generator Optimizer Source Program TokensSyntactic Structure Symbol and Attribute Tables (Used by all Phases of The Compiler) Parser  Given a formal syntax specification (typically as a context- free grammar [CFG] ), the parse reads tokens and groups them into units as specified by the productions of the CFG being used.  As syntactic structure is recognized, the parser either calls corresponding semantic routines directly or builds a syntax tree.  CFG ( Context-Free Grammar )  BNF ( Backus-Naur Form )  GAA ( Grammar Analysis Algorithms )  LL, LR, SLR, LALR Parsers  YACC (Character Stream) Intermediate Representation Target machine code
  • 11.
    11 Scanner Parser Semantic Routines Code Generator Optimizer Source Program (Character Stream) TokensSyntactic Structure Intermediate Representation Symbol and Attribute Tables (Used by all Phases of The Compiler) Semantic Routines  Perform two functions  Check the static semantics of each construct  Do the actual translation  The heart of a compiler  Syntax Directed Translation  Semantic Processing Techniques  IR (Intermediate Representation) Target machine code
  • 12.
    12 Scanner Parser Semantic Routines Code Generator Optimizer Source Program TokensSyntactic Structure Symbol and Attribute Tables (Used by all Phases of The Compiler) Optimizer  The IR code generated by the semantic routines is analyzed and transformed into functionally equivalent but improved IR code  This phase can be very complex and slow  Peephole optimization  loop optimization, register allocation, code scheduling  Register and Temporary Management  Peephole Optimization (Character Stream) Intermediate Representation Target machine code
  • 13.
    13 Source Program (Character Stream) Scanner Tokens Parser Syntactic Structure Semantic Routines Intermediate Representation Optimizer Code Generator Code Generator Interpretive Code Generation  Generating Code from Tree/Dag  Grammar-Based Code Generator Target machine code
  • 14.
    14 Scanner [Lexical Analyzer] Parser [Syntax Analyzer] SemanticProcess [Semantic analyzer] Code Generator [Intermediate Code Generator] Code Optimizer Tokens Parse tree Abstract Syntax Tree w/ Attributes Non-optimized Intermediate Code Optimized Intermediate Code Code Optimizer Target machine code
  • 15.
    Thank You www.isquareit.edu.in P-14, RajivGandhi Infotech Park, MIDC Phase – 1, Hinjawadi, Pune – 411057, India