COMPILER
A compiler is a computer program that transforms
source code written in a programming language (the
source language) into another computer language (the
target language). The target often takes a binary form
known as object code.
It is typically used to create an executable program.
Cause
Software for early computers was written in
assembly language.
Over time, the benefits of reusing software on
different CPUs became significantly greater
than the cost of writing a compiler.
The first real compilers
FORTRAN compilers of the late 1950s
18 person-years to build
Structure of Compiler
Any compiler must perform two major tasks:
Analysis of the source program
Synthesis of a machine-language program
THE STRUCTURE OF A COMPILER (2)
[Figure: Source Program (Character Stream) -> Scanner -> Tokens -> Parser -> Syntactic Structure -> Semantic Routines -> Intermediate Representation -> Optimizer -> Code Generator -> Target machine code. Symbol and Attribute Tables are used by all phases of the compiler.]
THE STRUCTURE OF A COMPILER (3)
Scanner
• The scanner begins the analysis of the source program by
reading the input, character by character, and grouping
characters into individual words and symbols (tokens).
• RE (Regular Expression)
• NFA (Non-deterministic Finite Automaton)
• DFA (Deterministic Finite Automaton)
• LEX
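The scanning step can be sketched with regular expressions. This is a minimal illustration, not the project's actual scanner: the token class names, the keyword set, and the `tokenize` function are all assumptions made for this example.

```python
import re

# Hypothetical token classes; order matters ("==" must be tried before "=").
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("STRING", r'"[^"\n]*"'),
    ("IDENT", r"[A-Za-z][A-Za-z0-9_]*"),
    ("RELOP", r"==|!=|>=|<=|>|<"),
    ("OP", r"[+\-*/%=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
# Assumed keyword set, taken from the statement forms used in the sample programs.
KEYWORDS = {"if", "then", "endif", "while", "endwhile", "repeat", "until",
            "input", "output", "int", "string"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Group a character stream into (kind, lexeme) tokens."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {lexeme!r}")
        if kind == "IDENT" and lexeme.lower() in KEYWORDS:
            kind = "KEYWORD"  # identifiers are not case sensitive
        tokens.append((kind, lexeme))
    return tokens
```

For instance, `tokenize("i=i+9")` yields an identifier, the `=` operator, an identifier, the `+` operator, and a number, in that order.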
THE STRUCTURE OF A COMPILER (4)
Parser
• Given a formal syntax specification (typically a context-free
grammar, CFG), the parser reads tokens and groups them into
units as specified by the productions of the CFG being used.
• As syntactic structure is recognized, the parser either calls
corresponding semantic routines directly or builds a syntax
tree.
• CFG (Context-Free Grammar)
• BNF (Backus-Naur Form)
• GAA (Grammar Analysis Algorithms)
• LL, LR, SLR, LALR Parsers
• YACC
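As an illustration of grouping tokens according to grammar productions, here is a small recursive-descent (LL-style) parser that builds a syntax tree for arithmetic expressions. The grammar, the tuple-based tree shape, and the name `parse_expression` are assumptions for this sketch; the deck does not say which parsing method the project uses.

```python
import re

# Assumed grammar (BNF-like):
#   expr   ::= term   { ("+" | "-") term }
#   term   ::= factor { ("*" | "/" | "%") factor }
#   factor ::= NUMBER | IDENT | "(" expr ")"
OPERATORS = {"+", "-", "*", "/", "%", "(", ")"}


def parse_expression(source):
    """Return a syntax tree: a leaf string or an (op, left, right) tuple."""
    tokens = re.findall(r"\d+|[A-Za-z]\w*|[-+*/%()]", source)
    if "".join(tokens) != "".join(source.split()):
        raise SyntaxError("unexpected character in input")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token in OPERATORS:
            raise SyntaxError(f"unexpected token {token!r}")
        return advance()  # NUMBER or IDENT leaf

    def term():
        node = factor()
        while peek() in ("*", "/", "%"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

Each grammar rule becomes one function, so operator precedence falls out of the rule nesting: `term` binds tighter than `expr`.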
THE STRUCTURE OF A COMPILER (5)
Semantic Routines
• Perform two functions:
  • Check the static semantics of each construct
  • Do the actual translation
• The heart of a compiler
• Syntax Directed Translation
• Semantic Processing Techniques
• IR (Intermediate Representation)
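The two functions of the semantic routines can be sketched together: a tree walk that checks one static rule (every identifier must be declared) and translates the expression into three-address IR. The tree shape, the temporary names `t1, t2, ...`, and the `translate` function are assumptions for this example, not the project's actual IR.

```python
def translate(tree, declared):
    """Walk a syntax tree, check declarations, and emit three-address code.

    tree: a leaf string or an (op, left, right) tuple.
    declared: set of lower-case identifier names from the symbol table.
    Returns (code_lines, name_holding_the_result).
    """
    code = []
    temp_count = 0

    def visit(node):
        nonlocal temp_count
        if isinstance(node, str):
            if node.isdigit():
                return node  # integer constant
            if node.lower() not in declared:
                # static semantic check: use of an undeclared identifier
                raise NameError(f"undeclared identifier {node!r}")
            return node
        op, left, right = node
        left_name = visit(left)
        right_name = visit(right)
        temp_count += 1
        temp = f"t{temp_count}"  # hypothetical compiler-generated temporary
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = visit(tree)
    return code, result
```

For the tree of `i+9*k+b` with `i`, `k`, and `b` declared, this produces the lines `t1 = 9 * k`, `t2 = i + t1`, and `t3 = t2 + b`.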
THE STRUCTURE OF A COMPILER (6)
Optimizer
• The IR code generated by the semantic routines is
analyzed and transformed into functionally equivalent but
improved IR code.
• This phase can be very complex and slow.
• Loop optimization, register allocation, code scheduling
• Register and Temporary Management
• Peephole Optimization
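A peephole pass looks at a small window of instructions and rewrites it into something cheaper with the same effect. The sketch below uses a window of one IR instruction and a handful of rules (constant folding, `x * 1`, `x + 0`, and dropping self-copies). The tuple format `(dest, left, op, right)` and the rule set are assumptions chosen for this example.

```python
import operator

# Operators that are safe to fold at compile time in this sketch.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def peephole(instructions):
    """Simplify (dest, left, op, right) tuples; op None means a plain copy."""
    optimized = []
    for dest, left, op, right in instructions:
        if op in FOLDABLE and left.isdigit() and right.isdigit():
            # constant folding: both operands are known integers
            left, op, right = str(FOLDABLE[op](int(left), int(right))), None, None
        elif op == "*" and right == "1":
            op, right = None, None  # x * 1 -> x
        elif op == "*" and left == "1":
            left, op, right = right, None, None  # 1 * x -> x
        elif op == "+" and right == "0":
            op, right = None, None  # x + 0 -> x
        elif op == "+" and left == "0":
            left, op, right = right, None, None  # 0 + x -> x
        if op is None and left == dest:
            continue  # a copy of a name onto itself does nothing
        optimized.append((dest, left, op, right))
    return optimized
```

Under these rules a statement such as `k = k * 1` first becomes the copy `k = k` and is then removed entirely.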
THE STRUCTURE OF A COMPILER (7)
Code Generator
• Interpretive Code Generation
• Generating Code from Tree/DAG
• Grammar-Based Code Generator
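Generating code from a tree can be sketched as a recursive walk that leaves each subtree's value in an accumulator. The example emits assembly-like text in the style of the sample output later in this deck; it has not been checked against a real assembler. The temporaries `T1, T2, ...` and the function name `generate_assignment` are assumptions, and only `+`, `-`, and `*` are handled.

```python
MNEMONIC = {"+": "ADD", "-": "SUB"}


def generate_assignment(target, tree):
    """Emit accumulator-style code for `target = tree`.

    tree: a leaf string or an (op, left, right) tuple.
    """
    lines = []
    temp_count = 0

    def emit(node):
        """Append code that leaves the value of node in AX."""
        nonlocal temp_count
        if isinstance(node, str):
            lines.append(f"MOV AX, {node}")
            return
        op, left, right = node
        if isinstance(right, str):
            emit(left)
            operand = right  # a simple operand can be used directly
        else:
            emit(right)  # evaluate the complex right side first
            temp_count += 1
            operand = f"T{temp_count}"  # hypothetical memory temporary
            lines.append(f"MOV {operand}, AX")
            emit(left)
        if op == "*":
            lines.append(f"MUL {operand}")
        else:
            lines.append(f"{MNEMONIC[op]} AX, {operand}")

    emit(tree)
    lines.append(f"MOV {target}, AX")
    return lines
```

Saving the right subtree in a temporary before evaluating the left one keeps the multiplication in `i + 9*k + b` bound to `9*k` only.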
THE STRUCTURE OF A COMPILER (8)
Scanner [Lexical Analyzer] -> Tokens
Parser [Syntax Analyzer] -> Parse tree
Semantic Process [Semantic Analyzer] -> Abstract Syntax Tree w/ Attributes
Code Generator [Intermediate Code Generator] -> Non-optimized Intermediate Code
Code Optimizer -> Optimized Intermediate Code
Code Generator -> Target machine code
Language Description
Identifier Rules
• An identifier can be at most 6 characters long.
• Identifiers are not case sensitive.
• An identifier can contain only alphanumeric characters (a-z,
A-Z, 0-9) and the underscore (_).
• The first character of an identifier must be a letter
(a-z, A-Z).
• Keywords cannot be used as identifiers.
• No special characters, such as semicolon, period,
whitespace, slash, or comma, are permitted in an identifier.
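The identifier rules above can be expressed as a single regular expression plus a keyword check. This is a sketch: the keyword set is an assumption collected from the statement forms shown in this deck, and the function names are illustrative.

```python
import re

# One letter followed by up to five letters, digits, or underscores (max length 6).
IDENTIFIER_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,5}")

# Assumed keyword set; the deck does not give a complete list.
KEYWORDS = {"int", "string", "if", "then", "endif", "while", "repeat", "until",
            "for", "endfor", "switch", "cases", "break", "input", "output",
            "start", "end"}


def is_valid_identifier(name):
    """Check length, allowed characters, first character, and keyword rules."""
    return IDENTIFIER_RE.fullmatch(name) is not None and name.lower() not in KEYWORDS


def same_identifier(first, second):
    """Identifiers are not case sensitive, so compare them case-folded."""
    return first.lower() == second.lower()
```

Storing identifiers in lower case in the symbol table is one simple way to get case-insensitive lookup.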
Data Types
Our language supports only 3 data types:
• Integer
• String
• Character
Expressions
1. Arithmetic operators (+, -, *, /, %)
2. Unary operator
3. Parentheses
4. Only integers are supported
5. Relational expressions (>, <, >=, <=, ==, !=)
6. Character string and integer constants
Statements
• Declaration statement: int a;
• Declaration and initialisation: int a=5;
• Assignment statement: a=6;
Conditional statement
Simple if (nesting not allowed)
    if then
    Endif
Switch statement (nesting not allowed)
    Switch()
    Cases
    Value 1:
    Break;
    Value n:
    Break;
    Endcase
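A simple if statement is usually lowered to a comparison followed by a jump on the inverse condition, which skips the body when the test fails. The sketch below shows that pattern in the assembly-like style of the sample output. `INVERSE_JUMP`, `lower_if`, and the label passed in are names assumed for this example; the jump mnemonics are the standard signed conditional jumps.

```python
# For each relational operator, the jump taken when the condition is FALSE.
INVERSE_JUMP = {
    "<": "JGE",
    ">": "JLE",
    "<=": "JG",
    ">=": "JL",
    "==": "JNE",
    "!=": "JE",
}


def lower_if(left, relop, right, body_lines, end_label):
    """Lower `if (left relop right) then body endif` to compare-and-jump code."""
    return [
        f"MOV AX, {left}",
        f"CMP AX, {right}",
        f"{INVERSE_JUMP[relop]} {end_label}",  # skip the body when the test fails
        *body_lines,
        f"{end_label}:",
    ]
```

Because nesting is not allowed in this language, a single end label per if statement is enough.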
Repetition Statement (nesting not allowed)
a. Repeat
   Until ()
b. While (relational expression)
   Endwhile
c. For = start value, end value, inc/dec
   ………
   Endfor
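Loops are lowered with labels and jumps. A while loop tests at the top and jumps back from the bottom; a repeat loop runs the body first and tests at the bottom. The sketch assumes that repeat ... until stops once its condition becomes true (the Pascal-style meaning), which the deck does not state. All function and label names are illustrative.

```python
# For each relational operator, the jump taken when the condition is FALSE.
INVERSE_JUMP = {"<": "JGE", ">": "JLE", "<=": "JG", ">=": "JL", "==": "JNE", "!=": "JE"}


def lower_while(left, relop, right, body_lines, top_label, exit_label):
    """Lower `while (left relop right) body endwhile`."""
    return [
        f"{top_label}:",
        f"MOV AX, {left}",
        f"CMP AX, {right}",
        f"{INVERSE_JUMP[relop]} {exit_label}",  # leave the loop when the test fails
        *body_lines,
        f"JMP {top_label}",  # go back and test again
        f"{exit_label}:",
    ]


def lower_repeat(body_lines, left, relop, right, top_label):
    """Lower `repeat body until (left relop right)`; the body runs at least once."""
    return [
        f"{top_label}:",
        *body_lines,
        f"MOV AX, {left}",
        f"CMP AX, {right}",
        f"{INVERSE_JUMP[relop]} {top_label}",  # loop again while the condition is false
    ]
```

The while form needs two labels (loop top and exit), while the repeat form needs only the loop top.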
I/O Statement
• Input ;
• Output ;
Program Structure
Declaration:
Start
End
Sample Program I
#mode 10
declaration
int r
int c
int in
int flg
start
r = 0
flg = 1
while( flg == 1 )
    if( c == 0) then
        flg = 0
    endif
    c = c-1
endwhile
end
OUTPUT 1
START:
MOV AX, @DATA
MOV DS, AX
MOV AX,
MOV r, AX
MOV AX,
MOV flg, AX
LB01:
MOV AX,
CMP AX,
JNE LB01
MOV AX,
CMP AX,
JNE LB01
MOV AX,
MOV flg, AX
LB02:
MOV AX,
SUB AX,
MOV c, AX
JMP LB01
LB03:
MOV AX, 4C00H
INT 21H
END START
Sample Program II
#mode 10
declaration
int a ; b
int i
int k
string mes1
start
k=k*1
if(i<9 )then
    i=i+9
    k=k*1
endif
i=i-45
repeat
    i=i+9*k+b
    k=k*1
    output "Hello World"
    input k
until(i<2 )
while(k>3 )
    i=i+9
    k=k*1
endwhile
end
OUTPUT
START:
MOV AX, @DATA
MOV DS, AX
MOV AX, k
MUL 1
MOV k, AX
MOV AX, i
CMP AX, 9
JGE LB01
MOV AX, i
ADD AX, 9
MOV i, AX
MOV AX, k
MUL 1
MOV k, AX
LB01:
MOV AX, i
SUB AX, 45
MOV i, AX
LB02:
MOV AX, i
ADD AX, 9
MUL k
ADD AX, b
MOV i, AX
MOV AX, k
MUL 1
MOV k, AX
LEA DX, "Hello World"
CALL MESSAGE
CALL INDEC
MOV k, AX
MOV AX, i
CMP AX, 2
JGE LB01
LB03:
MOV AX,
CMP AX, 3
JLE LB01
MOV AX, i
ADD AX, 9
MOV i, AX
MOV AX, k
MUL 1
MOV k, AX
JMP LB01
MOV AX, i
ADD AX, 9
MOV i, AX
MOV AX, k
MUL 1
MOV k, AX
JMP LB01
LB04:
MOV AX, 4C00H
INT 21H
END START
SCREENSHOTS
[Screenshot slides; images not reproduced]
Feasibility and future scope
As technology grows, ease of working is given
priority.
Programming has moved from C and C++ toward languages such as
Python and Ruby, which require fewer lines of code.
Our project can be extended into a new language that is
easy to learn, faster, has more built-in features, and has many
more qualities of a good programming language.
Conclusion
In a compiler, intermediate code generation is
independent of the target machine, and the conversion of
intermediate code to target code is independent of the source
language used.
Thus we have implemented the front end of the compilation process.
It includes 3 phases of compilation:
lexical analysis
syntax analysis
semantic analysis
followed by intermediate code generation.
