Pune Vidyarthi Griha’s
COLLEGE OF ENGINEERING, NASHIK.
“ LANGUAGE TRANSLATOR ”
By
Prof. Anand N. Gharu
(Assistant Professor)
PVGCOE Computer Dept.
22nd Jan 2018
CONTENTS :-
1. Role of lexical analysis
2. Parsing, tokens, patterns, lexemes and lexical errors
3. Regular definitions for language constructs and strings
4. Sequences, comments and transition diagrams for
recognition of tokens, reserved words and identifiers
5. Introduction to Compilers and Interpreters
6. General model of a Compiler
7. Comparison of compiler and interpreter
8. Use of an interpreter and components of an interpreter
9. Overview of Lex and YACC specifications
What’s a compiler?
• Computers understand only machine language
• Therefore, high-level language instructions must be translated
into machine language prior to execution
[Figure: the text "This is a program" shown next to the bit string 10000010010110100100101……]
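What’s a compiler?
• Compiler
A piece of system software that translates high-level languages
into machine language
#include <stdio.h>

int main(void)
{
    int c;
    /* Read characters until 'x' or end of input. */
    while ((c = getchar()) != EOF && c != 'x')
    {
        if (c == 'a' || c == 'e' || c == 'i')
            printf("Congrats!");
        else
            printf("You Loser!");
    }
    return 0;
}
[Figure: program.c -> Compiler (gcc -o prog program.c) -> prog (10000010010110100100101……), with sample output "Congrats!"]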
Compiler
• Compiler:-
• A system program that automatically translates a high-level language
program into a machine language program
[Figure: Source program (high-level language program) -> Compiler -> Target program (machine language program); the figure also carries the label "Database"]
Types of Compiler
• Cross Assembler:-
• A system program that automatically translates an assembly language
program written for M/C A into a machine language program for M/C A,
while itself running on a different machine, M/C B
[Figure: Source program (assembly language program compatible with M/C A) -> Cross Assembler, running on M/C B -> Target program (machine language program compatible with M/C A)]
Types of compiler
• Cross Compiler:-
• A system program that automatically translates an HLL program written
for M/C A into a machine language program for M/C A, although the
machine the compiler itself runs on is M/C B
[Figure: Source program (HLL program compatible with M/C A) -> Cross Compiler, running on M/C B -> Target program (machine language program)]
Types of Compiler
Interpreter
- It is a language translator that executes the source
program line by line without first translating the whole program into
machine language.
- It does not generate object code.
Compiler vs Interpreter
[Table: comparison of compiler and interpreter; only the example entries "C++" and "Visual Basic" survive]
Phases of compiler
Structure of Compiler
• Any compiler must perform two major tasks
o Analysis of the source program
o Synthesis of a machine-language program
[Figure: Compiler -> Analysis, Synthesis]
Structure of Compiler
[Figure: Source Program (Character Stream) -> Scanner -> Tokens -> Parser -> Syntactic Structure -> Semantic Routines -> Intermediate Representation -> Optimizer -> Code Generator -> Target machine code. Symbol and Attribute Tables are used by all phases of the compiler.]
Structure of Compiler
Scanner (Lexical Analysis)
The scanner begins the analysis of the source program
by reading the input, character by character, and
grouping characters into individual words and symbols
(tokens)
• RE (Regular Expression)
• NFA (Non-deterministic Finite Automata)
• DFA (Deterministic Finite Automata)
• LEX
Structure of Compiler
Parser (Syntax Analysis)
Given a formal syntax specification (typically a
context-free grammar [CFG]), the parser reads tokens
and groups them into units as specified by the
productions of the CFG being used.
As syntactic structure is recognized, the parser either
calls the corresponding semantic routines directly or builds a
syntax tree.
• CFG (Context-Free Grammar)
• BNF (Backus-Naur Form)
• GAA (Grammar Analysis Algorithms)
Structure of Compiler
Semantic Routines
• Perform two functions
o Check the static semantics of each construct
o Do the actual translation
• The heart of a compiler
Syntax Directed Translation
Semantic Processing Techniques
IR (Intermediate Representation)
Structure of Compiler
Optimizer
The IR code generated by the semantic routines is
analyzed and transformed into functionally equivalent but
improved IR code.
This phase can be very complex and slow.
• Peephole optimization, loop optimization, register allocation, code scheduling
• Register and Temporary Management
• Peephole Optimization
Structure of Compiler
Code Generator
• Interpretive Code Generation
• Generating Code from Tree/DAG
• Grammar-Based Code Generator
Structure of Compiler
[Figure: phases of a compiler, each with its component and output]
Lexical Analysis: Scanner [Lexical Analyzer] -> Tokens
Syntax Analysis: Parser [Syntax Analyzer] -> Parse tree
Semantic Analysis: Semantic Process [Semantic Analyzer] -> Abstract Syntax Tree w/ Attributes
Intermediate Code Generation: Code Generator [Intermediate Code Generator] -> Non-optimized Intermediate Code
Code Optimization: Code Optimizer -> Optimized Intermediate Code
Code Generation and Code Optimization: Code Optimizer -> Target machine code
Structure of Compiler
Compiler writing tools
• Compiler generators or compiler-compilers
o E.g. scanner and parser generators
o Examples: Yacc, Lex
Overview of Lex & YACC
• Lex:
o Theory.
o Execution.
o Example.
• Yacc:
o Theory.
o Description.
o Example.
• Lex & Yacc linking.
• Demo.
Lex
• Lex is a program generator that produces lexical analyzers; it is widely
used on Unix.
• It is mostly used with the Yacc parser generator.
• Written by Eric Schmidt and Mike Lesk.
• It reads an input specification describing the lexical analyzer and
outputs source code implementing the lexical analyzer in the C
programming language.
• Lex reads patterns (regular expressions) and produces C code
for a lexical analyzer that scans the input for text matching those
patterns, for example identifiers.
STRUCTURE OF LEX
Lex
◦ A simple pattern: letter(letter|digit)*
• Regular expressions are translated by lex into a computer
program that mimics a finite state automaton (FSA).
• This pattern matches a string of characters that begins with a
single letter followed by zero or more letters or digits.
Lex
• Limitation: Lex cannot be used to recognize nested
structures such as balanced parentheses, since it has only states and
transitions between states and no stack to count nesting depth.
• So Lex is good at pattern matching, while Yacc handles
more challenging tasks such as nested structures.
Lex
Pattern Matching Primitives
Lex
• Pattern Matching examples.
Lex
• The input structure to Lex:
... Definitions section ...
%%
... Rules section ...
%%
... C code section (subroutines) ...
• ECHO is a predefined macro in lex, used as an action, that
writes out the text matched by the
pattern.
Lex
Lex predefined variables.
Lex
• Whitespace must separate the defining term and the associated expression.
• Code in the definitions section is simply copied as-is to the top of the generated
C file and must be bracketed with "%{" and "%}" markers.
• Substitutions in the rules section are surrounded by braces ({letter}) to
distinguish them from literals.
Yacc
• Theory:
◦ Yacc reads the grammar and generates C code for a parser.
◦ Grammars are written in Backus-Naur Form (BNF).
◦ A BNF grammar is used to express context-free languages.
◦ E.g. to parse an expression, Yacc performs the reverse of a derivation: it reduces the
expression back to the nonterminal that produces it.
◦ This is known as bottom-up or shift-reduce parsing.
◦ A stack (LIFO) stores the symbols being parsed.
STRUCTURE OF YACC
Yacc
• Input to yacc is divided into three sections.
... definitions ...
%%
... rules ...
%%
... subroutines ...
Yacc
• The definitions section consists of:
◦ token declarations.
◦ C code bracketed by "%{" and "%}".
• The rules section consists of:
◦ BNF grammar.
• The subroutines section consists of:
◦ user subroutines.
Yacc & Lex Together
• The grammar:
program -> program expr | ε
expr -> expr + expr | expr - expr | id
• program and expr are nonterminals.
• id is a terminal (a token returned by lex).
• An expression may be:
o the sum of two expressions.
o the difference of two expressions.
o or an identifier.
Lex file
Yacc file
Linking lex&yacc
Thank You
Gharu.anand@gmail.com
