Akhil Kaushik
Asstt. Prof., CE Deptt.,
TIT Bhiwani
Lexical Analysis
Lexical Analysis
• 1st phase of compiler.
• Part of front end of
compiler.
• It is usually implemented
as subroutine or co-
routine of parser.
Lexical Analysis
• Reads the source program as a file of characters and
divide them up into tokens.
• Tokens are recognized using Finite Automata or
Transition Diagrams.
Lexical Analysis
• The vital functions of Lexical Analyzer are:-
– Separate tokens from program
– Insert tokens into symbol table
– Eliminates whitespaces, comments, newline characters,
etc. from string.
– Give tokens to parser (pass their integer no./code to
parser)
– Keep track of line numbers (for error handling).
– Pre-processing of Macros.
Lexical Analysis
• It reads characters from the input, groups them into
lexeme and
• Then, passes the tokens formed by the lexemes,
together with their attribute value, to the parser.
• In some situations ,it has to read some characters
ahead before it can decide on the token to be
returned to the parser.
Lexical Analysis
• Ex: lexical analyzer for C must read ahead after it
sees the character ‘>’.
• If the next character is = ,then the character
sequence >= is the lexeme forming the token for
the “Greater than or equal to ” operator.
• Other wise > is the lexeme forming “Greater than ”
operator ,and the lexical analyzer has read one
character too many.
Lexical Analysis
• The extra character has to be pushed back on to the
input, because it can be the beginning of the next
lexeme in the input.
• The lexical analyzer and parser form a producer-
Consumer pair.
• Produced tokens can be held in a token buffer until
they are consumed.
Input Buffering
• The interaction b/w the two is constrained only by
the size of the buffer, because:-
– lexical analyzer can not proceed when the buffer is full.
– parser can not proceed when the buffer is empty.
• Commonly,the buffer holds just one token.
• Lexical Analyzer can be a procedure called by a
parser, returning tokens on demand.
Input Buffering
• The implementation of reading and pushing back
character is usually done by setting up an input buffer.
• A block of character is read into the buffer at a time.
• A pointer keeps track of the portion of the input that
has been analysed.
• Pushing back a character is implemented by moving
the pointer.
Input Buffering
• Source program is read from HDD.
• LA has to access secondary memory everytime it
needs to identify tokens.
• It may be time consuming, hence an input buffer
is used.
• LA scans input string from left to right one char. at
a time to identify tokens.
Input Buffering
• It uses 2 pointers:
– Begin Pointer (bptr) – points to begin of string.
– Look-ahead Pointer (lptr) – moves ahead to search for
end of token.
• Ex: int a, b;
Input Buffering
Input Buffering
Tokens, Lexemes & Patterns
• Connected with lexical analysis are three
important terms with similar meaning.
• Lexeme
• Token
• Patterns
Tokens, Lexemes & Patterns
• A token is a pair consisting of a token name and an
optional attribute value.
• Token names:
– Keywords,
– operators,
– identifiers,
– constants,
– literal strings,
– punctuation symbols(such as commas, semi-colons)
Tokens, Lexemes & Patterns
A lexeme is a sequence of characters in the
source program that matches the pattern for a
token and is identified by the lexical analyzer as
an instance of that token.
Ex: Relation Operators {<,<=,>,>=,==,<>}
Tokens, Lexemes & Patterns
• A pattern is a description of the form that the
lexemes of token may take.
• It gives an informal or formal description of a token.
Ex: identifier
• Gives a precise description/ specification of tokens.
• Used to automatically generate a lexical analyzer
Tokens, Lexemes & Patterns
Tokens
Non-Tokens
Akhil Kaushik
akhilkaushik05@gmail.com
9416910303
CONTACT ME AT:
Akhil Kaushik
akhilkaushik05@gmail.com
9416910303
THANK YOU !!!

Lexical Analysis - Compiler Design

  • 1.
    Akhil Kaushik Asstt. Prof.,CE Deptt., TIT Bhiwani Lexical Analysis
  • 2.
    Lexical Analysis • 1stphase of compiler. • Part of front end of compiler. • It is usually implemented as subroutine or co- routine of parser.
  • 3.
    Lexical Analysis • Readsthe source program as a file of characters and divide them up into tokens. • Tokens are recognized using Finite Automata or Transition Diagrams.
  • 4.
    Lexical Analysis • Thevital functions of Lexical Analyzer are:- – Separate tokens from program – Insert tokens into symbol table – Eliminates whitespaces, comments, newline characters, etc. from string. – Give tokens to parser (pass their integer no./code to parser) – Keep track of line numbers (for error handling). – Pre-processing of Macros.
  • 5.
    Lexical Analysis • Itreads characters from the input, groups them into lexeme and • Then, passes the tokens formed by the lexemes, together with their attribute value, to the parser. • In some situations ,it has to read some characters ahead before it can decide on the token to be returned to the parser.
  • 6.
    Lexical Analysis • Ex:lexical analyzer for C must read ahead after it sees the character ‘>’. • If the next character is = ,then the character sequence >= is the lexeme forming the token for the “Greater than or equal to ” operator. • Other wise > is the lexeme forming “Greater than ” operator ,and the lexical analyzer has read one character too many.
  • 7.
    Lexical Analysis • Theextra character has to be pushed back on to the input, because it can be the beginning of the next lexeme in the input. • The lexical analyzer and parser form a producer- Consumer pair. • Produced tokens can be held in a token buffer until they are consumed.
  • 8.
    Input Buffering • Theinteraction b/w the two is constrained only by the size of the buffer, because:- – lexical analyzer can not proceed when the buffer is full. – parser can not proceed when the buffer is empty. • Commonly,the buffer holds just one token. • Lexical Analyzer can be a procedure called by a parser, returning tokens on demand.
  • 9.
    Input Buffering • Theimplementation of reading and pushing back character is usually done by setting up an input buffer. • A block of character is read into the buffer at a time. • A pointer keeps track of the portion of the input that has been analysed. • Pushing back a character is implemented by moving the pointer.
  • 10.
    Input Buffering • Sourceprogram is read from HDD. • LA has to access secondary memory everytime it needs to identify tokens. • It may be time consuming, hence an input buffer is used. • LA scans input string from left to right one char. at a time to identify tokens.
  • 11.
    Input Buffering • Ituses 2 pointers: – Begin Pointer (bptr) – points to begin of string. – Look-ahead Pointer (lptr) – moves ahead to search for end of token. • Ex: int a, b;
  • 12.
  • 13.
  • 14.
    Tokens, Lexemes &Patterns • Connected with lexical analysis are three important terms with similar meaning. • Lexeme • Token • Patterns
  • 15.
    Tokens, Lexemes &Patterns • A token is a pair consisting of a token name and an optional attribute value. • Token names: – Keywords, – operators, – identifiers, – constants, – literal strings, – punctuation symbols(such as commas, semi-colons)
  • 16.
    Tokens, Lexemes &Patterns A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token. Ex: Relation Operators {<,<=,>,>=,==,<>}
  • 17.
    Tokens, Lexemes &Patterns • A pattern is a description of the form that the lexemes of token may take. • It gives an informal or formal description of a token. Ex: identifier • Gives a precise description/ specification of tokens. • Used to automatically generate a lexical analyzer
  • 18.
  • 19.
  • 20.
  • 21.
    Akhil Kaushik akhilkaushik05@gmail.com 9416910303 CONTACT MEAT: Akhil Kaushik akhilkaushik05@gmail.com 9416910303 THANK YOU !!!