By: Kiran Acharya
inspireray.blogspot.in
Introduction to the lexical analyzer
Input buffering
Specifications of tokens
Regular expressions
Regular expressions over an alphabet Σ are built from two basis rules:
1. ε is a regular expression, and L(ε) is {ε}, that is, the language whose sole member is the empty string.
2. If a is a symbol in Σ, then a is a regular expression, and L(a) = {a}, that is, the language with one string, of length one, with a in its one position.
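Larger regular expressions are built by induction. If r and s are regular expressions denoting L(r) and L(s), then:
(r)|(s) is a regular expression denoting L(r) ∪ L(s).
(r)(s) is a regular expression denoting L(r)L(s).
(r)* is a regular expression denoting (L(r))*.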
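The three operations above can be tried out on small finite languages. The following is a minimal sketch that represents a language as a Python set of strings; the function names and the `max_repeats` bound are assumptions made for illustration, since the true closure L* is infinite and can only be approximated here.

```python
def union(L, M):
    """L(r) ∪ L(s): strings that are in either language."""
    return L | M


def concat(L, M):
    """L(r)L(s): every string of L followed by every string of M."""
    return {x + y for x in L for y in M}


def closure(L, max_repeats):
    """Approximate (L(r))* by allowing at most max_repeats concatenations."""
    result = {""}      # zero repetitions gives the empty string
    current = {""}
    for _ in range(max_repeats):
        current = concat(current, L)
        result |= current
    return result


L_a = {"a"}
L_b = {"b"}
print(sorted(union(L_a, L_b)))    # language of a|b
print(sorted(concat(L_a, L_b)))   # language of ab
print(sorted(closure(L_a, 3)))    # part of the language of a*
```

Raising `max_repeats` adds longer strings; no finite bound produces all of L*.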
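Precedence and associativity of the operators, which let unnecessary parentheses be dropped:
The unary operator * has the highest precedence and is left associative.
Concatenation has the second-highest precedence and is left associative.
| has the lowest precedence and is left associative.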
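As a quick check of these conventions, the sketch below uses Python's `re` module, whose basic operators follow the same precedence order. The helper name `matches` and the sample strings are assumptions chosen for illustration.

```python
import re


def matches(pattern, text):
    """True when the whole of text belongs to the language of pattern."""
    return re.fullmatch(pattern, text) is not None


# a|bc* groups as (a)|((b)((c)*)): * binds tightest, then concatenation, then |.
unparenthesized = r"a|bc*"
explicit = r"(a)|((b)((c)*))"

for text in ["a", "b", "bcc", "ac", "bcbc"]:
    print(text, matches(unparenthesized, text), matches(explicit, text))
```

The string `ac` is rejected, which would not be the case if the expression grouped as `(a|b)c*`.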
A language that can be defined by a regular expression is called a regular set.
If two regular expressions r and s denote the same regular set, they are equivalent, written r = s.
For example, (a|b) = (b|a).
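A regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
...
Each di is a new symbol, not in Σ and different from every other d.
Each ri is a regular expression over the alphabet Σ ∪ {d1, d2, ..., d(i-1)}, so a definition may use only the names defined before it.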
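A regular definition can be mimicked by building each named pattern from the ones defined before it. The sketch below uses Python strings and the `re` module; the names `letter`, `digit`, and `ident` and the exact character sets are assumptions for illustration, not definitions taken from the slides.

```python
import re

# Each name is defined only in terms of the alphabet and earlier names.
letter = r"[A-Za-z_]"
digit = r"[0-9]"
ident = rf"{letter}({letter}|{digit})*"   # ident -> letter (letter | digit)*


def is_identifier(text):
    """True when the whole of text matches the ident definition."""
    return re.fullmatch(ident, text) is not None


print(is_identifier("count1"))
print(is_identifier("1count"))
```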
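Extensions of regular expressions. Kleene introduced regular expressions with union, concatenation, and closure in the 1950s; common extensions added since then are:
One or more instances: r+ is equivalent to rr*.
Zero or one instance: r? is equivalent to r|ε.
Character classes: [abc] is shorthand for a|b|c, and [a-z] for a|b|...|z.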
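These extensions add convenience, not expressive power: each can be rewritten with the basic operators. The sketch below spot-checks that claim with Python's `re` module on a handful of sample strings. The helper `same_on_samples` is hypothetical, and agreement on samples is only evidence, not a proof of equivalence.

```python
import re


def same_on_samples(p, q, samples):
    """True when patterns p and q accept exactly the same sample strings."""
    return all(
        (re.fullmatch(p, s) is None) == (re.fullmatch(q, s) is None)
        for s in samples
    )


samples = ["", "a", "aa", "aaa", "b", "c", "ab", "abc"]

print(same_on_samples(r"a+", r"aa*", samples))        # one or more instances
print(same_on_samples(r"a?", r"(a|)", samples))       # zero or one instance
print(same_on_samples(r"[abc]", r"a|b|c", samples))   # character class
```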
Recognition of tokens: take the patterns for the tokens and build a piece of code that examines the input and finds a prefix that is a lexeme matching one of the patterns.
Methods:
1. Transition diagrams
2. Recognition of reserved words and identifiers
3. Completion of the running example
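Transition diagrams: as an intermediate step, patterns are first converted into stylized flowcharts called transition diagrams.
A transition diagram has:
States, drawn as nodes, each summarizing what has been seen of the input so far.
Edges, directed from one state to another.
Input symbols labeling each edge, which tell which edge to follow on the next character.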
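Special states in a transition diagram:
Start (initial) state: where the search for a lexeme begins.
Accepting (final) states: indicate that a lexeme has been found.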
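Telling keywords apart from identifiers is a problem, because keywords match the same pattern as identifiers.
Transition diagram for identifiers and keywords:
start --> state 0
state 0 --letter--> state 10
state 10 --letter or digit--> state 10
state 10 --other--> state 11 (accepting): return(gettoken(), installid())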
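The diagram above can be coded directly as a loop over states. The following is a minimal sketch: the state numbers 0, 10, and 11 follow the slide, while the `KEYWORDS` table, the function name `scan_identifier`, and the returned tuple are assumptions standing in for gettoken() and installid(), whose real behavior the slides do not spell out.

```python
KEYWORDS = {"if", "then", "else"}   # assumed reserved words, for illustration


def scan_identifier(text, pos=0):
    """Follow the diagram from pos; return (token, lexeme, next_pos) or None."""
    state = 0
    start = pos
    while True:
        ch = text[pos] if pos < len(text) else ""   # "" stands for end of input
        if state == 0:
            if ch.isalpha():
                state = 10
                pos += 1
            else:
                return None              # no edge from the start state: not an identifier
        elif state == 10:
            if ch.isalnum():
                pos += 1                 # loop on letter or digit
            else:
                state = 11               # "other": do not consume the character
        else:                            # state 11, accepting
            lexeme = text[start:pos]
            token = lexeme.upper() if lexeme in KEYWORDS else "ID"
            return token, lexeme, pos


print(scan_identifier("count1 = 0"))
print(scan_identifier("if x"))
```

Looking the lexeme up in a reserved-word table after the scan is one common way to separate keywords from identifiers; separate transition diagrams per keyword are another.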
The Lex tool:
The input is written in the Lex language; the tool itself is the Lex compiler.
The input file is lex.l.
The Lex compiler transforms it into a C program, lex.yy.c.
That file is then compiled by the C compiler into a.out.
Structure of a Lex program:
declarations
%%
translation rules
%%
auxiliary functions
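Conflict resolution in Lex:
Always prefer a longer prefix over a shorter one.
If the longest possible prefix matches two or more patterns, prefer the pattern listed first.
Lookahead operator: / is inserted in a pattern to mark the end of the part that matches the lexeme. What follows / must also be matched in the input, but it is not part of the lexeme.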
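The two conflict-resolution rules can be sketched in a few lines. The code below is not how Lex is implemented; it simply tries every rule at the current position with Python's `re` module and keeps the longest match, with ties going to the earlier rule. The rule list `RULES` and the token names are assumptions for illustration, and the lookahead operator is not covered.

```python
import re

# Order matters: earlier rules win ties (IF is listed before ID).
RULES = [
    ("IF", r"if"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("NUM", r"[0-9]+"),
    ("LE", r"<="),
    ("LT", r"<"),
    ("WS", r"[ \t\n]+"),
]


def next_token(text, pos):
    """Return (rule_name, end_pos) of the longest match at pos, or None."""
    best = None
    for name, pattern in RULES:
        m = re.compile(pattern).match(text, pos)
        # Strict > keeps the earlier rule when two matches have equal length.
        if m and (best is None or m.end() > best[1]):
            best = (name, m.end())
    return best


def tokenize(text):
    """Split text into (token, lexeme) pairs, skipping whitespace."""
    pos, tokens = 0, []
    while pos < len(text):
        found = next_token(text, pos)
        if found is None:
            raise ValueError(f"no rule matches at position {pos}")
        name, end = found
        if name != "WS":
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens


print(tokenize("if iffy <= 10"))
```

Here `if` matches both IF and ID with the same length, so the first-listed rule wins, while `iffy` goes to ID because that match is longer.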
At the heart of how Lex turns its input program into a lexical analyzer are finite automata.
Finite automata are recognizers: they simply say "yes" or "no" about each possible input string.
Two types:
1. Nondeterministic finite automata (NFA)
2. Deterministic finite automata (DFA)
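Nondeterministic finite automata: there are no restrictions on the labels of the edges leaving a state. The same symbol can label several edges out of one state, and ε is a possible label.
An NFA consists of:
A finite set of states S
An input alphabet Σ
A transition function that gives, for each state and each symbol in Σ ∪ {ε}, a set of next states
A start state
A set of final (accepting) states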
Transition table of an NFA:
State   a       b       ε
0       {0,1}   {0}     ∅
1       ∅       {2}     ∅
2       ∅       {3}     ∅
3       ∅       ∅       ∅
Example: the string aabb is accepted by the NFA for the regular expression (a|b)*abb.
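The acceptance of aabb can be checked by simulating the NFA and tracking the set of states it could be in after each character. The sketch below encodes the transition table given above as a Python dictionary and assumes state 0 is the start state and state 3 the only accepting state. Because that table has no ε-moves, no ε-closure step is needed; a general NFA simulator would require one.

```python
# Transition table of the NFA for (a|b)*abb; missing entries mean the empty set.
NFA = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START = 0
ACCEPTING = {3}


def nfa_accepts(text):
    """Track every state the NFA could be in after each input character."""
    current = {START}
    for ch in text:
        current = {t for s in current for t in NFA[s].get(ch, set())}
    return bool(current & ACCEPTING)


print(nfa_accepts("aabb"))
print(nfa_accepts("abab"))
```

On aabb the state sets are {0}, {0,1}, {0,1}, {0,2}, {0,3}; the last one contains the accepting state 3.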
Deterministic finite automata: for each state and each input symbol there is exactly one edge leading to a next state, and there are no moves on ε.
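Because every state has exactly one next state per symbol, a DFA can be run with a single state variable and one table lookup per character. The sketch below uses one DFA for (a|b)*abb, obtained by applying the subset construction to the NFA for that expression; the state numbering 0 to 3 and the name `dfa_accepts` are choices made for this illustration.

```python
# One next state for every (state, symbol) pair; state 3 is accepting.
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
START = 0
ACCEPTING = {3}


def dfa_accepts(text):
    """Run the DFA over text; reject at once on a symbol outside {a, b}."""
    state = START
    for ch in text:
        if ch not in DFA[state]:
            return False
        state = DFA[state][ch]
    return state in ACCEPTING


print(dfa_accepts("aabb"))
print(dfa_accepts("abab"))
```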