Lexical analyzer

By : Kiran Acharya
inspireray.blogspot.in

Introduction to the lexical analyzer
Input buffering
Specifications of tokens
Regular expressions

There are two basic rules for building regular
expressions.
1. ɛ is a regular expression, and L(ɛ) is {ɛ}, that is,
the language whose sole member is the
empty string.
2. If a is a symbol in Σ, then a is a regular
expression, and L(a) = {a}, the language with one
string of length one, with a at its first position.

(r)|(s) is a regular expression denoting
L(r) ∪ L(s).
(r)(s) is a regular expression denoting L(r)L(s).
(r)* is a regular expression denoting (L(r))*.
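The three operations above act on languages, that is, on sets of strings. A minimal Python sketch, assuming finite sets and a bounded approximation of the closure (the names `union`, `concat`, and `kleene_star`, and the bound `max_reps`, are illustrative and not from the slides):

```python
from itertools import product

def union(L, M):
    """L(r) U L(s): strings that are in either language."""
    return L | M

def concat(L, M):
    """L(r)L(s): every string of L followed by every string of M."""
    return {x + y for x, y in product(L, M)}

def kleene_star(L, max_reps):
    """Approximate (L(r))* by allowing at most max_reps repetitions.
    The true closure is infinite whenever L holds a non-empty string."""
    result = {""}      # zero repetitions gives the empty string
    current = {""}
    for _ in range(max_reps):
        current = concat(current, L)
        result |= current
    return result

L_r = {"a"}
L_s = {"b"}
print(sorted(union(L_r, L_s)))    # ['a', 'b']
print(sorted(concat(L_r, L_s)))   # ['ab']
print(sorted(kleene_star(union(L_r, L_s), 2), key=lambda w: (len(w), w)))
```

The last line lists every string over {a, b} of length at most two, starting with the empty string.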

The unary operator * has the highest precedence
and is left associative.
Concatenation has the second-highest precedence and is left
associative.
| has the lowest precedence and is left associative.
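These precedence rules let parentheses be dropped. As a rough check, Python's `re` module follows the same precedence for these three operators, so `a|bc*` should behave like `(a)|((b)((c)*))` and not like `(a|b)c*`. The sample strings and the helper `accepts` are assumptions made for this sketch:

```python
import re

implicit = re.compile(r"a|bc*")              # parentheses dropped
explicit = re.compile(r"(a)|((b)((c)*))")    # fully parenthesized reading
wrong = re.compile(r"(a|b)c*")               # a different grouping

def accepts(pattern, word):
    """True if the whole word matches the pattern."""
    return pattern.fullmatch(word) is not None

samples = ["", "a", "b", "bc", "bccc", "ac", "bcbc"]
for w in samples:
    print(repr(w), accepts(implicit, w), accepts(explicit, w), accepts(wrong, w))
```

The first two columns agree on every sample, while the third differs on "ac", which only the wrong grouping accepts.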

A language that can be defined by a regular
expression is called a regular set.
If two regular expressions r and s denote the same
set, they are equivalent, written r = s.
For example, (a|b) = (b|a).

A regular definition is a sequence of definitions of the form:
d1 → r1;
d2 → r2;
Each di is a new symbol, not in the alphabet Σ.
Each ri is a regular expression over Σ and the
previously defined symbols d1, ..., di-1.
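A regular definition can be mimicked in Python by naming sub-patterns and splicing earlier names into later ones. The definitions `letter`, `digit`, and `ident` below are illustrative choices, not taken from the slides, and Python's `re` syntax stands in for the notation above:

```python
import re

# d1 -> r1 : a letter
letter = r"[A-Za-z]"
# d2 -> r2 : a digit
digit = r"[0-9]"
# d3 -> r3 : uses the previously defined symbols d1 and d2
ident = rf"{letter}(?:{letter}|{digit})*"

def is_identifier(word):
    """True if the whole word matches the ident definition."""
    return re.fullmatch(ident, word) is not None

print(is_identifier("x1"))   # True
print(is_identifier("1x"))   # False
```

Each later definition only refers to names defined before it, which is what keeps the sequence free of circular references.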

Kleene introduced regular expressions in the 1950s; extensions added since then include:
One or more instances: r+
Zero or one instance: r?
Character classes: [abc], [a-z]
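Each extension is shorthand for something the basic operators can already express: r+ for rr*, r? for r|ɛ, and [abc] for a|b|c. A small sketch that compares each shorthand with its expansion on a handful of sample strings, using Python's `re` (the sample list and the helper `same_language` are assumptions, and agreement on samples is only a spot check, not a proof):

```python
import re

pairs = {
    "one or more": (r"a+", r"aa*"),
    "zero or one": (r"a?", r"(?:a|)"),     # empty alternative plays the role of the empty string
    "character class": (r"[abc]", r"a|b|c"),
}

samples = ["", "a", "aa", "b", "c", "ab", "d"]

def same_language(p, q, words):
    """True if p and q accept exactly the same strings among words."""
    return all(
        (re.fullmatch(p, w) is None) == (re.fullmatch(q, w) is None)
        for w in words
    )

for name, (short, expanded) in pairs.items():
    print(name, same_language(short, expanded, samples))
```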

Take the patterns for the tokens and build a
piece of code that examines the input and finds
the prefix that is a lexeme matching one of
the patterns.
Methods:
1. Transition diagrams
2. Recognition of reserved words and identifiers
3. Completion of the running example

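Transition diagrams are stylized flowcharts.
Patterns are first converted into transition
diagrams.
A transition diagram has:
States
Edges between states
Input symbols labeling the edges

Start state: the first state, where recognition begins
Accepting (final) state: a lexeme has been found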

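Distinguishing keywords from identifiers is the
problem.
Transition diagram for identifiers and keywords:
start → (0) --letter--> (10) --other--> (11) Return(gettoken(),installid())
State 10 loops back to itself on letter or digit.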
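The transition diagram for identifiers and keywords can be hand-coded as a loop over states. The sketch below keeps the state numbers 0, 10, and 11 from the diagram; the keyword set, the symbol table, and the helpers `get_token` and `install_id` are hypothetical stand-ins for the slide's gettoken() and installid(), whose real behavior the slides do not spell out:

```python
KEYWORDS = {"if", "then", "else"}   # assumed keyword set, for illustration only
symbol_table = []                   # assumed symbol table: a list of lexemes

def install_id(lexeme):
    """Hypothetical installid(): store the lexeme and return its table index."""
    if lexeme not in symbol_table:
        symbol_table.append(lexeme)
    return symbol_table.index(lexeme)

def get_token(lexeme):
    """Hypothetical gettoken(): keyword token name, or ID for anything else."""
    return lexeme.upper() if lexeme in KEYWORDS else "ID"

def scan_identifier(text, pos=0):
    """Run the diagram from pos; return (token, attribute, next_pos) or None."""
    state, i = 0, pos
    while True:
        ch = text[i] if i < len(text) else ""   # "" stands for end of input
        if state == 0:
            if ch.isalpha():                    # edge labeled letter
                state, i = 10, i + 1
            else:
                return None                     # this diagram does not apply
        elif state == 10:
            if ch.isalnum():                    # loop on letter or digit
                i += 1
            else:
                state = 11                      # edge labeled other, not consumed
        else:                                   # state 11: accepting
            lexeme = text[pos:i]
            token = get_token(lexeme)
            attribute = install_id(lexeme) if token == "ID" else None
            return token, attribute, i

print(scan_identifier("count1 = 0"))
print(scan_identifier("if x"))
```

The character that triggers the "other" edge is left in the input, so the next call can start scanning from it.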

The Lex tool
Its input is written in the Lex language; the tool itself is the Lex
compiler.
The input file is lex.l.
The Lex compiler transforms it into a C program,
lex.yy.c.
That file is then compiled by the C compiler into a.out.

A Lex program has three parts, separated by %%:
Declarations
%%
Translation rules
%%
Auxiliary functions

Always prefer a longer prefix over a
shorter one.
If the longest prefix matches two or more patterns,
prefer the pattern listed first.
Lookahead operator: / is inserted in a pattern to mark
the end of the part that is the lexeme; the pattern after / must match the input but is not part of the lexeme.
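The two conflict-resolution rules can be sketched in Python: try every pattern at the current position, keep the longest match, and break ties in favor of the pattern listed first. The rule list `RULES` and the function names are assumptions for illustration; this is not how Lex is implemented internally. Python's lookahead `(?=...)` is used only as a rough analogue of the / operator:

```python
import re

# Order matters: earlier rules win ties (IF is listed before ID).
RULES = [
    ("IF", r"if"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("LE", r"<="),
    ("LT", r"<"),
    ("WS", r"[ \t\n]+"),
]

def next_token(text, pos):
    """Return (name, lexeme, end) for the longest match at pos, or None."""
    best = None
    for name, pattern in RULES:
        m = re.compile(pattern).match(text, pos)
        # Strict > keeps the earlier rule when two matches have equal length.
        if m and (best is None or m.end() > best[2]):
            best = (name, m.group(), m.end())
    return best

def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        found = next_token(text, pos)
        if found is None:
            raise ValueError(f"no rule matches at position {pos}")
        name, lexeme, pos = found
        if name != "WS":            # skip whitespace
            tokens.append((name, lexeme))
    return tokens

print(tokenize("if iffy <= x"))
# [('IF', 'if'), ('ID', 'iffy'), ('LE', '<='), ('ID', 'x')]

# Lookahead analogue: match "if" only when "(" follows, without consuming it.
m = re.match(r"if(?=\()", "if(x)")
print(m.group(), m.end())           # if 2
```

Here "if" is matched by both IF and ID with the same length, so the first-listed IF wins, while "iffy" goes to ID because its match is longer.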

The heart of the transition from a Lex
input program to a lexical analyzer is the finite
automaton.
Finite automata are recognizers: they just say
yes or no for each input string.
Two types:
1. Nondeterministic (NFA)
2. Deterministic (DFA)

An NFA places no restrictions on the labels of the edges leaving the same
state.
It consists of:
A finite set of states S
An input alphabet Σ
A transition function
A start state
A set of final (accepting) states

State   a       b       ɛ
0       {0,1}   {0}     ∅
1       ∅       {2}     ∅
2       ∅       {3}     ∅
3       ∅       ∅       ∅

Example: the string aabb is accepted by the NFA for
(a|b)*abb.
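The transition table above can be run directly by tracking the set of states the NFA could be in after each input symbol. This sketch assumes state 0 is the start state and state 3 is the accepting state (the table itself does not mark them), and it skips the ɛ column because every entry there is empty:

```python
# Transition table for the NFA of (a|b)*abb; missing entries mean the empty set.
NFA = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START = 0           # assumed start state
ACCEPTING = {3}     # assumed accepting state

def nfa_accepts(word):
    """Simulate all possible paths at once by tracking a set of states."""
    current = {START}
    for ch in word:
        # Union of the moves available from every state we might be in.
        current = set().union(*(NFA[s].get(ch, set()) for s in current))
    return bool(current & ACCEPTING)

print(nfa_accepts("aabb"))   # True
print(nfa_accepts("abab"))   # False
```

For aabb the state sets are {0}, {0,1}, {0,1}, {0,2}, {0,3}, and the last one contains the accepting state.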

In a DFA, for each state and each input symbol there is exactly one edge
leaving that state, to the next state.
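Because a DFA has exactly one move per state and symbol, simulating it needs only a single current state. The table below is one possible DFA for (a|b)*abb; its states and transitions are an assumption made for this sketch and do not come from the slides:

```python
# One DFA for (a|b)*abb: every state has exactly one edge for a and one for b.
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
START = 0
ACCEPTING = {3}

def dfa_accepts(word):
    """Follow the single available edge for each input symbol."""
    state = START
    for ch in word:
        if ch not in DFA[state]:
            return False        # symbol outside the alphabet {a, b}
        state = DFA[state][ch]
    return state in ACCEPTING

print(dfa_accepts("aabb"))   # True
print(dfa_accepts("abba"))   # False
```

On aabb the DFA visits states 0, 1, 1, 2, 3 and stops in the accepting state.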
