Regular Expressions To Finite Automata

Mr.Keshav Tambre
Information Technology Department
International Institute of Information Technology, I²IT
www.isquareit.edu.in
Regular Expressions To Finite Automata

 Lexical analyzer recognizes tokens present in the source
program
 Compiler defines tokens using RE
 These indicate what characters may go into a lexeme
belonging to a particular token and how they go together
 Lexical analyzer is implemented as FSM
2
Regular Expressions

 Design of lexical analyzer involves:-
 Represent tokens by RE
 Synthesis of finite state machines that recognize desired
strings generated by RE
 Simulating the behavior of FSM which generates the lexical
analyzer
3
Regular Expressions

 RE are notations for specifying patterns
 Alphabet/Character class
 Denotes a finite set of symbols
 String
 Strings are defined over some alphabets, is a finite sequence
of symbols drawn from the alphabet
 Empty string denoted by €
 Language
 Set of strings over some alphabet
4
Specification of Tokens

 ∑={a,b}
 a|b denotes the set {a,b}
 RE (a|b) (a|b) denotes the set {aa,ab,ba,bb}
 (a|b)*
 a|a*b
 Identifiers in C
 letter|(letter | digit)*
5
Regular Expression

 Names given to RE
 If ∑ is an alphabet of basic symbols then RE is a sequence
of definitions of the form:
d1 → r1……….. dn → rn
where di is the distinct name and ri is the RE
 Short hand notations:
 *
 +
 ?
6
Regular Definitions

 Intermediate steps in the construction of the analyzer
 Depicts the action of the lexical analyzer when called
upon by the parser to get the next token
 Positions are represented as circles called as states
 States connected by arrows called edges
 Edges have labels indicating the input characters that
can appear next after the transition diagram has
reached states
 Assumption –TD are deterministic in nature
7
Transition diagrams

 When RE is compiled as a recognizer for a language by
constructing a generalized transition diagram called finite
automata
 Deterministic and Non deterministic
 NFA
 Mathematical model that consists of
 Set of states , s
 Set of input symbol ∑
 Transition function move that maps state symbol pairs to set of states
 A start state
 A final state
8
Finite automata

 Diagrammatically represented as a labeled directed graph
called transition graph
 Implementation-State Transition Table
 DFA
 Special case of NFA in which
 No state has € transition
 For each state s and input symbol a is at most one edge labeled a
leaving s
9
Finite automata

 RE are notations for specifying patterns
 Alphabet/Character class
 Denotes a finite set of symbols
 String
 Strings are defined over some alphabets, is a finite sequence
of symbols drawn from the alphabet
 Empty string denoted by €
 Language
 Set of strings over some alphabet
10
Specification of Tokens

 RE to DFA
 We augment the grammar the RE with #
 Construct nullable,firstpos ,lastpos and followpos which is done by
making traversal over the tree
 Construct DFA from followpos
 Firstpos: at what positions the string can start
 Lastpos:at what positions the string can end
 Follopos:what positions can follow position I in the syntax tree
 Nullable:whether an empty string can be generated at the node (sub
expression)
11
RE to DFA

Node n Nullable(n) Firstpos(n) Lastpos(n)
N is a leaf labeled έ true ø ø
if N is | node with left child
c1 and right child c2
Nullable(c1) or
Nullable (c2)
Firstpos (C1) U Firstpos(C2) Lastpos (C1) U Lasttpos(C2)
N is a leaf labeled with
position i
false { i } { i }
if N is . Node node with left
child c1 and right child c2
Nullable(c1) and
Nullable (c2)
If nullable (C1) then
Firstpos (C1) U Firstpos(C2)
else Firstpos(C1)
If nullable (C2) then
Lastpos (C1) U Lastpos(C2)
else Lastpos(C2)
if N is the node with only c1 as
child
true Firstpos (C1) Lastpos (C2)
12
RE to DFA

 Computation of Followpos : Followpos(i) tells us what
positions can follow position i in the syntax tree. This can be
computed as follows.
 if n is a ‘.’ Node, with a left child C1 and right child C2 and i
is a position in the Lastpos(C1), then all positions in
Firstpos(C2) are in Followpos(i)
 if n is a * Nodeand i is a position in the Lastpos(n), then all
positions in Firstpos(n) are Followpos(i)
13
RE to DFA

Algorithm for construction of DFA transition table
Initially , the only unmarked state in Dstates is firstpos(root), where root is the root of a
syntax tree.
While there is an unmarked state T in Dstates do
Begin
Mark T
For each input symbol a do
Begin
Let U be the set of positions that are in Followpos(P) for some P in T,
such that the symbol at position P is a.
If U is not empty and is not in Dstates then add U as an unmarked
state to Dstates
Dtran [T,a] = U
End
End
14
RE to DFA

Thank You
www.isquareit.edu.in
P-14, Rajiv Gandhi Infotech Park, MIDC Phase – 1,
Hinjawadi, Pune – 411057, India

Regular Expressions To Finite Automata

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Regular Expressions To Finite Automata

Similar to Regular Expressions To Finite Automata (20)

More from International Institute of Information Technology (I²IT)

More from International Institute of Information Technology (I²IT) (20)

Recently uploaded

Recently uploaded (20)

Regular Expressions To Finite Automata