2. Regular Expressions vs. Finite State Machine
A regular expression offers a declarative way to express the pattern of any
string we want to accept
E.g., 01* + 10*
Automata => more machine-like
< input: string, output: [accept/reject] >
Regular expressions => more program syntax-like
Unix environments heavily use regular expressions
E.g., bash shell, grep, vi & other editors, sed
Perl scripting – good for string processing
Lexical analyzers such as Lex or Flex
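A small sketch of how the textbook expression 01* + 10* can be tried out in a program. It assumes Python's standard `re` module, where union is written `|` (in most Unix tools `+` means "one or more", not union). The helper name `accepts` is only illustrative.

```python
import re

# Textbook 01* + 10* : union (+) becomes | in Python's re syntax.
pattern = re.compile(r"01*|10*")

def accepts(w: str) -> bool:
    """Accept w only if the whole string matches the pattern."""
    return pattern.fullmatch(w) is not None

for w in ["0", "0111", "1000", "0110", ""]:
    print(repr(w), "accept" if accepts(w) else "reject")
```

`fullmatch` is used so that the whole input must match, mirroring the accept/reject behaviour of an automaton.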
4. Definition of Regular Expression
A Regular Expression is recursively defined as
follows:
1. Φ is a regular expression denoting the empty language.
2. ε is a regular expression denoting the language {ε},
which contains only the empty string.
3. a is a regular expression denoting the language {a},
for each symbol a of the alphabet.
4. If R is a regular expression denoting the language LR
and S is a regular expression denoting the language LS, then:
• R+S is a regular expression corresponding to the
language LR ∪ LS.
• R.S is a regular expression corresponding to the
language LR.LS.
• R* is a regular expression corresponding to the
language LR*.
5. Any expression obtained by applying rules 1 to 4
is a regular expression.
5. Language Operators
Union of two languages:
L U M = all strings that are in L or in M (or in both)
Note: The union of two languages is itself a
language
Concatenation of two languages:
L . M = all strings of the form xy
s.t. x ∈ L and y ∈ M
The dot operator is usually omitted,
i.e., LM is the same as L.M
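For finite languages, the two operators can be sketched directly with Python sets. The function names `union` and `concat` and the sample languages are illustrative assumptions, not part of the slides.

```python
def union(L, M):
    """L U M: every string that is in L or in M."""
    return L | M

def concat(L, M):
    """L.M: every string xy with x in L and y in M."""
    return {x + y for x in L for y in M}

L = {"0", "01"}
M = {"1", "11"}

print(sorted(union(L, M)))
print(sorted(concat(L, M)))
```

Note that two different pairs (x, y) can produce the same string xy, so the concatenation may have fewer than |L|·|M| elements.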
6. Kleene Closure (the * operator)
Kleene Closure of a given language L:
L^0 = {ε}
L^1 = {w | w ∈ L}
L^2 = {w1w2 | w1 ∈ L, w2 ∈ L (duplicates allowed)}
L^i = {w1w2…wi | each wj ∈ L (duplicates allowed)}
(Note: the choice of each wj is independent)
L* = U(i≥0) L^i (arbitrary number of concatenations)
Example:
Let L = {1, 00}
L^0 = {ε}
L^1 = {1, 00}
L^2 = {11, 100, 001, 0000}
L^3 = {111, 1100, 1001, 10000, 000000, 00001, 00100, 0011}
L* = L^0 U L^1 U L^2 U …
“i” here refers to how many strings from the parent
language L are concatenated to produce the strings in the language L^i
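The powers of L in this example can be reproduced with a short sketch, assuming finite languages stored as Python sets and the empty string "" standing for ε. The names `power` and `star_up_to` are hypothetical; since L* is infinite, only a bounded approximation is computed.

```python
def concat(L, M):
    """All strings xy with x in L and y in M."""
    return {x + y for x in L for y in M}

def power(L, i):
    """L^i: concatenations of exactly i strings chosen from L."""
    result = {""}            # L^0 = {ε}
    for _ in range(i):
        result = concat(result, L)
    return result

def star_up_to(L, n):
    """Finite approximation of L*: the union of L^0 .. L^n."""
    out = set()
    for i in range(n + 1):
        out |= power(L, i)
    return out

L = {"1", "00"}
for i in range(4):
    print(i, sorted(power(L, i)))
```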
7. Kleene Closure (special notes)
L* is an infinite set iff |L| ≥ 1 and L ≠ {ε} (Why?)
If L = {ε}, then L* = {ε} (Why?)
If L = Φ, then L* = {ε} (Why?)
Σ* denotes the set of all words over an
alphabet Σ
Therefore, an abbreviated way of saying
there is an arbitrary language L over an
alphabet Σ is:
L ⊆ Σ*
8. Building Regular Expressions
Let E be a regular expression and let L(E) be the
language represented by E
Then:
(E) = E
L(E + F) = L(E) U L(F)
L(E F) = L(E) L(F)
L(E*) = (L(E))*
9. Regular Expression    Meaning
a*                  Strings consisting of any number of a’s (0 to many) => {ε, a, aa, aaa, aaaa, …}
a^+                 Strings consisting of at least one a (1 to many) => {a, aa, aaa, aaaa, …}
(a+b) or (a∪b)      Strings consisting of either one a or one b.
(a+b)*              Set of strings of a’s and b’s of any length, including the empty string ε.
(a+b)*abb           Set of strings of a’s and b’s ending with the string abb.
ab(a+b)*            Set of strings of a’s and b’s starting with the string ab.
(a+b)*aa(a+b)*      Set of strings of a’s and b’s having a substring aa.
a*.b*.c*            Set of strings consisting of any number of a’s followed by any number of b’s followed by any number of c’s.
a^+ b^+ c^+         Set of strings consisting of at least one a followed by at least one b followed by at least one c.
aa*bb*cc*           Set of strings consisting of at least one a followed by at least one b followed by at least one c.
(a+b)*(a+bb)        Set of strings of a’s and b’s ending with either a or bb.
(aa)*(bb)*b         Set of strings consisting of an even number of a’s followed by an odd number of b’s.
13. Star Closure =>
(a)* = {a^0, a^1, a^2, a^3, a^4, …, a^10, …}
= {ε, a, aa, aaa, aaaa, …, aaaaaaaaaa, …}
Positive Closure =>
(a)^+ = {a^1, a^2, a^3, a^4, …, a^10, …}
= {a, aa, aaa, aaaa, …, aaaaaaaaaa, …}
14. Some Examples of Regular Expression
1. (a+b) => {a, b}
2. (a+b)^2 => {aa, ab, ba, bb}. All strings of length at most 2:
(a+b)^0 U (a+b)^1 U (a+b)^2 = {ε, a, b, aa, ab, ba, bb}
3. (a+b)^3 => all strings of length exactly 3. All strings of length at most 3:
(a+b)^0 U (a+b)^1 U (a+b)^2 U (a+b)^3
= {ε, a, b, aa, ab, ba, bb, aaa, bbb, aab, abb, bba, baa, bab, aba}
4. (a+b)* => (a+b)^0 U (a+b)^1 U (a+b)^2 U …
= {ε, a, b, aa, ab, ba, bb, aaa, bbb, aab, abb, bba, baa, bab, aba, aaaa, bbbb, abab,
baba, abba, baab, ababa, babab, abbbb, …}
5. (a+b)^+ => (a+b)^1 U (a+b)^2 U (a+b)^3 U (a+b)^4 U …
= {a, b, aa, ab, ba, bb, …}
6. (a+b)*a => Strings of a’s and b’s ending with a.
7. ab(a+b)* => Strings of a’s and b’s starting with “ab”.
8. (a+bb) => Either one a or two b’s.
9. a*(a+bb) => Strings of zero or more a’s followed by either one
a or two b’s.
17. Regular Expressions practice examples
1. Obtain a regular expression representing strings of a’s
and b’s of length <= 2.
RE = (ε+a+b+aa+ab+ba+bb) OR (ε+a+b)^2
OR (ε+a+b)(ε+a+b)
2. Obtain a regular expression representing strings of a’s
and b’s of length <= 10.
RE = (ε+a+b)^10
3. Obtain a regular expression representing strings of a’s
and b’s having even length.
RE = (aa+ab+ba+bb)* OR ((a+b) (a+b))*
4. Obtain a regular expression representing strings of a’s
and b’s having odd length.
RE = ((a+b) (a+b))* (a+b) OR (a+b)((a+b) (a+b))*
18. Regular Expressions practice examples
5. Obtain a regular expression representing strings of a’s
and b’s starting with a and ending with b.
RE = a(a+b)*b
6. Obtain a regular expression representing strings of a’s
and b’s whose second symbol from right end is a.
RE = (a+b)*a(a+b)
7. Obtain a regular expression representing strings of a’s
and b’s whose tenth symbol from left end is a.
RE = (a+b)^9 a(a+b)*
8. Obtain a regular expression representing strings of a’s
and b’s whose length is either even or multiples of 3 or
both.
RE = ((a+b) (a+b))* + ((a+b) (a+b) (a+b))*
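Answers like those in examples 5 to 8 can be sanity-checked by brute force. The sketch below assumes Python's `re` module (union written `|`, a power such as the ninth written `{9}`) and compares each expression with a plain predicate on every string up to a chosen length. The names `all_strings`, `checks` and `mismatches` are illustrative. Agreement up to a length bound is evidence, not a proof.

```python
import re
from itertools import product

def all_strings(alphabet, max_len):
    """Yield every string over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

# description -> (Python pattern, predicate stating the intended language)
checks = {
    "starts with a and ends with b":
        (r"a(a|b)*b", lambda w: len(w) >= 2 and w[0] == "a" and w[-1] == "b"),
    "second symbol from the right is a":
        (r"(a|b)*a(a|b)", lambda w: len(w) >= 2 and w[-2] == "a"),
    "tenth symbol from the left is a":
        (r"(a|b){9}a(a|b)*", lambda w: len(w) >= 10 and w[9] == "a"),
    "length is even or a multiple of 3":
        (r"((a|b)(a|b))*|((a|b)(a|b)(a|b))*",
         lambda w: len(w) % 2 == 0 or len(w) % 3 == 0),
}

def mismatches(pattern, predicate, max_len=11):
    """Strings on which the pattern and the predicate disagree."""
    rx = re.compile(pattern)
    return [w for w in all_strings("ab", max_len)
            if (rx.fullmatch(w) is not None) != predicate(w)]

for name, (pattern, predicate) in checks.items():
    print(name, "->", mismatches(pattern, predicate) or "no mismatches")
```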
19. Regular Expressions practice examples
9. Obtain a regular expression for the language
L = {a^n b^m | m+n is even}.
Case 1: Even no. of a’s followed by even no. of b’s
RE = (aa)*(bb)*
Case 2: Odd no. of a’s followed by Odd no. of b’s
RE = a(aa)*b(bb)*
So, RE= (aa)*(bb)*+ a(aa)*b(bb)*
10. Obtain a regular expression for the language
L = {a^n b^m | n ≥ 4, m ≤ 3}.
RE = aaaa(a)*(ε+b) (ε+b) (ε+b)
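11. Obtain a regular expression for the language L = {w :
n_a(w) mod 3 = 0, where w ∈ {a,b}*}
RE = b*(ab*ab*ab*)*
(The leading b* is needed: (b*ab*ab*ab*)* alone does not generate
strings such as b, which contain zero a’s.)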
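A brute-force comparison is a quick way to test candidate answers for examples 9 to 11. The sketch assumes Python's `re` module, with (ε+b) written as `b?`. For example 11 it compares both (b*ab*ab*ab*)* and the variant b*(ab*ab*ab*)* against the predicate "the number of a's is a multiple of 3". All helper names are illustrative.

```python
import re
from itertools import product

def all_strings(alphabet, max_len):
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def mismatches(pattern, predicate, max_len=8):
    """Strings on which the pattern and the predicate disagree."""
    rx = re.compile(pattern)
    return [w for w in all_strings("ab", max_len)
            if (rx.fullmatch(w) is not None) != predicate(w)]

def counts(w):
    """Return (n, m) if w = a^n b^m, otherwise None."""
    n = len(w) - len(w.lstrip("a"))
    return (n, len(w) - n) if set(w[n:]) <= {"b"} else None

def ex9(w):    # a^n b^m with m+n even
    nm = counts(w)
    return nm is not None and sum(nm) % 2 == 0

def ex10(w):   # a^n b^m with n >= 4 and m <= 3
    nm = counts(w)
    return nm is not None and nm[0] >= 4 and nm[1] <= 3

def ex11(w):   # number of a's is a multiple of 3
    return w.count("a") % 3 == 0

ex9_re = r"(aa)*(bb)*|a(aa)*b(bb)*"
ex10_re = r"aaaaa*b?b?b?"
ex11_without_prefix = r"(b*ab*ab*ab*)*"
ex11_with_prefix = r"b*(ab*ab*ab*)*"

print(mismatches(ex9_re, ex9))
print(mismatches(ex10_re, ex10))
print(mismatches(ex11_without_prefix, ex11))
print(mismatches(ex11_with_prefix, ex11))
```

The only disagreements reported are for the expression without the leading b*, which misses the non-empty strings made of b's alone.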
20. Example: how to use these regular
expression properties and language
operators?
L = { w | w is a binary string which does not contain two consecutive 0s or
two consecutive 1s anywhere }
E.g., w = 01010101 is in L, while w = 10010 is not in L
Goal: Build a regular expression for L
Four cases for w:
Case A: w starts with 0 and |w| is even
Case B: w starts with 1 and |w| is even
Case C: w starts with 0 and |w| is odd
Case D: w starts with 1 and |w| is odd
Regular expression for the four cases:
Case A: (01)*
Case B: (10)*
Case C: 0(10)*
Case D: 1(01)*
Since L is the union of all 4 cases:
Reg Exp for L = (01)* + (10)* + 0(10)* + 1(01)*
If we introduce ε, then the regular expression can be simplified to:
Reg Exp for L = (ε+1)(01)*(ε+0)
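The four-case expression and the simplified form with ε written out, (ε+1)(01)*(ε+0), can be compared against the definition of L by brute force. This is a sketch assuming Python's `re` module, where (ε+x) is written `x?`. The function names are illustrative.

```python
import re
from itertools import product

four_cases = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")
simplified = re.compile(r"1?(01)*0?")   # (ε+1)(01)*(ε+0)

def in_L(w):
    """Definition of L: no two consecutive 0s and no two consecutive 1s."""
    return "00" not in w and "11" not in w

def binary_strings(max_len):
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def disagreements(max_len=10):
    """Strings on which the two expressions and the definition do not all agree."""
    return [w for w in binary_strings(max_len)
            if not (bool(four_cases.fullmatch(w))
                    == bool(simplified.fullmatch(w))
                    == in_L(w))]

print(disagreements())
```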
21. Finite State Machine (FSM) &
Regular Expressions (Reg Ex)
To show that they are interchangeable,
consider the following theorems:
Theorem 1: For every DFSM A there exists a
regular expression R such that L(R)=L(A)
Theorem 2: For every regular expression R there
exists an ε-NDFSM E such that L(E)=L(R)
(Proofs in the book)
[Figure: Kleene Theorem cycle: Reg Ex => ε-NDFSM (Theorem 2) => NDFSM => DFSM => Reg Ex (Theorem 1)]
22. To build a FSM from Regular
Expression
Theorem : Let R be a regular expression.
Then there exists a finite state machine
M=(K, Σ, δ, s, A) which accepts L(R).
Proof: By definition, Φ, ε and a are regular
expressions. So the corresponding machines are:
[Figure: machines accepting Φ, ε and a]
23. To build a FSM from Regular
Expression contd…
The schematic representation of a machine M that accepts the
language L(R) of a regular expression R is:
[Figure: machine M with start state q and final state f, accepting L(R)]
According to the definition of regular expression, if R1 and R2 are
two regular expressions, then:
Case 1: R1 + R2 is a regular expression
Case 2: R1. R2 is a regular expression
Case 3: R1 * is a regular expression
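24. NDFSM for Regular Expression:
Case 1: R=R1+R2. We can construct an NDFSM which accepts
either L(R1) or L(R2), i.e., L(R1+R2),
as shown below:
[Figure: a new start state q0 has ε-transitions to q1 (start of M1, accepting L(R1)) and to q2 (start of M2, accepting L(R2)); f1 and f2 have ε-transitions to a new final state qf]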
25. NDFSM for Regular Expression:
Case 2: R=R1.R2. We can construct an NDFSM which accepts
L(R1) followed by L(R2), i.e., L(R1.R2),
as shown below:
[Figure: M1 (q1 to f1, accepting L(R1)) followed by M2 (q2 to f2, accepting L(R2)), joined by ε-transitions]
26. NDFSM for Regular Expression:
Case 3: R=(R1)*. We can construct an NDFSM which accepts ε or
any number of strings from L(R1), i.e., L(R1)*, as
shown below:
[Figure: a new start state q0 and a new final state qf around M1 (q1 to f1, accepting L(R1)), connected by four ε-transitions]
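The three cases can be sketched in code. The block below is a minimal version of the usual Thompson-style construction, not necessarily the exact wiring of the figures: it assumes a single ε-transition from f1 to q2 for concatenation, and for the star it assumes ε-transitions q0→q1, f1→qf, q0→qf and f1→q1. The names `Frag`, `symbol`, `union`, `concat`, `star` and `accepts` are hypothetical.

```python
from itertools import count

EPS = None        # label used for ε-transitions
_ids = count()    # source of fresh state numbers

class Frag:
    """An ε-NDFSM fragment with one start state q and one final state f."""
    def __init__(self, q, f, edges):
        self.q, self.f, self.edges = q, f, edges   # edges: (src, label, dst)

def symbol(a):
    q, f = next(_ids), next(_ids)
    return Frag(q, f, [(q, a, f)])

def union(m1, m2):    # Case 1: R1 + R2
    q0, qf = next(_ids), next(_ids)
    extra = [(q0, EPS, m1.q), (q0, EPS, m2.q), (m1.f, EPS, qf), (m2.f, EPS, qf)]
    return Frag(q0, qf, m1.edges + m2.edges + extra)

def concat(m1, m2):   # Case 2: R1 . R2
    return Frag(m1.q, m2.f, m1.edges + m2.edges + [(m1.f, EPS, m2.q)])

def star(m1):         # Case 3: (R1)*
    q0, qf = next(_ids), next(_ids)
    extra = [(q0, EPS, m1.q), (m1.f, EPS, qf), (q0, EPS, qf), (m1.f, EPS, m1.q)]
    return Frag(q0, qf, m1.edges + extra)

def eps_closure(states, edges):
    """All states reachable from the given states using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(m, w):
    """Simulate the ε-NDFSM on w by tracking the set of possible states."""
    current = eps_closure({m.q}, m.edges)
    for ch in w:
        moved = {dst for src, label, dst in m.edges if src in current and label == ch}
        current = eps_closure(moved, m.edges)
    return m.f in current

def any_bits():
    """Fresh fragment for (0+1)*."""
    return star(union(symbol("0"), symbol("1")))

# (0+1)*01(0+1)* : binary strings containing the substring 01
m = concat(concat(concat(any_bits(), symbol("0")), symbol("1")), any_bits())
print(accepts(m, "1001"), accepts(m, "1100"))
```

Each call to `symbol` creates fresh states, so fragments never share states; that is why `any_bits()` is called twice rather than reusing one fragment.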
27. RE to ε-NDFSM construction
Example: Obtain an NDFSM for RE (0+1)*
Step 1: NDFSM for RE = 0 is
Step 2: NDFSM for RE = 1 is
Step 3: NDFSM for RE = (0+1) is
Step 4: NDFSM for RE = (0+1)* is
[Figure: the NDFSMs for steps 1 to 4, with transitions labeled 0 and 1]
28. RE to ε-NDFSM construction
Reg Ex => ε-NDFSM (Theorem 2)
Example: (0+1)*01(0+1)*
[Figure: ε-NDFSM built from three parts in sequence: (0+1)*, then 01, then (0+1)*]
29. RE to ε-NDFSM construction
Example: a*+b*+c*
Step 1: NDFSM for RE = a is
Step 2: NDFSM for RE = a* is
Step 3: NDFSM for RE = b* is
Step 4: NDFSM for RE = c* is
[Figure: the NDFSMs for a, a*, b* and c*]
30. RE to ε-NDFSM construction
Example: a*+b*+c*
Step 1: NDFSM for RE = a* is
Step 2: NDFSM for RE = b* is
Step 3: NDFSM for RE = c* is
Step 4: To combine them we need
one start state and
one final state.
Step 5: Finally, connect them together
with ε-transitions.
[Figure: the NDFSMs for a*, b* and c* placed in parallel between a common start state and a common final state, joined by ε-transitions]
31. RE to ε-NDFSM construction
Example: a*b*c*
Step 1: NDFSM for RE = a* is
Step 2: NDFSM for RE = b* is
Step 3: NDFSM for RE = c* is
Step 4: Finally, connect them together
with ε-transitions.
[Figure: the NDFSMs for a*, b* and c* connected in sequence by ε-transitions]
32. Construct NDFSM for (ab)*(ab)
Step 1: NDFSM for RE: a
Step 2: NDFSM for RE: b
Step 3: NDFSM for RE: ab
Step 4: NDFSM for RE: “ab” one or more times
Step 5: NDFSM for RE: (ab)*
Step 6: NDFSM for RE: ab
Step 7: Connect (ab)* and ab using an ε-transition
Step 8: Finally, mark the final state.
[Figure: the NDFSMs built in steps 1 to 8, with transitions labeled a, b and ε]
34. DFSM to RE construction
DFSM => Reg Ex (Theorem 1)
Example:
[Figure: DFSM with states q0, q1, q2: q0 loops on 1 and moves to q1 on 0; q1 loops on 0 and moves to q2 on 1; q2 loops on 0,1. The path is annotated (1*) 0 (0*) 1 (0 + 1)*]
Informally, trace all distinct paths (traversing cycles only once)
from the start state to each of the final states
and enumerate all the expressions along the way
1*00*1(0+1)*
(parts: 1*, 00*, 1, (0+1)*)
Q) What is the language?
35. To Obtain RE from FSM
Method 1
fsmtoregexheuristic(M: FSM)
36. fsmtoregexheuristic(M: FSM) =
1. Remove from M any states that are unreachable from the start
state.
2. If M has no accepting states then halt and return the simple regular
expression Φ.
3. If the start state of M is part of a loop (i.e., it has any transitions
coming into it), create a new start state s and connect s to M's start
state via an ε-transition. This new start state s will have no transitions
into it.
4. If there is more than one accepting state of M or if there is just one
but there are any transitions out of it, create a new accepting state
and connect each of M's accepting states to it via an ε-transition.
Remove the old accepting states from the set of accepting states.
Note that the new accepting state will have no transitions out from it.
37. fsmtoregexheuristic contd…
5. If at this point, M has only one state, then that state is both the start
state and the accepting state and M has no transitions. So L (M) = {ε}.
Halt and return the simple regular expression ε.
6. Until only the start state and the accepting state remain do:
6.1. Select some state rip of M. Any state except the start state or the
accepting state may be chosen.
6.2. Remove rip from M.
6.3. Modify the transitions among the remaining states so that M
accepts the same strings. The labels on the rewritten transitions
may be any regular expression.
7. Return the regular expression that labels the one remaining transition
from the start state to the accepting state.
38. Ex. 1 Obtain a regular expression for the given FSM
[Figure: FSM with states 1, 2, 3 and transitions labeled a, b, b, a, a]
Step 1: Insert a separate start state, state 0, as there are incoming transitions on state 1.
[Figure: state 0 connected to state 1 by an ε-transition]
Step 2: Insert a separate final state, state 4, as there are multiple final states (state 1
and state 2) and they have outgoing transitions.
[Figure: states 1 and 2 connected to the new final state 4 by ε-transitions]
39. Ex. 1 Obtain a regular expression for the given FSM. Contd…
Step 3: State 3 can be removed by inserting a transition from state 2 to state 1 with label
aa*b:
[Figure: states 0, 1, 2, 4 before and after removing state 3; the new arc from state 2 to state 1 is labeled aa*b]
40. Ex. 1 Obtain a regular expression for the given FSM. Contd…
Step 4: The two arcs from state 2 to state 1 (b and aa*b) can be written as one arc:
b+(aa*b)
[Figure: states 0, 1, 2, 4 with the single arc from state 2 to state 1 labeled b+(aa*b)]
41. Ex. 1 Obtain a regular expression for the given FSM. Contd…
Step 5: State 2 can be removed by adding a self loop at state 1 with regular expression
a(b+(aa*b)), which can also be written as ab+aaa*b:
[Figure: states 0, 1, 4; state 1 has a self loop labeled ab+aaa*b and arcs to state 4 labeled a and ε]
42. Ex. 1 Obtain a regular expression for the given FSM. Contd…
Step 6: The two arcs from state 1 to state 4 (a and ε) can be written as one arc: a+ε
[Figure: states 0, 1, 4; state 1 has a self loop labeled ab+aaa*b and one arc to state 4 labeled a+ε]
Step 7: State 1 can be removed by having the pattern a(b+aa*b) repeated zero or more times
followed by an optional a:
[Figure: state 0 connected to state 4 by a single arc labeled (ab+aaa*b)*(a+ε)]
So, the final regular expression is: (ab+aaa*b)*(a+ε)
It can also be written as: (ab U aaa*b)*(a U ε)
43. Ex. 2 Obtain a regular expression for the given FSM
[Figure: FSM with states 1, 2, 3, 4 and transitions labeled a, b, b, b, a, reduced to arcs labeled (a+bb) and b*a]
Final Regular Expression is:
(a+bb)b*a
45. To Obtain RE from FSM
Method 2
Step 1: standardize(M: FSM) =
Step 2: buildregex(M: FSM) =
46. Step 1: standardize(M: FSM) =
1. Remove from M any states that are unreachable from the start state.
2. If the start state of M is part of a loop (i.e., it has any transitions coming into it),
create a new start state s and connect s to M’s start state via an ε-transition.
3. If there is more than one accepting state of M or if there is just one but there are
any transitions out of it, create a new accepting state and connect each of M's
accepting states to it via an ε-transition. Remove the old accepting states from the
set of accepting states.
4. If there is more than one transition between states p and q, collapse them into a
single transition. If there is a pair of states p, q and there is no transition between
them and p is not the accepting state and q is not the start state, then create a
transition from p to q labeled Φ.
47. Step 2: buildregex(M: FSM) =
1. If M has no accepting states, then halt and return the simple regular expression
Φ.
2. If M has only one state, then halt and return the simple regular expression ε.
3. Until only the start state and the accepting state remain do:
Select some state rip of M. Any state except the start state or the accepting
state may be chosen.
For every transition from some state p to some state q, if both p and q are
not rip then, using the current labels given by the expressions R, compute
the new label R ' for the transition from p to q using the formula:
R '(p, q) = R(p, q) U R(p, rip)R(rip, rip)*R(rip, q).
Remove rip and all transitions into and out of it.
4. Return the regular expression that labels the one remaining transition from the
start state to the accepting state.
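The rip-out loop can be sketched as follows, assuming the machine has already been standardized. Labels are kept as textbook-style strings, Φ is represented by `None`, and the helpers `alt`, `cat` and `star` apply the identities Φ U R = R, ΦR = Φ, εR = R and Φ* = ε. The sample machine is the standardized FSM of Ex. 1 as reconstructed from its step descriptions (an assumption, since the figure is not available), and the check against Python's `re` module only covers strings up to a length bound.

```python
import re
from itertools import product

PHI = None   # Φ: the empty language (no transition)
EPS = "ε"    # ε: the empty string

def alt(r, s):
    """R U S, using Φ U R = R."""
    if r is PHI:
        return s
    if s is PHI:
        return r
    return r if r == s else f"({r}+{s})"

def cat(r, s):
    """RS, using ΦR = RΦ = Φ and εR = Rε = R."""
    if r is PHI or s is PHI:
        return PHI
    if r == EPS:
        return s
    if s == EPS:
        return r
    return r + s

def star(r):
    """R*, using Φ* = ε* = ε."""
    return EPS if r is PHI or r == EPS else f"({r})*"

def buildregex(states, start, accept, labels):
    """Rip out every state except start and accept; return the last label."""
    R = dict(labels)
    remaining = list(states)
    for rip in [s for s in states if s not in (start, accept)]:
        remaining.remove(rip)
        loop = star(R.get((rip, rip), PHI))
        # R'(p, q) = R(p, q) U R(p, rip) R(rip, rip)* R(rip, q)
        R = {(p, q): alt(R.get((p, q), PHI),
                         cat(cat(R.get((p, rip), PHI), loop), R.get((rip, q), PHI)))
             for p in remaining for q in remaining}
    return R.get((start, accept), PHI)

# Standardized Ex. 1 machine (reconstructed): new start 0, new accepting state 4.
labels = {(0, 1): EPS, (1, 2): "a", (2, 1): "b", (2, 3): "a",
          (3, 3): "a", (3, 1): "b", (1, 4): EPS, (2, 4): EPS}
regex = buildregex([0, 1, 2, 3, 4], 0, 4, labels)

def machine_accepts(w):
    """Direct simulation of the original FSM: start 1, accepting states 1 and 2."""
    delta = {(1, "a"): 2, (2, "b"): 1, (2, "a"): 3, (3, "a"): 3, (3, "b"): 1}
    state = 1
    for ch in w:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in (1, 2)

def agrees(textbook_regex, max_len=8):
    """Compare a textbook-style regex with the machine on all short strings."""
    rx = re.compile(textbook_regex.replace("+", "|").replace(EPS, ""))
    words = ("".join(t) for n in range(max_len + 1) for t in product("ab", repeat=n))
    return all((rx.fullmatch(w) is not None) == machine_accepts(w) for w in words)

print(regex)
print(agrees(regex), agrees("(ab+aaa*b)*(a+ε)"))
```

The expression printed by this sketch is not simplified, so it looks different from the hand-derived (ab+aaa*b)*(a+ε), but both agree with the machine on every string checked.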
48. Ex. 2 Obtain a regular expression for the given FSM
[Figure: FSM with states 1, 2, 3, 4 and transitions labeled a, b, b, b, a]
Step 1: Standardized FSM: Adding
all the required transitions.
[Figure: the same FSM with the missing transitions added, each labeled Φ]
49. Ex. 2 Obtain a regular expression for the given FSM
Step 2: Ripping states out one at a time.
[Figure: the standardized FSM with states 1, 2, 3, 4 before any state is ripped]
Let rip be state 4. Then:
R’(1,3) = R(1,3) U R(1,rip) R(rip,rip)* R(rip,3)
= R(1,3) U R(1,4) R(4,4)* R(4,3)
= Φ U (b.Φ*.Φ) = Φ
Let rip be state 2. Then:
R’(1,3) = R(1,3) U R(1,rip) R(rip,rip)* R(rip,3)
= R(1,3) U R(1,2) R(2,2)* R(2,3)
= Φ U (ab*a) = ab*a
50. Ex. 2 Obtain a regular expression for the given FSM
Step 2: Ripping states out one at a time.
[Figure: the FSM after ripping state 4; the computations of R’(1,3) are repeated from the previous slide]
51. Ex. 2 Obtain a regular expression for the given FSM
Step 2: Ripping states out one at a time.
After ripping state 2, we get R’(1,3) = ab*a
[Figure: state 1 connected to state 3 by a single arc labeled ab*a]
52. R’(0,2) = R(0,2) U R(0,1)R(1,1)*R(1,2)
[Figure: states 0, 1, 2 with arcs labeled a, Φ and ε]
Union with Φ: R(0,2) = a+Φ = Φ+a = a
Concatenation with Φ: R(0,1).R(1,2) = a.Φ = Φ.a = Φ
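The simplifications involving Φ rest on three standard identities, which hold for any regular expression R. They are collected below with the state-4 rip from Ex. 2 as a worked instance (taking R(1,4) = b and R(4,4) = R(4,3) = Φ, as in that example).

```latex
\begin{align*}
R \cup \Phi &= \Phi \cup R = R \\
R\,\Phi &= \Phi\,R = \Phi \\
\Phi^{*} &= \varepsilon \\[4pt]
R'(1,3) &= R(1,3) \cup R(1,4)\,R(4,4)^{*}\,R(4,3) \\
        &= \Phi \cup b\,\Phi^{*}\,\Phi
         = \Phi \cup b\,\varepsilon\,\Phi
         = \Phi \cup \Phi
         = \Phi
\end{align*}
```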