1. Complexity and Computability Regular Expressions
Regular Expressions
Regular expressions (REs) denote the structure of data, especially text
strings
They describe exactly the languages that are accepted by finite automata
They define all and only the regular languages, giving an algebraic
description of languages in comparison to the machine-like descriptions
They give a declarative way to express strings, and therefore serve as the
input language for many string-processing systems
Safari - Yonasi (Makerere University) CSC 2210 2012/2013 1 / 29
2. Complexity and Computability Regular Expressions
Constructing a RE
General rule: To construct a RE for the language consisting of only the
string w , use w itself as a RE
Example: Write a regular expression for the set of strings consisting of
alternating 0’s and 1’s.
First develop a regular expression for the language consisting of the single
string 01, then use the star operator to get an expression for all strings of
the form 0101...01
0 and 1 are regular expressions for the languages {0} and {1}
Concatenating the expressions gives a regular expression for the
language {01}, RE = 01
The RE 01 will be used for the construction
3. Complexity and Computability Regular Expressions
Constructing a RE
Therefore, strings consisting of zero or more occurrences of 01 will be
(01)∗. Note that (01)∗ is not the same as 01∗
This gives the language L((01)∗) which does not entirely satisfy what
we want, because it only considers strings beginning with 0 and
ending with 1.
Consider also the possibility of having a 1 at the beginning and/or a 0 at
the end
4. Complexity and Computability Regular Expressions
Constructing a RE
For the three other possibilities, construct a RE for each,
0(10)∗ for strings that begin and end with 0,
1(01)∗ for those that begin and end with 1,
(10)∗ for those that begin with 1 and end with 0
This gives:
T (G ) = (01)∗ + (10)∗ + 0(10)∗ + 1(01)∗
Note: Union operator gives collective set of all possibilities of strings with
alternating 0s, 1s (it combines the whole set of possible strings).
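The construction above can be checked mechanically. The following is a minimal sketch, assuming Python's re module as the matcher (the union operator + is written | in that syntax). The names ALTERNATING, is_alternating and matches are chosen for this illustration only. The loop compares the RE with the direct definition of "alternating" on every string of length at most 8.

```python
import re
from itertools import product

# The slide's expression, with + written as | in Python's regex syntax.
ALTERNATING = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")

def is_alternating(s):
    """Direct definition: no two adjacent symbols are equal."""
    return all(a != b for a, b in zip(s, s[1:]))

def matches(s):
    """True when the whole string s belongs to the language of the RE."""
    return ALTERNATING.fullmatch(s) is not None

# Compare the RE against the direct definition on every string up to length 8.
for n in range(9):
    for chars in product("01", repeat=n):
        s = "".join(chars)
        assert matches(s) == is_alternating(s), s
```

A bounded check like this cannot prove the RE correct for all lengths, but it catches a missing case (for example, leaving out 0(10)∗) immediately.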
5. Complexity and Computability Regular Expressions
Regular Sets
Definition
Let Σ be an alphabet. A regular set over Σ is defined as:
Basis:
The constant ∅ is a regular expression denoting the empty language,
L(∅) = ∅, which is a regular set over Σ
The constant ϵ is a regular expression denoting the language containing
only the empty word, L(ϵ) = {ϵ}, which is a regular set over Σ
For every a ∈ Σ, the symbol a is a regular expression denoting the
language L(a) = {a}, which is a regular set over Σ
The variable L represents any language
6. Complexity and Computability Regular Expressions
Regular Sets
Induction:
If P and Q are regular sets/regular expressions over Σ, then:
P + Q denotes a regular expression of the union of L(P) and L(Q),
that is, L(P + Q) = L(P) ∪ L(Q)
PQ is a regular expression denoting concatenation of the languages,
L(PQ) = L(P)L(Q)
P∗ is a RE denoting closure of L(P), L(P∗) = (L(P))∗
(P) is a regular expression denoting the same language as P, L((P))
= L(P)
Nothing else is a regular set
7. Complexity and Computability Regular Expressions
Operations on Regular Sets
Therefore, a subset of Σ∗ is regular iff it is one of the basis sets above,
or can be obtained from them by a finite number of applications of the
operations of union, concatenation and closure.
Example: The RE 01∗ + 10∗ represents the language having strings that
are either a single zero followed by any number of 1’s or a single 1 followed
by any number of 0’s
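The inductive definition translates directly into a recursive evaluator. The sketch below is an illustration only: the nested-tuple encoding of an RE and the names language and EXAMPLE are assumptions made here, not part of the course material. Because L(P∗) is usually infinite, the function returns only the strings of length at most max_len.

```python
def language(expr, max_len):
    """Return all strings of length <= max_len in L(expr).

    expr is a nested tuple (a hypothetical encoding chosen for this sketch):
      ("empty",)        the constant ∅
      ("eps",)          the constant ϵ
      ("sym", a)        a single symbol a
      ("union", P, Q)   P + Q
      ("concat", P, Q)  PQ
      ("star", P)       P*
    """
    kind = expr[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {expr[1]} if len(expr[1]) <= max_len else set()
    if kind == "union":
        return language(expr[1], max_len) | language(expr[2], max_len)
    if kind == "concat":
        left = language(expr[1], max_len)
        right = language(expr[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":
        base = language(expr[1], max_len)
        result = {""}
        frontier = {""}
        # Keep appending strings of the base language until nothing new fits.
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")

# The RE 01* + 10* from the example above.
EXAMPLE = ("union",
           ("concat", ("sym", "0"), ("star", ("sym", "1"))),
           ("concat", ("sym", "1"), ("star", ("sym", "0"))))

print(sorted(language(EXAMPLE, 3)))
```

Each branch of the function corresponds to one clause of the definition: three basis cases and one case each for union, concatenation and closure.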
8. Complexity and Computability Regular Expressions
Operator Precedence
Definition
The order of precedence for operators in decreasing order is:
∗ - closure/star operator, applies to the smallest well-formed RE to its left
• - concatenation operator, consecutive concatenations are grouped from the
left
Note: RS ≠ SR in general (concatenation is order sensitive)
+ - union operator
( ) - parentheses can be used to group operands, and help to override the
precedence order.
9. Complexity and Computability Regular Expressions
Example
Consider the mathematical expression xy + z or x − y − z. How
would you group the operands?
Group the RE 01∗ + 1:
(1∗), then (0(1∗)), then (0(1∗)) + 1, the language consisting of the
string 1 and all strings consisting of a 0 followed by any number of 1’s
(which may also be none)
If grouped as (01)∗ + 1 (concatenation before the star), it represents the
language having the string 1 and the strings that repeat 01 zero or more
times
If grouped as 0(1∗ + 1), (the union first), it represents the language of
strings beginning with 0, followed by any number of 1’s.
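The three groupings can be compared by listing the short strings each one accepts. This is a minimal sketch, assuming Python's re module, whose operators follow the same precedence order (star, then concatenation, then union written as |). The helper name strings_matching is an assumption made for this illustration.

```python
import re
from itertools import product

def strings_matching(pattern, max_len=3):
    """List every string over {0, 1} of length <= max_len in the language."""
    compiled = re.compile(pattern)
    found = []
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            if compiled.fullmatch(s):
                found.append(s)
    return found

default_grouping = strings_matching(r"01*|1")   # (0(1*)) + 1
star_last = strings_matching(r"(01)*|1")        # (01)* + 1
union_first = strings_matching(r"0(1*|1)")      # 0(1* + 1)

print(default_grouping)
print(star_last)
print(union_first)
```

The three lists differ, which shows that the parentheses change the language and are not just decoration.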
10. Complexity and Computability Regular Expressions
Applications
Regular expressions are used in important applications such as:
Text search, which is accomplished by converting the RE into a DFA
or NFA,
Building compilers, by describing software components such as the
lexical analyzer
11. Complexity and Computability Regular Expressions
Properties of Operations on REs
For the regular sets R, S and T over Σ,
R + R = R - the idempotent law for union
R + ∅ = R - the identity for union
R + S = S + R - the commutative law for union
(R + S) + T = R + (S + T) - the associative law for union
(RS)T = R(ST) = RST - the associative law for concatenation
Note that there is no commutative law for concatenation
Rϵ = ϵR = R - the identity for concatenation
(R + S)T = RT + ST - the right distributive law of concatenation over
union
T(R + S) = TR + TS - the left distributive law of concatenation over
union
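The laws can be spot-checked on small finite languages represented as Python sets. This is only a sketch: the sample languages R, S and T and the helper concat are assumptions chosen for illustration, and passing on one sample does not prove a law in general.

```python
def concat(A, B):
    """Concatenation of two finite languages."""
    return {x + y for x in A for y in B}

# Sample languages (hypothetical, chosen for this illustration).
R = {"0", "01"}
S = {"1", ""}
T = {"10", "1"}
EMPTY = set()   # the empty language ∅
EPS = {""}      # the language {ϵ}

assert R | R == R                                        # idempotent law
assert R | EMPTY == R                                    # identity for union
assert R | S == S | R                                    # commutative law
assert (R | S) | T == R | (S | T)                        # associative, union
assert concat(concat(R, S), T) == concat(R, concat(S, T))  # associative, concat
assert concat(R, EPS) == concat(EPS, R) == R             # identity for concat
assert concat(R | S, T) == concat(R, T) | concat(S, T)   # right distributive
assert concat(T, R | S) == concat(T, R) | concat(T, S)   # left distributive

# Concatenation is not commutative: these two sets differ.
print(sorted(concat(R, S)), sorted(concat(S, R)))
```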
12. Simplifications for the Operators
For a RE converted to an ϵ−NFA, some simplifications exist for the
operator constructs:
For the union operator, instead of creating new start and accepting
states, merge two start states into one with all the transitions of both
start states. Similarly, merge the two accepting states
For the concatenation operator, merge the accepting state of the first
automaton with the start of the second
For the closure, add ϵ− transitions from the accepting state to the
start state and vice-versa
13. Transition Graphs from REs
Basis:
Start by constructing transition diagrams for the smaller automata of the
basic expressions,
ϵ,
∅, and
the RE a
Induction:
Then combine these automata inductively to form larger automata
that accept the
union,
concatenation and
closure
of the languages accepted by the smaller automata
14. Complexity and Computability Regular Expressions
Testing RE Properties
To test whether a law R = S holds, where R and S are REs with the same set
of variables:
Convert R and S to concrete REs C and D, respectively, by replacing
each variable by a concrete symbol
Test whether L(C) = L(D)
If so, R = S is a true law; otherwise it is false
Note: If the languages are not the same, it is sufficient to provide a single
string that is in one language but not the other
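The search for such a distinguishing string can be automated for short strings. The sketch below assumes Python's re module as the matcher and a hypothetical helper named find_counterexample. A bounded search can refute a law by producing a string, but finding none up to max_len only suggests, and does not prove, that the law holds.

```python
import re
from itertools import product

def find_counterexample(re_c, re_d, alphabet="ab", max_len=6):
    """Return the first short string in exactly one language, or None."""
    c, d = re.compile(re_c), re.compile(re_d)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            in_c = c.fullmatch(s) is not None
            in_d = d.fullmatch(s) is not None
            if in_c != in_d:
                return s
    return None

# Candidate law (R + S)* = (R*S*)*, made concrete with R = a, S = b.
print(find_counterexample(r"(a|b)*", r"(a*b*)*"))

# Candidate law (R + S)* = R* + S*, made concrete with R = a, S = b.
print(find_counterexample(r"(a|b)*", r"a*|b*"))
```

The first call finds no difference up to length 6, while the second returns a string that mixes a and b, which is in L((a + b)∗) but not in L(a∗ + b∗).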
15. Complexity and Computability Regular Expressions
Operations on REs
Given two sets of words R and S (languages) from Σ∗,
The union of R and S denoted R ∪ S is the set of words that are
either in R or S or both:
R + S = {x : x ∈ R or x ∈ S} - UNION SET
The concatenation of languages R and S is the set of strings formed
by taking any string in R and concatenating it with any string in S :
R • S = {xy : x ∈ R,y ∈ S} - CONCATENATION SET
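For finite languages, both definitions are one-line set operations. This is a minimal sketch in Python, where the sample sets R and S and the function names are assumptions chosen for illustration and the empty word ϵ is written as the empty string.

```python
def union(R, S):
    """R + S = {x : x in R or x in S}."""
    return R | S

def concatenation(R, S):
    """R • S = {xy : x in R, y in S}."""
    return {x + y for x in R for y in S}

R = {"0", "01"}
S = {"", "1"}      # "" plays the role of ϵ

print(sorted(union(R, S)))
print(sorted(concatenation(R, S)))
```

Here the concatenation has three members rather than four, because 0 followed by 1 and 01 followed by ϵ give the same string 01.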
16. Complexity and Computability Regular Expressions
Operations on REs
The closure (star or Kleene closure) of a language R, denoted R∗, is the
set of words that can be formed by taking any number of strings
from R (the same string may be repeated) and concatenating them
R∗ = {ϵ} ∪ {x : x is obtained by concatenating a finite number of
words of R}
= R^0 ∪ R^1 ∪ R^2 ∪ ... ∪ R^i ∪ ..., where R^0 = {ϵ} is the zeroth power of R
1 R^i = (...(RR)R...)R, the concatenation of i copies of R
2 Note that if R = {0, 1}, then R^i has 2^i members
3 R∗ = ∪ R^i, taken over all i ≥ 0
17. Complexity and Computability Regular Expressions
Operations on REs
For the empty language ∅,
∅^0 = {ϵ} and
∅^i is empty for all i ≥ 1 (no strings can be selected from an empty set)
∅∗ = {ϵ}
If R is finite, every R^i is a finite set, but R∗ is usually an infinite set
Exceptions
1 The closure of the empty language, ∅∗ = {ϵ}, is not an infinite set
(likewise {ϵ}∗ = {ϵ})
2 If R is the set of all strings of 0's, then R^0 = {ϵ}, R^1 = R, R^2 = R,
so R∗ = R
In this case R is already infinite, and the closure adds no new strings
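The powers of a language and a finite approximation of its closure can be computed directly for finite R. This is a sketch under stated assumptions: the names power and closure_up_to are hypothetical, ϵ is written as the empty string, and closure_up_to stops at a chosen bound k because the full closure is usually infinite.

```python
def concatenation(R, S):
    return {x + y for x in R for y in S}

def power(R, i):
    """The i-th power of R: concatenations of i strings from R; power 0 is {ϵ}."""
    result = {""}
    for _ in range(i):
        result = concatenation(result, R)
    return result

def closure_up_to(R, k):
    """Union of the powers 0..k of R, a finite approximation of R*."""
    result = set()
    for i in range(k + 1):
        result |= power(R, i)
    return result

R = {"0", "1"}
print(sorted(power(R, 2)))          # the four strings of length 2
print(len(power(R, 3)))             # 2**3 strings of length 3

print(power(set(), 0))              # zeroth power of ∅ is {ϵ}
print(power(set(), 1))              # higher powers of ∅ are empty
print(closure_up_to(set(), 5))      # so ∅* stays {ϵ}
```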
18. Union Operator
Consider the RE given by R + S
A new start state and a new accepting state are added to the automata
for R and S
Starting at the new start state, the automaton can follow an ϵ arc to the
start state of either R or S
The accepting state of that automaton is reached by following a
path labelled by some string in L(R) or L(S)
Then one of the ϵ arcs is followed to the accepting state of the new
automaton, giving the language of the automaton as L(R) ∪ L(S)
19. Concatenation
For RS,
The start state of the first automaton becomes the start state of the
whole automaton,
The accepting state of the second automaton becomes the accepting
state of the whole automaton
Paths for acceptance go through R (labelled by a string in L(R)), then
through S (labelled by a string in L(S)), giving the language L(R)L(S)
20. Closure
When drawing transition graphs for regular expressions of the form R∗:
Draw the graph for R,
To include R∗, add two nodes to represent the start and final nodes,
let the other nodes be intermediate nodes. Join the new nodes to the
graph of R with arcs labeled ϵ
Add an arc from the start state of the new graph to the final node,
labeled ϵ, irrespective of what R may be (remember that for R∗, ϵ
must be an accepted string)
Add another arc, labeled ϵ, from the final state of R back to the initial
node of R. This allows a path to pass through the automaton for R
one or more times before reaching the accepting state.
21. Closure
Note:
There is a path from the new start state directly to the new accepting
state, along the arc labelled ϵ
Also, there is a path from the new start state to the start state of R,
through the automaton for R one or more times, to the accepting state
22. (R)
The expression (R) is the same as R, since parentheses do not change
the language defined by the expression. The representation for (R) is
therefore the same as that for R.
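The four cases above (union, concatenation, closure and parentheses) can be sketched as code that builds an ϵ−NFA and then simulates it. This is an illustrative sketch, not the course's own implementation: the class Fragment, the helper names and the edge-list representation are assumptions, and an edge label of None stands for ϵ. Parentheses need no function of their own, since (R) uses the automaton for R unchanged. Each call to symbol creates fresh states, so a sub-automaton should be built once for every place it is used.

```python
from itertools import count

_fresh = count()          # source of unused state numbers

class Fragment:
    """An ϵ-NFA with one start state and one accepting state."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def symbol(a):
    s, f = next(_fresh), next(_fresh)
    return Fragment(s, f, [(s, a, f)])

def union(r, s):
    start, accept = next(_fresh), next(_fresh)
    eps = [(start, None, r.start), (start, None, s.start),
           (r.accept, None, accept), (s.accept, None, accept)]
    return Fragment(start, accept, r.edges + s.edges + eps)

def concatenation(r, s):
    # Join the accepting state of r to the start state of s with an ϵ arc.
    return Fragment(r.start, s.accept,
                    r.edges + s.edges + [(r.accept, None, s.start)])

def closure(r):
    start, accept = next(_fresh), next(_fresh)
    eps = [(start, None, r.start), (r.accept, None, accept),
           (start, None, accept),        # ϵ must be accepted
           (r.accept, None, r.start)]    # go through r again
    return Fragment(start, accept, r.edges + eps)

def eps_closure(states, edges):
    """All states reachable from the given states using only ϵ arcs."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(f, word):
    current = eps_closure({f.start}, f.edges)
    for ch in word:
        moved = {dst for src, label, dst in f.edges
                 if src in current and label == ch}
        current = eps_closure(moved, f.edges)
    return f.accept in current

# Sample input: the RE 0 + 11*
g = union(symbol("0"), concatenation(symbol("1"), closure(symbol("1"))))
print([w for w in ["", "0", "1", "111", "01"] if accepts(g, w)])
```

The sketch keeps every ϵ arc rather than merging states, which gives more states than necessary but mirrors the constructions step by step.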
23. Construction of Transition Graphs from REs
Just as was done for the finite state system, transition graphs can
also be constructed for REs
This is achieved by drawing the equivalent ϵ−NFAs for the RE
The ϵ−NFAs have a single accepting state
Generally, there are no arcs into the initial state or out of the
accepting state
24. Example 1
Consider the RE represented by: T (G ) = 0 + 11∗,
To draw a transition graph (ϵ−NFA) for this RE:
Consider the sub-expressions in the RE, and for each, assume that the
languages of the sub-expression are also those of ϵ−NFAs with one
accepting state:
Let them be T (G1) = 0 and T (G2) = 11∗
Construct transition graphs for each of the sub-expressions
25. Example 1
For T (G1):
For T (G2):
Combine the two transition graphs using the union operator to obtain
the transition graph for T (G )
26. Example 1
Introduce nodes to combine the two transition graphs and connect
the nodes with ϵ, the empty string transition
Note that the ϵ− transitions disappear in concatenation, so in effect,
do not affect the graph
27. Example 1
Let one of the nodes introduced be an initial node and the other a
final node
This is a viable transition graph for T (G )
28. Example 1 cont’d
However, it is inefficient to have so many ϵ’s
Reducing the ϵ’s, (bearing in mind that ϵ’s disappear in
concatenation), gives the following as the resulting transition graph
for T (G )
Note that for the union operator, both strings, 0 and 11∗ are accepted
by the transition graph and their transition graphs are combined.
29. Example 2
To concatenate T (G1) = 0 and T (G2) = 11∗ for the expression
T (G ) = 011∗,
For concatenation, the final state of the transition graph of one
sub-expression is joined to the initial state of the other
For the example in consideration, the graph of T (G1) is joined to that
of T (G2) giving:
The extra transition representing the combination of the two graphs is
labeled with ϵ
30. Example 2
Reducing the graph gives:
as the resulting graph for the RE of T (G )
31. Example 3
For the closure operator, consider the RE represented by T (G ) = (01∗)∗
Draw T (G1) = 01∗
Then construct T (G ) = (01∗)∗
32. Characteristics of the ϵ− NFA
They have exactly one accepting state
There are no arcs into the initial state
There are no arcs out of the accepting state
33. Exercise
Draw transition graphs to represent the following strings/expressions:
1 01∗
2 (0 + 1)01
3 (10 + 0∗1)∗1
34. Complexity and Computability Regular Expressions
Example
Let Σ = {a, b}, R = {aa, ab} = {a^2, ab} and S = {aba, ab, ba}, then:
R + S = {aa, ab, aba, ba}
RS = {aa, ab} • {aba, ab, ba} =
{aaaba, aaab, aaba, ababa, abab, abba}
R∗ = {ϵ} ∪ {aa, ab} ∪ {aaaa, aaab, abaa, abab} ∪ ...
= {ϵ, aa, ab, aaaa, aaab, abaa, abab, ...}
= L((aa + ab)∗), every string formed by concatenating zero or more copies
of aa and ab in any order
35. Complexity and Computability Regular Expressions
Exercise
Given Σ = {0, 1}, L = {001, 10, 111} and M = {ϵ, 001}, find:
1 L ∪ M
2 L • M
3 L∗
4 M∗
36. Complexity and Computability Regular Expressions
Example 1
Consider the RE 0 + 01∗ and apply some of the laws/properties of REs,
0 can be factored out of the union, but first the lone RE 0 has to be
rewritten as a concatenation
Using the identity for concatenation, 0 = 0ϵ, giving the RE 0ϵ + 01∗
Applying the left distributive law to the RE gives 0(ϵ + 1∗)
Since ϵ ∈ L(1∗), ϵ + 1∗ = 1∗, giving:
0(ϵ + 1∗) = 01∗
Assignment: Define the language of this RE
37. Complexity and Computability Regular Expressions
Example 2
Prove the law: (R + S)∗ = (R∗S∗)∗
Let R = a, S = b ⇒ (a + b)∗ = (a∗b∗)∗
LHS: (a + b)∗ = {ϵ, a, b, aa, ab, ba, bb, ...}, giving all strings of a's
and b's mixed
RHS: (a∗b∗)∗:
a∗ = {ϵ, a, aa, aaa, ...}
b∗ = {ϵ, b, bb, bbb, ...}
a∗b∗ = {ϵ, a, aa, aaa, ...}{ϵ, b, bb, bbb, ...} = {ϵ, a, b, aa, ab, bb,
abb, bbb, ...}
(a∗b∗)∗ = {ϵ, a, b, aa, ab, bb, abb, bbb, ...}∗ = {ϵ, a, b, aa, ab, ba,
bb, ...}, also all strings with a's and b's mixed, since a and b are both
in L(a∗b∗) and any string is a concatenation of single symbols
∴ LHS = RHS, (R + S)∗ = (R∗S∗)∗
38. Complexity and Computability Regular Expressions
Exercise
Prove or disprove the following statements on REs:
R∗R∗ = R∗
(R + S )∗ = R∗ + S∗
(RS + S )∗RS = (RR∗S )∗
39. Complexity and Computability Regular Expressions
Transition Graphs for REs
Let Σ = {0, 1} be a two-letter alphabet. The transition graph, G , over Σ
consists of:
A finite set of nodes, with at least one labelled as the initial and some
(may be more than one) labelled as final states.
40. Complexity and Computability Regular Expressions
Transition Graphs for REs
Oriented branches (which may be represented as ordered pairs of
nodes, arrows, arcs or loops)
Every arrow is labelled with a 0, 1, or ϵ
A word, w , is accepted by a transition graph if there exists a path
from an initial node to a final node such that the labels of the arrows
of the path form the word w , after the ϵ’s are deleted (ϵ’s disappear
in concatenation)
The empty string, ϵ, is accepted if one node is both a start and final
node or if there exists a path from an initial to final node whose
arrows are all labelled with ϵ’s.
The set of words accepted by a transition graph is denoted by T(G)
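The acceptance rule above can be sketched as a small simulation. The representation is an assumption made for this illustration: a graph is a set of (source, label, target) arrows with the label "" standing for ϵ, and the node names A and B are hypothetical. Unlike an automaton with a single start state, the function takes a set of initial nodes and a set of final nodes, as the definition allows.

```python
def accepts(edges, initial, final, word):
    """True if some path from an initial to a final node spells word."""
    def follow_eps(nodes):
        # Add every node reachable through arrows labelled ϵ.
        seen, stack = set(nodes), list(nodes)
        while stack:
            q = stack.pop()
            for src, label, dst in edges:
                if src == q and label == "" and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = follow_eps(initial)
    for ch in word:
        step = {dst for src, label, dst in edges
                if src in current and label == ch}
        current = follow_eps(step)
    return bool(current & final)

# A graph with arrows A --1--> B and B --1--> B: one or more 1's.
ONES = {("A", "1", "B"), ("B", "1", "B")}
print([w for w in ["", "1", "111", "10"] if accepts(ONES, {"A"}, {"B"}, w)])

# A single ϵ arrow from A to B: only the empty word is accepted.
EPS_ONLY = {("A", "", "B")}
print(accepts(EPS_ONLY, {"A"}, {"B"}, ""))
```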
41. Complexity and Computability Regular Expressions
Examples
T(G) = {1}, accepts only 1
T(G) = L(1∗), accepts ϵ, 1, 11, 111, ...
T(G) = ∅, does not accept anything
42. Complexity and Computability Regular Expressions
Examples
T(G) = {ϵ}, accepts only the empty word
T(G) = L(11∗), the empty word ϵ is not accepted
43. Complexity and Computability Regular Expressions
Exercise
What sets of words are accepted by the following transition graphs?
44. Complexity and Computability Regular Expressions
Assignment
1 Write a regular expression for the language consisting of strings of 0’s
and 1’s such that every pair of adjacent 0’s appears before any pair of
adjacent 1’s.
2 Give the English description of the language of the RE
(1 + ϵ)(00∗1)∗0∗