Dr. Mohamed Gamal Faculty of Computers and Informatics
Contents
1 Operations on Languages (Sets)
1.1 L1 Concatenation L2 (L1 · L2)
1.2 L1 Union L2 (L1 ∪ L2)
1.3 L1^3 (L1 Concatenated with Itself 3 Times)
1.4 L1∗ (Kleene Closure of L1 - Zero or more)
1.5 L1+ (Positive Closure of L1 - One or more)
2 Regular Expressions (RE) and Languages
2.1 Basic Terminology
2.2 C-Language Identifiers and Numbers
2.3 Examples
3 Finite Automata
3.1 Introduction
3.2 Minimizing Number of States of a DFA
1 Operations on Languages (Sets)
Operation                   Definition & Notation
Union of L and M            L ∪ M = {s | s is in L or s is in M}
Concatenation of L and M    LM = {st | s is in L and t is in M}
Kleene closure of L         L∗ = ∪_{i=0}^{∞} L^i
Positive closure of L       L+ = ∪_{i=1}^{∞} L^i
Given:
L1 = {a, b, c, d} and L2 = {1, 2}
1.1 L1 Concatenation L2 (L1 · L2)
The concatenation of L1 and L2 forms strings by concatenating each element of L1 with each element of L2.
L1 · L2 = {a1, a2, b1, b2, c1, c2, d1, d2}
1.2 L1 Union L2 (L1 ∪ L2)
The union of L1 and L2 combines all elements from both sets.
L1 ∪ L2 = {a, b, c, d, 1, 2}
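The two operations above can be sketched with Python sets. This is only an illustration: the helper name `concat` is an assumption of this sketch, not part of the notes.

```python
def concat(L, M):
    """Concatenation LM = {st | s is in L and t is in M}."""
    return {s + t for s in L for t in M}

L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}

concatenation = concat(L1, L2)  # every element of L1 followed by every element of L2
union = L1 | L2                 # set union: elements of either language

print(sorted(concatenation))
print(sorted(union))
```

Note that concatenation is order-sensitive: `concat(L2, L1)` would give strings such as `1a`, which are not in L1 · L2.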
1.3 L1^3 (L1 Concatenated with Itself 3 Times)
This is the set of all possible strings formed by concatenating three elements of L1.
L1^3 = {
aaa, aab, aac, aad, aba, abb, abc, abd, aca, acb, acc, acd, ada, adb, adc, add,
baa, bab, bac, bad, bba, bbb, bbc, bbd, bca, bcb, bcc, bcd, bda, bdb, bdc, bdd,
caa, cab, cac, cad, cba, cbb, cbc, cbd, cca, ccb, ccc, ccd, cda, cdb, cdc, cdd,
daa, dab, dac, dad, dba, dbb, dbc, dbd, dca, dcb, dcc, dcd, dda, ddb, ddc, ddd
}
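As a quick check of the size of this set, the n-fold concatenation can be sketched as repeated concatenation starting from L^0 = {ϵ}. The names `concat` and `power` are assumptions of this sketch.

```python
def concat(L, M):
    """Concatenation LM = {st | s is in L and t is in M}."""
    return {s + t for s in L for t in M}

def power(L, n):
    """L^n: L concatenated with itself n times, with L^0 = {""} (the empty string)."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

L1 = {"a", "b", "c", "d"}
L1_cubed = power(L1, 3)

print(len(L1_cubed))  # 4 * 4 * 4 = 64 strings
```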
1.4 L1∗ (Kleene Closure of L1 - Zero or more)
The Kleene closure of L1 is the set of all strings formed by concatenating zero or more elements of L1, including the empty string ϵ.
L1∗ = {
ϵ,
a, b, c, d,
aa, ab, ac, ad,
ba, bb, bc, bd,
ca, cb, cc, cd,
da, db, dc, dd,
aaa, aab, aac, aad, aba, abb, abc, abd, aca, acb, acc, acd, ada, adb, adc, add,
baa, bab, bac, bad, bba, bbb, bbc, bbd, bca, bcb, bcc, bcd, bda, bdb, bdc, bdd,
caa, cab, cac, cad, cba, cbb, cbc, cbd, cca, ccb, ccc, ccd, cda, cdb, cdc, cdd,
daa, dab, dac, dad, dba, dbb, dbc, dbd, dca, dcb, dcc, dcd, dda, ddb, ddc, ddd,
...
}
1.5 L1+ (Positive Closure of L1 - One or more)
The positive closure of L1 includes all possible non-empty strings formed by concatenating elements of L1 one
or more times.
L1+ = {
a, b, c, d,
aa, ab, ac, ad,
ba, bb, bc, bd,
ca, cb, cc, cd,
da, db, dc, dd,
aaa, aab, aac, aad, aba, abb, abc, abd, aca, acb, acc, acd, ada, adb, adc, add,
...
}
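Both closures are infinite, so a program can only enumerate a finite part of them. The sketch below collects the union of L^i up to a chosen bound; the function name `closure_up_to` and the bound `max_power` are assumptions of this illustration.

```python
def closure_up_to(L, max_power, include_empty):
    """Union of L^i for i = 0..max_power (Kleene) or i = 1..max_power (positive)."""
    result = set()
    current = {""}              # L^0 = {empty string}
    if include_empty:
        result |= current
    for _ in range(max_power):
        current = {s + t for s in current for t in L}  # L^(i+1) = L^i · L
        result |= current
    return result

L1 = {"a", "b", "c", "d"}
star_part = closure_up_to(L1, 3, include_empty=True)    # finite part of L1*
plus_part = closure_up_to(L1, 3, include_empty=False)   # finite part of L1+

print(len(star_part), len(plus_part))  # 1 + 4 + 16 + 64 = 85 and 4 + 16 + 64 = 84
```

The only difference between the two results here is the empty string, which matches the definitions: L∗ starts the union at i = 0 and L+ starts it at i = 1.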
2 Regular Expressions (RE) and Languages
2.1 Basic Terminology
Regular Expression   Language Denoted                                      Explanation
a                    {a}                                                   The language contains only the string ‘a’.
ab                   {ab}                                                  The language contains only the string “ab”.
(a | b)              {a, b}                                                The language contains the strings ‘a’ and ‘b’.
a(b | c)             {ab, ac}                                              The language contains the strings “ab” and “ac”.
a(bc)∗               {a, abc, abcbc, abcbcbc, ...}                         The language contains strings that start with ‘a’ followed by zero or more repetitions of “bc”.
(ab | cd)∗           {ϵ, ab, cd, abcd, ababcd, cdab, ...}                  The language contains strings formed by concatenating zero or more repetitions of “ab” or “cd”.
a(bc | de)∗          {a, abc, ade, abcde, abcbc, ...}                      The language contains strings that start with ‘a’ followed by zero or more repetitions of “bc” or “de”.
(a | b)∗ c           {c, ac, bc, aac, abc, bbc, ...}                       The language contains strings that end with ‘c’ and are preceded by zero or more repetitions of ‘a’ or ‘b’.
a∗                   {ϵ, a, aa, aaa, ...}                                  The language contains zero or more repetitions of ‘a’.
a+b                  {ab, aab, aaab, ...}                                  The language contains strings with one or more ‘a’ followed by ‘b’.
(a | b)∗ abb         {abb, aabb, babb, aaabb, ...}                         The language contains strings that end with “abb” and can have zero or more repetitions of ‘a’ or ‘b’ before it.
(a | b)∗             {ϵ, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, ...}    The language contains all possible strings (including the empty string) made up of ‘a’ and ‘b’.
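A few rows of the table can be tried out with Python's `re` module. This is a sketch rather than a definition: the patterns are written in Python syntax without spaces, and `re.fullmatch` is used because membership in the language requires the whole string to match. The helper name `in_language` and the chosen test strings are assumptions of this example.

```python
import re

def in_language(pattern, s):
    """True if the whole string s is matched by the regular expression."""
    return re.fullmatch(pattern, s) is not None

# pattern -> (strings in the language, strings not in the language)
examples = {
    "a(bc)*":    (["a", "abc", "abcbc"], ["", "ab", "abcb"]),
    "(ab|cd)*":  (["", "ab", "cdab"],    ["a", "abc"]),
    "(a|b)*c":   (["c", "abc"],          ["", "ca"]),
    "a+b":       (["ab", "aaab"],        ["b", "aba"]),
    "(a|b)*abb": (["abb", "babb"],       ["ab", "abba"]),
}

for pattern, (members, non_members) in examples.items():
    assert all(in_language(pattern, s) for s in members)
    assert not any(in_language(pattern, s) for s in non_members)

print("all table rows behave as described")
```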
2.2 C-Language Identifiers and Numbers
C-Language Identifiers
Definition   Regular Expression            Example
letter       [A-Za-z_]                     a, B, _
digit        [0-9]                         0, 1, 9
CId          letter ( letter | digit )∗    var, myVar123, temp
Unsigned Integer or Floating Point Numbers
Definition   Regular Expression                       Example
digit        [0-9]                                    0, 1, 9
digits       digit+                                   123, 4567
number       digits (.digits)? ( E [+−]? digits)?     42, 3.14, 2.71E-3
Regular Expression Validators:
• https://regexr.com/
• https://regex101.com/
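The named definitions above can also be composed programmatically. The following sketch builds the two patterns in Python from the smaller definitions; it assumes that letter includes the underscore (as C identifiers allow) and that the decimal point in number is a literal dot. The names `C_ID`, `NUMBER`, `is_identifier`, and `is_number` are illustrative.

```python
import re

LETTER = r"[A-Za-z_]"   # assumption: underscore counts as a letter, as in C
DIGIT = r"[0-9]"
DIGITS = rf"{DIGIT}+"

# CId    = letter ( letter | digit )*
C_ID = re.compile(rf"{LETTER}({LETTER}|{DIGIT})*")
# number = digits (.digits)? ( E [+-]? digits )?
NUMBER = re.compile(rf"{DIGITS}(\.{DIGITS})?(E[+-]?{DIGITS})?")

def is_identifier(s):
    return C_ID.fullmatch(s) is not None

def is_number(s):
    return NUMBER.fullmatch(s) is not None

for text in ["var", "myVar123", "9lives", "42", "3.14", "2.71E-3", "3."]:
    print(text, is_identifier(text), is_number(text))
```

Under these definitions a string such as `3.` is not a number, because the optional fraction part requires at least one digit after the dot.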
2.3 Examples
Write a regular expression for a language
1. accepting all strings of lowercase letters in which the letters are in ascending order.
Solution: R.E = a∗ b∗ ... z∗.
2. accepting all strings which contain exactly two a’s, where Σ = {a, b}
Solution: R.E = b∗ a b∗ a b∗.
3. accepting all strings which contain exactly one a, where Σ = {a, b, c}
Solution: R.E = (b | c)∗ a (b | c)∗.
4. accepting all strings which contain at most three a’s, where Σ = {a, b, c}
Solution: R.E = (b | c)∗ a? (b | c)∗ a? (b | c)∗ a? (b | c)∗.
5. accepting all strings that do not contain ab as a substring, where Σ = {a, b}
Solution: R.E = b∗ a∗.
6. accepting all strings which contain 010 as a substring, where Σ = {0, 1}
Solution: R.E = (0 | 1)∗ 010 (0 | 1)∗.
7. accepting all strings in which the number of a’s is a multiple of 3, where Σ = {a, b, c}
Solution: R.E = ((b | c)∗ a (b | c)∗ a (b | c)∗ a)∗ (b | c)∗.
8. accepting all strings that do not end with ab, where Σ = {a, b}
Solution: R.E = (a | b)∗ (a | bb) | b | ϵ.
9. accepting strings of even length, where Σ = {a, b}
Solution: R.E = (aa | bb | ab | ba)∗.
10. accepting all strings of digits that begin with 1 and end with 1, where Σ = {0, 1}
Solution: R.E = 1 (0 | 1)∗ 1 | 1.
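One way to gain confidence in a solution is to compare it against a direct statement of the property on every short string over the alphabet. The sketch below does this for a selection of the examples (2, 3, 4, 5, 6, and 9); the helper names and the length bound of 6 are assumptions, and agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product

def all_strings(alphabet, max_len):
    """Every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def agrees(pattern, predicate, alphabet, max_len=6):
    """True if regex membership equals the predicate on all short strings."""
    regex = re.compile(pattern)
    return all((regex.fullmatch(s) is not None) == predicate(s)
               for s in all_strings(alphabet, max_len))

checks = [
    ("b*ab*ab*",                       lambda s: s.count("a") == 2, "ab"),
    ("(b|c)*a(b|c)*",                  lambda s: s.count("a") == 1, "abc"),
    ("(b|c)*a?(b|c)*a?(b|c)*a?(b|c)*", lambda s: s.count("a") <= 3, "abc"),
    ("b*a*",                           lambda s: "ab" not in s,     "ab"),
    ("(0|1)*010(0|1)*",                lambda s: "010" in s,        "01"),
    ("(aa|bb|ab|ba)*",                 lambda s: len(s) % 2 == 0,   "ab"),
]

results = [agrees(pattern, pred, sigma) for pattern, pred, sigma in checks]
print(results)
```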
3 Finite Automata
3.1 Introduction
A recognizer for a language is a program that takes a string x, and answers “yes” if x is a sentence of that
language, and “no” otherwise.
The recognizer for tokens is called a finite automaton, which can be
1. Non-deterministic (NFA): slower, but it may take less space. An NFA consists of
  1) S: a set of states.
  2) Σ: a set of input symbols (alphabet).
  3) move: a transition function that maps state-symbol pairs to sets of states.
  4) s0: a start (initial) state.
  5) F: a set of accepting (final) states.
Note that ϵ-transitions are allowed in an NFA.
Example:
2. Deterministic (DFA): a faster recognizer, but it may take more space; widely used.
  • A DFA is a special form of an NFA.
  • No ϵ-transitions.
  • For each state s and symbol a, there is at most one edge labeled a leaving s.
Example:
Two Algorithms:
Algorithm 1: Regular Expression → NFA → DFA
Algorithm 2: Regular Expression → DFA
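To make the idea of a recognizer concrete, here is a minimal table-driven DFA in Python. The five fields mirror the components listed above; the class name `DFA`, the state names `q0` and `q1`, and the example automaton (strings over {a, b} that end with a) are assumptions made only for this sketch.

```python
from dataclasses import dataclass

@dataclass
class DFA:
    states: set
    alphabet: set
    move: dict        # (state, symbol) -> next state
    start: str
    accepting: set

    def accepts(self, x):
        """Answer "yes" (True) if x is in the language, "no" (False) otherwise."""
        state = self.start
        for symbol in x:
            if (state, symbol) not in self.move:
                return False          # no edge for this symbol: reject
            state = self.move[(state, symbol)]
        return state in self.accepting

# Hypothetical example: strings over {a, b} whose last symbol is a.
ends_with_a = DFA(
    states={"q0", "q1"},
    alphabet={"a", "b"},
    move={("q0", "a"): "q1", ("q0", "b"): "q0",
          ("q1", "a"): "q1", ("q1", "b"): "q0"},
    start="q0",
    accepting={"q1"},
)

print(ends_with_a.accepts("abba"), ends_with_a.accepts("ab"))
```

Because each (state, symbol) pair has at most one successor, recognition takes one table lookup per input symbol, which is why a DFA is the faster recognizer.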
Algorithm 1: Regular Expression → NFA → DFA
1. Thompson’s Construction: Regular Expression → NFA
Operation       Description
ϵ-closure(s)    Set of NFA states reachable from NFA state s on ϵ-transitions alone.
ϵ-closure(T)    Set of NFA states reachable from some NFA state s in set T on ϵ-transitions alone; = ∪_{s in T} ϵ-closure(s).
move(T, a)      Set of NFA states to which there is a transition on input symbol ‘a’ from some state s in T.
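The operations in the table can be sketched in Python as follows. The NFA used here is an assumption: it is the usual Thompson-style NFA for (a|b)∗ a with states numbered 0 to 8, written out by hand for illustration, and ϵ is encoded as the empty string.

```python
EPS = ""  # label used for ϵ-transitions in this sketch

# Assumed NFA for (a|b)* a: (state, label) -> set of next states
NFA = {
    (0, EPS): {1, 7},
    (1, EPS): {2, 4},
    (2, "a"): {3},
    (4, "b"): {5},
    (3, EPS): {6},
    (5, EPS): {6},
    (6, EPS): {1, 7},
    (7, "a"): {8},
}

def epsilon_closure(T):
    """States reachable from some state in T on ϵ-transitions alone."""
    closure = set(T)
    stack = list(T)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(T, a):
    """States with a transition on symbol a from some state in T."""
    return frozenset(t for s in T for t in NFA.get((s, a), ()))

start = epsilon_closure({0})
after_a = epsilon_closure(move(start, "a"))
print(sorted(start), sorted(after_a))
```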
Example: an NFA that recognizes the regular expression (a|b)∗ a
2. Converting NFA → DFA
NFA State                DFA State   a    b    Type
{0, 1, 2, 4, 7}          S0          S1   S2   Start
{1, 2, 3, 4, 6, 7, 8}    S1          S1   S2   Final
{1, 2, 4, 5, 6, 7}       S2          S1   S2   -
Transition table for the DFA
Another Example:
NFA for (a|b)∗ abb
NFA State                  DFA State   a   b   Type
{0, 1, 2, 4, 7}            A           B   C   Start
{1, 2, 3, 4, 6, 7, 8}      B           B   D   -
{1, 2, 4, 5, 6, 7}         C           B   C   -
{1, 2, 4, 5, 6, 7, 9}      D           B   E   -
{1, 2, 4, 5, 6, 7, 10}     E           B   C   Final
Transition table for the DFA
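The table for (a|b)∗ abb can be reproduced with a short subset-construction sketch. The NFA transitions below are an assumption (the usual Thompson-style NFA with states 0 to 10 and accepting state 10), and the function and variable names are illustrative.

```python
from collections import deque

EPS = ""  # label used for ϵ-transitions in this sketch

# Assumed NFA for (a|b)* abb: (state, label) -> set of next states
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}
ACCEPT = 10

def epsilon_closure(T):
    closure, stack = set(T), list(T)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(T, a):
    return frozenset(t for s in T for t in NFA.get((s, a), ()))

def subset_construction(start_state, alphabet):
    """Return (names of DFA states, DFA transition table)."""
    start = epsilon_closure({start_state})
    names = {start: "A"}
    table = {}
    queue = deque([start])          # unmarked DFA states
    while queue:
        T = queue.popleft()         # mark T
        for a in alphabet:
            U = epsilon_closure(move(T, a))
            if not U:
                continue
            if U not in names:      # a new DFA state
                names[U] = chr(ord("A") + len(names))
                queue.append(U)
            table[(names[T], a)] = names[U]
    return names, table

names, table = subset_construction(0, "ab")
finals = {name for T, name in names.items() if ACCEPT in T}
for T, name in names.items():
    print(name, sorted(T), table[(name, "a")], table[(name, "b")])
```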
Resulting DFA
Algorithm 2: Regular Expression → DFA
1. Augment the given regular expression by concatenating it with a special end-marker symbol # at the end.
e.g., r → (r)#
2. Create a syntax tree for this augmented regular expression.
  • All alphabet symbols (plus # and the empty string) will be on the leaves.
  • All inner nodes will be the operators.
3. Number each leaf labeled with an alphabet symbol or # (these numbers are the positions).
4. Calculate the functions: nullable, firstpos, lastpos, followpos
  • nullable(node): true if the subexpression rooted at the node can generate the empty string.
  • firstpos(node): set of positions that can match the first symbol of a string generated by the node’s subtree.
  • lastpos(node): set of positions that can match the last symbol of a string generated by the node’s subtree.
  • followpos(i): set of positions that can immediately follow position i in a string generated by the augmented regular expression.
Algorithm 1 Computing followpos
for each node n in the tree do
    if n is a concatenation node with left child c1 and right child c2 then
        for each i in lastpos(c1) do
            followpos(i) ← followpos(i) ∪ firstpos(c2)
        end for
    else if n is a star node then
        for each i in lastpos(n) do
            followpos(i) ← followpos(i) ∪ firstpos(n)
        end for
    end if
end for
Example:
For the augmented regular expression (a | b)∗ a #, we first construct the syntax tree using the table above and find the firstpos and lastpos for each node.
Then we calculate followpos using Algorithm 1 above; the result is as follows
• followpos(1) = {1, 2, 3}
• followpos(2) = {1, 2, 3}
• followpos(3) = {4}
• followpos(4) = {}
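The followpos values above can be reproduced with a small recursive sketch. It encodes the syntax tree of (a | b)∗ a # as nested tuples and applies the standard rules for nullable, firstpos, and lastpos while filling followpos as in Algorithm 1. The tuple encoding and the function name `analyze` are assumptions of this sketch, and ϵ-leaves are not handled.

```python
from collections import defaultdict

def leaf(symbol, pos):
    return ("leaf", symbol, pos)

# Syntax tree of (a|b)* a # with positions 1..4
tree = ("cat",
        ("cat", ("star", ("or", leaf("a", 1), leaf("b", 2))), leaf("a", 3)),
        leaf("#", 4))

followpos = defaultdict(set)

def analyze(node):
    """Return (nullable, firstpos, lastpos) of node; fill followpos on the way."""
    kind = node[0]
    if kind == "leaf":
        pos = node[2]
        return False, {pos}, {pos}
    if kind == "or":
        n1, f1, l1 = analyze(node[1])
        n2, f2, l2 = analyze(node[2])
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(node[1])
        n2, f2, l2 = analyze(node[2])
        for i in l1:                       # concatenation rule
            followpos[i] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f, l = analyze(node[1])
        for i in l:                        # star rule
            followpos[i] |= f
        return True, f, l
    raise ValueError(f"unknown node kind: {kind}")

root_nullable, root_first, root_last = analyze(tree)
for i in range(1, 5):
    print(i, sorted(followpos[i]))
```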
Now we are ready to construct the corresponding DFA using followpos.
(a₁ | b₂)∗ a₃ #₄
followpos(1) = {1, 2, 3}
followpos(2) = {1, 2, 3}
followpos(3) = {4}
followpos(4) = {}
S1 = firstpos(root) = {1, 2, 3}
⇓ mark S1
a: followpos(1) ∪ followpos(3) = {1, 2, 3, 4} = S2    move(S1, a) = S2
b: followpos(2) = {1, 2, 3} = S1                      move(S1, b) = S1
⇓ mark S2
a: followpos(1) ∪ followpos(3) = {1, 2, 3, 4} = S2    move(S2, a) = S2
b: followpos(2) = {1, 2, 3} = S1                      move(S2, b) = S1
Start State: S1
Final (accepting) State: S2 (it contains position 4, the position of #)
Final resulting DFA
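The construction just traced can be sketched in Python. The inputs are the followpos sets and position symbols from the example; the function name `build_dfa` and the naming scheme S1, S2, ... for discovered states are assumptions of this sketch.

```python
from collections import deque

symbol_at = {1: "a", 2: "b", 3: "a", 4: "#"}             # symbol at each position
followpos = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: set()}
first_root = frozenset({1, 2, 3})                        # firstpos(root)

def build_dfa(first_root, followpos, symbol_at, alphabet):
    names = {first_root: "S1"}
    table = {}
    queue = deque([first_root])          # unmarked states
    while queue:
        S = queue.popleft()              # mark S
        for a in alphabet:
            # union of followpos(p) over positions p in S that carry symbol a
            U = frozenset(q for p in S if symbol_at[p] == a for q in followpos[p])
            if not U:
                continue
            if U not in names:
                names[U] = f"S{len(names) + 1}"
                queue.append(U)
            table[(names[S], a)] = names[U]
    end_pos = next(p for p, sym in symbol_at.items() if sym == "#")
    accepting = {name for S, name in names.items() if end_pos in S}
    return names, table, accepting

names, table, accepting = build_dfa(first_root, followpos, symbol_at, "ab")
print(table, accepting)
```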
3.2 Minimizing Number of States of a DFA
The process of minimizing a DFA is
1. Initial Partition: divide all the states of the DFA into two groups
  • G1: Accepting States.
  • G2: Non-Accepting States.
2. Refining Groups: refine these groups to create subgroups based on transitions
  • For each group G of the current partition:
    - Break G into smaller subgroups.
    - Two states, s1 and s2, stay in the same subgroup if and only if, for every input symbol, they transition to states within the same group of the current partition.
  • Repeat until no group can be split further.
3. Start State: the start state of the minimized DFA is the group that contains the original start state of the DFA. This keeps the starting behavior consistent with the original DFA.
4. Accepting States: the accepting states of the minimized DFA are those groups that contain at least one of the original accepting states. This ensures that the minimized DFA accepts exactly the strings accepted by the original DFA.
Example:
Example DFA
1. Partition the set of states into two groups (accepting & non-accepting)
G1 = {4}
G2 = {1, 2, 3}
2. Refine the groups based on the DFA transition table
State   a   b
1       2   3
2       2   3
3       4   3
4       2   3
DFA transition table
From the transition table we notice that states 1 and 2 in G2 have identical transitions on every input symbol (both go to state 2 on a and to state 3 on b), whereas state 3 goes to state 4, which is in G1, on a. Thus, we split G2 into {1, 2} and {3}.
G1 = {4}
G2 = {1, 2, 3} ⇒ {1, 2} {3}
⇓
{1, 2} {3} {4}
3. Start State: {1, 2}
4. Final State: {4}
Minimized DFA
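The partition-refinement steps can be sketched in Python using the example transition table. The sketch assumes a complete transition function (every state has an edge for every symbol); the function name `minimize` is illustrative.

```python
def minimize(states, alphabet, move, accepting):
    """Return the final partition of states as a list of sets."""
    partition = [g for g in (set(accepting), set(states) - set(accepting)) if g]
    while True:
        # index of the group each state currently belongs to
        group_of = {s: i for i, g in enumerate(partition) for s in g}
        new_partition = []
        for group in partition:
            buckets = {}
            for s in sorted(group):
                # states stay together only if they reach the same groups on every symbol
                signature = tuple(group_of[move[(s, a)]] for a in alphabet)
                buckets.setdefault(signature, set()).add(s)
            new_partition.extend(buckets.values())
        if len(new_partition) == len(partition):   # nothing was split: done
            return new_partition
        partition = new_partition

move = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 4, (3, "b"): 3,
    (4, "a"): 2, (4, "b"): 3,
}
groups = minimize({1, 2, 3, 4}, "ab", move, accepting={4})
start_group = next(g for g in groups if 1 in g)    # group containing the original start state
final_groups = [g for g in groups if g & {4}]      # groups containing an accepting state
print(sorted(map(sorted, groups)), sorted(start_group), final_groups)
```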
Online Resources:
• Regular Expression to DFA, NFA, Minimize

Compiler Design - Operations on Languages, RE, Finite Automata
