Compiler Design - Operations on Languages, RE, Finite Automata

Dr. Mohamed Gamal, Faculty of Computers and Informatics, Suez Canal University, Ismailia, Egypt
Contents
1 Operations on Languages (Sets)
1.1 L1 Concatenation L2 (L1 · L2)
1.2 L1 Union L2 (L1 ∪ L2)
1.3 L1^3 (L1 Concatenated with Itself 3 Times)
1.4 L1∗ (Kleene Closure of L1 - Zero or more)
1.5 L1+ (Positive Closure of L1 - One or more)
2 Regular Expressions (RE) and Languages
2.1 Basic Terminology
2.2 C-Language Identifiers and Numbers
2.3 Examples
3 Finite Automata
3.1 Introduction
3.2 Minimizing Number of States of a DFA

1 Operations on Languages (Sets)
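Operation                   Definition & Notation
Union of L and M            $L \cup M = \{\, s \mid s \text{ is in } L \text{ or } s \text{ is in } M \,\}$
Concatenation of L and M    $LM = \{\, st \mid s \text{ is in } L \text{ and } t \text{ is in } M \,\}$
Kleene closure of L         $L^* = \bigcup_{i=0}^{\infty} L^i$
Positive closure of L       $L^+ = \bigcup_{i=1}^{\infty} L^i$
Here $L^0 = \{\epsilon\}$ and $L^i = L^{i-1}L$ for $i \geq 1$.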
Given:
L1 = {a, b, c, d} and L2 = {1, 2}
1.1 L1 Concatenation L2 (L1 · L2)
The concatenation of L1 and L2 forms strings by concatenating each element of L1 with each element of L2.
L1 · L2 = {a1, a2, b1, b2, c1, c2, d1, d2}
1.2 L1 Union L2 (L1 ∪ L2)
The union of L1 and L2 combines all elements from both sets.
L1 ∪ L2 = {a, b, c, d, 1, 2}
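To make the two definitions concrete, here is a minimal Python sketch. It assumes finite languages are modelled as Python sets of strings (a representation chosen for illustration), and the helper names `union` and `concat` are hypothetical:

```python
def union(l, m):
    """L ∪ M: every string that is in L or in M."""
    return set(l) | set(m)


def concat(l, m):
    """LM: every string st with s taken from L and t taken from M."""
    return {s + t for s in l for t in m}


L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}

print(sorted(concat(L1, L2)))  # ['a1', 'a2', 'b1', 'b2', 'c1', 'c2', 'd1', 'd2']
print(sorted(union(L1, L2)))   # ['1', '2', 'a', 'b', 'c', 'd']
```

Because the results are sets, a string that could be produced in two ways is stored only once, which matches the set definitions above.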
1.3 L1^3 (L1 Concatenated with Itself 3 Times)
This is the set of all possible strings formed by concatenating three elements of L1.
L1^3 = {
aaa, aab, aac, aad, aba, abb, abc, abd, aca, acb, acc, acd, ada, adb, adc, add,
baa, bab, bac, bad, bba, bbb, bbc, bbd, bca, bcb, bcc, bcd, bda, bdb, bdc, bdd,
caa, cab, cac, cad, cba, cbb, cbc, cbd, cca, ccb, ccc, ccd, cda, cdb, cdc, cdd,
daa, dab, dac, dad, dba, dbb, dbc, dbd, dca, dcb, dcc, dcd, dda, ddb, ddc, ddd
}
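With the same set-of-strings representation (an illustrative assumption), L^n can be computed by concatenating n copies of L, starting from L^0 = {ϵ}. The hypothetical helper `power` below confirms that L1^3 has 4^3 = 64 strings, since each of the three positions can hold any of the four symbols:

```python
def concat(l, m):
    """LM: every string st with s taken from L and t taken from M."""
    return {s + t for s in l for t in m}


def power(l, n):
    """L^n: L concatenated with itself n times; L^0 = {ϵ} (the empty string)."""
    result = {""}
    for _ in range(n):
        result = concat(result, l)
    return result


L1 = {"a", "b", "c", "d"}
L1_cubed = power(L1, 3)
print(len(L1_cubed))         # 64
print(sorted(L1_cubed)[:4])  # ['aaa', 'aab', 'aac', 'aad']
```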

1.4 L1∗ (Kleene Closure of L1 - Zero or more)
The Kleene closure of L1 is the set of all strings formed by concatenating zero or more elements of L1; it always includes the empty string ϵ.
L1∗ = {
ϵ,
a, b, c, d,
aa, ab, ac, ad,
ba, bb, bc, bd,
ca, cb, cc, cd,
da, db, dc, dd,
aaa, aab, aac, aad, aba, abb, abc, abd, aca, acb, acc, acd, ada, adb, adc, add,
baa, bab, bac, bad, bba, bbb, bbc, bbd, bca, bcb, bcc, bcd, bda, bdb, bdc, bdd,
caa, cab, cac, cad, cba, cbb, cbc, cbd, cca, ccb, ccc, ccd, cda, cdb, cdc, cdd,
daa, dab, dac, dad, dba, dbb, dbc, dbd, dca, dcb, dcc, dcd, dda, ddb, ddc, ddd,
. . .
}
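Because L1∗ is infinite, a program can only enumerate a finite slice of it. The sketch below (the helper name `kleene_up_to` is hypothetical, and languages are again assumed to be Python sets of strings) builds the union of L1^0 through L1^k one layer at a time:

```python
def concat(l, m):
    """LM: every string st with s taken from L and t taken from M."""
    return {s + t for s in l for t in m}


def kleene_up_to(l, k):
    """Finite slice of L*: the union of L^0, L^1, ..., L^k.

    L* itself is infinite whenever L contains a non-empty string,
    so only a bounded part of it can be listed.
    """
    layer = {""}                  # L^0 = {ϵ}
    result = set(layer)
    for _ in range(k):
        layer = concat(layer, l)  # L^(i+1) = L^i · L
        result |= layer
    return result


L1 = {"a", "b", "c", "d"}
star2 = kleene_up_to(L1, 2)
print(len(star2))   # 21
print("" in star2)  # True
```

With k = 2 the slice holds 1 + 4 + 16 = 21 strings: ϵ, the four single letters, and the sixteen two-letter strings listed above.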
1.5 L1+ (Positive Closure of L1 - One or more)
The positive closure of L1 includes all possible non-empty strings formed by concatenating elements of L1 one
or more times.
L1+ = {
a, b, c, d,
aa, ab, ac, ad,
ba, bb, bc, bd,
ca, cb, cc, cd,
da, db, dc, dd,
. . .
}
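A matching sketch for the positive closure (again a finite slice, with the hypothetical helper `positive_up_to`) starts from L1^1 = L1 instead of L1^0 = {ϵ}:

```python
def concat(l, m):
    """LM: every string st with s taken from L and t taken from M."""
    return {s + t for s in l for t in m}


def positive_up_to(l, k):
    """Finite slice of L+: the union of L^1, ..., L^k (at least one factor)."""
    layer = set(l)                # L^1 = L
    result = set(layer)
    for _ in range(k - 1):
        layer = concat(layer, l)  # L^(i+1) = L^i · L
        result |= layer
    return result


L1 = {"a", "b", "c", "d"}
plus2 = positive_up_to(L1, 2)
print(len(plus2))   # 20
print("" in plus2)  # False
```

In general L+ contains ϵ only if L itself does; since ϵ is not in L1, no slice of L1+ contains it.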

2 Regular Expressions (RE) and Languages
2.1 Basic Terminology
Regular Expression    Language Denoted    Explanation
a {a} The language contains only the string ‘a’.
ab {ab} The language contains only the string “ab”.
(a | b) {a, b} The language contains the strings ‘a’ and ‘b’.
a(b | c) {ab, ac} The language contains the strings “ab” and
“ac”.
a(bc)∗ {a, abc, abcbc, abcbcbc, . . .} The language contains strings that start with ‘a’
followed by zero or more repetitions of “bc”.
(ab | cd)∗ {ϵ, ab, cd, abcd, ababcd, cdab, . . .} The language contains strings formed by
concatenating zero or more repetitions of “ab” or “cd”.
a(bc | de)∗ {a, abc, ade, abcde, abcbc, . . .} The language contains strings that start with ‘a’
followed by zero or more repetitions of “bc” or
“de”.
(a | b)∗ c {c, ac, bc, aac, abc, bbc, . . .} The language contains strings that end with ‘c’
and are preceded by zero or more repetitions of
‘a’ or ‘b’.
a∗ {ϵ, a, aa, aaa, . . .} The language contains zero or more repetitions
of ‘a’.
a+b {ab, aab, aaab, . . .} The language contains strings with one or more
‘a’ followed by ‘b’.
(a | b)∗ abb {abb, aabb, babb, aaabb, . . .} The language contains strings that end with
“abb” and can have zero or more repetitions of
‘a’ or ‘b’ before it.
(a | b)∗ {ϵ, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, . . .} The language contains all possible strings
(including the empty string) made up of ‘a’ and ‘b’.
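These table entries can be checked with Python's standard `re` module, whose `|`, `*` and `+` operators match the notation used here. The sketch uses `re.fullmatch` because a regular expression denotes a language of whole strings, not of substrings; the sample strings are illustrative picks from the table:

```python
import re

# Regular expressions from the table, each with some strings from its language.
examples = {
    r"a(bc)*": ["a", "abc", "abcbc"],
    r"(ab|cd)*": ["", "ab", "cd", "abcd", "cdab"],
    r"(a|b)*c": ["c", "ac", "bc", "abc"],
    r"a+b": ["ab", "aab", "aaab"],
    r"(a|b)*abb": ["abb", "aabb", "babb"],
}

for pattern, members in examples.items():
    for s in members:
        # fullmatch succeeds only if the WHOLE string is in the language
        assert re.fullmatch(pattern, s), (pattern, s)

print(bool(re.fullmatch(r"a+b", "b")))           # False: at least one 'a' is needed
print(bool(re.fullmatch(r"(a|b)*abb", "abba")))  # False: the string must end with "abb"
```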

2.2 C-Language Identifiers and Numbers
C-Language Identifiers
Definition Regular Expression Example
letter [A-Za-z_] a, B, _
digit [0-9] 0, 1, 9
CId letter ( letter | digit )∗ var, myVar123, temp
Unsigned Integer or Floating Point Numbers
Definition Regular Expression Example
digit [0-9] 0, 1, 9
digits digit+ 123, 4567
number digits (.digits)? ( E [+−]? digits)? 42, 3.14, 2.71E-3
Regular Expression Validators:
• https://regexr.com/
• https://regex101.com/
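The identifier and number definitions translate directly into Python `re` patterns. This sketch is an illustration, not a full C lexer: it ignores keywords (such as `int`), which a real scanner must still separate from identifiers, and it accepts only the uppercase `E` exponent used in the table. The helper names are hypothetical:

```python
import re

# Building blocks named after the definitions in the tables above.
LETTER = r"[A-Za-z_]"
DIGIT = r"[0-9]"
C_ID = rf"{LETTER}({LETTER}|{DIGIT})*"
DIGITS = rf"{DIGIT}+"
NUMBER = rf"{DIGITS}(\.{DIGITS})?(E[+-]?{DIGITS})?"


def is_c_identifier(s):
    return re.fullmatch(C_ID, s) is not None


def is_number(s):
    return re.fullmatch(NUMBER, s) is not None


for s in ["var", "myVar123", "temp", "_tmp", "9lives"]:
    print(s, is_c_identifier(s))
for s in ["42", "3.14", "2.71E-3", "3.", "E5"]:
    print(s, is_number(s))
```

Note that "3." is rejected, because the optional fraction (.digits)? requires at least one digit after the point.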

2.3 Examples
Write a regular expression for a language
1. accepting all strings of lowercase letters in which the letters are in ascending (non-decreasing) order.
Solution: R.E = a∗ b∗ . . . z∗.
2. accepting all strings that contain exactly two a’s, where Σ = {a, b}
Solution: R.E = b∗ a b∗ a b∗.
3. accepting all strings that contain exactly one a, where Σ = {a, b, c}
Solution: R.E = (b | c)∗ a (b | c)∗.
4. accepting all strings that contain at most three a’s, where Σ = {a, b, c}
Solution: R.E = (b | c)∗ a? (b | c)∗ a? (b | c)∗ a? (b | c)∗.
5. accepting all strings that do not contain ab as a substring, where Σ = {a, b}
Solution: R.E = b∗ a∗.
6. accepting all strings that contain 010 as a substring, where Σ = {0, 1}
Solution: R.E = (0 | 1)∗ 010 (0 | 1)∗.
7. accepting all strings in which the number of a’s is a multiple of 3, where Σ = { a, b, c }
Solution: R.E = (b | c)∗ (a (b | c)∗ a (b | c)∗ a (b | c)∗)∗.
Each repetition of the starred group consumes exactly three a’s, with any b’s and c’s allowed between and after them, so strings such as abaa are also accepted.
8. accepting all strings that do not end with ab, where Σ = { a, b }
Solution: R.E = ϵ | b | (a | b)∗ (a | bb).
A non-empty string avoids ending in ab exactly when it is b or ends with a or bb.
9. accepting strings of even length, where Σ = { a, b }
Solution: R.E = (aa | bb | ab | ba)∗.
10. accepting strings of digits that begin with 1 and end with 1, where Σ = { 0, 1 }
Solution: R.E = 1 (0 | 1)∗ 1, or 1 | 1 (0 | 1)∗ 1 if the one-symbol string 1 should also be accepted.
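One way to gain confidence in answers like these is to compare each regular expression with the property it is meant to capture on every short string. The sketch below (helper names `all_strings` and `agrees` are hypothetical) does this for examples 3, 4, 5 and 9 using Python's `re.fullmatch`; it is a sanity check up to a fixed length, not a proof:

```python
import re
from itertools import product


def all_strings(alphabet, max_len):
    """Every string over `alphabet` with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def agrees(pattern, alphabet, prop, max_len=6):
    """True if fullmatch(pattern) and `prop` agree on every short string."""
    return all(
        (re.fullmatch(pattern, s) is not None) == prop(s)
        for s in all_strings(alphabet, max_len)
    )


# Example 3: exactly one a, over {a, b, c}
print(agrees(r"(b|c)*a(b|c)*", "abc", lambda s: s.count("a") == 1))
# Example 4: at most three a's, over {a, b, c}
print(agrees(r"(b|c)*a?(b|c)*a?(b|c)*a?(b|c)*", "abc", lambda s: s.count("a") <= 3))
# Example 5: no "ab" substring, over {a, b}
print(agrees(r"b*a*", "ab", lambda s: "ab" not in s))
# Example 9: even length, over {a, b}
print(agrees(r"(aa|bb|ab|ba)*", "ab", lambda s: len(s) % 2 == 0))
```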

3 Finite Automata
3.1 Introduction
A recognizer for a language is a program that takes a string x, and answers “yes” if x is a sentence of that
language, and “no” otherwise.
We call a recognizer for tokens a finite automaton; it can be
1. Non-deterministic (NFA): slower, but it may take less space.
1) S: a set of states.
2) Σ: a set of input symbols (alphabet).
3) move: a transition function that maps state-symbol pairs (where the symbol may also be ϵ) to sets of states.
4) s0: a start (initial) state.
5) F: a set of accepting states (final states).
Note that ϵ-transitions are allowed in an NFA.
Example:
2. Deterministic (DFA): faster recognizer, but it may take more space - widely used.
• A DFA is a special form of an NFA.
• No ϵ-transitions.
• For each state s and input symbol a, there is at most one edge labeled a leaving s.
Example:
Two Algorithms:
Algorithm 1: Regular Expression → NFA → DFA
Algorithm 2: Regular Expression → DFA
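To illustrate why a DFA is a fast recognizer, here is a minimal sketch in which the transition function is a Python dict from (state, symbol) to the next state (an assumed representation). It is loaded with the five-state DFA for (a|b)∗abb from the "Another Example" transition table of Algorithm 1; recognition is a single pass with one lookup per input symbol:

```python
# DFA for (a|b)*abb: states A..E, start state A, accepting state E.
# A missing (state, symbol) key would mean "no edge", i.e. reject.
DFA_MOVE = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}
START, FINAL = "A", {"E"}


def dfa_accepts(move, start, final, text):
    """Follow at most one edge per input symbol; accept if we end in a final state."""
    state = start
    for symbol in text:
        if (state, symbol) not in move:
            return False
        state = move[(state, symbol)]
    return state in final


for w in ["abb", "aabb", "babb", "ab", "abba"]:
    print(w, dfa_accepts(DFA_MOVE, START, FINAL, w))
```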

Algorithm 1: Regular Expression → NFA → DFA
1. Thompson’s Construction: Regular Expression → NFA
Operation        Description
ϵ-closure(s)     Set of NFA states reachable from NFA state s on ϵ-transitions alone.
ϵ-closure(T)     Set of NFA states reachable from some NFA state s in set T on ϵ-transitions alone;
                 $\epsilon\text{-closure}(T) = \bigcup_{s \in T} \epsilon\text{-closure}(s)$.
move(T, a)       Set of NFA states to which there is a transition on input symbol ‘a’ from some state s in T.
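The ϵ-closure and move operations can be sketched in Python as follows. The NFA is assumed to be the standard Thompson NFA for (a|b)∗a with states 0 to 8 and accepting state 8; the figure is not reproduced in these notes, but this numbering yields the NFA-state sets in the (a|b)∗a transition table of this section. `EPS`, `MOVES`, `eps_closure` and `move` are illustrative names:

```python
# Assumed Thompson NFA for (a|b)*a.
# EPS[s]: states reachable from s by one ϵ-edge; MOVES[(s, c)]: targets of s on symbol c.
EPS = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
MOVES = {(2, "a"): {3}, (4, "b"): {5}, (7, "a"): {8}}


def eps_closure(states):
    """ϵ-closure(T): all states reachable from T using ϵ-edges alone."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in EPS.get(s, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(states, symbol):
    """move(T, a): states reachable from some s in T by one edge labeled a."""
    result = set()
    for s in states:
        result |= MOVES.get((s, symbol), set())
    return result


start = eps_closure({0})
print(sorted(start))                          # [0, 1, 2, 4, 7]
print(sorted(eps_closure(move(start, "a"))))  # [1, 2, 3, 4, 6, 7, 8]
print(sorted(eps_closure(move(start, "b"))))  # [1, 2, 4, 5, 6, 7]
```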

Example to recognize the regular expression (a|b)∗ a
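Thompson's construction builds an NFA bottom-up, one small fragment per symbol or operator. The following is a minimal Python sketch using hypothetical combinators (`symbol`, `concat`, `union`, `star`) and a hypothetical `NFA` class. As a simplification, `concat` joins two fragments with an ϵ-edge instead of merging the first fragment's accepting state with the second's start state, so its states and numbering differ from the example NFA:

```python
from itertools import count


class NFA:
    """Thompson fragment with one start and one accepting state.

    Edges are (source, label, target) triples; label None stands for ϵ.
    """

    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges


_ids = count()  # supplies fresh state numbers


def symbol(c):
    s, f = next(_ids), next(_ids)
    return NFA(s, f, [(s, c, f)])


def concat(n1, n2):
    # ϵ-edge from n1's accepting state to n2's start state
    return NFA(n1.start, n2.accept, n1.edges + n2.edges + [(n1.accept, None, n2.start)])


def union(n1, n2):
    s, f = next(_ids), next(_ids)
    return NFA(s, f, n1.edges + n2.edges + [
        (s, None, n1.start), (s, None, n2.start),
        (n1.accept, None, f), (n2.accept, None, f),
    ])


def star(n):
    s, f = next(_ids), next(_ids)
    return NFA(s, f, n.edges + [
        (s, None, n.start), (s, None, f),                # enter once, or skip
        (n.accept, None, n.start), (n.accept, None, f),  # repeat, or leave
    ])


def accepts(nfa, text):
    """Simulate the NFA by tracking the ϵ-closed set of current states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for src, label, dst in nfa.edges:
                if src == s and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({nfa.start})
    for c in text:
        current = closure({dst for src, label, dst in nfa.edges
                           if src in current and label == c})
    return nfa.accept in current


# (a|b)*a
nfa = concat(star(union(symbol("a"), symbol("b"))), symbol("a"))
for w in ["a", "ba", "abba", "", "ab"]:
    print(repr(w), accepts(nfa, w))
```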
2. Converting NFA → DFA

NFA State DFA State a b Type
{0, 1, 2, 4, 7} S0 S1 S2 Start
{1, 2, 3, 4, 6, 7, 8} S1 S1 S2 Final
{1, 2, 4, 5, 6, 7} S2 S1 S2 -
Transition table for the DFA
Another Example:
NFA for (a|b)∗abb
NFA State DFA State a b Type
{0, 1, 2, 4, 7} A B C Start
{1, 2, 3, 4, 6, 7, 8} B B D -
{1, 2, 4, 5, 6, 7} C B C -
{1, 2, 4, 5, 6, 7, 9} D B E -
{1, 2, 4, 5, 6, 7, 10} E B C Final
Transition table for the DFA
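The subset construction behind these tables can be sketched as follows. The NFA for (a|b)∗abb is assumed to use the standard Thompson numbering 0 to 10 with accepting state 10 (an assumption, since the NFA figure is not reproduced here); DFA states are named A, B, C, ... in the order they are discovered, and all names in the code are illustrative:

```python
# Assumed Thompson NFA for (a|b)*abb (accepting state 10).
EPS = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
MOVES = {(2, "a"): {3}, (4, "b"): {5}, (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10}}
ALPHABET = ["a", "b"]
NFA_FINAL = 10


def eps_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        for t in EPS.get(stack.pop(), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)  # hashable, so it can serve as a DFA state


def move(states, symbol):
    return {t for s in states for t in MOVES.get((s, symbol), set())}


def subset_construction():
    """Each DFA state is the ϵ-closure of a set of NFA states."""
    start = eps_closure({0})
    dstates, unmarked, dtran = [start], [start], {}
    while unmarked:
        T = unmarked.pop(0)  # mark T
        for a in ALPHABET:
            U = eps_closure(move(T, a))
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return start, dstates, dtran


start, dstates, dtran = subset_construction()
names = {T: chr(ord("A") + i) for i, T in enumerate(dstates)}
for T in dstates:
    kind = "Final" if NFA_FINAL in T else "-"
    print(names[T], sorted(T), names[dtran[(T, "a")]], names[dtran[(T, "b")]], kind)
```

Running it prints five rows A to E with the same a and b transitions as the table; the NFA-state set it reports for E is {1, 2, 4, 5, 6, 7, 10}.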

Resulting DFA

Algorithm 2: Regular Expression → DFA
1. Augment the given regular expression by concatenating it with a special symbol # at the end.
e.g., r → (r)#
2. Create a syntax tree for this augmented regular expression.
• All alphabet symbols (plus # and the empty string ϵ) are the leaves.
• All inner nodes are the operators (concatenation, |, ∗).
3. Number each alphabet-symbol leaf (including #) with a distinct position number.
4. Calculate the functions nullable, firstpos, lastpos, and followpos:
• nullable(node): true if the subexpression rooted at node can generate the empty string ϵ.
• firstpos(node): set of positions that can match the first symbol of some string generated by the subexpression rooted at node.
• lastpos(node): set of positions that can match the last symbol of some string generated by the subexpression rooted at node.
• followpos(i): set of positions that can immediately follow position i in some string generated by the augmented regular expression.

Algorithm 1 Computing followpos
for each node n in the tree do
    if n is a concatenation node with left child c1 and right child c2 then
        for each i in lastpos(c1) do
            followpos(i) ← followpos(i) ∪ firstpos(c2)
        end for
    else if n is a star node then
        for each i in lastpos(n) do
            followpos(i) ← followpos(i) ∪ firstpos(n)
        end for
    end if
end for
Example:
For the augmented regular expression (a|b)∗a#, we first construct the syntax tree using the table above and find firstpos and lastpos for each node, as follows:
Then we calculate followpos using Algorithm 1 above; the result is as follows:
• followpos(1) = {1, 2, 3}
• followpos(2) = {1, 2, 3}
• followpos(3) = {4}
• followpos(4) = {}
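The four functions can be computed directly on the syntax tree. Below is a minimal Python sketch for the augmented expression (a1 | b2)∗ a3 #4; the `Node` class and helper names are hypothetical, ϵ-leaves are omitted for brevity, and `compute_followpos` applies the two rules of Algorithm 1:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    kind: str                      # "leaf", "cat", "or" or "star"
    pos: Optional[int] = None      # position number (leaves only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf(p):
    return Node("leaf", pos=p)


def cat(l, r):
    return Node("cat", left=l, right=r)


def alt(l, r):
    return Node("or", left=l, right=r)


def star(c):
    return Node("star", left=c)


def nullable(n):
    if n.kind == "leaf":
        return False               # this sketch has no ϵ-leaves
    if n.kind == "star":
        return True
    if n.kind == "or":
        return nullable(n.left) or nullable(n.right)
    return nullable(n.left) and nullable(n.right)  # cat


def firstpos(n):
    if n.kind == "leaf":
        return {n.pos}
    if n.kind == "star":
        return firstpos(n.left)
    if n.kind == "or" or nullable(n.left):
        return firstpos(n.left) | firstpos(n.right)
    return firstpos(n.left)        # cat whose left child is not nullable


def lastpos(n):
    if n.kind == "leaf":
        return {n.pos}
    if n.kind == "star":
        return lastpos(n.left)
    if n.kind == "or" or nullable(n.right):
        return lastpos(n.left) | lastpos(n.right)
    return lastpos(n.right)        # cat whose right child is not nullable


def compute_followpos(root, npos):
    follow = {i: set() for i in range(1, npos + 1)}
    stack = [root]
    while stack:                   # visit every node once
        n = stack.pop()
        if n.kind == "cat":        # lastpos(c1) is followed by firstpos(c2)
            for i in lastpos(n.left):
                follow[i] |= firstpos(n.right)
        elif n.kind == "star":     # lastpos(n) is followed by firstpos(n)
            for i in lastpos(n):
                follow[i] |= firstpos(n)
        stack.extend(c for c in (n.left, n.right) if c is not None)
    return follow


# (a1 | b2)* a3 #4
root = cat(cat(star(alt(leaf(1), leaf(2))), leaf(3)), leaf(4))
print(sorted(firstpos(root)))      # [1, 2, 3]
for i, f in compute_followpos(root, 4).items():
    print(i, sorted(f))
```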

Now we are ready to construct the corresponding DFA using the followpos.
$(a_1 \mid b_2)^* \, a_3 \, \#_4$
followpos(1) = {1, 2, 3}
followpos(2) = {1, 2, 3}
followpos(3) = {4}
followpos(4) = {}
S1 = firstpos(root) = {1, 2, 3}
⇓ mark S1
a: followpos(1) ∪ followpos(3) = {1, 2, 3, 4} = S2 move(S1, a) = S2
b: followpos(2) = {1, 2, 3} = S1 move(S1, b) = S1
⇓ mark S2
a: followpos(1) ∪ followpos(3) = {1, 2, 3, 4} = S2 move(S2, a) = S2
b: followpos(2) = {1, 2, 3} = S1 move(S2, b) = S1
Start State: S1
Final (accepting) State: S2
Final resulting DFA
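Building the DFA from followpos can be sketched as follows, with the followpos values and position symbols of this example hard-coded (all names are illustrative). Each DFA state is a set of positions; on symbol a, the next state is the union of followpos(p) over the positions p in the current state that hold a, and a state is accepting when it contains the position of #:

```python
# followpos and the symbol at each position of (a1 | b2)* a3 #4.
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: set()}
SYMBOL = {1: "a", 2: "b", 3: "a", 4: "#"}
END = 4                      # position of the end marker #
ALPHABET = ["a", "b"]


def followpos_to_dfa(first):
    start = frozenset(first)
    states, unmarked, dtran = [start], [start], {}
    while unmarked:
        S = unmarked.pop(0)  # mark S
        for a in ALPHABET:
            # union of followpos(p) for every position p in S holding symbol a
            U = frozenset().union(*(FOLLOWPOS[p] for p in S if SYMBOL[p] == a))
            if not U:
                continue     # no transition on a
            if U not in states:
                states.append(U)
                unmarked.append(U)
            dtran[(S, a)] = U
    accepting = [S for S in states if END in S]
    return start, states, dtran, accepting


start, states, dtran, accepting = followpos_to_dfa({1, 2, 3})  # firstpos(root)
for (S, a), U in dtran.items():
    print(sorted(S), a, "->", sorted(U))
print("accepting:", [sorted(S) for S in accepting])
```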

3.2 Minimizing Number of States of a DFA
The process of minimizing a DFA is as follows:
1. Initial Partition: divide all the states of the DFA into two groups
• G1: Accepting States.
• G2: Non-Accepting States.
2. Refining Groups: refine these groups into subgroups based on transitions
• For each group G in the current partition:
– Break G into smaller subgroups.
– Two states, s1 and s2, stay in the same subgroup if and only if, for every input symbol, they transition to states in the same group of the current partition.
• Repeat this refinement until no group can be split further.
3. Start State: the start state of the minimized DFA is the group that contains the original start state of
the DFA. This helps in keeping the starting behavior consistent with the original DFA.
4. Accepting States: the accepting states of the minimized DFA are those groups that contain at least
one of the original accepting states. This ensures that the minimized DFA correctly represents all the
accepting conditions of the original DFA.
Example:
Example DFA
1. Partition the set of states into two groups (accepting & non-accepting)
G1 = {4}
G2 = {1, 2, 3}
2. Refine the groups based on the DFA transition table
State a b
1 2 3
2 2 3
3 4 3
4 2 3
DFA transition table

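From the transition table we notice that states 1 and 2 in G2 have identical transitions: on a both go to state 2 and on b both go to state 3, and both targets are in G2. State 3, however, goes on a to state 4, which is in G1, so it must be separated from them. Refining again splits nothing further, because states 1 and 2 still move into the same groups.
G1 = {4}
G2 = {1, 2, 3} splits into {1, 2} and {3}
⇓
{1, 2} {3} {4}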
3. Start State: {1, 2}
4. Final State: {4}
Minimized DFA
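The partition-refinement procedure can be sketched in Python for this example (the table is copied from the DFA transition table above; `minimize` and `group_index` are hypothetical names). Each round groups the states of every group by which current group each input symbol sends them to, and the loop stops when a round splits nothing:

```python
# Example DFA: states 1-4, accepting state 4.
DELTA = {
    1: {"a": 2, "b": 3},
    2: {"a": 2, "b": 3},
    3: {"a": 4, "b": 3},
    4: {"a": 2, "b": 3},
}
ACCEPTING = {4}
ALPHABET = ["a", "b"]


def group_index(state, partition):
    """Index of the group in `partition` that contains `state`."""
    return next(i for i, g in enumerate(partition) if state in g)


def minimize(delta, accepting, alphabet):
    states = set(delta)
    # Initial partition: accepting vs. non-accepting (drop an empty group).
    partition = [g for g in (set(accepting), states - set(accepting)) if g]
    while True:
        refined = []
        for group in partition:
            buckets = {}
            for s in sorted(group):
                # States stay together iff every symbol leads them to the same group.
                signature = tuple(group_index(delta[s][a], partition) for a in alphabet)
                buckets.setdefault(signature, set()).add(s)
            refined.extend(buckets.values())
        if len(refined) == len(partition):  # nothing was split: stable
            return refined
        partition = refined


groups = minimize(DELTA, ACCEPTING, ALPHABET)
print(sorted(sorted(g) for g in groups))  # [[1, 2], [3], [4]]
```

The result agrees with the partition found by hand: {1, 2}, which the notes give as the start state, {3}, and the accepting group {4}.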
Online Resources:
• Regular Expression to DFA, NFA, Minimize
