4. SPECIFICATION OF TOKENS
⌐ Regular expressions are an important notation for specifying lexeme patterns. While they
cannot express all possible patterns, they are very effective in specifying the types of
patterns that we actually need for tokens.
⌐ For lexical analysis we care about regular languages, which can be described using
regular expressions.
⌐ Q: How are tokens defined and recognized?
Ans: By using regular expressions to define a token as a formal regular language.
Compilers (CPL5316) # 4 Lectured by : Rebaz Najeeb
5. STRINGS AND LANGUAGES
⌐ An alphabet: is any finite set of symbols such as letters, digits, and punctuation. It is denoted by
the symbol ∑.
i. The set {0,1} is the binary alphabet.
⌐ A string:
⌐ Is a finite sequence of symbols drawn from an alphabet.
⌐ The terms "sentence" and "word" are often used as synonyms for "string."
⌐ The empty string is the string of length zero. It is denoted by ε.
⌐ |s| represents the length of a string s. Ex: if s = banana, then |s| = 6.
⌐ Language:
⌐ Is any countable set of strings over some fixed alphabet. Ex: if ∑ = {A, ..., Z}, then
{"A", "B", "C", "BF", ..., "ABZ", ...} is a language over ∑.
6. OPERATIONS ON LANGUAGES
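⌐ In the examples below, A is the set of letters and B is the set of digits.
⌐ Union: A U B is the set of strings that are in A or in B. Ex: A U B is the set of letters and digits.
⌐ Concatenation: AB is the set of strings formed by a string from A followed by a string from B.
Ex: AB is the set of strings of length two, each a letter followed by a digit.
⌐ Exponentiation: A^n is A concatenated with itself n times. Ex: A^4 is the set of all 4-letter strings.
⌐ Kleene Closure: A* is the union of A^0, A^1, A^2, ... Ex: A* is the set of all strings of letters, including the empty string ε.
⌐ Positive Closure: B+ is the union of B^1, B^2, ... Ex: B+ is the set of all strings of one or more digits.
⌐ A(A U B)* is the set of all strings of letters and digits beginning with a letter.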
7. EXAMPLES OF OPERATIONS
⌐ let A= {a,b,c} B= {1,2}
⌐ AB = {a1,a2,b1,b2,c1,c2}
⌐ A U B = {a,b,c,1,2}
⌐ A^3 = all strings of length three using a, b, c (27 strings: aaa, aab, ..., ccc)
⌐ A* = all strings using the letters a, b, c, including the empty string
⌐ A+ = the same as A*, but without the empty string
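The operations above can be tried out on small finite sets. The following is a minimal sketch, assuming languages are represented as Python sets of strings; the helper names concat, power, and star_up_to are made up for illustration. Because A* is infinite, star_up_to only builds the part of it up to a chosen exponent k.

```python
def concat(X, Y):
    """Concatenation XY: every string of X followed by every string of Y."""
    return {x + y for x in X for y in Y}

def power(X, n):
    """Exponentiation X^n: X concatenated with itself n times; X^0 = {''}."""
    result = {""}
    for _ in range(n):
        result = concat(result, X)
    return result

def star_up_to(X, k):
    """Finite part of the Kleene closure: X^0 U X^1 U ... U X^k."""
    out = set()
    for n in range(k + 1):
        out |= power(X, n)
    return out

A = {"a", "b", "c"}
B = {"1", "2"}

print(sorted(concat(A, B)))   # ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']
print(sorted(A | B))          # ['1', '2', 'a', 'b', 'c']
print(len(power(A, 3)))       # 27
print("" in star_up_to(A, 2)) # True
```

The positive closure up to k is the same set with the empty string removed, as long as the empty string is not itself in the language.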
8. Regular expressions rules
⌐ We may remove parentheses by using precedence rules.
⌐ * highest
⌐ concatenation next
⌐ | lowest
Ex: a | (b(c*)) = a | bc*
⌐ Let ∑ = {a, b}
⌐ a | b => stands for the set {a, b}
⌐ ab => stands for the set {ab}
⌐ (a | b)(a | b) => {aa, ab, ba, bb}
⌐ a* => {ε, a, aa, aaa, ...}
⌐ (a | b)* => all strings containing zero or more instances of a's and b's {ε, a, b, aa, ab, ba, bb, aaa, ...}
⌐ a | a*b => {a, b, ab, aab, aaab, ...}
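Python's re module uses the same precedence (star, then concatenation, then |), so the sets above can be reproduced by brute force. This is only a sketch: the helper name matches is made up, and it lists just the strings up to a small length rather than the whole (infinite) language.

```python
import re
from itertools import product

def matches(pattern, max_len=3, alphabet="ab"):
    """Strings over the alphabet, up to max_len, that the pattern matches in full."""
    found = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if re.fullmatch(pattern, s):
                found.append(s)
    return found

print(matches("(a|b)(a|b)"))  # ['aa', 'ab', 'ba', 'bb']
print(matches("a|a*b"))       # ['a', 'b', 'ab', 'aab']
print(matches("a|(a*b)"))     # same list: the parentheses are redundant
print(matches("(a|a)*b"))     # ['b', 'ab', 'aab']: different grouping, different set
```

Forcing a different grouping with parentheses changes the language, which is why the precedence rules matter when parentheses are removed.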
9. REGULAR DEFINITION
⌐ Regular definitions are multiline regular expressions. Each line gives a name to an expression and can refer to
any of the previous lines, but not to itself or to subsequent lines.
⌐ Writing a regular expression for some languages can be difficult or quite complex. In those cases, we may
use regular definitions (names).
⌐ Ex: define identifiers using RegEx
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter (letter | digit)*
Expanded: id == (A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) )*
⌐ Ex: Unsigned numbers
digit → 0 | 1 | ... | 9
digits → digit digit*
opt-fraction → . digits | ε (opt = optional)
number → digits opt-fraction
** Regular definitions are not recursive: digits → digit digits | digit is wrong!
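A regular definition maps naturally onto building a pattern from named pieces, where each name uses only names defined before it. The sketch below assumes Python's re module and character classes as shorthand for the long alternations; the name number (digits followed by opt-fraction) and the helper functions are assumptions made for illustration.

```python
import re

# Each name refers only to earlier names, never to itself.
letter = "[A-Za-z]"                       # letter → A | ... | Z | a | ... | z
digit = "[0-9]"                           # digit → 0 | 1 | ... | 9
ident = f"{letter}({letter}|{digit})*"    # id → letter (letter | digit)*

digits = f"{digit}{digit}*"               # digits → digit digit*
opt_fraction = f"(\\.{digits})?"          # opt-fraction → . digits | ε
number = f"{digits}{opt_fraction}"        # assumed: number → digits opt-fraction

def is_identifier(s):
    """True if the whole string s matches the id definition."""
    return re.fullmatch(ident, s) is not None

def is_unsigned_number(s):
    """True if the whole string s matches the assumed number definition."""
    return re.fullmatch(number, s) is not None

print(is_identifier("count1"), is_identifier("1count"))        # True False
print(is_unsigned_number("3.14"), is_unsigned_number("3."))    # True False
```

Substituting the strings for the names is exactly the expansion step: the final pattern contains no names at all, which is only possible because the definitions are not recursive.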
10. Regular expression examples
⌐ Consider ∑ = {a, b}. Find an RE for all strings of length exactly 2.
(In these examples, + denotes union, the same as | on the earlier slides.)
L1 = {aa, ab, ba, bb}
RE = aa + ab + ba + bb = a(a+b) + b(a+b) = (a+b)(a+b)
--------------------------------------------
String length at least two: the length can be 2, 3, 4, ...
Ans: (a+b)(a+b)(a+b)*
11. Regular expression examples
⌐ Consider ∑ = {a, b}. Find an RE that covers all strings of length at most 2.
Ans: (ε+a+b)(ε+a+b)
-----------------------------------------------------------
String length is an even number
L1 = {ε, aa, ab, ba, bb, aaaa, bbba, bbaa, ...}
Ans: ((a+b)(a+b))*
((a+b)(a+b))^n = ((a+b)^2)^n = (a+b)^(2n), where n >= 0
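The length-based answers can be checked by brute force. This sketch assumes Python's re module and a made-up helper to_python that rewrites the slide notation (+ for union, ε for the empty string) into Python syntax. It compares each expression with a plain length test on every string over {a, b} up to a small bound, which is evidence of correctness, not a proof.

```python
import re
from itertools import product

def to_python(slide_re):
    """Slide notation -> Python re syntax: '+' is union, ε is the empty string."""
    return slide_re.replace(" ", "").replace("+", "|").replace("ε", "")

def agrees(slide_re, predicate, max_len=6):
    """True if the RE and the predicate accept the same strings up to max_len."""
    pattern = re.compile(to_python(slide_re))
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if (pattern.fullmatch(s) is not None) != predicate(s):
                return False
    return True

checks = {
    "(a+b)(a+b)": lambda s: len(s) == 2,         # exactly 2
    "(a+b)(a+b)(a+b)*": lambda s: len(s) >= 2,   # at least 2
    "(ε+a+b)(ε+a+b)": lambda s: len(s) <= 2,     # at most 2
    "((a+b)(a+b))*": lambda s: len(s) % 2 == 0,  # even length
}

for slide_re, predicate in checks.items():
    print(slide_re, agrees(slide_re, predicate))  # each line ends with True
```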
13. Regular expression examples
Consider ∑ = {a, b}. Find an RE that covers all strings that have exactly two a's.
Ans: b* a b* a b*
-----------------------------------------
At least two a's
Ans: b* a b* a (a+b)*
---------------------------------------------------------
At most two a's: L1 = {ε, a, b, bb, ab, aab, bbabba, ...}
Ans: b* (a+ε) b* (a+ε) b*
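A useful habit when writing such expressions is to search for a counterexample. The sketch below assumes Python's re module, with the answers rewritten in Python syntax (| for union, an empty alternative for ε); the helper name first_disagreement is made up. It returns the shortest string on which an expression and a counting test disagree, or None if there is none up to the bound.

```python
import re
from itertools import product

def first_disagreement(pattern, predicate, max_len=7):
    """Shortest string over {a, b} where pattern and predicate disagree, else None."""
    compiled = re.compile(pattern)
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if (compiled.fullmatch(s) is not None) != predicate(s):
                return s
    return None

exactly_two = "b*ab*ab*"
at_least_two = "b*ab*a(a|b)*"
at_most_two = "b*(a|)b*(a|)b*"

print(first_disagreement(exactly_two, lambda s: s.count("a") == 2))   # None
print(first_disagreement(at_least_two, lambda s: s.count("a") >= 2))  # None
print(first_disagreement(at_most_two, lambda s: s.count("a") <= 2))   # None

# A deliberately wrong candidate that forgets the trailing b*:
print(first_disagreement("b*ab*a", lambda s: s.count("a") == 2))      # aab
```

The wrong candidate shows why the final b* is needed: without it, nothing may follow the second a.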
14. Regular expression examples
Consider ∑ = {a, b}. Find an RE that covers all strings in which the number of a's is even.
Ans: (b* a b* a b*)* b*
-----------------------------
Strings that start with a
Ans: a (a+b)*
----------------------------------------
Strings that start and end with different symbols
Ans: a (a+b)* b + b (a+b)* a
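These three answers can be sanity-checked against a direct description of each language. The sketch assumes Python's re module, with + written as |; the names all_strings and candidates are made up, and the check only covers strings up to length 8.

```python
import re
from itertools import product

def all_strings(max_len, alphabet="ab"):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

candidates = [
    # (expression in Python syntax, direct description of the language)
    ("(b*ab*ab*)*b*", lambda s: s.count("a") % 2 == 0),
    ("a(a|b)*", lambda s: s.startswith("a")),
    ("a(a|b)*b|b(a|b)*a", lambda s: len(s) >= 2 and s[0] != s[-1]),
]

for pattern, describes in candidates:
    compiled = re.compile(pattern)
    ok = all((compiled.fullmatch(s) is not None) == describes(s)
             for s in all_strings(8))
    print(pattern, ok)  # each line ends with True
```

Note that the trailing b* in the first answer is what lets it accept strings with no a at all, such as bbb, since zero is an even number.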
15. Catch patterns
Consider ∑ = {0-9}. Find an RE that catches all the numbers that result from multiplying a number of
the form (9)+ by any number. For example, where (9)^n is a run of n 9s:
(9)^n * 2 = 1 (9)^(n-1) 8
(9)^n * 3 = 2 (9)^(n-1) 7
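The two identities above follow a common shape that can be checked numerically. The sketch below rests on an assumption not stated on the slide: the multiplier k is a single digit from 2 to 9, in which case (9)^n * k is the digit k-1, then n-1 nines, then the digit 10-k. The helper name nines and the combined pattern are illustrative only.

```python
import re

# Assumption: the multiplier k is a single digit, 2..9.
# (9)^n * k = (k-1) (9)^(n-1) (10-k), so each k contributes one alternative,
# e.g. k = 2 gives 19*8 and k = 3 gives 29*7.
alternatives = [f"{k - 1}9*{10 - k}" for k in range(2, 10)]
pattern = re.compile("|".join(alternatives))

def nines(n):
    """The number written as n consecutive 9s."""
    return int("9" * n)

for n in range(1, 5):
    for k in (2, 3):
        value = nines(n) * k
        expected = f"{k - 1}{'9' * (n - 1)}{10 - k}"
        print(n, k, value, str(value) == expected,
              pattern.fullmatch(str(value)) is not None)
```

The first and last digits always add up to 9, which is why the pattern is a union of eight fixed alternatives rather than a single expression with two free digits.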
16. Homework
Consider ∑ = {letters, digits}. Find an RE that covers valid email addresses.
Ex: Rebaz.Najeeb@gmail.com