SlideShare a Scribd company logo
Phases of Syntax Analysis
1. Identify the words: Lexical Analysis.
Converts a stream of characters (input program) into a stream of tokens.
Also called Scanning or Tokenizing.
2. Identify the sentences: Parsing.
Derive the structure of sentences: construct parse trees from a stream of tokens.
Lexical Analysis
Convert a stream of characters into a stream of tokens.
• Simplicity: Conventions about “words” are often different from conventions about “sentences”.
• Efficiency: Word identification problem has a much more efficient solution than sentence identification problem.
• Portability: Character set, special characters, device features.
Terminology
• Token: Name given to a family of words.
e.g., integer constant
• Lexeme: Actual sequence of characters representing a word.
e.g., 32894
• Pattern: Notation used to identify the set of lexemes represented by a token.
e.g., [0 − 9]+
Terminology
A few more examples:
Token Sample Lexemes Pattern
while while while
integer constant 32894, -1093, 0 [0-9]+
identifier buffer size [a-zA-Z]+
Patterns
How do we compactly represent the set of all lexemes corresponding to a token?
For instance:
The token integer constant represents the set of all integers: that is, all sequences of digits (0–9), preceded by an optional
sign (+ or −).
Obviously, we cannot simply enumerate all lexemes.
Use Regular Expressions.
Regular Expressions
Notation to represent (potentially) infinite sets of strings over alphabet Σ.
• a: stands for the set {a} that contains a single string a.
⊲ Analogous to Union.
• ab: stands for the set {ab} that contains a single string ab.
⊲ Analogous to Product.
⊲ (a|b)(a|b): stands for the set {aa, ab, ba, bb}.
• a∗
: stands for the set {ǫ, a, aa, aaa, . . .} that contains all strings of zero or more a’s.
⊲ Analogous to closure of the product operation.
Regular Expressions
Examples of Regular Expressions over {a, b}:
• (a|b)∗
: Set of strings with zero or more a’s and zero or more b’s:
{ǫ, a, b, aa, ab, ba, bb, aaa, aab, . . .}
• (a∗
b∗
): Set of strings with zero or more a’s and zero or more b’s such that all a’s occur before any b:
{ǫ, a, b, aa, ab, bb, aaa, aab, abb, . . .}
• (a∗
b∗
)∗
: Set of strings with zero or more a’s and zero or more b’s:
{ǫ, a, b, aa, ab, ba, bb, aaa, aab, . . .}
Language of Regular Expressions
Let R be the set of all regular expressions over Σ. Then,
• Empty String: ǫ ∈ R
• Unit Strings: α ∈ Σ ⇒ α ∈ R
• Concatenation: r1, r2 ∈ R ⇒ r1r2 ∈ R
• Alternative: r1, r2 ∈ R ⇒ (r1 | r2) ∈ R
• Kleene Closure: r ∈ R ⇒ r∗
∈ R
Regular Expressions
Example: (a | b)∗
L0 = {ǫ}
L1 = L0 · {a, b}
= {ǫ} · {a, b}
= {a, b}
L2 = L1 · {a, b}
= {a, b} · {a, b}
= {aa, ab, ba, bb}
L3 = L2 · {a, b}
...
L =
∞
i=0
Li = {ǫ, a, b, aa, ab, ba, bb, . . .}
Semantics of Regular Expressions
Semantic Function L : Maps regular expressions to sets of strings.
L(ǫ) = {ǫ}
L(α) = {α} (α ∈ Σ)
L(r1 | r2) = L(r1) ∪ L(r2)
L(r1 r2) = L(r1) · L(r2)
L(r∗
) = {ǫ} ∪ (L(r) · L(r∗
))
Computing the Semantics
L(a) = {a}
L(a | b) = L(a) ∪ L(b)
= {a} ∪ {b}
= {a, b}
L(ab) = L(a) · L(b)
= {a} · {b}
= {ab}
L((a | b)(a | b)) = L(a | b) · L(a | b)
= {a, b} · {a, b}
= {aa, ab, ba, bb}
Computing the Semantics of Closure
Example: L((a | b)∗
)
= {ǫ} ∪ (L(a | b) · L((a | b)∗
))
L0 = {ǫ} Base case
L1 = {ǫ} ∪ ({a, b} · L0)
= {ǫ} ∪ ({a, b} · {ǫ})
= {ǫ, a, b}
L2 = {ǫ} ∪ ({a, b} · L1)
= {ǫ} ∪ ({a, b} · {ǫ, a, b})
= {ǫ, a, b, aa, ab, ba, bb}
...
L((a | b)∗
) = L∞ = {ǫ, a, b, aa, ab, ba, bb, . . .}
Another Example
L((a∗
b∗
)∗
) :
L(a∗
) = {ǫ, a, aa, . . .}
L(b∗
) = {ǫ, b, bb, . . .}
L(a∗
b∗
) = {ǫ, a, b, aa, ab, bb,
aaa, aab, abb, bbb, . . .}
L((a∗
b∗
)∗
) = {ǫ}
∪{ǫ, a, b, aa, ab, bb,
aaa, aab, abb, bbb, . . .}
∪{ǫ, a, b, aa, ab, ba, bb,
aaa, aab, aba, abb, baa, bab, bba, bbb, . . .}
.
.
.
Regular Definitions
Assign “names” to regular expressions.
For example,
digit −→ 0 | 1 | · · · | 9
natural −→ digit digit∗
Shorthands:
• a+
: Set of strings with one or more occurrences of a.
• a?
: Set of strings with zero or one occurrences of a.
Example:
integer −→ (+|−)?
digit+
Regular Definitions: Examples
float −→ integer . fraction
integer −→ (+|−)?
no leading zero
no leading zero −→ (nonzero digit digit∗
) | 0
fraction −→ no trailing zero exponent?
no trailing zero −→ (digit∗
nonzero digit) | 0
exponent −→ (E | e) integer
digit −→ 0 | 1 | · · · | 9
nonzero digit −→ 1 | 2 | · · · | 9
Regular Definitions and Lexical Analysis
Regular Expressions and Definitions specify sets of strings over an input alphabet.
• They can hence be used to specify the set of lexemes associated with a token.
⊲ Used as the pattern language
How do we decide whether an input string belongs to the set of strings specified by a regular expression?
Using Regular Definitions for Lexical Analysis
Q: Is ababbaabbb in L(((a∗
b∗
)∗
)?
A: Hm. Well. Let’s see.
L((a∗
b∗
)∗
) = {ǫ}
∪{ǫ, a, b, aa, ab, bb,
aaa, aab, abb, bbb, . . .}
∪{ǫ, a, b, aa, ab, ba, bb,
aaa, aab, aba, abb, baa, bab, bba, bbb, . . .}
...
= ???
Recognizers
Construct automata that recognize strings belonging to a language.
• Finite State Automata ⇒ Regular Languages
• Push Down Automata ⇒ Context-free Languages
⊲ Stack is used to maintain counter, but only one counter can go arbitrarily high.
Recognizing Finite Sets of Strings
Identifying words from a small, finite, fixed vocabulary is straightforward.
For instance, consider a stack machine with push, pop, and add operations with two constants: 0 and 1.
We can use the automaton:
s
h
p
p 0 1
u o
a
d
d
push
pop add
integer_constant
Finite State Automata
Represented by a labeled directed graph.
• A finite set of states (vertices).
• Transitions between states (edges).
• Labels on transitions are drawn from Σ ∪ {ǫ}.
• One distinguished start state.
• One or more distinguished final states.
Finite State Automata: An Example
Consider the Regular Expression (a | b)∗
a(a | b).
L((a | b)∗
a(a | b)) = {aa, ab, aaa, aab, baa, bab,
aaaa, aaab, abaa, abab, baaa, . . .}.
The following automaton determines whether an input string belongs to L((a | b)∗
a(a | b):
a
a
b b
a
1 2 3
Determinism
(a | b)∗
a(a | b):
Nondeterministic:
(NFA)
a
a
b b
a
1 2 3
Deterministic:
(DFA)
a
a
b
b
a
a
b
1 2
3
4
Acceptance Criterion
A finite state automaton (NFA or DFA) accepts an input string x
. . . if beginning from the start state
. . . we can trace some path through the automaton
. . . such that the sequence of edge labels spells x
. . . and end in a final state.
Recognition with an NFA
Is abab ∈ L((a | b)∗
a(a | b))?
a
a
b b
a
1 2 3
Input: a b a b
Path 1: 1 1 1 1 1
Path 2: 1 1 1 2 3 Accept
Path 3: 1 2 3 ⊥ ⊥
Accept
Recognition with an NFA
Is abab ∈ L((a | b)∗
a(a | b))?
a
a
b b
a
1 2 3
Input: a b a b
Path 1: 1 1 1 1 1
Path 2: 1 1 1 2 3 Accept
Path 3: 1 2 3 ⊥ ⊥
Accept
Recognition with a DFA
Is abab ∈ L((a | b)∗
a(a | b))?
a
a
b
b
a
a
b
b
1 2
3
4
NFA vs. DFA
For every NFA, there is a DFA that accepts the same set of strings.
• NFA may have transitions labeled by ǫ.
(Spontaneous transitions)
• All transition labels in a DFA belong to Σ.
• For some string x, there may be many accepting paths in an NFA.
• For all strings x, there is one unique accepting path in a DFA.
• Usually, an input string can be recognized faster with a DFA.
• NFAs are typically smaller than the corresponding DFAs.
Regular Expressions to NFA
Thompson’s Construction: For every regular expression r, derive an NFA N(r) with unique start and final states.
ǫ
ε
α ∈ Σ
α
(r1 | r2)
N(r )
1
ε
ε
ε
ε
N(r )
2
Regular Expressions to NFA (contd.)
r1r2 N(r )2
N(r )1
ε ε
r∗
ε ε
N(r)
ε
ε
Example
(a | b)∗
a(a | b):
ε
ε ε
ε
a
b
ε ε a
ε
ε ε
ε
a
b
ε
Recognition with an NFA
Is abab ∈ L((a | b)∗
a(a | b))?
a
a
b b
a
1 2 3
Input: a b a b
Path 1: 1 1 1 1 1
Path 2: 1 1 1 2 3 Accept
Path 3: 1 2 3 ⊥ ⊥
All Paths {1} {1, 2} {1, 3} {1, 2} {1, 3} Accept
Recognition with an NFA (contd.)
Is aaab ∈ L((a | b)∗
a(a | b))?
a
a
b b
a
1 2 3
Input: a a a b
Path 1: 1 1 1 1 1
Path 2: 1 1 1 1 2
Path 3: 1 1 1 2 3 Accept
Path 4: 1 1 2 3 ⊥
Path 5: 1 2 3 ⊥ ⊥
All Paths {1} {1, 2} {1, 2, 3} {1, 2, 3} {1, 2, 3} Accept
Recognition with an NFA (contd.)
Is aabb ∈ L((a | b)∗
a(a | b))?
a
a
b
b
a
1 2 3
Input: a a a b
Path 1: 1 1 1 1 1
Path 2: 1 1 2 3 ⊥
Path 3: 1 2 3 ⊥ ⊥
All Paths {1} {1, 2} {1, 2, 3} {1, 3} {1} REJECT
Converting NFA to DFA
Subset construction
Given a set S of NFA states,
• compute Sǫ = ǫ-closure(S): Sǫ is the set of all NFA states reachable by zero or more ǫ-transitions from S.
• compute Sα = goto(S, α):
– S′
is the set of all NFA states reachable from S by taking a transition labeled α.
– Sα = ǫ-closure(S′
).
Converting NFA to DFA (contd).
Each state in DFA corresponds to a set of states in NFA.
Start state of DFA = ǫ-closure(start state of NFA).
From a state s in DFA that corresponds to a set of states S in NFA:
add a transition labeled α to state s′
that corresponds to a non-empty S′
in NFA,
such that S′
= goto(S, α).
⇐ s is a final state of DFA
NFA → DFA: An Example
a
a
b b
a
1 2 3
ǫ-closure({1}) = {1}
goto({1}, a) = {1, 2}
goto({1}, b) = {1}
goto({1, 2}, a) = {1, 2, 3}
goto({1, 2}, b) = {1, 3}
goto({1, 2, 3}, a) = {1, 2, 3}
...
NFA → DFA: An Example (contd.)
ǫ-closure({1}) = {1}
goto({1}, a) = {1, 2}
goto({1}, b) = {1}
goto({1, 2}, a) = {1, 2, 3}
goto({1, 2}, b) = {1, 3}
goto({1, 2, 3}, a) = {1, 2, 3}
goto({1, 2, 3}, b) = {1}
goto({1, 3}, a) = {1, 2}
goto({1, 3}, b) = {1}
NFA → DFA: An Example (contd.)
goto({1}, a) = {1, 2}
goto({1}, b) = {1}
goto({1, 2}, a) = {1, 2, 3}
goto({1, 2}, b) = {1, 3}
goto({1, 2, 3}, a) = {1, 2, 3}
...
a
a
b
b
a
a
b
b
{1} {1,2}
{1,3}
{1,2,3}
NFA vs. DFA
R = Size of Regular Expression
N = Length of Input String
NFA DFA
Size of
Automaton
O(R) O(2R
)
Lexical Analysis
• Regular Expressions and Definitions are used to specify the set of strings (lexemes) corresponding to a token.
• An automaton (DFA/NFA) is built from the above specifications.
• Each final state is associated with an action: emit the corresponding token.
Specifying Lexical Analysis
Consider a recognizer for integers (sequence of digits) and floats (sequence of digits separated by a decimal point).
[0-9]+ { emit(INTEGER_CONSTANT); }
[0-9]+"."[0-9]+ { emit(FLOAT_CONSTANT); }
0-9
0-9
0-9
0-9
ε
0-9
0-9
ε "."
INTEGER_CONSTANT
FLOAT_CONSTANT
Lex
Tool for building lexical analyzers.
Input: lexical specifications (.l file)
Output: C function (yylex) that returns a token on each invocation.
%%
[0-9]+ { return(INTEGER_CONSTANT); }
[0-9]+"."[0-9]+ { return(FLOAT_CONSTANT); }
Tokens are simply integers (#define’s).
Lex Specifications
%{
C header statements for inclusion
%}
Regular Definitions e.g.:
digit [0-9]
%%
Token Specifications e.g.:
{digit}+ { return(INTEGER_CONSTANT); }
%%
Support functions in C
Regular Expressions in Lex
• Range: [0-7]: Integers from 0 through 7 (inclusive)
[a-nx-zA-Q]: Letters a thru n, x thru z and A thru Q.
• Exception: [^/]: Any character other than /.
• Definition: {digit}: Use the previously specified regular definition digit.
• Special characters: Connectives of regular expression, convenience features.
e.g.: | * ^
Special Characters in Lex
| * + ? ( ) Same as in regular expressions
[ ] Enclose ranges and exceptions
{ } Enclose “names” of regular definitions
^ Used to negate a specified range (in Exception)
. Match any single character except newline
 Escape the next character
n, t Newline and Tab
For literal matching, enclose special characters in double quotes (") e.g.: "*"
Or use  to escape. e.g.: "
Examples
for Sequence of f, o, r
"||" C-style OR operator (two vert. bars)
.* Sequence of non-newline characters
[^*/]+ Sequence of characters except * and /
"[^"]*" Sequence of non-quote characters
beginning and ending with a quote
({letter}|" ")({letter}|{digit}|" ")*
C-style identifiers
A Complete Example
%{
#include <stdio.h>
#include "tokens.h"
%}
digit [0-9]
hexdigit [0-9a-f]
%%
"+" { return(PLUS); }
"-" { return(MINUS); }
{digit}+ { return(INTEGER_CONSTANT); }
{digit}+"."{digit}+ { return(FLOAT_CONSTANT); }
. { return(SYNTAX_ERROR); }
%%
Actions
Actions are attached to final states.
• Distinguish the different final states.
• Can be used to set attribute values.
• Fragment of C code (blocks enclosed by ‘{’ and ‘}’).
Attributes
Additional information about a token’s lexeme.
• Stored in variable yylval
• Type of attributes (usually a union) specified by YYSTYPE
• Additional variables:
– yytext: Lexeme (Actual text string)
– yyleng: length of string in yytext
⊲ yylineno: Current line number (number of ‘n’ seen thus far)
∗ enabled by %option yylineno
Priority of matching
What if an input string matches more than one pattern?
"if" { return(TOKEN_IF); }
{letter}+ { return(TOKEN_ID); }
"while" { return(TOKEN_WHILE); }
• A pattern that matches the longest string is chosen.
Example: if1 is matched with an identifier, not the keyword if.
• Of patterns that match strings of same length, the first (from the top of file) is chosen.
Example: while is matched as an identifier, not the keyword while.
Constructing Scanners using (f)lex
• Scanner specifications: specifications.l
(f)lex
specifications.l −−−−→ lex.yy.c
• Generated scanner in lex.yy.c
(g)cc
lex.yy.c −−−−→ executable
– yywrap(): hook for signalling end of file.
– Use -lfl (flex) or -ll (lex) flags at link time to include default function yywrap() that always returns 1.
Implementing a Scanner
transition : state × Σ → state
algorithm scanner() {
current state = start state;
while (1) {
c = getc(); /* on end of file, ... */
if defined(transition(current state, c))
current state = transition(current state, c);
else
return s;
}
Implementing a Scanner (contd.)
Implementing the transition function:
• Simplest: 2-D array.
Space inefficient.
• Traditionally compressed using row/colum equivalence. (default on (f)lex)
Good space-time tradeoff.
• Further table compression using various techniques:
– Example: RDM (Row Displacement Method):
Store rows in overlapping manner using 2 1-D arrays.
Smaller tables, but longer access times.
Lexical Analysis: A Summary
Convert a stream of characters into a stream of tokens.
• Make rest of compiler independent of character set
• Strip off comments
• Recognize line numbers
• Ignore white space characters
• Process macros (definitions and uses)
• Interface with symbol (name) table.

More Related Content

What's hot

theory of computation lecture 02
theory of computation lecture 02theory of computation lecture 02
theory of computation lecture 02
8threspecter
 
Theory of computation Lec2
Theory of computation Lec2Theory of computation Lec2
Theory of computation Lec2
Arab Open University and Cairo University
 
Theory of Computation Regular Expressions, Minimisation & Pumping Lemma
Theory of Computation Regular Expressions, Minimisation & Pumping LemmaTheory of Computation Regular Expressions, Minimisation & Pumping Lemma
Theory of Computation Regular Expressions, Minimisation & Pumping Lemma
Rushabh2428
 
Automata theory
Automata theoryAutomata theory
Automata theory
Pardeep Vats
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)bolovv
 
Automata theory introduction
Automata theory introductionAutomata theory introduction
Automata theory introduction
NAMRATA BORKAR
 
Chapter Three(2)
Chapter Three(2)Chapter Three(2)
Chapter Three(2)bolovv
 
Ch 2 lattice & boolean algebra
Ch 2 lattice & boolean algebraCh 2 lattice & boolean algebra
Ch 2 lattice & boolean algebra
Rupali Rana
 
Aho corasick-lecture
Aho corasick-lectureAho corasick-lecture
Aho corasick-lecture
PekkaKilpelinen2
 
Regular expression
Regular expressionRegular expression
Regular expression
Rajon
 
AUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTESAUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTES
suthi
 
Finite automata examples
Finite automata examplesFinite automata examples
Finite automata examplesankitamakin
 
Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...
Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...
Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...
Bilal Amjad
 
regular expressions (Regex)
regular expressions (Regex)regular expressions (Regex)
regular expressions (Regex)
Rebaz Najeeb
 
FLAT Notes
FLAT NotesFLAT Notes
FLAT Notes
dilip kumar
 
string in C
string in Cstring in C
Chapter Eight(2)
Chapter Eight(2)Chapter Eight(2)
Chapter Eight(2)bolovv
 

What's hot (19)

theory of computation lecture 02
theory of computation lecture 02theory of computation lecture 02
theory of computation lecture 02
 
Theory of computation Lec2
Theory of computation Lec2Theory of computation Lec2
Theory of computation Lec2
 
Ch3
Ch3Ch3
Ch3
 
Theory of Computation Regular Expressions, Minimisation & Pumping Lemma
Theory of Computation Regular Expressions, Minimisation & Pumping LemmaTheory of Computation Regular Expressions, Minimisation & Pumping Lemma
Theory of Computation Regular Expressions, Minimisation & Pumping Lemma
 
Automata theory
Automata theoryAutomata theory
Automata theory
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)
 
Automata theory introduction
Automata theory introductionAutomata theory introduction
Automata theory introduction
 
Chapter Three(2)
Chapter Three(2)Chapter Three(2)
Chapter Three(2)
 
Ch 2 lattice & boolean algebra
Ch 2 lattice & boolean algebraCh 2 lattice & boolean algebra
Ch 2 lattice & boolean algebra
 
Aho corasick-lecture
Aho corasick-lectureAho corasick-lecture
Aho corasick-lecture
 
Regular expression
Regular expressionRegular expression
Regular expression
 
AUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTESAUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTES
 
Finite automata examples
Finite automata examplesFinite automata examples
Finite automata examples
 
Ch4a
Ch4aCh4a
Ch4a
 
Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...
Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...
Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...
 
regular expressions (Regex)
regular expressions (Regex)regular expressions (Regex)
regular expressions (Regex)
 
FLAT Notes
FLAT NotesFLAT Notes
FLAT Notes
 
string in C
string in Cstring in C
string in C
 
Chapter Eight(2)
Chapter Eight(2)Chapter Eight(2)
Chapter Eight(2)
 

Viewers also liked

Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014
Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014
Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014
Snorre Tørriseng
 
Evaluation Question 2
Evaluation Question 2Evaluation Question 2
Evaluation Question 2rturner93
 
Atheism - By Suhit Kulkarni
Atheism - By Suhit KulkarniAtheism - By Suhit Kulkarni
Atheism - By Suhit Kulkarni
Suhit Kulkarni
 
Will i guess_your_birthdate
Will i guess_your_birthdateWill i guess_your_birthdate
Will i guess_your_birthdate
lady_shine
 
Cilc2013 .hr.wk.ewv 20130314
Cilc2013 .hr.wk.ewv 20130314Cilc2013 .hr.wk.ewv 20130314
Cilc2013 .hr.wk.ewv 20130314Heimo Rainer
 
Digital Workplace by Lizard Soft
Digital Workplace by Lizard SoftDigital Workplace by Lizard Soft
Digital Workplace by Lizard Soft
Igor Petrushyn
 
Criteria for the design of pressure transducer adapter systems
Criteria for the design of pressure transducer adapter systemsCriteria for the design of pressure transducer adapter systems
Criteria for the design of pressure transducer adapter systems
Yavuz özkaptan
 
A team may 23 2013
A team may 23 2013A team may 23 2013
A team may 23 2013bscisteam
 
BI Apps OLAP & Reports- SSAS 2012 Tabular & Multidimensional
BI Apps  OLAP & Reports- SSAS 2012 Tabular & Multidimensional BI Apps  OLAP & Reports- SSAS 2012 Tabular & Multidimensional
BI Apps OLAP & Reports- SSAS 2012 Tabular & Multidimensional Sunny U Okoro
 
Lezione2schetchup 111126133835-phpapp01
Lezione2schetchup 111126133835-phpapp01Lezione2schetchup 111126133835-phpapp01
Lezione2schetchup 111126133835-phpapp01
Giuliana Finco
 
Two way fine art pen & brush
Two way fine art pen & brushTwo way fine art pen & brush
Two way fine art pen & brushibec546
 
Mobile Marketing
Mobile MarketingMobile Marketing
Mobile Marketing
Mark Wilson
 
Hotel web ranking
Hotel web rankingHotel web ranking
Hotel web rankingcmhagc
 
Pat Ward - Church Growth Clinic
Pat Ward - Church Growth ClinicPat Ward - Church Growth Clinic
Pat Ward - Church Growth Clinic
theorchardoxford
 
гост пеноблок
гост пеноблокгост пеноблок
гост пеноблок
Al Maks
 
Pengenalan Kepada Teknologi Multimedia Part 3
Pengenalan Kepada Teknologi Multimedia Part 3Pengenalan Kepada Teknologi Multimedia Part 3
Pengenalan Kepada Teknologi Multimedia Part 3
Noor Hafizah Abd. Rahim
 
Mrs craig final exam 3
Mrs craig   final exam 3Mrs craig   final exam 3
Mrs craig final exam 3cmhagc
 
Value chains which unlock market opportunities
Value chains which unlock market opportunitiesValue chains which unlock market opportunities
Value chains which unlock market opportunities
agbiz
 

Viewers also liked (20)

Evaluation 3
Evaluation 3Evaluation 3
Evaluation 3
 
Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014
Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014
Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014
 
Evaluation Question 2
Evaluation Question 2Evaluation Question 2
Evaluation Question 2
 
Atheism - By Suhit Kulkarni
Atheism - By Suhit KulkarniAtheism - By Suhit Kulkarni
Atheism - By Suhit Kulkarni
 
Will i guess_your_birthdate
Will i guess_your_birthdateWill i guess_your_birthdate
Will i guess_your_birthdate
 
Cilc2013 .hr.wk.ewv 20130314
Cilc2013 .hr.wk.ewv 20130314Cilc2013 .hr.wk.ewv 20130314
Cilc2013 .hr.wk.ewv 20130314
 
Digital Workplace by Lizard Soft
Digital Workplace by Lizard SoftDigital Workplace by Lizard Soft
Digital Workplace by Lizard Soft
 
Essai
EssaiEssai
Essai
 
Criteria for the design of pressure transducer adapter systems
Criteria for the design of pressure transducer adapter systemsCriteria for the design of pressure transducer adapter systems
Criteria for the design of pressure transducer adapter systems
 
A team may 23 2013
A team may 23 2013A team may 23 2013
A team may 23 2013
 
BI Apps OLAP & Reports- SSAS 2012 Tabular & Multidimensional
BI Apps  OLAP & Reports- SSAS 2012 Tabular & Multidimensional BI Apps  OLAP & Reports- SSAS 2012 Tabular & Multidimensional
BI Apps OLAP & Reports- SSAS 2012 Tabular & Multidimensional
 
Lezione2schetchup 111126133835-phpapp01
Lezione2schetchup 111126133835-phpapp01Lezione2schetchup 111126133835-phpapp01
Lezione2schetchup 111126133835-phpapp01
 
Two way fine art pen & brush
Two way fine art pen & brushTwo way fine art pen & brush
Two way fine art pen & brush
 
Mobile Marketing
Mobile MarketingMobile Marketing
Mobile Marketing
 
Hotel web ranking
Hotel web rankingHotel web ranking
Hotel web ranking
 
Pat Ward - Church Growth Clinic
Pat Ward - Church Growth ClinicPat Ward - Church Growth Clinic
Pat Ward - Church Growth Clinic
 
гост пеноблок
гост пеноблокгост пеноблок
гост пеноблок
 
Pengenalan Kepada Teknologi Multimedia Part 3
Pengenalan Kepada Teknologi Multimedia Part 3Pengenalan Kepada Teknologi Multimedia Part 3
Pengenalan Kepada Teknologi Multimedia Part 3
 
Mrs craig final exam 3
Mrs craig   final exam 3Mrs craig   final exam 3
Mrs craig final exam 3
 
Value chains which unlock market opportunities
Value chains which unlock market opportunitiesValue chains which unlock market opportunities
Value chains which unlock market opportunities
 

Similar to Lex analysis

Theory of Computation Basic Concepts and Grammar
Theory of Computation Basic Concepts and GrammarTheory of Computation Basic Concepts and Grammar
Theory of Computation Basic Concepts and Grammar
Rushabh2428
 
Mod 2_RegularExpressions.pptx
Mod 2_RegularExpressions.pptxMod 2_RegularExpressions.pptx
Mod 2_RegularExpressions.pptx
RaviAr5
 
Ch03
Ch03Ch03
Ch03
Hankyo
 
Chapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdfChapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdf
dawod yimer
 
UNIT_-_II.docx
UNIT_-_II.docxUNIT_-_II.docx
UNIT_-_II.docx
karthikeyan Muthusamy
 
Lec 3 ---- dfa
Lec 3  ---- dfaLec 3  ---- dfa
Lec 3 ---- dfa
Abdul Aziz
 
Automata theory - CFG and normal forms
Automata theory - CFG and normal formsAutomata theory - CFG and normal forms
Automata theory - CFG and normal forms
Akila Krishnamoorthy
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
MDSayem35
 
Ch02
Ch02Ch02
Ch02
Hankyo
 
Automata
AutomataAutomata
Automata
Gaditek
 
Class1
 Class1 Class1
Class1issbp
 
Lec1.pptx
Lec1.pptxLec1.pptx
Lec1.pptx
ziadk6872
 
Lesson 04
Lesson 04Lesson 04
Syntax Analyzer.pdf
Syntax Analyzer.pdfSyntax Analyzer.pdf
Syntax Analyzer.pdf
kenilpatel65
 
13000120020_A.pptx
13000120020_A.pptx13000120020_A.pptx
13000120020_A.pptx
0020mdfatehcseB
 
Toc chapter 1 srg
Toc chapter 1 srgToc chapter 1 srg
Toc chapter 1 srg
Shayak Giri
 
4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...
4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...
4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...
venkatapranaykumarGa
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
ProvatMajhi
 
Ch3.ppt
Ch3.pptCh3.ppt
RegularLanguageProperties [Autosaved].pptx
RegularLanguageProperties [Autosaved].pptxRegularLanguageProperties [Autosaved].pptx
RegularLanguageProperties [Autosaved].pptx
RaviAr5
 

Similar to Lex analysis (20)

Theory of Computation Basic Concepts and Grammar
Theory of Computation Basic Concepts and GrammarTheory of Computation Basic Concepts and Grammar
Theory of Computation Basic Concepts and Grammar
 
Mod 2_RegularExpressions.pptx
Mod 2_RegularExpressions.pptxMod 2_RegularExpressions.pptx
Mod 2_RegularExpressions.pptx
 
Ch03
Ch03Ch03
Ch03
 
Chapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdfChapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdf
 
UNIT_-_II.docx
UNIT_-_II.docxUNIT_-_II.docx
UNIT_-_II.docx
 
Lec 3 ---- dfa
Lec 3  ---- dfaLec 3  ---- dfa
Lec 3 ---- dfa
 
Automata theory - CFG and normal forms
Automata theory - CFG and normal formsAutomata theory - CFG and normal forms
Automata theory - CFG and normal forms
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
Ch02
Ch02Ch02
Ch02
 
Automata
AutomataAutomata
Automata
 
Class1
 Class1 Class1
Class1
 
Lec1.pptx
Lec1.pptxLec1.pptx
Lec1.pptx
 
Lesson 04
Lesson 04Lesson 04
Lesson 04
 
Syntax Analyzer.pdf
Syntax Analyzer.pdfSyntax Analyzer.pdf
Syntax Analyzer.pdf
 
13000120020_A.pptx
13000120020_A.pptx13000120020_A.pptx
13000120020_A.pptx
 
Toc chapter 1 srg
Toc chapter 1 srgToc chapter 1 srg
Toc chapter 1 srg
 
4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...
4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...
4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
RegularLanguageProperties [Autosaved].pptx
RegularLanguageProperties [Autosaved].pptxRegularLanguageProperties [Autosaved].pptx
RegularLanguageProperties [Autosaved].pptx
 

Recently uploaded

ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
Fundacja Rozwoju Społeczeństwa Przedsiębiorczego
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
EduSkills OECD
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
GeoBlogs
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
Celine George
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
Steve Thomason
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
Nguyen Thanh Tu Collection
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
AzmatAli747758
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 

Recently uploaded (20)

ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 

Lex analysis

  • 1. Phases of Syntax Analysis 1. Identify the words: Lexical Analysis. Converts a stream of characters (input program) into a stream of tokens. Also called Scanning or Tokenizing. 2. Identify the sentences: Parsing. Derive the structure of sentences: construct parse trees from a stream of tokens. Lexical Analysis Convert a stream of characters into a stream of tokens. • Simplicity: Conventions about “words” are often different from conventions about “sentences”. • Efficiency: Word identification problem has a much more efficient solution than sentence identification problem. • Portability: Character set, special characters, device features. Terminology • Token: Name given to a family of words. e.g., integer constant • Lexeme: Actual sequence of characters representing a word. e.g., 32894 • Pattern: Notation used to identify the set of lexemes represented by a token. e.g., [0 − 9]+ Terminology A few more examples: Token Sample Lexemes Pattern while while while integer constant 32894, -1093, 0 [0-9]+ identifier buffer size [a-zA-Z]+ Patterns How do we compactly represent the set of all lexemes corresponding to a token? For instance: The token integer constant represents the set of all integers: that is, all sequences of digits (0–9), preceded by an optional sign (+ or −). Obviously, we cannot simply enumerate all lexemes. Use Regular Expressions. Regular Expressions Notation to represent (potentially) infinite sets of strings over alphabet Σ. • a: stands for the set {a} that contains a single string a.
  • 2. ⊲ Analogous to Union. • ab: stands for the set {ab} that contains a single string ab. ⊲ Analogous to Product. ⊲ (a|b)(a|b): stands for the set {aa, ab, ba, bb}. • a∗ : stands for the set {ǫ, a, aa, aaa, . . .} that contains all strings of zero or more a’s. ⊲ Analogous to closure of the product operation. Regular Expressions Examples of Regular Expressions over {a, b}: • (a|b)∗ : Set of strings with zero or more a’s and zero or more b’s: {ǫ, a, b, aa, ab, ba, bb, aaa, aab, . . .} • (a∗ b∗ ): Set of strings with zero or more a’s and zero or more b’s such that all a’s occur before any b: {ǫ, a, b, aa, ab, bb, aaa, aab, abb, . . .} • (a∗ b∗ )∗ : Set of strings with zero or more a’s and zero or more b’s: {ǫ, a, b, aa, ab, ba, bb, aaa, aab, . . .} Language of Regular Expressions Let R be the set of all regular expressions over Σ. Then, • Empty String: ǫ ∈ R • Unit Strings: α ∈ Σ ⇒ α ∈ R • Concatenation: r1, r2 ∈ R ⇒ r1r2 ∈ R • Alternative: r1, r2 ∈ R ⇒ (r1 | r2) ∈ R • Kleene Closure: r ∈ R ⇒ r∗ ∈ R Regular Expressions Example: (a | b)∗ L0 = {ǫ} L1 = L0 · {a, b} = {ǫ} · {a, b} = {a, b} L2 = L1 · {a, b} = {a, b} · {a, b} = {aa, ab, ba, bb} L3 = L2 · {a, b} ... L = ∞ i=0 Li = {ǫ, a, b, aa, ab, ba, bb, . . .} Semantics of Regular Expressions
  • 3. Semantic Function L : Maps regular expressions to sets of strings. L(ǫ) = {ǫ} L(α) = {α} (α ∈ Σ) L(r1 | r2) = L(r1) ∪ L(r2) L(r1 r2) = L(r1) · L(r2) L(r∗ ) = {ǫ} ∪ (L(r) · L(r∗ )) Computing the Semantics L(a) = {a} L(a | b) = L(a) ∪ L(b) = {a} ∪ {b} = {a, b} L(ab) = L(a) · L(b) = {a} · {b} = {ab} L((a | b)(a | b)) = L(a | b) · L(a | b) = {a, b} · {a, b} = {aa, ab, ba, bb} Computing the Semantics of Closure Example: L((a | b)∗ ) = {ǫ} ∪ (L(a | b) · L((a | b)∗ )) L0 = {ǫ} Base case L1 = {ǫ} ∪ ({a, b} · L0) = {ǫ} ∪ ({a, b} · {ǫ}) = {ǫ, a, b} L2 = {ǫ} ∪ ({a, b} · L1) = {ǫ} ∪ ({a, b} · {ǫ, a, b}) = {ǫ, a, b, aa, ab, ba, bb} ... L((a | b)∗ ) = L∞ = {ǫ, a, b, aa, ab, ba, bb, . . .} Another Example L((a∗ b∗ )∗ ) : L(a∗ ) = {ǫ, a, aa, . . .} L(b∗ ) = {ǫ, b, bb, . . .} L(a∗ b∗ ) = {ǫ, a, b, aa, ab, bb, aaa, aab, abb, bbb, . . .} L((a∗ b∗ )∗ ) = {ǫ} ∪{ǫ, a, b, aa, ab, bb, aaa, aab, abb, bbb, . . .} ∪{ǫ, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, . . .} . . .
  • 4. Regular Definitions Assign “names” to regular expressions. For example, digit −→ 0 | 1 | · · · | 9 natural −→ digit digit∗ Shorthands: • a+ : Set of strings with one or more occurrences of a. • a? : Set of strings with zero or one occurrences of a. Example: integer −→ (+|−)? digit+ Regular Definitions: Examples float −→ integer . fraction integer −→ (+|−)? no leading zero no leading zero −→ (nonzero digit digit∗ ) | 0 fraction −→ no trailing zero exponent? no trailing zero −→ (digit∗ nonzero digit) | 0 exponent −→ (E | e) integer digit −→ 0 | 1 | · · · | 9 nonzero digit −→ 1 | 2 | · · · | 9 Regular Definitions and Lexical Analysis Regular Expressions and Definitions specify sets of strings over an input alphabet. • They can hence be used to specify the set of lexemes associated with a token. ⊲ Used as the pattern language How do we decide whether an input string belongs to the set of strings specified by a regular expression? Using Regular Definitions for Lexical Analysis Q: Is ababbaabbb in L(((a∗ b∗ )∗ )? A: Hm. Well. Let’s see. L((a∗ b∗ )∗ ) = {ǫ} ∪{ǫ, a, b, aa, ab, bb, aaa, aab, abb, bbb, . . .} ∪{ǫ, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, . . .} ... = ??? Recognizers Construct automata that recognize strings belonging to a language. • Finite State Automata ⇒ Regular Languages
  • 5. • Push Down Automata ⇒ Context-free Languages ⊲ Stack is used to maintain counter, but only one counter can go arbitrarily high. Recognizing Finite Sets of Strings Identifying words from a small, finite, fixed vocabulary is straightforward. For instance, consider a stack machine with push, pop, and add operations with two constants: 0 and 1. We can use the automaton: s h p p 0 1 u o a d d push pop add integer_constant Finite State Automata Represented by a labeled directed graph. • A finite set of states (vertices). • Transitions between states (edges). • Labels on transitions are drawn from Σ ∪ {ǫ}. • One distinguished start state. • One or more distinguished final states. Finite State Automata: An Example Consider the Regular Expression (a | b)∗ a(a | b). L((a | b)∗ a(a | b)) = {aa, ab, aaa, aab, baa, bab, aaaa, aaab, abaa, abab, baaa, . . .}. The following automaton determines whether an input string belongs to L((a | b)∗ a(a | b): a a b b a 1 2 3 Determinism (a | b)∗ a(a | b): Nondeterministic: (NFA) a a b b a 1 2 3 Deterministic: (DFA) a a b b a a b 1 2 3 4
  • 6. Acceptance Criterion A finite state automaton (NFA or DFA) accepts an input string x . . . if beginning from the start state . . . we can trace some path through the automaton . . . such that the sequence of edge labels spells x . . . and end in a final state. Recognition with an NFA Is abab ∈ L((a | b)∗ a(a | b))? a a b b a 1 2 3 Input: a b a b Path 1: 1 1 1 1 1 Path 2: 1 1 1 2 3 Accept Path 3: 1 2 3 ⊥ ⊥ Accept Recognition with an NFA Is abab ∈ L((a | b)∗ a(a | b))? a a b b a 1 2 3 Input: a b a b Path 1: 1 1 1 1 1 Path 2: 1 1 1 2 3 Accept Path 3: 1 2 3 ⊥ ⊥ Accept Recognition with a DFA Is abab ∈ L((a | b)∗ a(a | b))? a a b b a a b b 1 2 3 4
  • 7. NFA vs. DFA For every NFA, there is a DFA that accepts the same set of strings. • NFA may have transitions labeled by ǫ. (Spontaneous transitions) • All transition labels in a DFA belong to Σ. • For some string x, there may be many accepting paths in an NFA. • For all strings x, there is one unique accepting path in a DFA. • Usually, an input string can be recognized faster with a DFA. • NFAs are typically smaller than the corresponding DFAs. Regular Expressions to NFA Thompson’s Construction: For every regular expression r, derive an NFA N(r) with unique start and final states. ǫ ε α ∈ Σ α (r1 | r2) N(r ) 1 ε ε ε ε N(r ) 2 Regular Expressions to NFA (contd.) r1r2 N(r )2 N(r )1 ε ε r∗ ε ε N(r) ε ε Example (a | b)∗ a(a | b): ε ε ε ε a b ε ε a ε ε ε ε a b ε
  • 8. Recognition with an NFA Is abab ∈ L((a | b)∗ a(a | b))? a a b b a 1 2 3 Input: a b a b Path 1: 1 1 1 1 1 Path 2: 1 1 1 2 3 Accept Path 3: 1 2 3 ⊥ ⊥ All Paths {1} {1, 2} {1, 3} {1, 2} {1, 3} Accept Recognition with an NFA (contd.) Is aaab ∈ L((a | b)∗ a(a | b))? a a b b a 1 2 3 Input: a a a b Path 1: 1 1 1 1 1 Path 2: 1 1 1 1 2 Path 3: 1 1 1 2 3 Accept Path 4: 1 1 2 3 ⊥ Path 5: 1 2 3 ⊥ ⊥ All Paths {1} {1, 2} {1, 2, 3} {1, 2, 3} {1, 2, 3} Accept Recognition with an NFA (contd.) Is aabb ∈ L((a | b)∗ a(a | b))? a a b b a 1 2 3 Input: a a a b Path 1: 1 1 1 1 1 Path 2: 1 1 2 3 ⊥ Path 3: 1 2 3 ⊥ ⊥ All Paths {1} {1, 2} {1, 2, 3} {1, 3} {1} REJECT Converting NFA to DFA Subset construction Given a set S of NFA states, • compute Sǫ = ǫ-closure(S): Sǫ is the set of all NFA states reachable by zero or more ǫ-transitions from S. • compute Sα = goto(S, α): – S′ is the set of all NFA states reachable from S by taking a transition labeled α. – Sα = ǫ-closure(S′ ). Converting NFA to DFA (contd). Each state in DFA corresponds to a set of states in NFA. Start state of DFA = ǫ-closure(start state of NFA). From a state s in DFA that corresponds to a set of states S in NFA: add a transition labeled α to state s′ that corresponds to a non-empty S′ in NFA, such that S′ = goto(S, α).
  • 9. ⇐ s is a final state of DFA NFA → DFA: An Example a a b b a 1 2 3 ǫ-closure({1}) = {1} goto({1}, a) = {1, 2} goto({1}, b) = {1} goto({1, 2}, a) = {1, 2, 3} goto({1, 2}, b) = {1, 3} goto({1, 2, 3}, a) = {1, 2, 3} ... NFA → DFA: An Example (contd.) ǫ-closure({1}) = {1} goto({1}, a) = {1, 2} goto({1}, b) = {1} goto({1, 2}, a) = {1, 2, 3} goto({1, 2}, b) = {1, 3} goto({1, 2, 3}, a) = {1, 2, 3} goto({1, 2, 3}, b) = {1} goto({1, 3}, a) = {1, 2} goto({1, 3}, b) = {1} NFA → DFA: An Example (contd.) goto({1}, a) = {1, 2} goto({1}, b) = {1} goto({1, 2}, a) = {1, 2, 3} goto({1, 2}, b) = {1, 3} goto({1, 2, 3}, a) = {1, 2, 3} ... a a b b a a b b {1} {1,2} {1,3} {1,2,3} NFA vs. DFA R = Size of Regular Expression N = Length of Input String NFA DFA Size of Automaton O(R) O(2R )
  • 10. Lexical Analysis • Regular Expressions and Definitions are used to specify the set of strings (lexemes) corresponding to a token. • An automaton (DFA/NFA) is built from the above specifications. • Each final state is associated with an action: emit the corresponding token. Specifying Lexical Analysis Consider a recognizer for integers (sequence of digits) and floats (sequence of digits separated by a decimal point). [0-9]+ { emit(INTEGER_CONSTANT); } [0-9]+"."[0-9]+ { emit(FLOAT_CONSTANT); } 0-9 0-9 0-9 0-9 ε 0-9 0-9 ε "." INTEGER_CONSTANT FLOAT_CONSTANT Lex Tool for building lexical analyzers. Input: lexical specifications (.l file) Output: C function (yylex) that returns a token on each invocation. %% [0-9]+ { return(INTEGER_CONSTANT); } [0-9]+"."[0-9]+ { return(FLOAT_CONSTANT); } Tokens are simply integers (#define’s). Lex Specifications %{ C header statements for inclusion %} Regular Definitions e.g.: digit [0-9] %% Token Specifications e.g.: {digit}+ { return(INTEGER_CONSTANT); } %% Support functions in C Regular Expressions in Lex
  • 11. • Range: [0-7]: Integers from 0 through 7 (inclusive) [a-nx-zA-Q]: Letters a thru n, x thru z and A thru Q. • Exception: [^/]: Any character other than /. • Definition: {digit}: Use the previously specified regular definition digit. • Special characters: Connectives of regular expression, convenience features. e.g.: | * ^ Special Characters in Lex | * + ? ( ) Same as in regular expressions [ ] Enclose ranges and exceptions { } Enclose “names” of regular definitions ^ Used to negate a specified range (in Exception) . Match any single character except newline Escape the next character n, t Newline and Tab For literal matching, enclose special characters in double quotes (") e.g.: "*" Or use to escape. e.g.: " Examples for Sequence of f, o, r "||" C-style OR operator (two vert. bars) .* Sequence of non-newline characters [^*/]+ Sequence of characters except * and / "[^"]*" Sequence of non-quote characters beginning and ending with a quote ({letter}|" ")({letter}|{digit}|" ")* C-style identifiers A Complete Example %{ #include <stdio.h> #include "tokens.h" %} digit [0-9] hexdigit [0-9a-f] %% "+" { return(PLUS); } "-" { return(MINUS); } {digit}+ { return(INTEGER_CONSTANT); } {digit}+"."{digit}+ { return(FLOAT_CONSTANT); } . { return(SYNTAX_ERROR); } %% Actions Actions are attached to final states. • Distinguish the different final states.
  • 12. • Can be used to set attribute values. • Fragment of C code (blocks enclosed by ‘{’ and ‘}’). Attributes Additional information about a token’s lexeme. • Stored in variable yylval • Type of attributes (usually a union) specified by YYSTYPE • Additional variables: – yytext: Lexeme (Actual text string) – yyleng: length of string in yytext ⊲ yylineno: Current line number (number of ‘n’ seen thus far) ∗ enabled by %option yylineno Priority of matching What if an input string matches more than one pattern? "if" { return(TOKEN_IF); } {letter}+ { return(TOKEN_ID); } "while" { return(TOKEN_WHILE); } • A pattern that matches the longest string is chosen. Example: if1 is matched with an identifier, not the keyword if. • Of patterns that match strings of same length, the first (from the top of file) is chosen. Example: while is matched as an identifier, not the keyword while. Constructing Scanners using (f)lex • Scanner specifications: specifications.l (f)lex specifications.l −−−−→ lex.yy.c • Generated scanner in lex.yy.c (g)cc lex.yy.c −−−−→ executable – yywrap(): hook for signalling end of file. – Use -lfl (flex) or -ll (lex) flags at link time to include default function yywrap() that always returns 1. Implementing a Scanner transition : state × Σ → state algorithm scanner() { current state = start state; while (1) { c = getc(); /* on end of file, ... */ if defined(transition(current state, c)) current state = transition(current state, c); else return s; }
  • 13. Implementing a Scanner (contd.) Implementing the transition function: • Simplest: 2-D array. Space inefficient. • Traditionally compressed using row/colum equivalence. (default on (f)lex) Good space-time tradeoff. • Further table compression using various techniques: – Example: RDM (Row Displacement Method): Store rows in overlapping manner using 2 1-D arrays. Smaller tables, but longer access times. Lexical Analysis: A Summary Convert a stream of characters into a stream of tokens. • Make rest of compiler independent of character set • Strip off comments • Recognize line numbers • Ignore white space characters • Process macros (definitions and uses) • Interface with symbol (name) table.