Recognition of Tokens
• We now know how to specify the tokens for our
language. But how do we write a program to
recognize them?
if -> if
then -> then
else -> else
relop -> < | <= | = | <> | > | >=
id -> letter ( letter | digit )*
num -> digit+ ( . digit+ )? ( E (+|-)? digit+ )?
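One way to try out these definitions is to transcribe them into Python's re syntax. This is only a sketch: the names LETTER, DIGIT, PATTERNS and matches are made up for illustration, letters are assumed to be ASCII, and num is assumed to allow one or more digits in each digit position.

```python
import re

# Regular definitions from the slide, transcribed into Python's re syntax.
# All names here are illustrative assumptions, not part of the slides.
LETTER = r"[A-Za-z]"
DIGIT = r"[0-9]"

PATTERNS = {
    "if": r"if",
    "then": r"then",
    "else": r"else",
    # Longer operators are listed first so "<=" is not split into "<" and "=".
    "relop": r"<=|<>|>=|<|=|>",
    "id": rf"{LETTER}(?:{LETTER}|{DIGIT})*",
    "num": rf"{DIGIT}+(?:\.{DIGIT}+)?(?:E[+-]?{DIGIT}+)?",
}


def matches(token: str, lexeme: str) -> bool:
    """Return True if the whole lexeme matches the pattern for `token`."""
    return re.fullmatch(PATTERNS[token], lexeme) is not None
```

Note that a keyword such as if also matches the id pattern, so a real scanner has to decide separately which of the two tokens to report.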
Token recognition
• We also want to strip whitespace, so we need
definitions for it as well:
delim -> blank | tab | newline
ws -> delim+
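Stripping ws can be sketched as a small loop that advances past delimiters before each token is read. The function name skip_ws and the index-based interface are assumptions made for this example.

```python
# delim -> blank | tab | newline
DELIMS = {" ", "\t", "\n"}


def skip_ws(source: str, pos: int) -> int:
    """Return the index of the first non-delimiter character at or after pos."""
    while pos < len(source) and source[pos] in DELIMS:
        pos += 1
    return pos
```

No token is returned for ws, which agrees with the "-" entries for ws in the attribute table.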
Attribute values
Regular Expression   Token   Attribute value
ws                   -       -
if                   if      -
then                 then    -
else                 else    -
id                   id      ptr to sym table entry
num                  num     ptr to sym table entry
<                    relop   LT
<=                   relop   LE
=                    relop   EQ
<>                   relop   NE
>                    relop   GT
>=                   relop   GE
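The keyword and relop rows of the table can be written down as a lookup, roughly as follows. Token, make_token and the two tables are hypothetical names chosen for this sketch; id and num are left out because their attribute is a symbol-table pointer rather than a fixed value.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    name: str                        # token name, e.g. "relop" or "if"
    attribute: Optional[str] = None  # attribute value, None where the table shows "-"


KEYWORDS = {"if", "then", "else"}
RELOP_ATTRIBUTE = {
    "<": "LT", "<=": "LE", "=": "EQ",
    "<>": "NE", ">": "GT", ">=": "GE",
}


def make_token(lexeme: str) -> Token:
    """Map a keyword or relop lexeme to its (token, attribute) pair."""
    if lexeme in KEYWORDS:
        return Token(lexeme)
    if lexeme in RELOP_ATTRIBUTE:
        return Token("relop", RELOP_ATTRIBUTE[lexeme])
    raise ValueError(f"not a keyword or relop: {lexeme!r}")
```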
Transition diagrams
• Transition diagrams are also called finite
automata.
• We have a collection of STATES drawn as
nodes in a graph.
• TRANSITIONS between states are represented
by directed edges in the graph.
• Each transition leaving a state s is labeled with
a set of input characters that can occur after
state s.
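As a rough illustration, states and labeled edges can be stored as a dictionary that maps each state to its outgoing edges. The state numbers 9 and 10 and the names TRANSITIONS and step are assumptions for this sketch only.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

# state -> list of (set of input characters labeling the edge, target state)
TRANSITIONS = {
    9: [(LETTERS, 10)],
    10: [(LETTERS | DIGITS, 10)],
}


def step(state, ch):
    """Follow the edge leaving `state` whose label contains ch, or return None."""
    for labels, target in TRANSITIONS.get(state, []):
        if ch in labels:
            return target
    return None
```

Because each character appears in at most one label set per state, step has at most one edge to follow.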
Transition diagrams (continued)
• For now, the transitions must be
DETERMINISTIC.
• Each transition diagram has a single START
state and a set of TERMINAL STATES.
• The label OTHER on an edge indicates all
possible inputs not handled by the other
transitions.
• Usually, when we recognize OTHER, we need
to put it back in the source stream since it is
part of the next token. This action is denoted
with a * next to the corresponding state.
[Figure: transition diagram for > and >=, stepped through frame by frame]
start -> state 0
state 0, on >     -> state 6
state 6, on =     -> state 7
state 6, on other -> state 8 *
Traced on two example inputs: >= and abc > pqr
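The diagram for > and >= can be followed in code roughly as below. This is a sketch under assumptions: scan_greater is a made-up name, the input is a string with an index, and end of input is treated as OTHER. The retraction marked with * is the explicit pos -= 1.

```python
def scan_greater(source: str, pos: int):
    """Recognize '>' or '>=' at source[pos].

    Returns (lexeme, next_pos), or None if source[pos] is not '>'.
    """
    if pos >= len(source) or source[pos] != ">":
        return None                      # no edge leaves state 0
    start = pos
    pos += 1                             # state 0 -> state 6 on '>'
    ch = source[pos] if pos < len(source) else ""  # "" stands for end of input
    pos += 1                             # the character after '>' has been read
    if ch == "=":
        return source[start:pos], pos    # state 7: lexeme is '>='
    pos -= 1                             # state 8 (*): put OTHER back
    return source[start:pos], pos        # lexeme is '>'
```

On abc > pqr, starting at the >, the blank that follows is read, found to be OTHER, and put back, so the next token starts at that blank.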
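[Figure: transition diagram for relop]
start -> state 0
state 0, on <     -> state 1
state 1, on =     -> state 2:   return( relop, LE)
state 1, on >     -> state 3:   return( relop, NE)
state 1, on other -> state 4 *: return( relop, LT)
state 0, on =     -> state 5:   return( relop, EQ)
state 0, on >     -> state 6
state 6, on =     -> state 7:   return( relop, GE)
state 6, on other -> state 8 *: return( relop, GT)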
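A minimal sketch of the relop diagram as code, assuming string input with an index and a made-up function name scan_relop. The state numbers in the comments are matched to the return actions as read from the slide. Instead of reading OTHER and retracting, this version peeks at the next character, which has the same effect.

```python
def scan_relop(source: str, pos: int):
    """Return ((token, attribute), next_pos) for a relop at source[pos], else None."""

    def peek(i: int) -> str:
        # "" stands for end of input and is handled like OTHER.
        return source[i] if i < len(source) else ""

    first, second = peek(pos), peek(pos + 1)
    if first == "<":
        if second == "=":
            return ("relop", "LE"), pos + 2   # state 2
        if second == ">":
            return ("relop", "NE"), pos + 2   # state 3
        return ("relop", "LT"), pos + 1       # state 4 (*): OTHER stays unread
    if first == "=":
        return ("relop", "EQ"), pos + 1       # state 5
    if first == ">":
        if second == "=":
            return ("relop", "GE"), pos + 2   # state 7
        return ("relop", "GT"), pos + 1       # state 8 (*): OTHER stays unread
    return None
```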
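[Figure: transition diagram for id]
start -> state 9
state 9, on letter           -> state 10
state 10, on letter or digit -> state 10
state 10, on other           -> state 11 *: return(gettoken(), install_id())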
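The id diagram might be coded as follows. The slides name gettoken() and install_id() but do not define them, so the bodies here are assumptions: install_id returns an index into a list that stands in for the symbol table, and gettoken reports a keyword token when the lexeme is if, then or else. Passing the lexeme as an argument is also an assumption of this sketch.

```python
KEYWORDS = {"if", "then", "else"}
symbol_table = []  # an entry's index plays the role of the pointer


def is_letter(ch: str) -> bool:
    return ch.isascii() and ch.isalpha()


def is_digit(ch: str) -> bool:
    return ch.isascii() and ch.isdigit()


def install_id(lexeme: str):
    """Assumed behavior: index of lexeme in the symbol table, None for keywords."""
    if lexeme in KEYWORDS:
        return None
    if lexeme not in symbol_table:
        symbol_table.append(lexeme)
    return symbol_table.index(lexeme)


def gettoken(lexeme: str) -> str:
    """Assumed behavior: the keyword itself, or "id" for any other lexeme."""
    return lexeme if lexeme in KEYWORDS else "id"


def scan_id(source: str, pos: int):
    """Return ((token, attribute), next_pos) for an id at source[pos], else None."""
    if pos >= len(source) or not is_letter(source[pos]):
        return None
    start = pos
    pos += 1                                   # state 9 -> state 10 on letter
    while pos < len(source) and (is_letter(source[pos]) or is_digit(source[pos])):
        pos += 1                               # loop on state 10
    lexeme = source[start:pos]                 # state 11 (*): OTHER is not consumed
    return (gettoken(lexeme), install_id(lexeme)), pos
```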
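[Figure: transition diagram for a sequence of digits]
start -> state 25
state 25, on digit -> state 26
state 26, on digit -> state 26
state 26, on other -> state 27 *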
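The digit diagram follows the same pattern. The sketch below, with the assumed name scan_digits, covers only the digit sequence shown in this diagram, not the optional fraction or exponent of num.

```python
def scan_digits(source: str, pos: int):
    """Return (lexeme, next_pos) for a run of digits at source[pos], else None."""

    def is_digit(ch: str) -> bool:
        return ch.isascii() and ch.isdigit()

    if pos >= len(source) or not is_digit(source[pos]):
        return None
    start = pos
    pos += 1                              # state 25 -> state 26 on digit
    while pos < len(source) and is_digit(source[pos]):
        pos += 1                          # loop on state 26
    return source[start:pos], pos         # state 27 (*): OTHER is not consumed
```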
