This document discusses the specification and recognition of tokens in compiling. It begins by defining tokens and regular expressions, which are used to formally specify tokens. Regular expressions define patterns for strings. The document then explains how to recognize tokens by defining patterns for language elements using regular expressions. Finally, it provides examples of transition diagrams that can be used to recognize tokens like relational operators, identifiers, numbers, and whitespace.
1. System Software (5KS03)
Unit 1 : Introduction to Compiling
Lecture : 4 Specification of Tokens
A S Kapse,
Assistant Professor,
Department of Computer Sci. & Engineering
Anuradha Engineering College, Chikhli
3. Objectives…
Upon completion of this lecture, you will be able to:
Understand the use of tokens
Understand the role of lexical analysis in the recognition of tokens
Understand regular expressions
Use transition diagrams
4. Review / Concepts
What is meant by a token?
What is meant by a parser and a scanner?
What is meant by a regular expression?
5. Specification of tokens
In the theory of compilation, regular expressions are used to formalize the specification of tokens.
Regular expressions are a means of specifying regular languages.
Example:
letter_ (letter_ | digit)*
Each regular expression is a pattern specifying the form of strings.
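To make "a pattern specifying the form of strings" concrete, here is a minimal sketch that writes the example letter_ (letter_ | digit)* in Python's `re` syntax and tests whole strings against it. The translation letter_ = [A-Za-z_], digit = [0-9] and the helper name `is_identifier` are assumptions made for illustration.

```python
import re

# letter_ (letter_ | digit)* in Python's re syntax,
# assuming letter_ = [A-Za-z_] and digit = [0-9]
ID_PATTERN = re.compile(r"[A-Za-z_](?:[A-Za-z_]|[0-9])*")

def is_identifier(s: str) -> bool:
    """True if the whole string s has the form letter_ (letter_ | digit)*."""
    return ID_PATTERN.fullmatch(s) is not None
```

For example, `is_identifier("_tmp2")` is true, while `is_identifier("2x")` is false because a digit may not come first. `fullmatch` is used so that the entire string, not just a prefix, must fit the pattern.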
6. Regular expressions
ε is a regular expression, L(ε) = {ε}
If a is a symbol in Σ, then a is a regular expression, L(a) = {a}
If r and s are regular expressions denoting the languages L(r) and L(s), then:
(r) | (s) is a regular expression denoting the language L(r) ∪ L(s)
(r)(s) is a regular expression denoting the language L(r)L(s)
(r)* is a regular expression denoting (L(r))*
(r) is a regular expression denoting L(r)
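The inductive rules above can be mirrored directly in code. The sketch below represents a language by the finite set of its strings up to a fixed length (the bound `MAX_LEN` is an assumption needed so that r* stays finite), with one hypothetical function per rule.

```python
from typing import FrozenSet

MAX_LEN = 4  # assumed bound: only strings of length <= MAX_LEN are kept

Lang = FrozenSet[str]

def eps() -> Lang:
    """L(ε) = {ε}: only the empty string."""
    return frozenset({""})

def sym(a: str) -> Lang:
    """L(a) = {a} for a symbol a in Σ."""
    return frozenset({a})

def union(r: Lang, s: Lang) -> Lang:
    """L(r | s) = L(r) ∪ L(s)."""
    return r | s

def concat(r: Lang, s: Lang) -> Lang:
    """L(rs) = { xy : x in L(r), y in L(s) }, truncated to MAX_LEN."""
    return frozenset(x + y for x in r for y in s if len(x + y) <= MAX_LEN)

def star(r: Lang) -> Lang:
    """L(r*) = {ε} ∪ L(r) ∪ L(r)L(r) ∪ ..., truncated to MAX_LEN."""
    result = eps()
    while True:
        # Add one more concatenation with L(r) until nothing new appears.
        bigger = union(result, concat(result, r))
        if bigger == result:
            return result
        result = bigger
```

For instance, `concat(sym("a"), star(sym("b")))` yields the strings of ab* up to length 4: a, ab, abb, abbb. The loop in `star` terminates because the set can only grow and is bounded by `MAX_LEN`.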
7. Regular definitions
d1 -> r1
d2 -> r2
…
dn -> rn
Example:
letter_ -> A | B | … | Z | a | b | … | z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
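Because each right-hand side in a regular definition uses only names defined earlier, the definitions can be expanded in order into one ordinary regular expression. The sketch below does this with Python strings; the helper `alt` and the dictionary `defs` are hypothetical names chosen for illustration.

```python
import re
import string

def alt(symbols: str) -> str:
    """Build the alternation s1 | s2 | ... | sn from single characters."""
    return "(?:" + "|".join(re.escape(s) for s in symbols) + ")"

# d1 -> r1, d2 -> r2, ...: each ri refers only to earlier names,
# so plain substitution (no recursion) expands every definition.
defs = {}
defs["letter_"] = alt(string.ascii_uppercase + string.ascii_lowercase + "_")
defs["digit"] = alt(string.digits)
defs["id"] = f"{defs['letter_']}(?:{defs['letter_']}|{defs['digit']})*"

ID = re.compile(defs["id"])
```

`ID.fullmatch("letter_2")` succeeds and `ID.fullmatch("9lives")` fails. The expanded expression is long, which is exactly why naming sub-expressions makes specifications readable.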
8. Extensions
One or more instances: (r)+
Zero or one instance: r?
Character classes: [abc]
Example:
letter_ -> [A-Za-z_]
digit -> [0-9]
id -> letter_ (letter_ | digit)*
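The extensions are shorthands: (r)+ means r r*, r? means r | ε, and [abc] means a | b | c. The sketch below checks these equivalences by brute force, comparing two Python regexes on every string over an assumed alphabet {a, b, c} up to an assumed length bound; `same_language` is a hypothetical helper.

```python
import itertools
import re

ALPHABET = "abc"  # assumed alphabet for the check

def same_language(p: str, q: str, max_len: int = 5) -> bool:
    """True if p and q accept exactly the same strings over ALPHABET up to max_len."""
    rp, rq = re.compile(p), re.compile(q)
    for n in range(max_len + 1):
        for chars in itertools.product(ALPHABET, repeat=n):
            s = "".join(chars)
            if (rp.fullmatch(s) is None) != (rq.fullmatch(s) is None):
                return False
    return True

# (shorthand, expansion using only |, concatenation and *)
EQUIVALENCES = [
    (r"(?:ab)+", r"ab(?:ab)*"),  # (r)+  = r r*
    (r"a?", r"(?:a|)"),          # r?    = r | ε  (ε as an empty alternative)
    (r"[abc]", r"(?:a|b|c)"),    # [abc] = a | b | c
]
```

A bounded check like this is only evidence, not a proof, but it quickly catches mistakes such as confusing r+ with r* (they differ on the empty string).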
9. Recognition of tokens
The starting point is the grammar of the language, which tells us which tokens are needed:
stmt -> if expr then stmt
| if expr then stmt else stmt
| ε
expr -> term relop term
| term
term -> id
| number
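The tokens are the terminals of this grammar: every symbol that never appears on the left-hand side of a production. A minimal sketch, assuming the grammar is encoded as a Python dictionary (with ε as an empty alternative), collects them automatically.

```python
# The grammar above: nonterminal -> list of alternatives,
# each alternative a list of symbols; ε is the empty alternative [].
GRAMMAR = {
    "stmt": [
        ["if", "expr", "then", "stmt"],
        ["if", "expr", "then", "stmt", "else", "stmt"],
        [],  # ε
    ],
    "expr": [["term", "relop", "term"], ["term"]],
    "term": [["id"], ["number"]],
}

def terminals(grammar: dict) -> list:
    """Return, sorted, the symbols that are not nonterminals: these are the tokens."""
    nonterminals = set(grammar)
    used = {sym for alts in grammar.values() for alt in alts for sym in alt}
    return sorted(used - nonterminals)
```

Here `terminals(GRAMMAR)` gives `['else', 'id', 'if', 'number', 'relop', 'then']`, the token classes whose patterns must be formalized next.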
10. Recognition of tokens (cont.)
The next step is to formalize the patterns:
digit -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id -> letter (letter | digit)*
if -> if
then -> then
else -> else
relop -> < | > | <= | >= | = | <>
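A relop can also be recognized by hand in the style of a transition diagram: read one character and, for < or >, look at the next one to choose between a one-character and a two-character operator. When the next character is "other", it is not consumed (the diagram's retract step). The function name `relop` and its (lexeme, next position) return convention are assumptions of this sketch.

```python
from typing import Optional, Tuple

def relop(text: str, pos: int) -> Optional[Tuple[str, int]]:
    """Recognize relop -> < | > | <= | >= | = | <> starting at text[pos].

    Returns (lexeme, position after the lexeme), or None if no relop starts here.
    """
    def peek(i: int) -> str:
        # The empty string stands for "end of input".
        return text[i] if i < len(text) else ""

    c = peek(pos)
    if c == "<":
        nxt = peek(pos + 1)
        if nxt == "=":
            return ("<=", pos + 2)
        if nxt == ">":
            return ("<>", pos + 2)
        return ("<", pos + 1)  # "other" edge: nxt is not consumed (retract)
    if c == ">":
        if peek(pos + 1) == "=":
            return (">=", pos + 2)
        return (">", pos + 1)  # "other" edge: retract
    if c == "=":
        return ("=", pos + 1)
    return None  # no relop starts at pos
```

For example, `relop("x<y", 1)` returns `("<", 2)`, leaving `y` for the next token.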
We also need to handle whitespace:
ws -> (blank | tab | newline)+
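Putting the patterns together gives a small scanner. The sketch below translates them into Python `re` syntax and matches them at the current position; ws is matched but not returned to the parser. Two choices here are assumptions rather than anything stated on the slides: keywords are found by matching an id and then looking it up in a reserved-word set (instead of separate if/then/else patterns), and relop alternatives are listed longest first because Python's `re` tries alternatives left to right instead of taking the longest match.

```python
import re
from typing import Iterator, Tuple

# The slide's patterns in Python re syntax.
TOKEN_SPEC = [
    ("ws", r"[ \t\n]+"),
    ("number", r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?"),
    ("id", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("relop", r"<=|>=|<>|<|>|="),  # two-character operators first
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))
KEYWORDS = {"if", "then", "else"}  # assumed reserved-word table

def tokenize(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (token, lexeme) pairs; whitespace is skipped."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue  # ws is recognized but not passed on to the parser
        if kind == "id" and lexeme in KEYWORDS:
            kind = lexeme  # a keyword is its own token
        yield (kind, lexeme)
```

For example, `list(tokenize("if x1 <= 3.14E+2 then y else z"))` produces the tokens if, id, relop, number, then, id, else, id with their lexemes. Because the first characters of ws, number, id and relop never overlap, the order of the four groups in `TOKEN_SPEC` does not affect the result.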
15. Video on Compilers
1. Lexical Analysis: The role of the lexical analyzer
2. Input buffering
16. Questions…
1. Define a token.
2. What is meant by a regular expression?
3. Explain how regular expressions are used to specify tokens.
4. What is meant by a transition diagram?
17. Homework…
1. What is a parser?
2. What is meant by analysis and synthesis?
3. Describe the following example.
area=pi * r * r + 45