Compiler Design
1. Group 3
1. Piya Akter (37th Batch)
2. Rima Datta (36th Batch)
3. Shiren Akter (36th Batch)
4. Arman (36th Batch)
5. Md. Wasim (37th Batch)
Department of Computer Science & Engineering
2. The Subject of Discussion
The role of the lexical analyzer
Specification of tokens
Recognition of tokens
3. The Role of the Lexical Analyzer
Lexical analysis is the first phase of a compiler. It takes the modified source code produced by language preprocessors, in the form of a stream of characters. The lexical analyzer breaks this stream into a sequence of tokens, removing any whitespace and comments in the source code.
If the lexical analyzer finds an invalid token, it generates an error. The lexical analyzer works closely with the syntax analyzer: it reads the character stream from the source code, checks for legal tokens, and passes tokens to the syntax analyzer on demand.
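The behavior described above can be sketched as a small scanner. This is a minimal illustration (the token names and patterns here are assumptions, not from the slides): whitespace and comments are discarded, and the remaining input is broken into (token-name, lexeme) pairs.

```python
import re

# Illustrative token specification: order matters, since the first
# alternative that matches wins (COMMENT must precede OP, or "//"
# would match as two division operators).
TOKEN_SPEC = [
    ("COMMENT",    r"//[^\n]*"),      # discarded by the lexer
    ("WHITESPACE", r"[ \t\n]+"),      # discarded by the lexer
    ("NUMBER",     r"\d+(\.\d+)?"),
    ("ID",         r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",         r"[=+\-*/<>!]=?"),
    ("PUNCT",      r"[(),;{}]"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            # No pattern matches: the lexer reports an error.
            raise SyntaxError(f"illegal character {source[pos]!r} at {pos}")
        if m.lastgroup not in ("COMMENT", "WHITESPACE"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x = 42 + y // trailing comment"))
# [('ID', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('ID', 'y')]
```

In a real compiler the parser would pull tokens one at a time rather than receiving them as a list, but the separation of concerns is the same.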
5. Why Separate Lexical Analysis
and Parsing
1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability
6. Lexical Errors
Some errors are beyond the power of the lexical analyzer to recognize:
fi (a == f(x)) …
However, it may be able to recognize errors like:
d = 2r
Such errors are recognized when no pattern for tokens matches a character sequence.
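A hedged sketch of why "d = 2r" can be caught at this level: under a hypothetical rule that a numeric constant may not be immediately followed by a letter, no token pattern matches the sequence "2r", so the lexer reports an error (the rule and function name are illustrative assumptions).

```python
import re

def scan_number(source, pos):
    """Scan a numeric constant starting at pos; reject digit+letter runs."""
    m = re.match(r"\d+", source[pos:])
    end = pos + m.end()
    # Hypothetical rule: a letter right after the digits means no valid
    # token pattern matches this character sequence.
    if end < len(source) and (source[end].isalpha() or source[end] == "_"):
        raise SyntaxError(f"no token pattern matches input at position {pos}")
    return source[pos:end], end

print(scan_number("2 + x", 0))   # ('2', 1) -- a valid number token
try:
    scan_number("2r", 0)         # the "d = 2r" case
except SyntaxError as e:
    print("lexical error:", e)
```

By contrast, "fi (a == f(x))" contains only legal tokens (fi scans as a perfectly valid identifier), so the mistake is invisible to the lexer and must be caught later.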
7. Error Recovery
Panic mode: successive characters are ignored until we reach a well-formed token
Delete one character from the remaining input
Insert a missing character into the remaining input
Replace a character with another character
Transpose two adjacent characters
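The panic-mode strategy above can be sketched as follows (a minimal illustration with an assumed token pattern, not a full recovery scheme): on an illegal character the lexer records the error, discards characters one at a time, and resumes at the next well-formed token.

```python
import re

# Assumed token pattern for the sketch: identifiers, numbers, operators.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[=+\-*/();]")
SKIP = re.compile(r"\s+")

def tokenize_with_recovery(source):
    tokens, errors, pos = [], [], 0
    while pos < len(source):
        m = SKIP.match(source, pos)
        if m:
            pos = m.end()
            continue
        m = TOKEN.match(source, pos)
        if m:
            tokens.append(m.group())
            pos = m.end()
            continue
        # Panic mode: report the error, drop one character, and retry
        # until a well-formed token is found.
        errors.append(f"illegal character {source[pos]!r} at {pos}")
        pos += 1
    return tokens, errors

toks, errs = tokenize_with_recovery("a = 3 @# + b")
print(toks)   # ['a', '=', '3', '+', 'b']
print(errs)   # two reports, for '@' and '#'
```

Recovery lets the compiler report many lexical errors in one run instead of stopping at the first.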
8. Tokens
A lexeme is a sequence of characters in the source program that forms a token. There are predefined rules for every lexeme to be identified as a valid token. These rules are given by the grammar, by means of a pattern: a pattern describes what can be a token, and patterns are defined by means of regular expressions.
In a programming language, keywords, constants, identifiers, strings, numbers, operators and punctuation symbols can be considered tokens.
9. Tokens Example

Token       | Informal description                    | Sample lexemes
if          | characters i, f                         | if
else        | characters e, l, s, e                   | else
comparison  | < or > or <= or >= or == or !=          | <=, !=
id          | letter followed by letters and digits   | pi, score, D2
number      | any numeric constant                    | 3.14159, 0, 6.02e23
literal     | anything but ", surrounded by "         | "core dumped"

Example statement: printf("total = %d\n", score);
10. Specification of Tokens
In the theory of compilation, regular expressions are used to formalize the specification of tokens.
Regular expressions are a means of specifying regular languages.
Example:
letter_(letter_ | digit)*
Each regular expression is a pattern specifying the form of strings.
11. Regular Expressions
Ɛ is a regular expression, L(Ɛ) = {Ɛ}
If a is a symbol in ∑, then a is a regular expression, L(a) = {a}
(r) | (s) is a regular expression denoting the language L(r) ∪ L(s)
(r)(s) is a regular expression denoting the language L(r)L(s)
(r)* is a regular expression denoting (L(r))*
(r) is a regular expression denoting L(r)
12. Regular definitions
d1 -> r1
d2 -> r2
…
dn -> rn
Example:
letter_ -> A | B | … | Z | a | b | … | Z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
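The regular definitions above transcribe almost directly into Python regular-expression syntax. This sketch mirrors the slide's names (letter_, digit, id); the variable names themselves are just for illustration.

```python
import re

# letter_ -> A | B | ... | Z | a | b | ... | z | _
letter_ = r"[A-Za-z_]"
# digit -> 0 | 1 | ... | 9
digit = r"[0-9]"
# id -> letter_ (letter_ | digit)*
id_re = re.compile(f"{letter_}({letter_}|{digit})*")

print(bool(id_re.fullmatch("score")))  # True
print(bool(id_re.fullmatch("D2")))     # True
print(bool(id_re.fullmatch("2r")))     # False: may not begin with a digit
```

Note how each named definition is built from earlier ones, exactly as d2 may use d1 in the general scheme.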
13. Recognition of Tokens
Tokens can be recognized by finite automata. A finite automaton (FA) is a simple idealized machine used to recognize patterns within input taken from some character set (or alphabet) C. The job of the FA is to accept or reject an input depending on whether the pattern defined by the FA occurs in the input.
There are two notations for representing finite automata:
Transition diagram
Transition table
14. Recognition of Tokens
A transition diagram is a directed labeled graph containing nodes and edges. Nodes represent the states, and edges represent the transitions between states.
Every transition diagram has exactly one initial state, indicated by an arrow mark (-->), and zero or more final states, represented by double circles.
[Figure: finite automaton for recognizing identifiers, where state 1 is the initial state and state 3 is the final state.]
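The identifier-recognizing automaton can equivalently be written as a transition table and simulated directly. This sketch assumes the three-state layout described above (state 1 initial, state 3 final, with state 2 accumulating letters and digits); the function names are illustrative.

```python
def char_class(c):
    """Map a character to the input classes labeling the diagram's edges."""
    if c.isalpha() or c == "_":
        return "letter"
    if c.isdigit():
        return "digit"
    return "other"

# Transition table: (state, input class) -> next state.
# State 1 is the initial state; state 3 is the final (accepting) state.
TABLE = {
    (1, "letter"): 2,   # first character must be a letter or _
    (2, "letter"): 2,   # further letters and digits stay in state 2
    (2, "digit"):  2,
    (2, "other"):  3,   # any other character ends the identifier
}

def accepts_identifier(s):
    state = 1
    for c in s + "\0":  # sentinel character drives the final transition
        state = TABLE.get((state, char_class(c)))
        if state is None:
            return False          # no transition: reject
        if state == 3:
            return True           # reached the final state: accept
    return state == 3

print(accepts_identifier("pi"))   # True
print(accepts_identifier("D2"))   # True
print(accepts_identifier("2r"))   # False
```

The table form makes the equivalence of the two notations concrete: each edge of the diagram is one entry of the dictionary, and missing entries correspond to rejection.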