Lexical Analysis
Lexical Analysis 2
Review
What is a compiler?
What are the Phases
of a Compiler?
Lexical Analysis 3
The Goals of Chapter 3
See how to construct a lexical analyzer
– Lexeme description
– Code to recognize lexemes
Lexical Analyzer Generator
– Lex and Flex
Regular Expressions
– To nondeterministic finite automata
– To deterministic finite automata
– Then to Code
Lexical Analysis 4
What is a Lexical Analyzer
supposed to do?
Read the input characters of the source
program,
Group them into lexemes,
– Check with the symbol table regarding this
lexeme
Produce as output a sequence of tokens
– This stream is sent to the parser
Lexical Analysis 5
What is a Lexical Analyzer
supposed to do?
It may also
– strip out comments and whitespace.
– Keep track of line numbers in the source file
(for error messages)
Lexical Analysis 6
Terms
Token
– A pair (token name, [attribute value])
– Ex: (integer, 32)
Pattern
– A description of the form the lexemes will take
– Ex: regular expressions
Lexeme
– An instance matched by the pattern.
Lexical Analysis 7
What are the tokens?
Code:
– printf("Total= %d\n", score);
Tokens:
– (id, printf)
– openParen
– literal "Total= %d\n"
– comma
– (id, score)
– closeParen
– semicolon
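The token list above can be reproduced with a small sketch using Python's standard `re` module. The token names follow the slide; the pattern table `TOKEN_SPEC` and the function `tokenize` are assumptions made for this illustration, not part of the original deck.

```python
import re

# Hypothetical pattern table: (token name, regular expression).
TOKEN_SPEC = [
    ("literal",    r'"[^"\n]*"'),
    ("id",         r"[A-Za-z_][A-Za-z0-9_]*"),
    ("openParen",  r"\("),
    ("closeParen", r"\)"),
    ("comma",      r","),
    ("semicolon",  r";"),
    ("skip",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token name, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

# The statement from the slide; \\n keeps the backslash-n as two characters.
tokens = tokenize('printf("Total= %d\\n", score);')
```

Each pair holds the token name and the lexeme that matched its pattern, so `tokens[0]` is `("id", "printf")`.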
Lexical Analysis 8
Token Attributes
When more than one lexeme can match a
pattern, the lexical analyzer must provide
the next phase additional information.
– Number – needs a value
– Id – needs the identifier name or, more likely, a
pointer to its symbol table entry
Lexical Analysis 9
Specification of Tokens
Regular expressions are an important
notation for specifying lexeme patterns.
We are going to
– first study the formal notation for regular
expressions,
– then show how to build a lexical
analyzer that uses regular expressions,
– then see how they are used in a
lexical analyzer.
Lexical Analysis 10
Buffering
In principle, the analyzer goes through the
source string a character at a time;
In practice, it must be able to access
substrings of the source.
Hence the source is normally read into a
buffer
The scanner needs two subscripts to note
places in the buffer
– lexeme start & current position
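A minimal sketch of the two-subscript idea, assuming a buffer that already holds the whole source. The function name `scan_lexemes` and its simplified lexeme rule (runs of letters and digits, or single other characters) are hypothetical and chosen only for illustration.

```python
def scan_lexemes(buffer):
    """Split buffer into runs of letters/digits or single other characters."""
    lexemes = []
    start = 0                          # subscript 1: lexeme start
    while start < len(buffer):
        current = start                # subscript 2: current position
        if buffer[current].isspace():
            start += 1                 # whitespace only separates lexemes
            continue
        if buffer[current].isalnum():
            while current < len(buffer) and buffer[current].isalnum():
                current += 1           # advance until the run ends
        else:
            current += 1               # single-character lexeme
        lexemes.append(buffer[start:current])
        start = current                # the next lexeme begins here
    return lexemes

result = scan_lexemes("count = count+1;")
```

The substring `buffer[start:current]` is exactly the access to a piece of the source that the slide says is needed in practice.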
Lexical Analysis 11
Strings and Alphabets
Def: Alphabet – a finite set of symbols
– Typically letters, digits, and punctuation
Example:
– {0,1} is a binary alphabet.
Lexical Analysis 12
Strings and Alphabets
Def: String – (over an alphabet) is a finite
sequence of symbols drawn from that
alphabet.
– Sometimes called a “word”
– The empty string ε
Lexical Analysis 13
Strings and Alphabets
Def: Language – any countable set of
strings over some fixed alphabet.
Notes:
– Abstract languages like ∅ (the empty set) and {ε} fit this
definition, and so does the set of all C programs
– This definition does not require the strings to have any
meaning – that will come later.
Lexical Analysis 15
Finite State Automata
The compiler writer defines tokens in the
language by means of regular
expressions.
Informally a regular expression is a
compact notation that indicates what
characters may go together into lexemes
belonging to that particular token and how
they go together.
We will see regular expressions later
Lexical Analysis 16
The lexical analyzer is best
implemented as a finite state machine
or a finite state automaton.
Informally a Finite-State Automaton is a
system that has a finite set of states
with rules for making transitions
between states.
The goal now is to explain these two
things in detail and bridge the gap from
the first to the second.
Lexical Analysis 17
State Diagrams
and State Tables
Def: State Diagram -- a directed graph
whose vertices represent states and
whose edges indicate transitions
between the states.
Let’s consider a vending machine that
sells candy bars for 25 cents, and it takes
nickels, dimes, and quarters.
Lexical Analysis 18
Figure 2.1 -- pg. 21
Lexical Analysis 19
Def: State Table -- a table with states
down the left side and inputs across the top.
The entry in a given row and column is the
state to go to from that row's state on that
column's input.
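A possible state table for the vending machine, sketched in Python. The figure itself is not reproduced in this text, so the state names (cents deposited so far) and the rule that overpayment stays at 25 are assumptions of this sketch.

```python
COINS = {"nickel": 5, "dime": 10, "quarter": 25}
STATES = [0, 5, 10, 15, 20, 25]        # cents deposited so far

# Rows are states, columns are inputs; each entry is the next state.
# Assumption: deposits beyond 25 cents stay in state 25 (change is not modeled).
TABLE = {
    state: {coin: min(state + value, 25) for coin, value in COINS.items()}
    for state in STATES
}

def run(coins, state=0):
    """Feed a sequence of coins to the machine and return the final state."""
    for coin in coins:
        state = TABLE[state][coin]
    return state

final = run(["dime", "dime", "nickel"])
```

Reaching state 25 means enough money has been deposited for a candy bar.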
Lexical Analysis 20
Formal Definition
Def: FSA -- An FSA M consists of
– a finite set of input symbols Σ (the input alphabet)
– a finite set of states Q
– a starting state q0 (which is an element of Q)
– a set of accepting states F (a subset of Q)
(these are sometimes called final states)
– a state-transition function N: Q × Σ → Q
M = (Σ, Q, q0, F, N)
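The five components map directly onto a small data structure. This is a sketch only: the class name `FSA`, its field names, and the example machine are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FSA:
    alphabet: frozenset      # the input alphabet
    states: frozenset        # Q
    start: int               # q0, an element of Q
    accepting: frozenset     # F, a subset of Q
    delta: dict              # N, stored as {(state, symbol): next state}

    def accepts(self, w):
        """Run the machine on w and report whether it ends in F."""
        state = self.start
        for symbol in w:
            state = self.delta[(state, symbol)]
        return state in self.accepting

# Hypothetical machine: accepts strings over {a, b} that end in "b".
M = FSA(
    alphabet=frozenset("ab"),
    states=frozenset({1, 2}),
    start=1,
    accepting=frozenset({2}),
    delta={(1, "a"): 1, (1, "b"): 2, (2, "a"): 1, (2, "b"): 2},
)
```

On input "ba" this machine passes through accepting state 2 but ends in state 1, so `M.accepts("ba")` is false.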
Lexical Analysis 21
Example 1:
– Given Machine Construct Table:
Lexical Analysis 23
Example 1 (cont.):
– State 1 is the Starting State. This is shown
by the arrow in the Machine, and the fact
that it is the first state in the table.
– States 3 and 4 are the accepting states.
This is shown by the double circles in the
machine, and the fact that they are
underlined in the table.
Lexical Analysis 24
Example 2:
– Given Table, Construct Machine:
Lexical Analysis 26
Example 2 (cont.):
– This machine shows that it is entirely
possible to have unreachable states in an
automaton.
– These states can be removed without
affecting the automaton.
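Removing unreachable states amounts to a graph search from the starting state. A minimal sketch follows; the function names and the example transition table are assumptions for illustration, not the machine from Example 2.

```python
def reachable_states(start, delta):
    """Return the states reachable from start; delta is {(state, symbol): state}."""
    seen = {start}
    stack = [start]
    while stack:
        state = stack.pop()
        for (src, _symbol), dst in delta.items():
            if src == state and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def prune(start, accepting, delta):
    """Drop unreachable states together with their outgoing transitions."""
    keep = reachable_states(start, delta)
    new_delta = {(s, a): t for (s, a), t in delta.items() if s in keep}
    return keep, accepting & keep, new_delta

# Hypothetical table: state 3 has outgoing edges, but nothing leads into it.
delta = {(1, "a"): 2, (1, "b"): 1, (2, "a"): 2, (2, "b"): 1,
         (3, "a"): 1, (3, "b"): 3}
states, accepting, pruned = prune(1, {2, 3}, delta)
```

No input string can ever reach state 3, so dropping it leaves the accepted language unchanged.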
Lexical Analysis 27
Acceptance
We use FSAs for recognizing tokens.
A character string is recognized (or
accepted) by FSA M if, when the last
character has been read, M is in one of
the accepting states.
– If we pass through an accepting state, but end
in a non-accepting state, the string is NOT
accepted.
Lexical Analysis 28
Def: language -- A language is any set
of strings.
Def: a language over an alphabet Σ is
any set of strings made up only of
characters from Σ.
Def: L(M) -- the language accepted by
M is the set of all strings over Σ that are
accepted by M.
Lexical Analysis 29
If we have an FSA for each token, then
the language accepted by a token's FSA is
the set of all lexemes belonging to that
token.
Def: equivalent --
M1 == M2 iff L(M1) = L(M2).
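One way to test this definition mechanically is to walk both machines in lockstep and look for a pair of states where exactly one accepts. This is a sketch assuming both machines are deterministic with a transition defined for every symbol; the tuple layout `(start, accepting, delta)` and the example machines are hypothetical.

```python
from collections import deque

def equivalent(m1, m2, alphabet):
    """Each machine is (start, accepting set, delta) with delta total over alphabet."""
    (s1, f1, d1), (s2, f2, d2) = m1, m2
    seen = {(s1, s2)}
    queue = deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p in f1) != (q in f2):
            return False               # some string is accepted by only one machine
        for a in alphabet:
            pair = (d1[(p, a)], d2[(q, a)])
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True

# Two machines for "ends in b" (one with a redundant state), one for "contains b".
ends_b_small = (1, {2}, {(1, "a"): 1, (1, "b"): 2, (2, "a"): 1, (2, "b"): 2})
ends_b_large = (0, {1, 2}, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0,
                            (1, "b"): 2, (2, "a"): 0, (2, "b"): 1})
contains_b = (0, {1}, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 1})
```

The first two machines differ in size but accept the same language, so they are equivalent in the sense defined above.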
Lexical Analysis 30
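An FSA can be easily programmed if the state
table is stored in memory as a two-dimensional
array.
table : array[1..nstates, 1..ninputs] of byte;
Given an input string w, the code would look
something like this:
state := 1;                      { state 1 is the starting state }
for i := 1 to length(w) do
begin
  col := char_to_col(w[i]);      { map the character to a table column }
  state := table[state, col]     { take the transition }
end;
{ w is accepted iff state is now one of the accepting states }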
Lexical Analysis 31
Nondeterministic Finite-State
Automata
So far, the behavior of our FSAs has
always been predictable. But there is
another type of FSA in which the state
transitions are not predictable.
In these machines, the state transition
function is of the form:
N: Q × (Σ ∪ {ε}) → P(Q)
(Σ is the input alphabet, ε the empty string, and P(Q) the power set of Q)
– Note: some authors use a Greek lambda, λ or
Λ, in place of ε
Lexical Analysis 32
This means two things:
– There can be transitions without input.
(That is why ε is in the domain)
– The same input can lead to any of several states.
(That is the significance of the power set
in the codomain)
Lexical Analysis 33
Since this makes the behavior
unpredictable, we call it a
nondeterministic FSA
– So now we have DFAs and NFAs (or
NDFAs)
A string is accepted if there is at least one
path labeled with that string from the start
state to an accepting state.
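"At least one path" can be checked without guessing by tracking every state the machine could be in at once. This is a sketch under stated assumptions: the empty-string label `EPS`, the function names, and the example NFA for a*b* are all made up for illustration.

```python
EPS = ""   # stand-in label for transitions taken without input (an assumption)

def eps_closure(states, delta):
    """All states reachable from `states` using only no-input transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def nfa_accepts(w, start, accepting, delta):
    """delta maps (state, symbol) to a set of possible next states."""
    current = eps_closure({start}, delta)
    for symbol in w:
        moved = set()
        for s in current:
            moved |= set(delta.get((s, symbol), ()))
        current = eps_closure(moved, delta)
    return bool(current & accepting)   # some path ended in an accepting state

# Hypothetical NFA for a*b*: loop on a in state 0, move to 1 for free, loop on b.
delta = {(0, "a"): {0}, (0, EPS): {1}, (1, "b"): {1}}
```

The set `current` holds the end points of all paths read so far, so the final intersection test is exactly the acceptance rule above.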
Lexical Analysis 34
Example: Given Machine
Trace the input = baabab
Lexical Analysis 36
ε-Transitions
A spontaneous transition without input.
Example: Trace input aaba
Lexical Analysis 38
Equivalence
For every non-deterministic machine M we
can construct an equivalent deterministic
machine M'
So why study N-FSAs?
– 1. Theory.
– 2. Tokens -> Reg. Expr. -> N-FSA -> D-FSA
Lexical Analysis 40
The Subset Construction
Constructing an equivalent DFA from a
given NFA hinges on the fact that
transitions between state sets are
deterministic even if the transitions
between states are not.
Acceptance states of the subset machine
are those subsets that contain at least 1
accepting state.
Lexical Analysis 41
Generic brute force construction is
impractical.
– As the number of states in M increases,
the number of states in M' increases
drastically (n vs. 2^n). If we have an NFA
with 20 states, |P(Q)| = 2^20, which is just
over a million.
– This also leads to the creation of many
unreachable states. (which can be omitted)
The trick is to only create subset states
as you need them.
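Creating subset states only as needed can be sketched with a worklist. For brevity this version assumes an NFA without ε-transitions; the function name and the example NFA (strings ending in "ab") are hypothetical.

```python
def subset_construction(start, accepting, delta, alphabet):
    """Build a DFA from an NFA without ε-moves, creating subsets on demand.
    delta maps (state, symbol) to a set of NFA states."""
    start_set = frozenset({start})
    dfa_delta = {}
    seen = {start_set}
    worklist = [start_set]
    while worklist:
        subset = worklist.pop()
        for a in alphabet:
            target = frozenset(t for s in subset for t in delta.get((s, a), ()))
            dfa_delta[(subset, a)] = target
            if target not in seen:     # first time this subset is needed
                seen.add(target)
                worklist.append(target)
    # A subset accepts if it contains at least one accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return start_set, seen, dfa_accepting, dfa_delta

# Hypothetical 3-state NFA accepting strings over {a, b} that end in "ab".
nfa_delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
dfa_start, dfa_states, dfa_accepting, dfa_delta = subset_construction(
    0, {2}, nfa_delta, "ab")
```

For this NFA only 3 of the 2^3 = 8 possible subsets are ever created, which is the point of building them lazily.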
Lexical Analysis 42
Example:
– Given: NFA
– Build DFA out of NFA
Lexical Analysis 44
Why do we care?
Lexical Analysis and Syntactic Analysis
are typically run off of tables.
These tables are large and laborious to
build.
Therefore, we use a program to build the
tables.
Lexical Analysis 45
But there are two major problems:
– How do we represent a token for the table
generating program?
– How does the program convert this into the
corresponding FSA?
Tokens are described using regular
expressions.
Lexical Analysis 47
Regular Expressions
Informally, a regular expression over an
alphabet Σ is a combination of characters
from Σ and certain operators indicating
concatenation, selection, or repetition.
– b* -- 0 or more b's (Kleene star)
– b+ -- 1 or more b's
– a|b -- choice of a or b
Lexical Analysis 48
Regular Expressions
Def: Regular Expression:
– any character in the alphabet Σ is an RE
– ε is an RE
– if R and S are REs, so are
RS, R|S, R*, and R+ (and likewise S* and S+).
Only expressions formed by these rules
are regular.
Lexical Analysis 49
Regular Expressions
REs can be used to describe only a limited
variety of languages, but they are powerful
enough to be used to define tokens.
One limitation -- many languages put
length limits on their tokens, and REs
have no compact way of enforcing such
limits.
Lexical Analysis 50
Extensions to REs
One or more Instances +
Zero or One Instance ?
Character Classes
– Digit -> [0-9]
– Digits -> Digit+
– Number -> Digits (. Digits)? (E [+-]? Digits)?
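The three definitions above translate almost symbol for symbol into Python's `re` syntax. In this sketch the literal dot is escaped as `\.`, and the helper name `is_number` is an assumption made for illustration.

```python
import re

DIGIT = r"[0-9]"
DIGITS = rf"{DIGIT}+"
# Number -> Digits (. Digits)? (E [+-]? Digits)?
NUMBER = rf"{DIGITS}(\.{DIGITS})?(E[+-]?{DIGITS})?"
NUMBER_RE = re.compile(NUMBER)

def is_number(text):
    """True if the whole of text matches the Number pattern."""
    return NUMBER_RE.fullmatch(text) is not None
```

Under this pattern "6.02E23" and "1E-9" are numbers, while ".5" and "1." are not, because Digits requires at least one digit on each side of the dot.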
Lexical Analysis 51
Regular Expressions and Finite-
State Machines
This machine recognizes ε
Lexical Analysis 52
This machine will recognize a character
a in the alphabet Σ
Lexical Analysis 53
To recognize RS connect the machines
as shown
Lexical Analysis 54
To recognize R|S, connect the
machines this way.
Lexical Analysis 55
R*
– Begin with R+
– Now add the zero or more to go from R+ to
R*
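The constructions for a single character, RS, R|S, R+, and R* can be sketched as functions that wire small machines together with ε-moves. The slides' diagrams are not reproduced in this text, so the wiring below follows the usual Thompson-style construction and may differ in detail from the figures; all names are hypothetical.

```python
import itertools

EPS = None                       # label used for ε-moves in this sketch
_ids = itertools.count()         # source of fresh state numbers

class Frag:
    """An NFA fragment: one start state, one accept state, a list of edges."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def symbol(a):
    s, t = next(_ids), next(_ids)
    return Frag(s, t, [(s, a, t)])

def concat(r, s):
    # RS: ε-move from R's accept state to S's start state
    return Frag(r.start, s.accept, r.edges + s.edges + [(r.accept, EPS, s.start)])

def union(r, s):
    # R|S: a new start branches to both machines; both feed a new accept state
    a, z = next(_ids), next(_ids)
    extra = [(a, EPS, r.start), (a, EPS, s.start),
             (r.accept, EPS, z), (s.accept, EPS, z)]
    return Frag(a, z, r.edges + s.edges + extra)

def plus(r):
    # R+: after one pass through R, optionally loop back to its start
    a, z = next(_ids), next(_ids)
    extra = [(a, EPS, r.start), (r.accept, EPS, z), (r.accept, EPS, r.start)]
    return Frag(a, z, r.edges + extra)

def star(r):
    # R*: begin with R+, then add an ε-move that skips the machine entirely
    p = plus(r)
    return Frag(p.start, p.accept, p.edges + [(p.start, EPS, p.accept)])

def matches(frag, w):
    """Simulate the fragment on w by tracking the set of possible states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for src, label, dst in frag.edges:
                if src == s and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen
    current = closure({frag.start})
    for ch in w:
        current = closure({dst for src, label, dst in frag.edges
                           if src in current and label == ch})
    return frag.accept in current

# The pattern aa*(b|c), built bottom-up from the pieces above.
X = concat(symbol("a"), concat(star(symbol("a")), union(symbol("b"), symbol("c"))))
```

Each fragment should be used only once when composing, since reusing one would share its states between two places in the larger machine.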
Lexical Analysis 56
The Pumping Lemma
Given a machine M with n states
and a string w in L(M) of length at least n,
w must pass through at least n+1 states, so
some state is visited twice (call the substring
read between the two visits y).
Therefore w = xyz, and y can be looped,
so every string in xy*z is also in the language.
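The argument can be replayed on a concrete machine: record where each state is first entered, and split w at the first repeat. This is a sketch for a deterministic machine; the function names and the two-state example (strings ending in "b") are assumptions.

```python
def pump_split(w, start, delta):
    """Split w into (x, y, z) where y takes the DFA around a loop.
    delta maps (state, symbol) to a state. Returns None if no state repeats."""
    first_seen = {start: 0}        # state -> position in w where first entered
    state = start
    for i, symbol in enumerate(w, start=1):
        state = delta[(state, symbol)]
        if state in first_seen:
            j = first_seen[state]
            return w[:j], w[j:i], w[i:]
        first_seen[state] = i
    return None

def accepts(w, start, accepting, delta):
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting

# Hypothetical 2-state DFA: accepts strings over {a, b} that end in "b".
delta = {(1, "a"): 1, (1, "b"): 2, (2, "a"): 1, (2, "b"): 2}
x, y, z = pump_split("ab", 1, delta)
pumped_ok = all(accepts(x + y * k + z, 1, {2}, delta) for k in range(5))
```

Here w = "ab" has length n = 2, the loop is y = "a", and every string a^k b stays in the language.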
Lexical Analysis 57
The goal of the pumping lemma is to
show that there are some languages
that are not regular.
For Example:
– LR = {w c w^R | w in {0,1}*}, where w^R is w reversed
– LP -- matching parens
this is handled in syntax analysis.
Lexical Analysis 58
Application to Lexical Analysis
Now you are ready to put it all together:
– Given the regular expressions for 2 tokens
X = aa*(b|c)
Y = (b|c)c*
– Construct the NDFA
– Construct the DFA
Lexical Analysis 60
Recognizing Tokens
The scanner must ignore white space
(except to note the end of a token)
– Add white space transition from Start state to
Start state.
When you enter an accept state,
announce it
– (therefore you cannot pass through accept
states)
– The string may be the entire program.
Lexical Analysis 61
One accept state for each token, so we
know what we found.
Identifier/Keyword differences
– Accept everything as an identifier, and then
look up keywords in table. Or pre-load the
Symbol Table with Keywords.
To tell that an identifier has ended, you
must read one character past it. You then
need to back up (put that character
back on the input stream).
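Both points, the keyword lookup and the one-character back-up, fit in a short sketch. The class `CharStream`, the function `scan_word`, and the keyword table are hypothetical names used only for this illustration.

```python
KEYWORDS = {"if", "else", "while", "return"}    # hypothetical keyword table

class CharStream:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def read(self):
        """Return the next character, or '' at end of input."""
        ch = self.text[self.pos:self.pos + 1]
        self.pos += 1
        return ch

    def unread(self):
        """Put the most recently read character back."""
        self.pos -= 1

def scan_word(stream):
    """Scan one identifier or keyword starting at the current position."""
    lexeme = ""
    ch = stream.read()
    while ch.isalnum() or ch == "_":
        lexeme += ch
        ch = stream.read()
    stream.unread()                # one character too far was read; back up
    # Accept everything as an identifier, then look it up in the keyword table.
    kind = "keyword" if lexeme in KEYWORDS else "id"
    return kind, lexeme

s = CharStream("while(x1)")
first = scan_word(s)               # stops after reading "(" and puts it back
paren = s.read()
second = scan_word(s)
```

Because of the `unread` call, the "(" that ended the word is still available to whatever scans the next token.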
Lexical Analysis 62
Comments
– Recognize the beginning of comment, and
then ignore everything until the end of
comment.
– What if there are multiple types of
comments?
Character Strings
– single or double quotes?

02-Lexical-Analysis.ppt

  • 2. Lexical Analysis 2 Review What is a compiler? What are the Phases of a Compiler?
  • 3. Lexical Analysis 3 The Goals of Chapter 3 See how to construct a lexical analyzer – Lexeme description – Code to recognize lexemes Lexical Analyzer Generator – Lex and Flex Regular Expressions – To Nondeterministic finite autamota – To Finite Automata – Then to Code
  • 4. Lexical Analysis 4 What is a Lexical Analyzer supposed to do? Read the input characters of the source program, Group them into lexemes, – Check with the symbol table regarding this lexeme Produce as output a sequence of tokens – This stream is sent to the parser
  • 5. Lexical Analysis 5 What is a Lexical Analyzer supposed to do? It may also – strip out comments and whitespace. – Keep track of line numbers in the source file (for error messages)
  • 6. Lexical Analysis 6 Terms Token – A pair (token name, [attribute value]) – Ex: (integer, 32) Pattern – A description of the form the lexemes will take – Ex: regular expressions Lexeme – An instance matched by the pattern.
  • 7. Lexical Analysis 7 What are the tokens? Code: – printf(“Total= %dn”, score); Tokens: – (id, printf) – openParen – literal “Total= %dn” – comma – (id, score) – closeParen – semicolon
  • 8. Lexical Analysis 8 Token Attributes When more than one lexeme can match a pattern, the lexical analyzer must provide the next phase additional information. – Number – needs a value – Id – needs the identifier name, or probably a symbol table pointer
  • 9. Lexical Analysis 9 Specification of Tokens Regular expressions are an important notation for specifying lexeme patterns. We are going to – first study the formal notation for a regular expression – Then we will show how to build a lexical analyzer that uses regular expressions. – Then we will see how they are used in a lexical analyzer.
  • 10. Lexical Analysis 10 Buffering In principle, the analyzer goes through the source string a character at a time; In practice, it must be able to access substrings of the source. Hence the source is normally read into a buffer The scanner needs two subscripts to note places in the buffer – lexeme start & current position
  • 11. Lexical Analysis 11 Strings and Alphabets Def: Alphabet – a finite set of symbols – Typically letters, digits, and punctuation Example: – {0,1} is a binary alphabet.
  • 12. Lexical Analysis 12 Strings and Alphabets Def: String – (over an alphabet) is a finite sequence of symbols drawn from that alphabet. – Sometimes called a “word” – The empty string e
  • 13. Lexical Analysis 13 Strings and Alphabets Def: Language – any countable set of strings over some fixed alphabet. Notes: – Abstract languages like , and {e} fit this definition, but so also do C programs – This definition also does not require any meaning – that will come later.
  • 15. Lexical Analysis 15 Finite State Automata The compiler writer defines tokens in the language by means of regular expressions. Informally a regular expression is a compact notation that indicates what characters may go together into lexemes belonging to that particular token and how they go together. We will see regular expressions later
  • 16. Lexical Analysis 16 The lexical analyzer is best implemented as a finite state machine or a finite state automaton. Informally a Finite-State Automaton is a system that has a finite set of states with rules for making transitions between states. The goal now is to explain these two things in detail and bridge the gap from the first to the second.
  • 17. Lexical Analysis 17 State Diagrams and State Tables Def: State Diagram -- is a directed graph where the vertices in the graph represent states, and the edges indicate transitions between the states. Let’s consider a vending machine that sells candy bars for 25 cents, and it takes nickels, dimes, and quarters.
  • 18. Lexical Analysis 18 Figure 2.1 -- pg.. 21
  • 19. Lexical Analysis 19 Def: State Table -- is a table with states down the left side, inputs across the top, and row/column values indicate the current state/input and what state to go to.
  • 20. Lexical Analysis 20 Formal Definition Def: FSA -- A FSA, M consists of – a finite set of input symbols S (the input alphabet) – a finite set of states Q – A starting state q0 (which is an element of Q) – A set of accepting states F (a subset of Q) (these are sometimes called final states) – A state-transition function N: (Q x S) -> Q M = (S, Q, q0, F, N)
  • 21. Lexical Analysis 21 Example 1: – Given Machine Construct Table:
  • 23. Lexical Analysis 23 Example 1 (cont.): – State 1 is the Starting State. This is shown by the arrow in the Machine, and the fact that it is the first state in the table. – States 3 and 4 are the accepting states. This is shown by the double circles in the machine, and the fact that they are underlined in the table.
  • 24. Lexical Analysis 24 Example 2: – Given Table, Construct Machine:
  • 26. Lexical Analysis 26 Example 2 (cont.): – This machine shows that it is entirely possible to have unreachable states in an automaton. – These states can be removed without affecting the automaton.
  • 27. Lexical Analysis 27 Acceptance We use FSA's for recognizing tokens A character string is recognized (or accepted) by FSA M if, when the last character has been read, M is in one of the accepting states. – If we pass through an accepting state, but end in a non-accepting state, the string is NOT accepted.
  • 28. Lexical Analysis 28 Def: language -- A language is any set of strings. Def: a language over an alphabet S is any set of strings made up only of the characters from S Def: L(M) -- the language accepted by M is the set of all strings over S that are accepted by M
  • 29. Lexical Analysis 29 if we have an FSA for every token, then the language accepted by that FSA is the set of all lexemes embraced by that token. Def: equivalent -- M1 == M2 iff L(M1) = L(M2).
  • 30. Lexical Analysis 30 A FSA can be easily programmed if the state table is stored in memory as a two- dimensional array. table : array[1..nstates,1..ninputs] of byte; Given an input string w, the code would look something like this: state := 1; for i:=1 to length(w) do begin col:= char_to_col(w[i]); state:= table[state, col] end;
  • 31. Lexical Analysis 31 Nondeterministic Finite-State Automata So far, the behavior of our FSAs has always been predictable. But there is another type of FSA in which the state transitions are not predictable. In these machines, the state transition function is of the form: N: Q x (S U {e}) -> P(Q) – Note: some authors use a Greek lambda, l or L
  • 32. Lexical Analysis 32 This means two things: – There can be transitions without input. (That is why the e is in the domain) – Input can transition to a number of states. (That is the significance of the power set in the codomain)
  • 33. Lexical Analysis 33 Since this makes the behavior unpredictable, we call it a nondeterministic FSA – So now we have DFAs and NFAs (or NDFAs) a string is accepted if there is at least 1 path from the start state to an accepting state.
  • 34. Lexical Analysis 34 Example: Given the machine, trace the input baabab.
  • 36. Lexical Analysis 36 ε-Transitions: a spontaneous transition without input. Example: Trace the input aaba.
  • 38. Lexical Analysis 38 Equivalence For every nondeterministic machine M we can construct an equivalent deterministic machine M'. Therefore, why study N-FSAs? – 1. Theory. – 2. Tokens -> Reg. Expr. -> N-FSA -> D-FSA
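Tracing an NFA by hand amounts to tracking the set of states the machine could be in after each character. The sketch below does this, including ε-transitions. The machine `NFA` is a hypothetical example (the slide's diagram is not reproduced here), and the empty string is used as the label for ε.

```python
EPS = ""  # label used here for ε-transitions

def eps_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def nfa_accepts(w, start, accepting, delta):
    current = eps_closure({start}, delta)
    for ch in w:
        moved = set()
        for s in current:
            moved |= set(delta.get((s, ch), ()))
        current = eps_closure(moved, delta)
    # Accepted if at least one path ends in an accepting state.
    return bool(current & accepting)

# Hypothetical NFA for strings ending in "ab", with one ε-transition 0 -> 1.
NFA = {
    (0, EPS): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
}
```

With accepting set `{3}`, the call `nfa_accepts("baabab", 0, {3}, NFA)` follows every possible path at once, so no backtracking is needed.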
  • 40. Lexical Analysis 40 The Subset Construction Constructing an equivalent DFA from a given NFA hinges on the fact that transitions between state sets are deterministic even if the transitions between states are not. Acceptance states of the subset machine are those subsets that contain at least 1 accepting state.
  • 41. Lexical Analysis 41 Generic brute force construction is impractical. – As the number of states in M increases, the number of states in M' increases drastically (n vs. 2^n). If we have an NFA with 20 states, |P(Q)| = 2^20 = 1,048,576, just over a million. – This also leads to the creation of many unreachable states (which can be omitted). The trick is to only create subset states as you need them.
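The "create subset states only as you need them" idea can be sketched with a work queue: start from the ε-closure of the NFA start state and add a new subset only when some transition actually reaches it. The NFA below and all function names are hypothetical, and missing transitions are treated as an implicit dead state.

```python
from collections import deque

EPS = ""  # label used here for ε-transitions

def eps_closure(states, delta):
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(start, accepting, delta, alphabet):
    """Build only the DFA states reachable from the start subset."""
    d_start = eps_closure({start}, delta)
    d_delta, d_accepting = {}, set()
    seen, queue = {d_start}, deque([d_start])
    while queue:
        subset = queue.popleft()
        if subset & accepting:
            d_accepting.add(subset)  # contains at least one accepting NFA state
        for ch in alphabet:
            moved = set()
            for s in subset:
                moved |= set(delta.get((s, ch), ()))
            target = eps_closure(moved, delta)
            if not target:
                continue  # no transition: implicit dead state
            d_delta[(subset, ch)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return d_start, d_accepting, d_delta, seen

# Hypothetical 3-state NFA for strings over {a, b} ending in "ab".
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
D_START, D_ACCEPTING, D_DELTA, D_STATES = subset_construction(0, {2}, NFA, "ab")
```

For this NFA the construction creates 3 subset states rather than all 2^3 = 8 members of P(Q).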
  • 42. Lexical Analysis 42 Example: – Given: NFA – Build DFA out of NFA
  • 44. Lexical Analysis 44 Why do we care? Lexical Analysis and Syntactic Analysis are typically run off of tables. These tables are large and laborious to build. Therefore, we use a program to build the tables.
  • 45. Lexical Analysis 45 But there are two major problems: – How do we represent a token for the table generating program? – How does the program convert this into the corresponding FSA? Tokens are described using regular expressions.
  • 47. Lexical Analysis 47 Regular Expressions Informally, a regular expression over an alphabet Σ is a combination of characters from Σ and certain operators indicating concatenation, selection, or repetition. – b* -- 0 or more b's (Kleene star) – b+ -- 1 or more b's – a|b -- choice between a and b
  • 48. Lexical Analysis 48 Regular Expressions Def: Regular Expression: – any character in Σ is an RE – ε is an RE – if R and S are REs, so are RS, R|S, R*, R+, S*, S+. Only expressions formed by these rules are regular.
  • 49. Lexical Analysis 49 Regular Expressions REs can be used to describe only a limited variety of languages, but they are powerful enough to be used to define tokens. One limitation -- many languages put length limitations on their tokens, and the basic RE operators offer no concise way of enforcing such limitations.
  • 50. Lexical Analysis 50 Extensions to REs – One or more instances: + – Zero or one instance: ? – Character classes: [ ] – Digit -> [0-9] – Digits -> Digit+ – Number -> Digits (. Digits)? (E [+-]? Digits)?
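The Digit, Digits, and Number definitions above translate almost directly into Python's `re` syntax. This is only an illustration of the notation: the names `NUMBER_RE` and `is_number` are hypothetical, and the dot must be escaped because an unescaped `.` matches any character in `re`.

```python
import re

DIGIT = r"[0-9]"
DIGITS = rf"{DIGIT}+"
# Number -> Digits (. Digits)? (E [+-]? Digits)?
NUMBER = rf"{DIGITS}(\.{DIGITS})?(E[+-]?{DIGITS})?"
NUMBER_RE = re.compile(NUMBER)

def is_number(text):
    """True if the whole of `text` is a lexeme matching the Number pattern."""
    return NUMBER_RE.fullmatch(text) is not None
```

As written on the slide, the pattern requires digits on both sides of the decimal point and an uppercase `E`, so `3.` and `1e5` are not matched.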
  • 51. Lexical Analysis 51 Regular Expressions and Finite-State Machines This machine recognizes ε
  • 52. Lexical Analysis 52 This machine will recognize a character a in Σ
  • 53. Lexical Analysis 53 To recognize RS connect the machines as shown
  • 54. Lexical Analysis 54 To recognize R|S, connect the machines this way.
  • 55. Lexical Analysis 55 R* – Begin with the machine for R+ – Now add the zero-occurrence case to go from R+ to R*
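The machine-building rules for single characters, RS, R|S, R+, and R* can be sketched as functions that each return a small NFA fragment. This is one common way to wire the pieces with ε-transitions and may differ in detail from the diagrams on the slides. All names are hypothetical; a fragment is a tuple `(start, accept, table)`, and the empty string stands for ε.

```python
import itertools

EPS = ""  # label used here for ε-transitions
_ids = itertools.count()

def _new():
    return next(_ids)

def _merge(*tables):
    out = {}
    for table in tables:
        for key, targets in table.items():
            out.setdefault(key, set()).update(targets)
    return out

def _add(table, src, label, dst):
    table.setdefault((src, label), set()).add(dst)

def symbol(ch):
    """Machine for a single character (pass EPS for the ε machine)."""
    s, f = _new(), _new()
    return s, f, {(s, ch): {f}}

def concat(r, s):
    """RS: ε-link R's accept state to S's start state."""
    table = _merge(r[2], s[2])
    _add(table, r[1], EPS, s[0])
    return r[0], s[1], table

def union(r, s):
    """R|S: a new start branches to both; both accepts join a new accept."""
    start, accept = _new(), _new()
    table = _merge(r[2], s[2])
    for frag in (r, s):
        _add(table, start, EPS, frag[0])
        _add(table, frag[1], EPS, accept)
    return start, accept, table

def plus(r):
    """R+: loop from R's accept state back to its start state."""
    start, accept = _new(), _new()
    table = _merge(r[2])
    _add(table, start, EPS, r[0])
    _add(table, r[1], EPS, accept)
    _add(table, r[1], EPS, r[0])
    return start, accept, table

def star(r):
    """R*: R+ plus an ε-edge that skips the machine (zero occurrences)."""
    start, accept, table = plus(r)
    _add(table, start, EPS, accept)
    return start, accept, table

def accepts(frag, w):
    """Simulate the fragment as an NFA on input w."""
    start, accept, table = frag

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for t in table.get((stack.pop(), EPS), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for ch in w:
        current = closure({t for s in current for t in table.get((s, ch), ())})
    return accept in current

# X = aa*(b|c), one of the token patterns used later in the deck.
X = concat(symbol("a"), concat(star(symbol("a")), union(symbol("b"), symbol("c"))))
```

Each operator adds at most two new states, so the resulting NFA stays proportional in size to the regular expression.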
  • 56. Lexical Analysis 56 The Pumping Lemma Given a machine M with n states and a string w in L(M) of length at least n: reading w passes through at least n+1 states, so some state is visited twice. Call the substring read between the two visits y. Then w = xyz and y can be looped, so every string in xy*z is also part of the language.
  • 57. Lexical Analysis 57 The pumping lemma is used to show that there are some languages that are not regular. For example: – L_R = { w c w^R | w in {0,1}* }, where w^R is w reversed – L_P -- matching parens; this is handled in syntax analysis.
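The split w = xyz can be found mechanically by recording when each state is first visited. The following sketch does that for a hypothetical 3-state DFA; `pump_split`, `run`, and `DELTA` are illustrative names, not part of the original slides.

```python
def pump_split(w, start, delta):
    """Split w into (x, y, z) where y drives the DFA around a loop.

    Returns None if no state repeats while reading w.
    """
    first_seen = {start: 0}  # state -> characters read when first visited
    state = start
    for i, ch in enumerate(w, start=1):
        state = delta[(state, ch)]
        if state in first_seen:
            j = first_seen[state]
            return w[:j], w[j:i], w[i:]
        first_seen[state] = i
    return None

def run(w, start, delta):
    """Return the state the DFA is in after reading w."""
    state = start
    for ch in w:
        state = delta[(state, ch)]
    return state

# Hypothetical 3-state DFA accepting strings over {a, b} that end in "ab";
# state 2 is the only accepting state.
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 1, (2, "b"): 0}
x, y, z = pump_split("aab", 0, DELTA)
```

For w = aab the split is x = a, y = a, z = b, and every string a a^k b still ends in accepting state 2.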
  • 58. Lexical Analysis 58 Application to Lexical Analysis Now you are ready to put it all together: – Given the regular expressions for 2 tokens: X = aa*(b|c) and Y = (b|c)c* – Construct the NDFA – Construct the DFA
  • 60. Lexical Analysis 60 Recognizing Tokens The scanner must ignore white space (except to note the end of a token) – Add white space transition from Start state to Start state. When you enter an accept state, announce it – (therefore you cannot pass through accept states) – The string may be the entire program.
  • 61. Lexical Analysis 61 One accept state for each token, so we know what we found. Identifier/Keyword differences – Accept everything as an identifier, and then look up keywords in a table. Or pre-load the Symbol Table with Keywords. When you read an identifier, you must read one character past it to tell that it has ended, so you need to back up (put that character back on the input stream).
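The points above (skip white space, accept identifiers and then check a keyword table, read one character too far and back up) can be combined in a small hand-written scanner. This is a sketch under assumptions: the keyword set, the token names, and the function `scan` are hypothetical, and any other character is simply reported as a one-character `punct` token.

```python
KEYWORDS = {"if", "else", "while"}  # hypothetical keyword table

def scan(text):
    """Return a list of (token name, lexeme) pairs."""
    tokens, pos = [], 0

    def lexeme_from(start, belongs):
        nonlocal pos
        while True:
            ch = text[pos] if pos < len(text) else ""  # "" marks end of input
            pos += 1                                   # read the next character
            if not (ch and belongs(ch)):
                pos -= 1                               # back up: it was not ours
                return text[start:pos]

    while pos < len(text):
        start, ch = pos, text[pos]
        pos += 1
        if ch.isspace():
            continue  # white space: start state back to start state
        if ch.isalpha() or ch == "_":
            word = lexeme_from(start, lambda c: c.isalnum() or c == "_")
            # Accept as an identifier, then look it up in the keyword table.
            tokens.append(("keyword" if word in KEYWORDS else "id", word))
        elif ch.isdigit():
            tokens.append(("integer", lexeme_from(start, str.isdigit)))
        else:
            tokens.append(("punct", ch))
    return tokens
```

In `scan("while(count)")` the `(` is read while scanning `while`, put back, and then returned as its own token on the next pass through the loop.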
  • 62. Lexical Analysis 62 Comments – Recognize the beginning of comment, and then ignore everything until the end of comment. – What if there are multiple types of comments? Character Strings – single or double quotes?
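One possible answer to the questions above is to handle each comment style and each quote style as its own case, and to scan string literals whole so that comment markers inside them are ignored. The sketch assumes C-style `//` and `/* */` comments and backslash escapes inside single- or double-quoted strings; none of these choices come from the slides, and `strip_comments` is a hypothetical name.

```python
def strip_comments(text):
    """Remove // and /* */ comments, leaving string literals intact."""
    out, pos, n = [], 0, len(text)
    while pos < n:
        if text.startswith("//", pos):
            end = text.find("\n", pos)
            pos = n if end == -1 else end      # keep the newline itself
        elif text.startswith("/*", pos):
            end = text.find("*/", pos + 2)
            pos = n if end == -1 else end + 2  # unterminated: runs to end of input
        elif text[pos] in "\"'":
            quote, start = text[pos], pos
            pos += 1
            while pos < n and text[pos] != quote:
                pos += 2 if text[pos] == "\\" else 1  # skip escaped characters
            pos = min(pos + 1, n)              # consume the closing quote
            out.append(text[start:pos])
        else:
            out.append(text[pos])
            pos += 1
    return "".join(out)
```

Keeping the newline after a line comment preserves line counts, which matters if the scanner also tracks line numbers for error messages.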