SlideShare a Scribd company logo
1 of 52
Lexical Analysis and Lexical Analyzer Generators Chapter 3 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007-2009
The Reason Why Lexical Analysis is a Separate Phase ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Interaction of the Lexical Analyzer with the Parser Lexical Analyzer Parser Source Program Token, tokenval Symbol Table Get next token error error
Attributes of Tokens Lexical analyzer < id , “ y ”> < assign , > < num , 31> < + , > < num , 28> < * , > < id , “ x ”> y := 31 + 28*x Parser token tokenval (token attribute)
Tokens, Patterns, and Lexemes ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Specification of Patterns for Tokens:  Definitions ,[object Object],[object Object],[object Object],[object Object],[object Object]
Specification of Patterns for Tokens:  String Operations ,[object Object],[object Object]
Specification of Patterns for Tokens:  Language Operations ,[object Object],[object Object],[object Object],[object Object],[object Object]
Specification of Patterns for Tokens:  Regular Expressions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Specification of Patterns for Tokens:  Regular Definitions ,[object Object],[object Object]
Specification of Patterns for Tokens:  Regular Definitions ,[object Object],[object Object]
Specification of Patterns for Tokens:  Notational Shorthand ,[object Object],[object Object]
Regular Definitions and Grammars stmt      if   expr   then   stmt      if  expr   then   stmt   else   stmt         expr    term  relop   term      term term     id      num if      if   then      then   else      else relop      <      <=      <>      >      >=      =   id      letter ( letter  |  digit  ) *  num      digit +  ( . digit + )? (  E  ( +  - )?  digit +  )? Grammar Regular definitions
Coding Regular Definitions in  Transition Diagrams 0 2 1 6 3 4 5 7 8 return ( relop ,  LE ) return ( relop ,  NE ) return ( relop ,  LT ) return ( relop ,  EQ ) return ( relop ,  GE ) return ( relop ,  GT ) start < = > = > = other other * * 9 start letter 10 11 * other letter  or  digit return ( gettoken (),   install_id ()) relop      <  <=  <>  >  >=  = id      letter ( letter  digit  ) *
Coding Regular Definitions in Transition Diagrams: Code token nexttoken() {  while  (1) {   switch  (state) {   case  0: c = nextchar();   if  (c==blank || c==tab || c==newline) {   state = 0;   lexeme_beginning++;   }   else   if  (c==‘<’) state = 1; else   if  (c==‘=’) state = 5;   else   if  (c==‘>’) state = 6;   else  state = fail();   break ;   case  1:   …   case  9: c = nextchar();   if  (isletter(c)) state = 10;   else  state = fail();   break ;   case  10: c = nextchar();   if  (isletter(c)) state = 10;   else   if  (isdigit(c)) state = 10;   else  state = 11;   break ;   … int  fail() { forward = token_beginning; swith  (start) {   case   0: start =  9;  break ; case   9: start = 12;  break ;   case  12: start = 20;  break ;   case  20: start = 25;  break ;   case  25: recover();  break ;   default : /* error */ } return  start; } Decides the next start state to check
The Lex and Flex Scanner Generators ,[object Object],[object Object],[object Object]
Creating a Lexical Analyzer with Lex and Flex lex or flex compiler lex source program lex.l lex.yy.c input stream C compiler a.out sequence of tokens lex.yy.c a.out
Lex Specification ,[object Object],[object Object]
Regular Expressions in Lex x match the character  x    match the character  .   “ string ” match contents of string of characters   .  match any character except newline ^ match beginning of a line $ match the end of a line [xyz] match one character  x ,  y , or  z  (use   to escape  - )  [^xyz] match any character except  x ,  y , and  z   [a-z] match one of  a  to  z r * closure (match zero or more occurrences) r + positive closure (match one or more occurrences) r ?  optional (match zero or one occurrence) r 1 r 2 match  r 1  then  r 2  (concatenation) r 1 | r 2 match  r 1  or  r 2  (union) (  r  )  grouping r 1 r 2 match  r 1  when followed by  r 2 { d } match the regular expression defined by  d
Example Lex Specification 1 %{ #include <stdio.h> %} %% [0-9]+  { printf(“%s”, yytext); } .|  { } %% main() { yylex(); } Contains the matching lexeme Invokes the lexical analyzer lex spec.l gcc lex.yy.c -ll ./a.out < spec.l Translation rules
Example Lex Specification 2 %{ #include <stdio.h> int ch = 0, wd = 0, nl = 0; %} delim  [ ]+ %%   { ch++; wd++; nl++; } ^{delim}  { ch+=yyleng; } {delim}  { ch+=yyleng; wd++; } .  { ch++; } %% main() { yylex(); printf(&quot;%8d%8d%8d&quot;, nl, wd, ch); } Regular definition Translation rules
Example Lex Specification 3 %{ #include <stdio.h> %} digit  [0-9] letter  [A-Za-z] id  {letter}({letter}|{digit})* %% {digit}+  { printf(“number: %s”, yytext); } {id}  { printf(“ident: %s”, yytext); } .  { printf(“other: %s”, yytext); } %% main() { yylex();  } Regular definitions Translation rules
Example Lex Specification 4 %{ /* definitions of manifest constants */ #define LT (256) … %} delim  [ ] ws  {delim}+ letter  [A-Za-z] digit  [0-9] id  {letter}({letter}|{digit})* number  {digit}+({digit}+)?(E[+]?{digit}+)? %% {ws}  { } if  {return IF;} then  {return THEN;} else  {return ELSE;} {id}  {yylval = install_id(); return ID;} {number}  {yylval = install_num(); return NUMBER;} “<“  {yylval = LT; return RELOP;} “<=“  {yylval = LE; return RELOP;} “ =“  {yylval = EQ; return RELOP;} “ <>“  {yylval = NE; return RELOP;} “>“  {yylval = GT; return RELOP;} “ >=“  {yylval = GE; return RELOP;} %% int install_id() … Return token to parser Token attribute Install  yytext  as identifier in symbol table
Design of a Lexical Analyzer Generator ,[object Object],[object Object],regular expressions NFA DFA Simulate NFA to recognize tokens Simulate DFA to recognize tokens Optional
Nondeterministic Finite Automata ,[object Object]
Transition Graph ,[object Object],0 start a 1 3 2 b b a b S  = {0,1,2,3}   = { a , b } s 0  = 0 F  = {3}
Transition Table ,[object Object], (0, a ) = {0,1}  (0, b ) = {0}  (1, b ) = {2}  (2, b ) = {3} State Input a Input b 0 {0, 1} {0} 1 {2} 2 {3}
The Language Defined by an NFA ,[object Object],[object Object],[object Object]
Design of a Lexical Analyzer Generator: RE to NFA to DFA s 0 N ( p 1 ) N ( p 2 ) start   N ( p n )  … p 1 {  action 1  } p 2 {  action 2  } … p n {  action n  } action 1 action 2 action n Lex specification with regular expressions NFA DFA Subset construction
From Regular Expression to NFA (Thompson’s Construction) N ( r 2 ) N ( r 1 ) f i  f a i f i N ( r 1 ) N ( r 2 ) start start start     f i start N ( r ) f i start    a r 1  r 2 r 1 r 2 r *  
Combining the NFAs of a Set of Regular Expressions 2 a 1 start 6 a 3 start 4 5 b b 8 b 7 start a b a {  action 1  } abb {  action 2  }   a * b + {  action 3  } 2 a 1 6 a 3 4 5 b b 8 b 7 a b 0 start   
Simulating the Combined NFA Example 1 2 a 1 6 a 3 4 5 b b 8 b 7 a b 0 start    Must find the  longest match : Continue until no further moves are possible When last state is accepting: execute action action 1 action 2 action 3 a b a a none action 3 0 1 3 7 2 4 7 7 8
Simulating the Combined NFA Example 2 2 a 1 6 a 3 4 5 b b 8 b 7 a b 0 start    When two or more accepting states are reached, the first action given in the Lex specification is executed action 1 action 2 action 3 a b b a none action 2 action 3 0 1 3 7 2 4 7 5 8 6 8
Deterministic Finite Automata ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Example DFA 0 start a 1 3 2 b b b b a a a A DFA that accepts ( a  b )* abb
Conversion of an NFA into a DFA ,[object Object],[object Object]
 -closure  and  move  Examples 2 a 1 6 a 3 4 5 b b 8 b 7 a b 0 start     -closure ({0}) = {0,1,3,7} move ({0,1,3,7}, a ) = {2,4,7}  -closure ({2,4,7}) = {2,4,7} move ({2,4,7}, a ) = {7}  -closure ({7}) = {7} move ({7}, b ) = {8}  -closure ({8}) = {8} move ({8}, a ) =   a b a a none Also used to simulate NFAs 0 1 3 7 2 4 7 7 8
Simulating an NFA using  -closure  and  move S  :=   -closure ({ s 0 }) S prev  :=     a  :=  nextchar () while   S        do S prev  :=  S S  :=   -closure ( move ( S,a )) a  :=  nextchar () end do if  S prev      F        then execute  action in S prev return  “yes” else return  “no”
The Subset Construction Algorithm Initially,   -closure ( s 0 ) is the only state in  Dstates  and it is unmarked while  there is an unmarked state  T  in  Dstates   do mark  T for  each input symbol  a        do U  :=   -closure ( move ( T,a )) if   U  is not in  Dstates   then add  U  as an unmarked state to  Dstates end if Dtran [ T , a ] :=  U end do end do
Subset Construction Example 1 0 start a 1 10 2 b b a b 3 4 5 6 7 8 9         A start B C D E b b b b b a a a a Dstates A = {0,1,2,4,7} B = {1,2,3,4,6,7,8} C = {1,2,4,5,6,7} D = {1,2,4,5,6,7,9} E = {1,2,4,5,6,7,10} a
Subset Construction Example 2 2 a 1 6 a 3 4 5 b b 8 b 7 a b 0 start    a 1 a 2 a 3 Dstates A = {0,1,3,7} B = {2,4,7} C = {8} D = {7} E = {5,8} F = {6,8} A start a D b b b a b b B C E F a b a 1 a 3 a 3 a 2  a 3
Minimizing the Number of States of a DFA A start B C D E b b b b b a a a a a A start B D E b b a a b a a
From Regular Expression to DFA Directly ,[object Object],[object Object]
From Regular Expression to DFA Directly (Algorithm) ,[object Object],[object Object],[object Object]
From Regular Expression to DFA Directly: Syntax Tree of  ( a | b )* abb# * | 1 a 2 b 3 a 4 b 5 b # 6 concatenation closure alternation position number (for leafs   )
From Regular Expression to DFA Directly: Annotating the Tree ,[object Object],[object Object],[object Object],[object Object]
From Regular Expression to DFA Directly: Annotating the Tree Node  n nullable ( n ) firstpos ( n ) lastpos ( n ) Leaf   true   Leaf  i false { i } { i } | /   c 1   c 2 nullable ( c 1 ) or nullable ( c 2 ) firstpos ( c 1 )  firstpos ( c 2 ) lastpos ( c 1 )  lastpos ( c 2 ) • /   c 1   c 2 nullable ( c 1 )  and nullable ( c 2 ) if  nullable ( c 1 )  then   firstpos ( c 1 )     firstpos ( c 2 ) else  firstpos ( c 1 ) if  nullable ( c 2 )  then   lastpos ( c 1 )     lastpos ( c 2 ) else  lastpos ( c 2 ) * | c 1 true firstpos ( c 1 ) lastpos ( c 1 )
From Regular Expression to DFA Directly: Syntax Tree of  ( a | b )* abb# {6} {1, 2, 3} {5} {1, 2, 3} {4} {1, 2, 3} {3} {1, 2, 3} {1, 2} {1, 2} * {1, 2} {1, 2} | {1} {1} a {2} {2} b {3} {3} a {4} {4} b {5} {5} b {6} {6} # nullable firstpos lastpos 1 2 3 4 5 6
From Regular Expression to DFA Directly:  followpos for  each node  n  in the tree  do if  n  is a cat-node with left child  c 1  and right child  c 2   then for  each  i  in  lastpos ( c 1 )  do followpos ( i ) :=  followpos ( i )     firstpos ( c 2 ) end do else if  n  is a star-node for  each  i  in  lastpos ( n )  do followpos ( i ) :=  followpos ( i )     firstpos ( n ) end do end if end do
From Regular Expression to DFA Directly: Algorithm s 0  :=  firstpos ( root ) where  root  is the root of the syntax tree Dstates  := { s 0 } and is unmarked while  there is an unmarked state  T  in  Dstates   do mark  T for  each input symbol  a         do let  U  be the set of positions that are in  followpos ( p ) for some position  p  in  T , such that the symbol at position  p  is  a if  U  is not empty and not in  Dstates   then add  U  as an unmarked state to  Dstates end if Dtran [ T,a ] :=  U end do end do
From Regular Expression to DFA Directly: Example 1,2,3 start a 1,2, 3,4 1,2, 3,6 1,2, 3,5 b b b b a a a 1 2 3 4 5 6 Node followpos 1 {1, 2, 3} 2 {1, 2, 3} 3 {4} 4 {5} 5 {6} 6 -
Time-Space Tradeoffs Automaton Space (worst case) Time (worst case) NFA O (  r  ) O (  r  x  ) DFA O (2 | r | ) O (  x  )

More Related Content

What's hot

Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01Iffat Anjum
 
Theory of Automata
Theory of AutomataTheory of Automata
Theory of AutomataFarooq Mian
 
Python strings presentation
Python strings presentationPython strings presentation
Python strings presentationVedaGayathri1
 
Regular expressions-Theory of computation
Regular expressions-Theory of computationRegular expressions-Theory of computation
Regular expressions-Theory of computationBipul Roy Bpl
 
RECURSION IN C
RECURSION IN C RECURSION IN C
RECURSION IN C v_jk
 
Presentation on function
Presentation on functionPresentation on function
Presentation on functionAbu Zaman
 
02. chapter 3 lexical analysis
02. chapter 3   lexical analysis02. chapter 3   lexical analysis
02. chapter 3 lexical analysisraosir123
 
Hashing and Hash Tables
Hashing and Hash TablesHashing and Hash Tables
Hashing and Hash Tablesadil raja
 
LR(1) and SLR(1) parsing
LR(1) and SLR(1) parsingLR(1) and SLR(1) parsing
LR(1) and SLR(1) parsingR Islam
 
similarity measure
similarity measure similarity measure
similarity measure ZHAO Sam
 
Chapter 6 intermediate code generation
Chapter 6   intermediate code generationChapter 6   intermediate code generation
Chapter 6 intermediate code generationVipul Naik
 
Programming Fundamentals Functions in C and types
Programming Fundamentals  Functions in C  and typesProgramming Fundamentals  Functions in C  and types
Programming Fundamentals Functions in C and typesimtiazalijoono
 

What's hot (20)

Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01
 
Theory of Automata
Theory of AutomataTheory of Automata
Theory of Automata
 
Python strings presentation
Python strings presentationPython strings presentation
Python strings presentation
 
Regular expressions-Theory of computation
Regular expressions-Theory of computationRegular expressions-Theory of computation
Regular expressions-Theory of computation
 
RECURSION IN C
RECURSION IN C RECURSION IN C
RECURSION IN C
 
Presentation on function
Presentation on functionPresentation on function
Presentation on function
 
02. chapter 3 lexical analysis
02. chapter 3   lexical analysis02. chapter 3   lexical analysis
02. chapter 3 lexical analysis
 
Hashing and Hash Tables
Hashing and Hash TablesHashing and Hash Tables
Hashing and Hash Tables
 
Recurrence relation
Recurrence relationRecurrence relation
Recurrence relation
 
NLP_KASHK:Minimum Edit Distance
NLP_KASHK:Minimum Edit DistanceNLP_KASHK:Minimum Edit Distance
NLP_KASHK:Minimum Edit Distance
 
LR(1) and SLR(1) parsing
LR(1) and SLR(1) parsingLR(1) and SLR(1) parsing
LR(1) and SLR(1) parsing
 
similarity measure
similarity measure similarity measure
similarity measure
 
Role-of-lexical-analysis
Role-of-lexical-analysisRole-of-lexical-analysis
Role-of-lexical-analysis
 
Input-Buffering
Input-BufferingInput-Buffering
Input-Buffering
 
Chapter 6 intermediate code generation
Chapter 6   intermediate code generationChapter 6   intermediate code generation
Chapter 6 intermediate code generation
 
Recursion in c++
Recursion in c++Recursion in c++
Recursion in c++
 
Parsing LL(1), SLR, LR(1)
Parsing LL(1), SLR, LR(1)Parsing LL(1), SLR, LR(1)
Parsing LL(1), SLR, LR(1)
 
Code generation
Code generationCode generation
Code generation
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
Programming Fundamentals Functions in C and types
Programming Fundamentals  Functions in C  and typesProgramming Fundamentals  Functions in C  and types
Programming Fundamentals Functions in C and types
 

Similar to Ch3

Similar to Ch3 (20)

Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
Lecture 1 - Lexical Analysis.ppt
Lecture 1 - Lexical Analysis.pptLecture 1 - Lexical Analysis.ppt
Lecture 1 - Lexical Analysis.ppt
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)
 
compiler Design course material chapter 2
compiler Design course material chapter 2compiler Design course material chapter 2
compiler Design course material chapter 2
 
Lexicalanalyzer
LexicalanalyzerLexicalanalyzer
Lexicalanalyzer
 
Lexicalanalyzer
LexicalanalyzerLexicalanalyzer
Lexicalanalyzer
 
ANSI C REFERENCE CARD
ANSI C REFERENCE CARDANSI C REFERENCE CARD
ANSI C REFERENCE CARD
 
New compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfNew compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdf
 
system software
system software system software
system software
 
Ch04
Ch04Ch04
Ch04
 
UNIT_-_II.docx
UNIT_-_II.docxUNIT_-_II.docx
UNIT_-_II.docx
 
12IRGeneration.pdf
12IRGeneration.pdf12IRGeneration.pdf
12IRGeneration.pdf
 
Ch8a
Ch8aCh8a
Ch8a
 
Compilers Design
Compilers DesignCompilers Design
Compilers Design
 
Ch 2.pptx
Ch 2.pptxCh 2.pptx
Ch 2.pptx
 
Chapter2CDpdf__2021_11_26_09_19_08.pdf
Chapter2CDpdf__2021_11_26_09_19_08.pdfChapter2CDpdf__2021_11_26_09_19_08.pdf
Chapter2CDpdf__2021_11_26_09_19_08.pdf
 
Lec1.pptx
Lec1.pptxLec1.pptx
Lec1.pptx
 
Introduction of bison
Introduction of bisonIntroduction of bison
Introduction of bison
 

More from kinnarshah8888 (16)

Yuva Msp All
Yuva Msp AllYuva Msp All
Yuva Msp All
 
Yuva Msp Intro
Yuva Msp IntroYuva Msp Intro
Yuva Msp Intro
 
Ch6
Ch6Ch6
Ch6
 
Ch9c
Ch9cCh9c
Ch9c
 
Ch5a
Ch5aCh5a
Ch5a
 
Ch9b
Ch9bCh9b
Ch9b
 
Ch9a
Ch9aCh9a
Ch9a
 
Ch10
Ch10Ch10
Ch10
 
Ch7
Ch7Ch7
Ch7
 
Ch2
Ch2Ch2
Ch2
 
Ch4b
Ch4bCh4b
Ch4b
 
Ch4a
Ch4aCh4a
Ch4a
 
Ch8b
Ch8bCh8b
Ch8b
 
Ch5b
Ch5bCh5b
Ch5b
 
Ch4c
Ch4cCh4c
Ch4c
 
Ch1
Ch1Ch1
Ch1
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Recently uploaded (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Ch3

  • 1. Lexical Analysis and Lexical Analyzer Generators Chapter 3 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007-2009
  • 2.
  • 3. Interaction of the Lexical Analyzer with the Parser Lexical Analyzer Parser Source Program Token, tokenval Symbol Table Get next token error error
  • 4. Attributes of Tokens Lexical analyzer < id , “ y ”> < assign , > < num , 31> < + , > < num , 28> < * , > < id , “ x ”> y := 31 + 28*x Parser token tokenval (token attribute)
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13. Regular Definitions and Grammars stmt  if expr then stmt  if expr then stmt else stmt   expr  term relop term  term term  id  num if  if then  then else  else relop  <  <=  <>  >  >=  = id  letter ( letter | digit ) * num  digit + ( . digit + )? ( E ( +  - )? digit + )? Grammar Regular definitions
  • 14. Coding Regular Definitions in Transition Diagrams 0 2 1 6 3 4 5 7 8 return ( relop , LE ) return ( relop , NE ) return ( relop , LT ) return ( relop , EQ ) return ( relop , GE ) return ( relop , GT ) start < = > = > = other other * * 9 start letter 10 11 * other letter or digit return ( gettoken (), install_id ()) relop  <  <=  <>  >  >=  = id  letter ( letter  digit ) *
  • 15. Coding Regular Definitions in Transition Diagrams: Code token nexttoken() { while (1) { switch (state) { case 0: c = nextchar(); if (c==blank || c==tab || c==newline) { state = 0; lexeme_beginning++; } else if (c==‘<’) state = 1; else if (c==‘=’) state = 5; else if (c==‘>’) state = 6; else state = fail(); break ; case 1: … case 9: c = nextchar(); if (isletter(c)) state = 10; else state = fail(); break ; case 10: c = nextchar(); if (isletter(c)) state = 10; else if (isdigit(c)) state = 10; else state = 11; break ; … int fail() { forward = token_beginning; swith (start) { case 0: start = 9; break ; case 9: start = 12; break ; case 12: start = 20; break ; case 20: start = 25; break ; case 25: recover(); break ; default : /* error */ } return start; } Decides the next start state to check
  • 16.
  • 17. Creating a Lexical Analyzer with Lex and Flex lex or flex compiler lex source program lex.l lex.yy.c input stream C compiler a.out sequence of tokens lex.yy.c a.out
  • 18.
  • 19. Regular Expressions in Lex x match the character x match the character . “ string ” match contents of string of characters . match any character except newline ^ match beginning of a line $ match the end of a line [xyz] match one character x , y , or z (use to escape - ) [^xyz] match any character except x , y , and z [a-z] match one of a to z r * closure (match zero or more occurrences) r + positive closure (match one or more occurrences) r ? optional (match zero or one occurrence) r 1 r 2 match r 1 then r 2 (concatenation) r 1 | r 2 match r 1 or r 2 (union) ( r ) grouping r 1 r 2 match r 1 when followed by r 2 { d } match the regular expression defined by d
  • 20. Example Lex Specification 1 %{ #include <stdio.h> %} %% [0-9]+ { printf(“%s”, yytext); } .| { } %% main() { yylex(); } Contains the matching lexeme Invokes the lexical analyzer lex spec.l gcc lex.yy.c -ll ./a.out < spec.l Translation rules
  • 21. Example Lex Specification 2 %{ #include <stdio.h> int ch = 0, wd = 0, nl = 0; %} delim [ ]+ %% { ch++; wd++; nl++; } ^{delim} { ch+=yyleng; } {delim} { ch+=yyleng; wd++; } . { ch++; } %% main() { yylex(); printf(&quot;%8d%8d%8d&quot;, nl, wd, ch); } Regular definition Translation rules
  • 22. Example Lex Specification 3 %{ #include <stdio.h> %} digit [0-9] letter [A-Za-z] id {letter}({letter}|{digit})* %% {digit}+ { printf(“number: %s”, yytext); } {id} { printf(“ident: %s”, yytext); } . { printf(“other: %s”, yytext); } %% main() { yylex(); } Regular definitions Translation rules
  • 23. Example Lex Specification 4 %{ /* definitions of manifest constants */ #define LT (256) … %} delim [ ] ws {delim}+ letter [A-Za-z] digit [0-9] id {letter}({letter}|{digit})* number {digit}+({digit}+)?(E[+]?{digit}+)? %% {ws} { } if {return IF;} then {return THEN;} else {return ELSE;} {id} {yylval = install_id(); return ID;} {number} {yylval = install_num(); return NUMBER;} “<“ {yylval = LT; return RELOP;} “<=“ {yylval = LE; return RELOP;} “ =“ {yylval = EQ; return RELOP;} “ <>“ {yylval = NE; return RELOP;} “>“ {yylval = GT; return RELOP;} “ >=“ {yylval = GE; return RELOP;} %% int install_id() … Return token to parser Token attribute Install yytext as identifier in symbol table
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29. Design of a Lexical Analyzer Generator: RE to NFA to DFA s 0 N ( p 1 ) N ( p 2 ) start   N ( p n )  … p 1 { action 1 } p 2 { action 2 } … p n { action n } action 1 action 2 action n Lex specification with regular expressions NFA DFA Subset construction
  • 30. From Regular Expression to NFA (Thompson’s Construction) N ( r 2 ) N ( r 1 ) f i  f a i f i N ( r 1 ) N ( r 2 ) start start start     f i start N ( r ) f i start    a r 1  r 2 r 1 r 2 r *  
  • 31. Combining the NFAs of a Set of Regular Expressions 2 a 1 start 6 a 3 start 4 5 b b 8 b 7 start a b a { action 1 } abb { action 2 } a * b + { action 3 } 2 a 1 6 a 3 4 5 b b 8 b 7 a b 0 start   
  • 32. Simulating the Combined NFA Example 1 2 a 1 6 a 3 4 5 b b 8 b 7 a b 0 start    Must find the longest match : Continue until no further moves are possible When last state is accepting: execute action action 1 action 2 action 3 a b a a none action 3 0 1 3 7 2 4 7 7 8
  • 33. Simulating the Combined NFA Example 2 2 a 1 6 a 3 4 5 b b 8 b 7 a b 0 start    When two or more accepting states are reached, the first action given in the Lex specification is executed action 1 action 2 action 3 a b b a none action 2 action 3 0 1 3 7 2 4 7 5 8 6 8
  • 34.
  • 35. Example DFA 0 start a 1 3 2 b b b b a a a A DFA that accepts ( a  b )* abb
  • 36.
  • 37.  -closure and move Examples 2 a 1 6 a 3 4 5 b b 8 b 7 a b 0 start     -closure ({0}) = {0,1,3,7} move ({0,1,3,7}, a ) = {2,4,7}  -closure ({2,4,7}) = {2,4,7} move ({2,4,7}, a ) = {7}  -closure ({7}) = {7} move ({7}, b ) = {8}  -closure ({8}) = {8} move ({8}, a ) =  a b a a none Also used to simulate NFAs 0 1 3 7 2 4 7 7 8
  • 38. Simulating an NFA using  -closure and move S :=  -closure ({ s 0 }) S prev :=  a := nextchar () while S   do S prev := S S :=  -closure ( move ( S,a )) a := nextchar () end do if S prev  F   then execute action in S prev return “yes” else return “no”
  • 39. The Subset Construction Algorithm Initially,  -closure ( s 0 ) is the only state in Dstates and it is unmarked while there is an unmarked state T in Dstates do mark T for each input symbol a   do U :=  -closure ( move ( T,a )) if U is not in Dstates then add U as an unmarked state to Dstates end if Dtran [ T , a ] := U end do end do
  • 40. Subset Construction Example 1 0 start a 1 10 2 b b a b 3 4 5 6 7 8 9         A start B C D E b b b b b a a a a Dstates A = {0,1,2,4,7} B = {1,2,3,4,6,7,8} C = {1,2,4,5,6,7} D = {1,2,4,5,6,7,9} E = {1,2,4,5,6,7,10} a
  • 41. Subset Construction Example 2 2 a 1 6 a 3 4 5 b b 8 b 7 a b 0 start    a 1 a 2 a 3 Dstates A = {0,1,3,7} B = {2,4,7} C = {8} D = {7} E = {5,8} F = {6,8} A start a D b b b a b b B C E F a b a 1 a 3 a 3 a 2 a 3
  • 42. Minimizing the Number of States of a DFA A start B C D E b b b b b a a a a a A start B D E b b a a b a a
  • 43.
  • 44.
  • 45. From Regular Expression to DFA Directly: Syntax Tree of ( a | b )* abb# * | 1 a 2 b 3 a 4 b 5 b # 6 concatenation closure alternation position number (for leafs  )
  • 46.
  • 47. From Regular Expression to DFA Directly: Annotating the Tree Node n nullable ( n ) firstpos ( n ) lastpos ( n ) Leaf  true   Leaf i false { i } { i } | / c 1 c 2 nullable ( c 1 ) or nullable ( c 2 ) firstpos ( c 1 )  firstpos ( c 2 ) lastpos ( c 1 )  lastpos ( c 2 ) • / c 1 c 2 nullable ( c 1 ) and nullable ( c 2 ) if nullable ( c 1 ) then firstpos ( c 1 )  firstpos ( c 2 ) else firstpos ( c 1 ) if nullable ( c 2 ) then lastpos ( c 1 )  lastpos ( c 2 ) else lastpos ( c 2 ) * | c 1 true firstpos ( c 1 ) lastpos ( c 1 )
  • 48. From Regular Expression to DFA Directly: Syntax Tree of ( a | b )* abb# {6} {1, 2, 3} {5} {1, 2, 3} {4} {1, 2, 3} {3} {1, 2, 3} {1, 2} {1, 2} * {1, 2} {1, 2} | {1} {1} a {2} {2} b {3} {3} a {4} {4} b {5} {5} b {6} {6} # nullable firstpos lastpos 1 2 3 4 5 6
  • 49. From Regular Expression to DFA Directly: followpos for each node n in the tree do if n is a cat-node with left child c 1 and right child c 2 then for each i in lastpos ( c 1 ) do followpos ( i ) := followpos ( i )  firstpos ( c 2 ) end do else if n is a star-node for each i in lastpos ( n ) do followpos ( i ) := followpos ( i )  firstpos ( n ) end do end if end do
  • 50. From Regular Expression to DFA Directly: Algorithm s 0 := firstpos ( root ) where root is the root of the syntax tree Dstates := { s 0 } and is unmarked while there is an unmarked state T in Dstates do mark T for each input symbol a   do let U be the set of positions that are in followpos ( p ) for some position p in T , such that the symbol at position p is a if U is not empty and not in Dstates then add U as an unmarked state to Dstates end if Dtran [ T,a ] := U end do end do
  • 51. From Regular Expression to DFA Directly: Example 1,2,3 start a 1,2, 3,4 1,2, 3,6 1,2, 3,5 b b b b a a a 1 2 3 4 5 6 Node followpos 1 {1, 2, 3} 2 {1, 2, 3} 3 {4} 4 {5} 5 {6} 6 -
  • 52. Time-Space Tradeoffs Automaton Space (worst case) Time (worst case) NFA O (  r  ) O (  r  x  ) DFA O (2 | r | ) O (  x  )