Theory of Computation: Lecture 39
1) Stages of Compilation
2) Syntactic Analysis
3) Tokenization with NFAs
4) Parsing with CFGs
5) Class home page is at http://vkedco.blogspot.com/2011/08/theory-of-computation-home.html

Theory of Computation: Lecture 39 Presentation Transcript

  • 1. CS 5000: Theory of Computation, Lecture 39. Vladimir Kulyukin, Department of Computer Science, Utah State University. www.youtube.com/vkedco
  • 2. Outline
    ● Review
    ● Stages of Compilation
    ● Syntactic Analysis
    ● Tokenization
    ● Parsing
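  • 3. Review: CFL Membership & CYK Algorithm
    ● Problem: given a CFG G = (V, T, P, S) and a string x in T*, determine whether x is in L(G)
    ● The Cocke-Younger-Kasami (CYK) algorithm takes a CFG in Chomsky normal form (CNF) and a string x and returns true or false, depending on whether x is or is not in L(G)
    ● The CYK algorithm runs in $O(n^3)$ time, where $|x| = n$
  • 4. CYK Algorithm: Basic Insight
    ● Let $x_{ij}$ denote the substring of x of length j that starts at position i
    ● For $j > 1$: $A \Rightarrow^* x_{ij}$ iff there is a production $A \to BC$ with $B \Rightarrow^* x_{ik}$ and $C \Rightarrow^* x_{(i+k)(j-k)}$ for some $k$, $1 \le k < j$
    ● [Figure: derivation tree with root A and children B and C over $x_{ij}$; B covers the k symbols at positions i through i+k-1, and C covers the remaining j-k symbols starting at position i+k]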
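The insight on slide 4 turns directly into a table-filling procedure. The following is a minimal Java sketch, not the lecture's implementation. It assumes a toy CNF grammar for a^n b^n, n ≥ 1 (S → AB | AC, C → SB, A → a, B → b), which is an illustration and does not appear in the slides; the names `CykSketch` and `inLanguage` are hypothetical, and positions are 0-based here.

```java
class CykSketch {
    // Toy CNF grammar (an assumption for illustration, not from the lecture):
    // S -> AB | AC, C -> SB, A -> a, B -> b. It generates a^n b^n for n >= 1.
    static final int S = 0, A = 1, B = 2, C = 3;
    static final int[][] BINARY = { {S, A, B}, {S, A, C}, {C, S, B} }; // {head, left, right}
    static final int[] UNARY_HEAD = {A, B};
    static final char[] UNARY_BODY = {'a', 'b'};

    static boolean inLanguage(String x) {
        int n = x.length();
        if (n == 0) return false; // this grammar has no epsilon production
        // derives[i][j][v] is true iff variable v derives the substring
        // of length j that starts at (0-based) index i.
        boolean[][][] derives = new boolean[n][n + 1][4];
        for (int i = 0; i < n; i++)
            for (int u = 0; u < UNARY_HEAD.length; u++)
                if (x.charAt(i) == UNARY_BODY[u]) derives[i][1][UNARY_HEAD[u]] = true;
        for (int j = 2; j <= n; j++)              // substring length
            for (int i = 0; i + j <= n; i++)      // start index
                for (int k = 1; k < j; k++)       // split point, 1 <= k < j
                    for (int[] p : BINARY)
                        if (derives[i][k][p[1]] && derives[i + k][j - k][p[2]])
                            derives[i][j][p[0]] = true;
        return derives[0][n][S];
    }
}
```

The three nested loops over j, i, and k are where the cubic running time comes from; the innermost loop over productions adds a factor that depends only on the grammar.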
  • 5. Three Stages of Compilation
    ● Syntactic Analysis: the source program is processed to determine its structure and its conformity to the language grammar
    ● Contextual Analysis: the output of syntactic analysis (a parse tree) is checked for conformity to the language's contextual constraints
    ● Code Generation: the checked parse tree is used to generate the target code, e.g. Java bytecode, assembly, or some other target language
  • 6. Syntactic Analysis
  • 7. Components of Syntactic Analysis
    ● Syntactic Analysis consists of Tokenization and Parsing
    ● Tokenization – we have to define a set of FAs (equivalently, regular expressions) to tokenize input statements (primitive instructions)
    ● Parsing – we have to define a CFG to map tokenized input statements (primitive instructions) into parse trees
  • 8. Tokenization
  • 9. Two Basic Design Principles
    ● Zero Token Ambiguity: each sequence of non-white-space characters must be mapped to at most one token
    ● Zero Statement (Instruction) Ambiguity: each sequence of tokens recognized between the beginning of a line and a newline character must have at most one parse tree
  • 10. Tokenization of Programming Language L
  • 11. Tokenization: Input Variables (InputVarToken)
    ● Input variables are tokens of the form X1, X2, X3, etc. In general, an input variable is Xk, where k is a natural number greater than 0
    ● NFA [diagram]: read X, then one digit in [1-9], then loop on digits in [0-9]; equivalently, the regular expression X[1-9][0-9]*
  • 12. Tokenization: Output Variables (OutputVarToken)
    ● L has only one output variable: Y
    ● NFA [diagram]: a single transition on Y
  • 13. Tokenization: Local Variables (LocalVarToken)
    ● Local variables are tokens of the form Z1, Z2, Z3, etc. In general, a local variable is Zk, where k is a natural number greater than 0
    ● NFA [diagram]: read Z, then one digit in [1-9], then loop on digits in [0-9]; equivalently, the regular expression Z[1-9][0-9]*
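The variable automata on slides 11 and 13 have the same three-state shape and differ only in the leading letter. Below is a small hand-coded sketch of that shape in Java; the class name `VarAutomaton`, the state numbering, and the `letter` parameter are assumptions made for illustration.

```java
class VarAutomaton {
    // States (hypothetical numbering): 0 = start, 1 = letter read,
    // 2 = valid index read (the only accepting state).
    static boolean accepts(char letter, String s) {
        int state = 0;
        for (char c : s.toCharArray()) {
            if (state == 0 && c == letter) state = 1;                 // X or Z
            else if (state == 1 && c >= '1' && c <= '9') state = 2;   // first digit is 1-9
            else if (state == 2 && c >= '0' && c <= '9') state = 2;   // loop on 0-9
            else return false;                                        // no transition: reject
        }
        return state == 2;
    }
}
```

For instance, `VarAutomaton.accepts('X', "X10")` holds, while "X0" and "X" are rejected because k must be greater than 0 and must be present.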
  • 14. Tokenization: Labels
    ● There are two places where a label can occur in a primitive instruction: at the beginning of a line and at the end of a line
    ● At the beginning of a line a label is bracketed; at the end of a line it is not
    ● Furthermore, labels that start with A, B, C, or D are non-exit labels; labels that start with E are exit labels
  • 15. Tokenization: Non-Exit Non-Bracketed Labels (NELblToken)
    ● Non-exit labels that occur at the end of a line are tokens of the form Λ1, Λ2, Λ3, etc. In general, a label is Λk, where k is a natural number greater than 0 and Λ is in {A, B, C, D}
    ● NFA [diagram]: read one of A, B, C, D, then one digit in [1-9], then loop on digits in [0-9]
  • 16. Tokenization: Non-Exit Bracketed Labels (NEBrLblToken)
    ● Non-exit labels that occur at the beginning of a line are tokens of the form [Λ1], [Λ2], [Λ3], etc. In general, a label is [Λk], where k is a natural number greater than 0 and Λ is in {A, B, C, D}
    ● NFA [diagram]: read [, then one of A, B, C, D, then one digit in [1-9], then loop on digits in [0-9], then read ]
  • 17. Tokenization: Exit Non-Bracketed Label (ELblToken)
    ● Every L program has a unique exit label (E1). If the exit label occurs at the end of a line, it is not bracketed
    ● NFA [diagram]: read E, then 1
  • 18. Tokenization: Exit Bracketed Label (EBrLblToken)
    ● Every L program has a unique exit label (E1). If the exit label occurs at the beginning of a line, it is bracketed
    ● NFA [diagram]: read [, then E, then 1, then ]
  • 19. Tokenization: Operators
    ● There are four operator tokens in L: <=, +, -, !=
    ● Possible NFAs for the operators [diagrams]:
      – < followed by = accepts AssignOperToken
      – ! followed by = accepts NotEqOperToken
      – + accepts PlusOperToken
      – - accepts MinusOperToken
  • 20. Tokenization: Keywords
    ● L has two keywords: IF and GOTO
    ● Two possible NFAs [diagrams]:
      – I followed by F accepts IFToken
      – G, O, T, O in sequence accepts GOTOToken
  • 21. Tokenization: Literals
    ● L has 2 literals: 0 and 1
    ● Two possible NFAs [diagrams]:
      – 0 accepts ZeroLitToken
      – 1 accepts OneLitToken
  • 22. Complete List of Tokens
    1. InputVarToken
    2. OutputVarToken
    3. LocalVarToken
    4. NELblToken
    5. ELblToken
    6. NEBrLblToken
    7. EBrLblToken
    8. AssignOperToken
    9. NotEqOperToken
    10. PlusOperToken
    11. MinusOperToken
    12. IFToken
    13. GOTOToken
    14. ZeroLitToken
    15. OneLitToken
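One way to keep the fifteen token types of slide 22 in a single place is an enum that pairs each type with a recognizer. The sketch below uses `java.util.regex.Pattern` in place of hand-built NFAs, which is a substitution made here for brevity; the regular expressions are my reading of the slides' diagrams, and the enum name `LTokenType` and the method `matching` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

enum LTokenType {
    // Regexes are an assumed transcription of the NFAs on slides 11-21.
    InputVarToken("X[1-9][0-9]*"),
    OutputVarToken("Y"),
    LocalVarToken("Z[1-9][0-9]*"),
    NELblToken("[A-D][1-9][0-9]*"),
    ELblToken("E1"),
    NEBrLblToken("\\[[A-D][1-9][0-9]*\\]"),
    EBrLblToken("\\[E1\\]"),
    AssignOperToken("<="),
    NotEqOperToken("!="),
    PlusOperToken("\\+"),
    MinusOperToken("-"),
    IFToken("IF"),
    GOTOToken("GOTO"),
    ZeroLitToken("0"),
    OneLitToken("1");

    private final Pattern pattern;

    LTokenType(String regex) { this.pattern = Pattern.compile(regex); }

    // True iff the whole string is accepted by this token type's recognizer.
    boolean accepts(String s) { return pattern.matcher(s).matches(); }

    // All token types that accept s; zero token ambiguity means at most one.
    static List<LTokenType> matching(String s) {
        List<LTokenType> result = new ArrayList<>();
        for (LTokenType t : values()) if (t.accepts(s)) result.add(t);
        return result;
    }
}
```

Calling `matching` on each lexeme is a cheap way to check the Zero Token Ambiguity principle: any result list with more than one element would signal overlapping recognizers.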
  • 23. Tokenization Algorithm: Outline
    ● Read in a line of text
    ● Partition the line into substrings on white space
    ● Run each substring through all possible NFAs
    ● Each substring can be recognized by at most one NFA
    ● If a substring is not recognized by any NFA, report an error; otherwise, create the appropriate token, depending on which NFA recognized the substring
    ● The output is a sequence of tokens
  • 24. Tokenization Algorithm: Details
    ● Activate all Lazy NFAs for token recognition
    ● Read the file character by character; when a non-white-space character is read, go into the token recognition mode
    ● In the token recognition mode, when a character is read, feed it to every NFA so that all NFAs that recognize it make their transitions; if no NFA can transition, fail
    ● When a white-space character is read, switch off the token recognition mode and check whether any NFAs accepted the sequence of non-white-space characters
      – If exactly one NFA accepted, construct the appropriate token and reset each NFA back to its start state
      – If none of the NFAs accepted, or more than one accepted, fail
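The outline on slide 23 can be sketched in a few lines of Java. This sketch follows the whitespace-partitioning outline rather than the character-by-character scheme of slide 24, and it swaps regular expressions in for the NFAs; the class name `LTokenizerSketch`, the recognizer table, and the `Type(lexeme)` output format are assumptions made for illustration. Lexemes are kept verbatim, so a bracketed label keeps its brackets.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class LTokenizerSketch {
    // Token name -> recognizer; regexes stand in for the slides' NFAs (an assumption).
    static final Map<String, Pattern> RECOGNIZERS = new LinkedHashMap<>();
    static {
        add("InputVarToken", "X[1-9][0-9]*");
        add("OutputVarToken", "Y");
        add("LocalVarToken", "Z[1-9][0-9]*");
        add("NELblToken", "[A-D][1-9][0-9]*");
        add("ELblToken", "E1");
        add("NEBrLblToken", "\\[[A-D][1-9][0-9]*\\]");
        add("EBrLblToken", "\\[E1\\]");
        add("AssignOperToken", "<=");
        add("NotEqOperToken", "!=");
        add("PlusOperToken", "\\+");
        add("MinusOperToken", "-");
        add("IFToken", "IF");
        add("GOTOToken", "GOTO");
        add("ZeroLitToken", "0");
        add("OneLitToken", "1");
    }

    private static void add(String name, String regex) {
        RECOGNIZERS.put(name, Pattern.compile(regex));
    }

    // Tokenizes one line; throws if a substring is accepted by zero or several recognizers.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String lexeme : line.trim().split("\\s+")) {
            if (lexeme.isEmpty()) continue; // blank line
            List<String> hits = new ArrayList<>();
            for (Map.Entry<String, Pattern> e : RECOGNIZERS.entrySet())
                if (e.getValue().matcher(lexeme).matches()) hits.add(e.getKey());
            if (hits.size() != 1)
                throw new IllegalArgumentException("cannot tokenize: " + lexeme);
            tokens.add(hits.get(0) + "(" + lexeme + ")");
        }
        return tokens;
    }
}
```

A call such as `LTokenizerSketch.tokenize("Y <= Y + 1")` yields one entry per whitespace-separated substring, and an input like `X0 <= 1` is rejected because no recognizer accepts `X0`.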
  • 25. Tokenization Example
    ● Let us tokenize the following L program:
      [A1] X1 <= X1 - 1
      Y <= Y + 1
      IF X1 != 0 GOTO A1
  • 26. Tokenization Example: Line 1
    ● [A1] X1 <= X1 - 1
    ● White space partitioning gives us the following substrings: “[A1]”, “X1”, “<=”, “X1”, “-”, “1”
    ● “[A1]” is recognized by the Non-Exit Bracketed Label NFA; so create NEBrLblToken(“A1”)
    ● “X1” is recognized by the Input Variable NFA; so create InputVarToken(“X1”)
    ● “<=” is recognized by the Assignment Operator NFA; so create AssignOperToken(“<=”)
    ● “X1” is recognized by the Input Variable NFA; so create InputVarToken(“X1”)
    ● “-” is recognized by the Minus Operator NFA; so create MinusOperToken(“-”)
    ● “1” is recognized by the One Literal NFA; so create OneLitToken(“1”)
    ● The output is:
      – <NEBrLblToken(“A1”), InputVarToken(“X1”), AssignOperToken(“<=”), InputVarToken(“X1”), MinusOperToken(“-”), OneLitToken(“1”)>
  • 27. Tokenization Example: Line 1
    ● The line [A1] X1 <= X1 - 1 gives us the following sequence of tokens:
      NEBrLblToken     “A1”
      InputVarToken    “X1”
      AssignOperToken  “<=”
      InputVarToken    “X1”
      MinusOperToken   “-”
      OneLitToken      “1”
  • 28. Tokenization Example: Line 2
    ● The line Y <= Y + 1 gives us the following sequence of tokens:
      OutputVarToken   “Y”
      AssignOperToken  “<=”
      OutputVarToken   “Y”
      PlusOperToken    “+”
      OneLitToken      “1”
  • 29. Tokenization Example: Line 3
    ● The line IF X1 != 0 GOTO A1 gives us the following sequence of tokens:
      IFToken          “IF”
      InputVarToken    “X1”
      NotEqOperToken   “!=”
      ZeroLitToken     “0”
      GOTOToken        “GOTO”
      NELblToken       “A1”
  • 30. Parsing
  • 31. Recursive Descent Parsing
    ● Recursive Descent (RD) Parsing is an algorithm that should be considered for any unambiguous CF grammar
    ● Programming languages are typically specified either with unambiguous CF grammars or with ambiguous CF grammars where the ambiguity can be easily handled
    ● The basic step in designing an RD parser is to design a parsing procedure parseN for every non-terminal symbol N in the grammar
  • 32. Parsing Programming Language L
  • 33. Developing Recursive-Descent Parser for L
    ● To develop a recursive-descent parser for L we need to accomplish three tasks:
      – Develop a CFG G for L
      – Derive a set of RD parsing procedures from G
      – Implement the procedures in a programming language (Java, C/C++, C#, Structured COBOL, etc.)
  • 34. A CFG Grammar for L
  • 35. A CFG Grammar for L
    ● LInstruct → LblStmnt | Stmnt
      – An L instruction can be a labeled statement or an unlabeled statement
    ● LblStmnt → BrLBL Stmnt
      – A labeled statement consists of a bracketed label followed by a statement
    ● BrLBL → NEBrLblToken | EBrLblToken
      – A bracketed label is either a non-exit bracketed label token or an exit bracketed label token
    ● Stmnt → Incrmnt | Decrmnt | NOP | CDisp
      – A statement can be an increment statement, a decrement statement, a no-op statement, or a conditional dispatch statement
  • 36. A CFG Grammar for L
    ● Incrmnt → VarToken AssignOperToken VarToken PlusOperToken OneLitToken
      – Note: this rule is simplified because, technically speaking, VarToken is not present in the list of tokens. So, we have to write additional productions of the form: VarToken → InputVarToken | OutputVarToken | LocalVarToken
    ● Decrmnt → VarToken AssignOperToken VarToken MinusOperToken OneLitToken
    ● NOP → VarToken AssignOperToken VarToken
    ● CDisp → IFToken VarToken NotEqOperToken ZeroLitToken GOTOToken DispLBL
    ● DispLBL → NELblToken | ELblToken
  • 37. Top-Level CFG Productions
    ● LProgram → LInstructSEQ
      – To recognize an L program is to recognize a sequence of L instructions
    ● LInstructSEQ → ε
      – A sequence of L instructions can be empty
    ● LInstructSEQ → LInstruct LInstructSEQ
      – A non-empty sequence of L instructions starts with an L instruction and is followed by a sequence of L instructions
  • 38. Recursive-Descent Parsing Procedures
  • 39. Parsing Procedures for L
    ● Let us agree that each parsing procedure returns a ParseTree data structure (the base class)
    ● Consider the first rule in our grammar: LProgram → LInstructSEQ
    ● ParseTree parseLProgram(input, start_pos) {
          ParseTree progTree = parseLInstructSEQ(input, start_pos);
          return progTree;
      }
  • 40. Implementation Notes
    ● ParseTree can be implemented as a base class
    ● All other parse trees corresponding to each non-terminal, e.g. LProgram, LInstructSEQ, LInstruct, etc., can be implemented as derived classes (sub-classes of ParseTree)
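The class layout described on slide 40 might look like the following Java sketch. Only the names ParseTree, BrLbl, LblStmnt, and getNextPos come from the slides; the wrapper class `ParseTreeSketch`, the `TokenLeaf` class, the child list, and the constructor signatures are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ParseTreeSketch {
    // Base class: every node knows its children and where parsing should resume.
    static abstract class ParseTree {
        private final List<ParseTree> children;
        private final int nextPos; // index of the first token not covered by this subtree

        ParseTree(int nextPos, ParseTree... children) {
            this.nextPos = nextPos;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }

        int getNextPos() { return nextPos; }
        List<ParseTree> getChildren() { return children; }
    }

    // Hypothetical leaf wrapping one token (a terminal of the grammar) at position pos.
    static class TokenLeaf extends ParseTree {
        final String tokenType;
        TokenLeaf(String tokenType, int pos) {
            super(pos + 1);
            this.tokenType = tokenType;
        }
    }

    // One derived class per non-terminal; two are shown here.
    static class BrLbl extends ParseTree {
        BrLbl(TokenLeaf label) { super(label.getNextPos(), label); }
    }

    static class LblStmnt extends ParseTree {
        LblStmnt(ParseTree brLbl, ParseTree stmnt) {
            super(stmnt.getNextPos(), brLbl, stmnt);
        }
    }
}
```

Storing the resume position in the node is one possible design; it lets a caller chain procedures by passing `getNextPos()` of one subtree as the start position of the next.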
  • 41. ParseLInstructSEQ Procedure
    ● There are 2 productions: LInstructSEQ → ε | LInstruct LInstructSEQ
    ● ParseTree parseLInstructSEQ(input, start_pos) {
          // no tokens left at start_pos: apply LInstructSEQ → ε
          if ( start_pos is past the end of input )
              return the empty LInstructSEQ;
          else {
              ParseTree firstInstruct = parseLInstruct(input, start_pos);
              if ( firstInstruct == null ) return null;   // syntax error
              ParseTree restInstructs = parseLInstructSEQ(input, firstInstruct.getNextPos());
              if ( restInstructs == null ) return null;
              return new LInstructSEQ(firstInstruct, restInstructs);
          }
      }
  • 42. ParseLInstruct Procedure
    ● Two productions for LInstruct: LInstruct → LblStmnt | Stmnt
    ● ParseTree parseLInstruct(input, start_pos) {
          // try the labeled alternative first; fall back to an unlabeled statement
          ParseTree lblSt = parseLblStmnt(input, start_pos);
          if ( lblSt == null )
              return parseStmnt(input, start_pos);
          else
              return lblSt;
      }
  • 43. ParseLblStmnt Procedure
    ● G has one production for LblStmnt: LblStmnt → BrLBL Stmnt
    ● ParseTree parseLblStmnt(input, start_pos) {
          ParseTree brLbl = parseBrLbl(input, start_pos);
          if ( brLbl == null )
              return null;
          else {
              ParseTree stmnt = parseStmnt(input, brLbl.getNextPos());
              if ( stmnt == null )
                  return null;
              else
                  return new LblStmnt(brLbl, stmnt);
          }
      }
  • 44. ParseBrLbl Procedure
    ● G has two productions for BrLBL: BrLBL → NEBrLblToken | EBrLblToken
    ● Note that both right-hand sides consist of tokens. Remember that tokens are terminals to the parser
    ● So, in this case, instead of parsing we have to make sure that these terminals are in the input
  • 45. ParseBrLbl Procedure
    ● ParseTree parseBrLbl(input, start_pos) {
          // a bracketed label is a single token, so only its type is checked
          if ( input[start_pos] == NEBrLblToken )
              return new BrLbl(input[start_pos]);
          else if ( input[start_pos] == EBrLblToken )
              return new BrLbl(input[start_pos]);
          else
              return null;
      }
  • 46. ParseIncrmnt Procedure
    ● The rest of the parsing procedures can be derived in a similar fashion
    ● There is one rule for Incrmnt: Incrmnt → VarToken AssignOperToken VarToken PlusOperToken OneLitToken
    ● This rule does not require any parsing; it requires only matching of tokens
  • 47. ParseIncrmnt Procedure
    ● ParseTree parseIncrmnt(input, start_pos) {
          // each comparison checks the type of the token at that position
          if ( input[start_pos] != VarToken ) return null;
          else if ( input[start_pos+1] != AssignOperToken ) return null;
          else if ( input[start_pos+2] != VarToken ) return null;
          else if ( input[start_pos+3] != PlusOperToken ) return null;
          else if ( input[start_pos+4] != OneLitToken ) return null;
          else
              // build the node from the five matched tokens
              return new Incrmnt(input[start_pos], input[start_pos+1], input[start_pos+2],
                                 input[start_pos+3], input[start_pos+4]);
      }
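A runnable version of the token matching on slide 47 could look like this Java sketch. It assumes that tokens are represented by their type names as strings and that a failed match is signalled by -1 instead of a null ParseTree; the names `IncrmntSketch`, `isVar`, and `matchIncrmnt` are hypothetical. It also adds a bounds check, which the pseudocode leaves implicit.

```java
import java.util.List;

class IncrmntSketch {
    // VarToken -> InputVarToken | OutputVarToken | LocalVarToken
    static boolean isVar(String tokenType) {
        return tokenType.equals("InputVarToken")
            || tokenType.equals("OutputVarToken")
            || tokenType.equals("LocalVarToken");
    }

    // Returns the position just past the match, or -1 if the five tokens
    // starting at start do not form an Incrmnt.
    static int matchIncrmnt(List<String> tokenTypes, int start) {
        if (start < 0 || start + 5 > tokenTypes.size()) return -1; // not enough tokens
        boolean ok = isVar(tokenTypes.get(start))
            && tokenTypes.get(start + 1).equals("AssignOperToken")
            && isVar(tokenTypes.get(start + 2))
            && tokenTypes.get(start + 3).equals("PlusOperToken")
            && tokenTypes.get(start + 4).equals("OneLitToken");
        return ok ? start + 5 : -1;
    }
}
```

Because only token types are compared, this check accepts any two variables on either side of the assignment; whether they must be the same variable is not something the grammar on slide 36 expresses.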
  • 48. Parsing Example
    ● Let us parse the following L program:
      [A1] X1 <= X1 - 1
      Y <= Y + 1
      IF X1 != 0 GOTO A1
  • 49. Parsing Example: Line 1 Tokenized
    ● The line [A1] X1 <= X1 - 1 gives us the following sequence of tokens:
      NEBrLblToken     “A1”
      InputVarToken    “X1”
      AssignOperToken  “<=”
      InputVarToken    “X1”
      MinusOperToken   “-”
      OneLitToken      “1”
  • 50. Parsing Example: Line 1 ParseTree
      LInstruct
        LblStmnt
          BrLbl
            NEBrLblToken “[A1]”
          Stmnt
            Decrmnt
              InputVarToken “X1”
              AssignOperToken “<=”
              InputVarToken “X1”
              MinusOperToken “-”
              OneLitToken “1”
  • 51. Parsing Example: Line 2 Tokenized
    ● The line Y <= Y + 1 gives us the following sequence of tokens:
      OutputVarToken   “Y”
      AssignOperToken  “<=”
      OutputVarToken   “Y”
      PlusOperToken    “+”
      OneLitToken      “1”
  • 52. Parsing Example: Line 2 ParseTree
      LInstruct
        Stmnt
          Incrmnt
            OutputVarToken “Y”
            AssignOperToken “<=”
            OutputVarToken “Y”
            PlusOperToken “+”
            OneLitToken “1”
  • 53. Parsing Example: Line 3 Tokenized
    ● The line IF X1 != 0 GOTO A1 gives us the following sequence of tokens:
      IFToken          “IF”
      InputVarToken    “X1”
      NotEqOperToken   “!=”
      ZeroLitToken     “0”
      GOTOToken        “GOTO”
      NELblToken       “A1”
  • 54. Parsing Example: Line 3 ParseTree
      LInstruct
        Stmnt
          CDisp
            IFToken “IF”
            InputVarToken “X1”
            NotEqOperToken “!=”
            ZeroLitToken “0”
            GOTOToken “GOTO”
            NELblToken “A1”
  • 55. Parsing Example: LProgram ParseTree
      LProgram
        LInstructSEQ
          LInstruct “[A1] X1 <= X1 - 1”
          LInstruct “Y <= Y + 1”
          LInstruct “IF X1 != 0 GOTO A1”
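The whole example can be reproduced with a compact Java sketch that works on token type names, one list per source line. Several simplifications are assumptions made here and not part of the lecture: trees are rendered as bracketed strings instead of ParseTree objects, LInstructSEQ is built with a loop instead of right recursion, each line must be consumed by exactly one instruction, and the names `LParserSketch` and `matches` are hypothetical.

```java
import java.util.List;

class LParserSketch {
    static boolean isVar(String t) {
        return t.equals("InputVarToken") || t.equals("OutputVarToken") || t.equals("LocalVarToken");
    }

    // True iff the tokens from start to the end of the line match the expected
    // symbols exactly; "VarToken" and "DispLBL" stand for their alternatives.
    static boolean matches(List<String> toks, int start, String... expected) {
        if (toks.size() - start != expected.length) return false;
        for (int i = 0; i < expected.length; i++) {
            String e = expected[i], t = toks.get(start + i);
            boolean ok = e.equals("VarToken") ? isVar(t)
                       : e.equals("DispLBL") ? (t.equals("NELblToken") || t.equals("ELblToken"))
                       : e.equals(t);
            if (!ok) return false;
        }
        return true;
    }

    // Stmnt -> Incrmnt | Decrmnt | NOP | CDisp; returns the alternative's name or null.
    static String parseStmnt(List<String> toks, int start) {
        if (matches(toks, start, "VarToken", "AssignOperToken", "VarToken", "PlusOperToken", "OneLitToken"))
            return "Incrmnt";
        if (matches(toks, start, "VarToken", "AssignOperToken", "VarToken", "MinusOperToken", "OneLitToken"))
            return "Decrmnt";
        if (matches(toks, start, "VarToken", "AssignOperToken", "VarToken"))
            return "NOP";
        if (matches(toks, start, "IFToken", "VarToken", "NotEqOperToken", "ZeroLitToken", "GOTOToken", "DispLBL"))
            return "CDisp";
        return null;
    }

    // LInstruct -> LblStmnt | Stmnt, decided by looking at the first token.
    static String parseLInstruct(List<String> toks) {
        if (toks.isEmpty()) return null;
        String first = toks.get(0);
        boolean labeled = first.equals("NEBrLblToken") || first.equals("EBrLblToken");
        String stmnt = parseStmnt(toks, labeled ? 1 : 0);
        if (stmnt == null) return null;
        return labeled ? "LInstruct(LblStmnt(BrLbl, Stmnt(" + stmnt + ")))"
                       : "LInstruct(Stmnt(" + stmnt + "))";
    }

    // LProgram -> LInstructSEQ; returns null on the first line that fails to parse.
    static String parseLProgram(List<List<String>> lines) {
        StringBuilder sb = new StringBuilder("LProgram(LInstructSEQ(");
        for (int i = 0; i < lines.size(); i++) {
            String ins = parseLInstruct(lines.get(i));
            if (ins == null) return null;
            if (i > 0) sb.append(", ");
            sb.append(ins);
        }
        return sb.append("))").toString();
    }
}
```

Since every Stmnt alternative is a fixed sequence of tokens, plain matching is enough here and no backtracking is needed; a grammar with nested non-terminals inside statements would need the procedure-per-non-terminal structure shown on the earlier slides.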
  • 56. References & Reading Suggestions
    ● Hopcroft and Ullman. Introduction to Automata Theory, Languages, and Computation, Narosa Publishing House
    ● Moll, Arbib, and Kfoury. An Introduction to Formal Language Theory
    ● Davis, Weyuker, Sigal. Computability, Complexity, and Languages, 2nd Edition, Academic Press
    ● Brooks Webber. Formal Language: A Practical Introduction, Franklin, Beedle & Associates, Inc.
  • 57. Feedback
    ● Errors, comments to vladimir dot kulyukin at gmail dot com