Lexical analyzer

mam gave it.

Published in: Education, Technology
  • lec00-outline August 9, 2011
  • Transcript

    • 1. CS40106 Compiler Design Compiler Design 40106
    • 2.
      • Prerequisites:
        • Automata Theory: DFAs, NFAs, Regular Expressions
      • Textbooks:
        • Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, “Compilers: Principles, Techniques, and Tools”, Addison-Wesley, 1986.
        • K. Cooper and L. Torczon, “Engineering a Compiler”, Morgan Kaufmann, 1st Edition, 2003.
    • 3. Course Outline
      • Introduction to Compiling
      • Lexical Analysis
      • Syntax Analysis
        • Context Free Grammars
        • Top-Down Parsing, LL Parsing
        • Bottom-Up Parsing, LR Parsing
      • Syntax-Directed Translation
        • Attribute Definitions
        • Evaluation of Attribute Definitions
      • Semantic Analysis, Type Checking
      • Run-Time Organization
      • Intermediate Code Generation
      • Code Optimization
    • 4. COMPILERS
      • A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language.
      • source program → COMPILER → target program (the compiler also reports error messages)
        • The source program is normally written in a high-level programming language.
        • The target program is normally the equivalent program in machine code (a relocatable object file).
    • 5. Other Applications
      • Beyond building compilers, the techniques used in compiler design apply to many problems in computer science.
        • Techniques used in a lexical analyzer can be used in text editors, information retrieval systems, and pattern recognition programs.
        • Techniques used in a parser can be used in a query processing system such as SQL.
        • A symbolic equation solver takes an equation as input, so it has to parse the given input equation.
        • Most of the techniques used in compiler design can be used in Natural Language Processing (NLP) systems.
    • 6. Major Parts of Compilers
      • There are two major parts of a compiler: Analysis and Synthesis.
      • In the analysis phase, an intermediate representation is created from the given source program.
        • The Lexical Analyzer, Syntax Analyzer, and Semantic Analyzer are the parts of this phase.
      • In the synthesis phase, the equivalent target program is created from this intermediate representation.
        • The Intermediate Code Generator, Code Generator, and Code Optimizer are the parts of this phase.
    • 7. Phases of A Compiler
      • Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Target Program
      • Each phase transforms the source program from one representation into another representation.
      • They communicate with error handlers.
      • They communicate with the symbol table.
    • 8. 1. Lexical Analyzer
      • The lexical analyzer reads the source program character by character and returns the tokens of the source program.
      • A token describes a pattern of characters having the same meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters, and so on).
      • Ex: newval := oldval + 12 => tokens:
        • newval: identifier
        • := : assignment operator
        • oldval: identifier
        • + : add operator
        • 12: a number
      • It puts information about identifiers into the symbol table.
      • Regular expressions are used to describe tokens (lexical constructs).
      • A (Deterministic) Finite State Automaton can be used in the implementation of a lexical analyzer.
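The idea of describing tokens with regular expressions can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the token names and the patterns in TOKEN_SPEC are assumptions, and only the example statement comes from the slide.

```python
import re

# Hypothetical token names and patterns; only the example statement
# "newval := oldval + 12" comes from the slide.
TOKEN_SPEC = [
    ("number",     r"\d+"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("assign_op",  r":="),
    ("add_op",     r"\+"),
    ("skip",       r"[ \t\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token, lexeme in tokenize("newval := oldval + 12"):
    print(lexeme, token)
```

Each named group in the combined pattern plays the role of one token description; Python's re module does the automaton work internally.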
    • 9. 2. Syntax Analyzer
      • A Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given program.
      • A syntax analyzer is also called a parser.
      • A parse tree describes a syntactic structure.
      • Ex: parse tree for newval := oldval + 12
        • assgstmt
          • identifier (newval)
          • :=
          • expression
            • expression
              • identifier (oldval)
            • +
            • expression
              • number (12)
      • In a parse tree, all terminals are at the leaves.
      • All inner nodes are non-terminals of a context free grammar.
    • 10. Syntax Analyzer (CFG)
      • The syntax of a language is specified by a context free grammar (CFG).
      • The rules in a CFG are mostly recursive.
      • A syntax analyzer checks whether a given program satisfies the rules implied by a CFG or not.
        • If it does, the syntax analyzer creates a parse tree for the given program.
      • Ex: We use BNF (Backus Naur Form) to specify a CFG:
        • assgstmt -> identifier := expression
        • expression -> identifier
        • expression -> number
        • expression -> expression + expression
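As a rough sketch of how a syntax analyzer might check these four rules and build a parse tree, here is a small hand-written recursive parser in Python. Two things are assumptions rather than part of the slides: the token list is typed in by hand, and the left-recursive rule expression -> expression + expression is handled with a left-associative loop, which is one common way to avoid infinite recursion in a top-down parser.

```python
# Tokens are (kind, lexeme) pairs; this hand-written list is an assumption
# standing in for the output of a lexical analyzer.
TOKENS = [("identifier", "newval"), ("assign_op", ":="),
          ("identifier", "oldval"), ("add_op", "+"), ("number", "12")]


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        lexeme = self.tokens[self.pos][1]
        self.pos += 1
        return lexeme

    def assgstmt(self):
        # assgstmt -> identifier := expression
        target = self.expect("identifier")
        self.expect("assign_op")
        tree = ("assgstmt", ("identifier", target), ":=", self.expression())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()} after statement")
        return tree

    def expression(self):
        # expression -> expression + expression, parsed as a left-associative loop
        left = ("expression", self.operand())
        while self.peek() == "add_op":
            self.expect("add_op")
            right = ("expression", self.operand())
            left = ("expression", left, "+", right)
        return left

    def operand(self):
        # expression -> identifier | number
        kind = self.peek()
        if kind not in ("identifier", "number"):
            raise SyntaxError(f"expected identifier or number, found {kind}")
        return (kind, self.expect(kind))


tree = Parser(TOKENS).assgstmt()
print(tree)
```

The nested tuples mirror the parse tree: inner tuples are labelled with non-terminals, and the lexemes sit at the leaves.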
    • 11. Syntax Analyzer versus Lexical Analyzer
      • Which constructs of a program should be recognized by the lexical analyzer, and which ones by the syntax analyzer?
        • Both of them do similar things, but the lexical analyzer deals with the simple, non-recursive constructs of the language.
        • The syntax analyzer deals with the recursive constructs of the language.
        • The lexical analyzer simplifies the job of the syntax analyzer.
        • The lexical analyzer recognizes the smallest meaningful units (tokens) in a source program.
        • The syntax analyzer works on those tokens to recognize the meaningful structures of the programming language.
    • 12. Parsing Techniques
      • Depending on how the parse tree is created, there are different parsing techniques.
      • These parsing techniques are categorized into two groups:
        • Top-Down Parsing,
        • Bottom-Up Parsing
      • Top-Down Parsing:
        • Construction of the parse tree starts at the root, and proceeds towards the leaves.
        • Efficient top-down parsers can be easily constructed by hand.
        • Examples: Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).
      • Bottom-Up Parsing:
        • Construction of the parse tree starts at the leaves, and proceeds towards the root.
        • Efficient bottom-up parsers are normally created with the help of software tools.
        • Bottom-up parsing is also known as shift-reduce parsing.
        • Operator-Precedence Parsing: simple, restrictive, easy to implement
        • LR Parsing: a much more general form of shift-reduce parsing (LR, SLR, LALR)
    • 13. 3. Semantic Analyzer
      • A semantic analyzer checks the source program for semantic errors and collects the type information for code generation.
      • Type-checking is an important part of the semantic analyzer.
      • Context-free grammars used in syntax analysis are integrated with attributes (semantic rules).
        • The result is a syntax-directed translation.
        • Attribute grammars
      • Ex:
        • newval := oldval + 12
          • The type of the identifier newval must match the type of the expression (oldval+12).
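The type-matching rule in the example can be sketched as a small check over an expression tree. Everything below is an illustrative assumption: the symbol table contents, the type names "int" and "real", and the tuple-based tree layout are not given in the slides.

```python
# Hypothetical symbol table: the declared types are assumptions for illustration.
SYMBOLS = {"newval": "int", "oldval": "int", "ratio": "real"}


def type_of(node):
    """Compute the type of an expression given as nested tuples."""
    kind = node[0]
    if kind == "num":
        return "int" if isinstance(node[1], int) else "real"
    if kind == "id":
        return SYMBOLS[node[1]]
    if kind == "+":
        left, right = type_of(node[1]), type_of(node[2])
        if left != right:
            raise TypeError(f"operands of + have types {left} and {right}")
        return left
    raise ValueError(f"unknown node kind {kind!r}")


def check_assignment(target, expr):
    """Return True when the target's declared type matches the expression's type."""
    return SYMBOLS[target] == type_of(expr)


expr = ("+", ("id", "oldval"), ("num", 12))
print(check_assignment("newval", expr))   # newval := oldval + 12
print(check_assignment("ratio", expr))    # ratio := oldval + 12
```

With the assumed declarations, the first assignment type-checks and the second does not, because ratio is declared with a different type than the expression.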
    • 14. 4. Intermediate Code Generation
      • A compiler may produce explicit intermediate code representing the source program.
      • This intermediate code is generally machine (architecture) independent, but its level is close to the level of machine code.
      • Ex:
        • newval := oldval * fact + 1
        • id1 := id2 * id3 + 1
        • Intermediate code (three-address code):
          • temp1 := id2 * id3
          • temp2 := temp1 + 1
          • id1 := temp2
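One way to produce such three-address code is a post-order walk over the expression tree that creates a new temporary for every operator. The sketch below assumes a tuple-based tree of the form (operator, left, right); the function name generate_tac is hypothetical.

```python
import itertools


def generate_tac(target, expr):
    """Translate `target := expr` into a list of three-address statements."""
    code = []
    counter = itertools.count(1)

    def emit(node):
        # Leaves (names and constants) are used directly as operands.
        if not isinstance(node, tuple):
            return str(node)
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp = f"temp{next(counter)}"
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    result = emit(expr)
    code.append(f"{target} := {result}")
    return code


# id1 := id2 * id3 + 1, written as a nested (operator, left, right) tuple
for line in generate_tac("id1", ("+", ("*", "id2", "id3"), 1)):
    print(line)
```

For the slide's statement this yields the same three statements shown in the example, with at most one operator per statement besides the assignment.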
    • 15. Intermediate Code Generation
      • Properties of IR:
        • Easy to produce
        • Easy to translate into the target program
      • IR can be in the following forms:
        • Syntax trees
        • Postfix notation
        • Three-address statements
      • Properties of three-address statements:
        • At most one operator in addition to the assignment operator
        • Must generate temporary names to hold the value computed at each instruction
        • Some instructions have fewer than three operands
    • 16. 5. Code Optimizer (for Intermediate Code Generator)
      • The code optimizer optimizes the code produced by the intermediate code generator in terms of time and space.
      • Ex:
        • temp1 := id2 * id3
        • id1 := temp1 + 1
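The example corresponds to removing a temporary that is computed and then only copied into a variable. A minimal sketch of that single transformation is shown below; real optimizers do much more, and the function name and the convention that temporaries start with "temp" are assumptions.

```python
def remove_copy_of_temp(code):
    """Fold `t := a op b` followed by `x := t` into `x := a op b`.

    Assumes, as in the slide's example, that the temporary t is not used
    anywhere after the copy.
    """
    result = []
    for stmt in code:
        target, rhs = [part.strip() for part in stmt.split(":=")]
        if result:
            prev_target, prev_rhs = [p.strip() for p in result[-1].split(":=")]
            if rhs == prev_target and prev_target.startswith("temp"):
                # The previous statement only fed this copy: merge the two.
                result[-1] = f"{target} := {prev_rhs}"
                continue
        result.append(stmt)
    return result


unoptimized = ["temp1 := id2 * id3", "temp2 := temp1 + 1", "id1 := temp2"]
for line in remove_copy_of_temp(unoptimized):
    print(line)
```

Applied to the three statements from the intermediate code example, this leaves the two statements shown on this slide.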
    • 17. 6. Code Generator
      • Produces the target program for a specific architecture.
      • The target program is normally a relocatable object file containing the machine code or assembly code.
      • Each intermediate instruction is translated into a sequence of machine instructions that perform the same task.
      • Ex (assume an architecture in which at least one operand of every instruction is a machine register):
        • MOVE id2,R1
        • MULT id3,R1
        • ADD #1,R1
        • MOVE R1,id1
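The translation from the optimized intermediate code to these instructions can be sketched as follows. The mnemonics come from the slide; the single-register strategy, the function names, and the assumption that a temporary is consumed by the very next statement are simplifications made for this illustration.

```python
OPCODES = {"*": "MULT", "+": "ADD"}   # mnemonics taken from the slide's example


def operand(name):
    # Constants are written as immediates (#1); names refer to memory.
    return f"#{name}" if name.isdigit() else name


def generate(code):
    """Translate three-address statements of the form `x := a op b`."""
    asm = []
    in_r1 = None                      # name whose value is currently in R1
    for stmt in code:
        target, rhs = [part.strip() for part in stmt.split(":=")]
        left, op, right = rhs.split()
        if left != in_r1:
            asm.append(f"MOVE {operand(left)},R1")
        asm.append(f"{OPCODES[op]} {operand(right)},R1")
        in_r1 = target
        if not target.startswith("temp"):
            # Only program variables are stored back; temporaries stay in R1.
            asm.append(f"MOVE R1,{target}")
    return asm


for instruction in generate(["temp1 := id2 * id3", "id1 := temp1 + 1"]):
    print(instruction)
```

Keeping temp1 in R1 between the two statements is what avoids a store and reload of the temporary.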
    • 18. Phases of compiler
    • 20. The Structure of a Compiler (8)
      • Scanner [Lexical Analyzer] → Tokens
      • Parser [Syntax Analyzer] → Parse tree
      • Semantic Process [Semantic Analyzer] → Abstract Syntax Tree w/ Attributes
      • Code Generator [Intermediate Code Generator] → Non-optimized Intermediate Code
      • Code Optimizer → Optimized Intermediate Code
      • Code Generator → Target machine code
    • 21. The input program as you see it.
        main ()
        {
            int i,sum;
            sum = 0;
            for (i=1; i<=10; i++);
            sum = sum + i;
            printf("%d\n",sum);
        }
    • 34. Introducing Basic Terminology
      • What are Major Terms for Lexical Analysis?
        • TOKEN
          • A classification for a common set of strings
          • Examples include <identifier>, <number>, etc.
        • PATTERN
          • The rules which characterize the set of strings for a token
          • Recall file and OS wildcards ([A-Z]*.*)
        • LEXEME
          • The actual sequence of characters that matches a pattern and is classified by a token
          • Identifiers: x, count, name, etc.
    • 35. Introducing Basic Terminology
      • Token | Sample Lexemes | Informal Description of Pattern
        • const | const | const
        • if | if | if
        • relation | <, <=, =, <>, >, >= | < or <= or = or <> or >= or >
        • id | pi, count, D2 | letter followed by letters and digits
        • num | 3.1416, 0, 6.02E23 | any numeric constant
        • literal | "core dumped" | any characters between " and " except "
      • A token classifies a pattern.
      • Actual values (lexemes) are critical. This information is:
        • 1. Stored in the symbol table
        • 2. Returned to the parser
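The informal pattern for id, "letter followed by letters and digits", can be read as a two-state deterministic finite automaton. The sketch below is one possible encoding; the state names and the function name are assumptions made for illustration.

```python
def is_identifier(lexeme):
    """Run a two-state DFA for: letter followed by letters and digits."""
    state = "start"
    for ch in lexeme:
        if state == "start" and ch.isalpha():
            state = "in_id"
        elif state == "in_id" and (ch.isalpha() or ch.isdigit()):
            state = "in_id"
        else:
            return False      # no transition defined: reject
    return state == "in_id"   # "in_id" is the only accepting state


for lexeme in ["pi", "count", "D2", "2D", ""]:
    print(repr(lexeme), is_identifier(lexeme))
```

The three sample lexemes from the table are accepted, while a string that starts with a digit or is empty is rejected.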
    • 36. Attributes for Tokens
      • Tokens influence parsing decisions; the attributes influence the translation of tokens.
      • Example: E = M * C ** 2
        • <id, pointer to symbol-table entry for E>
        • <assign_op, >
        • <id, pointer to symbol-table entry for M>
        • <mult_op, >
        • <id, pointer to symbol-table entry for C>
        • <exp_op, >
        • <num, integer value 2>
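A scanner that returns token and attribute pairs like the ones above might look like the following sketch. The token names follow the slide; the regular expressions are assumptions, and an index into a Python list stands in for the "pointer to symbol-table entry".

```python
import re

# Token names follow the slide; the regular expressions are assumptions.
# exp_op is listed before mult_op so that "**" is not read as two "*".
PATTERNS = [
    ("exp_op",    r"\*\*"),
    ("mult_op",   r"\*"),
    ("assign_op", r"="),
    ("num",       r"\d+"),
    ("id",        r"[A-Za-z][A-Za-z0-9]*"),
]
SCANNER = re.compile(r"\s*(?:" + "|".join(f"(?P<{n}>{p})" for n, p in PATTERNS) + ")")


def scan(source):
    """Return (tokens, symbol_table) for the given source text."""
    symbol_table = []          # one entry per distinct identifier
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        m = SCANNER.match(source, pos)
        if m is None:
            raise SyntaxError(f"cannot tokenize at position {pos}")
        kind, lexeme = m.lastgroup, m.group(m.lastgroup)
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            attribute = symbol_table.index(lexeme)   # stands in for a pointer
        elif kind == "num":
            attribute = int(lexeme)
        else:
            attribute = None                          # operators carry no attribute
        tokens.append((kind, attribute))
        pos = m.end()
    return tokens, symbol_table


tokens, table = scan("E = M * C ** 2")
print(tokens)
print(table)
```

The parser only needs the token kinds to make its decisions; the attributes are carried along for later phases.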
    • 37. Role of the Lexical Analyzer
    • 38.
      • Tasks of the Lexical Analyzer
        • Reads the source text and detects the tokens
        • Strips out comments and whitespace (blanks, tabs, and newline characters)
        • Correlates error messages from the compiler with the source program
      • Approaches to implementation
        • Use assembly language: most efficient but most difficult to implement
        • Use a high-level language like C: efficient but difficult to implement
        • Use tools like lex or flex: easy to implement but not as efficient as the first two approaches
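Two of the tasks above, stripping comments and whitespace and keeping enough position information to tie error messages back to the source, can be sketched together. The C-style /* ... */ comment syntax and the simplification that lexemes are whitespace-delimited words are assumptions made for this illustration.

```python
def scan_words(source):
    """Yield (line_number, lexeme) pairs, skipping whitespace and /* ... */ comments."""
    line = 1
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch == "\n":
            line += 1                 # count lines so errors can cite them
            pos += 1
        elif ch in " \t":
            pos += 1
        elif source.startswith("/*", pos):
            end = source.find("*/", pos + 2)
            if end == -1:
                raise SyntaxError(f"line {line}: unterminated comment")
            line += source.count("\n", pos, end)
            pos = end + 2
        else:
            start = pos
            while (pos < len(source) and source[pos] not in " \t\n"
                   and not source.startswith("/*", pos)):
                pos += 1
            yield line, source[start:pos]


text = "sum = 0; /* reset\n the total */\ni = 1;"
for line_number, lexeme in scan_words(text):
    print(line_number, lexeme)
```

Newlines inside a comment are still counted, so the line numbers attached to later lexemes stay accurate.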
    • 40.
      • Handling Lexical Errors
      • In what situations do errors occur?
        • The lexical analyzer is unable to proceed because none of the patterns for tokens matches a prefix of the remaining input.
        • Some errors cannot be detected at this level. For example, in fi ( a == f(x) )…. the lexeme fi is a valid identifier, so the lexical analyzer alone cannot tell that it is a misspelling of if.
      • Possible error-recovery actions:
        • 1) Panic mode recovery: deleting successive characters from the remaining input until a well-formed token can be found
        • 2) Inserting a missing character
        • 3) Replacing an incorrect character by a correct character
        • 4) Transposing two adjacent characters
        • 5) Deleting an extraneous character
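Panic mode recovery, the first action in the list, can be sketched as follows: when no token pattern matches at the current position, characters are deleted until one does, and the skipped text is recorded as an error. The token patterns and the function name are assumptions made for illustration.

```python
import re

# Hypothetical token patterns, tried at the current position.
TOKEN = re.compile(r"(?P<id>[A-Za-z][A-Za-z0-9]*)|(?P<num>\d+)|(?P<op>==|[()=+])")


def tokenize_with_recovery(source):
    """Return (tokens, errors); bad characters are deleted, not fatal."""
    tokens, errors = [], []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(source, pos)
        if m:
            tokens.append((m.lastgroup, m.group()))
            pos = m.end()
            continue
        # Panic mode: delete characters until some token pattern matches again.
        start = pos
        while (pos < len(source) and not source[pos].isspace()
               and not TOKEN.match(source, pos)):
            pos += 1
        errors.append((start, source[start:pos]))
    return tokens, errors


print(tokenize_with_recovery("a == @# f(x)"))
print(tokenize_with_recovery("fi ( a == f(x) )"))
```

In the second call no error is reported, because fi matches the identifier pattern; a misspelled keyword of that kind has to be caught by a later phase.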