
Compiler design syntax analysis

These slides cover how to build derivations and parse trees, what ambiguity is and how to remove it, left recursion, and left factoring.

Published in: Engineering

  1. COMPILER DESIGN: SYNTAX ANALYSIS. Ms. Richa Sharma, Assistant Professor, Lovely Professional University (richa.18364@lpu.co.in)
  2. SYNTAX ANALYSIS INTRODUCTION • The lexical phase is implemented with finite automata, and finite automata can really only express things that require counting modulo k. • Regular languages are the weakest formal languages in wide use and have many applications, but they cannot describe arbitrarily nested constructs such as balanced parentheses or nested if-else statements. • To summarize, the lexer takes a string of characters as input and produces a string of tokens as output. That string of tokens is the input to the parser, which produces a parse tree of the program. • Sometimes the parse tree is only implicit: a compiler may never actually build the full parse tree.
  3. ROLE OF SYNTAX ANALYSIS/PARSER (diagram) Source program → Lexical Analyzer → token → Parser → parse tree → Rest of Front End → intermediate representation. The parser requests each token from the lexical analyzer with getNextToken; the symbol table is shared by these phases.
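
The lexer-to-parser hand-off in this diagram can be sketched as a generator that yields one token per request. This is only an illustration: the token classes (`id`, `num`, `op`, `lparen`, `rparen`) and the function name `tokens` are assumptions and do not come from the slides. Calling `next()` on the generator plays the role of getNextToken.

```python
import re
from typing import Iterator, Tuple

# Hypothetical token classes for a small expression language.
TOKEN_SPEC = [
    ("id",     r"[A-Za-z_]\w*"),
    ("num",    r"\d+"),
    ("op",     r"[+\-*/]"),
    ("lparen", r"\("),
    ("rparen", r"\)"),
    ("skip",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokens(source: str) -> Iterator[Tuple[str, str]]:
    """Yield (token_class, lexeme) pairs, skipping whitespace."""
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        if m.lastgroup != "skip":
            yield (m.lastgroup, m.group())

stream = tokens("a * (b + 42)")
first = next(stream)  # the parser would call this whenever it needs a token
```

Because the generator is lazy, the parser pulls tokens on demand and the full token string never has to be stored.
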
  4. CONTEXT FREE GRAMMARS • A grammar is a 4-tuple G = (Σ, N, P, S), where Σ is a finite set of terminals, N is a finite set of non-terminals, P is a finite set of production rules, and S is the start symbol. • Example: expression -> expression + term | expression - term | term; term -> term * factor | term / factor | factor; factor -> (expression) | id • A grammar derives strings by beginning with the start symbol and repeatedly replacing a non-terminal by the right-hand side of a production for that non-terminal. • The strings of terminals that can be derived from the start symbol of a grammar G form the language L(G) defined by the grammar.
  5. CONTEXT FREE GRAMMARS • Programming languages have recursive structure, and context-free grammars are a natural notation for this recursive structure. • Not all strings of tokens are programs, so the parser must distinguish between valid and invalid strings of tokens. • We need: – a language for describing valid strings of tokens – a method for distinguishing valid from invalid strings of tokens
  6. CONTEXT FREE GRAMMAR EXAMPLES • Arithmetic expressions: E ::= T | E + T | E - T; T ::= F | T * F | T / F; F ::= id | (E) • Statements: IfStatement ::= if E then Statement else Statement • Steps of a derivation: 1. Begin with a string containing only the start symbol S. 2. Replace any non-terminal X in the string by the right-hand side of some production X -> Y1…Yn. 3. Repeat (2) until there are no non-terminals left.
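
The three derivation steps can be turned into a small enumerator of L(G) for the arithmetic grammar. This is a minimal sketch under two assumptions that are not in the slides: it always rewrites the leftmost non-terminal (the slides allow any), and it bounds the search by string length, which is safe here only because no production shrinks a string. The names `GRAMMAR` and `sentences` are illustrative.

```python
from collections import deque

# Each non-terminal maps to its alternatives; each alternative is a tuple of symbols.
GRAMMAR = {
    "E": [("T",), ("E", "+", "T"), ("E", "-", "T")],
    "T": [("F",), ("T", "*", "F"), ("T", "/", "F")],
    "F": [("id",), ("(", "E", ")")],
}

def sentences(grammar, start, max_len):
    """Return every terminal string of at most max_len symbols derivable from start."""
    seen = {(start,)}            # Step 1: begin with only the start symbol
    queue = deque([(start,)])
    result = set()
    while queue:
        form = queue.popleft()
        # Step 2: locate the leftmost non-terminal, if any.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            result.add(" ".join(form))   # Step 3: no non-terminals remain
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # No production here shrinks a form, so over-long forms can be pruned.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return result

short = sentences(GRAMMAR, "E", 3)
```

With a bound of 3 symbols, `short` holds the single `id` plus the five three-symbol expressions the grammar allows.
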
  7. DERIVATIONS • A derivation is a sequence of production applications that begins with the start symbol: applying productions one at a time, in sequence, produces A -> … -> … -> … • A derivation can be drawn as a tree: – the start symbol is the tree's root – for a production X -> Y1…Yn, add children Y1…Yn to node X • Grammar: E -> E + E | E * E | (E) | id • String: id * id + id
  8. DERIVATIONS • Derivations are of two types: leftmost, which always replaces the leftmost non-terminal, and rightmost, which always replaces the rightmost non-terminal. • Example grammar: E -> E + E | E * E | -E | (E) | id • String: (id+id) • Leftmost derivation: E => (E) => (E+E) => (id+E) => (id+id) • Rightmost derivation: E => (E) => (E+E) => (E+id) => (id+id)
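
The two derivations of (id+id) can be replayed mechanically. In this sketch the helper `derive` and the encoding of each step as an index into the list of alternatives are assumptions made for illustration; the same list of choices is applied once to the leftmost and once to the rightmost non-terminal.

```python
GRAMMAR = {
    "E": [("E", "+", "E"), ("E", "*", "E"), ("-", "E"), ("(", "E", ")"), ("id",)],
}

def derive(grammar, start, choices, leftmost=True):
    """Apply the chosen alternatives in order, always rewriting the leftmost
    (or rightmost) non-terminal, and return every sentential form."""
    form = [start]
    steps = ["".join(form)]
    for choice in choices:
        positions = [i for i, s in enumerate(form) if s in grammar]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = grammar[form[i]][choice]   # replace one symbol by the RHS
        steps.append("".join(form))
    return steps

# Alternatives used: (E) is index 3, E+E is index 0, id is index 4.
left = derive(GRAMMAR, "E", [3, 0, 4, 4], leftmost=True)
right = derive(GRAMMAR, "E", [3, 0, 4, 4], leftmost=False)
```

Both runs end in the same string and differ only in the third form, `(id+E)` versus `(E+id)`.
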
  9. DERIVATIONS • Now we parse this string, showing how to produce a derivation for it and, at the same time, build the tree. • Parse trees have terminals at the leaves and non-terminals at the interior nodes; furthermore, an in-order traversal of the leaves is the original input. • Grammar: E -> E + E | E * E | (E) | id • String: id * id + id
  10. LEFT MOST DERIVATION AND PARSE TREE • Derivation: E • Tree: the single root node E
  11. LEFT MOST DERIVATION AND PARSE TREE • Derivation: E => E+E • Tree: the root E gains the children E, +, E
  12. LEFT MOST DERIVATION AND PARSE TREE • Derivation: E => E+E => E*E+E • Tree: the left child E of the root gains the children E, *, E
  13. LEFT MOST DERIVATION AND PARSE TREE • Derivation: E => E+E => E*E+E => id*E+E • Tree: the first E under * gains the leaf id
  14. LEFT MOST DERIVATION AND PARSE TREE • Derivation: E => E+E => E*E+E => id*E+E => id*id+E • Tree: the second E under * gains the leaf id
  15. LEFT MOST DERIVATION AND PARSE TREE • Derivation: E => E+E => E*E+E => id*E+E => id*id+E => id*id+id • Tree: the right child E of the root gains the leaf id, so the leaves read id * id + id
  16. DERIVATIONS • A parse tree has – terminals at the leaves – non-terminals at the interior nodes • An in-order traversal of the leaves is the original input. • The parse tree shows the association of operations; the input string does not. • Note: a given parse tree corresponds to exactly one leftmost and one rightmost derivation. If some string has more than one parse tree (equivalently, more than one leftmost or more than one rightmost derivation), the grammar is ambiguous.
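
The parse tree built over the preceding slides for id * id + id can be written down as nested data to check the two properties above. The tuple encoding and the helper names `node`, `leaves` and `show` are assumptions chosen for this sketch.

```python
# A parse tree node is (non_terminal, children); a leaf is a terminal string.
def node(symbol, *children):
    return (symbol, list(children))

# Tree for id * id + id: the root uses E -> E + E, its left child uses E -> E * E.
TREE = node("E",
            node("E", node("E", "id"), "*", node("E", "id")),
            "+",
            node("E", "id"))

def leaves(tree):
    """Collect the terminals at the leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [leaf for child in children for leaf in leaves(child)]

def show(tree):
    """Render the grouping that the tree imposes, using brackets."""
    if isinstance(tree, str):
        return tree
    _, children = tree
    if len(children) == 1:
        return show(children[0])
    return "[" + " ".join(show(c) for c in children) + "]"

flat = leaves(TREE)
grouped = show(TREE)
```

`flat` recovers the input string, while `grouped` exposes the association of operations that the flat string does not carry.
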
  17. AMBIGUITY • If a string has two or more rightmost derivations, or two or more leftmost derivations, then it has two distinct parse trees, and hence the grammar is ambiguous. • Ambiguity is bad: it leaves the meaning of some programs ill-defined. • If some program has multiple parse trees, you are essentially leaving it up to the compiler to pick which of the possible interpretations to generate code for, and that is not a good idea. • To remove ambiguity, we rewrite the rules so that they encode operator precedence and associativity.
  18. AMBIGUITY • E.g., the string id + id * id produces two parse trees, hence the grammar is ambiguous. • One can remove the ambiguity by rewriting the grammar so that a new non-terminal takes the place of an existing one, but this can result in left or right recursion. Hence we have to remove left recursion.
  19. AMBIGUITY • Consider the ambiguous grammar: E → E * E, E → num • How to fix it depends on the associativity of *; we use different rewrite rules for different associativity. • If * is left-associative, we make the grammar left-recursive by allowing a recursive reference only to the left of the operator symbol. • Unambiguous grammar: E → E * E', E → E', E' → num
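
One way to see the ambiguity concretely is to count parse trees. The sketch below is a simple memoized span-counting routine, not an algorithm from the slides. It assumes the grammar has no empty productions and no cycles of unit productions, which holds for the grammars used here. The name `count_trees` is illustrative.

```python
from functools import lru_cache

def count_trees(grammar, start, tokens):
    """Count distinct parse trees for tokens. Assumes no empty productions
    and no cycles of unit productions, so the recursion terminates."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def sym(symbol, i, j):
        # Number of ways `symbol` derives tokens[i:j].
        if symbol not in grammar:
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives tokens[i:j].
        if not rhs:
            return 1 if i == j else 0
        if len(rhs) == 1:
            return sym(rhs[0], i, j)
        # The first symbol takes tokens[i:k]; every symbol needs at least one token.
        return sum(sym(rhs[0], i, k) * seq(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return sym(start, 0, len(tokens))

AMBIGUOUS = {"E": [("E", "*", "E"), ("num",)]}
LEFT_ASSOC = {"E": [("E", "*", "E'"), ("E'",)], "E'": [("num",)]}
INPUT = ["num", "*", "num", "*", "num"]

ambiguous_count = count_trees(AMBIGUOUS, "E", INPUT)
unambiguous_count = count_trees(LEFT_ASSOC, "E", INPUT)
```

For num * num * num the ambiguous grammar admits two trees (grouping to the left or to the right), while the left-recursive rewrite admits exactly one.
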
  20. LEFT RECURSION • The unambiguous grammar E → E * E', E → E', E' → num is now left-recursive. • A left-recursive grammar is any grammar that has a non-terminal from which some non-empty sequence of rewrites produces a string that begins with that same non-terminal. • Consider the left-recursive grammar S -> S a | b • S generates all strings starting with "b" and followed by any number of "a"s. • We can rewrite it using right recursion: S -> b S', S' -> a S' | ε
  21. EXAMPLES OF LEFT RECURSION 1. E -> E + T | T, T -> id | (E) 2. S -> (L) | x, L -> L , S | S 3. S -> S0S1S | 01
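
The rewrite from left recursion to right recursion can be sketched as a function over a grammar dictionary. It covers immediate left recursion only (A -> A α | β), not indirect cycles through other non-terminals. The function name, the `EPSILON` constant and the primed names for new non-terminals are assumptions of this sketch.

```python
EPSILON = ()  # the empty right-hand side

def remove_immediate_left_recursion(grammar):
    """Rewrite A -> A a1 | ... | b1 | ... as
    A -> b1 A' | ...  and  A' -> a1 A' | ... | epsilon."""
    result = {}
    for head, alternatives in grammar.items():
        recursive = [alt[1:] for alt in alternatives if alt[:1] == (head,)]
        others = [alt for alt in alternatives if alt[:1] != (head,)]
        if not recursive:
            result[head] = list(alternatives)
            continue
        new_head = head + "'"  # assumed not to clash with an existing name
        result[head] = [alt + (new_head,) for alt in others]
        result[new_head] = [alt + (new_head,) for alt in recursive] + [EPSILON]
    return result

simple = remove_immediate_left_recursion({"S": [("S", "a"), ("b",)]})
example1 = remove_immediate_left_recursion({
    "E": [("E", "+", "T"), ("T",)],
    "T": [("id",), ("(", "E", ")")],
})
```

Applied to S -> S a | b this yields S -> b S' and S' -> a S' | ε; applied to example 1 it yields E -> T E', E' -> + T E' | ε, and leaves T unchanged.
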
  22. LEFT FACTORING • Left factoring is a grammar transformation that produces a deterministic grammar, suitable for predictive (top-down) parsing, from a non-deterministic one. • Consider the following grammar: stmt -> if expr then stmt else stmt | if expr then stmt • On seeing the input token if, it is not clear to the parser which production to use. • We can easily perform left factoring: if we have A -> αβ1 | αβ2, then we replace it with A -> αA', A' -> β1 | β2
  23. EXAMPLES OF LEFT FACTORING 1. S -> iEtS | iEtSeS | a, E -> b 2. S -> aSSbS | aSaSb | abb | b 3. S -> bSSaaS | bSSaSb | bSb | a
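
Left factoring can be sketched as repeatedly pulling out the longest prefix shared by at least two alternatives. The code below is one possible reading of the A -> αβ1 | αβ2 rule, not a definitive implementation. The helper names and the primed names for new non-terminals are assumptions. For example 1, the second alternative is assumed to be iEtSeS, with a lowercase terminal e for "else".

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol tuples."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]

def left_factor(grammar):
    """Repeatedly replace A -> x b1 | x b2 with A -> x A', A' -> b1 | b2."""
    result = {head: list(alts) for head, alts in grammar.items()}
    work = list(result)
    while work:
        head = work.pop()
        alts = result[head]
        # Find the longest prefix shared by at least two alternatives.
        best = ()
        for i in range(len(alts)):
            for j in range(i + 1, len(alts)):
                prefix = common_prefix(alts[i], alts[j])
                if len(prefix) > len(best):
                    best = prefix
        if not best:
            continue
        new_head = head + "'"
        while new_head in result:  # keep fresh names unique
            new_head += "'"
        shared = [alt for alt in alts if alt[:len(best)] == best]
        rest = [alt for alt in alts if alt[:len(best)] != best]
        result[head] = rest + [best + (new_head,)]
        result[new_head] = [alt[len(best):] for alt in shared]
        work += [head, new_head]  # either may still need factoring
    return result

factored = left_factor({
    "S": [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)],
    "E": [("b",)],
})
```

The result is S -> a | iEtS S' and S' -> ε | eS, so a predictive parser no longer has to choose between two alternatives that both begin with i.
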