2. Introduction
Syntax analysis is the second phase of the compiler. It takes as
input the stream of tokens produced by the lexical analyzer and
generates a syntax tree or parse tree.
The syntax of programming-language constructs can be
specified by a context-free grammar (CFG), or equivalently in BNF
(Backus-Naur Form) notation.
3. Introduction
Advantages of Grammar for Syntax Specification
A grammar gives a precise and easy-to-understand syntactic specification
of a programming language.
An efficient parser can be constructed automatically from a properly
designed grammar.
A grammar imparts a structure to a source program that is useful for its
translation into object code and for the detection of errors.
New constructs can be added to a language more easily when there is a
grammatical description of the language.
4. Introduction
Role of the Parser
• The parser or syntactic analyzer obtains a string of tokens from
the lexical analyzer and verifies that the string can be generated
by the grammar for the source language.
• It reports any syntax errors in the program.
• It also recovers from commonly occurring errors so that it can
continue processing its input.
6. Role of the Parser
It verifies the structure generated by the tokens
based on the grammar.
It constructs the parse tree.
It reports the errors.
It performs error recovery.
7. Role of the Parser
3 Types of Parsers for Grammars
Universal – can parse any grammar, but too inefficient to use in
production compilers
Top-down – builds the parse tree from the top (root) to the
bottom (leaves)
Bottom-up – starts from the leaves and works its way up to
the root
In both top-down and bottom-up parsing, the input to the parser
is scanned from left to right, one symbol at a time
8. Issues: the parser cannot detect errors such as:
1. Variable re-declaration
2. Use of a variable before it is initialized
3. Data-type mismatch for an operation
These issues are handled by the semantic-analysis phase.
9. Role of the Parser
Representative Grammars
The following grammar describes expressions, terms,
and factors:
E → E + T | T
T → T * F | F
F → ( E ) | id
10. Role of the Parser
Representative Grammars
The above grammar is left recursive.
The expression grammar belongs to the class of LR
grammars, which are suitable for bottom-up parsing.
The grammar can be extended to handle additional operators
and additional levels of precedence.
11. Role of the Parser
Representative Grammars
The following non-left-recursive variant of the
expression grammar will be used for top-down parsing:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
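A top-down parser for this variant can be sketched as a set of mutually recursive procedures, one per nonterminal (a minimal sketch; it assumes single-character identifiers and builds no tree, only accepting or rejecting the input):

```python
# Recursive-descent parser for the non-left-recursive expression grammar:
#   E  -> T E'      E' -> + T E' | epsilon
#   T  -> F T'      T' -> * F T' | epsilon
#   F  -> ( E ) | id          (here 'id' is any single letter)

class Parser:
    def __init__(self, text):
        self.toks = list(text.replace(" ", "")) + ["$"]  # '$' ends the input
        self.pos = 0

    def peek(self):
        return self.toks[self.pos]

    def match(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        self.E()
        self.match("$")          # the whole input must be consumed
        return True

    def E(self):                 # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.peek() == "+":   # E' -> + T E'
            self.match("+")
            self.T()
            self.E_prime()       # otherwise E' -> epsilon: consume nothing

    def T(self):                 # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.peek() == "*":   # T' -> * F T'
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):
        if self.peek() == "(":   # F -> ( E )
            self.match("(")
            self.E()
            self.match(")")
        elif self.peek().isalpha():  # F -> id
            self.pos += 1
        else:
            raise SyntaxError(f"unexpected token {self.peek()!r}")

print(Parser("a+b*c").parse())   # True: the string is in the language
```

Each procedure consumes exactly the tokens derived from its nonterminal; the absence of left recursion guarantees that every call makes progress.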
12. Role of the Parser
Representative Grammars
The following grammar is useful for describing
techniques for handling ambiguities during parsing.
This grammar permits more than one parse tree for
expressions such as a + b * c:
E → E + E | E * E | ( E ) | id
13. Role of the Parser
Syntax Error Handling
Planning error handling right from the start can both simplify the structure of
the compiler and improve its handling of errors.
Common programming errors can occur at many different levels:
1. Lexical errors: misspelling of identifiers, keywords, or operators.
2. Syntactic errors: misplaced semicolons, or extra or missing braces.
3. Semantic errors: type mismatches between operators and operands.
4. Logical errors: e.g., an infinitely recursive call.
14. Role of the Parser
Syntax Error Handling
The error handler in a parser has the following goals:
It should report the presence of errors clearly and accurately.
It should recover from each error quickly enough to be able to
detect subsequent errors.
It should not significantly slow down the processing of correct
programs.
15. Role of the Parser
Error-Recovery Strategies
The different strategies that a parser uses to recover from a
syntactic error are:
1. Panic mode
2. Phrase level
3. Error productions
4. Global correction
16. Role of the Parser
Error-Recovery Strategies
1. Panic mode:
On discovering an error, the parser discards input symbols one at a time until a
synchronizing token is found.
The synchronizing tokens are usually delimiters, such as a semicolon or end. This
method has the advantage of simplicity and is guaranteed not to go into an infinite
loop. When multiple errors in the same statement are rare, it is quite effective.
2. Phrase level:
On discovering an error, the parser performs local correction on the remaining
input that allows it to continue. Example: insert a missing semicolon or delete an
extraneous one.
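The skip-to-synchronizing-token step of panic mode can be sketched as follows (a minimal sketch over a hypothetical token stream; the synchronizing set shown is an assumption for illustration):

```python
# Panic-mode recovery sketch: on an error, discard input tokens until a
# synchronizing token (e.g., ';' or 'end') is found, then resume parsing
# with the token just past it.

SYNC_TOKENS = {";", "end"}   # assumed synchronizing set for this sketch

def recover(tokens, pos):
    """Skip symbols from tokens[pos:] until a synchronizing token is
    found; return the position just past it."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return pos + 1 if pos < len(tokens) else pos

# Example: an error is detected at position 2 (the stray '+'); recovery
# skips to just past the next ';' so parsing continues with 'y = 2 ;'.
toks = ["x", "=", "+", "1", ";", "y", "=", "2", ";"]
print(recover(toks, 2))  # 5 -> parsing resumes at token 'y'
```

Because the scan only moves forward and stops at the end of input, this loop cannot run forever, which is exactly the simplicity argument made above.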
17. Role of the Parser
Error-Recovery Strategies
3. Error productions
The parser is constructed using a grammar augmented with error productions. If an
error production is used by the parser, appropriate error diagnostics can be
generated to indicate the erroneous construct recognized in the input.
4. Global correction
Given an incorrect input string x and a grammar G, certain algorithms can find a
parse tree for a related string y such that the number of insertions, deletions,
and changes of tokens required to transform x into y is as small as possible.
However, these methods are in general too costly in time and space.
18. Context Free Grammars-Formal Definition
Grammars are used to systematically describe the syntax of programming
language constructs like expressions and statements.
A grammar is a finite set of formal rules for generating the syntactically
correct sentences of a language.
Context-Free Grammar:
A context-free grammar consists of a set of terminals (T), a set of non-terminals
or variables (V), a start symbol (S), and a set of productions (P):
G = (V, T, P, S)
19. Context-Free Grammar-Formal Definition
Ex: stmt → if ( expr ) stmt else stmt
1. Terminals (T) are the basic symbols from which strings are formed.
Terminals denote the tokens returned by the lexical analyzer. Ex: if, else,
"(", ")"
2. Non-terminals (V) are syntactic variables that denote sets of
strings. Ex: stmt, expr
3. In a grammar, one non-terminal is distinguished as the start symbol,
and the set of strings it denotes is the language generated by the grammar.
4. The productions of a grammar specify the manner in which the
terminals and non-terminals can be combined to form strings.
20. Context-Free Grammar-Formal Definition
Each production consists of:
a) A non-terminal symbol called the head or left side of the
production
b) The symbol → (sometimes written ::=)
c) A body or right side consisting of zero or more terminals
and non-terminals
32. Parse Trees and Derivations
The derivations can be shown in the form of a tree;
such trees are called derivation trees or parse trees.
Leftmost derivations as well as rightmost
derivations can be represented using derivation trees
or parse trees.
33. Parse Trees and Derivations
A parse tree is a graphical representation of a derivation
that filters out the order in which productions are applied
to replace nonterminals.
Each interior node of a parse tree represents the application of a
production.
The interior node is labeled with the nonterminal A in the head of the
production;
the children of the node are labeled, from left to right, by the symbols
in the body of the production.
35. Ambiguous Grammar
A grammar that produces more than one parse tree
for some sentence is said to be ambiguous.
or
An ambiguous grammar is one that produces more
than one leftmost derivation or more than one
rightmost derivation for the same sentence.
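For the ambiguous grammar E → E + E | E * E | ( E ) | id, the two parse trees for a + b * c can be written as nested tuples; evaluating both with sample values (the values are chosen only for illustration) shows why the ambiguity matters:

```python
# Two parse trees the ambiguous grammar permits for "a + b * c",
# represented as nested (operator, left, right) tuples.

tree1 = ("+", "a", ("*", "b", "c"))   # a + (b * c): '*' grouped tighter
tree2 = ("*", ("+", "a", "b"), "c")   # (a + b) * c: '+' applied first

def evaluate(node, env):
    if isinstance(node, str):          # leaf: an identifier
        return env[node]
    op, left, right = node
    l, r = evaluate(left, env), evaluate(right, env)
    return l + r if op == "+" else l * r

env = {"a": 2, "b": 3, "c": 4}
print(evaluate(tree1, env))  # 14  (2 + 3*4)
print(evaluate(tree2, env))  # 20  ((2+3)*4)
```

The two trees assign different values to the same sentence, so a compiler must commit to one of them, which is what the disambiguation techniques below achieve.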
37. Verifying the Language Generated by Grammar
A proof that a grammar G generates a language L has two
parts:
i) Show that every string generated by G is in L.
ii) Conversely, show that every string in L can be
generated by G.
42. Writing a Grammar
Lexical Analysis vs Syntactic Analysis
Reasons for using regular expressions to define the lexical
syntax of a language are as follows:
The lexical rules of a language are frequently quite simple,
and to describe them we do not need a notation as powerful as
grammars.
Regular expressions generally provide a more concise and
easier-to-understand notation for tokens than grammars.
More efficient lexical analyzers can be constructed
automatically from regular expressions than from arbitrary
grammars.
43. Lexical Analysis vs Syntactic Analysis
Regular expressions are most useful for describing the structure of
constructs such as identifiers, constants, keywords, and whitespace.
Grammars, on the other hand, are most useful for describing nested
structures such as balanced parentheses, matching begin-end's,
corresponding if-then-else's, and so on.
44. Eliminating Ambiguity
The ambiguity of whether to associate an else with the first if
statement or the second is called the dangling-else
problem.
There are two methods for eliminating ambiguity in CFGs:
1. Disambiguating rules
2. Using precedence and associativity of operators
46. Elimination of Left Recursion
A grammar is left recursive if it has a nonterminal A such
that there is a derivation A ⇒+ Aα, for some string α.
Top-down parsing methods cannot handle left-recursive
grammars, so a transformation is needed to eliminate left
recursion.
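The standard transformation replaces A → Aα | β with A → βA' and A' → αA' | ε. A minimal sketch of this rewrite for immediate left recursion (the grammar encoding as tuples is an assumption made for illustration):

```python
# Immediate left-recursion elimination: productions of the form
#   A -> A alpha | beta    become    A -> beta A',  A' -> alpha A' | epsilon
# A grammar maps a nonterminal to a list of bodies; a body is a tuple
# of symbols, and the empty tuple () stands for epsilon.

EPSILON = ()

def eliminate_immediate_left_recursion(head, bodies):
    recursive = [b[1:] for b in bodies if b and b[0] == head]    # the alphas
    other = [b for b in bodies if not (b and b[0] == head)]      # the betas
    if not recursive:
        return {head: bodies}            # no left recursion: nothing to do
    new = head + "'"                     # fresh nonterminal A'
    return {
        head: [beta + (new,) for beta in other],
        new: [alpha + (new,) for alpha in recursive] + [EPSILON],
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
g = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
print(g)
# {'E': [('T', "E'")], "E'": [('+', 'T', "E'"), ()]}
```

Applied to the left-recursive expression grammar above, this produces exactly the non-left-recursive variant used for top-down parsing.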
87. Predictive –LL(1)
It is a non-recursive top-down parser.
In LL(1), the first L indicates that the input is
scanned from left to right.
The second L indicates that the parser produces
a leftmost derivation.
The 1 is the number of lookahead symbols, i.e.,
how many input symbols are examined when making
a parsing decision.
91. Predictive –LL(1)
The predictive parser has an input, a stack, a parsing
table, and an output.
The input contains the string to be parsed, followed by $,
the right end marker.
The stack contains a sequence of grammar symbols, with $,
the bottom-of-stack marker, beneath them.
The stack holds the part of the leftmost derivation that has
not yet been matched against the input.
92. Predictive –LL(1)
The parsing table is a two-dimensional array M[A, a], where A is
a nonterminal and a is a terminal or the symbol $.
The parser is controlled by a program that behaves as follows:
The program determines X, the symbol on top of the stack, and a,
the current input symbol.
These two symbols determine the action of the parser.
93. Predictive –LL(1)
There are three possibilities:
1.If X = a = $, the parser halts and announces successful completion
of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input
pointer to the next input symbol.
94. Predictive –LL(1)
3. If X is a nonterminal, the program consults entry M[X, a] of the
parsing table M.
This entry will be either an X-production of the grammar or an
error entry.
If M[X, a] = {X → UVW}, the parser replaces X on top of the
stack by WVU (with U on top).
If M[X, a] = error, the parser calls an error-recovery routine.
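The three-way driver loop above can be sketched directly (a minimal sketch; the parsing table M is hand-filled for the tiny LL(1) grammar S → a S | b rather than computed from FIRST/FOLLOW):

```python
# Table-driven predictive-parsing loop for the LL(1) grammar  S -> a S | b .
# M[X][a] gives the production body to expand nonterminal X with on
# lookahead a.

M = {
    "S": {"a": ["a", "S"], "b": ["b"]},
}

def ll1_parse(tokens):
    stack = ["$", "S"]               # '$' is the bottom-of-stack marker
    tokens = tokens + ["$"]          # '$' also marks the right end of input
    i = 0
    while stack:
        X, a = stack[-1], tokens[i]
        if X == a == "$":            # 1. X = a = $: halt and accept
            return True
        if X == a:                   # 2. X = a != $: pop X, advance input
            stack.pop()
            i += 1
        elif X in M and a in M[X]:   # 3. nonterminal: consult M[X, a]
            stack.pop()
            stack.extend(reversed(M[X][a]))  # push body, leftmost on top
        else:
            raise SyntaxError(f"no entry M[{X}, {a}]")

print(ll1_parse(["a", "a", "b"]))  # True
```

Pushing the body reversed is what implements "replaces X by WVU (with U on top)" in the description above.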
95. Predictive –LL(1)
The construction of a predictive parser is aided by two
functions associated with a grammar G.
These functions, FIRST and FOLLOW, allow us to fill in the
entries of the predictive parsing table for G.
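FIRST sets can be computed by iterating the defining rules to a fixed point. A sketch (the encoding is an assumption: nonterminals are the grammar's keys, anything else is a terminal, the empty tuple is ε, and "" stands for ε inside a FIRST set):

```python
# Fixed-point computation of FIRST sets for a context-free grammar.

def first_sets(grammar):
    FIRST = {X: set() for X in grammar}
    changed = True
    while changed:                      # iterate until no set grows
        changed = False
        for X, bodies in grammar.items():
            for body in bodies:
                before = len(FIRST[X])
                nullable_prefix = True
                for sym in body:
                    if sym in grammar:                 # nonterminal
                        FIRST[X] |= FIRST[sym] - {""}
                        if "" not in FIRST[sym]:
                            nullable_prefix = False
                            break
                    else:                              # terminal
                        FIRST[X].add(sym)
                        nullable_prefix = False
                        break
                if nullable_prefix:     # every symbol in body derives epsilon
                    FIRST[X].add("")    # "" stands for epsilon
                if len(FIRST[X]) != before:
                    changed = True
    return FIRST

# The non-left-recursive expression grammar:
g = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
F = first_sets(g)
print(sorted(F["E"]))   # ['(', 'id']
print(sorted(F["E'"]))  # ['', '+']
```

FOLLOW is computed by a similar fixed-point pass over the production bodies, and together the two functions determine every entry of the LL(1) table.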
101. Bottom-Up Parsing
Bottom-up parsing starts from the leaf nodes of a tree and
works upward until it reaches the root node.
We start from a sentence or input string and apply
production rules in reverse (reduction) in order to
reach the start symbol.
Parsing halts successfully as soon as we reach the
start symbol.
102. Bottom-Up Parsing
Bottom-up parsing is the process of "reducing" a string
w to the start symbol of the grammar.
At each reduction step, a specific substring matching the
body of a production is replaced by the nonterminal at the
head of that production.
The key decisions during bottom-up parsing are when to
reduce and which production to apply as the parse
proceeds.
104. A rightmost derivation in reverse can be
obtained by handle pruning.
105. Shift-Reduce Parsing
Shift-reduce parsing is a form of bottom-up parsing. It
generates the parse tree from the leaves to the root.
Shift-reduce parsing is the process of reducing a string to the
start symbol of a grammar.
A shift-reduce parser uses a stack to hold grammar symbols and
an input buffer to hold the string to be parsed.
106. • Shift-reduce parsing performs four actions:
shift, reduce, accept, and error.
• On a shift action, the current symbol in the input string
is pushed onto the stack.
• On a reduce action, symbols at the top of the stack are
replaced by a non-terminal. The replaced symbols form the
body (right side) of a production and the non-terminal is
its head (left side).
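The shift and reduce actions can be sketched for a grammar simple enough that "reduce whenever the stack top matches a production body" is always the right decision (a minimal sketch; real shift-reduce parsers use tables to decide when to reduce):

```python
# Shift-reduce parsing sketch for the grammar  A -> ( A ) | a .
# The stack holds grammar symbols; we reduce whenever the top of the
# stack matches a production body, and otherwise shift the next token.

PRODUCTIONS = [("A", ["(", "A", ")"]), ("A", ["a"])]

def shift_reduce(tokens):
    stack, i, trace = [], 0, []
    while True:
        reduced = False
        for head, body in PRODUCTIONS:        # try to reduce first
            if stack[-len(body):] == body:    # stack top matches a body
                del stack[-len(body):]
                stack.append(head)            # replace body by its head
                trace.append(f"reduce {head} -> {' '.join(body)}")
                reduced = True
                break
        if reduced:
            continue
        if i < len(tokens):                   # otherwise shift
            stack.append(tokens[i])
            trace.append(f"shift {tokens[i]}")
            i += 1
        else:
            break
    # accept when all input is consumed and only the start symbol remains
    trace.append("accept" if stack == ["A"] else "error")
    return trace

for step in shift_reduce(list("((a))")):
    print(step)
```

The printed trace shows the sequence of shifts and reductions; the reductions, read in order, give a rightmost derivation in reverse, i.e., handle pruning.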
107. LR-PARSERS
LR parsers can parse a large class of context-free
grammars. The technique is called LR(k) parsing.
L stands for left-to-right scanning of the input.
R stands for constructing a rightmost derivation in reverse.
k is the number of input symbols of lookahead
used in making parsing decisions.
109. LR-PARSERS
SLR(1) – Simple LR
Works on the smallest class of grammars.
Fewest states, hence a very small table.
Simple and fast construction.
LR(1) – LR parser
Also called the canonical LR parser.
Works on the complete set of LR(1) grammars.
Generates a large table and a large number of states.
Slow construction.
110. LR-PARSERS
LALR(1) – Lookahead LR parser
o Works on an intermediate-sized class of grammars.
o The number of states is the same as in SLR(1).
In all types of LR parsing, the input, output, and stack are the same, but
the parsing table is different.
Reasons for the attractiveness of LR parsers
LR parsers can handle a large class of context-free grammars.
The LR parsing method is the most general non-backtracking shift-reduce parsing
method known.
An LR parser can detect a syntax error as soon as it is possible to do so on a
left-to-right scan of the input.
LR grammars can describe more languages than LL grammars.
111. LR-PARSERS
Drawbacks of LR parsers
It is too much work to construct an LR parser by hand; an
automated parser generator is needed.
If the grammar contains ambiguities or other constructs that are
difficult to parse in a left-to-right scan of the input, conflicts
arise in the parsing table.