This document discusses parsing and different parsing techniques. It defines parsing as the process of verifying that tokens generated by a lexical analyzer follow the syntactic rules of a language. Parsers can be top-down or bottom-up. Top-down parsers build the parse tree from the root to leaves by following leftmost derivations, while bottom-up parsers start from the leaves and work upwards. The document also discusses LL(1), SLR, and LR(1) parsing techniques, including how to construct parsing tables and handle conflicts. LR(1) parsers are more restrictive than SLR(1) parsers in where they allow reduce operations.
2. What is Parsing?
In the syntax analysis phase, a compiler verifies
whether or not the tokens generated by the lexical
analyser are grouped according to the syntactic rules
of the language.
This is done by a parser.
It detects and reports any syntax errors and produces
a parse tree from which intermediate code can be
generated.
4. Parse Tree
It is a picture of derivation in which there is a
node for each nonterminal that appears in the
derivation. The children of a node are symbols by
which that nonterminal is replaced in the
derivation.
5. Parsers
The process of deriving the string from the given
grammar is known as derivation (parsing).
Depending upon how derivation is done we have
two kinds of parsers :-
Top Down Parser
Bottom Up Parser
6. Top Down Parser
Top-down parsing attempts to build the parse tree from the
root to the leaves. A top-down parser starts from the start symbol
and proceeds to the input string. It follows a leftmost derivation.
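As a minimal sketch of what "following a leftmost derivation" means, the snippet below replaces the leftmost nonterminal at every step. It uses the grammar E -> T+E | T, T -> id that appears later in the text. The function name `leftmost_derivation` and the hand-written list of production choices are illustrative assumptions; a real top-down parser picks the productions itself from the input.

```python
NONTERMINALS = {"E", "T"}

def leftmost_derivation(start, choices):
    """Apply each chosen production to the leftmost nonterminal in turn."""
    form = [start]
    steps = [" ".join(form)]
    for head, body in choices:
        # Index of the leftmost nonterminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        if form[i] != head:
            raise ValueError(f"leftmost nonterminal is {form[i]}, not {head}")
        form[i:i + 1] = body
        steps.append(" ".join(form))
    return steps

# Production choices supplied by hand for the sentence "id + id".
STEPS = leftmost_derivation("E", [
    ("E", ["T", "+", "E"]),
    ("T", ["id"]),
    ("E", ["T"]),
    ("T", ["id"]),
])
print(" => ".join(STEPS))
```

Each printed sentential form differs from the previous one only at its leftmost nonterminal, which is the order in which a top-down parser grows the tree.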
7. First & Follow
FIRST(X) for a grammar symbol X is the set of terminals
that begin the strings derivable from X.
Rules to compute FIRST set:
If X is a terminal, then FIRST(X) = { X }.
If X -> Є is a production rule, then add Є to FIRST(X).
If X -> Y1 Y2 Y3 ... Yn is a production, then add everything
in FIRST(Y1) except Є to FIRST(X).
If FIRST(Y1) contains Є, then also add everything in FIRST(Y2)
except Є to FIRST(X), and so on: FIRST(Yi) – { Є } is added
whenever FIRST(Y1), ..., FIRST(Yi-1) all contain Є.
If FIRST(Yi) contains Є for all i = 1 to n, then add Є to
FIRST(X).
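The rules above can be applied repeatedly until no FIRST set changes. The following is a minimal sketch of that fixed-point computation, not a definitive implementation. The example grammar (E -> T E', E' -> + T E' | Є, T -> id), the dictionary layout and the function names are assumptions made for illustration; the string "ε" stands for the Є of the text.

```python
EPS = "ε"  # stands for the empty string (written Є in the text)

# Hypothetical example grammar, chosen for illustration:
#   E  -> T E'
#   E' -> + T E' | ε
#   T  -> id
GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],  # () is the ε-production
    "T": [("id",)],
}

def first_of_string(symbols, first):
    """FIRST of a sequence Y1 Y2 ... Yn, given the FIRST sets found so far."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # terminal: {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result  # this Yi cannot vanish, so stop here
    result.add(EPS)  # every Yi can derive ε (or the sequence is empty)
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no set grows any further
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

FIRST = compute_first(GRAMMAR)
for nt, terminals in FIRST.items():
    print(f"FIRST({nt}) = {sorted(terminals)}")
```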
8. First & Follow continued
FOLLOW(X) is the set of terminals that can appear
immediately to the right of the nonterminal X in some
sentential form.
Rules to compute FOLLOW set:
Place $ in FOLLOW(S), where S is the start symbol and $ is
the end-of-input marker.
If A->pBq is a production, where B is a nonterminal and p, q
are strings of grammar symbols, then everything in FIRST(q)
except Є is in FOLLOW(B).
If A->pB is a production or a production A->pBq where
FIRST(q) contains Є then everything in FOLLOW(A) is in
FOLLOW(B).
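The FOLLOW rules can be sketched the same way, as a loop that runs until nothing changes. The grammar below (E -> T E', E' -> + T E' | Є, T -> id) is a hypothetical example, and its FIRST sets are written out by hand rather than computed, so that the snippet stays focused on FOLLOW. All names are illustrative assumptions.

```python
EPS = "ε"  # the empty string (Є in the text)
END = "$"  # end-of-input marker

# Hypothetical example grammar; () is the ε-production.
GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("id",)],
}
START = "E"
# FIRST sets for this grammar, written out by hand (an assumption).
FIRST = {"E": {"id"}, "E'": {"+", EPS}, "T": {"id"}}

def first_of_string(symbols):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for a, productions in grammar.items():
            for rhs in productions:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue  # FOLLOW is defined for nonterminals only
                    rest = first_of_string(rhs[i + 1:])
                    new = rest - {EPS}      # rule 2: FIRST(q) except ε
                    if EPS in rest:
                        new |= follow[a]    # rule 3: q is empty or can vanish
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

FOLLOW = compute_follow(GRAMMAR, START)
for nt, terminals in FOLLOW.items():
    print(f"FOLLOW({nt}) = {sorted(terminals)}")
```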
9. LL(1) Parser
Predictive parsers can be constructed for LL(1) grammars.
The first ‘L’ stands for scanning the input from left to
right, the second ‘L’ stands for leftmost derivation and
‘1’ for using one input symbol of lookahead at each step to
make parsing action decisions.
A grammar G is LL(1) if, whenever A → α | β are two distinct
productions of G:
for no terminal a do both α and β derive strings beginning
with a.
at most one of α and β can derive the empty string.
if β derives the empty string, then α does not derive any
string beginning with a terminal in FOLLOW(A), and likewise
with α and β exchanged.
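One way to check these conditions in practice is to build the predictive parsing table and see whether any cell receives two productions. The sketch below does this for a hypothetical grammar (E -> T E', E' -> + T E' | Є, T -> id), with FIRST of each production body and the FOLLOW sets written out by hand. The function names `build_ll1_table` and `parse` are illustrative assumptions.

```python
EPS = "ε"  # the empty string (Є in the text)

# Each entry: (head, body, FIRST(body)); FIRST(body) is supplied by hand.
PRODUCTIONS = [
    ("E",  ("T", "E'"),      {"id"}),
    ("E'", ("+", "T", "E'"), {"+"}),
    ("E'", (),               {EPS}),
    ("T",  ("id",),          {"id"}),
]
FOLLOW = {"E": {"$"}, "E'": {"$"}, "T": {"+", "$"}}  # supplied by hand

def build_ll1_table(productions, follow):
    """Return (table, conflicts); the grammar is LL(1) if conflicts is empty."""
    table, conflicts = {}, []
    for head, body, first_body in productions:
        lookaheads = first_body - {EPS}
        if EPS in first_body:
            lookaheads |= follow[head]  # body can vanish: use FOLLOW(head)
        for a in lookaheads:
            if (head, a) in table:
                conflicts.append((head, a))  # two productions in one cell
            table[(head, a)] = body
    return table, conflicts

def parse(tokens, table, start="E"):
    """Table-driven predictive parse; returns True if tokens are accepted."""
    nonterminals = {head for head, _ in table}
    stack = ["$", start]
    tokens = tokens + ["$"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in nonterminals:
            body = table.get((top, look))
            if body is None:
                return False  # empty table cell: syntax error
            stack.extend(reversed(body))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1  # match a terminal (or the final $)
        else:
            return False
    return pos == len(tokens)

TABLE, CONFLICTS = build_ll1_table(PRODUCTIONS, FOLLOW)
print("conflicts:", CONFLICTS)
print(parse(["id", "+", "id"], TABLE), parse(["id", "id"], TABLE))
```

Because one lookahead symbol selects at most one production per nonterminal, the parser never needs to backtrack.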
10. Bottom Up Parser
Bottom-up parsing starts from the leaf nodes of a tree
and works in upward direction till it reaches the root
node. Here, we start from a sentence and then apply
production rules in reverse manner in order to reach the
start symbol.
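Applying production rules in reverse can be sketched as a sequence of reductions, each replacing the body of a production by its head. The example uses the grammar E -> T+E | T, T -> id from later in the text. The positions of the reductions are chosen by hand here, which is an assumption of the sketch; finding them automatically is the job of the LR techniques discussed next.

```python
def reduce_steps(sentence, reductions):
    """Apply (head, body, position) reductions and record each sentential form."""
    form = list(sentence)
    steps = [" ".join(form)]
    for head, body, at in reductions:
        if form[at:at + len(body)] != body:
            raise ValueError(f"{' '.join(body)} not found at position {at}")
        form[at:at + len(body)] = [head]  # replace the body by the head
        steps.append(" ".join(form))
    return steps

# Reductions for "id + id", chosen by hand.
STEPS = reduce_steps(["id", "+", "id"], [
    ("T", ["id"], 0),
    ("T", ["id"], 2),
    ("E", ["T"], 2),
    ("E", ["T", "+", "E"], 0),
])
print(" <= ".join(STEPS))
```

Read from right to left, the recorded forms are a rightmost derivation of the sentence from the start symbol E.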
11. SLR
SLR stands for Simple LR. It is basically a method of adding
lookahead to LR(0) parsers as simply as possible: a reduce action
is entered in the parsing table only under the terminals in the
FOLLOW set of the nonterminal whose production is being reduced.
The technique is based on the following observation:
If we are in a DFA state containing the item [A → α •], where •
marks the parser's current position, then a possible action will
be to reduce by this rule. Doing this reduction would involve:
going from a sentential form that looks like: . . . α • . . .
to one that looks like: . . . A • . . .
12. By looking at examples, we can see that the symbol
immediately to the right of the marker in a sentential form
should correspond to the next input symbol: we can rephrase
this as: the symbol following A should be the next symbol in the
input.
Since we already have a method of characterising the set of
symbols which can follow a non-terminal in a sentential form,
we can formulate the SLR(1) reduction rule:
• Reduce by the rule A → α only if the current state contains the
item [A → α •] and the next input symbol is in FOLLOW(A)
This provides a quick and easy way to incorporate lookahead
into the parser; however, there are many languages which are
not SLR(1).
13. Construction of SLR parsing table –
Construct C = { I0, I1, ..., In }, the collection of sets of LR(0) items for
G’.
State i is constructed from Ii. The parsing actions for state i are
determined as follows:
If [A -> α.aβ] is in Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] to
“shift j”. Here a must be a terminal.
If [A -> α.] is in Ii, then set ACTION[i, a] to “reduce A -> α” for all a
in FOLLOW(A); here A may not be S’.
If [S’ -> S.] is in Ii, then set ACTION[i, $] to “accept”.
If any conflicting actions are generated by the above rules, we say that
the grammar is not SLR.
The goto transitions for state i are constructed for all nonterminals A
using the rule:
if GOTO(Ii, A) = Ij then GOTO[i, A] = j.
All entries not defined by the action and goto rules above are made “error”.
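The construction can be sketched in code as follows. The sketch uses the augmented grammar from the example in the text (E’ -> E, E -> T+E | T, T -> id) and takes the FOLLOW sets as given, worked out by hand. Items are represented as (production index, dot position) pairs. The function names and the table layout are assumptions made for illustration, not a definitive implementation.

```python
# Production 0 is the augmented rule E' -> E.
PRODS = [("E'", ("E",)), ("E", ("T", "+", "E")), ("E", ("T",)), ("T", ("id",))]
NONTERMINALS = {"E'", "E", "T"}
FOLLOW = {"E": {"$"}, "T": {"+", "$"}}  # worked out by hand for this grammar

def closure(items):
    """LR(0) closure of a set of (production index, dot position) items."""
    result, work = set(items), list(items)
    while work:
        p, dot = work.pop()
        body = PRODS[p][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for q, (head, _) in enumerate(PRODS):
                if head == body[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` in every item where that is possible."""
    moved = {(p, dot + 1) for p, dot in items
             if dot < len(PRODS[p][1]) and PRODS[p][1][dot] == symbol}
    return closure(moved)

def build_slr():
    states = [closure({(0, 0)})]
    action, goto_table, conflicts = {}, {}, []

    def set_action(i, a, act):
        if action.get((i, a), act) != act:
            conflicts.append((i, a))  # two different actions in one cell
        action[(i, a)] = act

    i = 0
    while i < len(states):
        after_dot = sorted({PRODS[p][1][d] for p, d in states[i]
                            if d < len(PRODS[p][1])})
        for x in after_dot:
            target = goto(states[i], x)
            if target not in states:
                states.append(target)
            j = states.index(target)
            if x in NONTERMINALS:
                goto_table[(i, x)] = j
            else:
                set_action(i, x, ("shift", j))
        for p, d in states[i]:
            head, body = PRODS[p]
            if d == len(body):  # dot at the end: a completed item
                if p == 0:
                    set_action(i, "$", ("accept",))
                else:
                    for a in FOLLOW[head]:
                        set_action(i, a, ("reduce", p))
        i += 1
    return states, action, goto_table, conflicts

def parse(tokens, action, goto_table):
    """Standard LR driver loop; returns True if the tokens are accepted."""
    stack, tokens, pos = [0], tokens + ["$"], 0
    while True:
        act = action.get((stack[-1], tokens[pos]))
        if act is None:
            return False  # missing entry means "error"
        if act[0] == "shift":
            stack.append(act[1])
            pos += 1
        elif act[0] == "reduce":
            head, body = PRODS[act[1]]
            del stack[len(stack) - len(body):]  # pop one state per body symbol
            stack.append(goto_table[(stack[-1], head)])
        else:
            return True

STATES, ACTION, GOTO, CONFLICTS = build_slr()
print(len(STATES), "states, conflicts:", CONFLICTS)
print(parse(["id", "+", "id"], ACTION, GOTO))
```

Missing entries in `ACTION` play the role of the error entries, so the tables only store the shift, reduce, accept and goto cells.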
14. Eg:
If a cell of the parsing table has multiple
entries, it is said to be a conflict.
Consider the grammar
E -> T+E | T
T ->id
Augmented grammar –
E’ -> E
E -> T+E | T
T -> id
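For this grammar, the state reached after reading T contains both E -> T. and E -> T.+E. The sketch below compares the actions that state would get in an LR(0) table, where a completed item reduces on every terminal, with the actions it gets under the SLR rule. The FOLLOW sets are worked out by hand, and the helper name `actions` is an illustrative assumption.

```python
TERMINALS = {"+", "id", "$"}
FOLLOW = {"E": {"$"}, "T": {"+", "$"}}  # worked out by hand for this grammar

# Items of the state reached on T, as (head, body, dot position):
#   E -> T.      and      E -> T.+E
STATE = [("E", ("T",), 1), ("E", ("T", "+", "E"), 1)]

def actions(state, reduce_on):
    """Map each terminal to the set of actions the state asks for."""
    table = {}
    for head, body, dot in state:
        if dot < len(body):
            if body[dot] in TERMINALS:
                table.setdefault(body[dot], set()).add("shift")
        else:
            for a in reduce_on(head):
                table.setdefault(a, set()).add(f"reduce {head} -> {' '.join(body)}")
    return table

LR0 = actions(STATE, lambda head: TERMINALS)     # reduce on every terminal
SLR = actions(STATE, lambda head: FOLLOW[head])  # reduce only on FOLLOW(head)

print("LR(0):", {a: sorted(acts) for a, acts in sorted(LR0.items())})
print("SLR:  ", {a: sorted(acts) for a, acts in sorted(SLR.items())})
```

Under the LR(0) rule the cell for + holds both a shift and a reduce, which is a shift-reduce conflict. Under the SLR rule the reduce by E -> T is entered only under $, so every cell holds a single action.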
16. LR(1)
In LR(1), the ‘L’ stands for scanning
the input from left to right, the ‘R’ stands for
constructing a rightmost derivation in reverse and
‘1’ for using one input symbol of lookahead at each
step to make parsing action decisions.
An LR(1) parser keeps track of which terminals
are actually permitted followers of a given symbol
in each given parsing state. It thus permits
reduce operations under fewer lookahead symbols
than SLR(1), and thus some shift-reduce and
reduce-reduce conflicts are avoided.
17. Change to the Augmented Rule
In LR(0) and SLR(1), we add a rule, called the
augmented rule, for recognition of the Start
symbol of the grammar:
S’ :- S $
In LR(1) and LALR(1), the format of this rule does
not include the $, because the end marker is carried
as the lookahead of the initial item instead:
S’ :- S
18. Grammar Limitations
An LR(1) grammar is one where the
construction of an LR(1) parse table does
not place two actions (a shift-reduce or
reduce-reduce conflict) in any one cell. Many
conflicts in SLR(1) parse tables are avoided
if the LR(1) parse approach is used,
because the latter approach is more
restrictive on where it allows reduce
operations. An SLR(1) parse table may allow
a reduce on a next input token that cannot
actually follow the handle in that state.
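A standard textbook grammar illustrates the difference: S -> L = R | R, L -> * R | id, R -> L. This grammar is not from the text and is used here as an assumed example. The sketch below computes the LR(1) state reached on L and compares its reduce lookaheads with FOLLOW(R), which SLR(1) would use. FIRST and FOLLOW(R) are written out by hand, and the lookahead helper relies on the grammar having no Є-productions.

```python
# Production 0 is the augmented rule S' -> S.
PRODS = [("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
         ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",))]
# FIRST sets of the nonterminals that can appear after a dot (by hand).
FIRST = {"S": {"*", "id"}, "L": {"*", "id"}, "R": {"*", "id"}}
FOLLOW_R = {"=", "$"}  # FOLLOW(R), worked out by hand; used by SLR(1)

def first_of(symbols, lookahead):
    """FIRST(βa); this shortcut is valid only without ε-productions."""
    if not symbols:
        return {lookahead}
    return FIRST.get(symbols[0], {symbols[0]})

def closure(items):
    """LR(1) items are (production index, dot position, lookahead) triples."""
    result, work = set(items), list(items)
    while work:
        p, dot, la = work.pop()
        body = PRODS[p][1]
        if dot < len(body) and body[dot] in FIRST:  # nonterminal after the dot
            for q, (head, _) in enumerate(PRODS):
                if head != body[dot]:
                    continue
                for b in first_of(body[dot + 1:], la):
                    if (q, 0, b) not in result:
                        result.add((q, 0, b))
                        work.append((q, 0, b))
    return frozenset(result)

def goto(items, symbol):
    moved = {(p, dot + 1, la) for p, dot, la in items
             if dot < len(PRODS[p][1]) and PRODS[p][1][dot] == symbol}
    return closure(moved)

I0 = closure({(0, 0, "$")})  # initial item [S' -> .S, $]
I2 = goto(I0, "L")           # contains [S -> L.=R, $] and [R -> L., $]

# Terminals on which this state shifts, and on which it may reduce R -> L.
SHIFT_ON = {PRODS[p][1][dot] for p, dot, la in I2 if dot < len(PRODS[p][1])}
LR1_REDUCE_ON = {la for p, dot, la in I2
                 if p == 5 and dot == len(PRODS[p][1])}

print("SLR(1) conflict on:", sorted(SHIFT_ON & FOLLOW_R))
print("LR(1) conflict on: ", sorted(SHIFT_ON & LR1_REDUCE_ON))
```

SLR(1) would enter the reduce by R -> L under both = and $, which collides with the shift on =. The LR(1) item carries only $ as its lookahead, so the reduce is not entered under = and the conflict does not arise.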