4. • LEX accepts an input specification that
consists of two components:
• A specification of the strings representing lexical
units
• A specification of the semantic actions aimed at
building the TR (Translation Rule)
• The TR consists of a set of tables of lexical units and
a sequence of tokens for the lexical units
occurring in the source statement.
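The two components of such a specification can be illustrated with a small Python sketch. This is not LEX itself: the token classes in `SPEC`, the function name `scan`, and the table layout are assumptions made for illustration. Each pattern describes a lexical unit, and the "semantic action" enters the unit in its table and appends a token.

```python
import re

# Hypothetical specification: each entry pairs a token class with the
# pattern (string specification) of the lexical units in that class.
SPEC = [
    ("id",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("int", r"[0-9]+"),
    ("op",  r"[+*=]"),
    ("ws",  r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in SPEC))

def scan(source):
    """Return (tables, tokens): one table of lexical units per class and
    a sequence of (class, index-in-table) tokens for the source statement."""
    tables = {name: [] for name, _ in SPEC if name != "ws"}
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue                      # action for white space: discard
        table = tables[kind]
        if text not in table:
            table.append(text)            # action: enter the unit in its table
        tokens.append((kind, table.index(text)))
    return tables, tokens

tables, tokens = scan("a = b + a * 42")
```

Here `tables` holds the lexical units by class and `tokens` is the token sequence for the statement, with repeated identifiers sharing one table entry.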
6. • YACC is available on Unix systems.
• YACC can be used to produce compilers for
Pascal, FORTRAN, C and C++.
• A lexical scanner must be supplied for use with YACC.
• This scanner is called by the parser whenever a new input
token is needed.
• The YACC parser generator accepts an input grammar for
the language being compiled and a set of actions
corresponding to the rules of the grammar.
• The parser generated by YACC uses the bottom-up parse
method.
• The parser produced by YACC has very good error detection
properties.
9. • The scanner recognizes words
• The parser recognizes syntactic units
• Parser functions:
– Check validity of source string based on specified
syntax rules
– Determine the syntactic structure of source string
10. • For an invalid string, the parser issues a
diagnostic message reporting the cause and
nature of the errors in the string.
• For a valid string, it builds a parse tree to reflect
the sequence of derivations or reductions
performed during parsing.
• Each step in parsing can identify an
elementary subtree by deriving a string from
an NT or reducing a string to an NT.
11. • Check and verify syntax based on specified
syntax rules
– Are regular expressions sufficient for describing
syntax?
• Example 1: Infix expressions
• Example 2: Nested parentheses
– We use Context-Free Grammars (CFGs) to specify
context-free syntax.
• A CFG describes how a sentence of a language may be
generated.
12. CFG
• A CFG is a quadruple (N, T, R, S) where
– N is the set of non-terminal symbols
– T is the set of terminal symbols
– S ∈ N is the starting symbol
– R is a set of rules
• Example: The grammar of nested parentheses
G = (N, T, R, S) where
– N = {S}
– T = { (, ) }
– R = { S → (S), S → SS, S → ε }
13. Derivations
• The language described by a CFG is the set of strings that can be derived
from the start symbol using the rules of the grammar.
• At each step, we choose a non-terminal to replace.
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (( )S) ⇒ (( )(S)) ⇒ (( )((S))) ⇒ (( )(( )))
Each string in this sequence is a sentential form; the whole sequence is a derivation.
This example demonstrates a leftmost derivation: one where we always
expand the leftmost non-terminal in the sentential form.
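A leftmost derivation can be replayed mechanically. The following is a minimal sketch, assuming the nested-parentheses grammar with rules S → (S), S → SS and S → ε (written as the empty string); the names `RULES` and `leftmost_derivation` are illustrative only.

```python
# Grammar of nested parentheses; "" stands for the empty string (epsilon).
RULES = {"S": ["(S)", "SS", ""]}

def leftmost_derivation(start, choices):
    """Apply each chosen right-hand side to the leftmost non-terminal and
    return every sentential form produced along the way."""
    forms = [start]
    current = start
    for rhs in choices:
        # Locate the leftmost non-terminal in the current sentential form.
        idx = next(i for i, sym in enumerate(current) if sym in RULES)
        if rhs not in RULES[current[idx]]:
            raise ValueError(f"{current[idx]} -> {rhs!r} is not a rule")
        current = current[:idx] + rhs + current[idx + 1:]
        forms.append(current)
    return forms

steps = leftmost_derivation("S", ["(S)", "SS", "(S)", "", "(S)", "(S)", ""])
print(" => ".join(steps))
```

The seven rule choices reproduce the derivation shown on the slide, ending in the sentence `(()(()))`.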
14. Derivations and parse trees
• We can describe a derivation using a graphical
representation called a parse tree:
– the root is labeled with the start symbol, S
– each internal node is labeled with a non-terminal
– the children of an internal node A are the symbols on the
right-hand side of a production for A
– each leaf is labeled with a terminal
• A parse tree has a unique leftmost and a
unique rightmost derivation (however, we
cannot tell which one was used by looking at
the tree)
15. Derivations and parse trees
• So, how can we use the grammar described
earlier to verify the syntax of "(( )((( ))))"?
– We must try to find a derivation for that string.
– We can work top-down (starting at the root/start
symbol) or bottom-up (starting at the leaves).
• Careful!
– There may be more than one grammar that
describes the same language.
– Not all grammars are suitable.
17. Top-down Parsing
• Starts with the distinguished (start) symbol and builds down
towards the terminals.
• It derives a string identical to the given input string
by applying the rules of the grammar to the distinguished
symbol.
• The output is a syntax tree for the input string.
• At every stage of the derivation, an NT is chosen and a
derivation is effected according to a grammar rule.
18. e.g. consider the grammar
E → T + E / T
T → V * T / V
V → id
• Source string: id + id * id
Prediction      Predicted Sentential Form
E → T + E       T + E
T → V           V + E
V → id          id + E
E → T           id + T
T → V * T       id + V * T
V → id          id + id * T
T → V           id + id * V
V → id          id + id * id
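The prediction process above can be sketched as a small backtracking recognizer. This is a minimal illustration, assuming the grammar E → T + E / T, T → V * T / V, V → id; the names `GRAMMAR` and `derives` are hypothetical. It always expands the leftmost symbol and tries the alternatives in order, falling back to the next alternative when one fails.

```python
# Alternatives are tried in the order listed; each is a list of symbols.
GRAMMAR = {
    "E": [["T", "+", "E"], ["T"]],
    "T": [["V", "*", "T"], ["V"]],
    "V": [["id"]],
}

def derives(symbols, tokens):
    """True if the sentential form `symbols` can derive exactly `tokens`."""
    if not symbols:
        return not tokens                 # success only if input is used up
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Predict each alternative in turn; a failed one is backtracked over.
        return any(derives(rhs + rest, tokens) for rhs in GRAMMAR[head])
    # Terminal: it must match the next input symbol.
    return bool(tokens) and tokens[0] == head and derives(rest, tokens[1:])

print(derives(["E"], ["id", "+", "id", "*", "id"]))
```

The recursion terminates here only because this grammar has no left recursion; a rule such as E → E + E would make the first prediction repeat forever.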
19. Limitations of Top-down parsing
1. Backtracking is necessary. Therefore
semantic actions cannot be performed during
syntax analysis.
2. Backtracking slows down parsing even if
no semantic actions are performed during
parsing.
3. Precise error indication is not possible in top-down
analysis. Whenever a mismatch is
encountered, the parser performs the standard
action of backtracking. Only when no further predictions are
possible is the input string declared erroneous.
20. 4. Certain grammar specifications are not
amenable (suitable) to top-down analysis.
With a left-recursive rule, the left-to-right nature of the parser would push
the parser into an infinite loop of prediction
making. To make top-down parsing feasible, it
is necessary to rewrite the grammar so as to
eliminate left recursion.
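One standard way to remove immediate left recursion rewrites A → A α | β as A → β A' and A' → α A' | ε. The sketch below applies that textbook transformation; the function name and the list-of-symbols representation are assumptions for illustration, and it assumes the grammar has no alternative of the form A → A.

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """Rewrite  A -> A a1 | ... | b1 | ...  as
         A  -> b1 A' | ...
         A' -> a1 A' | ... | epsilon
    Alternatives are lists of symbols; [] stands for epsilon."""
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}         # nothing to rewrite
    new_nt = nt + "'"
    return {
        nt: [alt + [new_nt] for alt in others],
        new_nt: [tail + [new_nt] for tail in recursive] + [[]],
    }

rules = eliminate_immediate_left_recursion(
    "E", [["E", "+", "E"], ["E", "*", "E"], ["id"]]
)
```

For E → E + E | E * E | id this yields E → id E' and E' → + E E' | * E E' | ε, which a top-down parser can predict without looping.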
21. e.g. consider the grammar
E → E + E / E * E / E / id
• Source string: id + id * id
• Backtracking
First attempt (the result does not match the source string, so the parser backtracks):
Applied Rule    Predicted Sentential Form
E → E * E       E * E
E → id          id * E
E → E + E       id * E + E
E → id          id * id + E
E → id          id * id + id
Second attempt:
Applied Rule    Predicted Sentential Form
E → E + E       E + E
E → id          id + E
E → E * E       id + E * E
E → id          id + id * E
E → id          id + id * id
22. e.g. consider the grammar
E → E + E / E * E / E / id
• Source string: id + id * id
• Left recursion
Applied Rule    Predicted Sentential Form
E → E * E       E * E
E → E * E       E * E * E
E → E * E       E * E * E * E
E → E * E       E * E * E * E * E
E → E * E       E * E * E * E * E * E
23. Top-Down parsing without
backtracking
• Whenever a prediction has to be made for the leftmost NT
of the sentential form, a decision is made as to
which RHS alternative for the NT can lead to a sentence
resembling the input string.
• We must select the RHS alternative that can produce the
next input symbol.
• The grammar may have to be modified to fulfil this condition.
• Due to the deterministic nature of the parsing, such parsers are
known as predictive parsers. A popular form of predictive
parser used in practice is the recursive descent parser.
24. • e.g.
E → T + E / T
T → V * T / V
V → id
• The modified grammar is:
E → T E’
E’ → + E / ε
T → V T’
T’ → * T / ε
V → id
25. Prediction      Predicted sentential form
E → T E’        T E’
T → V T’        V T’ E’
V → id          id T’ E’
T’ → ε          id E’
E’ → + E        id + E
E → T E’        id + T E’
T → V T’        id + V T’ E’
V → id          id + id T’ E’
T’ → * T        id + id * T E’
T → V T’        id + id * V T’ E’
V → id          id + id * id T’ E’
T’ → ε          id + id * id E’
E’ → ε          id + id * id
26. Recursive Descent Parser
• If the grammar contains recursive rules, the parsing
procedures are themselves recursive, and such a parser is
known as a recursive descent parser (RDP).
• It is constructed by writing routines to recognize
each non-terminal symbol.
• It is well suited to many types of attributed
grammar.
• Synthesized attributes can be used because it
gives a depth-first construction of the parse tree.
• It uses a simple predictive parsing strategy.
27. • Error detection is restricted to routines that
expect a defined set of symbols in the first position.
• It makes recursive calls to the parse
procedures until the required terminal string is
obtained.
• RDPs are easy to construct if the programming
language permits recursion.
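A recursive descent parser for the modified grammar (E → T E’, E’ → + E / ε, T → V T’, T’ → * T / ε, V → id) can be sketched as follows. The class name, the `$` end marker and the tuple shape of the returned tree are assumptions chosen for illustration; there is one routine per non-terminal, and each decides between alternatives by looking at the next input symbol.

```python
class RecursiveDescentParser:
    """One routine per non-terminal of the modified grammar."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]    # "$" is an assumed end marker
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, symbol):
        if self.peek() != symbol:
            raise SyntaxError(
                f"expected {symbol!r}, found {self.peek()!r} at position {self.pos}"
            )
        self.pos += 1

    def parse_E(self):                        # E -> T E'
        return self.parse_E_prime(self.parse_T())

    def parse_E_prime(self, left):            # E' -> + E | epsilon
        if self.peek() == "+":
            self.expect("+")
            return ("+", left, self.parse_E())
        return left

    def parse_T(self):                        # T -> V T'
        return self.parse_T_prime(self.parse_V())

    def parse_T_prime(self, left):            # T' -> * T | epsilon
        if self.peek() == "*":
            self.expect("*")
            return ("*", left, self.parse_T())
        return left

    def parse_V(self):                        # V -> id
        self.expect("id")
        return "id"

    def parse(self):
        tree = self.parse_E()
        self.expect("$")                      # all input must be consumed
        return tree

tree = RecursiveDescentParser(["id", "+", "id", "*", "id"]).parse()
```

Because every choice is made from the next symbol alone, no backtracking is needed, and a mismatch is reported at the routine where it occurs.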
28. Predictive Parser
(Table Driven Parser)
• These parsers are used when recursion is not
permitted by the programming language.
• They are table-driven parsers that use the
prediction technique to eliminate backtracking.
• A prediction is made for a given NT and the first
terminal symbol of the remaining input.
29. • A parse table indicates which RHS alternative is
used to make the prediction.
• The parser uses its own stack to store the NTs for which
a prediction has not yet been made.
30. • e.g.
E → T + E / T
T → V * T / V
V → id
• The modified grammar is:
E → T E’
E’ → + T E’ / ε
T → V T’
T’ → * V T’ / ε
V → id
31. Parse Table
NT    Source Symbol
      id            +               *                -|
E     E → T E’
E’                  E’ → + T E’                      E’ → ε
T     T → V T’
T’                  T’ → ε          T’ → * V T’      T’ → ε
V     V → id
32. Prediction      Symbol    Predicted sentential form
E → T E’        id        T E’
T → V T’        id        V T’ E’
V → id          id        id T’ E’
T’ → ε          +         id E’
E’ → + T E’     +         id + T E’
T → V T’        id        id + V T’ E’
V → id          id        id + id T’ E’
T’ → * V T’     *         id + id * V T’ E’
V → id          id        id + id * id T’ E’
T’ → ε          -|        id + id * id E’
E’ → ε          -|        id + id * id
(Symbol is the next source symbol, used together with the leftmost NT to select the parse table entry.)
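The table-driven scheme can be sketched in Python as follows. This is an illustration under stated assumptions: the grammar is the modified one with E’ → + T E’ / ε and T’ → * V T’ / ε, `-|` is the end marker, and T’ → ε is entered under both `+` and `-|` because that prediction is needed whenever the next symbol is `+`. The names `PARSE_TABLE` and `predictive_parse` are hypothetical.

```python
EPS = ()  # empty right-hand side (epsilon)

# PARSE_TABLE[(NT, next source symbol)] = RHS alternative to predict.
PARSE_TABLE = {
    ("E", "id"): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"),
    ("E'", "-|"): EPS,
    ("T", "id"): ("V", "T'"),
    ("T'", "+"): EPS,
    ("T'", "*"): ("*", "V", "T'"),
    ("T'", "-|"): EPS,
    ("V", "id"): ("id",),
}
NONTERMINALS = {"E", "E'", "T", "T'", "V"}

def predictive_parse(tokens):
    """Return the list of predictions (NT, RHS) made while parsing."""
    tokens = list(tokens) + ["-|"]
    stack = ["-|", "E"]                # top of the stack is the list's end
    pos = 0
    predictions = []
    while stack:
        top = stack.pop()
        symbol = tokens[pos]
        if top in NONTERMINALS:
            rhs = PARSE_TABLE.get((top, symbol))
            if rhs is None:
                raise SyntaxError(f"no prediction for {top} on {symbol!r}")
            predictions.append((top, rhs))
            stack.extend(reversed(rhs))    # leftmost RHS symbol ends on top
        elif top == symbol:
            pos += 1                       # terminal matched: consume it
        else:
            raise SyntaxError(f"expected {top!r}, found {symbol!r}")
    return predictions

for nt, rhs in predictive_parse(["id", "+", "id", "*", "id"]):
    print(nt, "->", " ".join(rhs) or "epsilon")
```

The explicit stack replaces the recursion of a recursive descent parser: it holds the symbols for which a prediction or a match is still pending.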
33. Bottom–up Parsing [Shift Reduce
Parser]
• A bottom-up parser attempts to develop the
syntax tree for an input string through a
sequence of reductions.
• If the input string can be reduced to the
distinguished symbol, the string is valid. If not,
an error is detected and indicated
during the process of reduction itself.
• Attempts at reduction start with the first
symbol in the string and proceed to the right.
34. Reduction should proceed as
follows
• For the current sentential form, the n symbols to the
left of the current position are matched against all
RHS alternatives of the grammar.
• If a match is found, these n symbols are
replaced with the NT on the LHS of the rule.
• If no match is found, then n-1
symbols are matched, followed by n-2 symbols,
and so on,
35. • until it is determined that no reduction is
possible at the current stage of parsing. At this
point, one new symbol of the input string is
admitted for parsing. This is known as a shift
action. Due to this nature of parsing, these
parsers are known as left-to-right parsers or shift
reduce parsers.
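The procedure described above can be sketched as a naive shift-reduce loop. This is a minimal illustration, assuming the rules E → E + E, E → E * E and E → id; the names `RULES` and `shift_reduce` are hypothetical. It tries the longest group of symbols to the left of the current position first, then shorter ones, and shifts only when no reduction applies.

```python
RULES = [
    ("E", ("E", "+", "E")),
    ("E", ("E", "*", "E")),
    ("E", ("id",)),
]

def shift_reduce(tokens):
    """Return the list of actions taken, or raise SyntaxError."""
    stack, actions = [], []
    remaining = list(tokens)
    while True:
        reduced = False
        # Try n symbols, then n-1, n-2, ... against every RHS alternative.
        for n in range(len(stack), 0, -1):
            suffix = tuple(stack[-n:])
            for lhs, rhs in RULES:
                if rhs == suffix:
                    del stack[-n:]
                    stack.append(lhs)      # replace the match with the LHS
                    actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
                    reduced = True
                    break
            if reduced:
                break
        if reduced:
            continue
        if not remaining:
            break
        stack.append(remaining.pop(0))     # shift one new input symbol
        actions.append(f"shift {stack[-1]}")
    if stack != ["E"]:
        raise SyntaxError(f"cannot reduce {stack} to E")
    return actions

print("\n".join(shift_reduce(["id", "+", "id", "*", "id"])))
```

Reducing as early as possible accepts the string but ignores operator precedence: E + E is reduced before `*` is seen. Choosing the right moment to reduce is what handles and precedence relations are for.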
36. Handles
• Handle of a string:
• Substring that matches the RHS of some
production AND whose reduction to the non-
terminal on the LHS is a step along the reverse
of some rightmost derivation
• A certain sentential form may have many
different handles.
• Right sentential forms of a non-ambiguous
grammar have one unique handle
37. • Rules of Production:
• E → E + E
• E → E * E
• E → E
• E → id
50. Operator-Precedence Parser
• Operator grammar
– small, but an important class of grammars
– we may have an efficient operator precedence parser
(a shift-reduce parser) for an operator grammar.
• In an operator grammar, no production rule can have:
– ε at the right side
– two adjacent non-terminals at the right side.
• Ex:
E → AB                  E → EOE                 E → E+E | E*E | E/E | id
A → a                   E → id
B → b                   O → + | * | /
not operator grammar    not operator grammar    operator grammar
51. Precedence Relations
• In operator-precedence parsing, we define three
disjoint precedence relations between certain pairs of
terminals.
a <. b    b has higher precedence than a
a =· b    b has the same precedence as a
a .> b    b has lower precedence than a
• The determination of the correct precedence relations
between terminals is based on the traditional
notions of associativity and precedence of operators.
(Unary minus causes a problem.)
52. Using Operator-Precedence Relations
• The intention of the precedence relations is to
find the handle of a right-sentential form,
with <. marking the left end,
=· appearing in the interior of the handle, and
.> marking the right end.
• In our input string $a1a2...an$, we insert the
precedence relation between the pairs of
terminals (the precedence relation holds
between the terminals in that pair).
53. Using Operator-Precedence Relations
E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id
The partial operator-precedence table for this grammar:
      id    +     *     $
id          .>    .>    .>
+     <.    .>    <.    .>
*     <.    .>    .>    .>
$     <.    <.    <.
• Then the input string id+id*id with the precedence
relations inserted will be:
$ <. id .> + <. id .> * <. id .> $
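Inserting the relations is a table lookup for each adjacent pair of terminals. A small sketch, assuming the partial table above encoded as a nested dictionary (the names `PREC` and `insert_relations` are illustrative):

```python
# PREC[a][b] is the relation between adjacent terminals a and b,
# copied from the partial operator-precedence table.
PREC = {
    "id": {"+": ".>", "*": ".>", "$": ".>"},
    "+":  {"id": "<.", "+": ".>", "*": "<.", "$": ".>"},
    "*":  {"id": "<.", "+": ".>", "*": ".>", "$": ".>"},
    "$":  {"id": "<.", "+": "<.", "*": "<."},
}

def insert_relations(terminals):
    """Return the string $ a1 ... an $ with a relation between each pair."""
    symbols = ["$"] + list(terminals) + ["$"]
    out = [symbols[0]]
    for a, b in zip(symbols, symbols[1:]):
        out += [PREC[a][b], b]     # a KeyError means no relation is defined
    return " ".join(out)

print(insert_relations(["id", "+", "id", "*", "id"]))
# prints: $ <. id .> + <. id .> * <. id .> $
```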
54. To Find The Handles
1. Scan the string from the left end until the first .> is
encountered.
2. Then scan backwards (to the left) over any =· until a <.
is encountered.
3. The handle contains everything to the left of the first .>
and to the right of the <. found in step 2.
Relations                               Reduction    Sentential form
$ <. id .> + <. id .> * <. id .> $      E → id       $ id + id * id $
$ <. + <. id .> * <. id .> $            E → id       $ E + id * id $
$ <. + <. * <. id .> $                  E → id       $ E + E * id $
$ <. + <. * .> $                        E → E*E      $ E + E * E $
$ <. + .> $                             E → E+E      $ E + E $
$ $                                                  $ E $
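The handle-finding steps can be combined into a stack-based operator-precedence parser. The sketch below is an illustration under stated assumptions: it uses only the partial table for id, +, * and $, keeps only terminals on the stack (non-terminals are implicit), and records each handle as the list of terminals popped. The names `PREC` and `operator_precedence_parse` are hypothetical.

```python
# Partial operator-precedence table: PREC[a][b] relates terminals a and b.
PREC = {
    "id": {"+": ".>", "*": ".>", "$": ".>"},
    "+":  {"id": "<.", "+": ".>", "*": "<.", "$": ".>"},
    "*":  {"id": "<.", "+": ".>", "*": ".>", "$": ".>"},
    "$":  {"id": "<.", "+": "<.", "*": "<."},
}

def operator_precedence_parse(terminals):
    """Return the handles (terminals only) in the order they are reduced."""
    stack = ["$"]                      # terminals only; NTs are implicit
    tokens = list(terminals) + ["$"]
    pos = 0
    handles = []
    while True:
        a, b = stack[-1], tokens[pos]
        if a == "$" and b == "$":
            return handles             # input reduced to the start symbol
        relation = PREC.get(a, {}).get(b)
        if relation == "<.":           # =· does not occur in this table
            stack.append(b)            # shift
            pos += 1
        elif relation == ".>":
            handle = []
            while True:                # pop back to the matching <.
                popped = stack.pop()
                handle.insert(0, popped)
                if PREC[stack[-1]].get(popped) == "<.":
                    break
            handles.append(handle)
        else:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")

print(operator_precedence_parse(["id", "+", "id", "*", "id"]))
```

For id + id * id the handles come out as id, id, id, *, +, matching the reductions E → id (three times), E → E*E and E → E+E in the table above. Since only terminals are tracked, this sketch does not check that every operator has its operands.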