2. Introduction
• In POS tagging, we assign a label to each word in a
sentence.
• With a CFG, we can describe the grammatical structure of
the sentence.
• According to these grammar rules, we can determine
whether a given sentence is grammatical or ungrammatical.
• Since context-free grammars are a declarative formalism,
they don’t specify how the parse tree for a given sentence
should be computed.
• This lesson will, therefore, present some of the many
possible algorithms for automatically assigning a context-
free (phrase structure) tree to an input sentence.
3. Introduction (Cont…)
• Parse trees are directly useful in applications such
as grammar checking in word-processing
systems; a sentence which cannot be parsed may
have grammatical errors (or at least be hard to
read).
• In addition, parsing is an important intermediate
stage of representation for semantic analysis,
and thus plays an important role in applications
like machine translation, question answering,
and information extraction.
4. Parsing as Search
• Finding the right path through a finite-state
automaton, or finding the right transduction for an
input, can be viewed as a search problem.
• For FSAs, for example, the parser is searching through
the space of all possible paths through the automaton.
• In syntactic parsing, the parser can be viewed as
searching through the space of all possible parse trees
to find the correct parse tree for the sentence.
• Just as the search space of possible paths was defined
by the structure of the FSA, so the search space of
possible parse trees is defined by the grammar.
6. Parsing as Search (Cont…)
• For example, consider the following
ATIS sentence:
• Book that flight.
• How can we use the given simple
grammar to assign a parse tree to
this sentence? (In this case there
is only one parse tree, but in general
there may be more than one.)
• The goal of a parsing search is to find
all trees whose root is the start symbol
S and which cover exactly the words in the
input.
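To make the later walkthroughs concrete, here is a minimal Python sketch of a toy grammar and lexicon sufficient for “Book that flight”. The specific rules and the Aux entry are illustrative assumptions, not the full grammar of the ATIS domain.

```python
# A toy CFG fragment (illustrative; not the full ATIS grammar).
# Each nonterminal maps to a list of possible right-hand sides.
GRAMMAR = {
    "S":       [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":      [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP":      [["Verb"], ["Verb", "NP"]],
}

# A toy lexicon mapping part-of-speech categories to words.
LEXICON = {
    "Det":  {"that"},
    "Noun": {"book", "flight"},
    "Verb": {"book"},
    "Aux":  {"does"},
}
```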
7. Parsing as Search (Cont…)
• Regardless of the search algorithm we choose, there are
clearly two kinds of constraints that should help guide the
search.
– One kind of constraint comes from the data, i.e. the input
sentence itself. Whatever else is true of the final parse tree, we
know that there must be three leaves, and they must be the
words book, that, and flight.
– The second kind of constraint comes from the grammar. We
know that whatever else is true of the final parse tree, it must
have one root, which must be the start symbol S.
• These two constraints give rise to the two search strategies
underlying most parsers:
– top-down or goal-directed search and
– bottom-up or data-directed search.
8. Top Down Parsing
• A top-down parser searches for a parse tree by trying to build from the
root node S down to the leaves.
• The algorithm starts by assuming the input can be derived by the
designated start symbol S.
• The next step is to find the tops of all trees which can start with S, by
looking for all the grammar rules with S on the left-hand side.
• Next expand the constituents in these new trees, just as we originally
expanded S.
• At each ply (level) of the search space, use the right-hand sides of the
rules to provide sets of expectations for the parser, which are then used to
recursively generate the rest of the trees.
• Trees are grown downward until they eventually reach the part-of-speech
categories at the bottom of the tree.
• Trees whose leaves fail to match all the words in the input can be rejected,
leaving behind those trees that represent successful parses.
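As a sketch of this strategy, here is a minimal top-down recognizer in Python over the toy GRAMMAR and LEXICON above. It explores expansions depth-first with backtracking rather than growing all trees ply by ply in parallel, but it searches the same space of derivations.

```python
def parse_topdown(symbols, words):
    """Top-down search: expand the leftmost symbol in the list of
    expectations `symbols`, matching part-of-speech categories
    against the input words left to right. Returns True if some
    sequence of expansions derives exactly `words`."""
    if not symbols:                      # all expectations satisfied?
        return not words                 # succeed only if input is used up
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:                  # nonterminal: try every rule
        return any(parse_topdown(rhs + rest, words)
                   for rhs in GRAMMAR[head])
    # part-of-speech category: must match the next input word
    return bool(words) and words[0] in LEXICON.get(head, set()) \
        and parse_topdown(rest, words[1:])

# Start from the expectation [S] over the whole input.
print(parse_topdown(["S"], ["book", "that", "flight"]))  # True
```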
9. Top Down Parsing (Cont…)
• The first tree tells us to expect an NP followed
by a VP, the second expects an Aux followed
by an NP and a VP, and the third a VP by itself.
10. Top Down Parsing (Cont…)
• Trees are grown downward until they eventually
reach the part-of-speech categories at the
bottom of the tree.
• At this point, trees whose leaves fail to match all
the words in the input can be rejected, leaving
behind those trees that represent successful
parses.
• Only the fifth parse tree in the third ply
(VP → Verb NP) will eventually match the input
sentence “Book that flight”.
11. Bottom Up Parsing
• Bottom-up parsing is the earliest known parsing algorithm,
first suggested by Yngve (1955).
• The parser starts with the words of the input, and tries to
build trees from the words up, again by applying rules from
the grammar one at a time.
• The parse is successful if the parser succeeds in building a
tree rooted in the start symbol S that covers all of the input.
• In general, the parser extends one ply to the next by
looking for places in the parse-in-progress where the right-
hand side of some rule might fit.
• This contrasts with the earlier top-down parser, which
expanded trees by applying rules when their left-hand side
matched an unexpanded non-terminal.
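A bottom-up counterpart, again over the toy GRAMMAR/LEXICON and intended only as a sketch: it shifts words onto a stack of categories and tries every possible reduction, succeeding when the whole input reduces to S.

```python
def parse_bottomup(stack, words):
    """Bottom-up search: shift words onto `stack` as part-of-speech
    categories, and reduce wherever some rule's right-hand side
    fits. Succeeds if the input reduces to the start symbol S.
    (Exhaustive and naive; meant to illustrate the search, not
    to be efficient.)"""
    if stack == ["S"] and not words:
        return True
    # Reduce: replace any span matching a rule's RHS with its LHS.
    for lhs, rhss in GRAMMAR.items():
        for rhs in rhss:
            n = len(rhs)
            for i in range(len(stack) - n + 1):
                if stack[i:i + n] == rhs:
                    if parse_bottomup(stack[:i] + [lhs] + stack[i + n:], words):
                        return True
    # Shift: tag the next word with each of its possible categories.
    if words:
        for pos, vocab in LEXICON.items():
            if words[0] in vocab:
                if parse_bottomup(stack + [pos], words[1:]):
                    return True
    return False

print(parse_bottomup([], ["book", "that", "flight"]))  # True
```

Note how the ambiguity of book shows up here: the Noun branch is explored and eventually abandoned, exactly as in the pruning described on the next slide.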
12. Bottom Up Parsing (Cont…)
• The parser begins by looking up each word (book, that, and
flight) in the lexicon and building three partial trees with
the part of speech for each word.
• But the word book is ambiguous; it can be a noun or a verb.
• Thus the parser must consider two possible sets of trees.
• Each of the trees in the second ply is then expanded.
• In the fifth ply, the interpretation of “book” as a noun has
been pruned from the search space.
• This is because this parse cannot be continued: there is no
rule in the grammar with the right-hand side Nominal NP.
15. Compare Top-Down and Bottom-Up
Parsing
• The top-down strategy never wastes time exploring trees
that cannot result in an S, since it begins by generating just
those trees.
• This means it also never explores subtrees that cannot find
a place in some S-rooted tree.
• In the bottom-up strategy, by contrast, trees that have no
hope of leading to an S, or fitting in with any of their
neighbors, are generated with wild abandon.
• For example, the left branch of the search space in the previous
example is completely wasted effort; it is based on
interpreting book as a Noun at the beginning of the
sentence, despite the fact that no such tree can lead to an S
given this grammar.
16. Compare Top-Down and Bottom-Up
Parsing (Cont…)
• The top-down approach has its own inefficiencies.
• While it does not waste time with trees that do not lead to an S, it
does spend considerable effort on S trees that are not consistent
with the input.
• Note that the first four of the six trees in the third ply in the
previous example all have left branches that cannot match the word book.
• None of these trees could possibly be used in parsing this sentence.
• This weakness in top-down parsers arises from the fact that they
can generate trees before ever examining the input.
• Bottom-up parsers, on the other hand, never suggest trees that are
not at least locally grounded in the actual input.
• Neither of these approaches adequately exploits the constraints
presented by the grammar and the input words.
17. Compare Top-Down and Bottom-Up
Parsing (Cont…)
• Discuss the problems that afflict standard
bottom-up or top-down parsers due to
ambiguity.
18. Probabilistic Context-free Grammar
(PCFG)
• The simplest augmentation of the context-free grammar is
the Probabilistic Context-Free Grammar (PCFG), also known
as the Stochastic Context-Free Grammar (SCFG),
first proposed by Booth (1969).
• Recall that a context-free grammar G is defined by four
parameters (N,Σ,P,S):
1. a set of nonterminal symbols (or ‘variables’) N
2. a set of terminal symbols Σ (disjoint from N)
3. a set of productions P, each of the form A → β, where A
is a nonterminal and β is a string of symbols from the
infinite set of strings (Σ ∪ N)*.
4. a designated start symbol S
19. Probabilistic Context-free Grammar
(PCFG) (Cont…)
• A probabilistic context-free grammar augments each
rule in P with a conditional probability:
A → β [p]
• A PCFG is thus a 5-tuple G =(N, Σ, P, S, D), where D is a
function assigning probabilities to each rule in P.
• This function expresses the probability p that the given
nonterminal A will be expanded to the sequence β; it is
often referred to as P(A → β) or as P(A → β | A).
• Formally, this is the conditional probability of a given
expansion β given the left-hand-side nonterminal A.
• Thus if we consider all the possible expansions of a
nonterminal, the sum of their probabilities must be 1.
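As a sketch of this constraint, a PCFG can be represented by attaching a probability to each right-hand side; the numbers below are made up for illustration, not taken from the figure on the next slide.

```python
# A toy PCFG fragment: each rule carries a probability, and the
# probabilities of all expansions of a nonterminal sum to 1.
# (Hypothetical numbers, for illustration only.)
PCFG = {
    "S":  [(("NP", "VP"), 0.80), (("Aux", "NP", "VP"), 0.15), (("VP",), 0.05)],
    "VP": [(("Verb",), 0.35), (("Verb", "NP"), 0.65)],
}

def is_properly_normalized(pcfg, tol=1e-9):
    """Check the PCFG constraint: sum over beta of P(A -> beta) = 1
    for every nonterminal A."""
    return all(abs(sum(p for _, p in rules) - 1.0) < tol
               for rules in pcfg.values())

print(is_properly_normalized(PCFG))  # True
```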
20. Probabilistic Context-free Grammar
(PCFG) (Cont…)
• The figure shows a sample PCFG for a miniature grammar with only three nouns and
three verbs.
• Note that the probabilities of all of the expansions of a nonterminal sum to 1.
• Obviously in any real grammar there are a great many more rules for each
nonterminal and hence the probabilities of any particular rule are much smaller.
21. Probabilistic Context-free Grammar
(PCFG) (Cont…)
• How are these probabilities used?
• A PCFG can be used to estimate a number of
useful probabilities concerning a sentence and
its parse-tree(s).
• For example, a PCFG assigns a probability to
each parse-tree T (i.e. each derivation) of a
sentence S.
• This attribute is useful in disambiguation.
22. Probabilistic Context-free Grammar
(PCFG) (Cont…)
• For example, consider the two parses of the sentence
“Can you book TWA flights” (one meaning ‘Can you
book flights on behalf of TWA’, and the other meaning
‘Can you book flights run by TWA’) shown in the figure.
23. Probabilistic Context-free Grammar
(PCFG) (Cont…)
• The probability of a particular parse T is defined as the product of the
probabilities of all the rules r used to expand each node n in the parse
tree:

P(T, S) = ∏_{n∈T} P(r(n))

• The resulting probability P(T, S) is both the joint probability of the parse
and the sentence, and also the probability of the parse P(T).
• How can this be true?
• First, by the definition of joint probability:

P(T, S) = P(T) P(S|T)

• But since a parse tree includes all the words of the sentence, P(S|T) is 1.
• Thus:

P(T, S) = P(T) · P(S|T) = P(T)
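A small sketch of this computation: encode a parse tree as nested tuples (label, child, …) with words as string leaves, and look up each rule's probability in a table. The tree encoding and the rule_probs table are assumptions made for illustration.

```python
from math import prod

def tree_probability(tree, rule_probs):
    """P(T) = product over nodes n in T of P(r(n)): multiply the
    probability of the rule used to expand each nonterminal node.
    `rule_probs` maps (lhs, rhs_tuple) -> probability; lexical
    rules such as Verb -> book must appear in it too, e.g. under
    the key ("Verb", ("book",))."""
    if isinstance(tree, str):        # a leaf word contributes no rule
        return 1.0
    lhs, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return rule_probs[(lhs, rhs)] * prod(
        tree_probability(c, rule_probs) for c in children)
```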
24. Probabilistic Context-free Grammar
(PCFG) (Cont…)
• The probability of each of the trees in the previous
figure can be computed by multiplying together the
probabilities of each of the rules used in the derivation.
• For example, the probability of the left tree in
the figure (call it Tl) and the right tree (call it Tr) can
be computed as follows:
25. Probabilistic Context-free Grammar
(PCFG) (Cont…)
• We can see that the right tree in the figure has a
higher probability.
• Thus this parse would correctly be chosen by a
disambiguation algorithm which selects the parse
with the highest PCFG probability.
26. Probabilistic Context-free Grammar
(PCFG) (Cont…)
• Let’s formalize this intuition that picking the parse with the
highest probability is the correct way to do disambiguation.
• The disambiguation algorithm picks the best tree for a
sentence S out of the set of parse trees for S (which we’ll
call t(S)).
• We want the parse tree T which is most likely given the
sentence S:

T′(S) = argmax_{T∈t(S)} P(T|S)

• By definition the probability P(T|S) can be rewritten as
P(T,S)/P(S), thus leading to:

T′(S) = argmax_{T∈t(S)} P(T,S) / P(S)
27. Probabilistic Context-free Grammar
(PCFG) (Cont…)
• Since we are maximizing over all parse trees
for the same sentence, P(S) will be a constant
for each tree, and so we can eliminate it:

T′(S) = argmax_{T∈t(S)} P(T, S)

• Furthermore, since we showed above that
P(T,S) = P(T), the final equation for choosing the
most likely parse simplifies to choosing the
parse with the highest probability:

T′(S) = argmax_{T∈t(S)} P(T)
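In code, the disambiguation step is then just an argmax over the candidate trees, reusing the tree_probability sketch above (here trees stands for t(S), the set of parses produced by some parser):

```python
def best_parse(trees, rule_probs):
    """T'(S) = argmax over T in t(S) of P(T): since P(T,S) = P(T),
    disambiguation reduces to picking the most probable tree."""
    return max(trees, key=lambda t: tree_probability(t, rule_probs))
```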
28. Probabilistic Context-free Grammar
(PCFG) (Cont…)
• A second attribute of a PCFG is that it assigns a probability to the
string of words constituting a sentence.
• This is important in language modeling for speech recognition, spelling
correction, or augmentative communication.
• The probability of an unambiguous sentence is P(T,S) = P(T), or just
the probability of the single parse tree for that sentence.
• The probability of an ambiguous sentence is the sum of the
probabilities of all the parse trees for the sentence:

P(S) = Σ_{T∈t(S)} P(T, S)
     = Σ_{T∈t(S)} P(T)
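And the corresponding sketch for the string probability, summing over all parses of an ambiguous sentence:

```python
def sentence_probability(trees, rule_probs):
    """P(S) = sum over T in t(S) of P(T): the probability of the
    sentence is the total probability of all of its parse trees."""
    return sum(tree_probability(t, rule_probs) for t in trees)
```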
29. Probabilistic Context-free Grammar
(PCFG) (Cont…)
• A PCFG is said to be consistent if the sum of the
probabilities of all sentences in the language
equals 1.
• Certain kinds of recursive rules cause a grammar
to be inconsistent by causing infinitely looping
derivations for some sentences.
• For example, a rule S → S with probability 1 would
lead to lost probability mass due to derivations
that never terminate.
30. Probabilistic CYK Parsing of PCFGs
• The CYK (Cocke–Younger–Kasami) algorithm is
essentially a bottom-up parser.
• Assume first that the PCFG is in Chomsky
normal form; recall that a grammar is in CNF if it is
ε-free and if in addition each production is
either of the form A → B C or A → a.
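To preview where this is going, here is a minimal sketch of the probabilistic (Viterbi) CYK recognizer for a PCFG in CNF; the dictionary encodings of the lexical and binary rules are assumptions made for the example.

```python
from collections import defaultdict

def prob_cyk(words, lexical, binary):
    """Probabilistic CYK for a PCFG in Chomsky normal form.
    `lexical` maps (A, a) -> P(A -> a); `binary` maps
    (A, B, C) -> P(A -> B C). Returns a table where
    table[i][j][A] is the best probability of A deriving
    words[i:j] (0.0 if A cannot derive that span)."""
    n = len(words)
    table = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                   # length-1 spans
        for (A, a), p in lexical.items():
            if a == w:
                table[i][i + 1][A] = max(table[i][i + 1][A], p)
    for span in range(2, n + 1):                    # longer spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):               # every split point
                for (A, B, C), p in binary.items():
                    if table[i][k][B] and table[k][j][C]:
                        cand = p * table[i][k][B] * table[k][j][C]
                        table[i][j][A] = max(table[i][j][A], cand)
    return table

# The best-parse probability for the whole sentence is
# table[0][len(words)]["S"]; 0.0 means the sentence has no parse.
```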