Chapter 4 : Syntactic Parsing and
Semantic Analysis
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2021)
Outline
 Syntactic Parsing
Introduction
Basic Concepts
Rule-based parsing
CYK algorithm
Earley's algorithm
Grammar formalisms and treebanks
Efficient parsing for context-free grammars (CFGs)
Statistical parsing and probabilistic CFGs (PCFGs)
Lexicalized PCFGs
3/29/2024 2
Outline
 Semantic Analysis
Lexical semantics and word-sense disambiguation
Compositional semantics
Semantic Role Labeling and Semantic Parsing.
3/29/2024 3
Introduction
 Syntactic parsing is grammar-driven natural language parsing,
that is, analyzing a string of words (typically a sentence) to
determine its structural description according to a formal
grammar.
 In most circumstances, this is not a goal in itself but rather an
intermediary step for the purpose of further processing, such as the
assignment of a meaning to the sentence.
 To this end, the desired output of grammar-driven parsing is
typically a hierarchical, syntactic structure suitable for semantic
interpretation.
 The string of words constituting the input will usually have been
processed in separate phases of tokenization and lexical analysis
which are hence not part of parsing proper.
3/29/2024 4
Introduction …
 To get a grasp of the fundamental problems discussed here, it is
instructive to consider the ways in which parsers for natural
languages differ from parsers for computer languages:
 One such difference concerns the power of the grammar
formalisms used - the generative capacity,
 A second difference concerns the extreme structural ambiguity of
natural language. A classic example is the following:
 Put the block in the box on the table
 Assuming that “put” subcategorizes for two objects, there are two
possible analyses:
 Put the block [in the box on the table]
 Put [the block in the box] on the table
3/29/2024 5
Introduction …
 Parsers for natural languages differ from parsers for computer
languages…
 A third difference stems from the fact that natural language data
are inherently noisy, both because of errors (under some
conception of “error”) and because of the ever persisting
incompleteness of lexicon and grammar relative to the unlimited
number of possible utterances which constitute the language.
 In contrast, a computer language has a complete syntax
specification, which means that by definition all correct input
strings are parsable.
3/29/2024 6
Introduction …
 Parsers for natural languages differ from parsers for computer
languages…
 A third difference ...
 In natural language parsing, it is notoriously difficult to distinguish
whether a failure to produce a parsing result is due to an error in the
input or to the lack of coverage of the grammar, also because a
natural language by its nature has no precise delimitation.
 Thus, input not licensed by the grammar may well be perfectly
adequate according to native speakers of the language.
 Moreover, input containing errors may still carry useful bits of
information that might be desirable to try to recover.
 Robustness refers to the ability to always produce some result in
response to such input.
3/29/2024 7
Basic Concepts
 A recognizer is a procedure that determines whether or not an
input sentence is grammatical according to the grammar
(including the lexicon).
 A parser is a recognizer that produces associated structural
analyses according to the grammar (e.g. parse trees or feature
terms).
 A robust parser attempts to produce useful output, such as a
partial analysis, even if the input is not covered by the grammar.
 It is possible to think of a grammar as inducing a search space
consisting of a set of states representing stages of successive
grammar-rule rewritings and a set of transitions between these
states.
3/29/2024 8
Basic Concepts…
 When analyzing a sentence, the parser (recognizer) must rewrite
the grammar rules in some sequence.
 A sequence that connects the state S (the string consisting of
just the start category of the grammar) with a state consisting of
exactly the string of input words is called a derivation.
 Each state in the sequence then consists of a string over V and
is called a sentential form.
 If such a sequence exists, the sentence is said to be grammatical
according to the grammar.
3/29/2024 9
Basic Concepts…
 Parsers can be classified along several dimensions according to
the ways in which they carry out derivations.
 One such dimension concerns rule invocation:
 In a top-down derivation, each sentential form is produced from its
predecessor by replacing one nonterminal symbol A by a string of
terminal or nonterminal symbols X1 · · · Xd, where A → X1 · · · Xd
is a grammar rule.
 Conversely, in a bottom-up derivation, each sentential form is
produced by replacing X1 · · · Xd with A given the same grammar
rule, thus successively applying rules in the reverse direction.
3/29/2024 10
Basic Concepts…
3/29/2024 11
Basic Concepts…
3/29/2024 12
Basic Concepts…
 Another dimension concerns the way in which the parser deals
with ambiguity, in particular, whether the process is
deterministic or nondeterministic.
 In the former case, only a single, irrevocable choice may be made
when the parser is faced with local ambiguity.
 This choice is typically based on some form of look ahead or
systematic preference.
 A third dimension concerns whether parsing proceeds from left
to right (strictly speaking front to back) through the input or in
some other order, for example, inside-out from the right-hand-
side heads.
3/29/2024 13
Rule Based Parsing
 The rule-based approach has successfully been used in
developing many natural language processing systems.
 Systems that use rule-based transformations are based on a
core of solid linguistic knowledge.
 The linguistic knowledge acquired for one natural language
processing system may be reused to build knowledge required
for a similar task in another system.
3/29/2024 14
Rule Based Parsing…
 The advantage of the rule-based approach over the corpus-
based approach is clear for:
 1) Less-resourced languages, for which large corpora, possibly
parallel or bilingual, with representative structures and entities
are neither available nor easily affordable, and
 2) For morphologically rich languages, which even with the
availability of corpora suffer from data sparseness.
3/29/2024 15
CYK Algorithm
 The Cocke–Kasami–Younger (CKY, sometimes written CYK)
algorithm is one of the simplest context-free parsing algorithms.
 A reason for its simplicity is that it only works for grammars in
Chomsky Normal Form (CNF).
 A grammar is in CNF when each rule is either:
 (i) a unary terminal rule of the form A → w, or
 (ii) a binary nonterminal rule of the form A → BC.
 It is always possible to transform a grammar into CNF such that
it accepts the same language. However, the transformation can
change the structure of the grammar quite radically;
 E.g., if the original grammar has n rules, the transformed version
may in the worst case have O(n²) rules.
3/29/2024 16
CYK Algorithm…
 The CKY algorithm builds an upper triangular matrix T, where
each cell Ti,j (0 ≤ i, j ≤ n) is a set of nonterminals.
 The meaning of the statement A ∈ Ti,j is that A spans the input
words wi+1 · · · wj, or written more formally, A ⇒∗ wi+1 · · · wj.
3/29/2024 17
CYK Algorithm…
 CKY is a purely bottom-up algorithm consisting of two parts.
 First build the lexical cells Ti−1,i for the input word wi by applying the
lexical grammar rules,
 Then the nonlexical cells Ti,k (i < k−1) are filled by applying the
binary grammar rules:
Ti−1,i = { A | A → wi }
Ti,k = { A | A → BC, i < j < k, B ∈ Ti,j, C ∈ Tj,k }
 The sentence is recognized by the algorithm if S ∈ T0,n, where S is
the start symbol of the grammar.
 To make the algorithm less abstract, one should note that all cells
Ti,j and Tj,k (i < j < k) must already be known when building the
cell Ti,k. This means that one must be careful when designing the
i and k loops, so that smaller spans are calculated before larger
spans.
3/29/2024 18
CYK Algorithm…
 One solution is to start by looping over the end node k, and then
loop over the start node i in the reverse direction.
 The pseudo-code is as follows:
procedure CKY(T, w1 · · · wn)
  Ti,j := ∅ for all 0 ≤ i, j ≤ n
  for i := 1 to n do
    for all lexical rules A → w do
      if w = wi then add A to Ti−1,i
  for k := 2 to n do
    for i := k − 2 downto 0 do
      for j := i + 1 to k − 1 do
        for all binary rules A → BC do
          if B ∈ Ti,j and C ∈ Tj,k then add A to Ti,k
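 A minimal Python sketch of the same recognizer is given below; the grammar encoding (lists of lexical and binary rules), the function name, and the toy grammar are illustrative assumptions, not part of the original pseudo-code.

# A minimal CKY recognizer sketch; the grammar must already be in CNF.
def cky_recognize(words, lexical_rules, binary_rules, start="S"):
    """lexical_rules: (A, w) pairs for rules A -> w.
    binary_rules: (A, B, C) triples for rules A -> B C."""
    n = len(words)
    # T[i][k] holds the nonterminals spanning words i+1 .. k.
    T = [[set() for _ in range(n + 1)] for _ in range(n + 1)]

    # Lexical cells T[i-1][i].
    for i, w in enumerate(words, start=1):
        for A, rhs in lexical_rules:
            if rhs == w:
                T[i - 1][i].add(A)

    # Larger spans: the loop order guarantees smaller spans come first.
    for k in range(2, n + 1):
        for i in range(k - 2, -1, -1):
            for j in range(i + 1, k):
                for A, B, C in binary_rules:
                    if B in T[i][j] and C in T[j][k]:
                        T[i][k].add(A)
    return start in T[0][n]

# A toy CNF grammar, invented for illustration:
lexical = [("Det", "the"), ("N", "dog"), ("VP", "barks")]
binary = [("S", "NP", "VP"), ("NP", "Det", "N")]
print(cky_recognize("the dog barks".split(), lexical, binary))  # True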
3/29/2024 19
CYK Algorithm…
 But there are also several alternative possibilities for how to
encode the loops in the CKY algorithm;
 E.g., instead of letting the outer k loop range over end positions, it
is possible to equally well let it range over span lengths.
 It is important to keep in mind, however, that smaller spans must
be calculated before larger spans.
 As already mentioned, the CKY algorithm can only handle
grammars in CNF.
 Furthermore, converting a grammar to CNF is a bit
complicated, and can make the resulting grammar much larger.
 Instead, it is possible to modify the CKY algorithm directly to
handle unary grammar rules and longer right-hand sides.
3/29/2024 20
Top-down and Bottom-up
 Top-down parsing:
 Builds only trees that have S at the root node; this may lead to trees
that do not yield the sentence.
 In naive search, top-down parsing is inefficient because structures
are created over and over again.
 Need a way to record that a particular structure has been predicted.
 Need a way to record where the structure was predicted wrt the
input.
 Bottom-up parsing:
 Builds only trees that yield the sentence; this may lead to trees that
do not have S at the root.
3/29/2024 21
Top-down and Bottom-up…
 Pros/cons of top-down strategy:
 Never explores trees that aren't potential solutions, ones with the
wrong kind of root node.
 But explores trees that do not match the input sentence (predicts
input before inspecting input).
 Naive top-down parsers never terminate if G contains recursive
rules like X → X Y (left-recursive rules).
 Backtracking may discard valid constituents that have to be re-
discovered later (duplication of effort).
 Use a top-down strategy when you know what kind of constituent
you want to end up with (e.g. NP extraction, named entity
extraction). Avoid this strategy if you're stuck with a highly
recursive grammar.
3/29/2024 22
Earley's Algorithm Grammar
Formalisms and Treebanks
 Earley Algorithm
 The Earley algorithm is a parsing algorithm for arbitrary context-
free grammars.
 The Earley Parsing Algorithm is an efficient top-down parsing
algorithm that avoids some of the inefficiency associated with
purely naive search with the same top-down strategy (cf.
recursive descent parser).
 Intermediate solutions are created only once and stored in a chart
(dynamic programming).
 Left-recursion problem is solved by examining the input.
 Earley is not picky about what type of grammar it accepts, i.e., it
accepts arbitrary CFGs (cf. CKY).
3/29/2024 23
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm…
 Earley Parsing Algorithm
 Start with the start symbol S.
 Take the leftmost non-terminal and predict all possible expansions.
 If the next symbol in the expansion is a word, match it against the
input sentence (scan); otherwise, repeat.
 If there is nothing more to expand, the subtree is complete; in this
case, continue with the next incomplete subtree.
3/29/2024 24
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm…
 Dotted rules
 A dotted rule is a partially processed rule.
 Example: S → NP • VP
 The dot can be placed in front of the first symbol, behind the last
symbol, or between two symbols on the right-hand side of a rule.
 The general form of a dotted rule thus is A → α • β , where A → αβ
is the original, non-dotted rule.
3/29/2024 25
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm…
 Chart entries
 The chart contains entries of the form [min, max, A → α • β], where
min and max are positions in the input and A → α • β is a dotted
rule.
 Such an entry says: ‘We have built a parse tree whose first rule is A
→ αβ and where the part of this rule that corresponds to α covers
the words between min and max.’
3/29/2024 26
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm…
 Inference rules
3/29/2024 27
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm (Example)
3/29/2024 28
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm (Example)
3/29/2024 29
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm (Example)
3/29/2024 30
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm (Example)
3/29/2024 31
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm (Example)
3/29/2024 32
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm (Example)
3/29/2024 33
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm (Example)
3/29/2024 34
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm (Example)
3/29/2024 35
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm (Example)
3/29/2024 36
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm (Example)
3/29/2024 37
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm…
 Earley Parsing Algorithm
3/29/2024 38
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm…
 Earley Parsing Algorithm
3/29/2024 39
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm…
 Earley: fundamental operations
 Predict sub-structure (based on grammar)
 Scan partial solutions for a match
 Complete a sub-structure (i.e., build constituents)
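 A compact Python sketch of an Earley recognizer built around these three operations is shown below; the grammar encoding and all names are illustrative assumptions, and for brevity the sketch ignores ε-rules and does not keep back pointers (so it recognizes rather than parses).

# A compact Earley recognizer sketch (predict / scan / complete).
def earley_recognize(words, grammar, start="S"):
    """grammar: dict mapping a nonterminal to a list of right-hand sides,
    each a list of symbols; symbols not in the dict are terminals."""
    n = len(words)
    # A chart item is (lhs, rhs, dot, origin): a dotted rule plus its start position.
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, tuple(rhs), 0, 0))

    for pos in range(n + 1):
        agenda = list(chart[pos])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:                      # PREDICT
                    for prod in grammar[nxt]:
                        new = (nxt, tuple(prod), 0, pos)
                        if new not in chart[pos]:
                            chart[pos].add(new)
                            agenda.append(new)
                elif pos < n and words[pos] == nxt:     # SCAN
                    chart[pos + 1].add((lhs, rhs, dot + 1, origin))
            else:                                       # COMPLETE
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        new = (plhs, prhs, pdot + 1, porigin)
                        if new not in chart[pos]:
                            chart[pos].add(new)
                            agenda.append(new)

    # Accept if a completed start rule spans the whole input.
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])

grammar = {"S": [["NP", "VP"]], "NP": [["the", "dog"]], "VP": [["barks"]]}
print(earley_recognize(["the", "dog", "barks"], grammar))  # True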
3/29/2024 40
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley Algorithm…
 Recogniser/parser
 When parsing is complete, is there a chart entry [0, n, S → α •]?
(This makes it a recognizer.)
 If we want a parser, we have to add back pointers, and
retrieve a tree.
 Earley’s algorithm can be used for PCFGs, but it is more
complicated than for CKY.
3/29/2024 41
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley's Algorithm Grammar Formalisms
 A grammar formalism is a mathematically precise notation for
formalizing a theory of grammar.
 CFG has been the most influential grammar formalism for
describing language syntax.
 This is not because CFG has been generally adopted as such for
linguistic description, but rather because most grammar
formalisms are derived from or can somehow be related to CFG.
 For this reason, CFG is often used as a base formalism when
parsing algorithms are described.
3/29/2024 42
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley's Algorithm Grammar Formalisms…
3/29/2024 43
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley's Algorithm Treebanks
 Treebanks are corpora in which each sentence has been annotated
with a syntactic analysis.
 Producing a high-quality treebank is both time-consuming and
expensive.
 One of the most widely known treebanks is the Penn Treebank
(PTB).
3/29/2024 44
Earley's Algorithm Grammar
Formalisms and Treebanks…
 Earley's Algorithm Treebanks (Penn Treebank)
3/29/2024 45
Earley's Algorithm Grammar
Formalisms and Treebanks…
Treebank Grammars:
 Given a treebank, it is possible to construct a grammar by
reading rules off the phrase structure trees.
 A treebank grammar will account for all analyses in the
treebank.
 It will also account for sentences that were not observed in the
treebank.
 The simplest way to obtain rule probabilities is relative
frequency estimation.
 Step 1: Count the number of occurrences of each rule in the
treebank.
 Step 2: Divide this number by the total number of rule
occurrences for the same left-hand side.
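 A minimal Python sketch of this relative frequency estimation is given below; the rule encoding and the toy counts are invented for illustration.

from collections import Counter

# Relative frequency estimation of rule probabilities from rule occurrences.
def estimate_rule_probs(rules):
    """rules: an iterable of (lhs, rhs) pairs, one per rule occurrence."""
    rules = list(rules)
    rule_counts = Counter(rules)                     # Step 1: count each rule
    lhs_counts = Counter(lhs for lhs, _ in rules)    # total occurrences per LHS
    return {(lhs, rhs): count / lhs_counts[lhs]      # Step 2: divide
            for (lhs, rhs), count in rule_counts.items()}

# Hypothetical rule occurrences read off a tiny treebank:
occurrences = [("NP", ("Det", "N"))] * 3 + [("NP", ("NNP",))]
probs = estimate_rule_probs(occurrences)
print(probs[("NP", ("Det", "N"))])   # 0.75
print(probs[("NP", ("NNP",))])       # 0.25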
3/29/2024 46
CKY Versus Earley
 The CKY algorithm has two disadvantages:
 It can only handle restricted grammars (CNF).
 It does not use top–down information.
 The Earley algorithm does not have these disadvantages:
 The Earley algorithm is a parsing algorithm for arbitrary context-
free grammars.
 In contrast to the CKY algorithm, it also uses top–down
information.
 On the downside, it is more complicated.
 In contrast to the CKY algorithm, its probabilistic extension is not
straightforward.
3/29/2024 47
Efficient Parsing for Context-Free
Grammars (CFGs)…
 The standard way of defining a CFG is as a tuple G = (Σ, N, S,
R), where Σ and N are disjoint finite sets of terminal and
nonterminal symbols, respectively, and S ∈ N is the start
symbol.
 The nonterminals are also called categories, and the set V = N
∪ Σ contains the symbols of the grammar.
 R is a finite set of production rules of the form A → α, where A
∈ N is a nonterminal and α ∈ V∗ is a sequence of symbols.
3/29/2024 48
Efficient Parsing for Context-Free
Grammars (CFGs)…
 Although there are several conventions, the following conventions are
used here:
 Capital letters A, B, C, . . . for nonterminals,
 Lower-case letters s, t, w, . . . for terminal symbols, and
 Uppercase X, Y, Z, . . . for general symbols (elements in V).
 Greek letters α, β, γ, . . . will be used for sequences of symbols,
and
 ε for the empty sequence.
3/29/2024 49
Efficient Parsing for Context-Free
Grammars (CFGs)…
 Although there are several conventions, the following conventions are
used here…
 The rewriting relation ⇒ is defined by αBγ ⇒ αβγ if and only if B → β is a rule in R.
 A phrase is a sequence of terminals β ∈ Σ∗ such that A ⇒ · · · ⇒ β for
some A ∈ N.
 Accordingly, the term phrase structure grammar is sometimes used for
grammars with at least context-free power.
 The sequence of rule expansions is called a derivation of β from A.
 A (grammatical) sentence is a phrase that can be derived from the start
symbol S.
 The string language L(G) accepted by G is the set of sentences of G.
 Some algorithms only work for particular normal forms of
CFGs.
3/29/2024 50
Efficient Parsing for Context-Free
Grammars (CFGs)…
 In practice, pure CFG is not widely used for developing natural
language grammars (though grammar based language modeling
in speech recognition is one such case).
 One reason for this is that CFG is not expressive enough—it
cannot describe all peculiarities of natural language,
 E.g., Geez, Swiss–German or Dutch scrambling, or Scandinavian
long-distance dependencies.
 But the main practical reason is that it is difficult to use;
 E.g., agreement, inflection, and other common phenomena are
complicated to describe using CFG.
3/29/2024 51
Efficient Parsing for Context-Free
Grammars (CFGs)…
 Example
 The example grammar in the
Figure is overgenerating—it
recognizes both the noun
phrases “a men” and “an man,”
as well as the sentence “the
men mans a ship.”
3/29/2024 52
 However, to make the grammar syntactically correct, we must
duplicate the categories Noun, Det, and NP into singular and plural
versions.
 All grammar rules involving these categories must be duplicated too.
And if the language is, e.g., German, then Det and Noun have to be
inflected on number (SING/PLUR), gender (FEM/NEUTR/MASC),
and case (NOM/ACC/DAT/GEN).
Statistical Parsing and Probabilistic
CFGs (PCFGs)
3/29/2024 53
 Statistical Parsing
 Statistical parsing uses a probabilistic model of syntax in order to
assign probabilities to each parse tree.
 Provides a principled approach to resolving syntactic ambiguity.
 Allows supervised learning of parsers from tree-banks of parse
trees provided by human linguists.
 Also allows unsupervised learning of parsers from unannotated
text, but the accuracy of such parsers has been limited.
Statistical Parsing and Probabilistic
CFGs (PCFGs)…
3/29/2024 54
 PCFG
 A PCFG is a probabilistic version of a CFG where each
production has a probability.
 Probabilities of all productions rewriting a given non-terminal
must add to 1, defining a distribution for each non-terminal.
 String generation is now probabilistic where production
probabilities are used to non-deterministically select a production
for rewriting a given non-terminal.
Statistical Parsing and Probabilistic
CFGs (PCFGs)…
3/29/2024 55
 Simple PCFG for English
Statistical Parsing and Probabilistic
CFGs (PCFGs)…
3/29/2024 56
 Sentence probability (Derivation Probability):
 Assume productions for each node are chosen independently.
 Probability of derivation is the product of the probabilities of its
productions.
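 A tiny sketch of this computation is given below; the rule probabilities and the derivation are invented for illustration.

from math import prod

# The probability of a derivation is the product of the probabilities
# of the rules it uses (all values below are invented).
rule_prob = {
    ("S", ("NP", "VP")): 0.8,
    ("NP", ("Det", "N")): 0.4,
    ("VP", ("V",)): 0.3,
}
derivation = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V",))]
p = prod(rule_prob[r] for r in derivation)
print(round(p, 4))  # 0.8 * 0.4 * 0.3 = 0.096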
Statistical Parsing and Probabilistic
CFGs (PCFGs)…
3/29/2024 57
 Syntactic Disambiguation:
 Resolve ambiguity by picking most probable parse tree.
Statistical Parsing and Probabilistic
CFGs (PCFGs)…
3/29/2024 58
 Sentence Probability:
 Probability of a sentence is the sum of the probabilities of all of
its derivations.
P(“book the flight through Houston”) =
P(D1) + P(D2) = 0.0000216 + 0.00001296
= 0.00003456
Statistical Parsing and Probabilistic
CFGs (PCFGs)…
3/29/2024 59
 Three Useful PCFG Tasks:
 Observation likelihood: to classify and order sentences.
 Useful for language modeling for speech recognition,
translation, word prediction.
 Parse trees are richer language models than N-grams.
 Most likely derivation: To determine the most likely parse tree for
a sentence.
 Maximum likelihood training: To train a PCFG to fit empirical
training data.
Statistical Parsing and Probabilistic
CFGs (PCFGs)…
3/29/2024 60
 PCFG: Observation Likelihood
 There is an algorithm called the Inside algorithm for efficiently
determining how likely a string is to be produced by a PCFG.
 Can use a PCFG as a language model to choose between
alternative sentences for speech recognition or machine
translation.
Statistical Parsing and Probabilistic
CFGs (PCFGs)…
3/29/2024 61
 PCFG: Most Likely Derivation:
 There is an analog to the Viterbi algorithm to efficiently
determine the most probable derivation (parse tree) for a
sentence.
Statistical Parsing and Probabilistic
CFGs (PCFGs)…
3/29/2024 62
 PCFG: Most Likely Derivation
 There is an analog to the Viterbi algorithm to efficiently
determine the most probable derivation (parse tree) for a
sentence.
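 A minimal probabilistic CKY (Viterbi-style) sketch of this idea for a PCFG in CNF is given below; the grammar encoding and names are illustrative assumptions, and only the probability of the best parse is returned (back pointers would be needed to recover the tree itself).

# Probabilistic CKY: best[i][k][A] is the probability of the most likely
# subtree with root A spanning words i+1 .. k.
def viterbi_cky(words, lexical, binary, start="S"):
    """lexical: dict (A, w) -> prob; binary: dict (A, B, C) -> prob."""
    n = len(words)
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]

    for i, w in enumerate(words, start=1):
        for (A, rhs), p in lexical.items():
            if rhs == w:
                best[i - 1][i][A] = max(best[i - 1][i].get(A, 0.0), p)

    for span in range(2, n + 1):            # smaller spans before larger spans
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):
                for (A, B, C), p in binary.items():
                    if B in best[i][j] and C in best[j][k]:
                        cand = p * best[i][j][B] * best[j][k][C]
                        if cand > best[i][k].get(A, 0.0):
                            best[i][k][A] = cand
    return best[0][n].get(start, 0.0)

# Toy grammar, invented for illustration:
lexical = {("Det", "the"): 1.0, ("N", "dog"): 1.0, ("VP", "barks"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("NP", "Det", "N"): 1.0}
print(viterbi_cky("the dog barks".split(), lexical, binary))  # 1.0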
Statistical Parsing and Probabilistic
CFGs (PCFGs)…
3/29/2024 63
 PCFG: Supervised Training
 If parse trees are provided for training sentences, a grammar and
its parameters can all be estimated directly from counts
accumulated from the tree-bank (with appropriate smoothing).
Statistical Parsing and Probabilistic
CFGs (PCFGs)…
3/29/2024 64
 PCFG: Maximum Likelihood Training
 Given a set of sentences, induce a grammar that maximizes the
probability that this data was generated from this grammar.
 Assume the number of non-terminals in the grammar is specified.
 Only need to have an unannotated set of sequences generated
from the model.
 Does not need correct parse trees for these sentences.
 In this sense, it is unsupervised.
Statistical Parsing and Probabilistic
CFGs (PCFGs)…
3/29/2024 65
 PCFG: Maximum Likelihood Training
Statistical Parsing and Probabilistic
CFGs (PCFGs)…
3/29/2024 66
 Inside-Outside:
 The Inside-Outside algorithm is a version of EM for
unsupervised learning of a PCFG.
 Analogous to Baum-Welch (forward-backward) for HMMs.
 Given the number of non-terminals, construct all possible CNF
productions with these non-terminals and observed terminal
symbols.
 Use EM to iteratively train the probabilities of these productions
to locally maximize the likelihood of the data.
 Experimental results are not impressive, but recent work imposes
additional constraints to improve unsupervised grammar
learning.
Statistical Parsing and Probabilistic
CFGs (PCFGs)…
3/29/2024 67
 Vanilla PCFG Limitations:
 Since probabilities of productions do not rely on specific words
or concepts, only general structural disambiguation is possible
(e.g. prefer to attach PPs to Nominals).
 Consequently, vanilla PCFGs cannot resolve syntactic
ambiguities that require semantics to resolve, e.g., “ate with a
fork” vs. “ate with meatballs.”
 In order to work well, PCFGs must be lexicalized, i.e.
productions must be specialized to specific words by including
their head-word in their LHS non-terminals (e.g. VP->ate).
Lexicalized PCFGs
3/29/2024 68
 Example of Importance of Lexicalization:
 A general preference for attaching PPs to NPs rather than VPs
can be learned by a vanilla PCFG.
 But the desired preference can depend on specific words.
Lexicalized PCFGs…
3/29/2024 69
 Example of Importance of Lexicalization:
 A general preference for attaching PPs to NPs rather than VPs
can be learned by a vanilla PCFG.
 But the desired preference can depend on specific words.
Lexicalized PCFGs…
3/29/2024 70
 Head-Words:
 Syntactic phrases usually have a word in them that is most
“central” to the phrase.
 Linguists have defined the concept of a lexical head of a phrase.
 Simple rules can identify the head of any phrase by percolating
head words up the parse tree.
 Head of a VP is the main verb,
 Head of an NP is the main noun,
 Head of a PP is the preposition,
 Head of a sentence is the head of its VP.
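 A toy Python sketch of percolating head words up a tree with simple head rules of this kind is shown below; the head table and tree encoding are illustrative assumptions (real head-finding rules, e.g., Collins' rules, are considerably more detailed).

# Toy head-percolation sketch: each phrase label names its head child.
HEAD_CHILD = {"S": "VP", "VP": "VBD", "NP": "NN", "PP": "IN"}

def head_word(tree):
    """tree: either a (POS, word) leaf or a (label, [children]) node."""
    label, rest = tree
    if isinstance(rest, str):          # leaf: (POS, word)
        return rest
    wanted = HEAD_CHILD.get(label)
    for child in rest:
        if child[0] == wanted:         # pick the designated head child
            return head_word(child)
    return head_word(rest[-1])         # fallback: rightmost child

tree = ("S", [("NP", [("DT", "the"), ("NN", "dog")]),
              ("VP", [("VBD", "bit"), ("NP", [("DT", "the"), ("NN", "man")])])])
print(head_word(tree))  # 'bit' (the head of S is the head of its VP)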
Lexicalized PCFGs…
3/29/2024 71
 Lexicalized Productions
 Specialized productions can be generated by including the head
word (and its POS) of each non-terminal as part of that non-
terminal’s symbol.
Lexicalized PCFGs…
3/29/2024 72
 Lexicalized Productions
Lexicalized PCFGs…
3/29/2024 73
 Parameterizing Lexicalized Productions
 Accurately estimating parameters on such a large number of very
specialized productions could require enormous amounts of
treebank data.
 Need some way of estimating parameters for lexicalized
productions that makes reasonable independence assumptions so
that accurate probabilities for very specific rules can be learned.
 Collins (1999) introduced one approach to learning effective
parameters for a lexicalized grammar.
Treebanks
3/29/2024 74
 English Penn Treebank: Standard corpus for testing syntactic
parsing consists of 1.2 M words of text from the Wall Street
Journal (WSJ).
 Typical to train on about 40,000 parsed sentences and test on an
additional standard disjoint test set of 2,416 sentences.
 Chinese Penn Treebank: 100K words from the Xinhua news
service.
 Other corpora existing in many languages, see the Wikipedia
article “Treebank”.
First WSJ Sentence
3/29/2024 75
( (S
    (NP-SBJ
      (NP (NNP Pierre) (NNP Vinken) )
      (, ,)
      (ADJP
        (NP (CD 61) (NNS years) )
        (JJ old) )
      (, ,) )
    (VP (MD will)
      (VP (VB join)
        (NP (DT the) (NN board) )
        (PP-CLR (IN as)
          (NP (DT a) (JJ nonexecutive) (NN director) ))
        (NP-TMP (NNP Nov.) (CD 29) )))
    (. .) ))
Parsing Evaluation Metrics
3/29/2024 76
 PARSEVAL metrics measure the fraction of the constituents
that match between the computed and human parse trees.
 If P is the system’s parse tree and T is the human parse tree
(the “gold standard”):
 Recall = (# correct constituents in P) / (# constituents in T)
 Precision = (# correct constituents in P) / (# constituents in P)
 Labeled Precision and labeled recall require getting the non-
terminal label on the constituent node correct to count as
correct.
 F1 is the harmonic mean of precision and recall.
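 A minimal sketch of computing these metrics over sets of constituents, represented here as (label, start, end) triples, is given below; the example spans are invented. Including the label in the triples makes these labeled precision and recall.

# PARSEVAL-style precision, recall, and F1 over constituent sets.
def parseval(pred, gold):
    pred, gold = set(pred), set(gold)
    correct = len(pred & gold)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)}
pred = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)}
print(parseval(pred, gold))  # (0.75, 0.75, 0.75)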
Parsing Evaluation Metrics…
3/29/2024 77
 Computing Evaluation Metrics
Parsing Evaluation Metrics…
3/29/2024 78
 Treebank Results:
 Results of current state-of-the-art systems on the English Penn
WSJ treebank are 91-92% labeled F1.
 Statistical models such as PCFGs allow for probabilistic
resolution of ambiguities.
 PCFGs can be easily learned from treebanks.
 Lexicalization and non-terminal splitting are required to
effectively resolve many ambiguities.
 Current statistical parsers are quite accurate but not yet at the
level of human-expert agreement.
Semantic Analysis
3/29/2024 79
Outline
 Semantic Analysis:
Lexical semantics and word-sense disambiguation
Compositional semantics
Semantic Role Labeling and Semantic Parsing.
3/29/2024 80
Introduction
 Semantic analysis refers to analyzing the meanings of words,
fixed expressions, whole sentences, and utterances in context.
 In practice, this means translating original expressions into some
kind of semantic metalanguage.
 The major theoretical issues in semantic analysis therefore turn
on the nature of the metalanguage or equivalent
representational system.
3/29/2024 81
Introduction…
 For extended texts, specific NLP applications of semantic
analysis may include:
 Information retrieval,
 Information extraction,
 Text summarization,
 Data-mining, and
 Machine translation and translation aids.
3/29/2024 82
Introduction…
 Semantic analysis is also pertinent for much shorter texts, right
down to the single word level,
 For example, in understanding user queries and matching user
requirements to available data.
 Semantic analysis is also of high relevance in efforts to improve
Web ontologies and knowledge representation systems.
3/29/2024 83
Introduction…
 Various theories and approaches to semantic representation
can be roughly ranged along two dimensions:
 (1) formal vs. cognitive and
 (2) compositional vs. lexical
 Formal theories have been strongly advocated since the late
1960s while cognitive approaches have become popular in the
last three decades, driven also by influences from cognitive
science and psychology.
3/29/2024 84
Introduction…
 Compositional semantics is concerned with the bottom-up
construction of meaning, starting with the lexical items, whose
meanings are generally treated as given.
 Lexical semantics, on the other hand, aims at precisely
analyzing the meanings of lexical items, either by analyzing
their internal structure and content (decompositional
approaches) or by representing their relations to other elements
in the lexicon (relational approaches).
3/29/2024 85
Lexical Semantics and Word-Sense
Disambiguation
 Three Perspectives on Meaning
1. Lexical Semantics
 The meanings of individual words.
2. Formal Semantics (or Compositional Semantics or Sentential
Semantics)
 How those meanings combine to make meanings for individual
sentences or utterances.
3. Discourse or Pragmatics
 How those meanings combine with each other and with other facts
about various kinds of context to make meanings for a text or
discourse.
 Dialog or Conversation is often lumped together with Discourse.
3/29/2024 86
Lexical Semantics and Word-Sense
Disambiguation…
 Lexical Semantics
 Can be defined as the study of what individual lexical items mean,
why they mean what they do, how we can represent all of this,
and where the combined interpretation for an utterance comes
from.
 Lexical semantics is concerned with the identification and
representation of the semantics of lexical items.
 If we are to identify the semantics of lexical items, we have to be
prepared for the eventuality of a given word having multiple
interpretations = polysemy (cf. monosemy).
 Polysemy = the condition of a single lexical item having
multiple meanings.
3/29/2024 87
Lexical Semantics and Word-Sense
Disambiguation…
 Lexical Semantics…
 There is a traditional division made between lexical semantics and
supralexical semantics.
 Lexical semantics, which concerns itself with the meanings of
words and fixed word combinations,
 Supralexical (combinational, or compositional) semantics, which
concerns itself with the meanings of the indefinitely large number of
word combinations—phrases and sentences—allowable under the
grammar.
 While there is some obvious appeal and validity to this division, it
is increasingly recognized that word-level semantics and
grammatical semantics interact and interpenetrate in various
ways.
3/29/2024 88
Lexical Semantics and Word-Sense
Disambiguation…
 Lexical Semantics…
 Approaches to Lexical Semantic Categorization
 Attributional semantic categorization
 Semantic clustering
 Relational semantic categorization
3/29/2024 89
Lexical Semantics and Word-Sense
Disambiguation…
 Word-Sense Disambiguation
 Many tasks in natural language processing require
disambiguation of ambiguous words.
 Question Answering
 Information Retrieval
 Machine Translation
 Text Mining
 Phone Help Systems
 Understanding how people disambiguate words is an interesting
problem that can provide insight in psycholinguistics.
3/29/2024 90
Lexical Semantics and Word-Sense
Disambiguation…
 Word-Sense Disambiguation
 Task of determining the meaning of an ambiguous word in the
given context.
 Bank:
 Edge of a river
or
 Financial institution that accepts money
 Refers to the resolution of lexical semantic ambiguity and its goal
is to attribute the correct senses to words.
3/29/2024 91
Lexical Semantics and Word-Sense
Disambiguation…
 Word-Sense Disambiguation…
 Given
 A word in context,
 A fixed inventory of potential word senses.
 Decide which sense of the word this is:
 English-to-Spanish MT
 Inventory is the set of Spanish translations
 Speech Synthesis
 Inventory is homographs with different pronunciations like bass
and bow.
 Automatic indexing of medical articles
 MeSH (Medical Subject Headings) thesaurus entries.
3/29/2024 92
Lexical Semantics and Word-Sense
Disambiguation…
 Word-Sense Disambiguation…
 Two variants of WSD task
 Lexical Sample task:
 Small pre-selected set of target words
 And inventory of senses for each word
 All-words task:
 Every word in an entire text
 A lexicon with senses for each word
 Sort-of like part-of-speech tagging
» Except each lemma has its own tagset
3/29/2024 93
Lexical Semantics and Word-Sense
Disambiguation…
 Word-Sense Disambiguation…
 Approaches
 Supervised
 Semi-supervised
 Unsupervised
» Dictionary-based techniques
» Selectional association
 Lightly supervised
» Bootstrapping
» Preferred Selectional Association
3/29/2024 94
Lexical Semantics and Word-Sense
Disambiguation…
 Word-Sense Disambiguation…
 WSD Approaches
 Disambiguation based on manually created rules,
 Disambiguation using machine readable dictionaries,
 Disambiguation using thesauri,
 Disambiguation based on unsupervised machine learning with
corpora.
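 As an illustration of the dictionary-based family of approaches, below is a minimal simplified-Lesk-style sketch that picks the sense whose dictionary gloss overlaps most with the surrounding context; the tiny sense inventory is invented.

# Simplified-Lesk-style WSD sketch: choose the sense whose gloss shares
# the most words with the context (toy sense inventory, invented).
SENSES = {
    "bank": {
        "bank#1": "sloping land beside a body of water such as a river",
        "bank#2": "financial institution that accepts deposits and lends money",
    }
}

def lesk(word, context_words):
    context = set(w.lower() for w in context_words)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(lesk("bank", "he sat on the bank of the river".split()))  # bank#1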
3/29/2024 95
Lexical Semantics and Word-Sense
Disambiguation…
 Word-Sense Disambiguation…
 Lexical Ambiguity
 Most words in natural languages have multiple possible meanings.
– “pen” (noun)
» The dog is in the pen.
» The ink is in the pen.
– “take” (verb)
» Take one pill every morning.
» Take the first right past the stoplight.
 Syntax helps distinguish meanings for different parts of speech of an
ambiguous word.
– “conduct” (noun or verb)
» John’s conduct in class is unacceptable.
» John will conduct the orchestra on Thursday.
3/29/2024 96
Lexical Semantics and Word-Sense
Disambiguation…
 Word-Sense Disambiguation…
 Evaluation of WSD
 “In vitro”:
 Corpus developed in which one or more ambiguous words are
labeled with explicit sense tags according to some sense inventory.
 Corpus used for training and testing WSD and evaluated using
accuracy (percentage of labeled words correctly disambiguated).
» Use most common sense selection as a baseline.
 “In vivo”:
 Incorporate WSD system into some larger application system, such
as machine translation, information retrieval, or question answering.
 Evaluate relative contribution of different WSD methods by
measuring performance impact on the overall system on final task
(accuracy of MT, IR, or QA results).
3/29/2024 97
Lexical Semantics and Word-Sense
Disambiguation…
 Word-Sense Disambiguation…
 Issues in WSD
 What is the right granularity of a sense inventory?
 Integrating WSD with other NLP tasks
 Syntactic parsing
 Semantic role labeling
 Semantic parsing
 Does WSD actually improve performance on some real end-user
task?
 Information retrieval
 Information extraction
 Machine translation
 Question answering
3/29/2024 98
Lexical Semantics and Word-Sense
Disambiguation…
 WSD: Area of Research
 Assigning the correct sense to words, using an electronic dictionary
as the source of word definitions.
 An open research field in Natural Language Processing (NLP).
 A hard problem and a popular area for research.
 Used in speech synthesis by identifying the correct sense of the
word.
3/29/2024 99
Compositional Semantics
 Compositional semantics is concerned with the bottom-up
construction of meaning, starting with the lexical items, whose
meanings are generally treated as given.
 Compositional semantics: the construction of meaning
(generally expressed as logic) based on syntax.
3/29/2024 100
Compositional Semantics…
 Frame Semantics
 Originally developed by Fillmore 1968.
 Frames can represent situations of arbitrary granularity
(elementary or complex) and accordingly frame-semantic
analysis can be conducted on linguistic units of varying sizes, e.g.
phrases, sentences or whole documents,
 But most work has been devoted to frame semantics as a
formalism for sentence-level semantic analysis and most
commonly it has been applied for the analysis of verbal
predicate-argument structures.
3/29/2024 101
Compositional Semantics…
 Frame Semantics:
3/29/2024 102
Semantic Role Labeling and
Semantic Parsing
 Semantic role labeling
 Semantic role labeling, sometimes also called shallow semantic
parsing, is a task in NLP consisting of the detection of the
semantic arguments associated with the predicate or verb of a
sentence and their classification into their specific roles.
 For example, given a sentence like “Abebe sold the book to
Hagos", the task would be to recognize the verb "to sell" as
representing the predicate,
 “Abebe" as representing the seller (agent), "the book" as
representing the goods (theme), and “Hagos" as representing the
recipient.
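 As an illustration only, the desired output for this example could be represented by a simple structure like the following; this is a sketch of the output format, not an SRL system.

# Illustrative SRL output for the example sentence (invented structure).
sentence = "Abebe sold the book to Hagos"
srl_output = {
    "predicate": "sell",
    "arguments": {
        "agent (seller)": "Abebe",
        "theme (goods)": "the book",
        "recipient": "Hagos",
    },
}
for role, span in srl_output["arguments"].items():
    print(f"{role}: {span}")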
3/29/2024 103
Semantic Role Labeling and
Semantic Parsing
 Semantic role labeling
 Semantic role…
 This is an important step towards making sense of the meaning of a
sentence.
 A semantic analysis of this sort is at a lower-level of abstraction
than a syntax tree, i.e. it has more categories, thus groups fewer
clauses in each category.
 For instance, "the book belongs to me" would need two labels such
as "possessed" and "possessor" whereas "the book was sold to
Hagos" would need two other labels such as "goal" (or "theme")
and "receiver" (or "recipient") even though these two clauses would
be very similar as far as "subject" and "object" functions are
concerned.
3/29/2024 104
Semantic Role Labeling and
Semantic Parsing…
 Semantic Parsing
 Traditional sentence parsing is often performed as a method of
understanding the exact meaning of a sentence or word,
sometimes with the aid of devices such as sentence diagrams.
 It usually emphasizes the importance of grammatical divisions
such as subject and predicate.
 Within computational linguistics parsing is used to refer to the
formal analysis by a computer of a sentence or other string of
words into its constituents, resulting in a parse tree showing their
syntactic relation to each other.
 Semantic parsing is the extension of broad-coverage probabilistic
parsers to represent sentence meaning.
3/29/2024 105
Question & Answer
3/29/2024 106
Thank You !!!
3/29/2024 107
Individual Assignment - Three
 Review the paper given below:
 Paper-5: Parsing Non-Recursive Context-Free Grammars
 Paper-6: GA-1 Approaches to Lexical Semantic Categorization
3/29/2024 108
Group Assignment - One
 Discuss the three approaches to Lexical Semantic
Categorization:
 Group-One: Attributional Semantic Categorization
 Group-Two: Semantic Clustering
 Group-Three: Relational Semantic Categorization
3/29/2024 109

More Related Content

Similar to 4-Chapter Four-Syntactic Parsing and Semantic Analysis.ppt

Modification of some solution techniques of combinatorial
Modification of some solution techniques of combinatorialModification of some solution techniques of combinatorial
Modification of some solution techniques of combinatorialAlexander Decker
 
gemini model research paper by google is great
gemini model research paper by google is greatgemini model research paper by google is great
gemini model research paper by google is greatAdityaChourasiya9
 
A taxonomy of suffix array construction algorithms
A taxonomy of suffix array construction algorithmsA taxonomy of suffix array construction algorithms
A taxonomy of suffix array construction algorithmsunyil96
 
11 terms in corpus linguistics1 (1)
11 terms in corpus linguistics1 (1)11 terms in corpus linguistics1 (1)
11 terms in corpus linguistics1 (1)ThennarasuSakkan
 
2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.pptmilkesa13
 
SPACE-EFFICIENT K-MER ALGORITHM FOR GENERALISED SUFFIX TREE
SPACE-EFFICIENT K-MER ALGORITHM FOR GENERALISED SUFFIX TREESPACE-EFFICIENT K-MER ALGORITHM FOR GENERALISED SUFFIX TREE
SPACE-EFFICIENT K-MER ALGORITHM FOR GENERALISED SUFFIX TREEijitcs
 
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIESA REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIESIJCSES Journal
 
10.1.1.70.8789
10.1.1.70.878910.1.1.70.8789
10.1.1.70.8789Hoài Bùi
 
Lectura 3.5 word normalizationintwitter finitestate_transducers
Lectura 3.5 word normalizationintwitter finitestate_transducersLectura 3.5 word normalizationintwitter finitestate_transducers
Lectura 3.5 word normalizationintwitter finitestate_transducersMatias Menendez
 
Extractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachExtractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachFindwise
 
Cordon review-mamdani-gf ss-ijar-52-6-2011-pp894-913
Cordon review-mamdani-gf ss-ijar-52-6-2011-pp894-913Cordon review-mamdani-gf ss-ijar-52-6-2011-pp894-913
Cordon review-mamdani-gf ss-ijar-52-6-2011-pp894-913Iffalia R
 
Theory of Computation Lecture Notes
Theory of Computation Lecture NotesTheory of Computation Lecture Notes
Theory of Computation Lecture NotesFellowBuddy.com
 
Csr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminskiCsr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminskiCSR2011
 
A Biological Sequence Compression Based on cross chromosomal similarities usi...
A Biological Sequence Compression Based on cross chromosomal similarities usi...A Biological Sequence Compression Based on cross chromosomal similarities usi...
A Biological Sequence Compression Based on cross chromosomal similarities usi...CSCJournals
 
Compoutational Physics
Compoutational PhysicsCompoutational Physics
Compoutational PhysicsSaad Shaukat
 
Principal Type Scheme for Session Types
Principal Type Scheme for Session TypesPrincipal Type Scheme for Session Types
Principal Type Scheme for Session TypesCSCJournals
 
LARGE LANGUAGE MODELS FOR CIPHERS
LARGE LANGUAGE MODELS FOR CIPHERSLARGE LANGUAGE MODELS FOR CIPHERS
LARGE LANGUAGE MODELS FOR CIPHERSgerogepatton
 

Similar to 4-Chapter Four-Syntactic Parsing and Semantic Analysis.ppt (20)

Modification of some solution techniques of combinatorial
Modification of some solution techniques of combinatorialModification of some solution techniques of combinatorial
Modification of some solution techniques of combinatorial
 
gemini model research paper by google is great
gemini model research paper by google is greatgemini model research paper by google is great
gemini model research paper by google is great
 
A taxonomy of suffix array construction algorithms
A taxonomy of suffix array construction algorithmsA taxonomy of suffix array construction algorithms
A taxonomy of suffix array construction algorithms
 
11 terms in corpus linguistics1 (1)
11 terms in corpus linguistics1 (1)11 terms in corpus linguistics1 (1)
11 terms in corpus linguistics1 (1)
 
2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt
 
SPACE-EFFICIENT K-MER ALGORITHM FOR GENERALISED SUFFIX TREE
SPACE-EFFICIENT K-MER ALGORITHM FOR GENERALISED SUFFIX TREESPACE-EFFICIENT K-MER ALGORITHM FOR GENERALISED SUFFIX TREE
SPACE-EFFICIENT K-MER ALGORITHM FOR GENERALISED SUFFIX TREE
 
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIESA REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
 
IJCTT-V4I9P137
IJCTT-V4I9P137IJCTT-V4I9P137
IJCTT-V4I9P137
 
Mcs 031
Mcs 031Mcs 031
Mcs 031
 
10.1.1.70.8789
10.1.1.70.878910.1.1.70.8789
10.1.1.70.8789
 
Lectura 3.5 word normalizationintwitter finitestate_transducers
Lectura 3.5 word normalizationintwitter finitestate_transducersLectura 3.5 word normalizationintwitter finitestate_transducers
Lectura 3.5 word normalizationintwitter finitestate_transducers
 
Extractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachExtractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised Approach
 
Er
ErEr
Er
 
Cordon review-mamdani-gf ss-ijar-52-6-2011-pp894-913
Cordon review-mamdani-gf ss-ijar-52-6-2011-pp894-913Cordon review-mamdani-gf ss-ijar-52-6-2011-pp894-913
Cordon review-mamdani-gf ss-ijar-52-6-2011-pp894-913
 
Theory of Computation Lecture Notes
Theory of Computation Lecture NotesTheory of Computation Lecture Notes
Theory of Computation Lecture Notes
 
Csr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminskiCsr2011 june17 15_15_kaminski
Csr2011 june17 15_15_kaminski
 
A Biological Sequence Compression Based on cross chromosomal similarities usi...
A Biological Sequence Compression Based on cross chromosomal similarities usi...A Biological Sequence Compression Based on cross chromosomal similarities usi...
A Biological Sequence Compression Based on cross chromosomal similarities usi...
 
Compoutational Physics
Compoutational PhysicsCompoutational Physics
Compoutational Physics
 
Principal Type Scheme for Session Types
Principal Type Scheme for Session TypesPrincipal Type Scheme for Session Types
Principal Type Scheme for Session Types
 
LARGE LANGUAGE MODELS FOR CIPHERS
LARGE LANGUAGE MODELS FOR CIPHERSLARGE LANGUAGE MODELS FOR CIPHERS
LARGE LANGUAGE MODELS FOR CIPHERS
 

More from milkesa13

5-Information Extraction (IE) and Machine Translation (MT).ppt
5-Information Extraction (IE) and Machine Translation (MT).ppt5-Information Extraction (IE) and Machine Translation (MT).ppt
5-Information Extraction (IE) and Machine Translation (MT).pptmilkesa13
 
distributed system concerned lab sessions
distributed system concerned lab sessionsdistributed system concerned lab sessions
distributed system concerned lab sessionsmilkesa13
 
distributed system lab materials about ad
distributed system lab materials about addistributed system lab materials about ad
distributed system lab materials about admilkesa13
 
distributed system with lap practices at
distributed system with lap practices atdistributed system with lap practices at
distributed system with lap practices atmilkesa13
 
introduction to advanced distributed system
introduction to advanced distributed systemintroduction to advanced distributed system
introduction to advanced distributed systemmilkesa13
 
distributed system relation mapping (ORM)
distributed system relation mapping  (ORM)distributed system relation mapping  (ORM)
distributed system relation mapping (ORM)milkesa13
 
decision support system in management information
decision support system in management informationdecision support system in management information
decision support system in management informationmilkesa13
 
management system development and planning
management system development and planningmanagement system development and planning
management system development and planningmilkesa13
 
trends of information systems and artificial technology
trends of information systems and artificial technologytrends of information systems and artificial technology
trends of information systems and artificial technologymilkesa13
 

More from milkesa13 (9)

5-Information Extraction (IE) and Machine Translation (MT).ppt
5-Information Extraction (IE) and Machine Translation (MT).ppt5-Information Extraction (IE) and Machine Translation (MT).ppt
5-Information Extraction (IE) and Machine Translation (MT).ppt
 
distributed system concerned lab sessions
distributed system concerned lab sessionsdistributed system concerned lab sessions
distributed system concerned lab sessions
 
distributed system lab materials about ad
distributed system lab materials about addistributed system lab materials about ad
distributed system lab materials about ad
 
distributed system with lap practices at
distributed system with lap practices atdistributed system with lap practices at
distributed system with lap practices at
 
introduction to advanced distributed system
introduction to advanced distributed systemintroduction to advanced distributed system
introduction to advanced distributed system
 
distributed system relation mapping (ORM)
distributed system relation mapping  (ORM)distributed system relation mapping  (ORM)
distributed system relation mapping (ORM)
 
decision support system in management information
decision support system in management informationdecision support system in management information
decision support system in management information
 
management system development and planning
management system development and planningmanagement system development and planning
management system development and planning
 
trends of information systems and artificial technology
trends of information systems and artificial technologytrends of information systems and artificial technology
trends of information systems and artificial technology
 

Recently uploaded

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterMateoGardella
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Gardella_Mateo_IntellectualProperty.pdf.
4-Chapter Four-Syntactic Parsing and Semantic Analysis.ppt

  • 8. Basic Concepts  A recognizer is a procedure that determines whether or not an input sentence is grammatical according to the grammar (including the lexicon).  A parser is a recognizer that produces associated structural analyses according to the grammar (e.g. parse trees or feature terms).  A robust parser attempts to produce useful output, such as a partial analysis, even if the input is not covered by the grammar.  It is possible to think of a grammar as inducing a search space consisting of a set of states representing stages of successive grammar-rule rewritings and a set of transitions between these states. 3/29/2024 8
  • 9. Basic Concepts…  When analyzing a sentence, the parser (recognizer) must rewrite the grammar rules in some sequence.  A sequence that connects the initial state S (the string consisting of just the start category of the grammar) with a state consisting of exactly the string of input words is called a derivation.  Each state in the sequence then consists of a string over V and is called a sentential form.  If such a sequence exists, the sentence is said to be grammatical according to the grammar. 3/29/2024 9
  • 10. Basic Concepts…  Parsers can be classified along several dimensions according to the ways in which they carry out derivations.  One such dimension concerns rule invocation:  In a top-down derivation, each sentential form is produced from its predecessor by replacing one nonterminal symbol A by a string of terminal or nonterminal symbols X1 · · · Xd, where A → X1 · · · Xd is a grammar rule.  Conversely, in a bottom-up derivation, each sentential form is produced by replacing X1 · · · Xd with A given the same grammar rule, thus successively applying rules in the reverse direction. 3/29/2024 10
  • 13. Basic Concepts…  Another dimension concerns the way in which the parser deals with ambiguity, in particular, whether the process is deterministic or nondeterministic.  In the former case, only a single, irrevocable choice may be made when the parser is faced with local ambiguity.  This choice is typically based on some form of look-ahead or systematic preference.  A third dimension concerns whether parsing proceeds from left to right (strictly speaking front to back) through the input or in some other order, for example, inside-out from the right-hand-side heads. 3/29/2024 13
  • 14. Rule Based Parsing  The rule-based approach has successfully been used in developing many natural language processing systems.  Systems that use rule-based transformations are based on a core of solid linguistic knowledge.  The linguistic knowledge acquired for one natural language processing system may be reused to build knowledge required for a similar task in another system. 3/29/2024 14
  • 15. Rule Based Parsing…  The advantage of the rule-based approach over the corpus-based approach is clear for:  1) Less-resourced languages, for which large corpora, possibly parallel or bilingual, with representative structures and entities are neither available nor easily affordable, and  2) For morphologically rich languages, which even with the availability of corpora suffer from data sparseness. 3/29/2024 15
  • 16. CYK Algorithm  The Cocke–Kasami–Younger (CKY, sometimes written CYK) algorithm is one of the simplest context-free parsing algorithms.  A reason for its simplicity is that it only works for grammars in Chomsky Normal Form (CNF).  A grammar is in CNF when each rule is either:  (i) a unary terminal rule of the form A → w, or  (ii) a binary nonterminal rule of the form A → BC.  It is always possible to transform a grammar into CNF such that it accepts the same language. However, the transformation can change the structure of the grammar quite radically;  E.g., if the original grammar has n rules, the transformed version may in the worst case have O(n²) rules. 3/29/2024 16
  • 17. CYK Algorithm…  The CKY algorithm builds an upper triangular matrix T, where each cell Ti,j (0 ≤ i < j ≤ n) is a set of nonterminals.  The meaning of the statement A ∈ Ti,j is that A spans the input words wi+1 · · · wj, or written more formally, A ⇒∗ wi+1 · · · wj. 3/29/2024 17
  • 18. CYK Algorithm…  CKY is a purely bottom-up algorithm consisting of two parts.  First build the lexical cells Ti−1,i for the input word wi by applying the lexical grammar rules,  Then the nonlexical cells Ti,k (i < k − 1) are filled by applying the binary grammar rules: Ti−1,i = { A | A → wi } Ti,k = { A | A → BC, i < j < k, B ∈ Ti,j, C ∈ Tj,k }  The sentence is recognized by the algorithm if S ∈ T0,n, where S is the start symbol of the grammar.  To make the algorithm less abstract, one should note that all cells Ti,j and Tj,k (i < j < k) must already be known when building the cell Ti,k. This means that it is required to be careful when designing the i and k loops, so that smaller spans are calculated before larger spans. 3/29/2024 18
  • 19. CYK Algorithm…  One solution is to start by looping over the end node k, and then loop over the start node i in the reverse direction.  The pseudo-code is as follows:
procedure CKY(T, w1 · · · wn)
    Ti,j := ∅ for all 0 ≤ i, j ≤ n
    for i := 1 to n do
        for all lexical rules A → w do
            if w = wi then add A to Ti−1,i
    for k := 2 to n do
        for i := k − 2 downto 0 do
            for j := i + 1 to k − 1 do
                for all binary rules A → BC do
                    if B ∈ Ti,j and C ∈ Tj,k then add A to Ti,k
3/29/2024 19
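The pseudo-code above translates almost directly into a short program. The following is a minimal Python sketch of a CKY recognizer; the grammar representation (one dict for lexical rules, one for binary rules) and the toy grammar are assumptions made for illustration, not part of the slides.

def cky_recognize(words, lexical_rules, binary_rules, start="S"):
    # lexical_rules: dict word -> set of nonterminals A with a rule A -> word
    # binary_rules:  dict (B, C) -> set of nonterminals A with a rule A -> B C
    n = len(words)
    # T[i][k] holds the nonterminals that span words i+1 .. k
    T = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words, start=1):
        T[i - 1][i] = set(lexical_rules.get(w, set()))
    for k in range(2, n + 1):               # end position
        for i in range(k - 2, -1, -1):      # start position (smaller spans first)
            for j in range(i + 1, k):       # split point
                for (B, C), heads in binary_rules.items():
                    if B in T[i][j] and C in T[j][k]:
                        T[i][k] |= heads
    return start in T[0][n]

# Toy grammar in CNF (illustrative only)
lex = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
rules = {("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
print(cky_recognize("she eats fish".split(), lex, rules))   # True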
  • 20. CYK Algorithm…  But there are also several alternative possibilities for how to encode the loops in the CKY algorithm;  E.g., instead of letting the outer k loop range over end positions, it is possible to equally well let it range over span lengths.  It is important to keep in mind, however, that smaller spans must be calculated before larger spans.  As already mentioned, the CKY algorithm can only handle grammars in CNF.  Furthermore, converting a grammar to CNF is a bit complicated, and can make the resulting grammar much larger.  Instead, it is possible to modify the CKY algorithm directly to handle unary grammar rules and longer right-hand sides. 3/29/2024 20
  • 21. Top-down and Bottom-up  Top-down parsing:  Only builds trees that have S at the root node, but may lead to trees that do not yield the sentence.  In naive search, top-down parsing is inefficient because structures are created over and over again.  Need a way to record that a particular structure has been predicted.  Need a way to record where the structure was predicted wrt the input.  Bottom-up parsing:  Only builds trees that yield the sentence, but may lead to trees that do not have S at the root. 3/29/2024 21
  • 22. Top-down and Bottom-up…  Pros/cons of top-down strategy:  Never explores trees that aren't potential solutions, ones with the wrong kind of root node.  But explores trees that do not match the input sentence (predicts input before inspecting input).  Naive top-down parsers never terminate if G contains left-recursive rules like X → X Y.  Backtracking may discard valid constituents that have to be re-discovered later (duplication of effort).  Use a top-down strategy when you know what kind of constituent you want to end up with (e.g. NP extraction, named entity extraction). Avoid this strategy if you're stuck with a highly recursive grammar. 3/29/2024 22
  • 23. Earley's Algorithm Grammar Formalisms and Treebanks  Earley Algorithm  The Earley algorithm is a parsing algorithm for arbitrary context- free grammars.  The Earley Parsing Algorithm is an efficient top-down parsing algorithm that avoids some of the inefficiency associated with purely naive search with the same top-down strategy (cf. recursive descent parser).  Intermediate solutions are created only once and stored in a chart (dynamic programming).  Left-recursion problem is solved by examining the input.  Earley is not picky about what type of grammar it accepts, i.e., it accepts arbitrary CFGs (cf. CKY). 3/29/2024 23
  • 24. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm…  Earley Parsing Algorithm  Start with the start symbol S.  Take the leftmost non-terminal and predict all possible expansions.  If the next symbol in the expansion is a word, match it against the input sentence (scan); otherwise, repeat.  If there is nothing more to expand, the subtree is complete; in this case, continue with the next incomplete subtree. 3/29/2024 24
  • 25. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm…  Dotted rules  A dotted rule is a partially processed rule.  Example: S → NP • VP  The dot can be placed in front of the first symbol, behind the last symbol, or between two symbols on the right-hand side of a rule.  The general form of a dotted rule thus is A → α • β , where A → αβ is the original, non-dotted rule. 3/29/2024 25
  • 26. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm…  Chart entries  The chart contains entries of the form [min, max, A → α • β], where min and max are positions in the input and A → α • β is a dotted rule.  Such an entry says: ‘We have built a parse tree whose first rule is A → αβ and where the part of this rule that corresponds to α covers the words between min and max.’ 3/29/2024 26
  • 27. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm…  Inference rules 3/29/2024 27
  • 28. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm (Example) 3/29/2024 28
  • 29. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm (Example) 3/29/2024 29
  • 30. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm (Example) 3/29/2024 30
  • 31. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm (Example) 3/29/2024 31
  • 32. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm (Example) 3/29/2024 32
  • 33. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm (Example) 3/29/2024 33
  • 34. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm (Example) 3/29/2024 34
  • 35. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm (Example) 3/29/2024 35
  • 36. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm (Example) 3/29/2024 36
  • 37. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm (Example) 3/29/2024 37
  • 38. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm…  Earley Parsing Algorithm 3/29/2024 38
  • 39. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm…  Earley Parsing Algorithm 3/29/2024 39
  • 40. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm…  Earley: fundamental operations  Predict sub-structure (based on grammar)  Scan partial solutions for a match  Complete a sub-structure (i.e., build constituents) 3/29/2024 40
  • 41. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley Algorithm…  Recogniser/parser  When parsing is complete, the input is recognized if there is a chart entry of the form [0, n, S → α •].  If we want a parser, we have to add back pointers and retrieve a tree.  Earley's algorithm can be used for PCFGs, but it is more complicated than for CKY. 3/29/2024 41
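To make the predict/scan/complete cycle concrete, here is a compact Earley recognizer sketch in Python. The grammar encoding (a dict from nonterminals to lists of right-hand sides) and the toy grammar are assumptions made for illustration, and the sketch assumes the grammar has no empty (epsilon) productions.

def earley_recognize(words, grammar, start="S"):
    # grammar: dict nonterminal -> list of right-hand sides (tuples of symbols);
    # a symbol counts as a nonterminal iff it is a key of the grammar.
    n = len(words)
    chart = [set() for _ in range(n + 1)]          # chart[k]: items ending at position k
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for k in range(n + 1):
        agenda = list(chart[k])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                  # predict
                    for new_rhs in grammar[sym]:
                        item = (sym, new_rhs, 0, k)
                        if item not in chart[k]:
                            chart[k].add(item); agenda.append(item)
                elif k < n and words[k] == sym:     # scan
                    chart[k + 1].add((lhs, rhs, dot + 1, origin))
            else:                                   # complete
                for (l2, r2, d2, o2) in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        item = (l2, r2, d2 + 1, o2)
                        if item not in chart[k]:
                            chart[k].add(item); agenda.append(item)
    return any(l == start and d == len(r) and o == 0 for (l, r, d, o) in chart[n])

grammar = {"S": [("NP", "VP")], "VP": [("V", "NP")],
           "NP": [("she",), ("fish",)], "V": [("eats",)]}
print(earley_recognize("she eats fish".split(), grammar))   # True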
  • 42. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley's Algorithm Grammar Formalisms  Grammar Formalisms are mathematically precise notation for formalizing a theory of grammar.  CFG has been the most influential grammar formalism for describing language syntax.  This is not because CFG has been generally adopted as such for linguistic description, but rather because most grammar formalisms are derived from or can somehow be related to CFG.  For this reason, CFG is often used as a base formalism when parsing algorithms are described. 3/29/2024 42
  • 43. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley's Algorithm Grammar Formalisms… 3/29/2024 43
  • 44. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley's Algorithm Treebanks  Treebanks are corpora in which each sentence has been annotated with a syntactic analysis.  Producing a high-quality treebank is both time-consuming and expensive.  One of the most widely known treebanks is the Penn TreeBank (PTB). 3/29/2024 44
  • 45. Earley's Algorithm Grammar Formalisms and Treebanks…  Earley's Algorithm Treebanks (Penn Treebank) 3/29/2024 45
  • 46. Earley's Algorithm Grammar Formalisms and Treebanks… Treebank Grammars:  Given a treebank, it is possible to construct a grammar by reading rules off the phrase structure trees.  A treebank grammar will account for all analyses in the treebank.  It will also account for sentences that were not observed in the treebank.  The simplest way to obtain rule probabilities is relative frequency estimation.  Step 1: Count the number of occurrences of each rule in the treebank.  Step 2: Divide this number by the total number of rule occurrences for the same left-hand side. 3/29/2024 46
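A minimal sketch of the two estimation steps above, assuming a toy treebank in which each tree is a nested tuple (label, child, ...) and words are plain strings; this tree encoding is an illustration, not the Penn Treebank format.

from collections import Counter

def count_rules(tree, counts):
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1                       # Step 1: count each rule occurrence
    for c in children:
        if not isinstance(c, str):
            count_rules(c, counts)

def estimate_pcfg(treebank):
    counts = Counter()
    for tree in treebank:
        count_rules(tree, counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    # Step 2: divide by the total count of rules with the same left-hand side
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

toy_treebank = [("S", ("NP", "she"), ("VP", ("V", "eats"), ("NP", "fish")))]
for rule, p in sorted(estimate_pcfg(toy_treebank).items()):
    print(rule, p)    # e.g. ('NP', ('fish',)) 0.5, ('S', ('NP', 'VP')) 1.0, ...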
  • 47. CKY Versus Earley  The CKY algorithm has two disadvantages:  It can only handle restricted grammars (CNF).  It does not use top–down information.  The Earley algorithm does not have these:  The Earley algorithm is a parsing algorithm for arbitrary context- free grammars.  In contrast to the CKY algorithm, it also uses top–down information.  On the downside, it is more complicated.  In contrast to the CKY algorithm, its probabilistic extension is not straightforward. 3/29/2024 47
  • 48. Efficient Parsing for Context-Free Grammars (CFGs)…  The standard way of defining a CFG is as a tuple G = (Σ, N, S, R), where Σ and N are disjoint finite sets of terminal and nonterminal symbols, respectively, and S ∈ N is the start symbol.  The nonterminals are also called categories, and the set V = N ∪ Σ contains the symbols of the grammar.  R is a finite set of production rules of the form A → α, where A ∈ N is a nonterminal and α ∈ V∗ is a sequence of symbols. 3/29/2024 48
  • 49. Efficient Parsing for Context-Free Grammars (CFGs)…  Although there are several conventions, the following are used here:  Capital letters A, B, C, . . . for nonterminals,  Lower-case letters s, t, w, . . . for terminal symbols, and  Uppercase X, Y, Z, . . . for general symbols (elements in V).  Greek letters α, β, γ, . . . will be used for sequences of symbols, and  ε for the empty sequence. 3/29/2024 49
  • 50. Efficient Parsing for Context-Free Grammars (CFGs)…  Although there are several conventions, the following are used here…  The rewriting relation ⇒ is defined by αBγ ⇒ αβγ if and only if B → β is a rule in R.  A phrase is a sequence of terminals β ∈ Σ∗ such that A ⇒ · · · ⇒ β for some A ∈ N.  Accordingly, the term phrase structure grammar is sometimes used for grammars with at least context-free power.  The sequence of rule expansions is called a derivation of β from A.  A (grammatical) sentence is a phrase that can be derived from the start symbol S.  The string language L(G) accepted by G is the set of sentences of G.  Some algorithms only work for particular normal forms of CFGs. 3/29/2024 50
  • 51. Efficient Parsing for Context-Free Grammars (CFGs)…  In practice, pure CFG is not widely used for developing natural language grammars (though grammar based language modeling in speech recognition is one such case).  One reason for this is that CFG is not expressive enough—it cannot describe all peculiarities of natural language,  E.g., Geez, Swiss–German or Dutch scrambling, or Scandinavian long-distance dependencies.  But the main practical reason is that it is difficult to use;  E.g., agreement, inflection, and other common phenomena are complicated to describe using CFG. 3/29/2024 51
  • 52. Efficient Parsing for Context-Free Grammars (CFGs)…  Example  The example grammar in the Figure is overgenerating: it recognizes both the noun phrases "a men" and "an man," as well as the sentence "the men mans a ship."  However, to make the grammar syntactically correct, we must duplicate the categories Noun, Det, and NP into singular and plural versions.  All grammar rules involving these categories must be duplicated too. And if the language is, e.g., German, then Det and Noun have to be inflected on number (SING/PLUR), gender (FEM/NEUTR/MASC) and case (NOM/ACC/DAT/GEN). 3/29/2024 52
  • 53. Statistical Parsing and Probabilistic CFGs (PCFGs) 3/29/2024 53  Statistical Parsing  Statistical parsing uses a probabilistic model of syntax in order to assign probabilities to each parse tree.  Provides principled approach to resolving syntactic ambiguity.  Allows supervised learning of parsers from tree-banks of parse trees provided by human linguists.  Also allows unsupervised learning of parsers from unannotated text, but the accuracy of such parsers has been limited.
  • 54. Statistical Parsing and Probabilistic CFGs (PCFGs)… 3/29/2024 54  PCFG  A PCFG is a probabilistic version of a CFG where each production has a probability.  Probabilities of all productions rewriting a given non-terminal must add to 1, defining a distribution for each non-terminal.  String generation is now probabilistic where production probabilities are used to non-deterministically select a production for rewriting a given non-terminal.
  • 55. Statistical Parsing and Probabilistic CFGs (PCFGs)… 3/29/2024 55  Simple PCFG for English
  • 56. Statistical Parsing and Probabilistic CFGs (PCFGs)… 3/29/2024 56  Sentence probability (Derivation Probability):  Assume productions for each node are chosen independently.  Probability of derivation is the product of the probabilities of its productions.
  • 57. Statistical Parsing and Probabilistic CFGs (PCFGs)… 3/29/2024 57  Syntactic Disambiguation:  Resolve ambiguity by picking most probable parse tree.
  • 58. Statistical Parsing and Probabilistic CFGs (PCFGs)… 3/29/2024 58  Sentence Probability:  Probability of a sentence is the sum of the probabilities of all of its derivations. P(“book the flight through Houston”) = P(D1) + P(D2) = 0.0000216 + 0.00001296 = 0.00003456
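The arithmetic behind these two slides is simple: multiply rule probabilities within a derivation, then sum over the derivations of the sentence. The two derivation probabilities below are the ones quoted on the slide; the three rule probabilities passed to derivation_probability are hypothetical values used only to show the product step, since the example grammar's figure is not reproduced here.

from math import prod

def derivation_probability(rule_probs):
    # product of the probabilities of the productions used in one parse tree
    return prod(rule_probs)

print(derivation_probability([0.3, 0.6, 0.12]))   # 0.0216 (hypothetical rules)

p_d1 = 0.0000216        # probability of derivation D1 (from the slide)
p_d2 = 0.00001296       # probability of derivation D2 (from the slide)
print(p_d1 + p_d2)      # 0.00003456 -- the sentence probability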
  • 59. Statistical Parsing and Probabilistic CFGs (PCFGs)… 3/29/2024 59  Three Useful PCFG Tasks:  Observation likelihood: to classify and order sentences.  Useful for language modeling for speech recognition, translation, word prediction.  Parse trees are richer language models than Ngrams.  Most likely derivation: To determine the most likely parse tree for a sentence.  Maximum likelihood training: To train a PCFG to fit empirical training data.
  • 60. Statistical Parsing and Probabilistic CFGs (PCFGs)… 3/29/2024 60  PCFG: Observation Likelihood  There is an algorithm called the Inside algorithm for efficiently determining how likely a string is to be produced by a PCFG.  Can use a PCFG as a language model to choose between alternative sentences for speech recognition or machine translation.
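A sketch of the inside computation for a PCFG in CNF, assuming the same dictionary-based grammar encoding as the CKY sketch earlier (with probabilities attached): it fills the same table as CKY, but sums probabilities over all ways of building each constituent.

def inside_probability(words, lexical, binary, start="S"):
    # lexical: dict word -> {A: P(A -> word)};  binary: dict (B, C) -> {A: P(A -> B C)}
    n = len(words)
    inside = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words, start=1):
        inside[i - 1][i] = dict(lexical.get(w, {}))
    for k in range(2, n + 1):
        for i in range(k - 2, -1, -1):
            for j in range(i + 1, k):
                for (B, C), heads in binary.items():
                    if B in inside[i][j] and C in inside[j][k]:
                        for A, p in heads.items():
                            inside[i][k][A] = (inside[i][k].get(A, 0.0)
                                               + p * inside[i][j][B] * inside[j][k][C])
    return inside[0][n].get(start, 0.0)

lex = {"she": {"NP": 0.1}, "eats": {"V": 0.3}, "fish": {"NP": 0.2}}
rules = {("V", "NP"): {"VP": 0.4}, ("NP", "VP"): {"S": 1.0}}
print(inside_probability("she eats fish".split(), lex, rules))   # 0.0024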
  • 61. Statistical Parsing and Probabilistic CFGs (PCFGs)… 3/29/2024 61  PCFG: Most Likely Derivation:  There is an analog to the Viterbi algorithm to efficiently determine the most probable derivation (parse tree) for a sentence.
  • 62. Statistical Parsing and Probabilistic CFGs (PCFGs)… 3/29/2024 62  PCFG: Most Likely Derivation  There is an analog to the Viterbi algorithm to efficiently determine the most probable derivation (parse tree) for a sentence.
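A minimal sketch of the Viterbi-style variant, under the same assumptions as the inside sketch above: replacing the sum with a max gives the probability of the single best derivation. A full parser would additionally store backpointers at each cell in order to recover that tree.

def viterbi_cky(words, lexical, binary, start="S"):
    n = len(words)
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words, start=1):
        best[i - 1][i] = dict(lexical.get(w, {}))
    for k in range(2, n + 1):
        for i in range(k - 2, -1, -1):
            for j in range(i + 1, k):
                for (B, C), heads in binary.items():
                    if B in best[i][j] and C in best[j][k]:
                        for A, p in heads.items():
                            cand = p * best[i][j][B] * best[j][k][C]
                            if cand > best[i][k].get(A, 0.0):
                                best[i][k][A] = cand   # a parser would also record (j, B, C) here
    return best[0][n].get(start, 0.0)

With the single-derivation toy grammar used above, both functions return 0.0024; they only differ once a sentence has more than one parse.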
  • 63. Statistical Parsing and Probabilistic CFGs (PCFGs)… 3/29/2024 63  PCFG: Supervised Training  If parse trees are provided for training sentences, a grammar and its parameters can all be estimated directly from counts accumulated from the tree-bank (with appropriate smoothing).
  • 64. Statistical Parsing and Probabilistic CFGs (PCFGs)… 3/29/2024 64  PCFG: Maximum Likelihood Training  Given a set of sentences, induce a grammar that maximizes the probability that this data was generated from this grammar.  Assume the number of non-terminals in the grammar is specified.  Only need to have an unannotated set of sequences generated from the model.  Does not need correct parse trees for these sentences.  In this sense, it is unsupervised.
  • 65. Statistical Parsing and Probabilistic CFGs (PCFGs)… 3/29/2024 65  PCFG: Maximum Likelihood Training
  • 66. Statistical Parsing and Probabilistic CFGs (PCFGs)… 3/29/2024 66  Inside-Outside:  The Inside-Outside algorithm is a version of EM for unsupervised learning of a PCFG.  Analogous to Baum-Welch (forward-backward) for HMMs.  Given the number of non-terminals, construct all possible CNF productions with these non-terminals and observed terminal symbols.  Use EM to iteratively train the probabilities of these productions to locally maximize the likelihood of the data.  Experimental results are not impressive, but recent work imposes additional constraints to improve unsupervised grammar learning.
  • 67. Statistical Parsing and Probabilistic CFGs (PCFGs)… 3/29/2024 67  Vanilla PCFG Limitations:  Since probabilities of productions do not rely on specific words or concepts, only general structural disambiguation is possible (e.g. prefer to attach PPs to Nominals).  Consequently, vanilla PCFGs cannot resolve syntactic ambiguities that require semantics to resolve, e.g. "ate (spaghetti) with a fork" vs. "ate (spaghetti) with meatballs."  In order to work well, PCFGs must be lexicalized, i.e. productions must be specialized to specific words by including their head-word in their LHS non-terminals (e.g. VP(ate)).
  • 68. Lexicalized PCFGs 3/29/2024 68  Example of Importance of Lexicalization:  A general preference for attaching PPs to NPs rather than VPs can be learned by a vanilla PCFG.  But the desired preference can depend on specific words.
  • 69. Lexicalized PCFGs… 3/29/2024 69  Example of Importance of Lexicalization:  A general preference for attaching PPs to NPs rather than VPs can be learned by a vanilla PCFG.  But the desired preference can depend on specific words.
  • 70. Lexicalized PCFGs… 3/29/2024 70  Head-Words:  Syntactic phrases usually have a word in them that is most “central” to the phrase.  Linguists have defined the concept of a lexical head of a phrase.  Simple rules can identify the head of any phrase by percolating head words up the parse tree.  Head of a VP is the main verb,  Head of an NP is the main noun,  Head of a PP is the preposition,  Head of a sentence is the head of its VP.
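A minimal sketch of head-word percolation using simple head rules like those listed above; the tree encoding (nested tuples, with the word directly under its preterminal) and the toy rule table are assumptions for illustration and are much cruder than the head-finding rules actually used with the Penn Treebank.

HEAD_CHILD = {"S": "VP", "VP": "V", "NP": "N", "PP": "P"}   # toy head rules

def head_word(tree):
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]                   # preterminal: the head is the word itself
    wanted = HEAD_CHILD.get(label)
    for child in children:
        if child[0] == wanted:
            return head_word(child)          # percolate the head word upward
    return head_word(children[-1])           # fallback: rightmost child

tree = ("S", ("NP", ("N", "she")), ("VP", ("V", "eats"), ("NP", ("N", "fish"))))
print(head_word(tree))   # eats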
  • 71. Lexicalized PCFGs… 3/29/2024 71  Lexicalized Productions  Specialized productions can be generated by including the head word and its POS of each non-terminal as part of that non- terminal’s symbol.
  • 72. Lexicalized PCFGs… 3/29/2024 72  Lexicalized Productions
  • 73. Lexicalized PCFGs… 3/29/2024 73  Parameterizing Lexicalized Productions  Accurately estimating parameters on such a large number of very specialized productions could require enormous amounts of treebank data.  Need some way of estimating parameters for lexicalized productions that makes reasonable independence assumptions so that accurate probabilities for very specific rules can be learned.  Collins (1999) introduced one approach to learning effective parameters for a lexicalized grammar.
  • 74. Treebanks 3/29/2024 74  English Penn Treebank: Standard corpus for testing syntactic parsing consists of 1.2 M words of text from the Wall Street Journal (WSJ).  Typical to train on about 40,000 parsed sentences and test on an additional standard disjoint test set of 2,416 sentences.  Chinese Penn Treebank: 100K words from the Xinhua news service.  Other corpora existing in many languages, see the Wikipedia article “Treebank”.
  • 75. First WSJ Sentence 3/29/2024 75
( (S
    (NP-SBJ
      (NP (NNP Pierre) (NNP Vinken) )
      (, ,)
      (ADJP
        (NP (CD 61) (NNS years) )
        (JJ old) )
      (, ,) )
    (VP (MD will)
      (VP (VB join)
        (NP (DT the) (NN board) )
        (PP-CLR (IN as)
          (NP (DT a) (JJ nonexecutive) (NN director) ))
        (NP-TMP (NNP Nov.) (CD 29) )))
    (. .) ))
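Bracketed strings like the one above are easy to read into a tree structure. The sketch below is a generic reader written for this chapter, not a Penn Treebank tool; it assumes the brackets are well formed.

import re

def read_bracketed(s):
    tokens = re.findall(r"\(|\)|[^()\s]+", s)
    pos = 0
    def parse():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = None
        if tokens[pos] not in ("(", ")"):    # node label (absent on the outermost bracket)
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                             # consume ")"
        return [label] + children
    return parse()

print(read_bracketed("(S (NP (NNP Pierre) (NNP Vinken)) (VP (MD will)))"))
# ['S', ['NP', ['NNP', 'Pierre'], ['NNP', 'Vinken']], ['VP', ['MD', 'will']]]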
  • 76. Parsing Evaluation Metrics 3/29/2024 76  PARSEVAL metrics measure the fraction of the constituents that match between the computed and human parse trees.  If P is the system’s parse tree and T is the human parse tree (the “gold standard”):  Recall = (# correct constituents in P) / (# constituents in T)  Precision = (# correct constituents in P) / (# constituents in P)  Labeled Precision and labeled recall require getting the non- terminal label on the constituent node correct to count as correct.  F1 is the harmonic mean of precision and recall.
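A small sketch of these metrics, assuming constituents are represented as (label, start, end) triples; this encoding and the toy constituent sets are illustrative only.

def parseval(pred_constituents, gold_constituents):
    pred, gold = set(pred_constituents), set(gold_constituents)
    correct = len(pred & gold)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)}
pred = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("PP", 2, 3)}   # one mislabeled constituent
print(parseval(pred, gold))   # (0.75, 0.75, 0.75)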
  • 77. Parsing Evaluation Metrics… 3/29/2024 77  Computing Evaluation Metrics
  • 78. Parsing Evaluation Metrics… 3/29/2024 78  Treebank Results:  Results of current state-of-the-art systems on the English Penn WSJ treebank are 91-92% labeled F1.  Statistical models such as PCFGs allow for probabilistic resolution of ambiguities.  PCFGs can be easily learned from treebanks.  Lexicalization and non-terminal splitting are required to effectively resolve many ambiguities.  Current statistical parsers are quite accurate but not yet at the level of human-expert agreement.
  • 80. Outline  Semantic Analysis: Lexical semantics and word-sense disambiguation Compositional semantics Semantic Role Labeling and Semantic Parsing. 3/29/2024 80
  • 81. Introduction  Semantic analysis refers to analyzing the meanings of words, fixed expressions, whole sentences, and utterances in context.  In practice, this means translating original expressions into some kind of semantic metalanguage.  The major theoretical issues in semantic analysis therefore turn on the nature of the metalanguage or equivalent representational system. 3/29/2024 81
  • 82. Introduction…  For extended texts, specific NLP applications of semantic analysis may include:  Information retrieval,  Information extraction,  Text summarization,  Data-mining, and  Machine translation and translation aids. 3/29/2024 82
  • 83. Introduction…  Semantic analysis is also pertinent for much shorter texts, right down to the single word level,  For example, in understanding user queries and matching user requirements to available data.  Semantic analysis is also of high relevance in efforts to improve Web ontologies and knowledge representation systems. 3/29/2024 83
  • 84. Introduction…  Various theories and approaches to semantic representation can be roughly ranged along two dimensions:  (1) formal vs. cognitive and  (2) compositional vs. lexical  Formal theories have been strongly advocated since the late 1960s while cognitive approaches have become popular in the last three decades, driven also by influences from cognitive science and psychology. 3/29/2024 84
  • 85. Introduction…  Compositional semantics is concerned with the bottom-up construction of meaning, starting with the lexical items, whose meanings are generally treated as given.  Lexical semantics, on the other hand, aims at precisely analyzing the meanings of lexical items, either by analyzing their internal structure and content (decompositional approaches) or by representing their relations to other elements in the lexicon (relational approaches). 3/29/2024 85
  • 86. Lexical Semantics and Word-Sense Disambiguation  Three Perspectives on Meaning 1. Lexical Semantics  The meanings of individual words. 2. Formal Semantics (or Compositional Semantics or Sentential Semantics)  How those meanings combine to make meanings for individual sentences or utterances. 3. Discourse or Pragmatics  How those meanings combine with each other and with other facts about various kinds of context to make meanings for a text or discourse.  Dialog or Conversation is often lumped together with Discourse. 3/29/2024 86
  • 87. Lexical Semantics and Word-Sense Disambiguation…  Lexical Semantics  Can be defined as the study of what individual lexical items mean, why they mean what they do, how we can represent all of this, and where the combined interpretation for an utterance comes from.  Lexical semantics is concerned with the identification and representation of the semantics of lexical items.  If we are to identify the semantics of lexical items, we have to be prepared for the eventuality of a given word having multiple interpretations = polysemy (cf. monosemy).  Polysemy = the condition of a single lexical item having multiple meanings. 3/29/2024 87
  • 88. Lexical Semantics and Word-Sense Disambiguation…  Lexical Semantics…  There is a traditional division made between lexical semantics and supralexical semantics.  Lexical semantics, which concerns itself with the meanings of words and fixed word combinations,  Supralexical (combinational, or compositional) semantics, which concerns itself with the meanings of the indefinitely large number of word combinations—phrases and sentences—allowable under the grammar.  While there is some obvious appeal and validity to this division, it is increasingly recognized that word-level semantics and grammatical semantics interact and interpenetrate in various ways. 3/29/2024 88
  • 89. Lexical Semantics and Word-Sense Disambiguation…  Lexical Semantics…  Approaches to Lexical Semantic Categorization  Attributional semantic categorization  Semantic clustering  Relational semantic categorization 3/29/2024 89
  • 90. Lexical Semantics and Word-Sense Disambiguation…  Word-Sense Disambiguation  Many tasks in natural language processing require disambiguation of ambiguous words.  Question Answering  Information Retrieval  Machine Translation  Text Mining  Phone Help Systems  Understanding how people disambiguate words is an interesting problem that can provide insight in psycholinguistics. 3/29/2024 90
  • 91. Lexical Semantics and Word-Sense Disambiguation…  Word-Sense Disambiguation  Task of determining the meaning of an ambiguous word in the given context.  Bank:  Edge of a river or  Financial institution that accepts money  Refers to the resolution of lexical semantic ambiguity and its goal is to attribute the correct senses to words. 3/29/2024 91
  • 92. Lexical Semantics and Word-Sense Disambiguation…  Word-Sense Disambiguation…  Given  A word in context,  A fixed inventory of potential word senses.  Decide which sense of the word this is:  English-to-Spanish MT  Inventory is the set of Spanish translations  Speech Synthesis  Inventory is homographs with different pronunciations like bass and bow.  Automatic indexing of medical articles  MeSH (Medical Subject Headings) thesaurus entries. 3/29/2024 92
  • 93. Lexical Semantics and Word-Sense Disambiguation…  Word-Sense Disambiguation…  Two variants of WSD task  Lexical Sample task:  Small pre-selected set of target words  And inventory of senses for each word  All-words task:  Every word in an entire text  A lexicon with senses for each word  Sort-of like part-of-speech tagging » Except each lemma has its own tagset 3/29/2024 93
  • 94. Lexical Semantics and Word-Sense Disambiguation…  Word-Sense Disambiguation…  Approaches  Supervised  Semi-supervised  Unsupervised » Dictionary-based techniques » Selectional association  Lightly supervised » Bootstrapping » Preferred Selectional Association 3/29/2024 94
  • 95. Lexical Semantics and Word-Sense Disambiguation…  Word-Sense Disambiguation…  WSD Approaches  Disambiguation based on manually created rules,  Disambiguation using machine readable dictionaries,  Disambiguation using thesauri,  Disambiguation based on unsupervised machine learning with corpora. 3/29/2024 95
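One classic dictionary-based technique is the (simplified) Lesk algorithm: choose the sense whose gloss shares the most words with the context. The sketch below uses a tiny hand-written sense inventory purely as an illustration; a real system would draw glosses from a machine-readable dictionary such as WordNet.

def simplified_lesk(word, context, sense_glosses):
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses[word].items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

senses = {"bank": {
    "bank_river":   "sloping land beside a body of water such as a river",
    "bank_finance": "a financial institution that accepts deposits and lends money",
}}
print(simplified_lesk("bank", "she deposited money at the bank", senses))   # bank_finance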
  • 96. Lexical Semantics and Word-Sense Disambiguation…  Word-Sense Disambiguation…  Lexical Ambiguity  Most words in natural languages have multiple possible meanings. – “pen” (noun) » The dog is in the pen. » The ink is in the pen. – “take” (verb) » Take one pill every morning. » Take the first right past the stoplight.  Syntax helps distinguish meanings for different parts of speech of an ambiguous word. – “conduct” (noun or verb) » John’s conduct in class is unacceptable. » John will conduct the orchestra on Thursday. 3/29/2024 96
  • 97. Lexical Semantics and Word-Sense Disambiguation…  Word-Sense Disambiguation…  Evaluation of WSD  “In vitro”:  Corpus developed in which one or more ambiguous words are labeled with explicit sense tags according to some sense inventory.  Corpus used for training and testing WSD and evaluated using accuracy (percentage of labeled words correctly disambiguated). » Use most common sense selection as a baseline.  “In vivo”:  Incorporate WSD system into some larger application system, such as machine translation, information retrieval, or question answering.  Evaluate relative contribution of different WSD methods by measuring performance impact on the overall system on final task (accuracy of MT, IR, or QA results). 3/29/2024 97
  • 98. Lexical Semantics and Word-Sense Disambiguation…  Word-Sense Disambiguation…  Issues in WSD  What is the right granularity of a sense inventory?  Integrating WSD with other NLP tasks  Syntactic parsing  Semantic role labeling  Semantic parsing  Does WSD actually improve performance on some real end-user task?  Information retrieval  Information extraction  Machine translation  Question answering 3/29/2024 98
  • 99. Lexical Semantics and Word-Sense Disambiguation…  WSD: Area of Research  Assigning correct sense to words having electronic dictionary as source of word definitions.  Open research field in Natural Language Processing (NLP).  Hard Problem which is a popular area for research.  Used in speech synthesis by identifying the correct sense of the word. 3/29/2024 99
  • 100. Compositional Semantics  Compositional semantics is concerned with the bottom-up construction of meaning, starting with the lexical items, whose meanings are generally treated as given.  Compositional semantics: the construction of meaning (generally expressed as logic) based on syntax. 3/29/2024 100
  • 101. Compositional Semantics…  Frame Semantics  Originally developed by Fillmore 1968.  Frames can represent situations of arbitrary granularity (elementary or complex) and accordingly frame-semantic analysis can be conducted on linguistic units of varying sizes, e.g. phrases, sentences or whole documents,  But most work has been devoted to frame semantics as a formalism for sentence-level semantic analysis and most commonly it has been applied for the analysis of verbal predicate-argument structures. 3/29/2024 101
  • 102. Compositional Semantics…  Frame Semantics: 3/29/2024 102
  • 103. Semantic Role Labeling and Semantic Parsing  Semantic role labeling  Semantic role labeling, sometimes also called shallow semantic parsing, is a task in NLP consisting of the detection of the semantic arguments associated with the predicate or verb of a sentence and their classification into their specific roles.  For example, given a sentence like “Abebe sold the book to Hagos", the task would be to recognize the verb "to sell" as representing the predicate,  “Abebe" as representing the seller (agent), "the book" as representing the goods (theme), and “Hagos" as representing the recipient. 3/29/2024 103
  • 104. Semantic Role Labeling and Semantic Parsing  Semantic role labeling  Semantic role…  This is an important step towards making sense of the meaning of a sentence.  A semantic analysis of this sort is at a lower-level of abstraction than a syntax tree, i.e. it has more categories, thus groups fewer clauses in each category.  For instance, "the book belongs to me" would need two labels such as "possessed" and "possessor" whereas "the book was sold to Hagos" would need two other labels such as "goal" (or "theme") and "receiver" (or "recipient") even though these two clauses would be very similar as far as "subject" and "object" functions are concerned. 3/29/2024 104
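Rendered as a data structure, the "Abebe sold the book to Hagos" example above amounts to a predicate with labeled arguments; the dict below is purely illustrative and is not the output format of any particular SRL system.

srl_example = {
    "predicate": "sell",
    "arguments": {
        "agent":     "Abebe",      # the seller
        "theme":     "the book",   # the goods
        "recipient": "Hagos",      # the receiver
    },
}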
  • 105. Semantic Role Labeling and Semantic Parsing…  Semantic Parsing  Traditional sentence parsing is often performed as a method of understanding the exact meaning of a sentence or word, sometimes with the aid of devices such as sentence diagrams.  It usually emphasizes the importance of grammatical divisions such as subject and predicate.  Within computational linguistics parsing is used to refer to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other.  Semantic parsing is the extension of broad-coverage probabilistic parsers to represent sentence meaning. 3/29/2024 105
  • 108. Individual Assignment - Three  Review the paper given below:  Paper-5: Parsing Non-Recursive Context-Free Grammars  Paper-6: GA-1 Approaches to Lexical Semantic Categorization 3/29/2024 108
  • 109. Group Assignment - One  Discuss the three approaches to Lexical Semantic Categorization:  Group-One: Attributional Semantic Categorization  Group-Two: Semantic Clustering  Group-Three: Relational Semantic Categorization 3/29/2024 109