1. Chapter 4 : Syntactic Parsing and
Semantic Analysis
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2021)
3. Outline
Semantic Analysis
Lexical semantics and word-sense disambiguation
Compositional semantics
Semantic Role Labeling and Semantic Parsing.
3/29/2024 3
4. Introduction
Syntactic parsing is grammar-driven natural language parsing,
that is, analyzing a string of words (typically a sentence) to
determine its structural description according to a formal
grammar.
In most circumstances, this is not a goal in itself but rather an
intermediary step for the purpose of further processing, such as the
assignment of a meaning to the sentence.
To this end, the desired output of grammar-driven parsing is
typically a hierarchical, syntactic structure suitable for semantic
interpretation.
The string of words constituting the input will usually have been
processed in separate phases of tokenization and lexical analysis,
which are hence not part of parsing proper.
5. Introduction …
To get a grasp of the fundamental problems discussed here, it is
instructive to consider the ways in which parsers for natural
languages differ from parsers for computer languages:
One such difference concerns the power of the grammar
formalisms used, that is, their generative capacity.
A second difference concerns the extreme structural ambiguity of
natural language. A classic example is the following:
Put the block in the box on the table
Assuming that “put” subcategorizes for two objects, there are two
possible analyses:
Put the block [in the box on the table]
Put [the block in the box] on the table
6. Introduction …
Parsers for natural languages differ from parsers for computer
languages…
A third difference stems from the fact that natural language data
are inherently noisy, both because of errors (under some
conception of “error”) and because of the ever persisting
incompleteness of lexicon and grammar relative to the unlimited
number of possible utterances which constitute the language.
In contrast, a computer language has a complete syntax
specification, which means that by definition all correct input
strings are parsable.
7. Introduction …
Parsers for natural languages differ from parsers for computer
languages…
A third difference ...
In natural language parsing, it is notoriously difficult to distinguish
whether a failure to produce a parsing result is due to an error in the
input or to the lack of coverage of the grammar, also because a
natural language by its nature has no precise delimitation.
Thus, input not licensed by the grammar may well be perfectly
adequate according to native speakers of the language.
Moreover, input containing errors may still carry useful bits of
information that might be desirable to try to recover.
Robustness refers to the ability to always produce some result in
response to such input.
8. Basic Concepts
A recognizer is a procedure that determines whether or not an
input sentence is grammatical according to the grammar
(including the lexicon).
A parser is a recognizer that produces associated structural
analyses according to the grammar (e.g. parse trees or feature
terms).
A robust parser attempts to produce useful output, such as a
partial analysis, even if the input is not covered by the grammar.
It is possible to think of a grammar as inducing a search space
consisting of a set of states representing stages of successive
grammar-rule rewritings and a set of transitions between these
states.
9. Basic Concepts…
When analyzing a sentence, the parser (recognizer) must rewrite
the grammar rules in some sequence.
A sequence that connects the initial state (the string consisting
of just the start category of the grammar) with a state consisting
of exactly the string of input words is called a derivation.
Each state in the sequence then consists of a string over V and
is called a sentential form.
If such a sequence exists, the sentence is said to be grammatical
according to the grammar.
10. Basic Concepts…
Parsers can be classified along several dimensions according to
the ways in which they carry out derivations.
One such dimension concerns rule invocation:
In a top-down derivation, each sentential form is produced from its
predecessor by replacing one nonterminal symbol A by a string of
terminal or nonterminal symbols X1 · · · Xd, where A → X1 · · · Xd
is a grammar rule.
Conversely, in a bottom-up derivation, each sentential form is
produced by replacing X1 · · · Xd with A given the same grammar
rule, thus successively applying rules in the reverse direction.
13. Basic Concepts…
Another dimension concerns the way in which the parser deals
with ambiguity, in particular, whether the process is
deterministic or nondeterministic.
In the former case, only a single, irrevocable choice may be made
when the parser is faced with local ambiguity.
This choice is typically based on some form of look ahead or
systematic preference.
A third dimension concerns whether parsing proceeds from left
to right (strictly speaking front to back) through the input or in
some other order, for example, inside-out from the right-hand-
side heads.
14. Rule Based Parsing
The rule-based approach has successfully been used in
developing many natural language processing systems.
Systems that use rule-based transformations are based on a
core of solid linguistic knowledge.
The linguistic knowledge acquired for one natural language
processing system may be reused to build knowledge required
for a similar task in another system.
15. Rule Based Parsing…
The advantage of the rule-based approach over the corpus-
based approach is clear for:
1) Less-resourced languages, for which large corpora, possibly
parallel or bilingual, with representative structures and entities
are neither available nor easily affordable, and
2) For morphologically rich languages, which even with the
availability of corpora suffer from data sparseness.
16. CYK Algorithm
The Cocke–Kasami–Younger (CKY, sometimes written CYK)
algorithm is one of the simplest context-free parsing algorithms.
A reason for its simplicity is that it only works for grammars in
Chomsky Normal Form (CNF).
A grammar is in CNF when each rule is either:
(i) a unary terminal rule of the form A → w, or
(ii) a binary nonterminal rule of the form A → BC.
It is always possible to transform a grammar into CNF such that
it accepts the same language. However, the transformation can
change the structure of the grammar quite radically;
E.g., if the original grammar has n rules, the transformed version
may in the worst case have O(n²) rules.
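The two CNF rule shapes can be checked mechanically; a minimal Python sketch (the grammar encoding and names are my own, for illustration):

```python
def is_cnf(rules, nonterminals):
    """Check the two CNF rule shapes: A -> w (terminal) or A -> B C."""
    for lhs, rhs in rules:
        unary_terminal = len(rhs) == 1 and rhs[0] not in nonterminals
        binary_nonterm = len(rhs) == 2 and all(x in nonterminals for x in rhs)
        if not (unary_terminal or binary_nonterm):
            return False
    return True

nts = {"S", "NP", "VP", "Det", "N", "V"}
rules = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("Det", ("the",))]
print(is_cnf(rules, nts))                      # True
print(is_cnf(rules + [("VP", ("V",))], nts))   # False: unary nonterminal rule
```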
17. CYK Algorithm…
The CKY algorithm builds an upper triangular matrix T , where
each cell Ti,j (0 ≤ i < j ≤ n) is a set of nonterminals.
The meaning of the statement A ∈ Ti,j is that A spans the input
words wi+1 · · · wj, or written more formally, A ⇒∗ wi+1 · · · wj.
18. CYK Algorithm…
CKY is a purely bottom-up algorithm consisting of two parts.
First build the lexical cells Ti−1,i for the input word wi by applying the
lexical grammar rules,
Then the nonlexical cells Ti,k (i < k−1) are filled by applying the
binary grammar rules:
Ti−1,i = { A | A → wi }
Ti,k = { A | A → BC, i < j < k, B ∈ Ti,j, C ∈ Tj,k }
The sentence is recognized by the algorithm if S ∈ T0,n, where S is
the start symbol of the grammar.
To make the algorithm less abstract, one should note that all cells
Ti,j and Tj,k (i < j < k) must already be known when building the
cell Ti,k. This means that it is required to be careful when
designing the i and k loops, so that smaller spans are calculated
before larger spans.
19. CYK Algorithm…
One solution is to start by looping over the end node k, and then
loop over the start node i in the reverse direction.
The pseudo-code is as follows:
procedure CKY(T ,w1 · · · wn)
Ti,j := ∅ for all 0 ≤ i, j ≤ n
for i := 1 to n do
for all lexical rules A → w do
if w = wi then add A to Ti−1,i
for k := 2 to n do
for i := k − 2 downto 0 do
for j := i + 1 to k − 1 do
for all binary rules A → BC do
if B ∈ Ti,j and C ∈ Tj,k then add A to Ti,k
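A direct, minimal Python rendering of this pseudo-code (the grammar encoding below is an assumption for illustration; only recognition, not tree-building, is shown):

```python
def cky_recognize(lexical, binary, start, words):
    """CKY recognizer for a grammar in Chomsky Normal Form.

    lexical: dict mapping a terminal w to the set of A with rule A -> w
    binary:  list of (A, B, C) triples for rules A -> B C
    start:   the start symbol S
    words:   the input sentence as a list of tokens
    """
    n = len(words)
    # T[i][k] is the set of nonterminals spanning words i+1 .. k
    T = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):                      # lexical cells T[i-1][i]
        T[i - 1][i] = set(lexical.get(words[i - 1], ()))
    for k in range(2, n + 1):                      # end position
        for i in range(k - 2, -1, -1):             # start position, downwards
            for j in range(i + 1, k):              # split point
                for A, B, C in binary:
                    if B in T[i][j] and C in T[j][k]:
                        T[i][k].add(A)
    return start in T[0][n]

# Toy CNF grammar: S -> NP VP, NP -> Det N, VP -> V NP
lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}
binary = [("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")]
print(cky_recognize(lexical, binary, "S", "the dog saw the cat".split()))  # True
```

Note that the loop order mirrors the pseudo-code exactly: smaller spans are always filled before larger ones.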
20. CYK Algorithm…
But there are also several alternative possibilities for how to
encode the loops in the CKY algorithm;
E.g., instead of letting the outer k loop range over end positions, it
is equally possible to let it range over span lengths.
It is important to keep in mind, however, that smaller spans must
be calculated before larger spans.
As already mentioned, the CKY algorithm can only handle
grammars in CNF.
Furthermore, converting a grammar to CNF is a bit
complicated, and can make the resulting grammar much larger.
Instead, it is possible to modify the CKY algorithm directly to
handle unary grammar rules and longer right-hand sides.
21. Top-down and Bottom-up
Top-down parsing:
Builds only trees that have S at the root node, but may lead to trees
that do not yield the sentence.
In naive search, top-down parsing is inefficient because structures
are created over and over again.
Need a way to record that a particular structure has been predicted.
Need a way to record where the structure was predicted with respect
to the input.
Bottom-up parsing:
Builds only trees that yield the sentence, but may lead to trees that
do not have S at the root.
22. Top-down and Bottom-up…
Pros/cons of top-down strategy:
Never explores trees that are not potential solutions, i.e., ones with
the wrong kind of root node.
But explores trees that do not match the input sentence (predicts
input before inspecting input).
Naive top-down parsers never terminate if G contains left-recursive
rules like X → X Y.
Backtracking may discard valid constituents that have to be re-
discovered later (duplication of effort).
Use a top-down strategy when you know what kind of constituent
you want to end up with (e.g. NP extraction, named entity
extraction). Avoid this strategy if you're stuck with a highly
recursive grammar.
23. Earley's Algorithm Grammar
Formalisms and Treebanks
Earley Algorithm
The Earley algorithm is a parsing algorithm for arbitrary context-
free grammars.
The Earley Parsing Algorithm is an efficient top-down parsing
algorithm that avoids some of the inefficiency associated with
purely naive search with the same top-down strategy (cf.
recursive descent parser).
Intermediate solutions are created only once and stored in a chart
(dynamic programming).
Left-recursion problem is solved by examining the input.
Earley is not picky about what type of grammar it accepts, i.e., it
accepts arbitrary CFGs (cf. CKY).
24. Earley's Algorithm Grammar
Formalisms and Treebanks…
Earley Algorithm…
Earley Parsing Algorithm
Start with the start symbol S.
Take the leftmost non-terminal and predict all possible expansions.
If the next symbol in the expansion is a word, match it against the
input sentence (scan); otherwise, repeat.
If there is nothing more to expand, the subtree is complete; in this
case, continue with the next incomplete subtree.
25. Earley's Algorithm Grammar
Formalisms and Treebanks…
Earley Algorithm…
Dotted rules
A dotted rule is a partially processed rule.
Example: S → NP • VP
The dot can be placed in front of the first symbol, behind the last
symbol, or between two symbols on the right-hand side of a rule.
The general form of a dotted rule thus is A → α • β , where A → αβ
is the original, non-dotted rule.
26. Earley's Algorithm Grammar
Formalisms and Treebanks…
Earley Algorithm…
Chart entries
The chart contains entries of the form [min, max, A → α • β], where
min and max are positions in the input and A → α • β is a dotted
rule.
Such an entry says: ‘We have built a parse tree whose first rule is A
→ αβ and where the part of this rule that corresponds to α covers
the words between min and max.’
40. Earley's Algorithm Grammar
Formalisms and Treebanks…
Earley Algorithm…
Earley: fundamental operations
Predict sub-structure (based on grammar)
Scan partial solutions for a match
Complete a sub-structure (i.e., build constituents)
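These three operations can be sketched as a minimal Earley recognizer in Python (the rule encoding and item format are illustrative simplifications; epsilon rules and back pointers for tree retrieval are omitted):

```python
def earley_recognize(rules, start, words):
    """Minimal Earley recognizer.

    rules: dict mapping a nonterminal A to a list of right-hand sides,
           each a tuple of symbols; terminals are symbols not in `rules`.
    An item (A, rhs, dot, origin) is the dotted rule A -> rhs with the
    dot at position `dot`, predicted at input position `origin`.
    """
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in rules[start]}
    for pos in range(n + 1):
        agenda = list(chart[pos])
        while agenda:
            A, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in rules:                       # predict
                    for r in rules[sym]:
                        item = (sym, r, 0, pos)
                        if item not in chart[pos]:
                            chart[pos].add(item)
                            agenda.append(item)
                elif pos < n and words[pos] == sym:    # scan
                    chart[pos + 1].add((A, rhs, dot + 1, origin))
            else:                                      # complete
                for B, r, d, o in list(chart[origin]):
                    if d < len(r) and r[d] == A:
                        item = (B, r, d + 1, o)
                        if item not in chart[pos]:
                            chart[pos].add(item)
                            agenda.append(item)
    return any(A == start and dot == len(rhs) and origin == 0
               for A, rhs, dot, origin in chart[n])

rules = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP")],
    "Det": [("the",)],
    "N": [("dog",), ("cat",)],
    "V": [("saw",)],
}
print(earley_recognize(rules, "S", "the dog saw the cat".split()))  # True
```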
41. Earley's Algorithm Grammar
Formalisms and Treebanks…
Earley Algorithm…
Recogniser/parser
Recognizer: when parsing is complete, check whether there is a
chart entry [0, n, S → α •].
If we want a parser, we have to add back pointers, and
retrieve a tree.
Earley’s algorithm can be used for PCFGs, but the probabilistic
extension is more complicated than for CKY.
42. Earley's Algorithm Grammar
Formalisms and Treebanks…
Earley's Algorithm Grammar Formalisms
Grammar formalisms are mathematically precise notations for
formalizing a theory of grammar.
CFG has been the most influential grammar formalism for
describing language syntax.
This is not because CFG has been generally adopted as such for
linguistic description, but rather because most grammar
formalisms are derived from or can somehow be related to CFG.
For this reason, CFG is often used as a base formalism when
parsing algorithms are described.
44. Earley's Algorithm Grammar
Formalisms and Treebanks…
Earley's Algorithm Treebanks
Treebanks are corpora in which each sentence has been annotated
with a syntactic analysis.
Producing a high-quality treebank is both time-consuming and
expensive.
One of the most widely known treebanks is the Penn TreeBank
(PTB).
46. Earley's Algorithm Grammar
Formalisms and Treebanks…
Treebank Grammars:
Given a treebank, it is possible to construct a grammar by
reading rules off the phrase structure trees.
A treebank grammar will account for all analyses in the
treebank.
It will also account for sentences that were not observed in the
treebank.
The simplest way to obtain rule probabilities is relative
frequency estimation.
Step 1: Count the number of occurrences of each rule in the
treebank.
Step 2: Divide this number by the total number of rule
occurrences for the same left-hand side.
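These two steps can be sketched in Python (the rule counts below are invented, not read from any real treebank):

```python
from collections import Counter, defaultdict

# Step 1: hypothetical rule occurrences counted from a toy treebank.
rule_counts = Counter({
    ("S", ("NP", "VP")): 100,
    ("NP", ("Det", "N")): 60,
    ("NP", ("NP", "PP")): 40,
    ("VP", ("V", "NP")): 70,
    ("VP", ("VP", "PP")): 30,
})

# Step 2: divide by the total count for the same left-hand side.
lhs_totals = defaultdict(int)
for (lhs, rhs), c in rule_counts.items():
    lhs_totals[lhs] += c
probs = {rule: c / lhs_totals[rule[0]] for rule, c in rule_counts.items()}

print(probs[("NP", ("Det", "N"))])   # 0.6
print(probs[("VP", ("V", "NP"))])    # 0.7
```

By construction the probabilities for each left-hand side sum to 1, which is exactly the PCFG well-formedness condition discussed later.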
47. CKY Versus Earley
The CKY algorithm has two disadvantages:
It can only handle restricted grammars (CNF).
It does not use top–down information.
The Earley algorithm does not have these disadvantages:
The Earley algorithm is a parsing algorithm for arbitrary context-
free grammars.
In contrast to the CKY algorithm, it also uses top–down
information.
On the downside, it is more complicated.
In contrast to the CKY algorithm, its probabilistic extension is not
straightforward.
48. Efficient Parsing for Context-Free
Grammars (CFGs)…
The standard way of defining a CFG is as a tuple G = (Σ, N, S,
R), where Σ and N are disjoint finite sets of terminal and
nonterminal symbols, respectively, and S ∈ N is the start
symbol.
The nonterminals are also called categories, and the set V = N
∪ Σ contains the symbols of the grammar.
R is a finite set of production rules of the form A → α, where A
∈ N is a nonterminal and α ∈ V∗ is a sequence of symbols.
49. Efficient Parsing for Context-Free
Grammars (CFGs)…
Although there are several conventions, the following will be
used here:
Capital letters A, B, C, . . . for nonterminals,
Lower-case letters s, t, w, . . . for terminal symbols, and
Uppercase X, Y, Z, . . . for general symbols (elements in V).
Greek letters α, β, γ , . . . will be used for sequences of symbols,
and
ε for the empty sequence.
50. Efficient Parsing for Context-Free
Grammars (CFGs)…
Although there are several conventions, the following will be
used here…
The rewriting relation ⇒ is defined by αBγ ⇒ αβγ if and only if B → β is a rule in R.
A phrase is a sequence of terminals β ∈ Σ∗ such that A ⇒ · · · ⇒ β for
some A ∈ N.
Accordingly, the term phrase structure grammar is sometimes used for
grammars with at least context-free power.
The sequence of rule expansions is called a derivation of β from A.
A (grammatical) sentence is a phrase that can be derived from the start
symbol S.
The string language L(G) accepted by G is the set of sentences of G.
Some algorithms only work for particular normal forms of
CFGs.
51. Efficient Parsing for Context-Free
Grammars (CFGs)…
In practice, pure CFG is not widely used for developing natural
language grammars (though grammar based language modeling
in speech recognition is one such case).
One reason for this is that CFG is not expressive enough—it
cannot describe all peculiarities of natural language,
E.g., Geez, Swiss–German or Dutch scrambling, or Scandinavian
long-distance dependencies.
But the main practical reason is that it is difficult to use;
E.g., agreement, inflection, and other common phenomena are
complicated to describe using CFG.
52. Efficient Parsing for Context-Free
Grammars (CFGs)…
Example
The example grammar in the Figure is overgenerating: it
recognizes both the noun phrases “a men” and “an man,” as well
as the sentence “the men mans a ship.”
However, to make the grammar syntactically correct, we must
duplicate the categories Noun, Det, and NP into singular and plural
versions.
All grammar rules involving these categories must be duplicated too.
And if the language is, e.g., German, then Det and Noun have to be
inflected for number (SING/PLUR), gender (FEM/NEUTR/MASC),
and case (NOM/ACC/DAT/GEN).
53. Statistical Parsing and Probabilistic
CFGs (PCFGs)
Statistical Parsing
Statistical parsing uses a probabilistic model of syntax in order to
assign probabilities to each parse tree.
Provides a principled approach to resolving syntactic ambiguity.
Allows supervised learning of parsers from tree-banks of parse
trees provided by human linguists.
Also allows unsupervised learning of parsers from unannotated
text, but the accuracy of such parsers has been limited.
54. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
PCFG
A PCFG is a probabilistic version of a CFG where each
production has a probability.
Probabilities of all productions rewriting a given non-terminal
must add to 1, defining a distribution for each non-terminal.
String generation is now probabilistic where production
probabilities are used to non-deterministically select a production
for rewriting a given non-terminal.
56. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
Sentence probability (Derivation Probability):
Assume productions for each node are chosen independently.
Probability of derivation is the product of the probabilities of its
productions.
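This product rule can be sketched in a few lines (the grammar, probabilities, and derivation below are invented for illustration):

```python
import math

# Hypothetical rule probabilities for a toy PCFG.
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("VP", ("V", "NP")): 0.7,
    ("Det", ("the",)): 0.5,
    ("N", ("dog",)): 0.3,
    ("N", ("cat",)): 0.2,
    ("V", ("saw",)): 0.4,
}

# A derivation is the multiset of rules used in the parse tree of
# "the dog saw the cat"; its probability is the product of their probs.
derivation = [
    ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("Det", ("the",)),
    ("N", ("dog",)), ("VP", ("V", "NP")), ("V", ("saw",)),
    ("NP", ("Det", "N")), ("Det", ("the",)), ("N", ("cat",)),
]
p = math.prod(rule_prob[r] for r in derivation)
print(p)  # ≈ 0.001512
```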
57. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
Syntactic Disambiguation:
Resolve ambiguity by picking most probable parse tree.
58. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
Sentence Probability:
Probability of a sentence is the sum of the probabilities of all of
its derivations.
P(“book the flight through Houston”) =
P(D1) + P(D2) = 0.0000216 + 0.00001296
= 0.00003456
59. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
Three Useful PCFG Tasks:
Observation likelihood: to classify and order sentences.
Useful for language modeling for speech recognition,
translation, word prediction.
Parse trees are richer language models than N-grams.
Most likely derivation: To determine the most likely parse tree for
a sentence.
Maximum likelihood training: To train a PCFG to fit empirical
training data.
60. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
PCFG: Observation Likelihood
There is an algorithm called the Inside algorithm for efficiently
determining how likely a string is to be produced by a PCFG.
Can use a PCFG as a language model to choose between
alternative sentences for speech recognition or machine
translation.
61. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
PCFG: Most Likely Derivation:
There is an analog to the Viterbi algorithm to efficiently
determine the most probable derivation (parse tree) for a
sentence.
63. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
PCFG: Supervised Training
If parse trees are provided for training sentences, a grammar and
its parameters can all be estimated directly from counts
accumulated from the tree-bank (with appropriate smoothing).
64. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
PCFG: Maximum Likelihood Training
Given a set of sentences, induce a grammar that maximizes the
probability that this data was generated from this grammar.
Assume the number of non-terminals in the grammar is specified.
Only need to have an unannotated set of sequences generated
from the model.
Does not need correct parse trees for these sentences.
In this sense, it is unsupervised.
65. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
PCFG: Maximum Likelihood Training
66. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
Inside-Outside:
The Inside-Outside algorithm is a version of EM for
unsupervised learning of a PCFG.
Analogous to Baum-Welch (forward-backward) for HMMs.
Given the number of non-terminals, construct all possible CNF
productions with these non-terminals and observed terminal
symbols.
Use EM to iteratively train the probabilities of these productions
to locally maximize the likelihood of the data.
Experimental results are not impressive, but recent work imposes
additional constraints to improve unsupervised grammar
learning.
67. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
Vanilla PCFG Limitations:
Since probabilities of productions do not rely on specific words
or concepts, only general structural disambiguation is possible
(e.g. prefer to attach PPs to Nominals).
Consequently, vanilla PCFGs cannot resolve syntactic
ambiguities that require semantics to resolve, e.g., the PP
attachment in “ate spaghetti with a fork” vs. “ate spaghetti with
meatballs.”
In order to work well, PCFGs must be lexicalized, i.e.,
productions must be specialized to specific words by including
their head word in their LHS non-terminals (e.g., VP(ate)).
68. Lexicalized PCFGs
Example of Importance of Lexicalization:
A general preference for attaching PPs to NPs rather than VPs
can be learned by a vanilla PCFG.
But the desired preference can depend on specific words.
70. Lexicalized PCFGs…
Head-Words:
Syntactic phrases usually have a word in them that is most
“central” to the phrase.
Linguists have defined the concept of a lexical head of a phrase.
Simple rules can identify the head of any phrase by percolating
head words up the parse tree.
Head of a VP is the main verb,
Head of an NP is the main noun,
Head of a PP is the preposition,
Head of a sentence is the head of its VP.
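These head rules can be sketched as a small percolation procedure (the tree encoding and the rule table are simplified illustrations):

```python
# Head rules: for each phrase label, which child category is the head.
HEAD_RULES = {
    "S": "VP",   # head of a sentence is the head of its VP
    "VP": "V",   # head of a VP is the main verb
    "NP": "N",   # head of an NP is the main noun
    "PP": "P",   # head of a PP is the preposition
}

def head_word(tree):
    """Percolate head words up a tree given as (label, children) pairs,
    where a preterminal is (POS, word)."""
    label, body = tree
    if isinstance(body, str):          # preterminal: return the word itself
        return body
    wanted = HEAD_RULES.get(label)
    for child in body:
        if child[0] == wanted:
            return head_word(child)
    return head_word(body[0])          # fallback: leftmost child

tree = ("S", [("NP", [("Det", "the"), ("N", "dog")]),
              ("VP", [("V", "saw"), ("NP", [("Det", "the"), ("N", "cat")])])])
print(head_word(tree))  # saw
```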
71. Lexicalized PCFGs…
Lexicalized Productions
Specialized productions can be generated by including the head
word (and its POS) of each non-terminal as part of that non-
terminal’s symbol.
73. Lexicalized PCFGs…
Parameterizing Lexicalized Productions
Accurately estimating parameters on such a large number of very
specialized productions could require enormous amounts of
treebank data.
Need some way of estimating parameters for lexicalized
productions that makes reasonable independence assumptions so
that accurate probabilities for very specific rules can be learned.
Collins (1999) introduced one approach to learning effective
parameters for a lexicalized grammar.
74. Treebanks
English Penn Treebank: Standard corpus for testing syntactic
parsing consists of 1.2 M words of text from the Wall Street
Journal (WSJ).
Typical to train on about 40,000 parsed sentences and test on an
additional standard disjoint test set of 2,416 sentences.
Chinese Penn Treebank: 100K words from the Xinhua news
service.
Other corpora exist in many languages; see the Wikipedia
article “Treebank”.
76. Parsing Evaluation Metrics
PARSEVAL metrics measure the fraction of the constituents
that match between the computed and human parse trees.
If P is the system’s parse tree and T is the human parse tree
(the “gold standard”):
Recall = (# correct constituents in P) / (# constituents in T)
Precision = (# correct constituents in P) / (# constituents in P)
Labeled precision and labeled recall additionally require the non-
terminal label on the constituent node to be correct for the
constituent to count as correct.
F1 is the harmonic mean of precision and recall.
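The metrics can be sketched over constituent spans (the span encoding and the toy trees below are invented for illustration):

```python
from collections import Counter

def parseval(system, gold):
    """Labeled precision, recall, and F1 over constituents.

    Each parse is given as a collection of (label, start, end) spans.
    """
    sys_c, gold_c = Counter(system), Counter(gold)
    correct = sum((sys_c & gold_c).values())   # multiset intersection
    precision = correct / sum(sys_c.values())
    recall = correct / sum(gold_c.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
system = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
p, r, f = parseval(system, gold)
print(p, r, f)  # 0.75 0.75 0.75 (the mislabeled span 3-5 counts as wrong)
```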
78. Parsing Evaluation Metrics…
Treebank Results:
Results of current state-of-the-art systems on the English Penn
WSJ treebank are 91-92% labeled F1.
Statistical models such as PCFGs allow for probabilistic
resolution of ambiguities.
PCFGs can be easily learned from treebanks.
Lexicalization and non-terminal splitting are required to
effectively resolve many ambiguities.
Current statistical parsers are quite accurate but not yet at the
level of human-expert agreement.
80. Outline
Semantic Analysis:
Lexical semantics and word-sense disambiguation
Compositional semantics
Semantic Role Labeling and Semantic Parsing.
81. Introduction
Semantic analysis refers to analyzing the meanings of words,
fixed expressions, whole sentences, and utterances in context.
In practice, this means translating original expressions into some
kind of semantic metalanguage.
The major theoretical issues in semantic analysis therefore turn
on the nature of the metalanguage or equivalent
representational system.
82. Introduction…
For extended texts, specific NLP applications of semantic
analysis may include:
Information retrieval,
Information extraction,
Text summarization,
Data-mining, and
Machine translation and translation aids.
83. Introduction…
Semantic analysis is also pertinent for much shorter texts, right
down to the single word level,
For example, in understanding user queries and matching user
requirements to available data.
Semantic analysis is also of high relevance in efforts to improve
Web ontologies and knowledge representation systems.
84. Introduction…
Various theories of and approaches to semantic representation
can be roughly arranged along two dimensions:
(1) formal vs. cognitive and
(2) compositional vs. lexical
Formal theories have been strongly advocated since the late
1960s while cognitive approaches have become popular in the
last three decades, driven also by influences from cognitive
science and psychology.
85. Introduction…
Compositional semantics is concerned with the bottom-up
construction of meaning, starting with the lexical items, whose
meanings are generally treated as given.
Lexical semantics, on the other hand, aims at precisely
analyzing the meanings of lexical items, either by analyzing
their internal structure and content (decompositional
approaches) or by representing their relations to other elements
in the lexicon (relational approaches).
86. Lexical Semantics and Word-Sense
Disambiguation
Three Perspectives on Meaning
1. Lexical Semantics
The meanings of individual words.
2. Formal Semantics (or Compositional Semantics or Sentential
Semantics)
How those meanings combine to make meanings for individual
sentences or utterances.
3. Discourse or Pragmatics
How those meanings combine with each other and with other facts
about various kinds of context to make meanings for a text or
discourse.
Dialog or Conversation is often lumped together with Discourse.
87. Lexical Semantics and Word-Sense
Disambiguation…
Lexical Semantics
Can be defined as the study of what individual lexical items mean,
why they mean what they do, how we can represent all of this,
and where the combined interpretation for an utterance comes
from.
Lexical semantics is concerned with the identification and
representation of the semantics of lexical items.
If we are to identify the semantics of lexical items, we have to be
prepared for the eventuality of a given word having multiple
interpretations = polysemy (cf. monosemy).
Polysemy = the condition of a single lexical item having
multiple meanings.
88. Lexical Semantics and Word-Sense
Disambiguation…
Lexical Semantics…
There is a traditional division made between lexical semantics and
supralexical semantics.
Lexical semantics, which concerns itself with the meanings of
words and fixed word combinations,
Supralexical (combinational, or compositional) semantics, which
concerns itself with the meanings of the indefinitely large number of
word combinations—phrases and sentences—allowable under the
grammar.
While there is some obvious appeal and validity to this division, it
is increasingly recognized that word-level semantics and
grammatical semantics interact and interpenetrate in various
ways.
90. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation
Many tasks in natural language processing require
disambiguation of ambiguous words.
Question Answering
Information Retrieval
Machine Translation
Text Mining
Phone Help Systems
Understanding how people disambiguate words is an interesting
problem that can provide insight in psycholinguistics.
91. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation
Task of determining the meaning of an ambiguous word in the
given context.
Bank:
Edge of a river
or
Financial institution that accepts money
Refers to the resolution of lexical semantic ambiguity and its goal
is to attribute the correct senses to words.
92. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation…
Given
A word in context,
A fixed inventory of potential word senses.
Decide which sense of the word this is:
English-to-Spanish MT
Inventory is the set of Spanish translations
Speech Synthesis
Inventory is homographs with different pronunciations like bass
and bow.
Automatic indexing of medical articles
MeSH (Medical Subject Headings) thesaurus entries.
93. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation…
Two variants of WSD task
Lexical Sample task:
Small pre-selected set of target words
And inventory of senses for each word
All-words task:
Every word in an entire text
A lexicon with senses for each word
Sort-of like part-of-speech tagging
» Except each lemma has its own tagset
95. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation…
WSD Approaches
Disambiguation based on manually created rules,
Disambiguation using machine readable dictionaries,
Disambiguation using thesauri,
Disambiguation based on unsupervised machine learning with
corpora.
96. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation…
Lexical Ambiguity
Most words in natural languages have multiple possible meanings.
– “pen” (noun)
» The dog is in the pen.
» The ink is in the pen.
– “take” (verb)
» Take one pill every morning.
» Take the first right past the stoplight.
Syntax helps distinguish meanings for different parts of speech of an
ambiguous word.
– “conduct” (noun or verb)
» John’s conduct in class is unacceptable.
» John will conduct the orchestra on Thursday.
97. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation…
Evaluation of WSD
“In vitro”:
Corpus developed in which one or more ambiguous words are
labeled with explicit sense tags according to some sense inventory.
Corpus used for training and testing WSD and evaluated using
accuracy (percentage of labeled words correctly disambiguated).
» Use most common sense selection as a baseline.
“In vivo”:
Incorporate WSD system into some larger application system, such
as machine translation, information retrieval, or question answering.
Evaluate relative contribution of different WSD methods by
measuring performance impact on the overall system on final task
(accuracy of MT, IR, or QA results).
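The most-common-sense baseline from the “in vitro” setup can be sketched as follows (the sense-tagged data here is invented for illustration):

```python
from collections import Counter

def most_frequent_sense_baseline(train, test):
    """Accuracy of always predicting the sense seen most often for each
    word in sense-tagged training data.

    train/test: lists of (word, sense) pairs.
    """
    counts = {}
    for word, sense in train:
        counts.setdefault(word, Counter())[sense] += 1
    predict = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    correct = sum(predict.get(w) == s for w, s in test)
    return correct / len(test)

train = [("bank", "finance"), ("bank", "finance"), ("bank", "river"),
         ("pen", "enclosure"), ("pen", "writing"), ("pen", "writing")]
test = [("bank", "finance"), ("bank", "river"), ("pen", "writing")]
print(most_frequent_sense_baseline(train, test))  # 2/3
```

Any WSD method evaluated on the same corpus should beat this accuracy to be worth its complexity.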
98. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation…
Issues in WSD
What is the right granularity of a sense inventory?
Integrating WSD with other NLP tasks
Syntactic parsing
Semantic role labeling
Semantic parsing
Does WSD actually improve performance on some real end-user
task?
Information retrieval
Information extraction
Machine translation
Question answering
99. Lexical Semantics and Word-Sense
Disambiguation…
WSD: Area of Research
Assigning the correct sense to words, using an electronic
dictionary as the source of word definitions.
Open research field in Natural Language Processing (NLP).
A hard problem and a popular area of research.
Used in speech synthesis by identifying the correct sense of the
word.
100. Compositional Semantics
Compositional semantics is concerned with the bottom-up
construction of meaning, starting with the lexical items, whose
meanings are generally treated as given.
Compositional semantics: the construction of meaning
(generally expressed as logic) based on syntax.
101. Compositional Semantics…
Frame Semantics
Originally developed by Fillmore 1968.
Frames can represent situations of arbitrary granularity
(elementary or complex) and accordingly frame-semantic
analysis can be conducted on linguistic units of varying sizes, e.g.
phrases, sentences or whole documents,
But most work has been devoted to frame semantics as a
formalism for sentence-level semantic analysis and most
commonly it has been applied for the analysis of verbal
predicate-argument structures.
103. Semantic Role Labeling and
Semantic Parsing
Semantic role labeling
Semantic role labeling, sometimes also called shallow semantic
parsing, is a task in NLP consisting of the detection of the
semantic arguments associated with the predicate or verb of a
sentence and their classification into their specific roles.
For example, given a sentence like “Abebe sold the book to
Hagos", the task would be to recognize the verb "to sell" as
representing the predicate,
“Abebe" as representing the seller (agent), "the book" as
representing the goods (theme), and “Hagos" as representing the
recipient.
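The recognized roles can be written down as a small structured record; a minimal sketch (the frame layout and role names are illustrative, not a standard annotation format):

```python
# Hypothetical SRL output for "Abebe sold the book to Hagos".
srl = {
    "predicate": "sell",       # the verb "sold", lemmatized
    "agent": "Abebe",          # the seller
    "theme": "the book",       # the goods
    "recipient": "Hagos",      # the receiver
}
print(srl["predicate"], srl["agent"], srl["recipient"])
```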
104. Semantic Role Labeling and
Semantic Parsing
Semantic role labeling
Semantic role…
This is an important step towards making sense of the meaning of a
sentence.
A semantic analysis of this sort is at a lower level of abstraction
than a syntax tree, i.e., it has more categories and thus groups
fewer clauses in each category.
For instance, "the book belongs to me" would need two labels such
as "possessed" and "possessor" whereas "the book was sold to
Hagos" would need two other labels such as "goal" (or "theme")
and "receiver" (or "recipient") even though these two clauses would
be very similar as far as "subject" and "object" functions are
concerned.
105. Semantic Role Labeling and
Semantic Parsing…
Semantic Parsing
Traditional sentence parsing is often performed as a method of
understanding the exact meaning of a sentence or word,
sometimes with the aid of devices such as sentence diagrams.
It usually emphasizes the importance of grammatical divisions
such as subject and predicate.
Within computational linguistics parsing is used to refer to the
formal analysis by a computer of a sentence or other string of
words into its constituents, resulting in a parse tree showing their
syntactic relation to each other.
Semantic parsing is the extension of broad-coverage probabilistic
parsers to represent sentence meaning.