1. Chapter 4 : Syntactic Parsing and
Semantic Analysis
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2021)
3. Outline
Semantic Analysis
Lexical semantics and word-sense disambiguation
Compositional semantics
Semantic Role Labeling and Semantic Parsing.
3/29/2024 3
4. Introduction
Syntactic parsing is grammar-driven natural language parsing,
that is, analyzing a string of words (typically a sentence) to
determine its structural description according to a formal
grammar.
In most circumstances, this is not a goal in itself but rather an
intermediary step for the purpose of further processing, such as the
assignment of a meaning to the sentence.
To this end, the desired output of grammar-driven parsing is
typically a hierarchical, syntactic structure suitable for semantic
interpretation.
The string of words constituting the input will usually have been
processed in separate phases of tokenization and lexical analysis,
which are hence not part of parsing proper.
5. Introduction …
To get a grasp of the fundamental problems discussed here, it is
instructive to consider the ways in which parsers for natural
languages differ from parsers for computer languages:
One such difference concerns the power of the grammar
formalisms used, that is, their generative capacity.
A second difference concerns the extreme structural ambiguity of
natural language. A classic example is the following:
Put the block in the box on the table
Assuming that “put” subcategorizes for two objects, there are two
possible analyses:
Put the block [in the box on the table]
Put [the block in the box] on the table
6. Introduction …
Parsers for natural languages differ from parsers for computer
languages…
A third difference stems from the fact that natural language data
are inherently noisy, both because of errors (under some
conception of “error”) and because of the ever persisting
incompleteness of lexicon and grammar relative to the unlimited
number of possible utterances which constitute the language.
In contrast, a computer language has a complete syntax
specification, which means that by definition all correct input
strings are parsable.
7. Introduction …
Parsers for natural languages differ from parsers for computer
languages…
A third difference ...
In natural language parsing, it is notoriously difficult to distinguish
whether a failure to produce a parsing result is due to an error in the
input or to the lack of coverage of the grammar, also because a
natural language by its nature has no precise delimitation.
Thus, input not licensed by the grammar may well be perfectly
adequate according to native speakers of the language.
Moreover, input containing errors may still carry useful bits of
information that might be desirable to try to recover.
Robustness refers to the ability to always produce some result in
response to such input.
8. Basic Concepts
A recognizer is a procedure that determines whether or not an
input sentence is grammatical according to the grammar
(including the lexicon).
A parser is a recognizer that produces associated structural
analyses according to the grammar (e.g. parse trees or feature
terms).
A robust parser attempts to produce useful output, such as a
partial analysis, even if the input is not covered by the grammar.
It is possible to think of a grammar as inducing a search space
consisting of a set of states representing stages of successive
grammar-rule rewritings and a set of transitions between these
states.
9. Basic Concepts…
When analyzing a sentence, the parser (recognizer) must rewrite
the grammar rules in some sequence.
A sequence that connects the initial state (the string consisting
of just the start category of the grammar) with a state consisting
of exactly the string of input words is called a derivation.
Each state in the sequence then consists of a string over V and
is called a sentential form.
If such a sequence exists, the sentence is said to be grammatical
according to the grammar.
10. Basic Concepts…
Parsers can be classified along several dimensions according to
the ways in which they carry out derivations.
One such dimension concerns rule invocation:
In a top-down derivation, each sentential form is produced from its
predecessor by replacing one nonterminal symbol A by a string of
terminal or nonterminal symbols X1 · · · Xd, where A → X1 · · · Xd
is a grammar rule.
Conversely, in a bottom-up derivation, each sentential form is
produced by replacing X1 · · · Xd with A given the same grammar
rule, thus successively applying rules in the reverse direction.
13. Basic Concepts…
Another dimension concerns the way in which the parser deals
with ambiguity, in particular, whether the process is
deterministic or nondeterministic.
In the former case, only a single, irrevocable choice may be made
when the parser is faced with local ambiguity.
This choice is typically based on some form of look ahead or
systematic preference.
A third dimension concerns whether parsing proceeds from left
to right (strictly speaking front to back) through the input or in
some other order, for example, inside-out from the right-hand-
side heads.
14. Rule Based Parsing
The rule-based approach has successfully been used in
developing many natural language processing systems.
Systems that use rule-based transformations are based on a
core of solid linguistic knowledge.
The linguistic knowledge acquired for one natural language
processing system may be reused to build knowledge required
for a similar task in another system.
15. Rule Based Parsing…
The advantage of the rule-based approach over the corpus-
based approach is clear for:
1) Less-resourced languages, for which large corpora, possibly
parallel or bilingual, with representative structures and entities
are neither available nor easily affordable, and
2) For morphologically rich languages, which even with the
availability of corpora suffer from data sparseness.
16. CYK Algorithm
The Cocke–Kasami–Younger (CKY, sometimes written CYK)
algorithm is one of the simplest context-free parsing algorithms.
A reason for its simplicity is that it only works for grammars in
Chomsky Normal Form (CNF).
A grammar is in CNF when each rule is either:
(i) a unary terminal rule of the form A → w, or
(ii) a binary nonterminal rule of the form A → BC.
It is always possible to transform a grammar into CNF such that
it accepts the same language. However, the transformation can
change the structure of the grammar quite radically;
E.g., if the original grammar has n rules, the transformed version
may in the worst case have O(n²) rules.
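The two CNF rule shapes can be checked mechanically; a minimal Python sketch (the grammar encoding and names are my own, for illustration):

```python
def is_cnf(rules, nonterminals):
    """Check the two CNF rule shapes: A -> w (terminal) or A -> B C."""
    for lhs, rhs in rules:
        unary_terminal = len(rhs) == 1 and rhs[0] not in nonterminals
        binary_nonterm = len(rhs) == 2 and all(x in nonterminals for x in rhs)
        if not (unary_terminal or binary_nonterm):
            return False
    return True

nts = {"S", "NP", "VP", "Det", "N", "V"}
rules = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("Det", ("the",))]
print(is_cnf(rules, nts))                      # True
print(is_cnf(rules + [("VP", ("V",))], nts))   # False: unary nonterminal rule
```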
17. CYK Algorithm…
The CKY algorithm builds an upper triangular matrix T , where
each cell Ti,j (0 ≤ i < j ≤ n) is a set of nonterminals.
The meaning of the statement A ∈ Ti,j is that A spans the input
words wi+1 · · · wj, or written more formally, A ⇒∗ wi+1 · · · wj.
18. CYK Algorithm…
CKY is a purely bottom-up algorithm consisting of two parts.
First build the lexical cells Ti−1,i for the input word wi by applying the
lexical grammar rules,
Then the nonlexical cells Ti,k (i < k−1) are filled by applying the
binary grammar rules:
Ti−1,i = { A | A → wi }
Ti,k = { A | A → BC, i < j < k, B ∈ Ti,j, C ∈ Tj,k }
The sentence is recognized by the algorithm if S ∈ T0,n, where S is
the start symbol of the grammar.
To make the algorithm less abstract, one should note that all cells
Ti,j and Tj,k (i < j < k) must already be known when building the
cell Ti,k. This means that it is required to be careful when
designing the i and k loops, so that smaller spans are calculated
before larger spans.
19. CYK Algorithm…
One solution is to start by looping over the end node k, and then
loop over the start node i in the reverse direction.
The pseudo-code is as follows:
procedure CKY(T ,w1 · · · wn)
Ti,j := ∅ for all 0 ≤ i, j ≤ n
for i := 1 to n do
for all lexical rules A → w do
if w = wi then add A to Ti−1,i
for k := 2 to n do
for i := k − 2 downto 0 do
for j := i + 1 to k − 1 do
for all binary rules A → BC do
if B ∈ Ti,j and C ∈ Tj,k then add A to Ti,k
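A direct, minimal Python rendering of this pseudo-code (the grammar encoding below is an assumption for illustration; only recognition, not tree-building, is shown):

```python
def cky_recognize(lexical, binary, start, words):
    """CKY recognizer for a grammar in Chomsky Normal Form.

    lexical: dict mapping a terminal w to the set of A with rule A -> w
    binary:  list of (A, B, C) triples for rules A -> B C
    start:   the start symbol S
    words:   the input sentence as a list of tokens
    """
    n = len(words)
    # T[i][k] is the set of nonterminals spanning words i+1 .. k
    T = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):                      # lexical cells T[i-1][i]
        T[i - 1][i] = set(lexical.get(words[i - 1], ()))
    for k in range(2, n + 1):                      # end position
        for i in range(k - 2, -1, -1):             # start position, downwards
            for j in range(i + 1, k):              # split point
                for A, B, C in binary:
                    if B in T[i][j] and C in T[j][k]:
                        T[i][k].add(A)
    return start in T[0][n]

# Toy CNF grammar: S -> NP VP, NP -> Det N, VP -> V NP
lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}
binary = [("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")]
print(cky_recognize(lexical, binary, "S", "the dog saw the cat".split()))  # True
```

Note that the loop order mirrors the pseudo-code exactly: smaller spans are always filled before larger ones.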
20. CYK Algorithm…
But there are also several alternative possibilities for how to
encode the loops in the CKY algorithm;
E.g., instead of letting the outer k loop range over end positions, it
is equally possible to let it range over span lengths.
It is important to keep in mind, however, that smaller spans must
be calculated before larger spans.
As already mentioned, the CKY algorithm can only handle
grammars in CNF.
Furthermore, converting a grammar to CNF is a bit
complicated, and can make the resulting grammar much larger.
Instead, it is possible to modify the CKY algorithm directly to
handle unary grammar rules and longer right-hand sides.
21. Top-down and Bottom-up
Top-down parsing:
Builds only trees that have S at the root node, but may lead to trees
that do not yield the sentence.
In naive search, top-down parsing is inefficient because structures
are created over and over again.
Need a way to record that a particular structure has been predicted.
Need a way to record where the structure was predicted with respect
to the input.
Bottom-up parsing:
Builds only trees that yield the sentence, but may lead to trees that
do not have S at the root.
22. Top-down and Bottom-up…
Pros/cons of top-down strategy:
Never explores trees that are not potential solutions, i.e., ones with
the wrong kind of root node.
But explores trees that do not match the input sentence (predicts
input before inspecting input).
Naive top-down parsers never terminate if G contains left-recursive
rules like X → X Y.
Backtracking may discard valid constituents that have to be re-
discovered later (duplication of effort).
Use a top-down strategy when you know what kind of constituent
you want to end up with (e.g. NP extraction, named entity
extraction). Avoid this strategy if you're stuck with a highly
recursive grammar.
23. Earley's Algorithm Grammar
Formalisms and Treebanks
Earley Algorithm
The Earley algorithm is a parsing algorithm for arbitrary context-
free grammars.
The Earley Parsing Algorithm is an efficient top-down parsing
algorithm that avoids some of the inefficiency associated with
purely naive search with the same top-down strategy (cf.
recursive descent parser).
Intermediate solutions are created only once and stored in a chart
(dynamic programming).
Left-recursion problem is solved by examining the input.
Earley is not picky about what type of grammar it accepts, i.e., it
accepts arbitrary CFGs (cf. CKY).
24. Earley's Algorithm Grammar
Formalisms and Treebanks…
Earley Algorithm…
Earley Parsing Algorithm
Start with the start symbol S.
Take the leftmost non-terminal and predict all possible expansions.
If the next symbol in the expansion is a word, match it against the
input sentence (scan); otherwise, repeat.
If there is nothing more to expand, the subtree is complete; in this
case, continue with the next incomplete subtree.
25. Earley's Algorithm Grammar
Formalisms and Treebanks…
Earley Algorithm…
Dotted rules
A dotted rule is a partially processed rule.
Example: S → NP • VP
The dot can be placed in front of the first symbol, behind the last
symbol, or between two symbols on the right-hand side of a rule.
The general form of a dotted rule thus is A → α • β , where A → αβ
is the original, non-dotted rule.
26. Earley's Algorithm Grammar
Formalisms and Treebanks…
Earley Algorithm…
Chart entries
The chart contains entries of the form [min, max, A → α • β], where
min and max are positions in the input and A → α • β is a dotted
rule.
Such an entry says: ‘We have built a parse tree whose first rule is A
→ αβ and where the part of this rule that corresponds to α covers
the words between min and max.’
40. Earley's Algorithm Grammar
Formalisms and Treebanks…
Earley Algorithm…
Earley: fundamental operations
Predict sub-structure (based on grammar)
Scan partial solutions for a match
Complete a sub-structure (i.e., build constituents)
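These three operations can be sketched as a minimal Earley recognizer in Python (the rule encoding and item format are illustrative simplifications; epsilon rules and back pointers for tree retrieval are omitted):

```python
def earley_recognize(rules, start, words):
    """Minimal Earley recognizer.

    rules: dict mapping a nonterminal A to a list of right-hand sides,
           each a tuple of symbols; terminals are symbols not in `rules`.
    An item (A, rhs, dot, origin) is the dotted rule A -> rhs with the
    dot at position `dot`, predicted at input position `origin`.
    """
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in rules[start]}
    for pos in range(n + 1):
        agenda = list(chart[pos])
        while agenda:
            A, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in rules:                       # predict
                    for r in rules[sym]:
                        item = (sym, r, 0, pos)
                        if item not in chart[pos]:
                            chart[pos].add(item)
                            agenda.append(item)
                elif pos < n and words[pos] == sym:    # scan
                    chart[pos + 1].add((A, rhs, dot + 1, origin))
            else:                                      # complete
                for B, r, d, o in list(chart[origin]):
                    if d < len(r) and r[d] == A:
                        item = (B, r, d + 1, o)
                        if item not in chart[pos]:
                            chart[pos].add(item)
                            agenda.append(item)
    return any(A == start and dot == len(rhs) and origin == 0
               for A, rhs, dot, origin in chart[n])

rules = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP")],
    "Det": [("the",)],
    "N": [("dog",), ("cat",)],
    "V": [("saw",)],
}
print(earley_recognize(rules, "S", "the dog saw the cat".split()))  # True
```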
41. Earley's Algorithm Grammar
Formalisms and Treebanks…
Earley Algorithm…
Recogniser/parser
Recognizer: when parsing is complete, check whether there is a
chart entry [0, n, S → α •].
If we want a parser, we have to add back pointers, and
retrieve a tree.
Earley’s algorithm can be used for PCFGs, but the probabilistic
extension is more complicated than for CKY.
42. Earley's Algorithm Grammar
Formalisms and Treebanks…
Earley's Algorithm Grammar Formalisms
Grammar formalisms are mathematically precise notations for
formalizing a theory of grammar.
CFG has been the most influential grammar formalism for
describing language syntax.
This is not because CFG has been generally adopted as such for
linguistic description, but rather because most grammar
formalisms are derived from or can somehow be related to CFG.
For this reason, CFG is often used as a base formalism when
parsing algorithms are described.
44. Earley's Algorithm Grammar
Formalisms and Treebanks…
Earley's Algorithm Treebanks
Treebanks are corpora in which each sentence has been annotated
with a syntactic analysis.
Producing a high-quality treebank is both time-consuming and
expensive.
One of the most widely known treebanks is the Penn TreeBank
(PTB).
46. Earley's Algorithm Grammar
Formalisms and Treebanks…
Treebank Grammars:
Given a treebank, it is possible to construct a grammar by
reading rules off the phrase structure trees.
A treebank grammar will account for all analyses in the
treebank.
It will also account for sentences that were not observed in the
treebank.
The simplest way to obtain rule probabilities is relative
frequency estimation.
Step 1: Count the number of occurrences of each rule in the
treebank.
Step 2: Divide this number by the total number of rule
occurrences for the same left-hand side.
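These two steps can be sketched in Python (the rule counts below are invented, not read from any real treebank):

```python
from collections import Counter, defaultdict

# Step 1: hypothetical rule occurrences counted from a toy treebank.
rule_counts = Counter({
    ("S", ("NP", "VP")): 100,
    ("NP", ("Det", "N")): 60,
    ("NP", ("NP", "PP")): 40,
    ("VP", ("V", "NP")): 70,
    ("VP", ("VP", "PP")): 30,
})

# Step 2: divide by the total count for the same left-hand side.
lhs_totals = defaultdict(int)
for (lhs, rhs), c in rule_counts.items():
    lhs_totals[lhs] += c
probs = {rule: c / lhs_totals[rule[0]] for rule, c in rule_counts.items()}

print(probs[("NP", ("Det", "N"))])   # 0.6
print(probs[("VP", ("V", "NP"))])    # 0.7
```

By construction the probabilities for each left-hand side sum to 1, which is exactly the PCFG well-formedness condition discussed later.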
47. CKY Versus Earley
The CKY algorithm has two disadvantages:
It can only handle restricted grammars (CNF).
It does not use top–down information.
The Earley algorithm does not have these disadvantages:
The Earley algorithm is a parsing algorithm for arbitrary context-
free grammars.
In contrast to the CKY algorithm, it also uses top–down
information.
On the downside, it is more complicated.
In contrast to the CKY algorithm, its probabilistic extension is not
straightforward.
48. Efficient Parsing for Context-Free
Grammars (CFGs)…
The standard way of defining a CFG is as a tuple G = (Σ, N, S,
R), where Σ and N are disjoint finite sets of terminal and
nonterminal symbols, respectively, and S ∈ N is the start
symbol.
The nonterminals are also called categories, and the set V = N
∪ Σ contains the symbols of the grammar.
R is a finite set of production rules of the form A → α, where A
∈ N is a nonterminal and α ∈ V∗ is a sequence of symbols.
49. Efficient Parsing for Context-Free
Grammars (CFGs)…
Although there are several conventions, the following will be
used here:
Capital letters A, B, C, . . . for nonterminals,
Lower-case letters s, t, w, . . . for terminal symbols, and
Uppercase X, Y, Z, . . . for general symbols (elements in V).
Greek letters α, β, γ , . . . will be used for sequences of symbols,
and
ε for the empty sequence.
50. Efficient Parsing for Context-Free
Grammars (CFGs)…
Although there are several conventions, the following will be
used here…
The rewriting relation ⇒ is defined by αBγ ⇒ αβγ if and only if B → β is a rule in R.
A phrase is a sequence of terminals β ∈ Σ∗ such that A ⇒ · · · ⇒ β for
some A ∈ N.
Accordingly, the term phrase structure grammar is sometimes used for
grammars with at least context-free power.
The sequence of rule expansions is called a derivation of β from A.
A (grammatical) sentence is a phrase that can be derived from the start
symbol S.
The string language L(G) accepted by G is the set of sentences of G.
Some algorithms only work for particular normal forms of
CFGs.
51. Efficient Parsing for Context-Free
Grammars (CFGs)…
In practice, pure CFG is not widely used for developing natural
language grammars (though grammar based language modeling
in speech recognition is one such case).
One reason for this is that CFG is not expressive enough—it
cannot describe all peculiarities of natural language,
E.g., Geez, Swiss–German or Dutch scrambling, or Scandinavian
long-distance dependencies.
But the main practical reason is that it is difficult to use;
E.g., agreement, inflection, and other common phenomena are
complicated to describe using CFG.
52. Efficient Parsing for Context-Free
Grammars (CFGs)…
Example
The example grammar in the Figure is overgenerating: it
recognizes both the noun phrases “a men” and “an man,” as well
as the sentence “the men mans a ship.”
However, to make the grammar syntactically correct, we must
duplicate the categories Noun, Det, and NP into singular and plural
versions.
All grammar rules involving these categories must be duplicated too.
And if the language is, e.g., German, then Det and Noun have to be
inflected for number (SING/PLUR), gender (FEM/NEUTR/MASC),
and case (NOM/ACC/DAT/GEN).
53. Statistical Parsing and Probabilistic
CFGs (PCFGs)
Statistical Parsing
Statistical parsing uses a probabilistic model of syntax in order to
assign probabilities to each parse tree.
Provides a principled approach to resolving syntactic ambiguity.
Allows supervised learning of parsers from tree-banks of parse
trees provided by human linguists.
Also allows unsupervised learning of parsers from unannotated
text, but the accuracy of such parsers has been limited.
54. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
PCFG
A PCFG is a probabilistic version of a CFG where each
production has a probability.
Probabilities of all productions rewriting a given non-terminal
must add to 1, defining a distribution for each non-terminal.
String generation is now probabilistic where production
probabilities are used to non-deterministically select a production
for rewriting a given non-terminal.
56. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
Sentence probability (Derivation Probability):
Assume productions for each node are chosen independently.
Probability of derivation is the product of the probabilities of its
productions.
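This product rule can be sketched in a few lines (the grammar, probabilities, and derivation below are invented for illustration):

```python
import math

# Hypothetical rule probabilities for a toy PCFG.
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("VP", ("V", "NP")): 0.7,
    ("Det", ("the",)): 0.5,
    ("N", ("dog",)): 0.3,
    ("N", ("cat",)): 0.2,
    ("V", ("saw",)): 0.4,
}

# A derivation is the multiset of rules used in the parse tree of
# "the dog saw the cat"; its probability is the product of their probs.
derivation = [
    ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("Det", ("the",)),
    ("N", ("dog",)), ("VP", ("V", "NP")), ("V", ("saw",)),
    ("NP", ("Det", "N")), ("Det", ("the",)), ("N", ("cat",)),
]
p = math.prod(rule_prob[r] for r in derivation)
print(p)  # ≈ 0.001512
```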
57. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
Syntactic Disambiguation:
Resolve ambiguity by picking most probable parse tree.
58. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
Sentence Probability:
Probability of a sentence is the sum of the probabilities of all of
its derivations.
P(“book the flight through Houston”) =
P(D1) + P(D2) = 0.0000216 + 0.00001296
= 0.00003456
59. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
Three Useful PCFG Tasks:
Observation likelihood: to classify and order sentences.
Useful for language modeling for speech recognition,
translation, word prediction.
Parse trees are richer language models than N-grams.
Most likely derivation: To determine the most likely parse tree for
a sentence.
Maximum likelihood training: To train a PCFG to fit empirical
training data.
60. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
PCFG: Observation Likelihood
There is an algorithm called the Inside algorithm for efficiently
determining how likely a string is to be produced by a PCFG.
Can use a PCFG as a language model to choose between
alternative sentences for speech recognition or machine
translation.
61. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
PCFG: Most Likely Derivation:
There is an analog to the Viterbi algorithm to efficiently
determine the most probable derivation (parse tree) for a
sentence.
63. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
PCFG: Supervised Training
If parse trees are provided for training sentences, a grammar and
its parameters can all be estimated directly from counts
accumulated from the tree-bank (with appropriate smoothing).
64. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
PCFG: Maximum Likelihood Training
Given a set of sentences, induce a grammar that maximizes the
probability that this data was generated from this grammar.
Assume the number of non-terminals in the grammar is specified.
Only need to have an unannotated set of sequences generated
from the model.
Does not need correct parse trees for these sentences.
In this sense, it is unsupervised.
65. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
PCFG: Maximum Likelihood Training
66. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
Inside-Outside:
The Inside-Outside algorithm is a version of EM for
unsupervised learning of a PCFG.
Analogous to Baum-Welch (forward-backward) for HMMs.
Given the number of non-terminals, construct all possible CNF
productions with these non-terminals and observed terminal
symbols.
Use EM to iteratively train the probabilities of these productions
to locally maximize the likelihood of the data.
Experimental results are not impressive, but recent work imposes
additional constraints to improve unsupervised grammar
learning.
67. Statistical Parsing and Probabilistic
CFGs (PCFGs)…
Vanilla PCFG Limitations:
Since probabilities of productions do not rely on specific words
or concepts, only general structural disambiguation is possible
(e.g. prefer to attach PPs to Nominals).
Consequently, vanilla PCFGs cannot resolve syntactic
ambiguities that require semantics to resolve, e.g., the PP
attachment in “ate spaghetti with a fork” vs. “ate spaghetti with
meatballs.”
In order to work well, PCFGs must be lexicalized, i.e.,
productions must be specialized to specific words by including
their head word in their LHS non-terminals (e.g., VP(ate)).
68. Lexicalized PCFGs
Example of Importance of Lexicalization:
A general preference for attaching PPs to NPs rather than VPs
can be learned by a vanilla PCFG.
But the desired preference can depend on specific words.
70. Lexicalized PCFGs…
Head-Words:
Syntactic phrases usually have a word in them that is most
“central” to the phrase.
Linguists have defined the concept of a lexical head of a phrase.
Simple rules can identify the head of any phrase by percolating
head words up the parse tree.
Head of a VP is the main verb,
Head of an NP is the main noun,
Head of a PP is the preposition,
Head of a sentence is the head of its VP.
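These head rules can be sketched as a small percolation procedure (the tree encoding and the rule table are simplified illustrations):

```python
# Head rules: for each phrase label, which child category is the head.
HEAD_RULES = {
    "S": "VP",   # head of a sentence is the head of its VP
    "VP": "V",   # head of a VP is the main verb
    "NP": "N",   # head of an NP is the main noun
    "PP": "P",   # head of a PP is the preposition
}

def head_word(tree):
    """Percolate head words up a tree given as (label, children) pairs,
    where a preterminal is (POS, word)."""
    label, body = tree
    if isinstance(body, str):          # preterminal: return the word itself
        return body
    wanted = HEAD_RULES.get(label)
    for child in body:
        if child[0] == wanted:
            return head_word(child)
    return head_word(body[0])          # fallback: leftmost child

tree = ("S", [("NP", [("Det", "the"), ("N", "dog")]),
              ("VP", [("V", "saw"), ("NP", [("Det", "the"), ("N", "cat")])])])
print(head_word(tree))  # saw
```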
71. Lexicalized PCFGs…
Lexicalized Productions
Specialized productions can be generated by including the head
word (and its POS) of each non-terminal as part of that non-
terminal’s symbol.
73. Lexicalized PCFGs…
Parameterizing Lexicalized Productions
Accurately estimating parameters on such a large number of very
specialized productions could require enormous amounts of
treebank data.
Need some way of estimating parameters for lexicalized
productions that makes reasonable independence assumptions so
that accurate probabilities for very specific rules can be learned.
Collins (1999) introduced one approach to learning effective
parameters for a lexicalized grammar.
74. Treebanks
English Penn Treebank: Standard corpus for testing syntactic
parsing consists of 1.2 M words of text from the Wall Street
Journal (WSJ).
Typical to train on about 40,000 parsed sentences and test on an
additional standard disjoint test set of 2,416 sentences.
Chinese Penn Treebank: 100K words from the Xinhua news
service.
Other corpora exist in many languages; see the Wikipedia
article “Treebank”.
76. Parsing Evaluation Metrics
PARSEVAL metrics measure the fraction of the constituents
that match between the computed and human parse trees.
If P is the system’s parse tree and T is the human parse tree
(the “gold standard”):
Recall = (# correct constituents in P) / (# constituents in T)
Precision = (# correct constituents in P) / (# constituents in P)
Labeled precision and labeled recall additionally require the non-
terminal label on the constituent node to be correct for the
constituent to count as correct.
F1 is the harmonic mean of precision and recall.
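The metrics can be sketched over constituent spans (the span encoding and the toy trees below are invented for illustration):

```python
from collections import Counter

def parseval(system, gold):
    """Labeled precision, recall, and F1 over constituents.

    Each parse is given as a collection of (label, start, end) spans.
    """
    sys_c, gold_c = Counter(system), Counter(gold)
    correct = sum((sys_c & gold_c).values())   # multiset intersection
    precision = correct / sum(sys_c.values())
    recall = correct / sum(gold_c.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
system = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
p, r, f = parseval(system, gold)
print(p, r, f)  # 0.75 0.75 0.75 (the mislabeled span 3-5 counts as wrong)
```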
78. Parsing Evaluation Metrics…
Treebank Results:
Results of current state-of-the-art systems on the English Penn
WSJ treebank are 91-92% labeled F1.
Statistical models such as PCFGs allow for probabilistic
resolution of ambiguities.
PCFGs can be easily learned from treebanks.
Lexicalization and non-terminal splitting are required to
effectively resolve many ambiguities.
Current statistical parsers are quite accurate but not yet at the
level of human-expert agreement.
80. Outline
Semantic Analysis:
Lexical semantics and word-sense disambiguation
Compositional semantics
Semantic Role Labeling and Semantic Parsing.
81. Introduction
Semantic analysis refers to analyzing the meanings of words,
fixed expressions, whole sentences, and utterances in context.
In practice, this means translating original expressions into some
kind of semantic metalanguage.
The major theoretical issues in semantic analysis therefore turn
on the nature of the metalanguage or equivalent
representational system.
82. Introduction…
For extended texts, specific NLP applications of semantic
analysis may include:
Information retrieval,
Information extraction,
Text summarization,
Data-mining, and
Machine translation and translation aids.
83. Introduction…
Semantic analysis is also pertinent for much shorter texts, right
down to the single word level,
For example, in understanding user queries and matching user
requirements to available data.
Semantic analysis is also of high relevance in efforts to improve
Web ontologies and knowledge representation systems.
84. Introduction…
Various theories of and approaches to semantic representation
can be roughly arranged along two dimensions:
(1) formal vs. cognitive and
(2) compositional vs. lexical
Formal theories have been strongly advocated since the late
1960s while cognitive approaches have become popular in the
last three decades, driven also by influences from cognitive
science and psychology.
85. Introduction…
Compositional semantics is concerned with the bottom-up
construction of meaning, starting with the lexical items, whose
meanings are generally treated as given.
Lexical semantics, on the other hand, aims at precisely
analyzing the meanings of lexical items, either by analyzing
their internal structure and content (decompositional
approaches) or by representing their relations to other elements
in the lexicon (relational approaches).
86. Lexical Semantics and Word-Sense
Disambiguation
Three Perspectives on Meaning
1. Lexical Semantics
The meanings of individual words.
2. Formal Semantics (or Compositional Semantics or Sentential
Semantics)
How those meanings combine to make meanings for individual
sentences or utterances.
3. Discourse or Pragmatics
How those meanings combine with each other and with other facts
about various kinds of context to make meanings for a text or
discourse.
Dialog or Conversation is often lumped together with Discourse.
87. Lexical Semantics and Word-Sense
Disambiguation…
Lexical Semantics
Can be defined as the study of what individual lexical items mean,
why they mean what they do, how we can represent all of this,
and where the combined interpretation for an utterance comes
from.
Lexical semantics is concerned with the identification and
representation of the semantics of lexical items.
If we are to identify the semantics of lexical items, we have to be
prepared for the eventuality of a given word having multiple
interpretations = polysemy (cf. monosemy).
Polysemy = the condition of a single lexical item having
multiple meanings.
88. Lexical Semantics and Word-Sense
Disambiguation…
Lexical Semantics…
There is a traditional division made between lexical semantics and
supralexical semantics.
Lexical semantics, which concerns itself with the meanings of
words and fixed word combinations,
Supralexical (combinational, or compositional) semantics, which
concerns itself with the meanings of the indefinitely large number of
word combinations—phrases and sentences—allowable under the
grammar.
While there is some obvious appeal and validity to this division, it
is increasingly recognized that word-level semantics and
grammatical semantics interact and interpenetrate in various
ways.
90. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation
Many tasks in natural language processing require
disambiguation of ambiguous words.
Question Answering
Information Retrieval
Machine Translation
Text Mining
Phone Help Systems
Understanding how people disambiguate words is an interesting
problem that can provide insight in psycholinguistics.
91. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation
Task of determining the meaning of an ambiguous word in the
given context.
Bank:
Edge of a river
or
Financial institution that accepts money
Refers to the resolution of lexical semantic ambiguity and its goal
is to attribute the correct senses to words.
92. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation…
Given
A word in context,
A fixed inventory of potential word senses.
Decide which sense of the word this is:
English-to-Spanish MT
Inventory is the set of Spanish translations
Speech Synthesis
Inventory is homographs with different pronunciations like bass
and bow.
Automatic indexing of medical articles
MeSH (Medical Subject Headings) thesaurus entries.
93. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation…
Two variants of WSD task
Lexical Sample task:
Small pre-selected set of target words
And inventory of senses for each word
All-words task:
Every word in an entire text
A lexicon with senses for each word
Sort-of like part-of-speech tagging
» Except each lemma has its own tagset
95. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation…
WSD Approaches
Disambiguation based on manually created rules,
Disambiguation using machine readable dictionaries,
Disambiguation using thesauri,
Disambiguation based on unsupervised machine learning with
corpora.
96. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation…
Lexical Ambiguity
Most words in natural languages have multiple possible meanings.
– “pen” (noun)
» The dog is in the pen.
» The ink is in the pen.
– “take” (verb)
» Take one pill every morning.
» Take the first right past the stoplight.
Syntax helps distinguish meanings for different parts of speech of an
ambiguous word.
– “conduct” (noun or verb)
» John’s conduct in class is unacceptable.
» John will conduct the orchestra on Thursday.
97. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation…
Evaluation of WSD
“In vitro”:
Corpus developed in which one or more ambiguous words are
labeled with explicit sense tags according to some sense inventory.
Corpus used for training and testing WSD and evaluated using
accuracy (percentage of labeled words correctly disambiguated).
» Use most common sense selection as a baseline.
“In vivo”:
Incorporate WSD system into some larger application system, such
as machine translation, information retrieval, or question answering.
Evaluate relative contribution of different WSD methods by
measuring performance impact on the overall system on final task
(accuracy of MT, IR, or QA results).
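The most-common-sense baseline from the “in vitro” setup can be sketched as follows (the sense-tagged data here is invented for illustration):

```python
from collections import Counter

def most_frequent_sense_baseline(train, test):
    """Accuracy of always predicting the sense seen most often for each
    word in sense-tagged training data.

    train/test: lists of (word, sense) pairs.
    """
    counts = {}
    for word, sense in train:
        counts.setdefault(word, Counter())[sense] += 1
    predict = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    correct = sum(predict.get(w) == s for w, s in test)
    return correct / len(test)

train = [("bank", "finance"), ("bank", "finance"), ("bank", "river"),
         ("pen", "enclosure"), ("pen", "writing"), ("pen", "writing")]
test = [("bank", "finance"), ("bank", "river"), ("pen", "writing")]
print(most_frequent_sense_baseline(train, test))  # 2/3
```

Any WSD method evaluated on the same corpus should beat this accuracy to be worth its complexity.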
98. Lexical Semantics and Word-Sense
Disambiguation…
Word-Sense Disambiguation…
Issues in WSD
What is the right granularity of a sense inventory?
Integrating WSD with other NLP tasks
Syntactic parsing
Semantic role labeling
Semantic parsing
Does WSD actually improve performance on some real end-user
task?
Information retrieval
Information extraction
Machine translation
Question answering
99. Lexical Semantics and Word-Sense
Disambiguation…
WSD: Area of Research
Assigning the correct sense to words, using an electronic
dictionary as the source of word definitions.
Open research field in Natural Language Processing (NLP).
A hard problem and a popular area of research.
Used in speech synthesis by identifying the correct sense of the
word.
100. Compositional Semantics
Compositional semantics is concerned with the bottom-up
construction of meaning, starting with the lexical items, whose
meanings are generally treated as given.
Compositional semantics: the construction of meaning
(generally expressed as logic) based on syntax.
101. Compositional Semantics…
Frame Semantics
Originally developed by Fillmore 1968.
Frames can represent situations of arbitrary granularity
(elementary or complex) and accordingly frame-semantic
analysis can be conducted on linguistic units of varying sizes, e.g.
phrases, sentences or whole documents,
But most work has been devoted to frame semantics as a
formalism for sentence-level semantic analysis and most
commonly it has been applied for the analysis of verbal
predicate-argument structures.
103. Semantic Role Labeling and
Semantic Parsing
Semantic role labeling
Semantic role labeling, sometimes also called shallow semantic
parsing, is a task in NLP consisting of the detection of the
semantic arguments associated with the predicate or verb of a
sentence and their classification into their specific roles.
For example, given a sentence like “Abebe sold the book to
Hagos", the task would be to recognize the verb "to sell" as
representing the predicate,
“Abebe" as representing the seller (agent), "the book" as
representing the goods (theme), and “Hagos" as representing the
recipient.
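The recognized roles can be written down as a small structured record; a minimal sketch (the frame layout and role names are illustrative, not a standard annotation format):

```python
# Hypothetical SRL output for "Abebe sold the book to Hagos".
srl = {
    "predicate": "sell",       # the verb "sold", lemmatized
    "agent": "Abebe",          # the seller
    "theme": "the book",       # the goods
    "recipient": "Hagos",      # the receiver
}
print(srl["predicate"], srl["agent"], srl["recipient"])
```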
104. Semantic Role Labeling and
Semantic Parsing
Semantic role labeling
Semantic role…
This is an important step towards making sense of the meaning of a
sentence.
A semantic analysis of this sort is at a lower level of abstraction
than a syntax tree, i.e., it has more categories and thus groups
fewer clauses in each category.
For instance, "the book belongs to me" would need two labels such
as "possessed" and "possessor" whereas "the book was sold to
Hagos" would need two other labels such as "goal" (or "theme")
and "receiver" (or "recipient") even though these two clauses would
be very similar as far as "subject" and "object" functions are
concerned.
105. Semantic Role Labeling and
Semantic Parsing…
Semantic Parsing
Traditional sentence parsing is often performed as a method of
understanding the exact meaning of a sentence or word,
sometimes with the aid of devices such as sentence diagrams.
It usually emphasizes the importance of grammatical divisions
such as subject and predicate.
Within computational linguistics parsing is used to refer to the
formal analysis by a computer of a sentence or other string of
words into its constituents, resulting in a parse tree showing their
syntactic relation to each other.
Semantic parsing is the extension of broad-coverage probabilistic
parsers to represent sentence meaning.