SlideShare a Scribd company logo
1 of 68
Download to read offline
Introduction
Example Extraction
Example Selection
Results
Conclusion
Extracting
Sense-Disambiguated Example Sentences
From Parallel Corpora
Gerard de Melo and Gerhard Weikum
Max Planck Institute for Informatics
SaarbrĀØucken, Germany
2009-09-18
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Outline
1 Introduction
2 Example Extraction
3 Example Selection
4 Results
5 Conclusion
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Outline
1 Introduction
2 Example Extraction
3 Example Selection
4 Results
5 Conclusion
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
a thin, sharp-pointed metal pin with a raised spiral
thread running around it and a slotted head, used to join
things together by being rotated in under pressure
(Concise OED)
Use a crosshead screwdriver to tighten the screws inserted
in the pre-fixed connectors.
(Web)
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
a thin, sharp-pointed metal pin with a raised spiral
thread running around it and a slotted head, used to join
things together by being rotated in under pressure
(Concise OED)
Use a crosshead screwdriver to tighten the screws inserted
in the pre-fixed connectors.
(Web)
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Example Sentences
any sentence that contains a word (being used in a speciļ¬c
sense)
allow the user to grasp a wordā€™s meaning
see circumstances a word would typically be used in
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Example Sentences
any sentence that contains a word (being used in a speciļ¬c
sense)
allow the user to grasp a wordā€™s meaning
see circumstances a word would typically be used in
traditional intensional word deļ¬nitions may be too confusing
humans used to deriving meaning from context
users can verify whether they have understood deļ¬nition correctly
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Example Sentences
any sentence that contains a word (being used in a speciļ¬c
sense)
allow the user to grasp a wordā€™s meaning
see circumstances a word would typically be used in
possible contexts, e.g. child vs. youngster
typical collocations, e.g. to give birth or birth rate
but not *to give nascence or *nascence rate
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Example Sentences
=ā‡’ most modern dictionaries include example sentences
number, length often limited
digital dictionaries: tight space constraints of print media no
longer apply!
larger number of example sentences can be presented to the
user on demand
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Example Sentences
=ā‡’ most modern dictionaries include example sentences
number, length often limited
digital dictionaries: tight space constraints of print media no
longer apply!
larger number of example sentences can be presented to the
user on demand
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Example Sentences
=ā‡’ most modern dictionaries include example sentences
number, length often limited
digital dictionaries: tight space constraints of print media no
longer apply!
larger number of example sentences can be presented to the
user on demand
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Example Sentences
=ā‡’ most modern dictionaries include example sentences
number, length often limited
digital dictionaries: tight space constraints of print media no
longer apply!
larger number of example sentences can be presented to the
user on demand
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Goals
automatically obtain example sentences for a speciļ¬c sense of
a word
choose set of representative sentences to present to user
distinguish senses:
There were many bats flying out of the cave.
vs.
In professional baseball, only wooden bats
are permitted.
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Goals
automatically obtain example sentences for a speciļ¬c sense of
a word
choose set of representative sentences to present to user
not all examples are equally useful
screen space may be limited (can still show more only on demand)
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Outline
1 Introduction
2 Example Extraction
3 Example Selection
4 Results
5 Conclusion
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Sense Index (Dictionary)
English: Princeton WordNet 3.0
Spanish: Spanish WordNet
Ļƒ(t): get senses for term t
Ļƒ(t, s): is sense s a sense of term t?
behind the scenes: morphological analysis, multi-word
expression detection
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Sense Index (Dictionary)
English: Princeton WordNet 3.0
Spanish: Spanish WordNet
Ļƒ(t): get senses for term t
Ļƒ(t, s): is sense s a sense of term t?
behind the scenes: morphological analysis, multi-word
expression detection
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Sense Index (Dictionary)
English: Princeton WordNet 3.0
Spanish: Spanish WordNet
Ļƒ(t): get senses for term t
Ļƒ(t, s): is sense s a sense of term t?
behind the scenes: morphological analysis, multi-word
expression detection
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Sense Disambiguation
disambiguate word occurrences, then use sentence as example
for those word senses
ļ¬ne-grained WSD not reliable enough (cf. SemEval results)
idea: use parallel corpora, jointly look at both versions of text
=ā‡’ greater accuracy can be achieved
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Sense Disambiguation
disambiguate word occurrences, then use sentence as example
for those word senses
ļ¬ne-grained WSD not reliable enough (cf. SemEval results)
idea: use parallel corpora, jointly look at both versions of text
=ā‡’ greater accuracy can be achieved
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Sense Disambiguation
disambiguate word occurrences, then use sentence as example
for those word senses
ļ¬ne-grained WSD not reliable enough (cf. SemEval results)
idea: use parallel corpora, jointly look at both versions of text
=ā‡’ greater accuracy can be achieved
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Sense Disambiguation
disambiguate word occurrences, then use sentence as example
for those word senses
ļ¬ne-grained WSD not reliable enough (cf. SemEval results)
idea: use parallel corpora, jointly look at both versions of text
=ā‡’ greater accuracy can be achieved
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Sense Disambiguation
There were many bats flying out of the cave.
Aus der HĀØohle flogen viele FledermĀØause.
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Sense Disambiguation
There were many bats flying out of the cave.
Aus der HĀØohle flogen viele FledermĀØause.
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Parallel Disambiguation
req.: text available in two languages a, b
good news: more and more parallel corpora available
word alignment
compute score
wsd(sa|ta, tb) = wsd(sa|ta) Ļƒ(ta,sa)csim(tb,sa)
s āˆˆĻƒ(ta)
Ļƒ(ta,s )csim(tb,s )
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Parallel Disambiguation
req.: text available in two languages a, b
good news: more and more parallel corpora available
word alignment
compute score
wsd(sa|ta, tb) = wsd(sa|ta) Ļƒ(ta,sa)csim(tb,sa)
s āˆˆĻƒ(ta)
Ļƒ(ta,s )csim(tb,s )
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Parallel Disambiguation
req.: text available in two languages a, b
good news: more and more parallel corpora available
word alignment
compute score
wsd(sa|ta, tb) = wsd(sa|ta) Ļƒ(ta,sa)csim(tb,sa)
s āˆˆĻƒ(ta)
Ļƒ(ta,s )csim(tb,s )
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Components
csim(tb, sa): cross-lingual similarity
monolingual WSD: compare bag-of-words vector for term
context with sense context (constructed from deļ¬nitions, etc.)
Semantic Similarity sim(s1, s2)
identify only near-identical senses (e.g. of house and home),
not arbitrary associations (e.g. house and door)
csim(tb, sa) =
sbāˆˆĻƒ(tb)
sim(sa, sb) wsd(sb|tb)
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Components
csim(tb, sa): cross-lingual similarity
monolingual WSD: compare bag-of-words vector for term
context with sense context (constructed from deļ¬nitions, etc.)
Semantic Similarity sim(s1, s2)
identify only near-identical senses (e.g. of house and home),
not arbitrary associations (e.g. house and door)
wsd(s|t) = Ļƒ(t, s) Ī± + v(s)T
v(t)
||v(s)|| ||v(t)||
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Components
csim(tb, sa): cross-lingual similarity
monolingual WSD: compare bag-of-words vector for term
context with sense context (constructed from deļ¬nitions, etc.)
Semantic Similarity sim(s1, s2)
identify only near-identical senses (e.g. of house and home),
not arbitrary associations (e.g. house and door)
sim(s1, s2) =
ļ£±
ļ£“ļ£“ļ£“ļ£²
ļ£“ļ£“ļ£“ļ£³
1 s1 = s2
1 s1, s2 in near-synonymy relationship
1 s1, s2 in hypernymy/hyponymy relationship
0 otherwise
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Extraction
Use as example sentence iļ¬€ score suļ¬ƒciently high
wsd(sa|ta, tb) = wsd(ta, sa) Ļƒ(ta,sa)csim(tb,sa)
s āˆˆĻƒ(ta)
Ļƒ(ta,s )csim(tb,s )
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Outline
1 Introduction
2 Example Extraction
3 Example Selection
4 Results
5 Conclusion
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Task Motivation
for computational applications: more data =ā‡’ better data
for human users: a good limited selection should be provided
at ļ¬rst
assumption: space constraint k - number of sentences
goal: choose a good set of k example sentences
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Task Motivation
for computational applications: more data =ā‡’ better data
for human users: a good limited selection should be provided
at ļ¬rst
assumption: space constraint k - number of sentences
goal: choose a good set of k example sentences
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Task Motivation
for computational applications: more data =ā‡’ better data
for human users: a good limited selection should be provided
at ļ¬rst
assumption: space constraint k - number of sentences
goal: choose a good set of k example sentences
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
What is a good example sentence?
one that showcases how a word is used in context
(typical prepositions, collocations, etc.)
one that helps in grasping the meaning
related work by Rychly et al. (2008):
one that is intelligible to learners
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
What is a good example sentence?
one that showcases how a word is used in context
(typical prepositions, collocations, etc.)
one that helps in grasping the meaning
related work by Rychly et al. (2008):
one that is intelligible to learners
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
What is a good example sentence?
one that showcases how a word is used in context
(typical prepositions, collocations, etc.)
one that helps in grasping the meaning
related work by Rychly et al. (2008):
one that is intelligible to learners
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Assets
example sentences can be thought of as having certain assets
6 diļ¬€erent classes of n-gram assets
1 class of assets for entire original sentence
e.g. for account:
containing frequent bigram bank account can be an asset
containing frequent trigram to account for can be an asset
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Assets
example sentences can be thought of as having certain assets
6 diļ¬€erent classes of n-gram assets
1 class of assets for entire original sentence
1: original unigram
2-5: bigram with preceding/following, trigram, etc.
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Assets
example sentences can be thought of as having certain assets
6 diļ¬€erent classes of n-gram assets
1 class of assets for entire original sentence
weights w(a) = f (x,a)
n
i=1 f (xi ,a)
, etc.
=ā‡’ higher weight for open an account than for Peterā€™s chequing
account
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Assets
example sentences can be thought of as having certain assets
6 diļ¬€erent classes of n-gram assets
1 class of assets for entire original sentence
weight: cosine similarity with sense deļ¬nition
=ā‡’ bias towards explanatory sentences
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
What is a good set of example sentences?
Select sentences with greatest total asset weights?
Instead: maximize
aāˆˆ
xāˆˆC
A(x)
w(a)
Problem is NP-hard (proof via Vertex Cover reduction)
āˆ’ā†’ use greedy heuristic
No: Most frequent expressions would dominate the result set.
Need diversity!
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
What is a good set of example sentences?
Select sentences with greatest total asset weights?
Instead: maximize
aāˆˆ
xāˆˆC
A(x)
w(a)
Problem is NP-hard (proof via Vertex Cover reduction)
āˆ’ā†’ use greedy heuristic
each asset in result set only counted once
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
What is a good set of example sentences?
Select sentences with greatest total asset weights?
Instead: maximize
aāˆˆ
xāˆˆC
A(x)
w(a)
Problem is NP-hard (proof via Vertex Cover reduction)
āˆ’ā†’ use greedy heuristic
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
What is a good set of example sentences?
Select sentences with greatest total asset weights?
Instead: maximize
aāˆˆ
xāˆˆC
A(x)
w(a)
Problem is NP-hard (proof via Vertex Cover reduction)
āˆ’ā†’ use greedy heuristic
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Greedy Heuristic to select set C
Greedily select highest-weighted sentence x:
x ā† argmax
xāˆˆXC aāˆˆA(x)
w(a)
Then set the weights w(a) of all assets a āˆˆ A(x) to zero
=ā‡’ ranked list obtained (useful!)
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Greedy Heuristic to select set C
Greedily select highest-weighted sentence x:
x ā† argmax
xāˆˆXC aāˆˆA(x)
w(a)
Then set the weights w(a) of all assets a āˆˆ A(x) to zero
=ā‡’ ranked list obtained (useful!)
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Greedy Heuristic to select set C
Greedily select highest-weighted sentence x:
x ā† argmax
xāˆˆXC aāˆˆA(x)
w(a)
Then set the weights w(a) of all assets a āˆˆ A(x) to zero
=ā‡’ ranked list obtained (useful!)
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Outline
1 Introduction
2 Example Extraction
3 Example Selection
4 Results
5 Conclusion
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Resources
OPUS corpora (OpenSubtitles, OpenOļ¬ƒce.org collections)
and Reuters RCV1 for non-disambiguated sentence selection
GIZA++/UPlug for lexical alignment
Princeton WordNet 3.0 and Spanish WordNet
(+ sense mappings for compatibility)
TreeTagger for morphological analysis
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Resources
OPUS corpora (OpenSubtitles, OpenOļ¬ƒce.org collections)
and Reuters RCV1 for non-disambiguated sentence selection
GIZA++/UPlug for lexical alignment
Princeton WordNet 3.0 and Spanish WordNet
(+ sense mappings for compatibility)
TreeTagger for morphological analysis
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Resources
OPUS corpora (OpenSubtitles, OpenOļ¬ƒce.org collections)
and Reuters RCV1 for non-disambiguated sentence selection
GIZA++/UPlug for lexical alignment
Princeton WordNet 3.0 and Spanish WordNet
(+ sense mappings for compatibility)
TreeTagger for morphological analysis
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Resources
OPUS corpora (OpenSubtitles, OpenOļ¬ƒce.org collections)
and Reuters RCV1 for non-disambiguated sentence selection
GIZA++/UPlug for lexical alignment
Princeton WordNet 3.0 and Spanish WordNet
(+ sense mappings for compatibility)
TreeTagger for morphological analysis
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Examples from OpenSubtitles Corpus
line (something, as a
cord or rope, that is long
and thin and ļ¬‚exible)
I got some ļ¬shing line if you want
me to stitch that.
Von Sefelt, get the stern line.
line (the descendants of
one individual)
What line of kings do you descend
from?
My line has ended.
catch (catch up with
and possibly overtake)
Heā€™s got 100 laps to catch Beau
Brandenburg if he wants to become
world champion.
They wonā€™t catch up.
catch (grasp with the I didnā€™t catch your name.
mind or develop an
understanding of)
Sorry, I didnā€™t catch it.
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Examples from OpenSubtitles Corpus
talk (exchange thoughts,
talk with)
Why donā€™t we have a seat and talk it
over.
Okay, Iā€™ll talk to you but one
condition...
talk (use language) But weā€™ll be listening from the kitchen
so talk loud.
You spit when you talk.
opening (a ceremony
accompanying the start
of some enterprise)
We donā€™t have much time until the
opening day of Exhibition.
What a disaster tomorrow is the
opening ceremony!
opening (the ļ¬rst
performance, as of a
theatrical production)
It will be rehearsed in the morning
ready for the opening tomorrow night.
You ready for our big opening night?
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Sense-disambiguated Example Sentences
Corpus Covered
Senses
Example
Sentences
Accuracy
(Wilson
interval)
OpenSubtitles en-es 13,559 117,078 0.815 Ā± 0.081
OpenSubtitles es-en 8,833 113,018 0.798 Ā± 0.090
OpenOļ¬ƒce.org en-es 1,341 13,295 0.803 Ā± 0.081
OpenOļ¬ƒce.org es-en 932 11,181 0.793 Ā± 0.087
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
error analysis:
(1) incorrect lexical alignments unlikely to lead to incorrect
disambiguation
(2) incomplete sense inventories can lead to mistakes
(3) also, on a few occasions, the morphological analyser led to
wrong results
implications for future work:
(1) better monolingual WSD
(2) additional languages, especially phylogenetically unrelated
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
error analysis:
(1) incorrect lexical alignments unlikely to lead to incorrect
disambiguation
(2) incomplete sense inventories can lead to mistakes
(3) also, on a few occasions, the morphological analyser led to
wrong results
implications for future work:
(1) better monolingual WSD
(2) additional languages, especially phylogenetically unrelated
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Rankings for OpenSubtitles Corpus
being or located on or
directed toward the
1. In America we drive on the right
side of the road.
side of the body to the
east when facing north
2. Iā€™ll tie down your right arm so
you can learn to throw a left.
3. If we wait from the right side, we
have an advantage there.
put up with something 1. You canā€™t stand it, can you?
or somebody
unpleasant
2. You really think I can tolerate
such an act?
3. No one can stand that harmonica
all day long.
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Rankings for OpenSubtitles Corpus
using or providing or 1. Not the electric chair.
producing or
transmitting or
2. Some electrical current
circulating through my body.
operated by electricity 3. Near as I can tell itā€™s an
electrical impulse.
take something or
somebody with
1. And they were kind enough to
take me in here.
oneself somewhere 2. It conveys such a great feeling.
3. We interrupt this program to
bring you a special news bulletin.
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Rankings for long in RCV1 Corpus
1. In the long term interest rate market, the yield of the key
182nd 10 year Japanese government bond (JGB) fell to
2.060 percent early on Tuesday, a record low for any
benchmark 10-year JGB.
2. ā€œThe government and opposition have gambled away the last
chance for a long time to prove they recognise the countryā€™s
problems, and that they put the national good above their
own power interestsā€, news weekly Der Spiegel said.
3. As long as the index keeps hovering between 957 and 995,
we will maintain our short term neutral recommendation.
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Rankings for purchase in RCV1 Corpus
1. Romaniaā€™s State Ownership Fund (FPS), the countryā€™s main
privatisation body, said on Wednesday it had accepted ļ¬ve
bids for the purchase of a 50.98 percent stake in the largest
local cement maker Romcim.
2. Grand Hotel Group said on Wednesday it has agreed to
procure an option to purchase the remaining 50 percent of
the Grand Hyatt complex in Melbourne from hotel developer
and investor Lustig & Moar.
3. The purchase price for the business, which had 1996
calendar year sales of about $25 million, was not disclosed.
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Rankings for colonial in RCV1 Corpus
1. Hong Kong came to the end of 156 years of British colonial
rule on June 30 and is now an autonomous capitalist region
of China, running all its own aļ¬€airs except defence and
diplomacy.
2. The letter was sent in error to the embassy of Portugal ā€“ the
former colonial power in East Timor ā€“ and was neither
returned nor forwarded to the Indonesian embassy.
3. Sino-British relations hit a snag when former Governor Chris
Patten launched electoral reforms in the twilight years of
colonial rule despite ļ¬erce opposition by Beijing.
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Outline
1 Introduction
2 Example Extraction
3 Example Selection
4 Results
5 Conclusion
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Conclusion
Framework for:
extracting sense-disambiguated example sentences from
parallel corpora
selecting limited numbers of sentences given space constraints
Future Work:
better disambiguation, e.g. additional languages, better
techniques
additional input for selection: sentence length, deļ¬nition
extraction techniques
integrated user interface for UWN (Universal Wordnet)
Contact: demelo@mpi-inf.mpg.de
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Conclusion
Framework for:
extracting sense-disambiguated example sentences from
parallel corpora
selecting limited numbers of sentences given space constraints
Future Work:
better disambiguation, e.g. additional languages, better
techniques
additional input for selection: sentence length, deļ¬nition
extraction techniques
integrated user interface for UWN (Universal Wordnet)
Contact: demelo@mpi-inf.mpg.de
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Conclusion
Framework for:
extracting sense-disambiguated example sentences from
parallel corpora
selecting limited numbers of sentences given space constraints
Future Work:
better disambiguation, e.g. additional languages, better
techniques
additional input for selection: sentence length, deļ¬nition
extraction techniques
integrated user interface for UWN (Universal Wordnet)
Contact: demelo@mpi-inf.mpg.de
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences

More Related Content

Similar to Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Academic writing: pointers for G-cube PhD students 23-01.12
Academic writing: pointers for G-cube PhD students 23-01.12Academic writing: pointers for G-cube PhD students 23-01.12
Academic writing: pointers for G-cube PhD students 23-01.12Lawrie Hunter
Ā 
Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsTae Hwan Jung
Ā 
More on Indexing Text Operations (1).pptx
More on Indexing  Text Operations (1).pptxMore on Indexing  Text Operations (1).pptx
More on Indexing Text Operations (1).pptxMahsadelavari
Ā 
Summary distributed representations_words_phrases
Summary distributed representations_words_phrasesSummary distributed representations_words_phrases
Summary distributed representations_words_phrasesYue Xiangnan
Ā 
Oxford English for Careers_ Technology 1 Student's Book ( PDFDrive ).pdf
Oxford English for Careers_ Technology 1 Student's Book   ( PDFDrive ).pdfOxford English for Careers_ Technology 1 Student's Book   ( PDFDrive ).pdf
Oxford English for Careers_ Technology 1 Student's Book ( PDFDrive ).pdfbeatrix15
Ā 
Word embeddings
Word embeddingsWord embeddings
Word embeddingsShruti kar
Ā 
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined EvidenceTowards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined EvidenceGerard de Melo
Ā 
Tutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemTutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemIJERA Editor
Ā 
Convert to journal
Convert to journalConvert to journal
Convert to journalProf Jamaluddin
Ā 
CMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics ICMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics Ibutest
Ā 
Past, Present, and Future: Machine Translation & Natural Language Processing ...
Past, Present, and Future: Machine Translation & Natural Language Processing ...Past, Present, and Future: Machine Translation & Natural Language Processing ...
Past, Present, and Future: Machine Translation & Natural Language Processing ...John Tinsley
Ā 
Past, Present, and Future: Machine Translation & Natural Language Processing ...
Past, Present, and Future: Machine Translation & Natural Language Processing ...Past, Present, and Future: Machine Translation & Natural Language Processing ...
Past, Present, and Future: Machine Translation & Natural Language Processing ...Iconic Translation Machines
Ā 
Dual Learning for Machine Translation (NIPS 2016)
Dual Learning for Machine Translation (NIPS 2016)Dual Learning for Machine Translation (NIPS 2016)
Dual Learning for Machine Translation (NIPS 2016)Toru Fujino
Ā 
A NEW STEMMER TO IMPROVE INFORMATION RETRIEVAL
A NEW STEMMER TO IMPROVE INFORMATION RETRIEVALA NEW STEMMER TO IMPROVE INFORMATION RETRIEVAL
A NEW STEMMER TO IMPROVE INFORMATION RETRIEVALIJNSA Journal
Ā 
A NEW STEMMER TO IMPROVE INFORMATION RETRIEVAL
A NEW STEMMER TO IMPROVE INFORMATION RETRIEVALA NEW STEMMER TO IMPROVE INFORMATION RETRIEVAL
A NEW STEMMER TO IMPROVE INFORMATION RETRIEVALIJNSA Journal
Ā 
Using ontology based context in the
Using ontology based context in theUsing ontology based context in the
Using ontology based context in theijaia
Ā 

Similar to Extracting Sense-Disambiguated Example Sentences From Parallel Corpora (20)

Academic writing: pointers for G-cube PhD students 23-01.12
Academic writing: pointers for G-cube PhD students 23-01.12Academic writing: pointers for G-cube PhD students 23-01.12
Academic writing: pointers for G-cube PhD students 23-01.12
Ā 
Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword units
Ā 
More on Indexing Text Operations (1).pptx
More on Indexing  Text Operations (1).pptxMore on Indexing  Text Operations (1).pptx
More on Indexing Text Operations (1).pptx
Ā 
Summary distributed representations_words_phrases
Summary distributed representations_words_phrasesSummary distributed representations_words_phrases
Summary distributed representations_words_phrases
Ā 
Oxford English for Careers_ Technology 1 Student's Book ( PDFDrive ).pdf
Oxford English for Careers_ Technology 1 Student's Book   ( PDFDrive ).pdfOxford English for Careers_ Technology 1 Student's Book   ( PDFDrive ).pdf
Oxford English for Careers_ Technology 1 Student's Book ( PDFDrive ).pdf
Ā 
Word embeddings
Word embeddingsWord embeddings
Word embeddings
Ā 
Abstract
AbstractAbstract
Abstract
Ā 
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined EvidenceTowards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Ā 
Tutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemTutorial - Speech Synthesis System
Tutorial - Speech Synthesis System
Ā 
Convert to journal
Convert to journalConvert to journal
Convert to journal
Ā 
CMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics ICMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics I
Ā 
Past, Present, and Future: Machine Translation & Natural Language Processing ...
Past, Present, and Future: Machine Translation & Natural Language Processing ...Past, Present, and Future: Machine Translation & Natural Language Processing ...
Past, Present, and Future: Machine Translation & Natural Language Processing ...
Ā 
Past, Present, and Future: Machine Translation & Natural Language Processing ...
Past, Present, and Future: Machine Translation & Natural Language Processing ...Past, Present, and Future: Machine Translation & Natural Language Processing ...
Past, Present, and Future: Machine Translation & Natural Language Processing ...
Ā 
Dual Learning for Machine Translation (NIPS 2016)
Dual Learning for Machine Translation (NIPS 2016)Dual Learning for Machine Translation (NIPS 2016)
Dual Learning for Machine Translation (NIPS 2016)
Ā 
Samr
SamrSamr
Samr
Ā 
A NEW STEMMER TO IMPROVE INFORMATION RETRIEVAL
A NEW STEMMER TO IMPROVE INFORMATION RETRIEVALA NEW STEMMER TO IMPROVE INFORMATION RETRIEVAL
A NEW STEMMER TO IMPROVE INFORMATION RETRIEVAL
Ā 
A NEW STEMMER TO IMPROVE INFORMATION RETRIEVAL
A NEW STEMMER TO IMPROVE INFORMATION RETRIEVALA NEW STEMMER TO IMPROVE INFORMATION RETRIEVAL
A NEW STEMMER TO IMPROVE INFORMATION RETRIEVAL
Ā 
Lucia Specia - SMT e pĆ³s-ediĆ§Ć£o
Lucia Specia - SMT e pĆ³s-ediĆ§Ć£oLucia Specia - SMT e pĆ³s-ediĆ§Ć£o
Lucia Specia - SMT e pĆ³s-ediĆ§Ć£o
Ā 
FinalReport
FinalReportFinalReport
FinalReport
Ā 
Using ontology based context in the
Using ontology based context in theUsing ontology based context in the
Using ontology based context in the
Ā 

More from Gerard de Melo

SEMAC Graph Node Embeddings for Link Prediction
SEMAC Graph Node Embeddings for Link PredictionSEMAC Graph Node Embeddings for Link Prediction
SEMAC Graph Node Embeddings for Link PredictionGerard de Melo
Ā 
How to Manage your Research
How to Manage your ResearchHow to Manage your Research
How to Manage your ResearchGerard de Melo
Ā 
Knowlywood: Mining Activity Knowledge from Hollywood Narratives
Knowlywood: Mining Activity Knowledge from Hollywood NarrativesKnowlywood: Mining Activity Knowledge from Hollywood Narratives
Knowlywood: Mining Activity Knowledge from Hollywood NarrativesGerard de Melo
Ā 
Learning Multilingual Semantics from Big Data on the Web
Learning Multilingual Semantics from Big Data on the WebLearning Multilingual Semantics from Big Data on the Web
Learning Multilingual Semantics from Big Data on the WebGerard de Melo
Ā 
From Big Data to Valuable Knowledge
From Big Data to Valuable KnowledgeFrom Big Data to Valuable Knowledge
From Big Data to Valuable KnowledgeGerard de Melo
Ā 
Scalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data MiningScalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data MiningGerard de Melo
Ā 
Searching the Web of Data (Tutorial)
Searching the Web of Data (Tutorial)Searching the Web of Data (Tutorial)
Searching the Web of Data (Tutorial)Gerard de Melo
Ā 
From Linked Data to Tightly Integrated Data
From Linked Data to Tightly Integrated DataFrom Linked Data to Tightly Integrated Data
From Linked Data to Tightly Integrated DataGerard de Melo
Ā 
Information Extraction from Web-Scale N-Gram Data
Information Extraction from Web-Scale N-Gram DataInformation Extraction from Web-Scale N-Gram Data
Information Extraction from Web-Scale N-Gram DataGerard de Melo
Ā 
UWN: A Large Multilingual Lexical Knowledge Base
UWN: A Large Multilingual Lexical Knowledge BaseUWN: A Large Multilingual Lexical Knowledge Base
UWN: A Large Multilingual Lexical Knowledge BaseGerard de Melo
Ā 
Not Quite the Same: Identity Constraints for the Web of Linked Data
Not Quite the Same: Identity Constraints for the Web of Linked DataNot Quite the Same: Identity Constraints for the Web of Linked Data
Not Quite the Same: Identity Constraints for the Web of Linked DataGerard de Melo
Ā 
Good, Great, Excellent: Global Inference of Semantic Intensities
Good, Great, Excellent: Global Inference of Semantic IntensitiesGood, Great, Excellent: Global Inference of Semantic Intensities
Good, Great, Excellent: Global Inference of Semantic IntensitiesGerard de Melo
Ā 
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged Ontology
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged OntologyYAGO-SUMO: Integrating YAGO into the Suggested Upper Merged Ontology
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged OntologyGerard de Melo
Ā 

More from Gerard de Melo (13)

SEMAC Graph Node Embeddings for Link Prediction
SEMAC Graph Node Embeddings for Link PredictionSEMAC Graph Node Embeddings for Link Prediction
SEMAC Graph Node Embeddings for Link Prediction
Ā 
How to Manage your Research
How to Manage your ResearchHow to Manage your Research
How to Manage your Research
Ā 
Knowlywood: Mining Activity Knowledge from Hollywood Narratives
Knowlywood: Mining Activity Knowledge from Hollywood NarrativesKnowlywood: Mining Activity Knowledge from Hollywood Narratives
Knowlywood: Mining Activity Knowledge from Hollywood Narratives
Ā 
Learning Multilingual Semantics from Big Data on the Web
Learning Multilingual Semantics from Big Data on the WebLearning Multilingual Semantics from Big Data on the Web
Learning Multilingual Semantics from Big Data on the Web
Ā 
From Big Data to Valuable Knowledge
From Big Data to Valuable KnowledgeFrom Big Data to Valuable Knowledge
From Big Data to Valuable Knowledge
Ā 
Scalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data MiningScalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data Mining
Ā 
Searching the Web of Data (Tutorial)
Searching the Web of Data (Tutorial)Searching the Web of Data (Tutorial)
Searching the Web of Data (Tutorial)
Ā 
From Linked Data to Tightly Integrated Data
From Linked Data to Tightly Integrated DataFrom Linked Data to Tightly Integrated Data
From Linked Data to Tightly Integrated Data
Ā 
Information Extraction from Web-Scale N-Gram Data
Information Extraction from Web-Scale N-Gram DataInformation Extraction from Web-Scale N-Gram Data
Information Extraction from Web-Scale N-Gram Data
Ā 
UWN: A Large Multilingual Lexical Knowledge Base
UWN: A Large Multilingual Lexical Knowledge BaseUWN: A Large Multilingual Lexical Knowledge Base
UWN: A Large Multilingual Lexical Knowledge Base
Ā 
Not Quite the Same: Identity Constraints for the Web of Linked Data
Not Quite the Same: Identity Constraints for the Web of Linked DataNot Quite the Same: Identity Constraints for the Web of Linked Data
Not Quite the Same: Identity Constraints for the Web of Linked Data
Ā 
Good, Great, Excellent: Global Inference of Semantic Intensities
Good, Great, Excellent: Global Inference of Semantic IntensitiesGood, Great, Excellent: Global Inference of Semantic Intensities
Good, Great, Excellent: Global Inference of Semantic Intensities
Ā 
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged Ontology
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged OntologyYAGO-SUMO: Integrating YAGO into the Suggested Upper Merged Ontology
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged Ontology
Ā 

Recently uploaded

Full night šŸ„µ Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy āœŒļøo...
Full night šŸ„µ Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy āœŒļøo...Full night šŸ„µ Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy āœŒļøo...
Full night šŸ„µ Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy āœŒļøo...shivangimorya083
Ā 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
Ā 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
Ā 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
Ā 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
Ā 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
Ā 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
Ā 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
Ā 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
Ā 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
Ā 
Delhi Call Girls CP 9711199171 ā˜Žāœ”šŸ‘Œāœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ā˜Žāœ”šŸ‘Œāœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ā˜Žāœ”šŸ‘Œāœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ā˜Žāœ”šŸ‘Œāœ” Whatsapp Hard And Sexy Vip Callshivangimorya083
Ā 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
Ā 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
Ā 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
Ā 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
Ā 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
Ā 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
Ā 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
Ā 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
Ā 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
Ā 

Recently uploaded (20)

Full night šŸ„µ Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy āœŒļøo...
Full night šŸ„µ Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy āœŒļøo...Full night šŸ„µ Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy āœŒļøo...
Full night šŸ„µ Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy āœŒļøo...
Ā 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
Ā 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Ā 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
Ā 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Ā 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
Ā 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
Ā 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
Ā 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
Ā 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
Ā 
Delhi Call Girls CP 9711199171 ā˜Žāœ”šŸ‘Œāœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ā˜Žāœ”šŸ‘Œāœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ā˜Žāœ”šŸ‘Œāœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ā˜Žāœ”šŸ‘Œāœ” Whatsapp Hard And Sexy Vip Call
Ā 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
Ā 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Ā 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
Ā 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Ā 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
Ā 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
Ā 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
Ā 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
Ā 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
Ā 

Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

  • 1. Introduction Example Extraction Example Selection Results Conclusion Extracting Sense-Disambiguated Example Sentences From Parallel Corpora Gerard de Melo and Gerhard Weikum Max Planck Institute for Informatics SaarbrĀØucken, Germany 2009-09-18 G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 2. Introduction Example Extraction Example Selection Results Conclusion Outline 1 Introduction 2 Example Extraction 3 Example Selection 4 Results 5 Conclusion G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 3. Introduction Example Extraction Example Selection Results Conclusion Outline 1 Introduction 2 Example Extraction 3 Example Selection 4 Results 5 Conclusion G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 4. Introduction Example Extraction Example Selection Results Conclusion Introduction a thin, sharp-pointed metal pin with a raised spiral thread running around it and a slotted head, used to join things together by being rotated in under pressure (Concise OED) Use a crosshead screwdriver to tighten the screws inserted in the pre-fixed connectors. (Web) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 5. Introduction Example Extraction Example Selection Results Conclusion Introduction a thin, sharp-pointed metal pin with a raised spiral thread running around it and a slotted head, used to join things together by being rotated in under pressure (Concise OED) Use a crosshead screwdriver to tighten the screws inserted in the pre-fixed connectors. (Web) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 6. Introduction Example Extraction Example Selection Results Conclusion Introduction Example Sentences any sentence that contains a word (being used in a speciļ¬c sense) allow the user to grasp a wordā€™s meaning see circumstances a word would typically be used in G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 7. Introduction Example Extraction Example Selection Results Conclusion Introduction Example Sentences any sentence that contains a word (being used in a speciļ¬c sense) allow the user to grasp a wordā€™s meaning see circumstances a word would typically be used in traditional intensional word deļ¬nitions may be too confusing humans used to deriving meaning from context users can verify whether they have understood deļ¬nition correctly G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 8. Introduction Example Extraction Example Selection Results Conclusion Introduction Example Sentences any sentence that contains a word (being used in a speciļ¬c sense) allow the user to grasp a wordā€™s meaning see circumstances a word would typically be used in possible contexts, e.g. child vs. youngster typical collocations, e.g. to give birth or birth rate but not *to give nascence or *nascence rate G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 9. Introduction Example Extraction Example Selection Results Conclusion Introduction Example Sentences =ā‡’ most modern dictionaries include example sentences number, length often limited digital dictionaries: tight space constraints of print media no longer apply! larger number of example sentences can be presented to the user on demand G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 10. Introduction Example Extraction Example Selection Results Conclusion Introduction Example Sentences =ā‡’ most modern dictionaries include example sentences number, length often limited digital dictionaries: tight space constraints of print media no longer apply! larger number of example sentences can be presented to the user on demand G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 11. Introduction Example Extraction Example Selection Results Conclusion Introduction Example Sentences =ā‡’ most modern dictionaries include example sentences number, length often limited digital dictionaries: tight space constraints of print media no longer apply! larger number of example sentences can be presented to the user on demand G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 12. Introduction Example Extraction Example Selection Results Conclusion Introduction Example Sentences =ā‡’ most modern dictionaries include example sentences number, length often limited digital dictionaries: tight space constraints of print media no longer apply! larger number of example sentences can be presented to the user on demand G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 13. Introduction Example Extraction Example Selection Results Conclusion Introduction Goals automatically obtain example sentences for a speciļ¬c sense of a word choose set of representative sentences to present to user distinguish senses: There were many bats flying out of the cave. vs. In professional baseball, only wooden bats are permitted. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 14. Introduction Example Extraction Example Selection Results Conclusion Introduction Goals automatically obtain example sentences for a speciļ¬c sense of a word choose set of representative sentences to present to user not all examples are equally useful screen space may be limited (can still show more only on demand) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 15. Introduction Example Extraction Example Selection Results Conclusion Outline 1 Introduction 2 Example Extraction 3 Example Selection 4 Results 5 Conclusion G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 16. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Sense Index (Dictionary) English: Princeton WordNet 3.0 Spanish: Spanish WordNet Ļƒ(t): get senses for term t Ļƒ(t, s): is sense s a sense of term t? behind the scenes: morphological analysis, multi-word expression detection G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 17. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Sense Index (Dictionary) English: Princeton WordNet 3.0 Spanish: Spanish WordNet Ļƒ(t): get senses for term t Ļƒ(t, s): is sense s a sense of term t? behind the scenes: morphological analysis, multi-word expression detection G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 18. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Sense Index (Dictionary) English: Princeton WordNet 3.0 Spanish: Spanish WordNet Ļƒ(t): get senses for term t Ļƒ(t, s): is sense s a sense of term t? behind the scenes: morphological analysis, multi-word expression detection G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 19. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Sense Disambiguation disambiguate word occurrences, then use sentence as example for those word senses ļ¬ne-grained WSD not reliable enough (cf. SemEval results) idea: use parallel corpora, jointly look at both versions of text =ā‡’ greater accuracy can be achieved G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 20. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Sense Disambiguation disambiguate word occurrences, then use sentence as example for those word senses ļ¬ne-grained WSD not reliable enough (cf. SemEval results) idea: use parallel corpora, jointly look at both versions of text =ā‡’ greater accuracy can be achieved G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 21. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Sense Disambiguation disambiguate word occurrences, then use sentence as example for those word senses ļ¬ne-grained WSD not reliable enough (cf. SemEval results) idea: use parallel corpora, jointly look at both versions of text =ā‡’ greater accuracy can be achieved G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 22. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Sense Disambiguation disambiguate word occurrences, then use sentence as example for those word senses ļ¬ne-grained WSD not reliable enough (cf. SemEval results) idea: use parallel corpora, jointly look at both versions of text =ā‡’ greater accuracy can be achieved G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 23. Introduction Example Extraction Example Selection Results Conclusion Introduction Sense Disambiguation There were many bats flying out of the cave. Aus der HĀØohle flogen viele FledermĀØause. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 24. Introduction Example Extraction Example Selection Results Conclusion Introduction Sense Disambiguation There were many bats flying out of the cave. Aus der HĀØohle flogen viele FledermĀØause. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 25. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Parallel Disambiguation req.: text available in two languages a, b good news: more and more parallel corpora available word alignment compute score wsd(sa|ta, tb) = wsd(sa|ta) Ļƒ(ta,sa)csim(tb,sa) s āˆˆĻƒ(ta) Ļƒ(ta,s )csim(tb,s ) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 26. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Parallel Disambiguation req.: text available in two languages a, b good news: more and more parallel corpora available word alignment compute score wsd(sa|ta, tb) = wsd(sa|ta) Ļƒ(ta,sa)csim(tb,sa) s āˆˆĻƒ(ta) Ļƒ(ta,s )csim(tb,s ) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 27. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Parallel Disambiguation req.: text available in two languages a, b good news: more and more parallel corpora available word alignment compute score wsd(sa|ta, tb) = wsd(sa|ta) Ļƒ(ta,sa)csim(tb,sa) s āˆˆĻƒ(ta) Ļƒ(ta,s )csim(tb,s ) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 28. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Components csim(tb, sa): cross-lingual similarity monolingual WSD: compare bag-of-words vector for term context with sense context (constructed from deļ¬nitions, etc.) Semantic Similarity sim(s1, s2) identify only near-identical senses (e.g. of house and home), not arbitrary associations (e.g. house and door) csim(tb, sa) = sbāˆˆĻƒ(tb) sim(sa, sb) wsd(sb|tb) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 29. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Components csim(tb, sa): cross-lingual similarity monolingual WSD: compare bag-of-words vector for term context with sense context (constructed from deļ¬nitions, etc.) Semantic Similarity sim(s1, s2) identify only near-identical senses (e.g. of house and home), not arbitrary associations (e.g. house and door) wsd(s|t) = Ļƒ(t, s) Ī± + v(s)T v(t) ||v(s)|| ||v(t)|| G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 30. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Components csim(tb, sa): cross-lingual similarity monolingual WSD: compare bag-of-words vector for term context with sense context (constructed from deļ¬nitions, etc.) Semantic Similarity sim(s1, s2) identify only near-identical senses (e.g. of house and home), not arbitrary associations (e.g. house and door) sim(s1, s2) = ļ£± ļ£“ļ£“ļ£“ļ£² ļ£“ļ£“ļ£“ļ£³ 1 s1 = s2 1 s1, s2 in near-synonymy relationship 1 s1, s2 in hypernymy/hyponymy relationship 0 otherwise G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 31. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Extraction Use as example sentence iļ¬€ score suļ¬ƒciently high wsd(sa|ta, tb) = wsd(ta, sa) Ļƒ(ta,sa)csim(tb,sa) s āˆˆĻƒ(ta) Ļƒ(ta,s )csim(tb,s ) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 32. Introduction Example Extraction Example Selection Results Conclusion Outline 1 Introduction 2 Example Extraction 3 Example Selection 4 Results 5 Conclusion G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 33. Introduction Example Extraction Example Selection Results Conclusion Example Selection Task Motivation for computational applications: more data =ā‡’ better data for human users: a good limited selection should be provided at ļ¬rst assumption: space constraint k - number of sentences goal: choose a good set of k example sentences G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 34. Introduction Example Extraction Example Selection Results Conclusion Example Selection Task Motivation for computational applications: more data =ā‡’ better data for human users: a good limited selection should be provided at ļ¬rst assumption: space constraint k - number of sentences goal: choose a good set of k example sentences G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 35. Introduction Example Extraction Example Selection Results Conclusion Example Selection Task Motivation for computational applications: more data =ā‡’ better data for human users: a good limited selection should be provided at ļ¬rst assumption: space constraint k - number of sentences goal: choose a good set of k example sentences G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 36. Introduction Example Extraction Example Selection Results Conclusion Example Selection What is a good example sentence? one that showcases how a word is used in context (typical prepositions, collocations, etc.) one that helps in grasping the meaning related work by Rychly et al. (2008): one that is intelligible to learners G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 37. Introduction Example Extraction Example Selection Results Conclusion Example Selection What is a good example sentence? one that showcases how a word is used in context (typical prepositions, collocations, etc.) one that helps in grasping the meaning related work by Rychly et al. (2008): one that is intelligible to learners G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 38. Introduction Example Extraction Example Selection Results Conclusion Example Selection What is a good example sentence? one that showcases how a word is used in context (typical prepositions, collocations, etc.) one that helps in grasping the meaning related work by Rychly et al. (2008): one that is intelligible to learners G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 39. Introduction Example Extraction Example Selection Results Conclusion Example Selection Assets example sentences can be thought of as having certain assets 6 diļ¬€erent classes of n-gram assets 1 class of assets for entire original sentence e.g. for account: containing frequent bigram bank account can be an asset containing frequent trigram to account for can be an asset G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 40. Introduction Example Extraction Example Selection Results Conclusion Example Selection Assets example sentences can be thought of as having certain assets 6 diļ¬€erent classes of n-gram assets 1 class of assets for entire original sentence 1: original unigram 2-5: bigram with preceding/following, trigram, etc. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 41. Introduction Example Extraction Example Selection Results Conclusion Example Selection Assets example sentences can be thought of as having certain assets 6 diļ¬€erent classes of n-gram assets 1 class of assets for entire original sentence weights w(a) = f (x,a) n i=1 f (xi ,a) , etc. =ā‡’ higher weight for open an account than for Peterā€™s chequing account G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 42. Introduction Example Extraction Example Selection Results Conclusion Example Selection Assets example sentences can be thought of as having certain assets 6 diļ¬€erent classes of n-gram assets 1 class of assets for entire original sentence weight: cosine similarity with sense deļ¬nition =ā‡’ bias towards explanatory sentences G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 43. Introduction Example Extraction Example Selection Results Conclusion Example Selection What is a good set of example sentences? Select sentences with greatest total asset weights? Instead: maximize aāˆˆ xāˆˆC A(x) w(a) Problem is NP-hard (proof via Vertex Cover reduction) āˆ’ā†’ use greedy heuristic No: Most frequent expressions would dominate the result set. Need diversity! G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 44. Introduction Example Extraction Example Selection Results Conclusion Example Selection What is a good set of example sentences? Select sentences with greatest total asset weights? Instead: maximize aāˆˆ xāˆˆC A(x) w(a) Problem is NP-hard (proof via Vertex Cover reduction) āˆ’ā†’ use greedy heuristic each asset in result set only counted once G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 45. Introduction Example Extraction Example Selection Results Conclusion Example Selection What is a good set of example sentences? Select sentences with greatest total asset weights? Instead: maximize aāˆˆ xāˆˆC A(x) w(a) Problem is NP-hard (proof via Vertex Cover reduction) āˆ’ā†’ use greedy heuristic G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 46. Introduction Example Extraction Example Selection Results Conclusion Example Selection What is a good set of example sentences? Select sentences with greatest total asset weights? Instead: maximize aāˆˆ xāˆˆC A(x) w(a) Problem is NP-hard (proof via Vertex Cover reduction) āˆ’ā†’ use greedy heuristic G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 47. Introduction Example Extraction Example Selection Results Conclusion Example Selection Greedy Heuristic to select set C Greedily select highest-weighted sentence x: x ā† argmax xāˆˆXC aāˆˆA(x) w(a) Then set the weights w(a) of all assets a āˆˆ A(x) to zero =ā‡’ ranked list obtained (useful!) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 48. Introduction Example Extraction Example Selection Results Conclusion Example Selection Greedy Heuristic to select set C Greedily select highest-weighted sentence x: x ā† argmax xāˆˆXC aāˆˆA(x) w(a) Then set the weights w(a) of all assets a āˆˆ A(x) to zero =ā‡’ ranked list obtained (useful!) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 49. Introduction Example Extraction Example Selection Results Conclusion Example Selection Greedy Heuristic to select set C Greedily select highest-weighted sentence x: x ā† argmax xāˆˆXC aāˆˆA(x) w(a) Then set the weights w(a) of all assets a āˆˆ A(x) to zero =ā‡’ ranked list obtained (useful!) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 50. Introduction Example Extraction Example Selection Results Conclusion Outline 1 Introduction 2 Example Extraction 3 Example Selection 4 Results 5 Conclusion G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 51. Introduction Example Extraction Example Selection Results Conclusion Results Resources OPUS corpora (OpenSubtitles, OpenOļ¬ƒce.org collections) and Reuters RCV1 for non-disambiguated sentence selection GIZA++/UPlug for lexical alignment Princeton WordNet 3.0 and Spanish WordNet (+ sense mappings for compatibility) TreeTagger for morphological analysis G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 52. Introduction Example Extraction Example Selection Results Conclusion Results Resources OPUS corpora (OpenSubtitles, OpenOļ¬ƒce.org collections) and Reuters RCV1 for non-disambiguated sentence selection GIZA++/UPlug for lexical alignment Princeton WordNet 3.0 and Spanish WordNet (+ sense mappings for compatibility) TreeTagger for morphological analysis G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 53. Introduction Example Extraction Example Selection Results Conclusion Results Resources OPUS corpora (OpenSubtitles, OpenOļ¬ƒce.org collections) and Reuters RCV1 for non-disambiguated sentence selection GIZA++/UPlug for lexical alignment Princeton WordNet 3.0 and Spanish WordNet (+ sense mappings for compatibility) TreeTagger for morphological analysis G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 54. Introduction Example Extraction Example Selection Results Conclusion Results Resources OPUS corpora (OpenSubtitles, OpenOļ¬ƒce.org collections) and Reuters RCV1 for non-disambiguated sentence selection GIZA++/UPlug for lexical alignment Princeton WordNet 3.0 and Spanish WordNet (+ sense mappings for compatibility) TreeTagger for morphological analysis G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 55. Introduction Example Extraction Example Selection Results Conclusion Results Examples from OpenSubtitles Corpus line (something, as a cord or rope, that is long and thin and ļ¬‚exible) I got some ļ¬shing line if you want me to stitch that. Von Sefelt, get the stern line. line (the descendants of one individual) What line of kings do you descend from? My line has ended. catch (catch up with and possibly overtake) Heā€™s got 100 laps to catch Beau Brandenburg if he wants to become world champion. They wonā€™t catch up. catch (grasp with the I didnā€™t catch your name. mind or develop an understanding of) Sorry, I didnā€™t catch it. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 56. Introduction Example Extraction Example Selection Results Conclusion Results Examples from OpenSubtitles Corpus talk (exchange thoughts, talk with) Why donā€™t we have a seat and talk it over. Okay, Iā€™ll talk to you but one condition... talk (use language) But weā€™ll be listening from the kitchen so talk loud. You spit when you talk. opening (a ceremony accompanying the start of some enterprise) We donā€™t have much time until the opening day of Exhibition. What a disaster tomorrow is the opening ceremony! opening (the ļ¬rst performance, as of a theatrical production) It will be rehearsed in the morning ready for the opening tomorrow night. You ready for our big opening night? G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 57. Introduction Example Extraction Example Selection Results Conclusion Results Sense-disambiguated Example Sentences Corpus Covered Senses Example Sentences Accuracy (Wilson interval) OpenSubtitles en-es 13,559 117,078 0.815 Ā± 0.081 OpenSubtitles es-en 8,833 113,018 0.798 Ā± 0.090 OpenOļ¬ƒce.org en-es 1,341 13,295 0.803 Ā± 0.081 OpenOļ¬ƒce.org es-en 932 11,181 0.793 Ā± 0.087 G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 58. Introduction Example Extraction Example Selection Results Conclusion Results error analysis: (1) incorrect lexical alignments unlikely to lead to incorrect disambiguation (2) incomplete sense inventories can lead to mistakes (3) also, on a few occasions, the morphological analyser led to wrong results implications for future work: (1) better monolingual WSD (2) additional languages, especially phylogenetically unrelated G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 59. Introduction Example Extraction Example Selection Results Conclusion Results error analysis: (1) incorrect lexical alignments unlikely to lead to incorrect disambiguation (2) incomplete sense inventories can lead to mistakes (3) also, on a few occasions, the morphological analyser led to wrong results implications for future work: (1) better monolingual WSD (2) additional languages, especially phylogenetically unrelated G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 60. Introduction Example Extraction Example Selection Results Conclusion Results Rankings for OpenSubtitles Corpus being or located on or directed toward the 1. In America we drive on the right side of the road. side of the body to the east when facing north 2. Iā€™ll tie down your right arm so you can learn to throw a left. 3. If we wait from the right side, we have an advantage there. put up with something 1. You canā€™t stand it, can you? or somebody unpleasant 2. You really think I can tolerate such an act? 3. No one can stand that harmonica all day long. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 61. Introduction Example Extraction Example Selection Results Conclusion Results Rankings for OpenSubtitles Corpus using or providing or 1. Not the electric chair. producing or transmitting or 2. Some electrical current circulating through my body. operated by electricity 3. Near as I can tell itā€™s an electrical impulse. take something or somebody with 1. And they were kind enough to take me in here. oneself somewhere 2. It conveys such a great feeling. 3. We interrupt this program to bring you a special news bulletin. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 62. Introduction Example Extraction Example Selection Results Conclusion Results Rankings for long in RCV1 Corpus 1. In the long term interest rate market, the yield of the key 182nd 10 year Japanese government bond (JGB) fell to 2.060 percent early on Tuesday, a record low for any benchmark 10-year JGB. 2. ā€œThe government and opposition have gambled away the last chance for a long time to prove they recognise the countryā€™s problems, and that they put the national good above their own power interestsā€, news weekly Der Spiegel said. 3. As long as the index keeps hovering between 957 and 995, we will maintain our short term neutral recommendation. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 63. Introduction Example Extraction Example Selection Results Conclusion Results Rankings for purchase in RCV1 Corpus 1. Romaniaā€™s State Ownership Fund (FPS), the countryā€™s main privatisation body, said on Wednesday it had accepted ļ¬ve bids for the purchase of a 50.98 percent stake in the largest local cement maker Romcim. 2. Grand Hotel Group said on Wednesday it has agreed to procure an option to purchase the remaining 50 percent of the Grand Hyatt complex in Melbourne from hotel developer and investor Lustig & Moar. 3. The purchase price for the business, which had 1996 calendar year sales of about $25 million, was not disclosed. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 64. Introduction Example Extraction Example Selection Results Conclusion Results Rankings for colonial in RCV1 Corpus 1. Hong Kong came to the end of 156 years of British colonial rule on June 30 and is now an autonomous capitalist region of China, running all its own aļ¬€airs except defence and diplomacy. 2. The letter was sent in error to the embassy of Portugal ā€“ the former colonial power in East Timor ā€“ and was neither returned nor forwarded to the Indonesian embassy. 3. Sino-British relations hit a snag when former Governor Chris Patten launched electoral reforms in the twilight years of colonial rule despite ļ¬erce opposition by Beijing. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 65. Introduction Example Extraction Example Selection Results Conclusion Outline 1 Introduction 2 Example Extraction 3 Example Selection 4 Results 5 Conclusion G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 66. Introduction Example Extraction Example Selection Results Conclusion Conclusion Framework for: extracting sense-disambiguated example sentences from parallel corpora selecting limited numbers of sentences given space constraints Future Work: better disambiguation, e.g. additional languages, better techniques additional input for selection: sentence length, deļ¬nition extraction techniques integrated user interface for UWN (Universal Wordnet) Contact: demelo@mpi-inf.mpg.de G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 67. Introduction Example Extraction Example Selection Results Conclusion Conclusion Framework for: extracting sense-disambiguated example sentences from parallel corpora selecting limited numbers of sentences given space constraints Future Work: better disambiguation, e.g. additional languages, better techniques additional input for selection: sentence length, deļ¬nition extraction techniques integrated user interface for UWN (Universal Wordnet) Contact: demelo@mpi-inf.mpg.de G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  • 68. Introduction Example Extraction Example Selection Results Conclusion Conclusion Framework for: extracting sense-disambiguated example sentences from parallel corpora selecting limited numbers of sentences given space constraints Future Work: better disambiguation, e.g. additional languages, better techniques additional input for selection: sentence length, deļ¬nition extraction techniques integrated user interface for UWN (Universal Wordnet) Contact: demelo@mpi-inf.mpg.de G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences