Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Introduction
Example Extraction
Example Selection
Results
Conclusion
Extracting
Sense-Disambiguated Example Sentences
From...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Outline
1 Introduction
2 Example Extraction
3 Example...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Outline
1 Introduction
2 Example Extraction
3 Example...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
a thin, sharp-pointed metal pin with a r...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
a thin, sharp-pointed metal pin with a r...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Example Sentences
any sentence that cont...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Example Sentences
any sentence that cont...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Example Sentences
any sentence that cont...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Example Sentences
=⇒ most modern diction...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Example Sentences
=⇒ most modern diction...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Example Sentences
=⇒ most modern diction...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Example Sentences
=⇒ most modern diction...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Goals
automatically obtain example sente...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Goals
automatically obtain example sente...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Outline
1 Introduction
2 Example Extraction
3 Example...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Sense Index (Dictionary)
English: ...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Sense Index (Dictionary)
English: ...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Sense Index (Dictionary)
English: ...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Sense Disambiguation
disambiguate ...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Sense Disambiguation
disambiguate ...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Sense Disambiguation
disambiguate ...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Sense Disambiguation
disambiguate ...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Sense Disambiguation
There were many bat...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Introduction
Sense Disambiguation
There were many bat...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Parallel Disambiguation
req.: text...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Parallel Disambiguation
req.: text...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Parallel Disambiguation
req.: text...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Components
csim(tb, sa): cross-lin...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Components
csim(tb, sa): cross-lin...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Components
csim(tb, sa): cross-lin...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Extraction
Extraction
Use as example sentence...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Outline
1 Introduction
2 Example Extraction
3 Example...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Task Motivation
for computational a...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Task Motivation
for computational a...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Task Motivation
for computational a...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
What is a good example sentence?
on...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
What is a good example sentence?
on...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
What is a good example sentence?
on...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Assets
example sentences can be tho...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Assets
example sentences can be tho...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Assets
example sentences can be tho...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Assets
example sentences can be tho...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
What is a good set of example sente...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
What is a good set of example sente...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
What is a good set of example sente...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
What is a good set of example sente...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Greedy Heuristic to select set C
Gr...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Greedy Heuristic to select set C
Gr...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Example Selection
Greedy Heuristic to select set C
Gr...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Outline
1 Introduction
2 Example Extraction
3 Example...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Resources
OPUS corpora (OpenSubtitles, OpenOffi...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Resources
OPUS corpora (OpenSubtitles, OpenOffi...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Resources
OPUS corpora (OpenSubtitles, OpenOffi...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Resources
OPUS corpora (OpenSubtitles, OpenOffi...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Examples from OpenSubtitles Corpus
line (some...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Examples from OpenSubtitles Corpus
talk (exch...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Sense-disambiguated Example Sentences
Corpus ...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
error analysis:
(1) incorrect lexical alignme...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
error analysis:
(1) incorrect lexical alignme...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Rankings for OpenSubtitles Corpus
being or lo...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Rankings for OpenSubtitles Corpus
using or pr...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Rankings for long in RCV1 Corpus
1. In the lo...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Rankings for purchase in RCV1 Corpus
1. Roman...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Results
Rankings for colonial in RCV1 Corpus
1. Hong ...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Outline
1 Introduction
2 Example Extraction
3 Example...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Conclusion
Framework for:
extracting sense-disambigua...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Conclusion
Framework for:
extracting sense-disambigua...
Introduction
Example Extraction
Example Selection
Results
Conclusion
Conclusion
Framework for:
extracting sense-disambigua...
Upcoming SlideShare
Loading in …5
×

Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

1,317 views

Published on

Example sentences provide an intuitive means of grasping the meaning of a word, and are frequently used to complement conventional word definitions. When a word has multiple meanings, it is useful to have example sentences for specific senses (and hence definitions) of that word rather than indiscriminately lumping all of them together. In this paper, we investigate to what extent such sense-specific example sentences can be extracted from parallel corpora using lexical knowledge bases for multiple languages as a sense index. We use word sense disambiguation heuristics and a cross-lingual measure of semantic similarity to link example sentences to specific word senses. From the sentences found for a given sense, an algorithm then selects a smaller subset that can be presented to end users, taking into account both representativeness and diversity. Preliminary results show that a precision of around 80% can be obtained for a reasonable number of word senses, and that the subset selection yields convincing results.

  • Be the first to comment

Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

  1. 1. Introduction Example Extraction Example Selection Results Conclusion Extracting Sense-Disambiguated Example Sentences From Parallel Corpora Gerard de Melo and Gerhard Weikum Max Planck Institute for Informatics Saarbr¨ucken, Germany 2009-09-18 G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  2. 2. Introduction Example Extraction Example Selection Results Conclusion Outline 1 Introduction 2 Example Extraction 3 Example Selection 4 Results 5 Conclusion G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  3. 3. Introduction Example Extraction Example Selection Results Conclusion Outline 1 Introduction 2 Example Extraction 3 Example Selection 4 Results 5 Conclusion G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  4. 4. Introduction Example Extraction Example Selection Results Conclusion Introduction a thin, sharp-pointed metal pin with a raised spiral thread running around it and a slotted head, used to join things together by being rotated in under pressure (Concise OED) Use a crosshead screwdriver to tighten the screws inserted in the pre-fixed connectors. (Web) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  5. 5. Introduction Example Extraction Example Selection Results Conclusion Introduction a thin, sharp-pointed metal pin with a raised spiral thread running around it and a slotted head, used to join things together by being rotated in under pressure (Concise OED) Use a crosshead screwdriver to tighten the screws inserted in the pre-fixed connectors. (Web) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  6. 6. Introduction Example Extraction Example Selection Results Conclusion Introduction Example Sentences any sentence that contains a word (being used in a specific sense) allow the user to grasp a word’s meaning see circumstances a word would typically be used in G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  7. 7. Introduction Example Extraction Example Selection Results Conclusion Introduction Example Sentences any sentence that contains a word (being used in a specific sense) allow the user to grasp a word’s meaning see circumstances a word would typically be used in traditional intensional word definitions may be too confusing humans used to deriving meaning from context users can verify whether they have understood definition correctly G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  8. 8. Introduction Example Extraction Example Selection Results Conclusion Introduction Example Sentences any sentence that contains a word (being used in a specific sense) allow the user to grasp a word’s meaning see circumstances a word would typically be used in possible contexts, e.g. child vs. youngster typical collocations, e.g. to give birth or birth rate but not *to give nascence or *nascence rate G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  9. 9. Introduction Example Extraction Example Selection Results Conclusion Introduction Example Sentences =⇒ most modern dictionaries include example sentences number, length often limited digital dictionaries: tight space constraints of print media no longer apply! larger number of example sentences can be presented to the user on demand G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  10. 10. Introduction Example Extraction Example Selection Results Conclusion Introduction Example Sentences =⇒ most modern dictionaries include example sentences number, length often limited digital dictionaries: tight space constraints of print media no longer apply! larger number of example sentences can be presented to the user on demand G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  11. 11. Introduction Example Extraction Example Selection Results Conclusion Introduction Example Sentences =⇒ most modern dictionaries include example sentences number, length often limited digital dictionaries: tight space constraints of print media no longer apply! larger number of example sentences can be presented to the user on demand G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  12. 12. Introduction Example Extraction Example Selection Results Conclusion Introduction Example Sentences =⇒ most modern dictionaries include example sentences number, length often limited digital dictionaries: tight space constraints of print media no longer apply! larger number of example sentences can be presented to the user on demand G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  13. 13. Introduction Example Extraction Example Selection Results Conclusion Introduction Goals automatically obtain example sentences for a specific sense of a word choose set of representative sentences to present to user distinguish senses: There were many bats flying out of the cave. vs. In professional baseball, only wooden bats are permitted. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  14. 14. Introduction Example Extraction Example Selection Results Conclusion Introduction Goals automatically obtain example sentences for a specific sense of a word choose set of representative sentences to present to user not all examples are equally useful screen space may be limited (can still show more only on demand) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  15. 15. Introduction Example Extraction Example Selection Results Conclusion Outline 1 Introduction 2 Example Extraction 3 Example Selection 4 Results 5 Conclusion G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  16. 16. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Sense Index (Dictionary) English: Princeton WordNet 3.0 Spanish: Spanish WordNet σ(t): get senses for term t σ(t, s): is sense s a sense of term t? behind the scenes: morphological analysis, multi-word expression detection G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  17. 17. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Sense Index (Dictionary) English: Princeton WordNet 3.0 Spanish: Spanish WordNet σ(t): get senses for term t σ(t, s): is sense s a sense of term t? behind the scenes: morphological analysis, multi-word expression detection G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  18. 18. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Sense Index (Dictionary) English: Princeton WordNet 3.0 Spanish: Spanish WordNet σ(t): get senses for term t σ(t, s): is sense s a sense of term t? behind the scenes: morphological analysis, multi-word expression detection G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  19. 19. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Sense Disambiguation disambiguate word occurrences, then use sentence as example for those word senses fine-grained WSD not reliable enough (cf. SemEval results) idea: use parallel corpora, jointly look at both versions of text =⇒ greater accuracy can be achieved G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  20. 20. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Sense Disambiguation disambiguate word occurrences, then use sentence as example for those word senses fine-grained WSD not reliable enough (cf. SemEval results) idea: use parallel corpora, jointly look at both versions of text =⇒ greater accuracy can be achieved G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  21. 21. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Sense Disambiguation disambiguate word occurrences, then use sentence as example for those word senses fine-grained WSD not reliable enough (cf. SemEval results) idea: use parallel corpora, jointly look at both versions of text =⇒ greater accuracy can be achieved G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  22. 22. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Sense Disambiguation disambiguate word occurrences, then use sentence as example for those word senses fine-grained WSD not reliable enough (cf. SemEval results) idea: use parallel corpora, jointly look at both versions of text =⇒ greater accuracy can be achieved G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  23. 23. Introduction Example Extraction Example Selection Results Conclusion Introduction Sense Disambiguation There were many bats flying out of the cave. Aus der H¨ohle flogen viele Flederm¨ause. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  24. 24. Introduction Example Extraction Example Selection Results Conclusion Introduction Sense Disambiguation There were many bats flying out of the cave. Aus der H¨ohle flogen viele Flederm¨ause. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  25. 25. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Parallel Disambiguation req.: text available in two languages a, b good news: more and more parallel corpora available word alignment compute score wsd(sa|ta, tb) = wsd(sa|ta) σ(ta,sa)csim(tb,sa) s ∈σ(ta) σ(ta,s )csim(tb,s ) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  26. 26. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Parallel Disambiguation req.: text available in two languages a, b good news: more and more parallel corpora available word alignment compute score wsd(sa|ta, tb) = wsd(sa|ta) σ(ta,sa)csim(tb,sa) s ∈σ(ta) σ(ta,s )csim(tb,s ) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  27. 27. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Parallel Disambiguation req.: text available in two languages a, b good news: more and more parallel corpora available word alignment compute score wsd(sa|ta, tb) = wsd(sa|ta) σ(ta,sa)csim(tb,sa) s ∈σ(ta) σ(ta,s )csim(tb,s ) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  28. 28. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Components csim(tb, sa): cross-lingual similarity monolingual WSD: compare bag-of-words vector for term context with sense context (constructed from definitions, etc.) Semantic Similarity sim(s1, s2) identify only near-identical senses (e.g. of house and home), not arbitrary associations (e.g. house and door) csim(tb, sa) = sb∈σ(tb) sim(sa, sb) wsd(sb|tb) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  29. 29. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Components csim(tb, sa): cross-lingual similarity monolingual WSD: compare bag-of-words vector for term context with sense context (constructed from definitions, etc.) Semantic Similarity sim(s1, s2) identify only near-identical senses (e.g. of house and home), not arbitrary associations (e.g. house and door) wsd(s|t) = σ(t, s) α + v(s)T v(t) ||v(s)|| ||v(t)|| G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  30. 30. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Components csim(tb, sa): cross-lingual similarity monolingual WSD: compare bag-of-words vector for term context with sense context (constructed from definitions, etc.) Semantic Similarity sim(s1, s2) identify only near-identical senses (e.g. of house and home), not arbitrary associations (e.g. house and door) sim(s1, s2) =    1 s1 = s2 1 s1, s2 in near-synonymy relationship 1 s1, s2 in hypernymy/hyponymy relationship 0 otherwise G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  31. 31. Introduction Example Extraction Example Selection Results Conclusion Example Extraction Extraction Use as example sentence iff score sufficiently high wsd(sa|ta, tb) = wsd(ta, sa) σ(ta,sa)csim(tb,sa) s ∈σ(ta) σ(ta,s )csim(tb,s ) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  32. 32. Introduction Example Extraction Example Selection Results Conclusion Outline 1 Introduction 2 Example Extraction 3 Example Selection 4 Results 5 Conclusion G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  33. 33. Introduction Example Extraction Example Selection Results Conclusion Example Selection Task Motivation for computational applications: more data =⇒ better data for human users: a good limited selection should be provided at first assumption: space constraint k - number of sentences goal: choose a good set of k example sentences G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  34. 34. Introduction Example Extraction Example Selection Results Conclusion Example Selection Task Motivation for computational applications: more data =⇒ better data for human users: a good limited selection should be provided at first assumption: space constraint k - number of sentences goal: choose a good set of k example sentences G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  35. 35. Introduction Example Extraction Example Selection Results Conclusion Example Selection Task Motivation for computational applications: more data =⇒ better data for human users: a good limited selection should be provided at first assumption: space constraint k - number of sentences goal: choose a good set of k example sentences G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  36. 36. Introduction Example Extraction Example Selection Results Conclusion Example Selection What is a good example sentence? one that showcases how a word is used in context (typical prepositions, collocations, etc.) one that helps in grasping the meaning related work by Rychly et al. (2008): one that is intelligible to learners G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  37. 37. Introduction Example Extraction Example Selection Results Conclusion Example Selection What is a good example sentence? one that showcases how a word is used in context (typical prepositions, collocations, etc.) one that helps in grasping the meaning related work by Rychly et al. (2008): one that is intelligible to learners G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  38. 38. Introduction Example Extraction Example Selection Results Conclusion Example Selection What is a good example sentence? one that showcases how a word is used in context (typical prepositions, collocations, etc.) one that helps in grasping the meaning related work by Rychly et al. (2008): one that is intelligible to learners G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  39. 39. Introduction Example Extraction Example Selection Results Conclusion Example Selection Assets example sentences can be thought of as having certain assets 6 different classes of n-gram assets 1 class of assets for entire original sentence e.g. for account: containing frequent bigram bank account can be an asset containing frequent trigram to account for can be an asset G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  40. 40. Introduction Example Extraction Example Selection Results Conclusion Example Selection Assets example sentences can be thought of as having certain assets 6 different classes of n-gram assets 1 class of assets for entire original sentence 1: original unigram 2-5: bigram with preceding/following, trigram, etc. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  41. 41. Introduction Example Extraction Example Selection Results Conclusion Example Selection Assets example sentences can be thought of as having certain assets 6 different classes of n-gram assets 1 class of assets for entire original sentence weights w(a) = f (x,a) n i=1 f (xi ,a) , etc. =⇒ higher weight for open an account than for Peter’s chequing account G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  42. 42. Introduction Example Extraction Example Selection Results Conclusion Example Selection Assets example sentences can be thought of as having certain assets 6 different classes of n-gram assets 1 class of assets for entire original sentence weight: cosine similarity with sense definition =⇒ bias towards explanatory sentences G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  43. 43. Introduction Example Extraction Example Selection Results Conclusion Example Selection What is a good set of example sentences? Select sentences with greatest total asset weights? Instead: maximize a∈ x∈C A(x) w(a) Problem is NP-hard (proof via Vertex Cover reduction) −→ use greedy heuristic No: Most frequent expressions would dominate the result set. Need diversity! G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  44. 44. Introduction Example Extraction Example Selection Results Conclusion Example Selection What is a good set of example sentences? Select sentences with greatest total asset weights? Instead: maximize a∈ x∈C A(x) w(a) Problem is NP-hard (proof via Vertex Cover reduction) −→ use greedy heuristic each asset in result set only counted once G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  45. 45. Introduction Example Extraction Example Selection Results Conclusion Example Selection What is a good set of example sentences? Select sentences with greatest total asset weights? Instead: maximize a∈ x∈C A(x) w(a) Problem is NP-hard (proof via Vertex Cover reduction) −→ use greedy heuristic G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  46. 46. Introduction Example Extraction Example Selection Results Conclusion Example Selection What is a good set of example sentences? Select sentences with greatest total asset weights? Instead: maximize a∈ x∈C A(x) w(a) Problem is NP-hard (proof via Vertex Cover reduction) −→ use greedy heuristic G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  47. 47. Introduction Example Extraction Example Selection Results Conclusion Example Selection Greedy Heuristic to select set C Greedily select highest-weighted sentence x: x ← argmax x∈XC a∈A(x) w(a) Then set the weights w(a) of all assets a ∈ A(x) to zero =⇒ ranked list obtained (useful!) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  48. 48. Introduction Example Extraction Example Selection Results Conclusion Example Selection Greedy Heuristic to select set C Greedily select highest-weighted sentence x: x ← argmax x∈XC a∈A(x) w(a) Then set the weights w(a) of all assets a ∈ A(x) to zero =⇒ ranked list obtained (useful!) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  49. 49. Introduction Example Extraction Example Selection Results Conclusion Example Selection Greedy Heuristic to select set C Greedily select highest-weighted sentence x: x ← argmax x∈XC a∈A(x) w(a) Then set the weights w(a) of all assets a ∈ A(x) to zero =⇒ ranked list obtained (useful!) G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  50. 50. Introduction Example Extraction Example Selection Results Conclusion Outline 1 Introduction 2 Example Extraction 3 Example Selection 4 Results 5 Conclusion G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  51. 51. Introduction Example Extraction Example Selection Results Conclusion Results Resources OPUS corpora (OpenSubtitles, OpenOffice.org collections) and Reuters RCV1 for non-disambiguated sentence selection GIZA++/UPlug for lexical alignment Princeton WordNet 3.0 and Spanish WordNet (+ sense mappings for compatibility) TreeTagger for morphological analysis G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  52. 52. Introduction Example Extraction Example Selection Results Conclusion Results Resources OPUS corpora (OpenSubtitles, OpenOffice.org collections) and Reuters RCV1 for non-disambiguated sentence selection GIZA++/UPlug for lexical alignment Princeton WordNet 3.0 and Spanish WordNet (+ sense mappings for compatibility) TreeTagger for morphological analysis G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  53. 53. Introduction Example Extraction Example Selection Results Conclusion Results Resources OPUS corpora (OpenSubtitles, OpenOffice.org collections) and Reuters RCV1 for non-disambiguated sentence selection GIZA++/UPlug for lexical alignment Princeton WordNet 3.0 and Spanish WordNet (+ sense mappings for compatibility) TreeTagger for morphological analysis G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  54. 54. Introduction Example Extraction Example Selection Results Conclusion Results Resources OPUS corpora (OpenSubtitles, OpenOffice.org collections) and Reuters RCV1 for non-disambiguated sentence selection GIZA++/UPlug for lexical alignment Princeton WordNet 3.0 and Spanish WordNet (+ sense mappings for compatibility) TreeTagger for morphological analysis G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  55. 55. Introduction Example Extraction Example Selection Results Conclusion Results Examples from OpenSubtitles Corpus line (something, as a cord or rope, that is long and thin and flexible) I got some fishing line if you want me to stitch that. Von Sefelt, get the stern line. line (the descendants of one individual) What line of kings do you descend from? My line has ended. catch (catch up with and possibly overtake) He’s got 100 laps to catch Beau Brandenburg if he wants to become world champion. They won’t catch up. catch (grasp with the I didn’t catch your name. mind or develop an understanding of) Sorry, I didn’t catch it. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  56. 56. Introduction Example Extraction Example Selection Results Conclusion Results Examples from OpenSubtitles Corpus talk (exchange thoughts, talk with) Why don’t we have a seat and talk it over. Okay, I’ll talk to you but one condition... talk (use language) But we’ll be listening from the kitchen so talk loud. You spit when you talk. opening (a ceremony accompanying the start of some enterprise) We don’t have much time until the opening day of Exhibition. What a disaster tomorrow is the opening ceremony! opening (the first performance, as of a theatrical production) It will be rehearsed in the morning ready for the opening tomorrow night. You ready for our big opening night? G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  57. 57. Introduction Example Extraction Example Selection Results Conclusion Results Sense-disambiguated Example Sentences Corpus Covered Senses Example Sentences Accuracy (Wilson interval) OpenSubtitles en-es 13,559 117,078 0.815 ± 0.081 OpenSubtitles es-en 8,833 113,018 0.798 ± 0.090 OpenOffice.org en-es 1,341 13,295 0.803 ± 0.081 OpenOffice.org es-en 932 11,181 0.793 ± 0.087 G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  58. 58. Introduction Example Extraction Example Selection Results Conclusion Results error analysis: (1) incorrect lexical alignments unlikely to lead to incorrect disambiguation (2) incomplete sense inventories can lead to mistakes (3) also, on a few occasions, the morphological analyser led to wrong results implications for future work: (1) better monolingual WSD (2) additional languages, especially phylogenetically unrelated G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  59. 59. Introduction Example Extraction Example Selection Results Conclusion Results error analysis: (1) incorrect lexical alignments unlikely to lead to incorrect disambiguation (2) incomplete sense inventories can lead to mistakes (3) also, on a few occasions, the morphological analyser led to wrong results implications for future work: (1) better monolingual WSD (2) additional languages, especially phylogenetically unrelated G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  60. 60. Introduction Example Extraction Example Selection Results Conclusion Results Rankings for OpenSubtitles Corpus being or located on or directed toward the 1. In America we drive on the right side of the road. side of the body to the east when facing north 2. I’ll tie down your right arm so you can learn to throw a left. 3. If we wait from the right side, we have an advantage there. put up with something 1. You can’t stand it, can you? or somebody unpleasant 2. You really think I can tolerate such an act? 3. No one can stand that harmonica all day long. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  61. 61. Introduction Example Extraction Example Selection Results Conclusion Results Rankings for OpenSubtitles Corpus using or providing or 1. Not the electric chair. producing or transmitting or 2. Some electrical current circulating through my body. operated by electricity 3. Near as I can tell it’s an electrical impulse. take something or somebody with 1. And they were kind enough to take me in here. oneself somewhere 2. It conveys such a great feeling. 3. We interrupt this program to bring you a special news bulletin. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  62. 62. Introduction Example Extraction Example Selection Results Conclusion Results Rankings for long in RCV1 Corpus 1. In the long term interest rate market, the yield of the key 182nd 10 year Japanese government bond (JGB) fell to 2.060 percent early on Tuesday, a record low for any benchmark 10-year JGB. 2. “The government and opposition have gambled away the last chance for a long time to prove they recognise the country’s problems, and that they put the national good above their own power interests”, news weekly Der Spiegel said. 3. As long as the index keeps hovering between 957 and 995, we will maintain our short term neutral recommendation. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  63. 63. Introduction Example Extraction Example Selection Results Conclusion Results Rankings for purchase in RCV1 Corpus 1. Romania’s State Ownership Fund (FPS), the country’s main privatisation body, said on Wednesday it had accepted five bids for the purchase of a 50.98 percent stake in the largest local cement maker Romcim. 2. Grand Hotel Group said on Wednesday it has agreed to procure an option to purchase the remaining 50 percent of the Grand Hyatt complex in Melbourne from hotel developer and investor Lustig & Moar. 3. The purchase price for the business, which had 1996 calendar year sales of about $25 million, was not disclosed. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  64. 64. Introduction Example Extraction Example Selection Results Conclusion Results Rankings for colonial in RCV1 Corpus 1. Hong Kong came to the end of 156 years of British colonial rule on June 30 and is now an autonomous capitalist region of China, running all its own affairs except defence and diplomacy. 2. The letter was sent in error to the embassy of Portugal – the former colonial power in East Timor – and was neither returned nor forwarded to the Indonesian embassy. 3. Sino-British relations hit a snag when former Governor Chris Patten launched electoral reforms in the twilight years of colonial rule despite fierce opposition by Beijing. G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  65. 65. Introduction Example Extraction Example Selection Results Conclusion Outline 1 Introduction 2 Example Extraction 3 Example Selection 4 Results 5 Conclusion G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  66. 66. Introduction Example Extraction Example Selection Results Conclusion Conclusion Framework for: extracting sense-disambiguated example sentences from parallel corpora selecting limited numbers of sentences given space constraints Future Work: better disambiguation, e.g. additional languages, better techniques additional input for selection: sentence length, definition extraction techniques integrated user interface for UWN (Universal Wordnet) Contact: demelo@mpi-inf.mpg.de G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  67. 67. Introduction Example Extraction Example Selection Results Conclusion Conclusion Framework for: extracting sense-disambiguated example sentences from parallel corpora selecting limited numbers of sentences given space constraints Future Work: better disambiguation, e.g. additional languages, better techniques additional input for selection: sentence length, definition extraction techniques integrated user interface for UWN (Universal Wordnet) Contact: demelo@mpi-inf.mpg.de G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
  68. 68. Introduction Example Extraction Example Selection Results Conclusion Conclusion Framework for: extracting sense-disambiguated example sentences from parallel corpora selecting limited numbers of sentences given space constraints Future Work: better disambiguation, e.g. additional languages, better techniques additional input for selection: sentence length, definition extraction techniques integrated user interface for UWN (Universal Wordnet) Contact: demelo@mpi-inf.mpg.de G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences

×