Your SlideShare is downloading. ×
0
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Towards a Universal Wordnet by Learning from Combined Evidence

202

Published on

Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction …

Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their
meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high
level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
202
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
10
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Towards a Universal Wordnet by Learning from Combined Evidence Gerard de Melo and Gerhard Weikum Max Planck Institute for Informatics Saarbr¨ucken, Germany 2009-11-03 Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 1/29
  • 2. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? person who gives a talk “speaker” device that produces sounds Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 3. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? flat piece of wood “board” committee panel for writing with chalk to enter a transportation vehicle Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 4. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? someone who studies “student” “pupil” Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 5. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? faculty professor member part Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 6. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? entity institution educational institution university ... Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 7. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? Many Applications examples: NLP, AI question answering query expansion human consultation entity institution educational institution university ... Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 8. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Multilinguality the world is multilingual the Internet is also increasingly multilingual Top 10 Languages by Approx. No. of Speakers Source: Ethnologue 2005 Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 3/29
  • 9. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Multilinguality the world is multilingual the Internet is also increasingly multilingual Internet users by Region Source: Internet World Stats Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 3/29
  • 10. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction person who gives a talk eng: “speaker” jpn: “ ”話者 rus: “докладчик” ces: “řečník” ... ...... Vision universal index of word meanings large-scale semantic network with class hierarchy look up any word in any language, get a list of its meanings Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 4/29
  • 11. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction entitypor: “entidade” cmn: “ ”制度 institution educational institution university heb: “‫ישות‬.” deu: “Bildungs- einrichtung” cym: “prifysgol” ... Vision universal index of word meanings large-scale semantic network with class hierarchy meanings should be connected via semantic relations Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 4/29
  • 12. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Outline 1 Existing Lexical Knowledge Bases 2 Building a Multilingual Wordnet 3 Results and Experiments 4 Summary and Future Work Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 5/29
  • 13. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Outline 1 Existing Lexical Knowledge Bases 2 Building a Multilingual Wordnet 3 Results and Experiments 4 Summary and Future Work Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 6/29
  • 14. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases WordNet lexical database created at Princeton enumerates meanings of English words meaning-to-meaning links Miller, Fellbaum et al. (1990) among most-cited papers in computer science (source: CiteseerX) Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 7/29
  • 15. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases WordNet lexical database created at Princeton enumerates meanings of English words meaning-to-meaning links Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 7/29
  • 16. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases WordNet lexical database created at Princeton enumerates meanings of English words meaning-to-meaning links hypernym hierarchy meronymy (part of) etc. Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 7/29
  • 17. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Non-English Wordnets EuroWordNet, BalkaNet, Global WordNet Association problem: many are small, incomplete problem: different identifiers, formats, etc. problem: only ∼10 languages with freely available wordnets Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 8/29
  • 18. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Non-English Wordnets EuroWordNet, BalkaNet, Global WordNet Association problem: many are small, incomplete problem: different identifiers, formats, etc. problem: only ∼10 languages with freely available wordnets Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 8/29
  • 19. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Non-English Wordnets EuroWordNet, BalkaNet, Global WordNet Association problem: many are small, incomplete problem: different identifiers, formats, etc. problem: only ∼10 languages with freely available wordnets not a single, coherent resource Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 8/29
  • 20. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Non-English Wordnets EuroWordNet, BalkaNet, Global WordNet Association problem: many are small, incomplete problem: different identifiers, formats, etc. problem: only ∼10 languages with freely available wordnets Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 8/29
  • 21. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Other Resources PANGLOSS Ontology: Knight & Luk (1994) TransGraph system: Etzioni et al. (2007) DBPedia, YAGO, OpenCyc 2 languages, around 70 000 entities Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 9/29
  • 22. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Other Resources PANGLOSS Ontology: Knight & Luk (1994) TransGraph system: Etzioni et al. (2007) DBPedia, YAGO, OpenCyc large translation graph limited structure e.g. no semantic hierarchy Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 9/29
  • 23. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Other Resources PANGLOSS Ontology: Knight & Luk (1994) TransGraph system: Etzioni et al. (2007) DBPedia, YAGO, OpenCyc class hierarchy not multilingual Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 9/29
  • 24. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Outline 1 Existing Lexical Knowledge Bases 2 Building a Multilingual Wordnet 3 Results and Experiments 4 Summary and Future Work Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 10/29
  • 25. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Strategy use existing wordnets as backbone add new terms, link to meaning nodes spa: “trayectoria” academic course part of a meal route of travel series of events eng: “course” eng: “class” Existing Wordnets Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 11/29
  • 26. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Strategy use existing wordnets as backbone add new terms, link to meaning nodes spa: “trayectoria” academic course part of a meal route of travel series of events eng: “course” eng: “class” Existing Wordnets −→ deu: “Reihe” spa: “trayectoria” academic course part of a meal route of travel series of events ita: “piatto” fra: “suite” eng: “course” deu: “Kurs” eng: “class” Desired Output Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 11/29
  • 27. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Input Graph use existing wordnets as backbone add translations to graph mainly English, Spanish, Catalan spa: “trayectoria” academic course part of a meal route of travel series of events eng: “course” eng: “class” Input Graph G0 Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 12/29
  • 28. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Input Graph use existing wordnets as backbone add translations to graph dictionaries (e.g. Wiktionary) thesauri and ontologies parallel corpora (word alignment) also: predict new translations deu: “Reihe” spa: “trayectoria” academic course part of a meal route of travel series of events ita: “piatto” fra: “suite” eng: “course” deu: “Kurs” eng: “class” Input Graph G0 Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 12/29
  • 29. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Approach: Link new words to meanings of their translations Huge Challenge: Disambiguation! academic course part of a meal route of travel series of events ita: “piatto” eng: “course” trans- lation Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 13/29
  • 30. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Approach: Link new words to meanings of their translations Huge Challenge: Disambiguation! academic course part of a meal route of travel series of events ita: “piatto” eng: “course” trans- lation ? ? ? ? Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 13/29
  • 31. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet academic course part of a meal route of travel series of events ita: “piatto” eng: “course” trans- lation ? ? ? ? Approach variety of features that analyse previous graph Gi−1, incorporate neighbourhood information into an edge’s feature vector supervised learning: new edge weights determined using RBF-kernel SVM with posterior probability estimation Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 14/29
  • 32. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet academic course part of a meal route of travel series of events ita: “piatto” eng: “course” trans- lation ? ? ? ? Approach variety of features that analyse previous graph Gi−1, incorporate neighbourhood information into an edge’s feature vector supervised learning: new edge weights determined using RBF-kernel SVM with posterior probability estimation Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 14/29
  • 33. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Example Feature: fra: “suite” academic course ? t m Given term t and meaning m Question: Should they be linked? Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 15/29
  • 34. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Example Feature: fra: “suite” academic course ? t m fra: “suite” spa: “trayectoria” eng: “course” part of a meal academic course route of travel ... series of eventst' m'm' Given term t and meaning m Question: Should they be linked? Look at neighbours t ∈ Γt Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 15/29
  • 35. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Example Feature: fra: “suite” academic course ? t m fra: “suite” spa: “trayectoria” eng: “course” part of a meal academic course route of travel ... series of eventst' m'm' t ∈Γ(t) sim∗(t , m) sim∗(t , m) + dissim(t , m) Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 15/29
  • 36. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Example Feature: fra: “suite” academic course ? t m fra: “suite” spa: “trayectoria” eng: “course” part of a meal academic course route of travel ... series of eventst' m'm' t ∈Γ(t) sim∗(t , m) sim∗(t , m) + dissim(t , m) sim∗(t ,m)= max m ∈Γ(t ) sim(m ,m) dissim(t ,m)= P m ∈Γ(t ) (1−sim(m ,m)) Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 15/29
  • 37. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Example Feature: fra: “suite” academic course ? t m fra: “suite” spa: “trayectoria” eng: “course” part of a meal academic course route of travel ... series of eventst' m'm' t ∈Γ(t) φ1(t, t ) sim∗(t , m) sim∗(t , m) + dissim(t , m) sim∗(t ,m)= max m ∈Γ(t ) φ2(t ,m )sim(m ,m) dissim(t ,m)= P m ∈Γ(t ) φ2(t ,m )(1−sim(m ,m)) weighting based on: part-of-speech corpus frequency ... Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 15/29
  • 38. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet deu: “Reihe” spa: “trayectoria” academic course part of a meal route of travel series of events ita: “piatto” fra: “suite” eng: “course” deu: “Kurs” eng: “class” Other Features cosine similarity of translations with gloss scores assessing polysemy by looking at back-translations many more (see paper for details) Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 16/29
  • 39. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet deu: “Reihe” spa: “trayectoria” academic course part of a meal route of travel series of events ita: “piatto” fra: “suite” eng: “course” deu: “Kurs” eng: “class” Approach use scores as features for RBF-kernel SVM multiple iterations: each graphs Gi based on the previous Gi−1 stop when F1 score plateau is reached on a validation set Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 16/29
  • 40. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet deu: “Reihe” spa: “trayectoria” academic course part of a meal route of travel series of events ita: “piatto” fra: “suite” eng: “course” deu: “Kurs” eng: “class” Approach use scores as features for RBF-kernel SVM multiple iterations: each graphs Gi based on the previous Gi−1 stop when F1 score plateau is reached on a validation set Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 16/29
  • 41. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet deu: “Reihe” spa: “trayectoria” academic course part of a meal route of travel series of events ita: “piatto” fra: “suite” eng: “course” deu: “Kurs” eng: “class” Approach use scores as features for RBF-kernel SVM multiple iterations: each graphs Gi based on the previous Gi−1 stop when F1 score plateau is reached on a validation set Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 16/29
  • 42. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Outline 1 Existing Lexical Knowledge Bases 2 Building a Multilingual Wordnet 3 Results and Experiments 4 Summary and Future Work Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 17/29
  • 43. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Results Setup input graph G0: 448,069 pre-existing term-meaning links 10,805,400 translation edges 1.3 million term nodes with candidates 7.7 candidate meanings per new term 2,445 term-meaning links for training (French/German) 2,901 term-meaning links as validation set Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 18/29
  • 44. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Results Setup input graph G0: 448,069 pre-existing term-meaning links 10,805,400 translation edges 1.3 million term nodes with candidates 7.7 candidate meanings per new term 2,445 term-meaning links for training (French/German) 2,901 term-meaning links as validation set Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 18/29
  • 45. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Results Setup input graph G0: 448,069 pre-existing term-meaning links 10,805,400 translation edges 1.3 million term nodes with candidates 7.7 candidate meanings per new term 2,445 term-meaning links for training (French/German) 2,901 term-meaning links as validation set Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 18/29
  • 46. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Results deu: “Schulgebäude” school (group of fish) school (institution) school (building) deu: “Schulhaus” deu: “Fischschwarm” ces: “hejno” fra: “banc” ind: “sekolah” jpn: “ ”学校 kor: “ ”학교 lao: “ໂຮງຮຽນ” kat: “ ”სკოლა Excerpt from final UWN graph G3 after 3 iterations retaining only edges with sufficiently high weights (0.5 / 0.6) Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 19/29
  • 47. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Evaluation Relation Precision1 Term-Meaning Links (French) 89.2% ± 3.4% Term-Meaning Links (German) 85.9% ± 3.8% Term-Meaning Links (Mandarin Chinese) 90.5% ± 3.3% Generalization (Hypernymy) 87.1% ± 4.8% Instance 89.3% ± 4.4% Similarity 92.0% ± 3.8% Category 93.3% ± 4.5% Part (Meronymy) 94.4% ± 4.1% Member (Meronymy) 92.7% ± 4.0% Substance (Meronymy) 95.6% ± 3.5% Opposite 94.3% ± 3.9% 1: Wilson score intervals for random samples Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 20/29
  • 48. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Coverage Language Term-Meaning Links Distinct Terms Overall 1,595,763 822,212 German 132,523 67,087 French 75,544 33,423 Esperanto 71,247 33,664 Dutch 68,792 30,154 Spanish 68,445 32,143 Turkish 67,641 31,553 Czech 59,268 33,067 Russian 57,929 26,293 Portuguese 55,569 23,499 Italian 52,008 24,974 Hungarian 46,492 28,324 Thai 44,523 30,815 Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 21/29
  • 49. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Application: Semantic Relatedness Experimental Setup Example: “curriculum” considered closely related to “school”, but not to “water” compute term relatedness using UWN sim(t1, t2) = max s1∈σ(t1) max s2∈σ(t2) sim(s1, s2) sim(s1, s2): combined graph-/gloss-based method compare with assessments of relatedness made by human judges Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 22/29
  • 50. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Application: Semantic Relatedness Experimental Setup Example: “curriculum” considered closely related to “school”, but not to “water” compute term relatedness using UWN sim(t1, t2) = max s1∈σ(t1) max s2∈σ(t2) sim(s1, s2) sim(s1, s2): combined graph-/gloss-based method compare with assessments of relatedness made by human judges Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 22/29
  • 51. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Application: Semantic Relatedness Experimental Setup Example: “curriculum” considered closely related to “school”, but not to “water” compute term relatedness using UWN sim(t1, t2) = max s1∈σ(t1) max s2∈σ(t2) sim(s1, s2) sim(s1, s2): combined graph-/gloss-based method compare with assessments of relatedness made by human judges Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 22/29
  • 52. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Application: Semantic Relatedness Results for 3 German Datasets Dataset GUR65 GUR350 ZG222 r Cov. r Cov. r Cov. Inter-Annot. Agreement 0.81 (65) 0.69 (350) 0.49 (222) Wikipedia (ESA*) 0.56 65 0.52 333 0.32 205 GermaNet (Lin*) 0.73 60 0.50 208 0.08 88 UWN 0.80 60 0.68 242 0.51 106 r: Pearson product-moment correlation coefficient Cov.: absolute coverage ∗: scores by Gurevych et al. (2007) Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 23/29
  • 53. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Application: Cross-Lingual Text Classification cross-lingual TC: train using documents in one language, classify documents in another language used bag-of-words/meanings TF-IDF vectors Dataset: Reuters corpora (RCV1/2) for each language pair: 105 binary classification tasks, each using 200 training documents, 600 test documents SVMlight Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 24/29
  • 54. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Application: Cross-Lingual Text Classification cross-lingual TC: train using documents in one language, classify documents in another language used bag-of-words/meanings TF-IDF vectors Dataset: Reuters corpora (RCV1/2) for each language pair: 105 binary classification tasks, each using 200 training documents, 600 test documents SVMlight Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 24/29
  • 55. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Application: Cross-Lingual Text Classification cross-lingual TC: train using documents in one language, classify documents in another language used bag-of-words/meanings TF-IDF vectors Dataset: Reuters corpora (RCV1/2) for each language pair: 105 binary classification tasks, each using 200 training documents, 600 test documents SVMlight Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 24/29
  • 56. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Application: Cross-Lingual Text Classification cross-lingual TC: train using documents in one language, classify documents in another language used bag-of-words/meanings TF-IDF vectors Dataset: Reuters corpora (RCV1/2) for each language pair: 105 binary classification tasks, each using 200 training documents, 600 test documents SVMlight Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 24/29
  • 57. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Setup Output Evaluation Application: Semantic Relatedness Application: Cross-Lingual Text Classification Application: Cross-Lingual Text Classification Language Pair Terms only Terms + Meanings English-Italian 68.3% 76.3% English-Russian 51.7% 71.2% Italian-English 74.4% 78.1% Italian-Russian 58.4% 73.2% Russian-English 67.3% 76.8% Russian-Italian 62.2% 71.8% (all values are F1 scores) Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 25/29
  • 58. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Summary Future Work Outline 1 Existing Lexical Knowledge Bases 2 Building a Multilingual Wordnet 3 Results and Experiments 4 Summary and Future Work Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 26/29
  • 59. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Summary Future Work Summary large-scale multilingual wordnet: 85% accuracy, 800,000 terms, over 1.5 million links from terms to meanings, built by learning edge weights using graph-based evidence useful for monolingual and cross-lingual tasks Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 27/29
  • 60. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Summary Future Work Summary large-scale multilingual wordnet: 85% accuracy, 800,000 terms, over 1.5 million links from terms to meanings, built by learning edge weights using graph-based evidence useful for monolingual and cross-lingual tasks Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 27/29
  • 61. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Summary Future Work Summary large-scale multilingual wordnet: 85% accuracy, 800,000 terms, over 1.5 million links from terms to meanings, built by learning edge weights using graph-based evidence useful for monolingual and cross-lingual tasks Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 27/29
  • 62. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Summary Future Work Future Work ongoing work: user interface incl. user contributions techniques to automatically discover new word meanings word sense disambiguation, query expansion using UWN Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 28/29
  • 63. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Summary Future Work Future Work ongoing work: user interface incl. user contributions techniques to automatically discover new word meanings word sense disambiguation, query expansion using UWN Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 28/29
  • 64. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Summary Future Work Future Work ongoing work: user interface incl. user contributions techniques to automatically discover new word meanings word sense disambiguation, query expansion using UWN Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 28/29
  • 65. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Summary Future Work Thanks! expression of gratitude eng: “thank you” yue: “ ”唔該 cmn: “ ”谢谢 jap: “ ”ありがとう spa: “gracias” ara: “‫را‬ً ‫شك‬.” Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 29/29

×