Towards a Universal Wordnet by Learning from Combined Evidence

Gerard de Melo, Assistant Professor at Rutgers University
Introduction
Existing Lexical Knowledge Bases
Building a Multilingual Wordnet
Results and Experiments
Summary and Future Work
Towards a Universal Wordnet
by Learning from Combined Evidence
Gerard de Melo and Gerhard Weikum
Max Planck Institute for Informatics
Saarbrücken, Germany
2009-11-03
Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 1/29
Introduction
Lexical Knowledge
What meanings does a word have?
How do those meanings relate to the meanings of other words?
Example: “speaker” can mean a person who gives a talk or a device that produces sounds
Example: “board” can mean a flat piece of wood, a committee, a panel for writing with chalk, or to enter a transportation vehicle
Example: “student” and “pupil” share the meaning “someone who studies”
Example relations (diagram): “professor” and “faculty”, with edges labelled “member” and “part”
Example hierarchy (most general first): entity > institution > educational institution > university > ...
Many Applications
examples:
NLP, AI
question answering
query expansion
human consultation
Multilinguality
the world is multilingual
the Internet is also increasingly multilingual
[Figure: Top 10 Languages by Approx. No. of Speakers. Source: Ethnologue 2005]
[Figure: Internet users by Region. Source: Internet World Stats]
Vision
universal index of word meanings
large-scale semantic network with class hierarchy
look up any word in any language, get a list of its meanings
Example: the meaning “person who gives a talk” is linked to eng: “speaker”, jpn: “話者”, rus: “докладчик”, ces: “řečník”, ...
meanings should be connected via semantic relations
Example hierarchy, with terms attached to each meaning:
entity (por: “entidade”, heb: “ישות”)
institution (cmn: “制度”)
educational institution (deu: “Bildungseinrichtung”)
university (cym: “prifysgol”)
...
Outline
1 Existing Lexical Knowledge Bases
2 Building a Multilingual Wordnet
3 Results and Experiments
4 Summary and Future Work
Existing Lexical Knowledge Bases
WordNet
Miller, Fellbaum et al. (1990): among the most-cited papers in computer science (source: CiteSeerX)
lexical database created at Princeton
enumerates meanings of English words
meaning-to-meaning links
hypernym hierarchy
meronymy (part of)
etc.
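A hypernym hierarchy of this kind can be illustrated with a very small sketch. The meaning names below reuse the university example from the introduction; the dictionary layout and the function name `hypernym_chain` are assumptions made for illustration and do not reflect WordNet's actual storage format or API.

```python
# Toy hypernym links (child meaning -> parent meaning); illustrative only,
# not WordNet's real data format.
HYPERNYM = {
    "university": "educational institution",
    "educational institution": "institution",
    "institution": "entity",
}

def hypernym_chain(meaning):
    """Return the meaning followed by all of its ancestors, most specific first."""
    chain = [meaning]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

print(hypernym_chain("university"))
```

Other relation types such as meronymy could be stored in the same way, with one dictionary per relation.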
Non-English Wordnets
EuroWordNet, BalkaNet, Global WordNet Association
problem: many are small, incomplete
problem: different identifiers, formats, etc.
problem: only ∼10 languages with freely available wordnets
not a single, coherent resource
Other Resources
PANGLOSS Ontology: Knight & Luk (1994): 2 languages, around 70,000 entities
TransGraph system: Etzioni et al. (2007)
DBPedia, YAGO, OpenCyc
TransGraph: large translation graph, but limited structure, e.g. no semantic hierarchy
DBPedia, YAGO, OpenCyc: class hierarchy not multilingual
Building a Multilingual Wordnet
Strategy
use existing wordnets as backbone
add new terms, link to meaning nodes
Existing wordnets (diagram): meaning nodes “academic course”, “part of a meal”, “route of travel”, “series of events”, linked to the terms eng: “course”, eng: “class”, spa: “trayectoria”
Desired output (diagram): the same meaning nodes and terms, now also linked to the new terms deu: “Reihe”, ita: “piatto”, fra: “suite”, deu: “Kurs”
Input Graph
use existing wordnets as backbone (mainly English, Spanish, Catalan)
add translations to graph
translation sources: dictionaries (e.g. Wiktionary), thesauri and ontologies, parallel corpora (word alignment)
also: predict new translations
Input graph G0 (diagram): the existing wordnet terms and meaning nodes, plus translation edges to new terms such as deu: “Reihe”, ita: “piatto”, fra: “suite”, deu: “Kurs”
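One way to picture the input graph is as two adjacency maps, one for term-meaning links and one for translation edges. The sketch below is a minimal illustration under that assumption; the names `term_meanings`, `translations`, and `candidate_meanings` are hypothetical, and the specific links given to eng: “class” and spa: “trayectoria” are guesses for the sake of the example, not data from the talk.

```python
from collections import defaultdict

# Term nodes are (language, word) pairs; meaning nodes are plain strings here.
term_meanings = defaultdict(set)   # existing wordnet links: term -> meanings
translations = defaultdict(set)    # undirected translation edges: term -> terms

def add_meaning_link(term, meaning):
    term_meanings[term].add(meaning)

def add_translation(a, b):
    translations[a].add(b)
    translations[b].add(a)

def candidate_meanings(term):
    """Meanings of all direct translations of `term` (candidates for linking)."""
    cands = set()
    for neighbour in translations[term]:
        cands |= term_meanings[neighbour]
    return cands

# Toy data; the links for "class" and "trayectoria" are illustrative guesses.
for m in ["academic course", "part of a meal", "route of travel", "series of events"]:
    add_meaning_link(("eng", "course"), m)
add_meaning_link(("eng", "class"), "academic course")
add_meaning_link(("spa", "trayectoria"), "route of travel")
add_translation(("ita", "piatto"), ("eng", "course"))

print(sorted(candidate_meanings(("ita", "piatto"))))
```

In this toy graph a new term inherits every meaning of its translations as a candidate, which is why a separate disambiguation step is needed.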
Approach: Link new words to meanings of their translations
Huge Challenge: Disambiguation!
Example (diagram): ita: “piatto” is a translation of eng: “course”, which has four meanings: academic course, part of a meal, route of travel, series of events. Which of them should “piatto” be linked to?
Approach
variety of features that analyse the previous graph G_{i-1} and incorporate neighbourhood information into an edge’s feature vector
supervised learning: new edge weights determined using an RBF-kernel SVM with posterior probability estimation
Example Feature:
Given term t (e.g. fra: “suite”) and meaning m (e.g. “academic course”)
Question: Should they be linked?
Look at neighbours t' ∈ Γ(t), e.g. spa: “trayectoria” and eng: “course”, and at their meanings m' ∈ Γ(t'), e.g. part of a meal, academic course, route of travel, series of events, ...
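feature score for the candidate link (t, m):
$$\sum_{t' \in \Gamma(t)} \frac{\mathrm{sim}^*(t', m)}{\mathrm{sim}^*(t', m) + \mathrm{dissim}(t', m)}$$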
$$\mathrm{sim}^*(t', m) = \max_{m' \in \Gamma(t')} \mathrm{sim}(m', m)$$
$$\mathrm{dissim}(t', m) = \sum_{m' \in \Gamma(t')} (1 - \mathrm{sim}(m', m))$$
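weighted version of the feature score:
$$\sum_{t' \in \Gamma(t)} \phi_1(t, t') \, \frac{\mathrm{sim}^*(t', m)}{\mathrm{sim}^*(t', m) + \mathrm{dissim}(t', m)}$$
$$\mathrm{sim}^*(t', m) = \max_{m' \in \Gamma(t')} \phi_2(t', m') \, \mathrm{sim}(m', m)$$
$$\mathrm{dissim}(t', m) = \sum_{m' \in \Gamma(t')} \phi_2(t', m') \, (1 - \mathrm{sim}(m', m))$$
weighting based on: part-of-speech, corpus frequency, ...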
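The weighted neighbourhood score can be sketched in a few lines of Python. This is only an illustration of the formula as it appears on the slide: the function name `neighbour_feature`, the dictionary-based graph, the default weights of 1.0, and the identity similarity used in the example are all assumptions, and the similarity measure and weighting functions used by the authors are not specified here.

```python
def neighbour_feature(t, m, term_nbrs, meaning_nbrs, sim, phi1=None, phi2=None):
    """Weighted neighbourhood score for linking term t to meaning m."""
    phi1 = phi1 or (lambda t, t2: 1.0)    # assumed default: no term weighting
    phi2 = phi2 or (lambda t2, m2: 1.0)   # assumed default: no link weighting
    total = 0.0
    for t2 in term_nbrs.get(t, ()):       # neighbours t' of t
        cands = meaning_nbrs.get(t2, ())  # meanings m' of t'
        if not cands:
            continue
        sim_star = max(phi2(t2, m2) * sim(m2, m) for m2 in cands)
        dissim = sum(phi2(t2, m2) * (1.0 - sim(m2, m)) for m2 in cands)
        denom = sim_star + dissim
        if denom > 0:
            total += phi1(t, t2) * sim_star / denom
    return total

# Toy example with an identity similarity (an assumption, not the paper's measure).
identity_sim = lambda a, b: 1.0 if a == b else 0.0
term_nbrs = {("fra", "suite"): [("spa", "trayectoria"), ("eng", "course")]}
meaning_nbrs = {
    ("eng", "course"): ["academic course", "part of a meal",
                        "route of travel", "series of events"],
    ("spa", "trayectoria"): ["route of travel"],  # illustrative guess
}
score = neighbour_feature(("fra", "suite"), "series of events",
                          term_nbrs, meaning_nbrs, identity_sim)
print(score)
```

The denominator penalises neighbours with many unrelated meanings: a highly polysemous translation contributes less evidence than an unambiguous one.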
Other Features
cosine similarity of translations with gloss
scores assessing polysemy by looking at back-translations
many more (see paper for details)
Approach
use scores as features for RBF-kernel SVM
multiple iterations: each graph G_i is based on the previous graph G_{i-1}
stop when an F1 score plateau is reached on a validation set
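The iteration scheme can be sketched as a loop that keeps refining the graph until the validation F1 score stops improving. In the sketch below, `refine` stands in for one full round of feature computation and SVM scoring, and `min_gain` and `max_iters` are hypothetical parameters; none of these names or values come from the talk.

```python
def f1_score(predicted, gold):
    """F1 of a predicted set of links against a gold set of links."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)

def iterate_until_plateau(g0, refine, evaluate, min_gain=0.001, max_iters=10):
    """Apply refine repeatedly; stop once the validation score stops improving."""
    graph, best = g0, evaluate(g0)
    history = [best]
    for _ in range(max_iters):
        candidate = refine(graph)          # builds G_i from G_{i-1}
        score = evaluate(candidate)
        if score - best < min_gain:        # plateau (or drop): keep previous graph
            break
        graph, best = candidate, score
        history.append(score)
    return graph, history

# Toy run: graphs are sets of (term, meaning) links; refine adds one proposal.
gold = {("a", 1), ("b", 2), ("c", 3)}
proposals = [("b", 2), ("x", 9)]

def toy_refine(graph):
    for link in proposals:
        if link not in graph:
            return graph | {link}
    return graph

final_graph, history = iterate_until_plateau(
    {("a", 1)}, toy_refine, lambda g: f1_score(g, gold))
print(sorted(final_graph), history)
```

In the toy run the second proposal lowers F1, so the loop stops and returns the graph from the previous iteration.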
Results
Setup
input graph G0:
448,069 pre-existing term-meaning links
10,805,400 translation edges
1.3 million term nodes with candidates
7.7 candidate meanings per new term
2,445 term-meaning links for training (French/German)
2,901 term-meaning links as validation set
Excerpt from final UWN graph G3 after 3 iterations, retaining only edges with sufficiently high weights (0.5 / 0.6)
meaning nodes shown: school (group of fish), school (institution), school (building)
terms shown: deu: “Schulgebäude”, deu: “Schulhaus”, deu: “Fischschwarm”, ces: “hejno”, fra: “banc”, ind: “sekolah”, jpn: “学校”, kor: “학교”, lao: “ໂຮງຮຽນ”, kat: “სკოლა”
Evaluation
Relation                                Precision [1]
Term-Meaning Links (French)             89.2% ± 3.4%
Term-Meaning Links (German)             85.9% ± 3.8%
Term-Meaning Links (Mandarin Chinese)   90.5% ± 3.3%
Generalization (Hypernymy)              87.1% ± 4.8%
Instance                                89.3% ± 4.4%
Similarity                              92.0% ± 3.8%
Category                                93.3% ± 4.5%
Part (Meronymy)                         94.4% ± 4.1%
Member (Meronymy)                       92.7% ± 4.0%
Substance (Meronymy)                    95.6% ± 3.5%
Opposite                                94.3% ± 3.9%
[1]: Wilson score intervals for random samples
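For readers unfamiliar with Wilson score intervals, the sketch below computes the interval centre and half-width for a sample proportion. The function name `wilson_interval` and the default z = 1.96 (a 95% level) are assumptions; the talk does not state the confidence level or the sample sizes behind the table.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Return (centre, half_width) of the Wilson score interval."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    # The centre is pulled from the raw proportion p towards 0.5.
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre, half

centre, half = wilson_interval(50, 100)
print(f"{centre:.3f} ± {half:.3f}")
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and behaves sensibly for proportions close to 100%, which matters for precision values in the 90% range.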
Coverage
Language     Term-Meaning Links   Distinct Terms
Overall      1,595,763            822,212
German       132,523              67,087
French       75,544               33,423
Esperanto    71,247               33,664
Dutch        68,792               30,154
Spanish      68,445               32,143
Turkish      67,641               31,553
Czech        59,268               33,067
Russian      57,929               26,293
Portuguese   55,569               23,499
Italian      52,008               24,974
Hungarian    46,492               28,324
Thai         44,523               30,815
Application: Semantic Relatedness
Experimental Setup
Example: “curriculum” considered closely related to “school”, but not to “water”
compute term relatedness using UWN:
$$\mathrm{sim}(t_1, t_2) = \max_{s_1 \in \sigma(t_1)} \max_{s_2 \in \sigma(t_2)} \mathrm{sim}(s_1, s_2)$$
sim(s1, s2): combined graph-/gloss-based method
compare with assessments of relatedness made by human judges
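The term-level relatedness is simply the best score over all pairs of senses. A minimal sketch follows, assuming a dictionary `senses` from terms to sense identifiers and a caller-supplied `sense_sim` function; both names, the sense labels, and the toy scores are made up for illustration, and the combined graph-/gloss-based measure itself is not reproduced here.

```python
def term_relatedness(t1, t2, senses, sense_sim):
    """Maximum sense-level similarity over all sense pairs; None if uncovered."""
    s1_list = senses.get(t1, ())
    s2_list = senses.get(t2, ())
    if not s1_list or not s2_list:
        return None  # at least one term has no known sense
    return max(sense_sim(s1, s2) for s1 in s1_list for s2 in s2_list)

# Toy sense inventory and similarity table (illustrative values only).
senses = {
    "curriculum": ["curriculum#1"],
    "school": ["school (institution)", "school (group of fish)"],
    "water": ["water#1"],
}
toy_scores = {("curriculum#1", "school (institution)"): 0.8}
toy_sim = lambda a, b: toy_scores.get((a, b), 0.1)

print(term_relatedness("curriculum", "school", senses, toy_sim))
print(term_relatedness("curriculum", "water", senses, toy_sim))
```

Returning `None` for uncovered terms makes it easy to count coverage separately from correlation.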
Results for 3 German Datasets
Dataset                  GUR65          GUR350         ZG222
                         r      Cov.    r      Cov.    r      Cov.
Inter-Annot. Agreement   0.81   (65)    0.69   (350)   0.49   (222)
Wikipedia (ESA*)         0.56   65      0.52   333     0.32   205
GermaNet (Lin*)          0.73   60      0.50   208     0.08   88
UWN                      0.80   60      0.68   242     0.51   106
r: Pearson product-moment correlation coefficient
Cov.: absolute coverage
*: scores by Gurevych et al. (2007)
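The r values are Pearson correlations between system scores and human judgements. The sketch below shows the standard formula in plain Python; the function name `pearson_r` is an assumption, and it is presumably applied only to the word pairs a resource covers, which is how the coverage column is read here.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

print(pearson_r([1, 2, 3], [2, 4, 6]))
```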
Application: Cross-Lingual Text Classification
cross-lingual TC: train using documents in one language, classify documents in another language
used bag-of-words/meanings TF-IDF vectors
Dataset: Reuters corpora (RCV1/2)
for each language pair: 105 binary classification tasks, each using 200 training documents, 600 test documents
classifier: SVMlight
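The idea behind bag-of-words/meanings vectors is that documents in different languages share no words but can share meaning identifiers. The sketch below illustrates this with a toy term-to-meaning map; the function names `features` and `tfidf`, the feature prefixes, and the plain log-IDF weighting are assumptions, and the talk does not say how meaning features were weighted.

```python
import math
from collections import Counter

def features(tokens, lang, term_meanings):
    """Bag of words plus bag of meanings for one tokenised document."""
    feats = Counter()
    for tok in tokens:
        feats["term:" + lang + ":" + tok] += 1
        for meaning in term_meanings.get((lang, tok), ()):
            feats["meaning:" + meaning] += 1
    return feats

def tfidf(docs):
    """Turn a list of Counter documents into TF-IDF weighted dicts."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc.keys())   # document frequency: one count per document
    return [{f: tf * math.log(n / df[f]) for f, tf in doc.items()} for doc in docs]

# Toy term-meaning links (illustrative, not taken from UWN).
links = {
    ("eng", "school"): ["school (institution)"],
    ("ita", "scuola"): ["school (institution)"],
}
doc_en = features(["school"], "eng", links)
doc_it = features(["scuola"], "ita", links)
doc_other = features(["water"], "eng", links)
vectors = tfidf([doc_en, doc_it, doc_other])
print(set(doc_en) & set(doc_it))
```

The English and Italian documents overlap only in the meaning feature, which is what lets a classifier trained on one language transfer to the other.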
Language Pair      Terms only   Terms + Meanings
English-Italian    68.3%        76.3%
English-Russian    51.7%        71.2%
Italian-English    74.4%        78.1%
Italian-Russian    58.4%        73.2%
Russian-English    67.3%        76.8%
Russian-Italian    62.2%        71.8%
(all values are F1 scores)
Summary
large-scale multilingual wordnet: around 85-90% precision for sampled term-meaning links, over 800,000 distinct terms, over 1.5 million links from terms to meanings
built by learning edge weights using graph-based evidence
useful for monolingual and cross-lingual tasks
Future Work
ongoing work: user interface incl. user contributions
techniques to automatically discover new word meanings
word sense disambiguation, query expansion using UWN
Thanks!
expression of gratitude
eng: “thank you”
yue: “唔該”
cmn: “谢谢”
jpn: “ありがとう”
spa: “gracias”
ara: “شكراً”
Towards a Universal Wordnet by Learning from Combined Evidence

  • 1. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Towards a Universal Wordnet by Learning from Combined Evidence Gerard de Melo and Gerhard Weikum Max Planck Institute for Informatics Saarbr¨ucken, Germany 2009-11-03 Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 1/29
  • 2. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? person who gives a talk “speaker” device that produces sounds Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 3. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? flat piece of wood “board” committee panel for writing with chalk to enter a transportation vehicle Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 4. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? someone who studies “student” “pupil” Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 5. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? faculty professor member part Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 6. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? entity institution educational institution university ... Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 7. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? Many Applications examples: NLP, AI question answering query expansion human consultation entity institution educational institution university ... Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 8. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Multilinguality the world is multilingual the Internet is also increasingly multilingual Top 10 Languages by Approx. No. of Speakers Source: Ethnologue 2005 Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 3/29
  • 9. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Multilinguality the world is multilingual the Internet is also increasingly multilingual Internet users by Region Source: Internet World Stats Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 3/29
  • 10. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction person who gives a talk eng: “speaker” jpn: “ ”話者 rus: “докладчик” ces: “řečník” ... ...... Vision universal index of word meanings large-scale semantic network with class hierarchy look up any word in any language, get a list of its meanings Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 4/29
  • 11. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction entitypor: “entidade” cmn: “ ”制度 institution educational institution university heb: “‫ישות‬.” deu: “Bildungs- einrichtung” cym: “prifysgol” ... Vision universal index of word meanings large-scale semantic network with class hierarchy meanings should be connected via semantic relations Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 4/29
  • 12. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Outline 1 Existing Lexical Knowledge Bases 2 Building a Multilingual Wordnet 3 Results and Experiments 4 Summary and Future Work Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 5/29
  • 13. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Outline 1 Existing Lexical Knowledge Bases 2 Building a Multilingual Wordnet 3 Results and Experiments 4 Summary and Future Work Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 6/29
  • 14. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases WordNet lexical database created at Princeton enumerates meanings of English words meaning-to-meaning links Miller, Fellbaum et al. (1990) among most-cited papers in computer science (source: CiteseerX) Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 7/29
  • 15. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases WordNet lexical database created at Princeton enumerates meanings of English words meaning-to-meaning links Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 7/29
  • 16. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases WordNet lexical database created at Princeton enumerates meanings of English words meaning-to-meaning links hypernym hierarchy meronymy (part of) etc. Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 7/29
  • 17. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Non-English Wordnets EuroWordNet, BalkaNet, Global WordNet Association problem: many are small, incomplete problem: different identifiers, formats, etc. problem: only ∼10 languages with freely available wordnets Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 8/29
  • 18. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Non-English Wordnets EuroWordNet, BalkaNet, Global WordNet Association problem: many are small, incomplete problem: different identifiers, formats, etc. problem: only ∼10 languages with freely available wordnets Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 8/29
  • 19. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Non-English Wordnets EuroWordNet, BalkaNet, Global WordNet Association problem: many are small, incomplete problem: different identifiers, formats, etc. problem: only ∼10 languages with freely available wordnets not a single, coherent resource Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 8/29
  • 20. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Non-English Wordnets EuroWordNet, BalkaNet, Global WordNet Association problem: many are small, incomplete problem: different identifiers, formats, etc. problem: only ∼10 languages with freely available wordnets Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 8/29
  • 21. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Other Resources PANGLOSS Ontology: Knight & Luk (1994) TransGraph system: Etzioni et al. (2007) DBPedia, YAGO, OpenCyc 2 languages, around 70 000 entities Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 9/29
  • 22. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Other Resources PANGLOSS Ontology: Knight & Luk (1994) TransGraph system: Etzioni et al. (2007) DBPedia, YAGO, OpenCyc large translation graph limited structure e.g. no semantic hierarchy Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 9/29
  • 23. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Other Resources PANGLOSS Ontology: Knight & Luk (1994) TransGraph system: Etzioni et al. (2007) DBPedia, YAGO, OpenCyc class hierarchy not multilingual Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 9/29
  • 24. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Outline 1 Existing Lexical Knowledge Bases 2 Building a Multilingual Wordnet 3 Results and Experiments 4 Summary and Future Work Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 10/29
  • 25. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Strategy use existing wordnets as backbone add new terms, link to meaning nodes spa: “trayectoria” academic course part of a meal route of travel series of events eng: “course” eng: “class” Existing Wordnets Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 11/29
  • 26. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Strategy use existing wordnets as backbone add new terms, link to meaning nodes spa: “trayectoria” academic course part of a meal route of travel series of events eng: “course” eng: “class” Existing Wordnets −→ deu: “Reihe” spa: “trayectoria” academic course part of a meal route of travel series of events ita: “piatto” fra: “suite” eng: “course” deu: “Kurs” eng: “class” Desired Output Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 11/29
  • 27. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Input Graph use existing wordnets as backbone add translations to graph mainly English, Spanish, Catalan spa: “trayectoria” academic course part of a meal route of travel series of events eng: “course” eng: “class” Input Graph G0 Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 12/29
  • 28. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Input Graph use existing wordnets as backbone add translations to graph dictionaries (e.g. Wiktionary) thesauri and ontologies parallel corpora (word alignment) also: predict new translations deu: “Reihe” spa: “trayectoria” academic course part of a meal route of travel series of events ita: “piatto” fra: “suite” eng: “course” deu: “Kurs” eng: “class” Input Graph G0 Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 12/29
  • 29. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Approach: Link new words to meanings of their translations Huge Challenge: Disambiguation! academic course part of a meal route of travel series of events ita: “piatto” eng: “course” trans- lation Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 13/29
  • 30. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet Approach: Link new words to meanings of their translations Huge Challenge: Disambiguation! academic course part of a meal route of travel series of events ita: “piatto” eng: “course” trans- lation ? ? ? ? Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 13/29
  • 31. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet academic course part of a meal route of travel series of events ita: “piatto” eng: “course” trans- lation ? ? ? ? Approach variety of features that analyse previous graph Gi−1, incorporate neighbourhood information into an edge’s feature vector supervised learning: new edge weights determined using RBF-kernel SVM with posterior probability estimation Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 14/29
  • 32. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Strategy Input Graph Approach Features Building a Multilingual Wordnet academic course part of a meal route of travel series of events ita: “piatto” eng: “course” trans- lation ? ? ? ? Approach variety of features that analyse previous graph Gi−1, incorporate neighbourhood information into an edge’s feature vector supervised learning: new edge weights determined using RBF-kernel SVM with posterior probability estimation Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 14/29
  • 33. Building a Multilingual Wordnet: Example Feature. Given a term $t$ (here fra: “suite”) and a meaning $m$ (here “academic course”). Question: should they be linked? Look at the neighbours $t' \in \Gamma(t)$, such as spa: “trayectoria” and eng: “course”, and at their meanings $m' \in \Gamma(t')$ (“part of a meal”, “academic course”, “route of travel”, ..., “series of events”). The feature is
      $$\sum_{t' \in \Gamma(t)} \phi_1(t, t') \, \frac{\mathrm{sim}^*(t', m)}{\mathrm{sim}^*(t', m) + \mathrm{dissim}(t', m)}$$
      with
      $$\mathrm{sim}^*(t', m) = \max_{m' \in \Gamma(t')} \phi_2(t', m') \, \mathrm{sim}(m', m)$$
      $$\mathrm{dissim}(t', m) = \sum_{m' \in \Gamma(t')} \phi_2(t', m') \, \bigl(1 - \mathrm{sim}(m', m)\bigr)$$
      The weights $\phi_1$ and $\phi_2$ are based on part-of-speech, corpus frequency, ...
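The example feature can be computed directly from its definition. The sketch below is a hedged illustration under simplifying assumptions: the weighting functions `phi1` and `phi2` are constant 1, `sim` is exact identity of meanings, and the meanings attached to spa: “trayectoria” are invented toy data. None of these choices come from the slides.

```python
def neighbourhood_feature(t, m, term_neighbours, term_meanings, sim, phi1, phi2):
    """Sum over neighbours t' of phi1 * sim*(t', m) / (sim*(t', m) + dissim(t', m))."""
    total = 0.0
    for t2 in term_neighbours.get(t, ()):
        meanings = term_meanings.get(t2, ())
        if not meanings:
            continue  # a neighbour without meanings contributes nothing
        sim_star = max(phi2(t2, m2) * sim(m2, m) for m2 in meanings)
        dissim = sum(phi2(t2, m2) * (1.0 - sim(m2, m)) for m2 in meanings)
        denom = sim_star + dissim
        if denom > 0:
            total += phi1(t, t2) * sim_star / denom
    return total


# Toy graph: neighbour terms of fra:suite and their (partly hypothetical) meanings.
term_neighbours = {"fra:suite": ["eng:course", "spa:trayectoria"]}
term_meanings = {
    "eng:course": ["academic course", "part of a meal",
                   "route of travel", "series of events"],
    "spa:trayectoria": ["route of travel"],  # hypothetical toy entry
}


def identity_sim(a, b):
    return 1.0 if a == b else 0.0  # stand-in for a real meaning similarity


def uniform(*_):
    return 1.0  # stand-in for phi1 / phi2


score_academic = neighbourhood_feature(
    "fra:suite", "academic course", term_neighbours, term_meanings,
    identity_sim, uniform, uniform)
score_route = neighbourhood_feature(
    "fra:suite", "route of travel", term_neighbours, term_meanings,
    identity_sim, uniform, uniform)
```

In this toy setting the ambiguous neighbour “course” contributes only 1/4 to either meaning, because `dissim` penalises its three other meanings, while the unambiguous neighbour contributes a full 1 to “route of travel”. That is the intuition behind dividing by `sim* + dissim`: evidence from polysemous neighbours is discounted.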
  • 38. Building a Multilingual Wordnet: Other Features. Cosine similarity of translations with gloss. Scores assessing polysemy by looking at back-translations. Many more (see paper for details). (Figure: translation graph connecting deu: “Reihe”, spa: “trayectoria”, ita: “piatto”, fra: “suite”, eng: “course”, deu: “Kurs”, and eng: “class” with the meanings “academic course”, “part of a meal”, “route of travel”, and “series of events”.)
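The first of these features, cosine similarity between a term's translations and a meaning's gloss, could look roughly like the following bag-of-words sketch. The tokenisation, the function names, and both gloss strings are assumptions made for illustration; the slides do not give the glosses or the exact vector representation.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# English translations of a term such as deu: "Kurs" (from the slide's graph).
translations = bag_of_words("course class")

# Hypothetical glosses, written only for this example.
gloss_academic = bag_of_words("a course or class of lessons taught at a school")
gloss_meal = bag_of_words("part of a meal served at one time")

score_academic = cosine(translations, gloss_academic)
score_meal = cosine(translations, gloss_meal)
```

With these toy glosses the academic sense scores higher than the meal sense, since both translations occur in its gloss.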
  • 39. Building a Multilingual Wordnet: Approach. Use the scores as features for an RBF-kernel SVM. Multiple iterations: each graph $G_i$ is based on the previous graph $G_{i-1}$. Stop when the F1 score reaches a plateau on a validation set. (Figure: the translation graph for “course” and related terms, as on the previous slide.)
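The iteration scheme can be sketched as a simple loop with a plateau test. Everything concrete here is an assumption for illustration: the callbacks `rebuild` and `f1_on_validation`, the `min_gain` threshold, the decision to keep the last graph that still improved, and the made-up F1 values in the toy run.

```python
def iterate_until_plateau(g0, rebuild, f1_on_validation, min_gain=0.001, max_iters=10):
    """Repeatedly derive G_i from G_{i-1}; stop once validation F1 stops improving.

    rebuild(graph)           -> next graph (e.g. re-scored edges); hypothetical callback
    f1_on_validation(graph)  -> F1 score on held-out links; hypothetical callback
    """
    graph = g0
    best_f1 = f1_on_validation(g0)
    history = [best_f1]
    for _ in range(max_iters):
        candidate = rebuild(graph)
        f1 = f1_on_validation(candidate)
        history.append(f1)
        if f1 - best_f1 < min_gain:
            break  # plateau reached: keep the last graph that still improved
        graph, best_f1 = candidate, f1
    return graph, history


# Toy run: a "graph" is just its iteration index, and the F1 values are made up.
toy_scores = [0.60, 0.75, 0.80, 0.81, 0.8102]
final_graph, history = iterate_until_plateau(
    g0=0,
    rebuild=lambda g: g + 1,
    f1_on_validation=lambda g: toy_scores[g],
)
```

In the toy run the fourth rebuild improves F1 by only 0.0002, so the loop stops and returns the graph from iteration 3.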
  • 42. Outline: 1. Existing Lexical Knowledge Bases; 2. Building a Multilingual Wordnet; 3. Results and Experiments; 4. Summary and Future Work.
  • 43. Results: Setup. Input graph $G_0$: 448,069 pre-existing term-meaning links; 10,805,400 translation edges; 1.3 million term nodes with candidates; 7.7 candidate meanings per new term. 2,445 term-meaning links for training (French/German). 2,901 term-meaning links as validation set.
  • 46. Results: Output. Excerpt from the final UWN graph $G_3$ after 3 iterations, retaining only edges with sufficiently high weights (0.5 / 0.6). (Figure: the meanings “school (group of fish)”, “school (institution)”, and “school (building)”, linked to terms including deu: “Schulgebäude”, deu: “Schulhaus”, deu: “Fischschwarm”, ces: “hejno”, fra: “banc”, ind: “sekolah”, jpn: “学校”, kor: “학교”, lao: “ໂຮງຮຽນ”, and kat: “სკოლა”.)
  • 47. Evaluation
      Relation                                 Precision¹
      Term-Meaning Links (French)              89.2% ± 3.4%
      Term-Meaning Links (German)              85.9% ± 3.8%
      Term-Meaning Links (Mandarin Chinese)    90.5% ± 3.3%
      Generalization (Hypernymy)               87.1% ± 4.8%
      Instance                                 89.3% ± 4.4%
      Similarity                               92.0% ± 3.8%
      Category                                 93.3% ± 4.5%
      Part (Meronymy)                          94.4% ± 4.1%
      Member (Meronymy)                        92.7% ± 4.0%
      Substance (Meronymy)                     95.6% ± 3.5%
      Opposite                                 94.3% ± 3.9%
      ¹ Wilson score intervals for random samples
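For readers unfamiliar with the Wilson score interval used in this table, here is a minimal sketch of the standard formula. The sample sizes behind the table are not given in the slides, so the 90-out-of-100 example is hypothetical. Reading the slide's “±” values as the interval's centre and half-width is also an assumption.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return (centre, half_width) of the Wilson score interval.

    z = 1.96 corresponds to a 95% confidence level (an assumed default).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre, half_width


# Hypothetical sample: 90 of 100 randomly sampled links judged correct.
centre, half_width = wilson_interval(90, 100)
low, high = centre - half_width, centre + half_width
```

Unlike the naive normal-approximation interval, the Wilson interval is centred slightly towards 0.5 rather than on the raw proportion, and it never extends below 0 or above 1.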
  • 48. Coverage
      Language      Term-Meaning Links    Distinct Terms
      Overall       1,595,763             822,212
      German        132,523               67,087
      French        75,544                33,423
      Esperanto     71,247                33,664
      Dutch         68,792                30,154
      Spanish       68,445                32,143
      Turkish       67,641                31,553
      Czech         59,268                33,067
      Russian       57,929                26,293
      Portuguese    55,569                23,499
      Italian       52,008                24,974
      Hungarian     46,492                28,324
      Thai          44,523                30,815
  • 49. Application: Semantic Relatedness. Experimental Setup. Example: “curriculum” is considered closely related to “school”, but not to “water”. Compute term relatedness using UWN: $\mathrm{sim}(t_1, t_2) = \max_{s_1 \in \sigma(t_1)} \max_{s_2 \in \sigma(t_2)} \mathrm{sim}(s_1, s_2)$, where $\mathrm{sim}(s_1, s_2)$ is a combined graph-/gloss-based method. Compare with assessments of relatedness made by human judges.
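The term-level relatedness formula is a maximum over all pairs of senses. A minimal sketch follows, assuming that $\sigma(t)$ is the set of senses linked to term $t$. The sense inventory and the similarity values are invented toy data, and the lookup-table `sense_sim` merely stands in for the combined graph-/gloss-based method, which the slides do not spell out.

```python
def term_relatedness(t1, t2, senses, sense_sim):
    """Maximum sense-level similarity over all sense pairs of two terms.

    Returns None when either term has no senses, i.e. the pair is not covered.
    """
    pairs = [(s1, s2) for s1 in senses.get(t1, ()) for s2 in senses.get(t2, ())]
    if not pairs:
        return None
    return max(sense_sim(s1, s2) for s1, s2 in pairs)


# Toy sense inventory (hypothetical).
senses = {
    "school": ["school (institution)", "school (group of fish)"],
    "curriculum": ["curriculum (course of study)"],
    "water": ["water (liquid)"],
}

# Made-up sense similarities; unordered pairs not listed default to 0.0.
toy_table = {
    frozenset(("school (institution)", "curriculum (course of study)")): 0.8,
    frozenset(("school (group of fish)", "water (liquid)")): 0.3,
}


def sense_sim(s1, s2):
    return toy_table.get(frozenset((s1, s2)), 0.0)


rel_curriculum = term_relatedness("school", "curriculum", senses, sense_sim)
rel_water = term_relatedness("school", "water", senses, sense_sim)
rel_missing = term_relatedness("school", "Schulhaus", senses, sense_sim)
```

Returning `None` for uncovered terms keeps coverage separate from the score itself, which matters when correlation is later computed only over the covered pairs.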
  • 52. Application: Semantic Relatedness. Results for 3 German Datasets
      Dataset                    GUR65           GUR350          ZG222
                                 r      Cov.     r      Cov.     r      Cov.
      Inter-Annot. Agreement     0.81   (65)     0.69   (350)    0.49   (222)
      Wikipedia (ESA*)           0.56   65       0.52   333      0.32   205
      GermaNet (Lin*)            0.73   60       0.50   208      0.08   88
      UWN                        0.80   60       0.68   242      0.51   106
      r: Pearson product-moment correlation coefficient. Cov.: absolute coverage. *: scores by Gurevych et al. (2007).
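The two quantities reported in this table, Pearson's r and absolute coverage, can be computed as in the sketch below. The human and system scores are invented toy values. Restricting the correlation to pairs for which the system returns a score is an assumption about how the evaluation treats uncovered pairs.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy data: human judgements and system scores (None = word pair not covered).
human = [4.0, 3.5, 0.5, 1.0]
system = [0.9, None, 0.1, 0.3]

covered = [(h, s) for h, s in zip(human, system) if s is not None]
coverage = len(covered)  # absolute coverage: number of scored pairs
r = pearson([h for h, _ in covered], [s for _, s in covered])
```

Because r is computed on different subsets for each method, scores in the table are only directly comparable when coverage is similar, which is why both columns are reported side by side.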
  • 53. Application: Cross-Lingual Text Classification. Cross-lingual TC: train using documents in one language, classify documents in another language. Used bag-of-words/meanings TF-IDF vectors. Dataset: Reuters corpora (RCV1/2). For each language pair: 105 binary classification tasks, each using 200 training documents and 600 test documents. SVMlight.
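To see why adding meanings helps across languages, consider a minimal sketch of bag-of-words/meanings TF-IDF vectors. The feature naming (`term:` and `meaning:` prefixes), the rule of counting every linked meaning once per token occurrence, and the plain `log(N/df)` weighting are all assumptions for illustration; the slides do not specify these details.

```python
import math
from collections import Counter


def features(tokens, term_meanings):
    """Count term features plus the meaning features of every token."""
    feats = Counter()
    for tok in tokens:
        feats["term:" + tok] += 1
        for meaning in term_meanings.get(tok, ()):
            feats["meaning:" + meaning] += 1
    return feats


def tfidf(docs):
    """Weight raw counts by log(N / document frequency)."""
    n = len(docs)
    df = Counter(f for doc in docs for f in doc)  # each doc contributes a feature once
    return [{f: tf * math.log(n / df[f]) for f, tf in doc.items()} for doc in docs]


# Toy lexicon: an English and an Italian term linked to the same meaning.
term_meanings = {
    "school": ["school (institution)"],
    "scuola": ["school (institution)"],
}

english_doc = features(["school", "teacher"], term_meanings)
italian_doc = features(["scuola"], term_meanings)
other_doc = features(["water"], term_meanings)

vectors = tfidf([english_doc, italian_doc, other_doc])
shared = set(vectors[0]) & set(vectors[1])
```

The English and Italian documents share no term features at all, but they do share the language-independent meaning feature, which gives a classifier trained on one language something to match in the other.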
  • 57. Application: Cross-Lingual Text Classification
      Language Pair      Terms only    Terms + Meanings
      English-Italian    68.3%         76.3%
      English-Russian    51.7%         71.2%
      Italian-English    74.4%         78.1%
      Italian-Russian    58.4%         73.2%
      Russian-English    67.3%         76.8%
      Russian-Italian    62.2%         71.8%
      (all values are F1 scores)
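The size of the improvement is easier to see as differences. The short calculation below uses only the F1 values from the table above; the variable names are illustrative.

```python
# F1 scores in percent, copied from the table: (terms only, terms + meanings).
f1 = {
    "English-Italian": (68.3, 76.3),
    "English-Russian": (51.7, 71.2),
    "Italian-English": (74.4, 78.1),
    "Italian-Russian": (58.4, 73.2),
    "Russian-English": (67.3, 76.8),
    "Russian-Italian": (62.2, 71.8),
}

# Absolute gain in percentage points for each language pair.
gains = {pair: round(both - terms, 1) for pair, (terms, both) in f1.items()}

largest_gain_pair = max(gains, key=gains.get)
smallest_gain_pair = min(gains, key=gains.get)

# Unweighted means over the six language pairs.
mean_terms = sum(terms for terms, _ in f1.values()) / len(f1)
mean_both = sum(both for _, both in f1.values()) / len(f1)
```

Every pair improves when meanings are added. The gains range from 3.7 points (Italian-English) to 19.5 points (English-Russian), and the unweighted mean rises from about 63.7% to about 74.6%.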
  • 58. Outline: 1. Existing Lexical Knowledge Bases; 2. Building a Multilingual Wordnet; 3. Results and Experiments; 4. Summary and Future Work.
  • 59. Summary. Large-scale multilingual wordnet: 85% accuracy, 800,000 terms, over 1.5 million links from terms to meanings. Built by learning edge weights using graph-based evidence. Useful for monolingual and cross-lingual tasks.
  • 62. Future Work. Ongoing work: user interface incl. user contributions. Techniques to automatically discover new word meanings. Word sense disambiguation, query expansion using UWN.
  • 65. Thanks! (Figure: the meaning “expression of gratitude” linked to eng: “thank you”, yue: “唔該”, cmn: “谢谢”, jpn: “ありがとう”, spa: “gracias”, and ara: “شكراً”.)