The document summarizes Nervo Verdezoto's master's thesis project, which applied formal ontology and semantic techniques to identify errors and improve coherence in WordNet and other lexical resources. The project involved analyzing WordNet relations to identify ontological problems, defining test queries to evaluate semantic and ontological constraints, and manually analyzing the errors found. The results showed over 100 ontological problems in WordNet and similar issues in the SemEval 2007 data. Future work could expand the experiments, develop tools to semi-automatically identify errors in WordNet, and provide guidelines to help lexicographers prevent common mistakes.
Application of formal ontology and semantic techniques to improve the coherence and usability of lexical resources
1. Nervo Verdezoto
University of Trento
nervo.verdezoto@studenti.unitn.it
Tutors: Prof. Laure Vieu and Prof. Alessandro Oltramari
Master HLTI 2009-2010
2. Outline
Objectives and Tasks
– Data
– Ontological Principles
– Experiments
– Results
Manual Analysis and discussion
Summary
3. Objectives
• Get familiar with ontology-driven conceptual modeling
• Develop semi-automatic methods to spot semantic/ontological problems in the lower levels of the WordNet taxonomy
• Get familiar with scientific reporting
4. Tasks
• Study WordNet semantic relations to spot ontological problems
• Applications:
– RTE (Recognizing Textual Entailment)
– Automatic detection of part-whole relations, e.g. (atmospheric phenomenon, communication), (shape, artifact), (shape, physical phenomenon)
5. The Data
WordNet: 82,115 synsets were examined to collect the initial data; 22,187 of them were involved in meronymy and holonymy relations (50% meronyms, 50% holonyms).
SemEval 2007: 89 relation pairs were extracted.
Additionally, we eliminated redundant pairs from the initial data.
[Chart: MERONYMS, number of pairs (# PAIRS – MERONYMS) by meronymy type: MEMBER, PART, SUBSTANCE; y-axis from 0 to 14000]
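WordNet records each part-whole link in both directions: a meronym pointer from the whole to the part, and the inverse holonym pointer from the part to the whole, which fits the even 50%/50% split above. Assuming that the "redundant pairs" are these mirrored entries (the slide does not say so explicitly), a minimal Python sketch normalizes every link to a (part, whole) pair and keeps a single copy. The Link record, the dedupe_part_whole function and the sample links are hypothetical illustrations, not the thesis's extraction code or data.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Link(NamedTuple):
    """One WordNet pointer (hypothetical record layout)."""
    source: str    # sense key of the synset the pointer is attached to
    relation: str  # "meronym" (whole -> part) or "holonym" (part -> whole)
    target: str    # sense key of the synset the pointer points to


def dedupe_part_whole(links: Iterable[Link]) -> set[tuple[str, str]]:
    """Normalize links to (part, whole) pairs so that a meronym pointer and
    its mirrored holonym pointer collapse into a single pair."""
    pairs: set[tuple[str, str]] = set()
    for link in links:
        if link.relation == "meronym":
            pairs.add((link.target, link.source))
        elif link.relation == "holonym":
            pairs.add((link.source, link.target))
        else:
            raise ValueError(f"unexpected relation: {link.relation!r}")
    return pairs


# Illustrative links only, not extracted data.
links = [
    Link("wind%1:19:00", "meronym", "air%1:27:00"),
    Link("air%1:27:00", "holonym", "wind%1:19:00"),  # mirror of the line above
    Link("coin%1:21:02", "meronym", "head%1:06:04"),
]
print(sorted(dedupe_part_whole(links)))
```

Storing pairs in a set makes the deduplication order-independent: whichever direction is read first, the normalized (part, whole) tuple is identical, so the mirror adds nothing.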
6. Ontological Principles
• Constraint: part and whole should be of a similar ontological nature (the same DOLCE category: ED, PD or AB).
• Ontological distinctions from DOLCE between:
– endurants (ED) or physical entities (like a dog, a table, a cave, etc.)
– perdurants (PD) or eventualities (like a lecture, a sleep, a raining, etc.)
– abstracts (AB), i.e. abstract entities (like a number, the content of a text, etc.)
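To make the constraint concrete, here is a minimal Python sketch, not the thesis's actual query system. It assumes a hypothetical toy fragment of the WordNet noun hierarchy (shortened hypernym chains, sense keys without the trailing "::") and maps synsets to DOLCE categories through the WordNet top nodes used in the test definitions: process, event and state count as PD, the rest of physical_entity as ED, and the rest of abstraction as AB. The names HYPERNYMS, dolce_category and same_nature are illustrative.

```python
from __future__ import annotations

# Hypothetical toy fragment of the WordNet noun hierarchy (child -> hypernyms).
# Chains are shortened: each leaf points straight at a top node.
HYPERNYMS: dict[str, list[str]] = {
    "physical_entity%1:03:00": ["entity%1:03:00"],
    "abstraction%1:03:00": ["entity%1:03:00"],
    "process%1:03:00": ["physical_entity%1:03:00"],
    "event%1:03:00": ["abstraction%1:03:00"],
    "state%1:03:00": ["abstraction%1:03:00"],
    "head%1:06:04": ["physical_entity%1:03:00"],
    "coin%1:21:02": ["abstraction%1:03:00"],
    "air%1:27:00": ["physical_entity%1:03:00"],
    "wind%1:19:00": ["process%1:03:00"],
}

PD_ROOTS = {"process%1:03:00", "event%1:03:00", "state%1:03:00"}


def ancestors(synset: str) -> set[str]:
    """Return the synset itself plus every hypernym reachable from it."""
    seen: set[str] = set()
    stack = [synset]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(HYPERNYMS.get(node, []))
    return seen


def dolce_category(synset: str) -> str | None:
    """Map a synset to "ED", "PD" or "AB"; None if it is above all three."""
    up = ancestors(synset)
    # PD is tested first: process sits under physical_entity, and event and
    # state sit under abstraction, so the ED and AB checks would catch them.
    if up & PD_ROOTS:
        return "PD"
    if "physical_entity%1:03:00" in up:
        return "ED"
    if "abstraction%1:03:00" in up:
        return "AB"
    return None


def same_nature(part: str, whole: str) -> bool:
    """Part and whole must share a DOLCE category; unclassified synsets pass."""
    cp, cw = dolce_category(part), dolce_category(whole)
    return cp is None or cw is None or cp == cw


print(same_nature("head%1:06:04", "coin%1:21:02"))  # False: ED part, AB whole
print(same_nature("air%1:27:00", "wind%1:19:00"))   # False: ED part, PD whole
```

Letting unclassified synsets pass is a design choice of this sketch: a pair is only reported when both members can be placed in a category and the categories differ.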
7. Experiments – Tests
[defining queries]
• Semantic Constraints
– Test 0: Individual–Class pairs:
• ⟨great_divide%1:15:00, continental_divide%1:15:00⟩
– Test 4: Meronymy – Member and Member–Collection pairs:
• ⟨coronal%1:06:00, rose%1:20:00⟩
• Ontological Constraints
– Test 1: ED–AB (test 1.1) or AB–ED (test 1.2)
• Test 1.1: physical_entity%1:03:00 (but not process%1:03:00) / abstraction%1:03:00 (but not event%1:03:00 + state%1:03:00). Example: ⟨head%1:06:04, coin%1:21:02⟩
– Test 2: ED–PD (test 2.1) or PD–ED (test 2.2)
• Test 2.1: physical_entity%1:03:00 (but not process%1:03:00) / process%1:03:00 + event%1:03:00 + state%1:03:00. Example: ⟨air%1:27:00, wind%1:19:00⟩
– Test 3: PD–AB (test 3.1) or AB–PD (test 3.2)
• Test 3.1: abstraction%1:03:00 (but not event%1:03:00 + state%1:03:00; run first with all abstractions and then without group) / event%1:03:00 + state%1:03:00 + process%1:03:00. Example: ⟨regulation_time%1:28:00, athletic_game%1:04:00⟩
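The three ontological tests can be read as filters over (part, whole) pairs. The Python code here is a hedged sketch under stated assumptions, not the thesis's query system: the taxonomy is a hypothetical toy with shortened hypernym chains, category and run_tests are illustrative names, and both the "without group" refinement of Test 3.1 and the semantic Tests 0 and 4 are left out. Each test matches a category mismatch in either direction, which folds the x.1 and x.2 variants together.

```python
from __future__ import annotations

# Hypothetical toy taxonomy (child -> hypernyms) with shortened chains; only the
# top nodes named in the test definitions are kept.
HYPERNYMS: dict[str, list[str]] = {
    "physical_entity%1:03:00": ["entity%1:03:00"],
    "abstraction%1:03:00": ["entity%1:03:00"],
    "process%1:03:00": ["physical_entity%1:03:00"],
    "event%1:03:00": ["abstraction%1:03:00"],
    "state%1:03:00": ["abstraction%1:03:00"],
    "head%1:06:04": ["physical_entity%1:03:00"],
    "coin%1:21:02": ["abstraction%1:03:00"],
    "air%1:27:00": ["physical_entity%1:03:00"],
    "wind%1:19:00": ["process%1:03:00"],
    "regulation_time%1:28:00": ["abstraction%1:03:00"],
    "athletic_game%1:04:00": ["event%1:03:00"],
}
PD_ROOTS = {"process%1:03:00", "event%1:03:00", "state%1:03:00"}

# Each test matches pairs whose two categories form this set, in either order
# (e.g. ED-AB or AB-ED), so the x.1 and x.2 variants are covered together.
TESTS: dict[str, frozenset[str]] = {
    "Test 1": frozenset({"ED", "AB"}),
    "Test 2": frozenset({"ED", "PD"}),
    "Test 3": frozenset({"PD", "AB"}),
}


def category(synset: str) -> str | None:
    """ED / PD / AB following the 'but not' exclusions of Tests 1-3."""
    seen: set[str] = set()
    stack = [synset]
    while stack:  # collect the synset and all of its hypernyms
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(HYPERNYMS.get(node, []))
    if seen & PD_ROOTS:
        return "PD"
    if "physical_entity%1:03:00" in seen:
        return "ED"
    if "abstraction%1:03:00" in seen:
        return "AB"
    return None


def run_tests(pairs: list[tuple[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Return, for each test, the (part, whole) pairs it flags."""
    hits: dict[str, list[tuple[str, str]]] = {name: [] for name in TESTS}
    for part, whole in pairs:
        cats = frozenset({category(part), category(whole)})
        for name, wanted in TESTS.items():
            if cats == wanted:
                hits[name].append((part, whole))
    return hits


pairs = [
    ("head%1:06:04", "coin%1:21:02"),
    ("air%1:27:00", "wind%1:19:00"),
    ("regulation_time%1:28:00", "athletic_game%1:04:00"),
]
for name, flagged in run_tests(pairs).items():
    print(name, flagged)
```

Run on the three example pairs from the slide, each pair is flagged by exactly one test; a pair whose part and whole share a category yields a one-element set and is never flagged.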
8. Results
[Chart: Ontological Problems per test (Test 1, Test 2, Test 3) for WORDNET and SEMEVAL; y-axis from 0 to 180; bar labels: 163, 108, 45, 2, 2]
9. Manual Analysis and discussion
General Errors
• A synset is considered a class but should be an individual
– Confusion between a class and an instance of this class for which the term is used with a specific sense, e.g. ⟨great_divide%1:15:00, continental_divide%1:15:00⟩
– Confusion between a class and a group, e.g. new_testament%1:10:00
• A synset is not attached to the right place in the taxonomy
– Confusion between a property and a physical entity having that property (shape, quantity or measure, location), or between a relation and a physical entity that is an argument in that relation, e.g. coin%1:21:02, hay_mow%1:23:00; calyx%1:20:00, mothball%1:06:00
• A synset mixes two senses, and the missing sense should be attached elsewhere in the taxonomy, or the missing sense is an individual, not a class
– Confusion between two senses of a word, amounting to a missing sense, e.g. ⟨ethiopian%1:18:00, ethiopia%1:15:00⟩
• The meronymy relation is wrong
– Confusion between meronymy and other relations (location, participation, etc.):
• “is located in”: ⟨balkan_wars%1:04:00, balkan_peninsula%1:15:00⟩
• “participates in”: ⟨feminist%1:18:00, feminist_movement%1:04:00⟩
10. Summary and future work
• An automatic query system based on ontological principles and semantic constraints is an effective basis for semi-automatic methods that spot errors in WordNet
• Increase the number and type of experiments
• Exploit the results of this study to:
– Develop a semi-automatic tool for “cleaning up” WordNet
– Design and develop guidelines to help lexicographers (e.g., Christiane Fellbaum of the Princeton WordNet group) prevent classical ontological mistakes
– Evaluate the results in NLP applications