NLP & Semantic Computing Group
Word Tagging with Foundational
Ontology Classes: Extending the
WordNet-DOLCE Mapping to Verbs
Vivian S. Silva
André Freitas
Siegfried Handschuh
Introduction
• NLP applications such as Question Answering and Text Entailment require complex inferences involving large commonsense knowledge bases
  → Need to map the words to an enumerable set of categories, reducing the reasoning search space
• Rules and algorithms can be applied to these categories
Introduction
• Linguistic resources, such as WordNet, can serve as a “bridge” between natural language text and higher-level semantic representations
• Foundational ontologies (FOs), which are sets of high-level formal categories, can provide a suitable semantic representation
  → Map WordNet to a foundational ontology to enable FO-based word tagging
Why Foundational Ontologies?
Foundational ontologies are intended to represent the world in the way people perceive it, classifying entities into categories that are familiar to people’s common sense. They:
• can represent data in a formal way
• can reason over data through rules and restrictions
Example:
• Data: “John is Mary’s son”; “Mary gave birth”
• Representation: son (John, Mary); performs (Mary, give birth)
• Reasoning: son (x, y) → mother (y, x), therefore mother (Mary, John)
Practical Application
Text Entailment Task
• Assumption: Mary is a mother
• Hypothesis: Mary gave birth
• Support definition (e.g. from WN): “a mother is a woman who has given birth”
Foundational Ontology Mapping (commonsense concepts → foundational classes)
• Mary → agent
• mother → role
• give birth → action
Rule
• (agent plays role) and (role performs action) -> (agent performs action)
Applying the rule
• (Mary plays mother) and (mother performs give birth) -> (Mary performs give birth)
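
The rule application on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Fact` tuple, the `FO_CLASS` table and the `apply_role_rule` function are hypothetical names, and the two input facts are written by hand rather than extracted from text or from WordNet.

```python
from typing import NamedTuple, Set


class Fact(NamedTuple):
    subject: str
    relation: str
    obj: str


# Foundational classes assigned to the commonsense concepts on the slide.
FO_CLASS = {"Mary": "agent", "mother": "role", "give birth": "action"}


def apply_role_rule(facts: Set[Fact]) -> Set[Fact]:
    """(agent plays role) and (role performs action) -> (agent performs action)."""
    derived = set()
    for plays in facts:
        if plays.relation != "plays":
            continue
        if FO_CLASS.get(plays.subject) != "agent" or FO_CLASS.get(plays.obj) != "role":
            continue
        for performs in facts:
            if (performs.relation == "performs"
                    and performs.subject == plays.obj
                    and FO_CLASS.get(performs.obj) == "action"):
                derived.add(Fact(plays.subject, "performs", performs.obj))
    return derived


facts = {
    Fact("Mary", "plays", "mother"),           # from the assumption
    Fact("mother", "performs", "give birth"),  # from the support definition
}
hypothesis = Fact("Mary", "performs", "give birth")
entailed = hypothesis in apply_role_rule(facts)
print(entailed)
```

Because the rule is stated over the foundational classes (agent, role, action) and not over the words themselves, the same function would apply to any other concepts mapped to those three classes.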
DOLCE-WordNet Alignment
• Sweetening WordNet with DOLCE (Gangemi et al., 2003)
  - DOLCE: oriented towards language and cognition
  - 813 noun synsets mapped to 50 DOLCE classes
  - No verb synsets mapped
• Proposal:
  - Expand the noun alignment to the whole taxonomy
  - Also map the verb synsets to DOLCE, using their links to already mapped noun synsets
Verb Alignment Methodology
Update and Expansion of the Noun Alignment
• Mappings updated from WordNet version 1.6 to 3.0: 809 synsets
• Alignment expanded through the taxonomy using the hypernym links: 80,897 mapped synsets (98.5% of the noun database)
Top-Level Verb Selection
• Verb classification performed over the top-level synsets: 560 synsets
Direct Links
• The “derivationally related form” lexical link was retrieved for each of the 560 synsets; results were manually filtered to identify the noun that best represents the verb occurrence
• Examples: run - running; appear - apparition; leak - leakage
Indirect Links
• When no direct links were found, indirect paths were searched using the antonym and verb group links
• Example: ignore [antonym of] know - knowingness
Manual Assignment
• For verbs with no explicit direct or indirect link to a noun, implicit relationships given by the words in their gloss were identified
• Example: overarch (“form an arch over”) - arch (“form an arch or curve”)
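
The expansion through hypernym links mentioned on this slide can be pictured as each synset inheriting the class of its nearest mapped ancestor. The sketch below is an assumption about how that propagation could work, not the authors' implementation: the synset IDs are hypothetical placeholders, and each synset is given a single hypernym for simplicity, whereas WordNet allows several.

```python
from typing import Dict, Optional

# Hypothetical toy taxonomy: each synset ID points to its hypernym (None = top level).
HYPERNYM: Dict[str, Optional[str]] = {
    "top_a": None,
    "child_a1": "top_a",
    "grandchild_a1x": "child_a1",
    "top_b": None,
    "child_b1": "top_b",
}

# Seed alignment: only some synsets carry a class directly.
SEED: Dict[str, str] = {"top_a": "event", "child_b1": "state"}


def inherited_class(synset: str) -> Optional[str]:
    """Walk up the hypernym chain and return the class of the nearest mapped synset."""
    current: Optional[str] = synset
    while current is not None:
        if current in SEED:
            return SEED[current]
        current = HYPERNYM.get(current)
    return None


expanded = {synset: inherited_class(synset) for synset in HYPERNYM}
coverage = sum(cls is not None for cls in expanded.values()) / len(expanded)
print(expanded, coverage)
```

In this toy data `top_b` stays unmapped because neither it nor any of its ancestors is in the seed alignment, which is one way coverage can remain below 100% after expansion.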
Verb Alignment Methodology
• Alignment examples (verb → noun → DOLCE class):
  (a) Direct link: breathe_1 → [derivationally related form] breathing_1 → process
  (b) Indirect link: degrade_1 → [antonym] aggrade_1 → [inherited hypernym] change_2 → [derivationally related form] change_1 → event
  (c) Manual assignment: take orders_2 → ordinance_3 → event
• Use as much information as is available in the synset’s gloss, to make the classification as objective as possible
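
The three cases above form a fallback cascade: try the direct link, then an indirect path, then the manual assignment. A minimal sketch follows, using only the examples from this slide as toy data. All table and function names are hypothetical, and the indirect path for degrade_1 (antonym, then inherited hypernym) is collapsed into a single lookup for brevity.

```python
from typing import Dict, Optional, Tuple

# Verb synset -> noun synset via "derivationally related form".
DERIVATION: Dict[str, str] = {"breathe_1": "breathing_1", "change_2": "change_1"}
# Verb synset -> another verb synset reached through antonym / verb group / hypernym links.
INDIRECT: Dict[str, str] = {"degrade_1": "change_2"}
# Verb synset -> noun synset chosen manually from the gloss.
MANUAL: Dict[str, str] = {"take orders_2": "ordinance_3"}
# Noun synset -> DOLCE class (from the noun alignment).
NOUN_CLASS: Dict[str, str] = {
    "breathing_1": "process",
    "change_1": "event",
    "ordinance_3": "event",
}


def align_verb(verb: str) -> Optional[Tuple[str, str]]:
    """Return (DOLCE class, link type) for a verb synset, or None if unresolved."""
    if verb in DERIVATION:
        return NOUN_CLASS[DERIVATION[verb]], "direct"
    if verb in INDIRECT and INDIRECT[verb] in DERIVATION:
        return NOUN_CLASS[DERIVATION[INDIRECT[verb]]], "indirect"
    if verb in MANUAL:
        return NOUN_CLASS[MANUAL[verb]], "manual"
    return None


for verb in ("breathe_1", "degrade_1", "take orders_2"):
    print(verb, align_verb(verb))
```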
Alignment Results
DOLCE class        Top Synsets   Full Taxonomy
event                      412          12,037
cognitive-event             63             854
state                       62             597
process                     15             259
cognitive-state              8              20
Total                      560          13,767

Link types used (top-level synsets):
• direct links: 36.25%
• indirect links: 16.25%
• explicit links (direct + indirect): 52.50%
• implicit relationships: 47.50%

Alignments expanded through the taxonomy using the hypernym links: verb database 100% mapped
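
The figures on this slide can be cross-checked with simple arithmetic. The sketch below recomputes the totals and converts the link-type percentages into synset counts, assuming the percentages refer to the 560 top-level synsets; the absolute counts are derived here and do not appear on the slide.

```python
TOP_SYNSETS = {
    "event": 412, "cognitive-event": 63, "state": 62,
    "process": 15, "cognitive-state": 8,
}
FULL_TAXONOMY = {
    "event": 12037, "cognitive-event": 854, "state": 597,
    "process": 259, "cognitive-state": 20,
}
# Share of each link type, in percent.
LINK_SHARE = {"direct": 36.25, "indirect": 16.25, "implicit": 47.50}

total_top = sum(TOP_SYNSETS.values())
total_full = sum(FULL_TAXONOMY.values())

# Derived counts per link type (assumes the shares are taken over total_top).
link_counts = {name: round(total_top * pct / 100) for name, pct in LINK_SHARE.items()}
explicit_share = LINK_SHARE["direct"] + LINK_SHARE["indirect"]

print(total_top, total_full, link_counts, explicit_share)
```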
Evaluation
• Objective: evaluate the usefulness of the resulting alignments in a semantic annotation task (not the alignment quality!)
• Datasets: SemCor 3.0 and eXtended WordNet (XWN)
  - The sense number of each word/phrase is used to retrieve the synset ID, and then the DOLCE class associated with the synset
  - The resulting labeled datasets are used as gold standard
• FO tagging approach:
  - Lookup in the WN-DOLCE mappings table
  - First-sense WSD
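
The FO tagging approach (table lookup plus first-sense WSD) might look roughly like the sketch below. The sense inventory and the `fo_tag` function are hypothetical; the two class assignments reuse the alignment examples from the earlier slide, and a real tagger would read both tables from WordNet and the WN-DOLCE mappings.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sense inventory: (lemma, part of speech) -> synset IDs in sense order.
SENSES: Dict[Tuple[str, str], List[str]] = {
    ("breathe", "v"): ["breathe_1", "breathe_2"],
    ("change", "n"): ["change_1"],
}

# Excerpt of a synset -> DOLCE class mappings table (toy data).
WN_DOLCE: Dict[str, str] = {"breathe_1": "process", "change_1": "event"}


def fo_tag(lemma: str, pos: str) -> Optional[str]:
    """Pick the first listed sense and look up its DOLCE class."""
    senses = SENSES.get((lemma, pos))
    if not senses:
        return None  # word not covered by the sense inventory
    return WN_DOLCE.get(senses[0])


tokens = [("breathe", "v"), ("change", "n"), ("xyzzy", "n")]
print([(lemma, fo_tag(lemma, pos)) for lemma, pos in tokens])
```

Words missing from the inventory come back as `None`, which corresponds to the coverage limitation of a purely lookup-based tagger.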
Evaluation Results
• Baselines:
  - Random: chooses a random label among the ones available for a word/phrase
  - SuperSense Tagger (SST; Ciaramita & Altun, 2006): the most similar tool, assigns a supersense (a high-level WN synset) to each word/phrase

                      XWN                             SemCor
                      Precision  Recall  F1-Score     Precision  Recall  F1-Score
Random                71.82      72.04   71.93        61.52      62.52   62.02
FO Tagging            89.68      89.74   89.71        86.10      86.36   86.23
SuperSense Tagging    -          -       -            76.65      77.71   77.18

• FO Tagging vs. SuperSense Tagging on SemCor: +9.05 percentage points in F1-score
• FO Tagging scores: accuracy of the chosen approach at selecting the most suitable label from the standard mappings set
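
As a consistency check on the table, F1 is the harmonic mean of precision and recall, so each reported F1-score can be recomputed from the other two columns. The sketch below does that and also derives the 9.05 gap between FO Tagging and SuperSense Tagging on SemCor; the dictionary layout and names are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table.
RESULTS = {
    ("Random", "XWN"): (71.82, 72.04, 71.93),
    ("Random", "SemCor"): (61.52, 62.52, 62.02),
    ("FO Tagging", "XWN"): (89.68, 89.74, 89.71),
    ("FO Tagging", "SemCor"): (86.10, 86.36, 86.23),
    ("SuperSense Tagging", "SemCor"): (76.65, 77.71, 77.18),
}

recomputed = {key: round(f1_score(p, r), 2) for key, (p, r, _) in RESULTS.items()}
reported = {key: f1 for key, (_, _, f1) in RESULTS.items()}

# Difference in F1-score points between FO Tagging and SST on SemCor.
gap = round(reported[("FO Tagging", "SemCor")]
            - reported[("SuperSense Tagging", "SemCor")], 2)

print(recomputed == reported, gap)
```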
Known Issues
• WordNet hypernym links do not always represent subsumption relationships effectively
  - FO tagging deals with very high-level categories
  - Related concepts tend to converge to the same category even when they do not follow a strict subsumption relationship
• Tagging is restricted to the words present in WordNet
  - Future work: use the labeled datasets to train a machine learning tagger
Conclusions
• Using a previous WN-DOLCE alignment for noun synsets, we extended the mapping to the verb synsets
  - Using lexical links and gloss words to trace back the noun that best represents a verb occurrence
  - Assigning to the verb the same DOLCE class associated with its noun counterpart
• The resulting alignment was used in the implementation of the FO Tagging semantic annotation framework
  - Compared to SST, it showed an increase of 9.05 percentage points in F1-score on SemCor, besides introducing a more homogeneous and conceptually well-grounded set of categories
  - Even using a simple WSD technique, it is possible to annotate text with high accuracy
Thanks!
