Presentation of Alkhaldi, N., Debruyne, C. (2011) Comparing XML Files with a DOGMA Ontology to Generate Omega-RIDL Annotations. In Proc. of On the Move to Meaningful Internet Systems 2011: OTM Workshops - Semantic and Decision Support (SeDeS 2011), LNCS, Springer - October 2011
Abstract: To facilitate the process of annotating data in the DOGMA ontology-engineering framework, we present a method and tool for semi-automatic annotation of XML data using an ontology. XML elements are compared against concepts and their interrelations in the ontology using various metrics at different levels (lexical level, semantic level, structural level, etc.). The results of these metrics are then used to propose to the user a series of annotations from XML elements to concepts in the ontology, which are then validated by that user. Those annotations - expressed in Ω-RIDL - are then used to transform data from one format into another format. In this paper, we demonstrate our approach on XML data containing vendor offers in the tourism domain, more precisely holiday packages.
Generating Annotations by Comparing XML and Ontology
1. Comparing XML Files with a
DOGMA Ontology to Generate
Ω-RIDL Annotations
Nadejda Alkhaldi and Christophe Debruyne
16/10/11
3. Introduction
An ontology is a [formal,] explicit specification of a
[shared] conceptualization (Gruber)
Autonomously developed and maintained information
systems commit to the ontology, a mostly manual
activity.
How can we automate (a part of) this process?
Comparing XML Files with a DOGMA Ontology to
Generate Ω-RIDL Annotations
16/10/11 Pag.3
4. Method: overview
First we need an ontology.
– We used the DOGMA method for ontology engineering
– The development of the ontology is reported elsewhere in
Debruyne et al. (WEBIST 2011)
Semi-automatically annotate the data
– Match concepts in the (structure of the) data to the ontology
– Generate an Ω-RIDL commitment file
– Review of the mappings by a representative of the information
system
5. Method: DOGMA
DOGMA Ontology Descriptions <Λ, ci, K>
– Λ a lexon base, a finite set of plausible binary fact types called
lexons <γ, t1, r1, r2, t2>, where γ in Γ is a context-identifier,
a pointer to a community
E.g., <Vendor Community, Offer, has, is of, Title>
– ci a partial function mapping context-identifiers and terms to
concepts
– K a finite set of ontological commitments containing
– A selection of lexons
– A mapping from application symbols to ontology terms
– Predicates over those terms and roles to express constraints
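A minimal sketch of how a lexon <γ, t1, r1, r2, t2> could be represented in code. The class name Lexon, its field names, and the helper terms() are illustrative assumptions and are not taken from the DOGMA tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexon:
    """A binary fact type <γ, t1, r1, r2, t2> (field names are assumptions)."""
    context: str    # γ, the context-identifier pointing to a community
    head_term: str  # t1
    role: str       # r1
    co_role: str    # r2
    tail_term: str  # t2

    def readings(self):
        """Return the fact type read in both directions."""
        return (
            f"{self.head_term} {self.role} {self.tail_term}",
            f"{self.tail_term} {self.co_role} {self.head_term}",
        )


def terms(lexon_base):
    """Collect every term in a lexon base; these are candidate mapping targets."""
    found = set()
    for lexon in lexon_base:
        found.add(lexon.head_term)
        found.add(lexon.tail_term)
    return found


# The example lexon from the slide.
offer_title = Lexon("Vendor Community", "Offer", "has", "is of", "Title")
lexon_base = {offer_title}
```

Here offer_title.readings() yields the two readings "Offer has Title" and "Title is of Offer", and terms(lexon_base) gives the terms an XML element could be matched against.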
6. Method: DOGMA
Example of a commitment
Ω-RIDL: Verheyden et al. (SWDB 2004), Trog et al. (RuleML 2007)
7. Method: (Semi)-Automatic Annotation
First … related work?
– Annotation Techniques:
AeroDAML, SHOE Knowledge Annotator, S-CREAM, MnM,
Armadillo, KIM, SemTag, Ontea.
– Ontology and schema matching techniques:
CUPID, iMAP, oMAP, H-Match
– Looking at different aspects and reusing ideas that might be
usable
8. Method: (Semi)-Automatic Annotation
9. Method: (Semi)-Automatic Annotation
Some considerations
– The ontology contains explicit relations between concepts, the XML
does not
– XML tags can be matched to concepts of the ontology, but the
content of a tag can also represent a concept
E.g., <facility type="bar"> should be mapped onto the concept of
Bar and not onto Facility, of which Bar is a subtype.
– No XML Schema to rely on!
– Spelling mistakes/language variations
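Without an XML Schema, the structure has to be derived from the instance document itself. The sketch below shows one way this might be done with Python's standard xml.etree.ElementTree; the function name collect_paths, the "@" notation for attributes, and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def collect_paths(xml_text):
    """Return the sorted distinct element and attribute paths of an XML instance."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(element, prefix):
        path = f"{prefix}/{element.tag}"
        paths.add(path)
        # Attribute names are matching candidates too; "@" marks them here.
        for name in element.attrib:
            paths.add(f"{path}/@{name}")
        for child in element:
            walk(child, path)

    walk(root, "")
    return sorted(paths)


# Hypothetical sample document, loosely modelled on the paths in the slides.
sample = (
    "<countries><country><regions><region>"
    "<cities><city/><city/></cities>"
    "</region></regions></country></countries>"
)
paths = collect_paths(sample)
```

Repeated elements such as the two city tags collapse into a single path, so each path only has to be matched against the ontology once.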
10. Method: (Semi)-Automatic Annotation
1) Element match
– Match tag and attribute names using string metrics
2) Linguistic match
– Match tag and attribute names using an external thesaurus (e.g.,
WordNet or a domain-specific thesaurus)
3) Content match
– Match the content of a tag (with respect to the tag) to identify
the concept represented by the content
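The three matches above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: difflib's SequenceMatcher stands in for the unspecified string metrics, the THESAURUS dictionary is a hypothetical stand-in for WordNet or a domain thesaurus, and the weights in combined_score are arbitrary.

```python
from difflib import SequenceMatcher


def element_match(name, term):
    """String similarity in [0, 1] between an XML name and an ontology term."""
    return SequenceMatcher(None, name.lower(), term.lower()).ratio()


# Hypothetical synonym table standing in for WordNet or a domain thesaurus.
THESAURUS = {"town": {"city"}, "offer": {"holiday package"}}


def linguistic_match(name, term):
    """1.0 if the names are equal or listed as synonyms, otherwise 0.0."""
    name, term = name.lower(), term.lower()
    if name == term:
        return 1.0
    if term in THESAURUS.get(name, set()) or name in THESAURUS.get(term, set()):
        return 1.0
    return 0.0


def content_match(value, candidate_terms):
    """Return (term, score) for the term most similar to a tag's content."""
    scored = ((term, element_match(value, term)) for term in candidate_terms)
    return max(scored, key=lambda pair: pair[1])


def combined_score(name, term, weights=(0.6, 0.4)):
    """Weighted mean of element and linguistic match (weights are assumptions)."""
    w_element, w_linguistic = weights
    return (w_element * element_match(name, term)
            + w_linguistic * linguistic_match(name, term))


# The content "bar" of <facility type="bar"> points to Bar rather than Facility.
best_term, best_score = content_match("bar", ["Facility", "Bar", "Restaurant"])
```

A string metric also tolerates spelling variations: element_match("sumary", "summary") stays high even though the names differ.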
11. Method: (Semi)-Automatic Annotation
4) Structural Match
– Adjust the previously computed weighted means by looking at
the structure of both the ontology graph and the XML tree.
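The slides do not spell out how the adjustment is computed. One possible reading, sketched below purely as an assumption, is to raise the score of a candidate mapping when the parent XML element already matches a term that is linked to the candidate term by a lexon. The function name structural_refine, the boost value, the example scores, and the term Religion are all hypothetical.

```python
def structural_refine(scores, parent_of, related, boost=0.2):
    """Raise scores of mappings that are supported by the parent element.

    scores:    {(path, term): score} from the earlier matches
    parent_of: {path: parent path} taken from the XML tree
    related:   set of frozenset({term_a, term_b}) for terms linked by a lexon
    """
    refined = {}
    for (path, term), score in scores.items():
        parent = parent_of.get(path)
        # Terms that the parent element was matched to with a non-zero score.
        parent_terms = [t for (p, t), s in scores.items() if p == parent and s > 0]
        supported = any(frozenset((term, pt)) in related for pt in parent_terms)
        refined[(path, term)] = min(1.0, score + boost) if supported else score
    return refined


# Made-up scores: two candidates for the same element tie before refinement.
scores = {
    ("/country", "Country"): 0.9,
    ("/country/region", "Region"): 0.6,
    ("/country/region", "Religion"): 0.6,
}
parent_of = {"/country/region": "/country"}
related = {frozenset(("Country", "Region"))}

refined = structural_refine(scores, parent_of, related)
```

In this example only the Region candidate is boosted, because Country and Region are related in the (hypothetical) ontology graph while Country and Religion are not.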
13. Method: Summary
– using an XML file and a DOGMA ontology
– a series of mapping scores are calculated based on element,
linguistic and content match
– Those scores are then refined using the structural match
– The refined scores are then compared against a threshold to
produce the Ω-RIDL mappings.
– The user can then use the generated mappings to get an idea of
how their application can commit to the ontology and then decide
how to do so.
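The final thresholding step might look like the sketch below. The function name generate_mappings, the default threshold, and the scores are assumptions; the output only imitates the "map ... on ..." shape of the mappings shown in the slides and is not guaranteed to be complete Ω-RIDL syntax.

```python
def generate_mappings(scores, threshold=0.7):
    """Return 'map ... on ...' lines for every score at or above the threshold."""
    lines = []
    for (path, term), score in sorted(scores.items()):
        if score >= threshold:
            lines.append(f'map "{path}" on {term}.')
    return lines


# Made-up refined scores for illustration only.
refined_scores = {
    ("/countries/country/regions/region", "Region"): 0.92,
    ("/countries/country/regions/region/cities/city", "City"): 0.88,
    ("/countries/country/regions/region", "Commodity"): 0.15,
}

mappings = generate_mappings(refined_scores)
```

With the default threshold the low-scoring Commodity candidate is dropped. Lowering the threshold to 0.1 would let it through, which illustrates how a poor parameter choice can produce unhelpful mappings that the user then has to reject during review.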
16. Experiment
Data of the COMDRIVE RFP project
– Holiday Packages in the winter sports domain
Ontology developed in several iterations in the project
– Bootstrapping of the ontology
– Meeting with vendor experts
– Meeting with consumer experts
18. Experiment
Some generated mappings
– map "/countries/country/sumary/code" on
Code identifies / identified by Commodity.
– map "/countries/country/regions/region" on Region.
– map "/countries/country/regions/region" on
Ski Area destination of / with destination Holiday Package.
– map "/countries/country/regions/region/cities/city" on City.
– …
19. Conclusions
The four heuristics were able to tackle the considerations
mentioned.
The algorithm depends on a good choice of parameters; otherwise
many "nonsense" mappings are generated.
The structural match needs to be revisited to cope with more
complicated cases such as:
– map "/countries/country/regions/region/summary/description"
on Description of / has RFP.
Appropriate for suggesting mappings to the user (needs testing).
20. Future work
Revision of the structural match
Integration with tool suite (e.g., Business Semantics Studio)
Additional testing