Data Integration Ontology Mapping


semantic web ontology mapping

Published in: Technology, Education
  • @radhikanarsi use jena to parse the owl file and extract data
  • how to retrieve data from ontologies?
  • how to represent user knowledge through ontologies in data mining for Association rule mining?

Data Integration Ontology Mapping

  1. 1. Pradeep Pillai and Michael Kandefer Department of Computer Science and Engineering University at Buffalo Buffalo, NY, 14260 {pbpillai,mwk3} Schema Matching and Ontology Mapping: A Comparison
  2. 2. <ul><li>Interoperability problem </li></ul><ul><ul><li>Problem of combining heterogeneous and distributed data sources </li></ul></ul><ul><ul><ul><li>Two solutions: </li></ul></ul></ul><ul><ul><ul><ul><li>Schema matching </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Ontology mapping </li></ul></ul></ul></ul><ul><li>W3C converging on standards for publishing web ontologies (e.g. OWL) </li></ul><ul><ul><ul><ul><li>Distributed ontologies are still an issue </li></ul></ul></ul></ul><ul><li>Intuition: Schema matching approaches are applicable to the ontology domain </li></ul>Introduction
  3. 3. <ul><li>Schema Matching </li></ul><ul><li>Ontology Mapping </li></ul><ul><li>Comparison </li></ul><ul><li>Ontology mapping using schema matching </li></ul><ul><li>Conclusion </li></ul>AGENDA
  4. 4. <ul><li>Distinction between matching and mapping isn’t clear </li></ul><ul><ul><li>Schema matching: process of “establishing [logical] correspondences between elements of the source and target schemas” [Cho08] </li></ul></ul><ul><ul><li>Schema mapping: process of generating the assertions from schema matching </li></ul></ul><ul><ul><ul><ul><li>Sometimes called “instance mapping” </li></ul></ul></ul></ul>Schema Matching Definition
  5. 5. <ul><li>Two general categories [ShvEuz05,MadBerRah01] </li></ul><ul><li>Element-based: Mappings created based on analysis of the schema elements </li></ul><ul><ul><li>String-based </li></ul></ul><ul><ul><li>Language-based </li></ul></ul><ul><ul><li>Constraint-based </li></ul></ul><ul><li>Structure-based: Mapping created based on analysis of the elements and schema structure </li></ul><ul><ul><li>Tree-based </li></ul></ul><ul><ul><li>Graph-based </li></ul></ul><ul><li>Matching approaches aren’t mutually exclusive </li></ul><ul><ul><li>Hybrid systems employ multiple methodologies </li></ul></ul><ul><li>Other properties </li></ul><ul><ul><li>Mappings need not be 1:1 </li></ul></ul><ul><ul><li>Auxiliary information can be utilized </li></ul></ul>Schema matching topology
  6. 6. <ul><li>Utilizes string comparisons between elements to establish mappings </li></ul><ul><ul><li>Prefix/Suffix: Look for similar prefixes/suffixes </li></ul></ul><ul><ul><li>Edit distance: The number of substitutions, insertions, or deletions it takes to convert one element name into the other </li></ul></ul><ul><ul><li>N-gram: Count the substrings of length n that the two element names have in common </li></ul></ul><ul><ul><li>Ex. COMA, S-Match </li></ul></ul>Element-based: String mappings
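The three string measures on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used by COMA or S-Match: the function names are invented here, the edit distance is assumed to be plain Levenshtein distance, and the comparison is case-sensitive.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest prefix shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def common_suffix_len(a: str, b: str) -> int:
    """Length of the longest shared suffix (prefix of the reversed strings)."""
    return common_prefix_len(a[::-1], b[::-1])


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or keep
        prev = cur
    return prev[-1]


def ngrams(s: str, n: int = 3) -> set:
    """All distinct substrings of length n."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def shared_ngrams(a: str, b: str, n: int = 3) -> int:
    """Number of distinct n-grams that occur in both strings."""
    return len(ngrams(a, n) & ngrams(b, n))


if __name__ == "__main__":
    # Element names taken from the purchase-order example on the next slide.
    for s, t in [("Quantity", "Qty"), ("ItemCount", "Count"), ("DeliverTo", "POShipTo")]:
        print(s, t, common_prefix_len(s, t), common_suffix_len(s, t),
              edit_distance(s, t), shared_ngrams(s, t))
```

A real matcher would normally turn these raw counts into similarity scores in [0, 1], for example by dividing by the length of the longer name; that normalization is left out here because the slide does not specify one.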
  7. 7. Element-based: String mappings [Figure: string-based matches between two purchase-order schemas. Legend: Prefix(3), 3-Gram(2), Edit distance(5). Schema 1 elements: PurchaseOrder, DeliverTo, InvoiceTo, Items, Address, Address, Item, Street, City, City, Street, ItemCount, ItemNumber, Quantity, UnitOfMeasure. Schema 2 elements: PO, POShipTo, POBillTo, POLines, Item, Street, City, City, Street, Count, Line, Qty, UoM.]
  8. 8. <ul><li>Utilizes properties of language in order to find elements with a common word sense </li></ul><ul><ul><li>Normalization </li></ul></ul><ul><ul><ul><li>Tokenization: Punctuation used to divide an element into tokens. </li></ul></ul></ul><ul><ul><ul><li>Expansion: Expand acronym and short-hand tokens. </li></ul></ul></ul><ul><ul><ul><li>Elimination: Remove undesirable tokens, such as prepositions, before comparison </li></ul></ul></ul><ul><ul><ul><li>Lemmatization: Tokens converted to their basic form (e.g. remove pluralization) and compared </li></ul></ul></ul><ul><li>Auxiliary information: Utilize external sources to aid matching </li></ul><ul><ul><li>WordNet, thesauri, or dictionaries </li></ul></ul><ul><li>Ex. Cupid, S-Match </li></ul>Element-based: Language mappings
  9. 9. Element-based: Language mappings. Example: POBillTo vs. InvoiceTo. Tokenize: <PO,Bill,To> and <Invoice,To>. Elimination: <PO,Bill> and <Invoice>. Expansion: <Purchase,Order,Bill> and <Invoice>. Related form: <Purchase,Order,Bill> and <Bill>. [Figure: the two purchase-order schemas. Schema 1 elements: PurchaseOrder, DeliverTo, InvoiceTo, Items, Address, Address, Item, Street, City, City, Street, ItemCount, ItemNumber, Quantity, UnitOfMeasure. Schema 2 elements: PO, POShipTo, POBillTo, POLines, Item, Street, City, City, Street, Count, Line, Qty, UoM.]
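The POBillTo / InvoiceTo walk-through on this slide can be reproduced with a small normalization pipeline. The sketch below rests on several assumptions: the stop-word list, the abbreviation table, and the one-entry thesaurus are hand-made stand-ins for the auxiliary sources (WordNet, thesauri, dictionaries) named on the previous slide; tokens are split on capitalization rather than punctuation, since the example names contain none; lemmatization is omitted; and the Jaccard overlap at the end is just one plausible way to compare the normalized token sets.

```python
import re

# Hypothetical auxiliary data; a real matcher would consult a thesaurus or dictionary.
STOPWORDS = {"to", "of", "the", "for"}
EXPANSIONS = {"po": ["purchase", "order"], "qty": ["quantity"]}
RELATED = {"invoice": "bill"}  # maps a token to a related form with the same sense


def tokenize(name: str) -> list:
    """Split an element name on capitalization, e.g. POBillTo -> PO, Bill, To."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|\d+", name)


def normalize(name: str) -> list:
    """Tokenize, eliminate stop words, expand abbreviations, map related forms."""
    tokens = [t.lower() for t in tokenize(name)]
    tokens = [t for t in tokens if t not in STOPWORDS]      # elimination
    expanded = []
    for t in tokens:
        expanded.extend(EXPANSIONS.get(t, [t]))              # expansion
    return [RELATED.get(t, t) for t in expanded]             # related form


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the normalized token sets (an assumed scoring choice)."""
    ta, tb = set(normalize(a)), set(normalize(b))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


if __name__ == "__main__":
    print(normalize("POBillTo"))   # tokens after all four steps
    print(normalize("InvoiceTo"))
    print(token_overlap("POBillTo", "InvoiceTo"))
```

After normalization the two names share the token "bill", which a purely string-based comparison of POBillTo and InvoiceTo would not surface.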
  10. 10. <ul><li> Represents schemas as graphs/trees </li></ul><ul><ul><li>Nodes are elements and attributes </li></ul></ul><ul><ul><li>Arcs are relationships </li></ul></ul><ul><li>Assumes matched elements between two graphs should have related elements that can be matched </li></ul><ul><li>Ex. Similarity flooding, Cupid </li></ul>Structure-based: Graph/tree mappings
  11. 11. <ul><li>Ontology definition: </li></ul><ul><li>“Specification of a conceptualization.” [Gru93] </li></ul><ul><li>“Explicit formal specification of the terms in the domain and relations among them.” </li></ul><ul><li>Ontology Mapping Definition: </li></ul><ul><li>“Given two ontologies O1 and O2, mapping one ontology onto another means that for each entity (concept C, relation R, or instance I) in ontology O1, we try to find a corresponding entity, which has the same intended meaning, in ontology O2” [Ehrig and Staab] </li></ul>Ontology Mapping Problem
  12. 12. <ul><li>Research Classification of Ontology Mapping [Noy04] </li></ul><ul><li>Mapping discovery: finding the similarities between two ontologies. How do we determine which concepts and properties represent similar notions? </li></ul><ul><li>Declarative formal representation of mappings: how can we represent the mappings between two ontologies so that they support reasoning? </li></ul><ul><li>Reasoning with mappings: once the mappings are defined, what types of reasoning can we perform with them, and how? </li></ul>Ontology Mapping Research
  13. 13. <ul><li>Snoggle </li></ul><ul><li>An interactive visual ontology mapping tool. </li></ul><ul><li>Users define mappings between the two ontologies, which are expressed in SWRL (Semantic Web Rule Language). </li></ul><ul><li>The mappings are converted into Jena rules, which are applied by the Jena inference engine to produce instances that can be queried. </li></ul>Survey: State of the Art
  14. 14. <ul><li>GLUE [Doa+3]: Uses machine learning techniques to find mappings. </li></ul><ul><li>Given two ontologies, for each concept in one ontology it finds the most similar concept in the other ontology. </li></ul><ul><li>GLUE architecture consists of </li></ul><ul><li>- Distribution Estimator </li></ul><ul><li>- Similarity Estimator </li></ul><ul><li>- Relaxation Labeler </li></ul><ul><li>GLUE outputs one-to-one correspondences between the taxonomies of the ontologies. </li></ul><ul><li>- String similarity, structure, and machine learning strategies. </li></ul>GLUE
  15. 15. <ul><li>PROMPT [Noy04] </li></ul><ul><li>Input: Two ontologies in OWL/OKBC </li></ul><ul><li>Output: Mapping suggestions and a merged ontology based on the choices made by the user. </li></ul><ul><li>iPROMPT: Interactive ontology merging tool. </li></ul><ul><li>AnchorPROMPT: Graph-based mappings to provide additional information for iPROMPT. </li></ul><ul><li>PROMPTDiff: Compares different ontology versions by combining matchers in a fixed-point manner. </li></ul><ul><li>PROMPTFactor: Tool for extracting a part of an ontology. </li></ul>PROMPT
  16. 16. <ul><li>Lucene Ontology Mapper </li></ul><ul><li>The source ontology is indexed into Lucene Documents (fields) using the Lucene search engine </li></ul><ul><li>Each field in the target ontology is provided as a search argument, which is in turn compared with the fields in the source document, and the hit scores are computed. </li></ul><ul><li>Fields with the maximum hit scores are said to be similar and hence mapped. </li></ul><ul><li>PowerMap also uses Lucene as part of its Ontology Mapping Framework </li></ul>IR Approaches
  17. 17. <ul><li>QOM </li></ul><ul><li>String similarity, structure and instances. </li></ul><ul><li>Input: Two OWL or RDFS ontologies with their elements (e.g., classes, properties, instances) </li></ul><ul><li>Output: One-to-one or one-to-none correspondences. </li></ul><ul><ul><ul><li>Heuristics are used to lower the number of candidate mappings. </li></ul></ul></ul><ul><ul><ul><li>It avoids the complete pairwise comparison of trees in favor of a top-down strategy </li></ul></ul></ul><ul><ul><ul><li>Sigmoid functions are applied, which emphasize high individual similarities and de-emphasize low individual similarities </li></ul></ul></ul><ul><ul><ul><li>A threshold is used to discard spurious evidence of similarity. </li></ul></ul></ul>QOM
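The last two bullets (sigmoid weighting followed by a threshold) can be illustrated with a short sketch. Everything numeric here is an assumption: the slide does not give the sigmoid's steepness or midpoint, the weights, or the cut-off, and the weighted average is only one plausible way to combine individual similarity scores. The function names are hypothetical and are not QOM's API.

```python
import math


def sigmoid(x: float, steepness: float = 5.0, midpoint: float = 0.5) -> float:
    """Logistic function centred on `midpoint`; both parameters are assumed values."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def aggregate(similarities, weights) -> float:
    """Weighted average of the sigmoid-adjusted individual similarities."""
    total = sum(weights)
    return sum(w * sigmoid(s) for s, w in zip(similarities, weights)) / total


def accept(similarities, weights, threshold: float = 0.7) -> bool:
    """Keep a candidate mapping only if its aggregated score reaches the threshold."""
    return aggregate(similarities, weights) >= threshold


if __name__ == "__main__":
    # Two individual measures per candidate pair, e.g. a label score and a structure score.
    print(aggregate([0.9, 0.8], [1, 1]), accept([0.9, 0.8], [1, 1]))
    print(aggregate([0.2, 0.3], [1, 1]), accept([0.2, 0.3], [1, 1]))
```

With these assumed parameters the sigmoid widens the gap between scores just above and just below the midpoint, so weak evidence contributes less to the aggregate before the threshold is applied.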
  18. 18. <ul><li>Schemas [Cho08,UscGru03] </li></ul><ul><ul><li>Specify database structure </li></ul></ul><ul><ul><ul><li>Relationships </li></ul></ul></ul><ul><ul><ul><li>Attributes </li></ul></ul></ul><ul><ul><li>Typically relational or XML </li></ul></ul><ul><li>Ontologies [UscGru03, ShvEuz05] </li></ul><ul><ul><li>Formal semantic specification of a shared conceptualization </li></ul></ul><ul><ul><ul><li>Concepts </li></ul></ul></ul><ul><ul><ul><li>Relationships </li></ul></ul></ul><ul><ul><li>Typically encoded with formal languages </li></ul></ul><ul><ul><ul><li>Description logics </li></ul></ul></ul><ul><ul><li>Most utilize taxonomic structure </li></ul></ul>Schemas and Ontologies
  19. 19. <ul><li>Both are forms of meta-data </li></ul><ul><li>Both utilized for domain description </li></ul><ul><li>Both utilize constraints (but in different ways) </li></ul>Similarities
  20. 20. <ul><li>Few differences </li></ul><ul><li>The essential (and trivial) difference is what each specifies and their uses </li></ul><ul><ul><li>DB for querying </li></ul></ul><ul><ul><li>Ontologies for search and derivation </li></ul></ul><ul><ul><li>Lines are blurring (e.g. SPARQL) </li></ul></ul><ul><li>Schemas don’t have semantics </li></ul><ul><ul><li>Relational schemas lack generality </li></ul></ul><ul><li>Ontologies use constraints to establish meaning </li></ul><ul><li>Schemas use constraints to establish integrity </li></ul>Differences
  21. 21. <ul><li>Element matching approaches [Wac+6] </li></ul><ul><ul><li>Top-level ontologies </li></ul></ul><ul><ul><ul><li>Shared ontology utilized for common language and semantics for subsumed ontologies </li></ul></ul></ul><ul><ul><ul><li>Ontologies that inherit the top-level ontology can be mapped more easily </li></ul></ul></ul><ul><ul><li>Semantic Correspondence </li></ul></ul><ul><ul><ul><li>Utilizes top-level ontologies for automatic ontology mapping </li></ul></ul></ul><ul><ul><ul><li>Formal concept analysis: Produces a common concept lattice between ontologies through object-attribute analysis </li></ul></ul></ul><ul><li>Structure level [ShvEuz05] </li></ul><ul><ul><li>Topology matching </li></ul></ul><ul><ul><ul><li>Utilizes sub-/super-class semantics </li></ul></ul></ul><ul><ul><ul><li>Assumes the superclasses and subclasses of matched elements are more likely to be related </li></ul></ul></ul><ul><ul><li>Model matching </li></ul></ul><ul><ul><ul><li>Utilizes semantic interpretations of ontologies to construct logical representations of potential mappings </li></ul></ul></ul><ul><ul><ul><li>Utilizes background “knowledge” to provide axioms for the representation </li></ul></ul></ul><ul><ul><ul><li>Runs a SAT/Validity checker to determine “correct” mappings </li></ul></ul></ul>Consequences of Differences
  22. 22. <ul><li>Due to similarities, and few differences </li></ul><ul><ul><li>Applications can be made that translate DB Schemas to Ontologies [XuZhaDon06] </li></ul></ul><ul><ul><li>Methodologies developed with both in mind will benefit both </li></ul></ul><ul><ul><li>Algorithms for schema matching applicable to ontology mapping </li></ul></ul><ul><ul><ul><li>Some approaches that rely on semantics prevent the opposite [Hess06] </li></ul></ul></ul><ul><ul><ul><li>Schema vocabularies and forced taxonomic structure could eliminate this </li></ul></ul></ul>Schema -> Ontology
  23. 23. <ul><li>Implementing an algorithm for OWL ontology mapping based on Cupid </li></ul><ul><li>Cupid [MadBerRah01] </li></ul><ul><ul><li>Hybrid approach </li></ul></ul><ul><ul><li>Uses linguistic and data-type constraint matching followed by tree structure mapping </li></ul></ul><ul><ul><li>“Derives” mappings as a result of coefficient computation </li></ul></ul><ul><li>Our approach </li></ul><ul><ul><li>Parse two OWL ontologies </li></ul></ul><ul><ul><li>Use a simple string matcher for initial similarities </li></ul></ul><ul><ul><li>Utilize tree structure methodology on known OWL semantics </li></ul></ul>Schema Matching Algorithm
  24. 24. <ul><li>Assumptions </li></ul><ul><ul><li>Leaf nodes are structurally similar (ssim) if they have lexical and data-type similarity </li></ul></ul><ul><ul><ul><li>lsim(s,t) in [0, 1]: Lexical similarity uses substring, normalization, and hypernymy and synonymy matching </li></ul></ul></ul><ul><ul><ul><li>data-type-similarity(s,t) in [0, 0.5]: Lookup table of data types and their similarity </li></ul></ul></ul><ul><ul><li>Non-leaf nodes are ssim if they are lsim and their leaf nodes are weighted similarly (wsim); immediate children do not influence ssim. </li></ul></ul><ul><ul><ul><li>wsim(s,t) in [0, 1]: Measure of the lexical and structural similarity. Preference to one or the other is controlled by a modifying constant. </li></ul></ul></ul><ul><li>Constants </li></ul><ul><ul><li>w_struct: Modifies the influence of each matcher </li></ul></ul><ul><ul><li>th_accept: When to accept two leaf nodes as strongly linked </li></ul></ul><ul><ul><li>th_high/th_low: When to increase/decrease structural similarity </li></ul></ul><ul><ul><li>c_inc/c_dec: How much to increase/decrease structural similarity </li></ul></ul><ul><li>Algorithm – TreeMatch(S, T) </li></ul><ul><ul><li>Initialize ssim(s,t) = data-type-similarity(s,t) for every pair of leaf nodes s in S and t in T </li></ul></ul><ul><ul><li>Using post-order traversal, for every node s in S and node t in T </li></ul></ul><ul><ul><ul><li>wsim(s,t) = w_struct * ssim(s,t) + (1 – w_struct) * lsim(s,t) </li></ul></ul></ul><ul><ul><ul><li>if wsim(s,t) > th_high, increase ssim for all leaf nodes of s and t by c_inc </li></ul></ul></ul><ul><ul><ul><li>if wsim(s,t) < th_low, decrease ssim for all leaf nodes of s and t by c_dec </li></ul></ul></ul>Tree Matcher
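The TreeMatch steps listed on this slide can be sketched as follows. This is one reading of the slide, not the authors' implementation or the reference Cupid code. Three points are assumptions made here: the structural similarity of a pair involving a non-leaf node is taken to be the fraction of leaves under the two nodes that have a strong link (weighted similarity above th_accept) to a leaf on the other side; c_inc and c_dec are applied as multiplicative factors capped at 1; and all default constant values are illustrative. The `lsim` and `dtype_sim` callables, like the `Node` class, are hypothetical placeholders supplied by the caller.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Node:
    name: str
    dtype: str = ""
    children: list = field(default_factory=list)


def leaves(node):
    """All leaf nodes in the subtree rooted at node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def post_order(node):
    """Yield children before their parent."""
    for child in node.children:
        yield from post_order(child)
    yield node


def tree_match(S, T, lsim, dtype_sim, w_struct=0.5, th_accept=0.5,
               th_high=0.6, th_low=0.35, c_inc=1.2, c_dec=0.9):
    """Return wsim for every node pair; all constants are assumed defaults."""
    # Leaf pairs start from data-type similarity.
    ssim = {(s, t): dtype_sim(s.dtype, t.dtype)
            for s in leaves(S) for t in leaves(T)}
    wsim = {}

    def weighted(s, t):
        return w_struct * ssim[(s, t)] + (1 - w_struct) * lsim(s.name, t.name)

    def strong_fraction(s_leaves, t_leaves):
        # Assumption: share of leaves with at least one strong link to the other side.
        linked = sum(any(weighted(x, y) > th_accept for y in t_leaves) for x in s_leaves)
        linked += sum(any(weighted(x, y) > th_accept for x in s_leaves) for y in t_leaves)
        return linked / (len(s_leaves) + len(t_leaves))

    for s in post_order(S):
        for t in post_order(T):
            s_leaves, t_leaves = leaves(s), leaves(t)
            if s.children or t.children:
                ssim[(s, t)] = strong_fraction(s_leaves, t_leaves)
            wsim[(s, t)] = weighted(s, t)
            if wsim[(s, t)] > th_high:
                factor = c_inc      # reinforce the leaves under a good match
            elif wsim[(s, t)] < th_low:
                factor = c_dec      # weaken the leaves under a poor match
            else:
                continue
            for x in s_leaves:
                for y in t_leaves:
                    ssim[(x, y)] = min(1.0, ssim[(x, y)] * factor)
    return wsim
```

A caller would build two `Node` trees from the parsed ontologies, pass a lexical matcher as `lsim` and a data-type lookup as `dtype_sim`, and then read candidate mappings off the returned dictionary by picking the highest-scoring pairs.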
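  25. 25. Cupid Mappings [Figure: mappings found between the two purchase-order schemas. Legend: High lsim (A1), High wsim (A2), Matches. Schema 1 elements: PurchaseOrder, DeliverTo, InvoiceTo, Items, Address, Address, Item, Street, City, City, Street, ItemCount, ItemNumber, Quantity, UnitOfMeasure. Schema 2 elements: PO, POShipTo, POBillTo, POLines, Item, Street, City, City, Street, Count, Line, Qty, UoM.]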
  26. 26. <ul><li>Schema Matching and Ontology Matching address similar problems </li></ul><ul><li>Schema matching approaches are applicable to ontology mapping </li></ul><ul><ul><li>Doesn’t utilize semantic information </li></ul></ul><ul><ul><li>The opposite doesn’t hold. </li></ul></ul><ul><li>Hybrid approaches are the best methodologies for automatic, generic schema matching and ontology mapping </li></ul><ul><li>Systems that employ schema matching might be capable of working with ontologies provided minimal adjustment (e.g. Cupid) </li></ul><ul><ul><li>Additional experimentation is needed </li></ul></ul>Conclusions
  27. 27. <ul><li>[Cho08] J. Chomicki. Data Integration: Schema Mapping. February 2008. </li></ul><ul><li>[Doa+3] A. Doan, J. Madhavan, P. Domingos, and A. Halevy. Learning to Map between Ontologies on the Semantic Web. Proceedings of the 11th International Conference on World Wide Web. 2002. </li></ul><ul><li>[Gru93] T.R. Gruber. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition 5(2). 1993. </li></ul><ul><li>[Hess06] A. Hess. An Iterative Algorithm for Ontology Mapping Capable of Using Training Data. Proceedings of ESWC '06. 2006. </li></ul><ul><li>[MadBerRah01] J. Madhavan, P. Bernstein, and E. Rahm. Generic Schema Matching with Cupid. Proceedings of the 27th VLDB Conference. 2001. </li></ul><ul><li>[Noy04] N. Noy. Semantic Integration: A Survey of Ontology-based Approaches. SIGMOD Record, Special Issue on Semantic Integration. 2004. </li></ul><ul><li>[ShvEuz05] P. Shvaiko and J. Euzenat. A Survey of Schema-based Matching Approaches. Journal on Data Semantics. 2005. </li></ul><ul><li>[UscGru03] M. Uschold and M. Gruninger. Ontologies and Semantics for Seamless Connectivity. SIGMOD Record 33(4). 2004. </li></ul><ul><li>[Wac+6] H. Wache, T. Vogele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann, and S. Hubner. Ontology-based Integration of Information: A Survey of Existing Approaches. IJCAI-01 Workshop: Ontologies and Information Sharing. 2001. </li></ul><ul><li>[XuZhaDon06] Z. Xu, S. Zhang, and Y. Dong. Mapping between Relational Database Schema and OWL Ontology for Deep Annotation. Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence. 2006. </li></ul>References