Ontology For Data Integration


Semantic data integration is the process of using a conceptual representation of the data and of their relationships to eliminate possible heterogeneities.

Published in: Technology, Education

  1. Some Thoughts
     Juan Esteva, Ph. D.
     751 Malena Dr., Ann Arbor, MI 48103
     Tel: 734-786-0233
     Cell: 734-277-4962
     Fax: 734-821-0235
     Skype: DrEsteva
     juan.esteva@Ajatella.com
     Ontology Data Integration For Competitive Decision Making
  2. Not Just The Facts
     3/4/2010, Juan Esteva, Ph. D.
     “Good decisions are based on information that is analyzed and transformed into usable knowledge.” (Eileen Feretic)
  3. Information at the point of impact
     “Information needs to be at the point of impact, at the front lines where people are making decisions. The right analysis needs to be done at the right place. It’s important for organizations to treat information as a strategic asset in order to optimize every decision, every process, everything they do.” (Ambuj Goyal)
  4. Data in Silos
     “One of the biggest challenges organizations face is the amount of data sitting in silos. Too often, valuable data simply isn’t accessible or available.” (Boris Evelson)
  5. Business Decisions for Competitive Advantage
     “In today’s troubled economy and competitive business environment, making good decisions is a matter of survival. But good decisions aren’t based on gut feeling alone. They should be based on information gathered from multiple sources, which is then synthesized and analyzed to generate a road map of options and possible outcomes that transform data into usable knowledge.” (Eileen Feretic)
  6. Business Intelligence
     This is where Business Intelligence, and now Business Analytics, systems come into play.
     “[However,] it is hard to assemble [heterogeneous data and] disparate pieces of information in a way that provides the intelligence and insight needed to make good business decisions.” (Eileen Feretic)
     Enter ontology data integration.
  7. Data Integration
     Data integration provides the ability to manipulate data transparently across multiple data sources.
     Based on their architecture, there are two kinds of systems:
     Central data integration: a central data integration system usually has a global schema, which provides the user with a uniform interface to access information stored in the data sources.
     Peer-to-peer: in contrast, in a peer-to-peer data integration system there are no global points of control on the data sources (or peers). Instead, any peer can accept user queries for the information distributed in the whole system.
  8. Common Approaches for Data Integration
     Global-as-View (GaV): every entity in the global schema is associated with a view over the local source schemas. Querying strategies are therefore simple, but the evolution of the local source schemas is not easily supported.
     Local-as-View (LaV): in contrast, the LaV approach permits changes to source schemas without affecting the global schema, since the local schemas are defined as views over the global schema. The price is that query processing can be complex.
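
The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names (`crm_customers`, `billing_clients`), their fields, and the helper functions are hypothetical and do not come from the slides. Under GaV the global relation is written directly as a view over the sources, so a query just unfolds it; under LaV each source is only described in terms of the global schema, and the query processor has to work out which sources are relevant.

```python
# Hypothetical local sources with different schemas (illustrative data only).
crm_customers = [{"cust_id": 1, "full_name": "Ada Lovelace"}]
billing_clients = [{"client_no": 2, "name": "Alan Turing"}]


# GaV: the global entity Customer(id, name) is defined as a view over the sources.
def global_customer():
    for row in crm_customers:
        yield {"id": row["cust_id"], "name": row["full_name"]}
    for row in billing_clients:
        yield {"id": row["client_no"], "name": row["name"]}


def customers_named(prefix):
    """A query over the global schema simply unfolds the GaV view."""
    return [c for c in global_customer() if c["name"].startswith(prefix)]


# LaV: each source is described as a view over the global schema.
# The description is declarative metadata; adding a source adds an entry
# here and leaves the global schema untouched.
lav_descriptions = {
    "crm_customers": {"relation": "Customer",
                      "columns": {"cust_id": "id", "full_name": "name"}},
    "billing_clients": {"relation": "Customer",
                        "columns": {"client_no": "id", "name": "name"}},
}


def sources_for(relation):
    """First step of LaV query processing: find sources relevant to a relation."""
    return sorted(s for s, d in lav_descriptions.items()
                  if d["relation"] == relation)
```

In the GaV half, renaming a source column means editing `global_customer`; in the LaV half, it means editing one source description, which is the trade-off the slide describes.
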
  9. Data Heterogeneity
     Data sources can be heterogeneous in:
     Syntax: syntactic heterogeneity is caused by the use of different models or languages.
     Schema: schematic heterogeneity results from structural differences.
     Semantics: semantic heterogeneity is caused by different meanings or interpretations of data in various contexts.
     To achieve data interoperability, the issues posed by data heterogeneity need to be eliminated.
  10. Possible Solutions
      The advent of XML has created a syntactic platform for Web data standardization and exchange. However, schematic data heterogeneity may persist, depending on the XML schemas used (e.g., nesting hierarchies). Likewise, semantic heterogeneity may persist even if both syntactic and schematic heterogeneities do not occur (e.g., naming concepts differently).
      We should be concerned with solving all three kinds of heterogeneities by bridging syntactic, schematic, and semantic heterogeneities across different sources.
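
A small Python sketch may make the point about XML concrete. The two documents below are invented for illustration: both are well-formed XML, so there is no syntactic heterogeneity, yet they nest the data differently (schematic) and name the same concepts differently (semantic). The element names and the two extraction functions are assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

# Source A nests the author inside each book.
source_a = ("<catalog><book><title>Dune</title>"
            "<author>Frank Herbert</author></book></catalog>")

# Source B nests works inside each writer, and says "writer"/"work"
# where source A says "author"/"book".
source_b = ("<library><writer name='Frank Herbert'>"
            "<work>Dune</work></writer></library>")


def books_from_a(xml_text):
    """Return (title, author) pairs from the book-centric layout."""
    root = ET.fromstring(xml_text)
    return [(b.findtext("title"), b.findtext("author"))
            for b in root.iter("book")]


def books_from_b(xml_text):
    """Return (title, author) pairs from the writer-centric layout."""
    root = ET.fromstring(xml_text)
    return [(work.text, writer.get("name"))
            for writer in root.iter("writer")
            for work in writer.iter("work")]
```

Both functions return the same `(title, author)` pairs, but only because each one hard-codes knowledge of its source's structure and vocabulary. That hand-written knowledge is what the slides propose to capture explicitly.
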
  11. Semantic Data Integration Using Ontologies
      We call semantic data integration the process of using a conceptual representation of the data and of their relationships to eliminate possible heterogeneities.
      At the heart of semantic data integration is the concept of ontology, which is an explicit specification of a shared conceptualization.
  12. Ontology & Data Integration
      Metadata Representation. Metadata (i.e., source schemas) in each data source can be explicitly represented by a local ontology, using a single language.
      Global Conceptualization. The global ontology provides a conceptual view over the schematically heterogeneous source schemas.
      Support for High-level Queries. Given a high-level view of the sources, as provided by a global ontology, the user can formulate a query without specific knowledge of the different data sources. The query is then rewritten into queries over the sources, based on the semantic mappings between the global and local ontologies.
      Declarative Mediation. Query processing in a hybrid peer-to-peer system uses the global ontology as a declarative mediator for query rewriting between peers.
      Mapping Support. A thesaurus, formalized in terms of an ontology, can be used for the mapping process to facilitate its automation.
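
The query rewriting step described under "Support for High-level Queries" can be sketched as follows. This is a minimal illustration, assuming the semantic mappings are a simple lookup table from global ontology properties to local field names; the source names, property names, and data are all hypothetical.

```python
# Hypothetical semantic mappings: global ontology property -> local field.
MAPPINGS = {
    "hr_db": {"Person.name": "emp_name", "Person.email": "mail"},
    "crm_system": {"Person.name": "contact", "Person.email": "email_addr"},
}

# Hypothetical source data, each using its own local vocabulary.
SOURCES = {
    "hr_db": [{"emp_name": "Grace Hopper", "mail": "grace@example.org"}],
    "crm_system": [{"contact": "Linus Pauling",
                    "email_addr": "linus@example.org"}],
}


def rewrite(global_props, source):
    """Rewrite global properties into one source's local field names."""
    mapping = MAPPINGS[source]
    return {prop: mapping[prop] for prop in global_props if prop in mapping}


def answer(global_props):
    """Run the rewritten query on every source and merge the results."""
    results = []
    for source, rows in SOURCES.items():
        local = rewrite(global_props, source)
        for row in rows:
            results.append({prop: row[field] for prop, field in local.items()})
    return results
```

A caller asks for `answer(["Person.name"])` using only global terms and never sees `emp_name` or `contact`. Real mappings are usually richer than a renaming table, so this should be read as the simplest possible case.
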
  13. What do we need?
      Increase search capabilities: from discovery to reasoning.
      Increase metadata so as to provide strong semantics: from glossaries to ontologies.
      Consequently, move from syntactic interoperability to structural interoperability and finally to semantic interoperability.
  14. Graphically, the model progression will be [2]
      The point of this graph is that increasing metadata (from glossaries to ontologies) is highly correlated with increasing search capability (from discovery to reasoning).
  15. References
  16. References
      Applying 4D Ontologies to Enterprise Architecture, Matthew West, Shell Corp.
      FHA Data Architecture Working Group: SICoP DRM 2.0 Pilot, 2005.
      The Role of Ontologies in Data Integration, Isabel F. Cruz and Huiyong Xiao.
  17. Topic Maps
      Topic Maps is a standard for the representation and interchange of knowledge, with an emphasis on the findability of information. The ISO standard is formally known as ISO/IEC 13250:2003.
      A topic map represents information using topics (representing any concept, from people, countries, and organizations to software modules, individual files, and events), associations (representing the relationships between topics), and occurrences (representing information resources relevant to a particular topic).
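
The three building blocks named above (topics, associations, occurrences) map naturally onto a small data structure. The sketch below is an in-memory illustration only: the class and method names are hypothetical, it does not implement the ISO/IEC 13250 interchange syntax, and the example data is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    # Occurrences: references to information resources about this topic.
    occurrences: list = field(default_factory=list)


@dataclass
class Association:
    kind: str        # the type of relationship, e.g. "composed-by"
    members: tuple   # names of the topics taking part


class TopicMap:
    def __init__(self):
        self.topics = {}
        self.associations = []

    def topic(self, name):
        """Return the topic with this name, creating it if needed."""
        return self.topics.setdefault(name, Topic(name))

    def associate(self, kind, *names):
        """Record a typed relationship between the named topics."""
        for name in names:
            self.topic(name)
        self.associations.append(Association(kind, tuple(names)))

    def related(self, name):
        """Names of all topics sharing an association with the given one."""
        return sorted({m for a in self.associations if name in a.members
                       for m in a.members if m != name})


tm = TopicMap()
tm.topic("Puccini").occurrences.append("http://example.org/puccini-bio")
tm.associate("composed-by", "Tosca", "Puccini")
tm.associate("born-in", "Puccini", "Lucca")
```

Navigating from one topic to its neighbours with `tm.related("Puccini")` is a small-scale version of the findability the standard emphasizes.
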
  18. SKOS
      The Simple Knowledge Organization System (SKOS) is a common data model for sharing and linking knowledge organization systems via the Web.
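
As a rough illustration of what such a data model holds, the sketch below stores two concepts using the SKOS notions of preferred label, alternative label, and broader concept (`skos:prefLabel`, `skos:altLabel`, `skos:broader`). The concept URIs, the dictionary layout, and the helper functions are assumptions made for the example; a real vocabulary would be published as RDF rather than as a Python dictionary.

```python
# Hypothetical two-concept vocabulary keyed by concept URI.
CONCEPTS = {
    "ex:automobile": {"prefLabel": "automobile",
                      "altLabel": ["car", "motorcar"],
                      "broader": "ex:vehicle"},
    "ex:vehicle": {"prefLabel": "vehicle",
                   "altLabel": [],
                   "broader": None},
}


def find_concept(label):
    """Resolve a preferred or alternative label to its concept URI."""
    for uri, concept in CONCEPTS.items():
        if label == concept["prefLabel"] or label in concept["altLabel"]:
            return uri
    return None


def broader_chain(uri):
    """Follow broader links upward from a concept to the top of the scheme."""
    chain = []
    parent = CONCEPTS[uri]["broader"]
    while parent is not None:
        chain.append(parent)
        parent = CONCEPTS[parent]["broader"]
    return chain
```

Resolving "car" and "automobile" to the same concept is the kind of thesaurus lookup that can support the mapping step in data integration.
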
  19. RDF
      The Resource Description Framework (RDF) is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.
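
The merging property follows from the shape of the data: an RDF graph is a set of subject-predicate-object triples, so combining two graphs is a set union. The sketch below models triples as plain Python tuples to show the idea; it is not an RDF library, and the prefixed names and example data are assumptions. A real application would use an RDF toolkit and full IRIs.

```python
# Two sources describing the same subject with different vocabularies.
source_a = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
}
source_b = {
    ("ex:alice", "org:memberOf", "ex:acme"),
    ("ex:acme", "rdfs:label", "Acme Corp"),
}

# Merging needs no schema alignment step: it is a set union of triples.
merged = source_a | source_b


def describe(graph, subject):
    """All (predicate, object) pairs known about a subject, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)
```

A consumer that only understands `foaf:name` keeps working after `source_b` is merged in, because the new predicates are simply additional triples it can ignore.
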
  20. OWL
      The Web Ontology Language (OWL) is a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things. OWL is a computational logic-based language such that knowledge expressed in OWL can be reasoned with by computer programs, either to verify the consistency of that knowledge or to make implicit knowledge explicit. OWL documents, known as ontologies, can be published on the World Wide Web and may refer to or be referred to from other OWL ontologies. OWL is part of the W3C’s Semantic Web technology stack, which includes RDF, RDFS, SPARQL, etc.
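
The two uses of reasoning mentioned above, making implicit knowledge explicit and verifying consistency, can be illustrated with a toy example. The sketch below handles only a subclass hierarchy and disjoint classes, which is a tiny fragment of what OWL expresses; the class names and functions are hypothetical, and an actual system would rely on an OWL reasoner.

```python
# Hypothetical ontology: child class -> parent class.
SUBCLASS_OF = {"Manager": "Employee", "Employee": "Person"}

# Pairs of classes declared disjoint (nothing may belong to both).
DISJOINT = {frozenset({"Person", "Organization"})}


def superclasses(cls):
    """All ancestors of a class, nearest first."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def inferred_types(asserted):
    """Make implicit knowledge explicit: add every inherited class."""
    types = set(asserted)
    for cls in asserted:
        types.update(superclasses(cls))
    return types


def is_consistent(asserted):
    """Check that the inferred types do not include a disjoint pair."""
    types = inferred_types(asserted)
    return not any(pair <= types for pair in DISJOINT)
```

Asserting only that something is a `Manager` lets the program conclude it is also an `Employee` and a `Person`; additionally asserting `Organization` is flagged as inconsistent because of the disjointness declaration.
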
