
PalGov Tutorial 2, Session 12.1: The Problem of Data Integration

  1. The Palestinian eGovernment Academy (أكاديمية الحكومة الإلكترونية الفلسطينية), www.egovacademy.ps
     Tutorial II: Data Integration and Open Information Systems
     Session 12.1: The Problem of Data Integration
     Dr. Mustafa Jarrar, University of Birzeit
     mjarrar@birzeit.edu, www.jarrar.info
     PalGov © 2011
  2. About
     This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the Commission of the European Communities, grant agreement 511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps
     Project Consortium:
       Birzeit University, Palestine
       University of Trento, Italy (Coordinator)
       Palestine Polytechnic University, Palestine
       Vrije Universiteit Brussel, Belgium
       Palestine Technical University, Palestine
       Université de Savoie, France
       Ministry of Telecom and IT, Palestine
       University of Namur, Belgium
       Ministry of Interior, Palestine
       TrueTrust, UK
       Ministry of Local Government, Palestine
     Coordinator: Dr. Mustafa Jarrar, Birzeit University, P.O. Box 14, Birzeit, Palestine
     Tel/Fax: +972 2 2982935, mjarrar@birzeit.edu
  3. © Copyright Notes
     Everyone is encouraged to use this material, or part of it, but should properly cite the project (logo and website) and the author of that part. No part of this tutorial may be reproduced or modified in any form or by any means without prior written permission from the project, which holds the full copyright on the material.
     Attribution-NonCommercial-ShareAlike (CC BY-NC-SA): This license lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under identical terms.
  4. Tutorial Map
     Topics (hours):
       Session 1: XML Basics and Namespaces (3)
       Session 2: XML DTDs (3)
       Session 3: XML Schemas (3)
       Session 4: Lab-XML Schemas (3)
       Session 5: RDF and RDFS (3)
       Session 6: Lab-RDF and RDFS (3)
       Session 7: OWL (Web Ontology Language) (3)
       Session 8: Lab-OWL (3)
       Session 9: Lab-RDF Stores: Challenges and Solutions (3)
       Session 10: Lab-SPARQL (3)
       Session 11: Lab-Oracle Semantic Technology (3)
       Session 12_1: The Problem of Data Integration (1.5)
       Session 12_2: Architectural Solutions for the Integration Issues (1.5)
       Session 13_1: Data Schema Integration (1)
       Session 13_2: GAV and LAV Integration (1)
       Session 13_3: Data Integration and Fusion using RDF (1)
       Session 14: Lab-Data Integration and Fusion using RDF (3)
       Session 15_1: Data Web and Linked Data (1.5)
       Session 15_2: RDFa (1.5)
       Session 16: Lab-RDFa (3)
     Intended Learning Objectives:
       A: Knowledge and Understanding
         2a1: Describe tree and graph data models.
         2a2: Understand the notation of XML, RDF, RDFS, and OWL.
         2a3: Demonstrate knowledge of querying techniques for data models, such as SPARQL and XPath.
         2a4: Explain the concepts of identity management and Linked Data.
         2a5: Demonstrate knowledge of the integration and fusion of heterogeneous data.
       B: Intellectual Skills
         2b1: Represent data using tree and graph data models (XML and RDF).
         2b2: Describe data semantics using RDFS and OWL.
         2b3: Manage and query data represented in RDF, XML, and OWL.
         2b4: Integrate and fuse heterogeneous data.
       C: Professional and Practical Skills
         2c1: Use Oracle Semantic Technology and/or Virtuoso to store and query RDF data.
       D: General and Transferable Skills
         2d1: Working in a team.
         2d2: Presenting and defending ideas.
         2d3: Using creativity and innovation in problem solving.
         2d4: Developing communication skills and logical reasoning abilities.
  5. Module ILOs
     After completing this module, students will be able to:
       - Understand the importance of data integration.
       - Understand the problems and challenges of data integration.
  6. Example from the Government Domain
     • Consider all the interactions with government agencies that are needed to register a new business in Palestine.
     • Example: establishing a new radio station.
     [Figure: the agencies involved: Ministry of Information, Ministry of National Economy, Ministry of Finance, Chamber of Commerce, Ministry of Telecom]
  7. Example from the Government Domain
     • Consider what happens when the business evolves or changes.
     • Example: changing the address of the radio station.
       – The address must be changed in 5 different databases.
     [Figure: the same five agencies: Ministry of Information, Ministry of National Economy, Ministry of Finance, Chamber of Commerce, Ministry of Telecom]
  8. Example from the Government Domain
     • Consider the data registered about the same radio station in the databases of different ministries and government agencies:
       Agency 1:  ID     | Name                      | Type               | Location
                  R2563I | Radio Al-Amal             | Radio Station      | Ramallah
       Agency 2:  B_ID   | Business Name             | Activity Type      | City
                  LM1847 | Al-Amal Broadcast         | Radio Broadcasting | Ramallah and Bireh
       Agency 3:  ID     | Company Name              | Company Type       | Location
                  182NS3 | Broadcast Al-Amal Station | Broadcasting       | Al-Balu’
       ...
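The three agency records above store the same facts under different column names. Here is a minimal sketch of the first integration step, renaming each agency's columns onto one common schema. The common names (`id`, `name`, `type`, `location`), the `AGENCY_MAPPINGS` table, and `to_common_schema` are hypothetical, chosen only for illustration.

```python
# Hypothetical per-agency mappings from local column names to a common schema.
# The local names come from the slide; the common names are assumptions.
AGENCY_MAPPINGS = {
    "agency1": {"ID": "id", "Name": "name", "Type": "type", "Location": "location"},
    "agency2": {"B_ID": "id", "Business Name": "name",
                "Activity Type": "type", "City": "location"},
    "agency3": {"ID": "id", "Company Name": "name",
                "Company Type": "type", "Location": "location"},
}

def to_common_schema(agency, record):
    """Rename one agency record's columns to the common schema."""
    mapping = AGENCY_MAPPINGS[agency]
    return {mapping[column]: value for column, value in record.items()}

records = [
    ("agency1", {"ID": "R2563I", "Name": "Radio Al-Amal",
                 "Type": "Radio Station", "Location": "Ramallah"}),
    ("agency2", {"B_ID": "LM1847", "Business Name": "Al-Amal Broadcast",
                 "Activity Type": "Radio Broadcasting", "City": "Ramallah and Bireh"}),
    ("agency3", {"ID": "182NS3", "Company Name": "Broadcast Al-Amal Station",
                 "Company Type": "Broadcasting", "Location": "Al-Balu’"}),
]

# Keep track of where each unified row came from.
unified = [dict(to_common_schema(agency, rec), source=agency) for agency, rec in records]
for row in unified:
    print(row)
```

This mapping only resolves names. Deciding that "Activity Type" and "Company Type" can share one `type` column is a semantic judgment that whoever writes the mapping must make.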
  9. Example from the Government Domain
     • From this simple example, we can point out some challenges in data integration:
       – No agreed-upon naming (Name, Business Name, Company Name).
       – No agreed-upon meaning (does 'Activity Type' mean exactly the same as 'Company Type'?).
       – Different registered data (Radio Al-Amal, Al-Amal Broadcast, ...).
       Agency 1:  ID     | Name                      | Type               | City
                  R2563I | Radio Al-Amal             | Radio Station      | Ramallah
       Agency 2:  B_ID   | Business Name             | Activity Type      | Province
                  LM1847 | Al-Amal Broadcast         | Radio Broadcasting | Ramallah and Bireh
       Agency 3:  ID     | Company Name              | Company Type       | Location
                  182NS3 | Broadcast Al-Amal Station | Broadcasting       | Al-Balu’
       ...
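The third challenge, "different registered data", means that even after the columns are aligned, the values themselves differ. One simple heuristic, sketched here under stated assumptions, drops words that are assumed to be generic (`GENERIC_WORDS` is a hypothetical list) and then compares the remaining name tokens with Jaccard similarity. This is only one illustrative matching technique.

```python
import re

# Hypothetical list of words assumed to carry no identifying information.
GENERIC_WORDS = {"radio", "broadcast", "broadcasting", "station"}

def name_tokens(name):
    """Lower-case the name, split on whitespace, and drop generic words."""
    tokens = re.split(r"\s+", name.lower().strip())
    return {t for t in tokens if t and t not in GENERIC_WORDS}

def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

names = ["Radio Al-Amal", "Al-Amal Broadcast", "Broadcast Al-Amal Station"]
base = name_tokens(names[0])
for other in names[1:]:
    # Each variant reduces to the single token "al-amal", so the similarity is 1.0.
    print(other, jaccard(base, name_tokens(other)))
```

A heuristic this crude would also match an unrelated business that happens to be called "Al-Amal". A realistic matcher would therefore combine several attributes, such as the city.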
  10. The Problem Exists in All Domains
  11. The Problem Exists in All Domains
      • The problem is even more challenging on the Web.
      • The Data Web envisions the Web as a global, world-wide database.
      • This means that one can query multiple distributed databases on the Web as if querying a local database.
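One common way to make several databases look like a single one is a mediator with per-source wrappers. Each wrapper translates its source into a shared (global) schema, and the mediator evaluates one query over all of them. The sketch below is a minimal in-memory version. The sources, the second station "Example FM", and the names `wrap_a`, `wrap_b`, and `query` are hypothetical stand-ins for real remote databases.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]

# Hypothetical "remote" sources, each with its own column names.
SOURCE_A = [{"Name": "Radio Al-Amal", "Location": "Ramallah"}]
SOURCE_B = [{"Business Name": "Al-Amal Broadcast", "City": "Ramallah and Bireh"},
            {"Business Name": "Example FM", "City": "Nablus"}]

def wrap_a() -> Iterable[Record]:
    """Wrapper: translate source A's rows into the global schema."""
    for row in SOURCE_A:
        yield {"name": row["Name"], "city": row["Location"], "source": "A"}

def wrap_b() -> Iterable[Record]:
    """Wrapper: translate source B's rows into the global schema."""
    for row in SOURCE_B:
        yield {"name": row["Business Name"], "city": row["City"], "source": "B"}

def query(wrappers: List[Callable[[], Iterable[Record]]],
          predicate: Callable[[Record], bool]) -> List[Record]:
    """Evaluate one query over all sources as if they were one database."""
    return [r for wrap in wrappers for r in wrap() if predicate(r)]

hits = query([wrap_a, wrap_b], lambda r: "Ramallah" in r["city"])
for hit in hits:
    print(hit)
```

The user writes one query against the global schema. All source-specific knowledge lives in the wrappers, which is where the heterogeneities listed on the next slides have to be resolved.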
  12. Challenges of Data Integration: Heterogeneities in Database Schemas
      • One can distinguish several kinds of heterogeneity between different schemas:
        – Name heterogeneities (differences in the vocabulary used).
        – Meaning heterogeneities (different meanings for the same attribute in two schemas).
        – Heterogeneities in structure and type.
        – Heterogeneities in rules and constraints.
        – Data model heterogeneities.
  13. Name and Meaning Heterogeneities
      • Synonyms: different names for the same concept
        – employee, clerk
        – exam, course
        – code, num
      • Homonyms: the same name for different concepts (different meanings)
        – City as city of birth in one schema
        – City as city of residence in another schema
      [Figure examples: Homonyms: "Salary" means net salary in one schema and gross salary in another. Synonyms: "Section" and "Division" both denote a specialized division of a large organization.]
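Synonyms and homonyms call for opposite fixes. Synonyms are merged into one canonical name. Homonyms are split by qualifying the name with the schema it comes from. This is a minimal sketch assuming hand-written lookup tables. `SYNONYMS`, `HOMONYMS`, the schema labels `schema1`/`schema2`, and the canonical names are all hypothetical.

```python
# Hypothetical synonym table: different attribute names for the same concept.
SYNONYMS = {
    "employee": "employee", "clerk": "employee",
    "code": "code", "num": "code",
    "section": "division", "division": "division",
}

# Hypothetical homonym table: the same name means different concepts
# depending on the schema, so it is qualified per (schema, attribute).
HOMONYMS = {
    ("schema1", "city"): "city_of_birth",
    ("schema2", "city"): "city_of_residence",
    ("schema1", "salary"): "net_salary",
    ("schema2", "salary"): "gross_salary",
}

def canonical(schema, attribute):
    """Return the canonical concept name for an attribute of a given schema."""
    key = attribute.lower()
    if (schema, key) in HOMONYMS:          # homonyms: disambiguate by schema
        return HOMONYMS[(schema, key)]
    return SYNONYMS.get(key, key)          # synonyms: merge; otherwise keep as-is

print(canonical("schema1", "Clerk"))   # employee
print(canonical("schema1", "City"))    # city_of_birth
print(canonical("schema2", "City"))    # city_of_residence
```

Building these tables is the hard part. It normally needs a domain expert, because the names alone do not reveal whether "Salary" is net or gross.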
  14. Heterogeneities in Structure and Type (Source: Carlo Batini)
      • The same concept is represented with different conceptual structures in two schemas:
        – An attribute in one schema and a derived value in another.
        – An attribute in one schema and an entity in another.
        – An entity in one schema and a relationship in another.
        – Different abstraction levels for the same concept in two schemas, e.g., two entities with homonymous names related by an IS-A hierarchy.
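To illustrate "an attribute in one schema and a derived value in another": suppose one schema stores a department's employee count as an attribute, and another stores only EMPLOYEE rows, so the count has to be derived. The sketch below derives the value and flags disagreements. The data, field names, and helper functions are hypothetical.

```python
from collections import Counter

# Schema 1 (hypothetical): DEPARTMENT stores the employee count as an attribute.
departments_s1 = {"Sales": {"num_employees": 2}, "IT": {"num_employees": 1}}

# Schema 2 (hypothetical): no stored count; it is derivable from EMPLOYEE rows.
employees_s2 = [
    {"name": "Ali", "department": "Sales"},
    {"name": "Sara", "department": "Sales"},
    {"name": "Omar", "department": "IT"},
]

def derived_counts(employees):
    """Derive the per-department count that schema 1 stores explicitly."""
    return dict(Counter(e["department"] for e in employees))

def count_conflicts(stored, derived):
    """Departments whose stored count disagrees with the derived count."""
    return {dept: (attrs["num_employees"], derived.get(dept, 0))
            for dept, attrs in stored.items()
            if attrs["num_employees"] != derived.get(dept, 0)}

print(count_conflicts(departments_s1, derived_counts(employees_s2)))  # {}
```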
  15. Heterogeneities in Structure (Source: Carlo Batini)
      • Examples:
      [Figure: paired schema fragments labeled EMPLOYEE, DEPARTMENT, PROJECT; Person, GENDER, MAN, WOMAN; BOOK, PUBLISHER]
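One plausible reading of the Person/GENDER/MAN/WOMAN labels in the figure is this: one schema gives Person a GENDER attribute, while the other splits Person into MAN and WOMAN entity sets. That interpretation is an assumption. The sketch converts between the two structures with hypothetical data and helper names.

```python
# Schema 1 (assumed): Person has a GENDER attribute.
persons_s1 = [{"name": "Ali", "gender": "M"}, {"name": "Sara", "gender": "F"}]

# Schema 2 (assumed): no GENDER attribute; Person is split into MAN and WOMAN.
men_s2 = [{"name": "Omar"}]
women_s2 = [{"name": "Lina"}]

def from_subtypes(men, women):
    """Turn the MAN/WOMAN entity sets into Person rows with a gender attribute."""
    return ([dict(p, gender="M") for p in men] +
            [dict(p, gender="F") for p in women])

def to_subtypes(persons):
    """Inverse direction: partition Person rows by gender.
    Rows with any other gender code are placed in neither subtype."""
    men = [{"name": p["name"]} for p in persons if p["gender"] == "M"]
    women = [{"name": p["name"]} for p in persons if p["gender"] == "F"]
    return men, women

# Integrate both sources under the attribute-based structure.
all_persons = persons_s1 + from_subtypes(men_s2, women_s2)
print(all_persons)
```

The direction chosen for the integrated schema is a design decision. The attribute form is more compact. The subtype form is useful when men and women carry different attributes or relationships.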
  16. 16. Heterogeneities in Type Examples:  In a single attribute (e.g., Numberic, Alphanumeric). E.g., the attribute “gender”: – Male/Female – M/F – 0/1  Year has a four digit domain in one schema and two digit domain in another schema  Different currencies (Euros, US Dollars, etc.)  Different measure systems (kilos vs. pounds, centigrade vs. Fahrenheit.)  Different granularities (grams, kilos, etc.) PalGov © 2011 16
  17. Heterogeneities in Rules and Constraints (Source: Carlo Batini)
      • Examples:
        – Different cardinalities for the same relationship
        – Key conflicts
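A cardinality conflict becomes visible when data from a more permissive source is merged into a schema with a stricter rule. In this sketch, schema 1 allows one department per employee and schema 2 allows several. Merging both under the one-department rule exposes the violations. The data and the `cardinality_violations` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sources: schema 1 allows one department per employee,
# schema 2 allows several (a many-to-many relationship).
works_in_s1 = [("E1", "Sales"), ("E2", "IT")]
works_in_s2 = [("E1", "Sales"), ("E1", "Marketing"), ("E3", "IT")]

def cardinality_violations(pairs, max_per_employee=1):
    """Employees linked to more departments than the target schema allows."""
    departments = defaultdict(set)
    for employee, department in pairs:
        departments[employee].add(department)
    return {e: sorted(d) for e, d in departments.items() if len(d) > max_per_employee}

print(cardinality_violations(works_in_s1 + works_in_s2))
# {'E1': ['Marketing', 'Sales']}
```

This check assumes that both sources identify employees with the same keys. With a key conflict, for example one source keyed by national ID and the other by an internal number, a mapping between the two keys would be needed before any such check is meaningful.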
  18. Model Heterogeneities
      • Model heterogeneities occur when different databases adhere to different data models:
        – Relational, XML, RDF, object-oriented, OWL, ...
      • Solution: reduce model heterogeneity by using one data model.
      • Example: convert relational data to the RDF graph model.
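Converting relational data to RDF can be as simple as treating each table as a class, each row as a subject IRI built from its key, and each column as a predicate. That is roughly the idea behind W3C's Direct Mapping of relational data to RDF. The sketch below is a simplified version that emits N-Triples lines. It does not follow the standard's exact IRI rules, and the base IRI `http://example.org/` is a placeholder assumption.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # placeholder base IRI (assumption)
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def escape_literal(value):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (str(value).replace("\\", "\\\\")
                      .replace('"', '\\"')
                      .replace("\n", "\\n")
                      .replace("\r", "\\r"))

def row_to_triples(table, key_column, row):
    """Map one relational row to N-Triples: the table becomes a class,
    the row a subject IRI, and each non-key column a predicate."""
    t = quote(table, safe="")
    subject = f"<{BASE}{t}/{quote(str(row[key_column]), safe='')}>"
    triples = [f"{subject} {RDF_TYPE} <{BASE}{t}> ."]
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # key is encoded in the subject; NULLs produce no triple
        predicate = f"<{BASE}{t}#{quote(column, safe='')}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples

row = {"ID": "R2563I", "Name": "Radio Al-Amal",
       "Type": "Radio Station", "Location": "Ramallah"}
for triple in row_to_triples("Station", "ID", row):
    print(triple)
```

Once every source is expressed as RDF triples, the model heterogeneity is gone. The name, meaning, structure, and type heterogeneities from the earlier slides still remain and must be resolved at the vocabulary level.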
  19. References
      • Carlo Batini. Course on Data Integration. BZU IT Summer School, 2011.
      • Stefano Spaccapietra. Information Integration. Presentation at the IFIP Academy, Porto Alegre, 2005.
      • Chris Bizer. The Emerging Web of Linked Data. Presentation at SRI International, Artificial Intelligence Center, Menlo Park, USA, 2009.