Jarrar © 2013 1
Dr. Mustafa Jarrar
University of Birzeit
mjarrar@birzeit.edu
www.jarrar.info
Lecture Notes, Web Data Manag...
Jarrar: Introduction to Data Integration

Lecture Notes by Mustafa Jarrar at Birzeit University, Palestine.

See the course webpage at: http://jarrar-courses.blogspot.com/2014/01/introduction-to-data-integration.html

and http://www.jarrar.info
You may also watch this lecture at: http://www.youtube.com/watch?v=TEgHq2J1OMo

The lecture covers:
- Example from the government domain
- Problem is in all domains
- Challenges of data integration
- Name and meaning heterogeneities
- Heterogeneities in structure and type
- Heterogeneities in structure
- Heterogeneities in type
- Heterogeneities in the rules and constraints

  1. Introduction to Data Integration
     Dr. Mustafa Jarrar, University of Birzeit
     mjarrar@birzeit.edu, www.jarrar.info
     Lecture Notes, Web Data Management, Birzeit University, Palestine, 2013
  2. Watch this lecture and download the slides from http://jarrar-courses.blogspot.com/2014/01/web-data-management.html
  3. Outline
     - Example from the government domain
     - Problem is in all domains
     - Challenges of data integration:
       - Heterogeneities in database schemas
       - Name and meaning heterogeneities
       - Heterogeneities in structure and type
       - Heterogeneities in the rules and constraints
       - Model heterogeneities
     Keywords: Data Integration, Registered data, domain, domain name system, web, distributed database, database schema, Heterogeneities, Model Heterogeneities, Data model, Synonyms, Homonyms, Attribute, Entity
  4. Example from the Government Domain
     Consider all the interactions with government agencies that are needed to register a new business in Palestine.
     Example: establishing a new radio station.
     Agencies involved: Ministry of Telecom, Ministry of Information, Ministry of National Economy, Chamber of Commerce, Ministry of Finance.
  5. Example from the Government Domain
     Consider what happens when the business evolves or changes.
     Example: changing the address of the radio station.
     - The address must be changed in 5 different databases: Ministry of Telecom, Ministry of Information, Ministry of National Economy, Chamber of Commerce, Ministry of Finance.
  6. Example from the Government Domain
     Consider the data registered about the same radio station in the databases of different ministries and governmental agencies:

     Agency 1
       ID      Name           Type           City
       R2563I  Radio Al-Amal  Radio Station  Ramallah

     Agency 2
       B_ID    Business Name      Activity Type       Province
       LM1847  Al-Amal Broadcast  Radio Broadcasting  Ramallah and Bireh

     Agency 3
       ID      Company Name       Company Type          Location
       182NS3  Broadcast Al-Amal  Broadcasting Station  Al-Balu’

     ...
  7. Example from the Government Domain
     This simple example points to some challenges in data integration:
     - No agreed-upon naming (Name, Business Name, Company Name).
     - No agreed-upon meaning (does 'Activity Type' mean exactly the same as 'Company Type'?).
     - Different registered data: Radio Al-Amal, Al-Amal Broadcast, ...
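The naming problem in the example can be made concrete with a small sketch. It maps the three agency records onto one shared vocabulary. The shared field names (`id`, `name`, `type`, `place`) and the agency keys are hypothetical, and treating 'Activity Type' and 'Company Type' as the same field is an assumption that the lecture explicitly questions.

```python
# Hypothetical mapping from each agency's column names to one shared vocabulary.
# Mapping "Activity Type" and "Company Type" to the same field is an assumption.
FIELD_MAP = {
    "agency1": {"ID": "id", "Name": "name", "Type": "type", "City": "place"},
    "agency2": {"B_ID": "id", "Business Name": "name",
                "Activity Type": "type", "Province": "place"},
    "agency3": {"ID": "id", "Company Name": "name",
                "Company Type": "type", "Location": "place"},
}

# The three records from the example, keyed by their original column names.
RECORDS = [
    ("agency1", {"ID": "R2563I", "Name": "Radio Al-Amal",
                 "Type": "Radio Station", "City": "Ramallah"}),
    ("agency2", {"B_ID": "LM1847", "Business Name": "Al-Amal Broadcast",
                 "Activity Type": "Radio Broadcasting",
                 "Province": "Ramallah and Bireh"}),
    ("agency3", {"ID": "182NS3", "Company Name": "Broadcast Al-Amal",
                 "Company Type": "Broadcasting Station",
                 "Location": "Al-Balu'"}),
]


def to_shared(agency, record):
    """Rename one agency record's columns to the shared vocabulary."""
    mapping = FIELD_MAP[agency]
    unified = {mapping[column]: value for column, value in record.items()}
    unified["source"] = agency  # remember where the record came from
    return unified


unified = [to_shared(agency, record) for agency, record in RECORDS]

# Renaming columns fixes the naming conflict only: the registered values
# still differ, so the same station appears under three different names.
distinct_names = {record["name"] for record in unified}
```

Even after the columns are aligned, `distinct_names` still holds three different strings for one station, which is the "different registered data" challenge.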
  8. Problem is in all domains
  9. Problem is in all domains
     The problem is now even more challenging with the Web.
     The Data Web envisions the web as a global, world-wide database. This means that one can query multiple distributed databases on the web as if one were querying a local database.
  10. Challenges of Data Integration: Heterogeneities in Database Schemas
      One can distinguish several kinds of heterogeneity between different schemas:
      - Name heterogeneities (differences in the vocabulary used).
      - Meaning heterogeneities (different meanings for the same attribute in two schemas).
      - Heterogeneities in structure and type.
      - Heterogeneities in rules and constraints.
      - Data model heterogeneities.
  11. Name and Meaning Heterogeneities
      Synonyms: different names for the same concept
      - employee, clerk
      - exam, course
      - code, num
      - Section, Division (both: a specialized division of a large organization)
      Homonyms: the same name for different concepts (different meanings)
      - City as city of birth in one schema, City as city of residence in another schema
      - Salary as net salary in one schema, Salary as gross salary in another schema
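A minimal sketch of how the two cases differ in practice: synonyms can be resolved with a simple name-to-name table, while homonyms also need to know which schema the name came from. The schema labels `schema_a` and `schema_b`, the canonical names, and which schema holds which meaning are all hypothetical.

```python
# Synonyms: several names, one concept. Map each variant to a canonical name.
# Which name is chosen as canonical is an arbitrary assumption.
SYNONYMS = {"clerk": "employee", "num": "code", "division": "section"}

# Homonyms: one name, several concepts. The schema is needed to disambiguate.
# The assignment of meanings to schema_a / schema_b is hypothetical.
HOMONYMS = {
    ("schema_a", "city"): "city_of_birth",
    ("schema_b", "city"): "city_of_residence",
    ("schema_a", "salary"): "net_salary",
    ("schema_b", "salary"): "gross_salary",
}


def canonical(name):
    """Resolve synonyms: return the canonical name for an attribute name."""
    key = name.strip().lower()
    return SYNONYMS.get(key, key)


def resolve(schema, name):
    """Resolve synonyms first, then homonyms using the source schema."""
    key = canonical(name)
    return HOMONYMS.get((schema, key), key)
```

For instance, `resolve("schema_a", "City")` and `resolve("schema_b", "City")` return different concepts, while `resolve("schema_a", "Clerk")` and `resolve("schema_b", "employee")` return the same one.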
  12. Heterogeneities in Structure and Type
      The same concepts are represented with different conceptual structures in two schemas:
      - Attribute in one schema and derived value in another schema.
      - Attribute in one schema and entity in another schema.
      - Entity in one schema and relationship in another schema.
      - Different abstraction levels for the same concept in two schemas, e.g. two entities with homonym names related by an IS-A hierarchy in two schemas.
      Source: Carlo Batini
  13. Heterogeneities in Structure
      Examples:
      [Figure: pairs of schema diagrams. Labels: PUBLISHER, BOOK / BOOK, PUBLISHER; EMPLOYEE, DEPARTMENT, PROJECT / EMPLOYEE, PROJECT; Person, MAN, WOMAN / Person, GENDER.]
      Source: Carlo Batini
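One of the structural conflicts, an attribute in one schema versus an entity in another, can be sketched as two record types plus a conversion between them. The BOOK and PUBLISHER names come from the figure labels, but how the figure relates them is an assumption here, as are the class names and sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookA:
    """Hypothetical schema A: the publisher is a plain attribute of the book."""
    title: str
    publisher: str


@dataclass(frozen=True)
class Publisher:
    """Hypothetical schema B: the publisher is an entity of its own."""
    name: str


@dataclass(frozen=True)
class BookB:
    """Hypothetical schema B: the book refers to a Publisher entity."""
    title: str
    publisher: Publisher


def a_to_b(book: BookA) -> BookB:
    """Promote the publisher attribute to an entity."""
    return BookB(title=book.title, publisher=Publisher(name=book.publisher))


def b_to_a(book: BookB) -> BookA:
    """Flatten the publisher entity back to an attribute.

    Any further Publisher fields would be lost in this direction.
    """
    return BookA(title=book.title, publisher=book.publisher.name)


# Made-up sample record, for illustration only.
sample = BookA(title="Sample Title", publisher="Example Press")
round_trip = b_to_a(a_to_b(sample))
```

The round trip is lossless only because `Publisher` has a single field; once the entity carries more data than the attribute did, the two schemas can no longer be converted without loss.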
  14. Heterogeneities in Type
      Examples:
      - Different types for a single attribute (e.g., numeric, alphanumeric). E.g., the attribute "gender":
        - Male/Female
        - M/F
        - 0/1
      - Year has a four-digit domain in one schema and a two-digit domain in another schema.
      - Different currencies (euros, US dollars, etc.).
      - Different measurement systems (kilos vs. pounds, centigrade vs. Fahrenheit).
      - Different granularities (grams, kilos, etc.).
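Type conflicts like these are usually handled by normalizing every source to one agreed representation. The sketch below shows one possible set of normalizers. The target encodings (`"M"`/`"F"`, four-digit years, kilos, Celsius) and the pivot year are assumptions, and the meaning of a 0/1 code has to be supplied by whoever knows the source schema.

```python
# Textual gender encodings mapped to one target encoding (an assumed choice).
GENDER_CODES = {"male": "M", "m": "M", "female": "F", "f": "F"}

KILOS_PER_POUND = 0.45359237  # exact by definition of the avoirdupois pound


def normalize_gender(value, numeric_codes=None):
    """Map Male/Female, M/F, or a numeric code to "M"/"F".

    numeric_codes must come from the source schema's documentation,
    e.g. {"0": "M", "1": "F"}; a bare 0/1 cannot be interpreted safely.
    """
    text = str(value).strip().lower()
    if text in GENDER_CODES:
        return GENDER_CODES[text]
    if numeric_codes and text in numeric_codes:
        return numeric_codes[text]
    raise ValueError(f"unknown gender encoding: {value!r}")


def expand_year(two_digit_year, pivot=30):
    """Expand a two-digit year; values below the (assumed) pivot map to 20xx."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("expected a value between 0 and 99")
    century = 2000 if two_digit_year < pivot else 1900
    return century + two_digit_year


def pounds_to_kilos(pounds):
    """Convert a weight in pounds to kilos."""
    return pounds * KILOS_PER_POUND


def grams_to_kilos(grams):
    """Convert a finer granularity (grams) to the agreed one (kilos)."""
    return grams / 1000


def fahrenheit_to_celsius(degrees_f):
    """Convert a temperature from Fahrenheit to Celsius (centigrade)."""
    return (degrees_f - 32) * 5 / 9
```

Currency conversion is left out on purpose: unlike units of measure, exchange rates change over time, so the conversion needs a rate and a date from outside the schema.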
  15. Heterogeneities in the Rules and Constraints
      Examples:
      - Different cardinalities in the same relationships
      - Key conflicts
      Source: Carlo Batini
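A cardinality conflict can be illustrated with a small check, assuming a hypothetical employee-to-department relationship in which one schema allows at most one department per employee and another allows several. The identifiers and data are made up for illustration.

```python
from collections import Counter


def max_per_left(pairs):
    """Largest number of distinct right-hand partners of any left-hand item."""
    counts = Counter(left for left, _ in set(pairs))
    return max(counts.values(), default=0)


def violates(pairs, max_allowed):
    """True if some left-hand item has more partners than the schema allows."""
    return max_per_left(pairs) > max_allowed


# Hypothetical (employee, department) pairs.
# Schema A: each employee belongs to at most one department.
schema_a_pairs = [("e1", "d1"), ("e2", "d1")]
# Schema B: an employee may belong to several departments.
schema_b_pairs = [("e1", "d1"), ("e1", "d2")]

# Data that is valid in each source can break the stricter constraint
# once the two sources are merged.
merged = schema_a_pairs + schema_b_pairs
```

Here `violates(merged, 1)` is true although schema A's own data satisfies the constraint, so the integrated schema must either adopt the looser cardinality or reject part of schema B's data.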
  16. Model Heterogeneities
      Model heterogeneity occurs when different databases adhere to different data models:
      - Relational data model, XML, RDF, object-oriented, OWL, ...
      Solution: reduce model heterogeneity by using one data model.
      Example: convert the relational model to the RDF graph model.
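The relational-to-RDF conversion can be sketched in plain Python as turning each row into subject, predicate, object triples. This is a simplified illustration, not a standard mapping: the base URI `http://example.org/`, the table name `station`, and the URI patterns are hypothetical, and every value is emitted as a plain string literal.

```python
BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def row_to_triples(table, key_column, row):
    """Turn one relational row into (subject, predicate, object) triples.

    The subject URI is built from the table name and the key value;
    each remaining column becomes a predicate with a literal object.
    """
    subject = f"{BASE}{table}/{row[key_column]}"
    triples = []
    for column, value in row.items():
        if column == key_column:
            continue  # the key is already encoded in the subject URI
        predicate = f"{BASE}{table}#{column}"
        triples.append((subject, predicate, str(value)))
    return triples


def to_ntriples(triples):
    """Serialize triples in an N-Triples-like form with string literals."""
    lines = []
    for subject, predicate, obj in triples:
        literal = obj.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{literal}" .')
    return "\n".join(lines)


# The Agency 1 record from the government example.
row = {"ID": "R2563I", "Name": "Radio Al-Amal",
       "Type": "Radio Station", "City": "Ramallah"}
triples = row_to_triples("station", "ID", row)
serialized = to_ntriples(triples)
```

Column names containing spaces, such as "Business Name", would need to be encoded before being used in a URI. The sketch sidesteps this by using the Agency 1 row, whose column names have none.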
  17. References and Acknowledgement
      - Carlo Batini: Course on Data Integration. BZU IT Summer School, 2011.
      - Stefano Spaccapietra: Information Integration. Presentation at the IFIP Academy, Porto Alegre, 2005.
      - Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI International, Artificial Intelligence Center, Menlo Park, USA, 2009.
      Thanks to Anton Deik for helping me prepare this lecture.