Mustafa Jarrar
Lecture Notes, Web Data Management (MCOM7348)
University of Birzeit, Palestine
1st Semester, 2013

Introduc...
Watch this lecture and download the slides from
http://jarrar-courses.blogspot.com/2013/11/web-data-management.html

Jarrar: Introduction to Data Integration

This lecture discusses the need and importance of data integration

  1. Mustafa Jarrar, Lecture Notes, Web Data Management (MCOM7348), University of Birzeit, Palestine, 1st Semester, 2013
     Introduction to Data Integration
     Dr. Mustafa Jarrar, University of Birzeit, mjarrar@birzeit.edu, www.jarrar.info
     Jarrar © 2013
  2. Watch this lecture and download the slides from http://jarrar-courses.blogspot.com/2013/11/web-data-management.html
  3. Example from the Government Domain
     Consider all interactions with government agencies in order to register a new business in Palestine. Example: Establishing a new Radio Station.
     Agencies shown: Ministry of Telecom, Ministry of Information, Ministry of National Economy, Ministry of Finance, Chamber of Commerce.
  4. Example from the Government Domain
     Consider when the business evolves or changes. Example: Changing the address of the radio station.
     –  Address must be changed in 5 different databases.
     Agencies shown: Ministry of Telecom, Ministry of Information, Ministry of National Economy, Ministry of Finance, Chamber of Commerce.
  5. Example from the Government Domain
     Consider the data registered about the same radio station in the databases of different ministries and governmental agencies (labelled Agency 1, Agency 2, and Agency 3 on the slide):
     –  R2563I: Name = Radio Al-Amal; Type = Radio Station; City = Ramallah
     –  LM1847: Business Name = Al-Amal Broadcast; Activity Type = Radio Broadcasting; Province = Ramallah and Bireh
     –  182NS3: Company Name = Broadcast AlAmal; Company Type = Broadcasting Station; Location = Al-Balu’ ...
     The identifier column is named ID in two of the tables and B_ID in the third.
  6. Example from the Government Domain
     From our simple example, one can point out some challenges in data integration:
     –  No agreed-upon naming (Name, Business Name, Company Name).
     –  No agreed-upon meaning (does ’Activity Type’ mean exactly the same as ‘Company Type’?).
     –  Different registered data: Radio Al-Amal, Al-Amal Broadcast, ...
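The naming challenge above can be illustrated with a minimal Python sketch. It assumes a hypothetical shared vocabulary (`name`, `activity`, `place`) and hypothetical source labels (`source_a`, `source_b`, `source_c`); neither appears in the slides. Treating the three "type" columns as equivalent is itself an assumption, which is exactly the meaning question the slide raises.

```python
# Hypothetical mappings from each agency's column names to a shared vocabulary.
# Assumption: the three "type" columns mean the same thing (the slide questions this).
FIELD_MAPS = {
    "source_a": {"Name": "name", "Type": "activity", "City": "place"},
    "source_b": {"Business Name": "name", "Activity Type": "activity", "Province": "place"},
    "source_c": {"Company Name": "name", "Company Type": "activity", "Location": "place"},
}


def to_common(source, record):
    """Rename the fields of one record using the mapping of its source."""
    mapping = FIELD_MAPS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


records = [
    ("source_a", {"Name": "Radio Al-Amal", "Type": "Radio Station", "City": "Ramallah"}),
    ("source_b", {"Business Name": "Al-Amal Broadcast", "Activity Type": "Radio Broadcasting",
                  "Province": "Ramallah and Bireh"}),
    # The location value is truncated on the slide; only the visible part is used.
    ("source_c", {"Company Name": "Broadcast AlAmal", "Company Type": "Broadcasting Station",
                  "Location": "Al-Balu'"}),
]

unified = [to_common(source, record) for source, record in records]

# Renaming resolves the naming differences, but the registered values still differ.
distinct_names = sorted({row["name"] for row in unified})
print(distinct_names)
```

Even after the columns are aligned, the same station still appears under three different names, so renaming alone does not integrate the data.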
  7. The problem is in all domains
  8. The problem is in all domains
     The problem is now even more challenging with the Web. The Data Web envisions the web as a global, world-wide database. This means that one can query multiple distributed databases on the web as if querying a local database.
  9. Challenges of Data Integration: Heterogeneities in Database Schemas
     One can distinguish between several heterogeneities between different schemas:
     –  Name heterogeneities (differences in the vocabulary used).
     –  Meaning heterogeneities (different meanings for the same attribute in two schemas).
     –  Heterogeneities in the structure and type.
     –  Heterogeneities in the rules and constraints.
     –  Data model heterogeneities.
  10. Name and Meaning Heterogeneities
      Synonyms – Different names for the same concepts
      –  employee, clerk
      –  exam, course
      –  code, num
      Homonyms – Same name for different concepts (different meanings)
      –  City as City of Birth in one schema
      –  City as City of Residence in another schema
      Examples from the diagram:
      –  Homonyms: Salary meaning Net Salary in one schema and Gross Salary in the other.
      –  Synonyms: Section and Division, both meaning a specialized division of a large organization.
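One way to handle synonyms and homonyms during integration is sketched below. This is only an illustration: the schema labels `schema_1` and `schema_2`, the canonical names, and the choice of which synonym is kept are all assumptions, not part of the lecture.

```python
# Hypothetical synonym table: each alternative name points to one canonical name.
SYNONYMS = {"clerk": "employee", "num": "code", "section": "division"}

# Hypothetical homonym table: the same attribute name means different things
# depending on the schema, so the key must include the schema.
HOMONYMS = {
    ("schema_1", "city"): "city_of_birth",
    ("schema_2", "city"): "city_of_residence",
}


def canonical(schema, attribute):
    """Return the canonical attribute name for an attribute of a given schema."""
    key = attribute.lower()
    if (schema, key) in HOMONYMS:
        return HOMONYMS[(schema, key)]
    # Names with no entry in either table are kept unchanged.
    return SYNONYMS.get(key, key)


print(canonical("schema_1", "City"))
print(canonical("schema_2", "City"))
print(canonical("schema_2", "clerk"))
```

Synonyms can be resolved by name alone, while homonyms cannot: the lookup needs to know which schema the attribute comes from.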
  11. Heterogeneities in Structure and Type
      Source: Carlo Batini
      The same concepts are represented with different conceptual structures in two schemas:
      –  Attribute in one schema and derived value in another schema.
      –  Attribute in one schema and entity in another schema.
      –  Entity in one schema and relationship in another schema.
      –  Different abstraction levels for the same concept in two schemas, e.g. two entities with homonym names related by an IS-A hierarchy in two schemas.
  12. Heterogeneities in Structure
      Source: Carlo Batini
      EXAMPLES (diagram; entity labels only): EMPLOYEE, DEPARTMENT, PROJECT; Person, MAN, WOMAN, GENDER; BOOK, PUBLISHER.
  13. Heterogeneities in Type
      Examples:
      §  In a single attribute (e.g., numeric vs. alphanumeric). E.g., the attribute “gender”:
         –  Male/Female
         –  M/F
         –  0/1
      §  Year has a four-digit domain in one schema and a two-digit domain in another schema.
      §  Different currencies (Euros, US Dollars, etc.).
      §  Different measurement systems (kilos vs. pounds, centigrade vs. Fahrenheit).
      §  Different granularities (grams, kilos, etc.).
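The type heterogeneities listed above are usually resolved by normalizing every source to one target representation. The following Python sketch shows the idea under stated assumptions: the target codes `M`/`F`, the meaning of the numeric codes, and the pivot used to expand two-digit years are all hypothetical and would have to come from each source's documentation.

```python
# Target representation for gender (an assumption): "M" or "F".
GENDER_CODES = {"male": "M", "m": "M", "female": "F", "f": "F"}


def normalize_gender(value, numeric_codes=None):
    """Map Male/Female, M/F, or source-specific numeric codes to "M"/"F".

    The slide does not say which of 0/1 stands for which gender, so the
    numeric mapping must be supplied per source, e.g. {"0": "M", "1": "F"}.
    """
    text = str(value).strip().lower()
    if text in GENDER_CODES:
        return GENDER_CODES[text]
    if numeric_codes and text in numeric_codes:
        return numeric_codes[text]
    raise ValueError(f"unknown gender encoding: {value!r}")


def expand_year(year, pivot=50):
    """Expand a two-digit year to four digits; the pivot is an assumption."""
    if year >= 100:
        return year  # already four digits
    return 1900 + year if year >= pivot else 2000 + year


def pounds_to_kilos(pounds):
    return pounds * 0.45359237  # exact definition of the international pound


def fahrenheit_to_celsius(degrees_f):
    return (degrees_f - 32) * 5 / 9


def grams_to_kilos(grams):
    return grams / 1000


print(normalize_gender("Male"), normalize_gender(1, {"0": "M", "1": "F"}))
print(expand_year(13), expand_year(85))
```

Currency conversion is left out on purpose, since it needs an exchange rate and a date that no schema alone can provide.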
  14. Heterogeneities in the rules and constraints
      Source: Carlo Batini
      EXAMPLES:
      –  Different cardinalities in the same relationships
      –  Key conflicts
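Both kinds of conflict can be detected by checking the data of one source against the constraints of the other. The sketch below is illustrative only: the employee/department relationship, the sample rows, and the function names are hypothetical.

```python
from collections import defaultdict


def cardinality_violations(pairs, max_per_left=1):
    """Find left-hand entities related to more right-hand entities than allowed.

    Example: one schema lets an employee work in many departments, while the
    other allows only one department per employee.
    """
    related = defaultdict(set)
    for left, right in pairs:
        related[left].add(right)
    return {left: sorted(rights) for left, rights in related.items()
            if len(rights) > max_per_left}


def is_unique_key(rows, key):
    """Check whether an attribute used as a key in one schema is unique in another source."""
    values = [row[key] for row in rows]
    return len(values) == len(set(values))


# Hypothetical data from a source that allows many departments per employee.
works_in = [("e1", "sales"), ("e1", "support"), ("e2", "sales")]
print(cardinality_violations(works_in))

# Hypothetical data from a source keyed on "code"; another schema keys on "name".
departments = [{"code": 1, "name": "Sales"}, {"code": 2, "name": "Sales"}]
print(is_unique_key(departments, "code"), is_unique_key(departments, "name"))
```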
  15. Model Heterogeneities
      Model heterogeneities occur when different databases adhere to different data models:
      –  Relational Data Model, XML, RDF, Object-Oriented, OWL, ...
      Solution: Reduce model heterogeneity by using one data model.
      Example: Convert the relational model to the RDF graph model.
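A minimal sketch of converting a relational row to RDF triples is given below. It is not the method used in the lecture; it is a simplified illustration in the spirit of a direct mapping, using only the standard library. The base IRI `http://example.org/`, the table name `station`, and the IRI patterns are assumptions, and column names containing spaces would need percent-encoding, which is omitted here.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def row_to_triples(table, key_column, row, base="http://example.org/"):
    """Turn one relational row into (subject, predicate, object) triples.

    Assumed IRI scheme: the row becomes <base/table/key>, each column becomes
    <base/table#column>, and each non-key value becomes a plain string literal.
    """
    subject = f"<{base}{table}/{row[key_column]}>"
    triples = [(subject, RDF_TYPE, f"<{base}{table}>")]
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is encoded in the subject; NULLs produce no triple
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        triples.append((subject, f"<{base}{table}#{column}>", f'"{escaped}"'))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# One record from the government example, in a hypothetical table "station".
row = {"ID": "R2563I", "Name": "Radio Al-Amal", "City": "Ramallah"}
triples = row_to_triples("station", "ID", row)
print(to_ntriples(triples))
```

Once every source is expressed as triples, the remaining name, meaning, and type heterogeneities still have to be resolved, but they can be handled within a single data model.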
  16. References and Acknowledgement
      •  Carlo Batini: Course on Data Integration. BZU IT Summer School 2011.
      •  Stefano Spaccapietra: Information Integration. Presentation at the IFIP Academy. Porto Alegre. 2005.
      •  Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI International, Artificial Intelligence Center. Menlo Park, USA. 2009.
      Thanks to Anton Deik for helping me prepare this lecture.
