Jarrar: Architectural solutions in Data Integration

Lecture Notes by Mustafa Jarrar at Birzeit University, Palestine.
See the course webpage at: http://jarrar-courses.blogspot.com/2014/01/architectural-solutions-in-data.html and http://www.jarrar.info
You may also watch this lecture at: http://www.youtube.com/watch?v=1BMmpV4yU10

The lecture covers the different solutions in data integration.

  • 1. Mustafa Jarrar: Lecture Notes, Web Data Management (MCOM7348), University of Birzeit, Palestine, 1st Semester, 2013. Architectural Solutions in Data Integration. Dr. Mustafa Jarrar, University of Birzeit, mjarrar@birzeit.edu, www.jarrar.info. Jarrar © 2013
  • 2. Watch this lecture and download the slides from http://jarrar-courses.blogspot.com/2013/11/web-data-management.html
  • 3. Different Solutions. Two families of solutions for the integration issue:
      – Application-driven Integration
          • Various types of middleware (e.g. Web Services, Remote Procedure Call (RPC), Publish & Subscribe) that achieve reconciliation through application-to-middleware communication
      – Data-driven Integration
          • Various types of data reconciliation and integration
              – Consolidation
              – Data Warehouse
              – Data Integration
  • 4. Architectures of application-driven Integration: Service Oriented Architecture. [Diagram: pairs of adapter servers (AS) and security servers (SS) connected to an enterprise service bus that carries data messages MSG-1 … MSG-N. Legend: SS = Security Server, AS = Adapter Server, MSG = Data Message.]
  • 5. Architectures of application-driven Integration: Publish-Subscribe Architecture. Source: Carlo Batini. [Diagram: Application 1, Application 2, …, Application n, with Source 1, Source 2, …, Source n, connected to a middleware; an update of an object O is published and delivered to subscribers, with the steps numbered 1 to 7.] Typical application-driven integration architecture for integration of updates.
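The publish-subscribe flow on slide 5 can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not the architecture from the lecture: the class names `Middleware` and `Application`, their methods, and the `"client"` object type used in the usage note are all hypothetical, and a real deployment would use a message broker rather than direct function calls.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A handler receives the object's id and its new state.
Handler = Callable[[str, dict], None]


class Middleware:
    """Hypothetical in-memory broker that routes published updates to subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, object_type: str, handler: Handler) -> None:
        # Register interest in every update of the given object type.
        self._subscribers[object_type].append(handler)

    def publish(self, object_type: str, object_id: str, new_state: dict) -> int:
        # Deliver the update to each subscriber; return how many were notified.
        handlers = self._subscribers[object_type]
        for handler in handlers:
            handler(object_id, new_state)
        return len(handlers)


class Application:
    """Hypothetical application that owns one local source (a dict here)."""

    def __init__(self, name: str, middleware: Middleware) -> None:
        self.name = name
        self.source: Dict[str, dict] = {}
        self._middleware = middleware

    def subscribe_to(self, object_type: str) -> None:
        self._middleware.subscribe(object_type, self._apply_update)

    def _apply_update(self, object_id: str, new_state: dict) -> None:
        # Reconcile the local source with the update received from the middleware.
        self.source[object_id] = dict(new_state)

    def update_object(self, object_type: str, object_id: str, new_state: dict) -> None:
        # Update the local source first, then publish the change.
        self.source[object_id] = dict(new_state)
        self._middleware.publish(object_type, object_id, new_state)
```

For example, if `app2.subscribe_to("client")` has been called and `app1.update_object("client", "c42", {...})` runs afterwards, the same state appears in `app2.source["c42"]` without the two applications ever referring to each other. The applications only know the middleware, which is the point of application-driven integration.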
  • 6. Information Integration Architectures: Consolidation. Source: Carlo Batini. [Diagram: Source 1, Source 2, …, Source n and a unique DB.] New architecture, once and for all.
  • 7. Information Integration Architectures: Data Warehouse. Source: Carlo Batini. [Diagram: Source 1, Source 2, …, Source n connected through data warehouse middleware to a unique DB (new database).] New architecture: periodically updated.
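The "periodically updated" new database of slide 7 can be illustrated with a small sketch. It assumes an in-memory SQLite database as the warehouse and sources that already deliver records with `client` and `amount` fields; the table name `sales`, the functions `create_warehouse` and `refresh`, and the field names are hypothetical and chosen only for illustration.

```python
import sqlite3
from typing import Dict, Iterable


def create_warehouse() -> sqlite3.Connection:
    """Create the new, separate database that the warehouse architecture adds."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (load_date TEXT, source TEXT, client TEXT, amount REAL)"
    )
    return conn


def refresh(conn: sqlite3.Connection, load_date: str,
            sources: Dict[str, Iterable[dict]]) -> int:
    """One periodic load: append a dated snapshot from every source.

    Earlier loads are never deleted, so historical data stays queryable.
    Returns the number of rows loaded.
    """
    rows = [
        (load_date, source_name, record["client"], float(record["amount"]))
        for source_name, records in sources.items()
        for record in records
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Calling `refresh` once per period (for example from a scheduled job) keeps the warehouse as current as the last load, which is the trade-off against querying the sources directly.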
  • 8. Information Integration Architectures: Virtual Data Integration. Source: Carlo Batini. [Diagram: Source 1, Source 2, …, Source n, each with a local schema, connected to a mediator with a global schema.] New architecture. No new database!
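A mediator over local schemas, as on slide 8, might look like the following sketch. It is an illustration under assumptions rather than the lecture's design: the classes `Wrapper` and `Mediator`, the attribute mappings, and the use of plain Python lists as "sources" are hypothetical. The key property it tries to show is that records are translated to the global schema at query time and nothing is copied into a new database.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, object]


class Wrapper:
    """Translates one source's local schema into the global schema at query time."""

    def __init__(self, fetch: Callable[[], List[Record]],
                 mapping: Dict[str, str]) -> None:
        self._fetch = fetch      # reads the live source on every query
        self._mapping = mapping  # global attribute name -> local attribute name

    def query(self) -> Iterator[Record]:
        for local_record in self._fetch():
            yield {
                global_name: local_record[local_name]
                for global_name, local_name in self._mapping.items()
            }


class Mediator:
    """Answers queries posed on the global schema by asking every wrapper."""

    def __init__(self, wrappers: List[Wrapper]) -> None:
        self._wrappers = wrappers

    def select(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # No data is stored here: each call goes back to the sources.
        return [
            record
            for wrapper in self._wrappers
            for record in wrapper.query()
            if predicate(record)
        ]
```

Because each `select` call re-reads the sources, a record added to a source after the mediator was built is visible to the next query, in contrast to a warehouse that is only as fresh as its last load.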
  • 9. The integration problem… Source: Carlo Batini. [Diagram: Source 1, Source 2, Source 3, Source 4, …, Source n, labeled registry of clients 1, registry of clients 2, retail sales, on line sales, and other.] Which kind of integration? New architecture: how to decide?
  • 10. Criteria to be adopted. Source: Carlo Batini.
      • Autonomy: the degree of independence between the different database administrators in their design choices.
      • Relevance of historical data: the consequent need to periodically store new data without deleting the old data.
      • Query complexity: the amount of data and the number of tables visited, the number of operators applied to them, and the resulting time complexity of query execution.
      • Relevance of currency in queries: the need for queries to extract current data.
      • Economic value of integration: the relevance of having integrated information as input to operational and decision-making business processes in order to produce effective outputs.
  • 11. Criteria to be adopted. Source: Carlo Batini.
      • Volatility of sources: the frequency of adding or deleting sources, and the frequency of change of source schemas.
      • Relevance of queries with respect to transactions: the relative importance and frequency of queries with respect to changes in data.
      • Management complexity: the effort to be spent on management activities related to databases and hardware-software infrastructures, due to the corresponding complexity of the organizations using the databases.
      • Costs of heterogeneity: the hidden and explicit costs in business processes that are due to making use of heterogeneous data.
  • 12. References and Acknowledgments
      • Carlo Batini: Course on Data Integration. BZU IT Summer School, 2011.
      • Stefano Spaccapietra: Information Integration. Presentation at the IFIP Academy, Porto Alegre, 2005.
      • Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI International, Artificial Intelligence Center, Menlo Park, USA, 2009.
      Thanks to Anton Deik for helping me prepare this lecture.