Data integration

Published in: Technology

  1. Moving beyond ETL
  2. Definition
      In general, integration of multiple information systems aims at combining selected systems so that they form a unified new whole and give users the illusion of interacting with one single information system.
  3. Reasons for Integration
      First, given a set of existing information systems, an integrated view can be created to facilitate information access and reuse through a single information access point.
  4. Reasons for Integration
      Second, given a certain information need, data from different complementary information systems is combined to gain a more comprehensive basis for satisfying that need.
  5. Applications
      In the area of Business Intelligence (BI), integrated information is used for querying and reporting for:
      • Statistical Analysis
      • OLAP
      • Data Mining
      in order to enable:
      • Forecasting
      • Decision Making
      • Enterprise-wide Planning
  6. Integration Problem
      • Users are to be provided with a homogeneous logical view of data that is physically distributed over heterogeneous data sources
      • All data has to be represented using the same abstraction principles (a unified global data model and unified semantics)
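
The idea of a unified global data model can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption for illustration only: the `Customer` global schema, the two sources (a CRM and a billing system), their field names, and the cent-to-euro conversion are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical global schema that every source is mapped onto."""
    customer_id: str
    name: str
    revenue_eur: float


# Source A (hypothetical CRM): dict records, revenue stored in euro cents.
crm_rows = [{"id": 17, "full_name": "Ada Lovelace", "revenue_cents": 125000}]

# Source B (hypothetical billing system): tuples, revenue already in euros.
billing_rows = [("C-42", "Grace Hopper", 980.5)]


def from_crm(row: dict) -> Customer:
    # Resolve schema and unit differences: rename fields, convert cents to euros.
    return Customer(f"CRM-{row['id']}", row["full_name"], row["revenue_cents"] / 100)


def from_billing(row: tuple) -> Customer:
    # Resolve the structural difference: positional tuple to named fields.
    customer_id, name, revenue = row
    return Customer(customer_id, name, float(revenue))


def global_view() -> list[Customer]:
    """Homogeneous logical view over both heterogeneous sources."""
    customers = [from_crm(r) for r in crm_rows]
    customers += [from_billing(r) for r in billing_rows]
    return sorted(customers, key=lambda c: c.name)
```

Callers of `global_view()` see only `Customer` objects and never need to know which source a record came from or how it was stored there.
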
  7. Kinds of Heterogeneity
      • Hardware and Operating Systems
      • Data Management Software
      • Data Models, Schemas, and Semantics
      • Middleware
      • User Interfaces
      • Business Rules and Integrity Constraints
  8. Abstraction Levels
  9. (1) Manual Integration
      • Users directly interact with all relevant information systems and manually integrate selected data
      • Users have to deal with different user interfaces and query languages
      • Users need detailed knowledge of location, logical data representation, and data semantics
  10. (2) Common User Interface
      • The user is supplied with a common user interface (e.g. a web browser) that provides a uniform look and feel
      • Data from relevant information systems is still presented separately
      • Homogenization and integration of data still has to be done by the users
      • Example: search engines
  11. (3) Integration by Applications
      • Uses integration applications that access various data sources and return integrated results to the user
      • Practical for a small number of component systems
      • Applications become increasingly fat as the number of system interfaces and data formats to homogenize and integrate grows
  12. (4) Integration by Middleware
      • Middleware provides functionality used to solve aspects of the integration problem
      • Integration efforts are still needed in applications
      • Different middleware tools usually have to be combined to build integrated systems
  13. (5) Uniform Data Access
      • A logical integration of data is accomplished at the data access level
      • Global applications are provided with a unified global view of physically distributed data
      • Global provision of physically distributed data can be time-consuming
      • Data access, homogenization, and integration have to be done at runtime
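
Uniform data access can be sketched with Python's standard `sqlite3` module: two separately attached databases stay physically separate, and a temporary view integrates them logically at query time. The database aliases (`crm`, `shop`), table names, column names, and sample rows are hypothetical assumptions, and two in-memory SQLite databases are only a stand-in for genuinely heterogeneous sources.

```python
import sqlite3


def build_global_access() -> sqlite3.Connection:
    """Return a connection exposing one logical view over two separate databases."""
    conn = sqlite3.connect(":memory:")  # plays the role of the global access layer
    # Each ATTACH of ':memory:' creates a distinct in-memory database.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS shop")

    # The local sources use different table and column names (schema heterogeneity).
    conn.execute("CREATE TABLE crm.clients (client_name TEXT, city TEXT)")
    conn.execute("CREATE TABLE shop.buyers (buyer TEXT, town TEXT)")
    conn.execute("INSERT INTO crm.clients VALUES ('Ada', 'London')")
    conn.execute("INSERT INTO shop.buyers VALUES ('Grace', 'New York')")

    # A TEMP view may reference tables in any attached database.
    # No data is copied: homogenization happens each time the view is queried.
    conn.execute("""
        CREATE TEMP VIEW customers AS
        SELECT client_name AS name, city FROM crm.clients
        UNION ALL
        SELECT buyer AS name, town AS city FROM shop.buyers
    """)
    return conn


def customers(conn: sqlite3.Connection) -> list:
    """Global query against the unified view."""
    return conn.execute("SELECT name, city FROM customers ORDER BY name").fetchall()
```

Because the view is evaluated at runtime, a row written to a local source is visible through `customers(conn)` on the next query, at the cost of redoing the access and homogenization work every time.
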
  14. (6) Common Data Storage
      • Physical data integration is performed by transferring data to a new data storage
      • Local sources can either be retired or remain operational
      • In general, provides fast data access
      • If local data sources are retired, applications have to be migrated to the new data storage
      • If local data sources remain operational, periodic refreshing of the common data storage needs to be considered
  15. Important Examples
      • Mediated Query Systems
      • Portals
      • Data Warehouses
      • Operational Data Stores
      • Federated Database Systems (FDBMS)
      • Workflow Management Systems (WFMS)
      • Integration by Web Services
      • Peer-to-Peer (P2P) Integration
  16. Mediated Query Systems
      • Represent a uniform data access solution by providing a single point for read-only querying access to various data sources
      • Use a mediator that contains a global query processor to send sub-queries to local data sources; the returned local query results are then combined
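
The mediator pattern described in this slide can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not how any particular mediated query system is built: the `Mediator` class, the `make_source` wrapper, and the single "filter by city" query shape are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
# A source wrapper answers one sub-query against one local data source.
Source = Callable[[str], List[Row]]


def make_source(rows: List[Row]) -> Source:
    """Wrap a local data set (hypothetical) so it can answer sub-queries."""
    def run_subquery(city: str) -> List[Row]:
        return [row for row in rows if row["city"] == city]
    return run_subquery


class Mediator:
    """Single read-only query point over several local sources."""

    def __init__(self, sources: Dict[str, Source]) -> None:
        self.sources = sources

    def query(self, city: str) -> List[Row]:
        combined: List[Row] = []
        # Send the same sub-query to every local source ...
        for name, run_subquery in self.sources.items():
            # ... and combine the local results, recording where each came from.
            for row in run_subquery(city):
                combined.append({**row, "source": name})
        return combined
```

A usage example with two made-up sources:

```python
crm = make_source([{"name": "Ada", "city": "London"}])
shop = make_source([{"name": "Alan", "city": "London"}])
print(Mediator({"crm": crm, "shop": shop}).query("London"))
```
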
  17. Portals
      • Another form of uniform data access: personalized doorways to the internet or an intranet
      • Each user is provided with information tailored to their information needs
      • Web mining is applied to determine user profiles through click-stream analysis
  18. Data Warehouses
      • Realize a common data storage approach
      • Data from several operational sources (OLTP) is extracted, transformed, and loaded (ETL) into a data warehouse
      • Analysis, such as OLAP, can be performed on cubes of integrated and aggregated data
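
The extract, transform, load sequence can be sketched as follows, using an in-memory SQLite database as a stand-in warehouse. The two order sources, their formats, the `sales` table, and the fixed `USD_TO_EUR` rate are hypothetical assumptions chosen only to make the three steps visible.

```python
import sqlite3

# Two hypothetical operational (OLTP) sources with different formats.
orders_eu = [("2024-01-05", "widget", "12.50"), ("2024-01-06", "gadget", "30.00")]
orders_us = [{"day": "01/05/2024", "item": "Widget", "usd": 10.0}]

USD_TO_EUR = 0.9  # assumed fixed rate, for illustration only


def extract():
    """Extract: read the raw records from each source."""
    return orders_eu, orders_us


def transform(eu, us):
    """Transform: unify date format, item spelling, and currency."""
    rows = [(day, item.lower(), float(amount)) for day, item, amount in eu]
    for order in us:
        month, day, year = order["day"].split("/")
        amount_eur = round(order["usd"] * USD_TO_EUR, 2)
        rows.append((f"{year}-{month}-{day}", order["item"].lower(), amount_eur))
    return rows


def load(rows):
    """Load: write the homogenized rows into the warehouse table."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE sales (day TEXT, item TEXT, amount_eur REAL)")
    warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return warehouse


warehouse = load(transform(*extract()))

# A simple aggregation over the integrated data, in the spirit of an OLAP roll-up.
totals = warehouse.execute(
    "SELECT item, SUM(amount_eur) FROM sales GROUP BY item ORDER BY item"
).fetchall()
```

Unlike the runtime integration of uniform data access, the homogenization work here is done once at load time, so later analytical queries run against data that is already integrated.
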
  19. Operational Data Stores
      • A second example of a common data storage
      • A “warehouse with fresh data” is built by immediately propagating updates in local data sources to the data store
      • Up-to-date integrated data is available for decision support
      • Unlike in data warehouses, data is neither cleansed nor aggregated, nor are data histories supported
  20. Federated Database Systems
      • Achieve a uniform data access solution by logically integrating data from the underlying local DBMSs
      • Implement their own data model and support global queries, global transactions, and global access control
  21. Workflow Management Systems
      • Represent an integration-by-application approach
      • Allow business processes to be implemented in which each single step is executed by a different application or user
      • Support modeling, execution, and maintenance of processes that are composed of interactions between applications and human users
  22. Integration by Web Services
      • Performs integration through software components (web services) that support machine-to-machine interaction via XML-based messages conveyed by internet protocols
      • Depending on the integration functionality offered, web services represent either
        - a uniform data access approach, or
        - a common data access interface for later manual or application-based integration
  23. Peer-to-Peer (P2P) Integration
      • A decentralized approach to integration between distributed peers, where data can be mutually shared and integrated
      • Depending on the integration functionality offered, P2P integration represents either
        - a uniform data access approach, or
        - a common data access interface for later manual or application-based integration
  24. Semantic Data Integration