Data integration
 

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

    Data integration: Presentation Transcript

    • Moving beyond ETL
    • Definition
      In general, integration of multiple information systems aims at combining selected systems so that they form a unified new whole and give users the illusion of interacting with one single information system.
    • Reasons for Integration
      First, given a set of existing information systems, an integrated view can be created to facilitate information access and reuse through a single information access point.
    • Reasons for Integration
      Second, given a certain information need, data from different complementing information systems is to be combined to gain a more comprehensive basis to satisfy the need.
    • Applications
      In the area of Business Intelligence (BI), integrated information is used for querying and reporting, for:
      • Statistical Analysis
      • OLAP
      • Data Mining
      In order to enable:
      • Forecasting
      • Decision Making
      • Enterprise-wide Planning
    • Integration Problem
      • Users are provided with a homogeneous logical view of data that is physically distributed over heterogeneous data sources
      • All data has to be represented using the same abstraction principles (a unified global data model and unified semantics)
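As a minimal sketch of what a unified global data model means in practice, the Python below maps records from two local sources onto one shared shape. The source names (`crm_rows`, `erp_rows`), their field names, and the global `id`/`name`/`revenue` model are all hypothetical and chosen only for illustration.

```python
# Two hypothetical local sources that describe customers differently:
# different field names, different key types, different revenue units.
crm_rows = [{"cust_id": 1, "full_name": "Ada Lovelace", "revenue_eur": 1200.0}]
erp_rows = [{"CustomerNo": "2", "Name": "Alan Turing", "RevenueCents": 95000}]


def from_crm(row):
    """Map a CRM record onto the assumed global model (id, name, revenue)."""
    return {
        "id": int(row["cust_id"]),
        "name": row["full_name"],
        "revenue": float(row["revenue_eur"]),
    }


def from_erp(row):
    """Map an ERP record onto the same global model, converting cents to euros."""
    return {
        "id": int(row["CustomerNo"]),
        "name": row["Name"],
        "revenue": row["RevenueCents"] / 100,
    }


def global_view():
    """Return one homogeneous list of customers, whatever their origin."""
    rows = [from_crm(r) for r in crm_rows] + [from_erp(r) for r in erp_rows]
    return sorted(rows, key=lambda r: r["id"])
```

Callers of `global_view()` see a single schema and a single unit for revenue; the per-source mapping functions are where structural and semantic heterogeneity gets resolved.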
    • Kinds of Heterogeneity
      • Hardware and Operating Systems
      • Data Management Software
      • Data Models, Schemas and Semantics
      • Middleware
      • User Interfaces
      • Business Rules and Integrity Constraints
    • Abstraction Levels
    • 1. Manual Integration
      • Users directly interact with all relevant information systems and manually integrate selected data
      • Users have to deal with different user interfaces and query languages
      • Users need detailed knowledge of data location, logical data representation, and data semantics
    • 2. Common User Interface
      • The user is supplied with a common user interface (e.g. a web browser) that provides a uniform look and feel
      • Data from relevant information systems is still presented separately
      • Homogenization and integration of data still have to be done by the users
      • Search engines are an example
    • 3. Integration by Applications
      • Uses integration applications that access various data sources and return integrated results to the user
      • Practical for a small number of component systems
      • Applications become increasingly fat as the number of system interfaces and data formats to homogenize and integrate grows
    • 4. Integration by Middleware
      • Middleware provides functionality used to solve aspects of the integration problem
      • Integration efforts are still needed in applications
      • Different middleware tools usually have to be combined to build integrated systems
    • 5. Uniform Data Access
      • A logical integration of data is accomplished at the data access level
      • Global applications are provided with a unified global view of physically distributed data
      • Global provision of integrated data can be time-consuming, since data access, homogenization, and integration have to be done at runtime
    • 6. Common Data Storage
      • Physical data integration is performed by transferring data to a new data storage
      • Local sources can either be retired or remain operational
      • In general, provides fast data access
      • If local data sources are retired, applications have to be migrated to the new data storage
      • If local data sources remain operational, periodic refreshing of the common data storage needs to be considered
    • Important Examples
      • Mediated Query Systems
      • Portals
      • Data Warehouses
      • Operational Data Stores
      • Federated Database Systems (FDBMS)
      • Workflow Management Systems (WFMS)
      • Integration by Web Services
      • Peer-to-Peer (P2P) Integration
    • Mediated Query Systems
      • Represent a uniform data access solution by providing a single point for read-only querying access to various data sources
      • Use a mediator that contains a global query processor to send sub-queries to local data sources; the returned local query results are then combined
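The mediator idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real mediated query system: the `Mediator` class, the `orders` table, and the two in-memory SQLite databases standing in for local data sources are all hypothetical.

```python
import sqlite3


def make_source(rows):
    """Create one hypothetical local data source with an `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


class Mediator:
    """Single read-only access point over several local sources."""

    def __init__(self, sources):
        self.sources = sources

    def total_per_customer(self):
        # The same sub-query is sent to every local source...
        sub_query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
        combined = {}
        for conn in self.sources:
            for customer, subtotal in conn.execute(sub_query):
                # ...and the local results are combined into one global answer.
                combined[customer] = combined.get(customer, 0.0) + subtotal
        return combined


source_a = make_source([("alice", 10.0), ("bob", 5.0)])
source_b = make_source([("alice", 2.5)])
mediator = Mediator([source_a, source_b])
```

Here every source happens to share one schema, so the sub-query is identical for all of them. With heterogeneous sources, the mediator would also have to translate the global query into each source's own schema and query language.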
    • Portals
      • Portals are another form of uniform data access: personalized doorways to the internet or an intranet
      • Each user is provided with information tailored to their information needs
      • Web mining is applied to determine user profiles by click-stream analysis
    • Data Warehouses
      • Realize a common data storage approach
      • Data from several operational sources (OLTP) are extracted, transformed, and loaded (ETL) into a data warehouse
      • Analysis, such as OLAP, can be performed on cubes of integrated and aggregated data
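The extract, transform, load sequence and a cube-style aggregation can be sketched as follows. This is a toy illustration: the two operational sources, their fields, and the `cube` helper are assumptions made for the example, and a Python list stands in for the warehouse.

```python
from collections import defaultdict

# Hypothetical operational (OLTP) sources; amounts arrive as strings.
oltp_sales_eu = [
    {"date": "2024-01-15", "product": "A", "amount": "10.50"},
    {"date": "2024-02-01", "product": "A", "amount": "4.00"},
]
oltp_sales_us = [
    {"date": "2024-01-20", "product": "A", "amount": "1.50"},
    {"date": "2024-01-21", "product": "B", "amount": "3.00"},
]


def extract():
    """Extract: read raw rows from every operational source."""
    for region, rows in (("EU", oltp_sales_eu), ("US", oltp_sales_us)):
        for row in rows:
            yield region, row


def transform(region, row):
    """Transform: fix types and derive the dimensions used for analysis."""
    return {
        "region": region,
        "month": row["date"][:7],
        "product": row["product"],
        "amount": float(row["amount"]),
    }


def load(warehouse, facts):
    """Load: append the cleaned facts to the warehouse."""
    warehouse.extend(facts)


def cube(warehouse, *dims):
    """Aggregate amounts over the chosen dimensions (an OLAP-style roll-up)."""
    totals = defaultdict(float)
    for fact in warehouse:
        totals[tuple(fact[d] for d in dims)] += fact["amount"]
    return dict(totals)


warehouse = []
load(warehouse, [transform(region, row) for region, row in extract()])
```

Calling `cube(warehouse, "month", "product")` or `cube(warehouse, "region")` then answers different analytical questions from the same integrated facts, without touching the operational sources again.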
    • Operational Data Stores
      • A second example of common data storage
      • A “warehouse with fresh data” is built by immediately propagating updates in local data sources to the data store
      • Up-to-date integrated data is available for decision support
      • Unlike in data warehouses, data is neither cleansed nor aggregated, and data histories are not supported
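To make the contrast with a warehouse concrete, here is a small sketch in which every local update is pushed to the store immediately and overwrites the previous value. The class names `OperationalDataStore` and `LocalSource` are hypothetical, and real systems would typically propagate changes through triggers, logs, or messaging rather than a direct method call.

```python
class OperationalDataStore:
    """Holds only the current state of each record: no cleansing, no history."""

    def __init__(self):
        self.current = {}

    def apply(self, source, key, value):
        # Overwriting means earlier versions of the record are not kept.
        self.current[(source, key)] = value


class LocalSource:
    """A hypothetical local system that propagates each update right away."""

    def __init__(self, name, ods):
        self.name = name
        self.rows = {}
        self.ods = ods

    def update(self, key, value):
        self.rows[key] = value                  # local write
        self.ods.apply(self.name, key, value)   # immediate propagation


ods = OperationalDataStore()
billing = LocalSource("billing", ods)
billing.update("cust-1", {"balance": 100})
billing.update("cust-1", {"balance": 80})
```

After the second update the store holds only the balance of 80; the earlier value is gone, which is the "no data histories" property noted above.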
    • Federated Database Systems
      • Achieve a uniform data access solution by logically integrating data from underlying local DBMS
      • Implement their own data model and support global queries, global transactions, and global access control
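SQLite is not a federated DBMS, but its `ATTACH DATABASE` statement gives a rough feel for a global query over separately stored databases. In the sketch below, the `hr` and `sales` databases, their tables, and the `deal_totals` function are hypothetical; in-memory databases are used so the example is self-contained.

```python
import sqlite3

# One connection plays the role of the global access layer.
conn = sqlite3.connect(":memory:")

# Two separate databases stand in for the underlying local DBMS.
conn.execute("ATTACH DATABASE ':memory:' AS hr")
conn.execute("ATTACH DATABASE ':memory:' AS sales")
conn.execute("CREATE TABLE hr.employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE sales.deals (employee_id INTEGER, value REAL)")

# A single transaction that writes to both databases.
with conn:
    conn.executemany(
        "INSERT INTO hr.employees VALUES (?, ?)", [(1, "Ada"), (2, "Alan")]
    )
    conn.executemany(
        "INSERT INTO sales.deals VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 70.0)],
    )


def deal_totals():
    """Global query: join data that lives in two different databases."""
    query = """
        SELECT e.name, SUM(d.value)
        FROM hr.employees AS e
        JOIN sales.deals AS d ON d.employee_id = e.id
        GROUP BY e.name
        ORDER BY e.name
    """
    return conn.execute(query).fetchall()
```

A real federated system would additionally reconcile different data models and enforce global access control; this sketch only shows the global query and a transaction that spans both databases.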
    • Workflow Management Systems
      • Represent an integration-by-application approach
      • Allow business processes to be implemented in which each single step is executed by a different application or user
      • Support modeling, execution, and maintenance of processes that are composed of interactions between applications and human users
    • Integration by Web Services
      • Performs integration through software components (web services) that support machine-to-machine interaction via XML-based messages conveyed by internet protocols
      • Depending on the integration functionality offered, web services represent either
        - a uniform data access approach, or
        - a common data access interface for later manual or application-based integration
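The XML message exchange behind a web service can be sketched with Python's standard `xml.etree.ElementTree` module. The message format (`getPrice`, `getPriceResponse`), the `PRICES` data, and the function names are invented for this example; no real service or protocol such as SOAP is implemented, and the transport over an internet protocol is replaced by a direct function call.

```python
import xml.etree.ElementTree as ET

# Hypothetical data held by the service's local information system.
PRICES = {"A": 10.5, "B": 3.0}


def build_request(product):
    """Client side: serialize a request as an XML message."""
    root = ET.Element("getPrice")
    ET.SubElement(root, "product").text = product
    return ET.tostring(root, encoding="unicode")


def handle_request(xml_text):
    """Service side: parse the request and answer with an XML message."""
    product = ET.fromstring(xml_text).findtext("product")
    root = ET.Element("getPriceResponse")
    ET.SubElement(root, "product").text = product
    ET.SubElement(root, "price").text = str(PRICES[product])
    return ET.tostring(root, encoding="unicode")


def call(product):
    """Round trip: in a real deployment the message would travel over HTTP."""
    response = handle_request(build_request(product))
    return float(ET.fromstring(response).findtext("price"))
```

Because both sides agree only on the message format, the caller needs no knowledge of how or where the service stores its data.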
    • Peer-to-Peer (P2P) Integration
      • A decentralized approach to integration between distributed peers, where data can be mutually shared and integrated
      • Depending on the integration functionality offered, P2P integration represents either
        - a uniform data access approach, or
        - a common data access interface for later manual or application-based integration
    • Semantic Data Integration