Moving beyond ETL
Definition
In general, integration of multiple information systems aims at combining
selected systems so that they form a unified new whole and give users the
illusion of interacting with one single information system.
Reasons for Integration
First, given a set of existing information systems, an integrated view can
be created to facilitate information access and reuse through a single
information access point.
Reasons for Integration
Second, given a certain information need, data from different, complementary
information systems is combined to gain a more comprehensive basis for
satisfying that need.
Applications
In the area of Business Intelligence (BI), integrated information is used
for querying and reporting in

 •    Statistical Analysis

 •    OLAP

 •    Data Mining

In order to enable

 •    Forecasting

 •    Decision Making

 •    Enterprise-wide Planning
Integration Problem
• Users are provided with a homogeneous logical view of data that is
  physically distributed over heterogeneous data sources
• All data has to be represented using the same abstraction principles
  (a unified global data model and unified semantics)
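
A minimal sketch of what "the same abstraction principles" can mean in practice. The `Customer` model, the two source layouts (a CRM returning dictionaries and a billing system returning tuples), and all field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical unified global data model shared by all sources."""
    customer_id: str
    name: str
    country: str


def from_crm(row: dict) -> Customer:
    # Hypothetical CRM source: separate name fields, lowercase country code.
    return Customer(
        customer_id=f"crm-{row['id']}",
        name=f"{row['first_name']} {row['last_name']}",
        country=row["country_code"].upper(),
    )


def from_billing(row: tuple) -> Customer:
    # Hypothetical billing source: (number, "LAST, FIRST", country) tuples.
    number, raw_name, country = row
    last, first = [part.strip() for part in raw_name.split(",", 1)]
    return Customer(
        customer_id=f"bill-{number}",
        name=f"{first.title()} {last.title()}",
        country=country.upper(),
    )


def global_view(crm_rows, billing_rows):
    """Return one homogeneous list, regardless of where a record lives."""
    return [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

Each source keeps its own representation; only the mapping functions know about it, so consumers of `global_view` see a single schema.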
Kinds of Heterogeneity
•   Hardware and Operating Systems

•   Data Management Software

•   Data Models, Schemas, and Semantics

•   Middleware

•   User Interfaces

•   Business Rules and Integrity Constraints
Abstraction Levels
1. Manual Integration
• Users directly interact with all relevant
  information systems and manually integrate
  selected data
• Users have to deal with different user interfaces
  and query languages
• Users need detailed knowledge of location,
  logical data representation, and data
  semantics
2. Common User Interface
• The user is supplied with a common user
  interface (e.g. a web browser) that provides a
  uniform look and feel
• Data from relevant information systems is still
  separately presented
• Homogenization and integration of data still
  have to be done by the users
• Example: search engines
3. Integration by Applications

• Integration applications access various data
  sources and return integrated results to the
  user

• Practical for a small number of component systems

• Applications become increasingly fat as the
  number of system interfaces and data formats to
  homogenize and integrate grows
4. Integration by Middleware

• Middleware provides reusable functionality that
  solves particular aspects of the integration problem

• Integration effort is still needed in
  applications

• Different middleware tools usually have to be
  combined to build integrated systems
5. Uniform Data Access
• A logical integration of data is accomplished at
  the data access level
• Global applications are provided with a unified
  global view of physically distributed data
• Providing integrated data globally can be
  time-consuming, because data access, homogenization,
  and integration have to be done at runtime
6. Common Data Storage
• Physical data integration is performed by
  transferring data to a new data store
• Local sources can either be retired or remain
  operational
• In general, provides fast data access
• If local data sources are retired, applications
  have to be migrated to the new data store
• If local data sources remain operational,
  periodic refreshing of the common data
  store needs to be considered
Important Examples
•   Mediated Query Systems
•   Portals
•   Data Warehouses
•   Operational Data Stores
•   Federated Database Systems (FDBMS)
•   Workflow Management Systems (WFMS)
•   Integration by Web Services
•   Peer-to-Peer (P2P) Integration
Mediated Query Systems

• Represent a uniform data access solution by
  providing a single point of read-only query
  access to various data sources
• Use a mediator containing a global query
  processor that sends sub-queries to the local data
  sources; the returned local query results are then
  combined
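
The mediator pattern described above can be sketched as follows. This is an illustrative toy, not a real mediator product: the `Mediator` class, the wrapper factory, and the price-filter query are assumptions made for the example, and the "sources" are plain in-memory lists.

```python
from typing import Callable, Iterable, List

# A wrapper answers a sub-query (here: a minimum price) in the global schema.
Wrapper = Callable[[float], Iterable[dict]]


def make_list_wrapper(rows, name_key, price_key, source) -> Wrapper:
    """Hypothetical wrapper around a source that uses its own field names."""
    def run(min_price: float):
        for row in rows:
            if row[price_key] >= min_price:
                # Translate the local record into the global schema.
                yield {"name": row[name_key], "price": row[price_key], "source": source}
    return run


class Mediator:
    """Single read-only access point that fans a query out to all sources."""

    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def query(self, min_price: float) -> List[dict]:
        results = []
        for wrapper in self.wrappers:
            results.extend(wrapper(min_price))  # sub-query per local source
        # Combine the local results into one ordered global answer.
        return sorted(results, key=lambda r: (r["price"], r["name"]))
```

Because the wrappers translate at query time, the sources stay untouched; the cost is that every global query pays for access and homogenization at runtime.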
Portals

• Portals are another form of uniform data access:
  personalized doorways to the internet or an
  intranet
• Each user is provided with information tailored
  to their information needs
• Web mining is applied to determine user
  profiles through click-stream analysis
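
A deliberately simplified stand-in for click-stream analysis: counting which topics a user clicks most and ranking portal items accordingly. Real web mining is far richer; the function names and the topic-based event format are assumptions for this sketch.

```python
from collections import Counter


def build_profile(clicked_topics, top_n=2):
    """Derive a user's most frequent topics from their click stream."""
    counts = Counter(clicked_topics)
    return [topic for topic, _ in counts.most_common(top_n)]


def personalize(items, profile):
    """Show items matching the profile first; keep the original order otherwise."""
    # sorted() is stable, and False sorts before True.
    return sorted(items, key=lambda item: item["topic"] not in profile)
```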
Data Warehouses

• Realize a common data storage approach

• Data from several operational sources (OLTP)
  are extracted, transformed, and loaded (ETL)
  into a data warehouse

• Analysis, such as OLAP, can be performed on
  cubes of integrated and aggregated data
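
The extract, transform, and load steps can be sketched with Python's built-in `sqlite3` module. The `orders` and `sales_fact` tables, their columns, and the cleaning rules are hypothetical; a production warehouse would use a dedicated ETL tool and a far richer schema.

```python
import sqlite3


def extract(source: sqlite3.Connection):
    """Extract raw order rows from a (hypothetical) operational OLTP table."""
    return source.execute("SELECT region, amount_cents FROM orders").fetchall()


def transform(rows):
    """Homogenize: normalize region names, convert cents to currency units."""
    return [(region.strip().upper(), cents / 100.0) for region, cents in rows]


def load(warehouse: sqlite3.Connection, rows):
    """Load the cleaned rows into the warehouse fact table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact (region TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
    warehouse.commit()


def sales_by_region(warehouse: sqlite3.Connection) -> dict:
    """A simple OLAP-style roll-up over the integrated data."""
    cursor = warehouse.execute(
        "SELECT region, SUM(amount) FROM sales_fact GROUP BY region"
    )
    return dict(cursor)
```

Running `load(warehouse, transform(extract(source)))` once per operational source fills the same fact table, after which analysis queries touch only the warehouse.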
Operational Data Stores
• A second example of common data storage
• A “warehouse with fresh data” is built by
  immediately propagating updates in local data
  sources to the data store
• Up-to-date integrated data is available for decision
  support
• Unlike in data warehouses, data is neither cleansed
  nor aggregated, and data histories are not supported
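
A minimal sketch of immediate update propagation, assuming a simple push model in which each local source notifies the store on every write. The class names and the push mechanism are illustrative assumptions; real systems typically rely on triggers, change data capture, or messaging.

```python
class OperationalDataStore:
    """Holds only the current value per key: no history, no aggregation."""

    def __init__(self):
        self.current = {}

    def apply(self, source: str, key: str, value):
        # A later update overwrites the earlier one; old values are not kept.
        self.current[(source, key)] = value


class LocalSource:
    """Hypothetical local system that pushes each write to its subscribers."""

    def __init__(self, name, subscribers=()):
        self.name = name
        self.data = {}
        self.subscribers = list(subscribers)

    def write(self, key, value):
        self.data[key] = value
        for store in self.subscribers:
            store.apply(self.name, key, value)  # propagate immediately
```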
Federated Database Systems

• Achieve a uniform data access solution by
  logically integrating data from underlying
  local DBMSs

• Implement their own data model and support
  global queries, global transactions, and
  global access control
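
Global queries and global transactions can be imitated on a small scale with SQLite's `ATTACH DATABASE`. This is only an analogy: a real FDBMS integrates autonomous DBMSs, whereas here a single SQLite engine manages all attached databases. The schema names `hr` and `crm`, the tables, and the `people` view are hypothetical.

```python
import sqlite3


def make_federation() -> sqlite3.Connection:
    """Attach two in-memory databases and define a global view over both."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS hr")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("CREATE TABLE hr.staff (name TEXT)")
    conn.execute("CREATE TABLE crm.contacts (full_name TEXT)")
    # A TEMP view may reference tables in any attached database.
    conn.execute(
        "CREATE TEMP VIEW people AS "
        "SELECT name, 'hr' AS origin FROM hr.staff "
        "UNION ALL SELECT full_name, 'crm' FROM crm.contacts"
    )
    return conn


def add_person_everywhere(conn, name, fail=False):
    """Insert into both local databases within one transaction."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("INSERT INTO hr.staff VALUES (?)", (name,))
            if fail:
                raise RuntimeError("simulated failure between the two writes")
            conn.execute("INSERT INTO crm.contacts VALUES (?)", (name,))
    except RuntimeError:
        pass  # the partial write to hr.staff has been rolled back
```

Querying `people` gives a single logical relation, while the data stays in the two attached databases.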
Workflow Management Systems

• Represent an integration-by-application approach

• Allow business processes to be implemented in which
  each single step is executed by a different
  application or user

• Support modeling, execution, and maintenance of
  processes that consist of interactions
  between applications and human users
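
A toy workflow runner illustrating the idea that each step is carried out by a different application or user. The step functions, the order-handling process, and the `run_workflow` helper are hypothetical; real WFMSs add process modeling, persistence, and task assignment on top.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict], Dict]]


def run_workflow(steps: List[Step], case: Dict) -> Tuple[Dict, List[str]]:
    """Execute the steps in order, passing the case data along and logging each."""
    log = []
    for name, action in steps:
        case = action(dict(case))  # each step works on its own copy
        log.append(name)
    return case, log


# Hypothetical steps, each standing in for a different application or user.
def check_stock(case):
    case["in_stock"] = case["quantity"] <= 10
    return case


def approve(case):
    # Stands in for a decision made by a human user.
    case["approved"] = case["in_stock"] and case["quantity"] > 0
    return case


def create_invoice(case):
    case["invoice"] = f"INV-{case['order_id']}" if case["approved"] else None
    return case
```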
Integration by Web Services
• Performs integration through software components
  (web services) that support machine-to-machine
  interaction via XML-based messages conveyed by
  internet protocols
• Depending on the integration functionality offered,
  web services represent either
 - a uniform data access approach, or
 - a common data access interface for later manual or
   application-based integration
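
The XML-based message exchange can be sketched with Python's standard `xml.etree.ElementTree` module. The `getCustomer` message format is invented for this example and does not follow SOAP or any other standard; the transport over an internet protocol is left out so the sketch stays self-contained.

```python
import xml.etree.ElementTree as ET


def build_request(customer_id: str) -> bytes:
    """Client side: serialize a hypothetical 'getCustomer' request as XML."""
    root = ET.Element("getCustomer")
    ET.SubElement(root, "id").text = customer_id
    return ET.tostring(root, encoding="utf-8")


def handle_request(message: bytes, customers: dict) -> bytes:
    """Service side: parse the request and answer with an XML response."""
    customer_id = ET.fromstring(message).findtext("id")
    response = ET.Element("customer")
    record = customers.get(customer_id)
    if record is None:
        response.set("found", "false")
    else:
        response.set("found", "true")
        ET.SubElement(response, "name").text = record["name"]
    return ET.tostring(response, encoding="utf-8")


def parse_response(message: bytes):
    """Client side: return the customer name, or None if not found."""
    root = ET.fromstring(message)
    return root.findtext("name") if root.get("found") == "true" else None
```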
Peer-to-Peer (P2P) Integration
• A decentralized approach to integration between
  distributed peers, in which data can be mutually shared
  and integrated
• Depending on the integration functionality offered,
  P2P integration represents either
 - a uniform data access approach, or
 - a common data access interface for later manual or
   application-based integration
Semantic Data Integration