Data Integration, Access,
Flow, Exchange, Transfer,
Load And Extract
Architecture
Alan McSweeney
http://ie.linkedin.com/in/alanmcsweeney
https://www.amazon.com/dp/1797567616
Data Integration, Access, Flow, Exchange, Transfer,
Load, Share And Extract
• Set of data movements between data entities - data sources and data targets -
across the organisation’s data landscape
• Data integration is more than just extracting data from operational systems to
populate data warehouses and long-term data stores
• The movement, creation, transfer and exchange of data breathes life into the set
of organisation solutions
• Data integration is the combination of all these data flows, transfers, exchanges,
loads, extracts that occurs across the data landscape and the tools, methods and
approaches to facilitating and achieving them
• Data integration is an enterprise-level capability that should be available to all
applications and solutions
• The organisation’s data fabric should include infrastructural components and tools
that deliver these data integration facilities
• Individual solutions and applications and their implementation projects should not
have to create (additional) point-to-point custom integrations
• Data interoperability and solution interoperability are closely related – you cannot
have effective solution interoperability without data interoperability
March 22, 2021 2
Evolution Of Data Integration
• With many organisations, data integration tends to have
evolved over time with many solution-specific tactical
approaches implemented
• The consequence is that there is frequently a mixed,
inconsistent data integration topography
• Data integrations are often poorly understood,
undocumented and difficult to support, maintain and
enhance
Current State Of Data Integration
Data Integration
• Data integration has multiple meanings and multiple ways
of being used such as:
− Integration in terms of handling data transfers, exchanges,
requests for information using a variety of information movement
technologies
− Integration in terms of migrating data from a source to a target
system and/or loading data into a target system
− Integration in terms of aggregating data from multiple sources
and creating one source, with possibly date and time dimensions
added to the integrated data, for reporting and analytics
− Integration in terms of synchronising two data sources or
regularly extracting data from one data source to update a target
− Integration in terms of service orientation and API management
to provide access to raw data or the results of processing
Two Aspects Of Data Integration
• Overall data integration architecture needs to handle both types
[Figure: Operational Integration between three Operational Systems]
− Operational Integration – allows data to move from one operational system and its data
store to another
− Analytic Integration – moves data from operational systems and their data stores into a
common structure for retrieval, reporting and analysis
[Figure: Analytic Integration from two Operational Systems into an Analytic Data Store used for Data Retrieval]
Data Integration And Organisation Data Plumbing
[Figure: Organisation Technology Solutions Landscape; Data Plumbing Required to Support Solutions Landscape and Solution Interoperability]
Data Fabric, Data Landscape And Data Entities
• The data landscape is an integrated view of all data
entities within (core) and outside (extended) the
organisation from which it obtains data and with which it
shares or provides data
• The data fabric is the aggregation of the data entities and
their data flows across the core and extended organisation
• Data entities are data assets that are involved in the
provisioning, storage, processing and transfer of
organisation data
− Data entities perform data-related activities across the spectrum
of data actions and events
− A data entity is a hardware or software technology component
involved in any form of data processing
Importance Of Data Integration In IT Architecture
• Enterprise Architecture – defines overall IT architecture for the organisation
• Data Architecture – defines the data architecture for the organisation, of which data integration and
interoperability is one element
• Solution Architecture – designs solutions in the context of overall enterprise and data architectures and the
need for solutions to access, integrate, exchange, transfer and extract data
− Effective data integration is key to solution interoperability
• Data Integration Architecture – defines a common approach to and set of enabling and implementing
technologies in the areas of data integration, access, flow, exchange, transfer, load and extract that can be
used by all IT solutions
[Figure: Enterprise Architecture, Data Architecture, Data Integration Architecture and Solution Architecture]
Business And Information Technology Architecture
• Business Strategy
• Business Architecture
• Business Governance
• Information Technology Governance
• Information Technology Strategy
• Information Technology Architecture
− Data Architecture
− Information Technology Security Architecture
− Application, Solution, Infrastructure and Service Architecture
Overall Data Architecture And Capabilities
• Data Infrastructure and Storage
• Data Security, Protection, Access Control, Authentication, Authorisation
• Data Management, Governance, Architecture, Operations, Supporting Processes
• Data Reporting and Analytics, Visualisation Tools and Facilities
• Data Design, Modelling, Operational Data Stores
• Master and Reference Data Management
• Metadata Data Management
• Data Integration, Access, Flow, Exchange, Transfer, Transformation, Load And Extract
• Data Warehouse, Data Marts, Data Lakes
• Unstructured Data and Document Management
• External Data Sources and Interacting Parties
Data Integration Architecture
• Data Sources
• Data Channels
• Data Integration Security, Authentication, Authorisation
• Data Integration Operations Management, Administration
• Data Integration Development, Testing and Deployment
• External Data Sources and Targets
• Data Integration Technologies
• Data Integration Scheduler and Rules Engine
• Internal Data Sources and Targets
Data Integration As Part Of Overall Information
Technology Architecture
[Figure: Overall Business and IT Architecture Context; Data Architecture Components; Data Integration Architecture Components]
Organisation Data Zones
• Data zones are containers for data entities with similar access
and location characteristics
[Figure: Central Data Entities and Infrastructure Zone; Business Unit/Location Entities and Infrastructure Zone(s); Organisation Data Zone; Secure External Organisation Access Zone; Secure External Organisation Participation and Collaboration Zone; Insecure External Organisation Presentation And Access Zone]
Sample Organisation Data Zones
• Central Data Infrastructure – this contains the central data applications
and their associated data
• Business Unit/Location Data Infrastructure – this is an individual
organisation business unit or location and the data entities it contains
• Organisation – this data zone represents the entire organisation and it
contains all the locations and business units or functions within the
organisation
• Secure External Organisation Access – this zone contains data entities that
enable secure access from outside the organisation
• Secure External Organisation Participation and Collaboration – this is a
location outside the physical organisation boundary where data entities
that are provided by or to trusted external parties reside, including cloud
platforms
• Insecure External Organisation Presentation And Access – this represents
a location where publicly accessible data entities reside. These entities are
regarded as insecure and/or untrusted
• Integration can occur within and between data zones
[Figure: Source Data Entity and Target Data Entity]
Internal And External Data
• Data can be defined as internal or external
− Internal data is (logically) held within a source data entity
− External data is data brought into or sent out of a source data
entity to a target data entity
[Figure: External Data; Data Entity with Internal Data (Data Load, Data Processing, New Data Generation); External Data]
Internal And External Data
• At its core, data integration is concerned with enabling
the transition of data from internal to external states
• The internal and external state of data is separate from the
internal to external location of the source or target data
entity
− Internal – within the organisation data zones
− External – outside the organisation data zones
Data Integration Issues And Trends
• The data landscape has been broadened and there are more data entities that form part of the extended
organisation data landscape as more applications are moved to the cloud and as cloud platforms are used for
providing additional facilities not currently present in organisations such as data analytics and machine learning
• Initiatives and projects that are part of digital transformation programmes involve integrating data between
internal and external parties
• Need to reduce the latency of data integration as response time requirements are reduced
• Performance, resilience and availability integration requirements are increasing
• Need to deploy operational integrations more quickly to respond to business needs
• There is a wider range of data entities as the data landscape increases in complexity
• Process automation initiatives require an operational data integration platform
• Greater volume and complexity of data integrations represent a potential data loss risk unless actively
monitored and managed
• There are more data demands within the organisation especially in the areas of analytics and the associated
data integrations from operational data sources
Data Trends Affecting Data Integration
• Greater volumes of operational data from increasing numbers of
different sources and providers
• Greater volumes of derived data
• More data sources both internal and external to the organisation
• Data in larger numbers of different formats
• Data with a wider range of contents
• Data being generated at different rates
• Data being generated at different times
• Data being generated with varying degrees of accuracy and reliability
and greater fuzziness
• Data that changes constantly
• Data that is of different utility and value
Data Integration, Access, Flow, Exchange, Transfer,
Load And Extraction Processes
[Figure: Applications, Data Sources, Data Stores and Locations linked by Data Load, Data Transfer, Data Exchange, Data Access, Data Extraction, Data Flow, Data Migration, Data Replication, Data Publication, Data Presentation and Data Retrieval]
Data Integration, Access, Flow, Exchange, Transfer,
Load And Extraction Processes
[Figure: the same processes as in the previous figure, grouped under the label Data Integration]
Data Integration, Access, Flow, Exchange, Transfer,
Load And Extraction Processes
• Within any organisation, there will be many different data movements being performed in
different ways using different technologies and approaches:
− API/Web Service
− SOAP
− RPC
− SOA/ESB
− FTP
− ETL/ELT
− EDI
− AS1/2/3
− SMTP
− Database replication
− Change data capture
− iPaaS
− Stream processing
− Message queueing (MQSeries, MQTT, AMQP, ActiveMQ, JMS, Azure Queues, …)
− DB link
− Batch
− DDS
− OPC-UA/IEC 62541
− IEC 60870
− Proprietary technologies (such as SWIFT)
− … And many others
• The proliferation of integration technologies and approaches indicates the long-standing and
pervasive nature of data integration within information technology
Wider Data Integration Concerns
[Figure: Cloud Data Store (Lake, Warehouse); SaaS Application and Data Stores; On Premises Data Application and Data Stores; On Premises Data Warehouse; Cloud Reporting and Analysis Application; On Premises Reporting and Analysis Application; IaaS Hosted Application and Data Store; External Collaborating Party; External DMZ]
Wider Data Integration Scenarios And Concerns
• The data integration landscape is becoming more
heterogeneous, leading to data integration across data
zones
− Between on-premises entities
− Between on-premises and external collaborating parties
− Between external collaborating parties and cloud-based entities
− Between on-premises and cloud SaaS solutions
− Between on-premises and cloud infrastructure IaaS solutions
− Within the same cloud provider
− Between different cloud providers
• The approach to data integration and the technologies to
use have changed from a purely internal-use-only solution to
one encompassing a range of inter-zonal data movements
Data Integration Scenarios
[Figure: the data landscape from Wider Data Integration Concerns, annotated with two scenarios: Between On-premises Entities; Between On-premises Entities and External Collaborating Parties]
Data Integration Logical Components
• On Premises Data Integration
− Performs integration within and between on-premises data
entities
• Data Integration Gateway
− Enables data integration between internal and external data
entities
• External Data Integration
− Enables data integration between internal and external data
entities
− This includes between on-premises and cloud
Data Integration Components
[Figure: the data landscape from Wider Data Integration Concerns with three components added: On Premises Data Integration, Data Integration Gateway and External Data Integration]
Data Integration Platform
[Figure: Data Integration Plugboard; Data Integration Logically Extends Across The Entire Data Span]
Data Integration, Access, Flow, Exchange, Transfer,
Load And Extract Architecture – Options
• Options
− Implement full data integration architecture
− Implement a logical meta integration architecture combining
multiple tools and technologies
− Implement multiple separate (technology or application specific)
integration platforms, with or without overall management
• Irrespective of the approach, creating and maintaining an
inventory of data integrations is an essential activity
Data Integration Mediation/Wrapper/Meta Tool
• Rather than seek to have one big data integration solution,
consider the option of using multiple tools that are
(logically) integrated into a common integration
architecture
Individual Data Integration Tools/Applications
Meta Data Integration Platform
Tool Or Meta Tool
• A meta data integration tool approach can increase
complexity without increasing flexibility or reducing cost
• The overhead of managing multiple individual integration tools
and integrating these with the meta tool can be significant
Core And Extended Dimensions Of Data Integration
[Figure: core and extended dimensions of data integration; the individual elements are listed under Dimensions Of Data Integration]
Dimensions Of Data Integration
• Three dimensions of data integration
− Core – operational components – the core functionality of the data integration platform
• Data Sources and Data Ingestion, Data Ingestion Rules
• Data Targets and Data Mapping/Transfer, Data Integration Rules
• Data Transport Technologies
• Interim Data Storage/Data Staging
• Data Structures, Formats and Types
• Data Transformations and Data Processing Rules
− Platform – management aspects – the operational elements of the data integration platform
• Speed, Volume, Throughput, Capacity, Scalability
• Security and Access Control
• Development, Validation, Deployment and Maintenance
• Monitoring, Administration and Management
• Scheduling and Triggering
• Logging, Analysis, Reporting, Event and Alert Management
− Service – key supporting processes and enabling components – that need to be part of any
usable data integration platform
• Service Level Management
• Capacity Management
• Availability and Continuity Management
• Platform Architecture Management
• Governance and Knowledge Management, Data Semantics
• Operations Management
Data Integration Core Operational Characteristics
• Data Sources and Data Ingestion, Data Ingestion Rules – the
sources of data for data integration and the rules and technologies
for processing
• Data Targets and Data Mapping/Transfer, Data Integration Rules –
the targets of data for data integration and the rules and
technologies for processing
• Data Transport Technologies – support for the range of data
integration technologies
• Interim Data Storage/Data Staging – provision of a data staging
area for asynchronous data retrieval
• Data Structures, Formats and Types – support for a range of input
and output data formats and types and the ability to convert from
one to another
• Data Transformations and Data Processing Rules – facility for
transforming source data
Data Integration Platform Management
Characteristics
• Speed, Volume, Throughput, Capacity, Scalability – ability of the platform
to handle the volume of data integration activity within agreed times
• Security and Access Control – provision of facilities to authenticate and
authorise data access requests and to interact with data source security
layer
• Development, Validation, Deployment and Maintenance – capability to
develop, test, deploy and manage new data integrations and changes to
existing data integrations
• Monitoring, Administration and Management – facilities to monitor the
operation of the data integration platform and manage and administer it
• Scheduling and Triggering – capacity to manage data integration
schedules and events that trigger integrations
• Logging, Analysis, Reporting, Event and Alert Management – provision of
event and activity logging, the ability to define and receive alerts and the
ability to report on and analyse event data
Data Integration Platform Service Characteristics
• Service Level Management – ensuring that the platform complies with
agreed data integration performance and throughput service levels
• Capacity Management – monitoring the resources used by the integration
platform and ensuring that the platform has sufficient resources
• Availability and Continuity Management – guaranteeing that the platform
meets availability needs and ensuring its continuity of operations
• Platform Architecture Management – managing the overall platform
architecture, its upgrades, the addition of new facilities and the support
for new integration technologies
• Governance and Knowledge Management, Data Semantics – managing
knowledge about data integration and providing information about data
read from sources and transferred to targets
• Operations Management – managing the provision of operational support
services for all aspects of the data integration platform
Logical Unified Data Integration Architecture
[Figure: logical unified data integration architecture; its components are described under Components – 1/2 and Components – 2/2]
Logical Unified Data Integration Architecture –
Components – 1/2
• Core Integration Platform – this orchestrates and manages the operation of data integrations
• Deployed Integration Operation – these are specific data integrations that have been developed,
tested and deployed to the Core Integration Platform
• Scheduler, Rules Engine – this component manages the definition and operation of integration schedules
and the actioning of integrations based on triggering events
• Operational Data Integrations – these are data integrations that are deployed to operation
• Data Integration Execution – this is the component of the Core Integration Platform that executes data
integrations
• Data Integration Gateway – gateway components provide communications channels to external data
sources and targets
• External Access Layer/Connectors – this allows external data sources and targets to connect to the Core
Integration Platform
• Internal Access Layer/Connectors – this allows internal data sources and targets to connect to the Core
Integration Platform
• Security – this provides support for source and target authorisation and authentication and integration
with their security layers
• Internal Data Sources and Targets – these are the data sources and targets that are local to the
platform
• External Data Sources and Targets – these are the data sources and targets that are remote from the
platform
• External to Internal Translation – this is intended to represent a facility that translates external
requests to internal addresses to provide an additional level of security
Logical Unified Data Integration Architecture –
Components – 2/2
• Data Knowledge Store – this stores information about the data being integrated to enable its retrieval
by subject and content
• Interim Data Store – this is a staging area for data being stored between transfer from source to target
• Operational Process Usage Log – this contains a log of integration usage and activities
• Alerting/Event Management – this allows for the definition, maintenance and handling of events and
alerts
• Dashboard/Analytics/Reporting – this provides facilities to report on platform activity and usage
• Management and Administration Interface – this allows the platform to be managed and
administered
• Deployed Data Integrations – this represents the set of active deployed integrations
• Integration Design and Development, Version Management and Control – this enables data
integrations to be developed, tested, deployed to production and subsequently updated
• Integration Templates and Template Library – this contains a library of data integration templates that
can be used and reused during development
• Integration Component/Product/Tool Library – this represents a library of integration technology
tools that can be incorporated into and used in integration run times
• Integration Publication/Deployment – this supports the process for deploying data integrations into
production
Generalised Data Integration Approach
• Every data integration consists of a minimum of two (logical)
components
1. A source extract/provision half
2. A target delivery half
• The source must make the data available in some form and either
allow (enable PULL) or initiate (PUSH) the data movement to the
target
• The target then receives (PUSH) or retrieves (PULL) the data
• Direct source to target data integration involves individual point-to-
point connections, bypassing any data integration hub
• There may be an interim transformation stage where the format
and content of the provided data is changed to suit the needs of
the target
• Some Source/Target PUSH/PULL combinations imply the need for a
staging area where extracted/provided data from the source resides
before being passed to the target
− Asynchronous data integration
• Classification can be extended by allowing for multiple sources and
targets
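The two halves described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference design: the class name `IntegrationHub` and its methods `receive`, `fetch`, `deliver` and `retrieve` are hypothetical names chosen for this example, and an in-memory queue stands in for the staging area.

```python
from collections import deque
from typing import Callable, Deque, Optional

Record = dict
Transform = Callable[[Record], Record]


class IntegrationHub:
    """Toy hub: a staging queue between a source half and a target half."""

    def __init__(self, transform: Optional[Transform] = None) -> None:
        self._staging: Deque[Record] = deque()
        # Optional interim transformation stage; identity if none is given.
        self._transform: Transform = transform or (lambda record: record)

    # --- source half -----------------------------------------------------
    def receive(self, record: Record) -> None:
        """Source PUSH: the source initiates and hands data to the hub."""
        self._staging.append(self._transform(record))

    def fetch(self, read_source: Callable[[], Record]) -> None:
        """Source PULL: the hub initiates and reads from the source."""
        self._staging.append(self._transform(read_source()))

    # --- target half -----------------------------------------------------
    def deliver(self, write_target: Callable[[Record], None]) -> int:
        """Target PUSH: the hub sends every staged record to the target."""
        sent = 0
        while self._staging:
            write_target(self._staging.popleft())
            sent += 1
        return sent

    def retrieve(self) -> Optional[Record]:
        """Target PULL: the target asks the hub for the next staged record."""
        return self._staging.popleft() if self._staging else None
```

Any source method can be paired with any target method, which gives the four PUSH/PULL combinations. For example, `hub.fetch(read_fn)` followed by `hub.deliver(write_fn)` is a Source PULL/Target PUSH integration.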
[Figure: 2 × 2 matrix of Source (PUSH, PULL) against Target (PUSH, PULL)]
Logical Data Integration Scenarios
[Figure: Data Sources connect to the Data Integration Hub through the INCOMING HALF (Source PUSH or Source PULL); the hub connects to Data Targets through the OUTGOING HALF (Target PUSH or Target PULL)]
Integration Combinations
• There are many different integration modes/patterns depending on factors such as:
− Number of sources for a single integration
− Number of targets for a single integration
− Push or pull by source and target
− Initiator of the integration – source, target or hub
• Single Source, Single Target
− Source Push Target Push
− Source Push Target Pull
− Source Pull Target Push
− Source Pull Target Pull
• Multiple Source, Single Target
− Source Push Target Push
− Source Push Target Pull
− Source Pull Target Push
− Source Pull Target Pull
• Single Source, Multiple Target
− Source Push Target Push
− Source Push Target Pull
− Source Pull Target Push
− Source Pull Target Pull
• Multiple Source, Multiple Target
− Source Push Target Push
− Source Push Target Pull
− Source Pull Target Push
− Source Pull Target Pull
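The sixteen combinations listed above are the product of two cardinality choices and two mode choices. As a small sketch (the function name `integration_patterns` is hypothetical, and the fourth factor, the initiator of the integration, is deliberately left out), they can be enumerated rather than written out by hand:

```python
from itertools import product
from typing import List

CARDINALITIES = ("Single", "Multiple")
MODES = ("PUSH", "PULL")


def integration_patterns() -> List[str]:
    """Return every source/target cardinality and PUSH/PULL combination."""
    patterns = []
    for src_card, tgt_card, src_mode, tgt_mode in product(
        CARDINALITIES, CARDINALITIES, MODES, MODES
    ):
        patterns.append(
            f"{src_card} Source {src_mode} {tgt_card} Target {tgt_mode}"
        )
    return patterns
```

The order produced by `product` differs from the order of the list above, but the set of patterns is the same.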
Single Source PUSH Single Target PUSH
• Single data source pushes data to integration hub
• Hub pushes data to target
Single Source PUSH Single Target PULL
• Single data source pushes data to integration hub
• Hub allows the target to pull data
Single Source PULL Single Target PUSH
• Data pulled from single data source
• Hub pushes data to target
Single Source PULL Single Target PULL
• Data pulled from single data source
• Hub allows the target to pull data
Multiple Source PUSH Single Target PUSH
• Multiple data sources push data to integration hub where
it is aggregated
• Hub pushes data to target
Multiple Source PUSH Single Target PULL
• Data pushed from multiple data sources and aggregated
• Hub allows the target to pull data
Multiple Source PULL Single Target PUSH
• Data pulled from multiple data sources and aggregated
• Hub pushes data to target
Multiple Source PULL Single Target PULL
• Data pulled from multiple data sources and aggregated
• Hub allows the target to pull data
Single Source PUSH Multiple Target PUSH
• Single data source pushes data to integration hub
• Hub pushes data to multiple targets
Single Source PUSH Multiple Target PULL
• Single data source pushes data to integration hub
• Hub allows multiple targets to pull data
Single Source PULL Multiple Target PUSH
• Data pulled from single data source
• Hub pushes data to multiple targets
Single Source PULL Multiple Target PULL
• Data pulled from single data source
• Hub allows multiple targets to pull data
Multiple Source PUSH Multiple Target PUSH
• Multiple data sources push data to integration hub where
it is aggregated
• Hub pushes aggregated data to multiple targets
Multiple Source PUSH Multiple Target PULL
• Multiple data sources push data to integration hub where
it is aggregated
• Hub allows multiple targets to pull aggregated data
Multiple Source PULL Multiple Target PUSH
• Data pulled from multiple data sources and aggregated
• Hub pushes aggregated data to multiple targets
Multiple Source PULL Multiple Target PULL
• Data pulled from multiple data sources and aggregated
• Hub allows multiple targets to pull aggregated data
Data Integration Initiation And Notification
• For source PULL/target PUSH integrations, the integration hub is
always in direct control and can synchronise the two halves of the
integration – it can initiate the data PULL and then PUSH the
resulting data
• For other combinations, the hub has less control of synchronisation
− Source PUSH/Target PUSH – integration hub can PUSH the data to the target
after it has been PUSHed by the source
− Source PULL/Target PULL – integration hub can PULL the data from the source
when the target requests it
− Source PUSH/Target PULL – integration hub must wait for source to PUSH data
before it can respond to PULL request from target
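One way to reason about the degree of hub control is to count how many halves of an integration the hub itself initiates. The sketch below does this. The mapping of that count onto the labels "Fully Synchronised", "Partially Synchronised" and "Unsynchronised" is an assumption made for illustration: the text above only states that Source PULL/Target PUSH is fully under hub control and that the other combinations give the hub less control. The names `Mode`, `hub_initiated_halves` and `synchronisation` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    PUSH = "PUSH"
    PULL = "PULL"


def hub_initiated_halves(source: Mode, target: Mode) -> int:
    """Count the halves of an integration that the hub itself starts."""
    halves = 0
    if source is Mode.PULL:  # the hub starts the read from the source
        halves += 1
    if target is Mode.PUSH:  # the hub starts the write to the target
        halves += 1
    return halves


def synchronisation(source: Mode, target: Mode) -> str:
    """Assumed mapping from hub-initiated halves to a synchronisation label."""
    labels = {
        2: "Fully Synchronised",
        1: "Partially Synchronised",
        0: "Unsynchronised",
    }
    return labels[hub_initiated_halves(source, target)]
```

Under this assumption, Source PUSH/Target PULL is the only combination in which the hub initiates neither half, which matches the observation that the hub must wait for the source before it can answer the target.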
[Figure: 2 × 2 matrix of Source (PUSH, PULL) against Target (PUSH, PULL), each cell marked Fully Synchronised, Partially Synchronised or Unsynchronised]
Synchronous And Asynchronous Data Integration
• Synchronous integration occurs where the hub initiates both
the PULLing of source data and the PUSHing of transmitted
data
• Asynchronous integration is where the source supply and the
target provision of data do not occur in sequence or where the
triggering of the source supply or target provision events is
not controlled by the hub
• This includes subscription-type integration where the data is
retained by the hub and retrieved by subscribers
Data Integration Hub Data Retention
• How long should the integration hub retain data?
• The integration hub should not become one more
organisation data store where data is retained forever
• Target PULL integrations are the potential source of
accumulated retained undelivered data
• The integration hub needs to include a facility to purge
unretrieved data and/or the data retention interval needs
to be specified as a data integration attribute
• Where a target makes a PULL request for data no longer
available, the integration hub needs to handle this.
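The retention points above can be sketched as a small in-memory store with a retention interval and a purge step. This is an assumption-laden illustration: the class `RetainedStore`, its method names and the status strings are hypothetical, and a real hub would persist data rather than hold it in a dictionary.

```python
import time


class RetainedStore:
    """Toy store for data awaiting a target PULL, with a retention interval."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._items = {}  # key -> (stored_at, payload)

    def put(self, key, payload, now=None):
        stored_at = time.time() if now is None else now
        self._items[key] = (stored_at, payload)

    def purge(self, now=None):
        """Delete entries older than the retention interval; return their keys."""
        now = time.time() if now is None else now
        expired = [key for key, (stored_at, _) in self._items.items()
                   if now - stored_at >= self.retention_seconds]
        for key in expired:
            del self._items[key]
        return expired

    def pull(self, key, now=None):
        """Answer a target PULL, reporting explicitly when the data is gone."""
        self.purge(now)
        if key not in self._items:
            return ("NOT_AVAILABLE", None)
        return ("OK", self._items[key][1])


# Illustrative usage with explicit timestamps instead of the wall clock.
store = RetainedStore(retention_seconds=60)
store.put("orders", ["o1", "o2"], now=0)
within_window = store.pull("orders", now=10)
after_window = store.pull("orders", now=60)
```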
Data Integration Initiation – Source PULL/Target PUSH
[Diagram: Data Source, hub, Data Targets]
• Hub requests data from the source and sends it to the target
Data Integration Initiation – Source PUSH/Target PUSH
[Diagram: Data Source, hub, Data Targets]
• Hub receives data from source
• Hub pushes data to target
Data Integration Initiation – Source PULL/Target PULL
[Diagram: Data Source, hub, Data Targets]
• Target requests data
• Hub pulls data from source
• Hub responds to pull request from target
Data Integration Initiation – Source PUSH/Target PULL
[Diagram: Data Source, hub, Data Targets]
• Target requests data
• Hub responds that data is not available
• Source pushes data to hub; hub receives data from source
• Hub notifies target that data is available
• Target requests data
• Hub responds to pull request from target
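The Source PUSH/Target PULL sequence can be sketched with a toy in-memory hub. The class `IntegrationHub`, its methods and the status strings are hypothetical names chosen for this illustration; the notification is modelled as a simple callback.

```python
class IntegrationHub:
    """Toy hub for a single Source PUSH / Target PULL integration."""

    def __init__(self):
        self._data = None
        self._subscribers = []

    def subscribe(self, callback):
        """Register a target callback to be told when data becomes available."""
        self._subscribers.append(callback)

    def receive_push(self, data):
        """Source PUSH: store the data, then notify waiting targets."""
        self._data = data
        for notify in self._subscribers:
            notify()

    def handle_pull(self):
        """Target PULL: answer with the data, or say it is not available yet."""
        if self._data is None:
            return ("NOT_AVAILABLE", None)
        return ("OK", self._data)


# Illustrative run in the same order as the steps listed above.
hub = IntegrationHub()
events = []
hub.subscribe(lambda: events.append("data available"))
first = hub.handle_pull()        # target asks before the source has pushed
hub.receive_push({"rows": 3})    # source pushes; hub notifies the target
second = hub.handle_pull()       # target asks again and receives the data
```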
Data Integration Security
• Data integration security arises in four areas
− Source
• PUSH – source may need to authenticate with the integration hub
• PULL – integration hub may need to authenticate with data source
− Target
• PUSH – integration hub may need to authenticate with data target
• PULL – target may need to authenticate with the integration hub
• Integration hub needs to support a range of authentication
and authorisation protocols
• Integration hub also needs to support security operations
and administration
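The four cases above follow one pattern: the party that initiates the transfer authenticates with the party that receives the request. A small sketch, with the hypothetical function name `authenticating_party`:

```python
def authenticating_party(side, direction):
    """Return (client, server): the client authenticates with the server.

    side is "source" or "target"; direction is "PUSH" or "PULL".
    """
    if side not in ("source", "target") or direction not in ("PUSH", "PULL"):
        raise ValueError(f"unsupported combination: {side!r}, {direction!r}")
    if side == "source":
        # Source PUSH: the source calls the hub. Source PULL: the hub calls the source.
        return ("source", "hub") if direction == "PUSH" else ("hub", "source")
    # Target PUSH: the hub calls the target. Target PULL: the target calls the hub.
    return ("hub", "target") if direction == "PUSH" else ("target", "hub")
```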
Data Integration Security – Source PUSH
[Diagram: Data Source, hub, Data Target]
• Source authenticates with hub, identifying integration name
• Hub authenticates source and transmits authorisation and access details
• Source PUSHes data
Data Integration Security – Source PULL
[Diagram: Data Source, hub, Data Target]
• Hub authenticates with source, identifying integration name
• Source authenticates hub and transmits authorisation and access details
• Hub PULLs data
Data Integration Security – Target PUSH
[Diagram: hub, Data Targets]
• Hub authenticates with target, identifying integration name
• Target authenticates hub and transmits authorisation and access details
• Hub PUSHes data
Data Integration Security – Target PULL
[Diagram: hub, Data Targets]
• Target authenticates with hub, identifying integration name
• Hub authenticates target and transmits authorisation and access details
• Target PULLs data
Data Integration Metadata
• Data that provides information about the data integration that enables the
integration to be defined, implemented, operated, managed and monitored
• Classifications of metadata types
Types of Integration Metadata
− Descriptive – Information about the data integration
− Business – What the data is, its sources, targets, meaning and relationships with other data
− Structural – How the data integration is organised and operated and how versions are maintained
− Administrative/Process – How the data integration should be managed and administered through its lifecycle stages and who can perform what operations on the metadata
− Statistical – Information on actual data integration options, usage and other volumetrics
− Reference – Sets of values for structured metadata fields
Attributes Of A Data Integration
• Each data integration has a number of attributes or sets of metadata that define its operation and use in detail
• This information is needed to define and operate the integration
• The information must be collected, stored, made available and maintained in a metadata store
Attribute – Description
− Identifier – Defines a unique integration identifier
− Related Integrations – Lists related integrations and identifies the nature of the relationships, including any dependencies
− Source(s) – Defines the source systems or locations from which the source data will be obtained
− Target(s) – Defines the target systems or locations to which the data will be delivered or made available
− Push/Pull from Source – Identifies if the data is pulled from or pushed by the source
− Push/Pull from Target – Identifies if the data is pulled by or pushed to the target
− Source Data Format – Defines the format of the source data
− Target Data Format – Defines the format of the target data
− Source Protocol – Defines the interface protocol used to obtain the source data and any protocol-specific information
− Target Protocol – Defines the interface protocol used to deliver the target data and any protocol-specific information
− Validation – Lists any validations to be performed on the source data, defining whether they are blocking or non-blocking and any exception processing to be performed
− Transformation – Defines any transformation to be performed on the source data including transformation steps and any splits or aggregations performed
− Data Size – Contains an estimate of the size of the source and (transformed) target data
− Trigger – Defines the event(s) that trigger the integration, if relevant
− Frequency – Defines the expected frequency of the data integration, if relevant
− Data Retention – Defines how long the data should be retained between source and target
− Monitoring and Alerting – Lists how the integration will be monitored and how alerts will be generated based on events
− Source Access Security – Defines any security associated with accessing the data source
− Target Access Security – Defines any security associated with accessing the data target
− Audit Log – Identifies where audit information relating to the operation and use of the integration is stored
− Restart After Failure – Lists detail on how the integration should be recovered and restarted after failure
− Data Sensitivity – Lists the sensitivity of the data being handled by the integration
− Ownership – Identifies the business and technical owners of the integration
− Priority – Defines any priority assigned to the integration
− Supporting Documentation – Identifies where documentation relating to the integration is available
− User Interface to View/Maintain Transferred Data – Identifies the user interface that is available to view and maintain the transferred data
− Version – Details on the current integration version and any previous versions
− Active/Inactive Flag – Indicates if the integration is active or inactive
Data Integration Specification
• Data integration can be logically specified as follows
{Integration {Name, Attributes}
  {Sources
    {Source1, TechnologyType, Direction, Attributes}
    {Source2, TechnologyType, Direction, Attributes}
    {…}
  }
  {Transformation {Name, Attributes}
    {Steps
      {Step1, <Processing>}
      {Step2, <Processing>}
      {…}
    }
  }
  {Targets
    {Target1, TechnologyType, Direction, Attributes}
    {Target2, TechnologyType, Direction, Attributes}
    {…}
  }
}
− Integration – Overall integration identifier and attributes
− Sources – Set of data sources, the mechanisms by which data is transferred, the transfer direction (PUSH/PULL) and the extended integration attributes
− Transformation – The transformation performed on the source data to create the data sent to or made available to the target
− Targets – Set of data targets, the mechanisms by which data is transferred, the transfer direction (PUSH/PULL) and the extended integration attributes
Data Integration Specification
• Attributes can be defined at the overall data integration
level or at the individual data source and target definition
level
• Technology type could be one of:
− FT – transfer a file using a file transfer protocol
− API – information is requested using an API made available by the
application
− MSG – information is exchanged using a message queueing
protocol
− ETL – data is exchanged using an ETL process
− HTTP – data is exchanged using HTTP GET/PUT
• This describes a common approach to defining data
integrations
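One way to express the logical specification in code is with a few dataclasses. This is a sketch, not a prescribed format: the class names, the example integration and the `retention_days` attribute are hypothetical, and the rule that an attribute set on a source or target overrides the integration-level value is an assumption made for the illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class TechnologyType(Enum):
    FT = "file transfer"
    API = "application API"
    MSG = "message queueing"
    ETL = "ETL process"
    HTTP = "HTTP GET/PUT"


class Direction(Enum):
    PUSH = "PUSH"
    PULL = "PULL"


@dataclass
class Endpoint:
    """A data source or data target of an integration."""
    name: str
    technology: TechnologyType
    direction: Direction
    attributes: dict = field(default_factory=dict)


@dataclass
class Transformation:
    name: str
    steps: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)


@dataclass
class Integration:
    name: str
    sources: list
    transformation: Transformation
    targets: list
    attributes: dict = field(default_factory=dict)


def effective_attribute(integration, endpoint, key):
    """Assumed rule: an endpoint-level attribute overrides the integration-level one."""
    return endpoint.attributes.get(key, integration.attributes.get(key))


# Hypothetical example integration.
spec = Integration(
    name="example-integration",
    sources=[Endpoint("Source1", TechnologyType.FT, Direction.PULL)],
    transformation=Transformation("example-transform", steps=["Step1", "Step2"]),
    targets=[Endpoint("Target1", TechnologyType.API, Direction.PUSH,
                      attributes={"retention_days": 7})],
    attributes={"retention_days": 30},
)
```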
Data Integration Transformation Specification
• Set of data processing activities, requiring one or more inputs
and performed in a structured order or sequence that depends on
interim outcomes, to generate one or more outputs
and cause one or more outcomes
• Transformation is the self-contained unit that completes a
given task
• Transformation can consist of sub-processes and/or activities
• Transformation and its constituent activities, stages and steps
can be decomposed into a number of levels of detail, down to
the individual atomic level
• Transformation is primarily concerned with its outcomes and
outputs
Data Integration Transformation
• Transformation can be represented at different levels of detail
[Diagram: Transformation with Trigger(s), Required Input(s), Output(s) and Outcome(s)]
Data Integration Transformation
• Activities within transformation can be linked by routers that
direct flow and maintain order based on the values of output(s)
and the status of outcome(s)
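A minimal sketch of activities linked by a router, assuming each activity returns an (output, outcome) pair and the router returns the name of the next activity or None to stop. All names here (`run_transformation`, `validate`, `double`, `reject`) are hypothetical.

```python
def run_transformation(activities, router, start, data):
    """Run activities in the order chosen by the router; return final data and path."""
    current = start
    path = []
    while current is not None:
        output, outcome = activities[current](data)
        path.append(current)
        data = output
        # The router picks the next activity from the output and the outcome status.
        current = router(current, output, outcome)
    return data, path


def validate(rows):
    outcome = "ok" if all(isinstance(r, int) for r in rows) else "failed"
    return rows, outcome


def double(rows):
    return [r * 2 for r in rows], "ok"


def reject(rows):
    return [], "rejected"


def router(name, output, outcome):
    if name == "validate":
        return "double" if outcome == "ok" else "reject"
    return None  # any other activity ends the flow


activities = {"validate": validate, "double": double, "reject": reject}
good_result, good_path = run_transformation(activities, router, "validate", [1, 2, 3])
bad_result, bad_path = run_transformation(activities, router, "validate", [1, "x"])
```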
[Diagram: three Data Processing activities, each with Trigger(s), Required Input(s), Output(s) and Outcome(s), linked by a Router]
Standardised Deployed Operational Data Integrations
[Diagram: components of a standardised data integration platform]
− Dashboard/Analytics/Reporting
− Deployed Data Integrations
− Operational Process Usage Log
− Scheduler, Rules Engine
− Operational Data Integrations
− Integration Design and Development, Version Management and Control
− Integration Templates and Template Library
− Integration Publication/Deployment
− External Data Sources and Targets
− Internal Data Sources and Targets
− Integration Component/Product/Tool Library
− Deployed Integration Operation
− Alerting/Event Management
− Management and Administration Interface
− Internal Access Layer
− External Access Layer
− Data Knowledge Store
− Security
− Interim Data Store
− External to Internal Translation
− Data Integration Execution
− Core Integration Platform
− Data Integration Gateway
Next Steps
• Understand the Scope of the Current Data Integration
State
− Create an inventory of data integration technologies
− Create an inventory of existing data integrations
• Create a Future State Data Integration Architecture
− Create a data integration reference architecture
− Translate reference architecture into an implementation design
− Map implementation design to integration technologies and
products
− Map existing integrations to implementation design
More Information
Alan McSweeney
http://ie.linkedin.com/in/alanmcsweeney
https://www.amazon.com/dp/1797567616
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444
 

Data Integration, Access, Flow, Exchange, Transfer, Load And Extract Architecture

  • 1. Data Integration, Access, Flow, Exchange, Transfer, Load And Extract Architecture Alan McSweeney http://ie.linkedin.com/in/alanmcsweeney https://www.amazon.com/dp/1797567616
  • 2. Data Integration, Access, Flow, Exchange, Transfer, Load, Share And Extract • Set of data movements between data entities - data sources and data targets - across the organisation’s data landscape • Data integration is more than just extracting data from operational systems to populate data warehouses and long-term data stores • The movement, creation, transfer and exchange of data breathes life into the set of organisation solutions • Data integration is the combination of all these data flows, transfers, exchanges, loads and extracts that occur across the data landscape and the tools, methods and approaches for facilitating and achieving them • Data integration is an enterprise-level capability that should be available to all applications and solutions • The organisation’s data fabric should include infrastructural components and tools that deliver these data integration facilities • Individual solutions and applications and their implementation projects should not have to create (additional) point-to-point custom integrations • Data interoperability and solution interoperability are closely related – you cannot have effective solution interoperability without data interoperability March 22, 2021 2
  • 3. Evolution Of Data Integration • With many organisations, data integration tends to have evolved over time with many solution-specific tactical approaches implemented • The consequence is that there is frequently a mixed, inconsistent data integration topography • Data integrations are often poorly understood, undocumented and difficult to support, maintain and enhance March 22, 2021 3
  • 4. Current State Of Data Integration March 22, 2021 4
  • 5. Data Integration • Data integration has multiple meanings and multiple ways of being used such as: − Integration in terms of handling data transfers, exchanges, requests for information using a variety of information movement technologies − Integration in terms of migrating data from a source to a target system and/or loading data into a target system − Integration in terms of aggregating data from multiple sources and creating one source, with possibly date and time dimensions added to the integrated data, for reporting and analytics − Integration in terms of synchronising two data sources or regularly extracting data from one data source to update a target − Integration in terms of service orientation and API management to provide access to raw data or the results of processing March 22, 2021 5
  • 6. Two Aspects Of Data Integration • Overall data integration architecture needs to handle both types March 22, 2021 6 Operational System Operational System Operational System Operational Integration – allow data to move from one operational system and its data store to another Analytic Integration – move data from operational systems and their data stores into a common structure for retrieval, reporting and analysis Operational System Operational System Analytic Data Store Data Retrieval
  • 7. Data Integration And Organisation Data Plumbing March 22, 2021 7 Organisation Technology Solutions Landscape Data Plumbing Required to Support Solutions Landscape and Solution Interoperability
  • 8. Data Fabric, Data Landscape And Data Entities • The data landscape is an integrated view of all data entities within (core) and outside (extended) the organisation through which the organisation obtains, shares and provides data • The data fabric is the aggregation of the data entities and their data flows across the core and extended organisation • Data entities are data assets that are involved in the provisioning, storage, processing and transfer of organisation data − Data entities perform data-related activities across the spectrum of data actions and events − A data entity is a hardware or software technology component involved in any form of data processing March 22, 2021 8
  • 9. Importance Of Data Integration In IT Architecture • Enterprise Architecture – defines overall IT architecture for the organisation • Data Architecture – defines the data architecture for the organisation, of which data integration and interoperability is one element • Solution Architecture – designs solutions in the context of overall enterprise and data architectures and the need for solutions to access, integrate, exchange, transfer and extract data − Effective data integration is key to solution interoperability • Data Integration Architecture – defines a common approach to and set of enabling and implementing technologies in the areas of data integration, access, flow, exchange, transfer, load and extract that can be used by all IT solutions March 22, 2021 9 Enterprise Architecture Data Architecture Data Integration Architecture Solution Architecture
  • 10. Business And Information Technology Architecture March 22, 2021 10 Business Strategy Business Architecture Business Governance Information Technology Governance Information Technology Strategy Information Technology Architecture Data Architecture Information Technology Security Architecture Application, Solution, Infrastructure and Service Architecture
  • 11. Overall Data Architecture And Capabilities March 22, 2021 11 Data Infrastructure and Storage Data Security, Protection, Access Control, Authentication, Authorisation Data Management, Governance, Architecture, Operations, Supporting Processes Data Reporting and Analytics, Visualisation Tools and Facilities Data Design, Modelling, Operational Data Stores Master and Reference Data Management Metadata Data Management Data Integration, Access, Flow, Exchange, Transfer, Transformation, Load And Extract Data Warehouse, Data Marts, Data Lakes Unstructured Data and Document Management External Data Sources and Interacting Parties
  • 12. Data Integration Architecture March 22, 2021 12 Data Sources Data Channels Data Integration Security, Authentication, Authorisation Data Integration Operations Management, Administration Data Integration Development, Testing and Deployment External Data Sources and Targets Data Integration Technologies Data Integration Scheduler and Rules Engine Internal Data Sources and Targets
  • 13. Data Integration As Part Of Overall Information Technology Architecture March 22, 2021 13 Overall Business and IT Architecture Context Data Architecture Components Data Integration Architecture Components
  • 14. Organisation Data Zones • Data zones are containers for data entities with similar access and location characteristics March 22, 2021 14 Central Data Entities and Infrastructure Zone Business Unit/Location Entities and Infrastructure Zone(s) Organisation Data Zone Secure External Organisation Access Zone Secure External Organisation Participation and Collaboration Zone Insecure External Organisation Presentation And Access Zone
  • 15. Sample Organisation Data Zones • Central Data Infrastructure – this contains the central data applications and their associated data • Business Unit/Location Data Infrastructure – this is an individual organisation business unit or location and the data entities it contains • Organisation – this data zone represents the entire organisation and it contains all the locations and business units or functions within the organisation • Secure External Organisation Access – this zone contains data entities that enable secure access from outside the organisation • Secure External Organisation Participation and Collaboration – this is a location outside the physical organisation boundary where data entities that are provided by or to trusted external parties reside, including cloud platforms • Insecure External Organisation Presentation And Access – this represents a location where publicly accessible data entities reside. These entities are regarded as insecure and/or untrusted • Integration can occur within and between data zones March 22, 2021 15
  • 16. Source Data Entity Target Data Entity Internal And External Data • Data can be defined as internal or external − Internal data is (logically) held within a source data entity − External data is data brought into or sent out of a source data entity to a target data entity March 22, 2021 16 Internal Data Data Entity Data Load, Data Processing, New Data Generation External Data External Data
  • 17. Internal And External Data • At its core, data integration is concerned with enabling the transition of data from internal to external states • The internal and external state of data is separate from the internal to external location of the source or target data entity − Internal – within the organisation data zones − External – outside the organisation data zones March 22, 2021 17
  • 18. Data Integration Issues And Trends March 22, 2021 18 • The data landscape has been broadened and there are more data entities that form part of the extended organisation data landscape as more applications are moved to the cloud and as cloud platforms are used for providing additional facilities not currently present in organisations such as data analytics and machine learning • Initiatives and projects that are part of digital transformation programmes involve integrating data between internal and external parties • Need to reduce the latency of data integration as response time requirements are reduced • Performance, resilience and availability integration requirements are increasing • Need to deploy operational integrations more quickly to respond to business needs • There is a wider range of data entities as the data landscape increases in complexity • Process automation initiatives require an operational data integration platform • Greater volume and complexity of data integrations represent a potential data loss risk unless actively monitored and managed • There are more data demands within the organisation especially in the areas of analytics and the associated data integrations from operational data sources
  • 19. Data Trends Affecting Data Integration • Greater volumes of operational data from increasing numbers of different sources and providers • Greater volumes of derived data • More data sources both internal and external to the organisation • Data in larger numbers of different formats • Data with wider range of contents • Data being generated at different rates • Data being generated at different times • Data being generated with varying degrees of accuracy, reliability and greater fuzziness • Data that changes constantly • Data that is of different utility and value March 22, 2021 19
  • 20. Data Integration, Access, Flow, Exchange, Transfer, Load And Extraction Processes March 22, 2021 20 Application Data Source Application Data Store Data Load Data Transfer Data Exchange Application Application Data Access Data Extraction Data Source Data Flow Data Migration Data Extraction Data Store Data Replication Location Data Publication Application Data Presentation Application Data Retrieval
  • 21. Data Integration, Access, Flow, Exchange, Transfer, Load And Extraction Processes March 22, 2021 21 Application Data Source Application Data Store Data Load Data Transfer Data Exchange Application Application Data Access Data Extraction Data Source Data Flow Data Migration Data Extraction Data Store Data Replication Location Data Publication Application Data Presentation Application Data Retrieval Data Integration
  • 22. Data Integration, Access, Flow, Exchange, Transfer, Load And Extraction Processes • Within any organisation, there will be many different data movements being performed in different ways using different technologies and approaches: − API/Web Service − SOAP − RPC − SOA/ESB − FTP − ETL/ELT − EDI − AS1/2/3 − SMTP − Database replication − Change data capture − iPaaS − Stream processing − Message queueing (MQSeries, MQTT, AMQP, ActiveMQ, JMS, Azure Queues, …) − DB link − Batch − DDS − OPC-UA/IEC 62541 − IEC 60870 − Proprietary technologies (such as SWIFT) − … And many others March 22, 2021 22 Proliferation of integration technologies and approaches indicates the long-standing and pervasive nature of data integration within information technology
  • 23. Wider Data Integration Concerns March 22, 2021 23 Cloud Data Store (Lake, Warehouse) SaaS Application and Data Store On Premises Data Application and Data Store On Premises Data Warehouse Cloud Reporting and Analysis Application On Premises Reporting and Analysis Application On Premises Data Application and Data Store On Premises Data Application and Data Store SaaS Application and Data Store SaaS Application and Data Store SaaS Application and Data Store IaaS Hosted Application and Data Store External Collaborating Party External DMZ
  • 24. Wider Data Integration Scenarios And Concerns • The data integration landscape is becoming more heterogeneous leading to data integration across data zones − Between on-premises entities − Between on-premises and external collaborating parties − Between external collaborating parties and cloud-based entities − Between on-premises and cloud SaaS solutions − Between on-premises and cloud infrastructure IaaS solutions − Within the same cloud provider − Between different cloud providers • The approach to data integration and the technologies to use have changed from a purely internal-use-only solution to one encompassing a range of inter-zonal data movements March 22, 2021 24
  • 25. Data Integration Scenarios March 22, 2021 25 Cloud Data Store (Lake, Warehouse) SaaS Application and Data Store On Premises Data Application and Data Store On Premises Data Warehouse Cloud Reporting and Analysis Application On Premises Reporting and Analysis Application On Premises Data Application and Data Store On Premises Data Application and Data Store SaaS Application and Data Store SaaS Application and Data Store SaaS Application and Data Store IaaS Hosted Application and Data Store External Collaborating Party External DMZ Between On-premises Entities Between On-premises Entities and External Collaborating Parties
  • 26. Data Integration Logical Components • On Premises Data Integration − Performs integration within and between on-premises data entities • Data Integration Gateway − Enables data integration between internal and external data entities • External Data Integration − Enables data integration between internal and external data entities − This includes between on-premises and cloud March 22, 2021 26
  • 27. Data Integration Components March 22, 2021 27 Cloud Data Store (Lake, Warehouse) SaaS Application and Data Store On Premises Data Application and Data Store On Premises Data Warehouse Cloud Reporting and Analysis Application On Premises Reporting and Analysis Application On Premises Data Application and Data Store On Premises Data Application and Data Store SaaS Application and Data Store SaaS Application and Data Store SaaS Application and Data Store IaaS Hosted Application and Data Store External Collaborating Party On Premises Data Integration Data Integration Gateway External DMZ External Data Integration
  • 28. Data Integration Platform March 22, 2021 28 Data Integration Logically Extends Across The Entire Data Span Data Integration Plugboard
  • 29. Data Integration, Access, Flow, Exchange, Transfer, Load And Extract Architecture – Options • Options − Implement full data integration architecture − Implement a logical meta integration architecture combining multiple tools and technologies − Implement multiple separate (technology or application specific) integration platforms, with or without overall management • Irrespective of the approach, creating and maintaining an inventory of data integrations is an essential activity March 22, 2021 29
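The inventory mentioned on slide 29 could be kept in many forms. As a minimal sketch only, assuming a simple in-memory record structure, the Python below shows one possible shape for an inventory entry and two queries over it. The class name `IntegrationRecord`, its fields, the sample entries and the zone labels are all hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IntegrationRecord:
    """One entry in a hypothetical data integration inventory."""
    name: str
    source: str
    target: str
    technology: str       # e.g. "ETL/ELT", "FTP", "API/Web Service"
    source_zone: str      # data zone holding the source entity
    target_zone: str      # data zone holding the target entity
    documented: bool = False


def crosses_zones(record: IntegrationRecord) -> bool:
    """True when the integration moves data between two data zones."""
    return record.source_zone != record.target_zone


def undocumented(inventory: List[IntegrationRecord]) -> List[str]:
    """Names of integrations that still lack documentation."""
    return [r.name for r in inventory if not r.documented]


# Illustrative entries only; these systems are invented for the example.
inventory = [
    IntegrationRecord("crm_to_warehouse", "CRM", "Data Warehouse",
                      "ETL/ELT", "Central", "Central", documented=True),
    IntegrationRecord("orders_to_partner", "Order System", "Partner Portal",
                      "FTP", "Central", "Secure External Access"),
]

inter_zone = [r.name for r in inventory if crosses_zones(r)]
print(inter_zone)
print(undocumented(inventory))
```

Even a flat list like this makes it possible to see which integrations cross data zones and which are undocumented, whichever of the three architecture options is chosen.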
  • 30. Data Integration Mediation/Wrapper/Meta Tool • Rather than seek to have one big data integration solution, consider the option of using multiple tools that are (logically) integrated into a common integration architecture March 22, 2021 30 Individual Data Integration Tools/Applications Meta Data Integration Platform
  • 31. Tool Or Meta Tool • Meta data integration tool approach can increase complexity without increasing flexibility or reducing cost • Overhead of managing multiple individual integration tools and integrating these with meta tool can be complex March 22, 2021 31
  • 32. Core And Extended Dimensions Of Data Integration March 22, 2021 32 Data Sources and Data Ingestion, Data Ingestion Rules Data Targets and Data Mapping/ Transfer, Data Integration Rules Data Transport Technologies Data Transformations and Data Processing Rules Data Structures, Formats and Types Security and Access Control Speed, Volume, Throughput, Capacity, Scalability Development, Validation, Deployment and Maintenance Monitoring, Administration and Management Logging, Analysis, Reporting, Event and Alert Management Scheduling and Triggering Interim Data Storage/ Data Staging Capacity Management Availability and Continuity Management Platform Architecture Management Operations Management Governance and Knowledge Management, Data Semantics Service Level Management
  • 33. Dimensions Of Data Integration • Three dimensions of data integration − Core – operational components – the core functionality of the data integration platform • Data Sources and Data Ingestion, Data Ingestion Rules • Data Targets and Data Mapping/Transfer, Data Integration Rules • Data Transport Technologies • Interim Data Storage/Data Staging • Data Structures, Formats and Types • Data Transformations and Data Processing Rules − Platform – management aspects – the operational elements of the data integration platform • Speed, Volume, Throughput, Capacity, Scalability • Security and Access Control • Development, Validation, Deployment and Maintenance • Monitoring, Administration and Management • Scheduling and Triggering • Logging, Analysis, Reporting, Event and Alert Management − Service – key supporting processes and enabling components – that need to be part of any usable data integration platform • Service Level Management • Capacity Management • Availability and Continuity Management • Platform Architecture Management • Governance and Knowledge Management, Data Semantics • Operations Management March 22, 2021 33
  • 34. Data Integration Core Operational Characteristics • Data Sources and Data Ingestion, Data Ingestion Rules – the sources of data for data integration and the rules and technologies for processing • Data Targets and Data Mapping/Transfer, Data Integration Rules – the targets of data for data integration and the rules and technologies for processing • Data Transport Technologies – support for the range of data integration technologies • Interim Data Storage/Data Staging – provision of a data staging area for asynchronous data retrieval • Data Structures, Formats and Types – support for a range of input and output data formats and types and the ability to convert from one to another • Data Transformations and Data Processing Rules – facility for transforming source data March 22, 2021 34
  • 35. Data Integration Platform Management Characteristics • Speed, Volume, Throughput, Capacity, Scalability – ability of the platform to handle the volume of data integration activity within agreed times • Security and Access Control – provision of facilities to authenticate and authorise data access requests and to interact with data source security layer • Development, Validation, Deployment and Maintenance – capability to develop, test, deploy and manage new data integrations and changes to existing data integrations • Monitoring, Administration and Management – facilities to monitor the operation of the data integration platform and manage and administer it • Scheduling and Triggering – capacity to manage data integration schedules and events that trigger integrations • Logging, Analysis, Reporting, Event and Alert Management – provision of event and activity logging, the ability to define and receive alerts and the ability to report on and analyse event data March 22, 2021 35
  • 36. Data Integration Platform Service Characteristics • Service Level Management – ensuring that the platform complies with agreed data integration performance and throughput service levels • Capacity Management – monitoring the resources used by the integration platform and ensuring that the platform has sufficient resources • Availability and Continuity Management – guaranteeing that the platform meets availability needs and ensuring its continuity of operations • Platform Architecture Management – managing the overall platform architecture, its upgrades, the addition of new facilities and the support for new integration technologies • Governance and Knowledge Management, Data Semantics – managing knowledge about data integration and providing information about data read from sources and transferred to targets • Operations Management – managing the provision of operational support services for all aspects of the data integration platform March 22, 2021 36
  • 37. Logical Unified Data Integration Architecture March 22, 2021 37 Dashboard/ Analytics/ Reporting Deployed Data Integrations Operational Process Usage Log Scheduler, Rules Engine Operational Data Integrations Integration Design and Development, Version Management and Control Integration Templates and Template Library Integration Publication/ Deployment External Data Sources and Targets Internal Data Sources and Targets Integration Component /Product /Tool Library Deployed Integration Operation Alerting/ Event Management Management and Administration Interface Internal Access Layer External Access Layer Data Knowledge Store Security Interim Data Store External to Internal Translation Data Integration Execution Core integration Platform Data Integration Gateway
  • 38. Logical Unified Data Integration Architecture – Components – 1/2 • Core Integration Platform – this orchestrates and manages the operation of data integrations • Deployed Integration Operation – these are specific data integrations that have been developed, tested and are deployed to the Core Integration Platform • Scheduler, Rules Engine – this component manages the definition and operation of integration schedules and the actioning of integrations based on triggering events • Operational Data Integrations – these are data integrations that are deployed to operation • Data Integration Execution – this is the component of the Core Integration Platform that executes data integrations • Data Integration Gateway – gateway components provide communications channels to external data sources and targets • External Access Layer/Connectors – this allows external data sources and targets to connect to the Core Integration Platform • Internal Access Layer/Connectors – this allows internal data sources and targets to connect to the Core Integration Platform • Security – this provides support for source and target authorisation and authentication and integration with their security layers • Internal Data Sources and Targets – these are the data sources and targets that are local to the platform • External Data Sources and Targets – these are the data sources and targets that are remote from the platform • External to Internal Translation – this is intended to represent a facility that translates external requests to internal addresses to provide an additional level of security March 22, 2021 38
  • 39. Logical Unified Data Integration Architecture – Components – 2/2 • Data Knowledge Store – this stores information about data being integrated to enable its retrieval by subject and content • Interim Data Store – this is a staging area for data being stored between transfer from source to target • Operational Process Usage Log – this contains a log of integration usage and activities • Alerting/Event Management – this allows for the definition, maintenance and handling of events and alerts • Dashboard/Analytics/Reporting – this provides facilities to report on platform activity and usage • Management and Administration Interface – this allows the platform to be managed and administered • Deployed Data Integrations – this represents the set of active deployed integrations • Integration Design and Development, Version Management and Control – this enables data integrations to be developed, tested, deployed to production and subsequently updated • Integration Templates and Template Library – this contains a library of data integration templates that can be used and reused during development • Integration Component/Product/Tool Library – this represents a library of integration technology tools that can be incorporated into and used in integration run times • Integration Publication/Deployment – this supports the process for deploying data integrations into production March 22, 2021 39
  • 40. Generalised Data Integration Approach • Every data integration consists of a minimum of two (logical) components 1. A source extract/provision half 2. A target delivery half • The source must make the data available in some form and either allow (enable PULL) or initiate (PUSH) the data movement to the target • The target then receives (PUSH) or retrieves (PULL) the data • Direct source to target data integration involves individual point-to-point connections, bypassing any data integration hub • There may be an interim transformation stage where the format and content of the provided data are changed to suit the needs of the target • Some Source/Target PUSH/PULL combinations imply the need for a staging area where extracted/provided data from the source resides before being passed to the target − Asynchronous data integration • Classification can be extended by allowing for multiple sources and targets March 22, 2021 40 Source PUSH PULL Target PUSH PULL
  • 41. Logical Data Integration Scenarios March 22, 2021 41 Data Source Data Source Data Source Data Source Data Target Data Source Source PULL Target PUSH Data Source Data Target Source PUSH Target PUSH Source PULL Target PULL Source PUSH Target PULL Source PUSH Target PUSH INCOMING HALF OUTGOING HALF Data Target Source PUSH Target PULL Data Target Source PUSH Target PUSH Data Target Data Integration Hub
  • 42. Integration Combinations • There are many different integration modes/patterns depending on factors such as: − Number of sources for a single integration − Number of targets for a single integration − Push or pull by source and target − Initiator of the integration – source, target or hub • Single Source, Single Target − Source Push Target Push − Source Push Target Pull − Source Pull Target Push − Source Pull Target Pull • Multiple Source, Single Target − Source Push Target Push − Source Push Target Pull − Source Pull Target Push − Source Pull Target Pull • Single Source, Multiple Target − Source Push Target Push − Source Push Target Pull − Source Pull Target Push − Source Pull Target Pull • Multiple Source, Multiple Target − Source Push Target Push − Source Push Target Pull − Source Pull Target Push − Source Pull Target Pull March 22, 2021 42
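The sixteen patterns listed on slide 42 are the product of two cardinality choices (single or multiple source, single or multiple target) and two mode choices (push or pull on each half). A minimal Python sketch of that enumeration follows; the function name `integration_combinations` and the tuple layout are illustrative assumptions, and the ordering differs slightly from the slide.

```python
from itertools import product

CARDINALITIES = ("Single", "Multiple")
MODES = ("Push", "Pull")


def integration_combinations():
    """Return every (source cardinality, target cardinality,
    source mode, target mode) tuple."""
    return [
        (src_card, tgt_card, src_mode, tgt_mode)
        for src_card, tgt_card in product(CARDINALITIES, repeat=2)
        for src_mode, tgt_mode in product(MODES, repeat=2)
    ]


combos = integration_combinations()
for src_card, tgt_card, src_mode, tgt_mode in combos:
    print(f"{src_card} Source {src_mode.upper()} / "
          f"{tgt_card} Target {tgt_mode.upper()}")
```

The initiator of the integration (source, target or hub), which the slide also names as a factor, is left out of this sketch.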
  • 43. Single Source PUSH Single Target PUSH • Single data source pushes data to integration hub • Hub pushes data to target March 22, 2021 43 Data Source Data Target Source PUSH Target PUSH
  • 44. Single Source PUSH Single Target PULL March 22, 2021 44 • Single data source pushes data to integration hub • Hub allows the target to pull data Data Source Data Target Source PUSH Target PULL
  • 45. Single Source PULL Single Target PUSH March 22, 2021 45 • Data pulled from single data source • Hub pushes data to target Data Source Data Target Source PULL Target PUSH
  • 46. Single Source PULL Single Target PULL March 22, 2021 46 • Data pulled from single data source • Hub allows the target to pull data Data Source Data Target Source PULL Target PULL
  • 47. Multiple Source PUSH Single Target PUSH March 22, 2021 47 Data Source Data Target Multiple Source PUSH Target PUSH Data Source Data Source • Multiple data sources push data to integration hub where it is aggregated • Hub pushes data to target
  • 48. Multiple Source PUSH Single Target PULL March 22, 2021 48 Data Source Data Target Multiple Source PUSH Target PULL Data Source Data Source • Data pushed from multiple data sources and aggregated • Hub allows the target to pull data
  • 49. Multiple Source PULL Single Target PUSH March 22, 2021 49 Data Source Data Target Multiple Source PULL Target PUSH Data Source Data Source • Data pulled from multiple data sources and aggregated • Hub pushes data to target
  • 50. Multiple Source PULL Single Target PULL March 22, 2021 50 Data Source Data Target Multiple Source PULL Target PULL Data Source Data Source • Data pulled from multiple data sources and aggregated • Hub allows the target to pull data
  • 51. Single Source PUSH Multiple Target PUSH March 22, 2021 51 Data Source Data Target Source PUSH Multiple Target PUSH Data Target Data Target • Single data source pushes data to integration hub • Hub pushes data to multiple targets
  • 52. Single Source PUSH Multiple Target PULL March 22, 2021 52 Data Source Data Target Source PUSH Multiple Target PULL Data Target Data Target • Single data source pushes data to integration hub • Hub allows multiple targets to pull data
  • 53. Single Source PULL Multiple Target PUSH March 22, 2021 53 Data Source Data Target Source PULL Multiple Target PUSH Data Target Data Target • Data pulled from single data source • Hub pushes data to multiple targets
  • 54. Single Source PULL Multiple Target PULL March 22, 2021 54 Data Source Data Target Source PULL Multiple Target PULL Data Target Data Target • Data pulled from single data source • Hub allows multiple targets to pull data
  • 55. Multiple Source PUSH Multiple Target PUSH • Multiple data sources push data to integration hub where it is aggregated • Hub pushes aggregated data to multiple targets
  • 56. Multiple Source PUSH Multiple Target PULL • Multiple data sources push data to integration hub where it is aggregated • Hub allows multiple targets to pull aggregated data
  • 57. Multiple Source PULL Multiple Target PUSH • Data pulled from multiple data sources and aggregated • Hub pushes aggregated data to multiple targets
  • 58. Multiple Source PULL Multiple Target PULL • Data pulled from multiple data sources and aggregated • Hub allows multiple targets to pull aggregated data
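The hub patterns on the preceding slides are the combinations of source count (single or multiple), source direction (PUSH or PULL), target count and target direction. A minimal sketch that generates all of them follows. The names `describe` and `all_patterns` are hypothetical and the wording of each description is paraphrased from the slides.

```python
from itertools import product

COUNTS = ("Single", "Multiple")
DIRECTIONS = ("PUSH", "PULL")

# Source-side behaviour, keyed by (count, direction).
SOURCE_TEXT = {
    ("Single", "PUSH"): "Single data source pushes data to integration hub",
    ("Single", "PULL"): "Data pulled from single data source",
    ("Multiple", "PUSH"): "Multiple data sources push data to integration hub where it is aggregated",
    ("Multiple", "PULL"): "Data pulled from multiple data sources and aggregated",
}

# Target-side behaviour, keyed by (count, direction).
TARGET_TEXT = {
    ("Single", "PUSH"): "Hub pushes data to target",
    ("Single", "PULL"): "Hub allows the target to pull data",
    ("Multiple", "PUSH"): "Hub pushes data to multiple targets",
    ("Multiple", "PULL"): "Hub allows multiple targets to pull data",
}


def describe(source_count, source_dir, target_count, target_dir):
    """Return (title, source behaviour, target behaviour) for one pattern."""
    title = f"{source_count} Source {source_dir} {target_count} Target {target_dir}"
    return (
        title,
        SOURCE_TEXT[(source_count, source_dir)],
        TARGET_TEXT[(target_count, target_dir)],
    )


def all_patterns():
    """Enumerate every count/direction combination on both sides of the hub."""
    return [
        describe(sc, sd, tc, td)
        for sc, sd, tc, td in product(COUNTS, DIRECTIONS, COUNTS, DIRECTIONS)
    ]


if __name__ == "__main__":
    for title, source_side, target_side in all_patterns():
        print(f"{title}: {source_side}; {target_side}")
```

Keeping the source-side and target-side descriptions in separate lookups makes it harder for a pattern title and its description to drift apart.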
  • 59. Data Integration Initiation And Notification • For source PULL/target PUSH integrations, the integration hub is always in direct control and can synchronise the two halves of the integration – it can initiate the data PULL and then PUSH the resulting data • For other combinations, the hub has less control of synchronisation − Source PUSH/Target PUSH – integration hub can PUSH the data to the target after it has been PUSHed by the source − Source PULL/Target PULL – integration hub can PULL the data from the source when the target requests it − Source PUSH/Target PULL – integration hub must wait for source to PUSH data before it can respond to PULL request from target • Diagram: matrix of source PUSH/PULL against target PUSH/PULL, with each combination marked as Fully Synchronised, Partially Synchronised or Unsynchronised
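The degree of hub control for each direction pair can be captured in a small lookup. In this sketch only the "fully synchronised" rating for source PULL/target PUSH is stated outright in the slide. The ratings for the other three combinations are an assumption inferred from the bullet descriptions and the three-level legend. `SyncLevel` and `sync_level` are hypothetical names.

```python
from enum import Enum


class SyncLevel(Enum):
    FULL = "Fully Synchronised"
    PARTIAL = "Partially Synchronised"
    NONE = "Unsynchronised"


# Keyed by (source direction, target direction).
_LEVELS = {
    # Hub initiates both halves, so it controls the whole exchange.
    ("PULL", "PUSH"): SyncLevel.FULL,
    # Assumed: hub controls one half and reacts to the other.
    ("PUSH", "PUSH"): SyncLevel.PARTIAL,
    ("PULL", "PULL"): SyncLevel.PARTIAL,
    # Assumed: hub initiates neither half and must wait for both parties.
    ("PUSH", "PULL"): SyncLevel.NONE,
}


def sync_level(source_direction, target_direction):
    """Return how much control the hub has over synchronisation."""
    key = (source_direction.upper(), target_direction.upper())
    if key not in _LEVELS:
        raise ValueError(f"directions must be PUSH or PULL, got {key}")
    return _LEVELS[key]
```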
  • 60. Synchronous And Asynchronous Data Integration • Synchronous integration occurs where the hub initiates both the PULLing of source data and the PUSHing of transmitted data • Asynchronous integration is where the source supply and the target provision of data do not occur in sequence or where the triggering of the source supply or target provision events is not controlled by the hub • This includes subscription-type integration where the data is retained by the hub and retrieved by subscribers
  • 61. Data Integration Hub Data Retention • How long should the integration hub retain data? • The integration hub should not become one more organisation data store where data is retained forever • Target PULL integrations are the potential source of accumulated retained undelivered data • The integration hub needs to include a facility to purge unretrieved data and/or the data retention interval needs to be specified as a data integration attribute • Where a target makes a PULL request for data no longer available, the integration hub needs to handle this
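The retention points above can be sketched as a small in-memory store with a retention interval, a purge facility and explicit handling of a PULL for purged data. This is an illustration only, not a description of any product. `HubStore`, `DataNoLongerAvailable` and the single-delivery behaviour (data is removed once pulled) are all assumptions.

```python
import time


class DataNoLongerAvailable(Exception):
    """Raised when a target PULLs data that was purged after its retention interval."""


class HubStore:
    def __init__(self, retention_seconds, clock=time.time):
        self.retention_seconds = retention_seconds
        self._clock = clock          # injectable so retention can be tested
        self._items = {}             # key -> (payload, time received)
        self._purged = set()         # keys removed by purge()

    def receive(self, key, payload):
        """Store data supplied by a source until a target pulls it."""
        self._items[key] = (payload, self._clock())
        self._purged.discard(key)

    def purge(self):
        """Delete unretrieved data older than the retention interval."""
        cutoff = self._clock() - self.retention_seconds
        expired = [k for k, (_, received) in self._items.items() if received < cutoff]
        for key in expired:
            del self._items[key]
            self._purged.add(key)
        return expired

    def pull(self, key):
        """Serve a target PULL request, removing the data once delivered."""
        self.purge()
        if key in self._items:
            payload, _ = self._items.pop(key)
            return payload
        if key in self._purged:
            raise DataNoLongerAvailable(key)
        raise KeyError(key)
```

Raising a distinct exception for purged data lets the target tell "never supplied" apart from "supplied but expired", which is one way the hub could handle a late PULL request.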
  • 62. Data Integration Initiation – Source PULL/Target PUSH • Hub requests data from source and sends it to the target
  • 63. Data Integration Initiation – Source PUSH/Target PUSH • Hub receives data from source • Hub pushes data to target
  • 64. Data Integration Initiation – Source PULL/Target PULL • Target requests data • Hub pulls data from source • Hub responds to pull request from target
  • 65. Data Integration Initiation – Source PUSH/Target PULL • Target requests data • Hub responds that data is not available • Source pushes data to hub • Hub receives data from source • Hub notifies target that data is available • Target requests data • Hub responds to pull request from target
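The source PUSH/target PULL sequence is the one where the hub controls neither side, so it has to answer early requests with "not available" and notify the target once data arrives. A minimal in-process sketch of that exchange follows. `PushPullHub` and its callback-based notification are assumptions made for illustration.

```python
class PushPullHub:
    """Hub for an integration where the source PUSHes and the target PULLs."""

    def __init__(self):
        self._data = {}          # integration name -> payload awaiting pull
        self._subscribers = {}   # integration name -> notification callbacks

    def subscribe(self, integration, callback):
        """Register a target callback to be told when data becomes available."""
        self._subscribers.setdefault(integration, []).append(callback)

    def pull(self, integration):
        """Target PULL: return (available, payload)."""
        if integration not in self._data:
            return False, None   # hub responds that data is not available
        return True, self._data.pop(integration)

    def push(self, integration, payload):
        """Source PUSH: store the data, then notify waiting targets."""
        self._data[integration] = payload
        for notify in self._subscribers.get(integration, []):
            notify(integration)


if __name__ == "__main__":
    hub = PushPullHub()
    hub.subscribe("orders", lambda name: print(f"data available for {name}"))
    print(hub.pull("orders"))        # target asks too early
    hub.push("orders", {"id": 1})    # source supplies data, target is notified
    print(hub.pull("orders"))        # target asks again and receives the data
```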
  • 66. Data Integration Security • Data integration security arises in four areas − Source • PUSH – source may need to authenticate with the integration hub • PULL – integration hub may need to authenticate with data source − Target • PUSH – integration hub may need to authenticate with data target • PULL – target may need to authenticate with the integration hub • Integration hub needs to support a range of authentication and authorisation protocols • Integration hub also needs to support security operations and administration
  • 67. Data Integration Security – Source PUSH • Source authenticates with hub, identifying integration name • Hub authenticates source and transmits authorisation and access details • Source PUSHes data
  • 68. Data Integration Security – Source PULL • Hub authenticates with source, identifying integration name • Source authenticates hub and transmits authorisation and access details • Hub PULLs data
  • 69. Data Integration Security – Target PUSH • Hub authenticates with target, identifying integration name • Target authenticates hub and transmits authorisation and access details • Hub PUSHes data
  • 70. Data Integration Security – Target PULL • Target authenticates with hub, identifying integration name • Hub authenticates target and transmits authorisation and access details • Target PULLs data
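Across the four security cases, whichever party initiates the transfer is the one that authenticates with the other. A short sketch of that rule follows. `authentication_roles` is a hypothetical helper, and real authentication would use whatever protocols the hub supports.

```python
def authentication_roles(side, direction):
    """Return (party presenting credentials, party verifying them).

    side is "source" or "target"; direction is "PUSH" or "PULL".
    """
    side = side.lower()
    direction = direction.upper()
    if side not in ("source", "target") or direction not in ("PUSH", "PULL"):
        raise ValueError(f"unsupported combination: {side!r}, {direction!r}")

    # The hub initiates when it PULLs from a source or PUSHes to a target.
    hub_initiates = (side == "source" and direction == "PULL") or (
        side == "target" and direction == "PUSH"
    )
    return ("hub", side) if hub_initiates else (side, "hub")
```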
  • 71. Data Integration Metadata • Data that provides information about the data integration that enables the integration to be defined, implemented, operated, managed and monitored • Classifications of metadata types
    Types of Integration Metadata
    − Descriptive: Information about the data integration
    − Business: What the data is, its sources, targets, meaning and relationships with other data
    − Structural: How the data integration is organised and operated and how versions are maintained
    − Administrative/Process: How the data integration should be managed and administered through its lifecycle stages and who can perform what operations on the metadata
    − Statistical: Information on actual data integration operations, usage and other volumetrics
    − Reference: Sets of values for structured metadata fields
  • 72. Attributes Of A Data Integration • Each data integration has a number of attributes or sets of metadata that define its operation and use in detail • This information is needed to define and operate the integration • The information must be collected, stored, made available and maintained in a metadata store
    Attribute: Description
    − Identifier: Defines a unique integration identifier
    − Related Integrations: Lists related integrations and identifies the nature of the relationships, including any dependencies
    − Source(s): Defines the source systems or locations from which the source data will be obtained
    − Target(s): Defines the target systems or locations to which the data will be delivered or made available
    − Push/Pull from Source: Identifies if the data is pulled or pushed from the source
    − Push/Pull from Target: Identifies if the data is pulled by or pushed to the target
    − Source Data Format: Defines the format of the source data
    − Target Data Format: Defines the format of the target data
    − Source Protocol: Defines the interface protocol used to obtain the source data and any protocol-specific information
    − Target Protocol: Defines the interface protocol used to deliver the target data and any protocol-specific information
    − Validation: Lists any validations to be performed on the source data, defining whether they are blocking or non-blocking and any exception processing to be performed
    − Transformation: Defines any transformation to be performed on the source data including transformation steps and any splits or aggregations performed
    − Data Size: Contains an estimate of the size of the source and (transformed) target data
    − Trigger: Defines the event(s) that trigger the integration, if relevant
    − Frequency: Defines the expected frequency of the data integration, if relevant
    − Data Retention: Defines how long the data should be retained between source and target
    − Monitoring and Alerting: Lists how the integration will be monitored and how alerts will be generated based on events
    − Source Access Security: Defines any security associated with accessing the data source
    − Target Access Security: Defines any security associated with accessing the data target
    − Audit Log: Identifies where audit information relating to the operation and use of the integration is stored
    − Restart After Failure: Details how the integration should be recovered and restarted after failure
    − Data Sensitivity: Lists the sensitivity of the data being handled by the integration
    − Ownership: Identifies the business and technical owners of the integration
    − Priority: Defines any priority assigned to the integration
    − Supporting Documentation: Identifies where documentation relating to the integration is available
    − User Interface to View/Maintain Transferred Data: Identifies the user interface that is available to view and maintain the transferred data
    − Version: Details on the current integration version and any previous versions
    − Active/Inactive Flag: Indicates if the integration is active or inactive
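As an illustration of how these attributes might be held in a metadata store, the sketch below models a subset of them as a typed record with basic validation. The class name, field names, types and defaults are assumptions; the slides list the attributes but do not prescribe a representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IntegrationAttributes:
    """Subset of the integration attributes, held as one metadata record."""

    identifier: str
    sources: List[str]
    targets: List[str]
    source_direction: str                 # "PUSH" or "PULL"
    target_direction: str                 # "PUSH" or "PULL"
    source_format: str = ""
    target_format: str = ""
    retention_hours: Optional[int] = None  # None means not yet specified
    business_owner: str = ""
    technical_owner: str = ""
    version: str = "1"
    active: bool = True
    related_integrations: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Normalise and validate the two direction attributes.
        for name in ("source_direction", "target_direction"):
            value = getattr(self, name).upper()
            if value not in ("PUSH", "PULL"):
                raise ValueError(f"{name} must be PUSH or PULL, got {value!r}")
            setattr(self, name, value)
        if not self.sources or not self.targets:
            raise ValueError("an integration needs at least one source and one target")
```

A record like this could be created as `IntegrationAttributes("INT-001", ["crm"], ["warehouse"], "pull", "push", retention_hours=24)`, where the identifier and system names are invented for the example.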
  • 73. Data Integration Specification • Data integration can be logically specified as follows
    {Integration {Name, Attributes}
      Sources {
        {Source1, TechnologyType, Direction, Attributes}
        {Source2, TechnologyType, Direction, Attributes}
        {…}
      }
      Transformation {Name, Attributes}
      Steps {
        {Step1, <Processing>}
        {Step2, <Processing>}
        {…}
      }
      Targets {
        {Target1, TechnologyType, Direction, Attributes}
        {Target2, TechnologyType, Direction, Attributes}
        {…}
      }
    }
    − Integration: Overall integration identifier and attributes
    − Sources: Set of data sources, the mechanisms by which data is transferred, the transfer direction (PUSH/PULL) and the extended integration attributes
    − Transformation: The transformation performed on the source data to create the data sent to or made available to the target
    − Targets: Set of data targets, the mechanisms by which data is transferred, the transfer direction (PUSH/PULL) and the extended integration attributes
  • 74. Data Integration Specification • Attributes can be defined at the overall data integration level or at the individual data source and target definition level • Technology type could be one of: − FT – transfer a file using a file transfer protocol − API – information is requested using an API made available by the application − MSG – information is exchanged using a message queueing protocol − ETL – data is exchanged using an ETL process − HTTP – data is exchanged using HTTP GET/PUT • This describes a common approach to defining data integrations
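The logical specification can be expressed as a typed structure, as sketched below. The class names are hypothetical, and the rule that endpoint-level attributes override integration-level attributes is an assumption. The slides say only that attributes can be defined at either level.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class TechnologyType(Enum):
    FT = "file transfer protocol"
    API = "application API"
    MSG = "message queueing protocol"
    ETL = "ETL process"
    HTTP = "HTTP GET/PUT"


class Direction(Enum):
    PUSH = "PUSH"
    PULL = "PULL"


@dataclass
class Endpoint:
    """One data source or data target in the specification."""
    name: str
    technology: TechnologyType
    direction: Direction
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Step:
    name: str
    processing: str


@dataclass
class Transformation:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    steps: List[Step] = field(default_factory=list)


@dataclass
class IntegrationSpec:
    name: str
    attributes: Dict[str, str]
    sources: List[Endpoint]
    transformation: Transformation
    targets: List[Endpoint]

    def effective_attributes(self, endpoint):
        """Merge attributes; endpoint-level values win (assumed rule)."""
        return {**self.attributes, **endpoint.attributes}


# Example specification; every name and value here is invented for illustration.
spec = IntegrationSpec(
    name="customer-sync",
    attributes={"retention": "24h", "priority": "normal"},
    sources=[Endpoint("crm", TechnologyType.API, Direction.PULL)],
    transformation=Transformation("normalise", steps=[Step("Step1", "trim fields")]),
    targets=[Endpoint("warehouse", TechnologyType.ETL, Direction.PUSH, {"retention": "1h"})],
)
```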
  • 75. Data Integration Transformation Specification • Set of data processing activities, requiring one or more inputs and performed in a structured order or sequence that depends on interim outcomes, to generate one or more outputs and cause one or more outcomes • Transformation is the self-contained unit that completes a given task • Transformation can consist of sub-processes and/or activities • Transformation and its constituent activities, stages and steps can be decomposed into a number of levels of detail, down to the individual atomic level • Transformation is primarily concerned with its outcomes and outputs
  • 76. Data Integration Transformation • Transformation can be represented at different levels of detail • Diagram: Transformation with Trigger(s), Required Input(s), Output(s) and Outcome(s)
  • 77. Data Integration Transformation • Activities within transformation can be linked by routers that direct flow and maintain order based on the values of output(s) and the status of outcome(s) • Diagram: three Data Processing activities, each with Trigger(s), Required Input(s), Output(s) and Outcome(s), linked by a Router
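The router idea can be sketched as a routing table that picks the next activity from the outcome of the current one. Each activity below returns an (output, outcome) pair. `run_transformation`, the activity names and the outcome labels are all hypothetical.

```python
def run_transformation(activities, routes, start, data, max_steps=100):
    """Run activities in an order chosen by the router.

    activities maps a name to a function returning (output, outcome).
    routes maps (activity name, outcome) to the next activity name.
    A missing route ends the transformation.
    """
    current = start
    trace = []
    for _ in range(max_steps):
        output, outcome = activities[current](data)
        trace.append((current, outcome))
        next_activity = routes.get((current, outcome))
        if next_activity is None:
            return output, trace
        current, data = next_activity, output
    raise RuntimeError("transformation did not terminate")


def validate(rows):
    ok = all("id" in row for row in rows)
    return rows, ("valid" if ok else "invalid")


def normalise(rows):
    return [{**row, "id": str(row["id"])} for row in rows], "done"


def reject(rows):
    return [], "rejected"


ACTIVITIES = {"validate": validate, "normalise": normalise, "reject": reject}
ROUTES = {
    ("validate", "valid"): "normalise",
    ("validate", "invalid"): "reject",
}

if __name__ == "__main__":
    print(run_transformation(ACTIVITIES, ROUTES, "validate", [{"id": 1}]))
    print(run_transformation(ACTIVITIES, ROUTES, "validate", [{"name": "x"}]))
```

Keeping the routes in data rather than in the activities means the order can be changed without touching the processing code.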
  • 78. Standardised Deployed Operational Data Integrations • Diagram components: Dashboard/Analytics/Reporting; Deployed Data Integrations; Operational Process Usage Log; Scheduler, Rules Engine; Operational Data Integrations; Integration Design and Development, Version Management and Control; Integration Templates and Template Library; Integration Publication/Deployment; External Data Sources and Targets; Internal Data Sources and Targets; Integration Component/Product/Tool Library; Deployed Integration Operation; Alerting/Event Management; Management and Administration Interface; Internal Access Layer; External Access Layer; Data Knowledge Store; Security; Interim Data Store; External to Internal Translation; Data Integration Execution; Core Integration Platform; Data Integration Gateway
  • 79. Next Steps • Understand the Scope of the Current Data Integration State − Create an inventory of data integration technologies − Create an inventory of existing data integrations • Create a Future State Data Integration Architecture − Create a data integration reference architecture − Translate reference architecture into an implementation design − Map implementation design to integration technologies and products − Map existing integrations to implementation design