Let’s talk Data Integration, Interoperability and Virtualization
Presented by:
javier@enroutesystems.com
art@enroutesystems.com
8/8/17
DII – The new kid on the block
Data Integration and Interoperability (DII) describes processes
related to the movement and consolidation of data within
and between data stores, applications and organizations.
Why we’re geeking out for DII
1. SOA/Microservices are becoming more popular.
2. Integration of structured and unstructured data
3. Deliver value faster… avoid ROGUE users
4. Although it’s not new, DII in the DMBOK provides clear
guidelines for organizations aiming to become more efficient
through IT.
Data Interoperability
Data Interoperability is the ability for multiple systems to communicate.
Monoliths → SOA/Microservices
Data Integration
Integration consolidates data into consistent forms, either physical or
virtual.
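The physical-vs-virtual distinction can be sketched with plain SQL. This is a minimal, illustrative example (the deck names no specific tools): two source tables are consolidated into one consistent form, once as a materialized table and once as a view.

```python
import sqlite3

# Toy stand-in for two source systems feeding one consolidated form.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_customers (id INTEGER, full_name TEXT);
    CREATE TABLE erp_clients (client_id INTEGER, name TEXT);
    INSERT INTO crm_customers VALUES (1, 'Ada Lovelace');
    INSERT INTO erp_clients VALUES (2, 'Alan Turing');
""")

# Physical integration: data is copied into a new table.
conn.execute("""
    CREATE TABLE customers_physical AS
    SELECT id, full_name AS name FROM crm_customers
    UNION ALL
    SELECT client_id, name FROM erp_clients
""")

# Virtual integration: the same consistent form, but no data moves --
# the view re-reads the sources on every query.
conn.execute("""
    CREATE VIEW customers_virtual AS
    SELECT id, full_name AS name FROM crm_customers
    UNION ALL
    SELECT client_id, name FROM erp_clients
""")

physical = conn.execute("SELECT name FROM customers_physical ORDER BY id").fetchall()
virtual = conn.execute("SELECT name FROM customers_virtual ORDER BY id").fetchall()
```

Both queries return the same consolidated rows; only the view stays current if a source table changes afterwards.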
History
Data virtualization has existed since Bill Inmon popularized the data warehouse in the 1990s. But virtual models back then were not very popular due to the lack of computing power available (or accessible).
Today, changes in data types and business expectations on information velocity have made virtualization a more popular concept.
Did you know?
The last time Bill Inmon wrote about data virtualization, he compared it to a frustrating game of whack-a-mole: no matter how much you hit the mole… it keeps coming back!
http://www.b-eye-network.com/view/9956
Let’s take a look at your future
Virtual vs Physical
Why data virtualization?
Fast and Easy
• Rapid data integration which enables a faster time to solution
• Integrations and changes are easy (no need to update extractions, tables, or data marts)
Integrate more!
• Opportunity to integrate structured and unstructured data
Cheaper and more secure
• Less expensive to maintain
• No need to replicate data
• Reduces the overhead of managing data integration systems (Easier + Faster = fewer required resources)
Agile
• Enables iterative development with quick deliverables (Note: a very important one, since in most cases users don’t know what they want… too many iterations)
• Developers are more focused on the business instead of the mechanics of data manipulation (Why? Because data virtualization tools automatically connect to many data sources)
Use Cases
Data Warehouse augmentation
Problem
• Bringing in new data sources to a data warehouse
takes a significant amount of effort, but even more
so, if the data sources include unstructured data.
Fix
• Data virtualization can be applied to augment an existing data warehouse with virtual views that incorporate unstructured data.
Support ETL process
Problem
• It is sometimes too complicated to access web services data, extract it and make it part of the ETL process, especially if you need to develop access methods for external or new types of data.
Fix
• Data virtualization tools have access methods which can be used to easily extract data from web services, pre-process it and have it ready to load.
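A hand-rolled sketch of that access method, with the web-service call stubbed by a sample payload (the field names are hypothetical, for illustration only): fetch nested JSON and flatten it into rows an ETL job can load.

```python
import json

# Stand-in for a web-service response; a real fetch would happen over HTTP.
SAMPLE_PAYLOAD = json.dumps({
    "orders": [
        {"id": 101, "customer": {"name": "Acme"}, "total": 250.0},
        {"id": 102, "customer": {"name": "Globex"}, "total": 99.5},
    ]
})

def rows_from_service(payload: str):
    """Pre-process nested web-service JSON into flat (id, customer, total) rows."""
    doc = json.loads(payload)
    return [(o["id"], o["customer"]["name"], o["total"]) for o in doc["orders"]]

rows = rows_from_service(SAMPLE_PAYLOAD)
```

The flattened tuples can then go straight into a staging table, which is exactly the step virtualization tools automate.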
Data Warehouse Federation/Canonical
Problem
• Some organizations have multiple separate data
warehouses which may take too much effort to
integrate.
Fix
• Data virtualization allows you to quickly generate federated views of all these data warehouses and integrate their data for different services. Individual warehouses continue to operate with no interruptions. (Same goes for DWH migrations!)
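Federation can be sketched with SQLite’s `ATTACH`, using two separate database files as stand-ins for two independent warehouses (illustrative only; real federation tools span different servers and engines):

```python
import os
import sqlite3
import tempfile

# Two independent "warehouses" as separate SQLite files.
tmp = tempfile.mkdtemp()
for name, row in [("dwh_sales", (1, "EMEA", 500.0)),
                  ("dwh_finance", (2, "APAC", 750.0))]:
    db = sqlite3.connect(os.path.join(tmp, f"{name}.db"))
    db.execute("CREATE TABLE revenue (id INTEGER, region TEXT, amount REAL)")
    db.execute("INSERT INTO revenue VALUES (?, ?, ?)", row)
    db.commit()
    db.close()

# A third connection attaches both and exposes one federated view.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ? AS sales", (os.path.join(tmp, "dwh_sales.db"),))
conn.execute("ATTACH DATABASE ? AS finance", (os.path.join(tmp, "dwh_finance.db"),))

# TEMP view, because only temp views may reference attached databases.
conn.execute("""
    CREATE TEMP VIEW revenue_federated AS
    SELECT region, amount FROM sales.revenue
    UNION ALL
    SELECT region, amount FROM finance.revenue
""")
total = conn.execute("SELECT SUM(amount) FROM revenue_federated").fetchone()[0]
```

Each warehouse keeps operating untouched; nothing is copied, yet the federated view answers cross-warehouse queries.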
Use Cases
Data Warehouse prototyping
Problem
• Organizations are moving to agile development, where iterations and short-term sprints are key to delivering value on a weekly or bi-weekly basis.
Fix
• When data prototypes are built fast and validated by users, the result is a proven product that can then be materialized, saving time and therefore money.
Data Mashups
Problem
• Web mashups are enabled by APIs, but most corporate data sources do not have accessible APIs to support the mashup process.
Fix
• Data virtualization tools are enablers of mashups, since they use the same protocols and data delivery formats as APIs.
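A minimal sketch of that delivery-format point (the table and fields are made up for illustration): relational rows from a corporate source are shaped into the same JSON payload a web API would serve, ready for a mashup to consume. Only the payload shaping is shown; the HTTP layer is omitted.

```python
import json
import sqlite3

# A corporate data source with no API of its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (id INTEGER, city TEXT, lat REAL, lon REAL)")
conn.execute("INSERT INTO stores VALUES (1, 'Austin', 30.27, -97.74)")

def as_api_payload(cursor) -> str:
    """Deliver query results in the JSON shape a web API would use."""
    cols = [d[0] for d in cursor.description]
    return json.dumps({"data": [dict(zip(cols, row)) for row in cursor.fetchall()]})

payload = as_api_payload(conn.execute("SELECT * FROM stores"))
```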
Master Data on Steroids – Past, present and future data
Problem
• Master Data Hubs traditionally only hold identity and
descriptive information, but transactional data is
usually not stored in MDHs.
Fix
• With data virtualization, you can build a canonical layer that takes data from the MDH and other sources and enriches master data with summarized transactional data (e.g. adding customer value over time, purchasing forecasts, etc.)
So is ETL going away?
This does not mean ETL is not needed; it’s more about identifying when ETL is not
enough and using virtualization to enhance data integration! When ETL is too slow,
when data sources are difficult to access, or when data types are challenging.
Maybe in the future it’ll be the other way around, and we’ll look to ETL for cases
when data virtualization is not enough: for instance, when you need to perform highly
complex transformations that could impact performance in a virtual database.
Today, it is common to virtualize in development and materialize in production.
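That workflow reduces to one statement once the virtual model is validated. A minimal sketch (names are illustrative): iterate on a view in development, then materialize the proven shape for production.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_sales (region TEXT, amount REAL)")
conn.execute("INSERT INTO raw_sales VALUES ('EMEA', 500.0), ('APAC', 750.0)")

# Development: iterate cheaply on a view until users sign off.
conn.execute("""
    CREATE VIEW sales_by_region AS
    SELECT region, SUM(amount) AS total FROM raw_sales GROUP BY region
""")

# Production: materialize the validated shape for performance.
conn.execute("CREATE TABLE sales_by_region_mat AS SELECT * FROM sales_by_region")
mat = conn.execute(
    "SELECT region, total FROM sales_by_region_mat ORDER BY region"
).fetchall()
```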
Misconceptions
1. A VDB DOES NOT replace a DWH. A VDB enhances a DWH by:
• Combining structured and unstructured data into a single data layer
