
Data Integration, Interoperability and Virtualization


This presentation covers the Data Integration and Interoperability section of the DMBOK2, with a focus on data virtualization as a key tool of data integration.

Published in: Technology


  1. Let’s talk data: Integration, Interoperability and Virtualization. Presented by: 8/8/17
  2. DII – the new kid on the block
     Data Integration and Interoperability (DII) describes processes related to the movement and consolidation of data within and between data stores, applications and organizations.
     Why we’re geeking out over DII:
     1. SOA/microservices are becoming more popular.
     2. Integration of structured and unstructured data.
     3. Delivering value faster and avoiding rogue users.
     4. Although DII is not new, the DMBOK provides clear guidelines for organizations aiming to become more efficient through IT.
  3. Data Interoperability
     Data interoperability is the ability of multiple systems to communicate.
     Monoliths vs. SOA/microservices
  4. Data Integration
     Integration consolidates data into consistent forms, either physical or virtual.
  5. History
     Data virtualization has existed since Bill Inmon popularized the data warehouse in the 1990s, but virtual models back then were not very popular due to the lack of computing power available (or accessible). Today, changes in data types and business expectations about information velocity have made virtualization a more popular concept.
     Did you know? The last time Bill Inmon wrote about data virtualization, he compared it to a frustrating whack-a-mole game: no matter how much you hit the mole, it keeps coming back! http://www.b-eye-
  6. Virtual vs. Physical
     Let’s take a look at your future.
  7. Why data virtualization?
     Fast and easy
     • Rapid data integration, which enables a faster time to solution.
     • Integrations and changes are easy (no need to update extractions, tables or data marts).
     Integrate more!
     • Opportunity to integrate structured and unstructured data.
     Cheaper and more secure
     • Less expensive to maintain.
     • No need to replicate data.
     • Reduces the overhead of managing data integration systems (easier + faster = fewer required resources).
     Agile
     • Enables iterative development with quick deliverables. (Note: very important, since in most cases users don’t know what they want, leading to many iterations.)
     • Developers can focus on the business instead of the mechanics of data manipulation, because data virtualization tools connect automatically to many data sources.
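The “virtual, not physical” idea above can be sketched with an ordinary SQL view: the integrated result is defined once and computed on demand, with no data replicated. This is a minimal, hypothetical illustration using Python’s built-in sqlite3 module; the table and column names (`crm_customers`, `erp_orders`) are invented for the example and do not come from the presentation.

```python
import sqlite3

# Two "source systems" stand in for separate applications feeding integration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE crm_customers (id INTEGER, name TEXT);
    CREATE TABLE erp_orders (customer_id INTEGER, amount REAL);
    INSERT INTO crm_customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO erp_orders VALUES (1, 100.0), (1, 250.0), (2, 75.0);
    -- The "virtual" integration layer: a view, so no data is copied.
    CREATE VIEW customer_revenue AS
        SELECT c.name, SUM(o.amount) AS revenue
        FROM crm_customers c
        JOIN erp_orders o ON o.customer_id = c.id
        GROUP BY c.name;
""")

rows = con.execute("SELECT * FROM customer_revenue ORDER BY name").fetchall()
print(rows)  # [('Acme', 350.0), ('Globex', 75.0)]
```

A change to the integration (say, a new column or filter) only means redefining the view, which is the “integrations and changes are easy” point above; no extractions or materialized tables need to be reloaded.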
  8. Use Cases
     Data warehouse augmentation
     Problem: Bringing new data sources into a data warehouse takes a significant amount of effort, even more so if the sources include unstructured data.
     Fix: Data virtualization can augment an existing data warehouse with virtual views that incorporate unstructured data.
     Supporting the ETL process
     Problem: It is sometimes too complicated to access web services data, extract it and make it part of the ETL, especially if you need to develop access methods for external or new types of data.
     Fix: Data virtualization tools have access methods that can easily extract data from web services, pre-process it and have it ready for the data warehouse.
     Data warehouse federation / canonical layer
     Problem: Some organizations have multiple separate data warehouses that would take too much effort to integrate.
     Fix: Data virtualization can quickly generate federated views across all these warehouses and integrate their data for different services, while the individual warehouses continue to operate with no interruptions. (The same applies to DWH migrations!)
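The federation use case can be sketched in miniature: a view spanning two physically separate stores, queried as one, while each store keeps operating untouched. This is a hypothetical sketch using SQLite’s ATTACH as a stand-in for a data-virtualization connector; the warehouse and table names (`dw_b`, `sales_eu`, `sales_us`) are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")                   # "warehouse A"
con.execute("ATTACH DATABASE ':memory:' AS dw_b")   # "warehouse B", a separate store
con.executescript("""
    CREATE TABLE main.sales_eu (region TEXT, amount REAL);
    CREATE TABLE dw_b.sales_us (region TEXT, amount REAL);
    INSERT INTO main.sales_eu VALUES ('DE', 10.0), ('FR', 20.0);
    INSERT INTO dw_b.sales_us VALUES ('CA', 30.0);
    -- Federated (canonical) view over both warehouses; neither is modified.
    -- A TEMP view is used because SQLite lets temp views span databases.
    CREATE TEMP VIEW global_sales AS
        SELECT region, amount FROM main.sales_eu
        UNION ALL
        SELECT region, amount FROM dw_b.sales_us;
""")

total = con.execute("SELECT SUM(amount) FROM global_sales").fetchone()[0]
print(total)  # 60.0
```

Each underlying warehouse can still be queried and loaded independently; only the federated view needs to change if one of them migrates, which is the DWH-migration point above.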
  9. Use Cases
     Data warehouse prototyping
     Problem: Organizations are moving to agile development, where iterations and short sprints are key to delivering value on a weekly or bi-weekly basis.
     Fix: When data prototypes are built fast and validated by users, the result is a proven product that can then be materialized, saving time and therefore money.
     Data mashups
     Problem: Web mashups are enabled by APIs, and most corporate data sources do not have accessible APIs to support the mashup process.
     Fix: Data virtualization tools are enablers of mashups, since they use the same protocols and data delivery formats as APIs.
     Master data on steroids: past, present and future data
     Problem: Master Data Hubs traditionally hold only identity and descriptive information; transactional data is usually not stored in MDHs.
     Fix: With data virtualization, you can build a canonical layer that takes data from the MDH and other sources and enriches master data with summarized transactional data (e.g. adding the value of a customer over time, a purchasing forecast, etc.).
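The master-data enrichment fix can be sketched as a canonical layer computed on demand: master records hold only identity and description, and transactional totals are joined in virtually, with nothing written back to the hub. This is a minimal, hypothetical illustration; the record shapes and the `lifetime_value` field are invented for the example.

```python
from collections import defaultdict

# A stand-in Master Data Hub: identity and descriptive attributes only.
master_hub = {
    "C1": {"name": "Acme", "segment": "Retail"},
    "C2": {"name": "Globex", "segment": "Industrial"},
}
# Transactional data from another source: (customer_id, amount) pairs.
transactions = [("C1", 100.0), ("C1", 250.0), ("C2", 75.0)]

# Summarize transactions per customer.
totals = defaultdict(float)
for customer_id, amount in transactions:
    totals[customer_id] += amount

# The "canonical layer": master data enriched with summarized transactions,
# computed on demand; the hub itself is never modified.
enriched = {
    cid: {**attrs, "lifetime_value": totals[cid]}
    for cid, attrs in master_hub.items()
}
print(enriched["C1"]["lifetime_value"])  # 350.0
```

In a real deployment the join would be a virtual view over the MDH and the transactional store, but the shape of the result is the same: master attributes plus derived measures such as customer value over time.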
  10. So is ETL going away?
     This does not mean ETL is not needed; it is more about identifying when ETL is not enough and using virtualization to enhance data integration: when ETL is too slow, when data sources are difficult to access, or when data types are challenging. Maybe in the future it will be the other way around, and we will turn to ETL for the cases where data virtualization is not enough, for instance when you need to perform highly complex transformations that could impact performance in a virtual database. Today, it is common to virtualize in development and materialize in production.
     Misconceptions
     1. A VDB DOES NOT replace a DWH. A VDB enhances the DWH by:
     • Combining structured and unstructured data into a single data layer