An introduction to data virtualization in business intelligence
A brief description of what Data Virtualisation is and how it can be used to support business intelligence applications and development. Originally presented to the ETIS Conference in Riga, Latvia in October 2013

Published in: Technology, Business



  • 1. An Introduction to Data Virtualization in Business Intelligence. David M Walker, Data Management & Warehousing, 18 October 2013
  • 2. What Is Data Virtualization?
      •  Wikipedia: “Data virtualization is [..] an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located.”
      •  Or more simply: a solution that sits in front of multiple data sources and allows them to be treated as a single SQL database
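As a rough illustration of the "single SQL database" idea on slide 2, the sketch below uses Python's built-in `sqlite3` module. The schema names `crm` and `billing`, the tables, the sample rows and the function names are all assumptions made up for this example; two attached in-memory SQLite databases merely stand in for genuinely separate source systems, and a real DV platform would do far more (remote connectivity, query push-down, security).

```python
import sqlite3


def build_virtual_connection():
    """Return one connection that exposes two hypothetical sources."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a separate in-memory database, standing in for a
    # separate source system (the names are illustrative assumptions).
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")
    conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Joe Bloggs"), (2, "Jane Smith")],
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )
    return conn


def total_billed_per_customer(conn):
    """Join across both sources with one ordinary SQL statement."""
    query = """
        SELECT c.name, SUM(i.amount) AS total
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


rows = total_billed_per_customer(build_virtual_connection())
```

The caller writes a single join and never needs to know that the two tables live in different databases, which is the essence of the simpler definition above.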
  • 3. Basic Model
      •  End users access via reporting tools
      •  ETL treats the DV platform as a source
      •  Data publishing: batch/RESTful, message based, SOA/publication
      •  Data Virtualization Platform
          –  Defines a ‘model’ of the source systems (similar in concept to a BO Universe)
          –  Models can generally be layered on top of other models
      •  Traditional Databases
          –  IBM (DB2 & Netezza)
          –  Microsoft (SQL Server)
          –  Oracle (Oracle & MySQL)
          –  Postgres
          –  Sybase (ASE & IQ)
          –  Etc.
      •  NoSQL / NewSQL
          –  Apache Hadoop
          –  Cassandra
          –  Mongo
          –  Neo4J
          –  etc.
      •  Other Formats
          –  Microsoft Office
          –  Messaging
          –  Flat Files
          –  XML
          –  Web
          –  Cloud
          –  Application APIs
          –  etc.
  • 4. Advanced Features: Role Based Access Control & Data Masking
      •  Data Virtualization Platform: manages sensitive information based on a user's role (role based authentication)
      •  User 1 sees:
            First Name   Last Name   DoB           Salary
            Joe          Bloggs      30-Jan-1983   NULL
            Jane         Smith       17-Jun-1978   NULL
      •  User 2 sees:
            First Name   Last Name   Age
            Joe          Bloggs      30
            Jane         Smith       35
      •  Underlying source data:
            First Name   Last Name   DoB           Salary
            Joe          Bloggs      30-Jan-1983   €60,100
            Jane         Smith       17-Jun-1978   €75,400
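The masking shown on slide 4 can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular DV product implements it: the role names `user1` and `user2`, the function names and the choice of 18 October 2013 (the presentation date) as the reference date for ages are assumptions introduced here.

```python
from datetime import date

# Underlying rows, taken from the source table shown on the slide.
EMPLOYEES = [
    {"first_name": "Joe", "last_name": "Bloggs", "dob": date(1983, 1, 30), "salary": 60100},
    {"first_name": "Jane", "last_name": "Smith", "dob": date(1978, 6, 17), "salary": 75400},
]


def age_on(dob, as_of):
    """Whole years elapsed between dob and as_of."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)


def view_for_role(rows, role, as_of):
    """Return the rows as a given (hypothetical) role is allowed to see them."""
    if role == "user1":
        # Sees the date of birth, but salary is masked to NULL (None).
        return [{**r, "salary": None} for r in rows]
    if role == "user2":
        # Sees a derived age instead of the date of birth, and no salary column.
        return [
            {
                "first_name": r["first_name"],
                "last_name": r["last_name"],
                "age": age_on(r["dob"], as_of),
            }
            for r in rows
        ]
    raise PermissionError(f"role {role!r} has no access to this table")


as_of = date(2013, 10, 18)
user1_view = view_for_role(EMPLOYEES, "user1", as_of)
user2_view = view_for_role(EMPLOYEES, "user2", as_of)
```

Both views are derived from the same underlying rows at query time, so there is only one copy of the sensitive data to secure.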
  • 5. Advanced Features: Caching
      •  User sees performance as if all the data was local
      •  Data Virtualization Platform
          –  Cached copy of remote database table
      •  Local database table with good connectivity
      •  Remote database table with poor connectivity
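One simple way to picture the cached copy on slide 5 is a time-to-live (TTL) cache in front of a slow remote fetch. The slide does not say which refresh strategy DV platforms use, so TTL expiry is an assumption chosen for brevity, as are the names `CachedTable` and `fetch_remote_table`; the fake clock only exists to make the example deterministic.

```python
import time


class CachedTable:
    """Serve rows from a local copy, refreshing it once ttl_seconds have passed."""

    def __init__(self, fetch_remote, ttl_seconds, clock=time.monotonic):
        self._fetch_remote = fetch_remote
        self._ttl = ttl_seconds
        self._clock = clock
        self._rows = None
        self._loaded_at = None

    def rows(self):
        now = self._clock()
        stale = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if stale:
            self._rows = self._fetch_remote()
            self._loaded_at = now
        return self._rows


# Usage with a fake clock so the behaviour does not depend on real time.
fake_now = [0.0]
remote_calls = []


def fetch_remote_table():
    remote_calls.append(fake_now[0])  # record each trip to the remote source
    return [("Joe", "Bloggs"), ("Jane", "Smith")]


table = CachedTable(fetch_remote_table, ttl_seconds=60, clock=lambda: fake_now[0])
table.rows()        # first call goes to the remote source
table.rows()        # second call is served from the cached copy
fake_now[0] = 61.0
table.rows()        # the copy has expired, so the remote source is hit again
```

The TTL is the trade-off knob: a longer value protects the poorly connected source, a shorter one keeps the data fresher.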
  • 6. Advanced Features: Creating a Canonical Data Model
      •  User sees system as a single CDM and not multiple sources
      •  Data Virtualization Platform
          –  Data mapped to conform to a Canonical Model
      •  Source systems
          –  Finance System
          –  CRM System
          –  Billing System
          –  Website
          –  Other Systems
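Mapping sources onto a canonical model, as on slide 6, largely comes down to renaming and selecting fields per source. The sketch below assumes two hypothetical sources whose column names (`cust_id`, `account_no` and so on) are invented for the example, along with the canonical names and the `source_system` lineage field.

```python
# Hypothetical mappings from each source's column names to canonical names.
CANONICAL_MAPPINGS = {
    "crm": {"cust_id": "customer_id", "full_name": "customer_name"},
    "billing": {"account_no": "customer_id", "acct_name": "customer_name"},
}


def to_canonical(source, record):
    """Rename a source record's fields to the canonical model's names.

    Fields with no mapping are dropped, so consumers only ever see the
    canonical columns.
    """
    mapping = CANONICAL_MAPPINGS[source]
    canonical = {mapping[k]: v for k, v in record.items() if k in mapping}
    canonical["source_system"] = source  # keep lineage alongside the data
    return canonical


crm_row = to_canonical("crm", {"cust_id": 1, "full_name": "Joe Bloggs"})
billing_row = to_canonical(
    "billing", {"account_no": 1, "acct_name": "Joe Bloggs", "balance": 10}
)
```

Both rows come back with the same keys, so a report written against the canonical model does not change when another source is mapped in.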
  • 7. But it’s not a Silver Bullet
      •  Can be slow
          –  Depending on how much data has to be fetched from remote systems to the DV platform (platforms try to be smart to reduce this)
      •  Can impact performance on underlying systems
          –  Lots of BI users making queries on resource sensitive OLTP systems is not a good idea
      •  Requires resources
          –  Another set of servers, technologies, etc. to manage, but this cost is often offset against the reduction in complexity elsewhere
      •  Not a replacement: it is an additional tool
          –  You will still need ETL and messaging
  • 8. BI Use Cases: Agile Data Mart Design
      •  Access data warehouse data quickly and easily
      •  Design the data mart you think you want
      •  Test it with real data and your actual reporting tool
      •  Also possible with data warehouse design
      •  Diagram labels: Data Virtualization Platform, A or B, Data Warehouse
  • 9. BI Use Case: Virtual Data Marts
      •  Big Tin Appliance with lots of horsepower?
      •  Don’t want to duplicate data in the appliance and consume disk space for a data mart, but want the star schema for ease of use?
      •  Diagram labels: Data Virtualization Platform, Data Warehouse
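A virtual data mart as described on slide 9 can be approximated with plain SQL views: the star schema exists only as view definitions, so no rows are copied. The sketch below uses an in-memory SQLite database as a stand-in for the warehouse appliance; every table, view and column name is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalised warehouse tables.
    CREATE TABLE wh_customer (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE wh_order (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                           order_date TEXT, amount REAL);
    INSERT INTO wh_customer VALUES (1, 'Joe Bloggs', 'LV'), (2, 'Jane Smith', 'UK');
    INSERT INTO wh_order VALUES (10, 1, '2013-10-01', 20.0), (11, 2, '2013-10-02', 30.0);

    -- The virtual data mart: star-schema shaped views, no data copied.
    CREATE VIEW dim_customer AS
        SELECT customer_id AS customer_key, name, country FROM wh_customer;
    CREATE VIEW fact_sales AS
        SELECT order_id, customer_id AS customer_key, order_date, amount FROM wh_order;
""")


def sales_by_country(connection):
    """Query the star schema exactly as a reporting tool would."""
    return connection.execute("""
        SELECT d.country, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS d ON d.customer_key = f.customer_key
        GROUP BY d.country
        ORDER BY d.country
    """).fetchall()


before = sales_by_country(conn)
# A new warehouse row is visible through the views immediately.
conn.execute("INSERT INTO wh_order VALUES (12, 1, '2013-10-03', 5.0)")
after = sales_by_country(conn)
```

Because the views are evaluated at query time, the mart consumes no extra disk space, at the price of leaning on the appliance's processing power for every query.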
  • 10. BI Use Case: Data Mart Extensions
      •  Existing (physical) data mart
      •  New data source that needs to be incorporated quickly
      •  Create virtual copy of existing data mart and data source
      •  Integrate into updated data mart design
      •  Diagram labels: Data Virtualization Platform, Data Mart, New Data Source
  • 11. BI Use Case: Agile Set Based ELT Design
      •  If your normal ETL style is a series of set based SQL queries built on top of each other, then you can quickly prototype the ETL on the DV platform before moving it into your normal ETL engine to persist and execute it (normally for performance)
      •  Diagram labels: Data Virtualization Platform, Source, Source, Source
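The layered, set based style from slide 11 can be prototyped as a stack of views and only persisted once the logic is agreed. In the sketch below the source table, the two transformation steps and the final `CREATE TABLE ... AS SELECT` are illustrative assumptions; in practice the persisted step would live in the organisation's own ETL engine rather than in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source data with inconsistent names and a missing amount.
    CREATE TABLE src_orders (order_id INTEGER, customer TEXT, amount REAL);
    INSERT INTO src_orders VALUES
        (1, ' joe ', 10.0), (2, 'JOE', 5.0), (3, 'jane', 7.5), (4, 'jane', NULL);

    -- Step 1: cleanse (trim and upper-case names, drop rows with no amount).
    CREATE VIEW step1_clean AS
        SELECT order_id, UPPER(TRIM(customer)) AS customer, amount
        FROM src_orders
        WHERE amount IS NOT NULL;

    -- Step 2: aggregate, built on top of step 1.
    CREATE VIEW step2_totals AS
        SELECT customer, SUM(amount) AS total
        FROM step1_clean
        GROUP BY customer;
""")

# Prototype phase: every query re-runs the whole chain against live data.
prototype = conn.execute("SELECT * FROM step2_totals ORDER BY customer").fetchall()

# Once the logic is agreed, persist the result (a stand-in for moving the
# same SQL into the regular ETL engine).
conn.execute("CREATE TABLE customer_totals AS SELECT * FROM step2_totals")
persisted = conn.execute("SELECT * FROM customer_totals ORDER BY customer").fetchall()
```

Each step can be changed and re-queried in seconds during prototyping, and the persisted table returns the same rows without recomputing the chain.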
  • 12. BI Use Case: Big Data Integration
      •  DV Platform connects to Big Data sources
      •  Data sources are mapped into DV
      •  User accesses them via standard tools (SQL, RESTful interfaces, etc.)
      •  Diagram labels: SQL based tools, SQL Interface, Data Virtualization Platform, Map Reduce, etc. Interface
  • 13. BI Use Case: Source System Analysis
      •  Apply your data quality and data profiling tools to all your data sources
      •  Look for relationships across systems
      •  Remove limitations of accessibility by enabling caching so that you are not hitting the source system but have fresh data
      •  Diagram labels: Data Quality & Profiling Tools, Data Virtualization Platform, Source, Source, Source
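To make the profiling and cross-system relationship checks on slide 13 concrete, here is a deliberately small sketch. It is not a substitute for a data quality tool: the two metrics (null counts and distinct counts), the overlap measure and the sample `crm` and `billing` rows are assumptions chosen for illustration.

```python
def profile(rows):
    """Return {column: {"nulls": n, "distinct": n}} for a list of dict rows."""
    stats = {}
    for row in rows:
        for column, value in row.items():
            entry = stats.setdefault(column, {"nulls": 0, "values": set()})
            if value is None:
                entry["nulls"] += 1
            else:
                entry["values"].add(value)
    return {
        column: {"nulls": entry["nulls"], "distinct": len(entry["values"])}
        for column, entry in stats.items()
    }


def key_overlap(rows_a, column_a, rows_b, column_b):
    """Fraction of distinct non-null values in A's column also found in B's."""
    a = {r[column_a] for r in rows_a if r[column_a] is not None}
    b = {r[column_b] for r in rows_b if r[column_b] is not None}
    return len(a & b) / len(a) if a else 0.0


# Hypothetical rows from two source systems exposed through the DV platform.
crm = [{"cust_id": 1, "email": "joe@example.com"}, {"cust_id": 2, "email": None}]
billing = [{"account_no": 1}, {"account_no": 3}]

crm_profile = profile(crm)
overlap = key_overlap(crm, "cust_id", billing, "account_no")
```

A high overlap between two columns in different systems hints that they may hold the same business key, which is the kind of cross-system relationship the slide suggests looking for.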
  • 14. BI Use Case: Data Masking
      •  Currently building two versions of a data mart, one with sensitive data in and one without
      •  Instead build one and use Role Based Access Control (RBAC) to restrict what an individual can see
      •  Diagram labels: Data Virtualization Platform and Physical Data Mart
  • 15. BI Use Cases
      •  Some examples
          –  Usefulness of each example depends on the organization
      •  Generally an enabler for more agility
          –  Quicker prototyping and integration
      •  Will not solve all your problems
          –  And has a cost associated with it (license & hardware)
  • 16. Vendors: What The Analysts Say
      •  Forrester Wave Data Virtualization Q1 2012
      •  Forrester Wave Q1/12
          –  Informatica
          –  IBM
          –  Denodo
              •  EU (Spanish) origins
          –  Composite
              •  Now part of Cisco
              •  Was OEM’d by Informatica
          –  Microsoft
          –  SAP
          –  And others
      •  Gartner
          –  No Magic Quadrant, instead includes Data Virtualization in Data Integration
  • 17. Vendors: Product Positioning
      •  Stand Alone
          –  Players: Cisco (Composite), Denodo
          –  Selection: popular where IBM/Informatica are not already embedded
      •  Integrated
          –  Players: IBM, Informatica
          –  Selection: popular with organisations that already have the vendor ETL tool
  • 18. An Introduction to Data Virtualization in Business Intelligence. David M Walker, Data Management & Warehousing. Thank you (Paldies)