Why Data Virtualization? An Introduction by Denodo

Data Virtualization means Real-time Data Access and Integration. But why do I need it? This presentation tries to answer it in a simple yet clear way.
By Alberto Pan, CTO of Denodo, and Justo Hidalgo, VP Product Management.

  • Collaboration: self-documenting model, but also actionable. Rapid prototyping platform.

Why Data Virtualization? An Introduction by Denodo: Presentation Transcript

  • What is Data Virtualization and Why It Matters to You
    • Alberto Pan, CTO
    • Justo Hidalgo, VP Product Management & Consulting
    • Denodo Technologies
  • Contents
    • Why Data Virtualization?
    • Productivity
    • Distributed Query Optimization
    • Layer Independence
    • Governance
    • Data Quality
    • Architecture
  • Our Goal: Serving the Information Barista
  • GREAT, BUT WHAT’S THE PROBLEM?
  • Disjoint Views of Entities: the Elements
    • Customer data spread over different and heterogeneous data sources
    • Too much effort to locate and obtain the data.
    • Data need to be not only extracted, but combined among different applications, interfaces and formats.
    • Sources:
      • Log files (.txt/.log files)
      • CRM (MySQL)
      • Billing System (Web Service - REST)
      • Incidences System (Web Application)
      • Inventory System (MS SQL Server)
      • Product Catalog (Web Service - SOAP)
      • Knowledge Base (Internet)
      • Product Data (CSV)
  • It Would be So Nice If…
  • Happy Ending: Single View of Element, Virtual Integration
    • Homogeneous access to all data
    • Connectors shown: JDBC, ODBC, WS, CSV, XML, Web, flat files
    • Sources:
      • CRM (MySQL)
      • Billing System (Web Service - REST)
      • Incidences System (Web Application)
      • Inventory System (MS SQL Server)
      • Product Catalog (Web Service - SOAP)
      • Knowledge Base
      • Product Data (CSV)
      • Log files (.txt/.log files)
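The idea of homogeneous access can be sketched in a few lines of Python. This is only an illustration, not how the Denodo Platform is implemented: the `purchases` table, the CSV contents and all function names are hypothetical. Each source is wrapped so that it yields plain dictionaries, and a virtual view combines them at query time without copying data into a new store.

```python
import csv
import io
import sqlite3

# Hypothetical sources: a CRM table in a SQL database and product data in CSV.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE purchases (customer TEXT, product_id TEXT)")
crm.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("alice", "p1"), ("bob", "p2"), ("alice", "p2")],
)

PRODUCT_CSV = "product_id,name,price\np1,Router,49.90\np2,Modem,29.50\n"


def product_rows():
    """Wrap the CSV source so it yields dict rows, like any other source."""
    return csv.DictReader(io.StringIO(PRODUCT_CSV))


def purchase_rows():
    """Wrap the SQL source behind the same row-of-dicts interface."""
    cur = crm.execute("SELECT customer, product_id FROM purchases ORDER BY rowid")
    for customer, product_id in cur:
        yield {"customer": customer, "product_id": product_id}


def customer_purchases_view():
    """Virtual view: rows are combined on demand, nothing is stored."""
    products = {row["product_id"]: row for row in product_rows()}
    for purchase in purchase_rows():
        product = products[purchase["product_id"]]
        yield {
            "customer": purchase["customer"],
            "product": product["name"],
            "price": float(product["price"]),
        }


view = list(customer_purchases_view())
```

A consuming application only sees `customer_purchases_view()`; it does not need to know that one half of each row came from a database and the other from a file.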
  • BUT, WHY A DATA VIRTUALIZATION LAYER ?
  • DIDN’T WE HAVE ENOUGH WITH ETL, ESB, EAI, WS, …?
  • So, We Went and Asked our Experts
  • Why a Data Virtualization Layer?
    • Productivity
    • Distributed Query Optimization
    • Physical and Logical independence
    • Governance
    • Data Quality
  • PRODUCTIVITY (because time is money)
    • Built-in connectors for data sources
    • Complex Data Combination operations do not need to be programmed
    Productivity… [Diagram: applications and third-party tools (enterprise applications, BI, portals, dashboards, web applications…) consume views built with union and join operators: sources with columns NAME, DESCRIPTION, PRICE and NAME, MANUFACTURER, SCORE are combined into NAME, DESCRIPTION, PRICE, MANUFACTURER, SCORE]
    • Applications do not need to deal with complex data-related issues
      • E.g. swapping of large result sets
      • E.g. caching of costly result sets
      • E.g. management of changes in the sources is done in the DV layer, leaving the business layer unaffected
    • Collaboration and Prototyping
      • Virtualization allows rapid prototyping and testing
    … Productivity…
    • Uniform access
      • Developers use a single model and API instead of learning a mixture of different APIs
      • Learning and execution curves are lower for every additional project on top of the DV layer
    … Productivity
    • Multi-access
      • A Data Virtualization layer can offer the most appropriate access type for each application (JDBC, Web Service, Sharepoint widget…)
  • DISTRIBUTED QUERY OPTIMIZATION (because customers are waiting)
    • Multiple execution strategies available
      • Performance of a distributed join query may vary enormously depending on the method used
        • e.g. hash join, merge join, nested loop join, …
      • Even if the join is among the same data views, the optimum method may be different for different queries.
    Distributed Query Optimization…
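To make the difference between join methods concrete, here is a minimal sketch of a nested loop join and a hash join over in-memory rows. The `books` and `reviews` data are made up for illustration, and the counters are a rough stand-in for cost, not a real optimizer metric. Both functions return the same rows, but they do a very different amount of work.

```python
def nested_loop_join(left, right, key):
    """Compare every left row with every right row."""
    comparisons = 0
    result = []
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row[key] == r_row[key]:
                result.append({**l_row, **r_row})
    return result, comparisons


def hash_join(left, right, key):
    """Build a hash table on the right input, then probe it once per left row."""
    buckets = {}
    for r_row in right:
        buckets.setdefault(r_row[key], []).append(r_row)
    probes = 0
    result = []
    for l_row in left:
        probes += 1
        for r_row in buckets.get(l_row[key], []):
            result.append({**l_row, **r_row})
    return result, probes


# Hypothetical data: books from a bookstore and reviews from another source.
books = [
    {"isbn": "1", "title": "A"},
    {"isbn": "2", "title": "B"},
    {"isbn": "3", "title": "C"},
]
reviews = [
    {"isbn": "1", "score": 5},
    {"isbn": "2", "score": 3},
    {"isbn": "1", "score": 4},
    {"isbn": "9", "score": 1},
]

nl_rows, nl_comparisons = nested_loop_join(books, reviews, "isbn")
hj_rows, hj_probes = hash_join(books, reviews, "isbn")
```

With 3 books and 4 reviews the nested loop performs 12 comparisons while the hash join probes its table 3 times. The hash join pays for building the table first, which is why the best method depends on input sizes and on what each source can do.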
    • The final Executable Plan depends on characteristics such as
      • Strategies
      • Sources
      • Order
    [Diagram: logical plan and candidate physical plans over BOOK REVIEW, BOOKSTORE A and BOOKSTORE B, using hash join and nested loop (NL) join]
    • Source query limitations
    • Push processing to data sources
    • Materialization: pre-load frequently used data and exploit temporal locality
    … Distributed Query Optimization [Diagram: join pushed down (delegated) into the data source]
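Pushing processing to the source can be illustrated with SQLite standing in for a data source. This is a sketch under the assumption that both tables live in the same source and that the source can execute joins; the `customer` and `invoice` tables and both function names are hypothetical. The first function pulls every row into the virtualization layer and joins there; the second delegates the join and the filter to the source, so only the matching rows are transferred.

```python
import sqlite3

# Hypothetical data source holding two related tables.
source = sqlite3.connect(":memory:")
source.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice (customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO invoice VALUES (1, 10.0), (1, 15.0), (2, 7.5), (3, 2.0), (3, 4.0);
    """
)


def join_in_layer(name):
    """Fetch both tables completely, then join and filter in the layer."""
    customers = source.execute("SELECT id, name FROM customer").fetchall()
    invoices = source.execute("SELECT customer_id, amount FROM invoice").fetchall()
    transferred = len(customers) + len(invoices)
    ids = {cid for cid, cname in customers if cname == name}
    amounts = sorted(amount for cid, amount in invoices if cid in ids)
    return amounts, transferred


def join_pushed_down(name):
    """Delegate the join and the filter to the source as a single query."""
    rows = source.execute(
        "SELECT i.amount FROM invoice AS i "
        "JOIN customer AS c ON c.id = i.customer_id "
        "WHERE c.name = ? ORDER BY i.amount",
        (name,),
    ).fetchall()
    return [amount for (amount,) in rows], len(rows)


layer_amounts, layer_transferred = join_in_layer("alice")
pushed_amounts, pushed_transferred = join_pushed_down("alice")
```

Both strategies return the same amounts, but the first moves 8 rows out of the source and the second moves 2. Source query limitations matter here: a source that cannot execute joins (a CSV file, for example) leaves only the first option.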
  • Physical and Logical Independence…
    • Applications are independent of changes in data source location, implementation (e.g. from legacy to new system) and schema. E.g.
      • A mainframe is replaced by a new system.
      • Customer data now comes from two systems instead of one due to a merger/acquisition.
      • Two applications are reengineered into a single one.
      • The data schema of a data source changes.
  • Let each tool do its business! (… Physical and Logical Independence…)
    • An ESB is good at orchestrating business services.
    • Data Virtualization is good at accessing information repositories, homogenizing them and turning them into services.
    • Changes need to be done in a single place. E.g. the way to determine if a customer is ‘VIP’ changes.
      • Many applications will use this data field.
      • In some applications (e.g. BRMS systems) the field can be used many times.
    … Physical and Logical Independence
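The 'VIP' example can be sketched as a derived field that is computed once in the virtual layer and consumed by several applications. Everything here is hypothetical: the threshold, the `yearly_spend` field and the two consumer functions are only there to show that a change to the rule touches a single place.

```python
# Hypothetical business rule, defined once in the virtualization layer.
VIP_THRESHOLD = 10_000


def customer_view(raw_customers):
    """Expose source rows plus a derived is_vip field."""
    for customer in raw_customers:
        yield {**customer, "is_vip": customer["yearly_spend"] >= VIP_THRESHOLD}


def billing_discount(customer):
    """Hypothetical consumer 1: relies on is_vip, not on how it is computed."""
    return 0.10 if customer["is_vip"] else 0.0


def support_queue(customer):
    """Hypothetical consumer 2: same field, different use."""
    return "priority" if customer["is_vip"] else "standard"


raw = [
    {"name": "alice", "yearly_spend": 12_500},
    {"name": "bob", "yearly_spend": 800},
]
customers = list(customer_view(raw))
discounts = [billing_discount(c) for c in customers]
queues = [support_queue(c) for c in customers]
```

If the definition of 'VIP' changes, only `customer_view` (or `VIP_THRESHOLD`) is edited; `billing_discount` and `support_queue` stay untouched.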
  • GOVERNANCE (because 24x7 matters)
    • Single entry point for data auditing:
      • Track data and metadata changes.
      • E.g. which user last modified a given view?
    • Single point to introspect and query metadata.
      • What is the schema provided by any data source?
    Governance…
    • Change impact management. Single point to answer questions like:
      • What are the consequences of a change in a data source?
      • Where does the data used by applications come from?
      • What transformations are applied to source data before they are consumed by applications?
    … Governance…
    • Single entry point for data monitoring:
      • Track data sources and data services usage. E.g.
        • How does the number of concurrent connections to a data source evolve throughout the day?
        • Send me an e-mail alert if at least 10% of the last 100 queries to a data source failed.
    • Security:
      • Provide authentication and authorization mechanisms for data access.
      • Provide data encryption capabilities.
    • Protect data sources:
      • Limit concurrent queries to a certain data source.
      • Cache all or part of the data.
      • Limit data replication needs at the data source level.
    … Governance
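The monitoring rule mentioned above ("at least 10% of the last 100 queries to a data source failed") maps naturally onto a sliding window. The sketch below is one possible way to express it and is not taken from any Denodo API; the class and parameter names are assumptions, and sending the actual e-mail is left out.

```python
from collections import deque


class FailureRateMonitor:
    """Track the outcome of the most recent queries to one data source."""

    def __init__(self, window=100, threshold_percent=10):
        # deque with maxlen drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=window)
        self.threshold_percent = threshold_percent

    def record(self, failed):
        """Record one query outcome and report whether an alert is due."""
        self.outcomes.append(bool(failed))
        return self.should_alert()

    def should_alert(self):
        # Wait until the window is full so early failures do not over-trigger.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        failures = sum(self.outcomes)
        # Integer arithmetic avoids floating point rounding at the boundary.
        return failures * 100 >= self.threshold_percent * len(self.outcomes)


monitor = FailureRateMonitor()
for _ in range(91):
    monitor.record(failed=False)
alerts = [monitor.record(failed=True) for _ in range(10)]
```

In this run the first nine failures do not trigger an alert (the window is not yet full, or it holds 9 failures out of 100), and the tenth one does. Because every query goes through the virtualization layer, a rule like this needs to be implemented only once.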
  • DATA QUALITY (because reliability matters)
    • Many data quality actions can be applied at this layer, which avoids duplicating them in every data source or application.
    Data Quality
  • … AND WHAT CAN WE DO WITH THESE PIECES?
  • Data Virtualization Detailed Architecture…
  • WRAPPING UP
  • Denodo Platform 4.6 – Virtualized Data Services in Less Time
    • Improved connectivity with Enterprise Ecosystem
      • Sources Connectivity, Middleware and DQ Tools, Publish level
    • Improved Productivity & Ease of Use for
      • Application Developer (connectivity, web integration, etc.) and
      • Data Management Professional (metadata, governance, etc.)
    • Benefits to Business
      • Rapid access to real-time data from disparate sources for:
        • Agile Reporting and Operational BI / Dashboards
        • Customer Service Operations, Customer Portals
      • Web Integration becomes “mainstream”
  • You might want to start small …
  • … but you can get very far with Data Virtualization!
  • www.denodo.com | info@denodo.com