
Analyst View of Data Virtualization: Conversations with Boulder Business Intelligence

In this presentation, executives from Denodo preview the new Denodo Platform 6.0 release, which delivers the Dynamic Query Optimizer, a cloud offering on Amazon Web Services, and self-service data discovery and search. Over 30 analysts, led by Claudia Imhoff, provide input on the strategic direction and benefits of Denodo 6.0 to data virtualization and the broader data integration market.

This presentation is part of the Fast Data Strategy Conference; you can watch the video at goo.gl/DR6r3m.



  1. 1. Analyst View of Data Virtualization: Conversations with Boulder Business Intelligence
  2. 2. Speakers Ravi Shankar Chief Marketing Officer Pablo Alvarez Principal Technical Account Manager @Ravi_Shankar_ @Denodo
  3. 3. Agenda 1. About Denodo 2. Data Virtualization Overview 3. Denodo 6.0 Overview 4. Denodo 6.0 Virtual Launch
  4. 4. About Denodo
  5. 5. HEADQUARTERS Palo Alto, CA. DENODO OFFICES, CUSTOMERS, PARTNERS Global presence throughout North America, EMEA, APAC, and Latin America. CUSTOMERS 250+ customers, including many F500 and G2000 companies across every major industry, have gained significant business agility and ROI. LEADERSHIP • Longest continuous focus on data virtualization and data services. • Product leadership. • Solutions expertise. 5 THE LEADER IN DATA VIRTUALIZATION Denodo provides agile, high performance data integration and data abstraction across the broadest range of enterprise, cloud, big data, and unstructured data sources, and real-time data services at half the cost of traditional approaches.
  6. 6. Award-Winning Data Virtualization Leader 6 Forrester Wave: Enterprise Data Virtualization Forrester Wave: Enterprise Data Virtualization, Q1 2015 2015 Magic Quadrant for Data Integration Tools 2015 Leader in Forrester Wave: Enterprise Data Virtualization. 2015 Technology Innovation Award for Information Management 2015 #1 Readers Choice Awards For Data Virtualization Platforms 2015 Rank Companies that Matters Most in Data 2015 Big Data 50 – Companies Driving Innovation 2015 Leadership Award in Big Data For Denodo Customer Autodesk Trend-Setting Products in Data and Information Management for 2016 2016 Premier 100 Technology Leader For Denodo Customer CIT
  7. 7. Data Virtualization
  8. 8. The Business Need 8 Ready Access to Critical Information to Support Business Processes Marketing Sales Executive Support Customers Warranties Channels Products Access to complete information Access to related information Access in real-time Cross-sell / Up-sell
  9. 9. Manually access different systems Not productive – slows down response times IT responds with point-to-point data integration The Challenge 9 Data Is Siloed Across Disparate Systems Marketing Sales Executive Support Database Apps Warehouse Cloud Big Data Documents Apps NoSQL
  10. 10. The Solution 10 Data Abstraction Layer Abstracts access to disparate data sources Acts as a single repository (virtual) Makes data available in real-time to consumers 10 DATA ABSTRACTION LAYER
  11. 11. Data Virtualization 11 Data Abstraction Layer Publishes the data to applications Combines related data into views Connects to disparate data sources 2 3 1
  12. 12. Benefits of Data Virtualization 12 Data Virtualization Better Data Integration Lower integration costs by 80%. Flexibility to change. Real-time (on-demand) data services. Complete Information Focus on business information needs. Include web / cloud, big data, unstructured, streaming. Bigger volumes, richer/easier access to data. Better Business Outcome Projects in 4-6 weeks. ROI in <6 months. Adds new IT and business capabilities
  13. 13. Problem Solution Results Case Study 13 Autodesk Successfully Changes Their Revenue Model and Transforms Business • Autodesk was changing their business revenue model from a conventional perpetual license model to a subscription-based license model. • Inability to deliver high quality data in a timely manner to business stakeholders. • Evolution from a traditional operational data warehouse to a contemporary logical data warehouse was deemed necessary for faster speed. • General purpose platform to deliver data through the logical data warehouse. • Denodo Abstraction Layer helps live invoicing with SAP. • Data virtualization enabled a culture of “see before you build”. • Successfully transitioned to subscription-based licensing. • For the first time, Autodesk can do single point security enforcement and have a uniform data environment for access. Autodesk, Inc. is an American multinational software corporation that makes software for the architecture, engineering, construction, manufacturing, media, and entertainment industries.
  14. 14. Denodo Platform 6.0
  15. 15. Accelerate Your Fast Data Strategy With Denodo Platform 6.0 Dynamic Query Optimizer In the Cloud Self Service Data Discovery and Search Best Real-time Performance. Shortest Time to Data. Rapid Decision Making.
  16. 16. Accelerate Your Fast Data Strategy with Denodo Platform 6.0 16 New Release of Denodo Platform Delivers Breakthrough Performance, Accelerates Adoption, and Expedites Business Use of Data Breakthrough Performance Dynamic Query Optimizer delivers breakthrough performance for big data, logical data warehouse, and operational scenarios Data Virtualization in the Cloud Denodo Platform for AWS accelerates adoption of data virtualization Self-Service Data Discovery and Search Self-service data discovery and search expedites use of data by business users
  17. 17. Dynamic Query Optimizer 17 Delivers Breakthrough Performance for Big Data, Logical Data Warehouse, and Operational Scenarios Dynamically determines lowest-cost query execution plan based on statistics Factors in all the special characteristics of big data sources such as number of processing units and partitions Can easily handle any number of incremental queries Enables connectivity to the broadest array of big data sources such as Redshift, Impala, Spark. Best dynamic query optimization engine.
  18. 18. How Dynamic Query Optimizer Works 18 Example: Mining external dimensions with EDW Total sales by retailer and product during the last month for the brand ACME Time Dimension Fact table (sales) Product Dimension Retailer Dimension EDW MDM SELECT retailer.name, product.name, SUM(sales.amount) FROM sales JOIN retailer ON sales.retailer_fk = retailer.id JOIN product ON sales.product_fk = product.id JOIN time ON sales.time_fk = time.id WHERE time.date < ADDMONTH(NOW(),-1) AND product.brand = ‘ACME’ GROUP BY product.name, retailer.name
  19. 19. How Dynamic Query Optimizer Works 19 Example: Non-optimized 1,000,000,000 rows JOIN JOIN JOIN GROUP BY product.name, retailer.name 100 rows 10 rows 30 rows 10,000,000 rows SELECT sales.retailer_fk, sales.product_fk, sales.time_fk, sales.amount FROM sales SELECT retailer.name, retailer.id FROM retailer SELECT product.name, product.id FROM product WHERE product.brand = ‘ACME’ SELECT time.date, time.id FROM time WHERE time.date < add_months(CURRENT_TIMESTAMP, -1)
  20. 20. How Dynamic Query Optimizer Works 20 Step 1: Applies JOIN reordering to maximize delegation 100,000,000 rows JOIN JOIN 100 rows 10 rows 10,000,000 rows GROUP BY product.name, retailer.name SELECT sales.retailer_fk, sales.product_fk, sales.amount FROM sales JOIN time ON sales.time_fk = time.id WHERE time.date < add_months(CURRENT_TIMESTAMP, -1) SELECT retailer.name, retailer.id FROM retailer SELECT product.name, product.id FROM product WHERE product.brand = ‘ACME’
  21. 21. How Dynamic Query Optimizer Works 21 Step 2 100,000 rows JOIN JOIN 100 rows 10 rows 1,000 rows GROUP BY product.name, retailer.name Since the JOIN is on foreign keys (1-to-many), and the GROUP BY is on attributes from the dimensions, it applies the partial aggregation push-down optimization SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount) FROM sales JOIN time ON sales.time_fk = time.id WHERE time.date < add_months(CURRENT_TIMESTAMP, -1) GROUP BY sales.retailer_fk, sales.product_fk SELECT retailer.name, retailer.id FROM retailer SELECT product.name, product.id FROM product WHERE product.brand = ‘ACME’
  22. 22. How Dynamic Query Optimizer Works 22 Step 3 Selects the right JOIN strategy based on costs for data volume estimations 10,000 rows NESTED JOIN HASH JOIN 100 rows 10 rows 1,000 rows GROUP BY product.name, retailer.name SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount) FROM sales JOIN time ON sales.time_fk = time.id WHERE time.date < add_months(CURRENT_TIMESTAMP, -1) GROUP BY sales.retailer_fk, sales.product_fk WHERE product.id IN (1,2,…) SELECT retailer.name, retailer.id FROM retailer SELECT product.name, product.id FROM product WHERE product.brand = ‘ACME’
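The rewrite in Steps 1–3 can be reproduced end to end. The following is a minimal sketch using Python's sqlite3 module as a stand-in for the data warehouse; the schema mirrors the slides, but the rows and values are invented for illustration. It checks that the partially aggregated plan returns the same result as the naive join-then-aggregate plan.

```python
# Minimal sketch of partial aggregation push-down, using sqlite3 as a
# stand-in for the data warehouse. Schema mirrors the slides; the rows
# are invented for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
CREATE TABLE retailer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales    (retailer_fk INTEGER, product_fk INTEGER, amount REAL);
INSERT INTO product  VALUES (1,'Widget','ACME'),(2,'Gadget','ACME'),(3,'Other','XYZ');
INSERT INTO retailer VALUES (10,'North'),(20,'South');
INSERT INTO sales    VALUES (10,1,5.0),(10,1,7.0),(20,2,3.0),(20,3,9.0);
""")

# Non-optimized plan: ship the whole fact table, join, then aggregate.
naive = con.execute("""
    SELECT retailer.name, product.name, SUM(sales.amount)
    FROM sales
    JOIN retailer ON sales.retailer_fk = retailer.id
    JOIN product  ON sales.product_fk  = product.id
    WHERE product.brand = 'ACME'
    GROUP BY product.name, retailer.name
    ORDER BY 1, 2
""").fetchall()

# Pushed-down plan: the inner GROUP BY on the foreign keys is what would
# be delegated to the warehouse, shrinking the rows crossing the network;
# the outer joins and final GROUP BY run in the virtualization layer.
pushed = con.execute("""
    SELECT retailer.name, product.name, SUM(s.total)
    FROM (SELECT retailer_fk, product_fk, SUM(amount) AS total
          FROM sales GROUP BY retailer_fk, product_fk) AS s
    JOIN retailer ON s.retailer_fk = retailer.id
    JOIN product  ON s.product_fk  = product.id
    WHERE product.brand = 'ACME'
    GROUP BY product.name, retailer.name
    ORDER BY 1, 2
""").fetchall()

print(naive == pushed)  # True: both plans agree
```

The key property is that the join runs on foreign keys (1-to-many), so grouping the fact table by those keys first loses no information needed by the final GROUP BY on dimension attributes.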
  23. 23. How Dynamic Query Optimizer Works Automatic JOIN reordering groups branches that go to the same source to maximize query delegation and reduce processing in the DV layer • End users don’t need to worry about the optimal “pairing” of the tables The Partial Aggregation push-down optimization is key in those scenarios. Based on PK-FK restrictions, it pushes the aggregation (for the PKs) to the DW • Leverages the processing power of the DW, optimized for these aggregations • Significantly reduces the data transferred through the network (from 1 billion to 100 thousand rows) The Cost-based Optimizer picks the right JOIN strategies based on estimations of data volumes, existence of indexes, transfer rates, etc. • Denodo estimates costs differently for parallel databases (Vertica, Netezza, Teradata) than for regular databases to take into consideration the different way those systems operate (distributed data, parallel processing, different aggregation techniques, etc.) 23 Summary
  24. 24. How Dynamic Query Optimizer Works Pruning of unnecessary JOIN branches (based on 1-to-many associations) when the attributes of the 1-side are not projected • Relevant for horizontal partitioning and “fat” semantic models when queries do not need attributes from all the tables • Unnecessary tables are removed from the query (even for single-source models) Pruning of UNION branches based on incompatible filters • Enables detection of unnecessary UNION branches in vertical partitioning scenarios Automatic data movement • Creation of temp tables in one of the systems to enable complete delegation of a federated branch • The target source needs to have the “data movement” option enabled for this option to be taken into account 24 Other relevant optimization techniques
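The first pruning rule can be checked concretely: when a query projects nothing from the "1" side of a 1-to-many join and every foreign key has a match, dropping the join cannot change the result. A small sketch with sqlite3 and invented data:

```python
# Sketch of JOIN-branch pruning: the time dimension contributes no
# projected columns, and every sales.time_fk has a matching time.id,
# so the join can be removed without changing the result.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE time  (id INTEGER PRIMARY KEY, date TEXT);
CREATE TABLE sales (time_fk INTEGER REFERENCES time(id), amount REAL);
INSERT INTO time  VALUES (1,'2016-01-01'),(2,'2016-02-01');
INSERT INTO sales VALUES (1,5.0),(1,7.0),(2,3.0);
""")

with_join = con.execute(
    "SELECT SUM(sales.amount) FROM sales JOIN time ON sales.time_fk = time.id"
).fetchone()[0]

pruned = con.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

print(with_join == pruned)  # True: the dimension join was dead weight
```

In a "fat" semantic model with many pre-joined dimensions, this rule lets each query pay only for the tables it actually references.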
  25. 25. Performance Comparison Logical Data Warehouse vs. Physical Data Warehouse Customer Dimension 2 M rows Sales Facts 290 M rows Items Dimension 400 K rows * TPC-DS is the de-facto industry standard benchmark for measuring the performance of decision support solutions including, but not limited to, Big Data systems. • Denodo has done extensive testing using queries from the standard benchmarking test TPC-DS* and the following scenario • The baseline was set using the same queries with all data in a Netezza appliance
  26. 26. Performance Comparison Logical Data Warehouse vs. Physical Data Warehouse Query Description Returned Rows Avg. Time Physical (Netezza) Denodo Avg. Time Logical Optimization Technique (automatically chosen) Total sales by customer 1.99 M 21.0 sec 21.5 sec Full aggregation push-down Total sales by customer and year between 2000 and 2004 5.51 M 52.3 sec 59.1 sec Full aggregation push-down Total sales by item brand 31.4 K 4.7 sec 5.3 sec Partial aggregation push-down Total sales by item where sale price less than current list price 17.1 K 3.5 sec 5.2 sec On the fly data movement
  27. 27. Improved Cache Performance 27 Incremental Queries Merge cached data and changed data to provide fully up-to-date results with minimum latency Get Leads changed / added since 1:00AM CACHE Leads updated at 1:00AM Up-to-date Leads data 1. Salesforce ‘Leads’ data cached in VDP at 1:00 AM 2. Query needing Leads data arrives at 11:00 AM 3. Only new/changed leads are retrieved through the WAN 4. Response is up-to-date but query is much faster
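The four-step incremental-query flow above amounts to a keyed merge of the cached snapshot with a delta fetched over the WAN. A hypothetical sketch of that merge; the row fields (`id`, `name`, `updated_at`) and the `fetch_changed_since` helper are invented for illustration, not the actual Denodo cache API:

```python
# Hypothetical sketch of the incremental-query merge: combine a cached
# snapshot of Salesforce 'Leads' with only the rows changed since the
# snapshot. All field names and helpers here are invented.
from datetime import datetime

cache = {  # 'Leads' rows cached at 1:00 AM, keyed by id
    1: {"id": 1, "name": "Acme Corp", "updated_at": datetime(2016, 3, 30, 0, 40)},
    2: {"id": 2, "name": "Globex",    "updated_at": datetime(2016, 3, 30, 0, 55)},
}

def fetch_changed_since(ts):
    """Stand-in for the WAN call that returns only rows modified after ts."""
    return [  # lead 2 changed, lead 3 is new
        {"id": 2, "name": "Globex Inc", "updated_at": datetime(2016, 3, 30, 9, 10)},
        {"id": 3, "name": "Initech",    "updated_at": datetime(2016, 3, 30, 10, 5)},
    ]

def incremental_query(cache, cached_at):
    merged = dict(cache)
    for row in fetch_changed_since(cached_at):  # only the delta crosses the WAN
        merged[row["id"]] = row                 # changed rows overwrite, new rows append
    return merged

leads = incremental_query(cache, datetime(2016, 3, 30, 1, 0))
print(sorted(r["name"] for r in leads.values()))
# ['Acme Corp', 'Globex Inc', 'Initech'] — up to date without a full re-fetch
```

The latency win comes from transferring two delta rows instead of the whole Leads table, while the merged result is as fresh as a direct query.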
  28. 28. Big Data Connectivity Big Data and Cloud Databases Connectivity ■ Redshift – enhanced adapter as data source, cache and data movement target ■ Vertica – enhanced as source, cache and data movement target ■ Apache Spark – enhanced adapter ■ Impala – enhanced as cache and data movement target 28
  29. 29. Data Virtualization in the Cloud 29 Accelerate Adoption of Data Virtualization Ready-to-use and available on AWS Marketplace Dynamic and elastic infrastructure Complete with all enterprise-grade features at the lowest cost Zero set-up requirements Flexible rent-by-the-hour options A wide range of capacity options Only data virtualization platform on AWS.
  30. 30. Buying a Subscription • Customer must have an Amazon AWS account • Choose configuration required (building block + Amazon VM) • Building block by ‘sources’ or ‘number of conc. queries & results’ • Click-Through license agreement • Amazon provides monthly billing based on usage • Annual subscriptions billed upfront • Support included in final pricing 30
  31. 31. Self-Service Data Discovery and Search 31 Expedite Use of Data by Business Users Search – Google-like search for data and metadata Discover – Easy-to-use user interface to browse data and metadata as well as data lineage Explore – Ability to view the graphical representation of entities and relationships Advanced Query Wizard for users to create ad-hoc queries Sandbox environment to explore the data before publishing Data virtualization solution to search data from sources.
  32. 32. Search 32 Google-like Search Global Search – enter keyword to find views containing that data
  33. 33. Discover 33 Data Lineage Views Data lineage and tree view information including derived fields transformations
  34. 34. Explore 34 Graphical representation of views and relationships
  35. 35. Create Ad-hoc Queries 35 GUI Based Query creation & save as new Denodo view Export data via CSV & HTML
  36. 36. Managing Very Large Deployments ■ Establish limits on resource usage e.g. ■ Estimated memory, estimated cost, # of concurrent queries, limits to max. execution time and/or max. # of rows ■ Assigned to user and/or roles ■ Limits can be individual or global e.g. ■ Individual: Each query of a user with role ‘marketing’ cannot use more than 100 MB ■ Global: All concurrent queries from users with role ‘marketing’ cannot use more than 300 MB ■ Possible actions if limits are surpassed: ■ Prevent execution ■ Allow execution with restricted resources ■ Allow execution; cancel if resources limit is surpassed ■ Can be dynamically assigned through custom policies ■ (e.g. assign different plans based on time of day) 36 New Resource Manager
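The individual-versus-global limit model above can be sketched as an admission check. This is a hypothetical illustration of the semantics, with invented names and the "prevent execution" action only; the real Resource Manager is configured through the Denodo administration tool, not code like this:

```python
# Hypothetical sketch of per-role resource limits: an "individual" cap
# per query and a "global" cap across a role's concurrent queries.
# All names and numbers are invented to match the slide's example.

INDIVIDUAL_LIMIT_MB = {"marketing": 100}   # each query of the role
GLOBAL_LIMIT_MB = {"marketing": 300}       # all concurrent queries of the role

running_mb = {"marketing": 0}              # memory currently in use per role

def admit(role, estimated_mb):
    """Decide whether a query may start; 'prevent execution' on breach."""
    if estimated_mb > INDIVIDUAL_LIMIT_MB.get(role, float("inf")):
        return False                       # breaches the individual cap
    if running_mb.get(role, 0) + estimated_mb > GLOBAL_LIMIT_MB.get(role, float("inf")):
        return False                       # breaches the role-wide cap
    running_mb[role] = running_mb.get(role, 0) + estimated_mb
    return True

print(admit("marketing", 80))    # True: under both limits
print(admit("marketing", 150))   # False: exceeds the 100 MB individual cap
print(admit("marketing", 90))    # True: 80 + 90 = 170 MB, under the 300 MB global cap
```

The other documented actions (run with restricted resources, or run and cancel on breach) would replace the early `return False` with throttling or a watchdog; dynamic assignment via custom policies would select the limit tables per request.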
  37. 37. Managing Very Large Deployments Monitor operation of the system, Diagnose Problems and Analyze Usage Metrics ■ The new tool will also allow ‘after the fact’ diagnosis of problems ■ Set the time when the problem occurred and you will see everything that was happening in an integrated, graphical manner down to the individual query level 37 Enhanced Monitoring and Diagnostic Tool
  38. 38. Unified Security and Governance 38 Enforcing Security and Governance Policies ■ Kerberos “Southbound” support for databases and Web Services ■ Kerberos pass-through support and Kerberos constrained delegation ■ API for accessing view dependencies information and data lineage information
  39. 39. Agile Development 39 New Admin Tool ■ Multiple tabs and databases ■ Resize and organize all panels and dialogs ■ Manages several open tasks at the same time ■ VQL highlighting and autocomplete features ■ Graphical support for GIT
  40. 40. 40
  41. 41. Denodo 6.0 – Fast Data Strategy Summit 41 March 30 – US; March 31 – EMEA 9:00 Welcome: Fast Data Strategy Summit Angel Vina, CEO, Denodo 9:30 Analyst Keynote: Accelerating Fast Data Strategy with Data Virtualization Presenter: Noel Yuhanna, Principal Research Analyst, Forrester Research 10:00 Customer Case Study: Designing Fast Data Architectures with Data Virtualization and Big Data on Cloud Presenter: Kurt Jackson, Platform Architect, Autodesk 10:30 Experts Panel: Core Components of Fast Data Strategy – Big Data and Data Virtualization Panelists: Noel Yuhanna, Principal Research Analyst, Forrester Research Mark Eaton, Enterprise Architect, Autodesk Matt Morgan, Vice President, Product and Partner Marketing, Hortonworks Moderated by: Ravi Shankar, CMO, Denodo 11:00 Use Cases: Where Does Fast Data Strategy Fit Within IT Projects Presenter: Ravi Shankar, CMO, Denodo 12:00 Demo: How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and Operational Scenarios Presenter: Pablo Alvarez, Principal Technical Account Manager, Denodo 12:05 Closing: Fast Data Strategy Summit Angel Vina, CEO, Denodo
  42. 42. Denodo 6.0 – Fast Data Strategy Summit 42 March 30 – US; March 31 – EMEA Tracks Case Studies Intro to Data Virtualization Technical Deep-Dive Customer Case Study: SQLization of Hadoop – Increasing Business Adoption of Big Data Chuck DeVries, VP, Strategic Technology and Enterprise Architecture, Vizient Intro: Getting Started with Data Virtualization – What Problems DV Solves Richard Walker, VP, Sales, Denodo Data Science: Expediting Use of Data by Business Users with Self-service Discovery and Search Mark Pritchard, Director, Sales Engineering Customer Case Study: Data Services – Rapid Application Development using Data Virtualization Jay Heydt, Manager, Database Technologies, DrillingInfo Demo: Getting Started with Data Virtualization – What Problems DV Solves Pablo Alvarez, Principal Technical Account Manager, Denodo Data Virtualization Reference Architectures: Correctly Architecting your Solutions for Analytical & Operational Uses Alberto Bengoa, Sr. Product Manager, Denodo Customer Case Study: Data Virtualization in the Cloud Avinash Desphande, Big Data and Advanced Analytics, Logitech Enabling Fast Data Strategy: What’s New in Denodo Platform 6.0 Alberto Pan, CTO, Denodo Data Virtualization Deployments: How to Manage Very Large Deployments Juan Lozano, Sales Engineering Manager, Denodo Customer Case Study: TBD TBD Data Virtualization in the Cloud: Accelerating Data Virtualization Adoption Paul Moxon, Sr. Director, Strategic Technology Office, Denodo Big Data: Architecture and Performance Considerations in Logical Data Lakes Alberto Pan, CTO, Denodo Customer Case Study: TBD TBD Data Virtualization Maturity: Enterprise Features in Denodo Platform 6.0 Suresh Chandrasekaran, Sr. Vice President, Denodo Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB Alberto Bengoa, Sr. Product Manager, Denodo Customer Case Study: TBD TBD Analyst View of Data Virtualization: Conversations with Boulder Business Intelligence Brain Trust Claudia Imhoff, CEO, Intelligent Solutions Partner Enablement: Architecting and Deploying Data Virtualization
  43. 43. Thanks! www.denodo.com info@denodo.com © Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization of Denodo Technologies.
