Data Ninja Webinar Series: Data Virtualization as the Enterprise Data Fabric

Watch the full webinar: Data Ninja Webinar Series by Denodo: https://goo.gl/gBZNXS

Enterprise semantic modeling is not a new concept. The idea of defining a semantic layer that business users can use and understand has been supported by enterprise reporting tools for a long time. However, those solutions were tied to the reporting tool of choice.

Modern data virtualization platforms like Denodo offer the capabilities to move the semantic layer outside a specific application. This means that the same semantic data model can be shared by a variety of reporting tools, published as data services and queried through a web-based catalog. The virtual layer becomes the true enterprise data fabric; all data is accessible through a unified single layer, security is always in place, and multiple access methods are available to adapt to the needs of the consumer.

This is session 4 of the Data Ninja Webinar Series organized by Denodo. If you want to learn more about some of the solutions enabled by data virtualization, click here to watch the entire series: https://goo.gl/8XFd1O

Published in: Data & Analytics

  1. Data Virtualization as the Enterprise Data Fabric. Data Ninja Webinar Series: sessions covering data virtualization solutions for driving business value
  2. Data Ninja Webinars: five webinars over the next few months…
  3. Speakers: Pablo Alvarez, Senior Engineer
  4. Agenda: 1. The Data Fabric 2. Evolution of the Data Fabric: A Historical Perspective 3. Benefits 4. Performance and Scalability 5. Going Beyond 6. Q&A
  5. In computing, a Fabric is a system of interconnected nodes that looks like a "weave" when viewed collectively from a distance. In this context, a Data Fabric is a system that allows global access to all your data assets, and leverages storage and processing power from multiple heterogeneous nodes.
  6. Data Virtualization as the Data Fabric
     Data virtualization’s architecture is based on the usage of underlying sources whenever possible. This can be seen as a network of different specialized processing and storage nodes that form the Data Fabric under the umbrella of a common virtual data model:
     • Offers a common access point for consumers
     • Allows specialized data stores to be used for what they are best at
     With other approaches, like Data Lakes, that are based on replication to a single large target system, this ability is lost.
  7. Successful Customer Use Cases
     • AGILE BUSINESS INTELLIGENCE: Replaced traditional BI with the Logical Data Warehouse that integrates multiple sources around a central EDW
     • 360 VIEW APPLICATIONS: ‘Unified Desktop’ that provides integrated customer information
     • CLOUD INTEGRATION: Virtual layer to abstract access to SaaS applications and enable integration with data center
     • DATA SERVICES: Services Layer (REST, OData) on top of Denodo’s data model with access to any data
  8. Evolution of the Data Fabric: A Historical Perspective
  9. The Old Days: EDW Reporting
     Simple WYSIWYG reporting tools. One-to-one reporting on top of a tailor-made Data Warehouse and Data Marts.
     Problems:
     • Poor reusability
     • Reports built on top of Data Mart data model
     • Excessive replication
     Diagram labels: Operational Data, Staging, EDW, SQL, Data Mart
  10. The Dawn: Reporting with Semantic Layers
      More advanced reporting tools with a built-in semantic layer for easier use and better reusability. One-to-one reporting on top of a tailor-made Data Warehouse.
      Problems:
      • Limited to a single source
      • Limited to a single reporting tool
      Diagram labels: Operational Data, Staging, EDW, SQL
  11. Reporting with Federation
      Reporting tools add a built-in federation engine that allows for multi-source reporting.
      Problems:
      • Bad performance
      • Limited cross-source security
      • Limited to a single reporting tool
      Diagram labels: Operational Data, Staging, EDW, SQL, Other RDBMS
  12. Early Data Virtualization
      Data Virtualization as an independent semantic abstraction layer:
      • Reusable semantic model can be used by multiple reporting tools
      • Engine specialized in federation (optimizer, caching, etc.)
      • Integrated security
      Diagram labels: Operational Data, Staging, EDW, SQL, Other RDBMS, Integrated Security, Other Sources, Cache
  13. Mature Data Virtualization
      Diagram labels: Operational Data, EDW, SQL, Integrated Security, Other Sources, Cache, In-memory Fabric, Big Data, SaaS, REST, OData, Catalog & Data Exploration, Monitoring, Auditing
  14. Benefits
  15. Benefits: Data Virtualization as the Enterprise Data Fabric
      Abstracts access to disparate data sources
      • Homogeneous data access regardless of back-end technology
      • No need to deal with new languages and APIs: access to SFDC, Excel, Redshift, Oracle, Hadoop, other SaaS APIs, etc.
      Acts as a single semantic repository
      • Definition of a consistent business data model across all consumers and reporting tools
      • Combination of data regardless of location and nature
      • Avoids unnecessary replication
  16. Benefits: Data Virtualization as the Enterprise Data Fabric
      Centralized security layer
      • Role-based authorization to all tables in the virtual layer
      • Integration with AD/LDAP and Kerberos
      • Security is moved outside the reporting layer to avoid security bypasses
      • Centralized access point simplifies operations and auditing
      Real-time fabric execution model
      • Advanced optimizer designed specifically for virtualization
      • Execution push-down to leverage source computing capabilities
      • Data comes straight from the sources
      • Cache layer to improve performance when needed
  17. Performance & Scalability
  18. A mature virtualization engine like Denodo offers results comparable with single source executions. Let’s see how this is possible…
  19. Performance: Denodo’s unique query optimizer
      Denodo’s optimizer borrows many techniques from traditional RDBMSs:
      • Cost-based query plans based on statistics and indexes
      • Multiple JOIN methods
      • Query rewriting to generate more optimal SQL
      However, given the distributed execution of a query in a processing fabric, Denodo has designed unique techniques to maximize performance in this environment:
      • Dynamic rewriting focused on maximizing execution at source and reducing network traffic
      • Cost estimates also factor in:
        • Processing power of the sources (e.g. number of nodes in a Hadoop cluster)
        • Network and transfer rates
  20. Performance. DV Overhead: Direct vs Denodo with single source
      TPC-DS benchmark tests using JDBC with IBM Netezza as data source over a 10 Gbps LAN. Results in seconds.
      When queries only hit an individual source, the data virtualization layer pushes the processing completely to the source with minimal overhead.
      As a note, since data needs to flow through the DV layer, the network between the sources and the DV layer should have enough bandwidth to avoid bottlenecks.
  21. Performance. Benchmarks: Federating large data sets
      Denodo has done extensive testing using queries from the standard benchmarking test TPC-DS* and the following scenario, which compares the performance of a federated approach in Denodo with an MPP system where all the data has been replicated via ETL.
      Data sets (the same on both sides of the comparison): Customer Dim. 2 M rows, Sales Facts 290 M rows, Items Dim. 400 K rows
      * TPC-DS is the de-facto industry standard benchmark for measuring the performance of decision support solutions including, but not limited to, Big Data systems.
  22. Performance. Benchmarks: Federating large data sets
      Query Description | Returned Rows | Netezza Time | Denodo Time (Federated Oracle, Netezza & SQL Server) | Denodo Optimization Technique (automatically selected)
      Total sales by customer | 1.99 M | 20.9 sec. | 21.4 sec. | Full aggregation push-down
      Total sales by customer and year between 2000 and 2004 | 5.51 M | 52.3 sec. | 59.0 sec. | Full aggregation push-down
      Total sales by item brand | 31.35 K | 4.7 sec. | 5.0 sec. | Partial aggregation push-down
      Total sales by item where sale price less than current list price | 17.05 K | 3.5 sec. | 5.2 sec. | On the fly data movement
      Execution times are comparable with single source executions based only on automatic optimizer decisions.
  23. Performance: Reporting tools are not optimized for federation across sources
      Query:
        SELECT c.id, SUM(s.amount) AS total
        FROM customer c JOIN sales s ON c.id = s.customer_id
        GROUP BY c.id
      System | Execution Time | Data Transferred | Optimization Technique (automatically selected)
      Denodo | 9 sec. | 4 M | Aggregation push-down
      Tableau | 125 sec. | 292 M | None: full scan
      Diagrams: without push-down, Sales (290 M rows) and Customer (2 M rows) are joined first and grouped afterwards; with push-down, Sales is grouped first, so only 2 M rows from Sales and 2 M rows from Customer are joined.
  24. Scalability
      • Denodo can be deployed in a cluster for HA and horizontal scaling
      • “Shared-nothing” execution engine ensures linear scalability
      • Based on the use of an external load balancer
      • Supports auto-scaling for cloud deployments (like AWS)
      Diagram labels: Load Balancer; Virtual Server (SQL Cluster: 192.168.0.10:9999, Web Container Cluster: 192.168.0.10:9090); SQL Cluster: Denodo1:9999, Denodo2:9999, Denodo3:9999, Denodo4:9999; Web Container Cluster: Denodo1:9090, Denodo2:9090, Denodo3:9090, Denodo4:9090; Shared Cache Server
  25. Going Beyond
  26. Going Beyond: What’s cooking in the virtualization space
      Holistic Operations Console
      • Common operations web console to orchestrate monitoring, notifications, diagnosis, auditing, migration, license management, etc.
      Web-based Self Service
      • Advanced catalog enables a centralized “data marketplace”
      • Keyword-based search
      • Collaboration (tags, comments, requests for access, etc.)
      Next-gen “Fabric” Execution Engine
      • Tight integration with in-memory and data grids to move processing from the virtual layer to specialized execution engines
  27. Q&A
  28. Next Steps: Get Started!
      • Download Denodo Express: www.denodoexpress.com
      • Access Denodo Platform on AWS: www.denodo.com/en/denodo-platform/denodo-platform-for-aws
      • Denodo Platform 6.0 Whitepaper. Download & Read: http://www.denodo.com/en/document/whitepaper/denodo-platform-60-whitepaper
      • Data Virtualization for Data Services. Visit: http://www.denodo.com/en/solutions/horizontal-solutions/data-services
  29. Data Ninja Webinar Series: sessions covering data virtualization solutions for driving business value. Next Session: Realizing the Promise of Data Lakes, Thursday, December 15th, 2016
  30. Thanks! www.denodo.com info@denodo.com © Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
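
The aggregation push-down named in slides 22 and 23 can be illustrated with a small Python sketch. This is not Denodo's implementation: the function names and the sample rows are hypothetical, and the rewrite is only valid under the assumption that `id` is a unique key of the customer table. It shows that grouping sales at the source before the join returns the same totals as joining first, while fewer rows have to cross the network.

```python
from collections import defaultdict


def join_then_group(customers, sales):
    """No push-down: every sales row is shipped, joined, then aggregated."""
    customer_ids = {c["id"] for c in customers}
    totals = defaultdict(int)
    for s in sales:
        if s["customer_id"] in customer_ids:  # inner join on customer id
            totals[s["customer_id"]] += s["amount"]
    rows_transferred = len(customers) + len(sales)
    return dict(totals), rows_transferred


def group_then_join(customers, sales):
    """Push-down: the sales source pre-aggregates, then the results are joined."""
    # This loop stands in for a GROUP BY executed at the sales source.
    partial = defaultdict(int)
    for s in sales:
        partial[s["customer_id"]] += s["amount"]
    # Only one row per distinct customer_id leaves the sales source.
    customer_ids = {c["id"] for c in customers}
    totals = {cid: amt for cid, amt in partial.items() if cid in customer_ids}
    rows_transferred = len(customers) + len(partial)
    return totals, rows_transferred


# Hypothetical sample data: customer 3 has no sales, sale for id 4 has no customer.
customers = [{"id": 1}, {"id": 2}, {"id": 3}]
sales = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": 1, "amount": 5},
    {"customer_id": 2, "amount": 7},
    {"customer_id": 4, "amount": 3},
]

naive_totals, naive_rows = join_then_group(customers, sales)
pushed_totals, pushed_rows = group_then_join(customers, sales)
print(naive_totals == pushed_totals, naive_rows, pushed_rows)
```

With the row counts from slide 23, the same reasoning explains why roughly 292 M rows move without push-down (290 M sales plus 2 M customers) and about 4 M with it (2 M pre-aggregated sales rows plus 2 M customers).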
