#DenodoDataFest
Performance Acceleration:
Summaries, Recommendation, MPP and more
Director of Product Management
Inessa Gerber
Agenda
1. Performance: what does it mean to your organization?
2. IT-driven optimization and performance techniques
3. Business-driven and guided data discovery
4. Demo: AI-driven features for developers & business
5. Conclusion and Final Thoughts
Performance Across your Organization
▪ System performance and optimized query execution
▪ Streamlined development and management
▪ Guided data discovery for the business user
Optimized Query Execution
Denodo Logical Data Fabric
▪ As a logical layer, Denodo stores only metadata
▪ Data content remains in the original source
▪ External sources often have processing capabilities
▪ Denodo orchestrates execution of queries in an
optimal way
▪ Maximizing processing push down to the sources
▪ Minimizing data transfer through the network
▪ Additionally, selective materialization techniques
(like caching and summaries) can be used to further
optimize data access
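To make the push-down idea concrete, here is a toy Python sketch (not Denodo code) that counts how many rows cross the "network" when an aggregation runs in the virtual layer versus when it is pushed to a source that can aggregate on its own. The `ToySource` class and its methods are hypothetical.

```python
from collections import defaultdict


class ToySource:
    """Hypothetical source that can return raw rows or aggregate them itself."""

    def __init__(self, rows):
        self.rows = rows      # list of (store, amount) tuples
        self.rows_sent = 0    # rows transferred to the virtual layer

    def scan(self):
        # No push-down: every raw row is shipped to the virtual layer.
        self.rows_sent += len(self.rows)
        return list(self.rows)

    def sum_by_store(self):
        # Push-down: the source aggregates and ships one row per store.
        totals = defaultdict(float)
        for store, amount in self.rows:
            totals[store] += amount
        self.rows_sent += len(totals)
        return dict(totals)


def total_by_store_in_middleware(source):
    """Aggregate in the virtual layer after pulling all raw rows."""
    totals = defaultdict(float)
    for store, amount in source.scan():
        totals[store] += amount
    return dict(totals)


rows = [("A", 10.0), ("B", 5.0), ("A", 2.5), ("C", 1.0), ("B", 4.0)] * 1000

no_pushdown = ToySource(rows)
with_pushdown = ToySource(rows)

# Same answer either way; only the data transfer differs.
assert total_by_store_in_middleware(no_pushdown) == with_pushdown.sum_by_store()
print(no_pushdown.rows_sent, with_pushdown.rows_sent)  # prints: 5000 3
```

Both strategies return identical totals; the difference is that the push-down variant moves three rows instead of five thousand, which is the effect the optimizer aims for when a source has processing capabilities.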
Query Optimization at a Glance
▪ The query optimizer combines information from the incoming query (aggregations, joins, etc.) with the existing
metadata (view definitions, source capabilities, statistics, etc.) to generate an optimal execution plan
▪ The optimizer can generate multiple candidate execution plans and then chooses the optimal one for execution
Figure: query processing flow from consumer request to result set: query parsing (SQL, REST, OData, GraphQL), mapping to SQL, analysis of metadata and source capabilities, rule-based optimizer, cost-based optimizer, execution plan.
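The cost-based step can be pictured with a toy cost model. This sketch is a simplification under stated assumptions, not Denodo's optimizer: each candidate plan is scored by estimated rows transferred and rows processed locally, using hypothetical weights, and the cheapest plan is chosen. Plan names and row estimates are illustrative.

```python
from dataclasses import dataclass

# Hypothetical relative weights: moving a row over the network is assumed
# to be more expensive than processing it locally.
NETWORK_COST_PER_ROW = 1.0
LOCAL_COST_PER_ROW = 0.1


@dataclass(frozen=True)
class Plan:
    name: str
    rows_transferred: int  # estimated rows shipped from the sources
    rows_processed: int    # estimated rows processed in the virtual layer

    def cost(self) -> float:
        return (self.rows_transferred * NETWORK_COST_PER_ROW
                + self.rows_processed * LOCAL_COST_PER_ROW)


def choose_plan(candidates):
    """Cost-based step: pick the candidate with the lowest estimated cost."""
    return min(candidates, key=lambda p: p.cost())


candidates = [
    # Pull both inputs and join/aggregate in the virtual layer.
    Plan("join_in_virtual_layer", rows_transferred=1_000_400, rows_processed=1_000_400),
    # Push a partial aggregation below the join, then join small results.
    Plan("aggregation_pushdown", rows_transferred=800, rows_processed=800),
    # Move the small table into the large table's source and push the rest down.
    Plan("data_movement", rows_transferred=800, rows_processed=400),
]

best = choose_plan(candidates)
print(best.name)  # prints: data_movement
```

A real optimizer estimates these numbers from statistics and source capabilities; the point here is only that several valid plans exist and a cost function decides between them.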
Query Optimization
▪ Caching: Used to enhance performance, protect data sources from costly queries, and/or reuse
complex data combinations and transformations
▪ Summaries: Store common intermediate results that the query optimizer can then use as a starting point to
accelerate analytical queries. Unlike with caching, you do not need to create a view to cache a data set. The
query optimizer automatically analyzes whether it can rewrite incoming queries to take advantage of the data in
the summary
▪ Parallel Processing: Provides native integration with several Massively Parallel Processing (MPP) systems to
accelerate queries that require significant processing. Query processing is pushed to the MPP engine
when a query requires Denodo to process large amounts of data and that
processing cannot be done in streaming mode.
▪ Data Movement: When a query involves two views and one of them is much larger than the other, Virtual
DataPort can transfer the data of the smaller view into the data source of the larger view and execute the
operation in that data source.
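The Data Movement decision can be sketched as a simple size-ratio rule. The 100x ratio threshold and the `plan_join` function are hypothetical; the actual optimizer weighs estimated costs and source capabilities rather than a fixed ratio.

```python
def plan_join(left_rows: int, right_rows: int, ratio_threshold: float = 100.0) -> str:
    """Return a hypothetical strategy name for joining two views from different sources.

    If one view is much larger than the other, ship the smaller view's data into the
    larger view's source and run the join there; otherwise join in the virtual layer.
    """
    small, large = sorted((left_rows, right_rows))
    if small == 0:
        return "join_in_virtual_layer"  # nothing worth moving
    if large / small >= ratio_threshold:
        target = "left" if left_rows >= right_rows else "right"
        return f"move_smaller_view_to_{target}_source"
    return "join_in_virtual_layer"


print(plan_join(300_000_000, 400))   # move_smaller_view_to_left_source
print(plan_join(2_000_000, 73_000))  # join_in_virtual_layer (ratio is only about 27)
```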
Cache Overview
Caching is a form of data replication that can be used to optimize applications in certain scenarios:
▪ Improve Query Performance
▪ Slow or high latency data sources (files, cloud apps like Salesforce.com, etc.)
▪ Complex combinations, transformations on large data volumes that take substantial time to process
▪ Reuse data sets in frequently requested queries
▪ Protect sensitive data sources, minimize impact of added workload, and control data access costs
▪ Client queries are automatically redirected to the cache instead of the source
▪ Protection for clients against intermittent availability of unreliable data sources
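A minimal read-through cache illustrates two of the points above: a repeated query is answered from the cache instead of the source, and a cached result is still available when the source goes down. This is a generic Python sketch with hypothetical names (`FlakySource`, `CachedView`), not the Denodo Cache Engine.

```python
class FlakySource:
    """Stand-in for a slow or unreliable source (for example, a SaaS API)."""

    def __init__(self):
        self.available = True
        self.calls = 0

    def fetch(self, query):
        self.calls += 1
        if not self.available:
            raise ConnectionError("source unavailable")
        return f"result of {query!r}"


class CachedView:
    """Hypothetical read-through cache: answer from the cache when possible."""

    def __init__(self, fetch_from_source):
        self._fetch = fetch_from_source
        self._cache = {}

    def query(self, query):
        if query in self._cache:
            return self._cache[query]   # redirected to the cache; source untouched
        result = self._fetch(query)     # raises if the source is down and nothing is cached
        self._cache[query] = result
        return result


source = FlakySource()
view = CachedView(source.fetch)
view.query("SELECT * FROM accounts")         # first call reaches the source
source.available = False                     # the source becomes unavailable
print(view.query("SELECT * FROM accounts"))  # result of 'SELECT * FROM accounts'
print(source.calls)                          # 1
```

The sketch deliberately omits expiration and invalidation; a production cache also needs a policy for how stale cached data may become.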
How Caching Works
▪ Cached data is stored in a relational database of the client's choice
▪ Cache tables are created and managed by the Denodo Cache Engine
▪ The cache database can be a traditional RDBMS, an in-memory database, or a cloud-based database
▪ Native bulk load tools are supported for faster cache population
▪ Denodo supports three cache modes to fit a wide range of scenarios:
  ▪ Partial (query-by-query): useful for web services or stored procedures with input fields
  ▪ Full (data set replication): supports full refresh and delta increments
  ▪ Incremental: on-demand merge of cached data with real-time access to recent changes
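Merging cached data with recent changes on demand can be sketched as follows. Assumptions not stated in the slides: each row has a primary key `id` and a `last_modified` timestamp, and the source can return rows changed after a given timestamp. All names are hypothetical, and deletions are not handled.

```python
from datetime import datetime


def changes_since(source_rows, since):
    """Stand-in for a source query like: SELECT * FROM t WHERE last_modified > :since"""
    return [r for r in source_rows if r["last_modified"] > since]


def merge_cache_with_changes(cached_rows, changed_rows):
    """Overlay rows changed at the source on top of the cached snapshot.

    Rows in changed_rows replace cached rows with the same (hypothetical) key "id".
    """
    merged = {row["id"]: row for row in cached_rows}
    for row in changed_rows:
        merged[row["id"]] = row
    return sorted(merged.values(), key=lambda r: r["id"])


last_refresh = datetime(2021, 1, 1)
cache = [
    {"id": 1, "amount": 100, "last_modified": datetime(2020, 12, 1)},
    {"id": 2, "amount": 200, "last_modified": datetime(2020, 12, 5)},
]
source = [
    {"id": 1, "amount": 100, "last_modified": datetime(2020, 12, 1)},
    {"id": 2, "amount": 250, "last_modified": datetime(2021, 1, 3)},  # updated
    {"id": 3, "amount": 300, "last_modified": datetime(2021, 1, 4)},  # inserted
]

result = merge_cache_with_changes(cache, changes_since(source, last_refresh))
print([(r["id"], r["amount"]) for r in result])  # [(1, 100), (2, 250), (3, 300)]
```

Only the two rows changed since the last refresh are read from the source; everything else comes from the cache.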
Smart Query Acceleration (Summaries)
Materialized Summary Tables
▪ Pre-aggregated data to serve relevant queries
▪ Much smaller than original data set
▪ Key for logical data warehouse (LDW) self-service initiatives
▪ Integrated with query optimizer
▪ Full data lineage and base invalidation
Benefits
▪ Reduce processing at the source & Denodo
▪ Reduce data transfer over network
▪ Transparent to the user
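One way to picture how an optimizer can decide whether a summary answers a query: the summary's grouping columns must cover the query's grouping and filter columns, and each requested aggregate must be re-aggregatable from the stored partial results. This is a simplified assumption, not Denodo's matching algorithm; it also ignores joins with dimension tables, which a real optimizer can exploit. The summary name mirrors the benchmark below; the column names are hypothetical.

```python
from dataclasses import dataclass

# Aggregates that can be rolled up from partial results (SUM of SUMs, SUM of COUNTs, ...).
# AVG is excluded because re-aggregating it needs both SUM and COUNT.
ROLLUP_SAFE = {"SUM", "COUNT", "MIN", "MAX"}


@dataclass(frozen=True)
class Summary:
    name: str
    group_by: frozenset
    measures: frozenset  # set of (aggregate, column) pairs


@dataclass(frozen=True)
class Query:
    group_by: frozenset
    filter_columns: frozenset
    measures: frozenset


def can_use_summary(query: Query, summary: Summary) -> bool:
    needed = query.group_by | query.filter_columns
    if not needed <= summary.group_by:
        return False  # the summary is too coarse for this query
    return all(agg in ROLLUP_SAFE and (agg, col) in summary.measures
               for agg, col in query.measures)


summ = Summary("summ_sales_by_date_store",
               frozenset({"date", "store_id"}),
               frozenset({("SUM", "amount")}))

# "Sales by store during 2020": group by store, filter on date.
q1 = Query(frozenset({"store_id"}), frozenset({"date"}), frozenset({("SUM", "amount")}))
# Sales by customer: the summary does not keep customer_id.
q2 = Query(frozenset({"customer_id"}), frozenset(), frozenset({("SUM", "amount")}))

print(can_use_summary(q1, summ))  # True
print(can_use_summary(q2, summ))  # False
```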
Smart Query Acceleration
Summaries apply to both single-source and multi-source queries and can drastically improve performance
Figure: a Sales Summary (368,000 rows) compared with the underlying data: a Sales table (300,000,000 rows), a Store table (400 rows in one scenario, 2,000,000 in the other), and a Date table (73,000 rows). Example queries in both scenarios: Sales by store during 2020, Sales in Store A by year, Sales by city.
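The three example queries can all be answered from a summary pre-aggregated by date and store, instead of the large fact table. A minimal sketch with toy data; the rows, the `store_city` mapping, and the function names are hypothetical. "Sales by city" is answered by joining the summary with the Store dimension and re-aggregating, which illustrates how a summary can still help when the dimension lives in another source.

```python
from collections import defaultdict
from datetime import date

# Toy fact rows: (sale_date, store_id, amount).
sales = [
    (date(2020, 1, 5), "A", 10.0),
    (date(2020, 1, 5), "A", 5.0),
    (date(2020, 3, 9), "B", 7.0),
    (date(2021, 2, 1), "A", 4.0),
    (date(2021, 2, 1), "C", 8.0),
]

# Hypothetical Store dimension (could live in a different source).
store_city = {"A": "Madrid", "B": "Madrid", "C": "Boston"}


def build_summary(fact_rows):
    """Pre-aggregate SUM(amount) by (date, store): this is the summary."""
    summary = defaultdict(float)
    for sale_date, store, amount in fact_rows:
        summary[(sale_date, store)] += amount
    return dict(summary)


def sales_by_store_in_year(summary, year):
    totals = defaultdict(float)
    for (sale_date, store), amount in summary.items():
        if sale_date.year == year:
            totals[store] += amount
    return dict(totals)


def store_sales_by_year(summary, store_id):
    totals = defaultdict(float)
    for (sale_date, store), amount in summary.items():
        if store == store_id:
            totals[sale_date.year] += amount
    return dict(totals)


def sales_by_city(summary, store_dim):
    totals = defaultdict(float)
    for (_, store), amount in summary.items():
        totals[store_dim[store]] += amount  # join summary rows with the Store dimension
    return dict(totals)


summ = build_summary(sales)
print(len(sales), len(summ))                # 5 4
print(sales_by_store_in_year(summ, 2020))   # {'A': 15.0, 'B': 7.0}
print(store_sales_by_year(summ, "A"))       # {2020: 15.0, 2021: 4.0}
print(sales_by_city(summ, store_city))      # {'Madrid': 26.0, 'Boston': 8.0}
```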
Smart Query Acceleration Benchmarks
Source | Query | Original time | Accelerated time | Gain factor | Summary used
Single source (Redshift) | Sales by store during 2020 | 8.5 s | 0.5 s | 17 | summ_sales_by_date_store
Single source (Redshift) | Sales in Store A by year | 7.0 s | 0.4 s | 17.5 | summ_sales_by_date_store
Single source (Redshift) | Sales by city | 5.7 s | 0.6 s | 9.5 | summ_sales_by_date_store
Multi-source (Redshift + Oracle) | Sales by store during 2020 | 14.3 s | 6.6 s | 2.1 | summ_sales_by_date_store
Multi-source (Redshift + Oracle) | Sales in Store A by year | 10.3 s | 0.8 s | 12.8 | summ_sales_by_date_store
Multi-source (Redshift + Oracle) | Sales by city | 5.8 s | 0.6 s | 9.6 | summ_sales_by_date_store
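The gain factor is the ratio of original time to accelerated time. A quick check against the table's own numbers shows that the reported factors appear to be truncated rather than rounded to one decimal place (for example, 14.3 / 6.6 ≈ 2.17 is listed as 2.1).

```python
import math

# (query, original seconds, accelerated seconds, reported gain) from the table above.
benchmarks = [
    ("Single source: Sales by store during 2020", 8.5, 0.5, 17.0),
    ("Single source: Sales in Store A by year", 7.0, 0.4, 17.5),
    ("Single source: Sales by city", 5.7, 0.6, 9.5),
    ("Multi-source: Sales by store during 2020", 14.3, 6.6, 2.1),
    ("Multi-source: Sales in Store A by year", 10.3, 0.8, 12.8),
    ("Multi-source: Sales by city", 5.8, 0.6, 9.6),
]


def gain_factor(original_s: float, accelerated_s: float) -> float:
    """Speed-up factor truncated (not rounded) to one decimal place.

    The tiny epsilon guards against binary floating-point error
    (e.g. 7.0 / 0.4 landing just below 17.5).
    """
    return math.floor(original_s / accelerated_s * 10 + 1e-9) / 10


for name, original, accelerated, reported in benchmarks:
    print(f"{name}: computed {gain_factor(original, accelerated)}, reported {reported}")
```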
Multi Cloud Architecture
Figure: data sources spread across a US zone, an EMEA zone, and on-premises systems.
Multi Cloud Architecture
Figure: consumers (apps, users, etc.) connected to the US zone, the EMEA zone, and on-premises systems.
Multi Cloud Architecture with Summaries
Figure: the same multi-cloud architecture with summaries added; labels include consumers (apps, users, etc.), updates, US zone, EMEA zone, and on-premises systems.
Parallel Processing (MPP) Integration
▪ Data virtualization and data lake strategies are often complementary
▪ Data lakes provide the processing muscle for content stored in a distributed file system
▪ Data virtualization orchestrates execution, ingestion, ELT processes, semantic modeling,
and security
▪ Denodo integrates tightly with a variety of data lake engines
▪ Optimized query push-down and efficient data loads into the lake
▪ Support for data lakes as a caching layer and for ELT flows
▪ On-demand lift-and-shift of external data into the data lake engine to leverage its
MPP capabilities
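The conditions for handing work to an MPP engine (large volumes that would otherwise be processed in the virtual layer, by operations that cannot run in streaming mode) can be expressed as a simple rule. The operation categories, the row threshold, and `should_use_mpp` are hypothetical illustrations, not Denodo configuration.

```python
# Streaming operators emit results row by row with bounded memory;
# blocking operators must see (or hold) most of their input before producing output.
STREAMING_OPS = {"filter", "project", "union_all"}
BLOCKING_OPS = {"group_by", "order_by", "hash_join", "distinct"}  # hash_join: build side

# Hypothetical threshold above which local, non-streaming processing is considered too heavy.
MPP_ROW_THRESHOLD = 10_000_000


def should_use_mpp(ops_in_virtual_layer, estimated_rows, threshold=MPP_ROW_THRESHOLD):
    """Return True if the remaining work looks like a good candidate for an MPP engine.

    ops_in_virtual_layer: operations that could NOT be pushed down to the sources.
    estimated_rows: estimated number of rows those operations must process.
    """
    unknown = set(ops_in_virtual_layer) - STREAMING_OPS - BLOCKING_OPS
    if unknown:
        raise ValueError(f"unknown operations: {sorted(unknown)}")
    needs_blocking = any(op in BLOCKING_OPS for op in ops_in_virtual_layer)
    return needs_blocking and estimated_rows >= threshold


print(should_use_mpp(["filter", "project"], 500_000_000))  # False: can be streamed
print(should_use_mpp(["group_by"], 300_000_000))           # True
print(should_use_mpp(["group_by"], 50_000))                # False: small input
```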
MPP Integration: Future Embedded Engine
▪ Customers with existing data lake engines can continue using their
current environment or transition to the embedded one
▪ The embedded engine will offer:
▪ High-performance MPP queries over data in distributed file systems without
the need for additional software
▪ Out-of-the-box MPP options for caching and acceleration
▪ An efficient integrated store for large volumes of active metadata and query
history to enable upcoming AI capabilities
▪ Integrated security, deployment configuration, and management
Streamlining Development
Need for Guided Application Development
▪ Which is the right technique to optimize my application?
▪ What optimizations have been applied and are in use?
▪ Can the system guide developers to optimize their work?
▪ Denodo can take advantage of its privileged position to gather, analyze,
and use data and usage statistics to guide developers
ML/AI Based Automation
Denodo is in a privileged position, with access to:
▪ Usage patterns and statistics on data access and source response times
▪ How data sets are combined and what they mean
▪ Which consumer tools are used and by whom
The gathered active metadata is used to feed AI:
▪ AI-driven automation is key to guided development
▪ Active metadata is vital for the recommendation engine
▪ Captured information is key to assessing the value of data
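As a rough illustration of this kind of active metadata, the sketch aggregates a hypothetical query log into per-view access counts, per-source response-time statistics, and the consumer tools each user works with. The log format, field names, and values are assumptions made up for the example.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Hypothetical log entries: (user, tool, views accessed, source, response time in seconds).
query_log = [
    ("ana", "Tableau", ["sales", "store"], "redshift", 3.2),
    ("ana", "Tableau", ["sales", "date"], "redshift", 2.8),
    ("bob", "Power BI", ["sales", "store"], "redshift", 4.1),
    ("bob", "Power BI", ["customer"], "oracle", 1.2),
    ("eve", "Jupyter", ["customer", "sales"], "oracle", 2.0),
]


def usage_stats(log):
    """Aggregate a query log into simple active-metadata statistics."""
    view_hits = Counter()               # how often each view is accessed
    response_times = defaultdict(list)  # response times per source
    tools_by_user = defaultdict(set)    # which consumer tools each user works with
    for user, tool, views, source, seconds in log:
        view_hits.update(views)
        response_times[source].append(seconds)
        tools_by_user[user].add(tool)
    latency = {src: {"mean": mean(ts), "median": median(ts)}
               for src, ts in response_times.items()}
    return view_hits, latency, dict(tools_by_user)


hits, latency, tools = usage_stats(query_log)
print(hits.most_common(2))            # [('sales', 4), ('store', 2)]
print(latency["redshift"]["median"])  # 3.2
print(tools["bob"])                   # {'Power BI'}
```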
AI-Driven Recommendations (Summaries)
AI-driven recommendations for summaries
▪ Based on usage patterns, statistics, data, location,
cost optimization, and execution simulations
▪ Recommend summaries and their location, and provide
information on the potential performance gain
▪ Eliminate guesswork and provide a guided
approach to optimizing applications
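A heavily simplified heuristic conveys the idea of recommending summaries from usage: score each candidate grouping by the query time it could save across the logged queries it covers. This is an illustrative assumption, not Denodo's recommendation engine, which the slide describes as also weighing data, location, cost, and execution simulations. The log, the fixed 10x speed-up estimate, and all names are hypothetical.

```python
# Hypothetical log of analytical queries: (columns needed, observed seconds).
query_log = [
    ({"store_id", "date"}, 8.0),
    ({"store_id"}, 7.0),
    ({"city"}, 6.0),  # not covered below: this toy check ignores dimension joins
    ({"store_id", "date"}, 9.0),
    ({"customer_id"}, 2.0),
]

# Hypothetical candidate summaries: name -> grouping columns.
candidates = {
    "by_date_store": frozenset({"date", "store_id"}),
    "by_customer": frozenset({"customer_id"}),
}

ASSUMED_SPEEDUP = 10.0  # assumed acceleration when a query is served by a summary


def estimated_saving(candidate_cols, log, speedup=ASSUMED_SPEEDUP):
    """Seconds saved if every query whose columns the candidate covers ran `speedup` times faster."""
    return sum(seconds * (1 - 1 / speedup)
               for cols, seconds in log if cols <= candidate_cols)


def recommend(candidates, log):
    """Rank candidate summaries by estimated time saved, best first."""
    scored = {name: estimated_saving(cols, log) for name, cols in candidates.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


for name, saving in recommend(candidates, query_log):
    print(f"{name}: ~{saving:.1f} s saved")
```

A fuller version would subtract the cost of building and refreshing each summary and consider where it should be stored.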
ML/AI Based Automation in the Future
Query: Smart Autocomplete
▪ Augment keyword-based autocomplete with frequently used SQL fragments
Development: Suggest Joins and Transformations
▪ Automatic suggestion of common combinations and transformations, based on past activity of similar users
Discovery: Automatically infer relationships
▪ Use metadata analysis and historical usage (e.g. JOIN conditions)
Performance: automatically refine cost estimations
▪ Detect cases where the optimizer chose a non-optimal execution plan and correct it in future similar queries
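Inferring relationships from historical usage can be illustrated by mining equality join conditions from logged SQL and counting how often each column pair is joined. The regular expression is a deliberate simplification (it only recognizes `table.column = table.column` and ignores aliases); a real implementation would use a SQL parser. The log and the `min_count` threshold are hypothetical.

```python
import re
from collections import Counter

# Matches conditions of the form table.column = table.column (simplification).
JOIN_CONDITION = re.compile(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)")


def infer_relationships(sql_log, min_count=2):
    """Return column pairs joined at least min_count times in the log."""
    pairs = Counter()
    for sql in sql_log:
        for t1, c1, t2, c2 in JOIN_CONDITION.findall(sql):
            # Normalize order so a.x = b.y and b.y = a.x count as the same relationship.
            key = tuple(sorted([(t1.lower(), c1.lower()), (t2.lower(), c2.lower())]))
            pairs[key] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_count}


sql_log = [
    "SELECT sales.amount FROM sales JOIN store ON sales.store_id = store.id",
    "SELECT * FROM store JOIN sales ON store.id = sales.store_id WHERE store.city = 'Madrid'",
    "SELECT * FROM sales JOIN date_dim ON sales.date_key = date_dim.key",
]

for (a, b), n in infer_relationships(sql_log).items():
    print(f"{a[0]}.{a[1]} = {b[0]}.{b[1]} (seen {n} times)")  # sales.store_id = store.id (seen 2 times)
```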
Performant Business User
Need for the Right Data, Right Now
▪ Business users know what they need, but not how to find it
▪ Power users and standard users need different discovery experiences
▪ Data discovery needs guardrails to prevent user errors
▪ One cannot assume the user has specific expertise
▪ Faster Data + Right Data = Valuable Data Insights
Denodo Data Catalog at a Glance
▪ Organized inventory of virtualized and curated data assets
▪ Enriched metadata for key business indicators
▪ Collaboration across business and IT teams
▪ Active metadata and data valuation indicators
▪ Integrated with Delivery Layer for rapid & secure data access
Enabling Performant Business User
AI-driven recommendations of relevant data, based on usage patterns and
relationships, guide users to the key data assets and provide quick results
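One simple way to recommend relevant data from usage patterns is co-occurrence: data sets frequently used together in the same session are suggested alongside each other. The sessions and function names are hypothetical, and item-to-item co-occurrence is a stand-in chosen for illustration, not the Data Catalog's actual algorithm.

```python
from collections import Counter
from itertools import combinations


def co_usage(sessions):
    """Count how often each pair of data sets appears in the same user session."""
    counts = Counter()
    for views in sessions:
        for a, b in combinations(sorted(set(views)), 2):
            counts[(a, b)] += 1
    return counts


def recommend_related(dataset, sessions, top_n=3):
    """Return the data sets most often used together with `dataset`."""
    scores = Counter()
    for (a, b), n in co_usage(sessions).items():
        if a == dataset:
            scores[b] += n
        elif b == dataset:
            scores[a] += n
    return [name for name, _ in scores.most_common(top_n)]


sessions = [
    ["sales", "store", "date"],
    ["sales", "store"],
    ["sales", "customer"],
    ["customer", "support_tickets"],
]
print(recommend_related("sales", sessions))  # ['store', 'date', 'customer']
```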
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in
any form or by any means, electronic or mechanical, including photocopying and
microfilm, without the prior written authorization from Denodo Technologies.
Thank You!
Demo
Recommendations in Data Catalog &
Summaries for the Query Execution
Conclusion
Conclusion
▪ Performance has many facets across IT and business. Denodo addresses the need
to optimize across the board and does not limit its features to any one area
▪ Transparency is critical in your optimization process. Denodo is a fully transparent
platform that lets you discover the query process and the lifecycle of the data
▪ Guided development and discovery are a vital part of robust development. Denodo is
in a privileged position to guide the development of applications and
optimizations based on the gathered information and AI-driven features
Additional Resources
▪ Denodo Caching Module
https://community.denodo.com/docs/html/browse/8.0/en/vdp/administration/cache_module/cache_module
▪ Best Practices to Maximize Performance III: Caching
https://community.denodo.com/kb/en/view/document/Best%20Practices%20to%20Maximize%20Performance%20III:%20Caching?category=Best+Practices
▪ Smart Query Acceleration using Summaries
https://community.denodo.com/docs/html/browse/latest/en/vdp/administration/optimizing_queries/summary_views/summary_views
▪ Parallel Processing (MPP)
https://community.denodo.com/docs/html/browse/latest/en/vdp/administration/optimizing_queries/parallel_processing/parallel_processing
▪ Using AI to Further Accelerate Denodo Platform Performance
https://www.datavirtualizationblog.com/using-ai-to-further-accelerate-denodo-platform-performance/
