The document discusses techniques for optimizing performance in Denodo: caching, summaries, parallel processing, and AI-driven recommendations. Caching replicates data sets to speed up queries on slow data sources and to protect those sources from costly queries. Summaries store common pre-aggregated intermediate results that the query optimizer can reuse. Parallel processing pushes queries to external data lake engines for distributed processing. AI analyzes active metadata to recommend optimizations such as summaries and to guide developers and business users to relevant data.
3. Agenda
1. Performance, what does it mean to your organization?
2. IT driven optimization and performance techniques
3. Business driven and guided data discovery
4. Demo: AI driven features for developers & business
5. Conclusion and Final Thoughts
4. #DenodoDataFest
Performance Across your Organization
▪ System performance and optimized query execution
▪ Streamlined development and management
▪ Guided data discovery for the business user
6. #DenodoDataFest
Denodo Logical Data Fabric
▪ As logical layer, Denodo only stores metadata
▪ Data content remains in the original source
▪ External sources often have processing capabilities
▪ Denodo orchestrates execution of queries in an optimal way
▪ Maximizing processing push down to the sources
▪ Minimizing data transfer through the network
▪ Additionally, selective materialization techniques (like caching and summaries) can be used to further optimize data access
7. #DenodoDataFest
Query Optimization at a Glance
▪ The Query Optimizer combines information from the incoming query (aggregations, joins, etc.) with the existing metadata (view definitions, source capabilities, statistics, etc.) to generate an optimal execution plan
▪ The Optimizer can generate multiple execution plans, and then chooses the optimal plan for execution
[Diagram: Consumer Request (SQL, REST, OData, GraphQL) → Query parsing → Mapping to SQL → Analysis of metadata and source capabilities → Rule-based Optimizer → Cost-based Optimizer → Execution plan → Result Set]
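The "generate several plans, then pick the cheapest" step can be illustrated with a toy cost model. This is a minimal sketch, not Denodo's optimizer: the `Plan` class, the weights, and the row counts are all hypothetical, and a real cost-based optimizer uses far richer statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical candidate execution plan with two cost drivers."""
    name: str
    rows_transferred: int        # rows moved over the network to the virtual layer
    rows_processed_locally: int  # rows the virtual layer must process itself


def estimated_cost(plan, network_weight=1.0, local_weight=0.2):
    # Assumed weights: network transfer is treated as the dominant cost.
    return (network_weight * plan.rows_transferred
            + local_weight * plan.rows_processed_locally)


def choose_plan(plans):
    # Cost-based choice: keep the candidate with the lowest estimated cost.
    return min(plans, key=estimated_cost)


candidates = [
    Plan("join in the virtual layer, then aggregate", 300_000_400, 300_000_400),
    Plan("push aggregation down to the source, then join", 400, 400),
]
best = choose_plan(candidates)
print(best.name)
```

The plan that pushes the aggregation down wins because it transfers far fewer rows, which matches the goals of maximizing push down and minimizing network transfer.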
8. #DenodoDataFest
Query Optimization
▪ Caching: Used for enhancing performance, protecting data sources from costly queries, and/or reusing
complex data combinations and transformations
▪ Summaries: Store common intermediate results that the query optimizer can then use as a starting point to accelerate analytical queries. Unlike with caching, you do not need to create a view to cache a data set. The query optimizer automatically analyzes whether it can rewrite incoming queries to take advantage of the data in the summary
▪ Parallel Processing: Provides native integration with several Massively Parallel Processing (MPP) systems to accelerate certain queries that require significant processing. Denodo pushes query processing to the MPP engine when the query would otherwise require Denodo itself to process large amounts of data and that processing cannot be done in streaming mode.
▪ Data Movement: When a query involves two views and one of them is much larger than the other, Virtual
DataPort can transfer the data of the smaller view into the data source of the larger view and execute the
operation in the second data source.
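The data movement decision described in the last bullet can be sketched as follows. The `View` class, the `size_ratio_threshold` parameter, and the row estimates are hypothetical names and values chosen for illustration; they are not Virtual DataPort settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    name: str
    source: str          # data source that holds the view's data
    estimated_rows: int  # row estimate, e.g. from collected statistics


def plan_data_movement(left, right, size_ratio_threshold=100):
    """Return (view_to_move, target_source), or None if movement is not worthwhile.

    Assumption: movement pays off only when the views live in different
    sources and one is at least `size_ratio_threshold` times larger.
    """
    if left.source == right.source:
        return None  # the join can already be pushed down as a whole
    small, large = sorted((left, right), key=lambda v: v.estimated_rows)
    if large.estimated_rows < size_ratio_threshold * max(small.estimated_rows, 1):
        return None
    return small, large.source


sales = View("sales", source="redshift", estimated_rows=300_000_000)
store = View("store", source="oracle", estimated_rows=400)
decision = plan_data_movement(sales, store)
print(decision)
```

Here the small `store` view would be copied into the source that holds `sales`, so the join runs next to the large table.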
9. #DenodoDataFest
Cache Overview
Caching is a form of data replication that can be used to optimize the application in certain scenarios
▪ Improve Query Performance
▪ Slow or high latency data sources (files, cloud apps like Salesforce.com, etc.)
▪ Complex combinations, transformations on large data volumes that take substantial time to process
▪ Reuse data sets in frequently requested queries
▪ Protect sensitive data sources, minimize impact of added workload, and control data access costs
▪ Client queries are automatically deflected to the cache system instead
▪ Client protection against intermittent system availability (unreliable data sources)
10. #DenodoDataFest
How Caching Works
▪ Cached data is stored in a relational database of the client’s choice
▪ Cache tables are created and managed by the Denodo Cache Engine
▪ Can be traditional RDBMS, in-memory database or Cloud Based
▪ Support for native bulk load tools for faster cache population
▪ Denodo supports three cache modes to fit wide range of scenarios
▪ Partial Query-by-Query
▪ Useful for web services or stored procedures with input fields
▪ Full Data Set Replication
▪ Support for full refresh and delta increments
▪ On-demand merge of cached data with real time access to recent changes
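The partial, query-by-query idea can be illustrated with a small in-memory sketch: results are stored per combination of input parameters and reused until they expire. The class name, the time-to-live parameter, and the `slow_source` function are hypothetical; in Denodo the cached data is stored in a relational database and managed by the cache engine, not in a Python dictionary.

```python
import time


class PartialQueryCache:
    """Toy query-by-query cache keyed by the input parameters of each call."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch      # function that queries the slow source
        self._ttl = ttl_seconds  # assumed expiry policy for cached entries
        self._clock = clock
        self._entries = {}       # params key -> (expiry time, rows)
        self.source_calls = 0    # how often the source was actually hit

    def query(self, **params):
        key = tuple(sorted(params.items()))
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # served from cache, source is protected
        self.source_calls += 1
        rows = self._fetch(**params)
        self._entries[key] = (now + self._ttl, rows)
        return rows


def slow_source(region):
    # Stand-in for a web service or stored procedure with an input field.
    return [f"{region}-order-{i}" for i in range(3)]


cache = PartialQueryCache(slow_source)
first = cache.query(region="EMEA")
second = cache.query(region="EMEA")  # same parameters: answered from the cache
other = cache.query(region="APAC")   # new parameters: goes to the source
print(cache.source_calls)
```

Only the parameter combinations that were actually requested are stored, which is why this mode suits sources that require input values.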
11. #DenodoDataFest
Smart Query Acceleration (Summaries)
Materialized Summary Tables
▪ Pre-aggregated data to serve relevant queries
▪ Much smaller than original data set
▪ Key for LDW self-service initiatives
▪ Integrated with query optimizer
▪ Full data lineage and base invalidation
Benefits
▪ Reduce processing at the source & Denodo
▪ Reduce data transfer over network
▪ Transparent to the user
[Diagram labels: Summ1, Summ2, Summ3, Summ4]
12. #DenodoDataFest
Smart Query Acceleration
Applicable to single source and multi-source queries, and can drastically improve performance
[Diagram: two panels with labeled data set sizes. Panel 1: Sales Summary 368,000; Sales 300,000,000; Store 400; Date 73,000. Panel 2: Sales 300,000,000; Store 2,000,000; Date 73,000. Example queries in both panels: Sales by store during 2020; Sales in Store A by year; Sales by city.]
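The effect of answering a query from a summary instead of the base data can be shown with a small SQLite example. This is a sketch of the general rewriting idea only: the table schema and the sample rows are invented, and the rewrite is done by hand here, whereas the slides state that the Denodo optimizer performs it automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_date TEXT, store TEXT, amount REAL);
INSERT INTO sales VALUES
  ('2019-12-31', 'A', 10.0),
  ('2020-01-15', 'A', 20.0),
  ('2020-01-15', 'A', 5.0),
  ('2020-06-01', 'B', 7.5),
  ('2021-03-10', 'B', 2.5);

-- Pre-aggregated summary at (date, store) granularity.
CREATE TABLE summ_sales_by_date_store AS
  SELECT sale_date, store, SUM(amount) AS total_amount
  FROM sales
  GROUP BY sale_date, store;
""")

# "Sales by store during 2020" answered from the base table...
from_base = conn.execute("""
  SELECT store, SUM(amount) FROM sales
  WHERE sale_date BETWEEN '2020-01-01' AND '2020-12-31'
  GROUP BY store ORDER BY store
""").fetchall()

# ...and the same question rewritten against the smaller summary.
from_summary = conn.execute("""
  SELECT store, SUM(total_amount) FROM summ_sales_by_date_store
  WHERE sale_date BETWEEN '2020-01-01' AND '2020-12-31'
  GROUP BY store ORDER BY store
""").fetchall()

summary_rows = conn.execute(
    "SELECT COUNT(*) FROM summ_sales_by_date_store").fetchone()[0]
print(from_base, from_summary, summary_rows)
```

The rewrite works because SUM can be re-aggregated from partial sums: any query that groups by a subset of the summary's columns can read the summary and scan fewer rows.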
13. #DenodoDataFest
Smart Query Acceleration Benchmarks
Source | Query | Original time | Accelerated time | Gain factor | Summary used
Single source (Redshift) | Sales by store during 2020 | 8.5 s | 0.5 s | 17 | summ_sales_by_date_store
Single source (Redshift) | Sales in Store A by year | 7.0 s | 0.4 s | 17.5 | summ_sales_by_date_store
Single source (Redshift) | Sales by city | 5.7 s | 0.6 s | 9.5 | summ_sales_by_date_store
Multi-source (Redshift + Oracle) | Sales by store during 2020 | 14.3 s | 6.6 s | 2.1 | summ_sales_by_date_store
Multi-source (Redshift + Oracle) | Sales in Store A by year | 10.3 s | 0.8 s | 12.8 | summ_sales_by_date_store
Multi-source (Redshift + Oracle) | Sales by city | 5.8 s | 0.6 s | 9.6 | summ_sales_by_date_store
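The gain factor in the benchmark is the original time divided by the accelerated time. A short check of that arithmetic, using the timings from the table (the variable and function names are illustrative only):

```python
# (source, query, original seconds, accelerated seconds), taken from the table.
benchmarks = [
    ("Single source", "Sales by store during 2020", 8.5, 0.5),
    ("Single source", "Sales in Store A by year", 7.0, 0.4),
    ("Single source", "Sales by city", 5.7, 0.6),
    ("Multi-source", "Sales by store during 2020", 14.3, 6.6),
    ("Multi-source", "Sales in Store A by year", 10.3, 0.8),
    ("Multi-source", "Sales by city", 5.8, 0.6),
]


def gain_factor(original_s, accelerated_s):
    # Speed-up ratio: how many times faster the accelerated query ran.
    return original_s / accelerated_s


gains = [gain_factor(orig, acc) for _, _, orig, acc in benchmarks]
for (source, query, _, _), gain in zip(benchmarks, gains):
    print(f"{source:14s} {query:28s} {gain:.2f}x")
```

The computed ratios agree with the published gain factors once the latter are read as truncated to one decimal place.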
17. #DenodoDataFest
Parallel Processing (MPP) Integration
▪ Data Virtualization and Data Lake strategies are often complementary
▪ Data lakes offer processing muscle to process content in a distributed file system
▪ Data Virtualization orchestrates execution, ingestion, ELT processes, semantic modeling
and security
▪ Denodo integrates tightly with a variety of data lake engines
▪ Optimized query push down and efficient data loads into the lake
▪ Support for data lakes as caching layer and ELT flows
▪ On-demand lift-and-shift of external data into the data lake engine to leverage its MPP capabilities
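The shape of MPP-style processing (split the data, aggregate each partition independently, merge the partial results) can be sketched in a few lines. This is an illustration of the pattern only: it uses threads inside a single Python process, which will not speed up CPU-bound work, whereas a real MPP engine distributes partitions across nodes. All names and data are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_totals(partition):
    """Aggregate one partition: total amount per store."""
    totals = Counter()
    for store, amount in partition:
        totals[store] += amount
    return totals


def parallel_totals(rows, workers=4):
    # Split the rows into `workers` partitions (round-robin).
    partitions = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_totals, partitions))
    # Merge step: partial sums per store are simply added together.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)


rows = [("A", 10), ("B", 5), ("A", 7), ("C", 1), ("B", 3)]
totals = parallel_totals(rows)
print(totals)
```

The merge is valid because SUM is decomposable: the total per store is the sum of the per-partition totals, regardless of how rows were partitioned.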
18. #DenodoDataFest
MPP Integration: Future Embedded Engine
▪ Customers with existing data lake engines can continue using their
current environment, or can transition to the embedded one
▪ Embedded engine will offer
▪ High-performance MPP queries over data in distributed filesystems without the need for additional software
▪ Out-of-the-box MPP options for caching and acceleration capabilities
▪ Efficient integrated store for large volumes of active metadata / query
history to enable upcoming AI capabilities
▪ Integrated security, deployment configuration and management
20. #DenodoDataFest
Need for Guided Application Development
▪ Which is the right technique to optimize my application?
▪ What optimizations have been applied and are in use?
▪ Can the system guide developers to optimize their work?
▪ Taking advantage of the privileged position to gather, analyze,
and use the data and usage statistics to guide developers
21. #DenodoDataFest
ML/AI Based Automation
Denodo is in a privileged position, with access to
▪ Usage patterns and statistics on data access, and source response
▪ How datasets are combined and their semantics
▪ What consumer tools are used and by whom
The gathered Active Metadata is used to feed AI
▪ AI driven automation is key to guided development
▪ Active Metadata is vital for the recommendation engine
▪ Captured information is key to recognizing data valuation
22. #DenodoDataFest
AI Driven Recommendations (Summaries)
AI driven recommendations for Summaries
▪ Based on usage pattern, statistics, data, location,
cost optimization, execution simulations
▪ Recommend Summaries, Location, and provide
information on potential performance gain
▪ Eliminates guesswork and provides a guided approach to optimizing the application
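One simple way to derive summary candidates from usage patterns is to count how often each grouping appears in the query history. The sketch below is a deliberately naive stand-in for the recommendation engine described above, which also weighs statistics, location, cost, and execution simulations. The log format, the function name, and the `min_share` threshold are assumptions.

```python
from collections import Counter


def recommend_summaries(query_log, min_share=0.3):
    """Suggest (view, group-by columns, share of queries) candidates.

    query_log: list of (base_view, group_by_columns) pairs, a hypothetical
    simplification of real query history.
    """
    counts = Counter((view, frozenset(cols)) for view, cols in query_log)
    total = len(query_log)
    return [
        (view, sorted(cols), n / total)
        for (view, cols), n in counts.most_common()
        if n / total >= min_share
    ]


log = [
    ("sales", ["store", "date"]),
    ("sales", ["date", "store"]),  # same grouping, different column order
    ("sales", ["city"]),
    ("sales", ["store", "date"]),
    ("orders", ["customer"]),
]
recommendations = recommend_summaries(log)
print(recommendations)
```

Groupings that dominate the workload surface as candidates, and the share gives a rough indication of how many queries a summary could serve.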
23. #DenodoDataFest
ML/AI Based Automation in the Future
Query: Smart Autocomplete
▪ Augment keyword-based autocomplete with frequently used SQL fragments
Development: Suggest Joins and Transformations
▪ Automatic suggestion of common combinations and transformations, based on past activity of similar users
Discovery: Automatically infer relationships
▪ Use metadata analysis and historical usage (e.g. JOIN conditions)
Performance: automatically refine cost estimations
▪ Detect cases where the optimizer chose a non-optimal execution plan and correct it in future similar queries
25. #DenodoDataFest
Need for the Right Data, Right Now
▪ Business users know what they need, but not how to find it
▪ Power and Standard users need different discovery experiences
▪ Data discovery needs guardrails to prevent user errors
▪ One cannot assume the user has specific expertise
▪ Faster Data + Right Data = Valuable Data Insights
26. #DenodoDataFest
Denodo Data Catalog at a Glance
▪ Organized inventory of virtualized and curated data assets
▪ Enriched metadata for key business indicators
▪ Collaboration across business and IT teams
▪ Active metadata and data valuation indicators
▪ Integrated with Delivery Layer for rapid & secure data access
27. #DenodoDataFest
Enabling the Performant Business User
AI-driven recommendations for relevant data, based on usage patterns and relationships, guide users to the key data assets and provide quick results
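Recommendations based on usage patterns can be approximated with a co-usage score: suggest assets used by people whose usage overlaps with yours. This is a minimal sketch under that assumption, not the Data Catalog's actual algorithm; the user names, asset names, and scoring rule are invented.

```python
from collections import Counter


def recommend_assets(usage, user, top_n=3):
    """Suggest data assets the user has not used yet.

    usage: dict mapping user name -> set of asset names they queried.
    Each candidate asset is scored by the overlap between the user's
    assets and those of every other user who queried it.
    """
    mine = usage.get(user, set())
    scores = Counter()
    for other, assets in usage.items():
        if other == user:
            continue
        overlap = len(mine & assets)
        if overlap == 0:
            continue  # no shared usage, so no evidence of similar needs
        for asset in assets - mine:
            scores[asset] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [asset for asset, _ in ranked][:top_n]


usage = {
    "ana": {"sales", "store"},
    "ben": {"sales", "store", "returns"},
    "cy": {"sales", "forecast"},
    "dee": {"hr_headcount"},
}
suggested = recommend_assets(usage, "ana")
print(suggested)
```

Assets used only by users with no overlap, such as `hr_headcount` here, are never suggested, which acts as a simple guardrail for relevance.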
30. #DenodoDataFest
Conclusion
▪ Performance has many facets across IT and Business. Denodo addresses the need to optimize across the board and does not limit its features to any one area
▪ Transparency is critical in your optimization process. Denodo is a fully transparent platform that lets you inspect the query process and the lifecycle of the data
▪ Guided development and discovery are a vital part of robust development. Denodo is in a privileged position to guide the development of applications and optimizations based on the gathered information and AI-driven features
31. #DenodoDataFest
Additional Resources
▪ Denodo Caching Module
https://community.denodo.com/docs/html/browse/8.0/en/vdp/administration/cache_module/cache_module
▪ Best Practices to Optimize Performance (Caching)
https://community.denodo.com/kb/en/view/document/Best%20Practices%20to%20Maximize%20Performance%20III:%20Caching?category=Best+Practices
▪ Smart Query Acceleration using Summaries
https://community.denodo.com/docs/html/browse/latest/en/vdp/administration/optimizing_queries/summary_views/summary_views
▪ Parallel Processing (MPP)
https://community.denodo.com/docs/html/browse/latest/en/vdp/administration/optimizing_queries/parallel_processing/parallel_processing
▪ Using AI to Further Accelerate Denodo Platform Performance
https://www.datavirtualizationblog.com/using-ai-to-further-accelerate-denodo-platform-performance/