Watch here: https://bit.ly/2BiNVlU
Denodo 8 builds on its outstanding performance features with new query acceleration capabilities. Denodo’s new engine accelerates execution through an automatic query-rewriting process that identifies pre-calculated summaries that can reduce execution time. As a result, end users and reporting tools don’t have to change their queries, while execution times can be up to 100x faster in analytical scenarios.
Attend this session to learn:
- How this new functionality works
- Use cases and scenarios that can take advantage of it
- See it in action with a product demo
Accelerate your Queries with Data Virtualization
1. DATA VIRTUALIZATION PACKED LUNCH
WEBINAR SERIES
Sessions Covering Key Data Integration Challenges
Solved with Data Virtualization
2. Accelerate your Queries with Data Virtualization
Speed up execution in analytical scenarios up to 100x
Pablo Alvarez-Yanez
Director of Product Management, Denodo
5.
What does “query acceleration” mean?
“Query acceleration” is a set of techniques that can significantly speed up the execution of a new query that was not previously cached.
These techniques can make a query run up to 100x faster, improving the usability of ad hoc queries in very reactive scenarios, like interactive dashboards.
Query acceleration applies not just to federated queries, but to queries against a single source too.
In short, it’s a way to make ad hoc queries run much faster.
7.
Optimizing analytic scenarios
Main factors that determine performance:
• Processing in the source
• Data transfer
• Processing in Denodo
[Diagram: a “Total sales by store?” query sent to Denodo]
Let’s review the techniques traditionally used to improve execution times.
8.
Based on metadata analysis, generate an execution plan that produces equivalent results faster than the original one.
These optimization techniques are key to the performance of real-time queries:
• Processing in the source
• Data transfer
• Processing in Denodo
Rule-based and Cost-based optimizations
[Diagram: for “Total sales by store?”, the original plan joins Sales (300 M rows) with Store (2 M rows) and then groups by store, taking 125 secs; the rewritten plan first groups Sales by store ID, shrinking the join input to 2 M rows and taking 15 secs]
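The plan rewrite above can be sketched in a few lines of Python. This is a toy illustration of the partial-aggregation-pushdown idea, not Denodo’s engine: pushing the GROUP BY below the join shrinks the join input from one row per sale to one row per store, while producing the same result.

```python
from collections import defaultdict

# Toy stand-ins for the slide's 300 M-row Sales and 2 M-row Store tables.
sales = [("s1", 10), ("s1", 5), ("s2", 7), ("s2", 3), ("s1", 2)]  # (store_id, amount)
stores = {"s1": "Downtown", "s2": "Airport"}                      # store_id -> store name

def naive_total_by_store():
    """Original plan: join every sales row with Store, then group by store."""
    totals = defaultdict(int)
    for store_id, amount in sales:
        totals[stores[store_id]] += amount  # join touches one row per *sale*
    return dict(totals)

def pushed_down_total_by_store():
    """Rewritten plan: aggregate below the join, so the join input shrinks
    from one row per sale to one row per store."""
    partial = defaultdict(int)
    for store_id, amount in sales:          # GROUP BY store_id pushed to the source
        partial[store_id] += amount
    return {stores[sid]: total for sid, total in partial.items()}  # tiny join

# Both plans are equivalent; the second one joins far fewer rows.
assert naive_total_by_store() == pushed_down_total_by_store()
```

At real scale the difference is the 300 M-row vs. 2 M-row join input shown in the slide, which is where the 125 s → 15 s gain comes from.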
9.
Traditional Caching
Save the results of a previous query for use in subsequent executions.
• Processing in the source
• Data transfer
• Processing in Denodo
Limited in self-service scenarios (ad hoc queries):
• Requires end-user knowledge of the data model
• Queries need to refer to the specific views that have been cached
• Not flexible enough for aggregation queries
[Diagram: a “Total sales by store” query served by Denodo from the cache database]
Is there any way to get rid of those
limitations?
10.
More technical resources on these topics
Caching
• Some caching decisions can have a big impact
• Guidance on selecting the cache database, cache mode, views to cache, refreshing options, indexes and more
Detecting Bottlenecks in a Query
• The cause of a slow query can have different roots: client, data source, network, Denodo configuration
• Guidance on analyzing a slow query to find the bottleneck and the different solutions for each cause
Modeling Big Data and Analytic Use Cases
• Guidance on partitioned unions, joins, slowly changing dimensions, view parameters, alternative wrappers, etc.
Configuring the Query Optimizer
• The query optimizer needs complete information to make the right decisions
• Guidance on view statistics, PKs, indexes, associations, data movement, …
15.
Automatically detect and reuse that data
[Diagram: two queries, “Store sales by store?” and “Store sales by month in 2019?”, both resolved by Denodo from the same pre-computed data]
16.
What?
Common partial aggregates of large facts tables and
common dimensions can be materialized and used as
starting points to accelerate queries
Similar to the concept of ‘aggregation-awareness’ used by some BI tools and OLAP databases
We call these partial aggregates: “Summaries”
The Denodo Query optimizer analyzes each query and
chooses among the available summaries and other
optimization techniques
This process is completely transparent to the end user
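The matching step can be sketched as follows. Everything here (the names, the matching rule) is an illustrative assumption, not Denodo’s actual optimizer logic: a summary qualifies when the query groups by a subset of the summary’s keys and each measure can be re-aggregated from the summary’s partial aggregates. (Real engines also resolve dimension attributes like year back to FKs such as sold_date_id; that step is omitted here.)

```python
# Hypothetical sketch of aggregation-aware summary matching.
SUMMARY = {
    "name": "summary_total_by_store_day",
    "group_by": {"store_id", "sold_date_id"},
    "measures": {"total_sales": "SUM"},
}

def can_use_summary(query_group_by, query_measures, summary=SUMMARY):
    """A summary can answer a query when the query's grouping columns are a
    subset of the summary's keys, and every measure re-aggregates cleanly
    (e.g. a SUM of partial SUMs is still the total SUM)."""
    reaggregatable = {"SUM", "MIN", "MAX"}  # COUNT needs a SUM over partial counts
    return (query_group_by <= summary["group_by"]
            and all(m in summary["measures"]
                    and summary["measures"][m] in reaggregatable
                    for m in query_measures))

# "Total sales by store" rolls up the day-level summary
assert can_use_summary({"store_id"}, {"total_sales"})
# "Total sales by customer" cannot: customer_id is not a key of the summary
assert not can_use_summary({"customer_id"}, {"total_sales"})
```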
17.
Why?
Summaries are aggregated and therefore much smaller than the original tables.
Yet one summary can be used to accelerate many different queries.
Processing smaller tables means that queries can be resolved much faster.
18.
Some execution numbers
• TPC-DS data:
• Distributed in 3 different systems
• Tables with hundreds of millions of rows
• Summary: total sales by store_id, sold_date_id
| Query | Execution Time (no acceleration) | Execution Time (acceleration) | Performance Gain | Summary used |
|---|---|---|---|---|
| Total sales by year | 15.45 s | 2.38 s | 6.5x | summary_total_by_store_day |
| Total sales by quarter, store name and city | 22.49 s | 2.62 s | 8.57x | summary_total_by_store_day |
| Total sales by store and city for last quarter | 14.71 s | 0.47 s | 31.1x | summary_total_by_store_day |
| Total sales in a specific store | 14.36 s | 2.66 s | 5.39x | summary_total_by_store_day |
| Total sales in a specific store and year | 14.32 s | 3.18 s | 4.0x | summary_total_by_store_day |
20.
End User Requirements
• Fast queries for interactive dashboards
• Flexibility for ad hoc queries on top of the semantic model
• No need for data in real time
[Diagram: a star schema (SALES, ITEM, STORE, DATE) in the underlying data sources, queried ad hoc: “Sales by store?”, “Sales by customer in Store 2?”, “Store 1 sales in January”]
21.
Data Lake acceleration
• Huge data volumes in raw tables
• Data source is not fast enough
• Smaller summaries avoid time-consuming heavy calculations
• Summaries can be created in the same system (to enable delegation of JOINs with other tables) or somewhere else (e.g. a faster database)
[Diagram: SALES (10 billion rows) in the data lake vs. a sales summary (1 million rows) answering “Sales by store?”]
22.
Pay-per-use Cost Savings in the Cloud
• Data sources charge based on usage or data volumes:
• Snowflake charges by “compute credits”
• Athena by bytes scanned
• Smaller summaries mean less data processed and less CPU time
• Summarized queries are not just faster, but also cheaper
• Cloud DWs do not natively offer acceleration capabilities
[Diagram: SALES (10 billion rows) vs. a sales summary (1 million rows) answering “Sales by store?”]
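Back-of-the-envelope arithmetic for the bytes-scanned case, in Python. The row size is invented for illustration; the $5 per TB scanned matches Athena’s published on-demand pricing at the time of this webinar, but verify current rates:

```python
# Illustrative cost arithmetic only: row sizes are made up, and the $5/TB
# figure is Athena's published pay-per-query price (verify current pricing).
TB = 1024 ** 4
row_bytes = 100  # assumed average row size

full_scan_bytes    = 10_000_000_000 * row_bytes  # 10 billion-row raw SALES table
summary_scan_bytes = 1_000_000 * row_bytes       # 1 million-row sales summary

price_per_tb = 5.00
full_cost    = full_scan_bytes / TB * price_per_tb
summary_cost = summary_scan_bytes / TB * price_per_tb

print(f"full scan:    ${full_cost:,.2f} per query")
print(f"summary scan: ${summary_cost:,.6f} per query")
```

With these assumptions the summary query scans 10,000x fewer bytes, so it costs 10,000x less per execution on a bytes-scanned billing model.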
23.
Reduce Workload in the EDW
• The Enterprise Data Warehouse is already at capacity
• New initiatives (data science, self-service analytics, etc.) demand additional capacity
• Replicating data to an additional data mart or data lake is costly
• Summary-based queries reduce workload and compute time, and have small storage demands
[Diagram: SALES (10 billion rows) in the EDW vs. a sales summary (1 million rows) answering “Sales by store?”]
24.
Transition to Cloud
Phase I: All on premise → Phase II: Hybrid environment
Create summaries with common data from on-prem tables to avoid accessing remote legacy systems.
[Diagram: in Phase II, Denodo also queries a cloud-hosted SUMMARY built from the on-prem SALES and STORE tables]
29.
How to create the right summaries
Denodo 8 will automatically analyze past queries and
suggest summaries that will increase performance
Summaries can also be defined manually. How?
1. Identify the queries you need to accelerate
2. Understand the facts and dimension tables that are used
3. Define summaries that accelerate those queries
1. Define content: generic enough to cover multiple queries, yet specific enough to keep it small and fast
2. Decide location
30.
How to create the right summaries: Content
Summaries should be generic enough to accelerate multiple queries. Common approaches are:
• Summaries on fact tables only: aggregate by the FKs to a dimension instead of by its attributes
• Example #1: total sales by sold_date_id and store_id. This can address queries asking for total sales by year, quarter, store_name, store_address, etc.
• Include dimensions with frequently used hierarchical attributes (e.g. day > week > quarter > year)
• Example #2: total sales by year, store_id. This can address queries aggregating by year and by any store attribute.
[Diagram: Total_sales_by_date_id_store_id groups SALES by its FKs to STORE and DATE; Total_sales_by_year_store_id groups SALES joined with DATE by year and store_id]
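The reusability of an FK-keyed summary can be sketched in Python (toy tables, invented numbers; this is the general rollup idea, not Denodo code): because the summary keeps sold_date_id, it still joins to the DATE dimension, so a day-level SUM rolls up cleanly to year, quarter, or any other DATE attribute.

```python
from collections import defaultdict

# Invented DATE dimension: sold_date_id -> (year, quarter)
date_dim = {
    20190101: (2019, "Q1"),
    20190401: (2019, "Q2"),
    20200101: (2020, "Q1"),
}
# Invented summary rows: total sales by (sold_date_id, store_id)
summary = [
    (20190101, "s1", 100.0),
    (20190401, "s1", 250.0),
    (20200101, "s1", 80.0),
    (20190101, "s2", 40.0),
]

def total_sales_by_year(rows):
    """Answer a year-level query from the day-level summary by joining the
    summary's FK to the DATE dimension and re-aggregating the partial SUMs."""
    totals = defaultdict(float)
    for date_id, _store_id, amount in rows:
        year, _quarter = date_dim[date_id]
        totals[year] += amount  # SUM over partial SUMs is still the total SUM
    return dict(totals)

assert total_sales_by_year(summary) == {2019: 390.0, 2020: 80.0}
```

The same summary would answer quarter-level or store-level questions the same way, which is why one FK-keyed summary can serve many different queries.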
31.
How to create the right summaries: Content
Summaries should also be specific enough to be small and fast. Too many FKs can lead to a huge summary, so create multiple summaries over the same set of tables:
• Create multiple summaries with common combinations of FKs
• Add a filter (e.g. a specific year, country, or product category)
• Aggregate by common hierarchical attributes in the dimension (e.g. quarter, store region)
Example: total sales by product_id and quarter, just for the current year, by store region
[Diagram: Total_sales_curr_year_prod_store_division groups SALES joined with PRODUCT, STORE and DATE]
32.
How to create the right summaries: Location
Best alternatives:
• Close to Denodo
• Summary persisted in the selected cache data source
• Reduces workload on the original system
• May introduce federation
• Close to the rest of the data
• Summary persisted in the same data source that contains the original tables
• Maximizes push-down
• Requires write permissions in the data source
33.
Summaries and Caching in your Modeling Strategy
• Base Layer: original source models
• Semantic Layer: logical DW model
• Business Layer (optional): de-normalized views for business users
• Reporting Layer (optional): pre-canned reports with calculated metrics
Caching: slow and protected sources only
35.
Conclusions
Query acceleration capabilities benefit end users:
• Remarkably faster queries
• In a completely transparent manner
And simplify the job of administrators and developers:
• Less manual tuning: Query optimizer combines the
summaries and existing optimization techniques to
efficiently optimize any query
• Brings powerful optimization techniques to any source and
any reporting tool
Denodo 8’s query acceleration capabilities are:
• A game changer for self-service initiatives
• Key to saving time, cost, and resources
37.
Next Steps
Access Denodo Platform in the Cloud!
Take a Test Drive today!
www.denodo.com/TestDrive
GET STARTED TODAY