Watch here: https://bit.ly/2BiNVlU
Denodo 8 builds on its outstanding performance features with new query acceleration capabilities. Denodo’s new engine accelerates execution through an automatic query-rewriting process that identifies pre-calculated summaries that can reduce execution time. As a result, end users and reporting tools don’t have to change their queries, while execution times can be up to 100x faster in analytical scenarios.
Attend this session to learn:
- How this new functionality works
- Use cases and scenarios that can take advantage of it
- See it in action with a product demo
Accelerate your Queries with Data Virtualization
1. DATA VIRTUALIZATION PACKED LUNCH
WEBINAR SERIES
Sessions Covering Key Data Integration Challenges
Solved with Data Virtualization
2. Accelerate your Queries with Data Virtualization
Speed up execution in analytical scenarios up to 100x
Pablo Alvarez-Yanez
Director of Product Management, Denodo
5.
What does “query acceleration” mean?
“Query acceleration” is a set of techniques that can significantly speed up the execution of a new query that was not previously cached.
These techniques can make a query run up to 100x faster, improving the usability of ad hoc queries in very reactive scenarios, like interactive dashboards.
Query acceleration applies not just to federated queries, but to queries against a single source too.
In short, it’s a way to make ad hoc queries run much faster.
7.
Optimizing analytic scenarios
Main factors that determine performance:
• Processing in the source
• Data transfer
• Processing in Denodo
[Diagram: a “Total sales by store?” query sent to Denodo]
Let’s review the techniques traditionally used to improve execution times.
8.
Based on metadata analysis, generate an execution plan that produces equivalent results faster than the original one.
These optimization techniques are key to the performance of real-time queries:
• Processing in the source
• Data transfer
• Processing in Denodo
Rule-based and Cost-based optimizations
[Diagram: for “Total sales by store?”, the original plan joins Sales (300 M rows) with Store (2 M rows) and then groups by store, taking 125 secs; the rewritten plan first groups Sales by store ID, shrinking the join input to 2 M rows and taking 15 secs]
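The plan rewrite above can be sketched in a few lines of Python. This is a toy illustration of the partial-aggregation-pushdown idea, not Denodo’s engine: pushing the GROUP BY below the join shrinks the join input from one row per sale to one row per store, while producing the same result.

```python
from collections import defaultdict

# Toy stand-ins for the slide's 300 M-row Sales and 2 M-row Store tables.
sales = [("s1", 10), ("s1", 5), ("s2", 7), ("s2", 3), ("s1", 2)]  # (store_id, amount)
stores = {"s1": "Downtown", "s2": "Airport"}                      # store_id -> store name

def naive_total_by_store():
    """Original plan: join every sales row with Store, then group by store."""
    totals = defaultdict(int)
    for store_id, amount in sales:
        totals[stores[store_id]] += amount  # join touches one row per *sale*
    return dict(totals)

def pushed_down_total_by_store():
    """Rewritten plan: aggregate below the join, so the join input shrinks
    from one row per sale to one row per store."""
    partial = defaultdict(int)
    for store_id, amount in sales:          # GROUP BY store_id pushed to the source
        partial[store_id] += amount
    return {stores[sid]: total for sid, total in partial.items()}  # tiny join

# Both plans are equivalent; the second one joins far fewer rows.
assert naive_total_by_store() == pushed_down_total_by_store()
```

At real scale the difference is the 300 M-row vs. 2 M-row join input shown in the slide, which is where the 125 s → 15 s gain comes from.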
9.
Traditional Caching
Save the results of a previous query for use in subsequent executions.
• Processing in the source
• Data transfer
• Processing in Denodo
Limited in self-service scenarios (ad hoc queries):
• Requires end-user knowledge of the data model
• Queries need to refer to the specific views that have been cached
• Not flexible enough for aggregation queries
[Diagram: a “Total sales by store” query served by Denodo from the cache database]
Is there any way to get rid of those
limitations?
10.
More technical resources on these topics
Caching
• Some caching decisions can have a big impact
• Guidance on selecting the cache database, cache mode, views to cache, refreshing options, indexes and more
Detecting Bottlenecks in a Query
• The cause of a slow query can have different roots: client, data source, network, Denodo configuration
• Guidance on analyzing a slow query to find the bottleneck and the different solutions for each cause
Modeling Big Data and Analytic Use Cases
• Guidance on partitioned unions, joins, slowly changing dimensions, view parameters, alternative wrappers, etc.
Configuring the Query Optimizer
• The query optimizer needs complete information to make the right decisions
• Guidance on view statistics, PKs, indexes, associations, data movement, …
15.
Automatically detect and reuse that data
[Diagram: two queries, “Store sales by store?” and “Store sales by month in 2019?”, both resolved by Denodo from the same pre-computed data]
16.
What?
Common partial aggregates of large facts tables and
common dimensions can be materialized and used as
starting points to accelerate queries
Similar to the concept of ‘aggregation-awareness’ used by some BI tools and OLAP databases
We call these partial aggregates: “Summaries”
The Denodo Query optimizer analyzes each query and
chooses among the available summaries and other
optimization techniques
This process is completely transparent to the end user
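The matching step can be sketched as follows. Everything here (the names, the matching rule) is an illustrative assumption, not Denodo’s actual optimizer logic: a summary qualifies when the query groups by a subset of the summary’s keys and each measure can be re-aggregated from the summary’s partial aggregates. (Real engines also resolve dimension attributes like year back to FKs such as sold_date_id; that step is omitted here.)

```python
# Hypothetical sketch of aggregation-aware summary matching.
SUMMARY = {
    "name": "summary_total_by_store_day",
    "group_by": {"store_id", "sold_date_id"},
    "measures": {"total_sales": "SUM"},
}

def can_use_summary(query_group_by, query_measures, summary=SUMMARY):
    """A summary can answer a query when the query's grouping columns are a
    subset of the summary's keys, and every measure re-aggregates cleanly
    (e.g. a SUM of partial SUMs is still the total SUM)."""
    reaggregatable = {"SUM", "MIN", "MAX"}  # COUNT needs a SUM over partial counts
    return (query_group_by <= summary["group_by"]
            and all(m in summary["measures"]
                    and summary["measures"][m] in reaggregatable
                    for m in query_measures))

# "Total sales by store" rolls up the day-level summary
assert can_use_summary({"store_id"}, {"total_sales"})
# "Total sales by customer" cannot: customer_id is not a key of the summary
assert not can_use_summary({"customer_id"}, {"total_sales"})
```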
17.
Why?
Summaries are aggregated and therefore much smaller than the original tables.
Yet one summary can be used to accelerate many different queries.
Processing smaller tables means that queries can be resolved much faster.
18.
Some execution numbers
• TPC-DS data:
• Distributed in 3 different systems
• Tables with hundreds of millions of rows
• Summary: total sales by store_id, sold_date_id
| Query | Execution Time (no acceleration) | Execution Time (acceleration) | Performance Gain | Summary used |
|---|---|---|---|---|
| Total sales by year | 15.45 s | 2.38 s | 6.5x | summary_total_by_store_day |
| Total sales by quarter, store name and city | 22.49 s | 2.62 s | 8.57x | summary_total_by_store_day |
| Total sales by store and city for last quarter | 14.71 s | 0.47 s | 31.1x | summary_total_by_store_day |
| Total sales in a specific store | 14.36 s | 2.66 s | 5.39x | summary_total_by_store_day |
| Total sales in a specific store and year | 14.32 s | 3.18 s | 4.0x | summary_total_by_store_day |
20.
End User Requirements
• Fast queries for interactive dashboards
• Flexibility for ad hoc queries on top of the semantic model
• No need for data in real time
[Diagram: a star schema (SALES, ITEM, STORE, DATE) in the underlying data sources, queried ad hoc: “Sales by store?”, “Sales by customer in Store 2?”, “Store 1 sales in January”]
21.
Data Lake acceleration
• Huge data volumes in raw tables
• Data source is not fast enough
• Smaller summaries avoid time-consuming heavy calculations
• Summaries can be created in the same system (to enable delegation of JOINs with other tables) or somewhere else (e.g. a faster database)
[Diagram: SALES (10 billion rows) in the data lake vs. a sales summary (1 million rows) answering “Sales by store?”]
22.
Pay-per-use Cost Savings in the Cloud
• Data sources charge based on usage or data volumes:
• Snowflake charges by “compute credits”
• Athena by bytes scanned
• Smaller summaries mean less data processed and less CPU time
• Summarized queries are not just faster, but also cheaper
• Cloud DWs do not natively offer acceleration capabilities
[Diagram: SALES (10 billion rows) vs. a sales summary (1 million rows) answering “Sales by store?”]
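Back-of-the-envelope arithmetic for the bytes-scanned case, in Python. The row size is invented for illustration; the $5 per TB scanned matches Athena’s published on-demand pricing at the time of this webinar, but verify current rates:

```python
# Illustrative cost arithmetic only: row sizes are made up, and the $5/TB
# figure is Athena's published pay-per-query price (verify current pricing).
TB = 1024 ** 4
row_bytes = 100  # assumed average row size

full_scan_bytes    = 10_000_000_000 * row_bytes  # 10 billion-row raw SALES table
summary_scan_bytes = 1_000_000 * row_bytes       # 1 million-row sales summary

price_per_tb = 5.00
full_cost    = full_scan_bytes / TB * price_per_tb
summary_cost = summary_scan_bytes / TB * price_per_tb

print(f"full scan:    ${full_cost:,.2f} per query")
print(f"summary scan: ${summary_cost:,.6f} per query")
```

With these assumptions the summary query scans 10,000x fewer bytes, so it costs 10,000x less per execution on a bytes-scanned billing model.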
23.
Reduce Workload in the EDW
• The Enterprise Data Warehouse is already at capacity
• New initiatives (data science, self-service analytics, etc.) demand additional capacity
• Replicating data to an additional data mart or data lake is costly
• Summary-based queries reduce workload and compute time, and have small storage demands
[Diagram: SALES (10 billion rows) in the EDW vs. a sales summary (1 million rows) answering “Sales by store?”]
24.
Transition to Cloud
Phase I: All on premise → Phase II: Hybrid environment
Create summaries with common data from on-prem tables to avoid accessing remote legacy systems.
[Diagram: in Phase II, Denodo also queries a cloud-hosted SUMMARY built from the on-prem SALES and STORE tables]
29.
How to create the right summaries
Denodo 8 will automatically analyze past queries and
suggest summaries that will increase performance
Summaries can also be defined manually. How?
1. Identify the queries you need to accelerate
2. Understand the facts and dimension tables that are used
3. Define summaries that accelerate those queries
1. Define content: generic enough to cover multiple queries, yet specific enough to keep it small and fast
2. Decide location
30.
How to create the right summaries: Content
Summaries should be generic enough to accelerate multiple queries. Common approaches are:
• Summaries on fact tables only: aggregate by the FKs to a dimension instead of by its attributes
• Example #1: total sales by sold_date_id and store_id. This can address queries asking for total sales by year, quarter, store_name, store_address, etc.
• Include dimensions with frequently used hierarchical attributes (e.g. day > week > quarter > year)
• Example #2: total sales by year, store_id. This can address queries aggregating by year and by any store attribute.
[Diagram: Total_sales_by_date_id_store_id groups SALES by its FKs to STORE and DATE; Total_sales_by_year_store_id groups SALES joined with DATE by year and store_id]
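The reusability of an FK-keyed summary can be sketched in Python (toy tables, invented numbers; this is the general rollup idea, not Denodo code): because the summary keeps sold_date_id, it still joins to the DATE dimension, so a day-level SUM rolls up cleanly to year, quarter, or any other DATE attribute.

```python
from collections import defaultdict

# Invented DATE dimension: sold_date_id -> (year, quarter)
date_dim = {
    20190101: (2019, "Q1"),
    20190401: (2019, "Q2"),
    20200101: (2020, "Q1"),
}
# Invented summary rows: total sales by (sold_date_id, store_id)
summary = [
    (20190101, "s1", 100.0),
    (20190401, "s1", 250.0),
    (20200101, "s1", 80.0),
    (20190101, "s2", 40.0),
]

def total_sales_by_year(rows):
    """Answer a year-level query from the day-level summary by joining the
    summary's FK to the DATE dimension and re-aggregating the partial SUMs."""
    totals = defaultdict(float)
    for date_id, _store_id, amount in rows:
        year, _quarter = date_dim[date_id]
        totals[year] += amount  # SUM over partial SUMs is still the total SUM
    return dict(totals)

assert total_sales_by_year(summary) == {2019: 390.0, 2020: 80.0}
```

The same summary would answer quarter-level or store-level questions the same way, which is why one FK-keyed summary can serve many different queries.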
31.
How to create the right summaries: Content
Summaries should also be specific enough to be small and fast. Too many FKs can lead to a huge summary, so create multiple summaries over the same set of tables:
• Create multiple summaries with common combinations of FKs
• Add a filter (e.g. a specific year, country, or product category)
• Aggregate by common hierarchical attributes in the dimension (e.g. quarter, store region)
Example: total sales by product_id and quarter, just for the current year, by store region
[Diagram: Total_sales_curr_year_prod_store_division groups SALES joined with PRODUCT, STORE and DATE]
32.
How to create the right summaries: Location
Best alternatives:
• Close to Denodo
• Summary persisted in the selected cache data source
• Reduces workload on the original system
• May introduce federation
• Close to the rest of the data
• Summary persisted in the same data source that contains the original tables
• Maximizes push-down
• Requires write permissions in the data source
33.
Summaries and Caching in your Modeling Strategy
• Base Layer: original source models
• Semantic Layer: logical DW model
• Business Layer (optional): de-normalized views for business users
• Reporting Layer (optional): pre-canned reports with calculated metrics
Caching: slow and protected sources only
35.
Conclusions
Query acceleration capabilities benefit end users:
• Remarkably faster queries
• In a completely transparent manner
And simplify the job of administrators and developers:
• Less manual tuning: Query optimizer combines the
summaries and existing optimization techniques to
efficiently optimize any query
• Brings powerful optimization techniques to any source and
any reporting tool
Denodo 8’s query acceleration capabilities are:
• A game changer for self-service initiatives
• Key to saving time, cost, and resources
37.
Next Steps
Access Denodo Platform in the Cloud!
Take a Test Drive today!
www.denodo.com/TestDrive
GET STARTED TODAY