Cost-Based Optimizer in Apache Spark 2.2
Ron Hu, Zhenhua Wang
Huawei Technologies, Inc.
Sameer Agarwal, Wenchen Fan
Databricks Inc.
Session 1 Topics
• Motivation
• Statistics Collection Framework
• Cost Based Optimizations
• TPC-DS Benchmark and Query Analysis
• Demo
How Spark Executes a Query?
Focus of Today’s Talk
Catalyst Optimizer: An Overview
events = …("/logs")
stats = …
errors = stats.where(stats.status == "ERR")
Query Plan is an internal representation of a user’s program.
Series of Transformations that convert the initial query plan into an optimized plan.
(Figure: initial and optimized query plans built over SCAN logs and SCAN users.)
Catalyst Optimizer: An Overview
In Spark, the optimizer’s goal is to
minimize end-to-end query response time.
Two key ideas:
- Prune unnecessary data as early as possible
- e.g., filter pushdown, column pruning
- Minimize per-operator cost
- e.g., broadcast vs shuffle, optimal join order
Rule-based Optimizer in Spark 2.1
• Most of Spark SQL optimizer’s rules are heuristic rules.
– e.g., PushDownPredicate, ColumnPruning
• Does NOT consider the cost of each operator
• Does NOT consider selectivity when estimating join relation size
• Therefore:
– Join order is mostly decided by the order in which tables appear in the SQL query
– Physical join implementation is decided based on heuristics
An Example (TPC-DS q11 variant)
SELECT customer_id
FROM customer, store_sales, date_dim
WHERE c_customer_sk = ss_customer_sk AND
ss_sold_date_sk = d_date_sk AND
c_customer_sk > 1000
An Example (TPC-DS q11 variant)
(Figure: plan for the query above, with scans of store_sales (3 billion rows), customer (12 million rows) and date_dim (0.1 million rows); intermediate row counts shown: 2.5 billion, 500 million and 10 million.)
An Example (TPC-DS q11 variant)
(Figure: the same query with its joins reordered; intermediate row counts shown: 2.5 billion, 500 million, 500 million and 10 million.)
40% faster
80% less data
How do we automatically optimize queries like these?
Cost Based Optimizer (CBO)
• Collect, infer and propagate table/column
statistics on source/intermediate data
• Calculate the cost for each operator in terms of
number of output rows, size of output, etc.
• Based on the cost calculation, pick the optimal (lowest-cost) query execution plan
Rest of the Talk
• Statistics Collection Framework
– Table/Column Level Statistics Collected
– Cardinality Estimation (Filters, Joins, Aggregates etc.)
• Cost-based Optimizations
– Build Side Selection
– Multi-way Join Re-ordering
• TPC-DS Benchmarks
• Demo
Statistics Collection Framework
and Cost Based Optimizations
Ron Hu
Huawei Technologies
Step 1: Collect, infer and propagate table
and column statistics on source and
intermediate data
Table Statistics Collected
• Command to collect statistics of a table.
• It collects table-level statistics and saves them into the meta-store:
– Number of rows
– Table size in bytes
Column Statistics Collected
• Command to collect column level statistics of individual columns.
FOR COLUMNS column-name1, column-name2, ….
• It collects column-level statistics and saves them into the meta-store.
String/Binary type
✓ Distinct count
✓ Null count
✓ Average length
✓ Max length
Numeric/Date/Timestamp type
✓ Distinct count
✓ Max
✓ Min
✓ Null count
✓ Average length (fixed length)
✓ Max length (fixed length)
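To make the two lists concrete, here is a minimal Python sketch that computes these per-column statistics from an in-memory list of values. It is illustrative only, not Spark’s implementation: the names `ColumnStat`, `string_column_stats` and `numeric_column_stats` are hypothetical, the distinct count is computed exactly (a large-scale system would typically approximate it), and the 8-byte width for numeric values is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ColumnStat:
    """Hypothetical container mirroring the statistics listed above."""
    distinct_count: int
    null_count: int
    min: Optional[float]  # only collected for numeric/date/timestamp columns
    max: Optional[float]
    avg_len: float
    max_len: int


def string_column_stats(values: Sequence[Optional[str]]) -> ColumnStat:
    non_null = [v for v in values if v is not None]
    lengths = [len(v) for v in non_null]
    return ColumnStat(
        distinct_count=len(set(non_null)),  # exact here; assumption for brevity
        null_count=len(values) - len(non_null),
        min=None,  # min/max are not collected for string/binary columns
        max=None,
        avg_len=sum(lengths) / len(lengths) if lengths else 0.0,
        max_len=max(lengths, default=0),
    )


def numeric_column_stats(values: Sequence[Optional[float]],
                         fixed_len: int = 8) -> ColumnStat:
    non_null = [v for v in values if v is not None]
    return ColumnStat(
        distinct_count=len(set(non_null)),
        null_count=len(values) - len(non_null),
        min=min(non_null, default=None),
        max=max(non_null, default=None),
        # Fixed-width types: average and max length both equal the type width
        # (8 bytes is an assumed width, e.g. for a 64-bit integer).
        avg_len=float(fixed_len),
        max_len=fixed_len,
    )


print(string_column_stats(["a", "bb", None, "a"]))
print(numeric_column_stats([3, 1, None, 3, 7]))
```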
Filter Cardinality Estimation
• Between logical expressions: AND, OR, NOT
• In each logical expression: =, <, <=, >, >=, in, etc.
• Currently supported data types in expressions
– For <, <=, >, >=, <=>: Integer, Double, Date, Timestamp, etc.
– For =, <=>: String, Integer, Double, Date, Timestamp, etc.
• Example: A <= B
– Based on the min/max/distinct count/null count values of A and B, decide the relationship between A and B. After evaluating this expression, we update the min/max/distinct count/null count accordingly.
– Assume all the data is evenly distributed if no histogram is available
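The slide names the logical connectives but not how their selectivities combine. A common textbook approach, assumed here rather than taken from the slide, treats the predicates as statistically independent. A minimal Python sketch with hypothetical function names:

```python
def and_selectivity(s1: float, s2: float) -> float:
    """Both predicates must hold; under independence the fractions multiply."""
    return s1 * s2


def or_selectivity(s1: float, s2: float) -> float:
    """Inclusion-exclusion: P(p1 or p2) = P(p1) + P(p2) - P(p1 and p2)."""
    return s1 + s2 - s1 * s2


def not_selectivity(s: float) -> float:
    """Fraction of rows failing the predicate (NULL handling ignored)."""
    return 1.0 - s


# a < 10 AND (b = 3 OR NOT c > 5), with assumed per-predicate selectivities
# 0.5, 0.1 and 0.3 respectively.
combined = and_selectivity(0.5, or_selectivity(0.1, not_selectivity(0.3)))
print(combined)
```

Independence is the weak point of this rule: correlated columns (for example a city and its zip code) make the product underestimate AND and overestimate OR.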
Filter Operator Example
• Column A (op) literal B
– (op) can be "=", "<", "<=", ">", ">=", "like"
– e.g., l_orderkey = 3, l_shipdate <= "1995-03-21"
– The column’s max/min/distinct count/null count should be updated
– Example: Column A < value B
➢ B <= A.min: Filtering Factor = 0%; A’s statistics need to change
➢ B > A.max: Filtering Factor = 100%; no need to change A’s statistics
➢ A.min < B <= A.max: without histograms, suppose the data is evenly distributed
Filtering Factor = (B.value – A.min) / (A.max – A.min)
A.min = no change
A.max = B.value
A.ndv = A.ndv * Filtering Factor
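The three cases for Column A < value B can be sketched in Python as follows. This is a minimal illustration, not Spark’s code: `NumericColumnStat` and `estimate_less_than` are hypothetical names, and rounding the scaled distinct count up with `math.ceil` is an assumption (the slide only gives A.ndv * Filtering Factor).

```python
import math
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class NumericColumnStat:
    min: float
    max: float
    ndv: int  # number of distinct values


def estimate_less_than(col: NumericColumnStat, value: float
                       ) -> Tuple[float, Optional[NumericColumnStat]]:
    """Estimate `col < value`.

    Returns the filtering factor and the column statistics after the filter
    (None when no row survives).
    """
    if value <= col.min:
        # Every value of A is >= B: no row qualifies.
        return 0.0, None
    if value > col.max:
        # Every value of A is < B: all rows qualify, statistics unchanged.
        return 1.0, col
    # Here col.min < value <= col.max, so col.max > col.min (no division by 0).
    # Assume values are evenly spread over [min, max] (no histogram).
    factor = (value - col.min) / (col.max - col.min)
    updated = replace(col, max=value, ndv=math.ceil(col.ndv * factor))
    return factor, updated


a = NumericColumnStat(min=0, max=100, ndv=50)
print(estimate_less_than(a, 25))  # (0.25, NumericColumnStat(min=0, max=25, ndv=13))
```

The estimated output row count follows the same factor: rows after the filter = rows before the filter * Filtering Factor.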
Filter Operator Example
• Column A (op) Column B
– (op) can be "<", "<=", ">", ">="
– We cannot assume the data is evenly distributed, so the empirical filtering factor is set to 1/3
– Example: Column A < Column B
➢ A.max < B.min: selectivity = 100%
➢ A.min >= B.max: selectivity = 0%
➢ The ranges of A and B overlap: selectivity = 33.3%
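These cases reduce to a range check on the two columns’ min and max values. A minimal Python sketch; the function name is hypothetical:

```python
def estimate_column_less_than_column(a_min: float, a_max: float,
                                     b_min: float, b_max: float) -> float:
    """Selectivity of `A < B` from the two columns' value ranges."""
    if a_max < b_min:
        # A's whole range lies below B's: every pair satisfies A < B.
        return 1.0
    if a_min >= b_max:
        # A's whole range lies at or above B's: no pair satisfies A < B.
        return 0.0
    # Ranges overlap: no uniformity assumption, fall back to the empirical 1/3.
    return 1.0 / 3.0


print(estimate_column_less_than_column(0, 10, 20, 30))
print(estimate_column_less_than_column(0, 20, 10, 30))
```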
Join Cardinality Estimation
• Inner-Join: The number of rows of "A join B on A.k1 = B.k1" is estimated as:
num(A ⋈ B) = num(A) * num(B) / max(distinct(A.k1), distinct(B.k1))
– where num(A) is the number of records in table A, and distinct is the number of distinct values of that column.
– The underlying assumption for this formula is that each value of the smaller domain is included in the larger domain.
• We similarly estimate cardinalities for Left-Outer Join, Right-Outer
Join and Full-Outer Join
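The inner-join formula translates directly into code. The outer-join rules below are assumptions, since the slide only says they are estimated similarly: each outer join is taken to return at least every row of its preserved side(s). Function names are hypothetical, and the integer floor division is a simplification.

```python
def inner_join_rows(rows_a: int, rows_b: int, ndv_a: int, ndv_b: int) -> int:
    """num(A) * num(B) / max(distinct(A.k1), distinct(B.k1))."""
    denominator = max(ndv_a, ndv_b)
    if denominator == 0:
        return 0  # all join keys are NULL: nothing can match
    return rows_a * rows_b // denominator


# Assumed outer-join rules: never fewer rows than the preserved side(s).
def left_outer_join_rows(rows_a: int, rows_b: int, ndv_a: int, ndv_b: int) -> int:
    return max(inner_join_rows(rows_a, rows_b, ndv_a, ndv_b), rows_a)


def right_outer_join_rows(rows_a: int, rows_b: int, ndv_a: int, ndv_b: int) -> int:
    return max(inner_join_rows(rows_a, rows_b, ndv_a, ndv_b), rows_b)


def full_outer_join_rows(rows_a: int, rows_b: int, ndv_a: int, ndv_b: int) -> int:
    inner = inner_join_rows(rows_a, rows_b, ndv_a, ndv_b)
    return max(inner, rows_a) + max(inner, rows_b) - inner


# Hypothetical tables: 1000 rows with 1000 distinct keys, 10 rows with 10 keys.
print(inner_join_rows(1000, 10, 1000, 10))
print(left_outer_join_rows(1000, 10, 1000, 10))
```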
Other Operator Estimation
• Project: does not change row count
• Aggregate: consider uniqueness of group-by
• Limit, Sample, etc.
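The slide names these operators without formulas, so the rules in this Python sketch are assumptions: a projection keeps the row count, an aggregate produces at most one row per distinct combination of group-by values (the product of their distinct counts, capped by the input rows), and a limit caps the row count. Function names are hypothetical; Sample is not modeled.

```python
from math import prod
from typing import Sequence


def project_rows(child_rows: int) -> int:
    # Projection drops columns, never rows.
    return child_rows


def aggregate_rows(child_rows: int, group_by_ndvs: Sequence[int]) -> int:
    if not group_by_ndvs:
        return 1  # a global aggregate returns a single row
    # At most one output row per distinct combination of group-by values,
    # and never more rows than the input has.
    return min(child_rows, prod(group_by_ndvs))


def limit_rows(child_rows: int, limit: int) -> int:
    return min(child_rows, limit)


print(aggregate_rows(1000, [10, 5]), aggregate_rows(30, [10, 5]))
```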
Step 2: Cost Estimation and Optimal
Plan Selection
Build Side Selection
• For two-way hash joins, we need to choose one operand as build side and the
other as probe side.
• Choose lower-cost child as build side of hash join.
– Before: build side was selected based on original table sizes. ➔ BuildRight
– Now with CBO: build side is selected based on
estimated cost of various operators before join. ➔ BuildLeft
(Figure: join of Filter (t1.value = 200) over Scan t1 with Scan t2. Scan t1: 5 billion records, 500 GB; after the filter: 1 million records, 100 MB; Scan t2: 100 million records, 20 GB.)
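Using the figure’s numbers, the change from BuildRight to BuildLeft can be reproduced with a minimal Python sketch. Assumptions: estimated output size in bytes stands in for a child’s cost, the filtered t1 side is the left child (as BuildLeft implies), and `ChildEstimate`/`choose_build_side` are hypothetical names, not Spark APIs.

```python
from dataclasses import dataclass

MB = 1024 ** 2
GB = 1024 ** 3


@dataclass(frozen=True)
class ChildEstimate:
    rows: int
    size_in_bytes: int


def choose_build_side(left: ChildEstimate, right: ChildEstimate) -> str:
    """Build the hash table on the cheaper child; stream (probe) the other."""
    # Estimated size is used as the cost; ties go to the left (arbitrary).
    return "BuildLeft" if left.size_in_bytes <= right.size_in_bytes else "BuildRight"


t1_raw = ChildEstimate(rows=5_000_000_000, size_in_bytes=500 * GB)
t1_filtered = ChildEstimate(rows=1_000_000, size_in_bytes=100 * MB)
t2 = ChildEstimate(rows=100_000_000, size_in_bytes=20 * GB)

print(choose_build_side(t1_raw, t2))       # BuildRight: original table sizes
print(choose_build_side(t1_filtered, t2))  # BuildLeft: CBO estimate after the filter
```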
Hash Join Implementation: Broadcast vs. Shuffle
• Logical Plan → Physical Plan
➢ Equi-join (Inner Join, LeftSemi/LeftAnti Join, LeftOuter/RightOuter Join) → SortMergeJoinExec/…
➢ Theta-join → CartesianProductExec/…
• Broadcast Criterion: whether the join side’s output size is small (default 10MB).
(Figure: join of Filter (t1.value = 100) over Scan t1 with Scan t2. Scan t1: 5 billion records, 500 GB; after the filter: only 1000 records, 100 KB; Scan t2: 100 million records, 20 GB.)
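The broadcast criterion can be sketched as a size check on the smaller side of an equi-join. This is a simplification under stated assumptions: `equi_join_strategy` is a hypothetical name, join type and hints are ignored, and the non-broadcast case is reported as a sort-merge join because spark.sql.join.preferSortMergeJoin defaults to true. In Spark the threshold itself is the `spark.sql.autoBroadcastJoinThreshold` setting, whose default is 10 MB.

```python
KB = 1024
MB = 1024 ** 2
GB = 1024 ** 3

BROADCAST_THRESHOLD = 10 * MB  # default from the slide


def equi_join_strategy(left_size: int, right_size: int,
                       threshold: int = BROADCAST_THRESHOLD) -> str:
    """Pick broadcast vs. shuffle for an equi-join from estimated output sizes."""
    if min(left_size, right_size) <= threshold:
        # The small side is shipped to every executor; the big side stays put.
        return "BroadcastHashJoin"
    return "SortMergeJoin"  # both sides are shuffled and sorted


print(equi_join_strategy(500 * GB, 20 * GB))  # raw t1 vs. t2
print(equi_join_strategy(100 * KB, 20 * GB))  # filtered t1 vs. t2
```

Without the filter estimate, both sides look large and both get shuffled; with it, the 100 KB side is broadcast and the 20 GB side is never moved.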
Multi-way Join Reorder
• Reorder the joins using a dynamic programming algorithm.
1. First we put all items (basic joined nodes) into level 0.
2. Build all two-way joins at level 1 from plans at level 0 (single items).
3. Build all 3-way joins from plans at previous levels (two-way joins and single items).
4. Build all 4-way joins, etc., until we build all n-way joins, and pick the best plan among them.
• When building m-way joins, only keep the best plan (optimal sub-solution)
for the same set of m items.
– E.g., for 3-way joins of items {A, B, C}, we keep only the best plan
among: (A J B) J C, (A J C) J B and (B J C) J A
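The level-by-level procedure can be sketched in Python as follows. This is a minimal illustration of Selinger-style dynamic programming, not Spark’s code, under these assumptions: `reorder_joins` is a hypothetical name, each join edge carries a given selectivity, a set’s cardinality is the product of its inputs’ rows and its internal edge selectivities, cost is the sum of intermediate cardinalities (a cardinality-only simplification of the weighted formula), and cartesian-product candidates are skipped.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Tuple, Union

Plan = Union[str, Tuple["Plan", "Plan"]]


def reorder_joins(rows: Dict[str, int],
                  selectivity: Dict[FrozenSet[str], float]
                  ) -> Optional[Tuple[float, Plan]]:
    """Level-by-level DP: keep only the cheapest plan for each set of items."""
    items = sorted(rows)

    def cardinality(subset: FrozenSet[str]) -> float:
        # Product of input sizes times the selectivity of every edge inside.
        est = 1.0
        for r in subset:
            est *= rows[r]
        for edge, sel in selectivity.items():
            if edge <= subset:
                est *= sel
        return est

    def has_join_condition(left: FrozenSet[str], right: FrozenSet[str]) -> bool:
        return any(edge & left and edge & right for edge in selectivity)

    # Level 0: every single item is its own plan with zero cost.
    best: Dict[FrozenSet[str], Tuple[float, Plan]] = {
        frozenset([r]): (0.0, r) for r in items}

    for m in range(2, len(items) + 1):          # build all m-way joins
        for combo in combinations(items, m):
            subset = frozenset(combo)
            for k in range(1, m):               # every left/right split
                for left_items in combinations(combo, k):
                    left = frozenset(left_items)
                    right = subset - left
                    if left not in best or right not in best:
                        continue                # a side had no valid plan
                    if not has_join_condition(left, right):
                        continue                # skip cartesian products
                    # Cost = sum of intermediate result sizes (the root's size
                    # is the same for every plan of this set).
                    cost = best[left][0] + best[right][0] + cardinality(subset)
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, (best[left][1], best[right][1]))
    return best.get(frozenset(items))


# Hypothetical numbers: S joins both C and D; C and D share no join condition.
rows = {"S": 1000, "C": 100, "D": 10}
selectivity = {frozenset({"S", "C"}): 0.01, frozenset({"S", "D"}): 0.01}
print(reorder_joins(rows, selectivity))  # (200.0, ('C', ('D', 'S')))
```

Joining S with D first yields a 100-row intermediate, whereas joining S with C first yields 1000 rows, so the DP keeps (D J S) J C. Keeping one entry per item set in `best` is what bounds the search: plans for the same set never compete again at higher levels.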
Multi-way Join Reorder
Selinger et al. Access Path Selection in a Relational Database Management System. In SIGMOD 1979
Join Cost Formula
• The cost of a plan is the sum of costs of all intermediate tables.
• Cost = weight * Cost_cpu + Cost_IO * (1 - weight)
– In Spark, we use
weight * cardinality + size * (1 – weight)
– weight is a tuning parameter configured via
spark.sql.cbo.joinReorder.card.weight (0.7 by default)
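A literal reading of the formula, summed over a plan’s intermediate results, is sketched below in Python. `plan_cost` and the two plans’ numbers are hypothetical. Because the two terms are in different units (rows vs. bytes), a literal sum tends to be dominated by the size term; an implementation may normalize them, which this sketch does not attempt.

```python
from typing import Iterable, Tuple


def plan_cost(intermediates: Iterable[Tuple[int, int]], weight: float = 0.7) -> float:
    """Sum over intermediate results of weight * rows + (1 - weight) * size."""
    return sum(weight * rows + (1 - weight) * size_in_bytes
               for rows, size_in_bytes in intermediates)


# Hypothetical plans, each listed as (rows, size in bytes) per intermediate result.
plan_a = [(1_000, 40_000), (100, 4_000)]
plan_b = [(100, 4_000), (100, 4_000)]
print(plan_cost(plan_a), plan_cost(plan_b))
```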
TPC-DS Benchmarks and
Query Analysis
Zhenhua Wang
Huawei Technologies
Session 2 Topics
• Motivation
• Statistics Collection Framework
• Cost Based Optimizations
• TPC-DS Benchmark and Query Analysis
• Demo
Preliminary Performance Test
• Setup:
− TPC-DS size at 1 TB (scale factor 1000)
− 4-node cluster (Huawei FusionServer RH2288: 40 cores, 384GB mem)
− Apache Spark 2.2 RC (dated 5/12/2017)
• Statistics collection
– A total of 24 tables and 425 columns
➢ Took 14 minutes to collect statistics for all tables and all columns.
– Fast because all statistics are computed by integrating with Spark’s built-in aggregate functions.
– Should take much less time if we collect statistics only for columns used in predicates, joins, and group-by.
TPC-DS Query Q11
WITH year_total AS (
SELECT c_customer_id customer_id,
c_first_name customer_first_name,
c_last_name customer_last_name,
c_preferred_cust_flag customer_preferred_cust_flag,
c_birth_country customer_birth_country,
c_login customer_login,
c_email_address customer_email_address,
d_year dyear,
sum(ss_ext_list_price - ss_ext_discount_amt) year_total,
's' sale_type
FROM customer, store_sales, date_dim
WHERE c_customer_sk = ss_customer_sk
AND ss_sold_date_sk = d_date_sk
GROUP BY c_customer_id, c_first_name, c_last_name, d_year
, c_preferred_cust_flag, c_birth_country, c_login, c_email_address, d_year
UNION ALL
SELECT c_customer_id customer_id,
c_first_name customer_first_name,
c_last_name customer_last_name,
c_preferred_cust_flag customer_preferred_cust_flag,
c_birth_country customer_birth_country,
c_login customer_login,
c_email_address customer_email_address,
d_year dyear,
sum(ws_ext_list_price - ws_ext_discount_amt) year_total,
'w' sale_type
FROM customer, web_sales, date_dim
WHERE c_customer_sk = ws_bill_customer_sk AND ws_sold_date_sk = d_date_sk
GROUP BY c_customer_id, c_first_name, c_last_name, c_preferred_cust_flag,
c_birth_country, c_login, c_email_address, d_year)
SELECT t_s_secyear.customer_preferred_cust_flag
FROM year_total t_s_firstyear
, year_total t_s_secyear
, year_total t_w_firstyear
, year_total t_w_secyear
WHERE t_s_secyear.customer_id = t_s_firstyear.customer_id
AND t_s_firstyear.customer_id = t_w_secyear.customer_id
AND t_s_firstyear.customer_id = t_w_firstyear.customer_id
AND t_s_firstyear.sale_type = 's'
AND t_w_firstyear.sale_type = 'w'
AND t_s_secyear.sale_type = 's'
AND t_w_secyear.sale_type = 'w'
AND t_s_firstyear.dyear = 2001
AND t_s_secyear.dyear = 2001 + 1
AND t_w_firstyear.dyear = 2001
AND t_w_secyear.dyear = 2001 + 1
AND t_s_firstyear.year_total > 0
AND t_w_firstyear.year_total > 0
AND CASE WHEN t_w_firstyear.year_total > 0
THEN t_w_secyear.year_total / t_w_firstyear.year_total END
> CASE WHEN t_s_firstyear.year_total > 0
THEN t_s_secyear.year_total / t_s_firstyear.year_total END
ORDER BY t_s_secyear.customer_preferred_cust_flag
Query Analysis – Q11 CBO OFF
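(Figure: CBO-OFF plan with Joins #1 to #4. Join #1: store_sales ⋈ customer; Join #2: web_sales ⋈ customer; these first joins produce a large join result. Row counts shown: store_sales 2.9 billion, web_sales 720 million, customer 12 million, date_dim 73,049; intermediate results 2.7 billion, 719 million, 534 million and 144 million.)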
Query Analysis – Q11 CBO ON
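(Figure: CBO-ON plan with Joins #1 to #4. Join #1: store_sales ⋈ date_dim; Join #2: web_sales ⋈ date_dim; these first joins produce a small join result. Row counts shown: store_sales 2.9 billion, web_sales 720 million, customer 12 million, date_dim 73,049; intermediate results 534 million, 534 million, 144 million and 144 million.)
1.4x Speedup
80% less data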
TPC-DS Query Q72
SELECT i_item_desc, w_warehouse_name, d1.d_week_seq,
sum(CASE WHEN p_promo_sk IS NULL THEN 1
ELSE 0 END) no_promo,
sum(CASE WHEN p_promo_sk IS NOT NULL THEN 1
ELSE 0 END) promo,
count(*) total_cnt
FROM catalog_sales
JOIN inventory ON (cs_item_sk = inv_item_sk)
JOIN warehouse ON (w_warehouse_sk = inv_warehouse_sk)
JOIN item ON (i_item_sk = cs_item_sk)
JOIN customer_demographics ON (cs_bill_cdemo_sk = cd_demo_sk)
JOIN household_demographics ON (cs_bill_hdemo_sk = hd_demo_sk)
JOIN date_dim d1 ON (cs_sold_date_sk = d1.d_date_sk)
JOIN date_dim d2 ON (inv_date_sk = d2.d_date_sk)
JOIN date_dim d3 ON (cs_ship_date_sk = d3.d_date_sk)
LEFT OUTER JOIN promotion ON (cs_promo_sk = p_promo_sk)
LEFT OUTER JOIN catalog_returns ON (cr_item_sk = cs_item_sk AND cr_order_number = cs_order_number)
WHERE d1.d_week_seq = d2.d_week_seq
AND inv_quantity_on_hand < cs_quantity
AND d3.d_date > (cast(d1.d_date AS DATE) + interval 5 days)
AND hd_buy_potential = '>10000'
AND d1.d_year = 1999
AND hd_buy_potential = '>10000'
AND cd_marital_status = 'D'
AND d1.d_year = 1999
GROUP BY i_item_desc, w_warehouse_name, d1.d_week_seq
ORDER BY total_cnt DESC, i_item_desc, w_warehouse_name, d_week_seq
Query Analysis – Q72 CBO OFF
(Figure: CBO-OFF plan with Joins #1 to #8 over catalog_sales (1.4 billion rows), inventory (783 million rows) and date_dim d1, d2 and d3, among others. Intermediate results reach 223 billion rows (marked "Really large"); other row counts shown: 44.7 million, 9 million, 8.7 million, 7.5 million, 1.9 million, 1.6 million, 300,000 and 20.)
Query Analysis – Q72 CBO ON
(Figure: CBO-ON plan with Joins #1 to #8 over catalog_sales (1.4 billion rows), inventory (783 million rows) and date_dim d1, d2 and d3 (73,049 rows each), among others. Intermediate results are much smaller, 2-3 orders of magnitude less: 1 billion, 238 million, 47.6 million, 8.7 million and 1.9 million rows.)
8.1x Speedup
TPC-DS Query Performance
(Figure: per-query execution time for the TPC-DS queries, without CBO vs. with CBO.)
TPC-DS Query Speedup
• TPC-DS query speedup ratio with CBO versus without CBO
• 16 queries show speedup > 30%
• The max speedup is 8X.
• The geo-mean of speedup is 2.2X.
TPC-DS Query 64
WITH cs_ui AS
(SELECT cs_item_sk,
sum(cs_ext_list_price) AS sale,
sum(cr_refunded_cash + cr_reversed_charge + cr_store_credit) AS refund
FROM catalog_sales, catalog_returns
WHERE cs_item_sk = cr_item_sk AND cs_order_number = cr_order_number
GROUP BY cs_item_sk
HAVING sum(cs_ext_list_price) > 2 * sum(cr_refunded_cash + cr_reversed_charge + cr_store_credit)),
cross_sales AS
(SELECT i_product_name product_name, i_item_sk item_sk, s_store_name store_name,
s_zip store_zip, ad1.ca_street_number b_street_number, ad1.ca_street_name b_streen_name,
ad1.ca_city b_city, ad1.ca_zip b_zip, ad2.ca_street_number c_street_number,
ad2.ca_street_name c_street_name, ad2.ca_city c_city, ad2.ca_zip c_zip,
d1.d_year AS syear, d2.d_year AS fsyear, d3.d_year s2year,
count(*) cnt, sum(ss_wholesale_cost) s1, sum(ss_list_price) s2, sum(ss_coupon_amt) s3
FROM store_sales, store_returns, cs_ui, date_dim d1, date_dim d2, date_dim d3,
store, customer, customer_demographics cd1, customer_demographics cd2,
promotion, household_demographics hd1, household_demographics hd2,
customer_address ad1, customer_address ad2, income_band ib1, income_band ib2, item
WHERE ss_store_sk = s_store_sk AND ss_sold_date_sk = d1.d_date_sk AND
ss_customer_sk = c_customer_sk AND ss_cdemo_sk = cd1.cd_demo_sk AND
ss_hdemo_sk = hd1.hd_demo_sk AND ss_addr_sk = ad1.ca_address_sk AND
ss_item_sk = i_item_sk AND ss_item_sk = sr_item_sk AND
ss_ticket_number = sr_ticket_number AND ss_item_sk = cs_ui.cs_item_sk AND
c_current_cdemo_sk = cd2.cd_demo_sk AND c_current_hdemo_sk = hd2.hd_demo_sk AND
c_current_addr_sk = ad2.ca_address_sk AND c_first_sales_date_sk = d2.d_date_sk AND
c_first_shipto_date_sk = d3.d_date_sk AND ss_promo_sk = p_promo_sk AND
hd1.hd_income_band_sk = ib1.ib_income_band_sk AND
hd2.hd_income_band_sk = ib2.ib_income_band_sk AND
cd1.cd_marital_status <> cd2.cd_marital_status AND
i_color IN ('purple', 'burlywood', 'indian', 'spring', 'floral', 'medium') AND
i_current_price BETWEEN 64 AND 64 + 10 AND i_current_price BETWEEN 64 + 1 AND 64 + 15
GROUP BY i_product_name, i_item_sk, s_store_name, s_zip, ad1.ca_street_number,
ad1.ca_street_name, ad1.ca_city, ad1.ca_zip, ad2.ca_street_number,
ad2.ca_street_name, ad2.ca_city, ad2.ca_zip, d1.d_year, d2.d_year, d3.d_year)
SELECT …
FROM cross_sales cs1, cross_sales cs2
WHERE cs1.item_sk = cs2.item_sk AND
cs1.syear = 1999 AND
cs2.syear = 1999 + 1 AND
cs2.cnt <= cs1.cnt AND
cs1.store_name = cs2.store_name AND
cs1.store_zip = cs2.store_zip
ORDER BY cs1.product_name, cs1.store_name, cs2.cnt
Query Analysis – Q64 CBO ON
10% slower
(Figure: plan fragments 1 and 2 with FileScan (store_sales), FileScan (store_returns), BroadcastExchange (cs_ui), Sort and Sort.)
Query Analysis – Q64 CBO OFF
(Figure: plan fragments 1 and 2 with FileScan (store_sales), FileScan (store_returns), Sort (cs_ui) and Sort (cs_ui).)
CBO Demo
Wenchen Fan
Current Status, Credits and
Future Work
Ron Hu
Huawei Technologies
Available in Apache Spark 2.2
• Configured via spark.sql.cbo.enabled
• ‘Off By Default’. Why?
– Spark is used in production
– Many Spark users may already rely on "human intelligence" to write queries in the best join order
– Plan on enabling this by default in Spark 2.3
• We encourage you to test CBO with Spark 2.2!
Current Status
• SPARK-16026 is the umbrella JIRA.
– 32 sub-tasks have been resolved
– A big project spanning 8 months
– 10+ Spark contributors involved
– 7000+ lines of Scala code have been contributed
• Good framework to allow integrations
– Use statistics to derive if a join attribute is unique
– Benefits star schema detection and its integration into join reordering
Birth of Spark SQL CBO
• Prototype
– In 2015, Ron Hu, Fang Cao, etc. of Huawei’s research
department prototyped the CBO concept on Spark 1.2.
– After a successful prototype, we shared technology with
Zhenhua Wang, Fei Wang, etc. of Huawei’s product
development team.
• We delivered a talk at Spark Summit 2016:
– “Enhancing Spark SQL Optimizer with Reliable Statistics”.
• The talk was well received by the community.
• Good community support
– Developers: Zhenhua Wang, Ron Hu, Reynold Xin,
Wenchen Fan, Xiao Li
– Reviewers: Wenchen, Herman, Reynold, Xiao, Liang-chi,
Ioana, Nattavut, Hyukjin, Shuai, …..
– Extensive discussion in JIRAs and PRs (tens to hundreds …)
– All the comments made the development time longer, but
improved code quality.
• It was a pleasure working with the community.
Future Work: Cost Based Optimizer
• Current cost formula is coarse.
Cost = cardinality * weight + size * (1 - weight)
• Cannot tell the cost difference between sort-merge join and hash join
– spark.sql.join.preferSortMergeJoin defaults to true.
• Underestimates (or ignores) shuffle cost.
• We will improve the cost formula in the next release.
Future Work: Statistics Collection Framework
• Advanced statistics: e.g. histograms, sketches.
• Hint mechanism.
• Partition level statistics.
• Speed up statistics collection by sampling data
for large tables.
• Motivation
• Statistics Collection Framework
– Table/Column Level Statistics Collected
– Cardinality Estimation (Filters, Joins, Aggregates etc.)
• Cost-based Optimizations
– Build Side Selection
– Multi-way Join Re-ordering
• TPC-DS Benchmarks
• Demo
Thank You.
Sameer’s Office Hours @ 4:30pm Today
Wenchen’s Office Hours @ 3pm Tomorrow
Multi-way Join Reorder – Example
• Given A J B J C J D with join conditions A.k1 = B.k1 and
B.k2 = C.k2 and C.k3 = D.k3
level 0: p({A}), p({B}), p({C}), p({D})
level 1: p({A, B}), p({A, C}), p({A, D}), p({B, C}), p({B, D}), p({C, D})
level 2: p({A, B, C}), p({A, B, D}), p({A, C, D}), p({B, C, D})
level 3: p({A, B, C, D}) -- final output plan
Multi-way Join Reorder – Example
• Pruning strategy: exclude cartesian product candidates.
This significantly reduces the search space.
level 0: p({A}), p({B}), p({C}), p({D})
level 1: p({A, B}), p({B, C}), p({C, D}) (p({A, C}), p({A, D}), p({B, D}) are pruned as cartesian products)
level 2: p({A, B, C}), p({B, C, D}) (p({A, B, D}), p({A, C, D}) are pruned as cartesian products)
level 3: p({A, B, C, D}) -- final output plan
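The pruning rule amounts to keeping only item sets whose join graph is connected: any such set can be split into two parts linked by a join condition, while a disconnected set can only be built with a cartesian product. A minimal Python sketch with hypothetical helper names, using the example’s join conditions as graph edges:

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def is_connected(items: Iterable[str], edges: List[Tuple[str, str]]) -> bool:
    """True if the join graph restricted to `items` is connected."""
    nodes: Set[str] = set(items)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for a, b in edges:
            for u, v in ((a, b), (b, a)):
                if u == node and v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen == nodes


def candidate_levels(items: str, edges: List[Tuple[str, str]]):
    """Yield (level, item sets), keeping only sets that need no cartesian product."""
    for level in range(len(items)):
        sets = ["".join(c) for c in combinations(items, level + 1)
                if is_connected(c, edges)]
        yield level, sets


edges = [("A", "B"), ("B", "C"), ("C", "D")]  # A.k1 = B.k1, B.k2 = C.k2, C.k3 = D.k3
for level, sets in candidate_levels("ABCD", edges):
    print(f"level {level}: {sets}")
```

For this four-table chain the candidate sets drop from 4 + 6 + 4 + 1 = 15 to 4 + 3 + 2 + 1 = 10; the saving grows quickly with the number of tables.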
New Commands in Apache Spark 2.2
• CBO commands
– Collect table-level statistics
– Collect column-level statistics
column_name2, …
– Display statistics in the optimized logical plan
> SELECT cc_call_center_sk, cc_call_center_id FROM call_center;
== Optimized Logical Plan ==
Project [cc_call_center_sk#75, cc_call_center_id#76], Statistics(sizeInBytes=1680.0 B, rowCount=42, hints=none)
+- Relation[…31 fields] parquet, Statistics(sizeInBytes=22.5 KB, rowCount=42, hints=none)

Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized EngineApache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Beyond EXPLAIN: Query Optimization From Theory To Code
Beyond EXPLAIN: Query Optimization From Theory To CodeBeyond EXPLAIN: Query Optimization From Theory To Code
Beyond EXPLAIN: Query Optimization From Theory To Code
Processes in Query Optimization in (ABMS) Advanced Database Management Systems
Processes in Query Optimization in (ABMS) Advanced Database Management Systems Processes in Query Optimization in (ABMS) Advanced Database Management Systems
Processes in Query Optimization in (ABMS) Advanced Database Management Systems
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
Oracle Query Optimizer - An Introduction
Oracle Query Optimizer - An IntroductionOracle Query Optimizer - An Introduction
Oracle Query Optimizer - An Introduction
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
19CS3052R-CO1-7-S7 ECE
19CS3052R-CO1-7-S7 ECE19CS3052R-CO1-7-S7 ECE
19CS3052R-CO1-7-S7 ECE
Ge aviation spark application experience porting analytics into py spark ml p...
Ge aviation spark application experience porting analytics into py spark ml p...Ge aviation spark application experience porting analytics into py spark ml p...
Ge aviation spark application experience porting analytics into py spark ml p...

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake

Recently uploaded

Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
Tier1 app
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Jay Das
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf

Recently uploaded (20)

Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf

Cost-Based Optimizer in Apache Spark 2.2

  • 1. Cost-Based Optimizer in Apache Spark 2.2. Ron Hu, Zhenhua Wang (Huawei Technologies, Inc.); Sameer Agarwal, Wenchen Fan (Databricks Inc.)
  • 2. Session 1 Topics • Motivation • Statistics Collection Framework • Cost-Based Optimizations • TPC-DS Benchmark and Query Analysis • Demo
  • 3. How Spark Executes a Query? (Diagram: SQL and DataFrames → Logical Plan (resolved against the Catalog) → Optimizer → Physical Plan → Code Generator → RDDs …)
  • 4. How Spark Executes a Query? (Same diagram, with the optimization stage marked as the focus of today's talk.)
  • 5. Catalyst Optimizer: An Overview. Example program: events = …("/logs"); stats = events.join(users).groupBy("loc", "status").avg("duration"); errors = stats.where(stats.status == "ERR"). A query plan is an internal representation of a user's program. The optimizer applies a series of transformations that convert the initial query plan into an optimized plan. (Diagram: the initial and the optimized plan, each built from SCAN logs, SCAN users, JOIN, FILTER and AGG.)
  • 6. Catalyst Optimizer: An Overview. In Spark, the optimizer's goal is to minimize end-to-end query response time. Two key ideas: – Prune unnecessary data as early as possible, e.g., filter pushdown, column pruning – Minimize per-operator cost, e.g., broadcast vs. shuffle, optimal join order (Diagram: plan with SCAN logs, SCAN users, JOIN, FILTER and AGG.)
  • 7. Rule-based Optimizer in Spark 2.1 • Most of the Spark SQL optimizer's rules are heuristic rules. – PushDownPredicate, ColumnPruning, ConstantFolding, … • Does NOT consider the cost of each operator • Does NOT consider selectivity when estimating join relation size • Therefore: – Join order is mostly decided by the order in which tables appear in the SQL query – Physical join implementation is decided based on heuristics
  • 8. An Example (TPC-DS q11 variant) (Plan diagram: SCAN store_sales, SCAN customer and SCAN date_dim combined by two JOINs, with a FILTER on customer.) SELECT customer_id FROM customer, store_sales, date_dim WHERE c_customer_sk = ss_customer_sk AND ss_sold_date_sk = d_date_sk AND c_customer_sk > 1000
  • 9. An Example (TPC-DS q11 variant) (Same plan, annotated with row counts: 3 billion, 12 million, 2.5 billion, 10 million, 500 million and 0.1 million.)
  • 10. An Example (TPC-DS q11 variant) (Optimized plan over SCAN store_sales, SCAN customer and SCAN date_dim with FILTER and two JOINs, annotated with row counts: 3 billion, 12 million, 2.5 billion, 500 million, 10 million, 500 million and 0.1 million.) Result: 40% faster, 80% less data.
  • 11. An Example (TPC-DS q11 variant) (Plan diagram with the same row counts as the previous slide.) How do we automatically optimize queries like these?
  • 12. Cost-Based Optimizer (CBO) • Collect, infer and propagate table/column statistics on source/intermediate data • Calculate the cost of each operator in terms of number of output rows, size of output, etc. • Based on the cost calculation, pick the optimal query execution plan
  • 13. Rest of the Talk • Statistics Collection Framework – Table/Column-Level Statistics Collected – Cardinality Estimation (Filters, Joins, Aggregates, etc.) • Cost-Based Optimizations – Build Side Selection – Multi-way Join Reordering • TPC-DS Benchmarks • Demo
  • 14. Statistics Collection Framework and Cost Based Optimizations Ron Hu Huawei Technologies
  • 15. Step 1: Collect, infer and propagate table and column statistics on source and intermediate data
  • 16. Table Statistics Collected • Command to collect statistics of a table: – Ex: ANALYZE TABLE table-name COMPUTE STATISTICS • It collects table-level statistics and saves them into the metastore: – Number of rows – Table size in bytes
  • 17. Column Statistics Collected • Command to collect column-level statistics of individual columns: – Ex: ANALYZE TABLE table-name COMPUTE STATISTICS FOR COLUMNS column-name1, column-name2, … • It collects column-level statistics and saves them into the metastore: – String/Binary types: distinct count, null count, average length, max length – Numeric/Date/Timestamp types: distinct count, max, min, null count, average length (fixed length), max length (fixed length)
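To make the listed column statistics concrete, here is a small Python sketch that computes them for an in-memory list of values. It is an illustration under assumptions, not Spark's implementation: `ColumnStats` and `collect_column_stats` are hypothetical names, counts are computed exactly for simplicity, and `fixed_len` (a byte width such as 4 for a 32-bit integer) stands in for a fixed-length type.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

Value = Union[int, float, str, None]


@dataclass
class ColumnStats:
    distinct_count: int
    null_count: int
    min: Optional[Union[int, float]]  # only for Numeric/Date/Timestamp-like columns
    max: Optional[Union[int, float]]
    avg_len: float
    max_len: int


def collect_column_stats(values: List[Value],
                         fixed_len: Optional[int] = None) -> ColumnStats:
    """Hypothetical helper: statistics for one column.

    fixed_len=None treats the column as String/Binary (lengths are measured);
    otherwise it is treated as a fixed-length Numeric/Date/Timestamp column.
    """
    non_null = [v for v in values if v is not None]
    if fixed_len is None:
        lengths = [len(v) for v in non_null]
        avg_len = sum(lengths) / len(lengths) if lengths else 0.0
        max_len = max(lengths, default=0)
        lo = hi = None  # min/max are not listed for String/Binary columns
    else:
        avg_len, max_len = float(fixed_len), fixed_len
        lo = min(non_null, default=None)
        hi = max(non_null, default=None)
    return ColumnStats(
        distinct_count=len(set(non_null)),
        null_count=len(values) - len(non_null),
        min=lo, max=hi, avg_len=avg_len, max_len=max_len)


ages = collect_column_stats([30, None, 25, 30], fixed_len=4)
names = collect_column_stats(["ann", "bo", None])
print(ages)
print(names)
```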
  • 18. Filter Cardinality Estimation • Between logical expressions: AND, OR, NOT • Within each logical expression: =, <, <=, >, >=, IN, etc. • Currently supported types: – For <, <=, >, >=, <=>: Integer, Double, Date, Timestamp, etc. – For =, <=>: String, Integer, Double, Date, Timestamp, etc. • Example: A <= B – Based on the min/max/distinct count/null count values of A and B, decide the relationship between A and B. After evaluating this expression, set the new min/max/distinct count/null count. – If no histogram information is available, assume all the data is evenly distributed.
  • 19. Filter Operator Example • Column A (op) literal B – (op) can be "=", "<", "<=", ">", ">=", "like" – e.g., l_orderkey = 3, l_shipdate <= "1995-03-21" – The column's max/min/distinct count/null count should be updated – Example: Column A < value B (Diagram: possible positions of B relative to the range [A.min, A.max].) If B ≤ A.min, Filtering Factor = 0% and A's statistics need to change. If B > A.max, Filtering Factor = 100% and A's statistics do not need to change. Otherwise, without histograms, suppose the data is evenly distributed: Filtering Factor = (B.value – A.min) / (A.max – A.min); A.min = no change; A.max = B.value; A.ndv = A.ndv * Filtering Factor
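The three cases for `A < B` with B a literal can be sketched in Python as below. This is a minimal illustration that assumes uniformly distributed numeric data and no histograms. `NumericStats` and `estimate_less_than` are hypothetical names, and rounding the new distinct count up is an assumption the slide does not specify.

```python
import math
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class NumericStats:
    min: float
    max: float
    ndv: int  # number of distinct values


def estimate_less_than(a: NumericStats,
                       b: float) -> Tuple[float, Optional[NumericStats]]:
    """Estimate the filter `A < b` (b is a literal), assuming uniform data.

    Returns (filtering_factor, new_stats); new_stats is None when no row
    can satisfy the predicate.
    """
    if b <= a.min:
        return 0.0, None   # no value of A is below b: every row is filtered out
    if b > a.max:
        return 1.0, a      # every value of A is below b: stats stay the same
    factor = (b - a.min) / (a.max - a.min)
    # A.min is unchanged, A.max becomes b, A.ndv shrinks by the filtering factor.
    return factor, replace(a, max=b, ndv=math.ceil(a.ndv * factor))


factor, stats = estimate_less_than(NumericStats(min=0, max=100, ndv=50), 25)
print(factor, stats)
```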
  • 20. Filter Operator Example • Column A (op) Column B – (op) can be "<", "<=", ">", ">=" – We cannot assume the data is evenly distributed, so the empirical filtering factor is set to 1/3 – Example: Column A < Column B (Diagram: if A's range lies entirely below B's, selectivity = 100%; if A's range lies entirely above B's, selectivity = 0%; when the ranges overlap, selectivity = 33.3%.)
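A sketch of the column-versus-column case that uses only each column's min/max. It assumes the slide's diagram covers three situations: A's range entirely below B's, entirely above it, or overlapping. The function name is hypothetical.

```python
from typing import Tuple

Range = Tuple[float, float]  # (min, max) of a column


def estimate_col_less_than_col(a: Range, b: Range) -> float:
    """Selectivity of `A < B` estimated from the two columns' ranges only."""
    a_min, a_max = a
    b_min, b_max = b
    if a_max < b_min:    # every A value is below every B value
        return 1.0
    if a_min >= b_max:   # no A value can be below any B value
        return 0.0
    return 1.0 / 3.0     # ranges overlap: fall back to the empirical 1/3


print(estimate_col_less_than_col((0, 10), (20, 30)))
print(estimate_col_less_than_col((0, 20), (10, 30)))
```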
  • 21. Join Cardinality Estimation • Inner join: the number of rows of "A join B on A.k1 = B.k1" is estimated as: num(A ⋈ B) = num(A) * num(B) / max(distinct(A.k1), distinct(B.k1)), – where num(A) is the number of records in table A, and distinct is the number of distinct values of that column. – The underlying assumption for this formula is that each value of the smaller domain is included in the larger domain. • We similarly estimate cardinalities for Left-Outer Join, Right-Outer Join and Full-Outer Join
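The inner-join formula translates directly into a few lines of Python. The function name and the row and distinct counts below are hypothetical. Returning 0 when neither key has any distinct values is an added assumption for empty or all-null keys.

```python
def estimate_inner_join_rows(rows_a: int, rows_b: int,
                             ndv_a_k1: int, ndv_b_k1: int) -> int:
    """num(A join B) = num(A) * num(B) / max(distinct(A.k1), distinct(B.k1))."""
    max_ndv = max(ndv_a_k1, ndv_b_k1)
    if max_ndv == 0:
        return 0  # no non-null key values on either side, so nothing can match
    return rows_a * rows_b // max_ndv


# Hypothetical: 1,000,000 orders referencing 50,000 distinct customer keys,
# joined with a 50,000-row customer table whose key is unique.
print(estimate_inner_join_rows(1_000_000, 50_000, 50_000, 50_000))  # 1000000
```

Here each order matches exactly one customer, so the estimate equals the number of orders. This is the "smaller domain is included in the larger domain" assumption at work.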
  • 22. Other Operator Estimation • Project: does not change row count • Aggregate: consider uniqueness of group-by columns • Limit, Sample, etc.
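A minimal sketch of these estimates, with hypothetical function names. The aggregate rule is an assumption about what "consider uniqueness of group-by columns" means: each output row is one distinct combination of group-by values, so the count is bounded by the product of the columns' distinct counts and by the input row count.

```python
from math import prod
from typing import Sequence


def estimate_project(child_rows: int) -> int:
    return child_rows  # Project does not change the row count


def estimate_aggregate(child_rows: int, group_by_ndvs: Sequence[int]) -> int:
    # A global aggregate (no group-by columns) always returns exactly one row.
    if not group_by_ndvs:
        return 1
    # At most one row per distinct group-by combination, never more than the input.
    return min(prod(group_by_ndvs), child_rows)


def estimate_limit(child_rows: int, n: int) -> int:
    return min(child_rows, n)  # LIMIT n never returns more than n rows


print(estimate_aggregate(1_000_000, [10, 20]))  # 200
```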
  • 23. Step 2: Cost Estimation and Optimal Plan Selection
  • 24. Build Side Selection • For two-way hash joins, we need to choose one operand as the build side and the other as the probe side. • Choose the lower-cost child as the build side of the hash join. – Before: the build side was selected based on original table sizes ➔ BuildRight – Now, with CBO: the build side is selected based on the estimated cost of the operators below the join ➔ BuildLeft (Diagram: a Join of Filter [t1.value = 200] over Scan t1 (5 billion records, 500 GB), which outputs 1 million records, 100 MB, with Scan t2 (100 million records, 20 GB).)
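The before/after decision in the diagram can be reproduced with a tiny sketch. It assumes "lower cost" is measured by estimated output size in bytes. `ChildEstimate` and `choose_build_side` are hypothetical names, and the sizes come from the slide's diagram.

```python
from dataclasses import dataclass

MB = 1024 ** 2
GB = 1024 ** 3


@dataclass
class ChildEstimate:
    rows: int
    size_bytes: int


def choose_build_side(left: ChildEstimate, right: ChildEstimate) -> str:
    """Build the hash table on the child with the smaller estimated output."""
    return "BuildLeft" if left.size_bytes <= right.size_bytes else "BuildRight"


t1_scan = ChildEstimate(rows=5_000_000_000, size_bytes=500 * GB)
t1_filtered = ChildEstimate(rows=1_000_000, size_bytes=100 * MB)  # after t1.value = 200
t2_scan = ChildEstimate(rows=100_000_000, size_bytes=20 * GB)

print(choose_build_side(t1_scan, t2_scan))      # BuildRight (raw table sizes)
print(choose_build_side(t1_filtered, t2_scan))  # BuildLeft (estimate after the filter)
```

The only difference between the two calls is which estimate describes the left child. This is why propagating statistics through the filter changes the decision.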
  • 25. Hash Join Implementation: Broadcast vs. Shuffle • Logical plan: ➢ Equi-join (Inner Join, LeftSemi/LeftAnti Join, LeftOuter/RightOuter Join) ➢ Theta-join • Physical plan: ➢ SortMergeJoinExec / BroadcastHashJoinExec / ShuffledHashJoinExec ➢ CartesianProductExec / BroadcastNestedLoopJoinExec • Broadcast criterion: whether the join side's output size is small (default 10 MB). (Diagram: a Join of Filter [t1.value = 100] over Scan t1 (5 billion records, 500 GB), which outputs only 1000 records, 100 KB, with Scan t2 (100 million records, 20 GB); further examples show Scan t2 joined with an Aggregate or with another Join.)
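A simplified sketch of the broadcast criterion. It assumes the 10 MB default corresponds to Spark's `spark.sql.autoBroadcastJoinThreshold` and that a side qualifies when its estimated size is at or below the threshold. Spark's planner also weighs join type, hints and other settings, and this sketch never picks ShuffledHashJoinExec. `choose_join_exec` is a hypothetical name.

```python
KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3
BROADCAST_THRESHOLD = 10 * MB  # default broadcast threshold from the slide


def choose_join_exec(left_bytes: int, right_bytes: int, equi_join: bool,
                     threshold: int = BROADCAST_THRESHOLD) -> str:
    """Simplified choice between broadcast and non-broadcast join operators."""
    can_broadcast = min(left_bytes, right_bytes) <= threshold
    if equi_join:
        return "BroadcastHashJoinExec" if can_broadcast else "SortMergeJoinExec"
    return "BroadcastNestedLoopJoinExec" if can_broadcast else "CartesianProductExec"


# Filtered t1 (100 KB) vs. t2 (20 GB), then unfiltered t1 (500 GB) vs. t2.
print(choose_join_exec(100 * KB, 20 * GB, equi_join=True))  # BroadcastHashJoinExec
print(choose_join_exec(500 * GB, 20 * GB, equi_join=True))  # SortMergeJoinExec
```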
  • 26. Multi-way Join Reorder • Reorder the joins using a dynamic programming algorithm. 1. First, put all items (basic joined nodes) into level 0. 2. Build all two-way joins at level 1 from plans at level 0 (single items). 3. Build all 3-way joins from plans at previous levels (two-way joins and single items). 4. Build all 4-way joins, etc., until all n-way joins are built, and pick the best plan among them. • When building m-way joins, keep only the best plan (optimal sub-solution) for each set of m items. – E.g., for 3-way joins of items {A, B, C}, keep only the best plan among (A ⋈ B) ⋈ C, (A ⋈ C) ⋈ B and (B ⋈ C) ⋈ A
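The level-by-level dynamic program can be sketched in Python as follows. Several assumptions are not from the slide. Join selectivities are given per pair of tables. A join's output rows are the product of its inputs' rows and the selectivities of the predicates connecting the two sides. A plan's cost is the sum of its join outputs' row counts. Cartesian products are skipped. `reorder_joins`, the row counts and the selectivities are all hypothetical.

```python
from itertools import combinations
from math import prod
from typing import Dict, FrozenSet, Tuple

Plan = Tuple[float, float, str]  # (cost, estimated output rows, plan text)


def reorder_joins(rows: Dict[str, float],
                  selectivity: Dict[FrozenSet[str], float]) -> Plan:
    """Hypothetical level-by-level DP over sets of tables."""
    tables = list(rows)
    # Level 0: each base table is a plan by itself, with zero join cost.
    best: Dict[FrozenSet[str], Plan] = {
        frozenset([t]): (0.0, float(n), t) for t, n in rows.items()}
    # Build all 2-way joins, then all 3-way joins, ..., up to the n-way join.
    for size in range(2, len(tables) + 1):
        for subset in combinations(tables, size):
            items = frozenset(subset)
            # Combine two disjoint sets already solved at earlier levels.
            for k in range(1, size):
                for left_items in combinations(subset, k):
                    left = frozenset(left_items)
                    right = items - left
                    if left not in best or right not in best:
                        continue
                    # Predicates with one table on each side of this join.
                    sels = [s for pair, s in selectivity.items()
                            if pair & left and pair & right]
                    if not sels:
                        continue  # skip Cartesian products
                    l_cost, l_rows, l_plan = best[left]
                    r_cost, r_rows, r_plan = best[right]
                    out_rows = l_rows * r_rows * prod(sels)
                    cost = l_cost + r_cost + out_rows
                    # Keep only the cheapest plan for this set of items.
                    if items not in best or cost < best[items][0]:
                        best[items] = (cost, out_rows,
                                       f"({l_plan} JOIN {r_plan})")
    # Raises KeyError if the join graph is disconnected.
    return best[frozenset(tables)]


rows = {"store_sales": 1000, "customer": 100, "date_dim": 10}
selectivity = {
    frozenset({"store_sales", "customer"}): 0.01,
    frozenset({"store_sales", "date_dim"}): 0.01,
}
cost, out_rows, plan = reorder_joins(rows, selectivity)
print(plan, cost, out_rows)
```

With these made-up numbers, joining store_sales with date_dim first produces a 100-row intermediate result, versus 1,000 rows for store_sales with customer. The DP therefore keeps the plan that joins date_dim early. Counting the final join's output in the cost does not affect the ranking, because every complete plan produces the same final row count under this model.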
  • 27. Multi-way Join Reorder. Reference: Selinger et al., "Access Path Selection in a Relational Database Management System," SIGMOD 1979.
  • 28. Join Cost Formula • The cost of a plan is the sum of the costs of all intermediate tables. • Cost = weight * Cost_cpu + Cost_IO * (1 – weight) – In Spark, we use weight * cardinality + size * (1 – weight) – weight is a tuning parameter configured via spark.sql.cbo.joinReorder.card.weight (0.7 by default)
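The slide's formula, summed over a plan's intermediate results, as a short sketch. `plan_cost` and the two example plans are hypothetical. Because cardinality (rows) and size (bytes) are in different units, the weight trades one against the other. Spark's own code may combine or normalize the two terms differently; this follows the slide literally.

```python
from typing import Iterable, Tuple

DEFAULT_CARD_WEIGHT = 0.7  # default of spark.sql.cbo.joinReorder.card.weight


def plan_cost(intermediates: Iterable[Tuple[float, float]],
              weight: float = DEFAULT_CARD_WEIGHT) -> float:
    """Sum of weight * cardinality + size * (1 - weight) over intermediate results.

    intermediates: (row_count, size_in_bytes) for each intermediate table.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return sum(weight * rows + size * (1 - weight) for rows, size in intermediates)


plan_a = [(1_000, 50_000)]  # hypothetical plan with one large intermediate result
plan_b = [(100, 5_000)]     # hypothetical plan with one small intermediate result
print(plan_cost(plan_a), plan_cost(plan_b))
```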
  • 29. TPC-DS Benchmarks and Query Analysis Zhenhua Wang Huawei Technologies
  • 30. Session 2 Topics • Motivation • Statistics Collection Framework • Cost-Based Optimizations • TPC-DS Benchmark and Query Analysis • Demo
  • 31. Preliminary Performance Test • Setup: – TPC-DS at 1 TB (scale factor 1000) – 4-node cluster (Huawei FusionServer RH2288: 40 cores, 384 GB memory) – Apache Spark 2.2 RC (dated 5/12/2017) • Statistics collection – A total of 24 tables and 425 columns ➢ Takes 14 minutes to collect statistics for all tables and all columns. – Fast because all statistics are computed by integrating with Spark's built-in aggregate functions. – Should take much less time if we collect statistics only for columns used in predicates, joins, and group-by.
  • 32. TPC-DS Query Q11 WITH year_total AS ( SELECT c_customer_id customer_id, c_first_name customer_first_name, c_last_name customer_last_name, c_preferred_cust_flag customer_preferred_cust_flag, c_birth_country customer_birth_country, c_login customer_login, c_email_address customer_email_address, d_year dyear, sum(ss_ext_list_price - ss_ext_discount_amt) year_total, 's' sale_type FROM customer, store_sales, date_dim WHERE c_customer_sk = ss_customer_sk AND ss_sold_date_sk = d_date_sk GROUP BY c_customer_id, c_first_name, c_last_name, d_year, c_preferred_cust_flag, c_birth_country, c_login, c_email_address, d_year UNION ALL SELECT c_customer_id customer_id, c_first_name customer_first_name, c_last_name customer_last_name, c_preferred_cust_flag customer_preferred_cust_flag, c_birth_country customer_birth_country, c_login customer_login, c_email_address customer_email_address, d_year dyear, sum(ws_ext_list_price - ws_ext_discount_amt) year_total, 'w' sale_type FROM customer, web_sales, date_dim WHERE c_customer_sk = ws_bill_customer_sk AND ws_sold_date_sk = d_date_sk GROUP BY c_customer_id, c_first_name, c_last_name, c_preferred_cust_flag, c_birth_country, c_login, c_email_address, d_year) SELECT t_s_secyear.customer_preferred_cust_flag FROM year_total t_s_firstyear, year_total t_s_secyear, year_total t_w_firstyear, year_total t_w_secyear WHERE t_s_secyear.customer_id = t_s_firstyear.customer_id AND t_s_firstyear.customer_id = t_w_secyear.customer_id AND t_s_firstyear.customer_id = t_w_firstyear.customer_id AND t_s_firstyear.sale_type = 's' AND t_w_firstyear.sale_type = 'w' AND t_s_secyear.sale_type = 's' AND t_w_secyear.sale_type = 'w' AND t_s_firstyear.dyear = 2001 AND t_s_secyear.dyear = 2001 + 1 AND t_w_firstyear.dyear = 2001 AND t_w_secyear.dyear = 2001 + 1 AND t_s_firstyear.year_total > 0 AND t_w_firstyear.year_total > 0 AND CASE WHEN t_w_firstyear.year_total > 0 THEN t_w_secyear.year_total / t_w_firstyear.year_total ELSE NULL END > CASE WHEN t_s_firstyear.year_total > 0 THEN t_s_secyear.year_total / t_s_firstyear.year_total ELSE NULL END ORDER BY t_s_secyear.customer_preferred_cust_flag LIMIT 100
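  • 33. Query Analysis – Q11 CBO OFF
    [Figure: join plan for Q11 without CBO, annotated "Large join result". Join #1 joins store_sales (2.9 billion rows) with customer (12 million) first, producing 2.7 billion rows before date_dim (73,049) reduces it to 534 million. Join #2 does the same for web_sales (720 million rows): 719 million rows after customer, 144 million after date_dim. Joins #3 and #4 sit above these.]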
  • 34. Query Analysis – Q11 CBO ON
    [Figure: join plan for Q11 with CBO, annotated "Small join result". Join #1 joins store_sales (2.9 billion rows) with date_dim (73,049) first, producing 534 million rows, and then customer (12 million), still 534 million. Join #2 does the same for web_sales (720 million rows): 144 million rows after date_dim and after customer. Annotations: "80% less" data and a 1.4x speedup.]
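The "80% less" annotation can be checked against the row counts in the two Q11 plans: the first join of each branch shrinks from 2.7 billion to 534 million rows (store_sales) and from 719 million to 144 million rows (web_sales) once date_dim is joined before customer. The quick calculation below uses those counts; treating the first-join outputs as the data that "80% less" refers to is our reading of the figures.

```python
# Row counts after the first join of each Q11 branch, as read from the figures:
# without CBO customer is joined first, with CBO date_dim is joined first.
first_join_rows = {
    "store_sales": {"cbo_off": 2.7e9, "cbo_on": 534e6},
    "web_sales": {"cbo_off": 719e6, "cbo_on": 144e6},
}


def reduction(rows_off, rows_on):
    """Fraction of intermediate rows avoided by the CBO join order."""
    return 1 - rows_on / rows_off


for branch, rows in first_join_rows.items():
    print(f"{branch}: {reduction(rows['cbo_off'], rows['cbo_on']):.0%} fewer rows")
```

Both branches come out at about 80%, because joining the selective date_dim side first discards rows before the customer join instead of after it.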
  • 35. TPC-DS Query Q72
    SELECT i_item_desc, w_warehouse_name, d1.d_week_seq,
           count(CASE WHEN p_promo_sk IS NULL THEN 1 ELSE 0 END) no_promo,
           count(CASE WHEN p_promo_sk IS NOT NULL THEN 1 ELSE 0 END) promo,
           count(*) total_cnt
    FROM catalog_sales
      JOIN inventory ON (cs_item_sk = inv_item_sk)
      JOIN warehouse ON (w_warehouse_sk = inv_warehouse_sk)
      JOIN item ON (i_item_sk = cs_item_sk)
      JOIN customer_demographics ON (cs_bill_cdemo_sk = cd_demo_sk)
      JOIN household_demographics ON (cs_bill_hdemo_sk = hd_demo_sk)
      JOIN date_dim d1 ON (cs_sold_date_sk = d1.d_date_sk)
      JOIN date_dim d2 ON (inv_date_sk = d2.d_date_sk)
      JOIN date_dim d3 ON (cs_ship_date_sk = d3.d_date_sk)
      LEFT OUTER JOIN promotion ON (cs_promo_sk = p_promo_sk)
      LEFT OUTER JOIN catalog_returns ON (cr_item_sk = cs_item_sk AND cr_order_number = cs_order_number)
    WHERE d1.d_week_seq = d2.d_week_seq
      AND inv_quantity_on_hand < cs_quantity
      AND d3.d_date > (cast(d1.d_date AS DATE) + interval 5 days)
      AND hd_buy_potential = '>10000'
      AND d1.d_year = 1999
      AND hd_buy_potential = '>10000'
      AND cd_marital_status = 'D'
      AND d1.d_year = 1999
    GROUP BY i_item_desc, w_warehouse_name, d1.d_week_seq
    ORDER BY total_cnt DESC, i_item_desc, w_warehouse_name, d_week_seq
    LIMIT 100
  • 36. Query Analysis – Q72 CBO OFF
    [Figure: join plan for Q72 without CBO (Joins #1 to #8), annotated "Really large intermediate results". Inputs: catalog_sales (1.4 billion rows), inventory (783 million), warehouse (20), item (300,000), customer_demographics (1.9 million), household_demographics (7,200), and date_dim d1, d2, d3 (73,049 each). The joins with inventory, warehouse, and item each produce about 223 billion intermediate rows.]
  • 37. Query Analysis – Q72 CBO ON
    [Figure: reordered join plan for Q72 with CBO over the same tables, annotated "Much smaller intermediate results!" and "2-3 orders of magnitude less!". Intermediate results shown include 238 million, 47.6 million, 2,555, and 1 billion rows, versus 223 billion without CBO; 8.1x speedup.]
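The "2-3 orders of magnitude" claim for Q72 follows from comparing the roughly 223 billion intermediate rows of the original join order with the larger intermediate results of the reordered plan (238 million and 1 billion rows). The sketch below just takes base-10 logarithms of those ratios; picking these two CBO-ON sizes for the comparison is our assumption.

```python
import math

CBO_OFF_ROWS = 223e9  # intermediate rows of the early joins without CBO


def orders_smaller(rows_off, rows_on):
    """How many powers of ten separate two intermediate result sizes."""
    return math.log10(rows_off / rows_on)


# Larger intermediate results shown in the reordered (CBO ON) plan.
for rows_on in (238e6, 1e9):
    print(f"{rows_on:.3g} rows: {orders_smaller(CBO_OFF_ROWS, rows_on):.1f} orders of magnitude smaller")
```

The two ratios land at roughly 3.0 and 2.3 orders of magnitude, matching the slide's range.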
  • 38. TPC-DS Query Performance
    [Chart: runtime in seconds of TPC-DS queries q1 to q98, without CBO vs. with CBO.]
  • 39. TPC-DS Query Speedup
    • TPC-DS query speedup ratio with CBO versus without CBO
    • 16 queries show a speedup of more than 30%.
    • The maximum speedup is 8X.
    • The geometric mean of the speedup is 2.2X.
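A speedup ratio divides the runtime without CBO by the runtime with CBO, and ratios are averaged with a geometric mean so that a 2x gain and a 2x loss cancel out. The sketch below shows both definitions. As input it uses only the two speedups reported on the earlier slides (1.4x for Q11, 8.1x for Q72), so its result is not the deck's 2.2X figure, which covers more queries.

```python
import math


def speedup(runtime_without_cbo, runtime_with_cbo):
    """Speedup ratio: a value above 1 means the query ran faster with CBO."""
    return runtime_without_cbo / runtime_with_cbo


def geo_mean(ratios):
    """Geometric mean: exponential of the average logarithm of the ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Only the Q11 and Q72 speedups from the earlier slides.
print(round(geo_mean([1.4, 8.1]), 2))
```

Under this definition, "speedup > 30%" corresponds to a ratio above 1.3.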
  • 40. TPC-DS Query 64
    WITH cs_ui AS
      (SELECT cs_item_sk,
              sum(cs_ext_list_price) AS sale,
              sum(cr_refunded_cash + cr_reversed_charge + cr_store_credit) AS refund
       FROM catalog_sales, catalog_returns
       WHERE cs_item_sk = cr_item_sk AND cs_order_number = cr_order_number
       GROUP BY cs_item_sk
       HAVING sum(cs_ext_list_price) > 2 * sum(cr_refunded_cash + cr_reversed_charge + cr_store_credit)),
    cross_sales AS
      (SELECT i_product_name product_name, i_item_sk item_sk, s_store_name store_name, s_zip store_zip,
              ad1.ca_street_number b_street_number, ad1.ca_street_name b_streen_name,
              ad1.ca_city b_city, ad1.ca_zip b_zip,
              ad2.ca_street_number c_street_number, ad2.ca_street_name c_street_name,
              ad2.ca_city c_city, ad2.ca_zip c_zip,
              d1.d_year AS syear, d2.d_year AS fsyear, d3.d_year s2year,
              count(*) cnt, sum(ss_wholesale_cost) s1, sum(ss_list_price) s2, sum(ss_coupon_amt) s3
       FROM store_sales, store_returns, cs_ui, date_dim d1, date_dim d2, date_dim d3, store, customer,
            customer_demographics cd1, customer_demographics cd2, promotion,
            household_demographics hd1, household_demographics hd2,
            customer_address ad1, customer_address ad2, income_band ib1, income_band ib2, item
       WHERE ss_store_sk = s_store_sk
         AND ss_sold_date_sk = d1.d_date_sk
         AND ss_customer_sk = c_customer_sk
         AND ss_cdemo_sk = cd1.cd_demo_sk
         AND ss_hdemo_sk = hd1.hd_demo_sk
         AND ss_addr_sk = ad1.ca_address_sk
         AND ss_item_sk = i_item_sk
         AND ss_item_sk = sr_item_sk
         AND ss_ticket_number = sr_ticket_number
         AND ss_item_sk = cs_ui.cs_item_sk
         AND c_current_cdemo_sk = cd2.cd_demo_sk
         AND c_current_hdemo_sk = hd2.hd_demo_sk
         AND c_current_addr_sk = ad2.ca_address_sk
         AND c_first_sales_date_sk = d2.d_date_sk
         AND c_first_shipto_date_sk = d3.d_date_sk
         AND ss_promo_sk = p_promo_sk
         AND hd1.hd_income_band_sk = ib1.ib_income_band_sk
         AND hd2.hd_income_band_sk = ib2.ib_income_band_sk
         AND cd1.cd_marital_status <> cd2.cd_marital_status
         AND i_color IN ('purple', 'burlywood', 'indian', 'spring', 'floral', 'medium')
         AND i_current_price BETWEEN 64 AND 64 + 10
         AND i_current_price BETWEEN 64 + 1 AND 64 + 15
       GROUP BY i_product_name, i_item_sk, s_store_name, s_zip,
                ad1.ca_street_number, ad1.ca_street_name, ad1.ca_city, ad1.ca_zip,
                ad2.ca_street_number, ad2.ca_street_name, ad2.ca_city, ad2.ca_zip,
                d1.d_year, d2.d_year, d3.d_year)
    SELECT cs1.product_name, cs1.store_name, cs1.store_zip, cs1.b_street_number, cs1.b_streen_name,
           cs1.b_city, cs1.b_zip, cs1.c_street_number, cs1.c_street_name, cs1.c_city, cs1.c_zip,
           cs1.syear, cs1.cnt, cs1.s1, cs1.s2, cs1.s3,
           cs2.s1, cs2.s2, cs2.s3, cs2.syear, cs2.cnt
    FROM cross_sales cs1, cross_sales cs2
    WHERE cs1.item_sk = cs2.item_sk
      AND cs1.syear = 1999
      AND cs2.syear = 1999 + 1
      AND cs2.cnt <= cs1.cnt
      AND cs1.store_name = cs2.store_name
      AND cs1.store_zip = cs2.store_zip
    ORDER BY cs1.product_name, cs1.store_name, cs2.cnt
  • 41. Query Analysis – Q64 CBO ON
    [Figure: physical plan of Q64 with CBO, annotated "10% slower", drawn as Fragment 1 and Fragment 2. Operators shown: FileScan (store_sales), FileScan (store_returns), Exchange, Sort, Aggregate, SortMergeJoin, BroadcastHashJoin, BroadcastExchange (cs_ui), and ReusedExchange.]
  • 42. Query Analysis – Q64 CBO OFF
    [Figure: physical plan of Q64 without CBO, drawn as Fragment 1 and Fragment 2. Operators shown: FileScan (store_sales), FileScan (store_returns), Exchange, Sort, Aggregate, SortMergeJoin, Sort (cs_ui), and ReusedExchange; here cs_ui feeds a Sort and SortMergeJoin rather than a BroadcastExchange.]
  • 44. Current Status, Credits and Future Work
    Ron Hu, Huawei Technologies
  • 45. Available in Apache Spark 2.2
    • Configured via spark.sql.cbo.enabled
    • Off by default. Why?
      – Spark is used in production.
      – Many Spark users may already rely on "human intelligence" to write queries in the best join order.
      – We plan to enable CBO by default in Spark 2.3.
    • We encourage you to test CBO with Spark 2.2!
  • 46. Current Status
    • SPARK-16026 is the umbrella JIRA.
      – 32 sub-tasks have been resolved.
      – A big project spanning 8 months
      – 10+ Spark contributors involved
      – 7000+ lines of Scala code contributed
    • A good framework for further integrations
      – Use statistics to determine whether a join attribute is unique
      – Benefits star schema detection and its integration into join reordering
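Column statistics make it possible to guess whether a join attribute is unique, which is what star schema detection needs to recognize dimension-table keys. The sketch below is a hypothetical heuristic, not Spark's code: a column counts as unique when it has no nulls and its distinct count is within a tolerance of the row count. The tolerance exists because collected distinct counts are approximate; the 0.05 value, the function name, and the sample statistics are all assumptions.

```python
def looks_unique(row_count, distinct_count, null_count, max_error=0.05):
    """Guess whether a column is unique (e.g. a dimension table's key).

    Hypothetical heuristic: no nulls, and the distinct count within
    `max_error` of the row count. The tolerance allows for approximate
    distinct counts; 0.05 is an assumed value.
    """
    if row_count <= 0 or null_count > 0:
        return False
    return abs(distinct_count / row_count - 1.0) <= max_error


# Hypothetical statistics for two join columns.
print(looks_unique(row_count=1_000_000, distinct_count=990_000, null_count=0))  # key-like
print(looks_unique(row_count=1_000_000, distinct_count=200_000, null_count=0))  # not unique
```

A tighter tolerance misses genuine keys when the distinct-count estimate is off; a looser one risks treating a foreign key as unique.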
  • 47. Birth of Spark SQL CBO
    • Prototype
      – In 2015, Ron Hu, Fang Cao, and others in Huawei's research department prototyped the CBO concept on Spark 1.2.
      – After a successful prototype, we shared the technology with Zhenhua Wang, Fei Wang, and others in Huawei's product development team.
    • We delivered a talk at Spark Summit 2016:
      – "Enhancing Spark SQL Optimizer with Reliable Statistics"
    • The talk was well received by the community.
  • 48. Collaboration
    • Good community support
      – Developers: Zhenhua Wang, Ron Hu, Reynold Xin, Wenchen Fan, Xiao Li
      – Reviewers: Wenchen, Herman, Reynold, Xiao, Liang-chi, Ioana, Nattavut, Hyukjin, Shuai, …
      – Extensive discussion in JIRAs and PRs (tens to hundreds of conversations)
      – All the comments made the development time longer, but improved code quality.
    • It was a pleasure working with the community.
  • 49. Future Work: Cost Based Optimizer
    • The current cost formula is coarse:
      Cost = cardinality * weight + size * (1 - weight)
    • It cannot tell the cost difference between sort-merge join and hash join.
      – spark.sql.join.preferSortMergeJoin defaults to true.
    • It underestimates (or ignores) shuffle cost.
    • We will improve the cost formula in the next release.
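The formula above depends only on how many rows and bytes a plan produces, so two physical join implementations with the same output always receive the same cost. The sketch below evaluates the formula directly. The default weight of 0.7 assumes the Spark 2.2 default of spark.sql.cbo.joinReorder.card.weight (verify it for your version), and the row and byte counts are hypothetical.

```python
def plan_cost(cardinality, size_in_bytes, weight=0.7):
    """Cost = cardinality * weight + size * (1 - weight), as on the slide.

    weight=0.7 assumes the Spark 2.2 default of
    spark.sql.cbo.joinReorder.card.weight; check it for your version.
    """
    return cardinality * weight + size_in_bytes * (1 - weight)


# Hypothetical join output: the same rows and bytes whichever
# physical algorithm produces it.
rows, size = 1_000_000, 64_000_000
sort_merge_join = plan_cost(rows, size)
broadcast_hash_join = plan_cost(rows, size)

# The formula has no term for the join algorithm or for shuffles,
# so both alternatives get exactly the same cost.
print(sort_merge_join == broadcast_hash_join)
```

Distinguishing the two would require terms for shuffle and sort work, which is what the slide lists as future work.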
  • 50. Future Work: Statistics Collection Framework
    • Advanced statistics, e.g. histograms and sketches
    • Hint mechanism
    • Partition-level statistics
    • Faster statistics collection by sampling data for large tables
  • 51. Conclusion
    • Motivation
    • Statistics Collection Framework
      – Table/Column Level Statistics Collected
      – Cardinality Estimation (Filters, Joins, Aggregates etc.)
    • Cost-based Optimizations
      – Build Side Selection
      – Multi-way Join Re-ordering
    • TPC-DS Benchmarks
    • Demo
  • 53. Multi-way Join Reorder – Example
    • Given A ⋈ B ⋈ C ⋈ D with join conditions A.k1 = B.k1, B.k2 = C.k2, and C.k3 = D.k3:
    level 0: p({A}), p({B}), p({C}), p({D})
    level 1: p({A, B}), p({A, C}), p({A, D}), p({B, C}), p({B, D}), p({C, D})
    level 2: p({A, B, C}), p({A, B, D}), p({A, C, D}), p({B, C, D})
    level 3: p({A, B, C, D}) -- final output plan
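The level-by-level table above is a bottom-up dynamic program: level k holds the best plan for every subset of k + 1 relations, built by joining two already-solved smaller subsets. The sketch below implements that enumeration without any pruning. Its cost model (sum of intermediate row counts, with join output estimated as the product of input rows and the selectivities of the connecting conditions) and all row counts and selectivities are assumptions for illustration, not Spark's exact estimator.

```python
from itertools import combinations


def join_reorder_dp(rows, selectivity):
    """Level-wise dynamic programming over relation subsets.

    rows:        base-table row counts, e.g. {"A": 1000, "B": 10}
    selectivity: {frozenset({"A", "B"}): 0.1, ...}, one entry per join condition;
                 pairs without a condition get 1.0 (a cartesian product).
    Assumed cost model: sum of the row counts of all intermediate results.
    Returns {subset: (cost, rows, plan)} with the best plan for every subset.
    """
    names = sorted(rows)
    # Level 0: single relations cost nothing yet.
    best = {frozenset([n]): (0.0, float(rows[n]), n) for n in names}
    for level in range(1, len(names)):
        for subset in map(frozenset, combinations(names, level + 1)):
            # Try every split into two smaller, already-solved subsets.
            for k in range(1, level + 1):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    l_cost, l_rows, l_plan = best[left]
                    r_cost, r_rows, r_plan = best[right]
                    sel = 1.0
                    for a in left:
                        for b in right:
                            sel *= selectivity.get(frozenset({a, b}), 1.0)
                    out_rows = l_rows * r_rows * sel
                    cost = l_cost + r_cost + out_rows
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, out_rows, (l_plan, r_plan))
    return best


# Hypothetical row counts and selectivities for A ⋈ B ⋈ C ⋈ D.
rows = {"A": 1000, "B": 10, "C": 100, "D": 10}
sel = {frozenset("AB"): 0.1, frozenset("BC"): 0.01, frozenset("CD"): 0.1}
best = join_reorder_dp(rows, sel)
for level in range(4):
    print(f"level {level}:", sorted("".join(sorted(s)) for s in best if len(s) == level + 1))
print("final plan:", best[frozenset("ABCD")][2])
```

Keeping only the cheapest plan per subset is what makes this tractable compared with listing every join tree, although the number of subsets still grows exponentially with the number of relations.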
  • 54. Multi-way Join Reorder – Example
    • Pruning strategy: exclude cartesian product candidates. This significantly reduces the search space.
    level 0: p({A}), p({B}), p({C}), p({D})
    level 1: p({A, B}), p({B, C}), p({C, D}) (pruned: p({A, C}), p({A, D}), p({B, D}))
    level 2: p({A, B, C}), p({B, C, D}) (pruned: p({A, B, D}), p({A, C, D}))
    level 3: p({A, B, C, D}) -- final output plan
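Excluding cartesian products means a subset survives only if its relations are connected through join conditions. The sketch below applies that test to the A-B, B-C, C-D chain from the example using a simple graph traversal; the function names are illustrative, not Spark's.

```python
from itertools import combinations


def is_connected(subset, edges):
    """True if the relations in `subset` are linked by join conditions,
    i.e. joining them needs no cartesian product."""
    subset = set(subset)
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:  # depth-first walk restricted to the subset
        node = stack.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == node and y in subset and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == subset


def candidates_by_level(relations, edges):
    """Subsets kept at each DP level once cartesian candidates are pruned."""
    return [
        [set(c) for c in combinations(relations, size) if is_connected(c, edges)]
        for size in range(1, len(relations) + 1)
    ]


edges = [("A", "B"), ("B", "C"), ("C", "D")]  # A.k1 = B.k1, B.k2 = C.k2, C.k3 = D.k3
for level, subsets in enumerate(candidates_by_level("ABCD", edges)):
    print(f"level {level}:", ["".join(sorted(s)) for s in subsets])
```

For this chain the candidate count drops from 4 + 6 + 4 + 1 = 15 subsets to 4 + 3 + 2 + 1 = 10.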
  • 55. New Commands in Apache Spark 2.2
    • CBO commands
      – Collect table-level statistics
        • ANALYZE TABLE table_name COMPUTE STATISTICS
      – Collect column-level statistics
        • ANALYZE TABLE table_name COMPUTE STATISTICS FOR COLUMNS column_name1, column_name2, …
      – Display statistics in the optimized logical plan
        > EXPLAIN COST
        > SELECT cc_call_center_sk, cc_call_center_id FROM call_center;
        …
        == Optimized Logical Plan ==
        Project [cc_call_center_sk#75, cc_call_center_id#76], Statistics(sizeInBytes=1680.0 B, rowCount=42, hints=none)
        +- Relation[…31 fields] parquet, Statistics(sizeInBytes=22.5 KB, rowCount=42, hints=none)
        …
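Collecting column statistics only for the columns that appear in predicates, joins, and group-bys keeps ANALYZE cheap. The helper below is a hypothetical convenience that builds the two statements shown above for a chosen table and column list. In PySpark each string could be passed to spark.sql(...); the table and column names in the example are placeholders taken from Q11's join condition.

```python
def analyze_statements(table, columns=()):
    """Build the Spark 2.2 statistics commands for one table.

    Hypothetical helper: table-level statistics always, column-level
    statistics only for the columns listed (e.g. join or filter keys).
    """
    stmts = [f"ANALYZE TABLE {table} COMPUTE STATISTICS"]
    if columns:
        stmts.append(
            f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS {', '.join(columns)}"
        )
    return stmts


# Example: the customer join key used by Q11 (c_customer_sk = ss_customer_sk).
for sql in analyze_statements("customer", ["c_customer_sk"]):
    print(sql)
```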