Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Sergei Petrunia <sergey@mariadb.com>
Igor Babaev <igor@mariadb.com>
M|18, February 2018
Understanding
the Query Optimizer
Optimizations for VIEWs,
derived tables, and CTEs
3
Plan
● Earlier versions: derived table merging
● MariaDB 10.2: Condition pushdown
● MariaDB 10.3: Condition pushdown thr...
4
Background – derived table merge
● “Customers and their big orders from October”
select *
from
customers,
(select *
from...
5
Naive execution
select *
from
customers,
(select *
from orders
where
order_date BETWEEN '2017-10-01' and
'2017-10-31'
) ...
6
Derived table merge
select *
from
customers,
(select *
from orders
where
order_date BETWEEN '2017-10-01' and
'2017-10-31...
7
Execution after merge
customers
Join
orders
select *
from
customers,
orders
where
order_date BETWEEN '2017-10-01' and
'2...
8
Another use case - grouping
● Can’t merge due to GROUP BY in the child.
create view OCT_TOTALS as
select
customer_id,
SU...
9
Execution is inefficient
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_...
10
Condition pushdown
select *
from OCT_TOTALS
where customer_id=1
create view OCT_TOTALS as
select
customer_id,
SUM(amoun...
11
Condition pushdown
select *
from OCT_TOTALS
where customer_id=1
orders
1 – find customer_id=1
OCT_TOTALS,
customer_id=1...
12
Condition Pushdown into HAVING
select *
from OCT_TOTALS
where TOTAL_AMT > 1M
● Conditions that cannot be pushed
through...
13
Condition pushdown will also push inferred conditions
select
custmer.customer_name,
TOTAL_AMT
from
customer, OCT_TOTALS...
14
“Split grouping for derived”
select *
from
customer, OCT_TOTALS
where
customer.customer_id=OCT_TOTALS.customer_id and
c...
15
Execution, the old way
Sum
orders
select *
from
customer, OCT_TOTALS
where
customer.customer_id=
OCT_TOTALS.customer_id...
16
Split grouping execution
Sum
customer
Customer 2
Customer 2
Customer 1
Customer 100
orders
Customer 1
Customer 1
Custom...
17
Split grouping execution
● EXPLAIN shows “LATERAL DERIVED”
● @@optimizer_switch flag: split_materialization (ON by defa...
18
Summary
● MariaDB 10.2: Condition pushdown for derived tables optimization
– Push a condition into derived table
– Used...
Window Functions Optimizations
20
Plan
● What are window functions
● Using window functions is an optimization by itself
● Condition pushdown through PAR...
21
Window functions basics
● Window functions are like aggregate functions,
– Each with its own GROUP BY clause
● Except t...
22
Aggregate function example
select
country, sum(Population) as total
from Cities
group by country
+-----------+---------...
23
Window function example
select
name,
rank() over (partition by country,
order by population desc)
from cities
+--------...
24
Window function example
select
name,
rank() over (partition by country,
order by population desc)
from cities
+--------...
25
Window function computation
● Can look at
– Current row
– Rows in the partition, ordered
● Can compute the window funct...
26
Window function computation
● Many functions can be computed “on
the fly”
– RANK, ROW_NUMBER
– SUM, AVG
– ...
27
Window function computation
10 $total+10
$total
● Example● Example
SELECT
SUM(amount) OVER (ORDER BY date
ROWS BETWEEN
...
28
Compare to non-window function
$total
● Typically uses a correlated subquery
SELECT
(SELECT SUM(amount)
FROM
transactio...
29
Performance comparison
# Rows Regular SQL Window Function
100 3.72 sec 0.01 sec
500 30.04 sec 0.01 sec
1000 59.6 sec 0....
30
MariaDB 10.3: Pushdown through Window Functions
● “Customer’s biggest orders”
create view top_three_orders as
select *
...
31
MariaDB 10.3: Pushdown through Window Functions
MariaDB 10.2, MySQL 8.0
● Compute
top_three_orders for all
customers
● ...
32
Doing fewer sorts
tbl
tbl
tbl
join
sort
select
rank() over (order by incidents),
ntile(4)over (order by incidents),
ran...
33
Window function optimzation conclusions
● Using window functions is an optimization by itself
● Condition pushdown thro...
34
Thanks!
Q & A
Upcoming SlideShare
Loading in …5
×

M|18 Understanding the Query Optimizer

203 views

Published on

M|18 Understanding the Query Optimizer

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

M|18 Understanding the Query Optimizer

  1. 1. Sergei Petrunia <sergey@mariadb.com> Igor Babaev <igor@mariadb.com> M|18, February 2018 Understanding the Query Optimizer
  2. 2. Optimizations for VIEWs, derived tables, and CTEs
  3. 3. 3 Plan ● Earlier versions: derived table merging ● MariaDB 10.2: Condition pushdown ● MariaDB 10.3: Condition pushdown through window functions ● MariaDB 10.3: GROUP BY splitting
  4. 4. 4 Background – derived table merge ● “Customers and their big orders from October” select * from customers, (select * from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' ) as OCT_ORDERS where OCT_ORDERS.amount > 1M and OCT_ORDERS.customer_id = customer.customer_id
  5. 5. 5 Naive execution select * from customers, (select * from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' ) as OCT_ORDERS where OCT_ORDERS.amount > 1M and OCT_ORDERS.customer_id = customers.customer_id orders customers 1 – compute oct_orders 2- do join OCT_ORDERS amount > 1M
  6. 6. 6 Derived table merge select * from customers, (select * from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' ) as OCT_ORDERS where OCT_ORDERS.amount > 1M and OCT_ORDERS.customer_id = customers.customer_id select * from customers, orders where order_date BETWEEN '2017-10-01' and '2017-10-31' and orders.amount > 1M and orders.customer_id = customers.customer_id
  7. 7. 7 Execution after merge customers Join orders select * from customers, orders where order_date BETWEEN '2017-10-01' and '2017-10-31' and orders.amount > 1M and orders.customer_id = customers.customer_id Made in October amount > 1M ● Allows the optimizer to join customers→orders or orders→customers ● Good for optimization
  8. 8. 8 Another use case - grouping ● Can’t merge due to GROUP BY in the child. create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id select * from OCT_TOTALS where customer_id=1
  9. 9. 9 Execution is inefficient create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id select * from OCT_TOTALS where customer_id=1 orders 1 – compute all totals 2- get* customer=1 OCT_TOTALS customer_id=1 Sum ( “derived_with_keys” will build/use an index here)
  10. 10. 10 Condition pushdown select * from OCT_TOTALS where customer_id=1 create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id ● Can push down conditions on GROUP BY columns ● … to filter out rows that go into groups we dont care about
  11. 11. 11 Condition pushdown select * from OCT_TOTALS where customer_id=1 orders 1 – find customer_id=1 OCT_TOTALS, customer_id=1 customer_id=1 Sum ● Looking only at rows you’re interested in is much more efficient create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id orders
  12. 12. 12 Condition Pushdown into HAVING select * from OCT_TOTALS where TOTAL_AMT > 1M ● Conditions that cannot be pushed through GROUP BY will be pushed into the HAVING clause create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id having TOTAL_AMT > 1M
  13. 13. 13 Condition pushdown will also push inferred conditions select custmer.customer_name, TOTAL_AMT from customer, OCT_TOTALS where customer.customer_id=OCT_TOTALS.customer_id and customer.customer_id=1 create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id OCT_TOTALS.customer_id=1
  14. 14. 14 “Split grouping for derived” select * from customer, OCT_TOTALS where customer.customer_id=OCT_TOTALS.customer_id and customer.customer_name IN ('Customer 1', 'Customer 2') create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id
  15. 15. 15 Execution, the old way Sum orders select * from customer, OCT_TOTALS where customer.customer_id= OCT_TOTALS.customer_id and customer.customer_name IN ('Customer 1', 'Customer 2') create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id Customer 1 Customer 2 Customer 3 Customer 100 Customer 1 Customer 2 Customer 3 Customer 100 customer Customer 1 Customer 2 OCT_TOTALS ● Inefficient, OCT_TOTALS is computed for *all* customers.
  16. 16. 16 Split grouping execution Sum customer Customer 2 Customer 2 Customer 1 Customer 100 orders Customer 1 Customer 1 Customer 2 Sum SumSum ● Can be used when doing join from customer to orders ● Must have equalities for GROUP BY columns: OCT_TOTALS.customer_id=customer.customer_id – This allows to select one group ● The underlying table (orders) must have an index on the GROUP BY column (customer_id) – This allows to use ref access
  17. 17. 17 Split grouping execution ● EXPLAIN shows “LATERAL DERIVED” ● @@optimizer_switch flag: split_materialization (ON by default) ● Cost-based choice whether use lateralization select * from customer, OCT_TOTALS where customer.customer_id= OCT_TOTALS.customer_id and customer.customer_name IN ('Customer 1', 'Customer 2') create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id +------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+ | 1 | PRIMARY | customer | ALL | PRIMARY | NULL | NULL | NULL | 1000 | | | 1 | PRIMARY | <derived2> | ref | key0 | key0 | 4 | customer.customer_id | 36 | | | 2 | LATERAL DERIVED | orders | ref | customer_id | customer_id | 4 | customer.customer_id | 365 | Using where | +------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
  18. 18. 18 Summary ● MariaDB 10.2: Condition pushdown for derived tables optimization – Push a condition into derived table – Used when derived table cannot be merged – Biggest effect is for subqueries with GROUP BY ● MariaDB 10.3: Condition Pushdown through Window functions ● MariaDB 10.3: Lateral derived optimization – When doing a join, can’t do condition pushdown – So, lateral derived is used. It allows to only examine GROUP BY groups that match other tables. Group By columns must be indexed
  19. 19. Window Functions Optimizations
  20. 20. 20 Plan ● What are window functions ● Using window functions is an optimization by itself ● Condition pushdown through PARTITION BY ● Doing fewer sorts
  21. 21. 21 Window functions basics ● Window functions are like aggregate functions, – Each with its own GROUP BY clause ● Except that – The groups are ordered – The groups are not collapsed
  22. 22. 22 Aggregate function example select country, sum(Population) as total from Cities group by country +-----------+---------+------------+ | name | country | population | +-----------+---------+------------+ | Berlin | DEU | 3386667 | | Frankfurt | DEU | 643821 | | Moscow | RUS | 8389200 | | New York | USA | 8008278 | | Chicago | USA | 2896016 | | Seattle | USA | 563374 | +-----------+---------+------------+ +---------+----------+ | country | total | +---------+----------+ | DEU | 4030488 | | RUS | 8389200 | | USA | 11467668 | +---------+----------+
  23. 23. 23 Window function example select name, rank() over (partition by country, order by population desc) from cities +-----------+---------+------------+ | name | country | population | +-----------+---------+------------+ | Berlin | DEU | 3386667 | | Frankfurt | DEU | 643821 | | Moscow | RUS | 8389200 | | New York | USA | 8008278 | | Chicago | USA | 2896016 | | Seattle | USA | 563374 | +-----------+---------+------------+ +-----------+------+ | name | rank | +-----------+------+ | Berlin | 1 | | Frankfurt | 2 | | Moscow | 1 | | New York | 1 | | Chicago | 2 | | Seattle | 3 | +-----------+------+
  24. 24. 24 Window function example select name, rank() over (partition by country, order by population desc) from cities +-----------+---------+------------+ | name | country | population | +-----------+---------+------------+ | Berlin | DEU | 3386667 | | Frankfurt | DEU | 643821 | | Moscow | RUS | 8389200 | | New York | USA | 8008278 | | Chicago | USA | 2896016 | | Seattle | USA | 563374 | +-----------+---------+------------+ +-----------+------+ | name | rank | +-----------+------+ | Berlin | 1 | | Frankfurt | 2 | | Moscow | 1 | | New York | 1 | | Chicago | 2 | | Seattle | 3 | +-----------+------+
  25. 25. 25 Window function computation ● Can look at – Current row – Rows in the partition, ordered ● Can compute the window function ● Computing values individually would be expensive – O(#rows_in_partition ^ 2)
  26. 26. 26 Window function computation ● Many functions can be computed “on the fly” – RANK, ROW_NUMBER – SUM, AVG – ...
  27. 27. 27 Window function computation 10 $total+10 $total ● Example● Example SELECT SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cur_balance FROM transactions
  28. 28. 28 Compare to non-window function $total ● Typically uses a correlated subquery SELECT (SELECT SUM(amount) FROM transactions t WHERE t.date <= date AND account_id = 12345 ) AS cur_balance FROM transactions ● N^2 complexity
  29. 29. 29 Performance comparison # Rows Regular SQL Window Function 100 3.72 sec 0.01 sec 500 30.04 sec 0.01 sec 1000 59.6 sec 0.02 sec 2000 1 min 59 sec 0.03 sec 4000 4 min 1 sec 0.04 sec 16000 18 min 26 sec 0.18 sec
  30. 30. 30 MariaDB 10.3: Pushdown through Window Functions ● “Customer’s biggest orders” create view top_three_orders as select * from ( select customer_id, amount, rank() over (partition by customer_id order by amount desc ) as order_rank from orders ) as ordered_orders where order_rank<3 select * from top_three_orders where customer_id=1 +-------------+--------+------------+ | customer_id | amount | order_rank | +-------------+--------+------------+ | 1 | 10000 | 1 | | 1 | 9500 | 2 | | 1 | 400 | 3 | | 2 | 3200 | 1 | | 2 | 1000 | 2 | | 2 | 400 | 3 | ...
  31. 31. 31 MariaDB 10.3: Pushdown through Window Functions MariaDB 10.2, MySQL 8.0 ● Compute top_three_orders for all customers ● select rows with customer_id=1 select * from top_three_orders where customer_id=1 MariaDB 10.3 (and e.g. PostgreSQL) ● Only compute top_three_orders for customer_id=1 – This can be much faster! – Can make use of index(customer_id)
  32. 32. 32 Doing fewer sorts tbl tbl tbl join sort select rank() over (order by incidents), ntile(4)over (order by incidents), rank() over (order by ...), from support_staff ● Each window function requires a sort ● Could avoid sorting if using an index (not supported yet) ● Identical PARTITION/ORDER BY must share the sort step – Compatible may share the sort step (supported)
  33. 33. 33 Window function optimzation conclusions ● Using window functions is an optimization by itself ● Condition pushdown through PARTITION BY – This is the most important ● Fewer sorts are done.
  34. 34. 34 Thanks! Q & A

×