Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

M|18 Understanding the Query Optimizer

74 views

Published on

M|18 Understanding the Query Optimizer

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

M|18 Understanding the Query Optimizer

  1. 1. Sergei Petrunia <sergey@mariadb.com> Igor Babaev <igor@mariadb.com> M|18, February 2018 Understanding the Query Optimizer
  2. 2. Optimizations for VIEWs, derived tables, and CTEs
  3. 3. 3 Plan ● Earlier versions: derived table merging ● MariaDB 10.2: Condition pushdown ● MariaDB 10.3: Condition pushdown through window functions ● MariaDB 10.3: GROUP BY splitting
  4. 4. 4 Background – derived table merge ● “Customers and their big orders from October” select * from customers, (select * from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' ) as OCT_ORDERS where OCT_ORDERS.amount > 1M and OCT_ORDERS.customer_id = customer.customer_id
  5. 5. 5 Naive execution select * from customers, (select * from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' ) as OCT_ORDERS where OCT_ORDERS.amount > 1M and OCT_ORDERS.customer_id = customers.customer_id orders customers 1 – compute oct_orders 2- do join OCT_ORDERS amount > 1M
  6. 6. 6 Derived table merge select * from customers, (select * from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' ) as OCT_ORDERS where OCT_ORDERS.amount > 1M and OCT_ORDERS.customer_id = customers.customer_id select * from customers, orders where order_date BETWEEN '2017-10-01' and '2017-10-31' and orders.amount > 1M and orders.customer_id = customers.customer_id
  7. 7. 7 Execution after merge customers Join orders select * from customers, orders where order_date BETWEEN '2017-10-01' and '2017-10-31' and orders.amount > 1M and orders.customer_id = customers.customer_id Made in October amount > 1M ● Allows the optimizer to join customers→orders or orders→customers ● Good for optimization
  8. 8. 8 Another use case - grouping ● Can’t merge due to GROUP BY in the child. create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id select * from OCT_TOTALS where customer_id=1
  9. 9. 9 Execution is inefficient create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id select * from OCT_TOTALS where customer_id=1 orders 1 – compute all totals 2- get* customer=1 OCT_TOTALS customer_id=1 Sum ( “derived_with_keys” will build/use an index here)
  10. 10. 10 Condition pushdown select * from OCT_TOTALS where customer_id=1 create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id ● Can push down conditions on GROUP BY columns ● … to filter out rows that go into groups we dont care about
  11. 11. 11 Condition pushdown select * from OCT_TOTALS where customer_id=1 orders 1 – find customer_id=1 OCT_TOTALS, customer_id=1 customer_id=1 Sum ● Looking only at rows you’re interested in is much more efficient create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id orders
  12. 12. 12 Condition Pushdown into HAVING select * from OCT_TOTALS where TOTAL_AMT > 1M ● Conditions that cannot be pushed through GROUP BY will be pushed into the HAVING clause create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id having TOTAL_AMT > 1M
  13. 13. 13 Condition pushdown will also push inferred conditions select custmer.customer_name, TOTAL_AMT from customer, OCT_TOTALS where customer.customer_id=OCT_TOTALS.customer_id and customer.customer_id=1 create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id OCT_TOTALS.customer_id=1
  14. 14. 14 “Split grouping for derived” select * from customer, OCT_TOTALS where customer.customer_id=OCT_TOTALS.customer_id and customer.customer_name IN ('Customer 1', 'Customer 2') create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id
  15. 15. 15 Execution, the old way Sum orders select * from customer, OCT_TOTALS where customer.customer_id= OCT_TOTALS.customer_id and customer.customer_name IN ('Customer 1', 'Customer 2') create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id Customer 1 Customer 2 Customer 3 Customer 100 Customer 1 Customer 2 Customer 3 Customer 100 customer Customer 1 Customer 2 OCT_TOTALS ● Inefficient, OCT_TOTALS is computed for *all* customers.
  16. 16. 16 Split grouping execution Sum customer Customer 2 Customer 2 Customer 1 Customer 100 orders Customer 1 Customer 1 Customer 2 Sum SumSum ● Can be used when doing join from customer to orders ● Must have equalities for GROUP BY columns: OCT_TOTALS.customer_id=customer.customer_id – This allows to select one group ● The underlying table (orders) must have an index on the GROUP BY column (customer_id) – This allows to use ref access
  17. 17. 17 Split grouping execution ● EXPLAIN shows “LATERAL DERIVED” ● @@optimizer_switch flag: split_materialization (ON by default) ● Cost-based choice whether use lateralization select * from customer, OCT_TOTALS where customer.customer_id= OCT_TOTALS.customer_id and customer.customer_name IN ('Customer 1', 'Customer 2') create view OCT_TOTALS as select customer_id, SUM(amount) as TOTAL_AMT from orders where order_date BETWEEN '2017-10-01' and '2017-10-31' group by customer_id +------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+ | 1 | PRIMARY | customer | ALL | PRIMARY | NULL | NULL | NULL | 1000 | | | 1 | PRIMARY | <derived2> | ref | key0 | key0 | 4 | customer.customer_id | 36 | | | 2 | LATERAL DERIVED | orders | ref | customer_id | customer_id | 4 | customer.customer_id | 365 | Using where | +------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
  18. 18. 18 Summary ● MariaDB 10.2: Condition pushdown for derived tables optimization – Push a condition into derived table – Used when derived table cannot be merged – Biggest effect is for subqueries with GROUP BY ● MariaDB 10.3: Condition Pushdown through Window functions ● MariaDB 10.3: Lateral derived optimization – When doing a join, can’t do condition pushdown – So, lateral derived is used. It allows to only examine GROUP BY groups that match other tables. Group By columns must be indexed
  19. 19. Window Functions Optimizations
  20. 20. 20 Plan ● What are window functions ● Using window functions is an optimization by itself ● Condition pushdown through PARTITION BY ● Doing fewer sorts
  21. 21. 21 Window functions basics ● Window functions are like aggregate functions, – Each with its own GROUP BY clause ● Except that – The groups are ordered – The groups are not collapsed
  22. 22. 22 Aggregate function example select country, sum(Population) as total from Cities group by country +-----------+---------+------------+ | name | country | population | +-----------+---------+------------+ | Berlin | DEU | 3386667 | | Frankfurt | DEU | 643821 | | Moscow | RUS | 8389200 | | New York | USA | 8008278 | | Chicago | USA | 2896016 | | Seattle | USA | 563374 | +-----------+---------+------------+ +---------+----------+ | country | total | +---------+----------+ | DEU | 4030488 | | RUS | 8389200 | | USA | 11467668 | +---------+----------+
  23. 23. 23 Window function example select name, rank() over (partition by country, order by population desc) from cities +-----------+---------+------------+ | name | country | population | +-----------+---------+------------+ | Berlin | DEU | 3386667 | | Frankfurt | DEU | 643821 | | Moscow | RUS | 8389200 | | New York | USA | 8008278 | | Chicago | USA | 2896016 | | Seattle | USA | 563374 | +-----------+---------+------------+ +-----------+------+ | name | rank | +-----------+------+ | Berlin | 1 | | Frankfurt | 2 | | Moscow | 1 | | New York | 1 | | Chicago | 2 | | Seattle | 3 | +-----------+------+
  24. 24. 24 Window function example select name, rank() over (partition by country, order by population desc) from cities +-----------+---------+------------+ | name | country | population | +-----------+---------+------------+ | Berlin | DEU | 3386667 | | Frankfurt | DEU | 643821 | | Moscow | RUS | 8389200 | | New York | USA | 8008278 | | Chicago | USA | 2896016 | | Seattle | USA | 563374 | +-----------+---------+------------+ +-----------+------+ | name | rank | +-----------+------+ | Berlin | 1 | | Frankfurt | 2 | | Moscow | 1 | | New York | 1 | | Chicago | 2 | | Seattle | 3 | +-----------+------+
  25. 25. 25 Window function computation ● Can look at – Current row – Rows in the partition, ordered ● Can compute the window function ● Computing values individually would be expensive – O(#rows_in_partition ^ 2)
  26. 26. 26 Window function computation ● Many functions can be computed “on the fly” – RANK, ROW_NUMBER – SUM, AVG – ...
  27. 27. 27 Window function computation 10 $total+10 $total ● Example● Example SELECT SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cur_balance FROM transactions
  28. 28. 28 Compare to non-window function $total ● Typically uses a correlated subquery SELECT (SELECT SUM(amount) FROM transactions t WHERE t.date <= date AND account_id = 12345 ) AS cur_balance FROM transactions ● N^2 complexity
  29. 29. 29 Performance comparison # Rows Regular SQL Window Function 100 3.72 sec 0.01 sec 500 30.04 sec 0.01 sec 1000 59.6 sec 0.02 sec 2000 1 min 59 sec 0.03 sec 4000 4 min 1 sec 0.04 sec 16000 18 min 26 sec 0.18 sec
  30. 30. 30 MariaDB 10.3: Pushdown through Window Functions ● “Customer’s biggest orders” create view top_three_orders as select * from ( select customer_id, amount, rank() over (partition by customer_id order by amount desc ) as order_rank from orders ) as ordered_orders where order_rank<3 select * from top_three_orders where customer_id=1 +-------------+--------+------------+ | customer_id | amount | order_rank | +-------------+--------+------------+ | 1 | 10000 | 1 | | 1 | 9500 | 2 | | 1 | 400 | 3 | | 2 | 3200 | 1 | | 2 | 1000 | 2 | | 2 | 400 | 3 | ...
  31. 31. 31 MariaDB 10.3: Pushdown through Window Functions MariaDB 10.2, MySQL 8.0 ● Compute top_three_orders for all customers ● select rows with customer_id=1 select * from top_three_orders where customer_id=1 MariaDB 10.3 (and e.g. PostgreSQL) ● Only compute top_three_orders for customer_id=1 – This can be much faster! – Can make use of index(customer_id)
  32. 32. 32 Doing fewer sorts tbl tbl tbl join sort select rank() over (order by incidents), ntile(4)over (order by incidents), rank() over (order by ...), from support_staff ● Each window function requires a sort ● Could avoid sorting if using an index (not supported yet) ● Identical PARTITION/ORDER BY must share the sort step – Compatible may share the sort step (supported)
  33. 33. 33 Window function optimzation conclusions ● Using window functions is an optimization by itself ● Condition pushdown through PARTITION BY – This is the most important ● Fewer sorts are done.
  34. 34. 34 Thanks! Q & A

×