Advanced query optimizer tuning and analysis. Sergei Petrunia, Timour Katchaounov, Monty Program Ab. MySQL Conference And Expo 2013
MySQL/MariaDB query optimizer tuning tutorial from Percona Live 2013

Transcript of "MySQL/MariaDB query optimizer tuning tutorial from Percona Live 2013"

  1. Advanced query optimizer tuning and analysis. Sergei Petrunia, Timour Katchaounov, Monty Program Ab. MySQL Conference And Expo 2013
  2. ● Introduction
       – What is an optimizer problem
       – How to catch it
         ● old and new tools
     ● Single-table selects
       – brief recap from 2012
     ● JOINs
       – ref access
         ● index statistics
       – join condition pushdown
       – join plan efficiency
       – query plan vs reality
     ● Big I/O bound JOINs
       – Batched Key Access
     ● Aggregate functions
     ● ORDER BY ... LIMIT
     ● GROUP BY
     ● Subqueries
  3. Is there a problem with the query optimizer?
     • Database performance is affected by many factors
     • One of them is the query optimizer
     • Is my performance problem caused by the optimizer?
  4. Signs that there is a query optimizer problem
     • Some (not all) queries are slow
     • A query seems to run longer than it ought to
       – And examines more records than it ought to
     • Usually, the query remains slow regardless of other activity on the server
  5. Catching slow queries, the old ways
     ● Watch the slow query log
       – Percona Server/MariaDB: --log_slow_verbosity=query_plan
       – Run pt-query-digest on the log

         # Thread_id: 1 Schema: dbt3sf10 QC_hit: No
         # Query_time: 2.452373 Lock_time: 0.000113 Rows_sent: 0 Rows_examined: 1500000
         # Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
         # Filesort: No Filesort_on_disk: No Merge_passes: 0
         SET timestamp=1333385770;
         select * from customer where c_acctbal < -1000;

     • Run SHOW PROCESSLIST periodically
  6. The new way: SHOW PROCESSLIST + SHOW EXPLAIN
     • Available in MariaDB 10.0+
     • Displays EXPLAIN of a running statement

       MariaDB> show processlist;
       +--+----+---------+-------+-------+----+------------+-------------------------...
       |Id|User|Host     |db     |Command|Time|State       |Info
       +--+----+---------+-------+-------+----+------------+-------------------------...
       | 1|root|localhost|dbt3sf1|Query  |  10|Sending data|select max(o_totalprice) ...
       | 2|root|localhost|dbt3sf1|Query  |   0|init        |show processlist
       +--+----+---------+-------+-------+----+------------+-------------------------...

       MariaDB> show explain for 1;
       +--+-----------+------+----+-------------+----+-------+----+-------+-----------+
       |id|select_type|table |type|possible_keys|key |key_len|ref |rows   |Extra      |
       +--+-----------+------+----+-------------+----+-------+----+-------+-----------+
       |1 |SIMPLE     |orders|ALL |NULL         |NULL|NULL   |NULL|1498194|Using where|
       +--+-----------+------+----+-------------+----+-------+----+-------+-----------+

       MariaDB [dbt3sf1]> show warnings;
       +-----+----+-----------------------------------------------------------------+
       |Level|Code|Message                                                          |
       +-----+----+-----------------------------------------------------------------+
       |Note |1003|select max(o_totalprice) from orders where year(o_orderDATE)=1995|
       +-----+----+-----------------------------------------------------------------+
  7. SHOW EXPLAIN usage
     ● Intended usage
       – SHOW PROCESSLIST ...
       – SHOW EXPLAIN FOR ...
     ● Why not just run EXPLAIN again?
       – Difficult to replicate setups
         ● Temporary tables
         ● Optimizer settings
         ● Storage engine's index statistics
         ● ...
       – No uncertainty about whether you're looking at the same query plan or not.
  8. Catching slow queries (NEW): PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
     ● use performance_schema
     ● Many ways to analyze via queries
       – events_statements_summary_by_digest
         ● count_star, sum_timer_wait, min_timer_wait, avg_timer_wait, max_timer_wait
         ● digest_text, digest
         ● sum_rows_examined, sum_created_tmp_disk_tables, sum_select_full_join
       – events_statements_history
         ● sql_text, digest_text, digest
         ● timer_start, timer_end, timer_wait
         ● rows_examined, created_tmp_disk_tables, select_full_join
  9. Catching slow queries (NEW): PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
     • Modified Q18 from DBT3

       select c_name, c_custkey, o_orderkey, o_orderdate,
              o_totalprice, sum(l_quantity)
       from customer, orders, lineitem
       where
         o_totalprice > ?
         and c_custkey = o_custkey
         and o_orderkey = l_orderkey
       group by c_name, c_custkey, o_orderkey,
                o_orderdate, o_totalprice
       order by o_totalprice desc, o_orderdate
       LIMIT 10;

     • App executes Q18 many times with ? = 550000, 500000, 400000, ...
  10. Catching slow queries (NEW): PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
     ● Find candidate slow queries
     ● Simple tests: select_full_join > 0, created_tmp_disk_tables > 0, etc.
     ● Complex conditions: max execution time > X sec OR min/max time vary a lot:

       select max_timer_wait/avg_timer_wait as max_ratio,
              avg_timer_wait/min_timer_wait as min_ratio
       from events_statements_summary_by_digest
       where max_timer_wait > 1000000000000
          or max_timer_wait / avg_timer_wait > 2
          or avg_timer_wait / min_timer_wait > 2\G
  11. Catching slow queries (NEW): PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]

       *************************** 5. row ***************************
       DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b
       DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` ,
         `o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem`
         WHERE `o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey`
         GROUP BY `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice`
         ORDER BY `o_totalprice` DESC , `o_orderdate` LIMIT ?
       COUNT_STAR: 3
       SUM_TIMER_WAIT: 3251758347000
       MIN_TIMER_WAIT: 3914209000      → 0.0039 sec
       AVG_TIMER_WAIT: 1083919449000
       MAX_TIMER_WAIT: 3204044053000   → 3.2 sec
       SUM_LOCK_TIME: 555000000
       SUM_ROWS_SENT: 25
       SUM_ROWS_EXAMINED: 0
       SUM_CREATED_TMP_DISK_TABLES: 0
       SUM_CREATED_TMP_TABLES: 3
       SUM_SELECT_FULL_JOIN: 0
       SUM_SELECT_RANGE: 3
       SUM_SELECT_SCAN: 0
       SUM_SORT_RANGE: 0
       SUM_SORT_ROWS: 25
       SUM_SORT_SCAN: 3
       SUM_NO_INDEX_USED: 0
       SUM_NO_GOOD_INDEX_USED: 0
       FIRST_SEEN: 1970-01-01 03:38:27
       LAST_SEEN: 1970-01-01 03:38:43
       max_ratio: 2.9560
       min_ratio: 276.9192

     Observation: high variance of execution time (MIN_TIMER_WAIT vs MAX_TIMER_WAIT, min_ratio).
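The ratios in the row above can be reproduced by hand. The following is a minimal Python sketch, not part of the original slides: it assumes the timer columns are in picoseconds (consistent with the "→ 0.0039 sec" annotation), and the helper names `to_seconds` and `is_candidate` are hypothetical.

```python
# Timer columns are assumed to be in picoseconds (1 s = 10**12 ps).
PS_PER_SEC = 10**12

# Values copied from the digest row shown on the slide.
row = {
    "MIN_TIMER_WAIT": 3914209000,
    "AVG_TIMER_WAIT": 1083919449000,
    "MAX_TIMER_WAIT": 3204044053000,
}


def to_seconds(picoseconds):
    """Convert a performance_schema timer value to seconds."""
    return picoseconds / PS_PER_SEC


def is_candidate(row, max_sec=1.0, ratio=2.0):
    """Apply the same three tests as the WHERE clause on the previous slide."""
    max_ratio = row["MAX_TIMER_WAIT"] / row["AVG_TIMER_WAIT"]
    min_ratio = row["AVG_TIMER_WAIT"] / row["MIN_TIMER_WAIT"]
    return (
        to_seconds(row["MAX_TIMER_WAIT"]) > max_sec
        or max_ratio > ratio
        or min_ratio > ratio
    )


max_ratio = row["MAX_TIMER_WAIT"] / row["AVG_TIMER_WAIT"]
min_ratio = row["AVG_TIMER_WAIT"] / row["MIN_TIMER_WAIT"]
print(f"max_ratio={max_ratio:.4f} min_ratio={min_ratio:.4f}")
print("candidate slow query:", is_candidate(row))
```

Here all three tests fire: the slowest run took more than one second, and both ratios exceed 2.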
  12. Catching slow queries (NEW): PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
     ● Check the actual queries and constants
     ● The events_statements_history table

       select timer_wait/1000000000000 as exec_time, sql_text
       from events_statements_history
       where digest in
         (select digest from events_statements_summary_by_digest
          where max_timer_wait > 1000000000000
             or max_timer_wait / avg_timer_wait > 2
             or avg_timer_wait / min_timer_wait > 2)
       order by timer_wait;
  13. Catching slow queries (NEW): PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]

       exec_time | sql_text
       ----------+---------------------------------------------------------------------
       0.0039    | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
                 | from customer, orders, lineitem
                 | where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10
       0.0438    | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
                 | from customer, orders, lineitem
                 | where o_totalprice > 500000 and c_custkey = o_custkey ... LIMIT 10
       3.2040    | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
                 | from customer, orders, lineitem
                 | where o_totalprice > 400000 and c_custkey = o_custkey ... LIMIT 10

     Observation: orders.o_totalprice > ? is less and less selective
  14. Actions after finding the slow query
     • Bad query plan
       – Rewrite the query
       – Force a good query plan
     • Bad optimizer settings
       – Do tuning
     • Query is inherently complex
       – Don't waste time with it
       – Look for other solutions.
  16. Consider a simple select

       select * from orders
       where
         o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
         o_clerk='Clerk#000009506'

     ● Check the query plan:

       +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
       | id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
       +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
       |  1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 15084733 | Using where |
       +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+

     ● Run the query:

       19 rows in set (7.65 sec)

     • 15M rows were scanned, 19 rows in output
     • Query plan seems inefficient
       – (note: this logic doesn't directly apply to group/order by queries).
  17. Query plan analysis

       select * from orders
       where
         o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
         o_clerk='Clerk#000009506'

       +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
       | id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
       +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
       |  1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 15084733 | Using where |
       +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+

     • Entire table is scanned
     • WHERE condition is checked after records are read
       – Not used to limit #examined rows.
  18. Let's add an index

       alter table orders add key i_o_orderdate (o_orderdate);

       select * from orders
       where
         o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
         o_clerk='Clerk#000009506'

       +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
       |id|select_type|table |type |possible_keys|key          |key_len|ref |rows  |Extra      |
       +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
       |1 |SIMPLE     |orders|range|i_o_orderdate|i_o_orderdate|4      |NULL|306322|Using where|
       +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+

     ● Query time:

       19 rows in set (0.76 sec)

     • Outcome
       – Down to reading 300K rows
       – Still, 300K >> 19 rows.
  19. Finding out which indexes to add

       select * from orders
       where
         o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
         o_clerk='Clerk#000009506'

     Candidates:
     ● index (o_orderdate)
     ● index (o_clerk)

     Check selectivity of the conditions that will use the index:

       select count(*) from orders
       where o_orderDate BETWEEN '1992-06-06' and '1992-07-06';
       -- 306322 rows

       select count(*) from orders where o_clerk='Clerk#000009506';
       -- 1507 rows.
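To compare the two candidate indexes, the counts above can be turned into selectivities. This is a small Python sketch under one assumption: the table size is taken from the `rows` estimate in the full-scan EXPLAIN (15084733), which is itself an estimate. The `selectivity` helper is hypothetical.

```python
# Assumption: table size taken from the EXPLAIN estimate for the full scan.
TABLE_ROWS = 15_084_733

# Row counts reported by the two count(*) queries on the slide.
matching_rows = {
    "o_orderdate range": 306_322,
    "o_clerk equality": 1_507,
}


def selectivity(matching, total):
    """Fraction of the table a condition lets through (lower is better)."""
    return matching / total


for name, count in matching_rows.items():
    print(f"{name}: {selectivity(count, TABLE_ROWS):.5%} of the table")

# The condition with the lowest selectivity is the better single-column index.
best = min(matching_rows, key=lambda name: matching_rows[name])
print("more selective condition:", best)
```

The o_clerk condition passes roughly 200 times fewer rows than the date range, so an index on o_clerk alone would already beat the index on o_orderdate.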
  20. Try adding composite indexes
     ● index (o_clerk, o_orderdate): Bingo! 100% efficiency

       +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
       |id|select_type|table |type |possible_keys|key           |key_len|ref |rows|Extra      |
       +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
       |1 |SIMPLE     |orders|range|i_o_clerk_...|i_o_clerk_date|20     |NULL|19  |Using where|
       +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+

     ● index (o_orderdate, o_clerk): Much worse!

       +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
       |id|select_type|table |type |possible_keys|key           |key_len|ref |rows  |Extra      |
       +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
       |1 |SIMPLE     |orders|range|i_o_date_c...|i_o_date_clerk|20     |NULL|360354|Using where|
       +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+

     • If a condition uses multiple columns, a composite index will be most efficient
     • Order of columns matters
       – Explanation why is outside the scope of this tutorial. Covered in last year's tutorial.
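The slides leave the explanation of column order to an earlier tutorial. As a rough intuition only, an index can be modeled as a sorted list of key tuples. The Python sketch below uses synthetic data (10 clerks × 100 days, an assumption for illustration) and is not a description of how InnoDB stores or scans indexes.

```python
from bisect import bisect_left, bisect_right

# Synthetic data (an assumption for illustration): 10 clerks x 100 days.
rows = [(clerk, day) for clerk in range(10) for day in range(100)]

# Condition modeled: clerk = 3 AND day BETWEEN 10 AND 19
CLERK, DAY_FROM, DAY_TO = 3, 10, 19

# Index on (clerk, day): all matching entries form one contiguous run,
# so the scan touches exactly the rows that are returned.
idx_clerk_day = sorted(rows)
lo = bisect_left(idx_clerk_day, (CLERK, DAY_FROM))
hi = bisect_right(idx_clerk_day, (CLERK, DAY_TO))
scanned_clerk_day = hi - lo

# Index on (day, clerk): only the day range narrows the scan; the clerk
# condition has to be checked entry by entry inside that range.
idx_day_clerk = sorted((day, clerk) for clerk, day in rows)
lo = bisect_left(idx_day_clerk, (DAY_FROM,))
hi = bisect_left(idx_day_clerk, (DAY_TO + 1,))
scanned_day_clerk = hi - lo
matched = sum(1 for day, clerk in idx_day_clerk[lo:hi] if clerk == CLERK)

print("(clerk, day) index scans:", scanned_clerk_day)
print("(day, clerk) index scans:", scanned_day_clerk, "to return", matched)
```

With the equality column first, the scanned range and the result set coincide. With the range column first, the scan covers every clerk in the date range.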
  21. Conditions must be in SARGable form
     • Condition must represent a range
     • It must have a form that is recognized by the optimizer

     Example conditions:

       o_orderDate BETWEEN '1992-06-01' and '1992-06-30'
       year(o_orderDate)=1992 and month(o_orderdate)=6
       TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06')
       o_clerk='Clerk#000009506'
       o_clerk LIKE 'Clerk#000009506'
       o_clerk LIKE '%Clerk#000009506%'
       column IN (1,10,15,21, ...)
       (col1, col2) IN ( (1,1), (2,2), (3,3), …).
  22. New in MySQL-5.6: optimizer_trace
     ● Lets you see the ranges

       set optimizer_trace=1;
       explain select * from orders
       where o_orderDATE between '1992-06-01' and '1992-07-03' and
             o_orderdate not in ('1992-01-01', '1992-06-12', '1992-07-04');
       select * from information_schema.optimizer_trace\G

     ● Will print a big JSON struct
     ● Search for range_scan_alternatives.
  23. New in MySQL-5.6: optimizer_trace

       ...
       "range_scan_alternatives": [
         {
           "index": "i_o_orderdate",
           "ranges": [
             "1992-06-01 <= o_orderDATE < 1992-06-12",
             "1992-06-12 < o_orderDATE <= 1992-07-03"
           ],
           "index_dives_for_eq_ranges": true,
           "rowid_ordered": false,
           "using_mrr": false,
           "index_only": false,
           "rows": 319082,
           "cost": 382900,
           "chosen": true
         },
         {
           "index": "i_o_date_clerk",
           "ranges": [
             "1992-06-01 <= o_orderDATE < 1992-06-12",
             "1992-06-12 < o_orderDATE <= 1992-07-03"
           ],
           "index_dives_for_eq_ranges": true,
           "rowid_ordered": false,
           "using_mrr": false,
           "index_only": false,
           "rows": 406336,
           "cost": 487605,
           "chosen": false,
           "cause": "cost"
         }
       ],
       ...

     ● Considered ranges are shown in the range_scan_alternatives section
     ● This is actually the original use case of optimizer_trace
     ● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug)
     ● Still, very useful.
  24. Source of #rows estimates for range

       select * from orders
       where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'

       +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
       |id|select_type|table |type |possible_keys|key          |key_len|ref |rows  |Extra      |
       +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
       |1 |SIMPLE     |orders|range|i_o_orderdate|i_o_orderdate|4      |NULL|306322|Using where|
       +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+

     • “records_in_range” estimate
     • Done by diving into the index
     • Usually is fairly accurate
     • Not affected by ANALYZE TABLE.
  25. Simple selects: conclusions
     • Efficiency == “#rows_scanned is close to #rows_returned”
     • Indexes and WHERE conditions reduce #rows scanned
     • Index estimates are usually accurate
     • Multi-column indexes
       – “handle” conditions on multiple columns
       – Order of columns in the index matters
     • optimizer_trace lets you view the ranges
       – But misrepresents ranges over multi-column indexes.
  26. Now, will skip some topics
     One can also speed up simple selects with
     ● index_merge access method
     ● index access method
     ● Index Condition Pushdown
     We don't have time for these now; check out last year's tutorial.
  28. A simple join

       select * from customer, orders where c_custkey=o_custkey

     • “Customers with their orders”
  29. Execution: Nested loops join

       select * from customer, orders where c_custkey=o_custkey

       for each customer C {
         for each order O {
           if (C.c_custkey == O.o_custkey)
             produce record(C, O);
         }
       }

     • Complexity:
       – Scans table customer
       – For each record in customer, scans table orders
     • Is this ok?
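The pseudocode above can be made runnable to see how the cost grows. The following Python sketch uses tiny made-up tables (3 customers, 4 orders; the data and the `nested_loop_join` name are assumptions for illustration) and counts how many row combinations are examined.

```python
def nested_loop_join(customers, orders):
    """Return (joined rows, number of row combinations examined)."""
    result = []
    examined = 0
    for c in customers:          # scan table customer
        for o in orders:         # for each customer, scan all of orders
            examined += 1
            if c["c_custkey"] == o["o_custkey"]:
                result.append((c, o))
    return result, examined


# Made-up sample data, not taken from the DBT3 tables.
customers = [{"c_custkey": k} for k in (1, 2, 3)]
orders = [
    {"o_orderkey": 10, "o_custkey": 1},
    {"o_orderkey": 11, "o_custkey": 1},
    {"o_orderkey": 12, "o_custkey": 3},
    {"o_orderkey": 13, "o_custkey": 2},
]

joined, examined = nested_loop_join(customers, orders)
print(len(joined), "rows produced,", examined, "combinations examined")
```

The number of combinations examined is always len(customers) × len(orders), regardless of how few rows the join produces.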
  30. Execution: Nested loops join (2)

       select * from customer, orders where c_custkey=o_custkey

       for each customer C {
         for each order O {
           if (C.c_custkey == O.o_custkey)
             produce record(C, O);
         }
       }

     • EXPLAIN:

       +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
       |id|select_type|table   |type|possible_keys|key |key_len|ref |rows   |Extra      |
       +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
       |1 |SIMPLE     |customer|ALL |NULL         |NULL|NULL   |NULL|148749 |           |
       |1 |SIMPLE     |orders  |ALL |NULL         |NULL|NULL   |NULL|1493631|Using where|
       +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
  31. Execution: Nested loops join (3)

       select * from customer, orders where c_custkey=o_custkey

       for each customer C {
         for each order O {
           if (C.c_custkey == O.o_custkey)
             produce record(C, O);
         }
       }

     • EXPLAIN:

       +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
       |id|select_type|table   |type|possible_keys|key |key_len|ref |rows   |Extra      |
       +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
       |1 |SIMPLE     |customer|ALL |NULL         |NULL|NULL   |NULL|148749 |           |
       |1 |SIMPLE     |orders  |ALL |NULL         |NULL|NULL   |NULL|1493631|Using where|
       +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+

     Annotations:
       – rows=148749: rows to read from customer
       – rows=1493631: rows to read from orders
       – Using where: c_custkey=o_custkey
  32. Execution: Nested loops join (4)

       select * from customer, orders where c_custkey=o_custkey

       +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
       |id|select_type|table   |type|possible_keys|key |key_len|ref |rows   |Extra      |
       +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
       |1 |SIMPLE     |customer|ALL |NULL         |NULL|NULL   |NULL|148749 |           |
       |1 |SIMPLE     |orders  |ALL |NULL         |NULL|NULL   |NULL|1493631|Using where|
       +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+

     • Scan a 1,493,631-row table 148,749 times
       – Consider 1,493,631 * 148,749 row combinations
     • Is this query inherently complex?
       – We know each customer has their own orders
       – size(customer x orders) = size(orders)
       – Lower bound is 1,493,631 + 148,749 + costs to match customer<->order.
  33. Using index for join: ref access

       alter table orders add index i_o_custkey(o_custkey);
       select * from customer, orders where c_custkey=o_custkey
  34. ref access - analysis

       select * from customer, orders where c_custkey=o_custkey

       +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
       |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra|
       +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
       |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |148749|     |
       |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |     |
       +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+

     ● One ref lookup scans 7 rows.
     ● In total: 7 * 148,749 = 1,041,243 rows
       – `orders` has 1.4M rows
       – no redundant reads from `orders`
     ● The whole query plan
       – Reads all customers
       – Reads 1M orders (of 1.4M)
     ● Efficient!
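The effect of ref access can be imitated with a lookup structure keyed on o_custkey. The Python sketch below stands in for the i_o_custkey index with a dictionary (an assumption for illustration; a real B-tree index behaves differently in detail), uses made-up sample rows, and repeats the slide's arithmetic for the estimated number of reads.

```python
from collections import defaultdict


def build_index(orders):
    """Group orders by o_custkey, standing in for the i_o_custkey index."""
    index = defaultdict(list)
    for o in orders:
        index[o["o_custkey"]].append(o)
    return index


def ref_join(customers, index):
    """Join via index lookups; return (joined rows, order rows read)."""
    result = []
    orders_read = 0
    for c in customers:                          # scan table customer
        for o in index.get(c["c_custkey"], []):  # read only matching orders
            orders_read += 1
            result.append((c, o))
    return result, orders_read


# Made-up sample data, not taken from the DBT3 tables.
customers = [{"c_custkey": k} for k in (1, 2, 3)]
orders = [
    {"o_orderkey": 10, "o_custkey": 1},
    {"o_orderkey": 11, "o_custkey": 1},
    {"o_orderkey": 12, "o_custkey": 3},
    {"o_orderkey": 13, "o_custkey": 2},
]

joined, orders_read = ref_join(customers, build_index(orders))
print(len(joined), "rows produced,", orders_read, "order rows read")

# The estimate from the slide: 7 rows per lookup, one lookup per customer.
estimated_reads = 7 * 148_749
print("estimated reads from orders:", estimated_reads)
```

Every order row read ends up in the result, which is what "no redundant reads" means on the slide.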
  35. Conditions that can be used for ref access
     ● Can use equalities
       – tbl.key=other_table.col
       – tbl.key=const
       – tbl.key IS NULL
     ● For multipart keys, will use the largest prefix
       – keypart1=... AND keypart2=... AND keypartK=...
  36. Conditions that can't be used for ref access
     ● Doesn't work for non-equalities
         t1.key BETWEEN t2.col1 AND t2.col2
     ● Doesn't work for OR-ed equalities
         t1.key=t2.col1 OR t1.key=t2.col2
       – Except for ref_or_null
         t1.key=... OR t1.key IS NULL
     ● Doesn't “combine” ref and range access
       – t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col
       – t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col
  37. Is ref always efficient?
     ● Efficient if the column has many different values (good)
       – Best case: unique index (eq_ref)
     ● A few different values: not useful (bad)
     ● Skewed distribution: depends on which part the join touches
  38. ref access estimates - index statistics
     • How many rows will match tbl.key_column = $value for an arbitrary $value?
     • Index statistics

       show keys from orders where key_name='i_o_custkey'\G
       *************************** 1. row ***************
       Table: orders
       Non_unique: 1
       Key_name: i_o_custkey
       Seq_in_index: 1
       Column_name: o_custkey
       Collation: A
       Cardinality: 214462
       Sub_part: NULL
       Packed: NULL
       Null: YES
       Index_type: BTREE

       show table status like 'orders'\G
       *************************** 1. row ****
       Name: orders
       Engine: InnoDB
       Version: 10
       Row_format: Compact
       Rows: 1495152
       Avg_row_length: 133
       Data_length: 199966720
       Max_data_length: 0
       Index_length: 122421248
       Data_free: 6291456
       ...

     average = Rows / Cardinality = 1495152 / 214462 = 6.97.
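The average above is plain arithmetic over the two statistics. This short Python sketch recomputes it from the values shown on the slide; the `avg_rows_per_key` helper is hypothetical, and the exact rounding the server applies before printing `rows` in EXPLAIN is an assumption here.

```python
# Statistics copied from the SHOW TABLE STATUS / SHOW KEYS output above.
TABLE_ROWS = 1_495_152       # Rows
INDEX_CARDINALITY = 214_462  # Cardinality of i_o_custkey


def avg_rows_per_key(rows, cardinality):
    """Table-wide average number of rows sharing one key value."""
    return rows / cardinality


average = avg_rows_per_key(TABLE_ROWS, INDEX_CARDINALITY)
print(f"average rows per o_custkey value: {average:.2f}")

# Assumption: rounding to a whole number gives the per-lookup estimate.
print("rounded estimate:", round(average))
```

The rounded value agrees with the rows=7 that EXPLAIN reports for the ref lookup on orders.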
  39. ref access – conclusions
      ● Based on t.key=... equality conditions
      ● Can make joins very efficient
      ● Relies on index statistics for estimates
  40. Optimizer statistics
      ● MySQL/Percona Server
        – Index statistics
        – Persistent/transient InnoDB stats
      ● MariaDB
        – Index statistics, persistent/transient
          ● Same as Percona Server (via XtraDB)
        – Persistent, engine-independent, index-independent statistics
  41. Index statistics
      ● Cardinality makes it possible to calculate a table-wide average #rows-per-key-prefix
      ● It is a statistical value (inexact)
      ● The exact collection procedure depends on the storage engine
        – InnoDB: random sampling
        – MyISAM: index scan
        – Engine-independent: index scan
  42. Index statistics in MySQL 5.6
      ● Sample [8] random index leaf pages
      ● Table statistics (stored)
        – rows: estimated number of rows in a table
        – Other stats not used by optimizer
      ● Index statistics (stored)
        – fields: #fields in the index
        – rows_per_key: rows per 1 key value, per prefix fields
          ([1 column value], [2 columns value], [3 columns value], …)
        – Other stats not used by optimizer
  43. Index statistics updates
      ● Statistics are updated when:
        – ANALYZE TABLE tbl_name [, tbl_name] … is run
        – SHOW TABLE STATUS or SHOW INDEX is run
        – INFORMATION_SCHEMA.[TABLES|STATISTICS] is accessed
        – A table is opened for the first time (after server restart)
        – A table has changed >10%
        – InnoDB Monitor is turned ON
  44. Displaying optimizer statistics
      ● MySQL 5.5, MariaDB 5.3, and older
        – Issue SQL statements to count rows/keys
        – Indirectly, look at EXPLAIN for simple queries
      ● MariaDB 5.5, Percona Server 5.5 (using XtraDB)
        – information_schema.[innodb_index_stats, innodb_table_stats]
        – Read-only, always visible
      ● MySQL 5.6
        – mysql.[innodb_index_stats, innodb_table_stats]
        – User updateable
        – Only available if innodb_analyze_is_persistent=ON
      ● MariaDB 10.0
        – Persistent updateable tables mysql.[index_stats, column_stats, table_stats]
        – User updateable
        – + current XtraDB mechanisms
  45. Plan [in]stability
      ● Statistics may vary a lot (orders)
      MariaDB [dbt3]> select * from information_schema.innodb_index_stats;
      +------------+-----------------+--------------+      +---------------+
      | table_name | index_name      | rows_per_key |      | rows_per_key  | error (actual)
      +------------+-----------------+--------------+      +---------------+
      | partsupp   | PRIMARY         | 3, 1         |      | 4, 1          | 25%
      | partsupp   | i_ps_partkey    | 3, 0         |  =>  | 4, 1          | 25% (4)
      | partsupp   | i_ps_suppkey    | 64, 0        |      | 91, 1         | 30% (80)
      | orders     | i_o_orderdate   | 9597, 1      |      | 1660956, 0    | 99% (6234)
      | orders     | i_o_custkey     | 15, 1        |      | 15, 0         | 0% (15)
      | lineitem   | i_l_receiptdate | 7425, 1, 1   |      | 6665850, 1, 1 | 99.9% (23477)
      +------------+-----------------+--------------+      +---------------+
      MariaDB [dbt3]> select * from information_schema.innodb_table_stats;
      +-----------------+----------+       +----------+
      | table_name      | rows     |       | rows     |
      +-----------------+----------+       +----------+
      | partsupp        | 6524766  |       | 9101065  | 28% (8000000)
      | orders          | 15039855 |  ==>  | 14948612 | 0.6% (15000000)
      | lineitem        | 60062904 |       | 59992655 | 0.1% (59986052)
      +-----------------+----------+       +----------+
  46. Controlling statistics (MySQL 5.6)
      ● Persistent and user-updateable InnoDB statistics
        – innodb_analyze_is_persistent = ON,
        – updated manually by ANALYZE TABLE or
        – automatically by innodb_stats_auto_recalc = ON
      ● Control the precision of sampling [default 8]
        – innodb_stats_persistent_sample_pages,
        – innodb_stats_transient_sample_pages
      ● No new statistics compared to older versions
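A minimal sketch of these controls on a MySQL 5.6 GA server. One assumption to flag: from 5.6.6 on, the switch is called innodb_stats_persistent (innodb_analyze_is_persistent was the earlier name), so the example uses the newer name. The sample-page value of 64 is arbitrary, and 'dbt3' is the schema name seen elsewhere in the slides.

```sql
-- Keep InnoDB statistics across restarts (MySQL 5.6.6+ variable name).
SET GLOBAL innodb_stats_persistent = ON;

-- Sample more index pages per ANALYZE; 64 is an arbitrary illustrative value.
SET GLOBAL innodb_stats_persistent_sample_pages = 64;

-- Recollect statistics for one table.
ANALYZE TABLE orders;

-- Inspect what was stored.
SELECT index_name, stat_name, stat_value, sample_size
FROM mysql.innodb_index_stats
WHERE database_name = 'dbt3' AND table_name = 'orders';
```

A larger sample makes ANALYZE TABLE slower but should reduce the run-to-run variation in rows_per_key.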
  47. Controlling statistics (MariaDB 10.0)
      Current XtraDB index statistics
      +
      ● Engine-independent, persistent, user-updateable statistics
      ● Precise
      ● Additional statistics per column (even when there is no index):
        – min_value, max_value: minimum/maximum value per column
        – nulls_ratio: fraction of null values in a column
        – avg_length: average size of values in a column
        – avg_frequency: average number of rows with the same value
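On MariaDB 10.0 these engine-independent statistics can be collected and inspected roughly as follows. This is a sketch, assuming the dbt3 schema from the slides; collection does a full scan, so it can take a while on large tables.

```sql
-- Tell the optimizer to prefer the engine-independent statistics tables.
SET SESSION use_stat_tables = 'preferably';

-- Collect table, column and index statistics into mysql.*_stats.
ANALYZE TABLE orders PERSISTENT FOR ALL;

-- Per-column statistics, available even for non-indexed columns.
SELECT column_name, min_value, max_value,
       nulls_ratio, avg_length, avg_frequency
FROM mysql.column_stats
WHERE db_name = 'dbt3' AND table_name = 'orders';
```

Because the statistics live in ordinary tables, they can also be edited by hand to test how the optimizer reacts to different values.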
  48. Join condition pushdown
  52. Join condition pushdown
      select *
      from customer, orders
      where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
      |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
      |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
      |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
      ● Conjunctive (ANDed) conditions are split into parts
      ● Each part is attached as early as possible
        – Either as “Using where”
        – Or as table access method
  53. Observing join condition pushdown
      EXPLAIN: {
        "query_block": {
          "select_id": 1,
          "nested_loop": [
            {
              "table": {
                "table_name": "orders",
                "access_type": "ALL",
                "possible_keys": ["i_o_custkey"],
                "rows": 1499715,
                "filtered": 100,
                "attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))"
              }
            },
            {
              "table": {
                "table_name": "customer",
                "access_type": "eq_ref",
                "possible_keys": ["PRIMARY"],
                "key": "PRIMARY",
                "used_key_parts": ["c_custkey"],
                "key_length": "4",
                "ref": ["dbt3sf1.orders.o_custkey"],
                "rows": 1,
                "filtered": 100,
                "attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` < <cache>(-(500)))"
              }
            }
          ]
        }
      }
      ● Before mysql-5.6: EXPLAIN shows only “Using where”
        – The condition itself is only visible in the debug trace
      ● Starting from 5.6: EXPLAIN FORMAT=JSON shows attached conditions
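Output of this kind is produced by prefixing the query with EXPLAIN FORMAT=JSON. The sketch below applies it to the running example and assumes a MySQL 5.6 or later server with the DBT3 tables loaded.

```sql
-- Look for "attached_condition" under each table in the JSON output.
EXPLAIN FORMAT=JSON
SELECT *
FROM customer, orders
WHERE c_custkey = o_custkey
  AND c_acctbal < -500
  AND o_orderpriority = '1-URGENT';
```

The join order and access types in the result depend on the indexes and statistics present, so they need not match the slide exactly.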
  54. Reasoning about join plan efficiency
      select *
      from customer, orders
      where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
      |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
      |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
      |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
      First table, “customer”
      ● type=ALL, 150 K rows
      ● select count(*) from customer where c_acctbal < -500 gives 6804.
      ● alter table customer add index (c_acctbal)
  55. Reasoning about join plan efficiency
      select *
      from customer, orders
      where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
      First table, “customer”
      ● type=ALL, 150 K rows
      ● select count(*) from customer where c_acctbal < -500 gives 6804.
      ● alter table customer add index (c_acctbal)
      Before adding the index:
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
      |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
      |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
      |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
      After adding the index:
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
      |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      Now, access to customer is efficient.
  56. Reasoning about join plan efficiency
      select *
      from customer, orders
      where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
      Second table, “orders”
      ● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
      ● ref access uses only c_custkey=o_custkey
      ● What about o_orderpriority='1-URGENT'?
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
      |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
  57. o_orderpriority='1-URGENT'
      ● select count(*) from orders: 1.5M rows
      ● select count(*) from orders where o_orderpriority='1-URGENT': 300K rows
      ● 300K / 1.5M = 0.2
  58. Reasoning about join plan efficiency
      select *
      from customer, orders
      where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
      Second table, “orders”
      ● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
      ● ref access uses only c_custkey=o_custkey
      ● What about o_orderpriority='1-URGENT'? Selectivity = 0.2
        – Can examine 7*0.2=1.4 rows, 6802 times if we add an index:
            alter table orders add index (o_custkey, o_orderpriority)
          or
            alter table orders add index (o_orderpriority, o_custkey)
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
      |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
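The effect of the proposed index can be estimated with the numbers from the slides. In this sketch the index name i_o_custkey_priority is hypothetical, and the figures are optimizer-style estimates rather than measurements.

```sql
-- First option from the slide: extend the ref key with the second equality.
ALTER TABLE orders ADD INDEX i_o_custkey_priority (o_custkey, o_orderpriority);

-- Re-check the plan; ideally `orders` now uses both key parts for ref access.
EXPLAIN
SELECT *
FROM customer, orders
WHERE c_custkey = o_custkey
  AND c_acctbal < -500
  AND o_orderpriority = '1-URGENT';

-- Back-of-the-envelope row counts for `orders`:
--   before: 6802 lookups * 7 rows       = 47614 rows examined
--   after:  6802 lookups * 7 * 0.2 rows = 9522.8 rows examined
SELECT 6802 * 7       AS rows_examined_before,
       6802 * 7 * 0.2 AS rows_examined_after;
```

The estimate assumes order priority is distributed independently of the customer, which real data may not satisfy.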
  59. Reasoning about join plan efficiency - summary
      Basic* approach to evaluation of join plan efficiency:
      for each table $T in the join order {
        Look at conditions attached to table $T
          (condition must use table $T, may also use previous tables)
        Does the access method used with $T make good use of the attached conditions?
      }
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
      |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      * some other details may also affect join performance
  60. Attached conditions
  61. Attached conditions
      ● Ideally, should be used for table access
      ● Not all conditions can be used [at the same time]
        – Unused ones are still useful
        – They reduce the number of scans for subsequent tables
      select *
      from customer, orders
      where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
      |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
      |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
      |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
  62. Informing optimizer about attached conditions
      Currently: a range access that's too expensive to use
      explain extended
      select *
      from customer, orders
      where c_custkey=o_custkey and c_acctbal > 8000 and o_orderpriority='1-URGENT';
      +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
      |id|select_type|table   |type|possible_keys    |key        |key_len|ref               |rows  |filtered|Extra      |
      +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
      |1 |SIMPLE     |customer|ALL |PRIMARY,c_acctbal|NULL       |NULL   |NULL              |150081|   36.22|Using where|
      |1 |SIMPLE     |orders  |ref |i_o_custkey      |i_o_custkey|5      |customer.c_custkey|7     |  100.00|Using where|
      +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
      ● `orders` will be scanned 150081 * 36.22% = 54359 times
      ● This reduces the cost of the join
        – Has an effect when comparing potential join plans
      ● => Index c_acctbal is not used for access, but it may still help the optimizer
  63. Attached condition selectivity
      ● Unused indexes provide info about selectivity
        – Works, but very expensive
      ● MariaDB 10.0 has engine-independent statistics
        – Index statistics
        – Non-indexed column statistics
          ● Histograms
        – Further info: Igor Babaev, “Engine-independent persistent statistics with histograms in MariaDB”, tomorrow, 2:20 pm @ Ballroom D
  64. How to check if the query plan matches the reality
  65. Check if query plan is realistic
      ● EXPLAIN shows what the optimizer expects. It may be wrong
        – Out-of-date index statistics
        – Non-uniform data distribution
      ● Other DBMS: EXPLAIN ANALYZE
      ● MySQL: no equivalent. Instead, have
        – Handler counters
        – “User statistics” (Percona, MariaDB)
        – PERFORMANCE_SCHEMA
  66. Join analysis: example query (Q18, DBT3)
      <reset counters>
      select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
      from customer, orders, lineitem
      where o_totalprice > 500000
        and c_custkey = o_custkey
        and o_orderkey = l_orderkey
      group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
      order by o_totalprice desc, o_orderdate
      LIMIT 10;
      <collect statistics>
  67. Join analysis: handler counters (old)
      FLUSH STATUS;
      => RUN QUERY
      SHOW STATUS LIKE "Handler%";
      +----------------------------+-------+
      | Handler_mrr_key_refills    | 0     |
      | Handler_mrr_rowid_refills  | 0     |
      | Handler_read_first         | 0     |
      | Handler_read_key           | 1646  |
      | Handler_read_last          | 0     |
      | Handler_read_next          | 1462  |
      | Handler_read_prev          | 0     |
      | Handler_read_rnd           | 10    |
      | Handler_read_rnd_deleted   | 0     |
      | Handler_read_rnd_next      | 184   |
      | Handler_tmp_update         | 1096  |
      | Handler_tmp_write          | 183   |
      | Handler_update             | 0     |
      | Handler_write              | 0     |
      +----------------------------+-------+
  68. Join analysis: USERSTAT by Facebook
      MariaDB, Percona Server
      SET GLOBAL USERSTAT=1;
      FLUSH TABLE_STATISTICS;
      FLUSH INDEX_STATISTICS;
      => RUN QUERY
      SHOW TABLE_STATISTICS;
      +--------------+------------+-----------+--------------+-------------------------+
      | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
      +--------------+------------+-----------+--------------+-------------------------+
      | dbt3         | orders     |       183 |            0 |                       0 |
      | dbt3         | lineitem   |      1279 |            0 |                       0 |
      | dbt3         | customer   |       183 |            0 |                       0 |
      +--------------+------------+-----------+--------------+-------------------------+
      SHOW INDEX_STATISTICS;
      +--------------+------------+-----------------------+-----------+
      | Table_schema | Table_name | Index_name            | Rows_read |
      +--------------+------------+-----------------------+-----------+
      | dbt3         | customer   | PRIMARY               |       183 |
      | dbt3         | lineitem   | i_l_orderkey_quantity |      1279 |
      | dbt3         | orders     | i_o_totalprice        |       183 |
      +--------------+------------+-----------------------+-----------+
  69. Join analysis: PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
      ● Summary tables with read/write statistics
        – table_io_waits_summary_by_table
        – table_io_waits_summary_by_index_usage
      ● Superset of the userstat tables
      ● More overhead
      ● Not possible to associate statistics with a query
        => truncate stats tables before running a query
      ● Possible bug
        – performance schema not ignored
        – Disable by
            UPDATE setup_consumers SET ENABLED = 'NO'
            where name = 'global_instrumentation';
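The truncate-then-measure workflow can be sketched as below, assuming performance_schema is enabled, the schema is named dbt3 as in the slides, and no other sessions touch these tables during the measurement (the counters are server-wide).

```sql
-- Reset the counters; TRUNCATE zeroes the summary columns.
TRUNCATE TABLE performance_schema.table_io_waits_summary_by_table;
TRUNCATE TABLE performance_schema.table_io_waits_summary_by_index_usage;

-- => run the query under test here

-- See which tables and indexes were read, and how many rows.
SELECT object_name, index_name, count_read
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE object_schema = 'dbt3'
  AND count_star > 0;
```

Since the counters cannot be tied to a single statement, this works best on a quiet test server.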
  70. Analyze joins via PERFORMANCE SCHEMA: SHOW TABLE_STATISTICS analogue
      select object_schema, object_name, count_read, count_write,
             sum_timer_read, sum_timer_write, ...
      from table_io_waits_summary_by_table
      where object_schema = 'dbt3' and count_star > 0;
      +---------------+-------------+------------+-------------+
      | object_schema | object_name | count_read | count_write |
      +---------------+-------------+------------+-------------+
      | dbt3          | customer    |        183 |           0 |
      | dbt3          | lineitem    |       1462 |           0 |
      | dbt3          | orders      |        184 |           0 |
      +---------------+-------------+------------+-------------+
      +----------------+-----------------+
      | sum_timer_read | sum_timer_write | ...
      +----------------+-----------------+
      |     8326528406 |               0 |
      |    12117332778 |               0 |
      |     7946312812 |               0 |
      +----------------+-----------------+
  71. Analyze joins via PERFORMANCE SCHEMA: SHOW INDEX_STATISTICS analogue
      select object_schema, object_name, index_name, count_read,
             sum_timer_read, sum_timer_write, ...
      from table_io_waits_summary_by_index_usage
      where object_schema = 'dbt3' and count_star > 0
        and index_name is not null;
      +---------------+-------------+-----------------------+------------+
      | object_schema | object_name | index_name            | count_read |
      +---------------+-------------+-----------------------+------------+
      | dbt3          | customer    | PRIMARY               |        183 |
      | dbt3          | lineitem    | i_l_orderkey_quantity |       1462 |
      | dbt3          | orders      | i_o_totalprice        |        184 |
      +---------------+-------------+-----------------------+------------+
      +----------------+-----------------+
      | sum_timer_read | sum_timer_write | ...
      +----------------+-----------------+
      |     8326528406 |               0 |
      |    12117332778 |               0 |
      |     7946312812 |               0 |
      +----------------+-----------------+
  73. Batched joins
      ● Optimization for analytical queries
      ● Analytic queries shovel through lots of data
        – e.g. “average size of order in the last month”
        – or “pairs of goods purchased together”
      ● Indexes, etc. won't help when you really need to look at all data
      ● More data means a greater chance of being I/O-bound
      ● Solution: batched joins
  80. Batched Key Access Idea
      ● Non-BKA join hits data at random
      ● Caches are not used efficiently
      ● Prefetching is not useful
  81. Batched Key Access Idea
      ● BKA implementation accesses data in order
      ● Takes advantage of caches and prefetching
  82. Batched Key Access effect
      set join_cache_level=6;
      select max(l_extendedprice)
      from orders, lineitem
      where l_orderkey=o_orderkey and
            o_orderdate between $DATE1 and $DATE2
      The benchmark was run with
      ● Various BKA buffer sizes
      ● Various sizes of the $DATE1...$DATE2 range
  83. Batched Key Access Performance
      [Chart: BKA join performance depending on buffer size. X axis: buffer size, bytes. Y axis: query time, sec. Series: query_size=1, 2, 3, each regular and BKA. Annotations: performance without BKA; performance with BKA, given sufficient buffer size.]
      ● 4x-10x speedup
      ● The more the data, the bigger the speedup
      ● Buffer size setting is very important
  84. Batched Key Access settings
      ● Needs to be turned on
          set join_buffer_size=32*1024*1024;
          set join_cache_level=6;                        -- MariaDB
          set optimizer_switch='batched_key_access=on';  -- MySQL 5.6
          set optimizer_switch='mrr=on';
          set optimizer_switch='mrr_sort_keys=on';       -- MariaDB only
      ● Further join_buffer_size tuning is done by watching
        – Query performance
        – Handler_mrr_init counter
        and increasing join_buffer_size until either saturates
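One tuning iteration on MariaDB might look like the sketch below. The 32 MB starting value is the one from the slide; doubling it per iteration is only a suggestion, and the query under test is left as a placeholder.

```sql
-- MariaDB session settings for BKA.
SET SESSION join_buffer_size = 32*1024*1024;
SET SESSION join_cache_level = 6;
SET SESSION optimizer_switch = 'mrr=on,mrr_sort_keys=on';

-- Reset session counters, run the join, then read the MRR counter.
FLUSH STATUS;
-- => run the big join here and note its execution time
SHOW SESSION STATUS LIKE 'Handler_mrr_init';

-- Repeat with a larger buffer and compare time and counter.
SET SESSION join_buffer_size = 64*1024*1024;
```

On MySQL 5.6 the manual additionally recommends mrr_cost_based=off alongside mrr=on and batched_key_access=on for BKA to be chosen.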
  85. Batched Key Access - conclusions
      ● Targeted at big joins
      ● Needs to be enabled manually
      ● @@join_buffer_size is the most important setting
      ● MariaDB's implementation is a superset of MySQL's
  87. ORDER BY, GROUP BY, aggregates
  88. Aggregate functions, no GROUP BY
      ● COUNT, SUM, AVG, etc. need to examine all rows
          select SUM(column) from tbl needs to examine the whole tbl.
      ● MIN and MAX can use an index for lookup
      +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
      |id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra                       |
      +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
      |1 |SIMPLE     |NULL |NULL|NULL         |NULL|NULL   |NULL|NULL|Select tables optimized away|
      +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
      index (o_orderdate)
        select max(o_orderdate) from orders
        select min(o_orderdate) from orders where o_orderdate > '1995-05-01'
      index (o_orderpriority, o_orderdate)
        select max(o_orderdate) from orders where o_orderpriority='1-URGENT'
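The two-column case can be tried out as follows. The index name i_o_priority_date is hypothetical; if the optimization applies, EXPLAIN should report "Select tables optimized away" in Extra, as in the table above.

```sql
-- Equality on the first key part, MAX over the second.
ALTER TABLE orders ADD INDEX i_o_priority_date (o_orderpriority, o_orderdate);

EXPLAIN
SELECT max(o_orderdate)
FROM orders
WHERE o_orderpriority = '1-URGENT';

-- By contrast, SUM has to visit every matching row.
EXPLAIN
SELECT sum(o_totalprice)
FROM orders
WHERE o_orderpriority = '1-URGENT';
```

The first query is answered by a single lookup at the end of the '1-URGENT' range in the index, so the table is never scanned.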
  89. ORDER BY … LIMIT
      Three algorithms
      ● Use an index to read in order
      ● Read one table, sort, join: “Using filesort”
      ● Execute join into temporary table and then sort: “Using temporary; Using filesort”
  90. Using index to read data in order
      ● No special indication in EXPLAIN output
      ● LIMIT n: as soon as we read n records, we can stop!
  91. A problem with LIMIT N optimization
      `orders` has 1.5 M rows
      explain select * from orders order by o_orderdate desc limit 10;
      +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
      |id|select_type|table |type |possible_keys|key          |key_len|ref |rows|Extra|
      +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
      |1 |SIMPLE     |orders|index|NULL         |i_o_orderdate|4      |NULL|10  |     |
      +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
      select * from orders where o_orderpriority='1-URGENT' order by o_orderdate desc limit 10;
      +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
      |id|select_type|table |type |possible_keys|key          |key_len|ref |rows|Extra      |
      +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
      |1 |SIMPLE     |orders|index|NULL         |i_o_orderdate|4      |NULL|10  |Using where|
      +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
      ● A problem:
        – 1.5M rows, 300K of them URGENT
        – Scanning by date, when will we find 10 URGENT rows?
        – No good solution so far
  92. Using filesort strategy
      ● Have to read the entire first table
      ● For the remaining tables, can apply LIMIT n
      ● ORDER BY can only use columns of tbl1
  93. Using temporary; Using filesort
      ● ORDER BY clause can use columns of any table
      ● LIMIT is applied only after executing the entire join and sorting
  94. ORDER BY - conclusions
      ● Resolving ORDER BY with an index allows very efficient handling of LIMIT
        – Optimization for
            WHERE unused_condition ORDER BY … LIMIT n
          is challenging
          ● Use sql_big_result, IGNORE INDEX FOR ORDER BY
      ● Using filesort
        – Needs all ORDER BY columns in the first table
        – Takes advantage of LIMIT when doing the join to non-first tables
      ● Using temporary; Using filesort is least efficient
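The IGNORE INDEX FOR ORDER BY hint can be sketched on the earlier LIMIT example, assuming the index is named i_o_orderdate as in that EXPLAIN output. Whether the resulting filesort plan is actually faster depends on how the matching rows are distributed by date.

```sql
-- Stop the optimizer from walking i_o_orderdate to satisfy ORDER BY ... LIMIT.
EXPLAIN
SELECT *
FROM orders IGNORE INDEX FOR ORDER BY (i_o_orderdate)
WHERE o_orderpriority = '1-URGENT'
ORDER BY o_orderdate DESC
LIMIT 10;
```

Comparing this plan and its run time with the unhinted query shows which strategy suits the data at hand.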
  95. GROUP BY strategies
      There are three strategies
      ● Ordered index scan
      ● Loose Index Scan (LooseScan)
      ● Groups table (Using temporary; [Using filesort])
  96. Ordered index scan
      ● Groups are enumerated one after another
      ● Can compute aggregates on the fly
      ● Loose index scan is also able to jump to the next group
  97. Execution of GROUP BY with temp table
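The three GROUP BY strategies can usually be told apart in EXPLAIN. In the sketch below the index name i_o_custkey_date is hypothetical, and the plans named in the comments are what one would typically expect rather than guaranteed outcomes, since the optimizer chooses by cost.

```sql
ALTER TABLE orders ADD INDEX i_o_custkey_date (o_custkey, o_orderdate);

-- MIN/MAX on the key part right after the GROUP BY column:
-- candidate for loose index scan ("Using index for group-by" in Extra).
EXPLAIN
SELECT o_custkey, max(o_orderdate)
FROM orders
GROUP BY o_custkey;

-- SUM needs every row of each group:
-- candidate for an ordered index scan on an o_custkey index.
EXPLAIN
SELECT o_custkey, sum(o_totalprice)
FROM orders
GROUP BY o_custkey;

-- Grouping on a non-indexed column:
-- candidate for the groups table ("Using temporary; Using filesort").
EXPLAIN
SELECT o_orderpriority, count(*)
FROM orders
GROUP BY o_orderpriority;
```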
  98. Subqueries
  99. Subquery optimizations
      ● Before MariaDB 5.3/MySQL 5.6: “don't use subqueries”
      ● Queries that caused most of the pain
        – SELECT … FROM tbl WHERE col IN (SELECT …): semi-joins
        – SELECT … FROM (SELECT …): derived tables
      ● MariaDB 5.3 and MySQL 5.6
        – Have common inheritance, MySQL 6.0 alpha
        – Huge (100x, 1000x) speedups for painful areas
        – Other kinds of subqueries received a speedup, too
        – MariaDB 5.3/5.5 has a superset of MySQL 5.6's optimizations
          ● 5.6 handles some un-handled edge cases, too
  100. Tuning for subqueries
      ● “Before”: one execution strategy
        – No tuning possible
      ● “After”: similar to joins
        – Reasonable execution strategies supported
        – Need indexes
        – Need selective conditions
        – Support batching in most important cases
      ● Should be better 9x% of the time
  101. What if it still picks a poor query plan?
      For both MariaDB and MySQL:
      ● Check EXPLAIN [EXTENDED], find a keyword around a subquery table
      ● Google “site:kb.askmonty.org $subquery_keyword”
        or https://kb.askmonty.org/en/subquery-optimizations-map/
      ● Find which optimization it was
      ● set optimizer_switch='$subquery_optimization=off'
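A sketch of the last step, using semijoin as the example flag. Which flag to turn off depends on the keyword found in EXPLAIN, and the subquery shown is only an illustration on the DBT3 tables.

```sql
-- See which optimizations are currently enabled.
SELECT @@optimizer_switch;

-- Disable one subquery optimization for this session only.
SET SESSION optimizer_switch = 'semijoin=off';

-- Re-check the plan of the problematic query.
EXPLAIN
SELECT *
FROM customer
WHERE c_custkey IN (SELECT o_custkey
                    FROM orders
                    WHERE o_orderpriority = '1-URGENT');

-- Restore the flag afterwards.
SET SESSION optimizer_switch = 'semijoin=on';
```

Changing the switch at session level keeps the experiment from affecting other connections.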
  102. Thanks! Q & A