MySQL/MariaDB query optimizer tuning tutorial from Percona Live 2013
 


    • Advanced query optimizer tuning and analysis
      Sergei Petrunia, Timour Katchaounov
      Monty Program Ab
      MySQL Conference And Expo 2013
    • 2 07:48:08 AM
      ● Introduction
        – What is an optimizer problem
        – How to catch it
          ● old and new tools
      ● Single-table selects
        – brief recap from 2012
      ● JOINs
        – ref access
          ● index statistics
        – join condition pushdown
        – join plan efficiency
        – query plan vs reality
      ● Big I/O-bound JOINs
        – Batched Key Access
      ● Aggregate functions
      ● ORDER BY ... LIMIT
      ● GROUP BY
      ● Subqueries
    • 3 Is there a problem with the query optimizer?
      • Database performance is affected by many factors
      • One of them is the query optimizer
      • Is my performance problem caused by the optimizer?
    • 4 Signs that there is a query optimizer problem
      • Some (not all) queries are slow
      • A query seems to run longer than it ought to
        – and examines more records than it ought to
      • Usually, the query remains slow regardless of other activity on the server
    • 5 Catching slow queries, the old ways
      ● Watch the slow query log
        – Percona Server/MariaDB: --log_slow_verbosity=query_plan
          # Thread_id: 1  Schema: dbt3sf10  QC_hit: No
          # Query_time: 2.452373  Lock_time: 0.000113  Rows_sent: 0  Rows_examined: 1500000
          # Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
          # Filesort: No  Filesort_on_disk: No  Merge_passes: 0
          SET timestamp=1333385770;
          select * from customer where c_acctbal < -1000;
        – Run pt-query-digest on the log
      ● Run SHOW PROCESSLIST periodically
    • 6 The new way: SHOW PROCESSLIST + SHOW EXPLAIN
      • Available in MariaDB 10.0+
      • Displays the EXPLAIN of a running statement
          MariaDB> show processlist;
          +----+------+-----------+---------+---------+------+--------------+------------------------------...
          | Id | User | Host      | db      | Command | Time | State        | Info
          +----+------+-----------+---------+---------+------+--------------+------------------------------...
          |  1 | root | localhost | dbt3sf1 | Query   |   10 | Sending data | select max(o_totalprice) ...
          |  2 | root | localhost | dbt3sf1 | Query   |    0 | init         | show processlist
          +----+------+-----------+---------+---------+------+--------------+------------------------------...

          MariaDB> show explain for 1;
          +----+-------------+--------+------+---------------+------+---------+------+---------+-------------+
          | id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
          +----+-------------+--------+------+---------------+------+---------+------+---------+-------------+
          |  1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 1498194 | Using where |
          +----+-------------+--------+------+---------------+------+---------+------+---------+-------------+

          MariaDB [dbt3sf1]> show warnings;
          +-------+------+-------------------------------------------------------------------+
          | Level | Code | Message                                                           |
          +-------+------+-------------------------------------------------------------------+
          | Note  | 1003 | select max(o_totalprice) from orders where year(o_orderDATE)=1995 |
          +-------+------+-------------------------------------------------------------------+
    • 7 SHOW EXPLAIN usage
      ● Intended usage
        – SHOW PROCESSLIST ...
        – SHOW EXPLAIN FOR ...
      ● Why not just run EXPLAIN again?
        – Setups are difficult to replicate:
          ● temporary tables
          ● optimizer settings
          ● storage engine index statistics
          ● ...
        – No uncertainty about whether you're looking at the same query plan or not.
    • 8 Catching slow queries (NEW): PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
      ● use performance_schema
      ● Many ways to analyze via queries
        – events_statements_summary_by_digest
          ● count_star, sum_timer_wait, min_timer_wait, avg_timer_wait, max_timer_wait
          ● digest_text, digest
          ● sum_rows_examined, sum_created_tmp_disk_tables, sum_select_full_join
        – events_statements_history
          ● sql_text, digest_text, digest
          ● timer_start, timer_end, timer_wait
          ● rows_examined, created_tmp_disk_tables, select_full_join
    • 9 Catching slow queries (NEW): PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
      • Modified Q18 from DBT3
          select c_name, c_custkey, o_orderkey, o_orderdate,
                 o_totalprice, sum(l_quantity)
          from customer, orders, lineitem
          where
            o_totalprice > ?
            and c_custkey = o_custkey
            and o_orderkey = l_orderkey
          group by c_name, c_custkey, o_orderkey,
                   o_orderdate, o_totalprice
          order by o_totalprice desc, o_orderdate
          LIMIT 10;
      • The app executes Q18 many times with ? = 550000, 500000, 400000, ...
    • 10 Catching slow queries (NEW): PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
      ● Find candidate slow queries
      ● Simple tests: select_full_join > 0, created_tmp_disk_tables > 0, etc.
      ● Complex conditions: max execution time > X sec, OR min/max time vary a lot
        (timer columns are in picoseconds, so 1000000000000 = 1 second):
          select max_timer_wait/avg_timer_wait as max_ratio,
                 avg_timer_wait/min_timer_wait as min_ratio
          from events_statements_summary_by_digest
          where max_timer_wait > 1000000000000
             or max_timer_wait / avg_timer_wait > 2
             or avg_timer_wait / min_timer_wait > 2\G
    • 11 Catching slow queries (NEW): PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
          *************************** 5. row ***************************
          DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b
          DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` ,
            `o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem` WHERE
            `o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey` GROUP BY
            `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` ORDER BY `o_totalprice`
            DESC , `o_orderdate` LIMIT ?
          COUNT_STAR: 3
          SUM_TIMER_WAIT: 3251758347000
          MIN_TIMER_WAIT: 3914209000       → 0.0039 sec
          AVG_TIMER_WAIT: 1083919449000
          MAX_TIMER_WAIT: 3204044053000    → 3.2 sec
          SUM_LOCK_TIME: 555000000
          SUM_ROWS_SENT: 25
          SUM_ROWS_EXAMINED: 0
          SUM_CREATED_TMP_DISK_TABLES: 0
          SUM_CREATED_TMP_TABLES: 3
          SUM_SELECT_FULL_JOIN: 0
          SUM_SELECT_RANGE: 3
          SUM_SELECT_SCAN: 0
          SUM_SORT_RANGE: 0
          SUM_SORT_ROWS: 25
          SUM_SORT_SCAN: 3
          SUM_NO_INDEX_USED: 0
          SUM_NO_GOOD_INDEX_USED: 0
          FIRST_SEEN: 1970-01-01 03:38:27
          LAST_SEEN: 1970-01-01 03:38:43
          max_ratio: 2.9560
          min_ratio: 276.9192
      High variance of execution time
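The filter from the previous slide can be replayed outside the server. The Python sketch below copies the timer values of the digest row above, converts picoseconds to seconds, and applies the same three tests (max time over 1 s, max/avg over 2, avg/min over 2). The function names `to_seconds` and `is_suspicious` are hypothetical helpers for illustration, not part of any MySQL tool.

```python
# Timer values copied from the events_statements_summary_by_digest row above.
# performance_schema timers are in picoseconds: 1 second = 10**12 ps.
PS_PER_SEC = 10**12

digest = {
    "MIN_TIMER_WAIT": 3914209000,
    "AVG_TIMER_WAIT": 1083919449000,
    "MAX_TIMER_WAIT": 3204044053000,
}

def to_seconds(ps: int) -> float:
    """Convert a performance_schema timer value to seconds."""
    return ps / PS_PER_SEC

def is_suspicious(row: dict, max_sec: float = 1.0, ratio: float = 2.0) -> bool:
    """Same three tests as the WHERE clause of the digest query."""
    mn, avg, mx = row["MIN_TIMER_WAIT"], row["AVG_TIMER_WAIT"], row["MAX_TIMER_WAIT"]
    return mx > max_sec * PS_PER_SEC or mx / avg > ratio or avg / mn > ratio

max_ratio = digest["MAX_TIMER_WAIT"] / digest["AVG_TIMER_WAIT"]
min_ratio = digest["AVG_TIMER_WAIT"] / digest["MIN_TIMER_WAIT"]

print(f"fastest: {to_seconds(digest['MIN_TIMER_WAIT']):.4f} s, "
      f"slowest: {to_seconds(digest['MAX_TIMER_WAIT']):.1f} s")
print(f"max_ratio={max_ratio:.4f} min_ratio={min_ratio:.4f} "
      f"suspicious={is_suspicious(digest)}")
```

The huge min_ratio, rather than the max time alone, is what points at a plan whose cost depends on the constant.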
    • 12 Catching slow queries (NEW): PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
      ● Check the actual queries and constants
      ● The events_statements_history table
          select timer_wait/1000000000000 as exec_time, sql_text
          from events_statements_history
          where digest in
            (select digest from events_statements_summary_by_digest
             where max_timer_wait > 1000000000000
                or max_timer_wait / avg_timer_wait > 2
                or avg_timer_wait / min_timer_wait > 2)
          order by timer_wait;
    • 13 Catching slow queries (NEW): PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
          +-----------+---------------------------------------------------------------------------------
          | exec_time | sql_text
          +-----------+---------------------------------------------------------------------------------
          |    0.0039 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
          |           | from customer, orders, lineitem
          |           | where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10
          |    0.0438 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
          |           | from customer, orders, lineitem
          |           | where o_totalprice > 500000 and c_custkey = o_custkey ... LIMIT 10
          |    3.2040 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
          |           | from customer, orders, lineitem
          |           | where o_totalprice > 400000 and c_custkey = o_custkey ... LIMIT 10
          +-----------+---------------------------------------------------------------------------------
      Observation: as the constant decreases, orders.o_totalprice > ? becomes less and less selective.
    • 14 Actions after finding the slow query
      • Bad query plan
        – Rewrite the query
        – Force a good query plan
      • Bad optimizer settings
        – Do tuning
      • Query is inherently complex
        – Don't waste time with it
        – Look for other solutions.
    • 15
      ● Introduction
        – What is an optimizer problem
        – How to catch it
          ● old and new tools
      ● Single-table selects
        – brief recap from 2012
      ● JOINs
        – ref access
          ● index statistics
        – join condition pushdown
        – join plan efficiency
        – query plan vs reality
      ● Big I/O-bound JOINs
        – Batched Key Access
      ● Aggregate functions
      ● ORDER BY ... LIMIT
      ● GROUP BY
      ● Subqueries
    • 16 Consider a simple select
          select * from orders
          where
            o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
            o_clerk='Clerk#000009506'
      ● Check the query plan:
          +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
          | id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
          +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
          |  1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 15084733 | Using where |
          +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
      ● Run the query:
          19 rows in set (7.65 sec)
      • 15M rows were scanned, 19 rows in output
      • The query plan seems inefficient
        – (note: this logic doesn't directly apply to GROUP BY/ORDER BY queries).
    • 17 Query plan analysis
          select * from orders
          where
            o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
            o_clerk='Clerk#000009506'

          +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
          | id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
          +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
          |  1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 15084733 | Using where |
          +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
      • The entire table is scanned
      • The WHERE condition is checked after records are read
        – It is not used to limit the number of examined rows.
    • 18 Let's add an index
          alter table orders add key i_o_orderdate (o_orderdate);

          select * from orders
          where
            o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
            o_clerk='Clerk#000009506'

          +----+-------------+--------+-------+---------------+---------------+---------+------+--------+-------------+
          | id | select_type | table  | type  | possible_keys | key           | key_len | ref  | rows   | Extra       |
          +----+-------------+--------+-------+---------------+---------------+---------+------+--------+-------------+
          |  1 | SIMPLE      | orders | range | i_o_orderdate | i_o_orderdate | 4       | NULL | 306322 | Using where |
          +----+-------------+--------+-------+---------------+---------------+---------+------+--------+-------------+
      ● Query time:
          19 rows in set (0.76 sec)
      • Outcome
        – Down to reading 300K rows
        – Still, 300K >> 19 rows.
    • 19 Finding out which indexes to add
          select * from orders
          where
            o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
            o_clerk='Clerk#000009506'
      Candidate indexes:
      ● index (o_orderdate)
      ● index (o_clerk)
      Check the selectivity of the conditions that would use each index:
          select count(*) from orders
          where o_orderDate BETWEEN '1992-06-06' and '1992-07-06';
          306322 rows

          select count(*) from orders where o_clerk='Clerk#000009506';
          1507 rows
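The comparison above boils down to simple arithmetic: divide each count by the table size and prefer the index whose condition lets the fewest rows through. A minimal Python sketch, assuming the EXPLAIN estimate of 15,084,733 rows as the table size; "pick the most selective condition" is a rule of thumb for choosing candidate indexes, not the optimizer's actual cost model.

```python
# Counts measured with the count(*) queries above; the table size is the
# EXPLAIN estimate from the full-scan plan.
TABLE_ROWS = 15_084_733
matching_rows = {
    "index (o_orderdate)": 306_322,  # o_orderDate BETWEEN '1992-06-06' AND '1992-07-06'
    "index (o_clerk)": 1_507,        # o_clerk = 'Clerk#000009506'
}

def selectivity(rows: int, total: int = TABLE_ROWS) -> float:
    """Fraction of the table a condition lets through (smaller is better)."""
    return rows / total

for name, rows in matching_rows.items():
    print(f"{name}: {rows} rows, selectivity {selectivity(rows):.4%}")

# The candidate whose condition matches the fewest rows reads the least.
best = min(matching_rows, key=matching_rows.get)
print("most selective candidate:", best)
```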
    • 20 Try adding composite indexes
      ● index (o_clerk, o_orderdate)
          +----+-------------+--------+-------+---------------+----------------+---------+------+------+-------------+
          | id | select_type | table  | type  | possible_keys | key            | key_len | ref  | rows | Extra       |
          +----+-------------+--------+-------+---------------+----------------+---------+------+------+-------------+
          |  1 | SIMPLE      | orders | range | i_o_clerk_... | i_o_clerk_date | 20      | NULL | 19   | Using where |
          +----+-------------+--------+-------+---------------+----------------+---------+------+------+-------------+
        Bingo! 100% efficiency
      ● index (o_orderdate, o_clerk)
          +----+-------------+--------+-------+---------------+----------------+---------+------+--------+-------------+
          | id | select_type | table  | type  | possible_keys | key            | key_len | ref  | rows   | Extra       |
          +----+-------------+--------+-------+---------------+----------------+---------+------+--------+-------------+
          |  1 | SIMPLE      | orders | range | i_o_date_c... | i_o_date_clerk | 20      | NULL | 360354 | Using where |
          +----+-------------+--------+-------+---------------+----------------+---------+------+--------+-------------+
        Much worse!
      • If a condition uses multiple columns, a composite index will be most efficient
      • The order of columns matters
        – Explaining why is outside the scope of this tutorial. It was covered in last year's tutorial.
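Although the full explanation is deferred to last year's tutorial, the effect of column order can be seen with a toy model. In the Python sketch below, a B-tree index is modelled as a sorted list of key tuples and a range scan as two `bisect` probes. The data is hypothetical (random clerks and dates, not DBT3). With (o_clerk, o_orderdate), the equality plus the date range bound one contiguous stretch, so every entry read matches; with (o_orderdate, o_clerk), only the date range bounds the scan and o_clerk becomes a per-entry filter.

```python
import bisect
import random
from datetime import date, timedelta

# Hypothetical synthetic data standing in for `orders`: (o_clerk, o_orderdate) pairs.
rng = random.Random(42)
START = date(1992, 1, 1)
rows = [
    (f"Clerk#{rng.randrange(200):09d}", START + timedelta(days=rng.randrange(2500)))
    for _ in range(100_000)
]

CLERK = "Clerk#000000007"
LO, HI = date(1992, 6, 6), date(1992, 7, 6)
NEXT_DAY = HI + timedelta(days=1)

# Each B-tree index is modelled as a sorted list of its key tuples.
idx_clerk_date = sorted((clerk, d) for clerk, d in rows)  # index (o_clerk, o_orderdate)
idx_date_clerk = sorted((d, clerk) for clerk, d in rows)  # index (o_orderdate, o_clerk)

def scan_clerk_date() -> int:
    """Equality on the first column plus a range on the second bound ONE
    contiguous stretch of the index, so every entry read is a match."""
    start = bisect.bisect_left(idx_clerk_date, (CLERK, LO))
    end = bisect.bisect_left(idx_clerk_date, (CLERK, NEXT_DAY))
    return end - start

def scan_date_clerk() -> tuple[int, int]:
    """Only the date range bounds the scan; o_clerk is checked on each
    entry read. Returns (entries_read, matches)."""
    start = bisect.bisect_left(idx_date_clerk, (LO,))
    end = bisect.bisect_left(idx_date_clerk, (NEXT_DAY,))
    matches = sum(1 for _, clerk in idx_date_clerk[start:end] if clerk == CLERK)
    return end - start, matches

read_cd = scan_clerk_date()
read_dc, matched_dc = scan_date_clerk()
print(f"(o_clerk, o_orderdate): read {read_cd} entries")
print(f"(o_orderdate, o_clerk): read {read_dc} entries for {matched_dc} matches")
```

Both scans return the same matches; only the number of index entries read differs, mirroring the 19 vs 360354 rows on the slide.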
    • 21 Conditions must be in SARGable form
      • The condition must represent a range
      • It must have a form that is recognized by the optimizer
          o_orderDate BETWEEN '1992-06-01' and '1992-06-30'
          year(o_orderDate)=1992 and month(o_orderdate)=6
          TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06')
          o_clerk='Clerk#000009506'
          o_clerk LIKE 'Clerk#000009506'
          o_clerk LIKE '%Clerk#000009506%'
          column IN (1,10,15,21, ...)
          (col1, col2) IN ( (1,1), (2,2), (3,3), …)
      • A condition that wraps the column in a function (year(), month(), TO_DAYS()) or whose LIKE pattern starts with a wildcard cannot be turned into a range.
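A condition such as year(o_orderDate)=1992 and month(o_orderDate)=6 can usually be rewritten by hand into the sargable BETWEEN form shown first in the list. The Python sketch below computes the bounds; `month_bounds` is a hypothetical helper, and the rewrite assumes o_orderDate is a DATE column.

```python
import calendar
from datetime import date

def month_bounds(year: int, month: int) -> tuple[date, date]:
    """First and last day of a month, for rewriting
    YEAR(col)=year AND MONTH(col)=month as col BETWEEN first AND last."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return date(year, month, 1), date(year, month, last_day)

first, last = month_bounds(1992, 6)
print(f"o_orderDate BETWEEN '{first}' AND '{last}'")
# o_orderDate BETWEEN '1992-06-01' AND '1992-06-30'
```

For a DATETIME or TIMESTAMP column, BETWEEN with the last day would miss rows later on that day; the safer form there is col >= first day AND col < first day of the next month.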
    • 22 New in MySQL 5.6: optimizer_trace
      ● Lets you see the ranges
          set optimizer_trace='enabled=on';
          explain select * from orders
          where o_orderDATE between '1992-06-01' and '1992-07-03' and
                o_orderdate not in ('1992-01-01', '1992-06-12', '1992-07-04');
          select * from information_schema.optimizer_trace\G
      ● Will print a big JSON struct
      ● Search for range_scan_alternatives.
    • 23 New in MySQL 5.6: optimizer_trace
          ...
          "range_scan_alternatives": [
            {
              "index": "i_o_orderdate",
              "ranges": [
                "1992-06-01 <= o_orderDATE < 1992-06-12",
                "1992-06-12 < o_orderDATE <= 1992-07-03"
              ],
              "index_dives_for_eq_ranges": true,
              "rowid_ordered": false,
              "using_mrr": false,
              "index_only": false,
              "rows": 319082,
              "cost": 382900,
              "chosen": true
            },
            {
              "index": "i_o_date_clerk",
              "ranges": [
                "1992-06-01 <= o_orderDATE < 1992-06-12",
                "1992-06-12 < o_orderDATE <= 1992-07-03"
              ],
              "index_dives_for_eq_ranges": true,
              "rowid_ordered": false,
              "using_mrr": false,
              "index_only": false,
              "rows": 406336,
              "cost": 487605,
              "chosen": false,
              "cause": "cost"
            }
          ],
          ...
      ● The considered ranges are shown in the range_scan_alternatives section
      ● This is actually the original use case of optimizer_trace
      ● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug)
      ● Still, very useful.
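The two ranges in the trace come from subtracting the NOT IN points from the BETWEEN interval; points outside the interval (1992-01-01, 1992-07-04) change nothing. The Python sketch below reproduces that interval arithmetic. It is a simplified model written for illustration (`split_range` and `describe` are hypothetical names), not the range optimizer's actual code.

```python
from datetime import date

def split_range(lo, hi, excluded):
    """Subtract NOT IN points from the closed interval [lo, hi].
    Returns (left, left_inclusive, right, right_inclusive) tuples."""
    cuts = sorted(p for p in set(excluded) if lo <= p <= hi)  # outside points change nothing
    intervals = []
    left, left_incl = lo, True
    for p in cuts:
        if left < p:  # non-empty piece before the excluded point
            intervals.append((left, left_incl, p, False))
        left, left_incl = p, False  # continue just after the excluded point
    if left < hi or (left == hi and left_incl):
        intervals.append((left, left_incl, hi, True))
    return intervals

def describe(interval, column="o_orderDATE"):
    """Render an interval in the notation used by optimizer_trace."""
    left, left_incl, right, right_incl = interval
    return f"{left} {'<=' if left_incl else '<'} {column} {'<=' if right_incl else '<'} {right}"

ranges = split_range(
    date(1992, 6, 1), date(1992, 7, 3),
    [date(1992, 1, 1), date(1992, 6, 12), date(1992, 7, 4)],
)
for r in ranges:
    print(describe(r))
# 1992-06-01 <= o_orderDATE < 1992-06-12
# 1992-06-12 < o_orderDATE <= 1992-07-03
```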
    • 24 Source of #rows estimates for range
          select * from orders
          where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'

          +----+-------------+--------+-------+---------------+---------------+---------+------+--------+-------------+
          | id | select_type | table  | type  | possible_keys | key           | key_len | ref  | rows   | Extra       |
          +----+-------------+--------+-------+---------------+---------------+---------+------+--------+-------------+
          |  1 | SIMPLE      | orders | range | i_o_orderdate | i_o_orderdate | 4       | NULL | 306322 | Using where |
          +----+-------------+--------+-------+---------------+---------------+---------+------+--------+-------------+
      Where does rows=306322 come from?
      • The “records_in_range” estimate
      • Done by diving into the index
      • Usually fairly accurate
      • Not affected by ANALYZE TABLE.
    • 25 Simple selects: conclusions
      • Efficiency == “#rows_scanned is close to #rows_returned”
      • Indexes and WHERE conditions reduce #rows scanned
      • Index estimates are usually accurate
      • Multi-column indexes
        – “handle” conditions on multiple columns
        – Order of columns in the index matters
      • optimizer_trace lets you view the ranges
        – But it misrepresents ranges over multi-column indexes.
    • 26 Now, we will skip some topics
      One can also speed up simple selects with
      ● index_merge access method
      ● index access method
      ● Index Condition Pushdown
      We don't have time for these now; check out last year's tutorial.
    • 27
      ● Introduction
        – What is an optimizer problem
        – How to catch it
          ● old and new tools
      ● Single-table selects
        – brief recap from 2012
      ● JOINs
        – ref access
          ● index statistics
        – join condition pushdown
        – join plan efficiency
        – query plan vs reality
      ● Big I/O-bound JOINs
        – Batched Key Access
      ● Aggregate functions
      ● ORDER BY ... LIMIT
      ● GROUP BY
      ● Subqueries
    • 28 A simple join
          select * from customer, orders where c_custkey=o_custkey
      • “Customers with their orders”
    • 29 Execution: Nested Loops join
          select * from customer, orders where c_custkey=o_custkey

          for each customer C {
            for each order O {
              if (C.c_custkey == O.o_custkey)
                produce record(C, O);
            }
          }
      • Complexity:
        – Scans table customer
        – For each record in customer, scans table orders
      • Is this OK?
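The pseudocode above translates directly into Python. This sketch runs it on hypothetical toy tables (5 customers, 20 orders) and counts how many (customer, order) combinations are examined, which is the quantity that makes a plain nested loops join expensive.

```python
# Hypothetical toy tables: 5 customers, 20 orders, each order owned by one customer.
customers = [{"c_custkey": k} for k in range(1, 6)]
orders = [{"o_orderkey": i, "o_custkey": 1 + i % 5} for i in range(20)]

def nested_loops_join(outer, inner, cond):
    """Plain nested loops: every inner row is examined for every outer row."""
    examined = 0
    result = []
    for c in outer:
        for o in inner:
            examined += 1
            if cond(c, o):
                result.append((c, o))
    return result, examined

rows, examined = nested_loops_join(
    customers, orders, lambda c, o: c["c_custkey"] == o["o_custkey"]
)
print(len(rows), examined)  # 20 matches, 5 * 20 = 100 rows examined
```

With the slide's tables the same loop would examine 148,749 × 1,493,631, roughly 2.2 × 10^11 combinations, to produce about 1.5M result rows.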
    • 31 Execution: Nested loops join (3)
          select * from customer, orders where c_custkey=o_custkey

          for each customer C {
            for each order O {
              if (C.c_custkey == O.o_custkey)
                produce record(C, O);
            }
          }
      • EXPLAIN:
          +----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
          | id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
          +----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
          |  1 | SIMPLE      | customer | ALL  | NULL          | NULL | NULL    | NULL | 148749  |             |
          |  1 | SIMPLE      | orders   | ALL  | NULL          | NULL | NULL    | NULL | 1493631 | Using where |
          +----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
        – rows for customer: rows to read from customer
        – rows for orders: rows to read from orders (for each customer)
        – “Using where” on orders: the c_custkey=o_custkey check
    • 32 Execution: Nested loops join (4)
          select * from customer, orders where c_custkey=o_custkey

          +----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
          | id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
          +----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
          |  1 | SIMPLE      | customer | ALL  | NULL          | NULL | NULL    | NULL | 148749  |             |
          |  1 | SIMPLE      | orders   | ALL  | NULL          | NULL | NULL    | NULL | 1493631 | Using where |
          +----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
      • Scan a 1,493,631-row table 148,749 times
        – Consider 1,493,631 * 148,749 row combinations
      • Is this query inherently complex?
        – We know each customer has his own orders
        – The join result has as many rows as orders: size(customer ⋈ orders) = size(orders)
        – The lower bound is 1,493,631 + 148,749 + costs to match customer<->order.
    • 33 Using an index for the join: ref access
          alter table orders add index i_o_custkey(o_custkey);
          select * from customer, orders where c_custkey=o_custkey
    • 34 ref access: analysis
          select * from customer, orders where c_custkey=o_custkey

          +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------+
          | id | select_type | table    | type | possible_keys | key         | key_len | ref                | rows   | Extra |
          +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------+
          |  1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL               | 148749 |       |
          |  1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | customer.c_custkey | 7      |       |
          +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------+
      ● One ref lookup scans 7 rows.
      ● In total: 7 * 148,749 = 1,041,243 rows
        – `orders` has about 1.5M rows
        – no redundant reads from `orders`
      ● The whole query plan
        – Reads all customers
        – Reads 1M orders (of about 1.5M)
      ● Efficient!
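ref access replaces the inner scan with a lookup of the join key in an index. In this Python sketch a dict from o_custkey to its orders stands in for i_o_custkey (an assumption: a real InnoDB index is a B-tree, and the dict only models the equality lookup). On the same hypothetical toy tables as a plain nested loops join, the number of inner rows read drops to exactly the number of matches.

```python
from collections import defaultdict

# Hypothetical toy tables: 5 customers, 20 orders, each order owned by one customer.
customers = [{"c_custkey": k} for k in range(1, 6)]
orders = [{"o_orderkey": i, "o_custkey": 1 + i % 5} for i in range(20)]

# Stand-in for index i_o_custkey: each o_custkey value -> its orders.
i_o_custkey = defaultdict(list)
for o in orders:
    i_o_custkey[o["o_custkey"]].append(o)

def ref_join(outer, index, key):
    """For each outer row, read only the inner rows whose key matches."""
    examined = 0
    result = []
    for c in outer:
        matches = index.get(c[key], [])  # .get does not insert into the defaultdict
        examined += len(matches)
        result.extend((c, o) for o in matches)
    return result, examined

rows, examined = ref_join(customers, i_o_custkey, "c_custkey")
print(len(rows), examined)  # 20 matches, 20 inner rows read
```

Scaled to the slide: 148,749 lookups reading about 7 rows each gives the 1,041,243 rows computed above.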
    • 35 Conditions that can be used for ref access
      ● Can use equalities
        – tbl.key=other_table.col
        – tbl.key=const
        – tbl.key IS NULL
      ● For multipart keys, will use the largest prefix
        – keypart1=... AND keypart2= … AND keypartK=...
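The "largest prefix" rule can be stated as a short loop: walk the index columns in order and stop at the first one without an equality. A hypothetical Python helper, assuming the set of columns that have usable equalities (col=..., col IS NULL) is already known; it ignores the ref_or_null special case.

```python
def ref_key_prefix(index_columns, equality_columns):
    """Return the key parts ref access can use: the longest prefix of the
    index whose every column has an equality condition."""
    prefix = []
    for col in index_columns:
        if col not in equality_columns:
            break  # a gap ends the usable prefix
        prefix.append(col)
    return prefix

# Hypothetical index (keypart1, keypart2, keypart3)
idx = ["keypart1", "keypart2", "keypart3"]
print(ref_key_prefix(idx, {"keypart1", "keypart2"}))  # ['keypart1', 'keypart2']
print(ref_key_prefix(idx, {"keypart2", "keypart3"}))  # [] : no equality on keypart1
```

The second call also illustrates why a range on keypart1 combined with an equality on keypart2 does not give ref access: a range is not an equality, so the prefix is empty.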
    • 36 Conditions that can't be used for ref access
      ● Doesn't work for non-equalities
          t1.key BETWEEN t2.col1 AND t2.col2
      ● Doesn't work for OR-ed equalities
          t1.key=t2.col1 OR t1.key=t2.col2
        – Except for ref_or_null:
          t1.key=... OR t1.key IS NULL
      ● Doesn't “combine” ref and range access
        – t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col
        – t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col
    • 37 Is ref always efficient?
      ● Efficient if the column has many different values
        – Best case: a unique index (eq_ref)
      ● A few different values: not useful
      ● Skewed distribution: depends on which part of the distribution the join touches
        (figures labeled: good, bad, depends)
    • 38 ref access estimates: index statistics
      • How many rows will match
          tbl.key_column = $value
        for an arbitrary $value?
      • Index statistics
          show keys from orders where key_name='i_o_custkey'\G
          *************************** 1. row ***************************
          Table: orders
          Non_unique: 1
          Key_name: i_o_custkey
          Seq_in_index: 1
          Column_name: o_custkey
          Collation: A
          Cardinality: 214462
          Sub_part: NULL
          Packed: NULL
          Null: YES
          Index_type: BTREE

          show table status like 'orders'\G
          *************************** 1. row ***************************
          Name: orders
          Engine: InnoDB
          Version: 10
          Row_format: Compact
          Rows: 1495152
          Avg_row_length: 133
          Data_length: 199966720
          Max_data_length: 0
          Index_length: 122421248
          Data_free: 6291456
          ...
      • average = Rows / Cardinality = 1495152 / 214462 = 6.97
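The estimate above is a single division. The Python sketch below repeats it with the numbers from SHOW TABLE STATUS and SHOW KEYS, then applies the same kind of average to a hypothetical skewed column to show what a table-wide average hides (the point of the previous slide).

```python
from collections import Counter

def avg_rows_per_key(table_rows: int, cardinality: int) -> float:
    """Table-wide average #rows per index value: the estimate for tbl.key=$value."""
    return table_rows / cardinality

per_key = avg_rows_per_key(1_495_152, 214_462)  # Rows / Cardinality from the slide
print(f"{per_key:.2f}")  # 6.97

# Hypothetical skewed column: the average says 2500 rows per value,
# but real lookups return anywhere from 200 to 9000 rows.
skewed = Counter({"A": 9_000, "B": 500, "C": 300, "D": 200})
average = sum(skewed.values()) / len(skewed)
print(average, max(skewed.values()), min(skewed.values()))  # 2500.0 9000 200
```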
    • 39 ref access: conclusions
      ● Based on t.key=... equality conditions
      ● Can make joins very efficient
      ● Relies on index statistics for estimates.
    • 40 Optimizer statistics
      ● MySQL/Percona Server
        – Index statistics
        – Persistent/transient InnoDB stats
      ● MariaDB
        – Index statistics, persistent/transient
          ● Same as Percona Server (via XtraDB)
        – Persistent, engine-independent, index-independent statistics.
    • 41 Index statistics
      ● Cardinality allows calculating a table-wide average #rows-per-key-prefix
      ● It is a statistical value (inexact)
      ● The exact collection procedure depends on the storage engine
        – InnoDB: random sampling
        – MyISAM: index scan
        – Engine-independent: index scan.
    • 42 Index statistics in MySQL 5.6
      ● Sample [8] random index leaf pages
      ● Table statistics (stored)
        – rows: estimated number of rows in a table
        – Other stats not used by the optimizer
      ● Index statistics (stored)
        – fields: #fields in the index
        – rows_per_key: rows per 1 key value, per prefix of fields
          ([1 column value], [2 columns value], [3 columns value], …)
        – Other stats not used by the optimizer.
    • 43 Index statistics updates
      ● Statistics are updated when:
        – ANALYZE TABLE tbl_name [, tbl_name] …
        – SHOW TABLE STATUS, SHOW INDEX
        – INFORMATION_SCHEMA.[TABLES|STATISTICS] is accessed
        – A table is opened for the first time (after server restart)
        – A table has changed by >10%
        – The InnoDB Monitor is turned ON.
    • 44 Displaying optimizer statistics
      ● MySQL 5.5, MariaDB 5.3, and older
        – Issue SQL statements to count rows/keys
        – Indirectly, look at EXPLAIN for simple queries
      ● MariaDB 5.5, Percona Server 5.5 (using XtraDB)
        – information_schema.[innodb_index_stats, innodb_table_stats]
        – Read-only, always visible
      ● MySQL 5.6
        – mysql.[innodb_index_stats, innodb_table_stats]
        – User-updatable
        – Only available if innodb_stats_persistent=ON (innodb_analyze_is_persistent in early 5.6 development releases)
      ● MariaDB 10.0
        – Persistent, updatable tables mysql.[index_stats, column_stats, table_stats]
        – User-updatable
        – + current XtraDB mechanisms.
    • 45 Plan [in]stability
      ● Statistics may vary a lot (orders)
      Two snapshots of the same statistics (=>), their relative difference, and the actual value in parentheses:
          MariaDB [dbt3]> select * from information_schema.innodb_index_stats;
          +------------+-----------------+--------------+     +---------------+
          | table_name | index_name      | rows_per_key |     | rows_per_key  |  error (actual)
          +------------+-----------------+--------------+     +---------------+
          | partsupp   | PRIMARY         | 3, 1         |     | 4, 1          |  25%
          | partsupp   | i_ps_partkey    | 3, 0         | =>  | 4, 1          |  25% (4)
          | partsupp   | i_ps_suppkey    | 64, 0        |     | 91, 1         |  30% (80)
          | orders     | i_o_orderdate   | 9597, 1      |     | 1660956, 0    |  99% (6234)
          | orders     | i_o_custkey     | 15, 1        |     | 15, 0         |  0% (15)
          | lineitem   | i_l_receiptdate | 7425, 1, 1   |     | 6665850, 1, 1 |  99.9% (23477)
          +------------+-----------------+--------------+     +---------------+

          MariaDB [dbt3]> select * from information_schema.innodb_table_stats;
          +------------+----------+      +----------+
          | table_name | rows     |      | rows     |
          +------------+----------+      +----------+
          | partsupp   | 6524766  |      | 9101065  |  28% (8000000)
          | orders     | 15039855 | ==>  | 14948612 |  0.6% (15000000)
          | lineitem   | 60062904 |      | 59992655 |  0.1% (59986052)
          +------------+----------+      +----------+
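One formula that reproduces the error percentages in these tables is the absolute difference between the two snapshots divided by the larger of the two. The slide does not state its formula, so treat this Python sketch as an assumption that happens to match the published numbers (for example 3 vs 4 gives 25%, 64 vs 91 gives about 30%).

```python
def relative_difference(a: float, b: float) -> float:
    """Difference between two snapshots, relative to the larger value
    (assumed definition; it matches the percentages on the slide)."""
    return abs(a - b) / max(a, b)

# (first snapshot, second snapshot) pairs copied from the tables above
snapshots = {
    "partsupp rows": (6_524_766, 9_101_065),
    "orders rows": (15_039_855, 14_948_612),
    "lineitem rows": (60_062_904, 59_992_655),
    "partsupp.i_ps_suppkey rows_per_key": (64, 91),
    "orders.i_o_orderdate rows_per_key": (9_597, 1_660_956),
}
for name, (first, second) in snapshots.items():
    print(f"{name}: {relative_difference(first, second):.1%}")
```

Table-level row counts stay within a fraction of a percent for orders and lineitem, while rows_per_key for i_o_orderdate swings by two orders of magnitude, which is enough to flip a plan.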
    • 46 Controlling statistics (MySQL 5.6)
      ● Persistent and user-updatable InnoDB statistics
        – innodb_stats_persistent = ON (innodb_analyze_is_persistent in early 5.6 development releases)
        – updated manually by ANALYZE TABLE, or
        – automatically when innodb_stats_auto_recalc = ON
      ● Control the precision of sampling (number of sampled pages)
        – innodb_stats_persistent_sample_pages (default 20)
        – innodb_stats_transient_sample_pages (default 8)
      ● No new statistics compared to older versions.
    • 47 Controlling statistics (MariaDB 10.0)
      Current XtraDB index statistics, plus:
      ● Engine-independent, persistent, user-updatable statistics
      ● Precise
      ● Additional statistics per column (even when there is no index):
        – min_value, max_value: minimum/maximum value per column
        – nulls_ratio: fraction of NULL values in a column
        – avg_length: average size of values in a column
        – avg_frequency: average number of rows with the same value.
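The per-column statistics listed above have simple definitions. The Python sketch below computes them over a hypothetical list of o_orderpriority values, with None standing for NULL. It illustrates the definitions only; the assumptions that lengths are counted in characters (MariaDB may count bytes) and that NULLs are excluded from min/max, avg_length and avg_frequency are mine, not taken from MariaDB's implementation.

```python
from typing import Optional, Sequence

def column_stats(values: Sequence[Optional[str]]) -> dict:
    """Per-column statistics as described on the slide; None stands for NULL.
    Assumptions: lengths in characters, NULLs excluded except from nulls_ratio."""
    if not values:
        raise ValueError("need at least one row")
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    return {
        "min_value": min(non_null, default=None),
        "max_value": max(non_null, default=None),
        "nulls_ratio": (len(values) - len(non_null)) / len(values),
        "avg_length": sum(map(len, non_null)) / len(non_null) if non_null else None,
        "avg_frequency": len(non_null) / len(distinct) if distinct else None,
    }

# Hypothetical sample of o_orderpriority values
priorities = ["1-URGENT", "2-HIGH", "1-URGENT", None, "5-LOW", "2-HIGH", "1-URGENT", None]
print(column_stats(priorities))
```

Unlike index cardinality, avg_frequency is available even for a column without an index, which is what lets the optimizer estimate conditions such as o_orderpriority='1-URGENT'.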
    • 48 Join condition pushdown
    • 52 Join condition pushdown
          select *
          from
            customer, orders
          where
            c_custkey=o_custkey and c_acctbal < -500 and
            o_orderpriority='1-URGENT';

          +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
          | id | select_type | table    | type | possible_keys | key         | key_len | ref                | rows   | Extra       |
          +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
          |  1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL               | 150081 | Using where |
          |  1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | customer.c_custkey | 7      | Using where |
          +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
      ● Conjunctive (ANDed) conditions are split into parts
      ● Each part is attached as early as possible
        – Either as “Using where”
        – Or as a table access method.
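The two rules above (split the WHERE into ANDed parts, attach each part at the earliest table where everything it references has been read) fit in a few lines. This Python sketch is a simplified model with hypothetical names; it does not decide whether a part is used by the access method or only checked as "Using where", which the real optimizer also does.

```python
# Each ANDed part of the WHERE clause, with the set of tables it references.
conjuncts = {
    "c_custkey = o_custkey": {"customer", "orders"},
    "c_acctbal < -500": {"customer"},
    "o_orderpriority = '1-URGENT'": {"orders"},
}

def push_down(join_order, conjuncts):
    """Attach every conjunct to the first table in join_order at which all
    the tables it references are available."""
    attached = {t: [] for t in join_order}
    for cond, tables in conjuncts.items():
        for i, table in enumerate(join_order):
            if tables <= set(join_order[: i + 1]):
                attached[table].append(cond)
                break
    return attached

print(push_down(["customer", "orders"], conjuncts))
print(push_down(["orders", "customer"], conjuncts))
```

The second call corresponds to the orders-first plan: o_orderpriority is checked while reading orders, and both c_custkey = o_custkey and c_acctbal < -500 land on customer.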
    • 53 Observing join condition pushdown
          EXPLAIN: {
            "query_block": {
              "select_id": 1,
              "nested_loop": [
                {
                  "table": {
                    "table_name": "orders",
                    "access_type": "ALL",
                    "possible_keys": ["i_o_custkey"],
                    "rows": 1499715,
                    "filtered": 100,
                    "attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))"
                  }
                },
                {
                  "table": {
                    "table_name": "customer",
                    "access_type": "eq_ref",
                    "possible_keys": ["PRIMARY"],
                    "key": "PRIMARY",
                    "used_key_parts": ["c_custkey"],
                    "key_length": "4",
                    "ref": ["dbt3sf1.orders.o_custkey"],
                    "rows": 1,
                    "filtered": 100,
                    "attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` < <cache>(-(500)))"
                  }
                }
          ...
      ● Before MySQL 5.6, EXPLAIN shows only “Using where”
        – The condition itself is only visible in the debug trace
      ● Starting from 5.6, EXPLAIN FORMAT=JSON shows attached conditions.
    • 55 Reasoning about join plan efficiency
          select *
          from
            customer, orders
          where
            c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
      First table, “customer”
      ● type=ALL, 150K rows
      ● select count(*) from customer where c_acctbal < -500 gives 6804.
      ● alter table customer add index (c_acctbal)
        Before:
          +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
          | id | select_type | table    | type | possible_keys | key         | key_len | ref                | rows   | Extra       |
          +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
          |  1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL               | 150081 | Using where |
          |  1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | customer.c_custkey | 7      | Using where |
          +----+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
        After:
          +----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-----------------------+
          | id | select_type | table    | type  | possible_keys | key         | key_len | ref                | rows | Extra                 |
          +----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-----------------------+
          |  1 | SIMPLE      | customer | range | PRIMARY,c_... | c_acctbal   | 9       | NULL               | 6802 | Using index condition |
          |  1 | SIMPLE      | orders   | ref   | i_o_custkey   | i_o_custkey | 5       | customer.c_custkey | 7    | Using where           |
          +----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-----------------------+
      Now, access to customer is efficient.
    • 57 o_orderpriority='1-URGENT'
      ● select count(*) from orders - 1.5M rows
      ● select count(*) from orders where o_orderpriority='1-URGENT' - 300K rows
      ● 300K / 1.5M = 0.2
    • 58 Reasoning about join plan efficiency
      select *
      from customer, orders
      where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
      Second table, “orders”
      ● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
      ● ref access uses only c_custkey=o_custkey
      ● What about o_orderpriority='1-URGENT'? Selectivity = 0.2
        – Can examine 7*0.2=1.4 rows, 6802 times, if we add an index:
          alter table orders add index (o_custkey, o_orderpriority)
          or
          alter table orders add index (o_orderpriority, o_custkey)
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
      |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
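The numbers from slides 55, 57 and 58 can be put side by side. This is a minimal sketch that assumes a crude nested-loop model: a plan costs the rows read from `customer` plus, for each of those rows, the `orders` rows one ref lookup reads. The real optimizer cost model weighs I/O and CPU differently, and the helper name `rows_examined` is hypothetical.

```python
def rows_examined(outer_rows, inner_rows_per_lookup):
    """Crude nested-loop model: rows read from the first table, plus the rows
    read from the second table over all ref lookups (one lookup per first-table row)."""
    return outer_rows + outer_rows * inner_rows_per_lookup


# Selectivity of o_orderpriority='1-URGENT' (slide 57): 300K of 1.5M rows
selectivity = 300_000 / 1_500_000

plans = {
    "ALL(customer), ref(o_custkey)": rows_examined(150_081, 7),
    "range(c_acctbal), ref(o_custkey)": rows_examined(6_802, 7),
    "range(c_acctbal), ref(o_custkey,o_orderpriority)": rows_examined(6_802, 7 * selectivity),
}

print(f"selectivity = {selectivity}")
for name, rows in plans.items():
    print(f"{name:50} ~{rows:,.0f} rows")
```

Under this model the composite index shrinks the rows read from `orders` by a factor of 5, which is exactly the 0.2 selectivity; with equalities on both columns, either column order from the slide gives the same ref lookup.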
    • 59 Reasoning about join plan efficiency - summary
      Basic* approach to evaluation of join plan efficiency:
      for each table $T in the join order {
        Look at conditions attached to table $T (condition must
        use table $T, may also use previous tables)
        Does the access method used with $T make good use
        of attached conditions?
      }
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
      |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
      +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
      * some other details may also affect join performance
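The checklist above can be sketched as a loop. This is a toy model under stated assumptions. Each table in the join order is described by its EXPLAIN `rows` estimate and by a hypothetical `unused_selectivity`: the fraction of rows that survive attached conditions the access method does not use, which you would estimate with count(*) queries as on slide 57. A table is flagged when that fraction is small. `PlanStep`, `review_join_order` and the 0.5 threshold are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PlanStep:
    table: str
    rows_per_access: float     # EXPLAIN `rows`: rows read each time the table is accessed
    unused_selectivity: float  # fraction of those rows surviving attached conditions that
                               # the access method does NOT use (1.0 = all conditions used)


def review_join_order(steps, threshold=0.5):
    """Walk the join order; flag tables whose access method leaves a selective
    attached condition unused, i.e. reads many rows that are then discarded."""
    findings = []
    accesses = 1.0  # the first table is accessed once
    for step in steps:
        rows_read = accesses * step.rows_per_access
        rows_kept = rows_read * step.unused_selectivity
        if step.unused_selectivity < threshold:
            findings.append((step.table, rows_read, rows_kept))
        accesses = rows_kept  # each surviving row triggers one access to the next table
    return findings


# The plan from slide 58, before the composite index (selectivities are estimates)
plan = [
    PlanStep("customer", rows_per_access=6802, unused_selectivity=1.0),
    PlanStep("orders", rows_per_access=7, unused_selectivity=0.2),
]
for table, read, kept in review_join_order(plan):
    print(f"{table}: reads ~{read:,.0f} rows, keeps ~{kept:,.0f}")
```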
    • 60 Attached conditions
    • 61 Attached conditions
      ● Ideally, should be used for table access
      ● Not all conditions can be used [at the same time]
        – Unused ones are still useful
        – They reduce the number of scans for subsequent tables
      select *
      from customer, orders
      where c_custkey=o_custkey and c_acctbal < -500 and
            o_orderpriority='1-URGENT';
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
      |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
      |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
      |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
      +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
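Why an unused condition still helps: in a nested-loop join, a condition attached to `customer` is checked before `orders` is probed, so every rejected customer row saves a lookup. Below is a minimal sketch with hypothetical in-memory data. `nested_loop_join` and the dictionary standing in for index `i_o_custkey` are illustrative, not server code.

```python
def nested_loop_join(customers, orders_by_custkey, attached_cond):
    """Index nested-loop join of customer with orders on c_custkey=o_custkey.
    `attached_cond` is evaluated on each customer row *before* probing orders."""
    result, lookups = [], 0
    for c in customers:
        if not attached_cond(c):
            continue  # not usable for access, but this row now costs no lookup
        lookups += 1
        for o in orders_by_custkey.get(c["c_custkey"], []):
            result.append((c["c_custkey"], o))
    return result, lookups


# Hypothetical data: 1000 customers, one order each; c_acctbal = c_custkey - 1000
customers = [{"c_custkey": i, "c_acctbal": -1000 + i} for i in range(1000)]
orders_by_custkey = {i: [f"order-{i}"] for i in range(1000)}

early, early_lookups = nested_loop_join(
    customers, orders_by_custkey, lambda c: c["c_acctbal"] < -500)
all_rows, late_lookups = nested_loop_join(customers, orders_by_custkey, lambda c: True)
late = [r for r in all_rows if r[0] < 500]  # equivalent condition, applied only after the join
print(early_lookups, late_lookups, early == late)  # 500 1000 True
```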
    • 62 Informing optimizer about attached conditions
      Currently: a range access that’s too expensive to use
      explain extended
      select *
      from customer, orders
      where c_custkey=o_custkey and c_acctbal > 8000 and
            o_orderpriority='1-URGENT';
      +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
      |id|select_type|table   |type|possible_keys    |key        |key_len|ref               |rows  |filtered|Extra      |
      +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
      |1 |SIMPLE     |customer|ALL |PRIMARY,c_acctbal|NULL       |NULL   |NULL              |150081|  36.22 |Using where|
      |1 |SIMPLE     |orders  |ref |i_o_custkey      |i_o_custkey|5      |customer.c_custkey|7     | 100.00 |Using where|
      +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
      ● `orders` will be scanned 150081 * 36.22% = 54359 times
      ● This reduces the cost of the join
        – Has an effect when comparing potential join plans
      ● => Index c_acctbal is not used for access, but it may help the optimizer.
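The 54359 on this slide is the product of two EXPLAIN EXTENDED columns: `rows` (what the access method reads) and `filtered` (the estimated percentage that survives the attached condition). A small sketch of that arithmetic; the function name is ours, not part of any server API.

```python
def expected_next_table_accesses(rows, filtered_pct):
    """Rows read from a table times the EXPLAIN EXTENDED `filtered` percentage:
    the estimated number of times the next table in the join order is accessed."""
    return rows * filtered_pct / 100


with_stats = expected_next_table_accesses(150081, 36.22)  # c_acctbal > 8000 estimated at 36.22%
unfiltered = expected_next_table_accesses(150081, 100)    # for comparison: a condition assumed to filter nothing
print(round(with_stats), round(unfiltered))  # 54359 150081
```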
    • 63 Attached condition selectivity
      ● Unused indexes provide info about selectivity
        – Works, but very expensive
      ● MariaDB 10.0 has engine-independent statistics
        – Index statistics
        – Non-indexed column statistics
      ● Histograms
        – Further info:
          Tomorrow, 2:20 pm @ Ballroom D
          Igor Babaev, “Engine-independent persistent statistics with histograms in MariaDB”.
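How a histogram helps with non-uniform data, sketched in Python. This is a generic equi-height histogram (each bucket holds roughly the same number of values) with a deliberately coarse estimator. It is not MariaDB's storage format or estimation code, and the function names are hypothetical. For comparison, the sketch also computes a uniform-distribution estimate that knows only the column's min and max.

```python
import bisect


def build_equi_height_histogram(values, buckets):
    """Upper bound of each bucket, chosen so every bucket holds ~len(values)/buckets values.
    Assumes `values` is non-empty."""
    s = sorted(values)
    n = len(s)
    return [s[max(0, (i + 1) * n // buckets - 1)] for i in range(buckets)]


def histogram_selectivity_lt(bounds, x):
    """Estimated selectivity of `col < x`: the fraction of buckets whose upper bound is below x.
    Coarse on purpose: it ignores where x falls inside its bucket."""
    return bisect.bisect_left(bounds, x) / len(bounds)


def uniform_selectivity_lt(lo, hi, x):
    """Estimate that knows only min and max and assumes a uniform spread."""
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)


# Skewed column: 90% of rows hold 0, the rest are 1..10
values = [0] * 90 + list(range(1, 11))
bounds = build_equi_height_histogram(values, buckets=10)
actual = sum(v < 1 for v in values) / len(values)
print(actual, histogram_selectivity_lt(bounds, 1), uniform_selectivity_lt(0, 10, 1))
# 0.9 0.9 0.1
```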
    • 64 How to check if the query plan matches reality
    • 65 Check if the query plan is realistic
      ● EXPLAIN shows what the optimizer expects. It may be wrong:
        – Out-of-date index statistics
        – Non-uniform data distribution
      ● Other DBMS: EXPLAIN ANALYZE
      ● MySQL: no equivalent. Instead, we have
        – Handler counters
        – “User statistics” (Percona, MariaDB)
        – PERFORMANCE_SCHEMA
    • 66 Join analysis: example query (Q18, DBT3)
      <reset counters>
      select c_name, c_custkey, o_orderkey, o_orderdate,
             o_totalprice, sum(l_quantity)
      from customer, orders, lineitem
      where
        o_totalprice > 500000
        and c_custkey = o_custkey
        and o_orderkey = l_orderkey
      group by c_name, c_custkey, o_orderkey, o_orderdate,
               o_totalprice
      order by o_totalprice desc, o_orderdate
      LIMIT 10;
      <collect statistics>
    • 67 Join analysis: handler counters (old)
      FLUSH STATUS;
      => RUN QUERY
      SHOW STATUS LIKE "Handler%";
      +----------------------------+-------+
      | Variable_name              | Value |
      +----------------------------+-------+
      | Handler_mrr_key_refills    | 0     |
      | Handler_mrr_rowid_refills  | 0     |
      | Handler_read_first         | 0     |
      | Handler_read_key           | 1646  |
      | Handler_read_last          | 0     |
      | Handler_read_next          | 1462  |
      | Handler_read_prev          | 0     |
      | Handler_read_rnd           | 10    |
      | Handler_read_rnd_deleted   | 0     |
      | Handler_read_rnd_next      | 184   |
      | Handler_tmp_update         | 1096  |
      | Handler_tmp_write          | 183   |
      | Handler_update             | 0     |
      | Handler_write              | 0     |
      +----------------------------+-------+
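FLUSH STATUS resets the session counters. When that is not convenient, the same numbers can be obtained by diffing two SHOW STATUS snapshots taken before and after the query. This is a sketch that assumes the snapshots have already been fetched into dictionaries (how you fetch them depends on your client library); the helper names are hypothetical.

```python
def status_delta(before, after):
    """Per-counter difference between two SHOW STATUS snapshots; unchanged counters are dropped."""
    delta = {}
    for name, value in after.items():
        diff = value - before.get(name, 0)
        if diff:
            delta[name] = diff
    return delta


def handler_reads(delta):
    """Total read requests through the storage-engine handler interface (all Handler_read_* counters)."""
    return sum(v for k, v in delta.items() if k.startswith("Handler_read_"))


# Values from the slide above; `before` is empty because FLUSH STATUS already reset the counters
after = {
    "Handler_read_first": 0, "Handler_read_key": 1646, "Handler_read_last": 0,
    "Handler_read_next": 1462, "Handler_read_prev": 0, "Handler_read_rnd": 10,
    "Handler_read_rnd_deleted": 0, "Handler_read_rnd_next": 184,
    "Handler_tmp_update": 1096, "Handler_tmp_write": 183,
}
delta = status_delta({}, after)
print(delta)
print(handler_reads(delta))  # 3302
```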
    • 68 Join analysis: USERSTAT by Facebook
      MariaDB, Percona Server
      SET GLOBAL USERSTAT=1;
      FLUSH TABLE_STATISTICS;
      FLUSH INDEX_STATISTICS;
      => RUN QUERY
      SHOW TABLE_STATISTICS;
      +--------------+------------+-----------+--------------+-------------------------+
      | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
      +--------------+------------+-----------+--------------+-------------------------+
      | dbt3         | orders     |       183 |            0 |                       0 |
      | dbt3         | lineitem   |      1279 |            0 |                       0 |
      | dbt3         | customer   |       183 |            0 |                       0 |
      +--------------+------------+-----------+--------------+-------------------------+
      SHOW INDEX_STATISTICS;
      +--------------+------------+-----------------------+-----------+
      | Table_schema | Table_name | Index_name            | Rows_read |
      +--------------+------------+-----------------------+-----------+
      | dbt3         | customer   | PRIMARY               |       183 |
      | dbt3         | lineitem   | i_l_orderkey_quantity |      1279 |
      | dbt3         | orders     | i_o_totalprice        |       183 |
      +--------------+------------+-----------------------+-----------+
    • 69 Join analysis: PERFORMANCE SCHEMA
      [MySQL 5.6, MariaDB 10.0]
      ● Summary tables with read/write statistics
        – table_io_waits_summary_by_table
        – table_io_waits_summary_by_index_usage
      ● Superset of the userstat tables
      ● More overhead
      ● Not possible to associate statistics with a query
        => truncate stats tables before running a query
      ● Possible bug
        – performance schema not ignored
        – Disable by
          UPDATE setup_consumers SET ENABLED = 'NO'
          WHERE name = 'global_instrumentation';
    • 70 Analyze joins via PERFORMANCE SCHEMA: SHOW TABLE_STATISTICS analogue
      select object_schema, object_name, count_read, count_write,
             sum_timer_read, sum_timer_write, ...
      from table_io_waits_summary_by_table
      where object_schema = 'dbt3' and count_star > 0;
      +---------------+-------------+------------+-------------+
      | object_schema | object_name | count_read | count_write |
      +---------------+-------------+------------+-------------+
      | dbt3          | customer    |        183 |           0 |
      | dbt3          | lineitem    |       1462 |           0 |
      | dbt3          | orders      |        184 |           0 |
      +---------------+-------------+------------+-------------+
      +----------------+-----------------+
      | sum_timer_read | sum_timer_write | ...
      +----------------+-----------------+
      |     8326528406 |               0 |
      |    12117332778 |               0 |
      |     7946312812 |               0 |
      +----------------+-----------------+
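The sum_timer_* columns are reported in picoseconds, so the raw numbers above are easier to read once converted. A small conversion sketch using the sum_timer_read values from the output above; the function and constant names are ours.

```python
PS_PER_MS = 10**9  # performance_schema timer columns are normalized to picoseconds


def ps_to_ms(picoseconds):
    return picoseconds / PS_PER_MS


# sum_timer_read per table, as shown above
sum_timer_read = {"customer": 8326528406, "lineitem": 12117332778, "orders": 7946312812}
for table, ps in sum_timer_read.items():
    print(f"{table:10} {ps_to_ms(ps):7.2f} ms")
print(f"{'total':10} {ps_to_ms(sum(sum_timer_read.values())):7.2f} ms")
```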
    • 71 Analyze joins via PERFORMANCE SCHEMA: SHOW INDEX_STATISTICS analogue
      select object_schema, object_name, index_name, count_read,
             sum_timer_read, sum_timer_write, ...
      from table_io_waits_summary_by_index_usage
      where object_schema = 'dbt3' and count_star > 0
            and index_name is not null;
      +---------------+-------------+-----------------------+------------+
      | object_schema | object_name | index_name            | count_read |
      +---------------+-------------+-----------------------+------------+
      | dbt3          | customer    | PRIMARY               |        183 |
      | dbt3          | lineitem    | i_l_orderkey_quantity |       1462 |
      | dbt3          | orders      | i_o_totalprice        |        184 |
      +---------------+-------------+-----------------------+------------+
      +----------------+-----------------+
      | sum_timer_read | sum_timer_write | ...
      +----------------+-----------------+
      |     8326528406 |               0 |
      |    12117332778 |               0 |
      |     7946312812 |               0 |
      +----------------+-----------------+
    • 73 Batched joins
      ● Optimization for analytical queries
      ● Analytic queries shovel through lots of data
        – e.g. “average size of order in the last month”
        – or “pairs of goods purchased together”
      ● Indexes etc. won’t help when you really need to look at all data
      ● More data means a greater chance of being I/O-bound
      ● Solution: batched joins
    • 74 Batched Key Access Idea
    • 80 Batched Key Access Idea
      ● Non-BKA join hits data at random
      ● Caches are not used efficiently
      ● Prefetching is not useful
    • 81 Batched Key Access Idea
      ● BKA implementation accesses data in order
      ● Takes advantage of caches and prefetching
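Why sorting the keys helps, in a toy model: the inner table is stored in key order on pages of 100 rows, and we count how often consecutive lookups land on a different page, as a stand-in for cache misses and random I/O. Lookups are issued either in outer-row order, as a regular index nested-loop join does, or in batches that are sorted first, which is the BKA idea. The page model, `PAGE_SIZE` and the function names are assumptions for illustration; the real implementation works through the join buffer and the Multi-Range Read interface.

```python
import random

PAGE_SIZE = 100  # hypothetical: rows of the inner table per page, stored in key order


def page_switches(lookup_keys):
    """How many lookups touch a different page than the previous lookup."""
    switches, prev_page = 0, None
    for key in lookup_keys:
        page = key // PAGE_SIZE
        if page != prev_page:
            switches += 1
            prev_page = page
    return switches


def bka_lookup_order(outer_keys, buffer_rows):
    """Fill a buffer with join keys from the outer table, sort it, look the keys up in order."""
    order = []
    for start in range(0, len(outer_keys), buffer_rows):
        order.extend(sorted(outer_keys[start:start + buffer_rows]))
    return order


random.seed(42)
outer_keys = [random.randrange(100_000) for _ in range(10_000)]  # join keys in outer-row order

print("regular join:     ", page_switches(outer_keys))
for buffer_rows in (100, 1_000, 10_000):
    print(f"BKA, buffer={buffer_rows:>6}:", page_switches(bka_lookup_order(outer_keys, buffer_rows)))
```

Bigger buffers give longer sorted runs and fewer page switches, which is consistent with the benchmark's point that the buffer size setting matters.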
    • 82 Batched Key Access effect
      set join_cache_level=6;
      select max(l_extendedprice)
      from orders, lineitem
      where l_orderkey=o_orderkey and
            o_orderdate between $DATE1 and $DATE2
      The benchmark was run with
      ● Various BKA buffer sizes
      ● Various sizes of the $DATE1...$DATE2 range
    • 83 Batched Key Access Performance
      [Figure: “BKA join performance depending on buffer size”. X axis: buffer size, bytes; Y axis: query time, sec. Series: query_size=1, 2 and 3, each regular vs. BKA. Annotations: “Performance without BKA”, “Performance with BKA, given sufficient buffer size”.]
      ● 4x-10x speedup
      ● The more the data, the bigger the speedup
      ● Buffer size setting is very important.
    • 84 Batched Key Access settings
      ● Needs to be turned on:
        set join_buffer_size = 32*1024*1024;
        set join_cache_level = 6;                        -- MariaDB
        set optimizer_switch = 'batched_key_access=on';  -- MySQL 5.6
        set optimizer_switch = 'mrr=on';
        set optimizer_switch = 'mrr_cost_based=off';     -- MySQL 5.6
        set optimizer_switch = 'mrr_sort_keys=on';       -- MariaDB only
      ● Further join_buffer_size tuning: watch
        – Query performance
        – Handler_mrr_init counter
        and increase join_buffer_size until either saturates.
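The “increase until it saturates” advice can be written as a doubling search. This is a sketch under assumptions. `measure(size)` is a hypothetical callback that would run `SET join_buffer_size = size` and then time the query (or read Handler_mrr_init). The 5% gain threshold and the start and limit values are arbitrary. The usage line substitutes a made-up cost curve that stops improving at 64 MB; it is not measured data.

```python
def tune_join_buffer(measure, start=256 * 1024, limit=1024 ** 3, min_gain=0.05):
    """Double join_buffer_size while each doubling still cuts the measured cost by at least
    `min_gain`; return the last size that helped and its cost."""
    size, best = start, measure(start)
    while size * 2 <= limit:
        cost = measure(size * 2)
        if cost > best * (1 - min_gain):
            break  # saturated: doubling the buffer no longer pays off
        size, best = size * 2, cost
    return size, best


# Made-up cost curve for illustration only: query time stops improving at 64 MB
print(tune_join_buffer(lambda size: max(1.0, 64e6 / size)))  # (67108864, 1.0)
```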
    • 85 Batched Key Access - conclusions
      ● Targeted at big joins
      ● Needs to be enabled manually
      ● @@join_buffer_size is the most important setting
      ● MariaDB’s implementation is a superset of MySQL’s.
    • 87 ORDER BY, GROUP BY, aggregates
    • 88 Aggregate functions, no GROUP BY
      ● COUNT, SUM, AVG, etc. need to examine all rows:
        select SUM(column) from tbl needs to examine the whole tbl.
      ● MIN and MAX can use an index for lookup:
        index (o_orderdate)
          select max(o_orderdate) from orders
          select min(o_orderdate) from orders where o_orderdate > '1995-05-01'
        index (o_orderpriority, o_orderdate)
          select max(o_orderdate) from orders where o_orderpriority='1-URGENT'
      +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
      |id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra                       |
      +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
      |1 |SIMPLE     |NULL |NULL|NULL         |NULL|NULL   |NULL|NULL|Select tables optimized away|
      +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
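How an index answers these MIN/MAX queries with a single lookup, sketched with sorted Python lists standing in for the two indexes on the slide. The rows are made up. A B-tree does the same positioning by descending to one end of the index, or of an index prefix, and the optimizer can do that lookup while planning, which is what “Select tables optimized away” reports.

```python
import bisect
from datetime import date

# Hypothetical rows of `orders`: (o_orderpriority, o_orderdate)
rows = [
    ("1-URGENT", date(1995, 3, 1)),
    ("2-HIGH", date(1996, 7, 15)),
    ("1-URGENT", date(1997, 1, 20)),
    ("3-MEDIUM", date(1998, 8, 2)),
    ("1-URGENT", date(1994, 11, 5)),
]
idx_orderdate = sorted(d for _, d in rows)  # stand-in for index (o_orderdate)
idx_prio_date = sorted(rows)                # stand-in for index (o_orderpriority, o_orderdate)


def max_orderdate():
    # select max(o_orderdate) from orders: the last index entry
    return idx_orderdate[-1] if idx_orderdate else None


def min_orderdate_after(lower):
    # select min(o_orderdate) from orders where o_orderdate > lower:
    # one binary search to the first entry past `lower`
    i = bisect.bisect_right(idx_orderdate, lower)
    return idx_orderdate[i] if i < len(idx_orderdate) else None


def max_orderdate_for_priority(prio):
    # select max(o_orderdate) from orders where o_orderpriority = prio:
    # jump to the end of the `prio` group in the (priority, date) index
    i = bisect.bisect_right(idx_prio_date, (prio, date.max))
    if i > 0 and idx_prio_date[i - 1][0] == prio:
        return idx_prio_date[i - 1][1]
    return None


print(max_orderdate(), min_orderdate_after(date(1995, 5, 1)), max_orderdate_for_priority("1-URGENT"))
```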
    • 89 ORDER BY … LIMIT
      Three algorithms:
      ● Use an index to read in order
      ● Read one table, sort, join - “Using filesort”
      ● Execute join into temporary table and then sort - “Using temporary; Using filesort”
    • 90 Using index to read data in order
      ● No special indication in EXPLAIN output
      ● LIMIT n: as soon as we read n records, we can stop!
    • 91 A problem with LIMIT N optimization
      `orders` has 1.5M rows
      explain select * from orders order by o_orderdate desc limit 10;
      +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
      |id|select_type|table |type |possible_keys|key          |key_len|ref |rows|Extra|
      +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
      |1 |SIMPLE     |orders|index|NULL         |i_o_orderdate|4      |NULL|10  |     |
      +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
      explain select * from orders where o_orderpriority='1-URGENT' order by o_orderdate desc limit 10;
      +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
      |id|select_type|table |type |possible_keys|key          |key_len|ref |rows|Extra      |
      +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
      |1 |SIMPLE     |orders|index|NULL         |i_o_orderdate|4      |NULL|10  |Using where|
      +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
      ● A problem:
        – 1.5M rows, 300K of them URGENT
        – Scanning by date, when will we find 10 URGENT rows?
        – No good solution so far.
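How far the index scan has to go depends entirely on where the matching rows sit in index order, which is why this plan is hard to cost. A toy sketch with 1,500 made-up rows, 300 of them URGENT (the same 20% as on the slide), already listed in o_orderdate desc order; the function names are hypothetical.

```python
def scan_in_index_order(rows, where, limit):
    """Read rows in index (ORDER BY) order, apply the WHERE condition,
    and stop as soon as `limit` rows have matched. Returns (matches, rows_examined)."""
    matches, examined = [], 0
    for row in rows:
        examined += 1
        if where(row):
            matches.append(row)
            if len(matches) == limit:
                break
    return matches, examined


def is_urgent(priority):
    return priority == "1-URGENT"


# Both lists: 1,500 rows in o_orderdate desc order, 300 of them URGENT
spread_out = ["1-URGENT" if i % 5 == 4 else "3-MEDIUM" for i in range(1500)]
all_old = ["3-MEDIUM"] * 1200 + ["1-URGENT"] * 300  # the URGENT orders are the oldest ones

for name, rows in (("spread out", spread_out), ("all old", all_old)):
    _, examined = scan_in_index_order(rows, is_urgent, limit=10)
    print(f"{name:10}: examined {examined} rows")
```

With identical selectivity, the second layout needs about 24 times more rows examined (1210 vs. 50); at 1.5M rows the bad case means reading most of the index.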
    • 92 Using filesort strategy
      ● Have to read the entire first table
      ● For the remaining tables, can apply LIMIT n
      ● ORDER BY can only use columns of tbl1.
    • 93 Using temporary; Using filesort
      ● ORDER BY clause can use columns of any table
      ● LIMIT is applied only after executing the entire join and sorting.
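The difference between the two filesort strategies, sketched over hypothetical in-memory tables. `lookups` counts probes into the second table. The first function sorts `t1` and stops at the LIMIT; the second joins everything into a temporary list first, which is what lets its sort key use columns of either table. Function and variable names are illustrative only.

```python
def filesort_then_join(t1, t2_by_key, order_key, limit):
    """'Using filesort': sort the first table, then join row by row and stop at `limit`."""
    out, lookups = [], 0
    for row in sorted(t1, key=order_key):
        lookups += 1
        for match in t2_by_key.get(row["k"], []):
            out.append((row, match))
            if len(out) == limit:
                return out, lookups
    return out, lookups


def join_then_sort(t1, t2_by_key, order_key, limit):
    """'Using temporary; Using filesort': join everything into a temporary list,
    sort it, and only then apply LIMIT. `order_key` sees the joined pair,
    so it could use columns of either table."""
    temp, lookups = [], 0
    for row in t1:
        lookups += 1
        for match in t2_by_key.get(row["k"], []):
            temp.append((row, match))
    temp.sort(key=order_key)
    return temp[:limit], lookups


t1 = [{"k": i, "v": (i * 7) % 100} for i in range(100)]  # hypothetical first table
t2_by_key = {i: [f"t2-row-{i}"] for i in range(100)}      # hypothetical index on the second table

a, a_lookups = filesort_then_join(t1, t2_by_key, lambda r: r["v"], limit=10)
b, b_lookups = join_then_sort(t1, t2_by_key, lambda pair: pair[0]["v"], limit=10)
print(a == b, a_lookups, b_lookups)  # True 10 100
```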
    • 94 ORDER BY - conclusions
      ● Resolving ORDER BY with an index allows very efficient handling of LIMIT
        – Optimization for
          WHERE unused_condition ORDER BY … LIMIT n
          is challenging.
      ● Use sql_big_result, IGNORE INDEX FOR ORDER BY
      ● Using filesort
        – Needs all ORDER BY columns in the first table
        – Takes advantage of LIMIT when doing join to non-first tables
      ● “Using temporary; Using filesort” is least efficient.
    • 95 GROUP BY strategies
      There are three strategies:
      ● Ordered index scan
      ● Loose Index Scan (LooseScan)
      ● Groups table (Using temporary; [Using filesort]).
    • 96 Ordered index scan
      ● Groups are enumerated one after another
      ● Can compute aggregates on the fly
      ● Loose index scan is also able to jump to the next group.
    • 97 Execution of GROUP BY with temp table
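The three GROUP BY strategies, sketched over a sorted list standing in for an index on (grp, val). The data and function names are hypothetical. The loose scan here returns MIN(val) per group because that is the entry it lands on when it jumps to the start of each group. A plain dictionary plays the role of the temporary groups table, and sorting its contents afterwards corresponds to “Using filesort”.

```python
import bisect
from itertools import groupby
from operator import itemgetter

# Hypothetical index on (grp, val): entries kept sorted by the GROUP BY column
index = sorted([("a", 3), ("b", 1), ("a", 1), ("c", 5), ("b", 4), ("a", 2)])


def ordered_index_scan_sum(idx):
    """Ordered index scan: groups arrive one after another, so SUM() is computed on the fly."""
    return [(g, sum(v for _, v in entries)) for g, entries in groupby(idx, key=itemgetter(0))]


def loose_index_scan_min(idx):
    """Loose index scan: read the first entry of a group (its MIN), then jump past the group."""
    out, i = [], 0
    while i < len(idx):
        g, first_val = idx[i]
        out.append((g, first_val))
        # jump to the first entry of the next group instead of reading the rest of this one
        i = bisect.bisect_right(idx, (g, float("inf")), lo=i)
    return out


def temp_table_sum(rows):
    """Groups table: one entry per group, updated for rows arriving in any order;
    the final sort plays the role of 'Using filesort'."""
    groups = {}
    for g, v in rows:
        groups[g] = groups.get(g, 0) + v
    return sorted(groups.items())


print(ordered_index_scan_sum(index))
print(loose_index_scan_min(index))
print(temp_table_sum([("a", 3), ("b", 1), ("a", 1), ("c", 5), ("b", 4), ("a", 2)]))
```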
    • 98 Subqueries
    • 99 Subquery optimizations
      ● Before MariaDB 5.3/MySQL 5.6: “don’t use subqueries”
      ● Queries that caused most of the pain:
        – SELECT … FROM tbl WHERE col IN (SELECT …) - semi-joins
        – SELECT … FROM (SELECT …) - derived tables
      ● MariaDB 5.3 and MySQL 5.6
        – Have a common ancestor, MySQL 6.0 alpha
        – Huge (100x, 1000x) speedups for painful areas
        – Other kinds of subqueries received a speedup, too
        – MariaDB 5.3/5.5 has a superset of MySQL 5.6’s optimizations
          ● 5.6 handles some un-handled edge cases, too
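The pain with `col IN (SELECT …)` came from re-evaluating the subquery for every outer row. A toy sketch contrasting that with a semi-join that first materializes the subquery result into a hash set, roughly the idea behind the Materialization strategy (the servers also have other semi-join strategies). The data and function names are hypothetical, and `work` counts elementary steps, not real cost.

```python
def in_subquery_naive(outer, inner):
    """Old style: evaluate `x IN (SELECT ...)` by re-scanning the subquery result for every outer row."""
    out, work = [], 0
    for x in outer:
        for y in inner:
            work += 1
            if x == y:
                out.append(x)
                break
    return out, work


def in_subquery_semijoin(outer, inner):
    """Semi-join with a materialized, de-duplicated subquery result: one hash probe per outer row."""
    materialized = set(inner)
    work = len(inner)  # cost of materializing
    out = []
    for x in outer:
        work += 1
        if x in materialized:
            out.append(x)
    return out, work


outer = list(range(1000))
inner = list(range(0, 2000, 2))  # hypothetical subquery result: even numbers
naive, naive_work = in_subquery_naive(outer, inner)
semi, semi_work = in_subquery_semijoin(outer, inner)
print(naive == semi, len(semi), naive_work, semi_work)  # True 500 625250 2000
```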
    • 100 Tuning for subqueries
      ● “Before”: one execution strategy
        – No tuning possible
      ● “After”: similar to joins
        – Reasonable execution strategies supported
        – Need indexes
        – Need selective conditions
        – Support batching in most important cases
      ● Should be better 9x% of the time.
    • 101 What if it still picks a poor query plan?
      For both MariaDB and MySQL:
      ● Check EXPLAIN [EXTENDED], find a keyword around a subquery table
      ● Google “site:kb.askmonty.org $subquery_keyword”
        or https://kb.askmonty.org/en/subquery-optimizations-map/
      ● Find which optimization it was
      ● set optimizer_switch='$subquery_optimization=off'
    • 102 Thanks! Q & A