Sergei Petrunia, MariaDB
New features in MariaDB/MySQL query optimizer
12:49:092
MySQL/MariaDB optimizer development
● Some features have common heritage
● Big releases:
– MariaDB 5.3/5.5
– MySQL 5.6
– (upcoming) MariaDB 10.0
12:49:093
New optimizer features
● Subqueries
– FROM
– IN
– Others
● Batched Key Access (MRR)
● Index Condition Pushdown
● Extended Keys
● EXPLAIN UPDATE/DELETE
● PERFORMANCE_SCHEMA
● Engine-independent statistics
● InnoDB persistent statistics
12:49:095
Subqueries in MySQL
● Subqueries are practically unusable
● e.g. Facebook disabled them in the parser
● Reason - “naive execution”.
12:49:096
Naive subquery execution
● For IN (SELECT ...) subqueries:
select * from hotel
where
  hotel.country='USA' and
  hotel.name IN (select hotel_stays.hotel
                 from hotel_stays
                 where hotel_stays.customer='John Smith')
● Naive execution:
for (each hotel in USA) {
  if (john smith stayed here) {
    …
  }
}
● Slow!
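The naive loop for the IN subquery can be sketched in Python. This is only an illustration under assumptions: the in-memory lists `hotels` and `hotel_stays` and both function names are hypothetical, and the second function shows one possible alternative (run the subquery once and keep its result in a hash set), not necessarily the strategy either server chooses.

```python
# Hypothetical in-memory tables; not real server data structures.
hotels = [
    {"name": "Plaza", "country": "USA"},
    {"name": "Ritz", "country": "USA"},
    {"name": "Krasnapolsky", "country": "NL"},
]
hotel_stays = [
    {"hotel": "Plaza", "customer": "John Smith"},
    {"hotel": "Krasnapolsky", "customer": "John Smith"},
    {"hotel": "Ritz", "customer": "Jane Doe"},
]


def naive_in_subquery(hotels, hotel_stays, customer):
    """Re-run the subquery for every outer row; return (result, rows examined)."""
    examined = 0
    result = []
    for hotel in hotels:
        if hotel["country"] != "USA":
            continue
        # The inner scan restarts from scratch for each USA hotel.
        for stay in hotel_stays:
            examined += 1
            if stay["customer"] == customer and stay["hotel"] == hotel["name"]:
                result.append(hotel["name"])
                break
    return result, examined


def materialized_in_subquery(hotels, hotel_stays, customer):
    """Run the subquery once, then probe a hash set for each outer row."""
    examined = 0
    stayed_at = set()
    for stay in hotel_stays:
        examined += 1
        if stay["customer"] == customer:
            stayed_at.add(stay["hotel"])
    result = [h["name"] for h in hotels
              if h["country"] == "USA" and h["name"] in stayed_at]
    return result, examined


naive_result, naive_cost = naive_in_subquery(hotels, hotel_stays, "John Smith")
fast_result, fast_cost = materialized_in_subquery(hotels, hotel_stays, "John Smith")
```

Both functions return the same hotels. The naive version's inner-table cost grows with the number of outer rows times the size of `hotel_stays`, while the set-based version scans `hotel_stays` once.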
12:49:097
Naive subquery execution (2)
● For FROM (SELECT …) subqueries:
select *
from
  (select *
   from hotel
   where hotel.rooms > 500
  ) as big_hotel
where
  big_hotel.nearest_aiport='AMS';
● Naive execution:
1. Retrieve all hotels with > 500 rooms, store in a temporary table big_hotel;
2. Search in big_hotel for hotels near AMS.
● Slow!
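The two-step naive plan for this FROM subquery can be sketched in Python. The rows in `hotel` and both function names are hypothetical, and the merged variant is only one way to avoid the temporary table (checking both conditions in a single pass). Treat it as an illustration of the idea, not as the servers' implementation.

```python
# Hypothetical rows of the hotel table (column spelling follows the query).
hotel = [
    {"name": "A", "rooms": 800, "nearest_aiport": "AMS"},
    {"name": "B", "rooms": 650, "nearest_aiport": "JFK"},
    {"name": "C", "rooms": 120, "nearest_aiport": "AMS"},
    {"name": "D", "rooms": 900, "nearest_aiport": "SVO"},
]


def naive_derived_table(hotel):
    """Step 1: fill a temporary table. Step 2: scan it with the outer WHERE."""
    big_hotel = [row for row in hotel if row["rooms"] > 500]  # temporary table
    result = [row["name"] for row in big_hotel
              if row["nearest_aiport"] == "AMS"]
    return result, len(big_hotel)


def merged_derived_table(hotel):
    """Check both conditions in one pass over hotel; nothing is stored."""
    result = [row["name"] for row in hotel
              if row["rooms"] > 500 and row["nearest_aiport"] == "AMS"]
    return result, 0


naive_result, naive_temp_rows = naive_derived_table(hotel)
merged_result, merged_temp_rows = merged_derived_table(hotel)
```

The second return value is the number of rows written to the temporary table. In the naive plan it grows with every large hotel, even those nowhere near AMS.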
12:49:098
New subquery optimizations
● Handle IN (SELECT ...)
● Handle FROM (SELECT …)
● Handle a lot of cases
● Comparison with PostgreSQL
– ~1000x slower before
– ~same order of magnitude now
● Releases
– MySQL 6.0
– MariaDB 5.5
  ● Sheeri Kritzer @ Mozilla seems happy with this one
– MySQL 5.6
  ● Subset of MariaDB 5.5's features
12:49:099
Subquery optimizations - summary
● Subqueries were generally unusable before MariaDB 5.3/5.5
● “Core” subquery optimizations are in
– MariaDB 5.3/5.5
– MySQL 5.6
● MariaDB has extra additions
● Further information:
https://kb.askmonty.org/en/subquery-optimizations/
12:49:0911
Batched Key Access - background
● Big, IO-bound joins were slow
– DBT-3 benchmark could not finish*
● Reason?
● Nested Loops join hits the second table at random locations.
12:49:0912
Batched Key Access idea
[Figure: Nested Loops Join vs. Batched Key Access]
Speedup reasons
● Fewer disk head movements
● Cache-friendliness
● Prefetch-friendliness
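The "fewer disk head movements" argument can be illustrated with a small Python model. Everything here is an assumption made for the sketch: the `index` mapping from join key to on-disk position, the arrival order of `outer_keys`, and the cost model (total distance between consecutive positions). It is not how either server measures or implements BKA.

```python
def head_movement(positions):
    """Total distance a disk head travels when visiting positions in order."""
    travelled, current = 0, 0
    for pos in positions:
        travelled += abs(pos - current)
        current = pos
    return travelled


def nested_loops_order(outer_keys, index):
    """Visit the second table in the order the outer rows arrive."""
    return [index[key] for key in outer_keys]


def batched_key_access_order(outer_keys, index, buffer_size):
    """Collect keys in a join buffer, then visit each batch in position order."""
    visits = []
    for start in range(0, len(outer_keys), buffer_size):
        batch = outer_keys[start:start + buffer_size]
        visits.extend(sorted(index[key] for key in batch))
    return visits


# Hypothetical index: join key -> position of the matching row on disk.
index = {key: key * 10 for key in range(100)}
# Outer rows arrive in an order unrelated to the inner table's layout.
outer_keys = [(key * 37) % 100 for key in range(100)]

nl_cost = head_movement(nested_loops_order(outer_keys, index))
bka_small = head_movement(batched_key_access_order(outer_keys, index, 10))
bka_large = head_movement(batched_key_access_order(outer_keys, index, 100))
```

In this model a larger buffer means longer sorted runs, so `bka_large` is lower than `bka_small`, and both are lower than `nl_cost`.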
12:49:0913
Batched Key Access benchmark
set join_cache_level=6; -- enable BKA
select max(l_extendedprice)
from orders, lineitem
where
l_orderkey=o_orderkey and
o_orderdate between $DATE1 and $DATE2
Run with
● Various join_buffer_size settings
● Various size of $DATE1...$DATE2 range
12:49:0914
Batched Key Access benchmark (2)
[Figure: BKA join performance depending on buffer size. X axis: buffer size, bytes; Y axis: query time, sec. Series: query_size=1, 2, 3, each with regular join and with BKA. Annotations: “Performance without BKA” and “Performance with BKA, given sufficient buffer size”.]
12:49:0915
Batched Key Access summary
● Optimization for big, IO-bound joins
– Orders-of-magnitude speedups
● Available in
– MariaDB 5.3/5.5 (more advanced)
– MySQL 5.6
● Not fully automatic yet
– Needs to be manually enabled
– Needs buffer sizes to be set.
12:49:0917
Index Condition Pushdown
alter table lineitem add index s_r (l_shipdate, l_receiptdate);
select count(*) from lineitem
where
l_shipdate between '1993-01-01' and '1993-02-01' and
datediff(l_receiptdate,l_shipdate) > 25 and
l_quantity > 40
● A new feature in MariaDB 5.3/ MySQL 5.6
+----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+
| 1 | SIMPLE | lineitem | range | s_r | s_r | 4 | NULL | 158854 | Using index condition; Using where |
+----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+
1. Read index records in the range
   l_shipdate between '1993-01-01' and '1993-02-01'
2. Check the index condition (← New! Filters out records before table rows are read)
   datediff(l_receiptdate,l_shipdate) > 25
3. Read full table rows
4. Check the WHERE condition
   l_quantity > 40
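The four steps can be traced with a small Python sketch. The four `lineitem` rows, the tuple layout of the `s_r` index, and the function name `count_rows` are all hypothetical. The sketch only counts how many full rows are fetched with and without the index condition check.

```python
from datetime import date

# Hypothetical lineitem rows: rowid -> full row.
lineitem = {
    1: {"l_shipdate": date(1993, 1, 5), "l_receiptdate": date(1993, 2, 4), "l_quantity": 45},
    2: {"l_shipdate": date(1993, 1, 10), "l_receiptdate": date(1993, 1, 15), "l_quantity": 50},
    3: {"l_shipdate": date(1993, 1, 20), "l_receiptdate": date(1993, 2, 20), "l_quantity": 10},
    4: {"l_shipdate": date(1993, 3, 1), "l_receiptdate": date(1993, 4, 1), "l_quantity": 48},
}
# Index s_r holds (l_shipdate, l_receiptdate, rowid), sorted.
s_r = sorted((r["l_shipdate"], r["l_receiptdate"], rowid)
             for rowid, r in lineitem.items())


def count_rows(use_icp):
    """Return (count(*), number of full table rows read)."""
    matches, rows_read = 0, 0
    for shipdate, receiptdate, rowid in s_r:
        # 1. Read index records in the range.
        if not (date(1993, 1, 1) <= shipdate <= date(1993, 2, 1)):
            continue
        # 2. Check the index condition using index columns only.
        if use_icp and not (receiptdate - shipdate).days > 25:
            continue
        # 3. Read the full table row.
        row = lineitem[rowid]
        rows_read += 1
        # 4. Check the WHERE condition (all of it, for simplicity).
        if ((row["l_receiptdate"] - row["l_shipdate"]).days > 25
                and row["l_quantity"] > 40):
            matches += 1
    return matches, rows_read


without_icp = count_rows(use_icp=False)
with_icp = count_rows(use_icp=True)
```

Both calls return the same count. With the index condition enabled, the row that fails the `datediff` check is rejected before its full row is read.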
12:49:0918
Index Condition Pushdown - conclusions
Summary
● Applicable to any index-based access (ref, range, etc)
● Checks parts of WHERE after reading the index
● Reduces number of table records to be read
● Speedup can be like in “Using index”
– Great for IO-bound load (5x, 10x)
– Some for CPU-bound workload (2x)
Conclusions
● Have a selective condition on column?
– Put the column into index, at the end.
12:49:0919
Extended keys
● Before: optimizer has limited support for “tail” columns
– 'Using index' supports it
– ORDER BY col1, col2, pk1 supports it
● After MariaDB 5.5/ MySQL 5.6
– all parts of optimizer (ref access, range access, etc) can use the “tail”
CREATE TABLE tbl (
pk1 sometype,
pk2 sometype,
...
col1 sometype,
col2 sometype,
...
KEY indexA (col1, col2)
...
PRIMARY KEY (pk1, pk2)
) ENGINE=InnoDB
● Secondary indexes in InnoDB have an invisible “tail”: indexA is stored as (col1, col2, pk1, pk2)
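A rough Python sketch of why the “tail” matters for range access. The contents of `index_a`, the helper `entries_scanned`, and the example condition are all hypothetical. The sketch models the index as sorted tuples and compares how many entries fall into the range when only the declared columns are used and when the primary key column is used as well.

```python
from bisect import bisect_left, bisect_right

# Hypothetical contents of indexA: each entry is (col1, col2, pk1, pk2),
# i.e. the declared columns followed by the primary key "tail".
index_a = sorted(
    (col1, col2, pk1, pk2)
    for col1 in (1, 2)
    for col2 in (10, 20)
    for pk1 in range(5)
    for pk2 in range(3)
)


def entries_scanned(prefix):
    """Number of index entries whose leading columns equal prefix."""
    n = len(prefix)
    keys = [entry[:n] for entry in index_a]
    return bisect_right(keys, prefix) - bisect_left(keys, prefix)


# WHERE col1 = 2 AND col2 = 10 AND pk1 = 3
declared_only = entries_scanned((2, 10))   # range built on (col1, col2)
with_tail = entries_scanned((2, 10, 3))    # range built on (col1, col2, pk1)
```

With the tail usable, the range covers only the entries for `pk1 = 3` instead of every entry with the matching `(col1, col2)` pair.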
12:49:0921
Better EXPLAIN in MySQL 5.6
● EXPLAIN for UPDATE/DELETE/INSERT … SELECT
– shows the query plan for finding the records to update/delete
mysql> explain update customer set c_acctbal = c_acctbal - 100 where c_custkey=12354;
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | customer | range | PRIMARY | PRIMARY | 4 | NULL | 1 | Using where |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
● EXPLAIN FORMAT=JSON
– Produces [big] JSON output
– Shows more information:
● Shows conditions attached to tables
● Shows whether “Using temporary; using filesort” is done to handle GROUP BY or ORDER BY.
● Shows where subqueries are attached
– No other known additions
– Will be in MariaDB 10.0
The most useful addition!
12:49:0922
EXPLAIN FORMAT=JSON
What are the “conditions attached to tables”?
explain
select
count(*)
from
orders, customer
where
customer.c_custkey=orders.o_custkey and
customer.c_mktsegment='BUILDING' and
orders.o_totalprice > customer.c_acctbal and
orders.o_orderpriority='1-URGENT'
+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
| 1 | SIMPLE | customer | ALL | PRIMARY | NULL | NULL | NULL | 1509871 | Using where |
| 1 | SIMPLE | orders | ref | i_o_custkey | i_o_custkey | 5 | dbt3sf10.customer.c_custkey | 7 | Using where |
+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
?
12:49:0923
EXPLAIN FORMAT=JSON (2)
{
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {
        "table": {
          "table_name": "customer",
          "access_type": "ALL",
          "possible_keys": [
            "PRIMARY"
          ],
          "rows": 1509871,
          "filtered": 100,
          "attached_condition": "(`dbt3sf10`.`customer`.`c_mktsegment` = 'BUILDING')"
        }
      },
      {
        "table": {
          "table_name": "orders",
          "access_type": "ref",
          "possible_keys": [
            "i_o_custkey"
          ],
          "key": "i_o_custkey",
          "used_key_parts": [
            "o_custkey"
          ],
          "key_length": "5",
          "ref": [
            "dbt3sf10.customer.c_custkey"
          ],
          "rows": 7,
          "filtered": 100,
          "attached_condition": "((`dbt3sf10`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf10`.`orders`.`o_totalprice` > `dbt3sf10`.`customer`.`c_acctbal`))"
        }
      }
    ]
  }
}
+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
| 1 | SIMPLE | customer | ALL | PRIMARY | NULL | NULL | NULL | 1509871 | Using where |
| 1 | SIMPLE | orders | ref | i_o_custkey | i_o_custkey | 5 | dbt3sf10.customer.c_custkey | 7 | Using where |
+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
12:49:0924
EXPLAIN ANALYZE (kind of)
● Does EXPLAIN match the reality?
● Where is most of the time spent?
● MySQL/MariaDB don't have “EXPLAIN ANALYZE” ...
select
count(*)
from
orders, customer
where
customer.c_custkey=orders.o_custkey and
customer.c_mktsegment='BUILDING' and orders.o_orderpriority='1-URGENT'
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
| 1 | SIMPLE | customer | ALL | PRIMARY | NULL | NULL | NULL | 149415 | Using where |
| 1 | SIMPLE | orders | ref | i_o_custkey | i_o_custkey | 5 | customer.c_custkey | 7 | Using index |
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
12:49:0925
Traditional solution: Status variables
Problems:
● Only #rows counters
● all tables are counted together
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)
mysql> {run query}
mysql> show status like 'Handler%';
+----------------------------+--------+
| Variable_name | Value |
+----------------------------+--------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_icp_attempts | 0 |
| Handler_icp_match | 0 |
| Handler_mrr_init | 0 |
| Handler_mrr_key_refills | 0 |
| Handler_mrr_rowid_refills | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 30142 |
| Handler_read_last | 0 |
| Handler_read_next | 303959 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_deleted | 0 |
| Handler_read_rnd_next | 150001 |
| Handler_rollback | 0 |
...
. . .
12:49:0926
Newer solution: userstat
● In Facebook patch, Percona, MariaDB:
mysql> set global userstat=1;
mysql> flush table_statistics;
mysql> flush index_statistics;
mysql> {query}
mysql> show table_statistics;
+--------------+------------+-----------+--------------+-------------------------+
| Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+--------------+------------+-----------+--------------+-------------------------+
| dbt3sf1 | orders | 303959 | 0 | 0 |
| dbt3sf1 | customer | 150000 | 0 | 0 |
+--------------+------------+-----------+--------------+-------------------------+
mysql> show index_statistics;
+--------------+------------+-------------+-----------+
| Table_schema | Table_name | Index_name | Rows_read |
+--------------+------------+-------------+-----------+
| dbt3sf1 | orders | i_o_custkey | 303959 |
+--------------+------------+-------------+-----------+
● Counters are per-table
– Ok as long as you don't have self-joins
● Overhead is negligible
● Counters are server-wide (other queries affect them, too)
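One way to use these counters to answer “Does EXPLAIN match the reality?” is to put the estimates and the observed values side by side. The Python sketch below copies its numbers from the EXPLAIN output and the counters shown in the slides. Two things are assumptions: that all of them come from the same run, and that `Handler_read_key` counts the index lookups into `orders`. The variable names are made up for the example.

```python
# Numbers copied from the EXPLAIN output and counters shown in the slides.
explain_rows_customer = 149415   # EXPLAIN estimate for the customer scan
explain_rows_per_lookup = 7      # EXPLAIN estimate for ref access on orders

rows_read_customer = 150000      # show table_statistics
rows_read_orders = 303959        # show table_statistics / index_statistics
index_lookups = 30142            # Handler_read_key (assumed: lookups into orders)

# Observed rows read from orders per index lookup.
observed_rows_per_lookup = rows_read_orders / index_lookups
# How far the estimates are from the observed values.
scan_ratio = rows_read_customer / explain_rows_customer
lookup_ratio = observed_rows_per_lookup / explain_rows_per_lookup

print(f"customer scan: estimated {explain_rows_customer}, read {rows_read_customer}")
print(f"orders: estimated {explain_rows_per_lookup} rows per lookup, "
      f"read about {observed_rows_per_lookup:.1f}")
```

Under these assumptions the scan estimate is very close to the observed value, while the per-lookup estimate for `orders` is somewhat lower than what was actually read.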
12:49:0927
Latest addition: PERFORMANCE_SCHEMA
● Allows measuring the *time* spent reading each table
● Has some visible overhead (Facebook's tests: 7%)
● Counters are system-wide
● Still no luck with self-joins
mysql> truncate performance_schema.table_io_waits_summary_by_table;
mysql> {query}
mysql> select
object_schema,
object_name,
count_read,
sum_timer_read, -- this is picoseconds
sum_timer_read / (1000*1000*1000*1000) as read_seconds -- this is seconds
from
performance_schema.table_io_waits_summary_by_table
where
object_schema = 'dbt3sf1' and object_name in ('orders','customer');
+---------------+-------------+------------+----------------+--------------+
| object_schema | object_name | count_read | sum_timer_read | read_seconds |
+---------------+-------------+------------+----------------+--------------+
| dbt3sf1 | orders | 334101 | 5739345397323 | 5.7393 |
| dbt3sf1 | customer | 150001 | 1273653046701 | 1.2737 |
+---------------+-------------+------------+----------------+--------------+
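The timer columns are easier to compare after conversion. The Python sketch below redoes the picosecond-to-second division from the query and adds two derived figures, time per read and share of total read time. Those two are not part of the original output. The numbers are copied from the result above and the names are made up for the example.

```python
PICOSECONDS_PER_SECOND = 1000 * 1000 * 1000 * 1000

# (object_name, count_read, sum_timer_read) copied from the output above.
table_io = [
    ("orders", 334101, 5739345397323),
    ("customer", 150001, 1273653046701),
]

total_ps = sum(timer for _, _, timer in table_io)
report = {}
for name, count_read, timer_ps in table_io:
    report[name] = {
        # Same conversion as the read_seconds column in the query.
        "read_seconds": timer_ps / PICOSECONDS_PER_SECOND,
        # Average time per read operation, in microseconds.
        "microseconds_per_read": timer_ps / count_read / 1_000_000,
        # Fraction of the total read time across both tables.
        "share_of_read_time": timer_ps / total_ps,
    }
```

In this run most of the read time goes to `orders`, which is the kind of per-table breakdown the row counters alone cannot give.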
12:49:0929
What are table/index statistics?
select
count(*)
from
customer, orders
where
customer.c_custkey=orders.o_custkey and customer.c_mktsegment='BUILDING';
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
| 1 | SIMPLE | customer | ALL | PRIMARY | NULL | NULL | NULL | 148305 | Using where |
| 1 | SIMPLE | orders | ref | i_o_custkey | i_o_custkey | 5 | customer.c_custkey | 7 | Using index |
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
MariaDB > show table status like 'orders'\G
*************************** 1. row ***************************
Name: orders
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 1495152
.............
MariaDB > show keys from orders where key_name='i_o_custkey'\G
*************************** 1. row ***************************
Table: orders
Non_unique: 1
Key_name: i_o_custkey
Seq_in_index: 1
Column_name: o_custkey
Collation: A
Cardinality: 212941
Sub_part: NULL
.................
Where does rows=7 for orders come from?
1495152 / 212941 ≈ 7
“There are on average 7 orders for a given c_custkey”
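The same arithmetic, written out in Python. The input numbers are copied from the `show table status`, `show keys` and EXPLAIN output above. The final multiplication is an added illustration that ignores the `c_mktsegment` filter, so it is only a rough figure and not something the optimizer is claimed to compute this way.

```python
# Values copied from the SHOW TABLE STATUS and SHOW KEYS output above.
table_rows = 1495152         # Rows for table orders
index_cardinality = 212941   # Cardinality of i_o_custkey

# Average number of orders rows per distinct o_custkey value.
rows_per_key = table_rows / index_cardinality

# EXPLAIN estimate for the customer scan; multiplied by the rounded
# rows_per_key it gives a rough number of orders index entries read
# if every scanned customer row triggered a lookup.
customer_rows = 148305
estimated_orders_rows = customer_rows * round(rows_per_key)

print(f"rows per key: {rows_per_key:.2f}")
print(f"estimated orders rows read: {estimated_orders_rows}")
```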
12:49:0930
The problem with index statistics and InnoDB
MySQL 5.5, InnoDB
● Statistics are calculated on-the-fly
– When the table is opened (server restart, DDL)
– When a sufficient number of records have been updated
– ...
● Calculation uses random sampling
– @@innodb_stats_sample_pages
● Result:
– Statistics change without warning
=> Query plans change without warning
● For example, DBT-3 benchmark
– 22 analytics queries
– Plans-per-query: avg=2.8, max=7.
12:49:0931
Persistent table statistics
Persistent statistics v1
● Percona Server 5.5 (ported to MariaDB 5.5)
– Need to enable it: innodb_use_sys_stats_table=1
● Statistics is stored inside InnoDB
– User-visible through information_schema.innodb_sys_stats (read-only)
● Setting innodb_stats_auto_update=OFF prevents unexpected updates
Persistent statistics v2
● MySQL 5.6
– Enabled by default: innodb_stats_persistent=1
● Stored in regular InnoDB tables
– mysql.innodb_table_stats, mysql.innodb_index_stats
● Setting innodb_stats_auto_recalc=OFF prevents unexpected updates
● Can also specify persistence/auto-recalc as a table option
12:49:0932
Persistent table statistics - summary
● Percona, then MySQL
– Made statistics persistent
– Disallowed automatic updates
● Remaining issue #1: it's still random sampling
– DBT-3 benchmark
– scale=30
– Re-ran EXPLAINs for benchmark queries
– Counted different query plans
● Remaining issue #2: limited amount of statistics
– Only on index columns
– Only AVG(#different_values)
12:49:0933
Upcoming: Engine-independent statistics
MariaDB 10.0: Engine-independent statistics
● Collected/used on SQL layer
● No auto updates, only ANALYZE TABLE
– 100% precise statistics
● More statistics
– Index statistics (like before)
– Table statistics (like before)
– Column statistics
● MIN/MAX values
● Number of NULL / not NULL values
● Histograms
● => Optimizer will be smarter and more reliable
12:49:0934
Conclusions
● Lots of new query optimizer features recently
– Subqueries now just work
– Big joins are much faster
  ● Need to turn it on
– More diagnostics
● Even more is coming
● Releases with features
– MariaDB 5.5
– MySQL 5.6
– (upcoming) MariaDB 10.0
12:49:0936
Thanks
Q & A
