ANALYZE for statements: MariaDB’s hidden gem
Includes a comparison with MySQL’s EXPLAIN ANALYZE
Sergei Petrunia, MariaDB
MariaDB Fest, September 2020
ANALYZE for statements
● Introduced in MariaDB 10.1 (Oct 2015)
− Got less recognition than it deserves
● Improved in subsequent MariaDB versions, based on experience
● MySQL 8.0.18 (Oct 2019) introduced EXPLAIN ANALYZE
− A very similar feature; let’s compare.
The plan
● ANALYZE in MariaDB
− ANALYZE
− ANALYZE FORMAT=JSON
● EXPLAIN ANALYZE in MySQL
− Description and comparison
Background: EXPLAIN
● Sometimes the problem is apparent
● Sometimes it is not
− Does the query plan match reality?
− Where was the time spent?
explain select *
from lineitem, orders
where o_orderkey=l_orderkey and
o_orderdate between '1990-01-01' and '1998-12-06' and
l_extendedprice > 1000000
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |filtered|Extra |
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+
|1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278| 50.00 |Using where|
|1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 | 100.00 |Using where|
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+
ANALYZE vs EXPLAIN
EXPLAIN:
● Optimize the query
● Produce EXPLAIN output
ANALYZE:
● Optimize the query
● Run the query
− Collect execution statistics
− Discard query output
● Produce EXPLAIN output
− Also includes execution statistics
Tabular ANALYZE statement
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra |
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where|
|1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where|
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
analyze select *
from lineitem, orders
where o_orderkey=l_orderkey and
o_orderdate between '1990-01-01' and '1998-12-06' and
l_extendedprice > 1000000
r_ is for “real”
● r_rows – observed number of rows
● r_filtered – observed condition selectivity
Interpreting r_rows
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra |
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where|
|1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where|
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
● ALL/index
− r_rows ≈ rows
− Different in case of LIMIT or subqueries
● range/index_merge
− Up to ~2x difference with InnoDB
− Bigger difference in edge cases (IGNORE INDEX?)
Interpreting r_rows (2)
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra |
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where|
|1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where|
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
For ref access:
● rows is AVG(records per key value)
● Some discrepancy is normal
● A big discrepancy (>10x) is worth investigating:
− rows=1 and r_rows >> rows? No index statistics (run ANALYZE TABLE)
− Column has lots of NULL values? (check innodb_stats_method)
− Skewed data distribution?
● A complex case; consider IGNORE INDEX.
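When the rows estimate for ref access is off because index statistics are missing or stale, refreshing them is the usual first step. A sketch against the DBT-3 tables used in the examples; the innodb_stats_method change is only relevant for NULL-heavy columns:

```sql
-- Rebuild index statistics so that "rows" for ref access reflects
-- the actual average number of records per key value.
ANALYZE TABLE lineitem;

-- For columns with many NULLs, how NULLs are grouped affects the
-- estimate; nulls_unequal counts each NULL as its own group.
SET GLOBAL innodb_stats_method = 'nulls_unequal';
```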
Interpreting r_filtered
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra |
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where|
|1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where|
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
● (r_)filtered is the selectivity of “Using where”
● r_filtered << 100% → reading and discarding lots of rows
− Check whether the conditions allow index use
● Use EXPLAIN|ANALYZE FORMAT=JSON to see the condition
− Consider adding indexes
− Don’t chase r_filtered=100.00; there is a tradeoff between reads and writes.
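For the running example, acting on lineitem’s low r_filtered could mean indexing the filtered column. The index name below is made up for illustration:

```sql
-- Hypothetical index so that rows failing l_extendedprice > 1000000
-- are filtered via the index instead of being read and discarded.
ALTER TABLE lineitem ADD INDEX i_l_extendedprice (l_extendedprice);
-- Re-run the ANALYZE statement and compare r_filtered afterwards.
```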
analyze select *
from lineitem, orders
where o_orderkey=l_orderkey and
o_orderdate between '1990-01-01' and '1998-12-06' and
l_extendedprice > 1000000
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra |
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where|
|1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where|
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
r_filtered and query plans
● filtered
− Shows how many rows will be removed from consideration
− Is important for N-way join optimization
● r_filtered ≠ filtered?
− The optimizer doesn’t know the condition’s selectivity → poor plans
− Consider collecting histogram(s) on the columns used by the condition
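A sketch of collecting a histogram in MariaDB; the variable values are illustrative, and a high enough optimizer_use_condition_selectivity level is needed for the optimizer to actually use histogram data:

```sql
-- Collect engine-independent statistics, including a histogram,
-- on the column used in the WHERE condition.
SET SESSION histogram_size = 254;
ANALYZE TABLE lineitem PERSISTENT FOR COLUMNS (l_extendedprice) INDEXES ();

-- Make the optimizer read and use these statistics.
SET SESSION use_stat_tables = 'preferably';
SET SESSION optimizer_use_condition_selectivity = 4;
```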
Tabular ANALYZE summary
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra |
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where|
|1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where|
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
● New columns:
− r_rows
− r_filtered
● Can check estimates vs reality
ANALYZE FORMAT=JSON
● EXPLAIN FORMAT=JSON + ANALYZE = ANALYZE FORMAT=JSON
Basics
{
"query_block": {
"select_id": 1,
"r_loops": 1,
"r_total_time_ms": 190.15729,
"table": {
"table_name": "orders",
"access_type": "ALL",
"r_loops": 1,
"rows": 1492405,
"r_rows": 1500000,
"r_table_time_ms": 142.0726678,
"r_other_time_ms": 48.07306875,
"filtered": 100,
"r_filtered": 85.9236,
"attached_condition": "orders.o_totalprice > 50000"
}
}
}
analyze format=json
select count(*) from orders where o_totalprice > 50000
● r_rows
● r_filtered
● r_loops – number of times the operation was executed
● r_..._time_ms – total time spent
− r_table_time_ms – time spent reading data from the table
− r_other_time_ms – time spent checking the WHERE clause, constructing lookup keys, etc.
A complex query: join
{
"query_block": {
"select_id": 1,
"r_loops": 1,
"r_total_time_ms": 4056.659499,
"table": {
"table_name": "orders",
"access_type": "ALL",
"possible_keys": ["PRIMARY", "i_o_orderdate"],
"r_loops": 1,
"rows": 1492405,
"r_rows": 1500000,
"r_table_time_ms": 323.3849353,
"r_other_time_ms": 118.576661,
"filtered": 49.99996567,
"r_filtered": 100,
"attached_condition": "orders.o_orderDATE between '1990-01-01' and '1998-12-06'"
},
"table": {
"table_name": "lineitem",
"access_type": "ref",
"possible_keys": ["i_l_orderkey", "i_l_orderkey_quantity"],
"key": "PRIMARY",
"key_length": "4",
"used_key_parts": ["l_orderkey"],
"ref": ["dbt3sf1.orders.o_orderkey"],
"r_loops": 1500000,
"rows": 2,
"r_rows": 4.00081,
"r_table_time_ms": 3454.758327,
"r_other_time_ms": 159.9274175,
"filtered": 100,
"r_filtered": 0,
"attached_condition": "lineitem.l_extendedprice > 1000000"
}
}
}
analyze format=json
select *
from
lineitem, orders
where
o_orderkey=l_orderkey and
o_orderdate between '1990-01-01'
and '1998-12-06' and
l_extendedprice > 1000000
● For each table, check:
− r_table_time_ms, r_other_time_ms
− r_loops
● Instantly shows the problem areas
A complex query: subquery
{
"query_block": {
"select_id": 1,
"r_loops": 1,
"r_total_time_ms": 1104.099893,
"table": {
"table_name": "OT",
"access_type": "range",
"possible_keys": ["i_o_orderdate"],
"key": "i_o_orderdate",
"key_length": "4",
"used_key_parts": ["o_orderDATE"],
"r_loops": 1,
"rows": 78708,
"r_rows": 39162,
"r_table_time_ms": 63.71686522,
"r_other_time_ms": 5.819774094,
"filtered": 100,
"r_filtered": 0.002553496,
"index_condition": "OT.o_orderDATE between '1998-06-01' and '1998-12-06'",
"attached_condition": "OT.o_totalprice > 0.9 * (subquery#2)"
},
"subqueries": [
{
"expression_cache": {
"state": "disabled",
"r_loops": 200,
"r_hit_ratio": 0,
"query_block": {
"select_id": 2,
"r_loops": 39162,
"r_total_time_ms": 1032.541544,
"outer_ref_condition": "OT.o_custkey is not null",
"table": {
"table_name": "orders",
"access_type": "ref",
"possible_keys": ["i_o_custkey"],
"key": "i_o_custkey",
"key_length": "5",
"used_key_parts": ["o_custkey"],
"ref": ["dbt3sf1.OT.o_custkey"],
"r_loops": 39162,
"rows": 7,
"r_rows": 17.73660691,
"r_table_time_ms": 985.8381759,
"r_other_time_ms": 32.85006493,
"filtered": 100,
"r_filtered": 100
}
}
}}]}}
analyze format=json
select *
from orders OT
where
o_orderdate between '1998-06-01' and
'1998-12-06' and
o_totalprice >
0.9 * (select
sum(o_totalprice)
from
orders
where
o_custkey=OT.o_custkey)
● Each subquery is a query_block
● It has r_total_time_ms and r_loops
− the time includes the children’s time
● Narrowing down problems in complex queries is now easy!
● Can also see the subquery cache
− it was used but had r_hit_ratio=0, so it disabled itself.
Sorting
"query_block": {
"select_id": 1,
"r_loops": 1,
"r_total_time_ms": 52.68082832,
"read_sorted_file": {
"r_rows": 50,
"filesort": {
"sort_key": "customer.c_acctbal desc",
"r_loops": 1,
"r_total_time_ms": 52.65527147,
"r_limit": 50,
"r_used_priority_queue": true,
"r_output_rows": 51,
"r_sort_mode": "sort_key,addon_fields",
"table": {
"table_name": "customer",
"access_type": "ALL",
"r_loops": 1,
"rows": 148473,
"r_rows": 150000,
"r_table_time_ms": 35.72704671,
"r_other_time_ms": 16.90903666,
"filtered": 100,
"r_filtered": 100
}
}
}
}
}
analyze format=json
select * from customer
order by c_acctbal desc limit 50
● EXPLAIN shows “Using filesort”
− Will it really read/write files?
− Is my @@sort_buffer_size large enough?
− Is the priority queue optimization used?
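If ANALYZE FORMAT=JSON later shows r_used_priority_queue: false together with nonzero r_sort_passes, one knob to try is the sort buffer; the size below is just an example value:

```sql
-- A larger sort buffer can eliminate the merge passes reported
-- as r_sort_passes (4MB here is illustrative, not a recommendation).
SET SESSION sort_buffer_size = 4 * 1024 * 1024;
```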
Sorting (2)
"query_block": {
"select_id": 1,
"r_loops": 1,
"r_total_time_ms": 52.68082832,
"read_sorted_file": {
"r_rows": 50,
"filesort": {
"sort_key": "customer.c_acctbal desc",
"r_loops": 1,
"r_total_time_ms": 52.65527147,
"r_limit": 50,
"r_used_priority_queue": true,
"r_output_rows": 51,
"r_sort_mode": "sort_key,addon_fields",
"table": {
"table_name": "customer",
"access_type": "ALL",
"r_loops": 1,
"rows": 148473,
"r_rows": 150000,
"r_table_time_ms": 35.72704671,
"r_other_time_ms": 16.90903666,
"filtered": 100,
"r_filtered": 100
}
}
}
}
}
analyze format=json
select * from customer
order by c_acctbal desc limit 50
analyze format=json
select * from customer
order by c_acctbal desc limit 50000
"query_block": {
"select_id": 1,
"r_loops": 1,
"r_total_time_ms": 85.70263992,
"read_sorted_file": {
"r_rows": 50000,
"filesort": {
"sort_key": "customer.c_acctbal desc",
"r_loops": 1,
"r_total_time_ms": 79.33072276,
"r_limit": 50000,
"r_used_priority_queue": false,
"r_output_rows": 150000,
"r_sort_passes": 1,
"r_buffer_size": "2047Kb",
"r_sort_mode": "sort_key,packed_addon_fields",
"table": {
"table_name": "customer",
"access_type": "ALL",
"r_loops": 1,
"rows": 148473,
"r_rows": 150000,
"r_table_time_ms": 37.21097011,
"r_other_time_ms": 42.09169676,
"filtered": 100,
"r_filtered": 100
}
}
}
}
}
ANALYZE and UPDATEs
● ANALYZE works for any statement that supports EXPLAIN
− UPDATE, DELETE
● DML statements will do the modifications!
− ANALYZE UPDATE ... will update
− ANALYZE DELETE ... will delete
− (PostgreSQL’s EXPLAIN ANALYZE also works like this)
● Don’t want modifications to be made? Wrap it in a transaction:
begin;
analyze update ...;
rollback;
Overhead of ANALYZE
● ANALYZE data is:
− Counters (cheap, always collected)
− Time measurements (expensive; collected only when running ANALYZE)
● Max overhead I observed: 10%
− An artificial example constructed to maximize the overhead
− Typically less
ANALYZE and slow query log
slow_query_log=ON
log_slow_verbosity=explain
...
# Time: 200817 12:12:47
# User@Host: root[root] @ localhost []
# Thread_id: 18 Schema: dbt3sf1 QC_hit: No
# Query_time: 3.948642 Lock_time: 0.000062 Rows_sent: 0 Rows_examined: 7501215
# Rows_affected: 0 Bytes_sent: 1756
#
# explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
# explain: 1 SIMPLE orders ALL PRIMARY,i_o_orderdate NULL NULL NULL 1492405 1500000.00 50.00 100.00 Using where
# explain: 1 SIMPLE lineitem ref PRIMARY,i_l_orderkey,i_l_orderkey_quantity PRIMARY 4 dbt3sf1.orders.o_orderkey 2 4.00 100.00 0.00 Using where
#
SET timestamp=1597655567;
select *
...
● log_slow_verbosity=explain (set in my.cnf) makes the ANALYZE output appear in hostname-slow.log
● Obstacles to printing JSON there:
− No timing data
− Keeping the format compatible with pt-query-digest?
ANALYZE FORMAT=JSON Summary
● It is EXPLAIN FORMAT=JSON with execution data
● Use r_rows and r_filtered to check estimates vs reality
● Use r_total_time_ms and r_loops to narrow down problems
● Sorting, buffering, and caches report r_ values about their execution
● Please do attach ANALYZE FORMAT=JSON output when asking for optimizer help or reporting an issue.
MySQL 8:
EXPLAIN ANALYZE
EXPLAIN ANALYZE in MySQL 8
● Introduced in MySQL 8.0.18 (Oct 2019)
● Produces PostgreSQL-like output
− (In the next slides, I will edit it for readability)
| -> Nested loop inner join (cost=642231.45 rows=1007917) (actual time=65.871..4580.448 rows=24 loops=1)
-> Filter: (orders.o_orderDATE between '1990-01-01' and '1998-12-06') (cost=151912.95 rows=743908) (actual
time=0.154..536.302 rows=1500000 loops=1)
-> Table scan on orders (cost=151912.95 rows=1487817) (actual time=0.146..407.074 rows=1500000 loops=1)
-> Filter: (lineitem.l_extendedprice > 104500) (cost=0.25 rows=1) (actual time=0.003..0.003 rows=0
loops=1500000)
-> Index lookup on lineitem using PRIMARY (l_orderkey=orders.o_orderkey) (cost=0.25 rows=4) (actual
time=0.002..0.002 rows=4 loops=1500000)
● Reason for this: the runtime was switched to the “Iterator Interface”.
Basics
● loops is like MariaDB’s r_loops
● rows is the number of rows in the output
● rows after a Filter / rows before it = 33705/39162 = 86% – this is r_filtered.
explain analyze
select * from orders
where
o_orderdate between '1998-06-01' and '1998-12-06' and
o_totalprice>50000
| -> Filter: (orders.o_totalprice > 50000) (cost=35418.86 rows=26233)
(actual time=0.607..51.881 rows=33705 loops=1)
-> Index range scan on orders using i_o_orderdate, with index
condition: (orders.o_orderDATE between '1998-06-01' and '1998-12-06')
(cost=35418.86 rows=78708)
(actual time=0.603..50.277 rows=39162 loops=1)
● actual time shows X..Y
− X – time to get the first tuple
− Y – time to get all output
− Both are averages across all loops.
A complex query: join
"query_block": {
"select_id": 1,
"r_loops": 1,
"r_total_time_ms": 4056.659499,
"table": {
"table_name": "orders",
"access_type": "ALL",
"possible_keys": ["PRIMARY", "i_o_orderdate"],
"r_loops": 1,
"rows": 1492405,
"r_rows": 1500000,
"r_table_time_ms": 323.3849353,
"r_other_time_ms": 118.576661,
"filtered": 49.99996567,
"r_filtered": 100,
"attached_condition": "orders.o_orderDATE between '1990-01-01' and '1998-12-06'"
},
"table": {
"table_name": "lineitem",
"access_type": "ref",
"possible_keys": ["i_l_orderkey", "i_l_orderkey_quantity"],
"key": "PRIMARY",
"key_length": "4",
"used_key_parts": ["l_orderkey"],
"ref": ["dbt3sf1.orders.o_orderkey"],
"r_loops": 1500000,
"rows": 2,
"r_rows": 4.00081,
"r_table_time_ms": 3454.758327,
"r_other_time_ms": 159.9274175,
"filtered": 100,
"r_filtered": 0,
"attached_condition": "lineitem.l_extendedprice > 1000000"
}
}
analyze format=json
select *
from
lineitem, orders
where
o_orderkey=l_orderkey and
o_orderdate between '1990-01-01'
and '1998-12-06' and
l_extendedprice > 1000000
-> Nested loop inner join (cost=642231.45 rows=1007917)
(actual time=4496.131..4496.131 rows=0 loops=1)
-> Filter: (orders.o_orderDATE between '1990-01-01' and '1998-12-06')
(cost=151912.95 rows=743908)
(actual time=0.162..534.395 rows=1500000 loops=1)
-> Table scan on orders (cost=151912.95 rows=1487817)
(actual time=0.154..402.573 rows=1500000 loops=1)
-> Filter: (lineitem.l_extendedprice > 1000000) (cost=0.25 rows=1)
(actual time=0.003..0.003 rows=0 loops=1500000)
-> Index lookup on lineitem using PRIMARY
(l_orderkey=orders.o_orderkey)
(cost=0.25 rows=4)
(actual time=0.002..0.002 rows=4 loops=1500000)
● r_total_time_ms = actual_time’s second value × loops. Practice your arithmetic skills :-)
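For instance, for the lineitem lookup above: 0.002 ms per loop × 1 500 000 loops ≈ 3000 ms, which is in the same ballpark as MariaDB’s r_table_time_ms of ~3455 ms for that table:

```sql
-- Reconstructing a MariaDB-style per-table total from MySQL's
-- per-loop "actual time": second value times the number of loops.
SELECT 0.002 * 1500000 AS approx_table_time_ms;  -- ~3000
```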
Sorting
explain analyze select * from customer order by c_acctbal desc limit 50
| -> Limit: 50 row(s) (actual time=55.830..55.837 rows=50 loops=1)
-> Sort: customer.c_acctbal DESC, limit input to 50 row(s) per chunk (cost=15301.60 rows=148446)
(actual time=55.830..55.835 rows=50 loops=1)
-> Table scan on customer (actual time=0.137..38.462 rows=150000 loops=1)
● No difference in the EXPLAIN ANALYZE output
● But look at the Optimizer Trace: it does show the algorithm used:
explain analyze select * from customer order by c_acctbal desc limit 50000
| -> Limit: 50000 row(s) (actual time=101.536..106.927 rows=50000 loops=1)
-> Sort: customer.c_acctbal DESC, limit input to 50000 row(s) per chunk (cost=15301.60 rows=148446)
(actual time=101.535..105.539 rows=50000 loops=1)
-> Table scan on customer (actual time=0.147..37.550 rows=150000 loops=1)
"filesort_priority_queue_optimization": {
"limit": 50,
"chosen": true
},
"filesort_priority_queue_optimization": {
"limit": 50000
},
"filesort_execution": [
],
"filesort_summary": {
"max_rows_per_buffer": 303,
"num_rows_estimate": 1400441,
"num_rows_found": 150000,
"num_initial_chunks_spilled_to_disk": 108,
"peak_memory_used": 270336,
"sort_algorithm": "std::stable_sort"
}
Handling for UPDATE/DELETE
● Single-table UPDATE/DELETE is not supported:
mysql> explain analyze delete from orders where o_orderkey=1234;
+----------------------------------------+
| EXPLAIN                                |
+----------------------------------------+
| <not executable by iterator executor>  |
+----------------------------------------+
1 row in set (0.00 sec)
− The same used to be true for outer joins but was fixed
● Multi-table UPDATE/DELETE is supported
− Unlike in MariaDB/PostgreSQL, it won’t make any changes.
MySQL’s EXPLAIN ANALYZE summary
● The syntax is EXPLAIN ANALYZE
● Postgres-like output
● Not all plan details are shown
− Some can be found in the Optimizer Trace
● Not all statements are supported
● Can have higher overhead
− Artificial “bad” example: 50% overhead (vs 10% in MariaDB)
● Shows extra info:
− Has nodes for grouping operations (worth adding?)
− Number of rows output by each node (worth adding, too?)
− “Time to get the first row” (not sure if useful?)
Final conclusions
● MariaDB has
− ANALYZE <statement>
− ANALYZE FORMAT=JSON <statement>
● A stable feature
● Very useful for diagnosing optimizer issues
● MySQL got EXPLAIN ANALYZE last year
− A similar feature
− Different interface, printed data, etc.
− No obvious advantages over MariaDB’s?
Thanks!
WSO2Con204 - Hard Rock Presentation - Keynote
 
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & InnovationWSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
BusinessGPT - Security and Governance for Generative AI
BusinessGPT  - Security and Governance for Generative AIBusinessGPT  - Security and Governance for Generative AI
BusinessGPT - Security and Governance for Generative AI
 
Driving Innovation: Scania's API Revolution with WSO2
Driving Innovation: Scania's API Revolution with WSO2Driving Innovation: Scania's API Revolution with WSO2
Driving Innovation: Scania's API Revolution with WSO2
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public AdministrationWSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
WSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital BusinessesWSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital Businesses
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 

ANALYZE for Statements - MariaDB's hidden gem

  • 1. ANALYZE for statements: MariaDB's hidden gem. Includes a comparison with MySQL’s EXPLAIN ANALYZE. Sergei Petrunia, MariaDB. MariaDB Fest, September 2020
  • 2. ANALYZE for statements ● Introduced in MariaDB 10.1 (Oct, 2015) − Got less recognition than it deserves ● Was improved in the next MariaDB versions − Based on experience ● MySQL 8.0.18 (Oct, 2019) introduced EXPLAIN ANALYZE − Very similar feature, let’s compare.
  • 3. The plan ● ANALYZE in MariaDB − ANALYZE − ANALYZE FORMAT=JSON ● EXPLAIN ANALYZE in MySQL − Description and comparison
  • 4. Background: EXPLAIN ● Sometimes problem is apparent ● Sometimes not − Query plan vs reality? − Where the time was spent? explain select * from lineitem, orders where o_orderkey=l_orderkey and o_orderdate between '1990-01-01' and '1998-12-06' and l_extendedprice > 1000000 +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |filtered|Extra | +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+ |1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278| 50.00 |Using where| |1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 | 100.00 |Using where| +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+
  • 5. ANALYZE vs EXPLAIN EXPLAIN: ● Optimize the query ● Produce EXPLAIN output ANALYZE: ● Optimize the query ● Run the query − Collect execution statistics − Discard query output ● Produce EXPLAIN output − Also with execution statistics
  • 6. Tabular ANALYZE statement +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra | +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ |1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where| |1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where| +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ analyze select * from lineitem, orders where o_orderkey=l_orderkey and o_orderdate between '1990-01-01' and '1998-12-06' and l_extendedprice > 1000000 r_ is for “real” ● r_rows – observed #rows ● r_filtered – observed condition selectivity.
  • 7. Interpreting r_rows +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra | +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ |1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where| |1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where| +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ ● ALL/index − r_rows ≈ rows − Different in case of LIMIT or subqueries ● range/index_merge − Up to ~2x difference with InnoDB − Bigger difference in edge cases (IGNORE INDEX?)
  • 8. Interpreting r_rows (2) +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra | +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ |1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where| |1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where| +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ For ref access ● rows is AVG(records for a key) ● Some discrepancy is normal ● Big discrepancy (>10x) is worth investigating − rows=1, r_rows >> rows ? No index statistics (ANALYZE TABLE) − Column has lots of NULL values? (innodb_stats_method) − Skewed data distribution? ●Complex, IGNORE INDEX.
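The statistics fixes mentioned above can be sketched as plain statements (a sketch against the deck's dbt3 schema; the exact session shown is illustrative, not from the slides):

```sql
-- rows=1 while r_rows >> 1 often means missing or stale index statistics;
-- refreshing them is the first thing to try:
ANALYZE TABLE lineitem;

-- If the join column holds many NULLs, the average-group-size estimate
-- also depends on how InnoDB counts NULLs (server-wide setting;
-- nulls_equal is the default and can inflate the estimate):
SET GLOBAL innodb_stats_method = 'nulls_unequal';
```

After refreshing, re-run the ANALYZE statement and compare rows against r_rows again.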
  • 9. Interpreting r_filtered analyze select * from lineitem, orders where o_orderkey=l_orderkey and o_orderdate between '1990-01-01' and '1998-12-06' and l_extendedprice > 1000000 +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra | +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ |1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where| |1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where| +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ ● (r_)filtered is selectivity of “Using where” ● r_filtered << 100% → reading and discarding lots of rows − Check if conditions allow index use ● Use EXPLAIN|ANALYZE FORMAT=JSON to see the condition − Consider adding indexes − Don't chase r_filtered=100.00, tradeoff between reads and writes.
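As a hedged illustration of the "consider adding indexes" advice (the index name is made up; whether it pays off depends on the write-load tradeoff the slide warns about):

```sql
-- r_filtered=0.00 on lineitem: every row fetched via the join key was
-- then discarded by the l_extendedprice test. Indexing that column lets
-- the optimizer consider starting from lineitem with a range scan:
ALTER TABLE lineitem ADD INDEX i_l_extendedprice (l_extendedprice);

-- Re-check estimates vs reality with the same statement:
ANALYZE SELECT *
FROM lineitem, orders
WHERE o_orderkey = l_orderkey
  AND o_orderdate BETWEEN '1990-01-01' AND '1998-12-06'
  AND l_extendedprice > 1000000;
```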
  • 10. r_filtered and query plans ● filtered − Shows how many rows will be removed from consideration − Is important for N-way join optimization ● r_filtered ≠ filtered − Optimizer doesn't know condition selectivity → poor plans − Consider collecting histogram(s) on columns used by the condition
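Collecting such a histogram in MariaDB could look like the following sketch (engine-independent statistics; the values shown are illustrative, not from the slides):

```sql
-- Build engine-independent statistics with a histogram on the column
-- used in the poorly-estimated condition:
SET SESSION histogram_size = 254;
ANALYZE TABLE orders PERSISTENT FOR COLUMNS (o_totalprice) INDEXES ();

-- Tell the optimizer to use those statistics and histograms:
SET SESSION use_stat_tables = 'preferably';
SET SESSION optimizer_use_condition_selectivity = 4;
```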
  • 11. Tabular ANALYZE summary +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra | +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ |1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where| |1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where| +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+ ● New columns: − r_rows − r_filtered ● Can check estimates vs reality
  • 13. Basics { "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 190.15729, "table": { "table_name": "orders", "access_type": "ALL", "r_loops": 1, "rows": 1492405, "r_rows": 1500000, "r_table_time_ms": 142.0726678, "r_other_time_ms": 48.07306875, "filtered": 100, "r_filtered": 85.9236, "attached_condition": "orders.o_totalprice > 50000" } } } ● r_rows ● r_filtered analyze format=json select count(*) from orders where o_totalprice > 50000 ● r_loops – number of times the operation was executed ● r_..._time_ms – total time spent − r_table_time_ms – time spent reading data from the table − r_other_time_ms – time spent checking the WHERE, making lookup key, etc
  • 14. A complex query: join { "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 4056.659499, "table": { "table_name": "orders", "access_type": "ALL", "possible_keys": ["PRIMARY", "i_o_orderdate"], "r_loops": 1, "rows": 1492405, "r_rows": 1500000, "r_table_time_ms": 323.3849353, "r_other_time_ms": 118.576661, "filtered": 49.99996567, "r_filtered": 100, "attached_condition": "orders.o_orderDATE between '1990- }, "table": { "table_name": "lineitem", "access_type": "ref", "possible_keys": ["i_l_orderkey", "i_l_orderkey_quantity"], "key": "PRIMARY", "key_length": "4", "used_key_parts": ["l_orderkey"], "ref": ["dbt3sf1.orders.o_orderkey"], "r_loops": 1500000, "rows": 2, "r_rows": 4.00081, "r_table_time_ms": 3454.758327, "r_other_time_ms": 159.9274175, "filtered": 100, "r_filtered": 0, "attached_condition": "lineitem.l_extendedprice > 1000000" } } } analyze format=json select * from lineitem, orders where o_orderkey=l_orderkey and o_orderdate between '1990-01-01' and '1998-12-06' and l_extendedprice > 1000000 ● For each table, check − r_table_time_ms, r_other_time_ms − r_loops ● Instantly shows the problem areas
  • 15. A complex query: subquery { "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 1104.099893, "table": { "table_name": "OT", "access_type": "range", "possible_keys": ["i_o_orderdate"], "key": "i_o_orderdate", "key_length": "4", "used_key_parts": ["o_orderDATE"], "r_loops": 1, "rows": 78708, "r_rows": 39162, "r_table_time_ms": 63.71686522, "r_other_time_ms": 5.819774094, "filtered": 100, "r_filtered": 0.002553496, "index_condition": "OT.o_orderDATE between '1998-06-01' and '1998-12-06'", "attached_condition": "OT.o_totalprice > 0.9 * (subquery#2)" }, "subqueries": [ { "expression_cache": { "state": "disabled", "r_loops": 200, "r_hit_ratio": 0, "query_block": { "select_id": 2, "r_loops": 39162, "r_total_time_ms": 1032.541544, "outer_ref_condition": "OT.o_custkey is not null", "table": { "table_name": "orders", "access_type": "ref", "possible_keys": ["i_o_custkey"], "key": "i_o_custkey", "key_length": "5", "used_key_parts": ["o_custkey"], "ref": ["dbt3sf1.OT.o_custkey"], "r_loops": 39162, "rows": 7, "r_rows": 17.73660691, "r_table_time_ms": 985.8381759, "r_other_time_ms": 32.85006493, "filtered": 100, "r_filtered": 100 } } }}]}} analyze format=json select * from orders OT where o_orderdate between '1998-06-01' and '1998-12-06' and o_totalprice > 0.9 * (select sum(o_totalprice) from orders where o_custkey=OT.o_custkey) ● Each subquery is a query_block ● It has r_total_time_ms, r_loops − time includes children’s time ● Narrowing down problems in complex queries is now easy! ● Can also see subquery cache − was used but had r_hit_ratio=0 so it disabled itself.
  • 16. Sorting "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 52.68082832, "read_sorted_file": { "r_rows": 50, "filesort": { "sort_key": "customer.c_acctbal desc", "r_loops": 1, "r_total_time_ms": 52.65527147, "r_limit": 50, "r_used_priority_queue": true, "r_output_rows": 51, "r_sort_mode": "sort_key,addon_fields", "table": { "table_name": "customer", "access_type": "ALL", "r_loops": 1, "rows": 148473, "r_rows": 150000, "r_table_time_ms": 35.72704671, "r_other_time_ms": 16.90903666, "filtered": 100, "r_filtered": 100 } } } } } analyze format=json select * from customer order by c_acctbal desc limit 50 ● EXPLAIN shows “Using filesort” − Will it really read/write files? − Is my @@sort_buffer_size large enough? − Is priority queue optimization used?
  • 17. Sorting (2) "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 52.68082832, "read_sorted_file": { "r_rows": 50, "filesort": { "sort_key": "customer.c_acctbal desc", "r_loops": 1, "r_total_time_ms": 52.65527147, "r_limit": 50, "r_used_priority_queue": true, "r_output_rows": 51, "r_sort_mode": "sort_key,addon_fields", "table": { "table_name": "customer", "access_type": "ALL", "r_loops": 1, "rows": 148473, "r_rows": 150000, "r_table_time_ms": 35.72704671, "r_other_time_ms": 16.90903666, "filtered": 100, "r_filtered": 100 } } } } } analyze format=json select * from customer order by c_acctbal desc limit 50 analyze format=json select * from customer order by c_acctbal desc limit 50000 "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 85.70263992, "read_sorted_file": { "r_rows": 50000, "filesort": { "sort_key": "customer.c_acctbal desc", "r_loops": 1, "r_total_time_ms": 79.33072276, "r_limit": 50000, "r_used_priority_queue": false, "r_output_rows": 150000, "r_sort_passes": 1, "r_buffer_size": "2047Kb", "r_sort_mode": "sort_key,packed_addon_fields", "table": { "table_name": "customer", "access_type": "ALL", "r_loops": 1, "rows": 148473, "r_rows": 150000, "r_table_time_ms": 37.21097011, "r_other_time_ms": 42.09169676, "filtered": 100, "r_filtered": 100 } } } } }
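To answer the "is my @@sort_buffer_size large enough?" question above, one can bump the buffer for the session and re-run ANALYZE (a sketch; 32MB is an arbitrary test value, not a recommendation):

```sql
-- r_used_priority_queue=false plus r_sort_passes=1 means the sort
-- spilled to disk and needed a merge pass with the 2047Kb buffer.
SET SESSION sort_buffer_size = 32 * 1024 * 1024;  -- 32MB, illustrative

-- Re-run: if the buffer now fits all rows, r_sort_passes disappears
-- from the filesort node (fully in-memory sort).
ANALYZE FORMAT=JSON
SELECT * FROM customer ORDER BY c_acctbal DESC LIMIT 50000;
```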
  • 18. ANALYZE and UPDATEs ● ANALYZE works for any statement that supports EXPLAIN − UPDATE, DELETE ● DML statements will do the modifications! − ANALYZE UPDATE ... will update − ANALYZE DELETE ... will delete − (PostgreSQL’s EXPLAIN ANALYZE also works like this) ● Don’t want modifications to be made? begin; analyze update ...; rollback;
  • 19. Overhead of ANALYZE ● ANALYZE data is: − Counters (cheap, always counted) − Time measurements (expensive. counted only if running ANALYZE) ● Max. overhead I observed: 10% − Artificial example constructed to maximize overhead − Typically less
  • 20. ANALYZE and slow query log my.cnf: slow_query_log=ON log_slow_verbosity=explain ... hostname-slow.log: # Time: 200817 12:12:47 # User@Host: root[root] @ localhost [] # Thread_id: 18 Schema: dbt3sf1 QC_hit: No # Query_time: 3.948642 Lock_time: 0.000062 Rows_sent: 0 Rows_examined: 7501215 # Rows_affected: 0 Bytes_sent: 1756 # # explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra # explain: 1 SIMPLE orders ALL PRIMARY,i_o_orderdate NULL NULL NULL 1492405 1500000.00 50.00 100.00 Using where # explain: 1 SIMPLE lineitem ref PRIMARY,i_l_orderkey,i_l_orderkey_quantity PRIMARY 4 dbt3sf1.orders.o_orderkey 2 4.00 100.00 0.00 Using where # SET timestamp=1597655567; select * ... ● log_slow_verbosity=explain shows ANALYZE output ● Obstacles to printing JSON: − No timing data − Keeping format compatible with pt-query-digest?
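The settings from the slide can also be applied at runtime, without editing my.cnf and restarting (a sketch; long_query_time is not from the slide and is shown only as an example threshold):

```sql
SET GLOBAL slow_query_log = ON;
SET GLOBAL log_slow_verbosity = 'explain';  -- adds ANALYZE output to the log
SET GLOBAL long_query_time = 1;             -- log queries slower than 1s (illustrative)
```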
  • 21. ANALYZE FORMAT=JSON Summary ● It is EXPLAIN FORMAT=JSON with execution data ● Use r_rows and r_filtered to check estimates vs reality ● Use r_total_time_ms and r_loops to narrow down problems ● Sorting, buffering, caches report r_ values about their execution. ● Please do attach ANALYZE FORMAT=JSON output when asking for optimizer help or reporting an issue.
  • 23. EXPLAIN ANALYZE in MySQL 8 ● Introduced in MySQL 8.0.18 (Oct, 2019) ● Produces PostgreSQL-like output − (In next slides, I will edit it for readability) | -> Nested loop inner join (cost=642231.45 rows=1007917) (actual time=65.871..4580.448 rows=24 loops=1) -> Filter: (orders.o_orderDATE between '1990-01-01' and '1998-12-06') (cost=151912.95 rows=743908) (actual time=0.154..536.302 rows=1500000 loops=1) -> Table scan on orders (cost=151912.95 rows=1487817) (actual time=0.146..407.074 rows=1500000 loops=1) -> Filter: (lineitem.l_extendedprice > 104500) (cost=0.25 rows=1) (actual time=0.003..0.003 rows=0 loops=1500000) -> Index lookup on lineitem using PRIMARY (l_orderkey=orders.o_orderkey) (cost=0.25 rows=4) (actual time=0.002..0.002 rows=4 loops=1500000) ● Reason for this: the runtime is switched to “Iterator Interface”.
  • 24. Basics ● loops is like MariaDB’s r_loops ● rows is number of rows in the output ● rows after Filter / rows before it = 33705/39162=86% - this is r_filtered. explain analyze select * from orders where o_orderdate between '1998-06-01' and '1998-12-06' and o_totalprice>50000 | -> Filter: (orders.o_totalprice > 50000) (cost=35418.86 rows=26233) (actual time=0.607..51.881 rows=33705 loops=1) -> Index range scan on orders using i_o_orderdate, with index condition: (orders.o_orderDATE between '1998-06-01' and '1998-12-06') (cost=35418.86 rows=78708) (actual time=0.603..50.277 rows=39162 loops=1) ● actual time shows X..Y − X - time to get the first tuple − Y- time to get all output − Both are averages across all loops.
  • 25. A complex query: join "query_block": { "select_id": 1, "r_loops": 1, "r_total_time_ms": 4056.659499, "table": { "table_name": "orders", "access_type": "ALL", "possible_keys": ["PRIMARY", "i_o_orderdate"], "r_loops": 1, "rows": 1492405, "r_rows": 1500000, "r_table_time_ms": 323.3849353, "r_other_time_ms": 118.576661, "filtered": 49.99996567, "r_filtered": 100, "attached_condition": "orders.o_orderDATE between '1990- }, "table": { "table_name": "lineitem", "access_type": "ref", "possible_keys": ["i_l_orderkey", "i_l_orderkey_quantity" "key": "PRIMARY", "key_length": "4", "used_key_parts": ["l_orderkey"], "ref": ["dbt3sf1.orders.o_orderkey"], "r_loops": 1500000, "rows": 2, "r_rows": 4.00081, "r_table_time_ms": 3454.758327, "r_other_time_ms": 159.9274175, "filtered": 100, "r_filtered": 0, "attached_condition": "lineitem.l_extendedprice > 1000000 } } analyze format=json select * from lineitem, orders where o_orderkey=l_orderkey and o_orderdate between '1990-01-01' and '1998-12-06' and l_extendedprice > 1000000 -> Nested loop inner join (cost=642231.45 rows=1007917) (actual time=4496.131..4496.131 rows=0 loops=1) -> Filter: (orders.o_orderDATE between '1990-01-01' and '1998-12-06') (cost=151912.95 rows=743908) (actual time=0.162..534.395 rows=1500000 loops=1) -> Table scan on orders (cost=151912.95 rows=1487817) (actual time=0.154..402.573 rows=1500000 loops=1) -> Filter: (lineitem.l_extendedprice > 1000000) (cost=0.25 rows=1) (actual time=0.003..0.003 rows=0 loops=1500000) -> Index lookup on lineitem using PRIMARY (l_orderkey=orders.o_orderkey) (cost=0.25 rows=4) (actual time=0.002..0.002 rows=4 loops=1500000) ● r_total_time_ms = actual_time.second_value * loops. Practice your arithmetic skills :-)
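The arithmetic the slide jokes about, worked out for the lineitem lookup (per-loop average times loop count approximates MariaDB's per-table total):

```sql
-- MySQL: actual time=0.003..0.003 ms per loop, loops=1500000.
-- MariaDB reported r_table_time_ms + r_other_time_ms ≈ 3615 ms for the
-- same table; the per-loop product lands in the same ballpark:
SELECT 0.003 * 1500000 AS approx_total_ms;  -- 4500
```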
  • 26. Sorting explain analyze select * from customer order by c_acctbal desc limit 50 | -> Limit: 50 row(s) (actual time=55.830..55.837 rows=50 loops=1) -> Sort: customer.c_acctbal DESC, limit input to 50 row(s) per chunk (cost=15301.60 rows=148446) (actual time=55.830..55.835 rows=50 loops=1) -> Table scan on customer (actual time=0.137..38.462 rows=150000 loops=1) ● No difference ● But look at Optimizer Trace: it does show the algorithm used: explain analyze select * from customer order by c_acctbal desc limit 50000 | -> Limit: 50000 row(s) (actual time=101.536..106.927 rows=50000 loops=1) -> Sort: customer.c_acctbal DESC, limit input to 50000 row(s) per chunk (cost=15301.60 rows=148446) (actual time=101.535..105.539 rows=50000 loops=1) -> Table scan on customer (actual time=0.147..37.550 rows=150000 loops=1) "filesort_priority_queue_optimization": { "limit": 50, "chosen": true }, "filesort_priority_queue_optimization": { "limit": 50000 }, "filesort_execution": [ ], "filesort_summary": { "max_rows_per_buffer": 303, "num_rows_estimate": 1400441, "num_rows_found": 150000, "num_initial_chunks_spilled_to_disk": 108, "peak_memory_used": 270336, "sort_algorithm": "std::stable_sort", }
  • 27. Handling for UPDATE/DELETE ● Single-table UPDATE/DELETE is not supported mysql> explain analyze delete from orders where o_orderkey=1234; +----------------------------------------+ | EXPLAIN | +----------------------------------------+ | <not executable by iterator executor> | +----------------------------------------+ 1 row in set (0.00 sec) − The same used to be true for outer joins but was fixed ● Multi-table UPDATE/DELETE is supported − Unlike in MariaDB/PostgreSQL, won’t make any changes.
  • 28. MySQL’s EXPLAIN ANALYZE summary ● The syntax is EXPLAIN ANALYZE ● Postgres-like output ● Not all plan details are shown − Some can be found in the Optimizer Trace ● Not all statements are supported ● Can have higher overhead − Artificial “bad” example: 50% overhead (vs 10% in MariaDB) ● Shows extra info: − Has nodes for grouping operations (worth adding?) − #rows output for each node (worth adding, too?) − “Time to get the first row” (not sure if useful?)
  • 29. Final conclusions ● MariaDB has − ANALYZE <statement> − ANALYZE FORMAT=JSON <statement> ● Stable feature ● Very useful in diagnosing optimizer issues ● MySQL got EXPLAIN ANALYZE last year − A similar feature − Different interface, printed data, etc − No obvious advantages over MariaDB’s?