Optimizing queries MySQL

Optimizing queries in MySQL
Tabular, tree and visual explain plans, new optimization features
November 15th, 2019

Who am I?
• Software Development Manager & Team Leader @ Codix
• Using MySQL since 3.23.x
• Building MySQL Server and related products for Slackware Linux for
more than 14 years (check SlackPack)
• Free and open source software enthusiast
• Formula 1 and Le Mans 24 fan
• gdsotirov @

Agenda
• Query optimization
• What is explain plan?
• How to explain queries in MySQL
• Understanding tabular explain plans
• Optimization features (indexes, hints, histograms)
• New optimization features (TREE, hash join, EXPLAIN ANALYZE)
• Using visual explain plans

Query optimization
• SQL queries express requests to the database
• Database parses, optimizes and executes the query
• The optimizer should choose the most efficient way for execution:
• using all available information about tables, columns, indexes, etc.
• evaluating alternative plans
• For us (database practitioners) optimizing (or tuning) queries means to
• ensure use of efficient access paths;
• ensure use of optimal join order;
• rewrite query, to help optimizer choose a better plan; and
• even reorganize (normalize) the schema.
• The optimizer provides information about how it intends to execute a
query through the explain plan
For automatic query optimization see EverSQL

What is explain plan?
• Explain plan (or execution plan) shows the steps that MySQL
optimizer would take to execute a query
• Includes information about:
• access paths;
• used indexes and partitions;
• joins between tables;
• order of operations; and
• extra details.
• It helps to understand whether indexes are missing or not used, whether joins
are done in the optimal order, or generally why queries are slow

Why should we as developers care?
• We (should) know the schema (or should we?)
• We know how the application(s) query data (i.e. the access paths),
as we are familiar with the implemented functionality
• We know what data is stored in tables as it comes from the
application(s)
• We should care about performance of the application(s), which may
depend on the performance of SQL queries
• We do not need to depend on others (e.g. DBAs) if we can
diagnose and fix a slow query ourselves

How to explain queries - EXPLAIN syntax
• The general syntax is:
{EXPLAIN | DESCRIBE | DESC}
[FORMAT = {TRADITIONAL | JSON | TREE}]
{statement | FOR CONNECTION con_id}
• The statement can be SELECT, DELETE, INSERT, REPLACE or UPDATE
(before 5.6.3 only SELECT)
• TREE is new format since 8.0.16 GA (2019-04-25)
• Can explain the currently executing query for a connection (since 5.7.2)
• Requires SELECT privilege for tables and views + SHOW VIEW privilege for
views
• DESCRIBE is a synonym for EXPLAIN, but it is mostly used to show a
table's structure
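A few invocations of the syntax above, sketched against the dept_emp schema defined on the next slide. The connection id 42 is a hypothetical placeholder; in practice it would come from SHOW PROCESSLIST, or from SELECT CONNECTION_ID() run inside the session being observed.

```sql
-- Tabular (default) and JSON output for the same statement
EXPLAIN FORMAT=TRADITIONAL SELECT * FROM emp WHERE deptno = 10;
EXPLAIN FORMAT=JSON        SELECT * FROM emp WHERE deptno = 10;

-- TREE output (MySQL 8.0.16 or later)
EXPLAIN FORMAT=TREE SELECT * FROM emp WHERE deptno = 10;

-- Explaining a DML statement (MySQL 5.6.3 or later); nothing is modified
EXPLAIN UPDATE emp SET sal = sal * 1.1 WHERE job = 'CLERK';

-- Plan of the statement currently running in another session;
-- 42 is a hypothetical connection id taken from SHOW PROCESSLIST
EXPLAIN FOR CONNECTION 42;

-- DESCRIBE as it is most often used: showing a table's structure
DESCRIBE emp;
```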

Example schema – departments and employees
CREATE DATABASE dept_emp;
USE dept_emp;
CREATE TABLE dept (
deptno INTEGER,
dname VARCHAR(14),
loc VARCHAR(13),
CONSTRAINT pk_dept
PRIMARY KEY (deptno)
);
CREATE TABLE emp (
empno INTEGER,
ename VARCHAR(10),
job VARCHAR(9),
mgr INTEGER,
hiredate DATE,
sal DECIMAL(7,2),
comm DECIMAL(7,2),
deptno INTEGER,
CONSTRAINT pk_emp PRIMARY KEY (empno),
CONSTRAINT fk_deptno FOREIGN KEY (deptno)
REFERENCES dept (deptno)
);

Example schema - data
INSERT INTO dept VALUES (10, 'ACCOUNTING', 'NEW YORK');
INSERT INTO dept VALUES (20, 'RESEARCH' , 'DALLAS');
INSERT INTO dept VALUES (30, 'SALES' , 'CHICAGO');
INSERT INTO dept VALUES (40, 'OPERATIONS', 'BOSTON');
INSERT INTO emp VALUES (7839, 'KING' , 'PRESIDENT', NULL, '1981-11-17', 5000, NULL, 10);
INSERT INTO emp VALUES (7698, 'BLAKE' , 'MANAGER' , 7839, '1981-05-01', 2850, NULL, 30);
INSERT INTO emp VALUES (7782, 'CLARK' , 'MANAGER' , 7839, '1981-06-09', 2450, NULL, 10);
INSERT INTO emp VALUES (7566, 'JONES' , 'MANAGER' , 7839, '1981-04-02', 2975, NULL, 20);
INSERT INTO emp VALUES (7788, 'SCOTT' , 'ANALYST' , 7566, '1987-06-13', 3000, NULL, 20);
INSERT INTO emp VALUES (7902, 'FORD' , 'ANALYST' , 7566, '1981-12-03', 3000, NULL, 20);
INSERT INTO emp VALUES (7369, 'SMITH' , 'CLERK' , 7902, '1980-12-17', 800, NULL, 20);
INSERT INTO emp VALUES (7499, 'ALLEN' , 'SALESMAN' , 7698, '1981-02-20', 1600, 300, 30);
INSERT INTO emp VALUES (7521, 'WARD' , 'SALESMAN' , 7698, '1981-02-22', 1250, 500, 30);
INSERT INTO emp VALUES (7654, 'MARTIN', 'SALESMAN' , 7698, '1981-09-28', 1250, 1400, 30);
INSERT INTO emp VALUES (7844, 'TURNER', 'SALESMAN' , 7698, '1981-09-08', 1500, 0, 30);
INSERT INTO emp VALUES (7876, 'ADAMS' , 'CLERK' , 7788, '1987-06-13', 1100, NULL, 20);
INSERT INTO emp VALUES (7900, 'JAMES' , 'CLERK' , 7698, '1981-12-03', 950, NULL, 30);
INSERT INTO emp VALUES (7934, 'MILLER', 'CLERK' , 7782, '1982-01-23', 1300, NULL, 10);
+ 1 000 000 rows
See gen_emps.sql
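The script gen_emps.sql is not included here. As a stand-in, the following is a hypothetical generator, not the author's script, that appends 1,000,000 synthetic employees using a recursive CTE (MySQL 8.0 or later). Its job weights are an assumption picked to give roughly 4/9 analysts, 3/9 clerks and 1/9 each of managers and salesmen, with departments spread evenly over 10, 20, 30 and 40; names, hire dates and salaries are random.

```sql
-- Hypothetical data generator (NOT gen_emps.sql): 1,000,000 extra employees.
-- The recursion limit must exceed the number of recursive steps (default 1000).
SET SESSION cte_max_recursion_depth = 2000000;

INSERT INTO emp (empno, ename, job, mgr, hiredate, sal, comm, deptno)
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 1000000
)
SELECT 10000 + n,                                  -- avoids the 7369..7934 demo ids
       CONCAT('EMP', n),                           -- at most 10 characters
       ELT(1 + FLOOR(RAND() * 9),                  -- weighted job: 4/9, 3/9, 1/9, 1/9
           'ANALYST', 'ANALYST', 'ANALYST', 'ANALYST',
           'CLERK', 'CLERK', 'CLERK',
           'MANAGER', 'SALESMAN'),
       7839,                                       -- everyone reports to KING
       DATE '1980-01-01' + INTERVAL FLOOR(RAND() * 3650) DAY,
       ROUND(800 + RAND() * 4200, 2),              -- 800.00 .. 5000.00
       NULL,                                       -- no commission
       10 * (1 + FLOOR(RAND() * 4))                -- 10, 20, 30 or 40
FROM seq;

-- Refresh index statistics after the bulk load
ANALYZE TABLE emp;
```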

Traditional explain plan Example
EXPLAIN
SELECT E.ename, E.job, D.dname
FROM dept D,
emp E
WHERE E.deptno = D.deptno
AND E.job = 'CLERK';
+----+-------------+-------++------+---------------+-----------++-------------------+--------+----------+-------------+
| id | select_type | table || type | possible_keys | key || ref | rows | filtered | Extra |
+----+-------------+-------++------+---------------+-----------++-------------------+--------+----------+-------------+
| 1 | SIMPLE | D || ALL | PRIMARY | NULL || NULL | 4 | 100 | NULL |
| 1 | SIMPLE | E || ref | fk_deptno | fk_deptno || dept_emp.D.deptno | 250003 | 10 | Using where |
+----+-------------+-------++------+---------------+-----------++-------------------+--------+----------+-------------+
2 rows in set, 1 warning (0.0006 sec)
Note (code 1003): /* select#1 */ select `dept_emp`.`e`.`ename` AS `ename`,`dept_emp`.`e`.`job` AS
`job`,`dept_emp`.`d`.`dname` AS `dname` from `dept_emp`.`dept` `d` join `dept_emp`.`emp` `e` where
((`dept_emp`.`e`.`deptno` = `dept_emp`.`d`.`deptno`) and (`dept_emp`.`e`.`job` = 'CLERK'))
rows to join from E = 250 003 * 10% = 25 000.3 (per department)

Understanding tabular explain plans 1/5
• id: the sequential number of the SELECT in
query
• select_type: possible values include:
+----+-------------+
| id | select_type |
+----+-------------+
| 1 | SIMPLE |
| 1 | SIMPLE |
+----+-------------+
Value                         Meaning
SIMPLE                        no unions or subqueries
PRIMARY                       outermost SELECT
[DEPENDENT] UNION             second or later SELECT in a union [dependent on outer query]
UNION RESULT                  result of a union
[DEPENDENT] SUBQUERY          first SELECT in a subquery [dependent on outer query]
[DEPENDENT] DERIVED           derived table [dependent on another table]
MATERIALIZED                  materialized subquery
UNCACHEABLE [SUBQUERY|UNION]  subquery/union that must be re-evaluated for each row of the outer query

• table: name of the table, union, subquery or derived table. It can be:
• table alias (or name)
• <unionM,N> - union between rows with ids M and N;
• <subqueryN> - result of materialized subquery from row N;
• <derivedN> - result of derived table from row N.
• partitions: NULL or the names of matched partitions
• type: join (access) type
+-------+------------+------+
| table | partitions | type |
+-------+------------+------+
| D | NULL | ALL |
| E | NULL | ref |
+-------+------------+------+

Join (access) types (from best to worst)
Value            Meaning
system           for tables with just one row
const            for tables matching at most one row (by constant value)
eq_ref           for 1:1 relations (primary keys or UNIQUE NOT NULL indexes)
ref              for 1:N relations (non-unique indexes)
ref_or_null      like ref, but also searches for NULL values
fulltext         join using a full-text index
index_merge      using index merge optimization (merge multiple ranges)
unique_subquery  for some IN subqueries returning primary key values
index_subquery   same as previous, but for non-unique indexes
range            for comparison operators (e.g. >, <, >=, <=), BETWEEN, IN, LIKE
index            same as ALL, but only the index tree is scanned
ALL              all rows (i.e. full table scan)
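The access types can be provoked on the example schema with statements like the following. The comments give the type one would typically expect; they are assumptions, and with a million rows the optimizer may still choose differently (for instance a full scan instead of ref when a value matches a large share of the table).

```sql
-- const: equality on the whole primary key of dept
EXPLAIN SELECT * FROM dept WHERE deptno = 10;

-- eq_ref: D is read through its primary key once per row of E
EXPLAIN SELECT /*+ JOIN_ORDER(E, D) */ E.ename, D.dname
FROM emp E JOIN dept D ON D.deptno = E.deptno;

-- ref: lookup through the non-unique index fk_deptno
EXPLAIN SELECT ename FROM emp WHERE deptno = 20;

-- range: BETWEEN on the primary key of emp
EXPLAIN SELECT ename FROM emp WHERE empno BETWEEN 7500 AND 7600;

-- index: full scan of fk_deptno only, since it covers the selected column
EXPLAIN SELECT deptno FROM emp;

-- ALL: there is no index on job, so the whole table is scanned
EXPLAIN SELECT ename FROM emp WHERE job = 'CLERK';
```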

• filtered: estimated percentage of table rows that satisfy the table condition
(before 5.7.3, EXPLAIN EXTENDED was needed to show it)
• rows x filtered gives the estimated number of rows to be joined
• filter estimates are based on range estimates (e.g. for dates), index
statistics or hardcoded selectivity factors (see Selinger):
+----------+
| filtered |
+----------+
| 100 |
| 10 |
+----------+
Predicate                  Selectivity                                        Filtered
Equality (=)               0.1                                                10.00 %
Comparison (>, <, >=, <=)  1/3 ≈ 0.33                                         33.33 %
BETWEEN (also LIKE)        1/9 ≈ 0.11 (it’s ¼ by Selinger!)                   11.11 %
(pred1) AND (pred2)        SEL(pred1) * SEL(pred2)
(pred1) OR (pred2)         SEL(pred1) + SEL(pred2) - SEL(pred1) * SEL(pred2)
NOT (pred)                 1 – SEL(pred)
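To watch the hardcoded factors at work, one can compare the filtered column for predicates on the unindexed columns job and sal, assuming no histograms exist on them yet. The first statement merely evaluates the combination rules above; the percentages the optimizer actually prints may differ, for example in rounding or because of rewrites such as NOT (a = b) becoming a <> b.

```sql
-- Expected filtered values (in %) from the combination rules above
SELECT 100 * 0.1                       AS eq_pct,    -- job = 'CLERK'
       100 * 0.1 * (1/3)               AS and_pct,   -- ... AND sal > 2000
       100 * (0.1 + 1/3 - 0.1 * (1/3)) AS or_pct,    -- ... OR  sal > 2000
       100 * (1 - 0.1)                 AS not_pct;   -- NOT (job = 'CLERK')

-- Compare with the filtered column reported by the optimizer
EXPLAIN SELECT ename FROM emp WHERE job = 'CLERK';
EXPLAIN SELECT ename FROM emp WHERE job = 'CLERK' AND sal > 2000;
EXPLAIN SELECT ename FROM emp WHERE job = 'CLERK' OR sal > 2000;
EXPLAIN SELECT ename FROM emp WHERE NOT (job = 'CLERK');
```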

• Extra information:
• NULL; or
• Plan isn't ready yet – with EXPLAIN FOR CONNECTION, when the optimizer has not yet finished the plan for the statement in the named connection;
• Recursive – indicates recursive CTE;
• Rematerialize – for dependent lateral derived tables;
• Using filesort – extra pass for sorting (e.g. when no index can be used for ORDER BY);
• Using index – only index tree is scanned;
• Using index condition – WHERE conditions pushed down to storage engine;
• Using join buffer (Block Nested Loop) – BNL algorithm;
• Using join buffer (Batched Key Access) – BKA algorithm;
• Using temporary – when temporary table is used (e.g. GROUP/ORDER BY);
• Using where – selected rows are restricted by a WHERE clause;
• many more.
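The most common Extra values can be reproduced on the example schema roughly as follows. This is a sketch assuming only the indexes defined in the schema above (the primary keys and fk_deptno); the values in the comments are what one would expect, not captured output.

```sql
-- Using index: only the secondary index fk_deptno is read (it covers deptno)
EXPLAIN SELECT deptno FROM emp WHERE deptno = 20;

-- Using filesort: no index on sal, so the rows are sorted in an extra pass
EXPLAIN SELECT ename, sal FROM emp ORDER BY sal DESC LIMIT 10;

-- Using temporary: grouping on the unindexed job column
EXPLAIN SELECT job, COUNT(*) FROM emp GROUP BY job;

-- Using where: the rows read are further restricted by the WHERE clause
EXPLAIN SELECT ename FROM emp WHERE job = 'CLERK';
```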

Indexes
• Indexes improve efficiency of queries by providing faster access to data
• Proper indexing reduces query response time and improves scalability
• As developers we need to index, because we know the access paths of the
application(s)
• MySQL supports B-Tree, hash, full text and spatial indexes (R-Tree)
• B-Tree indexes are ordered (ascending by default; descending order is
supported since MySQL 8.0.1 DMR)
• Indexes could be based on multiple columns (composite), functional (since
8.0.13 GA) and multi-valued (since 8.0.17 GA)
• Indexes could also be invisible (since 8.0.0 DMR)
• Use ANALYZE TABLE regularly to update index cardinality statistics
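A sketch of the index kinds listed above on the emp table. The index names are hypothetical, and because these indexes would change the plans shown on later slides, the sketch drops them again at the end.

```sql
-- Composite index: job first, then deptno
CREATE INDEX idx_emp_job_deptno ON emp (job, deptno);

-- Descending index (MySQL 8.0 or later)
CREATE INDEX idx_emp_sal_desc ON emp (sal DESC);

-- Functional index on an expression (MySQL 8.0.13 or later);
-- note the extra parentheses around the expression
CREATE INDEX idx_emp_hire_year ON emp ((YEAR(hiredate)));

-- Hide an index from the optimizer without dropping it, then show it again
ALTER TABLE emp ALTER INDEX idx_emp_job_deptno INVISIBLE;
ALTER TABLE emp ALTER INDEX idx_emp_job_deptno VISIBLE;

-- Refresh the statistics and check the resulting cardinality estimates
ANALYZE TABLE emp;
SHOW INDEX FROM emp;

-- Remove the sketch indexes so that later plans stay reproducible
DROP INDEX idx_emp_job_deptno ON emp;
DROP INDEX idx_emp_sal_desc ON emp;
DROP INDEX idx_emp_hire_year ON emp;
```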

Optimizer hints
• Fine control over optimizer execution plans of individual statements
• Statement level hints first appeared in MySQL 5.7.7 RC (2015-04-08)
• Extended with new hints in 8.0 for join order, resource group, etc.
• Syntax is similar to Oracle – e.g. /*+ JOIN_ORDER(...) */
• System variable optimizer_switch could also be used, but on
global or session level
• The necessary “evil” when the optimizer cannot choose the best
execution plan by itself
• Several scope levels: query block, table, join order, index, subquery and
global (e.g. MAX_EXECUTION_TIME, RESOURCE_GROUP, SET_VAR)
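A sketch of hints at several of these scope levels, using the example schema. The statements are illustrative only; MAX_EXECUTION_TIME applies to SELECT statements, and the semijoin flag is just one example of an optimizer_switch setting.

```sql
-- Global level: abort the SELECT if it runs longer than 1000 ms
SELECT /*+ MAX_EXECUTION_TIME(1000) */ COUNT(*)
FROM emp
WHERE job = 'CLERK';

-- Global level: set a session variable for this statement only
SELECT /*+ SET_VAR(sort_buffer_size = 16M) */ ename, sal
FROM emp
ORDER BY sal;

-- Index level: do not use range access on the primary key of emp
SELECT /*+ NO_RANGE_OPTIMIZATION(emp PRIMARY) */ ename
FROM emp
WHERE empno BETWEEN 7500 AND 7600;

-- Subquery level: the hint is placed in the subquery's own query block
SELECT D.dname
FROM dept D
WHERE D.deptno IN (SELECT /*+ NO_SEMIJOIN() */ E.deptno
                   FROM emp E
                   WHERE E.job = 'PRESIDENT');

-- Session-wide alternative through optimizer_switch
SET SESSION optimizer_switch = 'semijoin=off';
SET SESSION optimizer_switch = 'semijoin=on';
```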

Optimizer hints Example
SELECT
E.ename, E.job, D.dname
FROM dept D,
emp E
WHERE E.deptno = D.deptno
AND E.job = 'PRESIDENT';
SELECT /*+ JOIN_ORDER(E, D) */
E.ename, E.job, D.dname
FROM dept D,
emp E
WHERE E.deptno = D.deptno
AND E.job = 'PRESIDENT';
+----+-------+------+-----------+--------+----------+
| id | table | type | key | rows | filtered |
+----+-------+------+-----------+--------+----------+
| 1 | D | ALL | NULL | 4 | 100 |
| 1 | E | ref | fk_deptno | 250003 | 10 |
+----+-------+------+-----------+--------+----------+
Execution time: ≈ 2.1 sec
+----+-------+--------+---------+---------+----------+
| id | table | type | key | rows | filtered |
+----+-------+--------+---------+---------+----------+
| 1 | E | ALL | NULL | 1000014 | 9.995684 |
| 1 | D | eq_ref | PRIMARY | 1 | 100 |
+----+-------+--------+---------+---------+----------+
Execution time: ≈ 0.3 sec

Histograms
• Since MySQL 8.0.3 RC (released 2017-09-21)
• Statistical information about distribution of values in a column
• Values are grouped in buckets (maximum 1024)
• Cumulative frequency is automatically calculated for each bucket
• For large data sets the data is sampled, using at most the memory set by
the histogram_generation_max_mem_size variable
• Sampling requires a full table scan, but this should improve for InnoDB
in 8.0.19 (probably to be released in January 2020)
• Sampling is not deterministic!
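Whether a histogram was built from a sample can be checked after the fact. A minimal sketch, assuming at least one histogram already exists (otherwise the second query returns no rows):

```sql
-- Memory available for sampling (default 20000000 bytes, i.e. about 20 MB)
SELECT @@SESSION.histogram_generation_max_mem_size;

-- How much of the table was actually sampled for each existing histogram
SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME,
       HISTOGRAM->>'$."sampling-rate"'     AS sampling_rate,
       HISTOGRAM->>'$."histogram-type"'    AS histogram_type,
       JSON_LENGTH(HISTOGRAM->'$.buckets') AS buckets
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS;
```

A sampling_rate below 1.0 means only part of the rows was read, which is why two runs of ANALYZE TABLE can produce slightly different histograms.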

Histogram types
Singleton
• Single value per bucket
• Bucket stores value and cumulative
frequency
• Useful for estimation of equality and
range conditions
Equi-height
• Multiple values per bucket
• Bucket stores min and max inclusive
values, cumulative frequency and
number of distinct values
• Frequent values in separate buckets
• Most useful for range conditions
[Figure: singleton histogram example (one bar per value: 5, 4, 3, 1, 2) and equi-height histogram example (buckets [1,7], 8, [9,12], [13,19], [20,25]); y-axis: Frequency]
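The histogram type is not requested explicitly. As far as the MySQL manual describes it, a singleton histogram is built when the column has no more distinct values than the requested number of buckets, and an equi-height one otherwise. A sketch, assuming job has only the five distinct values of the demo data and hiredate has more than 8 distinct values (the 14 demo rows alone have 12):

```sql
-- job has only a handful of distinct values: expect a singleton histogram
ANALYZE TABLE emp UPDATE HISTOGRAM ON job WITH 8 BUCKETS;

-- hiredate has many distinct values: expect an equi-height histogram
ANALYZE TABLE emp UPDATE HISTOGRAM ON hiredate WITH 8 BUCKETS;

SELECT COLUMN_NAME,
       HISTOGRAM->>'$."histogram-type"' AS histogram_type
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
WHERE SCHEMA_NAME = 'dept_emp'
  AND TABLE_NAME  = 'emp';

-- The hiredate histogram was only for illustration
ANALYZE TABLE emp DROP HISTOGRAM ON hiredate;
```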

Histograms continued
• Used by optimizer for estimating join cost
• Help the optimizer to make better row estimates
• Useful for columns that are NOT first column of any index, but used in
WHERE clause for joins or IN subqueries
• Best for columns with:
• low cardinality;
• uneven distribution; and
• distribution that does not vary much over time
• Not useful for columns with constantly increasing values (e.g. dates,
counters, etc.)
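A sketch of the case described above, assuming no histogram exists on sal yet. sal is not part of any index in the example schema, so without a histogram a range predicate on it should get the hardcoded 1/3 estimate, and with one the filtered column should follow the actual distribution. The threshold 4900 is an arbitrary choice.

```sql
-- Before: filtered for E falls back to the default estimate for ">"
EXPLAIN
SELECT E.ename, D.dname
FROM dept D, emp E
WHERE E.deptno = D.deptno
AND E.sal > 4900;

-- Build a histogram on the non-indexed column and explain again
ANALYZE TABLE emp UPDATE HISTOGRAM ON sal WITH 64 BUCKETS;

EXPLAIN
SELECT E.ename, D.dname
FROM dept D, emp E
WHERE E.deptno = D.deptno
AND E.sal > 4900;

-- Histograms are not maintained by DML; drop (or rebuild) them when stale
ANALYZE TABLE emp DROP HISTOGRAM ON sal;
```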

Histogram Example 1/4 – creation and meta
ANALYZE TABLE emp
UPDATE HISTOGRAM ON job
WITH 5 BUCKETS;
{"buckets": [
["base64:type254:QU5BTFlTVA==", 0.44427638528762128],
["base64:type254:Q0xFUks=" , 0.7765828956408041],
["base64:type254:TUFOQUdFUg==", 0.8882609755557034],
["base64:type254:U0FMRVNNQU4=", 1.0]
],
"data-type": "string",
"null-values": 0.0,
"collation-id": 33,
"last-updated": "2019-10-17 07:33:42.007222",
"sampling-rate": 0.10810659461578348,
"histogram-type": "singleton",
"number-of-buckets-specified": 5
}
SELECT JSON_PRETTY(`histogram`)
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS CS
WHERE CS.`schema_name` = 'dept_emp'
AND CS.`table_name` = 'emp'
AND CS.`column_name` = 'job';

Histogram Example 2/4 - sampling
SET histogram_generation_max_mem_size = 184*1024*1024; /* 184 MB */
ANALYZE TABLE emp UPDATE HISTOGRAM ON job WITH 5 BUCKETS;
{"buckets": [
["base64:type254:QU5BTFlTVA==", 0.4440377834710314],
["base64:type254:Q0xFUks=" , 0.777311117644353],
["base64:type254:TUFOQUdFUg==", 0.8887915569182032],
["base64:type254:UFJFU0lERU5U", 0.8887925569042035],
["base64:type254:U0FMRVNNQU4=", 1.0]
],
"data-type": "string",
"null-values": 0.0,
"collation-id": 33,
"last-updated": "2019-10-21 10:52:03.974566",
"sampling-rate": 1.0,
"histogram-type": "singleton",
"number-of-buckets-specified": 5
}
Note: Setting histogram_generation_max_mem_size requires SESSION_VARIABLES_ADMIN (since 8.0.14) or
SYSTEM_VARIABLES_ADMIN privilege.

Histogram Example 3/4 - frequencies
SELECT HG.val, ROUND(HG.freq, 3) cfreq,
ROUND(HG.freq - LAG(HG.freq, 1, 0) OVER (), 3) freq
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS CS,
JSON_TABLE(`histogram`->'$.buckets', '$[*]'
COLUMNS(val VARCHAR(10) PATH '$[0]',
freq DOUBLE PATH '$[1]')) HG
WHERE CS.`schema_name` = 'dept_emp'
AND CS.`table_name` = 'emp'
AND CS.`column_name` = 'job';
+-----------+-------+-------+
| val | cfreq | freq |
+-----------+-------+-------+
| ANALYST | 0.444 | 0.444 |
| CLERK | 0.777 | 0.333 |
| MANAGER | 0.889 | 0.111 |
| PRESIDENT | 0.889 | 0 |
| SALESMAN | 1 | 0.111 |
+-----------+-------+-------+
5 rows in set (0.0009 sec)

Histogram Example 4/4 – query plan effect
EXPLAIN
SELECT E.ename, E.job, D.dname
FROM dept D,
emp E
WHERE E.deptno = D.deptno
AND E.job = 'CLERK';
+----+-------------+-------++------+---------------+-----------++-------------------+--------+----------+-------------+
| id | select_type | table || type | possible_keys | key || ref | rows | filtered | Extra |
+----+-------------+-------++------+---------------+-----------++-------------------+--------+----------+-------------+
| 1 | SIMPLE | D || ALL | PRIMARY | NULL || NULL | 4 | 100 | NULL |
| 1 | SIMPLE | E || ref | fk_deptno | fk_deptno || dept_emp.D.deptno | 250003 | 33.32733 | Using where |
+----+-------------+-------++------+---------------+-----------++-------------------+--------+----------+-------------+
rows to join from E = 250 003 * 33.32733154296875 % ≈ 83 319 (per department)
EXPLAIN
SELECT E.ename, E.job, D.dname
FROM dept D,
emp E
WHERE E.deptno = D.deptno
AND E.job = 'PRESIDENT';
+----+-------------+-------++--------+---------------+---------++-------------------+---------+----------+-------------+
| id | select_type | table || type | possible_keys | key || ref | rows | filtered | Extra |
+----+-------------+-------++--------+---------------+---------++-------------------+---------+----------+-------------+
| 1 | SIMPLE | E || ALL | fk_deptno | NULL || NULL | 1000014 | 0.000099 | Using where |
| 1 | SIMPLE | D || eq_ref | PRIMARY | PRIMARY || dept_emp.E.deptno | 1 | 100 | NULL |
+----+-------------+-------++--------+---------------+---------++-------------------+---------+----------+-------------+
rows to join from E = 1 000 014 * 0.00009999860048992559 % ≈ 1

Histograms vs Indexes
Histograms
• Use less disk space
• Updated only on demand
• For low cardinality columns
• Create needs only backup lock
(permits DML)
Indexes
• Use more disk space
• Updated with each DML
• For any cardinality columns
• Create needs metadata lock
(no DML permitted)

TREE explain plan
• Appeared in MySQL 8.0.16 GA (released 2019-04-25) and improved
with 8.0.18 GA (released 2019-10-14)
• Still under development and considered “experimental”
• Displays operations (iterators) nested as a tree
• Helps to better understand the order of execution of operations
• For access path operations includes (since 8.0.18 GA) information
about:
• estimated execution cost
• estimated number of returned rows
See TREE explain format in MySQL 8.0.16

TREE explain plan Example 1/2
EXPLAIN FORMAT=TRADITIONAL
SELECT D.dname, LDT.min_sal, LDT.avg_sal, LDT.max_sal
FROM dept D,
LATERAL (SELECT MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
FROM emp E
WHERE E.deptno = D.deptno
) AS LDT;
+----+-------------------+------------++------++-----------++-------------------+------+----------+----------------------------+
| id | select_type | table || type || key || ref | rows | filtered | Extra |
+----+-------------------+------------++------++-----------++-------------------+------+----------+----------------------------+
| 1 | PRIMARY | D || ALL || NULL || NULL | 4| 100 | Rematerialize (<derived2>) |
| 1 | PRIMARY | <derived2> || ALL || NULL || NULL | 2| 100 | NULL |
| 2 | DEPENDENT DERIVED | E || ref || fk_deptno || dept_emp.D.deptno |250003| 100 | NULL |
+----+-------------------+------------++------++-----------++-------------------+------+----------+----------------------------+
3 rows in set, 2 warnings (0.0010 sec)
Note (code 1276): Field or reference 'dept_emp.D.deptno' of SELECT #2 was resolved in SELECT #1
Note (code 1003): /* select#1 */ select `dept_emp`.`d`.`dname` AS `dname`,`ldt`.`min_sal` AS ...

TREE explain plan Example 2/2
EXPLAIN FORMAT=TREE
SELECT D.dname, LDT.min_sal, LDT.avg_sal, LDT.max_sal
FROM dept D,
LATERAL (SELECT MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
FROM emp E
WHERE E.deptno = D.deptno
) AS LDT;
+----------------------------------------------------------------------+
| EXPLAIN |
+----------------------------------------------------------------------+
| -> Nested loop inner join
-> Invalidate materialized tables (row from D) (cost=0.65 rows=4)
-> Table scan on D (cost=0.65 rows=4)
-> Table scan on LDT
-> Materialize (invalidate on row from D)
-> Aggregate: min(e.sal), avg(e.sal), max(e.sal)
-> Index lookup on E using fk_deptno (deptno=d.deptno)
(cost=36095.62 rows=332359) |
+----------------------------------------------------------------------+

Hash join optimization
• New in MySQL 8.0.18 GA (released 2019-10-14)
• Before it, nested loop was the only join method, in the
Nested Loop Join (NLJ), Block Nested Loop (BNL)
and Batched Key Access (BKA) variants
• Hashes the smaller table and uses it to look up
rows from the other table
• Uses xxHash for extremely fast RAM hashing
• Hashing is best done entirely in memory, but it
can also spill to disk (less efficient)
• You may need to adjust join buffer size (see
join_buffer_size)
[Figure: rows of Table 1 and Table 2 hashed with xxHash64 into the join buffer; matching hashes (=) produce the result]
See WL#2241

Hash join continued
• Used automatically for any query in which each join has an equi-join
condition and no index can be used for the join
• Would also work for cartesian joins (i.e. for joins without join
condition)
• Unfortunately visible only in TREE format of explain plan
• New hints for forcing hash join or NL - HASH_JOIN or
NO_HASH_JOIN
• Also on global or session level with hash_join=on|off in
optimizer_switch system variable

Hash join Example 1/2
CREATE TABLE job_sal (
job VARCHAR(9),
sal_min DECIMAL(9,2),
sal_max DECIMAL(9,2)
);
INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('ANALYST'  , 3000, 4000);
INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('CLERK'    , 800, 1500);
INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('MANAGER'  , 2800, 3500);
INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('SALESMAN' , 1250, 1900);
INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('PRESIDENT', 5000, NULL);

Hash join Example 2/2
EXPLAIN FORMAT=TREE
SELECT E.ename, E.sal, JS.sal_min, JS.sal_max
FROM emp E,
job_sal JS
WHERE E.job = JS.job
AND E.sal NOT BETWEEN JS.sal_min AND JS.sal_max;
+---------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------+
| -> Filter: (E.sal not between JS.sal_min and JS.sal_max) (cost=499211.71 rows=442901)
-> Inner hash join (E.job = JS.job) (cost=499211.71 rows=442901)
-> Table scan on E (cost=1962.39 rows=996514)
-> Hash
-> Table scan on JS (cost=0.75 rows=5) |
+---------------------------------------------------------------------------------------+
1 row in set (0.0011 sec)
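The HASH_JOIN and NO_HASH_JOIN hints and the hash_join flag of optimizer_switch can be tried on this query. This is a sketch for MySQL 8.0.18 specifically; as far as I know, later releases no longer honour these particular controls, so check the manual of the version in use.

```sql
-- Hash join requested: visible as "Inner hash join" in the TREE output
EXPLAIN FORMAT=TREE
SELECT /*+ HASH_JOIN(E, JS) */ E.ename, E.sal, JS.sal_min, JS.sal_max
FROM emp E,
job_sal JS
WHERE E.job = JS.job
AND E.sal NOT BETWEEN JS.sal_min AND JS.sal_max;

-- Hash join forbidden: the plan should no longer contain "Inner hash join"
EXPLAIN FORMAT=TREE
SELECT /*+ NO_HASH_JOIN(E, JS) */ E.ename, E.sal, JS.sal_min, JS.sal_max
FROM emp E,
job_sal JS
WHERE E.job = JS.job
AND E.sal NOT BETWEEN JS.sal_min AND JS.sal_max;

-- Session-wide switch (MySQL 8.0.18)
SET SESSION optimizer_switch = 'hash_join=off';
SET SESSION optimizer_switch = 'hash_join=on';
```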

Block Nested Loop (BNL) vs Hash join
Block Nested Loop (BNL)
• Block NL query ran for about
1.30 sec for 1M employees
• For equality and non-equality
joins
• For smaller result sets BNL
should be just fine
Hash join
• Hash join query ran for about
0.9 sec for 1M employees
• For equality joins only
• For large result sets hash join
should be faster

EXPLAIN ANALYZE
• New in MySQL 8.0.18 GA (released 2019-10-14)
• Actually executes the query and provides timings from the execution
• Useful for comparing optimizer’s estimations to actual execution
• Output is in TREE format only (hopefully just for now – see WL#4168)
• In addition to the TREE output it also provides information about:
• time to return first row
• time to return all rows
• number of returned rows
• number of loops
• Only for SELECT statements. Cannot be used with FOR CONNECTION
See also EXPLAIN for PostgreSQL

EXPLAIN ANALYZE Example 1
+--------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------+
| -> Filter: (e.sal not between js.sal_min and js.sal_max) (cost=499211.71 rows=442901)
(actual time=0.098..778.486 rows=915166 loops=1)
-> Inner hash join (e.job = js.job) (cost=499211.71 rows=442901)
-> Table scan on E (cost=1962.39 rows=996514)
-> Hash
-> Table scan on JS (cost=0.75 rows=5) (actual time=0.041..0.048 rows=5 loops=1)|
+--------------------------------------------------------------------------------------------+
EXPLAIN ANALYZE
SELECT E.ename, E.sal, JS.sal_min, JS.sal_max
FROM emp E,
job_sal JS
WHERE E.job = JS.job
AND E.sal NOT BETWEEN JS.sal_min AND JS.sal_max;
Note: Actual times are in milliseconds!

EXPLAIN ANALYZE Example 2
EXPLAIN ANALYZE
SELECT E.ename, E.job, D.dname
FROM dept D,
emp E
WHERE E.deptno = D.deptno
AND E.job = 'CLERK';
+------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------+
| -> Nested loop inner join (cost=112394.29 rows=333278)
-> Table scan on D (cost=1.40 rows=4)
-> Filter: (e.job = 'CLERK') (cost=5180.86 rows=83319)
-> Index lookup on E using fk_deptno (deptno=d.deptno) (cost=5180.86 rows=250004)
(actual time=1.013..786.799 rows=250004 loops=4) |
+------------------------------------------------------------------------------------------+
first row -> 5.904 ≈ 5.675 = 4.659 + 1.016
all rows -> 3 281.472 ≈ 3 249.65 = 4.666 + 4 * 811.246
Note: Actual times are aggregated between loops!

Problems with TREE and EXPLAIN ANALYZE
• Will not explain queries using block nested loop (shows just “not executable
by iterator executor”)
• Will not explain SELECT COUNT(*) FROM table queries (shows
just “Count rows in table”)
• Does not compute select list subqueries (see bug 97296) [FIXED]
• Not integrated with MySQL Workbench (see bug 97282)
• Does not print units on timings (see bug 97492)

Visual explain plans
• Displayed by default in MySQL Workbench

Additional information by JSON/visual explain
• Used columns – list of columns either read or written
• Used key parts – list of used key parts
• Rows produced per join – estimated rows after join
• Cost estimates (since 5.7.2) are split into:
• query cost – total cost of query (or subquery) block
• sort cost (CPU) – cost of the first sorting operation
• read cost (IO) – the cost of reading data from table
• eval cost (CPU) – the cost of condition evaluation
• prefix cost (CPU) – the cost of joining tables
• data read – estimated amount of data processed (rows x
record width)
See WL#6510 and MySQL 5.7.2 Release notes
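The same details are available without Workbench through EXPLAIN FORMAT=JSON. The comments map the items above to the JSON keys as I understand MySQL 8.0 to emit them; treat the exact key paths as an assumption worth checking against real output.

```sql
EXPLAIN FORMAT=JSON
SELECT E.ename, E.job, D.dname
FROM dept D,
emp E
WHERE E.deptno = D.deptno
AND E.job = 'CLERK';

-- Items from the list above and where they appear in the JSON document:
--   query cost               query_block.cost_info.query_cost
--   used columns             query_block.nested_loop[*].table.used_columns
--   used key parts           query_block.nested_loop[*].table.used_key_parts
--   rows produced per join   query_block.nested_loop[*].table.rows_produced_per_join
--   read / eval / prefix     query_block.nested_loop[*].table.cost_info.read_cost,
--   cost and data read         .eval_cost, .prefix_cost, .data_read_per_join
--   sort cost                query_block.ordering_operation.cost_info.sort_cost
--                            (only present for queries that sort)
```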

Benefits from visual explain plans
• Help to easily understand where the problem is
• easily spot bad access paths by box color:
• missing index(es);
• wrong or insufficient join conditions
• easily spot where most rows are generated by the thickness of lines:
• bad access path and/or no filter;
• involuntary cartesian joins;
• wrong join order
• Cost calculations are not fully documented, so they are hard to understand
• basically cost values are related to the number of blocks read (IO) and rows
processed (CPU)
• costs reflect the work done by the database

Visual explain plans Example 1 1/2
MySQL 5.5 ≈ 9.5 sec
Option A: Add WHERE
condition in the subquery
MySQL 5.5/8.0 ≈ 1.2 sec

MySQL 8.0 ≈ 0.015 sec
MySQL 5.5 ≈ 0.015 sec
Option B: Use derived
table instead of subquery
in WHERE
Option C: Use MySQL 8 ;-)

In MySQL 5.7.20: 6-7 sec

In MySQL 8.0.13: ≈ 24 sec

Option A: Use JOIN_ORDER hint
Option B: Re-create multi-column index on two instead of
three columns to improve selectivity (one of the columns in
the index had low cardinality)
Option C: Option B + remove table msg_progs
Execution time: ≈ 4 sec

Summary
• Use EXPLAIN to examine and tune query plan
• Use EXPLAIN ANALYZE to profile query execution
• Use different types of indexes and histograms properly
• There are no strict rules to follow for optimizing a query
• So be creative and evaluate different options

