Optimizing queries in
MySQL
Tabular, tree and visual
explain plans, new
optimization features
November 15th, 2019
Who am I?
• Software Development Manager & Team Leader @ Codix
• Using MySQL since 3.23.x
• Building MySQL Server and related products for Slackware Linux for
more than 14 years (check SlackPack)
• Free and open source software enthusiast
• Formula 1 and Le Mans 24 fan
• gdsotirov @
Agenda
• Query optimization
• What is explain plan?
• How to explain queries in MySQL
• Understanding tabular explain plans
• Optimization features (indexes, hints, histograms)
• New optimization features (TREE, hash join, EXPLAIN ANALYZE)
• Using visual explain plans
Query optimization
• SQL queries express requests to the database
• The database parses, optimizes and executes the query
• The optimizer should choose the most efficient way for execution:
• using all available information about tables, columns, indexes, etc.
• evaluating alternative plans
• For us (database practitioners) optimizing (or tuning) queries means to
• ensure use of efficient access paths;
• ensure use of optimal join order;
• rewrite the query to help the optimizer choose a better plan; and
• even reorganize (normalize) the schema.
• The optimizer provides information about how it intends to execute a
query through the explain plan
For automatic query optimization, see EverSQL
What is explain plan?
• Explain plan (or execution plan) shows the steps that the MySQL
optimizer would take to execute a query
• Includes information about:
• access paths;
• used indexes and partitions;
• joins between tables;
• order of operations; and
• extra details.
• It helps to understand whether indexes are missing or unused, whether joins
are done in optimal order, or generally why queries are slow
Why should we as developers care?
• We (should) know the schema (or should we?)
• We know how the application(s) query data (i.e. the access paths)
as we are familiar with the implemented functionalities
• We know what data is stored in tables as it comes from the
application(s)
• We should care about performance of the application(s), which may
depend on the performance of SQL queries
• We do not need to depend on others (e.g. DBAs) if we can
diagnose and fix a slow query ourselves
How to explain queries - EXPLAIN syntax
• The general syntax is:
{EXPLAIN | DESCRIBE | DESC}
[FORMAT = {TRADITIONAL | JSON | TREE}]
{statement | FOR CONNECTION con_id}
• The statement can be SELECT, DELETE, INSERT, REPLACE or UPDATE
(before 5.6.3 only SELECT)
• TREE is a new format since 8.0.16 GA (2019-04-25)
• Can explain the currently executing query for a connection (since 5.7.2)
• Requires SELECT privilege for tables and views, plus SHOW VIEW privilege for
views
• DESCRIBE is a synonym for EXPLAIN, but is used mostly for getting table
structure
Example schema – departments and employees
CREATE DATABASE dept_emp;
USE dept_emp;
CREATE TABLE dept (
deptno INTEGER,
dname VARCHAR(14),
loc VARCHAR(13),
CONSTRAINT pk_dept
PRIMARY KEY (deptno)
);
CREATE TABLE emp (
empno INTEGER,
ename VARCHAR(10),
job VARCHAR(9),
mgr INTEGER,
hiredate DATE,
sal DECIMAL(7,2),
comm DECIMAL(7,2),
deptno INTEGER,
CONSTRAINT pk_emp PRIMARY KEY (empno),
CONSTRAINT fk_deptno FOREIGN KEY (deptno)
REFERENCES dept (deptno)
);
Example schema - data
INSERT INTO dept VALUES (10, 'ACCOUNTING', 'NEW YORK');
INSERT INTO dept VALUES (20, 'RESEARCH' , 'DALLAS');
INSERT INTO dept VALUES (30, 'SALES' , 'CHICAGO');
INSERT INTO dept VALUES (40, 'OPERATIONS', 'BOSTON');
INSERT INTO emp VALUES (7839, 'KING' , 'PRESIDENT', NULL, '1981-11-17', 5000, NULL, 10);
INSERT INTO emp VALUES (7698, 'BLAKE' , 'MANAGER' , 7839, '1981-05-01', 2850, NULL, 30);
INSERT INTO emp VALUES (7782, 'CLARK' , 'MANAGER' , 7839, '1981-06-09', 2450, NULL, 10);
INSERT INTO emp VALUES (7566, 'JONES' , 'MANAGER' , 7839, '1981-04-02', 2975, NULL, 20);
INSERT INTO emp VALUES (7788, 'SCOTT' , 'ANALYST' , 7566, '1987-06-13', 3000, NULL, 20);
INSERT INTO emp VALUES (7902, 'FORD' , 'ANALYST' , 7566, '1981-12-03', 3000, NULL, 20);
INSERT INTO emp VALUES (7369, 'SMITH' , 'CLERK' , 7902, '1980-12-17', 800, NULL, 20);
INSERT INTO emp VALUES (7499, 'ALLEN' , 'SALESMAN' , 7698, '1981-02-20', 1600, 300, 30);
INSERT INTO emp VALUES (7521, 'WARD' , 'SALESMAN' , 7698, '1981-02-22', 1250, 500, 30);
INSERT INTO emp VALUES (7654, 'MARTIN', 'SALESMAN' , 7698, '1981-09-28', 1250, 1400, 30);
INSERT INTO emp VALUES (7844, 'TURNER', 'SALESMAN' , 7698, '1981-09-08', 1500, 0, 30);
INSERT INTO emp VALUES (7876, 'ADAMS' , 'CLERK' , 7788, '1987-06-13', 1100, NULL, 20);
INSERT INTO emp VALUES (7900, 'JAMES' , 'CLERK' , 7698, '1981-12-03', 950, NULL, 30);
INSERT INTO emp VALUES (7934, 'MILLER', 'CLERK' , 7782, '1982-01-23', 1300, NULL, 10);
Plus 1 000 000 more emp rows (see gen_emps.sql)
Traditional explain plan Example
EXPLAIN
SELECT E.ename, E.job, D.dname
FROM dept D,
emp E
WHERE E.deptno = D.deptno
AND E.job = 'CLERK';
+----+-------------+-------++------+---------------+-----------++-------------------+--------+----------+-------------+
| id | select_type | table || type | possible_keys | key || ref | rows | filtered | Extra |
+----+-------------+-------++------+---------------+-----------++-------------------+--------+----------+-------------+
| 1 | SIMPLE | D || ALL | PRIMARY | NULL || NULL | 4 | 100 | NULL |
| 1 | SIMPLE | E || ref | fk_deptno | fk_deptno || dept_emp.D.deptno | 250003 | 10 | Using where |
+----+-------------+-------++------+---------------+-----------++-------------------+--------+----------+-------------+
2 rows in set, 1 warning (0.0006 sec)
Note (code 1003): /* select#1 */ select `dept_emp`.`e`.`ename` AS `ename`,`dept_emp`.`e`.`job` AS
`job`,`dept_emp`.`d`.`dname` AS `dname` from `dept_emp`.`dept` `d` join `dept_emp`.`emp` `e` where
((`dept_emp`.`e`.`deptno` = `dept_emp`.`d`.`deptno`) and (`dept_emp`.`e`.`job` = 'CLERK'))
rows to join from E = 250 003 * 10% = 25 000.3 (per department)
Understanding tabular explain plans 1/5
• id: the sequential number of the SELECT within the
query
• select_type: possible values include:
+----+-------------+
| id | select_type |
+----+-------------+
| 1 | SIMPLE |
| 1 | SIMPLE |
+----+-------------+
Value Meaning
SIMPLE no unions or subqueries
PRIMARY outermost SELECT
[DEPENDENT] UNION [dependent on outer query] second or later SELECT in a union
UNION RESULT result of a union
[DEPENDENT] SUBQUERY [dependent on outer query] first SELECT in subquery
[DEPENDENT] DERIVED [dependent on another table] derived table
MATERIALIZED materialized subquery
UNCACHEABLE [SUBQUERY|UNION] a subquery/union that must be re-evaluated for each row of the outer query
Understanding tabular explain plans 2/5
• table: name of the table, union, subquery or derived table. Could be
• table alias (or name)
• <unionM,N> - union between rows with ids M and N;
• <subqueryN> - result of materialized subquery from row N;
• <derivedN> - result of derived table from row N.
• partitions: NULL or the names of matched partitions
• type: join (access) type
+-------+------------+------+
| table | partitions | type |
+-------+------------+------+
| D | NULL | ALL |
| E | NULL | ref |
+-------+------------+------+
Join (access) types
(ordered from best to worst)
Value Meaning
system for tables with just one row
const for tables matching at most one row (by constant value)
eq_ref for 1:1 relations (primary keys or UNIQUE NOT NULL indexes)
ref for 1:N relations (non-unique indexes)
ref_or_null like ref, but also searches for NULL values
fulltext join using a FULLTEXT index
index_merge using index merge optimization (merge multiple ranges)
unique_subquery for some IN subqueries returning primary key values
index_subquery same as previous, but for non-unique indexes
range for comparison operators (e.g. >, <, >=, <=), BETWEEN, IN, LIKE
index same as ALL, but only the index tree is scanned
ALL all rows (i.e. full table scan)
Understanding tabular explain plans 3/5
• possible_keys: NULL or names of indexes that could be used
• key: the index actually used (may even be one not listed in possible_keys)
• key_len: length of the used key parts in bytes (e.g. 4 bytes for INT, plus 1 byte
if the column is nullable; summed for composite indexes)
• ref: columns or constants compared to the index
• rows: estimated number of rows to be retrieved by the chosen
access path
+---------------+-----------+---------+-------------------+--------+
| possible_keys | key | key_len | ref | rows |
+---------------+-----------+---------+-------------------+--------+
| PRIMARY | NULL | NULL | NULL | 4 |
| fk_deptno | fk_deptno | 5 | dept_emp.D.deptno | 250003 |
+---------------+-----------+---------+-------------------+--------+
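The key_len of 5 for fk_deptno can be reproduced with simple byte counting. The Python sketch below is an illustration under stated assumptions, not MySQL's code: the helpers key_part_len and key_len are hypothetical, only integer and VARCHAR key parts are modelled, and a VARCHAR part is assumed to take its maximum byte length plus a 2-byte length prefix.

```python
# Byte sizes of fixed-width MySQL integer types.
FIXED_SIZES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def key_part_len(col_type, nullable, length=0, bytes_per_char=4):
    """Hypothetical helper: estimated key_len contribution of one key part."""
    if col_type == "VARCHAR":
        # Assumption: maximum byte length of the value plus a 2-byte length prefix.
        size = length * bytes_per_char + 2
    else:
        size = FIXED_SIZES[col_type]
    # A nullable column needs one extra byte for the NULL flag.
    return size + (1 if nullable else 0)


def key_len(parts):
    """Hypothetical helper: key_len of a composite index is the sum of its used parts."""
    return sum(key_part_len(**part) for part in parts)


# emp.deptno is a nullable INTEGER, hence key_len = 5 for fk_deptno.
print(key_len([{"col_type": "INT", "nullable": True}]))
# A hypothetical (deptno, job) index, assuming 4 bytes per character for job VARCHAR(9).
print(key_len([{"col_type": "INT", "nullable": True},
               {"col_type": "VARCHAR", "nullable": True, "length": 9, "bytes_per_char": 4}]))
```

Declaring deptno NOT NULL would drop the extra byte, and key_len would then read 4.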
Understanding tabular explain plans 4/5
• filtered: estimated percentage of rows that remain after applying the table
condition (before 5.7.3 EXPLAIN EXTENDED was needed)
• rows x filtered gives the estimated number of rows to be joined
• filter estimates are based on range estimates (e.g. for dates), index
statistics or hardcoded selectivity factors (see Selinger):
+----------+
| filtered |
+----------+
| 100 |
| 10 |
+----------+
Predicate Selectivity Filtered
Equality (=) 0.1 10.00 %
Comparison (>, <, >=, <=) 1/3 ≈ 0.33 33.33 %
BETWEEN (also LIKE) 1/9 ≈ 0.11 (it’s ¼ by Selinger!) 11.11 %
(pred1) AND (pred2) SEL(pred1) * SEL(pred2)
(pred1) OR (pred2) SEL(pred1) + SEL(pred2) - SEL(pred1) * SEL(pred2)
NOT (pred) 1 – SEL(pred)
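The combination rules in the table can be written as small functions. This is a minimal Python sketch, assuming (as the formulas do) that predicates are independent; the extra predicate E.sal > 1000 in the example is hypothetical and not part of the queries in this document.

```python
# Default selectivity factors from the table above.
SEL_EQUALITY = 0.1
SEL_COMPARISON = 1 / 3
SEL_BETWEEN = 1 / 9


def sel_and(s1, s2):
    # Independent predicates: multiply selectivities.
    return s1 * s2


def sel_or(s1, s2):
    # Inclusion-exclusion for independent predicates.
    return s1 + s2 - s1 * s2


def sel_not(s):
    return 1 - s


def estimated_rows(rows, selectivity):
    """rows x filtered: estimated number of rows passed on to the join."""
    return rows * selectivity


# E.job = 'CLERK' without a histogram: 250 003 rows per department x 10 %.
print(round(estimated_rows(250003, SEL_EQUALITY), 1))  # 25000.3
# Filtered % for E.job = 'CLERK' AND / OR E.sal > 1000 (hypothetical predicate).
print(round(sel_and(SEL_EQUALITY, SEL_COMPARISON) * 100, 2))  # 3.33
print(round(sel_or(SEL_EQUALITY, SEL_COMPARISON) * 100, 2))   # 40.0
```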
Understanding tabular explain plans 5/5
• Extra information:
• NULL; or
• Plan isn't ready yet – when explaining query in a named connection;
• Recursive – indicates recursive CTE;
• Rematerialize – for dependent lateral derived tables;
• Using filesort – extra pass for sorting (i.e. when no index could be used);
• Using index – only index tree is scanned;
• Using index condition – WHERE conditions pushed down to the storage engine;
• Using join buffer (Block Nested Loop) – BNL algorithm;
• Using join buffer (Batched Key Access) – BKA algorithm;
• Using temporary – a temporary table is used (e.g. for GROUP/ORDER BY);
• Using where – selected rows are restricted by a WHERE clause;
• many more.
Indexes
• Indexes improve efficiency of queries by providing faster access to data
• Proper indexing reduces query response time and improves scalability
• As developers we should create indexes, because we know the access paths of the
application(s)
• MySQL supports B-Tree, hash, full text and spatial indexes (R-Tree)
• B-Tree indexes are ordered (ascending by default; descending order is
also supported since MySQL 8.0.1 DMR)
• Indexes could be based on multiple columns (composite), functional (since
8.0.13 GA) and multi-valued (since 8.0.17 GA)
• Indexes could also be invisible (since 8.0.0 DMR)
• Use ANALYZE TABLE regularly to update index cardinality statistics
Optimizer hints
• Fine control over optimizer execution plans of individual statements
• Statement level hints first appeared in MySQL 5.7.7 RC (2015-04-08)
• Extended with new hints in 8.0 for join order, resource group, etc.
• Syntax is similar to Oracle – e.g. /*+ JOIN_ORDER(...) */
• The optimizer_switch system variable can also be used, but only at
global or session level
• The necessary “evil” when the optimizer cannot itself choose the best
execution plan
• Several scope levels: query block, table, join order, index, subquery and
global (e.g. MAX_EXECUTION_TIME, RESOURCE_GROUP, SET_VAR)
Optimizer hints Example
SELECT
E.ename, E.job, D.dname
FROM dept D,
emp E
WHERE E.deptno = D.deptno
AND E.job = 'PRESIDENT';
SELECT /*+ JOIN_ORDER(E, D) */
E.ename, E.job, D.dname
FROM dept D,
emp E
WHERE E.deptno = D.deptno
AND E.job = 'PRESIDENT';
+----+-------+------+-----------+--------+----------+
| id | table | type | key | rows | filtered |
+----+-------+------+-----------+--------+----------+
| 1 | D | ALL | NULL | 4 | 100 |
| 1 | E | ref | fk_deptno | 250003 | 10 |
+----+-------+------+-----------+--------+----------+
Execution time: ≈ 2.1 sec
+----+-------+--------+---------+---------+----------+
| id | table | type | key | rows | filtered |
+----+-------+--------+---------+---------+----------+
| 1 | E | ALL | NULL | 1000014 | 9.995684 |
| 1 | D | eq_ref | PRIMARY | 1 | 100 |
+----+-------+--------+---------+---------+----------+
Execution time: ≈ 0.3 sec
Histograms
• Since MySQL 8.0.3 RC (released 2017-09-21)
• Statistical information about distribution of values in a column
• Values are grouped in buckets (maximum 1024)
• Cumulative frequency is automatically calculated for each bucket
• For large data sets, the data is sampled, limited by the memory available for
histogram generation (see the histogram_generation_max_mem_size variable)
• Sampling requires a full table scan, but this is expected to be improved for InnoDB
in 8.0.19 (probably to be released in January 2020)
• Sampling is not deterministic!
Histogram types
Singleton
• Single value per bucket
• Bucket stores value and cumulative
frequency
• Useful for estimation of equality and
range conditions
Equi-height
• Multiple values per bucket
• Bucket stores min and max inclusive
values, cumulative frequency and
number of distinct values
• Frequent values in separate buckets
• Most useful for range conditions
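A simplified Python sketch of how the two bucket layouts could be built from a column's values. It is an illustration under assumptions, not MySQL's implementation: NULLs and sampling are ignored, and the equi-height variant simply closes a bucket once it holds at least total / num_buckets rows (so it does not give very frequent values their own bucket). The sample jobs list mirrors the 14 rows of the emp example data.

```python
from collections import Counter


def singleton_histogram(values):
    """One bucket per distinct value: (value, cumulative frequency)."""
    counts = Counter(values)
    total = len(values)
    buckets, cumulative = [], 0
    for value in sorted(counts):
        cumulative += counts[value]
        buckets.append((value, cumulative / total))
    return buckets


def equi_height_histogram(values, num_buckets):
    """Buckets of roughly equal row counts: (min, max, cumulative freq, ndv).

    Simplified: a distinct value is never split across buckets, and a bucket
    is closed once it holds at least total / num_buckets rows.
    """
    counts = Counter(values)
    total = len(values)
    target = total / num_buckets
    buckets, cumulative = [], 0
    current, rows_in_bucket = [], 0
    for value in sorted(counts):
        current.append(value)
        rows_in_bucket += counts[value]
        if rows_in_bucket >= target:
            cumulative += rows_in_bucket
            buckets.append((current[0], current[-1], cumulative / total, len(current)))
            current, rows_in_bucket = [], 0
    if current:  # leftover values form the last bucket
        cumulative += rows_in_bucket
        buckets.append((current[0], current[-1], cumulative / total, len(current)))
    return buckets


jobs = (["CLERK"] * 4 + ["ANALYST"] * 2 + ["MANAGER"] * 3
        + ["SALESMAN"] * 4 + ["PRESIDENT"])
print(singleton_histogram(jobs))
print(equi_height_histogram(list(range(1, 21)), 4))
```

MySQL chooses a singleton histogram when the number of distinct values does not exceed the number of buckets requested, which is why WITH 5 BUCKETS on job (5 distinct values) produces "histogram-type": "singleton" in the examples that follow.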
[Figure: sample singleton histogram (one bar per value: 5, 4, 3, 1, 2) and equi-height histogram (buckets [1,7], 8, [9,12], [13,19], [20,25]); y-axis: frequency]
Histograms continued
• Used by the optimizer for estimating join cost
• Help the optimizer make better row estimates
• Useful for columns that are NOT the first column of any index, but are used in the
WHERE clause for joins or IN subqueries
• Best for columns with:
• low cardinality;
• uneven distribution; and
• distribution that does not vary much over time
• Not useful for columns with constantly increasing values (e.g. dates,
counters, etc.)
Histogram Example 1/4 – creation and meta
ANALYZE TABLE emp
UPDATE HISTOGRAM ON job
WITH 5 BUCKETS;
{"buckets": [
["base64:type254:QU5BTFlTVA==", 0.44427638528762128],
["base64:type254:Q0xFUks=" , 0.7765828956408041],
["base64:type254:TUFOQUdFUg==", 0.8882609755557034],
["base64:type254:U0FMRVNNQU4=", 1.0]
],
"data-type": "string",
"null-values": 0.0,
"collation-id": 33,
"last-updated": "2019-10-17 07:33:42.007222",
"sampling-rate": 0.10810659461578348,
"histogram-type": "singleton",
"number-of-buckets-specified": 5
}
SELECT JSON_PRETTY(`histogram`)
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS CS
WHERE CS.`schema_name` = 'dept_emp'
AND CS.`table_name` = 'emp'
AND CS.`column_name` = 'job';
Histogram Example 2/4 - sampling
SET histogram_generation_max_mem_size = 184*1024*1024; /* 184 MB */
ANALYZE TABLE emp UPDATE HISTOGRAM ON job WITH 5 BUCKETS;
{"buckets": [
["base64:type254:QU5BTFlTVA==", 0.4440377834710314],
["base64:type254:Q0xFUks=" , 0.777311117644353],
["base64:type254:TUFOQUdFUg==", 0.8887915569182032],
["base64:type254:UFJFU0lERU5U", 0.8887925569042035],
["base64:type254:U0FMRVNNQU4=", 1.0]
],
"data-type": "string",
"null-values": 0.0,
"collation-id": 33,
"last-updated": "2019-10-21 10:52:03.974566",
"sampling-rate": 1.0,
"histogram-type": "singleton",
"number-of-buckets-specified": 5
}
Note: Setting histogram_generation_max_mem_size requires SESSION_VARIABLES_ADMIN (since 8.0.14) or
SYSTEM_VARIABLES_ADMIN privilege.
Histogram Example 3/4 - frequencies
SELECT HG.val, ROUND(HG.freq, 3) cfreq,
ROUND(HG.freq - LAG(HG.freq, 1, 0) OVER (ORDER BY HG.rn), 3) freq /* bucket order */
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS CS,
JSON_TABLE(`histogram`->'$.buckets', '$[*]'
COLUMNS(rn FOR ORDINALITY,
val VARCHAR(10) PATH '$[0]',
freq DOUBLE PATH '$[1]')) HG
WHERE CS.`schema_name` = 'dept_emp'
AND CS.`table_name` = 'emp'
AND CS.`column_name` = 'job';
+-----------+-------+-------+
| val | cfreq | freq |
+-----------+-------+-------+
| ANALYST | 0.444 | 0.444 |
| CLERK | 0.777 | 0.333 |
| MANAGER | 0.889 | 0.111 |
| PRESIDENT | 0.889 | 0 |
| SALESMAN | 1 | 0.111 |
+-----------+-------+-------+
5 rows in set (0.0009 sec)
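The freq column is simply the difference between consecutive cumulative frequencies. A small Python sketch (not MySQL code) that recomputes it from the bucket values of the fully sampled histogram shown earlier, and relates the CLERK frequency to the filtered value of 33.32733 that the optimizer reports for E.job = 'CLERK' in the next example:

```python
# Cumulative frequencies from the histogram with sampling-rate 1.0.
cumulative = [
    ("ANALYST", 0.4440377834710314),
    ("CLERK", 0.777311117644353),
    ("MANAGER", 0.8887915569182032),
    ("PRESIDENT", 0.8887925569042035),
    ("SALESMAN", 1.0),
]


def bucket_frequencies(buckets):
    """Per-value frequency = cumulative frequency minus the previous one (like LAG)."""
    result, previous = {}, 0.0
    for value, cfreq in buckets:
        result[value] = cfreq - previous
        previous = cfreq
    return result


freq = bucket_frequencies(cumulative)
for value, f in freq.items():
    print(f"{value:<10} {f:.3f}")
# Selectivity of E.job = 'CLERK' expressed as a filtered percentage.
print(round(freq["CLERK"] * 100, 5))  # 33.32733
```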
Histogram Example 4/4 – query plan effect
SELECT E.ename, E.job, D.dname
FROM dept D,
emp E
WHERE E.deptno = D.deptno
AND E.job = 'CLERK';
+----+-------------+-------++------+---------------+-----------++-------------------+--------+----------+-------------+
| id | select_type | table || type | possible_keys | key || ref | rows | filtered | Extra |
+----+-------------+-------++------+---------------+-----------++-------------------+--------+----------+-------------+
| 1 | SIMPLE | D || ALL | PRIMARY | NULL || NULL | 4 | 100 | NULL |
| 1 | SIMPLE | E || ref | fk_deptno | fk_deptno || dept_emp.D.deptno | 250003 | 33.32733 | Using where |
+----+-------------+-------++------+---------------+-----------++-------------------+--------+----------+-------------+
rows to join from E = 250 003 * 33.32733154296875 % ≈ 83 319 (per department)
Same query, but with AND E.job = 'PRESIDENT';
+----+-------------+-------++--------+---------------+---------++-------------------+---------+----------+-------------+
| id | select_type | table || type | possible_keys | key || ref | rows | filtered | Extra |
+----+-------------+-------++--------+---------------+---------++-------------------+---------+----------+-------------+
| 1 | SIMPLE | E || ALL | fk_deptno | NULL || NULL | 1000014 | 0.000099 | Using where |
| 1 | SIMPLE | D || eq_ref | PRIMARY | PRIMARY || dept_emp.E.deptno | 1 | 100 | NULL |
+----+-------------+-------++--------+---------------+---------++-------------------+---------+----------+-------------+
rows to join from E = 1 000 014 * 0.00009999860048992559 % ≈ 1
Histograms vs Indexes
Histograms
• Use less disk space
• Updated only on demand
• For low cardinality columns
• Create needs only backup lock
(permits DML)
Indexes
• Use more disk space
• Updated with each DML
• For any cardinality columns
• Create needs metadata lock
(no DML permitted)
TREE explain plan
• Appeared in MySQL 8.0.16 GA (released 2019-04-25) and improved
with 8.0.18 GA (released 2019-10-14)
• Still under development and considered “experimental”
• Displays operations (iterators) nested as a tree
• Helps to better understand the order of execution of operations
• For access path operations, it includes (since 8.0.18 GA) information
about:
• estimated execution cost
• estimated number of returned rows
See TREE explain format in MySQL 8.0.16
TREE explain plan Example 1/2
EXPLAIN FORMAT=TRADITIONAL
SELECT D.dname, LDT.min_sal, LDT.avg_sal, LDT.max_sal
FROM dept D,
LATERAL (SELECT MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
FROM emp E
WHERE E.deptno = D.deptno
) AS LDT;
+----+-------------------+------------++------++-----------++-------------------+------+----------+----------------------------+
| id | select_type | table || type || key || ref | rows | filtered | Extra |
+----+-------------------+------------++------++-----------++-------------------+------+----------+----------------------------+
| 1 | PRIMARY | D || ALL || NULL || NULL | 4| 100 | Rematerialize (<derived2>) |
| 1 | PRIMARY | <derived2> || ALL || NULL || NULL | 2| 100 | NULL |
| 2 | DEPENDENT DERIVED | E || ref || fk_deptno || dept_emp.D.deptno |250003| 100 | NULL |
+----+-------------------+------------++------++-----------++-------------------+------+----------+----------------------------+
3 rows in set, 2 warnings (0.0010 sec)
Note (code 1276): Field or reference 'dept_emp.D.deptno' of SELECT #2 was resolved in SELECT #1
Note (code 1003): /* select#1 */ select `dept_emp`.`d`.`dname` AS `dname`,`ldt`.`min_sal` AS ...
TREE explain plan Example 2/2
EXPLAIN FORMAT=TREE
SELECT D.dname, LDT.min_sal, LDT.avg_sal, LDT.max_sal
FROM dept D,
LATERAL (SELECT MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
FROM emp E
WHERE E.deptno = D.deptno
) AS LDT;
+----------------------------------------------------------------------+
| EXPLAIN |
+----------------------------------------------------------------------+
| -> Nested loop inner join
    -> Invalidate materialized tables (row from D) (cost=0.65 rows=4)
        -> Table scan on D (cost=0.65 rows=4)
    -> Table scan on LDT
        -> Materialize (invalidate on row from D)
            -> Aggregate: min(e.sal), avg(e.sal), max(e.sal)
                -> Index lookup on E using fk_deptno (deptno=d.deptno)
                   (cost=36095.62 rows=332359) |
+----------------------------------------------------------------------+
Hash join optimization
• New in MySQL 8.0.18 GA (released 2019-10-14)
• Before, nested loop was the only join type, implemented by the
Nested Loop Join (NLJ), Block Nested Loop (BNL)
and Batched Key Access (BKA) algorithms
• Hashes the smaller table and uses it to look up
rows from the other table
• Uses xxHash for extremely fast RAM hashing
• Ideally, hashing is done entirely in
memory, but it can also use disk (less efficient)
• You may need to adjust join buffer size (see
join_buffer_size)
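[Figure: rows of Table 1 and Table 2 are hashed with xxHash64; Table 1 hashes fill the join buffer, and Table 2 hashes are matched (=) against it to produce the result]
See WL#2241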
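A minimal in-memory sketch of the build and probe phases, in Python. Assumptions: rows are dicts, the build input fits in memory (no spilling to disk), Python's own dict hashing stands in for xxHash64, and the sample rows are a small hypothetical subset shaped like the emp and job_sal tables used in the following example.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner equi-join: hash the (smaller) build input, then probe it.

    Python's dict hashing stands in for xxHash64 here.
    """
    # Build phase: join key -> build rows with that key (the "join buffer").
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: one hash lookup per probe row, no index needed.
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            yield {**match, **row}


job_sal = [
    {"job": "CLERK", "sal_min": 800, "sal_max": 1500},
    {"job": "ANALYST", "sal_min": 3000, "sal_max": 4000},
]
emp = [
    {"ename": "SMITH", "job": "CLERK", "sal": 800},
    {"ename": "FORD", "job": "ANALYST", "sal": 3000},
    {"ename": "KING", "job": "PRESIDENT", "sal": 5000},
]
rows = list(hash_join(job_sal, emp, "job", "job"))
print([r["ename"] for r in rows])  # ['SMITH', 'FORD']
```

Because each probe row needs only one hash lookup, the join column does not have to be indexed, which is why the optimizer can pick hash join when no index is usable.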
Hash join continued
• Used automatically for any query with an equi-join condition where the join
uses no indexes
• Also works for cartesian joins (i.e. for joins without a join
condition)
• Unfortunately, visible only in the TREE format of explain plan
• New hints for forcing hash join or NL - HASH_JOIN or
NO_HASH_JOIN
• Also on global or session level with hash_join=on|off in
optimizer_switch system variable
Hash join Example 1/2
CREATE TABLE job_sal (
job VARCHAR(9),
sal_min DECIMAL(9,2),
sal_max DECIMAL(9,2)
);
INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('ANALYST' , 3000, 4000);
INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('CLERK' , 800, 1500);
INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('MANAGER' , 2800, 3500);
INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('SALESMAN' , 1250, 1900);
INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('PRESIDENT', 5000, NULL);
Hash join Example 2/2
EXPLAIN FORMAT=TREE
SELECT E.ename, E.sal, JS.sal_min, JS.sal_max
FROM emp E,
job_sal JS
WHERE E.job = JS.job
AND E.sal NOT BETWEEN JS.sal_min AND JS.sal_max;
+---------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------+
| -> Filter: (E.sal not between JS.sal_min and JS.sal_max) (cost=499211.71 rows=442901)
    -> Inner hash join (E.job = JS.job) (cost=499211.71 rows=442901)
        -> Table scan on E (cost=1962.39 rows=996514)
        -> Hash
            -> Table scan on JS (cost=0.75 rows=5) |
+---------------------------------------------------------------------------------------+
1 row in set (0.0011 sec)
Block Nested Loop (BNL) vs Hash join
Block Nested Loop (BNL)
• The BNL query ran for about
1.30 sec for 1M employees
• For equality and non-equality
joins
• For smaller result sets, BNL
should be just fine
Hash join
• The hash join query ran for about
0.9 sec for 1M employees
• For equality joins only
• For large result sets hash join
should be faster
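For comparison, a Python sketch of the Block Nested Loop idea: rows of the outer table are collected in a join buffer, and the inner table is scanned once per buffer fill, testing any join condition (here the NOT BETWEEN salary check from the hash join example). Assumptions: the buffer size is counted in rows rather than bytes (join_buffer_size is in bytes), and the rows are a small subset of the example data.

```python
def block_nested_loop(outer_rows, inner_rows, condition, buffer_size):
    """Block Nested Loop join: buffer outer rows, scan inner once per block.

    buffer_size is a row count standing in for join_buffer_size (bytes in MySQL).
    Returns the joined pairs and the number of inner-table scans performed.
    """
    result, scans = [], 0
    for start in range(0, len(outer_rows), buffer_size):
        block = outer_rows[start:start + buffer_size]  # fill the join buffer
        scans += 1
        for inner in inner_rows:  # one full scan of the inner table per block
            for outer in block:
                if condition(outer, inner):  # any condition, not only equality
                    result.append((outer, inner))
    return result, scans


# (job, sal_min, sal_max) and (ename, job, sal) rows from the example data.
job_sal = [("CLERK", 800, 1500), ("ANALYST", 3000, 4000), ("MANAGER", 2800, 3500)]
emp = [("SMITH", "CLERK", 800), ("CLARK", "MANAGER", 2450), ("FORD", "ANALYST", 3000)]


def cond(js, e):
    # E.job = JS.job AND E.sal NOT BETWEEN JS.sal_min AND JS.sal_max
    return e[1] == js[0] and not (js[1] <= e[2] <= js[2])


pairs, scans = block_nested_loop(job_sal, emp, cond, buffer_size=2)
print([e[0] for _, e in pairs], scans)  # ['CLARK'] 2
```

A bigger buffer means fewer inner scans, but every buffered row is still compared with every inner row, which is why hash join tends to win on large equality joins.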
EXPLAIN ANALYZE
• New in MySQL 8.0.18 GA (released 2019-10-14)
• Actually executes the query and provides timings from the execution
• Useful for comparing optimizer’s estimations to actual execution
• Output is in TREE format only (hopefully just for now – see WL#4168)
• In addition to the TREE output, it also provides information about:
• time to return first row
• time to return all rows
• number of returned rows
• number of loops
• Only for SELECT statements. Cannot be used with FOR CONNECTION
See also EXPLAIN for PostgreSQL
EXPLAIN ANALYZE Example 1
+--------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------+
| -> Filter: (e.sal not between js.sal_min and js.sal_max) (cost=499211.71 rows=442901)
       (actual time=0.098..778.486 rows=915166 loops=1)
    -> Inner hash join (e.job = js.job) (cost=499211.71 rows=442901)
           (actual time=0.089..568.473 rows=1000014 loops=1)
        -> Table scan on E (cost=1962.39 rows=996514)
               (actual time=0.025..288.830 rows=1000014 loops=1)
        -> Hash
            -> Table scan on JS (cost=0.75 rows=5) (actual time=0.041..0.048 rows=5 loops=1)|
+--------------------------------------------------------------------------------------------+
1 row in set (0.8240 sec)
EXPLAIN ANALYZE
SELECT E.ename, E.sal, JS.sal_min, JS.sal_max
FROM emp E,
job_sal JS
WHERE E.job = JS.job
AND E.sal NOT BETWEEN JS.sal_min AND JS.sal_max;
Note: Actual times are in milliseconds!
EXPLAIN ANALYZE Example 2
EXPLAIN ANALYZE
SELECT E.ename, E.job, D.dname
FROM dept D,
emp E
WHERE E.deptno = D.deptno
AND E.job = 'CLERK';
+------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------+
| -> Nested loop inner join (cost=112394.29 rows=333278)
       (actual time=5.904..3281.472 rows=333278 loops=1)
    -> Table scan on D (cost=1.40 rows=4)
           (actual time=4.659..4.666 rows=4 loops=1)
    -> Filter: (e.job = 'CLERK') (cost=5180.86 rows=83319)
           (actual time=1.016..811.246 rows=83320 loops=4)
        -> Index lookup on E using fk_deptno (deptno=d.deptno) (cost=5180.86 rows=250004)
               (actual time=1.013..786.799 rows=250004 loops=4) |
+------------------------------------------------------------------------------------------+
1 row in set (3.3051 sec)
first row -> 5.904 ≈ 5.675 = 4.659 + 1.016
all rows -> 3 281.472 ≈ 3 249.65 = 4.666 + 4 * 811.246
Note: Actual times are per-loop averages, so multiply them by the number of loops!
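The arithmetic above can be checked with a few lines of Python. This sketch assumes, as the calculation does, that each iterator's actual time and rows are per-loop averages, so totals come from multiplying by loops; the Node class is a hypothetical helper, not part of any MySQL API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Timings of one EXPLAIN ANALYZE iterator (per-loop averages, in ms)."""
    first_row_ms: float
    all_rows_ms: float
    rows: int
    loops: int

    def total_rows(self):
        return self.rows * self.loops

    def total_ms(self):
        return self.all_rows_ms * self.loops


# Children of "Nested loop inner join" in the plan above.
scan_d = Node(4.659, 4.666, 4, 1)
filter_e = Node(1.016, 811.246, 83320, 4)

first_row = scan_d.first_row_ms + filter_e.first_row_ms
all_rows = scan_d.all_rows_ms + filter_e.total_ms()
print(round(first_row, 3), round(all_rows, 3), filter_e.total_rows())
```

The total of 333 280 rows differs slightly from the 333 278 reported for the join because the per-loop row count (83 320) is itself a rounded average.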
Problems with TREE and EXPLAIN ANALYZE
• Will not explain some queries, e.g. those using block nested loop (shows just “not executable
by iterator executor”)
• Will not explain SELECT COUNT(*) FROM table queries (shows
just “Count rows in table”)
• Does not compute select list subqueries (see bug 97296) [FIXED]
• Not integrated with MySQL Workbench (see bug 97282)
• Does not print units on timings (see bug 97492)
Visual explain plans
• Displayed by default in MySQL Workbench
Additional information by JSON/visual explain
• Used columns – list of columns either read or written
• Used key parts – list of used key parts
• Rows produced per join – estimated rows after join
• Cost estimates (since 5.7.2) are split into:
• query cost – total cost of query (or subquery) block
• sort cost (CPU) – cost of the first sorting operation
• read cost (IO) – the cost of reading data from the table
• eval cost (CPU) – the cost of condition evaluation
• prefix cost (CPU) – the cost of joining tables
• data read – estimated amount of data processed (rows x
record width)
See WL#6510 and MySQL 5.7.2 Release notes
Benefits from visual explain plans
• Help to easily understand where the problem is
• easily spot bad access paths by box color:
• missing index(es);
• wrong or insufficient join conditions
• easily spot where most rows are generated by the thickness of lines:
• bad access path and/or no filter;
• involuntary cartesian joins;
• wrong join order
• Cost calculations are not fully documented, so they are hard to understand
• basically cost values are related to the number of blocks read (IO) and rows
processed (CPU)
• costs reflect the work done by the database
Visual explain plans Example 1 1/2
MySQL 5.5 ≈ 9.5 sec
Option A: Add WHERE
condition in the subquery
MySQL 5.5/8.0 ≈ 1.2 sec
Visual explain plans Example 1 2/2
MySQL 8.0 ≈ 0.015 sec, MySQL 5.5 ≈ 0.015 sec
Option B: Use derived
table instead of subquery
in WHERE
Option C: Use MySQL 8 ;-)
Visual explain plans Example 2 1/3
In MySQL 5.7.20: 6-7 sec
Visual explain plans Example 2 2/3
In MySQL 8.0.13: ≈ 24 sec
Visual explain plans Example 2 3/3
Option A: Use JOIN_ORDER hint
Option B: Re-create multi-column index on two instead of
three columns to improve selectivity (one of the columns in
the index had low cardinality)
Option C: Option B + remove table msg_progs
Execution time: ≈ 4 sec
Summary
• Use EXPLAIN to examine and tune query plan
• Use EXPLAIN ANALYZE to profile query execution
• Use different types of indexes and histograms properly
• There are no strict rules to follow for optimizing a query
• So be creative and evaluate different options
References
• MySQL Reference Manual (and section Optimization in particular)
• Use the index, Luke! and Modern SQL sites by Markus Winand
• MySQL EXPLAIN Explained by Øystein Grøvlen
• Histogram statistics in MySQL & Hash join in MySQL 8 by Erik Frøseth
• MySQL Explain Example by Tomer Shay
• MySQL EXPLAIN ANALYZE by Norvald H. Ryeng
• Using Explain Analyze in MySQL 8 from Percona blog
• My blog posts on MySQL
Questions?
 
MySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer GuideMySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer Guide
 
MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6
 
Window functions in MySQL 8.0
Window functions in MySQL 8.0Window functions in MySQL 8.0
Window functions in MySQL 8.0
 
Histogram-in-Parallel-universe-of-MySQL-and-MariaDB
Histogram-in-Parallel-universe-of-MySQL-and-MariaDBHistogram-in-Parallel-universe-of-MySQL-and-MariaDB
Histogram-in-Parallel-universe-of-MySQL-and-MariaDB
 
MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바
 
MySQL partitions tutorial
MySQL partitions tutorialMySQL partitions tutorial
MySQL partitions tutorial
 
MySQL Performance Schema in 20 Minutes
 MySQL Performance Schema in 20 Minutes MySQL Performance Schema in 20 Minutes
MySQL Performance Schema in 20 Minutes
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQL
 

Similar to Optimizing queries MySQL

Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.
Mydbops
 
Explain
ExplainExplain
Applied Partitioning And Scaling Your Database System Presentation
Applied Partitioning And Scaling Your Database System PresentationApplied Partitioning And Scaling Your Database System Presentation
Applied Partitioning And Scaling Your Database System Presentation
Richard Crowley
 
DATA BASE || INTRODUCTION OF DATABASE \\ SQL 2018
DATA BASE || INTRODUCTION OF DATABASE \\ SQL 2018DATA BASE || INTRODUCTION OF DATABASE \\ SQL 2018
DATA BASE || INTRODUCTION OF DATABASE \\ SQL 2018
teachersduniya.com
 
Sql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices ISql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices I
Carlos Oliveira
 
Distributed databases systems-3-2015.ppt
Distributed databases systems-3-2015.pptDistributed databases systems-3-2015.ppt
Distributed databases systems-3-2015.ppt
NaglaaAbdelhady
 
4. Data Manipulation.ppt
4. Data Manipulation.ppt4. Data Manipulation.ppt
4. Data Manipulation.ppt
KISHOYIANKISH
 
Lecture 9.pptx
Lecture 9.pptxLecture 9.pptx
Lecture 9.pptx
MathewJohnSinoCruz
 
Introduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sIntroduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]s
Sveta Smirnova
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New Tricks
MYXPLAIN
 
New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12
Sergey Petrunya
 
15 protips for mysql users pfz
15 protips for mysql users   pfz15 protips for mysql users   pfz
15 protips for mysql users pfz
Joshua Thijssen
 
MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015
Dave Stokes
 
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
Dave Stokes
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL Server
Stéphane Fréchette
 
Need for Speed: MySQL Indexing
Need for Speed: MySQL IndexingNeed for Speed: MySQL Indexing
Need for Speed: MySQL Indexing
MYXPLAIN
 
New SQL features in latest MySQL releases
New SQL features in latest MySQL releasesNew SQL features in latest MySQL releases
New SQL features in latest MySQL releases
Georgi Sotirov
 
MariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
MariaDB Server 10.3 - Temporale Daten und neues zur DB-KompatibilitätMariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
MariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
MariaDB plc
 
What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3
MariaDB plc
 
Advance MySQL Training by Pratyush Majumdar
Advance MySQL Training by Pratyush MajumdarAdvance MySQL Training by Pratyush Majumdar
Advance MySQL Training by Pratyush Majumdar
Pratyush Majumdar
 

Similar to Optimizing queries MySQL (20)

Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.
 
Explain
ExplainExplain
Explain
 
Applied Partitioning And Scaling Your Database System Presentation
Applied Partitioning And Scaling Your Database System PresentationApplied Partitioning And Scaling Your Database System Presentation
Applied Partitioning And Scaling Your Database System Presentation
 
DATA BASE || INTRODUCTION OF DATABASE \\ SQL 2018
DATA BASE || INTRODUCTION OF DATABASE \\ SQL 2018DATA BASE || INTRODUCTION OF DATABASE \\ SQL 2018
DATA BASE || INTRODUCTION OF DATABASE \\ SQL 2018
 
Sql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices ISql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices I
 
Distributed databases systems-3-2015.ppt
Distributed databases systems-3-2015.pptDistributed databases systems-3-2015.ppt
Distributed databases systems-3-2015.ppt
 
4. Data Manipulation.ppt
4. Data Manipulation.ppt4. Data Manipulation.ppt
4. Data Manipulation.ppt
 
Lecture 9.pptx
Lecture 9.pptxLecture 9.pptx
Lecture 9.pptx
 
Introduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sIntroduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]s
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New Tricks
 
New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12
 
15 protips for mysql users pfz
15 protips for mysql users   pfz15 protips for mysql users   pfz
15 protips for mysql users pfz
 
MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015
 
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL Server
 
Need for Speed: MySQL Indexing
Need for Speed: MySQL IndexingNeed for Speed: MySQL Indexing
Need for Speed: MySQL Indexing
 
New SQL features in latest MySQL releases
New SQL features in latest MySQL releasesNew SQL features in latest MySQL releases
New SQL features in latest MySQL releases
 
MariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
MariaDB Server 10.3 - Temporale Daten und neues zur DB-KompatibilitätMariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
MariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
 
What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3
 
Advance MySQL Training by Pratyush Majumdar
Advance MySQL Training by Pratyush MajumdarAdvance MySQL Training by Pratyush Majumdar
Advance MySQL Training by Pratyush Majumdar
 

Recently uploaded

Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 

Recently uploaded (20)

Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 

Optimizing queries MySQL

  • 6. Why should we as developers care?
    • We (should) know the schema (or should we?)
    • We know how the application(s) query data (i.e. the access paths), as we are familiar with the implemented functionality
    • We know what data is stored in the tables, as it comes from the application(s)
    • We should care about the performance of the application(s), which may depend on the performance of SQL queries
    • We do not need to depend on others (e.g. DBAs) if we can diagnose and fix slow queries ourselves
  • 7. How to explain queries - EXPLAIN syntax
    • The general syntax is:
        {EXPLAIN | DESCRIBE | DESC} [FORMAT = {TRADITIONAL | JSON | TREE}] {statement | FOR CONNECTION con_id}
    • The statement can be SELECT, DELETE, INSERT, REPLACE or UPDATE (before 5.6.3 only SELECT was supported)
    • The TREE format is new since 8.0.16 GA (2019-04-25)
    • EXPLAIN FOR CONNECTION explains the statement currently executing in another connection (since 5.7.2)
    • Requires the SELECT privilege for the explained tables and views, plus the SHOW VIEW privilege for views
    • DESCRIBE is a synonym for EXPLAIN, but in practice it is mostly used to get a table's structure
  • 8. Example schema - departments and employees
        CREATE DATABASE dept_emp;
        USE dept_emp;

        CREATE TABLE dept (
          deptno INTEGER,
          dname  VARCHAR(14),
          loc    VARCHAR(13),
          CONSTRAINT pk_dept PRIMARY KEY (deptno)
        );

        CREATE TABLE emp (
          empno    INTEGER,
          ename    VARCHAR(10),
          job      VARCHAR(9),
          mgr      INTEGER,
          hiredate DATE,
          sal      DECIMAL(7,2),
          comm     DECIMAL(7,2),
          deptno   INTEGER,
          CONSTRAINT pk_emp PRIMARY KEY (empno),
          CONSTRAINT fk_deptno FOREIGN KEY (deptno) REFERENCES dept (deptno)
        );
  • 9. Example schema - data
        INSERT INTO dept VALUES (10, 'ACCOUNTING', 'NEW YORK');
        INSERT INTO dept VALUES (20, 'RESEARCH'  , 'DALLAS');
        INSERT INTO dept VALUES (30, 'SALES'     , 'CHICAGO');
        INSERT INTO dept VALUES (40, 'OPERATIONS', 'BOSTON');

        INSERT INTO emp VALUES (7839, 'KING'  , 'PRESIDENT', NULL, '1981-11-17', 5000, NULL, 10);
        INSERT INTO emp VALUES (7698, 'BLAKE' , 'MANAGER'  , 7839, '1981-05-01', 2850, NULL, 30);
        INSERT INTO emp VALUES (7782, 'CLARK' , 'MANAGER'  , 7839, '1981-06-09', 2450, NULL, 10);
        INSERT INTO emp VALUES (7566, 'JONES' , 'MANAGER'  , 7839, '1981-04-02', 2975, NULL, 20);
        INSERT INTO emp VALUES (7788, 'SCOTT' , 'ANALYST'  , 7566, '1987-06-13', 3000, NULL, 20);
        INSERT INTO emp VALUES (7902, 'FORD'  , 'ANALYST'  , 7566, '1981-12-03', 3000, NULL, 20);
        INSERT INTO emp VALUES (7369, 'SMITH' , 'CLERK'    , 7902, '1980-12-17',  800, NULL, 20);
        INSERT INTO emp VALUES (7499, 'ALLEN' , 'SALESMAN' , 7698, '1981-02-20', 1600,  300, 30);
        INSERT INTO emp VALUES (7521, 'WARD'  , 'SALESMAN' , 7698, '1981-02-22', 1250,  500, 30);
        INSERT INTO emp VALUES (7654, 'MARTIN', 'SALESMAN' , 7698, '1981-09-28', 1250, 1400, 30);
        INSERT INTO emp VALUES (7844, 'TURNER', 'SALESMAN' , 7698, '1981-09-08', 1500,    0, 30);
        INSERT INTO emp VALUES (7876, 'ADAMS' , 'CLERK'    , 7788, '1987-06-13', 1100, NULL, 20);
        INSERT INTO emp VALUES (7900, 'JAMES' , 'CLERK'    , 7698, '1981-12-03',  950, NULL, 30);
        INSERT INTO emp VALUES (7934, 'MILLER', 'CLERK'    , 7782, '1982-01-23', 1300, NULL, 10);

        + 1 000 000 generated rows (see gen_emps.sql)
  • 10. Traditional explain plan - example
        EXPLAIN
        SELECT E.ename, E.job, D.dname
          FROM dept D, emp E
         WHERE E.deptno = D.deptno
           AND E.job = 'CLERK';

        (partitions and key_len columns omitted)
        +----+-------------+-------+------+---------------+-----------+-------------------+--------+----------+-------------+
        | id | select_type | table | type | possible_keys | key       | ref               | rows   | filtered | Extra       |
        +----+-------------+-------+------+---------------+-----------+-------------------+--------+----------+-------------+
        |  1 | SIMPLE      | D     | ALL  | PRIMARY       | NULL      | NULL              |      4 |      100 | NULL        |
        |  1 | SIMPLE      | E     | ref  | fk_deptno     | fk_deptno | dept_emp.D.deptno | 250003 |       10 | Using where |
        +----+-------------+-------+------+---------------+-----------+-------------------+--------+----------+-------------+
        2 rows in set, 1 warning (0.0006 sec)

        Note (code 1003): /* select#1 */ select `dept_emp`.`e`.`ename` AS `ename`,`dept_emp`.`e`.`job` AS `job`,`dept_emp`.`d`.`dname` AS `dname` from `dept_emp`.`dept` `d` join `dept_emp`.`emp` `e` where ((`dept_emp`.`e`.`deptno` = `dept_emp`.`d`.`deptno`) and (`dept_emp`.`e`.`job` = 'CLERK'))

    • Estimated rows to join from E = 250 003 * 10% = 25 000.3 (per department)
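The 25 000.3 estimate is simply `rows` multiplied by `filtered`. A minimal sketch, assuming a plain nested-loop plan in which every row produced by the earlier tables triggers one access to the next table (the `PlanRow` class and `rows_after_each_step` function are hypothetical helpers, not MySQL code), shows how such estimates compound across the plan:

```python
from dataclasses import dataclass


@dataclass
class PlanRow:
    """One row of a tabular EXPLAIN plan (hypothetical, simplified)."""
    table: str
    rows: float      # EXPLAIN 'rows': estimated rows examined per access
    filtered: float  # EXPLAIN 'filtered': estimated percentage kept by the conditions


def rows_after_each_step(plan):
    """Cumulative estimated row counts produced after each table of a nested-loop plan."""
    produced = 1.0
    result = []
    for step in plan:
        # Every row produced so far triggers one access to this table, which
        # examines `rows` rows of which `filtered` percent are kept.
        produced *= step.rows * step.filtered / 100.0
        result.append((step.table, produced))
    return result


plan = [PlanRow("D", 4, 100.0), PlanRow("E", 250003, 10.0)]
print(250003 * 10.0 / 100.0)  # 25000.3 rows from E per department
for table, produced in rows_after_each_step(plan):
    print(f"after {table}: ~{produced:,.1f} rows")
```

For this plan the sketch gives about 4 rows after D and about 100 001 rows after E, i.e. 25 000.3 per department times 4 departments.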
  • 11. Understanding tabular explain plans 1/5
    • id: the sequential number of the SELECT within the query
    • select_type: the type of SELECT
        +----+-------------+
        | id | select_type |
        +----+-------------+
        |  1 | SIMPLE      |
        |  1 | SIMPLE      |
        +----+-------------+
    • Possible select_type values include:
        Value                  Meaning
        SIMPLE                 no unions or subqueries
        PRIMARY                outermost SELECT
        [DEPENDENT] UNION      second or later SELECT in a union [dependent on outer query]
        UNION RESULT           result of a union
        [DEPENDENT] SUBQUERY   first SELECT in a subquery [dependent on outer query]
        [DEPENDENT] DERIVED    derived table [dependent on another table]
        MATERIALIZED           materialized subquery
        UNCACHEABLE SUBQUERY   subquery whose result cannot be cached and must be re-evaluated for each row of the outer query
        UNCACHEABLE UNION      second or later SELECT in a union that belongs to an uncacheable subquery
  • 12. Understanding tabular explain plans 2/5
    • table: the table the row refers to; it can be
      • a table alias (or name);
      • <unionM,N>: the union of the rows with ids M and N;
      • <subqueryN>: the result of the materialized subquery from row N;
      • <derivedN>: the result of the derived table from row N.
    • partitions: NULL or the names of the matched partitions
    • type: join (access) type
        +-------+------------+------+
        | table | partitions | type |
        +-------+------------+------+
        | D     | NULL       | ALL  |
        | E     | NULL       | ref  |
        +-------+------------+------+
  • 13. Join (access) types, from best to worst
        Value            Meaning
        system           table has just one row (a special case of const)
        const            table has at most one matching row, found by a constant value
        eq_ref           1:1 relations (PRIMARY KEY or UNIQUE NOT NULL indexes)
        ref              1:N relations (non-unique indexes)
        fulltext         join using a FULLTEXT index
        ref_or_null      like ref, but also searches for NULL values
        index_merge      index merge optimization (results of range scans on several indexes are merged)
        unique_subquery  for some IN subqueries returning primary key values
        index_subquery   same as unique_subquery, but for non-unique indexes
        range            for comparison operators (e.g. >, <, >=, <=), BETWEEN, IN, LIKE
        index            same as ALL, but only the index tree is scanned
        ALL              all rows (i.e. full table scan)
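When reading a longer plan, it can help to sort its rows by access type so that full scans and index scans stand out first. A minimal sketch, assuming the plan is given as hypothetical `(table, type)` pairs copied from EXPLAIN output and using the best-to-worst order of the MySQL reference manual; the ranking is only a rough guide, since a full scan of a 4-row table (like D in the earlier example) can be perfectly fine:

```python
# Join (access) types ordered from best to worst, as in the MySQL reference manual.
JOIN_TYPES = [
    "system", "const", "eq_ref", "ref", "fulltext", "ref_or_null",
    "index_merge", "unique_subquery", "index_subquery", "range", "index", "ALL",
]
RANK = {name: position for position, name in enumerate(JOIN_TYPES)}


def worst_first(plan):
    """Sort (table, type) pairs so that the least efficient access types come first."""
    return sorted(plan, key=lambda row: RANK[row[1]], reverse=True)


plan = [("E", "ref"), ("D", "ALL")]
for table, access in worst_first(plan):
    print(table, access)
```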
  • 14. Understanding tabular explain plans 3/5
    • possible_keys: NULL or the names of indexes that could be used
    • key: the index actually used (may be one not listed in possible_keys)
    • key_len: length in bytes of the used key parts (e.g. 4 bytes for INT plus 1 byte if the column is nullable; summed over the used parts of composite indexes)
    • ref: columns or constants compared to the index
    • rows: estimated number of rows to be examined by the chosen access path
        +---------------+-----------+---------+-------------------+--------+
        | possible_keys | key       | key_len | ref               | rows   |
        +---------------+-----------+---------+-------------------+--------+
        | PRIMARY       | NULL      | NULL    | NULL              |      4 |
        | fk_deptno     | fk_deptno | 5       | dept_emp.D.deptno | 250003 |
        +---------------+-----------+---------+-------------------+--------+
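The key_len of 5 for fk_deptno comes from 4 bytes for the INTEGER column deptno plus 1 byte because the column is nullable. The sketch below encodes these commonly described rules for a small, assumed subset of types (fixed-size integers, DATE and VARCHAR with a 2-byte length prefix); it is an approximation for reading EXPLAIN output, not MySQL's own code, and the composite (deptno, ename) index in the example is hypothetical:

```python
# Bytes per value of some fixed-length types (assumed subset for illustration).
FIXED_SIZES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8, "DATE": 3}

# Maximum bytes per character of some character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def key_part_len(col_type, nullable, length=None, charset="utf8mb4"):
    """Approximate key_len contribution of one indexed column."""
    if col_type == "VARCHAR":
        # Maximum character length times bytes per character, plus a 2-byte length prefix.
        size = length * MAX_BYTES_PER_CHAR[charset] + 2
    else:
        size = FIXED_SIZES[col_type]
    # Nullable columns need one extra byte for the NULL flag.
    return size + (1 if nullable else 0)


def key_len(parts):
    """key_len of the used key parts of a (possibly composite) index."""
    return sum(key_part_len(**part) for part in parts)


# emp.deptno is a nullable INTEGER, hence key_len = 5 for fk_deptno.
print(key_len([{"col_type": "INT", "nullable": True}]))  # 5
# Hypothetical composite index on (deptno, ename) with ename VARCHAR(10) in utf8mb4.
print(key_len([
    {"col_type": "INT", "nullable": True},
    {"col_type": "VARCHAR", "nullable": True, "length": 10},
]))  # 48
```

A key_len smaller than the full index length is a hint that only a leftmost prefix of a composite index is being used.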
  • 15. Understanding tabular explain plans 4/5
    • filtered: estimated percentage of the examined rows that remain after applying the table's conditions (before 5.7.3, EXPLAIN EXTENDED was needed to show it)
    • rows x filtered / 100 gives the estimated number of rows joined with the following table
    • Filter estimates are based on range estimates (e.g. for dates), index statistics or hardcoded selectivity factors (see Selinger):
        +----------+
        | filtered |
        +----------+
        |      100 |
        |       10 |
        +----------+

        Predicate                   Selectivity                                        Filtered
        Equality (=)                0.1                                                10.00 %
        Comparison (>, <, >=, <=)   1/3 ≈ 0.33                                         33.33 %
        BETWEEN (also LIKE)         1/9 ≈ 0.11 (it's 1/4 by Selinger!)                 11.11 %
        (pred1) AND (pred2)         SEL(pred1) * SEL(pred2)
        (pred1) OR (pred2)          SEL(pred1) + SEL(pred2) - SEL(pred1) * SEL(pred2)
        NOT (pred)                  1 - SEL(pred)
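The AND/OR/NOT rules in the table assume that predicates are independent of each other. A minimal sketch of that arithmetic with the hardcoded factors from the table (the column names in the comments are only illustrative):

```python
# Hardcoded selectivity factors from the table above.
EQUALITY = 0.1
COMPARISON = 1 / 3
BETWEEN = 1 / 9


def sel_and(a, b):
    # Both predicates must hold: multiply (independence assumption).
    return a * b


def sel_or(a, b):
    # Either predicate holds: inclusion-exclusion.
    return a + b - a * b


def sel_not(a):
    return 1 - a


both = sel_and(EQUALITY, COMPARISON)    # e.g. job = 'CLERK' AND sal > 1000
either = sel_or(EQUALITY, COMPARISON)   # e.g. job = 'CLERK' OR sal > 1000
negated = sel_not(EQUALITY)             # e.g. NOT (job = 'CLERK')
print(f"{both:.2%} {either:.2%} {negated:.2%}")  # 3.33% 40.00% 90.00%
```

When the columns are correlated (for example job and sal), the independence assumption can make these estimates far off, which is one reason histograms and index statistics matter.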
  • 16. Understanding tabular explain plans 5/5
    • Extra: additional information, e.g.:
      • NULL;
      • Plan isn't ready yet: with EXPLAIN FOR CONNECTION, the optimizer has not finished the plan for the statement in the named connection;
      • Recursive: the row belongs to the recursive part of a recursive CTE;
      • Rematerialize: for dependent lateral derived tables;
      • Using filesort: an extra sorting pass is needed (e.g. when no index provides the required order);
      • Using index: only the index tree is read (covering index);
      • Using index condition: WHERE conditions are pushed down to the storage engine (Index Condition Pushdown);
      • Using join buffer (Block Nested Loop): BNL join algorithm;
      • Using join buffer (Batched Key Access): BKA join algorithm;
      • Using temporary: a temporary table is used (e.g. for GROUP BY/ORDER BY);
      • Using where: rows are filtered by a WHERE condition;
      • and many more.
  • 17. Indexes
    • Indexes improve the efficiency of queries by providing faster access to data
    • Proper indexing reduces query response time and improves scalability
    • As developers we should create the indexes, because we know the access paths of the application(s)
    • MySQL supports B-Tree, hash, full-text and spatial (R-Tree) indexes
    • B-Tree indexes are ordered: ascending by default, and also descending since MySQL 8.0.1 DMR
    • Indexes can be composite (on multiple columns), functional (since 8.0.13 GA) and multi-valued (since 8.0.17 GA)
    • Indexes can also be invisible (since 8.0.0 DMR)
    • Run ANALYZE TABLE regularly to update index cardinality statistics
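Why an ordered B-Tree index speeds up lookups, and why a composite index serves conditions on its leftmost column, can be illustrated with a sorted list and binary search. This is a simplified model (a Python list stands in for the B-Tree, and the composite index on emp (deptno, empno) is hypothetical), not how InnoDB stores index pages:

```python
from bisect import bisect_left

# Hypothetical composite index on emp (deptno, empno) for the 14 sample rows.
# Keeping the entries sorted stands in for the ordered B-Tree.
INDEX = sorted([
    (10, 7839), (30, 7698), (10, 7782), (20, 7566), (20, 7788), (20, 7902), (20, 7369),
    (30, 7499), (30, 7521), (30, 7654), (30, 7844), (20, 7876), (30, 7900), (10, 7934),
])


def ref_lookup(index, deptno):
    """Equality on the leftmost column: binary search for the block of matching entries."""
    start = bisect_left(index, (deptno,))    # (d,) sorts before every (d, empno)
    end = bisect_left(index, (deptno + 1,))  # assumes integer deptno values
    return index[start:end]


def full_scan(index, deptno):
    """The same result by examining every entry, as a scan without a usable index must."""
    return [entry for entry in index if entry[0] == deptno]


print(ref_lookup(INDEX, 10))  # [(10, 7782), (10, 7839), (10, 7934)]
```

The binary search touches only a logarithmic number of entries before reading the matching block, while the scan touches all of them. A condition on empno alone could not use this search, because the entries are ordered by deptno first: this is the leftmost-prefix rule for composite indexes.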
  • 18. Optimizer hints
    • Fine-grained control over the optimizer's execution plans for individual statements
    • Statement-level hints first appeared in MySQL 5.7.7 RC (2015-04-08)
    • Extended in 8.0 with new hints for join order, resource groups, etc.
    • The syntax is similar to Oracle's, e.g. /*+ JOIN_ORDER(...) */
    • The optimizer_switch system variable can also be used, but only at global or session level
    • The necessary "evil" when the optimizer cannot choose the best execution plan by itself
    • Several scope levels: query block, table, join order, index, subquery and global (e.g. MAX_EXECUTION_TIME, RESOURCE_GROUP, SET_VAR)
  • 19. Optimizer hints Example
    SELECT E.ename, E.job, D.dname
      FROM dept D, emp E
     WHERE E.deptno = D.deptno
       AND E.job = 'PRESIDENT';
    +----+-------+------+-----------+--------+----------+
    | id | table | type | key       | rows   | filtered |
    +----+-------+------+-----------+--------+----------+
    |  1 | D     | ALL  | NULL      |      4 |      100 |
    |  1 | E     | ref  | fk_deptno | 250003 |       10 |
    +----+-------+------+-----------+--------+----------+
    Execution time: ≈ 2.1 sec

    SELECT /*+ JOIN_ORDER(E, D) */ E.ename, E.job, D.dname
      FROM dept D, emp E
     WHERE E.deptno = D.deptno
       AND E.job = 'PRESIDENT';
    +----+-------+--------+---------+---------+----------+
    | id | table | type   | key     | rows    | filtered |
    +----+-------+--------+---------+---------+----------+
    |  1 | E     | ALL    | NULL    | 1000014 | 9.995684 |
    |  1 | D     | eq_ref | PRIMARY |       1 |      100 |
    +----+-------+--------+---------+---------+----------+
    Execution time: ≈ 0.3 sec
  • 20. Histograms
    • Since MySQL 8.0.3 RC (released 2017-09-21)
    • Statistical information about the distribution of values in a column
    • Values are grouped in buckets (maximum 1024)
    • Cumulative frequency is automatically calculated for each bucket
    • For large data sets the data is sampled so that it fits in memory (see the histogram_generation_max_mem_size variable)
    • Sampling requires a full table scan, but should be improved for InnoDB in 8.0.19 (probably to be released in January 2020)
    • Sampling is not deterministic!
  • 21. Histogram types
    • Singleton
      • Single value per bucket
      • Bucket stores value and cumulative frequency
      • Useful for estimation of equality and range conditions
    • Equi-height
      • Multiple values per bucket
      • Bucket stores min and max inclusive values, cumulative frequency and number of distinct values
      • Frequent values in separate buckets
      • Most useful for range conditions
    [Charts: singleton histogram – frequency per value (1 to 5); equi-height histogram – frequency per bucket [1,7], 8, [9,12], [13,19], [20,25]]
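The two histogram types can be sketched in a few lines. This is a simplified, hypothetical model: both functions use exact counts (no sampling), and the equi-height split is a plain greedy pass that starts a new bucket whenever a value would overflow the current one, which is how a frequent value ends up in a bucket of its own. MySQL's real bucket-boundary algorithm may differ in detail.

```python
from collections import Counter
from typing import Any, List, Tuple


def singleton_histogram(values: List[Any]) -> List[Tuple[Any, float]]:
    """One bucket per distinct value: (value, cumulative frequency)."""
    counts = Counter(values)
    total = len(values)
    buckets, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        buckets.append((value, running / total))
    return buckets


def equi_height_histogram(values: List[Any], num_buckets: int) -> List[Tuple[Any, Any, float, int]]:
    """Buckets of roughly equal row counts: (min, max, cumulative frequency, distinct values).

    Simplified greedy split: a value that would overflow the current bucket starts
    a new one, so a very frequent value ends up alone in its own bucket.
    """
    counts = Counter(values)
    total = len(values)
    target = total / num_buckets
    buckets: List[Tuple[Any, Any, float, int]] = []
    current: List[Any] = []  # distinct values in the bucket being filled
    current_rows = 0
    running = 0              # rows seen so far, for the cumulative frequency

    def close() -> None:
        nonlocal current, current_rows
        buckets.append((current[0], current[-1], running / total, len(current)))
        current, current_rows = [], 0

    for value in sorted(counts):
        c = counts[value]
        if current and current_rows + c > target:
            close()
        current.append(value)
        current_rows += c
        running += c
        if current_rows >= target:
            close()
    if current:
        close()
    return buckets


if __name__ == "__main__":
    data = [1] * 2 + [2] * 2 + [3] * 6 + [4] * 2 + [5] * 2
    print(singleton_histogram(data))
    print(equi_height_histogram(data, 3))  # value 3 is frequent: own bucket
```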
  • 22. Histograms continued
    • Used by the optimizer for estimating join cost
    • Help the optimizer make better row estimates
    • Useful for columns that are NOT the first column of any index, but are used in WHERE clauses for joins or IN subqueries
    • Best for columns with:
      • low cardinality;
      • uneven distribution; and
      • distribution that does not vary much over time
    • Not useful for columns with constantly increasing values (e.g. dates, counters, etc.)
  • 23. Histogram Example 1/4 – creation and meta
    ANALYZE TABLE emp UPDATE HISTOGRAM ON job WITH 5 BUCKETS;

    SELECT JSON_PRETTY(`histogram`)
      FROM INFORMATION_SCHEMA.COLUMN_STATISTICS CS
     WHERE CS.`schema_name` = 'dept_emp'
       AND CS.`table_name`  = 'emp'
       AND CS.`column_name` = 'job';

    {"buckets": [
       ["base64:type254:QU5BTFlTVA==", 0.44427638528762128],
       ["base64:type254:Q0xFUks=", 0.7765828956408041],
       ["base64:type254:TUFOQUdFUg==", 0.8882609755557034],
       ["base64:type254:U0FMRVNNQU4=", 1.0]
     ],
     "data-type": "string",
     "null-values": 0.0,
     "collation-id": 33,
     "last-updated": "2019-10-17 07:33:42.007222",
     "sampling-rate": 0.10810659461578348,
     "histogram-type": "singleton",
     "number-of-buckets-specified": 5
    }
    Note: only 4 buckets – with a sampling rate of ≈ 0.11, the rare value PRESIDENT was not in the sample.
  • 24. Histogram Example 2/4 - sampling
    SET histogram_generation_max_mem_size = 184*1024*1024; /* 184 MB */
    ANALYZE TABLE emp UPDATE HISTOGRAM ON job WITH 5 BUCKETS;

    {"buckets": [
       ["base64:type254:QU5BTFlTVA==", 0.4440377834710314],
       ["base64:type254:Q0xFUks=", 0.777311117644353],
       ["base64:type254:TUFOQUdFUg==", 0.8887915569182032],
       ["base64:type254:UFJFU0lERU5U", 0.8887925569042035],
       ["base64:type254:U0FMRVNNQU4=", 1.0]
     ],
     "data-type": "string",
     "null-values": 0.0,
     "collation-id": 33,
     "last-updated": "2019-10-21 10:52:03.974566",
     "sampling-rate": 1.0,
     "histogram-type": "singleton",
     "number-of-buckets-specified": 5
    }
    Note: Setting histogram_generation_max_mem_size requires the SESSION_VARIABLES_ADMIN (since 8.0.14) or SYSTEM_VARIABLES_ADMIN privilege.
  • 25. Histogram Example 3/4 - frequencies
    SELECT HG.val, ROUND(HG.freq, 3) cfreq,
           ROUND(HG.freq - LAG(HG.freq, 1, 0) OVER (), 3) freq
      FROM INFORMATION_SCHEMA.COLUMN_STATISTICS CS,
           JSON_TABLE(`histogram`->'$.buckets', '$[*]'
                      COLUMNS(val  VARCHAR(10) PATH '$[0]',
                              freq DOUBLE      PATH '$[1]')) HG
     WHERE CS.`schema_name` = 'dept_emp'
       AND CS.`table_name`  = 'emp'
       AND CS.`column_name` = 'job';
    +-----------+-------+-------+
    | val       | cfreq | freq  |
    +-----------+-------+-------+
    | ANALYST   | 0.444 | 0.444 |
    | CLERK     | 0.777 | 0.333 |
    | MANAGER   | 0.889 | 0.111 |
    | PRESIDENT | 0.889 |     0 |
    | SALESMAN  |     1 | 0.111 |
    +-----------+-------+-------+
    5 rows in set (0.0009 sec)
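The JSON_TABLE/LAG query above can be mirrored outside the server to see what it computes: each bucket value is stored with a base64:typeNNN: prefix, and the per-bucket frequency is the difference between consecutive cumulative frequencies. A sketch assuming the JSON text has already been fetched; here the buckets from the 184 MB example are pasted in as a constant, and the helper names are hypothetical.

```python
import base64
import json
from typing import Iterator, Tuple

# Buckets copied from the JSON_PRETTY output of the sampling example (other keys omitted).
HISTOGRAM_JSON = """
{"buckets": [
  ["base64:type254:QU5BTFlTVA==", 0.4440377834710314],
  ["base64:type254:Q0xFUks=", 0.777311117644353],
  ["base64:type254:TUFOQUdFUg==", 0.8887915569182032],
  ["base64:type254:UFJFU0lERU5U", 0.8887925569042035],
  ["base64:type254:U0FMRVNNQU4=", 1.0]
 ],
 "histogram-type": "singleton"}
"""


def decode_value(encoded: str) -> str:
    """Drop the 'base64:typeNNN:' prefix and decode the bucket value."""
    _, _, payload = encoded.split(":", 2)
    return base64.b64decode(payload).decode("utf-8")


def bucket_frequencies(histogram: dict) -> Iterator[Tuple[str, float, float]]:
    """Yield (value, cumulative freq, bucket freq), like LAG(freq, 1, 0) OVER ()."""
    previous = 0.0
    for encoded, cumulative in histogram["buckets"]:
        yield decode_value(encoded), round(cumulative, 3), round(cumulative - previous, 3)
        previous = cumulative


if __name__ == "__main__":
    for value, cfreq, freq in bucket_frequencies(json.loads(HISTOGRAM_JSON)):
        print(f"{value:<10} {cfreq:<6} {freq}")
```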
  • 26. Histogram Example 4/4 – query plan effect
    SELECT E.ename, E.job, D.dname
      FROM dept D, emp E
     WHERE E.deptno = D.deptno
       AND E.job = 'CLERK';
    +----+-------------+-------+------+---------------+-----------+-------------------+--------+----------+-------------+
    | id | select_type | table | type | possible_keys | key       | ref               | rows   | filtered | Extra       |
    +----+-------------+-------+------+---------------+-----------+-------------------+--------+----------+-------------+
    |  1 | SIMPLE      | D     | ALL  | PRIMARY       | NULL      | NULL              |      4 |      100 | NULL        |
    |  1 | SIMPLE      | E     | ref  | fk_deptno     | fk_deptno | dept_emp.D.deptno | 250003 | 33.32733 | Using where |
    +----+-------------+-------+------+---------------+-----------+-------------------+--------+----------+-------------+
    rows to join from E = 250 003 * 33.32733154296875 % ≈ 83 319 (per department)

    Same query with AND E.job = 'PRESIDENT':
    +----+-------------+-------+--------+---------------+---------+-------------------+---------+----------+-------------+
    | id | select_type | table | type   | possible_keys | key     | ref               | rows    | filtered | Extra       |
    +----+-------------+-------+--------+---------------+---------+-------------------+---------+----------+-------------+
    |  1 | SIMPLE      | E     | ALL    | fk_deptno     | NULL    | NULL              | 1000014 | 0.000099 | Using where |
    |  1 | SIMPLE      | D     | eq_ref | PRIMARY       | PRIMARY | dept_emp.E.deptno |       1 |      100 | NULL        |
    +----+-------------+-------+--------+---------------+---------+-------------------+---------+----------+-------------+
    rows to join from E = 1 000 014 * 0.00009999860048992559 % ≈ 1
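The effect on the estimates can be reproduced with the numbers from these plans. A sketch assuming the per-value frequencies equal the filtered values divided by 100, and that without a histogram the optimizer falls back to the 10 % equality factor (the filtered value of 10 in the earlier JOIN_ORDER example); JOB_FREQUENCY and the helper names are hypothetical.

```python
# Per-value frequencies as read from the job histogram (filtered / 100);
# any other value falls back to the hardcoded equality guess.
JOB_FREQUENCY = {
    "CLERK": 0.3332733154296875,
    "PRESIDENT": 9.999860048992559e-07,
}
EQUALITY_GUESS = 0.1  # selectivity assumed for "=" without a histogram


def filtered_percent(job: str, use_histogram: bool = True) -> float:
    """Value shown in the filtered column for the condition E.job = <job>."""
    if use_histogram and job in JOB_FREQUENCY:
        return JOB_FREQUENCY[job] * 100
    return EQUALITY_GUESS * 100


def rows_to_join(rows: int, filtered_pct: float) -> float:
    """rows x filtered %, i.e. rows * filtered / 100."""
    return rows * filtered_pct / 100


if __name__ == "__main__":
    # CLERK: 250 003 rows per index lookup on E; PRESIDENT: full scan of 1 000 014 rows.
    for job, rows in (("CLERK", 250003), ("PRESIDENT", 1000014)):
        guess = rows_to_join(rows, filtered_percent(job, use_histogram=False))
        hist = rows_to_join(rows, filtered_percent(job))
        print(f"{job}: without histogram ≈ {guess:.0f} rows, with histogram ≈ {hist:.0f} rows")
```

The fixed 10 % guess underestimates CLERK by more than a factor of three and overestimates PRESIDENT by five orders of magnitude, which is why the histogram flips the join order for the PRESIDENT query.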
  • 27. Histograms vs Indexes
    • Histograms
      • Use less disk space
      • Updated only on demand
      • For low-cardinality columns
      • Creation needs only a backup lock (permits DML)
    • Indexes
      • Use more disk space
      • Updated with each DML
      • For columns of any cardinality
      • Creation needs a metadata lock (no DML permitted)
  • 28. TREE explain plan
    • Appeared in MySQL 8.0.16 GA (released 2019-04-25) and improved in 8.0.18 GA (released 2019-10-14)
    • Still under development and considered “experimental”
    • Displays operations (iterators) nested as a tree
    • Helps to better understand the order in which operations are executed
    • For access path operations it includes (since 8.0.18 GA) information about:
      • estimated execution cost
      • estimated number of returned rows
    See TREE explain format in MySQL 8.0.16
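The iterator idea behind the TREE format can be illustrated with Python generators: each node pulls rows from its children one at a time, and a nested loop join re-opens its inner subtree for every outer row. This is a toy model, not MySQL's C++ executor; index_lookup is a plain filtered scan standing in for an index, and the sample tree mirrors the plan of the CLERK query from the EXPLAIN ANALYZE example (a nested loop over a scan on D and a filtered index lookup on E).

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def table_scan(rows: List[Row]) -> Iterator[Row]:
    """Leaf iterator, like '-> Table scan on D': returns rows one at a time."""
    yield from rows


def index_lookup(rows: List[Row], column: str, value: Any) -> Iterator[Row]:
    """Stand-in for '-> Index lookup on E using ...': here just a filtered scan."""
    return (row for row in rows if row[column] == value)


def filter_rows(child: Iterator[Row], condition: Callable[[Row], bool]) -> Iterator[Row]:
    """'-> Filter: ...': pulls rows from its child and passes on the matching ones."""
    return (row for row in child if condition(row))


def nested_loop_inner_join(outer: Iterator[Row], inner: Callable[[Row], Iterator[Row]]) -> Iterator[Row]:
    """'-> Nested loop inner join': re-opens the inner subtree for every outer row."""
    for outer_row in outer:
        for inner_row in inner(outer_row):
            yield {**outer_row, **inner_row}


DEPT = [{"deptno": 10, "dname": "ACCOUNTING"}, {"deptno": 20, "dname": "RESEARCH"}]
EMP = [
    {"ename": "SMITH", "deptno": 20, "job": "CLERK"},
    {"ename": "KING", "deptno": 10, "job": "PRESIDENT"},
    {"ename": "ADAMS", "deptno": 20, "job": "CLERK"},
]


def clerks_plan() -> Iterator[Row]:
    """Iterator tree shaped like the TREE plan for the E.job = 'CLERK' query."""
    return nested_loop_inner_join(
        table_scan(DEPT),
        lambda d: filter_rows(index_lookup(EMP, "deptno", d["deptno"]),
                              lambda e: e["job"] == "CLERK"),
    )


if __name__ == "__main__":
    for row in clerks_plan():  # rows are produced lazily, one at a time
        print(row["dname"], row["ename"])
```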
  • 29. TREE explain plan Example 1/2
    EXPLAIN FORMAT=TRADITIONAL
    SELECT D.dname, LDT.min_sal, LDT.avg_sal, LDT.max_sal
      FROM dept D,
           LATERAL (SELECT MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
                      FROM emp E
                     WHERE E.deptno = D.deptno
                   ) AS LDT;
    +----+-------------------+------------+------+-----------+-------------------+--------+----------+----------------------------+
    | id | select_type       | table      | type | key       | ref               | rows   | filtered | Extra                      |
    +----+-------------------+------------+------+-----------+-------------------+--------+----------+----------------------------+
    |  1 | PRIMARY           | D          | ALL  | NULL      | NULL              |      4 |      100 | Rematerialize (<derived2>) |
    |  1 | PRIMARY           | <derived2> | ALL  | NULL      | NULL              |      2 |      100 | NULL                       |
    |  2 | DEPENDENT DERIVED | E          | ref  | fk_deptno | dept_emp.D.deptno | 250003 |      100 | NULL                       |
    +----+-------------------+------------+------+-----------+-------------------+--------+----------+----------------------------+
    3 rows in set, 2 warnings (0.0010 sec)
    Note (code 1276): Field or reference 'dept_emp.D.deptno' of SELECT #2 was resolved in SELECT #1
    Note (code 1003): /* select#1 */ select `dept_emp`.`d`.`dname` AS `dname`,`ldt`.`min_sal` AS ...
  • 30. TREE explain plan Example 2/2
    EXPLAIN FORMAT=TREE
    SELECT D.dname, LDT.min_sal, LDT.avg_sal, LDT.max_sal
      FROM dept D,
           LATERAL (SELECT MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
                      FROM emp E
                     WHERE E.deptno = D.deptno
                   ) AS LDT;

    -> Nested loop inner join
        -> Invalidate materialized tables (row from D)  (cost=0.65 rows=4)
            -> Table scan on D  (cost=0.65 rows=4)
        -> Table scan on LDT
            -> Materialize (invalidate on row from D)
                -> Aggregate: min(e.sal), avg(e.sal), max(e.sal)
                    -> Index lookup on E using fk_deptno (deptno=d.deptno)  (cost=36095.62 rows=332359)
  • 31. Hash join optimization
    • New in MySQL 8.0.18 GA (released 2019-10-14)
    • Before, nested loop was the only join type – Nested Loop Join (NLJ), Block Nested Loop (BNL) and Batched Key Access (BKA) algorithms
    • Hashes the smaller table and uses the hash table to look up matching rows from the other table
    • Uses xxHash for extremely fast hashing in RAM
    • Ideally hashing is done entirely in memory, but it can also use disk (less efficient)
    • You may need to adjust the join buffer size (see join_buffer_size)
    [Diagram: rows of Table 1 and Table 2 are hashed with xxHash64 and matched through the join buffer to produce the result]
    See WL#2241
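The build/probe steps described above fit in a short sketch. This assumes an inner equi-join with everything in memory; Python's dict hashing stands in for xxHash64, and there is no join buffer limit or spill to disk. The hash_join function is illustrative, not MySQL's implementation.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def hash_join(build: Iterable[Row], probe: Iterable[Row],
              build_key: str, probe_key: str) -> List[Tuple[Row, Row]]:
    """Inner equi-join: hash the (smaller) build input, then probe it once per row."""
    # Build phase: in-memory hash table keyed on the join column.
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in build:
        key = row[build_key]
        if key is not None:  # SQL NULL never matches in an equi-join
            table[key].append(row)

    # Probe phase: one hash lookup per probe row; no index on either input is needed.
    result: List[Tuple[Row, Row]] = []
    for row in probe:
        for match in table.get(row[probe_key], ()):
            result.append((row, match))
    return result


if __name__ == "__main__":
    job_sal = [{"job": "CLERK", "sal_min": 800, "sal_max": 1500},
               {"job": "ANALYST", "sal_min": 3000, "sal_max": 4000}]
    emp = [{"ename": "SMITH", "job": "CLERK", "sal": 800},
           {"ename": "KING", "job": "PRESIDENT", "sal": 5000},
           {"ename": "FORD", "job": "ANALYST", "sal": 3000}]
    for e, js in hash_join(job_sal, emp, "job", "job"):  # build on the 5-row table
        print(e["ename"], e["job"], js["sal_min"], js["sal_max"])
```

Building on the small table keeps the hash table tiny while the large table is read only once, which matches the "Hash -> Table scan on JS" branch in the plan for this kind of query.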
  • 32. Hash join continued
    • Used automatically for any query with an equi-join condition where the join uses no index
    • Would also work for cartesian joins (i.e. joins without a join condition)
    • Unfortunately visible only in the TREE format of explain plan
    • New hints for forcing hash join or nested loop – HASH_JOIN or NO_HASH_JOIN
    • Also at global or session level with hash_join=on|off in the optimizer_switch system variable
  • 33. Hash join Example 1/2
    CREATE TABLE job_sal (
      job     VARCHAR(9),
      sal_min DECIMAL(9,2),
      sal_max DECIMAL(9,2)
    );
    INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('ANALYST',   3000, 4000);
    INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('CLERK',      800, 1500);
    INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('MANAGER',   2800, 3500);
    INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('SALESMAN',  1250, 1900);
    INSERT INTO job_sal (job, sal_min, sal_max) VALUES ('PRESIDENT', 5000, NULL);
  • 34. Hash join Example 2/2
    EXPLAIN FORMAT=TREE
    SELECT E.ename, E.sal, JS.sal_min, JS.sal_max
      FROM emp E, job_sal JS
     WHERE E.job = JS.job
       AND E.sal NOT BETWEEN JS.sal_min AND JS.sal_max;

    -> Filter: (E.sal not between JS.sal_min and JS.sal_max)  (cost=499211.71 rows=442901)
        -> Inner hash join (E.job = JS.job)  (cost=499211.71 rows=442901)
            -> Table scan on E  (cost=1962.39 rows=996514)
            -> Hash
                -> Table scan on JS  (cost=0.75 rows=5)
    1 row in set (0.0011 sec)
  • 35. Block Nested Loop (BNL) vs Hash join
    • Block Nested Loop (BNL)
      • The BNL query ran in about 1.30 sec for 1M employees
      • For equality and non-equality joins
      • For smaller result sets BNL should be just fine
    • Hash join
      • The hash join query ran in about 0.9 sec for 1M employees
      • For equality joins only
      • For large result sets hash join should be faster
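For comparison, a block nested loop can be sketched the same way: rows from the outer table are collected into a join buffer, and the inner table is scanned once per buffer fill, testing any join condition. The sketch assumes in-memory lists and counts buffer rows instead of bytes (MySQL sizes the buffer with join_buffer_size); block_nested_loop is a hypothetical helper. It shows why BNL also handles non-equality conditions such as BETWEEN, at the cost of one full inner scan per buffer.

```python
from typing import Any, Callable, List, Sequence, Tuple


def block_nested_loop(outer: Sequence[Any], inner: Sequence[Any],
                      condition: Callable[[Any, Any], bool],
                      buffer_rows: int) -> Tuple[List[Tuple[Any, Any]], int]:
    """Join with an arbitrary condition; returns (matching pairs, number of inner scans)."""
    result: List[Tuple[Any, Any]] = []
    scans = 0
    for start in range(0, len(outer), buffer_rows):
        join_buffer = outer[start:start + buffer_rows]  # fill the join buffer
        scans += 1
        for inner_row in inner:  # one full scan of the inner table per buffer fill
            for outer_row in join_buffer:
                if condition(outer_row, inner_row):
                    result.append((outer_row, inner_row))
    return result, scans


if __name__ == "__main__":
    emp = [{"ename": "SMITH", "sal": 800}, {"ename": "FORD", "sal": 3000}]
    job_sal = [{"job": "CLERK", "sal_min": 800, "sal_max": 1500},
               {"job": "ANALYST", "sal_min": 3000, "sal_max": 4000}]
    # Non-equality join condition: salary falls inside the job's range.
    pairs, scans = block_nested_loop(
        emp, job_sal, lambda e, js: js["sal_min"] <= e["sal"] <= js["sal_max"], buffer_rows=1)
    for e, js in pairs:
        print(e["ename"], js["job"])
    print("inner scans:", scans)
```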
  • 36. EXPLAIN ANALYZE
    • New in MySQL 8.0.18 GA (released 2019-10-14)
    • Actually executes the query and provides timings from the execution
    • Useful for comparing the optimizer’s estimations to the actual execution
    • Output is in TREE format only (hopefully just for now – see WL#4168)
    • In addition to the TREE output, it also provides information about:
      • time to return the first row
      • time to return all rows
      • number of returned rows
      • number of loops
    • Only for SELECT statements; cannot be used with FOR CONNECTION
    See also EXPLAIN for PostgreSQL
  • 37. EXPLAIN ANALYZE Example 1
    EXPLAIN ANALYZE
    SELECT E.ename, E.sal, JS.sal_min, JS.sal_max
      FROM emp E, job_sal JS
     WHERE E.job = JS.job
       AND E.sal NOT BETWEEN JS.sal_min AND JS.sal_max;

    -> Filter: (e.sal not between js.sal_min and js.sal_max)  (cost=499211.71 rows=442901) (actual time=0.098..778.486 rows=915166 loops=1)
        -> Inner hash join (e.job = js.job)  (cost=499211.71 rows=442901) (actual time=0.089..568.473 rows=1000014 loops=1)
            -> Table scan on E  (cost=1962.39 rows=996514) (actual time=0.025..288.830 rows=1000014 loops=1)
            -> Hash
                -> Table scan on JS  (cost=0.75 rows=5) (actual time=0.041..0.048 rows=5 loops=1)
    1 row in set (0.8240 sec)
    Note: Actual times are in milliseconds!
  • 38. EXPLAIN ANALYZE Example 2
    EXPLAIN ANALYZE
    SELECT E.ename, E.job, D.dname
      FROM dept D, emp E
     WHERE E.deptno = D.deptno
       AND E.job = 'CLERK';

    -> Nested loop inner join  (cost=112394.29 rows=333278) (actual time=5.904..3281.472 rows=333278 loops=1)
        -> Table scan on D  (cost=1.40 rows=4) (actual time=4.659..4.666 rows=4 loops=1)
        -> Filter: (e.job = 'CLERK')  (cost=5180.86 rows=83319) (actual time=1.016..811.246 rows=83320 loops=4)
            -> Index lookup on E using fk_deptno (deptno=d.deptno)  (cost=5180.86 rows=250004) (actual time=1.013..786.799 rows=250004 loops=4)
    1 row in set (3.3051 sec)
    first row -> 5.904 ≈ 5.675 = 4.659 + 1.016
    all rows  -> 3 281.472 ≈ 3 249.65 = 4.666 + 4 * 811.246
    Note: Actual times are averaged over loops, so multiply by loops to get the total!
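The first-row/all-rows arithmetic above can be replayed from the annotations. A sketch assuming, as the editor's notes state, that actual time and rows are per-loop averages, so totals are obtained by multiplying by loops; the Timing class is hypothetical. Rows are averages too, so 83 320 × 4 = 333 280 only approximates the 333 278 rows actually returned.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    """One '(actual time=first..last rows=N loops=L)' annotation; values are per-loop averages."""
    first_ms: float
    last_ms: float
    rows: float
    loops: int

    def total_ms(self) -> float:
        """Approximate total time spent in this node across all loops."""
        return self.last_ms * self.loops

    def total_rows(self) -> float:
        """Approximate total rows returned by this node across all loops."""
        return self.rows * self.loops


# Annotations copied from the EXPLAIN ANALYZE output of the CLERK query.
SCAN_D = Timing(first_ms=4.659, last_ms=4.666, rows=4, loops=1)
FILTER_E = Timing(first_ms=1.016, last_ms=811.246, rows=83320, loops=4)

first_row_ms = SCAN_D.first_ms + FILTER_E.first_ms  # reported: 5.904
all_rows_ms = SCAN_D.last_ms + FILTER_E.total_ms()  # reported: 3281.472

if __name__ == "__main__":
    print(f"first row ≈ {first_row_ms:.3f} ms, all rows ≈ {all_rows_ms:.2f} ms")
    print(f"rows from E across loops ≈ {FILTER_E.total_rows():.0f} (reported 333 278)")
```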
  • 39. Problems with TREE and EXPLAIN ANALYZE
    • Will not explain queries using block nested loop (shows just “not executable by iterator executor”)
    • Will not explain SELECT COUNT(*) FROM table queries (shows just “Count rows in table”)
    • Does not compute select list subqueries (see bug 97296) [FIXED]
    • Not integrated with MySQL Workbench (see bug 97282)
    • Does not print units on timings (see bug 97492)
  • 40. Visual explain plans
    • Displayed by default in MySQL Workbench
  • 41. Additional information from JSON/visual explain
    • Used columns – list of columns either read or written
    • Used key parts – list of used key parts
    • Rows produced per join – estimated rows after join
    • Cost estimates (since 5.7.2) are split into:
      • query cost – total cost of the query (or subquery) block
      • sort cost (CPU) – cost of the first sorting operation
      • read cost (IO) – the cost of reading data from the table
      • eval cost (CPU) – the cost of condition evaluation
      • prefix cost (CPU) – the cost of joining tables
      • data read – estimated amount of data processed (rows x record width)
    See WL#6510 and MySQL 5.7.2 Release notes
  • 42. Benefits from visual explain plans
    • Help to easily understand where the problem is
      • easily spot bad access paths by box color:
        • missing index(es);
        • wrong or insufficient join conditions
      • easily spot where most rows are generated by the thickness of lines:
        • bad access path and/or no filter;
        • involuntary cartesian joins;
        • wrong join order
    • Cost calculations are not fully documented, so they are hard to understand
      • basically cost values are related to the number of blocks read (IO) and rows processed (CPU)
      • costs reflect the work done by the database
  • 43. Visual explain plans Example 1 1/2
    • MySQL 5.5: ≈ 9.5 sec
    • Option A: Add WHERE condition in the subquery – MySQL 5.5/8.0: ≈ 1.2 sec
  • 44. Visual explain plans Example 1 2/2
    • MySQL 8.0: ≈ 0.015 sec; MySQL 5.5: ≈ 0.015 sec
    • Option B: Use a derived table instead of a subquery in WHERE
    • Option C: Use MySQL 8 ;-)
  • 45. Visual explain plans Example 2 1/3
    • In MySQL 5.7.20: 6–7 sec
  • 46. Visual explain plans Example 2 2/3
    • In MySQL 8.0.13: ≈ 24 sec
  • 47. Visual explain plans Example 2 3/3
    • Option A: Use the JOIN_ORDER hint
    • Option B: Re-create the multi-column index on two instead of three columns to improve selectivity (one of the columns in the index had low cardinality)
    • Option C: Option B + remove table msg_progs
    Execution time: ≈ 4 sec
  • 48. Summary
    • Use EXPLAIN to examine and tune query plans
    • Use EXPLAIN ANALYZE to profile query execution
    • Use different types of indexes and histograms properly
    • There are no strict rules to follow for optimizing a query
    • So be creative and evaluate different options
  • 49. References
    • MySQL Reference Manual (and the Optimization section in particular)
    • Use the index, Luke! and Modern SQL sites by Markus Winand
    • MySQL EXPLAIN Explained by Øystein Grøvlen
    • Histogram statistics in MySQL & Hash join in MySQL 8 by Erik Frøseth
    • MySQL Explain Example by Tomer Shay
    • MySQL EXPLAIN ANALYZE by Norvald H. Ryeng
    • Using Explain Analyze in MySQL 8 from Percona blog
    • My blog posts on MySQL

Editor's Notes

  1. 4th generation programming languages provide a higher level of abstraction. A subset of domain-specific languages (e.g. database, reporting, GUI, web development). Also declarative languages. “SQL language is perhaps the most successful fourth-generation programming language (4GL).” Markus Winand
  2. MySQL’s optimizer is cost based (CBO)
  3. DESCRIBE | DESC is provided for compatibility with Oracle
  4. Difference between DEPENDENT and UNCACHEABLE is that in the first case the subquery is re-evaluated only once for each set of different values from the outer context, while in the second case the subquery is re-evaluated for each row of the outer context Select type could also be DELETE, INSERT or UPDATE for non-SELECT statements.
  5. Always use aliases when joining tables!
  6. The chosen key may be an index not listed in possible_keys: possible_keys lists indexes suitable for access (looking up rows), but a covering index could be used instead
  7. Using index condition is for Index Condition Pushdown (ICP), not Engine Condition Pushdown, which is possible only for NDB.
  8. “Searching in a database index is like searching in a printed telephone directory.” Markus Winand. There is no need to index all columns of a table (i.e. avoid over-indexing). For concatenated indexes, choosing the right column order is the most important.
  9. Before MySQL 5.7.7, optimizer adjustments were only possible through the optimizer_switch system variable. In some cases hints are the necessary evil.
  10. The default value for histogram_generation_max_mem_size is 20 000 000 bytes (i.e. ≈ 19 MB). Cumulative frequency is the sum (or running total) of all the frequencies up to the current point in the data set. Not deterministic – sampling considers different data each time.
  11. MySQL automatically chooses histogram type considering number of distinct values and buckets specified
  12. Data varying over time – the histogram needs to be updated. For increasing dates, better use an index.
  13. For string values, a maximum of 42 characters from the beginning are considered
  14. Total rows for CLERK = 83 319 * 4 = 333 276
  15. DML – Data Manipulation Language (e.g. INSERT, UPDATE, DELETE)
  16. Cost and row estimates since MySQL 8.0.18
  17. Join buffer size is 256 KB by default in MySQL 8. Limit the columns selected from the hashed table. Hash joins won't benefit from indexes on the join condition, but would certainly benefit from indexes on columns used in independent conditions (e.g. sale_date > '2019-10-01'). Hash join may not succeed if disk is used and open_files_limit is reached.
  18. Pipelined means to return first results without processing all input data.
  19. A profiling tool
  20. Actual time for loops is the average over all loops; total time is time * loops. Times add up, but are not exactly equal.
  21. Used columns is useful for identifying candidates for covering indexes. Used key parts is useful for checking composite indexes.