Let us get started.
I would like to welcome you to this presentation on the MySQL optimizer.
My name is Olav Sandstå and I work in the MySQL optimizer team.
The goal for this session is to give an overview of the MySQL optimizer.
I will first give a short overview of the optimizer and then present each of its main optimizations in more detail.
(The main parts that I will cover are query transformations, access method selection, and the join optimizer. I will also include an overview of how subqueries are optimized.)
I will try to leave some time for questions at the end of the presentation, but if you have short and simple questions, feel free to ask during the presentation.
The query optimizer takes as input a SQL query and produces a plan for how this query should be executed.
For complex queries there are many possible query plans. The goal is that the optimizer should be able to find the best plan.
In order to optimize the query, the optimizer uses information from the data dictionary and statistics from the storage engine about the data in the tables.
At the end of this session you all should know a bit more about what is inside the “optimizer cloud” and how it works.
Let us first start with an overview of the architecture of MySQL to show where the query optimizer fits in.
When a query arrives it first goes through the parser. The second step is the “resolver”, which does name resolution and semantic checks of the tables and columns in the query.
These two steps produce a query tree representation of the SQL query that goes into the optimizer.
The optimizer will optimize the query and produce a query execution plan. This query execution plan will then be executed and data will be read from the storage engine and the result will be returned to the client.
The rest of this presentation will go into more detail about what happens inside the optimizer module.
Here are some initial characteristics of the optimizer:
Let us first go back to the MySQL architecture and start to look at what is inside the optimizer.
Each query that is optimized goes through four main stages. I will show the stages here and briefly mention what they do.
The first phase is the logical transformations. This stage is about simplifying the query and preparing it for later optimizations. You do not need to look at the details in the yellow boxes. I will cover that in detail later in the presentation.
The second phase is to make preparations for the main optimizer. Here we mostly analyze alternative ways of reading data from tables.
The third is the main join optimizer.
And the final phase is to make some final adjustments and optimizations to the query plan.
Let us start with the first stage of the optimizer which is the logical transformations the optimizer does to the query.
The logical transformations the optimizer has are mostly rule based. The reason for doing logical transformations to the query is either to simplify the query or to rewrite it in order to prepare it for the later optimization stages.
Here is a list of the main transformations we do:
To simplify the query conditions we apply a set of rule based transformations. I will show an example of these on the next slide.
We convert outer joins to inner joins
We merge views and derived tables into the main query
And finally, we have a large set of transformations that apply to optimizations of subqueries. I will get back to the subquery optimizations at the end of the presentation.
This example shows a join of two tables with a fairly complex WHERE condition.
The slide shows how we apply different rule based transformations to the WHERE clause in order to simplify it.
……
The final result is easier to optimize and costs less during execution.
8 minutes:
Before going to the next stage of the optimizer, I will spend a few minutes presenting how the optimizer does cost based optimizations.
The general way we do cost based optimization of queries in MySQL is that we:
First calculate the cost for all alternative ways of reading data from tables. This includes looking at all useful ways for using different indexes and different access methods.
Then we build alternative plans for how the query can be executed. This is mostly done by the join optimizer. For each alternative plan we calculate the cost.
Finally, we select the query plan with the lowest cost.
This plan is then executed.
In MySQL the main cost based optimizations are:
Choosing which index and which access method to use for each table
The join order
Which join buffer strategy to use
And for subqueries, which subquery strategy to use.
In order to be able to do cost based optimization we need to have a model for what the cost of a query is. Here is a very simplified view of the cost model in MySQL.
As input it takes a basic operation like reading data from a table or joining two tables. As output it produces an estimate for the cost of doing this operation. In addition to the cost estimate, it will in most cases also produce an estimate for how many rows this operation will produce.
The cost model consists of a set of formulas for calculating cost and record estimates for different operations. In addition to the cost formulas, the cost model consists of a set of “cost constants”. These are the costs of the basic operations that the MySQL server does when executing a query.
The cost model uses information from the data dictionary and statistics from storage engines to do its calculations. The main statistics are the number of keys in a table, cardinality, and range estimates. All of these are produced by the storage engine.
From the data dictionary we use information about records and indexes: like the length of records and keys, uniqueness and if they can be null.
One thing that is new in 5.7 is that the cost model has been made configurable. The cost constants are now stored in database tables and can be changed. I will go into more detail on this on the next slide.
In the MySQL cost model the basic cost unit is the cost for reading a random data page from disk.
All other cost numbers are relative to this cost unit.
The main cost factors we include when estimating the cost for a query is:
IO cost: where we estimate the number of pages we need to read for tables and indexes.
For CPU cost, the main contributions are the cost of evaluating query conditions and of comparing keys and records.
The main cost constants we use in the cost calculations are:
Reading a random page from disk which has a cost of 1.0
Reading a page from the database buffer.
Evaluating the query condition on a record, which has a cost of 0.2
And comparing keys or records which has a cost of 0.1
To hopefully make it a bit more clear how the cost model works, I will show an example. This query can be executed as a table scan or as a range scan if we add a secondary index.
For a table scan, the server must read the entire table and evaluate the query condition for all records.
The second alternative is to run this as a range scan. The optimizer will ask the storage engine for an estimate of how many records are in the range between 20 and 23. The cost model will compute the cost estimate as follows:
The IO cost will be dominated by having to look each record up in the base table. The cost for reading the secondary index will be small compared to all these lookups. So the IO cost will be calculated as the number of pages we will need to read from the base table. One per record.
The cost formula for computing CPU cost is as follows: We first multiply the number of records to read by the cost for evaluating a condition. This corresponds to having to evaluate the range condition on every record. Then we add the same cost a second time. This time it accounts for the CPU cost of evaluating the WHERE condition.
The final choice of whether this query will be done as a table scan or a range scan depends on which of these have the lowest cost estimate.
If we put in the estimated number of records in this range we get the following cost estimates.
The total cost is about 112,000.
If we compare this to the cost for the table scan from the previous slide, which was more than two million, then it is clear that the optimizer will prefer to run this query as a range scan on a secondary index over doing a full table scan.
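To make the table-scan versus range-scan comparison above concrete, here is a minimal sketch of the two cost formulas. The cost constants (1.0 per random page read, 0.2 per condition evaluation) come from the slides; the table sizes (10 million records over 100,000 pages, and an estimated 80,000 records in the range) are illustrative assumptions chosen so the totals match the orders of magnitude mentioned above.

```python
# Sketch of the two cost formulas, assuming illustrative table sizes.
IO_COST = 1.0            # cost of reading a random page from disk
ROW_EVALUATE_COST = 0.2  # cost of evaluating the query condition on one record

def table_scan_cost(pages, records):
    """Read every page of the table, then evaluate the condition on every record."""
    return pages * IO_COST + records * ROW_EVALUATE_COST

def range_scan_cost(records_in_range):
    """One base-table page lookup per record in the range, plus CPU cost for
    evaluating the range condition and the WHERE condition on each record."""
    io = records_in_range * IO_COST
    cpu = 2 * records_in_range * ROW_EVALUATE_COST
    return io + cpu

full = table_scan_cost(pages=100_000, records=10_000_000)  # 2,100,000.0
rng = range_scan_cost(records_in_range=80_000)             # 112,000.0
print(full, rng, "range scan wins" if rng < full else "table scan wins")
```

With these assumed sizes the range scan costs 112,000 against more than two million for the table scan, so the optimizer picks the range scan.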
14 minutes.
Stage 2 of the optimizer is to analyze possible access methods for reading the data needed.
The goal is to find for each table in the query what is the best way to read the data.
The box to the right lists the main access methods that are used by MySQL. I will go into more detail about most of these on the following slides.
For each table in the query we do the following:
Check whether the access method is possible and useful
Estimate the cost of using that access method
Select the access method with the lowest cost
Index lookup, or ref access, is an access method for reading all records with a given key value using an index.
This is the main access method for most of the tables in a join. The first table can be read with different access methods, but if the following tables have useful indexes, then ref access will be used.
In the first example….
In the second example we will be able to use ref access on the second table when doing the join operation.
In the optimizer and in the explain output we distinguish between two different ref access methods, equality ref and normal ref. The first one, equality reference, is used when reading from a unique index. The second one is used when reading from a non-unique index or from a prefix of an index.
One important thing the optimizer does when evaluating access methods is to do ref access analysis.
By analyzing the query and checking which fields have indexes, the optimizer determines which indexes can be used for ref access in a join.
The result of this analysis is a ref access graph as shown in this figure.
The figure is for a join of three tables. Each arrow in the graph shows which fields have an equality relationship and a corresponding index that can be used during the join. This graph is used later by the join optimizer.
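One way to picture the ref access graph is as an adjacency map: each edge records that an equality in the query links a column to an indexed column, so that index becomes usable for ref access once the source table has been read. This is only a toy model with made-up table and index names, not MySQL's internal representation.

```python
# Toy ref-access graph for a three-table join. Each entry maps an
# (table, index) pair to the columns that feed it via equalities,
# e.g. t2.a = t1.a where t2 has an index idx_a on column a.
ref_access_graph = {
    ("t2", "idx_a"): [("t1", "a")],
    ("t3", "idx_b"): [("t2", "b")],
}

def usable_ref_indexes(tables_already_read, graph):
    """Return the indexes usable for ref access given the tables read so far."""
    return [
        (table, index)
        for (table, index), sources in graph.items()
        if all(src_table in tables_already_read for src_table, _ in sources)
    ]

# After reading t1, ref access on t2's idx_a becomes possible:
print(usable_ref_indexes({"t1"}, ref_access_graph))   # [('t2', 'idx_a')]
```

The join optimizer consults exactly this kind of information when it considers a table as the next one in the join order.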
The next important access method is range access.
The range optimizer will, for each index on the table, try to find the minimal range that needs to be read using that index.
Example: In this example the table has an index on key1 and an index on key2. The range optimizer will analyze the query condition and determine which part of the index it has to read using each of these indexes.
The range optimizer is able to use all parts of the WHERE condition that compare an indexed column against a constant value. It supports nested AND and OR conditions.
The result from the range optimizer is a list of ranges that need to be read from each index.
The cost estimate is based on the number of records that need to be read from each range. The index that has the lowest cost estimate will be selected.
For indexes that only contain a single column, range access is fairly easy to estimate. If the index contains multiple parts it gets more complex.
Here is an example of an index that covers three columns (a, b, c) in a table.
The layout of the index is such that it is first sorted on values from the first column a,
Within each a value, we have the corresponding b-values sorted. And similarly for the c values.
The range optimizer is able to find which ranges to read on this multipart index but there are some specific requirements that must be fulfilled.
Conditions on the first column can always be used by the range optimizer.
If the condition on the first index part is an equality condition, the range optimizer can also use the conditions on the second index part.
Here is an example where it can use conditions on the two first columns. The condition on a is an equality condition: a should be either 10, 11, or 13.
The second index part can then be added. And since the condition on the second index part is also an equality condition, b should be 2 or 4, we could have added conditions on c if there were any.
So after having run the range optimizer, the resulting range scan would read the following range from the index when executing this query.
Let us look at another example. The query is almost the same, but this time the condition on the first index part is not an equality condition: a should be larger than 10 and less than 13.
In this case we can use the condition on a as a range criterion, but we cannot use the condition on b.
The resulting range scan produced by the range optimizer is shown below. We see that much more of the index has to be read.
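The rule the two examples illustrate can be stated compactly: key parts are usable as long as all preceding parts have equality conditions, and the first range condition is the last part that can be used. Here is a small sketch of that rule; the 'eq'/'range' labels are a simplification of how conditions are classified.

```python
# Sketch of the multi-part index rule: count how many leading key parts
# the range optimizer can use, given the condition on each index column.
def usable_key_parts(conditions):
    """conditions: one of 'eq', 'range', or None per index column, in order."""
    used = 0
    for cond in conditions:
        if cond == 'eq':
            used += 1          # equality: this part is used, keep going
        elif cond == 'range':
            used += 1          # a range condition is used, but ends the prefix
            break
        else:
            break              # no condition on this part: stop
    return used

# a IN (10, 11, 13) AND b IN (2, 4): equalities on a and b -> both usable
print(usable_key_parts(['eq', 'eq', None]))     # 2
# 10 < a < 13 AND b IN (2, 4): the range on a blocks the condition on b
print(usable_key_parts(['range', 'eq', None]))  # 1
```

This matches the two slides: with equality on a, both a and b narrow the range; with a range on a, only a does.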
When using range access we are only reading from a single index. In some cases we can read from multiple indexes simultaneously and use this to reduce the number of records that need to be read from the table.
This is called index merge. Three index merge strategies are implemented.
Example: A single index cannot handle OR'ed conditions on different columns.
The last access method is called loose index scan. This is an optimization for queries containing group by or distinct.
If we look at the last of the three example queries, we are grouping on a and would like to have the lowest b value. By using loose index scan we can do this very efficiently by just reading the first index entry for each a value, and then “jump” to the next without having to read the index entries in between.
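The jumping behavior described above can be modeled in a few lines: because the index is sorted on (a, b), the first entry of each a-group already holds MIN(b) for that group. The entries below are made-up sample data; a real storage engine would skip ahead in the index rather than iterate over entries.

```python
from itertools import groupby

# Toy model of a loose index scan for "SELECT a, MIN(b) ... GROUP BY a"
# over an index sorted on (a, b): only the first entry per a-group is read.
index_entries = [(1, 2), (1, 5), (1, 9), (2, 1), (2, 7), (3, 4)]

def loose_index_scan(entries):
    """Return the first (a, b) entry of each a-group, i.e. (a, MIN(b))."""
    return [next(group) for _, group in groupby(entries, key=lambda e: e[0])]

print(loose_index_scan(index_entries))   # [(1, 2), (2, 1), (3, 4)]
```

Only three of the six index entries are touched, which is why loose index scan can be so much cheaper than scanning the whole index.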
22 minutes
Then we should be ready for the join optimizer, which is the third stage of the optimizer. The join optimizer does the main job of building the final query plan. It decides the final join order for the tables in the query.
The goal of the join optimizer is to find the best join order for the tables in a query.
With N tables, there are N factorial possible plans. For instance, if you have ten tables, there are 3.6 million possible join orders to consider. In most cases we do not have to evaluate all of them.
This figure shows the alternative plans that we would evaluate for a 4 table join.
Our join optimizer uses a “greedy search strategy” to evaluate all possible join orders.
We start with all 1-table plans.
For each of these we expand the plan by adding the other tables in a depth-first order.
When adding a new table to the plan, we estimate the cost of the plan. If the cost is larger than the cost of the currently best plan, we prune it.
How this works is best illustrated with an example.
When adding a new table to the join we do:
-Select the best access method (using the ref access graph we made earlier)
-Estimate the number of rows
-Calculate the cost of adding this table (both the cost of reading the table and the cost of the join)
-Prune the plan if it is more expensive than the current best plan
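The steps above can be sketched as a small depth-first search with cost pruning. This is a heavily simplified model: the per-table fanout numbers are made up, and the cost is just the accumulated row count, whereas real MySQL derives rows and cost from statistics and the chosen access methods.

```python
# Much-simplified depth-first join-order search with cost pruning.
# Fanout = estimated rows each table contributes per row of the prefix
# (made-up numbers for illustration).
FANOUT = {"t1": 10, "t2": 3, "t3": 7}

def best_join_order(tables):
    best = {"order": None, "cost": float("inf")}

    def search(prefix, remaining, rows, cost):
        if cost >= best["cost"]:
            return                        # prune: worse than current best plan
        if not remaining:
            best["order"], best["cost"] = prefix, cost
            return
        for t in sorted(remaining):       # try each table as the next one
            new_rows = rows * FANOUT[t]   # rows produced by the extended prefix
            search(prefix + [t], remaining - {t}, new_rows, cost + new_rows)

    search([], frozenset(tables), 1, 0.0)
    return best["order"], best["cost"]

print(best_join_order(["t1", "t2", "t3"]))   # (['t2', 't3', 't1'], 234.0)
```

Note how starting with the most selective table (t2) wins, and how whole subtrees of join orders are discarded as soon as their partial cost exceeds the best complete plan found so far.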
In the previous example I showed how the cost estimates for different join orders were used for selecting the best one. This slide shows how we calculate the estimate for how many records we expect to read from a table in a join.
In this slide we have read data from one table and are going to add the next table to the join.
The number of records that we estimate will be read from the second table is computed like this: we start with the number of records read from the first table, then we calculate how many of these will be filtered away by the query conditions on that table, and finally we use the index statistics to get an estimate of how many records we will need to read from the second table.
The inclusion of the condition filter effect is new in MySQL 5.7. This should make the estimate for how many records will be read from the next table in a join more accurate than earlier. In our testing we see that a lot of multi-table join queries get a better join order due to this.
Let us look at how we calculate the “condition filter effect”.
This query has three conditions in the WHERE clause. For each table we find which conditions will be used for filtering away records. We do this by looking at each condition.
In order to have any effect, the condition must:
-reference a field in the table
-not be used by the access method (because then it is already taken into account when calculating the number of records that will be read)
-be compared against an available value: for instance, employee.name = john will always be possible to evaluate when reading the employee table,
-while first_office_id <> id depends on the table order of the join
So when we have determined which conditions should be used for calculating the condition filter effect, we need to find out how many of the records they will filter out.
We base this on the following:
If this is an indexed column and it has a range predicate, then we use the range estimate.
If no range estimate is available, we use index statistics.
For non-indexed columns, we use guesstimates; here are some examples.
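Putting the pieces together, the row estimate for the next table multiplies the rows from the prefix, the combined filter effect of the usable conditions, and the index statistics for the next table. The filter values below (0.1 for an equality, one third for an inequality on a non-indexed column) are commonly cited MySQL 5.7 heuristics; treat them here as illustrative assumptions, along with the example numbers.

```python
# Sketch of combining condition filter effects into a row estimate.
# Default guesstimates for non-indexed columns (assumed values):
FILTER_EQUALITY = 0.1      # col = const
FILTER_INEQUALITY = 1 / 3  # col <> const, col > const, ...

def rows_from_next_table(prefix_rows, records_per_key, filters):
    """prefix_rows * combined filter effect * index fanout.

    filters: filter effects of the conditions on the previous table that are
    not already used by its access method (assumed independent, so they
    multiply).
    """
    combined = 1.0
    for f in filters:
        combined *= f
    return prefix_rows * combined * records_per_key

# 1000 rows so far, one equality filter, and index statistics saying each
# key value matches 5 rows in the next table:
print(rows_from_next_table(1000, 5, [FILTER_EQUALITY]))   # 500.0
```

Without the filter effect the estimate would have been 5000 rows, ten times too high, which is exactly the kind of error that used to lead the join optimizer to a worse join order.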
30 minutes
After the join optimizer is run, the final join order has been decided.
Still, there are some adjustments we can do to improve the query plan.
The main optimizations are:
34 minutes
As I said at the start of my presentation, I would get back to subquery optimizations at the end of this talk.
This is a good example of a transformation where we rewrite the query to a similar query that will be less costly to execute.
40 minutes
That was the end of what the optimizer does.
Let us quickly spend a minute or two looking at what the optimizer has produced.
If you want to see the cost numbers for a given query, we have added this to EXPLAIN in JSON format in 5.7.
Here you see the output from EXPLAIN for the same query we just used in the example of the cost model for range access.