MySQL Index
How MySQL chooses the execution plan
Li Xinhe @2016 July
You’ve Made a Great Choice!
Understanding indexing is crucial for both Devs and DBAs
Poor index choices are responsible for a large portion of production problems.
Indexing is not rocket science
Though maybe it is for the Optimizer
Source code: almost 2M lines
Code shipped per year: “5 lines”
Agenda
1. Quiz
2. Intro MySQL & Index
3. Tools for monitoring, analyzing and tuning queries
4. MySQL cost-based optimizer
5. ICP
6. Quiz Discussion
Quiz From Garena Test for Software Developers
Which of the following queries can fully utilize the composite index "INDEX(a, b)"
on the columns "a" and "b" in the "user" table? ______
A. SELECT * FROM user WHERE a=0 AND b=0;
B. SELECT * FROM user WHERE a=0 OR b=0;
C. SELECT * FROM user WHERE a>0 AND b=0;
D. SELECT * FROM user WHERE a=0 AND b>0;
Quiz: Your Answer
A. a=0 AND b=0;
B. a=0 OR b=0;
C. a>0 AND b=0;
D. a=0 AND b>0;
Quiz: My Answer
A. a=0 AND b=0;
B. a=0 OR b=0;
C. a>0 AND b=0;
D. a=0 AND b>0;
Official Answer AD
My Answer A, AD, ACD, ABCD
Agenda
1. Quiz
2. Intro MySQL & Index
3. Tools for monitoring, analyzing and tuning queries
4. MySQL cost-based optimizer
5. ICP
6. Quiz Discussion
MySQL & Index
What are indexes for:
Speed up access in the db
Help to enforce constraints (UNIQUE, FOREIGN KEY)
Types of Indexes
BTree: the majority of indexes we deal with in MySQL
RTree
HASH
B+Tree Example
Indexes in MyISAM vs InnoDB
MyISAM:
Indexes point to the physical offset in the data file
All indexes are equivalent
InnoDB:
Clustered index (primary key) stores the row data in its leaf pages, not a pointer
Secondary indexes store the primary key value as the row pointer
Indexes
Multiple-Column Indexes or Composite Indexes
KEY `index1` (`a`,`b`)
Still one B+Tree Index
Index query vs post filter
The storage engine (InnoDB) uses the index for the query, then MySQL filters the returned rows if needed
Overhead of indexing
Indexes must be updated on every write
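A minimal sketch of the composite index above, assuming a hypothetical table `quiz_user` (the table name and column types are assumptions; only the `KEY` definition comes from the slide). `SHOW INDEX` lists `index1` on two rows, one per key part, but both rows describe the same single B+Tree.

```sql
-- Hypothetical table; only the index definition is taken from the slide.
CREATE TABLE IF NOT EXISTS quiz_user (
  id   INT NOT NULL PRIMARY KEY,
  a    INT NOT NULL,
  b    INT NOT NULL,
  name VARCHAR(30),
  KEY `index1` (`a`, `b`)
) ENGINE=InnoDB;

-- One index, two key parts: compare Key_name and Seq_in_index.
SHOW INDEX FROM quiz_user;
```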
Impact on Cost of Indexing for InnoDB
Long PK
Makes all secondary keys longer and slower
Random PK
Insertion causes a lot of page splits and reduces the lifetime of SSDs
Low selectivity index
Index on gender
Random read vs sequential read
Prefetching
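One way to judge whether a column is too unselective to index is to estimate its selectivity before creating the index. A sketch, assuming a hypothetical `member` table with a `gender` column (both names are illustrative):

```sql
-- Selectivity = distinct values / total rows.
-- Values near 1 mean almost every row has its own key value;
-- a gender column usually lands very close to 0.
SELECT COUNT(DISTINCT gender) / COUNT(*) AS selectivity
FROM member;
```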
Agenda
1. Quiz
2. Intro MySQL & Index
3. Tools for monitoring, analyzing and tuning queries
4. MySQL cost-based optimizer
5. ICP
6. Quiz Discussion
Explain the “EXPLAIN”
ID
select_type
SIMPLE , PRIMARY, SUBQUERY, DERIVED, UNION, UNION RESULT
type (best --> worst)
const, system > eq_ref, ref > range > index >> ALL
possible_keys, key & rows
key_len: shows how much of a composite index is used
Extra
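A sketch of reading key_len for a composite index, assuming a hypothetical table `keylen_demo` with two NOT NULL INT key parts (names and types are assumptions). Each NOT NULL INT key part takes 4 bytes, so a key_len of 4 means only `a` is used for the lookup and 8 means both `a` and `b` are. What the optimizer actually picks still depends on data and version.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE IF NOT EXISTS keylen_demo (
  id   INT NOT NULL PRIMARY KEY,
  a    INT NOT NULL,
  b    INT NOT NULL,
  name VARCHAR(30),
  KEY idx_a_b (a, b)
) ENGINE=InnoDB;

-- Lookup on the first key part only.
EXPLAIN SELECT * FROM keylen_demo WHERE a = 0;

-- Lookup on both key parts.
EXPLAIN SELECT * FROM keylen_demo WHERE a = 0 AND b = 0;
```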
EXPLAIN
More data in MySQL 5.7
Try FORMAT=JSON (MySQL 5.6+)
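A short usage sketch, assuming a hypothetical table `t2` with an index on `a` (the column layout is an assumption). On MySQL 5.7 the JSON output also carries cost figures under `cost_info`.

```sql
-- Hypothetical table layout for illustration.
CREATE TABLE IF NOT EXISTS t2 (
  id      INT NOT NULL PRIMARY KEY,
  a       INT NOT NULL,
  payload VARCHAR(100),
  KEY idx_a (a)
) ENGINE=InnoDB;

-- Same plan as plain EXPLAIN, printed as a JSON document.
EXPLAIN FORMAT=JSON
SELECT * FROM t2 WHERE a BETWEEN 50 AND 60;
```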
TRACE
EXPLAIN shows the selected plan
TRACE shows WHY the plan was selected:
Alternative plans
Estimated costs
Decisions made
JSON format
How to use (MySQL 5.6+):
SET optimizer_trace = "enabled=on";
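A typical trace session might look like the sketch below. The table `t2` and its columns are assumptions. The trace has to be read in the same session, and by default only the most recent traced statement is kept.

```sql
-- Hypothetical table layout for illustration.
CREATE TABLE IF NOT EXISTS t2 (
  id      INT NOT NULL PRIMARY KEY,
  a       INT NOT NULL,
  payload VARCHAR(100),
  KEY idx_a (a)
) ENGINE=InnoDB;

SET optimizer_trace = 'enabled=on';

-- The statement to investigate.
SELECT * FROM t2 WHERE a BETWEEN 50 AND 60;

-- Read the JSON trace for the statement above.
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```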
Agenda
1. Quiz
2. Intro MySQL & Index
3. Tools for monitoring, analyzing and tuning queries
4. MySQL cost-based optimizer
5. ICP
6. Quiz Discussion
MySQL Optimizer
Cost-based Query Optimization: General idea
Assign cost to operations
Compute cost of partial or alternative plans
Search for plan with lowest cost
Cost-based optimizations:
Access method
Join order
Subquery strategy
Input to Cost Model
IO-cost:
Estimates from storage engine based on number of pages to read
Both index and data pages
Schema:
Length of records and keys
Input to Cost Model
Statistics:
Number of rows in table
Key distribution/Cardinality:
Average number of records per key value
Only for indexed columns
Maintained by storage engine
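These statistics can be inspected directly. A sketch, again assuming a hypothetical table `t2` with an index on `a`: the average number of records per key value is roughly the table row count divided by the index Cardinality, and for InnoDB both numbers are estimates.

```sql
-- Hypothetical table layout for illustration.
CREATE TABLE IF NOT EXISTS t2 (
  id      INT NOT NULL PRIMARY KEY,
  a       INT NOT NULL,
  payload VARCHAR(100),
  KEY idx_a (a)
) ENGINE=InnoDB;

-- Ask the storage engine to refresh its statistics.
ANALYZE TABLE t2;

-- Cardinality column: estimated number of distinct key values.
SHOW INDEX FROM t2;

-- Estimated number of rows in the table.
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 't2';
```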
More on Cost Model
Not just minimizing number of scanned rows
Lots of other heuristics and hacks
Primary key is special for InnoDB
Covering index benefits
Full table scan can be faster than an index scan
An index can also be used for sorting
              Memory   Disk        SSD
Table scan    6.8s     36s         15s
Index scan    5.2s     2.5 hours   30 min
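The covering index benefit can be seen in EXPLAIN. A sketch, assuming a hypothetical table `t2` (column layout is an assumption): an InnoDB secondary index also contains the primary key, so a query that touches only `id` and `a` can be answered from the index alone, and Extra would then typically show "Using index".

```sql
-- Hypothetical table layout for illustration.
CREATE TABLE IF NOT EXISTS t2 (
  id      INT NOT NULL PRIMARY KEY,
  a       INT NOT NULL,
  payload VARCHAR(100),
  KEY idx_a (a)
) ENGINE=InnoDB;

-- Only indexed columns are needed: candidate for a covering index read.
EXPLAIN SELECT id, a FROM t2 WHERE a BETWEEN 50 AND 60;

-- payload is not in the index: every matching entry needs a row lookup.
EXPLAIN SELECT * FROM t2 WHERE a BETWEEN 50 AND 60;
```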
Cost Model Example
SELECT * FROM t2 WHERE a BETWEEN x AND y;
Table scan:
IO cost : #pages in table
CPU cost : #rows * ROW_EVALUATE_COST
Range scan:
IO cost : #pages to read from index + #rows_in_range
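The formulas above can be played with using plain arithmetic. All numbers below are made up, and two things are assumptions not stated on the slide: the value 0.2 for ROW_EVALUATE_COST, and a CPU term of #rows_in_range * ROW_EVALUATE_COST for the range scan, added by analogy with the table scan.

```sql
-- Made-up table size, for illustration only.
SET @pages_in_table    = 1000;
SET @rows_in_table     = 100000;
SET @row_evaluate_cost = 0.2;   -- assumed value of the constant

SELECT
  -- Table scan: IO cost + CPU cost.
  @pages_in_table + @rows_in_table * @row_evaluate_cost AS table_scan_cost,
  -- Range scan, 10,000 rows in range, 30 index pages (assumed).
  30 + 10000 + 10000 * @row_evaluate_cost AS range_cost_10k_rows,
  -- Range scan, 20,000 rows in range, 60 index pages (assumed).
  60 + 20000 + 20000 * @row_evaluate_cost AS range_cost_20k_rows;
```

With these numbers the smaller range comes out cheaper than the table scan and the larger range does not, which is the kind of flip the optimizer makes when a range grows.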
Cost Model Example EXPLAIN
EXPLAIN SELECT * FROM t2 WHERE a BETWEEN 50 AND 60;
EXPLAIN SELECT * FROM t2 WHERE a BETWEEN 50 AND 70;
Cost Model Example TRACE
Agenda
1. Quiz
2. Intro MySQL & Index
3. Tools for monitoring, analyzing and tuning queries
4. MySQL cost-based optimizer
5. ICP
6. Quiz Discussion
ICP Index_Condition_Pushdown
Main idea:
Use index data to evaluate parts of the WHERE clause
Push WHERE clause conditions down to the storage engine to filter
SELECT A WHERE B = 2 AND C LIKE '%lee%'
No ICP
Index(B) -- traditional, index used for range access only
Index(B,C,A) -- covering, all involved columns included
Using ICP
Index(B,C) -- range access by B, filter on C in the index, read the full row only on a match
ICP Index_Condition_Pushdown
Figure: No ICP vs. Using ICP for WHERE B = 2 AND C LIKE '%lee%' with Index(B, C)
ICP Index_Condition_Pushdown
MySQL 5.6+ (5.7 adds support for partitioned tables)
Used for the range, ref, eq_ref and ref_or_null access methods
On by default
SELECT @@optimizer_switch;
SET @@optimizer_switch = "index_condition_pushdown=off";
ICP demo
Table & Data
create table icp(id int, age int, name varchar(30), memo varchar(600));
alter table icp add index aind(age, name, memo);
# mysqltest script: insert 100K rows
let $i = 100000;
while ($i)
{
  --eval insert into icp values($i, 1, 'a$i', repeat('a$i', 100))
  dec $i;
}
SQL: select * from icp where age = 1 and memo like '%9999%';
show session status like '%handler%';
Handler_read_next: 100000 --> 10+
Use EXPLAIN to check whether ICP is used: Extra shows “Using index condition”
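To compare the handler counters with and without ICP, the demo query can be run twice in the same session. This is a sketch against the `icp` table defined above; `FLUSH STATUS` resets the session counters and needs the RELOAD privilege, and the exact counter values will depend on the data and server version.

```sql
-- Run 1: ICP enabled (the default).
SET optimizer_switch = 'index_condition_pushdown=on';
FLUSH STATUS;
SELECT * FROM icp WHERE age = 1 AND memo LIKE '%9999%';
SHOW SESSION STATUS LIKE 'Handler_read%';

-- Run 2: ICP disabled, same query.
SET optimizer_switch = 'index_condition_pushdown=off';
FLUSH STATUS;
SELECT * FROM icp WHERE age = 1 AND memo LIKE '%9999%';
SHOW SESSION STATUS LIKE 'Handler_read%';

-- Restore the default setting.
SET optimizer_switch = 'index_condition_pushdown=default';
```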
Agenda
1. Quiz
2. Intro MySQL & Index
3. Tools for monitoring, analyzing and tuning queries
4. MySQL cost-based optimizer
5. ICP
6. Quiz Discussion
Quiz: Explain
Scenario 1: Most cases with live DB config and data distribution: AD
Scenario 2: Index_Condition_Pushdown enabled: ACD
Scenario 3: Special data distribution: A
Scenario 4: Special table structure (covering index): ABCD
Scenario 5: Special storage engine index using a hash table: A
How to modify the question to make answer unique?
A. a=0 AND b=0;
B. a=0 OR b=0;
C. a>0 AND b=0;
D. a=0 AND b>0;
