A Billion Goods in a Few Categories
When Optimizer Histograms Help and When They Don’t
September 18, 2019
Sveta Smirnova
• Introduction
• The Use Case
  • The Cardinality: Two Levels
  • Example
• Why the Difference?
• Even Worse Use Case
  • ANALYZE TABLE Limitations
  • Example
• How Histograms Work?
• Left Overs
Table of Contents
"The column statistics data dictionary table stores histogram statistics about column values, for use by the optimizer in constructing query execution plans."
MySQL User Reference Manual
Optimizer Statistics aka Histograms
• MySQL Support engineer
• Author of
  • MySQL Troubleshooting
  • JSON UDF functions
  • FILTER clause for MySQL
• Speaker
  • Percona Live, OOW, Fosdem, DevConf, HighLoad...
Sveta Smirnova
Introduction
• Hardware
• Wise options
• Optimized queries
• Brain
Everything Can Be Resolved!
• This talk is about
  • How I spent the last three years
  • Resolving the same issue
  • For different customers
• Task was to speed up the query
Not Everything
• Specific data distribution
• Access on different fields
  • ON goods.shop_id = shop.id
  • WHERE shop.location IN (...)
  • GROUP BY goods.category, shop.profile
  • ORDER BY shop.distance, goods.quantity
• Index cannot be used effectively
Not All the Queries Can Be Optimized
• Data distribution varies
  • Big difference between number of values
    Red: 1,000,000; Green: 2; Blue: 100,000
  • Constantly changing
    Red: 100,000; Green: 1,000,000; Blue: 10
    Red: 1,000; Green: 2,000; Blue: 50,000
• Cardinality is not correct
  • Was not updated in time
  • Updates too often
  • Calculated wrongly
• Index maintenance is expensive
  • Hardware resources
  • Slow updates
  • Window to run CREATE INDEX
• Optimizer does not work as we wish
  • Examples in my talk @Percona Live Frankfurt
Latest Support Tickets
• Topic based on real Support cases
  • A couple of them are still in progress
• All examples are 100% fake
  • They are created so that no customer can be identified
  • Everything is generated: table names, column names, data
  • The use case itself is fictional
• All examples are simplified
  • Only the columns required to show the issue
  • Everything extra removed
  • Real tables usually store much more data
• All disasters happened with version 5.7
Disclaimer
The Use Case
• categories
  • Fewer than 20 rows
• goods
  • More than 1M rows
  • 20 unique cat_id values
  • Many other fields
    • Price
    • Date: added, last updated, etc.
    • Characteristics
    • Store
    • ...
Two Tables
select *
from goods
join categories
  on (categories.id=goods.cat_id)
where date_added between '2018-07-01' and '2018-08-01'
  and cat_id in (16,11)
  and price >= 1000 and price <= 10000 [ and ... ]
[ GROUP BY ... [ ORDER BY ... [ LIMIT ... ]]]
;
JOIN
• Select from the small table
• For each cat_id select from the large table
• Filter the result on date_added[ and price[...]]
• Slow with many items in the category
Option 1: Select from the Small Table First
[Figure not reproduced]
Option 1: Illustration
• Filter rows by date_added[ and price[...]]
• Get cat_id values
• Retrieve rows from the small table
• Slow if the number of rows filtered by date_added is larger than the number of goods in the selected categories
Option 2: Select From the Large Table First
[Figure not reproduced]
Option 2: Illustration
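The trade-off between the two join orders can be approximated by counting how many goods rows each order reads. The sketch below assumes an index on cat_id for Option 1 and an index on date_added for Option 2. The miniature table and the function names are hypothetical.

```python
from datetime import date

# Hypothetical miniature goods table: one (cat_id, date_added) tuple per row.
GOODS = (
    [(16, date(2018, 1, 1))] * 900     # large category, mostly outside the date range
    + [(16, date(2018, 7, 15))] * 10
    + [(11, date(2018, 7, 20))] * 5
    + [(3, date(2018, 7, 10))] * 50    # in the date range, but not a requested category
)


def small_table_first(goods, cat_ids, lo, hi):
    """Option 1: read every good of each selected category, then filter by date."""
    examined = [g for g in goods if g[0] in cat_ids]
    matched = [g for g in examined if lo <= g[1] <= hi]
    return len(examined), len(matched)


def large_table_first(goods, cat_ids, lo, hi):
    """Option 2: read every good in the date range, then check its category."""
    examined = [g for g in goods if lo <= g[1] <= hi]
    matched = [g for g in examined if g[0] in cat_ids]
    return len(examined), len(matched)


if __name__ == "__main__":
    args = (GOODS, {16, 11}, date(2018, 7, 1), date(2018, 8, 1))
    print("Option 1 (examined, matched):", small_table_first(*args))  # (915, 15)
    print("Option 2 (examined, matched):", large_table_first(*args))  # (65, 15)
```

Both orders return the same 15 rows. With this data, however, Option 2 reads 65 rows and Option 1 reads 915. If you move most of category 16 into the date range, Option 1 becomes the cheaper plan. Neither order is right for every data distribution.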
• CREATE INDEX index_everything ON goods
  (cat_id, date_added[, price[, ...]])
• It resolves the issue
• But not in all cases
What if We Use Combined Indexes?
• Maintenance cost
  • Slower INSERT/UPDATE/DELETE
• Disk space
• Index not useful for selecting rows
SELECT ... FROM goods
JOIN categories ON (categories.id=goods.cat_id)
JOIN shops ON (shops.id=goods.shop_id)
[ JOIN ... ]
WHERE
date_added between '2018-07-01' and '2018-08-01'
AND
cat_id in (16,11) AND price >= 1000 AND price <= 10000 [ AND ... ]
GROUP BY product_type
ORDER BY date_updated DESC
LIMIT 50,100
• Tables may have wrong cardinality
The Problem
The Use Case
The Cardinality: Two Levels
The Query → Parser → Optimizer → Storage Engine → Data
MySQL Architecture
• Optimizer
• Engine
  • MyRocks
  • InnoDB
  • Any
MySQL Has a Layered Architecture
• Number of unique values in the index
• Used by the Optimizer to build the query execution plan
• Examples
  • ID: 1,2,3,4,5
    • Number of rows: 5
    • Cardinality: 5
  • Gender: m,f,f,f,f,m,m,m,m,m,m,f,f,m,f,m,f
    • Number of rows: 17
    • Cardinality: 2
Cardinality
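Cardinality, as defined above, is just the number of distinct values. Here is a minimal sketch with the two example columns. The function name is hypothetical.

```python
def cardinality(values):
    """Number of distinct values: the quantity SHOW INDEX estimates as Cardinality."""
    return len(set(values))


ids = [1, 2, 3, 4, 5]
gender = "m,f,f,f,f,m,m,m,m,m,m,f,f,m,f,m,f".split(",")

print(len(ids), cardinality(ids))        # 5 5
print(len(gender), cardinality(gender))  # 17 2
```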
• Stores statistics on disk
  • mysql.innodb_table_stats
  • mysql.innodb_index_stats
• Returns statistics to the Optimizer
  • In ha_innobase::info
    • handler/ha_innodb.cc
  • When the table is opened
    • flag = HA_STATUS_CONST
    • Reads data from disk
    • Stores it in memory
  • Subsequent table accesses
    • flag = HA_STATUS_VARIABLE
    • Statistics from memory
    • Up-to-date Primary Key data
InnoDB: Overview
• Table created with option STATS_AUTO_RECALC=0
• Before ANALYZE TABLE
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 64
...
InnoDB: Flow
• Table created with option STATS_AUTO_RECALC=0
• After ANALYZE TABLE
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 2
...
InnoDB: Flow
• Table created with option STATS_AUTO_RECALC=0
• After inserting rows
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 16
...
InnoDB: Flow
• Table created with option STATS_AUTO_RECALC=0
• After restart
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 2
...
InnoDB: Flow
• Takes data from the engine
• Class ha_statistics
  • sql/handler.h
• Does not have a Cardinality field at all
• Uses a formula to calculate Cardinality
Optimizer: Overview
• n_rows: number of rows in the table
  • Naturally up to date
  • Constantly changing!
• rec_per_key: number of duplicates per key
  • Calculated by InnoDB at ANALYZE TABLE time
  • rec_per_key = n_rows / unique values
  • Does not change until the next ANALYZE TABLE!
• Cardinality = n_rows / rec_per_key
Optimizer: Formula
• Engine stores persistent statistics
  • InnoDB storage tables: statistics as calculated
  • Row count only in memory
• Optimizer calculates Cardinality every time it accesses engine statistics
  • Weak user control
Persistent Statistics Are Not Persistent
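The formula and the "not persistent" behaviour fit in a few lines. This models the slides, not InnoDB code. The class, its methods and the 1,000-row table are hypothetical. Start with 2 distinct values and grow the table eightfold without running ANALYZE TABLE. This reproduces the Cardinality sequence 2 → 16 → 2 from the InnoDB: Flow slides, although those slides do not say how many rows were actually inserted.

```python
class TableStats:
    """Hypothetical model: persistent stats on disk, current row count only in memory."""

    def __init__(self, n_rows: int, unique_values: int):
        self.disk_n_rows = n_rows                   # saved by ANALYZE TABLE
        self.rec_per_key = n_rows / unique_values   # frozen until the next ANALYZE TABLE
        self.mem_n_rows = n_rows                    # row count the optimizer sees now

    def insert(self, rows: int) -> None:
        # STATS_AUTO_RECALC=0: only the in-memory row count changes
        self.mem_n_rows += rows

    def restart(self) -> None:
        # The in-memory row count is lost; the value saved on disk is read back
        self.mem_n_rows = self.disk_n_rows

    def cardinality(self) -> float:
        # Recomputed on every access: Cardinality = n_rows / rec_per_key
        return self.mem_n_rows / self.rec_per_key


stats = TableStats(n_rows=1_000, unique_values=2)
print(stats.cardinality())  # 2.0   after ANALYZE TABLE
stats.insert(7_000)
print(stats.cardinality())  # 16.0  after inserting rows, still 2 distinct values
stats.restart()
print(stats.cardinality())  # 2.0   after restart
```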
The Use Case
Example
• EXPLAIN without histograms
mysql> explain select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between '2000-01-01' and '2001-01-01' -- Large range
-> order by goods.cat_id
-> limit 10\G -- We ask for 10 rows only!
Example
• EXPLAIN without histograms
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: categories -- Small table first
partitions: NULL
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 20
filtered: 70.00
Extra: Using where; Using index;
Using temporary; Using filesort
Example
• EXPLAIN without histograms
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: goods -- Large table
partitions: NULL
type: ref
possible_keys: cat_id_2
key: cat_id_2
key_len: 5
ref: orig.categories.id
rows: 51827
filtered: 11.11 -- Default value
Extra: Using where
2 rows in set, 1 warning (0.01 sec)
Example
• Execution time without histograms
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)
mysql> select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between '2000-01-01' and '2001-01-01'
-> order by goods.cat_id
-> limit 10;
ab9f9bb7bc4f357712ec34f067eda364 -
10 rows in set (56.47 sec)
Example
• Engine statistics without histograms
mysql> show status like 'Handler%';
+----------------------------+--------+
| Variable_name | Value |
+----------------------------+--------+
...
| Handler_read_next | 964718 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 10 |
| Handler_read_rnd_next | 951671 |
...
| Handler_write | 951670 |
+----------------------------+--------+
18 rows in set (0.01 sec)
Example
• Now let's add the histogram
mysql> analyze table goods update histogram on date_added;
+------------+-----------+----------+------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------+-----------+----------+------------------------------+
| orig.goods | histogram | status | Histogram statistics created for column 'date_added'. |
+------------+-----------+----------+------------------------------+
1 row in set (2.01 sec)
Example
• EXPLAIN with the histogram
mysql> explain select goods.* from goods
-> join categories
-> on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between '2000-01-01' and '2001-01-01'
-> order by goods.cat_id
-> limit 10\G
Example
• EXPLAIN with the histogram
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: goods -- Large table first
partitions: NULL
type: index
possible_keys: cat_id_2
key: cat_id_2
key_len: 5
ref: NULL
rows: 10 -- Same as we asked
filtered: 98.70 -- True numbers
Extra: Using where
Example
• EXPLAIN with the histogram
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: categories -- Small table
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: orig.goods.cat_id
rows: 1
filtered: 100.00
Extra: Using index
2 rows in set, 1 warning (0.01 sec)
Example
• Execution time with the histogram
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)
mysql> select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between '2000-01-01' and '2001-01-01'
-> order by goods.cat_id
-> limit 10;
eeb005fae0dd3441c5c380e1d87fee84 -
10 rows in set (0.00 sec) -- was 56.47 sec without the histogram!
Example
• Engine statistics with the histogram
mysql> show status like 'Handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 4 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 1 |
| Handler_read_key | 3 |
| Handler_read_last | 0 |
| Handler_read_next | 9 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 0 |
+----------------------------+-------+
18 rows in set (0.00 sec)
Example
Why the Difference?
[Chart not reproduced]
Indexes: Number of Items with Same Value
[Chart not reproduced]
Indexes: Cardinality
[Chart not reproduced]
Histograms: Number of Values in Each Bucket
[Chart not reproduced]
Histograms: Data in the Histogram
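Here is a sketch of the difference, using the skewed Red/Green/Blue distribution from the support-ticket slide. Index statistics only give the average number of rows per value (n_rows / Cardinality). A singleton histogram keeps a fraction for each value. The dictionaries and functions are hypothetical. MySQL stores the fractions cumulatively, as the column_statistics output later in this talk shows. Plain per-value fractions are used here for readability.

```python
# Skewed distribution from the "Latest Support Tickets" slide.
counts = {"Red": 1_000_000, "Green": 2, "Blue": 100_000}
n_rows = sum(counts.values())  # 1,100,002
n_distinct = len(counts)       # 3, the index Cardinality


def estimate_from_cardinality(value: str) -> float:
    # Index statistics only know the average number of rows per value.
    return n_rows / n_distinct


# Singleton histogram: one bucket per value holding its fraction of the rows.
histogram = {value: cnt / n_rows for value, cnt in counts.items()}


def estimate_from_histogram(value: str) -> float:
    return histogram.get(value, 0.0) * n_rows


for color in counts:
    print(color, round(estimate_from_cardinality(color)), round(estimate_from_histogram(color)))
# Red 366667 1000000
# Green 366667 2
# Blue 366667 100000
```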
Even Worse Use Case
ANALYZE TABLE Limitations
• Run ANALYZE TABLE often
• Use a large number of STATS_SAMPLE_PAGES
Solutions in 5.7 and Earlier
• Counts the number of pages in the table
• Takes STATS_SAMPLE_PAGES pages
• Counts the number of unique values of the secondary index in these pages
• Divides the number of pages in the table by the number of sampled pages and multiplies the result by the number of unique values
How Does ANALYZE TABLE Work with InnoDB?
• Number of pages in the table: 20,000
• STATS_SAMPLE_PAGES: 20 (default)
• Unique values in the secondary index:
  • In sample pages: 10
  • In the table: 11
• Cardinality: 20,000 * 10 / 20 = 10,000
Example
• Number of pages in the table: 20,000
• STATS_SAMPLE_PAGES: 5,000
• Unique values in the secondary index:
  • In sample pages: 10
  • In the table: 11
• Cardinality: 20,000 * 10 / 5,000 = 40
Example 2
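This reproduces the arithmetic of both examples. It follows the simplified description of ANALYZE TABLE above, not InnoDB's actual sampling code. The function name is hypothetical.

```python
def estimated_cardinality(table_pages: int, sample_pages: int, unique_in_sample: int) -> float:
    """Scale the unique values seen in the sample by table pages / sampled pages."""
    return table_pages / sample_pages * unique_in_sample


print(estimated_cardinality(20_000, 20, 10))     # 10000.0 (the table has only 11)
print(estimated_cardinality(20_000, 5_000, 10))  # 40.0
```

Even a 250 times larger sample still overestimates the cardinality almost fourfold, because the estimate scales linearly with the page ratio no matter how few distinct values the index really has.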
• Time consuming
mysql> select count(*) from goods;
+----------+
| count(*) |
+----------+
| 80303000 |
+----------+
1 row in set (35.95 sec)
• With default STATS_SAMPLE_PAGES
mysql> analyze table goods;
+------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze | status | OK |
+------------+---------+----------+----------+
1 row in set (0.32 sec)
• With a bigger number
mysql> alter table goods STATS_SAMPLE_PAGES=5000;
Query OK, 0 rows affected (0.04 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> analyze table goods;
+------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze | status | OK |
+------------+---------+----------+----------+
1 row in set (27.13 sec)
• 27.13/0.32 = about 85 times slower!
• Not always a solution
Use Larger STATS_SAMPLE_PAGES?
Even Worse Use Case
Example
• goods_characteristics
CREATE TABLE `goods_characteristics` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`good_id` varchar(30) DEFAULT NULL,
`size` int(11) DEFAULT NULL,
`manufacturer` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `good_id` (`good_id`,`size`,`manufacturer`),
KEY `size` (`size`,`manufacturer`)
) ENGINE=InnoDB AUTO_INCREMENT=196606
DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Two Similar Tables
• goods_shops
CREATE TABLE `goods_shops` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`good_id` varchar(30) DEFAULT NULL,
`location` varchar(30) DEFAULT NULL,
`delivery_options` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `good_id` (`good_id`,`location`,`delivery_options`),
KEY `location` (`location`,`delivery_options`)
) ENGINE=InnoDB AUTO_INCREMENT=131071
DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Two Similar Tables
• Size
mysql> select count(*) from goods_characteristics;
+----------+
| count(*) |
+----------+
| 131072 |
+----------+
1 row in set (0.08 sec)
mysql> select count(*) from goods_shops;
+----------+
| count(*) |
+----------+
| 65536 |
+----------+
1 row in set (0.04 sec)
Two Similar Tables
• Data Distribution: goods_characteristics
mysql> select count(*) num_rows, good_id, size
-> from goods_characteristics group by good_id, size;
+----------+---------+------+
| num_rows | good_id | size |
+----------+---------+------+
| 65536 | laptop | 7 |
| 8187 | laptop | 8 |
| 8190 | laptop | 9 |
| 8188 | laptop | 10 |
| 8192 | laptop | 11 |
| 8189 | laptop | 12 |
| 8189 | laptop | 13 |
| 8191 | laptop | 14 |
| 8190 | laptop | 15 |
| 10 | laptop | 16 |
| 10 | laptop | 17 |
+----------+---------+------+
Two Similar Tables
• Data Distribution: goods_characteristics
mysql> select count(*) num_rows, good_id, manufacturer
-> from goods_characteristics group by good_id, manufacturer order by num_rows desc;
+----------+---------+--------------+
| num_rows | good_id | manufacturer |
+----------+---------+--------------+
| 65536 | laptop | Noname |
| 8191 | laptop | Samsung |
| 8191 | laptop | Acer |
| 8189 | laptop | Dell |
| 8189 | laptop | HP |
| 8189 | laptop | Lenovo |
| 8189 | laptop | Toshiba |
| 8189 | laptop | Apple |
| 8189 | laptop | Asus |
| 10 | laptop | Sony |
| 10 | laptop | Casper |
+----------+---------+--------------+
Two Similar Tables
• Data Distribution: goods_shops
mysql> select count(*) num_rows, good_id, location
-> from goods_shops group by good_id, location order by num_rows desc;
+----------+---------+---------------+
| num_rows | good_id | location |
+----------+---------+---------------+
| 8191 | laptop | New York |
| 8191 | laptop | San Francisco |
| 8189 | laptop | Paris |
| 8189 | laptop | Berlin |
| 8189 | laptop | Brussels |
| 8189 | laptop | Tokio |
| 8189 | laptop | Istanbul |
| 8189 | laptop | London |
| 10 | laptop | Moscow |
| 10 | laptop | Kiev |
+----------+---------+---------------+
Two Similar Tables
• Data Distribution: goods_shops
mysql> select count(*) num_rows, good_id, delivery_options
-> from goods_shops group by good_id, delivery_options order by num_rows desc;
+----------+---------+------------------+
| num_rows | good_id | delivery_options |
+----------+---------+------------------+
| 8192 | laptop | DHL |
| 8191 | laptop | PTT |
| 8190 | laptop | Normal Post |
| 8190 | laptop | Tracked |
| 8189 | laptop | Fedex |
| 8189 | laptop | Gruzovichkof |
| 8188 | laptop | Courier |
| 8187 | laptop | No delivery |
| 10 | laptop | Premium |
| 10 | laptop | Urgent |
+----------+---------+------------------+
Two Similar Tables
Histogram statistics are useful primarily for nonindexed columns. Adding an index to a column for which histogram statistics are applicable might also help the optimizer make row estimates. The tradeoffs are:
• An index must be updated when table data is modified.
• A histogram is created or updated only on demand, so it adds no overhead when table data is modified. On the other hand, the statistics become progressively more out of date when table modifications occur, until the next time they are updated.
MySQL User Reference Manual
Optimizer Statistics aka Histograms
mysql> alter table goods_characteristics stats_sample_pages=5000;
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table goods_shops stats_sample_pages=5000;
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> analyze table goods_characteristics, goods_shops;
+----------------------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+----------------------------+---------+----------+----------+
| test.goods_characteristics | analyze | status | OK |
| test.goods_shops | analyze | status | OK |
+----------------------------+---------+----------+----------+
2 rows in set (0.35 sec)
Index Statistics Are More than Good
• The query
mysql> select count(*) from goods_shops join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')
-> and (location in ('Moscow', 'Kiev') or
-> delivery_options in ('Premium', 'Urgent'));
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
Performance
• Handlers
mysql> show status like 'Handler%';
+----------------------------+-------------+
| Variable_name | Value |
+----------------------------+-------------+
| Handler_commit | 0 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 4 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 1 |
| Handler_read_key | 13043 |
| Handler_read_last | 0 |
| Handler_read_next | 854,767,916 |
...
Performance
• Table order
mysql> explain select count(*) from goods_shops join goods_characteristics
-> using (good_id) where size < 12 and
-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')
-> and (location in ('Moscow', 'Kiev') or
-> delivery_options in ('Premium', 'Urgent'));
+----+-----------------------+-------+---------+--------+----------+---------------+
| id | table | type | key | rows | filtered | Extra |
+----+-----------------------+-------+---------+--------+----------+---------------+
| 1 | goods_characteristics | index | good_id | 131072 | 25.00 | Using... |
| 1 | goods_shops | ref | good_id | 65536 | 36.00 | Using... |
+----+-----------------------+-------+---------+--------+----------+---------------+
2 rows in set, 1 warning (0.00 sec)
Performance
• Table order matters
mysql> explain select count(*) from goods_shops straight_join goods_characteristics
-> using (good_id) where size < 12 and
-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')
-> and (location in ('Moscow', 'Kiev') or
-> delivery_options in ('Premium', 'Urgent'));
+----+-----------------------+-------+---------+--------+----------+---------------+
| id | table | type | key | rows | filtered | Extra |
+----+-----------------------+-------+---------+--------+----------+---------------+
| 1 | goods_shops | index | good_id | 65536 | 36.00 | Using... |
| 1 | goods_characteristics | ref | good_id | 131072 | 25.00 | Using... |
+----+-----------------------+-------+---------+--------+----------+---------------+
2 rows in set, 1 warning (0.00 sec)
Performance
• Table order matters
mysql> select count(*) from goods_shops straight_join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')
-> and (location in ('Moscow', 'Kiev') or
-> delivery_options in ('Premium', 'Urgent'));
+----------+
| count(*) |
+----------+
| 816640 |
+----------+
1 row in set (2.11 sec)
Performance
• Table order matters
mysql> show status like 'Handler_read_next';
+-------------------+-----------+
| Variable_name | Value |
+-------------------+-----------+
| Handler_read_next | 5,308,416 |
+-------------------+-----------+
1 row in set (0.00 sec)
Performance
• Not for all data
mysql> select count(*) from goods_shops straight_join goods_characteristics
-> using (good_id)
-> where (size > 15 or manufacturer in ('Sony', 'Casper'))
-> and location in
-> ('New York', 'San Francisco', 'Paris', 'Berlin', 'Brussels', 'London')
-> and delivery_options in
-> ('DHL', 'Normal Post', 'Tracked', 'Fedex', 'No delivery');
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
Performance
• Not for all data
mysql> show status like 'Handler%';
+----------------------------+------------+
| Variable_name | Value |
+----------------------------+------------+
| Handler_commit | 10 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 28 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 1 |
| Handler_read_key | 143 |
| Handler_read_last | 0 |
| Handler_read_next | 16,950,265 |
Performance
mysql> analyze table goods_shops update histogram
-> on location, delivery_options;
+-------------+-----------+----------+--------------------------------+
| Table | Op | Msg_type | Msg_text |
+-------------+-----------+----------+--------------------------------+
| goods_shops | histogram | status | Histogram statistics created for column 'delivery_options'. |
| goods_shops | histogram | status | Histogram statistics created for column 'location'. |
+-------------+-----------+----------+--------------------------------+
2 rows in set (0.18 sec)
Histograms to The Rescue
mysql> analyze table goods_characteristics update histogram
-> on size, manufacturer;
+-----------------------+-----------+----------+------------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------------+-----------+----------+------------------------------+
| goods_characteristics | histogram | status | Histogram statistics created for column 'manufacturer'. |
| goods_characteristics | histogram | status | Histogram statistics created for column 'size'. |
+-----------------------+-----------+----------+------------------------------+
2 rows in set (0.23 sec)
Histograms to The Rescue
• The query
mysql> select count(*) from goods_shops join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')
-> and (location in ('Moscow', 'Kiev') or
-> delivery_options in ('Premium', 'Urgent'));
+----------+
| count(*) |
+----------+
| 816640 |
+----------+
1 row in set (2.16 sec)
Histograms to The Rescue
• The query
mysql> show status like 'Handler_read_next';
+-------------------+-----------+
| Variable_name | Value |
+-------------------+-----------+
| Handler_read_next | 5,308,418 |
+-------------------+-----------+
1 row in set (0.00 sec)
Histograms to The Rescue
• Filtering effect
mysql> explain select count(*) from goods_shops join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')
-> and (location in ('Moscow', 'Kiev') or
-> delivery_options in ('Premium', 'Urgent'));
+----+-----------------------+-------+---------+--------+----------+----------+
| id | table | type | key | rows | filtered | Extra |
+----+-----------------------+-------+---------+--------+----------+----------+
| 1 | goods_shops | index | good_id | 65536 | 0.06 | Using... |
| 1 | goods_characteristics | ref | good_id | 131072 | 15.63 | Using... |
+----+-----------------------+-------+---------+--------+----------+----------+
2 rows in set, 1 warning (0.00 sec)
Histograms to The Rescue
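The 0.06 that EXPLAIN shows for goods_shops can be reproduced from the row counts on the data distribution slides. Turn each IN list into a fraction of the 65,536 rows, then combine the two sides of the OR as independent events. Without histograms, a fixed guess of 10% per listed value reproduces the earlier 36.00. Both the independence formula and the 10% default are assumptions about the optimizer, chosen because they match these numbers. The function names are hypothetical, and only the goods_shops row is modelled.

```python
N = 65_536  # rows in goods_shops

# Row counts of the values the query asks for, from the distribution slides.
location = {"Moscow": 10, "Kiev": 10}
delivery_options = {"Premium": 10, "Urgent": 10}


def in_with_histogram(counts: dict, values: list) -> float:
    """Fraction of rows whose value is in the IN list."""
    return sum(counts.get(v, 0) for v in values) / N


def in_without_histogram(values: list, per_value: float = 0.1) -> float:
    """Assumed fixed guess of 10% per listed value when no statistics exist."""
    return len(values) * per_value


def cond_or(p1: float, p2: float) -> float:
    """Both sides treated as independent: P(A or B) = P(A) + P(B) - P(A) * P(B)."""
    return p1 + p2 - p1 * p2


with_hist = cond_or(in_with_histogram(location, ["Moscow", "Kiev"]),
                    in_with_histogram(delivery_options, ["Premium", "Urgent"]))
without_hist = cond_or(in_without_histogram(["Moscow", "Kiev"]),
                       in_without_histogram(["Premium", "Urgent"]))

print(f"filtered with histograms:    {with_hist * 100:.2f}")     # 0.06
print(f"filtered without histograms: {without_hist * 100:.2f}")  # 36.00
```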
How Histograms Work?
↓ sql/sql_planner.cc
↓ calculate_condition_filter
↓ Item_func_*::get_filtering_effect
• get_histogram_selectivity
• Shown as the percentage of filtered rows in EXPLAIN
Low Level
• Example data
mysql> create table example(f1 int) engine=innodb;
mysql> insert into example values(1),(1),(1),(2),(3);
mysql> select f1, count(f1) from example group by f1;
+------+-----------+
| f1 | count(f1) |
+------+-----------+
| 1 | 3 |
| 2 | 1 |
| 3 | 1 |
+------+-----------+
3 rows in set (0.00 sec)
Filtered Rows
• Without a histogram
mysql> explain select * from example where f1 > 0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
• Without a histogram
mysql> explain select * from example where f1 > 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
• Without a histogram
mysql> explain select * from example where f1 > 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
• Without a histogram
mysql> explain select * from example where f1 > 3\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
• With the histogram
mysql> analyze table example update histogram on f1 with 3 buckets;
+-----------------+-----------+----------+-----------------------------------------------+
| Table           | Op        | Msg_type | Msg_text                                      |
+-----------------+-----------+----------+-----------------------------------------------+
| hist_ex.example | histogram | status   | Histogram statistics created for column 'f1'. |
+-----------------+-----------+----------+-----------------------------------------------+
1 row in set (0.03 sec)
• With the histogram
mysql> select * from information_schema.column_statistics
-> where table_name='example'\G
*************************** 1. row ***************************
SCHEMA_NAME: hist_ex
TABLE_NAME: example
COLUMN_NAME: f1
HISTOGRAM: {"buckets": [[1, 0.6], [2, 0.8], [3, 1.0]],
"data-type": "int", "null-values": 0.0, "collation-id": 8,
"last-updated": "2018-11-07 09:07:19.791470",
"sampling-rate": 1.0, "histogram-type": "singleton",
"number-of-buckets-specified": 3}
1 row in set (0.00 sec)
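The singleton histogram stores one bucket per distinct value, paired with the cumulative fraction of rows whose value is less than or equal to it: three rows of 1 give 0.6, the single 2 brings it to 0.8, and the single 3 closes at 1.0. A minimal Python sketch that reproduces the "buckets" array from the five inserted rows, assuming the column has no NULLs (MySQL reports those separately as "null-values"):

```python
from collections import Counter


def build_singleton_histogram(values):
    """Return [[value, cumulative_frequency], ...] sorted by value.

    One bucket per distinct value; assumes the column has no NULLs.
    """
    counts = Counter(values)
    total = len(values)
    buckets = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        buckets.append([value, round(running / total, 6)])
    return buckets


print(build_singleton_histogram([1, 1, 1, 2, 3]))
# [[1, 0.6], [2, 0.8], [3, 1.0]]
```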
• With the histogram
mysql> explain select * from example where f1 > 0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 100.00 -- all rows
Extra: Using where
1 row in set, 1 warning (0.00 sec)
• With the histogram
mysql> explain select * from example where f1 > 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 40.00 -- 2 rows
Extra: Using where
1 row in set, 1 warning (0.00 sec)
• With the histogram
mysql> explain select * from example where f1 > 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 20.00 -- one row
Extra: Using where
1 row in set, 1 warning (0.00 sec)
• With the histogram
mysql> explain select * from example where f1 > 3\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 20.00 -- one row, although no row has f1 > 3
Extra: Using where
1 row in set, 1 warning (0.00 sec)
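With those buckets, the selectivity of f1 > v is 1 minus the cumulative frequency of the largest bucket value not above v. A sketch, assuming the same bucket layout as the column_statistics output; the one-row floor (1/5 = 20%) is an assumption added to match the f1 > 3 estimate above, where no rows actually qualify, and is not taken from MySQL documentation.

```python
# Singleton buckets as shown in information_schema.column_statistics:
# [value, cumulative fraction of rows with f1 <= value]
BUCKETS = [[1, 0.6], [2, 0.8], [3, 1.0]]


def greater_than_selectivity(buckets, v):
    """Fraction of rows with value > v according to singleton buckets."""
    at_most_v = 0.0
    for value, cumulative in buckets:  # buckets are sorted by value
        if value > v:
            break
        at_most_v = cumulative
    return 1.0 - at_most_v


def explain_filtered(buckets, v, n_rows):
    """EXPLAIN-style 'filtered' percent for WHERE f1 > v."""
    selectivity = greater_than_selectivity(buckets, v)
    # Assumption: the estimate never drops below one row.
    selectivity = max(selectivity, 1.0 / n_rows)
    return round(selectivity * 100, 2)


for v in range(4):
    print(f"f1 > {v}: filtered = {explain_filtered(BUCKETS, v, n_rows=5)}")
```

The loop prints 100.0, 40.0, 20.0 and 20.0, the same percentages as the EXPLAIN output with the histogram.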
[Chart: f1 values 1, 2, 3 on the x-axis; y-axis from 0 to 2]
Indexes: Cardinality
52
[Chart: f1 values 1, 2, 3 on the x-axis; y-axis from 0 to 1]
Histograms
53
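Index statistics describe a column with a single Cardinality number, so every value is expected to match n_rows / Cardinality rows; a histogram keeps a separate frequency for each value. A minimal sketch of the two views of the example table, assuming Cardinality is simply the number of distinct values:

```python
from collections import Counter

rows = [1, 1, 1, 2, 3]  # f1 values of the example table

# Index statistics: a single figure for the whole column.
n_rows = len(rows)
cardinality = len(set(rows))        # 3 distinct values
rec_per_key = n_rows / cardinality  # about 1.67 rows expected per value
index_estimate = {v: rec_per_key for v in sorted(set(rows))}

# Histogram: a separate row count for every value.
counts = Counter(rows)
histogram_estimate = {v: counts[v] for v in sorted(counts)}

for v in sorted(counts):
    print(f"f1 = {v}: index {index_estimate[v]:.2f} rows, "
          f"histogram {histogram_estimate[v]} rows")
```

The index view predicts about 1.67 rows for every value; the histogram knows that 1 is three times as common as 2 or 3.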
Left Overs
• CREATE INDEX
• Metadata lock
• Can be blocked by any query
• UPDATE HISTOGRAM
• Backup lock
• Can be blocked only by a backup
• Can be created any time without fear
Maintenance: Locking
55
• CREATE INDEX
• Locks writes
• Locks reads ∗
• Every DML updates the index
∗ PS-2503: before Percona Server 5.6.38-83.0/5.7.20-18, and upstream
• UPDATE HISTOGRAM
• Uses up to histogram_generation_max_mem_size
• Persistent after creation
• DML does not touch it
Maintenance: Load
56
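Because DML does not touch a histogram, its estimates drift until the next UPDATE HISTOGRAM; this is the tradeoff the MySQL manual describes. A hypothetical Python illustration: the histogram is built from the five example rows, then five more rows with f1 = 3 arrive (the extra rows are invented for illustration, not taken from the talk).

```python
from collections import Counter


def singleton_buckets(values):
    """Cumulative-frequency buckets, one per distinct value (no NULLs)."""
    counts = Counter(values)
    running, buckets = 0, []
    for value in sorted(counts):
        running += counts[value]
        buckets.append((value, running / len(values)))
    return buckets


def greater_than(buckets, v):
    """Estimated fraction of rows with value > v."""
    at_most_v = 0.0
    for value, cumulative in buckets:  # sorted, so the last match wins
        if value <= v:
            at_most_v = cumulative
    return 1.0 - at_most_v


rows = [1, 1, 1, 2, 3]
histogram = singleton_buckets(rows)  # built once, like UPDATE HISTOGRAM

rows += [3] * 5  # hypothetical later INSERTs; the histogram is not updated
stale_estimate = greater_than(histogram, 2)
true_selectivity = sum(r > 2 for r in rows) / len(rows)
fresh_estimate = greater_than(singleton_buckets(rows), 2)  # after a rebuild

print(f"stale {stale_estimate:.2f}, actual {true_selectivity:.2f}, "
      f"rebuilt {fresh_estimate:.2f}")
```

The stale histogram still predicts 20% for f1 > 2 while 60% of the rows now match; only rebuilding the histogram brings the estimate back in line.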
• Helps if query plan can be changed
• Not a replacement for the index:
• GROUP BY
• ORDER BY
• Query on a single table ∗
Histograms
57
• Data distribution is uniform
• Range optimization can be used
• Full table scan is fast
When Are Histograms Not Helpful?
58
• Index statistics are collected by the engine
• Optimizer calculates Cardinality each time it accesses statistics
• Indexes don’t always improve performance
• Histograms can help
• Still a new feature
• Histograms do not replace other optimizations!
Conclusion
59
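The Cardinality the optimizer sees is recalculated as n_rows / rec_per_key: n_rows follows the live table, while rec_per_key stays at the value InnoDB stored during the last ANALYZE TABLE. A sketch with hypothetical row counts, assuming automatic recalculation is disabled (STATS_AUTO_RECALC = 0):

```python
def optimizer_cardinality(n_rows: int, rec_per_key: float) -> int:
    """Cardinality as recomputed on every statistics access."""
    return round(n_rows / rec_per_key)


# Hypothetical index: 1,000 rows and 2 distinct values at ANALYZE TABLE time.
analyzed_rows, distinct_values = 1_000, 2
rec_per_key = analyzed_rows / distinct_values  # 500.0, kept until next ANALYZE

print(optimizer_cardinality(1_000, rec_per_key))  # 2

# 7,000 more rows arrive, still with only 2 distinct values.
print(optimizer_cardinality(8_000, rec_per_key))  # 16, true value is still 2
```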
MySQL User Reference Manual
Blog by Erik Froseth
Blog by Frederic Descamps
Talk by Oystein Grovlen @Fosdem
Talk by Sergei Petrunia @PerconaLive
WL #8707
More information
60
www.slideshare.net/SvetaSmirnova
twitter.com/svetsmirnova
github.com/svetasmirnova
Thank you!
61

A Billion Goods in a Few Categories: When Optimizer Histograms Help and When They Don’t

  • 1.
    A Billion Goodsin a Few Categories When Optimizer Histograms Help and When They Don’t September 18, 2019 Sveta Smirnova
  • 2.
    •Introduction •The Use Case TheCardinality: Two Levels Example •Why the Difference? •Even Worse Use Case ANALYZE TABLE Limitations Example •How Histograms Work? •Left Overs Table of Contents 2
  • 3.
    The column statisticsdata dictionary table stores histogram statistics about column values, for use by the optimizer in constructing query execution plans MySQL User Reference Manual Optimizer Statistics aka Histograms 3
  • 4.
    • MySQL Supportengineer • Author of • MySQL Troubleshooting • JSON UDF functions • FILTER clause for MySQL • Speaker • Percona Live, OOW, Fosdem, DevConf, HighLoad... Sveta Smirnova 4
  • 5.
  • 6.
    • Hardware • Wiseoptions • Optimized queries • Brain Everything can Be Resolved! 6
  • 7.
    • This talkis about • How I spent the last three years • Resolving the same issue • For different customers Not Everything 7
  • 8.
    • This talkis about • How I spent the last three years • Resolving the same issue • For different customers • Task was to speed up the query Not Everything 7
  • 9.
    • Specific datadistribution Not All the Queries Can be Optimized 8
  • 10.
    • Specific datadistribution • Access on different fields • ON goods.shop id = shop.id • WHERE shop.location IN (...) • GROUP BY goods.category, shop.profile • ORDER BY shop.distance, goods.quantity Not All the Queries Can be Optimized 8
  • 11.
    • Specific datadistribution • Access on different fields • ON goods.shop id = shop.id • WHERE shop.location IN (...) • GROUP BY goods.category, shop.profile • ORDER BY shop.distance, goods.quantity • Index cannot be used effectively Not All the Queries Can be Optimized 8
  • 12.
    • Data distributionvaries • Big difference between number of values Red 1,000,000 Green 2 Blue 100,000 Latest Support Tickets 9
  • 13.
    • Data distributionvaries • Constantly changing Red 100,000 Green 1,000,000 Blue 10 Latest Support Tickets 9
  • 14.
    • Data distributionvaries • Constantly changing Red 1,000 Green 2,000 Blue 50,000 Latest Support Tickets 9
  • 15.
    • Data distributionvaries • Cardinality is not correct • Was not updated in time • Updates too often • Calculated wrongly Latest Support Tickets 9
  • 16.
    • Data distributionvaries • Cardinality is not correct • Index maintenance is expensive • Hardware resources • Slow updates • Window to run CREATE INDEX Latest Support Tickets 9
  • 17.
    • Data distributionvaries • Cardinality is not correct • Index maintenance is expensive • Optimizer does not work as we wish it Examples in my talk @Percona Live Frankfurt Latest Support Tickets 9
  • 18.
    • Topic basedon real Support cases • Couple of them are still in progress Disclaimer 10
  • 19.
    • Topic basedon real Support cases • All examples are 100% fake • They are created so that • No customer can be identified • Everything generated Table names Column names Data • Use case itself is fictional Disclaimer 10
  • 20.
    • Topic basedon real Support cases • All examples are 100% fake • All examples are simplified • Only columns, required to show the issue • Everything extra removed • Real tables usually store much more data Disclaimer 10
  • 21.
    • Topic basedon real Support cases • All examples are 100% fake • All examples are simplified • All disasters happened with version 5.7 Disclaimer 10
  • 22.
  • 23.
    • categories • Less than20 rows Two Tables 12
  • 24.
    • categories • Less than20 rows • goods • More than 1M rows • 20 unique cat id values • Many other fields Price Date: added, last updated, etc. Characteristics Store ... Two Tables 12
  • 25.
    select * from goods join categories on (categories.id=goods.cat_id) where date_added between’2018-07-01’ and ’2018-08-01’ and cat_id in (16,11) and price >= 1000 and <=10000 [ and ... ] [ GROUP BY ... [ORDER BY ... [ LIMIT ...]]] ; JOIN 13
  • 26.
    • Select fromthe small table Option 1: Select from the Small Table First 14
  • 27.
    • Select fromthe small table • For each cat id select from the large table Option 1: Select from the Small Table First 14
  • 28.
    • Select fromthe small table • For each cat id select from the large table • Filter result on date added[ and price[...]] Option 1: Select from the Small Table First 14
  • 29.
    • Select fromthe small table • For each cat id select from the large table • Filter result on date added[ and price[...]] • Slow with many items in the category Option 1: Select from the Small Table First 14
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
    • Filter rowsby date added[ and price[...]] Option 2: Select From the Large Table First 16
  • 39.
    • Filter rowsby date added[ and price[...]] • Get cat id values Option 2: Select From the Large Table First 16
  • 40.
    • Filter rowsby date added[ and price[...]] • Get cat id values • Retrieve rows from the small table Option 2: Select From the Large Table First 16
  • 41.
    • Filter rowsby date added[ and price[...]] • Get cat id values • Retrieve rows from the small table • Slow if number of rows, filtered by date added, is larger than number of goods in the selected categories Option 2: Select From the Large Table First 16
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
    • CREATE INDEX indexeverything (cat id, date added[, price[, ...]]) • It resolves the issue What if We use Combined Indexes? 18
  • 48.
    • CREATE INDEX indexeverything (cat id, date added[, price[, ...]]) • It resolves the issue • But not in all cases What if We use Combined Indexes? 18
  • 49.
    • Maintenance cost • SlowerINSERT/UPDATE/DELETE • Disk space The Problem 19
  • 50.
    • Maintenance cost • SlowerINSERT/UPDATE/DELETE • Disk space • Index not useful for selecting rows JOIN categories ON (categories.id=goods.cat_id) JOIN shops ON (shops.id=goods.shop_id) [ JOIN ... ] WHERE date_added between ’2018-07-01’ and ’2018-08-01’ AND cat_id in (16,11) AND price >= 1000 AND price <=10000 [ AND ... ] GROUP BY product_type ORDER BY date_updated DESC LIMIT 50,100 The Problem 19
  • 51.
    • Maintenance cost • SlowerINSERT/UPDATE/DELETE • Disk space • Index not useful for selecting rows • Tables may have wrong cardinality The Problem 19
  • 52.
    The Use Case TheCardinality: Two Levels
  • 53.
  • 54.
    • Optimizer • Engine • MyRocks •InnoDB • Any MySQL is Layered Architecture 22
  • 55.
    • Number ofunique values in the index • Optimizer uses for the query execution plan Cardinality 23
  • 56.
    • Number ofunique values in the index • Optimizer uses for the query execution plan • Example • ID: 1,2,3,4,5 • Number of rows: 5 • Cardinality: 5 Cardinality 23
  • 57.
    • Number ofunique values in the index • Optimizer uses for the query execution plan • Example • Gender: m,f,f,f,f,m,m,m,m,m,m,f,f,m,f,m,f • Number of rows: 17 • Cardinality: 2 Cardinality 23
  • 58.
    • Stores statisticson disk • mysql.innodb table stats • mysql.innodb index stats InnoDB: Overview 24
  • 59.
    • Stores statisticson disk • Returns statistics to Optimizer InnoDB: Overview 24
  • 60.
    • Stores statisticson disk • Returns statistics to Optimizer • In ha innobase::info • handler/ha innodb.cc InnoDB: Overview 24
  • 61.
    • Stores statisticson disk • Returns statistics to Optimizer • In ha innobase::info • handler/ha innodb.cc • When opens table • flag = HA STATUS CONST • Reads data from disk • Stores it in memory InnoDB: Overview 24
  • 62.
    • Stores statisticson disk • Returns statistics to Optimizer • In ha innobase::info • handler/ha innodb.cc • When opens table • Subsequent table accesses • flag = HA STATUS VARIABLE • Statistics from memory • Up to date Primary Key data InnoDB: Overview 24
  • 63.
    • Table createdwith option STATS AUTO RECALC = 0 • Before ANALYZE TABLE mysql> show index from testG ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Seq_in_index: 1 Column_name: f1 Collation: A Cardinality: 64 ... InnoDB: Flow 25
  • 64.
    • Table createdwith option STATS AUTO RECALC = 0 • After ANALYZE TABLE mysql> show index from testG ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Seq_in_index: 1 Column_name: f1 Collation: A Cardinality: 2 ... InnoDB: Flow 25
  • 65.
    • Table createdwith option STATS AUTO RECALC = 0 • After inserting rows mysql> show index from testG ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Seq_in_index: 1 Column_name: f1 Collation: A Cardinality: 16 ... InnoDB: Flow 25
  • 66.
    • Table createdwith option STATS AUTO RECALC = 0 • After restart mysql> show index from testG ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Seq_in_index: 1 Column_name: f1 Collation: A Cardinality: 2 ... InnoDB: Flow 25
  • 67.
    • Takes datafrom the engine Optimizer: Overview 26
  • 68.
    • Takes datafrom the engine • Class ha statistics • sql/handler.h Optimizer: Overview 26
  • 69.
    • Takes datafrom the engine • Class ha statistics • sql/handler.h • Does not have Cardinality field at all Optimizer: Overview 26
  • 70.
    • Takes datafrom the engine • Class ha statistics • sql/handler.h • Does not have Cardinality field at all • Uses formula to calculate Cardinality Optimizer: Overview 26
  • 71.
    • n rows:number of rows in the table • Naturally up to date • Constantly changing! Optimizer: Formula 27
  • 72.
    • n rows:number of rows in the table • Naturally up to date • Constantly changing! • rec per key: number of duplicates per key • Calculated by InnoDB in time of ANALYZE • rec per key = n rows / unique values • Do not change! Optimizer: Formula 27
  • 73.
    • n rows:number of rows in the table • Naturally up to date • Constantly changing! • rec per key: number of duplicates per key • Calculated by InnoDB in time of ANALYZE • rec per key = n rows / unique values • Do not change! • Cardinality = n rows / rec per key Optimizer: Formula 27
  • 74.
    • Engine storespersistent statistics InnoDB Storage Tables Statistics As Calculated Row Count Only in Memory Persistent Statistics Are Not Persistent 28
  • 75.
    • Engine storespersistent statistics InnoDB Storage Tables Statistics As Calculated Row Count Only in Memory • Optimizer calculates Cardinality every time when accesses engine statistics Persistent Statistics Are Not Persistent 28
  • 76.
    • Engine storespersistent statistics InnoDB Storage Tables Statistics As Calculated Row Count Only in Memory • Optimizer calculates Cardinality every time when accesses engine statistics • Weak user control Persistent Statistics Are Not Persistent 28
  • 77.
  • 78.
    • EXPLAIN withouthistograms mysql> explain select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -- Large range -> order by goods.cat_id -> limit 10G -- We ask for 10 rows only! Example 30
  • 79.
    • EXPLAIN withouthistograms *************************** 1. row *************************** id: 1 select_type: SIMPLE table: categories -- Small table first partitions: NULL type: index possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 20 filtered: 70.00 Extra: Using where; Using index; Using temporary; Using filesort Example 30
  • 80.
    • EXPLAIN withouthistograms *************************** 2. row *************************** id: 1 select_type: SIMPLE table: goods -- Large table partitions: NULL type: ref possible_keys: cat_id_2 key: cat_id_2 key_len: 5 ref: orig.categories.id rows: 51827 filtered: 11.11 -- Default value Extra: Using where 2 rows in set, 1 warning (0.01 sec) Example 30
  • 81.
    • Execution timewithout histograms mysql> flush status; Query OK, 0 rows affected (0.00 sec) mysql> select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -> order by goods.cat_id -> limit 10; ab9f9bb7bc4f357712ec34f067eda364 - 10 rows in set (56.47 sec) Example 30
  • 82.
    • Engine statisticswithout histograms mysql> show status like ’Handler%’; +----------------------------+--------+ | Variable_name | Value | +----------------------------+--------+ ... | Handler_read_next | 964718 | | Handler_read_prev | 0 | | Handler_read_rnd | 10 | | Handler_read_rnd_next | 951671 | ... | Handler_write | 951670 | +----------------------------+--------+ 18 rows in set (0.01 sec) Example 30
  • 83.
    • Now letadd the histogram mysql> analyze table goods update histogram on date_added; +------------+-----------+----------+------------------------------+ | Table | Op | Msg_type | Msg_text | +------------+-----------+----------+------------------------------+ | orig.goods | histogram | status | Histogram statistics created for column ’date_added’. | +------------+-----------+----------+------------------------------+ 1 row in set (2.01 sec) Example 30
  • 84.
    • EXPLAIN withthe histogram mysql> explain select goods.* from goods -> join categories -> on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -> order by goods.cat_id -> limit 10G Example 30
  • 85.
    • EXPLAIN withthe histogram *************************** 1. row *************************** id: 1 select_type: SIMPLE table: goods -- Large table first partitions: NULL type: index possible_keys: cat_id_2 key: cat_id_2 key_len: 5 ref: NULL rows: 10 -- Same as we asked filtered: 98.70 -- True numbers Extra: Using where Example 30
  • 86.
    • EXPLAIN withthe histogram *************************** 2. row *************************** id: 1 select_type: SIMPLE table: categories -- Small table partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: orig.goods.cat_id rows: 1 filtered: 100.00 Extra: Using index 2 rows in set, 1 warning (0.01 sec) Example 30
  • 87.
    • Execution timewith the histogram mysql> flush status; Query OK, 0 rows affected (0.00 sec) mysql> select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -> order by goods.cat_id -> limit 10; eeb005fae0dd3441c5c380e1d87fee84 - 10 rows in set (0.00 sec) -- 56/0 times faster! Example 30
  • 88.
    • Engine statisticswith the histogram mysql> show status like ’Handler%’; +----------------------------+-------++----------------------------+-------+ | Variable_name | Value || Variable_name | Value | +----------------------------+-------++----------------------------+-------+ | Handler_commit | 1 || Handler_read_prev | 0 | | Handler_delete | 0 || Handler_read_rnd | 0 | | Handler_discover | 0 || Handler_read_rnd_next | 0 | | Handler_external_lock | 4 || Handler_rollback | 0 | | Handler_mrr_init | 0 || Handler_savepoint | 0 | | Handler_prepare | 0 || Handler_savepoint_rollback | 0 | | Handler_read_first | 1 || Handler_update | 0 | | Handler_read_key | 3 || Handler_write | 0 | | Handler_read_last | 0 |+----------------------------+-------+ | Handler_read_next | 9 |18 rows in set (0.00 sec) Example 30
  • 89.
  • 90.
    1 2 34 5 6 7 8 9 10 0 200 400 600 800 Indexes: Number of Items with Same Value 32
  • 91.
    1 2 34 5 6 7 8 9 10 0 200 400 600 800 Indexes: Cardinality 33
  • 92.
    1 2 34 5 6 7 8 9 10 0 200 400 600 800 Histograms: Number of Values in Each Bucket 34
  • 93.
    1 2 34 5 6 7 8 9 10 0 0.2 0.4 0.6 0.8 1 Histograms: Data in the Histogram 35
  • 94.
  • 95.
    Even Worse UseCase ANALYZE TABLE Limitations
  • 96.
    • ANALYZE TABLEoften • Use large number of STATS SAMPLE PAGES Solutions in 5.7- 38
  • 97.
    • Counts numberof pages in the table How ANALYZE TABLE Works with InnoDB? 39
  • 98.
    • Counts numberof pages in the table • Takes STATS SAMPLE PAGES How ANALYZE TABLE Works with InnoDB? 39
  • 99.
    • Counts numberof pages in the table • Takes STATS SAMPLE PAGES • Counts number of unique values in secondary index in these pages How ANALYZE TABLE Works with InnoDB? 39
  • 100.
    • Counts numberof pages in the table • Takes STATS SAMPLE PAGES • Counts number of unique values in secondary index in these pages • Divides number of pages in the table on number of sample pages and multiplies result by number of unique values How ANALYZE TABLE Works with InnoDB? 39
  • 101.
    • Number ofpages in the table: 20,000 • STATS SAMPLE PAGES: 20 (default) • Unique values in the secondary index: • In sample pages: 10 • In the table: 11 Example 40
  • 102.
    • Number ofpages in the table: 20,000 • STATS SAMPLE PAGES: 20 (default) • Unique values in the secondary index: • In sample pages: 10 • In the table: 11 • Cardinality: 20,000 * 10 / 20 = 10,000 Example 40
  • 103.
    • Number ofpages in the table: 20,000 • STATS SAMPLE PAGES: 5,000 • Unique values in the secondary index: • In sample pages: 10 • In the table: 11 • Cardinality: 20,000 * 10 / 5,000 = 40 Example 2 41
  • 104.
    • Time consuming mysql>select count(*) from goods; +----------+ | count(*) | +----------+ | 80303000 | +----------+ 1 row in set (35.95 sec) Use Larger STATS SAMPLE PAGES? 42
  • 105.
    • Time consuming •With default STATS SAMPLE PAGES mysql> analyze table goods; +------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +------------+---------+----------+----------+ | test.goods | analyze | status | OK | +------------+---------+----------+----------+ 1 row in set (0.32 sec) Use Larger STATS SAMPLE PAGES? 42
  • 106.
    • Time consuming •With bigger number mysql> alter table goods STATS_SAMPLE_PAGES=5000; Query OK, 0 rows affected (0.04 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> analyze table goods; +------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +------------+---------+----------+----------+ | test.goods | analyze | status | OK | +------------+---------+----------+----------+ 1 row in set (27.13 sec) Use Larger STATS SAMPLE PAGES? 42
  • 107.
    • Time consuming •With bigger number • 27.13/0.32 = 85 times slower! Use Larger STATS SAMPLE PAGES? 42
  • 108.
    • Time consuming •With bigger number • 27.13/0.32 = 85 times slower! • Not always a solution Use Larger STATS SAMPLE PAGES? 42
  • 109.
    Even Worse UseCase Example
  • 110.
    • goods characteristics CREATE TABLE‘goods_characteristics‘ ( ‘id‘ int(11) NOT NULL AUTO_INCREMENT, ‘good_id‘ varchar(30) DEFAULT NULL, ‘size‘ int(11) DEFAULT NULL, ‘manufacturer‘ varchar(30) DEFAULT NULL, PRIMARY KEY (‘id‘), KEY ‘good_id‘ (‘good_id‘,‘size‘,‘manufacturer‘), KEY ‘size‘ (‘size‘,‘manufacturer‘) ) ENGINE=InnoDB AUTO_INCREMENT=196606 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci Two Similar Tables 44
  • 111.
    • goods shops CREATE TABLE‘goods_shops‘ ( ‘id‘ int(11) NOT NULL AUTO_INCREMENT, ‘good_id‘ varchar(30) DEFAULT NULL, ‘location‘ varchar(30) DEFAULT NULL, ‘delivery_options‘ varchar(30) DEFAULT NULL, PRIMARY KEY (‘id‘), KEY ‘good_id‘ (‘good_id‘,‘location‘,‘delivery_options‘), KEY ‘location‘ (‘location‘,‘delivery_options‘) ) ENGINE=InnoDB AUTO_INCREMENT=131071 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci Two Similar Tables 44
  • 112.
    • Size mysql> selectcount(*) from goods_characteristics; +----------+ | count(*) | +----------+ | 131072 | +----------+ 1 row in set (0.08 sec) mysql> select count(*) from goods_shops; +----------+ | count(*) | +----------+ | 65536 | +----------+ 1 row in set (0.04 sec) Two Similar Tables 44
  • 113.
    • Data Distribution:goods characteristics mysql> select count(*) num_rows, good_id, size -> from goods_characteristics group by good_id, size; +----------+---------+------+ | num_rows | good_id | size | +----------+---------+------+ | 65536 | laptop | 7 | | 8189 | laptop | 13 | | 8187 | laptop | 8 | | 8191 | laptop | 14 | | 8190 | laptop | 9 | | 8190 | laptop | 15 | | 8188 | laptop | 10 | | 10 | laptop | 16 | | 8192 | laptop | 11 | | 10 | laptop | 17 | | 8189 | laptop | 12 | +----------+---------+------+ Two Similar Tables 44
  • 114.
    • Data Distribution:goods characteristics mysql> select count(*) num_rows, good_id, manufacturer -> from goods_characteristics group by good_id, manufacturer order by num_rows desc; +----------+---------+--------------+ | num_rows | good_id | manufacturer | +----------+---------+--------------+ | 65536 | laptop | Noname | | 8189 | laptop | Toshiba | | 8191 | laptop | Samsung | | 8189 | laptop | Apple | | 8191 | laptop | Acer | | 8189 | laptop | Asus | | 8189 | laptop | Dell | | 10 | laptop | Sony | | 8189 | laptop | HP | | 10 | laptop | Casper | | 8189 | laptop | Lenovo | +----------+---------+--------------+ Two Similar Tables 44
  • 115.
    • Data Distribution:goods shops mysql> select count(*) num_rows, good_id, location -> from goods_shops group by good_id, location order by num_rows desc; +----------+---------+---------------+ | num_rows | good_id | location | +----------+---------+---------------+ | 8191 | laptop | New York | | 8189 | laptop | Tokio | | 8191 | laptop | San Francisco | | 8189 | laptop | Istanbul | | 8189 | laptop | Paris | | 8189 | laptop | London | | 8189 | laptop | Berlin | | 10 | laptop | Moscow | | 8189 | laptop | Brussels | | 10 | laptop | Kiev | +----------+---------+---------------+ Two Similar Tables 44
  • 116.
    • Data Distribution:goods shops mysql> select count(*) num_rows, good_id, delivery_options -> from goods_shops group by good_id, delivery_options order by num_rows desc; +----------+---------+------------------+ | num_rows | good_id | delivery_options | +----------+---------+------------------+ | 8192 | laptop | DHL | | 8189 | laptop | Gruzovichkof | | 8191 | laptop | PTT | | 8188 | laptop | Courier | | 8190 | laptop | Normal Post | | 8187 | laptop | No delivery | | 8190 | laptop | Tracked | | 10 | laptop | Premium | | 8189 | laptop | Fedex | | 10 | laptop | Urgent | +----------+---------+------------------+ Two Similar Tables 44
  • 117.
    Histogram statistics areuseful primarily for nonindexed columns. Adding an index to a column for which histogram statistics are applicable might also help the optimizer make row estimates. The tradeoffs are: An index must be updated when table data is modified. A histogram is created or updated only on demand, so it adds no overhead when table data is modified. On the other hand, the statistics become progres- sively more out of date when table modifications occur, until the next time they are updated. MySQL User Reference Manual Optimizer Statistics aka Histograms 45
  • 118.
    mysql> alter tablegoods_characteristics stats_sample_pages=5000; Query OK, 0 rows affected (0.02 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> alter table goods_shops stats_sample_pages=5000; Query OK, 0 rows affected (0.05 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> analyze table goods_characteristics, goods_shops; +----------------------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +----------------------------+---------+----------+----------+ | test.goods_characteristics | analyze | status | OK | | test.goods_shops | analyze | status | OK | +----------------------------+---------+----------+----------+ 2 rows in set (0.35 sec) Index Statistics is More than Good 46
  • 119.
    • The query mysql>select count(*) from goods_shops join goods_characteristics -> using (good_id) -> where size < 12 and -> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or -> delivery_options in (’Premium’, ’Urgent’)); ^C^C -- query aborted ERROR 1317 (70100): Query execution was interrupted Performance 47
  • 120.
    • Handlers mysql> showstatus like ’Handler%’; +----------------------------+-------------+ | Variable_name | Value | +----------------------------+-------------+ | Handler_commit | 0 | | Handler_delete | 0 | | Handler_discover | 0 | | Handler_external_lock | 4 | | Handler_mrr_init | 0 | | Handler_prepare | 0 | | Handler_read_first | 1 | | Handler_read_key | 13043 | | Handler_read_last | 0 | | Handler_read_next | 854,767,916 | ... Performance 47
  • 121.
    • Table order mysql>explain select count(*) from goods_shops join goods_characteristics -> using (good_id) where size < 12 and -> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or -> delivery_options in (’Premium’, ’Urgent’)); +----+-----------------------+-------+---------+--------+----------+---------------+ | id | table | type | key | rows | filtered | Extra | +----+-----------------------+-------+---------+--------+----------+---------------+ | 1 | goods_characteristics | index | good_id | 131072 | 25.00 | Using... | | 1 | goods_shops | ref | good_id | 65536 | 36.00 | Using... | +----+-----------------------+-------+---------+--------+----------+---------------+ 2 rows in set, 1 warning (0.00 sec) Performance 47
  • 122.
    • Table ordermatters mysql> explain select count(*) from goods_shops straight_join goods_characteristics -> using (good_id) where size < 12 and -> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or -> delivery_options in (’Premium’, ’Urgent’)); +----+-----------------------+-------+---------+--------+----------+---------------+ | id | table | type | key | rows | filtered | Extra | +----+-----------------------+-------+---------+--------+----------+---------------+ | 1 | goods_shops | index | good_id | 65536 | 36.00 | Using... | | 1 | goods_characteristics | ref | good_id | 131072 | 25.00 | Using... | +----+-----------------------+-------+---------+--------+----------+---------------+ 2 rows in set, 1 warning (0.00 sec) Performance 47
  • 123.
    • Table ordermatters mysql> select count(*) from goods_shops straight_join goods_characteristics -> using (good_id) -> where size < 12 and -> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or -> delivery_options in (’Premium’, ’Urgent’)); +----------+ | count(*) | +----------+ | 816640 | +----------+ 1 row in set (2.11 sec) Performance 47
  • 124.
    • Table ordermatters mysql> show status like ’Handler_read_next’; +-------------------+-----------+ | Variable_name | Value | +-------------------+-----------+ | Handler_read_next | 5,308,416 | +-------------------+-----------+ 1 row in set (0.00 sec) Performance 47
  • 125.
    • Not forall data mysql> select count(*) from goods_shops straight_join goods_characteristics -> using (good_id) -> where (size > 15 or manufacturer in (’Sony’, ’Casper’)) -> and location in -> (’New York’, ’San Francisco’, ’Paris’, ’Berlin’, ’Brussels’, ’London’) -> and delivery_options in -> (’DHL’,’Normal Post’, ’Tracked’, ’Fedex’, ’No delivery’); ^C^C -- query aborted ERROR 1317 (70100): Query execution was interrupted Performance 47
  • 126.
    • Not forall data mysql> show status like ’Handler%’; +----------------------------+------------+ | Variable_name | Value | +----------------------------+------------+ | Handler_commit | 10 | | Handler_delete | 0 | | Handler_discover | 0 | | Handler_external_lock | 28 | | Handler_mrr_init | 0 | | Handler_prepare | 0 | | Handler_read_first | 1 | | Handler_read_key | 143 | | Handler_read_last | 0 | | Handler_read_next | 16,950,265 | Performance 47
  • 127.
    mysql> analyze tablegoods_shops update histogram -> on location, delivery_options; +-------------+-----------+----------+--------------------------------+ | Table | Op | Msg_type | Msg_text | +-------------+-----------+----------+--------------------------------+ | goods_shops | histogram | status | Histogram statistics created for column ’delivery_options’. | | goods_shops | histogram | status | Histogram statistics created for column ’location’. | +-------------+-----------+----------+--------------------------------+ 2 rows in set (0.18 sec) Histograms to The Rescue 48
  • 128.
    mysql> analyze tablegoods_characteristics update histogram -> on size, manufacturer ; +-----------------------+-----------+----------+------------------------------+ | Table | Op | Msg_type | Msg_text | +-----------------------+-----------+----------+------------------------------+ | goods_characteristics | histogram | status | Histogram statistics created for column ’manufacturer’. | | goods_characteristics | histogram | status | Histogram statistics created for column ’size’. | +-----------------------+-----------+----------+------------------------------+ 2 rows in set (0.23 sec) Histograms to The Rescue 48
  • 129.
    • The query
      mysql> select count(*) from goods_shops join goods_characteristics
          -> using (good_id)
          -> where size < 12 and
          -> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')
          -> and (location in ('Moscow', 'Kiev') or
          -> delivery_options in ('Premium', 'Urgent'));
      +----------+
      | count(*) |
      +----------+
      |   816640 |
      +----------+
      1 row in set (2.16 sec)
      Histograms to The Rescue 48
  • 130.
    • The query
      mysql> show status like 'Handler_read_next';
      +-------------------+-----------+
      | Variable_name     | Value     |
      +-------------------+-----------+
      | Handler_read_next | 5,308,418 |
      +-------------------+-----------+
      1 row in set (0.00 sec)
      Histograms to The Rescue 48
  • 131.
    • Filtering effect
      mysql> explain select count(*) from goods_shops join goods_characteristics
          -> using (good_id)
          -> where size < 12 and
          -> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')
          -> and (location in ('Moscow', 'Kiev') or
          -> delivery_options in ('Premium', 'Urgent'));
      +----+-----------------------+-------+---------+--------+----------+----------+
      | id | table                 | type  | key     | rows   | filtered | Extra    |
      +----+-----------------------+-------+---------+--------+----------+----------+
      |  1 | goods_shops           | index | good_id |  65536 |     0.06 | Using... |
      |  1 | goods_characteristics | ref   | good_id | 131072 |    15.63 | Using... |
      +----+-----------------------+-------+---------+--------+----------+----------+
      2 rows in set, 1 warning (0.00 sec)
      Histograms to The Rescue 48
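The `filtered` column is a percentage. Per the MySQL manual, rows × filtered is the number of rows joined with the following table. Multiplying these per-table figures the way a nested-loop join fans out shows that the plan's estimate lands close to the 816640 rows the query actually counted. This is a minimal Python sketch: the helper name `estimated_rows` is ours, not MySQL's, and EXPLAIN rounds `filtered` to two decimals, so the result is approximate.

```python
def estimated_rows(plan):
    """Hypothetical helper: plan is a list of (rows, filtered_percent) in join order."""
    estimate = 1.0
    for rows, filtered in plan:
        # Each table contributes rows * filtered/100 rows per row of the join prefix.
        estimate *= rows * filtered / 100.0
    return estimate


# Values copied from the EXPLAIN output above.
plan = [
    (65536, 0.06),    # goods_shops, index scan
    (131072, 15.63),  # goods_characteristics, ref access on good_id
]

estimate = estimated_rows(plan)
actual = 816640  # count(*) returned by the query

print(f"estimated: {estimate:,.0f}")
print(f"actual:    {actual:,}")
print(f"relative error: {abs(estimate - actual) / actual:.1%}")
```

The point is not the exact number but that, with histograms on the filtered columns, the optimizer's row estimate for the join is in the right order of magnitude.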
  • 137.
    ↓ sql/sql_planner.cc
      ↓ calculate_condition_filter
        ↓ Item_func_*::get_filtering_effect
          • get_histogram_selectivity
          • Seen as a percent of filtered rows in EXPLAIN
    Low Level 50
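The call chain above produces an estimate per predicate. For a compound WHERE clause like the one in the goods queries, those estimates still have to be combined. The minimal sketch below assumes the predicates are independent: AND multiplies selectivities, and OR uses inclusion-exclusion. This matches our understanding of how the `get_filtering_effect` implementations for AND/OR conditions behave. The classes and the selectivity numbers are our own hypothetical illustration, not MySQL code or measured values.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Pred:
    """A single predicate with an already-estimated selectivity (0..1)."""
    selectivity: float


@dataclass
class And:
    left: "Expr"
    right: "Expr"


@dataclass
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Pred, And, Or]


def filtering_effect(expr: Expr) -> float:
    """Combine selectivities, assuming the predicates are independent."""
    if isinstance(expr, Pred):
        return expr.selectivity
    a = filtering_effect(expr.left)
    b = filtering_effect(expr.right)
    if isinstance(expr, And):
        return a * b          # both conditions must hold
    return a + b - a * b      # either condition may hold (inclusion-exclusion)


# Hypothetical selectivities for the goods_shops part of the query:
# location IN ('Moscow', 'Kiev') OR delivery_options IN ('Premium', 'Urgent')
location = Pred(0.02)
delivery = Pred(0.01)
print(f"filtered: {filtering_effect(Or(location, delivery)) * 100:.2f}%")
```

The independence assumption is the weak spot: if location and delivery options are correlated in the real data, the combined estimate can be off even when each per-column histogram is accurate.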
  • 138.
    • Example data
      mysql> create table example(f1 int) engine=innodb;
      mysql> insert into example values(1),(1),(1),(2),(3);
      mysql> select f1, count(f1) from example group by f1;
      +------+-----------+
      | f1   | count(f1) |
      +------+-----------+
      |    1 |         3 |
      |    2 |         1 |
      |    3 |         1 |
      +------+-----------+
      3 rows in set (0.00 sec)
      Filtered Rows 51
  • 139.
    • Without a histogram
      mysql> explain select * from example where f1 > 0\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: example
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 5
           filtered: 33.33
              Extra: Using where
      1 row in set, 1 warning (0.00 sec)
      Filtered Rows 51
  • 140.
    • Without a histogram
      mysql> explain select * from example where f1 > 1\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: example
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 5
           filtered: 33.33
              Extra: Using where
      1 row in set, 1 warning (0.00 sec)
      Filtered Rows 51
  • 141.
    • Without a histogram
      mysql> explain select * from example where f1 > 2\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: example
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 5
           filtered: 33.33
              Extra: Using where
      1 row in set, 1 warning (0.00 sec)
      Filtered Rows 51
  • 142.
    • Without a histogram
      mysql> explain select * from example where f1 > 3\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: example
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 5
           filtered: 33.33
              Extra: Using where
      1 row in set, 1 warning (0.00 sec)
      Filtered Rows 51
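All four EXPLAIN outputs without a histogram report the same 33.33, whatever the constant. Lacking any information about how f1 values are distributed, the optimizer falls back to a fixed guess. The 33.33 is consistent with a one-third default for inequality predicates (to our understanding, the `COND_FILTER_INEQUALITY` constant in the MySQL source). This short Python sketch compares that guess with the real fraction of matching rows in the five-row example table. The function name is our own.

```python
# Data from the example table: f1 values 1, 1, 1, 2, 3.
f1 = [1, 1, 1, 2, 3]

# Fixed guess consistent with the 33.33 that EXPLAIN shows for every constant.
DEFAULT_INEQUALITY = 1 / 3


def actual_filtered(values, constant):
    """Percentage of rows that really satisfy `f1 > constant`."""
    matching = sum(1 for v in values if v > constant)
    return 100.0 * matching / len(values)


for constant in range(4):
    print(f"f1 > {constant}: guessed {DEFAULT_INEQUALITY * 100:.2f}%, "
          f"actual {actual_filtered(f1, constant):.2f}%")
```

The guess is off in every case: it underestimates `f1 > 0` by a factor of three and overestimates `f1 > 3`, which matches nothing at all.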
  • 143.
    • With the histogram
      mysql> analyze table example update histogram on f1 with 3 buckets;
      +-----------------+-----------+----------+-----------------------------------------------+
      | Table           | Op        | Msg_type | Msg_text                                      |
      +-----------------+-----------+----------+-----------------------------------------------+
      | hist_ex.example | histogram | status   | Histogram statistics created for column 'f1'. |
      +-----------------+-----------+----------+-----------------------------------------------+
      1 row in set (0.03 sec)
      Filtered Rows 51
  • 144.
    • With the histogram
      mysql> select * from information_schema.column_statistics
          -> where table_name='example'\G
      *************************** 1. row ***************************
      SCHEMA_NAME: hist_ex
       TABLE_NAME: example
      COLUMN_NAME: f1
        HISTOGRAM: {"buckets": [[1, 0.6], [2, 0.8], [3, 1.0]], "data-type": "int",
                    "null-values": 0.0, "collation-id": 8,
                    "last-updated": "2018-11-07 09:07:19.791470", "sampling-rate": 1.0,
                    "histogram-type": "singleton", "number-of-buckets-specified": 3}
      1 row in set (0.00 sec)
      Filtered Rows 51
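In the singleton histogram above, each bucket holds one value and the cumulative fraction of rows whose value is less than or equal to it: three of five rows are 1 (0.6), four of five are at most 2 (0.8), and all five are at most 3 (1.0). As we read the MySQL manual, the server builds a singleton histogram when the number of distinct values does not exceed the requested bucket count, and an equi-height histogram otherwise. This minimal Python sketch models only that decision and the singleton case. It assumes a column without NULLs, and the function names are hypothetical.

```python
from collections import Counter


def histogram_type(values, buckets):
    """Singleton if every distinct value fits in its own bucket, else equi-height."""
    return "singleton" if len(set(values)) <= buckets else "equi-height"


def singleton_buckets(values):
    """[[value, cumulative_frequency], ...] for a column without NULLs."""
    counts = Counter(values)
    total = len(values)
    buckets, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        # Divide the running count instead of summing fractions to avoid float drift.
        buckets.append([value, running / total])
    return buckets


f1 = [1, 1, 1, 2, 3]
print(histogram_type(f1, 3))   # prints: singleton
print(singleton_buckets(f1))   # prints: [[1, 0.6], [2, 0.8], [3, 1.0]]
```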
  • 145.
    • With the histogram
      mysql> explain select * from example where f1 > 0\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: example
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 5
           filtered: 100.00 -- all rows
              Extra: Using where
      1 row in set, 1 warning (0.00 sec)
      Filtered Rows 51
  • 146.
    • With the histogram
      mysql> explain select * from example where f1 > 1\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: example
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 5
           filtered: 40.00 -- 2 rows
              Extra: Using where
      1 row in set, 1 warning (0.00 sec)
      Filtered Rows 51
  • 147.
    • With the histogram
      mysql> explain select * from example where f1 > 2\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: example
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 5
           filtered: 20.00 -- one row
              Extra: Using where
      1 row in set, 1 warning (0.00 sec)
      Filtered Rows 51
  • 148.
    • With the histogram
      mysql> explain select * from example where f1 > 3\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: example
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 5
           filtered: 20.00 -- one row
              Extra: Using where
      1 row in set, 1 warning (0.00 sec)
      Filtered Rows 51
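The EXPLAIN outputs with the histogram can be reproduced from the buckets stored in `column_statistics`. The fraction of rows with `f1 > c` is the last cumulative frequency minus the cumulative frequency of the largest bucket value not above `c`. This is a minimal Python sketch under our own assumptions: the column has no NULLs, and the function name is hypothetical. It yields 100, 40 and 20 percent for constants 0, 1 and 2, as EXPLAIN does. For `f1 > 3` it yields 0, whereas EXPLAIN above still reports 20.00. The optimizer evidently does not pass a zero estimate through unchanged, and this sketch does not model whatever adjustment it applies.

```python
from bisect import bisect_right

# Buckets copied from information_schema.column_statistics for example.f1.
BUCKETS = [[1, 0.6], [2, 0.8], [3, 1.0]]


def greater_than_selectivity(buckets, constant):
    """Fraction of rows with value > constant, for a singleton histogram.

    Assumes the column has no NULLs, so the last cumulative frequency is 1.0.
    """
    values = [value for value, _ in buckets]
    # Index of the first bucket whose value is strictly greater than constant.
    i = bisect_right(values, constant)
    at_or_below = buckets[i - 1][1] if i > 0 else 0.0
    return buckets[-1][1] - at_or_below


for constant in range(4):
    pct = round(greater_than_selectivity(BUCKETS, constant) * 100, 2)
    print(f"f1 > {constant}: {pct:.2f}%")
```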
  • 153.
    • CREATE INDEX
      • Metadata lock
      • Can be blocked by any query
    • UPDATE HISTOGRAM
      • Backup lock
      • Can be blocked only by a backup
      • Can be created any time without fear
    Maintenance: Locking 55
  • 155.
    • CREATE INDEX
      • Locks writes
      • Locks reads ∗
      • Every DML updates the index
    • UPDATE HISTOGRAM
      • Uses up to histogram_generation_max_mem_size
      • Persistent after creation
      • DML does not touch it
    ∗ PS-2503, before Percona Server 5.6.38-83.0/5.7.20-18; Upstream
    Maintenance: Load 56
  • 156.
    • Helps if the query plan can be changed
    • Not a replacement for the index:
      • GROUP BY
      • ORDER BY
      • Query on a single table ∗
    Histograms 57
  • 157.
    • Data distribution is uniform
    • Range optimization can be used
    • Full table scan is fast
    When Are Histograms Not Helpful? 58
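A uniform distribution is the case where per-value estimates derived only from the number of distinct values (the kind of figure index cardinality provides) are already accurate, so a histogram has nothing to add. This small Python sketch measures how far such an averaged estimate can be from the real per-value count. It uses the Red/Green/Blue counts from the "Latest Support Tickets" example earlier in the talk, plus a hypothetical uniform table for contrast. The function names are our own.

```python
def average_estimate(counts):
    """Rows per value if only the number of distinct values is known."""
    return sum(counts.values()) / len(counts)


def worst_misestimate(counts):
    """Largest ratio between the averaged estimate and a real per-value count."""
    avg = average_estimate(counts)
    return max(max(avg / n, n / avg) for n in counts.values())


uniform = {"Red": 100_000, "Green": 100_000, "Blue": 100_000}  # hypothetical
skewed = {"Red": 1_000_000, "Green": 2, "Blue": 100_000}       # from the support tickets

for name, counts in (("uniform", uniform), ("skewed", skewed)):
    print(f"{name}: average {average_estimate(counts):,.0f} rows per value, "
          f"worst misestimate x{worst_misestimate(counts):,.0f}")
```

For the uniform table every value matches the average exactly. For the skewed one, the estimate for Green is off by more than five orders of magnitude, which is where a histogram pays off.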
  • 158.
    • Index statistics are collected by the engine
    • The optimizer calculates cardinality each time it accesses statistics
    • Indexes don't always improve performance
    • Histograms can help
      • Still a new feature
    • Histograms do not replace other optimizations!
    Conclusion 59
  • 159.
    • MySQL User Reference Manual
    • Blog by Erik Froseth
    • Blog by Frederic Descamps
    • Talk by Oystein Grovlen @Fosdem
    • Talk by Sergei Petrunia @PerconaLive
    • WL #8707
    More information 60
  • 160.