SlideShare a Scribd company logo
A Billion Goods in a Few Categories
When Optimizer Histograms Help and When They Don’t
September 18, 2019
Sveta Smirnova
•Introduction
•The Use Case
The Cardinality: Two Levels
Example
•Why the Difference?
•Even Worse Use Case
ANALYZE TABLE Limitations
Example
•How Histograms Work?
•Left Overs
Table of Contents
2
The column statistics data dictionary table stores histogram statistics about
column values, for use by the optimizer in constructing query execution plans
MySQL User Reference Manual
Optimizer Statistics aka Histograms
3
• MySQL Support engineer
• Author of
• MySQL Troubleshooting
• JSON UDF functions
• FILTER clause for MySQL
• Speaker
• Percona Live, OOW, Fosdem,
DevConf, HighLoad...
Sveta Smirnova
4
Introduction
• Hardware
• Wise options
• Optimized queries
• Brain
Everything can Be Resolved!
6
• This talk is about
•
How I spent the last three years
• Resolving the same issue
• For different customers
Not Everything
7
• This talk is about
•
How I spent the last three years
• Resolving the same issue
• For different customers
•
Task was to speed up the query
Not Everything
7
• Specific data distribution
Not All the Queries Can be Optimized
8
• Specific data distribution
• Access on different fields
•
ON goods.shop id = shop.id
• WHERE shop.location IN (...)
• GROUP BY goods.category, shop.profile
• ORDER BY shop.distance, goods.quantity
Not All the Queries Can be Optimized
8
• Specific data distribution
• Access on different fields
•
ON goods.shop id = shop.id
• WHERE shop.location IN (...)
• GROUP BY goods.category, shop.profile
• ORDER BY shop.distance, goods.quantity
• Index cannot be used effectively
Not All the Queries Can be Optimized
8
• Data distribution varies
•
Big difference between number of values
Red 1,000,000
Green 2
Blue 100,000
Latest Support Tickets
9
• Data distribution varies
•
Constantly changing
Red 100,000
Green 1,000,000
Blue 10
Latest Support Tickets
9
• Data distribution varies
•
Constantly changing
Red 1,000
Green 2,000
Blue 50,000
Latest Support Tickets
9
• Data distribution varies
• Cardinality is not correct
• Was not updated in time
•
Updates too often
• Calculated wrongly
Latest Support Tickets
9
• Data distribution varies
• Cardinality is not correct
• Index maintenance is expensive
• Hardware resources
•
Slow updates
• Window to run CREATE INDEX
Latest Support Tickets
9
• Data distribution varies
• Cardinality is not correct
• Index maintenance is expensive
•
Optimizer does not work as we wish it
Examples in my talk @Percona Live Frankfurt
Latest Support Tickets
9
• Topic based on real Support cases
•
Couple of them are still in progress
Disclaimer
10
• Topic based on real Support cases
• All examples are 100% fake
•
They are created so that
• No customer can be identified
• Everything generated
Table names
Column names
Data
• Use case itself is fictional
Disclaimer
10
• Topic based on real Support cases
• All examples are 100% fake
• All examples are simplified
• Only columns, required to show the issue
•
Everything extra removed
• Real tables usually store much more data
Disclaimer
10
• Topic based on real Support cases
• All examples are 100% fake
• All examples are simplified
• All disasters happened with version 5.7
Disclaimer
10
The Use Case
•
categories
• Less than 20 rows
Two Tables
12
•
categories
• Less than 20 rows
• goods
• More than 1M rows
• 20 unique cat id values
• Many other fields
Price
Date: added, last updated, etc.
Characteristics
Store
...
Two Tables
12
select *
from
goods
join
categories
on
(categories.id=goods.cat_id)
where
date_added between ’2018-07-01’ and ’2018-08-01’
and
cat_id in (16,11)
and
price >= 1000 and <=10000 [ and ... ]
[ GROUP BY ... [ORDER BY ... [ LIMIT ...]]]
;
JOIN
13
• Select from the small table
Option 1: Select from the Small Table First
14
• Select from the small table
• For each cat id select from the large table
Option 1: Select from the Small Table First
14
• Select from the small table
• For each cat id select from the large table
• Filter result on date added[ and price[...]]
Option 1: Select from the Small Table First
14
• Select from the small table
• For each cat id select from the large table
• Filter result on date added[ and price[...]]
• Slow with many items in the category
Option 1: Select from the Small Table First
14
Option 1: Illustration
15
Option 1: Illustration
15
Option 1: Illustration
15
Option 1: Illustration
15
Option 1: Illustration
15
Option 1: Illustration
15
Option 1: Illustration
15
Option 1: Illustration
15
• Filter rows by date added[ and price[...]]
Option 2: Select From the Large Table First
16
• Filter rows by date added[ and price[...]]
• Get cat id values
Option 2: Select From the Large Table First
16
• Filter rows by date added[ and price[...]]
• Get cat id values
• Retrieve rows from the small table
Option 2: Select From the Large Table First
16
• Filter rows by date added[ and price[...]]
• Get cat id values
• Retrieve rows from the small table
• Slow if number of rows, filtered by
date added, is larger than number of goods in
the selected categories
Option 2: Select From the Large Table First
16
Option 2: Illustration
17
Option 2: Illustration
17
Option 2: Illustration
17
Option 2: Illustration
17
Option 2: Illustration
17
•
CREATE INDEX index everything
(cat id, date added[, price[, ...]])
• It resolves the issue
What if We use Combined Indexes?
18
•
CREATE INDEX index everything
(cat id, date added[, price[, ...]])
• It resolves the issue
• But not in all cases
What if We use Combined Indexes?
18
• Maintenance cost
•
Slower INSERT/UPDATE/DELETE
• Disk space
The Problem
19
• Maintenance cost
•
Slower INSERT/UPDATE/DELETE
• Disk space
• Index not useful for selecting rows
JOIN categories ON (categories.id=goods.cat_id)
JOIN shops ON (shops.id=goods.shop_id)
[ JOIN ... ]
WHERE
date_added between ’2018-07-01’ and ’2018-08-01’
AND
cat_id in (16,11) AND price >= 1000 AND price <=10000 [ AND ... ]
GROUP BY product_type
ORDER BY date_updated DESC
LIMIT 50,100
The Problem
19
• Maintenance cost
•
Slower INSERT/UPDATE/DELETE
• Disk space
• Index not useful for selecting rows
• Tables may have wrong cardinality
The Problem
19
The Use Case
The Cardinality: Two Levels
The Query
Parser
Optimizer
Storage Engine
Data
MySQL Architecture
21
• Optimizer
•
Engine
• MyRocks
• InnoDB
•
Any
MySQL is Layered Architecture
22
• Number of unique values in the index
• Optimizer uses for the query execution plan
Cardinality
23
• Number of unique values in the index
• Optimizer uses for the query execution plan
• Example
• ID: 1,2,3,4,5
•
Number of rows: 5
• Cardinality: 5
Cardinality
23
• Number of unique values in the index
• Optimizer uses for the query execution plan
• Example
• Gender: m,f,f,f,f,m,m,m,m,m,m,f,f,m,f,m,f
•
Number of rows: 17
• Cardinality: 2
Cardinality
23
• Stores statistics on disk
•
mysql.innodb table stats
•
mysql.innodb index stats
InnoDB: Overview
24
• Stores statistics on disk
• Returns statistics to Optimizer
InnoDB: Overview
24
• Stores statistics on disk
• Returns statistics to Optimizer
• In ha innobase::info
• handler/ha innodb.cc
InnoDB: Overview
24
• Stores statistics on disk
• Returns statistics to Optimizer
• In ha innobase::info
• handler/ha innodb.cc
•
When opens table
• flag = HA STATUS CONST
• Reads data from disk
•
Stores it in memory
InnoDB: Overview
24
• Stores statistics on disk
• Returns statistics to Optimizer
• In ha innobase::info
• handler/ha innodb.cc
•
When opens table
• Subsequent table accesses
• flag = HA STATUS VARIABLE
• Statistics from memory
•
Up to date Primary Key data
InnoDB: Overview
24
• Table created with option STATS AUTO RECALC = 0
• Before ANALYZE TABLE
mysql> show index from testG
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 64
...
InnoDB: Flow
25
• Table created with option STATS AUTO RECALC = 0
• After ANALYZE TABLE
mysql> show index from testG
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 2
...
InnoDB: Flow
25
• Table created with option STATS AUTO RECALC = 0
• After inserting rows
mysql> show index from testG
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 16
...
InnoDB: Flow
25
• Table created with option STATS AUTO RECALC = 0
• After restart
mysql> show index from testG
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 2
...
InnoDB: Flow
25
• Takes data from the engine
Optimizer: Overview
26
• Takes data from the engine
• Class ha statistics
•
sql/handler.h
Optimizer: Overview
26
• Takes data from the engine
• Class ha statistics
•
sql/handler.h
• Does not have Cardinality field at all
Optimizer: Overview
26
• Takes data from the engine
• Class ha statistics
•
sql/handler.h
• Does not have Cardinality field at all
• Uses formula to calculate Cardinality
Optimizer: Overview
26
• n rows: number of rows in the table
• Naturally up to date
• Constantly changing!
Optimizer: Formula
27
• n rows: number of rows in the table
• Naturally up to date
• Constantly changing!
• rec per key: number of duplicates per key
•
Calculated by InnoDB in time of ANALYZE
• rec per key = n rows / unique values
• Do not change!
Optimizer: Formula
27
• n rows: number of rows in the table
• Naturally up to date
• Constantly changing!
• rec per key: number of duplicates per key
•
Calculated by InnoDB in time of ANALYZE
• rec per key = n rows / unique values
• Do not change!
•
Cardinality = n rows / rec per key
Optimizer: Formula
27
• Engine stores persistent statistics
InnoDB
Storage Tables
Statistics As Calculated
Row Count Only in Memory
Persistent Statistics Are Not Persistent
28
• Engine stores persistent statistics
InnoDB
Storage Tables
Statistics As Calculated
Row Count Only in Memory
• Optimizer calculates Cardinality every time
when accesses engine statistics
Persistent Statistics Are Not Persistent
28
• Engine stores persistent statistics
InnoDB
Storage Tables
Statistics As Calculated
Row Count Only in Memory
• Optimizer calculates Cardinality every time
when accesses engine statistics
•
Weak user control
Persistent Statistics Are Not Persistent
28
The Use Case
Example
• EXPLAIN without histograms
mysql> explain select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between ’2000-01-01’ and ’2001-01-01’ -- Large range
-> order by goods.cat_id
-> limit 10G -- We ask for 10 rows only!
Example
30
• EXPLAIN without histograms
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: categories -- Small table first
partitions: NULL
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 20
filtered: 70.00
Extra: Using where; Using index;
Using temporary; Using filesort
Example
30
• EXPLAIN without histograms
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: goods -- Large table
partitions: NULL
type: ref
possible_keys: cat_id_2
key: cat_id_2
key_len: 5
ref: orig.categories.id
rows: 51827
filtered: 11.11 -- Default value
Extra: Using where
2 rows in set, 1 warning (0.01 sec)
Example
30
• Execution time without histograms
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)
mysql> select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between ’2000-01-01’ and ’2001-01-01’
-> order by goods.cat_id
-> limit 10;
ab9f9bb7bc4f357712ec34f067eda364 -
10 rows in set (56.47 sec)
Example
30
• Engine statistics without histograms
mysql> show status like ’Handler%’;
+----------------------------+--------+
| Variable_name | Value |
+----------------------------+--------+
...
| Handler_read_next | 964718 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 10 |
| Handler_read_rnd_next | 951671 |
...
| Handler_write | 951670 |
+----------------------------+--------+
18 rows in set (0.01 sec)
Example
30
• Now let add the histogram
mysql> analyze table goods update histogram on date_added;
+------------+-----------+----------+------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------+-----------+----------+------------------------------+
| orig.goods | histogram | status | Histogram statistics created
for column ’date_added’. |
+------------+-----------+----------+------------------------------+
1 row in set (2.01 sec)
Example
30
• EXPLAIN with the histogram
mysql> explain select goods.* from goods
-> join categories
-> on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between ’2000-01-01’ and ’2001-01-01’
-> order by goods.cat_id
-> limit 10G
Example
30
• EXPLAIN with the histogram
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: goods -- Large table first
partitions: NULL
type: index
possible_keys: cat_id_2
key: cat_id_2
key_len: 5
ref: NULL
rows: 10 -- Same as we asked
filtered: 98.70 -- True numbers
Extra: Using where
Example
30
• EXPLAIN with the histogram
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: categories -- Small table
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: orig.goods.cat_id
rows: 1
filtered: 100.00
Extra: Using index
2 rows in set, 1 warning (0.01 sec)
Example
30
• Execution time with the histogram
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)
mysql> select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between ’2000-01-01’ and ’2001-01-01’
-> order by goods.cat_id
-> limit 10;
eeb005fae0dd3441c5c380e1d87fee84 -
10 rows in set (0.00 sec) -- 56/0 times faster!
Example
30
• Engine statistics with the histogram
mysql> show status like ’Handler%’;
+----------------------------+-------++----------------------------+-------+
| Variable_name | Value || Variable_name | Value |
+----------------------------+-------++----------------------------+-------+
| Handler_commit | 1 || Handler_read_prev | 0 |
| Handler_delete | 0 || Handler_read_rnd | 0 |
| Handler_discover | 0 || Handler_read_rnd_next | 0 |
| Handler_external_lock | 4 || Handler_rollback | 0 |
| Handler_mrr_init | 0 || Handler_savepoint | 0 |
| Handler_prepare | 0 || Handler_savepoint_rollback | 0 |
| Handler_read_first | 1 || Handler_update | 0 |
| Handler_read_key | 3 || Handler_write | 0 |
| Handler_read_last | 0 |+----------------------------+-------+
| Handler_read_next | 9 |18 rows in set (0.00 sec)
Example
30
Why the Difference?
1 2 3 4 5 6 7 8 9 10
0
200
400
600
800
Indexes: Number of Items with Same Value
32
1 2 3 4 5 6 7 8 9 10
0
200
400
600
800
Indexes: Cardinality
33
1 2 3 4 5 6 7 8 9 10
0
200
400
600
800
Histograms: Number of Values in Each Bucket
34
1 2 3 4 5 6 7 8 9 10
0
0.2
0.4
0.6
0.8
1
Histograms: Data in the Histogram
35
Even Worse Use Case
Even Worse Use Case
ANALYZE TABLE Limitations
• ANALYZE TABLE often
• Use large number of STATS SAMPLE PAGES
Solutions in 5.7-
38
• Counts number of pages in the table
How ANALYZE TABLE Works with InnoDB?
39
• Counts number of pages in the table
• Takes STATS SAMPLE PAGES
How ANALYZE TABLE Works with InnoDB?
39
• Counts number of pages in the table
• Takes STATS SAMPLE PAGES
• Counts number of unique values in secondary
index in these pages
How ANALYZE TABLE Works with InnoDB?
39
• Counts number of pages in the table
• Takes STATS SAMPLE PAGES
• Counts number of unique values in secondary
index in these pages
•
Divides number of pages in the table on
number of sample pages and multiplies result
by number of unique values
How ANALYZE TABLE Works with InnoDB?
39
• Number of pages in the table: 20,000
• STATS SAMPLE PAGES: 20 (default)
• Unique values in the secondary index:
• In sample pages: 10
• In the table: 11
Example
40
• Number of pages in the table: 20,000
• STATS SAMPLE PAGES: 20 (default)
• Unique values in the secondary index:
• In sample pages: 10
• In the table: 11
• Cardinality: 20,000 * 10 / 20 = 10,000
Example
40
• Number of pages in the table: 20,000
• STATS SAMPLE PAGES: 5,000
•
Unique values in the secondary index:
• In sample pages: 10
•
In the table: 11
• Cardinality: 20,000 * 10 / 5,000 = 40
Example 2
41
• Time consuming
mysql> select count(*) from goods;
+----------+
| count(*) |
+----------+
| 80303000 |
+----------+
1 row in set (35.95 sec)
Use Larger STATS SAMPLE PAGES?
42
• Time consuming
• With default STATS SAMPLE PAGES
mysql> analyze table goods;
+------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze | status | OK |
+------------+---------+----------+----------+
1 row in set (0.32 sec)
Use Larger STATS SAMPLE PAGES?
42
• Time consuming
• With bigger number
mysql> alter table goods STATS_SAMPLE_PAGES=5000;
Query OK, 0 rows affected (0.04 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> analyze table goods;
+------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze | status | OK |
+------------+---------+----------+----------+
1 row in set (27.13 sec)
Use Larger STATS SAMPLE PAGES?
42
• Time consuming
• With bigger number
• 27.13/0.32 = 85 times slower!
Use Larger STATS SAMPLE PAGES?
42
• Time consuming
• With bigger number
• 27.13/0.32 = 85 times slower!
•
Not always a solution
Use Larger STATS SAMPLE PAGES?
42
Even Worse Use Case
Example
•
goods characteristics
CREATE TABLE ‘goods_characteristics‘ (
‘id‘ int(11) NOT NULL AUTO_INCREMENT,
‘good_id‘ varchar(30) DEFAULT NULL,
‘size‘ int(11) DEFAULT NULL,
‘manufacturer‘ varchar(30) DEFAULT NULL,
PRIMARY KEY (‘id‘),
KEY ‘good_id‘ (‘good_id‘,‘size‘,‘manufacturer‘),
KEY ‘size‘ (‘size‘,‘manufacturer‘)
) ENGINE=InnoDB AUTO_INCREMENT=196606
DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Two Similar Tables
44
•
goods shops
CREATE TABLE ‘goods_shops‘ (
‘id‘ int(11) NOT NULL AUTO_INCREMENT,
‘good_id‘ varchar(30) DEFAULT NULL,
‘location‘ varchar(30) DEFAULT NULL,
‘delivery_options‘ varchar(30) DEFAULT NULL,
PRIMARY KEY (‘id‘),
KEY ‘good_id‘ (‘good_id‘,‘location‘,‘delivery_options‘),
KEY ‘location‘ (‘location‘,‘delivery_options‘)
) ENGINE=InnoDB AUTO_INCREMENT=131071
DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Two Similar Tables
44
• Size
mysql> select count(*) from goods_characteristics;
+----------+
| count(*) |
+----------+
| 131072 |
+----------+
1 row in set (0.08 sec)
mysql> select count(*) from goods_shops;
+----------+
| count(*) |
+----------+
| 65536 |
+----------+
1 row in set (0.04 sec)
Two Similar Tables
44
• Data Distribution: goods characteristics
mysql> select count(*) num_rows, good_id, size
-> from goods_characteristics group by good_id, size;
+----------+---------+------+
| num_rows | good_id | size |
+----------+---------+------+
| 65536 | laptop | 7 | | 8189 | laptop | 13 |
| 8187 | laptop | 8 | | 8191 | laptop | 14 |
| 8190 | laptop | 9 | | 8190 | laptop | 15 |
| 8188 | laptop | 10 | | 10 | laptop | 16 |
| 8192 | laptop | 11 | | 10 | laptop | 17 |
| 8189 | laptop | 12 | +----------+---------+------+
Two Similar Tables
44
• Data Distribution: goods characteristics
mysql> select count(*) num_rows, good_id, manufacturer
-> from goods_characteristics group by good_id, manufacturer order by num_rows desc;
+----------+---------+--------------+
| num_rows | good_id | manufacturer |
+----------+---------+--------------+
| 65536 | laptop | Noname | | 8189 | laptop | Toshiba |
| 8191 | laptop | Samsung | | 8189 | laptop | Apple |
| 8191 | laptop | Acer | | 8189 | laptop | Asus |
| 8189 | laptop | Dell | | 10 | laptop | Sony |
| 8189 | laptop | HP | | 10 | laptop | Casper |
| 8189 | laptop | Lenovo | +----------+---------+--------------+
Two Similar Tables
44
• Data Distribution: goods shops
mysql> select count(*) num_rows, good_id, location
-> from goods_shops group by good_id, location order by num_rows desc;
+----------+---------+---------------+
| num_rows | good_id | location |
+----------+---------+---------------+
| 8191 | laptop | New York | | 8189 | laptop | Tokio |
| 8191 | laptop | San Francisco | | 8189 | laptop | Istanbul |
| 8189 | laptop | Paris | | 8189 | laptop | London |
| 8189 | laptop | Berlin | | 10 | laptop | Moscow |
| 8189 | laptop | Brussels | | 10 | laptop | Kiev |
+----------+---------+---------------+
Two Similar Tables
44
• Data Distribution: goods shops
mysql> select count(*) num_rows, good_id, delivery_options
-> from goods_shops group by good_id, delivery_options order by num_rows desc;
+----------+---------+------------------+
| num_rows | good_id | delivery_options |
+----------+---------+------------------+
| 8192 | laptop | DHL | | 8189 | laptop | Gruzovichkof |
| 8191 | laptop | PTT | | 8188 | laptop | Courier |
| 8190 | laptop | Normal Post | | 8187 | laptop | No delivery |
| 8190 | laptop | Tracked | | 10 | laptop | Premium |
| 8189 | laptop | Fedex | | 10 | laptop | Urgent |
+----------+---------+------------------+
Two Similar Tables
44
Histogram statistics are useful primarily for nonindexed columns. Adding an
index to a column for which histogram statistics are applicable might also help
the optimizer make row estimates. The tradeoffs are:
An index must be updated when table data is modified.
A histogram is created or updated only on demand, so it adds no overhead
when table data is modified. On the other hand, the statistics become progres-
sively more out of date when table modifications occur, until the next time they
are updated.
MySQL User Reference Manual
Optimizer Statistics aka Histograms
45
mysql> alter table goods_characteristics stats_sample_pages=5000;
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table goods_shops stats_sample_pages=5000;
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> analyze table goods_characteristics, goods_shops;
+----------------------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+----------------------------+---------+----------+----------+
| test.goods_characteristics | analyze | status | OK |
| test.goods_shops | analyze | status | OK |
+----------------------------+---------+----------+----------+
2 rows in set (0.35 sec)
Index Statistics is More than Good
46
• The query
mysql> select count(*) from goods_shops join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’)
-> and (location in (’Moscow’, ’Kiev’) or
-> delivery_options in (’Premium’, ’Urgent’));
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
Performance
47
• Handlers
mysql> show status like ’Handler%’;
+----------------------------+-------------+
| Variable_name | Value |
+----------------------------+-------------+
| Handler_commit | 0 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 4 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 1 |
| Handler_read_key | 13043 |
| Handler_read_last | 0 |
| Handler_read_next | 854,767,916 |
...
Performance
47
• Table order
mysql> explain select count(*) from goods_shops join goods_characteristics
-> using (good_id) where size < 12 and
-> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’)
-> and (location in (’Moscow’, ’Kiev’) or
-> delivery_options in (’Premium’, ’Urgent’));
+----+-----------------------+-------+---------+--------+----------+---------------+
| id | table | type | key | rows | filtered | Extra |
+----+-----------------------+-------+---------+--------+----------+---------------+
| 1 | goods_characteristics | index | good_id | 131072 | 25.00 | Using... |
| 1 | goods_shops | ref | good_id | 65536 | 36.00 | Using... |
+----+-----------------------+-------+---------+--------+----------+---------------+
2 rows in set, 1 warning (0.00 sec)
Performance
47
• Table order matters
mysql> explain select count(*) from goods_shops straight_join goods_characteristics
-> using (good_id) where size < 12 and
-> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’)
-> and (location in (’Moscow’, ’Kiev’) or
-> delivery_options in (’Premium’, ’Urgent’));
+----+-----------------------+-------+---------+--------+----------+---------------+
| id | table | type | key | rows | filtered | Extra |
+----+-----------------------+-------+---------+--------+----------+---------------+
| 1 | goods_shops | index | good_id | 65536 | 36.00 | Using... |
| 1 | goods_characteristics | ref | good_id | 131072 | 25.00 | Using... |
+----+-----------------------+-------+---------+--------+----------+---------------+
2 rows in set, 1 warning (0.00 sec)
Performance
47
• Table order matters
mysql> select count(*) from goods_shops straight_join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’)
-> and (location in (’Moscow’, ’Kiev’) or
-> delivery_options in (’Premium’, ’Urgent’));
+----------+
| count(*) |
+----------+
| 816640 |
+----------+
1 row in set (2.11 sec)
Performance
47
• Table order matters
mysql> show status like ’Handler_read_next’;
+-------------------+-----------+
| Variable_name | Value |
+-------------------+-----------+
| Handler_read_next | 5,308,416 |
+-------------------+-----------+
1 row in set (0.00 sec)
Performance
47
• Not for all data
mysql> select count(*) from goods_shops straight_join goods_characteristics
-> using (good_id)
-> where (size > 15 or manufacturer in (’Sony’, ’Casper’))
-> and location in
-> (’New York’, ’San Francisco’, ’Paris’, ’Berlin’, ’Brussels’, ’London’)
-> and delivery_options in
-> (’DHL’,’Normal Post’, ’Tracked’, ’Fedex’, ’No delivery’);
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
Performance
47
• Not for all data
mysql> show status like ’Handler%’;
+----------------------------+------------+
| Variable_name | Value |
+----------------------------+------------+
| Handler_commit | 10 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 28 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 1 |
| Handler_read_key | 143 |
| Handler_read_last | 0 |
| Handler_read_next | 16,950,265 |
Performance
47
mysql> analyze table goods_shops update histogram
-> on location, delivery_options;
+-------------+-----------+----------+--------------------------------+
| Table | Op | Msg_type | Msg_text |
+-------------+-----------+----------+--------------------------------+
| goods_shops | histogram | status | Histogram statistics created
for column ’delivery_options’. |
| goods_shops | histogram | status | Histogram statistics created
for column ’location’. |
+-------------+-----------+----------+--------------------------------+
2 rows in set (0.18 sec)
Histograms to The Rescue
48
mysql> analyze table goods_characteristics update histogram
-> on size, manufacturer ;
+-----------------------+-----------+----------+------------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------------+-----------+----------+------------------------------+
| goods_characteristics | histogram | status | Histogram statistics created
for column ’manufacturer’. |
| goods_characteristics | histogram | status | Histogram statistics created
for column ’size’. |
+-----------------------+-----------+----------+------------------------------+
2 rows in set (0.23 sec)
Histograms to The Rescue
48
• The query
mysql> select count(*) from goods_shops join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’)
-> and (location in (’Moscow’, ’Kiev’) or
-> delivery_options in (’Premium’, ’Urgent’));
+----------+
| count(*) |
+----------+
| 816640 |
+----------+
1 row in set (2.16 sec)
Histograms to The Rescue
48
• The query
mysql> show status like ’Handler_read_next’;
+-------------------+-----------+
| Variable_name | Value |
+-------------------+-----------+
| Handler_read_next | 5,308,418 |
+-------------------+-----------+
1 row in set (0.00 sec)
Histograms to The Rescue
48
• Filtering effect
mysql> explain select count(*) from goods_shops join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’)
-> and (location in (’Moscow’, ’Kiev’) or
-> delivery_options in (’Premium’, ’Urgent’));
+----+-----------------------+-------+---------+--------+----------+----------+
| id | table | type | key | rows | filtered | Extra |
+----+-----------------------+-------+---------+--------+----------+----------+
| 1 | goods_shops | index | good_id | 65536 | 0.06 | Using... |
| 1 | goods_characteristics | ref | good_id | 131072 | 15.63 | Using... |
+----+-----------------------+-------+---------+--------+----------+----------+
2 rows in set, 1 warning (0.00 sec)
Histograms to The Rescue
48
How Histograms Work?
↓ sql/sql planner.cc
Low Level
50
↓ sql/sql planner.cc
↓ calculate condition filter
Low Level
50
↓ sql/sql planner.cc
↓ calculate condition filter
↓ Item func *::get filtering effect
Low Level
50
↓ sql/sql planner.cc
↓ calculate condition filter
↓ Item func *::get filtering effect
• get histogram selectivity
Low Level
50
↓ sql/sql planner.cc
↓ calculate condition filter
↓ Item func *::get filtering effect
• get histogram selectivity
• Seen as a percent of filtered rows in EXPLAIN
Low Level
50
• Example data
mysql> create table example(f1 int) engine=innodb;
mysql> insert into example values(1),(1),(1),(2),(3);
mysql> select f1, count(f1) from example group by f1;
+------+-----------+
| f1 | count(f1) |
+------+-----------+
| 1 | 3 |
| 2 | 1 |
| 3 | 1 |
+------+-----------+
3 rows in set (0.00 sec)
Filtered Rows
51
• Without a histogram
mysql> explain select * from example where f1 > 0G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
• Without a histogram
mysql> explain select * from example where f1 > 1G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
• Without a histogram
mysql> explain select * from example where f1 > 2G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
• Without a histogram
mysql> explain select * from example where f1 > 3G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
• With the histogram
mysql> analyze table example update histogram on f1 with 3 buckets;
+-----------------+-----------+----------+------------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------+-----------+----------+------------------------------+
| hist_ex.example | histogram | status | Histogram statistics created
for column ’f1’. |
+-----------------+-----------+----------+------------------------------+
1 row in set (0.03 sec)
Filtered Rows
51
• With the histogram
mysql> select * from information_schema.column_statistics
-> where table_name=’example’G
*************************** 1. row ***************************
SCHEMA_NAME: hist_ex
TABLE_NAME: example
COLUMN_NAME: f1
HISTOGRAM:
"buckets": [[1, 0.6], [2, 0.8], [3, 1.0]],
"data-type": "int", "null-values": 0.0, "collation-id": 8,
"last-updated": "2018-11-07 09:07:19.791470",
"sampling-rate": 1.0, "histogram-type": "singleton",
"number-of-buckets-specified": 3
1 row in set (0.00 sec)
Filtered Rows
51
• With the histogram
mysql> explain select * from example where f1 > 0G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 100.00 -- all rows
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
• With the histogram
mysql> explain select * from example where f1 > 1G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 40.00 -- 2 rows
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
• With the histogram
mysql> explain select * from example where f1 > 2G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 20.00 -- one row
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
• With the histogram
mysql> explain select * from example where f1 > 3G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 20.00 - one row
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
1 2 3
0
0.5
1
1.5
2
Indexes: Cardinality
52
1 2 3
0
0.2
0.4
0.6
0.8
1
Histograms
53
Left Overs
•
CREATE INDEX
• Metadata lock
•
Can be blocked by any query
Maintenance: Locking
55
•
CREATE INDEX
• Metadata lock
•
Can be blocked by any query
• UPDATE HISTOGRAM
• Backup lock
• Can be locked only by a backup
•
Can be created any time without fear
Maintenance: Locking
55
•
CREATE INDEX
• Locks writes
•
Locks reads ∗
PS-2503
Before Percona Server 5.6.38-83.0/5.7.20-18
Upstream
• Every DML updates the index
Maintenance: Load
56
•
CREATE INDEX
• Locks writes
•
Locks reads ∗
•
Every DML updates the index
•
UPDATE HISTOGRAM
• Uses up to histogram generation max mem size
•
Persistent after creation
• DML do not touch it
Maintenance: Load
56
• Helps if query plan can be changed
• Not a replacement for the index:
•
GROUP BY
• ORDER BY
• Query on a single table ∗
Histograms
57
• Data distribution is uniform
• Range optimization can be used
• Full table scan is fast
When Histogram are Not Helpful?
58
• Index statistics collected by the engine
• Optimizer calculates Cardinality each time
when it accesses statistics
•
Indexes don’t always improve performance
• Histograms can help
Still new feature
• Histograms do not replace other optimizations!
Conclusion
59
MySQL User Reference Manual
Blog by Erik Froseth
Blog by Frederic Descamps
Talk by Oystein Grovlen @Fosdem
Talk by Sergei Petrunia @PerconaLive
WL #8707
More information
60
www.slideshare.net/SvetaSmirnova
twitter.com/svetsmirnova
github.com/svetasmirnova
Thank you!
61

More Related Content

What's hot

Introduction into MySQL Query Tuning for Dev[Op]s
Introduction into MySQL Query Tuning for Dev[Op]sIntroduction into MySQL Query Tuning for Dev[Op]s
Introduction into MySQL Query Tuning for Dev[Op]s
Sveta Smirnova
 
Introduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sIntroduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]s
Sveta Smirnova
 
Performance Schema in Action: demo
Performance Schema in Action: demoPerformance Schema in Action: demo
Performance Schema in Action: demo
Sveta Smirnova
 
Introduction into MySQL Query Tuning
Introduction into MySQL Query TuningIntroduction into MySQL Query Tuning
Introduction into MySQL Query Tuning
Sveta Smirnova
 
0888 learning-mysql
0888 learning-mysql0888 learning-mysql
0888 learning-mysql
sabir18
 
Optimizer overviewoow2014
Optimizer overviewoow2014Optimizer overviewoow2014
Optimizer overviewoow2014
Mysql User Camp
 
Troubleshooting MySQL Performance add-ons
Troubleshooting MySQL Performance add-onsTroubleshooting MySQL Performance add-ons
Troubleshooting MySQL Performance add-ons
Sveta Smirnova
 
Understanding Query Execution
Understanding Query ExecutionUnderstanding Query Execution
Understanding Query Execution
webhostingguy
 
Preparse Query Rewrite Plugins
Preparse Query Rewrite PluginsPreparse Query Rewrite Plugins
Preparse Query Rewrite Plugins
Sveta Smirnova
 
MySQL Query tuning 101
MySQL Query tuning 101MySQL Query tuning 101
MySQL Query tuning 101
Sveta Smirnova
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databases
Sergey Petrunya
 
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZEMySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZE
Norvald Ryeng
 
Optimizing queries MySQL
Optimizing queries MySQLOptimizing queries MySQL
Optimizing queries MySQL
Georgi Sotirov
 
MySQL Performance Schema in Action
MySQL Performance Schema in ActionMySQL Performance Schema in Action
MySQL Performance Schema in Action
Sveta Smirnova
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performance
Sergey Petrunya
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace Walkthrough
Sergey Petrunya
 
New SQL features in latest MySQL releases
New SQL features in latest MySQL releasesNew SQL features in latest MySQL releases
New SQL features in latest MySQL releases
Georgi Sotirov
 
MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6
MYXPLAIN
 
Oracle Database Advanced Querying
Oracle Database Advanced QueryingOracle Database Advanced Querying
Oracle Database Advanced Querying
Zohar Elkayam
 
Improving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimatesImproving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimates
Sergey Petrunya
 

What's hot (20)

Introduction into MySQL Query Tuning for Dev[Op]s
Introduction into MySQL Query Tuning for Dev[Op]sIntroduction into MySQL Query Tuning for Dev[Op]s
Introduction into MySQL Query Tuning for Dev[Op]s
 
Introduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sIntroduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]s
 
Performance Schema in Action: demo
Performance Schema in Action: demoPerformance Schema in Action: demo
Performance Schema in Action: demo
 
Introduction into MySQL Query Tuning
Introduction into MySQL Query TuningIntroduction into MySQL Query Tuning
Introduction into MySQL Query Tuning
 
0888 learning-mysql
0888 learning-mysql0888 learning-mysql
0888 learning-mysql
 
Optimizer overviewoow2014
Optimizer overviewoow2014Optimizer overviewoow2014
Optimizer overviewoow2014
 
Troubleshooting MySQL Performance add-ons
Troubleshooting MySQL Performance add-onsTroubleshooting MySQL Performance add-ons
Troubleshooting MySQL Performance add-ons
 
Understanding Query Execution
Understanding Query ExecutionUnderstanding Query Execution
Understanding Query Execution
 
Preparse Query Rewrite Plugins
Preparse Query Rewrite PluginsPreparse Query Rewrite Plugins
Preparse Query Rewrite Plugins
 
MySQL Query tuning 101
MySQL Query tuning 101MySQL Query tuning 101
MySQL Query tuning 101
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databases
 
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZEMySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZE
 
Optimizing queries MySQL
Optimizing queries MySQLOptimizing queries MySQL
Optimizing queries MySQL
 
MySQL Performance Schema in Action
MySQL Performance Schema in ActionMySQL Performance Schema in Action
MySQL Performance Schema in Action
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performance
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace Walkthrough
 
New SQL features in latest MySQL releases
New SQL features in latest MySQL releasesNew SQL features in latest MySQL releases
New SQL features in latest MySQL releases
 
MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6
 
Oracle Database Advanced Querying
Oracle Database Advanced QueryingOracle Database Advanced Querying
Oracle Database Advanced Querying
 
Improving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimatesImproving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimates
 

Similar to A Billion Goods in a Few Categories: When Optimizer Histograms Help and When They Don’t

Index the obvious and not so obvious
Index the obvious and not so obviousIndex the obvious and not so obvious
Index the obvious and not so obvious
Harry Zheng
 
Columnar Table Performance Enhancements Of Greenplum Database with Block Meta...
Columnar Table Performance Enhancements Of Greenplum Database with Block Meta...Columnar Table Performance Enhancements Of Greenplum Database with Block Meta...
Columnar Table Performance Enhancements Of Greenplum Database with Block Meta...
Ontico
 
PL/SQL New and Advanced Features for Extreme Performance
PL/SQL New and Advanced Features for Extreme PerformancePL/SQL New and Advanced Features for Extreme Performance
PL/SQL New and Advanced Features for Extreme Performance
Zohar Elkayam
 
Why Use EXPLAIN FORMAT=JSON?
 Why Use EXPLAIN FORMAT=JSON?  Why Use EXPLAIN FORMAT=JSON?
Why Use EXPLAIN FORMAT=JSON?
Sveta Smirnova
 
Mssql
MssqlMssql
Mssql
Janas Khan
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server Databases
ColdFusionConference
 
MySQL vs MonetDB Bencharmarks
MySQL vs MonetDB BencharmarksMySQL vs MonetDB Bencharmarks
MySQL vs MonetDB Bencharmarks
"FENG "GEORGE"" YU
 
MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)
Karthik .P.R
 
MariaDB 10 Tutorial - 13.11.11 - Percona Live London
MariaDB 10 Tutorial - 13.11.11 - Percona Live LondonMariaDB 10 Tutorial - 13.11.11 - Percona Live London
MariaDB 10 Tutorial - 13.11.11 - Percona Live London
Ivan Zoratti
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentation
Michael Keane
 
My Query is slow, now what?
My Query is slow, now what?My Query is slow, now what?
My Query is slow, now what?
Gianluca Sartori
 
Postgres for MySQL (and other database) people
Postgres for MySQL (and other database) peoplePostgres for MySQL (and other database) people
Postgres for MySQL (and other database) people
Command Prompt., Inc
 
10x Performance Improvements
10x Performance Improvements10x Performance Improvements
10x Performance Improvements
Ronald Bradford
 
10x improvement-mysql-100419105218-phpapp02
10x improvement-mysql-100419105218-phpapp0210x improvement-mysql-100419105218-phpapp02
10x improvement-mysql-100419105218-phpapp02
promethius
 
06 Excel.pdf
06 Excel.pdf06 Excel.pdf
06 Excel.pdf
SugumarSarDurai
 
Hundreds of queries in the time of one - Gianmario Spacagna
Hundreds of queries in the time of one - Gianmario SpacagnaHundreds of queries in the time of one - Gianmario Spacagna
Hundreds of queries in the time of one - Gianmario Spacagna
Spark Summit
 
Presentation top tips for getting optimal sql execution
Presentation    top tips for getting optimal sql executionPresentation    top tips for getting optimal sql execution
Presentation top tips for getting optimal sql execution
xKinAnx
 
Mysql query optimization
Mysql query optimizationMysql query optimization
Mysql query optimization
Baohua Cai
 
Everything You Need to Know About Oracle 12c Indexes
Everything You Need to Know About Oracle 12c IndexesEverything You Need to Know About Oracle 12c Indexes
Everything You Need to Know About Oracle 12c Indexes
SolarWinds
 
Sista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performanceSista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performance
ESUG
 

Similar to A Billion Goods in a Few Categories: When Optimizer Histograms Help and When They Don’t (20)

Index the obvious and not so obvious
Index the obvious and not so obviousIndex the obvious and not so obvious
Index the obvious and not so obvious
 
Columnar Table Performance Enhancements Of Greenplum Database with Block Meta...
Columnar Table Performance Enhancements Of Greenplum Database with Block Meta...Columnar Table Performance Enhancements Of Greenplum Database with Block Meta...
Columnar Table Performance Enhancements Of Greenplum Database with Block Meta...
 
PL/SQL New and Advanced Features for Extreme Performance
PL/SQL New and Advanced Features for Extreme PerformancePL/SQL New and Advanced Features for Extreme Performance
PL/SQL New and Advanced Features for Extreme Performance
 
Why Use EXPLAIN FORMAT=JSON?
 Why Use EXPLAIN FORMAT=JSON?  Why Use EXPLAIN FORMAT=JSON?
Why Use EXPLAIN FORMAT=JSON?
 
Mssql
MssqlMssql
Mssql
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server Databases
 
MySQL vs MonetDB Bencharmarks
MySQL vs MonetDB BencharmarksMySQL vs MonetDB Bencharmarks
MySQL vs MonetDB Bencharmarks
 
MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)
 
MariaDB 10 Tutorial - 13.11.11 - Percona Live London
MariaDB 10 Tutorial - 13.11.11 - Percona Live LondonMariaDB 10 Tutorial - 13.11.11 - Percona Live London
MariaDB 10 Tutorial - 13.11.11 - Percona Live London
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentation
 
My Query is slow, now what?
My Query is slow, now what?My Query is slow, now what?
My Query is slow, now what?
 
Postgres for MySQL (and other database) people
Postgres for MySQL (and other database) peoplePostgres for MySQL (and other database) people
Postgres for MySQL (and other database) people
 
10x Performance Improvements
10x Performance Improvements10x Performance Improvements
10x Performance Improvements
 
10x improvement-mysql-100419105218-phpapp02
10x improvement-mysql-100419105218-phpapp0210x improvement-mysql-100419105218-phpapp02
10x improvement-mysql-100419105218-phpapp02
 
06 Excel.pdf
06 Excel.pdf06 Excel.pdf
06 Excel.pdf
 
Hundreds of queries in the time of one - Gianmario Spacagna
Hundreds of queries in the time of one - Gianmario SpacagnaHundreds of queries in the time of one - Gianmario Spacagna
Hundreds of queries in the time of one - Gianmario Spacagna
 
Presentation top tips for getting optimal sql execution
Presentation    top tips for getting optimal sql executionPresentation    top tips for getting optimal sql execution
Presentation top tips for getting optimal sql execution
 
Mysql query optimization
Mysql query optimizationMysql query optimization
Mysql query optimization
 
Everything You Need to Know About Oracle 12c Indexes
Everything You Need to Know About Oracle 12c IndexesEverything You Need to Know About Oracle 12c Indexes
Everything You Need to Know About Oracle 12c Indexes
 
Sista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performanceSista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performance
 

More from Sveta Smirnova

MySQL 2024: Зачем переходить на MySQL 8, если в 5.х всё устраивает?
MySQL 2024: Зачем переходить на MySQL 8, если в 5.х всё устраивает?MySQL 2024: Зачем переходить на MySQL 8, если в 5.х всё устраивает?
MySQL 2024: Зачем переходить на MySQL 8, если в 5.х всё устраивает?
Sveta Smirnova
 
Database in Kubernetes: Diagnostics and Monitoring
Database in Kubernetes: Diagnostics and MonitoringDatabase in Kubernetes: Diagnostics and Monitoring
Database in Kubernetes: Diagnostics and Monitoring
Sveta Smirnova
 
MySQL Database Monitoring: Must, Good and Nice to Have
MySQL Database Monitoring: Must, Good and Nice to HaveMySQL Database Monitoring: Must, Good and Nice to Have
MySQL Database Monitoring: Must, Good and Nice to Have
Sveta Smirnova
 
MySQL Cookbook: Recipes for Developers
MySQL Cookbook: Recipes for DevelopersMySQL Cookbook: Recipes for Developers
MySQL Cookbook: Recipes for Developers
Sveta Smirnova
 
MySQL Performance for DevOps
MySQL Performance for DevOpsMySQL Performance for DevOps
MySQL Performance for DevOps
Sveta Smirnova
 
MySQL Test Framework для поддержки клиентов и верификации багов
MySQL Test Framework для поддержки клиентов и верификации баговMySQL Test Framework для поддержки клиентов и верификации багов
MySQL Test Framework для поддержки клиентов и верификации багов
Sveta Smirnova
 
MySQL Cookbook: Recipes for Your Business
MySQL Cookbook: Recipes for Your BusinessMySQL Cookbook: Recipes for Your Business
MySQL Cookbook: Recipes for Your Business
Sveta Smirnova
 
Производительность MySQL для DevOps
 Производительность MySQL для DevOps Производительность MySQL для DevOps
Производительность MySQL для DevOps
Sveta Smirnova
 
MySQL Performance for DevOps
MySQL Performance for DevOpsMySQL Performance for DevOps
MySQL Performance for DevOps
Sveta Smirnova
 
How to Avoid Pitfalls in Schema Upgrade with Percona XtraDB Cluster
How to Avoid Pitfalls in Schema Upgrade with Percona XtraDB ClusterHow to Avoid Pitfalls in Schema Upgrade with Percona XtraDB Cluster
How to Avoid Pitfalls in Schema Upgrade with Percona XtraDB Cluster
Sveta Smirnova
 
How to migrate from MySQL to MariaDB without tears
How to migrate from MySQL to MariaDB without tearsHow to migrate from MySQL to MariaDB without tears
How to migrate from MySQL to MariaDB without tears
Sveta Smirnova
 
How Safe is Asynchronous Master-Master Setup?
How Safe is Asynchronous Master-Master Setup?How Safe is Asynchronous Master-Master Setup?
How Safe is Asynchronous Master-Master Setup?
Sveta Smirnova
 
Современному хайлоду - современные решения: MySQL 8.0 и улучшения Percona
Современному хайлоду - современные решения: MySQL 8.0 и улучшения PerconaСовременному хайлоду - современные решения: MySQL 8.0 и улучшения Percona
Современному хайлоду - современные решения: MySQL 8.0 и улучшения Percona
Sveta Smirnova
 
How to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with GaleraHow to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with Galera
Sveta Smirnova
 
How Safe is Asynchronous Master-Master Setup?
 How Safe is Asynchronous Master-Master Setup? How Safe is Asynchronous Master-Master Setup?
How Safe is Asynchronous Master-Master Setup?
Sveta Smirnova
 
Что нужно знать о трёх топовых фичах MySQL
Что нужно знать  о трёх топовых фичах  MySQLЧто нужно знать  о трёх топовых фичах  MySQL
Что нужно знать о трёх топовых фичах MySQL
Sveta Smirnova
 
MySQL Performance Schema in 20 Minutes
 MySQL Performance Schema in 20 Minutes MySQL Performance Schema in 20 Minutes
MySQL Performance Schema in 20 Minutes
Sveta Smirnova
 
Why MySQL Replication Fails, and How to Get it Back
Why MySQL Replication Fails, and How to Get it BackWhy MySQL Replication Fails, and How to Get it Back
Why MySQL Replication Fails, and How to Get it Back
Sveta Smirnova
 

More from Sveta Smirnova (18)

MySQL 2024: Зачем переходить на MySQL 8, если в 5.х всё устраивает?
MySQL 2024: Зачем переходить на MySQL 8, если в 5.х всё устраивает?MySQL 2024: Зачем переходить на MySQL 8, если в 5.х всё устраивает?
MySQL 2024: Зачем переходить на MySQL 8, если в 5.х всё устраивает?
 
Database in Kubernetes: Diagnostics and Monitoring
Database in Kubernetes: Diagnostics and MonitoringDatabase in Kubernetes: Diagnostics and Monitoring
Database in Kubernetes: Diagnostics and Monitoring
 
MySQL Database Monitoring: Must, Good and Nice to Have
MySQL Database Monitoring: Must, Good and Nice to HaveMySQL Database Monitoring: Must, Good and Nice to Have
MySQL Database Monitoring: Must, Good and Nice to Have
 
MySQL Cookbook: Recipes for Developers
MySQL Cookbook: Recipes for DevelopersMySQL Cookbook: Recipes for Developers
MySQL Cookbook: Recipes for Developers
 
MySQL Performance for DevOps
MySQL Performance for DevOpsMySQL Performance for DevOps
MySQL Performance for DevOps
 
MySQL Test Framework для поддержки клиентов и верификации багов
MySQL Test Framework для поддержки клиентов и верификации баговMySQL Test Framework для поддержки клиентов и верификации багов
MySQL Test Framework для поддержки клиентов и верификации багов
 
MySQL Cookbook: Recipes for Your Business
MySQL Cookbook: Recipes for Your BusinessMySQL Cookbook: Recipes for Your Business
MySQL Cookbook: Recipes for Your Business
 
Производительность MySQL для DevOps
 Производительность MySQL для DevOps Производительность MySQL для DevOps
Производительность MySQL для DevOps
 
MySQL Performance for DevOps
MySQL Performance for DevOpsMySQL Performance for DevOps
MySQL Performance for DevOps
 
How to Avoid Pitfalls in Schema Upgrade with Percona XtraDB Cluster
How to Avoid Pitfalls in Schema Upgrade with Percona XtraDB ClusterHow to Avoid Pitfalls in Schema Upgrade with Percona XtraDB Cluster
How to Avoid Pitfalls in Schema Upgrade with Percona XtraDB Cluster
 
How to migrate from MySQL to MariaDB without tears
How to migrate from MySQL to MariaDB without tearsHow to migrate from MySQL to MariaDB without tears
How to migrate from MySQL to MariaDB without tears
 
How Safe is Asynchronous Master-Master Setup?
How Safe is Asynchronous Master-Master Setup?How Safe is Asynchronous Master-Master Setup?
How Safe is Asynchronous Master-Master Setup?
 
Современному хайлоду - современные решения: MySQL 8.0 и улучшения Percona
Современному хайлоду - современные решения: MySQL 8.0 и улучшения PerconaСовременному хайлоду - современные решения: MySQL 8.0 и улучшения Percona
Современному хайлоду - современные решения: MySQL 8.0 и улучшения Percona
 
How to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with GaleraHow to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with Galera
 
How Safe is Asynchronous Master-Master Setup?
 How Safe is Asynchronous Master-Master Setup? How Safe is Asynchronous Master-Master Setup?
How Safe is Asynchronous Master-Master Setup?
 
Что нужно знать о трёх топовых фичах MySQL
Что нужно знать  о трёх топовых фичах  MySQLЧто нужно знать  о трёх топовых фичах  MySQL
Что нужно знать о трёх топовых фичах MySQL
 
MySQL Performance Schema in 20 Minutes
 MySQL Performance Schema in 20 Minutes MySQL Performance Schema in 20 Minutes
MySQL Performance Schema in 20 Minutes
 
Why MySQL Replication Fails, and How to Get it Back
Why MySQL Replication Fails, and How to Get it BackWhy MySQL Replication Fails, and How to Get it Back
Why MySQL Replication Fails, and How to Get it Back
 

Recently uploaded

What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
pavan998932
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
Hironori Washizaki
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
mz5nrf0n
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
Quickdice ERP
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Envertis Software Solutions
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
TheSMSPoint
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
SOCRadar
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 

Recently uploaded (20)

What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 

A Billion Goods in a Few Categories: When Optimizer Histograms Help and When They Don’t

  • 1. A Billion Goods in a Few Categories When Optimizer Histograms Help and When They Don’t September 18, 2019 Sveta Smirnova
  • 2. •Introduction •The Use Case The Cardinality: Two Levels Example •Why the Difference? •Even Worse Use Case ANALYZE TABLE Limitations Example •How Histograms Work? •Left Overs Table of Contents 2
  • 3. The column statistics data dictionary table stores histogram statistics about column values, for use by the optimizer in constructing query execution plans MySQL User Reference Manual Optimizer Statistics aka Histograms 3
  • 4. • MySQL Support engineer • Author of • MySQL Troubleshooting • JSON UDF functions • FILTER clause for MySQL • Speaker • Percona Live, OOW, Fosdem, DevConf, HighLoad... Sveta Smirnova 4
  • 6. • Hardware • Wise options • Optimized queries • Brain Everything can Be Resolved! 6
  • 7. • This talk is about • How I spent the last three years • Resolving the same issue • For different customers Not Everything 7
  • 8. • This talk is about • How I spent the last three years • Resolving the same issue • For different customers • Task was to speed up the query Not Everything 7
  • 9. • Specific data distribution Not All the Queries Can be Optimized 8
  • 10. • Specific data distribution • Access on different fields • ON goods.shop id = shop.id • WHERE shop.location IN (...) • GROUP BY goods.category, shop.profile • ORDER BY shop.distance, goods.quantity Not All the Queries Can be Optimized 8
  • 11. • Specific data distribution • Access on different fields • ON goods.shop id = shop.id • WHERE shop.location IN (...) • GROUP BY goods.category, shop.profile • ORDER BY shop.distance, goods.quantity • Index cannot be used effectively Not All the Queries Can be Optimized 8
  • 12. • Data distribution varies • Big difference between number of values Red 1,000,000 Green 2 Blue 100,000 Latest Support Tickets 9
  • 13. • Data distribution varies • Constantly changing Red 100,000 Green 1,000,000 Blue 10 Latest Support Tickets 9
  • 14. • Data distribution varies • Constantly changing Red 1,000 Green 2,000 Blue 50,000 Latest Support Tickets 9
  • 15. • Data distribution varies • Cardinality is not correct • Was not updated in time • Updates too often • Calculated wrongly Latest Support Tickets 9
  • 16. • Data distribution varies • Cardinality is not correct • Index maintenance is expensive • Hardware resources • Slow updates • Window to run CREATE INDEX Latest Support Tickets 9
  • 17. • Data distribution varies • Cardinality is not correct • Index maintenance is expensive • Optimizer does not work as we wish it Examples in my talk @Percona Live Frankfurt Latest Support Tickets 9
  • 18. • Topic based on real Support cases • Couple of them are still in progress Disclaimer 10
  • 19. • Topic based on real Support cases • All examples are 100% fake • They are created so that • No customer can be identified • Everything generated Table names Column names Data • Use case itself is fictional Disclaimer 10
  • 20. • Topic based on real Support cases • All examples are 100% fake • All examples are simplified • Only columns, required to show the issue • Everything extra removed • Real tables usually store much more data Disclaimer 10
  • 21. • Topic based on real Support cases • All examples are 100% fake • All examples are simplified • All disasters happened with version 5.7 Disclaimer 10
  • 23. • categories • Less than 20 rows Two Tables 12
  • 24. • categories • Less than 20 rows • goods • More than 1M rows • 20 unique cat id values • Many other fields Price Date: added, last updated, etc. Characteristics Store ... Two Tables 12
  • 25. select * from goods join categories on (categories.id=goods.cat_id) where date_added between ’2018-07-01’ and ’2018-08-01’ and cat_id in (16,11) and price >= 1000 and <=10000 [ and ... ] [ GROUP BY ... [ORDER BY ... [ LIMIT ...]]] ; JOIN 13
  • 26. • Select from the small table Option 1: Select from the Small Table First 14
  • 27. • Select from the small table • For each cat id select from the large table Option 1: Select from the Small Table First 14
  • 28. • Select from the small table • For each cat id select from the large table • Filter result on date added[ and price[...]] Option 1: Select from the Small Table First 14
  • 29. • Select from the small table • For each cat id select from the large table • Filter result on date added[ and price[...]] • Slow with many items in the category Option 1: Select from the Small Table First 14
  • 38. • Filter rows by date added[ and price[...]] Option 2: Select From the Large Table First 16
  • 39. • Filter rows by date added[ and price[...]] • Get cat id values Option 2: Select From the Large Table First 16
  • 40. • Filter rows by date added[ and price[...]] • Get cat id values • Retrieve rows from the small table Option 2: Select From the Large Table First 16
  • 41. • Filter rows by date added[ and price[...]] • Get cat id values • Retrieve rows from the small table • Slow if number of rows, filtered by date added, is larger than number of goods in the selected categories Option 2: Select From the Large Table First 16
  • 47. • CREATE INDEX index everything (cat id, date added[, price[, ...]]) • It resolves the issue What if We use Combined Indexes? 18
  • 48. • CREATE INDEX index everything (cat id, date added[, price[, ...]]) • It resolves the issue • But not in all cases What if We use Combined Indexes? 18
  • 49. • Maintenance cost • Slower INSERT/UPDATE/DELETE • Disk space The Problem 19
  • 50. • Maintenance cost • Slower INSERT/UPDATE/DELETE • Disk space • Index not useful for selecting rows JOIN categories ON (categories.id=goods.cat_id) JOIN shops ON (shops.id=goods.shop_id) [ JOIN ... ] WHERE date_added between ’2018-07-01’ and ’2018-08-01’ AND cat_id in (16,11) AND price >= 1000 AND price <=10000 [ AND ... ] GROUP BY product_type ORDER BY date_updated DESC LIMIT 50,100 The Problem 19
  • 51. • Maintenance cost • Slower INSERT/UPDATE/DELETE • Disk space • Index not useful for selecting rows • Tables may have wrong cardinality The Problem 19
  • 52. The Use Case The Cardinality: Two Levels
  • 54. • Optimizer • Engine • MyRocks • InnoDB • Any MySQL is Layered Architecture 22
  • 55. • Number of unique values in the index • Optimizer uses for the query execution plan Cardinality 23
  • 56. • Number of unique values in the index • Optimizer uses for the query execution plan • Example • ID: 1,2,3,4,5 • Number of rows: 5 • Cardinality: 5 Cardinality 23
  • 57. • Number of unique values in the index • Optimizer uses for the query execution plan • Example • Gender: m,f,f,f,f,m,m,m,m,m,m,f,f,m,f,m,f • Number of rows: 17 • Cardinality: 2 Cardinality 23
  • 58. • Stores statistics on disk • mysql.innodb table stats • mysql.innodb index stats InnoDB: Overview 24
  • 59. • Stores statistics on disk • Returns statistics to Optimizer InnoDB: Overview 24
  • 60. • Stores statistics on disk • Returns statistics to Optimizer • In ha innobase::info • handler/ha innodb.cc InnoDB: Overview 24
  • 61. • Stores statistics on disk • Returns statistics to Optimizer • In ha innobase::info • handler/ha innodb.cc • When opens table • flag = HA STATUS CONST • Reads data from disk • Stores it in memory InnoDB: Overview 24
  • 62. • Stores statistics on disk • Returns statistics to Optimizer • In ha innobase::info • handler/ha innodb.cc • When opens table • Subsequent table accesses • flag = HA STATUS VARIABLE • Statistics from memory • Up to date Primary Key data InnoDB: Overview 24
  • 63. • Table created with option STATS AUTO RECALC = 0 • Before ANALYZE TABLE mysql> show index from testG ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Seq_in_index: 1 Column_name: f1 Collation: A Cardinality: 64 ... InnoDB: Flow 25
  • 64. • Table created with option STATS AUTO RECALC = 0 • After ANALYZE TABLE mysql> show index from testG ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Seq_in_index: 1 Column_name: f1 Collation: A Cardinality: 2 ... InnoDB: Flow 25
  • 65. • Table created with option STATS AUTO RECALC = 0 • After inserting rows mysql> show index from testG ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Seq_in_index: 1 Column_name: f1 Collation: A Cardinality: 16 ... InnoDB: Flow 25
  • 66. • Table created with option STATS AUTO RECALC = 0 • After restart mysql> show index from testG ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Seq_in_index: 1 Column_name: f1 Collation: A Cardinality: 2 ... InnoDB: Flow 25
  • 67. • Takes data from the engine Optimizer: Overview 26
  • 68. • Takes data from the engine • Class ha statistics • sql/handler.h Optimizer: Overview 26
  • 69. • Takes data from the engine • Class ha statistics • sql/handler.h • Does not have Cardinality field at all Optimizer: Overview 26
  • 70. • Takes data from the engine • Class ha statistics • sql/handler.h • Does not have Cardinality field at all • Uses formula to calculate Cardinality Optimizer: Overview 26
  • 71. • n rows: number of rows in the table • Naturally up to date • Constantly changing! Optimizer: Formula 27
  • 72. • n rows: number of rows in the table • Naturally up to date • Constantly changing! • rec per key: number of duplicates per key • Calculated by InnoDB in time of ANALYZE • rec per key = n rows / unique values • Do not change! Optimizer: Formula 27
  • 73. • n rows: number of rows in the table • Naturally up to date • Constantly changing! • rec per key: number of duplicates per key • Calculated by InnoDB in time of ANALYZE • rec per key = n rows / unique values • Do not change! • Cardinality = n rows / rec per key Optimizer: Formula 27
  • 74. • Engine stores persistent statistics InnoDB Storage Tables Statistics As Calculated Row Count Only in Memory Persistent Statistics Are Not Persistent 28
  • 75. • Engine stores persistent statistics InnoDB Storage Tables Statistics As Calculated Row Count Only in Memory • Optimizer calculates Cardinality every time when accesses engine statistics Persistent Statistics Are Not Persistent 28
  • 76. • Engine stores persistent statistics InnoDB Storage Tables Statistics As Calculated Row Count Only in Memory • Optimizer calculates Cardinality every time when accesses engine statistics • Weak user control Persistent Statistics Are Not Persistent 28
  • 78. • EXPLAIN without histograms mysql> explain select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -- Large range -> order by goods.cat_id -> limit 10G -- We ask for 10 rows only! Example 30
  • 79. • EXPLAIN without histograms *************************** 1. row *************************** id: 1 select_type: SIMPLE table: categories -- Small table first partitions: NULL type: index possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 20 filtered: 70.00 Extra: Using where; Using index; Using temporary; Using filesort Example 30
  • 80. • EXPLAIN without histograms *************************** 2. row *************************** id: 1 select_type: SIMPLE table: goods -- Large table partitions: NULL type: ref possible_keys: cat_id_2 key: cat_id_2 key_len: 5 ref: orig.categories.id rows: 51827 filtered: 11.11 -- Default value Extra: Using where 2 rows in set, 1 warning (0.01 sec) Example 30
  • 81. • Execution time without histograms mysql> flush status; Query OK, 0 rows affected (0.00 sec) mysql> select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -> order by goods.cat_id -> limit 10; ab9f9bb7bc4f357712ec34f067eda364 - 10 rows in set (56.47 sec) Example 30
  • 82. • Engine statistics without histograms mysql> show status like ’Handler%’; +----------------------------+--------+ | Variable_name | Value | +----------------------------+--------+ ... | Handler_read_next | 964718 | | Handler_read_prev | 0 | | Handler_read_rnd | 10 | | Handler_read_rnd_next | 951671 | ... | Handler_write | 951670 | +----------------------------+--------+ 18 rows in set (0.01 sec) Example 30
  • 83. • Now let add the histogram mysql> analyze table goods update histogram on date_added; +------------+-----------+----------+------------------------------+ | Table | Op | Msg_type | Msg_text | +------------+-----------+----------+------------------------------+ | orig.goods | histogram | status | Histogram statistics created for column ’date_added’. | +------------+-----------+----------+------------------------------+ 1 row in set (2.01 sec) Example 30
  • 84. • EXPLAIN with the histogram mysql> explain select goods.* from goods -> join categories -> on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -> order by goods.cat_id -> limit 10G Example 30
  • 85. • EXPLAIN with the histogram *************************** 1. row *************************** id: 1 select_type: SIMPLE table: goods -- Large table first partitions: NULL type: index possible_keys: cat_id_2 key: cat_id_2 key_len: 5 ref: NULL rows: 10 -- Same as we asked filtered: 98.70 -- True numbers Extra: Using where Example 30
  • 86. • EXPLAIN with the histogram *************************** 2. row *************************** id: 1 select_type: SIMPLE table: categories -- Small table partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: orig.goods.cat_id rows: 1 filtered: 100.00 Extra: Using index 2 rows in set, 1 warning (0.01 sec) Example 30
  • 87. • Execution time with the histogram mysql> flush status; Query OK, 0 rows affected (0.00 sec) mysql> select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -> order by goods.cat_id -> limit 10; eeb005fae0dd3441c5c380e1d87fee84 - 10 rows in set (0.00 sec) -- 56/0 times faster! Example 30
  • 88. • Engine statistics with the histogram mysql> show status like ’Handler%’; +----------------------------+-------++----------------------------+-------+ | Variable_name | Value || Variable_name | Value | +----------------------------+-------++----------------------------+-------+ | Handler_commit | 1 || Handler_read_prev | 0 | | Handler_delete | 0 || Handler_read_rnd | 0 | | Handler_discover | 0 || Handler_read_rnd_next | 0 | | Handler_external_lock | 4 || Handler_rollback | 0 | | Handler_mrr_init | 0 || Handler_savepoint | 0 | | Handler_prepare | 0 || Handler_savepoint_rollback | 0 | | Handler_read_first | 1 || Handler_update | 0 | | Handler_read_key | 3 || Handler_write | 0 | | Handler_read_last | 0 |+----------------------------+-------+ | Handler_read_next | 9 |18 rows in set (0.00 sec) Example 30
  • 90. 1 2 3 4 5 6 7 8 9 10 0 200 400 600 800 Indexes: Number of Items with Same Value 32
  • 91. 1 2 3 4 5 6 7 8 9 10 0 200 400 600 800 Indexes: Cardinality 33
  • 92. 1 2 3 4 5 6 7 8 9 10 0 200 400 600 800 Histograms: Number of Values in Each Bucket 34
  • 93. 1 2 3 4 5 6 7 8 9 10 0 0.2 0.4 0.6 0.8 1 Histograms: Data in the Histogram 35
  • 95. Even Worse Use Case ANALYZE TABLE Limitations
  • 96. • ANALYZE TABLE often • Use large number of STATS SAMPLE PAGES Solutions in 5.7- 38
  • 97. • Counts number of pages in the table How ANALYZE TABLE Works with InnoDB? 39
  • 98. • Counts number of pages in the table • Takes STATS SAMPLE PAGES How ANALYZE TABLE Works with InnoDB? 39
  • 99. • Counts number of pages in the table • Takes STATS SAMPLE PAGES • Counts number of unique values in secondary index in these pages How ANALYZE TABLE Works with InnoDB? 39
  • 100. • Counts number of pages in the table • Takes STATS SAMPLE PAGES • Counts number of unique values in secondary index in these pages • Divides number of pages in the table on number of sample pages and multiplies result by number of unique values How ANALYZE TABLE Works with InnoDB? 39
  • 101. • Number of pages in the table: 20,000 • STATS SAMPLE PAGES: 20 (default) • Unique values in the secondary index: • In sample pages: 10 • In the table: 11 Example 40
  • 102. • Number of pages in the table: 20,000 • STATS SAMPLE PAGES: 20 (default) • Unique values in the secondary index: • In sample pages: 10 • In the table: 11 • Cardinality: 20,000 * 10 / 20 = 10,000 Example 40
  • 103. • Number of pages in the table: 20,000 • STATS SAMPLE PAGES: 5,000 • Unique values in the secondary index: • In sample pages: 10 • In the table: 11 • Cardinality: 20,000 * 10 / 5,000 = 40 Example 2 41
  • 104. • Time consuming mysql> select count(*) from goods; +----------+ | count(*) | +----------+ | 80303000 | +----------+ 1 row in set (35.95 sec) Use Larger STATS SAMPLE PAGES? 42
  • 105. • Time consuming • With default STATS SAMPLE PAGES mysql> analyze table goods; +------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +------------+---------+----------+----------+ | test.goods | analyze | status | OK | +------------+---------+----------+----------+ 1 row in set (0.32 sec) Use Larger STATS SAMPLE PAGES? 42
  • 106. • Time consuming • With bigger number mysql> alter table goods STATS_SAMPLE_PAGES=5000; Query OK, 0 rows affected (0.04 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> analyze table goods; +------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +------------+---------+----------+----------+ | test.goods | analyze | status | OK | +------------+---------+----------+----------+ 1 row in set (27.13 sec) Use Larger STATS SAMPLE PAGES? 42
  • 107. • Time consuming • With bigger number • 27.13/0.32 = 85 times slower! Use Larger STATS SAMPLE PAGES? 42
  • 108. • Time consuming • With bigger number • 27.13/0.32 = 85 times slower! • Not always a solution Use Larger STATS SAMPLE PAGES? 42
  • 109. Even Worse Use Case Example
  • 110. • goods characteristics CREATE TABLE ‘goods_characteristics‘ ( ‘id‘ int(11) NOT NULL AUTO_INCREMENT, ‘good_id‘ varchar(30) DEFAULT NULL, ‘size‘ int(11) DEFAULT NULL, ‘manufacturer‘ varchar(30) DEFAULT NULL, PRIMARY KEY (‘id‘), KEY ‘good_id‘ (‘good_id‘,‘size‘,‘manufacturer‘), KEY ‘size‘ (‘size‘,‘manufacturer‘) ) ENGINE=InnoDB AUTO_INCREMENT=196606 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci Two Similar Tables 44
  • 111. • goods shops CREATE TABLE ‘goods_shops‘ ( ‘id‘ int(11) NOT NULL AUTO_INCREMENT, ‘good_id‘ varchar(30) DEFAULT NULL, ‘location‘ varchar(30) DEFAULT NULL, ‘delivery_options‘ varchar(30) DEFAULT NULL, PRIMARY KEY (‘id‘), KEY ‘good_id‘ (‘good_id‘,‘location‘,‘delivery_options‘), KEY ‘location‘ (‘location‘,‘delivery_options‘) ) ENGINE=InnoDB AUTO_INCREMENT=131071 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci Two Similar Tables 44
  • 112. • Size mysql> select count(*) from goods_characteristics; +----------+ | count(*) | +----------+ | 131072 | +----------+ 1 row in set (0.08 sec) mysql> select count(*) from goods_shops; +----------+ | count(*) | +----------+ | 65536 | +----------+ 1 row in set (0.04 sec) Two Similar Tables 44
  • 113. • Data Distribution: goods characteristics mysql> select count(*) num_rows, good_id, size -> from goods_characteristics group by good_id, size; +----------+---------+------+ | num_rows | good_id | size | +----------+---------+------+ | 65536 | laptop | 7 | | 8189 | laptop | 13 | | 8187 | laptop | 8 | | 8191 | laptop | 14 | | 8190 | laptop | 9 | | 8190 | laptop | 15 | | 8188 | laptop | 10 | | 10 | laptop | 16 | | 8192 | laptop | 11 | | 10 | laptop | 17 | | 8189 | laptop | 12 | +----------+---------+------+ Two Similar Tables 44
  • 114. • Data Distribution: goods characteristics mysql> select count(*) num_rows, good_id, manufacturer -> from goods_characteristics group by good_id, manufacturer order by num_rows desc; +----------+---------+--------------+ | num_rows | good_id | manufacturer | +----------+---------+--------------+ | 65536 | laptop | Noname | | 8189 | laptop | Toshiba | | 8191 | laptop | Samsung | | 8189 | laptop | Apple | | 8191 | laptop | Acer | | 8189 | laptop | Asus | | 8189 | laptop | Dell | | 10 | laptop | Sony | | 8189 | laptop | HP | | 10 | laptop | Casper | | 8189 | laptop | Lenovo | +----------+---------+--------------+ Two Similar Tables 44
  • 115. • Data Distribution: goods shops mysql> select count(*) num_rows, good_id, location -> from goods_shops group by good_id, location order by num_rows desc; +----------+---------+---------------+ | num_rows | good_id | location | +----------+---------+---------------+ | 8191 | laptop | New York | | 8189 | laptop | Tokio | | 8191 | laptop | San Francisco | | 8189 | laptop | Istanbul | | 8189 | laptop | Paris | | 8189 | laptop | London | | 8189 | laptop | Berlin | | 10 | laptop | Moscow | | 8189 | laptop | Brussels | | 10 | laptop | Kiev | +----------+---------+---------------+ Two Similar Tables 44
  • 116. • Data Distribution: goods shops mysql> select count(*) num_rows, good_id, delivery_options -> from goods_shops group by good_id, delivery_options order by num_rows desc; +----------+---------+------------------+ | num_rows | good_id | delivery_options | +----------+---------+------------------+ | 8192 | laptop | DHL | | 8189 | laptop | Gruzovichkof | | 8191 | laptop | PTT | | 8188 | laptop | Courier | | 8190 | laptop | Normal Post | | 8187 | laptop | No delivery | | 8190 | laptop | Tracked | | 10 | laptop | Premium | | 8189 | laptop | Fedex | | 10 | laptop | Urgent | +----------+---------+------------------+ Two Similar Tables 44
  • 117. Histogram statistics are useful primarily for nonindexed columns. Adding an index to a column for which histogram statistics are applicable might also help the optimizer make row estimates. The tradeoffs are: An index must be updated when table data is modified. A histogram is created or updated only on demand, so it adds no overhead when table data is modified. On the other hand, the statistics become progres- sively more out of date when table modifications occur, until the next time they are updated. MySQL User Reference Manual Optimizer Statistics aka Histograms 45
  • 118. mysql> alter table goods_characteristics stats_sample_pages=5000; Query OK, 0 rows affected (0.02 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> alter table goods_shops stats_sample_pages=5000; Query OK, 0 rows affected (0.05 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> analyze table goods_characteristics, goods_shops; +----------------------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +----------------------------+---------+----------+----------+ | test.goods_characteristics | analyze | status | OK | | test.goods_shops | analyze | status | OK | +----------------------------+---------+----------+----------+ 2 rows in set (0.35 sec) Index Statistics is More than Good 46
  • 119. • The query mysql> select count(*) from goods_shops join goods_characteristics -> using (good_id) -> where size < 12 and -> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or -> delivery_options in (’Premium’, ’Urgent’)); ^C^C -- query aborted ERROR 1317 (70100): Query execution was interrupted Performance 47
  • 120. • Handlers mysql> show status like ’Handler%’; +----------------------------+-------------+ | Variable_name | Value | +----------------------------+-------------+ | Handler_commit | 0 | | Handler_delete | 0 | | Handler_discover | 0 | | Handler_external_lock | 4 | | Handler_mrr_init | 0 | | Handler_prepare | 0 | | Handler_read_first | 1 | | Handler_read_key | 13043 | | Handler_read_last | 0 | | Handler_read_next | 854,767,916 | ... Performance 47
  • 121. • Table order mysql> explain select count(*) from goods_shops join goods_characteristics -> using (good_id) where size < 12 and -> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or -> delivery_options in (’Premium’, ’Urgent’)); +----+-----------------------+-------+---------+--------+----------+---------------+ | id | table | type | key | rows | filtered | Extra | +----+-----------------------+-------+---------+--------+----------+---------------+ | 1 | goods_characteristics | index | good_id | 131072 | 25.00 | Using... | | 1 | goods_shops | ref | good_id | 65536 | 36.00 | Using... | +----+-----------------------+-------+---------+--------+----------+---------------+ 2 rows in set, 1 warning (0.00 sec) Performance 47
  • 122. • Table order matters mysql> explain select count(*) from goods_shops straight_join goods_characteristics -> using (good_id) where size < 12 and -> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or -> delivery_options in (’Premium’, ’Urgent’)); +----+-----------------------+-------+---------+--------+----------+---------------+ | id | table | type | key | rows | filtered | Extra | +----+-----------------------+-------+---------+--------+----------+---------------+ | 1 | goods_shops | index | good_id | 65536 | 36.00 | Using... | | 1 | goods_characteristics | ref | good_id | 131072 | 25.00 | Using... | +----+-----------------------+-------+---------+--------+----------+---------------+ 2 rows in set, 1 warning (0.00 sec) Performance 47
  • 123. • Table order matters mysql> select count(*) from goods_shops straight_join goods_characteristics -> using (good_id) -> where size < 12 and -> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or -> delivery_options in (’Premium’, ’Urgent’)); +----------+ | count(*) | +----------+ | 816640 | +----------+ 1 row in set (2.11 sec) Performance 47
  • 124. • Table order matters mysql> show status like ’Handler_read_next’; +-------------------+-----------+ | Variable_name | Value | +-------------------+-----------+ | Handler_read_next | 5,308,416 | +-------------------+-----------+ 1 row in set (0.00 sec) Performance 47
  • 125. • Not for all data mysql> select count(*) from goods_shops straight_join goods_characteristics -> using (good_id) -> where (size > 15 or manufacturer in (’Sony’, ’Casper’)) -> and location in -> (’New York’, ’San Francisco’, ’Paris’, ’Berlin’, ’Brussels’, ’London’) -> and delivery_options in -> (’DHL’,’Normal Post’, ’Tracked’, ’Fedex’, ’No delivery’); ^C^C -- query aborted ERROR 1317 (70100): Query execution was interrupted Performance 47
  • 126. • Not for all data mysql> show status like ’Handler%’; +----------------------------+------------+ | Variable_name | Value | +----------------------------+------------+ | Handler_commit | 10 | | Handler_delete | 0 | | Handler_discover | 0 | | Handler_external_lock | 28 | | Handler_mrr_init | 0 | | Handler_prepare | 0 | | Handler_read_first | 1 | | Handler_read_key | 143 | | Handler_read_last | 0 | | Handler_read_next | 16,950,265 | Performance 47
  • 127. mysql> analyze table goods_shops update histogram -> on location, delivery_options; +-------------+-----------+----------+--------------------------------+ | Table | Op | Msg_type | Msg_text | +-------------+-----------+----------+--------------------------------+ | goods_shops | histogram | status | Histogram statistics created for column ’delivery_options’. | | goods_shops | histogram | status | Histogram statistics created for column ’location’. | +-------------+-----------+----------+--------------------------------+ 2 rows in set (0.18 sec) Histograms to The Rescue 48
  • 128. mysql> analyze table goods_characteristics update histogram -> on size, manufacturer ; +-----------------------+-----------+----------+------------------------------+ | Table | Op | Msg_type | Msg_text | +-----------------------+-----------+----------+------------------------------+ | goods_characteristics | histogram | status | Histogram statistics created for column ’manufacturer’. | | goods_characteristics | histogram | status | Histogram statistics created for column ’size’. | +-----------------------+-----------+----------+------------------------------+ 2 rows in set (0.23 sec) Histograms to The Rescue 48
  • 129. • The query mysql> select count(*) from goods_shops join goods_characteristics -> using (good_id) -> where size < 12 and -> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or -> delivery_options in (’Premium’, ’Urgent’)); +----------+ | count(*) | +----------+ | 816640 | +----------+ 1 row in set (2.16 sec) Histograms to The Rescue 48
  • 130. • The query mysql> show status like ’Handler_read_next’; +-------------------+-----------+ | Variable_name | Value | +-------------------+-----------+ | Handler_read_next | 5,308,418 | +-------------------+-----------+ 1 row in set (0.00 sec) Histograms to The Rescue 48
  • 131. • Filtering effect mysql> explain select count(*) from goods_shops join goods_characteristics -> using (good_id) -> where size < 12 and -> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or -> delivery_options in (’Premium’, ’Urgent’)); +----+-----------------------+-------+---------+--------+----------+----------+ | id | table | type | key | rows | filtered | Extra | +----+-----------------------+-------+---------+--------+----------+----------+ | 1 | goods_shops | index | good_id | 65536 | 0.06 | Using... | | 1 | goods_characteristics | ref | good_id | 131072 | 15.63 | Using... | +----+-----------------------+-------+---------+--------+----------+----------+ 2 rows in set, 1 warning (0.00 sec) Histograms to The Rescue 48
  • 134. ↓ sql/sql planner.cc ↓ calculate condition filter Low Level 50
  • 135. ↓ sql/sql planner.cc ↓ calculate condition filter ↓ Item func *::get filtering effect Low Level 50
  • 136. ↓ sql/sql planner.cc ↓ calculate condition filter ↓ Item func *::get filtering effect • get histogram selectivity Low Level 50
  • 137. ↓ sql/sql planner.cc ↓ calculate condition filter ↓ Item func *::get filtering effect • get histogram selectivity • Seen as a percent of filtered rows in EXPLAIN Low Level 50
  • 138. • Example data mysql> create table example(f1 int) engine=innodb; mysql> insert into example values(1),(1),(1),(2),(3); mysql> select f1, count(f1) from example group by f1; +------+-----------+ | f1 | count(f1) | +------+-----------+ | 1 | 3 | | 2 | 1 | | 3 | 1 | +------+-----------+ 3 rows in set (0.00 sec) Filtered Rows 51
  • 139. • Without a histogram mysql> explain select * from example where f1 > 0G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 51
  • 140. • Without a histogram mysql> explain select * from example where f1 > 1G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 51
  • 141. • Without a histogram mysql> explain select * from example where f1 > 2G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 51
  • 142. • Without a histogram mysql> explain select * from example where f1 > 3G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 51
  • 143. • With the histogram mysql> analyze table example update histogram on f1 with 3 buckets; +-----------------+-----------+----------+------------------------------+ | Table | Op | Msg_type | Msg_text | +-----------------+-----------+----------+------------------------------+ | hist_ex.example | histogram | status | Histogram statistics created for column ’f1’. | +-----------------+-----------+----------+------------------------------+ 1 row in set (0.03 sec) Filtered Rows 51
  • 144. • With the histogram mysql> select * from information_schema.column_statistics -> where table_name=’example’G *************************** 1. row *************************** SCHEMA_NAME: hist_ex TABLE_NAME: example COLUMN_NAME: f1 HISTOGRAM: "buckets": [[1, 0.6], [2, 0.8], [3, 1.0]], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2018-11-07 09:07:19.791470", "sampling-rate": 1.0, "histogram-type": "singleton", "number-of-buckets-specified": 3 1 row in set (0.00 sec) Filtered Rows 51
  • 145. • With the histogram mysql> explain select * from example where f1 > 0G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 100.00 -- all rows Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 51
  • 146. • With the histogram mysql> explain select * from example where f1 > 1G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 40.00 -- 2 rows Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 51
  • 147. • With the histogram mysql> explain select * from example where f1 > 2G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 20.00 -- one row Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 51
  • 148. • With the histogram mysql> explain select * from example where f1 > 3G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 20.00 - one row Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 51
  • 149. 1 2 3 0 0.5 1 1.5 2 Indexes: Cardinality 52
  • 152. • CREATE INDEX • Metadata lock • Can be blocked by any query Maintenance: Locking 55
  • 153. • CREATE INDEX • Metadata lock • Can be blocked by any query • UPDATE HISTOGRAM • Backup lock • Can be locked only by a backup • Can be created any time without fear Maintenance: Locking 55
  • 154. • CREATE INDEX • Locks writes • Locks reads ∗ PS-2503 Before Percona Server 5.6.38-83.0/5.7.20-18 Upstream • Every DML updates the index Maintenance: Load 56
  • 155. • CREATE INDEX • Locks writes • Locks reads ∗ • Every DML updates the index • UPDATE HISTOGRAM • Uses up to histogram generation max mem size • Persistent after creation • DML do not touch it Maintenance: Load 56
  • 156. • Helps if query plan can be changed • Not a replacement for the index: • GROUP BY • ORDER BY • Query on a single table ∗ Histograms 57
  • 157. • Data distribution is uniform • Range optimization can be used • Full table scan is fast When Histogram are Not Helpful? 58
  • 158. • Index statistics collected by the engine • Optimizer calculates Cardinality each time when it accesses statistics • Indexes don’t always improve performance • Histograms can help Still new feature • Histograms do not replace other optimizations! Conclusion 59
  • 159. MySQL User Reference Manual Blog by Erik Froseth Blog by Frederic Descamps Talk by Oystein Grovlen @Fosdem Talk by Sergei Petrunia @PerconaLive WL #8707 More information 60