We store data in order to use it: to search, retrieve, group, and sort it. To do this efficiently, the MySQL Optimizer uses index statistics when it compiles the query execution plan. This approach works very well unless the data is distributed unevenly.
Last year I worked on several tickets where the data followed the same pattern: millions of popular products fit into a couple of categories, and the remaining products were spread across the other categories. We had a hard time finding a way to retrieve goods quickly. We offered workarounds for version 5.7. However, a new feature in MariaDB and MySQL 8.0, histograms, would work better, cleaner, and faster. That is how the idea for this talk was born.
Of course, histograms are not a panacea and do not help in every situation.
I will discuss:
how index statistics are physically stored by the storage engine
which data is exchanged with the Optimizer
why this is not enough to make the correct index choice
when histograms can help and when they cannot
differences between MySQL and MariaDB histograms
Billion Goods in Few Categories: How Histograms Save a Life?
1. Billion Goods in Few Categories
How Histograms Save a Life?
Sveta Smirnova
Percona
2. • Introduction
• The Use Case
  • The Cardinality: Two Levels
  • Example
• Why the Difference?
• Even Worse Use Case
  • ANALYZE TABLE Limitations
  • Example
• How Histograms Work?
• Left Overs
Table of Contents
2
3. The column_statistics data dictionary table stores histogram statistics about
column values, for use by the optimizer in constructing query execution plans.
MySQL User Reference Manual
Optimizer Statistics aka Histograms
3
4. • MySQL Support engineer
• Author of
  • MySQL Troubleshooting
  • JSON UDF functions
  • FILTER clause for MySQL
• Speaker
  • Percona Live, OOW, Fosdem, DevConf, HighLoad...
Sveta Smirnova
4
6. • Hardware
• Wise options
• Optimized queries
• Brain
Everything can Be Resolved!
6
8. • This talk is about
  • How I spent the last three years
  • Resolving the same issue
  • For different customers
• Task was to speed up the query
Not Everything
7
11. • Specific data distribution
• Access on different fields
  • ON goods.shop_id = shop.id
  • WHERE shop.location IN (...)
  • GROUP BY goods.category, shop.profile
  • ORDER BY shop.distance, goods.quantity
• Index cannot be used effectively
Not All the Queries Can be Optimized
8
12. • Data distribution varies
• Big difference between number of values
  Red 1,000,000
  Green 2
  Blue 100,000
Latest Support Tickets
9
13. • Data distribution varies
• Constantly changing
  Red 100,000
  Green 1,000,000
  Blue 10
Latest Support Tickets
9
14. • Data distribution varies
• Constantly changing
  Red 1,000
  Green 2,000
  Blue 50,000
Latest Support Tickets
9
15. • Data distribution varies
• Cardinality is not correct
  • Was not updated in time
  • Updates too often
  • Calculated wrongly
Latest Support Tickets
9
16. • Data distribution varies
• Cardinality is not correct
• Index maintenance is expensive
  • Hardware resources
  • Slow updates
  • Window to run CREATE INDEX
Latest Support Tickets
9
17. • Data distribution varies
• Cardinality is not correct
• Index maintenance is expensive
• Optimizer does not work as we wish it
  Examples in my talk @Percona Live Frankfurt
Latest Support Tickets
9
18. • Topic based on real Support cases
• Couple of them are still in progress
Disclaimer
10
19. • Topic based on real Support cases
• All examples are 100% fake
  • They are created so that
    • No customer can be identified
  • Everything generated
    Table names
    Column names
    Data
  • Use case itself is fictional
Disclaimer
10
20. • Topic based on real Support cases
• All examples are 100% fake
• All examples are simplified
  • Only columns, required to show the issue
  • Everything extra removed
  • Real tables usually store much more data
Disclaimer
10
21. • Topic based on real Support cases
• All examples are 100% fake
• All examples are simplified
• All disasters happened with version 5.7
Disclaimer
10
24. • categories
  • Less than 20 rows
• goods
  • More than 1M rows
  • 20 unique cat_id values
  • Many other fields
    Price
    Date: added, last updated, etc.
    Characteristics
    Store
    ...
Two Tables
12
29. • Select from the small table
• For each cat_id select from the large table
• Filter result on date_added[ and price[...]]
• Slow with many items in the category
Option 1: Select from the Small Table First
14
41. • Filter rows by date_added[ and price[...]]
• Get cat_id values
• Retrieve rows from the small table
• Slow if the number of rows filtered by date_added is larger than the number of goods in the selected categories
Option 2: Select From the Large Table First
16
48. • CREATE INDEX index_everything
  (cat_id, date_added[, price[, ...]])
• It resolves the issue
• But not in all cases
What if We use Combined Indexes?
18
50. • Maintenance cost
  • Slower INSERT/UPDATE/DELETE
  • Disk space
• Index not useful for selecting rows
JOIN categories ON (categories.id=goods.cat_id)
JOIN shops ON (shops.id=goods.shop_id)
[ JOIN ... ]
WHERE
date_added between '2018-07-01' and '2018-08-01'
AND
cat_id in (16,11) AND price >= 1000 AND price <=10000 [ AND ... ]
GROUP BY product_type
ORDER BY date_updated DESC
LIMIT 50,100
The Problem
19
51. • Maintenance cost
  • Slower INSERT/UPDATE/DELETE
  • Disk space
• Index not useful for selecting rows
• Tables may have wrong cardinality
The Problem
19
56. • Number of unique values in the index
• Optimizer uses for the query execution plan
• Example
  • ID: 1,2,3,4,5
  • Number of rows: 5
  • Cardinality: 5
Cardinality
23
57. • Number of unique values in the index
• Optimizer uses for the query execution plan
• Example
  • Gender: m,f,f,f,f,m,m,m,m,m,m,f,f,m,f,m,f
  • Number of rows: 17
  • Cardinality: 2
Cardinality
23
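The two examples above boil down to counting distinct values. A minimal Python sketch follows; the helper name `cardinality` is made up for illustration, and it counts exactly, whereas the storage engine estimates the figure from sampled pages.

```python
def cardinality(values):
    """Exact number of distinct values in a column (illustrative helper)."""
    return len(set(values))


ids = [1, 2, 3, 4, 5]
genders = "m,f,f,f,f,m,m,m,m,m,m,f,f,m,f,m,f".split(",")

print(len(ids), cardinality(ids))          # 5 5
print(len(genders), cardinality(genders))  # 17 2
```

A low cardinality relative to the row count, as in the gender column, means that each key value matches many rows.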
58. • Stores statistics on disk
  • mysql.innodb_table_stats
  • mysql.innodb_index_stats
InnoDB: Overview
24
61. • Stores statistics on disk
• Returns statistics to Optimizer
  • In ha_innobase::info
  • handler/ha_innodb.cc
• When opens table
  • flag = HA_STATUS_CONST
  • Reads data from disk
  • Stores it in memory
InnoDB: Overview
24
62. • Stores statistics on disk
• Returns statistics to Optimizer
  • In ha_innobase::info
  • handler/ha_innodb.cc
• When opens table
• Subsequent table accesses
  • flag = HA_STATUS_VARIABLE
  • Statistics from memory
  • Up to date Primary Key data
InnoDB: Overview
24
63. • Table created with option STATS_AUTO_RECALC = 0
  • Before ANALYZE TABLE
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 64
...
InnoDB: Flow
25
64. • Table created with option STATS_AUTO_RECALC = 0
  • After ANALYZE TABLE
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 2
...
InnoDB: Flow
25
65. • Table created with option STATS_AUTO_RECALC = 0
  • After inserting rows
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 16
...
InnoDB: Flow
25
66. • Table created with option STATS_AUTO_RECALC = 0
  • After restart
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 2
...
InnoDB: Flow
25
70. • Takes data from the engine
  • Class ha_statistics
  • sql/handler.h
• Does not have Cardinality field at all
• Uses formula to calculate Cardinality
Optimizer: Overview
26
73. • n_rows: number of rows in the table
  • Naturally up to date
  • Constantly changing!
• rec_per_key: number of duplicates per key
  • Calculated by InnoDB at ANALYZE time
  • rec_per_key = n_rows / unique values
  • Does not change!
• Cardinality = n_rows / rec_per_key
Optimizer: Formula
27
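To see why this formula drifts, here is a small Python sketch of the arithmetic. The function names and the row counts (8 and 64) are invented for illustration; only the two formulas come from the slide.

```python
def rec_per_key(n_rows_at_analyze, unique_values):
    # Computed when ANALYZE runs and then left unchanged.
    return n_rows_at_analyze / unique_values


def cardinality(n_rows_now, rpk):
    # Recomputed from the current row count and the stale rec_per_key.
    return n_rows_now / rpk


# Hypothetical table: 8 rows and 2 unique values at ANALYZE time.
rpk = rec_per_key(8, 2)

print(cardinality(8, rpk))   # 2.0, correct right after ANALYZE
print(cardinality(64, rpk))  # 16.0 after growing to 64 rows
```

In this made-up scenario the inserted rows add no new key values, so the number of unique values is still 2, yet the reported Cardinality grows eightfold because only n_rows moved.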
76. • Engine stores persistent statistics
  InnoDB storage tables: statistics as calculated
  Row count: only in memory
• Optimizer calculates Cardinality every time it accesses engine statistics
• Weak user control
Persistent Statistics Are Not Persistent
28
78. • EXPLAIN without histograms
mysql> explain select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between '2000-01-01' and '2001-01-01' -- Large range
-> order by goods.cat_id
-> limit 10\G -- We ask for 10 rows only!
Example
30
79. • EXPLAIN without histograms
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: categories -- Small table first
partitions: NULL
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 20
filtered: 70.00
Extra: Using where; Using index;
Using temporary; Using filesort
Example
30
80. • EXPLAIN without histograms
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: goods -- Large table
partitions: NULL
type: ref
possible_keys: cat_id_2
key: cat_id_2
key_len: 5
ref: orig.categories.id
rows: 51827
filtered: 11.11 -- Default value
Extra: Using where
2 rows in set, 1 warning (0.01 sec)
Example
30
81. • Execution time without histograms
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)
mysql> select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between '2000-01-01' and '2001-01-01'
-> order by goods.cat_id
-> limit 10;
ab9f9bb7bc4f357712ec34f067eda364 -
10 rows in set (56.47 sec)
Example
30
82. • Engine statistics without histograms
mysql> show status like 'Handler%';
+----------------------------+--------+
| Variable_name | Value |
+----------------------------+--------+
...
| Handler_read_next | 964718 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 10 |
| Handler_read_rnd_next | 951671 |
...
| Handler_write | 951670 |
+----------------------------+--------+
18 rows in set (0.01 sec)
Example
30
83. • Now let's add the histogram
mysql> analyze table goods update histogram on date_added;
+------------+-----------+----------+------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------+-----------+----------+------------------------------+
| orig.goods | histogram | status | Histogram statistics created
for column 'date_added'. |
+------------+-----------+----------+------------------------------+
1 row in set (2.01 sec)
Example
30
84. • EXPLAIN with the histogram
mysql> explain select goods.* from goods
-> join categories
-> on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between '2000-01-01' and '2001-01-01'
-> order by goods.cat_id
-> limit 10\G
Example
30
85. • EXPLAIN with the histogram
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: goods -- Large table first
partitions: NULL
type: index
possible_keys: cat_id_2
key: cat_id_2
key_len: 5
ref: NULL
rows: 10 -- Same as we asked
filtered: 98.70 -- True numbers
Extra: Using where
Example
30
86. • EXPLAIN with the histogram
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: categories -- Small table
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: orig.goods.cat_id
rows: 1
filtered: 100.00
Extra: Using index
2 rows in set, 1 warning (0.01 sec)
Example
30
87. • Execution time with the histogram
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)
mysql> select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between '2000-01-01' and '2001-01-01'
-> order by goods.cat_id
-> limit 10;
eeb005fae0dd3441c5c380e1d87fee84 -
10 rows in set (0.00 sec) -- 56/0 times faster!
Example
30
96. • ANALYZE TABLE often
• Use large number of STATS_SAMPLE_PAGES
Solutions in 5.7-
38
100. • Counts number of pages in the table
• Takes STATS_SAMPLE_PAGES
• Counts number of unique values in the secondary index in these pages
• Divides the number of pages in the table by the number of sample pages and multiplies the result by the number of unique values
How ANALYZE TABLE Works with InnoDB?
39
102. • Number of pages in the table: 20,000
• STATS_SAMPLE_PAGES: 20 (default)
• Unique values in the secondary index:
  • In sample pages: 10
  • In the table: 11
• Cardinality: 20,000 * 10 / 20 = 10,000
Example
40
103. • Number of pages in the table: 20,000
• STATS_SAMPLE_PAGES: 5,000
• Unique values in the secondary index:
  • In sample pages: 10
  • In the table: 11
• Cardinality: 20,000 * 10 / 5,000 = 40
Example 2
41
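The arithmetic of the two examples can be replayed with a short Python sketch. The function name `estimated_cardinality` is made up here; it mirrors the simplified extrapolation described on the slides, not the actual InnoDB code.

```python
def estimated_cardinality(total_pages, sample_pages, unique_in_sample):
    """Extrapolate distinct values seen in the sample to the whole table."""
    return total_pages * unique_in_sample / sample_pages


true_unique_values = 11  # from the slide: real number of unique values

default_estimate = estimated_cardinality(20_000, 20, 10)
larger_sample_estimate = estimated_cardinality(20_000, 5_000, 10)

print(default_estimate)        # 10000.0
print(larger_sample_estimate)  # 40.0
```

Even with 5,000 sample pages the estimate (40) is still well above the 11 values actually present, although it is far closer than the default estimate of 10,000.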
104. • Time consuming
mysql> select count(*) from goods;
+----------+
| count(*) |
+----------+
| 80303000 |
+----------+
1 row in set (35.95 sec)
Use Larger STATS_SAMPLE_PAGES?
42
105. • Time consuming
  • With default STATS_SAMPLE_PAGES
mysql> analyze table goods;
+------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze | status | OK |
+------------+---------+----------+----------+
1 row in set (0.32 sec)
Use Larger STATS_SAMPLE_PAGES?
42
106. • Time consuming
  • With bigger number
mysql> alter table goods STATS_SAMPLE_PAGES=5000;
Query OK, 0 rows affected (0.04 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> analyze table goods;
+------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze | status | OK |
+------------+---------+----------+----------+
1 row in set (27.13 sec)
Use Larger STATS_SAMPLE_PAGES?
42
108. • Time consuming
  • With bigger number
  • 27.13/0.32 = 85 times slower!
• Not always a solution
Use Larger STATS_SAMPLE_PAGES?
42
114. • Data Distribution: goods_characteristics
mysql> select count(*) num_rows, good_id, manufacturer
-> from goods_characteristics group by good_id, manufacturer order by num_ro
+----------+---------+--------------+
| num_rows | good_id | manufacturer |
+----------+---------+--------------+
|    65536 | laptop  | Noname       |
|     8191 | laptop  | Samsung      |
|     8191 | laptop  | Acer         |
|     8189 | laptop  | Dell         |
|     8189 | laptop  | HP           |
|     8189 | laptop  | Lenovo       |
|     8189 | laptop  | Toshiba      |
|     8189 | laptop  | Apple        |
|     8189 | laptop  | Asus         |
|       10 | laptop  | Sony         |
|       10 | laptop  | Casper       |
+----------+---------+--------------+
Two Similar Tables
44
115. • Data Distribution: goods_shops
mysql> select count(*) num_rows, good_id, location
-> from goods_shops group by good_id, location order by num_rows desc;
+----------+---------+---------------+
| num_rows | good_id | location      |
+----------+---------+---------------+
|     8191 | laptop  | New York      |
|     8191 | laptop  | San Francisco |
|     8189 | laptop  | Paris         |
|     8189 | laptop  | Berlin        |
|     8189 | laptop  | Brussels      |
|     8189 | laptop  | Tokio         |
|     8189 | laptop  | Istanbul      |
|     8189 | laptop  | London        |
|       10 | laptop  | Moscow        |
|       10 | laptop  | Kiev          |
+----------+---------+---------------+
Two Similar Tables
44
116. • Data Distribution: goods_shops
mysql> select count(*) num_rows, good_id, delivery_options
-> from goods_shops group by good_id, delivery_options order by num_rows des
+----------+---------+------------------+
| num_rows | good_id | delivery_options |
+----------+---------+------------------+
|     8192 | laptop  | DHL              |
|     8191 | laptop  | PTT              |
|     8190 | laptop  | Normal Post      |
|     8190 | laptop  | Tracked          |
|     8189 | laptop  | Fedex            |
|     8189 | laptop  | Gruzovichkof     |
|     8188 | laptop  | Courier          |
|     8187 | laptop  | No delivery      |
|       10 | laptop  | Premium          |
|       10 | laptop  | Urgent           |
+----------+---------+------------------+
Two Similar Tables
44
117. Histogram statistics are useful primarily for nonindexed columns. Adding an
index to a column for which histogram statistics are applicable might also help
the optimizer make row estimates. The tradeoffs are:
• An index must be updated when table data is modified.
• A histogram is created or updated only on demand, so it adds no overhead
when table data is modified. On the other hand, the statistics become
progressively more out of date when table modifications occur, until the next
time they are updated.
MySQL User Reference Manual
Optimizer Statistics aka Histograms
45
118. mysql> alter table goods_characteristics stats_sample_pages=5000;
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table goods_shops stats_sample_pages=5000;
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> analyze table goods_characteristics, goods_shops;
+----------------------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+----------------------------+---------+----------+----------+
| test.goods_characteristics | analyze | status | OK |
| test.goods_shops | analyze | status | OK |
+----------------------------+---------+----------+----------+
2 rows in set (0.35 sec)
Index Statistics is More than Good
46
119. • The query
mysql> select count(*) from goods_shops join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')
-> and (location in ('Moscow', 'Kiev') or
-> delivery_options in ('Premium', 'Urgent'));
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
Performance
47
121. • Table order
mysql> explain select count(*) from goods_shops join goods_characteristics
-> using (good_id) where size < 12 and
-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')
-> and (location in ('Moscow', 'Kiev') or
-> delivery_options in ('Premium', 'Urgent'));
+----+-----------------------+-------+---------+--------+----------+------------
| id | table | type | key | rows | filtered | Extra |
+----+-----------------------+-------+---------+--------+----------+------------
| 1 | goods_characteristics | index | good_id | 131072 | 25.00 | Using... |
| 1 | goods_shops | ref | good_id | 65536 | 36.00 | Using... |
+----+-----------------------+-------+---------+--------+----------+------------
2 rows in set, 1 warning (0.00 sec)
Performance
47
122. • Table order matters
mysql> explain select count(*) from goods_shops straight_join goods_characterist
-> using (good_id) where size < 12 and
-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')
-> and (location in ('Moscow', 'Kiev') or
-> delivery_options in ('Premium', 'Urgent'));
+----+-----------------------+-------+---------+--------+----------+------------
| id | table | type | key | rows | filtered | Extra |
+----+-----------------------+-------+---------+--------+----------+------------
| 1 | goods_shops | index | good_id | 65536 | 36.00 | Using... |
| 1 | goods_characteristics | ref | good_id | 131072 | 25.00 | Using... |
+----+-----------------------+-------+---------+--------+----------+------------
2 rows in set, 1 warning (0.00 sec)
Performance
47
123. • Table order matters
mysql> select count(*) from goods_shops straight_join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')
-> and (location in ('Moscow', 'Kiev') or
-> delivery_options in ('Premium', 'Urgent'));
+----------+
| count(*) |
+----------+
| 816640 |
+----------+
1 row in set (2.11 sec)
Performance
47
124. • Table order matters
mysql> show status like 'Handler_read_next';
+-------------------+-----------+
| Variable_name | Value |
+-------------------+-----------+
| Handler_read_next | 5,308,416 |
+-------------------+-----------+
1 row in set (0.00 sec)
Performance
47
125. • Not for all data
mysql> select count(*) from goods_shops straight_join goods_characteristics
-> using (good_id)
-> where (size > 15 or manufacturer in ('Sony', 'Casper'))
-> and location in
-> ('New York', 'San Francisco', 'Paris', 'Berlin', 'Brussels', 'London')
-> and delivery_options in
-> ('DHL','Normal Post', 'Tracked', 'Fedex', 'No delivery');
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
Performance
47
126. • Not for all data
mysql> show status like 'Handler%';
+----------------------------+------------+
| Variable_name | Value |
+----------------------------+------------+
| Handler_commit | 10 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 28 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 1 |
| Handler_read_key | 143 |
| Handler_read_last | 0 |
| Handler_read_next | 16,950,265 |
Performance
47
127. mysql> analyze table goods_shops update histogram
-> on location, delivery_options;
+-------------+-----------+----------+--------------------------------+
| Table | Op | Msg_type | Msg_text |
+-------------+-----------+----------+--------------------------------+
| goods_shops | histogram | status | Histogram statistics created
for column 'delivery_options'. |
| goods_shops | histogram | status | Histogram statistics created
for column 'location'. |
+-------------+-----------+----------+--------------------------------+
2 rows in set (0.18 sec)
Histograms to The Rescue
48
128. mysql> analyze table goods_characteristics update histogram
-> on size, manufacturer ;
+-----------------------+-----------+----------+------------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------------+-----------+----------+------------------------------+
| goods_characteristics | histogram | status | Histogram statistics created
for column 'manufacturer'. |
| goods_characteristics | histogram | status | Histogram statistics created
for column 'size'. |
+-----------------------+-----------+----------+------------------------------+
2 rows in set (0.23 sec)
Histograms to The Rescue
48
129. • The query
mysql> select count(*) from goods_shops join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')
-> and (location in ('Moscow', 'Kiev') or
-> delivery_options in ('Premium', 'Urgent'));
+----------+
| count(*) |
+----------+
| 816640 |
+----------+
1 row in set (2.16 sec)
Histograms to The Rescue
48
130. • The query
mysql> show status like 'Handler_read_next';
+-------------------+-----------+
| Variable_name | Value |
+-------------------+-----------+
| Handler_read_next | 5,308,418 |
+-------------------+-----------+
1 row in set (0.00 sec)
Histograms to The Rescue
48
131. • Filtering effect
mysql> explain select count(*) from goods_shops join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')
-> and (location in ('Moscow', 'Kiev') or
-> delivery_options in ('Premium', 'Urgent'));
+----+-----------------------+-------+---------+--------+----------+----------+
| id | table | type | key | rows | filtered | Extra |
+----+-----------------------+-------+---------+--------+----------+----------+
| 1 | goods_shops | index | good_id | 65536 | 0.06 | Using... |
| 1 | goods_characteristics | ref | good_id | 131072 | 15.63 | Using... |
+----+-----------------------+-------+---------+--------+----------+----------+
2 rows in set, 1 warning (0.00 sec)
Histograms to The Rescue
48
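In EXPLAIN output, rows multiplied by filtered/100 estimates how many rows of a table are passed on to the join with the next table. The Python sketch below applies that to the first table of each plan, using the numbers from the EXPLAIN outputs shown in this section; the helper name `rows_passed_on` is made up for illustration.

```python
def rows_passed_on(rows, filtered_percent):
    """Estimated rows joined with the next table: rows * filtered / 100."""
    return rows * filtered_percent / 100


# First table of the plan chosen without histograms (goods_characteristics).
without_histograms = rows_passed_on(131072, 25.00)

# First table of the plan chosen with histograms (goods_shops).
with_histograms = rows_passed_on(65536, 0.06)

print(without_histograms)         # 32768.0
print(round(with_histograms, 2))  # 39.32
```

With the histogram-based filtering estimate, the optimizer expects only a few dozen rows from goods_shops to reach the join, which is why it now puts that table first without needing straight_join.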
137. • sql/sql_planner.cc
  • calculate_condition_filter
  • Item_func_*::get_filtering_effect
    • get_histogram_selectivity
• Seen as a percent of filtered rows in EXPLAIN
Low Level
50
138. • Example data
mysql> create table example(f1 int) engine=innodb;
mysql> insert into example values(1),(1),(1),(2),(3);
mysql> select f1, count(f1) from example group by f1;
+------+-----------+
| f1 | count(f1) |
+------+-----------+
| 1 | 3 |
| 2 | 1 |
| 3 | 1 |
+------+-----------+
3 rows in set (0.00 sec)
Filtered Rows
51
139. • Without a histogram
mysql> explain select * from example where f1 > 0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
140. • Without a histogram
mysql> explain select * from example where f1 > 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
141. • Without a histogram
mysql> explain select * from example where f1 > 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
142. • Without a histogram
mysql> explain select * from example where f1 > 3\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
143. • With the histogram
mysql> analyze table example update histogram on f1 with 3 buckets;
+-----------------+-----------+----------+------------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------+-----------+----------+------------------------------+
| hist_ex.example | histogram | status | Histogram statistics created
for column 'f1'. |
+-----------------+-----------+----------+------------------------------+
1 row in set (0.03 sec)
Filtered Rows
51
144. • With the histogram
mysql> select * from information_schema.column_statistics
-> where table_name='example'\G
*************************** 1. row ***************************
SCHEMA_NAME: hist_ex
TABLE_NAME: example
COLUMN_NAME: f1
HISTOGRAM: {"buckets": [[1, 0.6], [2, 0.8], [3, 1.0]],
"data-type": "int", "null-values": 0.0, "collation-id": 8,
"last-updated": "2018-11-07 09:07:19.791470",
"sampling-rate": 1.0, "histogram-type": "singleton",
"number-of-buckets-specified": 3}
1 row in set (0.00 sec)
Filtered Rows
51
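Each singleton bucket above stores a value together with a cumulative frequency. As a rough illustration, the Python sketch below turns the cumulative numbers back into per-value frequencies; the function name is made up, and the bucket list is copied from the HISTOGRAM column shown above.

```python
# Buckets copied from the HISTOGRAM column: [value, cumulative frequency].
buckets = [[1, 0.6], [2, 0.8], [3, 1.0]]


def per_value_frequency(buckets):
    """Convert cumulative frequencies into the share of rows per value."""
    freqs = {}
    previous = 0.0
    for value, cumulative in buckets:
        # Rounding hides binary floating-point noise such as 0.2000000001.
        freqs[value] = round(cumulative - previous, 4)
        previous = cumulative
    return freqs


frequencies = per_value_frequency(buckets)
print(frequencies)  # {1: 0.6, 2: 0.2, 3: 0.2}
```

These shares match the example data: value 1 occupies 3 of 5 rows, while values 2 and 3 occupy 1 of 5 rows each.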
145. • With the histogram
mysql> explain select * from example where f1 > 0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 100.00 -- all rows
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
146. • With the histogram
mysql> explain select * from example where f1 > 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 40.00 -- 2 rows
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
147. • With the histogram
mysql> explain select * from example where f1 > 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 20.00 -- one row
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
148. • With the histogram
mysql> explain select * from example where f1 > 3\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 20.00 -- one row
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
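The filtered values for f1 > 0, f1 > 1 and f1 > 2 can be reproduced from the cumulative bucket frequencies. The Python sketch below is a simplified model written for this example only, not the server's implementation, and it does not try to model what the server reports for values beyond the last bucket; the function name is made up.

```python
# Buckets of the singleton histogram on f1: [value, cumulative frequency].
buckets = [[1, 0.6], [2, 0.8], [3, 1.0]]


def selectivity_greater_than(buckets, v):
    """Share of rows with column > v, from cumulative frequencies."""
    at_or_below = 0.0
    for value, cumulative in buckets:
        if value <= v:
            at_or_below = cumulative
    return 1.0 - at_or_below


filtered = {v: round(selectivity_greater_than(buckets, v) * 100, 2)
            for v in (0, 1, 2)}
print(filtered)  # {0: 100.0, 1: 40.0, 2: 20.0}
```

Without the histogram every one of these conditions got the same default guess of 33.33; with it, the estimate follows the real distribution of the five rows.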
152.
                 Histograms        Indexes
Maintained by    Optimizer         Storage Engine
Updated          On Demand         On every DML †
Storage          Light             Heavy
Optimizer Uses   Real Numbers ††   Cardinality
† Unless persistent statistics used
†† For up to 1024 buckets
Histograms vs Indexes
55
154. • CREATE INDEX
  • Metadata lock
  • Can be blocked by any query
• UPDATE HISTOGRAM
  • Backup lock
  • Can be locked only by a backup
  • Can be created any time without fear
Maintenance: Locking
56
155. • CREATE INDEX
  • Locks writes
  • Locks reads †
  • Every DML updates the index
† PS-2503: Before Percona Server 5.6.38-83.0/5.7.20-18; Upstream
Maintenance: Load
57
156. • CREATE INDEX
  • Locks writes
  • Locks reads †
  • Every DML updates the index
• UPDATE HISTOGRAM
  • Uses up to histogram_generation_max_mem_size
  • Persistent after creation
  • DML does not touch it
Maintenance: Load
57
157. • Helps if query plan can be changed
• Not a replacement for the index:
  • GROUP BY
  • ORDER BY
  • Query on a single table †
† Only if filtering effect can change the plan
Histograms
58
158. • Data distribution is uniform
• Range optimization can be used
• Full table scan is fast
When Histograms Are Not Helpful?
59
159. • Index statistics collected by the engine
• Optimizer calculates Cardinality each time it accesses statistics
• Indexes don't always improve performance
• Histograms can help
  Still new feature
• Histograms do not replace other optimizations!
Conclusion
60
160. MySQL User Reference Manual
Blog by Erik Froseth
Blog by Frederic Descamps
Talk by Oystein Grovlen @Fosdem
Talk by Sergei Petrunia @PerconaLive
WL #8707
More information
61