Optimizer Histograms: When they Help and When Do Not?

Talk for pre-Fosdem MySQL Day on February 1, 2019.

Last year I worked on several tickets where the data followed the same pattern: millions of popular products fit into a couple of categories, and the remaining products were spread across the remaining categories. We had a hard time finding a way to retrieve goods quickly.

MySQL 8.0 has a feature which resolves such issues: optimizer histograms, storing statistics of an exact number of values in each data bucket.
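
As a rough illustration of what such statistics contain, the sketch below builds a singleton-style histogram: one bucket per distinct value, each holding the cumulative fraction of rows up to that value. The function name `build_singleton_histogram` is hypothetical and not part of MySQL; the input is the five-row toy column (`example.f1`) used later in the talk, and the result has the same shape as the `buckets` array shown on the slides.

```python
from collections import Counter


def build_singleton_histogram(values):
    """Return [(value, cumulative_frequency), ...] sorted by value.

    Hypothetical helper for illustration only; MySQL builds its
    histograms internally via ANALYZE TABLE ... UPDATE HISTOGRAM.
    """
    counts = Counter(values)
    total = len(values)
    buckets = []
    rows_seen = 0
    for value in sorted(counts):
        rows_seen += counts[value]
        # Store the fraction of rows with a value <= this one.
        buckets.append((value, rows_seen / total))
    return buckets


if __name__ == "__main__":
    # Toy data from the talk: insert into example values(1),(1),(1),(2),(3)
    print(build_singleton_histogram([1, 1, 1, 2, 3]))
```

For the toy column, value 1 covers 3 of 5 rows, so its bucket carries 0.6; the buckets for 2 and 3 carry 0.8 and 1.0.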

However, in real life histograms do not help with every query that accesses non-uniform data. How you write the query, the number of rows in the table, and the data distribution may all affect whether histograms are used.

In this session I show examples that demonstrate how the Optimizer uses histograms.
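
To make the effect of data distribution concrete, here is a minimal sketch that recomputes the `filtered` estimate for the `goods_shops` condition used in the talk (`location in ('Moscow', 'Kiev') or delivery_options in ('Premium', 'Urgent')`). The row counts (65536 rows in total, 10 rows per listed value) come from the slides. Two things are assumptions: the fixed guess of 0.1 per IN-list value, chosen because it reproduces the 36.00 shown in EXPLAIN without histograms, and the independence formula used to combine the two sides of the OR. All function names are hypothetical.

```python
TABLE_ROWS = 65536               # count(*) from goods_shops, per the slides
ASSUMED_GUESS_PER_VALUE = 0.1    # assumption: fixed guess when no statistics exist


def or_selectivity(p_a, p_b):
    """Combine two predicates with OR, assuming they are independent."""
    return p_a + p_b - p_a * p_b


def in_list_guess(n_values):
    """Selectivity of an IN list when only a fixed per-value guess is available."""
    return min(1.0, n_values * ASSUMED_GUESS_PER_VALUE)


def in_list_from_counts(row_counts, table_rows):
    """Selectivity of an IN list when per-value row counts are known."""
    return sum(row_counts) / table_rows


def estimates():
    """Return (filtered % without statistics, filtered % with real counts)."""
    guessed = or_selectivity(in_list_guess(2), in_list_guess(2))
    informed = or_selectivity(
        in_list_from_counts([10, 10], TABLE_ROWS),   # Moscow, Kiev
        in_list_from_counts([10, 10], TABLE_ROWS),   # Premium, Urgent
    )
    return round(guessed * 100, 2), round(informed * 100, 2)


if __name__ == "__main__":
    guessed_pct, informed_pct = estimates()
    print("filtered without statistics:", guessed_pct)
    print("filtered with real counts:  ", informed_pct)
```

Under these assumptions the two results line up with the 36.00 and 0.06 that EXPLAIN reports for `goods_shops` before and after the histograms are created.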


  1. 1. Optimizer Histograms When they Help and When Do Not? February, 01, 2019 Sveta Smirnova
  2. 2. • MySQL Support engineer • Author of • MySQL Troubleshooting • JSON UDF functions • FILTER clause for MySQL • Speaker • Percona Live, OOW, Fosdem, DevConf, HighLoad... Sveta Smirnova 2
  3. 3. •Why do I Care? •The Use Case •Even Worse Use Case •Why the Difference? •How Histograms Work? Table of Contents 3
  4. 4. "The column_statistics data dictionary table stores histogram statistics about column values, for use by the optimizer in constructing query execution plans." (MySQL User Reference Manual) Optimizer Statistics aka Histograms 4
  5. 5. Why do I Care?
  6. 6. • Data distribution varies • Big difference between number of values • Constantly changing Latest Support Tickets 6
  7. 7. • Data distribution varies • Cardinality is not correct • Was not updated in time • Updates too often • Calculated wrongly Latest Support Tickets 6
  8. 8. • Data distribution varies • Cardinality is not correct • Index maintenance costs a lot • Hardware resources • Slow updates • Window to run CREATE INDEX Latest Support Tickets 6
  9. 9. • Data distribution varies • Cardinality is not correct • Index maintenance costs a lot • Optimizer does not work as we wish it to (examples in my talk @Percona Live) Latest Support Tickets 6
  10. 10. • Topic based on real Support cases • Couple of them are still in progress Disclaimer 7
  11. 11. • Topic based on real Support cases • All examples are 100% fake • They are created so that • No customer can be identified • Everything is generated: table names, column names, data • Use case itself is fictional Disclaimer 7
  12. 12. • Topic based on real Support cases • All examples are 100% fake • All examples are simplified • Only the columns required to show the issue • Everything extra removed • Real tables usually store much more data Disclaimer 7
  13. 13. • Topic based on real Support cases • All examples are 100% fake • All examples are simplified • All disasters happened with version 5.7 Disclaimer 7
  14. 14. The Use Case
  15. 15. • categories • Less than 20 rows Two tables 9
  16. 16. • categories • Less than 20 rows • goods • More than 1M rows • 20 unique cat_id values • Many other fields: price; dates (added, last updated, etc.); characteristics; store; ... Two tables 9
  17. 17. select * from goods join categories on (categories.id=goods.cat_id) where date_added between '2018-07-01' and '2018-08-01' and cat_id in (16,11) and price >= 1000 and price <= 10000 [ and ... ] [ GROUP BY ... [ORDER BY ... [ LIMIT ...]]] ; JOIN 10
  18. 18. • Select from the Small Table Option 1: Select from the Small Table First 11
  19. 19. • Select from the Small Table • For each cat_id select from the large table Option 1: Select from the Small Table First 11
  20. 20. • Select from the Small Table • For each cat_id select from the large table • Filter result on date_added[ and price[...]] Option 1: Select from the Small Table First 11
  21. 21. • Select from the Small Table • For each cat_id select from the large table • Filter result on date_added[ and price[...]] • Slow with many items in the category Option 1: Select from the Small Table First 11
  22. 22. • Filter rows by date_added[ and price[...]] Option 2: Select from the Large Table First 12
  23. 23. • Filter rows by date_added[ and price[...]] • Get cat_id values Option 2: Select from the Large Table First 12
  24. 24. • Filter rows by date_added[ and price[...]] • Get cat_id values • Retrieve rows from the small table Option 2: Select from the Large Table First 12
  25. 25. • Filter rows by date_added[ and price[...]] • Get cat_id values • Retrieve rows from the small table • Slow if the number of rows filtered by date_added is larger than the number of goods in the selected categories Option 2: Select from the Large Table First 12
  26. 26. • CREATE INDEX index_everything (cat_id, date_added[, price[, ...]]) • It resolves the issue What if We Use Combined Indexes? 13
  27. 27. • CREATE INDEX index_everything (cat_id, date_added[, price[, ...]]) • It resolves the issue • But not in all cases What if We Use Combined Indexes? 13
  28. 28. • Maintenance cost • Slower INSERT/UPDATE/DELETE • Disk space The Problem 14
  29. 29. • Maintenance cost • Slower INSERT/UPDATE/DELETE • Disk space • Index not useful for selecting rows JOIN categories ON (categories.id=goods.cat_id) JOIN shops ON (shops.id=goods.shop_id) [ JOIN ... ] WHERE date_added between ’2018-07-01’ and ’2018-08-01’ AND cat_id in (16,11) AND price >= 1000 AND price <=10000 [ AND ... ] GROUP BY product_type ORDER BY date_updated DESC LIMIT 50,100 The Problem 14
  30. 30. • Maintenance cost • Slower INSERT/UPDATE/DELETE • Disk space • Index not useful for selecting rows • Tables may have wrong cardinality The Problem 14
  31. 31. • EXPLAIN without histograms mysql> explain select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between '2000-01-01' and '2001-01-01' -- Large range -> order by goods.cat_id -> limit 10\G -- We ask for 10 rows only! Example 15
  32. 32. • EXPLAIN without histograms *************************** 1. row *************************** id: 1 select_type: SIMPLE table: categories -- Small table first partitions: NULL type: index possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 20 filtered: 70.00 Extra: Using where; Using index; Using temporary; Using filesort Example 15
  33. 33. • EXPLAIN without histograms *************************** 2. row *************************** id: 1 select_type: SIMPLE table: goods -- Large table partitions: NULL type: ref possible_keys: cat_id_2 key: cat_id_2 key_len: 5 ref: orig.categories.id rows: 51827 filtered: 11.11 -- Default value Extra: Using where 2 rows in set, 1 warning (0.01 sec) Example 15
  34. 34. • Execution time without histograms mysql> flush status; Query OK, 0 rows affected (0.00 sec) mysql> select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -> order by goods.cat_id -> limit 10; ab9f9bb7bc4f357712ec34f067eda364 - 10 rows in set (56.47 sec) Example 15
  35. 35. • Engine statistics without histograms mysql> show status like ’Handler%’; +----------------------------+--------+ | Variable_name | Value | +----------------------------+--------+ ... | Handler_read_next | 964718 | | Handler_read_prev | 0 | | Handler_read_rnd | 10 | | Handler_read_rnd_next | 951671 | ... | Handler_write | 951670 | +----------------------------+--------+ 18 rows in set (0.01 sec) Example 15
  36. 36. • Now lets add the histogram mysql> analyze table goods update histogram on date_added; +------------+-----------+----------+------------------------------+ | Table | Op | Msg_type | Msg_text | +------------+-----------+----------+------------------------------+ | orig.goods | histogram | status | Histogram statistics created for column ’date_added’. | +------------+-----------+----------+------------------------------+ 1 row in set (2.01 sec) Example 15
  37. 37. • EXPLAIN with the histogram mysql> explain select goods.* from goods -> join categories -> on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between '2000-01-01' and '2001-01-01' -> order by goods.cat_id -> limit 10\G Example 15
  38. 38. • EXPLAIN with the histogram *************************** 1. row *************************** id: 1 select_type: SIMPLE table: goods -- Large table first partitions: NULL type: index possible_keys: cat_id_2 key: cat_id_2 key_len: 5 ref: NULL rows: 10 -- Same as we asked filtered: 98.70 -- True numbers Extra: Using where Example 15
  39. 39. • EXPLAIN with the histogram *************************** 2. row *************************** id: 1 select_type: SIMPLE table: categories -- Small table partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: orig.goods.cat_id rows: 1 filtered: 100.00 Extra: Using index 2 rows in set, 1 warning (0.01 sec) Example 15
  40. 40. • Execution time with the histogram mysql> flush status; Query OK, 0 rows affected (0.00 sec) mysql> select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -> order by goods.cat_id -> limit 10; eeb005fae0dd3441c5c380e1d87fee84 - 10 rows in set (0.00 sec) -- 56/0 times faster! Example 15
  41. 41. • Engine statistics with the histogram mysql> show status like ’Handler%’; +----------------------------+-------++----------------------------+-------+ | Variable_name | Value || Variable_name | Value | +----------------------------+-------++----------------------------+-------+ | Handler_commit | 1 || Handler_read_prev | 0 | | Handler_delete | 0 || Handler_read_rnd | 0 | | Handler_discover | 0 || Handler_read_rnd_next | 0 | | Handler_external_lock | 4 || Handler_rollback | 0 | | Handler_mrr_init | 0 || Handler_savepoint | 0 | | Handler_prepare | 0 || Handler_savepoint_rollback | 0 | | Handler_read_first | 1 || Handler_update | 0 | | Handler_read_key | 3 || Handler_write | 0 | | Handler_read_last | 0 |+----------------------------+-------+ | Handler_read_next | 9 |18 rows in set (0.00 sec) Example 15
  42. 42. Even Worse Use Case
  43. 43. • goods_characteristics CREATE TABLE `goods_characteristics` ( `id` int(11) NOT NULL AUTO_INCREMENT, `good_id` varchar(30) DEFAULT NULL, `size` int(11) DEFAULT NULL, `manufacturer` varchar(30) DEFAULT NULL, PRIMARY KEY (`id`), KEY `good_id` (`good_id`,`size`,`manufacturer`), KEY `size` (`size`,`manufacturer`) ) ENGINE=InnoDB AUTO_INCREMENT=196606 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci Two Similar Tables 17
  44. 44. • goods_shops CREATE TABLE `goods_shops` ( `id` int(11) NOT NULL AUTO_INCREMENT, `good_id` varchar(30) DEFAULT NULL, `location` varchar(30) DEFAULT NULL, `delivery_options` varchar(30) DEFAULT NULL, PRIMARY KEY (`id`), KEY `good_id` (`good_id`,`location`,`delivery_options`), KEY `location` (`location`,`delivery_options`) ) ENGINE=InnoDB AUTO_INCREMENT=131071 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci Two Similar Tables 17
  45. 45. • Size mysql> select count(*) from goods_characteristics; +----------+ | count(*) | +----------+ | 131072 | +----------+ 1 row in set (0.08 sec) mysql> select count(*) from goods_shops; +----------+ | count(*) | +----------+ | 65536 | +----------+ 1 row in set (0.04 sec) Two Similar Tables 17
  46. 46. • Data Distribution: goods characteristics mysql> select count(*) num_rows, good_id, size -> from goods_characteristics group by good_id, size; +----------+---------+------+ | num_rows | good_id | size | +----------+---------+------+ | 65536 | laptop | 7 | | 8187 | laptop | 8 | | 8190 | laptop | 9 | | 8188 | laptop | 10 | | 8192 | laptop | 11 | | 8189 | laptop | 12 | | 8189 | laptop | 13 | | 8191 | laptop | 14 | | 8190 | laptop | 15 | | 10 | laptop | 16 | | 10 | laptop | 17 | +----------+---------+------+ Two Similar Tables 17
  47. 47. • Data Distribution: goods characteristics mysql> select count(*) num_rows, good_id, manufacturer -> from goods_characteristics group by good_id, manufacturer order by num_rows desc; +----------+---------+--------------+ | num_rows | good_id | manufacturer | +----------+---------+--------------+ | 65536 | laptop | Noname | | 8191 | laptop | Samsung | | 8191 | laptop | Acer | | 8189 | laptop | Dell | | 8189 | laptop | HP | | 8189 | laptop | Lenovo | | 8189 | laptop | Toshiba | | 8189 | laptop | Apple | | 8189 | laptop | Asus | | 10 | laptop | Sony | | 10 | laptop | Casper | +----------+---------+--------------+ Two Similar Tables 17
  48. 48. • Data Distribution: goods shops mysql> select count(*) num_rows, good_id, location -> from goods_shops group by good_id, location order by num_rows desc; +----------+---------+---------------+ | num_rows | good_id | location | +----------+---------+---------------+ | 8191 | laptop | New York | | 8191 | laptop | San Francisco | | 8189 | laptop | Paris | | 8189 | laptop | Berlin | | 8189 | laptop | Brussels | | 8189 | laptop | Tokio | | 8189 | laptop | Istanbul | | 8189 | laptop | London | | 10 | laptop | Moscow | | 10 | laptop | Kiev | +----------+---------+---------------+ Two Similar Tables 17
  49. 49. • Data Distribution: goods shops mysql> select count(*) num_rows, good_id, delivery_options -> from goods_shops group by good_id, delivery_options order by num_rows desc; +----------+---------+------------------+ | num_rows | good_id | delivery_options | +----------+---------+------------------+ | 8192 | laptop | DHL | | 8191 | laptop | PTT | | 8190 | laptop | Normal Post | | 8190 | laptop | Tracked | | 8189 | laptop | Fedex | | 8189 | laptop | Gruzovichkof | | 8188 | laptop | Courier | | 8187 | laptop | No delivery | | 10 | laptop | Premium | | 10 | laptop | Urgent | +----------+---------+------------------+ Two Similar Tables 17
  50. 50. "Histogram statistics are useful primarily for nonindexed columns. Adding an index to a column for which histogram statistics are applicable might also help the optimizer make row estimates. The tradeoffs are: An index must be updated when table data is modified. A histogram is created or updated only on demand, so it adds no overhead when table data is modified. On the other hand, the statistics become progressively more out of date when table modifications occur, until the next time they are updated." (MySQL User Reference Manual) Optimizer Statistics aka Histograms 18
  51. 51. mysql> alter table goods_characteristics stats_sample_pages=5000; Query OK, 0 rows affected (0.02 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> alter table goods_shops stats_sample_pages=5000; Query OK, 0 rows affected (0.05 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> analyze table goods_characteristics, goods_shops; +----------------------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +----------------------------+---------+----------+----------+ | test.goods_characteristics | analyze | status | OK | | test.goods_shops | analyze | status | OK | +----------------------------+---------+----------+----------+ 2 rows in set (0.35 sec) Index Statistics is More than Good 19
  52. 52. • The query mysql> select count(*) from goods_shops join goods_characteristics using (good_id) -> where size < 12 and manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or delivery_options in (’Premium’, ’Urgent’)); ^C^C -- query aborted ERROR 1317 (70100): Query execution was interrupted Performance? 20
  53. 53. • Handlers mysql> show status like ’Handler%’; +----------------------------+-------------+ | Variable_name | Value | +----------------------------+-------------+ | Handler_commit | 0 | | Handler_delete | 0 | | Handler_discover | 0 | | Handler_external_lock | 4 | | Handler_mrr_init | 0 | | Handler_prepare | 0 | | Handler_read_first | 1 | | Handler_read_key | 13043 | | Handler_read_last | 0 | | Handler_read_next | 854,767,916 | ... Performance? 20
  54. 54. • Table order mysql> explain select count(*) from goods_shops join goods_characteristics using (good_id) -> where size < 12 and manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or delivery_options in (’Premium’, ’Urgent’)); +----+-----------------------+-------+---------+--------+----------+--------------------------+ | id | table | type | key | rows | filtered | Extra | +----+-----------------------+-------+---------+--------+----------+--------------------------+ | 1 | goods_characteristics | index | good_id | 131072 | 25.00 | Using where; Using index | | 1 | goods_shops | ref | good_id | 65536 | 36.00 | Using where; Using index | +----+-----------------------+-------+---------+--------+----------+--------------------------+ 2 rows in set, 1 warning (0.00 sec) Performance? 20
  55. 55. • Table order matters mysql> explain select count(*) from goods_shops straight_join goods_characteristics -> using (good_id) -> where size < 12 and manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or delivery_options in (’Premium’, ’Urgent’)); +----+-----------------------+-------+---------+--------+----------+--------------------------+ | id | table | type | key | rows | filtered | Extra | +----+-----------------------+-------+---------+--------+----------+--------------------------+ | 1 | goods_shops | index | good_id | 65536 | 36.00 | Using where; Using index | | 1 | goods_characteristics | ref | good_id | 131072 | 25.00 | Using where; Using index | +----+-----------------------+-------+---------+--------+----------+--------------------------+ 2 rows in set, 1 warning (0.00 sec) Performance? 20
  56. 56. • Table order matters mysql> select count(*) from goods_shops straight_join goods_characteristics using (good_id) -> where size < 12 and manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or delivery_options in (’Premium’, ’Urgent’)); +----------+ | count(*) | +----------+ | 816640 | +----------+ 1 row in set (2.11 sec) mysql> show status like ’Handler_read_next’; +-------------------+-----------+ | Variable_name | Value | +-------------------+-----------+ | Handler_read_next | 5,308,416 | +-------------------+-----------+ 1 row in set (0.00 sec) Performance? 20
  57. 57. mysql> analyze table goods_shops update histogram on location, delivery_options; +-------------+-----------+----------+-----------------------------------------------------+ | Table | Op | Msg_type | Msg_text | +-------------+-----------+----------+-----------------------------------------------------+ | goods_shops | histogram | status | Histogram statistics created... ’delivery_options’. | | goods_shops | histogram | status | Histogram statistics created for column ’location’. | +-------------+-----------+----------+-----------------------------------------------------+ 2 rows in set (0.18 sec) mysql> analyze table goods_characteristics update histogram on size, manufacturer ; +-----------------------+-----------+----------+-------------------------------------------------+ | Table | Op | Msg_type | Msg_text | +-----------------------+-----------+----------+-------------------------------------------------+ | goods_characteristics | histogram | status | Histogram statistics created... ’manufacturer’. | | goods_characteristics | histogram | status | Histogram statistics created for column ’size’. | +-----------------------+-----------+----------+-------------------------------------------------+ 2 rows in set (0.23 sec) Histograms to Rescue 21
  58. 58. • The query mysql> select count(*) from goods_shops join goods_characteristics using (good_id) -> where size < 12 and manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’) -> and (location in (’Moscow’, ’Kiev’) or delivery_options in (’Premium’, ’Urgent’)); +----------+ | count(*) | +----------+ | 816640 | +----------+ 1 row in set (2.16 sec) mysql> show status like ’Handler_read_next’; +-------------------+-----------+ | Variable_name | Value | +-------------------+-----------+ | Handler_read_next | 5,308,418 | +-------------------+-----------+ 1 row in set (0.00 sec) Histograms to Rescue 21
  59. 59. • Filtering effect mysql> explain select count(*) from goods_shops join goods_characteristics using (good_id) where s +----+-----------------------+-------+---------+--------+----------+--------------------------+ | id | table | type | key | rows | filtered | Extra | +----+-----------------------+-------+---------+--------+----------+--------------------------+ | 1 | goods_shops | index | good_id | 65536 | 0.06 | Using where; Using index | | 1 | goods_characteristics | ref | good_id | 131072 | 15.63 | Using where; Using index | +----+-----------------------+-------+---------+--------+----------+--------------------------+ 2 rows in set, 1 warning (0.00 sec) Histograms to Rescue 21
  60. 60. Why the Difference?
  61. 61. 1 2 3 4 5 6 7 8 9 10 0 200 400 600 800 Indexes: Number of Items with Same Value 23
  62. 62. 1 2 3 4 5 6 7 8 9 10 0 200 400 600 800 Indexes: Cardinality 24
  63. 63. 1 2 3 4 5 6 7 8 9 10 0 200 400 600 800 Histograms: Number of Values in Each Bucket 25
  64. 64. 1 2 3 4 5 6 7 8 9 10 0 0.2 0.4 0.6 0.8 1 Histograms: Data in the Histogram 26
  65. 65. How Histograms Work?
  66. 66. ↓ sql/sql_planner.cc Low Level 28
  67. 67. ↓ sql/sql_planner.cc ↓ calculate_condition_filter Low Level 28
  68. 68. ↓ sql/sql_planner.cc ↓ calculate_condition_filter ↓ Item_func_*::get_filtering_effect Low Level 28
  69. 69. ↓ sql/sql_planner.cc ↓ calculate_condition_filter ↓ Item_func_*::get_filtering_effect • get_histogram_selectivity Low Level 28
  70. 70. ↓ sql/sql_planner.cc ↓ calculate_condition_filter ↓ Item_func_*::get_filtering_effect • get_histogram_selectivity • Seen as a percent of filtered rows in EXPLAIN Low Level 28
  71. 71. • Example data mysql> create table example(f1 int) engine=innodb; mysql> insert into example values(1),(1),(1),(2),(3); mysql> select f1, count(f1) from example group by f1; +------+-----------+ | f1 | count(f1) | +------+-----------+ | 1 | 3 | | 2 | 1 | | 3 | 1 | +------+-----------+ 3 rows in set (0.00 sec) Filtered Rows 29
  72. 72. • Without a histogram mysql> explain select * from example where f1 > 0\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 29
  73. 73. • Without a histogram mysql> explain select * from example where f1 > 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 29
  74. 74. • Without a histogram mysql> explain select * from example where f1 > 2\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 29
  75. 75. • Without a histogram mysql> explain select * from example where f1 > 3\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 29
  76. 76. • With the histogram mysql> analyze table example update histogram on f1 with 3 buckets; +-----------------+-----------+----------+------------------------------+ | Table | Op | Msg_type | Msg_text | +-----------------+-----------+----------+------------------------------+ | hist_ex.example | histogram | status | Histogram statistics created for column ’f1’. | +-----------------+-----------+----------+------------------------------+ 1 row in set (0.03 sec) Filtered Rows 29
  77. 77. • With the histogram mysql> select * from information_schema.column_statistics -> where table_name='example'\G *************************** 1. row *************************** SCHEMA_NAME: hist_ex TABLE_NAME: example COLUMN_NAME: f1 HISTOGRAM: {"buckets": [[1, 0.6], [2, 0.8], [3, 1.0]], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2018-11-07 09:07:19.791470", "sampling-rate": 1.0, "histogram-type": "singleton", "number-of-buckets-specified": 3} 1 row in set (0.00 sec) Filtered Rows 29
  78. 78. • With the histogram mysql> explain select * from example where f1 > 0\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 100.00 -- all rows Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 29
  79. 79. • With the histogram mysql> explain select * from example where f1 > 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 40.00 -- 2 rows Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 29
  80. 80. • With the histogram mysql> explain select * from example where f1 > 2\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 20.00 -- one row Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 29
  81. 81. • With the histogram mysql> explain select * from example where f1 > 3\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 20.00 -- one row Extra: Using where 1 row in set, 1 warning (0.00 sec) Filtered Rows 29
  82. 82. • CREATE INDEX • Metadata lock • Can be blocked by any query Locking 30
  83. 83. • CREATE INDEX • Metadata lock • Can be blocked by any query • UPDATE HISTOGRAM • Backup lock • Can be locked only by a backup • Can be created any time without fear Locking 30
  84. 84. • Helps if query plan can be changed • Not a replacement for the index: • GROUP BY • ORDER BY • Query on a single table ∗ Outcome 31
  85. 85. • Data distribution is uniform • Range optimization can be used • Full table scan is fast When Are Histograms Not Helpful? 32
  86. 86. • Index statistics are collected by the engine • The Optimizer calculates cardinality each time it accesses the statistics • Indexes do not always improve performance • Histograms can help • Still a new feature • Histograms do not replace other optimizations! Conclusion 33
  87. 87. MySQL User Reference Manual Blog by Erik Froseth Blog by Frederic Descamps Talk by Oystein Grovlen @Fosdem Talk by Sergei Petrunia @PerconaLive WL #8707 More information 34
  88. 88. www.slideshare.net/SvetaSmirnova twitter.com/svetsmirnova github.com/svetasmirnova Thank you! 35
  89. 89. DATABASE PERFORMANCE MATTERS
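
The `filtered` values on the "Filtered Rows" slides can be reproduced from the singleton buckets shown there. The sketch below is a simplified model, not MySQL's actual code: for `f1 > x` it takes one minus the cumulative frequency of the last bucket whose value is at most `x`. The lower bound of one row (`1 / table_rows`) is an assumption inferred from the slide where `f1 > 3` still reports 20.00 although no row matches. The function names are hypothetical.

```python
import bisect


def selectivity_greater_than(buckets, x, table_rows):
    """Estimate the fraction of rows with value > x.

    buckets: [(value, cumulative_frequency), ...] sorted by value,
             as in the singleton histogram from the slides.
    """
    values = [value for value, _ in buckets]
    # Number of bucket values that are <= x.
    idx = bisect.bisect_right(values, x)
    cumulative = buckets[idx - 1][1] if idx > 0 else 0.0
    selectivity = 1.0 - cumulative
    # Assumption: the estimate never drops below one row of the table.
    return max(selectivity, 1.0 / table_rows)


def filtered_percent(buckets, x, table_rows):
    """Format the estimate the way EXPLAIN shows it (percent, 2 decimals)."""
    return round(selectivity_greater_than(buckets, x, table_rows) * 100, 2)


if __name__ == "__main__":
    # Buckets reported by information_schema.column_statistics in the talk.
    buckets = [(1, 0.6), (2, 0.8), (3, 1.0)]
    for x in (0, 1, 2, 3):
        print(f"f1 > {x}: filtered = {filtered_percent(buckets, x, 5):.2f}")
```

For the five-row `example` table this gives 100.00, 40.00, 20.00 and 20.00, the same figures as the EXPLAIN output with the histogram, whereas without a histogram every one of these predicates is shown with the same 33.33.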
