We store data with the intention to use it: search, retrieve, group, sort... To perform these actions effectively, MySQL storage engines index data and pass statistics to the Optimizer when it compiles a query execution plan. This approach works well unless your data is unevenly distributed.
Last year I worked on several tickets where the data followed the same pattern: millions of popular products fit into a couple of categories, and the remaining products were spread across the other categories. We had a hard time finding a way to retrieve goods fast. Workarounds for version 5.7 were offered. However, a new MySQL 8.0 feature, histograms, would work better, cleaner, and faster. This is how the idea for the talk was born.
I will discuss
- how index statistics are physically stored
- which data is exchanged with the Optimizer
- why this is not enough to make the correct index choice
In the end, I will explain which issues histograms resolve and why index statistics alone are insufficient for fast retrieval of unevenly distributed data.
https://www.percona.com/live/e18/sessions/billion-goods-in-few-categories-how-histograms-save-a-life
Billion Goods in Few Categories: how Histograms Save a Life?
1. Billion Goods in Few Categories:
how Histograms Save a Life?
November 7, 2018
Sveta Smirnova
2. • The Case
• The Cardinality: Two Levels
• ANALYZE TABLE Limitations
• Solutions in Percona Server 5.7
• Histograms
• Conclusion
Table of Contents
2
3. • MySQL Support engineer
• Author of
• MySQL Troubleshooting
• JSON UDF functions
• FILTER clause for MySQL
• Speaker
• Percona Live, OOW, Fosdem,
DevConf, HighLoad...
Sveta Smirnova
3
4. • Hardware
• Wise options
• Optimized queries
• Brain
Everything can be Resolved!
4
6. • This talk is about
• How I spent last two years
• Resolving the same issue
• For different customers
• Task was to speed up the query
Not Everything
5
7. • Specific data distribution
• Access on different fields
• ON clause
• WHERE clause
• GROUP BY
• ORDER BY
• Index cannot be used effectively
Not All the Queries can be Optimized
6
8. • Topic based on real Support cases
• Couple of them are still in progress
Disclaimer
7
9. • Topic based on real Support cases
• All examples are 100% fake
• They created such that
• No customer can be identified
• Everything generated
Table names
Column names
Data
• Use case itself is fictional
Disclaimer
7
10. • Topic based on real Support cases
• All examples are 100% fake
• All examples are simplified
• Only columns, required to show the issue
• Everything extra removed
• Real tables usually store much more data
Disclaimer
7
11. • Topic based on real Support cases
• All examples are 100% fake
• All examples are simplified
• All disasters happened with version 5.7
Disclaimer
7
14. • categories
• Less than 20 rows
• goods
• More than 1M rows
• 20 unique cat_id values
• Many other fields
Price
Date: added, last updated, etc.
Characteristics
Store
...
Two tables
9
19. • Select from the small table
• For each cat_id, select from the large table
• Filter the result on date_added[ and price[...]]
• Slow with many items in the category
Option 1: Select from the Small Table First
11
23. • Filter rows by date_added[ and price[...]]
• Get cat_id values
• Retrieve rows from the small table
• Slow if the number of rows filtered by
date_added is larger than the number of goods
in the selected categories
Option 2: Select from the Large Table First
12
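The trade-off between the two options can be sketched as a rough comparison of rows read. This is only an illustration under stated assumptions: the category sizes and date-range counts are hypothetical, and the helper names are invented for this example; a real optimizer cost model takes more factors into account.

```python
def rows_read_small_table_first(goods_per_category, selected_categories):
    """Option 1: every row of each selected category is read, then filtered."""
    return sum(goods_per_category[cat] for cat in selected_categories)


def rows_read_large_table_first(rows_in_date_range):
    """Option 2: every row in the date range is read, then matched to a category."""
    return rows_in_date_range


def cheaper_option(small_first_rows, large_first_rows):
    """Pick the join order that reads fewer rows from the large table."""
    if small_first_rows <= large_first_rows:
        return "small table first"
    return "large table first"


# Hypothetical skewed distribution: two huge categories, two tiny ones.
goods_per_category = {1: 400_000, 2: 550_000, 3: 500, 4: 300}

popular = rows_read_small_table_first(goods_per_category, [1, 2])  # 950000 rows
rare = rows_read_small_table_first(goods_per_category, [3, 4])     # 800 rows

# Popular categories with a narrow date range: start from the large table.
print(cheaper_option(popular, rows_read_large_table_first(2_000)))
# Rare categories with a wide date range: start from the small table.
print(cheaper_option(rare, rows_read_large_table_first(900_000)))
```

Neither order wins in every case, which is why the choice depends on knowing how the data is actually distributed.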
25. • CREATE INDEX index_everything
(cat_id, date_added[, price[, ...]])
• It resolves the issue
• But not in all cases
What if We Use Combined Indexes?
13
30. • Stores statistics on disk
• mysql.innodb_table_stats
• mysql.innodb_index_stats
InnoDB: Overview
17
33. • Stores statistics on disk
• Returns statistics to Optimizer
• In ha_innobase::info
• handler/ha_innodb.cc
• When it opens the table
• flag = HA_STATUS_CONST
• Reads data from disk
• Stores it in memory
InnoDB: Overview
17
34. • Stores statistics on disk
• Returns statistics to Optimizer
• In ha_innobase::info
• handler/ha_innodb.cc
• When it opens the table
• Subsequent table accesses
• flag = HA_STATUS_VARIABLE
• Statistics from memory
• Up to date Primary Key data
InnoDB: Overview
17
35. • Table created with option STATS_AUTO_RECALC = 0
• Before ANALYZE TABLE
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 64
...
InnoDB: Flow
18
36. • Table created with option STATS_AUTO_RECALC = 0
• After ANALYZE TABLE
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 2
...
InnoDB: Flow
18
37. • Table created with option STATS_AUTO_RECALC = 0
• After inserting rows
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 16
...
InnoDB: Flow
18
38. • Table created with option STATS_AUTO_RECALC = 0
• After restart
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 2
...
InnoDB: Flow
18
42. • Takes data from the engine
• Class ha_statistics
• sql/handler.h
• Does not have a Cardinality field at all
• Uses a formula to calculate Cardinality
Optimizer: Overview
19
45. • n_rows: number of rows in the table
• Naturally up to date
• Constantly changing!
• rec_per_key: number of duplicates per key
• Calculated by InnoDB at the time of ANALYZE
• rec_per_key = n_rows / unique values
• Does not change!
• Cardinality = n_rows / rec_per_key
Optimizer: Formula
20
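A minimal sketch of the formula above shows why the reported Cardinality drifts once rows are added after ANALYZE. The function names are invented for this illustration, and the row counts (64 and 512 rows, 2 distinct values) are assumptions chosen to reproduce the Cardinality values 2 and 16 seen in the SHOW INDEX output in this deck.

```python
def rec_per_key(n_rows_at_analyze: int, unique_values: int) -> float:
    """Average number of rows per distinct key value, frozen at ANALYZE time."""
    return n_rows_at_analyze / unique_values


def cardinality(n_rows_now: int, frozen_rec_per_key: float) -> int:
    """Cardinality derived from the live row count and the frozen rec_per_key."""
    return round(n_rows_now / frozen_rec_per_key)


# ANALYZE ran when the table held 64 rows with 2 distinct values.
frozen = rec_per_key(64, 2)        # 32 rows per key
print(cardinality(64, frozen))     # 2: correct right after ANALYZE

# The table grows to 512 rows; the column still has only 2 distinct values.
print(cardinality(512, frozen))    # 16: the stale rec_per_key inflates the result
```

The true number of distinct values never changed; only n_rows did.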
48. • Engine stores persistent statistics
             TokuDB          InnoDB
Storage      Files           Tables
Statistics   As Calculated   As Calculated
Row Count    Persistent      Only in Memory
• Optimizer calculates Cardinality each time
it accesses engine statistics
• Weak user control
Persistent Statistics Are Not Persistent
21
53. • Counts the number of pages in the table
• Takes STATS_SAMPLE_PAGES pages
• Counts the number of unique values in
the secondary index in these pages
• Divides the number of pages in the table by
the number of sample pages and multiplies
the result by the number of unique values
How ANALYZE TABLE Works with InnoDB?
23
55. • Number of pages in the table: 20,000
• STATS_SAMPLE_PAGES: 20 (default)
• Unique values in the secondary index:
• In sample pages: 10
• In the table: 11
• Cardinality: 20,000 * 10 / 20 = 10,000
Example
24
56. • Number of pages in the table: 20,000
• STATS_SAMPLE_PAGES: 5,000
• Unique values in the secondary index:
• In sample pages: 10
• In the table: 11
• Cardinality: 20,000 * 10 / 5,000 = 40
Example 2
25
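Both examples can be reproduced with a small sketch of the scaling step described on the slides. The function name is invented for this illustration; it models only the arithmetic shown here, not the actual InnoDB sampling code.

```python
def estimated_cardinality(total_pages: int, sample_pages: int, unique_in_sample: int) -> float:
    """Scale the distinct values seen in the sampled pages up to the whole table."""
    return total_pages / sample_pages * unique_in_sample


# Example 1: default sample of 20 pages.
print(estimated_cardinality(20_000, 20, 10))     # 10000.0

# Example 2: sample of 5,000 pages.
print(estimated_cardinality(20_000, 5_000, 10))  # 40.0
```

With only 11 distinct values actually in the table, the larger sample is far closer to the truth, yet both estimates overshoot, because the same few values repeat on every page.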
57. • Time consuming
mysql> select count(*) from goods;
+----------+
| count(*) |
+----------+
| 80303000 |
+----------+
1 row in set (35.95 sec)
Use Larger STATS_SAMPLE_PAGES?
26
58. • Time consuming
• With default STATS_SAMPLE_PAGES
mysql> analyze table goods;
+------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze | status | OK |
+------------+---------+----------+----------+
1 row in set (0.32 sec)
Use Larger STATS_SAMPLE_PAGES?
26
59. • Time consuming
• With bigger number
mysql> alter table goods STATS_SAMPLE_PAGES=5000;
Query OK, 0 rows affected (0.04 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> analyze table goods;
+------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze | status | OK |
+------------+---------+----------+----------+
1 row in set (27.13 sec)
Use Larger STATS_SAMPLE_PAGES?
26
60. • Time consuming
• With bigger number
• 27.13/0.32 = 85 times slower!
Use Larger STATS_SAMPLE_PAGES?
26
61. User Manual claims it does not
During the analysis, the table is locked
with a read lock for InnoDB and MyISAM.
Does ANALYZE TABLE Block Reads?
27
62. User Manual claims it does not
• But!
Does ANALYZE TABLE Block Reads?
27
63. User Manual claims it does not
Sometimes it blocks all subsequent queries
+------+-------------------------+---------------------------------+
| Time | State | Info |
+------+-------------------------+---------------------------------+
| 32 | Writing to net | select * from t where c > '%0%' |
| 12 | Waiting for table flush | select * from test.t where i=1 |
| 12 | Waiting for table flush | select * from test.t where i=2 |
| 12 | Waiting for table flush | select * from test.t where i=3 |
| 11 | Waiting for table flush | select * from test.t where i=7 |
| 10 | Waiting for table flush | select * from test.t where i=11 |
...
Does ANALYZE TABLE Block Reads?
27
64. Is not a solution
Simply Increasing STATS_SAMPLE_PAGES
28
73. • InnoDB stores its statistics
mysql.innodb_index_stats
• This table is writable
• Updating it and then running FLUSH TABLE
lets you fake any statistics
• Hack
• Not documented
• Not recommended
• Can stop working any time
Without the Fix: Manual Update
32
75. • With the Percona fix for blocking ANALYZE
TABLE we can use a large value for
STATS_SAMPLE_PAGES
• Does not help when
• Index cannot be used
• Data distribution in the index varies a lot
• Manual update allows fixing statistics
• Not recommended
• Can stop working any time
5.7: Summary
33
77. • Optimizer Column Statistics
• Engine-independent
• No fancy calculations
• Knows about data distribution
What are Histograms?
35
78. [Chart: buckets 1 to 10 on the x-axis, counts from 0 to 800 on the y-axis]
Number of Values in Each Bucket
36
79. [Chart: buckets 1 to 10 on the x-axis, values from 0 to 1 on the y-axis]
Data in the Histogram
37
80. • Accurate statistics
• Truly persistent
• No extra calculations on access
• Optimizer knows about data distribution
• Without touching the table!
How are Histograms Helpful?
38
81. • Example data
mysql> create table example(f1 int) engine=innodb;
mysql> insert into example values(1),(1),(1),(2),(3);
mysql> select f1, count(f1) from example group by f1;
+------+-----------+
| f1 | count(f1) |
+------+-----------+
| 1 | 3 |
| 2 | 1 |
| 3 | 1 |
+------+-----------+
3 rows in set (0.00 sec)
Filtered Rows
39
82. • Without a histogram
mysql> explain select * from example where f1 > 0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
Filtered Rows
39
83. • Without a histogram
mysql> explain select * from example where f1 > 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
Filtered Rows
39
84. • Without a histogram
mysql> explain select * from example where f1 > 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
Filtered Rows
39
85. • Without a histogram
mysql> explain select * from example where f1 > 3\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
Filtered Rows
39
86. • With the histogram
mysql> analyze table example update histogram on f1 with 3 buckets;
+-----------------+-----------+----------+------------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------+-----------+----------+------------------------------+
| hist_ex.example | histogram | status | Histogram statistics created for column 'f1'. |
+-----------------+-----------+----------+------------------------------+
1 row in set (0.03 sec)
Filtered Rows
39
87. • With the histogram
mysql> select * from information_schema.column_statistics
-> where table_name='example'\G
*************************** 1. row ***************************
SCHEMA_NAME: hist_ex
TABLE_NAME: example
COLUMN_NAME: f1
HISTOGRAM: {"buckets": [[1, 0.6], [2, 0.8], [3, 1.0]],
"data-type": "int", "null-values": 0.0, "collation-id": 8,
"last-updated": "2018-11-07 09:07:19.791470",
"sampling-rate": 1.0, "histogram-type": "singleton",
"number-of-buckets-specified": 3}
1 row in set (0.00 sec)
Filtered Rows
39
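The bucket values above are cumulative frequencies, and they can be recomputed from the example data. The sketch below is an illustration only: the function names are invented, and the way the filter percentage is derived from the buckets is an assumption about the arithmetic, not the Optimizer's actual code.

```python
from bisect import bisect_right
from collections import Counter


def singleton_histogram(values):
    """Return (value, cumulative frequency) pairs, one bucket per distinct value."""
    counts = Counter(values)
    total = len(values)
    running = 0
    buckets = []
    for value in sorted(counts):
        running += counts[value]
        buckets.append((value, running / total))
    return buckets


def filtered_percent_greater_than(buckets, constant):
    """Share of rows with value > constant, as a percentage rounded to 2 digits."""
    keys = [value for value, _ in buckets]
    idx = bisect_right(keys, constant)  # number of buckets with value <= constant
    covered = buckets[idx - 1][1] if idx else 0.0
    return round(100 * (1 - covered), 2)


buckets = singleton_histogram([1, 1, 1, 2, 3])
print(buckets)                                    # [(1, 0.6), (2, 0.8), (3, 1.0)]
print(filtered_percent_greater_than(buckets, 0))  # 100.0
print(filtered_percent_greater_than(buckets, 1))  # 40.0
print(filtered_percent_greater_than(buckets, 2))  # 20.0
```

No table access is needed for these estimates; the five-row table is summarized entirely by three numbers.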
88. • With the histogram
mysql> explain select * from example where f1 > 0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 100.00 -- all rows
Extra: Using where
Filtered Rows
39
89. • With the histogram
mysql> explain select * from example where f1 > 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 40.00 -- 2 rows
Extra: Using where
Filtered Rows
39
90. • With the histogram
mysql> explain select * from example where f1 > 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 20.00 -- one row
Extra: Using where
Filtered Rows
39
91. • With the histogram
mysql> explain select * from example where f1 > 3\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 20.00 - one row
Extra: Using where
Filtered Rows
39
92. • EXPLAIN without histograms
mysql> explain select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between '2000-01-01' and '2001-01-01' -- Large range
-> order by goods.cat_id
-> limit 10\G -- We ask for 10 rows only!
Example
40
93. • EXPLAIN without histograms
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: categories -- Small table first
partitions: NULL
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 20
filtered: 70.00
Extra: Using where; Using index;
Using temporary; Using filesort
Example
40
94. • EXPLAIN without histograms
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: goods -- Large table
partitions: NULL
type: ref
possible_keys: cat_id_2
key: cat_id_2
key_len: 5
ref: orig.categories.id
rows: 51827
filtered: 11.11 -- Default value
Extra: Using where
2 rows in set, 1 warning (0.01 sec)
Example
40
95. • Execution time without histograms
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)
mysql> select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between '2000-01-01' and '2001-01-01'
-> order by goods.cat_id
-> limit 10;
ab9f9bb7bc4f357712ec34f067eda364 -
10 rows in set (56.47 sec)
Example
40
96. • Engine statistics without histograms
mysql> show status like 'Handler%';
+----------------------------+--------+
| Variable_name | Value |
+----------------------------+--------+
...
| Handler_read_next | 964718 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 10 |
| Handler_read_rnd_next | 951671 |
...
| Handler_write | 951670 |
+----------------------------+--------+
18 rows in set (0.01 sec)
Example
40
97. • Now let's add the histogram
mysql> analyze table goods update histogram on date_added;
+------------+-----------+----------+------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------+-----------+----------+------------------------------+
| orig.goods | histogram | status | Histogram statistics created for column 'date_added'. |
+------------+-----------+----------+------------------------------+
1 row in set (2.01 sec)
Example
40
98. • EXPLAIN with the histogram
mysql> explain select goods.* from goods
-> join categories
-> on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between '2000-01-01' and '2001-01-01'
-> order by goods.cat_id
-> limit 10\G
Example
40
99. • EXPLAIN with the histogram
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: goods -- Large table first
partitions: NULL
type: index
possible_keys: cat_id_2
key: cat_id_2
key_len: 5
ref: NULL
rows: 10 -- Same as we asked
filtered: 98.70 -- True numbers
Extra: Using where
Example
40
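The gap between the two plans comes down to the filter estimate for the date_added range. As a rough illustration, the sketch below applies both percentages from the EXPLAIN output (11.11 without the histogram, 98.70 with it) to the 51827 rows estimated per category lookup. Applying the histogram percentage to that same row count is an assumption made for comparison; the constant and function names are invented for this example.

```python
# Percentages taken from the EXPLAIN output on the slides.
DEFAULT_FILTER_BETWEEN = 0.1111   # "filtered: 11.11 -- Default value"
HISTOGRAM_FILTER = 0.9870         # "filtered: 98.70 -- True numbers"

ROWS_PER_CATEGORY_LOOKUP = 51_827  # "rows: 51827" for the goods table


def rows_after_filter(rows: int, fraction: float) -> int:
    """Rows the Optimizer expects to survive the WHERE condition."""
    return round(rows * fraction)


guessed = rows_after_filter(ROWS_PER_CATEGORY_LOOKUP, DEFAULT_FILTER_BETWEEN)
measured = rows_after_filter(ROWS_PER_CATEGORY_LOOKUP, HISTOGRAM_FILTER)

print(guessed)   # 5758 rows expected with the default guess
print(measured)  # 51153 rows expected with the histogram
```

With the default guess the date condition looks selective, so filtering after the join seems cheap; with the histogram the Optimizer sees that almost every row matches and can stop after reading the 10 rows requested by LIMIT.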
100. • EXPLAIN with the histogram
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: categories -- Small table
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: orig.goods.cat_id
rows: 1
filtered: 100.00
Extra: Using index
2 rows in set, 1 warning (0.01 sec)
Example
40
101. • Execution time with the histogram
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)
mysql> select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between '2000-01-01' and '2001-01-01'
-> order by goods.cat_id
-> limit 10;
eeb005fae0dd3441c5c380e1d87fee84 -
10 rows in set (0.00 sec) -- versus 56.47 sec without the histogram
Example
40
103. • Data distribution is uniform
• Range optimization can be used
• Full table scan is fast
When are Histograms not Helpful?
41
104. • Backward index scan
• Better Statistics Persistence in InnoDB
• MySQL bug #80178
• MySQL bug #84654
• Better PRIMARY key access
Other Improvements in 8.0
42
106. • Index statistics are collected by the engine
• Optimizer calculates Cardinality each time
it accesses statistics
• Indexes do not always improve performance
• Histograms can help
• Still a new feature
Conclusion
44
107. MySQL User Reference Manual
Blog by Erik Froseth
Blog by Frederic Descamps
Talk by Oystein Grovlen @Fosdem
Talk by Sergei Petrunia @PerconaLive
Talk by Sergei Golubchik @HighLoad++
More information
45
111. • Stores key statistics on disk and in memory
• tablename_status_id.tokudb
TokuDB: Overview
49
112. • Stores key statistics on disk and in memory
• Stores row count on disk and in memory
• tablename_main_id.tokudb
• tablename_key_keyname_id.tokudb
TokuDB: Overview
49
114. • Stores key statistics on disk and in memory
• Stores row count on disk and in memory
• Returns statistics to Optimizer
• In ha_tokudb::info (handler/ha_tokudb.cc)
TokuDB: Overview
49
117. • Stored on disk
• Updated during ANALYZE
• Background ANALYZE
• Explicitly called
• Not updated when tokudb_auto_analyze=0
TokuDB: Key Statistics
50
119. • Updated in TOKUDB_SHARE::update_cardinality_counts
• Stored in tokudb::set_card_in_status
• In standard ANALYZE
• standard_t::on_run
TokuDB Key Statistics: Code
51
120. • Updated in TOKUDB_SHARE::update_cardinality_counts
• Stored in tokudb::set_card_in_status
• Retrieved in tokudb::get_card_from_status
• When table is open
• In ha_tokudb::initialize_share
TokuDB Key Statistics: Code
51
121. • Updated in TOKUDB_SHARE::update_cardinality_counts
• Stored in tokudb::set_card_in_status
• Retrieved in tokudb::get_card_from_status
• Used in TOKUDB_SHARE::set_cardinality_counts_in_table
for (uint32_t j = 0; j < key->actual_key_parts; j++) {
...
assert_always(next_key_part < _rec_per_keys);
ulong val = _rec_per_key[next_key_part++];
val = (val * tokudb::sysvars::cardinality_scale_percent) / 100;
TokuDB Key Statistics: Code
51
122. • Stored on disk
• Updated
• Each time table is updated
• When ha_tokudb::info is called
TokuDB Logical Rows Count
52
123. mysql> create table test(
-> id int not null auto_increment primary key,
-> f1 int,
-> ts timestamp,
-> key(f1)
-> ) engine=tokudb;
Query OK, 0 rows affected (0.10 sec)
mysql> insert into test (f1, ts) values(1, NOW()), (2, NOW());
Query OK, 2 rows affected (0.03 sec)
Records: 2 Duplicates: 0 Warnings: 0
...
mysql> insert into test (f1, ts) select f1, NOW() from test;
Query OK, 32 rows affected (0.01 sec)
Records: 32 Duplicates: 0 Warnings: 0
TokuDB Test Case
53
124. mysql> select count(distinct id), count(distinct f1) from test;
+--------------------+--------------------+
| count(distinct id) | count(distinct f1) |
+--------------------+--------------------+
| 64 | 2 |
+--------------------+--------------------+
1 row in set (0.01 sec)
TokuDB Test Case
53
125. • SHOW INDEX
mysql> show index from test\G
*************************** 1. row ***************************
Table: test
Non_unique: 0
Key_name: PRIMARY
Column_name: id
Cardinality: 64
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Column_name: f1
Cardinality: 64
TokuDB: After First run
54
126. • Number of rows
$ ../bin/tokuftdump --header --nodata var/mysqld.1/data/test/test_key_f1_145_1_1
ft:
layout_version=29
layout_version_original=29
layout_version_read_from_disk=29
build_id=0
build_id_original=0
time_of_creation= 1537709029 Sun Sep 23 16:23:49 2018
time_of_last_modification=1537709100 Sun Sep 23 16:25:00 2018
...
estimated numrows=64
estimated numbytes=640
logical row count=64
TokuDB: After First run
54
127. • Index Statistics
Thread 44 "mysqld" hit Breakpoint 1, TOKUDB_SHARE::set_cardinality_counts_in_tab
(this=0x7fd86da54020, table=0x7fd86d90b020)
at /home/sveta/src/percona-server/storage/tokudb/ha_tokudb.cc:400
400 if (val == 0 || _rows == 0 ||
(gdb) p key->name
$21 = 0x7fd86d879999 "f1"
(gdb) p val
$22 = 0
TokuDB: After First run
54
129. mysql> analyze table test;
+-----------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+-----------+---------+----------+----------+
| test.test | analyze | status | OK |
+-----------+---------+----------+----------+
1 row in set (0.01 sec)
TokuDB: ANALYZE TABLE
55
130. • SHOW INDEX
mysql> show index from test\G
*************************** 1. row ***************************
Table: test
Non_unique: 0
Key_name: PRIMARY
Column_name: id
Cardinality: 64
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Column_name: f1
Cardinality: 2
TokuDB: After ANALYZE TABLE
56
131. • Number of rows
$ ../bin/tokuftdump --header --nodata var/mysqld.1/data/test/test_key_f1_145_1_1
ft:
layout_version=29
layout_version_original=29
layout_version_read_from_disk=29
build_id=0
build_id_original=0
time_of_creation= 1537709029 Sun Sep 23 16:23:49 2018
time_of_last_modification=1537709100 Sun Sep 23 16:25:00 2018
...
estimated numrows=64
estimated numbytes=640
logical row count=64
TokuDB: After ANALYZE TABLE
56
132. • Index Statistics
Thread 44 "mysqld" hit Breakpoint 1, TOKUDB_SHARE::set_cardinality_counts_in_tab
(this=0x7fd86da54020, table=0x7fd86d90b020)
at /home/sveta/src/percona-server/storage/tokudb/ha_tokudb.cc:400
400 if (val == 0 || _rows == 0 ||
(gdb) p key->name
$26 = 0x7fd86d879999 "f1"
(gdb) p val
$27 = 32
TokuDB: After ANALYZE TABLE
56
134. mysql> insert into test (f1, ts) select f1, NOW() from test;
Query OK, 64 rows affected (0.01 sec)
Records: 64 Duplicates: 0 Warnings: 0
mysql> insert into test (f1, ts) select f1, NOW() from test;
Query OK, 128 rows affected (0.01 sec)
Records: 128 Duplicates: 0 Warnings: 0
mysql> insert into test (f1, ts) select f1, NOW() from test;
Query OK, 256 rows affected (0.02 sec)
Records: 256 Duplicates: 0 Warnings: 0
mysql> select count(distinct id), count(distinct f1) from test;
+--------------------+--------------------+
| count(distinct id) | count(distinct f1) |
+--------------------+--------------------+
| 512 | 2 |
+--------------------+--------------------+
1 row in set (0.01 sec)
TokuDB: Let’s Insert More Data
57
135. • SHOW INDEX
mysql> show index from test\G
*************************** 1. row ***************************
Table: test
Non_unique: 0
Key_name: PRIMARY
Column_name: id
Cardinality: 512
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Column_name: f1
Cardinality: 16
TokuDB: After INSERT
58
136. • Number of rows
$ ../bin/tokuftdump --header --nodata var/mysqld.1/data/test/test_key_f1_145_1_1
ft:
layout_version=29
layout_version_original=29
layout_version_read_from_disk=29
build_id=0
build_id_original=0
time_of_creation= 1537709029 Sun Sep 23 16:23:49 2018
time_of_last_modification=1537709880 Sun Sep 23 16:38:00 2018
...
estimated numrows=512
estimated numbytes=5120
logical row count=512
TokuDB: After INSERT
58
137. • Index Statistics
Thread 44 "mysqld" hit Breakpoint 1, TOKUDB_SHARE::set_cardinality_counts_in_tab
(this=0x7fd86da54020, table=0x7fd86d90b020)
at /home/sveta/src/percona-server/storage/tokudb/ha_tokudb.cc:400
400 if (val == 0 || _rows == 0 ||
(gdb) p key->name
$30 = 0x7fd86d879999 "f1"
(gdb) p val
$31 = 32
TokuDB: After INSERT
58
139. • SHOW INDEX
mysql> show index from test\G
*************************** 1. row ***************************
Table: test
Non_unique: 0
Key_name: PRIMARY
Column_name: id
Cardinality: 512
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Column_name: f1
Cardinality: 16
TokuDB: After Restart
59
140. • Index Statistics
Thread 44 "mysqld" hit Breakpoint 1, TOKUDB_SHARE::set_cardinality_counts_in_tab
(this=0x7fd4e67ea020, table=0x7fd4e6765c20)
at /home/sveta/src/percona-server/storage/tokudb/ha_tokudb.cc:400
400 if (val == 0 || _rows == 0 ||
(gdb) p key->name
$3 = 0x7fd4e66d7599 "f1"
(gdb) p val
$4 = 32
TokuDB: After Restart
59
146. • Index statistics are updated only when ANALYZE
TABLE runs
• Logical row count is updated each time
the number of rows changes
• Cardinality is based on both numbers
• It is expected that the cardinality is not the same
• After updates
• Even when ANALYZE TABLE was never run
TokuDB: Conclusion
60
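The whole TokuDB test case can be replayed with a small model of these two counters. This is a sketch under stated assumptions, not TokuDB code: the class and method names are invented, and returning the row count when rec_per_key is still 0 is inferred from the SHOW INDEX output before the first ANALYZE TABLE.

```python
class KeyStats:
    """Toy model: row count is always live, rec_per_key changes only on ANALYZE."""

    def __init__(self):
        self.rows = 0
        self.rec_per_key = 0  # stays 0 until ANALYZE TABLE runs

    def insert(self, n):
        # The logical row count follows every change to the table.
        self.rows += n

    def analyze(self, distinct_values):
        # Index statistics are refreshed only here.
        self.rec_per_key = self.rows // distinct_values

    def cardinality(self):
        # Assumption: with no statistics yet, the row count itself is reported.
        if self.rec_per_key == 0 or self.rows == 0:
            return self.rows
        return self.rows // self.rec_per_key


f1 = KeyStats()
f1.insert(64)
print(f1.cardinality())  # 64: ANALYZE TABLE never ran

f1.analyze(distinct_values=2)
print(f1.cardinality())  # 2: rec_per_key is now 32

f1.insert(448)           # table grows to 512 rows
print(f1.cardinality())  # 16: rec_per_key is still 32
```

The three printed values match the Cardinality reported for key f1 after the first run, after ANALYZE TABLE, and after the additional inserts.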