3. Table statistics in MySQL (MariaDB < 10.0)
1. #rows in the table
2. #rows in a given index range (e.g. tbl.key < 123)
3. Index statistics: #rows that match tbl.key=const
• e.g. for orders.customer_id=... we get AVG(#orders for customer)
• Basis for join optimization
• ANALYZE collects this
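A minimal sketch of what the index statistic in item 3 amounts to: the average number of rows per distinct key value. The customer_id values below are hypothetical, and the helper name rows_per_key is an illustration, not anything the server exposes.

```python
from collections import Counter

def rows_per_key(key_values):
    """Average number of rows per distinct key value: #rows / #distinct values."""
    counts = Counter(key_values)
    return len(key_values) / len(counts)

# Hypothetical orders.customer_id column:
# customer 1 has 3 orders, customer 2 has 1, customer 3 has 2.
customer_ids = [1, 1, 1, 2, 3, 3]

avg_orders = rows_per_key(customer_ids)
print(avg_orders)  # 2.0
```

The same estimate (2 rows) is used for every customer, so it is too low for customer 1 and too high for customer 2.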
4. Issues with statistics
● Issue #1: index statistics are imprecise and vary
− InnoDB collects stats using sampling
− innodb_stats_persistent (ON since 5.6)
− Still, estimates can vary widely
● Issue #2: not enough statistics (nothing for conditions on non-indexed columns)
− tbl.non_indexed_col IS [NOT] NULL
− tbl.non_indexed_col BETWEEN 10 AND 20
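A toy illustration of Issue #1, assuming a hypothetical index stored as four pages of eight entries each. It is not InnoDB's actual sampling algorithm; it only shows why an estimate built from a sample depends on which pages happen to be sampled.

```python
def rows_per_key(entries):
    """Rows per distinct key value within the given index entries."""
    return len(entries) / len(set(entries))

# Hypothetical index contents, in key order, split into pages.
pages = [
    [1, 2, 3, 4, 5, 6, 7, 8],           # every key is unique
    [9, 9, 10, 10, 11, 11, 12, 12],     # 2 rows per key
    [13, 13, 13, 13, 14, 14, 14, 14],   # 4 rows per key
    [15] * 8,                           # one key fills the page
]

all_entries = [key for page in pages for key in page]
exact = rows_per_key(all_entries)                 # 32 rows / 15 keys
estimates = [rows_per_key(page) for page in pages]

print(estimates)  # [1.0, 2.0, 4.0, 8.0]
```

Sampling a single page gives anything from 1 to 8 rows per key, while the exact figure is 32/15 (about 2.1).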
5. JOINs need column statistics
select * from
`order`
join customer on `order`.cust_id = customer.cust_id
join supplier on `order`.order_id = supplier.order_id
where
`order`.priority='high' and `order`.total_price > 1000 and
customer.status='vip' and customer.country='Germany' and
supplier.industry='electronics' and supplier.country='Finland'
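A rough sketch of why the join above needs column statistics. All table sizes and selectivities here are made up for illustration, and the conditions are assumed to be independent. A real optimizer also weighs access costs, so this only shows how the filtered row counts change the picture.

```python
from math import prod

# Hypothetical row counts and per-condition selectivities (two conditions per table).
tables = {
    "order":    {"rows": 1_000_000, "selectivities": [0.10, 0.20]},
    "customer": {"rows": 100_000,   "selectivities": [0.01, 0.05]},
    "supplier": {"rows": 10_000,    "selectivities": [0.50, 0.50]},
}

def rows_after_filter(rows, selectivities):
    """Rows left after applying all conditions, assuming they are independent."""
    return rows * prod(selectivities)

estimates = {
    name: rows_after_filter(t["rows"], t["selectivities"])
    for name, t in tables.items()
}

# With column statistics: start from the table with the fewest filtered rows.
first = min(estimates, key=estimates.get)
# Without them every condition looks like selectivity 1.0, so only table size counts.
first_without_stats = min(tables, key=lambda name: tables[name]["rows"])

print(first, first_without_stats)  # customer supplier
```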
6. Solution: EITS
EITS = Engine Independent Table Statistics
● mysql.table_stats
− #rows in table
● mysql.index_stats
− Index cardinality for each prefix. Gives AVG(#rows for key value)
● mysql.column_stats
− MIN value, MAX value
− Fraction of NULL values
− #different values
− Histogram
Provides estimates for range conditions
− non_key_col > 'foo'
− non_key_col=1234
− non_key_col IS [NOT] NULL
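A simplified sketch of how the column statistics listed above can turn into selectivity estimates for these conditions. The ColumnStats class is a hypothetical stand-in for one row of mysql.column_stats, and the formulas assume uniformly distributed values. The server's real formulas (and its use of histograms) may differ.

```python
from dataclasses import dataclass

@dataclass
class ColumnStats:
    """Hypothetical stand-in for the statistics of one numeric column."""
    min_value: float
    max_value: float
    nulls_ratio: float   # fraction of NULL values
    n_distinct: int      # number of different non-NULL values

def sel_is_null(s):
    return s.nulls_ratio

def sel_is_not_null(s):
    return 1.0 - s.nulls_ratio

def sel_equals(s):
    # Assumes every distinct value is equally frequent.
    return (1.0 - s.nulls_ratio) / s.n_distinct

def sel_greater_than(s, bound):
    # Linear interpolation between MIN and MAX (uniform distribution assumed).
    if bound >= s.max_value:
        return 0.0
    if bound <= s.min_value:
        return 1.0 - s.nulls_ratio
    fraction = (s.max_value - bound) / (s.max_value - s.min_value)
    return fraction * (1.0 - s.nulls_ratio)

stats = ColumnStats(min_value=0.0, max_value=1000.0, nulls_ratio=0.2, n_distinct=400)
print(sel_is_null(stats))            # 0.2
print(sel_equals(stats))             # about 0.002
print(sel_greater_than(stats, 750))  # about 0.2
```

A histogram replaces the uniform assumption in sel_greater_than with per-bucket information.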
7. Collecting EITS statistics
● Disabled by default
● Must be collected manually (ANALYZE TABLE)
− Takes a table/index scan
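set histogram_size=200; -- if you want histograms (you do)
-- Collect for chosen columns and indexes:
analyze table tbl persistent for
columns (col1, col2, ...)
indexes (idx1, idx2, ...);
-- Or for all columns and indexes:
analyze table tbl persistent for all;
-- Or make plain ANALYZE TABLE collect EITS:
set use_stat_tables='preferably';
analyze table tbl;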
8. Collecting EITS statistics
set histogram_size=200;
set use_stat_tables='preferably';
analyze table orders;
+------------------+---------+----------+-----------------------------------------+
| Table            | Op      | Msg_type | Msg_text                                |
+------------------+---------+----------+-----------------------------------------+
| dbt3sf1.orders   | analyze | status   | Engine-independent statistics collected |
| dbt3sf1.orders   | analyze | status   | OK                                      |
+------------------+---------+----------+-----------------------------------------+
● Can also modify statistics directly
insert into mysql.column_stats values(...);
flush table ...;
9. Enabling use of EITS statistics
● Use of the statistics is not enabled by default
set use_stat_tables='preferably'; -- or 'complementary'
set optimizer_use_condition_selectivity=4; -- 1..5
● Can enable globally or per-session
− Or even per-query (MariaDB 10.1.2+): SET STATEMENT var=value FOR query
10. New statistics test run
select *
from
lineitem, orders
where
o_orderkey=l_orderkey and
o_orderdate between '1990-01-01' and '1998-12-06' and
l_extendedprice > 1000000
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+
|id|select_type|table   |type|possible_keys|key    |key_len|ref              |rows   |filtered|Extra      |
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+
|1 |SIMPLE     |orders  |ALL |PRIMARY      |NULL   |NULL   |NULL             |1494230|  100.00|Using where|
|1 |SIMPLE     |lineitem|ref |PRIMARY,i_...|PRIMARY|4      |orders.o_orderkey|2      |  100.00|Using where|
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+
● 4.2 seconds
● filtered=100%
− Close to truth for o_orderdate between ...
− Far from truth for l_extendedprice > 1000000
− In 10.1, can use the “ANALYZE statement” to check this
11. New statistics test run (2)
● Collect table statistics
set histogram_size=200;
set use_stat_tables='preferably';
analyze table lineitem, orders;
+------------------+---------+----------+-----------------------------------------+
| Table            | Op      | Msg_type | Msg_text                                |
+------------------+---------+----------+-----------------------------------------+
| dbt3sf1.lineitem | analyze | status   | Engine-independent statistics collected |
| dbt3sf1.lineitem | analyze | status   | OK                                      |
| dbt3sf1.orders   | analyze | status   | Engine-independent statistics collected |
| dbt3sf1.orders   | analyze | status   | OK                                      |
+------------------+---------+----------+-----------------------------------------+
● Make the optimizer use them
set optimizer_use_condition_selectivity=4;
12. New statistics test run (3)
● Re-run the query
select *
from
lineitem, orders
where
o_orderkey=l_orderkey and
o_orderdate between '1990-01-01' and '1998-12-06' and
l_extendedprice > 1000000
+--+-----------+--------+------+-------------+-------+-------+-------------------+-------+--------+-----------+
|id|select_type|table   |type  |possible_keys|key    |key_len|ref                |rows   |filtered|Extra      |
+--+-----------+--------+------+-------------+-------+-------+-------------------+-------+--------+-----------+
|1 |SIMPLE     |lineitem|ALL   |PRIMARY,i_...|NULL   |NULL   |NULL               |6001215|    0.50|Using where|
|1 |SIMPLE     |orders  |eq_ref|PRIMARY      |PRIMARY|4      |lineitem.l_orderkey|1      |   99.50|Using where|
+--+-----------+--------+------+-------------+-------+-------+-------------------+-------+--------+-----------+
● lineitem.filtered=0.5% (for l_extendedprice > 1000000)
● 1.5 sec (from 4.2 sec)
− The speedup can be much larger for many-table joins.
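A back-of-the-envelope comparison of the two plans, using only the rows and filtered figures from the two EXPLAIN outputs. The helper name is hypothetical. The result counts estimated lookups into the second table, which is not the same thing as run time.

```python
def rows_passed_to_next_table(rows, filtered_pct):
    """Rows expected to survive the WHERE conditions on the first table
    (EXPLAIN rows multiplied by filtered%)."""
    return rows * filtered_pct / 100.0

# Old plan: orders is scanned first, filtered=100%.
before = rows_passed_to_next_table(1494230, 100.00)
# New plan: lineitem is scanned first, filtered=0.5%.
after = rows_passed_to_next_table(6001215, 0.50)

print(round(before))          # 1494230 lookups into lineitem
print(round(after))           # 30006 lookups into orders
print(round(before / after))  # 50
```

The number of index lookups drops about 50 times, but the new plan still scans all of lineitem, which fits the measured improvement being smaller (4.2 sec to 1.5 sec).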
14. Histogram properties
● Good for continuous, densely populated domains
− DATE[TIME], sequential identifiers, prices, counts, ...
● Not as good for sparse domains
− VARCHAR(100) CHARSET UTF8
● Not as good for highly-skewed domains
− List of popular items would work better
− Should still provide an estimate that's better than no estimate
● Can try different histogram settings (histogram_size is at most 255):
set histogram_size=255, histogram_type='single_prec_hb';
set histogram_size=128, histogram_type='double_prec_hb';
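A simplified sketch of a height-balanced histogram, to show where the properties above come from. It keeps raw values as bucket bounds and does not interpolate inside a bucket. As far as I know, MariaDB stores the bounds as 1-byte or 2-byte fractions of the MIN..MAX range, so treat this as an illustration of the idea rather than the server's format.

```python
from bisect import bisect_left

def build_histogram(values, n_buckets):
    """Upper bounds of n_buckets buckets, each holding the same number of rows."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i + 1) * n // n_buckets - 1] for i in range(n_buckets)]

def estimate_less_than(bounds, x):
    """Estimated fraction of rows with value < x.
    Counts only buckets that lie entirely below x; each holds 1/len(bounds) of the rows."""
    full_buckets = bisect_left(bounds, x)
    return full_buckets / len(bounds)

# Dense domain (hypothetical prices 1..100): the estimate is close.
dense = build_histogram(list(range(1, 101)), 4)
print(dense)                          # [25, 50, 75, 100]
print(estimate_less_than(dense, 51))  # 0.5 (true fraction: 0.50)

# Highly skewed domain: 90 rows hold the value 1, ten rows hold 2..11.
skewed = build_histogram([1] * 90 + list(range(2, 12)), 4)
print(skewed)                          # [1, 1, 1, 11]
print(estimate_less_than(skewed, 2))   # 0.75 (true fraction: 0.90)
```

The error is bounded by roughly one bucket, so more buckets give finer estimates. The popular value 1 occupies three buckets here, which is why a list of popular items would describe it better.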
15. EITS summary
● New kind of statistics in MariaDB 10.0
− Complements InnoDB's statistics
● Must be collected manually
− set histogram_size=255;
− analyze table tbl persistent for all;
● Must be enabled to be used (safe!)
− set use_stat_tables='preferably';
− set optimizer_use_condition_selectivity=4;
● Please report your experience!