SlideShare a Scribd company logo
1 of 50
Download to read offline
Sergei Petrunia
MariaDB devroom
FOSDEM 2021
Improving MariaDB’s
Query Optimizer
with better selectivity estimates
MariaDB Fest 2021
Sergei Petrunia
MariaDB developer
2
Background:
what are selectivity estimates
why they are important
3
Optimizer uses data statistics
Cost-based query optimizer
uses data statisistics:
●
Cardinalities
●
Selectivities
●
Cost model
●
...
Query Optimizer
Query
Query Plan
Query
Data
statistics
4
Cardinalities and selectivities
Cardinality is a number of rows
●
Number of rows in the table
●
Number of rows left after local
condition check (the condition
has “selectivity”)
●
Number of rows after the join
operation - ”join output
cardinality”
●
...
select *
from
orders, lineitem
where
o_orderkey=l_orderkey and
o_orderdate between $DATE1 and $DATE2 and
l_extendedprice > 1000
orders
lineitem
join
5
Condition selectivity
●
Condition selectivity:
●
Selectivity
– 0 (0%) = “no rows are accepted”
– 1.0 (or 100%) = “all rows are accepted”
– 0.33 (or 33%) = “one row in 3 is accepted”
selectivity(cond)=
rows satisfyingcond
totalrows
∗100%
6
Selectivity is important
●
Table access cost depends on the
“incoming” cardinality
●
Cardinality errors are multiplied, e.g.
– 2x customers
– 3x orders per customer
– gives 6x lineitems
●
Compare: errors in read costs are
added.
●
Wrong cardinality is a common cause of
huge errors in estimates.
select * from
customer
join orders on c_custkey=o_custkey
join lineitem on o_orderkey=l_orderkey
where
c_acctbal < 100 and
o_shippriority=3 and
l_extendedprice>1000
customer
orders
lineitem
7
Computing selectivity
8
Computing selectivity
select *
from
lineitem, orders
where
o_orderkey=l_orderkey and
o_orderdate between $DATE1 and $DATE2 and
l_extendedprice > 1000000
●
Local condition selectivity
– Uses columns of one table
– Typically, “column CMP const”
●
Join condition selectivity
9
Local condition selectivity
10
Local condition selectivity
●
Textbook:
– “Guesstimates”
●
col=const: 10%, col < const: 50%
– Histograms
●
or other pre-collected stats
●
MySQL/MariaDB: “records_in_range estimates”
– Use an index as a histogram
o_orderdate between $DATE1 and $DATE2
l_extendedprice > 1000000
o_shippriority=3
– Perform sampling
11
Histograms
12
Basic Histogram
●
“Value list” histogram
●
List the values and their
frequencies
●
Works when
n_values < n_buckets
●
MySQL’s name: “singleton”
Value Cardinality
‘foo’ 100
‘bar’ 200
‘baz’ 300
13
Equi-width histogram
●
Pre-defined bucket
bounds
●
Easy to collect
●
Not always accurate:
– A few “outlier” values in peripheral buckets
– Most values are in a few very popular buckets
●
What if densely-populated regions had more buckets?
10 20 30 40 50 60 70 80
14
Equi-height histogram
●
Make buckets have the same
number of rows
– Densely populated areas
get more buckets
– Sparsely populated get less
●
Better precision
●
Can be collected with one
pass.
10 20 30 40 50 60 70 80
10 20 30 40 50 60 70 80
15
Histogram usage
●
MariaDB
– 10.0-10.6: “Height-balanced” (aka equi-height)
– 10.7: improved equi-height* with common values
●
MySQL 8: equi-height* and “singleton”
●
PostgreSQL: MostCommonValue list + equi-height
●
CockroachDB: equi-height*
●
TiDB: equi-height* + TopN
16
Histograms in MariaDB
●
Introduced in MariaDB 10.0
●
Not enabled by default
– Require manual collection, update
– Need to enable use by the optimizer
●
Collection is expensive
– Space consumption issues with VARCHAR(N>>)
17
Histograms in MariaDB 10.4
●
Closer to being on by default
– ”The optimizer will use histograms if present”
– Still, need non-default ANALYZE command to collect.
●
Bernoulli sampling, @@analyze_sample_percentage
– 100 (default): use all data
– 0: pick the percentage automatically
18
Histograms in MariaDB 10.4: internals
●
Height-balanced (=equi-depth)
●
Store “fractions”
– 0.0 is min_value
– 1.0 is max_value
●
Fixed-precision
– SINGLE_PREC_HB - 1/256
– DOUBLE_PREC_HB 1/64K
10 20 30 40 50 60 70 80
10 20 30 40 50 60 70 80
0.0, 0.42, 0.6, 0.7,
0.85,
0.91,
0.93,
1.0
min_val=10
max_val=80
●
Ok for ranges
●
Poor for: Popular values, VARCHARs
19
Precision issue (MDEV-26125)
●
1M population, “country” matches real countries’ population.
analyze select * from generated_pop where country='China';
..-+---------+------------+----------+------------+-------------+
| rows | r_rows | filtered | r_filtered | Extra |
..-+---------+------------+----------+------------+-------------+
| 1000000 | 1000000.00 | 22.66 | 21.04 | Using where |
..-+---------+------------+----------+------------+-------------+
analyze select * from generated_pop where country='Chile';
..-+---------+------------+----------+------------+-------------+
| rows | r_rows | filtered | r_filtered | Extra |
..-+---------+------------+----------+------------+-------------+
| 1000000 | 1000000.00 | 22.66 | 0.25 | Using where |
..-+---------+------------+----------+------------+-------------+
analyze select * from generated_pop where country='Sweden';
..-+---------+------------+----------+------------+-------------+
| rows | r_rows | filtered | r_filtered | Extra |
..-+---------+------------+----------+------------+-------------+
| 1000000 | 1000000.00 | 4.69 | 0.15 | Using where |
..-+---------+------------+----------+------------+-------------+
Ok, for China it’s close
“filtered” is the estimate r_filted is the real value (observed)
Very inaccurate for its
neighbor Chile!
Better for Sweden
20
Histograms in MariaDB 10.7
●
Based on GSoC’21 project by Michael Okoko
●
@@histogram_type=JSON_HB
●
Stores values, not fractions
●
Histogram is stored as JSON
●
The histogram is height-balanced* (=equi-height)
– Common values are in their own buckets
21
JSON_HB fixes MDEV-26125
●
The same 1M population dataset, with JSON histogram:
set histogram_type=json_hb;
analyze table generated_pop persistent for all;
analyze select * from generated_pop where country='China';
..-+---------+------------+----------+------------+-------------+
| rows | r_rows | filtered | r_filtered | Extra |
..-+---------+------------+----------+------------+-------------+
| 1000000 | 1000000.00 | 21.04 | 21.04 | Using where |
..-+---------+------------+----------+------------+-------------+
analyze select * from generated_pop where country='Chile';
..-+---------+------------+----------+------------+-------------+
| rows | r_rows | filtered | r_filtered | Extra |
..-+---------+------------+----------+------------+-------------+
| 1000000 | 1000000.00 | 0.14 | 0.25 | Using where |
..-+---------+------------+----------+------------+-------------+
analyze select * from generated_pop where country='Sweden';
..-+---------+------------+----------+------------+-------------+
| rows | r_rows | filtered | r_filtered | Extra |
..-+---------+------------+----------+------------+-------------+
| 1000000 | 1000000.00 | 0.17 | 0.15 | Using where |
..-+---------+------------+----------+------------+-------------+
Same as before for
China
Much better for Chile
Also better for Sweden
22
JSON_HB under the hood
●
A table of vehicles registered in an Australian state
{
"histogram_hb_v2": [
...
{
"start": "C3",
"size": 0.003936638,
"ndv": 24
},
{
"start": "CALAIS",
"size": 0.002868234,
"ndv": 10
},
{
"start": "CAMRY",
"size": 0.034154968,
"ndv": 1
},
...
●
start – Start value
●
size – fraction of table rows in the bucket
– May vary
– So, not really “equi-height”
– The reason: “popular” values are in their
own buckets
●
ndv
– Special case: ndv=1
23
MySQL’s equi-height histogram
●
Bucket is a JSON array with
– min, max
– cumulative_fraction
– ndv
●
Buckets may have different sizes
– Popular values in their own buckets
– “Each value should be in one bucket” (and not
two adjacent buckets)(?)
– https://bugs.mysql.com/bug.php?id=104789
●
There are “holes” between buckets
{
"buckets": [
...
[
"base64:type254:QlVHR1k=",
"base64:type254:Q0FERFk=",
0.14083891639965046,
20
],
[
"base64:type254:Q0FMQUlT",
"base64:type254:Q0FNQVJP",
0.14234833037629427,
2
],
[
"base64:type254:Q0FNUlk=",
"base64:type254:Q0FNUlk=",
0.18120912003813255,
1
],
24
MySQL’s equi-height histogram
{
"buckets": [
...
[
"BUGGY",
"CADDY",
0.14083891639965046,
20
],
[
"CALAIS",
"CAMARO",
0.14234833037629427,
2
],
[
"CAMRY",
"CAMRY",
0.18120912003813255,
1
],
●
Bucket is a JSON array with
– min, max
– cumulative_fraction
– ndv
●
Buckets may have different sizes
– Popular values in their own buckets
– “Each value should be in one bucket” (and not
two adjacent buckets)(?)
– https://bugs.mysql.com/bug.php?id=104789
●
There are “holes” between buckets
25
CockroachDB’s equi-height histogram
●
Just the upper_bound
●
range_rows is number of rows between prev and this bound, exclusive.
– Not exactly equi-height, either.
– “A value goes into one bucket”
●
equal_rows is how many rows were equal to the upper_bound
– Every bucket has has a “singleton” co-bucket
●
reason for this?
upper_bound | range_rows | distinct_range_rows | equal_rows
------------------+------------+---------------------+------------
...
'CAMRY' | 306 | 4.2680661910276 | 25189
'CAPTIVA' | 1760 | 24.5483545627732 | 5971
'CARRY' | 3368 | 46.9766239587613 | 306
'CERATO' | 1760 | 24.5483545627732 | 4517
'CHEROKEE' | 2067 | 28.8303686825296 | 1607
'CIVIC' | 459 | 6.40209928654141 | 7962
...
26
TiDB’s equi-height histogram
●
TOP_N values are stored separately
●
Lower_bound and Upper_bound (holes)
●
Count is cumulative number of values (Not exactly equi-height, again)
●
Repeats is the occurence number of the Upper_bound
– Again, every other bucket is a singleton
●
ndv is only present with newer collection algorithm
+-----------+-------+---------+--------------------+----------------------+------+
| Bucket_id | Count | Repeats | Lower_Bound | Upper_Bound | Ndv |
+-----------+-------+---------+--------------------+----------------------+------+
...
| 60 | 22893 | 226 | CABSTAR | CALIFORNIA | 0 |
| 61 | 23212 | 226 | CAMROAD | CAPRI | 0 |
| 62 | 23714 | 271 | CARAVAN (IMPORT) | CELERIO | 0 |
| 63 | 24079 | 136 | CELICA (IMPORT) | CF SERIES | 0 |
| 64 | 24489 | 181 | CHARADE CENTRO | CHASER | 0 |
| 65 | 24808 | 45 | CHEVELLE | CITY COUPE | 0 |
SHOW STATS TOP_N;
+----------------------+-------+
| Value | Count |
+----------------------+-------+
| CAMRY | 24507 |
| CAPTIVA | 5019 |
...
27
PostgreSQL’s equi-height histogram
●
most_common_vals (100 of them) are stored separately
●
The histogram just stores the bucket bounds
– A real equi-height histogram.
most_common_vals | {HILUX,COMMODORE,LANDCRUISER,... }
most_common_freqs | {0.052233335,0.0415,0.0379,... }
histogram_bounds | {100,125I,208,"300 SERIES 616",307,318I,
320I,323,323I,335CI,407,"4 RUNNER",530I,86,9-Mar,A200,A4,A5,
ALMERA,AVALON,B180,B3000,BEETLE,BRERA,"C180 SERIES",...
28
On the number of buckets
●
PostgreSQL: default_statistics_target=100
– 100 MCVs + 100 buckets
●
MySQL: the default number of buckets is 100
●
CockroachDB: 200
– for non-indexed cols, just 2. This means they only get min/max?
●
TiDB – varied, 50..250?
●
MariaDB: histogram_size=254
– SINGLE_PREC_HB: 254, DOUBLE_PREC_HB: 127 buckets.
29
Histogram benchmark?
●
Can’t have it – apples to apples comparison is hard
●
Sampling vs full-scan collection
●
Histogram size
– Do MCV/TopN count as buckets?
– Bucket with two bounds vs bucket with one
– ...
●
Adequate precision is enough
30
Histograms take-aways
●
MariaDB 10.7 is getting new histogram_type=JSON_HB
●
Provides histograms similar to other databases
– Height-balanced*, common values are included
– Make them the default in the next release?
●
Things that are still missing:
1. “Genuine sampling”. Collection should not be a full table scan.
2. Automatic re-collection.
31
Selectivity for multiple conditions
32
Combining multiple conditions
Combined selectivity
●
if independent: sel1*sel2
●
If dependent:
– “Worst” case: MIN(sel1, sel2)
– “Best” case: 0.0
select ...
from order_items
where shipdate='2021-12-15' AND item_name='christmas light'
'swimsuit'
sel1 sel2
33
Solution #1: use a certain assumption
●
Combine conditions on the same column
●
Textbook: assume conditions are independent
sel=sel1*sel2
●
Conservative: use the most selective
sel= MIN(sel1, sel2)
shipdate >= '2021-12-15' AND shipdate <= '2021-12-24' AND item_name='doll'
34
Solution #1: use a certain assumption
●
“Exponential back-off” (SQL Server 2014)
– order conditions by selectivity, most selective first:
s1
s2
s3
s4
...
– Then,
– Assume s1
,“half” of s2 ,
“quarter” of s3
, ...
shipdate >= '2021-12-15' AND shipdate <= '2021-12-24' AND item_name='doll'
sel=s1⋅s2
1
2
⋅s3
1
4
⋅s4
1
8
35
Solution #2:
Multi-Column statistics
36
Multi-column stats
●
Multi-column histogram would take a lot of space
– Typically not used
●
TiDB seems to support them (but see TiDB issue 22589)?
●
PostgreSQL
– 10: Functional dependency
– 12 (Oct,2019): Multivariate MCV lists
●
MySQL, MariaDB: records_in_range.
37
MariaDB, MySQL: records-in-range
The optimizer
●
Builds a list of ranges over {col1, col2, col3 ...}
– (this is complex topic)
– In MariaDB, check the optimizer trace (not in MySQL: Bug #95824)
●
Uses the index as a large histogram
– The call to query the index:
records_in_range({c1,c2,c3} <= {col1,col2,col3} <= {...})
INDEX idx(col1, col2, col3, ...)
WHERE col1=... AND col2<=... AND ...col3...
38
records_in_range in action
●
Using vehicle registration database again
alter table vehicle_reg add index (Make, Model);
explain select * from vehicle_reg where Make='FORD' and Model='FOCUS';
+------+-------------+-------------+------+---------------+------+---------+-------------+------+-------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
+------+-------------+-------------+------+---------------+------+---------+-------------+------+-------------
| 1 | SIMPLE | vehicle_reg | ref | Make | Make | 390 | const,const | 8638 | Using index
+------+-------------+-------------+------+---------------+------+---------+-------------+------+-------------
explain select * from vehicle_reg where Make='MITSUBISHI' and Model='FOCUS';
+------+-------------+-------------+------+---------------+------+---------+-------------+------+-------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
+------+-------------+-------------+------+---------------+------+---------+-------------+------+-------------
| 1 | SIMPLE | vehicle_reg | ref | Make | Make | 390 | const,const | 1 | Using index
+------+-------------+-------------+------+---------------+------+---------+-------------+------+-------------
●
Try a frequently occurring combination:
●
Try a combination that doesn’t exist:
●
Good estimates!
39
PostgreSQL, for comparison
●
INDEX(Make, Model), no multi-variate statistics
test=# explain select * from vehicle_reg where Make='FORD' and Model='FOCUS';
QUERY PLAN
-----------------------------------------------------------------------------------------
Bitmap Heap Scan on vehicle_reg (cost=8.35..1354.94 rows=383 width=125)
Recheck Cond: (((make)::text = 'FORD'::text) AND ((model)::text = 'FOCUS'::text))
-> Bitmap Index Scan on make_model (cost=0.00..8.25 rows=383 width=0)
Index Cond: (((make)::text = 'FORD'::text) AND ((model)::text = 'FOCUS'::text))
●
Doesn’t see the difference!
– 347/383 is a count(Mitsubishi)/count(Ford)
test=# explain select * from vehicle_reg where Make='MITSUBISHI' and Model='FOCUS';
QUERY PLAN
-----------------------------------------------------------------------------------------------
Bitmap Heap Scan on vehicle_reg (cost=7.98..1237.73 rows=347 width=125)
Recheck Cond: (((make)::text = 'MITSUBISHI'::text) AND ((model)::text = 'FOCUS'::text))
-> Bitmap Index Scan on make_model (cost=0.00..7.90 rows=347 width=0)
Index Cond: (((make)::text = 'MITSUBISHI'::text) AND ((model)::text = 'FOCUS'::text))
●
383 rows is “sel1 * sel2”
40
Multivariate stats
test=# explain select * from vehicle_reg where Make='FORD' and Model='FOCUS';
QUERY PLAN
-----------------------------------------------------------------------------------------
Bitmap Heap Scan on vehicle_reg (cost=69.86..10465.53 rows=4823 width=124)
Recheck Cond: (((make)::text = 'FORD'::text) AND ((model)::text = 'FOCUS'::text))
-> Bitmap Index Scan on make_model (cost=0.00..68.66 rows=4823 width=0)
Index Cond: (((make)::text = 'FORD'::text) AND ((model)::text = 'FOCUS'::text))
●
rows=4823 is a much better estimate. Ford-Focus is in MCV list
test=# explain select * from vehicle_reg where Make='MITSUBISHI' and Model='FOCUS';
QUERY PLAN
-----------------------------------------------------------------------------------------------
Bitmap Heap Scan on vehicle_reg (cost=8.38..1364.94 rows=386 width=124)
Recheck Cond: (((make)::text = 'MITSUBISHI'::text) AND ((model)::text = 'FOCUS'::text))
-> Bitmap Index Scan on make_model (cost=0.00..8.29 rows=386 width=0)
Index Cond: (((make)::text = 'MITSUBISHI'::text) AND ((model)::text = 'FOCUS'::text))
CREATE STATISTICS make_model_stat (mcv) ON Make, Model FROM vehicle_reg;
analyze vehicle_reg;
●
rows=386, the same as before. No help if the combination is not in MCV list.
41
Multiple conditions selectivity takeaways
●
Basic: combine per-column statistics
– Assume independence/overlap/etc
– Very imprecise (no information)
●
Advanced: multi-column statistics
– Much better
– Much harder than single-column stats
– MariaDB/MySQL has a non-conventional form: records_in_range
42
Combining
multi-column selectivities
43
Combining multi-column statistics
col1=10 AND col2='foo' AND col3<='BBB' AND col4 IN (1,2,3)
44
Combining multi-column statistics
●
Multi-column statistics can “overlap”
●
How to combine them?
– Don’t want to assume independence...
col1=10 AND col2='foo' AND col3<='BBB' AND col4 IN (1,2,3)
Stat 1
Stat 2
Stat 3
Stat 4
45
The right way – research papers
●
VLDB‘2005: V. Markl et. al, "Consistently Estimating the Selectivity
of Conjuncts of Predicates"
– Introduces “Maximum Entropy” principle
– Makes use of all available information
●
EBDT’2020, D Havenstein et. al. “Fast Entropy Maximization for
Selectivity Estimation of Conjunctive Predicates on CPUs and
GPUs”
– New, faster entropy maximization algorithm
●
Hard!
– Not used in production(?)
46
What is used in production?
●
Prefer one statistic to other
●
Some heuristic rule
– “Statistics that uses more columns goes first”
– “Minimize the number of independence assumptions we have to
make”
– “Pick the most selective stats first”
– ...
47
In MariaDB
●
Reconciling overlapping estimates since 2006!
– e.g. sql_select.cc, grep for ReuseRangeEstimateForRef
●
The issue got worse in 10.4
– Optimizer tries to account for selectivity more agressively
– => More “collisions”
●
MDEV-23707 and linked tasks: MDEV-21813, MDEV-25830, ...
– Need to fix these
– Working on this
●
Workaround for now: set optimizer_use_condition_selectivity=1
48
Not covered in this talk:
Join Condition Selectivity
49
Takeaways
●
Computing condition selectivity is
– important
– hard
●
MariaDB 10.7: JSON_HB histograms
– more precise for common values, VARCHARs.
●
Work is underway to improve selectivity
computations in the optimizer.
50
Thanks for your attention!

More Related Content

What's hot

mysql 8.0 architecture and enhancement
mysql 8.0 architecture and enhancementmysql 8.0 architecture and enhancement
mysql 8.0 architecture and enhancementlalit choudhary
 
Chapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.pptChapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.pptSubrata Kumer Paul
 
What's new in MariaDB TX 3.0
What's new in MariaDB TX 3.0What's new in MariaDB TX 3.0
What's new in MariaDB TX 3.0MariaDB plc
 
MySQL Performance for DevOps
MySQL Performance for DevOpsMySQL Performance for DevOps
MySQL Performance for DevOpsSveta Smirnova
 
MariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsMariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsFederico Razzoli
 
이것이 레디스다.
이것이 레디스다.이것이 레디스다.
이것이 레디스다.Kris Jeong
 
Hyperloglog Project
Hyperloglog ProjectHyperloglog Project
Hyperloglog ProjectKendrick Lo
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBMydbops
 
Recovery of lost or corrupted inno db tables(mysql uc 2010)
Recovery of lost or corrupted inno db tables(mysql uc 2010)Recovery of lost or corrupted inno db tables(mysql uc 2010)
Recovery of lost or corrupted inno db tables(mysql uc 2010)Aleksandr Kuzminsky
 
PostgreSQL 12 New Features with Examples (English) GA
PostgreSQL 12 New Features with Examples (English) GAPostgreSQL 12 New Features with Examples (English) GA
PostgreSQL 12 New Features with Examples (English) GANoriyoshi Shinoda
 
MySQL Timeout Variables Explained
MySQL Timeout Variables Explained MySQL Timeout Variables Explained
MySQL Timeout Variables Explained Mydbops
 
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)Jens Hadlich
 
Mongodb - NoSql Database
Mongodb - NoSql DatabaseMongodb - NoSql Database
Mongodb - NoSql DatabasePrashant Gupta
 
redis 소개자료 - 네오클로바
redis 소개자료 - 네오클로바redis 소개자료 - 네오클로바
redis 소개자료 - 네오클로바NeoClova
 
Optimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceOptimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceMariaDB plc
 
MySQL GTID Concepts, Implementation and troubleshooting
MySQL GTID Concepts, Implementation and troubleshooting MySQL GTID Concepts, Implementation and troubleshooting
MySQL GTID Concepts, Implementation and troubleshooting Mydbops
 
MariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB plc
 
MariaDB: in-depth (hands on training in Seoul)
MariaDB: in-depth (hands on training in Seoul)MariaDB: in-depth (hands on training in Seoul)
MariaDB: in-depth (hands on training in Seoul)Colin Charles
 
Case Study of Convolutional Neural Network
Case Study of Convolutional Neural NetworkCase Study of Convolutional Neural Network
Case Study of Convolutional Neural NetworkNamHyuk Ahn
 

What's hot (20)

mysql 8.0 architecture and enhancement
mysql 8.0 architecture and enhancementmysql 8.0 architecture and enhancement
mysql 8.0 architecture and enhancement
 
Chapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.pptChapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.ppt
 
What's new in MariaDB TX 3.0
What's new in MariaDB TX 3.0What's new in MariaDB TX 3.0
What's new in MariaDB TX 3.0
 
MySQL Performance for DevOps
MySQL Performance for DevOpsMySQL Performance for DevOps
MySQL Performance for DevOps
 
MariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsMariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAs
 
이것이 레디스다.
이것이 레디스다.이것이 레디스다.
이것이 레디스다.
 
Hyperloglog Project
Hyperloglog ProjectHyperloglog Project
Hyperloglog Project
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDB
 
Recovery of lost or corrupted inno db tables(mysql uc 2010)
Recovery of lost or corrupted inno db tables(mysql uc 2010)Recovery of lost or corrupted inno db tables(mysql uc 2010)
Recovery of lost or corrupted inno db tables(mysql uc 2010)
 
PostgreSQL 12 New Features with Examples (English) GA
PostgreSQL 12 New Features with Examples (English) GAPostgreSQL 12 New Features with Examples (English) GA
PostgreSQL 12 New Features with Examples (English) GA
 
MySQL Timeout Variables Explained
MySQL Timeout Variables Explained MySQL Timeout Variables Explained
MySQL Timeout Variables Explained
 
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
 
Mongodb - NoSql Database
Mongodb - NoSql DatabaseMongodb - NoSql Database
Mongodb - NoSql Database
 
redis 소개자료 - 네오클로바
redis 소개자료 - 네오클로바redis 소개자료 - 네오클로바
redis 소개자료 - 네오클로바
 
Optimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceOptimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performance
 
MySQL GTID Concepts, Implementation and troubleshooting
MySQL GTID Concepts, Implementation and troubleshooting MySQL GTID Concepts, Implementation and troubleshooting
MySQL GTID Concepts, Implementation and troubleshooting
 
MariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStore
 
Redis and it's data types
Redis and it's data typesRedis and it's data types
Redis and it's data types
 
MariaDB: in-depth (hands on training in Seoul)
MariaDB: in-depth (hands on training in Seoul)MariaDB: in-depth (hands on training in Seoul)
MariaDB: in-depth (hands on training in Seoul)
 
Case Study of Convolutional Neural Network
Case Study of Convolutional Neural NetworkCase Study of Convolutional Neural Network
Case Study of Convolutional Neural Network
 

Similar to MariaDB's Query Optimizer with Better Selectivity Estimates

Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8Sergey Petrunya
 
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...Kangaroot
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreMariaDB plc
 
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...EDB
 
MariaDB for Developers and Operators (DevOps)
MariaDB for Developers and Operators (DevOps)MariaDB for Developers and Operators (DevOps)
MariaDB for Developers and Operators (DevOps)Colin Charles
 
MariaDB for developers
MariaDB for developersMariaDB for developers
MariaDB for developersColin Charles
 
MariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityMariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityColin Charles
 
What's New In MySQL 5.6
What's New In MySQL 5.6What's New In MySQL 5.6
What's New In MySQL 5.6Abdul Manaf
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightDataStax Academy
 
Oracle Database InMemory
Oracle Database InMemoryOracle Database InMemory
Oracle Database InMemoryJorge Barba
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesSergey Petrunya
 
Why Use EXPLAIN FORMAT=JSON?
 Why Use EXPLAIN FORMAT=JSON?  Why Use EXPLAIN FORMAT=JSON?
Why Use EXPLAIN FORMAT=JSON? Sveta Smirnova
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreMariaDB plc
 
Mysqlconf2013 mariadb-cassandra-interoperability
Mysqlconf2013 mariadb-cassandra-interoperabilityMysqlconf2013 mariadb-cassandra-interoperability
Mysqlconf2013 mariadb-cassandra-interoperabilitySergey Petrunya
 
Maria db cassandra interoperability cassandra storage engine in mariadb
Maria db cassandra interoperability cassandra storage engine in mariadbMaria db cassandra interoperability cassandra storage engine in mariadb
Maria db cassandra interoperability cassandra storage engine in mariadbYUCHENG HU
 
How Data Flow analysis works in a static code analyzer
How Data Flow analysis works in a static code analyzerHow Data Flow analysis works in a static code analyzer
How Data Flow analysis works in a static code analyzerAndrey Karpov
 
MariaDB and Clickhouse Percona Live 2019 talk
MariaDB and Clickhouse Percona Live 2019 talkMariaDB and Clickhouse Percona Live 2019 talk
MariaDB and Clickhouse Percona Live 2019 talkAlexander Rubin
 
Fosdem2017 Scientific computing on Jruby
Fosdem2017  Scientific computing on JrubyFosdem2017  Scientific computing on Jruby
Fosdem2017 Scientific computing on JrubyPrasun Anand
 
Optimizer Histograms: When they Help and When Do Not?
Optimizer Histograms: When they Help and When Do Not?Optimizer Histograms: When they Help and When Do Not?
Optimizer Histograms: When they Help and When Do Not?Sveta Smirnova
 

Similar to MariaDB's Query Optimizer with Better Selectivity Estimates (20)

Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8
 
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStore
 
MySQLinsanity
MySQLinsanityMySQLinsanity
MySQLinsanity
 
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
 
MariaDB for Developers and Operators (DevOps)
MariaDB for Developers and Operators (DevOps)MariaDB for Developers and Operators (DevOps)
MariaDB for Developers and Operators (DevOps)
 
MariaDB for developers
MariaDB for developersMariaDB for developers
MariaDB for developers
 
MariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityMariaDB Cassandra Interoperability
MariaDB Cassandra Interoperability
 
What's New In MySQL 5.6
What's New In MySQL 5.6What's New In MySQL 5.6
What's New In MySQL 5.6
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
 
Oracle Database InMemory
Oracle Database InMemoryOracle Database InMemory
Oracle Database InMemory
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databases
 
Why Use EXPLAIN FORMAT=JSON?
 Why Use EXPLAIN FORMAT=JSON?  Why Use EXPLAIN FORMAT=JSON?
Why Use EXPLAIN FORMAT=JSON?
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStore
 
Mysqlconf2013 mariadb-cassandra-interoperability
Mysqlconf2013 mariadb-cassandra-interoperabilityMysqlconf2013 mariadb-cassandra-interoperability
Mysqlconf2013 mariadb-cassandra-interoperability
 
Maria db cassandra interoperability cassandra storage engine in mariadb
Maria db cassandra interoperability cassandra storage engine in mariadbMaria db cassandra interoperability cassandra storage engine in mariadb
Maria db cassandra interoperability cassandra storage engine in mariadb
 
How Data Flow analysis works in a static code analyzer
How Data Flow analysis works in a static code analyzerHow Data Flow analysis works in a static code analyzer
How Data Flow analysis works in a static code analyzer
 
MariaDB and Clickhouse Percona Live 2019 talk
MariaDB and Clickhouse Percona Live 2019 talkMariaDB and Clickhouse Percona Live 2019 talk
MariaDB and Clickhouse Percona Live 2019 talk
 
Fosdem2017 Scientific computing on Jruby
Fosdem2017  Scientific computing on JrubyFosdem2017  Scientific computing on Jruby
Fosdem2017 Scientific computing on Jruby
 
Optimizer Histograms: When they Help and When Do Not?
Optimizer Histograms: When they Help and When Do Not?Optimizer Histograms: When they Help and When Do Not?
Optimizer Histograms: When they Help and When Do Not?
 

More from Sergey Petrunya

New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12Sergey Petrunya
 
MariaDB's join optimizer: how it works and current fixes
MariaDB's join optimizer: how it works and current fixesMariaDB's join optimizer: how it works and current fixes
MariaDB's join optimizer: how it works and current fixesSergey Petrunya
 
JSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger pictureJSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger pictureSergey Petrunya
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace WalkthroughSergey Petrunya
 
ANALYZE for Statements - MariaDB's hidden gem
ANALYZE for Statements - MariaDB's hidden gemANALYZE for Statements - MariaDB's hidden gem
ANALYZE for Statements - MariaDB's hidden gemSergey Petrunya
 
MariaDB 10.4 - что нового
MariaDB 10.4 - что новогоMariaDB 10.4 - что нового
MariaDB 10.4 - что новогоSergey Petrunya
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performanceSergey Petrunya
 
MariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeMariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeSergey Petrunya
 
Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Sergey Petrunya
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkSergey Petrunya
 
MariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it standMariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it standSergey Petrunya
 
MyRocks in MariaDB | M18
MyRocks in MariaDB | M18MyRocks in MariaDB | M18
MyRocks in MariaDB | M18Sergey Petrunya
 
New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3Sergey Petrunya
 
Histograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLHistograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLSergey Petrunya
 
Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Sergey Petrunya
 
MyRocks in MariaDB: why and how
MyRocks in MariaDB: why and howMyRocks in MariaDB: why and how
MyRocks in MariaDB: why and howSergey Petrunya
 
Эволюция репликации в MySQL и MariaDB
Эволюция репликации в MySQL и MariaDBЭволюция репликации в MySQL и MariaDB
Эволюция репликации в MySQL и MariaDBSergey Petrunya
 
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)Sergey Petrunya
 

More from Sergey Petrunya (20)

New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12
 
MariaDB's join optimizer: how it works and current fixes
MariaDB's join optimizer: how it works and current fixesMariaDB's join optimizer: how it works and current fixes
MariaDB's join optimizer: how it works and current fixes
 
JSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger pictureJSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger picture
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace Walkthrough
 
ANALYZE for Statements - MariaDB's hidden gem
ANALYZE for Statements - MariaDB's hidden gemANALYZE for Statements - MariaDB's hidden gem
ANALYZE for Statements - MariaDB's hidden gem
 
MariaDB 10.4 - что нового
MariaDB 10.4 - что новогоMariaDB 10.4 - что нового
MariaDB 10.4 - что нового
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performance
 
MariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeMariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit hole
 
Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmark
 
MariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it standMariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it stand
 
MyRocks in MariaDB | M18
MyRocks in MariaDB | M18MyRocks in MariaDB | M18
MyRocks in MariaDB | M18
 
New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3
 
MyRocks in MariaDB
MyRocks in MariaDBMyRocks in MariaDB
MyRocks in MariaDB
 
Histograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLHistograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQL
 
Say Hello to MyRocks
Say Hello to MyRocksSay Hello to MyRocks
Say Hello to MyRocks
 
Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2
 
MyRocks in MariaDB: why and how
MyRocks in MariaDB: why and howMyRocks in MariaDB: why and how
MyRocks in MariaDB: why and how
 
Эволюция репликации в MySQL и MariaDB
Эволюция репликации в MySQL и MariaDBЭволюция репликации в MySQL и MariaDB
Эволюция репликации в MySQL и MariaDB
 
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
 

Recently uploaded

Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 

Recently uploaded (20)

Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 

MariaDB's Query Optimizer with Better Selectivity Estimates

  • 1. Sergei Petrunia MariaDB devroom FOSDEM 2021 Improving MariaDB’s Query Optimizer with better selectivity estimates MariaDB Fest 2021 Sergei Petrunia MariaDB developer
  • 2. 2 Background: what are selectivity estimates why they are important
  • 3. 3 Optimizer uses data statistics Cost-based query optimizer uses data statisistics: ● Cardinalities ● Selectivities ● Cost model ● ... Query Optimizer Query Query Plan Query Data statistics
  • 4. 4 Cardinalities and selectivities Cardinality is a number of rows ● Number of rows in the table ● Number of rows left after local condition check (the condition has “selectivity”) ● Number of rows after the join operation - ”join output cardinality” ● ... select * from orders, lineitem where o_orderkey=l_orderkey and o_orderdate between $DATE1 and $DATE2 and l_extendedprice > 1000 orders lineitem join
  • 5. 5 Condition selectivity ● Condition selectivity: ● Selectivity – 0 (0%) = “no rows are accepted” – 1.0 (or 100%) = “all rows are accepted” – 0.33 (or 33%) = “one row in 3 is accepted” selectivity(cond)= rows satisfyingcond totalrows ∗100%
  • 6. 6 Selectivity is important ● Table access cost depends on the “incoming” cardinality ● Cardinality errors are multiplied, e.g. – 2x customers – 3x orders per customer – gives 6x lineitems ● Compare: errors in read costs are added. ● Wrong cardinality is a common cause of huge errors in estimates. select * from customer join orders on c_custkey=o_custkey join lineitem on o_orderkey=l_orderkey where c_acctbal < 100 and o_shippriority=3 and l_extendedprice>1000 customer orders lineitem
  • 8. 8 Computing selectivity select * from lineitem, orders where o_orderkey=l_orderkey and o_orderdate between $DATE1 and $DATE2 and l_extendedprice > 1000000 ● Local condition selectivity – Uses columns of one table – Typically, “column CMP const” ● Join condition selectivity
  • 10. 10 Local condition selectivity ● Textbook: – “Guesstimates” ● col=const: 10%, col < const: 50% – Histograms ● or other pre-collected stats ● MySQL/MariaDB: “records_in_range estimates” – Use an index as a histogram o_orderdate between $DATE1 and $DATE2 l_extendedprice > 1000000 o_shippriority=3 – Perform sampling
  • 12. 12 Basic Histogram ● “Value list” histogram ● List the values and their frequencies ● Works when n_values < n_buckets ● MySQL’s name: “singleton” Value Cardinality ‘foo’ 100 ‘bar’ 200 ‘baz’ 300
  • 13. 13 Equi-width histogram ● Pre-defined bucket bounds ● Easy to collect ● Not always accurate: – A few “outlier” values in peripheral buckets – Most values are in a few very popular buckets ● What if densely-populated regions had more buckets? 10 20 30 40 50 60 70 80
  • 14. 14 Equi-height histogram ● Make buckets have the same number of rows – Densely populated areas get more buckets – Sparsely populated get less ● Better precision ● Can be collected with one pass. 10 20 30 40 50 60 70 80 10 20 30 40 50 60 70 80
  • 15. 15 Histogram usage ● MariaDB – 10.0-10.6: “Height-balanced” (aka equi-height) – 10.7: improved equi-height* with common values ● MySQL 8: equi-height* and “singleton” ● PostgreSQL: MostCommonValue list + equi-height ● CockroachDB: equi-height* ● TiDB: equi-height* + TopN
  • 16. 16 Histograms in MariaDB ● Introduced in MariaDB 10.0 ● Not enabled by default – Require manual collection, update – Need to enable use by the optimizer ● Collection is expensive – Space consumption issues with VARCHAR(N>>)
  • 17. 17 Histograms in MariaDB 10.4 ● Closer to being on by default – ”The optimizer will use histograms if present” – Still, need non-default ANALYZE command to collect. ● Bernoulli sampling, @@analyze_sample_percentage – 100 (default): use all data – 0: pick the percentage automatically
  • 18. 18 Histograms in MariaDB 10.4: internals ● Height-balanced (=equi-depth) ● Store “fractions” – 0.0 is min_value – 1.0 is max_value ● Fixed-precision – SINGLE_PREC_HB - 1/256 – DOUBLE_PREC_HB 1/64K 10 20 30 40 50 60 70 80 10 20 30 40 50 60 70 80 0.0, 0.42, 0.6, 0.7, 0.85, 0.91, 0.93, 1.0 min_val=10 max_val=80 ● Ok for ranges ● Poor for: Popular values, VARCHARs
  • 19. 19 Precision issue (MDEV-26125) ● 1M population, “country” matches real countries’ population. analyze select * from generated_pop where country='China'; ..-+---------+------------+----------+------------+-------------+ | rows | r_rows | filtered | r_filtered | Extra | ..-+---------+------------+----------+------------+-------------+ | 1000000 | 1000000.00 | 22.66 | 21.04 | Using where | ..-+---------+------------+----------+------------+-------------+ analyze select * from generated_pop where country='Chile'; ..-+---------+------------+----------+------------+-------------+ | rows | r_rows | filtered | r_filtered | Extra | ..-+---------+------------+----------+------------+-------------+ | 1000000 | 1000000.00 | 22.66 | 0.25 | Using where | ..-+---------+------------+----------+------------+-------------+ analyze select * from generated_pop where country='Sweden'; ..-+---------+------------+----------+------------+-------------+ | rows | r_rows | filtered | r_filtered | Extra | ..-+---------+------------+----------+------------+-------------+ | 1000000 | 1000000.00 | 4.69 | 0.15 | Using where | ..-+---------+------------+----------+------------+-------------+ Ok, for China it’s close “filtered” is the estimate r_filted is the real value (observed) Very inaccurate for its neighbor Chile! Better for Sweden
  • 20. 20 Histograms in MariaDB 10.7 ● Based on GSoC’21 project by Michael Okoko ● @@histogram_type=JSON_HB ● Stores values, not fractions ● Histogram is stored as JSON ● The histogram is height-balanced* (=equi-height) – Common values are in their own buckets
  • 21. 21 JSON_HB fixes MDEV-26125 ● The same 1M population dataset, with JSON histogram: set histogram_type=json_hb; analyze table generated_pop persistent for all; analyze select * from generated_pop where country='China'; ..-+---------+------------+----------+------------+-------------+ | rows | r_rows | filtered | r_filtered | Extra | ..-+---------+------------+----------+------------+-------------+ | 1000000 | 1000000.00 | 21.04 | 21.04 | Using where | ..-+---------+------------+----------+------------+-------------+ analyze select * from generated_pop where country='Chile'; ..-+---------+------------+----------+------------+-------------+ | rows | r_rows | filtered | r_filtered | Extra | ..-+---------+------------+----------+------------+-------------+ | 1000000 | 1000000.00 | 0.14 | 0.25 | Using where | ..-+---------+------------+----------+------------+-------------+ analyze select * from generated_pop where country='Sweden'; ..-+---------+------------+----------+------------+-------------+ | rows | r_rows | filtered | r_filtered | Extra | ..-+---------+------------+----------+------------+-------------+ | 1000000 | 1000000.00 | 0.17 | 0.15 | Using where | ..-+---------+------------+----------+------------+-------------+ Same as before for China Much better for Chile Also better for Sweden
  • 22. 22 JSON_HB under the hood ● A table of vehicles registered in an Australian state { "histogram_hb_v2": [ ... { "start": "C3", "size": 0.003936638, "ndv": 24 }, { "start": "CALAIS", "size": 0.002868234, "ndv": 10 }, { "start": "CAMRY", "size": 0.034154968, "ndv": 1 }, ... ● start – Start value ● size – fraction of table rows in the bucket – May vary – So, not really “equi-height” – The reason: “popular” values are in their own buckets ● ndv – Special case: ndv=1
  • 23. 23 MySQL’s equi-height histogram ● Bucket is a JSON array with – min, max – cumulative_fraction – ndv ● Buckets may have different sizes – Popular values in their own buckets – “Each value should be in one bucket” (and not two adjacent buckets)(?) – https://bugs.mysql.com/bug.php?id=104789 ● There are “holes” between buckets { "buckets": [ ... [ "base64:type254:QlVHR1k=", "base64:type254:Q0FERFk=", 0.14083891639965046, 20 ], [ "base64:type254:Q0FMQUlT", "base64:type254:Q0FNQVJP", 0.14234833037629427, 2 ], [ "base64:type254:Q0FNUlk=", "base64:type254:Q0FNUlk=", 0.18120912003813255, 1 ],
  • 24. 24 MySQL’s equi-height histogram { "buckets": [ ... [ "BUGGY", "CADDY", 0.14083891639965046, 20 ], [ "CALAIS", "CAMARO", 0.14234833037629427, 2 ], [ "CAMRY", "CAMRY", 0.18120912003813255, 1 ], ● Bucket is a JSON array with – min, max – cumulative_fraction – ndv ● Buckets may have different sizes – Popular values in their own buckets – “Each value should be in one bucket” (and not two adjacent buckets)(?) – https://bugs.mysql.com/bug.php?id=104789 ● There are “holes” between buckets
  • 25. 25 CockroachDB’s equi-height histogram ● Just the upper_bound ● range_rows is number of rows between prev and this bound, exclusive. – Not exactly equi-height, either. – “A value goes into one bucket” ● equal_rows is how many rows were equal to the upper_bound – Every bucket has has a “singleton” co-bucket ● reason for this? upper_bound | range_rows | distinct_range_rows | equal_rows ------------------+------------+---------------------+------------ ... 'CAMRY' | 306 | 4.2680661910276 | 25189 'CAPTIVA' | 1760 | 24.5483545627732 | 5971 'CARRY' | 3368 | 46.9766239587613 | 306 'CERATO' | 1760 | 24.5483545627732 | 4517 'CHEROKEE' | 2067 | 28.8303686825296 | 1607 'CIVIC' | 459 | 6.40209928654141 | 7962 ...
  • 26. 26 TiDB’s equi-height histogram ● TOP_N values are stored separately ● Lower_bound and Upper_bound (holes) ● Count is cumulative number of values (Not exactly equi-height, again) ● Repeats is the occurence number of the Upper_bound – Again, every other bucket is a singleton ● ndv is only present with newer collection algorithm +-----------+-------+---------+--------------------+----------------------+------+ | Bucket_id | Count | Repeats | Lower_Bound | Upper_Bound | Ndv | +-----------+-------+---------+--------------------+----------------------+------+ ... | 60 | 22893 | 226 | CABSTAR | CALIFORNIA | 0 | | 61 | 23212 | 226 | CAMROAD | CAPRI | 0 | | 62 | 23714 | 271 | CARAVAN (IMPORT) | CELERIO | 0 | | 63 | 24079 | 136 | CELICA (IMPORT) | CF SERIES | 0 | | 64 | 24489 | 181 | CHARADE CENTRO | CHASER | 0 | | 65 | 24808 | 45 | CHEVELLE | CITY COUPE | 0 | SHOW STATS TOP_N; +----------------------+-------+ | Value | Count | +----------------------+-------+ | CAMRY | 24507 | | CAPTIVA | 5019 | ...
  • 27. 27 PostgreSQL’s equi-height histogram ● most_common_vals (100 of them) are stored separately ● The histogram just stores the bucket bounds – A real equi-height histogram. most_common_vals | {HILUX,COMMODORE,LANDCRUISER,... } most_common_freqs | {0.052233335,0.0415,0.0379,... } histogram_bounds | {100,125I,208,"300 SERIES 616",307,318I, 320I,323,323I,335CI,407,"4 RUNNER",530I,86,9-Mar,A200,A4,A5, ALMERA,AVALON,B180,B3000,BEETLE,BRERA,"C180 SERIES",...
  • 28. 28 On the number of buckets ● PostgreSQL: default_statistics_target=100 – 100 MCVs + 100 buckets ● MySQL: the default number of buckets is 100 ● CockroachDB: 200 – for non-indexed cols, just 2. This means they only get min/max? ● TiDB – varied, 50..250? ● MariaDB: histogram_size=254 – SINGLE_PREC_HB: 254, DOUBLE_PREC_HB: 127 buckets.
  • 29. 29 Histogram benchmark? ● Can’t have it – apples to apples comparison is hard ● Sampling vs full-scan collection ● Histogram size – Do MCV/TopN count as buckets? – Bucket with two bounds vs bucket with one – ... ● Adequate precision is enough
  • 30. 30 Histograms take-aways ● MariaDB 10.7 is getting new histogram_type=JSON_HB ● Provides histograms similar to other databases – Height-balanced*, common values are included – Make them the default in the next release? ● Things that are still missing: 1. “Genuine sampling”. Collection should not be a full table scan. 2. Automatic re-collection.
  • 32. 32 Combining multiple conditions Combined selectivity ● if independent: sel1*sel2 ● If dependent: – “Worst” case: MIN(sel1, sel2) – “Best” case: 0.0 select ... from order_items where shipdate='2021-12-15' AND item_name='christmas light' 'swimsuit' sel1 sel2
  • 33. 33 Solution #1: use a certain assumption ● Combine conditions on the same column ● Textbook: assume conditions are independent sel=sel1*sel2 ● Conservative: use the most selective sel= MIN(sel1, sel2) shipdate >= '2021-12-15' AND shipdate <= '2021-12-24' AND item_name='doll'
  • 34. 34 Solution #1: use a certain assumption ● “Exponential back-off” (SQL Server 2014) – order conditions by selectivity, most selective first: s1 s2 s3 s4 ... – Then, – Assume s1 ,“half” of s2 , “quarter” of s3 , ... shipdate >= '2021-12-15' AND shipdate <= '2021-12-24' AND item_name='doll' sel=s1⋅s2 1 2 ⋅s3 1 4 ⋅s4 1 8
  • 36. 36 Multi-column stats ● Multi-column histogram would take a lot of space – Typically not used ● TiDB seems to support them (but see TiDB issue 22589)? ● PostgreSQL – 10: Functional dependency – 12 (Oct,2019): Multivariate MCV lists ● MySQL, MariaDB: records_in_range.
  • 37. 37 MariaDB, MySQL: records-in-range The optimizer ● Builds a list of ranges over {col1, col2, col3 ...} – (this is complex topic) – In MariaDB, check the optimizer trace (not in MySQL: Bug #95824) ● Uses the index as a large histogram – The call to query the index: records_in_range({c1,c2,c3} <= {col1,col2,col3} <= {...}) INDEX idx(col1, col2, col3, ...) WHERE col1=... AND col2<=... AND ...col3...
  • 38. 38 records_in_range in action ● Using vehicle registration database again alter table vehicle_reg add index (Make, Model); explain select * from vehicle_reg where Make='FORD' and Model='FOCUS'; +------+-------------+-------------+------+---------------+------+---------+-------------+------+------------- | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra +------+-------------+-------------+------+---------------+------+---------+-------------+------+------------- | 1 | SIMPLE | vehicle_reg | ref | Make | Make | 390 | const,const | 8638 | Using index +------+-------------+-------------+------+---------------+------+---------+-------------+------+------------- explain select * from vehicle_reg where Make='MITSUBISHI' and Model='FOCUS'; +------+-------------+-------------+------+---------------+------+---------+-------------+------+------------- | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra +------+-------------+-------------+------+---------------+------+---------+-------------+------+------------- | 1 | SIMPLE | vehicle_reg | ref | Make | Make | 390 | const,const | 1 | Using index +------+-------------+-------------+------+---------------+------+---------+-------------+------+------------- ● Try a frequently occurring combination: ● Try a combination that doesn’t exist: ● Good estimates!
  • 39. 39 PostgreSQL, for comparison ● INDEX(Make, Model), no multi-variate statistics test=# explain select * from vehicle_reg where Make='FORD' and Model='FOCUS'; QUERY PLAN ----------------------------------------------------------------------------------------- Bitmap Heap Scan on vehicle_reg (cost=8.35..1354.94 rows=383 width=125) Recheck Cond: (((make)::text = 'FORD'::text) AND ((model)::text = 'FOCUS'::text)) -> Bitmap Index Scan on make_model (cost=0.00..8.25 rows=383 width=0) Index Cond: (((make)::text = 'FORD'::text) AND ((model)::text = 'FOCUS'::text)) ● Doesn’t see the difference! – 347/383 is a count(Mitsubishi)/count(Ford) test=# explain select * from vehicle_reg where Make='MITSUBISHI' and Model='FOCUS'; QUERY PLAN ----------------------------------------------------------------------------------------------- Bitmap Heap Scan on vehicle_reg (cost=7.98..1237.73 rows=347 width=125) Recheck Cond: (((make)::text = 'MITSUBISHI'::text) AND ((model)::text = 'FOCUS'::text)) -> Bitmap Index Scan on make_model (cost=0.00..7.90 rows=347 width=0) Index Cond: (((make)::text = 'MITSUBISHI'::text) AND ((model)::text = 'FOCUS'::text)) ● 383 rows is “sel1 * sel2”
  • 40. 40 Multivariate stats test=# explain select * from vehicle_reg where Make='FORD' and Model='FOCUS'; QUERY PLAN ----------------------------------------------------------------------------------------- Bitmap Heap Scan on vehicle_reg (cost=69.86..10465.53 rows=4823 width=124) Recheck Cond: (((make)::text = 'FORD'::text) AND ((model)::text = 'FOCUS'::text)) -> Bitmap Index Scan on make_model (cost=0.00..68.66 rows=4823 width=0) Index Cond: (((make)::text = 'FORD'::text) AND ((model)::text = 'FOCUS'::text)) ● rows=4823 is a much better estimate. Ford-Focus is in MCV list test=# explain select * from vehicle_reg where Make='MITSUBISHI' and Model='FOCUS'; QUERY PLAN ----------------------------------------------------------------------------------------------- Bitmap Heap Scan on vehicle_reg (cost=8.38..1364.94 rows=386 width=124) Recheck Cond: (((make)::text = 'MITSUBISHI'::text) AND ((model)::text = 'FOCUS'::text)) -> Bitmap Index Scan on make_model (cost=0.00..8.29 rows=386 width=0) Index Cond: (((make)::text = 'MITSUBISHI'::text) AND ((model)::text = 'FOCUS'::text)) CREATE STATISTICS make_model_stat (mcv) ON Make, Model FROM vehicle_reg; analyze vehicle_reg; ● rows=386, the same as before. No help if the combination is not in MCV list.
  • 41. 41 Multiple conditions selectivity takeaways ● Basic: combine per-column statistics – Assume independence/overlap/etc – Very imprecise (no information) ● Advanced: multi-column statistics – Much better – Much harder than single-column stats – MariaDB/MySQL has a non-conventional form: records_in_range
  • 43. 43 Combining multi-column statistics col1=10 AND col2='foo' AND col3<='BBB' AND col4 IN (1,2,3)
  • 44. 44 Combining multi-column statistics ● Multi-column statistics can “overlap” ● How to combine them? – Don’t want to assume independence... col1=10 AND col2='foo' AND col3<='BBB' AND col4 IN (1,2,3) Stat 1 Stat 2 Stat 3 Stat 4
  • 45. 45 The right way – research papers ● VLDB‘2005: V. Markl et. al, "Consistently Estimating the Selectivity of Conjuncts of Predicates" – Introduces “Maximum Entropy” principle – Makes use of all available information ● EBDT’2020, D Havenstein et. al. “Fast Entropy Maximization for Selectivity Estimation of Conjunctive Predicates on CPUs and GPUs” – New, faster entropy maximization algorithm ● Hard! – Not used in production(?)
  • 46. 46 What is used in production? ● Prefer one statistic to other ● Some heuristic rule – “Statistics that uses more columns goes first” – “Minimize the number of independence assumptions we have to make” – “Pick the most selective stats first” – ...
  • 47. 47 In MariaDB ● Reconciling overlapping estimates since 2006! – e.g. sql_select.cc, grep for ReuseRangeEstimateForRef ● The issue got worse in 10.4 – Optimizer tries to account for selectivity more agressively – => More “collisions” ● MDEV-23707 and linked tasks: MDEV-21813, MDEV-25830, ... – Need to fix these – Working on this ● Workaround for now: set optimizer_use_condition_selectivity=1
  • 48. 48 Not covered in this talk: Join Condition Selectivity
  • 49. 49 Takeaways ● Computing condition selectivity is – important – hard ● MariaDB 10.7: JSON_HB histograms – more precise for common values, VARCHARs. ● Work is underway to improve selectivity computations in the optimizer.
  • 50. 50 Thanks for your attention!