Sergei Petrunia
MariaDB devroom
FOSDEM 2021
Join Optimizer
1. How it works
2. What we’re working on to improve it
Optimizer Call
July 2022
Sergei Petrunia
MariaDB
Join order search
● Total number of possible join orders for an N-table join is:
N * (N-1) * (N-2) * … = N!
● Join orders are built left-to-right
● Cannot enumerate all possible join orders.
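To get a feel for how quickly N! grows, here is a minimal Python sketch. The table names are placeholders, and the brute-force enumeration is only there to show why it stops being practical beyond a handful of tables.

```python
import math
from itertools import permutations

def count_join_orders(n_tables: int) -> int:
    """Number of left-to-right join orders for an N-table join: N!."""
    return math.factorial(n_tables)

# Enumerating every order is only feasible for very small joins.
tables = ["t1", "t2", "t3", "t4"]
all_orders = list(permutations(tables))

for n in (4, 10, 20):
    print(n, "tables ->", count_join_orders(n), "join orders")
```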
[Figure: search tree of all join orders for tables t1, t2, t3, t4]
Pruning
● Enumerate promising join orders first.
● Do not explore join orders that are apparently worse.
Pruning # 1: by cost
● Cost of the current_prefix is already higher than the total cost of the best plan found so far.
– Adding tables will make it even higher
– No point in trying.
● This pruning is always done (no switch)
● Optimizer trace: pruned_by_cost
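The rule above can be sketched as a small branch-and-bound search. This is only an illustration under a toy cost model that is an assumption of this sketch (each table has a read cost per incoming row and a fanout); it is not MariaDB's real cost model, and the function and parameter names are made up.

```python
def search(tables, prune_by_cost=True):
    """Depth-first, left-to-right enumeration of join orders.

    tables: dict name -> (read_cost_per_row, fanout). Toy model, assumed
    for this sketch only.
    Returns (best_cost, best_order, number_of_pruned_prefixes).
    """
    best = {"cost": float("inf"), "order": None, "pruned": 0}

    def extend(prefix, cost, rows, remaining):
        if not remaining:
            if cost < best["cost"]:
                best["cost"], best["order"] = cost, tuple(prefix)
            return
        for name in sorted(remaining):
            read_cost, fanout = tables[name]
            new_cost = cost + rows * read_cost
            # The prefix alone is already more expensive than the best
            # complete plan: adding tables can only make it worse.
            if prune_by_cost and new_cost > best["cost"]:
                best["pruned"] += 1  # corresponds to "pruned_by_cost"
                continue
            extend(prefix + [name], new_cost, rows * fanout, remaining - {name})

    extend([], 0.0, 1.0, frozenset(tables))
    return best["cost"], best["order"], best["pruned"]

toy_tables = {"t1": (1.0, 1.0), "t2": (10.0, 5.0), "t3": (100.0, 50.0)}
cost_pruned, order_pruned, n_pruned = search(toy_tables, prune_by_cost=True)
cost_full, order_full, _ = search(toy_tables, prune_by_cost=False)
```

With and without pruning the search returns the same best plan; pruning only reduces the number of prefixes that get extended.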
pruned_by_cost weaknesses
● A really expensive table at the end of the join order.
● Any prefix that doesn’t include it is relatively cheap
– Even if it is comparatively worse.
● => No pruning.
[Figure: join order search tree for t1..t4, with extra labels “t1 t4” and “t2 t4”]
Pruning # 2: by heuristic
● Adding a table tX to a join prefix
– Adds read_time (time to read tX)
– Produces record_count row combinations to be joined with further tables (aka “join suffix”).
– Both have an effect on the total cost:
  ● read_time is time spent right now.
  ● record_count will affect cost of join suffix.
– We don’t know the “exchange ratio” because we don’t know the costs of “join suffix”.
[Figure: prefix t0 extended by t1, t2, or t3; labels: incoming_record_count, record_count_t1, record_count_t2, record_count_t3, read_time_t1]
The idea behind the heuristic
– … we don’t know the “exchange ratio” because we don’t know the costs of “join suffix”
● Also the suffixes are different!
– Let’s assume the suffixes have similar costs.
– Then, if
  ● read_time_t1 < read_time_t2, AND
  ● record_count_t1 < record_count_t2
– Then t1 “is better” than t2.
– Can prune away t2.
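The comparison can be written down as a tiny Python sketch. The `Extension` type and the numbers are made up for illustration; only the two-sided comparison comes from the slide.

```python
from typing import NamedTuple

class Extension(NamedTuple):
    table: str
    read_time: float     # cost of reading the table for the current prefix
    record_count: float  # row combinations passed on to the join suffix

def is_better(a: Extension, b: Extension) -> bool:
    """True if `a` beats `b` on both metrics, so `b` may be pruned."""
    return a.read_time < b.read_time and a.record_count < b.record_count

t1 = Extension("t1", read_time=10.0, record_count=5.0)
t2 = Extension("t2", read_time=40.0, record_count=80.0)
t3 = Extension("t3", read_time=5.0, record_count=200.0)

print(is_better(t1, t2))  # t1 wins on both metrics
print(is_better(t1, t3))  # t3 is cheaper to read, so no verdict
```

Note that the relation is only a partial order: t1 and t3 above are incomparable, which is exactly the case where the unknown “exchange ratio” would be needed.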
Applying heuristic pruning
● Do it locally in each join prefix
● Consider more promising tables first.
● Less-promising tables second
– And try to prune them away.
Pruning # 2: by heuristic
● A Model Table (yes, I’ve just invented this term):
– Lowest read_time AND record_count seen so far
– Either
  ● record_count < 2.0, or
  ● there are no possible "key dependencies" on tables not in the prefix
– A “possible key dependency” is an equality of the form: tbl.keyXpartY=expr(tables_not_in_prefix)
● ^^ this is a “heuristic” to apply the heuristic.
● Prune away tables that have both worse read_time and record_count than the Model Table.
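A minimal sketch of this rule for one join prefix, written from the slide text rather than from the server source. The dictionary keys (`key_dependency` in particular) and the strict comparisons are assumptions of this sketch.

```python
def prune_by_heuristic(candidates):
    """Apply Model Table pruning to the candidate extensions of one prefix.

    candidates: list of dicts with keys 'table', 'read_time',
    'record_count', 'key_dependency' (True if the table has a possible key
    dependency on tables not in the prefix). Tried in the given order.
    Returns (kept, pruned) lists of table names.
    """
    model = None
    kept, pruned = [], []
    for cand in candidates:
        if (model is not None
                and cand["read_time"] > model["read_time"]
                and cand["record_count"] > model["record_count"]):
            pruned.append(cand["table"])  # "pruned_by_heuristic"
            continue
        kept.append(cand["table"])
        # The "heuristic to apply the heuristic".
        may_be_model = cand["record_count"] < 2.0 or not cand["key_dependency"]
        is_lowest = (model is None
                     or (cand["read_time"] < model["read_time"]
                         and cand["record_count"] < model["record_count"]))
        if may_be_model and is_lowest:
            model = cand
    return kept, pruned

candidates = [
    {"table": "t1", "read_time": 10.0, "record_count": 1.5, "key_dependency": True},
    {"table": "t2", "read_time": 20.0, "record_count": 50.0, "key_dependency": False},
    {"table": "t3", "read_time": 5.0, "record_count": 100.0, "key_dependency": False},
    {"table": "t4", "read_time": 30.0, "record_count": 1.0, "key_dependency": True},
]
kept, pruned = prune_by_heuristic(candidates)
```

Here t1 becomes the Model Table, t2 is worse on both metrics and is pruned, while t3 and t4 each win on one metric and are kept.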
How one can see heuristic pruning
● @@optimizer_prune_level
– 0 – not enabled.
– 1 – enabled (the default)
● Optimizer trace: grep for “pruned_by_heuristic”
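Instead of grepping by eye, the trace JSON can be scanned with a few lines of Python. The trace fragment below is hand-written for illustration; its layout is an assumption, and real traces (for example, the text read from INFORMATION_SCHEMA.OPTIMIZER_TRACE) are much larger and may be structured differently.

```python
import json

def count_flag(node, flag):
    """Count how many times `flag` appears as a key anywhere in the trace."""
    if isinstance(node, dict):
        return sum((key == flag) + count_flag(value, flag)
                   for key, value in node.items())
    if isinstance(node, list):
        return sum(count_flag(item, flag) for item in node)
    return 0

# Hypothetical trace fragment, not copied from a real server.
sample_trace = json.loads("""
{
  "considered_execution_plans": [
    {"plan_prefix": [], "table": "t1",
     "rest_of_plan": [
       {"plan_prefix": ["t1"], "table": "t2", "pruned_by_heuristic": true},
       {"plan_prefix": ["t1"], "table": "t3", "pruned_by_cost": true}
     ]},
    {"plan_prefix": [], "table": "t2", "pruned_by_heuristic": true}
  ]
}
""")

for flag in ("pruned_by_cost", "pruned_by_heuristic"):
    print(flag, count_flag(sample_trace, flag))
```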
Greedy Search
Greedy search
● Consider only prefixes of limited size
– Based on that, pick the first table
– Repeat
● @@optimizer_search_depth
– Default: 62 (both MySQL and MariaDB)
– 0 – “pick depth automatically”
● Why is this not default yet?
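The loop described above can be sketched as follows. It uses a toy cost model that is an assumption of this sketch (read cost per incoming row, plus a fanout), and the function names are illustrative, not the server's.

```python
def best_extension(tables, cost, rows, remaining, depth):
    """Exhaustively extend the prefix by up to `depth` tables.

    Returns (cost_of_best_partial_plan, tables_added_in_order).
    """
    if depth == 0 or not remaining:
        return cost, ()
    best_cost, best_order = float("inf"), ()
    for name in sorted(remaining):
        read_cost, fanout = tables[name]
        sub_cost, sub_order = best_extension(
            tables, cost + rows * read_cost, rows * fanout,
            remaining - {name}, depth - 1)
        if sub_cost < best_cost:
            best_cost, best_order = sub_cost, (name,) + sub_order
    return best_cost, best_order

def greedy_search(tables, search_depth):
    """Pick the first table of the best limited-size prefix, then repeat."""
    order, cost, rows = [], 0.0, 1.0
    remaining = frozenset(tables)
    while remaining:
        _, partial = best_extension(tables, cost, rows, remaining, search_depth)
        first = partial[0]
        read_cost, fanout = tables[first]
        cost += rows * read_cost
        rows *= fanout
        order.append(first)
        remaining -= {first}
    return cost, tuple(order)

# name -> (read_cost_per_row, fanout); made-up numbers
toy_tables = {"a": (1.0, 8.0), "b": (2.0, 0.5), "c": (4.0, 1.0)}
shallow = greedy_search(toy_tables, search_depth=1)
deep = greedy_search(toy_tables, search_depth=3)
```

In this example depth 1 greedily starts with the cheapest table and ends up with a much more expensive plan than depth 3, which sees the whole join. That is the trade-off behind the search depth setting: less optimization time against the risk of a worse plan.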
MDEV-28073
(fixed in 10.6)
MDEV-28073: patch #1: “edge tables”
● If the suffix t1-t4-t3 uses only eq_ref or similar:
– It is [nearly] the best
– Don’t enumerate other table combinations.
  ● They can’t be much better.
● Optimizer trace: pruned_by_hanging_leaf
commit b729896d00e022f6205399376c0cc107e1ee0704
Author: Monty <monty@mariadb.org>
Date: Tue May 10 11:47:20 2022 +0300
MDEV-28073 Query performance degradation in newer MariaDB versions when
using many tables
The issue was that best_extension_by_limited_search() had to go through
too many plans with the same cost as there where many EQ_REF tables.
Fixed by shortcutting EQ_REF (AND REF) when the result only contains one
row. This got the optimization time down from hours to sub seconds.
MDEV-28073: patch #2: key_dependent
select ...
from
person, car_rides, bicycle_rides
where
person.name=car_rides.rider and
person.name=bicycle_rides.rider and
...
[Figure: tables car_rides, bicycle_rides, and person, connected through name]
● Remember the “heuristic to apply the heuristic” a few slides above:
– there are no possible "key dependencies" on tables not in the prefix
● It can be false due to multi-equalities:
– person.name=bicycle_rides.rider is a “possible key dependency”.
– But we already have person.name from car_rides.rider (the equality is “bound”)
– Trying join orders with bicycle_rides before person won’t produce a better plan.
● Solution: adjust the heuristic: there are no possible key dependencies on tables not in the prefix that are not already bound.
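The adjusted check can be illustrated with a simplified sketch. A multi-equality is modeled as a set of (table, column) pairs that are all equal; which column actually has a usable key is ignored here. The function name and data layout are assumptions of this sketch, not the server's data structures.

```python
def unbound_key_dependencies(candidate, prefix, equalities):
    """Tables outside the prefix that `candidate` still depends on.

    equalities: list of sets of (table, column) pairs that are all equal
    to each other (a "multi-equality").
    """
    deps = set()
    for eq in equalities:
        tables_in_eq = {table for table, _ in eq}
        if candidate not in tables_in_eq:
            continue
        others = tables_in_eq - {candidate}
        if others & set(prefix):
            # The value is already available from a table in the prefix:
            # the equality is "bound" and no longer counts as a dependency.
            continue
        deps |= others
    return deps

equalities = [
    {("person", "name"), ("car_rides", "rider"), ("bicycle_rides", "rider")},
]
with_car_rides = unbound_key_dependencies("person", ["car_rides"], equalities)
empty_prefix = unbound_key_dependencies("person", [], equalities)
```

With car_rides already in the prefix, person has no unbound dependency on bicycle_rides left, so it stays eligible as a Model Table.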
MDEV-28073: patch #3: table order de-scrambling
● The optimizer should try good tables first
● Implemented by taking tables off the unused portion of the join->best_ref array.
– Initially it’s ordered (“promising” tables first)
– But due to a bug it eventually gets out of order
● Plan searches that enumerate many options could suffer from poor pruning towards the end.
Author: Michael Widenius <monty@mariadb.org>
Date: Sun May 15 15:46:29 2022 +0300
greedy_search() and best_extension_by_limited_search() scrambled table order
best_extension_by_limited_search() assumes that tables should be sorted
according to size to be able to quickly disregard bad plans. However the
current usage of swap_variables() will change the table order to a not
sorted one for the next recursive call. This breaks the assumtion and
causes performance issues when using many tables (we have to examine
many more plans).
MDEV-28852
(MariaDB 10.10)
Local table pre-sorting
In which order do we try the tables?
● Current:
– join_tab_cmp() orders all tables by their JOIN_TAB::found_records (records after table’s condition is checked)
– The same ordering is used everywhere
– This *ignores* the join prefix and efficient table read plans we can use
– e.g. here, ignores the prefix of t1:
[Figure: join order search tree for t1..t4]
In which order do we try the tables?
● First, evaluate possible table accesses for {t2,t3,t4}.
● Sort them by #found_rows
● Then try extending join orders
– Do all kinds of pruning while doing this
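The difference between the global ordering and the per-prefix ordering can be sketched like this. The row estimates and the `estimate_rows` function are made up for illustration; in the server the numbers would come from evaluating the possible table accesses.

```python
# Rows per table after applying the table's own conditions (made-up numbers).
found_records = {"t2": 100, "t3": 1000, "t4": 500}

def estimate_rows(prefix, table):
    """Hypothetical estimate of rows read from `table` given the prefix.

    Assumption for this sketch: once t1 is in the prefix, t3 can be read
    through an index lookup that returns one row.
    """
    if table == "t3" and "t1" in prefix:
        return 1
    return found_records[table]

def static_order(remaining):
    """Old behaviour: one global ordering, the prefix is ignored."""
    return sorted(remaining, key=lambda t: found_records[t])

def local_order(prefix, remaining):
    """Pre-sort the candidates for this particular prefix."""
    return sorted(remaining, key=lambda t: estimate_rows(prefix, t))

remaining = ["t2", "t3", "t4"]
print(static_order(remaining))
print(local_order(("t1",), remaining))
```

After prefix t1 the local ordering tries t3 first, which the global ordering would have tried last.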
commit 0762dd9283185c72c6955f44fc4d862a0a928569
Author: Monty <monty@mariadb.org>
Date: Tue May 31 17:36:32 2022 +0300
Improve pruning in greedy_search by sorting tables during search
MDEV-28073 Slow query performance in MariaDB when using many tables
The faster we can find a good query plan, the more options we have for
finding and pruning (ignoring) bad plans.
This patch adds sorting of plans to best_extension_by_limited_search().
Improving the pruning
(MDEV-28929)
Remember: the idea behind the heuristic
– … we don’t know the “exchange ratio” because we don’t know the costs of “join suffix”
● Also the suffixes are different!
– Let’s assume the suffixes have similar costs.
– Then, if
  ● read_time_t1 < read_time_t2, AND
  ● record_count_t1 < record_count_t2
– Then t1 “is better” than t2.
– Can prune away t2.
Let’s plot the tables
[Figure: plot with axes records_read and read_time, showing table t1]
Let’s plot the tables
[Figure: plot with axes records_read and read_time, showing t1 and the regions “Better than t1” and “Worse than t1”]
Let’s plot the tables
● “Typical” situation: plans with higher cost produce more rows.
– A lot of opportunities to do pruning
● The optimizer orders plans by records_read
– (that is, goes left-to-right)
– Pick the first plan as “Model”, prune those that are worse.
[Figure: plot with axes records_read and read_time, showing t0, t1, t2, t3, t4]
When pruning doesn’t work
[Figure: plot with axes records_read and read_time, showing t0, t1, t2, t3, t4]
● “Bad” situation:
– plans with high cost produce few rows
– And vice versa
● Can’t do pruning.
When pruning could work but doesn’t
[Figure: plot with axes records_read and read_time, showing t0, t1, t2, t3, t4; one region is labeled “Tables that are worse than t0 are here”]
● Walking left-to-right, the optimizer picks t0 as Model table.
● And then can’t prune away any other table.
How to do as much pruning as possible?
[Figure: plot with axes records_read and read_time]
● Pick a minimal set of Model tables that allows pruning away the rest?
● Complexity seems to be at least N^2.
● Some approximate algorithm?
– Use the table with min_cost
– Use the table with min_records_read
● Have a patch with some approximate implementation
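One way to read the approximate idea is to use two Model tables, the one with the lowest records_read and the one with the lowest read_time. The sketch below is an illustration of that reading with made-up points; it is not the patch mentioned on the slide.

```python
def prune_with_models(points, models):
    """points: dict table -> (records_read, read_time).

    A table is pruned if some model table is strictly better on both axes.
    Returns the sorted list of pruned table names.
    """
    pruned = []
    for name, (rows, cost) in points.items():
        if any(m != name and points[m][0] < rows and points[m][1] < cost
               for m in models):
            pruned.append(name)
    return sorted(pruned)

def min_records_model(points):
    return min(points, key=lambda t: points[t][0])

def min_cost_model(points):
    return min(points, key=lambda t: points[t][1])

# t0 produces very few rows but is expensive to read; t1..t4 are "typical".
points = {
    "t0": (1.0, 100.0),
    "t1": (10.0, 10.0),
    "t2": (20.0, 20.0),
    "t3": (30.0, 30.0),
    "t4": (40.0, 40.0),
}

single_model = prune_with_models(points, [min_records_model(points)])
two_models = prune_with_models(
    points, [min_records_model(points), min_cost_model(points)])
```

With only the leftmost table t0 as Model nothing is pruned; adding the cheapest table t1 as a second Model prunes t2, t3 and t4, at the cost of one extra comparison per table.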
eq_ref chaining
Motivation
● Tables with “attributes” that are joined using Primary Key
select * from base_table, attr1, attr2, ... attrN
where
attr1.pk = base_table.pk and
attr2.pk = base_table.pk and
...
attrN.pk = base_table.pk
● Lots of nearly-identical query plans: there are factorial(n_attributes) permutations
– They have the same or very close cost
● => Can’t do pruning
● The fix with “Edge tables” aka pruned_by_hanging_leaf helps, but only if the attributes are at the end of the join order.
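A small sketch of why cost-based pruning is powerless here. Under a toy cost model (an assumption of this sketch), every attribute table is one primary-key lookup per incoming row and returns exactly one row, so every permutation of the attribute tables costs the same.

```python
from itertools import permutations

def plan_cost(order, read_cost, fanout):
    """Toy cost: each table is read once per incoming row combination."""
    cost, rows = 0.0, 1.0
    for table in order:
        cost += rows * read_cost[table]
        rows *= fanout[table]
    return cost

attrs = [f"attr{i}" for i in range(1, 6)]
read_cost = {t: 1.0 for t in attrs}  # one primary-key lookup per incoming row
fanout = {t: 1.0 for t in attrs}     # eq_ref: exactly one matching row

orders = list(permutations(attrs))
distinct_costs = {plan_cost(order, read_cost, fanout) for order in orders}
print(len(orders), "join orders,", len(distinct_costs), "distinct cost")
```

No order is ever cheaper than the best plan found so far, so none of the 5! prefixes can be cut off by cost alone.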
eq_ref chaining
● The idea: if we see an eq_ref access, try considering only eq_refs as long as we can.
● MySQL 5.7 has a similar optimization
– TODO: describe the differences.
commit 5abb6bff6cfb5cb5d87520f1e32e9b41db46bd7b
Author: Monty <monty@mariadb.org>
Date: Thu Jun 2 19:47:23 2022 +0300
Added EQ_REF chaining to the greedy_optimizer
MDEV-28073 Slow query performance in MariaDB when using many table
The idea is to prefer and chain EQ_REF tables (tables that uses an
unique key to find a row) when searching for the best table combination.
This significantly reduces row combinations that has to be examined.
This is optimization is enabled when setting optimizer_prune_level=2 (default)
Thanks for your attention!

MariaDB's join optimizer: how it works and current fixes

  • 1. Sergei Petrunia MariaDB devroom FOSDEM 2021 Join Optimizer 1. How it works 2. What we’re working on to improve it Optimizer Call July 2022 Sergei Petrunia MariaDB
  • 2. 2 Join order search ● Total number of possible join orders for N-table join is: N * (N-1) * (N-2) *… = N! ● Join orders are built left-to-right ● Cannot enumerate all possible join orders. t1 t1 t2 t2 t3 t4 t3 t4 t2 t4 t2 t3 t4 t3 t4 t2 t3 t2 t1 t3 t4 t3 t4 t1 t4 t1 t3 t4 t3 t4 t1 t3 t1 t3 t4
  • 3. 3 Pruning ● Enumerate promising join orders first. ● Do not explore join orders that are apparently worse. t1 t1 t2 t2 t3 t4 t3 t4 t2 t4 t2 t3 t4 t3 t4 t2 t3 t2 t1 t3 t4 t3 t4 t1 t4 t1 t3 t4 t3 t4 t1 t3 t1 t3 t4
  • 4. 4 Pruning # 1: by cost ● Cost of the current_prefix is already higher than total cost of best plan. – Adding tables will make it even higher – No point to try. ● This pruning is always done (no switch) ● Optimizer trace: pruned_by_cost t1 t1 t2 t2 t3 t4 t3 t4 t2 t4 t2 t3 t4 t3 t4 t2 t3 t2 t1 t3 t4 t3 t4 t1 t4 t1 t3 t4 t3 t4 t1 t3 t1 t3 t4
  • 5. 5 pruned_by_cost weaknesses ● A really expensive table at the end of the join order. ● Any prefix that doesn’t include it is relatively cheap – Even if its comparably worse: – – ● => No pruning. t1 t1 t2 t2 t3 t4 t3 t4 t2 t4 t2 t3 t4 t3 t4 t2 t3 t2 t1 t4 t3 t3 t4 t1 t3 t1 t4 t4 t3 t3 t1 t4 t1 t3 t4 t1 t1 t4 t2 t4
  • 6. 6 Pruning # 2: by heuristic ● Adding a table tX to a join prefix – Adds read_time (time to read tX) – Produces record_count row combinations to be joined with further tables (aka “join suffix”). – Both have an effect on the total cost: ● read_time is time spent right now. ● record_count will affect cost of join suffix. – We don’t know the “exchange ratio” because we don’t know the costs of “join suffix”. t0 t1 t2 t3 incoming_record_count record_count_t1 record_count_t2 record_count_t3 read_time_t1
  • 7. 7 The idea behind the heuristic – … we don’t know the “exchange ratio” because we don’t know the costs of “join suffix” ● Also the suffixes are different! – Let’s assume the suffixes have similar costs. – Then, if ● read_time_t1 < read_time_t2, AND ● record_count_t1 < record_count_t2 – Then t1 “is better” than t2. – Can prune away t2. t0 t1 t2 t3 incoming_record_count record_count_t1 record_count_t2 record_count_t3 read_time_t1
  • 8. 8 Applying heuristic pruning ● Do it locally in each join prefix ● First, consider more promising tables first. ● Less-promising tables second – And try to prune them away. t1 t1 t2 t2 t3 t4 t3 t4 t2 t4 t2 t3 t4 t3 t4 t2 t3 t2 t1 t3 t4 t3 t4 t1 t4 t1 t3 t4 t3 t4 t1 t3 t1 t3 t4
  • 9. 9 Pruning # 2: by heuristic ● A Model Table (yes, I’ve just invented this term): – Lowest read_time AND record_count seen so far – Either ● record_count < 2.0, or ● there are no possible "key dependencies" on tables not in the prefix – A “possible key dependency” is an eqality in form: tbl.keyXpartY=expr(tables_no_in_prefix) ● ^^ this is a “heuristic” to apply the heuristic. ● Prune away tables that have both worse read_time and record_count than the Model Table. t1 t1 t2 t2 t3 t4 t3 t4 t2 t4 t2 t3 t4 t3 t4 t2 t3 t2 t1 t3 t4 t3 t4 t1 t4 t1 t3 t4 t3 t4 t1 t3 t1 t3 t4
  • 10. 10 How one can see heuristic pruning ● @@optimizer_prune_level – 0 – not enabled. – 1 – enabled (the default) ● Optimizer trace: grep for “pruned_by_heuristic” t1 t1 t2 t2 t3 t4 t3 t4 t2 t4 t2 t3 t4 t3 t4 t2 t3 t2 t1 t3 t4 t3 t4 t1 t4 t1 t3 t4 t3 t4 t1 t3 t1 t3 t4
  • 12. 12 Greedy search ● Consider only prefixes of limited size – Based on that, pick the first table – Repeat ● @@optimizer_search_depth – Default: 62 (both MySQL and MariaDB) – 0 – “pick depth automatically” ● Why is this not default yet? t1 t1 t2 t2 t3 t4 t3 t4 t2 t4 t2 t3 t4 t3 t4 t2 t3 t2 t1 t3 t4 t3 t4 t1 t4 t1 t3 t4 t3 t4 t1 t3 t1 t3 t4
  • 14. 14 MDEV-28073: patch #1: “edge tables” ● If the suffix t1-t4-t3 uses only eq_ref or similar: – It is [nearly] the best – Don’t enumerate other table combinations. ● They can’t be much better. ● Optimizer trace: pruned_by_hanging_leaf t1 t1 t2 t2 t3 t4 t3 t4 t2 t4 t2 t3 t4 t3 t4 t2 t3 t2 t1 t3 t4 t3 t4 t1 t4 t1 t3 t4 t3 t4 t1 t3 t1 commit b729896d00e022f6205399376c0cc107e1ee0704 Author: Monty <monty@mariadb.org> Date: Tue May 10 11:47:20 2022 +0300 MDEV-28073 Query performance degradation in newer MariaDB versions when using many tables The issue was that best_extension_by_limited_search() had to go through too many plans with the same cost as there where many EQ_REF tables. Fixed by shortcutting EQ_REF (AND REF) when the result only contains one row. This got the optimization time down from hours to sub seconds. t0
  • 15. 15 MDEV-28073: patch #2: key_dependent select ... from person, car_rides, bicycle_rides where person.name=car_rides.rider and person.name=bicycle_rides.rider and ... ● Remember the “heuristic to apply the heuristic” a few slides above: – there are no possible "key dependencies" on tables not in the prefix – It can be false due to multi-equalities: ● person.name=bicycle_rides.rider is a “possible key dependency”. ● But we already have person.name from car_rides.rider (the equality is “bound”) – Trying join orders with bicycle_rides before person won’t produce a better plan. ● Solution: adjust the heuristic: there are no possible key_dependencies on tables not in the prefix that are not already bound.
  • 16. 16 MDEV-28073: patch #3: table order de-scrambling ● The optimizer should try good tables first ● Implemented by taking tables off the unused portion of the join->best_ref array. – Initially it’s ordered (“promising” tables first) – But due to a bug it eventually gets out of order ● Plan searches that enumerate many options could suffer from poor pruning towards the end. Author: Michael Widenius <monty@mariadb.org> Date: Sun May 15 15:46:29 2022 +0300 greedy_search() and best_extension_by_limited_search() scrambled table order best_extension_by_limited_search() assumes that tables should be sorted according to size to be able to quickly disregard bad plans. However the current usage of swap_variables() will change the table order to a not sorted one for the next recursive call. This breaks the assumtion and causes performance issues when using many tables (we have to examine many more plans).
  • 19. 19 In which order do we try the tables? ● Current: – join_tab_cmp() orders all tables by their JOIN_TAB::found_records (records after table’s condition is checked) – The same ordering is used everywhere – This *ignores* the join prefix and efficient table read plans we can use – e.g. for the prefix t1, the order in which t2, t3, t4 are tried does not take t1 into account.
  • 20. 20 In which order do we try the tables? ● First, evaluate possible table accesses for {t2,t3,t4}. ● Sort them by #found_rows ● Then try extending join orders – Do all kinds of pruning while doing this commit 0762dd9283185c72c6955f44fc4d862a0a928569 Author: Monty <monty@mariadb.org> Date: Tue May 31 17:36:32 2022 +0300 Improve pruning in greedy_search by sorting tables during search MDEV-28073 Slow query performance in MariaDB when using many tables The faster we can find a good query plan, the more options we have for finding and pruning (ignoring) bad plans. This patch adds sorting of plans to best_extension_by_limited_search().
  • 22. 22 Remember: the idea behind the heuristic – … we don’t know the “exchange ratio” because we don’t know the costs of “join suffix” ● Also the suffixes are different! – Let’s assume the suffixes have similar costs. – Then, if ● read_time_t1 < read_time_t2, AND ● record_count_t1 < record_count_t2 – Then t1 “is better” than t2. – Can prune away t2.
  • 23. 23 Let’s plot the tables (Plot: x-axis records_read, y-axis read_time; a single point t1.)
  • 24. 24 Let’s plot the tables (Plot: x-axis records_read, y-axis read_time; point t1 with regions labelled “Better than t1” and “Worse than t1”.)
  • 25. 25 Let’s plot the tables ● “Typical” situation: plans with higher cost produce more rows. – A lot of opportunities to do pruning ● The optimizer orders plans by records_read – (that is, goes left-to-right) – Pick the first plan as “Model”, prune those that are worse. (Plot: x-axis records_read, y-axis read_time; points t0 to t4.)
  • 26. 26 When pruning doesn’t work ● “Bad” situation: – plans with high cost produce few rows – And vice versa ● Can’t do pruning. (Plot: x-axis records_read, y-axis read_time; points t0 to t4.)
  • 27. 27 When pruning could work but doesn’t ● Walking left-to-right, the optimizer picks t0 as Model table. ● And then can’t prune away any other table. (Plot: x-axis records_read, y-axis read_time; points t0 to t4; one region is annotated “Tables that are worse than t0 are here”.)
  • 28. 28 How to do as much pruning as possible? ● Pick a minimal set of Model tables that allows pruning away the rest? ● Complexity seems to be at least N^2. ● Some approximate algorithm? – Use the table with min_cost – Use the table with min_records_read ● We have a patch with an approximate implementation
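A rough sketch of the approximate idea (use the table with the lowest cost and the table with the fewest records as Model tables) follows. The function names and the numbers in `PLANS` are made up to mimic the "could work but doesn't" picture; this is not the patch mentioned on the slide.

```python
# Each plan is (records_read, read_time); the values are invented so that
# t0 has the fewest rows but a high cost, as in the "could work but
# doesn't" plot, while t2..t4 are worse than t1 on both measures.
PLANS = {
    "t0": (1, 100),
    "t1": (5, 10),
    "t2": (10, 20),
    "t3": (20, 30),
    "t4": (30, 40),
}


def pruned_by(models, plans):
    """Names of plans that are worse than some model on BOTH measures."""
    return sorted(
        name
        for name, (rows, cost) in plans.items()
        if any(rows > plans[m][0] and cost > plans[m][1] for m in models)
    )


def approximate_models(plans):
    """Approximation: the min-cost table plus the min-records table."""
    min_cost = min(plans, key=lambda name: plans[name][1])
    min_rows = min(plans, key=lambda name: plans[name][0])
    return {min_cost, min_rows}
```

With t0 alone as the Model table, `pruned_by({"t0"}, PLANS)` is empty. With the two approximate models, t0 and t1, the same call prunes t2, t3 and t4, using two linear scans to pick the models instead of comparing every pair of tables.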
  • 30. 30 Motivation ● Tables with “attributes” that are joined using Primary Key select * from base_table, attr1, attr2, ... attrN where attr1.pk = base_table.pk and attr2.pk = base_table.pk and ... attrN.pk = base_table.pk ● Lots of nearly-identical query plans: There are factorial(n_attributes) permutations – Have the same or very close cost ● => Can’t do pruning ● The fix with “Edge tables” aka pruned_by_hanging_leaf helps but only if the attributes are at the end of the join order.
  • 31. 31 eq_ref chaining ● The idea: if we see an eq_ref access, try considering only eq_refs as long as we can. ● MySQL 5.7 has a similar optimization – TODO: describe the differences. commit 5abb6bff6cfb5cb5d87520f1e32e9b41db46bd7b Author: Monty <monty@mariadb.org> Date: Thu Jun 2 19:47:23 2022 +0300 Added EQ_REF chaining to the greedy_optimizer MDEV-28073 Slow query performance in MariaDB when using many table The idea is to prefer and chain EQ_REF tables (tables that uses an unique key to find a row) when searching for the best table combination. This significantly reduces row combinations that has to be examined. This is optimization is enabled when setting optimizer_prune_level=2 (default)
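To see why chaining helps with the base_table/attr1..attrN query from the Motivation slide, here is a toy counter of enumerated join orders. It is a sketch under explicit assumptions: `count_orders` and `attr_is_eq_ref` are hypothetical names, and "consider only eq_refs" is modelled as following a single eq_ref candidate at each step, which may differ from what the actual patch does.

```python
def count_orders(prefix, remaining, is_eq_ref, chain_eq_ref):
    """Count the complete join orders enumerated below this prefix."""
    if not remaining:
        return 1
    candidates = list(remaining)
    if chain_eq_ref:
        eq_refs = [t for t in candidates if is_eq_ref(t, prefix)]
        if eq_refs:
            # Assumption: follow a single eq_ref chain instead of trying
            # every permutation of equally cheap eq_ref tables.
            candidates = eq_refs[:1]
    return sum(
        count_orders(prefix + [t],
                     [r for r in remaining if r != t],
                     is_eq_ref,
                     chain_eq_ref)
        for t in candidates
    )


def attr_is_eq_ref(table, prefix):
    # Toy rule: an attribute table is joined on its primary key, so it can
    # use eq_ref access as soon as base_table is in the prefix.
    return "base_table" in prefix


attrs = ["attr1", "attr2", "attr3", "attr4"]
without_chaining = count_orders(["base_table"], attrs, attr_is_eq_ref, False)
with_chaining = count_orders(["base_table"], attrs, attr_is_eq_ref, True)
```

In this toy setup `without_chaining` is factorial(4) = 24 while `with_chaining` is 1, and the gap grows factorially with the number of attribute tables.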
  • 32. 32 Thanks for your attention!