SlideShare a Scribd company logo
1 of 28
Download to read offline
Lessons for the optimizer
from TPC-DS benchmark
Sergei Petrunia
Query Optimizer developer
MariaDB Corporation
2019 MariaDB Developers Unconference
New York
The goals
1. Want to evaluate/measure the query optimizer
2. Hard to do, optimizer should handle
– Different query patterns
– Different data distributions, etc
3. How does one do it anyway?
– Look at benchmarks
– Or “optimizer part” of the benchmarks
Benchmarks
1. sysbench
– Popular
– Does only basic queries, few query patterns
2. DBT-3 (aka TPC-H)
– 6 tables, 22 analytic queries
– Was used to see some optimizer problems
– Limited:
●
Uniform data distribution, uncorrelated columns
●
...
TPC-DS benchmark
● Obsoletes DBT-3 benchmark
● Richer dataset
– 25 Tables, 99 queries
– Non-uniform data distributions
● Uses advanced SQL features
– 32 queries use CTE
– 27 queries use Window Functions
– etc
● Could not really run it until MariaDB 10.2 (or MySQL 8)
MariaDB still can’t run all of TPC-DS
●
2 Queries: FULL OUTER JOIN
●
10 Queries: ROLLUP + ORDER BY problem (MDEV-17807)
●
~20 more queries have fixable problems
– “Every derived table must have an alias”, etc
select
...
group by
a,b,c with rollup
order by
a,b,c
ERROR 1221 (HY000): Incorrect usage of CUBE/ROLLUP
and ORDER BY
Oracle MySQL and TPC-DS
● ROLLUP + ORDER BY is supported since 8.0.12
● Doesn’t support FULL OUTER JOIN (2 queries)
● Doesn’t support EXCEPT (1 query)
● Doesn’t support INTERSECT (3 queries)
Running queries from TPC-DS
● Data generator creates CSV files
– Adjust #define for MySQL/MariaDB
● Query generator produces “streams” from templates
– A set of QueryNNN.tpl files
– A stream is a text file with one instance of each of the 99 queries
– One can add hooks at query start/end
● Queries have a few typos
● There’s no tool to run queries/measure time
– Note that the read queries are a subset of benchmark (TpCX$)
Getting it to run
● A collection of scripts at
https://github.com/spetrunia/tpcds-run-tool
● The goal is a fully-automated run
– MariaDB, MySQL, PostgreSQL
● Because we need to play with settings/options
Test runs done
● The dataset
– Scale=1
– 1.2 GB CSV files
– 6 GB when loaded
● The Queries
– 10..20 “Streams”
● Tuning
– Innodb_buffer_pool=8G (50% RAM)
– shared_buffers = 4G (25% RAM)
Test results
Test results
● ...
Test results
● … a bit inconclusive – query times varied across my runs (?)
● Time to run one stream = 20 min – 2 hours
● Searching for the source of randomness
– Started to work on full automation
●
(did I run ANALYZE? Did I have correct with my.cnf
parameters?)
– Started to look at rngseed in dataset/query generator
MariaDB/MySQL
MariaDB 10.2, 10.4, MySQL 8
● Scale=1, 6.1 GB data, 8G buffer pool
● rngseed=1234 for both
● Benchmark takes ~20 min
● Query times are very non-uniform
+-------------+---------------+
| query_name | QueryTime_ms |
+-------------+---------------+
| query72.tpl | 678,321 |
| query23.tpl | 80,025 |
| query2.tpl | 65,156 |
| query39.tpl | 63,761 |
| query78.tpl | 63,473 |
| query4.tpl | 27,549 |
| query31.tpl | 24,344 |
| query47.tpl | 19,156 |
| query11.tpl | 17,484 |
| query74.tpl | 16,571 |
| query21.tpl | 16,212 |
| query59.tpl | 10,522 |
| query88.tpl | 9,965 |
Query#72 dominates
query1.tpl
query13.tpl
query19.tpl
query23.tpl
query28.tpl
query31.tpl
query35.tpl
query40.tpl
query44.tpl
query48.tpl
query53.tpl
query58.tpl
query61.tpl
query65.tpl
query69.tpl
query73.tpl
query78.tpl
query83.tpl
query89.tpl
query92.tpl
query96.tpl
0
100000
200000
300000
400000
500000
600000
700000
800000
Without Query #72
query1.tpl
query13.tpl
query19.tpl
query23.tpl
query28.tpl
query31.tpl
query35.tpl
query40.tpl
query44.tpl
query48.tpl
query53.tpl
query58.tpl
query61.tpl
query65.tpl
query69.tpl
query74.tpl
query79.tpl
query84.tpl
query9.tpl
query93.tpl
query98.tpl
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
PostgreSQL 11
PostgreSQL 11
● There was a “fast” run
● Showing results from the last
two runs (both where “slow”)
– rngseed=5678 for both
– 121 min
– rngseed=1234 (data),
rngseed=4321 (query)
– 145..154 min.
Heaviest queries in the run
● Execution time varies
● Is this a query optimizer issue?
● Or different constants in a skewed dataset?
+-------------+-----------------+-----------------+--------+
| query_name | PG11-seed5678 | PG11-seed1234 | X |
+-------------+-----------------+-----------------+--------+
| query4.tpl | 3,628,830 | 3,578,944 | 1.0139 |
| query11.tpl | 2,004,392 | 2,013,597 | 0.9954 |
| query1.tpl | 87,981 | 1,947,624 | 0.0452 |
| query74.tpl | 693,784 | 641,696 | 1.0812 |
| query47.tpl | 624,717 | 539,941 | 1.1570 |
| query57.tpl | 116,570 | 112,472 | 1.0364 |
| query81.tpl | 22,089 | 47,366 | 0.4663 |
| query6.tpl | 27,896 | 27,009 | 1.0328 |
| query30.tpl | 11,214 | 11,171 | 1.0038 |
| query39.tpl | 10,803 | 10,702 | 1.0094 |
| query95.tpl | 16,418 | 10,065 | 1.6312 |
`
● Do we need a “representative
collection of datasets”?
– Check N datasets?
Compare most heavy queries
● Some queries are present in both lists, but some are only in one.
● Not clear
+-------------+-----------------+-----------------+--------+
| query_name | PG11-seed5678 | PG11-seed1234 | X |
+-------------+-----------------+-----------------+--------+
| query4.tpl | 3,628,830 | 3,578,944 | 1.0139 |
| query11.tpl | 2,004,392 | 2,013,597 | 0.9954 |
| query1.tpl | 87,981 | 1,947,624 | 0.0452 |
| query74.tpl | 693,784 | 641,696 | 1.0812 |
| query47.tpl | 624,717 | 539,941 | 1.1570 |
| query57.tpl | 116,570 | 112,472 | 1.0364 |
| query81.tpl | 22,089 | 47,366 | 0.4663 |
| query6.tpl | 27,896 | 27,009 | 1.0328 |
| query30.tpl | 11,214 | 11,171 | 1.0038 |
| query39.tpl | 10,803 | 10,702 | 1.0094 |
| query95.tpl | 16,418 | 10,065 | 1.6312 |
`
+-------------+---------------+
| query_name | QueryTime_ms |
+-------------+---------------+
| query72.tpl | 678,321 |
| query23.tpl | 80,025 |
| query2.tpl | 65,156 |
| query39.tpl | 63,761 |
| query78.tpl | 63,473 |
| query4.tpl | 27,549 |
| query31.tpl | 24,344 |
| query47.tpl | 19,156 |
| query11.tpl | 17,484 |
| query74.tpl | 16,571 |
| query21.tpl | 16,212 |
| query59.tpl | 10,522 |
MariaDB PostgreSQL
Observations about the benchmark
● rngseed on the dataset matters A LOT
– What is a representative set of rngseed values?
● rngseed on query streams – much less
● Hardware?
● Queries are not equal
– Heavy vs lightweight queries
– Is SUM(query_time) an adequate metric?
●
Wont see that a fast query got 10x slower
Other observations
● Both DBT-3 and TPC-DS workloads are relevant for the optimizer
– Condition selectivities
– Semi-join optimizations
– …
● But don’t match the optimizer issues we see
– ORDER BY … LIMIT optimization
– Long IN-list
– …
Extra: parallel query in PG?
Extra – PostgreSQL 11, parallel query?
● Trying on a run with both rngseed=5678:
● Parallel settings
max_parallel_workers_per_gather=8 (the default was 2)
dynamic_shared_memory_type=posix
show max_worker_processes= 8
● Results
– Only saw one core to be occupied
– The run still took 121 min, didin’t see any speedup
Try a parallel query
select
sum(inv_quantity_on_hand*i_current_price)
from
inventory, item
where
i_item_sk=inv_item_sk;
QUERY PLAN
---------------------------------------------------------------------------------
Aggregate (cost=301495.25..301495.26 rows=1 width=32)
-> Hash Join (cost=1635.00..213408.54 rows=11744894 width=10)
Hash Cond: (inventory.inv_item_sk = item.i_item_sk)
-> Seq Scan on inventory (cost=0.00..180935.94 rows=11744894 width=8)
-> Hash (cost=1410.00..1410.00 rows=18000 width=10)
-> Seq Scan on item (cost=0.00..1410.00 rows=18000 width=10)
● max_parallel_workers_per_gather=0
Try a parallel query
select
sum(inv_quantity_on_hand*i_current_price)
from
inventory, item
where
i_item_sk=inv_item_sk;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=125048.98..125048.99 rows=1 width=32)
-> Gather (cost=125048.55..125048.96 rows=4 width=32)
Workers Planned: 4
-> Partial Aggregate (cost=124048.55..124048.56 rows=1 width=32)
-> Parallel Hash Join (cost=1468.23..102026.87 rows=2936224 width=10)
Hash Cond: (inventory.inv_item_sk = item.i_item_sk)
-> Parallel Seq Scan on inventory (cost=0.00..92849.24 rows=2936224 width=8)
-> Parallel Hash (cost=1335.88..1335.88 rows=10588 width=10)
-> Parallel Seq Scan on item (cost=0.00..1335.88 rows=10588 width=10)
● max_parallel_workers_per_gather=8
Try a parallel query
select
sum(inv_quantity_on_hand*i_current_price)
from
inventory, item
where
i_item_sk=inv_item_sk;
● Results
– max_parallel_workers_per_gather=8: 1.0 sec
– max_parallel_workers_per_gather=0: 3.8 sec
● Didn’t see anything like that in TPC-DS benchmark
Thanks!

More Related Content

What's hot

Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in RustAndrew Lamb
 
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Jaime Crespo
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseDatabricks
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioAlluxio, Inc.
 
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...ScyllaDB
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Stamatis Zampetakis
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOAltinity Ltd
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache CalciteJulian Hyde
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Julian Hyde
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and FutureDataWorks Summit
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Databricks
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Hive join optimizations
Hive join optimizationsHive join optimizations
Hive join optimizationsSzehon Ho
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 

What's hot (20)

Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in Rust
 
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with Alluxio
 
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O...
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache Calcite
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Hive join optimizations
Hive join optimizationsHive join optimizations
Hive join optimizations
 
Druid
DruidDruid
Druid
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 

Similar to Lessons for the optimizer from running the TPC-DS benchmark

Database and application performance vivek sharma
Database and application performance vivek sharmaDatabase and application performance vivek sharma
Database and application performance vivek sharmaaioughydchapter
 
PostgreSQL query planner's internals
PostgreSQL query planner's internalsPostgreSQL query planner's internals
PostgreSQL query planner's internalsAlexey Ermakov
 
Spark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni SchieferSpark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni SchieferSpark Summit
 
Islamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuningIslamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuningUmair Shahid
 
Islamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuningIslamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuningUmair Shahid
 
PostgreSQL and Benchmarks
PostgreSQL and BenchmarksPostgreSQL and Benchmarks
PostgreSQL and BenchmarksJignesh Shah
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningDatabricks
 
Performance Tuning and Optimization
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and OptimizationMongoDB
 
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016MLconf
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkSigOpt
 
Oracle Database Performance Tuning Basics
Oracle Database Performance Tuning BasicsOracle Database Performance Tuning Basics
Oracle Database Performance Tuning Basicsnitin anjankar
 
Lessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedLessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedDainius Jocas
 
PostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesPostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesInMobi Technology
 
Graphite & Metrictank - Meetup Tel Aviv Yafo
Graphite & Metrictank - Meetup Tel Aviv YafoGraphite & Metrictank - Meetup Tel Aviv Yafo
Graphite & Metrictank - Meetup Tel Aviv YafoDieter Plaetinck
 

Similar to Lessons for the optimizer from running the TPC-DS benchmark (20)

Database and application performance vivek sharma
Database and application performance vivek sharmaDatabase and application performance vivek sharma
Database and application performance vivek sharma
 
PostgreSQL query planner's internals
PostgreSQL query planner's internalsPostgreSQL query planner's internals
PostgreSQL query planner's internals
 
Spark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni SchieferSpark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni Schiefer
 
Islamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuningIslamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuning
 
Islamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuningIslamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuning
 
Intro_2.ppt
Intro_2.pptIntro_2.ppt
Intro_2.ppt
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
MySQL performance tuning
MySQL performance tuningMySQL performance tuning
MySQL performance tuning
 
PostgreSQL and Benchmarks
PostgreSQL and BenchmarksPostgreSQL and Benchmarks
PostgreSQL and Benchmarks
 
Apache Cassandra at Macys
Apache Cassandra at MacysApache Cassandra at Macys
Apache Cassandra at Macys
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine Learning
 
Performance Tuning and Optimization
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and Optimization
 
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott Clark
 
Oracle Database Performance Tuning Basics
Oracle Database Performance Tuning BasicsOracle Database Performance Tuning Basics
Oracle Database Performance Tuning Basics
 
Lessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedLessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at Vinted
 
PostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesPostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major Features
 
Graphite & Metrictank - Meetup Tel Aviv Yafo
Graphite & Metrictank - Meetup Tel Aviv YafoGraphite & Metrictank - Meetup Tel Aviv Yafo
Graphite & Metrictank - Meetup Tel Aviv Yafo
 
techniques.ppt
techniques.ppttechniques.ppt
techniques.ppt
 

More from Sergey Petrunya

New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12Sergey Petrunya
 
MariaDB's join optimizer: how it works and current fixes
MariaDB's join optimizer: how it works and current fixesMariaDB's join optimizer: how it works and current fixes
MariaDB's join optimizer: how it works and current fixesSergey Petrunya
 
Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8Sergey Petrunya
 
Improving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimatesImproving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimatesSergey Petrunya
 
JSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger pictureJSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger pictureSergey Petrunya
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace WalkthroughSergey Petrunya
 
ANALYZE for Statements - MariaDB's hidden gem
ANALYZE for Statements - MariaDB's hidden gemANALYZE for Statements - MariaDB's hidden gem
ANALYZE for Statements - MariaDB's hidden gemSergey Petrunya
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesSergey Petrunya
 
MariaDB 10.4 - что нового
MariaDB 10.4 - что новогоMariaDB 10.4 - что нового
MariaDB 10.4 - что новогоSergey Petrunya
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performanceSergey Petrunya
 
MariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeMariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeSergey Petrunya
 
Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Sergey Petrunya
 
MariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it standMariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it standSergey Petrunya
 
MyRocks in MariaDB | M18
MyRocks in MariaDB | M18MyRocks in MariaDB | M18
MyRocks in MariaDB | M18Sergey Petrunya
 
New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3Sergey Petrunya
 
Histograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLHistograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLSergey Petrunya
 
Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Sergey Petrunya
 
MyRocks in MariaDB: why and how
MyRocks in MariaDB: why and howMyRocks in MariaDB: why and how
MyRocks in MariaDB: why and howSergey Petrunya
 

More from Sergey Petrunya (20)

New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12
 
MariaDB's join optimizer: how it works and current fixes
MariaDB's join optimizer: how it works and current fixesMariaDB's join optimizer: how it works and current fixes
MariaDB's join optimizer: how it works and current fixes
 
Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8
 
Improving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimatesImproving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimates
 
JSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger pictureJSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger picture
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace Walkthrough
 
ANALYZE for Statements - MariaDB's hidden gem
ANALYZE for Statements - MariaDB's hidden gemANALYZE for Statements - MariaDB's hidden gem
ANALYZE for Statements - MariaDB's hidden gem
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databases
 
MariaDB 10.4 - что нового
MariaDB 10.4 - что новогоMariaDB 10.4 - что нового
MariaDB 10.4 - что нового
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performance
 
MariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeMariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit hole
 
Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4
 
MariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it standMariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it stand
 
MyRocks in MariaDB | M18
MyRocks in MariaDB | M18MyRocks in MariaDB | M18
MyRocks in MariaDB | M18
 
New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3
 
MyRocks in MariaDB
MyRocks in MariaDBMyRocks in MariaDB
MyRocks in MariaDB
 
Histograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLHistograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQL
 
Say Hello to MyRocks
Say Hello to MyRocksSay Hello to MyRocks
Say Hello to MyRocks
 
Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2
 
MyRocks in MariaDB: why and how
MyRocks in MariaDB: why and howMyRocks in MariaDB: why and how
MyRocks in MariaDB: why and how
 

Recently uploaded

BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxalwaysnagaraju26
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfayushiqss
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...kalichargn70th171
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 

Recently uploaded (20)

BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 

Lessons for the optimizer from running the TPC-DS benchmark

  • 1. Lessons for the optimizer from TPC-DS benchmark Sergei Petrunia Query Optimizer developer MariaDB Corporation 2019 MariaDB Developers Unconference New York
  • 2. The goals 1. Want to evaluate/measure the query optimizer 2. Hard to do, optimizer should handle – Different query patterns – Different data distributions, etc 3. How does one do it anyway? – Look at benchmarks – Or “optimizer part” of the benchmarks
  • 3. Benchmarks 1. sysbench – Popular – Does only basic queries, few query patterns 2. DBT-3 (aka TPC-H) – 6 tables, 22 analytic queries – Was used to see some optimizer problems – Limited: ● Uniform data distribution, uncorrelated columns ● ...
  • 4. TPC-DS benchmark ● Obsoletes DBT-3 benchmark ● Richer dataset – 25 Tables, 99 queries – Non-uniform data distributions ● Uses advanced SQL features – 32 queries use CTE – 27 queries use Window Functions – etc ● Could not really run it until MariaDB 10.2 (or MySQL 8)
  • 5. MariaDB still can’t run all of TPC-DS ● 2 Queries: FULL OUTER JOIN ● 10 Queries: ROLLUP + ORDER BY problem (MDEV-17807) ● ~20 more queries have fixable problems – “Every derived table must have an alias”, etc select ... group by a,b,c with rollup order by a,b,c ERROR 1221 (HY000): Incorrect usage of CUBE/ROLLUP and ORDER BY
  • 6. Oracle MySQL and TPC-DS ● ROLLUP + ORDER BY is supported since 8.0.12 ● Doesn’t support FULL OUTER JOIN (2 queries) ● Doesn’t support EXCEPT (1 query) ● Doesn’t support INTERSECT (3 queries)
  • 7. Running queries from TPC-DS ● Data generator creates CSV files – Adjust #define for MySQL/MariaDB ● Query generator produces “streams” from templates – A set of QueryNNN.tpl files – A stream is a text file with one instance of each of the 99 queries – One can add hooks at query start/end ● Queries have a few typos ● There’s no tool to run queries/measure time – Note that the read queries are a subset of benchmark (TpCX$)
  • 8. Getting it to run ● A collection of scripts at https://github.com/spetrunia/tpcds-run-tool ● The goal is a fully-automated run – MariaDB, MySQL, PostgreSQL ● Because we need to play with settings/options
  • 9. Test runs done ● The dataset – Scale=1 – 1.2 GB CSV files – 6 GB when loaded ● The Queries – 10..20 “Streams” ● Tuning – Innodb_buffer_pool=8G (50% RAM) – shared_buffers = 4G (25% RAM)
  • 12. Test results ● … a bit inconclusive – query times varied across my runs (?) ● Time to run one stream = 20 min – 2 hours ● Searching for the source of randomness – Started to work on full automation ● (did I run ANALYZE? Did I have correct with my.cnf parameters?) – Started to look at rngseed in dataset/query generator
  • 14. MariaDB 10.2, 10.4, MySQL 8 ● Scale=1, 6.1 GB data, 8G buffer pool ● rngseed=1234 for both ● Benchmark takes ~20 min ● Query times are very non-uniform +-------------+---------------+ | query_name | QueryTime_ms | +-------------+---------------+ | query72.tpl | 678,321 | | query23.tpl | 80,025 | | query2.tpl | 65,156 | | query39.tpl | 63,761 | | query78.tpl | 63,473 | | query4.tpl | 27,549 | | query31.tpl | 24,344 | | query47.tpl | 19,156 | | query11.tpl | 17,484 | | query74.tpl | 16,571 | | query21.tpl | 16,212 | | query59.tpl | 10,522 | | query88.tpl | 9,965 |
  • 18. PostgreSQL 11 ● There was a “fast” run ● Showing results from the last two runs (both where “slow”) – rngseed=5678 for both – 121 min – rngseed=1234 (data), rngseed=4321 (query) – 145..154 min.
  • 19. Heaviest queries in the run ● Execution time varies ● Is this a query optimizer issue? ● Or different constants in a skewed dataset? +-------------+-----------------+-----------------+--------+ | query_name | PG11-seed5678 | PG11-seed1234 | X | +-------------+-----------------+-----------------+--------+ | query4.tpl | 3,628,830 | 3,578,944 | 1.0139 | | query11.tpl | 2,004,392 | 2,013,597 | 0.9954 | | query1.tpl | 87,981 | 1,947,624 | 0.0452 | | query74.tpl | 693,784 | 641,696 | 1.0812 | | query47.tpl | 624,717 | 539,941 | 1.1570 | | query57.tpl | 116,570 | 112,472 | 1.0364 | | query81.tpl | 22,089 | 47,366 | 0.4663 | | query6.tpl | 27,896 | 27,009 | 1.0328 | | query30.tpl | 11,214 | 11,171 | 1.0038 | | query39.tpl | 10,803 | 10,702 | 1.0094 | | query95.tpl | 16,418 | 10,065 | 1.6312 | ` ● Do we need a “representative collection of datasets”? – Check N datasets?
  • 20. Compare most heavy queries ● Some queries are present in both lists, but some are only in one. ● Not clear +-------------+-----------------+-----------------+--------+ | query_name | PG11-seed5678 | PG11-seed1234 | X | +-------------+-----------------+-----------------+--------+ | query4.tpl | 3,628,830 | 3,578,944 | 1.0139 | | query11.tpl | 2,004,392 | 2,013,597 | 0.9954 | | query1.tpl | 87,981 | 1,947,624 | 0.0452 | | query74.tpl | 693,784 | 641,696 | 1.0812 | | query47.tpl | 624,717 | 539,941 | 1.1570 | | query57.tpl | 116,570 | 112,472 | 1.0364 | | query81.tpl | 22,089 | 47,366 | 0.4663 | | query6.tpl | 27,896 | 27,009 | 1.0328 | | query30.tpl | 11,214 | 11,171 | 1.0038 | | query39.tpl | 10,803 | 10,702 | 1.0094 | | query95.tpl | 16,418 | 10,065 | 1.6312 | ` +-------------+---------------+ | query_name | QueryTime_ms | +-------------+---------------+ | query72.tpl | 678,321 | | query23.tpl | 80,025 | | query2.tpl | 65,156 | | query39.tpl | 63,761 | | query78.tpl | 63,473 | | query4.tpl | 27,549 | | query31.tpl | 24,344 | | query47.tpl | 19,156 | | query11.tpl | 17,484 | | query74.tpl | 16,571 | | query21.tpl | 16,212 | | query59.tpl | 10,522 | MariaDB PostgreSQL
  • 21. Observations about the benchmark ● rngseed on the dataset matters A LOT – What is a representative set of rngseed values? ● rngseed on query streams – much less ● Hardware? ● Queries are not equal – Heavy vs lightweight queries – Is SUM(query_time) an adequate metric? ● Wont see that a fast query got 10x slower
  • 22. Other observations ● Both DBT-3 and TPC-DS workloads are relevant for the optimizer – Condition selectivities – Semi-join optimizations – … ● But don’t match the optimizer issues we see – ORDER BY … LIMIT optimization – Long IN-list – …
  • 24. Extra – PostgreSQL 11, parallel query? ● Trying on a run with both rngseed=5678: ● Parallel settings max_parallel_workers_per_gather=8 (the default was 2) dynamic_shared_memory_type=posix show max_worker_processes= 8 ● Results – Only saw one core to be occupied – The run still took 121 min, didin’t see any speedup
  • 25. Try a parallel query select sum(inv_quantity_on_hand*i_current_price) from inventory, item where i_item_sk=inv_item_sk; QUERY PLAN --------------------------------------------------------------------------------- Aggregate (cost=301495.25..301495.26 rows=1 width=32) -> Hash Join (cost=1635.00..213408.54 rows=11744894 width=10) Hash Cond: (inventory.inv_item_sk = item.i_item_sk) -> Seq Scan on inventory (cost=0.00..180935.94 rows=11744894 width=8) -> Hash (cost=1410.00..1410.00 rows=18000 width=10) -> Seq Scan on item (cost=0.00..1410.00 rows=18000 width=10) ● max_parallel_workers_per_gather=0
  • 26. Try a parallel query select sum(inv_quantity_on_hand*i_current_price) from inventory, item where i_item_sk=inv_item_sk; QUERY PLAN ---------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=125048.98..125048.99 rows=1 width=32) -> Gather (cost=125048.55..125048.96 rows=4 width=32) Workers Planned: 4 -> Partial Aggregate (cost=124048.55..124048.56 rows=1 width=32) -> Parallel Hash Join (cost=1468.23..102026.87 rows=2936224 width=10) Hash Cond: (inventory.inv_item_sk = item.i_item_sk) -> Parallel Seq Scan on inventory (cost=0.00..92849.24 rows=2936224 width=8) -> Parallel Hash (cost=1335.88..1335.88 rows=10588 width=10) -> Parallel Seq Scan on item (cost=0.00..1335.88 rows=10588 width=10) ● max_parallel_workers_per_gather=8
  • 27. Try a parallel query select sum(inv_quantity_on_hand*i_current_price) from inventory, item where i_item_sk=inv_item_sk; ● Results – max_parallel_workers_per_gather=8: 1.0 sec – max_parallel_workers_per_gather=0: 3.8 sec ● Didn’t see anything like that in TPC-DS benchmark