CREATE STATISTICS - What is it for? (PostgresLondon)

T
Tomas VondraSoftware developer, Consultant at 2ndQuadrant
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
Tomas Vondra <tomas.vondra@2ndquadrant.com>
CREATE STATISTICS
What is it for?
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
Agenda
●
Quick intro into planning and estimates.
●
Estimates with correlated columns.
●
CREATE STATISTICS to the rescue!
– functional dependencies
– ndistinct
– MCV lists
●
Future improvements.
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
ZIP_CODES
CREATE TABLE zip_codes (
postal_code INT,
place_name VARCHAR(180),
state_name VARCHAR(100),
county_name VARCHAR(100),
community_name VARCHAR(100),
latitude REAL,
longitude REAL
);
cat create-table.sql | psql test
cat zip-codes-gb.csv | psql test -c "copy zip_codes from stdin"
-- http://download.geonames.org/export/zip/
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
EXPLAIN
EXPLAIN (ANALYZE, TIMING off)
SELECT * FROM zip_codes WHERE place_name = 'Manchester';
QUERY PLAN
------------------------------------------------------------------
Seq Scan on zip_codes (cost=0.00..42175.91 rows=14028 width=67)
(actual rows=13889 loops=1)
Filter: ((place_name)::text = 'Manchester'::text)
Rows Removed by Filter: 1683064
Planning Time: 0.113 ms
Execution Time: 151.340 ms
(5 rows)
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
reltuples , relpages
SELECT reltuples, relpages FROM pg_class
WHERE relname = 'zip_codes';
reltuples | relpages
--------------+----------
1.696953e+06 | 20964
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
SELECT * FROM pg_stats
WHERE tablename = 'zip_codes'
AND attname = 'place_name';
------------------+---------------------------------
schemaname | public
tablename | zip_codes
attname | place_name
... | ...
most_common_vals | {... , Manchester, ...}
most_common_freqs | {..., 0.0082665813, ...}
... | ...
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
EXPLAIN (ANALYZE, TIMING off)
SELECT * FROM zip_codes WHERE place_name = 'Manchester';
QUERY PLAN
------------------------------------------------------------------
Seq Scan on zip_codes (cost=0.00..42175.91 rows=14028 width=67)
(actual rows=13889 loops=1)
Filter: ((place_name)::text = 'Manchester'::text)
Rows Removed by Filter: 1683064
reltuples | 1.696953e+06
most_common_vals | {..., Manchester, ...}
most_common_freqs | {..., 0.0082665813, ...}
1.696953e+06 * 0.0082665813 = 14027.9999367789
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
EXPLAIN (ANALYZE, TIMING off)
SELECT * FROM zip_codes WHERE community_name = 'Manchester';
QUERY PLAN
------------------------------------------------------------------
Seq Scan on zip_codes (cost=0.00..42175.91 rows=13858 width=67)
(actual rows=13912 loops=1)
Filter: ((community_name)::text = 'Manchester'::text)
Rows Removed by Filter: 1683041
reltuples | 1.696953e+06
most_common_vals | {..., Manchester, ...}
most_common_freqs | {..., 0.0081664017, ...}
1.696953e+06 * 0.0081664017 = 13857.9998640201
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
EXPLAIN (ANALYZE, TIMING off)
SELECT * FROM zip_codes WHERE place_name = 'Manchester'
AND community_name = 'Manchester';
QUERY PLAN
----------------------------------------------------------------
Seq Scan on zip_codes (cost=0.00..46418.29 rows=115 width=67)
(actual rows=11744 loops=1)
Filter: (((place_name)::text = 'Manchester'::text) AND
((community_name)::text = 'Manchester'::text))
Rows Removed by Filter: 1685209
Underestimate
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
P (A & B) = P(A) * P(B)
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
SELECT * FROM zip_codes
WHERE place_name = 'Manchester'
AND community_name = 'Manchester';
P(place_name = 'Manchester' & community_name = 'Manchester')
= P(place_name = 'Manchester') * P(community_name = 'Manchester')
= 0.0082665813 * 0.0081664017
= 0.00006750822358150821
0.00006750822358150821 * 1.696953e+06 = 114.558282531
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
EXPLAIN (ANALYZE, TIMING off)
SELECT * FROM zip_codes WHERE place_name = 'Manchester'
AND community_name = 'Manchester';
QUERY PLAN
----------------------------------------------------------------
Seq Scan on zip_codes (cost=0.00..46418.29 rows=115 width=67)
(actual rows=11744 loops=1)
Filter: (((place_name)::text = 'Manchester'::text) AND
((community_name)::text = 'Manchester'::text))
Rows Removed by Filter: 1685209
Underestimate
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
EXPLAIN (ANALYZE, TIMING off)
SELECT * FROM zip_codes WHERE place_name != 'London'
AND community_name = 'Westminster';
QUERY PLAN
------------------------------------------------------------------
Seq Scan on zip_codes (cost=0.00..46418.29 rows=10896 width=67)
(actual rows=4 loops=1)
Filter: (((place_name)::text <> 'London'::text) AND
((community_name)::text = 'Westminster'::text))
Rows Removed by Filter: 1696949
Overestimate
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
Correlated columns
●
Attribute Value Independence Assumption (AVIA)
– may result in wildly inaccurate estimates
– both underestimates and overestimates
●
consequences
– poor scan choices (Seq Scan vs. Index Scan)
– poor join choices (Nested Loop)
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
Index Scan using orders_city_idx on orders
(cost=0.28..185.10 rows=90 width=36)
(actual rows=12248237 loops=1)
Seq Scan using on orders
(cost=0.13..129385.10 rows=12248237 width=36)
(actual rows=90 loops=1)
Poor scan choices
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
-> Nested Loop (… rows=90 …) (… rows=12248237 …)
-> Index Scan using orders_city_idx on orders
(cost=0.28..185.10 rows=90 width=36)
(actual rows=12248237 loops=1)
...
-> Index Scan … (… loops=12248237)
Poor join choices
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
-> Nested Loop (… rows=90 …) (… rows=12248237 …)
-> Nested Loop (… rows=90 …) (… rows=12248237 …)
-> Nested Loop (… rows=90 …) (… rows=12248237 …)
-> Index Scan using orders_city_idx on orders
(cost=0.28..185.10 rows=90 width=36)
(actual rows=12248237 loops=1)
...
-> Index Scan … (… loops=12248237)
-> Index Scan … (… loops=12248237)
-> Index Scan … (… loops=12248237)
-> Index Scan … (… loops=12248237)
Poor join choices
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
functional dependencies
(WHERE)
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
Functional Dependencies
●
value in column A determines value in column B
●
trivial example: primary key determines everything
– zip code {place, state, county, community→ }
– M11 0AT {Manchester, England, Greater Manchester,→
Manchester District (B)}
●
other dependencies:
– zip code place state county community→ → → →
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
CREATE STATISTICS
CREATE STATISTICS s (dependencies)
ON place_name, community_name FROM zip_codes;
2 5
ANALYZE zip_codes;
SELECT stxdependencies FROM pg_statistic_ext WHERE stxname = 's';
dependencies
------------------------------------------
{"2 => 5": 0.697633, "5 => 2": 0.095800}
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
place → community: 0.697633 = d
P(place = 'Manchester' & community = 'Manchester') =
P(place = 'Manchester') * [d + (1-d) * P(community = 'Manchester')]
1.696953e+06 * 0.008266581 * (0.697633 + (1.0 - 0.697633) * 0.0081664017)
= 9281.03
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
EXPLAIN (ANALYZE, TIMING off)
SELECT * FROM zip_codes WHERE place_name = 'Manchester'
AND county_name = 'Manchester';
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on zip_codes (cost=0.00..46418.29 rows=9307 width=67)
(actual rows=11744 loops=1)
Filter: (((place_name)::text = 'Manchester'::text) AND
((community_name)::text = 'Manchester'::text))
Rows Removed by Filter: 1685209
Underestimate : fixed
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
EXPLAIN (ANALYZE, TIMING off)
SELECT * FROM zip_codes WHERE place_name != 'London'
AND community_name = 'Westminster';
QUERY PLAN
------------------------------------------------------------------
Seq Scan on zip_codes (cost=0.00..46418.29 rows=10896 width=67)
(actual rows=4 loops=1)
Filter: (((place_name)::text <> 'London'::text) AND
((community_name)::text = 'Westminster'::text))
Rows Removed by Filter: 1696949
Functional dependencies only work with equalities.
Overestimate #1: not fixed :-(
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
EXPLAIN (ANALYZE, TIMING off)
SELECT * FROM zip_codes WHERE place_name = 'Manchester'
AND county_name = 'Westminster';
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on zip_codes (cost=0.00..46418.29 rows=9305 width=67)
(actual rows=0 loops=1)
Filter: (((place_name)::text = 'Manchester'::text) AND
((community_name)::text = 'Westminster'::text))
Rows Removed by Filter: 1696953
The queries need to “respect” the functional dependencies.
Overestimate #2: not fixed :-(
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
ndistinct
(GROUP BY)
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
EXPLAIN (ANALYZE, TIMING off)
SELECT count(*) FROM zip_codes GROUP BY community_name;
QUERY PLAN
-------------------------------------------------------------------------
HashAggregate (cost=46418.29..46421.86 rows=358 width=29)
(actual rows=359 loops=1)
Group Key: community_name
-> Seq Scan on zip_codes (cost=0.00..37933.53 rows=1696953 width=21)
(actual rows=1696953 loops=1)
Planning Time: 0.087 ms
Execution Time: 337.718 ms
(5 rows)
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
SELECT attname, n_distinct
FROM pg_stats WHERE tablename = 'zip_codes';
attname | n_distinct
----------------+------------
community_name | 358
county_name | 91
latitude | 59925
longitude | 64559
place_name | 12281
postal_code | -1
state_name | 3
(7 rows)
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
EXPLAIN (ANALYZE, TIMING off)
SELECT count(*) FROM zip_codes GROUP BY community_name, place_name;
QUERY PLAN
----------------------------------------------------------------------------------
GroupAggregate (cost=294728.63..313395.11 rows=169695 width=40)
(actual rows=15194 loops=1)
Group Key: community_name, place_name
-> Sort (cost=294728.63..298971.01 rows=1696953 width=32)
(actual rows=1696953 loops=1)
Sort Key: community_name, place_name
Sort Method: external merge Disk: 69648kB
-> Seq Scan on zip_codes (cost=0.00..37933.53 rows=1696953 width=32)
(actual rows=1696953 loops=1)
Planning Time: 0.374 ms
Execution Time: 1554.933 ms
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
EXPLAIN (ANALYZE, TIMING off)
SELECT count(*) FROM zip_codes GROUP BY community_name, place_name;
QUERY PLAN
----------------------------------------------------------------------------------
GroupAggregate (cost=294728.63..313395.11 rows=169695 width=40)
(actual rows=15194 loops=1)
Group Key: community_name, place_name
-> Sort (cost=294728.63..298971.01 rows=1696953 width=32)
(actual rows=1696953 loops=1)
Sort Key: community_name, place_name
Sort Method: external merge Disk: 69648kB
-> Seq Scan on zip_codes (cost=0.00..37933.53 rows=1696953 width=32)
(actual rows=1696953 loops=1)
Planning Time: 0.374 ms
Execution Time: 1554.933 ms
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
ndistinct(community, place)
=
ndistinct(community) * ndistinct(place)
358 * 12281 = 4396598
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
ndistinct(community, place)
=
ndistinct(community) * ndistinct(place)
358 * 12281 = 4396598 (169695)
(capped to 10% of the table)
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
CREATE STATISTICS s (ndistinct)
ON place_name, community_name, county_name
FROM zip_codes;
ANALYZE zip_codes;
SELECT stxndistinct FROM pg_statistic_ext WHERE stxname = 's';
n_distinct
---------------------------------------------------------------
{"2, 4": 12996, "2, 5": 13221, "4, 5": 399, "2, 4, 5": 13252}
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
EXPLAIN (ANALYZE, TIMING off)
SELECT count(*) FROM zip_codes GROUP BY community_name, postal_code;
QUERY PLAN
---------------------------------------------------------------------------
HashAggregate (cost=50660.68..50792.89 rows=13221 width=40)
(actual rows=15194 loops=1)
Group Key: community_name, place_name
-> Seq Scan on zip_codes (cost=0.00..37933.53 rows=1696953 width=32)
(actual rows=1696953 loops=1)
Planning Time: 0.056 ms
Execution Time: 436.828 ms
(5 rows)
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
ndistinct
●
the “old behavior” was defensive
– unreliable estimates with multiple columns
– HashAggregate can’t spill to disk (OOM)
– rather than crash do Sort+GroupAggregate (slow)
●
ndistinct coefficients
– make multi-column ndistinct estimates more reliable
– reduced danger of OOM
– large tables + GROUP BY multiple columns
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
Future Improvements
●
additional types of statistics
– MCV lists (PG12), histograms (??), …
●
statistics on expressions
– currently only simple column references
– alternative to functional indexes
●
improving join estimates
– using MCV lists
– special multi-table statistics (syntax already supports it)
https://github.com/tvondra/create-statistics-talk https://www.2ndQuadrant.com
PostgresLondon
London, July 2, 2019
Questions?
Tomas Vondra
tomas.vondra@2ndquadrant.com
tomas@pgaddict.com
@fuzzycz
1 of 36

Recommended

CREATE STATISTICS - what is it for? by
CREATE STATISTICS - what is it for?CREATE STATISTICS - what is it for?
CREATE STATISTICS - what is it for?Tomas Vondra
481 views35 slides
[TechPlayConf]Rekognition導入事例 by
[TechPlayConf]Rekognition導入事例[TechPlayConf]Rekognition導入事例
[TechPlayConf]Rekognition導入事例Shinji Miyazato
2.1K views42 slides
Quill - 一個 Scala 的資料庫存取利器 by
Quill - 一個 Scala 的資料庫存取利器Quill - 一個 Scala 的資料庫存取利器
Quill - 一個 Scala 的資料庫存取利器vito jeng
79 views37 slides
Apache Spark™ Applications the Easy Way - Pierre Borckmans by
Apache Spark™ Applications the Easy Way - Pierre BorckmansApache Spark™ Applications the Easy Way - Pierre Borckmans
Apache Spark™ Applications the Easy Way - Pierre Borckmanssparktc
1.4K views25 slides
Using SQL-MapReduce for Advanced Analytics by
Using SQL-MapReduce for Advanced AnalyticsUsing SQL-MapReduce for Advanced Analytics
Using SQL-MapReduce for Advanced AnalyticsTeradata Aster
4K views25 slides
Big Data-Driven Applications with Cassandra and Spark by
Big Data-Driven Applications  with Cassandra and SparkBig Data-Driven Applications  with Cassandra and Spark
Big Data-Driven Applications with Cassandra and SparkArtem Chebotko
852 views46 slides

More Related Content

Similar to CREATE STATISTICS - What is it for? (PostgresLondon)

Fast NoSQL from HDDs? by
Fast NoSQL from HDDs? Fast NoSQL from HDDs?
Fast NoSQL from HDDs? ScyllaDB
693 views54 slides
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ... by
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...DataStax Academy
3.6K views50 slides
Geospatial and bitemporal search in cassandra with pluggable lucene index by
Geospatial and bitemporal search in cassandra with pluggable lucene indexGeospatial and bitemporal search in cassandra with pluggable lucene index
Geospatial and bitemporal search in cassandra with pluggable lucene indexAndrés de la Peña
870 views50 slides
Community-driven Language Design at TC39 on the JavaScript Pipeline Operator ... by
Community-driven Language Design at TC39 on the JavaScript Pipeline Operator ...Community-driven Language Design at TC39 on the JavaScript Pipeline Operator ...
Community-driven Language Design at TC39 on the JavaScript Pipeline Operator ...Igalia
220 views51 slides
Multidimensional DB design, revolving TPC-H benchmark into OLAP bench by
Multidimensional DB design, revolving TPC-H benchmark into OLAP benchMultidimensional DB design, revolving TPC-H benchmark into OLAP bench
Multidimensional DB design, revolving TPC-H benchmark into OLAP benchRim Moussa
2.1K views42 slides
MongoDB Days UK: Using MongoDB and Python for Data Analysis Pipelines by
MongoDB Days UK: Using MongoDB and Python for Data Analysis PipelinesMongoDB Days UK: Using MongoDB and Python for Data Analysis Pipelines
MongoDB Days UK: Using MongoDB and Python for Data Analysis PipelinesMongoDB
4.1K views46 slides

Similar to CREATE STATISTICS - What is it for? (PostgresLondon)(20)

Fast NoSQL from HDDs? by ScyllaDB
Fast NoSQL from HDDs? Fast NoSQL from HDDs?
Fast NoSQL from HDDs?
ScyllaDB693 views
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ... by DataStax Academy
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ...
DataStax Academy3.6K views
Geospatial and bitemporal search in cassandra with pluggable lucene index by Andrés de la Peña
Geospatial and bitemporal search in cassandra with pluggable lucene indexGeospatial and bitemporal search in cassandra with pluggable lucene index
Geospatial and bitemporal search in cassandra with pluggable lucene index
Community-driven Language Design at TC39 on the JavaScript Pipeline Operator ... by Igalia
Community-driven Language Design at TC39 on the JavaScript Pipeline Operator ...Community-driven Language Design at TC39 on the JavaScript Pipeline Operator ...
Community-driven Language Design at TC39 on the JavaScript Pipeline Operator ...
Igalia220 views
Multidimensional DB design, revolving TPC-H benchmark into OLAP bench by Rim Moussa
Multidimensional DB design, revolving TPC-H benchmark into OLAP benchMultidimensional DB design, revolving TPC-H benchmark into OLAP bench
Multidimensional DB design, revolving TPC-H benchmark into OLAP bench
Rim Moussa2.1K views
MongoDB Days UK: Using MongoDB and Python for Data Analysis Pipelines by MongoDB
MongoDB Days UK: Using MongoDB and Python for Data Analysis PipelinesMongoDB Days UK: Using MongoDB and Python for Data Analysis Pipelines
MongoDB Days UK: Using MongoDB and Python for Data Analysis Pipelines
MongoDB4.1K views
Wait! What’s going on inside my database? by Jeremy Schneider
Wait! What’s going on inside my database?Wait! What’s going on inside my database?
Wait! What’s going on inside my database?
Jeremy Schneider4.6K views
MongoDB World 2016: The Best IoT Analytics with MongoDB by MongoDB
MongoDB World 2016: The Best IoT Analytics with MongoDBMongoDB World 2016: The Best IoT Analytics with MongoDB
MongoDB World 2016: The Best IoT Analytics with MongoDB
MongoDB1.6K views
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN... by EDB
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
EDB472 views
4Developers 2018: Beyond c++17 (Mateusz Pusz) by PROIDEA
4Developers 2018: Beyond c++17 (Mateusz Pusz)4Developers 2018: Beyond c++17 (Mateusz Pusz)
4Developers 2018: Beyond c++17 (Mateusz Pusz)
PROIDEA32 views
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot by Citus Data
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco SlotDistributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Citus Data509 views
Big Data Analytics with MariaDB ColumnStore by MariaDB plc
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStore
MariaDB plc163 views
A Deep Dive into Query Execution Engine of Spark SQL by Databricks
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
Databricks6.6K views
Story of static code analyzer development by Andrey Karpov
Story of static code analyzer developmentStory of static code analyzer development
Story of static code analyzer development
Andrey Karpov68 views
List intersection for web search: Algorithms, Cost Models, and Optimizations by Sunghwan Kim
List intersection for web search: Algorithms, Cost Models, and OptimizationsList intersection for web search: Algorithms, Cost Models, and Optimizations
List intersection for web search: Algorithms, Cost Models, and Optimizations
Sunghwan Kim88 views

More from Tomas Vondra

Data corruption by
Data corruptionData corruption
Data corruptionTomas Vondra
1.8K views34 slides
DB vs. encryption by
DB vs. encryptionDB vs. encryption
DB vs. encryptionTomas Vondra
111 views19 slides
PostgreSQL performance improvements in 9.5 and 9.6 by
PostgreSQL performance improvements in 9.5 and 9.6PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6Tomas Vondra
14.2K views40 slides
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016 by
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016Tomas Vondra
8.8K views48 slides
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAlt by
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAltPostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAlt
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAltTomas Vondra
938 views49 slides
PostgreSQL on EXT4, XFS, BTRFS and ZFS by
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSTomas Vondra
6.5K views46 slides

More from Tomas Vondra(18)

PostgreSQL performance improvements in 9.5 and 9.6 by Tomas Vondra
PostgreSQL performance improvements in 9.5 and 9.6PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6
Tomas Vondra14.2K views
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016 by Tomas Vondra
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
Tomas Vondra8.8K views
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAlt by Tomas Vondra
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAltPostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAlt
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAlt
Tomas Vondra938 views
PostgreSQL on EXT4, XFS, BTRFS and ZFS by Tomas Vondra
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
Tomas Vondra6.5K views
Performance improvements in PostgreSQL 9.5 and beyond by Tomas Vondra
Performance improvements in PostgreSQL 9.5 and beyondPerformance improvements in PostgreSQL 9.5 and beyond
Performance improvements in PostgreSQL 9.5 and beyond
Tomas Vondra10.3K views
Postgresql na EXT3/4, XFS, BTRFS a ZFS by Tomas Vondra
Postgresql na EXT3/4, XFS, BTRFS a ZFSPostgresql na EXT3/4, XFS, BTRFS a ZFS
Postgresql na EXT3/4, XFS, BTRFS a ZFS
Tomas Vondra1.3K views
PostgreSQL on EXT4, XFS, BTRFS and ZFS by Tomas Vondra
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
Tomas Vondra47.9K views
Novinky v PostgreSQL 9.4 a JSONB by Tomas Vondra
Novinky v PostgreSQL 9.4 a JSONBNovinky v PostgreSQL 9.4 a JSONB
Novinky v PostgreSQL 9.4 a JSONB
Tomas Vondra6.5K views
PostgreSQL performance archaeology by Tomas Vondra
PostgreSQL performance archaeologyPostgreSQL performance archaeology
PostgreSQL performance archaeology
Tomas Vondra4.4K views
Výkonnostní archeologie by Tomas Vondra
Výkonnostní archeologieVýkonnostní archeologie
Výkonnostní archeologie
Tomas Vondra849 views
Český fulltext a sdílené slovníky by Tomas Vondra
Český fulltext a sdílené slovníkyČeský fulltext a sdílené slovníky
Český fulltext a sdílené slovníky
Tomas Vondra955 views
SSD vs HDD / WAL, indexes and fsync by Tomas Vondra
SSD vs HDD / WAL, indexes and fsyncSSD vs HDD / WAL, indexes and fsync
SSD vs HDD / WAL, indexes and fsync
Tomas Vondra1.2K views
Checkpoint (CSPUG 22.11.2011) by Tomas Vondra
Checkpoint (CSPUG 22.11.2011)Checkpoint (CSPUG 22.11.2011)
Checkpoint (CSPUG 22.11.2011)
Tomas Vondra630 views
Čtení explain planu (CSPUG 21.6.2011) by Tomas Vondra
Čtení explain planu (CSPUG 21.6.2011)Čtení explain planu (CSPUG 21.6.2011)
Čtení explain planu (CSPUG 21.6.2011)
Tomas Vondra1.7K views
Replikace (CSPUG 19.4.2011) by Tomas Vondra
Replikace (CSPUG 19.4.2011)Replikace (CSPUG 19.4.2011)
Replikace (CSPUG 19.4.2011)
Tomas Vondra740 views
PostgreSQL / Performance monitoring by Tomas Vondra
PostgreSQL / Performance monitoringPostgreSQL / Performance monitoring
PostgreSQL / Performance monitoring
Tomas Vondra6.1K views

Recently uploaded

Copilot Prompting Toolkit_All Resources.pdf by
Copilot Prompting Toolkit_All Resources.pdfCopilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdfRiccardo Zamana
6 views4 slides
DSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - Geertsema by
DSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - GeertsemaDSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - Geertsema
DSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - GeertsemaDeltares
17 views13 slides
Neo4j y GenAI by
Neo4j y GenAI Neo4j y GenAI
Neo4j y GenAI Neo4j
42 views41 slides
Winter '24 Release Chat.pdf by
Winter '24 Release Chat.pdfWinter '24 Release Chat.pdf
Winter '24 Release Chat.pdfmelbourneauuser
9 views20 slides
HarshithAkkapelli_Presentation.pdf by
HarshithAkkapelli_Presentation.pdfHarshithAkkapelli_Presentation.pdf
HarshithAkkapelli_Presentation.pdfharshithakkapelli
11 views16 slides
Unleash The Monkeys by
Unleash The MonkeysUnleash The Monkeys
Unleash The MonkeysJacob Duijzer
7 views28 slides

Recently uploaded(20)

Copilot Prompting Toolkit_All Resources.pdf by Riccardo Zamana
Copilot Prompting Toolkit_All Resources.pdfCopilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdf
Riccardo Zamana6 views
DSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - Geertsema by Deltares
DSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - GeertsemaDSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - Geertsema
DSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - Geertsema
Deltares17 views
Neo4j y GenAI by Neo4j
Neo4j y GenAI Neo4j y GenAI
Neo4j y GenAI
Neo4j42 views
DSD-INT 2023 SFINCS Modelling in the U.S. Pacific Northwest - Parker by Deltares
DSD-INT 2023 SFINCS Modelling in the U.S. Pacific Northwest - ParkerDSD-INT 2023 SFINCS Modelling in the U.S. Pacific Northwest - Parker
DSD-INT 2023 SFINCS Modelling in the U.S. Pacific Northwest - Parker
Deltares9 views
DSD-INT 2023 Simulating a falling apron in Delft3D 4 - Engineering Practice -... by Deltares
DSD-INT 2023 Simulating a falling apron in Delft3D 4 - Engineering Practice -...DSD-INT 2023 Simulating a falling apron in Delft3D 4 - Engineering Practice -...
DSD-INT 2023 Simulating a falling apron in Delft3D 4 - Engineering Practice -...
Deltares6 views
Consulting for Data Monetization Maximizing the Profit Potential of Your Data... by Flexsin
Consulting for Data Monetization Maximizing the Profit Potential of Your Data...Consulting for Data Monetization Maximizing the Profit Potential of Your Data...
Consulting for Data Monetization Maximizing the Profit Potential of Your Data...
Flexsin 15 views
Software evolution understanding: Automatic extraction of software identifier... by Ra'Fat Al-Msie'deen
Software evolution understanding: Automatic extraction of software identifier...Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...
DSD-INT 2023 Dam break simulation in Derna (Libya) using HydroMT_SFINCS - Prida by Deltares
DSD-INT 2023 Dam break simulation in Derna (Libya) using HydroMT_SFINCS - PridaDSD-INT 2023 Dam break simulation in Derna (Libya) using HydroMT_SFINCS - Prida
DSD-INT 2023 Dam break simulation in Derna (Libya) using HydroMT_SFINCS - Prida
Deltares18 views
Tridens DevOps by Tridens
Tridens DevOpsTridens DevOps
Tridens DevOps
Tridens9 views
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI... by Marc Müller
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...
Marc Müller36 views
What Can Employee Monitoring Software Do?​ by wAnywhere
What Can Employee Monitoring Software Do?​What Can Employee Monitoring Software Do?​
What Can Employee Monitoring Software Do?​
wAnywhere21 views
DSD-INT 2023 FloodAdapt - A decision-support tool for compound flood risk mit... by Deltares
DSD-INT 2023 FloodAdapt - A decision-support tool for compound flood risk mit...DSD-INT 2023 FloodAdapt - A decision-support tool for compound flood risk mit...
DSD-INT 2023 FloodAdapt - A decision-support tool for compound flood risk mit...
Deltares13 views
Upgrading Incident Management with Icinga - Icinga Camp Milan 2023 by Icinga
Upgrading Incident Management with Icinga - Icinga Camp Milan 2023Upgrading Incident Management with Icinga - Icinga Camp Milan 2023
Upgrading Incident Management with Icinga - Icinga Camp Milan 2023
Icinga38 views
El Arte de lo Possible by Neo4j
El Arte de lo PossibleEl Arte de lo Possible
El Arte de lo Possible
Neo4j38 views
Roadmap y Novedades de producto by Neo4j
Roadmap y Novedades de productoRoadmap y Novedades de producto
Roadmap y Novedades de producto
Neo4j50 views
DSD-INT 2023 Thermobaricity in 3D DCSM-FM - taking pressure into account in t... by Deltares
DSD-INT 2023 Thermobaricity in 3D DCSM-FM - taking pressure into account in t...DSD-INT 2023 Thermobaricity in 3D DCSM-FM - taking pressure into account in t...
DSD-INT 2023 Thermobaricity in 3D DCSM-FM - taking pressure into account in t...
Deltares9 views

CREATE STATISTICS - What is it for? (PostgresLondon)