SlideShare a Scribd company logo
Copyright©2018 NTT Corp. All Rights Reserved.
Hypothetical Partitioning
for PostgreSQL
POSTGRESOPEN SV 2018
7th of September
Yuzuko Hosoya
NTT Open Source Software Center
2Copyright©2018 NTT Corp. All Rights Reserved.
Who am I ?
•Yuzuko Hosoya (@pyyycha)
From Tokyo, Japan
•Work
⁃NTT Open Source Software Center
⁃Working on HypoPG
⁃Interested in Planner/Partitioning
•Like
Dancing (Classical Ballet/Jazz)
3Copyright©2018 NTT Corp. All Rights Reserved.
I. Introduction of HypoPG
II. Why Hypothetical Partitioning?
III.HypoPG Usage & Architecture
IV.Demo
V. Future Works
VI.Summary
Agenda
4Copyright©2018 NTT Corp. All Rights Reserved.
Introduction of HypoPG
5Copyright©2018 NTT Corp. All Rights Reserved.
Introduction of HypoPG
Released by Julien Rouhaud
Latest version HypoPG 1.1.2
Support PostgreSQL 9.2 and above
Github https://github.com/HypoPG/hypopg
Documentation https://hypopg.readthedocs.io
6Copyright©2018 NTT Corp. All Rights Reserved.
•Supports hypothetical Indexes
•Allows users to define hypothetical indexes on real
tables and data
•If hypothetical indexes are defined on a table,
planner considers them along with any real indexes
•Outputs queries’ plan/cost with EXPLAIN as if
hypothetical indexes actually existed
•Helps index design tuning
HypoPG 1.X
7Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG 2 (still under development)
•Supports hypothetical partitioning (v10~)
•Allows users to try/simulate different hypothetical
partitioning schemes on real tables and data
•Outputs queries’ plan/cost with EXPLAIN using
hypothetical partitioning schemes
•Helps partitioning design tuning
8Copyright©2018 NTT Corp. All Rights Reserved.
Why Hypothetical Partitioning?
9Copyright©2018 NTT Corp. All Rights Reserved.
• Declarative Partitioning was a long-desired feature
• It will be further improved with new PostgreSQL versions
PostgreSQL Partitioning
Many more enhancements
of partitioning
⁃ HASH Partitioning
⁃ Partitioned Index
⁃ Faster Partition Pruning
⁃ Partition-wise Join/Aggregate
and so on
Declarative Partitioning
was introduced
⁃ Partitioning enabled by
CREATE TABLE statement
⁃ RANGE/LIST Partitioning
v10
v11
v9.6
Partitioning by combination
of following features
⁃ Table Inheritance
⁃ Check Constraint
⁃ Trigger
10Copyright©2018 NTT Corp. All Rights Reserved.
•The most IMPORTANT factor affecting query performance
over a partitioned table is partitioning schemes
Which tables to be partitioned
Which strategy to be chosen
Which columns to be specified as a partition key
How many partitions to be created
Query Performance and Partitioning Schemes
11Copyright©2018 NTT Corp. All Rights Reserved.
•Do not scan unnecessary partitions to reduce data size
to be read from the disk
Partition Pruning
January
December
12 partitions
orders
SELECT * FROM orders WHERE date=‘January’;
Scanned
NOT
Scanned
12Copyright©2018 NTT Corp. All Rights Reserved.
•Do not scan unnecessary partitions to reduce data size
to be read from the disk
•Optimal partitioning Scheme:
Partition by the column specified in WHERE clause
Partition Pruning
January
December
12 partitions
orders
Partition
by
date
SELECT * FROM orders WHERE date=‘January’;
13Copyright©2018 NTT Corp. All Rights Reserved.
•Joining pairs of partitions might be faster than joining
large tables
Partition-wise Join
SELECT c.name FROM orders o LEFT JOIN customers c
ON o.cust_id = c.cust_id where o.date = ‘September’;
orders customers
1-1000
5001-6000
1-1000
5001-6000
14Copyright©2018 NTT Corp. All Rights Reserved.
•Joining pairs of partitions might be faster than joining
large tables
•Optimal partitioning scheme:
Partitioned by the column specified in JOIN clause
Partition-wise Join
SELECT c.name FROM orders o LEFT JOIN customers c
ON o.cust_id = c.cust_id where o.date = ‘September’;
orders customers
1-1000
5001-6000
1-1000
5001-6000Partition
by
cust_id
Partition
by
cust_id
15Copyright©2018 NTT Corp. All Rights Reserved.
•Performing aggregation per partition and combining
them might be faster than performing aggregation on
large table
Partition-wise Aggregate
AggJanuary
December
orders
Agg
SELECT date, sum(price) FROM orders GROUP BY date;
16Copyright©2018 NTT Corp. All Rights Reserved.
•Performing aggregation per partition and combining
them might be faster than performing aggregation on
large table
•Optimal partitioning scheme:
Partitioned by the column specified in GROUP BY clause
Partition-wise Aggregate
SELECT date, sum(price) FROM orders GROUP BY date;
January
December
orders
Partition
by
date
17Copyright©2018 NTT Corp. All Rights Reserved.
•Assume this situation
Question:
How do you partition these tables?
orders customers
SELECT * FROM orders WHERE date=‘January’;
SELECT c.name FROM orders o LEFT JOIN customers c
ON o.cust_id = c.cust_id where o.date = ‘September’;
Partition
by
cust_id?
Partition
by
date?
Partition
by
order_id?
18Copyright©2018 NTT Corp. All Rights Reserved.
•There are often multiple choices on partitioning schemes
•Queries’ plan/cost are largely affected by data size and
parameter settings
It’s VERY HARD …
19Copyright©2018 NTT Corp. All Rights Reserved.
•There are often multiple choices on partitioning schemes
•Query plan/cost are largely affected by data size and
parameter settings
It’s HARD …
Do you want to know how your queries
behave before actually partitioning tables?
20Copyright©2018 NTT Corp. All Rights Reserved.
• You can quickly check how your queries would
behave if certain tables were partitioned
without actually partitioning any tables and
wasting any resources
• You can try different partitioning schemes
in a short time
Hypothetical Partitioning is Helpful!
21Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG
Usage & Architecture
22Copyright©2018 NTT Corp. All Rights Reserved.
•Need to have an actual plain table
•Create hypothetical partitioning schemes using following
HypoPG function:
For hypothetical partitioned table
For hypothetical partitions
NOTE: the 3rd argument is only for subpartitioning
How to Create
Hypothetical Partitioning Schemes
hypopg_partition_table
(‘table_name’, ‘PARTITION BY strategy(column)’)
hypopg_add_partition
(‘table_name’,
‘PARTITION OF parent FOR VALUES partition_bound’,
‘PARTITION BY strategy(column)’)
23Copyright©2018 NTT Corp. All Rights Reserved.
•Create hypothetical statistics for hypothetical partitions
using following HypoPG function:
•The table specified in the first argument of this function
must be a root table
•Statistics of all leaf partitions, which are related to the
given table, are created
How to Create Hypothetical Statistics
hypopg_analyze
(‘table_name’, percentage for sampling)
24Copyright©2018 NTT Corp. All Rights Reserved.
•For each partition, get the partition bounds including its
ancestors if any
How hypopg_analyze() Works
January
December
orders February
1-1000
1000-2000
partition bound:
date = ‘January’ ,
cust_id >= 1 AND
cust_id < 1000
hypopg_analyze(‘orders’, 70);
partition
by date
partition
by cust_id
25Copyright©2018 NTT Corp. All Rights Reserved.
•For each partition, get the partition bounds including its
ancestors if any
•Generate a WHERE clause according to the partition bound
How hypopg_analyze() Works
January
December
orders February
1-1000
1000-2000
partition bound:
date = ‘January’ and
cust_id >= 1 and
cust_id < 1000
WHERE
date = ‘January’ AND
cust_id >= 1 AND
cust_id < 1000
hypopg_analyze(‘orders’, 70);
partition
by date
partition
by cust_id
26Copyright©2018 NTT Corp. All Rights Reserved.
•For each partition, get the partition bounds including its
ancestors if any
•Generate a WHERE clause according to the partition bound
•Compute stats for sampling data got by TABLESAMPLE
and the WHERE clause
How hypopg_analyze() Works
January
December
orders February
1-1000
1000-2000
partition bound:
date = ‘January’ and
cust_id >= 1 and
cust_id < 1000
WHERE
date = ‘January’ AND
cust_id >= 1 AND
cust_id < 1000
hypopg_analyze(‘orders’, 70);
partition
by date
partition
by cust_id
27Copyright©2018 NTT Corp. All Rights Reserved.
Get sampling data from a target table according to percentage
Two kinds of sampling method
⁃SYSTEM: pick data by the page
⁃BERNOULLI: pick data by the row
 Specify WHERE clause
eliminate data that didn’t satisfied WHERE condition
TABLESAMPLE Clause
SELECT select_expression FROM table_name
TABLESAMPLE sampling_method(percentage) WHERE condition
sampling filtering
target
table
28Copyright©2018 NTT Corp. All Rights Reserved.
•For each partition, get the partition bounds including its
ancestors if any
•Generate a WHERE clause according to the partition bound
•Compute stats for sampling data got by TABLESAMPLE
and the WHERE clause
How hypopg_analyze() Works
January
December
orders February
1-1000
1000-2000
partition bound:
date = ‘January’ and
cust_id >= 1 and
cust_id < 1000
WHERE
date = ‘January’ AND
cust_id >= 1 AND
cust_id < 1000
sampling filtering
Compute Stats
orders
hypopg_analyze(‘orders’, 70);
partition
by date
partition
by cust_id
29Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG Architecture
HypoPG uses four kinds of hooks
T
T
T1 T3T2
stat
stat stat stat
EXPLAIN
Query Plan
PostgreSQL HypoPG
Actual Hypothetical
Table OID
Hypothetical
Scheme/
Statistics
30Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG Architecture
Using ProcessUtility_hook, HypoPG detects an EXPLAIN
without ANALYZE option and saves it in a local flag for
following steps
T
T
T1 T3T2
stat
stat stat stat
EXPLAIN
Query Plan
PostgreSQL HypoPG
Actual Hypothetical
Table OID
Hypothetical
Scheme/
Statistics
1
31Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG Architecture
Using get_relation_info_hook, hypothetical partitioning
scheme is injected if the target table is partitioned
hypothetically
T
T
T1 T3T2
stat
stat stat stat
EXPLAIN
Query Plan
PostgreSQL HypoPG
Actual Hypothetical
Table OID
Hypothetical
Scheme/
Statistics
2
32Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG Architecture
Using get_relation_stats_hook, hypothetical statistics
are injected to estimate correctly if they were created
in advance
T
T
T1 T3T2
stat
stat stat stat
EXPLAIN
Query Plan
PostgreSQL HypoPG
Actual Hypothetical
Table OID
Hypothetical
Scheme/
Statistics
3
33Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG Architecture
Using ExecutorEnd_hook, the EXPLAIN flag is removed.
Finally a query plan using hypothetical partitioning
schemes is displayed!!
T
T
T1 T3T2
stat
stat stat stat
EXPLAIN
Query Plan
PostgreSQL HypoPG
Actual Hypothetical
Table OID
Hypothetical
Scheme/
Statistics
4
34Copyright©2018 NTT Corp. All Rights Reserved.
•Simulate RANGE/LIST/HASH Partitioning
•Simulate SELECT queries
Partition Pruning, Partition-wise Join/Aggregate,
N-way Join, Parallel Query
•Simulate multi-level hypothetical partitioning
•Simulate hypothetical indexes on hypothetical partitions
What You Can Do
35Copyright©2018 NTT Corp. All Rights Reserved.
•Support only plain tables (not partitioned tables)
•Support only SELECT queries
•Do not support PostgreSQL 10 yet
Limitations
36Copyright©2018 NTT Corp. All Rights Reserved.
•Create a hypothetical partitioning scheme and execute
some simple queries
⁃A simple customers table to be partitioned actually:
⁃A simple orders table to be partitioned hypothetically:
NOTE: for convenience, this table is named **hypo_**orders to
quickly identify it in the plans.
DEMO time!
customers (cust_id integer PRIMARY KEY,
name TEXT, address TEXT)
hypo_orders (orders_id int PRIMARY KEY,
cust_id int, price int, date date)
37Copyright©2018 NTT Corp. All Rights Reserved.
Future Works & Summary
38Copyright©2018 NTT Corp. All Rights Reserved.
•Support PostgreSQL 10
•Have a better integration in PostgreSQL core
to ease our limitations
- Support UPDATE/DELETE/INSERT queries
- Support already partitioned tables
•Add automated advisor feature
What’s next?
39Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG2 will support hypothetical partitioning
⁃Allows users to try/simulate different hypothetical partitioning
schemes on real tables and data
⁃Outputs queries’ plan/cost with EXPLAIN using hypothetical
partitioning schemes
Hypothetical partitioning helps you all to design
partitioning schemes
⁃You can quickly check how your queries would behave if certain
tables were partitioned without actually partitioning any tables
and wasting any resources
⁃You can try different partitioning schemes in a short time
We’d be happy to have some feedbacks!
Summary
40Copyright©2018 NTT Corp. All Rights Reserved.
THANK YOU!
Yuzuko Hosoya
hosoya.yuzuko@lab.ntt.co.jp

More Related Content

Similar to Hypothetical Partitioning for PostgreSQL

PostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical ReplicationPostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical Replication
Atsushi Torikoshi
 
MySQL 8.0: What Is New in Optimizer and Executor?
MySQL 8.0: What Is New in Optimizer and Executor?MySQL 8.0: What Is New in Optimizer and Executor?
MySQL 8.0: What Is New in Optimizer and Executor?
Norvald Ryeng
 
Application of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructureApplication of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructure
NTT DATA OSS Professional Services
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 Updates
Taro L. Saito
 
Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and Druid
Databricks
 
Engineering with Open Source - Hyonjee Joo
Engineering with Open Source - Hyonjee JooEngineering with Open Source - Hyonjee Joo
Engineering with Open Source - Hyonjee Joo
Two Sigma
 
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles Sonigo
 
Graph-Based Identity Resolution at Scale
Graph-Based Identity Resolution at ScaleGraph-Based Identity Resolution at Scale
Graph-Based Identity Resolution at Scale
TigerGraph
 
Why Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps MonitoringWhy Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps Monitoring
DevOps.com
 
Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0
oysteing
 
Presto Summit 2018 - 03 - Starburst CBO
Presto Summit 2018  - 03 - Starburst CBOPresto Summit 2018  - 03 - Starburst CBO
Presto Summit 2018 - 03 - Starburst CBO
kbajda
 
Sensor Data Management & Analytics: Advanced Process Control
Sensor Data Management & Analytics: Advanced Process ControlSensor Data Management & Analytics: Advanced Process Control
Sensor Data Management & Analytics: Advanced Process Control
TIBCO_Software
 
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Taro L. Saito
 
Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...
GetInData
 
Introducing the Tachyum Prodigy Processor
Introducing the Tachyum Prodigy ProcessorIntroducing the Tachyum Prodigy Processor
Introducing the Tachyum Prodigy Processor
inside-BigData.com
 
Fri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsFri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataops
DataKitchen
 
ODSC data science to DataOps
ODSC data science to DataOpsODSC data science to DataOps
ODSC data science to DataOps
Christopher Bergh
 
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Jaroslav Gergic
 
Compute-based sizing and system dashboard
Compute-based sizing and system dashboardCompute-based sizing and system dashboard
Compute-based sizing and system dashboard
DataWorks Summit
 
Distributed Database Architecture for GDPR
Distributed Database Architecture for GDPRDistributed Database Architecture for GDPR
Distributed Database Architecture for GDPR
Yugabyte
 

Similar to Hypothetical Partitioning for PostgreSQL (20)

PostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical ReplicationPostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical Replication
 
MySQL 8.0: What Is New in Optimizer and Executor?
MySQL 8.0: What Is New in Optimizer and Executor?MySQL 8.0: What Is New in Optimizer and Executor?
MySQL 8.0: What Is New in Optimizer and Executor?
 
Application of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructureApplication of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructure
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 Updates
 
Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and Druid
 
Engineering with Open Source - Hyonjee Joo
Engineering with Open Source - Hyonjee JooEngineering with Open Source - Hyonjee Joo
Engineering with Open Source - Hyonjee Joo
 
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
 
Graph-Based Identity Resolution at Scale
Graph-Based Identity Resolution at ScaleGraph-Based Identity Resolution at Scale
Graph-Based Identity Resolution at Scale
 
Why Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps MonitoringWhy Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps Monitoring
 
Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0
 
Presto Summit 2018 - 03 - Starburst CBO
Presto Summit 2018  - 03 - Starburst CBOPresto Summit 2018  - 03 - Starburst CBO
Presto Summit 2018 - 03 - Starburst CBO
 
Sensor Data Management & Analytics: Advanced Process Control
Sensor Data Management & Analytics: Advanced Process ControlSensor Data Management & Analytics: Advanced Process Control
Sensor Data Management & Analytics: Advanced Process Control
 
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
 
Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...
 
Introducing the Tachyum Prodigy Processor
Introducing the Tachyum Prodigy ProcessorIntroducing the Tachyum Prodigy Processor
Introducing the Tachyum Prodigy Processor
 
Fri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsFri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataops
 
ODSC data science to DataOps
ODSC data science to DataOpsODSC data science to DataOps
ODSC data science to DataOps
 
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
 
Compute-based sizing and system dashboard
Compute-based sizing and system dashboardCompute-based sizing and system dashboard
Compute-based sizing and system dashboard
 
Distributed Database Architecture for GDPR
Distributed Database Architecture for GDPRDistributed Database Architecture for GDPR
Distributed Database Architecture for GDPR
 

Recently uploaded

Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
KrzysztofKkol1
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
MayankTawar1
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
NaapbooksPrivateLimi
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
Jelle | Nordend
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 

Recently uploaded (20)

Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 

Hypothetical Partitioning for PostgreSQL

  • 1. Copyright©2018 NTT Corp. All Rights Reserved. Hypothetical Partitioning for PostgreSQL POSTGRESOPEN SV 2018 7th of September Yuzuko Hosoya NTT Open Source Software Center
  • 2. 2Copyright©2018 NTT Corp. All Rights Reserved. Who am I ? •Yuzuko Hosoya (@pyyycha) From Tokyo, Japan •Work ⁃NTT Open Source Software Center ⁃Working on HypoPG ⁃Interested in Planner/Partitioning •Like Dancing (Classical Ballet/Jazz)
  • 3. 3Copyright©2018 NTT Corp. All Rights Reserved. I. Introduction of HypoPG II. Why Hypothetical Partitioning? III.HypoPG Usage & Architecture IV.Demo V. Future Works VI.Summary Agenda
  • 4. 4Copyright©2018 NTT Corp. All Rights Reserved. Introduction of HypoPG
  • 5. 5Copyright©2018 NTT Corp. All Rights Reserved. Introduction of HypoPG Released by Julien Rouhaud Latest version HypoPG 1.1.2 Support PostgreSQL 9.2 and above Github https://github.com/HypoPG/hypopg Documentation https://hypopg.readthedocs.io
  • 6. 6Copyright©2018 NTT Corp. All Rights Reserved. •Supports hypothetical Indexes •Allows users to define hypothetical indexes on real tables and data •If hypothetical indexes are defined on a table, planner considers them along with any real indexes •Outputs queries’ plan/cost with EXPLAIN as if hypothetical indexes actually existed •Helps index design tuning HypoPG 1.X
  • 7. 7Copyright©2018 NTT Corp. All Rights Reserved. HypoPG 2 (still under development) •Supports hypothetical partitioning (v10~) •Allows users to try/simulate different hypothetical partitioning schemes on real tables and data •Outputs queries’ plan/cost with EXPLAIN using hypothetical partitioning schemes •Helps partitioning design tuning
  • 8. 8Copyright©2018 NTT Corp. All Rights Reserved. Why Hypothetical Partitioning?
  • 9. 9Copyright©2018 NTT Corp. All Rights Reserved. • Declarative Partitioning was a long-desired feature • It will be further improved with new PostgreSQL versions PostgreSQL Partitioning Many more enhancements of partitioning ⁃ HASH Partitioning ⁃ Partitioned Index ⁃ Faster Partition Pruning ⁃ Partition-wise Join/Aggregate and so on Declarative Partitioning was introduced ⁃ Partitioning enabled by CREATE TABLE statement ⁃ RANGE/LIST Partitioning v10 v11 v9.6 Partitioning by combination of following features ⁃ Table Inheritance ⁃ Check Constraint ⁃ Trigger
  • 10. 10Copyright©2018 NTT Corp. All Rights Reserved. •The most IMPORTANT factor affecting query performance over a partitioned table is partitioning schemes Which tables to be partitioned Which strategy to be chosen Which columns to be specified as a partition key How many partitions to be created Query Performance and Partitioning Schemes
  • 11. 11Copyright©2018 NTT Corp. All Rights Reserved. •Do not scan unnecessary partitions to reduce data size to be read from the disk Partition Pruning January December 12 partitions orders SELECT * FROM orders WHERE date=‘January’; Scanned NOT Scanned
  • 12. 12Copyright©2018 NTT Corp. All Rights Reserved. •Do not scan unnecessary partitions to reduce data size to be read from the disk •Optimal partitioning Scheme: Partition by the column specified in WHERE clause Partition Pruning January December 12 partitions orders Partition by date SELECT * FROM orders WHERE date=‘January’;
  • 13. 13Copyright©2018 NTT Corp. All Rights Reserved. •Joining pairs of partitions might be faster than joining large tables Partition-wise Join SELECT c.name FROM orders o LEFT JOIN customers c ON o.cust_id = c.cust_id where o.date = ‘September’; orders customers 1-1000 5001-6000 1-1000 5001-6000
  • 14. 14Copyright©2018 NTT Corp. All Rights Reserved. •Joining pairs of partitions might be faster than joining large tables •Optimal partitioning scheme: Partitioned by the column specified in JOIN clause Partition-wise Join SELECT c.name FROM orders o LEFT JOIN customers c ON o.cust_id = c.cust_id where o.date = ‘September’; orders customers 1-1000 5001-6000 1-1000 5001-6000Partition by cust_id Partition by cust_id
  • 15. 15Copyright©2018 NTT Corp. All Rights Reserved. •Performing aggregation per partition and combining them might be faster than performing aggregation on large table Partition-wise Aggregate AggJanuary December orders Agg SELECT date, sum(price) FROM orders GROUP BY date;
  • 16. 16Copyright©2018 NTT Corp. All Rights Reserved. •Performing aggregation per partition and combining them might be faster than performing aggregation on large table •Optimal partitioning scheme: Partitioned by the column specified in GROUP BY clause Partition-wise Aggregate SELECT date, sum(price) FROM orders GROUP BY date; January December orders Partition by date
  • 17. 17Copyright©2018 NTT Corp. All Rights Reserved. •Assume this situation Question: How do you partition these tables? orders customers SELECT * FROM orders WHERE date=‘January’; SELECT c.name FROM orders o LEFT JOIN customers c ON o.cust_id = c.cust_id where o.date = ‘September’; Partition by cust_id? Partition by date? Partition by order_id?
  • 18. 18Copyright©2018 NTT Corp. All Rights Reserved. •There are often multiple choices on partitioning schemes •Queries’ plan/cost are largely affected by data size and parameter settings It’s VERY HARD …
  • 19. 19Copyright©2018 NTT Corp. All Rights Reserved. •There are often multiple choices on partitioning schemes •Query plan/cost are largely affected by data size and parameter settings It’s HARD … Do you want to know how your queries behave before actually partitioning tables?
  • 20. 20Copyright©2018 NTT Corp. All Rights Reserved. • You can quickly check how your queries would behave if certain tables were partitioned without actually partitioning any tables and wasting any resources • You can try different partitioning schemes in a short time Hypothetical Partitioning is Helpful!
  • 21. 21Copyright©2018 NTT Corp. All Rights Reserved. HypoPG Usage & Architecture
  • 22. 22Copyright©2018 NTT Corp. All Rights Reserved. •Need to have an actual plain table •Create hypothetical partitioning schemes using following HypoPG function: For hypothetical partitioned table For hypothetical partitions NOTE: the 3rd argument is only for subpartitioning How to Create Hypothetical Partitioning Schemes hypopg_partition_table (‘table_name’, ‘PARTITION BY strategy(column)’) hypopg_add_partition (‘table_name’, ‘PARTITION OF parent FOR VALUES partition_bound’, ‘PARTITION BY strategy(column)’)
  • 23. 23Copyright©2018 NTT Corp. All Rights Reserved. •Create hypothetical statistics for hypothetical partitions using following HypoPG function: •The table specified in the first argument of this function must be a root table •Statistics of all leaf partitions, which are related to the given table, are created How to Create Hypothetical Statistics hypopg_analyze (‘table_name’, percentage for sampling)
  • 24. 24Copyright©2018 NTT Corp. All Rights Reserved. •For each partition, get the partition bounds including its ancestors if any How hypopg_analyze() Works January December orders February 1-1000 1000-2000 partition bound: date = ‘January’ , cust_id >= 1 AND cust_id < 1000 hypopg_analyze(‘orders’, 70); partition by date partition by cust_id
  • 25. 25Copyright©2018 NTT Corp. All Rights Reserved. •For each partition, get the partition bounds including its ancestors if any •Generate a WHERE clause according to the partition bound How hypopg_analyze() Works January December orders February 1-1000 1000-2000 partition bound: date = ‘January’ and cust_id >= 1 and cust_id < 1000 WHERE date = ‘January’ AND cust_id >= 1 AND cust_id < 1000 hypopg_analyze(‘orders’, 70); partition by date partition by cust_id
  • 26. 26Copyright©2018 NTT Corp. All Rights Reserved. •For each partition, get the partition bounds including its ancestors if any •Generate a WHERE clause according to the partition bound •Compute stats for sampling data got by TABLESAMPLE and the WHERE clause How hypopg_analyze() Works January December orders February 1-1000 1000-2000 partition bound: date = ‘January’ and cust_id >= 1 and cust_id < 1000 WHERE date = ‘January’ AND cust_id >= 1 AND cust_id < 1000 hypopg_analyze(‘orders’, 70); partition by date partition by cust_id
  • 27. 27Copyright©2018 NTT Corp. All Rights Reserved. Get sampling data from a target table according to percentage Two kinds of sampling method ⁃SYSTEM: pick data by the page ⁃BERNOULLI: pick data by the row  Specify WHERE clause eliminate data that didn’t satisfied WHERE condition TABLESAMPLE Clause SELECT select_expression FROM table_name TABLESAMPLE sampling_method(percentage) WHERE condition sampling filtering target table
  • 28. 28Copyright©2018 NTT Corp. All Rights Reserved. •For each partition, get the partition bounds including its ancestors if any •Generate a WHERE clause according to the partition bound •Compute stats for sampling data got by TABLESAMPLE and the WHERE clause How hypopg_analyze() Works January December orders February 1-1000 1000-2000 partition bound: date = ‘January’ and cust_id >= 1 and cust_id < 1000 WHERE date = ‘January’ AND cust_id >= 1 AND cust_id < 1000 sampling filtering Compute Stats orders hypopg_analyze(‘orders’, 70); partition by date partition by cust_id
  • 29. 29Copyright©2018 NTT Corp. All Rights Reserved. HypoPG Architecture HypoPG uses four kinds of hooks T T T1 T3T2 stat stat stat stat EXPLAIN Query Plan PostgreSQL HypoPG Actual Hypothetical Table OID Hypothetical Scheme/ Statistics
  • 30. 30Copyright©2018 NTT Corp. All Rights Reserved. HypoPG Architecture Using ProcessUtility_hook, HypoPG detects an EXPLAIN without ANALYZE option and saves it in a local flag for following steps T T T1 T3T2 stat stat stat stat EXPLAIN Query Plan PostgreSQL HypoPG Actual Hypothetical Table OID Hypothetical Scheme/ Statistics 1
  • 31. 31Copyright©2018 NTT Corp. All Rights Reserved. HypoPG Architecture Using get_relation_info_hook, hypothetical partitioning scheme is injected if the target table is partitioned hypothetically T T T1 T3T2 stat stat stat stat EXPLAIN Query Plan PostgreSQL HypoPG Actual Hypothetical Table OID Hypothetical Scheme/ Statistics 2
  • 32. 32Copyright©2018 NTT Corp. All Rights Reserved. HypoPG Architecture Using get_relation_stats_hook, hypothetical statistics are injected to estimate correctly if they were created in advance T T T1 T3T2 stat stat stat stat EXPLAIN Query Plan PostgreSQL HypoPG Actual Hypothetical Table OID Hypothetical Scheme/ Statistics 3
  • 33. 33Copyright©2018 NTT Corp. All Rights Reserved. HypoPG Architecture Using ExecutorEnd_hook, the EXPLAIN flag is removed. Finally a query plan using hypothetical partitioning schemes is displayed!! T T T1 T3T2 stat stat stat stat EXPLAIN Query Plan PostgreSQL HypoPG Actual Hypothetical Table OID Hypothetical Scheme/ Statistics 4
  • 34. 34Copyright©2018 NTT Corp. All Rights Reserved. •Simulate RANGE/LIST/HASH Partitioning •Simulate SELECT queries Partition Pruning, Partition-wise Join/Aggregate, N-way Join, Parallel Query •Simulate multi-level hypothetical partitioning •Simulate hypothetical indexes on hypothetical partitions What You Can Do
  • 35. 35Copyright©2018 NTT Corp. All Rights Reserved. •Support only plain tables (not partitioned tables) •Support only SELECT queries •Do not support PostgreSQL 10 yet Limitations
  • 36. 36Copyright©2018 NTT Corp. All Rights Reserved. •Create a hypothetical partitioning scheme and execute some simple queries ⁃A simple customers table to be partitioned actually: ⁃A simple orders table to be partitioned hypothetically: NOTE: for convenience, this table is named **hypo_**orders to quickly identify it in the plans. DEMO time! customers (cust_id integer PRIMARY KEY, name TEXT, address TEXT) hypo_orders (orders_id int PRIMARY KEY, cust_id int, price int, date date)
  • 37. 37Copyright©2018 NTT Corp. All Rights Reserved. Future Works & Summary
  • 38. 38Copyright©2018 NTT Corp. All Rights Reserved. •Support PostgreSQL 10 •Have a better integration in PostgreSQL core to ease our limitations - Support UPDATE/DELETE/INSERT queries - Support already partitioned tables •Add automated advisor feature What’s next?
  • 39. 39Copyright©2018 NTT Corp. All Rights Reserved. HypoPG2 will support hypothetical partitioning ⁃Allows users to try/simulate different hypothetical partitioning schemes on real tables and data ⁃Outputs queries’ plan/cost with EXPLAIN using hypothetical partitioning schemes Hypothetical partitioning helps you all to design partitioning schemes ⁃You can quickly check how your queries would behave if certain tables were partitioned without actually partitioning any tables and wasting any resources ⁃You can try different partitioning schemes in a short time We’d be happy to have some feedbacks! Summary
  • 40. 40Copyright©2018 NTT Corp. All Rights Reserved. THANK YOU! Yuzuko Hosoya hosoya.yuzuko@lab.ntt.co.jp