SlideShare a Scribd company logo
1 of 40
Copyright©2018 NTT Corp. All Rights Reserved.
Hypothetical Partitioning
for PostgreSQL
POSTGRESOPEN SV 2018
7th of September
Yuzuko Hosoya
NTT Open Source Software Center
2Copyright©2018 NTT Corp. All Rights Reserved.
Who am I ?
•Yuzuko Hosoya (@pyyycha)
From Tokyo, Japan
•Work
⁃NTT Open Source Software Center
⁃Working on HypoPG
⁃Interested in Planner/Partitioning
•Like
Dancing (Classical Ballet/Jazz)
3Copyright©2018 NTT Corp. All Rights Reserved.
I. Introduction of HypoPG
II. Why Hypothetical Partitioning?
III.HypoPG Usage & Architecture
IV.Demo
V. Future Works
VI.Summary
Agenda
4Copyright©2018 NTT Corp. All Rights Reserved.
Introduction of HypoPG
5Copyright©2018 NTT Corp. All Rights Reserved.
Introduction of HypoPG
Released by Julien Rouhaud
Latest version HypoPG 1.1.2
Support PostgreSQL 9.2 and above
Github https://github.com/HypoPG/hypopg
Documentation https://hypopg.readthedocs.io
6Copyright©2018 NTT Corp. All Rights Reserved.
•Supports hypothetical Indexes
•Allows users to define hypothetical indexes on real
tables and data
•If hypothetical indexes are defined on a table,
planner considers them along with any real indexes
•Outputs queries’ plan/cost with EXPLAIN as if
hypothetical indexes actually existed
•Helps index design tuning
HypoPG 1.X
7Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG 2 (still under development)
•Supports hypothetical partitioning (v10~)
•Allows users to try/simulate different hypothetical
partitioning schemes on real tables and data
•Outputs queries’ plan/cost with EXPLAIN using
hypothetical partitioning schemes
•Helps partitioning design tuning
8Copyright©2018 NTT Corp. All Rights Reserved.
Why Hypothetical Partitioning?
9Copyright©2018 NTT Corp. All Rights Reserved.
• Declarative Partitioning was a long-desired feature
• It will be further improved with new PostgreSQL versions
PostgreSQL Partitioning
Many more enhancements
of partitioning
⁃ HASH Partitioning
⁃ Partitioned Index
⁃ Faster Partition Pruning
⁃ Partition-wise Join/Aggregate
and so on
Declarative Partitioning
was introduced
⁃ Partitioning enabled by
CREATE TABLE statement
⁃ RANGE/LIST Partitioning
v10
v11
v9.6
Partitioning by combination
of following features
⁃ Table Inheritance
⁃ Check Constraint
⁃ Trigger
10Copyright©2018 NTT Corp. All Rights Reserved.
•The most IMPORTANT factor affecting query performance
over a partitioned table is partitioning schemes
Which tables to be partitioned
Which strategy to be chosen
Which columns to be specified as a partition key
How many partitions to be created
Query Performance and Partitioning Schemes
11Copyright©2018 NTT Corp. All Rights Reserved.
•Do not scan unnecessary partitions to reduce data size
to be read from the disk
Partition Pruning
January
December
12 partitions
orders
SELECT * FROM orders WHERE date=‘January’;
Scanned
NOT
Scanned
12Copyright©2018 NTT Corp. All Rights Reserved.
•Do not scan unnecessary partitions to reduce data size
to be read from the disk
•Optimal partitioning Scheme:
Partition by the column specified in WHERE clause
Partition Pruning
January
December
12 partitions
orders
Partition
by
date
SELECT * FROM orders WHERE date=‘January’;
13Copyright©2018 NTT Corp. All Rights Reserved.
•Joining pairs of partitions might be faster than joining
large tables
Partition-wise Join
SELECT c.name FROM orders o LEFT JOIN customers c
ON o.cust_id = c.cust_id where o.date = ‘September’;
orders customers
1-1000
5001-6000
1-1000
5001-6000
14Copyright©2018 NTT Corp. All Rights Reserved.
•Joining pairs of partitions might be faster than joining
large tables
•Optimal partitioning scheme:
Partitioned by the column specified in JOIN clause
Partition-wise Join
SELECT c.name FROM orders o LEFT JOIN customers c
ON o.cust_id = c.cust_id where o.date = ‘September’;
orders customers
1-1000
5001-6000
1-1000
5001-6000Partition
by
cust_id
Partition
by
cust_id
15Copyright©2018 NTT Corp. All Rights Reserved.
•Performing aggregation per partition and combining
them might be faster than performing aggregation on
large table
Partition-wise Aggregate
AggJanuary
December
orders
Agg
SELECT date, sum(price) FROM orders GROUP BY date;
16Copyright©2018 NTT Corp. All Rights Reserved.
•Performing aggregation per partition and combining
them might be faster than performing aggregation on
large table
•Optimal partitioning scheme:
Partitioned by the column specified in GROUP BY clause
Partition-wise Aggregate
SELECT date, sum(price) FROM orders GROUP BY date;
January
December
orders
Partition
by
date
17Copyright©2018 NTT Corp. All Rights Reserved.
•Assume this situation
Question:
How do you partition these tables?
orders customers
SELECT * FROM orders WHERE date=‘January’;
SELECT c.name FROM orders o LEFT JOIN customers c
ON o.cust_id = c.cust_id where o.date = ‘September’;
Partition
by
cust_id?
Partition
by
date?
Partition
by
order_id?
18Copyright©2018 NTT Corp. All Rights Reserved.
•There are often multiple choices on partitioning schemes
•Queries’ plan/cost are largely affected by data size and
parameter settings
It’s VERY HARD …
19Copyright©2018 NTT Corp. All Rights Reserved.
•There are often multiple choices on partitioning schemes
•Query plan/cost are largely affected by data size and
parameter settings
It’s HARD …
Do you want to know how your queries
behave before actually partitioning tables?
20Copyright©2018 NTT Corp. All Rights Reserved.
• You can quickly check how your queries would
behave if certain tables were partitioned
without actually partitioning any tables and
wasting any resources
• You can try different partitioning schemes
in a short time
Hypothetical Partitioning is Helpful!
21Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG
Usage & Architecture
22Copyright©2018 NTT Corp. All Rights Reserved.
•Need to have an actual plain table
•Create hypothetical partitioning schemes using following
HypoPG function:
For hypothetical partitioned table
For hypothetical partitions
NOTE: the 3rd argument is only for subpartitioning
How to Create
Hypothetical Partitioning Schemes
hypopg_partition_table
(‘table_name’, ‘PARTITION BY strategy(column)’)
hypopg_add_partition
(‘table_name’,
‘PARTITION OF parent FOR VALUES partition_bound’,
‘PARTITION BY strategy(column)’)
23Copyright©2018 NTT Corp. All Rights Reserved.
•Create hypothetical statistics for hypothetical partitions
using following HypoPG function:
•The table specified in the first argument of this function
must be a root table
•Statistics of all leaf partitions, which are related to the
given table, are created
How to Create Hypothetical Statistics
hypopg_analyze
(‘table_name’, percentage for sampling)
24Copyright©2018 NTT Corp. All Rights Reserved.
•For each partition, get the partition bounds including its
ancestors if any
How hypopg_analyze() Works
January
December
orders February
1-1000
1000-2000
partition bound:
date = ‘January’ ,
cust_id >= 1 AND
cust_id < 1000
hypopg_analyze(‘orders’, 70);
partition
by date
partition
by cust_id
25Copyright©2018 NTT Corp. All Rights Reserved.
•For each partition, get the partition bounds including its
ancestors if any
•Generate a WHERE clause according to the partition bound
How hypopg_analyze() Works
January
December
orders February
1-1000
1000-2000
partition bound:
date = ‘January’ and
cust_id >= 1 and
cust_id < 1000
WHERE
date = ‘January’ AND
cust_id >= 1 AND
cust_id < 1000
hypopg_analyze(‘orders’, 70);
partition
by date
partition
by cust_id
26Copyright©2018 NTT Corp. All Rights Reserved.
•For each partition, get the partition bounds including its
ancestors if any
•Generate a WHERE clause according to the partition bound
•Compute stats for sampling data got by TABLESAMPLE
and the WHERE clause
How hypopg_analyze() Works
January
December
orders February
1-1000
1000-2000
partition bound:
date = ‘January’ and
cust_id >= 1 and
cust_id < 1000
WHERE
date = ‘January’ AND
cust_id >= 1 AND
cust_id < 1000
hypopg_analyze(‘orders’, 70);
partition
by date
partition
by cust_id
27Copyright©2018 NTT Corp. All Rights Reserved.
Get sampling data from a target table according to percentage
Two kinds of sampling method
⁃SYSTEM: pick data by the page
⁃BERNOULLI: pick data by the row
 Specify WHERE clause
eliminate data that didn’t satisfied WHERE condition
TABLESAMPLE Clause
SELECT select_expression FROM table_name
TABLESAMPLE sampling_method(percentage) WHERE condition
sampling filtering
target
table
28Copyright©2018 NTT Corp. All Rights Reserved.
•For each partition, get the partition bounds including its
ancestors if any
•Generate a WHERE clause according to the partition bound
•Compute stats for sampling data got by TABLESAMPLE
and the WHERE clause
How hypopg_analyze() Works
January
December
orders February
1-1000
1000-2000
partition bound:
date = ‘January’ and
cust_id >= 1 and
cust_id < 1000
WHERE
date = ‘January’ AND
cust_id >= 1 AND
cust_id < 1000
sampling filtering
Compute Stats
orders
hypopg_analyze(‘orders’, 70);
partition
by date
partition
by cust_id
29Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG Architecture
HypoPG uses four kinds of hooks
T
T
T1 T3T2
stat
stat stat stat
EXPLAIN
Query Plan
PostgreSQL HypoPG
Actual Hypothetical
Table OID
Hypothetical
Scheme/
Statistics
30Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG Architecture
Using ProcessUtility_hook, HypoPG detects an EXPLAIN
without ANALYZE option and saves it in a local flag for
following steps
T
T
T1 T3T2
stat
stat stat stat
EXPLAIN
Query Plan
PostgreSQL HypoPG
Actual Hypothetical
Table OID
Hypothetical
Scheme/
Statistics
1
31Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG Architecture
Using get_relation_info_hook, hypothetical partitioning
scheme is injected if the target table is partitioned
hypothetically
T
T
T1 T3T2
stat
stat stat stat
EXPLAIN
Query Plan
PostgreSQL HypoPG
Actual Hypothetical
Table OID
Hypothetical
Scheme/
Statistics
2
32Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG Architecture
Using get_relation_stats_hook, hypothetical statistics
are injected to estimate correctly if they were created
in advance
T
T
T1 T3T2
stat
stat stat stat
EXPLAIN
Query Plan
PostgreSQL HypoPG
Actual Hypothetical
Table OID
Hypothetical
Scheme/
Statistics
3
33Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG Architecture
Using ExecutorEnd_hook, the EXPLAIN flag is removed.
Finally a query plan using hypothetical partitioning
schemes is displayed!!
T
T
T1 T3T2
stat
stat stat stat
EXPLAIN
Query Plan
PostgreSQL HypoPG
Actual Hypothetical
Table OID
Hypothetical
Scheme/
Statistics
4
34Copyright©2018 NTT Corp. All Rights Reserved.
•Simulate RANGE/LIST/HASH Partitioning
•Simulate SELECT queries
Partition Pruning, Partition-wise Join/Aggregate,
N-way Join, Parallel Query
•Simulate multi-level hypothetical partitioning
•Simulate hypothetical indexes on hypothetical partitions
What You Can Do
35Copyright©2018 NTT Corp. All Rights Reserved.
•Support only plain tables (not partitioned tables)
•Support only SELECT queries
•Do not support PostgreSQL 10 yet
Limitations
36Copyright©2018 NTT Corp. All Rights Reserved.
•Create a hypothetical partitioning scheme and execute
some simple queries
⁃A simple customers table to be partitioned actually:
⁃A simple orders table to be partitioned hypothetically:
NOTE: for convenience, this table is named **hypo_**orders to
quickly identify it in the plans.
DEMO time!
customers (cust_id integer PRIMARY KEY,
name TEXT, address TEXT)
hypo_orders (orders_id int PRIMARY KEY,
cust_id int, price int, date date)
37Copyright©2018 NTT Corp. All Rights Reserved.
Future Works & Summary
38Copyright©2018 NTT Corp. All Rights Reserved.
•Support PostgreSQL 10
•Have a better integration in PostgreSQL core
to ease our limitations
- Support UPDATE/DELETE/INSERT queries
- Support already partitioned tables
•Add automated advisor feature
What’s next?
39Copyright©2018 NTT Corp. All Rights Reserved.
HypoPG2 will support hypothetical partitioning
⁃Allows users to try/simulate different hypothetical partitioning
schemes on real tables and data
⁃Outputs queries’ plan/cost with EXPLAIN using hypothetical
partitioning schemes
Hypothetical partitioning helps you all to design
partitioning schemes
⁃You can quickly check how your queries would behave if certain
tables were partitioned without actually partitioning any tables
and wasting any resources
⁃You can try different partitioning schemes in a short time
We’d be happy to have some feedbacks!
Summary
40Copyright©2018 NTT Corp. All Rights Reserved.
THANK YOU!
Yuzuko Hosoya
hosoya.yuzuko@lab.ntt.co.jp

More Related Content

Similar to Hypothetical Partitioning for PostgreSQL

PostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical ReplicationPostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical ReplicationAtsushi Torikoshi
 
MySQL 8.0: What Is New in Optimizer and Executor?
MySQL 8.0: What Is New in Optimizer and Executor?MySQL 8.0: What Is New in Optimizer and Executor?
MySQL 8.0: What Is New in Optimizer and Executor?Norvald Ryeng
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesTaro L. Saito
 
Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidDatabricks
 
Engineering with Open Source - Hyonjee Joo
Engineering with Open Source - Hyonjee JooEngineering with Open Source - Hyonjee Joo
Engineering with Open Source - Hyonjee JooTwo Sigma
 
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...Charles Sonigo
 
Graph-Based Identity Resolution at Scale
Graph-Based Identity Resolution at ScaleGraph-Based Identity Resolution at Scale
Graph-Based Identity Resolution at ScaleTigerGraph
 
Why Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps MonitoringWhy Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps MonitoringDevOps.com
 
Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0oysteing
 
Presto Summit 2018 - 03 - Starburst CBO
Presto Summit 2018  - 03 - Starburst CBOPresto Summit 2018  - 03 - Starburst CBO
Presto Summit 2018 - 03 - Starburst CBOkbajda
 
Sensor Data Management & Analytics: Advanced Process Control
Sensor Data Management & Analytics: Advanced Process ControlSensor Data Management & Analytics: Advanced Process Control
Sensor Data Management & Analytics: Advanced Process ControlTIBCO_Software
 
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Taro L. Saito
 
Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...GetInData
 
Introducing the Tachyum Prodigy Processor
Introducing the Tachyum Prodigy ProcessorIntroducing the Tachyum Prodigy Processor
Introducing the Tachyum Prodigy Processorinside-BigData.com
 
Fri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsFri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsDataKitchen
 
ODSC data science to DataOps
ODSC data science to DataOpsODSC data science to DataOps
ODSC data science to DataOpsChristopher Bergh
 
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Jaroslav Gergic
 
Compute-based sizing and system dashboard
Compute-based sizing and system dashboardCompute-based sizing and system dashboard
Compute-based sizing and system dashboardDataWorks Summit
 
Distributed Database Architecture for GDPR
Distributed Database Architecture for GDPRDistributed Database Architecture for GDPR
Distributed Database Architecture for GDPRYugabyte
 

Similar to Hypothetical Partitioning for PostgreSQL (20)

PostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical ReplicationPostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical Replication
 
MySQL 8.0: What Is New in Optimizer and Executor?
MySQL 8.0: What Is New in Optimizer and Executor?MySQL 8.0: What Is New in Optimizer and Executor?
MySQL 8.0: What Is New in Optimizer and Executor?
 
Application of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructureApplication of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructure
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 Updates
 
Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and Druid
 
Engineering with Open Source - Hyonjee Joo
Engineering with Open Source - Hyonjee JooEngineering with Open Source - Hyonjee Joo
Engineering with Open Source - Hyonjee Joo
 
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
 
Graph-Based Identity Resolution at Scale
Graph-Based Identity Resolution at ScaleGraph-Based Identity Resolution at Scale
Graph-Based Identity Resolution at Scale
 
Why Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps MonitoringWhy Open Source Works for DevOps Monitoring
Why Open Source Works for DevOps Monitoring
 
Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0
 
Presto Summit 2018 - 03 - Starburst CBO
Presto Summit 2018  - 03 - Starburst CBOPresto Summit 2018  - 03 - Starburst CBO
Presto Summit 2018 - 03 - Starburst CBO
 
Sensor Data Management & Analytics: Advanced Process Control
Sensor Data Management & Analytics: Advanced Process ControlSensor Data Management & Analytics: Advanced Process Control
Sensor Data Management & Analytics: Advanced Process Control
 
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
 
Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...
 
Introducing the Tachyum Prodigy Processor
Introducing the Tachyum Prodigy ProcessorIntroducing the Tachyum Prodigy Processor
Introducing the Tachyum Prodigy Processor
 
Fri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsFri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataops
 
ODSC data science to DataOps
ODSC data science to DataOpsODSC data science to DataOps
ODSC data science to DataOps
 
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
 
Compute-based sizing and system dashboard
Compute-based sizing and system dashboardCompute-based sizing and system dashboard
Compute-based sizing and system dashboard
 
Distributed Database Architecture for GDPR
Distributed Database Architecture for GDPRDistributed Database Architecture for GDPR
Distributed Database Architecture for GDPR
 

Recently uploaded

WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxAnnaArtyushina1
 
WSO2Con2024 - Facilitating Broadband Switching Services for UK Telecoms Provi...
WSO2Con2024 - Facilitating Broadband Switching Services for UK Telecoms Provi...WSO2Con2024 - Facilitating Broadband Switching Services for UK Telecoms Provi...
WSO2Con2024 - Facilitating Broadband Switching Services for UK Telecoms Provi...WSO2
 
WSO2Con2024 - Organization Management: The Revolution in B2B CIAM
WSO2Con2024 - Organization Management: The Revolution in B2B CIAMWSO2Con2024 - Organization Management: The Revolution in B2B CIAM
WSO2Con2024 - Organization Management: The Revolution in B2B CIAMWSO2
 
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public AdministrationWSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public AdministrationWSO2
 
WSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2
 
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and ApplicationsWSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and ApplicationsWSO2
 
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!WSO2
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...WSO2
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2
 
Driving Innovation: Scania's API Revolution with WSO2
Driving Innovation: Scania's API Revolution with WSO2Driving Innovation: Scania's API Revolution with WSO2
Driving Innovation: Scania's API Revolution with WSO2WSO2
 
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...WSO2
 
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...WSO2
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Eraconfluent
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
WSO2Con2024 - Unleashing the Financial Potential of 13 Million People
WSO2Con2024 - Unleashing the Financial Potential of 13 Million PeopleWSO2Con2024 - Unleashing the Financial Potential of 13 Million People
WSO2Con2024 - Unleashing the Financial Potential of 13 Million PeopleWSO2
 

Recently uploaded (20)

WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
WSO2Con2024 - Facilitating Broadband Switching Services for UK Telecoms Provi...
WSO2Con2024 - Facilitating Broadband Switching Services for UK Telecoms Provi...WSO2Con2024 - Facilitating Broadband Switching Services for UK Telecoms Provi...
WSO2Con2024 - Facilitating Broadband Switching Services for UK Telecoms Provi...
 
WSO2Con2024 - Organization Management: The Revolution in B2B CIAM
WSO2Con2024 - Organization Management: The Revolution in B2B CIAMWSO2Con2024 - Organization Management: The Revolution in B2B CIAM
WSO2Con2024 - Organization Management: The Revolution in B2B CIAM
 
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public AdministrationWSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
 
WSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AI
 
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and ApplicationsWSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
 
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
Driving Innovation: Scania's API Revolution with WSO2
Driving Innovation: Scania's API Revolution with WSO2Driving Innovation: Scania's API Revolution with WSO2
Driving Innovation: Scania's API Revolution with WSO2
 
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
 
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
WSO2Con2024 - Unleashing the Financial Potential of 13 Million People
WSO2Con2024 - Unleashing the Financial Potential of 13 Million PeopleWSO2Con2024 - Unleashing the Financial Potential of 13 Million People
WSO2Con2024 - Unleashing the Financial Potential of 13 Million People
 

Hypothetical Partitioning for PostgreSQL

  • 1. Copyright©2018 NTT Corp. All Rights Reserved. Hypothetical Partitioning for PostgreSQL POSTGRESOPEN SV 2018 7th of September Yuzuko Hosoya NTT Open Source Software Center
  • 2. 2Copyright©2018 NTT Corp. All Rights Reserved. Who am I ? •Yuzuko Hosoya (@pyyycha) From Tokyo, Japan •Work ⁃NTT Open Source Software Center ⁃Working on HypoPG ⁃Interested in Planner/Partitioning •Like Dancing (Classical Ballet/Jazz)
  • 3. 3Copyright©2018 NTT Corp. All Rights Reserved. I. Introduction of HypoPG II. Why Hypothetical Partitioning? III.HypoPG Usage & Architecture IV.Demo V. Future Works VI.Summary Agenda
  • 4. 4Copyright©2018 NTT Corp. All Rights Reserved. Introduction of HypoPG
  • 5. 5Copyright©2018 NTT Corp. All Rights Reserved. Introduction of HypoPG Released by Julien Rouhaud Latest version HypoPG 1.1.2 Support PostgreSQL 9.2 and above Github https://github.com/HypoPG/hypopg Documentation https://hypopg.readthedocs.io
  • 6. 6Copyright©2018 NTT Corp. All Rights Reserved. •Supports hypothetical Indexes •Allows users to define hypothetical indexes on real tables and data •If hypothetical indexes are defined on a table, planner considers them along with any real indexes •Outputs queries’ plan/cost with EXPLAIN as if hypothetical indexes actually existed •Helps index design tuning HypoPG 1.X
  • 7. 7Copyright©2018 NTT Corp. All Rights Reserved. HypoPG 2 (still under development) •Supports hypothetical partitioning (v10~) •Allows users to try/simulate different hypothetical partitioning schemes on real tables and data •Outputs queries’ plan/cost with EXPLAIN using hypothetical partitioning schemes •Helps partitioning design tuning
  • 8. 8Copyright©2018 NTT Corp. All Rights Reserved. Why Hypothetical Partitioning?
  • 9. 9Copyright©2018 NTT Corp. All Rights Reserved. • Declarative Partitioning was a long-desired feature • It will be further improved with new PostgreSQL versions PostgreSQL Partitioning Many more enhancements of partitioning ⁃ HASH Partitioning ⁃ Partitioned Index ⁃ Faster Partition Pruning ⁃ Partition-wise Join/Aggregate and so on Declarative Partitioning was introduced ⁃ Partitioning enabled by CREATE TABLE statement ⁃ RANGE/LIST Partitioning v10 v11 v9.6 Partitioning by combination of following features ⁃ Table Inheritance ⁃ Check Constraint ⁃ Trigger
  • 10. 10Copyright©2018 NTT Corp. All Rights Reserved. •The most IMPORTANT factor affecting query performance over a partitioned table is partitioning schemes Which tables to be partitioned Which strategy to be chosen Which columns to be specified as a partition key How many partitions to be created Query Performance and Partitioning Schemes
  • 11. 11Copyright©2018 NTT Corp. All Rights Reserved. •Do not scan unnecessary partitions to reduce data size to be read from the disk Partition Pruning January December 12 partitions orders SELECT * FROM orders WHERE date=‘January’; Scanned NOT Scanned
  • 12. 12Copyright©2018 NTT Corp. All Rights Reserved. •Do not scan unnecessary partitions to reduce data size to be read from the disk •Optimal partitioning Scheme: Partition by the column specified in WHERE clause Partition Pruning January December 12 partitions orders Partition by date SELECT * FROM orders WHERE date=‘January’;
  • 13. 13Copyright©2018 NTT Corp. All Rights Reserved. •Joining pairs of partitions might be faster than joining large tables Partition-wise Join SELECT c.name FROM orders o LEFT JOIN customers c ON o.cust_id = c.cust_id where o.date = ‘September’; orders customers 1-1000 5001-6000 1-1000 5001-6000
  • 14. 14Copyright©2018 NTT Corp. All Rights Reserved. •Joining pairs of partitions might be faster than joining large tables •Optimal partitioning scheme: Partitioned by the column specified in JOIN clause Partition-wise Join SELECT c.name FROM orders o LEFT JOIN customers c ON o.cust_id = c.cust_id where o.date = ‘September’; orders customers 1-1000 5001-6000 1-1000 5001-6000Partition by cust_id Partition by cust_id
  • 15. 15Copyright©2018 NTT Corp. All Rights Reserved. •Performing aggregation per partition and combining them might be faster than performing aggregation on large table Partition-wise Aggregate AggJanuary December orders Agg SELECT date, sum(price) FROM orders GROUP BY date;
  • 16. 16Copyright©2018 NTT Corp. All Rights Reserved. •Performing aggregation per partition and combining them might be faster than performing aggregation on large table •Optimal partitioning scheme: Partitioned by the column specified in GROUP BY clause Partition-wise Aggregate SELECT date, sum(price) FROM orders GROUP BY date; January December orders Partition by date
  • 17. 17Copyright©2018 NTT Corp. All Rights Reserved. •Assume this situation Question: How do you partition these tables? orders customers SELECT * FROM orders WHERE date=‘January’; SELECT c.name FROM orders o LEFT JOIN customers c ON o.cust_id = c.cust_id where o.date = ‘September’; Partition by cust_id? Partition by date? Partition by order_id?
  • 18. 18Copyright©2018 NTT Corp. All Rights Reserved. •There are often multiple choices on partitioning schemes •Queries’ plan/cost are largely affected by data size and parameter settings It’s VERY HARD …
  • 19. 19Copyright©2018 NTT Corp. All Rights Reserved. •There are often multiple choices on partitioning schemes •Query plan/cost are largely affected by data size and parameter settings It’s HARD … Do you want to know how your queries behave before actually partitioning tables?
  • 20. 20Copyright©2018 NTT Corp. All Rights Reserved. • You can quickly check how your queries would behave if certain tables were partitioned without actually partitioning any tables and wasting any resources • You can try different partitioning schemes in a short time Hypothetical Partitioning is Helpful!
  • 21. 21Copyright©2018 NTT Corp. All Rights Reserved. HypoPG Usage & Architecture
  • 22. 22Copyright©2018 NTT Corp. All Rights Reserved. •Need to have an actual plain table •Create hypothetical partitioning schemes using following HypoPG function: For hypothetical partitioned table For hypothetical partitions NOTE: the 3rd argument is only for subpartitioning How to Create Hypothetical Partitioning Schemes hypopg_partition_table (‘table_name’, ‘PARTITION BY strategy(column)’) hypopg_add_partition (‘table_name’, ‘PARTITION OF parent FOR VALUES partition_bound’, ‘PARTITION BY strategy(column)’)
  • 23. 23Copyright©2018 NTT Corp. All Rights Reserved. •Create hypothetical statistics for hypothetical partitions using following HypoPG function: •The table specified in the first argument of this function must be a root table •Statistics of all leaf partitions, which are related to the given table, are created How to Create Hypothetical Statistics hypopg_analyze (‘table_name’, percentage for sampling)
  • 24. 24Copyright©2018 NTT Corp. All Rights Reserved. •For each partition, get the partition bounds including its ancestors if any How hypopg_analyze() Works January December orders February 1-1000 1000-2000 partition bound: date = ‘January’ , cust_id >= 1 AND cust_id < 1000 hypopg_analyze(‘orders’, 70); partition by date partition by cust_id
  • 25. 25Copyright©2018 NTT Corp. All Rights Reserved. •For each partition, get the partition bounds including its ancestors if any •Generate a WHERE clause according to the partition bound How hypopg_analyze() Works January December orders February 1-1000 1000-2000 partition bound: date = ‘January’ and cust_id >= 1 and cust_id < 1000 WHERE date = ‘January’ AND cust_id >= 1 AND cust_id < 1000 hypopg_analyze(‘orders’, 70); partition by date partition by cust_id
  • 26. 26Copyright©2018 NTT Corp. All Rights Reserved. •For each partition, get the partition bounds including its ancestors if any •Generate a WHERE clause according to the partition bound •Compute stats for sampling data got by TABLESAMPLE and the WHERE clause How hypopg_analyze() Works January December orders February 1-1000 1000-2000 partition bound: date = ‘January’ and cust_id >= 1 and cust_id < 1000 WHERE date = ‘January’ AND cust_id >= 1 AND cust_id < 1000 hypopg_analyze(‘orders’, 70); partition by date partition by cust_id
  • 27. 27Copyright©2018 NTT Corp. All Rights Reserved. Get sampling data from a target table according to percentage Two kinds of sampling method ⁃SYSTEM: pick data by the page ⁃BERNOULLI: pick data by the row  Specify WHERE clause eliminate data that didn’t satisfied WHERE condition TABLESAMPLE Clause SELECT select_expression FROM table_name TABLESAMPLE sampling_method(percentage) WHERE condition sampling filtering target table
  • 28. 28Copyright©2018 NTT Corp. All Rights Reserved. •For each partition, get the partition bounds including its ancestors if any •Generate a WHERE clause according to the partition bound •Compute stats for sampling data got by TABLESAMPLE and the WHERE clause How hypopg_analyze() Works January December orders February 1-1000 1000-2000 partition bound: date = ‘January’ and cust_id >= 1 and cust_id < 1000 WHERE date = ‘January’ AND cust_id >= 1 AND cust_id < 1000 sampling filtering Compute Stats orders hypopg_analyze(‘orders’, 70); partition by date partition by cust_id
  • 29. 29Copyright©2018 NTT Corp. All Rights Reserved. HypoPG Architecture HypoPG uses four kinds of hooks T T T1 T3T2 stat stat stat stat EXPLAIN Query Plan PostgreSQL HypoPG Actual Hypothetical Table OID Hypothetical Scheme/ Statistics
  • 30. 30Copyright©2018 NTT Corp. All Rights Reserved. HypoPG Architecture Using ProcessUtility_hook, HypoPG detects an EXPLAIN without ANALYZE option and saves it in a local flag for following steps T T T1 T3T2 stat stat stat stat EXPLAIN Query Plan PostgreSQL HypoPG Actual Hypothetical Table OID Hypothetical Scheme/ Statistics 1
  • 31. 31Copyright©2018 NTT Corp. All Rights Reserved. HypoPG Architecture Using get_relation_info_hook, hypothetical partitioning scheme is injected if the target table is partitioned hypothetically T T T1 T3T2 stat stat stat stat EXPLAIN Query Plan PostgreSQL HypoPG Actual Hypothetical Table OID Hypothetical Scheme/ Statistics 2
  • 32. 32Copyright©2018 NTT Corp. All Rights Reserved. HypoPG Architecture Using get_relation_stats_hook, hypothetical statistics are injected to estimate correctly if they were created in advance T T T1 T3T2 stat stat stat stat EXPLAIN Query Plan PostgreSQL HypoPG Actual Hypothetical Table OID Hypothetical Scheme/ Statistics 3
  • 33. 33Copyright©2018 NTT Corp. All Rights Reserved. HypoPG Architecture Using ExecutorEnd_hook, the EXPLAIN flag is removed. Finally a query plan using hypothetical partitioning schemes is displayed!! T T T1 T3T2 stat stat stat stat EXPLAIN Query Plan PostgreSQL HypoPG Actual Hypothetical Table OID Hypothetical Scheme/ Statistics 4
  • 34. 34Copyright©2018 NTT Corp. All Rights Reserved. •Simulate RANGE/LIST/HASH Partitioning •Simulate SELECT queries Partition Pruning, Partition-wise Join/Aggregate, N-way Join, Parallel Query •Simulate multi-level hypothetical partitioning •Simulate hypothetical indexes on hypothetical partitions What You Can Do
  • 35. 35Copyright©2018 NTT Corp. All Rights Reserved. •Support only plain tables (not partitioned tables) •Support only SELECT queries •Do not support PostgreSQL 10 yet Limitations
  • 36. 36Copyright©2018 NTT Corp. All Rights Reserved. •Create a hypothetical partitioning scheme and execute some simple queries ⁃A simple customers table to be partitioned actually: ⁃A simple orders table to be partitioned hypothetically: NOTE: for convenience, this table is named **hypo_**orders to quickly identify it in the plans. DEMO time! customers (cust_id integer PRIMARY KEY, name TEXT, address TEXT) hypo_orders (orders_id int PRIMARY KEY, cust_id int, price int, date date)
  • 37. 37Copyright©2018 NTT Corp. All Rights Reserved. Future Works & Summary
  • 38. 38Copyright©2018 NTT Corp. All Rights Reserved. •Support PostgreSQL 10 •Have a better integration in PostgreSQL core to ease our limitations - Support UPDATE/DELETE/INSERT queries - Support already partitioned tables •Add automated advisor feature What’s next?
  • 39. 39Copyright©2018 NTT Corp. All Rights Reserved. HypoPG2 will support hypothetical partitioning ⁃Allows users to try/simulate different hypothetical partitioning schemes on real tables and data ⁃Outputs queries’ plan/cost with EXPLAIN using hypothetical partitioning schemes Hypothetical partitioning helps you all to design partitioning schemes ⁃You can quickly check how your queries would behave if certain tables were partitioned without actually partitioning any tables and wasting any resources ⁃You can try different partitioning schemes in a short time We’d be happy to have some feedbacks! Summary
  • 40. 40Copyright©2018 NTT Corp. All Rights Reserved. THANK YOU! Yuzuko Hosoya hosoya.yuzuko@lab.ntt.co.jp