The Truth About Partitioning

THE TRUTH ABOUT PARTITIONING
APJ Live Webinar on 24th March, 2020
Beena J Emerson, Senior Software Engineer, EDB

© Copyright EnterpriseDB Corporation, 2020. All Rights Reserved.
AGENDA

WHO IS EDB?
The world leader in
open-source based Postgres
software and services
• Founded in 2004
• 300+ employees
• Offices worldwide
• Customer base >4000
• Recognized RDBMS leader by:
- Gartner
- Forrester
EnterpriseDB (EDB) delivers the premier open source-based, multi-model data platform for
new applications, cloud re-platforming, application modernization, and legacy migration.

Poll Question 1
Q: Do you use Postgres partitioning?
● Already using
● Planning to use
● Not using

Poll Question 2
Q: Are you using partitioning in another database?
● Oracle
● Microsoft SQL
● Others

INTRODUCTION
● What is Partitioning?
● Key Terms
● Partitioning Benefits

What is Partitioning?
PARENT TABLE
CHILD TABLES

Key Terms
ID Col 1 Col 2 Col3
PARTITION KEY
1 -
10000
10000 -
20000
20000 -
30000
PARTITION
BOUNDS
PARTITIONED
TABLE
PARTITIONS
}

FASTER DISK
Partitioning Benefits
Data Segregation
JAN
2020
DEC
2019
NOV
2019
SEP
2019

Maintenance
10 mn
rows
2
mn
5000
ms
300
ms
200
ms
VACUUM

Performance / Scalability
10 mn
rows
2
mn
Scanning smaller tables can take less time.
A = 100?

STYLE OF PARTITIONING IN POSTGRESQL
● Inheritance (Trigger- based) Partition
● Declarative Partitioning

Inheritance (Trigger-Based) Partitions
A = 3
PARENT TABLE TRIGGER
CHECK CHILD TABLES
● Manual
● Error-prone
● Constraints not mutually
exclusive
● Hard to maintain
partitions

Declarative Partitioning
● PostgreSQL 10
● Automated - No manual handling of triggers.
● Simpler syntax
● Easy management of partitions

PARTITIONING STRATEGIES
● List
● Range
● Hash

List Partitioning
DELHI
MUMBAI, GOA
HYDERABAD
BANGLORE,
MYSORE
● PostgreSQL 10
● Explicitly mention values for partition
key - single or multiple

Range Partitioning
MINVALUE -
100
100 - 500
500 - 1000
1000 -
MAXVALUE
● PostgreSQL 10
● Range boundaries
○ Lower inclusive (>=)
○ Upper exclusive (<)
● Unbounded values
○ MINVALUE
○ MAXVALUE

Hash Partitioning
Parent
Insert
Hash
fn
Mod m
● PostgreSQL 11
● Specify modulus and remainder
○ Modulus - non-zero positive integer
○ Remainder - non-negative integer
○ remainder < modulus
● Rows spread on hash value of the partition
key

DECLARATIVE PARTITIONING SYNTAX
● Create partitioned table
● Create partitions
● Add a partition
● Remove a partition

Create Partitioned Table
CREATE TABLE parent ( <col list > )
PARTITION BY <strategy> ( <partition key> );
List:
CREATE TABLE plist(id int, col1 varchar)
PARTITION BY LIST (col1);
Range:
CREATE TABLE prange(id int, col1 int, col2 int)
PARTITION BY RANGE (col1);
Hash:
CREATE TABLE phash(id int, col1 int, col2 int)
PARTITION BY HASH (col1);

Create Partitions
CREATE TABLE child PARTITION OF parent
FOR VALUES <partition bounds>
List:
CREATE TABLE clist PARTITION OF plist
FOR VALUES IN (‘CHENNAI’, ‘OOTY’);
Range:
CREATE TABLE crange PARTITION OF prange
FOR VALUES FROM (10) TO (20);
Hash:
CREATE TABLE chash PARTITION OF phash
FOR VALUES WITH (MODULUS 5, REMAINDER 0);

Add a Partition
ALTER TABLE parent ATTACH PARTITION child
FOR VALUES <partition bounds>
List:
ALTER TABLE plist ATTACH PARTITION clist2
FOR VALUES IN (‘MUMBAI’);
Range:
CREATE TABLE prange ATTACH PARTITION
crange2 FOR VALUES FROM (20) TO (50);
Hash:
CREATE TABLE phash ATTACH PARTITION
chash2 FOR VALUES WITH (MODULUS 5,
REMAINDER 1);
* All entries in the child will be checked to confirm if they meet

Remove Partition
ALTER TABLE parent
DETACH PARTITION child;
● It will no longer have the partition bound restriction.
● It will retain all the other constraints and triggers.
● If you no longer want the partition data then you
may simply use DROP command to remove the
table completely.

TYPES OF PARTITIONING
● Multi column partitioning
● Multi level partitioning

Multicolumn Partitioning
● Multiple columns as partition key
● Supported for range and hash
● Column limit : 32

Multicolumn Range Partitioning
● Specify the lower and upper bound for each of the partition key
involved.
CREATE TABLE prange (col1 int, col2 int, col3 int)
PARTITION BY RANGE (col1, col2, col3);
CREATE TABLE crange2 PARTITION OF prange
FOR VALUES FROM (10, 100, 50) TO (500, 500, 150);

● Every column following MAXVALUE / MINVALUE must also be the
same.
FOR VALUES FROM (10, MINVALUE, MINVALUE)
TO (10, 100, 50);
FOR VALUES FROM (500, 500, 150)
TO (MAXVALUE, MAXVALUE, MAXVALUE);

● The row comparison operator is used for insert
○ Elements are compared left-to-right, stopping at first unequal
pair of elements.
Consider partition (0, 0) TO (100, 50)
(0, 199), (100, 49) fits while
(100, 50), (101, 10) does not.

Multicolumn Hash Partitioning
● Only one bound is specified - The hash of each of partition key is
calculated and combined to get a single hash value based on which
the child partition is determined.
CREATE TABLE phash (col1 int, col2 int) PARTITION BY
HASH (col1, col2);
CREATE TABLE phash1 PARTITION OF hparent FOR VALUES
WITH (MODULUS 3, REMAINDER 2);
CREATE TABLE phash2 PARTITION OF hparent FOR VALUES
WITH (MODULUS 3, REMAINDER 1);

Multilevel Partitioning
1 - 20
20 -30
30 -50
A
B
C
1 - 100
100 - 200
col3
col1
col4
● Different strategies and partition key
can be used at different levels.
● Example:
CREATE TABLE child1 PARTITION
OF parent
FOR VALUES FROM (1) TO (20)
PARTITION BY LIST (col3);

BENCHMARKING
● pgbench options
● Bulk load performance
● Read-only query performance
● Sequential scan performance

pgbench options
pgbench -i --partitions <integer>
[--partition-method <method>]
● partitions : positive non-zero integer value
● partition-method : Default range. Hash also supported.
● Error if the --partition-method is specified without a valid --
partitions option.
● The pgbench_accounts table is partitioned on aid PRIMARY KEY

pgbench options
● For range partitions, scale is equally split across partitions.
○ lower bound of the first partition is MINVALUE,
○ upper bound of the last partition is MAXVALUE.
pgbench_accounts_1 FOR VALUES FROM (MINVALUE) TO (10001)....
pgbench_accounts_10 FOR VALUES FROM (90001) TO (MAXVALUE)
● For hash partitions, the number of partitions specified is used in the
modulo operation and the remainder ranges from 0 to partitions - 1
pgbench_accounts_1 FOR VALUES WITH (modulus 10, remainder 0)..
pgbench_accounts_10 FOR VALUES WITH (modulus 10, remainder 9)
(Examples are using scale 1 and --partitions as 10.)

Bulk load: range partitioning
● bulkload command
COPY is used to
populate accounts table
● range-partitioned table
takes a slightly longer
time
● partition count hardly
influences the load time.

Bulk load: hash partitioning
● number of partitions has
heavily impacted the
load time
● All partitions (tables) are
constantly switched.

Bulk load: Conclusion
● data ordered on the partition key column, no matter the size
or the number of partitions, the operation would take about 20–
25% more time than an unpartitioned table.
● If the data being copied is unordered with respect to the
partition key then the time taken will depend on how often the
partition has to be switched while insertions.

Seq Scan: default point query
● Remove the index
created by pgbench
● non-partitioned table
entire data scanned.
● Partition pruning -
chosen partition scanned
● ~50 GB data
● The amount of data in each partition reduces as the number of
partitions increase

Read-only: default point query
● default query
● ~50GB data + 10GB
indexes (scale=5000)
● target only one row in
one particular partition.
● 40% drop: overhead of
handling of a large
number of partitions
● slow degradation as
number of partitions
increase

Read-only: custom range query
● index scan
● targeting 0.02% rows in
sequence
● range: at most two
partitions touched
● hash: all partitions
touched
● ~50 GB

OTHER FEATURES
● Default partitions
● Runtime Partition Pruning
● Partition-wise join
● Partition-level aggregation
● Partition Tree Information

Default Partition
● PostgreSQL 11
● Catch tuples that do not match
partition bounds of others.
● Support for: list, range
● Syntax:
CREATE TABLE child
PARTITION OF parent
DEFAULT;
A
B
C
DEFAULT
Z
(NOT (col1 IS NOT NULL) AND
(col1 = ANY (ARRAY[ 'A', 'B', 'C' ])))

Add new partition
All the rows in default partition
are scanned.
Amount of time taken depends
on the number of rows in the
default partition
Sample:
1382 ms to add partition when
one Cr rows in default partition
but 2 ms when it is empty.
ERROR: updated partition
constraint for default
partition "part_def" would
be violated by some row
1 - 500
500 - 1000
DEF: 2207, 5000...
2000 - 2500
Default Partition

● PostgreSQL 11
● Performed at two levels
○ Executor Initialization - prepared query
○ Actual Execution
● SET enable_partition_pruning. Default is on.
Runtime Partition Pruning

Executor Initialization
PREPARE prep (int) as SELECT * from t1 where pkey < $1;
EXPLAIN EXECUTE prep(1500);
Append (cost=0.00..168.06 rows=3012 width=8)
Subplans Removed: 2
-> Seq Scan on p1 t1_1 (cost=0.00..38.25 rows=753 width=8)
Filter: (pkey < $1)
-> Seq Scan on p2 t1_2 (cost=0.00..38.25 rows=753 width=8)
Filter: (pkey < $1)
(6 rows)

Actual Execution - Unpartitioned Case
Considering a table with 6000 rows performs nest loop join with another
containing 4000 rows but only 2000 rows match the join condition.
Nested Loop (actual rows=1000 loops=1)
-> Seq Scan on ltbl (actual rows=6000 loops=1)
-> Index Scan using rtbl_pkey on rtbl (actual rows=0 loops=6000)
Index Cond: (col1 = ltbl.col1)
Planning Time: 0.248 ms
Execution Time: 15.265 ms
(6 rows)

Actual Execution - Partitoined Case
Consider that the table with 4000 rows is partitioned.
Nested Loop (actual rows=1000 loops=1)
-> Seq Scan on ltbl (actual rows=6000 loops=1)
-> Append (actual rows=0 loops=6000)
-> Index Scan using p1_pkey on p1 t1_1 (actual rows=1 loops=1500)
Index Cond: (pkey = ltbl.col1)
-> Index Scan using p2_pkey on p2 t1_2 (never executed)
-> Index Scan using p3_pkey on p3 t1_3 (actual rows=0 loops=500)
Execution Time: 10.632 ms (~30% drop)

● PostgreSQL 11
● SET enable_partitionwise_join. Default is off.
● Join should be on partition key and both partitions should have same
bounds for partitions.
Partition-wise join
1 - 20
20 -30
30 -50
1 - 20
20 -30
30 -50

Hash Join (actual rows=20000 loops=1)
Hash Cond: (a.col1 = b.col1)
-> Seq Scan on a1 a_1 (actual rows=10000 loops=1)
-> Hash (actual rows=20000 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1038kB
-> Seq Scan on b1 b_1 (actual rows=10000 loops=1)
Partition-wise join

SET enable_partitionwise_join =on;
Append (actual rows=20000 loops=1)
-> Hash Join (actual rows=10000 loops=1)
Hash Cond: (a_1.col1 = b_1.col1)
-> Hash Join (actual rows=10000 loops=1)
Hash Cond: (a_2.col1 = b_2.col1)
.(repeat for a3, b3 and a4, b4)
.
Almost 50% reduction in execution time.
Partition-wise join

● PostgreSQL 11
● manually set enable_partitionwise_aggregate (default is off)
● When GROUP BY uses partition key, aggregate individual partition.
● When grouped on non-partition key, PartialAggregate performed
on partitions and then combined.
● Aggregate pushed down if partition is foreign table.
Partition-level Aggregation

Table is partitioned on col1 and has 3 partitions with total of 25,000
rows.
Non-partitioned case:
SELECT col1, count(*) FROM t1 GROUP BY col1;
HashAggregate (actual rows=8 loops=1)
Group Key: t1_p1.col1
-> Seq Scan on t1_p1 (actual rows=10000 loops=1)
Partition-level Aggregation : Example

SET enable_partitionwise_aggregate=on;
Partitioned: GROUP KEY using partition key
Append (actual rows=7 loops=1)
-> HashAggregate (actual rows=2 loops=1)
about 20% decrease in execution time

When Aggregate does not use partition key
Finalize HashAggregate (actual rows=7 loops=1)
-> Partial HashAggregate (actual rows=2 loops=1)
A = 15
B = 20
C = 30
D = 25
E = 5
A = 10
B = 20
A = 5
C = 30
D = 25
E = 5

● PostgreSQL 12
● pg_partition_tree: Displays the entire partition tree in table format.
● pg_partition_ancestors: Displays all the ancestors from the partition
specified to the root.
● pg_partition_root: Displays the topmost root partitioned table.
Partition Tree Information

SELECT * FROM pg_partition_tree ('t1');
relid | parentrelid | isleaf | level
-------+-------------+--------+-------
t1 | | f | 0
p1 | t1 | f | 1
p11 | p1 | f | 2
p12 | p1 | t | 2
(4 rows)
t1 p1
p11
p12

SELECT * FROM pg_partition_root('p12');
pg_partition_root
-------------------
t1
(1 row)
SELECT * FROM pg_partition_ancestors('p12');
relid
-------
p12
p1
t1
(3 rows)

CONCLUSION
● Partition helps in certain scenarios not all.
● Know your data and system, experiment to
determine the best partition parameters for
your database table.
● Bad partition strategy could make your
system slower.

Confused about best partition approach?
EDB can help!
● DBA services for Postgres in the Cloud or Postgres in your
data centre
● Advice and consulting for Postgres deployments
● Technical Account Management to provide a knowledgeable
point of contact between your team and EDB technical team.
Email info@enterprisedb.com

Next APJ Live Webinar
When:
Wednesday, 15 April, 2020
Topic:
No Time to Waste: Migrate from Oracle to EDB Postgres in
Minutes
Look out for the email invite

The Truth About Partitioning

More Related Content

What's hot

Similar to The Truth About Partitioning

More from EDB

Recently uploaded

The Truth About Partitioning