SlideShare a Scribd company logo
1 of 36
Download to read offline
Practical
Partitioning in
Production with
Postgres
Jimmy Angelakos
Senior PostgreSQL Architect
Postgres Vision 2021-06-23
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
2
We’ll be looking at:
• Intro to Partitioning in PostgreSQL
• Why?
• How?
• Practical Example
Introduction to
Partitioning in
PostgreSQL
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
4
• RDBMS context: division of a table into distinct independent tables
• Horizontal partitioning (by row) – different rows in different tables
• Why?
– Easier to manage
– Performance
What is partitioning?
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
5
• Has had partitioning for quite some time now PG 8.1 (2005)
…
– Inheritance-based
– Why haven’t I heard of this before?
– It’s not great tbh...
• Declarative Partitioning: PG 10 (2017)
– Massive improvement
Partitioning in PostgreSQL
HISTORY
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
6
CREATE TABLE cust (id INT, signup DATE)
PARTITION BY RANGE (signup);
CREATE TABLE cust_2020
PARTITION OF cust FOR VALUES FROM
('2020-01-01') TO ('2021-01-01');
• Partitions may be partitioned
themselves (sub-partitioning)
Declarative Partitioning
( PG 10+ )
Specification of: By declaring a table (DDL):
• Partitioning method
• Partition key
– Column(s) or expression(s)
– Value determines data routing
• Partition boundaries
Why?
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
8
• Database size: unlimited ✅
• Tables per database: 1.4 billion ✅
• Table size: 32 TB 😐
– Default block size: 8192 bytes
• Rows per table: depends
– As many as can fit onto 4.2 billion blocks 😐
PostgreSQL limits
(Hard limits, hard to reach)
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
9
• Disk size limitations
– You can put partitions on different tablespaces
• Performance
– Partition pruning
– Table scans
– Index scans
– Hidden pitfalls of very large tables*
What partitioning can help with (i)
(Very large tables)
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
10
• Maintenance
– Deletions (some filesystems are bad at deleting large numbers of files)
🤭
– DROP TABLE cust_2020;
– ALTER TABLE cust DETACH PARTITION cust_2020;
• VACUUM
– Bloat
– Freezing → xid wraparound
What partitioning can help with (ii)
(Very large tables)
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
11
• Magic bullet
– No substitute for rational database design
• Sharding
– Not about putting part of the data on different nodes
• Performance tuning
– Unless you have one of the mentioned issues
What partitioning is not
How?
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
13
• Get your calculator out
– Data ingestion rate (both rows and size in bytes)
– Projected increases (e.g. 25 locations projected to be 200 by end of year)
– Data retention requirements
• Will inform choice of partitioning method and key
• For instance: 1440 measurements/day from each of 1000 sensors – extrapolate per year
• Keep checking if this is valid and be prepared to revise
Dimensioning
Plan ahead!
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
14
• Range: For key column(s) e.g. ranges of dates, identifiers, etc.
– Lower end: inclusive, upper end: exclusive
• List: Explicit key values stated for each partition
• Hash (PG 11+): If you have a column with values close to unique
– Define Modulus ( & remainder ) for number of almost-evenly-sized partitions
Partitioning method
Dimensioning usually makes this clearer
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
15
• Analysis
– Determine main keys used for retrieval from queries
– Proper key selection enables partition pruning
– Can use multiple columns for higher granularity (more partitions)
• Desirable
– High enough cardinality (range of values) for the number of partitions needed
– A column that doesn’t change often, to avoid moving rows among partitions
Partition Key selection
Choose wisely - know your data!
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
16
• Simply put, partitions are partitioned tables themselves. Plan ahead!
• CREATE TABLE transactions ( , location_code
… TEXT, tstamp TIMESTAMPTZ)
PARTITION BY RANGE (tstamp);
• CREATE TABLE transactions_2021_06
PARTITION OF transactions FOR VALUES FROM ('2021-06-01') TO ('2021-07-01')
PARTITION BY HASH (location_code);
• CREATE TABLE transactions_2021_06_p1
PARTITION OF transactions_2021_06 FOR VALUES WITH (MODULUS 4, REMAINDER 0);
Sub-partitioning
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
17
Partitioning by multiple columns
• CREATE TABLE transactions ( , location_code
… TEXT, tstamp TIMESTAMPTZ)
PARTITION BY RANGE (tstamp, location_code);
• CREATE TABLE transactions_2021_06_a PARTITION OF transactions
FOR VALUES FROM ('2021-06-01', 'AAA') TO ('2021-07-01', 'AZZ');
• CREATE TABLE transactions_2021_06_b PARTITION OF transactions
FOR VALUES FROM ('2021-06-01', 'BAA') TO ('2021-07-01', 'BZZ');
ERROR: partition "transactions_2021_06_b" would overlap partition
"transactions_2021_06_a"
• Because tstamp '2021-06-01' can only go in the first partition!
Be careful!
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
18
• Automatic creation of partitions
– Create in advance
– Use a cronjob
• Imperative merging/splitting of partitions
– Move rows manually
• Sharding to different nodes
– You may have to configure FDW manually
What Postgres does not do
core
Practical
Example
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
20
• Is your table too large to handle?
• Can partitioning help?
• What if it’s in constant use?
Partitioning a live production system
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
21
• OLTP workload, transactions keep flowing in
– Table keeps increasing in size
• VACUUM never ends
– Has been running for a full month already…
• Queries are getting slower
– Not just because of sheer number of rows...
The situation
Huge 20 TB table
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
22
• Postgres has 1GB segment size
– Can only be changed at
compilation time
– 20 TB table = 20000 segments
(files on disk)
• Why is this a problem?
– md.c →
* Hidden performance pitfall (i)
For VERY large tables
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
23
●
This loops 20000 times every time you
want to access a table page
– Linked list of segments
●
Code from PG 9.6
●
It has been heavily optimised recently
(caching, etc).
●
Still needs to run a lot of times
* Hidden performance pitfall (ii)
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
24
• Need to partition the huge table
– Dimensioning
– Partition method
– Partition key
• Make sure we’re on the latest version (PG 13)
– Get latest features & performance enhancements
So what do we do?
Next steps
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
25
• Dimensioning
– One partition per month will be about 30GB of data, so acceptable size
• Method, Key
– Candidate key is transaction date, which we can partition by range
– Check that there are no data errors (e.g. dates in the future when they shouldn’t be)
• Partition sizes don’t have to be equal
– We can partition older, less often accessed data by year
What is our table like?
It holds daily transaction totals for each point of sales
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
26
• Lock the table totally (ACCESS EXCLUSIVE) or prevent writes
– People will start yelling, and they will be right
• Cause excessive load on the system (e.g. I/O) or cause excessive disk space usage
– Can’t copy whole 20 TB table into empty partitioned table
– See above about yelling
• Present an inconsistent or incomplete view of the data
Problems
What things you cannot do in production
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
27
• Rename the huge table and its indices
• Create an empty partitioned table with the old huge table’s name
• Create the required indices on the new partitioned table
– They will be created automatically for each new partition
• Create first new partition for new incoming data
• Attach the old table as a partition of the new table so it can be used normally*
• Move data out of the old table incrementally at our own pace
The plan
Take it step by step
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
28
-- Do this all in one transaction
BEGIN;
ALTER TABLE dailytotals RENAME TO dailytotals_legacy;
ALTER INDEX dailytotals_batchid RENAME TO dailytotals_legacy_batchid;
ALTER INDEX …
…
Rename the huge table and its indices
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
29
CREATE TABLE dailytotals (
totalid BIGINT NOT NULL DEFAULT nextval('dailytotals_totalid_seq')
, totaldate DATE NOT NULL
, totalsum BIGINT
…
, batchid BIGINT NOT NULL
)
PARTITION BY RANGE (totaldate);
CREATE INDEX dailytotals_batchid ON dailytotals (batchid);
…
Create empty partitioned table & indices
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
30
CREATE TABLE dailytotals_202106
PARTITION OF dailytotals
FOR VALUES FROM ('2021-06-01') TO ('2021-07-01');
Create partition for new incoming data
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
31
DO $$
DECLARE earliest DATE;
DECLARE latest DATE;
BEGIN
-- Set boundaries
SELECT min(totaldate) INTO earliest FROM dailytotals_legacy;
latest := '2021-06-01'::DATE;
Attach old table as a partition (i)
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
32
-- HACK HACK HACK (only because we know and trust our data)
ALTER TABLE dailytotals_legacy
ADD CONSTRAINT dailytotals_legacy_totaldate
CHECK (totaldate >= earliest AND totaldate < latest)
NOT VALID;
-- You should not touch pg_catalog directly 😕
UPDATE pg_constraint
SET convalidated = true
WHERE conname = 'dailytotals_legacy_totaldate';
Attach old table as a partition (ii)
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
33
ALTER TABLE dailytotals
ATTACH PARTITION dailytotals_legacy
FOR VALUES FROM (earliest) TO (latest);
END;
$$ LANGUAGE PLPGSQL;
COMMIT;
Attach old table as a partition (iii)
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
34
• For instance, during quiet hours for the system, in scheduled batch jobs, etc.
WITH rows AS (
DELETE FROM dailytotals_legacy d
WHERE (totaldate >= '2020-01-01' AND totaldate < '2021-01-01')
RETURNING d.* )
INSERT INTO dailytotals SELECT * FROM rows;
• In the same transaction: DETACH the old table, perform the move, reATTACH with changed
boundaries. Rinse and repeat!
• Make sure the target partition exists!
Move data from old table at our own pace
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
35
• PG11: DEFAULT partition, UPDATE on partition key, HASH method, PKs, FKs, Indexes, Triggers
• PG12: Performance (pruning, COPY), FK references for partitioned tables, ordered scans
• PG13: Logical replication for partitioned tables, improved performance (JOINs, pruning)
• (Soon) PG14: REINDEX CONCURRENTLY, DETACH CONCURRENTLY, faster UPDATE/DELETE
Partitioning improvements
Make sure you’re on the latest release so you have them!
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
36
• Know your data!
• Upgrade – be on the latest release!
• Partition before you get in deep water!
• Find me on Twitter: @vyruss
To conclude...

More Related Content

What's hot

Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use caseDavin Abraham
 
Building Your First App with MongoDB
Building Your First App with MongoDBBuilding Your First App with MongoDB
Building Your First App with MongoDBMongoDB
 
Intro To MongoDB
Intro To MongoDBIntro To MongoDB
Intro To MongoDBAlex Sharp
 
Tuning Autovacuum in Postgresql
Tuning Autovacuum in PostgresqlTuning Autovacuum in Postgresql
Tuning Autovacuum in PostgresqlMydbops
 
Groovy and PBCS is Game Changing
Groovy and PBCS is Game ChangingGroovy and PBCS is Game Changing
Groovy and PBCS is Game ChangingKyle Goodfriend
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PGConf APAC
 
PostGreSQL Performance Tuning
PostGreSQL Performance TuningPostGreSQL Performance Tuning
PostGreSQL Performance TuningMaven Logix
 
Git 101 - Crash Course in Version Control using Git
Git 101 - Crash Course in Version Control using GitGit 101 - Crash Course in Version Control using Git
Git 101 - Crash Course in Version Control using GitGeoff Hoffman
 
Understanding Branching and Merging in Git
Understanding Branching and Merging in GitUnderstanding Branching and Merging in Git
Understanding Branching and Merging in Gitgittower
 
Performance Stability, Tips and Tricks and Underscores
Performance Stability, Tips and Tricks and UnderscoresPerformance Stability, Tips and Tricks and Underscores
Performance Stability, Tips and Tricks and UnderscoresJitendra Singh
 
Git and GitHub
Git and GitHubGit and GitHub
Git and GitHubJames Gray
 
Temporal Snapshot Fact Tables
Temporal Snapshot Fact TablesTemporal Snapshot Fact Tables
Temporal Snapshot Fact TablesDavide Mauri
 
PostgreSQL continuous backup and PITR with Barman
 PostgreSQL continuous backup and PITR with Barman PostgreSQL continuous backup and PITR with Barman
PostgreSQL continuous backup and PITR with BarmanEDB
 
Git Introduction Tutorial
Git Introduction TutorialGit Introduction Tutorial
Git Introduction TutorialThomas Rausch
 
IBM DB2 for z/OS Administration Basics
IBM DB2 for z/OS Administration BasicsIBM DB2 for z/OS Administration Basics
IBM DB2 for z/OS Administration BasicsIBM
 
PostgreSQL Performance Tuning
PostgreSQL Performance TuningPostgreSQL Performance Tuning
PostgreSQL Performance Tuningelliando dias
 
Indexing in Cassandra
Indexing in CassandraIndexing in Cassandra
Indexing in CassandraEd Anuff
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoopjoelcrabb
 

What's hot (20)

Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use case
 
Building Your First App with MongoDB
Building Your First App with MongoDBBuilding Your First App with MongoDB
Building Your First App with MongoDB
 
Intro To MongoDB
Intro To MongoDBIntro To MongoDB
Intro To MongoDB
 
Tuning Autovacuum in Postgresql
Tuning Autovacuum in PostgresqlTuning Autovacuum in Postgresql
Tuning Autovacuum in Postgresql
 
Groovy and PBCS is Game Changing
Groovy and PBCS is Game ChangingGroovy and PBCS is Game Changing
Groovy and PBCS is Game Changing
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs
 
PostGreSQL Performance Tuning
PostGreSQL Performance TuningPostGreSQL Performance Tuning
PostGreSQL Performance Tuning
 
Git 101 - Crash Course in Version Control using Git
Git 101 - Crash Course in Version Control using GitGit 101 - Crash Course in Version Control using Git
Git 101 - Crash Course in Version Control using Git
 
Understanding Branching and Merging in Git
Understanding Branching and Merging in GitUnderstanding Branching and Merging in Git
Understanding Branching and Merging in Git
 
Git in 10 minutes
Git in 10 minutesGit in 10 minutes
Git in 10 minutes
 
Performance Stability, Tips and Tricks and Underscores
Performance Stability, Tips and Tricks and UnderscoresPerformance Stability, Tips and Tricks and Underscores
Performance Stability, Tips and Tricks and Underscores
 
Git and GitHub
Git and GitHubGit and GitHub
Git and GitHub
 
Temporal Snapshot Fact Tables
Temporal Snapshot Fact TablesTemporal Snapshot Fact Tables
Temporal Snapshot Fact Tables
 
PostgreSQL continuous backup and PITR with Barman
 PostgreSQL continuous backup and PITR with Barman PostgreSQL continuous backup and PITR with Barman
PostgreSQL continuous backup and PITR with Barman
 
Git Introduction Tutorial
Git Introduction TutorialGit Introduction Tutorial
Git Introduction Tutorial
 
Git tutorial
Git tutorialGit tutorial
Git tutorial
 
IBM DB2 for z/OS Administration Basics
IBM DB2 for z/OS Administration BasicsIBM DB2 for z/OS Administration Basics
IBM DB2 for z/OS Administration Basics
 
PostgreSQL Performance Tuning
PostgreSQL Performance TuningPostgreSQL Performance Tuning
PostgreSQL Performance Tuning
 
Indexing in Cassandra
Indexing in CassandraIndexing in Cassandra
Indexing in Cassandra
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 

Similar to Practical Partitioning in Production with Postgres

Large Table Partitioning with PostgreSQL and Django
 Large Table Partitioning with PostgreSQL and Django Large Table Partitioning with PostgreSQL and Django
Large Table Partitioning with PostgreSQL and DjangoEDB
 
Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]Jimmy Angelakos
 
New and Improved Features in PostgreSQL 13
New and Improved Features in PostgreSQL 13New and Improved Features in PostgreSQL 13
New and Improved Features in PostgreSQL 13EDB
 
PostgreSQL 13 is Coming - Find Out What's New!
PostgreSQL 13 is Coming - Find Out What's New!PostgreSQL 13 is Coming - Find Out What's New!
PostgreSQL 13 is Coming - Find Out What's New!EDB
 
Dok Talks #133 - My First 90 days with Clickhouse
Dok Talks #133 - My First 90 days with ClickhouseDok Talks #133 - My First 90 days with Clickhouse
Dok Talks #133 - My First 90 days with ClickhouseDoKC
 
My first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfMy first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfAlkin Tezuysal
 
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAATemporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAACuneyt Goksu
 
Bigdata netezza-ppt-apr2013-bhawani nandan prasad
Bigdata netezza-ppt-apr2013-bhawani nandan prasadBigdata netezza-ppt-apr2013-bhawani nandan prasad
Bigdata netezza-ppt-apr2013-bhawani nandan prasadBhawani N Prasad
 
Scaling db infra_pay_pal
Scaling db infra_pay_palScaling db infra_pay_pal
Scaling db infra_pay_palpramod garre
 
Performance Tuning Oracle's BI Applications
Performance Tuning Oracle's BI ApplicationsPerformance Tuning Oracle's BI Applications
Performance Tuning Oracle's BI ApplicationsKPI Partners
 
IDUG NA 2014 / 11 tips for DB2 11 for z/OS
IDUG NA 2014 / 11 tips for DB2 11 for z/OSIDUG NA 2014 / 11 tips for DB2 11 for z/OS
IDUG NA 2014 / 11 tips for DB2 11 for z/OSCuneyt Goksu
 
Geek Sync I Polybase and Time Travel (Temporal Tables)
Geek Sync I Polybase and Time Travel (Temporal Tables)Geek Sync I Polybase and Time Travel (Temporal Tables)
Geek Sync I Polybase and Time Travel (Temporal Tables)IDERA Software
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopDataWorks Summit
 
GeoKettle: A powerful open source spatial ETL tool
GeoKettle: A powerful open source spatial ETL toolGeoKettle: A powerful open source spatial ETL tool
GeoKettle: A powerful open source spatial ETL toolThierry Badard
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGuang Xu
 
Star schema my sql
Star schema   my sqlStar schema   my sql
Star schema my sqldeathsubte
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeKent Graziano
 
Data Warehouse Logical Design using Mysql
Data Warehouse Logical Design using MysqlData Warehouse Logical Design using Mysql
Data Warehouse Logical Design using MysqlHAFIZ Islam
 
Sam Dillard [InfluxData] | Performance Optimization in InfluxDB | InfluxDays...
Sam Dillard [InfluxData] | Performance Optimization in InfluxDB  | InfluxDays...Sam Dillard [InfluxData] | Performance Optimization in InfluxDB  | InfluxDays...
Sam Dillard [InfluxData] | Performance Optimization in InfluxDB | InfluxDays...InfluxData
 

Similar to Practical Partitioning in Production with Postgres (20)

Large Table Partitioning with PostgreSQL and Django
 Large Table Partitioning with PostgreSQL and Django Large Table Partitioning with PostgreSQL and Django
Large Table Partitioning with PostgreSQL and Django
 
Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]
 
New and Improved Features in PostgreSQL 13
New and Improved Features in PostgreSQL 13New and Improved Features in PostgreSQL 13
New and Improved Features in PostgreSQL 13
 
PostgreSQL 13 is Coming - Find Out What's New!
PostgreSQL 13 is Coming - Find Out What's New!PostgreSQL 13 is Coming - Find Out What's New!
PostgreSQL 13 is Coming - Find Out What's New!
 
Dok Talks #133 - My First 90 days with Clickhouse
Dok Talks #133 - My First 90 days with ClickhouseDok Talks #133 - My First 90 days with Clickhouse
Dok Talks #133 - My First 90 days with Clickhouse
 
My first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfMy first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdf
 
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAATemporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
Bigdata netezza-ppt-apr2013-bhawani nandan prasad
Bigdata netezza-ppt-apr2013-bhawani nandan prasadBigdata netezza-ppt-apr2013-bhawani nandan prasad
Bigdata netezza-ppt-apr2013-bhawani nandan prasad
 
Scaling db infra_pay_pal
Scaling db infra_pay_palScaling db infra_pay_pal
Scaling db infra_pay_pal
 
Performance Tuning Oracle's BI Applications
Performance Tuning Oracle's BI ApplicationsPerformance Tuning Oracle's BI Applications
Performance Tuning Oracle's BI Applications
 
IDUG NA 2014 / 11 tips for DB2 11 for z/OS
IDUG NA 2014 / 11 tips for DB2 11 for z/OSIDUG NA 2014 / 11 tips for DB2 11 for z/OS
IDUG NA 2014 / 11 tips for DB2 11 for z/OS
 
Geek Sync I Polybase and Time Travel (Temporal Tables)
Geek Sync I Polybase and Time Travel (Temporal Tables)Geek Sync I Polybase and Time Travel (Temporal Tables)
Geek Sync I Polybase and Time Travel (Temporal Tables)
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on Hadoop
 
GeoKettle: A powerful open source spatial ETL tool
GeoKettle: A powerful open source spatial ETL toolGeoKettle: A powerful open source spatial ETL tool
GeoKettle: A powerful open source spatial ETL tool
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
Star schema my sql
Star schema   my sqlStar schema   my sql
Star schema my sql
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
 
Data Warehouse Logical Design using Mysql
Data Warehouse Logical Design using MysqlData Warehouse Logical Design using Mysql
Data Warehouse Logical Design using Mysql
 
Sam Dillard [InfluxData] | Performance Optimization in InfluxDB | InfluxDays...
Sam Dillard [InfluxData] | Performance Optimization in InfluxDB  | InfluxDays...Sam Dillard [InfluxData] | Performance Optimization in InfluxDB  | InfluxDays...
Sam Dillard [InfluxData] | Performance Optimization in InfluxDB | InfluxDays...
 

More from Jimmy Angelakos

Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Jimmy Angelakos
 
Changing your huge table's data types in production
Changing your huge table's data types in productionChanging your huge table's data types in production
Changing your huge table's data types in productionJimmy Angelakos
 
The State of (Full) Text Search in PostgreSQL 12
The State of (Full) Text Search in PostgreSQL 12The State of (Full) Text Search in PostgreSQL 12
The State of (Full) Text Search in PostgreSQL 12Jimmy Angelakos
 
Deploying PostgreSQL on Kubernetes
Deploying PostgreSQL on KubernetesDeploying PostgreSQL on Kubernetes
Deploying PostgreSQL on KubernetesJimmy Angelakos
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseJimmy Angelakos
 
Using PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataUsing PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataJimmy Angelakos
 
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλον
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλονEισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλον
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλονJimmy Angelakos
 
PostgreSQL: Mέθοδοι για Data Replication
PostgreSQL: Mέθοδοι για Data ReplicationPostgreSQL: Mέθοδοι για Data Replication
PostgreSQL: Mέθοδοι για Data ReplicationJimmy Angelakos
 

More from Jimmy Angelakos (8)

Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]
 
Changing your huge table's data types in production
Changing your huge table's data types in productionChanging your huge table's data types in production
Changing your huge table's data types in production
 
The State of (Full) Text Search in PostgreSQL 12
The State of (Full) Text Search in PostgreSQL 12The State of (Full) Text Search in PostgreSQL 12
The State of (Full) Text Search in PostgreSQL 12
 
Deploying PostgreSQL on Kubernetes
Deploying PostgreSQL on KubernetesDeploying PostgreSQL on Kubernetes
Deploying PostgreSQL on Kubernetes
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
 
Using PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataUsing PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic Data
 
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλον
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλονEισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλον
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλον
 
PostgreSQL: Mέθοδοι για Data Replication
PostgreSQL: Mέθοδοι για Data ReplicationPostgreSQL: Mέθοδοι για Data Replication
PostgreSQL: Mέθοδοι για Data Replication
 

Recently uploaded

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 

Recently uploaded (20)

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 

Practical Partitioning in Production with Postgres

  • 1. Practical Partitioning in Production with Postgres Jimmy Angelakos Senior PostgreSQL Architect Postgres Vision 2021-06-23
  • 2. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 2 We’ll be looking at: • Intro to Partitioning in PostgreSQL • Why? • How? • Practical Example
  • 4. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 4 • RDBMS context: division of a table into distinct independent tables • Horizontal partitioning (by row) – different rows in different tables • Why? – Easier to manage – Performance What is partitioning?
  • 5. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 5 • Has had partitioning for quite some time now PG 8.1 (2005) … – Inheritance-based – Why haven’t I heard of this before? – It’s not great tbh... • Declarative Partitioning: PG 10 (2017) – Massive improvement Partitioning in PostgreSQL HISTORY
  • 6. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 6 CREATE TABLE cust (id INT, signup DATE) PARTITION BY RANGE (signup); CREATE TABLE cust_2020 PARTITION OF cust FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'); • Partitions may be partitioned themselves (sub-partitioning) Declarative Partitioning ( PG 10+ ) Specification of: By declaring a table (DDL): • Partitioning method • Partition key – Column(s) or expression(s) – Value determines data routing • Partition boundaries
  • 8. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 8 • Database size: unlimited ✅ • Tables per database: 1.4 billion ✅ • Table size: 32 TB 😐 – Default block size: 8192 bytes • Rows per table: depends – As many as can fit onto 4.2 billion blocks 😐 PostgreSQL limits (Hard limits, hard to reach)
  • 9. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 9 • Disk size limitations – You can put partitions on different tablespaces • Performance – Partition pruning – Table scans – Index scans – Hidden pitfalls of very large tables* What partitioning can help with (i) (Very large tables)
  • 10. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 10 • Maintenance – Deletions (some filesystems are bad at deleting large numbers of files) 🤭 – DROP TABLE cust_2020; – ALTER TABLE cust DETACH PARTITION cust_2020; • VACUUM – Bloat – Freezing → xid wraparound What partitioning can help with (ii) (Very large tables)
  • 11. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 11 • Magic bullet – No substitute for rational database design • Sharding – Not about putting part of the data on different nodes • Performance tuning – Unless you have one of the mentioned issues What partitioning is not
  • 12. How?
  • 13. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 13 • Get your calculator out – Data ingestion rate (both rows and size in bytes) – Projected increases (e.g. 25 locations projected to be 200 by end of year) – Data retention requirements • Will inform choice of partitioning method and key • For instance: 1440 measurements/day from each of 1000 sensors – extrapolate per year • Keep checking if this is valid and be prepared to revise Dimensioning Plan ahead!
  • 14. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 14 • Range: For key column(s) e.g. ranges of dates, identifiers, etc. – Lower end: inclusive, upper end: exclusive • List: Explicit key values stated for each partition • Hash (PG 11+): If you have a column with values close to unique – Define Modulus ( & remainder ) for number of almost-evenly-sized partitions Partitioning method Dimensioning usually makes this clearer
  • 15. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 15 • Analysis – Determine main keys used for retrieval from queries – Proper key selection enables partition pruning – Can use multiple columns for higher granularity (more partitions) • Desirable – High enough cardinality (range of values) for the number of partitions needed – A column that doesn’t change often, to avoid moving rows among partitions Partition Key selection Choose wisely - know your data!
  • 16. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 16 • Simply put, partitions are partitioned tables themselves. Plan ahead! • CREATE TABLE transactions ( , location_code … TEXT, tstamp TIMESTAMPTZ) PARTITION BY RANGE (tstamp); • CREATE TABLE transactions_2021_06 PARTITION OF transactions FOR VALUES FROM ('2021-06-01') TO ('2021-07-01') PARTITION BY HASH (location_code); • CREATE TABLE transactions_2021_06_p1 PARTITION OF transactions_2021_06 FOR VALUES WITH (MODULUS 4, REMAINDER 0); Sub-partitioning
  • 17. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 17 Partitioning by multiple columns • CREATE TABLE transactions ( , location_code … TEXT, tstamp TIMESTAMPTZ) PARTITION BY RANGE (tstamp, location_code); • CREATE TABLE transactions_2021_06_a PARTITION OF transactions FOR VALUES FROM ('2021-06-01', 'AAA') TO ('2021-07-01', 'AZZ'); • CREATE TABLE transactions_2021_06_b PARTITION OF transactions FOR VALUES FROM ('2021-06-01', 'BAA') TO ('2021-07-01', 'BZZ'); ERROR: partition "transactions_2021_06_b" would overlap partition "transactions_2021_06_a" • Because tstamp '2021-06-01' can only go in the first partition! Be careful!
  • 18. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 18 • Automatic creation of partitions – Create in advance – Use a cronjob • Imperative merging/splitting of partitions – Move rows manually • Sharding to different nodes – You may have to configure FDW manually What Postgres does not do core
  • 20. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 20 • Is your table too large to handle? • Can partitioning help? • What if it’s in constant use? Partitioning a live production system
  • 21. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 21 • OLTP workload, transactions keep flowing in – Table keeps increasing in size • VACUUM never ends – Has been running for a full month already… • Queries are getting slower – Not just because of sheer number of rows... The situation Huge 20 TB table
  • 22. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 22 • Postgres has 1GB segment size – Can only be changed at compilation time – 20 TB table = 20000 segments (files on disk) • Why is this a problem? – md.c → * Hidden performance pitfall (i) For VERY large tables
  • 23. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 23 ● This loops 20000 times every time you want to access a table page – Linked list of segments ● Code from PG 9.6 ● It has been heavily optimised recently (caching, etc). ● Still needs to run a lot of times * Hidden performance pitfall (ii)
  • 24. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 24 • Need to partition the huge table – Dimensioning – Partition method – Partition key • Make sure we’re on the latest version (PG 13) – Get latest features & performance enhancements So what do we do? Next steps
  • 25. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 25 • Dimensioning – One partition per month will be about 30GB of data, so acceptable size • Method, Key – Candidate key is transaction date, which we can partition by range – Check that there are no data errors (e.g. dates in the future when they shouldn’t be) • Partition sizes don’t have to be equal – We can partition older, less often accessed data by year What is our table like? It holds daily transaction totals for each point of sales
  • 26. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 26 • Lock the table totally (ACCESS EXCLUSIVE) or prevent writes – People will start yelling, and they will be right • Cause excessive load on the system (e.g. I/O) or cause excessive disk space usage – Can’t copy whole 20 TB table into empty partitioned table – See above about yelling • Present an inconsistent or incomplete view of the data Problems What things you cannot do in production
  • 27. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 27 • Rename the huge table and its indices • Create an empty partitioned table with the old huge table’s name • Create the required indices on the new partitioned table – They will be created automatically for each new partition • Create first new partition for new incoming data • Attach the old table as a partition of the new table so it can be used normally* • Move data out of the old table incrementally at our own pace The plan Take it step by step
  • 28. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 28 -- Do this all in one transaction BEGIN; ALTER TABLE dailytotals RENAME TO dailytotals_legacy; ALTER INDEX dailytotals_batchid RENAME TO dailytotals_legacy_batchid; ALTER INDEX … … Rename the huge table and its indices
  • 29. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 29 CREATE TABLE dailytotals ( totalid BIGINT NOT NULL DEFAULT nextval('dailytotals_totalid_seq') , totaldate DATE NOT NULL , totalsum BIGINT … , batchid BIGINT NOT NULL ) PARTITION BY RANGE (totaldate); CREATE INDEX dailytotals_batchid ON dailytotals (batchid); … Create empty partitioned table & indices
  • 30. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 30 CREATE TABLE dailytotals_202106 PARTITION OF dailytotals FOR VALUES FROM ('2021-06-01') TO ('2021-07-01'); Create partition for new incoming data
  • 31. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 31 DO $$ DECLARE earliest DATE; DECLARE latest DATE; BEGIN -- Set boundaries SELECT min(totaldate) INTO earliest FROM dailytotals_legacy; latest := '2021-06-01'::DATE; Attach old table as a partition (i)
  • 32. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 32 -- HACK HACK HACK (only because we know and trust our data) ALTER TABLE dailytotals_legacy ADD CONSTRAINT dailytotals_legacy_totaldate CHECK (totaldate >= earliest AND totaldate < latest) NOT VALID; -- You should not touch pg_catalog directly 😕 UPDATE pg_constraint SET convalidated = true WHERE conname = 'dailytotals_legacy_totaldate'; Attach old table as a partition (ii)
  • 33. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 33 ALTER TABLE dailytotals ATTACH PARTITION dailytotals_legacy FOR VALUES FROM (earliest) TO (latest); END; $$ LANGUAGE PLPGSQL; COMMIT; Attach old table as a partition (iii)
  • 34. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 34 • For instance, during quiet hours for the system, in scheduled batch jobs, etc. WITH rows AS ( DELETE FROM dailytotals_legacy d WHERE (totaldate >= '2020-01-01' AND totaldate < '2021-01-01') RETURNING d.* ) INSERT INTO dailytotals SELECT * FROM rows; • In the same transaction: DETACH the old table, perform the move, reATTACH with changed boundaries. Rinse and repeat! • Make sure the target partition exists! Move data from old table at our own pace
  • 35. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 35 • PG11: DEFAULT partition, UPDATE on partition key, HASH method, PKs, FKs, Indexes, Triggers • PG12: Performance (pruning, COPY), FK references for partitioned tables, ordered scans • PG13: Logical replication for partitioned tables, improved performance (JOINs, pruning) • (Soon) PG14: REINDEX CONCURRENTLY, DETACH CONCURRENTLY, faster UPDATE/DELETE Partitioning improvements Make sure you’re on the latest release so you have them!
  • 36. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 36 • Know your data! • Upgrade – be on the latest release! • Partition before you get in deep water! • Find me on Twitter: @vyruss To conclude...