HighLoad++ 2017
Cape Town Hall, November 7, 15:00
Abstract:
http://www.highload.ru/2017/abstracts/3043.html
Citus is an open-source extension to Postgres that transforms it into a multi-node, distributed database. It allows you to horizontally scale out both your data and your queries.
In this session you'll learn how Citus takes care of sharding, distributed transactions, and even masterless writes. You'll learn how to transition your database from single-node Postgres in order to scale up your database to bigger and bigger sizes as your data grows.
4. A. Start with SQL, need to scale out, and migrate to NoSQL
B. Start with NoSQL, hoping you'll actually need to scale out later
C. Start with SQL, need to scale out, and stay with SQL?
Possible Paths
5. What is Citus?
1. Scales out Postgres
2. Extension to Postgres
3. Available in three ways
• Using sharding & replication
• Query engine parallelizes SQL queries across many nodes
• Using Postgres extension APIs
6. Citus, Packaged Three Ways
Open
Source
Enterprise
Software
Fully-Managed
Database as a Service
github.com/citusdata/citus
(coordinator node)=# \d
Schema | Name
--------+------------
public | cw_metrics
public | events
(worker node)=# \d
Schema | Name
--------+-------------------
public | cw_metrics_102008
public | cw_metrics_102012
public | cw_metrics_102016
public | cw_metrics_102064
public | cw_metrics_102068
public | events_102104
public | events_102108
public | events_102112
public | events_102116
...
12. Why is High Availability hard?
Postgres replication uses one primary & multiple
secondary nodes. Two challenges:
1. Most Postgres clients aren’t smart. When the
primary fails, they retry the same IP.
2. Postgres replicates entire state. This makes it
resource intensive to reconstruct new nodes from a
primary.
14. Database Failures Shouldn’t Be a Big Deal
1. Postgres streaming replication to replicate from
primary to secondary. Back up to S3.
2. Volume level replication to replicate to secondary’s
volume. Back up to S3.
3. Incremental backups to S3. Reconstruct secondary
nodes from S3.
3 Methods for HA & Backups in Postgres
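The third method (incremental backups and reconstructing secondaries from S3) can be sketched as a toy model: a node's state is a base backup plus a stream of log records replayed on top of it. This is illustrative only — the record format and function names here are hypothetical, not the real Postgres WAL format.

```python
# Toy model of "reconstruct from WAL": a new secondary is built from a
# base backup plus incremental log records, without touching the primary.

def take_base_backup(state):
    """Snapshot the full table state (think: pg_basebackup shipped to S3)."""
    return dict(state)

def replay_wal(base, wal_records):
    """Apply incremental log records on top of a base backup."""
    state = dict(base)
    for op, key, value in wal_records:
        if op == "put":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state

# Primary's life: base backup taken early, writes continue afterwards.
primary = {"a": 1}
backup = take_base_backup(primary)  # shipped to S3 once
wal = [("put", "b", 2), ("put", "a", 7), ("delete", "b", None)]  # shipped incrementally

# A new secondary starts from S3 alone, then replays the incremental records.
secondary = replay_wal(backup, wal)
print(secondary)  # {'a': 7}
```

Because the secondary is built entirely from storage, bringing nodes up (or shooting them down) places no load on the primary — the point the later slides make about this approach.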
17. Postgres – Reconstruct from WAL (3)
[Diagram: a Postgres primary and a Postgres secondary, each on a persistent volume holding table foo, table bar, and WAL logs. A backup process on each node ships base backups and WAL to encrypted S3 / blob storage; monitoring agents handle automatic node failover.]
18. How do these approaches compare?

Streaming Replication (local / ephemeral disk)
Who does this? On-prem, manual EC2
Primary benefits: Simple to set up; direct I/O: high I/O & large storage

Disk Mirroring
Who does this? RDS, Azure Preview
Primary benefits: Works for MySQL and Postgres; data durability in cloud environments

Reconstruct from WAL
Who does this? Heroku, Citus Data
Primary benefits: Enables fork and PITR; node reconstruction in background; (data durability in cloud environments)
20. Summary
• In Postgres, a database node’s state gets replicated in
its entirety. The replication can be set up in three
ways.
• Reconstructing a secondary node from S3 makes
bringing up or shooting down nodes easy.
• When you shard your database, the state you need to
replicate per node becomes smaller.
22. 3 ways to build a distributed database
1. Build a distributed database from scratch
2. Middleware sharding (mimic the parser)
3. Fork your favorite database (like Postgres)
26. Two Stage Query Optimization
1. Plan to minimize network I/O
2. Nodes talk to each other using SQL over libpq
3. Learned to cooperate with planner / executor bit by bit
(Volcano style executor)
27. Citus Architecture (Simplified)
Application sends: SELECT avg(revenue) FROM sales
Coordinator (holds table metadata) rewrites it into per-shard fragments:
SELECT sum(revenue), count(revenue) FROM table_1001
SELECT sum … FROM table_1003 → worker node 1 (table_1001, table_1003)
SELECT sum … FROM table_1002
SELECT sum … FROM table_1004 → worker node 2 (table_1002, table_1004)
… worker node N
Each node is Postgres with Citus installed
1 shard = 1 Postgres table
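The coordinator's plan above can be sketched in a few lines. Note that avg() cannot be pushed down to shards directly, so the coordinator asks each shard for sum() and count() and combines them itself. The shard names follow the slide; the data and function names are illustrative, not Citus internals.

```python
# Toy model of SELECT avg(revenue) FROM sales on a sharded table.
# Each shard is just a list of revenue values here.
SHARDS = {
    "table_1001": [10.0, 20.0],  # worker node 1
    "table_1003": [30.0],        # worker node 1
    "table_1002": [40.0, 50.0],  # worker node 2
    "table_1004": [],            # worker node 2
}

def worker_fragment(rows):
    """What each worker runs locally: SELECT sum(revenue), count(revenue)."""
    return sum(rows), len(rows)

def coordinator_avg(shards):
    """Merge the per-shard (sum, count) pairs into the final average."""
    total, count = 0.0, 0
    for rows in shards.values():
        s, c = worker_fragment(rows)
        total, count = total + s, count + c
    return total / count if count else None

print(coordinator_avg(SHARDS))  # 30.0
```

This is why the fragments in the diagram ask for sum and count rather than avg: (sum, count) pairs merge correctly across shards, while per-shard averages would not.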
29. Unfork Citus using Extension APIs
CREATE EXTENSION citus;
• System catalogs – Distributed metadata
• Planner hook – Insert, Update, Delete, Select
• Executor hook – Insert, Update, Delete, Select
• Utility hook – Alter Table, Create Index, Vacuum, etc.
• Transaction & resources handling – file descriptors, etc.
• Background worker process – Maintenance processes (distributed
deadlock detection, task tracker, etc.)
• Logical decoding – Online data migrations
32. Consistency in Distributed Databases
1. 2PC: All participating nodes need to be up
2. Paxos: Achieves consensus with quorum
3. Raft: More understandable alternative to Paxos
35. What is a Lock?
• Protects against concurrent modifications.
• Locks are released at the end of a transaction.
Deadlocks
36. Transactions Block on the 1st Conflicting Lock
Session A:
BEGIN;
UPDATE data SET y = 2 WHERE x = 1;
<obtained lock on rows with x = 1>
COMMIT;
<all locks released>

Session B:
BEGIN;
UPDATE data SET y = 5 WHERE x = 1;
<waiting for lock on rows with x = 1>
<obtained lock on rows with x = 1>
COMMIT;
37. Transactions and Concurrency
• Transactions that don’t modify the same row can run concurrently.
• Transactions block on the 1st lock that conflicts.

Session A:
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
COMMIT;
<all locks released>

Session B:
BEGIN;
UPDATE data SET y = y + 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
<waiting for lock on rows with x = 1>
<obtained lock on rows with x = 1>
COMMIT;
38. Transactions and Concurrency
But what if they start blocking each other? (Distributed) deadlock!

Session A:
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
UPDATE data SET y = y + 1 WHERE x = 2;

Session B:
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
39. Deadlock Detection in PostgreSQL
Deadlock detection builds a graph of processes that are waiting for each other.
40. Deadlock Detection in PostgreSQL
Transactions are cancelled until the cycle is gone.
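The wait-for graph check can be sketched with a plain depth-first search: nodes are transactions, an edge A → B means A waits on a lock held by B, and any cycle is a deadlock. This is a simplified illustration; Postgres' actual detector is more involved.

```python
# Sketch of deadlock detection over a wait-for graph.
# waits_for maps a transaction to the transactions it is blocked on.

def find_cycle(waits_for):
    """Return the set of transactions on some wait cycle, or an empty set."""
    visited, on_stack = set(), []

    def dfs(txn):
        if txn in on_stack:
            # Found a back-edge: everything from txn onward is the cycle.
            return set(on_stack[on_stack.index(txn):])
        if txn in visited:
            return set()
        visited.add(txn)
        on_stack.append(txn)
        for holder in waits_for.get(txn, ()):
            cycle = dfs(holder)
            if cycle:
                return cycle
        on_stack.pop()
        return set()

    for txn in waits_for:
        cycle = dfs(txn)
        if cycle:
            return cycle
    return set()

# The deck's example: T1 and T2 update rows x=1 and x=2 in opposite order,
# so each waits on the other — a cycle of two.
print(find_cycle({"T1": ["T2"], "T2": ["T1"]}))  # both T1 and T2 are in the cycle
print(find_cycle({"T1": ["T2"], "T2": []}))      # no cycle: empty set
```

Once a cycle is found, cancelling any one transaction on it releases that transaction's locks and breaks the cycle — which is exactly what the slide means by "cancelled until the cycle is gone".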
41. Deadlocks in Citus
Citus delegates transactions to nodes.
42. Deadlocks in Citus
PostgreSQL’s deadlock detector still works.
43. Deadlocks in Citus
When deadlocks span across nodes, PostgreSQL cannot help us.
44. Deadlock Detection in Citus 7
Citus 7 adds distributed deadlock detection.
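The distributed step can be sketched as follows: each node only sees local waits between shard-level transactions, so no single node sees a cycle. A background worker gathers every node's wait edges, maps each shard-level transaction to its distributed transaction, and checks the merged graph. Names and data shapes here are illustrative, not Citus internals.

```python
# Sketch of distributed deadlock detection: merge per-node wait-for edges
# into one graph keyed by distributed transaction id, then look for cycles.

def merged_waits(per_node_edges, txn_of):
    """Collapse per-node shard-txn edges into distributed-txn edges."""
    graph = {}
    for edges in per_node_edges.values():
        for waiter, holder in edges:
            a, b = txn_of[waiter], txn_of[holder]
            if a != b:
                graph.setdefault(a, set()).add(b)
    return graph

def has_cycle(graph):
    """DFS with the current path carried as an immutable set."""
    def dfs(node, stack):
        if node in stack:
            return True
        return any(dfs(n, stack | {node}) for n in graph.get(node, ()))
    return any(dfs(n, frozenset()) for n in graph)

# The notes' example: D1 waits on D2 via node 1's shards, while D2 waits
# on D1 via node 3's shards. Neither node sees a local cycle, but the
# merged graph has one.
txn_of = {"n1-a": "D1", "n1-b": "D2", "n3-a": "D2", "n3-b": "D1"}
edges = {"node1": [("n1-a", "n1-b")], "node3": [("n3-a", "n3-b")]}
print(has_cycle(merged_waits(edges, txn_of)))  # True
```

Cancelling one distributed transaction on the coordinator then aborts its shard transactions on the workers, breaking the cycle everywhere at once.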
46. Summary
Distributed transactions are a complex topic. Most
articles on this topic focus on data consistency.
Data consistency is only one side of the coin. If you’re
using a relational database, your application benefits
from another key feature: deadlock detection.
https://www.citusdata.com/blog/2017/08/31/databases-and-distributed-deadlocks-a-faq
Lessons learned over the years in turning Postgres into a distributed database.
lessons applicable in a broader context than just PG
Fairly technical talk. If you have questions, please feel free to ask them.
Speak slowly.
Job Posting trends on HN
Last year PG = next for combined
So, developers love Postgres.
And it’s worth learning more about the technical components that make up Postgres and how one goes about scaling them.
Horizontally scales out PG across machines, using sharding + replication
Route vs. parallelize queries
Can do it yourself, but nice to have the DB do this for you
Packaged as extension. extension APIs are new, unique to Postgres. More on them later in the talk.
In 1 sense, Citus does very little to PG
C & W nodes are PG with Citus ext
User connects to C. Manage+create dist tables
Queries ran through C, using standard PG protocol
C transforms query to smaller queries, push down to W
C merges, aggregates if necessary
C doesn’t own any data (mostly)
W each has several shards which are small tables
Metadata table
The previous diagram looks simple, but scaling out SQL is actually an extremely challenging task.
Rest of talk explains Citus by looking at 3 challenges
This part describes the most asked questions about PG.
How do you handle replication and machine failures?
What challenges do you run into when setting up HA PG clusters?
Common setup: 1 Primary writes, many read replicas
In the context of Postgres, this setup brings two challenges.
First, many Postgres clients talk to a single endpoint. When the primary node fails, they will keep retrying the same IP.
Second, Postgres replicates its entire state. This makes it hard to shoot different nodes in the head and bring new nodes into the cluster.
PG has a large ecosystem of clients.
Some can take a list of IPs to try (Java, PG 10).
A list only works if you know all possible failovers upfront
To solve generally, need Network Primitives
elastic IP, or DNS, or a load balancer.
This example is EIP failover
2nd problem not widely recognized. Most think primary/secondary is enough
In practice, 1 of 3 approaches for replication and fail-over.
When you bring up new secondary, how does it start?
1st approach is the most common
Primary node has the tables’ data and write-ahead logs.
<explain wal>
Stream WAL to secondary, from beginning
Can cause load on the primary
disk mirroring / block-based replication.
Writes go to a persistent volume. This volume then gets synchronously mirrored to another volume.
works for all RDBMS. You can use it for MySQL, Postgres, Oracle, or SQL Server.
However, this approach also requires that you replicate both table and WAL log data.
writes need to synchronously go over the network.
Missing 1 byte can cause corruption
turns the replication and disaster recovery process inside out.
base backup / incremental wal to s3
New secondary comes replays from s3
Switch to streaming replication for latency
Better for cloud, easy to bring up AND down replicas
Sync or async
Each benefit is drawback for others
1 Simple streaming replication is most common. Most on-prem. Easy to set up. Local disks ~10TBs
2 Disk Mirroring abstracts storage layer from DB. Loss of instance != loss of disk
3 Treat WAL as a first-class citizen, and certain features become trivial.
<explain + why fork, pitr>
Questions?
All three replication methods replicate a database’s state in full.
Sharding reduces the state you need to replicate per machine
So replication becomes a much easier problem to solve.
RDBMS have diverse features over many years
Distributing them introduces a lot of challenges
Middleware: routes queries (inserts, simple selects)
Fork: features diverge over time, eventually becomes a separate project
Early on, while handling queries from users, we came across this
Savepoints in PG are like nested transactions.
Making this work in a distributed DB was very difficult, so they decided not to do it at the time
Founders wanted to know who would write such a query
Turns out not a person, just the Rails testing framework
Then knew: to really scale PG, need to go all in
All features people rely on need to work
Current/New features, clients, ORMs, 100s of tools around PG
Citus started as a fork. Extension support in PG wasn’t enough
Inside, Postgres is very modular, fairly easy to hook in without a mess
To distribute CREATE INDEX, hook into the DDL and utility processing
Planner and executor are the most complex; at the time they made assumptions about the storage layer. So, we created a two-stage query planner and executor.
PG parses and semantically validates the query; Citus checks if it touches distributed tables
Use distributed table metadata to plan the query
Minimize I/O and transform the query into smaller fragments
Citus then deparses these query fragments back into SQL:
1) Distributed query planner decoupled from the distributed executor. Testing, logging
2) PG workers can optimize for local execution
Example SELECT to C. C parses the query.
Citus planner hook xforms into query fragments
Distr planner deparses these query fragments back into SQL and sends to W
W do own local planning and execution and send the results back to C
C does final computation on results and returns to application.
Over time we worked with PG to make these APIs official with extension framework
So we unforked from Postgres and made it an official extension.
An extension is a shared library that gets dynamically loaded into Postgres’ address space.
All you need is `create extension citus` to make PG a distr DB
handling distributed transactions in a relational database.
Distributed transactions big, heavy researched area
Inside your TXN, you should see your changes, but others shouldn’t until COMMIT
Or ROLLBACK
2 related challenges: Consisty and Locks
What happens when 1 or more machines that participate in a transaction fail?
Consistency is a well established problem in distributed systems. Three popular algorithms
2PC requires all nodes to be up to make progress; Paxos/Raft don't
We looked both at 2PC in Postgres and also wrote “pg_paxos”
We went with 2PC because it has been widely used in PG
With streaming rep, secondary promoted quickly
Not as popular a problem, but important:
Concurrent txns want to modify the same rows, what happens?
At core locks are simple
Prevent 2 txn from modifying same row, concurrently
Txns can get complicated, grab many locks
2 concurrent txn grab same lock
Need some way to deal with this
Almost any command you run grabs some locks.
This UPDATE gets a row level lock
any concurrent txn that tries to update the same row will block
And then after commit or abort, all locks are released and the second txn continues.
If 2 txn have different filters, run concurrent
Allows for good write throughput
If later in the 2nd txn, conflict, then you block
Both modify the same rows in a different order.
Right x=2, left x=1
And now wait for each other
No way out, neither can continue: deadlock
New txns come in and also get stuck
Escalate to full system outage
If txn stuck 1 second, runs deadlock detection.
Looks at lock graph, across all processes, builds a graph of txns
Nodes are txns, edges are waiting on other
Cycles = deadlocks
If deadlock, cancel some txns until the cycle is gone
Locks are released, others can continue and finish
1 txn dies, 1 lives
Citus has txns, delegated to the W node that has the data
If 2 txns happen to go to the same W, normal PG deadlock
See a cycle, cancel one
sends error back to C. C then aborts txn. Other txns can continue.
What if a txn spans several machines?
Txn D1 waits on D2 on node 1; D2 waits on D1 on node 3
C wait for response from both nodes
No deadlock on any node
But there is a distributed deadlock
Runs as a background worker
If distr txn is stuck, gather lock tables from nodes all over the network
Build dist txn graph, associate txn on nodes to overall txn
Notice which is waiting on what
With that graph, can see cycles
Cancel the txn on coordinator, which will then go and abort on W nodes
Other dist txns can continue
Necessary part of having dist txns
Most things you see is on consistency: 2PC, Paxos, or Raft.
Important, but only part of the story
If you want to scale txns, you also need deadlock detection
We talked about three technical problems today.
First, replication and high availability in Postgres.
Second, Postgres’s extension APIs and how Citus leverages them to introduce distributed functionality.
Last, distributed deadlock detection.
When we started everyone said “sql doesn’t scale”
Easy to dismiss an intractable problem as impossible, or to trivialize it
Scaling SQL several problems, we covered 3
“Scaling out SQL” is a very very hard, but not an impossible problem to solve.