Distributed Postgres with Citus / Will Leinweber (PostgreSQL)


HighLoad++ 2017

Cape Town hall, November 7, 15:00

Abstract:
http://www.highload.ru/2017/abstracts/3043.html

Citus is an open-source extension to Postgres that transforms it into a multi-node, distributed database. It allows you to horizontally scale out both the.

In this session you'll learn how Citus takes care of sharding, distributed transactions, and even masterless writes. You'll learn how to transition your database from single-node Postgres in order to scale up your database to bigger and bigger sizes as your data grows.


Distributed Postgres with Citus / Will Leinweber (PostgreSQL)

  1. Distributed Postgres with Citus. Will Leinweber
  2. Will Leinweber: Principal Cloud Engineer at Citus. Previously at Heroku Postgres. @leinweber. bitfission.com (warning: autoplays MIDI)
  3. Developers Love Postgres [chart; labels: Postgres, MySQL, MongoDB, SQL Server + Oracle. RDBMS: Postgres, MySQL, Microsoft SQL Server, Oracle]
  4. Possible Paths: A. Start with SQL, need to scale out, and migrate to NoSQL. B. Start with NoSQL, and hope you actually need to scale out later. C. Start with SQL, need to scale out, and stay with SQL?
  5. What is Citus? 1. Scales out Postgres 2. Extension to Postgres 3. Available in 3 ways • Using sharding & replication • Query engine parallelizes SQL queries across many nodes • Using Postgres extension APIs
  6. Citus, Packaged Three Ways: Open Source; Enterprise Software; Fully-Managed Database as a Service. github.com/citusdata/citus
  7. Simplified Citus Architecture
  8. (coordinator node)=# \d
      Schema | Name
     --------+------------
      public | cw_metrics
      public | events
     (worker node)=# \d
      Schema | Name
     --------+-------------------
      public | cw_metrics_102008
      public | cw_metrics_102012
      public | cw_metrics_102016
      public | cw_metrics_102064
      public | cw_metrics_102068
      public | events_102104
      public | events_102108
      public | events_102112
      public | events_102116
      ...
  9. citus=> select * from pg_dist_shard limit 10;
      logicalrelid | shardid | shardminvalue | shardmaxvalue
     --------------+---------+---------------+---------------
             19395 |  102040 | -2147483648   | -2013265921
             19395 |  102041 | -2013265920   | -1879048193
             19395 |  102042 | -1879048192   | -1744830465
             19395 |  102043 | -1744830464   | -1610612737
             19395 |  102044 | -1610612736   | -1476395009
             19395 |  102045 | -1476395008   | -1342177281
             19395 |  102046 | -1342177280   | -1207959553
             19395 |  102047 | -1207959552   | -1073741825
             19395 |  102048 | -1073741824   | -939524097
             19395 |  102049 | -939524096    | -805306369
     ...
  10. 3 Challenges Distributing Postgres: 1. Postgres and High Availability 2. To build a new distributed database, or to fork? 3. Distributed transactions
  11. Postgres & High Availability (HA): Designing for a Cloud-native world
  12. Why is High Availability hard? Postgres replication uses one primary & multiple secondary nodes. Two challenges: 1. Most Postgres clients aren’t smart. When the primary fails, they retry the same IP. 2. Postgres replicates its entire state. This makes it resource intensive to reconstruct new nodes from a primary.
  13. Database Failures Should Be Transparent
  14. Database Failures Shouldn’t Be a Big Deal. 3 Methods for HA & Backups in Postgres: 1. Postgres streaming replication to replicate from primary to secondary. Back up to S3. 2. Volume level replication to replicate to secondary’s volume. Back up to S3. 3. Incremental backups to S3. Reconstruct secondary nodes from S3.
  15. Postgres - Streaming Replication (1) [diagram: a primary Postgres node (Table foo, Table bar, WAL logs) streams write-ahead logs to a secondary Postgres node (Table foo, Table bar, WAL logs); monitoring agents handle streaming replication setup & auto failover; a backup process writes to S3 / Blob Storage (encrypted)]
  16. Postgres - AWS RDS & Azure (2) [diagram: a Postgres primary and a Postgres standby, each with a persistent volume (Table foo, Table bar, WAL logs) and a backup process; monitoring agents (auto node failover); S3 / Blob Storage (encrypted)]
  17. Postgres - Reconstruct from WAL (3) [diagram: a Postgres primary and a Postgres secondary, each with a persistent volume (Table foo, Table bar, WAL logs) and a backup process; monitoring agents (auto node failover); S3 / Blob Storage (encrypted)]
  18. How do these approaches compare?
       Approach | Who does this? | Primary benefits
       Streaming Replication (local / ephemeral disk) | On-prem, Manual EC2 | Simple to set up; Direct I/O: High I/O & large storage
       Disk Mirroring | RDS, Azure Preview | Works for MySQL and Postgres; Data durability in cloud environments
       Reconstruct from WAL | Heroku, Citus Data | Enables Fork and PITR; Node reconstruction in background; (Data durability in cloud environments)
  19. wal-e: github.com/wal-e/wal-e, github.com/wal-g/wal-g
  20. Summary • In Postgres, a database node’s state gets replicated in its entirety. The replication can be set up in three ways. • Reconstructing a secondary node from S3 makes bringing up or shooting down nodes easy. • When you shard your database, the state you need to replicate per node becomes smaller.
  21. Postgres has a huge ecosystem. How do you keep up with it?
  22. 3 ways to build a distributed database: 1. Build a distributed database from scratch 2. Middleware sharding (mimic the parser) 3. Fork your favorite database (like Postgres)
  23. Example Transaction Block
  24. Postgres Features, Tools & Frameworks • Postgres manual (US Letter) • Clients for different programming languages • ORMs, libraries, GUIs • Tools (dump, restore, analyze) • New features
  25. At First, Forked Postgres with Style
  26. Two Stage Query Optimization: 1. Plan to minimize network I/O 2. Nodes talk to each other using SQL over libpq 3. Learned to cooperate with planner / executor bit by bit (Volcano style executor)
  27. Citus Architecture (Simplified) [diagram: the coordinator receives SELECT avg(revenue) FROM sales. It sends SELECT sum(revenue), count(revenue) FROM table_1001 and SELECT sum … FROM table_1003 to worker node 1 (Table_1001, Table_1003), and SELECT sum … FROM table_1002 and SELECT sum … FROM table_1004 to worker node 2 (Table_1002, Table_1004), and so on up to worker node N. Other label: Table metadata] Each node: Postgres with Citus installed. 1 shard = 1 Postgres table.
  28. Unfork Citus using Extension APIs: CREATE EXTENSION citus; • System catalogs – Distributed metadata • Planner hook – Insert, Update, Delete, Select • Executor hook – Insert, Update, Delete, Select • Utility hook – Alter Table, Create Index, Vacuum, etc. • Transaction & resources handling – file descriptors, etc. • Background worker process – Maintenance processes (distributed deadlock detection, task tracker, etc.) • Logical decoding – Online data migrations
  29. Postgres has transactions. How to handle distributed transactions
  30. BEGIN INSERT UPDATE SELECT COMMIT ROLLBACK
  31. Consistency in Distributed Databases: 1. 2PC: All participating nodes need to be up 2. Paxos: Achieves consensus with quorum 3. Raft: More understandable alternative to Paxos
  32. Concurrency in Distributed Databases
  33. Locks
  34. What is a Lock? • Protects against concurrent modifications. • Locks are released at the end of a transaction. Deadlocks
  35. Transactions Block on 1st Conflicting Lock
      Session 1: BEGIN; UPDATE data SET y = 2 WHERE x = 1; <obtained lock on rows with x = 1> COMMIT; <all locks released>
      Session 2: BEGIN; UPDATE data SET y = 5 WHERE x = 1; <waiting for lock on rows with x = 1> <obtained lock on rows with x = 1> COMMIT;
  36. Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently. • Transactions block on the 1st lock that conflicts.
      Session 1: BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; COMMIT; <all locks released>
      Session 2: BEGIN; UPDATE data SET y = y + 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1; <waiting for lock on rows with x = 1> <obtained lock on rows with x = 1> COMMIT;
  37. But what if they start blocking each other? (Distributed) deadlock!
      Session 1: BEGIN; UPDATE data SET y = y - 1 WHERE x = 1; UPDATE data SET y = y + 1 WHERE x = 2;
      Session 2: BEGIN; UPDATE data SET y = y - 1 WHERE x = 2; UPDATE data SET y = y + 1 WHERE x = 1;
  38. Deadlock detection in PostgreSQL: Deadlock detection builds a graph of processes that are waiting for each other.
  39. Deadlock detection in PostgreSQL: Transactions are cancelled until the cycle is gone.
  40. Deadlocks in Citus: Citus delegates transactions to nodes.
  41. Deadlocks in Citus: PostgreSQL’s deadlock detector still works.
  42. Deadlocks in Citus: When deadlocks span across nodes, PostgreSQL cannot help us.
  43. Deadlock detection in Citus 7: Citus 7 adds distributed deadlock detection.
  44. Deadlock detection in Citus 7: Citus 7 adds distributed deadlock detection.
  45. Summary: Distributed transactions are a complex topic. Most articles on this topic focus on data consistency. Data consistency is only one side of the coin. If you’re using a relational database, your application benefits from another key feature: deadlock detection. https://www.citusdata.com/blog/2017/08/31/databases-and-distributed-deadlocks-a-faq
  46. Conclusion: Postgres High Availability (HA); Extension APIs; Distributed Deadlock Detection
  47. SQL is hard, not impossible, to scale
  48. © 2017 Citus Data. All rights reserved. will@citusdata.com Questions? @citusdata Will Leinweber www.citusdata.com
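
The pg_dist_shard output on slide 9 suggests that each shard owns a contiguous range of a signed 32-bit hash space: every listed range is 134,217,728 values wide, which would correspond to 32 equal shards. The following is a minimal Python sketch of that idea, not Citus's implementation. The shard count of 32, the helper names `shard_ranges` and `shard_for_hash`, and the consecutive shard ids are assumptions made for illustration, and the hashing of the distribution column itself is not modeled.

```python
INT32_MIN = -2**31   # lowest shardminvalue shown on slide 9
HASH_SPACE = 2**32   # number of distinct signed 32-bit hash values


def shard_ranges(shard_count, first_shard_id):
    """Split the signed 32-bit hash space into equal, contiguous ranges.

    Returns (shard_id, min_value, max_value) tuples. Assumes shard_count
    divides 2**32 evenly and that shard ids are consecutive (both are
    illustrative assumptions).
    """
    width = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        low = INT32_MIN + i * width
        high = low + width - 1
        ranges.append((first_shard_id + i, low, high))
    return ranges


def shard_for_hash(ranges, hashed_value):
    """Return the id of the shard whose range contains hashed_value."""
    for shard_id, low, high in ranges:
        if low <= hashed_value <= high:
            return shard_id
    raise ValueError("hash value outside the 32-bit range")


ranges = shard_ranges(32, 102040)
print(ranges[0])                            # first row of the slide 9 output
print(shard_for_hash(ranges, -2000000000))  # falls in the second range
```

With these assumptions, the first ten tuples match the ten rows shown on slide 9, so routing a row reduces to hashing its distribution column and looking up the range that contains the result.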
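
Slide 27 shows the coordinator turning `SELECT avg(revenue) FROM sales` into per-shard `sum(revenue), count(revenue)` queries. A rough Python sketch of why that rewrite works is below. The function names `worker_partial` and `coordinator_avg` and the sample revenue values are hypothetical; only the table names and the sum/count split come from the slide.

```python
def worker_partial(revenues):
    """Per-shard work: the equivalent of SELECT sum(revenue), count(revenue).

    None stands in for SQL NULL, which both aggregates ignore.
    """
    present = [r for r in revenues if r is not None]
    return sum(present), len(present)


def coordinator_avg(partials):
    """Combine per-shard (sum, count) pairs into a global average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Hypothetical contents of the four shards named on the slide.
shards = {
    "table_1001": [10.0, 20.0],
    "table_1002": [30.0],
    "table_1003": [],
    "table_1004": [40.0, None],
}

partials = [worker_partial(rows) for rows in shards.values()]
print(coordinator_avg(partials))  # 25.0
```

Averaging the per-shard averages would give a different answer whenever shards hold different row counts, which is why the workers return sums and counts rather than averages.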
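
Slides 37 to 43 describe deadlock detection as building a graph of who waits for whom and cancelling transactions until no cycle remains, and note that a deadlock spanning nodes is invisible to each node's local detector. The sketch below illustrates that reasoning in Python under stated assumptions: it is not the Citus or PostgreSQL algorithm, the function names are hypothetical, transactions are identified by plain integers, and cancelling the highest-numbered transaction in a cycle is an arbitrary policy chosen for the example.

```python
from collections import defaultdict


def merge_wait_graphs(per_node_edges):
    """Merge per-node (waiter, holder) edges into one wait-for graph."""
    graph = defaultdict(set)
    for edges in per_node_edges.values():
        for waiter, holder in edges:
            graph[waiter].add(holder)
    return graph


def find_cycle(graph):
    """Return the transactions on one wait-for cycle, or None if acyclic."""
    visiting, done, path = set(), set(), []

    def visit(txn):
        visiting.add(txn)
        path.append(txn)
        for holder in sorted(graph.get(txn, ())):
            if holder in visiting:              # back edge: cycle found
                return path[path.index(holder):]
            if holder not in done:
                cycle = visit(holder)
                if cycle:
                    return cycle
        visiting.discard(txn)
        done.add(txn)
        path.pop()
        return None

    for txn in sorted(graph):
        if txn not in done:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


def resolve_deadlocks(graph):
    """Cancel one transaction per cycle until the graph has no cycles."""
    graph = {txn: set(holders) for txn, holders in graph.items()}
    cancelled = []
    while True:
        cycle = find_cycle(graph)
        if cycle is None:
            return cancelled
        victim = max(cycle)                     # hypothetical victim policy
        cancelled.append(victim)
        graph.pop(victim, None)
        for holders in graph.values():
            holders.discard(victim)


# Slide 37's two sessions, assuming rows x = 1 and x = 2 live on different workers.
per_node = {
    "worker_a": [(2, 1)],  # txn 2 waits for txn 1's lock on x = 1
    "worker_b": [(1, 2)],  # txn 1 waits for txn 2's lock on x = 2
}
for node, edges in per_node.items():
    print(node, find_cycle(merge_wait_graphs({node: edges})))  # no local cycle
merged = merge_wait_graphs(per_node)
print(find_cycle(merged))         # the cycle only appears in the merged graph
print(resolve_deadlocks(merged))
```

Each worker's own graph contains a single edge and therefore no cycle, which is the situation slide 42 describes; only the combined graph reveals the deadlock.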