Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco Slot

Distributed Computing on
PostgreSQL
Marco Slot <marco@citusdata.com>
Small data architecture
Big data architecture
Records
Data warehouse
Real-time analytics
Big data architecture using postgres
Messaging
PostgreSQL is a perfect building block
for distributed systems
Features!
PostgreSQL contains many useful features for building a distributed system:
● Well-defined protocol, libpq
● Crash safety
● Concurrent execution
● Transactions
● Access controls
● 2PC
● Replication
● Custom functions
● …
Extensions!
Built-in / contrib:
● postgres_fdw
● dblink RPC!
● plpgsql
Third-party open source:
● pglogical
● pg_cron
● citus
Extensions!
Built-in / contrib:
● postgres_fdw
● dblink RPC!
● plpgsql
Third-party open source:
● pglogical
● pg_cron
● citus
Yours!
dblink
Run queries on remote postgres server
SELECT dblink_connect(node_id,
format('host=%s port=%s dbname=postgres', node_name, node_port))
FROM nodes;
SELECT dblink_send_query(node_id, $$SELECT pg_database_size('postgres')$$)
FROM nodes;
SELECT sum(size::bigint)
FROM nodes, dblink_get_result(nodes.node_id) AS r(size text);
SELECT dblink_disconnect(node_id)
FROM nodes;
RPC using dblink
For every postgres function, we can create a client-side stub using dblink.
CREATE FUNCTION func(input text)
...
CREATE FUNCTION remote_func(host text, port int, input text) RETURNS text
LANGUAGE sql AS $function$
SELECT res FROM dblink(
format('host=%s port=%s', host, port),
format('SELECT * FROM func(%L)', input))
AS res(output text);
$function$;
PL/pgSQL
Procedural language for Postgres:
CREATE FUNCTION distributed_database_size(dbname text)
RETURNS bigint LANGUAGE plpgsql AS $function$
DECLARE
total_size bigint;
BEGIN
PERFORM dblink_send_query(node_id, format('SELECT pg_database_size(%L)', dbname)
FROM nodes;
SELECT sum(size::bigint) INTO total_size
FROM nodes, dblink_get_result(nodes.node_id) AS r(size text);
RETURN total_size
END;
$function$;
Distributed system in progress...
With these extensions, we can already create a simple distributed computing
system.
Nodes
Nodes Nodes Nodes
Parallel operation using dblink
SELECT
transform_data()
Data 1 Data 2 Data 3
postgres_fdw?
pglogical / logical replication
Asynchronously replicate changes to another database.
Nodes
Nodes Nodes Nodes
pg_paxos
Consistently replicate changes between databases.
Nodes
Nodes
Nodes
pg_cron
Cron-based job scheduler for postgres:
CREATE EXTENSION pg_cron;
SELECT cron.schedule('* * * * */10', 'SELECT transform_data()');
Internally uses libpq, meaning it can also schedule jobs on other nodes.
pg_cron provides a way for nodes to act autonomously
Citus
Transparently shards tables across multiple nodes
Coordinator
E1 E4 E2 E5 E2 E5
Events
create_distributed_table('events',
'event_id');
Citus MX
Nodes can have the distributed tables too
Coordinator
E1 E4 E2 E5 E2 E5
Events
Events Events Events
How to build a distributed system
using only PostgreSQL & extensions?
Building a streaming publish-subscribe system
Producers
Postgres nodes
Consumers
topic: adclick
Storage nodes
E1 E4 E2 E5 E2 E5
Events Events Events
Coordinator
Events
CREATE TABLE
Use Citus to create a distributed table
Distributed Table Creation
$ psql -h coordinator
CREATE TABLE events (
event_id bigserial,
ingest_time timestamptz default now(),
topic_name text not null,
payload jsonb
);
SELECT create_distributed_table('events', 'event_id');
$ psql -h any-node
INSERT INTO events (topic_name, payload) VALUES ('adclick','{...}');
Sharding strategy
Shard is chosen by hashing the value in the partition column.
Application-defined:
● stream_id text not null
Optimise data distribution:
● event_id bigserial
Optimise ingest capacity and availability:
● sid int default pick_local_value()
Producers connect to a random node and perform COPY or INSERT into events
Producers
E1 E4 E2 E5 E2 E5
Events Events Events
COPY / INSERT
Consumers in a group together consume events at least / exactly once.
Consumers
E1 E4 E2 E5 E2 E5
topic: adclick%
Consumer
group
Consumers obtain leases for consuming a shard.
Lease are kept in a separate table on each node:
CREATE TABLE leases (
consumer_group text not null,
shard_id bigint not null,
owner text,
new_owner text,
last_heartbeat timestamptz,
PRIMARY KEY (consumer_group, shard_id)
);
Consumer leases
Consumers obtain leases for consuming a shard.
SELECT * FROM claim_lease('click-analytics', 'node-2', 102008);
Under the covers: Insert a new lease or set new_owner to steal lease.
CREATE FUNCTION claim_lease(group_name text, node_name text, shard_id int)
…
INSERT INTO leases (consumer_group, shard_id, owner, last_heartbeat)
VALUES (group_name, shard, node_name, now())
ON CONFLICT (consumer_group, shard_id) DO UPDATE
SET new_owner = node_name
WHERE leases.new_owner IS NULL;
Consumer leases
Distributing leases across consumers
Distributed algorithm for distributing leases across nodes
SELECT * FROM obtain_leases('click-analytics', 'node-2')
-- gets all available lease tables
-- claim all unclaimed shards
-- claim random shards until #claims >= #shards/#consumers
Not perfect, but ensures all shards are consumed with load balancing (unless C>S)
Consumers
E1 E4 E2 E5 E2 E5
leases
First consumer consumes all
obtain_leases
leases leases
Consumers
E1 E4 E2 E5 E2 E5
First consumer consumes all
leases leases leases
Consumers
E1 E4 E2 E5 E2 E5
Second consumer steals leases from first consumer
obtain_leases
leases leases leases
Consumers
E1 E4 E2 E5 E2 E5
Second consumer steals leases from first consumer
Consuming events
Consumer wants to receive all events once.
Several options:
● SQL level
● Logical decoding utility functions
● Use a replication connection
● PG10 logical replication / pglogical
Consuming events
Get a batch of events from a shard:
SELECT * FROM poll_events('click-analytics', 'node-2', 102008, 'adclick',
'<last-processed-event-id>');
-- Check if node has the lease
Set owner = new_owner if new_owner is set
-- Get all pending events (pg_logical_slot_peek_changes)
-- Progress the replication slot (pg_logical_slot_get_changes)
-- Return remaining events if still owner
Consumer loop
E1 E4 E2 E5 E2 E5
1. Call poll_events for each leased shard
2. Process events from each batch
3. Repeat with event IDs of last event in each batch
poll_events
Failure handling
Producer / consumer fails to connect to storage node:
→ Connect to different node
Storage node fails:
→ Use pick_local_value() for partition column, failover to hot standby
Consumer fails to consume batch
→ Events are repeated until confirmed
Consumer fails and does not come back
→ Consumers periodically call obtain_leases
→ Old leases expire
Use pg_cron to periodically expire leases on coordinator:
SELECT cron.schedule('* * * * *', 'SELECT expire_leases()');
CREATE FUNCTION expire_leases()
...
UPDATE leases
SET owner = new_owner, last_heartbeat = now()
WHERE last_heartbeat < now() - interval '2 minutes'
Maintenance: Lease expiration
Use pg_cron to periodically expire leases on coordinator:
$ psql -h coordinator
SELECT cron.schedule('* * * * *', 'SELECT expire_events()');
CREATE FUNCTION expire_events()
...
DELETE FROM events
WHERE ingest_time < now() - interval '1 day';
Maintenance: Delete old events
Prototyped a functional, highly available publish-subscribe systems in
https://goo.gl/R1suAo
~300 lines of code
Demo
Records
Data warehouse
Real-time analytics
Big data architecture using postgres
Messaging
Questions?
marco@citusdata.com
1 of 41

Recommended

Postgres Performance for Humans by
Postgres Performance for HumansPostgres Performance for Humans
Postgres Performance for HumansCitus Data
929 views73 slides
The Challenges of Distributing Postgres: A Citus Story by
The Challenges of Distributing Postgres: A Citus StoryThe Challenges of Distributing Postgres: A Citus Story
The Challenges of Distributing Postgres: A Citus StoryHanna Kelman
127 views46 slides
Real time analytics at any scale | PostgreSQL User Group NL | Marco Slot by
Real time analytics at any scale | PostgreSQL User Group NL | Marco SlotReal time analytics at any scale | PostgreSQL User Group NL | Marco Slot
Real time analytics at any scale | PostgreSQL User Group NL | Marco SlotCitus Data
758 views32 slides
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur... by
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Citus Data
848 views56 slides
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ... by
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...Citus Data
599 views32 slides
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot by
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco SlotDistributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco SlotCitus Data
509 views53 slides

More Related Content

What's hot

From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016 by
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016DataStax
2.6K views95 slides
GeoMesa on Apache Spark SQL with Anthony Fox by
GeoMesa on Apache Spark SQL with Anthony FoxGeoMesa on Apache Spark SQL with Anthony Fox
GeoMesa on Apache Spark SQL with Anthony FoxDatabricks
2.7K views109 slides
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx by
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxData
478 views30 slides
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi... by
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...Databricks
1.2K views24 slides
Scaling up data science applications by
Scaling up data science applicationsScaling up data science applications
Scaling up data science applicationsKexin Xie
123 views41 slides
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a... by
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...Altinity Ltd
3.4K views47 slides

What's hot(20)

From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016 by DataStax
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
DataStax2.6K views
GeoMesa on Apache Spark SQL with Anthony Fox by Databricks
GeoMesa on Apache Spark SQL with Anthony FoxGeoMesa on Apache Spark SQL with Anthony Fox
GeoMesa on Apache Spark SQL with Anthony Fox
Databricks2.7K views
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx by InfluxData
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxData478 views
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi... by Databricks
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...
Databricks1.2K views
Scaling up data science applications by Kexin Xie
Scaling up data science applicationsScaling up data science applications
Scaling up data science applications
Kexin Xie123 views
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a... by Altinity Ltd
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
Altinity Ltd3.4K views
Extending Flux to Support Other Databases and Data Stores | Adam Anthony | In... by InfluxData
Extending Flux to Support Other Databases and Data Stores | Adam Anthony | In...Extending Flux to Support Other Databases and Data Stores | Adam Anthony | In...
Extending Flux to Support Other Databases and Data Stores | Adam Anthony | In...
InfluxData566 views
ClickHouse Analytical DBMS. Introduction and usage, by Alexander Zaitsev by Altinity Ltd
ClickHouse Analytical DBMS. Introduction and usage, by Alexander ZaitsevClickHouse Analytical DBMS. Introduction and usage, by Alexander Zaitsev
ClickHouse Analytical DBMS. Introduction and usage, by Alexander Zaitsev
Altinity Ltd1.5K views
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala... by Martin Zapletal
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Martin Zapletal1.3K views
ClickHouse 2018. How to stop waiting for your queries to complete and start ... by Altinity Ltd
ClickHouse 2018.  How to stop waiting for your queries to complete and start ...ClickHouse 2018.  How to stop waiting for your queries to complete and start ...
ClickHouse 2018. How to stop waiting for your queries to complete and start ...
Altinity Ltd3K views
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO by Altinity Ltd
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
Altinity Ltd1.3K views
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by... by Spark Summit
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Spark Summit1.3K views
A Deeper Dive into EXPLAIN by EDB
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAIN
EDB176 views
Your first ClickHouse data warehouse by Altinity Ltd
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
Altinity Ltd1.2K views
ClickHouse new features and development roadmap, by Aleksei Milovidov by Altinity Ltd
ClickHouse new features and development roadmap, by Aleksei MilovidovClickHouse new features and development roadmap, by Aleksei Milovidov
ClickHouse new features and development roadmap, by Aleksei Milovidov
Altinity Ltd3.4K views
Reactive programming on Android by Tomáš Kypta
Reactive programming on AndroidReactive programming on Android
Reactive programming on Android
Tomáš Kypta7.5K views
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi... by Citus Data
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Citus Data227 views

Similar to Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco Slot

Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali? by
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?SegFaultConf
250 views72 slides
pgconfasia2016 plcuda en by
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda enKohei KaiGai
783 views49 slides
20181025_pgconfeu_lt_gstorefdw by
20181025_pgconfeu_lt_gstorefdw20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdwKohei KaiGai
5.1K views31 slides
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics by
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
1.8K views44 slides
Kapacitor - Real Time Data Processing Engine by
Kapacitor - Real Time Data Processing EngineKapacitor - Real Time Data Processing Engine
Kapacitor - Real Time Data Processing EnginePrashant Vats
1.9K views21 slides
Greenplum Overview for Postgres Hackers - Greenplum Summit 2018 by
Greenplum Overview for Postgres Hackers - Greenplum Summit 2018Greenplum Overview for Postgres Hackers - Greenplum Summit 2018
Greenplum Overview for Postgres Hackers - Greenplum Summit 2018VMware Tanzu
861 views44 slides

Similar to Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco Slot(20)

Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali? by SegFaultConf
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
SegFaultConf250 views
pgconfasia2016 plcuda en by Kohei KaiGai
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
Kohei KaiGai783 views
20181025_pgconfeu_lt_gstorefdw by Kohei KaiGai
20181025_pgconfeu_lt_gstorefdw20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdw
Kohei KaiGai5.1K views
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics by Kohei KaiGai
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
Kohei KaiGai1.8K views
Kapacitor - Real Time Data Processing Engine by Prashant Vats
Kapacitor - Real Time Data Processing EngineKapacitor - Real Time Data Processing Engine
Kapacitor - Real Time Data Processing Engine
Prashant Vats1.9K views
Greenplum Overview for Postgres Hackers - Greenplum Summit 2018 by VMware Tanzu
Greenplum Overview for Postgres Hackers - Greenplum Summit 2018Greenplum Overview for Postgres Hackers - Greenplum Summit 2018
Greenplum Overview for Postgres Hackers - Greenplum Summit 2018
VMware Tanzu861 views
Adding replication protocol support for psycopg2 by Alexander Shulgin
Adding replication protocol support for psycopg2Adding replication protocol support for psycopg2
Adding replication protocol support for psycopg2
Alexander Shulgin724 views
Reactive & Realtime Web Applications with TurboGears2 by Alessandro Molina
Reactive & Realtime Web Applications with TurboGears2Reactive & Realtime Web Applications with TurboGears2
Reactive & Realtime Web Applications with TurboGears2
Alessandro Molina2.1K views
A New Chapter of Data Processing with CDK by Shu-Jeng Hsieh
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDK
Shu-Jeng Hsieh210 views
Building an analytics workflow using Apache Airflow by Yohei Onishi
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
Yohei Onishi2.3K views
Ansiblefest 2018 Network automation journey at roblox by Damien Garros
Ansiblefest 2018 Network automation journey at robloxAnsiblefest 2018 Network automation journey at roblox
Ansiblefest 2018 Network automation journey at roblox
Damien Garros4.7K views
How to make a high-quality Node.js app, Nikita Galkin by Sigma Software
How to make a high-quality Node.js app, Nikita GalkinHow to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita Galkin
Sigma Software269 views
Skytools: PgQ Queues and applications by elliando dias
Skytools: PgQ Queues and applicationsSkytools: PgQ Queues and applications
Skytools: PgQ Queues and applications
elliando dias10.1K views
Lifecycle Inference on Unreliable Event Data by Databricks
Lifecycle Inference on Unreliable Event DataLifecycle Inference on Unreliable Event Data
Lifecycle Inference on Unreliable Event Data
Databricks252 views
Mod06 new development tools by Peter Haase
Mod06 new development toolsMod06 new development tools
Mod06 new development tools
Peter Haase 402 views
Introduction to InfluxDB, an Open Source Distributed Time Series Database by ... by Hakka Labs
Introduction to InfluxDB, an Open Source Distributed Time Series Database by ...Introduction to InfluxDB, an Open Source Distributed Time Series Database by ...
Introduction to InfluxDB, an Open Source Distributed Time Series Database by ...
Hakka Labs30K views
PgQ Generic high-performance queue for PostgreSQL by elliando dias
PgQ Generic high-performance queue for PostgreSQLPgQ Generic high-performance queue for PostgreSQL
PgQ Generic high-performance queue for PostgreSQL
elliando dias3.2K views

More from Citus Data

JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201... by
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...Citus Data
312 views37 slides
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak... by
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...Citus Data
224 views38 slides
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens by
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensWhats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensCitus Data
204 views62 slides
When it all goes wrong | PGConf EU 2019 | Will Leinweber by
When it all goes wrong | PGConf EU 2019 | Will LeinweberWhen it all goes wrong | PGConf EU 2019 | Will Leinweber
When it all goes wrong | PGConf EU 2019 | Will LeinweberCitus Data
173 views69 slides
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc by
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise GrandjoncAmazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise GrandjoncCitus Data
281 views72 slides
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E... by
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...Citus Data
397 views15 slides

More from Citus Data(20)

JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201... by Citus Data
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
Citus Data312 views
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak... by Citus Data
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Citus Data224 views
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens by Citus Data
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensWhats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Citus Data204 views
When it all goes wrong | PGConf EU 2019 | Will Leinweber by Citus Data
When it all goes wrong | PGConf EU 2019 | Will LeinweberWhen it all goes wrong | PGConf EU 2019 | Will Leinweber
When it all goes wrong | PGConf EU 2019 | Will Leinweber
Citus Data173 views
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc by Citus Data
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise GrandjoncAmazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Citus Data281 views
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E... by Citus Data
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
Citus Data397 views
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis by Citus Data
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff DavisDeep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
Citus Data284 views
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire... by Citus Data
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
Citus Data200 views
A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc by Citus Data
A story on Postgres index types | PostgresLondon 2019 | Louise GrandjoncA story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
Citus Data170 views
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior... by Citus Data
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Citus Data215 views
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine by Citus Data
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri FontaineThe Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
Citus Data271 views
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S... by Citus Data
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
Citus Data424 views
When it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber by Citus Data
When it all goes wrong (with Postgres) | RailsConf 2019 | Will LeinweberWhen it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
When it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber
Citus Data95 views
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine by Citus Data
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri FontaineThe Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
Citus Data463 views
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv... by Citus Data
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Citus Data244 views
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine by Citus Data
How to write SQL queries | pgDay Paris 2019 | Dimitri FontaineHow to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
Citus Data125 views
When it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber by Citus Data
When it all Goes Wrong |Nordic PGDay 2019 | Will LeinweberWhen it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
When it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber
Citus Data85 views
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano by Citus Data
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire GiordanoWhy PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
Citus Data212 views
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe... by Citus Data
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
Citus Data971 views
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font... by Citus Data
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Citus Data134 views

Recently uploaded

Cencora Executive Symposium by
Cencora Executive SymposiumCencora Executive Symposium
Cencora Executive Symposiummarketingcommunicati21
139 views14 slides
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f... by
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc
160 views29 slides
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P... by
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...ShapeBlue
154 views62 slides
Business Analyst Series 2023 - Week 4 Session 7 by
Business Analyst Series 2023 -  Week 4 Session 7Business Analyst Series 2023 -  Week 4 Session 7
Business Analyst Series 2023 - Week 4 Session 7DianaGray10
126 views31 slides
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ... by
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...ShapeBlue
144 views12 slides
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O... by
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...ShapeBlue
88 views13 slides

Recently uploaded(20)

TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f... by TrustArc
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc160 views
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P... by ShapeBlue
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...
ShapeBlue154 views
Business Analyst Series 2023 - Week 4 Session 7 by DianaGray10
Business Analyst Series 2023 -  Week 4 Session 7Business Analyst Series 2023 -  Week 4 Session 7
Business Analyst Series 2023 - Week 4 Session 7
DianaGray10126 views
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ... by ShapeBlue
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...
ShapeBlue144 views
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O... by ShapeBlue
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...
Declarative Kubernetes Cluster Deployment with Cloudstack and Cluster API - O...
ShapeBlue88 views
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue by ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlueElevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
ShapeBlue179 views
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker50 views
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue by ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueWhat’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
ShapeBlue222 views
Migrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlue by ShapeBlue
Migrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlueMigrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlue
Migrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlue
ShapeBlue176 views
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ... by ShapeBlue
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...
Backroll, News and Demo - Pierre Charton, Matthias Dhellin, Ousmane Diarra - ...
ShapeBlue146 views
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava... by ShapeBlue
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
ShapeBlue101 views
Digital Personal Data Protection (DPDP) Practical Approach For CISOs by Priyanka Aash
Digital Personal Data Protection (DPDP) Practical Approach For CISOsDigital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOs
Priyanka Aash153 views
NTGapps NTG LowCode Platform by Mustafa Kuğu
NTGapps NTG LowCode Platform NTGapps NTG LowCode Platform
NTGapps NTG LowCode Platform
Mustafa Kuğu365 views
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or... by ShapeBlue
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
ShapeBlue158 views
The Role of Patterns in the Era of Large Language Models by Yunyao Li
The Role of Patterns in the Era of Large Language ModelsThe Role of Patterns in the Era of Large Language Models
The Role of Patterns in the Era of Large Language Models
Yunyao Li80 views
State of the Union - Rohit Yadav - Apache CloudStack by ShapeBlue
State of the Union - Rohit Yadav - Apache CloudStackState of the Union - Rohit Yadav - Apache CloudStack
State of the Union - Rohit Yadav - Apache CloudStack
ShapeBlue253 views
Future of AR - Facebook Presentation by Rob McCarty
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentation
Rob McCarty62 views

Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco Slot

  • 1. Distributed Computing on PostgreSQL Marco Slot <marco@citusdata.com>
  • 4. Records Data warehouse Real-time analytics Big data architecture using postgres Messaging
  • 5. PostgreSQL is a perfect building block for distributed systems
  • 6. Features! PostgreSQL contains many useful features for building a distributed system: ● Well-defined protocol, libpq ● Crash safety ● Concurrent execution ● Transactions ● Access controls ● 2PC ● Replication ● Custom functions ● …
  • 7. Extensions! Built-in / contrib: ● postgres_fdw ● dblink RPC! ● plpgsql Third-party open source: ● pglogical ● pg_cron ● citus
  • 8. Extensions! Built-in / contrib: ● postgres_fdw ● dblink RPC! ● plpgsql Third-party open source: ● pglogical ● pg_cron ● citus Yours!
  • 9. dblink Run queries on remote postgres server SELECT dblink_connect(node_id, format('host=%s port=%s dbname=postgres', node_name, node_port)) FROM nodes; SELECT dblink_send_query(node_id, $$SELECT pg_database_size('postgres')$$) FROM nodes; SELECT sum(size::bigint) FROM nodes, dblink_get_result(nodes.node_id) AS r(size text); SELECT dblink_disconnect(node_id) FROM nodes;
  • 10. RPC using dblink For every postgres function, we can create a client-side stub using dblink. CREATE FUNCTION func(input text) ... CREATE FUNCTION remote_func(host text, port int, input text) RETURNS text LANGUAGE sql AS $function$ SELECT res FROM dblink( format('host=%s port=%s', host, port), format('SELECT * FROM func(%L)', input)) AS res(output text); $function$;
  • 11. PL/pgSQL Procedural language for Postgres: CREATE FUNCTION distributed_database_size(dbname text) RETURNS bigint LANGUAGE plpgsql AS $function$ DECLARE total_size bigint; BEGIN PERFORM dblink_send_query(node_id, format('SELECT pg_database_size(%L)', dbname) FROM nodes; SELECT sum(size::bigint) INTO total_size FROM nodes, dblink_get_result(nodes.node_id) AS r(size text); RETURN total_size END; $function$;
  • 12. Distributed system in progress... With these extensions, we can already create a simple distributed computing system. Nodes Nodes Nodes Nodes Parallel operation using dblink SELECT transform_data() Data 1 Data 2 Data 3 postgres_fdw?
  • 13. pglogical / logical replication Asynchronously replicate changes to another database. Nodes Nodes Nodes Nodes
  • 14. pg_paxos Consistently replicate changes between databases. Nodes Nodes Nodes
  • 15. pg_cron Cron-based job scheduler for postgres: CREATE EXTENSION pg_cron; SELECT cron.schedule('* * * * */10', 'SELECT transform_data()'); Internally uses libpq, meaning it can also schedule jobs on other nodes. pg_cron provides a way for nodes to act autonomously
  • 16. Citus Transparently shards tables across multiple nodes Coordinator E1 E4 E2 E5 E2 E5 Events create_distributed_table('events', 'event_id');
  • 17. Citus MX Nodes can have the distributed tables too Coordinator E1 E4 E2 E5 E2 E5 Events Events Events Events
  • 18. How to build a distributed system using only PostgreSQL & extensions?
  • 19. Building a streaming publish-subscribe system Producers Postgres nodes Consumers topic: adclick
  • 20. Storage nodes E1 E4 E2 E5 E2 E5 Events Events Events Coordinator Events CREATE TABLE Use Citus to create a distributed table
  • 21. Distributed Table Creation $ psql -h coordinator CREATE TABLE events ( event_id bigserial, ingest_time timestamptz default now(), topic_name text not null, payload jsonb ); SELECT create_distributed_table('events', 'event_id'); $ psql -h any-node INSERT INTO events (topic_name, payload) VALUES ('adclick','{...}');
  • 22. Sharding strategy Shard is chosen by hashing the value in the partition column. Application-defined: ● stream_id text not null Optimise data distribution: ● event_id bigserial Optimise ingest capacity and availability: ● sid int default pick_local_value()
  • 23. Producers connect to a random node and perform COPY or INSERT into events Producers E1 E4 E2 E5 E2 E5 Events Events Events COPY / INSERT
  • 24. Consumers in a group together consume events at least / exactly once. Consumers E1 E4 E2 E5 E2 E5 topic: adclick% Consumer group
  • 25. Consumers obtain leases for consuming a shard. Lease are kept in a separate table on each node: CREATE TABLE leases ( consumer_group text not null, shard_id bigint not null, owner text, new_owner text, last_heartbeat timestamptz, PRIMARY KEY (consumer_group, shard_id) ); Consumer leases
  • 26. Consumers obtain leases for consuming a shard. SELECT * FROM claim_lease('click-analytics', 'node-2', 102008); Under the covers: Insert a new lease or set new_owner to steal lease. CREATE FUNCTION claim_lease(group_name text, node_name text, shard_id int) … INSERT INTO leases (consumer_group, shard_id, owner, last_heartbeat) VALUES (group_name, shard, node_name, now()) ON CONFLICT (consumer_group, shard_id) DO UPDATE SET new_owner = node_name WHERE leases.new_owner IS NULL; Consumer leases
  • 27. Distributing leases across consumers Distributed algorithm for distributing leases across nodes SELECT * FROM obtain_leases('click-analytics', 'node-2') -- gets all available lease tables -- claim all unclaimed shards -- claim random shards until #claims >= #shards/#consumers Not perfect, but ensures all shards are consumed with load balancing (unless C>S)
  • 28. Consumers E1 E4 E2 E5 E2 E5 leases First consumer consumes all obtain_leases leases leases
  • 29. Consumers E1 E4 E2 E5 E2 E5 First consumer consumes all leases leases leases
  • 30. Consumers E1 E4 E2 E5 E2 E5 Second consumer steals leases from first consumer obtain_leases leases leases leases
  • 31. Consumers E1 E4 E2 E5 E2 E5 Second consumer steals leases from first consumer
  • 32. Consuming events Consumer wants to receive all events once. Several options: ● SQL level ● Logical decoding utility functions ● Use a replication connection ● PG10 logical replication / pglogical
  • 33. Consuming events Get a batch of events from a shard: SELECT * FROM poll_events('click-analytics', 'node-2', 102008, 'adclick', '<last-processed-event-id>'); -- Check if node has the lease Set owner = new_owner if new_owner is set -- Get all pending events (pg_logical_slot_peek_changes) -- Progress the replication slot (pg_logical_slot_get_changes) -- Return remaining events if still owner
  • 34. Consumer loop E1 E4 E2 E5 E2 E5 1. Call poll_events for each leased shard 2. Process events from each batch 3. Repeat with event IDs of last event in each batch poll_events
  • 35. Failure handling Producer / consumer fails to connect to storage node: → Connect to different node Storage node fails: → Use pick_local_value() for partition column, failover to hot standby Consumer fails to consume batch → Events are repeated until confirmed Consumer fails and does not come back → Consumers periodically call obtain_leases → Old leases expire
  • 36. Use pg_cron to periodically expire leases on coordinator: SELECT cron.schedule('* * * * *', 'SELECT expire_leases()'); CREATE FUNCTION expire_leases() ... UPDATE leases SET owner = new_owner, last_heartbeat = now() WHERE last_heartbeat < now() - interval '2 minutes' Maintenance: Lease expiration
  • 37. Use pg_cron to periodically expire leases on coordinator: $ psql -h coordinator SELECT cron.schedule('* * * * *', 'SELECT expire_events()'); CREATE FUNCTION expire_events() ... DELETE FROM events WHERE ingest_time < now() - interval '1 day'; Maintenance: Delete old events
  • 38. Prototyped a functional, highly available publish-subscribe systems in https://goo.gl/R1suAo ~300 lines of code
  • 39. Demo
  • 40. Records Data warehouse Real-time analytics Big data architecture using postgres Messaging