Build a Complex, Realtime Data Management App with Postgres 14!

Jonathan Katz, Principal Product Manager Technical at Amazon Web Services
Chicago PostgreSQL User Group - October 20, 2021 Jonathan S. Katz
Let's Build a Complex, Real-Time Data Management Application
• VP, Platform Engineering @ Crunchy Data
• Previously: Engineering Leadership @ Startups
• Longtime PostgreSQL community contributor
• Core Team Member
• Various Governance Committees
• Conference Organizer / Speaker
• @jkatz05
About Me
• Leading Team in Postgres – 10 contributors
• Certified Open Source PostgreSQL Distribution
• Leader in Postgres Technology for Kubernetes
• Crunchy Bridge: Fully managed cloud service
Crunchy Data
Your partner in deploying
open source PostgreSQL
throughout your enterprise.
CPSM Provider Plugin
This talk introduces many different tools and techniques available
in PostgreSQL for building applications.
It introduces different features and where to find out more
information.
We have a lot of material to cover in a short time - the slides and
demonstrations will be made available
How to Approach This Talk
Imagine we are managing virtual rooms for an event platform.
We have a set of operating hours in which the rooms can be
booked.
Only one booking can occur in a virtual room at a given time.
The Problem
For Example
We need to know...
- All the rooms that are available to book
- When the rooms are available to be booked (operating hours)
- When the rooms have been booked
And...
The system needs to be able to CRUD fast
(Create, Read, Update, Delete. Fast).
Specifications
🤔
Interlude:
Finding Availability
Availability can be thought about in three ways:
Closed
Available
Unavailable (or "booked")
Our ultimate "calendar tuple" is (room, status, range)
Managing Availability
PostgreSQL 9.2 introduced "range types" that included the ability to store and
efficiently search over ranges of data.
Built-in:
Date, Timestamps
Integer, Numeric
Lookups (e.g. overlaps) can be sped up using GiST indexes
Postgres Range Types
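To make the overlap lookup concrete, here is a minimal Python sketch of what an overlap test on half-open ranges does. It only illustrates the idea behind the range overlap operator and is not how PostgreSQL implements it; the TimeRange class and the utc helper are hypothetical names introduced for this example.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class TimeRange(NamedTuple):
    """Half-open range [lower, upper), the default bounds of tstzrange."""
    lower: datetime
    upper: datetime

    def overlaps(self, other: "TimeRange") -> bool:
        # Two half-open ranges overlap when each starts before the other ends.
        return self.lower < other.upper and other.lower < self.upper


def utc(hour: int, minute: int = 0) -> datetime:
    # Hypothetical helper: a moment on 2021-10-28, expressed in UTC.
    return datetime(2021, 10, 28, hour, minute, tzinfo=timezone.utc)


booked = TimeRange(utc(9, 30), utc(10, 30))
back_to_back = TimeRange(utc(10, 30), utc(11, 30))  # starts when booked ends
clashing = TimeRange(utc(10, 0), utc(11, 0))        # starts inside booked
```

Because the upper bound is exclusive, a booking that starts exactly when another ends does not count as an overlap, which is what back-to-back bookings in one room need.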
SELECT tstzrange('2021-10-28 09:30'::timestamptz, '2021-10-28 10:30'::timestamptz);
Availability
Availability
SELECT *
FROM (
VALUES
('closed', tstzrange('2021-10-28 0:00', '2021-10-28 8:00')),
('available', tstzrange('2021-10-28 08:00', '2021-10-28 09:30')),
('unavailable', tstzrange('2021-10-28 09:30', '2021-10-28 10:30')),
('available', tstzrange('2021-10-28 10:30', '2021-10-28 16:30')),
('unavailable', tstzrange('2021-10-28 16:30', '2021-10-28 18:30')),
('available', tstzrange('2021-10-28 18:30', '2021-10-28 20:00')),
('closed', tstzrange('2021-10-28 20:00', '2021-10-29 0:00'))
) x(status, calendar_range)
ORDER BY lower(x.calendar_range);
Easy, Right?
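The hand-written VALUES list above can also be derived from operating hours plus bookings. The following Python sketch shows one way to do that under simplifying assumptions: bookings do not overlap each other and fall inside the operating hours. The function name build_calendar is hypothetical and is not taken from the demo code.

```python
from datetime import datetime


def build_calendar(day_start, day_end, open_start, open_end, bookings):
    """Return (status, lower, upper) tuples covering [day_start, day_end).

    Assumes bookings are non-overlapping and lie within the operating hours.
    """
    calendar = []
    if day_start < open_start:
        calendar.append(("closed", day_start, open_start))
    cursor = open_start
    for lower, upper in sorted(bookings):
        if cursor < lower:
            # free time between the previous booking and this one
            calendar.append(("available", cursor, lower))
        calendar.append(("unavailable", lower, upper))
        cursor = upper
    if cursor < open_end:
        calendar.append(("available", cursor, open_end))
    if open_end < day_end:
        calendar.append(("closed", open_end, day_end))
    return calendar


calendar = build_calendar(
    datetime(2021, 10, 28, 0, 0), datetime(2021, 10, 29, 0, 0),
    datetime(2021, 10, 28, 8, 0), datetime(2021, 10, 28, 20, 0),
    bookings=[
        (datetime(2021, 10, 28, 16, 30), datetime(2021, 10, 28, 18, 30)),
        (datetime(2021, 10, 28, 9, 30), datetime(2021, 10, 28, 10, 30)),
    ],
)
```

With these inputs the result has the same seven segments as the query above, in the same order.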
Inserting new ranges and dividing them up
PostgreSQL did not work well with noncontiguous ranges…until PostgreSQL 14
Availability
Just for one day - what about other days?
What happens with data in the past?
What happens with data in the future?
Unavailability
Ensure no double-bookings
Overlapping Events?
Handling multiple spaces
But…
Managing Availability
availability_rule
  id <serial> PRIMARY KEY
  room_id <int> REFERENCES (room)
  days_of_week <int[]>
  start_time <time>
  end_time <time>
  generate_weeks_into_future <int> DEFAULT 52
room
  id <serial> PRIMARY KEY
  name <text>
availability
  id <serial> PRIMARY KEY
  room_id <int> REFERENCES (room)
  availability_rule_id <int> REFERENCES (availability_rule)
  available_date <date>
  available_range <tstzrange>
unavailability
  id <serial> PRIMARY KEY
  room_id <int> REFERENCES (room)
  unavailable_date <date>
  unavailable_range <tstzrange>
calendar
  id <serial> PRIMARY KEY
  room_id <int> REFERENCES (room)
  status <text> DOMAIN: {available, unavailable, closed}
  calendar_date <date>
  calendar_range <tstzrange>
We can now store data, but what about:
Generating initial calendar?
Generating availability based on rules?
Generating unavailability?
Sounds like we need to build an application
Managing Availability
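As a rough illustration of "generating availability based on rules", the Python sketch below expands one availability_rule row into one dated range per matching day. It is not the demo's implementation. It assumes days_of_week uses ISO weekday numbers (1 = Monday through 7 = Sunday) and that times are in UTC; the slides specify neither. The function name expand_rule is hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone


def expand_rule(days_of_week, start_time, end_time, first_day, weeks=52):
    """Yield (available_date, lower, upper) for every day the rule applies to.

    Assumption: days_of_week holds ISO weekday numbers (1 = Monday).
    weeks mirrors the generate_weeks_into_future column (default 52).
    """
    for offset in range(weeks * 7):
        day = first_day + timedelta(days=offset)
        if day.isoweekday() in days_of_week:
            lower = datetime.combine(day, start_time, tzinfo=timezone.utc)
            upper = datetime.combine(day, end_time, tzinfo=timezone.utc)
            yield day, lower, upper


# Weekdays only, 08:00 to 20:00, starting on Monday 2021-10-25.
one_week = list(expand_rule({1, 2, 3, 4, 5}, time(8), time(20),
                            date(2021, 10, 25), weeks=1))
one_year = list(expand_rule({1, 2, 3, 4, 5}, time(8), time(20),
                            date(2021, 10, 25)))
```

A single rule fans out into hundreds of rows, which is why rule changes are the expensive operation later in the talk.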
To build our application, there are a few topics we will need to explore first:
generate_series
Recursive queries
Ranges and Multiranges
SQL Functions
Set returning functions
PL/pgsql
Triggers
Managing Availability
Generate series is a "set returning" function, i.e. a function that can return
multiple rows of data.
Generate series can return:
A set of numbers (int, bigint, numeric) either incremented by 1 or by some
other step that you specify
A set of timestamps incremented by a time interval(!!)
generate_series:
More Than Just For Test Data
SELECT x::date
FROM generate_series(
'2021-01-01'::date, '2021-12-31'::date, '1 day'::interval
) x;
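For readers more at home in application code, the query above behaves roughly like the following Python generator. This is only an analogy for the inclusive start/stop/step semantics, not a replacement for generate_series; date_series is a hypothetical name and the sketch handles positive steps only.

```python
from datetime import date, timedelta


def date_series(start, stop, step=timedelta(days=1)):
    """Yield dates from start to stop inclusive (positive steps only)."""
    current = start
    while current <= stop:
        yield current
        current += step


days_2021 = list(date_series(date(2021, 1, 1), date(2021, 12, 31)))
```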
PostgreSQL 8.4 introduced the "WITH" syntax and with it also introduced the
ability to perform recursive queries
WITH RECURSIVE ... AS ()
Base case vs. recursive case
UNION vs. UNION ALL
CAN HIT INFINITE LOOPS
Recursion in SQL?
Recursion in SQL?
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
)
SELECT fac.n, fac.i
FROM fac;
Infinite Recursion
Recursion in SQL?
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
)
SELECT fac.n, fac.i
FROM fac
LIMIT 100;
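The base case / recursive case split can be mimicked in Python to show why the first query never ends while the LIMIT version does. The sketch below is a simplified model of how a WITH RECURSIVE ... UNION query is evaluated (a working set fed back into the recursive term, with duplicates discarded); it is not PostgreSQL's executor. The names recursive_union, fac_step and fac_step_bounded are hypothetical.

```python
from itertools import islice


def recursive_union(base_rows, recursive_term):
    """Lazily yield the rows of a WITH RECURSIVE ... UNION style query."""
    seen = set()
    working = []
    for row in base_rows:              # base case
        if row not in seen:
            seen.add(row)
            working.append(row)
            yield row
    while working:                     # recursive case, fed by the last batch
        next_working = []
        for row in recursive_term(working):
            if row not in seen:        # UNION drops duplicates
                seen.add(row)
                next_working.append(row)
                yield row
        working = next_working


def fac_step(rows):
    # Recursive term of the factorial query: never stops producing rows.
    for n, i in rows:
        yield (n * (i + 1), i + 1)


def fac_step_bounded(upper):
    # Same term with a WHERE clause, so the working set eventually empties.
    def step(rows):
        for n, i in rows:
            if i + 1 <= upper:
                yield (n * (i + 1), i + 1)
    return step


# Like LIMIT 5: stop pulling rows from an otherwise endless recursion.
first_five = list(islice(recursive_union([(1, 1)], fac_step), 5))
# Like adding a WHERE clause: the recursion terminates on its own.
bounded = list(recursive_union([(1, 1)], fac_step_bounded(5)))
```

Putting the bound in the recursive term is the more dependable of the two approaches, since it does not rely on the consumer stopping early.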
Postgres 14 introduces multirange types
Ordered list of ranges
Can be noncontiguous
Adds the range_agg aggregate; unnest expands a multirange into its ranges
Multirange Types
SELECT
datemultirange(
daterange(CURRENT_DATE, CURRENT_DATE + 1),
daterange(CURRENT_DATE + 5, CURRENT_DATE + 8),
daterange(CURRENT_DATE + 15, CURRENT_DATE + 22)
);
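To show why multiranges help with availability, here is a small Python sketch of two operations on half-open ranges: merging ranges into an ordered, non-overlapping list (the idea behind range_agg) and subtracting booked time from open time. Ranges are plain (lower, upper) tuples; the Python functions are illustrations written for this example and are not PostgreSQL APIs.

```python
def range_agg(ranges):
    """Merge overlapping or adjacent half-open ranges into an ordered list."""
    merged = []
    for lower, upper in sorted(ranges):
        if merged and lower <= merged[-1][1]:
            # touches or overlaps the previous range: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], upper))
        else:
            merged.append((lower, upper))
    return merged


def subtract(multirange, remove):
    """Return the parts of multirange not covered by any range in remove."""
    result = []
    removed = range_agg(remove)
    for lower, upper in multirange:
        cursor = lower
        for r_lower, r_upper in removed:
            if r_upper <= cursor or r_lower >= upper:
                continue  # no overlap with what is left of this range
            if cursor < r_lower:
                result.append((cursor, r_lower))
            cursor = max(cursor, r_upper)
        if cursor < upper:
            result.append((cursor, upper))
    return result


# Minutes since midnight: open 08:00-20:00, booked 09:30-10:30 and 16:30-18:30.
open_hours = [(480, 1200)]
booked = [(990, 1110), (570, 630)]
available = subtract(open_hours, booked)
```

The result is a noncontiguous set of ranges, which is exactly what a single range value cannot represent.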
PostgreSQL provides the ability to write functions to help encapsulate
repeated behavior
PostgreSQL 11 introduced stored procedures, which enable you to
embed transaction control! PostgreSQL 14 adds OUT parameters, so you can get
output from stored procedures!
SQL functions have many properties, including:
Input / output
Volatility (IMMUTABLE, STABLE, VOLATILE) (default VOLATILE)
Parallel safety (default PARALLEL UNSAFE)
LEAKPROOF; SECURITY DEFINER
Execution Cost
Language type (more on this later)
Functions
Functions
CREATE OR REPLACE FUNCTION chipug_fac(n int)
RETURNS numeric
AS $$
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
WHERE i + 1 <= $1
)
SELECT max(fac.n)
FROM fac;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
Functions
CREATE OR REPLACE FUNCTION chipug_fac_set(n int)
RETURNS SETOF numeric
AS $$
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
WHERE i + 1 <= $1
)
SELECT fac.n
FROM fac
ORDER BY fac.n;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
Functions
CREATE OR REPLACE FUNCTION chipug_fac_table(n int)
RETURNS TABLE(n numeric)
AS $$
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
WHERE i + 1 <= $1
)
SELECT fac.n
FROM fac
ORDER BY fac.n;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
PostgreSQL has the ability to load in procedural languages ("PL") and execute
code in them beyond SQL.
Built-in: pgSQL, Python, Perl, Tcl
Others: Javascript, R, Java, C, JVM, Container, LOLCODE, Ruby, PHP, Lua,
pgPSM, Scheme
Procedural Languages
PL/pgSQL
CREATE EXTENSION IF NOT EXISTS plpgsql;
CREATE OR REPLACE FUNCTION chipug_fac_plpgsql(n int)
RETURNS numeric
AS $$
DECLARE
fac numeric;
i int;
BEGIN
fac := 1;
FOR i IN 1..n LOOP
fac := fac * i;
END LOOP;
RETURN fac;
END;
$$ LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE;
Triggers are functions that can be called before/after/instead of an operation or event
Data changes (INSERT/UPDATE/DELETE)
Events (DDL, DCL, etc. changes)
Atomic
Must return "trigger" or "event_trigger"
(Return NULL from a BEFORE row-level trigger if you want to skip the operation for that row)
(Gotcha: OLD is not available for INSERT and NEW is not available for DELETE, so avoid RETURN OLD on INSERT and RETURN NEW on DELETE)
Execute once per modified row or once per SQL statement
Multiple triggers on same event will execute in alphabetical order
Writable in any procedural language that defines a trigger interface
Triggers
Building a
Synchronized System
We'll Scan the Code
It's Available for Download 😉
The Test
[Test your live demos before running them, and you will have much
success!]
availability_rule inserts took some time, > 350ms
availability: INSERT 52
calendar: INSERT 52 from nontrivial function
Updates on individual availability / unavailability are not too painful
Lookups are faaaaaaaast
Lessons of the Test
How About At (Web) Scale?
Recursive CTE 😢
Even with only 100 more rooms, each with a few sets of rules, rule
generation time increased significantly
Multirange Types
These are still pretty fast and are handling scaling up well.
May still be slow for a web transaction.
Lookups are still lightning fast!
Web Scale
Added in PostgreSQL 9.4
Replays all logical changes made to the database
Create a logical replication slot in your database
Only one receiver can consume changes from one slot at a time
Slot keeps track of last change that was read by a receiver
If receiver disconnects, slot will ensure database holds changes until
receiver reconnects
Only changes from tables with primary keys are relayed
As of PostgreSQL 10, you can set a "REPLICA IDENTITY" on a
UNIQUE, NOT NULL, non-deferrable, non-partial column(s)
Basis for Logical Replication
Logical Decoding
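The slot behaviour described above (changes are retained until the receiver confirms them) can be pictured with a toy Python model. This is only a mental model of the bookkeeping, not how PostgreSQL stores WAL or tracks slots; the ReplicationSlot class and its methods are hypothetical.

```python
class ReplicationSlot:
    """Toy model: retain changes until the receiver acknowledges them."""

    def __init__(self):
        self.retained = []        # (lsn, change) pairs the server must keep
        self.confirmed_lsn = 0    # last position acknowledged by the receiver

    def write(self, lsn, change):
        self.retained.append((lsn, change))

    def read(self):
        # A reconnecting receiver resumes after the last confirmed position.
        return [c for c in self.retained if c[0] > self.confirmed_lsn]

    def acknowledge(self, lsn):
        # Once confirmed, older changes no longer need to be kept.
        self.confirmed_lsn = max(self.confirmed_lsn, lsn)
        self.retained = [c for c in self.retained if c[0] > self.confirmed_lsn]


slot = ReplicationSlot()
for lsn, change in [(1, "insert room"), (2, "insert rule"), (3, "update rule")]:
    slot.write(lsn, change)
before_ack = slot.read()
slot.acknowledge(2)
after_ack = slot.read()
```

If the receiver never acknowledges, the retained list only grows, which is the disk-space risk a slow or absent consumer creates.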
A logical replication slot has a name and an output plugin
PostgreSQL comes with the "test_decoding" output plugin
Have to write a custom parser to read changes from the test_decoding output plugin
Several output plugins and libraries available
wal2json: https://github.com/eulerto/wal2json
jsoncdc: https://github.com/instructure/jsoncdc
Debezium: http://debezium.io/
(Test: https://www.postgresql.org/docs/current/static/test-decoding.html)
Logical Replication (pgoutput)
Every data change in the database is streamed
Need to be aware of the logical decoding format
Logical Decoding Out of the Box
C: libpq
pg_recvlogical
PostgreSQL functions
Python: psycopg2 - version 2.7
JDBC: version 42
Go: pgx
JavaScript: node-postgres (pg-logical-replication)
Driver Support
Using Logical Decoding
We know it takes time to regenerate calendar
Want to ensure changes always propagate but want to ensure all users
(managers, calendar searchers) have good experience
Thoughts🤔
Will use the same data model as before as well as the same helper
functions, but without the triggers
We will have a Python script that reads from a logical replication
slot and, if it detects a relevant change, takes an action
Similar to what we did with triggers, but this moves the work to
OUTSIDE the transaction
BUT...we can confirm whether or not the work is completed, thus if
the program fails, we can restart from last acknowledged
transaction ID
Replacing Triggers
Reviewing the Code
A consumer of the logical stream can only read one change at a time
If our processing of a change takes a lot of time, it will create a backlog
of changes
Backlog means the PostgreSQL server needs to retain more WAL logs
Retaining too many WAL logs can lead to running out of disk space
Running out of disk space can lead to...rough times.
The Consumer Bottleneck
🌤
🌥
☁
🌩
Eliminating the Bottleneck
Can utilize a durable message queueing system to store any WAL changes
that are necessary to perform post-processing on
Ensure the changes are worked on in order
"Divide-and-conquer" workload - have multiple workers acting on
different "topics"
Remove WAL bloat
Shifting the Workload
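The "divide-and-conquer" idea can be sketched without any message broker: one in-memory queue per topic, one worker per queue, so a slow topic does not hold up the others while order within a topic is preserved. This uses only the Python standard library as a stand-in; it is not durable and is not how Kafka works internally. The function name run_workers is hypothetical.

```python
import queue
import threading


def run_workers(messages, topics):
    """Fan (topic, change) pairs out to one worker thread per topic."""
    queues = {topic: queue.Queue() for topic in topics}
    processed = {topic: [] for topic in topics}

    def worker(topic):
        while True:
            item = queues[topic].get()
            if item is None:              # sentinel: no more work
                break
            processed[topic].append(item)  # stand-in for real post-processing

    threads = [threading.Thread(target=worker, args=(t,)) for t in topics]
    for thread in threads:
        thread.start()
    for topic, change in messages:        # the reader only enqueues, so it is fast
        queues[topic].put(change)
    for topic in topics:
        queues[topic].put(None)
    for thread in threads:
        thread.join()
    return processed


result = run_workers(
    [("room", 1), ("availability_rule", 2), ("room", 3), ("availability_rule", 4)],
    topics=["room", "availability_rule"],
)
```

Each topic sees its changes in the order they were produced, while work on different topics can proceed in parallel.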
Durable message processing and distribution system
Streams
Supports parallelization of consumers
Multiple consumers, partitions
Highly-available, distributed architecture
Acknowledgement of receiving, processing messages; can replay (sounds like
WAL?)
Can also accomplish this with Debezium, which interfaces with Kafka +
Postgres
Apache Kafka
Architecture
WAL Consumer
import json

from kafka import KafkaProducer
import psycopg2
import psycopg2.extras

# tables whose changes are forwarded to Kafka
TABLES = set([
    'availability', 'availability_rule', 'room', 'unavailability',
])


class WALConsumer(object):
    def __init__(self):
        # a replication connection is required to read from the slot
        self.connection = psycopg2.connect(
            "dbname=realtime",
            connection_factory=psycopg2.extras.LogicalReplicationConnection,
        )
        self.producer = KafkaProducer(
            bootstrap_servers=['localhost:9092'],
            value_serializer=lambda m: json.dumps(m).encode('ascii'),
        )

    def __call__(self, msg):
        payload = json.loads(msg.payload, strict=False)
        print(payload)
        # determine if the payload should be passed on to a consumer
        # listening to the Kafka queue
        for data in payload['change']:
            if data.get('table') in TABLES:
                self.producer.send(data.get('table'), data)
        # ensure everything is sent; call flush at this point
        self.producer.flush()
        # acknowledge that the change has been read - tells PostgreSQL to
        # stop holding onto this log file
        msg.cursor.send_feedback(flush_lsn=msg.data_start)


# the class must be defined before the reader is created
reader = WALConsumer()
cursor = reader.connection.cursor()
cursor.start_replication(slot_name='schedule', decode=True)
try:
    cursor.consume_stream(reader)
except KeyboardInterrupt:
    print("Stopping reader...")
finally:
    cursor.close()
    reader.connection.close()
    print("Exiting reader")
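To see what the routing step in the WAL consumer does without a database or a broker, the sketch below applies the same table filter to a hand-written payload. The payload imitates the shape of wal2json output (a "change" list whose entries carry kind, table, columnnames and columnvalues), but the sample values, the audit_log table and the route function are made up for illustration.

```python
import json

TABLES = {"availability", "availability_rule", "room", "unavailability"}

# Hypothetical payload for one transaction touching two tables.
sample_payload = json.loads("""
{
  "change": [
    {"kind": "insert", "schema": "public", "table": "room",
     "columnnames": ["id", "name"],
     "columntypes": ["integer", "text"],
     "columnvalues": [1, "Room A"]},
    {"kind": "insert", "schema": "public", "table": "audit_log",
     "columnnames": ["id"],
     "columntypes": ["integer"],
     "columnvalues": [42]}
  ]
}
""")


def route(payload, tables=TABLES):
    """Return (topic, change) pairs for changes on tables we care about."""
    return [
        (change["table"], change)
        for change in payload.get("change", [])
        if change.get("table") in tables
    ]


routed = route(sample_payload)
```

Using the table name as the topic name is what later lets each worker subscribe only to the tables it knows how to post-process.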
Kafka Consumer
import json

from kafka import KafkaConsumer
from kafka.structs import OffsetAndMetadata, TopicPartition
import psycopg2


class Worker(object):
    """Base class to perform any post-processing on changes"""
    OPERATIONS = set([])  # override with "insert", "update", "delete"

    def __init__(self, topic):
        # connect to the PostgreSQL database
        self.connection = psycopg2.connect("dbname=realtime")
        # connect to Kafka
        self.consumer = KafkaConsumer(
            bootstrap_servers=['localhost:9092'],
            value_deserializer=lambda m: json.loads(m.decode('utf8')),
            auto_offset_reset="earliest",
            group_id='1')
        # subscribe to the topic(s)
        self.consumer.subscribe(topic if isinstance(topic, list) else [topic])

    def run(self):
        """Function that runs ad infinitum"""
        # loop through the payloads from the consumer
        # determine if there are any follow-up actions based on the kind of
        # operation, and if so, act upon it
        # always commit when done.
        for msg in self.consumer:
            print(msg)
            # load the data from the message
            data = msg.value
            # determine if there are any follow-up operations to perform
            if data['kind'] in self.OPERATIONS:
                # open up a cursor for interacting with PostgreSQL
                cursor = self.connection.cursor()
                # put the parameters in an easy to digest format
                params = dict(zip(data['columnnames'], data['columnvalues']))
                # call the function
                getattr(self, data['kind'])(cursor, params)
                # commit any work that has been done, and close the cursor
                self.connection.commit()
                cursor.close()
            # acknowledge the message has been handled; Kafka expects the
            # committed offset to be that of the next message to read
            tp = TopicPartition(msg.topic, msg.partition)
            offsets = {tp: OffsetAndMetadata(msg.offset + 1, None)}
            self.consumer.commit(offsets=offsets)

    # override with the appropriate post-processing code
    def insert(self, cursor, params):
        """Override with any post-processing to be done on an ``INSERT``"""
        raise NotImplementedError()

    def update(self, cursor, params):
        """Override with any post-processing to be done on an ``UPDATE``"""
        raise NotImplementedError()

    def delete(self, cursor, params):
        """Override with any post-processing to be done on a ``DELETE``"""
        raise NotImplementedError()
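The dispatch pattern in the worker (look up a method named after the change kind and pass it the column values as a dict) can be tried in isolation. The sketch below is a stripped-down stand-in with no Kafka or PostgreSQL connection; RecordingWorker, its handle method and the sample changes are hypothetical and exist only to show the mechanics.

```python
class RecordingWorker:
    """Stand-in for a Worker subclass that only reacts to inserts."""

    OPERATIONS = {"insert"}

    def __init__(self):
        self.seen = []

    def handle(self, data):
        # Same dispatch as in run(): skip kinds this worker does not handle.
        if data["kind"] in self.OPERATIONS:
            params = dict(zip(data["columnnames"], data["columnvalues"]))
            getattr(self, data["kind"])(params)

    def insert(self, params):
        # Real post-processing (for example regenerating the calendar)
        # would go here; this sketch just records what it was given.
        self.seen.append(params)


worker = RecordingWorker()
worker.handle({"kind": "insert", "table": "room",
               "columnnames": ["id", "name"], "columnvalues": [1, "Room A"]})
worker.handle({"kind": "update", "table": "room",
               "columnnames": ["id", "name"], "columnvalues": [1, "Room B"]})
```

Only the insert reaches the handler; the update is ignored because "update" is not listed in OPERATIONS.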
Testing the Application
Logical decoding allows the bulk inserts to occur significantly faster from a
transactional view
Potential bottleneck for long running execution, but bottlenecks are isolated to
specific queues
Newer versions of PostgreSQL have features that make it easier to build
and scale applications
Lessons
PostgreSQL is robust.
Triggers will keep your data in sync but can have significant
performance overhead
Utilizing a logical replication slot can eliminate trigger overhead
and transfer the computational load elsewhere
Not a panacea: still need to use good architectural patterns!
Conclusion
Thank You
jonathan.katz@crunchydata.com
@jkatz05
https://github.com/CrunchyData/postgres-realtime-demo
DianaGray10110 views
Digital Personal Data Protection (DPDP) Practical Approach For CISOs by Priyanka Aash
Digital Personal Data Protection (DPDP) Practical Approach For CISOsDigital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOs
Priyanka Aash103 views
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue by ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlueCloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
ShapeBlue63 views
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or... by ShapeBlue
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
ShapeBlue128 views
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker50 views
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T by ShapeBlue
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&TCloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T
ShapeBlue81 views
The Power of Heat Decarbonisation Plans in the Built Environment by IES VE
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built Environment
IES VE67 views
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti... by ShapeBlue
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
ShapeBlue69 views
DRBD Deep Dive - Philipp Reisner - LINBIT by ShapeBlue
DRBD Deep Dive - Philipp Reisner - LINBITDRBD Deep Dive - Philipp Reisner - LINBIT
DRBD Deep Dive - Philipp Reisner - LINBIT
ShapeBlue110 views
Migrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlue by ShapeBlue
Migrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlueMigrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlue
Migrating VMware Infra to KVM Using CloudStack - Nicolas Vazquez - ShapeBlue
ShapeBlue147 views
State of the Union - Rohit Yadav - Apache CloudStack by ShapeBlue
State of the Union - Rohit Yadav - Apache CloudStackState of the Union - Rohit Yadav - Apache CloudStack
State of the Union - Rohit Yadav - Apache CloudStack
ShapeBlue218 views

Build a Complex, Realtime Data Management App with Postgres 14!

  • 1. Let's Build a Complex, Real-Time Data Management Application. Jonathan S. Katz, Chicago PostgreSQL User Group, October 20, 2021
  • 2. About Me: • VP, Platform Engineering @ Crunchy Data • Previously: Engineering Leadership @ Startups • Longtime PostgreSQL community contributor • Core Team Member • Various Governance Committees • Conference Organizer / Speaker • @jkatz05
  • 3. Crunchy Data: Your partner in deploying open source PostgreSQL throughout your enterprise. • Leading Team in Postgres – 10 contributors • Certified Open Source PostgreSQL Distribution • Leader in Postgres Technology for Kubernetes • Crunchy Bridge: Fully managed cloud service
  • 4. How to Approach This Talk: This talk introduces many of the tools and techniques available in PostgreSQL for building applications. It introduces different features and where to find out more information. We have a lot of material to cover in a short time; the slides and demonstrations will be made available.
  • 5. The Problem: Imagine we are managing virtual rooms for an event platform. We have a set of operating hours in which the rooms can be booked. Only one booking can occur in a virtual room at a given time.
  • 7. Specifications: We need to know... - All the rooms that are available to book - When the rooms are available to be booked (operating hours) - When the rooms have been booked. And... the system needs to be able to CRUD fast (Create, Read, Update, Delete. Fast).
  • 10. Managing Availability: Availability can be thought about in three ways: Closed; Available; Unavailable (or "booked"). Our ultimate "calendar tuple" is (room, status, range).
  • 11. Postgres Range Types: PostgreSQL 9.2 introduced "range types", which can store and efficiently search over ranges of data. Built-in range types cover dates, timestamps, integers, and numerics. Lookups (e.g. overlaps) can be sped up using GiST indexes.
      SELECT tstzrange('2021-10-28 09:30'::timestamptz, '2021-10-28 10:30'::timestamptz);
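To make the "overlaps" lookup concrete, here is a minimal Python sketch of what an overlap test means for half-open ranges, which is the default bound style of `tstzrange`. It only mirrors the semantics for non-empty ranges; it is not how PostgreSQL implements the operator, and the `TsRange`, `overlaps`, and `utc` names are made up for this illustration.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class TsRange(NamedTuple):
    """Half-open range [lower, upper), the default bounds of tstzrange."""
    lower: datetime
    upper: datetime


def overlaps(a: TsRange, b: TsRange) -> bool:
    """Mirror of the range overlap test for non-empty half-open ranges."""
    return a.lower < b.upper and b.lower < a.upper


def utc(hour: int, minute: int = 0) -> datetime:
    """Helper (hypothetical): a UTC timestamp on 2021-10-28."""
    return datetime(2021, 10, 28, hour, minute, tzinfo=timezone.utc)


booked = TsRange(utc(9, 30), utc(10, 30))
print(overlaps(booked, TsRange(utc(10, 0), utc(11, 0))))    # True
print(overlaps(booked, TsRange(utc(10, 30), utc(11, 30))))  # False
```

Because the upper bound is exclusive, a booking that starts exactly when another ends does not count as an overlap, which is what back-to-back room bookings need.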
  • 13. Availability
      SELECT *
      FROM (
        VALUES
          ('closed', tstzrange('2021-10-28 0:00', '2021-10-28 8:00')),
          ('available', tstzrange('2021-10-28 08:00', '2021-10-28 09:30')),
          ('unavailable', tstzrange('2021-10-28 09:30', '2021-10-28 10:30')),
          ('available', tstzrange('2021-10-28 10:30', '2021-10-28 16:30')),
          ('unavailable', tstzrange('2021-10-28 16:30', '2021-10-28 18:30')),
          ('available', tstzrange('2021-10-28 18:30', '2021-10-28 20:00')),
          ('closed', tstzrange('2021-10-28 20:00', '2021-10-29 0:00'))
      ) x(status, calendar_range)
      ORDER BY lower(x.calendar_range);
  • 15. But… Inserting new ranges and dividing them up: PostgreSQL did not work well with noncontiguous ranges…until PostgreSQL 14. Availability: this is just for one day, so what about other days? What happens with data in the past? What happens with data in the future? Unavailability: ensure no double-bookings. Overlapping events? Handling multiple spaces.
  • 16. Managing Availability (data model)
      room: id <serial> PRIMARY KEY; name <text>
      availability_rule: id <serial> PRIMARY KEY; room_id <int> REFERENCES (room); days_of_week <int[]>; start_time <time>; end_time <time>; generate_weeks_into_future <int> DEFAULT 52
      availability: id <serial> PRIMARY KEY; room_id <int> REFERENCES (room); availability_rule_id <int> REFERENCES (availability_rule); available_date <date>; available_range <tstzrange>
      unavailability: id <serial> PRIMARY KEY; room_id <int> REFERENCES (room); unavailable_date <date>; unavailable_range <tstzrange>
      calendar: id <serial> PRIMARY KEY; room_id <int> REFERENCES (room); status <text> DOMAIN: {available, unavailable, closed}; calendar_date <date>; calendar_range <tstzrange>
  • 17. Managing Availability: We can now store data, but what about: Generating the initial calendar? Generating availability based on rules? Generating unavailability? Sounds like we need to build an application.
  • 18. Managing Availability: To build our application, there are a few topics we will need to explore first: generate_series; recursive queries; ranges and multiranges; SQL functions; set returning functions; PL/pgSQL; triggers.
  • 19. generate_series: More Than Just For Test Data. generate_series is a "set returning" function, i.e. a function that can return multiple rows of data. It can return: a set of numbers (int, bigint, numeric), incremented either by 1 or by some other step; or a set of timestamps incremented by a time interval(!!)
      SELECT x::date
      FROM generate_series(
        '2021-01-01'::date, '2021-12-31'::date, '1 day'::interval
      ) x;
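For readers who want to check what the date series above yields without a database at hand, the following is a rough Python equivalent. It assumes a positive step and an inclusive stop, as in the slide's query; `date_series` is a hypothetical helper, and the Monday filter is only an illustration of how a `days_of_week` rule might be applied, since the talk does not say how that column is encoded.

```python
from datetime import date, timedelta
from typing import Iterator


def date_series(start: date, stop: date,
                step: timedelta = timedelta(days=1)) -> Iterator[date]:
    """Yield start, start + step, ... up to and including stop (positive step assumed)."""
    current = start
    while current <= stop:
        yield current
        current += step


days = list(date_series(date(2021, 1, 1), date(2021, 12, 31)))
print(len(days), days[0], days[-1])  # 365 2021-01-01 2021-12-31

# Assumed encoding for illustration only: ISO weekday numbers, 1 = Monday.
mondays = [d for d in days if d.isoweekday() == 1]
print(len(mondays))  # 52
```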
  • 20. Recursion in SQL? PostgreSQL 8.4 introduced the "WITH" syntax and, with it, the ability to perform recursive queries: WITH RECURSIVE ... AS (). Base case vs. recursive case. UNION vs. UNION ALL. CAN HIT INFINITE LOOPS.
  • 21. Recursion in SQL? (Infinite Recursion)
      WITH RECURSIVE fac AS (
        SELECT 1::numeric AS n, 1::numeric AS i
        UNION
        SELECT fac.n * (fac.i + 1), fac.i + 1 AS i
        FROM fac
      )
      SELECT fac.n, fac.i FROM fac;
  • 22. Recursion in SQL?
      WITH RECURSIVE fac AS (
        SELECT 1::numeric AS n, 1::numeric AS i
        UNION
        SELECT fac.n * (fac.i + 1), fac.i + 1 AS i
        FROM fac
      )
      SELECT fac.n, fac.i FROM fac LIMIT 100;
  • 23. Multirange Types: Postgres 14 introduces multirange types. A multirange is an ordered list of ranges and can be noncontiguous. Postgres 14 also adds the range_agg aggregate, which combines ranges into a multirange, and unnest, which expands a multirange back into a set of ranges.
      SELECT datemultirange(
        daterange(CURRENT_DATE, CURRENT_DATE + 1),
        daterange(CURRENT_DATE + 5, CURRENT_DATE + 8),
        daterange(CURRENT_DATE + 15, CURRENT_DATE + 22)
      );
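The reason multiranges matter for this problem is that "available" time is operating hours minus bookings, which is usually noncontiguous. The sketch below works through that arithmetic in plain Python on half-open integer ranges (minutes since midnight). It is an illustration of the idea, not PostgreSQL's implementation; the `range_agg` name is borrowed from the aggregate for readability, and `subtract` is a hypothetical helper standing in for multirange difference. Non-empty ranges are assumed.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [lower, upper), e.g. minutes since midnight


def range_agg(ranges: List[Range]) -> List[Range]:
    """Merge overlapping or adjacent ranges into an ordered, noncontiguous list."""
    merged: List[Range] = []
    for lower, upper in sorted(ranges):
        if merged and lower <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], upper))
        else:
            merged.append((lower, upper))
    return merged


def subtract(base: List[Range], removed: List[Range]) -> List[Range]:
    """Return the parts of `base` that are not covered by `removed`."""
    result: List[Range] = []
    removed = range_agg(removed)
    for lower, upper in range_agg(base):
        cursor = lower  # start of the next piece that is still uncovered
        for r_lower, r_upper in removed:
            if r_upper <= cursor or r_lower >= upper:
                continue  # no intersection with the remaining piece
            if r_lower > cursor:
                result.append((cursor, r_lower))
            cursor = max(cursor, r_upper)
        if cursor < upper:
            result.append((cursor, upper))
    return result


operating_hours = [(8 * 60, 20 * 60)]                             # 08:00-20:00
bookings = [(9 * 60 + 30, 10 * 60 + 30), (16 * 60 + 30, 18 * 60 + 30)]
print(subtract(operating_hours, bookings))
# [(480, 570), (630, 990), (1110, 1200)]
```

The three resulting pieces correspond to the 08:00-09:30, 10:30-16:30, and 18:30-20:00 "available" rows in the earlier availability example.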
  • 24. Functions: PostgreSQL provides the ability to write functions to help encapsulate repeated behavior. PostgreSQL 11 introduced stored procedures, which enable you to embed transactions! PostgreSQL 14 adds the ability to get output from stored procedures! SQL functions have many properties, including: input / output; volatility (IMMUTABLE, STABLE, VOLATILE; default VOLATILE); parallel safety (default PARALLEL UNSAFE); LEAKPROOF; SECURITY DEFINER; execution cost; language type (more on this later).
  • 25. Functions
      CREATE OR REPLACE FUNCTION chipug_fac(n int)
      RETURNS numeric
      AS $$
        WITH RECURSIVE fac AS (
          SELECT 1::numeric AS n, 1::numeric AS i
          UNION
          SELECT fac.n * (fac.i + 1), fac.i + 1 AS i
          FROM fac
          WHERE i + 1 <= $1
        )
        SELECT max(fac.n) FROM fac;
      $$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
  • 26. Functions
      CREATE OR REPLACE FUNCTION chipug_fac_set(n int)
      RETURNS SETOF numeric
      AS $$
        WITH RECURSIVE fac AS (
          SELECT 1::numeric AS n, 1::numeric AS i
          UNION
          SELECT fac.n * (fac.i + 1), fac.i + 1 AS i
          FROM fac
          WHERE i + 1 <= $1
        )
        SELECT fac.n FROM fac ORDER BY fac.n;
      $$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
  • 27. Functions
      CREATE OR REPLACE FUNCTION chipug_fac_table(n int)
      RETURNS TABLE(n numeric)
      AS $$
        WITH RECURSIVE fac AS (
          SELECT 1::numeric AS n, 1::numeric AS i
          UNION
          SELECT fac.n * (fac.i + 1), fac.i + 1 AS i
          FROM fac
          WHERE i + 1 <= $1
        )
        SELECT fac.n FROM fac ORDER BY fac.n;
      $$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
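It can help to see why the `WHERE i + 1 <= $1` condition makes the recursive query terminate. The Python sketch below steps through the `fac` CTE roughly the way the PostgreSQL documentation describes `WITH RECURSIVE` evaluation with `UNION`: run the base case, then repeatedly apply the recursive case to the previous step's rows, discarding duplicates, until no new rows appear. It is a teaching model with hypothetical names (`fac_cte`, `working`), not the executor's actual code, and it uses Python integers where the SQL uses `numeric`.

```python
from typing import List, Set, Tuple

Row = Tuple[int, int]  # (n, i), matching the columns of the fac CTE


def fac_cte(limit: int) -> List[Row]:
    """Step through the fac CTE: base case, then the recursive case until no new rows."""
    seen: Set[Row] = set()
    result: List[Row] = []

    # Base case: SELECT 1 AS n, 1 AS i
    working: List[Row] = [(1, 1)]
    seen.update(working)
    result.extend(working)

    while working:
        # Recursive case, evaluated only against the rows from the previous step.
        produced = [(n * (i + 1), i + 1) for n, i in working if i + 1 <= limit]
        # UNION (rather than UNION ALL) discards rows that were already produced.
        working = []
        for row in produced:
            if row not in seen:
                seen.add(row)
                working.append(row)
        result.extend(working)
    return result


def chipug_fac(limit: int) -> int:
    """Python counterpart of SELECT max(fac.n) FROM fac."""
    return max(n for n, _ in fac_cte(limit))


print(fac_cte(5))     # [(1, 1), (2, 2), (6, 3), (24, 4), (120, 5)]
print(chipug_fac(5))  # 120
```

Without the `limit` filter the recursive step always produces a new row, so the loop never empties `working`; that is the infinite recursion case from the earlier slide.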
  • 28. Procedural Languages: PostgreSQL has the ability to load in procedural languages ("PL") and execute code in them beyond SQL. Built-in: pgSQL, Python, Perl, Tcl. Others: JavaScript, R, Java, C, JVM, Container, LOLCODE, Ruby, PHP, Lua, pgPSM, Scheme.
  • 29. PL/pgSQL
      CREATE EXTENSION IF NOT EXISTS plpgsql;

      CREATE OR REPLACE FUNCTION chipug_fac_plpgsql(n int)
      RETURNS numeric
      AS $$
      DECLARE
        fac numeric;
        i int;
      BEGIN
        fac := 1;
        FOR i IN 1..n LOOP
          fac := fac * i;
        END LOOP;
        RETURN fac;
      END;
      $$ LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE;
  • 30. Triggers: Triggers are functions that can be called before/after/instead of an operation or event: data changes (INSERT/UPDATE/DELETE) or events (DDL, DCL, etc. changes). Atomic. The function must be declared to return "trigger" or "event_trigger". (Return NULL in a trigger if you want to skip the operation.) (Gotcha: OLD is not populated on INSERT and NEW is not populated on DELETE, so watch out for RETURN OLD in an INSERT trigger and RETURN NEW in a DELETE trigger.) Executes once per modified row or once per SQL statement. Multiple triggers on the same event execute in alphabetical order. Writable in any procedural language that defines a trigger interface.
  • 32. We'll Scan the Code. It's Available for Download 😉
  • 34. Lessons of the Test: [Test your live demos before running them, and you will have much success!] availability_rule inserts took some time, > 350ms (availability: INSERT 52; calendar: INSERT 52 from a nontrivial function). Updates on individual availability / unavailability are not too painful. Lookups are faaaaaaaast.
  • 35. How About At (Web) Scale?
  • 36. Web Scale: Recursive CTE 😢 Even with only 100 more rooms and a few sets of rules, rule generation time increased significantly. Multirange types: these are still pretty fast and are handling scaling up well, but may still be slow for a web transaction. Lookups are still lightning fast!
  • 37. Logical Decoding: Added in PostgreSQL 9.4. Replays all logical changes made to the database. Create a logical replication slot in your database. Only one receiver can consume changes from one slot at a time. The slot keeps track of the last change that was read by a receiver. If the receiver disconnects, the slot will ensure the database holds changes until the receiver reconnects. Only changes from tables with primary keys are relayed. As of PostgreSQL 10, you can set a "REPLICA IDENTITY" on a UNIQUE, NOT NULL, non-deferrable, non-partial column(s). Basis for Logical Replication.
  • 38. Logical Decoding Out of the Box: A logical replication slot has a name and an output plugin. PostgreSQL comes with the "test_decoding" output plugin (https://www.postgresql.org/docs/current/static/test-decoding.html); you have to write a custom parser to read changes from it. Several output plugins and libraries are available: wal2json: https://github.com/eulerto/wal2json; jsoncdc: https://github.com/instructure/jsoncdc; Debezium: http://debezium.io/; Logical Replication (pgoutput). Every data change in the database is streamed. You need to be aware of the logical decoding format.
  • 39. Driver Support: C: libpq; pg_recvlogical; PostgreSQL functions; Python: psycopg2 - version 2.7; JDBC: version 42; Go: pgx; JavaScript: node-postgres (pg-logical-replication)
  • 40. Using Logical Decoding
  • 41. Thoughts 🤔 We know it takes time to regenerate the calendar. We want to ensure changes always propagate, but we also want all users (managers, calendar searchers) to have a good experience.
  • 42. Replacing Triggers: We will use the same data model as before, as well as the same helper functions, but without the triggers. We will have a Python script that reads from a logical replication slot and, if it detects a relevant change, takes an action. This is similar to what we did with triggers, but it moves the work to OUTSIDE the transaction. BUT...we can confirm whether or not the work is completed, so if the program fails, we can restart from the last acknowledged transaction ID.
  • 44. The Consumer Bottleneck 🌤 🌥 ☁ 🌩 A consumer of the logical stream can only read one change at a time. If our processing of a change takes a lot of time, it will create a backlog of changes. A backlog means the PostgreSQL server needs to retain more WAL. Retaining too much WAL can lead to running out of disk space. Running out of disk space can lead to...rough times.
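One way to keep an eye on this backlog is to compare a slot's position with the server's current WAL position. PostgreSQL prints log sequence numbers (LSNs) as two hexadecimal numbers separated by a slash, and the sketch below converts them to byte offsets so the gap can be computed. The helper names are hypothetical, and where the two LSN strings come from is left to the reader; the `restart_lsn` column of `pg_replication_slots` and `pg_current_wal_lsn()` are the usual sources on PostgreSQL 10 and later.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Approximate WAL, in bytes, that the server keeps for a lagging slot."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


print(lsn_to_int("0/FF"))                         # 255
print(retained_wal_bytes("1/0", "0/FFFFFF00"))    # 256
```

A number that keeps growing suggests the consumer is falling behind, which is the situation the slide warns about.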
  • 46. Shifting the Workload: We can use a durable message queueing system to store any WAL changes that need post-processing. Ensure the changes are worked on in order. "Divide-and-conquer" the workload: have multiple workers acting on different "topics". Remove WAL bloat.
  • 47. Apache Kafka: Durable message processing and distribution system. Streams. Supports parallelization of consumers (multiple consumers, partitions). Highly-available, distributed architecture. Acknowledgement of receiving and processing messages; can replay (sounds like WAL?). Can also accomplish this with Debezium, which interfaces with Kafka + Postgres.
  • 49. WAL Consumer
      import json, sys
      from kafka import KafkaProducer
      from kafka.errors import KafkaError
      import psycopg2
      import psycopg2.extras

      TABLES = set([
          'availability', 'availability_rule', 'room', 'unavailability',
      ])

      reader = WALConsumer()
      cursor = reader.connection.cursor()
      cursor.start_replication(slot_name='schedule', decode=True)
      try:
          cursor.consume_stream(reader)
      except KeyboardInterrupt:
          print("Stopping reader...")
      finally:
          cursor.close()
          reader.connection.close()
          print("Exiting reader")
  • 50. WAL Consumer
      class WALConsumer(object):
          def __init__(self):
              self.connection = psycopg2.connect("dbname=realtime",
                  connection_factory=psycopg2.extras.LogicalReplicationConnection,
              )
              self.producer = KafkaProducer(
                  bootstrap_servers=['localhost:9092'],
                  value_serializer=lambda m: json.dumps(m).encode('ascii'),
              )

          def __call__(self, msg):
              payload = json.loads(msg.payload, strict=False)
              print(payload)
              # determine if the payload should be passed on to a consumer
              # listening to the Kafka queue
              for data in payload['change']:
                  if data.get('table') in TABLES:
                      self.producer.send(data.get('table'), data)
              # ensure everything is sent; call flush at this point
              self.producer.flush()
              # acknowledge that the change has been read - tells PostgreSQL to
              # stop holding onto this log file
              msg.cursor.send_feedback(flush_lsn=msg.data_start)
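The routing decision inside `__call__` can be tried out without PostgreSQL or Kafka. The sketch below assumes the slot uses the wal2json plugin with its version 1 JSON layout, where each message carries a top-level `change` list; the slides do not state which output plugin backs the `schedule` slot, so treat that as an assumption. `relevant_changes` and the sample payload are made up for illustration.

```python
import json
from typing import Any, Dict, Iterator, Set, Tuple

TABLES: Set[str] = {"availability", "availability_rule", "room", "unavailability"}


def relevant_changes(raw_payload: str,
                     tables: Set[str] = TABLES) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (topic, change) pairs for changes to tables we care about."""
    payload = json.loads(raw_payload, strict=False)
    for change in payload.get("change", []):
        table = change.get("table")
        if table in tables:
            # The table name doubles as the Kafka topic, as in the slide.
            yield table, change


# Sample message in the assumed wal2json version 1 layout.
sample = json.dumps({
    "change": [
        {"kind": "insert", "schema": "public", "table": "room",
         "columnnames": ["id", "name"], "columntypes": ["integer", "text"],
         "columnvalues": [1, "Room A"]},
        {"kind": "insert", "schema": "public", "table": "audit_log",
         "columnnames": ["id"], "columntypes": ["integer"],
         "columnvalues": [7]},
    ]
})

for topic, change in relevant_changes(sample):
    print(topic, change["kind"], change["columnvalues"])  # room insert [1, 'Room A']
```

Keeping the filter as a pure function like this makes it easy to unit test separately from the replication and producer connections.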
  • 51. Kafka Consumer
      import json
      from kafka import KafkaConsumer
      from kafka.structs import OffsetAndMetadata, TopicPartition
      import psycopg2

      class Worker(object):
          """Base class to perform any post-processing on changes"""
          OPERATIONS = set([])  # override with "insert", "update", "delete"

          def __init__(self, topic):
              # connect to the PostgreSQL database
              self.connection = psycopg2.connect("dbname=realtime")
              # connect to Kafka
              self.consumer = KafkaConsumer(
                  bootstrap_servers=['localhost:9092'],
                  value_deserializer=lambda m: json.loads(m.decode('utf8')),
                  auto_offset_reset="earliest",
                  group_id='1')
              # subscribe to the topic(s)
              self.consumer.subscribe(topic if isinstance(topic, list) else [topic])
  • 52. Kafka Consumer
          def run(self):
              """Function that runs ad infinitum"""
              # loop through the payloads from the consumer
              # determine if there are any follow-up actions based on the kind of
              # operation, and if so, act upon it
              # always commit when done.
              for msg in self.consumer:
                  print(msg)
                  # load the data from the message
                  data = msg.value
                  # determine if there are any follow-up operations to perform
                  if data['kind'] in self.OPERATIONS:
                      # open up a cursor for interacting with PostgreSQL
                      cursor = self.connection.cursor()
                      # put the parameters in an easy to digest format
                      params = dict(zip(data['columnnames'], data['columnvalues']))
                      # call the function
                      getattr(self, data['kind'])(cursor, params)
                      # commit any work that has been done, and close the cursor
                      self.connection.commit()
                      cursor.close()
                  # acknowledge the message has been handled; Kafka expects the
                  # offset of the next message to consume, i.e. msg.offset + 1
                  tp = TopicPartition(msg.topic, msg.partition)
                  offsets = {tp: OffsetAndMetadata(msg.offset + 1, None)}
                  self.consumer.commit(offsets=offsets)
  • 53. Kafka Consumer
          # override with the appropriate post-processing code
          def insert(self, cursor, params):
              """Override with any post-processing to be done on an ``INSERT``"""
              raise NotImplementedError()

          def update(self, cursor, params):
              """Override with any post-processing to be done on an ``UPDATE``"""
              raise NotImplementedError()

          def delete(self, cursor, params):
              """Override with any post-processing to be done on a ``DELETE``"""
              raise NotImplementedError()
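The dispatch pattern in `Worker` (look up a method named after the change kind, then hand it a name-to-value dict) can be exercised on its own. The following sketch does that with a stand-in class that records calls instead of talking to Kafka or PostgreSQL. It assumes the wal2json version 1 layout, in which INSERT and UPDATE changes carry `columnnames`/`columnvalues` while DELETE changes carry their key columns under `oldkeys`; `change_params` and `RecordingWorker` are hypothetical names, not part of the talk's code.

```python
from typing import Any, Dict, List, Tuple


def change_params(change: Dict[str, Any]) -> Dict[str, Any]:
    """Map column names to values for one change (assumed wal2json v1 layout)."""
    if change["kind"] == "delete":
        old = change["oldkeys"]  # deletes only carry the key columns
        return dict(zip(old["keynames"], old["keyvalues"]))
    return dict(zip(change["columnnames"], change["columnvalues"]))


class RecordingWorker:
    """Stand-in for a Worker subclass: records calls instead of running SQL."""
    OPERATIONS = {"insert", "delete"}  # "update" is deliberately not handled

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def handle(self, change: Dict[str, Any]) -> None:
        # Same dispatch idea as Worker.run: the method name is the change kind.
        if change["kind"] in self.OPERATIONS:
            getattr(self, change["kind"])(change_params(change))

    def insert(self, params: Dict[str, Any]) -> None:
        self.calls.append(("insert", params))

    def delete(self, params: Dict[str, Any]) -> None:
        self.calls.append(("delete", params))


worker = RecordingWorker()
worker.handle({"kind": "insert", "table": "room",
               "columnnames": ["id", "name"], "columnvalues": [1, "Room A"]})
worker.handle({"kind": "update", "table": "room",
               "columnnames": ["id", "name"], "columnvalues": [1, "Room B"]})
worker.handle({"kind": "delete", "table": "room",
               "oldkeys": {"keynames": ["id"], "keytypes": ["integer"],
                           "keyvalues": [1]}})
print(worker.calls)
# [('insert', {'id': 1, 'name': 'Room A'}), ('delete', {'id': 1})]
```

Under that layout, a worker that lists "delete" in `OPERATIONS` would need the `oldkeys` branch, since a delete message has no `columnnames` entry to zip.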
  • 55. Lessons: Logical decoding allows the bulk inserts to occur significantly faster from a transactional view. Long-running execution is a potential bottleneck, but bottlenecks are isolated to specific queues. Newer versions of PostgreSQL have features that make it easier to build applications and scale.
  • 56. Conclusion: PostgreSQL is robust. Triggers will keep your data in sync but can have significant performance overhead. Utilizing a logical replication slot can eliminate trigger overhead and transfer the computational load elsewhere. Not a panacea: you still need to use good architectural patterns!