Building a Complex, Real-Time Data Management Application

Jonathan Katz
Jonathan KatzPrincipal Product Manager Technical at Amazon Web Services
Let's Build a Complex, Real-Time
Data Management Application
JONATHAN S. KATZ
PGCONF.EU 2018
OCTOBER 25, 2018
...before the session ends!
About Crunchy Data
2
Market Leading Data Security
• Crunchy Certified PostgreSQL is open source and Common Criteria EAL 2+ Certified, with
essential security enhancements for enterprise deployment

• Author of the DISA Secure Technology Implementation Guide for PostgreSQL and co-author
of CIS PostgreSQL Benchmark. Move ATO from weeks to days!
Cloud Ready Data Management
• Open source, Kubernetes-based solutions proven to scale to 1000s of database instances

• Cloud-agnostic technology provide flexibility on how to deploy databases to public
clouds, private clouds, or on-premise technology
Leader in Open Source Enterprise PostgreSQL
• Developer of essential open source tools for high availability, disaster recovery, and and
monitoring for PostgreSQL

• Leading contributor and sponsor of features that enhance stability, security, and performance
of PostgreSQL
• Director of Communications, Crunchy Data

• Previously: Engineering leadership in startups

• Longtime PostgreSQL community contributor

• Advocacy & various committees for PGDG

• @postgresql + .org content

• Director, PgUS

• Co-Organizer, NYCPUG

• Conference organization + speaking

• @jkatz05
About Me
3
• This talk introduces many different tools and techniques available in
PostgreSQL for building applications

• Introduces different features and where to find out more information

• We have a lot of material to cover in a short time - the slides and
demonstrations will be made available
How to Approach This Talk
4
• Imagine we are managing the rooms at the Marriott Lisbon Hotel

• We have a set of operating hours in which the rooms can be booked

• Only one booking can occur in the room at a given time
The Problem
5
For example...
6
• We need to know...

• All the rooms that are available to book

• When the rooms are available to be booked (operating hours)

• When the rooms have been booked 

• And...

• The system needs to be able to CRUD fast

• (Create, Read, Update, Delete. Fast).
Application Requirements
7
8
🤔
First, let's talk about how we can find
availability
• Availability can be thought about in three ways:

• Closed

• Available

• Unavailable (or "booked")

• Our ultimate "calendar tuple" is (room, status, range)
Managing Availability
10
• PostgreSQL 9.2 introduced "range types" that included the ability to store
and efficiently search over ranges of data

• Built-in:

• Date, Timestamps

• Integer, Numeric

• Lookups (e.g. overlaps) can be sped up using GiST and SP-GiST indexes
PostgreSQL Range Types
11
SELECT
tstzrange('2018-10-26 09:30'::timestamptz, '2018-10-26 10:30'::timestamptz);
Availability
12
Availability
13
SELECT *
FROM (
VALUES
('closed', tstzrange('2018-10-26 0:00', '2018-10-26 8:00')),
('available', tstzrange('2018-10-26 08:00', '2018-10-26 09:30')),
('unavailable', tstzrange('2018-10-26 09:30', '2018-10-26 10:30')),
('available', tstzrange('2018-10-26 10:30', '2018-10-26 16:30')),
('unavailable', tstzrange('2018-10-26 16:30', '2018-10-26 18:30')),
('available', tstzrange('2018-10-26 18:30', '2018-10-26 20:00')),
('closed', tstzrange('2018-10-26 20:00', '2018-10-27 0:00'))
) x(status, calendar_range)
ORDER BY lower(x.calendar_range);
Easy, right?
• Insert new ranges and dividing them up

• PostgreSQL does not work well with discontiguous ranges (...yet)

• Availability

• Just for one day - what about other days?

• What happens with data in the past?

• What happens with data in the future?

• Unavailability

• Ensure no double-bookings

• Overlapping Events?

• Just one space
But...
15
availability_rule
id <serial> PRIMARY KEY
room_id <int> REFERENCES (room)
days_of_week <int[]>
start_time <time>
end_time <time>
generate_weeks_into_future <int>
DEFAULT 52
room
id <serial>
PRIMARY KEY
name <text>
availability
id <serial> PRIMARY KEY
room_id <int> REFERENCES
(room)
availability_rule_id <int>
REFERENCES (availabilityrule)
available_date <date>
available_range <tstzrange>
unavailability
id <serial> PRIMARY KEY
room_id <int> REFERENCES
(room)
unavailable_date <date>
unavailable_range <tstzrange>
calendar
id <serial> PRIMARY KEY
room_id <int> REFERENCES
(room)
status <text> DOMAIN:
{available, unavailable, closed}
calendar_date <date>
calendar_range <tstzrange>
Managing Availability
16
• We can now store data, but what about:

• Generating initial calendar?

• Generating availability based on rules?

• Generating unavailability?

• Sounds like we need to build an application
Managing Availability
17
• To build our application, there are a few topics we will need to explore first:

• generate_series

• Recursive queries

• SQL Functions

• Set returning functions

• PL/pgsql

• Triggers
Managing Availability
18
• Generate series is a "set returning" function, i.e. a function that can return
multiple rows of data

• Generate series can return:

• A set of numbers (int, bigint, numeric) either incremented by 1 or some
other integer interval

• A set of timestamps incremented by a time interval(!!)
generate_series: More than just generating test data
19
SELECT x::date
FROM generate_series(
'2018-01-01'::date, '2018-12-31'::date, '1 day'::interval
) x;
• PostgreSQL 8.4 introduced the "WITH" syntax and with it also introduced
the ability to perform recursive queries

• WITH RECURSIVE ... AS ()

• Base case vs. recursive case

• UNION vs. UNION ALL

• CAN HIT INFINITE LOOPS
Recursion in my SQL?
20
Recursion in my SQL?
21
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
)
SELECT fac.n, fac.i
FROM fac;
Nope
Recursion in my SQL?
22
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
WHERE i + 1 <= 100
)
SELECT fac.n, fac.i
FROM fac;
Better
• PostgreSQL provides the ability to write functions to help encapsulate repeated
behavior

• PostgreSQL 11 introduces stored procedures which enables you to embed
transactions!

• SQL functions have many properties, including:

• Input / output

• Volatility (IMMUTABLE, STABLE, VOLATILE) (default VOLATILE)

• Parallel safety (default PARALLEL UNSAFE)

• LEAKPROOF; SECURITY DEFINER

• Execution Cost

• Language type (more on this later)
Functions
23
Functions
24
CREATE OR REPLACE FUNCTION pgconfeu_fac(n int)
RETURNS numeric
AS $$
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
WHERE i + 1 <= $1
)
SELECT max(fac.n)
FROM fac;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
Functions
25
CREATE OR REPLACE FUNCTION pgconfeu_fac_set(n int)
RETURNS SETOF numeric
AS $$
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
WHERE i + 1 <= $1
)
SELECT fac.n
FROM fac
ORDER BY fac.n;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
Functions
26
CREATE OR REPLACE FUNCTION pgopen_fac_table(n int)
RETURNS TABLE(n numeric)
AS $$
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
WHERE i + 1 <= $1
)
SELECT fac.n
FROM fac
ORDER BY fac.n;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
• PostgreSQL has the ability to load in procedural languages and execute
code in them beyond SQL

• "PL"

• Built-in: pgSQL, Python, Perl, Tcl

• Others: Javascript, R, Java, C, JVM, Container, LOLCODE, Ruby, PHP,
Lua, pgPSM, Scheme
Procedural Languages
27
PL/pgSQL
28
CREATE EXTENSION IF NOT EXISTS plpgsql;
CREATE OR REPLACE FUNCTION pgopen_fac_plpgsql(n int)
RETURNS numeric
AS $$
DECLARE
fac numeric;
i int;
BEGIN
fac := 1;
FOR i IN 1..n LOOP
fac := fac * i;
END LOOP;
RETURN fac;
END;
$$ LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE;
• Triggers are functions that can be called before/after/instead of an operation or
event

• Data changes (INSERT/UPDATE/DELETE)

• Events (DDL, DCL, etc. changes)

• Atomic

• Must return "trigger" or "event_trigger"

• (Return "NULL" in a trigger if you want to skip operation)

• (Gotcha: RETURN OLD [INSERT] / RETURN NEW [DELETE])

• Execute once per modified row or once per SQL statement

• Multiple triggers on same event will execute in alphabetical order

• Writeable in any PL language that defined trigger interface
Triggers
29
Building a Synchronized System
We will scan through the application code.
It will be available for download later ;-)
The Test
• [Test your live demos before running them, and you will have much
success!]

• availability_rule inserts took some time, > 500ms

• availability: INSERT 52 

• calendar: INSERT 52 from nontrivial function

• Updates on individual availability / unavailability are not too painful

• Lookups are faaaaaaaast
Lessons of The Test
33
How about at (web) scale?
• Even with only 100 more rooms with a few set of rules, rule generation
time increased significantly

• Lookups are still lightning fast!
Web Scale :(
35
• Added in PostgreSQL 9.4

• Replays all logical changes made to the database

• Create a logical replication slot in your database

• Only one receiver can consume changes from one slot at a time

• Slot keeps track of last change that was read by a receiver

• If receiver disconnects, slot will ensure database holds changes until receiver reconnects

• Only changes from tables with primary keys are relayed
• As of PostgreSQL 10, you can set a "REPLICA IDENTITY" on a UNIQUE, NOT NULL,
non-deferrable, non-partial column(s)

• Basis for Logical Replication
Logical Decoding
36
• A logical replication slot has a name and an output plugin

• PostgreSQL comes with the "test" output plugin

• Have to write a custom parser to read changes from test output plugin

• Several output plugins and libraries available

• wal2json: https://github.com/eulerto/wal2json

• jsoncdc: https://github.com/posix4e/jsoncdc

• Debezium: http://debezium.io/

• (Test: https://www.postgresql.org/docs/11/static/test-decoding.html)

• Every change in the database is streamed

• Need to be aware of the logical decoding format
Logical Decoding Out of the Box
37
• C: libpq

• pg_recvlogical

• PostgreSQL functions

• Python: psycopg2 - version 2.7

• JDBC: version 42

• Go: go-pgx

• JavaScript: node-postgres (pg-logical-replication)
Driver Support
38
Using Logical Decoding
39
wal_level = logical
max_wal_senders = 2
max_replication_slots = 2
postgresql.conf
local replication jkatz trust
pg_hba.conf
# DEVELOPMENT ONLY
SELECT *
FROM pg_create_logical_replication_slot('schedule', 'wal2json');
In the database:
• We know it takes time to regenerate calendar

• Want to ensure changes always propagate but want to ensure all users
(managers, calendar searchers) have good experience
Thoughts
40
🤔
• Will use the same data model as before as well as the same helper
functions, but without the triggers

• (That's a lie, we will have one set of DELETE triggers as "DELETE" in
the wal2json output plugin currently does not provide enough
information)
Replacing Triggers
41
Replacing Triggers
42
/**
* Helper function: substitute the data within the `calendar`; this can be used
* for all updates that occur on `availability` and `unavailability`
*/
CREATE OR REPLACE FUNCTION calendar_manage(room_id int, calendar_date date)
RETURNS void
AS $$
WITH delete_calendar AS (
DELETE FROM calendar
WHERE
room_id = $1 AND
calendar_date = $2
)
INSERT INTO calendar (room_id, status, calendar_date, calendar_range)
SELECT $1, c.status, $2, c.calendar_range
FROM calendar_generate_calendar($1, tstzrange($2, $2 + 1)) c
$$ LANGUAGE SQL;
Replacing Triggers
43
/** Now, the trigger functions for availability and unavailability; needs this for DELETE */
CREATE OR REPLACE FUNCTION availability_manage()
RETURNS trigger
AS $trigger$
BEGIN
IF TG_OP = 'DELETE' THEN
PERFORM calendar_manage(OLD.room_id, OLD.available_date);
RETURN OLD;
END IF;
END;
$trigger$
LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION unavailability_manage()
RETURNS trigger
AS $trigger$
BEGIN
IF TG_OP = 'DELETE' THEN
PERFORM calendar_manage(OLD.room_id, OLD.unavailable_date);
RETURN OLD;
END IF;
END;
$trigger$
LANGUAGE plpgsql;
/** And the triggers, applied to everything */
CREATE TRIGGER availability_manage
AFTER DELETE ON availability
FOR EACH ROW
EXECUTE PROCEDURE availability_manage();
CREATE TRIGGER unavailability_manage
AFTER DELETE ON unavailability
FOR EACH ROW
EXECUTE PROCEDURE unavailability_manage();
• We will have a Python script that reads from a logical replication slot and if
it detects a relevant change, take an action

• Similar to what we did with triggers, but this moves the work to OUTSIDE
the transaction

• BUT...we can confirm whether or not the work is completed, thus if the
program fails, we can restart from last acknowledged transaction ID
Replacing Triggers
44
Reading the Changes
45
import json
import sys
import psycopg2
import psycopg2.extras
SQL = {
'availability': {
'insert': """SELECT calendar_manage(%(room_id)s, %(available_date)s)""",
'update': """SELECT calendar_manage(%(room_id)s, %(available_date)s)""",
},
'availability_rule': {
'insert': True,
'update': True,
},
'room': {
'insert': """
INSERT INTO calendar (room_id, status, calendar_date, calendar_range)
SELECT
%(id)s, 'closed', calendar_date, tstzrange(calendar_date, calendar_date + '1 day'::interval)
FROM generate_series(
date_trunc('week', CURRENT_DATE),
date_trunc('week', CURRENT_DATE + '52 weeks'::interval),
'1 day'::interval
) calendar_date;
""",
},
'unavailability': {
'insert': """SELECT calendar_manage(%(room_id)s, %(unavailable_date)s)""",
'update': """SELECT calendar_manage(%(room_id)s, %(unavailable_date)s)""",
},
}
Reading the Changes
46
class StreamReader(object):
def _consume_change(self, payload):
connection = psycopg2.connect("dbname=realtime")
cursor = connection.cursor()
for data in payload['change']:
sql = SQL.get(data.get('table'), {}).get(data.get('kind'))
if not sql:
return
params = dict(zip(data['columnnames'], data['columnvalues']))
if data['table'] == 'availability_rule':
self._perform_availability_rule(cursor, data['kind'], params)
else:
cursor.execute(sql, params)
connection.commit()
cursor.close()
connection.close()
def _perform_availability_rule(self, cursor, kind, params):
if kind == 'update':
cursor.execute("""DELETE FROM availability WHERE availability_rule_id = %(id)s""", params)
if kind in ['insert', 'update']:
days_of_week = params['days_of_week'].replace('{', '').replace('}', '').split(',')
for day_of_week in days_of_week:
params['day_of_week'] = day_of_week
cursor.execute(
"""
SELECT availability_rule_bulk_insert(ar, %(day_of_week)s)
FROM availability_rule ar
WHERE ar.id = %(id)s
""", params)
Reading the Changes
47
def __init__(self):
self.connection = psycopg2.connect("dbname=schedule",
connection_factory=psycopg2.extras.LogicalReplicationConnection,
)
def __call__(self, msg):
payload = json.loads(msg.payload, strict=False)
print(payload)
self._consume_change(payload)
msg.cursor.send_feedback(flush_lsn=msg.data_start)
Reading the Changes
48
reader = StreamReader()
cursor = reader.connection.cursor()
cursor.start_replication(slot_name='schedule', decode=True)
try:
cursor.consume_stream(reader)
except KeyboardInterrupt:
print("Stopping reader...")
finally:
cursor.close()
reader.connection.close()
print("Exiting reader")
• A consumer of the logical stream can only read one change at a time

• If our processing of a change takes a lot of time, it will create a backlog
of changes

• Backlog means the PostgreSQL server needs to retain more WAL logs

• Retaining too many WAL logs can lead to running out of disk space

• Running out of disk space can lead to...rough times.
The Consumer Bottleneck
49
🌩
🌤
🌥
☁
Can we move any processing to a
separate part of the application?
• Can utilize a durable message queueing system to store any WAL changes
that are necessary to perform post-processing on

• Ensure the changes are worked on in order

• "Divide-and-conquer" workload - have multiple workers acting on
different "topics"

• Remove WAL bloat
Shifting the Workload
51
• Durable message processing and distribution system

• Streams

• Supports parallelization of consumers

• Multiple consumers, partitions

• Highly-available, distributed architecture

• Acknowledgement of receiving, processing messages; can replay (sounds
like WAL?)
Apache Kafka
52
Architecture
53
WAL Consumer
54
import json, sys
from kafka import KafkaProducer
from kafka.errors import KafkaError
import psycopg2
import psycopg2.extras
TABLES = set([
'availability', 'availability_rule', 'room', 'unavailability',
])
reader = WALConsumer()
cursor = reader.connection.cursor()
cursor.start_replication(slot_name='schedule', decode=True)
try:
cursor.consume_stream(reader)
except KeyboardInterrupt:
print("Stopping reader...")
finally:
cursor.close()
reader.connection.close()
print("Exiting reader")
WAL Consumer
55
class WALConsumer(object):
def __init__(self):
self.connection = psycopg2.connect("dbname=realtime",
connection_factory=psycopg2.extras.LogicalReplicationConnection,
)
self.producer = producer = KafkaProducer(
bootstrap_servers=['localhost:9092'],
value_serializer=lambda m: json.dumps(m).encode('ascii'),
)
def __call__(self, msg):
payload = json.loads(msg.payload, strict=False)
print(payload)
# determine if the payload should be passed on to a consumer listening
# to the Kafka que
for data in payload['change']:
if data.get('table') in TABLES:
self.producer.send(data.get('table'), data)
# ensure everything is sent; call flush at this point
self.producer.flush()
# acknowledge that the change has been read - tells PostgreSQL to stop
# holding onto this log file
msg.cursor.send_feedback(flush_lsn=msg.data_start)
Kafka Consumer
56
import json
from kafka import KafkaConsumer
from kafka.structs import OffsetAndMetadata, TopicPartition
import psycopg2
class Worker(object):
"""Base class to work perform any post processing on changes"""
OPERATIONS = set([]) # override with "insert", "update", "delete"
def __init__(self, topic):
# connect to the PostgreSQL database
self.connection = psycopg2.connect("dbname=realtime")
# connect to Kafka
self.consumer = KafkaConsumer(
bootstrap_servers=['localhost:9092'],
value_deserializer=lambda m: json.loads(m.decode('utf8')),
auto_offset_reset="earliest",
group_id='1')
# subscribe to the topic(s)
self.consumer.subscribe(topic if isinstance(topic, list) else [topic])
Kafka Consumer
57
def run(self):
"""Function that runs ad-infinitum"""
# loop through the payloads from the consumer
# determine if there are any follow-up actions based on the kind of
# operation, and if so, act upon it
# always commit when done.
for msg in self.consumer:
print(msg)
# load the data from the message
data = msg.value
# determine if there are any follow-up operations to perform
if data['kind'] in self.OPERATIONS:
# open up a cursor for interacting with PostgreSQL
cursor = self.connection.cursor()
# put the parameters in an easy to digest format
params = dict(zip(data['columnnames'], data['columnvalues']))
# all the function
getattr(self, data['kind'])(cursor, params)
# commit any work that has been done, and close the cursor
self.connection.commit()
cursor.close()
# acknowledge the message has been handled
tp = TopicPartition(msg.topic, msg.partition)
offsets = {tp: OffsetAndMetadata(msg.offset, None)}
self.consumer.commit(offsets=offsets)
Kafka Consumer
58
# override with the appropriate post-processing code
def insert(self, cursor, params):
"""Override with any post-processing to be done on an ``INSERT``"""
raise NotImplementedError()
def update(self, cursor, params):
"""Override with any post-processing to be done on an ``UPDATE``"""
raise NotImplementedError()
def delete(self, cursor, params):
"""Override with any post-processing to be done on an ``DELETE``"""
raise NotImplementedError()
Testing Our Application
• Logical decoding allows the bulk inserts to occur significantly faster from a
transactional view

• DELETEs are tricky if you need to do anything other than using the
PRIMARY KEY

• Can bucket changes by topic

• Potential bottleneck for long running execution, but bottlenecks are
isolated to specific queues
Lessons
60
Conclusion
61
• PostgreSQL is robust

• Triggers will keep your data in sync but can have
significant performance overhead

• Utilizing a logical replication slot can eliminate trigger
overhead and transfer the computational load
elsewhere

• Not a panacea: still need to use good architectural
patterns!
jonathan.katz@crunchydata.com
@jkatz05
Thank You! Questions?
Appendix
Appendix A: Schema for Example
Managing Availability
65
CREATE TABLE room (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE availability_rule (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE,
days_of_week int[] NOT NULL,
start_time time NOT NULL,
end_time time NOT NULL,
generate_weeks_into_future int NOT NULL DEFAULT 52
);
Managing Availability
66
CREATE TABLE availability (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE,
availability_rule_id int NOT NULL
REFERENCES availability_rule (id) ON DELETE CASCADE,
available_date date NOT NULL,
available_range tstzrange NOT NULL
);
CREATE INDEX availability_available_range_gist_idx
ON availability
USING gist(available_range);
Managing Availability
67
CREATE TABLE unavailability (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE,
unavailable_date date NOT NULL,
unavailable_range tstzrange NOT NULL
);
CREATE INDEX unavailability_unavailable_range_gist_idx
ON unavailability
USING gist(unavailable_range);
Managing Availability
68
CREATE TABLE calendar (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE,
status text NOT NULL,
calendar_date date NOT NULL,
calendar_range tstzrange NOT NULL
);
CREATE INDEX calendar_room_id_calendar_date_idx
ON calendar (room_id, calendar_date);
Appendix B:
Finding Availability for a Room
70
/** AVAILABILITY, UNAVAILABILITY, and CALENDAR */
/** We need some lengthy functions to help generate the calendar */
71
/** Helper function: generate the available chunks of time within a block of time for a day within a calendar */
CREATE OR REPLACE FUNCTION calendar_generate_available(room_id int, calendar_range tstzrange)
RETURNS TABLE(status text, calendar_range tstzrange)
AS $$
WITH RECURSIVE availables AS (
SELECT
'closed' AS left_status,
CASE
WHEN availability.id IS NULL THEN tstzrange(calendar_date, calendar_date + '1 day'::interval)
ELSE
tstzrange(
calendar_date,
lower(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval))
)
END AS left_range,
CASE isempty(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval))
WHEN TRUE THEN 'closed'
ELSE 'available'
END AS center_status,
availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval) AS center_range,
'closed' AS right_status,
CASE
WHEN availability.id IS NULL THEN tstzrange(calendar_date, calendar_date + '1 day'::interval)
ELSE
tstzrange(
upper(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval)),
calendar_date + '1 day'::interval
)
END AS right_range
FROM generate_series(lower($2), upper($2), '1 day'::interval) AS calendar_date
LEFT OUTER JOIN availability ON
availability.room_id = $1 AND
availability.available_range && $2
UNION
SELECT
'closed' AS left_status,
CASE
WHEN availability.available_range && availables.left_range THEN
tstzrange(
lower(availables.left_range),
lower(availables.left_range * availability.available_range)
)
ELSE
tstzrange(
lower(availables.right_range),
lower(availables.right_range * availability.available_range)
)
END AS left_range,
CASE
WHEN
availability.available_range && availables.left_range OR
availability.available_range && availables.right_range
THEN 'available'
ELSE 'closed'
END AS center_status,
CASE
WHEN availability.available_range && availables.left_range THEN
availability.available_range * availables.left_range
ELSE
availability.available_range * availables.right_range
END AS center_range,
'closed' AS right_status,
CASE
WHEN availability.available_range && availables.left_range THEN
tstzrange(
upper(availables.left_range * availability.available_range),
upper(availables.left_range)
)
ELSE
tstzrange(
upper(availables.right_range * availability.available_range),
upper(availables.right_range)
)
END AS right_range
FROM availables
JOIN availability ON
availability.room_id = $1 AND
availability.available_range && $2 AND
availability.available_range <> availables.center_range AND (
availability.available_range && availables.left_range OR
availability.available_range && availables.right_range
)
)
SELECT *
FROM (
SELECT
x.left_status AS status,
x.left_range AS calendar_range
FROM availables x
LEFT OUTER JOIN availables y ON
x.left_range <> y.left_range AND
x.left_range @> y.left_range
GROUP BY 1, 2
HAVING NOT bool_or(COALESCE(x.left_range @> y.left_range, FALSE))
UNION
SELECT DISTINCT
x.center_status AS status,
x.center_range AS calendar_range
FROM availables x
UNION
SELECT
x.right_status AS status,
x.right_range AS calendar_range
FROM availables x
LEFT OUTER JOIN availables y ON
x.right_range <> y.right_range AND
x.right_range @> y.right_range
GROUP BY 1, 2
HAVING NOT bool_or(COALESCE(x.right_range @> y.right_range, FALSE))
) x
WHERE
NOT isempty(x.calendar_range) AND
NOT lower_inf(x.calendar_range) AND
NOT upper_inf(x.calendar_range) AND
x.calendar_range <@ $2
$$ LANGUAGE SQL STABLE;
This is the first of two

helpers functions...
• We will have two availability rules:

• Open every day 8am - 8pm

• Open every day 9pm - 10:30pm
For this experiment
72
73
INSERT INTO room (name) VALUES ('Test Room');
INSERT INTO availability_rule
(room_id, days_of_week, start_time, end_time)
VALUES
(1, ARRAY[1,2,3,4,5,6,7], '08:00', '20:00'),
(1, ARRAY[1,2,3,4,5,6,7], '21:00', '22:30');
74
/** Helper function: generate the available chunks of time within a
block of time for a day within a calendar */
CREATE OR REPLACE FUNCTION calendar_generate_available(room_id int,
calendar_range tstzrange)
RETURNS TABLE(status text, calendar_range tstzrange)
AS $$
75
WITH RECURSIVE availables AS (
SELECT
'closed' AS left_status,
CASE
WHEN availability.id IS NULL THEN tstzrange(calendar_date, calendar_date + '1
day'::interval)
ELSE
tstzrange(
calendar_date,
lower(availability.available_range * tstzrange(calendar_date, calendar_date +
'1 day'::interval))
)
END AS left_range,
CASE isempty(availability.available_range * tstzrange(calendar_date, calendar_date + '1
day'::interval))
WHEN TRUE THEN 'closed'
ELSE 'available'
END AS center_status,
availability.available_range * tstzrange(calendar_date, calendar_date + '1
day'::interval) AS center_range,
'closed' AS right_status,
CASE
WHEN availability.id IS NULL THEN tstzrange(calendar_date, calendar_date + '1
day'::interval)
ELSE
tstzrange(
upper(availability.available_range * tstzrange(calendar_date, calendar_date +
'1 day'::interval)),
calendar_date + '1 day'::interval
)
END AS right_range
FROM generate_series(lower($2), upper($2), '1 day'::interval) AS calendar_date
LEFT OUTER JOIN availability ON
availability.room_id = $1 AND
availability.available_range && $2
76
77
UNION
SELECT
'closed' AS left_status,
CASE
WHEN availability.available_range && availables.left_range THEN
tstzrange(
lower(availables.left_range),
lower(availables.left_range * availability.available_range)
)
ELSE
tstzrange(
lower(availables.right_range),
lower(availables.right_range * availability.available_range)
)
END AS left_range,
CASE
WHEN
availability.available_range && availables.left_range OR
availability.available_range && availables.right_range
THEN 'available'
ELSE 'closed'
END AS center_status,
CASE
WHEN availability.available_range && availables.left_range THEN
availability.available_range * availables.left_range
ELSE
availability.available_range * availables.right_range
END AS center_range,
'closed' AS right_status,
CASE
WHEN availability.available_range && availables.left_range THEN
tstzrange(
upper(availables.left_range * availability.available_range),
upper(availables.left_range)
)
ELSE
tstzrange(
upper(availables.right_range * availability.available_range),
upper(availables.right_range)
)
END AS right_range
FROM availables
JOIN availability ON
availability.room_id = $1 AND
availability.available_range && $2 AND
availability.available_range <> availables.center_range AND (
availability.available_range && availables.left_range OR
availability.available_range && availables.right_range
))
78
UNION
SELECT
...
FROM availables
JOIN availability ON
availability.room_id = $1 AND
availability.available_range && $2 AND
availability.available_range <> availables.center_range AND (
availability.available_range && availables.left_range OR
availability.available_range && availables.right_range
)
79
80
'closed' AS left_status,
CASE
WHEN availability.available_range && availables.left_range THEN
tstzrange(
lower(availables.left_range),
lower(availables.left_range * availability.available_range)
)
ELSE
tstzrange(
lower(availables.right_range),
lower(availables.right_range * availability.available_range)
)
END AS left_range,
CASE
WHEN
availability.available_range && availables.left_range OR
availability.available_range && availables.right_range
THEN 'available'
ELSE 'closed'
END AS center_status,
CASE
WHEN availability.available_range && availables.left_range THEN
availability.available_range * availables.left_range
ELSE
availability.available_range * availables.right_range
END AS center_range,
'closed' AS right_status,
CASE
WHEN availability.available_range && availables.left_range THEN
tstzrange(
upper(availables.left_range * availability.available_range),
upper(availables.left_range)
)
ELSE
tstzrange(
upper(availables.right_range * availability.available_range),
upper(availables.right_range)
)
END AS right_range
81
82
83
SELECT *
FROM (
SELECT
x.left_status AS status,
x.left_range AS calendar_range
FROM availables x
LEFT OUTER JOIN availables y ON
x.left_range <> y.left_range AND
x.left_range @> y.left_range
GROUP BY 1, 2
HAVING NOT bool_or(COALESCE(x.left_range @> y.left_range, FALSE))
UNION
SELECT DISTINCT
x.center_status AS status,
x.center_range AS calendar_range
FROM availables x
UNION
SELECT
x.right_status AS status,
x.right_range AS calendar_range
FROM availables x
LEFT OUTER JOIN availables y ON
x.right_range <> y.right_range AND
x.right_range @> y.right_range
GROUP BY 1, 2
HAVING NOT bool_or(COALESCE(x.right_range @> y.right_range, FALSE))
) x
WHERE
NOT isempty(x.calendar_range) AND
NOT lower_inf(x.calendar_range) AND
NOT upper_inf(x.calendar_range) AND
x.calendar_range <@ $2
$$ LANGUAGE SQL STABLE;
84
X
X
X
X
X
X X
85
1 of 85

Recommended

Build a Complex, Realtime Data Management App with Postgres 14! by
Build a Complex, Realtime Data Management App with Postgres 14!Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Jonathan Katz
475 views57 slides
Looking ahead at PostgreSQL 15 by
Looking ahead at PostgreSQL 15Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15Jonathan Katz
695 views48 slides
Understanding SQL Trace, TKPROF and Execution Plan for beginners by
Understanding SQL Trace, TKPROF and Execution Plan for beginnersUnderstanding SQL Trace, TKPROF and Execution Plan for beginners
Understanding SQL Trace, TKPROF and Execution Plan for beginnersCarlos Sierra
4.2K views42 slides
SQLd360 by
SQLd360SQLd360
SQLd360Mauro Pagano
1.3K views45 slides
Pivoting Data with SparkSQL by Andrew Ray by
Pivoting Data with SparkSQL by Andrew RayPivoting Data with SparkSQL by Andrew Ray
Pivoting Data with SparkSQL by Andrew RaySpark Summit
16.4K views33 slides
NOSQL Database: Apache Cassandra by
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraFolio3 Software
2.2K views48 slides

More Related Content

What's hot

Oracle Performance Tuning Fundamentals by
Oracle Performance Tuning FundamentalsOracle Performance Tuning Fundamentals
Oracle Performance Tuning FundamentalsCarlos Sierra
13.1K views60 slides
How to find what is making your Oracle database slow by
How to find what is making your Oracle database slowHow to find what is making your Oracle database slow
How to find what is making your Oracle database slowSolarWinds
5.9K views50 slides
PostgreSQL Database Slides by
PostgreSQL Database SlidesPostgreSQL Database Slides
PostgreSQL Database Slidesmetsarin
5.3K views124 slides
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve... by
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Aaron Shilo
2.1K views120 slides
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in... by
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxData
3.6K views55 slides
The PostgreSQL Query Planner by
The PostgreSQL Query PlannerThe PostgreSQL Query Planner
The PostgreSQL Query PlannerCommand Prompt., Inc
8.5K views32 slides

What's hot(20)

Oracle Performance Tuning Fundamentals by Carlos Sierra
Oracle Performance Tuning FundamentalsOracle Performance Tuning Fundamentals
Oracle Performance Tuning Fundamentals
Carlos Sierra13.1K views
How to find what is making your Oracle database slow by SolarWinds
How to find what is making your Oracle database slowHow to find what is making your Oracle database slow
How to find what is making your Oracle database slow
SolarWinds5.9K views
PostgreSQL Database Slides by metsarin
PostgreSQL Database SlidesPostgreSQL Database Slides
PostgreSQL Database Slides
metsarin5.3K views
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve... by Aaron Shilo
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Aaron Shilo2.1K views
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in... by InfluxData
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxData3.6K views
How to Take Advantage of Optimizer Improvements in MySQL 8.0 by Norvald Ryeng
How to Take Advantage of Optimizer Improvements in MySQL 8.0How to Take Advantage of Optimizer Improvements in MySQL 8.0
How to Take Advantage of Optimizer Improvements in MySQL 8.0
Norvald Ryeng1.2K views
Introduction to Oracle Data Guard Broker by Zohar Elkayam
Introduction to Oracle Data Guard BrokerIntroduction to Oracle Data Guard Broker
Introduction to Oracle Data Guard Broker
Zohar Elkayam674 views
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J... by Databricks
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Databricks98.6K views
Ms sql server architecture by Ajeet Singh
Ms sql server architectureMs sql server architecture
Ms sql server architecture
Ajeet Singh64.3K views
[Pgday.Seoul 2018] 이기종 DB에서 PostgreSQL로의 Migration을 위한 DB2PG by PgDay.Seoul
[Pgday.Seoul 2018]  이기종 DB에서 PostgreSQL로의 Migration을 위한 DB2PG[Pgday.Seoul 2018]  이기종 DB에서 PostgreSQL로의 Migration을 위한 DB2PG
[Pgday.Seoul 2018] 이기종 DB에서 PostgreSQL로의 Migration을 위한 DB2PG
PgDay.Seoul1.9K views
Hyperspace for Delta Lake by Databricks
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
Databricks560 views
게임을 위한 DynamoDB 사례 및 팁 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming by Amazon Web Services Korea
게임을 위한 DynamoDB 사례 및 팁 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming게임을 위한 DynamoDB 사례 및 팁 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
게임을 위한 DynamoDB 사례 및 팁 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
Oracle Performance Tools of the Trade by Carlos Sierra
Oracle Performance Tools of the TradeOracle Performance Tools of the Trade
Oracle Performance Tools of the Trade
Carlos Sierra11.1K views
Oracle RAC features on Exadata by Anil Nair
Oracle RAC features on ExadataOracle RAC features on Exadata
Oracle RAC features on Exadata
Anil Nair6.7K views
Introducing DataFrames in Spark for Large Scale Data Science by Databricks
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
Databricks41K views
PostgreSQL Tutorial For Beginners | Edureka by Edureka!
PostgreSQL Tutorial For Beginners | EdurekaPostgreSQL Tutorial For Beginners | Edureka
PostgreSQL Tutorial For Beginners | Edureka
Edureka!1.1K views
Open Source 101 2022 - MySQL Indexes and Histograms by Frederic Descamps
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and Histograms
Frederic Descamps358 views
Deep Dive: Memory Management in Apache Spark by Databricks
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
Databricks14.5K views

Similar to Building a Complex, Real-Time Data Management Application

Scheduling in Linux and Web Servers by
Scheduling in Linux and Web ServersScheduling in Linux and Web Servers
Scheduling in Linux and Web ServersDavid Evans
6.3K views63 slides
Spark etl by
Spark etlSpark etl
Spark etlImran Rashid
2.8K views40 slides
Intro.ppt by
Intro.pptIntro.ppt
Intro.pptAnonymous9etQKwW
2 views56 slides
Intro_2.ppt by
Intro_2.pptIntro_2.ppt
Intro_2.pptMumitAhmed1
5 views56 slides
Data herding by
Data herdingData herding
Data herdingunbracketed
176 views75 slides
Data herding by
Data herdingData herding
Data herdingunbracketed
904 views75 slides

Similar to Building a Complex, Real-Time Data Management Application(20)

Scheduling in Linux and Web Servers by David Evans
Scheduling in Linux and Web ServersScheduling in Linux and Web Servers
Scheduling in Linux and Web Servers
David Evans6.3K views
Benchmarking at Parse by Travis Redman
Benchmarking at ParseBenchmarking at Parse
Benchmarking at Parse
Travis Redman1.2K views
Advanced Benchmarking at Parse by MongoDB
Advanced Benchmarking at ParseAdvanced Benchmarking at Parse
Advanced Benchmarking at Parse
MongoDB1.8K views
Performance Tuning and Optimization by MongoDB
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and Optimization
MongoDB3.9K views
Benchmarking Solr Performance at Scale by thelabdude
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
thelabdude18.2K views
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs by Zohar Elkayam
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsOracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Zohar Elkayam3.2K views
Migration to ClickHouse. Practical guide, by Alexander Zaitsev by Altinity Ltd
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Altinity Ltd9.4K views
ElasticSearch as (only) datastore by Tomas Sirny
ElasticSearch as (only) datastoreElasticSearch as (only) datastore
ElasticSearch as (only) datastore
Tomas Sirny1.5K views
From Pipelines to Refineries: scaling big data applications with Tim Hunter by Databricks
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim Hunter
Databricks2K views
Big data week presentation by Joseph Adler
Big data week presentationBig data week presentation
Big data week presentation
Joseph Adler1.1K views

More from Jonathan Katz

Vectors are the new JSON in PostgreSQL by
Vectors are the new JSON in PostgreSQLVectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLJonathan Katz
1.2K views27 slides
High Availability PostgreSQL on OpenShift...and more! by
High Availability PostgreSQL on OpenShift...and more!High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!Jonathan Katz
548 views17 slides
Get Your Insecure PostgreSQL Passwords to SCRAM by
Get Your Insecure PostgreSQL Passwords to SCRAMGet Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMJonathan Katz
1.4K views121 slides
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM by
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMSafely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMJonathan Katz
1.7K views121 slides
Operating PostgreSQL at Scale with Kubernetes by
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesJonathan Katz
2.7K views28 slides
Using PostgreSQL With Docker & Kubernetes - July 2018 by
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Jonathan Katz
930 views32 slides

More from Jonathan Katz(12)

Vectors are the new JSON in PostgreSQL by Jonathan Katz
Vectors are the new JSON in PostgreSQLVectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQL
Jonathan Katz1.2K views
High Availability PostgreSQL on OpenShift...and more! by Jonathan Katz
High Availability PostgreSQL on OpenShift...and more!High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!
Jonathan Katz548 views
Get Your Insecure PostgreSQL Passwords to SCRAM by Jonathan Katz
Get Your Insecure PostgreSQL Passwords to SCRAMGet Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAM
Jonathan Katz1.4K views
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM by Jonathan Katz
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMSafely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
Jonathan Katz1.7K views
Operating PostgreSQL at Scale with Kubernetes by Jonathan Katz
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with Kubernetes
Jonathan Katz2.7K views
Using PostgreSQL With Docker & Kubernetes - July 2018 by Jonathan Katz
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018
Jonathan Katz930 views
An Introduction to Using PostgreSQL with Docker & Kubernetes by Jonathan Katz
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & Kubernetes
Jonathan Katz2.3K views
Developing and Deploying Apps with the Postgres FDW by Jonathan Katz
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
Jonathan Katz4.5K views
On Beyond (PostgreSQL) Data Types by Jonathan Katz
On Beyond (PostgreSQL) Data TypesOn Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data Types
Jonathan Katz3.2K views
Accelerating Local Search with PostgreSQL (KNN-Search) by Jonathan Katz
Accelerating Local Search with PostgreSQL (KNN-Search)Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)
Jonathan Katz5.9K views
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies by Jonathan Katz
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Jonathan Katz18K views
Indexing Complex PostgreSQL Data Types by Jonathan Katz
Indexing Complex PostgreSQL Data TypesIndexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data Types
Jonathan Katz17.1K views

Recently uploaded

Future of Learning - Yap Aye Wee.pdf by
Future of Learning - Yap Aye Wee.pdfFuture of Learning - Yap Aye Wee.pdf
Future of Learning - Yap Aye Wee.pdfNUS-ISS
41 views11 slides
Igniting Next Level Productivity with AI-Infused Data Integration Workflows by
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Safe Software
225 views86 slides
Throughput by
ThroughputThroughput
ThroughputMoisés Armani Ramírez
36 views11 slides
RADIUS-Omnichannel Interaction System by
RADIUS-Omnichannel Interaction SystemRADIUS-Omnichannel Interaction System
RADIUS-Omnichannel Interaction SystemRADIUS
15 views21 slides
Five Things You SHOULD Know About Postman by
Five Things You SHOULD Know About PostmanFive Things You SHOULD Know About Postman
Five Things You SHOULD Know About PostmanPostman
27 views43 slides
Voice Logger - Telephony Integration Solution at Aegis by
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at AegisNirmal Sharma
17 views1 slide

Recently uploaded(20)

Future of Learning - Yap Aye Wee.pdf by NUS-ISS
Future of Learning - Yap Aye Wee.pdfFuture of Learning - Yap Aye Wee.pdf
Future of Learning - Yap Aye Wee.pdf
NUS-ISS41 views
Igniting Next Level Productivity with AI-Infused Data Integration Workflows by Safe Software
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software225 views
RADIUS-Omnichannel Interaction System by RADIUS
RADIUS-Omnichannel Interaction SystemRADIUS-Omnichannel Interaction System
RADIUS-Omnichannel Interaction System
RADIUS15 views
Five Things You SHOULD Know About Postman by Postman
Five Things You SHOULD Know About PostmanFive Things You SHOULD Know About Postman
Five Things You SHOULD Know About Postman
Postman27 views
Voice Logger - Telephony Integration Solution at Aegis by Nirmal Sharma
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at Aegis
Nirmal Sharma17 views
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV by Splunk
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
Splunk88 views
Web Dev - 1 PPT.pdf by gdsczhcet
Web Dev - 1 PPT.pdfWeb Dev - 1 PPT.pdf
Web Dev - 1 PPT.pdf
gdsczhcet55 views
Combining Orchestration and Choreography for a Clean Architecture by ThomasHeinrichs1
Combining Orchestration and Choreography for a Clean ArchitectureCombining Orchestration and Choreography for a Clean Architecture
Combining Orchestration and Choreography for a Clean Architecture
ThomasHeinrichs169 views
How the World's Leading Independent Automotive Distributor is Reinventing Its... by NUS-ISS
How the World's Leading Independent Automotive Distributor is Reinventing Its...How the World's Leading Independent Automotive Distributor is Reinventing Its...
How the World's Leading Independent Automotive Distributor is Reinventing Its...
NUS-ISS15 views
.conf Go 2023 - Data analysis as a routine by Splunk
.conf Go 2023 - Data analysis as a routine.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine
Splunk93 views
Beyond the Hype: What Generative AI Means for the Future of Work - Damien Cum... by NUS-ISS
Beyond the Hype: What Generative AI Means for the Future of Work - Damien Cum...Beyond the Hype: What Generative AI Means for the Future of Work - Damien Cum...
Beyond the Hype: What Generative AI Means for the Future of Work - Damien Cum...
NUS-ISS34 views
AI: mind, matter, meaning, metaphors, being, becoming, life values by Twain Liu 刘秋艳
AI: mind, matter, meaning, metaphors, being, becoming, life valuesAI: mind, matter, meaning, metaphors, being, becoming, life values
AI: mind, matter, meaning, metaphors, being, becoming, life values
handbook for web 3 adoption.pdf by Liveplex
handbook for web 3 adoption.pdfhandbook for web 3 adoption.pdf
handbook for web 3 adoption.pdf
Liveplex19 views
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen... by NUS-ISS
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...
Upskilling the Evolving Workforce with Digital Fluency for Tomorrow's Challen...
NUS-ISS28 views
Understanding GenAI/LLM and What is Google Offering - Felix Goh by NUS-ISS
Understanding GenAI/LLM and What is Google Offering - Felix GohUnderstanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix Goh
NUS-ISS41 views

Building a Complex, Real-Time Data Management Application

  • 1. Let's Build a Complex, Real-Time Data Management Application JONATHAN S. KATZ PGCONF.EU 2018 OCTOBER 25, 2018 ...before the session ends!
  • 2. About Crunchy Data 2 Market Leading Data Security • Crunchy Certified PostgreSQL is open source and Common Criteria EAL 2+ Certified, with essential security enhancements for enterprise deployment • Author of the DISA Secure Technology Implementation Guide for PostgreSQL and co-author of CIS PostgreSQL Benchmark. Move ATO from weeks to days! Cloud Ready Data Management • Open source, Kubernetes-based solutions proven to scale to 1000s of database instances • Cloud-agnostic technology provide flexibility on how to deploy databases to public clouds, private clouds, or on-premise technology Leader in Open Source Enterprise PostgreSQL • Developer of essential open source tools for high availability, disaster recovery, and and monitoring for PostgreSQL • Leading contributor and sponsor of features that enhance stability, security, and performance of PostgreSQL
  • 3. • Director of Communications, Crunchy Data • Previously: Engineering leadership in startups • Longtime PostgreSQL community contributor • Advocacy & various committees for PGDG • @postgresql + .org content • Director, PgUS • Co-Organizer, NYCPUG • Conference organization + speaking • @jkatz05 About Me 3
  • 4. • This talk introduces many different tools and techniques available in PostgreSQL for building applications • Introduces different features and where to find out more information • We have a lot of material to cover in a short time - the slides and demonstrations will be made available How to Approach This Talk 4
  • 5. • Imagine we are managing the rooms at the Marriott Lisbon Hotel • We have a set of operating hours in which the rooms can be booked • Only one booking can occur in the room at a given time The Problem 5
  • 7. • We need to know... • All the rooms that are available to book • When the rooms are available to be booked (operating hours) • When the rooms have been booked • And... • The system needs to be able to CRUD fast • (Create, Read, Update, Delete. Fast). Application Requirements 7
  • 9. First, let's talk about how we can find availability
  • 10. • Availability can be thought about in three ways: • Closed • Available • Unavailable (or "booked") • Our ultimate "calendar tuple" is (room, status, range) Managing Availability 10
  • 11. • PostgreSQL 9.2 introduced "range types" that included the ability to store and efficiently search over ranges of data • Built-in: • Date, Timestamps • Integer, Numeric • Lookups (e.g. overlaps) can be sped up using GiST and SP-GiST indexes PostgreSQL Range Types 11 SELECT tstzrange('2018-10-26 09:30'::timestamptz, '2018-10-26 10:30'::timestamptz);
  • 13. Availability 13 SELECT * FROM ( VALUES ('closed', tstzrange('2018-10-26 0:00', '2018-10-26 8:00')), ('available', tstzrange('2018-10-26 08:00', '2018-10-26 09:30')), ('unavailable', tstzrange('2018-10-26 09:30', '2018-10-26 10:30')), ('available', tstzrange('2018-10-26 10:30', '2018-10-26 16:30')), ('unavailable', tstzrange('2018-10-26 16:30', '2018-10-26 18:30')), ('available', tstzrange('2018-10-26 18:30', '2018-10-26 20:00')), ('closed', tstzrange('2018-10-26 20:00', '2018-10-27 0:00')) ) x(status, calendar_range) ORDER BY lower(x.calendar_range);
  • 15. • Insert new ranges and dividing them up • PostgreSQL does not work well with discontiguous ranges (...yet) • Availability • Just for one day - what about other days? • What happens with data in the past? • What happens with data in the future? • Unavailability • Ensure no double-bookings • Overlapping Events? • Just one space But... 15
  • 16. availability_rule id <serial> PRIMARY KEY room_id <int> REFERENCES (room) days_of_week <int[]> start_time <time> end_time <time> generate_weeks_into_future <int> DEFAULT 52 room id <serial> PRIMARY KEY name <text> availability id <serial> PRIMARY KEY room_id <int> REFERENCES (room) availability_rule_id <int> REFERENCES (availabilityrule) available_date <date> available_range <tstzrange> unavailability id <serial> PRIMARY KEY room_id <int> REFERENCES (room) unavailable_date <date> unavailable_range <tstzrange> calendar id <serial> PRIMARY KEY room_id <int> REFERENCES (room) status <text> DOMAIN: {available, unavailable, closed} calendar_date <date> calendar_range <tstzrange> Managing Availability 16
  • 17. • We can now store data, but what about: • Generating initial calendar? • Generating availability based on rules? • Generating unavailability? • Sounds like we need to build an application Managing Availability 17
  • 18. • To build our application, there are a few topics we will need to explore first: • generate_series • Recursive queries • SQL Functions • Set returning functions • PL/pgsql • Triggers Managing Availability 18
  • 19. • Generate series is a "set returning" function, i.e. a function that can return multiple rows of data • Generate series can return: • A set of numbers (int, bigint, numeric) either incremented by 1 or some other integer interval • A set of timestamps incremented by a time interval(!!) generate_series: More than just generating test data 19 SELECT x::date FROM generate_series( '2018-01-01'::date, '2018-12-31'::date, '1 day'::interval ) x;
  • 20. • PostgreSQL 8.4 introduced the "WITH" syntax and with it also introduced the ability to perform recursive queries • WITH RECURSIVE ... AS () • Base case vs. recursive case • UNION vs. UNION ALL • CAN HIT INFINITE LOOPS Recursion in my SQL? 20
  • 21. Recursion in my SQL? 21 WITH RECURSIVE fac AS ( SELECT 1::numeric AS n, 1::numeric AS i UNION SELECT fac.n * (fac.i + 1), fac.i + 1 AS i FROM fac ) SELECT fac.n, fac.i FROM fac; Nope
  • 22. Recursion in my SQL? 22 WITH RECURSIVE fac AS ( SELECT 1::numeric AS n, 1::numeric AS i UNION SELECT fac.n * (fac.i + 1), fac.i + 1 AS i FROM fac WHERE i + 1 <= 100 ) SELECT fac.n, fac.i FROM fac; Better
  • 23. • PostgreSQL provides the ability to write functions to help encapsulate repeated behavior • PostgreSQL 11 introduces stored procedures which enables you to embed transactions! • SQL functions have many properties, including: • Input / output • Volatility (IMMUTABLE, STABLE, VOLATILE) (default VOLATILE) • Parallel safety (default PARALLEL UNSAFE) • LEAKPROOF; SECURITY DEFINER • Execution Cost • Language type (more on this later) Functions 23
  • 24. Functions 24 CREATE OR REPLACE FUNCTION pgconfeu_fac(n int) RETURNS numeric AS $$ WITH RECURSIVE fac AS ( SELECT 1::numeric AS n, 1::numeric AS i UNION SELECT fac.n * (fac.i + 1), fac.i + 1 AS i FROM fac WHERE i + 1 <= $1 ) SELECT max(fac.n) FROM fac; $$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
  • 25. Functions 25 CREATE OR REPLACE FUNCTION pgconfeu_fac_set(n int) RETURNS SETOF numeric AS $$ WITH RECURSIVE fac AS ( SELECT 1::numeric AS n, 1::numeric AS i UNION SELECT fac.n * (fac.i + 1), fac.i + 1 AS i FROM fac WHERE i + 1 <= $1 ) SELECT fac.n FROM fac ORDER BY fac.n; $$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
  • 26. Functions 26 CREATE OR REPLACE FUNCTION pgopen_fac_table(n int) RETURNS TABLE(n numeric) AS $$ WITH RECURSIVE fac AS ( SELECT 1::numeric AS n, 1::numeric AS i UNION SELECT fac.n * (fac.i + 1), fac.i + 1 AS i FROM fac WHERE i + 1 <= $1 ) SELECT fac.n FROM fac ORDER BY fac.n; $$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
  • 27. • PostgreSQL has the ability to load in procedural languages and execute code in them beyond SQL • "PL" • Built-in: pgSQL, Python, Perl, Tcl • Others: Javascript, R, Java, C, JVM, Container, LOLCODE, Ruby, PHP, Lua, pgPSM, Scheme Procedural Languages 27
  • 28. PL/pgSQL 28 CREATE EXTENSION IF NOT EXISTS plpgsql; CREATE OR REPLACE FUNCTION pgopen_fac_plpgsql(n int) RETURNS numeric AS $$ DECLARE fac numeric; i int; BEGIN fac := 1; FOR i IN 1..n LOOP fac := fac * i; END LOOP; RETURN fac; END; $$ LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE;
  • 29. • Triggers are functions that can be called before/after/instead of an operation or event • Data changes (INSERT/UPDATE/DELETE) • Events (DDL, DCL, etc. changes) • Atomic • Must return "trigger" or "event_trigger" • (Return "NULL" in a trigger if you want to skip operation) • (Gotcha: RETURN OLD [INSERT] / RETURN NEW [DELETE]) • Execute once per modified row or once per SQL statement • Multiple triggers on same event will execute in alphabetical order • Writeable in any PL language that defined trigger interface Triggers 29
  • 31. We will scan through the application code. It will be available for download later ;-)
  • 33. • [Test your live demos before running them, and you will have much success!] • availability_rule inserts took some time, > 500ms • availability: INSERT 52 • calendar: INSERT 52 from nontrivial function • Updates on individual availability / unavailability are not too painful • Lookups are faaaaaaaast Lessons of The Test 33
  • 34. How about at (web) scale?
  • 35. • Even with only 100 more rooms with a few set of rules, rule generation time increased significantly • Lookups are still lightning fast! Web Scale :( 35
  • 36. • Added in PostgreSQL 9.4 • Replays all logical changes made to the database • Create a logical replication slot in your database • Only one receiver can consume changes from one slot at a time • Slot keeps track of last change that was read by a receiver • If receiver disconnects, slot will ensure database holds changes until receiver reconnects • Only changes from tables with primary keys are relayed • As of PostgreSQL 10, you can set a "REPLICA IDENTITY" on a UNIQUE, NOT NULL, non-deferrable, non-partial column(s) • Basis for Logical Replication Logical Decoding 36
  • 37. • A logical replication slot has a name and an output plugin • PostgreSQL comes with the "test" output plugin • Have to write a custom parser to read changes from test output plugin • Several output plugins and libraries available • wal2json: https://github.com/eulerto/wal2json • jsoncdc: https://github.com/posix4e/jsoncdc • Debezium: http://debezium.io/ • (Test: https://www.postgresql.org/docs/11/static/test-decoding.html) • Every change in the database is streamed • Need to be aware of the logical decoding format Logical Decoding Out of the Box 37
  • 38. • C: libpq • pg_recvlogical • PostgreSQL functions • Python: psycopg2 - version 2.7 • JDBC: version 42 • Go: go-pgx • JavaScript: node-postgres (pg-logical-replication) Driver Support 38
  • 39. Using Logical Decoding 39 wal_level = logical max_wal_senders = 2 max_replication_slots = 2 postgresql.conf local replication jkatz trust pg_hba.conf # DEVELOPMENT ONLY SELECT * FROM pg_create_logical_replication_slot('schedule', 'wal2json'); In the database:
  • 40. • We know it takes time to regenerate calendar • Want to ensure changes always propagate but want to ensure all users (managers, calendar searchers) have good experience Thoughts 40 🤔
  • 41. • Will use the same data model as before as well as the same helper functions, but without the triggers • (That's a lie, we will have one set of DELETE triggers as "DELETE" in the wal2json output plugin currently does not provide enough information) Replacing Triggers 41
  • 42. Replacing Triggers 42 /** * Helper function: substitute the data within the `calendar`; this can be used * for all updates that occur on `availability` and `unavailability` */ CREATE OR REPLACE FUNCTION calendar_manage(room_id int, calendar_date date) RETURNS void AS $$ WITH delete_calendar AS ( DELETE FROM calendar WHERE room_id = $1 AND calendar_date = $2 ) INSERT INTO calendar (room_id, status, calendar_date, calendar_range) SELECT $1, c.status, $2, c.calendar_range FROM calendar_generate_calendar($1, tstzrange($2, $2 + 1)) c $$ LANGUAGE SQL;
  • 43. Replacing Triggers 43 /** Now, the trigger functions for availability and unavailability; needs this for DELETE */ CREATE OR REPLACE FUNCTION availability_manage() RETURNS trigger AS $trigger$ BEGIN IF TG_OP = 'DELETE' THEN PERFORM calendar_manage(OLD.room_id, OLD.available_date); RETURN OLD; END IF; END; $trigger$ LANGUAGE plpgsql; CREATE OR REPLACE FUNCTION unavailability_manage() RETURNS trigger AS $trigger$ BEGIN IF TG_OP = 'DELETE' THEN PERFORM calendar_manage(OLD.room_id, OLD.unavailable_date); RETURN OLD; END IF; END; $trigger$ LANGUAGE plpgsql; /** And the triggers, applied to everything */ CREATE TRIGGER availability_manage AFTER DELETE ON availability FOR EACH ROW EXECUTE PROCEDURE availability_manage(); CREATE TRIGGER unavailability_manage AFTER DELETE ON unavailability FOR EACH ROW EXECUTE PROCEDURE unavailability_manage();
  • 44. • We will have a Python script that reads from a logical replication slot and if it detects a relevant change, take an action • Similar to what we did with triggers, but this moves the work to OUTSIDE the transaction • BUT...we can confirm whether or not the work is completed, thus if the program fails, we can restart from last acknowledged transaction ID Replacing Triggers 44
  • 45. Reading the Changes 45 import json import sys import psycopg2 import psycopg2.extras SQL = { 'availability': { 'insert': """SELECT calendar_manage(%(room_id)s, %(available_date)s)""", 'update': """SELECT calendar_manage(%(room_id)s, %(available_date)s)""", }, 'availability_rule': { 'insert': True, 'update': True, }, 'room': { 'insert': """ INSERT INTO calendar (room_id, status, calendar_date, calendar_range) SELECT %(id)s, 'closed', calendar_date, tstzrange(calendar_date, calendar_date + '1 day'::interval) FROM generate_series( date_trunc('week', CURRENT_DATE), date_trunc('week', CURRENT_DATE + '52 weeks'::interval), '1 day'::interval ) calendar_date; """, }, 'unavailability': { 'insert': """SELECT calendar_manage(%(room_id)s, %(unavailable_date)s)""", 'update': """SELECT calendar_manage(%(room_id)s, %(unavailable_date)s)""", }, }
  • 46. Reading the Changes 46 class StreamReader(object): def _consume_change(self, payload): connection = psycopg2.connect("dbname=realtime") cursor = connection.cursor() for data in payload['change']: sql = SQL.get(data.get('table'), {}).get(data.get('kind')) if not sql: return params = dict(zip(data['columnnames'], data['columnvalues'])) if data['table'] == 'availability_rule': self._perform_availability_rule(cursor, data['kind'], params) else: cursor.execute(sql, params) connection.commit() cursor.close() connection.close() def _perform_availability_rule(self, cursor, kind, params): if kind == 'update': cursor.execute("""DELETE FROM availability WHERE availability_rule_id = %(id)s""", params) if kind in ['insert', 'update']: days_of_week = params['days_of_week'].replace('{', '').replace('}', '').split(',') for day_of_week in days_of_week: params['day_of_week'] = day_of_week cursor.execute( """ SELECT availability_rule_bulk_insert(ar, %(day_of_week)s) FROM availability_rule ar WHERE ar.id = %(id)s """, params)
  • 47. Reading the Changes 47 def __init__(self): self.connection = psycopg2.connect("dbname=schedule", connection_factory=psycopg2.extras.LogicalReplicationConnection, ) def __call__(self, msg): payload = json.loads(msg.payload, strict=False) print(payload) self._consume_change(payload) msg.cursor.send_feedback(flush_lsn=msg.data_start)
  • 48. Reading the Changes 48 reader = StreamReader() cursor = reader.connection.cursor() cursor.start_replication(slot_name='schedule', decode=True) try: cursor.consume_stream(reader) except KeyboardInterrupt: print("Stopping reader...") finally: cursor.close() reader.connection.close() print("Exiting reader")
  • 49. • A consumer of the logical stream can only read one change at a time • If our processing of a change takes a lot of time, it will create a backlog of changes • Backlog means the PostgreSQL server needs to retain more WAL logs • Retaining too many WAL logs can lead to running out of disk space • Running out of disk space can lead to...rough times. The Consumer Bottleneck 49 🌩 🌤 🌥 ☁
  • 50. Can we move any processing to a separate part of the application?
  • 51. • Can utilize a durable message queueing system to store any WAL changes that are necessary to perform post-processing on • Ensure the changes are worked on in order • "Divide-and-conquer" workload - have multiple workers acting on different "topics" • Remove WAL bloat Shifting the Workload 51
  • 52. • Durable message processing and distribution system • Streams • Supports parallelization of consumers • Multiple consumers, partitions • Highly-available, distributed architecture • Acknowledgement of receiving, processing messages; can replay (sounds like WAL?) Apache Kafka 52
  • 54. WAL Consumer 54 import json, sys from kafka import KafkaProducer from kafka.errors import KafkaError import psycopg2 import psycopg2.extras TABLES = set([ 'availability', 'availability_rule', 'room', 'unavailability', ]) reader = WALConsumer() cursor = reader.connection.cursor() cursor.start_replication(slot_name='schedule', decode=True) try: cursor.consume_stream(reader) except KeyboardInterrupt: print("Stopping reader...") finally: cursor.close() reader.connection.close() print("Exiting reader")
  • 55. WAL Consumer 55 class WALConsumer(object): def __init__(self): self.connection = psycopg2.connect("dbname=realtime", connection_factory=psycopg2.extras.LogicalReplicationConnection, ) self.producer = producer = KafkaProducer( bootstrap_servers=['localhost:9092'], value_serializer=lambda m: json.dumps(m).encode('ascii'), ) def __call__(self, msg): payload = json.loads(msg.payload, strict=False) print(payload) # determine if the payload should be passed on to a consumer listening # to the Kafka que for data in payload['change']: if data.get('table') in TABLES: self.producer.send(data.get('table'), data) # ensure everything is sent; call flush at this point self.producer.flush() # acknowledge that the change has been read - tells PostgreSQL to stop # holding onto this log file msg.cursor.send_feedback(flush_lsn=msg.data_start)
  • 56. Kafka Consumer 56 import json from kafka import KafkaConsumer from kafka.structs import OffsetAndMetadata, TopicPartition import psycopg2 class Worker(object): """Base class to work perform any post processing on changes""" OPERATIONS = set([]) # override with "insert", "update", "delete" def __init__(self, topic): # connect to the PostgreSQL database self.connection = psycopg2.connect("dbname=realtime") # connect to Kafka self.consumer = KafkaConsumer( bootstrap_servers=['localhost:9092'], value_deserializer=lambda m: json.loads(m.decode('utf8')), auto_offset_reset="earliest", group_id='1') # subscribe to the topic(s) self.consumer.subscribe(topic if isinstance(topic, list) else [topic])
  • 57. Kafka Consumer 57 def run(self): """Function that runs ad-infinitum""" # loop through the payloads from the consumer # determine if there are any follow-up actions based on the kind of # operation, and if so, act upon it # always commit when done. for msg in self.consumer: print(msg) # load the data from the message data = msg.value # determine if there are any follow-up operations to perform if data['kind'] in self.OPERATIONS: # open up a cursor for interacting with PostgreSQL cursor = self.connection.cursor() # put the parameters in an easy to digest format params = dict(zip(data['columnnames'], data['columnvalues'])) # all the function getattr(self, data['kind'])(cursor, params) # commit any work that has been done, and close the cursor self.connection.commit() cursor.close() # acknowledge the message has been handled tp = TopicPartition(msg.topic, msg.partition) offsets = {tp: OffsetAndMetadata(msg.offset, None)} self.consumer.commit(offsets=offsets)
  • 58. Kafka Consumer 58 # override with the appropriate post-processing code def insert(self, cursor, params): """Override with any post-processing to be done on an ``INSERT``""" raise NotImplementedError() def update(self, cursor, params): """Override with any post-processing to be done on an ``UPDATE``""" raise NotImplementedError() def delete(self, cursor, params): """Override with any post-processing to be done on an ``DELETE``""" raise NotImplementedError()
  • 60. • Logical decoding allows the bulk inserts to occur significantly faster from a transactional view • DELETEs are tricky if you need to do anything other than using the PRIMARY KEY • Can bucket changes by topic • Potential bottleneck for long running execution, but bottlenecks are isolated to specific queues Lessons 60
  • 61. Conclusion 61 • PostgreSQL is robust • Triggers will keep your data in sync but can have significant performance overhead • Utilizing a logical replication slot can eliminate trigger overhead and transfer the computational load elsewhere • Not a panacea: still need to use good architectural patterns!
  • 64. Appendix A: Schema for Example
  • 65. Managing Availability 65 CREATE TABLE room ( id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, name text NOT NULL ); CREATE TABLE availability_rule ( id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE, days_of_week int[] NOT NULL, start_time time NOT NULL, end_time time NOT NULL, generate_weeks_into_future int NOT NULL DEFAULT 52 );
  • 66. Managing Availability 66 CREATE TABLE availability ( id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE, availability_rule_id int NOT NULL REFERENCES availability_rule (id) ON DELETE CASCADE, available_date date NOT NULL, available_range tstzrange NOT NULL ); CREATE INDEX availability_available_range_gist_idx ON availability USING gist(available_range);
  • 67. Managing Availability 67 CREATE TABLE unavailability ( id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE, unavailable_date date NOT NULL, unavailable_range tstzrange NOT NULL ); CREATE INDEX unavailability_unavailable_range_gist_idx ON unavailability USING gist(unavailable_range);
  • 68. Managing Availability 68 CREATE TABLE calendar ( id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE, status text NOT NULL, calendar_date date NOT NULL, calendar_range tstzrange NOT NULL ); CREATE INDEX calendar_room_id_calendar_date_idx ON calendar (room_id, calendar_date);
  • 70. 70 /** AVAILABILITY, UNAVAILABILITY, and CALENDAR */ /** We need some lengthy functions to help generate the calendar */
  • 71. 71 /** Helper function: generate the available chunks of time within a block of time for a day within a calendar */ CREATE OR REPLACE FUNCTION calendar_generate_available(room_id int, calendar_range tstzrange) RETURNS TABLE(status text, calendar_range tstzrange) AS $$ WITH RECURSIVE availables AS ( SELECT 'closed' AS left_status, CASE WHEN availability.id IS NULL THEN tstzrange(calendar_date, calendar_date + '1 day'::interval) ELSE tstzrange( calendar_date, lower(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval)) ) END AS left_range, CASE isempty(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval)) WHEN TRUE THEN 'closed' ELSE 'available' END AS center_status, availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval) AS center_range, 'closed' AS right_status, CASE WHEN availability.id IS NULL THEN tstzrange(calendar_date, calendar_date + '1 day'::interval) ELSE tstzrange( upper(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval)), calendar_date + '1 day'::interval ) END AS right_range FROM generate_series(lower($2), upper($2), '1 day'::interval) AS calendar_date LEFT OUTER JOIN availability ON availability.room_id = $1 AND availability.available_range && $2 UNION SELECT 'closed' AS left_status, CASE WHEN availability.available_range && availables.left_range THEN tstzrange( lower(availables.left_range), lower(availables.left_range * availability.available_range) ) ELSE tstzrange( lower(availables.right_range), lower(availables.right_range * availability.available_range) ) END AS left_range, CASE WHEN availability.available_range && availables.left_range OR availability.available_range && availables.right_range THEN 'available' ELSE 'closed' END AS center_status, CASE WHEN availability.available_range && availables.left_range THEN availability.available_range * availables.left_range ELSE availability.available_range * availables.right_range END AS center_range, 'closed' AS right_status, CASE WHEN availability.available_range && availables.left_range THEN tstzrange( upper(availables.left_range * availability.available_range), upper(availables.left_range) ) ELSE tstzrange( upper(availables.right_range * availability.available_range), upper(availables.right_range) ) END AS right_range FROM availables JOIN availability ON availability.room_id = $1 AND availability.available_range && $2 AND availability.available_range <> availables.center_range AND ( availability.available_range && availables.left_range OR availability.available_range && availables.right_range ) ) SELECT * FROM ( SELECT x.left_status AS status, x.left_range AS calendar_range FROM availables x LEFT OUTER JOIN availables y ON x.left_range <> y.left_range AND x.left_range @> y.left_range GROUP BY 1, 2 HAVING NOT bool_or(COALESCE(x.left_range @> y.left_range, FALSE)) UNION SELECT DISTINCT x.center_status AS status, x.center_range AS calendar_range FROM availables x UNION SELECT x.right_status AS status, x.right_range AS calendar_range FROM availables x LEFT OUTER JOIN availables y ON x.right_range <> y.right_range AND x.right_range @> y.right_range GROUP BY 1, 2 HAVING NOT bool_or(COALESCE(x.right_range @> y.right_range, FALSE)) ) x WHERE NOT isempty(x.calendar_range) AND NOT lower_inf(x.calendar_range) AND NOT upper_inf(x.calendar_range) AND x.calendar_range <@ $2 $$ LANGUAGE SQL STABLE; This is the first of two helpers functions...
  • 72. • We will have two availability rules: • Open every day 8am - 8pm • Open every day 9pm - 10:30pm For this experiment 72
  • 73. 73 INSERT INTO room (name) VALUES ('Test Room'); INSERT INTO availability_rule (room_id, days_of_week, start_time, end_time) VALUES (1, ARRAY[1,2,3,4,5,6,7], '08:00', '20:00'), (1, ARRAY[1,2,3,4,5,6,7], '21:00', '22:30');
  • 74. 74 /** Helper function: generate the available chunks of time within a block of time for a day within a calendar */ CREATE OR REPLACE FUNCTION calendar_generate_available(room_id int, calendar_range tstzrange) RETURNS TABLE(status text, calendar_range tstzrange) AS $$
  • 75. 75 WITH RECURSIVE availables AS ( SELECT 'closed' AS left_status, CASE WHEN availability.id IS NULL THEN tstzrange(calendar_date, calendar_date + '1 day'::interval) ELSE tstzrange( calendar_date, lower(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval)) ) END AS left_range, CASE isempty(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval)) WHEN TRUE THEN 'closed' ELSE 'available' END AS center_status, availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval) AS center_range, 'closed' AS right_status, CASE WHEN availability.id IS NULL THEN tstzrange(calendar_date, calendar_date + '1 day'::interval) ELSE tstzrange( upper(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval)), calendar_date + '1 day'::interval ) END AS right_range FROM generate_series(lower($2), upper($2), '1 day'::interval) AS calendar_date LEFT OUTER JOIN availability ON availability.room_id = $1 AND availability.available_range && $2
  • 76. 76
  • 77. 77 UNION SELECT 'closed' AS left_status, CASE WHEN availability.available_range && availables.left_range THEN tstzrange( lower(availables.left_range), lower(availables.left_range * availability.available_range) ) ELSE tstzrange( lower(availables.right_range), lower(availables.right_range * availability.available_range) ) END AS left_range, CASE WHEN availability.available_range && availables.left_range OR availability.available_range && availables.right_range THEN 'available' ELSE 'closed' END AS center_status, CASE WHEN availability.available_range && availables.left_range THEN availability.available_range * availables.left_range ELSE availability.available_range * availables.right_range END AS center_range, 'closed' AS right_status, CASE WHEN availability.available_range && availables.left_range THEN tstzrange( upper(availables.left_range * availability.available_range), upper(availables.left_range) ) ELSE tstzrange( upper(availables.right_range * availability.available_range), upper(availables.right_range) ) END AS right_range FROM availables JOIN availability ON availability.room_id = $1 AND availability.available_range && $2 AND availability.available_range <> availables.center_range AND ( availability.available_range && availables.left_range OR availability.available_range && availables.right_range ))
  • 78. 78 UNION SELECT ... FROM availables JOIN availability ON availability.room_id = $1 AND availability.available_range && $2 AND availability.available_range <> availables.center_range AND ( availability.available_range && availables.left_range OR availability.available_range && availables.right_range )
  • 79. 79
  • 80. 80 'closed' AS left_status, CASE WHEN availability.available_range && availables.left_range THEN tstzrange( lower(availables.left_range), lower(availables.left_range * availability.available_range) ) ELSE tstzrange( lower(availables.right_range), lower(availables.right_range * availability.available_range) ) END AS left_range, CASE WHEN availability.available_range && availables.left_range OR availability.available_range && availables.right_range THEN 'available' ELSE 'closed' END AS center_status, CASE WHEN availability.available_range && availables.left_range THEN availability.available_range * availables.left_range ELSE availability.available_range * availables.right_range END AS center_range, 'closed' AS right_status, CASE WHEN availability.available_range && availables.left_range THEN tstzrange( upper(availables.left_range * availability.available_range), upper(availables.left_range) ) ELSE tstzrange( upper(availables.right_range * availability.available_range), upper(availables.right_range) ) END AS right_range
  • 81. 81
  • 82. 82
  • 83. 83 SELECT * FROM ( SELECT x.left_status AS status, x.left_range AS calendar_range FROM availables x LEFT OUTER JOIN availables y ON x.left_range <> y.left_range AND x.left_range @> y.left_range GROUP BY 1, 2 HAVING NOT bool_or(COALESCE(x.left_range @> y.left_range, FALSE)) UNION SELECT DISTINCT x.center_status AS status, x.center_range AS calendar_range FROM availables x UNION SELECT x.right_status AS status, x.right_range AS calendar_range FROM availables x LEFT OUTER JOIN availables y ON x.right_range <> y.right_range AND x.right_range @> y.right_range GROUP BY 1, 2 HAVING NOT bool_or(COALESCE(x.right_range @> y.right_range, FALSE)) ) x WHERE NOT isempty(x.calendar_range) AND NOT lower_inf(x.calendar_range) AND NOT upper_inf(x.calendar_range) AND x.calendar_range <@ $2 $$ LANGUAGE SQL STABLE;
  • 85. 85