Alex Brasetvik // Senior Principal Architect @ Cognite
Postgres can do THAT?
Survey of select Postgres features, some of which you should
probably learn more about!
▪ Link to in-depth material is posted on the last slide
▪ Many topics could be long talks on their own. I want to
make you interested in these topics, not to cover them
fully (at all!)
▪ Questions? Tweet me @alexbrasetvik
GETTING STARTED
$ whoami
Alex Brasetvik @ Cognite
● Office of the CTO: focusing on
database-related stuff
● Previously co-founded the
Elasticsearch platform that became
Elastic Cloud
● Postgres and Elasticsearch are
some of my favorite hammers
● As a proper cloud engineer, I enjoy
jumping out of planes
AGENDA
generate_series
EXPLAIN
RETURNING
WITH/Common Table Expressions
● Materialized vs inlined CTEs
● Writable CTEs
● Recursive CTEs
What's up in my database? pg_stat_(activity|statements)
Less locky migrations
Index tricks, range types
Deferrable constraints
Exclusion constraints
JSON
tl;dr: Hold on to your butts
generate_series()
Make lots of dummy data
generate_series(1, 1000)
# create table a (id int);
CREATE TABLE
# insert into a select generate_series(1, 100);
INSERT 0 100
# select * from a limit 3;
id
----
1
2
3
(3 rows)
-- A million rows in a few seconds:
# \timing on
# create table some_numbers as
select generate_series(1, 1000*1000) as i;
SELECT 1000000
Time: 2245.545 ms (00:02.246)
# create index on some_numbers(i);
CREATE INDEX
Time: 994.635 ms
● generate_series(start_inclusive, end_inclusive)
● Useful to easily create an arbitrarily large sample set
● As we'll come back to, always test with realistically
sized data sets!
○ Behaviour can drastically change with changes
in data set sizes
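generate_series() also accepts a step argument and timestamp/interval
variants; a minimal sketch (the values are arbitrary):
# select generate_series(1, 10, 2); -- every other number: 1, 3, 5, 7, 9
# select generate_series('2021-01-01'::timestamptz,
                         '2021-01-02'::timestamptz,
                         interval '6 hours'); -- five timestamps, 6 hours apart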
(Lines prefixed with # are psql commands; the lines below them are their output.)
EXPLAIN
Asking Postgres to detail its plans
Please EXPLAIN
# explain select * from a where id=42;
-- QUERY PLAN --
Seq Scan on a (cost=0.00..2.25 rows=1 width=4)
Filter: (id = 42)
# explain analyze select * from a where id=42;
-- QUERY PLAN --
Seq Scan on a (cost=0.00..2.25 rows=1 width=4)
(actual time=0.034..0.041 rows=1 loops=1)
Filter: (id = 42)
Rows Removed by Filter: 99
Planning Time: 0.137 ms
Execution Time: 0.057 ms
● EXPLAIN [query goes here] shows the plan of the
statement without executing the query.
● EXPLAIN ANALYZE [query] executes the query while
profiling it, emitting the plan with profiling information.
Analyze = Execute
Please EXPLAIN
# explain select * from a where id=42;
-- QUERY PLAN --
Seq Scan on a (cost=0.00..2.25 rows=1 width=4)
Filter: (id = 42)
# explain analyze select * from a where id=42;
-- QUERY PLAN --
Seq Scan on a (cost=0.00..2.25 rows=1 width=4)
(actual time=0.034..0.041 rows=1 loops=1)
Filter: (id = 42)
Rows Removed by Filter: 99
Planning Time: 0.137 ms
Execution Time: 0.057 ms
Scan the entire table
Then remove rows
# create index on a(id);
CREATE INDEX
# explain (analyze true, verbose true, buffers true)
select * from a where id=42;
-- QUERY PLAN --
Seq Scan on public.a (cost=0.00..2.25 rows=1 width=4)
(actual time=0.019..0.027 rows=1 loops=1)
Output: id
Filter: (a.id = 42)
Rows Removed by Filter: 99
Buffers: shared hit=1
EXPLAIN options; FORMAT JSON can be useful too.
Index not
used…?
Let's create an index
Postgres reads 1 page. That
plan is impossible to beat. It
knows there's hardly any data in
the table
# set enable_seqscan to off; -- Disable "seq scans" if possible, for testing
SET
# explain (analyze true, verbose true, buffers true)
select * from a where id=42;
-- QUERY PLAN --
Index Only Scan using a_id_idx on public.a (cost=0.14..8.16 rows=1 width=4)
(actual time=0.015..0.016 rows=1 loops=1)
Output: id
Index Cond: (a.id = 42)
Heap Fetches: 1
Buffers: shared hit=2
Planner setting
# set enable_seqscan to on; -- Revert to default setting
SET
# insert into a select generate_series(101, 1000*1000); -- 1 million rows in total
INSERT 0 999900
# explain (analyze true, verbose true, buffers true) select * from a where id=42;
-- QUERY PLAN --
Index Only Scan using a_id_idx on public.a (cost=0.42..8.44 rows=1 width=4)
(actual time=0.045..0.050 rows=1 loops=1)
Output: id
Index Cond: (a.id = 42)
Heap Fetches: 1
Buffers: shared hit=2
Planning Time: 0.074 ms
Execution Time: 0.075 ms
More data in the table. Picks a plan
with the index.
# drop index a_id_idx; -- Blow away the index, forcing a seq scan
DROP INDEX
# explain (analyze true, verbose true, buffers true) select * from a where id=42;
-- QUERY PLAN --
Gather (cost=1000.00..10634.90 rows=1 width=4)
(actual time=0.170..102.491 rows=1 loops=1)
Output: id
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=4426
→ Parallel Seq Scan on public.a (cost=0.00..9634.80 rows=1 width=4)
(actual time=50.188..77.625 rows=1 loops=3)
Output: id
Filter: (a.id = 42)
Rows Removed by Filter: 333363
Buffers: shared hit=4426
Worker 0: actual time=75.269..75.270 rows=0 loops=1
Buffers: shared hit=1326
Worker 1: actual time=75.277..75.277 rows=0 loops=1
Buffers: shared hit=1334
Planning Time: 0.180 ms
Execution Time: 102.523 ms (vs 0.075 ms with the index)
Visualizing plans
● Spotting performance problems in text can be hard,
especially as queries grow larger
● Useful visualization tools:
○ explain.dalibo.com
○ explain.depesz.com
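These tools accept the plain-text plan; a JSON plan is also easy to produce
if you prefer to feed tooling. A minimal sketch, reusing the earlier table a:
# explain (analyze true, buffers true, format json)
select * from a where id=42;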
Explaining writes
# create table foo(id int primary key);
CREATE TABLE
# insert into foo select generate_series(1, 1000*1000);
INSERT 0 1000000
# create table bar(id int primary key, foo int references foo(id) on delete cascade on update cascade);
CREATE TABLE
# insert into bar select generate_series(1, 1000*1000), generate_series(1, 1000*1000);
INSERT 0 1000000
-- foo and bar both have 1 million rows
-- bar.foo points to foo.id
# delete from bar where id = 42; -- This is fast
DELETE 1
Time: 1.130 ms
# delete from foo where id=1000; -- This is slooow
DELETE 1
Time: 405.049 ms
foo
id int
bar
id int
foo int
# explain (analyze true, verbose true, buffers true)
delete from foo where id=1000;
-- QUERY PLAN --
Delete on public.foo (cost=0.42..8.44 rows=1 width=6) (actual time=18.937..18.939 rows=0 loops=1)
Buffers: shared hit=3 read=6 dirtied=2
-> Index Scan using foo_pkey on public.foo (cost=0.42..8.44 rows=1 width=6)
(actual time=0.481..0.485 rows=1 loops=1)
Output: ctid
Index Cond: (foo.id = 1000)
Buffers: shared hit=1 read=3
Planning Time: 1.081 ms
Trigger RI_ConstraintTrigger_a_26211344
for constraint bar_foo_fkey: time=379.013 calls=1
Execution Time: 398.084 ms
foo
id int
bar
id int
foo int No index
on bar.foo
The cascading delete is very slow!
Trigger for reverse FK must scan all
of bar per deleted row.
(Note that the analyze causes the write to go through! "analyze" = "execute")
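The fix implied above is to index the referencing column; a minimal sketch
(the index gets Postgres' default name, and the deleted id is arbitrary):
# create index on bar(foo); -- lets the FK trigger use an index scan instead of scanning all of bar
CREATE INDEX
# delete from foo where id=1001; -- the cascade should now be fast
DELETE 1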
EXPLAIN that again?
● Test with real-sized data sets
● Experiment with low memory settings (work_mem) to
see behaviour when Postgres flushes to disk
● You can explain UPDATE and DELETE, not just SELECT
● EXPLAIN (ANALYZE) shows the plan (with profiling
data)
● Spot performance problems, find missing indexes
● … but also get a better understanding of how Postgres
executes things!
○ … which can help you avoid performance debugging
down the road
● There are many "node types" other than Seq and
Index scan. Learn more about them!
EXPLAIN your queries
Improve your intuition for how queries execute, not just when you have a
performance problem
RETURNING *
Get back changes made
# create table person (
id int generated always as identity primary key,
name text,
is_crazy boolean
);
CREATE TABLE
# insert into person (name, is_crazy) values
('Alice', false),
('Bob', false),
('Mallory', true)
returning *; -- Emit every row inserted, which includes the auto generated ID
id | name | is_crazy
----+---------+----------
1 | Alice | f
2 | Bob | f
3 | Mallory | t
(3 rows)
INSERT 0 3
# update person set is_crazy = not is_crazy returning name, is_crazy;
name | is_crazy
---------+----------
Alice | t
Bob | t
Mallory | f
(3 rows)
UPDATE 3
# delete from person where is_crazy returning id;
id
----
1
2
(2 rows)
RETURNING will … return in a
few slides
WITH
Also called "Common Table Expressions"
(or CTEs)
WITH a tiny graph
# select * from edges;
a | b | type
----------------+----------------+--------------
A | B | friend
B | C | friend
B | D | friend
root | A | parentOf
root | something-else | bff
something-else | A pump?! | !
A pump?! | D | tree-breaker
(7 rows)
# copy (select 'digraph G { ' ||
string_agg('"' || a || '" -> "' || b || '" [label="' || type || '"]; ', E'')
|| '}' from edges)
to program 'dot -Tsvg > /tmp/test.svg'; -- Pipe to Graphviz
COPY 1
# copy (select * from edges) to '/tmp/file.csv' with csv header; -- Make a CSV
COPY 7
# copy (select * from edges) to program 'pbcopy' with csv header;
-- CSV now on clipboard, paste straight to Google Sheets
# with out_per_node as (
select a as node,
count(*) as out_degree
from edges
group by a
), in_per_node as (
select b as node,
count(*) as in_degree
from edges
group by b
)
select node,
coalesce(out_degree, 0) as out_degree,
coalesce(in_degree, 0) as in_degree
from out_per_node
full outer join in_per_node
using(node)
order by node;
node | out_degree | in_degree
----------------+------------+-----------
A | 1 | 1
A pump?! | 1 | 1
B | 2 | 1
C | 0 | 1
D | 0 | 2
root | 2 | 0
something-else | 1 | 1
(7 rows)
# with out_per_node as (
select a as node,
count(*) as out_degree
from edges
group by a
), in_per_node as (
select b as node,
count(*) as in_degree
from edges
group by b
)
select node,
coalesce(out_degree, 0) as out_degree,
coalesce(in_degree, 0) as in_degree
from out_per_node
full outer join in_per_node
using(node)
order by node;
node | out_degree
----------------+------------
B | 2
A pump?! | 1
something-else | 1
root | 2
A | 1
(5 rows)
node | in_degree
----------------+-----------
B | 1
A pump?! | 1
C | 1
something-else | 1
D | 2
A | 1
(6 rows)
You can refer to these as if they
were real tables in following
statements
# with out_per_node as (
select a as node,
count(*) as out_degree
from edges
group by a
), in_per_node as (
select b as node,
count(*) as in_degree
from edges
group by b
)
select node,
coalesce(out_degree, 0) as out_degree,
coalesce(in_degree, 0) as in_degree
from out_per_node
full outer join in_per_node
using(node)
order by node;
Equivalent sub-select:
select node,
coalesce(out_degree, 0) as out_degree,
coalesce(in_degree, 0) as in_degree
from (
select a as node, count(*) as out_degree
from edges
group by a
) as out_per_node
full outer join (
select b as node, count(*) as in_degree
from edges
group by b
) as in_per_node
using(node)
order by node;
# with out_per_node as (
select a as node,
count(*) as out_degree
from edges
group by a
), in_per_node as (
select b as node,
count(*) as in_degree
from edges
group by b
)
select node,
coalesce(out_degree, 0) as out_degree,
coalesce(in_degree, 0) as in_degree
from out_per_node
full outer join in_per_node
using(node)
where node = 'root'
order by node;
Equivalent sub-select:
select node,
coalesce(out_degree, 0) as out_degree,
coalesce(in_degree, 0) as in_degree
from (
select a as node, count(*) as out_degree
from edges
where a = 'root'
group by a
) as out_per_node
full outer join (
select b as node, count(*) as in_degree
from edges
where b = 'root'
group by b
) as in_per_node
using(node)
order by node;
How equivalent?
# explain with out_per_node as (
select a as node,
count(*) as out_degree
from edges
group by a
), in_per_node as (
select b as node,
count(*) as in_degree
from edges
group by b
)
select node,
coalesce(out_degree, 0) as out_degree,
coalesce(in_degree, 0) as in_degree
from out_per_node
full outer join in_per_node
using(node)
where node = 'root'
order by node;
QUERY PLAN
---------------------------------------------------------------------------------------
Sort (cost=36.62..36.64 rows=9 width=48)
Sort Key: (COALESCE(edges.a, edges_1.b))
-> Hash Full Join (cost=18.24..36.48 rows=9 width=48)
Hash Cond: (edges.a = edges_1.b)
-> GroupAggregate (cost=0.00..18.17 rows=3 width=40)
Group Key: edges.a
-> Seq Scan on edges (cost=0.00..18.12 rows=3 width=32)
Filter: (a = 'root'::text)
-> Hash (cost=18.20..18.20 rows=3 width=40)
-> GroupAggregate (cost=0.00..18.17 rows=3 width=40)
Group Key: edges_1.b
-> Seq Scan on edges edges_1 (cost=0.00..18.12 rows=3 width=32)
Filter: (b = 'root'::text)
# explain select node,
coalesce(out_degree, 0) as out_degree,
coalesce(in_degree, 0) as in_degree
from (
select a as node, count(*) as out_degree
from edges
where a = 'root'
group by a
) as out_per_node
full outer join (
select b as node, count(*) as in_degree
from edges
where b = 'root'
group by b
) as in_per_node
using(node)
order by node;
QUERY PLAN
---------------------------------------------------------------------------------------
Sort (cost=36.62..36.64 rows=9 width=48)
Sort Key: (COALESCE(edges.a, edges_1.b))
-> Hash Full Join (cost=18.24..36.48 rows=9 width=48)
Hash Cond: (edges.a = edges_1.b)
-> GroupAggregate (cost=0.00..18.17 rows=3 width=40)
Group Key: edges.a
-> Seq Scan on edges (cost=0.00..18.12 rows=3 width=32)
Filter: (a = 'root'::text)
-> Hash (cost=18.20..18.20 rows=3 width=40)
-> GroupAggregate (cost=0.00..18.17 rows=3 width=40)
Group Key: edges_1.b
-> Seq Scan on edges edges_1 (cost=0.00..18.12 rows=3 width=32)
Filter: (b = 'root'::text)
(The plans are identical!)
WITH queries
● Those two queries were not identical until Postgres 12
● Consider
● "Materialization", i.e. saving a temporary result, causes this to
materialize the entirety of table a
○ (That's a very inefficient way to get two rows.)
● Older blog posts about CTEs emphasize this as an "optimisation
barrier" and often warn against using them
with a_million_numbers as (
select * from a -- table from earlier with index on id
)
select * from a_million_numbers
where id in (42, 43);
Postgres ≤11 will always
materialize the result of
a "table expression" –
either in memory or on disk
# set work_mem to '64 kB'; -- force disk flushing with very low limit
# explain (analyze true, verbose true, buffers true)
with a_million_numbers as MATERIALIZED (
select * from a
)
select * from a_million_numbers where id in (42, 43);
-- QUERY PLAN --
CTE Scan on a_million_numbers (cost=14426.90..36928.92 rows=10001 width=4)
(actual time=0.048..1356.717 rows=4 loops=1)
Output: a_million_numbers.id
Filter: (a_million_numbers.id = ANY ('{42,43}'::integer[]))
Rows Removed by Filter: 1000086
Buffers: shared hit=4426, temp written=1709
CTE a_million_numbers
-> Seq Scan on public.a (cost=0.00..14426.90 rows=1000090 width=4) (actual
time=0.015..334.729 rows=1000090 loops=1)
Output: a.id
Buffers: shared hit=4426
Planning Time: 0.095 ms
Execution Time: 1358.238 ms
Force Postgres ≤11 behaviour
Low on memory, flushing to disk
# set work_mem to '64 kB'; -- force disk flushing with very low limit
# explain (analyze true, verbose true, buffers true)
with a_million_numbers as [NOT MATERIALIZED] (
select * from a
)
select * from a_million_numbers where id in (42, 43);
-- QUERY PLAN --
Index Only Scan using a_id_idx on a (cost=0.42..12.88 rows=2 width=4) (actual
time=0.020..0.026 rows=4 loops=1)
Index Cond: (id = ANY ('{42,43}'::integer[]))
Heap Fetches: 4
Planning Time: 0.108 ms
Execution Time: 0.043 ms -- vs 1358ms for materialized plan
Default Postgres ≥12 behaviour
# set work_mem to '64 kB'; -- force disk flushing with very low limit
# explain (analyze true, verbose true, buffers true)
with a_million_numbers as [NOT MATERIALIZED] (
select * from a where id in (42, 43)
)
select * from a_million_numbers where id in (42, 43)
-- QUERY PLAN --
Index Only Scan using a_id_idx on a (cost=0.42..12.88 rows=2 width=4) (actual
time=0.020..0.026 rows=4 loops=1)
Index Cond: (id = ANY ('{42,43}'::integer[]))
Heap Fetches: 4
Planning Time: 0.108 ms
Execution Time: 0.043 ms -- vs 1358ms for materialized plan
as if inlined
Writable CTEs
● DELETE/UPDATE/INSERT statements with RETURNING
can be used as "tables" too
● Note!
○ RETURNING is the only way to let other table
expressions see the result.
○ Different table expressions cannot otherwise
see the effect of other modifications in the
same statement.
-- Delete rows while simultaneously
-- inserting them elsewhere
WITH moved_rows AS (
DELETE FROM products
WHERE
date >= '2010-10-01' AND
date < '2010-11-01'
RETURNING *
)
INSERT INTO products_log
SELECT * FROM moved_rows;
Recursive CTEs
# WITH RECURSIVE t(n) AS (
VALUES (1) -- starting value(s)
UNION ALL
SELECT n+1 FROM t -- t will keep growing
WHERE n < 100 -- until termination condition is true
)
SELECT sum(n) FROM t;
sum
------
5050
# with recursive graph_traversal(a, b, path, depth) as (
select a, b, ARRAY[a] as path, 0 as depth
from edges
where a='root'
union all
select edges.a, edges.b, path || ARRAY[edges.a], depth + 1
from edges join graph_traversal on(edges.a=graph_traversal.b)
-- Avoid looping forever if there's a cycle
where not(edges.a = ANY(path))
)
select * from graph_traversal
order by depth, a;
a | b | path | depth
----------------+----------------+----------------------------------+-------
root | A | {root} | 0
root | something-else | {root} | 0
A | B | {root,A} | 1
something-else | A pump?! | {root,something-else} | 1
A pump?! | D | {root,something-else,"A pump?!"} | 2
B | C | {root,A,B} | 2
B | D | {root,A,B} | 2
(7 rows)
-- Mandelbrot set
WITH RECURSIVE x(i)
AS (
VALUES(0)
UNION ALL
SELECT i + 1 FROM x WHERE i < 101
),
Z(Ix, Iy, Cx, Cy, X, Y, I)
AS (
SELECT Ix, Iy, X::float, Y::float, X::float, Y::float, 0
FROM
(SELECT -2.2 + 0.031 * i, i FROM x) AS xgen(x,ix)
CROSS JOIN
(SELECT -1.5 + 0.031 * i, i FROM x) AS ygen(y,iy)
UNION ALL
SELECT Ix, Iy, Cx, Cy, X * X - Y * Y + Cx AS X, Y * X * 2 + Cy, I + 1
FROM Z
WHERE X * X + Y * Y < 16.0
AND I < 27
),
Zt (Ix, Iy, I) AS (
SELECT Ix, Iy, MAX(I) AS I
FROM Z
GROUP BY Iy, Ix
ORDER BY Iy, Ix
)
SELECT array_to_string(
array_agg(
SUBSTRING(
' .,,,-----++++%%%%@@@@#### ',
GREATEST(I,1),
1
)
),''
)
FROM Zt
GROUP BY Iy
ORDER BY Iy;
What's up in my database?
pg_stat_activity
# select * from pg_stat_activity where state='idle';
-[ RECORD 1 ]----+------------------------------------------------
datid | 26211311
datname | sample
pid | 5291
usesysid | 16384
usename | alex
application_name | psql
client_addr |
client_hostname |
client_port | -1
backend_start | 2021-06-14 16:11:47.491729+02
xact_start |
query_start | 2021-06-14 16:34:41.027633+02
state_change | 2021-06-14 16:34:41.02857+02
wait_event_type | Client
wait_event | ClientRead
state | idle
backend_xid |
backend_xmin |
query | delete from person where is_crazy returning id;
backend_type | client backend
# set application_name to 'my-app:some-role'
jdbc:postgresql://localhost:5435/MyDB?ApplicationName=MyApp
"idle in transaction" is typically bad,
especially if holding locks
What's going on?
● application_name lets you identify your connection in pg_stat_activity
● pg_terminate_backend(pid) to kill a bad connection
● pg_stat_activity for what's up right now
● pg_stat_statements extension for tracking what has happened
○ Install: create extension if not exists pg_stat_statements
○ # install on all subsequently created databases:
$ psql template1 -c 'create extension if not exists pg_stat_statements'
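Tying the pg_terminate_backend and idle-in-transaction points together, a
sketch for cleaning up sessions stuck "idle in transaction" (the 5 minute
threshold is an arbitrary assumption):
# select pid, query, pg_terminate_backend(pid)
from pg_stat_activity
where state = 'idle in transaction'
and state_change < now() - interval '5 minutes';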
pg_stat_statements
● pg_stat_statements extension for tracking continuous activity
● What are the most expensive queries over time?
● One of the most useful extensions!
# select * from pg_stat_statements order by total_time desc limit 1;
-[ RECORD 1 ]-------+------------------------------------------------------------
[…]
query | with a_million_numbers as materialized (select * from a)
select * from a_million_numbers where id in(42, 43)
calls | 2
total_time | 2500.088486
min_time | 1160.500207
max_time | 1339.588279
mean_time | 1250.044243
stddev_time | 89.544036
rows | 0
shared_blks_hit | 8852
[…]
temp_blks_written | 1710
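Note that the pg_stat_statements module must also be preloaded before the
extension collects anything; a sketch of the setting (takes effect after a
server restart):
# alter system set shared_preload_libraries = 'pg_stat_statements';
ALTER SYSTEM
-- Without preloading, querying the pg_stat_statements view errors out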
pg_stat_statements
● Some ORMs/toolkits fill pg_stat_statements with junk:
○ WHERE id IN ($1, $2, …, $987)
○ WHERE id IN ($1, $2, …, $5432)
● Consider id = ANY($parameter_as_array)
# select * from pg_stat_statements order by total_time desc limit 1;
-[ RECORD 1 ]-------+------------------------------------------------------------
[…]
query | with a_million_numbers as materialized (select * from a)
select * from a_million_numbers where id in ($1, $2, $3, …)
calls | 2
total_time | 2500.088486
min_time | 1160.500207
max_time | 1339.588279
mean_time | 1250.044243
stddev_time | 89.544036
rows | 0
shared_blks_hit | 8852
[…]
temp_blks_written | 1710
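A sketch of the ANY(array) form suggested above, which keeps
pg_stat_statements down to one normalized entry no matter how many ids are
passed (table a from earlier, int[] parameter assumed):
# prepare fetch_ids as
select * from a where id = any($1::int[]);
PREPARE
# execute fetch_ids('{42,43}');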
Locks
The likely reason for a migration causing an outage
Sample lock block
# begin;
# create index on person(name);
CREATE INDEX
Client A Client B Client C
In Postgres, DDL like create table, alter table,
create index, etc. are transactional.
Unlike Oracle and MySQL, "CREATE TABLE"
does not imply commit.
"Implicit commit" is not a thing in Postgres.
Sample lock block
# begin;
# create index on person(name);
CREATE INDEX
-- Index creation needs a share lock
-- Block to see effects on others
# select pg_sleep(600);
# begin;
-- Reading is fine
# select * from person where id=3;
id | name | is_crazy
----+---------+----------
3 | Mallory | f
-- But cannot update:
# update person set name='Not Mallory'
where id=3;
-- We're BLOCKED by client A :(
-- Meanwhile, checking the status:
# select query, wait_event_type from
pg_stat_activity where state='active' and
pid != pg_backend_pid();
-[ RECORD 1 ]---+----------------------
query | select pg_sleep(600);
wait_event_type | Timeout
-[ RECORD 2 ]---+----------------------
query | update person set
name='Not Mallory' where id=3;
wait_event_type | Lock
Client A Client B Client C
pg_sleep is useful to
artificially slow down
operations to observe
locking effects in dev
Locky migrations
● CREATE INDEX prevents writes on the target table for the duration of the
transaction. Locks are released (only!) on commit/rollback.
● ALTER TABLE (add column and/or constraints, etc) will need an exclusive lock
on the table.
● An exclusive lock on a table will block all concurrent access = OUTAGE
● A share lock will block all concurrent write access = read only access
● You generally want to minimise time spent holding or waiting for such locks
○ statement_timeout
○ lock_timeout
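A sketch of capping lock waits around a migration so a blocked ALTER TABLE
fails fast instead of queueing up and blocking the traffic behind it; the
timeouts and the added column are arbitrary assumptions:
# begin;
# set local lock_timeout = '2s';
# set local statement_timeout = '30s';
# alter table person add column nickname text;
# commit;
-- If the lock is not acquired within 2s, the ALTER fails and can simply be retried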
Less locky migrations
● CREATE INDEX CONCURRENTLY does not block concurrent writing to the
table
○ Reads the table twice. Costs more IO.
○ Cannot be done in a transaction, i.e. it must be the only operation in its
own transaction
○ Caveats apply, consult the docs
● ALTER TABLE ADD CONSTRAINT … NOT VALID
○ Adds a constraint that only applies to subsequent writes
○ Existing data is not validated, so an exclusive lock needed only briefly
● ALTER TABLE … VALIDATE CONSTRAINT …
○ Holds a much weaker lock while reading the entire table
○ Can of course fail if rows exist that violate the constraint
● CREATE UNIQUE INDEX CONCURRENTLY for a uniqueness constraint
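A sketch of that last approach, reusing the person table from earlier (the
constraint name is an arbitrary choice):
# create unique index concurrently person_name_key on person(name);
CREATE INDEX
# alter table person
add constraint person_name_key unique using index person_name_key;
ALTER TABLE
-- The ALTER needs only a brief lock; the expensive index build happened concurrently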
Less locky migrations
# select * from equipment;
id | mass_in_grams
----+---------------
1 | 42
2 | -1
(2 rows)
# alter table equipment
add constraint mass_not_negative
check (mass_in_grams >= 0) not valid;
ALTER TABLE
# insert into equipment values (3, -42);
ERROR: new row for relation "equipment" violates check
constraint "mass_not_negative"
DETAIL: Failing row contains (3, -42).
# alter table equipment
validate constraint mass_not_negative;
ERROR: check constraint "mass_not_negative" is violated
by some row
Needs exclusive lock for
some milliseconds
Scans the entire table without
strong lock
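Once the offending rows are fixed, validation goes through; a sketch
(zeroing out the bad value is just an assumption about the fix):
# update equipment set mass_in_grams = 0 where mass_in_grams < 0;
UPDATE 1
# alter table equipment validate constraint mass_not_negative;
ALTER TABLE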
Deferred constraints
● A constraint can be deferred to commit time
● Useful e.g. for cyclic foreign keys
● A cyclic foreign key is useful when partitioning out
certain columns
○ This is useful for "last_modified" kinds of use
cases.
○ Read more about "heap only tuples" if you have
that use case
# create table something (
-- change_info does not exist yet, adding
-- FK via ALTER below:
id int primary key,
what text
);
# create table change_info (
id int primary key
references something(id)
deferrable initially deferred,
last_modified timestamptz not null default now()
);
# alter table something
add constraint must_have_change_info
foreign key (id) references change_info(id)
deferrable initially deferred;
Deferred constraints
# insert into something values (1, 'this-will-fail');
ERROR: insert or update on table "something" violates
foreign key constraint "must_have_change_info"
DETAIL: Key (id)=(1) is not present in table
"change_info".
# begin;
-- Not failing, FK check deferred to commit
# insert into something values (1, 'with-version-info');
INSERT 0 1
# insert into change_info values (1, now());
INSERT 0 1
# end;
COMMIT
# begin;
# insert into change_info values (2, now());
INSERT 0 1
-- Will fail:
# end;
ERROR: insert or update on table "change_info" violates
foreign key constraint "change_info_id_fkey"
DETAIL: Key (id)=(2) is not present in table "something".
# create table something (
-- change_info does not exist yet, adding
-- FK via ALTER below:
id int primary key,
what text
);
# create table change_info (
id int primary key
references something(id)
deferrable initially deferred,
last_modified timestamptz not null default now()
);
# alter table something
add constraint must_have_change_info
foreign key (id) references change_info(id)
deferrable initially deferred;
Functional indexes
-- Functional indexes:
# create table users (
id int primary key,
username text
);
# create unique index on users(lower(username));
# insert into users values (1, 'alex'), (2, 'ALEX');
ERROR: duplicate key value violates unique
constraint "users_lower_idx"
DETAIL: Key (lower(username))=(alex) already exists.
-- Note: Index lookups must use the
-- same function
-- This can use the index:
# select * from users
where lower(username)='alex';
-- This will NOT be able to use that index:
# select * from users
where username='alex';
Partial indexes
# create table assets (
id int primary key,
external_id text not null,
deleted_at timestamp
);
# create unique index on assets(external_id)
where deleted_at is null;
# insert into assets values
(1, 'one', now()), (2, 'one', null);
INSERT 0 2
# insert into assets values
(3, 'not one', null), (4, 'one', null);
ERROR: duplicate key value violates unique
constraint "assets_external_id_idx"
DETAIL: Key (external_id)=(one) already exists.
-- This can use the partial index:
# select * from assets
where external_id='one' AND
deleted_at is null;
-- This will NOT be able to use that index:
# select * from assets
where external_id='one';
-- missing (deleted_at is null)
Range Types and Exclusion Constraints
Range types
● A range is an interval of numeric-like data
● int4range(0, 10): 0 <= n < 10
● tstzrange(now(), null): any time >= now()
● Index support for overlaps and containment
○ overlaps: int4range(0, 10) && int4range(9, 20)
○ contains:
■ int4range(0, 10) @> 2
■ int4range(0, 10) @> int4range(2, 4)
# create table intervals (
id int primary key,
start timestamptz,
"end" timestamptz
);
-- Insert 1 million random intervals
# with random_starts as (
select generate_series(1, 1000*1000) as id,
'2020-01-01'::timestamptz +
(1000 * random()) * interval '1 day' as start
)
insert into intervals
select id, start, start +
(1000 * random() * interval '1 hour') as "end"
from random_starts;
INSERT 0 1000000
# create index on intervals
using gist(tstzrange(start, "end"));
"Generalised search tree",
R-tree and more
Range types
# create table intervals (
id int primary key,
start timestamptz,
"end" timestamptz
);
-- [Insert random intervals
-- happened here]
# create index on intervals
using
gist(tstzrange(start, "end"));
# explain analyze
select * from intervals
where tstzrange(start, "end") &&
tstzrange(now(), now() + interval '7 days');
-- QUERY PLAN --
Bitmap Heap Scan on intervals
(cost=1368.16..8447.01 rows=28354 width=20)
(actual time=46.379..80.761 rows=27837 loops=1)
Recheck Cond: (tstzrange(start, "end") &&
tstzrange(now(), (now() + '7 days'::interval)))
Heap Blocks: exact=6278
-> Bitmap Index Scan on intervals_tstzrange_idx
(cost=0.00..1361.08 rows=28354 width=0)
(actual time=44.309..44.309 rows=27837 loops=1)
Index Cond: (tstzrange(start, "end") &&
tstzrange(now(), (now() + '7 days'::interval)))
Planning Time: 0.118 ms
Execution Time: 56.151 ms
Exclusion constraints
● Define ranges that cannot overlap
● No two reservations of the same resource can overlap
● Exclusion constraints cannot be added "concurrently",
i.e. without write-locking the table.
# create table reservations (
room text,
start timestamptz,
"end" timestamptz,
constraint no_double_booking exclude using gist(
room with =,
tstzrange(start, "end") with &&
)
);
# insert into reservations values
('zelda', '2021-06-15', '2021-06-16'),
('zelda', '2021-06-17', '2021-06-18'),
('portal', '2021-06-01', '2021-07-01');
INSERT 0 3
# insert into reservations values
('zelda', '2021-06-14', '2021-06-16');
ERROR: conflicting key value violates exclusion
constraint "no_double_booking"
DETAIL: [omitted]
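Note: mixing plain equality (room with =) and a range in one GiST exclusion
constraint relies on the btree_gist extension, so the table definition above
assumes it is installed:
# create extension if not exists btree_gist;
CREATE EXTENSION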
Exclusion constraints
● Define constraint that all elements must be the same
○ "Exclude dissimilar"
● Example:
○ Don't put humans and lions in the same cage at
the same time
# create table cages (
cage text,
animal text,
start timestamptz,
"end" timestamptz,
constraint just_same_animals exclude using gist(
cage with =,
animal with !=,
tstzrange(start, "end") with &&
)
);
# insert into cages values
('cellar', 'human', '2021-06-15', '2021-06-16'),
('bedroom', 'lion', '2021-06-15', '2021-06-16'),
('cellar', 'human', '2021-06-01', '2021-07-01');
INSERT 0 3
# insert into cages values
('bedroom', 'human', '2021-06-14', '2021-06-16');
ERROR: conflicting key value violates exclusion
constraint "just_same_animals"
DETAIL: [omitted]
Upsert: insert or update
Upsert
● INSERT can take an ON CONFLICT
● Given a conflict, either
○ DO NOTHING
○ DO UPDATE SET … [ WHERE … ]
# create table users (
user_id bigint primary key,
username text not null,
name text
);
create unique index on users(lower(username));
insert into users values
(1, 'alex', 'Alex'),
(2, 'bob', 'Bob');
# prepare sample_upsert as
insert into users values ($1, $2, $3)
on conflict(user_id)
do update set
username=excluded.username,
name=excluded.name
where
row(users.username, users.name)
is distinct from
row(excluded.username, excluded.name)
returning *;
# execute sample_upsert(1, 'alex', 'AlexB');
user_id | username | name
---------+----------+-------
1 | alex | AlexB
-- Repeat the same, then what?
# execute sample_upsert(1, 'alex', 'AlexB');
user_id | username | name
---------+----------+------
(0 rows)
Upsert
● ON CONFLICT DO NOTHING does not return data
# prepare fixed_upsert as
with maybe_upserted as (
insert into users values ($1, $2, $3)
on conflict(user_id)
[ … same as previous …]
returning *
)
select * from maybe_upserted
union all
select * from users where user_id = $1
limit 1;
# execute fixed_upsert(1, 'alex', 'AlexB');
user_id | username | name
---------+----------+-------
1 | alex | AlexB
-- Repeat the same, then what?
# execute fixed_upsert(1, 'alex', 'AlexB');
user_id | username | name
---------+----------+-------
1 | alex | AlexB
Upsert
● ON CONFLICT DO NOTHING does not return data
# prepare fixed_upsert as
with maybe_upserted as (
insert into users values ($1, $2, $3)
on conflict(user_id)
[ … same as previous …]
returning *
)
select * from maybe_upserted
union all
select * from users where user_id = $1
limit 1;
-- (note: union all for this trick)
# explain analyze
execute fixed_upsert(1, 'alex', 'Changed Name');
QUERY PLAN
---------------------------------------------------
Limit (..) (actual ...)
CTE maybe_upserted
-> Insert on users users_1 (...)
Conflict Resolution: UPDATE
Conflict Arbiter Indexes: users_pkey
Conflict Filter: ([…snip…])
Tuples Inserted: 0
Conflicting Tuples: 1
-> Result (...)
-> Append (...) -- (this is UNION ALL)
-> CTE Scan on maybe_upserted (...)
-> Index Scan using users_pkey on users
(...) (never executed)
Index Cond: (user_id = '1'::bigint)
Planning Time: 0.170 ms
Execution Time: 0.082 ms
JSON
JSON
● jsonb types, functions and aggregates
● Validates JSON well-formedness only
○ Not a replacement for proper schemas!
● Use with care
○ Not a replacement for proper schemas! :)
# create table metadata (
user_id int,
key text,
metadata jsonb
);
# insert into metadata values
(1, 'settings', '{"foo": "bar"}'),
(1, 'searches', '["where", "what"]');
# select user_id, jsonb_agg(metadata)
from metadata group by 1;
user_id | jsonb_agg
---------+-------------------------------------
1 | [{"foo": "bar"}, ["where", "what"]]
# select user_id, jsonb_object_agg(key, metadata)
from metadata group by 1;
user_id | jsonb_object_agg
---------+------------------------------------------
1 | {"searches": ["where", "what"],
"settings": {"foo": "bar"}}
JSON object graphs
● Compose complete JSON object graphs via
LATERAL joins.
● Does not require the rows to have JSON types
● LATERAL is a bit like "for each"
○ The subquery gets to reference the row
● GraphQL implementations on top of Postgres do
this (e.g. Hasura, PostGraphile)
# select * from users join metadata using(user_id);
user_id | username | key | metadata
---------+----------+----------+-------------------
1 | alex | settings | {"foo": "bar"}
1 | alex | searches | ["where", "what"]
# select jsonb_build_object(
'username', username,
'metadata', aggregated_metadata
) as user_object
from users
left join lateral ( -- For each user:
select jsonb_object_agg(key, metadata) as metadata from metadata
where metadata.user_id=users.user_id
) as aggregated_metadata on true;
user_object
--------------------------
{
"metadata": {
"searches": [
"where",
"what"
],
"settings": {
"foo": "bar"
}
},
"username": "alex"
}
Upserting object graphs
● Convert JSON object graphs to records via
LATERAL joins
● Upsert objects to different tables with a single
writable WITH-query
# create table users (
user_id bigint generated by default as identity primary key,
username text unique not null,
name text
);
# create table user_settings (
user_id bigint primary key references users(user_id),
settings jsonb not null
);
-- TODO: Define 'upsert' query that either creates or patches
-- across multiple tables with a single parameter
# execute upsert('[
{
"user": {"username": "alex", "name": "Alex"},
"settings": {"foo": "bar"}
},
{
"user": {"username": "mallory", "name": "Mallory"}
}
]');
# execute upsert('[
{
"user": {"username": "alex", "name": "AlexB"}
},
{
"user": {"username": "mallory", "name": "Mallory"},
"settings": {"bar": "baz"}
}
]');
# prepare upsert as
with records as (
select * from jsonb_to_recordset($1::jsonb)
as _("user" jsonb, "settings" jsonb)
), maybe_upserted_users as (
insert into users (username, name)
select username, name
from records
join lateral
jsonb_populate_record(null::users, records.user)
on(true)
on conflict (username) do -- insert or update
update set
username=excluded.username,
name=coalesce(excluded.name, users.name)
where
row(users.username, users.name) is distinct from
row(excluded.username, excluded.name)
returning *
)
select * from maybe_upserted_users;
# execute upsert('[...]');
user_id | username | name
---------+----------+---------
1 | alex | Alex
2 | mallory | Mallory
# create table users (
user_id bigint ... primary key,
username text unique not null,
name text
);
# create table user_settings (
user_id bigint primary key
references users(user_id),
settings jsonb not null
);
# execute upsert('[
{
"user": {"username": "alex",
"name": "Alex"},
"settings": {"foo": "bar"}
},
{
"user": {"username": "mallory",
"name": "Mallory"}
}
]');
# prepare upsert as
with records as (
select * from jsonb_to_recordset($1::jsonb)
as _("user" jsonb, "settings" jsonb)
), maybe_upserted_users as (
insert into users (username, name)
select username, name
from records
join lateral
jsonb_populate_record(null::users, records.user)
on(true)
on conflict (username) do -- insert or update
update set
username=excluded.username,
name=coalesce(excluded.name, users.name)
where
row(users.username, users.name) is distinct from
row(excluded.username, excluded.name)
returning *
)
select * from maybe_upserted_users;
# execute upsert('[
{
"user": {"username": "alex",
"name": "Alex"},
"settings": {"foo": "bar"}
},
{
"user": {"username": "mallory",
"name": "Mallory"}
}
]');
user_id | username | name
---------+----------+---------
1 | alex | Alex
2 | mallory | Mallory
-- The ON CONFLICT WHERE condition means nothing is returned:
# execute upsert('[...]');
user_id | username | name
---------+----------+------
(0 rows)
# prepare upsert as
with records as (
select * from jsonb_to_recordset($1::jsonb)
as _("user" jsonb, "settings" jsonb)
), maybe_upserted_users as (
insert into users (username, name)
select username, name
from records
join lateral
jsonb_populate_record(null::users, records.user)
on(true)
on conflict (username) do -- insert or update
update set
username=excluded.username,
name=coalesce(excluded.name, users.name)
where
row(users.username, users.name) is distinct from
row(excluded.username, excluded.name)
returning *
), all_users as (
...
)
select * from all_users
all_users as (
select * from maybe_upserted_users
union all
select * from users where username in (
select "user" ->> 'username' from records
except all
select username from maybe_upserted_users
)
)
-- We now get all users: created or updated or neither
# execute upsert('[...]');
user_id | username | name
---------+----------+---------
1 | alex | Alex
2 | mallory | Mallory
# execute upsert('[...]');
user_id | username | name
---------+----------+---------
1 | alex | Alex
2 | mallory | Mallory
# prepare upsert as
with records as (
...
), maybe_upserted_users as (
...
), all_users as (
...
), updated_settings as (
insert into user_settings
select user_id, settings
from all_users
join records on (records.user ->> 'username' = username)
where settings is not null
on conflict(user_id) do
update set
settings=excluded.settings
where
user_settings.settings is distinct from
excluded.settings
)
select username, user_id from all_users;
# execute upsert('[
{
"user": {"username": "alex",
"name": "Alex"},
"settings": {"foo": "bar"}
},
{
"user": {"username": "mallory",
"name": "Mallory"}
}
]');
username | user_id
----------+---------
alex | 1
mallory | 2
# select * from user_settings;
user_id | settings
---------+----------------
1 | {"foo": "bar"}
Coordinating via Postgres
NOTIFY+LISTEN and advisory locks
● LISTEN: Get an async callback when something does NOTIFY
○ LISTEN channel; -- Get notifications between transactions
● NOTIFY: Send callbacks. Delivered between transactions to LISTEN-ers
○ NOTIFY channel; -- Wake up listeners
○ Triggers can ensure notifications are sent
● Advisory locks:
○ "Leader election" through Postgres
○ Locks that last until you disconnect
○ Need a limited number of background processes to pick up work?
○ SELECT pg_try_advisory_lock(1234);
● SKIP LOCKED
○ SELECT * FROM work_items
LIMIT 1
FOR UPDATE SKIP LOCKED
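A sketch of a simple job queue built on that last bullet, combined with a
writable CTE from earlier; the work_items table and its done flag are
assumptions:
# create table work_items (id bigint primary key, payload text, done boolean default false);
# with next_item as (
select id from work_items
where not done
order by id
limit 1
for update skip locked -- concurrent workers skip rows someone else holds
)
update work_items
set done = true
from next_item
where work_items.id = next_item.id
returning work_items.*;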
Learn more!
● tinyurl.com/jz21-psql
● medium.com/@alexbrasetvik/postgres-can-do-that-f221a8046e
● medium.com/cognite
● Tweet me questions or feedback: @alexbrasetvik
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 

Postgres can do THAT?

  • 1. Alex Brasetvik // Senior Principal Architect @ Cognite Postgres can do THAT? Survey of select Postgres features, some which you should probably learn more about!
  • 2. ▪ Link to in-depth material posted in the last slide ▪ Many topics could be long talks on their own. I want to make you interested in these topics, not trying to fully cover them (at all!) ▪ Questions? Tweet me @alexbrasetvik GETTING STARTED
  • 3. $ whoami Alex Brasetvik @ Cognite ● Office of the CTO: focusing on database related stuff ● Previously co-founded the Elasticsearch platform that became Elastic Cloud ● Postgres and Elasticsearch are some of my favorite hammers ● As a proper cloud engineer, I enjoy jumping out of planes
  • 4. AGENDA generate_series EXPLAIN RETURNING WITH/Common Table Expressions ● Materialized vs inlined CTEs ● Writable CTEs ● Recursive CTEs What's up in my database? pg_stat_(activity|statements) Less locky migrations Index tricks, range types Deferrable constraints Exclusion constraints JSON
  • 5. AGENDA generate_series EXPLAIN RETURNING WITH/Common Table Expressions ● Materialized vs inlined CTEs ● Writable CTEs ● Recursive CTEs What's up in my database? pg_stat_(activity|statements) Less locky migrations Index tricks, range types Deferrable constraints Exclusion constraints JSON tl;dr: Hold on to your butts
  • 7. generate_series(1, 1000) # create table a (id int); CREATE TABLE # insert into a select generate_series(1, 100); INSERT 0 100 # select * from a limit 3; id ---- 1 2 3 (3 rows) -- A million rows in a few seconds: # timing on # create table some_numbers as select generate_series(1, 1000*1000) as i; SELECT 1000000 Time: 2245.545 ms (00:02.246) # create index on some_numbers(i); CREATE INDEX Time: 994.635 ms ● generate_series(start_inclusive, end_inclusive) ● Useful to easily create an arbitrarily large sample set ● As we'll come back to, always test with realistically sized data sets! ○ Behaviour can drastically change with changes in data set sizes
  • 8. generate_series(1, 1000) # create table a (id int); CREATE TABLE # insert into a select generate_series(1, 100); INSERT 0 100 # select * from a limit 3; id ---- 1 2 3 (3 rows) -- A million rows in a few: # timing on # create table some_numbers as select generate_series(1, 1000*1000) as i; SELECT 1000000 Time: 2245.545 ms (00:02.246) # create index on some_numbers(i); CREATE INDEX Time: 994.635 ms ● generate_series(start_inclusive, end_inclusive) ● Useful to easily create an arbitrarily large sample set ● As we'll come back to, always test with realistically sized data sets! ○ Behaviour can drastically change with changes in data set sizes (psql) command output
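A note on flexibility: generate_series also takes an optional step argument and works directly on timestamps, which makes time-series-shaped test data easy to fabricate. A minimal sketch (the readings table and its columns are made up for illustration):

# select generate_series(1, 10, 3);   -- 1, 4, 7, 10
# select generate_series(now(), now() + interval '1 day', interval '6 hours');

-- Combined with random() for more realistic-looking dummy data:
# create table readings as
select ts, random() * 100 as value
from generate_series('2021-01-01'::timestamptz,
                     '2021-01-31'::timestamptz,
                     interval '1 minute') as ts;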
  • 9. EXPLAIN Asking Postgres to detail its plans
  • 10. Please EXPLAIN # explain select * from a where id=42; -- QUERY PLAN -- Seq Scan on a (cost=0.00..2.25 rows=1 width=4) Filter: (id = 42) # explain analyze select * from a where id=42; -- QUERY PLAN -- Seq Scan on a (cost=0.00..2.25 rows=1 width=4) (actual time=0.034..0.041 rows=1 loops=1) Filter: (id = 42) Rows Removed by Filter: 99 Planning Time: 0.137 ms Execution Time: 0.057 ms ● EXPLAIN [query goes here] shows the plan of the statement without executing the query. ● EXPLAIN ANALYZE [query] executes the query while profiling it, emitting the plan with profiling information. Analyze = Execute
  • 11. Please EXPLAIN # explain select * from a where id=42; -- QUERY PLAN -- Seq Scan on a (cost=0.00..2.25 rows=1 width=4) Filter: (id = 42) # explain analyze select * from a where id=42; -- QUERY PLAN -- Seq Scan on a (cost=0.00..2.25 rows=1 width=4) (actual time=0.034..0.041 rows=1 loops=1) Filter: (id = 42) Rows Removed by Filter: 99 Planning Time: 0.137 ms Execution Time: 0.057 ms Scan the entire table Then remove rows
  • 12. # create index on a(id); CREATE INDEX # explain (analyze true, verbose true, buffers true) select * from a where id=42; -- QUERY PLAN -- Seq Scan on public.a (cost=0.00..2.25 rows=1 width=4) (actual time=0.019..0.027 rows=1 loops=1) Output: id Filter: (a.id = 42) Rows Removed by Filter: 99 Buffers: shared hit=1 Settings. FORMAT JSON can be useful too. Index not used…? Let's create an index Postgres reads 1 page. That plan is impossible to beat. It knows there's hardly any data in the table
  • 13. # set enable_seqscan to off; -- Disable "seq scans" if possible, for testing SET # explain (analyze true, verbose true, buffers true) select * from a where id=42; -- QUERY PLAN -- Index Only Scan using a_id_idx on public.a (cost=0.14..8.16 rows=1 width=4) (actual time=0.015..0.016 rows=1 loops=1) Output: id Index Cond: (a.id = 42) Heap Fetches: 1 Buffers: shared hit=2 Planner setting
  • 14. # set enable_seqscan to on; -- Revert to default setting SET # insert into a select generate_series(101, 1000*1000); -- 1 million rows in total INSERT 0 999900 # explain (analyze true, verbose true, buffers true) select * from a where id=42; -- QUERY PLAN -- Index Only Scan using a_id_idx on public.a (cost=0.42..8.44 rows=1 width=4) (actual time=0.045..0.050 rows=1 loops=1) Output: id Index Cond: (a.id = 42) Heap Fetches: 1 Buffers: shared hit=2 Planning Time: 0.074 ms Execution Time: 0.075 ms More data in the table. Picks a plan with the index.
  • 15. # drop index a_id_idx; -- Blow away the index, forcing a seq scan DROP INDEX # explain (analyze true, verbose true, buffers true) select * from a where id=42; -- QUERY PLAN -- Gather (cost=1000.00..10634.90 rows=1 width=4) (actual time=0.170..102.491 rows=1 loops=1) Output: id Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=4426 → Parallel Seq Scan on public.a (cost=0.00..9634.80 rows=1 width=4) (actual time=50.188..77.625 rows=1 loops=3) Output: id Filter: (a.id = 42) Rows Removed by Filter: 333363 Buffers: shared hit=4426 Worker 0: actual time=75.269..75.270 rows=0 loops=1 Buffers: shared hit=1326 Worker 1: actual time=75.277..75.277 rows=0 loops=1 Buffers: shared hit=1334 Planning Time: 0.180 ms Execution Time: 102.523 ms (vs 0.075 ms with the index)
  • 16. Visualizing plans ● Spotting performance problems in text can be hard, especially as queries grow larger ● Useful visualization tools: ○ explain.dalibo.com ○ explain.depesz.com
  • 17. Explaining writes # create table foo(id int primary key); CREATE TABLE # insert into foo select generate_series(1, 1000*1000); INSERT 0 1000000 # create table bar(id int primary key, foo int references foo(id) on delete cascade on update cascade); CREATE TABLE # insert into bar select generate_series(1, 1000*1000), generate_series(1, 1000*1000); INSERT 0 1000000 -- foo and bar both have 1 million rows -- bar.foo points to foo.id # delete from bar where id = 42; -- This is fast DELETE 1 Time: 1.130 ms # delete from foo where id=1000; -- This is slooow DELETE 1 Time: 405.049 ms foo id int bar id int foo int
  • 18. # explain (analyze true, verbose true, buffers true) delete from foo where id=1000; -- QUERY PLAN -- Delete on public.foo (cost=0.42..8.44 rows=1 width=6) (actual time=18.937..18.939 rows=0 loops=1) Buffers: shared hit=3 read=6 dirtied=2 -> Index Scan using foo_pkey on public.foo (cost=0.42..8.44 rows=1 width=6) (actual time=0.481..0.485 rows=1 loops=1) Output: ctid Index Cond: (foo.id = 1000) Buffers: shared hit=1 read=3 Planning Time: 1.081 ms Trigger RI_ConstraintTrigger_a_26211344 for constraint bar_foo_fkey: time=379.013 calls=1 Execution Time: 398.084 ms foo id int bar id int foo int No index on bar.foo The cascading delete is very slow! Trigger for reverse FK must scan all of bar per deleted row. (Note that the analyze causes the write to go through! "analyze" = "execute")
  • 19. EXPLAIN that again? ● Test with real-sized data sets ● Experiment with low memory settings (work_mem) to see behaviour when Postgres flushes to disk ● You can explain UPDATE and DELETE, not just SELECT ● EXPLAIN (ANALYZE) shows the plan (with profiling data) ● Spot performance problems, find missing indexes ● … but also get a better understanding of how Postgres executes things! ○ … which can avoid the performance debugging down the road ● There are many "node types" other than Seq and Index scan. Learn more about them!
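Since EXPLAIN ANALYZE really executes the statement, one way to profile a write without keeping its effect is to wrap it in a transaction and roll back. A minimal sketch using the earlier foo table (rollback undoes the row changes, though not side effects like sequence advances):

# begin;
# explain (analyze, buffers) delete from foo where id = 1000;
# rollback;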
  • 20. EXPLAIN your queries Improve your intuition for how queries execute, not just when you have a performance problem
  • 21. RETURNING * Get back changes made
  • 22. # create table person ( id int generated always as identity primary key, name text, is_crazy boolean ); CREATE TABLE # insert into person (name, is_crazy) values ('Alice', false), ('Bob', false), ('Mallory', true) returning *; -- Emit every row inserted, which includes the auto generated ID id | name | is_crazy ----+---------+---------- 1 | Alice | f 2 | Bob | f 3 | Mallory | t (3 rows) INSERT 0 3
  • 23. # update person set is_crazy = not is_crazy returning name, is_crazy; name | is_crazy ---------+---------- Alice | t Bob | t Mallory | f (3 rows) UPDATE 3 sample=# delete from person where is_crazy returning id; id ---- 1 2 (2 rows) RETURNING will … return in a few slides
  • 24. WITH Also called "Common Table Expressions" (or CTEs)
  • 25. WITH a tiny graph # select * from edges; a | b | type ----------------+----------------+-------------- A | B | friend B | C | friend B | D | friend root | A | parentOf root | something-else | bff something-else | A pump?! | ! A pump?! | D | tree-breaker (7 rows) # copy (select 'digraph G { ' || string_agg('"' || a || '" -> "' || b || '" [label="' || type || '"]; ', E'') || '}' from edges) to program 'dot -Tsvg > /tmp/test.svg'; -- Pipe to Graphviz COPY 1 # copy (select * from edges) to '/tmp/file.csv' with csv header; -- Make a CSV COPY 7 # copy (select * from edges) to program 'pbcopy' with csv header; -- CSV now on clipboard, paste straight to Google Sheets
  • 26. # with out_per_node as ( select a as node, count(*) as out_degree from edges group by a ), in_per_node as ( select b as node, count(*) as in_degree from edges group by b ) select node, coalesce(out_degree, 0) as out_degree, coalesce(in_degree, 0) as in_degree from out_per_node full outer join in_per_node using(node) order by node; node | out_degree | in_degree ----------------+------------+----------- A | 1 | 1 A pump?! | 1 | 1 B | 2 | 1 C | 0 | 1 D | 0 | 2 root | 2 | 0 something-else | 1 | 1 (7 rows)
  • 27. # with out_per_node as ( select a as node, count(*) as out_degree from edges group by a ), in_per_node as ( select b as node, count(*) as in_degree from edges group by b ) select node, coalesce(out_degree, 0) as out_degree, coalesce(in_degree, 0) as in_degree from out_per_node full outer join in_per_node using(node) order by node; node | out_degree ----------------+------------ B | 2 A pump?! | 1 something-else | 1 root | 2 A | 1 (5 rows) node | in_degree ----------------+----------- B | 1 A pump?! | 1 C | 1 something-else | 1 D | 2 A | 1 (6 rows) You can refer to these as if they were real tables in following statements
  • 28. # with out_per_node as ( select a as node, count(*) as out_degree from edges group by a ), in_per_node as ( select b as node, count(*) as in_degree from edges group by b ) select node, coalesce(out_degree, 0) as out_degree, coalesce(in_degree, 0) as in_degree from out_per_node full outer join in_per_node using(node) order by node; Equivalent sub-select: select node, coalesce(out_degree, 0) as out_degree, coalesce(in_degree, 0) as in_degree from ( select a as node, count(*) as out_degree from edges group by a ) as out_per_node full outer join ( select b as node, count(*) as in_degree from edges group by b ) as in_per_node using(node) order by node;
  • 29. # with out_per_node as ( select a as node, count(*) as out_degree from edges group by a ), in_per_node as ( select b as node, count(*) as in_degree from edges group by b ) select node, coalesce(out_degree, 0) as out_degree, coalesce(in_degree, 0) as in_degree from out_per_node full outer join in_per_node using(node) where node = 'root' order by node; Equivalent sub-select: select node, coalesce(out_degree, 0) as out_degree, coalesce(in_degree, 0) as in_degree from ( select a as node, count(*) as out_degree from edges where a = 'root' group by a ) as out_per_node full outer join ( select b as node, count(*) as in_degree from edges where b = 'root' group by b ) as in_per_node using(node) order by node;
  • 30. # with out_per_node as ( select a as node, count(*) as out_degree from edges group by a ), in_per_node as ( select b as node, count(*) as in_degree from edges group by b ) select node, coalesce(out_degree, 0) as out_degree, coalesce(in_degree, 0) as in_degree from out_per_node full outer join in_per_node using(node) where node = 'root' order by node; Equivalent sub-select: select node, coalesce(out_degree, 0) as out_degree, coalesce(in_degree, 0) as in_degree from ( select a as node, count(*) as out_degree from edges where a = 'root' group by a ) as out_per_node full outer join ( select b as node, count(*) as in_degree from edges where b = 'root' group by b ) as in_per_node using(node) order by node; How equivalent?
  • 31. # explain with out_per_node as ( select a as node, count(*) as out_degree from edges group by a ), in_per_node as ( select b as node, count(*) as in_degree from edges group by b ) select node, coalesce(out_degree, 0) as out_degree, coalesce(in_degree, 0) as in_degree from out_per_node full outer join in_per_node using(node) where node = 'root' order by node; QUERY PLAN --------------------------------------------------------------------------------------- Sort (cost=36.62..36.64 rows=9 width=48) Sort Key: (COALESCE(edges.a, edges_1.b)) -> Hash Full Join (cost=18.24..36.48 rows=9 width=48) Hash Cond: (edges.a = edges_1.b) -> GroupAggregate (cost=0.00..18.17 rows=3 width=40) Group Key: edges.a -> Seq Scan on edges (cost=0.00..18.12 rows=3 width=32) Filter: (a = 'root'::text) -> Hash (cost=18.20..18.20 rows=3 width=40) -> GroupAggregate (cost=0.00..18.17 rows=3 width=40) Group Key: edges_1.b -> Seq Scan on edges edges_1 (cost=0.00..18.12 rows=3 width=32) Filter: (b = 'root'::text)
  • 32. # explain select node, coalesce(out_degree, 0) as out_degree, coalesce(in_degree, 0) as in_degree from ( select a as node, count(*) as out_degree from edges where a = 'root' group by a ) as out_per_node full outer join ( select b as node, count(*) as in_degree from edges where b = 'root' group by b ) as in_per_node using(node) order by node; QUERY PLAN --------------------------------------------------------------------------------------- Sort (cost=36.62..36.64 rows=9 width=48) Sort Key: (COALESCE(edges.a, edges_1.b)) -> Hash Full Join (cost=18.24..36.48 rows=9 width=48) Hash Cond: (edges.a = edges_1.b) -> GroupAggregate (cost=0.00..18.17 rows=3 width=40) Group Key: edges.a -> Seq Scan on edges (cost=0.00..18.12 rows=3 width=32) Filter: (a = 'root'::text) -> Hash (cost=18.20..18.20 rows=3 width=40) -> GroupAggregate (cost=0.00..18.17 rows=3 width=40) Group Key: edges_1.b -> Seq Scan on edges edges_1 (cost=0.00..18.12 rows=3 width=32) Filter: (b = 'root'::text) (The plans are identical!)
  • 33. WITH queries ● Those two queries were not identical until Postgres 12 ● Consider ● "Materialization", i.e. saving a temporary result, causes this to materialize the entirety of table a ○ (That's a very inefficient way to get two rows..) ● Older blog posts about CTEs will emphasize this as an "optimisation barrier", probably warning about them with a_million_numbers as ( select * from a -- table from earlier with index on id ) select * from a_million_numbers where id in (42, 43); } Postgres ≤11 will always materialize the result of a "table expression" – either in memory or disk
  • 34. # set work_mem to '64 kB'; -- force disk flushing with very low limit # explain (analyze true, verbose true, buffers true) with a_million_numbers as MATERIALIZED ( select * from a ) select * from a_million_numbers where id in (42, 43); -- QUERY PLAN -- CTE Scan on a_million_numbers (cost=14426.90..36928.92 rows=10001 width=4) (actual time=0.048..1356.717 rows=4 loops=1) Output: a_million_numbers.id Filter: (a_million_numbers.id = ANY ('{42,43}'::integer[])) Rows Removed by Filter: 1000086 Buffers: shared hit=4426, temp written=1709 CTE a_million_numbers -> Seq Scan on public.a (cost=0.00..14426.90 rows=1000090 width=4) (actual time=0.015..334.729 rows=1000090 loops=1) Output: a.id Buffers: shared hit=4426 Planning Time: 0.095 ms Execution Time: 1358.238 ms Force Postgres ≤11 behaviour Low on memory, flushing to disk
  • 35. # set work_mem to '64 kB'; -- force disk flushing with very low limit # explain (analyze true, verbose true, buffers true) with a_million_numbers as [NOT MATERIALIZED] ( select * from a ) select * from a_million_numbers where id in (42, 43); -- QUERY PLAN -- Index Only Scan using a_id_idx on a (cost=0.42..12.88 rows=2 width=4) (actual time=0.020..0.026 rows=4 loops=1) Index Cond: (id = ANY ('{42,43}'::integer[])) Heap Fetches: 4 Planning Time: 0.108 ms Execution Time: 0.043 ms -- vs 1358ms for materialized plan Default Postgres ≥12 behaviour
  • 36. # set work_mem to '64 kB'; -- force disk flushing with very low limit # explain (analyze true, verbose true, buffers true) with a_million_numbers as [NOT MATERIALIZED] ( select * from a where id in (42, 43) ) select * from a_million_numbers where id in (42, 43) -- QUERY PLAN -- Index Only Scan using a_id_idx on a (cost=0.42..12.88 rows=2 width=4) (actual time=0.020..0.026 rows=4 loops=1) Index Cond: (id = ANY ('{42,43}'::integer[])) Heap Fetches: 4 Planning Time: 0.108 ms Execution Time: 0.043 ms -- vs 1358ms for materialized plan as if inlined
  • 37. Writable CTEs ● DELETE/UPDATE/INSERT statements with RETURNING can be used as "tables" too ● Note! ○ RETURNING is the only way to let other table expressions see the result. ○ Different table expressions cannot otherwise see the effect of other modifications in the same statement. -- Delete rows while simultaneously -- inserting them elsewhere WITH moved_rows AS ( DELETE FROM products WHERE date >= '2010-10-01' AND date < '2010-11-01' RETURNING * ) INSERT INTO products_log SELECT * FROM moved_rows;
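Writable CTEs chain: a later table expression can consume the RETURNING output of an earlier one, and the final SELECT can report on the whole pipeline. A sketch building on the products/products_log example above:

# with moved_rows as (
  delete from products
  where date >= '2010-10-01' and date < '2010-11-01'
  returning *
), archived as (
  insert into products_log
  select * from moved_rows
  returning *
)
select count(*) as rows_moved from archived;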
  • 38. Recursive CTEs # WITH RECURSIVE t(n) AS ( VALUES (1) -- starting value(s) UNION ALL SELECT n+1 FROM t -- t will keep growing WHERE n < 100 -- until termination condition is true ) SELECT sum(n) FROM t; sum ------ 5050
  • 39. # with recursive graph_traversal(a, b, path, depth) as ( select a, b, ARRAY[a] as path, 0 as depth from edges where a='root' union all select edges.a, edges.b, path || ARRAY[edges.a], depth + 1 from edges join graph_traversal on(edges.a=graph_traversal.b) -- Avoid looping forever if there's a cycle where not(edges.a = ANY(path)) ) select * from graph_traversal order by depth, a; a | b | path | depth ----------------+----------------+----------------------------------+------- root | A | {root} | 0 root | something-else | {root} | 0 A | B | {root,A} | 1 something-else | A pump?! | {root,something-else} | 1 A pump?! | D | {root,something-else,"A pump?!"} | 2 B | C | {root,A,B} | 2 B | D | {root,A,B} | 2 (7 rows)
  • 40. -- Mandelbrot set WITH RECURSIVE x(i) AS ( VALUES(0) UNION ALL SELECT i + 1 FROM x WHERE i < 101 ), Z(Ix, Iy, Cx, Cy, X, Y, I) AS ( SELECT Ix, Iy, X::float, Y::float, X::float, Y::float, 0 FROM (SELECT -2.2 + 0.031 * i, i FROM x) AS xgen(x,ix) CROSS JOIN (SELECT -1.5 + 0.031 * i, i FROM x) AS ygen(y,iy) UNION ALL SELECT Ix, Iy, Cx, Cy, X * X - Y * Y + Cx AS X, Y * X * 2 + Cy, I + 1 FROM Z WHERE X * X + Y * Y < 16.0 AND I < 27 ), Zt (Ix, Iy, I) AS ( SELECT Ix, Iy, MAX(I) AS I FROM Z GROUP BY Iy, Ix ORDER BY Iy, Ix ) SELECT array_to_string( array_agg( SUBSTRING( ' .,,,-----++++%%%%@@@@#### ', GREATEST(I,1), 1 ) ),'' ) FROM Zt GROUP BY Iy ORDER BY Iy;
  • 41. What's up in my database?
  • 42. pg_stat_activity # select * from pg_stat_activity where state='idle'; -[ RECORD 1 ]----+------------------------------------------------ datid | 26211311 datname | sample pid | 5291 usesysid | 16384 usename | alex application_name | psql client_addr | client_hostname | client_port | -1 backend_start | 2021-06-14 16:11:47.491729+02 xact_start | query_start | 2021-06-14 16:34:41.027633+02 state_change | 2021-06-14 16:34:41.02857+02 wait_event_type | Client wait_event | ClientRead state | idle backend_xid | backend_xmin | query | delete from person where is_crazy returning id; backend_type | client backend
  • 43. pg_stat_activity # select * from pg_stat_activity where state='idle'; -[ RECORD 1 ]----+------------------------------------------------ datid | 26211311 datname | sample pid | 5291 usesysid | 16384 usename | alex application_name | psql client_addr | client_hostname | client_port | -1 backend_start | 2021-06-14 16:11:47.491729+02 xact_start | query_start | 2021-06-14 16:34:41.027633+02 state_change | 2021-06-14 16:34:41.02857+02 wait_event_type | Client wait_event | ClientRead state | idle backend_xid | backend_xmin | query | delete from person where is_crazy returning id; backend_type | client backend # set application_name to 'my-app:some-role' jdbc:postgresql://localhost:5435/MyDB?ApplicationName=MyApp "idle in transaction" is typically bad, especially if holding locks
  • 44. What's going on? ● application_name lets you identify your connection in pg_stat_activity ● pg_terminate_backend(pid) to kill a bad connection ● pg_stat_activity for what's up right now ● pg_stat_statements extension for tracking what has happened ○ Install: create extension if not exists pg_stat_statements ○ # install on all subsequently created databases: $ psql template1 -c 'create extension if not exists pg_stat_statements'
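As a hedged example of the kind of housekeeping these views enable, the query below lists sessions that have sat "idle in transaction" for a while, and then terminates one by pid (the threshold and the pid are illustrative):

# select pid, usename, application_name, xact_start, query
from pg_stat_activity
where state = 'idle in transaction'
  and xact_start < now() - interval '5 minutes';

# select pg_terminate_backend(12345); -- pid taken from the query above

One caveat the slide glosses over: create extension only exposes the pg_stat_statements view; the module also has to be listed in shared_preload_libraries (which requires a restart) before it collects anything.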
  • 45. pg_stat_statements ● pg_stat_statements extension for tracking continuous activity ● What are the most expensive queries over time? ● One of the most useful extensions! # select * from pg_stat_statements order by total_time desc limit 1; -[ RECORD 1 ]-------+------------------------------------------------------------ […] query | with a_million_numbers as materialized (select * from a) select * from a_million_numbers where id in(42, 43) calls | 2 total_time | 2500.088486 min_time | 1160.500207 max_time | 1339.588279 mean_time | 1250.044243 stddev_time | 89.544036 rows | 0 shared_blks_hit | 8852 […] temp_blks_written | 1710
  • 46. pg_stat_statements ● Some ORMs/toolkits fill pg_stat_statements with junk: ○ WHERE id IN ($1, $2, …, $987) ○ WHERE id IN ($1, $2, …, $5432) ● Consider id = ANY($parameter_as_array) # select * from pg_stat_statements order by total_time desc limit 1; -[ RECORD 1 ]-------+------------------------------------------------------------ […] query | with a_million_numbers as materialized (select * from a) select * from a_million_numbers where id in ($1, $2, $3, …) calls | 2 total_time | 2500.088486 min_time | 1160.500207 max_time | 1339.588279 mean_time | 1250.044243 stddev_time | 89.544036 rows | 0 shared_blks_hit | 8852 […] temp_blks_written | 1710
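What the = ANY alternative looks like in practice: one statement shape no matter how many ids the application passes, so pg_stat_statements aggregates it as a single entry. A sketch against the a table from earlier (the prepared-statement name is made up):

# prepare by_ids as
select * from a where id = any($1::int[]);
# execute by_ids('{42,43}');

When you want a fresh baseline, pg_stat_statements_reset() clears the accumulated statistics.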
  • 47. Locks The likely reason for a migration causing an outage
  • 48. Sample lock block # begin; # create index on person(name); CREATE INDEX Client A Client B Client C In Postgres, DDL statements like create table, alter table, and create index are transactional. Unlike Oracle and MySQL, "CREATE TABLE" does not imply commit. "Implicit commit" is not a thing in Postgres.
  • 49. Sample lock block # begin; # create index on person(name); CREATE INDEX -- Index creation needs a share lock -- Block to see effects on others # select pg_sleep(600); # begin; -- Reading is fine # select * from person where id=3; id | name | is_crazy ----+---------+---------- 3 | Mallory | f -- But cannot update: # update person set name='Not Mallory' where id=3; -- We're BLOCKED by client A :( -- Meanwhile, checking the status: # select query, wait_event_type from pg_stat_activity where state='active' and pid != pg_backend_pid(); -[ RECORD 1 ]---+---------------------- query | select pg_sleep(600); wait_event_type | Timeout -[ RECORD 2 ]---+---------------------- query | update person set name='Not Mallory' where id=3; wait_event_type | Lock Client A Client B Client C pg_sleep is useful to artificially slow down operations to observe locking effects in dev
  • 50. Locky migrations ● CREATE INDEX prevents writes on the target table for the duration of the transaction. Locks are released (only!) on commit/rollback. ● ALTER TABLE (add column and/or constraints, etc) will need an exclusive lock on the table. ● An exclusive lock on a table will block all concurrent access = OUTAGE ● A share lock will block all concurrent write access = read only access ● You generally want to minimise time spent holding or waiting for such locks ○ statement_timeout ○ lock_timeout
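A hedged sketch of using those timeouts so a migration fails fast instead of queueing behind a long-running transaction and blocking everyone else (the values and the added column are illustrative; retry the migration if it times out):

# begin;
# set local lock_timeout = '2s';
# set local statement_timeout = '30s';
# alter table person add column nickname text;
# commit;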
  • 51. Less locky migrations ● CREATE INDEX CONCURRENTLY does not block concurrent writing to the table ○ Reads the table twice. Costs more IO. ○ Cannot be done in a transaction, i.e. it must be the only operation in its own transaction ○ Caveats apply, consult the docs ● ALTER TABLE ADD CONSTRAINT … NOT VALID ○ Adds a constraint that only applies to subsequent writes ○ Existing data is not validated, so an exclusive lock needed only briefly ● ALTER TABLE … VALIDATE CONSTRAINT … ○ Holds a much weaker lock while reading the entire table ○ Can of course fail if data exists that cannot validate ● CREATE UNIQUE INDEX CONCURRENTLY for a uniqueness constraint
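A sketch of the concurrent-index and NOT VALID foreign-key patterns above, reusing the earlier foo/bar tables and assuming the constraint does not already exist:

-- Outside of any transaction block:
# create index concurrently bar_foo_idx on bar(foo);

-- Brief exclusive lock only; existing rows are not checked yet:
# alter table bar
add constraint bar_references_foo foreign key (foo) references foo(id)
not valid;

-- Full-table read under a much weaker lock:
# alter table bar validate constraint bar_references_foo;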
  • 52. Less locky migrations # select * from equipment; id | mass_in_grams ----+--------------- 1 | 42 2 | -1 (2 rows) # alter table equipment add constraint mass_not_negative check (mass_in_grams >= 0) not valid; ALTER TABLE # insert into equipment values (3, -42); ERROR: new row for relation "equipment" violates check constraint "mass_not_negative" DETAIL: Failing row contains (3, -42). # alter table equipment validate constraint mass_not_negative; ERROR: check constraint "mass_not_negative" is violated by some row Needs exclusive lock for some milliseconds Scans the entire table without strong lock
  • 54. Deferred constraints ● A constraint can be deferred to commit time ● Useful e.g. for cyclic foreign keys ● Cyclic foreign key useful when partitioning out certain columns ○ This is useful for "last_modified" kinds of use cases. ○ Read more about "heap only tuples" if you have that use case # create table something ( -- change_info does not exist yet, adding -- FK via ALTER below: id int primary key, what text ); # create table change_info ( id int primary key references something(id) deferrable initially deferred, last_modified timestamptz not null default now() ); # alter table something add constraint must_have_change_info foreign key (id) references change_info(id) deferrable initially deferred;
  • 55. Deferred constraints # insert into something values (1, 'this-will-fail'); ERROR: insert or update on table "something" violates foreign key constraint "must_have_change_info" DETAIL: Key (id)=(1) is not present in table "change_info". # begin; -- Not failing, FK check deferred to commit # insert into something values (1, 'with-version-info'); INSERT 0 1 # insert into change_info values (1, now()); INSERT 0 1 # end; COMMIT # begin; # insert into change_info values (2, now()); INSERT 0 1 -- Will fail: # end; ERROR: insert or update on table "change_info" violates foreign key constraint "change_info_id_fkey" DETAIL: Key (id)=(2) is not present in table "something". # create table something ( -- change_info does not exist yet, adding -- FK via ALTER below: id int primary key, what text ); # create table change_info ( id int primary key references something(id) deferrable initially deferred, last_modified timestamptz not null default now() ); # alter table something add constraint must_have_change_info foreign key (id) references change_info(id) deferrable initially deferred;
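Deferrable constraints can also be toggled per transaction with SET CONSTRAINTS, for example to surface a failure earlier than commit while debugging. A small sketch using the constraint above:

# begin;
# insert into something values (2, 'added-without-change-info');
# set constraints must_have_change_info immediate;
-- the deferred FK check runs here and fails, instead of at commit
# rollback;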
  • 56. Functional indexes -- Functional indexes: # create table users ( id int primary key, username text ); # create unique index on users(lower(username)); # insert into users values (1, 'alex'), (2, 'ALEX'); ERROR: duplicate key value violates unique constraint "users_lower_idx" DETAIL: Key (lower(username))=(alex) already exists. -- Note: Index lookups must use the -- same function -- This can use the index: # select * from users where lower(username)='alex'; -- This will NOT be able to use that index: # select * from users where username='alex';
  • 57. Partial indexes # create table assets ( id int primary key, external_id text not null, deleted_at timestamp ); # create unique index on assets(external_id) where deleted_at is null; # insert into assets values (1, 'one', now()), (2, 'one', null); INSERT 0 2 # insert into assets values (3, 'not one', null), (4, 'one', null); ERROR: duplicate key value violates unique constraint "assets_external_id_idx" DETAIL: Key (external_id)=(one) already exists. -- This can use the partial index: # select * from assets where external_id='one' AND deleted_at is null; -- This will NOT be able to use that index: # select * from users where external_id='one'; -- missing (deleted_at is null)
  • 58. Range Types and Exclusion Constraints
  • 59. Range types ● A range is an interval of numeric-like data ● int4range(0, 10): 0 <= n < 10 ● tstzrange(now(), null): any time >= now() ● Index support for overlaps and containment ○ overlaps: int4range(0, 10) && int4range(9, 20) ○ contains: ■ int4range(0, 10) @> 2 ■ int4range(0, 10) @> int4range(2, 4) # create table intervals ( id int primary key, start timestamptz, "end" timestamptz ); -- Insert 1 million random intervals # with random_starts as ( select generate_series(1, 1000*1000) as id, '2020-01-01'::timestamptz + (1000 * random()) * interval '1 day' as start ) insert into intervals select id, start, start + (1000 * random() * interval '1 hour') as "end" from random_starts; INSERT 0 1000000 # create index on intervals using gist(tstzrange(start, "end")); "Generalised search tree", R-tree and more
  • 60. Range types # create table intervals ( id int primary key, start timestamptz, "end" timestamptz ); -- [Insert random intervals -- happened here] # create index on intervals using gist(tstzrange(start, "end")); # explain analyze select * from intervals where tstzrange(start, "end") && tstzrange(now(), now() + interval '7 days'); -- QUERY PLAN -- Bitmap Heap Scan on intervals (cost=1368.16..8447.01 rows=28354 width=20) (actual time=46.379..80.761 rows=27837 loops=1) Recheck Cond: (tstzrange(start, "end") && tstzrange(now(), (now() + '7 days'::interval))) Heap Blocks: exact=6278 -> Bitmap Index Scan on intervals_tstzrange_idx (cost=0.00..1361.08 rows=28354 width=0) (actual time=44.309..44.309 rows=27837 loops=1) Index Cond: (tstzrange(start, "end") && tstzrange(now(), (now() + '7 days'::interval))) Planning Time: 0.118 ms Execution Time: 56.151 ms
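Containment queries can use the same GiST index, as long as they spell out the same tstzrange(start, "end") expression:

# select count(*) from intervals
where tstzrange(start, "end") @> now(); -- ranges containing this instant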
  • 61. Exclusion constraints ● Define ranges that cannot overlap ● No two reservations of the same resource can overlap ● Exclusion constraints cannot be added "concurrently", i.e. without write-locking the table. # create table reservations ( room text, start timestamptz, "end" timestamptz, constraint no_double_booking exclude using gist( room with =, tstzrange(start, "end") with && ) ); # insert into reservations values ('zelda', '2021-06-15', '2021-06-16'), ('zelda', '2021-06-17', '2021-06-18'), ('portal', '2021-06-01', '2021-07-01'); INSERT 0 3 # insert into reservations values ('zelda', '2021-06-14', '2021-06-16'); ERROR: conflicting key value violates exclusion constraint "no_double_booking" DETAIL: [omitted]
  • 62. Exclusion constraints ● Define constraint that all elements must be the same ○ "Exclude dissimilar" ● Example: ○ Don't put humans and lions in the same cage at the same time # create table cages ( cage text, animal text, start timestamptz, "end" timestamptz, constraint just_same_animals exclude using gist( cage with =, animal with !=, tstzrange(start, "end") with && ) ); # insert into cages values ('cellar', 'human', '2021-06-15', '2021-06-16'), ('bedroom', 'lion', '2021-06-15', '2021-06-16'), ('cellar', 'human', '2021-06-01', '2021-07-01'); INSERT 0 3 # insert into cages values ('bedroom', 'human', '2021-06-14', '2021-06-16'); ERROR: conflicting key value violates exclusion constraint "just_same_animals" DETAIL: [omitted]
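One practical note, hedged because it depends on the installation: the scalar = and != parts of these GiST exclusion constraints (on room and animal) typically rely on the btree_gist extension, so the table definitions above generally need it created first:

# create extension if not exists btree_gist;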
  • 64. Upsert ● INSERT can take an ON CONFLICT ● Given a conflict, either ○ DO NOTHING ○ DO UPDATE SET … [ WHERE … ] # create table users ( user_id bigint primary key, username text not null, name text ); create unique index on users(lower(username)); insert into users values (1, 'alex', 'Alex'), (2, 'bob', 'Bob'); # prepare sample_upsert as insert into users values ($1, $2, $3) on conflict(user_id) do update set username=excluded.username, name=excluded.name where row(users.username, users.name) is distinct from row(excluded.username, excluded.name) returning *; # execute sample_upsert(1, 'alex', 'AlexB'); user_id | username | name ---------+----------+------- 1 | alex | AlexB -- Repeat the same, then what? # execute sample_upsert(1, 'alex', 'AlexB'); user_id | username | name ---------+----------+------ (0 rows)
  • 65. Upsert ● ON CONFLICT DO NOTHING does not return data # prepare fixed_upsert as with maybe_upserted as ( insert into users values ($1, $2, $3) on conflict(user_id) [ … same as previous …] returning * ) select * from maybe_upserted union all select * from users where user_id = $1 limit 1; # execute fixed_upsert(1, 'alex', 'AlexB'); user_id | username | name ---------+----------+------- 1 | alex | AlexB -- Repeat the same, then what? # execute fixed_upsert(1, 'alex', 'AlexB'); user_id | username | name ---------+----------+------- 1 | alex | AlexB
  • 66. Upsert ● ON CONFLICT DO NOTHING does not return data # prepare fixed_upsert as with maybe_upserted as ( insert into users values ($1, $2, $3) on conflict(user_id) [ … same as previous …] returning * ) select * from maybe_upserted union all select * from users where user_id = $1 limit 1; -- (note: union all for this trick) # explain analyze execute fixed_upsert(1, 'alex', 'Changed Name'); QUERY PLAN --------------------------------------------------- Limit (..) (actual ...) CTE maybe_upserted -> Insert on users users_1 (...) Conflict Resolution: UPDATE Conflict Arbiter Indexes: users_pkey Conflict Filter: ([…snip…]) Tuples Inserted: 0 Conflicting Tuples: 1 -> Result (...) -> Append (...) -- (this is UNION ALL) -> CTE Scan on maybe_upserted (...) -> Index Scan using users_pkey on users (...) (never executed) Index Cond: (user_id = '1'::bigint) Planning Time: 0.170 ms Execution Time: 0.082 ms
  • 67. JSON
  • 68. JSON ● jsonb types, functions and aggregates ● Validates conformity only ○ Not a replacement for proper schemas! ● Use with care ○ Not a replacement for proper schemas! :) # create table metadata ( user_id int, key text, metadata jsonb ); # insert into metadata values (1, 'settings', '{"foo": "bar"}'), (1, 'searches', '["where", "what"]'); # select user_id, jsonb_agg(metadata) from metadata group by 1; user_id | jsonb_agg ---------+------------------------------------- 1 | [{"foo": "bar"}, ["where", "what"]] # select user_id, jsonb_object_agg(key, metadata) from metadata group by 1; user_id | jsonb_object_agg ---------+------------------------------------------ 1 | {"searches": ["where", "what"], "settings": {"foo": "bar"}}
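jsonb also gets index support: a GIN index accelerates containment (@>) and key-existence queries. A sketch on the metadata table above (jsonb_path_ops is an optional, smaller operator class that supports only @>):

# create index on metadata using gin (metadata);
# select * from metadata where metadata @> '{"foo": "bar"}';

-- Smaller and often faster alternative, containment only:
# create index on metadata using gin (metadata jsonb_path_ops);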
  • 69. JSON object graphs ● Compose complete JSON object graphs via LATERAL joins. ● Does not require the rows to have JSON types ● LATERAL is a bit like "for each" ○ The subquery gets to reference the row ● GraphQL-implementations on top of Postgres do this (e.g. Hasura, Postgraphile) # select * from users join metadata using(user_id); user_id | username | key | metadata ---------+----------+----------+------------------- 1 | alex | settings | {"foo": "bar"} 1 | alex | searches | ["where", "what"] # select jsonb_build_object( 'username', username, 'metadata', aggregated_metadata ) as user_object from users left join lateral ( -- For each user: select jsonb_object_agg(key, metadata) as metadata from metadata where metadata.user_id=users.user_id ) as aggregated_metadata on true; user_object -------------------------- { "metadata": { "searches": [ "where", "what" ], "settings": { "foo": "bar" } }, "username": "alex" }
  • 70. Upserting object graphs ● Convert JSON object graphs to records via LATERAL joins ● Upsert objects to different tables with a single writable WITH-query # create table users ( user_id bigint generated by default as identity primary key, username text unique not null, name text ); # create table user_settings ( user_id bigint primary key references users(user_id), settings jsonb not null ); -- TODO: Define 'upsert' query that either creates or patches -- across multiple tables with a single parameter # execute upsert('[ { "user": {"username": "alex", "name": "Alex"}, "settings": {"foo": "bar"} }, { "user": {"username": "mallory", "name": "Mallory"} } ]'); # execute upsert('[ { "user": {"username": "alex", "name": "AlexB"} }, { "user": {"username": "mallory", "name": "Mallory"}, "settings": {"bar": "baz"} } ]');
  • 71. # prepare upsert as with records as ( select * from jsonb_to_recordset($1::jsonb) as _("user" jsonb, "settings" jsonb) ), maybe_upserted_users as ( insert into users (username, name) select username, name from records join lateral jsonb_populate_record(null::users, records.user) on(true) on conflict (username) do -- insert or update update set username=excluded.username, name=coalesce(excluded.name, users.name) where row(users.username, users.name) is distinct from row(excluded.username, excluded.name) returning * ) select * from maybe_upserted_users; # execute upsert('[...]'); user_id | username | name ---------+----------+--------- 1 | alex | Alex 2 | mallory | Mallory # create table users ( user_id bigint ... primary key, username text unique not null, name text ); # create table user_settings ( user_id bigint primary key references users(user_id), settings jsonb not null ); # execute upsert('[ { "user": {"username": "alex", "name": "Alex"}, "settings": {"foo": "bar"} }, { "user": {"username": "mallory", "name": "Mallory"} } ]');
  • 72. # prepare upsert as with records as ( select * from jsonb_to_recordset($1::jsonb) as _("user" jsonb, "settings" jsonb) ), maybe_upserted_users as ( insert into users (username, name) select username, name from records join lateral jsonb_populate_record(null::users, records.user) on(true) on conflict (username) do -- insert or update update set username=excluded.username, name=coalesce(excluded.name, users.name) where row(users.username, users.name) is distinct from row(excluded.username, excluded.name) returning * ) select * from maybe_upserted_users; # execute upsert('[ { "user": {"username": "alex", "name": "Alex"}, "settings": {"foo": "bar"} }, { "user": {"username": "mallory", "name": "Mallory"} } ]'); user_id | username | name ---------+----------+--------- 1 | alex | Alex 2 | mallory | Mallory -- On conflict's WHERE condition makes nothing be returned: # execute upsert('[...]'); user_id | username | name ---------+----------+------ (0 rows)
  • 73. # prepare upsert as with records as ( select * from jsonb_to_recordset($1::jsonb) as _("user" jsonb, "settings" jsonb) ), maybe_upserted_users as ( insert into users (username, name) select username, name from records join lateral jsonb_populate_record(null::users, records.user) on(true) on conflict (username) do -- insert or update update set username=excluded.username, name=coalesce(excluded.name, users.name) where row(users.username, users.name) is distinct from row(excluded.username, excluded.name) returning * ), all_users as ( ... ) select * from all_users all_users as ( select * from maybe_upserted_users union all select * from users where username in ( select "user" ->> 'username' from records except all select username from maybe_upserted_users ) ) -- We now get all users: created or updated or neither # execute upsert('[...]'); user_id | username | name ---------+----------+--------- 1 | alex | Alex 2 | mallory | Mallory # execute upsert('[...]'); user_id | username | name ---------+----------+--------- 1 | alex | Alex 2 | mallory | Mallory
  • 74. # prepare upsert as with records as ( ... ), maybe_upserted_users as ( ... ), all_users as ( ... ), updated_settings as ( insert into user_settings select user_id, settings from all_users join records on (records.user ->> 'username' = username) where settings is not null on conflict(user_id) do update set settings=excluded.settings where user_settings.settings is distinct from excluded.settings ) select username, user_id from all_users; # execute upsert('[ { "user": {"username": "alex", "name": "Alex"}, "settings": {"foo": "bar"} }, { "user": {"username": "mallory", "name": "Mallory"} } ]'); username | user_id ----------+--------- alex | 1 mallory | 2 # select * from user_settings; user_id | settings ---------+---------------- 1 | {"foo": "bar"}
  • 76. NOTIFY+LISTEN and advisory locks ● LISTEN: Get an async callback when something does NOTIFY ○ LISTEN channel; -- Get notifications between transactions ● NOTIFY: Send callbacks. Delivered between transactions to LISTEN-ers ○ NOTIFY channel; -- Wake up listeners ○ Triggers can ensure notifications are sent ● Advisory locks: ○ "Leader election" through Postgres ○ Locks that last until you disconnect ○ Need a limited number of background processes to pick up work? ○ SELECT pg_try_advisory_lock(1234); ● SKIP LOCKED ○ SELECT * FROM work_items LIMIT 1 FOR UPDATE SKIP LOCKED
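A hedged sketch tying those pieces together as a tiny job queue; work_items comes from the slide's SKIP LOCKED example, while the trigger, function and channel names are made up for illustration:

# create table work_items (
  id bigint generated always as identity primary key,
  payload jsonb,
  done boolean not null default false
);

# create function notify_new_work() returns trigger language plpgsql as $$
begin
  perform pg_notify('new_work', new.id::text);
  return new;
end;
$$;

# create trigger work_items_notify
after insert on work_items
for each row execute function notify_new_work();

-- A worker: wake up on notifications, then claim one item nobody else holds
# listen new_work;
# begin;
# select * from work_items where not done
order by id for update skip locked limit 1;
-- ...do the work, mark it done, then:
# commit;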
  • 77. Learn more! ● tinyurl.com/jz21-psql ● medium.com/@alexbrasetvik/postgres-can-do-that-f221a8046e ● medium.com/cognite ● Tweet me questions or feedback: @alexbrasetvik all the same