1. Python in the database
Writing PostgreSQL triggers and functions
with Python
2. Who Am I
Brian Sutherland
Partner at Vanguardistas LLC
Worked with PostgreSQL and Python for years
Used them to build Metro Publisher (SaaS)
3. What is PostgreSQL?
A non-NoSQL database…
Actually an SQL database
Extremely extensible
Performant
First released 1995
Very good general purpose Database
4. But we’re here to talk about Triggers
Code which executes inside the database
process in response to events
e.g. INSERT, UPDATE or DELETE of a row
5. Triggers
Used for
● auditing/logging
● sending email
● validation
● denormalization/cache
● cache invalidation
● replication
6. Triggers
PostgreSQL allows writing triggers in a number
of languages:
● C
● Java
● Javascript
● Python
● ...
7. Triggers
Use with caution
● Principle of least astonishment
○ INSERT can send email
● Transactions
○ Serialization Errors
○ Idempotency
○ Transaction Rollback
8. PL/Python
● Python 2 and 3
● Basic Postgres types are converted to
Python types
● An “untrusted” language
● One interpreter per database session
9. Calendaring Example
Web application for calendaring
● Recurring events
● Read queries must be fast
High number of database reads compared to
writes
10. Calendaring Example
Every Weekday at 3PM until 1 January 2020
>>> from datetime import datetime
>>> from dateutil.rrule import *
>>> list(rrule(DAILY,
...            byweekday=[MO, TU, WE, TH, FR],
...            dtstart=datetime(2014, 11, 10, 15),
...            until=datetime(2020, 1, 1)))
[datetime.datetime(2014, 11, 10, 15, 0),
datetime.datetime(2014, 11, 11, 15, 0),
datetime.datetime(2014, 11, 12, 15, 0),
…]
11. Calendaring Example
Naïve Implementation
● Store only the rule in the database
● On display, expand the rule using dateutil
● Render calendar in HTML
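The cost of this approach can be sketched in plain Python. This is an illustration only: expand_weekdays is a hypothetical, stdlib-only stand-in for a dateutil rule expansion, and the events dict plays the role of the event table. The point is that every query has to expand every stored rule before it can filter.

```python
from datetime import datetime, timedelta


def expand_weekdays(dtstart, until):
    """Hypothetical stand-in for dateutil: one occurrence per weekday up to `until`."""
    current = dtstart
    while current <= until:
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            yield current
        current += timedelta(days=1)


def naive_find(events, window_start, window_end):
    """Return ids of events with an occurrence in [window_start, window_end)."""
    matches = []
    for event_id, (dtstart, until) in events.items():
        # Every rule is expanded on every query: this is the expensive part.
        for occurrence in expand_weekdays(dtstart, until):
            if window_start <= occurrence < window_end:
                matches.append(event_id)
                break
    return matches


# Two made-up events standing in for rows of the event table.
events = {
    1: (datetime(2014, 11, 10, 15), datetime(2020, 1, 1)),
    2: (datetime(2014, 11, 10, 9), datetime(2020, 1, 1)),
}
found = naive_find(events,
                   datetime(2014, 11, 12, 15),
                   datetime(2014, 11, 12, 16))
```

The work grows with the number of events times the length of each rule, and no database index can help because the occurrences are never stored.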
12. Calendaring Example
FAIL
There are 100 000 events in the database, find
all events which occur between 3 and 4 PM
today
Calculating………………………...
13. Calendaring Example
Find another way, use triggers to
● Pre-calculate occurrences
● Store them in another “cache” table
● Use PostgreSQL indexes to make queries
fast
14. Calendaring Example
Trigger on the “event” generates occurrences
Store the occurrences in an “occ” table
Thanks to indexing, this query is FAST:
SELECT * FROM occ
WHERE dtstart > X AND dtend < Y
15. Calendaring Example
PostgreSQL has a range type which makes
things even faster:
SELECT * FROM occ
WHERE occuring && tsrange(X, Y)
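For readers unfamiliar with the && operator: it tests whether two ranges overlap. A minimal Python sketch of that test, assuming non-empty ranges and the default bounds of tsrange(X, Y) (lower bound included, upper bound excluded). The function name is hypothetical.

```python
from datetime import datetime


def ranges_overlap(a_start, a_end, b_start, b_end):
    """True if half-open ranges [a_start, a_end) and [b_start, b_end) share an instant."""
    return a_start < b_end and b_start < a_end


# A one-hour occurrence, 3PM to 4PM.
occurrence = (datetime(2014, 11, 10, 15, 0), datetime(2014, 11, 10, 16, 0))

# A window starting at 3:30PM overlaps the occurrence.
overlapping = ranges_overlap(*occurrence,
                             datetime(2014, 11, 10, 15, 30),
                             datetime(2014, 11, 10, 17, 0))

# A window starting exactly at 4PM does not: the upper bound is excluded.
adjacent = ranges_overlap(*occurrence,
                          datetime(2014, 11, 10, 16, 0),
                          datetime(2014, 11, 10, 17, 0))
```

Unlike a comparison of the form dtstart > X AND dtend < Y, an overlap test also matches occurrences that fall only partly inside the window.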
16. Calendaring Example
Creating the trigger
CREATE LANGUAGE "plpython2u";
CREATE FUNCTION event_occs () RETURNS trigger AS $$
from my.plpy.generate_event_occs import generate_event_occs
generate_event_occs(TD["new"])
return "OK"
$$ LANGUAGE plpython2u;
CREATE TRIGGER event_gen_occs BEFORE UPDATE OR INSERT ON
event FOR EACH ROW EXECUTE PROCEDURE event_occs();
17. Calendaring Example
Much simplified function:
import plpy

def generate_event_occs(new):
    # Drop the occurrences cached for this event, then regenerate them.
    d = plpy.prepare("DELETE FROM occ WHERE event_id=$1", ["int"])
    plpy.execute(d, [new['event_id']])
    i = plpy.prepare("INSERT INTO occ VALUES ($1,$2)", ["int", "tsrange"])
    # rrule(new) stands in for expanding the event's rule into periods.
    for period in rrule(new):
        plpy.execute(i, [new['event_id'], period])
19. JSON Validation example
Much simplified function:
CREATE FUNCTION check_foo() RETURNS trigger AS $$
from json import loads
foo = loads(TD["new"]["foo"])
if "type" not in foo or foo["type"] not in ["a", "b"]:
    raise Exception("Invalid Type")
return "OK"
$$ LANGUAGE plpython2u;
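The same check can live in an ordinary module function, which the trigger body would then import and call. A minimal sketch under assumptions: the names check_foo and ALLOWED_TYPES are hypothetical, and this version differs slightly from the trigger above in that it raises ValueError and also rejects JSON that is not an object.

```python
from json import loads

# Hypothetical whitelist, matching the ["a", "b"] used in the trigger above.
ALLOWED_TYPES = ("a", "b")


def check_foo(raw):
    """Raise ValueError unless `raw` is a JSON object whose "type" is allowed."""
    foo = loads(raw)  # malformed JSON also raises a ValueError subclass
    if not isinstance(foo, dict) or foo.get("type") not in ALLOWED_TYPES:
        raise ValueError("Invalid Type")
    return foo


valid = check_foo('{"type": "a"}')
```

Because the function needs nothing from the database, it can be tried in a normal Python session before it is wired into a trigger.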
20. Best Practices
Immediately import and call a Python function
CREATE FUNCTION event_occs () RETURNS trigger AS $$
from my.plpy.generate_event_occs import generate_event_occs
generate_event_occs(TD["new"])
return "OK"
$$ LANGUAGE plpython2u;
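One practical consequence of this pattern is that the logic sits in a normal module and can be exercised outside the database. A minimal sketch, assuming the module function accepts the plpy object as a parameter. That signature, the periods argument and the FakePlpy class are all hypothetical, not the author's code. Only prepare and execute, which the real plpy module provides, are imitated.

```python
class FakePlpy:
    """Hypothetical test double: records calls instead of touching a database."""

    def __init__(self):
        self.executed = []

    def prepare(self, query, argtypes=None):
        return query  # the "plan" is simply the query text

    def execute(self, plan, args=None):
        self.executed.append((plan, list(args or [])))
        return []


def generate_event_occs(new, periods, plpy):
    """Replace the cached occurrences of one event (plpy is passed in for testing)."""
    delete = plpy.prepare("DELETE FROM occ WHERE event_id=$1", ["int"])
    plpy.execute(delete, [new["event_id"]])
    insert = plpy.prepare("INSERT INTO occ VALUES ($1,$2)", ["int", "tsrange"])
    for period in periods:
        plpy.execute(insert, [new["event_id"], period])


fake = FakePlpy()
generate_event_occs({"event_id": 7},
                    ["[2014-11-10 15:00,2014-11-10 16:00)"],
                    fake)
```

After the call, fake.executed holds the DELETE followed by one INSERT per period, which is enough to assert on in a unit test.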
21. Best Practices
Import time can kill performance, as modules
are re-imported for every database connection
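A rough way to see the effect in an ordinary Python session is to time a cold import against a repeated one. This is only a sketch: timed_import is a hypothetical helper, and colorsys is an arbitrary stdlib module chosen for the example.

```python
import importlib
import sys
import time


def timed_import(name):
    """Import a module by name and return (module, seconds spent importing)."""
    started = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - started


# Drop the module from the cache to imitate a fresh interpreter,
# as at the start of a new database session.
sys.modules.pop("colorsys", None)
_, cold = timed_import("colorsys")

# The second import finds the module in sys.modules and does almost no work.
_, warm = timed_import("colorsys")

print("cold: %.6fs, warm: %.6fs" % (cold, warm))
```

Within one session the cost is paid once per module. Applications that open many short-lived connections pay it again each time.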
22. The ugly
● Except for some very basic types, Python 2
gets fed byte strings in the “database
encoding”
● A little better in Python 3 which gets unicode
● Debugging is interesting... (try running PDB
inside the PostgreSQL process)
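One way to cope with the byte strings is to decode at the boundary, before the value reaches the rest of the code. A minimal sketch, assuming the database encoding is UTF-8 (in a real database, SHOW server_encoding reports it). The helper name to_text is hypothetical.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding it first if it arrived as bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Simulate a text column arriving as bytes in the database encoding.
raw = "Naïve".encode("utf-8")
text = to_text(raw)
```

Values that are already text pass through unchanged, so the same helper can be applied regardless of which Python version the trigger runs on.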
23. The REALLY ugly (Fixed?)
ERROR: Exception: oops
CONTEXT: Traceback (most recent call last):
PL/Python function "generate_event_occs", line 3, in <module>
return generate_event_occs(event, rrule, SD)
PL/Python function "generate_event_occs", line 256, in generate_event_occs
PL/Python function "generate_event_occs", line 73, in generate_occurrences
PL/Python function "generate_event_occs", line 97, in generate
PL/Python function "generate_event_occs", line 424, in _handle_byday
PL/Python function "generate_event_occs", line 206, in resolve
PL/Python function "generate_event_occs"
PL/pgSQL function content.generate_event_occs() line 7 at assignment
SQL statement "UPDATE content.event SET dtstart_occs=NULL WHERE uuid=ev_uuid"
PL/pgSQL function content.event_rrule_set_occ_bounds() line 12 at SQL statement
24. Conclusions
VERY useful for complex code in the database
if you already program in Python
Python has a lot of libraries which can be used
It has warts, but is a lifesaver