PostgreSQL’s Secret NoSQL Superpowers
Amanda Gilmore
Postgres supports several document data formats.
Formats
XML
hstore
json
jsonb
How do you choose?
Do you wrap a foreign database?
Do you have a (fairly) standard object schema with weird objects?
Are you worried about lock contention?
XML
http://www.postgresql.org/docs/9.5/static/functions-xml.html
Schema
CREATE TABLE xml_samples (
id bigserial PRIMARY KEY,
documents xml,
comments text
);
Create
INSERT INTO xml_samples (documents, comments) VALUES (
XMLPARSE( DOCUMENT $$<?xml version="1.0"?>
<catalog>
...
</catalog>$$)
, 'This is an entire XML document');
Create
--Alternatively, this inserts one node per row
INSERT INTO xml_samples (documents, comments) VALUES (
$$<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>$$
, 'one book per row, as an XML node');
Read
--Gets all titles in the table
SELECT xpath('//book/title', documents) FROM xml_samples;
--Gets all info on a book by title
--([1] compares only the first title in each row's document)
SELECT * FROM xml_samples
WHERE (xpath('//book/title/text()', documents))[1]::text = 'Midnight Rain';
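One way to get one title per row, instead of one xml[] per document, is to unnest the xpath() result; xpath_exists() filters rows without extracting anything. A minimal sketch, assuming PostgreSQL 9.5+ built with libxml; the temp table xml_demo and its two-book document are hypothetical stand-ins for xml_samples.

```sql
-- Hypothetical stand-in for xml_samples.
CREATE TEMP TABLE xml_demo (id bigserial PRIMARY KEY, documents xml);

INSERT INTO xml_demo (documents) VALUES
  (XMLPARSE(DOCUMENT '<catalog>
     <book id="bk101"><title>XML Developer''s Guide</title></book>
     <book id="bk102"><title>Midnight Rain</title></book>
   </catalog>'));

-- xpath() returns an xml[]; unnest() turns it into one row per title.
SELECT unnest(xpath('//book/title/text()', documents))::text AS title
FROM xml_demo;

-- xpath_exists() (9.1+) tests a path without extracting anything,
-- so it checks every book, not just the first one.
SELECT id
FROM xml_demo
WHERE xpath_exists('//book[title="Midnight Rain"]', documents);
```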
Update
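Cannot do this natively: PostgreSQL has no functions for modifying part of an XML value, so you have to replace the whole document.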
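A common workaround is to rebuild the value as text and cast it back to xml. This is a sketch only: the table xml_update_demo is hypothetical, and because replace() is plain string substitution it only suits small, well-known fragments, not structural edits.

```sql
-- Hypothetical demo table, not part of the talk's schema.
CREATE TEMP TABLE xml_update_demo (id bigserial PRIMARY KEY, documents xml);
INSERT INTO xml_update_demo (documents) VALUES
  ('<book id="bk101"><price>44.95</price></book>');

-- No in-place XML edit exists, so: xml -> text, string replace, text -> xml.
UPDATE xml_update_demo
SET documents = replace(documents::text,
                        '<price>44.95</price>',
                        '<price>39.95</price>')::xml
-- Select the row by attribute value; @id returns the attribute's value.
WHERE (xpath('//book/@id', documents))[1]::text = 'bk101';
```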
hstore
http://www.postgresql.org/docs/9.5/static/hstore.html
Schema
CREATE EXTENSION hstore;
CREATE TABLE hstore_samples (
id bigserial PRIMARY KEY,
documents hstore,
comments text
);
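Note #14 mentions that hstore supports indexing. Below is a minimal sketch of a single GIN index covering every key in the column, assuming the hstore extension is installable; the table hstore_gin_demo and the second resource name are hypothetical. On a table this small the planner will likely still choose a sequential scan, so this shows the syntax, not a speedup.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- Hypothetical demo table mirroring hstore_samples.
CREATE TEMP TABLE hstore_gin_demo (id bigserial PRIMARY KEY, documents hstore);
INSERT INTO hstore_gin_demo (documents) VALUES
  ('"resource_name"=>"swimming-swiftly-1234", "active_resource"=>"true"'),
  ('"resource_name"=>"hypothetical-other-5678", "active_resource"=>"false"');

-- One GIN index covers every key in the column.
CREATE INDEX hstore_gin_demo_docs_idx ON hstore_gin_demo USING gin (documents);

-- The GIN hstore operator class can serve @>, ?, ?| and ?&.
SELECT id FROM hstore_gin_demo WHERE documents @> '"active_resource"=>"true"';
SELECT id FROM hstore_gin_demo WHERE documents ? 'resource_name';
```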
Create
INSERT INTO hstore_samples (documents, comments)
VALUES ('"active_resource"=>"true"
,"resource_name"=>"swimming-swiftly-1234"
,"resource_size_in_mb"=>"30"
,"resource_created_at"=>"2016-03-14 17:20:47.216862"'
, 'this is a straight up K/V hash');
Read
--hstore values are stored as text, so cast them before comparing numerically
SELECT * FROM hstore_samples
WHERE (documents -> 'resource_size_in_mb')::int > 30;
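Note #17 suggests a B-tree expression index for comparisons like this one, which GIN can't serve. A sketch using a hypothetical table hstore_btree_demo; the indexed expression has to match the query's expression for the planner to consider it, and it must be immutable (a text-to-int cast is).

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- Hypothetical demo table.
CREATE TEMP TABLE hstore_btree_demo (id bigserial PRIMARY KEY, documents hstore);
INSERT INTO hstore_btree_demo (documents) VALUES
  ('"resource_size_in_mb"=>"30"'),
  ('"resource_size_in_mb"=>"512"');

-- Outer parens: the column list; inner parens: the indexed expression.
CREATE INDEX hstore_btree_demo_size_idx
  ON hstore_btree_demo (((documents -> 'resource_size_in_mb')::int));

-- Same expression as the index, so a B-tree range scan is possible.
SELECT id FROM hstore_btree_demo
WHERE (documents -> 'resource_size_in_mb')::int > 30;
```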
Read
--You can get all of the values for a certain key
--(-> already returns text, so the ::text cast is optional):
SELECT (documents -> 'resource_name')::text
FROM hstore_samples;
--or convert each row of the table to an hstore:
SELECT hstore(t) FROM hstore_samples AS t;
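To see what hstore(t) produces, and to go back the other way, each() expands an hstore into key/value rows. A minimal sketch on a hypothetical two-column table hstore_row_demo; the column names become the hstore's keys.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- Hypothetical demo table.
CREATE TEMP TABLE hstore_row_demo (id int, comments text);
INSERT INTO hstore_row_demo VALUES (1, 'a straight up K/V hash');

-- hstore(record): one hstore per row, keyed by column name.
SELECT hstore(t) FROM hstore_row_demo AS t;

-- each(): one (key, value) row per pair; the function in FROM
-- can reference t because it is implicitly LATERAL.
SELECT kv.key, kv.value
FROM hstore_row_demo AS t, each(hstore(t)) AS kv;
```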
Read
--Pulling multiple values from the document
SELECT (documents -> 'resource_name')::text
, (documents -> 'active_resource')::text
FROM hstore_samples;
Update
--hstore values can't be edited in place, hence the || operator,
--which returns a new hstore with the right-hand keys overriding the left
UPDATE hstore_samples
SET documents = documents || '"active_resource"=>"false"'::hstore
WHERE documents @> '"resource_name"=>"swimming-swiftly-1234"'::hstore;
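|| can add or overwrite keys but can't remove them; the hstore - text operator returns a copy without the given key. A sketch using a hypothetical table hstore_delete_demo that mirrors the sample row; the ::text cast selects the single-key form of - (there are also hstore - text[] and hstore - hstore variants).

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- Hypothetical demo table mirroring the sample row.
CREATE TEMP TABLE hstore_delete_demo (id bigserial PRIMARY KEY, documents hstore);
INSERT INTO hstore_delete_demo (documents) VALUES
  ('"resource_name"=>"swimming-swiftly-1234", "active_resource"=>"true"');

-- || overwrites the existing key with the right-hand value...
UPDATE hstore_delete_demo
SET documents = documents || '"active_resource"=>"false"'::hstore;

-- ...and hstore - text returns a copy with that key removed.
UPDATE hstore_delete_demo
SET documents = documents - 'active_resource'::text;
```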
json & jsonb
http://www.postgresql.org/docs/9.5/static/datatype-json.html
Read
--get a JSON array element (by int index) or object field (by key)
->
--get the same array element or object field, *as text*
->>
--containment: does the left jsonb value contain the right one? (jsonb only)
@>
--checks whether a string exists as a top-level key (jsonb only)
?
http://www.postgresql.org/docs/9.5/static/functions-json.html
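The operators above, applied to literal values so no table is needed. A minimal sketch assuming PostgreSQL 9.5+; the documents are made up. -> and ->> also work on plain json, while @> and ? need jsonb (cast with ::jsonb first).

```sql
-- -> by int index (zero-based): returns jsonb 20
SELECT '[10, 20, 30]'::jsonb -> 1;

-- -> by key: returns jsonb "Gilmore" (still a quoted JSON string)
SELECT '{"lastName": "Gilmore"}'::jsonb -> 'lastName';

-- ->> by key: returns text Gilmore
SELECT '{"lastName": "Gilmore"}'::jsonb ->> 'lastName';

-- @> containment works on nested structure too: returns true
SELECT '{"a": 1, "b": {"c": 2}}'::jsonb @> '{"b": {"c": 2}}';

-- ? top-level key existence: returns true
SELECT '{"a": 1}'::jsonb ? 'a';
```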
Schema
CREATE TABLE json_samples (
id bigserial PRIMARY KEY,
text_json json,
binary_json jsonb,
notes text
);
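Per note #21, jsonb supports indexing. A sketch of the two GIN operator classes, using a hypothetical table jsonb_gin_demo: the default jsonb_ops supports @>, ?, ?| and ?&, while jsonb_path_ops supports only @> but is usually smaller and faster for it. As with the hstore example, a tiny table will likely still be scanned sequentially.

```sql
-- Hypothetical demo table mirroring json_samples' jsonb column.
CREATE TEMP TABLE jsonb_gin_demo (id bigserial PRIMARY KEY, binary_json jsonb);
INSERT INTO jsonb_gin_demo (binary_json) VALUES
  ('{"dataSourceId": "derived:com.google.heart_rate.bpm:com.google.android.gms:merge_heart_rate_bpm", "lastName": "Gilmore"}'),
  ('{"lastName": "Hypothetical"}');

-- Default operator class (jsonb_ops): @>, ?, ?| and ?&.
CREATE INDEX jsonb_gin_demo_idx ON jsonb_gin_demo USING gin (binary_json);

-- jsonb_path_ops: @> only, typically a smaller index.
CREATE INDEX jsonb_gin_demo_path_idx
  ON jsonb_gin_demo USING gin (binary_json jsonb_path_ops);

SELECT id FROM jsonb_gin_demo WHERE binary_json @> '{"lastName": "Gilmore"}';
SELECT id FROM jsonb_gin_demo WHERE binary_json ? 'dataSourceId';
```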
Create
INSERT INTO json_samples (text_json, notes) VALUES (
'{
"minStartTimeNs": "1429828653617000000",
"maxEndTimeNs": "1429839639367000000",
"dataSourceId":
"derived:com.google.heart_rate.bpm:com.google.android.gms:merge_heart_rate_bpm"
}'
, 'This is in the text json field');
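Note #21 says json keeps an exact copy of the input text. Since the insert above fills only text_json, one way to populate binary_json is to cast the text column. The sketch below, on a hypothetical table json_vs_jsonb_demo with a made-up document, shows what changes along the way: jsonb drops insignificant white space, reorders keys, and keeps only the last of any duplicate keys.

```sql
-- Hypothetical demo table with the same two column types as json_samples.
CREATE TEMP TABLE json_vs_jsonb_demo (text_json json, binary_json jsonb);
INSERT INTO json_vs_jsonb_demo (text_json)
VALUES ('{"b": 1,   "a": 2, "a": 3}');

-- Parse the stored text into the binary format.
UPDATE json_vs_jsonb_demo SET binary_json = text_json::jsonb;

-- text_json:   {"b": 1,   "a": 2, "a": 3}   (verbatim)
-- binary_json: {"a": 3, "b": 1}             (normalized, last duplicate wins)
SELECT text_json, binary_json FROM json_vs_jsonb_demo;
```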
Read
--Can check top-level keys before querying:
SELECT jsonb_object_keys(binary_json) FROM json_samples;
--Given a well-formed path into the document, e.g.
--['point'][0]['value'][0]['fpVal']
--you can extract specific values from it:
SELECT binary_json->'point'->0->'value'->0->'fpVal' FROM json_samples;
--Can check if a given key is present (? needs jsonb, so cast json first):
SELECT * FROM json_samples WHERE text_json::jsonb ? 'dataSourceId';
SELECT * FROM json_samples WHERE binary_json ? 'dataSourceId';
--And can check if the value of the key matches something:
SELECT * FROM json_samples
WHERE binary_json ->> 'minStartTimeNs' = '1429828653617000000';
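The chained -> lookup can also be written as a single path with #> (or #>> to get text). A sketch on a hypothetical document shaped like the ['point'][0]['value'][0]['fpVal'] path; the value 72.5 is made up.

```sql
-- Chained ->, one step per key or index: returns jsonb 72.5
SELECT '{"point": [{"value": [{"fpVal": 72.5}]}]}'::jsonb
       -> 'point' -> 0 -> 'value' -> 0 -> 'fpVal';

-- #> takes the whole path as a text array: same jsonb result
SELECT '{"point": [{"value": [{"fpVal": 72.5}]}]}'::jsonb
       #> '{point,0,value,0,fpVal}';

-- #>> does the same but returns text
SELECT '{"point": [{"value": [{"fpVal": 72.5}]}]}'::jsonb
       #>> '{point,0,value,0,fpVal}';
```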
Update
--Can update at top level with || (jsonb only, 9.5+);
--keys on the right replace same-named keys on the left:
UPDATE json_samples
SET binary_json = binary_json || '{"address": {
"streetAddress": "123 Test Street",
"city": "Oakland",
"state": "CA",
"postalCode": "94123"
} }'
WHERE binary_json ->> 'lastName' = 'Gilmore';
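|| merges only at the top level: if both sides have an "address" key, the right-hand object replaces the left-hand one wholesale rather than being merged into it. A small sketch on literal, made-up values:

```sql
-- The existing "city" and "state" are lost, because "address" is replaced as a whole.
-- Result: {"address": {"streetAddress": "123 Test Street"}}
SELECT '{"address": {"city": "Oakland", "state": "CA"}}'::jsonb
    || '{"address": {"streetAddress": "123 Test Street"}}'::jsonb;
```

To change one nested field while keeping its siblings, jsonb_set() is the better fit.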
Update
--or use jsonb_set() to drill down the tree
--(a bare SELECT jsonb_set(...) only returns the new value; UPDATE stores it):
UPDATE json_samples
SET binary_json = jsonb_set(binary_json
, '{address, streetAddress}'
, '"456 Lorem Ipsum St"'::jsonb)
WHERE binary_json ->> 'lastName' = 'Gilmore';
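jsonb_set() also takes an optional fourth argument, create_missing (default true), which controls whether a missing final key is added, and the #- operator (9.5+) removes whatever sits at a path. A sketch on literal, made-up values, assuming PostgreSQL 9.5+:

```sql
-- create_missing defaults to true, so the new key is added:
-- {"address": {"city": "Oakland", "streetAddress": "456 Lorem Ipsum St"}}
SELECT jsonb_set('{"address": {"city": "Oakland"}}'::jsonb,
                 '{address, streetAddress}', '"456 Lorem Ipsum St"'::jsonb);

-- With create_missing = false, a missing key leaves the value unchanged.
SELECT jsonb_set('{"address": {"city": "Oakland"}}'::jsonb,
                 '{address, streetAddress}', '"456 Lorem Ipsum St"'::jsonb,
                 false);

-- #- deletes the element at a path: {"address": {"city": "Oakland"}}
SELECT '{"address": {"city": "Oakland", "state": "CA"}}'::jsonb #- '{address, state}';
```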
tl;dr
Type   | Strength                 | Weakness
xml    | Can query with XPath     | Can’t update in place
hstore | Can index keys           | Text values only, less performant
json   | Richly nested data       | Stored as text, less performant
jsonb  | Nested data, fast reads  | Higher-cost writes
Links & References
xml: http://www.postgresql.org/docs/9.5/static/functions-xml.html
hstore: http://www.postgresql.org/docs/9.5/static/hstore.html
json & jsonb: http://www.postgresql.org/docs/9.5/static/functions-json.html
Source for the xml sample: https://msdn.microsoft.com/en-us/library/ms762271(v=vs.85).aspx
Sample code for this talk on GitHub:
https://github.com/mandagill/PGConfUS_2016_syntax_samples
Questions?
Thanks, y’all!
amanda@heroku.com
GitHub: mandagill


Editor's Notes

  • #2 Who am I? Why do I care? We use a lot of rich data stores on my team, and querying them gets complicated; I wish I’d had these slides when I started working heavily with PG. Don’t worry about photographing the slides: the SQL is on GitHub. Assume I’m talking about 9.5 unless otherwise stated.
  • #4 Do you want to put all the data in PG and format some fields as documents? Do you want to wrap a NoSQL DB? Here are some questions to ask that may help you come to an answer...
  • #5 Use the data format that maps to your foreign data source: Mongo -> json, Redis -> hstore.
  • #6 If your information *generally* maps to an object, it’s likely easier to stay in PG entirely, e.g., car rental.
  • #7 Are you worried about lock contention? You might keep a single json blob in one field, but that whole row is locked whenever you’re writing to it.
  • #8 Yeaaaaaah xpaths RSS feeds amirite?
  • #11 hstore values *must be strings*.
  • #13 Asynchronous information processing is a possible use case here, e.g., stashing the API payload as soon as I get it and processing it later in the application, in a queued process, etc.
  • #14 Installable with CREATE EXTENSION since 9.1 (the contrib module itself is older). Supports indexing, YEAH.
  • #15 Contrib module, need to CREATE EXTENSION
  • #16 This is a basic string, but putting it into an hstore-formatted field gives you the extra function-y goodness. INSERT is nice and straightforward.
  • #17 PG: You could have a B-tree expression index on "((documents -> 'resource_size_in_mb')::int)" here, which would let the planner use a B-tree index for the > operator (something GIN cannot do). Index expressions must be immutable, so a cast such as ::timestamptz, which depends on the TimeZone setting, can’t be indexed this way. HGMNZ: GIN’s support for hstore means that you can index all fields in an hstore with one index, which is quite unique as far as database functionality goes.
  • #21 PostgreSQL allows only one character set encoding per database, so the JSON types can’t conform rigidly to the JSON specification unless the database encoding is UTF8 (check with SHOW SERVER_ENCODING;). Because the json type stores an exact copy of the input text, it preserves semantically insignificant white space between tokens, as well as the order of keys within JSON objects. Use straight json only when you manipulate the data at the application layer. All the shiny operators work with jsonb, and some (like @> and ?) work only with jsonb. jsonb supports indexing! :D LOCKS ARE STILL ROW LEVEL.
  • #22 The ->> is important: if I am matching on ‘Gilmore’ as text, I need ->> instead of ->, because -> returns jsonb, not text.
  • #25 This is where it gets fun. :) Can query more deeply nested objects
  • #26 These are for top-level keys
  • #29 Here I’ll point out verbally that jsonb writes cost more, because the input is parsed and converted to a binary format on the way in, but reads are cheaper, much like the trade-off you’d see with a normal column index.