Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
PostgreSQL
It’s kind’ve a nifty database
How do you pronounce it?
Thanks to Thad for the suggestion
Answer Response Percentage
post-gres-q-l 2379 45%
post-gres 161...
Who is this guy?
• I’m Barry Jones
• Been a developing web apps since ’98
• Not a DBA
• Performance and infrastructure nut...
What is PostgreSQL NOT?
• NOT a silver bullet
• NOT the answer to life, the universe and
everything
• NOT better at everyt...
What IS PostgreSQL?
• Fully ACID compliant
• Feature rich and extensible
• Fast, scalable and leverages multicore
processo...
Laundry List of Features
• Multi-version Concurrency Control (MVCC)
• Point in Time Recovery
• Tablespaces
• Asynchronous ...
What are we covering today?
• Full text-search
• Built in data types
• User defined data types
• Automatic data compressio...
Full-text Search
• What about…?
– Solr
– Elastic Search
– Sphinx
– Lucene
– MySQL
• All have their purpose
– Distributed s...
Full-text Search
• Complications of stand alone search engines
– Data synchronization
• Managing deltas, index updates
• F...
Full-text Search
• But what if your needs are more like:
– Search within my database
– Avoid syncing data with outside sys...
Full-text Search
• tsvector
– The text to be searched
• tsquery
– The search query
• to_tsvector(„the church is AWESOME‟) ...
Full-text Search
• ALTER TABLE mytable ADD COLUMN search_vector tsvector
• UPDATE mytable
SET search_vector = to_tsvector(...
Full-text Search
• CREATE FUNCTION search_trigger RETURNS trigger AS $$
begin
new.search_vector :=
setweight(to_tsvector(„...
Full-text Search
• A variety of dictionaries
– Various Languages
– Thesaurus
– Snowball, Stem, Ispell, Synonym
– Write you...
Datatypes: ranges
• int4range, int8range, numrange, tsrange, tstzrange, daterange
• SELECT int4range(10,20) @> 3 == false
...
Datatypes: hstore
• properties
– {“author” => “John Grisham”, “pages” => 535}
– {“director” => “Jon Favreau”, “runtime” = ...
Datatypes: arrays
• CREATE TABLE sal_emp(name text, pay_by_quarter integer[],
schedule text[][])
• CREATE TABLE tictactoe ...
Datatypes: JSON
• Validate JSON structure
• Convert row to JSON
• Functions and operators very similar to hstore
Datatypes: XML
• Validates well-formed XML
• Stores like a TEXT field
• XML operations like Xpath
• Can’t index XML column...
Data compression with TOAST
• TOAST = The Oversized Attribute Storage Technique
• TOASTable data is automatically TOASTed
...
User created datatypes
• Built in types
– Numerics, monetary, binary, time, date, interval, boolean,
enumerated, geometric...
Further exploration: PostGIS
• Adds Geographic datatypes
• Distance, area, union, intersection, perimeter
• Spatial indexe...
Further exploration: Functions
• Can be used in queries
• Can be used in stored procedures and triggers
• Can be used to b...
Further exploration: PLV8
• CREATE OR REPLACE FUNCTION plv8_test(keys text[], vals text[])
RETURNS text AS $$
var o = {};
...
Further exploration: Async commands
/ indexes
• Fine grained control within functions
– PQsendQuery
– PQsendQueryParams
– ...
Thanks!
References / Credits
• NOTE: Some code samples in this presentation have minor
alterations for presentation clarity (such ...
Upcoming SlideShare
Loading in …5
×

PostgreSQL - It's kind've a nifty database

4,576 views

Published on

This presentation was given to a company that makes software for churches that is considering a migration from SQL Server to PostgreSQL. It was designed to give a broad overview of features in PostgreSQL with an emphasis on full-text search, various datatypes like hstore, array, xml, json as well as custom datatypes, TOAST compression and a taste of other interesting features worth following up on.

Published in: Technology
  • Be the first to comment

PostgreSQL - It's kind've a nifty database

  1. 1. PostgreSQL It’s kind’ve a nifty database
  2. 2. How do you pronounce it? Thanks to Thad for the suggestion Answer Response Percentage post-gres-q-l 2379 45% post-gres 1611 30% pahst-grey 24 0% pg-sequel 50 0% post-gree 350 6% postgres-sequel 574 10% p-g 49 0% database 230 4% Total 5267
  3. 3. Who is this guy? • I’m Barry Jones • Been a developing web apps since ’98 • Not a DBA • Performance and infrastructure nut • Pragmatic Idealist – Prefer the best solution possible for current circumstances vs the best solution possible at all costs
  4. 4. What is PostgreSQL NOT? • NOT a silver bullet • NOT the answer to life, the universe and everything • NOT better at everything than everybody • NOT always the best option for your needs • NOT used to it’s potential in most cases • NOT owned by Oracle or Microsoft
  5. 5. What IS PostgreSQL? • Fully ACID compliant • Feature rich and extensible • Fast, scalable and leverages multicore processors very well • Enterprise class with quality corporate support options • Free as in beer • It’s kind’ve nifty
  6. 6. Laundry List of Features • Multi-version Concurrency Control (MVCC) • Point in Time Recovery • Tablespaces • Asynchronous replication • Nested Transactions • Online/hot backups • Genetic query optimizer multiple index types • Write ahead logging (WAL) • Internationalization: character sets, locale-aware sorting, case sensitivity, formatting • Full subquery support • Multiple index scans per query • ANSI-SQL:2008 standard conformant • Table inheritance • LISTEN / NOTIFY event system • Ability to make a Power Point slide run out of room
  7. 7. What are we covering today? • Full text-search • Built in data types • User defined data types • Automatic data compression • A look at some other cool features and extensions, depending how we’re doing on time
  8. 8. Full-text Search • What about…? – Solr – Elastic Search – Sphinx – Lucene – MySQL • All have their purpose – Distributed search of multiple document types • Sphinx – Client search performance is all that matters • Solr – Search constantly incoming data with streaming index updates • Elastic Search excels – You really like Java • Lucene – You want terrible search results that don’t even make sense to you much less your users • MySQL full text search = the worst thing in the world
  9. 9. Full-text Search • Complications of stand alone search engines – Data synchronization • Managing deltas, index updates • Filtering/deleting/hiding expired data • Search server outages, redundancy – Learning curve – Character sets match up with my database? – Additional hardware / servers just for search – Can feel like a black box when you get a support question asking “why is/isn’t this showing up?”
  10. 10. Full-text Search • But what if your needs are more like: – Search within my database – Avoid syncing data with outside systems – Avoid maintaining outside systems – Less black box, more control
  11. 11. Full-text Search • tsvector – The text to be searched • tsquery – The search query • to_tsvector(„the church is AWESOME‟) @@ to_tsquery(SEARCH) • @@ to_tsquery(„church‟) == true • @@ to_tsquery(„churches‟) == true • @@ to_tsquery(„awesome‟) == true • @@ to_tsquery(„the‟) == false • @@ to_tsquery(„churches & awesome‟) == true • @@ to_tsquery(„church & okay‟) == false • to_tsvector(„the church is awesome‟) – 'awesom':4 'church':2 • to_tsvector(„simple‟,‟the church is awesome‟) – 'are':3 'awesome':4 'church':2 'the':1
  12. 12. Full-text Search • ALTER TABLE mytable ADD COLUMN search_vector tsvector • UPDATE mytable SET search_vector = to_tsvector(„english‟,coalesce(title,‟‟) || „ „ || coalesce(body,‟‟) || „ „ || coalesce(tags,‟‟)) • CREATE INDEX search_text ON mytable USING gin(search_vector) • SELECT some, columns, we, need FROM mytable WHERE search_vector @@ to_tsquery(„english‟,„Jesus & awesome‟) ORDER BY ts_rank(search_vector,to_tsquery(„english‟,„Jesus & awesome‟)) DESC • CREATE TRIGGER search_update BEFORE INSERT OR UPDATE ON mytable FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(search_vector, ‟english‟, title, body, tags)
  13. 13. Full-text Search • CREATE FUNCTION search_trigger RETURNS trigger AS $$ begin new.search_vector := setweight(to_tsvector(„english‟,coalesce(new.title,‟‟)),‟A‟) || setweight(to_tsvector(„english‟,coalesce(new.body,‟‟)),‟D‟) || setweight(to_tsvector(„english‟,coalesce(new.tags,‟‟)),‟B‟); return new; end $$ LANGUAGE plpgsql; • CREATE TRIGGER search_vector_update BEFORE INSERT OR UPDATE OF title, body, tags ON mytable FOR EACH ROW EXECUTE PROCEDURE search_trigger();
  14. 14. Full-text Search • A variety of dictionaries – Various Languages – Thesaurus – Snowball, Stem, Ispell, Synonym – Write your own • ts_headline – Snippet extraction and highlighting
  15. 15. Datatypes: ranges • int4range, int8range, numrange, tsrange, tstzrange, daterange • SELECT int4range(10,20) @> 3 == false • SELECT numrange(11.1,22.2) && numrange(20.0,30.0) == true • SELECT int4range(10,20) * int4range(15,25) == 15-20 • CREATE INDEX res_index ON schedule USING gist(during) • ALTER TABLE schedule ADD EXCLUDE USING gist (during WITH &&) ERROR: conflicting key value violates exclusion constraint ”schedule_during_excl” DETAIL: Key (during)=([ 2010-01-01 14:45:00, 2010-01-01 15:45:00 )) conflicts with existing key (during)=([ 2010-01-01 14:30:00, 2010-01-01 15:30:00 )).
  16. 16. Datatypes: hstore • properties – {“author” => “John Grisham”, “pages” => 535} – {“director” => “Jon Favreau”, “runtime” = 126} • SELECT … FROM mytable WHERE properties -> „director‟ LIKE „%Favreau‟ – Does not use an index • WHERE properties @> („author‟ LIKE “%Grisham”) – Uses an index to only check properties with an „author‟ • CREATE INDEX table_properties ON mytable USING gin(properties)
  17. 17. Datatypes: arrays • CREATE TABLE sal_emp(name text, pay_by_quarter integer[], schedule text[][]) • CREATE TABLE tictactoe ( squares integer[3][3] ) • INSERT INTO tictactoe VALUES („{{1,2,3},{4,5,6},{7,8,9}}‟) • SELECT squares[1:2][1:1] == {{1},{4}} • SELECT squares[2:3][2:3] == {{5,6},{8,9}}
  18. 18. Datatypes: JSON • Validate JSON structure • Convert row to JSON • Functions and operators very similar to hstore
  19. 19. Datatypes: XML • Validates well-formed XML • Stores like a TEXT field • XML operations like Xpath • Can’t index XML column but you can index the result of an Xpath function
  20. 20. Data compression with TOAST • TOAST = The Oversized Attribute Storage Technique • TOASTable data is automatically TOASTed • Example: – stored a 2.2m XML document – storage size was 81k
  21. 21. User created datatypes • Built in types – Numerics, monetary, binary, time, date, interval, boolean, enumerated, geometric, network address, bit string, text search, UUID, XML, JSON, array, composite, range – Add-ons for more such as UPC, ISBN and more • Create your own types – Address (contains 2 streets, city, state, zip, country) – Define how your datatype is indexed – GIN and GiST indexes are used by custom datatypes
  22. 22. Further exploration: PostGIS • Adds Geographic datatypes • Distance, area, union, intersection, perimeter • Spatial indexes • Tools to load available geographic data • Distance, Within, Overlaps, Touches, Equals, Contains, Crosses • SELECT name, ST_AsText(geom) FROM nyc_subway_stations WHERE name = „Broad St‟ • SELECT name, boroname FROM nyc_neighborhoods WHERE ST_Intersects(geom, ST_GeomFromText(„POINT(583571 4506714)‟,26918) • SELECT sub.name, nh.name, nh.borough FROM nyc_neighborhoods AS nh JOIN nyc_subway_stations AS sub ON ST_Contains(nh.geom, sub.geom) WHERE sub.name = „Broad St”
  23. 23. Further exploration: Functions • Can be used in queries • Can be used in stored procedures and triggers • Can be used to build indexes • Can be used as table defaults • Can be written in PL/pgSQL, PL/Tcl, PL/Perl, PL/Python out of the box • PL/V8 is available an an extension to use Javascript
  24. 24. Further exploration: PLV8 • CREATE OR REPLACE FUNCTION plv8_test(keys text[], vals text[]) RETURNS text AS $$ var o = {}; for(var i = 0; i < keys.length; i++) { o[keys[i]] = vals[i]; } return JSON.stringify(o); $$ LANGUAGE plv8 IMMUTABLE STRICT; SELECT plv8_test(ARRAY[„name‟,‟age‟],ARRAY[„Tom‟,‟29‟]); • CREATE TYPE rec AS (i integer, t text); CREATE FUNCTION set_of_records RETURNS SETOF rec AS $$ plv8.return_next({“i”: 1,”t”: ”a”}); plv8.return_next({“i”: 2,”t”: “b”}); $$ LANGUAGE plv8; SELECT * FROM set_of_records();
  25. 25. Further exploration: Async commands / indexes • Fine grained control within functions – PQsendQuery – PQsendQueryParams – PQsendPrepare – PQsendQueryPrepared – PQsendDescribePrepared – PQgetResult – PQconsumeInput • Per connection asynchronous commits – set synchronous_commit = off • Concurrent index creation to avoid blocking large tables – CREATE INDEX CONCURRENTLY big_index ON mytable (things)
  26. 26. Thanks!
  27. 27. References / Credits • NOTE: Some code samples in this presentation have minor alterations for presentation clarity (such as leaving out dictionary specifications on some search calls, etc) • http://www.postgresql.org/docs/9.2/static/index.html • http://workshops.opengeo.org/postgis-intro/ • http://stackoverflow.com/questions/15983152/how-can-i-find-out- how-big-a-large-text-field-is-in-postgres • https://devcenter.heroku.com/articles/heroku-postgres- extensions-postgis-full-text-search • http://railscasts.com/episodes/345-hstore?view=asciicast • http://www.slideshare.net/billkarwin/full-text-search-in- postgresql

×