PostgreSQL - It's kind've a nifty database
Upcoming SlideShare
Loading in...5
×
 

PostgreSQL - It's kind've a nifty database

on

  • 1,682 views

This presentation was given to a company that makes software for churches that is considering a migration from SQL Server to PostgreSQL. It was designed to give a broad overview of features in ...

This presentation was given to a company that makes software for churches that is considering a migration from SQL Server to PostgreSQL. It was designed to give a broad overview of features in PostgreSQL with an emphasis on full-text search, various datatypes like hstore, array, xml, json as well as custom datatypes, TOAST compression and a taste of other interesting features worth following up on.

Statistics

Views

Total Views
1,682
Views on SlideShare
1,397
Embed Views
285

Actions

Likes
3
Downloads
20
Comments
0

3 Embeds 285

http://www.brightball.com 282
https://www.linkedin.com 2
http://www.spundge.com 1

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

CC Attribution License

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

PostgreSQL - It's kind've a nifty database PostgreSQL - It's kind've a nifty database Presentation Transcript

  • PostgreSQL It’s kind’ve a nifty database
  • How do you pronounce it? Thanks to Thad for the suggestion Answer Response Percentage post-gres-q-l 2379 45% post-gres 1611 30% pahst-grey 24 0% pg-sequel 50 0% post-gree 350 6% postgres-sequel 574 10% p-g 49 0% database 230 4% Total 5267
  • Who is this guy? • I’m Barry Jones • Been a developing web apps since ’98 • Not a DBA • Performance and infrastructure nut • Pragmatic Idealist – Prefer the best solution possible for current circumstances vs the best solution possible at all costs
  • What is PostgreSQL NOT? • NOT a silver bullet • NOT the answer to life, the universe and everything • NOT better at everything than everybody • NOT always the best option for your needs • NOT used to it’s potential in most cases • NOT owned by Oracle or Microsoft
  • What IS PostgreSQL? • Fully ACID compliant • Feature rich and extensible • Fast, scalable and leverages multicore processors very well • Enterprise class with quality corporate support options • Free as in beer • It’s kind’ve nifty
  • Laundry List of Features • Multi-version Concurrency Control (MVCC) • Point in Time Recovery • Tablespaces • Asynchronous replication • Nested Transactions • Online/hot backups • Genetic query optimizer multiple index types • Write ahead logging (WAL) • Internationalization: character sets, locale-aware sorting, case sensitivity, formatting • Full subquery support • Multiple index scans per query • ANSI-SQL:2008 standard conformant • Table inheritance • LISTEN / NOTIFY event system • Ability to make a Power Point slide run out of room
  • What are we covering today? • Full text-search • Built in data types • User defined data types • Automatic data compression • A look at some other cool features and extensions, depending how we’re doing on time
  • Full-text Search • What about…? – Solr – Elastic Search – Sphinx – Lucene – MySQL • All have their purpose – Distributed search of multiple document types • Sphinx – Client search performance is all that matters • Solr – Search constantly incoming data with streaming index updates • Elastic Search excels – You really like Java • Lucene – You want terrible search results that don’t even make sense to you much less your users • MySQL full text search = the worst thing in the world
  • Full-text Search • Complications of stand alone search engines – Data synchronization • Managing deltas, index updates • Filtering/deleting/hiding expired data • Search server outages, redundancy – Learning curve – Character sets match up with my database? – Additional hardware / servers just for search – Can feel like a black box when you get a support question asking “why is/isn’t this showing up?”
  • Full-text Search • But what if your needs are more like: – Search within my database – Avoid syncing data with outside systems – Avoid maintaining outside systems – Less black box, more control
  • Full-text Search • tsvector – The text to be searched • tsquery – The search query • to_tsvector(„the church is AWESOME‟) @@ to_tsquery(SEARCH) • @@ to_tsquery(„church‟) == true • @@ to_tsquery(„churches‟) == true • @@ to_tsquery(„awesome‟) == true • @@ to_tsquery(„the‟) == false • @@ to_tsquery(„churches & awesome‟) == true • @@ to_tsquery(„church & okay‟) == false • to_tsvector(„the church is awesome‟) – 'awesom':4 'church':2 • to_tsvector(„simple‟,‟the church is awesome‟) – 'are':3 'awesome':4 'church':2 'the':1
  • Full-text Search • ALTER TABLE mytable ADD COLUMN search_vector tsvector • UPDATE mytable SET search_vector = to_tsvector(„english‟,coalesce(title,‟‟) || „ „ || coalesce(body,‟‟) || „ „ || coalesce(tags,‟‟)) • CREATE INDEX search_text ON mytable USING gin(search_vector) • SELECT some, columns, we, need FROM mytable WHERE search_vector @@ to_tsquery(„english‟,„Jesus & awesome‟) ORDER BY ts_rank(search_vector,to_tsquery(„english‟,„Jesus & awesome‟)) DESC • CREATE TRIGGER search_update BEFORE INSERT OR UPDATE ON mytable FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(search_vector, ‟english‟, title, body, tags)
  • Full-text Search • CREATE FUNCTION search_trigger RETURNS trigger AS $$ begin new.search_vector := setweight(to_tsvector(„english‟,coalesce(new.title,‟‟)),‟A‟) || setweight(to_tsvector(„english‟,coalesce(new.body,‟‟)),‟D‟) || setweight(to_tsvector(„english‟,coalesce(new.tags,‟‟)),‟B‟); return new; end $$ LANGUAGE plpgsql; • CREATE TRIGGER search_vector_update BEFORE INSERT OR UPDATE OF title, body, tags ON mytable FOR EACH ROW EXECUTE PROCEDURE search_trigger();
  • Full-text Search • A variety of dictionaries – Various Languages – Thesaurus – Snowball, Stem, Ispell, Synonym – Write your own • ts_headline – Snippet extraction and highlighting
  • Datatypes: ranges • int4range, int8range, numrange, tsrange, tstzrange, daterange • SELECT int4range(10,20) @> 3 == false • SELECT numrange(11.1,22.2) && numrange(20.0,30.0) == true • SELECT int4range(10,20) * int4range(15,25) == 15-20 • CREATE INDEX res_index ON schedule USING gist(during) • ALTER TABLE schedule ADD EXCLUDE USING gist (during WITH &&) ERROR: conflicting key value violates exclusion constraint ”schedule_during_excl” DETAIL: Key (during)=([ 2010-01-01 14:45:00, 2010-01-01 15:45:00 )) conflicts with existing key (during)=([ 2010-01-01 14:30:00, 2010-01-01 15:30:00 )).
  • Datatypes: hstore • properties – {“author” => “John Grisham”, “pages” => 535} – {“director” => “Jon Favreau”, “runtime” = 126} • SELECT … FROM mytable WHERE properties -> „director‟ LIKE „%Favreau‟ – Does not use an index • WHERE properties @> („author‟ LIKE “%Grisham”) – Uses an index to only check properties with an „author‟ • CREATE INDEX table_properties ON mytable USING gin(properties)
  • Datatypes: arrays • CREATE TABLE sal_emp(name text, pay_by_quarter integer[], schedule text[][]) • CREATE TABLE tictactoe ( squares integer[3][3] ) • INSERT INTO tictactoe VALUES („{{1,2,3},{4,5,6},{7,8,9}}‟) • SELECT squares[1:2][1:1] == {{1},{4}} • SELECT squares[2:3][2:3] == {{5,6},{8,9}}
  • Datatypes: JSON • Validate JSON structure • Convert row to JSON • Functions and operators very similar to hstore
  • Datatypes: XML • Validates well-formed XML • Stores like a TEXT field • XML operations like Xpath • Can’t index XML column but you can index the result of an Xpath function
  • Data compression with TOAST • TOAST = The Oversized Attribute Storage Technique • TOASTable data is automatically TOASTed • Example: – stored a 2.2m XML document – storage size was 81k
  • User created datatypes • Built in types – Numerics, monetary, binary, time, date, interval, boolean, enumerated, geometric, network address, bit string, text search, UUID, XML, JSON, array, composite, range – Add-ons for more such as UPC, ISBN and more • Create your own types – Address (contains 2 streets, city, state, zip, country) – Define how your datatype is indexed – GIN and GiST indexes are used by custom datatypes
  • Further exploration: PostGIS • Adds Geographic datatypes • Distance, area, union, intersection, perimeter • Spatial indexes • Tools to load available geographic data • Distance, Within, Overlaps, Touches, Equals, Contains, Crosses • SELECT name, ST_AsText(geom) FROM nyc_subway_stations WHERE name = „Broad St‟ • SELECT name, boroname FROM nyc_neighborhoods WHERE ST_Intersects(geom, ST_GeomFromText(„POINT(583571 4506714)‟,26918) • SELECT sub.name, nh.name, nh.borough FROM nyc_neighborhoods AS nh JOIN nyc_subway_stations AS sub ON ST_Contains(nh.geom, sub.geom) WHERE sub.name = „Broad St”
  • Further exploration: Functions • Can be used in queries • Can be used in stored procedures and triggers • Can be used to build indexes • Can be used as table defaults • Can be written in PL/pgSQL, PL/Tcl, PL/Perl, PL/Python out of the box • PL/V8 is available an an extension to use Javascript
  • Further exploration: PLV8 • CREATE OR REPLACE FUNCTION plv8_test(keys text[], vals text[]) RETURNS text AS $$ var o = {}; for(var i = 0; i < keys.length; i++) { o[keys[i]] = vals[i]; } return JSON.stringify(o); $$ LANGUAGE plv8 IMMUTABLE STRICT; SELECT plv8_test(ARRAY[„name‟,‟age‟],ARRAY[„Tom‟,‟29‟]); • CREATE TYPE rec AS (i integer, t text); CREATE FUNCTION set_of_records RETURNS SETOF rec AS $$ plv8.return_next({“i”: 1,”t”: ”a”}); plv8.return_next({“i”: 2,”t”: “b”}); $$ LANGUAGE plv8; SELECT * FROM set_of_records();
  • Further exploration: Async commands / indexes • Fine grained control within functions – PQsendQuery – PQsendQueryParams – PQsendPrepare – PQsendQueryPrepared – PQsendDescribePrepared – PQgetResult – PQconsumeInput • Per connection asynchronous commits – set synchronous_commit = off • Concurrent index creation to avoid blocking large tables – CREATE INDEX CONCURRENTLY big_index ON mytable (things)
  • Thanks!
  • References / Credits • NOTE: Some code samples in this presentation have minor alterations for presentation clarity (such as leaving out dictionary specifications on some search calls, etc) • http://www.postgresql.org/docs/9.2/static/index.html • http://workshops.opengeo.org/postgis-intro/ • http://stackoverflow.com/questions/15983152/how-can-i-find-out- how-big-a-large-text-field-is-in-postgres • https://devcenter.heroku.com/articles/heroku-postgres- extensions-postgis-full-text-search • http://railscasts.com/episodes/345-hstore?view=asciicast • http://www.slideshare.net/billkarwin/full-text-search-in- postgresql