Your SlideShare is downloading. ×

Postgres demystified

1,637
views

Published on

Published in: Technology

0 Comments
6 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,637
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
32
Comments
0
Likes
6
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Postgres Demystified Craig Kerstiens @craigkerstiens http://www.craigkerstiens.comhttps://speakerdeck.com/u/craigkerstiens/p/postgres-demystified
  • 2. Postgres Demystified
  • 3. Postgres Demystified
  • 4. Postgres Demystified We’re Hiring
  • 5. Getting Setup Postgres.app
  • 6. Agenda Brief HistoryDeveloping w/ Postgres Postgres Performance Querying
  • 7. Postgres History Post Ingress Postgres Around since 1989/1995PostgresQL Community Driven/Owned
  • 8. MVCCEach query sees transactions committed before it Locks for writing don’t conflict with reading
  • 9. Why Postgres
  • 10. Why Postgres“ its the emacs of databases”
  • 11. Developing w/ Postgres
  • 12. Basicspsql is your friend
  • 13. Basicspsql is your friend # dt # d # d tablename # x # e
  • 14. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point floatinet polygon numeric circle cidr varchar tsquery text timetz path time timestamp box macaddr tsvector
  • 15. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point floatinet polygon numeric circle cidr varchar tsquery text timetz path time timestamp box macaddr tsvector
  • 16. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point floatinet polygon numeric circle cidr varchar tsquery text timetz path time timestamp box macaddr tsvector
  • 17. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point floatinet polygon numeric circle cidr varchar tsquery text timetz path time timestamp macaddr box tsvector
  • 18. DatatypesCREATE TABLE items ( id serial NOT NULL, name varchar (255), tags varchar(255) [], created_at timestamp);
  • 19. DatatypesCREATE TABLE items ( id serial NOT NULL, name varchar (255), tags varchar(255) [], created_at timestamp);
  • 20. DatatypesINSERT INTO itemsVALUES (1, Ruby Gem, {“Programming”,”Jewelry”}, now());INSERT INTO itemsVALUES (2, Django Pony, {“Programming”,”Animal”}, now());
  • 21. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point floatinet polygon numeric circle cidr varchar tsquery text timetz path time timestamp box macaddr tsvector
  • 22. Extensions dblink hstore trigram uuid-ossp pgstattuple citext pgrowlocks pgcrypto isn ltree fuzzystrmatch cube earthdistance dict_int tablefuncunaccent btree_gist dict_xsyn
  • 23. Extensions dblink hstore trigram uuid-ossp pgstattuple citext pgrowlocks pgcrypto isn ltree fuzzystrmatch cube earthdistance dict_int tablefuncunaccent btree_gist dict_xsyn
  • 24. NoSQL in your SQLCREATE EXTENSION hstore;CREATE TABLE users ( id integer NOT NULL, email character varying(255), data hstore, created_at timestamp without time zone, last_login timestamp without time zone);
  • 25. hStoreINSERT INTO usersVALUES ( 1, craig.kerstiens@gmail.com, sex => "M", state => “California”, now(), now());
  • 26. JSONSELECT {"id":1,"email": "craig.kerstiens@gmail.com",}::json; V8 w/ PLV8 9.2
  • 27. JSONSELECT {"id":1,"email": "craig.kerstiens@gmail.com",}::json; V8 w/ PLV8 9.2
  • 28. JSONSELECT {"id":1,"email": "craig.kerstiens@gmail.com",}::json;create or replace function V8 w/ PLV8js(src text) returns text as $$ return eval( "(function() { " + src + "})" )();$$ LANGUAGE plv8; 9.2
  • 29. JSONSELECT {"id":1,"email": "craig.kerstiens@gmail.com",}::json;create or replace function V8 w/ PLV8js(src text) returns text as $$ return eval( Bad Idea "(function() { " + src + "})" )();$$ LANGUAGE plv8; 9.2
  • 30. Range Types 9.2
  • 31. Range TypesCREATE TABLE talks (room int, during tsrange);INSERT INTO talks VALUES (3, [2012-09-24 13:00, 2012-09-24 13:50)); 9.2
  • 32. Range TypesCREATE TABLE talks (room int, during tsrange);INSERT INTO talks VALUES (3, [2012-09-24 13:00, 2012-09-24 13:50));ALTER TABLE talks ADD EXCLUDE USING gist (during WITH &&);INSERT INTO talks VALUES (1108, [2012-09-24 13:30, 2012-09-24 14:00));ERROR: conflicting key value violates exclusion constraint"talks_during_excl" 9.2
  • 33. Full Text Search
  • 34. Full Text SearchTSVECTOR - Text DataTSQUERY - Search PredicatesSpecialized Indexes and Operators
  • 35. Datatypes UUID date boolean timestamptz array interval integer smallint line bigint XML enum char money serial bytea point floatinet polygon numeric circle cidr varchar tsquery text timetz path time timestamp macaddr box tsvector
  • 36. PostGIS
  • 37. PostGIS1. New datatypes i.e. (2d/3d boxes)
  • 38. PostGIS1. New datatypes i.e. (2d/3d boxes)2. New operators i.e. SELECT foo && bar ...
  • 39. PostGIS1. New datatypes i.e. (2d/3d boxes)2. New operators i.e. SELECT foo && bar ...3. Understand relationships and distance i.e. person within location, nearest distance
  • 40. Performance
  • 41. Sequential Scans
  • 42. Sequential ScansThey’re Bad
  • 43. Sequential ScansThey’re Bad (most of the time)
  • 44. Indexes
  • 45. IndexesThey’re Good
  • 46. IndexesThey’re Good (most of the time)
  • 47. IndexesB-TreeGeneralized Inverted Index (GIN)Generalized Search Tree (GIST)K Nearest Neighbors (KNN)Space Partitioned GIST (SP-GIST)
  • 48. IndexesB-Tree Default Usually want this
  • 49. IndexesGeneralized Inverted Index (GIN) Use with multiple values in 1 column Array/hStore
  • 50. IndexesGeneralized Search Tree (GIST) Full text search Shapes
  • 51. Understanding Query Perf Given SELECT last_name FROM employees WHERE salary >= 50000;
  • 52. Explain# EXPLAIN SELECT last_name FROM employees WHERE salary >=50000; QUERY PLAN-------------------------------------------------------------------------Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000)(3 rows)
  • 53. Explain# EXPLAIN SELECT last_name FROM employees WHERE salary >=50000; Startup Cost QUERY PLAN-------------------------------------------------------------------------Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000)(3 rows)
  • 54. Explain# EXPLAIN SELECT last_name FROM employees WHERE salary >=50000; Max Time Startup Cost QUERY PLAN-------------------------------------------------------------------------Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000)(3 rows)
  • 55. Explain# EXPLAIN SELECT last_name FROM employees WHERE salary >=50000; Max Time Startup Cost QUERY PLAN-------------------------------------------------------------------------Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) Filter: (salary >= 50000)(3 rows) Rows Returned
  • 56. Explain# EXPLAIN ANALYZE SELECT last_name FROM employees WHEREsalary >= 50000; QUERY PLAN-------------------------------------------------------------------------Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actualtime=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000)Total runtime: 295.379(3 rows)
  • 57. Explain# EXPLAIN ANALYZE SELECT last_name FROM employees WHEREsalary >= 50000; Startup Cost QUERY PLAN-------------------------------------------------------------------------Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actualtime=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000)Total runtime: 295.379(3 rows)
  • 58. Explain# EXPLAIN ANALYZE SELECT last_name FROM employees WHEREsalary >= 50000; Startup Cost QUERY PLAN Max Time-------------------------------------------------------------------------Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actualtime=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000)Total runtime: 295.379(3 rows)
  • 59. Explain# EXPLAIN ANALYZE SELECT last_name FROM employees WHEREsalary >= 50000; Startup Cost QUERY PLAN Max Time-------------------------------------------------------------------------Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actualtime=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000)Total runtime: 295.379(3 rows) Rows Returned
  • 60. Explain# EXPLAIN ANALYZE SELECT last_name FROM employees WHEREsalary >= 50000; Startup Cost QUERY PLAN Max Time-------------------------------------------------------------------------Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actualtime=2.401..295.247 rows=1428 loops=1) Filter: (salary >= 50000)Total runtime: 295.379(3 rows) Rows Returned
  • 61. # CREATE INDEX idx_emps ON employees (salary);
  • 62. # CREATE INDEX idx_emps ON employees (salary);# EXPLAIN ANALYZE SELECT last_name FROM employees WHEREsalary >= 50000; QUERY PLAN-------------------------------------------------------------------------Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1width=6) (actual time = 0.047..1.603 rows=1428 loops=1) Index Cond: (salary >= 50000)Total runtime: 1.771 ms(3 rows)
  • 63. # CREATE INDEX idx_emps ON employees (salary);# EXPLAIN ANALYZE SELECT last_name FROM employees WHEREsalary >= 50000; QUERY PLAN-------------------------------------------------------------------------Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1width=6) (actual time = 0.047..1.603 rows=1428 loops=1) Index Cond: (salary >= 50000)Total runtime: 1.771 ms(3 rows)
  • 64. Indexes Pro Tips
  • 65. Indexes Pro TipsCREATE INDEX CONCURRENTLY
  • 66. Indexes Pro TipsCREATE INDEX CONCURRENTLYCREATE INDEX WHERE foo=bar
  • 67. Indexes Pro TipsCREATE INDEX CONCURRENTLYCREATE INDEX WHERE foo=barSELECT * WHERE foo LIKE ‘%bar% is BAD
  • 68. Indexes Pro TipsCREATE INDEX CONCURRENTLYCREATE INDEX WHERE foo=barSELECT * WHERE foo LIKE ‘%bar% is BADSELECT * WHERE Food LIKE ‘bar%’ is OKAY
  • 69. Extensions dblink hstore trigram uuid-ossp pgstattuple citext pgrowlocks pgcrypto isn ltree fuzzystrmatch cube earthdistance dict_int tablefuncunaccent btree_gist dict_xsyn
  • 70. Cache Hit RateSELECT index hit rate as name, (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) as ratio FROM pg_statio_user_indexes union all SELECT cache hit rate as name, case sum(idx_blks_hit) when 0 then NaN::numeric else to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read),99.99)::numeric end as ratio FROM pg_statio_user_indexes;)
  • 71. Index Hit RateSELECT relname, 100 * idx_scan / (seq_scan + idx_scan), n_live_tupFROM pg_stat_user_tablesORDER BY n_live_tup DESC;
  • 72. Index Hit Rate relname | percent_of_times_index_used | rows_in_table---------------------+-----------------------------+--------------- events | 0 | 669917 app_infos_user_info | 0 | 198218 app_infos | 50 | 175640 user_info | 3 | 46718 rollouts | 0 | 34078 favorites | 0 | 3059 schema_migrations | 0 | 2 authorizations | 0 | 0 delayed_jobs | 23 | 0
  • 73. pg_stats_statements 9.2
  • 74. pg_stats_statements$ select * from pg_stat_statements where query ~ from users where email;userid │ 16384dbid │ 16388query │ select * from users where email = ?;calls │ 2total_time │ 0.000268rows │ 2shared_blks_hit │ 16shared_blks_read │ 0shared_blks_dirtied │ 0shared_blks_written │ 0local_blks_hit │ 0local_blks_read │ 0local_blks_dirtied │ 0 9.2local_blks_written │ 0temp_blks_read │ 0temp_blks_written │ 0time_read │ 0time_write │ 0
  • 75. pg_stats_statements 9.2
  • 76. pg_stats_statementsSELECT query, calls, total_time, rows, 100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;----------------------------------------------------------------------query | UPDATE pgbench_branches SET bbalance= bbalance + ? WHERE bid = ?;calls | 3000total_time | 9609.00100000002rows | 2836hit_percent | 99.9778970000200936 9.2
  • 77. Querying
  • 78. Window FunctionsExample: Biggest spender by state
  • 79. Window FunctionsSELECT email, users.data->state, sum(total(items)), rank() OVER (PARTITION BY users.data->state ORDER BY sum(total(items)) desc)FROM users, purchasesWHERE purchases.user_id = users.idGROUP BY 1, 2;
  • 80. Window FunctionsSELECT email, users.data->state, sum(total(items)), rank() OVER (PARTITION BY users.data->state ORDER BY sum(total(items)) desc)FROM users, purchasesWHERE purchases.user_id = users.idGROUP BY 1, 2;
  • 81. Extensions dblink hstore trigram uuid-ossp pgstattuple citext pgrowlocks pgcrypto isn ltree fuzzystrmatch cube earthdistance dict_int tablefuncunaccent btree_gist dict_xsyn
  • 82. Fuzzystrmatch
  • 83. FuzzystrmatchSELECT soundex(Craig), soundex(Will), difference(Craig, Will);
  • 84. FuzzystrmatchSELECT soundex(Craig), soundex(Will), difference(Craig, Will);SELECT soundex(Craig), soundex(Greg), difference(Craig, Greg);SELECT soundex(Willl), soundex(Will), difference(Willl, Will);
  • 85. Moving Data Aroundcopy (SELECT * FROM users) TO ‘~/users.csv’;copy users FROM ‘~/users.csv’;
  • 86. db_linkSELECT dblink_connect(myconn, dbname=postgres);SELECT * FROM dblink(myconn,SELECT * FROM foo) AS t(a int, b text); a | b-------+------------ 1 | example 2 | example2
  • 87. Foreign Data Wrappersoracle mysql odbc twitter sybase redis jdbcfiles couch s3 www ldap informix mongodb
  • 88. Foreign Data WrappersCREATE EXTENSION redis_fdw;CREATE SERVER redis_server FOREIGN DATA WRAPPER redis_fdw OPTIONS (address 127.0.0.1, port 6379);CREATE FOREIGN TABLE redis_db0 (key text, value text) SERVER redis_server OPTIONS (database 0);CREATE USER MAPPING FOR PUBLIC SERVER redis_server OPTIONS (password secret);
  • 89. Query Redis from PostgresSELECT *FROM redis_db0;SELECT id, email, value as visitsFROM users, redis_db0WHERE (user_ || cast(id as text)) = cast(redis_db0.key as text) AND cast(value as int) > 40;
  • 90. All are not equaloracle mysql odbc twitter sybase redis jdbcfiles couch s3 www ldap informix mongodb
  • 91. Production Ready FDWsoracle mysql odbc twitter sybase redis jdbcfiles couch s3 www ldap informix mongodb
  • 92. Readability
  • 93. ReadabilityWITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5)SELECT users.email, count(*)FROM users, line_items, top_5_productsWHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.idGROUP BY 1ORDER BY 1;
  • 94. Common Table ExpressionsWITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5)SELECT users.email, count(*)FROM users, line_items, top_5_productsWHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.idGROUP BY 1ORDER BY 1;
  • 95. Common Table ExpressionsWITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5)SELECT users.email, count(*)FROM users, line_items, top_5_productsWHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.idGROUP BY 1ORDER BY 1;
  • 96. Common Table ExpressionsWITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5)SELECT users.email, count(*)FROM users, line_items, top_5_productsWHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.idGROUP BY 1 ’ORDER BY 1;
  • 97. Common Table ExpressionsWITH top_5_products AS ( SELECT products.*, count(*) FROM products, line_items WHERE products.id = line_items.product_id GROUP BY products.id ORDER BY count(*) DESC LIMIT 5)SELECT users.email, count(*)FROM users, line_items, top_5_productsWHERE line_items.user_id = users.id AND line_items.product_id = top_5_products.idGROUP BY 1 Don’t do this in production ’ORDER BY 1;
  • 98. Brief HistoryDeveloping w/ Postgres Postgres Performance Querying
  • 99. Extras
  • 100. ExtrasListen/Notify
  • 101. ExtrasListen/NotifyPer Transaction Synchronous Replication
  • 102. ExtrasListen/NotifyPer Transaction Synchronous ReplicationSELECT for UPDATE
  • 103. Postgres - TLDR Datatypes Fast Column Addition Conditional Indexes Listen/Notify Transactional DDL Table Inheritance Foreign Data Wrappers Per Transaction sync replicationConcurrent Index Creation Window functions Extensions NoSQL inside SQLCommon Table Expressions Momentum
  • 104. Questions? Craig Kerstiens @craigkerstiens http://www.craigkerstiens.comhttps://speakerdeck.com/u/craigkerstiens/p/postgres-demystified