
Postgres Performance for Humans

A guide for what to pay attention to when it comes to monitoring and managing your PostgreSQL database. The content is targeted at application developers as opposed to the typical DBA.


  1. Postgres Performance for Humans @craigkerstiens www.citusdata.com
  2. Shameless plugs http://www.postgresweekly.com http://www.craigkerstiens.com http://www.postgresguide.com http://www.postgresapp.com http://www.citusdata.com
  3. Postgres - TLDR
  4. Postgres - TLDR Datatypes, Conditional Indexes, Transactional DDL, Foreign Data Wrappers, Concurrent Index Creation, Extensions, Common Table Expressions, Fast Column Addition, Listen/Notify, Table Inheritance, Per-Transaction sync replication, Window functions, JSONB, Momentum
  5. TLDR in a quote http://www.craigkerstiens.com/2012/04/30/why-postgres/ “It’s the emacs of databases”
  6. OLTP vs OLAP
  7. OLTP vs OLAP Web apps
  8. OLTP vs OLAP BI/Reporting
  9. Postgres Setup/Config On Amazon: use RDS, Heroku, Citus, or “PostgreSQL when it’s not your job”. Other clouds: “PostgreSQL when it’s not your job”. Real hardware: high performance PostgreSQL. http://thebuild.com/blog/2012/06/04/postgresql-when-its-not-your-job-at-djangocon-europe/
  10. Cache rules everything around me
  11. Cache Hit Rate
      SELECT 'index hit rate' AS name,
             (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) AS ratio
      FROM pg_statio_user_indexes
      UNION ALL
      SELECT 'cache hit rate' AS name,
             CASE sum(idx_blks_hit)
               WHEN 0 THEN 'NaN'::numeric
               ELSE to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read), '99.99')::numeric
             END AS ratio
      FROM pg_statio_user_indexes;
  12. Cache Hit Rate
      name           | ratio
      ---------------+-------
      cache hit rate |  0.99
  13. Index Hit Rate
      SELECT relname,
             100 * idx_scan / (seq_scan + idx_scan) AS percent_of_times_index_used,
             n_live_tup AS rows_in_table
      FROM pg_stat_user_tables
      ORDER BY n_live_tup DESC;
  14. Index Hit Rate
      relname             | percent_of_times_index_used | rows_in_table
      --------------------+-----------------------------+---------------
      events              |                           0 |        669917
      app_infos_user_info |                           0 |        198218
      app_infos           |                          50 |        175640
      user_info           |                           3 |         46718
      rollouts            |                           0 |         34078
      favorites           |                           0 |          3059
  15. Rough guidelines: Cache hit rate >= 99%; Index hit rate >= 95% on tables with > 10,000 rows
  16. Shortcuts
  17. $ cat ~/.psqlrc
      \set ON_ERROR_ROLLBACK interactive
      -- automatically switch between extended and normal
      \x auto
      -- always show how long a query takes
      \timing
      \set show_slow_queries 'SELECT (total_time / 1000 / 60) as total_minutes, (total_time / calls) as average_time, query FROM pg_stat_statements ORDER BY 1 DESC LIMIT 100;'
  19. Understanding Specific Query Performance
  20. Understanding Query Performance SELECT last_name FROM employees WHERE salary >= 50000;
  21. Explain
      # EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000;
                              QUERY PLAN
      --------------------------------------------------
       Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6)
         Filter: (salary >= 50000)
      (3 rows)
  22. Explain (annotated) In cost=0.00..35811.00 rows=1: 0.00 is the startup cost, 35811.00 the total cost, and rows=1 the estimated rows returned
  23. Explain Analyze
      # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000;
                              QUERY PLAN
      --------------------------------------------------
       Seq Scan on employees (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1)
         Filter: (salary >= 50000)
       Total runtime: 295.379 ms
      (3 rows)
      In actual time=2.401..295.247 rows=1428: 2.401 is the actual startup time, 295.247 the actual total time, and rows=1428 the actual rows returned.
  24. Rough guidelines: Page response times < 100 ms; Common queries < 10 ms; Rare queries < 100 ms
  25. Explain Analyze The same seq-scan plan, revisited: Total runtime 295.379 ms for 1428 rows, well over budget
  26. Indexes! # CREATE INDEX idx_emps ON employees (salary);
  27. Indexes!
      # EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000;
                              QUERY PLAN
      --------------------------------------------------
       Index Scan using idx_emps on employees (cost=0.00..8.49 rows=1 width=6) (actual time=0.047..1.603 rows=1428 loops=1)
         Index Cond: (salary >= 50000)
       Total runtime: 1.771 ms
      (3 rows)
  28. pg_stat_statements
      $ select * from pg_stat_statements where query ~ 'from users where email';
      userid              │ 16384
      dbid                │ 16388
      query               │ select * from users where email = ?;
      calls               │ 2
      total_time          │ 0.000268
      rows                │ 2
      shared_blks_hit     │ 16
      shared_blks_read    │ 0
      shared_blks_dirtied │ 0
      shared_blks_written │ 0
      local_blks_hit      │ 0
      local_blks_read     │ 0
      local_blks_dirtied  │ 0
      local_blks_written  │ 0
      ...
  29. pg_stat_statements
      SELECT (total_time / 1000 / 60) as total,
             (total_time / calls) as avg,
             query
      FROM pg_stat_statements
      ORDER BY 1 DESC
      LIMIT 100;
  30. pg_stat_statements
       total  |  avg  | query
      --------+-------+-------------------------
       295.76 | 10.13 | SELECT id FROM users...
       219.13 | 80.24 | SELECT * FROM ...
      (2 rows)
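The queries above assume pg_stat_statements is already available. A minimal setup sketch (the extension name and the `shared_preload_libraries` setting are standard Postgres; it requires a server restart):

```sql
-- pg_stat_statements must be preloaded at server start; in postgresql.conf:
--   shared_preload_libraries = 'pg_stat_statements'
-- then, after restarting, enable it in each database you want to track:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- reset the counters whenever you want a fresh measurement window
SELECT pg_stat_statements_reset();
```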
  31. Indexes
  32. Indexes B-Tree, Generalized Inverted Index (GIN), Generalized Search Tree (GIST), Space Partitioned GIST (SP-GIST), Block Range (BRIN)
  33. Indexes Which do I use?
  34. B-Tree This is what you usually want
  35. Generalized Inverted Index (GIN) Use with multiple values in 1 column: Array/hstore/JSONB
  36. Generalized Search Tree (GIST) Full text search, Shapes
  37. Indexes B-Tree, Generalized Inverted Index (GIN), Generalized Search Tree (GIST), Space Partitioned GIST (SP-GIST), Block Range (BRIN), VODKA (Coming soon)
  38. More indexes
  39. Indexes Conditional, Functional, Concurrent creation
  40. Conditional
      > SELECT * FROM places;
      name  | population
      ------+-----------
      ACMAR |       6055
      ARAB  |      13650
  41. Conditional
      > SELECT * FROM places WHERE population > 10000;
      name | population
      -----+-----------
      ARAB |      13650
  42. Conditional
      > CREATE INDEX idx_large_population ON places(name) WHERE population > 10000;
  43. Functional
      > SELECT * FROM places;
      data
      -----------------------------------
      {"city": "ACMAR", "pop": 6055}
      {"city": "ARAB", "pop": 13650}
  44. Functional
      > SELECT * FROM places WHERE get_numeric('pop', data) > 10000;
      data
      -----------------------------------
      {"city": "ARAB", "pop": 13650}
  45. Functional
      > CREATE INDEX idx_large_population ON places(get_numeric('pop', data));
  46. Conditional and Functional
      > CREATE INDEX idx_large_population ON places(data) WHERE get_numeric('pop', data) > 10000;
  47. One more thing: CREATE INDEX CONCURRENTLY ... roughly 2-3x slower, but doesn't lock the table
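A sketch of a concurrent build, reusing the employees example from earlier (the index name is made up; the CONCURRENTLY behavior described in the comments is standard Postgres):

```sql
-- builds without blocking writes; cannot run inside a transaction block
CREATE INDEX CONCURRENTLY idx_emps_salary ON employees (salary);

-- if the build fails it leaves an INVALID index behind; drop it and retry:
-- DROP INDEX CONCURRENTLY idx_emps_salary;
```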
  48. hstore / JSON / JSONB
  50. JSONB
      CREATE TABLE users (
        id integer NOT NULL,
        email character varying(255),
        data jsonb,
        created_at timestamp without time zone,
        last_login timestamp without time zone
      );
  51. JSONB
      SELECT '{"id": 1, "email": "craig.kerstiens@gmail.com"}'::jsonb;
      SELECT * FROM talks WHERE data->>'conference' = 'PyOhio';
  52. JSONB Indexes work: gin, gist
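A sketch of GIN indexing on the users.data column defined above (the `@>` containment operator and the `jsonb_path_ops` operator class are standard Postgres; the index names are placeholders):

```sql
-- a general GIN index supports existence (?) and containment (@>) operators
CREATE INDEX idx_users_data ON users USING gin (data);

SELECT * FROM users WHERE data @> '{"conference": "PyOhio"}';

-- jsonb_path_ops builds a smaller, faster index if you only ever need @>
CREATE INDEX idx_users_data_path ON users USING gin (data jsonb_path_ops);
```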
  53. Pooling
  54. Options: Application/Framework layer, Stand-alone daemon
  55. PG options: pgbouncer, pgpool
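A minimal pgbouncer.ini sketch, assuming a local database named mydb (the database name, addresses, and pool size are placeholders; the option names are real pgbouncer settings):

```ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling gives the best connection reuse for web apps
pool_mode = transaction
default_pool_size = 20
```

Applications then connect to port 6432 instead of 5432, and pgbouncer multiplexes those client connections over a small pool of real server connections.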
  56. Backups
  57. Logical pg_dump: can be human readable, is portable
  58. Physical The bytes on disk; base backup
  59. Logical: good across architectures, good for portability, puts load on the DB, works < 50 GB. Physical: more initial setup, less portability, limited load on the system, use above 50 GB.
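The two styles above can be sketched with stock Postgres tools (the database name and paths are placeholders; both commands need access to a running server):

```shell
# logical: a portable custom-format dump, restorable with pg_restore
pg_dump -Fc mydb > mydb.dump
pg_restore --create -d postgres mydb.dump

# physical: a base backup of the data directory (tar format, gzipped, with progress)
pg_basebackup -D /backups/base -Ft -z -P
```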
  60. Adding Cache
  61. Increase cache with cash
  62. Vertical scaling
  63. Vertical scaling
  64. Horizontal scaling Reads to a replica
  65. Replication: slony, londiste, bucardo, pgpool, wal-e, barman
  66. Replication: slony, londiste, bucardo, pgpool, wal-e, barman
  67. Horizontal scaling Reads to a replica; Split up large tables; Split up data by customer: 1 database per customer, 1 schema per customer, or shard within your application
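The schema-per-customer option can be sketched like this (the events table and customer id are hypothetical; `search_path` routing is standard Postgres):

```sql
-- one schema per customer, same table shape in each
CREATE SCHEMA customer_42;
CREATE TABLE customer_42.events (LIKE public.events INCLUDING ALL);

-- route each connection to its customer's schema
SET search_path TO customer_42, public;
SELECT count(*) FROM events;  -- now resolves to customer_42.events
```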
  68. Horizontal scaling Use something like Citus
  69. Recap
  70. Recap OLAP: a whole other talk. Disk IO is important; order on disk helps (pg_reorg); MPP solutions on top of Postgres
  71. Recap OLTP (web apps): Ensure the bulk of your data is in cache; optimize overall query load with pg_stat_statements; make efficient use of indexes; when cache sucks, throw more at it
  72. Thanks! Questions? http://www.speakerdeck.com/u/craigkerstiens/ @craigkerstiens @citusdata
