Advanced pg_stat_statements: Filtering, Regression Testing & more


  1. Advanced pg_stat_statements: Filtering, Regression Testing & more. @LukasFittl
  2. Skilled Developer, Amateur Hacker. @LukasFittl
  3. pganalyze.com: 1.6 million unique queries tracked using pg_stat_statements
  4. Intro: pg_stat_statements
        userid              | 10
        dbid                | 1397527
        query               | SELECT * FROM x WHERE y = ?
        calls               | 5
        total_time          | 15.249
        rows                | 0
        shared_blks_hit     | 451
        shared_blks_read    | 41
        shared_blks_dirtied | 26
        shared_blks_written | 0
        local_blks_hit      | 0
        local_blks_read     | 0
        local_blks_dirtied  | 0
        local_blks_written  | 0
        temp_blks_read      | 0
        temp_blks_written   | 0
        blk_read_time       | 0
        blk_write_time      | 0
  5. Intro: Query + Avg Time + Timeframe
        query      | SELECT * FROM x WHERE y = ?
        calls      | 5
        total_time | 15.249
  6. Intro
  7. Improving Data Quality / pg_query / Filtering & Regression Testing
  8. Improving Data Quality / pg_query / Filtering & Regression Testing
  9. Improving Data Quality: Truncation
        SELECT "postgres_settings".* FROM "postgres_settings"
        WHERE "postgres_settings"."database_id" = $1
        AND "postgres_settings"."invalidated_at_snapshot_id" IS NULL
        AND (id not in (70288,70289,70290,70291,70292,70293,70294,70295,70296,70297,70298
        ,70299,70300,70301,70302,70303,70304,70305,70306,70307,70308,70309
        ,70310,70311,70312,70313,70314,70315,70316,70317,70318,70319,70320
        ,70321,70322,70323,70324,70325,70326,70327,99059,99060,70330,70331
        ,70332,70333,70334,70335,70336,70337,70338,99061,70340,70341,70342
        ,70343,70344,70345,70346,70347,70348,70349,70350,70351,70352,70353
        ,70354,70355,70356,70357,70358,70359,70360,99062,70362,70363,70364
        ,70365,70366,70367,70368,70369,70370,70371,70372,70373,70374,70375
        ,70376,70377,70378,70379,70380,70381,70382,70383,70384,70385,70386
        ,99063,99064,99065,99066,99067,70392,70393,70394,70395,70396,70397
        ,70398,70399,70400,70401,70402,70403,70404,70405,99068,70407,70408
        ,70409,70410,70411,70412,70413,70414,70415,70416,70417,99069,70419
        ,70420,70421,99070,70423,70424,70425,70426,70427,70428,
  10. Improving Data Quality: Race condition during pg_stat_statements_reset()
        -[ RECORD 1 ]---+--------------------------------
        query           | SELECT * FROM x WHERE y = ?
        calls           | 5
        total_time      | 15.249
        -[ RECORD 2 ]---+--------------------------------
        query           | SELECT * FROM z WHERE a = 123
        calls           | 50
        total_time      | 104.19
  11. Improving Data Quality: Lesson learned: avoid frequent pg_stat_statements_reset()
  12. Improving Data Quality: Fingerprinting
        SELECT a AS b == SELECT a AS c
        Problematic:
        y IN (?, ?, ?) != y IN (?, ?)
        SELECT a, b FROM x != SELECT b, a FROM x
        DEALLOCATE p141 != DEALLOCATE p150
  13. Improving Data Quality: Limited statistical information. Histogram / MAX(runtime) would be super-useful
  14. Improving Data Quality: pg_stat_plans. pg_stat_statements variant that differentiates between query plans. Slower, and don't use it before this bug is fixed: https://github.com/2ndQuadrant/pg_stat_plans/issues/39
  15. Improving Data Quality / pg_query / Filtering & Regression Testing
  16. pg_query: Storing & Cleaning pg_stat_statements data
  17. pg_query: Monitoring Setup
        Diagram labels: Production Database, Collector, Snapshot, Monitoring Database
        Processing steps: Normalize, Parse, Fingerprint, Extract Tables
        Snapshot: {"schema": {"n_live_tup": 75, "relpages": 1, "reltuples": 75.0,…}, "queries": [{..}, {..}]}
  18. pg_query: queries
        id               | 7053479
        database_id      | 1
        received_query   | SELECT * FROM x WHERE y = ?
        normalized_query | SELECT * FROM x WHERE y = ?
        created_at       | 2014-06-27 16:20:08.334705
        updated_at       | 2014-06-27 16:20:08.334705
        parse_tree       | [{"SELECT":{...}]
        parse_error      |
        parse_warnings   |
        statement_types  | {SELECT}
        truncated        | f
        fingerprint      | 00704f1fd8442b7c17821cb8a61856c3d61b330e
  19. pg_query: query_snapshots
        id          | 170661585
        query_id    | 7053479
        calls       | 29
        total_time  | 94.38
        rows        | 29
        snapshot_id | 3386118
      snapshots
        id           | 3386118
        database_id  | 408
        collected_at | 2014-09-09 20:10:01
        submitter    | pganalyze-collector 0.6.1
        query_source | pg_stat_statements
  20. pg_query: Normalize, Parse, Fingerprint, Extract Tables
  21. pg_query: Parsing an SQL Query (steps: Normalize, Parse, Fingerprint, Extract Tables)
  22. pg_query: EXPLAIN (PARSETREE TRUE) SELECT * FROM x WHERE y = 1
        ({SELECT :distinctClause <> :intoClause <> :targetList (
          {RESTARGET :name <> :indirection <> :val
            {COLUMNREF :fields ({A_STAR}) :location 7} :location 7})
         :fromClause (
          {RANGEVAR :schemaname <> :relname x :inhOpt 2 :relpersistence p :alias <> :location 14})
         :whereClause
          {AEXPR :name ("=") :lexpr {COLUMNREF :fields ("y") :location 22}
           :rexpr {PARAMREF :number 0 :location 26} :location 24}
        Unfortunately doesn't exist.
  23. pg_query: Diagram labels: Parse Statement, raw_parser(..), pg_catalog, Rewrite Query, Query Planner, Execute
  24. pg_query
        tree = raw_parser(query_str);
        str = nodeToString(tree);
        printf("%s", str);
        ({SELECT :distinctClause <> :intoClause <> :targetList (
          {RESTARGET :name <> :indirection <> :val
            {COLUMNREF :fields ({A_STAR}) :location 7} :location 7})
         :fromClause (
          {RANGEVAR :schemaname <> :relname x :inhOpt 2 :relpersistence p :alias <> :location 14})
         :whereClause
          {AEXPR :name ("=") :lexpr {COLUMNREF :fields ("y") :location 22}
           :rexpr {PARAMREF :number 0 :location 26} :location 24}
  25. pg_query: Diagram labels: Parse Statement, raw_parser(..), pg_catalog, Rewrite Query, Query Planner, Execute
  26. pg_query: github.com/pganalyze/pg_query. Extension. Compiles a full copy of PostgreSQL when you do "gem install pg_query"
  27. pg_query: PgQuery._raw_parse("SELECT * FROM x WHERE y = 1")
        ({SELECT :distinctClause <> :intoClause <> :targetList (
          {RESTARGET :name <> :indirection <> :val
            {COLUMNREF :fields ({A_STAR}) :location 7} :location 7})
         :fromClause (
          {RANGEVAR :schemaname <> :relname x :inhOpt 2 :relpersistence p :alias <> :location 14})
         :whereClause
          {AEXPR :name ("=") :lexpr {COLUMNREF :fields ("y") :location 22}
           :rexpr {PARAMREF :number 0 :location 26} :location 24}
         :groupClause <> :havingClause <> :windowClause <> :valuesLists <> :sortClause <>
         :limitOffset <> :limitCount <> :lockingClause <> :withClause <>
  28. pg_query: nodeToString is incomplete :(
        PgQuery._raw_parse("CREATE SCHEMA foo")
        WARNING: 01000: could not dump unrecognized node type: 754
  29. pg_query: Patch for src/backend/nodes/outfuncs.c: generate automatically, JSON output
  30. pg_query: PgQuery._raw_parse("SELECT * FROM x WHERE y = 1")
        [{"SELECT": {
            "targetList": [
              {"RESTARGET": {
                 "val": {"COLUMNREF": {"fields": [{"A_STAR": {}}], "location": 7}},
                 "location": 7}}],
            "fromClause": [
              {"RANGEVAR": {"relname": "x", "inhOpt": 2, "relpersistence": "p", "location": 14}}],
            "whereClause": {
              "AEXPR": {
                "name": ["="],
                "lexpr": {
  31. pg_query: Parsing a normalized SQL query (steps: Normalize, Parse, Fingerprint, Extract Tables)
  32. pg_query
        EXPLAIN SELECT * FROM x WHERE y = 1
        QUERY PLAN
        ---------------------------------------------------------------------
        Index Scan using idx_for_y on x (cost=0.15..8.17 rows=1 width=140)
          Index Cond: (id = 1)
        Stages: Parse, Analyze, Plan
  33. pg_query
        EXPLAIN SELECT * FROM x WHERE y = ?
        ERROR: syntax error at or near ";"
        LINE 1: EXPLAIN SELECT * FROM x WHERE y = ?;
        Stages: Parse, Analyze, Plan
  34. pg_query
        EXPLAIN SELECT * FROM x WHERE y = ?
        EXPLAIN SELECT * FROM x WHERE y = $1
        ERROR: there is no parameter $1
        LINE 1: EXPLAIN SELECT * FROM x WHERE y = $1;
        Stages: Parse, Analyze, Plan
  35. pg_query: Parser patch to support parsing "?"
  36. pg_query: Downside: breaks the ? operator in some cases. Real fix: don't use ? as a replacement character.
  37. pg_query: Fingerprinting (steps: Normalize, Parse, Fingerprint, Extract Tables)
  38. pg_query
        > require 'pg_query'
        > q1 = PgQuery.parse('SELECT a, b FROM x')
        > q1.fingerprint
        ["c72f1bc9feda72c0b4ba030eea90b4fed3ac8e86"]
        > q2 = PgQuery.parse('SELECT b, a FROM x')
        > q2.fingerprint
        ["c72f1bc9feda72c0b4ba030eea90b4fed3ac8e86"]
  39. pg_query: 40 lines of unit-tested Ruby code
  40. pg_query: Extracting Table References (steps: Normalize, Parse, Fingerprint, Extract Tables)
  41. pg_query
        > require 'pg_query'
        > q = PgQuery.parse('SELECT * FROM x')
        > q.tables
        ["x"]
  42. pg_query: ~90 lines of unit-tested Ruby code
  43. pg_query: github.com/pganalyze/pg_query
  44. Improving Data Quality / pg_query / Filtering & Regression Testing
  45. Filtering & Regression Testing: Filtering
  46. Filtering & Regression Testing: monitor.rb. Simple top-like tool that shows pg_stat_statements data: https://gist.github.com/lfittl/301542602607b738b23f
  47. Filtering & Regression Testing: monitor.rb -d testdb
           AVG | QUERY
        --------------------------------------------------------------------------------
        10.7ms | SELECT oid, typname, typelem, typdelim, typinput FROM pg_type
         3.0ms | SET time zone 'UTC'
         0.4ms | SELECT a.attname, format_type(a.atttypid, a.atttypmod), pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod FROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum WHERE a.attrelid = ?::regclass AND a.attnum > ? AND NOT a.attisdropped ORDER BY a.attnum
         0.2ms | SELECT pg_stat_statements_reset()
         0.1ms | SELECT query, calls, total_time FROM pg_stat_statements
         0.1ms | SELECT attr.attname FROM pg_attribute attr INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[?] WHERE cons.contype = ? AND cons.conrelid = ?::regclass
         0.0ms | SELECT COUNT(*) FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind in (?,?) AND c.relname = ? AND n.nspname = ANY (current_schemas(?))
         0.0ms | SELECT * FROM posts JOIN users ON (posts.author_id = users.id) WHERE users.login = ?;
         0.0ms | SET client_min_messages TO 'panic'
         0.0ms | set client_encoding to 'UTF8'
         0.0ms | SHOW client_min_messages
         0.0ms | SELECT * FROM ad_reels WHERE id = ?;
         0.0ms | SELECT * FROM posts WHERE guid = ?;
         0.0ms | SELECT ?
         0.0ms | SET client_min_messages TO 'warning'
         0.0ms | SET standard_conforming_strings = on
         0.0ms | SELECT "posts".* FROM "posts" ORDER BY "posts"."id" DESC LIMIT ?
         0.0ms | SHOW TIME ZONE
  48. Filtering & Regression Testing: monitor.rb -d testdb -t posts
           AVG | QUERY
        --------------------------------------------------------------------------------
         0.0ms | SELECT * FROM posts JOIN users ON (posts.author_id = users.id) WHERE users.login = ?;
         0.0ms | SELECT * FROM posts WHERE guid = ?;
         0.0ms | SELECT "posts".* FROM "posts" ORDER BY "posts"."id" DESC LIMIT ?
  49. Filtering & Regression Testing
        if cli.config[:table]
          q = PgQuery.parse(query["query"])
          next unless q.tables.include?(cli.config[:table])
        end
  50. Filtering & Regression Testing: Regression Testing
  51. Filtering & Regression Testing: Which query plans are affected by removal of an index? How would execution plans be affected by an upgrade to 9.X?
  52. Filtering & Regression Testing: Regression test based on pg_stat_statements + table statistics (no actual data).
  53. Filtering & Regression Testing: Testing Setup
        Diagram labels: Production Database, Local Test Database
        Schema Dump + Table Level Statistics: "n_live_tup": 75, "relpages": 1, "reltuples": 75.0, "stanumbers1": [..], "stavalues1": "{..}", …
        EXPLAIN SELECT FROM x WHERE y = ?
  54. Filtering & Regression Testing
        EXPLAIN SELECT * FROM x WHERE y = ?
        EXPLAIN SELECT * FROM x WHERE y = $1
        ERROR: there is no parameter $1
        LINE 1: EXPLAIN SELECT * FROM x WHERE y = $1;
        Stages: Parse, Analyze, Plan
  55. Filtering & Regression Testing
        y = $1
          ERROR: there is no parameter $0
          LINE 1: EXPLAIN SELECT * FROM x WHERE y = $0;
  56. Filtering & Regression Testing
        y = $1
          ERROR: there is no parameter $0
          LINE 1: EXPLAIN SELECT * FROM x WHERE y = $0;
        y = NULL
          QUERY PLAN
          ----------------------------------------------------------------
          Result (cost=0.00..21.60 rows=1 width=40)
            One-Time Filter: NULL::boolean
            -> Seq Scan on x (cost=0.00..21.60 rows=1 width=40)
  57. Filtering & Regression Testing
        y = $1
          ERROR: there is no parameter $0
          LINE 1: EXPLAIN SELECT * FROM x WHERE y = $0;
        y = NULL
          QUERY PLAN
          ----------------------------------------------------------------
          Result (cost=0.00..21.60 rows=1 width=40)
            One-Time Filter: NULL::boolean
            -> Seq Scan on x (cost=0.00..21.60 rows=1 width=40)
        y = (SELECT null)
          ERROR: failed to find conversion function from unknown to integer
  58. Filtering & Regression Testing
        y = $1
          ERROR: there is no parameter $0
          LINE 1: EXPLAIN SELECT * FROM x WHERE y = $0;
        y = NULL
          QUERY PLAN
          ----------------------------------------------------------------
          Result (cost=0.00..21.60 rows=1 width=40)
            One-Time Filter: NULL::boolean
            -> Seq Scan on x (cost=0.00..21.60 rows=1 width=40)
        y = (SELECT null)
          ERROR: failed to find conversion function from unknown to integer
        y = (SELECT null::integer)
          QUERY PLAN
          ----------------------------------------------------------------------
          Index Scan using idx_for_y on x (cost=0.16..8.18 rows=1 width=144)
            Index Cond: (y = $0)
            InitPlan 1 (returns $0)
              -> Result (cost=0.00..0.01 rows=1 width=0)
  59. Filtering & Regression Testing: Finding out the type
        y = $1
          ERROR: there is no parameter $1
          LINE 1: EXPLAIN SELECT * FROM x WHERE y = $1;
        pg_prepared_statements:
          PREPARE tmp AS SELECT * FROM x WHERE y = $1;
          SELECT unnest(parameter_types) AS data_type FROM pg_prepared_statements WHERE name = 'tmp';
          DEALLOCATE tmp;
           data_type
          -----------
           integer
  60. Filtering & Regression Testing
        EXPLAIN SELECT * FROM x WHERE y = ?
        EXPLAIN SELECT * FROM x WHERE y = $0
        EXPLAIN SELECT * FROM x WHERE y = ((SELECT null::integer)::integer)
        QUERY PLAN
        ---------------------------------------------------------------------
        Index Scan using idx_for_y on x (cost=0.16..8.18 rows=1 width=144)
          Index Cond: (y = $0)
          InitPlan 1 (returns $0)
            -> Result (cost=0.00..0.01 rows=1 width=0)
        Stages: Parse, Analyze, Plan
  61. Filtering & Regression Testing: Open issue: the planner reads the actual physical size whilst planning
  62. Filtering & Regression Testing: github.com/pganalyze/pg_simulator
  63. Improving Data Quality / pg_query / Filtering & Regression Testing
  64. Closing: 9.5 proposal for pg_stat_statements: instead of ? use $0 as replacement character, making the output parseable again.
  65. Closing: 9.5 proposal for outfuncs.c: generate automatically from struct definitions, cutting 3000 hand-written lines down to 1000. Add JSON output support.
  66. Closing: 9.X proposal: consider adding a way to get a parsetree more easily, via SQL / shared library / helper tool.
  67. Closing: Tools & libraries available at github.com/pganalyze
  68. Thank you! @LukasFittl, github.com/pganalyze, pganalyze.com
  69. Backup Slides
  70. Improving Data Quality: Classifying queries. Frequent/OLTP vs analytical query
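
The regression-testing slides arrive at replacing each "?" placeholder with a typed NULL subquery, so that EXPLAIN can plan a normalized query without real parameter values. A minimal Ruby sketch of that rewriting step is shown here. Both helper names are hypothetical and are not part of pg_query or pg_simulator. The sketch assumes the parameter types were already looked up (for example through PREPARE and pg_prepared_statements, as in the slides), and it treats every "?" as a placeholder, so it would mangle queries that use the ? operator or contain "?" inside string literals.

```ruby
# Hypothetical helpers for illustration only; not the pg_simulator implementation.

# Turn "y = ?" into "y = $1" so the statement can be handed to PREPARE
# and its parameter types read from pg_prepared_statements.
def number_placeholders(normalized_query)
  index = 0
  normalized_query.gsub('?') { "$#{index += 1}" }
end

# Replace each "?" with a typed NULL subquery, following the
# ((SELECT null::integer)::integer) form shown in the slides.
# param_types lists one type name per placeholder, in order.
def substitute_typed_nulls(normalized_query, param_types)
  count = normalized_query.count('?')
  unless count == param_types.length
    raise ArgumentError, "expected #{count} types, got #{param_types.length}"
  end

  types = param_types.each
  normalized_query.gsub('?') do
    type = types.next
    "((SELECT null::#{type})::#{type})"
  end
end

query = 'SELECT * FROM x WHERE y = ?'
puts number_placeholders(query)
puts "EXPLAIN #{substitute_typed_nulls(query, ['integer'])}"
```

Working from the parse tree rather than the raw string would avoid the "?" ambiguity; plain string substitution is used here only to keep the sketch free of dependencies.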
