Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Robert Haas Query Planning Gone Wrong Presentation @ Postgres Open


Published on

Published in: Technology
  • Be the first to comment

Robert Haas Query Planning Gone Wrong Presentation @ Postgres Open

  1. 1. Click to edit Master subtitle style 1 Query Planning Gone Wrong Robert Haas Chief Architect, Database Server
  2. 2. Why This Talk? ● 2010: The PostgreSQL Query Planner (Robert Haas) ● How does the query planner actually work from a user perspective? What does it really do? ● Very common audience question: What do I do when the query planner fails? How do I fix my query? ● 2011: Hacking the Query Planner (Tom Lane) ● How does the query planner actually work from a developer perspective? What does it *really* do? ● Plea for help to improve the query planner. ● But... what should we be improving?
  3. 3. Methodology: Which Problems Matter? ● Read hundreds of email threads on pgsql-performance over a period of almost two years. ● Disregarded all those that were not about query performance problems. ● Decided what I thought the root cause (or, occasionally, causes) of each complaint was. ● Skipped a very small number where I couldn't form an opinion. ● Counted the number of times each problem was reported.
  4. 4. Methodology: Possible Critiques ● The problems reported on pgsql-performance aren't necessarily representative of all the problems PostgreSQL users encounter (reporting bias). ● In particular, confusing problems might be more likely to be reported. ● I might not have correctly identified the cause of each problem (researcher bias). ● Others?
  5. 5. Statistically Speaking, Why Is My Query Slow? (168) ● Settings (23). Includes anything you can fix with postgresql.conf changes, DDL, or operating systems settings changes. ● Just Plain Slow (23). Includes anything that amounts to an unreasonable expectation on the part of the user. These are often questions of the form “why is query A slower than query B?” when A is actually doing something much more expensive than B. ● We're Bad At That (22). Includes anything that could be faster in some other database product, but isn't fast in PostgreSQL for some reason (not implemented yet, or architectural artifact). ● Planner Error (83). Bad decisions about the cost of one plan vs. another plan due to limitations of the optimizer. ● Bugs (14). Bugs in the query planner, or in one case, the Linux kernel. ● User Error (3). User got confused and did something illogical.
  6. 6. Settings (23) ● Planner Cost Constants (8). Adjustments needed to seq_page_cost, random_page_cost, and perhaps cpu_tuple_cost to accurately model real costs. ● Missing Index (4) ● Cost for @@ Operator Is Too Low (2) ● work_mem Too Low (2) ● Statistics Target Too Low (2) ● Statistics Target Too High (1) ● n_distinct Estimates Aren't Accurate On Large Tables (1) ● Not Analyzing Tables Often Enough (1) ● TOAST Decompression is Slow (1) ● vm.zone_reclaim_mode = 1 Causes Extra Disk I/O (1)
  7. 7. Just Plain Slow (23) ● It Takes a While to Process a Lot of Data (6) ● Disks Are Slower Than Memory (6) ● Clauses Involving Multiple Tables Can't Be Pushed Down (2) ● Random I/O is Slower Than Sequential I/O (1) ● Linearly Scanning an Array is O(n) (1) ● One Regular Expression is Faster Than Two (1) ● Can't Figure Out Which Patterns Match a String Without Trying Them All (1) ● xmlagg Is Much Slower Than string_agg (1) ● Scanning More Tables is Slower Than Scanning Fewer Tables (1) ● Replanning Isn't Free (1) ● Repeated Concatenation Using xmlconcat Is Slow (1) ● UNION is Slower than UNION ALL (1)
  8. 8. We're Bad At That (22) ● Plan Types We Can't Generate (11) ● Parameterized Paths (7). Two of these are post-9.2 complaints, involving cases where 9.2 can't parameterize as needed. ● Merge Append (3). Fixed in 9.1. ● Batched Sort of Data Already Ordered By Leading Columns (1). ● Executor Limitations (3) ● Indexing Unordered Data Causes Random I/O (1) ● <> is Not Indexable (1) ● DISTINCT + HashAggregate Reads All Input Before Emitting Any Results (1). This matters if there is a LIMIT. ● Architecture (8) ● No Parallel Query (2), Table Bloat (1), Backend Startup Cost (1), Redundant Updates Are Expensive (1), AFTER Trigger Queue Size (1), On-Disk Size of numeric (1), Autovacuum Not Smart About Inherited Tables (1)
  9. 9. Planner Errors (83) ● Any guesses?
  10. 10. Planner Errors (83) ● Conceptual Errors (28). The planner isn't able to recognize that two different queries are equivalent, so it doesn't even consider the best plan. ● Estimation Errors (55). The planner considers the optimal plan, but rejects it as too expensive. ● Row Count Estimation Errors (48). The planner mis-estimates the number of rows that will be returned by some scan, join, or aggregate. ● Cost Estimation Errors (7). The planner estimates the row count correctly but incorrectly estimates the relative cost.
  11. 11. Grand Prize Winners ● Selectivity of filter conditions involving correlated columns is estimated inaccurately (13) ● Suppose we want all the rows from a table where a = 1 and b = 1 and c = 1 and d = 1 and e = 1. The planner must estimate the number of rows that will match, but only has statistics on each column individually. ● Planner incorrectly thinks that “SELECT * FROM foo WHERE a = 1 ORDER BY b LIMIT n” will fill the limit after reading a small percentage of the index (11) ● It can scan an index on b and filter for rows where a = 1. ● Or it can scan an index on a, find all rows where a = 1, and perform a top-N sort. ● It often prefers the former when the latter would be faster. ● Can often be worked around with a composite or functional index.
  12. 12. Planner Error: Row Count Estimation – Others (24) ● Using WITH Results in a Bad Plan (5). Some of these are query fattening issues, while others result from failure to dig out variable statistics. ● Generic Plans Can Have Wildly Wrong Estimates (4). Improved. ● Selectivity Estimates on Arbitrary Estimates are Poor (4) ● Join Selectivity Doesn't Know about Cross-Table Correlations (3) ● Uncommitted Tuples Don't Affect Statistics (2) ● No Stats for WITH RECURSIVE (1) or GROUP BY (1) Results ● Redundant Equality Constraints Not Identified As Such (1) ● IN/NOT IN Estimation Doesn't Assume Array Elements Distinct (1). Fixed. ● Histogram Bounds Can Slide Due to New Data (1). Fixed. ● Inheritance Parents Aren't Assumed to be Completely Empty (1). Fixed.
  13. 13. Planner Error: Cost Estimation (7) ● Planner doesn't account for de-TOASTing cost (4) ● Plan change causes volume of data to exceed server memory (2) ● Hash join sometimes decides to hash the larger table when it should probably be hashing the smaller one (1)
  14. 14. Planner Error: Conceptual (28) ● Cross-data type comparisons are not always indexable (3) ● Inlining the same thing multiple times can lose (3) ● NOT IN is hard to optimize – and we don't try very hard (3) ● Target lists are computed too early or unnecessary targets are computed (3) ● Can't rewrite SELECT max(a) FROM foo WHERE b IN (…) as max of index scans (2) ● Can't rearrange joins and aggregates relative to one another (2) ● Can't deduce implied inequalities (2) ● Ten other issues that came up once each
  15. 15. Thank You ● Any questions?