Understanding Firebird optimizer, by Dmitry Yemanov (in English)

Published in: Technology

  Understanding Firebird optimizer
    Dmitry Yemanov, Firebird Project
  Optimizer Keypoints
    • Allow the data to be retrieved in the most efficient way possible
        • Analyze the existing statistical information
        • Inject additional predicates
        • Order operations by priority
        • Try different join permutations
    • Strategies
        • Rule-based (heuristics)
        • Cost-based (statistics)
        • Mixed
  Optimizer Algorithm
    • Preparation
        • Expand views
        • Separate predicates: «base», «parent», «missing»
        • Distribute equalities
        • Generate index mappings
    • Main stage
        • Calculate cost for different join orders
        • Choose the best index coverage for the given join order
        • Ensure early predicate evaluation
        • Decide between index navigation and sorting
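
The "calculate cost for different join orders" step can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the stream statistics, the `lookup_cost` and `rows_per_lookup` fields, and the simplified loop-join cost model are all hypothetical and are not taken from the Firebird source.

```python
from itertools import permutations

# Hypothetical per-stream statistics (assumed values, for illustration only):
#   cardinality      rows in the table
#   lookup_cost      cost of one indexed lookup when the stream is the inner side
#   rows_per_lookup  rows returned by one such lookup
STREAMS = {
    "T1": {"cardinality": 1000, "lookup_cost": 4, "rows_per_lookup": 1},
    "T2": {"cardinality": 5000, "lookup_cost": 7, "rows_per_lookup": 5},
    "T3": {"cardinality": 10, "lookup_cost": 3, "rows_per_lookup": 1},
}


def join_order_cost(order):
    """Cost of a chain of loop joins under a deliberately simplified model."""
    first = STREAMS[order[0]]
    cost = first["cardinality"]   # the outermost stream is scanned in full
    rows = first["cardinality"]   # rows flowing into the next join
    for name in order[1:]:
        inner = STREAMS[name]
        cost += rows * inner["lookup_cost"]   # one lookup per outer row
        rows *= inner["rows_per_lookup"]
    return cost


def best_join_order(names):
    """Try every permutation and keep the cheapest one."""
    return min(permutations(names), key=join_order_cost)


print(best_join_order(["T1", "T2", "T3"]))
```

Exhaustive enumeration grows factorially with the number of streams, so a real optimizer has to limit the search; the sketch ignores that concern.
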
  Rule-based Approach
    • Heuristic assumptions
        • Indexed retrieval is better than a full table scan
        • An indexed loop join is better than a merge join
        • An index B-tree is assumed to have three levels of depth
        • Compound indices are better than a few simple ones
    • Drawbacks
        • Indices may be a poor choice for some operations
        • Not ready for «ad hoc» queries
  Cost-based Approach
    • Key ideas
        • Every operation has an associated cost value
        • Cost is calculated using the statistical information
        • Cost is aggregated from bottom up in the access path
    • Drawbacks
        • Complex implementation
        • Slower optimization process
        • Requires up-to-date statistics
  Basic Terms
    • Selectivity
        • Represents a fraction of rows from a row set
        • Value range is 0.0 to 1.0
    • Cardinality
        • Represents the number of rows in a row set
        • Base cardinality is the number of rows in a base table
    • Cost
        • Represents the computational complexity of the retrieval
        • Is a function of the estimated cardinalities
        • Depends linearly on the number of logical reads (page fetches)
  Cost Measurement
    • Full table scan
        • cost = <base cardinality>
    • Unique index scan + table scan
        • cost = <b-tree level> + 1
    • Range index scan + table scan
        • cost = <b-tree level> + N + <selectivity> * <base cardinality>
        • N is the number of leaf pages to be scanned and thus depends on the average key length
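
The three formulas on this slide translate directly into code. The Python sketch below only restates them; the function and parameter names are made up for illustration, and the sample numbers in the usage line are assumptions.

```python
def full_scan_cost(base_cardinality):
    """Full table scan: cost = <base cardinality>."""
    return base_cardinality


def unique_index_cost(btree_level):
    """Unique index scan + table scan: cost = <b-tree level> + 1."""
    return btree_level + 1


def range_index_cost(btree_level, leaf_pages, selectivity, base_cardinality):
    """Range index scan + table scan.

    cost = <b-tree level> + N + <selectivity> * <base cardinality>,
    where N (leaf_pages) is the number of leaf pages to be scanned.
    """
    return btree_level + leaf_pages + selectivity * base_cardinality


# Assumed sample values: a 3-level index, 2 leaf pages, 0.1% of 5000 rows.
print(range_index_cost(3, 2, 0.001, 5000))
```
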
  Cost Aggregation
    Query:
      SELECT * FROM T1 JOIN T2 ON T1.PK = T2.FK
      WHERE T1.VAL < 100
      ORDER BY T1.RANK
    Plan:
      PLAN SORT ( JOIN ( T1 NATURAL, T2 INDEX (FK) ) )
    Statistics:
      Table T1: base cardinality = 1000
      Table T2: base cardinality = 5000
      Index FK: selectivity = 0.001
    Access path nodes:
      Node            Cost   Cardinality
      Final Row Set   5000   2500
      Sort            5000   2500
      Full Scan       1000   1000
      Filter          1000    500
      Index Scan         7      5
      Loop Join       4000   2500
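
The cardinalities in this example can be reproduced bottom-up with a few multiplications. The Python sketch below does only that. The filter selectivity of 0.5 is not stated on the slide; it is inferred from the Full Scan and Filter cardinalities (1000 and 500). The cost values are not reproduced, because the slide does not give the per-node cost formulas.

```python
# Statistics given on the slide.
T1_ROWS = 1000
T2_ROWS = 5000
FK_SELECTIVITY = 0.001

# Assumption: inferred from the slide's numbers (1000 rows in, 500 rows out).
FILTER_SELECTIVITY = 0.5

# Cardinalities propagate from the leaves of the access path upwards.
full_scan_rows = T1_ROWS                               # T1 NATURAL
filter_rows = full_scan_rows * FILTER_SELECTIVITY      # T1.VAL < 100
index_scan_rows = T2_ROWS * FK_SELECTIVITY             # T2 rows per T1 row
loop_join_rows = filter_rows * index_scan_rows         # one lookup per outer row
sort_rows = loop_join_rows                             # sorting keeps every row

print(full_scan_rows, filter_rows, index_scan_rows, loop_join_rows, sort_rows)
```
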
  Statistics
    • What is it?
        • Information that describes data amounts and distribution of values on different levels (table / index / column)
    • Where is it located?
        • Stored in the database
        • Calculated «on the fly»
    • How is it updated?
        • By user's request (SET STATISTICS)
        • On index creation / activation
        • On database restore
  Core Statistics
    • Base cardinality (number of rows in a table)
        • For small tables: number of used record slots on data pages
        • For large tables: number of data pages / average record length
        • Estimated at runtime using a page scan
    • Index selectivity
        • 1 / number of distinct keys
        • Maintained per segment: (A), (A, B), (A, B, C)
        • Uniform value distribution is assumed
        • Stored on the index root page, visible through RDB$INDICES and RDB$INDEX_SEGMENTS
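
As a rough illustration of "1 / number of distinct keys, maintained per segment", the Python sketch below computes a selectivity for every leading prefix of a compound index over an in-memory list of rows. The function name and the sample rows are hypothetical; this is not how the engine gathers statistics, only the arithmetic behind the definition.

```python
def segment_selectivities(rows, columns):
    """Return {prefix: 1 / distinct keys} for prefixes (A), (A, B), ...

    `rows` is a non-empty list of dicts; `columns` lists the index
    segments in order. Both are illustrative inputs.
    """
    result = {}
    for n in range(1, len(columns) + 1):
        prefix = tuple(columns[:n])
        # Collect the distinct key values formed by the first n segments.
        distinct = {tuple(row[c] for c in prefix) for row in rows}
        result[prefix] = 1 / len(distinct)
    return result


sample = [
    {"A": 1, "B": 1},
    {"A": 1, "B": 2},
    {"A": 2, "B": 1},
    {"A": 2, "B": 1},
]
print(segment_selectivities(sample, ["A", "B"]))
```

A smaller value means a more selective prefix: here (A) has two distinct keys and (A, B) has three.
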
  Advanced Statistics
    • Table level
        • Average page fill factor
        • Average record length
    • Index level
        • B-tree depth
        • Average key length
        • Clustering factor
    • Column level
        • Number of NULLs
        • Value distribution histograms
  Clustering Factor
    [Figure: index keys 1 to 5 pointing to data pages. Bad clustering factor: the keys point to scattered data pages 12, 25, 28, 57, 44. Good clustering factor: the keys point to adjacent data pages 12, 13, 14.]
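
The slide does not give a formula for the clustering factor. One way to illustrate the idea, offered here as an assumption rather than Firebird's definition, is to count how often the data page changes while walking the index in key order. In the Python sketch below the "bad" page list follows the figure, while the "good" list is a hypothetical assignment of five keys to the three adjacent pages.

```python
def page_switches(pages_in_key_order):
    """Count data page changes when rows are visited in index key order."""
    switches = 0
    previous = None
    for page in pages_in_key_order:
        if page != previous:   # a different page has to be fetched
            switches += 1
            previous = page
    return switches


bad = [12, 25, 28, 57, 44]    # every key lands on a different page
good = [12, 12, 13, 13, 14]   # hypothetical: neighbouring keys share pages

print(page_switches(bad), page_switches(good))
```

Fewer switches mean fewer page fetches during index navigation, which is why this statistic matters for the choice between navigation and sorting.
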
  Decisions Based on Statistics
    • Full table scan vs indexed retrieval
        • A large selectivity value suggests a full table scan
    • Order of streams in loop joins
        • Calculate costs of different stream permutations and choose the cheapest one
    • Loop join vs merge join
        • Calculate costs of different stream permutations
    • Index navigation vs external sorting
        • Depends on the clustering factor
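
The first decision, full table scan versus indexed retrieval, can be sketched by comparing the two costs from the Cost Measurement slide. In the Python sketch below the function name, the default of three B-tree levels (borrowed from the rule-based heuristic) and the leaf page counts are assumptions for illustration.

```python
def choose_access(selectivity, base_cardinality, btree_level=3, leaf_pages=1):
    """Pick the cheaper access method using the slide's cost formulas."""
    full_scan = base_cardinality
    indexed = btree_level + leaf_pages + selectivity * base_cardinality
    return "INDEX" if indexed < full_scan else "NATURAL"


# A selective predicate favours the index.
print(choose_access(0.01, 1000))
# A predicate matching almost every row favours the full scan.
print(choose_access(0.999, 1000, leaf_pages=10))
```
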
  Decisions Based on Statistics (cont'd)
    • What indices to use
        • Compare index selectivities and index scan costs
        • Estimate how many indices would work best
        • Consider segment operations for compound indices
        • Special handling of different comparisons
        • Calculate selectivities for AND / OR operations
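
The slide does not say how selectivities are combined for AND / OR. A common textbook approach, shown below in Python as an assumption and not as Firebird's actual rule, treats the predicates as statistically independent.

```python
def and_selectivity(a, b):
    """Both predicates must hold: multiply the fractions (independence assumed)."""
    return a * b


def or_selectivity(a, b):
    """Either predicate may hold: add the fractions, minus the overlap."""
    return a + b - a * b


print(and_selectivity(0.1, 0.2), or_selectivity(0.1, 0.2))
```

Both results stay within the 0.0 to 1.0 range that the deck gives for selectivity.
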
  Thank you!
