Firebird: cost-based optimization and statistics, by Dmitry Yemanov (in English)

A basic introduction to the internal mechanisms of the Firebird optimizer: how it works, how it decides which index to use, why it sometimes fails, and what you can do to improve performance. This presentation does not answer all of these questions, but it gives a basic understanding of the optimizer's internals. It assumes some prior experience and is not aimed at all developers.

  1. 1. Cost-based Optimization and Statistics in Firebird<br />Dmitry Yemanov<br />The Firebird Project<br />http://www.firebirdsql.org<br />
  2. 2. Introduction<br /><ul><li>The optimizer decides how to retrieve all the required data in the most efficient way it can
  3. 3. Different queries and/or fetch strategies may benefit from different data access paths
  4. 4. The optimizer needs some information about the data to help it guess the best access path
  5. 5. Optimization strategies
  6. 6. Rule-based (heuristics)
  7. 7. Cost-based (statistics)</li></li></ul><li>Rule-based Optimization<br /><ul><li>Heuristic definitions
  8. 8. Indexed retrieval is better than a full table scan (and an indexed loop join is better than a merge join)
  9. 9. A B-tree is assumed to have three levels of depth
  10. 10. Compound indices are better than simple ones
  11. 11. Drawbacks
  12. 12. Indices could be bad for some operations
  13. 13. User intentions are not taken into account
  14. 14. Not ready for “ad hoc” queries</li></li></ul><li>Cost-based Optimization<br /><ul><li>Key points
  15. 15. Every operation has an associated cost value
  16. 16. Cost value is calculated using statistical data
  17. 17. Cost is aggregated from bottom up in the access path
  18. 18. Drawbacks
  19. 19. Complex implementation
  20. 20. Slow optimization process
  21. 21. Requires up-to-date statistics</li></li></ul><li>Basic Terms<br /><ul><li>Selectivity
  22. 22. Represents a fraction of rows from a row set
  23. 23. Lies in the value range 0.0 to 1.0
  24. 24. Cardinality
  25. 25. Represents number of rows in a row set
  26. 26. Base cardinality is the number of rows in a base table</li></li></ul><li>Understanding of Cost<br /><ul><li>Cost
  27. 27. Is a function of the estimated cardinalities
  28. 28. Represents computational complexity of the retrieval
  29. 29. Measurement
  30. 30. Cost value depends linearly on the number of logical reads required to perform an operation
  31. 31. A logical read is equal to a single page fetch
  32. 32. Cost value may also take into account auxiliary steps such as external sorting</li></li></ul><li>Cost Measurement (example)<br /><ul><li>Full table scan
  33. 33. cost = base cardinality
  34. 34. Unique index scan
  35. 35. cost = b-tree level + 1
  36. 36. Range index scan
  37. 37. cost = b-tree level + N + selectivity * base cardinality (N represents the number of required leaf page fetches and thus depends on the average key length)</li></li></ul><li>Cost Aggregation (example)<br />SELECT *<br />FROM T1<br /> JOIN T2 ON T1.PK = T2.FK<br />WHERE T1.VAL + T2.VAL &lt; 100<br />ORDER BY T1.NUM<br />Final Row Set: cost = 9000<br />Sort: cost = 9000<br />Filter: cost = 7000<br />Loop Join: cost = 6000<br />Full Scan: cost = 1000<br />Index Scan: cost = 5<br />
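The cost formulas and the aggregation example above can be sketched in a few lines of Python. This is only an illustration, not Firebird's implementation: the function names are hypothetical, the loop join rule (outer cost plus one inner probe per outer row) is an assumption that happens to reproduce the slide's figure of 6000, and the filter and sort increments are simply read off the slide because it does not say how they are derived.

```python
def full_scan_cost(base_cardinality):
    # Slide formula: cost = base cardinality
    return base_cardinality


def unique_index_scan_cost(btree_level):
    # Slide formula: cost = b-tree level + 1
    return btree_level + 1


def range_index_scan_cost(btree_level, leaf_pages, selectivity, base_cardinality):
    # Slide formula: cost = b-tree level + N + selectivity * base cardinality
    return btree_level + leaf_pages + selectivity * base_cardinality


def loop_join_cost(outer_cost, outer_cardinality, inner_cost_per_row):
    # Assumption (not stated on the slide): the inner stream is probed
    # once for every row produced by the outer stream.
    return outer_cost + outer_cardinality * inner_cost_per_row


# Bottom-up aggregation using the numbers from the slide.
outer_rows = 1000                      # full scan cost = base cardinality = 1000
full_scan = full_scan_cost(outer_rows)
index_scan = 5                         # per-probe index scan cost, as given
loop_join = loop_join_cost(full_scan, outer_rows, index_scan)
filter_cost = loop_join + 1000         # increment taken from the slide as given
sort_cost = filter_cost + 2000         # increment taken from the slide as given

print(loop_join, filter_cost, sort_cost)
```

Each node's cost includes the cost of everything beneath it, which is why the final row set carries the same cost as the topmost operation (the sort).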
  38. 38. Statistics<br /><ul><li>Information describing data amounts and distribution of values on different levels (table, index, column)
  39. 39. Stored in a database or estimated at runtime
  40. 40. Collected by request or automatically</li></li></ul><li>Core Statistics<br /><ul><li>Number of Rows in a Table (Base Cardinality)
  41. 41. Small tables: number of used record slots on the data pages
  42. 42. Large tables: number of used data pages * page size / average record length
  43. 43. Estimated at runtime by scanning pointer pages or data pages</li></li></ul><li>Core Statistics (continued)<br /><ul><li>Index Selectivity
  44. 44. 1 / number of distinct keys in the index
  45. 45. Maintained per segment: (A), (A, B), (A, B, C)
  46. 46. Assumes uniform distribution of values
  47. 47. Calculated during index creation or upon request (SET STATISTICS statement)
  48. 48. Stored on the index root page
  49. 49. Visible in RDB$INDICES and RDB$INDEX_SEGMENTS</li></li></ul><li>Decisions Based on Core Statistics<br /><ul><li>Full Table Scan over Indexed Retrieval
  50. 50. Selectivity close to 1.0 suggests a full scan
  51. 51. What Indices to Use
  52. 52. Compare index selectivities and index scan costs
  53. 53. Consider segment operations for compound indices
  54. 54. Calculate selectivities for AND and OR operations
  55. 55. Order of Streams in Loop Joins
  56. 56. Calculate costs for different join orders and choose the best one</li></li></ul><li>Advanced Statistics<br /><ul><li>Table level
  57. 57. Average page fill factor
  58. 58. Average row length (both help to estimate the base cardinality more accurately)
  59. 59. Number of rows (avoids the runtime page scan)</li></li></ul><li>Advanced Statistics (continued)<br /><ul><li>Index level
  60. 60. B-tree depth
  61. 61. Average key length (both help to estimate index scan costs more accurately)
  62. 62. Clustering factor (allows index navigation to be preferred over an external sort under some conditions; could also be used to avoid filling the sparse bitmap)</li></li></ul><li>Clustering Factor<br />Index keys: 1, 2, 3, 4, 5<br />Bad Clustering Factor: the keys point to scattered data pages 12, 25, 28, 44, 57<br />Good Clustering Factor: the keys point to adjacent data pages 12, 13, 14<br />
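One way to picture the difference between the two cases is to count how many data page fetches are needed when rows are read in index key order. The sketch below is a hypothetical illustration, not Firebird's actual definition of the clustering factor: the function name is made up, and the key-to-page mapping for the "good" case is an assumption, since the slide only shows that the keys land on pages 12, 13 and 14.

```python
def page_switches(pages_in_key_order):
    """Count how often walking the index in key order moves to a different data page."""
    switches = 0
    previous = None
    for page in pages_in_key_order:
        if page != previous:
            switches += 1
            previous = page
    return switches


# Data page visited for each of the five index keys, in key order.
bad_clustering = [12, 25, 28, 44, 57]    # page numbers from the slide
good_clustering = [12, 12, 13, 13, 14]   # assumed mapping onto pages 12, 13, 14

print(page_switches(bad_clustering))
print(page_switches(good_clustering))
```

With well-clustered data, neighbouring keys share pages, so walking the index touches fewer pages; this is the situation in which index navigation can compete with an external sort.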
  63. 63. Advanced Statistics (continued)<br /><ul><li>Column level
  64. 64. Selectivity (core feature, required to estimate costs)
  65. 65. Number of NULLs (useful for selectivity estimations for IS [NOT] NULL)
  66. 66. Value distribution histogram (allows selectivity estimations for non-uniform value distributions)</li></li></ul><li>Sample Histograms<br />1. Non-Selective Column: &apos;A&apos;, &apos;B&apos;, &apos;C&apos;, &apos;D&apos;<br />2. Selective Column: 1 5 5 5 10 20 50 50 80 100<br />
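To see why a histogram matters for skewed data, the following sketch compares the uniform-distribution estimate (1 / number of distinct values) with a frequency-based estimate. It assumes the ten numbers listed for the selective column are sample column values; the slide does not say whether they are values or bucket boundaries. The function names are hypothetical.

```python
from collections import Counter

# Assumed to be sample column values from the "Selective Column" example.
values = [1, 5, 5, 5, 10, 20, 50, 50, 80, 100]


def uniform_selectivity(column_values):
    # Selectivity under the uniform-distribution assumption:
    # 1 / number of distinct values.
    return 1 / len(set(column_values))


def histogram_selectivity(column_values, key):
    # Selectivity of "column = key" taken from the actual value frequencies.
    return Counter(column_values)[key] / len(column_values)


print(uniform_selectivity(values))
print(histogram_selectivity(values, 5))
print(histogram_selectivity(values, 80))
```

Under the uniform assumption every equality predicate gets the same estimate, while the frequency-based estimate separates a common value such as 5 from a rare one such as 80.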
  67. 67. Decisions Based on Advanced Statistics<br /><ul><li>Sort Aggregation vs Hash Aggregation
  68. 68. Selectivity of columns being grouped by
  69. 69. Loop Join vs Merge Join vs Hash Join
  70. 70. Cardinality of tables and filtering predicates
  71. 71. Index Usage
  72. 72. Number of NULLs or histogram
  73. 73. Index Navigation vs External Sorting
  74. 74. Clustering factor</li></li></ul><li>The Firebird Project<br />www.firebirdsql.org<br />