
Calamities with cardinalities


Optimizer basics and how things can go wrong with cardinality estimates

Published in: Technology


  1.  Independent consultant  Performance Troubleshooting  In-house workshops  Cost-Based Optimizer  Performance By Design  (Ex-) Oracle ACE Director  Member of OakTable Network  2009-2016 Alumni
  2. http://oracle-randolf.blogspot.com
  3. https://www.youtube.com/c/RandolfGeist
  4. https://community.oracle.com/welcome
  5.  Three main questions you should ask when looking for an efficient execution plan:  How much data? How many rows / volume?  How scattered / clustered is the data?  Caching? => Know your data!
  6.  Why are these questions so important?  Two main strategies:  One “Big Job” => How much data / volume?  Few/many “Small Jobs” => How many times / rows? => Effort per iteration? Clustering / Caching
  7.  The optimizer’s cost estimate is based on:  How much data? How many rows / volume?  How scattered / clustered is the data? (partially)  (Caching?) Not at all
  8.  Single table cardinality  Join cardinality  Filter subquery / Aggregation cardinality
  9.–12.  Selectivity of predicates applying to a single table: Base Cardinality * Filter Ratio => Filtered Cardinality
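The base-cardinality-times-filter-ratio arithmetic above can be sketched with made-up numbers; the `ORDERS` table and `STATUS` column are hypothetical, assuming a uniform value distribution and no histogram:

```sql
-- Hypothetical table ORDERS: NUM_ROWS = 10,000
-- Column STATUS: NUM_DISTINCT = 5, no histogram, no NULLs
--
-- Selectivity of the equality predicate = 1 / NUM_DISTINCT = 1/5
-- Filtered cardinality = Base cardinality * Filter ratio
--                      = 10,000 * 1/5 = 2,000 estimated rows
SELECT *
FROM   orders
WHERE  status = 'SHIPPED';
```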
  13.  Optimizer challenges  Skewed column value distribution  Gaps / clustered values  Correlated column values  Complex predicates and expressions  Bind variables
  14. Demo! optimizer_basics_single_table_cardinality_testcase.sql
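The correlated-column challenge from slide 13 can be illustrated like this (the `SALES` table, its columns, and all counts are hypothetical):

```sql
-- Hypothetical SALES table: 1,200,000 rows.
-- SALES_MONTH has 12 distinct values, SALES_QUARTER has 4.
-- Without extra information the optimizer assumes independence:
--   1,200,000 * 1/12 * 1/4 = 25,000 estimated rows
-- but MONTH determines QUARTER, so the real count is
-- roughly 1,200,000 / 12 = 100,000 rows.
SELECT *
FROM   sales
WHERE  sales_month = 5
AND    sales_quarter = 2;

-- Extended statistics (11g+) make the correlation visible:
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(
         ownname   => USER,
         tabname   => 'SALES',
         extension => '(SALES_MONTH, SALES_QUARTER)')
FROM   dual;

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'SALES')
```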
  15.  Impact limited to a “single table”  Influences the favored access path (Full Table Scan, Index Access etc.)  Influences the join method and join order (NESTED LOOP, HASH, MERGE) => An incorrect single table cardinality potentially screws up the whole execution plan!
  16.  Oracle joins exactly two row sources at a time  If more than two row sources need to be joined, multiple join operations are required  Many different join orders possible (factorial!)
  17.  Tree shape of execution plan
  18.  Challenges  Getting the join selectivity right!  A join can mean anything between zero rows and a Cartesian product
  19.  Getting the join selectivity right: T1 (1,000 rows) joined to T2 (1,000 rows) can return anything between 0 rows and 1,000,000 rows
  20.  Getting the join selectivity right: Join cardinality = Cardinality T1 * Cardinality T2 * Join selectivity
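The join cardinality formula above can be worked through with the standard equi-join selectivity heuristic; the tables and distinct-value counts here are hypothetical:

```sql
-- Standard equi-join selectivity (no histograms):
--   Join selectivity = 1 / MAX(NUM_DISTINCT(T1.ID), NUM_DISTINCT(T2.ID))
--
-- T1: 1,000 rows, 1,000 distinct ID values
-- T2: 1,000 rows,   100 distinct ID values
--
-- Join cardinality = 1,000 * 1,000 * 1/1,000 = 1,000 estimated rows
SELECT *
FROM   t1, t2
WHERE  t1.id = t2.id;
```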
  21.  Challenges  Semi Joins (EXISTS (), = ANY())  Anti Joins (NOT EXISTS (), <> ALL())  Non-Equi Joins (Range, Unequal etc.)
  22.  Even for the most common form of a join – the equi-join – there are several challenges  Non-uniform join column value distribution  Partially overlapping join columns  Correlated column values  Expressions  Complex join expressions (multiple AND, OR)
  23. Demo! optimizer_basics_join_cardinality_testcase.sql
  24.  Influences the join method and join order (NESTED LOOP, HASH, MERGE) => An incorrect join cardinality/selectivity potentially screws up the whole execution plan!
  25.  Don’t use ANALYZE … COMPUTE / ESTIMATE STATISTICS anymore  Table statistics - Blocks, Rows, Avg Row Len: nothing to configure there, always generated  Basic column statistics - Low / High Value, Num Distinct, Num Nulls => Controlled via the METHOD_OPT option of DBMS_STATS
  26. Controlling column statistics via METHOD_OPT  If you see FOR ALL INDEXED COLUMNS: Question it! Only applicable if the author really knows what he/she is doing! => Without basic column statistics the Optimizer resorts to hard coded defaults!  Default in pre-10g releases: FOR ALL COLUMNS SIZE 1: Basic column statistics for all columns, no histograms  Default from 10g on: FOR ALL COLUMNS SIZE AUTO: Basic column statistics for all columns, histograms if Oracle determines so
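The METHOD_OPT variants discussed above look like this in practice (table name `T` is hypothetical):

```sql
-- Basic column statistics for all columns, no histograms
-- (the pre-10g default):
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T', method_opt => 'FOR ALL COLUMNS SIZE 1')

-- Let Oracle decide per column whether a histogram helps
-- (the default from 10g on):
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T', method_opt => 'FOR ALL COLUMNS SIZE AUTO')

-- Questionable: non-indexed columns end up with NO basic column
-- statistics at all, forcing hard coded optimizer defaults:
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T', method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO')
```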
  27.  Basic column statistics get generated along with table statistics in a single (almost) full pass  A histogram requires a separate pass  Therefore Oracle resorts to sampling if histograms are generated => This limits the quality of histograms and their significance – in particular HEIGHT BALANCED histograms are a potential threat
  28.  Limited resolution of 254 value pairs maximum  Less than 255 distinct column values => FREQUENCY histogram  More than 254 distinct column values => HEIGHT BALANCED histogram => A histogram is always an approximation of the data, even when computing statistics!
  29.  Basic column statistics get generated along with table statistics in a single (almost) full pass  FREQUENCY / TOP FREQUENCY histograms can be generated along this single pass with high quality  New HYBRID histograms still require a separate (sampled) pass  Hybrid histograms do a much better job than Height Balanced, not perfect though  Caution: You only get the new histogram types with DBMS_STATS.AUTO_SAMPLE_SIZE
  30.  Default max resolution still 254 value pairs maximum, but can be increased to 2048 max  Less than 255 distinct column values => FREQUENCY histogram  More than 254 distinct column values => TOP FREQUENCY or HYBRID histogram  Again: Watch out for old scripts / implementations using non-default ESTIMATE_PERCENT
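A quick way to see which histogram type Oracle actually generated is to query the column statistics dictionary view; the table name `T` is hypothetical:

```sql
-- Gather with the default AUTO_SAMPLE_SIZE (required for the
-- new 12c histogram types), then inspect what was generated:
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T', method_opt => 'FOR ALL COLUMNS SIZE AUTO', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE)

SELECT column_name, num_distinct, num_buckets, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'T';
-- HISTOGRAM shows NONE, FREQUENCY, TOP-FREQUENCY,
-- HYBRID or HEIGHT BALANCED per column
```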
  31.  SIZE AUTO generates a histogram if a column gets used as a filter and it has fewer than 255 distinct values (up to 2048 in 12c if explicitly requested via parameter / prefs)  Change in behaviour of FREQUENCY histograms introduced in 10.2.0.4 / 11g  Beware of the (no longer) new “value not found in Frequency Histogram” behaviour  Beware of the edge case of very popular / unpopular values
  32. SELECT SKEWED_NUMBER FROM T ORDER BY SKEWED_NUMBER  [Chart] 10,000,000 rows  100 popular values with 70,000 occurrences each  250 buckets each covering 40,000 rows (compute)  250 buckets each covering approx. 22/23 rows (sampling)
  33. [Chart]
  34. [Chart]
  35. Demo! histogram_limitations_testcase_12c.sql frequency_histogram_egde_case_11204.sql
  36.  Still pretty common: Not using the appropriate data types for storing dates or numbers  Frequently encountered: Storing dates in numbers or strings, or numbers in strings  Several sanity checks were added to the optimizer over time, like detecting numbers stored in strings under certain circumstances  Yet there are still many problems with that approach and quite often bad cardinality estimates as a result
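The date-in-string problem can be sketched like this (table `T` and its columns are hypothetical):

```sql
-- Dates stored in a VARCHAR2 column DATE_STR in 'YYYYMMDD' format:
-- the optimizer only sees a string/number range, so the interval
-- '20231230'..'20240102' appears to span thousands of values
-- (20231231, 20231232, ..., 20239999, ...) instead of four days,
-- skewing the range selectivity badly.
SELECT *
FROM   t
WHERE  date_str BETWEEN '20231230' AND '20240102';

-- A proper DATE column keeps the range arithmetic meaningful:
SELECT *
FROM   t
WHERE  date_col BETWEEN DATE '2023-12-30' AND DATE '2024-01-02';
```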
  37. Demo! calamities_with_cardinalities_daft_data_types.sql
  38.  Histograms are still, even with the latest version of Oracle, one of the key factors for efficient execution plans generated by the optimizer  Oracle does not know the questions you ask about the data  Pre 12c: You may want to use FOR ALL COLUMNS SIZE 1 as default and only generate histograms where really needed  You may get better results with the new 12c histogram types, but not always
  39.  There are data patterns that do not work well with histograms generated via Oracle  => Pre 12c: You may need to generate histograms manually using DBMS_STATS.SET_COLUMN_STATS for critical columns  Don’t forget Dynamic Sampling / Function Based Indexes / Virtual Columns / Extended Statistics  Know your data and your questions
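A manually crafted frequency histogram via DBMS_STATS.SET_COLUMN_STATS could look roughly like this; the table, column, values, and counts are all hypothetical:

```sql
-- Hand-crafting a frequency histogram for a critical skewed column:
DECLARE
  srec    DBMS_STATS.STATREC;
  numvals DBMS_STATS.NUMARRAY := DBMS_STATS.NUMARRAY(1, 2, 1000);
BEGIN
  srec.epc    := 3;                                        -- 3 distinct values
  srec.bkvals := DBMS_STATS.NUMARRAY(900000, 99000, 1000); -- occurrences each
  DBMS_STATS.PREPARE_COLUMN_VALUES(srec, numvals);
  DBMS_STATS.SET_COLUMN_STATS(
    ownname => USER,
    tabname => 'T',
    colname => 'SKEWED_NUMBER',
    distcnt => 3,
    nullcnt => 0,
    srec    => srec);
END;
/
```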
  40.  Cardinality and selectivity estimates determine whether the “Big Job” or “Small Jobs” strategy should be preferred  If the optimizer gets these estimates right, the resulting execution plan will be among the best possible of the given access paths
  41. 41. Q & A
