The Science of DBMS: Query Optimization

This session looks at when and how query optimization takes place, the resources it uses, the role of index statistics, and common application query problems (other than simple missing indexes) that lead DBAs to assume there is a query optimization issue.

Published in: Technology

The Science of DBMS: Query Optimization

  1. ISUG-TECH 2015 Conference: The Science of DBMS Query Optimization
     Jeff Tallman, SAP ASE Product Management
     (c) 2015 Independent SAP Technical User Group, Annual Conference, 2015
  2. Agenda
     Intro & Optimization Basics
       • Basic optimization cost factors
       • Procedure Cache (ASE)
     Query Processing & Optimization
       • Internals of QP
       • Impact of LOP-tree
       • Understanding optimization vs. execution
     Optimization Costing
       • Histograms & column densities
       • IN() & OR clauses
       • Out of range histograms
       • Joins & multi-column densities
     Controlling optimization
       • sp_chgattribute ‘opt concurrency threshold’
       • sp_modifystats
       • Resource Granularity
  3. Some Caveats
     Query optimization is very vendor proprietary/confidential
       • You can buy books on generic optimization techniques….
       • …but DBMS vendors hire PhDs to develop implementations
           - Query performance often depends on how good the optimization is
           - This is a key difference between open source and COTS DBMS packages
               * The strength of the query optimizer is largely due to the $$$ invested in the skills of highly educated staff
     As a result, this session will NOT explain the secrets of ASE’s optimizer
       • However, it will explain how it works, what influences it, what resources it uses, etc.
       • Additionally, most modern optimizers use the same lava tree model
           - Query optimization is based on an upside-down tree with data spewing out the top
  4. Goal of This Session
     The goal of this session
       • Help you understand the intricacies of query optimization
       • Use that knowledge to write queries that can be optimized better
       • Understand how/when additional index statistics might be necessary
       • Understand how to influence optimization
           - Other than the usual index forcing, AQP plan clauses, etc.
       • Differentiate when the optimizer is messing up…or your SQL did
     Assumptions for this session
  5. Rules Based Optimization
     Rules based optimization
       • Index selection and join order processing are based on specific rules
       • For example:
           - Index selection picks the index whose leading columns are most covered by query predicates
           - Join order follows the left-to-right ordering of tables in the FROM clause, which designates the driving tables
     The good, bad & ugly
       • Very good for extremely volatile data in which histogram statistics are often stale/impossible
       • Good for insert intensive monotonic sequences in which new values are out of range of histograms
       • Not so good…in fact sometimes ugly…on data that has any sort of skew with highly repetitive values
       • The really ugly part is if the SQL coders don’t know the “rules”
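The index selection rule on this slide can be sketched in a few lines of Python. This is a toy model only: the index names and column lists are hypothetical, and a real rules based optimizer applies more rules than this one.

```python
def leading_columns_covered(index_columns, predicate_columns):
    """Count how many leading index columns appear in the query predicates."""
    covered = 0
    for column in index_columns:
        if column not in predicate_columns:
            break  # a gap in the leading columns ends the usable prefix
        covered += 1
    return covered


def pick_index_by_rule(indexes, predicate_columns):
    """Return the index whose leading columns are most covered, or None."""
    best_name, best_covered = None, 0
    for name, columns in indexes.items():
        covered = leading_columns_covered(columns, predicate_columns)
        if covered > best_covered:
            best_name, best_covered = name, covered
    return best_name


# Hypothetical index definitions, for illustration only.
indexes = {
    "state_city_idx": ("state", "city"),
    "county_state_idx": ("county", "state"),
}

print(pick_index_by_rule(indexes, {"state", "city"}))  # state_city_idx
print(pick_index_by_rule(indexes, {"zip"}))            # None
```

Note that the rule never looks at the data, which is why it behaves the same on skewed and unskewed columns.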
  6. Cost Based Optimization
     Used by all mainstream DBMSs
       • Oracle, IBM DB2 UDB, MS SQL, ASE
     Attempts to find the cheapest method to perform the query
       • Uses some factoring of IO, CPU and memory
       • Formula for cost varies among DBMSs
     The key to costing is index/column histograms
       • In a sense, histograms attempt to report the relative skew of the data being queried
       • The optimizer’s goal is to find the cheapest access path considering the data skew
       • If it weren’t for the histogram reporting the skew…rules based optimization would be the only choice
  7. Simple Cost Factors (1)
     Physical IO
       • This is pretty obvious: disks are slow.
       • But we also need to predict how many writes (and then re-reads) we may need to do for intermediate results
     Logical IO
       • This is where PhDs are made
       • Remember, at query optimization time, we don’t know what pages we are after….
       • However, we need to determine how many LIOs we expect based on
           - How much of a table is already in cache
           - How often we may revisit the same pages for multiple rows
  8. Simple Cost Factors (2)
     Memory
       • Besides LIO, memory can be used to cache query intermediate results such as subquery results, hash tables for hash joins (HJ), etc.
       • In addition, memory can be used to avoid writes, e.g. in-memory sorts for order by, sort merge joins, etc.
     CPU
       • Again, fairly basic, but every LIO requires CPU
           - We need to do the data comparison for non-index key predicates
           - Again, though, we really don’t know how fast the CPU is that we are on…and how awful the data comparisons will be
               * We might apply some fuzzy logic on LIKE ‘%pattern%’ on large varchars or something….but …..
       • Also basic: sorts require CPU as well
           - Distinct processing, order by processing, etc.
  9. Procedure Cache & Optimization
     Optimization: one of the consumers of proc cache
       • Index statistics are loaded into proc cache for each query optimization
           - Visible with set option show long
       • Temporary work plans are created in proc cache
       • Reported via set statistics resource on
       • Total consumption is not a lot (rule of thumb = #engines * 2MB for OLTP)
     Two big problems
       • There is no ‘sharing’ of index statistics in proc cache
       • Index statistics don’t stay in cache
           - As soon as query optimization for that query is finished, the proc buffers are deallocated.
           - This means a TON of logical IOs on sysstatistics
               * Unless you use a lot of fully prepared statements or stored procedures
           - Hence you really want to ensure you have a dedicated systables cache
       • This is largely due to historical aspects
           - Remember, in 1984, 1MB of memory was a lot
           - Today, the sum of the index statistics is likely 256MB or less
  10. Loading Stats & Proc Cache Usage
      The script:
        use demo_db
        go
        set statement_cache off
        set switch on 3604
        set option show long
        set statistics time, io, resource, plancost on
        set showplan on
        go
        select l.city, l.county, s.sample_date, s.air_temp
          from aqi_locations l, aqi_samples s
          where l.location_id=s.location_id
            and s.sample_date = 'July 1 2000 12:00:00:000PM'
            and l.state='PA'
            and s.weather='Overcast'
            and s.air_temp = 90
        go
        set switch off 3604
        set option show off
        set statistics time, io, resource, plancost off
        set showplan off
        go
      Loading stats (output):
        Creating Initial Statistics for table aqi_locations l
        .....Done creating Initial Statistics for table aqi_locations l
        Creating Initial Statistics for table aqi_samples s
        .....Done creating Initial Statistics for table aqi_samples s
        Creating Initial Statistics for index aqi_locations_PK
        .....Done creating Initial Statistics for index aqi_locations_PK
        …
        Phase 2b initialization of OptBlock0 ...
        ... phase 2b done.
        Start merging statistics for table aqi_locations l
        ..... Done merging statistics for table aqi_locations l
        Start merging statistics for table aqi_samples s
        ..... Done merging statistics for table aqi_samples s
        …
      Compile time proc cache usage for stats & work plans (output):
        Total estimated I/O cost for statement 1 (at line 1): 33926.
        Parse and Compile Time 0.
        Adaptive Server cpu time: 0 ms.
        Statement: 1 Compile time resource usage: (est worker processes=0 proccache=126), Execution time resource usage: (worker processes=0 auxsdesc=0 plansize=14 proccache=23 proccache hwm=28 tempdb hwm=2)
        Private buffer count: 48,Private HWM buffer count: 48
      126 proc pages * 2K memory page = 252KB
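The page-to-memory conversion on this slide is simple enough to script when reading many `set statistics resource` lines. The sketch below assumes 2K proc cache pages, as the slide does; the function name is made up for illustration.

```python
PROC_PAGE_KB = 2  # page size assumed by the slide (2K memory pages)


def proccache_kb(pages, page_kb=PROC_PAGE_KB):
    """Convert a proccache page count from the resource output into KB."""
    return pages * page_kb


compile_kb = proccache_kb(126)   # "Compile time resource usage: proccache=126"
exec_hwm_kb = proccache_kb(28)   # "Execution time resource usage: proccache hwm=28"

print(compile_kb)    # 252
print(exec_hwm_kb)   # 56
```

For this query, compile-time usage (statistics plus work plans) is several times the execution-time high-water mark.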
  11. QUERY PROCESSING & OPTIMIZATION
      Internals, LOP Trees & Execution
  12. QP Phases
      Receive buffer
      SQL Parsing
      Query Normalization
        • Resolves object id’s
        • Replaces system functions/functions with literals with literal values
        • Rearranges AND/OR according to precedence
      Pre-Processing
        • Transforms subqueries
        • Rearranges aggregates
        • Creates Logical Operators (LOP)
      Query Optimization
      Query Execution
      Example statement at each of the first phases:
        TDS LANG:
          select * from table where due_dt <= getdate() and recv_date is null
        Parsed:
          SELECT {column list}
          FROM table
          COND1 due_dt <= getdate()
          COND2 (AND) recv_date is null
        Normalized:
          SELECT {column id’s & datatypes}
          FROM objid=123456
          COND1 col_id=3 (dt) <= (dt) ‘Jan 1 2015’
          COND2 (AND) col_id=4 (dt) IS NULL
      Phase diagram (with the session’s focus marked): Receive Buffer, SQL Parsing, Normalization, Pre-Processing, Query Optimization, Query Execution
  13. Some Notes on WaitEvents
      Believe it or not….
        • Until the execution phase, all the rest counts as ‘awaiting command’ in sp_who or WaitEvent ID=250 in monProcessWaits
        • It kinda makes sense….until the query is executing…it isn’t executing…
        • ….but parsing, compiling & optimization can all use considerable CPU time
            - Sooo…that is why set statistics time on reports compile time separately
      Sooo…if ‘awaiting command’ a lot….
        • See if packets received are increasing
  14. Optimization Starts with LOP Tree
      During the pre-processing phase, a LOP tree is created
        • A high level tree of the logical operations that relate the entities in the query
        • Often, the LOP tree is the first place where optimization starts to go wrong….due to bad query formation by developers
      Use ‘set option show on’ to see the LOP tree
        • It will be near the very top of the output
        • You will need trace 3604 enabled
      During execution, a physical operator (Pop) is used
        • Lop Join
        • Pop NLJoin
  15. Example Query
        use demo_db
        go
        set option show on
        set switch on 3604
        set statistics plancost, time, resource, io on
        set showplan on
        set statement_cache off -- avoid rerunning goofy plans from previous run
        set nodata on -- don't return results (avoids network time/scrolling of large results)
        go
        select l.county, avg(s.air_temp)
          from aqi_locations l, aqi_samples s
          where l.location_id=s.location_id
            and s.sample_date between 'July 1 2000 00:01am' and 'July 31 2000 23:59:59'
            and state='PA'
          group by l.county
        go
        set option show off
        set switch off 3604
        set statistics plancost, time, resource, io off
        set showplan off
        --set statement_cache off
        go
  16. Example LOP Tree
        select l.county, avg(s.air_temp)
          from aqi_locations l,
               aqi_samples s
          where l.location_id=s.location_id
            and s.sample_date between 'July 1 2000 00:01am' and 'July 31 2000 23:59:59'
            and state='PA'
          group by l.county
      The Lop tree:
        ( project
          ( group
            ( join
              ( scan aqi_locations )
              ( scan aqi_samples )
            )
          )
        )
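When the `set option show` output is long, it can help to pull the LOP tree into a structure you can walk. The parser below is a minimal sketch that assumes the tree is printed as nested parentheses exactly as on this slide; the function names are hypothetical and it makes no attempt to handle other parts of the trace.

```python
def parse_lop(text):
    """Parse a parenthesized LOP tree into (operator, arguments, children)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    position = 0

    def parse_node():
        nonlocal position
        if tokens[position] != "(":
            raise ValueError("expected '('")
        position += 1
        operator = tokens[position]
        position += 1
        arguments, children = [], []
        while tokens[position] != ")":
            if tokens[position] == "(":
                children.append(parse_node())
            else:
                arguments.append(tokens[position])  # e.g. the table name of a scan
                position += 1
        position += 1  # consume the closing ")"
        return (operator, arguments, children)

    return parse_node()


def operators(node):
    """List operator names top-down, in the order they appear in the tree."""
    operator, _arguments, children = node
    names = [operator]
    for child in children:
        names.extend(operators(child))
    return names


tree = parse_lop(
    "( project ( group ( join ( scan aqi_locations ) ( scan aqi_samples ) ) ) )"
)
print(operators(tree))  # ['project', 'group', 'join', 'scan', 'scan']
```

Seeing `group` rather than `scalar` in that list is the kind of quick check the later "tale of two queries" slides rely on.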
  17. LOP Tree & OptBlocks
      Each LOP tree level becomes an OptBlock
        • Outermost block (0) is one below (project)
        • Each block will generally have a relational operator
            - Join, group, scalar, etc.
            - Scan is only considered an operator if the query only has one entity and no other operators
      Optimizer will determine an optimal plan for that block
        • ASE set option show will print optimization for each optblock
        • The optblock list is also printed at
      The Lop tree (OptBlock1 and OptBlock0 are marked on the slide):
        ( project
          ( group
            ( join
              ( scan aqi_locations )
              ( scan aqi_samples )
            )
          )
        )
  18. Example OptBlock
      The Lop tree: …
      OptBlock1
        The Lop tree:
          ( join ( scan aqi_locations ) ( scan aqi_samples ) )
        Generic Tables: ( Gtt1( aqi_locations l ) Gtt2( aqi_samples s ) Gti3( aqi_locations_PK ) …
        Generic Columns: ( Gc0(aqi_locations l ,Rid) Gc1(aqi_locations l ,state) Gc2(aqi_locations l ,location_id) …
        Predicates: ( { aqi_samples s.sample_date} >= "Jul 1 2000 12:01AM" tc:{5} …
        Transitive Closures: ( Tc0 = { Gc0(aqi_locations l ,Rid)} …
      OptBlock0
        The Lop tree:
          ( pseudoscan )
        Generic Tables: ( Gtg0 )
        Generic Columns: ( Gc8(Gtg0 ,_gcelement_8) Gc9(Gtg0 ,_gcelement_9) Gc10(Gtg0 ,_gcelement_10) …
        Predicates: ( )
        Transitive Closures: ( Tc7 = { Gc8(Gtg0 ,_gcelement_8) Gc12(Gtg0 ,_virtualagg) …
  19. If you have any doubts
      If your index is being considered….
        • It will be listed in Generic Tables with Gtti
            - Format is <tablelist>, <indexlist>
        • Example:
            Generic Tables: ( Gtt1( aqi_locations l ) Gtt2( aqi_samples s ) Gti3( aqi_locations_PK ) Gti4( city_state_idx ) Gti5( county_state_idx ) Gti6( aqi_samples_PK ) Gti7( aqi_weather_date_idx ) )
      If your where clause is being considered…
        • It will be listed in Predicates
        • Example:
            Predicates: (
              { aqi_samples s.sample_date} >= "Jul 1 2000 12:01AM" tc:{5}
              { aqi_samples s.sample_date} <= "Jul 31 2000 11:59PM" tc:{5}
              { aqi_locations l.state} = 'PA' tc:{1} )
  20. To find optimization details
      Look for optblock begin/end section markers in the output
        • Begin
            ******************************************************************************
            BEGIN: Search Space Traversal for OptBlock1
            ******************************************************************************
        • End
            ******************************************************************************
            DONE: Search Space Traversal for OptBlock1
            ******************************************************************************
      Any section could be fairly lengthy
        • The key is to find the optblock where you think the problem is….
  21. The LOP role…a tale of two queries
      The setup:
        select * into tempdb..my_objects from sybsystemprocs..sysobjects
        create index type_date_idx on tempdb..my_objects (type, crdate)
      “Good” Query:
        declare @type char(2)
        select @type='P'
        select @type, max(crdate)
          from tempdb..my_objects
          where type=@type
      “Bad” Query:
        declare @type char(2)
        select @type='P'
        select type, max(crdate)
          from tempdb..my_objects
          where type=@type
          group by type
  22. The showplans…and final IO costs
      “Good” Query Plan & Cost:
        QUERY PLAN FOR STATEMENT 2 (at line 9).
        Optimized using Serial Mode
        STEP 1
        The type of query is SELECT.
        2 operator(s) under root
        |ROOT:EMIT Operator (VA = 2)
        |
        |   |SCALAR AGGREGATE Operator (VA = 1)
        |   |  Evaluate Ungrouped MAXIMUM AGGREGATE.
        |   |  Scanning only up to the first qualifying row.
        |   |
        |   |   |SCAN Operator (VA = 0)
        |   |   |  FROM TABLE
        |   |   |  my_objects
        |   |   |  Index : type_date_idx
        |   |   |  Backward scan.
        |   |   |  Positioning by key.
        |   |   |  Index contains all needed columns. Base table will not be read.
        |   |   |  Keys are:
        |   |   |    type ASC
        |   |   |  Using I/O Size 4 Kbytes for index leaf pages.
        |   |   |  With LRU Buffer Replacement Strategy for index leaf pages.
        Total estimated I/O cost for statement 2 (at line 9): 54.
        …
        Table: my_objects scan count 1, logical reads: (regular=2 apf=0 total=2), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
        Total actual I/O cost for this command: 4.
      “Bad” Query Plan & Cost:
        QUERY PLAN FOR STATEMENT 2 (at line 9).
        Optimized using Serial Mode
        STEP 1
        The type of query is SELECT.
        3 operator(s) under root
        |ROOT:EMIT Operator (VA = 3)
        |
        |   |RESTRICT Operator (VA = 2)(0)(0)(0)(4)(0)
        |   |
        |   |   |GROUP SORTED Operator (VA = 1)
        |   |   |  Evaluate Grouped MAXIMUM AGGREGATE.
        |   |   |
        |   |   |   |SCAN Operator (VA = 0)
        |   |   |   |  FROM TABLE
        |   |   |   |  my_objects
        |   |   |   |  Index : type_date_idx
        |   |   |   |  Forward Scan.
        |   |   |   |  Positioning by key.
        |   |   |   |  Index contains all needed columns. Base table will not be read.
        |   |   |   |  Keys are:
        |   |   |   |    type ASC
        |   |   |   |  Using I/O Size 4 Kbytes for index leaf pages.
        |   |   |   |  With LRU Buffer Replacement Strategy for index leaf pages.
        Total estimated I/O cost for statement 2 (at line 9): 360.
        …
        Table: my_objects scan count 1, logical reads: (regular=4 apf=0 total=4), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
        Total actual I/O cost for this command: 8.
  23. A first clue…the plancost
      “Good” Query LOP Plancost:
        ==================== Lava Operator Tree ====================
        Emit (VA = 2) r:1 er:1 cpu: 0
        /
        ScalarAgg Max (VA = 1) r:1 er:1 cpu: 0
        /
        IndexScan type_date_idx (VA = 0) r:1 er:1 l:2 el:2 p:0 ep:2
        ============================================================
      “Bad” Query LOP Plancost:
        ==================== Lava Operator Tree ====================
        Emit (VA = 3) r:1 er:6 cpu: 0
        /
        Restrict (0)(0)(0)(4)(0) (VA = 2) r:1 er:6
        /
        GroupSorted Grouping (VA = 1) r:1 er:6
        /
        IndexScan type_date_idx (VA = 0) r:647 er:598 l:4 el:4 p:0 ep:4
        ============================================================
  24. The actual LOP trees
      “Good” Query LOP tree:
        The Lop tree:
          ( project ( scalar ( scan my_objects ) ) )
        OptBlock1
          The Lop tree:
            ( scan my_objects )
        OptBlock0
          The Lop tree:
            ( pseudoscan )
      “Bad” Query LOP tree:
        The Lop tree:
          ( project ( group ( scan my_objects ) ) )
        OptBlock1
          The Lop tree:
            ( scan my_objects )
        OptBlock0
          The Lop tree:
            ( pseudoscan )
  25. The Lesson
      The LOP can influence optimization and final costs
        • Try to use operators that are lighter weight (e.g. scalar vs. group by)
        • In this case, we knew the @type up front….
            - Re-selecting it in the ‘group by’ variant is duplicative/redundant
            - Literals, @vars are scalars whereas group by is a vector
      Execution can play a role as well
        • We saw in this example, in the scalar variant, that the optimizer can limit the rows to be scanned
            | |SCALAR AGGREGATE Operator (VA = 1)
            | | Evaluate Ungrouped MAXIMUM AGGREGATE.
            | | Scanning only up to the first qualifying row.
        • Execution can also short-circuit based in certain
  26. Optimization vs. Execution (1)
      The optimizer gets a lot of blame for things it is not involved in
      Example:
        • Customer on SCN whines about a table scan due to an optimizer ‘bug’ on the following example query
            select * from sysobjects
            where id=8 OR 1=2
        • Customer “thinks” the optimizer should simply use the index
      What do you think the real problem is and why???
  27. Let’s start simple (1)
      Let’s force a table scan just to see how many LIOs it takes
        select count(*) from sysobjects plan '(t_scan sysobjects)'

        QUERY PLAN FOR STATEMENT 1 (at line 1).
        Optimized using Serial Mode
        Optimized using the Abstract Plan in the PLAN clause.
        STEP 1
        The type of query is SELECT.
        2 operator(s) under root
        |ROOT:EMIT Operator (VA = 2)
        |
        |   |SCALAR AGGREGATE Operator (VA = 1)
        |   |  Evaluate Ungrouped COUNT AGGREGATE.
        |   |
        |   |   |SCAN Operator (VA = 0)
        |   |   |  FROM TABLE
        |   |   |  sysobjects
        |   |   |  Table Scan.
        |   |   |  Forward Scan.
        |   |   |  Positioning at start of table.
        |   |   |  Using I/O Size 32 Kbytes for data pages.
        |   |   |  With LRU Buffer Replacement Strategy for data pages.
        Total estimated I/O cost for statement 1 (at line 1): 414.
        Parse and Compile Time 0.
        Adaptive Server cpu time: 0 ms.

        -----------
                702
  28. Let’s start simple (2)
        Statement: 1 Compile time resource usage: (est worker processes=0 proccache=57), Execution time resource usage: (worker processes=0 auxsdesc=0 plansize=6 proccache=7 proccache hwm=7 tempdb hwm=0)
        ==================== Lava Operator Tree ====================
        Emit (VA = 2) r:1 er:1 cpu: 0
        /
        ScalarAgg Count (VA = 1) r:1 er:1 cpu: 0
        /
        TableScan sysobjects (VA = 0) r:702 er:702 l:26 el:26 p:0 ep:4
        ============================================================
        Table: sysobjects scan count 1, logical reads: (regular=26 apf=0 total=26), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
        Total actual I/O cost for this command: 52.
        Total writes for this command: 0
        Execution Time 0.
        Adaptive Server cpu time: 0 ms. Adaptive Server elapsed time: 0 ms.
      The answer is 26…remember that
  29. A simple false expression (1)
        select * from sysobjects where 1=2

        QUERY PLAN FOR STATEMENT 1 (at line 1).
        Optimized using Serial Mode
        STEP 1
        The type of query is SELECT.
        2 operator(s) under root
        |ROOT:EMIT Operator (VA = 2)
        |
        |   |RESTRICT Operator (VA = 1)(4)(0)(0)(0)(0)
        |   |
        |   |   |SCAN Operator (VA = 0)
        |   |   |  FROM TABLE
        |   |   |  sysobjects
        |   |   |  Table Scan.
        |   |   |  Forward Scan.
        |   |   |  Positioning at start of table.
        |   |   |  Using I/O Size 4 Kbytes for data pages.
        |   |   |  With LRU Buffer Replacement Strategy for data pages.
        Total estimated I/O cost for statement 1 (at line 1): 237.
        Parse and Compile Time 0.
        Adaptive Server cpu time: 0 ms.
      We are still going to do a table scan….
  30. A simple false expression (2)
        Statement: 1 Compile time resource usage: (est worker processes=0 proccache=69), Execution time resource usage: (worker processes=0 auxsdesc=0 plansize=14 proccache=15 proccache hwm=15 tempdb hwm=0)
        ==================== Lava Operator Tree ====================
        Emit (VA = 2) r:0 er:702 cpu: 0
        /
        Restrict (4)(0)(0)(0)(0) (VA = 1) r:0 er:702
        /
        TableScan sysobjects (VA = 0) r:0 er:702 l:0 el:1 p:0 ep:1
        ============================================================
        Table: sysobjects scan count 0, logical reads: (regular=0 apf=0 total=0), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
        Total actual I/O cost for this command: 0.
        Total writes for this command: 0
        Execution Time 0.
        Adaptive Server cpu time: 0 ms. Adaptive Server elapsed time: 0 ms.
        (0 rows affected)
      What happened to our 26 IOs???
  31. Digging a Bit Deeper (1)
        select * from sysobjects where 1=2

        The Lop tree:
          ( project ( scan sysobjects ) )
        OptBlock0
          The Lop tree:
            ( scan sysobjects )
          Generic Tables: ( Gtt0( sysobjects ) )
          Generic Columns: …
          Predicates: ( 1=2)
          Transitive Closures: …
      We do see the expression…but notice there is no index listed in Generic Tables…
      ….and notice that the predicate listed doesn’t have a condition number (tc{#})…
  32. Digging a Bit Deeper (2)
        ******************************************************************************
        BEGIN: Search Space Traversal for OptBlock0
        ******************************************************************************
        Scan plans selected for this optblock:
        Statistics for rows returned to client...
        Estimated rows :702
        Estimated row width :239.5
        Estimated client cost is :132.95
        Estimating selectivity for table 'sysobjects'
        Table scan cost is 702 rows, 21 pages,
        Cost adjusted for Fastfirstrow goal, Adjustment ratio0.001424501
        Adjusted Table scan cost is 1 rows, 21 pages,
        The table (Datarows) has 702 rows, 21 pages,
        Data Page Cluster Ratio 0.9999900
        Search argument selectivity is 1.
        using table prefetch (size 32K I/O)
        Large IO selected: The number of leaf pages qualified is > MIN_PREFETCH pages
        in data cache 'default data cache' (cacheid 0) with LRU replacement
        OptBlock0 Eqc{0} -> Pops added:
        ( PopTabScan sysobjects ) cost:237.6 T(L1,P0.9999995,C2106) O(L1,P0.9999995,C2106) order: none
        The best plan found in OptBlock0 :
        ( PopTabScan cost:237.6 T(L1,P0.9999995,C2106) O(L1,P0.9999995,C2106) props: [{}] Gtt0( sysobjects ) ) cost:237.6 T(L1,P0.9999995,C2106) O(L1,P0.9999995,C2106) order: none
      Hmmm….no indexes looked at…
  33. Let’s Try Something Close (1)
        select * from sysobjects where id=8 and 1=2

        QUERY PLAN FOR STATEMENT 1 (at line 1).
        Optimized using Serial Mode
        STEP 1
        The type of query is SELECT.
        2 operator(s) under root
        |ROOT:EMIT Operator (VA = 2)
        |
        |   |RESTRICT Operator (VA = 1)(4)(0)(0)(0)(0)
        |   |
        |   |   |SCAN Operator (VA = 0)
        |   |   |  FROM TABLE
        |   |   |  sysobjects
        |   |   |  Using Clustered Index.
        |   |   |  Index : csysobjects
        |   |   |  Forward Scan.
        |   |   |  Positioning by key.
        |   |   |  Keys are:
        |   |   |    id ASC
        |   |   |  Using I/O Size 4 Kbytes for index leaf pages.
        |   |   |  With LRU Buffer Replacement Strategy for index leaf pages.
        |   |   |  Using I/O Size 4 Kbytes for data pages.
        |   |   |  With LRU Buffer Replacement Strategy for data pages.
        Total estimated I/O cost for statement 1 (at line 1): 81.
        Parse and Compile Time 0.
        Adaptive Server cpu time: 0 ms.
      Heyyy!!!! We used an index…even with a FALSE expression….
  34. Let’s Try Something Close (2)
        Statement: 1 Compile time resource usage: (est worker processes=0 proccache=69), Execution time resource usage: (worker processes=0 auxsdesc=0 plansize=14 proccache=17 proccache hwm=17 tempdb hwm=0)
        ==================== Lava Operator Tree ====================
        Emit (VA = 2) r:0 er:71 cpu: 0
        /
        Restrict (4)(0)(0)(0)(0) (VA = 1) r:0 er:71
        /
        IndexScan csysobjects (VA = 0) r:0 er:71 l:0 el:3 p:0 ep:3
        ============================================================
        Table: sysobjects scan count 0, logical reads: (regular=0 apf=0 total=0), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
        Total actual I/O cost for this command: 0.
        Total writes for this command: 0
        Execution Time 0.
        Adaptive Server cpu time: 0 ms. Adaptive Server elapsed time: 0 ms.
        (0 rows affected)
      …but we *STILL* didn’t do any LIOs….how is that???
  35. Let’s Try Something Close (3)
        select * from sysobjects where id=8 and 1=2

        The Lop tree:
          ( project ( scan sysobjects ) )
        OptBlock0
          The Lop tree:
            ( scan sysobjects )
          Generic Tables: ( Gtt0( sysobjects ) Gti1( csysobjects ) )
          Generic Columns: …
          Predicates: ( { sysobjects.id } = 8 tc:{25} 1=2)
          Transitive Closures: …
      …We now have an index to look at as well as a predicate with a tc{#}….it applies to the condition before the label.
  36. Let’s Try Something Close (4)
        ******************************************************************************
        BEGIN: Search Space Traversal for OptBlock0
        ******************************************************************************
        Scan plans selected for this optblock:
        Statistics for rows returned to client...
        Estimated rows :70.2
        Estimated row width :239.5
        Estimated client cost is :14.7343
        Scan on table sysobjects skipped because table scan less than concurrency threshold
        Scan on table sysobjects skipped because table scan less than concurrency threshold
        Beginning selection of qualifying indexes for table 'sysobjects',
        Estimating selectivity of index 'sysobjects.csysobjects', indid 3
        id = 8
        Estimated selectivity for id,
        selectivity = 0.1,
        scan selectivity 0.001424501, filter selectivity 0.001424501
        restricted selectivity 0.1
        Cost adjusted for Fastfirstrow goal, Adjustment ratio 0.01424501
        unique index with all keys, one row scans
        1 rows, 1 pages
        Adjustment ratio 0.01424501 applied gives 0.01424501 rows, 1 pages
        Data Row Cluster Ratio 0.06314244
        Index Page Cluster Ratio 0.99999
        Data Page Cluster Ratio 0.2469512
        using no index prefetch (size 4K I/O)
        in index cache 'default data cache' (cacheid 0) with LRU replacement
      Yep, we evaluated the index
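Several numbers in this trace and the earlier `where 1=2` trace can be reproduced by hand, which is a useful sanity check when reading optimizer output. The relationships below (estimated rows = table rows * selectivity, and Fastfirstrow adjustment ratio = 1 / estimated rows) are inferred from the figures printed on the slides, not taken from documented optimizer internals, so treat them as assumptions.

```python
TABLE_ROWS = 702  # sysobjects row count reported in the traces


def estimated_rows(table_rows, selectivity):
    """Rows the optimizer expects to return for a given selectivity."""
    return table_rows * selectivity


def fastfirstrow_ratio(rows):
    """Assumed relation: adjustment ratio = 1 / estimated rows."""
    return 1.0 / rows


# "where id=8 and 1=2": selectivity = 0.1
rows_with_sarg = estimated_rows(TABLE_ROWS, 0.1)
print(f"{rows_with_sarg:.1f}")                      # 70.2
print(f"{fastfirstrow_ratio(rows_with_sarg):.8f}")  # 0.01424501

# "where 1=2": search argument selectivity is 1
rows_no_sarg = estimated_rows(TABLE_ROWS, 1.0)
print(f"{rows_no_sarg:.0f}")                        # 702
print(f"{fastfirstrow_ratio(rows_no_sarg):.9f}")    # 0.001424501
```

In both traces, the `1=2` expression contributes nothing to the estimate: the row counts come only from the table size and the `id = 8` selectivity.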
  37. Let’s Try Something Close (5)
        ******************************************************************************
        BEGIN: Search Space Traversal for OptBlock0
        ******************************************************************************
        …
        using no table prefetch (size 4K I/O)
        in data cache 'default data cache' (cacheid 0) with LRU replacement
        Data Page LIO for 'csysobjects' on table 'sysobjects' = 1
        OptBlock0 Eqc{0} -> Pops added:
        ( PopRidJoin ( PopIndScan csysobjects sysobjects ) ) cost:81.39999 T(L3,P3,C4) O(L1,P1,C3) order: none
        The best plan found in OptBlock0 :
        ( PopRidJoin cost:81.39999 T(L3,P3,C4) O(L1,P1,C3) props: [{}]
          ( PopIndScan cost:54.09999 T(L2,P2,C1) O(L2,P2,C1) props: [{}] Gti1( csysobjects ) Gtt0( sysobjects ) ) cost:54.09999 T(L2,P2,C1) O(L2,P2,C1) order: none
        ) cost:81.39999 T(L3,P3,C4) O(L1,P1,C3) order: none
        ******************************************************************************
        DONE: Search Space Traversal for OptBlock0
        ******************************************************************************
      …and that was about it….so we go with the index
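The `cost:` figures in these traces can be recomputed from the `T(L,P,C)` triples (logical IO, physical IO, CPU). The weights used below (2 per LIO, 25 per PIO, 0.1 per CPU unit) are not stated on the slides; they are an assumption that happens to reproduce every plan cost shown in this deck's output.

```python
# Assumed weights; chosen because they reproduce the costs printed in the traces.
LIO_WEIGHT = 2
PIO_WEIGHT = 25
CPU_WEIGHT = 0.1


def plan_cost(lio, pio, cpu):
    """Combine the T(L,P,C) totals of a plan fragment into a single cost."""
    return LIO_WEIGHT * lio + PIO_WEIGHT * pio + CPU_WEIGHT * cpu


index_scan = plan_cost(2, 2, 1)             # PopIndScan  T(L2,P2,C1)
rid_join = plan_cost(3, 3, 4)               # PopRidJoin  T(L3,P3,C4)
table_scan = plan_cost(1, 0.9999995, 2106)  # PopTabScan  T(L1,P0.9999995,C2106)

print(round(index_scan, 1))  # 54.1  (trace shows 54.09999)
print(round(rid_join, 1))    # 81.4  (trace shows 81.39999)
print(round(table_scan, 1))  # 237.6 (trace shows 237.6)
```

Under these weights the table scan's cost is dominated by CPU (2106 units for comparing every row), which is why the index plan at 81.4 wins even on a 21-page table.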
  38. Understanding what happened
      Query optimizer optimizes…not executes
        • Expression evaluation happens during execution time
        • Soooo….. 1=2 is not even looked at by the optimizer
            - Both are literals and the optimizer skips this as a literal expression that cannot be optimized
      Query execution can ‘short circuit’
        • Obviously false expressions
        • N-ary Nested Loop Joins
        • …
  39. Soo….What about Our Query?
      Our Example:
        select * from sysobjects
        where id=8 OR 1=2
      What happens
        • Optimizer evaluates the index on id=8
        • Optimizer sees the OR clause
            - …the opposite side of the OR clause is an unoptimizable expression which could be *anything* (e.g. an unindexed param like type=‘U’)
            - Since it could be anything, the OR clause means a table scan
        • Since we have to table scan for the OR’d condition….
            - No sense in using the index for id=8…we will just hit those rows on the way by doing the OR clause
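The reasoning on this slide boils down to a simple rule, sketched below as a toy model. It is not ASE's actual algorithm: each OR branch is reduced to either an indexed column name or `None` for an expression the optimizer cannot cost, and the function name and return strings are invented for illustration.

```python
def choose_access(or_branches, indexed_columns):
    """Toy model: any OR branch the optimizer cannot cost forces a table scan.

    or_branches holds one entry per OR'd condition: a column name for a
    searchable predicate, or None for an opaque expression such as 1=2.
    """
    for column in or_branches:
        if column is None or column not in indexed_columns:
            return "table scan"
    return "index"  # every branch is indexable, so index access is at least possible


# where id=8 OR 1=2: the second branch is opaque to the optimizer
print(choose_access(["id", None], {"id"}))            # table scan

# where id=8 OR type='U' with no index on type
print(choose_access(["id", "type"], {"id"}))          # table scan

# the same query once both columns are indexed
print(choose_access(["id", "type"], {"id", "type"}))  # index
```

The first case is the customer's query: the index on id is perfectly usable, but it cannot help once the other side of the OR requires visiting every row.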
  40. Why did I bring that up???
      Have you ever done this in a stored proc???
        select ….
          from tableA, …
          where …
            and (((@var1=1) and (colA='value'))
              or ((@var1=2) and (colB='value'))
            )
      Or worse yet…
        select ….
          from tableA, …
          where …
            and (((@var1=1) and (colA='value'))
              or ((@var1=2) and (colB='value'))
            )
      I have….ooops….
  41. A more complicated example
        INSERT INTO #temp (...)
        SELECT DISTINCT ...
          FROM MYDBNAME..TABLE_A A
             , MYDBNAME..TABLE_B B
             , MYDBNAME..TABLE_C C
             , MYDBNAME..TABLE_D D
             , MYDBNAME..TABLE_E E
             , MYDBNAME..TABLE_F F
             , MYDBNAME..TABLE_G G
             , MYDBNAME..TABLE_H H
          WHERE A.COLUMN_1 = @VARIABLE_1
            AND A.COLUMN_2 = @VARIABLE_2
            AND A.COLUMN_3 = IsNull(@VARIABLE_3,A.COLUMN_3)
            AND A.COLUMN_4 = IsNull(@VARIABLE_4,A.COLUMN_4)
            AND A.COLUMN_5 = IsNull(@VARIABLE_5,A.COLUMN_5)
            ...
            AND A.COLUMN_6 BETWEEN @VARIABLE_6 AND @VARIABLE_7
            ...
          ORDER BY ...
      Customer is trying to avoid writing IF/ELSE logic for the different conditions/variables being passed in…if @VARIABLE_3 through @VARIABLE_5 are set, the intent is that they are used as SARGs….but if not set, then the predicate is a no-op as the column is compared to itself….
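What the `column = IsNull(@variable, column)` pattern means for each row can be modeled in a few lines. This sketch mimics SQL's three-valued comparison with `None` standing in for NULL; the row layout and helper names are hypothetical. It also shows a side effect worth knowing: when the variable is NULL and the column itself is NULL, the "no-op" is not quite a no-op, because NULL = NULL does not evaluate to true.

```python
def isnull(value, fallback):
    """Model of IsNull(value, fallback): use fallback when value is NULL."""
    return fallback if value is None else value


def sql_equals(left, right):
    """SQL-style equality: comparing anything with NULL yields unknown (None)."""
    if left is None or right is None:
        return None
    return left == right


def row_qualifies(row, air_temp, weather):
    """A row is returned only when every predicate evaluates to true."""
    return (
        sql_equals(row["air_temp"], isnull(air_temp, row["air_temp"])) is True
        and sql_equals(row["weather"], isnull(weather, row["weather"])) is True
    )


sunny = {"air_temp": 80, "weather": "sunny"}
unknown_weather = {"air_temp": 80, "weather": None}

print(row_qualifies(sunny, None, None))            # True: both predicates are no-ops
print(row_qualifies(sunny, 90, None))              # False: air_temp filter applies
print(row_qualifies(unknown_weather, None, None))  # False: NULL = NULL is not true
```

Either way, the comparison value depends on the row being read, so the optimizer has nothing to look up in a histogram for those predicates.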
  42. Simplifying (1)
      use demo_db
      go
      set statement_cache off
      set switch on 3604
      set option show on
      set statistics time, io, resource, plancost on
      set showplan on
      go
      declare @air_temp smallint, @weather varchar(30), @bDate datetime, @eDate datetime
      select @air_temp=null, @weather=null, @bDate='July 1 2000 00:00:01', @eDate='July 31 2000 23:59:59'
      --select @air_temp=80, @weather='sunny',@bDate='July 1 2000 00:00:01', @eDate='July 31 2000 23:59:59'
      select count(*)
      from aqi_samples
      where sample_date between @bDate and @eDate
        and air_temp=isnull(@air_temp,air_temp)
        and weather=isnull(@weather,weather)
      go
      set switch off 3604
      set option show off
      set statistics time, io, resource, plancost off
      set showplan off
      go
      Table has 168M rows with an index on {sample_date, air_temp, weather}… first run with NULLs for the second 2 index keys
  43. Simplifying (2)
      The Lop tree:
      ( project
        ( scalar
          ( scan aqi_samples )
        )
      )
      OptBlock1
        The Lop tree: ( scan aqi_samples )
        Generic Tables: ( Gtt1( aqi_samples ) Gti2( aqi_samples_PK ) Gti3( aqi_weather_date_idx ) )
        Generic Columns: …
        Predicates: (
          { aqi_samples.sample_date} >= "Jan 1 1900 12:00AM" tc:{3}
          { aqi_samples.sample_date} <= "Jan 1 1900 12:00AM" tc:{3}
        )
        Transitive Closures: …
      OptBlock0
        The Lop tree: ( pseudoscan )
        Generic Tables: ( Gta0 )
        Generic Columns: …
        Predicates: ( )
        Transitive Closures: …
      The between clause is the only one passed to the optimizer… not much of a surprise, as with the NULLs we are expecting no-ops on air_temp and weather. Note that since we don't know the value of the @vars at compile time, we use a default date here
  44. Simplifying (3)
      Total estimated I/O cost for statement 3 (at line 4): 17133977.
      ==================== Lava Operator Tree ====================
      Emit (VA = 3) r:1 er:1 cpu: 0
      /
      ScalarAgg Count (VA = 2) r:1 er:1 cpu: 400
      /
      Restrict (0)(0)(0)(11)(0) (VA = 1) r:1.303e+006 er:4.202e+007
      /
      IndexScan aqi_weather_date (VA = 0) r:1.303e+006 er:4.202e+007 l:1969 el:63590 p:251 ep:8005
      ============================================================
      Table: aqi_samples scan count 1, logical reads: (regular=1969 apf=0 total=1969), physical reads: (regular=8 apf=243 total=251), apf IOs used=243
      Total actual I/O cost for this command: 10213.
      Total writes for this command: 0
      Execution Time 4.
      Adaptive Server cpu time: 417 ms. Adaptive Server elapsed time: 417 ms.
      Our total IO estimate is 17M+… Our estimated rows (from IndexScan) are off by 30x… which is bad…
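The "off by 30x" remark can be checked directly from the r:/er: and l:/el: counters on the IndexScan node. The sketch below is only an arithmetic illustration; the variable and function names are made up for this example, and the numbers are copied from the output above.

```python
# Counters copied from the IndexScan node of the Lava operator tree above.
actual_rows = 1.303e6       # r:  rows actually returned
estimated_rows = 4.202e7    # er: rows the optimizer expected
actual_lio = 1969           # l:  logical I/Os actually performed
estimated_lio = 63590       # el: logical I/Os the optimizer expected


def overestimate_factor(estimated: float, actual: float) -> float:
    """Return how many times larger the estimate is than the actual value."""
    return estimated / actual


row_factor = overestimate_factor(estimated_rows, actual_rows)
lio_factor = overestimate_factor(estimated_lio, actual_lio)
print(f"rows overestimated {row_factor:.1f}x, LIO overestimated {lio_factor:.1f}x")
```

Both factors come out a little above 32, which is what the slide rounds to 30x.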
  45. Simplifying - Rerun (1)
      use demo_db
      go
      set statement_cache off
      set switch on 3604
      set option show on
      set statistics time, io, resource, plancost on
      set showplan on
      go
      declare @air_temp smallint, @weather varchar(30), @bDate datetime, @eDate datetime
      --select @air_temp=null, @weather=null, @bDate='July 1 2000 00:00:01', @eDate='July 31 2000 23:59:59'
      select @air_temp=80, @weather='sunny',@bDate='July 1 2000 00:00:01', @eDate='July 31 2000 23:59:59'
      select count(*)
      from aqi_samples
      where sample_date between @bDate and @eDate
        and air_temp=isnull(@air_temp,air_temp)
        and weather=isnull(@weather,weather)
      go
      set switch off 3604
      set option show off
      set statistics time, io, resource, plancost off
      set showplan off
      go
      Table has 168M rows with an index on {sample_date, air_temp, weather}… second run with values for the second 2 index keys
  46. Simplifying - Rerun (2)
      The Lop tree:
      ( project
        ( scalar
          ( scan aqi_samples )
        )
      )
      OptBlock1
        The Lop tree: ( scan aqi_samples )
        Generic Tables: ( Gtt1( aqi_samples ) Gti2( aqi_samples_PK ) Gti3( aqi_weather_date_idx ) )
        Generic Columns: …
        Predicates: (
          { aqi_samples.sample_date} >= "Jan 1 1900 12:00AM" tc:{3}
          { aqi_samples.sample_date} <= "Jan 1 1900 12:00AM" tc:{3}
        )
        Transitive Closures: …
      OptBlock0
        The Lop tree: ( pseudoscan )
        Generic Tables: ( Gta0 )
        Generic Columns: …
        Predicates: ( )
        Transitive Closures: …
      The between clause is still the only one passed to the optimizer… which means this fails as a coding style
  47. Simplifying - Rerun (3)
      Total estimated I/O cost for statement 3 (at line 4): 17133977.
      ==================== Lava Operator Tree ====================
      Emit (VA = 3) r:1 er:1 cpu: 0
      /
      ScalarAgg Count (VA = 2) r:1 er:1 cpu: 300
      /
      Restrict (0)(0)(0)(11)(0) (VA = 1) r:0 er:4.202e+007
      /
      IndexScan aqi_weather_date (VA = 0) r:1.303e+006 er:4.202e+007 l:1969 el:63590 p:0 ep:8005
      ============================================================
      Table: aqi_samples scan count 1, logical reads: (regular=1969 apf=0 total=1969), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
      Total actual I/O cost for this command: 3938.
      Total writes for this command: 0
      Execution Time 3.
      Adaptive Server cpu time: 309 ms. Adaptive Server elapsed time: 309 ms.
      We get the same estimates for total IO (17M) and in the bottom node, but the Restrict filters out the non-qualifying rows, so we get 0 rows out of it… and finish ~100ms faster… the faster execution might make a developer think it worked. However, we do the same amount of work (1969 LIOs), so the faster exec is likely just the reduction in ScalarAgg (which it is) due to fewer rows to count.
  48. Simplifying - Correct (1)
      use demo_db
      go
      set statement_cache off
      set switch on 3604
      set option show on
      set statistics time, io, resource, plancost on
      set showplan on
      go
      declare @air_temp smallint, @weather varchar(30), @bDate datetime, @eDate datetime
      --select @air_temp=null, @weather=null, @bDate='July 1 2000 00:00:01', @eDate='July 31 2000 23:59:59'
      select @air_temp=80, @weather='sunny',@bDate='July 1 2000 00:00:01', @eDate='July 31 2000 23:59:59'
      select count(*)
      from aqi_samples
      where sample_date between @bDate and @eDate
        and air_temp=@air_temp
        and weather=@weather
      go
      set switch off 3604
      set option show off
      set statistics time, io, resource, plancost off
      set showplan off
      go
      Table has 168M rows with an index on {sample_date, air_temp, weather}… third run, written the way it should be…
  49. Simplifying - Correct (2)
      The Lop tree:
      ( project
        ( scalar
          ( scan aqi_samples )
        )
      )
      OptBlock1
        The Lop tree: ( scan aqi_samples )
        Generic Tables: ( Gtt1( aqi_samples ) Gti2( aqi_samples_PK ) Gti3( aqi_weather_date_idx ) )
        Generic Columns: …
        Predicates: (
          { aqi_samples.sample_date} >= "Jan 1 1900 12:00AM" tc:{3}
          { aqi_samples.sample_date} <= "Jan 1 1900 12:00AM" tc:{3}
          { aqi_samples.air_temp} = 0 tc:{2}
          { aqi_samples.weather} = '' tc:{1}
        )
        Transitive Closures: …
      OptBlock0
        The Lop tree: ( pseudoscan )
        Generic Tables: ( Gta0 )
        Generic Columns: …
        Predicates: ( )
        Transitive Closures: …
      We now have all 3 predicates… since we still have @vars with unknown values, we substitute a 0 for int/smallint and '' (empty string) for varchar/char
  50. Simplifying - Correct (3)
      Total estimated I/O cost for statement 3 (at line 4): 227844.
      ==================== Lava Operator Tree ====================
      Emit (VA = 2) r:1 er:1 cpu: 0
      /
      ScalarAgg Count (VA = 1) r:1 er:1 cpu: 0
      /
      IndexScan aqi_weather_date (VA = 0) r:0 er:450006 l:306 el:1307 p:0 ep:165
      ============================================================
      Table: aqi_samples scan count 1, logical reads: (regular=306 apf=0 total=306), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
      Total actual I/O cost for this command: 612.
      Total writes for this command: 0
      Execution Time 0.
      Adaptive Server cpu time: 1 ms. Adaptive Server elapsed time: 1 ms.
      Total estimated IO is 228K (vs. 17M) and the estimated rowcount is TONS less… still off, but likely due to data skew and not knowing the values of the @vars… And we only do ~300 LIO vs. 1969… and we finish 300x faster
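To put the three runs side by side, the short sketch below tabulates the figures reported on the three output slides and computes the improvement ratios. The run labels and the helper name are invented for this illustration; only the numbers come from the slides.

```python
# Figures copied from the three runs' output (estimated I/O, logical reads, CPU ms).
runs = {
    "isnull, NULL vars": {"est_io": 17133977, "lio": 1969, "cpu_ms": 417},
    "isnull, set vars": {"est_io": 17133977, "lio": 1969, "cpu_ms": 309},
    "plain predicates": {"est_io": 227844, "lio": 306, "cpu_ms": 1},
}


def improvement(metric: str, before: str, after: str) -> float:
    """Ratio of a metric between two runs (values above 1 mean 'after' is cheaper)."""
    return runs[before][metric] / runs[after][metric]


for metric in ("est_io", "lio", "cpu_ms"):
    ratio = improvement(metric, "isnull, set vars", "plain predicates")
    print(f"{metric}: {ratio:.1f}x lower with plain predicates")
```

The isnull() runs differ from each other only in CPU time, while the plain-predicate run cuts logical reads by a factor of about 6 and CPU time by about 300.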
  51. Index Keys: The Query
      SELECT SUM( T_00 ."MBGBTR" )
      FROM "COEP" T_00
      INNER JOIN "COBK" T_01
        ON T_01 ."KOKRS" = ? AND T_01 ."BELNR" = T_00 ."BELNR"
      WHERE T_00 ."MANDT" = ?
        AND T_00 ."LEDNR" = ?
        AND T_00 ."OBJNR" = ?
        AND ( T_00 ."KSTAR" BETWEEN ? AND ? OR T_00 ."KSTAR" IN ( ? , ? , ? , ? ) )
        AND T_01 ."AWTYP" = ?
      /* R3:ZVDESR121:558 T:COEP M:400 */

      index_name   index_keys                                                      index_description
      COEP~0       MANDT, KOKRS, BELNR, BUZEI                                      nonclustered, unique
      COEP~1       MANDT, LEDNR, OBJNR, GJAHR, WRTTP, VERSN, KSTAR, HRKFT, PERIO,
                   VRGNG, PAROB, USPOB, VBUND, PARGB, BEKNZ, TWAER                 nonclustered
      COEP~Z02     MANDT, KOKRS, BUKRS, OBJNR                                      nonclustered
      COEP_BDLS0   MANDT, LOGSYSO                                                  nonclustered
      COEP~4       MANDT, TIMESTMP, OBJNR                                          nonclustered
      COEP~Z03     MANDT, LEDNR, OBJNR, KSTAR                                      nonclustered
      COEP~Z05     MANDT, OBJNR, KSTAR, GJAHR, PERIO, PAROB1, WRTTP                nonclustered
      COEP~Zt1     MANDT, LEDNR, OBJNR, KSTAR                                      nonclustered
  52. Index Keys - Bad Index Access
      |ROOT:EMIT Operator (VA = 5)
      |
      |   |SCALAR AGGREGATE Operator (VA = 4)
      |   |  Evaluate Ungrouped SUM OR AVERAGE AGGREGATE.
      |   |
      |   |   |NESTED LOOP JOIN Operator (VA = 3) (Join Type: Inner Join)
      |   |   |
      |   |   |   |RESTRICT Operator (VA = 1)(0)(0)(0)(4)(0)
      |   |   |   |
      |   |   |   |   |SCAN Operator (VA = 0)
      |   |   |   |   |  FROM TABLE
      |   |   |   |   |  COEP
      |   |   |   |   |  T_00
      |   |   |   |   |  Index : COEP~4
      |   |   |   |   |  Forward Scan.
      |   |   |   |   |  Positioning by key.
      |   |   |   |   |  Keys are:
      |   |   |   |   |    MANDT ASC
      |   |   |   |   |  Using I/O Size 128 Kbytes for index leaf pages.
      |   |   |   |   |  With LRU Buffer Replacement Strategy for index leaf pages.
      |   |   |   |   |  Using I/O Size 128 Kbytes for data pages.
      |   |   |   |   |  With LRU Buffer Replacement Strategy for data pages.
      |   |   |
      |   |   |   |SCAN Operator (VA = 2)
      |   |   |   |  FROM TABLE
      |   |   |   |  COBK
      |   |   |   |  T_01
      |   |   |   |  Index : COBK~Zt1
      |   |   |   |  Forward Scan.
      |   |   |   |  Positioning at index start.
      |   |   |   |  Index contains all needed columns. Base table will not be read.
      |   |   |   |  Using I/O Size 16 Kbytes for index leaf pages.
      |   |   |   |  With LRU Buffer Replacement Strategy for index leaf pages.
  53. OPTIMIZATION COSTING (PART 1)
      Histograms, Column Densities, IN(), Out of Range Histograms…
  54. Histograms
      The key to cost-based optimization
      - Really is a distribution of data skew
        - If data was evenly distributed, we wouldn't need histograms at all
      - Mostly used for range scans
      - Can be used for equisargs if the data is highly skewed… as most is
      The basics
      - Frequency cells
      - Range cells
      Statistics for column: "type"
      Last update of column statistics: Feb 15 2015 9:18:32:850PM
      Range cell density: 0.0053191489361702
      Total density: 0.4216274332277049
      Range selectivity: default used (0.33)
      In between selectivity: default used (0.25)
      Unique range values: 0.0053191489361702
      Unique total values: 0.2000000000000000
      Average column width: default used (2.00)
      Rows scanned: 188.0000000000000000
      Statistics version: 4
      Histogram for column: "type"
      Column datatype: char(2)
      Requested step count: 20
      Actual step count: 9
      Sampling Percent: 0
      Tuning Factor: 20
      Out of range Histogram Adjustment is DEFAULT.
      Low Domain Hashing.
      Step  Weight      Value
      1     0.00000000  <= "EJ"
      2     0.00531915  <  "P "
      3     0.10638298  =  "P "
      4     0.00000000  <  "S "
      5     0.30319148  =  "S "
      6     0.00000000  <  "U "
      7     0.56382978  =  "U "
      8     0.00000000  <  "V "
      9     0.02127660  =  "V "
      (Slide callouts: the "<" and "<=" steps are range cells; the "=" steps are frequency cells.)
  55. How Many Steps Do We Need
      Fewer = better for resource usage and time to find steps
      More = better for optimization accuracy
      - Ideally, you want most range scans to be in a single cell
        - Multiple cells means aggregating stats… may be accurate, but takes longer
        - For example, for datetime columns, see if the cells cover the common query range (week, month, year, …)
          - Hard to near impossible to control to semantic boundaries
      - Increasing the step count may be better for estimates with high skew
  56. Example Date Histogram
      Histogram for column: "sample_date"
      Column datatype: datetime
      Requested step count: 100
      Actual step count: 103
      Sampling Percent: 0
      Tuning Factor: 20
      Out of range Histogram Adjustment is DEFAULT.
      Sticky step count.
      Sticky hashing.
      Step  Weight      Value
      1     0.00000000  <= "Jan 1 1993 11:59:59:996AM"
      2     0.01017933  <= "Feb 13 1993 12:00:00:000PM"
      3     0.00763450  <= "Mar 18 1993 12:00:00:000PM"
      4     0.01018039  <= "May 1 1993 12:00:00:000PM"
      5     0.00766925  <= "Jun 3 1993 12:00:00:000PM"
      6     0.00777507  <= "Jul 6 1993 12:00:00:000PM"
      7     0.00825124  <= "Aug 8 1993 12:00:00:000PM"
      8     0.00816318  <= "Sep 10 1993 12:00:00:000PM"
      9     0.00796063  <= "Oct 13 1993 12:00:00:000PM"
      10    0.00795876  <= "Nov 15 1993 12:00:00:000PM"
      11    0.00795651  <= "Dec 18 1993 12:00:00:000PM"
      12    0.00788510  <= "Jan 19 1994 12:00:00:000PM"
      13    0.01000150  <= "Feb 28 1994 12:00:00:000PM"
      14    0.01000150  <= "Apr 9 1994 12:00:00:000PM"
      …
      ~1.5 month spread… The problem is that for some months the step boundary falls mid-month, so a range scan for that month would need 3 cells. If concerned, you likely need to double or triple the number of steps
  57. Histograms & Steps
                                 Default no HTF  Defaults        40 steps       100 steps      500 steps
      Default number of steps    20              20              20             20             20
      Histogram tuning factor    1               20              20             20             20
      Requested steps            20              20              40             100            500
      Actual steps               20              195             509            1550           7580
      (Index statistics for combined city,state)
      Range cell density         0.00328457      0.00121356      0.00022722     0.00010744     0.00003560
      Total density              0.00328457      0.00328457      0.00328457     0.00328457     0.00328457
      Unique range values        0.00011547      0.00008212      0.00006416     0.00004897     0.00002615
      Unique total values        0.00011547      0.00011547      0.00011547     0.00011547     0.00011547
      Impact on estimates for Washington DC & San Francisco CA
      DC Cell                    <= Washington   <= Washington   = Washington   = Washington   = Washington
      DC Selectivity             0.05184000      0.02155000      0.02063000     0.02063000     0.02063000
      DC Row Estimates           5184            2155            2063           2063           2063
      SF Cell                    <= Somerset     <= San Jacint   = San Franci   = San Franci   = San Franci
      SF Selectivity             0.04875000      0.00678000      0.00634000     0.00634000     0.00634000
      SF Row Estimates           4875            678             634            634            634
      Statistics from an index on {city,state} for a 100,000 row table with ~6,200 distinct city names
  58. Column Densities
      Single column densities
      - Range cell density / unique range values
        - Tells maximum uniqueness…
        - Min(weight) over the range cells, ignoring cells with weight 0
      - Total density
        - Relative skewness of the data
        - Total density approaching 1.0 is extremely skewed
        - Sum(weights^2)
      - Unique total values
        - Reflects the number of distinct values in the column
        - 1.0 / (select count(distinct column))
      Multiple column densities
      - Automatically created on index
      Statistics for column: "type"
      Last update of column statistics: Feb 15 2015 9:18:32:850PM
      Range cell density: 0.0053191489361702
      Total density: 0.4216274332277049
      Range selectivity: default used (0.33)
      In between selectivity: default used (0.25)
      Unique range values: 0.0053191489361702
      Unique total values: 0.2000000000000000
      Average column width: default used (2.00)
      Rows scanned: 188.0000000000000000
      Statistics version: 4
      Statistics for column group: "sample_date", "air_temp", "weather"
      Last update of column statistics: May 27 2014 11:45:45:016AM
      Range cell density: 0.0000051075008894
      Total density: 0.0000051075008894
      Range selectivity: default used (0.33)
      In between selectivity: default used (0.25)
      Unique range values: 0.0000016297687032
      Unique total values: 0.0000016297687032
      Average column width: 8.5268955638740458
      Rows scanned: 168066824.0000000000000000
      Statistics version: 4
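The density formulas on this slide can be checked against the "type" column histogram shown earlier in the deck. The sketch below is a plain arithmetic check, not how ASE computes its statistics internally; it assumes each non-empty cell of that small histogram holds a single distinct value (the one non-empty range cell weighs 1/188, i.e. one row), and the variable names are invented for the example.

```python
# (operator, weight) pairs copied from the histogram for column "type".
type_histogram = [
    ("<=", 0.00000000),
    ("<", 0.00531915),
    ("=", 0.10638298),
    ("<", 0.00000000),
    ("=", 0.30319148),
    ("<", 0.00000000),
    ("=", 0.56382978),
    ("<", 0.00000000),
    ("=", 0.02127660),
]

# Total density: sum of squared weights, as stated on the slide.
total_density = sum(weight ** 2 for _, weight in type_histogram)

# Range cell density: smallest non-zero weight among the range ("<", "<=") cells.
range_cell_density = min(
    weight for op, weight in type_histogram if op != "=" and weight > 0
)

# Unique total values: 1 / number of distinct values.
# Assumption: every non-empty cell here holds exactly one distinct value.
distinct_values = sum(1 for _, weight in type_histogram if weight > 0)
unique_total_values = 1.0 / distinct_values

print(f"total density       ~ {total_density:.7f}")
print(f"range cell density  = {range_cell_density:.8f}")
print(f"unique total values = {unique_total_values:.2f}")
```

All three results agree with the optdiag output for the column (0.4216274…, 0.0053191…, and 0.2) to within rounding of the printed weights.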
  59. Using Column Densities
      If the column value is known and…
      - …the value falls in a histogram cell… the estimate will be that cell's weight
        - Whether range or frequency cell
      If the column value is not known
      - Optimized with a literal placeholder (0, '', Jan 1 1900, etc.)
      - Selectivity is the total density
  60. Column Selectivity vs. Density (1)
      Statistics for column: "id"
      Last update of column statistics: Feb 16 2015 4:47:23:956PM
      Range cell density: 0.0092592412744228
      Total density: 0.0113194187537711
      Unique range values: 0.0041383133267069
      Unique total values: 0.0055248618784530
      Step  Weight      Value
      1     0.00000000  <  1
      2     0.01093356  =  1
      3     0.01387721  <= 2
      4     0.01261564  <= 3
      5     0.00714886  <= 4
      6     0.00294365  <= 5
      7     0.00462574  <= 6
      8     0.00210261  <= 8
      9     0.00336417  <= 9
      10    0.00336417  <= 11
      11    0.00378469  <= 12
      12    0.00925147  <= 13
      13    0.00210261  <= 15
      14    0.01808242  <= 16
      15    0.00252313  <= 17
      16    0.00252313  <= 18
      17    0.00168209  <= 19
      18    0.00000000  <  21
      19    0.00630782  =  21
      20    0.00252313  <= 22
      21    0.01429773  <= 23
      22    0.03868797  <= 24
      23    0.00378469  <= 25

      Value unknown at optimization time:
        1> declare @id int
        2> select @id=8
        3> select * from syscolumns where id=@id
        Estimating selectivity of index 'syscolumns.csyscolumns', indid 2
        id = 0
        Estimated selectivity for id, selectivity = 0.01131942,
        scan selectivity 0.01131942, filter selectivity 0.01131942
        26.91758 rows, 1 pages

      Range cell, weight < range cell density: selectivity = weight
        1> select * from syscolumns where id=8
        Estimating selectivity of index 'syscolumns.csyscolumns', indid 2
        id = 8
        Estimated selectivity for id, selectivity = 0.002102607,
        scan selectivity 0.002102607, filter selectivity 0.002102607
        5 rows, 1 pages
  61. Column Selectivity vs. Density (2)
      Statistics for column: "id"
      Last update of column statistics: Feb 16 2015 4:47:23:956PM
      Range cell density: 0.0092592412744228
      Total density: 0.0113194187537711
      Unique range values: 0.0041383133267069
      Unique total values: 0.0055248618784530
      Step  Weight      Value
      1     0.00000000  <  1
      2     0.01093356  =  1
      3     0.01387721  <= 2
      4     0.01261564  <= 3
      5     0.00714886  <= 4
      6     0.00294365  <= 5
      7     0.00462574  <= 6
      8     0.00210261  <= 8
      9     0.00336417  <= 9
      10    0.00336417  <= 11
      11    0.00378469  <= 12
      12    0.00925147  <= 13
      13    0.00210261  <= 15
      14    0.01808242  <= 16
      15    0.00252313  <= 17
      16    0.00252313  <= 18
      17    0.00168209  <= 19
      18    0.00000000  <  21
      19    0.00630782  =  21
      20    0.00252313  <= 22
      21    0.01429773  <= 23
      22    0.03868797  <= 24
      23    0.00378469  <= 25

      Frequency cell: selectivity = weight
        1> select * from syscolumns where id=21
        Estimating selectivity of index 'syscolumns.csyscolumns', indid 2
        id = 21
        Estimated selectivity for id, selectivity = 0.006307822,
        scan selectivity 0.006307822, filter selectivity 0.006307822
        15 rows, 1 pages

      Range cell, weight > range cell density: selectivity = weight
        1> select * from syscolumns where id=24
        Estimating selectivity of index 'syscolumns.csyscolumns', indid 2
        id = 24
        Estimated selectivity for id, selectivity = 0.03868797,
        scan selectivity 0.03868797, filter selectivity 0.03868797
        92 rows, 1 pages
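The equality cases on the last two slides follow one lookup rule: an unknown value gets the total density, and a known value gets the weight of the cell it falls into. The sketch below reproduces that rule for the "id" histogram. It is a simplified model of what the trace output shows, not ASE's actual costing code; the function name is hypothetical, and the table row count of 2378 is an assumption back-computed from "26.91758 rows" at selectivity 0.01131942.

```python
# (operator, boundary, weight) copied from the histogram for column "id".
ID_HISTOGRAM = [
    ("<", 1, 0.00000000), ("=", 1, 0.01093356), ("<=", 2, 0.01387721),
    ("<=", 3, 0.01261564), ("<=", 4, 0.00714886), ("<=", 5, 0.00294365),
    ("<=", 6, 0.00462574), ("<=", 8, 0.00210261), ("<=", 9, 0.00336417),
    ("<=", 11, 0.00336417), ("<=", 12, 0.00378469), ("<=", 13, 0.00925147),
    ("<=", 15, 0.00210261), ("<=", 16, 0.01808242), ("<=", 17, 0.00252313),
    ("<=", 18, 0.00252313), ("<=", 19, 0.00168209), ("<", 21, 0.00000000),
    ("=", 21, 0.00630782), ("<=", 22, 0.00252313), ("<=", 23, 0.01429773),
    ("<=", 24, 0.03868797), ("<=", 25, 0.00378469),
]
TOTAL_DENSITY = 0.0113194187537711
TABLE_ROWS = 2378  # assumption: derived from 26.91758 rows / 0.01131942


def equality_selectivity(value=None):
    """Selectivity for 'id = value'; value=None models an unknown @variable."""
    if value is None:
        return TOTAL_DENSITY  # unknown value: fall back to total density
    for op, bound, weight in ID_HISTOGRAM:  # steps are in ascending order
        if (op == "=" and value == bound) \
                or (op == "<=" and value <= bound) \
                or (op == "<" and value < bound):
            return weight  # known value: use the weight of its cell
    return 0.0  # beyond the last step; out-of-range handling is not modeled


for v in (None, 8, 21, 24):
    sel = equality_selectivity(v)
    print(f"id={v}: selectivity {sel:.8f}, ~{sel * TABLE_ROWS:.1f} rows")
```

With these inputs the row estimates land on roughly 27, 5, 15 and 92 rows, matching the four traces.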
  62. Column Selectivity vs. Density (3)
      Statistics for column: "id"
      Last update of column statistics: Feb 16 2015 4:47:23:956PM
      Range cell density: 0.0092592412744228
      Total density: 0.0113194187537711
      Unique range values: 0.0041383133267069
      Unique total values: 0.0055248618784530
      Step  Weight      Value
      1     0.00000000  <  1
      2     0.01093356  =  1
      3     0.01387721  <= 2
      4     0.01261564  <= 3
      5     0.00714886  <= 4
      6     0.00294365  <= 5
      7     0.00462574  <= 6
      8     0.00210261  <= 8
      9     0.00336417  <= 9
      10    0.00336417  <= 11
      11    0.00378469  <= 12
      12    0.00925147  <= 13
      13    0.00210261  <= 15
      14    0.01808242  <= 16
      15    0.00252313  <= 17
      16    0.00252313  <= 18
      17    0.00168209  <= 19
      18    0.00000000  <  21
      19    0.00630782  =  21
      20    0.00252313  <= 22
      21    0.01429773  <= 23
      22    0.03868797  <= 24
      23    0.00378469  <= 25

      Range query
        1> select * from syscolumns where id between 5 and 10
        Estimating selectivity of index 'syscolumns.csyscolumns', indid 2
        id >= 5
        id <= 10
        Estimated selectivity for id, selectivity = 0.01471826,
        scan selectivity 0.01471826, filter selectivity 0.01471826
        35.00002 rows, 1 pages

      Note that the sum of steps 6 through 10 is 0.01640034. However, since we are only using a portion of step 10 and the distribution is 2 values per step, we use the formula:
        Sum(step6..step9) + step10/2.0 = 0.01471826
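The range formula in the note can be verified with a few lines of arithmetic. This is only a check of the slide's numbers under its own assumption (step 10 spans two values, of which the range covers one); the 2378-row table size is again an assumption derived from the earlier trace output.

```python
# Weights of steps 6..10 copied from the histogram for column "id".
step_weights = {
    6: 0.00294365,   # <= 5
    7: 0.00462574,   # <= 6
    8: 0.00210261,   # <= 8
    9: 0.00336417,   # <= 9
    10: 0.00336417,  # <= 11 (covers ids 10 and 11; only id 10 is in range)
}
TABLE_ROWS = 2378  # assumption: derived from 26.91758 rows / 0.01131942

sum_all_steps = sum(step_weights.values())

# Steps 6..9 are fully inside "between 5 and 10"; step 10 is half covered.
fully_covered = sum(step_weights[s] for s in range(6, 10))
partially_covered = step_weights[10] / 2.0
range_selectivity = fully_covered + partially_covered

print(f"sum of steps 6..10 : {sum_all_steps:.8f}")
print(f"range selectivity  : {range_selectivity:.8f}")
print(f"estimated rows     : {range_selectivity * TABLE_ROWS:.2f}")
```

The selectivity agrees with the traced 0.01471826 and the row estimate with the traced 35 rows.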
  63. Debugging Selectivity
      You've probably noticed…
      - You need to have 'set option show' and optdiag output
      Find the index you thought it should have used
      - Look at the selectivity for each predicate
      - Check out the optdiag output to see if it was a really skewed value
      But sometimes you just have to look at the query
      - …your expectation may be due to knowledge you infer
        - But the optimizer doesn't know
        - …such as the relationship between two columns
      - …and sometimes the indexing doesn't support the query
  64. Unbounded Date Range
      create table jobs (
        job_number    numeric(30,0),
        …
        job_category  varchar(20),   -- 10 distinct values
        job_priority  tinyint,       -- 100 distinct values
        job_begindate datetime,
        job_enddate   datetime,
        job_status    char(1),       -- 6 distinct values
        …,
        primary key (job_number)
      )
      Consider the above table for each of the scenarios on the following slides. Note the key columns of job dates and those that have the number of distinct values listed.
  65. Scenario #1
      Consider the index:
        create index job_begin_idx on jobs (job_begindate)
      …and the typical query
        select * from jobs
        where job_begindate >= $begin_date
          and job_enddate <= $end_date
      Why is LIO sometimes high and sometimes low?
  66. Scenario #1: The Problems
      Because the index only has the begin date
      - On very recent dates, it can go near the end of the index and scan to the end…
      - But on dates in the past, even a few months ago
        - It positions to the $begin_date
        - Scans to the end of the index
        - For each leaf row, it does a LIO to the data page to compare $end_date
        - Some quick math… assume 50 rows per index leaf page
          - 100 leaf pages = 5000 data page LIOs ≈ 1 sec CPU (@5 LIO/ms)
          - 1000 leaf pages = 50000 data page LIOs ≈ 10 sec CPU
          - 10000 leaf pages = 500000 data page LIOs ≈ 100 sec CPU
          - 100000 leaf pages = 5000000 data page LIOs ≈ 1000 sec CPU (16m40s)
      Soooo…
      - For dates not very recent, we get an index leaf scan to the end of the index
      - Plus a data page lookup for every leaf row
      [Figure: timeline 2010 to 2014 showing the scan ranges for begin dates > 01Mar2011, > 01Nov2012 and > 01Jan2014]
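The quick math above is easy to reproduce. The sketch below uses the slide's own assumptions (50 rows per index leaf page, one data page LIO per leaf row, about 5 LIOs per millisecond of CPU); the constant and function names are invented for this illustration.

```python
ROWS_PER_LEAF_PAGE = 50   # slide assumption
LIO_PER_MS = 5            # slide assumption: ~5 logical I/Os per ms of CPU


def data_page_lios(leaf_pages: int) -> int:
    """One data page lookup for every row on every scanned leaf page."""
    return leaf_pages * ROWS_PER_LEAF_PAGE


def cpu_seconds(leaf_pages: int) -> float:
    """Approximate CPU time spent on the data page lookups."""
    return data_page_lios(leaf_pages) / LIO_PER_MS / 1000.0


for pages in (100, 1_000, 10_000, 100_000):
    secs = cpu_seconds(pages)
    minutes, seconds = divmod(int(secs), 60)
    print(f"{pages:>7} leaf pages -> {data_page_lios(pages):>8} LIOs "
          f"~ {secs:>6.0f} s ({minutes}m{seconds:02d}s)")
```

The cost grows linearly with how far back $begin_date positions in the index, which is why the same query is cheap for recent dates and expensive for older ones.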
  67. Scenario #1: The Solutions
      Solution #1: Add job_enddate to the index
        create index job_date_idx
        on jobs (job_begindate, job_enddate)
      Solution #2: Add an implied boundary to the date query
        select * from jobs
        where job_begindate between $begin_date and $end_date
          and job_enddate between $begin_date and $end_date
      Why both???
      - Wouldn't fixing the index be enough? Why bother the coders and try to teach them better coding style???
  68. Scenario #2
      Consider the index:
        create index job_begin_idx
        on jobs (job_category, job_begindate)
      …and the typical query
        select * from jobs
        where job_begindate >= $begin_date
          and job_enddate <= $end_date
      Why does it sometimes use the index and other times not?
  69. Scenario #2: The Problem
      The problem is we are missing a predicate on the leading index column
      - A similar situation occurs when we have intermediate index keys for which we have no valid SARGs
      To handle this, ASE does a bit of a trick
      - It looks at the cardinality of the unknown keys
        - If low, it considers an ORScan for each value
        - If high, it considers an index leaf scan
      - Then it considers the selectivity of the known predicates
      Sooo… as a result
      - If we pick a date that is fairly recent (the index is more selective), then we will likely do an ORScan and then an index leaf scan from the begin date until the next job_category
      - If we pick a date that isn't very selective, then the ORScan becomes too expensive due to the leaf scan per ORScan value, and we compare the multiple index leaf scans vs. a single table scan
  70. Scenario #2: The Solution
      Solution: Add an implied boundary to the date query
        select * from jobs
        where job_begindate between $begin_date and $end_date
          and job_enddate between $begin_date and $end_date
      …and this is why we fix both the index and the query
      - In the above case, considering the index in scenario #2, as long as the range is fairly selective, we likely will do the ORScan
  71. OrScan in Lava Tree
      ==================== Lava Operator Tree ====================
      Emit (VA = 4) r:5 er:1 cpu: 0
      /
      NestLoopJoin Inner Join (VA = 3) r:5 er:1 l:0 el:8 p:0 ep:8
      /
        left child:   OrScan Max Rows: 2 (VA = 0) r:2 er:-1 l:0 el:-1 p:0 ep:-1
        right child:  Restrict (0)(0)(0)(4)(0) (VA = 2) r:5 er:1
                      /
                      IndexScan TBTCO~7 (VA = 1) r:9 er:1 l:28 el:8 p:0 ep:8
      ============================================================
  72. OrScan in Show Plan
      |ROOT:EMIT Operator (VA = 6)
      |
      |   |NESTED LOOP JOIN Operator (VA = 5) (Join Type: Inner Join)
      |   |
      |   |   |NESTED LOOP JOIN Operator (VA = 3) (Join Type: Inner Join)
      |   |   |
      |   |   |   |SCAN Operator (VA = 0)
      |   |   |   |  FROM OR List
      |   |   |   |  OR List has up to 12 rows of OR/IN values.
      |   |   |
      |   |   |   |RESTRICT Operator (VA = 2)(0)(0)(0)(13)(0)
      |   |   |   |
      |   |   |   |   |SCAN Operator (VA = 1)
      |   |   |   |   |  FROM TABLE
      |   |   |   |   |  SAPSR3.MSEG
      |   |   |   |   |  T_01
      |   |   |   |   |  Index : MSEG~1
      |   |   |   |   |  Forward Scan.
      |   |   |   |   |  Positioning by key.
      |   |   |   |   |  Keys are:
      |   |   |   |   |    MANDT ASC
      |   |   |   |   |    MATNR ASC
      |   |   |   |   |  Using I/O Size 128 Kbytes for index leaf pages.
      |   |   |   |   |  With LRU Buffer Replacement Strategy for index leaf pages.
      |   |   |   |   |  Using I/O Size 128 Kbytes for data pages.
      |   |   |   |   |  With LRU Buffer Replacement Strategy for data pages.
      |   |
  73. Scenario #3
      Consider the following index
        create index job_begin_idx
        on jobs (job_category, job_status, job_begindate, job_enddate)
      …and the typical query
        select * from jobs
        where job_category = 'night batch'
          and job_status in ('U', 'A', 'E')
          and job_begindate >= $begin_date
          and job_enddate <= $end_date
      Why might we only position by job_category, job_status?
  74. Scenario #3: The Problem
      The problem is we don't have multi-column density stats
      - And creating them might be a bit of a nightmare
      As a result, ASE does the following
      - It weighs each selectivity individually:
        - 'night batch' + 'U' + $begin_date
        - 'night batch' + 'A' + $begin_date
        - 'night batch' + 'E' + $begin_date
      - Then aggregates
      Here's the problem… assume we only have 20 steps
      - Let's pick a begin date 3 or more steps from the end
        - …and assume end_date is in the same step
        - …but remember, we have an unbounded range on both… so
          - …effectively it will think it will be 3 steps for each $begin_date… not 1
          - …and it will think $end_date is atrocious, as it is 17 steps' worth (from the beginning)
      - If we aggregate, then we will have 3x… so 9 steps… 40% of the table is 8 steps… we might table scan or look for a different index
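The step counting on this slide can be spelled out as arithmetic. The sketch below is a loose reading of the slide, not a model of ASE's costing: it treats every histogram step as an equal share of the table, takes the 40% figure as the slide's rule-of-thumb point at which a table scan starts to look attractive, and uses invented names throughout.

```python
TOTAL_STEPS = 20            # slide assumption: histogram has only 20 steps
STEPS_FROM_END = 3          # $begin_date sits 3 steps from the end of the histogram
IN_LIST_VALUES = 3          # job_status in ('U', 'A', 'E')
TABLE_SCAN_FRACTION = 0.40  # slide's "40% of table" figure (treated as a rule of thumb)

# Unbounded ">= $begin_date" covers every step from the begin date to the end.
steps_per_in_value = STEPS_FROM_END

# Unbounded "<= $end_date" covers every step from the start of the histogram.
steps_for_end_date = TOTAL_STEPS - STEPS_FROM_END

# Each IN() value is costed separately and the results are aggregated.
aggregated_steps = steps_per_in_value * IN_LIST_VALUES
aggregated_fraction = aggregated_steps / TOTAL_STEPS
threshold_steps = TABLE_SCAN_FRACTION * TOTAL_STEPS

print(f"begin_date range : {steps_per_in_value} steps per IN value")
print(f"end_date range   : {steps_for_end_date} steps")
print(f"aggregated       : {aggregated_steps} steps = {aggregated_fraction:.0%} of the table")
print(f"threshold        : {threshold_steps:.0f} steps = {TABLE_SCAN_FRACTION:.0%}")
```

Under these assumptions the aggregate (9 steps, 45%) is above the 8-step mark, which is the slide's point: a bounded date range would have counted 1 step per IN() value instead of 3.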
  75. Scenario #3: The Solution
      Update column stats for distinctive columns
      - Use 100 steps or a similarly large value
        - update statistics jobs (job_begindate) using 100 values
      - The result is that each step has a much lower selectivity value
      Add the bounded range into the query
      - This means we aggregate only across the exact range of dates we want… which reduces the impact of the IN() clause
  76. ASE's OR Strategy
      If the query contains an OR clause on different columns
      - ASE can (and will) use two different indexes
        - One index for the predicates on one side of the OR
        - …and a different index for the predicates on the other side of the OR
        - This would be similar to splitting the query in two with a union
      - However, if one side of the OR drives a tablescan, ASE will tablescan
        - Remember, we saw this with the id=8 OR 1=2 example
      Common issues
      - One side of the OR is not indexed well… drives a tablescan
      - Developer attempted to use 1 index to cover both
  77. An Example of Indexing vs. OR
      Consider the following query:
        SELECT "VBELV" ,"POSNV" ,"VBELN" ,"POSNN" ,"VBTYP_N" ,"RFMNG" ,"MEINS" ,"VBTYP_V"
              ,"ERDAT" ,"ERZET" ,"AEDAT" ,"STUFE" ,"VRKME"
        FROM "VBFA"
        WHERE "MANDT" = ? AND ( "ERDAT" = ? OR "AEDAT" = ? )
        /* R3:SAPLZFEDWS1:767 T:VBFA M:430 */
      Now, consider the indexes:
        index_name   index_keys
        -----------  --------------------------------------------
        VBFA~0       MANDT, VBELV, POSNV, VBELN, POSNN, VBTYP_N
        VBFA~Z01     MANDT, VBELN
        VBFA~Z02     ERDAT, BWART
        VBFA~Z04     MANDT, ERDAT, AEDAT
        VBFA~Z99     MANDT, LOGSYS
      The issue is that the query seems to drive a tablescan…
      - …it seems obvious that VBFA~Z04 should be used…
      - …or is it???
  78. Let's look a little closer
      Looking at systabstats
        ColumnName  ColumnID  Row_Count   RequestedSteps  ActualSteps  ApproxDistincts  DistinctsPerStep
        ----------  --------  ----------  --------------  -----------  ---------------  ----------------
        AEDAT       22        1255008198  50              50           1625             33.0
        BWART       17        1255008198  50              29           64               2.0
        ERDAT       14        1255008198  50              245          4674             19.0
        LOGSYS      38        1255008198  50              2            1                1.0
        MANDT       1         1255008198  50              2            1                1.0
        POSNN       5         1255008198  50              573          93300            163.0
        POSNV       3         1255008198  50              231          12649            55.0
        VBELN       4         1255008198  50              38           85330918         2245550.0
        VBELV       2         1255008198  50              38           31223216         821664.0
        VBTYP_N     6         1255008198  50              31           25               1.0
      Hmmmm… not very good query criteria
      - MANDT is useless as always
      - AEDAT and ERDAT are not very distinct… 1625 and 4674 values respectively
        - Which means each distinct value will return ~250K to ~1M rows
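The "~250K to ~1M" range follows from dividing the row count by the number of distinct values, which is the average you would get if the rows were evenly spread (the optdiag output on the next slides shows they are not). A small sketch of that division, with names made up for the example:

```python
ROW_COUNT = 1_255_008_198  # Row_Count from the table above

# ApproxDistincts per column, copied from the table above.
approx_distincts = {"AEDAT": 1625, "ERDAT": 4674}


def avg_rows_per_value(column: str) -> float:
    """Average rows per distinct value, assuming an even distribution."""
    return ROW_COUNT / approx_distincts[column]


for column in approx_distincts:
    print(f"{column}: ~{avg_rows_per_value(column):,.0f} rows per distinct value")
```

ERDAT averages a bit under 270K rows per value and AEDAT a bit over 770K, which is the range the slide quotes.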
  79. AEDAT Stats… from optdiag
      Statistics for column: AEDAT
      Last update of column statistics: Jan 10 2014 7:21:35:026PM
      Range cell density: 0.0000017268359901
      Total density: 0.9986527756879466
      …
      Unique range values: 0.0000004149259654
      Unique total values: 0.0006153846153846
      …
      Histogram for column: AEDAT
      Column datatype: varchar(24)
      …
      Statistics step count sticky
      Statistics hashing sticky
      Statistics hashing low domain used
      Step  Weight      Value (only 255 bytes used)
      1     0.00000000  <  '00000000'
      2     0.99932617  =  '00000000'
      3     0.00001720  <= '20080724'
      4     0.00001430  <= '20080826'
      5     0.00001409  <= '20081030'
      6     0.00001545  <= '20081113'
      7     0.00001415  <= '20081216'
      8     0.00001419  <= '20090310'
      9     0.00001468  <= '20090331'
      10    0.00002772  <= '20090615'
      …
      OUCH!!!!!
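The reason for the "OUCH" is visible in two numbers: the weight of the '00000000' frequency cell and the total density. The sketch below shows how the single dominant value accounts for nearly the whole total density and nearly the whole table. It is an arithmetic illustration only; the row count is taken from the systabstats slide, and the variable names are invented.

```python
ROW_COUNT = 1_255_008_198            # Row_Count for VBFA from the systabstats slide
EMPTY_DATE_WEIGHT = 0.99932617       # weight of the = '00000000' frequency cell
TOTAL_DENSITY = 0.9986527756879466   # total density reported by optdiag

# Total density is a sum of squared weights; one huge cell dominates the sum.
dominant_term = EMPTY_DATE_WEIGHT ** 2
rows_with_empty_date = EMPTY_DATE_WEIGHT * ROW_COUNT
rows_with_real_date = ROW_COUNT - rows_with_empty_date

print(f"weight^2 of '00000000' : {dominant_term:.8f} (total density {TOTAL_DENSITY:.8f})")
print(f"rows with '00000000'   : ~{rows_with_empty_date:,.0f}")
print(f"rows with a real date  : ~{rows_with_real_date:,.0f}")
```

Because an unknown AEDAT value is costed at the total density, a parameterized AEDAT = ? predicate is estimated as if it matched almost every row.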
  80. ERDAT Stats… from optdiag
      Statistics for column: ERDAT
      Last update of column statistics: Jan 10 2014 7:21:35:026PM
      Range cell density: 0.0005738551548958
      Total density: 0.0006834762135235
      …
      Unique range values: 0.0001879716956084
      Unique total values: 0.0002139495079161
      …
      Requested step count: 50
      Actual step count: 245
      …
      Statistics step count sticky
      Statistics hashing sticky
      Statistics hashing low domain used
      Step  Weight      Value (only 255 bytes used)
      1     0.00000000  <  '00000000'
      2     0.00004201  =  '00000000'
      3     0.01879592  <= '20030624'
      4     0.01879998  <= '20040316'
      5     0.01888011  <= '20041015'
      6     0.01887963  <= '20050502'
      7     0.01878721  <= '20051031'
      8     0.01888958  <= '20060420'
      9     0.01879898  <= '20061014'
      10    0.01882141  <= '20070417'
      BETTER!!!!
  81. To understand, let’s simplify things
Assume we have a table of customer transactions...
- with 1 billion rows
- PKEY is transaction_id (not that it matters...)
- Has an index (IDX~1) on {purchase_date, ship_date}
  - Both purchase_date and ship_date are not very distinct
  - Think about it: only 365 values in a year, ~3600 in 10 years... not very distinctive out of a 1 billion row table
Now consider the query:
  select * from cust_transactions
  where purchase_date='Jan 1 2014' OR ship_date='Jan 1 2014'
See the problem?... Think about it...
  82. The Problem
The problem query:
  select * from cust_transactions
  where purchase_date='Jan 1 2014' OR ship_date='Jan 1 2014'
The problems...
- We can use the index IDX~1 for the purchase_date case... depending of course on the selectivity of the data provided
- ...but the OR clause means that we also need to look for the ship date
  - individually and not in combination with purchase date. Remember, a composite index works on COMBINING columns
- ...using IDX~1 for that is sort of useless, as we can’t use the leading purchase_date column because the OR clause is disjunctive... the query really could be expressed as:
  select * from cust_transactions where purchase_date='Jan 1 2014'
  union
  select * from cust_transactions where ship_date='Jan 1 2014'
  83. Remember the special OR strategy?
When an OR condition exists:
- ASE can use multiple indexes: a different index for each side of the OR
- This ‘special OR strategy’ is also known as ‘index union’
When looking at the query & index:
- ASE says the index is probably okay for purchase_date...
- ...but says it will need to tablescan for ship_date
- Why the tablescan?
  - Remember, this is a DOL table and the index keys are sorted by purchase_date, then ship_date
  - ...so we would have to scan ALL the leaf pages to find that ship_date
  - ...only to find out that 1/4000th of the table qualifies
  - ...and those rows are scattered around due to purchase date, so the LIO exceeds the cost of a tablescan and we do the tablescan
  - ...especially if we have an OR value of ‘00000000’, which is 99% of the table.
  84. What about IN()?
If you were watching closely... you already know the answer
If you think about it...
- ...an IN() is like an OR list...
- ...in fact ASE flattens it into one
So, all we do is:
- Cost each one individually
- Aggregate them into a final cost
  85. A Simple IN() example
1> select * from sysobjects where id in (2,4,6,8,10,12,14,16)

The Lop tree:
( project
    ( scan sysobjects )
)

OptBlock0
    The Lop tree:
    ( scan sysobjects )
    Generic Tables: ( Gtt0( sysobjects ) Gti1( csysobjects ) )
    Generic Columns: …
    Predicates: (
        ( { sysobjects.id } = 16 tc:{25} OR { sysobjects.id } = 14 tc:{25} OR
          { sysobjects.id } = 12 tc:{25} OR { sysobjects.id } = 10 tc:{25} OR
          { sysobjects.id } = 8 tc:{25} OR { sysobjects.id } = 6 tc:{25} OR
          { sysobjects.id } = 4 tc:{25} OR { sysobjects.id } = 2 tc:{25} ) tc:{25} )
    Transitive Closures: …)

The IN() clause is expanded to ORs... note that all have the same transitive closure id (tc:{25})
  86. Individual OR term selectivity
BEGIN GENERAL OR ANALYSIS OF all types of indices FOR sysobjects

ANALYZING OR TERM 1
Estimating selectivity of index 'sysobjects.csysobjects', indid 3
    id = 16
    Estimated selectivity for id,
        selectivity = 0.1,
    scan selectivity 0.02272727, filter selectivity 0.02272727
    restricted selectivity 0.1
    unique index with all keys, one row scans
    1 rows, 1 pages
…
ANALYZING OR TERM 2
Estimating selectivity of index 'sysobjects.csysobjects', indid 3
    id = 14
…
ANALYZING OR TERM 3
Estimating selectivity of index 'sysobjects.csysobjects', indid 3
    id = 12
…
ANALYZING OR TERM 4
Estimating selectivity of index 'sysobjects.csysobjects', indid 3
    id = 10
…
==================== Lava Operator Tree ====================

                 Emit
                 (VA = 3)
                 r:8 er:5
                 cpu: 0

            /
           NestLoopJoin
           Inner Join
           (VA = 2)
           r:8 er:5
           l:0 el:5
           p:0 ep:4

 /                        \
OrScan                    IndexScan
Max Rows: 8               csysobjects
(VA = 0)                  (VA = 1)
r:8 er:-1                 r:8 er:5
l:0 el:-1                 l:12 el:5
p:0 ep:-1                 p:0 ep:4

============================================================
  87. Aggregating Selectivity for OR
END GENERAL OR ANALYSIS FOR all types of indices - INDICES FOUND FOR ALL OR TERMS

Scan on table sysobjects skipped because table scan less than concurrency threshold

Estimating selectivity of index 'sysobjects.csysobjects', indid 3
    Estimated selectivity for id,
        selectivity = 0.8,
    scan selectivity 0.8, filter selectivity 0.8
    restricted selectivity 1
    special or terms 8
    35.2 rows, 1 pages
    Data Row Cluster Ratio 0.99999
    Index Page Cluster Ratio 1
    Data Page Cluster Ratio 1
    using no index prefetch (size 4K I/O)
    in index cache 'default data cache' (cacheid 0) with LRU replacement
    using no table prefetch (size 4K I/O)
    in data cache 'default data cache' (cacheid 0) with LRU replacement
    Data Page LIO for 'csysobjects' on table 'sysobjects' = 1.600336

Whoa!!! The prediction is 80% of the table, which had 44 rows... thankfully in *this* case, it still was only 1 page
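The 0.8 selectivity and 35.2-row estimate in this trace are consistent with simply adding up the eight per-term selectivities of 0.1. The Python sketch below reproduces that arithmetic; treating the aggregation as a capped sum is an assumption inferred from these numbers, not a statement of ASE's actual formula, and `aggregate_or_selectivity` is a hypothetical name.

```python
# Naive OR aggregation: add the per-term selectivities, capped at 1.0.
# The capped-sum rule is an assumption inferred from the trace values.
TABLE_ROWS = 44  # sysobjects row count quoted on the slide

def aggregate_or_selectivity(term_selectivities):
    """Sum per-term selectivities without checking for overlap between terms."""
    return min(1.0, sum(term_selectivities))

terms = [0.1] * 8                      # eight IN() values, 0.1 each in the trace
selectivity = aggregate_or_selectivity(terms)
estimated_rows = selectivity * TABLE_ROWS

print(f"aggregated selectivity: {selectivity:.2f}")
print(f"estimated rows: {estimated_rows:.1f} (at most 8 rows can actually qualify)")
```

Because `id` is covered by a unique index, no more than 8 rows can match, so the aggregated estimate overshoots by more than a factor of four.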
  88. Aggregating IN()
Aggregation is unintelligent
- It doesn’t check how many values are from the same range cell
The result is that the aggregated value is often over-inflated
TIP: Make sure you have more histogram steps than the largest IN() list
- For SAP systems, this will be 100
  89. Out of range histograms
Originally added to ASE 15.0 for monotonic sequences
- For example, sequential numbers, datetime (e.g. current datetime)
- Often, if stats were only updated every week, a large portion of the new data values were higher than the histogram range
  - As a result, the optimizer would estimate 0 values and select an index based on that reduced cost estimate, whereas in reality there could be millions of rows
- With out of range histograms, several factors are used to estimate how many data values exist beyond the last histogram cell, and the cost is adjusted higher
Usually in such cases, an out of range histogram adjustment is a sign of stale stats
- ...but for high insert/append use cases, you may be updating or re-reading a row that was just inserted, e.g. reporting on today’s sales
  90. Low Cardinality Examples
Histogram tuning may be a bad thing for short duration “STATUS” columns
- Most of the values in the histogram will be “C” for complete
- Unless there is a “permanent” status higher than “U” for unprocessed, it is unlikely that update stats will catch a “U” value
  - During migration, the system is likely quiesced with nothing incomplete
  - Post-migration, if stats are run during a quiet period, likely no incomplete values exist
- The out of range histogram adjustment throws off the optimizer... 0 would have been a better estimate
  - Running update stats on weekends or nights when the system is quiet simply causes the same problem, as jobs are likely all complete
- Spotted with ‘set option show on’
May also happen with very low cardinality “TYPE” columns
- Or really any very low cardinality column, when the value in the predicate occurs extremely rarely and is higher than the more common value(s)
  91. Example Histogram
Histogram for column:                   "ENTRY_TYPE"
…
Out of range Histogram Adjustment is DEFAULT.
Sticky step count.
Sticky partial_hashing.
     Step     Weight          Value
        1     0.00000000      <   "C"
        2     1.00000000      =   "C"

Histogram for column:                   "STATUS"
…
Out of range Histogram Adjustment is DEFAULT.
Low Domain Hashing.
Sticky step count.
Sticky partial_hashing.
     Step     Weight          Value
        1     0.00000000      <   "C"
        2     0.98791176      =   "C"
        3     0.00084806      <   "T"
        4     0.01124019      =   "T"
  92. Example ‘set option show’ output
Estimating selectivity of index 'SAPSR3.ESH_EX_CPOINTER.ESH_EX_CPOINTER~ST', indid 3
    STATUS = 'U'
    ENTRY_TYPE = 'P'
    Estimated selectivity for ENTRY_TYPE,
        Out of range histogram adjustment,
        selectivity = 0.3333333,
    Estimated selectivity for STATUS,
        Out of range histogram adjustment,
        selectivity = 0.2,
    scan selectivity 0.2, filter selectivity 0.2
    60412.2 rows, 34.2 pages
    Data Row Cluster Ratio 0.9924527
    Index Page Cluster Ratio 0.218543
    Data Page Cluster Ratio 0.02202437
    using index prefetch (size 128K I/O)
    Large IO selected: The number of leaf pages qualified is > MIN_PREFETCH pages
    in index cache 'default data cache' (cacheid 0) with LRU replacement
  93. To prevent out of range histograms
Turn off for update statistics
- Turn off for columns, not for a whole table or a specific index
- Syntax
  update statistics table_name
    [[partition data_partition_name]
    [ (column1, column2, …) | (column1), (column2), …] |
    index_name [partition index_partition_name]]
    [using step values | [out_of_range [on | off| default]]]
    [with consumers = consumers][, sampling=N percent]
    [, no_hashing | partial_hashing | hashing]
    [, max_resource_granularity = N [percent]]
    [, histogram_tuning_factor = int ]
    [, print_progress = int]
- Example
  update statistics SAPSR3.ESH_EX_CPOINTER (ENTRY_TYPE) out_of_range off
  update statistics SAPSR3.ESH_EX_CPOINTER (STATUS) out_of_range off
The out of range histogram setting is “sticky”
- Just like the number of steps, setting this once causes it to be used as the default for all future update statistics commands that do not specify a value.
  94. OPTIMIZATION COSTING (PART 2)
Multi-Column Densities & Joins...
  95. Multi-Column Densities
An underused secret weapon
- Useful any time multiple predicates exist
- Think of it this way:
  - Two sample predicates
    - Col_A = '5'
    - Col_B = 'GREEN'
  - Assume both have a selectivity of 0.1
    - The combination could still be 0.1 if all Col_A='5' and Col_B='GREEN' rows are the same rows
    - The combination could be 0.01 (or less) if only a single row had the combination
When does it matter?
- Joins, distinct, subquery (caching), sort estimations, ...
- Anyplace where the estimated number of rows returned could change the query plan (and tip costs towards an alternative ‘bad’ plan)
- Especially since we don’t have composite column histograms
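The two cases on this slide can be put side by side numerically. A minimal Python sketch, assuming a hypothetical table of 1,000,000 rows; the function name and the row count are illustrative only, and neither figure is what ASE itself would compute.

```python
# Two reference points for combining predicate selectivities:
#   - independent: predicates are uncorrelated, so selectivities multiply
#   - fully correlated: the same rows satisfy both predicates
# The 1,000,000-row table is a hypothetical example.
TABLE_ROWS = 1_000_000

def combined_selectivity_estimates(sel_a: float, sel_b: float):
    independent = sel_a * sel_b
    fully_correlated = min(sel_a, sel_b)
    return independent, fully_correlated

independent, correlated = combined_selectivity_estimates(0.1, 0.1)
print(f"independent:      {independent:.2f} -> {independent * TABLE_ROWS:,.0f} rows")
print(f"fully correlated: {correlated:.2f} -> {correlated * TABLE_ROWS:,.0f} rows")
```

The tenfold spread between the two estimates is the uncertainty that a multi-column density is meant to narrow.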
  96. Multi-Column Density (Index)
Statistics for index:                   "aqi_weather_date_idx" (nonclustered)
Index column list:                      "sample_date", "air_temp", "weather"
     Leaf count:                        254345
     Data page CR count:                167946797.0000000000000000
     Index page CR count:               32018.0000000000000000
     Data row CR count:                 168066295.0000000000000000
     Leaf row size:                     6.1150672008890936
     Index height:                      3

Statistics for column group:            "sample_date", "air_temp"
Last update of column statistics:       May 27 2014 11:45:45:016AM
     Range cell density:                0.0000051768562637
     Total density:                     0.0000051768562637
     Range selectivity:                 default used (0.33)
     In between selectivity:            default used (0.25)
     Unique range values:               0.0000016563476210
     Unique total values:               0.0000016563476210
     Average column width:              default used (2.00)
     Rows scanned:                      168066824.0000000000000000
     Statistics version:                4

Statistics for column group:            "sample_date", "air_temp", "weather"
Last update of column statistics:       May 27 2014 11:45:45:016AM
     Range cell density:                0.0000051075008894
     Total density:                     0.0000051075008894
     Range selectivity:                 default used (0.33)
     In between selectivity:            default used (0.25)
     Unique range values:               0.0000016297687032
     Unique total values:               0.0000016297687032
     Average column width:              8.5268955638740458
     Rows scanned:                      168066824.0000000000000000
     Statistics version:                4

Slide callouts:
- This is the cost of a covered query (less any portion of the index not needed)
- The ‘weather’ column must not be very distinct, as it doesn’t alter the total density or range density by very much
- If the IO cost of the index is ~page count and the IO cost for the table is near the leaf count, it is doing an index scan and then following each leaf... often not a good strategy unless only a few rows qualify
- Any NL join using this index would need to traverse the index tree this many times per outer row
(Note: Index cluster ratios removed due to space)
  97. Using a Multi-Column Density
Remember, we don’t have composite histograms
First we consider the selectivity of each of the columns individually
- This gives us an idea of how many rows there could be
- For example, col_A has 2 rows & col_B has 5 rows...
  - Total range is between 2 & 10 rows
  - Probability is likely closer to 2... but depends on reality...
Then we look at the multi-column density
- This is our flavor of reality to temper probability
- We use the above with a proprietary formula to compute the selectivity
  - The more selective each column, the closer to the multi-column density
  98. Example: Multi-Column Density
Statistics for column group:            "sample_date", "air_temp", "weather"
Last update of column statistics:       May 27 2014 11:45:45:016AM
     Range cell density:                0.0000051075008894
     Total density:                     0.0000051075008894
     Range selectivity:                 default used (0.33)
     In between selectivity:            default used (0.25)
     Unique range values:               0.0000016297687032
     Unique total values:               0.0000016297687032
     Average column width:              8.5268955638740458
     Rows scanned:                      168066824.0000000000000000
     Statistics version:                4

1> select l.city, l.county, s.sample_date, s.air_temp
2> from aqi_locations l, aqi_samples s
3> where l.location_id=s.location_id
4> and s.sample_date = 'July 1 2000 12:00:00:000PM'
5> and l.state='PA'
6> and s.weather='Overcast'
7> and s.air_temp = 90

Estimating selectivity of index 'aqi_samples.aqi_weather_date_idx', indid 3
    sample_date = Jul 1 2000 12:00:00:000PM
    weather = 'Overcast'
    air_temp = 90
    Estimated selectivity for sample_date,
        selectivity = 0.0002490077,
    Estimated selectivity for air_temp,
        selectivity = 0.01104084,
    Estimated selectivity for weather,
        selectivity = 0.002359544,
    scan selectivity 5.11258e-006, filter selectivity 5.11258e-006
    859.2551 rows, 1.300359 pages
    Data Row Cluster Ratio 3.186365e-006
    Index Page Cluster Ratio 0.9989935
    Data Page Cluster Ratio 0.0007121012
    using no index prefetch (size 4K I/O)
    in index cache 'default data cache' (cacheid 0) with LRU replacement
    using no table prefetch (size 4K I/O)
    in data cache 'default data cache' (cacheid 0) with LRU replacement
    Data Page LIO for 'aqi_weather_date_idx' on table 'aqi_samples' = 859.2551

Slide callouts (in trace order):
- Selectivity based on a single histogram cell for sample_date
- Selectivity based on a single histogram cell for air_temp
- Selectivity based on a single histogram cell for weather
- Selectivity estimate based on the numbers of values for the above, combined with the multi-column density. Since there are only a few values for each, the selectivity is close to the multi-column density
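To see how much the multi-column density changes the estimate, the trace numbers can be compared with what a plain independence assumption would give. A Python sketch using only values printed in the optdiag output and the trace; the independence product is shown for contrast and is not something the trace reports.

```python
import math

# Values copied from the optdiag output and the optimizer trace.
ROWS_SCANNED = 168_066_824
column_selectivity = {
    "sample_date": 0.0002490077,
    "air_temp": 0.01104084,
    "weather": 0.002359544,
}
multi_column_density = 0.0000051075008894  # {sample_date, air_temp, weather}
scan_selectivity = 5.11258e-06             # what the optimizer settled on

# Independence assumption (for contrast only): multiply the three selectivities.
independent = math.prod(column_selectivity.values())

print(f"independence product: {independent:.3e} -> {independent * ROWS_SCANNED:.2f} rows")
print(f"multi-column density: {multi_column_density:.3e} -> {multi_column_density * ROWS_SCANNED:.1f} rows")
print(f"trace scan selectivity: {scan_selectivity:.3e} -> {scan_selectivity * ROWS_SCANNED:.1f} rows")
```

The independence product would predict about one row, while the optimizer's figure sits within a row or so of the multi-column density estimate.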
  99. Problem: Large Estimates
In some cases, we can’t use multi-column densities
- For example, the columns involved may have ranges of values
- The total estimate of rows could then be astronomical
  - Perhaps even higher than the real rowcount
In such cases, we compute a ‘smart’ density
- We know the best case is the most selective column
- We then simply apply a formula to derive a selectivity
  - Some cite sum(cell weight**2)
  - Others use W1*W2 + W1*W2*W3 …
  100. Example: Multi-Column Estimate
1> select l.city, l.county, s.sample_date, s.air_temp
2> from aqi_locations l, aqi_samples s
3> where l.location_id=s.location_id
4> and s.sample_date between 'July 1 2000 00:00:01' and 'July 31 2000 23:59:59'
5> and l.state='PA'
6> and s.weather='Overcast'
7> and s.air_temp < 85

Estimating selectivity of index 'aqi_samples.aqi_weather_date_idx', indid 3
    sample_date >= Jul 1 2000 12:00:01:000AM
    sample_date <= Jul 31 2000 11:59:59:000PM
    weather = 'Overcast'
    air_temp < 85
    Estimated selectivity for sample_date,
        selectivity = 0.007751161,
    Estimated selectivity for air_temp,
        selectivity = 0.7523476,
    Estimated selectivity for weather,
        selectivity = 0.002359544,
    Intelligent Scan selectivity reduction from 0.007751161 to 0.005852389
    scan selectivity 0.005852389, filter selectivity 1.375984e-005
    restricted selectivity 0.007751161
    983592.5 rows, 1488.526 pages
    Data Row Cluster Ratio 3.186365e-006
    Index Page Cluster Ratio 0.9989935
    Data Page Cluster Ratio 0.0007121012
    using index prefetch (size 32K I/O)
    Large IO selected: The number of leaf pages qualified is > MIN_PREFETCH pages
    in index cache 'default data cache' (cacheid 0) with LRU replacement
    using no table prefetch (size 4K I/O)
    in data cache 'default data cache' (cacheid 0) with LRU replacement
    Data Page LIO for 'aqi_weather_date_idx' on table 'aqi_samples' = 2312.572

Slide callouts (in trace order):
- Selectivity based on aggregating all the dates in the range
- Selectivity based on all temps in the unbounded range
- Selectivity based on single cell density for weather
- The worst case projection is the most selective of the above
- For a better estimate, we use a formula to derive a new value we think is more accurate for the scan selectivity (estimate of index rows & leaf pages)... loosely it is sum(W1*W2…), e.g. W1*W2 + W1*W2*W3
- The filter selectivity (estimate of data pages) is the product of the weights (e.g. W1*W2*W3, or 0.007751161 * 0.7523476 * 0.002359544 = 0.0000137598)
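The figures in this trace can be cross-checked with a few multiplications. The Python sketch below uses the 168,066,824 "Rows scanned" value from the optdiag output for this index; the variable names are illustrative, and the "loose" scan formula is the slide's own approximation rather than the exact proprietary calculation.

```python
import math

ROWS_SCANNED = 168_066_824  # "Rows scanned" from the optdiag output

# Per-column selectivities from the trace: sample_date, air_temp, weather.
w1, w2, w3 = 0.007751161, 0.7523476, 0.002359544

# Filter selectivity: the plain product of the three weights.
filter_selectivity = math.prod([w1, w2, w3])

# The slide's loose scan formula. It lands near, but not exactly on, the
# 0.005852389 printed in the trace, since the real formula is proprietary.
loose_scan_selectivity = w1 * w2 + w1 * w2 * w3

trace_scan_selectivity = 0.005852389
index_rows = trace_scan_selectivity * ROWS_SCANNED
data_page_lio = filter_selectivity * ROWS_SCANNED

print(f"filter selectivity:   {filter_selectivity:.6e}")
print(f"loose scan estimate:  {loose_scan_selectivity:.7f}")
print(f"index rows estimate:  {index_rows:,.1f}")
print(f"data page LIO:        {data_page_lio:,.3f}")
```

The product reproduces the trace's filter selectivity, and multiplying by the row count recovers both the 983592.5-row estimate and the 2312.572 data page LIO to within rounding.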
  101. When to create (multi-)column stats
Okay, we know they are automatically created for index keys
- ...and used for joins
When do/ought we create our own?
- On the 2nd through nth index keys (or a subset)
  - ASE creates stats on {A}, {A,B}, {A,B,C}, {A,B,C,D}
  - Might be useful to have {B,C,D} or {B,C}
    - Help trip ORScans if the leading column is frequently not a predicate
    - Help with joins when the leading column is specified as a literal/lateral join (a la SAP)
- On low cardinality columns we don’t want to index
  - ...but that are frequently used as predicates (such as gender)
  - Especially if often used in queries with joins (helps the inner/outer table decision)
Not automatically maintained with ‘update index stats’
- You need to manually run update stats on each column density you create
  102. Joins
Traditional Logic: Driving Table
- Put the table that seems to ‘drive’ the join as the outer table
- Typically, this will be the ‘smaller’ table (or smaller rowset)
- The developer may know the driving table (e.g. #temp)
- ...but the optimizer has to figure it out
  - Estimate rowsets from each table using index selectivity
  - Estimate joined rows from joining with each table in the list
    - Reducing joined rows by applying index selectivity as a filter
    - But remember, this is a guess at optimization time
Alternative Logic: Pin the smaller table in cache
- Put the larger rowset table as the outer and scan it once
- The inner (smaller) table can be pinned in cache
  - Avoids higher PIO
In both cases, the multi-column stats on join columns are key to rowset estimates
  103. Join Strategies
Remember, we have 3 types of joins
- Nested Loop Joins
- Merge Joins (including Sort Merge Joins)
- Hash Joins
The optimizer needs to figure out which one is best
- For indexed joins, typically an NLJ will be best...
  - ...but this assumes the M:N ratio is reasonably small (e.g. 1:10)
- A merge join is great for high cardinality joins
  - M:N is high (1:1000+)
  - Especially if the inner table is sorted in join key sequence
- A hash join works best when join keys are not predicates but predicates eliminate a lot of rows on both sides of the join
  - Outer table is filtered by predicates and join keys are hashed into the build table
  - Inner table is filtered by predicates, the join key is hashed and probed for in the build table
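The build/probe steps described for the hash join can be sketched in a few lines of Python. This is a generic textbook-style illustration under simple assumptions (rows as dictionaries, a single join key, everything in memory), not ASE's implementation; the function name and the sample data are hypothetical.

```python
from collections import defaultdict

def hash_join(outer_rows, inner_rows, key, outer_pred, inner_pred):
    """Filter the outer side into a hash (build) table, then probe it with the filtered inner side."""
    build = defaultdict(list)
    for row in outer_rows:
        if outer_pred(row):              # predicates cut rows before hashing
            build[row[key]].append(row)
    for row in inner_rows:
        if inner_pred(row):              # inner side is filtered, then probed
            for match in build.get(row[key], ()):
                yield {**match, **row}

# Hypothetical sample data loosely modeled on the aqi_* tables.
locations = [
    {"location_id": 1, "state": "PA"},
    {"location_id": 2, "state": "NY"},
]
samples = [
    {"location_id": 1, "air_temp": 90},
    {"location_id": 1, "air_temp": 70},
    {"location_id": 2, "air_temp": 90},
]

joined = list(hash_join(
    locations, samples, "location_id",
    outer_pred=lambda r: r["state"] == "PA",
    inner_pred=lambda r: r["air_temp"] == 90,
))
print(joined)
```

Each side is read once, which is why this approach pays off when the predicates, rather than the join keys, do most of the row elimination.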
  104. This is why stats are sooo... critical
We use them to estimate
- Cardinality of the join
- Rows that qualify from predicates (unjoined)
If the estimates are off by a lot
- We likely predict it is a high cardinality join
  - Remember, with 4 join keys, if we don’t have stats on the other 3 columns, we use magic values of 0.1
- With very high row counts projected from the inner table...
  - If we consider 3 levels of indexing and 10M rows, that’s 40M LIO
  - Sorting 10M rows may only take 20M LIOs...
  - ...so we degrade into a Sort Merge Join (SMJ)
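The 40M versus 20M comparison can be written out explicitly. In the Python sketch below, the 40M figure is read as three index-level reads plus one data page read per row, which is an interpretation of the slide rather than something it states; the sort cost is simply the slide's 20M figure expressed as two LIOs per row.

```python
# Back-of-envelope LIO comparison for 10M projected inner rows.
# Assumption: each nested-loop probe reads 3 index levels plus 1 data page.
ROWS = 10_000_000
INDEX_LEVELS = 3

def nested_loop_lio(rows: int, index_levels: int, data_page_reads: int = 1) -> int:
    """LIO for probing an index of the given height once per row."""
    return rows * (index_levels + data_page_reads)

nlj_lio = nested_loop_lio(ROWS, INDEX_LEVELS)
sort_lio = 2 * ROWS  # the slide's "may only take 20M" figure, restated

print(f"nested loop: {nlj_lio:,} LIO")
print(f"sort:        {sort_lio:,} LIO")
print("cheaper on paper:", "sort merge" if sort_lio < nlj_lio else "nested loop")
```

If the 10M-row projection is itself an overestimate caused by missing stats, the comparison tips toward the sort merge join for the wrong reason.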
  105. Join Keys: The Query
SELECT TOP 1 T_00."PRRBA"
  FROM SAPSR3."/PXY/ACTUAL_DEP" T_00
  INNER JOIN SAPSR3."/PXY/SCD" T_01
    ON T_01."MANDT" = ?
   AND T_01."RBARE" = T_00."PRRBA"
   AND T_01."SCNA" = T_00."PRSCNA"
   AND T_01."EXECNO" = T_00."PREXEC"
   AND T_01."STEP" = T_00."PRST"
 WHERE T_00."MANDT" = ?
   AND T_00."SCNA" = ?
   AND T_00."EXECNO" = ?
   AND T_00."STEP" = ?
   AND T_00."RBARE" = ?
   AND T_01."STATUS" <> ?
   AND T_01."STATUS" <> ?
/* R3:/PXY/SAPLRB:72334 T:/PXY/ACTUAL_DEP M:430 */

create unique nonclustered index "/PXY/ACTUAL_DEP~0" on SAPSR3."/PXY/ACTUAL_DEP"(MANDT, SCNA, EXECNO, STEP, RBARE, PRSCNA, PREXEC, PRST, PRRBA)
create nonclustered index "/PXY/ACTUAL_DEP~00" on SAPSR3."/PXY/ACTUAL_DEP"(MANDT, PRSCNA, PREXEC, PRST, PRRBA, SCNA, EXECNO, STEP, RBARE)
create unique nonclustered index "/PXY/SCD~0" on SAPSR3."/PXY/SCD"(MANDT, RBARE, SCNA, EXECNO, STEP)
create nonclustered index "/PXY/SCD~ID1" on SAPSR3."/PXY/SCD"(MANDT, SCNA, EXECNO, RBARE)

Notice the lateral join on MANDT = <value>. Knowing that ASE has issues with literals at the beginning of the join, we will see if adding multi-column stats on {RBARE, SCNA, EXECNO, STEP} helps NLJoin costing
  106. Join Keys: Bad Index Usage
| |TOP Operator (VA = 4)
| | Top Limit: 1
| | |MERGE JOIN Operator (Join Type: Inner Join) (VA = 3)
| | | Using Worktable2 for internal storage.
| | | Key Count: 4
| | | Key Ordering: ASC ASC ASC ASC
| | | |SORT Operator (VA = 1)
| | | | Using Worktable1 for internal storage.
| | | | |SCAN Operator (VA = 0)
| | | | | FROM TABLE
| | | | | SAPSR3./PXY/ACTUAL_DEP
| | | | | T_00
| | | | | Index : /PXY/ACTUAL_DEP~0
| | | | | Forward Scan.
| | | | | Positioning by key.
| | | | | Index contains all needed columns. Base table will not be read.
| | | | | Keys are:
| | | | | MANDT ASC
| | | | | SCNA ASC
| | | | | EXECNO ASC
| | | | | STEP ASC
| | | | | RBARE ASC
| | | | | Using I/O Size 16 Kbytes for index leaf pages.
| | | | | With LRU Buffer Replacement Strategy for index leaf pages.
| | | |SCAN Operator (VA = 2)
| | | | FROM TABLE
| | | | SAPSR3./PXY/SCD
| | | | T_01
| | | | Index : /PXY/SCD~0
| | | | Forward Scan.
| | | | Positioning by key.
| | | | Keys are:
| | | | MANDT ASC
| | | | Using I/O Size 16 Kbytes for index leaf pages.
| | | | With LRU Buffer Replacement Strategy for index leaf pages.
| | | | Using I/O Size 16 Kbytes for data pages.
| | | | With LRU Buffer Replacement Strategy for data pages.
  107. Join Permutation Costing (1)
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
BEGIN: Complete join order evaluation (perm #1)
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Permutation Order: Gt0( SAPSR3./PXY/ACTUAL_DEP T_00 ) |X| Gt1( SAPSR3./PXY/SCD T_01 )

joining using ( PopNlJoin () () ) cost:0 tempdb:0 order: none
  outer Pops:
    ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) cost:81.29999 T(L3,P3,C2.999999) O(L3,P3,C2.999999) tempdb:0 order: <3,2,1,9>
    ( PopSort ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ) cost:114.148 T(L9.765611,P3.76561,C4.765611) O(L6,P0,C1) tempdb:0.001237151 order: {1,2,3,9} Has BmoSort
  inner Pops:
    ( PopRidJoin ( PopIndScan /PXY/SCD~0 SAPSR3./PXY/SCD T_01 ) ) cost:1989.483 T(L73.16116,P73.16116,C141.3204) O(L70.16116,P70.16116,C140.3204) tempdb:0.0006185754 order: <9,3,2,1>

joining using ( PopMergeJoin () () ) cost:0 tempdb:0 order: none
  outer Pops:
    ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) cost:81.29999 T(L3,P3,C2.999999) O(L3,P3,C2.999999) tempdb:0 order: <3,2,1,9>
    ( PopSort ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ) cost:114.148 T(L9.765611,P3.76561,C4.765611) O(L6,P0,C1) tempdb:0.001237151 order: {1,2,3,9} Has BmoSort
  inner Pops:
    ( PopRidJoin ( PopIndScan /PXY/SCD~ID1 SAPSR3./PXY/SCD T_01 ) ) cost:1162186 T(L183590.3,P5562.217,C6559500) O(L182634.3,P4606.217,C4055874) tempdb:0 order: <3,2,9>
    ( PopRidJoin ( PopIndScan /PXY/SCD~0 SAPSR3./PXY/SCD T_01 ) ) cost:614.7092 T(L20.83115,P20.78714,C533.6843) O(L17.83115,P17.78714,C355.7895) tempdb:0 order: <9,3,2,1>
    ( PopSort ( PopRidJoin ( PopIndScan /PXY/SCD~ID1 SAPSR3./PXY/SCD T_01 ) ) ) cost:4406059 T(L44736.09,P46577.09,C3.15216e+07) O(L1851,P3692,C3.147871e+07) tempdb:3077.973 order: {1,2,3,9} Has BmoSort
  108. Join Permutation Costing (2)
Eagerly enforcing...
the cheapest Pop:
  ( PopMergeJoin ( PopSort ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ) ( PopRidJoin ( PopIndScan /PXY/SCD~0 SAPSR3./PXY/SCD T_01 ) ) ) cost:735.8733 T(L30.59676,P24.55275,C608.61) O(L0,P0,C70.16021) tempdb:0.0006185754 order: none
... Pop enforcers:
... PopLet enforcers:
... done eager enforcement.

All Pops/PopLets before EqcN selection: -> initial Pops:
  ( PopMergeJoin ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ( PopRidJoin ( PopIndScan /PXY/SCD~ID1 SAPSR3./PXY/SCD T_01 ) ) ) cost:1288721 T(L191677,P7108.215,C7276614) O(L8083.682,P1542.997,C717110.6) tempdb:0 order: none
  ( PopMergeJoin ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ( PopSort ( PopRidJoin ( PopIndScan /PXY/SCD~ID1 SAPSR3./PXY/SCD T_01 ) ) ) ) cost:4406148 T(L44739.09,P46580.09,C3.152167e+07) O(L0,P0,C70.16021) tempdb:1538.986 order: none
  ( PopMergeJoin ( PopSort ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ) ( PopRidJoin ( PopIndScan /PXY/SCD~ID1 SAPSR3./PXY/SCD T_01 ) ) ) cost:1162645 T(L183600,P5565.983,C6562956) O(L0,P0,C3451.033) tempdb:0.0006185754 order: none
  ( PopMergeJoin ( PopSort ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ) ( PopRidJoin ( PopIndScan /PXY/SCD~0 SAPSR3./PXY/SCD T_01 ) ) ) cost:735.8733 T(L30.59676,P24.55275,C608.61) O(L0,P0,C70.16021) tempdb:0.0006185754 order: none
  ( PopMergeJoin ( PopSort ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ) ( PopSort ( PopRidJoin ( PopIndScan /PXY/SCD~ID1 SAPSR3./PXY/SCD T_01 ) ) ) ) cost:4406180 T(L44745.86,P46580.86,C3.152167e+07) O(L0,P0,C70.16021) tempdb:1538.987 order: none Has BmoSort
  ( PopNlJoin ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ( PopRidJoin ( PopIndScan /PXY/SCD~0 SAPSR3./PXY/SCD T_01 ) ) ) cost:2070.783 T(L76.16116,P76.16116,C144.3204) tempdb:0 order: none
  ( PopNlJoin ( PopSort ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ) ( PopRidJoin ( PopIndScan /PXY/SCD~0 SAPSR3./PXY/SCD T_01 ) ) ) cost:2103.631 T(L82.92677,P76.92677,C146.086) tempdb:0.0006185754 order: none Has BmoSort
  109. Join Permutation Costing (3)
Eqc competition ...
initial old Pops:
  ( PopNlJoin ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ( PopRidJoin ( PopIndScan /PXY/SCD~0 SAPSR3./PXY/SCD T_01 ) ) ) cost:2070.783 T(L76.16116,P76.16116,C144.3204) tempdb:0 order: none
initial new Pops:
...
pruned new against total 0
pruned new against old 5
pruned old against new 1
kept old Pops:
  ( PopNlJoin ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ( PopRidJoin ( PopIndScan /PXY/SCD~0 SAPSR3./PXY/SCD T_01 ) ) ) cost:2070.783 T(L76.16116,P76.16116,C144.3204) tempdb:0 order: none
kept new Pops:
  ( PopMergeJoin ( PopSort ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ) ( PopRidJoin ( PopIndScan /PXY/SCD~0 SAPSR3./PXY/SCD T_01 ) ) ) cost:735.8733 T(L30.59676,P24.55275,C608.61) O(L0,P0,C70.16021) tempdb:0.0006185754 order: none
... done Eqc competition.
... done join visit.

Join plans selected for this permutation:
OptBlock0 Eqc{0,1} -> Pops added for the join Eqc{0} - Eqc{1}:
  ( PopMergeJoin ( PopSort ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ) ( PopRidJoin ( PopIndScan /PXY/SCD~0 SAPSR3./PXY/SCD T_01 ) ) ) cost:735.8733 T(L30.59676,P24.55275,C608.61) O(L0,P0,C70.16021) tempdb:0.0006185754 order: none
move greedy pops to new list
  ( PopNlJoin ( PopIndScan /PXY/ACTUAL_DEP~0 SAPSR3./PXY/ACTUAL_DEP T_00 ) ( PopRidJoin ( PopIndScan /PXY/SCD~0 SAPSR3./PXY/SCD T_01 ) ) ) cost:2070.783 T(L76.16116,P76.16116,C144.3204) tempdb:0 order: none
... done move greedy pops to new list.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
DONE: Complete join order evaluation (perm #1)
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

“old Pops” = 12.5-style optimization; note that the cost is >2000
  110. Join Permutation Costing (4)
** Costing set up for RowLimit optimization **
TopLogProps0( SAPSR3./PXY/ACTUAL_DEP T_00 ) - TopPred: [Tc{} Pe{0,1,2,3,4}] TopSubst: {1,2,3,4,5,6,7,8,9,17}
TopLogProps0( SAPSR3./PXY/SCD T_01 ) - TopPred: [Tc{} Pe{5,6,7}] TopSubst: {11,12,13,14,15,16}

Statistics for rows returned to client...
    Estimated rows           :14073.64
    Estimated row width      :7.002473
    Estimated client cost is :78.59161

Estimating selectivity of index 'SAPSR3./PXY/SCD./PXY/SCD~0', indid 2
    MANDT = '430'
    Estimated selectivity for MANDT,
        selectivity = 1,
    scan selectivity 1, filter selectivity 1
    Cost adjusted for RowLimit optimization, Adjustment ratio 7.105484e-05
    2503626 rows, 6283 pages
    Adjustment ratio 7.105484e-05 applied gives 177.8947 rows, 1 pages
    Data Row Cluster Ratio 0.9107559
    Index Page Cluster Ratio 0.9874477
    Data Page Cluster Ratio 0.242736
    using index prefetch (size 128K I/O)
    Large IO selected: The number of leaf pages qualified is > MIN_PREFETCH pages
    Adjustment using index prefetch (size 128K I/O)
    in index cache 'default data cache' (cacheid 0) with LRU replacement
    using table prefetch (size 128K I/O)
    Large IO selected: The number of leaf pages qualified is > MIN_PREFETCH pages
    Adjustment using table prefetch (size 128K I/O)
    in data cache 'default data cache' (cacheid 0) with LRU replacement
    Data Page LIO for '/PXY/SCD~0' on table 'SAPSR3./PXY/SCD' = 17.83115
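The RowLimit adjustment in this trace can be checked by hand. In the Python sketch below, the adjustment ratio is observed to match the TOP limit divided by the estimated rows returned to the client; that relationship is inferred from the printed numbers and is not a documented ASE formula. The variable names are illustrative.

```python
# Values copied from the trace.
TOP_LIMIT = 1                      # SELECT TOP 1
ESTIMATED_CLIENT_ROWS = 14073.64   # "Estimated rows"
TRACE_RATIO = 7.105484e-05         # "Adjustment ratio"
UNADJUSTED_ROWS = 2_503_626        # rows qualifying on MANDT = '430'

# Inferred relationship (an observation, not documented behavior):
inferred_ratio = TOP_LIMIT / ESTIMATED_CLIENT_ROWS

# Applying the printed ratio reproduces the adjusted row estimate.
adjusted_rows = UNADJUSTED_ROWS * TRACE_RATIO

print(f"inferred ratio: {inferred_ratio:.6e} (trace: {TRACE_RATIO:.6e})")
print(f"adjusted rows:  {adjusted_rows:.4f} (trace: 177.8947)")
```

The scaling shrinks a 2.5M-row scan on /PXY/SCD~0 to roughly 178 rows on paper, which helps explain why a scan positioned only on MANDT looks cheap to the optimizer here.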
