  • 1. Query diagnosis. Tampa Bay Relational Users Group. IBM Silicon Valley Lab, U.S.A. IBM Corporation 2003
  • 2. Query analysis and tuning
    • Format the SQL statement
      • Prepare the statement for human tuning
    • Separate sections for:
      • SELECT list
      • FROM clause
      • WHERE clause
    • Tools support
      • Data Studio fixpack 2.2.0.1 includes SQL formatting
        • Show transformed SQL text
  • 3. Sample unformatted query
    • EXPLAIN PLAN SET QUERYNO = 1 FOR
    • SELECT DISTINCT ITEM.ITEM_NBR AS ITEM_NBR, ITEM.PRDT_ID, STOREITEM.WK_STRT_DT AS WK_STRT_DT ,STOREITEM.DC_ID AS DC_ID FROM PROD.TIPA004_STITM_PROJ AS STOREITEM , PROD.TITM001_ITEM AS ITEM WHERE ITEM.BUS_UNIT_ID = 'GS' AND ITEM.BUS_UNIT_ID = STOREITEM.BUS_UNIT_ID AND ITEM.MJR_CATG_ID = '00754' AND ITEM.INTMD_CATG_ID = '00043' AND ITEM.ITEM_NBR = STOREITEM.ITEM_NBR AND ITEM.MJR_CATG_ID = STOREITEM.MJR_CATG_ID AND ITEM.INTMD_CATG_ID = STOREITEM.INTMD_CATG_ID AND STOREITEM.RTL_DEPT_NBR = 1 AND AD_ITEM_FLG = 'Y' AND WK_STRT_DT = '2002-02-08';
    Unformatted SQL, where to start?
  • 4. Formatted
    • EXPLAIN PLAN SET QUERYNO = 1 FOR
    • SELECT DISTINCT ITEM.ITEM_NBR AS ITEM_NBR, ITEM.PRDT_ID, STOREITEM.WK_STRT_DT AS WK_STRT_DT ,STOREITEM.DC_ID AS DC_ID
    • FROM PROD.TIPA004_STITM_PROJ AS STOREITEM
    • ,PROD.TITM001_ITEM AS ITEM
    • WHERE ITEM.BUS_UNIT_ID = STOREITEM.BUS_UNIT_ID
    • AND ITEM.MJR_CATG_ID = STOREITEM.MJR_CATG_ID
    • AND ITEM.INTMD_CATG_ID = STOREITEM.INTMD_CATG_ID
    • AND ITEM.ITEM_NBR = STOREITEM.ITEM_NBR
    • AND ITEM.BUS_UNIT_ID = 'GS'
    • AND ITEM.MJR_CATG_ID = '00754'
    • AND ITEM.INTMD_CATG_ID = '00043'
    • AND STOREITEM.AD_ITEM_FLG = 'Y'
    • AND STOREITEM.RTL_DEPT_NBR = 1
    • AND STOREITEM.WK_STRT_DT = '2002-02-08';
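The formatting step is mechanical enough to script. The sketch below is a hypothetical helper (`format_sql` is not part of Data Studio or DB2): it collapses whitespace, starts FROM and WHERE on new lines, and puts each FROM-list table and each AND-ed predicate on its own line. It does not parse SQL, so it would also split the AND inside a BETWEEN, and it does not regroup join and local predicates the way the formatted example does.

```python
import re


def format_sql(sql: str) -> str:
    """Rough layout for human tuning: one clause, table, or predicate per line."""
    text = " ".join(sql.split())  # collapse all runs of whitespace
    # Start FROM and WHERE on their own lines (keyword case is preserved).
    text = re.sub(r"\s+(FROM|WHERE)\s+", r"\n\1 ", text, flags=re.IGNORECASE)
    # Start every AND-ed predicate on its own line.
    text = re.sub(r"\s+AND\s+", "\nAND ", text, flags=re.IGNORECASE)
    lines = []
    for line in text.split("\n"):
        if line.upper().startswith("FROM "):
            # Leading-comma style for the remaining tables of the FROM list.
            tables = [t.strip() for t in line[5:].split(",")]
            lines.append(line[:5] + tables[0])
            lines.extend("," + t for t in tables[1:])
        else:
            lines.append(line)
    return "\n".join(lines)


print(format_sql("SELECT A.X, B.Y FROM T1 AS A , T2 AS B WHERE A.K = B.K AND A.C = 'GS'"))
```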
  • 5. Analyzing query
    • Observe “interesting predicates”
      • The optimizer may produce inaccurate filter factor estimates
      • Range predicates with parameter markers
      • Predicates using interesting literals
        • Probable defaults
      • Complex predicates
        • Complex OR expressions
        • Negation predicates
        • Column expressions
        • Non-column expressions
  • 6. Sample query: Pat’s diagnosis
  • 7. Query breakdown
    • SELECT …
    • FROM SETL_TRANS S
    • ,BRANCH CUST
    • ,BRANCH_ADDR A
    • WHERE S.ADV_ABA_R = ?
    • AND S.PROCESS_DT < '9999-12-31'
    • AND S.TYPE_CD IN ('A', 'C', 'X', 'Z')
    • AND S.CLR_CYCLE_CD IN ('EOD', 'IMD', 'OPN')
    • AND S.STLMT_DT = ?
    • AND S.ACCT_NUM = CUST.ACCT_NUM
    • AND CUST.CUST_EFCT_DT <= ?
    • AND CUST.CUST_INACTV_DT > ?
    • AND A.ACCT_NUM = CUST.ACCT_NUM
    • AND A.CUST_EFCT_DT <= ?
    • AND A.CUST_INACTV_DT > ?
    • AND A.ADDR_TYP_CD = ' '
  • 8. Identify peculiar predicates
    • SELECT …
    • FROM SETL_TRANS S
    • ,BRANCH CUST
    • ,BRANCH_ADDR A
    • WHERE S.ADV_ABA_R = ?
    • AND S.PROCESS_DT < '9999-12-31'  -- MAX DATE
    • AND S.TYPE_CD IN ('A', 'C', 'X', 'Z')
    • AND S.CLR_CYCLE_CD IN ('EOD', 'IMD', 'OPN')
    • AND S.STLMT_DT = ?
    • AND S.ACCT_NUM = CUST.ACCT_NUM
    • AND CUST.CUST_EFCT_DT <= ?  -- Range with marker
    • AND CUST.CUST_INACTV_DT > ?  -- Range with marker
    • AND A.ACCT_NUM = CUST.ACCT_NUM
    • AND A.CUST_EFCT_DT <= ?  -- Range with marker
    • AND A.CUST_INACTV_DT > ?  -- Range with marker
    • AND A.ADDR_TYP_CD = ' '  -- COL = blank
  • 9. Why are they peculiar?
    • Predicates that compare against a typical default value are often skewed:
    • AND S.PROCESS_DT < '9999-12-31'  -- MAX DATE
    • AND A.ADDR_TYP_CD = ' '  -- COL = blank
    • Range predicates with parameter markers are impossible to estimate accurately without a literal value:
    • AND CUST.CUST_EFCT_DT <= ?  -- Range with marker
    • AND CUST.CUST_INACTV_DT > ?  -- Range with marker
    • AND A.CUST_EFCT_DT <= ?  -- Range with marker
    • AND A.CUST_INACTV_DT > ?  -- Range with marker
  • 10. Range predicate interpolation
    • Table 104. Default filter factors for interpolation
      COLCARDF          Filter factor for Op   Filter factor for LIKE / BETWEEN
      >= 100,000,000    1 / 10,000             3 / 100,000
      >= 10,000,000     1 / 3,000              1 / 10,000
      >= 1,000,000      1 / 1,000              3 / 10,000
      >= 100,000        1 / 300                1 / 1,000
      >= 10,000         1 / 100                3 / 1,000
      >= 1,000          1 / 30                 1 / 100
      >= 100            1 / 10                 3 / 100
      >= 2              1 / 3                  1 / 10
      = 1               1 / 1                  1 / 1
      <= 0              1 / 3                  1 / 10
    • Note: Op is one of these operators: <, <=, >, >=.
    • COMMENT: This is DB2’s documented guess for a filter factor that is impossible to estimate.
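Table 104 is a plain lookup from a column's COLCARDF to the filter factor the optimizer falls back on when it cannot interpolate, for example a range predicate with a parameter marker. A minimal sketch of that lookup, assuming the rows of Table 104 as listed above; `default_range_ff` is a hypothetical name, and the final row is read as COLCARDF <= 0 (a COLCARDF of -1 typically means no statistics were collected).

```python
from fractions import Fraction

# Rows of Table 104, highest COLCARDF first:
# (minimum COLCARDF, FF for <, <=, >, >=, FF for LIKE / BETWEEN)
_DEFAULTS = [
    (100_000_000, Fraction(1, 10_000), Fraction(3, 100_000)),
    (10_000_000, Fraction(1, 3_000), Fraction(1, 10_000)),
    (1_000_000, Fraction(1, 1_000), Fraction(3, 10_000)),
    (100_000, Fraction(1, 300), Fraction(1, 1_000)),
    (10_000, Fraction(1, 100), Fraction(3, 1_000)),
    (1_000, Fraction(1, 30), Fraction(1, 100)),
    (100, Fraction(1, 10), Fraction(3, 100)),
    (2, Fraction(1, 3), Fraction(1, 10)),
]


def default_range_ff(colcardf: int, like_or_between: bool = False) -> Fraction:
    """Default filter factor used when the range cannot be interpolated."""
    if colcardf <= 0:  # no column statistics (assumed reading of the last row)
        return Fraction(1, 10) if like_or_between else Fraction(1, 3)
    if colcardf == 1:
        return Fraction(1)
    for threshold, ff_op, ff_like in _DEFAULTS:
        if colcardf >= threshold:
            return ff_like if like_or_between else ff_op
    raise AssertionError("unreachable: every COLCARDF >= 2 matches a row")


print(default_range_ff(2_496), default_range_ff(279))
```

With the statistics from the sample query, COLCARDF 2,496 (CUST_EFCT_DT) maps to 1/30 and COLCARDF 279 (CUST_INACTV_DT) maps to 1/10, the marker estimates used in the suspicious predicate analysis.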
  • 11. Analyzing query
    • Embed information within statement
      • Table information
        • CARDF
        • NPAGES
      • Column information for predicates
        • Local predicates
        • Join predicates
      • Observe where the filtering is
        • Selectivity of a predicate is relative to table cardinality
    • Investigate “suspicious” predicates
      • Determine actual versus estimated filtering
      • If there is a problem, identify options
  • 12. Embed statistics
    • SELECT …
    • FROM SETL_TRANS S CARDF 1,600,254 NPAGES 21,627
    • ,BRANCH CUST CARDF 31,696 NPAGES 1132
    • ,BRANCH_ADDR A CARDF 58,627 NPAGES 2791
    • WHERE S.ADV_ABA_R = ? COLCARDF 19,712
    • AND S.PROCESS_DT < '9999-12-31' COLCARDF 11
    • LOW2KEY 2004-03-24 HIGH2KEY 2004-04-05
    • AND S.TYPE_CD IN ('A', 'C', 'X', 'Z') COLCARDF 4
    • AND S.CLR_CYCLE_CD IN ('EOD', 'IMD', 'OPN') COLCARDF 3
    • AND S.STLMT_DT = ? COLCARDF 13
    • AND S.ACCT_NUM = CUST.ACCT_NUM COLCARDF 15,360 / 26,527
    • AND CUST.CUST_EFCT_DT <= ? COLCARDF 2,496
    • LOW2KEY 1994-09-02 HIGH2KEY 2004-04-06
    • AND CUST.CUST_INACTV_DT > ? COLCARDF 279
    • LOW2KEY 2004-03-04 HIGH2KEY 2004-04-07
    • AND A.ACCT_NUM = CUST.ACCT_NUM COLCARDF 26,527 / 26,527
    • AND A.CUST_EFCT_DT <= ? COLCARDF 2,496
    • LOW2KEY 1994-09-02 HIGH2KEY 2004-04-06
    • AND A.CUST_INACTV_DT > ? COLCARDF 274
    • LOW2KEY 2004-03-04 HIGH2KEY 2004-04-07
    • AND A.ADDR_TYP_CD = ' ' COLCARDF 5
  • 13. Suspicious predicate analysis
    • 1) The first range predicate looks for all values less than '9999-12-31'.
    • That literal is far greater than the HIGH2KEY, so essentially all of the rows qualify.
    • (Because the optimizer has the literal value, it KNOWS that all rows qualify.)
    • 2) For the column = blank predicate, I don’t believe a skew search was ever done.
    • Check how many values are blank. Is it more than 20%? (With COLCARDF 5, the uniform estimate is 1/5 = 20%.)
    • 1) AND S.PROCESS_DT < '9999-12-31' COLCARDF 11
    • LOW2KEY 2004-03-24 HIGH2KEY 2004-04-05
    • 2) AND A.ADDR_TYP_CD = ' ' COLCARDF 5
    • Conclusion: The first predicate should not be causing this SQL statement any problems.
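The reasoning for predicate 1 can be checked with a rough linear interpolation between LOW2KEY and HIGH2KEY. This is a simplified sketch, not DB2's internal formula: it counts calendar days, ignores the difference between < and <=, and clamps to [0, 1]; `interpolated_ff_less_than` is a hypothetical name.

```python
from datetime import date


def interpolated_ff_less_than(literal: date, low2key: date, high2key: date) -> float:
    """Estimate FF for COL < literal (or <=) by linear interpolation over days.

    Assumes values are spread evenly between LOW2KEY and HIGH2KEY. The result is
    clamped to [0, 1], so a literal above HIGH2KEY qualifies every row.
    """
    span = (high2key - low2key).days
    if span <= 0:
        raise ValueError("HIGH2KEY must be later than LOW2KEY")
    ff = (literal - low2key).days / span
    return min(max(ff, 0.0), 1.0)


# Predicate 1: S.PROCESS_DT < '9999-12-31', LOW2KEY 2004-03-24, HIGH2KEY 2004-04-05
print(interpolated_ff_less_than(date(9999, 12, 31), date(2004, 3, 24), date(2004, 4, 5)))
```

The same calculation for CUST_EFCT_DT <= 2004-04-06 (HIGH2KEY 2004-04-06) also returns 1.0, the 100% "estimated FF with literal" for the CUST_EFCT_DT predicates.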
  • 14. Suspicious predicate analysis
    • The literal value used for each of the parameter markers in this case happened to be the same: 2004-04-06.
    • I determined the ESTIMATED FF WITH LITERAL by comparing the literal value to the HIGH2KEY and working out what range would qualify.
    • The ESTIMATED FF WITH MARKER comes from the default interpolation table in the Administration Guide.
    • The “error” is how far the optimizer’s DEFAULT estimate is from the ACTUAL filtering.
    • 3) AND CUST.CUST_EFCT_DT <= ? COLCARDF 2,496
    • LOW2KEY 1994-09-02 HIGH2KEY 2004-04-06
    • ESTIMATED FF WITH LITERAL: 100%
    • ESTIMATED FF WITH MARKER: 1/30 = 3% (97% error)
    • 4) AND CUST.CUST_INACTV_DT > ? COLCARDF 279
    • LOW2KEY 2004-03-04 HIGH2KEY 2004-04-07
    • ESTIMATED FF WITH LITERAL: 99%
    • ESTIMATED FF WITH MARKER: 1/10 = 10% (89% error)
    • 5) AND A.CUST_EFCT_DT <= ? COLCARDF 2,496
    • LOW2KEY 1994-09-02 HIGH2KEY 2004-04-06
    • ESTIMATED FF WITH LITERAL: 100%
    • ESTIMATED FF WITH MARKER: 1/30 = 3% (97% error)
    • 6) AND A.CUST_INACTV_DT > ? COLCARDF 274
    • LOW2KEY 2004-03-04 HIGH2KEY 2004-04-07
    • ESTIMATED FF WITH LITERAL: 99%
    • ESTIMATED FF WITH MARKER: 1/10 = 10% (89% error)
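The error figures are simple arithmetic: the filter factor derived from the literal minus the default filter factor for a marker, in percentage points. A small sketch reproducing them; the literal-based estimates (100% and 99%) are copied from the slide rather than recomputed, and the ratio is an added view showing how many times more rows qualify than the optimizer expects. The function names are hypothetical.

```python
from fractions import Fraction

# (predicate, FF estimated with the literal, default FF with a parameter marker),
# values copied from the slide; the marker defaults come from Table 104.
PREDICATES = [
    ("CUST.CUST_EFCT_DT <= ?", 1.00, Fraction(1, 30)),
    ("CUST.CUST_INACTV_DT > ?", 0.99, Fraction(1, 10)),
    ("A.CUST_EFCT_DT <= ?", 1.00, Fraction(1, 30)),
    ("A.CUST_INACTV_DT > ?", 0.99, Fraction(1, 10)),
]


def error_points(ff_literal: float, ff_marker: Fraction) -> int:
    """Difference between the two estimates, in whole percentage points."""
    return round((ff_literal - float(ff_marker)) * 100)


def underestimate_ratio(ff_literal: float, ff_marker: Fraction) -> float:
    """How many times more rows qualify than the marker default assumes."""
    return ff_literal / float(ff_marker)


for name, ff_lit, ff_mark in PREDICATES:
    print(f"{name:25} literal {ff_lit:.0%}  marker {float(ff_mark):.1%}  "
          f"error {error_points(ff_lit, ff_mark)} points  "
          f"{underestimate_ratio(ff_lit, ff_mark):.1f}x more rows")
```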
  • 15. Suspicious predicate analysis
    • Conclusion
      • The range predicates with parameter markers introduce significant filter factor error. This error can cause significant cost estimation problems for the optimizer, possibly resulting in a poor access path choice.
  • 16. Where’s the filtering?
    • WHERE S.ADV_ABA_R = ? COLCARDF 19,712
    • (Very selective predicate)
    • AND S.PROCESS_DT < ‘9999-12-31’ COLCARDF 11
    • (This predicate doesn’t filter anything, known from suspicious predicate analysis)
    • AND S.TYPE_CD IN ('A', 'C', 'X', 'Z') COLCARDF 4
    • (In-list looking for 4 values, COLCARDF 4: not filtering)
    • AND S.CLR_CYCLE_CD IN ('EOD', 'IMD', 'OPN') COLCARDF 3
    • (In-list looking for 3 values, COLCARDF 3: not filtering)
    • AND S.STLMT_DT = ? COLCARDF 13
    • (COL = ?, COLCARDF 13: somewhat filtering, but not great selectivity)
    • AND S.ACCT_NUM = CUST.ACCT_NUM COLCARDF 15,360 / 26,527
    • (The optimizer PERCEIVES the range predicates below to be selective, but
    • in reality they are not. This was determined during suspicious predicate analysis.)
    • AND CUST.CUST_EFCT_DT <= ? COLCARDF 2,496
    • AND CUST.CUST_INACTV_DT > ? COLCARDF 279
    • AND A.ACCT_NUM = CUST.ACCT_NUM COLCARDF 26,527 / 26,527
    • AND A.CUST_EFCT_DT <= ? COLCARDF 2,496
    • AND A.CUST_INACTV_DT > ? COLCARDF 274
    • AND A.ADDR_TYP_CD = ' ' COLCARDF 5
    • (COL = blank. This column is probably skewed on blank; with COLCARDF 5 it is
    • not typically very filtering.)
  • 17. Where’s the filtering?
    • SELECT …
    • FROM SETL_TRANS S CARDF 1,600,254 NPAGES 21,627
    • ,BRANCH CUST CARDF 31,696 NPAGES 1132
    • ,BRANCH_ADDR A CARDF 58,627 NPAGES 2791
    • WHERE S.ADV_ABA_R = ? COLCARDF 19,712
    • AND S.PROCESS_DT < '9999-12-31' COLCARDF 11
    • LOW2KEY 2004-03-24 HIGH2KEY 2004-04-05
    • AND S.TYPE_CD IN ('A', 'C', 'X', 'Z') COLCARDF 4
    • AND S.CLR_CYCLE_CD IN ('EOD', 'IMD', 'OPN') COLCARDF 3
    • AND S.STLMT_DT = ? COLCARDF 13
    • AND S.ACCT_NUM = CUST.ACCT_NUM COLCARDF 15,360 / 26,527
    • AND CUST.CUST_EFCT_DT <= ? COLCARDF 2,496
    • LOW2KEY 1994-09-02 HIGH2KEY 2004-04-06
    • AND CUST.CUST_INACTV_DT > ? COLCARDF 279
    • LOW2KEY 2004-03-04 HIGH2KEY 2004-04-07
    • AND A.ACCT_NUM = CUST.ACCT_NUM COLCARDF 26,527 / 26,527
    • AND A.CUST_EFCT_DT <= ? COLCARDF 2,496
    • LOW2KEY 1994-09-02 HIGH2KEY 2004-04-06
    • AND A.CUST_INACTV_DT > ? COLCARDF 274
    • LOW2KEY 2004-03-04 HIGH2KEY 2004-04-07
    • AND A.ADDR_TYP_CD = ' ' COLCARDF 5
    Most selective by far: S.ADV_ABA_R = ? (COLCARDF 19,712)
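To see why S.ADV_ABA_R stands out, apply the usual uniform-distribution defaults (assumed here: 1/COLCARDF for COL = value, n/COLCARDF capped at 1 for an n-value IN-list) to the SETL_TRANS statistics. S.PROCESS_DT is left out because the literal analysis already showed that it qualifies every row. The helper names are hypothetical.

```python
# Uniform-distribution default filter factors (no frequency statistics assumed):
#   COL = value        -> 1 / COLCARDF
#   COL IN (n values)  -> n / COLCARDF, capped at 1
CARDF_SETL_TRANS = 1_600_254

# predicate -> (number of values searched, COLCARDF), from the embedded statistics
LOCAL_PREDICATES = {
    "S.ADV_ABA_R = ?": (1, 19_712),
    "S.STLMT_DT = ?": (1, 13),
    "S.TYPE_CD IN (4 values)": (4, 4),
    "S.CLR_CYCLE_CD IN (3 values)": (3, 3),
}


def uniform_ff(values_searched: int, colcardf: int) -> float:
    return min(values_searched / colcardf, 1.0)


def estimated_rows(cardf: int, ff: float) -> float:
    return cardf * ff


for predicate, (n, colcardf) in LOCAL_PREDICATES.items():
    ff = uniform_ff(n, colcardf)
    print(f"{predicate:30} FF={ff:.6f}  ~{estimated_rows(CARDF_SETL_TRANS, ff):,.0f} rows")
```

Roughly 81 of 1.6 million rows survive the ADV_ABA_R predicate, against about 123,000 for STLMT_DT and no reduction for the two IN-lists.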
  • 18. Index analysis
    • One significant input to the optimizer is…
      • Available indexes
      • What join sequence they encourage
    • Some index performance considerations
      • Provide efficient access for local predicates
        • Encourages the table to be the OUTER table
      • Provide efficient access for join predicates
        • Encourages access to the table as the INNER table of the join
      • Provide ordering to avoid a sort
    • Analysis:
      • Are there appropriate indexes to support this query?
  • 19. Identify indexes
    • Table: SETL_TRANS
    • INDEX IXSTRN01
    • (PROCESS_DT, CLR_CYCLE_CD, ADV_ABA_R, TYPE_CD, ACCT_NUM, STLMT_DT)
    • TABLE: BRANCH
    • INDEX: IXBRNC01
    • (CUST_INACTV_DT, CUST_EFCT_DT)
    • INDEX: IXBRNC02
    • (ACCT_NUM, CUST_EFCT_DT)
    • TABLE: BRANCH_ADDR
    • INDEX: IXBRAD01
    • (CUST_INACTV_DT, CUST_EFCT_DT)
    • INDEX: IXBRAD02
    • (ACCT_NUM, ADDR_TYP_CD, CUST_EFCT_DT)
  • 20. Index candidate usage
    • Table: SETL_TRANS
    • INDEX IXSTRN01
    • ( PROCESS_DT , CLR_CYCLE_CD , ADV_ABA_R , TYPE_CD , ACCT_NUM , STLMT_DT )
    • TABLE: BRANCH
    • INDEX: IXBRNC01
    • ( CUST_INACTV_DT, CUST_EFCT_DT )
    • INDEX IXBRNC02
    • ( ACCT_NUM , CUST_EFCT_DT )
    • TABLE: BRANCH_ADDR
    • INDEX: IXBRAD01
    • ( CUST_INACTV_DT, CUST_EFCT_DT )
    • INDEX: IXBRAD02
    • ( ACCT_NUM , ADDR_TYP_CD , CUST_EFCT_DT )
    • Key:
    • RED: Range predicate (stops matching)
    • BLUE: Join predicate
    • GREEN: Local equals predicate / in-list
  • 21. Index design analysis (by table)
    • BRANCH table (Index design OK!)
      • Index IXBRNC02 supports local access
        • CONCERN: The filter factor for the predicate on this column is grossly underestimated (its filtering is overestimated), so the optimizer will perceive access to this table to be more efficient than what really occurs!
      • Index IXBRNC01 supports join access
    • BRANCH_ADDR table (Index design OK!)
      • Index IXBRAD01 leading column on local filtering
        • The filter factor for the predicate on this column is grossly underestimated (its filtering is overestimated)
        • Allows the table to be considered as an efficient inner table
      • Index IXBRAD02 leading column supports join
        • Allows table to be an efficient inner table
  • 22. Index design analysis (by table)
    • SETL_TRANS table (Not OK!)
      • IXSTRN01 is the only index.
        • No efficient index for join access
          • (the join predicate column needs to be the leading column)
        • No efficient index for outer access
          • The leading column of the index qualifies ALL rows
  • 23. Overlay table size
    • Table: SETL_TRANS CARDF 1,600,254 NPAGES 21,627
    • INDEX IXSTRN01
    • ( PROCESS_DT , CLR_CYCLE_CD , ADV_ABA_R , TYPE_CD , ACCT_NUM , STLMT_DT )
    • TABLE: BRANCH CARDF 31,696 NPAGES 1132
    • INDEX: IXBRNC02
    • ( CUST_INACTV_DT, CUST_EFCT_DT )
    • INDEX: IXBRNC01
    • ( ACCT_NUM , CUST_EFCT_DT )
    • TABLE: BRANCH_ADDR CARDF 58,627 NPAGES 2791
    • INDEX: IXBRAD01
    • ( CUST_INACTV_DT, CUST_EFCT_DT )
    • INDEX: IXBRAD02
    • ( ACCT_NUM , ADDR_TYP_CD , CUST_EFCT_DT )
    • Key:
    • RED: Range predicate (stops matching)
    • BLUE: Join predicate
    • GREEN: Local equals predicate / in-list
    Biggest table, worst index options. Must scan 1.6 million rows!
  • 24. Possible new indexes
    • Existing index
    • IXSTRN01
    • ( PROCESS_DT , CLR_CYCLE_CD , ADV_ABA_R , TYPE_CD , ACCT_NUM , STLMT_DT )
    • Efficient outer table access
    • INDEX opt_1
    • ( ADV_ABA_R, STLMT_DT , ACCT_NUM )
    • Efficient inner table access:
    • INDEX opt_2
    • ( ACCT_NUM )
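The color key (RED range, BLUE join, GREEN equal or in-list) encodes the matching-column rule. A simplified sketch of that rule shows why IXSTRN01 matches only on PROCESS_DT, a range predicate that filters nothing here, while the proposed indexes match on the selective columns. It is an assumption-laden model: `matching_columns` is hypothetical, and it ignores any restrictions DB2 places on multiple matching IN-list predicates.

```python
EQUAL, IN_LIST, RANGE = "equal", "in-list", "range"


def matching_columns(index_key, predicates):
    """Count leading index columns that the given predicates can match."""
    matched = 0
    for column in index_key:
        kind = predicates.get(column)
        if kind is None:  # no usable predicate on this key column: matching stops
            break
        matched += 1
        if kind == RANGE:  # a range predicate matches, then stops further matching
            break
    return matched


# Local predicates on SETL_TRANS. ACCT_NUM gains an equal predicate only when
# SETL_TRANS is the inner table of the join (S.ACCT_NUM = CUST.ACCT_NUM).
OUTER = {"PROCESS_DT": RANGE, "CLR_CYCLE_CD": IN_LIST, "ADV_ABA_R": EQUAL,
         "TYPE_CD": IN_LIST, "STLMT_DT": EQUAL}
INNER = {**OUTER, "ACCT_NUM": EQUAL}

INDEXES = {
    "IXSTRN01": ["PROCESS_DT", "CLR_CYCLE_CD", "ADV_ABA_R", "TYPE_CD", "ACCT_NUM", "STLMT_DT"],
    "opt_1": ["ADV_ABA_R", "STLMT_DT", "ACCT_NUM"],
    "opt_2": ["ACCT_NUM"],
}

for name, key in INDEXES.items():
    print(f"{name:9} outer matchcols={matching_columns(key, OUTER)} "
          f"inner matchcols={matching_columns(key, INNER)}")
```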
  • 25. Summary of this SQL
    • Indexes on BRANCH, BRANCH_ADDR look better than they are
      • Range predicate with parameter marker estimates 3% of rows qualify
      • In reality, 99% qualify
    • Inefficient index available on SETL_TRANS table
      • No efficient outer table index available
      • No efficient inner table index available
      • This is the biggest table, with the best filter!!!
    • The optimizer chose a bad join method due to the combination of the above factors
      • It performed a full scan of the transaction index 26,000 times
    • Resolution:
      • Providing a new index on SETL_TRANS should provide more stable, faster access than ever before
      • REOPT, or providing literal values, avoids the disaster without a new index
  • 26. SQL 2
    • SELECT COLS
    • FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467
    • ,CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724
    • ,CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189
    • ,PARTS_PROD_ASMBLY D CARDF=7,058,356 QUALIFIED_ROWS=7,058,356 NPAGESF=68,644
    • ,PARTS_PROD_ASM_DTL E CARDF=21,366,326 QUALIFIED_ROWS=21,366,320 NPAGESF=1,236,490
    • WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005
    • AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2
    • AND A.PART_TYPE IN ('F', 'I', 'P') COLCARDF=8 MAX_FREQ=79.867% FF=0.958
    • AND B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004
    • AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y
    • AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y
    • AND C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018
    • AND B.CONTRACTOR_ID = C.CONTRACTOR_ID COLCARDF=1,047/316 FF=9.551E-4
    • AND D.PART_NUM = A.PART_NUM COLCARDF=6,132/17,598 FF=5.682E-5
    • AND C.PRODUCT_ID = D.PRODUCT_ID COLCARDF=1,391,650/7,058,356 FF=1.417E-7
    • AND C.PRODUCT_ID = E.PRODUCT_ID COLCARDF=1,391,650/21,366,326 FF=4.68E-8
    • AND E.PRODUCT_ID = D.PRODUCT_ID COLCARDF=21,366,326/7,058,356 FF=4.68E-8
  • 27. Local predicate analysis
    • SELECT COLS
    • FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467
    • ,CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724
    • ,CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189
    • ,PARTS_PROD_ASMBLY D CARDF=7,058,356 QUALIFIED_ROWS=7,058,356 NPAGESF=68,644
    • ,PARTS_PROD_ASM_DTL E CARDF=21,366,326 QUALIFIED_ROWS=21,366,320 NPAGESF=1,236,490
    • WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005  -- ??? 'FR' = 36.4%, 'GB' = 17%, 'DE' = 10%
    • AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2  -- ??? 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank < 1%
    • AND A.PART_TYPE IN ('F', 'I', 'P') COLCARDF=8 MAX_FREQ=79.867% FF=0.958  -- skewed, not selective
    • AND B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004
    • AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y  -- skewed, selective
    • AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y  -- skewed, not selective
    • AND C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018  -- skewed, selective
    • AND B.CONTRACTOR_ID = C.CONTRACTOR_ID COLCARDF=1,047/316 FF=9.551E-4
    • AND D.PART_NUM = A.PART_NUM COLCARDF=6,132/17,598 FF=5.682E-5
    • AND C.PRODUCT_ID = D.PRODUCT_ID COLCARDF=1,391,650/7,058,356 FF=1.417E-7
    • AND C.PRODUCT_ID = E.PRODUCT_ID COLCARDF=1,391,650/21,366,326 FF=4.68E-8
    • AND E.PRODUCT_ID = D.PRODUCT_ID COLCARDF=21,366,326/7,058,356 FF=4.68E-8
    • Both the ‘A’ and ‘B’ tables have selective predicates.
    • COUNTRY_CD and PART_CD predicates: there is skew, but the optimizer assumes a uniform distribution.
    • B.PART_NUM: slightly skewed. One value accounts for 3% of the rows; the uniform estimate is 0.4%.
    • PREFERRED: skewed; the query searches for an infrequently occurring value.
    • Without looking at indexes, it seems ‘A’ and ‘B’ will compete to be the outer table
      • Qualified rows of 67.1 and 77.8 are pretty close
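The skew comments can be quantified by comparing the uniform estimate 1/COLCARDF with MAX_FREQ, the share of the most frequent value. The sketch also checks that, for a two-valued column, the less frequent value's share (1 - MAX_FREQ) is consistent with the FF=0.126 shown for SUB_CONTRACTOR. Function names are hypothetical; the statistics are from the slide.

```python
def uniform_ff(colcardf: int) -> float:
    """Filter factor assumed for COL = ? without frequency statistics."""
    return 1 / colcardf


def skew_ratio(colcardf: int, max_freq: float) -> float:
    """How many times more rows the most frequent value returns than the uniform estimate."""
    return max_freq / uniform_ff(colcardf)


def less_frequent_ff(max_freq: float) -> float:
    """Share of the less frequent value in a two-valued column."""
    return 1 - max_freq


# column -> (COLCARDF, MAX_FREQ as a fraction), from the slide
SKEWED = {
    "A.COUNTRY_CD": (208, 0.36408),
    "A.PART_CD": (5, 0.47199),
    "B.PART_NUM": (260, 0.03032),
}

for column, (colcardf, max_freq) in SKEWED.items():
    print(f"{column:13} uniform {uniform_ff(colcardf):.2%}  max {max_freq:.2%}  "
          f"worst case {skew_ratio(colcardf, max_freq):.1f}x")

print(f"B.SUB_CONTRACTOR = 'Y': FF ~ {less_frequent_ff(0.87402):.3f}")
```

If a marker on COUNTRY_CD receives 'FR', roughly 76 times more rows qualify than the uniform estimate assumes.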
  • 28. Local index analysis – ‘A’
    • SELECT COLS
    • FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467
    • WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005  -- ??? 'FR' = 36.4%, 'GB' = 17%, 'DE' = 10%
    • AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2  -- ??? 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank < 1%
    • AND A.PART_TYPE IN ('F', 'I', 'P') COLCARDF=8 MAX_FREQ=79.867% FF=0.958  -- skewed, not selective
      INDEX    CLU UR NLEAF  NLEVEL CR    KEYCOLNAME      COLCARDF MCARDF
      IXPRT01  Y   U  151    3      0.999 PART_CD         5        5
                                          COUNTRY_CD      208      251
                                          FILE            2496     3054
                                          DR              46       3176
                                          SECTOR          178      3548
                                          PDV             16830    17598
      IXPRT02  N   D  128    2      0.794 PART_CD         5        5
                                          PART_TYPE       8        28
                                          PDV             16830    16850
                                          FILE            2496     16905
      IXPRT03  N   D  26     2      0.998 PART_TYPE       8        8
                                          PART_CD         5        28
                                          COUNTRY_CD      208      579
      IXPRT04  N   D  99     2      0.782 PART_TYPE       8        8
                                          PART_NUM        17598    17598
  • 29. Local index analysis – ‘A’
  • 30. Local index analysis B
    • SELECT COLS
    • FROM CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724
    • WHERE B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004
    • AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y  -- skewed, selective
    • AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y  -- skewed, not selective
      INDEX    CLU UR NLEAF  NLEVEL CR    KEYCOLNAME      COLCARDF MCARDF
      IXCTR01  Y   P  210    3      0.962 PART_NUM        260      278
                                          CONTRACTOR_ID   1047     34722
                                          CONT_TYPE       7        34728
      IXCTR02  N   D  50     2      0.624 PART_NUM        260      278
      IXCTR03  N   D  56     2      0.348 BEGIN_DT        1015     1015
                                          CONTRACTOR_ID   1047     2555
      IXCTR04  N   D  316    3      0.927 CONTRACTOR_ID   1047     1047
                                          PART_NUM        260      34722
                                          BEGIN_DT        1015     34722
                                          END_DT          2656     34722
                                          CONT_TYPE       7        34728
      IXCTR05  N   D  250    3      0.896 CONTRACTOR_ID   1047     1047
                                          BEGIN_DT        1015     2555
                                          PART_NUM        260      34722
  • 31. Local index analysis B
    • Note: The SUB_CONTRACTOR predicate is selective because it searches for the least frequent value, but the column is not in any candidate index.
    • Otherwise, local index support looks good.
    • May be able to drop IXCTR02 with reverse index scan support.
    • SELECT COLS
    • FROM CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724
    • WHERE B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004
    • AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y  -- skewed, selective
    • AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y  -- skewed, not selective
      INDEX    CLU UR NLEAF  NLEVEL CR    KEYCOLNAME      COLCARDF MCARDF
      IXCTR01  Y   P  210    3      0.962 PART_NUM        260      278
                                          CONTRACTOR_ID   1047     34722
                                          CONT_TYPE       7        34728
      IXCTR02  N   D  50     2      0.624 PART_NUM        260      278
      IXCTR04  N   D  316    3      0.927 CONTRACTOR_ID   1047     1047
                                          PART_NUM        260      34722
                                          BEGIN_DT        1015     34722
                                          END_DT          2656     34722
                                          CONT_TYPE       7        34728
      IXCTR05  N   D  250    3      0.896 CONTRACTOR_ID   1047     1047
                                          BEGIN_DT        1015     2555
                                          PART_NUM        260      34722
  • 32. Local index analysis C
    • Table C
      • There is index support for local filtering.
      • Trailing join column (good)
    • SELECT COLS
    • FROM CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189
    • WHERE C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018  -- skewed, selective
      INDEX    CLU UR NLEAF  NLEVEL CR    KEYCOLNAME      COLCARDF MCARDF
      IXCPR04  N   D  15352  3      0.998 PREFERRED       3        3
                                          CONTRACTOR_ID   316      552
                                          PRODUCT_ID      1391650  1808887
  • 33. Indexes for local summary
    • Each table with local filtering had efficient indexes to support local filtering
      • Positives:
        • Efficient access paths exist.
      • Negatives:
        • Each table will compete to be the outer table
        • More “apparently efficient” choices put more stress on the optimizer and create more opportunity for an incorrect choice
  • 34. Join graph (tables C, B, D, E, A; edges B-C, A-D, C-D, C-E, D-E)
    • The two most selective tables, ‘A’ and ‘B’, are not joined directly
    • C, D, and E each join on the same column (PRODUCT_ID)
    • Shaping up as ‘A’ (67 qualified rows) as the outer table versus ‘B’ (77 qualified rows) as the outer table
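The join graph can be rebuilt from the join predicates of SQL 2. A short sketch (hypothetical helper names) builds an adjacency map and runs a breadth-first search for the shortest join path between the two most selective tables, confirming that ‘A’ and ‘B’ connect only through D and C.

```python
from collections import defaultdict, deque

# Join predicates of SQL 2 as (table, table, column), taken from the WHERE clause.
JOIN_PREDICATES = [
    ("B", "C", "CONTRACTOR_ID"),
    ("D", "A", "PART_NUM"),
    ("C", "D", "PRODUCT_ID"),
    ("C", "E", "PRODUCT_ID"),
    ("E", "D", "PRODUCT_ID"),
]


def build_join_graph(predicates):
    """Undirected adjacency map: table -> set of tables it joins to."""
    graph = defaultdict(set)
    for left, right, _column in predicates:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def join_path(graph, start, goal):
    """Shortest chain of joins from start to goal (breadth-first search)."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


graph = build_join_graph(JOIN_PREDICATES)
print(" - ".join(join_path(graph, "A", "B")))
```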
  • 35. Join considerations
    • Index support for certain join sequences
      • Are indexes available to support matching index access for the different desirable join sequences?
    • Join reduction / fan-out considerations
      • Consider expansion / contraction of result size through different join sequences
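The join filter factors on the SQL 2 slide are consistent with the uniform rule FF = 1 / max(COLCARDF of the two join columns); for example, 1/1,047 = 9.551E-4 and 1/17,598 = 5.682E-5. Multiplying qualified rows by that factor gives a rough expansion or contraction estimate for a join step. The sketch assumes uniform, independent distributions, so treat the results as order-of-magnitude only; the function names are hypothetical.

```python
def join_ff(colcardf_left: int, colcardf_right: int) -> float:
    """Uniform join filter factor: 1 / the larger COLCARDF of the two join columns."""
    return 1 / max(colcardf_left, colcardf_right)


def join_rows(outer_rows: float, inner_rows: float, ff: float) -> float:
    """Estimated composite size after joining the qualified rows of two tables."""
    return outer_rows * inner_rows * ff


# A (67.1 qualified rows) joined to D (all 7,058,356 rows qualify) on PART_NUM
a_to_d = join_rows(67.1, 7_058_356, join_ff(6_132, 17_598))
# B (77.8 qualified rows) joined to C (38,382 qualified rows) on CONTRACTOR_ID
b_to_c = join_rows(77.8, 38_382, join_ff(1_047, 316))

print(f"A -> D: ~{a_to_d:,.0f} rows (fan-out)")
print(f"B -> C: ~{b_to_c:,.0f} rows")
```

Under these assumptions, starting from ‘A’ fans out to roughly 27,000 rows at ‘D’, while ‘B’ to ‘C’ grows to under 3,000.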
  • 36. Join indexes A
    • SELECT COLS
    • FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467
    • ,CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724
    • ,CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189
    • ,PARTS_PROD_ASMBLY D CARDF=7,058,356 QUALIFIED_ROWS=7,058,356 NPAGESF=68,644
    • ,PARTS_PROD_ASM_DTL E CARDF=21,366,326 QUALIFIED_ROWS=21,366,320 NPAGESF=1,236,490
    • WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005  -- 'FR' = 36.4%, 'GB' = 17%, 'DE' = 10%
    • AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2  -- 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank < 1%
    • AND A.PART_TYPE IN ('F', 'I', 'P') COLCARDF=8 MAX_FREQ=79.867% FF=0.958
    • AND B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004
    • AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y
    • AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y
    • AND C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018
    • AND B.CONTRACTOR_ID = C.CONTRACTOR_ID COLCARDF=1,047/316 FF=9.551E-4
    • AND D.PART_NUM = A.PART_NUM COLCARDF=6,132/17,598 FF=5.682E-5
    • AND C.PRODUCT_ID = D.PRODUCT_ID COLCARDF=1,391,650/7,058,356 FF=1.417E-7
    • AND C.PRODUCT_ID = E.PRODUCT_ID COLCARDF=1,391,650/21,366,326 FF=4.68E-8
    • AND E.PRODUCT_ID = D.PRODUCT_ID COLCARDF=21,366,326/7,058,356 FF=4.68E-8
      INDEX    CLU UR NLEAF  NLEVEL CR    KEYCOLNAME      COLCARDF MCARDF
      IXPRT01  N   P  83     2      0.782 PART_NUM        17598    17598
      IXPRT02  N   D  112    2      0.782 PART_NUM        17598    17598
                                          PART_TYPE       8        17598
                                          PART_CD         5        17598
      IXPRT04  N   D  99     2      0.782 PART_TYPE       8        8
                                          PART_NUM        17598    17598
      IXPRTxx  N   D  122    2      0.782 PART_NUM        17598    17603
                                          PART_TYPE       8        17598
                                          PART_CD         5        -1
                                          COUNTRY_CD      208      17603
  • 37. Join indexes A
    • Join access is available only through the ‘D’ table
      • Via PART_NUM if ‘D’ is the outer
    • There are multiple indexes to support ‘A’ as inner
      • IXPRT02 and IXPRTxx appear redundant
      • IXPRTxx is a superset of IXPRT02, with the same column sequence
      INDEX    CLU UR NLEAF  NLEVEL CR    KEYCOLNAME      COLCARDF MCARDF
      IXPRT01  N   P  83     2      0.782 PART_NUM        17598    17598
      IXPRT02  N   D  112    2      0.782 PART_NUM        17598    17598
                                          PART_TYPE       8        17598
                                          PART_CD         5        17598
      IXPRT04  N   D  99     2      0.782 PART_TYPE       8        8
                                          PART_NUM        17598    17598
      IXPRTxx  N   D  122    2      0.782 PART_NUM        17598    17603
                                          PART_TYPE       8        17598
                                          PART_CD         5        -1
                                          COUNTRY_CD      208      17603
  • 38. Join indexes B
    • SELECT COLS
    • FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467
    • ,CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724
    • ,CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189
    • ,PARTS_PROD_ASMBLY D CARDF=7,058,356 QUALIFIED_ROWS=7,058,356 NPAGESF=68,644
    • ,PARTS_PROD_ASM_DTL E CARDF=21,366,326 QUALIFIED_ROWS=21,366,320 NPAGESF=1,236,490
    • WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005  -- 'FR' = 36.4%, 'GB' = 17%, 'DE' = 10%
    • AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2  -- 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank < 1%
    • AND A.PART_TYPE IN ('F', 'I', 'P') COLCARDF=8 MAX_FREQ=79.867% FF=0.958
    • AND B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004
    • AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y
    • AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y
    • AND C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018
    • AND B.CONTRACTOR_ID = C.CONTRACTOR_ID COLCARDF=1,047/316 FF=9.551E-4
    • AND D.PART_NUM = A.PART_NUM COLCARDF=6,132/17,598 FF=5.682E-5
    • AND C.PRODUCT_ID = D.PRODUCT_ID COLCARDF=1,391,650/7,058,356 FF=1.417E-7
    • AND C.PRODUCT_ID = E.PRODUCT_ID COLCARDF=1,391,650/21,366,326 FF=4.68E-8
    • AND E.PRODUCT_ID = D.PRODUCT_ID COLCARDF=21,366,326/7,058,356 FF=4.68E-8
      INDEX    CLU UR NLEAF  NLEVEL CR    KEYCOLNAME      COLCARDF MCARDF
      IXCTR01  Y   P  210    3      0.962 PART_NUM        260      278
                                          CONTRACTOR_ID   1047     34722
                                          CONT_TYPE       7        34728
      IXCTR04  N   D  316    3      0.927 CONTRACTOR_ID   1047     1047
                                          PART_NUM        260      34722
                                          BEGIN_DT        1015     34722
                                          END_DT          2656     34722
                                          CONT_TYPE       7        34728
      IXCTR05  N   D  250    3      0.896 CONTRACTOR_ID   1047     1047
                                          BEGIN_DT        1015     2555
                                          PART_NUM        260      34722
  • 39. Join indexes B
    • Join access is available only through the ‘C’ table
      • Via CONTRACTOR_ID if ‘C’ is the outer
    • There are multiple indexes to support ‘B’ as inner
      • IXCTR01 has PART_NUM as leading local
        • Join from outer will hit far fewer leaf pages due to leading local predicate
        • Smaller “swath” of leaf pages: NLEAF * (1 / COLCARDF of PART_NUM)
        • 210 * (1/260) ~= 1 leaf page
        • Makes this index “outstanding” from an inner index access perspective
        • Also an effective “outer” index, since it provides good local filtering and join ordering for a join to the ‘C’ table as inner
      • IXCTR04 and IXCTR05 lead with the join predicate column
        • Support the join effectively
        • Join probes are scattered over all leaf pages
      INDEX    CLU UR NLEAF  NLEVEL CR    KEYCOLNAME      COLCARDF MCARDF
      IXCTR01  Y   P  210    3      0.962 PART_NUM        260      278
                                          CONTRACTOR_ID   1047     34722
                                          CONT_TYPE       7        34728
      IXCTR04  N   D  316    3      0.927 CONTRACTOR_ID   1047     1047
                                          PART_NUM        260      34722
                                          BEGIN_DT        1015     34722
                                          END_DT          2656     34722
                                          CONT_TYPE       7        34728
      IXCTR05  N   D  250    3      0.896 CONTRACTOR_ID   1047     1047
                                          BEGIN_DT        1015     2555
                                          PART_NUM        260      34722
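The leaf-page “swath” estimate is NLEAF times the filter factor of the leading equal predicate, with at least one page always read. A sketch assuming key values are spread evenly over the leaf pages; `leaf_pages_touched` is a hypothetical name. It reproduces the ~1 page for IXCTR01 and applies the same arithmetic to IXCPR04 on table C (NLEAF 15,352, leading PREFERRED = 'Y' with FF 0.018), which stays under 2% of that index's leaf pages.

```python
import math


def leaf_pages_touched(nleaf: int, leading_ff: float) -> int:
    """Approximate leaf pages in the range bounded by a leading equal predicate.

    Assumes key values are evenly spread over the leaf pages; at least one page is read.
    """
    return max(1, math.ceil(nleaf * leading_ff))


# IXCTR01: NLEAF 210, leading PART_NUM = ? with COLCARDF 260
print(leaf_pages_touched(210, 1 / 260))
# IXCPR04: NLEAF 15,352, leading PREFERRED = 'Y' with FF 0.018
print(leaf_pages_touched(15_352, 0.018))
```

By contrast, an index that leads with the join column (IXCTR04, IXCTR05) spreads the join probes over all of its 316 or 250 leaf pages.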
  • 40. Join indexes C
    • SELECT COLS
    • FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467
    • ,CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724
    • ,CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189
    • ,PARTS_PROD_ASMBLY D CARDF=7,058,356 QUALIFIED_ROWS=7,058,356 NPAGESF=68,644
    • ,PARTS_PROD_ASM_DTL E CARDF=21,366,326 QUALIFIED_ROWS=21,366,320 NPAGESF=1,236,490
    • WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005  -- 'FR' = 36.4%, 'GB' = 17%, 'DE' = 10%
    • AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2  -- 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank < 1%
    • AND A.PART_TYPE IN ('F', 'I', 'P') COLCARDF=8 MAX_FREQ=79.867% FF=0.958
    • AND B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004
    • AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y
    • AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y
    • AND C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018
    • AND B.CONTRACTOR_ID = C.CONTRACTOR_ID COLCARDF=1,047/316 FF=9.551E-4
    • AND D.PART_NUM = A.PART_NUM COLCARDF=6,132/17,598 FF=5.682E-5
    • AND C.PRODUCT_ID = D.PRODUCT_ID COLCARDF=1,391,650/7,058,356 FF=1.417E-7
    • AND C.PRODUCT_ID = E.PRODUCT_ID COLCARDF=1,391,650/21,366,326 FF=4.68E-8
    • AND E.PRODUCT_ID = D.PRODUCT_ID COLCARDF=21,366,326/7,058,356 FF=4.68E-8
      INDEX    CLU UR NLEAF  NLEVEL CR    KEYCOLNAME      COLCARDF MCARDF
      IXCPR01  Y   U  21367  3      1.0   PRODUCT_ID      1391650  1391650
                                          CONTRACTOR_ID   316      1794093
                                          CO_DTHRCONTACT  1645213  2093750
      IXCPR02  N   D  14771  3      0.999 CONTRACTOR_ID   316      316
                                          PRODUCT_ID      1391650  1794093
      IXCPR03  N   D  16188  3      0.998 CONTRACTOR_ID   316      316
                                          CO_PHASECONTACT 4        783
                                          PRODUCT_ID      1391650  1931232
      IXCPR04  N   D  15352  3      0.998 PREFERRED       3        3
                                          CONTRACTOR_ID   316      552
                                          PRODUCT_ID      1391650  1808887
  • 41. Join indexes C
    • Join access is available through the ‘B’, ‘D’, and ‘E’ tables
      • Via CONTRACTOR_ID if ‘B’ is in the outer composite
      • Via PRODUCT_ID if ‘D’ or ‘E’ is in the outer composite
    • There is index support for either join sequence.
      • IXCPR01 has PRODUCT_ID as the leading column to support ‘D’ or ‘E’ in the outer composite
      • IXCPR02 and IXCPR03 have CONTRACTOR_ID as the leading join column if ‘B’ is in the outer composite
        • IXCPR03 would also be a candidate if B were Cartesian-joined with D or E, though I don’t think that is likely.
      • IXCPR04 would likely be the preferred index if ‘B’ were in the outer composite
        • The selective leading local predicate on PREFERRED bounds the leaf pages that would be hit to < 2% of all leaf pages
        • Makes ‘C’ a possible efficient outer: good local filtering, and it provides join ordering for the join to the ‘B’ table
      INDEX    CLU UR NLEAF  NLEVEL CR    KEYCOLNAME      COLCARDF MCARDF
      IXCPR01  Y   U  21367  3      1.0   PRODUCT_ID      1391650  1391650
                                          CONTRACTOR_ID   316      1794093
                                          CO_DTHRCONTACT  1645213  2093750
      IXCPR02  N   D  14771  3      0.999 CONTRACTOR_ID   316      316
                                          PRODUCT_ID      1391650  1794093
      IXCPR03  N   D  16188  3      0.998 CONTRACTOR_ID   316      316
                                          CO_PHASECONTACT 4        783
                                          PRODUCT_ID      1391650  1931232
      IXCPR04  N   D  15352  3      0.998 PREFERRED       3        3
                                          CONTRACTOR_ID   316      552
                                          PRODUCT_ID      1391650  1808887
  • 42. Join indexes D
    • SELECT COLS
    • FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467
    • ,CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724
    • ,CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189
    • ,PARTS_PROD_ASMBLY D CARDF=7,058,356 QUALIFIED_ROWS=7,058,356 NPAGESF=68,644
    • ,PARTS_PROD_ASM_DTL E CARDF=21,366,326 QUALIFIED_ROWS=21,366,320 NPAGESF=1,236,490
    • WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005  -- 'FR' = 36.4%, 'GB' = 17%, 'DE' = 10%
    • AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2  -- 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank < 1%
    • AND A.PART_TYPE IN ('F', 'I', 'P') COLCARDF=8 MAX_FREQ=79.867% FF=0.958
    • AND B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004
    • AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y
    • AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y
    • AND C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018
    • AND B.CONTRACTOR_ID = C.CONTRACTOR_ID COLCARDF=1,047/316 FF=9.551E-4
    • AND D.PART_NUM = A.PART_NUM COLCARDF=6,132/17,598 FF=5.682E-5
    • AND C.PRODUCT_ID = D.PRODUCT_ID COLCARDF=1,391,650/7,058,356 FF=1.417E-7
    • AND C.PRODUCT_ID = E.PRODUCT_ID COLCARDF=1,391,650/21,366,326 FF=4.68E-8
    • AND E.PRODUCT_ID = D.PRODUCT_ID COLCARDF=21,366,326/7,058,356 FF=4.68E-8
      INDEX    CLU UR NLEAF  NLEVEL CR    KEYCOLNAME      COLCARDF MCARDF
      IXPDA05  Y   P  44900  4      0.975 PRODUCT_ID      7058356  7058356
      IXPDA02  N   D  70586  4      0.868 PART_NUM        6132     6132
                                          PRODUCT_ID      7058356  7058356
      IXPDA06  N   D  66590  4      0.975 PRODUCT_ID      7058356  7058356
                                          PART_NUM        6132     7058356
  • 43. Join indexes D
    • ‘D’ is accessed from two directions
      • Via PART_NUM if ‘A’ is the outer
      • Via PRODUCT_ID if accessed through ‘C’ or ‘E’
    • Both join directions are supported by matching index access.
      • PART_NUM is the leading column of IXPDA02
      • PRODUCT_ID is the leading column of IXPDA05 and IXPDA06
    • The non-primary-key indexes are defined as allowing duplicates, but they can never contain any.
      • PRODUCT_ID is the primary key and is included in a unique index.
      • Any index that contains PRODUCT_ID is therefore unique in practice. Defining these indexes as unique would save some space, because indexes that allow duplicates have slightly larger control structures to accommodate duplicate RIDs.
      • DB2 must allow for duplicates if the index is not explicitly defined as unique, since the unique index could be dropped.
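The uniqueness argument can be checked mechanically: an index that allows duplicates (UR = D) but whose key contains every column of a unique or primary-key index can never hold duplicate keys. A minimal Python sketch, with hypothetical names (`Index`, `redefinable_as_unique`), applied to the table D index report:

```python
# Sketch (hypothetical helper): find indexes that allow duplicates (UR = 'D')
# even though their key already contains all columns of a unique index.
from dataclasses import dataclass


@dataclass(frozen=True)
class Index:
    name: str
    uniquerule: str              # 'P' primary key, 'U' unique, 'D' duplicates allowed
    key_columns: tuple[str, ...]


def redefinable_as_unique(indexes: list[Index]) -> list[str]:
    """Names of duplicate-allowing indexes whose keys include a unique key."""
    unique_keys = [set(ix.key_columns) for ix in indexes if ix.uniquerule in ("P", "U")]
    return [
        ix.name
        for ix in indexes
        if ix.uniquerule == "D"
        and any(key <= set(ix.key_columns) for key in unique_keys)
    ]


# Table D indexes, taken from the index report for table D.
D_INDEXES = [
    Index("IXPDA05", "P", ("PRODUCT_ID",)),
    Index("IXPDA02", "D", ("PART_NUM", "PRODUCT_ID")),
    Index("IXPDA06", "D", ("PRODUCT_ID", "PART_NUM")),
]

if __name__ == "__main__":
    print(redefinable_as_unique(D_INDEXES))
```

For table D it reports IXPDA02 and IXPDA06, the two non-primary-key indexes discussed above.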
  • 44. Join indexes E
    • Join access is available through the C and D tables
      • Both tables join to E on the PRODUCT_ID column
    • The join is supported by the IXPDA01 index
      • PRODUCT_ID is the only key column
      • Unique index (no fan-out when joining to this table)
    INDEX    CLU UR NLEAF  NLEVEL CR    KEYCOLNAME      COLCARDF  MCARDF
    IXPPA01  N   U  141499 4      0.609 PRODUCT_ID      21366326  21366326
  • 45. Join fan-out
    • Look at join fan-out issues
        • Qualified outer rows * (CARDF of inner / MAX(join COLCARDF))
      • A → D
        • 67.1 rows * (7,058,356 / 17,598) ≈ 27,000 rows
      • B → C or C → B
        • 77.8 rows * (2,093,750 / 1,047) ≈ 155,500 rows (after local filtering on C, down to 38K)
      • So B → C is expected to fan out far more.
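The fan-out rule of thumb above is plain arithmetic. This Python sketch (the function name `join_fanout` is hypothetical) reproduces both estimates from the annotated statistics; like the slide's formula, it does not apply local filtering on the inner table.

```python
# Sketch of the fan-out rule of thumb from the slide:
#   rows after join ~= qualified outer rows * (CARDF of inner / MAX(join COLCARDF))
# Local filtering on the inner table is not applied.


def join_fanout(outer_rows: float, inner_cardf: float, *join_colcardf: float) -> float:
    return outer_rows * inner_cardf / max(join_colcardf)


a_to_d = join_fanout(67.1, 7_058_356, 6_132, 17_598)   # A -> D on PART_NUM
b_to_c = join_fanout(77.8, 2_093_750, 1_047, 316)      # B -> C on CONTRACTOR_ID

print(f"A -> D: ~{a_to_d:,.0f} rows")
print(f"B -> C: ~{b_to_c:,.0f} rows")
```

Both results land just under the slide's rounded figures of about 27,000 and 155,500 rows.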
  • 46. Explain
    • Join sequence
      • Access ‘A’ via index IXPRT01 (PART_CD, COUNTRY_CD, …): ~67 rows
      • Nested loop join to ‘D’ using index IXPDA02 (PART_NUM, PRODUCT_ID): ~27,000 rows
      • Sort merge join to ‘C’
        • Sort the composite into PRODUCT_ID sequence
        • Access ‘C’ via IXCPR04 (PREFERRED, CONTRACTOR_ID)
        • Sort the new table into PRODUCT_ID sequence: ~7,900 rows
      • Nested loop join to ‘B’ via index IXCTR01: ~7,900 rows
        • (PART_NUM, CONTRACTOR_ID, CE_TYPE)
      • Nested loop join to ‘E’ via index IXPDA01: ~7,900 rows
        • (PRODUCT_ID)
    • Blue = local predicate
    • Green = join predicate
    PLANNO METHOD MERGE_COLS TB_NAME             ACCESS_TYPE ACCESS_NAME MATCH_COLS IX_ONLY SORTN_JOIN SORTC_JOIN
    1      0                 PART                I           IXPRT01     2          N       N          N
    2      1                 PARTS_PROD_ASSEMBLY I           IXPPA02     1          Y       N          N
    3      2      1          CONT_PARTS          I           IXPRD04     1          N       Y          Y
    4      1                 CONTRACTOR          I           IXCTR01     2          N       N          N
    5      1                 PARTS_PROD_ASM_DTL  I           IXPDA01     1          N       N          N
  • 47. Issues – A as outer?
    • Is the local filtering estimate for the ‘A’ table accurate?
      • There is skew, but the use of parameter markers prevents the optimizer from recognizing it
      • Qualified rows and fan-out could be much worse than estimated
      • The cost of ‘A’ as the outer could be underestimated, depending on which values are used
    • Sort merge join to ‘C’ to avoid 27K probes
      • The optimizer does not want to probe 27K times (matching index access plus fan-out on PRODUCT_ID)
      • It uses the efficient local-filtering index instead
        • 1 probe, then a scan of 38K rows via PREFERRED
        • versus 27K probes * 2 rows per inner via an index on PRODUCT_ID
      • An index on (PREFERRED, PRODUCT_ID) would likely avert the SMJ in this context
      • Hesitant to recommend that index, since A → D → C could be an inefficient sequence.
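The "2 rows per inner" figure is consistent with the statistics for C: CARDF / COLCARDF(PRODUCT_ID) = 2,093,750 / 1,391,650, about 1.5 rows per probe. The Python sketch below compares the rough work of the two join methods using only those counts. It is not DB2's cost model and ignores index levels, buffer hits and sort cost; the constant names are illustrative.

```python
# Rough comparison of the two ways to join to 'C' after A -> D, using only
# counts from the annotated statistics (not DB2's cost formula).

C_CARDF = 2_093_750
C_PRODUCT_ID_COLCARDF = 1_391_650
C_QUALIFIED_ROWS = 38_382     # rows of C surviving C.PREFERRED = 'Y'
COMPOSITE_ROWS = 27_000       # estimated rows after A -> D

# Nested loop join: one index probe on PRODUCT_ID per composite row,
# each returning about CARDF / COLCARDF rows of C.
rows_per_probe = C_CARDF / C_PRODUCT_ID_COLCARDF
nlj_probes = COMPOSITE_ROWS
nlj_rows = nlj_probes * rows_per_probe

# Sort merge join: a single probe via PREFERRED, then a scan of the qualified rows.
smj_probes = 1
smj_rows = C_QUALIFIED_ROWS

print(f"rows per PRODUCT_ID probe: {rows_per_probe:.2f}")
print(f"NLJ: {nlj_probes:,} probes, ~{nlj_rows:,.0f} rows of C")
print(f"SMJ: {smj_probes} probe, ~{smj_rows:,} rows of C")
```

The row counts are of the same order; the difference the slide points to is 27K index probes versus a single one.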
  • 48. Issues – B / C as outer?
    • B as outer
      • Less skew on B.PART_NUM = ?, so less uncertainty in the cost estimate
      • Fan-out to 38K rows is discouraging
      • B → C is supported by an efficient index matching both the local and the join equality predicates
        • (PREFERRED, CONTRACTOR_ID, PRODUCT_ID)
    • C is also a desirable outer
      • The index on (PREFERRED, CONTRACTOR_ID, PRODUCT_ID) provides good local filtering
      • B could be accessed via local filtering on B.PART_NUM = ?, materializing ~77 rows into a workfile for a sort merge join
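Whether a join sequence gets matching index access depends on which leading index columns have predicates available when the table is accessed. The simplified Python sketch below counts leading key columns covered by equality predicates only; DB2 also considers IN-lists and range predicates, which this sketch deliberately ignores. `matching_columns` is a hypothetical name.

```python
# Simplified sketch: count leading index key columns covered by equality
# predicates available when the table is accessed (IN-lists and range
# predicates, which DB2 also considers, are deliberately ignored).


def matching_columns(index_key: list[str], eq_columns: set[str]) -> int:
    count = 0
    for column in index_key:
        if column not in eq_columns:
            break  # matching stops at the first key column without a predicate
        count += 1
    return count


IXCPR04 = ["PREFERRED", "CONTRACTOR_ID", "PRODUCT_ID"]

# B -> C: C.PREFERRED = 'Y' plus join on CONTRACTOR_ID
print(matching_columns(IXCPR04, {"PREFERRED", "CONTRACTOR_ID"}))
# D -> C: C.PREFERRED = 'Y' plus join on PRODUCT_ID
print(matching_columns(IXCPR04, {"PREFERRED", "PRODUCT_ID"}))
# Hypothetical index on (PREFERRED, PRODUCT_ID) for A -> D -> C
print(matching_columns(["PREFERRED", "PRODUCT_ID"], {"PREFERRED", "PRODUCT_ID"}))
```

For IXCPR04, B → C matches two columns while D → C matches only one, because CONTRACTOR_ID sits between PREFERRED and PRODUCT_ID. The (PREFERRED, PRODUCT_ID) index considered for A → D → C would match two.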
  • 49. Summary Query 2
    • Bottom line:
      • Uniform distribution estimate on ‘A’ table allows it to compete very favorably.
      • If the ‘FR’, ‘GB’, or ‘DE’ values are used for COUNTRY_CD, ‘A’ as the outer is no longer desirable.
        • Are the ‘FR’, ‘GB’, ‘DE’ values frequently used for this query?
      • If the PART_CD = ‘4’ value is used frequently, ‘A’ as the outer is no longer desirable.
        • Is ‘4’ used frequently?
      • Split query, REOPT, OPTHINTS…
    • Multiple choices
      • Local filtering spread across several tables
      • Estimated filtering looks good
      • Efficient access paths (indexes supporting both local and join predicates) exist
      • More difficult for the optimizer to identify the cheapest path
      • The scenario is more regression-prone
      • The optimizer may need more statistics, and the ability to use them (REOPT), to identify the cheapest path
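How far the uniform estimate for ‘A’ can be off follows from the frequency statistics annotated on the query. This Python sketch divides each listed value's frequency by the uniform filter factor (1/COLCARDF) that a parameter marker forces. It treats the two columns separately and makes no assumption about correlation between them; `skew_ratios` is a hypothetical name.

```python
# Sketch: how far frequent values on 'A' exceed the uniform estimate
# (1 / COLCARDF) that a parameter marker forces. Each column is treated
# separately; no assumption is made about correlation between them.

COUNTRY_CD_COLCARDF = 208
PART_CD_COLCARDF = 5

# Frequencies copied from the annotated query.
COUNTRY_CD_FREQ = {"FR": 0.364, "GB": 0.17, "DE": 0.10}
PART_CD_FREQ = {"4": 0.47, "2": 0.27, "6": 0.17, "1": 0.08}


def skew_ratios(freqs: dict[str, float], colcardf: int) -> dict[str, float]:
    """Actual frequency divided by the uniform filter factor 1 / COLCARDF."""
    uniform_ff = 1.0 / colcardf
    return {value: freq / uniform_ff for value, freq in freqs.items()}


country_ratios = skew_ratios(COUNTRY_CD_FREQ, COUNTRY_CD_COLCARDF)
part_cd_ratios = skew_ratios(PART_CD_FREQ, PART_CD_COLCARDF)

for value, ratio in country_ratios.items():
    print(f"COUNTRY_CD = '{value}': {ratio:.1f}x the uniform estimate")
for value, ratio in part_cd_ratios.items():
    print(f"PART_CD = '{value}': {ratio:.2f}x the uniform estimate")
```

'FR' qualifies roughly 76 times as many rows as the uniform estimate assumes, and PART_CD = '4' about 2.35 times, which is why those values undercut ‘A’ as the outer.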
  • 50. Commentary
    • How to perform SQL analysis
      • Format query so it’s readable
      • Annotate with important statistics
        • Tables:
          • Table cardinality, NPAGES, qualified number of rows
        • Predicates
          • COLCARDF, LOW2KEY, HIGH2KEY, filter factor estimate
        • Are table-level estimates reasonable based on your knowledge?
          • If you don’t know, perform counts to find out whether the estimates are accurate
          • If you don’t know how selective things are, how will you know what the best path should be?
        • Are predicate-level filtering estimates reasonable?
      • Refer to the table, index, and indexed-columns report
        • Is the best local filtering supported through matching index access?
        • Is any mis-estimated local filtering also matching-indexable? (It may make one path look far more efficient than it really is.)
        • Ideally with trailing join-predicate columns that provide ordering for the next desired table (bonus)
        • Is there adequate (matching) index support for the desired join sequences?
      • Develop understanding of “plausible” and “desirable” access paths
      • Examine EXPLAIN output
        • Does optimizer choose the path you expect?
        • If not, you should have a better understanding of what makes other access paths competitive, so tuning can be more targeted
          • E.g., a predicate appears to filter, but does not.
          • You can use REOPT, or a trick targeted to solve the specific problem.
        • Skilled, targeted tuning is less likely to regress again than blind tuning (where the problem is not understood)
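The "perform counts" step can be scripted: compare each estimated qualified-row count with an actual count from a representative query, and flag large gaps. The Python sketch below is hypothetical throughout. The 10x threshold is an arbitrary illustration rather than a DB2 rule, and the actual counts in the usage are invented, not measurements from this query.

```python
# Hypothetical helper: flag tables whose actual qualified-row count differs
# from the optimizer's estimate by at least `threshold` times in either
# direction. The 10x default is an arbitrary illustration, not a DB2 rule.


def misestimates(estimated: dict[str, float], actual: dict[str, int],
                 threshold: float = 10.0) -> dict[str, float]:
    flagged = {}
    for table, est in estimated.items():
        # Guard against zero counts; ratio > 1 means the optimizer underestimated.
        ratio = max(actual[table], 1) / max(est, 1.0)
        if ratio >= threshold or ratio <= 1.0 / threshold:
            flagged[table] = ratio
    return flagged


# Estimates from the annotated query; the actual counts are invented.
ESTIMATED = {"A": 67.1, "B": 77.8, "C": 38_382}
ACTUAL_HYPOTHETICAL = {"A": 12_000, "B": 80, "C": 40_000}

print(misestimates(ESTIMATED, ACTUAL_HYPOTHETICAL))
```

With these made-up counts only ‘A’ is flagged, the kind of mis-estimate that makes one access path look far cheaper than it is.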