Troubleshooting
  • Sometimes we need to know whether one or more rows exist for a particular WHERE condition, but we don’t actually need the data returned from the query. For example, select customers who have at least one order over $1000. You don’t need to know about the individual orders, only that at least one exists for an amount over $1000. IDUG stands for International DB2 Users Group.
  • It seems logical to find out whether data exists by counting the rows to see if the result is > zero. To count, DB2 has to actually access each row, or use an index.
  • Selecting 1 is just a way to avoid actually getting data out of the table. I saw a nice example where the developer selected ‘Y’ into :WS-ROW-EXISTS.
  • If there are one million rows that qualify against the where clause in the non-correlated subquery, then the subquery will read one million rows before returning the result of TRUE to the outer query. Since the inner query is executed first, DB2 does not know you are testing existence until the inner query is finished and it moves on to the outer query.
  • The correlated subquery needs some piece of information from the outer query in order to process the inner one. This example is a fictional table…
  • Remember SYSDUMMY1 from a previous class? Its only column is IBMREQD. What if the subquery only has to match one row using a unique index? Then both queries would take the same amount of time to execute, and the SELECT COUNT(*) would only count 1 row as well. However, it is always good programming practice to code for efficient access. Someone will undoubtedly clone the SQL in the future for another use. We could use our own dummy table, M7535DB1.DUMMY, the same way. It has two columns, dummy and ssid, and you could use either one.
  • I selected the current timestamp before and after my query. In the absence codes array, I’m checking for three periods of absence in a row. I wanted a search that would find many rows and not use an index. When I tried this with a query that only found one row, and with a query that used an index, the times were much closer together.
  • I did these queries all at once, with timestamps in between, so the ending timestamp from one query is the beginning of the next.
  • This was such a dramatic improvement that I was very impressed. I thought the class would end here, until I read further in the article.
  • List prefetch – if your data is clustered and you are using a non-clustering index, DB2 might sort your access requests by the clustering index before accessing the pages. Sequential prefetch – reads many pages at once in large chunks of data. I didn’t believe this until I tried it.
  • You can say optimize for as many rows as you expect to fetch. For example, in an online screen, 15 or so rows fit on one screen. If the user presses PF7 or PF8 to scroll, you open a new cursor. Look at the sample program and results.
  • Not only did the optimized cursor prove to be the fastest, even a plain old non-optimized cursor was faster than the correlated subquery with exists. In fact, when I reversed the order, so the non-optimized cursor was executed after the optimized cursor, it was faster. This is not nearly as fancy and high-tech sounding, and it requires the extra code to open and close the cursor. I would probably use the where exists to avoid the extra code, but if you are trying to squeeze every microsecond from the time, try this.
  • The quote in the last bullet is from the authors of my source article. I will try this out when we migrate to V7!

Transcript

  • 1. Troubleshooting Pam Odden
  • 2. Objectives
    • Learn how to use a correlated subquery to check for existence
    • Compare methods of checking existence for speed
    • The information for this class was taken from an article in the IDUG Solutions Journal Volume 8, Number 3, 2001; Existence Checking: The Real Story, by Richard Yevich and Susan Lawson.
  • 3. How NOT to Check for Existence
    • SELECT COUNT(*) FROM MYTABLE WHERE…
    • This is often the first way of checking for existence that comes to mind. However, you only need to know if at least one row exists; there is no reason to retrieve and count all occurrences.
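To make the point concrete, here is a minimal sketch in Python using an in-memory SQLite database as a stand-in for DB2. The `orders` table, its columns, and both function names are hypothetical, invented for illustration. It only shows that counting and existence testing answer the same yes/no question; it says nothing about how DB2 itself costs either statement.

```python
import sqlite3

# In-memory SQLite database standing in for DB2 (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 250.0), (1, 1500.0), (1, 2200.0), (2, 80.0)],
)


def exists_by_count(customer_id):
    # Counts every qualifying row, then compares the total with zero.
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE customer_id = ? AND amount > 1000",
        (customer_id,),
    ).fetchone()
    return n > 0


def exists_by_exists(customer_id):
    # Asks only whether at least one qualifying row exists.
    (flag,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM orders"
        " WHERE customer_id = ? AND amount > 1000)",
        (customer_id,),
    ).fetchone()
    return bool(flag)
```

Both functions return the same answer for every customer; the count version simply does more work than the question requires when many rows qualify.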
  • 4. Another Way NOT to Check for Existence
    • SELECT 1 FROM SYSIBM.SYSDUMMY1
    • WHERE EXISTS
    • (SELECT 1 FROM MYTABLE WHERE…)
    • It is a common misunderstanding of the EXISTS clause that the subquery will terminate as soon as it finds the existence of a row; that it doesn’t matter whether one row or one million rows match.
    • This is only true if the subquery is correlated. If it is not, its performance can actually be marginally worse than the SELECT COUNT(*), because it incurs the cost of accessing SYSIBM.SYSDUMMY1 in addition to accessing all the rows matching the WHERE clause.
  • 5. Non-correlated Subqueries
    • A non-correlated subquery is one that has no correlation, or reference, to the outer query. For example:
      • SELECT STUDENT_NAME FROM STUDENT_TABLE
      • WHERE SCHOOLNUM =
      • (SELECT SCHL_NO FROM SCHOOL_NAME_TABLE
      • WHERE NAME = 'PALO VERDE');
    • No information is needed from the outer query in order to process the inner query. DB2 processes the inner query first because the outer query is dependent on the result.
    • DB2 takes the following steps to process a non-correlated subquery:
      • Access the inner table, using either a tablespace scan or an index.
      • Sort the results and remove duplicates.
      • Place the results in an intermediate table.
      • Access the outer table, comparing all qualifying rows to those in the intermediate table for a match.
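The four steps above can be sketched as a small Python simulation. This is a toy model of the processing order described on the slide, not DB2's actual implementation; the sample rows and the function name `run_non_correlated` are hypothetical.

```python
# Hypothetical sample rows standing in for the two tables on the slide.
school_name_table = [
    {"SCHL_NO": "201", "NAME": "PALO VERDE"},
    {"SCHL_NO": "351", "NAME": "DESERT PINES"},
]
student_table = [
    {"STUDENT_NAME": "ANN", "SCHOOLNUM": "201"},
    {"STUDENT_NAME": "BOB", "SCHOOLNUM": "351"},
    {"STUDENT_NAME": "CAL", "SCHOOLNUM": "201"},
]


def run_non_correlated(students, schools, school_name):
    # Step 1: access the inner table (a full scan here).
    inner_rows = [s["SCHL_NO"] for s in schools if s["NAME"] == school_name]
    # Steps 2 and 3: sort, remove duplicates, keep as an intermediate table.
    intermediate = sorted(set(inner_rows))
    # Step 4: access the outer table and compare each row to the
    # intermediate table.
    return [
        st["STUDENT_NAME"] for st in students if st["SCHOOLNUM"] in intermediate
    ]
```

Note that the inner query is fully evaluated before a single outer row is looked at, which is exactly why a non-correlated EXISTS cannot stop early.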
  • 6. Correlated Subqueries
    • A correlated subquery is one that does reference the outer query. For example:
      • SELECT STUDENT_NAME, CUM FROM STUDENT_TABLE A
      • INNER JOIN GRADE_TABLE B
      • ON A.STUDENT_ID = B.STUDENT_ID
      • WHERE CUM =
      • (SELECT MAX(CUM) FROM GRADE_TABLE C
      • WHERE A.STUDENT_ID = C.STUDENT_ID);
    • Here, it is not possible to execute the inner query first since a value from the outer query is not yet known.
    • DB2 takes the following steps to process a correlated subquery:
      • Access the outer table, using either a tablespace scan or an index.
      • For each qualifying outer table row, evaluate the subquery.
      • Pass the results to the outer query one row at a time.
      • Evaluate the outer query using the inner query results, one row at a time.
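The same kind of toy simulation works for the correlated case. Again this is only a sketch of the processing order listed above, with hypothetical sample rows and a hypothetical function name; it also counts how many times the subquery is evaluated.

```python
# Hypothetical sample rows for the fictional tables on the slide.
student_table = [
    {"STUDENT_ID": 1, "STUDENT_NAME": "ANN"},
    {"STUDENT_ID": 2, "STUDENT_NAME": "BOB"},
]
grade_table = [
    {"STUDENT_ID": 1, "CUM": 3.2},
    {"STUDENT_ID": 1, "CUM": 3.6},
    {"STUDENT_ID": 2, "CUM": 2.9},
    {"STUDENT_ID": 2, "CUM": 2.5},
]


def run_correlated(students, grades):
    result = []
    subquery_runs = 0
    # Step 1: access the outer rows (STUDENT_TABLE A joined to GRADE_TABLE B).
    for a in students:
        for b in grades:
            if a["STUDENT_ID"] != b["STUDENT_ID"]:
                continue
            # Step 2: evaluate the subquery for this outer row; it needs
            # A.STUDENT_ID, so it cannot run before the outer row is known.
            subquery_runs += 1
            max_cum = max(
                c["CUM"] for c in grades if c["STUDENT_ID"] == a["STUDENT_ID"]
            )
            # Steps 3 and 4: pass the result back and test the outer predicate.
            if b["CUM"] == max_cum:
                result.append((a["STUDENT_NAME"], b["CUM"]))
    return result, subquery_runs
```

With the sample rows, the subquery runs once per qualifying outer row (four times), and one row per student survives the `WHERE CUM =` test.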
  • 7. Make our non-correlated subquery correlated
    • SELECT 1 FROM SYSIBM.SYSDUMMY1 X
    • WHERE EXISTS
    • (SELECT 1 FROM MYTABLE WHERE…
    • AND X.IBMREQD = X.IBMREQD )
    • The column IBMREQD is the only column of the SYSDUMMY1 table. By coding this in an “always true” predicate in the subquery, the subquery will not be executed until the first row of SYSDUMMY1 is retrieved.
    • Now the EXISTS clause will execute exactly as we expected. SYSDUMMY1 has only 1 row, which will be retrieved, and the correlated subquery will then be executed. With the EXISTS clause, as soon as one match is found, the search will terminate and return TRUE to the outer query.
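The difference between the two behaviours can be illustrated with a short Python sketch that counts rows read. It models only the behaviour described in these slides (finish the inner query first versus stop at the first match); the function names and the sample absence codes are hypothetical.

```python
def scan(rows, predicate, counter):
    # Generator standing in for a table scan; counts every row it reads.
    for row in rows:
        counter["rows_read"] += 1
        if predicate(row):
            yield row


def exists_materialized(rows, predicate, counter):
    # Non-correlated behaviour as described above: the inner query runs to
    # completion before the outer query tests for existence.
    matches = list(scan(rows, predicate, counter))
    return len(matches) > 0


def exists_short_circuit(rows, predicate, counter):
    # Correlated behaviour as described above: stop at the first match.
    return any(True for _ in scan(rows, predicate, counter))


def has_three_absences(codes):
    # Mirrors the LIKE '%AAA%' test used in the timed examples.
    return "AAA" in codes


# 1000 hypothetical rows; the first match is the second row.
absence_rows = ["ABA", "AAA", "BAB"] + ["AAA"] * 997

full = {"rows_read": 0}
early = {"rows_read": 0}
found_full = exists_materialized(absence_rows, has_three_absences, full)
found_early = exists_short_circuit(absence_rows, has_three_absences, early)
```

Both calls report that a row exists, but the materialized version reads all 1000 rows while the short-circuit version stops after the second.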
  • 8. Troubleshooting Example
    • BIND PACKAGE(CS1020BA) MEMBER(S1020072) CURRENTDATA(NO) DEGREE(1) ISO(CS) VALIDATE(BIND)
    • DSNX200I # BIND SQL ERROR
    • USING PJO AUTHORITY
    • PLAN=(NOT APPLICABLE)
    • DBRM=S1020072
    • STATEMENT=786
    • SQLCODE=-206
    • SQLSTATE=42703
    • TOKENS=CARD-SCHOOLYR
    • CSECT NAME=DSNXORSO
    • RDS CODE=-100
    • DSNT233I # UNSUCCESSFUL BIND FOR
    • PACKAGE = CCSDTSN.CS1020BA.S1020072.()
  • 9. Troubleshooting Example
    • DECLARE STATS CURSOR FOR
    • SELECT STATS.SCHOOLYR
    • , STATS.FINAL_IND
    • , STATS.SUB_DISTRICT
    • , STATS.SCHOOLNUM
    • , STATS.RPTING_TRACK
    • , STATS.GRADE
    • , SUM(STATS.PRESENT)
    • , SUM(STATS.ABSENT)
    • , ASCH.SCHOOLNAME
    • FROM SSTATDB1.MONTHLY_STATS STATS
    • INNER JOIN SSASIDB1.ASCH_SCHOOL ASCH
    • ON STATS.SCHOOLNUM = ASCH.SCHOOLNUM
    • WHERE SCHOOLYR = CARD-SCHOOLYR
    • AND GENDER = 'T'
    • AND FINAL_IND = 'F'
    • AND STATS.SCHOOLNUM IN ('201','351')
    • GROUP BY SCHOOLYR
    • , FINAL_IND
    • , STATS.SUB_DISTRICT
    • , STATS.SCHOOLNUM
    • , ASCH.SCHOOLNAME
    • , RPTING_TRACK
    • , GRADE
    • ORDER BY FINAL_IND
    • , STATS.SUB_DISTRICT
    • , STATS.SCHOOLNUM
    • , RPTING_TRACK
    • , GRADE
    • FOR READ ONLY
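SQLCODE -206 (SQLSTATE 42703) means DB2 could not resolve a name as a column in the context where it was used, and the token it reports here is CARD-SCHOOLYR. One plausible reading, which the slides do not state outright, is that CARD-SCHOOLYR is a COBOL host variable coded without its leading colon. The helper below is a rough, hypothetical heuristic written for this note: it flags hyphenated names that lack a colon prefix, since unquoted SQL column names cannot contain hyphens.

```python
import re


def find_unprefixed_host_variables(sql):
    # Drop quoted literals so text such as 'A-B' is not reported.
    without_literals = re.sub(r"'[^']*'", "''", sql)
    # A hyphenated name with no colon in front of it cannot be a plain
    # column name, so it is a candidate for a missing host-variable colon.
    pattern = r"(?<![:\w-])[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)+"
    return re.findall(pattern, without_literals)


failing = "WHERE SCHOOLYR = CARD-SCHOOLYR AND GENDER = 'T'"
repaired = "WHERE SCHOOLYR = :CARD-SCHOOLYR AND GENDER = 'T'"

suspects_before = find_unprefixed_host_variables(failing)
suspects_after = find_unprefixed_host_variables(repaired)
```

Being a heuristic, it will also flag legitimate hyphenated names such as a COBOL-style cursor name, or a subtraction written without spaces, so treat its output as a list of places to look rather than a list of errors.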
  • 10. Timed Example, Non-correlated subquery
    • Using a non-correlated subquery, this took 2 minutes and 2 seconds, even longer than select count(*)
    • select current timestamp from sysibm.sysdummy1;
    • --------+---------+---------+---------+---------+---------+
    • 2002-05-28-13.07.32.430477
    • select 1 from sysibm.sysdummy1
    • where exists
    • (select 1
    • from ssasidb1.aatp_period_attend
    • where absence_codes like '%AAA%');
    • --------+---------+---------+---------+---------+---------+
    • 1
    • select current timestamp from sysibm.sysdummy1;
    • ---------+---------+---------+---------+---------+---------+
    • 2002-05-28-13.09.34.706763
  • 11. Timed Example, Correlated subquery
    • Using a correlated subquery, this took less than 1 second
    • select current timestamp from sysibm.sysdummy1;
    • ---------+---------+---------+---------+---------+---------+
    • 2002-05-28-13.25.11.696250
    • select 1 from sysibm.sysdummy1 dummy
    • where exists
    • (select 1
    • from ssasidb1.aatp_period_attend
    • where absence_codes like '%AAA%'
    • and dummy.ibmreqd = dummy.ibmreqd);
    • ---------+---------+---------+---------+---------+----
    • 1
    • select current timestamp from sysibm.sysdummy1;
    • ---------+---------+---------+---------+---------+----
    • 2002-05-28-13.25.11.731353
  • 12. Optimize for One Row
    • The authors of my source article say, “For existence checking in V6 and earlier, the technique of coding a cursor with OPTIMIZE FOR ONE ROW, opening the cursor, and simply fetching one row, has generally proven to be the existence checking method that provides the best performance.”
    • Coding a cursor with the clause OPTIMIZE FOR ONE ROW tells DB2 that you only intend to fetch the first row, regardless of how many rows are returned.
    • The optimizer may disable such optimization features as List and Sequential Prefetch, depending on whether the access path can benefit from it.
    • Without this clause, the optimizer determines the most efficient access path for the full result set, and this usually is not the most efficient path for accessing just one row.
  • 13. Optimize for One Row, cont.
    • Example:
      • EXEC SQL
      • DECLARE CURS-OPT CURSOR FOR
      • SELECT 1
      • FROM SSASIDB1.AATP_PERIOD_ATTEND
      • WHERE ABSENCE_CODES LIKE '%AAA%'
      • OPTIMIZE FOR 1 ROW
      • END-EXEC.
    • OPTIMIZE is not valid syntax for singleton selects, and is also not effective for SQL coded within products such as QMF, SPUFI and PRF. However, it works well for dynamic or static cursors embedded within an application program since you have the capability to control the number of rows that are fetched.
    • The sample program and its output at the end of the slides show that, in fact, the optimized cursor and even the non-optimized cursor proved faster than the WHERE EXISTS clause in our student absence example.
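The open, fetch one row, close pattern can be sketched in Python with an in-memory SQLite database standing in for DB2. SQLite has no OPTIMIZE FOR 1 ROW clause, so this shows only the application-side logic of fetching a single row and stopping; the `period_attend` table, its rows, and the `row_exists` function are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for DB2 (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE period_attend (student_id INTEGER, absence_codes TEXT)"
)
conn.executemany(
    "INSERT INTO period_attend VALUES (?, ?)",
    [(1, "PPAPP"), (2, "PAAAP"), (3, "AAAAA")],
)


def row_exists(connection, pattern):
    cursor = connection.cursor()
    try:
        # Counterpart of OPEN: run the query without retrieving anything yet.
        cursor.execute(
            "SELECT 1 FROM period_attend WHERE absence_codes LIKE ?",
            (pattern,),
        )
        # Counterpart of a single FETCH: ask for one row and no more.
        return cursor.fetchone() is not None
    finally:
        # Counterpart of CLOSE: release the cursor whatever happened.
        cursor.close()
```

The extra code the notes mention is visible here: the existence test itself is one fetch, but it is wrapped in open and close handling.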
  • 14. Comparison of Existence Checks
    • The sample program (included at end of slides) produced the following results.
      • Exists clause with correlated subquery: 6254 microseconds
      • Non-optimized cursor: 1726 microseconds
      • Optimized cursor: 1121 microseconds
      • Select count: 2 minutes 14 seconds
    • TIME: 2002-05-28-13.31.04.538493
    • USE EXISTS - ROW DOES EXIST
    • TIME: 2002-05-28-13.31.04.544747
    • NON-OPTIMIZED CURSOR COUNT = 000000001
    • TIME: 2002-05-28-13.31.04.546473
    • OPTIMIZED CURSOR COUNT = 000000001
    • TIME: 2002-05-28-13.31.04.547594
    • SELECT COUNT - ROW DOES EXIST
    • TIME: 2002-05-28-13.33.18.363175
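The elapsed times quoted on this slide follow directly from the TIME lines, since each timestamp ends one test and begins the next. As a worked check, this small Python sketch parses the DB2 timestamp format and subtracts consecutive values; the function and variable names are my own.

```python
from datetime import datetime

# DB2 timestamp layout: yyyy-mm-dd-hh.mm.ss.nnnnnn
DB2_TIMESTAMP = "%Y-%m-%d-%H.%M.%S.%f"


def elapsed_microseconds(start, end):
    delta = datetime.strptime(end, DB2_TIMESTAMP) - datetime.strptime(
        start, DB2_TIMESTAMP
    )
    return delta.days * 86_400_000_000 + delta.seconds * 1_000_000 + delta.microseconds


# Timestamps copied from the sample program output on the slide.
stamps = [
    "2002-05-28-13.31.04.538493",
    "2002-05-28-13.31.04.544747",
    "2002-05-28-13.31.04.546473",
    "2002-05-28-13.31.04.547594",
    "2002-05-28-13.33.18.363175",
]
labels = ["exists", "non_optimized_cursor", "optimized_cursor", "select_count"]

# Each test runs between two consecutive timestamps.
elapsed = {
    label: elapsed_microseconds(start, end)
    for label, start, end in zip(labels, stamps, stamps[1:])
}
```

The first three differences come out to 6254, 1726 and 1121 microseconds, and the SELECT COUNT interval is 133,815,581 microseconds, which rounds to the 2 minutes 14 seconds shown above.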
  • 15. Fetch First Row Only
    • The FETCH FIRST (1) ROW ONLY clause, implemented in DB2 V7, limits the number of rows returned by the query to one, regardless of how many rows qualify from the WHERE clause.
    • It also implies OPTIMIZE FOR 1 ROW.
    • It can be coded in a singleton select, informing the optimizer to determine the best access path for the retrieval of one row and not to perform the internal second fetch that SQL usually does to determine whether to return a -811.
    • “Due to the ability to limit the result set to one row, and also inform the optimizer of this intention, this technique should become the new standard for existence checking from V7 onwards.”
  • 16. Summary
    • For existence checking, it is best to
    • open a cursor (optionally using the OPTIMIZE FOR ONE ROW clause) and fetch only one row
    • or use the WHERE EXISTS clause with a correlated subquery
    • Most important: be aware of the various ways of checking for existence, and if one way is taking a long time, try another