The topics covered in this presentation will not be divided into the “hard groups” specified on the agenda – however, all of them will be covered sooner or later. The topics are:
1. EXPLAIN has been through a tremendous evolution over the past two decades – and “thank you for that”, since it is quite a job to keep up with the Optimizer’s pace. We will look at some of the changes to EXPLAIN which might have slipped your mind.
2. The Optimizer has become more and more clever – but sometimes you still don’t get the “correct” access path, so we will look at what resources the Optimizer has and what we can do to help her/him/it pick “our desired AP”. When the Optimizer doesn’t pick the AP we want, we have many methods to change its preferred access path.
3. We will also look at methods to predict the access path before we move our programs and SQL into production.
4. Part of item 3 above is the ability to version access path information – especially for prediction, but certainly also for reactive performance debugging.
5. Finally, we will look at how the costs of SQL statements/packages can be compared to the current costs prior to upgrading the SQL and/or package, or doing the BIND / REBIND.
When dealing with performance, a number of issues (sometimes unfortunately neglected) are very important to understand. Performance is not only a matter for the Systems Programmer, the DBA, or the Application Developer or Programmer Analyst. Performance is ONE COMMON task – and everyone needs to be involved and understand the issues if the common theme – the Enterprise – is to be successful. The biggest challenge today is to have DBAs and Application Developers speak “the same language”. Too often fingers are pointed instead of creating a common environment and a mutual understanding of the issues. In order to capture performance problems before they become an issue – and potentially hurt the enterprise – it is important to explain the “whys” to the application folks instead of simply saying “do it this way”, since the same issue will otherwise arise again very shortly. It is equally important for the application folks to explain the business issues to the DBAs – too often these two groups are moving in different directions, and the ultimate end of the story is a hurting enterprise.
So, following up on the previous slide – it is important that everybody is aware of the tools, the reasons and the methods – so let’s do a short introduction to EXPLAIN, the most important tool when dealing with performance. EXPLAIN is THE tool to use to see how the DB2 Optimizer has decided to execute an SQL statement, a package or a plan – depending on what was explained. Why is EXPLAIN such a necessity – can’t we simply look at the SQL statement and predict the performance? Many years ago we probably could, but with today’s sophisticated SQL possibilities, statements of up to 2 MB, and dozens of tables joined, it is impossible to predict the access path in a timely manner (unless you are Terry Purcell). Dealing with performance, we also need to verify whether it matters if the SQL statement execution takes half a second or 5 minutes – AND what will happen when/if a REBIND or BIND is executed (as with DB2 upgrades or program promotion). Another issue: do we always know whether a JOIN or a subselect will perform best? Many issues to consider in today’s performance world.
Bottom line – the EXPLAIN function is invaluable in the day-to-day job to predict performance, or at least to get a clue about how the Optimizer decides to access the data. As mentioned earlier, performance can be hard to predict from how the SQL statement is coded, and even though the Optimizer is getting more and more clever (and sophisticated), performance can differ between SQL statements even when the result set is the same. This slide has two different statements, both producing the same result set, but the performance and access path are completely different.
As mentioned on the previous slide – these two statements are very different, but they both produce the correct and desired result. Can you predict which SQL statement is the cheapest based on the current catalog statistics? Perhaps you can, but to verify which one to use it might be necessary both to execute EXPLAIN and to run the actual statements and look at the execution statistics.
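The actual statements are on the slide, not in these notes – but purely as an illustration of the kind of equivalent pair being discussed, two formulations returning the same result set might look like this (table and column names are made up):

```sql
-- Hypothetical example: both statements return the customers that
-- have at least one order, yet they may get very different access paths.

-- Version 1: join (DISTINCT may force sort/duplicate removal)
SELECT DISTINCT C.CUSTNO, C.CUSTNAME
FROM   CUSTOMER C
JOIN   ORDERS   O ON O.CUSTNO = C.CUSTNO;

-- Version 2: correlated EXISTS subselect (no duplicate removal needed)
SELECT C.CUSTNO, C.CUSTNAME
FROM   CUSTOMER C
WHERE  EXISTS (SELECT 1 FROM ORDERS O WHERE O.CUSTNO = C.CUSTNO);
```

Which version wins depends on the statistics at hand – exactly why both EXPLAIN and real execution measurements are needed.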
The two statements get different access paths selected by the Optimizer. The lower part of the slide is a snapshot of some of the columns from DSN_STATEMNT_TABLE. The processing milliseconds and processing service units are indeed different for the two explained statements, so let’s have a closer look at the actual executions before we make up our mind about which statement to use.
Both SQL statements were executed in two scenarios:
- To avoid any buffer hits, the database was stopped and started ahead of each execution.
- All pages were placed in a separate buffer pool in order to see the impact.
Both scenarios produced a similar difference in execution and CPU time, so we have an easy pick in this case, since the real execution reflected what the EXPLAIN output indicated. The conclusion is – as we discussed earlier – that the EXPLAIN function should be a mandatory part of the application development environment, and it is necessary for the DBAs and application people to get together and discuss DB2 performance issues.
Why do we spend so much time and resources working with and understanding Explain? Badly performing SQL statements cost the enterprise a lot of money, and performance is today’s hottest topic – hotter than backup and recovery planning for some sites. A study a few years ago claimed that finding and correcting badly performing SQL in production, as opposed to finding it in the early stages of development, costs at least 30 times more – for a number of reasons:
- When performance isn’t optimal, fewer transactions can be pushed through the pipe and every transaction takes longer.
- This might also impact other SQL executions, since a lot of resources are shared.
- Eventually it might be necessary to upgrade the hardware – meaning additional costs for both HW and SW.
- In the worst case it could result in loss of business – or even the entire business. Especially where the business depends on the Wild Wild Web, it is so easy for shoppers to visit another site – unlike driving to another store if the line at the cash register is too long or no one is available to assist you.
So far we have talked about how important it is to run Explain as early as possible in the development phase. When applications are being changed – meaning the application is already running in production – the Explain process needs to be considered from a different angle, but more about this later. When Explain is executed in the test environment everything might look great, but a number of issues have to be considered before signing off. The DB2 Optimizer makes decisions based on a number of things, like hardware, number of processors, buffer pool sizes, size of workfiles etc. Also – when statements are explained (except via Bind/Rebind with EXPLAIN(YES)) – the host variables have to be replaced with parameter markers. In the “old days” this could result in misleading access path descriptions, since the Optimizer can’t see the host program’s definition of the host variables. This is becoming less of an issue with every DB2 version, and DB2 V8 in particular has removed some of these concerns. So keep in mind that it might not be sufficient to “only” explain the SQL in the test environment. The real picture might not be available until the SQL is explained in the real production environment (more about this later).
The previous slide listed some of the not-so-obvious parameters influencing the Optimizer’s decision – this slide lists some of the more obvious pieces of information assisting the Optimizer in choosing the “correct” access path. These are some of the obvious parameters looked at and checked carefully before the Optimizer decides “how” – but it is not limited to:
- Table definition and column attributes
- Column distribution statistics – DB2 has introduced a lot of great enhancements over the past couple of versions, making it necessary to change how Runstats has been executed in the past
- Indexes and the columns included, CLUSTERRATIO, index levels
- Which columns are ordered or grouped by – can a sort be eliminated?
Now we all know that Explain is very important, so let’s have a deeper look at what is needed in order to execute Explain. At a minimum, the PLAN_TABLE is needed. This is where the Optimizer records the chosen access path when Explain is executed – either dynamically or via Bind/Rebind with EXPLAIN(YES). The DSN_STATEMNT_TABLE is optional, but I strongly recommend creating it too (with the same creator as the PLAN_TABLE in use). More later about why I think this table is a MUST HAVE. The DSN_FUNCTION_TABLE is only needed if UDFs are going to be explained – and this will not be covered in this presentation. A new table was introduced in DB2 V8 – the DSN_STATEMENT_CACHE_TABLE. This table is mandatory if the DB2 dynamic statement cache is explained – either via the statement token or by explaining the entire cache in one go.
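As a minimal sketch of how a dynamic explain might be run and inspected (the QUERYNO, statement, and queried columns are illustrative – the PLAN_TABLE layout varies by DB2 version):

```sql
-- Record the access path for an illustrative statement under QUERYNO 100
EXPLAIN PLAN SET QUERYNO = 100 FOR
  SELECT CUSTNO, CUSTNAME
  FROM   CUSTOMER
  WHERE  CUSTNO = ?;

-- Inspect the chosen access path
SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, TNAME,
       ACCESSTYPE, MATCHCOLS, ACCESSNAME, PREFETCH
FROM   PLAN_TABLE
WHERE  QUERYNO = 100
ORDER BY QBLOCKNO, PLANNO;
```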
Just like DB2, the PLAN_TABLE has been through a huge evolution. The very first DB2 version didn’t have an EXPLAIN feature, so predicting performance was “a lot of fun”. Thank God it didn’t take long before Explain became available, and it has grown ever since. Every DB2 version has extended the PLAN_TABLE content, and listed on the right-hand side are some of the major events reflected in the PLAN_TABLE content.
Now that EXPLAIN and the PLAN_TABLE have received a lot of credit – let’s look at some of the issues we’re facing, and why we sometimes see execution performance completely different from what we expected based on the PLAN_TABLE content. The reality is that EXPLAIN does not tell us the whole truth, so we still need to put on “the DBA performance glasses”. Whatever goes through the Optimizer’s mind during the cost estimation process isn’t reflected:
- Why one index was preferred over another.
- Why one table was selected to be accessed first rather than one of the other tables involved in the join process.
- It would also be nice to see why a predicate was considered stage 1 or stage 2, and why a predicate isn’t indexable (the matrix in the performance guide is huge).
Tools do exist which greatly assist in this matter. However, there are more important issues than these that are not reflected in the Explain output – please see the next page.
Explain “only” tells you about the basic content of the SELECT, DELETE, UPDATE and INSERT statements – but a lot of other “stuff” goes on behind the scenes which isn’t reflected. The following issues are very important to keep in mind when performance is analyzed – whether this happens before or after the Explain is executed:
- Referential integrity cannot be viewed and can have a major impact on performance if not properly indexed. An insert can turn out to be a disaster.
- Triggers fired as a result of the SQL statement are not listed in the Explain output – again, this can result in “less than optimal performance”. The same goes for UDFs, which have to be explained separately.
- Check constraints are not listed either.
- Another issue to think about: even though the Explain output shows one access path, there is no guarantee this access path will be used. DB2 reserves the right to disable prefetch activity – and enable it again. The same goes for parallelism – DB2 might turn off parallelism at execution time, so this is just an indication – or intent.
- RID pool shortage used to be a disaster and the SQL execution could fail. DB2 9 has changed this so DB2 falls back to a tablespace scan instead – so make sure this event is monitored.
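As a hypothetical illustration of the trigger point above – all names here are invented – an EXPLAIN of the INSERT below would show nothing about the work the trigger performs:

```sql
-- Hypothetical trigger: every insert into ORDERS also maintains a
-- summary table. EXPLAIN of the INSERT shows only the INSERT itself;
-- the trigger body has to be explained separately.
CREATE TRIGGER ORD_TRG
  AFTER INSERT ON ORDERS
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  UPDATE CUST_SUMMARY
  SET    ORDER_COUNT = ORDER_COUNT + 1
  WHERE  CUSTNO = N.CUSTNO;

-- This is what gets explained – the UPDATE above is invisible here:
INSERT INTO ORDERS (ORDERNO, CUSTNO) VALUES (?, ?);
```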
The invention of the DSN_STATEMNT_TABLE as part of the Explain process was (in my opinion) great news. I believe it is almost as important as the PLAN_TABLE itself, due to a couple of columns present in this table. Two of the columns I really like are PROCSU and PROCMS, which can be used to compare costs for a statement between program versions – or to compare the costs of different SQL statements, as illustrated earlier where two different statements produced the same result set but were quite different in terms of cost. Another reason: it’s a lot easier to compare numbers than rows in the PLAN_TABLE. The DSN_STATEMNT_TABLE got one additional column in DB2 9 – the estimated elapsed time. According to IBM, this should only be compared within the “same statement” and “same DB2 version”, and not to other statements.
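A sketch of how the cost columns might be compared for two explained statements (the QUERYNO values are illustrative):

```sql
-- Compare estimated cost for two statements explained under
-- QUERYNO 100 and 200 – lower PROCSU/PROCMS suggests the cheaper one
SELECT QUERYNO, PROCMS, PROCSU, COST_CATEGORY
FROM   DSN_STATEMNT_TABLE
WHERE  QUERYNO IN (100, 200)
ORDER BY PROCSU;
```

COST_CATEGORY is worth checking too: a category “B” estimate means the Optimizer had to use defaults, so the numbers are less trustworthy.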
And we have another “newborn” – DB2 9 has introduced about 9 new tables to be used with optimization tools like Explain. One of the most exciting new tables is DSN_VIRTUAL_INDEXES, to be used when doing EXPLAIN. It is used to “define” an index in order to verify whether an index which currently doesn’t exist CAN be used by the Optimizer.
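A rough sketch of the idea, with invented names and an abbreviated column list – the real DSN_VIRTUAL_INDEXES requires additional statistics columns (NLEAF, NLEVELS, FIRSTKEYCARDF etc.) to be filled in as well, so consult the DB2 9 documentation before trying this:

```sql
-- "Create" a virtual one-column index on CUSTOMER(CUSTNAME) so the
-- next EXPLAIN can consider it (MODE 'C' = create, ENABLE 'Y')
INSERT INTO DSN_VIRTUAL_INDEXES
       (TBCREATOR, TBNAME, IXCREATOR, IXNAME, ENABLE, MODE,
        UNIQUERULE, COLCOUNT, COLNO1, ORDERING1)
VALUES ('SCOTT', 'CUSTOMER', 'SCOTT', 'CUST_VIX1', 'Y', 'C',
        'D', 1, 2, 'A');

-- Then explain the candidate statement again and check whether
-- ACCESSNAME in PLAN_TABLE shows CUST_VIX1
EXPLAIN PLAN SET QUERYNO = 300 FOR
  SELECT CUSTNO FROM CUSTOMER WHERE CUSTNAME = ?;
```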
I believe most of us have seen scenarios where the Optimizer decided on an access path we did not agree with. In “the old days” a few years ago, the “wrong” access path was usually due to insufficient information. Not because we didn’t supply the Optimizer with correct or enough information – but simply because the catalog and DB2 did NOT have any mechanics to capture and store the information. Runstats was very simple, and the Optimizer was still at the stage where it hadn’t learned to walk – still crawling and needing a chair to get up. Where the access path often went south was when the table had very skewed data in one or more columns – and when there wasn’t enough information for every column in the index, or rather for every column combination. Today everything has changed. Runstats has a LOT more options for collecting the information the Optimizer needs to make more mature decisions.
What did we do a few years back when the Optimizer picked a “wrong” access path? We had no OPTHINT with which we could describe the access path in the PLAN_TABLE and try to have DB2 pick the desired access path. Runstats had no REPORT option – meaning we had to execute Runstats and then hope no BIND/REBIND happened until we had fixed potential performance issues. We only had FIRSTKEYCARD and FULLKEYCARD, which could be pretty bad for multi-column indexes with few distinct values in the first key column. I used to update NLEVELS to favor other indexes, change the CLUSTERRATIO, update FIRSTKEYCARD, and so on. No fun – and how did you keep track of this information once it had been changed in the catalog? A lot of users used to (and still do) change the SQL statements to add all kinds of weird predicates – making it difficult for the next person to maintain the SQL statement.
Some of the bad scenarios involved forcing index access. DB2 V8 introduced VOLATILE in the table DDL, which is a far better method of saying that an index should be used if present. This also eliminates the documentation issue of keeping track of which tables should have forced index access (I used to use LABEL and COMMENT when I manually manipulated catalog columns). One thing to keep in mind: PREFETCH will be disabled for these tables. Another method to “HELP” the Optimizer is to use REOPT on the BIND statement. Be careful with this parameter due to the cost of evaluating the access path at execution time, but for situations where the AP really should differ based on the host variable content (and data is very skewed), this might be a far better solution than messing with the catalog – or having more indexes manipulated manually. OPTIMIZE FOR xx ROWS is another nice method to change the Optimizer’s mind. Remember this doesn’t restrict you from selecting MORE rows than specified; for that case, DB2 V8 introduced FETCH FIRST x ROWS ONLY. Another common method is to rearrange the tables in the FROM clause so the table providing the best filtering is listed first. Finally – adding a predicate like “OR 0=1” can invalidate some index usage.
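The techniques above can be sketched roughly as follows (table and column names are invented; as discussed, use these with care):

```sql
-- VOLATILE (DB2 V8): prefer index access for this table if an
-- index exists, regardless of the statistics
CREATE TABLE WORKQUEUE
  (ITEMNO INTEGER NOT NULL,
   STATUS CHAR(1) NOT NULL)
  VOLATILE;

-- OPTIMIZE FOR: bias the Optimizer toward a fast-first-rows plan;
-- it does NOT limit how many rows can actually be fetched
SELECT ITEMNO, STATUS
FROM   WORKQUEUE
WHERE  STATUS = ?
OPTIMIZE FOR 1 ROW;

-- "OR 0=1": the classic trick to make a predicate non-indexable,
-- steering the Optimizer away from an index on STATUS
SELECT ITEMNO
FROM   WORKQUEUE
WHERE  (STATUS = ? OR 0 = 1);
```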
The previous slide mentioned the use of OPTHINT. I don’t think it ever got the usage it was intended to get – it is not a widely used method. It is not very user friendly, but it’s still far better than manually manipulating the catalog. The outlined method requires a few steps to get a package statement to use “the desired” access path – and it only works for statements going through the BIND process. If this method appeals to you – consider using QUERYNO in the SQL statements in the programs. This eliminates the need to go back and redo the whole thing when the statement moves around in the source code. Doing this also provides an easier way to compare PLAN_TABLE content between program versions (more about this later).
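The OPTHINT steps might look roughly like this (hint, collection and package names are invented; hints must also be enabled on the subsystem):

```sql
-- 1. Tag the desired PLAN_TABLE rows with a hint name
UPDATE PLAN_TABLE
SET    OPTHINT = 'MYHINT'
WHERE  QUERYNO = 100
AND    PROGNAME = 'RQAPGM01';

-- 2. Rebind the package, telling DB2 to use the hint
--    (a DB2 command, not SQL):
--    REBIND PACKAGE(COLL1.RQAPGM01) OPTHINT('MYHINT') EXPLAIN(YES)

-- 3. Verify the hint was applied: the rows written by the rebind
--    should show the hint name in HINT_USED
SELECT QUERYNO, HINT_USED
FROM   PLAN_TABLE
WHERE  PROGNAME = 'RQAPGM01';
```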
However – before moving to some of the manual manipulative methods – the preferred method is still to use the DB2 tools available: RUNSTATS!!! One major reason is that the Optimizer changes between every version – and even within one DB2 version – and going through all the packages/statements changed in order to get better performance is a nightmare. RUNSTATS has evolved dramatically – especially in DB2 V8 and DB2 9. The COLGROUP parameter is a great method to get cardinality for combinations of columns (as opposed to FIRSTKEYCARDF and FULLKEYCARDF) – and it’s even possible to specify how many value combinations are needed, depending on how skewed the data is. DB2 9 introduced HISTOGRAM statistics, which can be used where ranges of variation exist. Soon we probably won’t even have to think about RUNSTATS – I believe it will become an integral part of the “DB2 engine”.
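A rough sketch of a RUNSTATS control statement using COLGROUP and HISTOGRAM (database, tablespace, table and column names are invented; check the Utility Guide for the exact syntax on your version):

```sql
-- Collect cardinality plus the 10 most frequent values for the
-- column combination (REGION, STATUS), and histogram statistics
-- with 20 quantiles, so the Optimizer can see skew across the
-- combination rather than per single column.
RUNSTATS TABLESPACE DBRQA01.TSRQA01
  TABLE(SCOTT.ORDERS)
    COLGROUP(REGION, STATUS)
      FREQVAL COUNT 10
      HISTOGRAM NUMQUANTILES 20
  SHRLEVEL CHANGE
  REPORT YES
```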
So far we have been talking about how to get the access path described, and about some methods to manipulate the Optimizer’s decisions. This all leads to the main point of this presentation – versioning Explain – and why this is important. Two reasons: We need a proactive approach in order to predict the performance of both new and changed SQL statements. This is a lot easier to control for static programs than for all the Wild Wild Web of the dynamic world. We also need a good reactive approach in place. Admittedly we cannot catch everything before the SQL is in production – or before something changes that the SQL depends on. So we need a method to find out what happened, when, and why. In most cases this is when users complain about performance, but it also happens that someone asks “why does transaction xyz run faster / use less CPU?”. If we can find the cause, we might be able to apply identical changes to other applications.
My point is – having the REACTIVE plan in place is JUST AS IMPORTANT as having the PROACTIVE plan in place. You might ask – why is versioning Explain so interesting? Isn’t it enough to simply have access to the most recent access path in the PLAN_TABLE? In my opinion this is not sufficient – I really prefer to be able to compare all the ingredients involved in the access path selection between a past explain and the current explain. So many things can have changed the access path (my favorite scan) – so in order to save time playing “Sherlock Holmes” I need all these ingredients to be handy (we will look into the ingredients later). Having the REACTIVE plan in place is important so we can find out WHAT made the access path change – for better or worse. The purpose of having a PROACTIVE plan in place is to be able to PREDICT what the performance will be before the SQL / package is ported to production. There are many ways to accomplish this – production catalog statistics can be imported into the “explain environment”, the program can have a BIND executed with EXPLAIN(YES) on the production system using a “QA collection-id” – and there are probably many more methods. The main idea is to PREDICT what the access path and cost will be.
In many cases, Explain is only used by DB2 DBAs. In order to utilize the resources and knowledge available, the application developers should use an Explain tool too – like CA Plan Analyzer. Instead of having the DBAs chase the developers to make them change the SQL, the developers can gain a lot of knowledge by using CA Plan Analyzer, thanks to its expert system rules and recommendations. One important recommendation is to create Explain profiles up front so users don’t have to walk through a lot of options and decisions. Instead, the user can simply select the profile of choice, where all the needed options and reports are defined.
The important parameters are marked in red:
- Database Options is used to specify that versions of the explain will be saved.
- The DB2 subsystem-id (or group attach name) must be specified – here S81A.
- Use Rollback of the PLAN_TABLE output, since the results will be saved in the Plan Analyzer repository anyway.
- Specify Future explain in order to see what the access path will be, as opposed to Current, which describes the current access path as recorded in the plan table.
- Specify SAVE REPORTS so they can be viewed at a later time.
- Specify UPDATE REPORTS in order to describe which reports to generate when Explain is executed.
Most of the reports can be generated in long or short format. The red ones are my preference, but a good idea is to execute one explain with all the reports long and another with all the reports short (where possible), and then decide which ones provide the best information. Also – remember there can be as many profiles as needed in order to reflect different levels of expertise and knowledge. Specify that the Compare report should be generated; by also choosing the Compare Options it is possible to describe what the compare report should include (see next slide). The STATISTICS report is also important – more about this later.
These are the different options available for the Compare report. They describe what the report should include when CA Plan Analyzer has found a difference in access path.
We start by creating a strategy where the strategy name is identical to the program name (package). This is an easy method to automate, and it makes it easy to find the information when performance needs to be investigated.
Since this strategy will explain a DBRM, the TYPE is specified as D, and the DBRM library and member are specified. Wildcarding is possible here too, so RQA* would pull in all DBRMs with a name starting with RQA. It is also possible to select a number of DBRMs by using RQA% instead.
After the strategy is created, option E(xplain) is typed for the strategy whereby this panel is displayed. Simply specify the EXPLAIN PROFILE needed and hit ENTER, and the CA Plan Analyzer explain is executed.
Once the explain is finished executing, the first version of explain is generated.
Instead of creating the strategy online, the control cards can be embedded in a batch step and appended to the program promotion procedure in the change management process. In fact, nothing needs to be done in CA Plan Analyzer to implement this process. The batch control cards will CREATE the strategy if it doesn’t already exist, and in both cases an explain version is created. The versioning works like a wrap-around: when the maximum number of versions exists, the process deletes the oldest. The parameters marked in red need to be symbolic values, replaced by the change management process in the program promotion procedure.
Now the second explain for this strategy (package) has been executed and version 2 is created. Using option R will display all the reports generated for this explain version.
Let us have a closer look at the Explain Compare report.
Statement 1202 has changed from MATCHCOLS=1 to MATCHCOLS=2 and has at the same time gone from list prefetch to no prefetch. The service units have decreased 86% compared to the previous explain, so performance should get a lot better. Every access path change is indicated by a hyphen.
The sixth statement used to be statement number 1322 but is now statement number 1325 (someone must have added application code). This statement will change from MATCHCOLS=2 to MATCHCOLS=1, which will be a 300% increase in service units.
Another useful report to look into when performance and access path need to be analyzed in detail is the Catalog Statistics report. This might lead to the answer as to why the access path was chosen. The report is especially invaluable when comparing access paths between two versions, since an access path change might be due to changed catalog statistics – it is possible to see the catalog statistics from the point in time when the Explain was last executed (and even earlier if needed).
The biggest advantage of the automated batch explain approach is that corrective actions can be taken before the program moves into production. It also provides a unique opportunity to quickly identify WHEN performance changed and WHAT the reason was when someone asks “what happened – why does this transaction use more resources?”. Having access to this kind of information, it is possible to identify the reasons for performance changes and apply the findings in other scenarios.
Bottom line – having a well defined PROACTIVE procedure as well as being able to react quickly when being in a REACTIVE mode is important. Both issues can be automated using CA Plan Analyzer.
At this point we have covered in detail why Explain versioning is worth considering if it is not already implemented. The entire approach – both the REACTIVE and the PROACTIVE – has been covered, but ONE issue can make the whole thing almost fall apart, so it really needs to be considered – and there are ways to deal with it too. Even though we take all kinds of precautions in order to be PROACTIVE and prepare for the REACTIVE approach – consider the situations where a package has a REBIND executed outside the normal change management implementation. One method to “catch” these is to have a daily job find such packages and pass them through the “normal Explain process” in order to save the new access path, catalog statistics etc. Talking about REBIND – this brings us to a topic that pops up all the time: when to REBIND and when NOT to REBIND. My belief is – if it works, don’t touch it, UNLESS major changes have happened to the data AND to the statistics in the catalog.
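Such a daily job might find recently rebound packages with a catalog query along these lines (the collection name and the 24-hour window are illustrative):

```sql
-- Find packages in collection COLL1 rebound during the last 24 hours,
-- i.e. candidates to push through the normal explain process
SELECT COLLID, NAME, VERSION, BINDTIME
FROM   SYSIBM.SYSPACKAGE
WHERE  COLLID = 'COLL1'
AND    BINDTIME > CURRENT TIMESTAMP - 24 HOURS
ORDER BY BINDTIME DESC;
```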
The main issue with REBIND is that DB2 will generate a new access path (unless OPTHINT is used and honored) – but it MIGHT be the same access path. Is this a great idea? Maybe – but we really don’t know until after the fact, and that is the big issue and concern. What IF the DB2 catalog statistics aren’t “optimal” – someone by accident executed Runstats before the reorg of a very disorganized object was done!!! Another issue is triggers – the only way to predict the performance of a trigger is to do a REBIND using EXPLAIN(YES). Again, as mentioned earlier – REBINDs happening outside of “our” control can mess up our very good intentions of being PROACTIVE.
One of the reasons for using the previously mentioned methods to collect all the information is that – in case the access path suddenly goes south – we can hopefully “restore” the catalog information and get back our “FAVORITE SCAN”. In other words – having the ability to FALL BACK to a previous access path. This has NOT been an easy task – at least until now. Once on DB2 9, a relatively new APAR is available making the fallback procedure a lot easier. Depending on the ZPARM setting and BIND parameters, it is possible to save up to three copies / versions of a package. All that is needed is to start using one new parameter on the REBIND statement: PLANMGMT. This parameter basically allows you to SAVE an access path after the INITIAL BIND is executed.
The previous page illustrated how it is now possible to save an ACCESS PATH. Let’s assume a REBIND leaves you with a package with a “less than optimal” access path – it is now possible to FALL BACK to a previously saved access path by using another REBIND parameter: SWITCH.
These three jobs illustrate how this new – excellent – feature can be used to save a preferred access path using the EXTENDED keyword for PLANMGMT, which allows up to three versions of the package. If an earlier version of a package’s access path needs to become active, the package has to go through a REBIND with the SWITCH parameter, indicating which version is to become active. The third example illustrates that you cannot SWITCH back to a previously saved package access path version unless you have SAVED one using the new PLANMGMT parameter on the REBIND.
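The REBIND subcommands in the three jobs might look along these lines (collection and package names are invented):

```sql
-- 1. Rebind while saving copies of the access path: EXTENDED keeps
--    the current, previous and original copies of the package
REBIND PACKAGE(COLL1.RQAPGM01) PLANMGMT(EXTENDED)

-- 2. The new access path turned out badly – fall back to the
--    previously saved copy
REBIND PACKAGE(COLL1.RQAPGM01) SWITCH(PREVIOUS)

-- 3. Fall back to the original access path – this fails if no
--    copy was ever saved via PLANMGMT in the first place
REBIND PACKAGE(COLL1.RQAPGM01) SWITCH(ORIGINAL)
```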
One question comes to mind when dealing with multiple versions of a package access path – where is all this stored? I haven’t found much documentation for this very new feature, but using my favorite “secret weapon”, this is what I discovered: 1) Doing three REBINDs of one package resulted in 103 updates to the skeleton package table. 2) When the same package had a REBIND executed doing a fallback using SWITCH(PREVIOUS), this resulted in 309 updates to SPTR. Based on this scenario, all the versions are stored in the SPTR table – so make sure you have enough additional space allocated to this dataset before enabling the feature.
How To Predict DB2 Performance Changes – Mainframe MB103SN