SQL Performance Basics for DB2 UDB for iSeries
  • Embedded SQL is known as static SQL on DB2 UDB for the AS/400. SQL precompilers allow you to embed SQL in most of the popular AS/400 programming languages. It is the most efficient interface because DB2 stores the SQL access plan in the program object. We'll see some pictures portraying these differences in the next couple of charts. Note that the precise definition is "non-dynamic SQL statements embedded in a program," because PREPARE, EXECUTE, & EXECUTE IMMEDIATE can also be embedded in an HLL program just like an INSERT or UPDATE statement.
  • Since the dynamic SQL interface constructs SQL statements on the fly, the optimizer cannot define an SQL access plan ahead of time. Often this is unavoidable: you don't know exactly which flavor of SQL statement needs to be executed until the user selects check boxes or chooses a menu option. Typically, the user interface causes multiple parts of a dynamic SQL statement to be concatenated into one final SQL statement text.
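    To make that concrete, here is a minimal, hypothetical JDBC sketch of UI-driven statement construction (the table, column, and variable names are illustrative, not from the presentation). Because each combination of user choices yields different statement text, DB2 has to optimize each variation from scratch:

      import java.sql.*;

      // Hypothetical sketch: every combination of UI choices produces
      // different statement text, so no single access plan can be reused.
      static ResultSet search(Connection con, boolean byState, String state,
                              boolean byMinOrder, int minOrder) throws SQLException {
          StringBuilder sql = new StringBuilder(
              "SELECT id, name FROM customers WHERE 1=1");
          if (byState)                       // check box chosen by the user
              sql.append(" AND state = '").append(state).append("'");
          if (byMinOrder)                    // menu option chosen by the user
              sql.append(" AND orderamount > ").append(minOrder);
          Statement stmt = con.createStatement();
          return stmt.executeQuery(sql.toString());   // full prepare + open each time
      }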
  • Here's a pictorial representation of what static SQL looks like. The precompiler validates and parses the SQL requests in the high-level language program. An access plan (its contents are explained on the next chart) is then created for each SQL statement and stored in the program object. Static, embedded SQL is faster because the access plan is stored in this static program object: when the program is run, the validation & access plan creation steps can be skipped, making performance faster. The optimizer doesn't spend a lot of time creating a highly optimized plan during precompile, primarily because host-variable values are not known at compile time. It does the detailed analysis on the first execution of the statement and then updates the original access plan. So the first time you run the statements in an SQL program, things are going to be a little slower - something to be aware of in benchmark activity.
  • Access plan contents are the same for both static and dynamic SQL requests.
  • No sharing of resources - access information must be "relearned" by every job and every user, since there is no static system object where the access plan can be stored. For dynamic SQL, a generic plan is quickly generated at PREPARE time and a thorough, complete plan is generated at OPEN or EXECUTE time. A system-wide cache is being added in V4R4 to enhance reuse.
  • Extended dynamic was introduced on the AS/400 for some SQL interfaces. A permanent system object, the SQL package, is used to store the access plan so that some processing can also be skipped with dynamic SQL. Note that the discussion of SQL packages here doesn't relate to DRDA's usage of packages (in case anyone has experience with DRDA on the AS/400).
  • Step #2 highlights why no bind commands or statements exist on the AS/400 - the database does it automatically. Even better, the system will automatically rebuild & rebind the plan if the environment has changed (e.g., a new index was just created to improve performance). NOTES: Access plan rebuilds in a concurrent environment with LOTS of users can be a sore point with large customers. The old support really prevented anyone from updating the access plan, because every user had a read lock on the program object that prevented the exclusive lock necessary to update the access plan; thus, all users were creating temporary access plans. New support was recently added where the first user to require an access plan rebuild is basically "let in" to do the plan rebuild and update. Other users still create a temporary access plan until the update of the access plan in the program object is completed. This isn't perfect, but it does allow the plan in the program object to be updated (the old support never really let the access plan get updated until it was too late).
  • Here's the AS/400 system message that's signaled every time an access plan is rebuilt. All of the reasons that cause the rebuild are found in the second-level text of the message. NOTE: A change in memory/storage pool size does not automatically cause an access plan rebuild. Here's the algorithm used by the query optimizer:
      IF the cost of the access plan/implementation is less than 2 seconds,
        THEN all changes in the size of the memory pool are ignored
      ELSE (cost of access plan/implementation greater than 2 seconds)
        IF the memory pool/memory fair share has gone up >= 100% (double the memory)
        or been reduced by >= 50% (half the memory),
        THEN the optimizer will rebuild the access plan
  • Use the following example to illustrate the point: if you're a national company on the East Coast and, the first time the SQL statement is run, the program supplies a host variable value of NY (New York), it could very well be that the optimizer chooses a table scan because half of your customers live in New York City. The next time the SQL statement is run, someone's doing analysis on the customers in IA (Iowa), which might return only 1-2% of your customers. Is it a good idea to use a table scan to select only 2% of the rows?
  • Talk about the "warm-up" effect of applications (might be relevant for benchmarks). Also talk about the fact that you can't pre-open ODPs with SQL. We'll talk about how DB2 UDB leaves ODPs open and what events/situations force the ODPs to close, causing expensive full opens.
  • People experienced with native AS/400 interfaces are familiar with pre-opening files to avoid the cost of a FULL open. With SQL, there's no way to "pre-open" a file - you must rely on your application design, coding, and the system to avoid FULL opens.
  • Many users want to see where ODPs live and how they are used to implement an SQL statement. Very low-level details behind this picture follow.
    This view of storage management focuses on database I/O. The physical I/O is done below the MI by SLIC modules. This function enables the AS/400's single-level storage. Application/system functions running above the MI do not need to concern themselves with the physical location of the data (is it in memory or on disk?). Whether to use Expert Cache is the main performance decision to be made related to physical I/O. Other specialized performance techniques do exist, such as Set Object Access and reorganizing a file based on the ordering of a given index; however, in my opinion, only in exceptional cases do you want to get into the use of these techniques. The operating system will use multiple I/O tasks and will make use of blocking factors as high as 128K to maximize the efficiency of this physical I/O.
    The logical I/O, in this view of storage management, is accomplished by system modules such as QDBGET, QDBPUT, QDBGETM, etc. Applications can apply blocking techniques to improve the performance of this logical I/O (they can go as high as 64K blocks). Blocking logical I/O is generally a key focus item in improving the performance of batch-like applications. Note that blocking at the logical I/O level applies to both index and direct data space accesses to the data. Again, this function is above the MI and does not concern itself with the physical location of the data. If the index/data space is in memory, multiple jobs will have concurrent access to it and no physical I/O will be needed to satisfy their requests. If it is not in memory, the "hardware" SLIC tasks will be invoked to get it into memory (resulting in physical I/O that is transparent to the application - but perhaps not to the end user, as their response time would be impacted).
    The job structure forms the unique, user-dependent repository. PAG stands for Process Access Group. Among other things, it contains Open Data Paths (ODPs), which form the link between the shared program and the shared data for each unique job on the system. In general, an ODP is created when the application program opens a file; in the case of SQL, an ODP is created for each unique SQL request in the application. From a performance perspective, the job structure and ODPs are relatively expensive objects to create. Thus, an application design point should be to reuse these structures as much as possible. Pre-started jobs, reusable ODPs, shared ODPs, and extended dynamic SQL are some of the AS/400 techniques that can help minimize the performance impacts, but the application must be designed and implemented to make use of these techniques.
  • All of these SQL requests use a cursor under the covers - that is why they all do opens. "Searched UPDATE & DELETE" means that any UPDATE or DELETE statement goes down the open path; the only exception is positioned UPDATEs or DELETEs done via a declared cursor.
  • DB2 UDB tries to minimize the number of ODP creations ("full opens") by reusing existing ODPs as much as possible within a job. Creating an ODP costs 10 to 20 times more CPU than reusing an existing ODP. Reuse of ODPs is only a benefit when the same statement is executed multiple times within a job; that's why DB2 UDB doesn't start reusing ODPs until after the second execution of the statement. There is a PTF now available to enable reusable ODP mode after the first execution - details on the PTF are covered later on. There are certain ODPs that cannot be reused due to their particular implementation.
  • Here's the logic flow that DB2 UDB goes through to determine if the ODP can be left open for reuse
  • Here's an example of how reusable ODPs might work in an SQL application. The OPEN of the cursor is when DB2 UDB goes through its logic to determine whether an ODP needs to be created or can be reused. Pseudo-close means that the ODP (and cursor) are closed from an application logic and SQL language point of view, but left in a state so that a full open does not have to be performed (internally it's called a pseudo-open) when the statement is run again. The pseudo-close state is similar to the Recycle Bin on Windows desktops: the objects placed in the Recycle Bin are logically deleted, but not physically deleted until the Recycle Bin is emptied. NOTE: The system command WRKJOB, Option 14, will show tables left open as part of pseudo-closed ODPs.
  • Here's another look at ODP reuse in a real application environment and the messages that appear in the joblog. In this example, the same stored procedure is called 3 different times in a job to fetch "Central" customers into a temporary result table. You can see how the ODP is not deleted and is reused on subsequent executions of the stored procedure. The "ODP not deleted" message (SQL7914) is THE indicator of ODPs being left open for reuse - the "ODP reused" message is NOT generated every time the ODP is reused (different interfaces and different sequences of SQL statements can cause the message not to be signaled). NOTE: There are two ODPs - one for the INSERT request and one for the SELECT request. In general, ODPs are created from the inner SQL statement to the outer SQL statement. For an INSERT with subselect, the ODPs are reused or not reused as a pair (e.g., you won't have a reusable ODP for the INSERT and a non-reusable ODP for the SELECT).
  • Earlier it was mentioned that you can tune DB2 UDB so that reusable ODP processing starts after the first execution, instead of waiting for the second. This data area is the interface for performing this tuning. NOTES: Support was PTF'd back into V4R2 & V4R3. The extra storage usage can be monitored somewhat by watching auxiliary storage usage ("% system ASP used") on the WRKSYSSTS command, and main storage usage on the same command by looking at faulting rates.
  • Now that you understand the benefits of DB2 leaving ODPs open for reuse, this section will cover the designs and requests that can prevent DB2 from reusing existing ODPs.
  • DB2 UDB can only reuse ODPs for a single, embedded SQL statement - different instances of the same SQL statement cannot share the same ODP. Thus, if the same statement will be executed multiple times within a program call, code the logic so that the statement is in a shared subroutine that can be called, so that DB2 is able to reuse the ODP (see the sketch below). NOTE: Dynamic SQL is discussed in the following section; almost all of these open & reuse tips/restrictions apply to all SQL interfaces unless explicitly noted.
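    A minimal sketch of that shared-subroutine idea in JDBC (the class, table, and column names are assumptions for illustration): the statement lives in one place and every caller funnels through it, so DB2 keeps seeing the same statement.

      import java.sql.*;

      // Hypothetical helper: the SELECT is coded once, so repeated calls
      // execute the same statement and its ODP can be reused.
      class EmployeeLookup {
          private final PreparedStatement pst;   // prepared once per connection

          EmployeeLookup(Connection con) throws SQLException {
              pst = con.prepareStatement("SELECT name FROM emptbl WHERE id = ?");
          }

          String nameFor(int id) throws SQLException {   // the "shared subroutine"
              pst.setInt(1, id);
              try (ResultSet rs = pst.executeQuery()) {
                  return rs.next() ? rs.getString(1) : null;
              }
          }
      }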
  • If unqualified SQL tables exist in your SQL requests (not coding the library/collection/schema name) and you are running with system naming mode (*SYS), so that the system determines the location of the table, then changing the library list will cause an ODP creation. If the table location has changed, the existing ODP cannot be reused because it's pointing at the wrong version of the wrong table. An international company experienced performance problems because they were using a single SQL program to run against different company databases that lived in different libraries - every time the library list changed for a different country, the opened ODP was useless. If the database table location isn't really changing (the library list might be changing just to switch the location of data queues, data areas, etc.), then there's a precompiler option that specifies a default collection (library) for unqualified tables. With the default collection, the database doesn't have to search for the location of the table (see the example below). The system override commands can point at different versions of the same table, so the ODP must also be recreated in that case. NOTES: Users new to the AS/400 will find a library list similar to the concept of a path on Unix & PC systems. SQL naming mode (*SQL) isn't impacted by this because it doesn't search the library list for unqualified tables - it only searches the library with the same name as the user ID running the job. V4R4 will provide an option to specify a default collection for dynamic SQL as well.
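    A small illustration of the difference (the library and table names are hypothetical); only the qualified form is immune to library-list changes:

      import java.sql.*;

      static void demo(Connection con) throws SQLException {   // con: open connection
          Statement stmt = con.createStatement();
          // Unqualified: resolved through the library list under *SYS naming,
          // so a library-list change can invalidate the open ODP.
          ResultSet rs1 = stmt.executeQuery("SELECT id FROM customers");
          rs1.close();
          // Qualified: the table's location is fixed, so the ODP keeps
          // pointing at the same object and can be reused.
          ResultSet rs2 = stmt.executeQuery("SELECT id FROM APPDATA.customers");
          rs2.close();
          stmt.close();
      }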
  • A temporary index does not always make the ODP non-reusable. If a temporary index build appears during the full open and the SQL statement then goes into reusable mode, that's the only real indicator that the temporary index is reusable. If a sparse (select/omit) temporary index is created to build host variable selection into the index, then the ODP is not reusable, because the index selection can potentially be different on every execution of the query. Temporary indexes can never be shared by other ODPs. NOTES: Reinforce that the optimizer never creates a temporary index just for selection - a join, group by, or order by is always involved. This is important to keep in mind when creating a permanent index to make the ODP reusable.
  • Queries containing complex wildcard searches cannot be implemented with a reusable ODP. In V4R5, the implementation of complex wildcard searches was changed so that it no longer causes the ODP to be non-reusable. Real simple wildcard searches like 'JOHNS%' can be done with a reusable ODP. Ordering specified on a host variable value (e.g., in an ORDER BY) will also make the ODP non-reusable, but that's probably not likely in practice.
  • Normally an UPDATE WHERE CURRENT OF request doesn't require an open operation (ODP creation). However, if the SET clause includes a function or operator, then a full open must be performed. To avoid this, change the request to have the high-level language perform the requested function or operation and then just use the computed result on the UPDATE request.
  • Be aware that reusable ODPs do have one drawback - once they're in reusable mode, DB2 UDB is unable to react to environment changes (table size increase, new index, etc.) and rebuild the associated access plan. NOTES: This is the reason we used the hokey comments in the lab exercises, so that you'd see different access methods being used. Reusable ODPs do get share locks on the associated database objects, so columns and indexes cannot be deleted once in reusable ODP mode.
  • We've just covered request types and implementations that prevent ODP reuse from occurring; this chart covers actions that force the ODP to be deleted. The SQL DISCONNECT is typically only used with a remote DRDA connection. However, it could be used as a trick/technique for local database access to prevent ODP reuse from occurring. This would have to be used carefully, because it deletes the ODPs for all SQL statements in a connection - probably something you'd want to do with a single SQL statement in a program by itself. NOTES: DRDA ODP considerations: When a CONNECT (type 1) changes the server for an activation group, all ODPs created for the activation group are closed. When a DISCONNECT statement ends a connection to the server, all ODPs for that server are closed. When a released connection is ended by a successful COMMIT, all ODPs for that application server are closed.
  • Basic SQL performance constructs and issues have been covered, now it's time to look at the nuances of dynamic SQL on the AS/400.
  • "Prepare once, execute many" matches the idea of avoiding full opens. This a design point that often has to be changed in SQL applications ported over to the AS/400. DB2 UDB does perform some background caching, so that a PREPARE does not automatically cause a full open on each execution
  • In case you've forgotten, here's a sample dynamic SQL request to see how the statement text and name are specified in the request.
  • Parameter markers are a good way of making dynamic SQL requests easier to reuse, along with the associated ODPs. DB2 UDB does automatically transform some dynamic SQL requests to use parameter markers but, again, application designers shouldn't rely on this system support and should instead explicitly utilize parameter markers.
  • Here's the same dynamic SQL now utilizing parameter markers and then passing in the parameter marker value on the Execute. NOTES: On other SQL-based interfaces the parameter marker value is assigned via some bind process
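    The same "prepare once, execute many" pattern in JDBC (table and column names are illustrative): the marker value is supplied at execution time, so one prepared statement serves every employee number.

      import java.sql.*;

      // Prepare once with a parameter marker...
      static void deleteEmployees(Connection con, String[] empNos) throws SQLException {
          try (PreparedStatement pst =
                   con.prepareStatement("DELETE FROM employee WHERE empno = ?")) {
              for (String empNo : empNos) {      // ...execute many
                  pst.setString(1, empNo);
                  pst.executeUpdate();
              }
          }
      }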
  • As mentioned earlier, DB2 UDB does convert some dynamic SQL requests into using parameter markers to make it easier for ODP's to be reused. Included here are a few parameter marker conversion examples. This happens on most dynamic SQL interfaces (the one real exception is Client Access ODBC - had to disable this feature because it was messing up client apps that were using both parameter markers and extended dynamic packages)
  • There are certain SQL requests that prevent DB2 from performing this parameter marker conversion and examples of each restriction are included. NOTES: In past releases, the following were restrictions that prevented PM conversion: 1) Original statement contains both literals and markers 2) INSERT or UPDATE Statement uses special registers (CURRENT DATE, CURRENT TIME, etc)
  • Extended dynamic improves performance by permanently caching dynamic SQL requests within an SQL package. This almost mirrors how the access plans for embedded, static SQL requests are stored in the program object. The search of the package is done by statement name, similar to the search done for the job-level caching of pure dynamic SQL. If a match is found, the access plan stored in the package can be reused. NOTES: For Client Access ODBC/JDBC extended dynamic access, here's how the search works: the ODBC host server code manages a table which maps the client application's statement name to the internal name you see in the package. If the application prepares a statement named S1 with the text "INSERT INTO x SELECT FROM y", we first search the package for a statement with the same text. If we find it, we give the host server the internal name QZxxxxxx and they put it in a mapping table, associating S1 with QZxxxxxx. My understanding is that this table lasts just for the life of the connection. The advantage this gives us is that the naming on the client is independent of naming on the server, which is good for any number of clients using any number of names for the same statement text. This mechanism sorts it out internally so we only need one instance of that statement in the package.
  • PRTSQLINF can be used to display the SQL statements stored in a package (this picture is output of that command). You'll see that the main package contents for a statement include the name, text, and access plan.
  • A package can share statement access plan information with all users of the package - very similar to embedded SQL access plans stored in program objects. The package is permanent across job and system termination, just like a program object. In addition, the optimizer also benefits from some basic stats maintained for each statement in the package. The stats kept in the object are the number of times the package is used, the number of times a packaged statement is executed, and the number of rows fetched by a statement. With these stats, statements in an SQL package tend to go into reusable ODP mode after the first execution.
  • There are three interfaces into the AS/400 extended dynamic support - a system API, the XDA API set, or an option on the IBM-provided ODBC & JDBC drivers. The system API requires the user to build, manage, and populate the package. The XDA API set tries to lessen the complexity and burden of the QSQPRCED API. The ODBC & JDBC drivers allow you to use and benefit from extended dynamic by just specifying that option with a mouse click or a single line of code (see the sketch below). NOTES: ODBC & JDBC will be covered in more detail later in the SQL Interfaces section.
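    As an illustration of the "single line of code" option, here is a hedged sketch using the IBM Toolbox for Java JDBC driver's connection properties; the property names shown ("extended dynamic", "package", "package library") should be verified against your driver's documentation, and MYUSER/MYPKG/MYLIB/mysystem are placeholders.

      import java.sql.*;
      import java.util.Properties;

      static Connection connectWithPackage() throws SQLException {
          Properties props = new Properties();
          props.put("user", "MYUSER");                 // placeholder credentials
          props.put("password", "MYPWD");
          props.put("extended dynamic", "true");       // cache statements in an SQL package
          props.put("package", "MYPKG");               // package name (placeholder)
          props.put("package library", "MYLIB");       // library holding the package
          return DriverManager.getConnection("jdbc:as400://mysystem", props);
      }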
  • Here the different API functions are outlined, so that programmers get a feel for what's required to use this API to enable extended dynamic SQL access.
  • There are some considerations when using the extended dynamic interface and the associated package. Any statement that can be prepared (i.e., anything supported by the SQL PREPARE statement) is eligible. Packages do have a maximum size. Prior to V4R3 the size limit was 16 MB, and it was hit fairly regularly. The limit was raised to 500 MB (~16K statements) in V4R3, which made the extended dynamic SQL interface more usable. Pre-V4R3 packages need to be recreated on a V4R3+ system to benefit from the max size enhancement. Be aware that the package size can grow even when new statements are NOT being added: access plan rebuilds use new storage from the end of the package - the rebuilds are not done in place. NOTES: ODBC & JDBC driver restrictions will be discussed later in the best practices sections.
  • The next section covers good SQL programming techniques and other factors that impact SQL performance
  • This shows all of the places where blocking can occur. The system does it automatically in the background with expert cache and pre-fetching of rows. The SQL application/requester can force the blocking by utilizing blocked insert and fetch constructs in the application.
  • Varchar column usage is popular on other databases; you want to more carefully consider the use of VARCHAR columns in DB2 UDB for AS/400. If the primary goal is space savings, then set the ALLOCATE setting for a VARCHAR column to 0. If you're more concerned about performance, then you want the ALLOCATE value to accommodate 90-95% of the variable-length values. In the indexing strategy, you learned that indexes over VARCHAR/VARGRAPHIC columns cannot provide all the stats to the optimizer (specifically, the average-number-of-duplicates stat is not returned for variable-length keys). NOTES: ALLOCATE(0) is the default value if ALLOCATE is not specified. If you have the following table:
      CREATE TABLE t1 (
        id INTEGER,
        name VARCHAR(50) ALLOCATE(0),          /* ALLOCATE(0) is the default */
        address VARCHAR(100) ALLOCATE(50),     /* addresses longer than 50 go in overflow */
        picture BLOB(10 MB),                   /* ALLOCATE default for LOBs is 0 */
        propertypicture BLOB(1 MB))
    and run the following query:
      SELECT id, name, address FROM t1
    you'd expect the system not to have to do the I/O on the 11 MB worth of BLOB columns. However, that is not what happens. VARCHAR & LOB columns are all stored in the same auxiliary overflow storage area. That means that if you reference one of the columns in that area, the database has to page in all of the overflow storage columns. In general, the database tries to avoid doing I/O on huge LOB columns, but it can't be avoided when the row contains VARCHAR columns that overflow. The only way to avoid this quirk is to assign VARCHAR columns the maximum allocate length (name VARCHAR(50) ALLOCATE(50)) so that the VARCHAR columns are always stored in the fixed part of the row and not the overflow area.
  • SQL-created tables are faster on reads and slower on writes than DDS-created tables. New data being added to an SQL table is run through more data validation, so no data cleansing & validation has to be performed on reads. Less data checking is done on inserts into DDS-created files.
  • Stored Procedures are big contributors to efficient SQL performance in distributed types of environments. The benefit of stored procedures is that they allow the same amount of database work to be performed with fewer trips to the database server by bundling together DB2 requests into a stored procedure.
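    A minimal sketch of that bundling idea from a JDBC client (the procedure name and parameters are invented for illustration): one CALL carries work to the server that would otherwise take several network round trips.

      import java.sql.*;

      static void postOrder(Connection con, int custId, int itemId, int qty)
              throws SQLException {
          // One round trip: the procedure does the insert, stock update,
          // and history logging on the server side.
          try (CallableStatement cs =
                   con.prepareCall("CALL ORDERLIB.POST_ORDER(?, ?, ?)")) {
              cs.setInt(1, custId);
              cs.setInt(2, itemId);
              cs.setInt(3, qty);
              cs.execute();
          }
      }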
  • Although V4R2 was a much bigger and more exciting database release, you have now seen that V4R3 offers a nice variety of valuable enhancements - enhancements that make it simpler to program and tune DB2 for AS/400 processing, as well as an advanced, patented technology in encoded vector indexes that further improves the competitiveness of DB2 for AS/400.
  • Courses with asterisks are or will be available by the end of June
  • The next section covers good SQL programming techniques and other factors that impact SQL performance
  • DB2 UDB attempts to block reads and writes as much as possible; a few situations where blocking is not allowed are listed. NOTES: The system does try to use the optimal block size; in general, for SQL access it's not worth tuning the block size with the OVRDBF command. OVRDBF can be used to improve the blocking in one case, an INSERT with subselect (e.g., INSERT INTO x SELECT ...) - as of V4R4 the database only does 4K blocks there, and OVRDBF can be used to get 32K or 128K blocks. Expert Cache is an option where the runtime engine tries to do blocking and pre-brings the next block without the user doing anything other than turning on expert cache for the pool (memory pool paging option value of *CALC). COMMIT(*CHG) will now block on read-only cursors with or without the *ALLREAD setting.
  • The table on this chart shows how much extra work can be eliminated by blocking at both the application and database engine level. The ODBC performance example of going from 17 seconds to 1.25 seconds shows the benefits of bundling together database requests. NOTE: V4R4 blocked inserts available with CLI
  • SQL fetches can be blocked most efficiently when: the data attributes of the source and target match; you retrieve as many rows as possible (don't try to outsmart the system); you do not mix single- and multiple-row FETCH requests on the same cursor; and you avoid the random fetch options such as PRIOR, CURRENT, and RELATIVE. A sketch follows below.
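    Here is a hedged JDBC illustration of those tips (table and column names invented): declare the intent read-only, give the driver a generous fetch-size hint, and read the cursor straight through with no positioned fetches. Note that setFetchSize() is only a hint; the driver and server choose the actual block size.

      import java.sql.*;

      static void listCustomers(Connection con) throws SQLException {
          try (Statement stmt = con.createStatement()) {
              stmt.setFetchSize(128);               // encourage large fetch blocks
              try (ResultSet rs = stmt.executeQuery(
                      "SELECT id, name FROM customers FOR FETCH ONLY")) {
                  while (rs.next()) {               // forward-only; no PRIOR/RELATIVE
                      System.out.println(rs.getInt(1) + " " + rs.getString(2));
                  }
              }
          }
      }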
  • Avoiding SELECT * gives the optimizer a better chance at selecting index-only access, or at having all the columns participate in a sort operation (SELECT DISTINCT, SELECT ... UNION). What are the chances that all your columns are keys in a single index? This suggestion goes along with the idea of providing the optimizer with as much information as possible about your request and data.
  • Using the FOR FETCH ONLY and FOR UPDATE OF clauses helps the optimizer better understand the nature of your SQL request - code as much as you know. The OPTIMIZE FOR n ROWS and WITH DISTINCT VALUES clauses are other examples of providing DB2 UDB with more information about your request.
  • The lower the isolation level, the fewer system resources are needed to implement the requested level. As pointed out in the optimizer section, commitment control levels of COMMIT(*CS) and COMMIT(*CHG) prevent hash join from being used - limiting the optimizer is another reason to choose your isolation level carefully. Switching isolation levels after a statement has been executed can cause the next execution of the statement to create a new ODP; this has been seen most often when switching back and forth with the serializable isolation level. An example follows below.
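    For JDBC clients, the isolation level is a connection attribute, as in this minimal sketch; the mapping of JDBC levels to COMMIT(*...) values is an assumption to verify for your driver level.

      import java.sql.*;

      static void configure(Connection con) throws SQLException {
          // Pick the weakest level the application can tolerate, and set it
          // once rather than switching per statement (which can force new ODPs).
          // READ_COMMITTED is commonly mapped to COMMIT(*CS) - verify for your driver.
          con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
      }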
  • AS/400 users need to be aware that SQL will automatically try to journal a table: all tables created into an SQL collection are automatically journaled. When building an application, you need to understand and implement database journaling appropriately; when looking at performance problems with the database, understand and investigate the journaling settings. Journal performance tuning is outside this course, but here are some places to start: consider the minimal journal data option introduced in V5R1 to minimize the data being copied and to reduce the size of the journal objects; the Journal Caching PRPQ has improved performance of some batch environments up to 5-7X by caching together disk writes to the journal; and review the hardware configuration - having enough write cache is critical for journal performance.
  • If using system naming mode (*SYS), avoid unqualified long table name references, since they cause a background search of the system catalog to determine the short name (the short name is needed to determine which library in the library list to use). The default collection precompiler option can also be used to eliminate this performance overhead. Most of the system catalog tables are pretty large, so be very careful about queries run against the system catalogs. NOTES: Query to determine the long SQL table name from the short system name: SELECT table_name, system_table_name FROM systables WHERE table_name='

SQL Performance Basics for DB2 UDB for iSeries - Presentation Transcript

  • The ABC's of Coding High-Performance SQL Apps Shantan Kethireddy [email_address]
  • DB2 UDB Family Certifications
    • Certified Database Associate - DB2 UDB Family (Test 700)
      • Website : ibm.com/certify/certs/dbdaudv81.shtml
      • Education Resources: ibm.com/certify/tests/edu700.shtml
      • Online Tutorial:
      • www7b.boulder.ibm.com/dmdd/library/tutorials/db2cert/db2cert_V8_tut.html
    • Certified Application Developer - DB2 UDB Family (Test 703)
      • Website : ibm.com/certify/certs/dbapudv81.shtml
      • Education Resources : ibm.com/certify/tests/edu703.shtml
    • Sample Tests: certify.torolab.ibm.com/ice
    • Exams were refreshed & updated for DB2 UDB for iSeries
    • Discounted exams for COMMON Attendees, discount on additional exams. See Session 404605 for Room & Time details (21YY-51YY)
  • SQL Interfaces (diagram): Static SQL - compiled embedded statements; Extended Dynamic - prepare once and then reference; Dynamic - prepare every time. Interfaces such as ODBC / JDBC / ADO / DRDA / XDA and CLI / JDBC reach DB2 UDB (data storage & management) through the host server over the network, while native record I/O bypasses SQL; all SQL requests flow through the optimizer and SQL query engine in SLIC. See also session 410191 (54GN) - Preparing to Get the Best DB2 Performance Out of V5R2 & V5R3.
  • Wright brothers software engineering: ~ "Put it (the query) all together and push it off a cliff to see if it flies."
  • Measuring & Monitoring DB2 Performance (diagram: a user request travels through display I/O, communications, authentication, request processing, optimization, runtime, and open processing before output results are returned; the database work between BEGIN and END includes:)
    • Journaling
    • Index Maintenance
    • Constraint Enforcement
    • Locking
    • Trigger Processing
    • ODP Creation
    • Database
    • Authentication
    • Access Plan Creation
    • Index Estimates
  • Static SQL
    • Non-dynamic SQL statements embedded in application programs
    • Languages Supported:
      • RPG
      • COBOL
      • C, C++
      • SQL Procedural Language (SQL embedded in C)
      • PL/I
    • Most efficient SQL interface on iSeries
  • Dynamic SQL
    • SQL statements are dynamically created on the fly as part of application logic: PREPARE, EXECUTE, EXECUTE IMMEDIATE
      DSTRING = 'DELETE FROM CORPDATA.EMPLOYEE WHERE EMPNO = 33';
      EXEC SQL PREPARE S1 FROM :DSTRING;
      EXEC SQL EXECUTE S1;
  • Dynamic SQL Interfaces
    • DB2 UDB for iSeries Interfaces that utilize Dynamic SQL...
      • RUNSQLSTM
      • CLI
      • JDBC
      • Net.Data
      • Interactive SQL (STRSQL)
    • Greater performance overhead since DB2 UDB does not know what SQL is being executed ahead of time
    • ODBC
    • iSeries Navigator SQL requests
    • REXX
    • Query Manager & Query Management
  • Access Plans (diagram): a source program with SQL passes through the SQL precompiler & language compiler, producing a program object (*PGM) or module object (*MODULE) that contains the access plan.
    • Each SQL statement is
      • Parsed
      • Validated for syntax
      • Optimized
    as an access plan is created for the statement
    Static SQL view: generic plan quickly generated the first time; complete, optimized plan the second time
  • Access Plans
    • Plan Contents
    • A control structure that contains info on the actions necessary to satisfy each SQL request
    • These contents include:
      • Access Method
        • Access path ITEM used for file 1.
        • Key row positioning used on file 1.
      • Info on associated tables and indexes
        • Used to determine if access plan needs to be rebuilt due to table changes or index changes
        • EXAMPLE: a column has been removed from a table since the last time the SQL request was executed
      • Any applicable program and/or environment info
        • Examples: Last time access plan rebuilt, DB2 SMP feature installed
  • Access Plans (diagram): a dynamic SQL statement's access plan is kept in working memory for the job.
    • Each Dynamic SQL PREPARE is
      • Parsed
      • Validated for syntax
      • Optimized
    as an access plan is created for the statement
    Dynamic SQL view:
    • Less sharing & reuse of resources
    Generic plan quickly generated on PREPARE; complete, optimized plan on EXECUTE/OPEN
  • Access Plans (diagram): a dynamic SQL statement's access plan is stored in an SQL package (*SQLPKG).
    • Each Dynamic SQL PREPARE is
      • Parsed
      • Validated for syntax
      • Optimized
    as an access plan is created for the statement
    Extended dynamic SQL view: has this dynamic request been previously executed? Generic plan quickly generated on PREPARE; complete, optimized plan on EXECUTE/OPEN
  • OPENing the Access Plan
    • Validate the Access Plan
    • IF NOT Valid, THEN Reoptimize & update plan (late binding)
      • Some of the possible reasons:
        • Table size greatly increased
        • Index added/removed
        • Significant host variable value change
    • Implement Access Plan: CREATE ODP (Open Data Path)
    NOTE : If optimizer has to rebuild access plan stored in a program or package object, then users may have to build a temporary access plan in some cases.
  • Reasons for Rebuilding the Access Plan
    • Message ID - CPI4323
    • Message . . . . : The OS/400 Query access plan has been rebuilt.
    • Cause . . . . . : The access plan was rebuilt for reason code &13. The reason codes and their meanings follow:
    • 1 - A file or member is not the same object as the one referred to in the access plan. Some reasons they could be:
    • - Object was deleted and re-created or restored.
    • - Library list was changed.
    • - Object was renamed or moved.
    • - Object was overridden (OVRDBF CL command) to a different object.
    • - This is the first run of this query after the object containing the query has been restored.
    • 2 - Access plan was using a reusable Open Data Path (ODP), and the optimizer chose to use a non-reusable ODP.
    • 3 - Access plan was using a non-reusable Open Data Path (ODP) and the optimizer chose to use a reusable ODP.
    • 4 - The number of records in member &3 of file &1 in library &2 has changed by more than 10%.
    • 5 - A new access path exists over member &6 of file &4 in library &5.
    • 6 - An access path over member &9 of file &7 in library &8 that was used for this access plan no longer exists or is no longer valid.
    • 7 - OS/400 Query requires the access plan to be rebuilt because of system programming changes.
    • 8 - The CCSID (Coded Character Set Identifier) of the current job is different than the CCSID used in the access plan.
    • 9 - The value of one of the following is different in the current job: date format, date separator, time format, or time separator.
    • 10 - The sort sequence table specified has changed.
    • 11 - The size of the storage pool, or paging option of the storage pool has changed and estimated runtime is less than 2 seconds
      • CQE optimizer only rebuilds plan when there has been a 2X change in memory pool size and runtime estimate less than 2 seconds
      • SQE optimizer only rebuilds plan with a 2X change in memory pool size
    • 12 - The system feature DB2 Symmetric Multiprocessing has either been installed or removed.
    • 13 - The value of the degree query attribute has changed either by the CHGSYSVAL or CHGQRYA CL commands.
    • 14 - A view is either being opened by a high level language open, or view is being materialized.
    • If the reason code is 4, 5, or 6 and the file specified in the reason code explanation is a logical file,
    • then member &12 of physical file &10 in library &11 is the file with the specified change.
  • Reasons for Rebuilding the Access Plan
    • Changes in the values of host variables and parameter markers
      • No access plan rebuild message (CPI4323) sent for this case
      • Optimizer determines if new value changes "selectivity" enough to warrant a rebuild as part of plan validation...
        • If program/package history shows current access plan used frequently in the past, then new access plan being built for data skew will be built as a temporary access plan
        • When value used in selection against chosen index and selectivity is 10% worse (less selective) than value used with current access plan AND
        • selectivity less than 50% of table
        • When value not used in select against chosen index and selectivity is 10% better (more selective) than value used with current access plan AND
        • selectivity less than 33% of table
    SELECT * FROM customers WHERE state=:HV1   (HV1 = 'NY')
    SELECT * FROM customers WHERE state=:HV1   (HV1 = 'IA')
  • Access Plan Rebuild Considerations
    • Access plan updates are not always done in place
      • If new space is allocated for the rebuilt access plan, then the size of program & package objects will grow over time - without any changes to the objects
      • Recreating program object is only way to reclaim "dead" access plan space
        • Check with IBM support on the availability of a utility
        • DB2 has background compression algorithms for extended dynamic packages
    • Static embedded SQL interfaces can have temporary access plan builds
      • If DB2 unable to secure the necessary locks to update the program object, then a temporary access plan is built instead of waiting for the locks
      • If SQL programs have a heavy concurrent usage, may want to do more careful planning for Database Group PTF updates or OS/400 upgrades
        • Install of new OS/400 release causes all access plans to be rebuilt
    • CQE access plan implementations involving subqueries and/or hash join are not saved
      • Access plans thrown away regardless of SQL interface
      • QAQQINI option, REUSE_SUBQUERY_PLAN = *YES, added midway thru V5R2 to allow subquery access plans to be saved
  • SQE Plan Cache (diagram): statements from SQL Pgm-A, SQL Pgm-B, dynamic SQL, and SQL Pkg-1 link to plans X, Y, and Z in the SQE plan cache; multiple statements can share a single cached plan. The legend distinguishes CQE plans from SQE plans.
  • SQE Plan Cache
    • Self-managed cache for all plans produced by SQE Optimizer
      • Allows more reuse of existing plans regardless of interface for identical SQL statements
        • Room for about 6000-10000 SQL statements
        • Plans are stored in a compressed mode
        • Up to 3 plans can be stored per SQL statement
      • Access is optimized to minimize contention on plan entries across system
      • Cache is automatically maintained to keep most active queries available for reuse
      • Foundation for a self-learning query optimizer to interrogate the plans to make wiser costing decisions
    • SQE Access Plans actually divided between Plan Cache & Containing Object (Program, Package, etc)
      • Plan Cache stores the optimized portion (e.g., the index scan recipe) of the access plan
      • The access plan components needed for validating an SQL request (such as the SQL statement text and object information) are left in the original access plan location, along with a virtual link to the plan in the Plan Cache
      • Plan cache entry also contains information on automatic stats collection & refresh
    • Plan Cache is cleared at IPL (& IASP vary off)
  • Access Plan to ODP (diagram): a CREATE against the access plan's internal structures produces the Open Data Path (ODP) - the executable code for all requested I/O operations.
    • Create process is EXPENSIVE
      • Longer execution time the first time an SQL statement is executed
    • Emphasizes the need of REUSABLE ODPs
  • ODPs (diagram): with native database access, a program reaches the table through an ODP via an HLL OPEN; with SQL database access, a program or interface reaches the table through an ODP created from the query's access plan.
  • ODPs "In Action" (diagram): an application program's SQL request flows through the job structure's ODPs, which perform logical I/O against the index and table in memory, with physical I/O to disk handled below them.
  • OPEN Optimization
    • OPENs can occur on:
      • OPEN Statement
      • SELECT Into Statement
      • INSERT statement with a VALUES clause
      • INSERT statement with a SELECT (2 OPENs)
      • Searched UPDATE's
      • Searched DELETE's
      • Some SET statements
      • VALUES INTO statement
      • Certain subqueries may require one Open per subselect
    • The request and environment determine if the OPEN requires an ODP Creation ("Full" Open)
  • OPEN Optimization
    • Reusable ODPs
    • To minimize the number of ODPs that have to be created, DB2 UDB leaves the ODP open and reuses it if the statement is run again in the job (if possible)
      • Reusable ODPs consume 10 to 20 times less CPU resources than a new ODP
      • Two executions of statement needed to establish reuse pattern
        • Execution statistics per statement are maintained in SQL package and program objects
        • DB2 UDB analyzes these execution statistics to determine if ODP reuse should be established after the first execution
    Reusing the ODP steps:
    • IF first or second execution of statement THEN...
      • Validate the access plan
      • IF NOT valid, THEN reoptimize & update the plan (late binding)
      • Create the ODP
    • ELSE IF non-reusable ODP THEN... ELSE reusable ODP - do nothing
    • Run the SQL request
    • Delete the ODP or leave the ODP open for reuse?
      • ODP will not be deleted after the second execution
    • Loop back to #1
  • Reusing the ODP example
      DECLARE c1 CURSOR FOR
        SELECT empnumber, lastname FROM employee WHERE deptno = '503';
      ...
      OPEN c1;              -- ODP either created or reused depending on current mode
      WHILE more rows AND no error
        FETCH c1 INTO :EmpNo, :EmpName
      END WHILE;
      CLOSE c1;             -- IF reusable ODP, THEN ODP is NOT deleted (pseudo-closed)
                            -- ELSE ODP is deleted
  • OPEN Optimization - Reusable ODP Example (joblog messages from three calls of the same procedure):
      Call 1:  SQL7912 ODP created.  SQL7912 ODP created.  ...  SQL7913 ODP deleted.  SQL7913 ODP deleted.  SQL7985 CALL statement complete.
      Call 2:  SQL7912 ODP created.  SQL7912 ODP created.  ...  SQL7914 ODP not deleted.  SQL7914 ODP not deleted.  SQL7985 CALL statement complete.
      Call 3:  SQL7911 ODP reused.  SQL7911 ODP reused.  ...  SQL7914 ODP not deleted.  SQL7914 ODP not deleted.  SQL7985 CALL statement complete.
    • INSERT INTO resultTable
      • SELECT id, name
        • FROM customers
          • WHERE region = 'Central'
  • Miscellaneous considerations
    • Reusable ODP Control - QSQPSCLS1 Data Area
    • Existence of data area allows the reuse behavior after first execution of SQL statement instead of the second execution
      • DB2 checks for data area named QSQPSCLS1 in job's library list - existence only checked at the beginning of the job (first SQL ODP)
      • USE CAREFULLY since cursors that are not reused will consume extra storage
      • Data area contents, type, and length are not applicable
  • Reusable ODP Tips & Techniques
  • OPEN Optimization - Reuse Roadblocks
    • With embedded SQL, DB2 UDB only reuses ODPs opened by the same statement
      • If the same statement will be executed multiple times, need to code logic so that the statement is in a shared subroutine that can be called
    NON-REUSABLE ODP:
      SELECT name FROM emptbl WHERE id=:hostvar
      ...
      SELECT name FROM emptbl WHERE id=:hostvar
      ...
    REUSABLE ODP:
      CALL Proc1
      ...
      CALL Proc1
      ...
      Proc1:
        SELECT name FROM emptbl WHERE id=:hostvar
  • OPEN Optimization - Reuse Roadblocks
    • Unqualified table and the library list has changed since the ODP was opened (System naming mode - *SYS)
      • If table location is not changing (library list just changing for other objects), then default collection can be used to enable reuse
      • Default collection exists for static, dynamic, and extended dynamic SQL
        • QSQCHGDC API added in V4R5 to allow default collection for dynamic SQL
    • Override Database File (OVRDBF) or Delete Override (DLTOVR) command issued for tables associated with an ODP that was previously opened
    • Program being shared across Switchable Independent ASPs (IASP) (V5R2) where library name is the same in each IASP
  • OPEN Optimization - Reuse Roadblocks
    • ODP requires temporary index
      • Temporary index build does not always cause an ODP to be non-reusable, optimizer does try to reuse temporary index if possible
        • If SQL run multiple times and index is built on each execution, then creating a permanent index will probably make ODP reusable
        • If host variable value used to build selection into temporary index (ie, sparse), then ODP is not reusable because temporary index selection can be different on every execution of the query
          • Optimizer will tend to avoid creating sparse indexes if the statement execution history shows it to be a "frequently executed" statement
      • Temporary indexes are not usable by other ODP's
  • OPEN Optimization - Reuse Roadblocks
    • The ODP may or may not be reused if a host variable is used to specify the pattern of a LIKE predicate. The ODP is not reused when the value contains embedded search patterns:
        HostVar = '%OU%WARE'
        SELECT * FROM DeptTbl WHERE DeptName LIKE :HostVar
      • Starting with V5R1 embedded search patterns can be implemented with a reusable ODP
    "Simple" LIKEs are reusable- HostVar = 'IBM%'
  • OPEN Optimization
    • UPDATE WHERE CURRENT OF Reuse
    • If an UPDATE WHERE CURRENT OF request contains a function or operator on the SET clause, then an open operation must be performed
    • Can avoid this open by performing the function or operation in the host language
      • Code the operation into the host language...
          FETCH EMPT INTO :Salary;
          Salary = Salary + 1000;
          UPDATE EMPLOYEE SET Salary = :Salary WHERE CURRENT OF Empt;
      • Instead of...
          FETCH EMPT INTO :Salary;
          UPDATE Employee SET Salary = :Salary+1000 WHERE CURRENT OF Empt;
  • OPEN Optimization - Reuse Considerations
    • Reusable ODPs do have one shortcoming... once reuse mode has started, the access plan is NOT rebuilt when the environment changes
      • What happens to performance if Reusable ODP is now run against a table that started out empty and has now grown 5X in size since the last execution?
      • What if the selectivity of a host variable or parameter marker is greatly different on the 5th execution of the statement?
      • What if index added for tuning after 5th execution of statement in the job?
  • OPEN Optimization
    • Actions that Delete ODPs
    • SQL DISCONNECT statement
    • CLOSQLCSR(*ENDPGM) - ONLY deletes ODP's on program exit, if it's the last SQL program on the call stack
    • A Reclaim request is issued: Reclaim Activation Group (RCLACTGRP) for ILE programs or Reclaim Resource (RCLRSC) for OPM programs
      • A Reclaim will not close ODPs when programs were precompiled using CLOSQLCSR(*ENDJOB)
      • With COBOL, RCLRSC issued when...
        • First COBOL program on the call stack ends
        • COBOL program issues the STOP RUN statement
  • OPEN Optimization
    • Actions that Delete ODPs (continued)
    • A new CONFLICT parameter was added to the ALCOBJ command in V4R5 that can be used to request that pseudo-closed cursors be hard closed
      • CONFLICT(*RQSRLS) (not the default) sends a request to release the lock to each job and thread holding a conflicting lock
        • Will not release real application locks
        • Only releases implicit system locks for Reusable ODP cursors
        • Does not release Reusable ODP locks in requestor's job, only other jobs
    • ODP reuse can also be controlled/managed with the QAQQINI options added in V4R5
      • OPEN_CURSOR_THRESHOLD & OPEN_CURSOR_CLOSE_COUNT
    • CLI provides special statement attribute & Toolbox JDBC Driver
    • OS/400 Extended Dynamic interface gives programmer control of ODP deletion
  • Dynamic & Extended Dynamic SQL
  • Dynamic SQL Tuning
    • With Dynamic interfaces, full opens are avoided by using a "PREPARE once, EXECUTE many" mentality when an SQL statement is going to be executed more than once
    • A PREPARE does NOT automatically create a new statement and full open on each execution
      • DB2 UDB performs caching on Dynamic SQL PREPAREs within a job/connection
      • DB2 UDB caching is not perfect (and subject to change), good application design is the only way to guarantee ODP reuse
      • Job Cache searched by Statement Text & Statement Name to try and reuse existing ODPs or Plans (white space matters on statement)
    PreparedStatement pst = con.prepareStatement(
        "INSERT INTO c1 VALUES( ?, ?, ?, ?, ?)");
    for (int i = 0; i < outerNumOfLoops; i++) {
      for (int j = 0; j < numOfLoops; j++) {
        pst.setString(1, "GenData_" + Integer.toString(j));
        …
        pst.addBatch();
      }
      int [] updateCounts = pst.executeBatch();
      con.commit();
    }
  • Dynamic SQL Tuning - System Cache
    • DB2 UDB for iSeries also caches access plans for Dynamic SQL requests in the SystemWide Statement Cache (SWC)
      • Only access plans are reused (No ODP reuse)
    • SWC requires no administration
      • Cache storage allocation & management handled by DB2 UDB
      • Cache is created from scratch each IPL
      • Cache churn and contention avoided by allowing limited access plan updates
        • In some cases, optimizer will build a temporary access plan to use instead of the cached access plan
        • Might think about system IPL after your database is tuned
      • Cache contents cannot be viewed; it holds a max of 165,000+ statements
    • SWC cache does interact with the job cache
  • Dynamic SQL Example
    • SQL statements are dynamically created on the fly as part of application logic DSTRING = 'DELETE FROM CORPDATA.EMPLOYEE WHERE EMPNO = 33'; EXEC SQL PREPARE S1 FROM :DSTRING; EXEC SQL EXECUTE S1;
  • Dynamic SQL Tuning - Parameter Markers
    • Parameter Markers are one implementation method for "EXECUTE many"
      • Improves chance for reusable ODPs
        • DB2 caching of access plans & ODPs done after parameter marker conversion
      • Ex: want to run the same SELECT statement several times using different values for customer state
        • 50 different statements/opens for each of the states OR...
        • Single SQL statement that allows you to plug in the needed state value
      • DB2 UDB does automate some of this
  • Dynamic SQL Tuning
    • Parameter Marker Example
        StmtString = 'DELETE FROM employee WHERE empno=?';
        ...
        PREPARE s1 FROM :StmtString;
        ...
        EXECUTE s1 USING :InputEmpNo;
        ...
  • Dynamic SQL Tuning
    • Automatic Parameter Marker Conversion
    • DB2 UDB automatically tries to convert literals into parameter markers to make statements look repetitive:
      SELECT name, address FROM customers WHERE orderamount > 1000.00 AND state = 'NY'
        CONVERTED TO:
      SELECT name, address FROM customers WHERE orderamount > ? AND state = ?

      UPDATE customers SET status = 'A' WHERE orderamount >= 10000
        CONVERTED TO:
      UPDATE customers SET status = ? WHERE orderamount >= ?
  • Dynamic SQL Tuning - Parameter Markers
    • Auto conversion of literals will NOT occur in the following cases:
      • A few complex cases where a mix of parameter markers and literals prevent auto conversion
      • Special Registers (eg, CURRENT DATE)
      • Expressions used in a SET, SELECT, or VALUES clause:
        SELECT name, SUBSTR(city,1,20) FROM customers WHERE State='IA'
    • CAST scalar function can be used to allow parameter markers in more places by promising attributes
      • Most applicable to functions. Example:
        SELECT MAX(CAST(? AS DECIMAL(8,2)), PastRate) AS BestRate FROM ...
    • Marker Conversion can impact optimizer choices
      • Sparse indexes are not used
      • Some subqueries cannot be implemented with joins
    • Statements WITHOUT parameter markers will have non-reusable ODPs, IF executed via ExecDirect or Execute Immediate interfaces
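    • A minimal JDBC sketch of the contrast (table employee from the earlier examples; the key values and list are illustrative):

      import java.sql.*;

      public class OdpReuseSketch {
          // Literal embedded in the text: each distinct value is a distinct
          // statement, so the ODP is not reusable (ExecDirect-style execution).
          static void deleteWithLiteral(Connection con) throws SQLException {
              Statement s = con.createStatement();
              s.executeUpdate("DELETE FROM employee WHERE empno = '000330'");
              s.close();
          }

          // Parameter marker: one statement text, prepared once; the same
          // reusable ODP can service every execution.
          static void deleteWithMarker(Connection con, String[] empnos) throws SQLException {
              PreparedStatement ps = con.prepareStatement(
                  "DELETE FROM employee WHERE empno = ?");
              for (int i = 0; i < empnos.length; i++) {
                  ps.setString(1, empnos[i]);
                  ps.executeUpdate();
              }
              ps.close();
          }
      }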
  • Extended Dynamic & Packages
    • Package is searched to see if there is a statement with the same SQL and attributes
      • Hash tables used to make statement searches faster
    • If a match is found, then a new statement entry name is allocated with a pointer to the existing statement information (access plan, etc)
      • DB Monitor can be used to determine if the "packaged" statement was used at execution time:
        • SELECT qqc103, qqc21, qq1000 FROM ‹db monitor table› WHERE qqrid=1000 AND qvc18='E'
  • Extended Dynamic & Packages STATEMENT NAME: QZ7A6B3E74C31D0000 Select IID, INAME, IPRICE, IDATA from TEST/ITEM where IID in ( ?, ?, ?, ?) SQL4021 Access plan last saved on 12/16/96 at 20:21:45. SQL4020 Estimated query run time is 1 seconds. SQL4008 Access path ITEM used for file 1. SQL4011 Key row positioning used on file 1. ... STATEMENT NAME: QZ7A6B3E74DD6D8000 Select CLAST, CDCT, CCREDT, WTAX from TEST/CSTMR, TEST//WRHS where CWID=? and CDID=? SQL4021 Access plan last saved on 12/16/96 at 20:21:43. SQL4020 Estimated query run time is 1 seconds. SQL4007 Query implementation for join position 1 file 2. SQL4008 Access path WRHS used for file 2. SQL4011 Key row positioning used on file 2. SQL4007 Query implementation for join position 2 file 1. SQL4006 All access paths considered for file 1. SQL4008 Access path CSTMR used for file 1. SQL4014 0 join field pair(s) are used for this join position. SQL4011 Key row positioning used on file 1.
    • Package Contents:
      • Statement name
      • Statement text
      • Statement parse tree
      • Access Plan
      • PRTSQLINF output
  • Extended Dynamic & Packages
    • Advantages of using Extended Dynamic SQL Packages:
      • Shared resource available to all users
        • Access information is reused instead of every job and every user "re-learning" the SQL statement
      • Permanent object that saves information across job termination and system termination
        • Can even be saved & restored to other systems
      • Improved performance decisions since statistical information is accumulated for each SQL statement
  • Extended Dynamic & Packages
    • The Interfaces
    • System API - QSQPRCED
      • API user responsible for creating package
      • API user responsible for preparing and describing statements into the package
      • API user responsible for checking the existence of statements and executing statements in the package
    • XDA API set
      • Abstraction layer built on top of QSQPRCED for local and remote access
    • Extended dynamic setting/configuration for IBM Client Access ODBC driver & iSeries Java Toolkit JDBC driver
      • Drivers handle package creation
      • Drivers automate the process of adding statements into the package
      • Drivers automate process of checking for existing statement and executing statements in the package
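      • A sketch of enabling this through Toolbox JDBC connection properties ("extended dynamic", "package", "package library", and "package cache" are Toolbox driver property names; the system, package, and library names are illustrative):

        import java.sql.Connection;
        import java.sql.DriverManager;

        public class ExtendedDynamicSketch {
            static Connection connect() throws Exception {
                // Register the Toolbox driver (jt400.jar), then connect with
                // extended dynamic package support enabled; the driver creates
                // MYPKG in MYLIB and adds prepared statements to it.
                Class.forName("com.ibm.as400.access.AS400JDBCDriver");
                String url = "jdbc:as400://mysystem;extended dynamic=true;"
                           + "package=MYPKG;package library=MYLIB;package cache=true";
                return DriverManager.getConnection(url, "user", "password");
            }
        }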
  • Extended Dynamic & Packages
    • QSQPRCED API functions:
      • 1 = Build new package
      • 2 = Prepare statement into package
      • 3 = Execute statement from a package
      • 4 = Open a cursor defined by statement in package
      • 5 = Fetch data from open cursor
      • 6 = Close open cursor
      • 7 = Describe prepared statement in package
      • 8 = Close open cursor and delete Open Data Path (ODP)
      • 9 = Prepare and describe in 1 step
      • A = Inquire if a statement has been prepared into package
      • B = Actually close pseudo-close cursors
      • C = Delete Package
  • Extended Dynamic & Packages
    • Considerations
    • Any SQL statement that can be prepared is eligible
      • ODBC & JDBC drivers have further restrictions
    • Size limitations
      • Current size limit is 500 MB, about 16K statements
      • Package can grow without new statements being added. Access plan rebuilds require additional storage
      • Background package compression tries to increase life and usefulness of package objects
    • Good online SQL package FAQ at the DB2 UDB for iSeries web site - www.iseries.ibm.com/db2
      • FAQ URL - http://www.iseries.ibm.com/db2/sqlperffaq.htm
  • SQL Performance Techniques & Considerations
  • Blocking for Performance - Where?
    [Diagram: the Application Program issues an SQL Request; Blocked Fetch/Insert and Blocked Cursor reduce logical I/O between the program and the ODPs in the job structure; Expert Cache/Pre-fetch reduces physical I/O between memory and the INDEX and TABLE on disk]
  • VARCHAR considerations
    • Variable length columns (VARCHAR/VARGRAPHIC)
      • If primary goal is space saving, include ALLOCATE(0) with VARCHAR definition
      • If primary goal is performance, ALLOCATE value should be wide enough to accommodate 90-95% of the values that will be assigned to the varying length column
        • Minimizes number of times that DB2 UDB has to touch data in overflow storage area
    • VARCHAR columns are more efficient on wildcard searches
      • DB2 can stop comparing at the end of the actual string value - with fixed-length columns it must scan to the end of the column, even if the remainder is all blanks
  • VARCHAR considerations
    • Fixed-length columns in primary storage, variable-length data in auxiliary (overflow) storage:
      CREATE TABLE dept (
        id CHAR(4),
        name VARCHAR(40),
        bldg_num INTEGER )
    • Fixed & "variable" length data together in primary storage:
      CREATE TABLE dept (
        id CHAR(4),
        name VARCHAR(40) ALLOCATE(40),
        bldg_num INTEGER )
  • SQL Table considerations
    • SQL-created tables are faster on reads and slower on writes than DDS-created tables
      • New data added to an SQL table goes through more validation on the way in, so no data cleansing & validation has to be performed on reads
    • If you have tables that receive a high velocity of inserts in concurrent environments, it may be beneficial to pre-allocate storage for the table
      • CHGPF FILE(lib/table1) SIZE( 125000 1000 3) ALLOCATE(*YES)
      • After CHGPF, a CLRPFM or RGZPFM command must be executed to "activate" the allocation
  • Stored Procedures
    • Huge performance savings in distributed computing environments by dramatically reducing the number of flows (requests) to the database engine
    • Performance improvements further enhanced by the option of providing result sets back to ODBC & JDBC clients
    [Diagram: a requestor issuing many individual SQL requests to DB2 UDB for iSeries versus a single call to a stored procedure (SP) that runs the same work at the server]
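    • A minimal JDBC sketch of the single-flow call (procedure name process_order is hypothetical): one CALL replaces a series of individual requests, and any result set the procedure returns flows back to the client:

      import java.sql.*;

      public class SpCallSketch {
          static void processOrder(Connection con, int orderId) throws SQLException {
              // One network flow; the procedure runs the whole unit of work at the server.
              CallableStatement cs = con.prepareCall("CALL corpdata.process_order(?)");
              cs.setInt(1, orderId);
              boolean hasResultSet = cs.execute();
              if (hasResultSet) {
                  ResultSet rs = cs.getResultSet();
                  while (rs.next()) {
                      System.out.println(rs.getString(1));
                  }
                  rs.close();
              }
              cs.close();
          }
      }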
  • Additional Information
    • IBM Workshop - ibm.com/servers/eserver/iseries/service/igs/db2performance.html (offered in Rochester in April, July, and October) AND... PRACTICE, PRACTICE, PRACTICE
    • ** 402401 (41MH) The Science & The Art of Query Optimization
    • ** 410191 (51MD) Preparing to Get the Best Performance Out of V5R2 & V5R3 (SQE)
    • Tools to help get started and make tuning easier:
      • insureSQL from Centerfield Technology (insureSQL.com)
      • IBM iSeries Navigator
    • Whitepaper on Indexing Strategy:
    • ibm.com/servers/enable/site/education/ibo/register.html?indxng
    • Latest Information on SQL Query Engine (SQE) Enhancements:
    • http://www.iseries.ibm.com/db2/sqe.html
  • Additional Information
    • DB2 UDB for iSeries home page - ibm.com/iseries/db2
    • Education Resources - Classroom & Online
      • http://www.iseries.ibm.com/db2/gettingstarted.html
      • ibm.com/servers/enable/site/education/ibo/view.html?oc#db2
      • ibm.com/servers/enable/site/education/ibo/view.html?wp#db2
    • DB2 UDB for iSeries Publications
      • Online Manuals: http://www.iseries.ibm.com/db2/books.htm
      • Porting Help: http://ibm.com/servers/enable/site/db2/porting.html
      • DB2 UDB for iSeries Redbooks (http://ibm.com/redbooks)
        • Stored Procedures, Triggers, and User-Defined Functions on DB2 UDB for iSeries (SG24-6503)
        • Preparing for & Understanding the SQL Query Engine Redbook (www.iseries.ibm.com/db2/sqe.html)
        • Modernizing iSeries Application Data Access (SG24-6393)
      • SQL/400 Developer's Guide by Paul Conte & Mike Cravitz
        • http://www.iseriesnetwork.com/str/books/Uniquebook2.cfm?NextBook=183
      • iSeries and AS/400 SQL at Work by Howard Arner
        • http://www.sqlthing.com/books.htm
  • Education Roadmap
    • ibm.com/iseries/db2/gettingstarted.html
    • ibm.com/services/learning
    • Self-study iSeries Navigator tutorials for DB2 UDB at:
    • ibm.com/servers/enable/site/education/ibo/view.html?oc#db2
    Course roadmap:
    • DB2 UDB for iSeries Fundamentals (S6145)
    • Accessing DB2 UDB for iSeries w/SQL (S6137)
    • Developing iSeries applications w/SQL (S6138)
    • DB2 UDB for iSeries SQL Advanced Programming (S6139)
    • DB2 UDB for iSeries SQL & Query Performance Workshop - ibm.com/servers/eserver/iseries/service/igs/db2performance.html
    • Piloting DB2 UDB with iSeries Navigator
    • Performance Tuning DB2 UDB with iSeries Navigator & Visual Explain
    • Integrating XML and DB2 UDB for iSeries
  • Appendix: SQL Performance Best Practices
  • Blocking for performance
    • DB2 UDB runtime engine tries to automatically block in the following cases
      • INSERT w/Subselect
        • 64K block size automatically used to allow more efficient I/O between cursors
        • Big impact on summary/aggregate table builds
        • May be able to increase efficiency with 128K blocking factors
          • Blocking factor = 128K / row length (e.g., 131072 / 128-byte rows = 1024 rows per block)
          • OVRDBF FILE(table) SEQONLY(*YES factor)
      • OPEN
        • Blocking is done under the OPEN statement when the rows are retrieved if all of the following conditions are true:
          • The cursor is only used for FETCH statements.
          • No EXECUTE or EXECUTE IMMEDIATE statements are in the program, or ALWBLK(*ALLREAD) was specified, or the cursor is declared as FOR FETCH ONLY
          • COMMIT(*CHG or *CS) and ALWBLK(*ALLREAD) are specified or COMMIT(*NONE) is specified
  • Blocking for performance
    • INSERT for N Rows
    • Applications that perform many INSERT statements in succession or via a single loop may be improved by bundling all the new rows into a single request
    • Fill host language array with new rows and then pass array of rows on single SQL insert request
    • ODBC tests showed that 500 Single Row inserts took 17 seconds versus 1.25 seconds for Blocked insert
                                       Database Manager              Database Manager
                                       with Blocking                 w/NO Blocking
    Multiple Row Insert Statement      1 SQL call, 1 database op     1 SQL call, 100 database ops
    Single Row Insert Statement        100 SQL calls, 1 database op  100 SQL calls, 100 database ops
  • Blocking for performance
    • FETCH for N Rows
    • Multiple rows of data from a table are retrieved into the application in a single request
    • SQL blocking of fetches can be improved with the following:
      • Attribute information in the target array/area matches the attribute of the columns being retrieved
      • In general, try to retrieve as many rows as possible and let the database determine the optimal blocking size
      • Do not mix single and multiple row FETCH requests on the same cursor
      • PRIOR, CURRENT, and RELATIVE options should not be used with multiple row fetch due to their random nature
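    • A minimal JDBC sketch (table and columns from the earlier examples): declare the cursor read-only and let the database choose the block size:

      import java.sql.*;

      public class BlockedFetchSketch {
          static void listCustomers(Connection con) throws SQLException {
              PreparedStatement ps = con.prepareStatement(
                  "SELECT name, address FROM customers FOR FETCH ONLY");
              ps.setFetchSize(0);  // 0 = no explicit hint; the driver/database picks the block size
              ResultSet rs = ps.executeQuery();
              while (rs.next()) {
                  // Most next() calls read from the fetched block rather than
                  // making another database request.
                  System.out.println(rs.getString("name") + " " + rs.getString("address"));
              }
              rs.close();
              ps.close();
          }
      }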
  • Miscellaneous considerations
    • Although SELECT * is very easy to code, it is far more effective to explicitly list the columns that are actually required by the application
      • Minimizes the amount of resource needed
        • For example, SELECT DISTINCT or UNION requires the selected columns to be sorted
      • Improves the query optimizer's decision making
        • Improves chances of Index Only Access method
    • Example: JDBC program that executed a statement 20 times but really only needed 3 of the 20 total columns
      • "SELECT *" caused the JDBC driver to call the database 800 times
      • "SELECT col1, col2, col3" caused the driver to call the database 120 times
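    • A sketch of the fix (column names from the example above; the orders table and process() helper are placeholders):

      import java.sql.*;

      public class SelectListSketch {
          static void fetchNeededColumns(Connection con) throws SQLException {
              // Instead of "SELECT *": only the three columns the program uses
              // cross the interface, and index-only access becomes possible.
              PreparedStatement ps = con.prepareStatement(
                  "SELECT col1, col2, col3 FROM orders");
              ResultSet rs = ps.executeQuery();
              while (rs.next()) {
                  process(rs.getString(1), rs.getString(2), rs.getString(3));
              }
              rs.close();
              ps.close();
          }
          private static void process(String a, String b, String c) { /* application logic */ }
      }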
  • Miscellaneous considerations
    • FOR FETCH ONLY clause also improves decision making by letting DB2 UDB know exactly which cursors are read only
    • Only include columns that you really intend to update in the FOR UPDATE OF clause
      • An updateable cursor through dynamic SQL, or an UPDATE statement that doesn't specify a FOR UPDATE OF clause, causes all columns to be considered updateable
    • Tell DB2 UDB as much as you know
      • Some interfaces provide options for controlling the default behavior
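    • A JDBC sketch of declaring intent (table employee from earlier examples; the salary/workdept columns are illustrative):

      import java.sql.*;

      public class CursorIntentSketch {
          static void declareIntent(Connection con) throws SQLException {
              // Read-only cursor: DB2 knows it never has to support an UPDATE.
              PreparedStatement readOnly = con.prepareStatement(
                  "SELECT empno, lastname FROM employee FOR FETCH ONLY");

              // Updateable cursor: only SALARY is treated as updateable,
              // instead of every column in the table.
              PreparedStatement updateable = con.prepareStatement(
                  "SELECT empno, salary FROM employee WHERE workdept = ? FOR UPDATE OF salary");

              // (prepare only; execution omitted)
              readOnly.close();
              updateable.close();
          }
      }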
  • Isolation Level Considerations
    • Use lowest isolation level (commitment control) possible in your application
      • The lower the level, the less system resources consumed
      • Avoid the Serializable isolation level in concurrent environments; Serializable isolation acquires exclusive table locks
    • Switching isolation levels can negatively impact ODP reuse if the same SQL statement is executed at different isolation levels
      • Switching to and from the Serializable level is especially problematic
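    • In JDBC the isolation level is set once on the connection; a minimal sketch (JDBC READ_COMMITTED typically corresponds to DB2 cursor stability, *CS):

      import java.sql.Connection;
      import java.sql.SQLException;

      public class IsolationSketch {
          static void useLowestWorkableLevel(Connection con) throws SQLException {
              // Pick the lowest level the application can tolerate and keep it
              // stable, so prepared statements keep their reusable ODPs.
              con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
          }
      }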
  • Journal Considerations
    • DB2 attempts to journal (log) all SQL created tables automatically
      • Verify that DB2 tables are only journaled when required
    • Journals can have a definite impact on SQL performance, so that's another area of investigation when doing database performance analysis. Possible places to start:
      • Journal minimal data option to minimize amount of data copied into the journal and size of the journal object
        • MINENTDTA Option on CRTJRN & CHGJRN CL commands
      • Journal Caching PRPQ (5799-BJC) if running batch jobs with isolation level of No Commit/*NONE
      • HW Configuration: Look for limited Write Cache
      • New Redbook : Striving for Optimal Journal Performance (SG24-6286)
  • Miscellaneous considerations
    • If using System Naming (*SYS - lib/table) try to avoid unqualified long table name references
      • Each time SQL statement is run, background job has to search system catalog for the corresponding short name and then determine which library in the library list to use
      • Default collection option exists for static, dynamic and extended dynamic SQL
        • QSQCHGDC API added in V4R5 to allow default collection for dynamic SQL
      • SQL Naming (*SQL) does NOT have this performance overhead, since it only looks for tables in the library with the same name as the user profile
    • Be cautious of queries run against the SQL catalog tables
    [Diagram: long table name TIME_DIMENSION maps to system short name TIME_00001 - which library should be searched?]
  • Trademarks and Disclaimers
    • © IBM Corporation 1994-2005. All rights reserved.
    • References in this document to IBM products or services do not imply that IBM intends to make them available in every country.
    • The following terms are trademarks of International Business Machines Corporation in the United States, other countries, or both:
    • Rational is a trademark of International Business Machines Corporation and Rational Software Corporation in the United States, other countries, or both.
    • Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
    • Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
    • Intel, Intel Inside (logos), MMX and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.
    • UNIX is a registered trademark of The Open Group in the United States and other countries.
    • SET and the SET Logo are trademarks owned by SET Secure Electronic Transaction LLC.
    • Other company, product or service names may be trademarks or service marks of others.
    • Information is provided "AS IS" without warranty of any kind.
    • All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.
    • Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.
    • All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your local IBM office or IBM authorized reseller for the full text of the specific Statement of Direction.
    • Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.
    • Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.
    • Photographs shown are of engineering prototypes. Changes may be incorporated in production models.
    • iSeries, IBM (logo), eServer, i5/OS, IBM, AS/400e, OS/400, e-business on demand, AS/400