Your SlideShare is downloading. ×
ANALYZING DB2 DATA SHARING PERFORMANCE PROBLEMS
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

ANALYZING DB2 DATA SHARING PERFORMANCE PROBLEMS

997
views

Published on


0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
997
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
35
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. ANALYZING DB2 DATA SHARING PERFORMANCE PROBLEMS Donald R. Deese Computer Management Sciences, Inc. 6076-D Franconia Road Alexandria, VA 22310 DB2 data sharing offers many advantages in the right environment, and performs very well when DB2 is properly tuned. DB2 data sharing can significantly increase capacity, it offers increased availability, and it provides a flexible operating environment. In certain environments, DB2 data sharing can dramatically improve performance of DB2 subsystems (e.g., when most data are shared and most of the shared data can fit into group buffer pools, data sharing members do not have to continually reference DASD). On the other hand, remarkably unsatisfying DB2 performance can be the result if DB2 data sharing is poorly tuned. This paper describes common problems with DB2 data sharing, and offers alternative solutions to the problems. 1. INTRODUCTION and Linda August provide excellent descriptions of coupling facility performance analysis. This paper is in two parts. The first part gives a brief overview of DB2 data sharing concepts and operation. Also, for simplicity in the discussion, the performance The second part discusses techniques that can be used implications of duplexed coupling facility structures are to analyze common performance problems with DB2 not discussed. Discussion of these performance data sharing, and presents alternatives that can be used implications will await a future paper. to solve the performance problems. This document does not describe all aspects of DB2 data 2: DB2 DATA SHARING OVERVIEW sharing, nor does it describe all performance problems that could occur in a DB2 data sharing environment. DB2 data sharing provides the ability of two or more DB2 Such a detailed discussion would require a formal subsystems to directly access and change a single set of seminar on DB2 data sharing, and is beyond the scope data. Up to 32 DB2 subsystems can share the data, and of this document. the subsystems function as members of a data sharing group. Rather, this document describes aspects of DB2 data sharing that are necessary to understand the implications The sharing of data requires that a mechanism be in of common performance problems, it describes the most place to ensure that data integrity and consistency are common performance problems, and it describes maintained. If one member of the DB2 data sharing alternate solutions to the common problems. DB2 data group should change any shared data, the mechanism sharing environments certainly exist in which these must make sure that all other members of the data descriptions would be inadequate, but this document sharing group are advised that their copy of the data is no should prove useful for most DB2 data sharing longer valid. environments. In particular, the document does not describe much of 2.1: Group Buffer Pools the analysis related to coupling facility performance problems. The referenced presentations by Joan Kelley The primary mechanism that is used to ensure data integrity, aside from the basic DB2 locking algorithms provided by Internal Resource Lock Manager (IRLM), is Permission is granted to CMG to publish this document in its publications. Computer Management Sciences, Inc. retains its right to distribute copies of this the use of group buffer pools that reside in a coupling document to whomever it chooses. facility. Group buffer pools are shared by all members of CPExpert is a trademark of Computer Management Sciences, Inc., Alexandria, the data sharing group, and there is a one-to-one VA 22310. All other trademarks are the property of their respective owners. mapping between group buffer pools and virtual buffer pools (e.g., Group Buffer Pool 6 corresponds to Virtual ©Copyright 2001, Computer Management Sciences, Inc., Alexandria, VA Buffer Pool 6).
  • 2. The primary purpose of a group buffer pool is to ensure Directory data integrity and consistency among members of the Entries Data page data sharing group. Data page Data page • If there is inter-DB2 Read/Write (R/W) interest for a Data page page set or page set partition, each member in the Data page data sharing group must register its interest in a page Data from the shared page set or page set partition before Entries the page can be read into the member’s local buffer pool. This interest must be registered in a directory entry in the group buffer pool. Consequently, every page in the local buffer pool is registered in the page Data page directory of the corresponding group buffer pool when there is inter-DB2 R/W interest. Exhibit 1: Group Buffer Pool Structure • When a page is changed by any data sharing member, the directory entry for the page is updated to indicate that the page has been changed. All members of the The division is based on a specified ratio of directory data sharing group are then notified that the page has entries to data entries. The default ratio is 5:1, meaning been changed, and that the page in their local buffer that there are five directory entries for each data entry in pool is no longer valid but must be refreshed if the group buffer pool. referenced again. In this way, data consistency is maintained across data sharing members. There are more directory entries than data entries because (unless GBPCACHE ALL is specified), only A secondary purpose of a group buffer pool is to provide changed pages from a virtual buffer pool are placed in improved performance for data sharing members that the group buffer pool. Unless there is extensive update read pages from shared page sets or partitions. activity, changed pages account for a relatively small percent of the total number of pages in a virtual buffer • A data sharing member can write pages to the group pool. buffer pool when the pages are changed1, or optionally when the pages have been read from DASD. However, a directory entry must be created to account for each page from a shared page set that is in any virtual • When needed by other data sharing members, pages buffer pool or hiperpool of members of the DB2 data from shared page sets or partitions can be retrieved sharing group. Consequently, there normally would be from the coupling facility, rather than being read from far more directory entries than there would be data DASD. Retrieval from the coupling facility normally entries. IBM has decided that, as a default, there should would be much faster than retrieval from DASD 2. be five times as many directory entries as there are data entries. As illustrated in Exhibit 1, group buffer pool storage is divided into directory entry storage and data entry Data entries are the same size as the page in the storage. This means that, of the total storage allocated corresponding local buffer pool in a data sharing to the coupling facility structure containing the group member (e.g., if the page were defined as 16K in the buffer pool, some of the storage is reserved for directory local buffer pool, the data entry in the group buffer pool entries and some is reserved for data (page) entries. would be 16K). Data entries hold the actual data read This division of storage is done when the group buffer from DASD or updated by data sharing members. pool is created and can be changed via an ALTER GROUPBUFFERPOOL command. Directory entries are much smaller (typically about 200 bytes, but the size depends on the CF level). Directory entries hold control information about the data pages they describe. One implication of this difference in size between 1 Note that prior to DB2 Version 6, all changed pages were directory entries and data entries is that, although there written to the group buffer pool. With DB2 Version 6, the group buffer normally are many more directory entries than data pool optionally can be used only for data integrity and consistency by entries, the total storage in the group buffer pool that is specifying GBPCACHE NONE. occupied by directory entries normally is less than that 2 DB2 actually determines the fastest access method, and reads occupied by data entries. data from the coupling facility or from DASD, depending on which method is quicker.
  • 3. 2.2: Registering Pages • Pages are written to the group buffer pool as they are read from DASD with the GBPCACHE ALL option. As mentioned earlier, with data sharing and a shared page set, DB2 must register the page with the coupling • Only large object (LOB) space map pages are written facility (register with the group buffer pool) if there is inter- to the group buffer pool with the GBPCACHE SYSTEM DB2 R/W interest in the page set. This is done so DB2 option. can maintain integrity of the data. For example, if one DB2 should change a particular page, the page must be • Pages are never written to the group buffer pool with invalidated in the virtual buffer pool or hiperpool of all the GBPCACHE NONE option (the group buffer pool other members of the DB2 data sharing group. is used solely to ensure data integrity by implementing page cross-invalidation). The page is registered with the group buffer pool before it is read from DASD. If a directory entry exists for the Exhibit 3 shows an example action that can occur when page, the directory entry is updated to reflect that this DB2 subsystem DB2B wants to read the page that was DB2 has an interest in the page. If a directory entry does just read by DB2A. When DB2B wants to read the page, not exist for the page, a directory entry is created and it registers its interest in the page, and verifies that the updated to reflect that this DB2 has an interest in the page is not in the group buffer pool. page. Creating a directory entry means that storage must be DB2B registers interest and reads page available in the group buffer pool to hold the directory entry. Under certain situations, there may be no directory Directory entry for each registered page entry storage available in the group buffer pool and the Name Changed? Interest Flags Cached? Castout status etc. read fails because of lack of storage (this problem will be NO 1 1 0 ------- NO discussed later). D D D B B B DB2C 2 2 2 A B C Exhibit 2 illustrates example actions that can occur when DB2B DB2 subsystem DB2A (1) registers interest in a page and DB2A (2) reads the page. Data entries in DB2A registers interest and reads page Data entries in Group Buffer Pool 6 local buffer pools (BP6) Directory entry for each registered page Exhibit 3: DB2B registers interest and reads page Name Changed? Interest Flags Cached? Castout status etc. NO 1 0 0 ------- NO D D D B B B DB2C 2 2 2 A B C DB2B Notice that the page would be in the group buffer pool if GBPCACHE ALL had been specified, even though the DB2A page had not been changed. When a DB2 subsystem needs to reference a page in a Data entries in DB2 data sharing environment, it looks for the page in the Data entries in Group Buffer Pool 6 local buffer pools (BP6) following order: Step 1: Search the virtual buffer pool. The page might Exhibit 2: DB2A registers interest and reads page exist in the buffer pool, but might have been marked invalid by DB2 data sharing integrity code. If it is marked Exhibit 2 shows that the page is not written to the group invalid, DB2 skips to Step 3. buffer pool when it is read from DASD. With DB2 Version 6, there are four options with respect to whether (and Step 2: Search the hiperpool buffer pool. The page when) a page is cached in the group buffer pool: might exist in a hiperpool both with non-data sharing and with data sharing (unless GBPCACHE ALL has been • Only changed pages are written to the group buffer specified for the page set, in which case a hiperpool is pool with the GBPCACHE CHANGED option not used). However, the page might be marked invalid in (GBPCACHE CHANGED is assumed in Exhibit 2 and the hiperpool by DB2 data sharing integrity code. If it is the following exhibits). marked invalid, DB2 goes to Step 3.
  • 4. Step 3: Search the group buffer pool. Unless Exhibit 4 shows that page has been written to the group GBPCACHE parameters override, DB2 next checks the buffer pool. Actually, this is misleading, since the page group buffer pool for the page. This is done if there is probably is still only in the local buffer pool of DB2A (the R/W interest in the page set or the partition, or if the page dashed line and dashed box around YES under set is defined as GBPCACHE ALL. “Cached?” indicate that the page might not yet be cached). Step 4: Read the page from DASD. DB2 reads the page from DASD if the page is not found after applying the In a non-data sharing environment, changed pages in a above search algorithm. virtual buffer pool are not normally immediately written to DASD once they are changed. These changed pages In our example, the page is not in the group buffer pool. are retained in the virtual buffer pool awaiting the Consequently, DB2B must read the page from DASD. possibility that they will be referenced again. The pages remain in the virtual buffer pool until certain thresholds This is a good illustration of the potential benefits of are exceeded or until a DB2 checkpoint is taken. This GBPCACHE ALL. Suppose that DB2B regularly reads concept is termed “deferred writes” of the pages. pages from the table space that it shares with DB2A. If sufficient coupling facility storage were available, and if With data sharing, DB2 still performs deferred writes (i.e., the application was important, perhaps GBPCACHE ALL the page is retained in the virtual buffer pool). However, should be specified. In this illustration, DB2B would have when an update is to a page set for which inter-DB2 R/W found the page in the group buffer pool (because it would interest exists, DB2 writes the updated pages from the have been placed in the group buffer pool by DB2A) and virtual buffer pool to the group buffer pool, rather than would not have to read the page from DASD. A writing the pages to DASD. DB2 writes updated pages considerable improvement in performance could result, from the virtual buffer pool to the group buffer pool when: particularly if many pages were read. • A page is required for update act ion by another system because there is no conflict on transaction locking 2.3: Maintaining Data Integrity (such as with page sets that are using row locking, index pages, space map pages, etc.). Suppose that DB2A changes the page being discussed. The page must be invalidated in the local buffer pools of • The transaction commits. When committing a all data sharing members that have an interest in the transaction that performed an UPDATE, pages that page. were updated but not yet written to the group buffer pool are synchronously written to the group buffer pool. • Normal DB2 virtual buffer algorithms cause the page DB2A changes page, page invalidated in DB2B to be written to DASD (e.g., the virtual buffer pool’s Directory entry for each registered page DWQT or VDWQT thresholds were exceeded). Name Changed? Interest Flags Cached? Castout status etc. YES 1 1 0 YES ------- 2.4: Reading Changed Page D D D B B B DB2C 2 2 2 A B C DB2B When a DB2 member needs a page of data and finds the page in its local buffer pool (either a virtual buffer pool or DB2A a hiperpool), it tests to see if the buffer contents are still valid. If the local buffer contents are not valid, the page must be refreshed, either from the group buffer pool or Data entries in DASD. Data entries in Group Buffer Pool 6 local buffer pools (BP6) Exhibit 5 illustrates the example actions that occur when a DB2 subsystem wishes to read a page that has been Exhibit 4: Maintaining data integrity invalidated. In this example, the page was in the local buffer pool. Exhibit 4 illustrates the example actions that occur when However, it had been marked invalid by DB2 data sharing DB2 changes a page and invalidates the page in other integrity code, since it had been changed by DB2A. DB2 subsystems in the data sharing group. In this Consequently, DB2B must refresh its version of the page. illustration, the only other data sharing member with an In this example, DB2B finds the page in the group buffer interest in the page is DB2B, and the page is invalidated pool, and reads the page from the group buffer pool. in DB2B’s local buffer.
  • 5. In Exhibit 5, DB2A is shown as writing the page to the 2.5: Castout Process group buffer pool only when it is required by DB2B. This situation illustrates that even though the page had been Changed pages do not stay in the group buffer pool changed, DB2 does not necessarily write the page to the indefinitely. Eventually, the pages are written to DASD. group buffer pool immediately on change. DB2 uses a castout process to write changed3 data from a group buffer pool, through a DB2 private buffer, to DB2B reads page from Group Buffer Pool DASD. The castout process occurs during a group buffer pool checkpoint. Between checkpoints, castout normally Directory entry for each registered page is controlled by the value specified for two group buffer Name Changed? Interest Flags Cached? Castout status etc. pool thresholds: YES 1 1 0 ------- YES D B D B D B DB2C • Class Castout Threshold. The Class Castout 2 A 2 B 2 C Threshold is a percent of changed pages on a castout DB2B class queue. This threshold determines the total DB2A number of changed pages that can exist in a castout class queue before castout occurs. There are a fixed number of castout class queues in Data entries in local buffer pools (BP6) each group buffer pool. DB2 internally maps changed Data entries in Group Buffer Pool 6 pages that belong to the same page sets or partitions to the same castout class queues in the group buffer pool. Since there are a fixed number of queues, more Exhibit 5: Reading an invalidated page than one page set or partition can be mapped to a single castout class queue. Whether the page is in the group buffer pool when When DB2 writes changed pages to the group buffer needed by DB2B depends on whether the Deferred Write pool, it determines how many changed pages are on a Threshold (DWQT) or the Vertical Deferred Write particular class castout queue. If the Class Castout Threshold (VDWQT) on DB2A had been reached, Threshold (changed pages on queue divided by total whether the transaction on DB2A had committed, how pages in group buffer pool) is exceeded for any queue, DB2B obtained a lock, etc. Thus, there is a dashed line DB2 casts out a number of pages from that queue to between DB2A and the page in the group buffer pool, bring the percent of changed pages below the Class indicating that DB2A might write the page to the group Castout Threshold. buffer pool only when required to satisfy the request from DB2B. The Class Castout Threshold is conceptually similar to the VDWQT threshold for local buffer pools, except Under many conditions, the page will reside in the group that the castout class queue may contain multiple page buffer pool when needed by a data sharing member even sets or partitions. though the page had been invalidated. This is because the changed page had been written to the group buffer The Class Castout Threshold has a default of 10, pool, based on the conditions described in Section 2.3. meaning that castout begins when changed pages on a castout class queue are more than 10% of the total Once having been written to the group buffer pool, the pages in the group buffer pool. This default can be changed page will reside in the group buffer pool until changed by the ALTER GROUPBUFFERPOOL castout (castout is described in the next section). When CLASST command. needed by another DB2 data sharing member, the page might not have been castout from the group buffer pool • Group Buffer Pool Castout Threshold. The Group to DASD. Even if the page had been castout, its data Buffer Pool Castout Threshold is a percentage of the entry might not have been reclaimed. total number of pages in the group buffer pool. This threshold determines the total number of changed Alternatively, the page might have been written to the pages that can exist in the group buffer pool before group buffer pool because GBPCACHE ALL had been castout occurs. specified. In either of these cases, the page would be available in the group buffer pool when needed. 3 Unchanged pages in the group buffer pool are not castout. Unchanged pages would be in the group buffer pool if the GBPCACHE ALL option were used.
  • 6. When DB2 writes changed pages to the group buffer even after they have been written to DASD or to a group pool, it determines how many changed pages are in buffer pool). the group buffer pool. If the Group Buffer Pool Castout Threshold (changed pages divided by total pages in group buffer pool) is exceeded, DB2 casts out pages 2.6: Reclaiming Entries from class castout queues to bring the percent of changed pages below the Threshold. A group buffer pool has a fixed number of directory entries and data entries. Directory entries are used when The Group Buffer Pool Castout Threshold is some DB2 data sharing member registers a page with conceptually similar to the DWQT threshold for local the group buffer pool. Data entries are used when a buffer pools. changed page is written to the group buffer pool (with GBPCACHE CHANGED), or when an unchanged page The Group Buffer Pool Castout Threshold has a is written to the group buffer pool (with GBPCACHE ALL). default of 50, meaning that castout begins when changed pages in the group buffer pool are more than Directory entries are released when there is no more 50% of the total pages in the group buffer pool. This R/W interest in a page set or partition. Data entries are default can be changed by the ALTER not “released” as such. Their status changes from GROUPBUFFERPOOL GBPOOLT command. “changed” to “unchanged” when they are castout. Additionally, the storage occupied by data entries is The actual castout is done by the “owner” of the page set simply available when the corresponding directory entry or partition. The DB2 subsystem that is assigned is released. “ownership” of castout for a page set or partition is the DB2 subsystem that had the first update intent on the However, the amount of allocated group buffer pool page set or partition. After the castout ownership is storage can be inadequate for the DB2 data sharing assigned, subsequent updating DB2 subsystems become workload. In such a situation, DB2 must “reclaim” data backup owners. One of the backup owners becomes the entries or directory entries. castout owner if the original castout owner should no longer have R/W interest in the page set. • Reclaiming data entries. Data entries for changed pages are “reclaimed” by the castout process Exhibit 6 illustrates the castout process. described in Section 2.5 of this document. As described in that section, castout pages simply have DB2A CASTOUT page to DASD their status altered from “changed” to “unchanged” in their associated directory entry once they are castout. Directory entry for each registered page Name Changed? Interest Flags Cached? Castout status etc. Unchanged pages in the group buffer pool can be NO 1 1 0 ------- YES YES reclaimed by DB2 when space is needed to write a D B D B D B DB2C page to the group buffer pool, but there is no unused DB2B 2 A 2 B 2 C DB2A DB2A data entry storage. In this case, DB2 reclaims private buffer “unchanged” pages on a “least recently used” (LRU) Note basis. Data entries in local buffer pools (BP6) • Reclaiming directory entries. Directory entries can be reclaimed when a DB2 data sharing member Data entries in Group Buffer Pool 6 wishes to register a page as a part of reading the page, but there are no directory entries available. Exhibit 6: Castout process When directory entries are reclaimed, the associated pages must be invalidated in each DB2 registered for the pages. When a page has been invalidated, any Even after the page has been castout, it normally DB2 member of the data sharing group must reread remains in the group buffer pool, and can be read by the page from DASD if the DB2 member again wishes any DB2 member. However, the status of a castout page to reference the page. Rereading the page requires is altered from “changed” to “unchanged” since it has re-registration of the page (including creating a been written to DASD. Consequently, the data entry of directory entry). This can cause considerable the castout page can be “reclaimed” by DB2 if necessary. overhead and delay to the application. This feature (leaving castout pages in the group buffer The performance implications of reclaiming data entries pool) is conceptually similar to that with local buffer pools and directory entries are explored in the next section. (changed pages in a local buffer pool remain in the pool
  • 7. 3: ANALYZING DB2 DATA SHARING • The virtual buffer pool attributes records are written at each statistics interval (also written if monitor trace DB2 data sharing has many laudatory features, as class 1 is on, and are written each time IFCID 106 is mentioned at the beginning of this document. written because of a start trace command). These records contain information such as the size of the However, some users complain that performance is poor local buffer pools, deferred write thresholds, etc. Since with DB2 data sharing, or users complain that there is the virtual buffer pool parameters can be dynamically excessive overhead. changed (via ALTER BUFFERPOOL), change information is important when analyzing performance Often, these complaints are valid only because of data. improper sizing of group buffer pools, an inappropriate directory entry/data entry ratio for a group buffer pool • SMF Type 74 (Subtype 2 and Subtype 4) records when considering the specific DB2 workload, incorrect contain information about XCF activity and coupling DB2 data sharing parameters, or even inappropriate facility activity, respectively. This information is very parameters for DB2 virtual buffer pools that adversely useful to help identify problems with coupling facility affect DB2 data sharing performance. requests and delays. Information is provided for both the DB2 lock structure and the DB2 group buffer pools, This section describes standard data that is available for from the view of XCF. The structure names identify analyzing DB2 data sharing performance problems, whether a structure is a DB2 lock or group buffer pool discusses some of the common problems that occur with structure. DB2 data sharing, describes the analysis that is used to identify the problems, and suggests alternatives that • DB2 accounting information is written to SMF Type should be considered to solve the problems. 101 records. This source of performance analysis data is listed last because the records are voluminous and require significant processing time. The accounting 3.1: DB2 Performance Data information would be analyzed only if it were necessary to associate performance problems to specific The data that normally is used to analyze DB2 data applications. sharing performance problems come from standard System Management Facility (SMF) records. • Data set information is available in SMF Type 42 records. This information would be analyzed to identify • DB2 interval statistics are collected continuously data set problems. while DB2 is running, and the statistics are recorded to SMF at regular intervals. The default recording interval is 30 minutes. Recording at 15 minutes adds little 3.2: Common DB2 Data Sharing Problems overhead, but can significantly enhance the analysis of DB2 performance problems. The DB2 interval Analyzing standard data related to DB2 performance can statistics are written to SMF Type 100. usually identify the cause of problems, and changes normally can be made to significantly improve The interval statistics contain group buffer pool performance of DB2 data sharing. information that can be analyzed to determine how well DB2 data sharing is performing, and to determine The examples given in the following sections are meant causes of poor performance. to illustrate the analysis and possible solutions. Neither the analysis nor the solutions are complete; other areas The interval statistics also contain virtual buffer pool can be analyzed and other solutions can be explored in statistics. It often is necessary to analyze the virtual each area of analysis. buffer pool information in order to determine whether local buffer pools are causing problems for group In some instances, conflicts between applications might buffer pools. be so great that it might be impossible to obtain satisfactory DB2 data sharing performance. In this case, • The group buffer pool attributes records are written it might be necessary to decrease the data sharing level when statistics trace class 5 is on (also written if associated with specific group buffer pools. To monitor class 1 is on). These records contain accomplish this, page sets or partitions could be information such as the size of the group buffer pools, reassigned to different virtual buffer pools (with castout thresholds, etc. Since the group buffer pool corresponding reassignment to different group buffer parameters can be dynamically changed (via ALTER pools), additional virtual buffer pools could be created GROUPBUFFERPOOL), change information is (with corresponding additional group buffer pools), or important when analyzing performance data. some application could be moved between data sharing members and moved to another processing time.
  • 8. 3.2.1: Coupling facility read requests could not the directory entry/data entry ratio. This can be done complete by ALTER GROUPBUFFERPOOL. As mentioned in the first part of this document, a page Recall that the directory entry size is small relative to must be registered with the group buffer pool before it is the size of the data entry (directory entries are normally read from DASD. If a directory entry exists for the page, about 200 bytes while data entries are the size of the the directory entry is updated to reflect that this DB2 has local buffer pool page, which is a minimum of 4096 an interest in the page. If a directory entry does not exist bytes). Consequently, increasing the directory for the page, one is created and then updated to reflect entry/data entry ratio might yield a relatively large that this DB2 has an interest in the page. increase in the number of directory entries, while not dramatically reducing the number of data entries. Creating a directory entry means that storage must be available in the group buffer pool to hold the directory • Increase the size of the group buffer pool. If the entry. If many pages are registered (or if the ratio of above solution is not feasible because of a relative directory entries to data entries is too low), there might be shortage of data entries, consider increasing the size of no storage in the group buffer pool that is available for the group buffer pool. Increasing the size of the group creating a directory entry. In this case, the DASD read buffer pool will, of course yield more directory entries fails and directory entries must be reclaimed in the group (since part of the increase would be used for directory buffer pool so a new directory entry can be created for entries and part would be used for data entries). the page. This situation can cause serious performance If the size of the group buffer pool is increased, problems: consider increasing the directory entry/data entry ratio at the same time. This would cause the increased • When a DASD read fails, the DB2 application must group buffer pool storage to be used for directory wait for the page if the read were a synchronous read entries. (a random page read or a sequential page read that had not been read as part of a prefetch operation). • Reduce the data sharing level of the group buffer pool. If neither of the above alternatives is feasible, • When a DASD read fails for a prefetch operation, the consider reducing the data sharing level of the group prefetch operation is aborted. This means that pages buffer pool. The beginning of this section describes that otherwise would be prefetched via an some ways to reduce the data sharing level associated asynchronous I/O operation must be read via a with a specific group buffer pool. synchronous I/O operation when the pages are required by the application. 3.2.2: Coupling facility write requests could not • When directory entries are reclaimed, the associated complete pages must be invalidated in each DB2 registered for the pages. This page invalidation process creates A data entry must be available in the group buffer pool overhead and uses system resources. when a page of data is to be written to the group buffer pool. If the group buffer pool does not have a data entry • When a page has been invalidated in each DB2 available, the write is rejected because of a lack of registered for the page, those DB2s must reread the storage. When a write is rejected, DB2 initiates castout page from DASD if the DB2s again wish to reference processing if castout is not already in progress. The DB2 the page. Rereading the page requires re-registration member waits for 3 seconds after triggering the castout, of the page (including creating a directory entry). and it retries the write operation for up to four times. Reclaiming pages because of a shortage of directory Pages that cannot be written to the group buffer pool are entry storage can cause “thrashing” of pages between added to the Logical Page List and message DSNB250E buffer pools and DASD, and can generate significant is issued. This action would require a recovery of the overhead. page set before the pages could be used. This problem is detected by examining the QBGLRF This problem is detected by examining the QBGLWF (read fail) variable in the DB2 interval statistics. A (write fail) variable in the DB2 interval statistics. A problem is indicated when the QBGLRF value is non- problem is indicated when the QBGLWF value is non- zero. zero. Possible solutions to this problem include the following Possible solutions to this problem include the following alternatives: alternatives: • Increase the directory entry/data entry ratio. The • Reduce the local buffer pool thresholds. Failed easiest solution to this problem is to simply increase writes typically result from castout processing not being
  • 9. able to keep up with the writes. There are several 3.2.3: Low read hit percentage for cross-invalidated causes, but the most common is that a transaction pages commits and “floods” the group buffer pool with write requests. This “flood” of write requests can be reduced When a DB2 data sharing member writes changed by reducing the local buffer pool thresholds (VDWQT pages to the group buffer pool, the page is invalidated in and DWQT). This would spread the write requests more the buffer of all other DB2 members. A major reason for evenly over the life of the transaction. writing the page to the group buffer pool is the expectation that other data sharing members will • Increase frequency of castout. By increasing the eventually read the changed page, rather than having to frequency of castout processing, pages in the group read the page from DASD. If the changed page is not buffer pool would be castout to DASD more often. read again, there is considerable overhead required to Consequently, more pages would normally be write the page to the group buffer pool, overhead for available. This is accomplished by reducing the group cross-invalidation, overhead for castout processing, and buffer pool checkpoint interval (the default is eight overhead for re-registering the page. This overhead is a minutes), reducing the group buffer pool castout waste of system resources. threshold, or reducing the group buffer pool class castout threshold. More significant is the situation where the invalidated page is required by another data sharing member, but • Increase the size of the group buffer pool. As the page has already been castout to DASD, and its data mentioned above, the basic cause of the failed write is entry has been reclaimed for another page. In this case, that a data entry was not available. Consider not only is the overhead a waste of system resources, but increasing the size of the group buffer pool if the above the DASD accesses (both for castout and for reread by actions do not yield satisfactory results. the other data sharing member) could potentially be avoided. • Issue COMMIT more frequently. If the application can issue COMMIT more frequently, few pages would The situation is caused by the “residency time” of the normally be written to the group buffer pool at each pages being too short before the pages are castout. COMMIT. Consequently, there would be a greater opportunity for the castout processing to castout pages This problem is detected by computing the read hit between COMMIT operations. percent for cross-invalidated pages: • Verify GBPCACHE ALL is required. Some QBGLXD Read hit percent 100 applications might have specified GBPCACHE ALL, QBGLXD  QBGLXR and DB2 could be “flooding” the group buffer pool during periods of heavy prefetch activity. In this case, investigate these applications to determine whether this Based on IBM’s guidance, a problem is indicated when specification is necessary. the read hit percent for cross-invalidated pages is less than 95%. • Review I/O configuration. Check the I/O configuration (and the coupling facility Possible solutions to this problem include the following performance) to ensure that castout processing alternatives: is not being delayed by I/O waits. • Increase Class Castout Threshold or Group Buffer • Reduce the directory entry/data entry ratio. It is Pool Castout Threshold. Pages normally are castout possible that the directory entry/data entry ratio is too when a group buffer pool checkpoint occurs, or when high, resulting in too much storage reserved for the Class Castout Threshold or Group Buffer Pool directory entries rather than being available for data Castout Threshold is reached. If only a few page sets entries. or partitions are assigned to the group buffer pool, castout might occur prematurely because of the However, the directory entry is small relative to the size castout thresholds. If a low total read hit percentage of the data entry (directory entries are normally about occurs, consider increasing the thresholds. 200 bytes while data entries are the size of the local buffer pool page, which is a minimum of 4096 bytes). However, since all changed pages are castout at group Consequently, adjusting the directory entry/data entry buffer pool checkpoint, The “castout engine not ratio usually would have little positive effect in this available” situation might be encountered if the situation. thresholds are excessively high. This situation is described in Section 3.2.4.
  • 10. • Verify that GBPCACHE ALL is required. Some • Review I/O configuration. Another common reason application may have specified GBPCACHE ALL and that a castout engine is not available is that the castout DB2 was writing pages to the group buffer pool rate occurs too quickly for the coupling facility and the regardless of whether they were changed. In this case, DASD system to handle the I/O, or that the DB2 review the total read hit percentage (changed and “owning” the group buffer pool (and responsible for unchanged pages) to see if GBPCACHE ALL is castout processing) is overloaded with higher productive. importance work. Verify that the coupling facility and the DASD I/O system are not experiencing • Consider GBPCACHE NONE. One or more of the unnecessary delay, and that sufficient capacity exists page sets assigned to the group buffer pool might be for the DB2 that owns the group buffer pool. a candidate for GBPCACHE NONE (if operating with DB2 Version 6)4. This alternative would be considered, • Reduce the group buffer pool castout thresholds. of course, only if total read hit percentage was very low The Class Castout Threshold or the Group Buffer Pool (e.g., the percentage was less than 5%). Threshold might be too high, with the result that a burst of write activity to the group buffer pool could cause a • Increase the size of the group buffer pool. As corresponding burst of castout activity. Decreasing mentioned above, the basic cause of the low read hit these thresholds could reduce the bursts of castout. percentage for cross-invalidate pages is that the “residency time” of the pages is too short before the • Increase the size of the group buffer pool. pages are castout. Consider increasing the size of the Increasing the size of the group buffer pool might help group buffer pool if the above actions do not yield this situation, but only if the bursts of activity described satisfactory results. above can be managed. 3.2.4: Castout engine was not available 3.2.5: Coupling facility write engine was not When a castout operation is scheduled, a castout engine available is used for the castout process. DB2 has up to 300 castout engines, depending on the castout workload and The coupling facility write operations are implemented by the amount of DBM1 storage available. When castout is DB2 subtasks, called write engines. The maximum scheduled, there might not be an available engine. In number of concurrent write subtasks is a constant this situation, castout cannot commence for the castout designed into DB2 (the maximum write engines). process scheduled, and castout waits for an engine to Coupling facility write operations are suspended if there become available. are no more write engines when DB2 wishes to write pages to the coupling facility. This problem is detected by examining the QBGLCN (castout engine not available) variable in the DB2 interval When a "must complete" operation to the group buffer statistics. A problem is indicated when the QBGLCN pool fails (a write at commit or rollback, or a read for value is non-zero. rollback or restart), the page is put in the logical page list (LPL) and becomes unavailable. The START Possible solutions to this problem include the following DATABASE command must be used to recover pages alternatives: from the LPL. • Reduce the local buffer pool thresholds. Local This problem is detected by examining the QBGLSU buffer pool write activity can occur in bursts when a (coupling facility write engine not available) variable in the transaction commits and “floods” the group buffer pool DB2 interval statistics. A problem is indicated when the with write requests for changed pages. This “flood” of QBGLSU value is non-zero. write requests can cause the group buffer pool thresholds to be repeatedly reached, with Possible solutions to this problem include the following corresponding castout operations initiated. A “flood” of alternatives: write requests can be reduced by lowering the local buffer pool thresholds (VDWQT or DWQT). This will • Reduce the local buffer pool thresholds. Local spread the write requests more evenly over the life of buffer pool write activity can occur in bursts when a the transaction. transaction commits and “floods” the group buffer pool with write requests for changed pages. This “flood” of write requests can cause the pool of write engines to 4 If GBPCACHE NONE is specified, IBM suggests that DASD be exhausted. The “flood” of write requests can be Fast Write should be enabled. Additionally, the vertical deferred write reduced by lowering the local buffer pool thresholds threshold (VDWQT) should be set to 0, to cause deferred write pages (DWQT or VDWQT). Lowering the thresholds would to be continuously written. Continuously writing these pages would avoid a large surge of write activity at COMMIT.
  • 11. spread the write requests more evenly over the life of Possible solutions to this problem include the following the transaction. alternatives: • Verify GBPCACHE ALL is required. Some • Increase the directory entry/data entry ratio. application may have specified GBPCACHE ALL and Excessive cross-invalidations due to directory reclaims DB2 was writing pages to the group buffer pool indicate that there are too few directory entries. The regardless of whether they were changed. In this case, number of directory entries can be increased by review the total read hit percentage (changed and increasing the directory entry to data entry ratio. unchanged pages) to see if GBPCACHE ALL is Implementing this option would reduce the number of productive. data entries, but changing the directory entry to data entry ratio is relatively easy to implement. A small • Issue COMMIT more frequently. If the application increase in the directory entry to page entry ratio could can issue COMMIT more frequently, few pages would result in a large increase in the number of directory normally be written to the group buffer pool at each entries, with only a small decrease in the number of COMMIT. This would minimize the number of changed data entries. This is because the directory entry is so pages written from the local buffer pool when the small (about 200 bytes) relative to the data entry (4096 application does issue COMMIT. bytes or more, depending on the page size of the corresponding local buffer pool). 3.2.6: Excessive cross-invalidations due to • Increase the size of the group buffer pool. Since the directory reclaims group buffer pool is divided into directory entries and data entries, increasing the size of the group buffer When directory entries are reclaimed to handle new pool would provide more directory entries. This option work, cross-invalidation must occur for all members of is not likely to be satisfactory, however, unless either the DB2 data sharing group that have those pages in (1) the size of the group buffer pool is significantly their buffer pools, even when the data has not actually increased, or (2) the directory entry to data entry ratio changed. This cross-invalidation process requires is increased. overhead, as all members must be notified that their copy of the page is invalid. 4: CONCLUSIONS If another member of the DB2 data sharing group needs access to the invalidated page, that DB2 must read the This document has described some of the more serious page from DASD. This means that the DB2 must first problems with DB2 data sharing. Fortunately, these register the page with the coupling facility (it must acquire problems are easy to detect from standard data that is a directory entry and register interest in the page), and available in SMF records. Many of the listed alternatives then read the page from DASD. are easy to implement, yet the solutions can result in significantly improved performance with DB2 data Directory reclaim activity can result in (1) increased read sharing. I/O activity to the data base to reacquire a referenced data item, (2) increased CPU utilization and coupling facility overhead caused by re-registering interest in REFERENCES pages, and (3) elongated transaction response times. In some situations, the directory reclaim activity and cross- “Parallel Sysplex Tuning Update” series, Joan Kelley, invalidation can lead to "thrashing" among DB2 IBM, Session 2523 at SHARE Winter 2001 in Long members, with significant overhead and poor DB2 Beach. performance. “Coupling Facility and DB2 data sharing dance to the This problem is detected by computing the percent same tune,” Linda August, IBM, Session 2524 at SHARE cross-invalidated pages due to directory reclaims: Winter 2001 in Long Beach. Percent XI (directory reclaims) R744CXDR 100 DB2 Data Sharing: Planning and Administration, DB2 R744CXDR  R744CDER Version 4, SC26-3269 DB2 Data Sharing: Planning and Administration, DB2 Based on IBM’s guidance, a problem is indicated when Version 5, SC26-8961 the read hit percent for cross-invalidated pages is greater than zero (i.e., when there are any cross-invalidations DB2 Data Sharing: Planning and Administration, DB2 due to directory reclaims). Version 6, SC26-9007
  • 12. DB2 Universal Database for OS/390 and z/OS: Administrative Guide, DB2 Version 7, SC26-9931 DB2 Administration Guide, Version 4 , SC26-3269 DB2 Administration Guide, Version 5 , SC26-8957 DB2 Administration Guide, Version 6 , SC26-9003 DB2 Universal Database for OS/390 and z/OS: Data Sharing and Administration, DB2 Version 7, SC26-9935 DB2 for MVS/ESA Version 4 Data Sharing Performance Topics Redbook, SG24-4611 DB2 for OS/390 Version 5 Performance Topics Redbook SG24-2213 DB2 UDB for OS/390 Version 6 Performance Topics Redbook, SG24-5351 OS/390 MVS Setting Up a Sysplex. GC28-1779 OS/390 MVS Parallel Sysplex Configuration: Volume 2 Cookbook, SG24-2076 DSNWMSGS (IFCD field descriptions) for DB2 UDB for OS/390 Version 6