DB2 10/11 for z/OS System Performance Monitoring and Optimisation
John Campbell, Florence Dubois
IBM DB2 for z/OS Development
One-day education seminar
Tuesday, September 9, 2014 – 09:30 AM - 04:30 PM | Platform: DB2 for z/OS
Objectives
• Focus on key areas
• System address space CPU, EDM pools, dataset activity, logging, lock/latch contention, DBM1
virtual and real storage, buffer pools and GBP,…
• Identify the key performance indicators to be monitored
• Provide rules-of-thumb to be applied
• Typically expressed in a range, e.g. < X-Y
• If <X, no problem - GREEN
• If >Y, need further investigation and tuning - RED
• Boundary condition if in between - AMBER
• Investigate with more detailed tracing and analysis when time available
• Provide tuning advice for common problems
Topics
• Data Collection
• DB2 System Address Space CPU Time
• EDM Pools
• Dataset Open/Close
• Log Activity
• Lock/Latch
• DBM1 Virtual Storage and Real Storage
• Buffer Pools and Group Buffer Pools
• RDS Sort
• DDF Activity
• Miscellaneous
Data Collection
Statistics Classes
• Class 1: Statistics data (IFCIDs 1, 2, 105, 106, 202, 225)
• Class 2: Installation-defined statistics record (IFCID 152)
• Class 3: Deadlock, lock escalation, group buffer pool, data set extension information, indications of long-running URs, and active log space shortages (IFCIDs 172, 196, 250, 258, 261, 262, 313, 330, 335, 337)
• Class 4: DB2 exceptional conditions (IFCIDs 173, 191-195, 203-210, 235, 236, 238, 267, 268, 343, 402)
• Class 5: DB2 data sharing statistics record (IFCIDs 230, 254)
• Class 6: Storage usage details (IFCID 225)
• Class 7: DRDA location statistics (IFCID 365)
• Class 8: Data set I/O statistics (IFCID 199)
• Class 9: Aggregated CPU and wait time by connection type (IFCID 369)
• How to Start, Modify, Stop traces
• -START TRACE(STAT) CLASS(*)
• -MODIFY TRA(S) CLASS(*) TNO(1)
• -STOP TRA(A) TNO(2)
Note: IFCID details are documented in SDSNIVPD(DSNWMSGS)
DB2 System Parameters
• SMFSTAT
• YES (default) starts the trace for the default classes (1,3,4,5,6)
• Recommendation to start all the statistics traces by specifying SMFSTAT=(1,3,4,5,6,7,8,9)
• CPU overhead of all the DB2 Statistics Traces is negligible
• STATIME
• V10: IFCIDs 2, 202, 217, 225, 230 are written at fixed one-minute intervals
• Only 1440 intervals per day
• Essential for studying the evolutionary trend that led to complex system problems, e.g. slowdowns
• Very small SMF data volume
• V10: STATIME applies only to IFCIDs 105, 106, 199 and 365
• Default: 1 minute
• Consider setting STATIME to 5 minutes if SMF record type 100 data volume is a problem
• SYNCVAL
• NO (default) specifies that statistics recording is not synchronized
• 0 to 59 – DB2 stats interval is synchronized with the beginning of the hour (0 minute past the hour) or
with any number of minutes past the hour up to 59
SMF Records
• V10: New DB2 ZPARM SMFCOMP
• Specifies whether DB2 is to compress trace records that are to be written to SMF
• Very useful to reduce SMF records volume
• Experience shows ~ 70-80% compression with only a small CPU overhead at ~1%
• Default OFF
• Need to check that all tools that post process DB2 SMF records can handle compressed SMF
records before turning compression ON
• DB2 Statistics Records are written as SMF 100 records
• Recommendation to copy SMF 100 records, and to keep them separately
• SMF 100 records represent a relatively small amount of the total SMF data volume
• Improved elapsed time performance to post process
DB2 System Address Space CPU Time
System Address Space CPU Time

ROT: All TCB times should be low relative to MSTR and DBM1 SRB times
ROT: IRLM SRB time should be low relative to MSTR and DBM1 SRB times
• If not, needs further investigation
• If distributed application, DDF SRB (preempt SRB + IIP SRB) is typically the highest by far as it also includes Accounting TCB time

CPU TIMES                TCB TIME   PREEMPT SRB  NONPREEMPT SRB   CP CPU TIME  PREEMPT IIP SRB  CP CPU /COMMIT
--------------------  -----------  ------------  --------------  ------------  ---------------  --------------
SYSTEM SERVICES AS       8.749583     56.375746        1.198326   1:06.323655         1.079948        0.000040
DATABASE SERVICES AS     1.807715     14.305106     1:54.209902   2:10.322723      1:40.832862        0.000079
IRLM                     0.000931      0.000000       42.573512     42.574443         0.000000        0.000018
DDF AS                  10.682058  13:58.424540     1:59.796978  16:08.903575     14:56.025519        0.000588
TOTAL                   21.240287  21:07.030296     4:37.778719  20:08.124397     16:37.938329        0.000725

CP CPU TIME = TCB TIME + PREEMPT SRB + NONPREEMPT SRB << Total CPU Time on general CP
PREEMPT IIP SRB << Total CPU Time on zIIP
TOTAL CPU TIME = CP CPU TIME + PREEMPT IIP SRB
TOTAL SRB TIME = PREEMPT SRB + NONPREEMPT SRB + PREEMPT IIP SRB
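As a worked example, a small Python sketch (variable names are illustrative) applying these formulas to the DATABASE SERVICES AS row above, with mm:ss values converted to seconds:

# Values from the DATABASE SERVICES AS row of the statistics report
tcb_time = 1.807715
preempt_srb = 14.305106
nonpreempt_srb = 114.209902     # 1:54.209902
preempt_iip_srb = 100.832862    # 1:40.832862

cp_cpu_time = tcb_time + preempt_srb + nonpreempt_srb   # CPU on general CPs
total_cpu_time = cp_cpu_time + preempt_iip_srb          # general CPs + zIIPs
total_srb_time = preempt_srb + nonpreempt_srb + preempt_iip_srb

print(f"CP CPU TIME    = {cp_cpu_time:.6f}s")   # 130.322723s = 2:10.322723, matches the report
print(f"TOTAL CPU TIME = {total_cpu_time:.6f}s")
print(f"TOTAL SRB TIME = {total_srb_time:.6f}s")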
TCB and SRB Times – Major Contributors

Application (ACCOUNTING):
• TCB: SQL processing, Synch I/O, Global lock requests*, Buffer updates, Lock requests, Logical logging, GBP reads*
• SRB: The same as in the TCB case, but only in enclave preemptible SRB mode; reported in TCB instrumentation

DBAS (STATISTICS):
• TCB: Dataset Open/Close, Extend/Preformat, DBM1 Full System Contraction
• SRB: Deferred write, Castout*, P-lock negotiation*, Async GBP write*, GBP checkpoints*, Prefetch read, Parallel child tasks, Delete Name*, Notify Exit*, Clean-up of pseudo-deleted IX entries

SSAS (STATISTICS):
• TCB: Archiving, BSDS processing
• SRB: Physical log write, Checkpoints, Backouts, Thread deallocation, Update commit incl. GBP write and page P-lock unlock*

IRLM (STATISTICS):
• SRB: Error checking, Management, Deadlock detection, IRLM and XES global contention*, Async XES request*, Local IRLM latch contention, P-lock negotiation*

(*) Data Sharing specific
Delete Name = pageset close or pseudo-close to convert to non-GBP dependent
Note: Prefetch read and deferred write are eligible to run on zIIP in V10; most other SRB-mode system agents become eligible in V11
WLM Policy Setup
• Common problems
• Systems are run at very high CPU utilisation for elongated periods of time
• Little or no non-DB2 work to pre-empt when the system becomes 100% busy i.e., no non-DB2
work that can be sacrificed
• Sporadic slowdowns or hangs of a DB2 subsystem or an entire DB2 data sharing group as a
result of poorly managed WLM policy settings
• DB2 application workloads are allowed to ‘fall asleep’
• If a thread is starved while holding a major DB2 internal latch or other vital resource, this can
cause an entire DB2 subsystem or data sharing group to stall
• Especially if other work running on the LPAR has latent demand for CPU
• DB2 system address spaces (AS) are not protected from being pre-empted under severe load
• Any delays in critical system tasks as a result of a DB2 system AS being pre-empted can lead to
slowdowns and potential ‘sympathy sickness’ across the data sharing group
• See next slide
WLM Policy Setup
• Why protect DB2 system AS from being pre-empted?
• Answer: These AS are critical for efficient system operation and should be defined with an
aggressive goal and a very high importance
• MSTR contains the DB2 system monitor task
• Requires an aggressive WLM goal to monitor CPU stalls and virtual storage constraints
• DBM1 manages DB2 threads and is critical for both local latch and cross-system locking negotiation
• Any delay in negotiating a critical system or application resource (e.g. P-lock on a space map page)
can lead to a slowdown of the whole DB2 data sharing group
• DIST and WLM-managed stored procedure AS only run the DB2 service tasks i.e. work performed for
DB2 not attributable to a single user
• Classification of incoming workload, scheduling of external stored procedures, etc.
• Typically means these address spaces place a minimal CPU load on the system
• BUT… they do require minimal CPU delay to ensure good system wide performance
• The higher CPU demands to run DDF and/or SP workloads are controlled by the WLM service class
for the DDF enclave workloads or the other workloads calling the SP
• Clear separation between DB2 services which are long-running started tasks used to manage the
environment and transaction workloads that run and use DB2 services
WLM Policy Setup ...
• Recommendation
• Configure the WLM policy defensively to deal with situations when the system is over-
committed
• General guidelines
• VTAM, IRLM, and RRS must be mapped to service class SYSSTC
• All DB2 system AS should be isolated into a unique user-defined service class defined with
Importance 1 and a very high velocity goal (e.g. 85-90)
• MSTR, DBM1, DIST, WLM-managed stored procedure AS
• As of V9, recommendation was to put DB2 MSTR into SYSSTC
• When z/OS WLM Blocked Workload Support is enabled (see later slides), Imp1 with a very high
velocity goal is generally sufficient for MSTR
• Do not generally recommend putting DB2 DBM1 into SYSSTC because of risk of DB2 misclassifying
incoming work
• Special case: If experiencing recurring DB2 slowdowns when running at very high CPU utilisation
for elongated periods of time, move all DB2 system AS (MSTR, DBM1, DIST, WLM-managed store
procedure AS) to service class SYSSTC to help with PD/PSI
WLM Policy Setup ...
• General guidelines ...
• Reserve Importance 1 for critical server address spaces (DB2 AS, IMS control region, CICS TOR) and use Importance 2-5 for all business applications, starting at I=2 for the high-priority ones
• Set CPU critical for DB2 system AS to protect against stealing of CPU from lower priority work
• Try to have non-DB2 work ready to pre-empt
• Remove any discretionary definitions for any DB2-related work
• Do not use Resource Group Capping e.g., for batch
• Let WLM decide on how to best manage mixed workloads
• For CICS and IMS, favour 80th or 90th percentile response time goals
• Transactions are typically non-uniform
• Response time goals must be practically achievable – use RMF Workload Activity Report to validate
• Online workloads that share DB2 objects should have similar WLM performance goals to prevent
interference slowdowns
• Example: An online DDF workload classified much lower in importance than an online CICS workload
• If they share any of the DB2 objects, a low running DDF enclave may end up causing the CICS online
workload to become stalled
WLM Policy Setup ...
• General guidelines ...
• Processing capacity should be planned at no more than 92-95% average at peak
• Especially if there is little non-DB2 work that can be sacrificed at high CPU utilisation
• Risks of running at very high system utilisation over long periods of time
• More likely to expose latent stress-related software defects and/or user setup problems
• Transaction path length increases at very high system utilisation
• With a corresponding increase in CPU resource consumption
• Application locks as well as DB2 internal latches will be held for longer periods
• Will increase lock/latch contention, which will behave in a non-linear fashion
• Will lead to greatly extended In-DB2 transit time (response time)
• Extra processing capacity activated using IBM Capacity on Demand offering should be turned on
ahead of anticipated peaks – not when hitting the wall at 100%
• Otherwise the extra processing is used to manage contention and not net new workload
• Always need some spare capacity for automated and service recovery actions
• Enable defensive mechanisms to help with stalled workloads
• -DISPLAY THREAD(*) SERVICE(WAIT) command
• z/OS WLM Blocked Workload Support
-DIS THREAD(*) SERVICE(WAIT)
• Identifies allied agents and distributed DBATs that have been suspended for more than x
seconds
• x = MAX(60, 2x IRLM resource timeout interval)
• If the thread is suspended due to IRLM resource contention or DB2 latch contention,
additional information is displayed to help identify the problem
• Note that DISTSERV may return false positives for DBATs that have not been in use for more than x seconds
19:23:01.31 STC42353 DSNV401I -DT45 DISPLAY THREAD REPORT FOLLOWS –
DSNV402I -DT45 ACTIVE THREADS –
NAME ST A REQ ID AUTHID PLAN ASID TOKEN
CT45A T * 12635 ENTRU1030014 ADMF010 REPN603 00C6 90847
V490-SUSPENDED 04061-19:20:04.62 DSNTLSUS +00000546 14.21
-DIS THREAD(*) SERVICE(WAIT) …
• Will also attempt to dynamically boost priority for any latch holder that appears to be
stuck
• Targeted boost via WLM services
• Can run with SCOPE(GROUP)
• Output piped back to the originating member via notify message
• Recommendations
• Driven at one-minute interval by the internal DB2 System Monitor
• But the diagnostic information is not written out
• Recommend to issue the command via automation on a one-minute interval and save away the output
as diagnostics
• See also DSNV507I message on -DISPLAY THREAD(*) TYPE(SYSTEM) output
V507-ACTIVE MONITOR, INTERVALS=1235, STG=47%, BOOSTS=0, HEALTH=100
REGION=1633M, AVAIL=1521M, CUSHION=375M
z/OS WLM Blocked Workload Support
• Allows small amounts of CPU to be leaked onto stalled dispatchable units of work on the
system ready queue
• Not specific to DB2 workloads
• Allows even frozen discretionary jobs to get some small amount of CPU
• But will not help jobs that are stalled because of resource group capping
• Controlled by two parameters in IEAOPTxx parmlib member
• BLWLINTHD – Threshold time interval for which a blocked address space or enclave must wait
before being considered for promotion
• Default: 20 seconds
• BLWLTRPCT – How much of the CPU capacity on the LPAR is to be used to promote blocked
workloads
• Default: 5 (i.e. 0.5%)
• For more information, see APARs OA17735, OA22443, and techdoc FLASH10609
• http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10609
z/OS WLM Blocked Workload Support …
• Recommendations
• All customers should run with this function ENABLED
• May prove very beneficial to be more aggressive than the defaults for systems that run at a high level of utilisation with little or no batch workload
• APAR OA44526 allows a lower BLWLINTHD threshold limit
• Previous value range for IEAOPTxx keyword BLWLINTHD was 5-65535 seconds
• New lower limit is 1
• Changing BLWLINTHD to 2-5 seconds may provide better overall system throughput at very high CPU
utilisation
• Regularly review the statistics on this function provided in RMF CPU Activity and Workload
Activity reports
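• Example (illustrative values only, assuming APAR OA44526 is applied): coding BLWLINTHD=5 and BLWLTRPCT=5 (the default) in IEAOPTxx promotes work that has been blocked for 5 seconds, spending at most 0.5% of LPAR CPU capacity on the promotions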
zIIP Capacity
• In DB2 10, prefetch and deferred writes engines are now eligible for zIIP offload – in DB2
11, all SRB-mode system agents (except p-lock negotiation) are eligible for zIIP offload
• These DB2 tasks must be dispatched very quickly
• Any delays could result in
• Significant elapsed time degradation for some batch and DB2 utility jobs
• Very high count for 'prefetch engine not available' in the DB2 Statistics Trace
• Many installations running with default settings IIPHONORPRIORITY=YES,
HIPERDISPATCH=YES and ZIIPAWMT=3200 in IEAOPTxx parmlib member
• zIIP processors can get help from standard processors (GCP)
• zIIP needs to be running for 3.2 msec before it checks to see if it should ask for help from GCPs
('alternate wait management')
• Many requests can be flowing through the zIIP during this time period. But if the zIIP has been running
for the length of time specified by ZIIPAWMT, the queue of work is still growing, and all the zIIPs are
busy, then the zIIP signals a GCP for help to process the work.
zIIP Capacity …
• With the above default settings and if the zIIP capacity is under-configured
• DB2 prefetch engines can end up queuing for a zIIP for up to 3.2 msec before they are
dispatched on a GCP
• Minimum value when HIPERDISPATCH=YES is ZIIPAWMT=1600 – still very high
• Of course, this could be much worse if the zIIP processors were not allowed to ask GCPs for
help (IIPHONORPRIORITY=NO)
• Correct technical solution is to add more zIIP capacity
• zIIP capacity planning must be elevated to the same level as GCP capacity planning
• Need to proactively monitor zIIP CPU usage
• Using the % of workload redirect (APPL% IIPCP in RMF) – not just the zIIP % utilisation
• zIIP engines should be treated as ‘assist-processors’
• Target utilisation for zIIP engines is likely to be much lower than the limit used for GP usage due to
the time-critical nature of the work likely to be offloaded there
zIIP Capacity …
• Sample RMF statistics
EDM Pools
EDM Pools – V9 Picture (DBM1 Address Space)

Below the 2GB bar:
• RDS Pool Below: the part of CT/PT that must be below the bar
• ZPARM EDMPOOL in V9 defines the size of this 31-bit RDS pool (getmained storage)

Above the 2GB bar:
• RDS Pool Above: the part of CT/PT that can be above the bar (no external ZPARM)
• Skeleton Pool (SKCT/SKPT), sized by EDM_SKELETON_POOL
• Global Dynamic Statement Cache (dynamically prepared statements, SKDS), sized by EDMSTMTC
• Database Descriptors Pool (DBD), sized by EDMDBDC
EDM Pools – V10/V11 Picture (DBM1 Address Space)

Below the 2GB bar:
• 31-bit user agent local storage (variable storage) for plans and packages bound prior to V10
• ZPARM EDMPOOL now controls the amount of 31-bit thread storage used

Above the 2GB bar:
• 64-bit user agent local storage (shared variable storage) for plans and packages bound in V10 (no external ZPARM)
• Skeleton Pool (SKCT/SKPT), sized by EDM_SKELETON_POOL
• Global Dynamic Statement Cache (dynamically prepared statements, SKDS), sized by EDMSTMTC
• Database Descriptors Pool (DBD), sized by EDMDBDC

V11: New limits for EDM pools (can be up to 4 GB)
EDM Pool changes in DB2 10 (CM)
• RDS getmained storage pools (below / above) are eliminated
• Plan and package structures (CTs/PTs) are allocated from the user agent local storage
• Agent local pools are not shared, so no latch is required
• LC24 contention eliminated
• For plans and packages bound prior to DB2 10:
• CT and PT structures are still allocated below 2GB bar (31-bit variable storage)
• For plans and packages bound in DB2 10:
• CT and PT structures are moved above the 2GB (64-bit shared storage)
• Column procedures (SPROC, IPROC, …) are still allocated and shared below the 2GB bar for
applications rebound on DB2 10
• ZPARM EDMPOOL now limits the amount of allocated 31-bit storage, rather than
allocating the storage at DB2 startup
• ZPARM EDMBFIT is removed
Plan/Package and Skeleton Storage
Field Name Description
QISEKPGE PAGES IN SKEL POOL
QISESKCT PAGES HELD BY SKCT
QISESKPT PAGES HELD BY SKPT
QISEKLRU STEALABLE PAGES
QISEKFRE FREE PAGES
QISEKFAL FAILS DUE TO SKEL POOL FULL
QISECTG CT REQUESTS
QISECTL CT NOT FOUND
QISEKTP PT REQUESTS
QISEKTL PT NOT FOUND
EDM POOL QUANTITY /SECOND /THREAD /COMMIT
--------------------------- -------- ------- ------- -------
...
PAGES IN SKEL POOL (ABOVE) 4964.00 N/A N/A N/A
HELD BY SKCT 20.00 N/A N/A N/A
HELD BY SKPT 1584.77 N/A N/A N/A
STEALABLE PAGES 1602.77 N/A N/A N/A
FREE PAGES 3359.23 N/A N/A N/A
% PAGES IN USE 0.04 N/A N/A N/A
FAILS DUE TO SKEL POOL FULL 0.00 0.00 0.00 0.00
...
CT REQUESTS 13.00 0.00 1.00 0.00
CT NOT FOUND 0.00 0.00 0.00 0.00
CT HIT RATIO (%) 100.00 N/A N/A N/A
PT REQUESTS 162.7K 45.19 12.5K 0.95
PT NOT FOUND 6.00 0.00 0.46 0.00
PT HIT RATIO (%) 100.00 N/A N/A N/A
CT/PT HIT RATIO = (CT/PT REQUESTS – CT/PT NOT FOUND) / CT/PT REQUESTS * 100
DBM1 AND MVS STORAGE BELOW 2 GB QUANTITY
-------------------------------------------- ------------------
...
THREAD PLAN AND PACKAGE STORAGE (MB) 0.00
DBM1 STORAGE ABOVE 2 GB QUANTITY
-------------------------------------------- ------------------
...
THREAD PLAN AND PACKAGE STORAGE (MB) 23.44
Field Name Description
QISESQCB 31-BIT THREAD PLAN STORAGE
QISESQKB 31-BIT THREAD PACKAGE STORAGE
QISESQCA 64-BIT THREAD PLAN STORAGE
QISESQKA 64-BIT THREAD PACKAGE STORAGE
Plan/Package and Skeleton Storage ...
• If EDMPOOL is too small – after migration to DB2 10 and before first REBIND
• Only applies to applications bound on a prior release
• CT and PT pages are still allocated below the bar
• Fewer threads can run concurrently
• Race condition with CTHREAD/MAXDBAT
• Queuing or resource unavailable “SQLCODE -904”
• Strong recommendation to REBIND in V10 to maximise 31-bit virtual storage constraint relief
• If Skeleton Pool is too small
• Increased response time due to loading of SKCT, SKPT
• Increased I/O against SCT02 and SPT01
ROT: 31-BIT THREAD PLAN AND PACKAGE STORAGE < 75% of EDMPOOL
ROT: Increase Skeleton Pool size as needed to keep CT/PT HIT RATIO > 95 to 99%
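A minimal Python sketch of the hit ratio calculation (sample values taken from the report above; variable names are illustrative):

# EDM skeleton pool hit ratios
ct_requests, ct_not_found = 13, 0        # CT REQUESTS / CT NOT FOUND
pt_requests, pt_not_found = 162_700, 6   # PT REQUESTS / PT NOT FOUND

ct_hit_ratio = (ct_requests - ct_not_found) / ct_requests * 100
pt_hit_ratio = (pt_requests - pt_not_found) / pt_requests * 100
print(f"CT hit ratio: {ct_hit_ratio:.2f}%")   # 100.00%
print(f"PT hit ratio: {pt_hit_ratio:.2f}%")   # 100.00% (investigate if below 95-99%)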
DBD Pool
• If DBD Pool is too small
• Increased response time due to loading of DBD
• Increased I/O against DBD01
• Pages in DBD pool are stealable – as long as they are not in use
• In extreme cases, a DBD pool that is too small could result in resource unavailable “SQLCODE -904”
Field Name Description
QISEDPGE PAGES IN DBD POOL (ABOVE)
QISEDBD PAGES HELD BY DBD
QISEDLRU STEALABLE PAGES
QISEDFRE FREE PAGES
QISEDFAL FAILS DUE TO DBD POOL FULL
QISEDBDG DBD REQUESTS
QISEDBDL DBD NOT FOUND
EDM POOL QUANTITY /SECOND /THREAD /COMMIT
--------------------------- -------- ------- ------- -------
PAGES IN DBD POOL (ABOVE) 75843.00 N/A N/A N/A
HELD BY DBD 1275.00 N/A N/A N/A
STEALABLE PAGES 1006.93 N/A N/A N/A
FREE PAGES 74568.00 N/A N/A N/A
% PAGES IN USE 0.35 N/A N/A N/A
FAILS DUE TO DBD POOL FULL 0.00 0.00 0.00 0.00
...
DBD REQUESTS 375.9K 104.43 28.9K 2.19
DBD NOT FOUND 0.00 0.00 0.00 0.00
DBD HIT RATIO (%) 100.00 N/A N/A N/A
ROT: Increase DBD Pool size as needed to keep DBD HIT RATIO > 95 to 99%
DBD Pool …
• To keep the DBD Pool storage under control
• Try to minimise the number of objects in a database
• Especially in data sharing environment where multiple DBD copies can be stored in the DBD Pool due to
DBD invalidation by other members
• Smaller DBDs also help with contention (see Locking)
• Compact DBD by REORG and MODIFY if many DROP TABLE in segmented tablespace
DB2 Statement Caching – Overview (NO CACHING)
• ProgramA: PREPARE S -> full prepare; EXECUTE S; COMMIT -> prepared statement S is deleted; EXECUTE S -> fails with -514/-518; PREPARE S -> full prepare again; EXECUTE S
• ProgramB: PREPARE S -> full prepare; EXECUTE S; EXECUTE S
• Without caching, every PREPARE is a full prepare and no prepared statement survives COMMIT
DB2 Statement Caching – Overview …
• Significant cost to fully prepare a dynamic SQL statement
• Dynamic statement caching introduced in DB2 for OS/390 and z/OS V5
• Used by dynamic SQL applications to reuse and share prepared statements
• Conditions for reuse of SQL statement from dynamic statement cache
• SQL is dynamically prepared SELECT, UPDATE, DELETE or INSERT
• The statement text is identical, character for character (literals are problematic; see the example at the end of this list)
• The authorisation ID is the same
• REOPT(VARS) disables use of cache for that plan/package
• Two levels of caching
• Global Dynamic Statement Cache
• Local Dynamic Statement Cache
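• As an illustration of the 'identical text' condition: SELECT COL1 FROM T1 WHERE ACCT = 123 and SELECT COL1 FROM T1 WHERE ACCT = 456 create two separate cache entries (hypothetical table and column names), whereas SELECT COL1 FROM T1 WHERE ACCT = ? can be reused for both values; this is why parameter markers are preferred over literals for cacheable dynamic SQL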
Global Dynamic Statement Cache (CACHEDYN=YES)
• ProgramA: PREPARE S -> full prepare, SKDS cached in the GDSC; EXECUTE S; COMMIT -> local prepared statement S is deleted; EXECUTE S -> fails with -514/-518; PREPARE S -> short prepare (copied from the GDSC); EXECUTE S
• ProgramB: PREPARE S -> short prepare (found in the GDSC); EXECUTE S; EXECUTE S
• Only the first prepare is a full prepare; subsequent prepares are short prepares from the cached SKDS
Global Dynamic Statement Cache …
NOTES:
• Enabled by ZPARM CACHEDYN = YES
• Statement text and executable of the prepared statement (SKDS) is cached for reuse
across all threads
• Only first prepare is full prepare, otherwise short prepare, which is a copy from global
cache into thread storage
• No prepared statement is kept in thread storage across commit
Global Dynamic Statement Cache …
• Cached Statements Invalidation
• ALTER TABLE
• DROP dependent object
• REFRESH TABLE
• REVOKE privilege
• CREATE INDEX
• LOAD/REORG
• RUNSTATS
• Both RUNSTATS TABLE and RUNSTATS INDEX (narrower scope) are supported
• UPDATE NONE REPORT NO is the most efficient way to invalidate cache entry
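• Example (hypothetical object name): RUNSTATS TABLESPACE DB1.TS1 UPDATE NONE REPORT NO invalidates the cached statements that depend on objects in DB1.TS1 without collecting or reporting any statistics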
Local Dynamic Statement Cache (CACHEDYN=YES + KEEPDYNAMIC(YES))
• ProgramA: PREPARE S -> full prepare, SKDS cached in the GDSC; EXECUTE S; COMMIT -> prepared statement S is kept in thread storage; EXECUTE S -> prepare avoided
• ProgramB: PREPARE S -> short prepare (copied from the GDSC); EXECUTE S; EXECUTE S
• With KEEPDYNAMIC(YES), prepared statements survive COMMIT in the local cache, so subsequent executes can avoid the prepare entirely
Local Dynamic Statement Cache …
• Enabled by BIND option KEEPDYNAMIC(YES)
• Prepared statements kept in thread storage across commit so that prepares can be
avoided
• Application logic needs to reflect the bind option
• Same prepared SQL statement can be stored in several threads
• Least Recently Prepared Statements are thrown away from the LDSC at commit based on
MAXKEEPD
• MAXKEEPD limits the stored executables only; the statement text is always kept in thread storage
• V10: Local Dynamic Statement Cache is entirely moved above 2GB
• Opportunity to increase MAXKEEPD to improve Local Dynamic Cache Hit Ratio and reduce the
number of short prepares
• Important: Must provision additional real storage to back the requirement
Dynamic Statement Cache Performance
• GDSC hit ratio = [Short Prepares] / [Short + Full Prepares]
• LDSC hit ratio = [Prepares Avoided]/[Prepares Avoided + Implicit Prepares]
• If the statement is not found in the LDSC >> implicit prepare
• Can result in either Short or Full Prepare
Field Name Description
QXPREP NUMBER OF SQL PREPARE STATEMENTS
QISEDSI FULL PREPARE REQUESTS
QXSTIPRP IMPLICIT PREPARES
QXSTNPRP PREPARES AVOIDED
QXSTDEXP PREP STMT DISCARDED - MAXKEEPD
QXSTDINV PREP STMT DISCARDED - INVALIDATION
DYNAMIC SQL STMT QUANTITY /SECOND /THREAD /COMMIT
------------------------ -------- ------- ------- -------
PREPARE REQUESTS 124.5K 5.78 0.75 0.25
FULL PREPARES 17446.00 0.81 0.10 0.04
SHORT PREPARES 108.1K 5.02 0.65 0.22
GLOBAL CACHE HIT RATIO (%) 86.10 N/A N/A N/A
IMPLICIT PREPARES 0.00 0.00 0.00 0.00
PREPARES AVOIDED 5603.00 0.26 0.03 0.01
CACHE LIMIT EXCEEDED 0.00 0.00 0.00 0.00
PREP STMT PURGED 3.00 0.00 0.00 0.00
LOCAL CACHE HIT RATIO (%) 100.00 N/A N/A N/A
ROT: GDSC hit ratio should be > 90-95%
ROT: LDSC hit ratio should be > 70%
(1) PREP STMT PURGED = QXSTDEXP+QXSTDINV
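Both ratios as a small Python sketch (sample values from the report above; variable names are illustrative):

full_prepares = 17_446        # QISEDSI
short_prepares = 108_100
implicit_prepares = 0         # QXSTIPRP
prepares_avoided = 5_603      # QXSTNPRP

gdsc_hit = short_prepares / (short_prepares + full_prepares) * 100
ldsc_den = prepares_avoided + implicit_prepares
ldsc_hit = prepares_avoided / ldsc_den * 100 if ldsc_den else 100.0
print(f"GDSC hit ratio: {gdsc_hit:.1f}%")   # 86.1% is below the 90-95% ROT -> investigate
print(f"LDSC hit ratio: {ldsc_hit:.1f}%")   # 100.0%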
Recommendations
• Global Dynamic Statement Cache
• Should be turned on if dynamic SQL is executed in the DB2 system
• Best trade-off between storage and CPU consumption for applications executing dynamic SQL
• Local Dynamic Statement Cache
• Should only be used selectively for application with a limited number of SQL statements that are
executed very frequently
• Bind the packages with KEEPDYNAMIC(YES) in a separate collection
• JCC: Define Data Source properties (WebSphere customProperty)
• -keepDynamic=1
• -JdbcCollection=collname
• DB2 10 opens the opportunity to increase the LDSC for better hit ratio
• Recommendation to set an appropriate MAXKEEPD value so as not to over-allocate REAL storage
• BUT…. Distributed threads cannot go inactive
• DB2 resources for distributed threads are not freed at commit
• No Sysplex workload balancing
Dataset Open/Close
Dataset Open/Close – ZPARMs
• DSMAX
• Maximum number of data sets that can be concurrently open – up to 200,000
• Retrofitted to DB2 10 via APAR PM88166 (z/OS 1.13 recommended for performance)
• z/OS 1.12 delivered functions that allowed significant data set allocation elapsed time reductions
• New interface for open and close data sets can be enabled by one of the following actions:
• Update ALLOCxx PARMLIB member to set the SYSTEM MEMDSENQMGMT to ENABLE
• Recommended as it remains effective across IPLs
• Issue the system command: SETALLOC SYSTEM,MEMDSENQMGMT=ENABLE
• A DB2 restart is required to make the change effective
• Deferred close
• When the number of available open data set slots reaches MIN(100, 1% of DSMAX), DB2 closes
MIN(300, 3% of DSMAX) data sets
• First, page sets defined with CLOSE YES are closed (on an LRU basis)
• If more need to be closed, page sets defined with CLOSE NO (on an LRU basis)
• APAR PM56725 – Unnecessary OPEN/CLOSE when DSMAX reached (stale list)
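A quick Python sketch of the deferred-close arithmetic (the DSMAX value is an example):

dsmax = 100_000
close_trigger = min(100, int(dsmax * 0.01))   # free open-data-set slots left when deferred close starts
close_count = min(300, int(dsmax * 0.03))     # data sets closed, CLOSE YES page sets first (LRU)
print(close_trigger, close_count)             # 100 300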
Dataset Open/Close – ZPARMs …
• PCLOSEN – PCLOSET
• Control the pseudo-close mechanism
• DB2 automatically converts updated page sets or partitions from read-write to read-only state when
either condition is met
• PCLOSEN - Number of consecutive DB2 system checkpoints since a page set or partition was last
updated
• Default 10 checkpoints (new default in V10 – used to be 5)
• PCLOSET - Elapsed time since a page set or partition was last updated
• Default 10 minutes
• The SYSLGRNX entry is closed
• Any updated pages are externalized to disk
• Also control the deferred close of GBP-dependent objects with CLOSE YES
• See next slide
CLOSE (YES) versus CLOSE (NO)
• Significant difference between CLOSE(NO) and CLOSE(YES) for active data sharing i.e.
when there is GBP dependency (1 updater, 1 reader)
CLOSE(YES)
• When the next PCLOSET/N occurs for the reader, physical close of the object
• (+) Data set ceases to be GBP-dependent. The remaining member can update the data set in its local buffer pool.
• (-) Physical open on next access from reader

CLOSE(NO)
• When the next PCLOSET/N occurs for the reader, no physical close
• (+) No physical open on next access from reader
• (-) Data set will stay GBP-dependent (SIX to readers IS) until it is stopped, or DB2 is stopped normally, or DSMAX is hit and all CLOSE(YES) objects have been closed
Dataset Open/Close – Tuning
• Frequent dataset opens are typically accompanied by high DBM1 TCB time and/or Acctg
Class 3 Dataset Open time
• Could be caused by hitting DSMAX too frequently
• Increase DSMAX in small incremental steps and study the effects
• PM19528 added new IFCID 370/371 pair for OPEN/CLOSE monitoring
Field Name Description
QTMAXDS OPEN DATA SETS - HWM
QTDSOPN DATA SETS CURRENTLY OPEN
QTMAXPB DS NOT IN USE, BUT NOT CLOSED – HWM
QTSLWDD DS NOT IN USE, BUT NOT CLOSED
QTDSDRN DSETS CLOSED-THRESH.REACHED
QTPCCT CONVERTED R/W -> R/O
QBSTDSO DATA SETS PHYSICALLY OPENED
OPEN/CLOSE ACTIVITY QUANTITY /SECOND /THREAD /COMMIT
--------------------------- -------- ------- ------- -------
OPEN DATASETS - HWM 23488.00 N/A N/A N/A
OPEN DATASETS 23118.06 N/A N/A N/A
DS NOT IN USE,NOT CLOSE-HWM 22890.00 N/A N/A N/A
DS NOT IN USE,NOT CLOSED 22417.63 N/A N/A N/A
IN USE DATA SETS 700.43 N/A N/A N/A
DSETS CLOSED-THRESH.REACHED 0.00 0.00 0.00 0.00
DSETS CONVERTED R/W -> R/O 6485.00 0.30 0.04 0.01
TOTAL GENERAL QUANTITY /SECOND /THREAD /COMMIT
-------------------------- -------- ------- ------- ------
...
NUMBER OF DATASET OPENS 4819.00 0.22 0.03 0.01
ROT: NUMBER OF DATASET OPENS < 0.1 to 1/sec
Dataset Open/Close – Tuning …
• Possible side effects of pseudo close activity (RW->RO switching)
• Expensive SYSLGRNX processing
• Growth in SYSLGRNX pageset
• Frequent dataset close and re-open
• Ping-pong in and out of GBP dependency
• V10 – Expensive scan of XXXL local buffer pools is now avoided
• Recommendations
• Take frequent system checkpoints
• Set CHKFREQ=2-5 (minutes)
• Adjust PCLOSEN/T to avoid too frequent pseudo closes
• Use CLOSE(YES) as a design default
ROT: #DSETS CONVERTED R/W -> R/O < 10-15 per minute
Log Activity
Writing to the Active Log
• When does DB2 write to the active log?
• UR-related events
• End Commit Ph1
• Begin Commit Ph2 (2 log writes for 2 phase commit)
• Ph1/Ph2 combined for 1 phase commit (1 log write for 1 phase commit)
• Database writes
• "Write ahead" logging protocol
• All undo/redo records pertaining to the page being written must be hardened on disk before the
page is written to DASD or CF
• Page P-lock negotiation and index leaf-page split in data sharing
• 2 forced physical log writes per index leaf-page split
• Force log up to PGLOGRBA (RBA/LRSN of last update to the page)
• System checkpoints
Writing to the Active Log …
• When does DB2 write to the active log? …
• Log write threshold reached
• Fixed "20 buffer" value
• ARCHIVE LOG command
• IFI read for IFCID 306 from another member
• Log writes are chained, up to 128 4K CIs per write I/O
• For dual logging
• V9
• Full CI writes are fully overlapped
• CI re-writes are done serially
• V10
• All log write I/Os are done in parallel
• Elapsed time improvements
Output Log Buffer
• Size controlled by ZPARM OUTBUFF
• Maximum size 400MB – default size 4MB
• V10: Storage allocated in MSTR address space
• All the log buffers specified by OUTBUFF are page fixed at DB2 start
• V11: Log buffers are moved to 64-bit common storage area (HCSA), avoiding address space
switching during writing log records
• Log buffers are page fixed at DB2 start
• Use 1 MB page frames if available; otherwise, 4 KB page frames are used
Log Write Statistics
Field Name Description
QJSTWTB UNAVAILABLE OUTPUT LOG BUFFER
QJSTBPAG LOG OUTPUT BUFFER PAGED IN
QJSTWRNW NOWAIT LOG WRITE REQUESTS
QJSTWRF FORCE LOG WRITE REQUESTS
QJSTBFFL ACTIVE LOG OUTPUT CI CREATED
QJSTLOGW LOG WRITE I/O REQUESTS
QJSTCIWR LOG CI WRITTEN
• Need REAL storage to support increase
• More output log buffer space may also help log reads
• Log data volume = LOG CI CREATED * 4KB / stats_interval
• Start paying attention if >10MB/sec
LOG ACTIVITY QUANTITY /SECOND /THREAD /COMMIT
------------------------- -------- ------- ------- -------
...
UNAVAILABLE OUTPUT LOG BUFF 10.00 0.00 0.00 0.00
OUTPUT LOG BUFFER PAGED IN 0.00 0.00 0.00 0.00
LOG RECORDS CREATED 23673.6K 3315.64 402.30 98.14
LOG CI CREATED 1710.1K 239.51 29.06 7.09
LOG WRITE I/O REQ (LOG1&2) 1816.3K 254.39 30.87 7.53
LOG CI WRITTEN (LOG1&2) 4383.3K 613.91 74.49 18.17
LOG RATE FOR 1 LOG (MB) N/A 1.20 N/A N/A
LOG WRITE SUSPENDED 623.8K 87.37 10.60 2.59
ROT: Increase OUTBUFF if UNAVAILABLE OUTPUT LOG BUFF > 0 and log data volume << max
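Worked example of the log data volume calculation in Python (per-second rate taken from the report above):

log_ci_per_sec = 239.51                              # LOG CI CREATED /SECOND
mb_per_sec = log_ci_per_sec * 4096 / (1024 * 1024)   # each log CI is 4KB
print(f"log data volume: {mb_per_sec:.2f} MB/sec")   # ~0.94 MB/sec, well below the 10 MB/sec attention point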
Log Dataset I/O Tuning
• Avoid I/O interference among Active Log Read/Write and Archive Log Read/Write
• If Log Rate near maximum,
• Use faster log device
• Reduce log data volume
• Use of DB2 Data Compression for insert intensive tables
• Optimise table design to minimise log record size (BRF only)
• Be aware of impact of DBD update
• Consider use of DFSMS striping
• If DB2 archive log offload process cannot keep up with speed that the DB2 active log fills
• Consider DFSMS striping of archive log files (available since V9)
Log Reads
• Log reads are driven by any of the following types of activity
• Rollback of a unit of work
• Data recovery (RECOVER utility, LPL or GRECP recovery)
• Online Reorg LOGAPPLY phase
• Restart recovery
• IFI log read interface
• Data replication
• IFCID 129: read range of log record CIs from active log
• IFCID 306: supports archive logs, decompression, and data sharing log merge
• Option to not merge
• V11: Log read DBID/PSID filtering to avoid reading and decompressing unnecessary log records
(also available in V10 via APAR PM90568 )
• Standalone log read - DSNJSLR
• DSN1LOGP
Log Read Performance
• More output log buffer space may help log read performance
• Active log reads perform best due to
• Prefetch of log CIs enabled
• Automatic I/O load balancing between copy1/copy2
• VSAM striping can be enabled
• Reduced task switching
• Archives on DASD perform better: tape destroys parallelism
• Tape requires serialization in data sharing between concurrent Recovers
• You cannot concurrently share tape volumes across multiple jobs
• Small number of read devices on backend of current VTS models, which can become a severe
bottleneck for DB2 mass recovery
Field Name Description
QJSTRBUF READS SATISFIED FROM OUTPUT BUF
QJSTRACT READS SATISFIED FROM ACTIVE LOG
QJSTRARH READS SATISFIED FROM ARCHIVE LOG
LOG ACTIVITY QUANTITY /SECOND /THREAD /COMMIT
--------------------------- -------- ------- ------- -------
READS SATISFIED-OUTPUT BUFF 27654.00 1.28 0.17 0.06
READS SATISFIED-OUTP.BUF(%) 21.67
READS SATISFIED-ACTIVE LOG 99955.00 4.64 0.60 0.20
READS SATISFIED-ACTV.LOG(%) 78.33
READS SATISFIED-ARCHIVE LOG 0.00 0.00 0.00 0.00
READS SATISFIED-ARCH.LOG(%) 0.00
Lock/Latch
IRLM
• Make sure to use WLM service class SYSSTC for IRLM address space
• In general, CPU time charged to
• Class 2 accounting for locking
• Class 2 accounting for unlocking from Select
• MSTR SRB (Update Commit) for unlocking from Insert/Update/Delete
• IRLM SRB for lock or IRLM latch suspension
• IRLM trace can add to CPU time
• TRACE YES (default is NO) can double IRLM pathlength
• However with good lock avoidance, this should have minimal overall effect in most online
transactions and queries
• MODIFY irlmproc,SET,DEADLOK= or TIMEOUT= to dynamically change deadlock and timeout frequency

                    IRLMRWT        DEADLOK
Range allowed       1 to 3600 sec  0.1 to 5 sec
Default (V9)        60 sec         1 sec
Default (V10/V11)   30 sec         1 sec
Lock Avoidance
• Combination of techniques to prevent retrieval of uncommitted data
• (1) Page latching (and page p-lock in data sharing) controlled by DB2 to ensure physical
consistency of the page
• (2) Commit log sequence number (CLSN) – at the page level
• DB2 tracks "time" of last update to page (on page) (A)
• DB2 tracks "time" of oldest uncommitted activity on every pageset/partition (B)
• Non Data Sharing
• CLSN = lowest uncommitted RBA for all active transactions for a given pageset
• Data Sharing
• For non-GBP-dependent page sets, each member uses a local CLSN = lowest uncommitted LRSN for all
active transactions for a given pageset
• For GBP-dependent page sets, a Global CLSN value is maintained for the entire data sharing group = lowest
CLSN value across all members across all page sets (GBP-dependent or not)
• If (A) < (B) everything on the page is guaranteed to be committed
• Else, check Possibly UNCommitted bits (PUNC bits)
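A conceptual Python sketch of this check, not DB2 internals (names are illustrative):

def can_avoid_lock(page_update_time, clsn, punc_bit_set):
    # (A) "time" of last update to the page vs (B) oldest uncommitted "time" (CLSN)
    if page_update_time < clsn:
        return True             # everything on the page is guaranteed committed
    return not punc_bit_set     # otherwise trust the row only if its PUNC bit is off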
Lock Avoidance …
• Combination of techniques to prevent retrieval of uncommitted data …
• (3) Possibly UNCommitted bits (PUNC bits) – at the row level
• On each row, a PUNC bit is set when the row is updated
• PUNC bits are periodically reset
• If successful CLSN check and more than 25% of the rows have the PUNC bit ON
• RR scanner
• REORG TABLESPACE
• If the PUNC bit is not ON, the row/key is guaranteed to be committed
Lock Avoidance …
• Benefits of lock avoidance
• Increase in concurrency
• Decrease in lock and unlock activity requests, with an associated decrease in CPU resource
consumption and data sharing overhead
• Plans and packages have a better chance for lock avoidance if they are bound with
ISOLATION(CS) and CURRENTDATA(NO)
• Read-only or Ambiguous cursor
• CURRENTDATA(NO) -> all rows, qualified or not, are eligible for lock avoidance
• CURRENTDATA(YES) -> only non-qualified rows are eligible for lock avoidance
• Non-cursor ‘singleton’ SELECT
• Qualified rows are eligible for lock avoidance for CURRENTDATA(YES or NO)
Lock Avoidance …
• BIND option ISOLATION(CS) with CURRENTDATA(NO) could reduce # Lock/Unlock
requests dramatically
• High Unlock requests/commit could also be possible from
• Large number of relocated rows after update of compressed or VL row
• Lock/Unlock of pointer record (or page)
• Large number of pseudo-deleted entries in unique indexes
• Lock/Unlock of data (page or row) in insert to unique index when pseudo-deleted entries exist
• Both can be eliminated by REORG
• V11: PCTFREE FOR UPDATE to reserve freespace for Update + pseudo-deleted index entry cleanup
Field Name Description
QTXALOCK LOCK REQUESTS
QTXAUNLK UNLOCK REQUESTS
LOCKING ACTIVITY QUANTITY /SECOND /THREAD /COMMIT
------------------------ -------- ------- ------- -------
...
LOCK REQUESTS 521.0M 24.2K 3134.34 1050.75
UNLOCK REQUESTS 478.1M 22.2K 2876.06 964.16
ROT: Lock avoidance may not be working effectively if Unlock requests/commit is high, e.g. > 5
Lock Avoidance …
• Effective lock avoidance is very important in data sharing
• Long-running URs can reduce the effectiveness of lock avoidance by stopping the Global CLSN
value from moving forward
• Recommendation: Aggressively monitor long-running URs
• 'First cut' ROTs:
• URs running for a long time without committing: zparm URCHKTH<=5
• Message DSNR035I
• URs performing massive update activity without committing: zparm URLGWTH=10(K)
• Message DSNJ031I
• Need Management ownership and process for getting rogue applications fixed up so that they commit
frequently based on
• Elapsed time and/or
• CPU time (no. of SQL update statements)
Lock Tuning
• As a general rule, start with LOCKSIZE PAGE as design default
• If high deadlock or timeout, consider LOCKSIZE ROW
• Not much difference between one row lock and one page lock request
• However, the number of IRLM requests issued can be quite different
• No difference in a random access
• In a sequential scan,
• No difference if 1 row per page (MAXROWS=1) or ISOLATION(UR)
• Negligible difference if ISO(CS) with CURRENTDATA(NO)
• Bigger difference if ISO(RR|RS), or sequential Insert, Update, Delete
• Biggest difference if ISO(CS) with CURRENTDATA(YES) and many rows per page
• In data sharing, additional data page P-locks are acquired when LOCKSIZE ROW is used
Lock Tuning …
• MAX PG/ROW LOCKS HELD from Accounting trace is a useful indicator of commit frequency
• Page or row locks only
• AVERAGE is for average of MAX, TOTAL is for
max of MAX (of Accounting records)
• So if transaction A had max. locks of 10
and transaction B had 20, then
• AVERAGE (avg. of max.) = 15
• TOTAL (max. of max.) = 20
• In general, try to issue Commit to keep max. locks held below 100
LOCKING AVERAGE TOTAL
--------------------- -------- --------
TIMEOUTS 0.00 1
DEADLOCKS 0.00 0
ESCAL.(SHARED) 0.00 0
ESCAL.(EXCLUS) 0.00 0
MAX PG/ROW LOCKS HELD 2.43 350
LOCK REQUEST 78.61 24637270
UNLOCK REQUEST 26.91 8433176
QUERY REQUEST 0.00 0
CHANGE REQUEST 1.72 539607
OTHER REQUEST 0.00 0
TOTAL SUSPENSIONS 0.20 63617
LOCK SUSPENSIONS 0.00 1518
IRLM LATCH SUSPENS. 0.20 62091
OTHER SUSPENS. 0.00 8
Lock Tuning …
• ZPARMS
• RRULOCK – U-lock on SELECT FOR UPDATE for ISOLATION(RR|RS)
• NO – Acquires S-locks on FETCH
• YES – Acquires U-locks on FETCH (new default in V10)
• If no update, U is demoted to S on next fetch
• If update, U is promoted to X in COMMIT duration
• XLKUPDLT – X-lock for searched UPDATE/DELETE
• Take X-lock immediately on qualifying rows/pages instead of S|U-lock
• Good if most accessed objects are changed
• V8 PQ98172 introduces a new option: TARGET
• X-lock only for the specific table that is targeted by UPDATE or DELETE
• S- or U-lock when scanning for rows/pages of other tables referenced by the query
• SKIPUNCI – Skip uncommitted inserts for ISOLATION(CS|RS)
• Restricted to row level locking only
IRLM Latch Contention
• High number of IRLM latch suspensions could be caused by
• IRLM Trace on
• Low IRLM dispatching priority
• Frequent IRLM Query requests (e.g. DISPLAY DATABASE LOCKS, or MODIFY irlmproc, STATUS)
• Very low deadlock detection cycle time and very high locking rates
Field Name Description
QTXASLAT SUSPENSIONS (IRLM LATCH)
QTXALOCK LOCK REQUESTS
QTXAUNLK UNLOCK REQUESTS
QTXAQRY QUERY REQUESTS
QTXACHG CHANGE REQUESTS
LOCKING ACTIVITY QUANTITY /SECOND /THREAD /COMMIT
------------------------ -------- ------- ------- -------
SUSPENSIONS (ALL) 1498.6K 69.57 9.02 3.02
SUSPENSIONS (LOCK ONLY) 33120.00 1.54 0.20 0.07
SUSPENSIONS (IRLM LATCH) 1405.3K 65.24 8.45 2.83
SUSPENSIONS (OTHER) 60185.00 2.79 0.36 0.12
...
LOCK REQUESTS 521.0M 24.2K 3134.34 1050.75
UNLOCK REQUESTS 478.1M 22.2K 2876.06 964.16
QUERY REQUESTS 50741.00 2.36 0.31 0.10
CHANGE REQUESTS 4488.4K 208.38 27.00 9.05
OTHER REQUESTS 0.00 0.00 0.00 0.00
ROT: IRLM latch contention should be less than 1-5% of Total IRLM Requests
IRLM Latch Contention = SUSPENSIONS (IRLM LATCH) (A)
Total IRLM Requests = LOCK + UNLOCK + QUERY + CHANGE REQUESTS (B)
IRLM Latch Contention Rate (%) = (A)/(B)*100
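The same ROT expressed as a small Python check (field names from the table above; sample values from the report):

qtxaslat = 1_405_300                            # SUSPENSIONS (IRLM LATCH)
qtxalock, qtxaunlk = 521_000_000, 478_100_000   # LOCK / UNLOCK REQUESTS
qtxaqry, qtxachg = 50_741, 4_488_400            # QUERY / CHANGE REQUESTS

total_irlm_requests = qtxalock + qtxaunlk + qtxaqry + qtxachg
latch_rate = qtxaslat / total_irlm_requests * 100
print(f"IRLM latch contention rate: {latch_rate:.2f}%")   # ~0.14%, inside the 1-5% ROT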
Data Sharing Lock Tuning

Field Name Description
QTGSLSLM SYNCH.XES - LOCK REQUESTS
QTGSCSLM SYNCH.XES - CHANGE REQUESTS
QTGSUSLM SYNCH.XES - UNLOCK REQUESTS
QTGSFLSE ASYNCH.XES -CONVERTED LOCKS
QTGSIGLO SUSPENDS - IRLM GLOBAL CONT
QTGSSGLO SUSPENDS - XES GLOBAL CONT
QTGSFLMG SUSPENDS - FALSE CONTENTION

DATA SHARING LOCKING QUANTITY /SECOND /THREAD /COMMIT
--------------------------- -------- ------- ------- -------
GLOBAL CONTENTION RATE (%) 0.64
FALSE CONTENTION RATE (%) 0.11
...
SYNCH.XES - LOCK REQUESTS 227.5M 10.6K 1368.75 458.86
SYNCH.XES - CHANGE REQUESTS 1340.7K 62.24 8.07 2.70
SYNCH.XES - UNLOCK REQUESTS 225.8M 10.5K 1358.14 455.30
ASYNCH.XES -CONVERTED LOCKS 3485.41 0.48 3.87 0.00
...
SUSPENDS - IRLM GLOBAL CONT 34192.00 1.59 0.21 0.07
SUSPENDS - XES GLOBAL CONT. 6076.00 0.28 0.04 0.01
SUSPENDS - FALSE CONTENTION 21575.00 1.00 0.13 0.04

ROT: Global Contention should be less than 3-5% of XES IRLM Requests
Global Contention = SUSPENDS (IRLM + XES + FALSE) (A)
XES IRLM Requests = (SYNCH.XES LOCK + CHANGE + UNLOCK) + ASYNCH.XES CONVERTED LOCKS + SUSPENDS (IRLM + XES + FALSE) (B)
Global Contention Rate (%) = (A)/(B)*100

ROT: False Contention should be less than 1-3% of XES IRLM Requests
False Contention = SUSPENDS - FALSE CONTENTION (C)
False Contention Rate (%) = (C)/(B)*100
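Both contention ROTs as a Python sketch (sample values from the report above):

sync_xes = 227_500_000 + 1_340_700 + 225_800_000   # SYNCH.XES LOCK + CHANGE + UNLOCK
async_converted = 3_485                            # ASYNCH.XES CONVERTED LOCKS
susp_irlm, susp_xes, susp_false = 34_192, 6_076, 21_575

xes_irlm_requests = sync_xes + async_converted + susp_irlm + susp_xes + susp_false
global_rate = (susp_irlm + susp_xes + susp_false) / xes_irlm_requests * 100
false_rate = susp_false / xes_irlm_requests * 100
print(f"global contention: {global_rate:.3f}%")   # well inside the 3-5% ROT
print(f"false contention: {false_rate:.3f}%")     # well inside the 1-3% ROT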
Data Sharing Lock Tuning …
• Notes
• IRLM Cont. = IRLM resource contention
• XES Cont. = XES-level resource cont. as XES only understands S|X mode
• e.g. member 1 asking for IX and member 2 for IS
• Big relief with Locking Protocol 2 when enabled after entry to V8 (NFM) but increased overhead for
pageset/partitions opened for RO on certain members, e.g.
• -START DATABASE(....) SPACENAM(...) ACCESS(RO)
• SQL LOCK TABLE statement
• LOCKSIZE TABLESPACE or LOCKSIZE TABLE for segmented tablespace
• Table scan with Repeatable Read isolation level
• Lock escalation occurs
• False Cont. = false contention on lock table hash anchor point
• Could be minimised by increasing the number of Lock entries in the CF Lock Table
• Note: the counter is maintained on a per-LPAR basis
• Will over-report false contentions in cases where multiple members from the same data sharing group
run on the same z/OS image
Data Sharing Lock Tuning …

• P-lock contention and negotiation can cause IRLM latch contention, page latch contention, asynchronous GBP write, active log write, GBP read
• Page P-lock contention by one thread causes Page Latch contention for all other threads in the same member trying to get to the same page

Field Name Description
QTGSPPPE PSET/PART P-LOCK NEGOTIATION
QTGSPGPE PAGE P-LOCK NEGOTIATION
QTGSOTPE OTHER P-LOCK NEGOTIATION

DATA SHARING LOCKING QUANTITY /SECOND /THREAD /COMMIT
--------------------------- -------- ------- ------- -------
...
SYNCH.XES - LOCK REQUESTS 227.5M 10.6K 1368.75 458.86
SYNCH.XES - CHANGE REQUESTS 1340.7K 62.24 8.07 2.70
SYNCH.XES - UNLOCK REQUESTS 225.8M 10.5K 1358.14 455.30
ASYNCH.XES -CONVERTED LOCKS 1315.00 0.06 0.01 0.00
...
PSET/PART P-LCK NEGOTIATION 18037.00 0.84 0.11 0.04
PAGE P-LOCK NEGOTIATION 2863.00 0.13 0.02 0.01
OTHER P-LOCK NEGOTIATION 9067.00 0.42 0.05 0.02
P-LOCK CHANGE DURING NEG. 15991.00 0.74 0.10 0.03

ROT: P-lock Negotiation should be less than 3-5% of XES IRLM requests
P-lock Negotiation = P-LOCK NEGOTIATION (PSET/PART + PAGE + OTHER) (A)
XES IRLM Requests = (SYNCH.XES LOCK + CHANGE + UNLOCK) + ASYNC.XES CONVERTED LOCKS + SUSPENDS (IRLM + XES + FALSE) (B)
P-lock Negotiation Rate (%) = (A)/(B)*100
Data Sharing Lock Tuning …
• Notes
• Other P-lock negotiation for
• Index tree P-lock (See LC06 contention in Latch section)
• Castout P-lock
• SKCT/SKPT P-lock
• Breakdown by page P-lock type in GBP statistics
GROUP TOTAL CONTINUED QUANTITY /SECOND /THREAD /COMMIT
----------------------- -------- ------- ------- -------
...
PAGE P-LOCK LOCK REQ 877.4K 122.88 14.91 3.64
SPACE MAP PAGES 83552.00 11.70 1.42 0.35
DATA PAGES 10663.00 1.49 0.18 0.04
INDEX LEAF PAGES 783.2K 109.69 13.31 3.25
PAGE P-LOCK UNLOCK REQ 926.8K 129.80 15.75 3.84
PAGE P-LOCK LOCK SUSP 8967.00 1.26 0.15 0.04
SPACE MAP PAGES 593.00 0.08 0.01 0.00
DATA PAGES 143.00 0.02 0.00 0.00
INDEX LEAF PAGES 8231.00 1.15 0.14 0.03
PAGE P-LOCK LOCK NEG 10285.00 1.44 0.17 0.04
SPACE MAP PAGES 8.00 0.00 0.00 0.00
DATA PAGES 10.00 0.00 0.00 0.00
INDEX LEAF PAGES 10267.00 1.44 0.17 0.04
Data Sharing Lock Tuning …
• To reduce page p-lock and page latch contention on space map pages during heavy inserts
into GBP-dependent tablespace
• TRACKMOD NO
• When no use of Incremental COPY or performance of Incremental COPY not important
• Avoids Space map page update for changed pages
• MEMBER CLUSTER
• Option available on classic partitioned tablespace and UTS (PBG and PBR)
• Switching to classic partitioned will likely result in extra getpages for true varying-length rows and fixed-length compressed rows
• Increases the number of space map pages
• Only 199 data pages per space map page instead of 10K per space map for non-UTS
• Only 10 segments per space map page for UTS
• When MEMBER CLUSTER is used, APPEND + LOCKSIZE ROW has the potential to provide additional
relief for insert intensive workloads
• Reduced page P-lock and page latch contention on data pages
• Better space use
• Reduced working set of pages in buffer pool and overall space use
Data Sharing Lock Tuning …
• BUT…
• Do not use APPEND or LOCKSIZE ROW without using MEMBER CLUSTER for an INSERT-at-the-end intensive workload
• May result in excessive page p-lock and page latch contention on data pages and space map pages
• Extra locking protocol that comes with taking page p-lock on data pages when using LOCKSIZE ROW
• Do not use LOCKSIZE ROW with or without MEMBER CLUSTER for a mixed
INSERT/UPDATE/DELETE workload
• Consider using LOCKSIZE PAGE MAXROWS 1 to reduce page P-lock and page latch contention on data
pages
Data Sharing Lock Tuning …
• 50% of CLAIM_REQ can be used as a very rough estimate of the number of tablespace/partition locks acquired when there is effective thread reuse with RELEASE(COMMIT), or no thread reuse
• CLAIM_REQ assumed once for index and once for tablespace/ partition for a very rough
estimation
• These tablespace/partition locks can be eliminated with effective thread reuse with use of
RELEASE(DEALLOCATE)
Internal DB2 Latch Contention
• Typical high latch contention classes highlighted
• LC06 = Index split latch
• LC14 = Buffer pool LRU and hash chain latch
• LC19 = Log latch
• LC24 = Prefetch latch (or EDM LRU chain latch)
• Latch Class details in IFCID 51/52 (Share) and 56/57 (Exclusive) perf trace
• Disabling Acct Class 3 trace can help reduce CPU time when high latch contention is
observed
Field Name Description
QVLSLC01 to QVLSLC32 INTERNAL LATCH CONTENTION BY CLASS 1-32
LATCH CNT /SECOND /SECOND /SECOND /SECOND
--------- -------- -------- -------- --------
LC01-LC04 0.00 0.00 0.00 0.00
LC05-LC08 0.00 75.62 0.00 0.01
LC09-LC12 0.00 0.79 0.00 1.25
LC13-LC16 0.01 676.17 0.00 0.00
LC17-LC20 0.00 0.00 105.58 0.00
LC21-LC24 0.08 0.00 6.01 4327.87
LC25-LC28 4.18 0.00 0.02 0.00
LC29-LC32 0.00 0.20 0.57 25.46
ROT: Try to keep latch contention rate < 1K-10K per second
Internal DB2 Latch Contention …
• LC06 for index tree P-lock by index split
• Function code X’46’ in IFCID 57 performance trace
• Index split is particularly painful in data sharing
• Results in two forced physical log writes
• Index split time can be significantly reduced by using faster active log device
• Options to reduce index split
• Index freespace tuning for random insert
• Minimum index key size especially if unique index
• NOT PADDED index for large varchar columns (V8)
• Large index page size (V9)
• Asymmetric leaf-page split (V9)
• Indication of number of index splits in LEAFNEAR/FAR in SYSINDEXPART and RTS
REORGLEAFNEAR/FAR
• V10: New IFCID 359 to monitor Index splits
Internal DB2 Latch Contention …
• LC14 Buffer Manager latch
• If many tablespaces and indexes, assign to separate buffer pools with an even Getpage frequency
• If objects bigger than buffer pool, try enlarging buffer pool if possible
• If high LC14 contention, use buffer pool with at least 3000 buffers
• Use PGSTEAL=FIFO or NONE rather than LRU buffer steal algorithm if there is no read I/O, i.e.
object(s) entirely in buffer pool
• LRU = Least Recently Used buffer steal algorithm (default)
• FIFO = First In First Out buffer steal algorithm
• Eliminates the need to maintain LRU chain which in turn
• Reduces CPU time for LRU chain maintenance
• Reduces CPU time for LC14 contention processing
• V10: NONE = Buffer pool used for "in-memory" objects
• Data is preloaded into the buffer pool at the time the object is physically opened and remains resident
• Eliminates the need to maintain LRU chain
Internal DB2 Latch Contention …
• LC19 Log latch
• Minimise #log records created via
• LOAD RESUME/REPLACE with LOG NO instead of massive INSERT/UPDATE/DELETE
• Segmented or UTS tablespace if mass delete occurs
• Increase size of output log buffer if non-zero unavailable count
• When unavailable, first agent waits for log write
• All subsequent agents wait for LC19
• Reduce size of output log buffer if non-zero output log buffer paging
• See Log Statistics section
• Reduced LC19 Log latch contention in DB2 9
• Log latch not held while spinning for unique LRSN
• Further improvements in DB2 10
• Log latch time held minimized
• Conditional attempts to get log latch before unconditional request
Internal DB2 Latch Contention …
• LC24 latch
• Used for multiple latch contention types
• (1) EDM LRU latch – FC X’18’ in IFCID 57 performance trace
• Use EDMBFIT zparm of NO >> ZPARM removed in V10
• Thread reuse with RELEASE DEALLOCATE instead of RELEASE COMMIT for frequently executed
packages
• V10: Moving CT,PT from EDM pool to thread pools reduced LC24 significantly
• (2) Prefetch scheduling – FC X’38’ in IFCID 57 performance trace
• Higher contention possible with many concurrent prefetches
• Disable dynamic prefetch if unnecessary by setting VPSEQT=0 (V9)
• V10: PGSTEAL NONE for in memory data/index BP
• Use more partitions (one x’38’ latch per data set)
DBM1 Virtual Storage and Real Storage
DB2 Virtual and Real Storage
• Common problems
• CTHREAD and/or MAXDBAT set too high
• No denial of service (incoming work is not limited)
• V10 before REBIND
• If limit for EDMPOOL is hit -> SQLCODE -904
• DBM1 full system contraction -> potentially degrading performance
• DBM1 storage critical -> individual DB2 threads may abend with 04E/RC=00E200xx
• DB2 crashes out
• Excessive paging
• LPAR crashes out
• Aggravated by
• Lack of balance across CICS/WAS and DB2
• Workload failover
• Abnormal slow downs
• Workload moves
DB2 Virtual and Real Storage …
• Common problems …
• Shortage of real storage
• Not good practice to configure REAL storage based on normal operating conditions and rely on DASD paging
space to absorb the peaks
• Can lead to excessive paging and severe performance issues
• Ultimately, can take the LPAR out
• Once all AUX is consumed, the LPAR goes into a wait state
• Can lead to long DUMP processing times and cause major disruption
• DUMP should complete in seconds to make sure no performance problems ensue
• Once paging begins, it is possible to have DUMP processing take 10s of minutes with high risk of system-
wide or even sysplex-wide slowdowns
• Could be running ‘one DUMP away from a disaster’
• Wasted opportunities for CPU reduction
• Reluctance to use bigger or more buffer pools
• Reluctance to use buffer pool long-term page fix
• Many performance opportunities in DB2 10 require real storage
DB2 Virtual and Real Storage …
• Collect IFCID 225 data from DB2 start to DB2 shutdown
• Use sample REXX program MEMU2 to pull the data via IFI
• Available on the DB2 for z/OS Exchange community website on IBM My developerWorks
• http://www.ibm.com/developerworks/software/exchange/db2zos
• Click on 'View and download examples' and search on tag 'memu2'
• Use the new OMPE Spreadsheet Input Data Generator to post process SMF data
• APAR PM73732 and OMPE v510 PTF UK90267 / OMPE v511 PTF UK90268
• Graph IFCID 225 data basic elements to study evolutionary trend
• Set CTHREAD and MAXDBAT to realistic values that can be supported
• Balance the design across CICS AORs connecting to the DB2 subsystem
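A minimal offline trending sketch in Python, assuming MEMU2 has written its IFCID 225 counters to a comma-separated file; the file name and column name below are hypothetical placeholders for whatever your MEMU2 version produces:

import csv

# One MEMU2 record per statistics interval; pick any IFCID 225 counter column
with open("memu2_dbm1.csv", newline="") as f:          # hypothetical file name
    rows = list(csv.DictReader(f))

series = [float(r["TOTAL_DBM1_STORAGE_MB"]) for r in rows]  # hypothetical column
print(f"intervals={len(series)}  min={min(series):.1f} MB  "
      f"max={max(series):.1f} MB  avg={sum(series)/len(series):.1f} MB")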
80
#IDUG
© 2014 IBM Corporation
DB2 Virtual and Real Storage …
• Track messages issued by the DB2 Internal Monitor
• Automatically issues console messages (DSNV508I) when DBM1 31-bit virtual storage crosses
(increasing or decreasing) thresholds
• Threshold augmented in PM38435 to account for the storage cushion
• Identifies the agents that consume the most storage
• Can get status at any time using -DIS THREAD(*) TYPE(SYSTEM)
DSNV508I -SE20 DSNVMON - DB2 DBM1 BELOW-THE-BAR
STORAGE NOTIFICATION
77% CONSUMED
76% CONSUMED BY DB2
352M AVAILABLE OUT OF REGION SIZE 1553M
WITH A 274M STORAGE CUSHION
NAME ST A REQ ID AUTHID PLAN ASID TOKEN
VA1A N * 0 002.VMON 01 SYSOPR 002A 0
V507-ACTIVE MONITOR, INTERVALS=8216, STG=77%, BOOSTS=0, HEALTH=100
REGION=1553M, AVAIL=352M, CUSHION=274M
81
#IDUG
© 2014 IBM Corporation
DB2 DBM1 Virtual Storage
• 31-bit / 24-bit DB2 storage
• Getmained
• Variable
• Fixed Storage
• Stack storage
• Non-DB2 getmained
• SMF
• Dataset / pageset
DBM1 AND MVS STORAGE BELOW 2 GB QUANTITY
-------------------------------------------- ------------------
TOTAL DBM1 STORAGE BELOW 2 GB (MB) 62.44
TOTAL GETMAINED STORAGE (MB) 10.48
TOTAL VARIABLE STORAGE (MB) 26.63
TOTAL AGENT LOCAL STORAGE (MB) 13.56
TOTAL AGENT SYSTEM STORAGE (MB) 11.99
NUMBER OF PREFETCH ENGINES 600.00
NUMBER OF DEFERRED WRITE ENGINES 300.00
NUMBER OF CASTOUT ENGINES 65.00
NUMBER OF GBP WRITE ENGINES 30.00
NUMBER OF P-LOCK/NOTIFY EXIT ENGINES 20.00
TOTAL AGENT NON-SYSTEM STORAGE (MB) 1.57
TOTAL NUMBER OF ACTIVE USER THREADS 40.53
NUMBER OF ALLIED THREADS 2.73
NUMBER OF ACTIVE DBATS 2.58
NUMBER OF POOLED DBATS 35.22
NUMBER OF PARALLEL CHILD THREADS 7.17
THREAD COPIES OF CACHED SQL STMTS (MB) 5.27
IN USE STORAGE (MB) 0.01
HWM FOR ALLOCATED STATEMENTS (MB) 0.06
THREAD COPIES OF STATIC SQL (MB) 4.26
IN USE STORAGE (MB) 0.01
THREAD PLAN AND PACKAGE STORAGE (MB) 0.00
BUFFER MANAGER STORAGE CNTL BLKS (MB) 1.75
TOTAL FIXED STORAGE (MB) 0.40
TOTAL GETMAINED STACK STORAGE (MB) 24.92
TOTAL STACK STORAGE IN USE (MB) 21.12
SYSTEM AGENT STACK STORAGE IN USE (MB) 19.65
STORAGE CUSHION (MB) 306.09
82
#IDUG
© 2014 IBM Corporation
Non-DB2 Storage
• Not tracked by DB2
• Non-DB2 storage is high private storage
TOTAL DBM1 STORAGE = TOTAL GETMAINED STORAGE (QW0225GM) + TOTAL GETMAINED STACK STORAGE (QW0225GS) + TOTAL FIXED STORAGE (QW0225FX) + TOTAL VARIABLE STORAGE (QW0225VR)
NON-DB2 STORAGE = MVS 31-BIT EXTENDED HIGH PRIVATE (QW0225EH) - TOTAL DBM1 STORAGE
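A worked example of the formulas above as a small Python sketch; the QW0225EH value is illustrative, the other values (in MB) are taken from the sample report earlier in this section:

def non_db2_storage_mb(eh, gm, gs, fx, vr):
    # NON-DB2 = QW0225EH - (QW0225GM + QW0225GS + QW0225FX + QW0225VR)
    return eh - (gm + gs + fx + vr)

# 10.48 + 24.92 + 0.40 + 26.63 = 62.43 MB total DBM1 storage below 2 GB
print(non_db2_storage_mb(eh=150.0, gm=10.48, gs=24.92, fx=0.40, vr=26.63))
# -> 87.57 MB of high private storage not owned by DB2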
• Used usually by MVS functions such as SMF
• Parameter DETAIL in SMFPRMxx can cause storage to creep and become very large
• The big hit to DB2 in this area is DDNAME tracking: allocation does not realise that a page set has been closed and reallocated, so tracking entries accumulate
• SMF Type 30 subtypes 4 and 5 track all the DDNAMEs
• Most environments do not need SMF Type 30 subtypes 4 and 5
• Recommend NODETAIL
83
#IDUG
© 2014 IBM Corporation
DB2 DBM1 Virtual Storage …
• 64-bit DB2 storage
• Getmained
• Compression Dictionaries
• DBD Pool
• Dynamic Statement Cache
• Skeleton Pool
• Fixed
• Variable
• Buffer Pools
• Buffer Control Blocks
• Castout Buffers
• 64-bit Shared
• Getmained
• Fixed
• Variable
• Stack
84
DBM1 STORAGE ABOVE 2 GB QUANTITY
-------------------------------------------- ------------------
GETMAINED STORAGE (MB) 777.12
COMPRESSION DICTIONARY (MB) 54.96
IN USE EDM DBD POOL (MB) 4.98
IN USE EDM STATEMENT POOL (MB) 293.01
IN USE EDM SKELETON POOL (MB) 6.27
FIXED STORAGE (MB) 207.69
VARIABLE STORAGE (MB) 111.61
STORAGE MANAGER CONTROL BLOCKS (MB) 2.34
VIRTUAL BUFFER POOLS (MB) 2796.47
VIRTUAL POOL CONTROL BLOCKS (MB) 87.64
CASTOUT BUFFERS (MB) 8.13
SHARED GETMAINED STORAGE (MB) 4.07
SHARED FIXED STORAGE (MB) 19.45
RID POOL (MB) 198.00
SHARED VARIABLE STORAGE (MB) 379.45
TOTAL AGENT LOCAL STORAGE (MB) 190.33
TOTAL AGENT SYSTEM STORAGE (MB) 88.75
TOTAL AGENT NON-SYSTEM STORAGE (MB) 101.58
DYNAMIC STMT CACHE CNTL BLKS (MB) 171.35
THREAD COPIES OF CACHED SQL STMTS (MB) N/A
IN USE STORAGE (MB) 0.28
STATEMENTS COUNT 3.32
HWM FOR ALLOCATED STATEMENTS (MB) 2.18
STATEMENT COUNT AT HWM 10.00
DATE AT HWM 04/15/13
TIME AT HWM 19:46:55.60
THREAD PLAN AND PACKAGE STORAGE (MB) 0.54
SHARED STORAGE MANAGER CNTL BLKS (MB) 21.77
SHARED SYSTEM AGENT STACK STORAGE (MB) 512.00
STACK STORAGE IN USE (MB) 280.17
SHARED NON-SYSTEM AGENT STACK STORAGE (MB) 1536.00
STACK STORAGE IN USE (MB) 24.33
#IDUG
© 2014 IBM Corporation
DB2 DBM1 Virtual Storage …
• 31-bit and 64-bit Common
COMMON STORAGE BELOW AND ABOVE 2 GB QUANTITY
-------------------------------------------- ------------------
EXTENDED CSA SIZE (MB) 400.09
FIXED POOL BELOW (MB) 4.48
VARIABLE POOL BELOW (MB) 0.94
GETMAINED BELOW (MB) 0.08
FIXED POOL ABOVE (MB) 5.97
VARIABLE POOL ABOVE (MB) 4.00
GETMAINED ABOVE (MB) 0.14
STORAGE MANAGER CONTROL BLOCKS ABOVE (MB) 1.34
85
#IDUG
© 2014 IBM Corporation
REAL Storage
• Real storage needs to be monitored as much as virtual storage
• Important subsystems such as DB2 should not be paging IN from AUX (DASD)
• Recommendation to keep page-in rates near zero
• Monitor using RMF Mon III
• Monitor in DB2 page in for reads/writes and output log buffer paged in
• Collect IFCID 225 data from DB2 start to DB2 shutdown
• High auxiliary storage in use should be investigated
• Indicates that there are periods when the DB2 working set is pushed out to AUX (DASD)
• Need to understand what the driver is
• Shift in Workload with REAL frames stolen by overnight batch?
• REAL frames stolen by DB2 utilities?
• Dump capture?
REAL AND AUXILIARY STORAGE QUANTITY
--------------------------------------- ---------------
REAL STORAGE IN USE (MB) 5958.66
AUXILIARY STORAGE IN USE (MB) 0.00
V9
86
#IDUG
© 2014 IBM Corporation
87
REAL Storage …
REAL AND AUXILIARY STORAGE FOR DBM1 QUANTITY
-------------------------------------------- --------
REAL STORAGE IN USE (MB) 2525.20
31 BIT IN USE (MB) 258.09
64 BIT IN USE (MB) 2267.10
64 BIT THREAD AND SYSTEM ONLY (MB) 0.00
HWM 64 BIT REAL STORAGE IN USE (MB) 2321.07
AVERAGE THREAD FOOTPRINT (MB) 5.70
AUXILIARY STORAGE IN USE (MB) 0.00
31 BIT IN USE (MB) 0.00
64 BIT IN USE (MB) 0.00
64 BIT THREAD AND SYSTEM ONLY (MB) 0.00
HWM 64 BIT AUX STORAGE IN USE (MB) 0.00
SUBSYSTEM SHARED STORAGE ABOVE 2 GB QUANTITY
-------------------------------------------- --------
REAL STORAGE IN USE (MB) 0.00
SHARED THREAD AND SYSTEM (MB) 0.00
SHARED STACK STORAGE (MB) 0.00
AUXILIARY STORAGE IN USE (MB) 0.00
SHARED THREAD AND SYSTEM (MB) 0.00
SHARED STACK STORAGE (MB) 0.00
REAL AND AUXILIARY STORAGE FOR DIST QUANTITY
-------------------------------------------- --------
REAL STORAGE IN USE (MB) 49.87
31 BIT IN USE (MB) 38.91
64 BIT IN USE (MB) 10.96
64 BIT THREAD AND SYSTEM ONLY (MB) 0.00
HWM 64 BIT REAL STORAGE IN USE (MB) 10.97
AVERAGE DBAT FOOTPRINT (MB) 3.46
AUXILIARY STORAGE IN USE (MB) 0.00
31 BIT IN USE (MB) 0.00
64 BIT IN USE (MB) 0.00
64 BIT THREAD AND SYSTEM ONLY (MB) 0.00
HWM 64 BIT AUX STORAGE IN USE (MB) 0.00
MVS LPAR SHARED STORAGE ABOVE 2 GB QUANTITY
-------------------------------------------- --------
SHARED MEMORY OBJECTS 4.00
64 BIT SHARED STORAGE (MB) 327680.00
HWM FOR 64 BIT SHARED STORAGE (MB) 327680.00
64 BIT SHARED STORAGE BACKED IN REAL (MB) 1777.25
AUX STORAGE USED FOR 64 BIT SHARED (MB) 0.00
64 BIT SHARED STORAGE PAGED IN FROM AUX (MB) 0.00
64 BIT SHARED STORAGE PAGED OUT TO AUX (MB) 0.00
COMMON STORAGE BELOW AND ABOVE 2 GB QUANTITY
-------------------------------------------- --------
(...)
REAL STORAGE IN USE (MB) 0.00
AUXILIARY STORAGE IN USE (MB) 0.00
LPAR level instrumentation for REAL/AUX storage use
Warning: Numbers reported are inaccurate if more than one DB2
subsystem on the LPAR or if other subsystems use 64-bit shared
Solution: DB2 APAR PM24723 and prereq z/OS APAR OA35885
V10
#IDUG
© 2014 IBM Corporation
88
Monitoring REAL/AUX storage usage …
[Chart: D0E1 - Total REAL storage in use (MB), stacked by: 64-bit common, 64-bit shared stack, 64-bit shared, DIST 64-bit private, DIST 31-bit private, DBM1 buffer pools, DBM1 64-bit private excluding BP, DBM1 31-bit private]
#IDUG
© 2014 IBM Corporation
89
Monitoring REAL/AUX storage usage …
[Chart: D0E1 - Total AUX storage in use (MB), stacked by the same categories as the REAL storage chart]
#IDUG
© 2014 IBM Corporation
90
Monitoring REAL/AUX storage usage …
[Chart: SYS1 - REAL storage available (MB), 2014-05-28 to 2014-06-04, with the MAXSPACE=10GB level marked]
#IDUG
© 2014 IBM Corporation
Monitoring REAL/AUX storage usage – Based on OMPE PDB
• Stacked AREA graph of total REAL storage in use - one for each DB2 member (one sheet per member)
• Stacked AREA graph of total AUX storage in use - one for each DB2 member (one sheet per member)
• Line graph of REAL storage available - one for each LPAR
91
(REAL_STORAGE_FRAME - A2GB_REAL_FRAME)*4/1024 AS DBM1_REAL_PRIV_31BIT_MB
(A2GB_REAL_FRAME - A2GB_REAL_FRAME_TS)*4/1024 AS DBM1_REAL_PRIV_64BIT_BP_MB
A2GB_REAL_FRAME_TS*4/1024 AS DBM1_REAL_PRIV_64BIT_XBP_MB
(DIST_REAL_FRAME - A2GB_DIST_REAL_FRM)*4/1024 AS DIST_REAL_PRIV_31BIT_MB
A2GB_DIST_REAL_FRM*4/1024 AS DIST_REAL_PRIV_64BIT_MB
A2GB_COMMON_REALF*4/1024 AS REAL_COM_64BIT_MB
A2GB_SHR_REALF_TS*4/1024 AS REAL_SHR_64BIT_MB
A2GB_SHR_REALF_STK*4/1024 AS REAL_SHR_STK_64BIT_MB
(AUX_STORAGE_SLOT - A2GB_AUX_SLOT)*4/1024 AS DBM1_AUX_PRIV_31BIT_MB
(A2GB_AUX_SLOT - A2GB_AUX_SLOT_TS)*4/1024 AS DBM1_AUX_PRIV_64BIT_BP_MB
A2GB_AUX_SLOT_TS*4/1024 AS DBM1_AUX_PRIV_64BIT_XBP_MB
(DIST_AUX_SLOT - A2GB_DIST_AUX_SLOT)*4/1024 AS DIST_AUX_PRIV_31BIT_MB
A2GB_DIST_AUX_SLOT*4/1024 AS DIST_AUX_PRIV_64BIT_MB
A2GB_COMMON_AUXS*4/1024 AS AUX_COM_64BIT_MB
A2GB_SHR_AUXS_TS*4/1024 AS AUX_SHR_64BIT_MB
A2GB_SHR_AUXS_STK*4/1024 AS AUX_SHR_STK_64BIT_MB
QW0225_REALAVAIL*4/1024 AS REAL_AVAIL_LPAR_MB
#IDUG
© 2014 IBM Corporation
Monitoring REAL/AUX storage usage – Mapping for reference
IFCID FIELD PM FIELD PDB COLUMN NAME MEMU2 Description
QW0225RL QW0225RL REAL_STORAGE_FRAME DBM1 REAL in use for 31 and 64-bit priv (MB)
QW0225AX QW0225AX AUX_STORAGE_SLOT DBM1 AUX in use for 31 and 64-bit priv (MB)
QW0225HVPagesInReal SW225VPR A2GB_REAL_FRAME DBM1 REAL in use for 64-bit priv (MB)
QW0225HVAuxSlots SW225VAS A2GB_AUX_SLOT DBM1 AUX in use for 64-bit priv (MB)
QW0225PriStg_Real SW225PSR A2GB_REAL_FRAME_TS DBM1 REAL in use for 64-bit priv w/o BP (MB)
QW0225PriStg_Aux SW225PSA A2GB_AUX_SLOT_TS DBM1 AUX in use for 64-bit priv w/o BP (MB)
QW0225RL QW0225RL DIST_REAL_FRAME DIST REAL in use for 31 and 64-bit priv (MB)
QW0225AX QW0225AX DIST_AUX_SLOT DIST AUX in use for 31 and 64-bit priv (MB)
QW0225HVPagesInReal SW225VPR A2GB_DIST_REAL_FRM DIST REAL in use for 64-bit priv (MB)
QW0225HVAuxSlots SW225VAS A2GB_DIST_AUX_SLOT DIST AUX in use for 64-bit priv (MB)
QW0225ShrStg_Real SW225SSR A2GB_SHR_REALF_TS REAL in use for 64-bit shared (MB)
QW0225ShrStg_Aux SW225SSA A2GB_SHR_AUXS_TS AUX in use for 64-bit shared (MB)
QW0225ShrStkStg_Real SW225KSR A2GB_SHR_REALF_STK REAL in use for 64-bit shared stack (MB)
QW0225ShrStkStg_Aux SW225KSA A2GB_SHR_AUXS_STK AUX in use for 64-bit shared stack (MB)
QW0225ComStg_Real SW225CSR A2GB_COMMON_REALF REAL in use for 64-bit common (MB)
QW0225ComStg_Aux SW225CSA A2GB_COMMON_AUXS AUX in use for 64-bit common (MB)
QW0225_REALAVAIL S225RLAV QW0225_REALAVAIL REALAVAIL (MB) (S)
92
Note: All REAL/AUX storage fields in IFCID 225 and OMPE performance database are expressed in 4KB
frames or slots – they should be converted to MB (conversion is already done in MEMU2)
#IDUG
© 2014 IBM Corporation
REAL Storage …
• Make sure LPAR has enough REAL storage
• Need to provision sufficient REAL storage to cover both the normal DB2 working set size and
MAXSPACE requirement
• Make sure MAXSPACE is set properly and defensively
• Should plan for approx 16GB for MAXSPACE in V10/V11
• z/OS fixing APARs are available to help minimise DUMP size
• APARs OA39596, OA40856 and OA40015
• Under-sizing MAXSPACE will result in partial dumps
• Seriously compromises problem determination
• Dumps should be taken very quickly (<10 secs), almost without anyone noticing, i.e. little or no disruption to the subject LPAR and to the rest of the Sysplex
• Excessive dump time caused by paging on the LPAR may cause massive sysplex-wide sympathy
sickness slowdowns
• Consider automation to kill dumps taking longer than 10 secs
93
#IDUG
© 2014 IBM Corporation
REAL Storage …
• V10: Enable “contraction mode” to protect the system against excessive use of AUX
• Apply DB2 APAR PM24723 and prereq z/OS APAR OA35885
• Run with REALSTORAGE_MANAGEMENT=AUTO (default)
• When significant paging is detected, enter “contraction mode” to help protect the system
• “Unbacks” virtual pages so that a real frame or aux slot is not consumed for this page
• Use automation to trap the DSNV516I (start) and DSNV517I (end) messages
• Control use of storage by DFSORT
• Set EXPMAX down to what you can afford for DFSORT
• Set EXPOLD=0 to prevent DFSORT from taking "old" frames from other workloads
• Set EXPRES to a percentage that reserves enough storage for MAXSPACE
94
#IDUG
© 2014 IBM Corporation
REAL Storage …
• Set z/OS parameter AUXMGMT=ON
• No new dumps are allowed when AUX storage utilization reaches 50%
• Current dump data capture stops when AUX storage utilization reaches 68%
• Once the limit is exceeded, new dumps will not be processed until the AUX storage utilization
drops below 35%
• Consider specifying z/OS WLM STORAGE CRITICAL for DB2 system address spaces
• To safeguard the rest of DB2
• Tells WLM to not page these address spaces
• Keeps the thread control blocks, EDM and other needed parts of DB2 in REAL
• Prevents the performance problem as the Online day starts and DB2 has to be rapidly paged
back in
95
#IDUG
© 2014 IBM Corporation
REAL Storage …
• How to limit REAL storage?
• V9: Hidden ZPARM SPRMRSMX (‘real storage kill switch’)
• Delivered in APAR PK18354
• Not widely publicised
• Prevents a runaway DB2 subsystem from taking the LPAR down
• Should be used when there is more than one DB2 subsystem running on the same LPAR
• Aim is to prevent multiple outages being caused by a single DB2 outage
• Should be set to 1.5x to 2x normal DB2 subsystem usage
• Kills the DB2 subsystem when SPRMRSMX value is reached
• V10: SPRMRSMX becomes an opaque ZPARM REALSTORAGE_MAX
• Need to factor in 64-bit shared and common use to establish new footprint
• As DB2 approaches the REALSTORAGE_MAX threshold, “Contraction mode” is also entered to help
protect the system
96
#IDUG
Buffer Pools and
Group Buffer Pools
97
#IDUG
© 2014 IBM Corporation
Pages Classification and LRU Processing
• Pages in a BP are classified as either random or sequential
• Pages read from DASD
• A page read in via synchronous I/O is classified as random
• A page read in via any form of prefetch I/O is classified as sequential
• Pages already exist in the BP from previous work
• V8: sequential pages were reclassified as random when subsequently touched by a random
getpage
• A random page was never re-classified as sequential
• V9/V10: none of them will be re-classified to random on a getpage
• Also see APAR PM70270 for buffer misses due to lack of buffer reclassification
• V11: re-enables buffer reclassification from sequential to random based on random getpage
98
#IDUG
© 2014 IBM Corporation
Pages Classification and LRU Processing …
• DB2 has a mechanism to prevent sequentially accessed data from monopolizing the BP
and pushing out useful random pages
• Maintains two chains
• LRU with all pages (random and sequential)
• SLRU with only the sequential pages
• Steals from the LRU chain until VPSEQT is reached, and then steals preferentially from the SLRU
chain
• General recommendations
• Set VPSEQT to 90-95 for work file BP if used for sort, DGTT and sparse index
• Set VPSEQT to 100 for work file BP if used for sort only
• Set VPSEQT to 0 for data-in-memory BP to avoid the overhead of scheduling prefetch engines
when data is already in BP (or use PGSTEAL=NONE)
• Unless you have done a detailed study, leave VPSEQT to 80 (default)
• You may decide to set it lower (e.g. 40) to protect useful random pages
99
#IDUG
© 2014 IBM Corporation
Hit Ratios and Page Residency Time
• Hit Ratios = percentage of times the page was found in the BP
• System HR = (Total getpages – Total pages read) / Total getpages * 100
• Appli. HR = (Total getpages – Synchronous reads) / Total getpages * 100
• Residency Time = average time that a page is resident in the buffer pool
• System RT (seconds) = VPSIZE / Total pages read per second
• Total getpages = random getpages + sequential getpages
• Total pages read = synchronous reads for random getpages + synchronous reads for sequential
getpages + pages read via sequential prefetch + pages read via list prefetch + pages read via
dynamic prefetch
• Synchronous reads = synchronous reads for random getpages + synchronous reads for
sequential getpages
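A small sketch of these calculations (the counter values are illustrative, per one statistics interval):

def bp_indicators(getpages, sync_reads, prefetch_pages, vpsize, interval_secs):
    # System/application hit ratios (%) and system page residency time (seconds)
    total_read = sync_reads + prefetch_pages   # all pages read in from DASD
    system_hr = (getpages - total_read) / getpages * 100
    appl_hr = (getpages - sync_reads) / getpages * 100
    residency = vpsize / (total_read / interval_secs)
    return system_hr, appl_hr, residency

print(bp_indicators(getpages=1_000_000, sync_reads=20_000,
                    prefetch_pages=80_000, vpsize=110_000, interval_secs=60))
# -> (90.0, 98.0, 66.0): 90% system HR, 98% application HR, 66s residency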
100
#IDUG
© 2014 IBM Corporation
Deferred Writes
• VDWQT (Vertical Deferred Write Queue Threshold) based on number of updated pages
at the data set level (% of VPSIZE or number of buffers)
• DB2 schedules up to 128 pages for that data set, sorts them in sequence, and writes them out
in at least 4 I/Os (page distance limit of 180 pages applies)
• DWQT (horizontal Deferred Write Queue Threshold) based on number of unavailable
pages (updated + in-use) at the BP level (% of VPSIZE)
• Write operations are scheduled for up to 128 pages per data set to decrease the number of
unavailable buffers to 10% below DWQT
[Diagram: buffer pool of VPSIZE buffers containing data sets DS1-DS5; VDWQT applies per data set, DWQT across the whole buffer pool]
101
#IDUG
© 2014 IBM Corporation
Deferred Writes …
• Setting VDWQT to 0 is good if the probability to re-write the page is low
• DB2 waits for up to 40 changed pages for 4K BP (24 for 8K, 16 for 16K, 12 for 32K) and writes
out 32 pages for 4K BP (16 for 8K, 8 for 16K, 4 for 32K)
• Could lead to more physical log writes, as log records must be written ahead of updated pages
• Setting VDWQT and DWQT to 90 is good for objects that reside entirely in the buffer
pool and are updated frequently
• Watch out for possible buffer shortage condition, especially with uneven workload
• In other cases, set VDWQT and DWQT low enough to achieve a "trickle" write effect in
between successive system checkpoints
• But… Setting VDWQT and DWQT too low may result in poor write caching, writing the same
page out many times, short deferred write I/Os, and increased DBM1 SRB CPU resource
consumption
• If you want to set VDWQT in pages, do not specify anything below 128
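For reference, the thresholds are changed dynamically with the ALTER BUFFERPOOL command; the buffer pool names and values below are illustrative applications of the guidance above:

-ALTER BUFFERPOOL(BP1) VDWQT(0)            (trickle write when the re-write probability is low)
-ALTER BUFFERPOOL(BP2) VDWQT(0,128)        (VDWQT given as a number of buffers, not below 128)
-ALTER BUFFERPOOL(BP3) DWQT(50) VDWQT(10)  (low thresholds for trickle write between checkpoints)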
102
#IDUG
© 2014 IBM Corporation
103
Critical Thresholds
• Sequential Prefetch Threshold (SPTH) - 90%
• Checked before and during prefetch
• 90% of buffers not available for steal, or running out of sequential buffers (VPSEQT with 80% default)
• >> Disables prefetch (PREF DISABLED – NO BUFFER)
• Immediate Write Threshold (IWTH) - 97.5%
• Checked at page update
• >> After update, synchronous write
• Data Management Threshold (DMTH) - 95%
• Checked at page read or update
• >> Getpage can be issued for each row sequentially scanned on the same page – potential large CPU time increase
#IDUG
© 2014 IBM Corporation
Critical Thresholds …
• ROT: Minimise PREF DISABLED - NO BUFFER (*)
• ROT: Keep DATA MANAGEMENT THRESHOLD at 0
• If non-zero
• Increase BP size, and/or
• Reduce deferred write thresholds (VDWQT, DWQT), and/or
• Increase system checkpoint frequency (reduce CHKFREQ)
• (*) Can be much higher if prefetch intentionally disabled via VPSEQT=0 (V9) or if using PGSTEAL NONE (V10)
• Interesting option for data-in-memory BP
• Avoids the overhead of scheduling prefetch engines when data is already in BP
• No problem in that case
• V9: Consider using FIFO instead of LRU for PGSTEAL
104
#IDUG
© 2014 IBM Corporation
Virtual and Real Storage Allocation for Buffer Pools in V9, V10, V11
Page frame Page fix Virtual storage Real storage
V9 4KB NO Entire BP As needed
4KB YES Entire BP As needed
V10 4KB NO As needed As needed
4KB YES As needed As needed
1MB YES As needed As needed
V11 4KB NO Entire BP As needed
4KB YES Entire BP As needed
1MB YES Entire BP Entire BP
2GB YES Entire BP Entire BP
105
'As needed': incremental allocation up to defined size | 'Entire BP': initial allocation to defined size
#IDUG
© 2014 IBM Corporation
Buffer Pool Tuning
• Multiple buffer pools recommended
• Dynamic performance monitoring much cheaper and easier than performance trace
• DISPLAY BPOOL for online monitoring
• Dataset statistics via -DISPLAY BPOOL LSTATS (IFCID 199)
• Useful for access path monitoring
• Dynamic tuning
• Full exploitation of BP tuning parameters for customised tuning
• ALTER BPOOL is synchronous and effective immediately, except for buffer pool contraction, which waits for updated pages to be written out
• Prioritisation of buffer usage
• Reduced BP latch contention
• Minimum BPs
• Catalog/directory (4K, 8K, 16K, 32K), user index (4K), user data (4K), work files (4K and 32K)
• Additional BPs if access pattern known
• In-memory data, large NPIs, LOBs, etc..
106
#IDUG
© 2014 IBM Corporation
Long-Term Page Fix for BPs with Frequent I/Os
• DB2 BPs have always been strongly recommended to be backed 100% by real storage
• To avoid paging, which can occur even if only one buffer is short of real storage, because of the LRU buffer steal algorithm
• Given 100% real storage, might as well page fix each buffer just once to avoid the
repetitive cost of page fix and free for each and every I/O
• New option: ALTER BPOOL(name) PGFIX(YES|NO)
• Requires the BP to go through reallocation before it becomes operative
• Means a DB2 restart for BP0
• Up to 8% reduction in overall IRWW transaction CPU time
ROT: In a steady state, PAGE-IN for READ / WRITE < 1% of pages read / written
107
#IDUG
© 2014 IBM Corporation
Long-Term Page Fix for BPs with Frequent I/Os
• Recommended for BPs with high I/O intensity
• I/O intensity = [pages read + pages written] / [number of buffers]
• Relative values across all BPs
BPID   VPSIZE  Read Sync  Read SPF  Read LPF  Read DPF  Read-Total  Written  I/O Intensity
BP0     40000       2960         0         0      6174        9134      107            0.2
BP1    110000      12411      5185         0      1855       19451     6719            0.2
BP2    110000      40482     19833     11256      9380       80951     5763            0.8
BP3     75000      23972      6065         0     14828       44865     7136            0.7
BP4     80000      22873     45933      3926     50261      122993     3713            1.6
BP5    200000          0         0         0         0           0        0            0.0
BP8K0   32000          9        11         0        11          31      169            0.0
BP32K    2000        693       873         0      6415        7981       38            4.0
In this example: Best candidates would be BP32K, BP4, BP2, BP3
No benefit for BP5 (data in-memory)
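The ranking can be reproduced from the counters in the table above; a sketch:

# (VPSIZE, Read-Total, Written) per buffer pool, from the table above
pools = {"BP0": (40000, 9134, 107),    "BP1": (110000, 19451, 6719),
         "BP2": (110000, 80951, 5763), "BP3": (75000, 44865, 7136),
         "BP4": (80000, 122993, 3713), "BP32K": (2000, 7981, 38)}

for bp, (vpsize, read, written) in sorted(
        pools.items(), key=lambda kv: (kv[1][1] + kv[1][2]) / kv[1][0],
        reverse=True):
    print(f"{bp:6} I/O intensity = {(read + written) / vpsize:.1f}")
# Prints BP32K, BP4, BP2, BP3, ... - the page-fix candidates in order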
108
#IDUG
© 2014 IBM Corporation
1MB size real storage page frames on z10, z196 and zEC12
• Potential for reduced CPU through fewer TLB misses
• Buffer pools must be defined as PGFIX=YES to use 1MB size page frames
• Must have sufficient total real storage to fully back the total DB2 requirement
• Involves partitioning real storage into 4KB and 1MB size page frames
• Specified by LFAREA xx% or n in IEASYSnn parmlib member
• Only changeable by IPL
• If 1MB size page frames are overcommitted, DB2 will use 4KB size page frames
• Recommendation to add 5-10% to the size to allow for some growth and tuning
• Must have both enough 4KB and enough 1MB size page frames
• Do not use 1MB size real storage frames until running smoothly on V10
• Make sure any critical z/OS maintenance is applied before using 1MB size real storage
page frames
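Illustrative IEASYSxx specifications (either form; exact limits depend on the z/OS level, and the amount should include the 5-10% growth allowance mentioned above):

LFAREA=4G     (fixed amount of real storage set aside as 1MB frames)
LFAREA=10%    (percentage form)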
109
#IDUG
© 2014 IBM Corporation
1MB size real storage page frames on z10, z196 and zEC12 …
• Useful commands
• DB2 -DISPLAY BUFFERPOOL(BP1) SERVICE=4
• Useful command to find out how many 1MB size page frames are being used
• Especially useful when running multiple DB2 subsystems on the same LPAR
• See DSNB999I message
• MVS DISPLAY VIRTSTOR,LFAREA
• Show total LFAREA, allocation split across 4KB and 1MB size frames, what is available
• See IAR019I message
110
#IDUG
© 2014 IBM Corporation
Group Bufferpool Tuning
• Two causes of cross invalidations
• Cross invalidation due to writes: perfectly normal condition in an active-active data sharing environment
• Cross invalidation due to directory entry reclaims: a condition you want to tune away from
• CPU overhead and I/O overhead if there are not enough directory entries
• Extra CF access and sync I/O read
• -DISPLAY GROUPBUFFERPOOL(*) GDETAIL(INTERVAL)
ROT: Cross invalidations due to directory reclaims should be zero
DSNB787I - RECLAIMS
FOR DIRECTORY ENTRIES = 0
FOR DATA ENTRIES = 0
CASTOUTS = 0
DSNB788I - CROSS INVALIDATIONS
DUE TO DIRECTORY RECLAIMS = 0
DUE TO WRITES = 0
EXPLICIT = 0
111
#IDUG
© 2014 IBM Corporation
Group Bufferpool Tuning …
• Recommend use of XES Auto Alter (autonomic)
• Tries to avoid Structure Full condition
• Tries to avoid Directory Entry Reclaim condition
• CFRM Policy
• ALLOWAUTOALT(YES)
• Set MINSIZE to INITSIZE
• Set FULLTHRESHOLD = 80-90%
• Set SIZE to 1.3-2x INITSIZE
• Periodically review and update the CFRM policy based on actual allocated size and directory-to-data ratio
• Especially when the allocated size reaches SIZE
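A hedged sketch of the corresponding CFRM policy statement (structure name, sizes and PREFLIST are illustrative; sizes are coded in units of 1 KB):

STRUCTURE NAME(DSNDSGA_GBP1)
          INITSIZE(100000)
          SIZE(150000)
          MINSIZE(100000)
          ALLOWAUTOALT(YES)
          FULLTHRESHOLD(85)
          PREFLIST(CF01,CF02)

Here SIZE is 1.5x INITSIZE, MINSIZE equals INITSIZE, and FULLTHRESHOLD sits in the 80-90% range, per the recommendations above.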
112
#IDUG
© 2014 IBM Corporation
GBP Read Tuning
• Local BP search -> GBP search -> DASD I/O
• SYN.READ(NF) = Local Buffer Pool miss
• SYN.READ(XI) = Local Buffer Pool hit but cross-invalidated buffer
• Most data should be found in GBP >> if not, GBP may be too small or pages have been removed
because of directory entry reclaims
Field Name Description
QBGLXD SYN.READ(XI)-DATA
QBGLXR SYN.READ(XI)-NO DATA
QBGLMD SYN.READ(NF)-DATA
QBGLMR SYN.READ(NF)-NO DATA
ROT: Sync.Read(XI) miss ratio should be < 10%
GROUP BP14 QUANTITY /SECOND /THREAD /COMMIT
----------------------------- -------- ------- ------- -------
...
SYN.READ(XI)-DATA RETURNED 1932.00 0.09 0.01 0.00
SYN.READ(XI)-NO DATA RETURN 39281.6K 1823.66 236.31 79.22
SYN.READ(NF)-DATA RETURNED 22837.00 1.06 0.14 0.05
SYN.READ(NF)-NO DATA RETURN 6955.8K 322.93 41.85 14.03
TOTAL SYN.READ(XI) = SYN.READ(XI)-DATA RETURNED + SYN.READ(XI)-NO DATA RETURN
Sync.Read(XI) miss ratio = SYN.READ(XI)-NO DATA RETURN / TOTAL SYN.READ(XI)
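Applying the formula to the sample numbers above (the 'K' notation means thousands):

xi_data = 1_932.0         # SYN.READ(XI)-DATA RETURNED
xi_no_data = 39_281.6e3   # SYN.READ(XI)-NO DATA RETURN ("39281.6K")
miss_ratio = xi_no_data / (xi_data + xi_no_data) * 100
print(f"Sync.Read(XI) miss ratio = {miss_ratio:.2f}%")
# -> ~100%: far above the 10% ROT, so this GBP is undersized and/or
#    losing pages to directory entry reclaims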
113
#IDUG
© 2014 IBM Corporation
GBP Read Tuning ...
• GBPCACHE CHANGED - default and general recommendation
• CHANGED should only be used for LOBs if the LOBs are logged
• GBPCACHE SYSTEM - default for LOBs prior to V9
• V9 default is CHANGED
• SYSTEM should be the recommended design default for LOB tablespaces
• GBPCACHE NONE
• GBP used only for Cross Invalidation
• Reported as Explicit Cross Invalidation in GBP statistics
• Saves data transfer in GBP write and GBP access for castout
• But updated pages must be written to DASD at or prior to commit
• Set VDWQT=0 to promote continuous deferred write
• Minimises response time elongation at commit
• Useful when an updated page is rarely re-accessed
• Batch update sequentially processing a large table
• GBP read/write ratio < 1%
• Primarily to reduce CF usage
114
#IDUG
© 2014 IBM Corporation
Field Name Description
QBGLWS WRITE AND REGISTER
QBGLWM WRITE AND REGISTER MULT
QBGLSW CHANGED PGS SYNC.WRTN
QBGLAW CHANGED PGS ASYNC.WRTN
GROUP BP14 CONTINUED QUANTITY /SECOND /THREAD /COMMIT
----------------------- -------- ------- ------- -------
WRITE AND REGISTER 54896.00 2.55 0.33 0.11
WRITE AND REGISTER MULT 255.5K 11.86 1.54 0.52
CHANGED PGS SYNC.WRTN 408.3K 18.96 2.46 0.82
CHANGED PGS ASYNC.WRTN 1713.4K 79.55 10.31 3.46
GBP Write Tuning
• Pages Sync Written to GBP - force write at
• Commit
• P-lock negotiation
• Pages Async Written to GBP
• Deferred write
• System checkpoint
• In GBP write, corresponding log record must be written first
115
#IDUG
© 2014 IBM Corporation
GBP Write Tuning …
• Pages Castout / Unlock Castout is a good indicator of castout efficiency
• GBP castout thresholds are similar to local BP deferred write thresholds
• Encourage Class_castout (CLASST) threshold by lowering its value
• More efficient than GBP_castout threshold (notify to pageset/partition castout owner)
• CLASST threshold check by GBP write
• GBPOOLT threshold check by GBP castout timer (10 sec default)
• Default thresholds lowered in V8
• V11: Class castout threshold can be set below 1%
Field Name Description
QBGLRC PAGES CASTOUT
QBGLUN UNLOCK CASTOUT
QBGLCT CLASST THRESHOLD
QBGLGT GBPOOLT THRESHOLD
QBGLCK GBP CHECKPOINTS
GROUP BP14 QUANTITY /SECOND /THREAD /COMMIT
----------------------------- -------- ------- ------- -------
PAGES CASTOUT 2224.8K 103.28 13.38 4.49
UNLOCK CASTOUT 58868.00 2.73 0.35 0.12
...
CASTOUT CLASS THRESHOLD 26835.00 1.25 0.16 0.05
GROUP BP CASTOUT THRESHOLD 594.00 0.03 0.00 0.00
GBP CHECKPOINTS TRIGGERED 45.00 0.00 0.00 0.00
Threshold                  V7    V8/V9/V10
VDWQT (dataset level)      10%   5%
DWQT (buffer pool level)   50%   30%
CLASST (Class_castout)     10%   5%
GBPOOLT (GBP_castout)      50%   30%
GBPCHKPT (GBP checkpoint)  8     4
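Castout thresholds are adjusted with the ALTER GROUPBUFFERPOOL command; illustrative values encouraging class castout, per the guidance above:

-ALTER GROUPBUFFERPOOL(GBP14) CLASST(1) GBPOOLT(25) GBPCHKPT(4)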
116
#IDUG
© 2014 IBM Corporation
Field Name Description
QBGLWF WRITE FAILED-NO STORAGE
GROUP BP14 QUANTITY /SECOND /THREAD /COMMIT
----------------------------- -------- ------- ------- -------
CASTOUT ENGINE NOT AVAIL. N/A N/A N/A N/A
WRITE ENGINE NOT AVAILABLE N/A N/A N/A N/A
READ FAILED-NO STORAGE N/A N/A N/A N/A
WRITE FAILED-NO STORAGE 0.00 0.00 0.00 0.00
GBP Write Tuning …
• WRITE FAILED-NO STORAGE
• GBP is full and no stealable data page
• Potential outage if pages added to LPL list
• Keep Write Failed < 1% of pages written by
• Using bigger GBP, and/or
• Smaller castout thresholds, and/or
• Smaller GBP checkpoint timer
ROT: WRITE FAILED-NO STORAGE < 1% of TOTAL CHANGED PGS WRTN
117
#IDUG
RDS Sort
118
#IDUG
© 2014 IBM Corporation
119
Sort Phases
• # logical workfiles, or runs, required for a
given sort, with sufficient sort pool size =
• 1 if rows already in sort key sequence
• #rows_sorted / 64000 if rows in random sort
key sequence
• #rows_sorted / 32000 if rows in reverse sort
key sequence
• #merge passes =
• 0 if #runs=1
• 1 if #runs < max #workfiles in merge, determined by BP size x VPSEQT
• 2 otherwise
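A sketch of these rules in Python (max_merge_workfiles stands for the merge limit derived from the work file BP size x VPSEQT):

import math

def sort_runs(rows, key_order):
    # Runs produced in the input phase: 'sorted', 'random' or 'reverse'
    if key_order == "sorted":
        return 1
    per_run = 64000 if key_order == "random" else 32000
    return max(1, math.ceil(rows / per_run))

def merge_passes(runs, max_merge_workfiles):
    if runs == 1:
        return 0
    return 1 if runs < max_merge_workfiles else 2

runs = sort_runs(1_000_000, "random")                     # -> 16 runs
print(runs, merge_passes(runs, max_merge_workfiles=32))   # -> 16 1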
[Diagram: the input phase builds the sort tree in the sort pool (up to 64 MB, one per thread) and writes runs to logical work files (LWF); the sort/merge phase reads the work files through the buffer pool to produce the sorted rows]
#IDUG
© 2014 IBM Corporation
Sort Indicators
• MERGE PASSES DEGRADED = number of times merge pass > 1 because workfile requests rejected
because buffer pool is too small
• WORKFILE REQUESTS REJECTED = number of workfile requests rejected because buffer pool is too
small to support concurrent sort activity
• WORKFILE PREFETCH NOT SCHEDULED = number of times sequential prefetch was not scheduled for
work file because of buffer pool shortage
Field Name Description
QBSTWFF MERGE PASS DEGRADED
QBSTWFD WORKFILE REQ. REJECTED
QBSTWKPD WORKFILE PRF NOT SCHEDULED
BP7 SORT/MERGE QUANTITY /SECOND /THREAD /COMMIT
--------------------------- -------- ------- ------- -------
MAX WORKFILES CONCURR. USED 8.22 N/A N/A N/A
MERGE PASSES REQUESTED 7304.00 0.34 0.04 0.02
MERGE PASS DEGRADED-LOW BUF 0.00 0.00 0.00 0.00
WORKFILE REQ.REJCTD-LOW BUF 0.00 0.00 0.00 0.00
WORKFILE REQ-ALL MERGE PASS 14610.00 0.68 0.09 0.03
WORKFILE NOT CREATED-NO BUF 0.00 0.00 0.00 0.00
WORKFILE PRF NOT SCHEDULED 0.00 0.00 0.00 0.00
ROT: Workfile Req. Rejected < 1 to 5% of Workfile Req. All Merge Passes (1)
ROT: Merge Passes Degraded < 1 to 5% of Merge Passes Requested (1)
ROT: Workfile Prefetch Not Scheduled < 1 to 5% of Seq Prefetch Reads (1)
(1) ROTs assume VDWQT = 10 and DWQT = 50
120
#IDUG
© 2014 IBM Corporation
Sort Indicators …
• SYNCHRONOUS READS >> buffer pool shortage or too few physical work files
• Prefetch quantity = PRF.PAGES READ/PRF.READ
BP7 READ OPERATIONS QUANTITY /SECOND /THREAD /COMMIT
--------------------------- -------- ------- ------- -------
SYNCHRONOUS READS 5703.00 0.26 0.03 0.01
SYNCHRON. READS-SEQUENTIAL 8.00 0.00 0.00 0.00
SYNCHRON. READS-RANDOM 5695.00 0.26 0.03 0.01
SEQUENTIAL PREFETCH REQUEST 88102.00 4.09 0.53 0.18
SEQUENTIAL PREFETCH READS 124.00 0.01 0.00 0.00
PAGES READ VIA SEQ.PREFETCH 941.00 0.04 0.01 0.00
S.PRF.PAGES READ/S.PRF.READ 7.59
...
DYNAMIC PREFETCH REQUESTED 693.9K 32.22 4.17 1.40
DYNAMIC PREFETCH READS 164.00 0.01 0.00 0.00
PAGES READ VIA DYN.PREFETCH 2290.00 0.11 0.01 0.00
D.PRF.PAGES READ/D.PRF.READ 13.96
PREF.DISABLED-NO BUFFER 0.00 0.00 0.00 0.00
PREF.DISABLED-NO READ ENG 0.00 0.00 0.00 0.00
PAGE-INS REQUIRED FOR READ 0.00 0.00 0.00 0.00
ROT: Sync Reads < 1 to 5% of pages read by prefetch (1)
ROT: If prefetch quantity < 4, increase buffer pool size (1)
(1) ROTs assume VDWQT = 10 and DWQT = 50
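The prefetch quantity check, computed from the counters in the sample report above:

seq_pages, seq_reads = 941, 124    # PAGES READ VIA SEQ.PREFETCH / SPF READS
dyn_pages, dyn_reads = 2290, 164   # PAGES READ VIA DYN.PREFETCH / DPF READS
print(f"seq quantity = {seq_pages / seq_reads:.2f}")   # 7.59
print(f"dyn quantity = {dyn_pages / dyn_reads:.2f}")   # 13.96
# Both above the ROT minimum of 4, so no buffer pool increase is indicated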
121
#IDUG
© 2014 IBM Corporation
RDS Sort Recommendations
• Use dedicated 4K and 32K buffer pools for sort workfile
• Set VPSEQT=100 (or 90-95 if also used for DGTTs and sparse indexes, not just sort)
• Two possible strategies
• Tuning for optimal performance
• Set write thresholds very high e.g. DWQT=90 and VDWQT=70
• Design objective: Complete sort in memory and avoid I/O to the sort workfiles
• This is OK for consistent number of small sorts
• But increased risk of hitting buffer pool critical thresholds
• Adversely affecting application performance and threatening system stability
• Tuning for robust defensive performance
• Set write thresholds to DWQT=50 and VDWQT=10
• Design objective: Protect the system against unexpected surge of sort activity
• Many parallel sorts or a few very large sorts
• Best strategy if fluctuation in the amount of sort activity within and across members
122
#IDUG
© 2014 IBM Corporation
RDS Sort Recommendations …
• Provide multiple physical workfiles placed on different DASD volumes
• Sort workfile placement example
• 4-way Data Sharing Group
• Assume 24 volumes are available
• Each member should have 24 workfile tablespaces on separate volumes
• All members should share all 24 volumes (i.e. 4 workfiles on each volume)
• Sort pool size
• ZPARM SRTPOOL determines the maximum size of the sort work area allocated for each
concurrent sort user (i.e. on a per-thread basis)
• Most of the sort pool is above the 2GB bar
• The default (10MB in V10) is well suited for most operational workloads
• For more detailed tuning, use IFCID 96
123
#IDUG
© 2014 IBM Corporation
Temporary Space since DB2 9 for z/OS (CM)
• Declared Global Temporary Tables and Static Scrollable Cursors now use the WORKFILE
database instead of the TEMP database
• Heavier use of 32K buffer pool and workfiles to help performance (CM)
• If the workfile record length is less than 100 bytes, then a 4KB page tablespace is used
• In all other cases, DB2 tries to use a 32KB page tablespace
• If no 32KB page TS is available, DB2 uses a 4KB page TS – lost performance benefit
• Recommendations
• Assign bigger 32KB page workfile buffer pool
• Allocate bigger/more 32KB page workfile tablespaces
[Diagram: the WORKFILE database now holds work files, created global temporary tables, declared global temporary tables, and declared temporary tables for static scrollable cursors]
124
#IDUG
© 2014 IBM Corporation
Temporary Space since DB2 9 for z/OS (CM) …
• New instrumentation to monitor the usage of temp space
• IFCID 2 updated with new counters
• Including the number of times a 4KB page tablespace was used when a 32K page tablespace was
preferable (but not available)
Field Name Description
QISTWFCU Current total storage used, in MB
QISTWFMU Maximum space ever used, in MB
QISTWFMX Max. allowable storage limit (MAXTEMPS) for an agent, in MB
QISTWFNE # of times max. allowable storage limit per agent was exceeded
QISTWF04 Current total 4KB page tablespace storage used, in MB
QISTWF32 Current total 32KB page tablespace storage used, in MB
QISTW04K Current total 4KB page tablespace storage used, in KB
QISTW32K Current total 32KB tablespace storage used, in KB
QISTWFP1 # of times a 32KB page tablespace was used when a 4KB page
tablespace was preferable (but not available)
QISTWFP2 # of times a 4KB page tablespace was used when a 32KB page
tablespace was preferable (but not available)
WORK FILE DATABASE QUANTITY
--------------------------- --------
MAX TOTAL STORAGE (KB) 5120
CURRENT TOTAL STORAGE (KB) 1600
STORAGE IN 4K TS (KB) 64
STORAGE IN 32K TS (KB) 1536
4K USED INSTEAD OF 32K TS 0
32K USED INSTEAD OF 4K TS 0
AGENT MAX STORAGE (KB) 0
NUMBER OF MAX EXCEEDED 0
#IDUG
© 2014 IBM Corporation
Temporary Space since DB2 9 for z/OS (CM) …
• APAR PK70060 introduces a soft barrier between DGTT and non-DGTT space use in
workfile database
• Recommendation: Define some tablespaces with zero secondary quantity, plus some PBG tablespaces or DB2-managed classic segmented tablespaces with non-zero secondary quantity (see the sketch after this list)
• For the DB2-managed workfile tablespaces with zero secondary quantity, make sure the
primary quantity is a little bit less than 2GB to avoid DB2 allocating a new piece A002
• For PBG tablespaces, consider setting MAXPARTS=1 to avoid extension
• APAR PM02528 provides the choice to make it a hard barrier: zparm WFDBSEP=NO|YES
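A hedged DDL sketch of the two flavours; the database, stogroup and tablespace names are hypothetical, and PRIQTY/SECQTY are in KB, so PRIQTY 2000000 keeps the primary allocation just under 2 GB:

CREATE TABLESPACE WFTS01 IN WRKDBASE
  USING STOGROUP SYSDEFLT PRIQTY 2000000 SECQTY 0
  BUFFERPOOL BP7 SEGSIZE 16;

CREATE TABLESPACE WFTS02 IN WRKDBASE
  USING STOGROUP SYSDEFLT PRIQTY 409600 SECQTY 409600
  BUFFERPOOL BP7 MAXPARTITIONS 1;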
[Diagram: work files are directed to WORKFILE tablespaces defined with SECQTY 0; DGTTs, created global temporary tables and declared temporary tables for SSC are directed to PBG or SECQTY > 0 tablespaces]
126
#IDUG
© 2014 IBM Corporation
Temporary Space since DB2 9 for z/OS (CM) …
• V11: New ZPARMs to monitor storage usage by work files
• WFSTGUSE_AGENT_THRESHOLD
• % of available space in the workfile database that can be consumed by a single agent before a warning message
• Message DSNI052I is issued once per commit scope
• Default=0 – no warning
• WFSTGUSE_SYSTEM_THRESHOLD
• % of available space in the workfile database that can be consumed by all agents before a warning message
• Message DSNI053I is issued at 5-minute intervals if the condition persists
• Default=90
• V11: New statistics indicators
• QISTDGTTCTO = Total system storage for DGTTs in work file database
• QISTWFCTO = Total system storage for non-DGTT work files in work file database
127
#IDUG
DDF Activity
128
 

Recently uploaded

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 

Recently uploaded (20)

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 

DB2 10 & 11 for z/OS System Performance Monitoring and Optimisation

#IDUG
DB2 System Address Space CPU Time
8
#IDUG
© 2014 IBM Corporation
System Address Space CPU Time
• ROT: All TCB times should be low relative to MSTR and DBM1 SRB times
• ROT: IRLM SRB time should be low relative to MSTR and DBM1 SRB times
• If not, needs further investigation
• If distributed application, DDF SRB (preempt SRB + IIP SRB) is typically the highest by far as it also includes Accounting TCB time

CPU TIMES             TCB TIME   PREEMPT SRB   NONPREEMPT SRB  CP CPU TIME   PREEMPT IIP SRB  /COMMIT
SYSTEM SERVICES AS    8.749583   56.375746     1.198326        1:06.323655   1.079948         0.000040
DATABASE SERVICES AS  1.807715   14.305106     1:54.209902     2:10.322723   1:40.832862      0.000079
IRLM                  0.000931   0.000000      42.573512       42.574443     0.000000         0.000018
DDF AS                10.682058  13:58.424540  1:59.796978     16:08.903575  14:56.025519     0.000588
TOTAL                 21.240287  21:07.030296  4:37.778719     20:08.124397  16:37.938329     0.000725

• CP CPU TIME = TCB TIME + PREEMPT SRB + NONPREEMPT SRB << Total CPU Time on general CP
• PREEMPT IIP SRB << Total CPU Time on zIIP
• TOTAL CPU TIME = CP CPU TIME + PREEMPT IIP SRB
• TOTAL SRB TIME = PREEMPT SRB + NONPREEMPT SRB + PREEMPT IIP SRB
9
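As a quick worked check of the CP CPU TIME formula above, using the SYSTEM SERVICES AS row of the sample report:

  CP CPU TIME = 8.749583 + 56.375746 + 1.198326 = 1:06.323655

i.e. TCB TIME + PREEMPT SRB + NONPREEMPT SRB reproduces the reported CP CPU TIME column exactly.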
#IDUG
© 2014 IBM Corporation
TCB and SRB Times – Major Contributors
[Diagram – matrix of the major TCB and SRB time contributors across the Application (accounting), DBAS, SSAS and IRLM address spaces. TCB-side items include: SQL processing, synch I/O, global lock requests*, buffer updates, lock requests, logical logging, GBP reads*, dataset open/close, extend/preformat, full system contraction, archiving, BSDS processing, error checking, management. SRB-side items include: deferred write, castout*, P-lock negotiation*, async GBP write*, GBP checkpoints*, prefetch read, parallel child tasks, delete name* (= pageset close or pseudo-close to convert to non-GBP dependent), async XES request*, local IRLM latch contention, physical log write, checkpoints, backouts, thread deallocation, update commit incl. GBP write and page P-lock unlock*, notify exit*, deadlock detection, IRLM and XES global contention*. For DDF, the application items run in enclave preemptible SRB mode but are reported in TCB instrumentation. (*) Data sharing specific. Several of these system agents are eligible to run on zIIP in V10/V11, including the V11 clean-up of pseudo-deleted IX entries.]
10
#IDUG
© 2014 IBM Corporation
WLM Policy Setup
• Common problems
• Systems are run at very high CPU utilisation for elongated periods of time
• Little or no non-DB2 work to pre-empt when the system becomes 100% busy, i.e. no non-DB2 work that can be sacrificed
• Sporadic slowdowns or hangs of a DB2 subsystem or an entire DB2 data sharing group as a result of poorly managed WLM policy settings
• DB2 application workloads are allowed to 'fall asleep'
• If a thread is starved while holding a major DB2 internal latch or other vital resource, this can cause an entire DB2 subsystem or data sharing group to stall
• Especially if other work running on the LPAR has latent demand for CPU
• DB2 system address spaces (AS) are not protected from being pre-empted under severe load
• Any delays in critical system tasks as a result of a DB2 system AS being pre-empted can lead to slowdowns and potential 'sympathy sickness' across the data sharing group
• See next slide
11
#IDUG
© 2014 IBM Corporation
WLM Policy Setup
• Why protect DB2 system AS from being pre-empted?
• Answer: These AS are critical for efficient system operation and should be defined with an aggressive goal and a very high importance
• MSTR contains the DB2 system monitor task
• Requires an aggressive WLM goal to monitor CPU stalls and virtual storage constraints
• DBM1 manages DB2 threads and is critical for both local latch and cross-system locking negotiation
• Any delay in negotiating a critical system or application resource (e.g. P-lock on a space map page) can lead to a slowdown of the whole DB2 data sharing group
• DIST and WLM-managed stored procedure AS only run the DB2 service tasks, i.e. work performed for DB2 not attributable to a single user
• Classification of incoming workload, scheduling of external stored procedures, etc.
• Typically means these address spaces place a minimal CPU load on the system
• BUT… they do require minimal CPU delay to ensure good system-wide performance
• The higher CPU demands to run DDF and/or SP workloads are controlled by the WLM service class for the DDF enclave workloads or the other workloads calling the SP
• Clear separation between DB2 services, which are long-running started tasks used to manage the environment, and transaction workloads that run and use DB2 services
12
#IDUG
© 2014 IBM Corporation
WLM Policy Setup ...
• Recommendation
• Configure the WLM policy defensively to deal with situations when the system is over-committed
• General guidelines
• VTAM, IRLM, and RRS must be mapped to service class SYSSTC
• All DB2 system AS should be isolated into a unique user-defined service class defined with Importance 1 and a very high velocity goal (e.g. 85-90)
• MSTR, DBM1, DIST, WLM-managed stored procedure AS
• As of V9, the recommendation was to put DB2 MSTR into SYSSTC
• When z/OS WLM Blocked Workload Support is enabled (see later slides), Imp1 with a very high velocity goal is generally sufficient for MSTR
• Do not generally recommend putting DB2 DBM1 into SYSSTC because of the risk of DB2 misclassifying incoming work
• Special case: If experiencing recurring DB2 slowdowns when running at very high CPU utilisation for elongated periods of time, move all DB2 system AS (MSTR, DBM1, DIST, WLM-managed stored procedure AS) to service class SYSSTC to help with PD/PSI
13
#IDUG
© 2014 IBM Corporation
WLM Policy Setup ...
• General guidelines ...
• Reserve Importance 1 for critical server address spaces (DB2 AS, IMS control region, CICS TOR) and use Importance 2-5 for all business applications, starting in I=2 for high priority ones
• Set CPU critical for DB2 system AS to protect them against CPU being stolen by lower priority work
• Try to have non-DB2 work ready to pre-empt
• Remove any discretionary definitions for any DB2-related work
• Do not use Resource Group Capping, e.g. for batch
• Let WLM decide on how to best manage mixed workloads
• For CICS and IMS, favour 80th or 90th percentile response time goals
• Transactions are typically non-uniform
• Response time goals must be practically achievable – use the RMF Workload Activity Report to validate
• Online workloads that share DB2 objects should have similar WLM performance goals to prevent interference slowdowns
• Example: An online DDF workload classified much lower in importance than an online CICS workload
• If they share any of the DB2 objects, a low-running DDF enclave may end up causing the CICS online workload to become stalled
14
#IDUG
© 2014 IBM Corporation
WLM Policy Setup ...
• General guidelines ...
• Processing capacity should be planned at no more than 92-95% average utilisation at peak
• Especially if there is little non-DB2 work that can be sacrificed at high CPU utilisation
• Risks of running at very high system utilisation over long periods of time
• More likely to expose latent stress-related software defects and/or user setup problems
• Transaction path length increases at very high system utilisation
• With a corresponding increase in CPU resource consumption
• Application locks as well as DB2 internal latches will be held for longer periods
• Will increase lock/latch contention, which will behave in a non-linear fashion
• Will lead to greatly extended In-DB2 transit time (response time)
• Extra processing capacity activated using the IBM Capacity on Demand offering should be turned on ahead of anticipated peaks – not when hitting the wall at 100%
• Otherwise the extra processing is used to manage contention and not net new workload
• Always need some spare capacity for automated and service recovery actions
• Enable defensive mechanisms to help with stalled workloads
• -DISPLAY THREAD(*) SERVICE(WAIT) command
• z/OS WLM Blocked Workload Support
15
#IDUG
© 2014 IBM Corporation
-DIS THREAD(*) SERVICE(WAIT)
• Identifies allied agents and distributed DBATs that have been suspended for more than x seconds
• x = MAX(60, 2 x IRLM resource timeout interval)
• If the thread is suspended due to IRLM resource contention or DB2 latch contention, additional information is displayed to help identify the problem
• Note that DISTSERV may return false positives for DBATs that have not been in use for more than x seconds

19:23:01.31 STC42353 DSNV401I -DT45 DISPLAY THREAD REPORT FOLLOWS –
DSNV402I -DT45 ACTIVE THREADS –
NAME  ST A  REQ   ID           AUTHID  PLAN    ASID TOKEN
CT45A T  *  12635 ENTRU1030014 ADMF010 REPN603 00C6 90847
V490-SUSPENDED 04061-19:20:04.62 DSNTLSUS +00000546 14.21
16
#IDUG
© 2014 IBM Corporation
-DIS THREAD(*) SERVICE(WAIT) …
• Will also attempt to dynamically boost the priority of any latch holder that appears to be stuck
• Targeted boost via WLM services
• Can run with SCOPE(GROUP)
• Output piped back to the originating member via notify message
• Recommendations
• Driven at a one-minute interval by the internal DB2 System Monitor
• But the diagnostic information is not written out
• Recommend issuing the command via automation on a one-minute interval and saving away the output as diagnostics (see the sketch below)
• See also the DSNV507I message on -DISPLAY THREAD(*) TYPE(SYSTEM) output

V507-ACTIVE MONITOR, INTERVALS=1235, STG=47%, BOOSTS=0, HEALTH=100
REGION=1633M, AVAIL=1521M, CUSHION=375M
17
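A minimal sketch of the automation recommendation above. The command itself is as documented on this slide; the member command prefix (-DT45, taken from the sample output) and the once-per-minute scheduling mechanism depend on your installation and automation product:

  -DT45 DIS THD(*) SERVICE(WAIT) SCOPE(GROUP)

Drive this on a one-minute timer and archive the DSNV401I/DSNV402I output, so that the V490 suspension details are available when diagnosing a stall after the fact.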
#IDUG
© 2014 IBM Corporation
z/OS WLM Blocked Workload Support
• Allows small amounts of CPU to be leaked onto stalled dispatchable units of work on the system ready queue
• Not specific to DB2 workloads
• Allows even frozen discretionary jobs to get some small amount of CPU
• But will not help jobs that are stalled because of resource group capping
• Controlled by two parameters in the IEAOPTxx parmlib member
• BLWLINTHD – Threshold time interval for which a blocked address space or enclave must wait before being considered for promotion
• Default: 20 seconds
• BLWLTRPCT – How much of the CPU capacity on the LPAR is to be used to promote blocked workloads
• Default: 5 (i.e. 0.5%)
• For more information, see APARs OA17735, OA22443, and techdoc FLASH10609
• http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10609
18
#IDUG
© 2014 IBM Corporation
z/OS WLM Blocked Workload Support …
• Recommendations
• All customers should run with this function ENABLED
• May prove very beneficial to be more aggressive than the defaults for systems that run at a high level of utilisation and with little or no batch workloads
• APAR OA44526 allows a lower BLWLINTHD threshold limit
• Previous value range for IEAOPTxx keyword BLWLINTHD was 5-65535 seconds
• New lower limit is 1
• Changing BLWLINTHD to 2-5 seconds may provide better overall system throughput at very high CPU utilisation
• Regularly review the statistics on this function provided in the RMF CPU Activity and Workload Activity reports
19
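As an illustration of a setup more aggressive than the defaults, an IEAOPTxx member could carry the following. The values are examples consistent with the 2-5 second guidance above, not universal recommendations, and a BLWLINTHD below 5 requires OA44526:

  BLWLINTHD=3
  BLWLTRPCT=5

Activate the member with the SET OPT=xx operator command, then verify the promotion counts in the RMF CPU Activity report.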
#IDUG
© 2014 IBM Corporation
zIIP Capacity
• In DB2 10, prefetch and deferred write engines are now eligible for zIIP offload – in DB2 11, all SRB-mode system agents (except P-lock negotiation) are eligible for zIIP offload
• These DB2 tasks must be dispatched very quickly
• Any delays could result in
• Significant elapsed time degradation for some batch and DB2 utility jobs
• Very high count for 'prefetch engine not available' in the DB2 Statistics Trace
• Many installations run with the default settings IIPHONORPRIORITY=YES, HIPERDISPATCH=YES and ZIIPAWMT=3200 in the IEAOPTxx parmlib member
• zIIP processors can get help from standard processors (GCPs)
• A zIIP needs to be running for 3.2 msec before it checks to see if it should ask for help from GCPs ('alternate wait management')
• Many requests can be flowing through the zIIP during this time period. But if the zIIP has been running for the length of time specified by ZIIPAWMT, the queue of work is still growing, and all the zIIPs are busy, then the zIIP signals a GCP for help to process the work.
20
#IDUG
© 2014 IBM Corporation
zIIP Capacity …
• With the above default settings, and if the zIIP capacity is under-configured
• DB2 prefetch engines can end up queuing for a zIIP for up to 3.2 msec before they are dispatched on a GCP
• The minimum value when HIPERDISPATCH=YES is ZIIPAWMT=1600 – still very high
• Of course, this could be much worse if the zIIP processors were not allowed to ask GCPs for help (IIPHONORPRIORITY=NO)
• The correct technical solution is to add more zIIP capacity
• zIIP capacity planning must be elevated to the same level as GCP capacity planning
• Need to proactively monitor zIIP CPU usage
• Using the % of workload redirected to GCPs (APPL% IIPCP in RMF) – not just the zIIP % utilisation
• zIIP engines should be treated as 'assist processors'
• Target utilisation for zIIP engines is likely to be much lower than the limit used for GCP usage due to the time-critical nature of the work likely to be offloaded there
21
#IDUG
© 2014 IBM Corporation
zIIP Capacity …
• Sample RMF statistics
22
#IDUG
EDM Pools
23
#IDUG
© 2014 IBM Corporation
EDM Pools – V9 Picture
[Diagram: DBM1 address space, split by the 2GB bar]
• Below the 2GB bar:
• RDS Pool Below – the part of CT/PT that must be below – ZPARM EDMPOOL in V9 defines the size of this 31-bit RDS pool (getmained storage)
• Above the 2GB bar:
• RDS Pool Above – the part of CT/PT that can be above (no external ZPARM)
• Skeleton Pool (SKCT/SKPT) – EDM_SKELETON_POOL
• Global Dynamic Statement Cache (dynamically prepared statements, SKDS) – EDMSTMTC
• Database Descriptors Pool (DBD) – EDMDBDC
24
#IDUG
© 2014 IBM Corporation
EDM Pools – V10/V11 Picture
[Diagram: DBM1 address space, split by the 2GB bar]
• Below the 2GB bar:
• 31-bit user agent local storage for plans and packages bound prior to V10 – ZPARM EDMPOOL now controls the amount of 31-bit thread storage used (variable storage)
• Above the 2GB bar:
• 64-bit user agent local storage for plans and packages bound in V10 (shared variable storage) (no external ZPARM)
• Skeleton Pool (SKCT/SKPT) – EDM_SKELETON_POOL
• Global Dynamic Statement Cache (dynamically prepared statements, SKDS) – EDMSTMTC
• Database Descriptors Pool (DBD) – EDMDBDC
• V11: New limits for EDM pools (can be up to 4 GB)
25
#IDUG
© 2014 IBM Corporation
EDM Pool changes in DB2 10 (CM)
• RDS getmained storage pools (below / above) are eliminated
• Plan and package structures (CTs/PTs) are allocated from the user agent local storage
• Agent local pools are not shared, so no latch is required
• LC24 contention eliminated
• For plans and packages bound prior to DB2 10:
• CT and PT structures are still allocated below the 2GB bar (31-bit variable storage)
• For plans and packages bound in DB2 10:
• CT and PT structures are moved above the 2GB bar (64-bit shared storage)
• Column procedures (SPROC, IPROC, …) are still allocated and shared below the 2GB bar for applications rebound on DB2 10
• ZPARM EDMPOOL now limits the amount of allocated 31-bit storage, rather than allocating the storage at DB2 startup
• ZPARM EDMBFIT is removed
26
#IDUG
© 2014 IBM Corporation
Plan/Package and Skeleton Storage

Field Name  Description
QISEKPGE    PAGES IN SKEL POOL
QISESKCT    PAGES HELD BY SKCT
QISESKPT    PAGES HELD BY SKPT
QISEKLRU    STEALABLE PAGES
QISEKFRE    FREE PAGES
QISEDBDG    FAILS DUE TO SKEL POOL FULL
QISECTG     CT REQUESTS
QISECTL     CT NOT FOUND
QISEKTP     PT REQUESTS
QISEKTL     PT NOT FOUND

EDM POOL                     QUANTITY  /SECOND  /THREAD  /COMMIT
...
PAGES IN SKEL POOL (ABOVE)    4964.00      N/A      N/A      N/A
HELD BY SKCT                    20.00      N/A      N/A      N/A
HELD BY SKPT                  1584.77      N/A      N/A      N/A
STEALABLE PAGES               1602.77      N/A      N/A      N/A
FREE PAGES                    3359.23      N/A      N/A      N/A
% PAGES IN USE                   0.04      N/A      N/A      N/A
FAILS DUE TO SKEL POOL FULL      0.00     0.00     0.00     0.00
...
CT REQUESTS                     13.00     0.00     1.00     0.00
CT NOT FOUND                     0.00     0.00     0.00     0.00
CT HIT RATIO (%)               100.00      N/A      N/A      N/A
PT REQUESTS                    162.7K    45.19    12.5K     0.95
PT NOT FOUND                     6.00     0.00     0.46     0.00
PT HIT RATIO (%)               100.00      N/A      N/A      N/A

• CT/PT HIT RATIO = (CT/PT REQUESTS – CT/PT NOT FOUND) / CT/PT REQUESTS * 100

DBM1 AND MVS STORAGE BELOW 2 GB         QUANTITY
...
THREAD PLAN AND PACKAGE STORAGE (MB)        0.00

DBM1 STORAGE ABOVE 2 GB                 QUANTITY
...
THREAD PLAN AND PACKAGE STORAGE (MB)       23.44

Field Name  Description
QISESQCB    31-BIT THREAD PLAN STORAGE
QISESQKB    31-BIT THREAD PACKAGE STORAGE
QISESQCA    64-BIT THREAD PLAN STORAGE
QISESQKA    64-BIT THREAD PACKAGE STORAGE
27
#IDUG
© 2014 IBM Corporation
Plan/Package and Skeleton Storage ...
• If EDMPOOL is too small – after migration to DB2 10 and before the first REBIND
• Only applies to applications bound on a prior release
• CT and PT pages are still allocated below the bar
• Fewer threads can run concurrently
• Race condition with CTHREAD/MAXDBAT
• Queuing or resource unavailable "SQLCODE -904"
• Strong recommendation to REBIND in V10 to maximise 31-bit virtual storage constraint relief
• ROT: 31-BIT THREAD PLAN AND PACKAGE STORAGE < 75% of EDMPOOL
• If the Skeleton Pool is too small
• Increased response time due to loading of SKCT, SKPT
• Increased I/O against SCT02 and SPT01
• ROT: Increase the Skeleton Pool size as needed to keep CT/PT HIT RATIO > 95 to 99%
28
#IDUG
© 2014 IBM Corporation
DBD Pool
• If the DBD Pool is too small
• Increased response time due to loading of DBDs
• Increased I/O against DBD01
• Pages in the DBD pool are stealable – as long as they are not in use
• In extreme cases, a DBD pool that is too small could result in resource unavailable "SQLCODE -904"
• ROT: Increase the DBD Pool size as needed to keep DBD HIT RATIO > 95 to 99%

Field Name  Description
QISEDPGE    PAGES IN DBD POOL (ABOVE)
QISEDBD     PAGES HELD BY DBD
QISEDLRU    STEALABLE PAGES
QISEDFRE    FREE PAGES
QISEDFAL    FAILS DUE TO DBD POOL FULL
QISEDBDG    DBD REQUESTS
QISEDBDL    DBD NOT FOUND

EDM POOL                    QUANTITY  /SECOND  /THREAD  /COMMIT
PAGES IN DBD POOL (ABOVE)   75843.00      N/A      N/A      N/A
HELD BY DBD                  1275.00      N/A      N/A      N/A
STEALABLE PAGES              1006.93      N/A      N/A      N/A
FREE PAGES                  74568.00      N/A      N/A      N/A
% PAGES IN USE                   0.35      N/A      N/A      N/A
FAILS DUE TO DBD POOL FULL       0.00     0.00     0.00     0.00
...
DBD REQUESTS                   375.9K   104.43    28.9K     2.19
DBD NOT FOUND                    0.00     0.00     0.00     0.00
DBD HIT RATIO (%)              100.00      N/A      N/A      N/A
29
#IDUG
© 2014 IBM Corporation
DBD Pool …
• To keep the DBD Pool storage under control
• Try to minimise the number of objects in a database
• Especially in a data sharing environment, where multiple DBD copies can be stored in the DBD Pool due to DBD invalidation by other members
• Smaller DBDs also help with contention (see Locking)
• Compact the DBD by REORG and MODIFY after many DROP TABLEs in a segmented tablespace
30
#IDUG
© 2014 IBM Corporation
DB2 Statement Caching – Overview
[Diagram: NO CACHING – ProgramA issues PREPARE S / EXECUTE S / COMMIT / EXECUTE S / PREPARE S / EXECUTE S; ProgramB issues PREPARE S / EXECUTE S / EXECUTE S. Every PREPARE drives a full prepare in DB2 for z/OS; the prepared statement is deleted at COMMIT, so an EXECUTE after COMMIT without a new PREPARE fails with -514/-518]
31
#IDUG
© 2014 IBM Corporation
DB2 Statement Caching – Overview …
• Significant cost to fully prepare a dynamic SQL statement
• Dynamic statement caching introduced in DB2 for OS/390 and z/OS V5
• Used by dynamic SQL applications to reuse and share prepared statements
• Conditions for reuse of an SQL statement from the dynamic statement cache (see the example below)
• The SQL is a dynamically prepared SELECT, UPDATE, DELETE or INSERT
• The statement text is identical – character for character (literals problematic)
• The authorisation ID is the same
• REOPT(VARS) disables use of the cache for that plan/package
• Two levels of caching
• Global Dynamic Statement Cache
• Local Dynamic Statement Cache
32
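A minimal illustration of the character-for-character matching rule above (the table and predicate are hypothetical):

  -- Different literals = different statement texts: each drives its own
  -- full prepare and occupies its own entry in the global cache
  SELECT NAME FROM EMP WHERE EMPNO = '000010';
  SELECT NAME FROM EMP WHERE EMPNO = '000020';

  -- With a parameter marker the text is identical on every execution,
  -- so one cached copy (SKDS) is shared and re-prepares are short prepares
  SELECT NAME FROM EMP WHERE EMPNO = ?;

Even a difference in blanks or letter case produces a different statement text, so coding standards for dynamic SQL matter.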
#IDUG
© 2014 IBM Corporation
Global Dynamic Statement Cache
[Diagram: CACHEDYN=YES – the first PREPARE of statement S drives a full prepare and caches the skeleton (SKDS) in the global cache; subsequent PREPAREs of the identical text, whether from ProgramA or ProgramB, are short prepares (a copy from the global cache into thread storage). The prepared statement is still deleted from thread storage at COMMIT, so an EXECUTE after COMMIT without a new PREPARE still fails with -514/-518]
33
#IDUG
© 2014 IBM Corporation
Global Dynamic Statement Cache …
NOTES:
• Enabled by ZPARM CACHEDYN = YES
• The statement text and the executable of the prepared statement (SKDS) are cached for reuse across all threads
• Only the first prepare is a full prepare; the others are short prepares, i.e. a copy from the global cache into thread storage
• No prepared statement is kept in thread storage across commit
34
#IDUG
© 2014 IBM Corporation
Global Dynamic Statement Cache …
• Cached statement invalidation is driven by
• ALTER TABLE
• DROP of a dependent object
• REFRESH TABLE
• REVOKE of a privilege
• CREATE INDEX
• LOAD/REORG
• RUNSTATS
• Both RUNSTATS TABLE and RUNSTATS INDEX (narrower scope) are supported
• RUNSTATS ... UPDATE NONE REPORT NO is the most efficient way to invalidate a cache entry
35
#IDUG
© 2014 IBM Corporation
Local Dynamic Statement Cache
[Diagram: CACHEDYN=YES with KEEPDYNAMIC(YES) – the first PREPARE of statement S is a full prepare that caches the SKDS in the global cache; a PREPARE from another thread is a short prepare. Because the prepared statement is kept in thread storage across COMMIT, the re-PREPARE after COMMIT is avoided entirely (avoided prepare)]
36
#IDUG
© 2014 IBM Corporation
Local Dynamic Statement Cache …
• Enabled by BIND option KEEPDYNAMIC(YES)
• Prepared statements are kept in thread storage across commit so that prepares can be avoided
• Application logic needs to reflect the bind option
• The same prepared SQL statement can be stored in several threads
• The least recently prepared statements are thrown away from the LDSC at commit, based on MAXKEEPD
• MAXKEEPD limits the stored executable only; the statement text is always stored in thread storage
• V10: The Local Dynamic Statement Cache is entirely moved above the 2GB bar
• Opportunity to increase MAXKEEPD to improve the Local Dynamic Statement Cache hit ratio and reduce the number of short prepares
• Important: Must provision additional real storage to back the requirement
37
#IDUG
© 2014 IBM Corporation
Dynamic Statement Cache Performance
• GDSC hit ratio = [Short Prepares] / [Short + Full Prepares]
• ROT: GDSC hit ratio should be > 90-95%
• LDSC hit ratio = [Prepares Avoided] / [Prepares Avoided + Implicit Prepares]
• ROT: LDSC hit ratio should be > 70%
• If the statement is not found in the LDSC >> implicit prepare
• Can result in either a Short or a Full Prepare

Field Name  Description
QXPREP      NUMBER OF SQL PREPARE STATEMENTS
QISEDSI     FULL PREPARE REQUESTS
QXSTIPRP    IMPLICIT PREPARES
QXSTNPRP    PREPARES AVOIDED
QXSTDEXP    PREP STMT DISCARDED - MAXKEEPD
QXSTDINV    PREP STMT DISCARDED - INVALIDATION

DYNAMIC SQL STMT            QUANTITY  /SECOND  /THREAD  /COMMIT
PREPARE REQUESTS              124.5K     5.78     0.75     0.25
FULL PREPARES               17446.00     0.81     0.10     0.04
SHORT PREPARES                108.1K     5.02     0.65     0.22
GLOBAL CACHE HIT RATIO (%)     86.10      N/A      N/A      N/A
IMPLICIT PREPARES               0.00     0.00     0.00     0.00
PREPARES AVOIDED             5603.00     0.26     0.03     0.01
CACHE LIMIT EXCEEDED            0.00     0.00     0.00     0.00
PREP STMT PURGED (1)            3.00     0.00     0.00     0.00
LOCAL CACHE HIT RATIO (%)     100.00      N/A      N/A      N/A

(1) PREP STMT PURGED = QXSTDEXP + QXSTDINV
38
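Applying the formulas to the report above as a worked check:

  GDSC hit ratio = 108.1K / (108.1K + 17446) = 108100 / 125546 ≈ 86.1%
  LDSC hit ratio = 5603 / (5603 + 0) = 100%

The first matches the reported GLOBAL CACHE HIT RATIO of 86.10, which is below the 90-95% ROT – so this system would warrant a look at full-prepare drivers such as literals in the statement text or cache invalidations. The LDSC, by contrast, is healthy.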
#IDUG
© 2014 IBM Corporation
Recommendations
• Global Dynamic Statement Cache
• Should be turned on if dynamic SQL is executed in the DB2 system
• Best trade-off between storage and CPU consumption for applications executing dynamic SQL
• Local Dynamic Statement Cache
• Should only be used selectively for applications with a limited number of SQL statements that are executed very frequently
• Bind the packages with KEEPDYNAMIC(YES) in a separate collection
• JCC: Define Data Source properties (WebSphere customProperty)
• keepDynamic=1
• JdbcCollection=collname
• DB2 10 opens the opportunity to increase the LDSC for a better hit ratio
• Recommendation to set an appropriate MAXKEEPD value so as not to over-allocate REAL storage
• BUT… distributed threads cannot go inactive
• DB2 resources for distributed threads are not freed at commit
• No Sysplex workload balancing
39
#IDUG
Dataset Open/Close
40
#IDUG
© 2014 IBM Corporation
Dataset Open/Close – ZPARMs
• DSMAX
• Maximum number of data sets that can be concurrently open – up to 200,000
• Retrofitted to DB2 10 via APAR PM88166 (z/OS 1.13 recommended for performance)
• z/OS 1.12 delivered functions that allow significant data set allocation elapsed time reductions
• The new interface for opening and closing data sets can be enabled by one of the following actions (see the sketch below):
• Update the ALLOCxx PARMLIB member to set SYSTEM MEMDSENQMGMT to ENABLE
• Recommended, as it remains effective across IPLs
• Issue the system command: SETALLOC SYSTEM,MEMDSENQMGMT=ENABLE
• A DB2 restart is required to make the change effective
• Deferred close
• When the number of available open data set slots reaches MIN(100, 1% of DSMAX), DB2 closes MIN(300, 3% of DSMAX) data sets
• First, page sets defined with CLOSE YES are closed (on an LRU basis)
• If more need to be closed, page sets defined with CLOSE NO (on an LRU basis)
• APAR PM56725 – Unnecessary OPEN/CLOSE when DSMAX reached (stale list)
41
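A sketch of the two enablement options described above – the ALLOCxx statement survives IPLs, while the command takes effect immediately but must be reissued after an IPL:

  In ALLOCxx:
    SYSTEM MEMDSENQMGMT(ENABLE)

  Or dynamically:
    SETALLOC SYSTEM,MEMDSENQMGMT=ENABLE

Either way, DB2 must be restarted before it exploits the new interface.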
#IDUG
© 2014 IBM Corporation
Dataset Open/Close – ZPARMs …
• PCLOSEN – PCLOSET
• Control the pseudo-close mechanism
• DB2 automatically converts updated page sets or partitions from read-write to read-only state when either condition is met
• PCLOSEN – Number of consecutive DB2 system checkpoints since a page set or partition was last updated
• Default 10 checkpoints (new default in V10 – used to be 5)
• PCLOSET – Elapsed time since a page set or partition was last updated
• Default 10 minutes
• On pseudo-close, the SYSLGRNX entry is closed and any updated pages are externalised to disk
• PCLOSEN/PCLOSET also control the deferred close of GBP-dependent objects with CLOSE YES
• See next slide
42
#IDUG
© 2014 IBM Corporation
CLOSE (YES) versus CLOSE (NO)
• Significant difference between CLOSE(NO) and CLOSE(YES) for active data sharing, i.e. when there is GBP dependency (1 updater, 1 reader)

CLOSE(YES)
• When the next PCLOSET/N occurs for the reader, physical close of the object
• (+) Data set ceases to be GBP-dependent. The remaining member can update the data set in its local buffer pool.
• (-) Physical open on next access from the reader

CLOSE(NO)
• When the next PCLOSET/N occurs for the reader, no physical close
• (+) No physical open on next access from the reader
• (-) Data set will stay GBP-dependent (SIX to readers IS) until it is stopped, or DB2 is stopped normally, or DSMAX is hit and all CLOSE(YES) objects have been closed
43
#IDUG
© 2014 IBM Corporation
Dataset Open/Close – Tuning
• Frequent dataset opens are typically accompanied by high DBM1 TCB time and/or Accounting Class 3 Dataset Open time
• Could be caused by hitting DSMAX too frequently
• Increase DSMAX in small incremental steps and study the effects
• PM19528 added a new IFCID 370/371 pair for OPEN/CLOSE monitoring
• ROT: NUMBER OF DATASET OPENS < 0.1 to 1/sec

Field Name  Description
QTMAXDS     OPEN DATA SETS - HWM
QTDSOPN     DATA SETS CURRENTLY OPEN
QTMAXPB     DS NOT IN USE, BUT NOT CLOSED - HWM
QTSLWDD     DS NOT IN USE, BUT NOT CLOSED
QTDSDRN     DSETS CLOSED-THRESH.REACHED
QTPCCT      CONVERTED R/W -> R/O
QBSTDSO     DATA SETS PHYSICALLY OPENED

OPEN/CLOSE ACTIVITY          QUANTITY  /SECOND  /THREAD  /COMMIT
OPEN DATASETS - HWM          23488.00      N/A      N/A      N/A
OPEN DATASETS                23118.06      N/A      N/A      N/A
DS NOT IN USE,NOT CLOSE-HWM  22890.00      N/A      N/A      N/A
DS NOT IN USE,NOT CLOSED     22417.63      N/A      N/A      N/A
IN USE DATA SETS               700.43      N/A      N/A      N/A
DSETS CLOSED-THRESH.REACHED      0.00     0.00     0.00     0.00
DSETS CONVERTED R/W -> R/O    6485.00     0.30     0.04     0.01

TOTAL GENERAL                QUANTITY  /SECOND  /THREAD  /COMMIT
...
NUMBER OF DATASET OPENS       4819.00     0.22     0.03     0.01
44
#IDUG
© 2014 IBM Corporation
Dataset Open/Close – Tuning …
• Possible side effects of pseudo-close activity (RW -> RO switching)
• Expensive SYSLGRNX processing
• Growth in the SYSLGRNX pageset
• Frequent dataset close and re-open
• Ping-pong in and out of GBP dependency
• V10 – The expensive scan of XXXL local buffer pools is now avoided
• Recommendations
• Take frequent system checkpoints
• Set CHKFREQ=2-5 (minutes) – see the sketch below
• Adjust PCLOSEN/T to avoid too-frequent pseudo-closes
• Use CLOSE(YES) as a design default
• ROT: #DSETS CONVERTED R/W -> R/O < 10-15 per minute
45
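The checkpoint interval can also be adjusted dynamically to trial the 2-5 minute recommendation before committing it to the subsystem parameters; a sketch (the member command prefix -DT45 is illustrative):

  -DT45 SET LOG CHKTIME(3)

This requests a system checkpoint roughly every 3 minutes; make the change permanent via CHKFREQ in DSNZPARM so it survives a restart.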
#IDUG
Log Activity
46
#IDUG
© 2014 IBM Corporation
Writing to the Active Log
• When does DB2 write to the active log?
• UR-related events
• End Commit Ph1
• Begin Commit Ph2 (2 log writes for 2-phase commit)
• Ph1/Ph2 combined for 1-phase commit (1 log write for 1-phase commit)
• Database writes
• "Write ahead" logging protocol
• All undo/redo records pertaining to the page being written must be hardened on disk before the page is written to DASD or CF
• Page P-lock negotiation and index leaf-page split in data sharing
• 2 forced physical log writes per index leaf-page split
• Force log up to PGLOGRBA (RBA/LRSN of the last update to the page)
• System checkpoints
47
#IDUG
© 2014 IBM Corporation
Writing to the Active Log …
• When does DB2 write to the active log? …
• Log write threshold reached
• Fixed "20 buffer" value
• ARCHIVE LOG command
• IFI read for IFCID 306 from another member
• Log writes are chained, up to 128 4K CIs per write I/O
• For dual logging
• V9
• Full CI writes are fully overlapped
• CI re-writes are done serially
• V10
• All log write I/Os are done in parallel
• Elapsed time improvements
48
#IDUG
© 2014 IBM Corporation
Output Log Buffer
• Size controlled by ZPARM OUTBUFF
• Maximum size 400MB – default size 4MB
• V10: Storage allocated in the MSTR address space
• All the log buffers specified by OUTBUFF are page-fixed at DB2 start
• V11: Log buffers are moved to the 64-bit common storage area (HCSA), avoiding address space switching when writing log records
• Log buffers are page-fixed at DB2 start
• Uses 1 MB page frames if available; otherwise, 4 KB page frames are used
#IDUG
© 2014 IBM Corporation
Log Write Statistics
• ROT: Increase OUTBUFF if UNAVAILABLE OUTPUT LOG BUFF > 0 and log data volume << max
• Need REAL storage to support the increase
• More output log buffer space may also help log reads
• Log data volume = LOG CI CREATED * 4KB / stats_interval
• Start paying attention if > 10MB/sec

Field Name  Description
QJSTWTB     UNAVAILABLE OUTPUT LOG BUFFER
QJSTBPAG    LOG OUTPUT BUFFER PAGED IN
QJSTWRNW    NOWAIT LOG WRITE REQUESTS
QJSTWRF     FORCE LOG WRITE REQUESTS
QJSTBFFL    ACTIVE LOG OUTPUT CI CREATED
QJSTLOGW    LOG WRITE I/O REQUESTS
QJSTCIWR    LOG CI WRITTEN

LOG ACTIVITY                 QUANTITY  /SECOND  /THREAD  /COMMIT
...
UNAVAILABLE OUTPUT LOG BUFF     10.00     0.00     0.00     0.00
OUTPUT LOG BUFFER PAGED IN       0.00     0.00     0.00     0.00
LOG RECORDS CREATED          23673.6K  3315.64   402.30    98.14
LOG CI CREATED                1710.1K   239.51    29.06     7.09
LOG WRITE I/O REQ (LOG1&2)    1816.3K   254.39    30.87     7.53
LOG CI WRITTEN (LOG1&2)       4383.3K   613.91    74.49    18.17
LOG RATE FOR 1 LOG (MB)           N/A     1.20      N/A      N/A
LOG WRITE SUSPENDED            623.8K    87.37    10.60     2.59
50
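Worked example using the report above: LOG CI CREATED is 239.51 per second, so log data volume ≈ 239.51 × 4KB ≈ 0.94 MB/sec – well under the 10 MB/sec attention threshold. Similarly, LOG CI WRITTEN of 613.91/sec across both copies is ≈ 307 CIs/sec per log, i.e. 307 × 4KB ≈ 1.2 MB/sec, matching the reported LOG RATE FOR 1 LOG. Note that the UNAVAILABLE OUTPUT LOG BUFF count of 10, with log volume far below the device maximum, is exactly the pattern for which the ROT recommends increasing OUTBUFF.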
#IDUG
© 2014 IBM Corporation
Log Dataset I/O Tuning
• Avoid I/O interference among Active Log Read/Write and Archive Log Read/Write
• If the log rate is near the maximum
• Use a faster log device
• Reduce the log data volume
• Use DB2 data compression for insert-intensive tables
• Optimise table design to minimise log record size (BRF only)
• Be aware of the impact of DBD updates
• Consider use of DFSMS striping
• If the DB2 archive log offload process cannot keep up with the speed at which the DB2 active log fills
• Consider DFSMS striping of archive log files (available since V9)
51
#IDUG
© 2014 IBM Corporation
Log Reads
• Log reads are driven by any of the following types of activity
• Rollback of a unit of work
• Data recovery (RECOVER utility, LPL or GRECP recovery)
• Online REORG LOGAPPLY phase
• Restart recovery
• IFI log read interface
• Data replication
• IFCID 129: read a range of log record CIs from the active log
• IFCID 306: supports archive logs, decompression, and data sharing log merge
• Option to not merge
• V11: Log read DBID/PSID filtering to avoid reading and decompressing unnecessary log records (also available in V10 via APAR PM90568)
• Standalone log read – DSNJSLR
• DSN1LOGP
52
#IDUG
© 2014 IBM Corporation
Log Read Performance
• More output log buffer space may help log read performance
• Active log reads perform best due to
• Prefetch of log CIs enabled
• Automatic I/O load balancing between copy1/copy2
• VSAM striping can be enabled
• Reduced task switching
• Archives on DASD perform better: tape destroys parallelism
• Tape requires serialisation in data sharing between concurrent RECOVERs
• You cannot concurrently share tape volumes across multiple jobs
• Small number of read devices on the back end of current VTS models, which can become a severe bottleneck for DB2 mass recovery

Field Name  Description
QJSTRBUF    READS SATISFIED FROM OUTPUT BUF
QJSTRACT    READS SATISFIED FROM ACTIVE LOG
QJSTRARH    READS SATISFIED FROM ARCHIVE LOG

LOG ACTIVITY                 QUANTITY  /SECOND  /THREAD  /COMMIT
READS SATISFIED-OUTPUT BUFF  27654.00     1.28     0.17     0.06
READS SATISFIED-OUTP.BUF(%)     21.67
READS SATISFIED-ACTIVE LOG   99955.00     4.64     0.60     0.20
READS SATISFIED-ACTV.LOG(%)     78.33
READS SATISFIED-ARCHIVE LOG      0.00     0.00     0.00     0.00
READS SATISFIED-ARCH.LOG(%)      0.00
53
#IDUG
Lock/Latch
54
#IDUG
© 2014 IBM Corporation
IRLM
• Make sure to use WLM service class SYSSTC for the IRLM address space
• In general, CPU time is charged to
• Class 2 accounting for locking
• Class 2 accounting for unlocking from SELECT
• MSTR SRB (update commit) for unlocking from INSERT/UPDATE/DELETE
• IRLM SRB for lock or IRLM latch suspension
• IRLM trace can add to CPU time
• TRACE YES (default is NO) can double the IRLM pathlength
• However, with good lock avoidance, this should have minimal overall effect in most online transactions and queries
• MODIFY irlmproc,SET,DEADLOK= or TIMEOUT= to dynamically change deadlock and timeout frequency (see the sketch below)

                   IRLMRWT        DEADLOK
Range allowed      1 to 3600 sec  0.1 to 5 sec
Default (V9)       60 sec         1 sec
Default (V10/V11)  30 sec         1 sec
55
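A sketch of the dynamic change referenced above. The IRLM procedure name DXRIRLM and the subsystem name DB2A are illustrative, and the DEADLOK value units are an assumption here – check the IRLM documentation for the exact value range on your release:

  F DXRIRLM,SET,DEADLOK=1000
    (requests a 1-second local deadlock detection cycle, assuming a value in milliseconds)
  F DXRIRLM,SET,TIMEOUT=30,DB2A
    (sets the IRLM resource timeout to 30 seconds for subsystem DB2A)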
#IDUG
© 2014 IBM Corporation
Lock Avoidance
• Combination of techniques to prevent retrieval of uncommitted data
• (1) Page latching (and page P-lock in data sharing) controlled by DB2 to ensure physical consistency of the page
• (2) Commit log sequence number (CLSN) – at the page level
• DB2 tracks the "time" of the last update to the page (on the page) (A)
• DB2 tracks the "time" of the oldest uncommitted activity on every pageset/partition (B)
• Non data sharing
• CLSN = lowest uncommitted RBA for all active transactions for a given pageset
• Data sharing
• For non-GBP-dependent page sets, each member uses a local CLSN = lowest uncommitted LRSN for all active transactions for a given pageset
• For GBP-dependent page sets, a global CLSN value is maintained for the entire data sharing group = lowest CLSN value across all members across all page sets (GBP-dependent or not)
• If (A) < (B), everything on the page is guaranteed to be committed
• Else, check the Possibly UNCommitted bits (PUNC bits)
56
#IDUG
© 2014 IBM Corporation
Lock Avoidance …
• Combination of techniques to prevent retrieval of uncommitted data …
• (3) Possibly UNCommitted bits (PUNC bits) – at the row level
• On each row, a PUNC bit is set when the row is updated
• PUNC bits are periodically reset by
• A successful CLSN check when more than 25% of the rows have the PUNC bit ON
• An RR scanner
• REORG TABLESPACE
• If the PUNC bit is not ON, the row/key is guaranteed to be committed
57
#IDUG
© 2014 IBM Corporation
Lock Avoidance …
• Benefits of lock avoidance
• Increase in concurrency
• Decrease in lock and unlock requests, with an associated decrease in CPU resource consumption and data sharing overhead
• Plans and packages have a better chance of lock avoidance if they are bound with ISOLATION(CS) and CURRENTDATA(NO)
• Read-only or ambiguous cursor
• CURRENTDATA(NO): all rows, qualified or not, are eligible for lock avoidance
• CURRENTDATA(YES): only non-qualified rows are eligible for lock avoidance
• Non-cursor 'singleton' SELECT
• Qualified rows are eligible for lock avoidance with CURRENTDATA(YES or NO)
58
#IDUG
© 2014 IBM Corporation
Lock Avoidance …
• BIND option ISOLATION(CS) with CURRENTDATA(NO) can reduce the number of Lock/Unlock requests dramatically
• ROT: Lock avoidance may not be working effectively if Unlock requests/commit is high, e.g. > 5
• High Unlock requests/commit can also result from
• A large number of relocated rows after update of a compressed or variable-length row
• Lock/Unlock of the pointer record (or page)
• A large number of pseudo-deleted entries in unique indexes
• Lock/Unlock of data (page or row) on insert into a unique index when pseudo-deleted entries exist
• Both can be eliminated by REORG
• V11: PCTFREE FOR UPDATE to reserve free space for UPDATE + pseudo-deleted index entry clean-up

Field Name  Description
QTXALOCK    LOCK REQUESTS
QTXAUNLK    UNLOCK REQUESTS

LOCKING ACTIVITY  QUANTITY  /SECOND  /THREAD  /COMMIT
LOCK REQUESTS       521.0M    24.2K  3134.34  1050.75
UNLOCK REQUESTS     478.1M    22.2K  2876.06   964.16
59
#IDUG
© 2014 IBM Corporation
Lock Avoidance …
• Effective lock avoidance is very important in data sharing
• Long-running URs can reduce the effectiveness of lock avoidance by stopping the global CLSN value from moving forward
• Recommendation: Aggressively monitor long-running URs
• 'First cut' ROTs (see the sketch below):
• URs running for a long time without committing: ZPARM URCHKTH <= 5
• Message DSNR035I
• URs performing massive update activity without committing: ZPARM URLGWTH = 10 (K)
• Message DSNJ031I
• Need management ownership and a process for getting rogue applications fixed up so that they commit frequently based on
• Elapsed time and/or
• CPU time (no. of SQL update statements)
60
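A sketch of the corresponding subsystem parameter settings following the 'first cut' ROTs above (shown as DSNZPARM keyword assignments; whether your site maintains these through the install CLIST or directly in the generated DSNTIJUZ job is installation-specific):

  URCHKTH=5,    (DSNR035I after a UR persists across 5 checkpoint cycles)
  URLGWTH=10,   (DSNJ031I after a UR writes 10K log records without committing)

Both thresholds only produce warning messages – they surface rogue URs in the console log but do not abort them.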
#IDUG
© 2014 IBM Corporation
Lock Tuning
• As a general rule, start with LOCKSIZE PAGE as the design default
• If high deadlock or timeout, consider LOCKSIZE ROW
• Not much difference between one row lock and one page lock request
• However, the number of IRLM requests issued can be quite different
• No difference for random access
• In a sequential scan
• No difference if 1 row per page (MAXROWS=1) or ISOLATION(UR)
• Negligible difference with ISO(CS) and CURRENTDATA(NO)
• Bigger difference with ISO(RR|RS), or sequential INSERT, UPDATE, DELETE
• Biggest difference with ISO(CS), CURRENTDATA(YES) and many rows per page
• In data sharing, additional data page P-locks are acquired when LOCKSIZE ROW is used
61
#IDUG
© 2014 IBM Corporation
Lock Tuning …
• MAX PG/ROW LOCKS HELD from the Accounting trace is a useful indicator of commit frequency
• Page or row locks only
• AVERAGE is the average of MAX, TOTAL is the max of MAX (across Accounting records)
• So if transaction A had max. locks of 10 and transaction B had 20, then
• AVERAGE (avg. of max.) = 15
• TOTAL (max. of max.) = 20
• In general, try to issue Commit to keep max. locks held below 100

LOCKING               AVERAGE    TOTAL
--------------------- -------- --------
TIMEOUTS                  0.00        1
DEADLOCKS                 0.00        0
ESCAL.(SHARED)            0.00        0
ESCAL.(EXCLUS)            0.00        0
MAX PG/ROW LOCKS HELD     2.43      350
LOCK REQUEST             78.61 24637270
UNLOCK REQUEST           26.91  8433176
QUERY REQUEST             0.00        0
CHANGE REQUEST            1.72   539607
OTHER REQUEST             0.00        0
TOTAL SUSPENSIONS         0.20    63617
LOCK SUSPENSIONS          0.00     1518
IRLM LATCH SUSPENS.       0.20    62091
OTHER SUSPENS.            0.00        8
62
#IDUG
© 2014 IBM Corporation
Lock Tuning …
• ZPARMs
• RRULOCK – U-lock on SELECT FOR UPDATE for ISOLATION(RR|RS)
• NO – acquires S-locks on FETCH
• YES – acquires U-locks on FETCH (new default in V10)
• If no update, the U-lock is demoted to S on the next fetch
• If update, the U-lock is promoted to X for COMMIT duration
• XLKUPDLT – X-lock for searched UPDATE/DELETE
• Take the X-lock immediately on qualifying rows/pages instead of an S|U-lock
• Good if most accessed objects are changed
• V8 APAR PQ98172 introduces a new option: TARGET
• X-lock only for the specific table that is targeted by the UPDATE or DELETE
• S- or U-lock when scanning rows/pages of other tables referenced by the query
• SKIPUNCI – skip uncommitted inserts for ISOLATION(CS|RS)
• Restricted to row-level locking only
63
#IDUG
© 2014 IBM Corporation
IRLM Latch Contention
• A high number of IRLM latch suspensions can be caused by
• IRLM trace on
• Low IRLM dispatching priority
• Frequent IRLM Query requests (e.g. DISPLAY DATABASE LOCKS, or MODIFY irlmproc,STATUS)
• Very low deadlock detection cycle time combined with very high locking rates

Field Name Description
QTXASLAT   SUSPENSIONS (IRLM LATCH)
QTXALOCK   LOCK REQUESTS
QTXAUNLK   UNLOCK REQUESTS
QTXAQRY    QUERY REQUESTS
QTXACHG    CHANGE REQUESTS

LOCKING ACTIVITY         QUANTITY /SECOND /THREAD /COMMIT
------------------------ -------- ------- ------- -------
SUSPENSIONS (ALL)         1498.6K   69.57    9.02    3.02
SUSPENSIONS (LOCK ONLY)  33120.00    1.54    0.20    0.07
SUSPENSIONS (IRLM LATCH)  1405.3K   65.24    8.45    2.83
SUSPENSIONS (OTHER)      60185.00    2.79    0.36    0.12
...
LOCK REQUESTS              521.0M   24.2K 3134.34 1050.75
UNLOCK REQUESTS            478.1M   22.2K 2876.06  964.16
QUERY REQUESTS           50741.00    2.36    0.31    0.10
CHANGE REQUESTS           4488.4K  208.38   27.00    9.05
OTHER REQUESTS               0.00    0.00    0.00    0.00

IRLM Latch Contention = SUSPENSIONS (IRLM LATCH) (A)
Total IRLM Requests = LOCK + UNLOCK + QUERY + CHANGE REQUESTS (B)
IRLM Latch Contention Rate (%) = (A)/(B)*100
ROT: IRLM latch contention should be less than 1-5% of Total IRLM Requests (see the sketch below)
64
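To make the formula concrete, here is a minimal Python sketch (not part of the original deck); the field names follow the statistics report above, and the sample values are copied from it.

# Minimal sketch of the IRLM latch contention ROT.
def irlm_latch_contention_pct(qtxaslat, qtxalock, qtxaunlk, qtxaqry, qtxachg):
    """(A)/(B)*100: IRLM latch suspensions as a % of total IRLM requests."""
    total_requests = qtxalock + qtxaunlk + qtxaqry + qtxachg  # (B)
    return 100.0 * qtxaslat / total_requests if total_requests else 0.0

pct = irlm_latch_contention_pct(
    qtxaslat=1_405_300, qtxalock=521_000_000, qtxaunlk=478_100_000,
    qtxaqry=50_741, qtxachg=4_488_400)
print(f"IRLM latch contention rate = {pct:.2f}% (ROT: < 1-5%)")  # ~0.14%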
#IDUG
© 2014 IBM Corporation
Data Sharing Lock Tuning

Field Name Description
QTGSLSLM   SYNCH.XES - LOCK REQUESTS
QTGSCSLM   SYNCH.XES - CHANGE REQUESTS
QTGSUSLM   SYNCH.XES - UNLOCK REQUESTS
QTGSFLSE   ASYNCH.XES - CONVERTED LOCKS
QTGSIGLO   SUSPENDS - IRLM GLOBAL CONT
QTGSSGLO   SUSPENDS - XES GLOBAL CONT
QTGSFLMG   SUSPENDS - FALSE CONTENTION

DATA SHARING LOCKING         QUANTITY /SECOND /THREAD /COMMIT
---------------------------  -------- ------- ------- -------
GLOBAL CONTENTION RATE (%)       0.64
FALSE CONTENTION RATE (%)        0.11
...
SYNCH.XES - LOCK REQUESTS      227.5M   10.6K 1368.75  458.86
SYNCH.XES - CHANGE REQUESTS   1340.7K   62.24    8.07    2.70
SYNCH.XES - UNLOCK REQUESTS    225.8M   10.5K 1358.14  455.30
ASYNCH.XES - CONVERTED LOCKS  3485.41    0.48    3.87    0.00
...
SUSPENDS - IRLM GLOBAL CONT  34192.00    1.59    0.21    0.07
SUSPENDS - XES GLOBAL CONT.   6076.00    0.28    0.04    0.01
SUSPENDS - FALSE CONTENTION  21575.00    1.00    0.13    0.04

Global Contention = SUSPENDS – IRLM + XES + FALSE (A)
XES IRLM Requests = (SYNCH.XES – LOCK + CHANGE + UNLOCK) + ASYNCH.XES – CONVERTED LOCKS + (SUSPENDS – IRLM + XES + FALSE) (B)
Global Contention Rate (%) = (A)/(B)*100
ROT: Global contention should be less than 3-5% of XES IRLM Requests

False Contention = SUSPENDS – FALSE (C)
False Contention Rate (%) = (C)/(B)*100
ROT: False contention should be less than 1-3% of XES IRLM Requests (see the sketch below)
65
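A minimal Python sketch of the two ROTs above (not from the original deck); it simply applies the slide's formulas to the counters shown. Note that the report's own GLOBAL/FALSE CONTENTION RATE lines are computed by the monitor over its own interval, so the sample result here is illustrative only.

# Minimal sketch of the global and false contention ROTs.
def contention_rates(sync_lock, sync_chg, sync_unlk, async_conv,
                     susp_irlm, susp_xes, susp_false):
    """Return (global %, false %) against total XES IRLM requests (B)."""
    suspends = susp_irlm + susp_xes + susp_false            # (A)
    b = sync_lock + sync_chg + sync_unlk + async_conv + suspends
    return 100.0 * suspends / b, 100.0 * susp_false / b

g, f = contention_rates(227_500_000, 1_340_700, 225_800_000, 3_485,
                        34_192, 6_076, 21_575)
print(f"Global contention = {g:.3f}% (ROT < 3-5%), "
      f"false contention = {f:.3f}% (ROT < 1-3%)")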
#IDUG
© 2014 IBM Corporation
Data Sharing Lock Tuning …
• Notes
• IRLM Cont. = IRLM resource contention
• XES Cont. = XES-level resource contention, as XES only understands S|X modes
• e.g. member 1 asking for IX and member 2 for IS
• Big relief with Locking Protocol 2, enabled after entry to V8 (NFM), but increased overhead for pagesets/partitions opened for RO on certain members, e.g.
• -START DATABASE(....) SPACENAM(...) ACCESS(RO)
• SQL LOCK TABLE statement
• LOCKSIZE TABLESPACE, or LOCKSIZE TABLE for segmented tablespaces
• Table scan with Repeatable Read isolation level
• Lock escalation occurs
• False Cont. = false contention on a lock table hash anchor point
• Can be minimised by increasing the number of lock entries in the CF lock table
• Note: the counter is maintained on a per-LPAR basis
• Will over-report false contention in cases where multiple members of the same data sharing group run on the same z/OS image
66
#IDUG
© 2014 IBM Corporation
Data Sharing Lock Tuning …
• P-lock contention and negotiation can cause IRLM latch contention, page latch contention, asynchronous GBP writes, active log writes, and GBP reads
• Page P-lock contention by one thread causes page latch contention for all other threads in the same member trying to get to the same page

Field Name Description
QTGSPPPE   PSET/PART P-LOCK NEGOTIATION
QTGSPGPE   PAGE P-LOCK NEGOTIATION
QTGSOTPE   OTHER P-LOCK NEGOTIATION

DATA SHARING LOCKING         QUANTITY /SECOND /THREAD /COMMIT
---------------------------  -------- ------- ------- -------
...
SYNCH.XES - LOCK REQUESTS      227.5M   10.6K 1368.75  458.86
SYNCH.XES - CHANGE REQUESTS   1340.7K   62.24    8.07    2.70
SYNCH.XES - UNLOCK REQUESTS    225.8M   10.5K 1358.14  455.30
ASYNCH.XES - CONVERTED LOCKS  1315.00    0.06    0.01    0.00
...
PSET/PART P-LCK NEGOTIATION  18037.00    0.84    0.11    0.04
PAGE P-LOCK NEGOTIATION       2863.00    0.13    0.02    0.01
OTHER P-LOCK NEGOTIATION      9067.00    0.42    0.05    0.02
P-LOCK CHANGE DURING NEG.    15991.00    0.74    0.10    0.03

P-lock Negotiation = P-LOCK NEGOTIATION – PSET/PART + PAGE + OTHER (A)
XES IRLM Requests = (SYNCH.XES – LOCK + CHANGE + UNLOCK) + ASYNCH.XES – CONVERTED LOCKS + (SUSPENDS – IRLM + XES + FALSE) (B)
P-lock Negotiation Rate (%) = (A)/(B)*100
ROT: P-lock negotiation should be less than 3-5% of XES IRLM requests (see the sketch below)
67
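A minimal Python sketch of the P-lock negotiation ROT (not from the original deck); the field names follow the report above. The denominator (B) reuses the SUSPENDS counters from the previous slide's report, so the result is illustrative only.

# Minimal sketch of the P-lock negotiation ROT.
def plock_negotiation_pct(qtgspppe, qtgspgpe, qtgsotpe, xes_irlm_requests):
    """(A)/(B)*100: P-lock negotiations as a % of XES IRLM requests."""
    negotiations = qtgspppe + qtgspgpe + qtgsotpe  # (A)
    return 100.0 * negotiations / xes_irlm_requests

# (B): SYNCH.XES lock/change/unlock + converted locks + global suspends
b = 227_500_000 + 1_340_700 + 225_800_000 + 1_315 + (34_192 + 6_076 + 21_575)
print(f"P-lock negotiation rate = "
      f"{plock_negotiation_pct(18_037, 2_863, 9_067, b):.3f}% (ROT < 3-5%)")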
#IDUG
© 2014 IBM Corporation
Data Sharing Lock Tuning …
• Notes
• Other P-lock negotiation covers
• Index tree P-locks (see LC06 contention in the Latch section)
• Castout P-locks
• SKCT/SKPT P-locks
• Breakdown by page P-lock type in the GBP statistics

GROUP TOTAL CONTINUED   QUANTITY /SECOND /THREAD /COMMIT
----------------------- -------- ------- ------- -------
...
PAGE P-LOCK LOCK REQ      877.4K  122.88   14.91    3.64
SPACE MAP PAGES         83552.00   11.70    1.42    0.35
DATA PAGES              10663.00    1.49    0.18    0.04
INDEX LEAF PAGES          783.2K  109.69   13.31    3.25
PAGE P-LOCK UNLOCK REQ    926.8K  129.80   15.75    3.84
PAGE P-LOCK LOCK SUSP    8967.00    1.26    0.15    0.04
SPACE MAP PAGES           593.00    0.08    0.01    0.00
DATA PAGES                143.00    0.02    0.00    0.00
INDEX LEAF PAGES         8231.00    1.15    0.14    0.03
PAGE P-LOCK LOCK NEG    10285.00    1.44    0.17    0.04
SPACE MAP PAGES             8.00    0.00    0.00    0.00
DATA PAGES                 10.00    0.00    0.00    0.00
INDEX LEAF PAGES        10267.00    1.44    0.17    0.04
68
#IDUG
© 2014 IBM Corporation
Data Sharing Lock Tuning …
• To reduce page P-lock and page latch contention on space map pages during heavy inserts into a GBP-dependent tablespace
• TRACKMOD NO
• When there is no use of incremental COPY, or the performance of incremental COPY is not important
• Avoids space map page updates for changed pages
• MEMBER CLUSTER
• Option available on classic partitioned tablespaces and UTS (PBG and PBR)
• Switching to classic partitioned will likely result in extra getpages for true varying-length rows and fixed-length compressed rows
• Increases the number of space map pages
• Only 199 data pages per space map page, instead of 10K per space map page for non-UTS
• Only 10 segments per space map page for UTS
• When MEMBER CLUSTER is used, APPEND + LOCKSIZE ROW has the potential to provide additional relief for insert-intensive workloads
• Reduced page P-lock and page latch contention on data pages
• Better space use
• Reduced working set of pages in the buffer pool and overall space use
69
#IDUG
© 2014 IBM Corporation
Data Sharing Lock Tuning …
• BUT…
• Do not use APPEND or LOCKSIZE ROW without MEMBER CLUSTER for an INSERT-at-the-end intensive workload
• May result in excessive page P-lock and page latch contention on data pages and space map pages
• Extra locking protocol that comes with taking page P-locks on data pages when using LOCKSIZE ROW
• Do not use LOCKSIZE ROW, with or without MEMBER CLUSTER, for a mixed INSERT/UPDATE/DELETE workload
• Consider using LOCKSIZE PAGE MAXROWS 1 to reduce page P-lock and page latch contention on data pages
70
#IDUG
© 2014 IBM Corporation
Data Sharing Lock Tuning …
• 50% of CLAIM_REQ can be used as a very rough estimate of the number of tablespace/partition locks acquired with effective thread reuse and RELEASE(COMMIT), or with no thread reuse
• For this very rough estimate, one claim request is assumed for the index and one for the tablespace/partition
• These tablespace/partition locks can be eliminated with effective thread reuse and RELEASE(DEALLOCATE)
71
#IDUG
© 2014 IBM Corporation
Internal DB2 Latch Contention
• Typical high-contention latch classes highlighted
• LC06 = index split latch
• LC14 = buffer pool LRU and hash chain latch
• LC19 = log latch
• LC24 = prefetch latch (or EDM LRU chain latch)
• Latch class details in IFCID 51/52 (shared) and 56/57 (exclusive) performance trace
• Disabling Accounting Class 3 trace can help reduce CPU time when high latch contention is observed

Field Name Description
QVLSLC01 to QVLSLC32   INTERNAL LATCH CONTENTION BY CLASS 1-32

LATCH CNT  /SECOND  /SECOND  /SECOND  /SECOND
---------  -------- -------- -------- --------
LC01-LC04      0.00     0.00     0.00     0.00
LC05-LC08      0.00    75.62     0.00     0.01
LC09-LC12      0.00     0.79     0.00     1.25
LC13-LC16      0.01   676.17     0.00     0.00
LC17-LC20      0.00     0.00   105.58     0.00
LC21-LC24      0.08     0.00     6.01  4327.87
LC25-LC28      4.18     0.00     0.02     0.00
LC29-LC32      0.00     0.20     0.57    25.46

ROT: Try to keep the latch contention rate < 1K-10K per second (see the sketch below)
72
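A minimal Python sketch (not from the original deck) that flattens the 8x4 LATCH CNT grid above into per-class rates and flags anything at or above the lower edge of the 1K-10K/sec ROT; the values are copied from the sample report.

# Flatten the report grid: row LCxx-LCyy holds four consecutive classes.
grid = [
    [0.00, 0.00, 0.00, 0.00],      # LC01-LC04
    [0.00, 75.62, 0.00, 0.01],     # LC05-LC08
    [0.00, 0.79, 0.00, 1.25],      # LC09-LC12
    [0.01, 676.17, 0.00, 0.00],    # LC13-LC16
    [0.00, 0.00, 105.58, 0.00],    # LC17-LC20
    [0.08, 0.00, 6.01, 4327.87],   # LC21-LC24
    [4.18, 0.00, 0.02, 0.00],      # LC25-LC28
    [0.00, 0.20, 0.57, 25.46],     # LC29-LC32
]

rates = {f"LC{4 * row + col + 1:02d}": value
         for row, cells in enumerate(grid)
         for col, value in enumerate(cells)}

for latch_class, rate in rates.items():
    if rate >= 1000:               # lower edge of the 1K-10K/sec ROT
        print(f"{latch_class}: {rate:.2f}/sec -- investigate")
# Here LC24 (prefetch / EDM LRU chain latch) trips the check at ~4328/sec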
#IDUG
© 2014 IBM Corporation
Internal DB2 Latch Contention …
• LC06 for the index tree P-lock taken by an index split
• Function code X'46' in the IFCID 57 performance trace
• Index splits are particularly painful in data sharing
• Each results in two forced physical log writes
• Index split time can be significantly reduced by using faster active log devices
• Options to reduce index splits
• Index freespace tuning for random insert
• Minimum index key size, especially for a unique index
• NOT PADDED index for large VARCHAR columns (V8)
• Large index page size (V9)
• Asymmetric leaf-page split (V9)
• Indication of the number of index splits in LEAFNEAR/FAR in SYSINDEXPART and RTS REORGLEAFNEAR/FAR
• V10: new IFCID 359 to monitor index splits
73
#IDUG
© 2014 IBM Corporation
Internal DB2 Latch Contention …
• LC14 Buffer Manager latch
• If many tablespaces and indexes, assign them to separate buffer pools with an even getpage frequency
• If objects are bigger than the buffer pool, try enlarging the buffer pool if possible
• If high LC14 contention, use a buffer pool with at least 3000 buffers
• Use the PGSTEAL=FIFO or NONE rather than the LRU buffer steal algorithm if there is no read I/O, i.e. the object(s) reside entirely in the buffer pool
• LRU = Least Recently Used buffer steal algorithm (default)
• FIFO = First In First Out buffer steal algorithm
• Eliminates the need to maintain the LRU chain, which in turn
• Reduces CPU time for LRU chain maintenance
• Reduces CPU time for LC14 contention processing
• V10: NONE = buffer pool used for "in-memory" objects
• Data is preloaded into the buffer pool at the time the object is physically opened and remains resident
• Eliminates the need to maintain the LRU chain
74
#IDUG
© 2014 IBM Corporation
Internal DB2 Latch Contention …
• LC19 Log latch
• Minimise the number of log records created via
• LOAD RESUME/REPLACE with LOG NO instead of massive INSERT/UPDATE/DELETE
• Segmented or UTS tablespace if mass delete occurs
• Increase the size of the output log buffer if the unavailable count is non-zero
• When unavailable, the first agent waits for the log write
• All subsequent agents wait for LC19
• Reduce the size of the output log buffer if output log buffer paging is non-zero
• See the Log Statistics section
• Reduced LC19 log latch contention in DB2 9
• Log latch not held while spinning for a unique LRSN
• Further improvements in DB2 10
• Time the log latch is held is minimised
• Conditional attempts to get the log latch before an unconditional request
75
#IDUG
© 2014 IBM Corporation
Internal DB2 Latch Contention …
• LC24 latch
• Used for multiple latch contention types
• (1) EDM LRU latch – FC X'18' in the IFCID 57 performance trace
• Use EDMBFIT zparm of NO >> ZPARM removed in V10
• Thread reuse with RELEASE(DEALLOCATE) instead of RELEASE(COMMIT) for frequently executed packages
• V10: moving CTs/PTs from the EDM pool to thread pools reduced LC24 significantly
• (2) Prefetch scheduling – FC X'38' in the IFCID 57 performance trace
• Higher contention possible with many concurrent prefetches
• Disable dynamic prefetch if unnecessary by setting VPSEQT=0 (V9)
• V10: PGSTEAL NONE for in-memory data/index buffer pools
• Use more partitions (one X'38' latch per data set)
76
#IDUG
© 2014 IBM Corporation
DB2 Virtual and Real Storage
• Common problems
• CTHREAD and/or MAXDBAT set too high
• No denial of service
• V10 before REBIND
• If the limit for EDMPOOL is hit → SQLCODE -904
• DBM1 full system contraction → potentially degrading performance
• DBM1 storage critical → individual DB2 threads may abend with 04E/RC=00E200xx
• DB2 crashes out
• Excessive paging
• LPAR crashes out
• Aggravated by
• Lack of balance across CICS/WAS and DB2
• Workload failover
• Abnormal slowdowns
• Workload moves
78
#IDUG
© 2014 IBM Corporation
DB2 Virtual and Real Storage …
• Common problems …
• Shortage of real storage
• Not good practice to configure REAL storage based on normal operating conditions and rely on DASD paging space to absorb the peaks
• Can lead to excessive paging and severe performance issues
• Ultimately, can take the LPAR out
• Once all AUX is consumed, the LPAR goes into a wait state
• Can lead to long DUMP processing times and cause major disruption
• DUMP should complete in seconds to make sure no performance problems ensue
• Once paging begins, it is possible to have DUMP processing take tens of minutes, with a high risk of system-wide or even sysplex-wide slowdowns
• Could be running 'one DUMP away from a disaster'
• Wasted opportunities for CPU reduction
• Reluctance to use bigger or more buffer pools
• Reluctance to use buffer pool long-term page fix
• Many performance opportunities in DB2 10 require real storage
79
#IDUG
© 2014 IBM Corporation
DB2 Virtual and Real Storage …
• Collect IFCID 225 data from DB2 start to DB2 shutdown
• Use the sample REXX program MEMU2 to pull the data via IFI
• Available on the DB2 for z/OS Exchange community website on IBM My developerWorks
• http://www.ibm.com/developerworks/software/exchange/db2zos
• Click on 'View and download examples' and search on the tag 'memu2'
• Use the new OMPE Spreadsheet Input Data Generator to post-process SMF data
• APAR PM73732 and OMPE V510 PTF UK90267 / OMPE V511 PTF UK90268
• Graph the basic IFCID 225 data elements to study the evolutionary trend
• Set CTHREAD and MAXDBAT to realistic values that can be supported
• Balance the design across the CICS AORs connecting to the DB2 subsystem
80
#IDUG
© 2014 IBM Corporation
DB2 Virtual and Real Storage …
• Track messages issued by the DB2 internal monitor
• Automatically issues console messages (DSNV508I) when DBM1 31-bit virtual storage crosses (increasing or decreasing) thresholds
• Threshold augmented in PM38435 to account for the storage cushion
• Identifies the agents that consume the most storage
• Can get status at any time using -DIS THREAD(*) TYPE(SYSTEM)

DSNV508I -SE20 DSNVMON - DB2 DBM1 BELOW-THE-BAR STORAGE NOTIFICATION
77% CONSUMED
76% CONSUMED BY DB2
352M AVAILABLE OUT OF REGION SIZE 1553M WITH A 274M STORAGE CUSHION

NAME ST A   REQ ID        AUTHID PLAN ASID TOKEN
VA1A N  *     0 002.VMON  01 SYSOPR    002A     0
V507-ACTIVE MONITOR, INTERVALS=8216, STG=77%, BOOSTS=0, HEALTH=100
     REGION=1553M, AVAIL=352M, CUSHION=274M
81
#IDUG
© 2014 IBM Corporation
DB2 DBM1 Virtual Storage
• 31-bit / 24-bit DB2 storage
• Getmained
• Variable
• Fixed storage
• Stack storage
• Non-DB2 getmained
• SMF
• Dataset / pageset

DBM1 AND MVS STORAGE BELOW 2 GB               QUANTITY
--------------------------------------------  --------
TOTAL DBM1 STORAGE BELOW 2 GB (MB)               62.44
TOTAL GETMAINED STORAGE (MB)                     10.48
TOTAL VARIABLE STORAGE (MB)                      26.63
TOTAL AGENT LOCAL STORAGE (MB)                   13.56
TOTAL AGENT SYSTEM STORAGE (MB)                  11.99
NUMBER OF PREFETCH ENGINES                      600.00
NUMBER OF DEFERRED WRITE ENGINES                300.00
NUMBER OF CASTOUT ENGINES                        65.00
NUMBER OF GBP WRITE ENGINES                      30.00
NUMBER OF P-LOCK/NOTIFY EXIT ENGINES             20.00
TOTAL AGENT NON-SYSTEM STORAGE (MB)               1.57
TOTAL NUMBER OF ACTIVE USER THREADS              40.53
NUMBER OF ALLIED THREADS                          2.73
NUMBER OF ACTIVE DBATS                            2.58
NUMBER OF POOLED DBATS                           35.22
NUMBER OF PARALLEL CHILD THREADS                  7.17
THREAD COPIES OF CACHED SQL STMTS (MB)            5.27
IN USE STORAGE (MB)                               0.01
HWM FOR ALLOCATED STATEMENTS (MB)                 0.06
THREAD COPIES OF STATIC SQL (MB)                  4.26
IN USE STORAGE (MB)                               0.01
THREAD PLAN AND PACKAGE STORAGE (MB)              0.00
BUFFER MANAGER STORAGE CNTL BLKS (MB)             1.75
TOTAL FIXED STORAGE (MB)                          0.40
TOTAL GETMAINED STACK STORAGE (MB)               24.92
TOTAL STACK STORAGE IN USE (MB)                  21.12
SYSTEM AGENT STACK STORAGE IN USE (MB)           19.65
STORAGE CUSHION (MB)                            306.09
82
#IDUG
© 2014 IBM Corporation
Non-DB2 Storage
• Not tracked by DB2
• Non-DB2 storage is high private storage

TOTAL DBM1 STORAGE = TOTAL GETMAINED STORAGE (QW0225GM)
                   + TOTAL GETMAINED STACK STORAGE (QW0225GS)
                   + TOTAL FIXED STORAGE (QW0225FX)
                   + TOTAL VARIABLE STORAGE (QW0225VR)
NON-DB2 STORAGE = MVS 31-BIT EXTENDED HIGH PRIVATE (QW0225EH) – TOTAL DB2 DBM1 STORAGE

• Usually used by MVS functions such as SMF
• The DETAIL parameter in SMFPRMxx can cause this storage to creep and become very large
• The big hit to DB2 in this area is DDNAME tracking: allocation does not realise that we have closed a page set and reallocated it again
• SMF Type 30 subtypes 4 and 5 will track all the DDNAMEs
• Most environments do not need SMF Type 30 subtypes 4 and 5
• Recommend NODETAIL
83
#IDUG
© 2014 IBM Corporation
DB2 DBM1 Virtual Storage …
• 64-bit DB2 storage
• Getmained
• Compression dictionaries
• DBD pool
• Dynamic statement cache
• Skeleton pool
• Fixed
• Variable
• Buffer pools
• Buffer control blocks
• Castout buffers
• 64-bit shared
• Getmained
• Fixed
• Variable
• Stack

DBM1 STORAGE ABOVE 2 GB                       QUANTITY
--------------------------------------------  --------
GETMAINED STORAGE (MB)                          777.12
COMPRESSION DICTIONARY (MB)                      54.96
IN USE EDM DBD POOL (MB)                          4.98
IN USE EDM STATEMENT POOL (MB)                  293.01
IN USE EDM SKELETON POOL (MB)                     6.27
FIXED STORAGE (MB)                              207.69
VARIABLE STORAGE (MB)                           111.61
STORAGE MANAGER CONTROL BLOCKS (MB)               2.34
VIRTUAL BUFFER POOLS (MB)                      2796.47
VIRTUAL POOL CONTROL BLOCKS (MB)                 87.64
CASTOUT BUFFERS (MB)                              8.13
SHARED GETMAINED STORAGE (MB)                     4.07
SHARED FIXED STORAGE (MB)                        19.45
RID POOL (MB)                                   198.00
SHARED VARIABLE STORAGE (MB)                    379.45
TOTAL AGENT LOCAL STORAGE (MB)                  190.33
TOTAL AGENT SYSTEM STORAGE (MB)                  88.75
TOTAL AGENT NON-SYSTEM STORAGE (MB)             101.58
DYNAMIC STMT CACHE CNTL BLKS (MB)               171.35
THREAD COPIES OF CACHED SQL STMTS (MB)             N/A
IN USE STORAGE (MB)                               0.28
STATEMENTS COUNT                                  3.32
HWM FOR ALLOCATED STATEMENTS (MB)                 2.18
STATEMENT COUNT AT HWM                           10.00
DATE AT HWM                                   04/15/13
TIME AT HWM                                19:46:55.60
THREAD PLAN AND PACKAGE STORAGE (MB)              0.54
SHARED STORAGE MANAGER CNTL BLKS (MB)            21.77
SHARED SYSTEM AGENT STACK STORAGE (MB)          512.00
STACK STORAGE IN USE (MB)                       280.17
SHARED NON-SYSTEM AGENT STACK STORAGE (MB)     1536.00
STACK STORAGE IN USE (MB)                        24.33
84
#IDUG
© 2014 IBM Corporation
DB2 DBM1 Virtual Storage …
• 31-bit and 64-bit common

COMMON STORAGE BELOW AND ABOVE 2 GB           QUANTITY
--------------------------------------------  --------
EXTENDED CSA SIZE (MB)                          400.09
FIXED POOL BELOW (MB)                             4.48
VARIABLE POOL BELOW (MB)                          0.94
GETMAINED BELOW (MB)                              0.08
FIXED POOL ABOVE (MB)                             5.97
VARIABLE POOL ABOVE (MB)                          4.00
GETMAINED ABOVE (MB)                              0.14
STORAGE MANAGER CONTROL BLOCKS ABOVE (MB)         1.34
85
#IDUG
© 2014 IBM Corporation
REAL Storage
• Real storage needs to be monitored as much as virtual storage
• Important subsystems such as DB2 should not be paging IN from AUX (DASD)
• Recommendation: keep page-in rates near zero
• Monitor using RMF Monitor III
• Monitor in DB2: page-ins for reads/writes and output log buffer paged in
• Collect IFCID 225 data from DB2 start to DB2 shutdown
• High auxiliary storage in use should be investigated
• Indicates that there are periods when the DB2 working set is pushed out to AUX (DASD)
• Need to understand what the driver is
• Shift in workload, with REAL frames stolen by overnight batch?
• REAL frames stolen by DB2 utilities?
• Dump capture?

REAL AND AUXILIARY STORAGE (V9)         QUANTITY
--------------------------------------- --------
REAL STORAGE IN USE (MB)                 5958.66
AUXILIARY STORAGE IN USE (MB)               0.00
86
#IDUG
© 2014 IBM Corporation
REAL Storage …
• V10: LPAR-level instrumentation for REAL/AUX storage use
• Warning: numbers reported are inaccurate if there is more than one DB2 subsystem on the LPAR, or if other subsystems use 64-bit shared
• Solution: DB2 APAR PM24723 and prereq z/OS APAR OA35885

REAL AND AUXILIARY STORAGE FOR DBM1           QUANTITY
--------------------------------------------  --------
REAL STORAGE IN USE (MB)                       2525.20
31 BIT IN USE (MB)                              258.09
64 BIT IN USE (MB)                             2267.10
64 BIT THREAD AND SYSTEM ONLY (MB)                0.00
HWM 64 BIT REAL STORAGE IN USE (MB)            2321.07
AVERAGE THREAD FOOTPRINT (MB)                     5.70
AUXILIARY STORAGE IN USE (MB)                     0.00
31 BIT IN USE (MB)                                0.00
64 BIT IN USE (MB)                                0.00
64 BIT THREAD AND SYSTEM ONLY (MB)                0.00
HWM 64 BIT AUX STORAGE IN USE (MB)                0.00

SUBSYSTEM SHARED STORAGE ABOVE 2 GB           QUANTITY
--------------------------------------------  --------
REAL STORAGE IN USE (MB)                          0.00
SHARED THREAD AND SYSTEM (MB)                     0.00
SHARED STACK STORAGE (MB)                         0.00
AUXILIARY STORAGE IN USE (MB)                     0.00
SHARED THREAD AND SYSTEM (MB)                     0.00
SHARED STACK STORAGE (MB)                         0.00

REAL AND AUXILIARY STORAGE FOR DIST           QUANTITY
--------------------------------------------  --------
REAL STORAGE IN USE (MB)                         49.87
31 BIT IN USE (MB)                               38.91
64 BIT IN USE (MB)                               10.96
64 BIT THREAD AND SYSTEM ONLY (MB)                0.00
HWM 64 BIT REAL STORAGE IN USE (MB)              10.97
AVERAGE DBAT FOOTPRINT (MB)                       3.46
AUXILIARY STORAGE IN USE (MB)                     0.00
31 BIT IN USE (MB)                                0.00
64 BIT IN USE (MB)                                0.00
64 BIT THREAD AND SYSTEM ONLY (MB)                0.00
HWM 64 BIT AUX STORAGE IN USE (MB)                0.00

MVS LPAR SHARED STORAGE ABOVE 2 GB            QUANTITY
--------------------------------------------  --------
SHARED MEMORY OBJECTS                             4.00
64 BIT SHARED STORAGE (MB)                   327680.00
HWM FOR 64 BIT SHARED STORAGE (MB)           327680.00
64 BIT SHARED STORAGE BACKED IN REAL (MB)      1777.25
AUX STORAGE USED FOR 64 BIT SHARED (MB)           0.00
64 BIT SHARED STORAGE PAGED IN FROM AUX (MB)      0.00
64 BIT SHARED STORAGE PAGED OUT TO AUX (MB)       0.00

COMMON STORAGE BELOW AND ABOVE 2 GB           QUANTITY
--------------------------------------------  --------
(...)
REAL STORAGE IN USE (MB)                          0.00
AUXILIARY STORAGE IN USE (MB)                     0.00
87
#IDUG
© 2014 IBM Corporation
Monitoring REAL/AUX storage usage …
[Chart: D0E1 – Total REAL storage in use (MB), stacked by: REAL in use for 64-bit common; 64-bit shared stack; 64-bit shared; ASID DIST 64-bit priv; ASID DIST 31-bit priv; ASID DBM1 BP; ASID DBM1 64-bit priv w/o BP; ASID DBM1 31-bit priv]
88
#IDUG
© 2014 IBM Corporation
Monitoring REAL/AUX storage usage …
[Chart: D0E1 – Total AUX storage in use (MB), stacked by the same breakdown: 64-bit common; 64-bit shared stack; 64-bit shared; ASID DIST 64-bit priv; ASID DIST 31-bit priv; ASID DBM1 BP; ASID DBM1 64-bit priv w/o BP; ASID DBM1 31-bit priv]
89
#IDUG
© 2014 IBM Corporation
Monitoring REAL/AUX storage usage …
[Chart: SYS1 – REAL storage available (MB), 2014-05-28 through 2014-06-04, with the MAXSPACE=10GB requirement marked]
90
#IDUG
© 2014 IBM Corporation
Monitoring REAL/AUX storage usage – Based on OMPE PDB
• Stacked AREA graph of REAL storage use – one for each DB2 member (one sheet per member)
(REAL_STORAGE_FRAME - A2GB_REAL_FRAME)*4/1024 AS DBM1_REAL_PRIV_31BIT_MB
(A2GB_REAL_FRAME - A2GB_REAL_FRAME_TS)*4/1024 AS DBM1_REAL_PRIV_64BIT_BP_MB
A2GB_REAL_FRAME_TS*4/1024 AS DBM1_REAL_PRIV_64BIT_XBP_MB
(DIST_REAL_FRAME - A2GB_DIST_REAL_FRM)*4/1024 AS DIST_REAL_PRIV_31BIT_MB
A2GB_DIST_REAL_FRM*4/1024 AS DIST_REAL_PRIV_64BIT_MB
A2GB_COMMON_REALF*4/1024 AS REAL_COM_64BIT_MB
A2GB_SHR_REALF_TS*4/1024 AS REAL_SHR_64BIT_MB
A2GB_SHR_REALF_STK*4/1024 AS REAL_SHR_STK_64BIT_MB
• Stacked AREA graph of AUX storage use – one for each DB2 member (one sheet per member)
(AUX_STORAGE_SLOT - A2GB_AUX_SLOT)*4/1024 AS DBM1_AUX_PRIV_31BIT_MB
(A2GB_AUX_SLOT - A2GB_AUX_SLOT_TS)*4/1024 AS DBM1_AUX_PRIV_64BIT_BP_MB
A2GB_AUX_SLOT_TS*4/1024 AS DBM1_AUX_PRIV_64BIT_XBP_MB
(DIST_AUX_SLOT - A2GB_DIST_AUX_SLOT)*4/1024 AS DIST_AUX_PRIV_31BIT_MB
A2GB_DIST_AUX_SLOT*4/1024 AS DIST_AUX_PRIV_64BIT_MB
A2GB_COMMON_AUXS*4/1024 AS AUX_COM_64BIT_MB
A2GB_SHR_AUXS_TS*4/1024 AS AUX_SHR_64BIT_MB
A2GB_SHR_AUXS_STK*4/1024 AS AUX_SHR_STK_64BIT_MB
• Line graph – one for each LPAR
QW0225_REALAVAIL*4/1024 AS REAL_AVAIL_LPAR_MB
91
#IDUG
© 2014 IBM Corporation
Monitoring REAL/AUX storage usage – Mapping for reference

IFCID FIELD           PM FIELD  PDB COLUMN NAME     MEMU2 Description
--------------------  --------  ------------------  ----------------------------------------------
QW0225RL              QW0225RL  REAL_STORAGE_FRAME  DBM1 REAL in use for 31 and 64-bit priv (MB)
QW0225AX              QW0225AX  AUX_STORAGE_SLOT    DBM1 AUX in use for 31 and 64-bit priv (MB)
QW0225HVPagesInReal   SW225VPR  A2GB_REAL_FRAME     DBM1 REAL in use for 64-bit priv (MB)
QW0225HVAuxSlots      SW225VAS  A2GB_AUX_SLOT       DBM1 AUX in use for 64-bit priv (MB)
QW0225PriStg_Real     SW225PSR  A2GB_REAL_FRAME_TS  DBM1 REAL in use for 64-bit priv w/o BP (MB)
QW0225PriStg_Aux      SW225PSA  A2GB_AUX_SLOT_TS    DBM1 AUX in use for 64-bit priv w/o BP (MB)
QW0225RL              QW0225RL  DIST_REAL_FRAME     DIST REAL in use for 31 and 64-bit priv (MB)
QW0225AX              QW0225AX  DIST_AUX_SLOT       DIST AUX in use for 31 and 64-bit priv (MB)
QW0225HVPagesInReal   SW225VPR  A2GB_DIST_REAL_FRM  DIST REAL in use for 64-bit priv (MB)
QW0225HVAuxSlots      SW225VAS  A2GB_DIST_AUX_SLOT  DIST AUX in use for 64-bit priv (MB)
QW0225ShrStg_Real     SW225SSR  A2GB_SHR_REALF_TS   REAL in use for 64-bit shared (MB)
QW0225ShrStg_Aux      SW225SSA  A2GB_SHR_AUXS_TS    AUX in use for 64-bit shared (MB)
QW0225ShrStkStg_Real  SW225KSR  A2GB_SHR_REALF_STK  REAL in use for 64-bit shared stack (MB)
QW0225ShrStkStg_Aux   SW225KSA  A2GB_SHR_AUXS_STK   AUX in use for 64-bit shared stack (MB)
QW0225ComStg_Real     SW225CSR  A2GB_COMMON_REALF   REAL in use for 64-bit common (MB)
QW0225ComStg_Aux      SW225CSA  A2GB_COMMON_AUXS    AUX in use for 64-bit common (MB)
QW0225_REALAVAIL      S225RLAV  QW0225_REALAVAIL    REALAVAIL (MB) (S)

Note: All REAL/AUX storage fields in IFCID 225 and the OMPE performance database are expressed in 4KB frames or slots – they should be converted to MB (the conversion is already done in MEMU2)
92
#IDUG
© 2014 IBM Corporation
REAL Storage …
• Make sure the LPAR has enough REAL storage
• Need to provision sufficient REAL storage to cover both the normal DB2 working set size and the MAXSPACE requirement
• Make sure MAXSPACE is set properly and defensively
• Should plan for approx. 16GB for MAXSPACE in V10/V11
• Fixing z/OS APARs available to handle and minimise DUMP size
• APARs OA39596, OA40856 and OA40015
• Under-sizing MAXSPACE will result in partial dumps
• Seriously compromises problem determination
• Dumps should be taken very quickly (<10 secs), almost without anyone noticing, i.e. little or no disruption to the subject LPAR and to the rest of the sysplex
• Excessive dump time caused by paging on the LPAR may cause massive sysplex-wide sympathy-sickness slowdowns
• Consider automation to kill dumps taking longer than 10 secs
93
#IDUG
© 2014 IBM Corporation
REAL Storage …
• V10: enable "contraction mode" to protect the system against excessive use of AUX
• Apply DB2 APAR PM24723 and prereq z/OS APAR OA35885
• Run with REALSTORAGE_MANAGEMENT=AUTO (default)
• When significant paging is detected, DB2 enters "contraction mode" to help protect the system
• "Unbacks" virtual pages so that a real frame or aux slot is not consumed for these pages
• Use automation to trap the DSNV516I (start) and DSNV517I (end) messages
• Control the use of storage by DFSORT
• Set EXPMAX down to what you can afford for DFSORT
• Set EXPOLD=0 to prevent DFSORT from taking "old" frames from other workloads
• Set EXPRES=% {enough for MAXSPACE}
94
#IDUG
© 2014 IBM Corporation
REAL Storage …
• Set z/OS parameter AUXMGMT=ON
• No new dumps are allowed when AUX storage utilization reaches 50%
• Current dump data capture stops when AUX storage utilization reaches 68%
• Once the limit is exceeded, new dumps will not be processed until the AUX storage utilization drops below 35%
• Consider specifying z/OS WLM STORAGE CRITICAL for the DB2 system address spaces
• To safeguard the rest of DB2
• Tells WLM not to page these address spaces
• Keeps the thread control blocks, EDM and other needed parts of DB2 in REAL
• Prevents the performance problem as the online day starts and DB2 has to be rapidly paged back in
95
#IDUG
© 2014 IBM Corporation
REAL Storage …
• How to limit REAL storage?
• V9: hidden ZPARM SPRMRSMX ('real storage kill switch')
• Delivered in APAR PK18354
• Not widely broadcast
• Prevents a runaway DB2 subsystem from taking the LPAR down
• Should be used when there is more than one DB2 subsystem running on the same LPAR
• Aim is to prevent multiple outages being caused by a single DB2 outage
• Should be set to 1.5x to 2x normal DB2 subsystem usage
• Kills the DB2 subsystem when the SPRMRSMX value is reached
• V10: SPRMRSMX becomes an opaque ZPARM, REALSTORAGE_MAX
• Need to factor in 64-bit shared and common use to establish the new footprint
• As DB2 approaches the REALSTORAGE_MAX threshold, "contraction mode" is also entered to help protect the system
96
#IDUG
Buffer Pools and Group Buffer Pools
97
#IDUG
© 2014 IBM Corporation
Pages Classification and LRU Processing
• Pages in a BP are classified as either random or sequential
• Pages read from DASD
• A page read in via synchronous I/O is classified as random
• A page read in via any form of prefetch I/O is classified as sequential
• Pages that already exist in the BP from previous work
• V8: sequential pages were reclassified as random when subsequently touched by a random getpage
• A random page was never reclassified as sequential
• V9/V10: no reclassification from sequential to random on a getpage
• Also see APAR PM70270 for buffer misses due to the lack of buffer reclassification
• V11: re-enables buffer reclassification from sequential to random based on a random getpage
98
#IDUG
© 2014 IBM Corporation
Pages Classification and LRU Processing …
• DB2 has a mechanism to prevent sequentially accessed data from monopolizing the BP and pushing out useful random pages
• Maintains two chains
• LRU with all pages (random and sequential)
• SLRU with only the sequential pages
• Steals from the LRU chain until VPSEQT is reached, and then steals preferentially from the SLRU chain
• General recommendations
• Set VPSEQT to 90-95 for the work file BP if used for sort, DGTTs and sparse indexes
• Set VPSEQT to 100 for the work file BP if used for sort only
• Set VPSEQT to 0 for data-in-memory BPs to avoid the overhead of scheduling prefetch engines when the data is already in the BP (or use PGSTEAL=NONE)
• Unless you have done a detailed study, leave VPSEQT at 80 (default)
• You may decide to set it lower (e.g. 40) to protect useful random pages
99
#IDUG
© 2014 IBM Corporation
Hit Ratios and Page Residency Time
• Hit ratio = percentage of times the page was found in the BP
• System HR = (Total getpages – Total pages read) / Total getpages * 100
• Application HR = (Total getpages – Synchronous reads) / Total getpages * 100
• Residency time = average time that a page is resident in the buffer pool
• System RT (seconds) = VPSIZE / Total pages read per second
• Where
• Total getpages = random getpages + sequential getpages
• Total pages read = synchronous reads for random getpages + synchronous reads for sequential getpages + pages read via sequential prefetch + pages read via list prefetch + pages read via dynamic prefetch
• Synchronous reads = synchronous reads for random getpages + synchronous reads for sequential getpages (see the sketch below)
100
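A minimal Python sketch of the formulas above (not from the original deck); all counter values are made up for illustration and assume one statistics interval of known length.

# Minimal sketch of the hit ratio and residency time formulas.
def bp_metrics(getpages, sync_reads, prefetch_pages, vpsize, interval_secs):
    """Return (system hit ratio %, application hit ratio %, residency secs)."""
    total_read = sync_reads + prefetch_pages
    system_hr = 100.0 * (getpages - total_read) / getpages
    appl_hr = 100.0 * (getpages - sync_reads) / getpages
    residency = vpsize / (total_read / interval_secs)
    return system_hr, appl_hr, residency

shr, ahr, rt = bp_metrics(getpages=1_000_000, sync_reads=20_000,
                          prefetch_pages=80_000, vpsize=50_000,
                          interval_secs=600)
print(f"System HR={shr:.1f}%, Appl HR={ahr:.1f}%, Residency={rt:.0f}s")
# -> System HR=90.0%, Appl HR=98.0%, Residency=300s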
#IDUG
© 2014 IBM Corporation
Deferred Writes
• VDWQT (vertical deferred write queue threshold) is based on the number of updated pages at the data set level (% of VPSIZE or number of buffers)
• DB2 schedules up to 128 pages for that data set, sorts them in sequence, and writes them out in at least 4 I/Os (the page distance limit of 180 pages applies)
• DWQT (horizontal deferred write queue threshold) is based on the number of unavailable pages (updated + in-use) at the BP level (% of VPSIZE)
• Write operations are scheduled for up to 128 pages per data set to decrease the number of unavailable buffers to 10% below DWQT
[Diagram: buffer pool (VPSIZE) with per-data set queues DS1-DS5; VDWQT applies to each data set, DWQT applies across the whole buffer pool]
101
#IDUG
© 2014 IBM Corporation
Deferred Writes …
• Setting VDWQT to 0 is good if the probability of re-writing the page is low
• DB2 waits for up to 40 changed pages for a 4K BP (24 for 8K, 16 for 16K, 12 for 32K) and writes out 32 pages for a 4K BP (16 for 8K, 8 for 16K, 4 for 32K)
• Can lead to more physical log writes, as log records must be written ahead of updated pages
• Setting VDWQT and DWQT to 90 is good for objects that reside entirely in the buffer pool and are updated frequently
• Watch out for possible buffer shortage conditions, especially with an uneven workload
• In other cases, set VDWQT and DWQT low enough to achieve a "trickle" write effect in between successive system checkpoints
• But… setting VDWQT and DWQT too low may result in poor write caching, writing the same page out many times, short deferred write I/Os, and increased DBM1 SRB CPU resource consumption
• If you want to set VDWQT in pages, do not specify anything below 128
102
#IDUG
© 2014 IBM Corporation
Critical Thresholds
• Sequential Prefetch Threshold (90%)
• Checked before and during prefetch
• 90% of buffers not available for steal, or running out of sequential buffers (VPSEQT with 80% default)
• >> Disables prefetch (PREF DISABLED – NO BUFFER)
• Data Management Threshold (95%)
• Checked at page read or update
• >> A getpage can be issued for each row sequentially scanned on the same page – potential large CPU time increase
• Immediate Write Threshold (97.5%)
• Checked at page update
• >> After update, synchronous write
103
#IDUG
© 2014 IBM Corporation
Critical Thresholds …
• If non-zero
• Increase the BP size, and/or
• Reduce the deferred write thresholds (VDWQT, DWQT), and/or
• Increase the system checkpoint frequency (reduce CHKFREQ)
• (*) Can be much higher if prefetch is intentionally disabled via VPSEQT=0 (V9) or if using PGSTEAL NONE (V10)
• Interesting option for data-in-memory BPs
• Avoids the overhead of scheduling prefetch engines when the data is already in the BP
• No problem in that case
• V9: consider using FIFO instead of LRU for PGSTEAL
ROT: Minimise PREF DISABLED – NO BUFFER (*); keep DATA MANAGEMENT THRESHOLD hits at 0
104
#IDUG
© 2014 IBM Corporation
Virtual and Real Storage Allocation for Buffer Pools in V9, V10, V11

      Page frame  Page fix  Virtual storage  Real storage
V9    4KB         NO        Entire BP        As needed
      4KB         YES       Entire BP        As needed
V10   4KB         NO        As needed        As needed
      4KB         YES       As needed        As needed
      1MB         YES       As needed        As needed
V11   4KB         NO        Entire BP        As needed
      4KB         YES       Entire BP        As needed
      1MB         YES       Entire BP        Entire BP
      2GB         YES       Entire BP        Entire BP

'As needed': incremental allocation up to the defined size | 'Entire BP': initial allocation to the defined size
105
#IDUG
© 2014 IBM Corporation
Buffer Pool Tuning
• Multiple buffer pools recommended
• Dynamic performance monitoring is much cheaper and easier than a performance trace
• DISPLAY BPOOL for online monitoring
• Dataset statistics via -DISPLAY BPOOL LSTATS (IFCID 199)
• Useful for access path monitoring
• Dynamic tuning
• Full exploitation of BP tuning parameters for customised tuning
• ALTER BPOOL is synchronous and effective immediately, except for buffer pool contraction, because of the wait for updated pages to be written out
• Prioritisation of buffer usage
• Reduced BP latch contention
• Minimum BPs
• Catalog/directory (4K, 8K, 16K, 32K), user index (4K), user data (4K), work files (4K and 32K)
• Additional BPs if the access pattern is known
• In-memory data, large NPIs, LOBs, etc.
106
#IDUG
© 2014 IBM Corporation
Long-Term Page Fix for BPs with Frequent I/Os
• DB2 BPs have always been strongly recommended to be backed 100% by real storage
• To avoid paging, which occurs even if only one buffer is short of real storage, because of the LRU buffer steal algorithm
• Given 100% real storage backing, you might as well page fix each buffer just once to avoid the repetitive cost of page fix and free for each and every I/O
• New option: ALTER BPOOL(name) PGFIX(YES|NO)
• Requires the BP to go through reallocation before it becomes operative
• Means a DB2 restart for BP0
• Up to 8% reduction in overall IRWW transaction CPU time
ROT: In a steady state, PAGE-IN for READ / WRITE should be < 1% of pages read / written
107
#IDUG
© 2014 IBM Corporation
Long-Term Page Fix for BPs with Frequent I/Os …
• Recommended for BPs with high I/O intensity
• I/O intensity = [pages read + pages written] / [number of buffers]
• Relative values across all BPs (see the sketch below)

BPID   VPSIZE  Sync Read  SPF Read  LPF Read  DPF Read  Read Total  Written  I/O Intensity
BP0     40000       2960         0         0      6174        9134      107            0.2
BP1    110000      12411      5185         0      1855       19451     6719            0.2
BP2    110000      40482     19833     11256      9380       80951     5763            0.8
BP3     75000      23972      6065         0     14828       44865     7136            0.7
BP4     80000      22873     45933      3926     50261      122993     3713            1.6
BP5    200000          0         0         0         0           0        0            0.0
BP8K0   32000          9        11         0        11          31      169            0.0
BP32K    2000        693       873         0      6415        7981       38            4.0

In this example: the best candidates would be BP32K, BP4, BP2, BP3
No benefit for BP5 (data in-memory)
108
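A minimal Python sketch (not from the original deck) that ranks buffer pools by I/O intensity to pick PGFIX(YES) candidates, using the figures from the table above.

pools = {  # BPID: (vpsize, total pages read, pages written)
    "BP0": (40_000, 9_134, 107), "BP1": (110_000, 19_451, 6_719),
    "BP2": (110_000, 80_951, 5_763), "BP3": (75_000, 44_865, 7_136),
    "BP4": (80_000, 122_993, 3_713), "BP5": (200_000, 0, 0),
    "BP8K0": (32_000, 31, 169), "BP32K": (2_000, 7_981, 38),
}

def io_intensity(vpsize, pages_read, pages_written):
    """[pages read + pages written] / [number of buffers]."""
    return (pages_read + pages_written) / vpsize

ranked = sorted(pools.items(),
                key=lambda kv: io_intensity(*kv[1]), reverse=True)
for bpid, stats in ranked:
    print(f"{bpid:6s} I/O intensity = {io_intensity(*stats):.1f}")
# The top of the list (BP32K, BP4, BP2, BP3) matches the slide's candidates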
#IDUG
© 2014 IBM Corporation
1MB Real Storage Page Frames on z10, z196 and zEC12
• Potential for reduced CPU through fewer TLB misses
• Buffer pools must be defined with PGFIX=YES to use 1MB page frames
• Must have sufficient total real storage to fully back the total DB2 requirement
• Involves partitioning real storage into 4KB and 1MB page frames
• Specified by LFAREA xx% or n in the IEASYSnn parmlib member
• Only changeable by IPL
• If 1MB page frames are overcommitted, DB2 will use 4KB page frames
• Recommendation: add 5-10% to the size to allow for some growth and tuning
• Must have both enough 4KB and enough 1MB page frames
• Do not use 1MB real storage frames until running smoothly on V10
• Make sure any critical z/OS maintenance is applied before using 1MB real storage page frames
109
#IDUG
© 2014 IBM Corporation
1MB Real Storage Page Frames on z10, z196 and zEC12 …
• Useful commands
• DB2: -DISPLAY BUFFERPOOL(BP1) SERVICE=4
• Useful to find out how many 1MB page frames are being used
• Especially useful when running multiple DB2 subsystems on the same LPAR
• See the DSNB999I message
• MVS: DISPLAY VIRTSTOR,LFAREA
• Shows total LFAREA, the allocation split across 4KB and 1MB frames, and what is available
• See the IAR019I message
110
#IDUG
© 2014 IBM Corporation
Group Buffer Pool Tuning
• Two reasons for cross invalidations
• Perfectly normal condition in an active-active data sharing environment
• Directory entry reclaims – a condition you want to tune away from
• CPU and I/O overhead if there are not enough directory entries
• Extra CF accesses and sync I/O reads
• -DISPLAY GROUPBUFFERPOOL(*) GDETAIL(INTERVAL)

DSNB787I - RECLAIMS
  FOR DIRECTORY ENTRIES = 0
  FOR DATA ENTRIES = 0
  CASTOUTS = 0
DSNB788I - CROSS INVALIDATIONS
  DUE TO DIRECTORY RECLAIMS = 0
  DUE TO WRITES = 0
  EXPLICIT = 0

ROT: Cross invalidations due to directory reclaims should be zero
111
#IDUG
© 2014 IBM Corporation
Group Buffer Pool Tuning …
• Recommend use of XES Auto Alter (autonomic)
• Tries to avoid the structure full condition
• Tries to avoid the directory entry reclaim condition
• CFRM policy
• ALLOWAUTOALT(YES)
• Set MINSIZE to INITSIZE
• Set FULLTHRESHOLD = 80-90%
• Set SIZE to 1.3-2x INITSIZE
• Periodically review and update the CFRM policy based on the actual allocation and ratio
• Especially when the allocated size reaches SIZE
112
#IDUG
© 2014 IBM Corporation
GBP Read Tuning
• Local BP search -> GBP search -> DASD I/O
• SYN.READ(NF) = local buffer pool miss
• SYN.READ(XI) = local buffer pool hit, but the buffer was cross-invalidated
• Most data should be found in the GBP >> if not, the GBP may be too small, or pages have been removed because of directory entry reclaims

Field Name Description
QBGLXD     SYN.READ(XI)-DATA
QBGLXR     SYN.READ(XI)-NO DATA
QBGLMD     SYN.READ(NF)-DATA
QBGLMR     SYN.READ(NF)-NO DATA

GROUP BP14                   QUANTITY /SECOND /THREAD /COMMIT
---------------------------- -------- ------- ------- -------
...
SYN.READ(XI)-DATA RETURNED    1932.00    0.09    0.01    0.00
SYN.READ(XI)-NO DATA RETURN   39281.6K 1823.66  236.31   79.22
SYN.READ(NF)-DATA RETURNED   22837.00    1.06    0.14    0.05
SYN.READ(NF)-NO DATA RETURN   6955.8K  322.93   41.85   14.03

TOTAL SYN.READ(XI) = SYN.READ(XI)-DATA RETURNED + SYN.READ(XI)-NO DATA RETURN
Sync.Read(XI) miss ratio = SYN.READ(XI)-NO DATA RETURN / TOTAL SYN.READ(XI)
ROT: Sync.Read(XI) miss ratio should be < 10% (see the sketch below)
113
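A minimal Python sketch of the miss ratio check (not from the original deck); the field names follow the report above, and the sample values are copied from it (and in this sample they badly violate the ROT, which is the point of the check).

# Minimal sketch of the Sync.Read(XI) miss ratio ROT.
def xi_miss_ratio_pct(qbglxd: int, qbglxr: int) -> float:
    """XI reads that found no data in the GBP, as a % of all XI reads."""
    total_xi = qbglxd + qbglxr
    return 100.0 * qbglxr / total_xi if total_xi else 0.0

pct = xi_miss_ratio_pct(qbglxd=1_932, qbglxr=39_281_600)
if pct >= 10:
    print(f"Sync.Read(XI) miss ratio = {pct:.1f}% -> GBP may be too small "
          "or directory entries are being reclaimed")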
#IDUG
© 2014 IBM Corporation
GBP Read Tuning …
• GBPCACHE CHANGED – default and general recommendation
• CHANGED should only be used for LOBs if the LOBs are logged
• GBPCACHE SYSTEM – default for LOBs prior to V9
• The V9 default is CHANGED
• SYSTEM should be the recommended design default for LOB tablespaces
• GBPCACHE NONE
• GBP used only for cross invalidation
• Reported as explicit cross invalidation in the GBP statistics
• Saves data transfer on GBP write and GBP access for castout
• But updated pages must be written to DASD at or prior to commit
• Set VDWQT=0 to promote continuous deferred write
• Minimises response time elongation at commit
• Useful when an updated page is rarely re-accessed, e.g.
• Batch update sequentially processing a large table
• GBP read/write ratio < 1%
• Primarily to reduce CF usage
114
#IDUG
© 2014 IBM Corporation
GBP Write Tuning
• Pages sync written to GBP – forced write at
• Commit
• P-lock negotiation
• Pages async written to GBP
• Deferred write
• System checkpoint
• On a GBP write, the corresponding log record must be written first

Field Name Description
QBGLWS     WRITE AND REGISTER
QBGLWM     WRITE AND REGISTER MULT
QBGLSW     CHANGED PGS SYNC.WRTN
QBGLAW     CHANGED PGS ASYNC.WRTN

GROUP BP14 CONTINUED    QUANTITY /SECOND /THREAD /COMMIT
----------------------- -------- ------- ------- -------
WRITE AND REGISTER      54896.00    2.55    0.33    0.11
WRITE AND REGISTER MULT   255.5K   11.86    1.54    0.52
CHANGED PGS SYNC.WRTN     408.3K   18.96    2.46    0.82
CHANGED PGS ASYNC.WRTN   1713.4K   79.55   10.31    3.46
115
#IDUG
© 2014 IBM Corporation
GBP Write Tuning …
• Pages Castout / Unlock Castout is a good indicator of castout efficiency (see the sketch below)
• GBP castout thresholds are similar to the local BP deferred write thresholds
• Encourage the class castout (CLASST) threshold by lowering its value
• More efficient than the GBP castout threshold (notify to the pageset/partition castout owner)
• CLASST threshold is checked at GBP write
• GBPOOLT threshold is checked by the GBP castout timer (10 sec default)
• Default thresholds lowered in V8
• V11: class castout threshold can go below 1%

Field Name Description
QBGLRC     PAGES CASTOUT
QBGLUN     UNLOCK CASTOUT
QBGLCT     CLASST THRESHOLD
QBGLGT     GBPOOLT THRESHOLD
QBGLCK     GBP CHECKPOINTS

GROUP BP14                   QUANTITY /SECOND /THREAD /COMMIT
---------------------------- -------- ------- ------- -------
PAGES CASTOUT                 2224.8K  103.28   13.38    4.49
UNLOCK CASTOUT               58868.00    2.73    0.35    0.12
...
CASTOUT CLASS THRESHOLD      26835.00    1.25    0.16    0.05
GROUP BP CASTOUT THRESHOLD     594.00    0.03    0.00    0.00
GBP CHECKPOINTS TRIGGERED       45.00    0.00    0.00    0.00

                            V7    V8/V9/V10
VDWQT (dataset level)       10%   5%
DWQT (buffer pool level)    50%   30%
CLASST (Class_castout)      10%   5%
GBPOOLT (GBP_castout)       50%   30%
GBPCHKPT (GBP checkpoint)   8     4
116
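A minimal Python sketch of the castout efficiency indicator (not from the original deck); a higher value means more pages written per castout cycle. The field names follow the report above and the sample values are copied from it.

# Minimal sketch: pages castout per unlock castout.
def castout_efficiency(qbglrc: int, qbglun: int) -> float:
    """PAGES CASTOUT per UNLOCK CASTOUT."""
    return qbglrc / qbglun if qbglun else 0.0

print(f"Pages castout per unlock castout = "
      f"{castout_efficiency(2_224_800, 58_868):.1f}")   # ~37.8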
#IDUG
© 2014 IBM Corporation
GBP Write Tuning …
• WRITE FAILED-NO STORAGE
• GBP is full and there is no stealable data page
• Potential outage if pages are added to the LPL
• Keep write failures < 1% of pages written by
• Using a bigger GBP, and/or
• Smaller castout thresholds, and/or
• A smaller GBP checkpoint timer

Field Name Description
QBGLWF     WRITE FAILED-NO STORAGE

GROUP BP14                   QUANTITY /SECOND /THREAD /COMMIT
---------------------------- -------- ------- ------- -------
CASTOUT ENGINE NOT AVAIL.         N/A     N/A     N/A     N/A
WRITE ENGINE NOT AVAILABLE        N/A     N/A     N/A     N/A
READ FAILED-NO STORAGE            N/A     N/A     N/A     N/A
WRITE FAILED-NO STORAGE          0.00    0.00    0.00    0.00

ROT: WRITE FAILED-NO STORAGE < 1% of TOTAL CHANGED PGS WRTN
117
#IDUG
© 2014 IBM Corporation
Sort Phases
• Number of logical workfiles (runs) required for a given sort, with sufficient sort pool size =
• 1 if the rows are already in sort key sequence
• #rows_sorted / 64000 if the rows are in random sort key sequence
• #rows_sorted / 32000 if the rows are in reverse sort key sequence
• Number of merge passes (see the sketch below) =
• 0 if #runs = 1
• 1 if #runs < max #workfiles in the merge, determined by BP size x VPSEQT
• 2 otherwise
[Diagram: input phase builds the sort tree in the sort pool (up to 64M, one per thread) and writes runs to logical workfiles (LWF) in the buffer pool; the sort phase merges the workfiles and produces the sorted rows]
119
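A minimal Python sketch of the run and merge-pass estimate above (not from the original deck); max_merge_workfiles stands in for the workfile limit derived from BP size x VPSEQT and is an assumed input here.

import math

def estimate_runs(rows: int, order: str) -> int:
    """Estimated number of runs, per the divisors on the slide."""
    if order == "sorted":
        return 1
    divisor = 64_000 if order == "random" else 32_000   # "reverse"
    return max(1, math.ceil(rows / divisor))

def estimate_merge_passes(runs: int, max_merge_workfiles: int) -> int:
    if runs == 1:
        return 0
    return 1 if runs < max_merge_workfiles else 2

runs = estimate_runs(rows=5_000_000, order="random")     # ~79 runs
print(runs, estimate_merge_passes(runs, max_merge_workfiles=128))  # 79 1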
#IDUG
© 2014 IBM Corporation
Sort Indicators
• MERGE PASSES DEGRADED = number of times merge pass > 1 because workfile requests were rejected because the buffer pool is too small
• WORKFILE REQUESTS REJECTED = number of workfile requests rejected because the buffer pool is too small to support the concurrent sort activity
• WORKFILE PREFETCH NOT SCHEDULED = number of times sequential prefetch was not scheduled for a workfile because of a buffer pool shortage

Field Name Description
QBSTWFF    MERGE PASS DEGRADED
QBSTWFD    WORKFILE REQ. REJECTED
QBSTWKPD   WORKFILE PRF NOT SCHEDULED

BP7 SORT/MERGE              QUANTITY /SECOND /THREAD /COMMIT
--------------------------- -------- ------- ------- -------
MAX WORKFILES CONCURR. USED     8.22     N/A     N/A     N/A
MERGE PASSES REQUESTED       7304.00    0.34    0.04    0.02
MERGE PASS DEGRADED-LOW BUF     0.00    0.00    0.00    0.00
WORKFILE REQ.REJCTD-LOW BUF     0.00    0.00    0.00    0.00
WORKFILE REQ-ALL MERGE PASS 14610.00    0.68    0.09    0.03
WORKFILE NOT CREATED-NO BUF     0.00    0.00    0.00    0.00
WORKFILE PRF NOT SCHEDULED      0.00    0.00    0.00    0.00

ROT: Workfile Req. Rejected < 1 to 5% of Workfile Req. All Merge Passes (1)
ROT: Merge Passes Degraded < 1 to 5% of Merge Passes Requested (1)
ROT: Workfile Prefetch Not Scheduled < 1 to 5% of Seq Prefetch Reads (1)
(1) ROTs assume VDWQT = 10 and DWQT = 50
120
#IDUG
© 2014 IBM Corporation
Sort Indicators …
• SYNCHRONOUS READS >> buffer pool shortage or too few physical workfiles
• Prefetch quantity = PRF.PAGES READ / PRF.READS

BP7 READ OPERATIONS         QUANTITY /SECOND /THREAD /COMMIT
--------------------------- -------- ------- ------- -------
SYNCHRONOUS READS            5703.00    0.26    0.03    0.01
SYNCHRON. READS-SEQUENTIAL      8.00    0.00    0.00    0.00
SYNCHRON. READS-RANDOM       5695.00    0.26    0.03    0.01
SEQUENTIAL PREFETCH REQUEST 88102.00    4.09    0.53    0.18
SEQUENTIAL PREFETCH READS     124.00    0.01    0.00    0.00
PAGES READ VIA SEQ.PREFETCH   941.00    0.04    0.01    0.00
S.PRF.PAGES READ/S.PRF.READ     7.59
...
DYNAMIC PREFETCH REQUESTED    693.9K   32.22    4.17    1.40
DYNAMIC PREFETCH READS        164.00    0.01    0.00    0.00
PAGES READ VIA DYN.PREFETCH  2290.00    0.11    0.01    0.00
D.PRF.PAGES READ/D.PRF.READ    13.96
PREF.DISABLED-NO BUFFER         0.00    0.00    0.00    0.00
PREF.DISABLED-NO READ ENG       0.00    0.00    0.00    0.00
PAGE-INS REQUIRED FOR READ      0.00    0.00    0.00    0.00

ROT: Sync reads < 1 to 5% of pages read by prefetch (1)
ROT: If prefetch quantity < 4, increase the buffer pool size (1)
(1) ROTs assume VDWQT = 10 and DWQT = 50
121
#IDUG
© 2014 IBM Corporation
RDS Sort Recommendations
• Use dedicated 4K and 32K buffer pools for the sort workfiles
• Set VPSEQT=100 (or 90-95 if also used for DGTTs and sparse indexes)
• Two possible strategies
• Tuning for optimal performance
• Set write thresholds very high, e.g. DWQT=90 and VDWQT=70
• Design objective: complete the sort in memory and avoid I/O to the sort workfiles
• This is OK for a consistent number of small sorts
• But increased risk of hitting the buffer pool critical thresholds
• Adversely affecting application performance and threatening system stability
• Tuning for robust, defensive performance
• Set write thresholds to DWQT=50 and VDWQT=10
• Design objective: protect the system against an unexpected surge of sort activity
• Many parallel sorts or a few very large sorts
• Best strategy if the amount of sort activity fluctuates within and across members
122
#IDUG
© 2014 IBM Corporation
RDS Sort Recommendations …
• Provide multiple physical workfiles placed on different DASD volumes
• Sort workfile placement example
• 4-way data sharing group
• Assume 24 volumes are available
• Each member should have 24 workfile tablespaces on separate volumes
• All members should share all 24 volumes (i.e. 4 workfiles on each volume)
• Sort pool size
• ZPARM SRTPOOL determines the maximum size of the sort work area allocated for each concurrent sort user (i.e. on a per-thread basis)
• Most of the sort pool is above the 2GB bar
• The default (10MB in V10) is well suited for most operational workloads
• For more detailed tuning, use IFCID 96
123
#IDUG
© 2014 IBM Corporation
Temporary Space since DB2 9 for z/OS (CM)
• Declared global temporary tables and static scrollable cursors now use the WORKFILE database instead of the TEMP database
• Heavier use of the 32K buffer pool and workfiles to help performance (CM)
• If the workfile record length is less than 100 bytes, a 4KB page tablespace is used
• In all other cases, DB2 tries to use a 32KB page tablespace
• If no 32KB page TS is available, DB2 uses a 4KB page TS – lost performance benefit
• Recommendations
• Assign a bigger 32KB page workfile buffer pool
• Allocate bigger/more 32KB page workfile tablespaces
[Diagram: the WORKFILE database now holds work files, created global temporary tables, declared global temporary tables, and declared temporary tables for static scrollable cursors]
124
#IDUG
© 2014 IBM Corporation
Temporary Space since DB2 9 for z/OS (CM) …
• New instrumentation to monitor the usage of temp space
• IFCID 2 updated with new counters
• Including the number of times a 4KB page tablespace was used when a 32KB page tablespace was preferable (but not available)

Field Name Description
QISTWFCU   Current total storage used, in MB
QISTWFMU   Maximum space ever used, in MB
QISTWFMX   Max. allowable storage limit (MAXTEMPS) for an agent, in MB
QISTWFNE   # of times the max. allowable storage limit per agent was exceeded
QISTWF04   Current total 4KB page tablespace storage used, in MB
QISTWF32   Current total 32KB page tablespace storage used, in MB
QISTW04K   Current total 4KB page tablespace storage used, in KB
QISTW32K   Current total 32KB page tablespace storage used, in KB
QISTWFP1   # of times a 32KB page tablespace was used when a 4KB page tablespace was preferable (but not available)
QISTWFP2   # of times a 4KB page tablespace was used when a 32KB page tablespace was preferable (but not available)

WORK FILE DATABASE          QUANTITY
--------------------------- --------
MAX TOTAL STORAGE (KB)          5120
CURRENT TOTAL STORAGE (KB)      1600
STORAGE IN 4K TS (KB)             64
STORAGE IN 32K TS (KB)          1536
4K USED INSTEAD OF 32K TS          0
32K USED INSTEAD OF 4K TS          0
AGENT MAX STORAGE (KB)             0
NUMBER OF MAX EXCEEDED             0
#IDUG
© 2014 IBM Corporation
Temporary Space since DB2 9 for z/OS (CM) …
• APAR PK70060 introduces a soft barrier between DGTT and non-DGTT space use in the workfile database
• Recommendation: define some tablespaces with zero secondary quantity, and some [PBG tablespaces or DB2-managed classic segmented tablespaces] with non-zero secondary quantity
• For the DB2-managed workfile tablespaces with zero secondary quantity, make sure the primary quantity is a little less than 2GB to avoid DB2 allocating a new piece (A002)
• For PBG tablespaces, consider setting MAXPARTS=1 to avoid extension
• APAR PM02528 provides the choice to make it a hard barrier: zparm WFDBSEP=NO|YES
[Diagram: work files go to WORKFILE tablespaces with SECQTY 0; created/declared global temporary tables and declared temporary tables for static scrollable cursors go to WORKFILE tablespaces that are PBG or have SECQTY > 0]
126
#IDUG
© 2014 IBM Corporation
Temporary Space since DB2 9 for z/OS (CM) …
• V11: new ZPARMs to monitor storage usage by work files
• WFSTGUSE_AGENT_THRESHOLD
• % of available space in the work file database that can be consumed by a single agent before a warning message
• Message DSNI052I is issued once per commit scope
• Default=0 – no warning
• WFSTGUSE_SYSTEM_THRESHOLD
• % of available space in the work file database that can be consumed by all agents before a warning message
• Message DSNI053I is issued at 5-minute intervals if the condition persists
• Default=90
• V11: new statistics indicators
• QISTDGTTCTO = total system storage for DGTTs in the work file database
• QISTWFCTO = total system storage for non-DGTT work files in the work file database
127