cmg_las

Our Experiences Implementing Goals
By Online Transaction Response Time
CMG
Session #131
Dec 2004
Len Jejer
Ray Smith
The Hartford Financial Services
Group
Workload Manager

Trademarks and Disclaimers
 CICS and DB2 are registered trademarks of IBM Corporation in the US and
other countries.
 RMF, WLM, zOS, IMS are trademarks of IBM Corporation in the US and
other countries.
 SAS is a registered trademark of The SAS Institute, Inc. in the US and other
countries.
 MXG is a trademark of Barry Merrill in the US and other countries.
 Use of and references to products in this presentation is not intended to be
a product endorsement or recommendation of that product by The Hartford
Financial Services Group or employees of The Hartford Financial Services
Group.

References And
Acknowledgements
 http://www-1.ibm.com/support/techdocs/atsmastr.nsf/Web/Techdocs
 http://www.redbooks.ibm.com
 SG24-5326-00 – WLM Redbook
 SG24-6404-00 – IMS V7 Perf Monitoring and Tuning
 SG24-4693-01 – DB2 Stored Procedures
 IBM Manuals
 MVS Planning: Workload Management
 RMF Suite
 A special thanks goes to the RMF/WLM ETR Q&A support team for their
contributions to this project
 Thanks to Peter Enrico for information we got from his handouts and classes
 Thanks also go to the RMF, WLM, IMS, and CICS support folks at IBM

Workload Manager Level-Set
 WLM manages the performance of the
workloads in a zOS environment
 Works toward optimizing resource utilization and
meeting workload goals

Workload Manager Level-Set
 Manages CPU, I/O and storage
 CPU is managed using dispatching priority and rate of
consumption
 Memory is managed by using dynamic storage
isolation
 I/O is managed using I/O priority or CPU dispatch
priority

Workload Manager Concepts
 WLM manages workloads according to goals
defined
 Velocity goals
 Response time goals
 WLM uses importance to prioritize goal
achievement

WLM Importance
 Importance is WLM’s “search order”
 Higher importance work tends to receive resources
 Lower importance work tends to donate resources
 We saw that discretionary work can get cycles from
higher importance work when that work is
exceeding its goals

WLM Importance
 SYSTEM and SYSSTC highest, but no storage
isolation
 Importance 1-5
 Discretionary (MTTW algorithm)

WLM Goals
 Velocity Goals
 Measure of acceptable delay
 (using)/(using + waiting)
 Average Response Time Goals
 Percentile Response Time Goals
 Less likely to be influenced by outliers
 “Bucket-ize” response time distribution
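As a rough illustration of the goal types above, the sketch below computes a velocity from using and waiting sample counts and compares an average with a percentile view of the same response times. The function names and sample numbers are made up for illustration; WLM derives these values internally from its own samples.

```python
def velocity(using, waiting):
    """Velocity as a percentage: using / (using + waiting) * 100."""
    total = using + waiting
    return 100.0 * using / total if total else 0.0


def percent_within(response_times, goal_seconds):
    """Percentage of transactions that completed within goal_seconds."""
    if not response_times:
        return 0.0
    within = sum(1 for rt in response_times if rt <= goal_seconds)
    return 100.0 * within / len(response_times)


def average(response_times):
    """Plain arithmetic mean of the response times."""
    return sum(response_times) / len(response_times)


# Hypothetical sample: 98 fast transactions and 2 long-running outliers.
times = [0.1] * 98 + [5.0] * 2
print(velocity(30, 70))
print(percent_within(times, 0.5))
print(average(times))
```

In this made-up sample the two 5-second outliers pull the average to roughly 0.2 seconds, while 98% of the transactions still finish within 0.5 seconds, which is why a percentile goal is less sensitive to outliers.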

WLM Transaction Sampling
 WLM Samples Performance Blocks (PB)
 Sampled once every .250 seconds
 One adjustment made every 10 seconds
 Adjusted work left alone for a minute or so

WLM Transaction Sampling
 In CICS, PB’s are predefined
 #PB = MAXTASKS
 Keep MAXTASKS within reason to avoid excessive
WLM sampling and overhead
 In IMS, PB’s are dynamic
 #PB = # of MPR
 PB’s come and go with MPR’s
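To see why MAXTASKS matters, here is a back-of-the-envelope sketch that takes the figures on these slides at face value: one sample per PB every .250 seconds and one adjustment every 10 seconds. The constant and function names are illustrative only, and the MAXTASKS values are hypothetical.

```python
SAMPLE_INTERVAL_SECONDS = 0.25   # from the slide: sampled every .250 seconds
ADJUST_INTERVAL_SECONDS = 10     # from the slide: one adjustment every 10 seconds


def pb_samples_per_adjustment(num_pbs):
    """PB samples taken for one region between two adjustments."""
    samples_per_pb = int(ADJUST_INTERVAL_SECONDS / SAMPLE_INTERVAL_SECONDS)
    return num_pbs * samples_per_pb


# Hypothetical MAXTASKS settings for a CICS region (#PB = MAXTASKS).
for maxtasks in (100, 500):
    print(maxtasks, pb_samples_per_adjustment(maxtasks))
```

Under these assumptions each PB is sampled 40 times per adjustment interval, so the sampling work grows linearly with MAXTASKS.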

WLM Transaction Monitoring
[Diagram: VTAM, TCPIP, CICS TR, CICS AR1, Classify, DB2/IMS, Report]

The CICS Challenge
• 1.5 million transactions/day during month-end processing
• Processor at 100% utilization

Environment
 OS/390 2.10
 Over-committed 8-way processor running 2
LPAR’s
 CICS V4
 Two major production regions and four
ancillary regions

Impeders
 Month-end batch
 10 jobs each taking 5 hours CPU time running 5 at a
time among others
 Very long In/Ready Queue
 Mis-behaved DDF Enclaves

Solutions
 Don’t run batch during the day
 Raise CICS region velocity
 Never did get it to go @ 90% velocity goal
 Put CICS in WLM-managed transaction
response time goals

We Lined Up Resources
 IBMLINK -- just did searches
 IBMLINK – RMF/WLM ETR Q&A
 IBM on the Web
 Redbooks

We Gathered Tools
 SAS/MXG
 TYPE72GO
 RMF Canned Reports
 SYSRPTS(WLMGL(SCPER(TRANCP*)))
 RMF/PM
 http://www.ibm.com/servers/eserver/zseries/rmf/
 RMF Monitor III

We Measured What We Had
 Measured transaction counts and response time
 CICS 110 data
 MXG
 80% of volume was covered by < 12
transactions
 Developed goals for those transactions using
desired response time given current
transaction response goals
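The "80% of volume" finding can be reproduced with a simple cumulative count. The sketch below is one way to do it, assuming transaction counts have already been summarized from the CICS 110 data; the transaction names and counts are invented, and the function is not part of MXG.

```python
def top_transactions_for_coverage(counts, coverage_percent=80):
    """Return the highest-volume transactions that together reach the coverage."""
    total = sum(counts.values())
    selected = []
    running = 0
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    for name, count in ranked:
        # Integer arithmetic avoids floating-point surprises at the boundary.
        if running * 100 >= coverage_percent * total:
            break
        selected.append(name)
        running += count
    return selected


# Hypothetical daily transaction counts by transaction name.
daily_counts = {"TRNA": 500, "TRNB": 300, "TRNC": 100, "TRND": 60, "TRNE": 40}
print(top_transactions_for_coverage(daily_counts))
```

The transactions returned are the candidates to develop goals for first.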

We Determined Goals
 Used the key transactions to determine service
class goals
 High volume transactions will lift the region
 Region tends to be managed to the most aggressive
transaction goal
 Some transactions will get a free ride
 Separated CICS system transactions
 Long-runners
 Never ending

Service Classes Naming
 Establish good WLM naming conventions
 Change names if they don’t work
 Accommodate test and production service classes
 Accommodate IMS/CICS separate service classes
 In CICS, use report classes to be able to track
individual transactions in monitors if necessary

We Defined Service Classes
 Importance
 Go with something that fits your policy with very little
or no user work above the transaction service classes
 Percentage making the response time goal
 Good tweaking tool, start lower and move up
 Response time goal
 Go with what you got at first

Examples of Service Class
 TRANCP01
 Importance 1, 98% in .5 secs or less
 Classified 70-80% of total volume
 TRANCP02
 Importance 1, 98% in 1.2 secs or less
 Classified 20-30% of total volume
 TRANCP03
 Importance 2, 98% in 2.0 secs or less
 CICS Overhead and long runners

We Set Up Classification
Rules
 Classification rules in CICS subsystem
 Used Subsystem Instance (VTAM Applid) and
Transaction Name (SI/TN)
 In JES/STC, PF11 over a couple times,
“Manage Region Using Goals Of” REGION
 Install and activate the policy
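The SI/TN idea can be pictured as a two-level lookup: match the subsystem instance first, then the transaction name within it. The sketch below is a simplified model only. The applid and transaction patterns are hypothetical, the service class names are borrowed from the examples earlier in this deck, and Python's fnmatch wildcards stand in for WLM's own qualifier matching, which has different rules.

```python
from fnmatch import fnmatchcase

# (SI pattern, [(TN pattern, service class), ...], default class for that SI)
RULES = [
    ("CICSPRD*", [("AB*", "TRANCP01"), ("C*", "TRANCP03")], "TRANCP02"),
]
DEFAULT_CLASS = "TRANCP03"  # assumed subsystem default, for illustration


def classify(applid, tran, rules=RULES, default=DEFAULT_CLASS):
    """Return the service class of the first matching SI rule, then TN rule."""
    for si_pattern, tn_rules, si_default in rules:
        if fnmatchcase(applid, si_pattern):
            for tn_pattern, service_class in tn_rules:
                if fnmatchcase(tran, tn_pattern):
                    return service_class
            return si_default
    return default


print(classify("CICSPRD1", "ABCD"))
print(classify("CICSPRD1", "ZZZZ"))
print(classify("CICSTST1", "ABCD"))
```

A model like this can be handy for checking, offline, which service class a given transaction would land in before the policy is installed.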

We Did More Analysis
 Online monitors can be used to monitor proper
classification of transactions
 Can run 110 data to simulate response time goal
achievement
 SAS/MXG helped us decipher the data
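We did this analysis in SAS/MXG. Purely as an illustration of the idea, the sketch below replays response-time records against percentile goals and reports the percentage that would have made each goal. The record layout, the transaction names, and the transaction-to-class mapping are assumptions; the goal values are taken from the service class examples earlier in this deck.

```python
from collections import defaultdict

# service class -> (goal seconds, goal percent), from the slide examples
GOALS = {"TRANCP01": (0.5, 98.0), "TRANCP02": (1.2, 98.0)}


def simulate(records, tran_to_class, goals=GOALS):
    """Return {service class: percent of transactions within the goal time}."""
    counts = defaultdict(lambda: [0, 0])  # class -> [within goal, total]
    for tran, response_time in records:
        service_class = tran_to_class.get(tran)
        if service_class not in goals:
            continue  # unclassified or untracked work is skipped
        goal_seconds, _ = goals[service_class]
        counts[service_class][1] += 1
        if response_time <= goal_seconds:
            counts[service_class][0] += 1
    return {sc: 100.0 * within / total for sc, (within, total) in counts.items()}


# Hypothetical records: (transaction name, response time in seconds).
records = [("ABCD", 0.2), ("ABCD", 0.7), ("WXYZ", 1.0)]
mapping = {"ABCD": "TRANCP01", "WXYZ": "TRANCP02"}
for sc, achieved in simulate(records, mapping).items():
    print(sc, achieved, "goal", GOALS[sc][1])
```

Comparing the achieved percentage with the goal percentage shows which service classes would have missed before anything is switched to TRANSACTION.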

We Took The Plunge
 Change “Manage Region Using Goals Of” to
TRANSACTION
 We installed and activated the policy
 Tuned using high-volume transactions
 Got to a point of diminishing returns

We Changed The Service
Class Names
 Don’t use FAST, SLOW, etc in service class
name, the phone will eventually ring!
 TRANCP01, TRANCT01 etc. worked for us
 We used transaction name for report classes
 Report classes were minimally used in our case

RMF/PM
• Highlight
RESOURCE/SYSPLEX
• Multitude of Service Class
Reports to choose from
• Filter on SC name

We Were Surprised

We Tweaked
 That annoying, consistent 10AM dip
 We took stuff out of SYSSTC
 Netview, DB2, DF/HSM
 CICS goals would handle most of DB2 work

We Smiled

We Moved Onto IMS
 IMS somewhat well behaved, some month-
end stress
 4 production control regions, 5 test control
regions
 Environment not “super-constrained”
 zOS 1.4 (+ OA06672 for canned reports)
 IMS V7 (+ PQ71906)

We Took A Different
Approach
 More granularity required due to nature of
IMS architecture and our structure
 Classified by IMS SSID/Tran Class
 All inclusive, nothing defaulted
 zOS 1.2 allowed for simulation with report
classes

We Set Up Workload
Manager
 Kept report classes granular and only
consisting of 1 service class
 In IMS region classification rules (STC/JES),
left everything “Managed According to Goals
of” REGION

WLM IMS Transaction
Classification

RMF/PM
• Use Report Classes
• Click on
RESOURCE/SYSPLEX
• Multitude of Report
Class reports to choose
from
• Filter on Report Class
Name

We Analyzed and Monitored
 zOS 1.2 and Report Classes allowed us to
model with 100% accuracy
 Each IMS transaction class was given a
report class.

We Tweaked
 Using the report classes, we were able to
move different IMS transaction classes in
different service classes and re-analyze non-
disruptively
 BEFORE implementation we knew what
goals were being achieved

We Implemented
 Modify ALL related IMS region JES/STC
classifications to TRANSACTION
 Control regions
 MPR’s
 DBRC address spaces
 DLI regions
 IMS Connect Address Spaces
 Install/Activate Policy

We Monitored
 RMF/PM
 RMF Monitor III
 APAR OA06672 required for Mon III SYSRTD
<reportclassname>
 SAS/MXG
 Use TYPE72GO to “bucketize” report classes
(doesn’t need OA06672)

We Benefited
 When DB2 Stored Procedures came into the
system, we were better set up to manage the
influx of new work
 At 100% processor utilization, we were able
to over-achieve aggressive IMS goals with
3,000,000 IMS transactions a day

We Began The Stored
Procedure Saga
 DB2 Administrators provided initiative
 DB2 Version 8 requires WLM management of
Stored Procedures
 Avoid the last minute rush, get an understanding
of things early on

We Defined The AE
 SG24-4693-01 DB2 Stored Procedures
 Parameters can be specified in AE panel or in
DB2 JCL
 NUMTCB is important—too low = too many
SPAS started

We Learned About WLM SPAS
 Each SPAS can only service a single Service
Class when “Unlimited” is specified
 We set production to “Unlimited” and test to
“Single AS Per System”
 Changes to definition or JCL will require a
refresh of the AE itself

WLM AE Vary Commands
 V WLM,APPLENV=<AEname>,REFRESH
 V WLM,APPLENV=<AEname>,QUIESCE
 V WLM,APPLENV=<AEname>,RESUME
 zOS System Commands

We Diagramed The Workflow
[Diagram: LPAR1 and LPAR2, with IMS1, IMS2, DB2 1, DB2 2, two SPAS, batch jobs, and Intel]

We Determined Enclave
Classification
 Stored Procedure calls from IMS2 would keep
the IMS transaction classification and would be
included in IMS response time
 Stored Procedure calls from batch on LPAR2
would keep the batch classification
 Work from LPAR1 and the network would go
through DDF classification

We Found Challenges
 Had to maintain IMS1 response time via DDF
classifications
 Did not want the batch enclaves from LPAR1 to
dominate online enclaves
 Requests coming in from the Web couldn’t be
classified with response time goals when using
THREADS=ACTIVE or RELEASE(DEALLOCATE)

We Needed To Distinguish
Sources
 Independent enclaves from IMS transactions on
the remote LPAR
 Independent enclaves from batch jobs on the
remote LPAR
 Independent enclaves from the Web

We Looked For Data
 DB2 Accounting Data provides SP detail data
 Turned on Options 7 and 8
 Used MXG to format the SMF101 data
 Dumped the data and looked

We Liked The DB2 Data
 DB2 data provided a wealth of information
 Originating IMS
 Batch job information
 Workstation information
 USERID
 Plan
 On and on

We Classified The Test
Environment
 Separated the IMS and batch from the Web
work (that was easy)
 Put a “middle of the road” service class in for
everything
 Installed the policy and did more measurements

We Investigated Results
 Used Monitor III to view available classification
data in the ENCLAVE report
 Verified what we had in the classification rules
was correct
 Tried to find more to distinguish IMS and batch

Monitor III Example
 Go into OPTIONS
 Chose ENCLAVE report
 Get some attributes from there to be included
in report

We Came Up With A Way
 Origin data not available to WLM in classification
– but available in DB2 101
 Used MXG/SAS to pull off the independent
enclaves from DB2 data that were coming from IMS
and batch
 We ran reports to tell us what the longest
running enclave from IMS was

We Revised DDF Service
Classes
 Made the Service Class for mainframe
origination work two periods long
 First period was response time %-ile, reflecting
the importance of the IMS transaction
 Second period was velocity, reflecting that of
production batch

We Got Help From The
Application
 The day before production cut over, the
applications did a “stress test”
 We modified service classes and goals to
parallel what we planned for production
 It was relatively calm
 We made an assessment of what would happen
the following Monday

We Were “All Set” For
Production
 We started off with some struggling to make the
goals with an IMS volume of 50-60 transactions
a second
 Missing goals by a percent or 2
 Control regions and MPR’s were being delayed
for ENCLAVE here and there

It Got A Little Busier
 IMS volume ramped up to the normal 100+
transactions a second
 Goals were consistently being missed
 Everything delayed for ENCLAVE

We Made Adjustments
 We lowered the DDF Service Class to a lower
percentage and a little longer time
 We saw some improvements
 Other IMS work was suffering badly
 Still not a win

We Made More Adjustments
 We started to see queuing in the IMS Control
Regions with transactions not related to the
Stored Procedure application
 We were able to isolate the transaction classes
that were being impeded and put them in their
own service class with a goal of 98% in < 0.2
seconds

We Kept Going
 We took the local IMS transactions that were
causing the higher CPU consumption, put them
in an isolated Service Class with lower
importance
 We put the DDF Service Class in a velocity goal
 Things quieted down a lot

We Found Stuff Out
 Application code had a call to the Stored Procedure
inside a loop
 Each call was iterated multiple times (at least
10-15) instead of once
 We convinced the applications to consider a
design change
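The pattern we found looks roughly like the sketch below. This is not the application's code: the stand-in procedure, the function names, and the assumption that one call per transaction would have been enough are all illustrative.

```python
class FakeProcedure:
    """Stand-in for a stored procedure; counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, key):
        self.calls += 1
        return {"key": key}


def process_in_loop(proc, key, items):
    """Calls the procedure once per item, even though the answer never changes."""
    results = []
    for item in items:
        row = proc(key)
        results.append((item, row["key"]))
    return results


def process_hoisted(proc, key, items):
    """Calls the procedure once and reuses the result for every item."""
    row = proc(key)
    return [(item, row["key"]) for item in items]


in_loop, hoisted = FakeProcedure(), FakeProcedure()
items = list(range(12))
assert process_in_loop(in_loop, "K1", items) == process_hoisted(hoisted, "K1", items)
print(in_loop.calls, hoisted.calls)
```

Each avoided call is one less enclave for WLM to manage, which matters at 100+ transactions a second.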

We Put Stuff Back
 Applications implemented design change
 Big reduction in CPU utilization
 We put offending transactions back in the
original service class (IMP=1)
 Saw much better transaction flow, no issues

We Changed Goals
 We went back and changed our DDFP001
Service Class back to a %-ile response time
goal
 We cleaned up classification rules

Summary
 Get resources lined up
 Learn the measurement data
 Gather more data than you think you need
 Don’t be afraid to change your own plans

More Summary
 Set reasonable expectations and know how
to react when things go awry. “What am I
going to do if…..”
 Understand that you won’t get it right the first
time (see above)
 Don’t be afraid to ask IBM questions. This
benefits you and you’ll find out the ETR folks are
great to work with

Exploit The Tool
 You manage WLM, let WLM manage the
system
 Feel free to reach us at:
 len.jejer@thehartford.com
 raymond.smith@thehartford.com
