Workload Manager
Our Experiences Implementing IMS And CICS Transaction Goals
And
DB2 Stored Procedure Management
Len Jejer
Ray Smith
The Hartford Financial Services Group
This paper relates our experiences converting to Workload Manager
response time goals for our CICS and IMS online transaction
environments. Also included are experiences with implementing WLM-
managed DB2 Stored Procedures.
Introduction
This is the story of the conversion of our online
workloads to WLM transaction management. Out of
necessity, we first converted a CICS system that was
not making its SLA. After we achieved some success
with the CICS conversion, we moved on to converting
our IMS regions to transaction management. We were
later presented with an opportunity to use WLM to
manage our DB2 Stored Procedures.
WLM Level-Set
First, keep in mind that WLM’s entire philosophy is to
meet workload goals and to optimize resource
utilization. If consideration isn’t given to that
philosophy when developing goals and classifications,
it’s going to be difficult to obtain good results.
WLM uses samples to tell what’s going on, sampling each performance block once every quarter second. There is one performance block for each address space. When managing response time goals for a CICS region, WLM builds a PB (Performance Block) “table” with the number of entries equal to the MAXTASK parameter for that region. WLM walks through every PB in each sampling cycle, whether it is in use or not. For example, 20 regions with MAXTASK set to 150 means 3,000 PBs scanned four times a second. To avoid unnecessary WLM overhead, review the MAXTASK value for each region.
The single most influential input WLM uses is service
class importance. This is how WLM prioritizes goal
achievement. Higher importance work tends to
receive resources; lower importance tends to donate
resources. Make sure you have enough donors. We
didn’t at first and that made it difficult to achieve our
project goals.
WLM can take cycles from higher importance work
that is exceeding the goal and re-distribute them to
lower importance or discretionary work; however, this
isn’t the norm. WLM does this by imposing internal
resource group capping. You may see “RG-Cap” as a delay reason, even though you have no capped resource groups.
The three types of goals WLM supports are velocity,
average response time and percentile response time
goals. The easiest to understand and most precise
are percentile response time goals. They are less likely to be influenced by outlying transactions and do
not have to be revisited frequently due to workload or
environmental changes. Average response time goals
aren’t the best choice because they are easily
influenced by outliers.
When using either of the response time goals, make
sure that the goals you set are realistic. WLM will spin
its wheels trying to make goals when the goals cannot
possibly be made. If this work is high importance
work, all the lower importance levels will not get the
help they may need.
Velocity goals are difficult in concept and
implementation. “Not making the goal” can be
attributed to real delays or simply lack of samples in
the service class. Measured velocity can fluctuate
with changes in hardware as well, meaning that
velocity goals must constantly be re-evaluated. We
weren’t having good luck with velocity, even with a
goal of 90% velocity on the problem CICS regions. In
general, we take the approach of only using velocity
goals where we absolutely have to. Most batch
workloads and STC workloads are good candidates
for velocity goals because they tend to be long
running, in some cases for the life of the IPL.
Response time goals aren’t usually suited for these
types of background work.
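As a point of reference, WLM derives velocity from the same samples it uses everywhere else. In its basic form (the exact terms depend on whether I/O priority management is in effect), the calculation is:

   Execution velocity (%) = 100 * (using samples) / (using samples + delay samples)

With only a handful of samples in an interval, a few delay samples can swing the measured velocity wildly, which is why low-volume service classes show such erratic velocity attainment.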
The CICS Project
Faced with failing to meet the SLA (98% of all
application transactions complete in 2 seconds or
less) in two critical CICS applications and having
failed using WLM velocity goals, we decided to try
WLM transaction management to get the transaction
response time more in line with the customer
requirements.
In our SLA diagrams, the graphs represent two high profile applications with a service expectation that 98% of transactions will complete in two seconds or less.
They are located in two separate CICS regions.
Figure 1 shows the end-to-end response time
percentiles before the WLM conversion began.
Figure 1
Online CICS Performance Prior to WLM
Conversion
Environment
The CICS systems were on an Amdahl 8-way CMOS
processor running 2 logical partitions and were
processing 1.5 million transactions a day. The
operating system was OS/390 2.10 and CICS was
Version 4. The processor was at 100% utilization
during month-end processing with considerable latent
demand. Among the month-end batch, there were 10 jobs, each consuming 5 hours of CPU time. Five of these ran at a time, which significantly impeded other
work in the system. There were also DDF enclaves
that were not well behaved.
Resources and Tools
We started by lining up resources. We found out
IBMLINK is the richest place to go. Between the
IBMLINK database and the RMF/WLM ETR Q&A,
there was a wealth of information. There are also
Redbooks covering WLM that discuss transaction
management. 75% of our knowledge came from the answers and help the ETR Q&A folks gave us, the Redbooks, and the hits we found on IBMLINK.
We were using TMON at the time for our CICS
monitor and we were using RMF for our MVS Monitor.
We also had SAS/MXG and we used TYPE72GO
records. We used RMF/PM and Monitor III as well as
RMF postprocessor reports:
SYSRPTS(WLMGL(SCPER(<serviceclassname>)))
SYSRPTS(WLMGL(RCPER(<reportclassname>))).
These two reports will give you response time
distributions.
Data Gathering and Analysis
CICS 110 data was invaluable. We gathered and
dumped a portion of it just to see what there was that
we could use. We ran statistical programs to get
volume percentiles. We discovered that 80% of the
transaction volume was covered by fewer than 12 individual transactions. We used the 110 data to help
us establish the goals to use.
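A minimal sketch of that kind of summarization follows, assuming an MXG-style CICS 110 transaction dataset; the library, dataset and variable names (MXGLIB.CICSTRAN, TRANNAME, ELAPSTM) are illustrative, so substitute the names from your own MXG data dictionary:

proc summary data=mxglib.cicstran nway;
   class tranname;                         /* one row per transaction code        */
   var elapstm;                            /* transaction response time           */
   output out=transum (drop=_type_ _freq_)
          n=volume
          p90=rt_p90 p95=rt_p95 p99=rt_p99;
run;

proc sort data=transum; by descending volume; run;

proc sql noprint;                          /* grand total for cumulative volume % */
   select sum(volume) into :totvol from transum;
quit;

data trancum;                              /* cumulative share of the workload    */
   set transum;
   retain cumvol 0;
   cumvol + volume;
   cumpct = 100 * cumvol / &totvol;
run;

proc print data=trancum (obs=20);
   var tranname volume cumpct rt_p90 rt_p95 rt_p99;
run;

Sorted this way, it is easy to see how few transaction codes carry most of the volume, and the percentile columns feed directly into a first cut at response time goals.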
Goal Determination
Using the response time field in the CICS 110 data,
we developed 3 buckets that more or less delineated
our CICS transactions. Those 3 buckets became our
first attempt at establishing service classes. We tried
using the high volume/short running transactions to lift
the region. We started out with three service classes
TRANFAST, TRANSLOW and TRANMED. We put
the CICS system transactions in TRANSLOW.
We set the importance for the transaction service
classes at 1. We took a conservative approach in
choosing the percentage making the response time
goal. This number turned out to be a good tweaking
tool.
Keep in mind that since there is only one address
space and only one dispatching priority, the region’s
DP will be managed to the most aggressive goal. This
means that some transactions will get a “free ride” as
the high volume transactions with high importance will
tend to “lift” the region.
Classification Determination
In hindsight and with more WLM experience, we
“over-classified” in this project. Later CICS
implementations consisted of one service class for a
region. In this project we had about 12 transactions
classified. Report classes were used with these so we
could get reporting granularity. Also, for a first shot at doing transaction response time management, the exercise of picking out transactions and classifying them is not a bad thing. We saw stuff that we would not
have seen in a blanket classification. If this is your
first attempt at this, go through the motions to better
understand the workload in the enterprise and how
WLM works.
The first step is to put the data in the CICS subsystem
classification rules. We kept it simple, just using
transaction name. Later on, we got a little fancier and
added subsystem instances. We changed service
class names as well. See Figure 2 for a sample
screenshot of the CICS subsystem classification
panel.
Figure 2
Sample CICS Subsystem Classification
We put the highest volume transactions first, to have a
better chance of getting out of classification routines
relatively quickly and reducing WLM overhead.
Our CICS’s are run as jobs. So the regions are in the
JES section of the classification rules. If you PF11
over a couple times in the classification section (same
in STC and JES), you will see a column that says
“Manage Region Using Goals Of”. See Figure 3 for a
sample screenshot of the JES subsystem
classification panel.
Figure 3
Managing According to Goals Of Region
Be careful, as the default for management is
TRANSACTION. At the start, make sure you set this
to REGION. This way, when you put the information
in the CICS subsystem section, WLM will still manage
according to the velocity goals set for the region and
not the transaction response time goals. It’s not until
you change these over to “TRANSACTION” that WLM
will actually start using the response time goals.
So with “REGION” in that column and the CICS
transactions and service classes in the policy, we
installed and activated the new policy.
What We Saw
CICS 110 data started picking up the service class,
and TMON was showing it in the transaction screen.
So we knew we did something right and we were then
able to use the 110 data to make sure that we had the
classification rules right. We also used that data to
simulate response time goal achievement to some
extent.
We went through cycles of tweaks, still not managing
according to response time goals, and re-iterating the
measurements.
When we were somewhat satisfied with the numbers
we were getting, we put the response time goals to
work.
Transaction Management Implementation
We went into the policy definition, changed “Manage
Region Using Goals Of” to TRANSACTION (see
Figure 4) for each region, installed and activated the
policy and saw a wonderful thing. Using the Monitor
III SYSSUM report, we saw transaction flow start to
smooth out. We had started to achieve WLM goals
which meant achieving the SLA.
Figure 4
Managing According to Goals Of Transaction
We tuned by moving transactions around to different
service classes and by modifying response time
objectives. We eventually got to a point of diminishing
returns.
Customer Reaction
“How come my transaction is in the slow class?” A
TMON user noticed one of his transactions was in
TRANSLOW. We found out that it’s not a good idea
to have connotations of speed in service class names.
After we explained that the transactions weren’t being slowed down, we changed the service class names from TRANFAST, TRANMED and TRANSLOW to TRANCP01, TRANCP02 and TRANCP03.
Results
With no other changes to the system except putting
the transactions in response time goal mode, the SLA
graph looked like Figure 5.
Figure 5
Online CICS Performance After the WLM
Conversion
This represents 1.5 million transactions with the
processor at 100% utilization.
We noticed a consistent dip starting around 10 AM in the blue application’s response time results. We couldn’t figure out exactly what was causing it. Peter Enrico spoke at a Connecticut CMG meeting about taking work out of SYSSTC.
We got back home and took another look at the policy.
We took Netview, DB2 and HSM out of SYSSTC;
however, we did leave the IRLM address space in
SYSSTC. There were concerns raised about taking
DB2 out of SYSSTC, but the transaction PB would
govern any threads presented by CICS and DB2
would be managed according to that PB.
We implemented the policy and measured, and the 10
AM dip on the blue application disappeared. Figure 6
shows the SLA graph for a month-end processing day
after taking work out of SYSSTC.
Figure 6
Online CICS Performance After Tweaking
CICS System Transactions
We eventually weeded out the CICS system
transactions, some long running, some never ending
and put them in their own service class. The overall
performance was not affected in our case.
IMS Was Next
On another sysplex, we have a bigger IMS
environment. The IMS was well behaved to some
degree, but there was some month-end stress. We
have 4 production control regions and 5 test control
regions. By the time we got around to IMS, we were
at z/OS 1.4 (+OA06672 for SYSRTD reports) and IMS
V7 (+PQ71906 for OTMA classification). The
processor was an IBM Z900 1C6. Starting with the z/OS 1.2 level of WLM, IBM includes the capability of measuring response time distributions from report classes defined in the IMS subsystem section without actually turning on management by response time. This made things much easier.
Classification of IMS Transactions
Our MPR’s were already set up to handle transaction
classes with similar response time requirements. So it
made sense for us to classify by IMS subsystem and
transaction class. There was a lot to type into the
policy, but it paid off in the long run.
We mapped out our report classes according to the
IMS region and IMS transaction class. This gave us
homogeneous report classes, which are required to
effectively use RMF response time distribution reports
(+OA06672) by report class. Our approach was to be
all-inclusive, non-overlapping and non-defaulting.
We installed the policy with the definitions in the IMS
subsystem section and started to measure. Our
implementation provided for granularity at the
transaction class level. We could get RMF response
time distribution reports by IMS region/transaction
class.
What We Measured
We were surprised at first to see that when we added
up the counts in the report classes and then added up
the transactions in the IMS log data, they didn’t match.
After investigating where the missing transactions
were appearing and then working with IBM, we came
to the conclusion that the OTMA transactions were the
culprits. This led to APAR PQ71906 for IMS Version
7--the transaction class wasn’t being passed properly.
Using the Report Class response time distribution
statistics (from MXG TYPE72GO), we were able to
model exactly what was going on by using SAS to
map the report class statistics to what would be
service class statistics. We could move stuff around,
implement a new policy, do response time distribution
measurements again and repeat.
We knew ahead of implementation where we would
be as far as making the goals. We were able to model
at 100% accuracy with no disruption.
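A minimal sketch of that mapping step is shown below. The report class variable (RPTCLASS), the bucket variables (BKT1-BKT14 for the fourteen response time distribution buckets) and the report class names in the mapping are all placeholders; take the real names from your MXG TYPE72GO data dictionary and your own policy:

data rc_map;                    /* proposed mapping: report class -> candidate service class */
   length rptclass svcclass $8;
   input rptclass $ svcclass $;
   datalines;
RIMSA001 IMSFAST
RIMSA002 IMSFAST
RIMSB001 IMSMED
;
run;

proc sql;
   create table whatif as       /* roll report class buckets up to candidate service classes */
   select m.svcclass,
          sum(t.bkt1)  as bkt1,
          sum(t.bkt2)  as bkt2,
          /* ...buckets 3 through 13... */
          sum(t.bkt14) as bkt14
   from mxglib.type72go as t
        inner join rc_map as m
          on t.rptclass = m.rptclass
   group by m.svcclass;
quit;

From a table like WHATIF you can compute, for any candidate goal, what percentage of transactions would have fallen within it, which is how we knew before cutover whether the goals would be made.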
Implementation
We went into the JES/STC classification rules and changed “Manage Region Using Goals Of” to TRANSACTION for all the related IMS address spaces.
This included control regions, MPR’s, DBRC, DLI and
IMS Connect address spaces. We installed and
activated the policy.
There wasn’t much tweaking to do at this point, as we
had already done all the tweaking there was to do
during the modeling.
Observations
We saw more consistent IMS response times during
times of resource shortages. We were able to over-
achieve aggressive IMS goals with a volume of
3,000,000 customer transactions in an 8 hour period
with the processor running 100%.
Overall Benefits
We had a better handle on IMS performance. Month-
end became more hands-off. We were able to take
DB2 out of SYSSTC on that sysplex in preparation for
a policy overhaul that took place after attending Peter
Enrico’s “Revisiting WLM Goals” class.
Using Monitor III, we could tell at a glance which IMS
region and which transaction class was contributing to
any missed goals using the report class in SYSSUM.
Any missed goals were usually due to looping or
abending transactions. We still revisit the goals and
make adjustments as needed.
After our policy re-write, we only have some of the
online transactions in importance 1, along with some
DDF enclaves. That’s it. The bulk of our work is in
IMP=3, 4, 5 or discretionary. We now keep IMP=1 for
production online transactions only.
The whole transaction conversion process taught us a
lot about WLM. Having the IMS workload in
transaction response time mode helped us a lot when
we started using WLM-managed DB2 Stored
Procedures. We had some problems getting that running smoothly at first, as we had to deal with
dependent and independent enclaves, all doing the
same thing. Having the IMS structured as we did,
made it easy to shift IMS workloads around by
transaction classes to different service classes.
Managing DB2 Stored Procedures
The DB2 administrator came over one day and started
talking about a new application that was going to use
DB2 Stored Procedures. He wanted to use WLM to
manage the address spaces. It sounded good and we
started another WLM adventure.
DB2 SPAS Application Environment
We started by reading the Redbook on DB2 Stored
Procedures and talked with the DB2 administrator
about setting up the WLM Application Environment
(AE). After we had some operational issues, we
agreed that the parameters, such as NUMTCB, would
be coded in the DB2 SPAS JCL and not the AE
definition in the WLM policy. This was more because
of our particular organization, rather than any
technical reasons.
A little later, we changed the NUMTCB, as we had set the number too low and too many SPAS were being started. If you specify that WLM can start an unlimited number of SPAS, WLM will start an additional SPAS for the AE when “delay for server” contributes to not making the goals.
Also, WLM will start a SPAS for each service class
served by the Application Environment. Stored
procedures of different service classes will never
execute in the same SPAS. We also learned that you
needed to refresh the AE when you made changes
such as NUMTCB. See Figure 7 for a sample AE
definition in WLM. Refer to the z/OS System
Commands Reference for the commands to display,
start, stop and refresh the AE.
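As a quick reference, the commands take the following general form (the AE name DB2SPAE1 is made up; verify the exact syntax and options in the Commands reference):

D WLM,APPLENV=*                     display all application environments
D WLM,APPLENV=DB2SPAE1              display one application environment
V WLM,APPLENV=DB2SPAE1,QUIESCE      stop scheduling new work into the AE
V WLM,APPLENV=DB2SPAE1,RESUME       resume a quiesced AE
V WLM,APPLENV=DB2SPAE1,REFRESH      restart the SPAS so changes such as a new NUMTCB take effect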
Figure 7
Sample AE Definition
Managing the Enclaves
It was time to get an understanding of how the work
was going to flow and how the work would be
classified.
There would be one DB2 handling the stored
procedures. Requests could come in from local and
remote IMS’s, local and remote batch jobs and
customers on PC’s. This meant dependent and
independent enclaves. The WLM ETR team gave
some ideas for the foundation for the classification.
Dependent (local) enclaves would retain the
classification of the invoker. Independent (remote)
enclaves would have to be classified in the DDF
subsystem.
The difficulty came in where the same stored
procedure could be called by an IMS transaction or a
batch program. Since we had IMS’s on other LPAR’s
generating these calls and batch jobs on still other
LPAR’s generating these calls, we had to figure out a
way to keep the “online” enclaves within response
time goals of the original transaction and let “batch”
enclaves fare as batch work.
We talked with DB2 support at IBM on how to get
some detail data. We were having problems figuring
out what classification criteria to use to distinguish the
DDF work coming in from the remote IMS and the
remote batch. DB2 support told us to turn on
accounting trace options 7 and 8 in DB2 to provide
statistics at the detail enclave/stored procedure level
in the DB2 101’s.
Learning The Data
We collected data from the application testing that
was going on, dumped it and just looked at what we
had. The DB2 101 data had the 7 & 8 trace data in it,
along with a whole bunch of other stuff. In dumping it,
we saw a lot of good information, including the
origination of the enclave, from which we could tell if it
was a batch job or IMS online transaction. All we had
to do was get that information into our WLM policy.
Classifying The DDF Work
We put some basic classification rules in the WLM DDF subsystem, based on stored procedure name. We
used Monitor III to look at enclave classification data.
Figure 8 shows the ENCLAVE report. Using the
Options for the ENCLAVE report, we put some
classification data in the column labeled “Attributes”.
Figure 8
Sample Enclave Report
If you put your cursor on the “ENCnnnnn” field and
press enter, you will get all the data available for this
enclave that WLM could use for classification. Figure
9 shows the first screen that pops up. Now you can
see all the information WLM has available to it and
how it relates to what’s on the DB2 accounting
records. It will start to come together for you when
you see it compared to the dumped DB2 101 data.
Figure 9
Enclave Drill-Down
Because Monitor III is a sampling monitor, not a real-time monitor, and things in testing being what they are, it took some time to get a good idea of the data involved. You won’t see every enclave in the system
with Monitor III. You will only see an enclave in the
ENCLAVE report if it’s been in the system for two
WLM sampling cycles (.5 secs) and it is there at the
end of the Monitor III MINTIME interval. You can use
the enclave command in either SDSF or EJES (V3.6)
to get a detail snapshot display of the enclaves.
We then went back to the DB2 101 data and the data
available to WLM and started looking for things to help
us distinguish batch from online enclaves in the WLM
classification rules. We came up with nothing.
Moving on to Plan B, we measured CPU consumption
of the enclaves coming from an online transaction and
translated that into service units. Then we put in a
service class (DDFP001) that had two periods—the
first period was long enough to keep the longest
running IMS transaction enclave in period 1; the
second period would cover the batch ones. For period
1, we used a percentile response time goal, for period
2 we went with velocity. The importance of the first
period was 1, to match the importance of the
transactions that were spawning them and the
importance of period 2 was 4 to match batch.
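The arithmetic for sizing the period 1 duration (DUR) is sketched below; the CPU time and the service-units-per-second rate are made-up numbers, so substitute your own enclave measurement and the CPU service unit rate for your processor model:

data _null_;
   cpu_secs   = 0.25;     /* CPU time of the longest running online enclave (assumed)  */
   su_per_sec = 5000;     /* CPU service units per second for this CPU model (assumed) */
   headroom   = 1.2;      /* a little slack so online enclaves stay in period 1        */
   dur        = ceil(cpu_secs * su_per_sec * headroom);
   put 'Candidate period 1 DUR (service units): ' dur;
run;

Anything that consumes more service units than the period 1 duration (in practice, the batch-spawned enclaves) ages into period 2 and is managed to the batch-like goal there.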
On the Friday prior to the Monday application
production implementation, the application developers
decided to do some volume testing. We knew at that
point that Monday wasn’t going to be pretty. The
results showed that the number of stored procedure
executions (with respect to the number of IMS
transactions) was higher than the application
developers forecasted.
“All Set” For Production Day
Production day came, and everything looked good
until about 9-10 AM. In looking at Monitor III,
everything was being delayed for enclaves. We were struggling to make the fastest transaction goals. The WLM ETR team suggested we consider lowering the goals on the enclaves, so we gave that a try. We
increased the response time and decreased the
percentile for the DDFP001 service class in period 1,
but left it at importance 1. We didn’t see as much
delay for enclaves, but there was still more than there
should be. And it was important work (MPR’s) being
delayed. We started to see more service class goals
not being made as volume increased.
Over the next couple of days, giving WLM time to get
settled and giving us time to gather more data and
think about things, we progressed further in the SPAS
adventure. We didn’t want to make knee-jerk changes to the policy, and we didn’t want to rely on 100-second intervals, so it took time. But all in all, if you see 10 to 20 of the 100-second intervals all looking pretty sad, the 15-minute and hourly intervals won’t look much better.
We were seeing intermittent queuing in the IMS
control regions. Our fastest IMS service class was
95% in .5 secs or less. The wrong 5% were the ones
not making the goal, along with some others. These
particular IMS transactions weren’t even part of the
Stored Procedure application. Phones started to ring,
and there were some unhappy customers out in the
field. The online performance guys did some work,
gave us a new service class with 98% having to
complete in .2 seconds and gave us the list of IMS
transaction classes to which it should be applied. In
less than 5 minutes we made the changes, installed
and activated the policy. That helped those IMS
transactions, and that problem went away. This is
where the IMS classification granularity paid off for
us--we could quickly re-arrange work.
We still had delay problems with the enclaves.
Lowering the goal helped us some, but not enough.
We looked at the local (dependent) enclaves that were
spawned off the local IMS transactions in Importance
1. We were able to isolate them by transaction class
and put them into importance 2. Right about this time,
we also found out there was an application design
defect that was causing the stored procedure to be
called multiple times from a transaction instead of
once. In this case, multiple times = 10-15 times.
Once we put the offending IMS transactions in
importance 2, things got a little better.
Instead of watching both the enclave goals and the
transaction goals, we decided to watch the transaction
goals of the foreign IMS. We wanted to try putting DDFP001 in a velocity goal and adjusting the velocity up if we saw issues. After all, meeting the transaction goal
was what was important in the end.
We put the DDFP001 service class period 1 into a
velocity goal (45%) with importance 1, and things
calmed down even more. The mix of delay reasons
and who was being delayed was good. It was
indicative of everyone taking turns instead of one
workload dominating the system.
We did some minor tweaking here and there, but we
were pretty much running OK at this point. After the
applications fixed the design defect, we put the
transactions (and their dependent enclaves) back in
importance 1. All was well and we were at the end of
the adventure.
Summary
What works for one place won’t work for another. If
there was one magnificent WLM policy, IBM would
have published it a long time ago. The end results are
important with WLM, and sometimes the ends do
justify the means.
In this paper, we looked at a couple of WLM-managed
workloads in our shop taken out of the context of our
entire workload. How these workloads fare in our
shop is dependent on how we have the whole policy
set up and the workload itself. The structure might or
might not work in another shop.
However, the process of migrating to WLM transaction management entails some basic concepts that apply everywhere.
1. Get resources lined up.
2. Learn the measurement data.
3. Gather more data than you think you need.
4. Don’t be afraid to change plans.
5. Don’t be afraid to ask IBM questions. This
benefits you and you’ll find out the ETR folks
are great.
6. Set reasonable expectations and know how to
react when things go awry. “What am I going
to do if…..”
7. Understand that you won’t get it right the first
time (see #6).
8. You manage WLM. Let WLM manage the
system.
References and Acknowledgments
• SG24-5326-00 – WLM Redbook
• SG24-6404-00 – IMS V7 Performance
Monitoring and Tuning
• MVS Planning: Workload Management (z/OS
Library)
• RMF Suite of Manuals
• SG24-4693-01 - Getting Started with DB2
Stored Procedures
• Special thanks go to the RMF/WLM ETR Q&A
support team for their contributions to the
project and patience with the “customer”.
• Thanks also go to the RMF, WLM, IMS, DB2
and CICS support folks at IBM who listened
and in a couple instances provided fixes for
us.
• Thanks also to Peter Enrico for his efforts in
preparing and delivering his WLM
presentations and for the WLM/HTML tool
which helped us see the policy better.
Trademarks and Disclaimers
• CICS and DB2 are registered trademarks of
IBM Corporation in the US and other
countries.
• RMF, WLM, z/OS, IMS are trademarks of IBM
Corporation in the US and other countries.
• SAS is a registered trademark of The SAS
Institute, Inc. in the US and other countries.
• MXG is a trademark of Barry Merrill in the US
and other countries.
• Use of and references to products in this presentation are not intended to be a product endorsement or recommendation of that product by The Hartford Financial Services Group or employees of The Hartford Financial Services Group.

More Related Content

What's hot

Cloud Security - Security Aspects of Cloud Computing
Cloud Security - Security Aspects of Cloud ComputingCloud Security - Security Aspects of Cloud Computing
Cloud Security - Security Aspects of Cloud ComputingJim Geovedi
 
Security in cloud computing
Security in cloud computingSecurity in cloud computing
Security in cloud computingveena venugopal
 
OpenNASA v2.0 Slideshare Large File
OpenNASA v2.0 Slideshare   Large FileOpenNASA v2.0 Slideshare   Large File
OpenNASA v2.0 Slideshare Large FileMegan Eskey
 
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...Amazon Web Services
 
Impact of busines model elements on cloud computing adoption
Impact of busines model elements on cloud computing adoptionImpact of busines model elements on cloud computing adoption
Impact of busines model elements on cloud computing adoptionAndreja Pucihar
 
Evaluation Of The Data Security Methods In Cloud Computing Environments
Evaluation Of The Data Security Methods In Cloud Computing EnvironmentsEvaluation Of The Data Security Methods In Cloud Computing Environments
Evaluation Of The Data Security Methods In Cloud Computing Environmentsijfcstjournal
 
SOME SECURITY CHALLENGES IN CLOUD COMPUTING
SOME SECURITY CHALLENGES  IN CLOUD COMPUTINGSOME SECURITY CHALLENGES  IN CLOUD COMPUTING
SOME SECURITY CHALLENGES IN CLOUD COMPUTINGHoang Nguyen
 
Introduction to Cloud Computing - CCGRID 2009
Introduction to Cloud Computing - CCGRID 2009Introduction to Cloud Computing - CCGRID 2009
Introduction to Cloud Computing - CCGRID 2009James Broberg
 
Cloud computing understanding security risk and management
Cloud computing   understanding security risk and managementCloud computing   understanding security risk and management
Cloud computing understanding security risk and managementShamsundar Machale (CISSP, CEH)
 
Cloud Computing Risk Management (Multi Venue)
Cloud Computing Risk Management (Multi Venue)Cloud Computing Risk Management (Multi Venue)
Cloud Computing Risk Management (Multi Venue)Brian K. Dickard
 
Cloud computing-security-issues
Cloud computing-security-issuesCloud computing-security-issues
Cloud computing-security-issuesAleem Mohammed
 
Cloud Computing Security Issues in Infrastructure as a Service” report
Cloud Computing Security Issues in Infrastructure as a Service” reportCloud Computing Security Issues in Infrastructure as a Service” report
Cloud Computing Security Issues in Infrastructure as a Service” reportVivek Maurya
 
Cloud Computing Security
Cloud Computing SecurityCloud Computing Security
Cloud Computing SecurityPiyush Mittal
 
What Everyone Ought To Know About Cloud Security
What Everyone Ought To Know About Cloud SecurityWhat Everyone Ought To Know About Cloud Security
What Everyone Ought To Know About Cloud Securitycraigbalding
 
Presentation on cloud computing security issues using HADOOP and HDFS ARCHITE...
Presentation on cloud computing security issues using HADOOP and HDFS ARCHITE...Presentation on cloud computing security issues using HADOOP and HDFS ARCHITE...
Presentation on cloud computing security issues using HADOOP and HDFS ARCHITE...Pushpa
 
A Short Appraisal on Cloud Computing
A Short Appraisal on Cloud ComputingA Short Appraisal on Cloud Computing
A Short Appraisal on Cloud ComputingScientific Review SR
 
An introduction to the cloud 11 v1
An introduction to the cloud 11 v1An introduction to the cloud 11 v1
An introduction to the cloud 11 v1charan7575
 

What's hot (20)

Cloud computing Fundamentals
Cloud computing FundamentalsCloud computing Fundamentals
Cloud computing Fundamentals
 
Cloud Security - Security Aspects of Cloud Computing
Cloud Security - Security Aspects of Cloud ComputingCloud Security - Security Aspects of Cloud Computing
Cloud Security - Security Aspects of Cloud Computing
 
Security in cloud computing
Security in cloud computingSecurity in cloud computing
Security in cloud computing
 
OpenNASA v2.0 Slideshare Large File
OpenNASA v2.0 Slideshare   Large FileOpenNASA v2.0 Slideshare   Large File
OpenNASA v2.0 Slideshare Large File
 
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
 
Impact of busines model elements on cloud computing adoption
Impact of busines model elements on cloud computing adoptionImpact of busines model elements on cloud computing adoption
Impact of busines model elements on cloud computing adoption
 
Evaluation Of The Data Security Methods In Cloud Computing Environments
Evaluation Of The Data Security Methods In Cloud Computing EnvironmentsEvaluation Of The Data Security Methods In Cloud Computing Environments
Evaluation Of The Data Security Methods In Cloud Computing Environments
 
SOME SECURITY CHALLENGES IN CLOUD COMPUTING
SOME SECURITY CHALLENGES  IN CLOUD COMPUTINGSOME SECURITY CHALLENGES  IN CLOUD COMPUTING
SOME SECURITY CHALLENGES IN CLOUD COMPUTING
 
Introduction to Cloud Computing - CCGRID 2009
Introduction to Cloud Computing - CCGRID 2009Introduction to Cloud Computing - CCGRID 2009
Introduction to Cloud Computing - CCGRID 2009
 
Cloud computing understanding security risk and management
Cloud computing   understanding security risk and managementCloud computing   understanding security risk and management
Cloud computing understanding security risk and management
 
Cloud Computing Risk Management (Multi Venue)
Cloud Computing Risk Management (Multi Venue)Cloud Computing Risk Management (Multi Venue)
Cloud Computing Risk Management (Multi Venue)
 
The Cloud: Privacy and Forensics
The Cloud: Privacy and ForensicsThe Cloud: Privacy and Forensics
The Cloud: Privacy and Forensics
 
Cloud security
Cloud security Cloud security
Cloud security
 
Cloud computing-security-issues
Cloud computing-security-issuesCloud computing-security-issues
Cloud computing-security-issues
 
Cloud Computing Security Issues in Infrastructure as a Service” report
Cloud Computing Security Issues in Infrastructure as a Service” reportCloud Computing Security Issues in Infrastructure as a Service” report
Cloud Computing Security Issues in Infrastructure as a Service” report
 
Cloud Computing Security
Cloud Computing SecurityCloud Computing Security
Cloud Computing Security
 
What Everyone Ought To Know About Cloud Security
What Everyone Ought To Know About Cloud SecurityWhat Everyone Ought To Know About Cloud Security
What Everyone Ought To Know About Cloud Security
 
Presentation on cloud computing security issues using HADOOP and HDFS ARCHITE...
Presentation on cloud computing security issues using HADOOP and HDFS ARCHITE...Presentation on cloud computing security issues using HADOOP and HDFS ARCHITE...
Presentation on cloud computing security issues using HADOOP and HDFS ARCHITE...
 
A Short Appraisal on Cloud Computing
A Short Appraisal on Cloud ComputingA Short Appraisal on Cloud Computing
A Short Appraisal on Cloud Computing
 
An introduction to the cloud 11 v1
An introduction to the cloud 11 v1An introduction to the cloud 11 v1
An introduction to the cloud 11 v1
 

Similar to CMG White Paper

Jason Nelson_Rapid AWS Service Enablement.pdf
Jason Nelson_Rapid AWS Service Enablement.pdfJason Nelson_Rapid AWS Service Enablement.pdf
Jason Nelson_Rapid AWS Service Enablement.pdfAWS Chicago
 
Phil Green - We're migrating to the cloud - Who needs service management
Phil Green - We're migrating to the cloud - Who needs service managementPhil Green - We're migrating to the cloud - Who needs service management
Phil Green - We're migrating to the cloud - Who needs service managementitSMF UK
 
IRJET- Amazon Redshift Workload Management and Fast Retrieval of Data
IRJET- Amazon Redshift Workload Management and Fast Retrieval of DataIRJET- Amazon Redshift Workload Management and Fast Retrieval of Data
IRJET- Amazon Redshift Workload Management and Fast Retrieval of DataIRJET Journal
 
Yapp methodology anjo-kolk
Yapp methodology anjo-kolkYapp methodology anjo-kolk
Yapp methodology anjo-kolkToon Koppelaars
 
TierPoint white paper_How_to_Position_Cloud_ROI_2015
TierPoint white paper_How_to_Position_Cloud_ROI_2015TierPoint white paper_How_to_Position_Cloud_ROI_2015
TierPoint white paper_How_to_Position_Cloud_ROI_2015sllongo3
 
Security management - 2.0 -time - to-replace-your-siem-(1)
Security management - 2.0 -time - to-replace-your-siem-(1)Security management - 2.0 -time - to-replace-your-siem-(1)
Security management - 2.0 -time - to-replace-your-siem-(1)CMR WORLD TECH
 
De-Mystifying Capacity Management in the Digital World
De-Mystifying Capacity Management in the Digital WorldDe-Mystifying Capacity Management in the Digital World
De-Mystifying Capacity Management in the Digital WorldPrecisely
 
Design Summit - Advanced policy state management - John Hardy
Design Summit - Advanced policy state management - John HardyDesign Summit - Advanced policy state management - John Hardy
Design Summit - Advanced policy state management - John HardyManageIQ
 
Webinar - The continuous improvement cycle of business processes
Webinar - The continuous improvement cycle of business processesWebinar - The continuous improvement cycle of business processes
Webinar - The continuous improvement cycle of business processesAuraQuantic
 
CICS_TS_White_Paper_PJ_PNT003
CICS_TS_White_Paper_PJ_PNT003CICS_TS_White_Paper_PJ_PNT003
CICS_TS_White_Paper_PJ_PNT003Paul Johnson
 
Finding the true value of cloud computing
Finding the true value of cloud computingFinding the true value of cloud computing
Finding the true value of cloud computingDavid Linthicum
 
FBPM2-Chapter09-ProcessAwareInformationSystems.pptx
FBPM2-Chapter09-ProcessAwareInformationSystems.pptxFBPM2-Chapter09-ProcessAwareInformationSystems.pptx
FBPM2-Chapter09-ProcessAwareInformationSystems.pptxssuser0d0f881
 

Similar to CMG White Paper (20)

cmg_las
cmg_lascmg_las
cmg_las
 
Dit yvol5iss13
Dit yvol5iss13Dit yvol5iss13
Dit yvol5iss13
 
Soa To The Rescue
Soa To The RescueSoa To The Rescue
Soa To The Rescue
 
Jason Nelson_Rapid AWS Service Enablement.pdf
Jason Nelson_Rapid AWS Service Enablement.pdfJason Nelson_Rapid AWS Service Enablement.pdf
Jason Nelson_Rapid AWS Service Enablement.pdf
 
Phil Green - We're migrating to the cloud - Who needs service management
Phil Green - We're migrating to the cloud - Who needs service managementPhil Green - We're migrating to the cloud - Who needs service management
Phil Green - We're migrating to the cloud - Who needs service management
 
IRJET- Amazon Redshift Workload Management and Fast Retrieval of Data
IRJET- Amazon Redshift Workload Management and Fast Retrieval of DataIRJET- Amazon Redshift Workload Management and Fast Retrieval of Data
IRJET- Amazon Redshift Workload Management and Fast Retrieval of Data
 
Dit yvol5iss35
Dit yvol5iss35Dit yvol5iss35
Dit yvol5iss35
 
Yapp methodology anjo-kolk
Yapp methodology anjo-kolkYapp methodology anjo-kolk
Yapp methodology anjo-kolk
 
TierPoint white paper_How_to_Position_Cloud_ROI_2015
TierPoint white paper_How_to_Position_Cloud_ROI_2015TierPoint white paper_How_to_Position_Cloud_ROI_2015
TierPoint white paper_How_to_Position_Cloud_ROI_2015
 
Dit yvol2iss41
Dit yvol2iss41Dit yvol2iss41
Dit yvol2iss41
 
Security management - 2.0 -time - to-replace-your-siem-(1)
Security management - 2.0 -time - to-replace-your-siem-(1)Security management - 2.0 -time - to-replace-your-siem-(1)
Security management - 2.0 -time - to-replace-your-siem-(1)
 
De-Mystifying Capacity Management in the Digital World
De-Mystifying Capacity Management in the Digital WorldDe-Mystifying Capacity Management in the Digital World
De-Mystifying Capacity Management in the Digital World
 
Design Summit - Advanced policy state management - John Hardy
Design Summit - Advanced policy state management - John HardyDesign Summit - Advanced policy state management - John Hardy
Design Summit - Advanced policy state management - John Hardy
 
Webinar - The continuous improvement cycle of business processes
Webinar - The continuous improvement cycle of business processesWebinar - The continuous improvement cycle of business processes
Webinar - The continuous improvement cycle of business processes
 
CICS_TS_White_Paper_PJ_PNT003
CICS_TS_White_Paper_PJ_PNT003CICS_TS_White_Paper_PJ_PNT003
CICS_TS_White_Paper_PJ_PNT003
 
Finding the true value of cloud computing
Finding the true value of cloud computingFinding the true value of cloud computing
Finding the true value of cloud computing
 
Building a SaaS Style Application
Building a SaaS Style ApplicationBuilding a SaaS Style Application
Building a SaaS Style Application
 
FBPM2-Chapter09-ProcessAwareInformationSystems.pptx
FBPM2-Chapter09-ProcessAwareInformationSystems.pptxFBPM2-Chapter09-ProcessAwareInformationSystems.pptx
FBPM2-Chapter09-ProcessAwareInformationSystems.pptx
 
Dit yvol3iss25
Dit yvol3iss25Dit yvol3iss25
Dit yvol3iss25
 
Dit yvol2iss25
Dit yvol2iss25Dit yvol2iss25
Dit yvol2iss25
 

CMG White Paper

  • 1. Workload Manager Our Experiences Implementing IMS And CICS Transaction Goals And DB2 Stored Procedure Management Len Jejer Ray Smith The Hartford Financial Services Group This paper relates our experiences converting to Workload Manager response time goals for our CICS and IMS online transaction environments. Also included are experiences with implementing WLM- managed DB2 Stored Procedures. Introduction This is the story of the conversion of our online workloads to WLM transaction management. Out of necessity, we first converted a CICS system that was not making its SLA. After we achieved some success with the CICS conversion, we moved on to converting our IMS regions to transaction management. We were later presented with an opportunity to use WLM to manage our DB2 Stored Procedures. WLM Level-Set First, keep in mind that WLM’s entire philosophy is to meet workload goals and to optimize resource utilization. If consideration isn’t given to that philosophy when developing goals and classifications, it’s going to be difficult to obtain good results. WLM uses samples to tell what’s going on. It will sample each performance block once every quarter second. There is one performance block for each address space. When managing response time goals for a CICS region, WLM builds a PB (Performance Block) “table” with the number of entries equal to the MAXTASK parameter for that region. WLM goes through each PB whether it’s used or not in each sampling cycle. To avoid unnecessary WLM overhead, remember to review the MAXTASK’s for each region. The single most influential input WLM uses is service class importance. This is how WLM prioritizes goal achievement. Higher importance work tends to receive resources; lower importance tends to donate resources. Make sure you have enough donors. We didn’t at first and that made it difficult to achieve our project goals. WLM can take cycles from higher importance work that is exceeding the goal and re-distribute them to lower importance or discretionary work; however, this isn’t the norm. WLM does this by imposing internal resource group capping. You may see “RG-Cap” for delay reason, even though you have no capped resource groups. The three types of goals WLM supports are velocity, average response time and percentile response time goals. The easiest to understand and most precise are percentile response time goals. They are less likely to be influenced by out-lying transactions and do not have to be revisited frequently due to workload or environmental changes. Average response time goals aren’t the best choice because they are easily influenced by outliers. When using either of the response time goals, make sure that the goals you set are realistic. WLM will spin its wheels trying to make goals when the goals cannot possibly be made. If this work is high importance work, all the lower importance levels will not get the help they may need. Velocity goals are difficult in concept and implementation. “Not making the goal” can be attributed to real delays or simply lack of samples in the service class. Measured velocity can fluctuate with changes in hardware as well, meaning that velocity goals must constantly be re-evaluated. We weren’t having good luck with velocity, even with a goal of 90% velocity on the problem CICS regions. In general, we take the approach of only using velocity
  • 2. goals where we absolutely have to. Most batch workloads and STC workloads are good candidates for velocity goals because they tend to be long running, in some cases for the life of the IPL. Response time goals aren’t usually suited for these types of background work. The CICS Project Faced with failing to meet the SLA (98% of all application transactions complete in 2 seconds or less) in two critical CICS applications and having failed using WLM velocity goals, we decided to try WLM transaction management to get the transaction response time more in line with the customer requirements. In our SLA diagrams, the graphs represent two high profile applications with service expectation that 98% of transactions will complete in two seconds or less. They are located in two separate CICS regions. Figure 1 shows the end-to-end response time percentiles before the WLM conversion began. Figure 1 Online CICS Performance Prior to WLM Conversion Environment The CICS systems were on an Amdahl 8-way CMOS processor running 2 logical partitions and were processing 1.5 million transactions a day. The operating system was OS/390 2.10 and CICS was Version 4. The processor was at 100% utilization during month-end processing with considerable latent demand. Among the month-end batch, there were 10 jobs taking 5 hours each of CPU time. They ran 5 of these at a time, which significantly impeded other work in the system. There were also DDF enclaves that were not well behaved. Resources and Tools We started by lining up resources. We found out IBMLINK is the richest place to go. Between the IBMLINK database and the RMF/WLM ETR Q&A, there was a wealth of information. There are also Redbooks covering WLM that discuss transaction management. 75% of our knowledge came from the answers to questions we asked the Q&A ETR folks and help they gave us, the Redbooks and hits we found on IBMLINK. We were using TMON at the time for our CICS monitor and we were using RMF for our MVS Monitor. We also had SAS/MXG and we used TYPE72GO records. We used RMF/PM and Monitor III as well as RMF postprocessor reports: SYSRPTS(WLMGL(SCPER(<serviceclassname>))) SYSRPTS(WLMGL(RCPER(<reportclassname>))). These two reports will give you response time distributions. Data Gathering and Analysis CICS 110 data was invaluable. We gathered and dumped a portion of it just to see what there was that we could use. We ran statistical programs to get volume percentiles. We discovered that 80% of the transaction volume was covered by less than 12 individual transactions. We used the 110 data to help us establish the goals to use. Goal Determination Using the response time field in the CICS 110 data, we developed 3 buckets that more or less delineated our CICS transactions. Those 3 buckets became our first attempt at establishing service classes. We tried using the high volume/short running transactions to lift the region. We started out with three service classes TRANFAST, TRANSLOW and TRANMED. We put the CICS system transactions in TRANSLOW. We set the importance for the transaction service classes at 1. We took a conservative approach in choosing the percentage making the response time goal. This number turned out to be a good tweaking tool.
  • 3. Keep in mind that since there is only one address space and only one dispatching priority, the region’s DP will be managed to the most aggressive goal. This means that some transactions will get a “free ride” as the high volume transactions with high importance will tend to “lift” the region. Classification Determination In hindsight and with more WLM experience, we “over-classified” in this project. Later CICS implementations consisted of one service class for a region. In this project we had about 12 transactions classified. Report classes were used with these so we could get reporting granularity. Also, for a first shot at doing transaction response time management, going through picking out transactions and classifying them is not a bad thing. We saw stuff that we would not have seen in a blanket classification. If this is your first attempt at this, go through the motions to better understand the workload in the enterprise and how WLM works. The first step is to put the data in the CICS subsystem classification rules. We kept it simple, just using transaction name. Later on, we got a little fancier and added subsystem instances. We changed service class names as well. See Figure 2 for a sample screenshot of the CICS subsystem classification panel. Figure 2 Sample CICS Subsystem Classification We put the highest volume transactions first, to have a better chance of getting out of classification routines relatively quickly and reducing WLM overhead. Our CICS’s are run as jobs. So the regions are in the JES section of the classification rules. If you PF11 over a couple times in the classification section (same in STC and JES), you will see a column that says “Manage Region Using Goals Of”. See Figure 3 for a sample screenshot of the JES subsystem classification panel. Figure 3 Managing According to Goals Of Region Be careful, as the default for management is TRANSACTION. At the start, make sure you set this to REGION. This way, when you put the information in the CICS subsystem section, WLM will still manage according to the velocity goals set for the region and not the transaction response time goals. It’s not until you change these over to “TRANSACTION” that WLM will actually start using the response time goals. So with “REGION” in that column and the CICS transactions and service classes in the policy, we installed and activated the new policy. What We Saw CICS 110 data started picking up the service class, and TMON was showing it in the transaction screen. So we knew we did something right and we were then able to use the 110 data to make sure that we had the classification rules right. We also used that data to simulate response time goal achievement to some extent. We went through cycles of tweaks, still not managing according to response time goals, and re-iterating the measurements. When we were somewhat satisfied with the numbers we were getting, we put the response time goals to work. Transaction Management Implementation We went into the policy definition, changed “Manage Region Using Goals Of” to TRANSACTION (see Figure 4) for each region, installed and activated the policy and saw a wonderful thing. Using the Monitor III SYSSUM report, we saw transaction flow start to smooth out. We had started to achieve WLM goals which meant achieving the SLA.
  • 4. Figure 4 Managing According to Goals Of Transaction We tuned by moving transactions around to different service classes and by modifying response time objectives. We eventually got to a point of diminishing returns. Customer Reaction “How come my transaction is in the slow class?” A TMON user noticed one of his transactions was in TRANSLOW. We found out that it’s not a good idea to have connotations of speed in service class names. After we explained that the transactions weren’t being slowed down, we changed the service class names changed from TRANFAST, TRANMED and TRANSLOW to TRANCP01, TRANCP02 and TRANCP03. Results With no other changes to the system except putting the transactions in response time goal mode, the SLA graph looked like Figure 5. Figure 5 Online CICS Performance After to WLM Conversion This represents 1.5 million transactions with the processor at 100% utilization. We noticed we consistently had the dip starting around 10AM in the blue application’s response time results. We couldn’t figure out exactly what was causing it. Peter Enrico spoke at a Connecticut CMG meeting about taking work of out of SYSSTC. We got back home and took another look at the policy. We took Netview, DB2 and HSM out of SYSSTC; however, we did leave the IRLM address space in SYSSTC. There were concerns raised about taking DB2 out of SYSSTC, but the transaction PB would govern any threads presented by CICS and DB2 would be managed according to that PB. We implemented the policy and measured, and the 10 AM dip on the blue application disappeared. Figure 6 shows the SLA graph for a month-end processing day after taking work out of SYSSTC. Figure 6 Online CICS Performance After Tweaking CICS System Transactions We eventually weeded out the CICS system transactions, some long running, some never ending and put them in their own service class. The overall performance was not affected in our case. IMS Was Next On another sysplex, we have a bigger IMS environment. The IMS was well behaved to some degree, but there was some month-end stress. We have 4 production control regions and 5 test control regions. By the time we got around to IMS, we were at z/OS 1.4 (+OA06672 for SYSRTD reports) and IMS V7 (+PQ71906 for OTMA classification). The processor was an IBM Z900 1C6. At the z/OS 1.2 level of WLM, IBM includes the capability of measuring response time distribution from report classes defined in the IMS subsystem section without having management by response time actually turned on. This made things much easier.
  • 5. Classification of IMS Transactions Our MPR’s were already set up to handle transaction classes with similar response time requirements. So it made sense for us to classify by IMS subsystem and transaction class. There was a lot to type into the policy, but it paid off in the long run. We mapped out our report classes according to the IMS region and IMS transaction class. This gave us homogenous report classes which are required to effectively use RMF response time distribution reports (+OA06672) by report class. Our approach was to be all inclusive, non-overlapping and non-defaulting. We installed the policy with the definitions in the IMS subsystem section and started to measure. Our implementation provided for granularity at the transaction class level. We could get RMF response time distribution reports by IMS region/transaction class. What We Measured We were surprised at first to see that when we added up the counts in the report classes and then added up the transactions in the IMS log data, they didn’t match. After investigating where the missing transactions were appearing and then working with IBM, we came to the conclusion that the OTMA transactions were the culprits. This led to APAR PQ71906 for IMS Version 7--the transaction class wasn’t being passed properly. Using the Report Class response time distribution statistics (from MXG TYPE72GO), we were able to model exactly what was going on by using SAS to map the report class statistics to what would be service class statistics. We could move stuff around, implement a new policy, do response time distribution measurements again and repeat. We knew ahead of implementation where we would be as far as making the goals. We were able to model at 100% accuracy with no disruption. Implementation We went into the JES/STC classification rules and changed all the related IMS address spaces to “Manage Region Using Goals Of” to TRANSACTION. This included control regions, MPR’s, DBRC, DLI and IMS Connect address spaces. We installed and activated the policy. There wasn’t much tweaking to do at this point, as we had already done all the tweaking there was to do during the modeling. Observations We saw more consistent IMS response times during times of resource shortages. We were able to over- achieve aggressive IMS goals with a volume of 3,000,000 customer transactions in an 8 hour period with the processor running 100%. Overall Benefits We had a better handle on IMS performance. Month- end became more hands-off. We were able to take DB2 out of SYSSTC on that sysplex in preparation for a policy overhaul that took place after attending Peter Enrico’s “Revisiting WLM Goals” class. Using Monitor III, we could tell at a glance which IMS region and which transaction class was contributing to any missed goals using the report class in SYSSUM. Any missed goals were usually due to looping or abending transactions. We still revisit the goals and make adjustments as needed. After our policy re-write, we only have some of the online transactions in importance 1, along with some DDF enclaves. That’s it. The bulk of our work is in IMP=3, 4, 5 or discretionary. We now keep IMP=1 for production online transactions only. The whole transaction conversion process taught us a lot about WLM. Having the IMS workload in transaction response time mode helped us a lot when we started using WLM-managed DB2 Stored Procedures. 
Managing DB2 Stored Procedures

The DB2 administrator came over one day and started talking about a new application that was going to use DB2 Stored Procedures. He wanted to use WLM to manage the address spaces. It sounded good, and we started another WLM adventure.

DB2 SPAS Application Environment
We started by reading the Redbook on DB2 Stored Procedures and talked with the DB2 administrator about setting up the WLM Application Environment (AE). After we had some operational issues, we agreed that parameters such as NUMTCB would be coded in the DB2 SPAS JCL rather than in the AE definition in the WLM policy. This was more because of our particular organization than for any technical reason. A little later we changed the NUMTCB, as we had the number too low and too many SPAS were being started.

If you specify that WLM can start an unlimited number of SPAS, WLM will start a SPAS for the AE when delay "for server" contributes to not making the goals. Also, WLM will start a SPAS for each service class served by the Application Environment; stored procedures of different service classes will never execute in the same SPAS. We also learned that you need to refresh the AE when you make changes such as NUMTCB. See Figure 7 for a sample AE definition in WLM, and refer to the z/OS System Commands reference for the commands to display, start, stop and refresh an AE.

Figure 7 Sample AE Definition

Managing the Enclaves

It was time to get an understanding of how the work was going to flow and how it would be classified. There would be one DB2 handling the stored procedures. Requests could come in from local and remote IMS's, local and remote batch jobs, and customers on PC's. This meant both dependent and independent enclaves. The WLM ETR team gave us some ideas for the foundation of the classification: dependent (local) enclaves would retain the classification of the invoker, while independent (remote) enclaves would have to be classified in the DDF subsystem.

The difficulty came in where the same stored procedure could be called by either an IMS transaction or a batch program. Since we had IMS's on other LPAR's generating these calls, and batch jobs on still other LPAR's generating them as well, we had to figure out a way to keep the "online" enclaves within the response time goals of the originating transaction and let the "batch" enclaves fare as batch work.

We talked with DB2 support at IBM about how to get some detail data, because we were having trouble figuring out what classification criteria would distinguish the DDF work coming in from the remote IMS from the work coming in from remote batch. DB2 support told us to turn on accounting trace classes 7 and 8 in DB2 to provide statistics at the detail enclave/stored procedure level in the DB2 101's.

Learning The Data

We collected data from the application testing that was going on, dumped it, and just looked at what we had. The DB2 101 data had the class 7 and 8 trace data in it, along with plenty of other information. In dumping it, we saw a lot of useful fields, including the origination of the enclave, from which we could tell whether it came from a batch job or an IMS online transaction. All we had to do was get that information into our WLM policy.

Classifying The DDF Work

We put some basic classification rules in the WLM DDF subsystem, based on stored procedure name, and used Monitor III to look at enclave classification data. Figure 8 shows the ENCLAVE report. Using the options for the ENCLAVE report, we put some classification data in the column labeled "Attributes".

Figure 8 Sample Enclave Report

If you put your cursor on the "ENCnnnnn" field and press Enter, you get all the data available for this enclave that WLM could use for classification. Figure 9 shows the first screen that pops up. Now you can see all the information WLM has available to it and how it relates to what's on the DB2 accounting records. It starts to come together when you compare it to the dumped DB2 101 data.

Figure 9 Enclave Drill-Down
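For the comparison between the enclave attributes and the accounting data, a small SAS step against the MXG-built DB2 accounting file can surface the origination fields. This is only a sketch and not the exact query we used: DB2ACCT is the usual MXG dataset name for SMF 101 data, the QWHC* names follow the DB2 correlation header fields MXG normally carries, and the plan name filter is hypothetical, so verify all of them against your own installation.

/* Sketch: where are the DDF enclaves coming from?                      */
/* Cross-tabulate attachment type and connection name for the           */
/* application's plan. Names below are assumptions - verify locally.    */
proc freq data=mxg.db2acct;
  where qwhcplan = 'SPAPPL1';            /* hypothetical plan name      */
  tables qwhcatyp*qwhccn / list missing; /* attachment type by connection name */
run;

A listing along these lines shows the batch-origin work separately from the IMS-origin work; the catch, as described below, was that nothing equivalent was available to WLM as a classification attribute.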
Because Monitor III is a sampling monitor, not a real-time monitor, and things in testing being what they are, it took some time to get a good picture of the data involved. You won't see every enclave in the system with Monitor III: an enclave only appears in the ENCLAVE report if it has been in the system for two WLM sampling cycles (.5 seconds) and it is still there at the end of the Monitor III MINTIME interval. You can use the enclave command in either SDSF or EJES (V3.6) to get a detailed snapshot display of the enclaves.

We then went back to the DB2 101 data and the data available to WLM and started looking for something that would let us distinguish batch from online enclaves in the WLM classification rules. We came up with nothing. Moving on to Plan B, we measured the CPU consumption of the enclaves coming from an online transaction and translated that into service units. Then we put in a service class (DDFP001) that had two periods: the first period was long enough to keep the longest-running IMS transaction enclave in period 1, and the second period would cover the batch ones. For period 1 we used a percentile response time goal; for period 2 we went with velocity. The importance of the first period was 1, to match the importance of the transactions that were spawning the enclaves, and the importance of period 2 was 4, to match batch.
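Sizing the period 1 duration was simple arithmetic: take the longest CPU time observed for an online-spawned enclave, convert it to service units using the CPU service-units-per-second figure RMF reports for the processor (enclave service is essentially CPU service), apply the CPU service definition coefficient, and allow some headroom. A sketch with invented numbers, not our actual values:

/* Sketch: size the period 1 duration (in service units) so that the   */
/* longest online-spawned enclave still completes in period 1.         */
/* Every number below is invented for illustration.                    */
data _null_;
  max_cpu_sec = 0.35;     /* longest observed enclave CPU time, seconds   */
  su_per_sec  = 12000;    /* CPU service units per second for this        */
                          /* processor, from the RMF workload activity    */
                          /* report                                       */
  cpu_coeff   = 1.0;      /* CPU service definition coefficient           */
  headroom    = 1.25;     /* cushion for run-to-run variability           */
  dur = ceil(max_cpu_sec * su_per_sec * cpu_coeff * headroom);
  put 'Candidate period 1 duration (service units): ' dur;
run;

The result goes into the duration field of period 1 in the service class definition; anything that accumulates more service than that falls into the batch-oriented period 2.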
On the Friday prior to the Monday application production implementation, the application developers decided to do some volume testing. We knew at that point that Monday wasn't going to be pretty: the results showed that the number of stored procedure executions (relative to the number of IMS transactions) was higher than the application developers had forecast.

"All Set" For Production Day

Production day came, and everything looked good until about 9-10 AM. Looking at Monitor III, everything was being delayed for enclaves, and we were struggling to make the goals for the fastest transaction service classes. The WLM ETR team suggested we might want to consider lowering the goals on the enclaves, so we gave that a try. We increased the response time and decreased the percentile for the DDFP001 service class in period 1, but left it at importance 1. We didn't see as much delay for enclaves, but there was still more than there should have been, and it was important work (MPR's) being delayed. We started to see more service class goals being missed as volume increased.

Over the next couple of days, giving WLM time to settle and giving ourselves time to gather more data and think about things, we progressed further in the SPAS adventure. We didn't want to make knee-jerk changes to the policy, and we didn't want to rely on 100-second intervals, so it took time. But all in all, if you see 10-20 100-second intervals all looking pretty sad, the 15-minute and hourly intervals won't look much better.

We were seeing intermittent queuing in the IMS control regions. Our fastest IMS service class goal was 95% in .5 seconds or less, and the wrong 5% were the ones missing the goal, along with some others. These particular IMS transactions weren't even part of the stored procedure application. Phones started to ring, and there were some unhappy customers out in the field. The online performance guys did some work, gave us a new service class with 98% having to complete in .2 seconds, and gave us the list of IMS transaction classes to which it should be applied. In less than 5 minutes we made the changes, installed and activated the policy. That helped those IMS transactions, and that problem went away. This is where the IMS classification granularity paid off for us--we could quickly re-arrange work.

We still had delay problems with the enclaves. Lowering the goal helped some, but not enough. We looked at the local (dependent) enclaves that were spawned off the local IMS transactions in importance 1; we were able to isolate them by transaction class and put them into importance 2. Right about this time, we also found out there was an application design defect that was causing the stored procedure to be called multiple times from a transaction instead of once. In this case, multiple times meant 10 to 15 times. Once we put the offending IMS transactions in importance 2, things got a little better. Instead of watching both the enclave goals and the transaction goals, we decided to watch the transaction goals of the foreign IMS.
We wanted to try putting DDFP001 into a velocity goal and adjusting the velocity up if we saw issues. After all, meeting the transaction goal was what was important in the end. We put the DDFP001 service class period 1 into a velocity goal (45%) with importance 1, and things calmed down even more. The mix of delay reasons, and of who was being delayed, was good; it was indicative of everyone taking turns instead of one workload dominating the system. We did some minor tweaking here and there, but we were pretty much running OK at this point. After the application folks fixed the design defect, we put the transactions (and their dependent enclaves) back in importance 1. All was well, and we were at the end of the adventure.

Summary

What works for one place won't work for another. If there were one magnificent WLM policy, IBM would have published it a long time ago. The end results are what matter with WLM, and sometimes the ends do justify the means. In this paper we looked at a couple of WLM-managed workloads in our shop, taken out of the context of our entire workload. How these workloads fare in our shop depends on how we have the whole policy set up and on the workloads themselves, and the structure might or might not work in another shop. However, the migration to WLM transaction management entails some basic concepts that apply everywhere.

1. Get resources lined up.
2. Learn the measurement data.
3. Gather more data than you think you need.
4. Don't be afraid to change plans.
5. Don't be afraid to ask IBM questions. This benefits you, and you'll find out the ETR folks are great.
6. Set reasonable expectations and know how to react when things go awry: "What am I going to do if..."
7. Understand that you won't get it right the first time (see #6).
8. You manage WLM. Let WLM manage the system.

References and Acknowledgments

• SG24-5326-00 - WLM Redbook
• SG24-6404-00 - IMS V7 Performance Monitoring and Tuning
• MVS Planning: Workload Management (z/OS Library)
• RMF suite of manuals
• SG24-4693-01 - Getting Started with DB2 Stored Procedures
• Special thanks go to the RMF/WLM ETR Q&A support team for their contributions to the project and their patience with the "customer".
• Thanks also go to the RMF, WLM, IMS, DB2 and CICS support folks at IBM who listened and, in a couple of instances, provided fixes for us.
• Thanks also to Peter Enrico for his efforts in preparing and delivering his WLM presentations and for the WLM/HTML tool, which helped us see the policy better.

Trademarks and Disclaimers

• CICS and DB2 are registered trademarks of IBM Corporation in the US and other countries.
• RMF, WLM, z/OS and IMS are trademarks of IBM Corporation in the US and other countries.
• SAS is a registered trademark of The SAS Institute, Inc. in the US and other countries.
• MXG is a trademark of Barry Merrill in the US and other countries.
• Use of and references to products in this paper are not intended to be a product endorsement or recommendation of that product by The Hartford Financial Services Group or employees of The Hartford Financial Services Group.