Much Ado About CPU

My regularly updated presentation on z/OS mainframe performance information in the CPU area.


Transcript

  • 1. Much Ado About CPU. Martin Packer, +44-20-8832-5167, martin_packer@uk.ibm.com, Twitter: MartinPacker
  • 2. Abstract: zSeries, System z9 and z10 processors have in recent years introduced a number of capabilities of real value to mainframe customers. These capabilities have, however, required changes in the way we think about CPU management. This presentation describes these capabilities and how to evolve your CPU management to take them into account. It is based on the author's experience of evolving his reporting to support these changes. This presentation is substantially enhanced this year.
  • 3. Agenda
    – Review of technology
    – "Traditional" LPAR Configuration and IRD
    – Coupling Facility CPU
    – IFL
    – zAAP and zIIP
    – z/OS Release 8, 9 and 10 Changes
    – Soft Capping and Group Capacity Limits
    – Blocked Workloads
    – z10 HiperDispatch
    – Cool It
    – I/O Assist Processors (IOPs)
    – In Conclusion
    – Backup Foils
  • 4. Review of Technology
  • 5. "Characterisable" Engines
    – ICF CPs: run CF code only; Dynamic Dispatch an option
    – IFL CPs: run Linux only (though often under VM)
    – z/OS engines:
      • zAAPs: run Java code "offloaded" from regular CPs (GCPs); also System XML etc.
      • zIIPs: run certain kinds of DB2 work offloaded from GCPs; also z/OS Global Mirror and IPSec Encryption
      • GCPs: general purpose CPs
    – "Non-Characterisable" Engines:
      • SAPs (I/O Assist Processors)
      • Spares: fewer in multi-book z9 and z10 machines than in z990
  • 6. Book-Structured From z990 Onwards
    – Connected by rings in z990 and z9
    – z10 ensures all books are connected to all books directly
    – Data transfers are direct between books via the L2 Cache chip in each book's MCM
    – L2 Cache is shared by every PU on the MCM
    – Only 1 book in z890, z9 BC and z10 BC models
  • 7. IRD CPU Management
    – Weight Management for GCP engines: alter weights within an LPAR Cluster, in shifts of 10% of weight
    – CP Management: vary LOGICAL CPs on and off; only for GCP engines
    – WLM objectives: optimise goal attainment, PR/SM overhead and LPAR throughput
    – Part of the "On Demand" picture: ensure you have defined reserved engines, and make weights sensible to allow shifts to happen
  • 8. "Traditional" LPAR Configuration and IRD
  • 9. Some Old Questions
    – How do we evolve our performance and capacity reporting?
    – Should we define an LPAR with dedicated engines? Or with shared engines?
    – What should the weights be? In total and individually. And what about in each pool?
    – How many engines should each LPAR have?
      • For dedicated engines the number is usually fairly obvious
      • For shared engines the number should roughly match the LPAR's share of the pool (see the sketch below), though other considerations often apply
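A minimal sketch of the "share of pool" rule of thumb above, assuming you supply weights and the physical engine count from your own reporting; the names here are illustrative, not RMF field names:

```python
# Convert LPAR weights into shares of the physical GCP pool, expressed
# in engines, and suggest a starting logical-engine count per LPAR.
import math

def pool_shares(weights: dict[str, int], physical_engines: int) -> dict[str, float]:
    """Return each LPAR's share of the pool, in engine-equivalents."""
    total_weight = sum(weights.values())
    return {lpar: w / total_weight * physical_engines for lpar, w in weights.items()}

weights = {"PROD1": 600, "PROD2": 300, "TEST1": 100}  # example weights
for lpar, share in pool_shares(weights, physical_engines=10).items():
    # "Roughly match share of pool": round up for a starting logical count.
    print(f"{lpar}: share = {share:.1f} engines, suggested logicals >= {math.ceil(share)}")
```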
  • 10. Increasing Complexity
    – Installations are increasing the numbers of LPARs on a machine
      • Many exceed 10 per footprint; expect 20+ soon
    – And have more logical and physical engines
    – And are increasing the diversity of their LPARs
      • Greater incidence of IFLs
      • Fast uptake of zIIPs and zAAPs, sometimes meaning 2 engine speeds
      • Fewer stand-alone CF configurations
    – With mergers etc. the number of machines managed by a team is increasing
    – And stuff's got more dynamic, too
    – As an aside: shouldn't systems be self-documenting?
  • 11. IRD Weights altered dynamically But only for GCPs Numbers of engines altered dynamically But only for GCPs And not with HiperDispatch turned on These introduce their own problems: –Varying weights when doing quot;sharequot; calculations in reporting –Fractional engines and varying engines Number of engines may go down when machine gets very busy This MIGHT be a surprising result –This is OK if goals are still met ƒIn the example in the backup foils even the minimum engine count is well above actual LPAR capacity requirement Backup Fo © 2009 IBM Corporation 11
  • 12. CPU Analysis with IRD
    – z/OS image utilisation becomes less tenable
      • How do you compare 90% of 4 engines to 80% of 5? Could happen in neighbouring intervals
      • Answer: 3.6 engines vs 4.0 engines (see the sketch below)
    – Capture ratio needs to take into account fractional engines, and varying at that
    – Percent of share becomes less meaningful, as the denominator can vary with varying weights
    – Stacking up Partition Data Report utilisations still makes sense
      • Probably the best way of summarising footprint and z/OS image utilisation
      • This is true for all pools, though IRD only relates to GCPs
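A sketch of the "engine-equivalents" idea just described: instead of comparing percentages whose denominators change, convert each interval to engines consumed (values illustrative):

```python
# Convert interval utilisation to engine-equivalents so intervals with
# different online-engine counts are directly comparable.
def engines_consumed(busy_percent: float, online_engines: float) -> float:
    return busy_percent / 100.0 * online_engines

# 90% of 4 engines vs 80% of 5 engines, as in the slide:
print(engines_consumed(90, 4))  # 3.6 engines
print(engines_consumed(80, 5))  # 4.0 engines
```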
  • 13. Coupling Facility CPU
  • 14. Internal Coupling Facility (ICF)
    – Managed out of common Pool 2 in z990; out of Pool 5 in z9 and z10
      • Pool numbers given in SMF 70 as an index into a table of labels; called "ICF" in both z990 and z9 / z10
    – Recommendation: manage in reporting as a separate pool
    – Follow special CF sizing guidelines, especially for takeover situations
    – Always runs at full speed, so a good technology match for coupled z/OS images on the same footprint
    – Another good reason to use ICFs is IC links
    – Shared ICFs strongly discouraged for Production, especially if the CF image has Dynamic Dispatch turned on
  • 15. ICF ...
    – R744PBSY and R744PWAI add up to the SMF 70-1 LPAR view of processor busy
      • PBSY is CPU time processing requests
      • PWAI is CPU time while CFCC is not processing requests but is still using CF cycles
      • For Dynamic Dispatch, PWAI is time when not processing CF requests but the Logical CP has not yet been taken back by PR/SM
      • For dedicated or non-Dynamic Dispatch cases the sum is constant; for Dynamic Dispatch the sum can vary
    – Number of defined processors is the number of CF Processor Data sections in 74-4
    – PBSY and PWAI can be examined down to Coupling Facility engine level
    – SMF 74-4 has much more besides CF Utilisation
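A sketch of turning those two 74-4 fields into a CF busy figure, assuming R744PBSY and R744PWAI have already been extracted from the records in consistent time units:

```python
# CF "busy" from the two 74-4 fields: PBSY is time processing requests;
# PBSY + PWAI is the LPAR view of processor busy.
def cf_utilisation(pbsy: float, pwai: float) -> float:
    """Fraction of the CF engine's dispatched time spent processing requests."""
    dispatched = pbsy + pwai
    return pbsy / dispatched if dispatched else 0.0

print(f"{cf_utilisation(pbsy=120.0, pwai=180.0):.0%}")  # 40%
```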
  • 16. ICF ...
    – Need to correlate SMF 70-1 with SMF 74-4 CF Utilisation to get a proper CPU picture
    – Since z/OS Release 8, 74-4 has the machine serial number
      • Allows correlation in most cases
      • Partition number added to 74-4 in OA21140, enabling correlation with 70-1 when the LPAR name is not the Coupling Facility name
  • 17. Structure-Level CPU Consumption
    – CFLEVEL 15 and z/OS R.9
    – Always 100% Capture Ratio; adds up to R744PBSY
    – Multiple uses:
      • Capacity planning for changing request rates
      • Examine which structures are large consumers
      • Compute the CPU cost of a request and compare it to service time; the interesting number is the "non-CPU" element of service time, as we shall see (sketch below)
    – NOTE: need to collect 74-4 data from all z/OS systems sharing to get the total request rate
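A sketch of the per-request arithmetic just described, using the R744SETM Structure Execution Time field (covered in slide 19) and assuming request counts have been summed across all sharing systems; the helper is illustrative:

```python
# Per-request CPU cost for a structure, from structure execution time
# (R744SETM) and the total request count summed across sharing systems.
def per_request_costs(setm_us: float, requests: int, avg_service_us: float):
    cpu_per_req = setm_us / requests          # CPU microseconds per request
    non_cpu = avg_service_us - cpu_per_req    # the "non-CPU" element of service time
    return cpu_per_req, non_cpu

cpu, non_cpu = per_request_costs(setm_us=4_500_000, requests=900_000, avg_service_us=9.0)
print(f"CPU/request = {cpu:.1f} us, non-CPU element = {non_cpu:.1f} us")
```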
  • 18. Structure CPU Experiment
  • 19. Structure CPU Experiment
    – Based on R744SETM Structure Execution Time, Sync Request Rate (virtually no Async) and Sync Service Time
    – One-minute RMF intervals, sorted by request rate increasing
    – Run was 1-way DB2 Datasharing; the only really active structures were ISGLOCK and LOCK1
    – Red lines are CPU time per request; blue lines are service time per request
    – ISGLOCK "low volume": shows amortization of some fixed-cost effect; wondering also if some "practice effect" affects service times; CF used IC links
    – LOCK1 "high volume": more reliable for capacity planning; CF used a mixture of ISC and ICB links
  • 20. [Chart: ISGLOCK Requests. Microseconds vs Requests / Second; CPU Time and Service Time per request]
  • 21. [Chart: LOCK1 Requests. Microseconds vs Requests / Second; CPU Time and Service Time per request]
  • 22. And From My Travels...
    – The next chart isn't from the experiment just described; it's a real customer system
    – A Group Buffer Pool, ISC-connected (necessary for the customer's estate)
    – Clearly something goes wrong at about 1100 requests / second, especially in response time terms but also CPU (the Coupling Facility is not CPU constrained)
    – Options include managing the request rate to below 1100 / sec, working on the request mix, and infrastructure reconfiguration
  • 23. [Chart: the customer Group Buffer Pool data described on the previous slide]
  • 24. IFL
  • 25. IFL Integrated Facility for Linux –Runs Linux ƒPerhaps under VM “Pool 2“ in z990 Separate “Pool 4” in z9 and z10 Labeled “IFL” Can be managed under IRD –Set velocity goals –Weight Management only ƒNot CP management For a good view of utilisation use VM etc monitors –Unless shared IFL –Always runs at full speed © 2009 IBM Corporation 25
  • 26. zAAP and zIIP
  • 27. zAAP and zIIP
    – Must not exceed the number of GCPs
    – Run at full speed, even if GCPs don't
    – Hardcapping but no softcapping
    – zAAP: "Pool 2" engines in z990; separate Pool 3 in z9 and z10 for zAAP
    – Separate Pool 6 for zIIPs
    – Not managed by IRD; weight is the INITIAL LPAR weight
  • 28. zAAP and zIIP Management
    – zAAP-supported workloads can also run on a GCP if IFACROSSOVER=YES
      • zAAP-supported workload runs on a GCP at the priority of the original workload if IFAHONORPRIORITY=YES
      • OA14131 removes the need for IFACROSSOVER=YES in order to use IFAHONORPRIORITY=YES
    – zIIP implementation similar to zAAP
      • Reporting almost identical in RMF and Type 30
      • Simplified management: IFAHONORPRIORITY not used; "YES" behaviour always
  • 29. SMF Type 70 and zAAP (similar for zIIP)
    – SMF70PRF Bit 4 (Product section): IFA processors available
    – SMF70IFA (CPU Control section): number of IFA processors online at end of interval
    – SMF70TYP (CPU Data section): this engine is an IFA
    – SMF70CIX (Logical Processor Data section): 2 if "Pool 2", i.e. IFA, ICF or IFL
  • 30. SMF Type 72 and zAAP (similar for zIIP): RMF Workload Activity Report fragment
    – SERVICE TIMES: TCB 581.6 (TCB time, in seconds), SRB 0.0, RCT 0.0, IIT 0.0, HST 0.0, IFA 0.0 (zAAP time, in seconds)
    – APPL% CP 64.6: GCP % of an engine
    – APPL% IFACP 0.0: % of an engine that could have been zAAP but wasn't
    – APPL% IFA 0.0: % of an engine that used a zAAP
  • 31. SMF Type 72 / 30 and zAAP (similar for zIIP)
    – APPL% IFACP is a subset of APPL% CP
    – Field R723NFFI is the normalization factor for IFA service time
      • Used to convert between real IFA time and normalized IFA time, i.e. equivalent time on a GCP
      • Multiply: R723IFAT x R723NFFI / 256 = normalized IFA time (sketch below)
    – R723IFAU, R723IFCU, R723IFAD state samples: IFA Using, IFA on CP Using, IFA Delay
    – SMF30:
      • SMF30CPT includes time spent on a GCP but eligible for zAAP
      • SMF30_TIME_ON_IFA is time spent on a zAAP
      • SMF30_TIME_IFA_ON_CP is time spent on a GCP but eligible for zAAP
      • Other fields to do with enclaves
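A sketch of the R723NFFI normalization arithmetic above, assuming the raw field values have already been decoded from the type 72 record:

```python
# Normalize zAAP (IFA) time to GCP-equivalent time using R723NFFI,
# a scaled factor with an implied divisor of 256.
def normalized_ifa_time(r723ifat: float, r723nffi: int) -> float:
    return r723ifat * r723nffi / 256.0

# If the zAAPs run at twice the GCP speed, NFFI would be 512, so
# 100 seconds of zAAP time is 200 GCP-equivalent seconds:
print(normalized_ifa_time(r723ifat=100.0, r723nffi=512))  # 200.0
```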
  • 32. [Chart]
  • 33. z/OS Release 8, 9, 10 Changes
  • 34. z/OS Release 8 Changes
    – Type 70 Subtype 1:
      • Engine Counts by Pool: copes with the “large LPAR configuration splits 70-1 record” case; PTF to Release 7
      • Hardware Model: allows you to figure out how many uncharacterised PUs there are, although a z10 E64 with 2 OPTIONAL SAPs probably should be called an “E62”; PTF to Release 7, and requires certain levels of PR/SM microcode
      • Machine Serial Number: correlation with 74-4
    – Type 74 Subtype 4:
      • Machine Serial Number: correlation with 70-1 and with the peer Coupling Facility
      • Structure-Level CPU: requires CFLEVEL=15
  • 35. z/OS Release 9 Changes
    – More than 32 engines in an LPAR
      • The z9 limit is 54; 64 on z10
      • GCPs, zIIPs and zAAPs added together
    – 74-4 CPU enhancements
      • Whether Dynamic Dispatch is active
      • Whether a processor is shared or dedicated
      • Processor weight
      • Requires CFLEVEL 14
  • 36. z/OS Release 10 Changes
    – All RMF records: whether at least one zAAP was online; whether at least one zIIP was online
    – In Type 70, and retrofitted to supported releases:
      • Permanent and Temporary Capacity Models and 3 capacities
      • HiperDispatch (to be covered in a few minutes)
  • 37. Defined- and Group-Capacity Instrumentation
  • 38. Soft Capping and Group Capacity
    – Defined Capacity: a throttle on the rolling 4-hour average of the LPAR (sketch below)
      • When this exceeds the defined capacity, PR/SM softcaps the LPAR (CPU delay in RMF)
      • SMF70PMA: average adjustment weight for pricing management
      • SMF70NSW: number of samples when WLM softcaps the partition
    – Group Capacity: similar to Defined Capacity but for groups of LPARs on the same machine
      • SMF70GJT: timestamp when the system joined the Group Capacity group
      • SMF70GNM: group name
      • SMF70GMU: Group Capacity MSU limit
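A sketch of the rolling 4-hour average mechanism that Defined Capacity throttles on; the per-interval MSU figures and the 15-minute interval length are illustrative assumptions:

```python
# Rolling 4-hour average MSUs vs a defined capacity limit.
# With 15-minute intervals, 4 hours = 16 intervals.
from collections import deque

def softcap_flags(msu_per_interval: list[float], limit: float, window: int = 16):
    recent: deque[float] = deque(maxlen=window)
    flags = []
    for msu in msu_per_interval:
        recent.append(msu)
        rolling_avg = sum(recent) / len(recent)
        flags.append(rolling_avg > limit)  # PR/SM would softcap here
    return flags

# A workload ramping up past an 8 MSU limit, as in the next chart:
usage = [6.0] * 8 + [12.0] * 16
print(softcap_flags(usage, limit=8.0))
```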
  • 39. [Chart] Exceeding 8 MSUs (MSU_VS_CAP > 100%) in the morning leads to active capping (SOFTCAPPED > 0%). Note: OCPU and O2 are CPU queuing numbers.
  • 40. [Chart: LPAR Table Fragment for Group Capacity]
  • 41. [Chart: Rolling 4-Hour Average MSUs as % of Group Cap, by hour] Does something strike you as odd here?
  • 42. Blocked Workloads
  • 43. z/OS Release 9 Blocked Workload Support
    – Rolled back to R.7 and R.8
    – Blocked workloads:
      • Lower priority work may not get dispatched for an elongated time
      • May hold a resource that more important work is waiting for
    – WLM allows some throughput for blocked workloads
      • By dispatching low-importance work from time to time, these “blocked workloads” are no longer blocked
      • Helps to resolve resource contention for workloads that have no resource management implemented
    – Additional information in WSC flash: http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10609
  • 44. IEAOPT BLWLTRPCT and BLWLINTHD (with OA22443)
    – BLWLTRPCT: percentage of the CPU capacity of the LPAR to be used for promotion
      • Specified in units of 0.1%; default is 5 (= 0.5%); maximum is 200 (= 20%)
      • Would only be spent when sufficiently many dispatchable units need promotion
    – BLWLINTHD: threshold time interval for which work must wait before being considered blocked and eligible for promotion (default 20 seconds)
  • 45. [Record layout fragments: Type 70 CPU Control Section; Type 72-3 Service/Report Class Period Data Section]
  • 46. IBM System z10 EC HiperDispatch
  • 47. z10 EC HiperDispatch
    – HiperDispatch: z10 EC unique function
      • Dispatcher Affinity (DA): new z/OS dispatcher
      • Vertical CPU Management (VCM): new PR/SM support
    – Hardware cache optimization occurs when a given unit of work is consistently dispatched on the same physical CPU
      • Up until now software, hardware and firmware have acted independently of each other
      • Non-Uniform Memory Access has forced a paradigm change: CPUs have different distance-to-memory attributes, and memory accesses can take a number of cycles depending upon cache level and local or remote memory accessed
    – The entire z10 EC hardware / firmware / OS stack now tightly collaborates to manage these effects
  • 48. z10 EC HiperDispatch: z/OS Dispatcher Functionality
    – New z/OS dispatcher
      • Multiple dispatching queues, averaging 4 logical processors per queue
      • Tasks distributed amongst queues, with periodic rebalancing of task assignments
      • Generally assigns work to the minimum number of logicals needed to use the weight, expanding to use white space on the box
      • Real-time on/off switch (parameter in IEAOPTxx)
      • May require "tightening up" of WLM policies for important work: priorities are more sensitive with targeted dispatching queues
  • 49. z10 EC HiperDispatch: z/OS Dispatcher Functionality ...
    – Initialization
      • A single HIPERDISPATCH=YES z/OS parameter dynamically activates HiperDispatch (full S/W and H/W collaboration) without an IPL
      • With HIPERDISPATCH=YES, IRD management of CPU is turned OFF
      • Four Vertical High LPs are assigned to each Affinity Node
      • A “Home” Affinity Node is assigned to each address space / task
      • zIIP, zAAP and standard CP “Home” Affinity Nodes must be maintained for work that transitions across specialty engines
      • Benefit increases as LPAR size increases (i.e. crosses books)
  • 50. z10 EC HiperDispatch: z/OS Dispatcher Functionality ...
    – Workload variability issues:
      • Short term: dealing with transient utilization spikes
      • Intermediate: balancing workload across multiple Affinity Nodes; manages “Home” book assignment
      • Long term: mapping z/OS workload requirements to available physical resources, via dynamic expansion into Vertical Low logical processors
  • 51. z10 EC HiperDispatch: PR/SM Functionality
    – New PR/SM support
      • Topology information exchanged with z/OS, which uses it to construct its dispatching queues
      • Classes of logicals: high priority allowed to consume weight, with a tight tie of logical processor to physical processor; low priority generally run only to consume white space
  • 52. z10 EC HiperDispatch: PR/SM Functionality ...
    – Firmware support (PR/SM, millicode)
      • A new z/OS-invoked instruction causes PR/SM to enter “vertical mode”: it assigns the vertical LP subset and their associated LP-to-physical-CP mapping, based upon LPAR weight
      • Enables z/OS to concentrate its work on fewer vertical processors; key in PR/SM overcommitted environments to reduce LP competition for physical CP resources
      • Vertical LPs are assigned High, Medium and Low attributes
      • Vertical Low LPs shouldn't be used unless there is logical white space within the CEC and demand within the LPAR
  • 53. z10 EC HiperDispatch Instrumentation
    – HiperDispatch status: SMF70HHF bits for Supported, Active, Status Changed
    – Parked Time: SMF70PAT in CPU Data Section
    – Polarization Weight: SMF70POW in Logical Processor Data Section (see the sketch below)
      • Highest weight for the LPAR means a Vertical High processor
      • Zero weight means a Vertical Low processor
      • In between means a Vertical Medium processor
    – Example on next foil: 2 x Vertical High (VH), 1 x Vertical Medium (VM), 4 x Vertical Low (VL)
      • Because of HiperDispatch, all engines online in the interval are online all the time; but there are other engines reserved, with Online Time = 0
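A sketch of classifying logical processors from SMF70POW exactly as the slide describes, assuming the weights for one LPAR's logical processors have been extracted; the helper name and values are illustrative:

```python
# Classify logical processors as VH / VM / VL from their polarization
# weights (SMF70POW): highest weight = Vertical High, zero = Vertical Low,
# anything in between = Vertical Medium.
def classify_polarization(weights: list[int]) -> list[str]:
    top = max(weights)
    return ["VH" if w == top else "VL" if w == 0 else "VM" for w in weights]

# The slide's example LPAR: 2 x VH, 1 x VM, 4 x VL
print(classify_polarization([100, 100, 50, 0, 0, 0, 0]))
```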
  • 54. [Chart: Depiction of an LPAR with HiperDispatch enabled; per-logical-engine UNPARKED %, PARKED %, POLAR WEIGHT and I/O %]
  • 55. HiperDispatch “GA2” Support in RMF (OA21140)
    – SMF70POF polarisation indicators, bits 0,1:
      • 00 is “Horizontal” or “Polarisation Not Indicated”
      • 01 is “Vertical Low”
      • 10 is “Vertical Medium”
      • 11 is “Vertical High”
      • (Bit 2 is whether it changed in the interval)
    – SMF70Q00 - SMF70Q12: In & Ready counts based on the number of processors online and unparked; the refinement is to take parking and unparking into account
    – Also SMF70RNM: normalisation factor for zIIP (which happens to be the same for zAAP)
    – Also R744LPN: LPAR Number, for correlation with SMF 70
    – (Also zHPF support)
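A sketch of decoding those SMF70POF indicator bits, taking bit 0 as the high-order bit of the byte per the usual SMF convention; treat the exact bit positions as an assumption to verify against the RMF record mapping:

```python
# Decode SMF70POF: bits 0,1 (high-order two bits) give polarization;
# bit 2 flags a change during the interval.
POLARIZATION = {
    0b00: "Horizontal / not indicated",
    0b01: "Vertical Low",
    0b10: "Vertical Medium",
    0b11: "Vertical High",
}

def decode_smf70pof(byte: int) -> tuple[str, bool]:
    polarization = POLARIZATION[(byte >> 6) & 0b11]  # bits 0,1
    changed = bool((byte >> 5) & 0b1)                # bit 2
    return polarization, changed

print(decode_smf70pof(0b11000000))  # ('Vertical High', False)
```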
  • 56. “Cool It” - Cycle Steering
    – Introduced with z990: http://www.research.ibm.com/journal/rd/483/goth.html
    – Refined in later processors: BOTH frequency- and voltage-reduction in z9
    – When cooling is degraded the processor is progressively slowed
      • Much better than dying
      • A rare event, but should not be ignored
    – WLM policy refreshed, with the admittedly not-that-helpful message:
      • IWM063I WLM POLICY WAS REFRESHED DUE TO A PROCESSOR SPEED CHANGE
      • Automate it
    – SMF70CPA not changed
      • Used as part of SCRT; talk to IBM and consider excluding intervals round such an event
    – R723MADJ is changed
      • Al Sherkow's news item shows an example: http://www.sherkow.com/updates/20081014cooling.html
  • 57. IOPs – I/O Assist Processors
    – Not documented in Type 70, despite being regular engines characterised as IOPs; NOT a pool
    – Instrumentation in Type 78-3
      • Variable-length Control Section with 1 IOP Initiative Queue / Util Data Section per IOP inside it
      • Processor Was Busy / Was Idle counts: NOT processor utilisation as such; suggest stacking the two numbers on a by-hour plot
      • I/O Retry counts: Channel Path Busy, CU Busy, Device Busy
    – Machines can be configured with different numbers of IOPs, depending on the I/O intensiveness of workloads
      • Generally speaking it's only TPF that is said to need extra IOPs
      • Analysis can help get this right
  • 58. In Conclusion
  • 59. In Conclusion
    – Be prepared for fractional engines, multiple engine pools, varying weights etc.
    – Understand the limitations of z/OS image-level CPU utilisation as a number
    – Consider the value of IRD for complex LPAR setups
    – Take advantage of Coupling Facility Structure CPU, for capacity planning and for CF request performance analysis
    – There's additional instrumentation for Defined- and Group-Capacity limits
    – z9 and z10 ARE different from z990, and from each other; and z10 is evolving
    – The CPU data model is evolving: to be more complete, to be more comprehensible, and to meet new challenges such as HiperDispatch's Parked Time state; for example SMF 23 and 113
  • 60. Backup Foils
  • 61. SMF Type 70 Subtype 1 Layout
    – CPU Control Section: control information, such as machine type and model (software)
    – CPU Data Sections: 1 per logical processor for this z/OS image; count is the number that were ever on in the interval
    – ASID Data Area Section: address space distributions
    – PR/SM Partition Data Section: one for each partition, whether active or not
    – PR/SM Logical Processor Data Section: one for each logical engine for each partition, including reserved engines; inactive LPARs have zero sections
    – CPU Identification Section: table containing mnemonics for engine pools and engine counts
  • 62. Other System z9 Changes
    – Multiple speed engines: up to 8 slower-speed GCPs for System z9 Business Class
    – Separate management pools for all engine types, using Pools 3, 4, 5 and 6; Pool 2 obsoleted
    – zAAP Initial Weight can be different from GCP Initial Weight
    – More processors in a CEC / in a book (sketch below):
      • S08, S18, S28 and S38: 12 engines in a book, with 2 spares across the entire CEC and 2 SAPs in a book; so (12 - 2) * #books - 2, compared to (12 - 2 - 2) * #books on z990
      • S54 has 16 in a book (8 2-engine chips), with 2 spares across the entire CEC and 2 SAPs in a book; so (16 - 2) * #books - 2 = 54
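A sketch of that characterisable-PU arithmetic, assuming (per the slide) 2 SAPs per book and 2 spares per CEC on z9:

```python
# Characterisable PUs on a z9: per-book PUs minus the SAPs in each book,
# minus the spares across the whole CEC.
def characterisable_pus(pus_per_book: int, books: int,
                        saps_per_book: int = 2, spares_per_cec: int = 2) -> int:
    return (pus_per_book - saps_per_book) * books - spares_per_cec

print(characterisable_pus(12, 4))  # S38-style books: (12-2)*4 - 2 = 38
print(characterisable_pus(16, 4))  # S54: (16-2)*4 - 2 = 54
```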
  • 63. -> CPU Identification Name Section (6), sample dump:
      #1: +0000: C3D74040 40404040 40404040 40404040 *CP * +0010: 00320000
      #2: +0000: 40404040 40404040 40404040 40404040 * * +0010: 00000000
      #3: +0000: C9C6C140 40404040 40404040 40404040 *IFA * +0010: 00020000
      #4: +0000: C9C6D340 40404040 40404040 40404040 *IFL * +0010: 00000000
      #5: +0000: C9C3C640 40404040 40404040 40404040 *ICF * +0010: 00000000
      #6: +0000: C9C9D740 40404040 40404040 40404040 *IIP * +0010: 00020000
  • 64. -> CPU Control Section (1), sample dump:
      #1: +0000: 20940036 18980000 F7F4F440 40404040 *744 *
          +0010: 40404040 40404040 001B0000 000004B6
          +0020: 0000011F 00000001 00000000 00000000
          +0030: E2F5F440 40404040 40404040 40404040 *S54 *
          +0040: 0001C02F E7C77139 1000F0F2 4040F0F0 *02 00*
          +0050: F0F0F0F0 F0F0F0F0 F0F4C2F1 F0C50000 *0000000004B10E *
          +0060: 00000000 0000
  • 65. -> Local Coupling Facility Data Section (1), sample dump:
      #1: +0000: C3C6F140 40404040 E2E8E2C4 40404040 *CF1 SYSD *
          +0010: 80000000 00000001 00000000 00000000
          +0020: 0000000E 00000007 00000007 00000000
          +0030: 00000000 44142800 00000000 00000000
          +0040: 00000000 00000000 00000000 00000000
          +0050: 00000000 00000000 00000000 00000000
          +0060: 00000000 4040F2F0 F8F4C2F1 F6F0F200 * 2084B1602 *
          +0070: 0000000E 80C08000 C3C2D740 40404040 *CBP *
          +0080: 40404040 40404040 40404040 40404040
          +0090: 40404040 40404040 40404040 40404040
          +00A0: F0F0F0F0 F0F0F0F2 F3C1F6C1 *000000023A6A *
  • 66. A Way of Looking at a Logical Engine: Breaking the RMF Interval Up Into Components
    – (1) Logical CP does not exist: Interval - SMF70ONT
    – (2) Logical CP not dispatched: SMF70ONT - SMF70PDT
    – (3) LPAR overhead: SMF70PDT - SMF70EDT (other overhead is recorded in the PHYSICAL LPAR)
    – (4) Logical CP dispatched for work: SMF70EDT
    – With z10 HiperDispatch there's another state: PARKED; add it to (1) when calculating z/OS CPU utilisation
    – NOTE: if HiperDispatch is enabled, Online Time is normally the RMF interval for non-reserved engines
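A sketch computing those four components for one logical engine, assuming the field values have already been extracted and are all in the same time units:

```python
# Break an RMF interval into the four states for one logical engine.
# ONT = online time, PDT = dispatch time (including LPAR overhead),
# EDT = effective dispatch time (doing work).
def interval_components(interval: float, ont: float, pdt: float, edt: float) -> dict:
    return {
        "1: logical CP does not exist": interval - ont,
        "2: online, not dispatched": ont - pdt,
        "3: LPAR overhead": pdt - edt,
        "4: dispatched for work": edt,
    }

for state, t in interval_components(900.0, ont=900.0, pdt=600.0, edt=540.0).items():
    print(f"{state}: {t:.0f}s")
```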
  • 67. IRD and Standby Images
    – In the IRD example (in the backup foils) R1 and R3 could be viewed as hot standby images: if one fails the other picks up the load
      • In fact workload affinities complicate this: some work has specific affinity to R1, explaining the imbalance, so failover might not work perfectly
    – Multi-machine hot standby cases are pretty similar: it's just that one LPAR suddenly gets much bigger and no others shrink
    – Other resources need to be taken into account, to sustain the oncoming work:
      • zIIPs and zAAPs are not “automatically provisioned” by IRD
      • IRD will probably do the job for disk I/O
      • Real memory needs to be available to support additional work (not managed by IRD)
      • DB2 Virtual Storage needs to plan for the takeover case: mainly threads but also e.g. buffer pools
      • Workload routing: in some cases determined by WLM
  • 68. [Chart: Example of IRD and LPAR Weights, 3 Systems on a z990; weights for R1, R3 and E1 by hour]
  • 69. [Chart: IRD Changing Weights and Engines, 2 LPARs; R1 Weight, R3 Weight, R1 CPs, R3 CPs by hour]
  • 70. SMF Type 70 and IRD
    – SMF70CNF Bit 6 (CPU Data): this z/OS image's engine n reconfigured during interval
    – SMF70CNF Bit 7 (CPU Data): this z/OS image's engine n online at end of interval
    – SMF70BDN (Partition Data): number of engines defined for this LPAR, both online and reserved
    – SMF70SPN (Partition Data): LPAR Cluster Name
    – SMF70ONT (Logical Processor Data): Logical Processor Online Time (only for IRD-capable processors)
    – SMF70BPS (Logical Processor Data): traditional weight (X'FFFF' = reserved)
    – SMF70VPF Bit 2 (Logical Processor Data): weight has changed during interval
    – SMF70MIS / MAS (Logical Processor Data): Max and Min Share
    – SMF70NSI / NSA (Logical Processor Data): number of samples share within 10% of Min / Max
    – SMF70ACS (Logical Processor Data): accumulated processor share; divide by SMF70DSA to get the average share (sketch below)
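A sketch of the SMF70ACS / SMF70DSA division in the last row, assuming both values have already been decoded from the record; the numbers are illustrative:

```python
# Average processor share over the interval: accumulated share (SMF70ACS)
# divided by the number of samples (SMF70DSA).
def average_share(smf70acs: int, smf70dsa: int) -> float:
    return smf70acs / smf70dsa if smf70dsa else 0.0

print(average_share(smf70acs=54_000, smf70dsa=900))  # 60.0
```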
  • 71. [Chart: Online CPs by hour; "1" means the CP was online all hour (chart based on SMF70ONT)]