Much Ado about CPU
Much Ado about CPU Presentation Transcript

  • 1. IBM System z Technical University – Vienna, Austria – May 2-6
    zZS28: Much Ado About CPU
    Martin Packer
    © 2011 IBM Corporation
  • 2. Abstract
    System z and zEnterprise processors have in recent years introduced a number of capabilities of real value to mainframe customers. These capabilities have, however, required changes in the way we think about CPU management. This presentation describes these capabilities and how to evolve your CPU management to take them into account. It is based on the author's experience of evolving his reporting to support these changes. This presentation is substantially enhanced this year.
  • 3. Agenda
    - A brief review of technology
    - Unfinished Business?
    - Coupling Facility CPU
    - zAAP and zIIP
    - z/OS Release 10 Changes
    - Soft Capping and Group Capacity Limits
    - Blocked Workloads
    - z10 Hiperdispatch
    - Cool It
    - I/O Assist Processors (IOPs)
    - SMF 23 and 113
    - In Conclusion
  • 4. A Brief Review of Technology
  • 5. "Characterisable" Engines
    - GCPs – Pool 1
    - (Obsolete Pool 2)
    - ICFs – Pool 5
    - IFLs – Pool 3
    - zAAPs – Pool 4
    - zIIPs – Pool 6
    "Non-Characterisable" Engines
    - SAPs
    - Spares
    With zEnterprise zBX, other engines
    - Not connected in the same way at all
    - Not discussed here
    - Treating as a "z11"
  • 6. Book-Structured
    - Connected by a ring in z9
    - z10 and zEnterprise connect all books to all books directly
    - Data transfers are direct between books via the L2 Cache chip in each book's MCM
    - L2 Cache is shared by every PU on the MCM
    - zEnterprise has an additional per-chip level of cache – and nomenclature "cleaned up"
    - Only 1 book in BC models
  • 7. IRD CPU Management
    Weight Management for GCP engines
    - Alters weights within an LPAR Cluster
    - Shifts of 10% of weight
    CP Management
    - Doesn't work with HiperDispatch
    - Varies LOGICAL CPs on and off
    - Only for GCP engines
    WLM objectives
    - Optimise goal attainment
    - Optimise PR/SM overhead
    - Optimise LPAR throughput
    Part of "On Demand" picture
    - Ensure you have defined reserved engines
    - Make weights sensible to allow shifts to happen
  • 8. Unfinished Business?
    How do we evolve our performance and capacity reporting?
    Should we define an LPAR with dedicated engines? Or with shared engines?
    - What should the weights be – in total and individually?
    - And what about the total for each pool?
    - How many engines should each LPAR have?
    - And IRD makes all this so much more dynamic
  • 9. Increasing Complexity
    Installations are increasing the numbers of LPARs on a machine
    - Many exceed 10 per footprint; expect 20+ soon
    - My record: 51 and 52, 56 – 33 and 34 active, respectively
    And have more logical and physical engines
    And are increasing the diversity of their LPARs
    - Greater incidence of IFLs
    - Fast uptake of zIIPs and zAAPs – sometimes meaning 2 engine speeds
    - Fewer stand-alone CF configurations
    With mergers etc. the number of machines managed by a team is increasing
    And stuff's got more dynamic, too
    As an aside... shouldn't systems be self-documenting?
  • 10. Coupling Facility CPU
  • 11. Internal Coupling Facility (ICF)
    Managed out of Pool 5
    - Pool numbers given in SMF 70 as an index into a table of labels – the label is "ICF"
    - Recommendation: manage in reporting as a separate pool
    Follow special CF sizing guidelines
    - Especially for takeover situations
    Always runs at full speed
    - So a good technology match for coupled z/OS images on the same footprint
    - Another good reason to use ICFs is IC links
    Shared ICFs strongly discouraged for Production
    - Especially if the CF image has Dynamic Dispatch turned on
  • 12. ICF ...
    Need to correlate SMF 70-1 with SMF 74-4 CF Utilisation to get a proper CPU picture
    - Since z/OS Release 8, 74-4 has the machine serial number – allows correlation in most cases
    - Partition number added to 74-4 in OA21140
      - Enables correlation with 70-1 when the LPAR name is not the Coupling Facility name
  • 13. Structure-Level CPU Consumption
    CFLEVEL 15 and z/OS R.9
    - Always 100% Capture Ratio
    - Adds up to R744PBSY
    Multiple uses:
    - Capacity planning for changing request rates
    - Examine which structures are large consumers
    - Compute the CPU cost of a request
      - And compare to service time
      - The interesting number is the "non-CPU" element of service time – as we shall see
    NOTE: Need to collect 74-4 data from all sharing z/OS systems to get the total request rate
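The per-request CPU cost calculation above can be sketched as follows. This is an illustrative sketch, not part of the original presentation: the parsing of SMF 74-4 records is assumed done elsewhere, and the input values are made up. It shows the arithmetic of dividing structure CPU time (the R744PBSY-style busy time) by the request count summed across all sharing systems, then deriving the "non-CPU" element of service time.

```python
# Hedged sketch: estimate per-request CPU cost for a CF structure from
# structure-level CPU data (SMF 74-4). Record parsing is assumed done
# elsewhere; the numbers below are illustrative.

def cpu_per_request_us(structure_cpu_seconds, total_requests):
    """Average CPU microseconds per request for one structure.

    total_requests must be summed over ALL sharing z/OS systems,
    as the slide notes, or the cost is overstated.
    """
    if total_requests == 0:
        return 0.0
    return structure_cpu_seconds * 1_000_000 / total_requests

def non_cpu_service_us(avg_service_us, structure_cpu_seconds, total_requests):
    """The 'interesting number': the non-CPU element of service time."""
    return avg_service_us - cpu_per_request_us(structure_cpu_seconds, total_requests)

# Example: a lock structure consuming 2.5 CPU seconds over 500,000 requests
print(cpu_per_request_us(2.5, 500_000))          # 5.0 microseconds
print(non_cpu_service_us(8.0, 2.5, 500_000))     # 3.0 microseconds
```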
  • 14. Structure CPU ...
    Where not trivial I plot Sync Request %
    - Shows if there is deterioration with load
    Different request types and technologies behave markedly differently
    - For example, modern lock structures locally accessed are typically around 5us CPU and elapsed, or lower
    - For example, XCF structures are often in hundreds of us elapsed
      - And quite high CPU
      - Though obviously all async
  • 15. zAAP and zIIP
  • 16. zAAP and zIIP
    - Must each not exceed the number of GCPs
    - Run at full speed, even if GCPs don't
      - Instrumentation documents the "speed" difference
    - Hardcapping but no softcapping
      - No Resource Group capping
    - Not managed by IRD
      - Weight is the INITIAL LPAR weight
  • 17. (chart slide – no transcript text)
  • 18. zAAP on zIIP
    New with z/OS Release 11
    - Retrofitted to R.9 and R.10 with OA27495
    Not available if you already have zAAPs installed
    - Or have reserved zAAP logical engines
    Designed to enable further use of perhaps-underused zIIPs
    Does not change the configuration rules relative to GCPs
    Does not suddenly make zAAP-eligible work look like zIIP-eligible in terms of SRBs etc
    No special metrics
    - e.g. zAAP work now in zIIP bucket
    - e.g. zAAP-eligible now in zIIP-eligible bucket
  • 19. zIIP Instrumentation – Subsystems and Address Spaces
    Instrumentation on consumption and potential for a number of exploiters:
    - The latter is e.g. "zAAP on GCP"
    Type 30 Address Space – Interval and Step/Job-End
    - Takes RMF Workload Activity (72-3) to the address space level
    DB2 Accounting Trace
    - Type 101 shows zIIP USED times by usage category
      - At plan and package level
      - ELIGIBLE is only reported on up to Version 9
    WebSphere Application Server
    - Type 120 Subtype 9 (Request Activity)
      - Both zIIP and zAAP usage and potential
  • 20. z/OS Release 10 Changes
  • 21. z/OS Release 10 Changes
    All RMF Records:
    - Whether at least one zAAP was online
    - Whether at least one zIIP was online
    In Type 70 and retrofitted to supported releases:
    - Permanent and Temporary Capacity Models and 3 capacities
    - Hiperdispatch
      - To be covered in a few minutes
  • 22. Defined- and Group-Capacity Instrumentation
  • 23. Soft Capping and Group Capacity
    Defined Capacity
    - A throttle on the rolling 4-hour average of the LPAR
      - When this exceeds the defined capacity PR/SM softcaps the LPAR
      - CPU delay in RMF
    - SMF70PMA: Average Adjustment Weight for pricing management
    - SMF70NSW: Number of samples when WLM softcaps the partition
    Group Capacity
    - Similar to Defined Capacity but for groups of LPARs on the same machine
    - SMF70GJT: Timestamp when the system joined the Group Capacity group
    - SMF70GNM: Group name
    - SMF70GMU: Group Capacity MSU limit
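The defined-capacity throttle above can be sketched numerically. This is an illustrative model, not the PR/SM algorithm itself: the interval length (5 minutes, so 48 intervals per 4 hours) and the MSU samples are assumptions for the example.

```python
# Hedged sketch of the defined-capacity mechanism: the LPAR is softcapped
# whenever its rolling 4-hour average MSU consumption exceeds the defined
# capacity. Interval length and samples here are illustrative.
from collections import deque

def softcap_decisions(msu_samples, defined_capacity, intervals_per_4h=48):
    """Return (rolling_average, softcapped) per interval."""
    window = deque(maxlen=intervals_per_4h)
    out = []
    for msu in msu_samples:
        window.append(msu)
        avg = sum(window) / len(window)
        out.append((avg, avg > defined_capacity))
    return out

# Three intervals at 10 MSUs against an 8-MSU defined capacity:
print(softcap_decisions([10, 10, 10], 8))
# [(10.0, True), (10.0, True), (10.0, True)]
```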
  • 24. Exceeding 8 MSUs (MSU_VS_CAP > 100%) in the morning leads to active capping (SOFTCAPPED > 0%). Note: OCPU and O2 are CPU Queuing numbers.
  • 25. Group Capacity Limits
    Each partition (z/OS system) manages itself
    - Group capacity is based on the defined capacity implementation
    - The 4hr rolling average of group MSU consumption is used for managing the group's partitions
    Each partition is aware of the consumption of all other partitions on the CPC
    - And identifies all other partitions that are members of the same capacity group
    - Calculates its defined share of the capacity group, based on the partition weight
      - This share is the target for the partition if all partitions of the group want to use as much CPU as possible
    If some LPARs do not consume their share, the unused capacity will be distributed over those LPARs that need additional capacity
    If a defined capacity limit is defined for a partition, that limit will not be violated even when the partition receives capacity from others
    WLM will only manage partitions with shared CPs and WC=NO
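The share calculation and redistribution described above can be sketched as follows. This is a deliberately simplified model, not WLM's actual algorithm (which iterates and works on the 4-hour rolling average): shares are weight-proportional slices of the group limit, and one naive pass hands unused capacity to partitions that want more.

```python
# Hedged sketch of group-capacity sharing: each partition's entitled share
# of the group MSU limit is proportional to its weight; capacity unused by
# partitions below their share goes to partitions that need more. The real
# WLM algorithm is iterative; this single pass is only illustrative.

def group_shares(weights, group_limit):
    """Weight-proportional MSU share per partition."""
    total = sum(weights.values())
    return {lp: group_limit * w / total for lp, w in weights.items()}

def distribute(weights, demand, group_limit):
    """One illustrative redistribution pass over the group."""
    share = group_shares(weights, group_limit)
    grant = {lp: min(demand[lp], share[lp]) for lp in weights}
    spare = group_limit - sum(grant.values())
    hungry = [lp for lp in weights if demand[lp] > grant[lp]]
    for lp in hungry:  # naive even split of the spare capacity
        extra = min(spare / len(hungry), demand[lp] - grant[lp])
        grant[lp] += extra
    return grant

w = {"LPAR1": 60, "LPAR2": 40}
print(group_shares(w, 100))   # {'LPAR1': 60.0, 'LPAR2': 40.0}
# LPAR1 only wants 30 MSUs, so LPAR2 can exceed its 40-MSU share:
print(distribute(w, {"LPAR1": 30, "LPAR2": 80}, 100))
```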
  • 26. LPAR Table Fragment for Group Capacity (chart slide)
  • 27. Blocked Workloads
  • 28. z/OS Release 9 Blocked Workload Support
    Rolled back to R.7 and R.8
    Blocked workloads:
    - Lower-priority work may not get dispatched for an elongated time
    - May hold a resource that more important work is waiting for
    WLM allows some throughput for blocked workloads
    - By dispatching low-importance work from time to time, these "blocked workloads" are no longer blocked
    - Helps to resolve resource contention for workloads that have no resource management implemented
    Additional information in WSC flash:
    - http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10609
    Additional instrumentation in 70-1 and 72-3
  • 29. IEAOPT BLWLTRPCT and BLWLINTHD (with OA22443)
    BLWLTRPCT
    - Percentage of the CPU capacity of the LPAR to be used for promotion
    - Specified in units of 0.1%
    - Default is 5 (= 0.5%); maximum is 200 (= 20%)
    - Only spent when sufficiently many dispatchable units need promotion
    BLWLINTHD
    - Specifies the threshold time interval for which a blocked address space or enclave must wait before being considered for promotion
    - Minimum is 5 seconds; maximum is 65535 seconds; default is 60 seconds
  • 30. Type 70 CPU Control Section / Type 72-3 Service/Report Class Period Data Section (chart slide)
  • 31. IBM System z10 EC HiperDispatch
  • 32. z10 EC HiperDispatch
    HiperDispatch – z10 EC unique function
    - Dispatcher Affinity (DA) – new z/OS Dispatcher
    - Vertical CPU Management (VCM) – new PR/SM support
    Hardware cache optimization occurs when a given unit of work is consistently dispatched on the same physical CPU
    - Up until now software, hardware, and firmware have acted independently of each other
    - Non-Uniform Memory Access has forced a paradigm change
      - CPUs have different distance-to-memory attributes
      - Memory accesses can take a number of cycles depending upon cache level / local or remote memory accessed
    The entire z10 EC hardware / firmware / OS stack now tightly collaborates to manage these effects
  • 33. z10 EC HiperDispatch – z/OS Dispatcher Functionality
    New z/OS Dispatcher
    - Multiple dispatching queues
      - Average 4 logical processors per queue
    - Tasks distributed amongst queues
    - Periodic rebalancing of task assignments
    - Generally assigns work to the minimum number of logicals needed to use the weight
      - Expands to use white space on the box
    - Real-time on/off switch (parameter in IEAOPTxx)
    - May require "tightening up" of WLM policies for important work
      - Priorities are more sensitive with targeted dispatching queues
  • 34. z10 EC HiperDispatch – z/OS Dispatcher Functionality ...
    Initialization:
    - A single HIPERDISPATCH=YES z/OS parameter dynamically activates HiperDispatch (full S/W and H/W collaboration) without an IPL
      - With HIPERDISPATCH=YES, IRD management of CPU is turned OFF
    - Four Vertical High LPs are assigned to each Affinity Node
    - A "Home" Affinity Node is assigned to each address space / task
    - zIIP, zAAP and standard CP "Home" Affinity Nodes must be maintained for work that transitions across specialty engines
    Benefit increases as LPAR size increases (i.e. crosses books)
  • 35. z10 EC HiperDispatch – z/OS Dispatcher Functionality ...
    Workload Variability Issues:
    - Short Term
      - Dealing with transient utilization spikes
    - Intermediate
      - Balancing workload across multiple Affinity Nodes
      - Manages "Home" Book assignment
    - Long Term
      - Mapping z/OS workload requirements to available physical resources
      - Via dynamic expansion into Vertical Low Logical Processors
  • 36. z10 EC HiperDispatch – PR/SM Functionality
    New PR/SM Support
    - Topology information exchanged with z/OS
      - z/OS uses this to construct its dispatching queues
    - Classes of logicals
      - High priority allowed to consume weight
        - Tight tie of logical processor to physical processor
      - Low priority generally run only to consume white space
  • 37. z10 EC HiperDispatch – PR/SM Functionality ...
    Firmware Support (PR/SM, millicode)
    - New z/OS-invoked instruction to cause PR/SM to enter "Vertical mode"
      - To assign the vertical LP subset and their associated LP-to-physical-CP mapping
      - Based upon LPAR weight
    - Enables z/OS to concentrate its work on fewer vertical processors
      - Key in PR/SM overcommitted environments to reduce the LP competition for physical CP resources
    - Vertical LPs are assigned High, Medium, and Low attributes
    - Vertical Low LPs shouldn't be used unless there is logical white space within the CEC and demand within the LPAR
  • 38. z10 EC HiperDispatch Instrumentation
    Hiperdispatch status
    - SMF70HHF bits for Supported, Active, Status Changed
    Parked Time
    - SMF70PAT in CPU Data Section
    Polarization Weight
    - SMF70POW in Logical Processor Data Section
      - Highest weight for the LPAR means a Vertical High processor
      - Zero weight means a Vertical Low processor
      - In between means a Vertical Medium processor
    Example on next foil
    - 2 x Vertical High (VH)
    - 1 x Vertical Medium (VM)
    - 4 x Vertical Low (VL)
    - Because of HiperDispatch, all engines online in the interval are online all the time
      - But there are other engines reserved, so with Online Time = 0
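The VH/VM/VL classification rule above can be sketched directly from the SMF70POW values. This is an illustrative sketch: the sample weights are made up, and extracting SMF70POW from the Logical Processor Data Section is assumed done elsewhere.

```python
# Hedged sketch: classify logical processors from SMF70POW per the slide's
# rule - the highest weight for the LPAR means Vertical High, zero means
# Vertical Low, anything in between means Vertical Medium. Sample weights
# are illustrative.

def classify(polar_weights):
    """Map SMF70POW values to VH/VM/VL labels."""
    top = max(polar_weights)
    def kind(w):
        if w == 0:
            return "VL"
        return "VH" if w == top else "VM"
    return [kind(w) for w in polar_weights]

# 2 x VH, 1 x VM, 4 x VL, matching the slide's example:
print(classify([100, 100, 50, 0, 0, 0, 0]))
# ['VH', 'VH', 'VM', 'VL', 'VL', 'VL', 'VL']
```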
  • 39. Depiction Of An LPAR – With HiperDispatch Enabled
    (Chart: UNPARKED %, PARKED %, POLAR WEIGHT and I/O % for logical CPs 0-6)
  • 40. HiperDispatch "GA2" Support in RMF – OA21140
    SMF70POF Polarisation Indicators
    - Bits 0,1:
      - 00 is "Horizontal" or "Polarisation Not Indicated"
      - 01 is "Vertical Low"
      - 10 is "Vertical Medium"
      - 11 is "Vertical High"
    - (Bit 2 is whether it changed in the interval)
    SMF70Q00 – SMF70Q12
    - In & Ready counts based on the number of processors online and unparked
    - Refinement is to take into account parking and unparking
    Also SMF70RNM: Normalisation factor for zIIP
    - Which happens to be the same for zAAP
    Also R744LPN – LPAR Number
    - For correlation with SMF 70
    (Also zHPF support)
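The SMF70POF bit layout above can be sketched as a decoder. This is an assumption-laden sketch: the slide gives the meaning of bits 0-2 but not the full byte layout, so treating bits 0,1 as the two high-order bits of a one-byte field (S/390 bit numbering counts from the left) is an interpretation, not documented fact.

```python
# Hedged sketch: decode the SMF70POF polarisation indicators per the
# slide. ASSUMPTION: bits 0,1 are the two high-order bits of a one-byte
# field (left-to-right S/390 bit numbering), bit 2 flags a change in the
# interval; verify against the RMF record mapping before relying on this.

POLARITY = {
    0b00: "Horizontal / Not Indicated",
    0b01: "Vertical Low",
    0b10: "Vertical Medium",
    0b11: "Vertical High",
}

def decode_smf70pof(byte):
    """Return (polarisation, changed_in_interval) for one SMF70POF byte."""
    bits01 = (byte >> 6) & 0b11     # bits 0,1 in left-to-right numbering
    changed = bool((byte >> 5) & 1) # bit 2
    return POLARITY[bits01], changed

print(decode_smf70pof(0b11000000))  # ('Vertical High', False)
print(decode_smf70pof(0b01100000))  # ('Vertical Low', True)
```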
  • 41. "Cool It" – Cycle Steering
    Introduced with z990
    - http://www.research.ibm.com/journal/rd/483/goth.html
    Refined in later processors
    - BOTH frequency- and voltage-reduction in z9
    When cooling is degraded the processor is progressively slowed
    - Much better than dying
    - Rare event, but should not be ignored
    WLM Policy refreshed
    - Admittedly not that helpful a message:
      - IWM063I WLM POLICY WAS REFRESHED DUE TO A PROCESSOR SPEED CHANGE
      - Automate it
    SMF70CPA not changed
    - Used as part of SCRT
    - Talk to IBM and consider excluding intervals around such an event
    R723MADJ is changed
    - Al Sherkow's news item shows an example:
      - http://www.sherkow.com/updates/20081014cooling.html
    In R.12 Types 89, 70, 72 and 30 have instrumentation for this situation
  • 42. IOPs – I/O Assist Processors
    Not documented in Type 70
    - Despite being regular engines characterised as IOPs
    - NOT a pool
    Instrumentation in Type 78-3
    - Variable-length Control Section
      - 1 IOP Initiative Queue / Util Data Section per IOP inside it
    - Processor Was Busy / Was Idle counts
      - NOT Processor Utilisation as such
      - Suggest stacking the two numbers on a by-hour plot
    - I/O Retry counts
      - Channel Path Busy, CU Busy, Device Busy
    Machines can be configured with different numbers of IOPs
    - Depending on the I/O intensiveness of workloads
      - Generally speaking it's only TPF that is said to need extra IOPs
    - Analysis can help get this right
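The stacked busy/idle plot suggested above needs the two 78-3 counts turned into percentages first. A minimal sketch, with illustrative counts; the extraction of the Was Busy / Was Idle fields from the IOP data sections is assumed done elsewhere.

```python
# Hedged sketch: convert the per-IOP "Processor Was Busy / Was Idle"
# counts from SMF 78-3 into the stacked busy/idle percentages the slide
# suggests plotting by hour. The counts below are illustrative.

def iop_busy_idle_pct(was_busy, was_idle):
    """Return (busy %, idle %) for one IOP over one interval."""
    total = was_busy + was_idle
    if total == 0:
        return 0.0, 0.0
    return 100.0 * was_busy / total, 100.0 * was_idle / total

# One IOP sampled busy 300 times and idle 700 times in an interval:
print(iop_busy_idle_pct(300, 700))  # (30.0, 70.0)
```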
  • 43. SMF 23 and 113
  • 44. SMF 23
    SMF 23 – the "SMF" record
    New extensions to the SMF 23 record
    - Provide information related to Dispatching, Storage and I/O
    - Available on z/OS 1.8 and above
    Why would you want to collect them?
    - They may provide a way to help characterize your workload to improve your capacity planning
      - The LoIO Mix in zPCR is simply an estimate of your actual workload pattern
    Record Size and Interval
    - Small record – 210 bytes (258 bytes with "deltas") per system per interval
  • 45. What is in the SMF 23s? – New Fields via APAR OA22414
    Storage
    - Total number of Getmain requests (NGR)
    - Total pages backed during Getmain requests (PBG)
    - Total number of Fix requests for storage below 2 GB (NFR)
    - Total number of frames for Fix requests for storage below 2 GB (PFX)
    Faults
    - Total number of first reference faults (1RF)
    - Total number of non-first-reference faults (NRF)
    I/Os
    - Total number of I/Os (NIO)
    Dispatches
    - Number of unlocked TCB dispatches (TCB)
    - Number of SRB dispatches (SRB)
    APAR OA27161 – closed 1/19/2009
    - Provides "delta" counters for the above fields
    - Otherwise "cumulative" counters
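The cumulative-versus-delta distinction above matters in practice: without OA27161's delta counters, per-interval rates have to be derived by differencing consecutive records. A minimal sketch, with illustrative values; the field abbreviations mirror the slide, and the record parsing is assumed done elsewhere.

```python
# Hedged sketch: derive per-interval deltas from cumulative SMF 23
# counters (the pre-OA27161 case). Field names follow the slide's
# abbreviations; record values are illustrative.

def deltas(records, fields=("NGR", "NIO", "TCB", "SRB")):
    """Per-interval deltas from consecutive cumulative counter snapshots."""
    out = []
    for prev, cur in zip(records, records[1:]):
        out.append({f: cur[f] - prev[f] for f in fields})
    return out

recs = [
    {"NGR": 1000, "NIO": 500, "TCB": 9000, "SRB": 400},
    {"NGR": 1300, "NIO": 650, "TCB": 9900, "SRB": 460},
]
print(deltas(recs))  # [{'NGR': 300, 'NIO': 150, 'TCB': 900, 'SRB': 60}]
```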
  • 46. What is the z10 CPU Measurement Facility?
    New hardware instrumentation facility: "CPU Measurement Facility" (CPU MF)
    - Available on System z10 EC GA2 and z10 BC
    - Supported by a new z/OS component (Instrumentation), Hardware Instrumentation Services (HIS)
    Potential future uses for this new "cool" virtualization technology
    - CPU MF provides support built into the processor hardware
      - So the exploiting mechanism allows the observation of performance behavior with nearly no impact to the system being observed
    - Potential uses:
      - Future workload characterization
      - ISV product improvement
      - Application tuning
  • 47. CPU MF ...
    Data collection done by System z hardware
    - Low overhead
    - Little/no skew in sampling
    - Access to information which is not available from software
    SAMPLING
    - SAMPFREQ=800000 is the default (samples per minute) = 13,333/s
      - 8M samples in 10 minutes is the default (DURATION=10 is the default, 10 minutes)
    - Recommendation:
      - Start with a small frequency, e.g. SAMPFREQ=320, and increase after early experiences
        - e.g. ensure enough disk space for output
      - Smaller z10 BCs should increase only up to SAMPFREQ=130000 (for DURATION=60)
    New IBM Research article "IBM System z10 performance improvements with software and hardware synergy"
    - http://www.research.ibm.com/journal/rd/531/jackson.pdf
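The sampling arithmetic quoted above (800,000 samples per minute ≈ 13,333/s; 8M samples in a default 10-minute run) can be checked directly:

```python
# Hedged sketch: verify the CPU MF sampling figures from the slide.
# SAMPFREQ is in samples per minute; DURATION is in minutes.

def samples_per_second(sampfreq):
    return sampfreq / 60

def total_samples(sampfreq, duration_min):
    return sampfreq * duration_min

# Defaults from the slide: SAMPFREQ=800000, DURATION=10
print(round(samples_per_second(800_000)))  # 13333 per second
print(total_samples(800_000, 10))          # 8000000 samples
```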
  • 48. COUNTERS
    Basic Counter Set
    - Cycle count
    - Instruction count
    - Level-1 I-cache directory write count
    - Level-1 I-cache penalty cycle count
    - Level-1 D-cache directory write count
    - Level-1 D-cache penalty cycle count
    Problem State Counter Set
    - Problem state cycle count
    - Problem state instruction count
    - Problem state level-1 I-cache directory write count
    - Problem state level-1 I-cache penalty cycle count
    - Problem state level-1 D-cache directory write count
    - Problem state level-1 D-cache penalty cycle count
    Extended Counter Set
    - Number and meaning of counters are model-dependent
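Two derived metrics commonly computed from the Basic Counter Set above are cycles per instruction (CPI) and the share of cycles lost to L1 cache-miss penalties. This sketch is not from the presentation: the counter values are illustrative, and treating the penalty-cycle counters as directly summable against total cycles is the usual first-order interpretation, not a documented formula here.

```python
# Hedged sketch: first-order metrics from the Basic Counter Set.
# Counter values are illustrative; interpretation of the penalty-cycle
# counters is the conventional one, not quoted from the slide.

def cpi(cycles, instructions):
    """Cycles per instruction."""
    return cycles / instructions

def l1_penalty_fraction(cycles, i_penalty_cycles, d_penalty_cycles):
    """Fraction of all cycles spent in L1 I- and D-cache miss penalties."""
    return (i_penalty_cycles + d_penalty_cycles) / cycles

c, i = 10_000_000, 4_000_000
print(cpi(c, i))                                      # 2.5
print(l1_penalty_fraction(c, 1_000_000, 2_000_000))   # 0.3
```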
  • 49. Crypto Activity Counter Set (CPACF activity)
    For each of PRNG, SHA, DES and AES:
    - function count
    - cycle count
    - blocked function count
    - blocked cycle count
  • 50. Sample Report – Basic / Extended Counters: z10 L1 Cache Hierarchy Sourcing (chart slide)
  • 51. In Conclusion
  • 52. In Conclusion
    Be prepared for fractional engines, multiple engine pools, varying weights etc
    Understand the limitations of z/OS Image-Level CPU Utilisation as a number
    Take advantage of Coupling Facility Structure CPU
    - For Capacity Planning
    - For CF Request Performance Analysis
    There's additional instrumentation for Defined- and Group-Capacity limits
    z9, z10 and zEnterprise ARE different from z990 – and from each other
    The CPU data model is evolving
    - To be more complete
    - To be more comprehensible
    - To meet new challenges
      - Such as Hiperdispatch's Parked Time state
      - For example SMF 23 and 113