Presentation given at CMG Brasil 2013

z/VM Performance Analysis

  1. z/VM Performance Analysis
     Lívio Sousa - livios@br.ibm.com
     IBM zEnterprise Client Technical Specialist
  2. Overview
     • Guidelines
     • Commands
     • *MONITOR
     • Performance Toolkit
     • Omegamon XE
  3. Definition of Performance
     • Performance definitions:
       – Response time
       – Batch elapsed time
       – Throughput
       – Resource consumed per unit of work done
       – Utilization
       – Users supported
       – Phone ringing
       – Consistency
     • All of the above
  4. Performance Guidelines
     • Processor
     • Storage
     • Paging
     • Minidisk cache
     • Server machines
  5. Processor Guidelines
     • Dedicated processors - mostly political
       – Absolute share can be almost as effective
       – Gets wait state assist and 500 ms minor time slice
       – Perhaps not a good idea if you are CPU-constrained
       – A virtual machine should have all dedicated or all shared processors
     • Share settings
       – Use absolute if you can judge the percentage of resources required
       – Use relative if that is difficult to judge and a lower share as system load increases is acceptable
       – Be aware that the share value is split across the vCPUs
       – Do not use LIMITHARD settings unnecessarily
         • Masks looping users
         • More scheduler overhead
     • Use the right number of virtual processors for the guest's workload
     • Don't share all available IFLs with all LPARs
       – Suspend Time can be high
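The note that a share value is split by vCPUs can be illustrated with a small calculation. This is a minimal sketch that assumes the share is divided evenly among the virtual processors; the function name and the example values (a relative share of 200 on a 4-vCPU guest) are hypothetical and do not come from the slides.

```python
def share_per_vcpu(share_value: float, vcpus: int) -> float:
    """Return the portion of a guest's share that each virtual CPU receives.

    Assumption: the share is split evenly across the guest's vCPUs.
    """
    if vcpus < 1:
        raise ValueError("a guest needs at least one virtual CPU")
    return share_value / vcpus


# Hypothetical guest: relative share 200 spread over 4 virtual CPUs.
per_vcpu = share_per_vcpu(200, 4)
print(f"Each vCPU competes with a share of {per_vcpu:.0f}")
```

Under this assumption, adding virtual processors without raising the share lowers the share each one competes with, which is one reason to define only as many vCPUs as the workload needs.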
  6. Storage Guidelines
     • Virtual-to-real ratio should be <= 3:1, or make sure the paging system is robust
       – To avoid any performance impact for production workloads, you may need to keep the ratio at 1:1
       – See also http://www.vm.ibm.com/perf/tips/memory.html
       – VIR2REAL EXEC (Bruce Hayden) http://www.vm.ibm.com/download/packages/descript.cgi?VIR2REAL
     • Define some processor storage as expanded storage to provide a paging hierarchy
       – For more background, see http://www.vm.ibm.com/perf/tips/storconf.html
     • Size guests appropriately
       – Avoid over-provisioning
       – Do not put them in a high guest paging position
       – Right-sized usually means "just barely swapping"
     • Exploit shared memory where possible
       – IPL your Linux guests from a segment
       – Use the Linux XIP (execute-in-place) file system

     Total Virtual storage (all logged on userids):  388308 MB (379.2 GB)
     Usable real storage (pageable) for this system: 202927 MB (198.2 GB)
     Total LPAR Real storage:                        204800 MB (200.0 GB)
     Expanded storage usable for paging:              25600 MB ( 25.0 GB)
     Total Virtual disk (VDISK) space defined:        50176 MB ( 49.0 GB)
     Average Virtual disk size:                         512 MB
     Virtual + VDISK to Real storage ratio:             2.2 : 1
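The ratio on the last line of the storage summary can be reproduced by hand. The sketch below assumes the ratio is (total virtual + VDISK) divided by the usable pageable real storage; that assumption gives the 2.2 : 1 shown on the slide, but it is an inference from the numbers, not a statement of how VIR2REAL computes it. The function name and the 3:1 constant name are illustrative.

```python
GUIDELINE_MAX_RATIO = 3.0  # "<= 3:1" guideline from the slide


def virtual_to_real_ratio(virtual_mb: float, vdisk_mb: float, usable_real_mb: float) -> float:
    """Ratio of defined virtual storage (guests plus VDISK) to pageable real storage."""
    return (virtual_mb + vdisk_mb) / usable_real_mb


# Values taken from the storage summary on the slide (all in MB).
ratio = virtual_to_real_ratio(388308, 50176, 202927)
print(f"Virtual + VDISK to real: {ratio:.1f} : 1")
print("Within guideline" if ratio <= GUIDELINE_MAX_RATIO else "Above guideline: check paging capacity")
```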
  7. Paging Guidelines
     • Keep DASD paging allocations at or below 50%
       – QUERY ALLOC PAGE
     • Watch blocks read per paging request (keep >10)
       – Long block runs make paging I/O efficient
     • Multiple volumes and multiple paths
       – Remember, one I/O per real device at a time
       – Use lots of little volumes rather than a few big volumes
       – Pay attention to Response Time and Wait Queues
     • Do not mix sizes of paging DASD
       – Use all -3s, or all -9s, or whatever
     • Paging to FCP SCSI (EDEVICES) may offer higher paging bandwidth with higher processor requirements
       – See also http://www.vm.ibm.com/perf/tips/prgpage.html
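The two numeric paging guidelines (allocation at or below 50%, more than 10 blocks per paging read) lend themselves to a simple automated check. This is a sketch only: the function name and the way the two values are obtained are assumptions; in practice they would be read off QUERY ALLOC PAGE output or a Performance Toolkit report.

```python
def check_paging(slot_utilization_pct: float, blocks_per_read: float) -> list[str]:
    """Return a list of warnings for the two numeric paging guidelines on the slide."""
    warnings = []
    if slot_utilization_pct > 50:
        warnings.append(
            f"Paging space is {slot_utilization_pct:.0f}% allocated (guideline: <= 50%)"
        )
    if blocks_per_read <= 10:
        warnings.append(
            f"Only {blocks_per_read:.1f} blocks per paging read (guideline: > 10)"
        )
    return warnings


# Hypothetical readings: 25% slot utilization, 12 blocks per read.
for message in check_paging(25, 12) or ["Paging guidelines met"]:
    print(message)
```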
  8. Reorder Processing - Background
     • Page reorder is the process in z/VM of managing user frame owned lists as input to demand scan processing.
       – It includes resetting the HW reference bit.
       – Serializes the virtual machine (all virtual processors).
       – In all releases of z/VM
     • It is done periodically on a virtual machine basis.
     • The cost of reorder is proportional to the number of resident frames for the virtual machine.
       – Roughly 130 ms/GB resident
       – Delays of ~1 second for a guest having 8 GB resident
       – This can vary by +/- 40% for different reasons
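The reorder cost figures above amount to a linear estimate. The sketch below applies the slide's rule of thumb (about 130 ms per GB resident, varying by +/- 40%); the function and constant names are illustrative, and the result is an estimate, not a measurement.

```python
MS_PER_GB_RESIDENT = 130.0  # rule of thumb from the slide
VARIATION = 0.40            # "+/- 40%" from the slide


def reorder_delay_ms(resident_gb: float) -> tuple[float, float, float]:
    """Return (low, nominal, high) estimated reorder delay in milliseconds."""
    nominal = MS_PER_GB_RESIDENT * resident_gb
    return nominal * (1 - VARIATION), nominal, nominal * (1 + VARIATION)


# The slide's own example: a guest with 8 GB resident.
low, nominal, high = reorder_delay_ms(8)
print(f"8 GB resident: about {nominal:.0f} ms (roughly {low:.0f} to {high:.0f} ms)")
```

For the 8 GB case the nominal value is 1040 ms, which matches the "~1 second" quoted on the slide.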
  9. Reorder Processing - Diagnosing
     • Performance Toolkit
       – Check resident page fields (“R<2GB” & “R>2GB”) on FCX113 UPAGE report
         • Remember, Reorder works against the resident pages, not total virtual machine size.
       – Check Console Function Mode Wait (“%CFW”) on FCX114 USTAT report
         • A virtual machine may be brought through console function mode to serialize Reorder. There are other ways to serialize for Reorder and there are other reasons for CFW, so this is not conclusive.
     • REORDMON
       – Available from the VM Download Page http://www.vm.ibm.com/download/packages/
       – Works against raw MONWRITE data for all monitored virtual machines
       – Works in real time for a specific virtual machine
       – Reports how often Reorder processing occurs in each monitor interval
  10. REORDMON Example

                Num. of   Average   Average
      Userid   Reorders Rsdnt(MB) Ref'd(MB) Reorder Times
      -------- -------- --------- --------- -------------------
      LINUX002        2     18352     13356 13:29:05 14:15:05
      LINUX001        1     22444      6966 13:44:05
      LINUX005        1     14275      5374 13:56:05
      LINUX003        2     21408     13660 13:43:05 14:10:05
      LINUX007        1     12238      5961 13:51:05
      LINUX006        1      9686      4359 13:31:05
      LINUX004        1     21410     11886 14:18:05
  11. Reorder Processing - Mitigations
      • Try to keep the virtual machine as small as possible.
      • Virtual machines with multiple applications may need to be split into multiple virtual machines with fewer applications.
      • See http://www.vm.ibm.com/perf/tips/reorder.html for more details.
      • Apply APAR VM64774 if necessary:
        – SET and QUERY commands, system wide settings
        – Corrects problem in earlier “patch” solution that inhibits paging of PGMBKs (Page Tables) for virtual machines where Reorder is set off.
        – z/VM 5.4.0 PTF UM33167 RSU 1003
        – z/VM 6.1.0 PTF UM33169 RSU 1003
  12. Minidisk Cache Guidelines
      • In general, enable MDC for everything
      • Configure some real storage for MDC
      • Set maximum MDC limits
        – SET MDC STOR 0M 256M and SET MDC XSTOR 0M 0M
      • Disable MDC for
        – Write-mostly or read-once disks (logs, accounting, Linux swap)
        – Target volumes in backup scenarios
      • Better performer than Virtual Disk in Storage (VDISK) for read I/Os
  13. Server Machine Guidelines
      • Server Virtual Machine (SVM)
      • TCP/IP, RACFVM, etc.
      • QUICKDSP ON to avoid eligible list
      • Higher SHARE setting
      • Ensure performance data includes these virtual machines
  14. CP INDICATE Command
      • LOAD: shows total system load
        – Processors, XSTORE, paging, MDC, queue lengths, storage load
        – STORAGE value not very meaningful
      • USER EXP: more useful than plain USER
      • QUEUES EXP: great for scheduler problems and quick state sampling
        – Mostly useful for eligible list assessments
      • PAGING: lists users in page wait
      • I/O: lists users in I/O wait
      • ACTIVE: displays number of active users over given interval
      • Consider using MONITOR DATA instead for "serious" examinations
  15. CP INDICATE LOAD Example

      INDICATE LOAD
      AVGPROC-088% 03
      XSTORE-000000/SEC MIGRATE-0000/SEC
      MDC READS-000035/SEC WRITES-000001/SEC HIT RATIO-099%
      PAGING-0023/SEC STEAL-000%
      Q0-00007(00000)                           DORMANT-00410
      Q1-00000(00000)           E1-00000(00000)
      Q2-00001(00000) EXPAN-002 E2-00000(00000)
      Q3-00013(00000) EXPAN-002 E3-00000(00000)
      PROC 0000-087%          PROC 0001-088%
      PROC 0002-089%
      LIMITED-00000
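When INDICATE LOAD output is captured to a file or a console log, the key figures can be pulled out with a few regular expressions. The sketch below works on the text of the example slide; the function name and dictionary keys are illustrative, the reading of the value after AVGPROC as the processor count is an assumption, and real output may contain fields this parser ignores.

```python
import re

# Text of the INDICATE LOAD example from the slide.
SAMPLE = """\
AVGPROC-088% 03
XSTORE-000000/SEC MIGRATE-0000/SEC
MDC READS-000035/SEC WRITES-000001/SEC HIT RATIO-099%
PAGING-0023/SEC STEAL-000%
Q0-00007(00000) DORMANT-00410
Q1-00000(00000) E1-00000(00000)
Q2-00001(00000) EXPAN-002 E2-00000(00000)
Q3-00013(00000) EXPAN-002 E3-00000(00000)
PROC 0000-087% PROC 0001-088%
PROC 0002-089%
LIMITED-00000
"""


def parse_indicate_load(text: str) -> dict:
    """Extract a few headline values from INDICATE LOAD output."""
    result = {}
    match = re.search(r"AVGPROC-(\d+)%\s+(\d+)", text)
    if match:
        result["avgproc_pct"] = int(match.group(1))
        # Assumption: the number after the percentage is the processor count.
        result["processors"] = int(match.group(2))
    match = re.search(r"PAGING-(\d+)/SEC", text)
    if match:
        result["paging_per_sec"] = int(match.group(1))
    match = re.search(r"HIT RATIO-(\d+)%", text)
    if match:
        result["mdc_hit_ratio_pct"] = int(match.group(1))
    # Dispatch (Qn) and eligible (En) list counts, e.g. "Q3-00013(00000)".
    result["queues"] = {
        name: int(count) for name, count in re.findall(r"\b([QE]\d)-(\d+)\(", text)
    }
    # Per-processor utilization, e.g. "PROC 0002-089%".
    result["per_proc_pct"] = {
        int(cpu): int(pct) for cpu, pct in re.findall(r"PROC (\d+)-(\d+)%", text)
    }
    return result


load = parse_indicate_load(SAMPLE)
print(load["avgproc_pct"], load["paging_per_sec"], load["queues"])
```

Non-zero E1 to E3 counts are the values to watch for the eligible list assessments mentioned on the previous slide.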
  16. Selected CP QUERY Commands
      USERS:      number and type of users on system
      SRM:        scheduler/dispatcher settings (LDUBUF, etc.)
      SHARE:      type and intensity of system share
      FRAMES:     real storage allocation
      PATHS:      physical paths to device and status
      ALLOC MAP:  DASD allocation
      ALLOC PAGE: how full your paging space is
      XSTORE:     assignment of expanded storage
      MONITOR:    current monitor settings
      MDC:        MDC usage
      VDISK:      virtual disk in storage usage
      SXSPAGES:   System Execution Space
  17. 5,000 Foot View
      [Diagram components: CP Control Blocks, Application Data, VM Events, *MONITOR System Service, MONDCSS Segment, MONWRITE Utility, Performance Toolkit, Raw Monwrite, History Files, TCP/IP Network, 3270, Browser, VMRM]
  18. [Slide with no text content]
  19. Processor

      REPORT NAME                REPORT CODE  COMMAND
      CPU Load and Transactions  FCX100       CPU
      LPAR Load                  FCX126       LPAR
      Processor Log              FCX144       PROCLOG
      LPAR Load Log              FCX202       LPARLOG
      User Wait States           FCX114       USTAT
      System Summary             FCX225       SYSSUMLG
  20. FCX126 Run 2011/09/20 18:00:56 Logical Partition Activity
      From 2011/09/13 09:19:15 To 2011/09/13 10:09:15
      For 3000 Secs 00:50:00 Result of 13092011 Run
      ________________________________________________________________________________________

      Processor type and model    : 2817-401
      Nr. of configured partitions: 6
      Nr. of physical processors  : 25

      Partition Nr. Upid #Proc Weight Wait-C Cap %Load CPU %Busy %Ovhd %Susp %VMld %Logld Type
      LPAR1       1   00    24    100     NO  NO  89.0   0  94.3   2.1   6.5  92.0   98.4 IFL
                                  100         NO         1  93.4   2.4   7.7  90.8   98.3 IFL
                                  100         NO         2  93.6   2.3   7.4  91.1   98.3 IFL
                                  100         NO         3  93.6   2.4   7.5  91.1   98.4 IFL
                                  100         NO         4  93.6   2.3   7.4  91.1   98.4 IFL
                                  100         NO         5  93.5   2.3   7.5  91.0   98.3 IFL
                                  100         NO         6  93.4   2.4   7.6  90.9   98.3 IFL
                                  100         NO         7  93.2   2.4   7.7  90.6   98.1 IFL
                                  100         NO         8  93.4   2.4   7.5  90.8   98.2 IFL
                                  100         NO         9  93.2   2.4   7.7  90.7   98.2 IFL
                                  100         NO        10  93.1   2.5   7.8  90.4   98.0 IFL
                                  100         NO        11  93.2   2.4   7.7  90.6   98.0 IFL
                                  100         NO        12  93.4   2.4   7.5  90.8   98.1 IFL
                                  100         NO        13  93.3   2.3   7.5  90.8   98.1 IFL
                                  100         NO        14  93.3   2.4   7.5  90.7   98.1 IFL
                                  100         NO        15  93.2   2.5   7.6  90.5   97.9 IFL
                                  100         NO        16  91.1   2.9   9.0  88.0   96.6 IFL
                                  100         NO        17  91.3   2.8   8.8  88.2   96.7 IFL
                                  100         NO        18  91.4   2.9   8.9  88.3   96.8 IFL
                                  100         NO        19  91.5   2.7   8.8  88.5   97.0 IFL
                                  100         NO        20  91.7   2.8   8.7  88.6   97.1 IFL
                                  100         NO        21  91.5   2.8   8.9  88.5   97.1 IFL
  21. FCX225 Run 2011/09/20 18:00:56 SYSSUMLG
      System Performance Summary by Time
      From 2011/09/13 09:19:15
      To 2011/09/13 10:09:15
      For 3000 Secs 00:50:00 Result of 13092011 Run
      _______________________________________________________________________________

               <------- CPU --------> <Vec> <--Users--> <---I/O---> <Stg> <-Paging-->
                    <--Ratio-->                           SSCH DASD Users <-Rate/s-->
      Interval   Pct       Cap-   On-   Pct  Log-        +RSCH Resp    in PGIN+ Read+
      End Time  Busy  T/V  ture  line  Busy   ged Activ     /s msec Elist PGOUT Write
      >>Mean>>  90.0 1.10 .9293  24.0  ....   117   108  571.2   .4    .0  2610  1051
      09:20:15  92.4 1.13 .9059  24.0  ....   117   108  523.0   .5    .0  1992 527.8
      09:21:15  92.9 1.07 .9523  24.0  ....   117   108  399.2   .5    .0  1669 301.4
      09:22:15  93.2 1.08 .9458  24.0  ....   117   108  557.4   .3    .0  2817 633.9
      09:23:15  94.5 1.07 .9535  24.0  ....   117   108  590.3   .3    .0  1410 482.7
      09:24:15  93.4 1.07 .9537  24.0  ....   117   108  649.5   .4    .0  2363 488.5
      09:25:15  90.4 1.09 .9347  24.0  ....   117   108  684.7   .4    .0  2485 768.9
      09:26:15  92.4 1.08 .9436  24.0  ....   117   108  666.8   .4    .0  2940  1215
      09:27:15  90.9 1.09 .9344  24.0  ....   117   108  607.2   .4    .0  3179 726.7
      09:28:15  92.2 1.08 .9469  24.0  ....   117   108  664.2   .5    .0  2179 896.0
      09:29:17  90.8 1.10 .9318  24.0  ....   117   108  645.9   .6    .0  3404 804.5
      09:30:16  89.5 1.19 .8579  24.0  ....   117   108  670.8   .7    .0  5402  3487
      09:31:15  92.7 1.08 .9412  24.0  ....   117   108  588.7   .4    .0  3091  1807
      09:32:15  91.2 1.09 .9421  24.0  ....   117   108  602.8   .3    .0  2635  1076
      09:33:16  89.3 1.14 .9047  24.0  ....   117   108  255.2   .5    .0  3140 710.5
      09:34:15  88.5 1.10 .9374  24.0  ....   117   108  205.2   .6    .0  2513 897.4
      09:35:15  85.9 1.12 .9257  24.0  ....   117   108  320.4   .5    .0  3117 953.5
      09:36:16  86.1 1.13 .9144  24.0  ....   117   108  213.5   .5    .0  3642  1144
      09:37:16  83.0 1.14 .9090  24.0  ....   117   108  245.6   .5    .0  3414  2133
  22. Storage

      REPORT NAME                     REPORT CODE  COMMAND
      Auxiliary Storage Log           FCX146       AUXLOG
      CP Owned Device                 FCX109       DEVICE CPOWNED
      User Page Data                  FCX113       UPAGE
      Shared Data Spaces              FCX134       DSPACESH
      SXS Available Page Queues Mgnt  FCX261       SXSAVAIL
      Mini Disk Storage               FCX178       MDCSTOR
      Storage Utilization             FCX103       STORAGE
      Available List Log              FCX254       AVAILLOG
  23. FCX109 Run 2011/05/31 17:44:26 DEVICE CPOWNED
      Load and Performance of CP Owned Disks
      From 2011/05/12 16:48:41 To 2011/05/12 17:31:41
      For 2580 Secs 00:43:00 Result of 20110512 Run
      _______________________________________________________________________________

      Page / SPOOL Allocation Summary
      PAGE slots available       25165k    SPOOL slots available    3605928
      PAGE slot utilization         25%    SPOOL slot utilization       65%
      T-Disk cylinders avail.   .......    DUMP slots available           0
      T-Disk space utilization     ...%    DUMP slot utilization        ..%

      < Device Descr. ->                   <------------- Rate/s ------------->  User        Serv MLOAD
                  Volume Area    Area Used <--Page---> <--Spool-->         SSCH Inter Queue  Time  Resp
      Addr Devtyp Serial Type  Extent    % P-Rds P-Wrt S-Rds S-Wrt  Total +RSCH feres Lngth /Page  Time
      EDF1 9336   ZDPAG1 PAGE  12583k   25 196.5 199.9   ...   ...  396.4    .0     0  8.18   5.5  88.0
      EDF2 9336   ZDPAG2 PAGE  12583k   24 194.2 206.1   ...   ...  400.4    .0     0  7.23   6.0  58.4
      4374 3390   610SP1 SPOOL 802880   61    .0    .0    .0    .0     .0    .1     0     0    .4    .4
      4672 3390   610SP2 SPOOL 803060   68    .0    .0    .0    .0     .0    .0     0     0   1.0   1.0
  24. I/O

      REPORT NAME                  REPORT CODE  COMMAND
      General I/O Device           FCX108       DEVICE
      SCSI Device                  FCX249       SCSI
      DASD Performance Log         FCX131       DEVCONF
      FICON Channel Load           FCX215       INTERIM FCHANNEL
      General I/O Device Data Log  FCX168       DEVLOG
      I/O Processor Log            FCX232       IOPROCLG
  25. Studying MONWRITE Data
      • z/VM Performance Toolkit
      • Interactively – possible, but not so useful
      • PERFKIT BATCH command – pretty useful
        – Control files tell Perfkit which reports to produce
        – You can then inspect the reports by hand or programmatically
      • See z/VM Performance Toolkit Reference for information on how to use PERFKIT BATCH
      • PRFIT (Brian Wade) http://www.vm.ibm.com/download/packages/descript.cgi?PRFIT
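Inspecting a batch-produced report programmatically can be as simple as splitting the interval rows into fields. The sketch below reads a few rows copied from the FCX225 example slide and finds the interval with the highest PGIN+PGOUT rate. The column positions are assumed from that one slide (13 whitespace-separated fields per row), and the function and key names are hypothetical; reports produced with other options or releases may be laid out differently.

```python
import re

# A few rows copied from the FCX225 (SYSSUMLG) example slide.
SAMPLE_ROWS = """\
>>Mean>> 90.0 1.10 .9293 24.0 .... 117 108 571.2 .4 .0 2610 1051
09:29:17 90.8 1.10 .9318 24.0 .... 117 108 645.9 .6 .0 3404 804.5
09:30:16 89.5 1.19 .8579 24.0 .... 117 108 670.8 .7 .0 5402 3487
09:31:15 92.7 1.08 .9412 24.0 .... 117 108 588.7 .4 .0 3091 1807
"""

# Interval rows start with an HH:MM:SS end time; headers and ">>Mean>>" do not.
INTERVAL_ROW = re.compile(r"^\d{2}:\d{2}:\d{2}\s")


def parse_intervals(report_text: str) -> list[dict]:
    """Return one dict per interval row with a few selected columns."""
    intervals = []
    for line in report_text.splitlines():
        line = line.strip()
        if not INTERVAL_ROW.match(line):
            continue
        fields = line.split()
        if len(fields) < 13:
            continue  # not the layout this sketch assumes
        intervals.append(
            {
                "time": fields[0],
                "cpu_busy_pct": float(fields[1]),
                "tv_ratio": float(fields[2]),
                "page_rate": float(fields[11]),  # PGIN+PGOUT per second
            }
        )
    return intervals


intervals = parse_intervals(SAMPLE_ROWS)
peak = max(intervals, key=lambda row: row["page_rate"])
print(f"Peak paging at {peak['time']}: {peak['page_rate']:.0f} pages/s")
```

Fields shown as dots in the report (no data) are left unparsed here, since only numeric columns known to be populated are converted.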
  26. Some Notes on z/VM Limits
      • Sheer hardware:
        – z/VM 5.2: 24 engines, 128 GB real
        – z/VM 5.3: 32 engines, 256 GB real
        – zSeries: 65,000 I/O devices
      • Workloads we’ve run in test have included:
        – 54 engines
        – 440 GB real storage
        – 128 GB XSTORE
        – 240 1-GB Linux guests
        – 8 1-TB guests
      • Utilizations we routinely see in customer environments
        – 85% to 95% CPU utilization without worry
        – Tens of thousands of pages per second without worry
      • Our limits tend to have two distinct shapes
        – Performance drops off slowly with utilization (CPUs)
        – Performance drops off rapidly when wall is hit (storage)
      [Chart: Performance versus Utilization, with two curves: Precipitous (e.g., storage) and Gradual (e.g., CPUs)]
  27. Some Final Thoughts
      • Define what performance means in your case
      • Collect data for a baseline of good performance
      • Implement a change management process
      • Make as few changes as possible at a time
      • Relieving one bottleneck will reveal another
  28. THANK YOU!
      Contact information:
      Livio Sousa
      IBM Tutóia – SP
      livios@br.ibm.com
      +55 11 9 7203 6637
