VMware Performance Troubleshooting, presented by Chris Kranz
  1. VMware Performance Troubleshooting
     Presented by Chris Kranz
  2. Topics Covered
     • Introduction
     • Root Cause Analysis
     • Performance Characteristics
     • CPU
     • Networking
     • Memory
     • Disk
     • Virtual Machine optimisation
     • esxtop
     • vm-support
     • Service Console
     • Resource Groups
     • Design Guidelines
     • Capacity Planner limitations and cautions
     • Conclusion
     • Reference Articles
  3. Introduction
     Multiple layers of virtualisation are used to increase service levels, availability and manageability.
     However, multiple layers of virtualisation often mask performance and configuration issues, making them more of a challenge to troubleshoot and correct.
     The worst outcome is that performance issues after a virtualisation project lead to the perception that VMware reduces performance, which can damage future confidence in VMware.
  4. Performance Basics
     • Virtual Machine Resources
       – CPU
       – Memory
       – Disk
       – Networking
  5. Resource Maximums
                                Host    Guest
     Logical Processors         64      N/A
     Virtual CPUs               N/A     8
     Virtual CPUs per Core      20      N/A
     Memory                     1TB     256GB
     http://www.vmware.com/pdf/vsphere4/r40/vsp_40_config_max.pdf
  6. Typical Host (vSphere 1U Host)
     CPUs:    2 x Quad Core
     Memory:  32-64GB RAM
     Typically 3 VMs per core, so 24 VMs per host
     Each VM has 2GB of RAM = 48GB of RAM
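The sizing arithmetic on the typical-host slide can be reproduced with a minimal sketch. The function name and parameters are illustrative, not taken from any VMware tool, and the sketch ignores per-VM overhead memory and Service Console memory.

```python
def host_sizing(sockets, cores_per_socket, vms_per_core, ram_per_vm_gb):
    """Return (VM count, total guest RAM in GB) for a simple consolidation ratio."""
    cores = sockets * cores_per_socket
    vms = cores * vms_per_core
    return vms, vms * ram_per_vm_gb


# Figures from the slide: 2 x quad core, 3 VMs per core, 2GB per VM.
vms, ram_gb = host_sizing(sockets=2, cores_per_socket=4, vms_per_core=3, ram_per_vm_gb=2)
print(f"{vms} VMs needing {ram_gb}GB of guest RAM")
```

With the slide's figures this gives 24 VMs and 48GB, which sits inside the 32-64GB range quoted for the host.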
  7. Root Cause Analysis
     http://www.vmware.com/resources/techresources/10066
  8. Root Cause ...
  9. Monitoring Performance
     • Do not rely on guest tools alone, although they:
       – Can show high CPU and memory utilisation
       – Can measure latency and throughput of disk and network interfaces
     • Use the virtualisation layer to diagnose the cause:
       – The guest is unaware of the virtualisation workload
       – Guest OSs account for time differently
       – The guest has no visibility of available resources
  10. Performance Analysis Tools
      • esxtop (service console only)
      • resxtop (remote command line utilities)
      • Performance graphs in vCenter
  11. esxtop
      • esxtop can be run:
        – Interactively
        – In batch mode (e.g. esxtop -a -b > analysis.csv)
        – Load the batch output into Windows perfmon or MS Excel
      • Two keys to remember
        – h (or ?) : help
        – f : fields to display
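Besides perfmon or Excel, a batch-mode CSV can be filtered with a short script. The sketch below is a minimal example that assumes the file has one header row of counter names followed by one row per sample. The column names in SAMPLE are invented for illustration; real esxtop headers are longer counter paths, so match on a substring such as "% Ready".

```python
import csv
import io


def columns_matching(csv_text, needle):
    """Return {column name: [float samples]} for headers containing `needle`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if needle in name]
    series = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            try:
                series[header[i]].append(float(row[i]))
            except (ValueError, IndexError):
                pass  # skip blank or non-numeric cells
    return series


# Hypothetical two-sample capture; the headers are placeholders.
SAMPLE = (
    '"timestamp","vm-a % Ready","vm-a % Used"\n'
    '"10:00:00","4.2","35.0"\n'
    '"10:00:05","12.8","40.1"\n'
)

for name, values in columns_matching(SAMPLE, "% Ready").items():
    print(name, "peak:", max(values))
```

For a real capture, pass the contents of analysis.csv in place of SAMPLE.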
  12. esxtop basics
      [Screenshot of the esxtop display, annotated: Host Resources; Name of Resource Pool, Virtual Machine or World; Number of Worlds]
  13. Performance Characteristics
      CPU               Memory            Networking     Disk
      Slow Processing   Slow Processing   Packet Loss    Log Stalls
      High CPU Wait     Disk Swapping     Slow Network   Disk Queue
      Overall effects: Slow Application Performance, Reduced User Experience, Data Loss and Corruption
  14. CPU: ESX Scheduler
      [Diagram: Service Console and Virtual Machine worlds under the ESX Scheduler]
      Basic World States: Ready / Run / Wait
      CPU States: Ready / Usage / Wait
      Limits / Shares / Reservations
  15. CPU: esxtop
      High %RDY + high %USED can imply over-commitment
      • PCPU(%): CPU utilisation
      • %USED: Utilisation
      • %RDY: Ready Time
      • %RUN: Run Time
      • %WAIT: Wait and idling time
  16. CPU: VI Client
      [Chart: Used Time and Ready Time]
      Used Time > Ready Time: possible CPU over-commitment
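esxtop reports ready time as a percentage (%RDY), while the VI Client charts report it as milliseconds accumulated over each sample. To compare the two, the chart value can be converted as in the sketch below. It assumes the 20-second real-time sample interval and a value for a single vCPU; the function name is illustrative.

```python
def ready_percent(ready_ms, interval_s=20.0):
    """Convert a ready-time summation (ms per sample) into a percentage of the sample interval."""
    return ready_ms / (interval_s * 1000.0) * 100.0


# 2000 ms of ready time inside a 20 s sample is 10% ready.
print(ready_percent(2000))
```

For historical charts with longer roll-up intervals, pass the matching interval_s instead of the default.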
  17. CPU: Further Investigation
      [Screenshot] %MLMTD shows this VM has been limited
  18. CPU: Further Investigation
      [Screenshot] High ready time caused by a CPU resource limit
  19. VMware Memory Management
      • Transparent Page Sharing
      • VMware Tools balloon driver to force the VM to swap to disk
      • Virtual Machine Page File
  20. Memory: Ballooning vs. Swapping
      • The balloon driver causes the guest OS to swap pages that it chooses to disk
      • ESX swapping will swap any pages to disk
  21. Memory
      • Ballooning can be disabled (0 value) or controlled on a per Virtual Machine basis using: sched.mem.maxmemctl
      • Default is set to 65%; can be controlled at host level
      • Only an issue in resource contention scenarios (or for VMs that need low latency, e.g. Citrix)
  22. Memory - Host
      VI Client shows memory usage of the host. This is calculated as "consumed + overhead memory + Service Console".
      Performance charts are a very good way of showing the Virtual Machine memory breakdown:
      • Consumed Memory
      • Ballooned Memory
      • Shared Memory
      • Swapped Memory
  23. Memory - Guest
      Host Memory = Consumed + Overhead Memory
      Guest Memory = Active Memory for Guest OS
  24. Memory – Guest Overhead
  25. Memory: Virtual Machine Memory Metrics – VI Client
      Metric                  Description
      Memory Active (KB)      Physical pages touched recently by a VM
      Memory Usage (%)        Active memory / configured memory
      Memory Consumed (KB)    Machine memory mapped to a virtual machine, including its portion of shared pages. Doesn't include overhead memory
      Memory Granted (KB)     Physical pages allocated to a virtual machine. May be less than configured memory. Includes shared pages. Doesn't include overhead memory
      Memory Shared (KB)      Physical pages shared with other virtual machines
      Memory Balloon (KB)     Physical memory ballooned from a virtual machine
      Memory Swapped (KB)     Physical memory in swap file (approx. "swap out – swap in"). Swap out and swap in are cumulative
      Overhead Memory (KB)    Machine pages used for virtualisation
  26. Memory: Host Memory Metrics – VI Client
      Metric                  Description
      Memory Active (KB)      Physical pages touched recently by the host
      Memory Usage (%)        Active memory / configured memory
      Memory Consumed (KB)    Total host physical memory – free memory on host. Includes overhead and Service Console memory
      Memory Granted (KB)     Sum of physical pages allocated to all virtual machines. Doesn't include overhead memory
      Memory Shared (KB)      Physical pages shared by virtual machines on host
      Shared Common (KB)      Total machine pages used by shared pages
      Memory Balloon (KB)     Machine pages ballooned from virtual machines
      Memory Swap Used (KB)   Physical memory in swap file (approx. "swap out – swap in"). Swap out and swap in are cumulative
      Overhead Memory (KB)    Machine pages used for virtualisation
  27. Memory: esxtop
      • PMEM: Total physical memory breakdown
      • VMKMEM: Memory managed by vmkernel
      • COSMEM: Service Console memory breakdown
      • PSHARE: Page sharing statistics
      • SWAP: Swap statistics
      • MEMCTL: Balloon driver data
  28. Memory: esxtop / VI Client metrics – Virtual Machines
      VI Client           esxtop
      Active Memory       TCHD
      Memory Usage        %ACTV
      Consumed Memory     N/A
      Memory Granted      N/A (SZTGT and CMTTGT represent memory scheduler targets)
      Memory Shared       SHRD (+SHRDSVD per VM). Must enable COW stats in esxtop
      Memory Balloon      MCTLSZ
      Memory Swapped      SWCUR (SWR/s & SWW/s are rates)
      Overhead Memory     OVHD & OVHDMAX
  29. Memory: esxtop / VI Client metrics – Host Usage
      VI Client               esxtop
      Memory Active           N/A (try /proc/vmware/sched/mem-verbose)
      Memory Usage            N/A (try /proc/vmware/sched/mem-verbose)
      Memory Consumed         PMEM total – PMEM free
      Memory Granted          N/A (SZTGT and CMTTGT represent memory scheduler targets)
      Memory Shared           PSHARE (shared)
      Memory Shared Common    PSHARE (common)
      Memory Balloon          MEMCTL
      Memory Swap Used        SWAP (r/s and w/s are rates)
      Overhead Memory         OVHD & OVHDMAX
  30. Memory
      [Screenshot] VI Client memory usage graph
  31. Memory
      [Screenshot] Troubleshooting memory usage issues
  32. Networking
      • Switch Assisted Teaming (IP Hash)
      • VLAN Trunking
      • Flow Control (full)
      • Speed & Duplex (1000Mb / Full)
      • Port Fast
      • BPDU Disabled
      • STP Disabled
      • Link State Tracking
      • Jumbo Frames
      Network configuration is more likely to blame than resource contention
  33. Networking: esxtop
      [Screenshot] Transmit and Receive in Mb/s; Transmit and Receive in Packets
  34. Networking: esxtop
      [Screenshot] Dropped Packets Transmitted; Dropped Packets Received
  35. Disk
      Varying Factors
      • File system performance
      • Disk subsystem configuration (SAN, NAS, iSCSI, local disk)
      • Disk caching
      • Disk formats (thick, sparse, thin)
      ESX Storage Stack
      • Different latencies for different disks
      • Queuing within the kernel
      K: Kernel, D: Device, G: Guest
  36. Disk: VI Client statistics
      Quite Coarse Statistics
      • Disk read / write rate (KB/s)
      • Disk usage: sum of read BW and write BW (KB/s)
      • Disk read / write requests (per 20s interval)
      • Bus resets / Command aborts (per 20s interval)
      • Per LUN or aggregated stats
  37. Disk: esxtop statistics
      Aggregated stats similar to VI Client
      • Disk read / write per sec (READS/s, WRITES/s)
      • MB read / write per sec (MBREAD/s, MBWRTN/s)
      Latency Statistics
      • Kernel Average / command (KAVG/cmd)
      • Device Average / command (DAVG/cmd)
      • Guest Average / command (GAVG/cmd)
      Queuing Information
      • Adapter Queue Length (AQLEN)
      • LUN Queue Length (LQLEN)
      • VMKernel (QUED)
      • Active Queue (ACTV)
      • %Used (%USD = ACTV/LQLEN)
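The queue and latency counters above combine in two simple ways, sketched here. The %USD ratio is the one given on the slide. Treating guest latency as roughly device latency plus kernel latency is an assumption added for this example, and the function names are illustrative.

```python
def queue_used_percent(actv, lqlen):
    """%USD as defined on the slide: active commands over LUN queue length."""
    if lqlen <= 0:
        raise ValueError("LQLEN must be positive")
    return 100.0 * actv / lqlen


def estimated_gavg_ms(davg_ms, kavg_ms):
    """Assumed relation: guest latency is roughly device latency plus kernel latency."""
    return davg_ms + kavg_ms


print(queue_used_percent(actv=16, lqlen=32))
print(estimated_gavg_ms(davg_ms=10.0, kavg_ms=2.5))
```

A %USD close to 100 together with a non-zero QUED suggests commands are waiting in the kernel queue rather than at the array.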
  38. Disk: SAN Rough Estimates
      Purely looking at a single ESX host, roughly:
        Throughput (in MBps) = (Outstanding IOs * Block size in KB) / latency in msec
      FC, rough maximums:
        Effective Link Bandwidth = ~80/90% of Real Bandwidth
        Effective (2Gbps) = 200 – 230 MBps
        Effective (4Gbps) = 410 – 460 MBps
        Effective (8Gbps) = 820 – 920 MBps
      iSCSI / NFS / FCoE, rough maximums:
        Effective Link Bandwidth = ~70/80% of Real Bandwidth
        Effective (1GigE) = 90 – 100 MBps
        Effective (10GigE) = 900 – 1000 MBps
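The single-host throughput estimate can be written as a one-line function. This is a minimal sketch of the slide's formula; the function name and the example inputs (32 outstanding IOs, 32KB blocks, 10 msec latency) are illustrative. The units work because KB per millisecond is approximately MB per second.

```python
def throughput_mbps(outstanding_ios, block_kb, latency_ms):
    """Rough single-host throughput: (outstanding IOs * block size in KB) / latency in msec."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return outstanding_ios * block_kb / latency_ms


# 32 outstanding 32KB IOs at 10 msec each.
print(throughput_mbps(32, 32, 10.0))
```

Comparing the result against the effective link bandwidths listed on the slide shows whether the link or the latency is the likely ceiling.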
  39. Disk: Desired Latency Calculations
      Desired Latency in msec <= (Outstanding IOs * Block size in KB) / Throughput per host
      Example:
        Number of Hosts: 16
        Effective Link Bandwidth: 90 MBps
        Throughput per host: 90 / 16 = 5.6 MBps
        Desired Latency: (32 * 32) / (5.6) = 182.86 msec
      Workload                  Cached Sequential Read   Cached Sequential Write
      Desired Latency (msec)    182.86                   182.86
      Observed Latency (msec)   ~350                     ~180
      Throughput Drop?          Yes                      No
      Throughput (MBps)         ~45                      ~90
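The worked example on this slide can be checked with a short sketch. The function names are illustrative. The per-host throughput is rounded to one decimal place (5.6 MBps) as on the slide, which is why the result is 182.86 msec rather than the 182.04 msec an unrounded 5.625 MBps would give.

```python
def desired_latency_ms(outstanding_ios, block_kb, throughput_per_host_mbps):
    """Upper bound on latency that still sustains the per-host throughput."""
    return outstanding_ios * block_kb / throughput_per_host_mbps


def throughput_drops(observed_ms, desired_ms):
    """A throughput drop is expected when observed latency exceeds the desired latency."""
    return observed_ms > desired_ms


per_host = round(90 / 16, 1)  # 16 hosts sharing a 90 MBps link, rounded as on the slide
limit = desired_latency_ms(32, 32, per_host)
print(f"desired latency: {limit:.2f} msec")
print("cached sequential read drops:", throughput_drops(350, limit))
print("cached sequential write drops:", throughput_drops(180, limit))
```

The two comparisons reproduce the Yes / No row of the table: roughly 350 msec exceeds the limit, roughly 180 msec does not.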
  40. Disk: VI Client
      [Charts] SAN cache enabled: high throughput. SAN cache disabled: poor throughput
  41. Disk: esxtop
      [Screenshots] Latency is quite high. After enabling cache, latency is reduced
  42. Virtual Machine Optimisation
      Deploy all machines from an optimised template!
      • VMware Tools MUST be installed
      • The disks MUST be block aligned to the storage (even when using NFS and SAN)
      • Where possible, always separate data disks from OS disks
      • Windows performance settings should be optimised for application performance
      • Guest operating system timeouts should be set as defined by the SAN vendor
      • Pagefile should be separated where appropriate (this can impact VMware SRM however)
      • Unused Windows services should be disabled (wireless config, print spooler, audio, etc.)
      • Last access time updates should be disabled (unless required)
      • Logging of the VM should be disabled (only enabled for troubleshooting)
      • Remove any unused virtual hardware (floppy drives, USB, etc.)
      • Disable screen savers and power saving features, including the logon screen saver
      • Enable Remote Desktop; avoid using the VI Client for remote administration
      • Install standard applications into the template (bginfo, antivirus, any host agents, etc.)
      • Multiple CPUs should be allocated sparingly
  43. Virtual Machine Optimisation
      Block alignment is vital to good disk performance!
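Whether a partition is block aligned comes down to a remainder check on its starting offset. The sketch below assumes a 4096-byte boundary purely for illustration; the correct boundary depends on the storage vendor. The 63-sector starting offset is the legacy MBR default used by older guest operating systems.

```python
def is_aligned(offset_bytes, boundary_bytes=4096):
    """True when the partition start falls exactly on a storage block boundary."""
    return offset_bytes % boundary_bytes == 0


legacy_offset = 63 * 512      # legacy default: partition starts at sector 63
aligned_offset = 1024 * 1024  # partition starts at 1MB

print(legacy_offset, is_aligned(legacy_offset))
print(aligned_offset, is_aligned(aligned_offset))
```

A misaligned start means a single guest IO can straddle two storage blocks, which is why the template should be aligned before it is cloned.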
  44. esxtop: Command Options when inside esxtop
      Command   Action
      space     Update the display
      ?         Show the help page
      q         Quit
      f/F       Add or remove columns from the display
      o/O       Change the order the display is sorted
      s         Change the update interval
      #         Change the number of instances to display
      W         Write configuration to file
      e         Expand / roll up CPU stats
      V         View only VM instances
      L         Change the length of the NAME field
      m         Display memory statistics
      n         Display network statistics
      i         Display interrupt statistics
      d         Display disk adapter statistics
      u         Display disk device statistics
      v         Display disk VM statistics
  45. esxtop: Command Line Options from the console
      Command   Action
      -b        Batch mode
      -l        Locks the objects available in the first snapshot
      -s        Enables secure mode
      -a        Show all statistics
      -c        Sets the configuration file
      -R        Enables replay mode (used with "vm-support -S")
      -d        Sets the update interval
      -n        Runs esxtop for n iterations
  46. esxtop
      Expand the default window size for your session to get all statistics
  47. vm-support
      Creates a packaged zip file containing the following sections:
      • boot: contains the grub configuration
      • etc: contains the Console OS configuration files (cron, tcpwrappers, syslog, etc.)
      • proc: contains much of the hardware configuration modules and variables
      • tmp: contains a lot of the ESX specific configuration output
      • var: contains log files and any core dumps
      • vmfs: contains the structure of the VMFS datastores
      • esx3-installation (where appropriate): contains a copy of the previous esx3 configuration variables
  48. vm-support
      Using vm-support to extract performance information:
        vm-support -S -d <duration> -i <interval>
      <duration> and <interval> are in seconds
      The output from this can then be replayed in esxtop for review after it has been extracted:
        esxtop -R <path_to_vm-support_output>
  49. Service Console Performance
      • Multiple Service Console networks, for network resiliency
      • Increased Service Console memory, up to 800MB
      • Use host agents supplied by your vendors
      • Make recommended storage tweaks such as HBA queue depth and IO timeouts
      • Minimal use of the VI Client console; use RDP or SSH instead
      • Properly sized vCenter server, with a 64-bit OS where possible
  50. Resource Groups
      • Dynamically reallocate resource shares
      • Add a VM: shares allow you to over-commit resources and have a graceful re-allocation
      • Remove a VM: extra resources are exploited across all remaining VMs
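The re-allocation described on this slide can be illustrated with a proportional split. This is a simplified sketch, not the ESX scheduler: it assumes every VM is contending for the resource and ignores reservations and limits. The VM names, share values and the 8000 MHz capacity are hypothetical.

```python
def split_by_shares(capacity, shares):
    """Divide a contended resource in proportion to each VM's share value."""
    total = sum(shares.values())
    return {name: capacity * value / total for name, value in shares.items()}


shares = {"vm-a": 1000, "vm-b": 1000}
print(split_by_shares(8000, shares))  # two equal VMs split the capacity evenly

shares["vm-c"] = 2000                 # an additional VM with double the shares
print(split_by_shares(8000, shares))  # existing VMs shrink, nothing is starved
```

Removing vm-c again returns the first result, which is the graceful behaviour the slide describes.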
  51. Design Guidelines
      • Full resilience / multiple paths
      • Standard configuration across all aspects (ESX, storage, networking, etc.)
      • Standard naming conventions
      • Learn from others' mistakes
      • Follow vendors' best-practice guidelines
      • Rule out the basics before requesting support
  52. Capacity Planner & P2V Cautions and Limitations
      • Peak CPU usage can sometimes be misleading
      • Back-end storage system performance
      • P2V machines will require block-aligning to the storage
      • P2V machines will still require guest OS optimisation
  53. Conclusion
      • Performance issues can often be traced with simple root cause analysis using basic tools (VI Client / esxtop)
      • Performance tools help diagnose issues and help rule out non-issues
      • Performance tools are useful in different contexts, not always either/or
        – Real-time data and troubleshooting: esxtop
        – Historical data: VI Client
        – Coarse resource / cluster usage: VI Client
        – Detailed resource usage: esxtop
      • Combine information from various tools to get a complete picture
      • Always benchmark your systems first so you know the optimal performance you can achieve
  54. Reference Articles
      • http://www.vmware.com/pdf/esx3_memory.pdf
      • http://www.vmworld.com/docs/DOC-2370
      • http://blogs.vmware.com/performance/
      • http://communities.vmware.com/docs/DOC-5420
      • http://kb.vmware.com/kb/1008205
      • http://communities.vmware.com/community/vmtn/general/performance
      • http://www.vmware.com/products/vmmark/
      • http://www.vmware.com/pdf/vsphere4/r40/vsp_40_san_cfg.pdf
      • http://www.vmware.com/pdf/vsphere4/r40/vsp_40_iscsi_san_cfg.pdf
      • http://www.vmware.com/pdf/vsphere4/r40/vsp_40_resource_mgmt.pdf
      • http://www.vmware.com/pdf/GuestOS_guide.pdf
      • http://www.vmware.com/resources/techresources/10066
      • http://www.vmware.com/resources/techresources/10059
      • http://www.vmware.com/resources/techresources/10062
