VMware vSphere Performance Troubleshooting

From the Lewan

Published in: Technology, Design
  • Who uses Resource Pools? How many have reservations or limits?
  • Use a Host CPU stacked (per VM) graph to quickly identify leading consumers
  • CPU saturation is not necessary for overcommit to have an effect on performance

    1. 1. vSphere Performance Monitoring and Troubleshooting<br />Overview<br />What?<br />CPU, Memory, Disk, Network<br />How?<br />Use available tools and a systematic methodology<br />Why?<br />Need to build confidence in virtualizing critical and high demand applications<br />
    2. 2. vSphere Performance Monitoring and Troubleshooting<br />Top Issues<br />Top Issues:<br />Storage "performance capacity" oversubscription<br />Memory oversubscription<br />SMP overuse<br />Firmware & driver issues<br />
    3. 3. vSphere Performance Monitoring and Troubleshooting<br />What tools do we have at our disposal?<br />Top tools for information collection:<br />vCenter - Performance charts and alarms<br />Guest OS* - Task Manager/Resource Monitor and PerfMon<br />ESX Host - esxtop and vscsiStats<br />vSphere PowerCLI<br />*Guest-based monitoring is subject to inaccuracy<br />
    4. 4. vSphere Performance Monitoring and Troubleshooting<br />Prepare vCenter Settings<br />
    5. 5. vSphere Performance Monitoring and Troubleshooting<br />Prepare vCenter Settings<br />
    6. 6. vSphere Performance Monitoring and Troubleshooting<br />Prepare vCenter Settings<br />Prepare custom vCenter alerts:<br />Host Console Swap In Rate: 512 KBps Warning, 1024 KBps Alert<br />Host Console Swap Out Rate: 512 KBps Warning, 1024 KBps Alert<br />VM CPU Ready: 1000 ms Warning, 2000 ms Alert<br />VM Disk Latency: 20 ms Warning, 50 ms Alert<br />
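The warning and alert levels on this slide can also be checked by hand against a value read from a chart or from esxtop. The following is a minimal sketch, not part of the original deck: `classify` is a hypothetical helper, and the "strictly above the threshold" comparison is an assumption about how the alarm is configured.

```shell
#!/bin/sh
# classify VALUE WARN ALERT -> prints OK, Warning, or Alert.
# Hypothetical helper; "strictly above" is an assumed comparison.
classify() {
    awk -v v="$1" -v w="$2" -v a="$3" 'BEGIN {
        if (v + 0 > a + 0)      print "Alert"
        else if (v + 0 > w + 0) print "Warning"
        else                    print "OK"
    }'
}

# Thresholds taken from the slide above.
classify 1500 1000 2000   # VM CPU Ready in ms     -> Warning
classify 12 20 50         # VM disk latency in ms  -> OK
classify 2048 512 1024    # swap-in rate in KBps   -> Alert
```

Keeping the thresholds as arguments makes it easy to reuse the same check for each of the four alerts.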
    7. 7. vSphere Performance Monitoring and Troubleshooting<br />Prepare vCenter Settings<br />
    8. 8. vSphere Performance Monitoring and Troubleshooting<br />Prepare vCenter Settings<br />
    9. 9. vSphere Performance Monitoring and Troubleshooting<br />Prepare esxtop<br />ESXTOP realtime monitoring:<br />esxtop (run command from SSH or tech-support mode)<br />s 2 (refresh view every 2 seconds)<br />V (view VMs only)<br />h (for quick in-tool command reference)<br />Batch mode for a 5-minute capture of all stats:<br />esxtop -b -a -d 2 -n 150 > esxtop_capture.csv<br />
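The batch-mode numbers follow from simple arithmetic: a 2-second delay times 150 iterations gives 300 seconds, or 5 minutes. A small sketch of that calculation for other capture lengths follows; `samples` is a hypothetical helper name, and the command is printed rather than executed so the sketch can be tried away from an ESX host.

```shell
#!/bin/sh
# samples DURATION_SECONDS DELAY_SECONDS -> iteration count for esxtop -n
# Hypothetical helper; integer division, so pick a duration that the
# delay divides evenly.
samples() {
    echo $(( $1 / $2 ))
}

duration=300   # 5 minutes, as on the slide
delay=2        # seconds between samples (esxtop -d)
count=$(samples "$duration" "$delay")

# Print the resulting command instead of running it.
echo "esxtop -b -a -d $delay -n $count > esxtop_capture.csv"
```

With the values shown, this prints the same command as the slide.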
    10. 10. vSphere Performance Monitoring and Troubleshooting<br />Prepare PowerCLI<br />Run PowerCLI:<br />Tip: Run as Administrator<br />Set-ExecutionPolicy RemoteSigned<br />Connect-VIServer -Server <host> -Protocol https -User <user> -Password <pass><br /> <host> can be the IP address or name of an ESX server or vCenter<br />Get-VM<br />Get-Stat -common -realtime<br />
    11. 11. vSphere Performance Monitoring and Troubleshooting<br />Where do we get started?<br />
    12. 12. vSphere Performance Monitoring and Troubleshooting<br />Network Overview<br />
    13. 13. vSphere Performance Monitoring and Troubleshooting<br />Network<br />Troubleshooting Guidance:<br />1. Physical Issues - A bad cable, a failing switch port or NIC, or an incompatible/flawed firmware or device driver (use VMXNET3 whenever possible)<br />2. Configuration Issues - Inconsistent configuration of vSwitches, Port Groups, or upstream VLAN trunks<br />3. Capacity Issues - Too many VMs on a single NIC; inadequate switch backplane or uplink capacity; sharing “unmanaged” network infrastructure for storage and data<br />4. Thresholds – Bandwidth saturation, dropped packets<br />
    14. 14. vSphere Performance Monitoring and Troubleshooting<br />Network – What can we see?<br />
    15. 15. vSphere Performance Monitoring and Troubleshooting<br />Network<br />vCenter Metrics: <br />Receive packets dropped<br />Transmit packets dropped<br />
    16. 16. vSphere Performance Monitoring and Troubleshooting<br />Network<br />ESXTOP Metrics:<br />
    17. 17. vSphere Performance Monitoring and Troubleshooting<br />Network<br />ESXTOP Commands:<br />esxtop<br />s 2<br />n<br />f<br />
    18. 18. vSphere Performance Monitoring and Troubleshooting<br />Network<br />ESXTOP Example:<br />
    19. 19. vSphere Performance Monitoring and Troubleshooting<br />Network<br />PowerCLI Commands:<br />Get-Stat -net -realtime<br />Get-Stat -Entity <Host> -stat net.droppedRx.summation<br />Get-Stat -Entity <Host> -stat net.droppedTx.summation<br />
    20. 20. vSphere Performance Monitoring and Troubleshooting<br />Network – What can’t we see?<br />
    21. 21. vSphere Performance Monitoring and Troubleshooting<br />Network<br />Possible resources for external monitoring:<br />Native Telnet/SSH/HTTP-based interface counters and stats<br />Third-party SNMP, NetFlow and ICMP tools<br />
    22. 22. vSphere Performance Monitoring and Troubleshooting<br />CPU Overview<br />
    23. 23. vSphere Performance Monitoring and Troubleshooting<br />CPU<br />Troubleshooting Guidance:<br />1. Physical Issues - Rare and always catastrophic (i.e. obvious)<br />2. Configuration Issues - Too many / too few vCPUs per VM; SMP/HAL mismatch; incorrect CPU affinity settings<br />3. Capacity Issues - CPU saturation at the guest or host level; CPU starvation due to high IO or other system level ops<br />4. Thresholds – Waiting for CPU cycles (due to co-scheduling, swapping, high IO)<br />
    24. 24. vSphere Performance Monitoring and Troubleshooting<br />CPU – What can we see?<br />
    25. 25. vSphere Performance Monitoring and Troubleshooting<br />CPU<br />vCenter Metrics: <br />Host/Guest Saturation<br />Stacked Graph (per VM)<br />Usage<br />
    26. 26. vSphere Performance Monitoring and Troubleshooting<br />CPU<br />vCenter Metrics:<br />Guest<br />Ready (realtime chart: value in ms / 200 = n%)<br />Swap Wait<br />
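CPU Ready in vCenter is a summation in milliseconds over the chart's sample interval, so converting it to a percentage means dividing by the interval length. A minimal sketch of that conversion, assuming the 20-second realtime interval as the default; `ready_pct` is a hypothetical helper name.

```shell
#!/bin/sh
# ready_pct READY_MS [INTERVAL_SECONDS] -> CPU ready as a percentage.
# Hypothetical helper; the interval defaults to 20 s (realtime charts).
ready_pct() {
    awk -v ms="$1" -v s="${2:-20}" \
        'BEGIN { printf "%.1f\n", ms / (s * 1000) * 100 }'
}

ready_pct 1000        # 1000 ms in a 20 s sample   -> 5.0
ready_pct 2000        # 2000 ms in a 20 s sample   -> 10.0
ready_pct 30000 300   # 30000 ms in a 300 s sample -> 10.0
```

The 1000 ms and 2000 ms inputs correspond to 5% and 10% of a 20-second realtime sample.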
    27. 27. vSphere Performance Monitoring and Troubleshooting<br />CPU<br />ESXTOP Metrics:<br />
    28. 28. vSphere Performance Monitoring and Troubleshooting<br />CPU<br />ESXTOP Commands:<br />esxtop<br />s 2<br />V<br />c<br />e GID (expand/contract a VM world)<br />
    29. 29. vSphere Performance Monitoring and Troubleshooting<br />CPU<br />ESXTOP Example:<br />Excessive vCPUs<br />
    30. 30. vSphere Performance Monitoring and Troubleshooting<br />CPU<br />ESXTOP Example:<br /> Now with fewer vCPUs<br />
    31. 31. vSphere Performance Monitoring and Troubleshooting<br />CPU<br />ESXTOP Example:<br />SMP impacting multiple VMs<br />
    32. 32. vSphere Performance Monitoring and Troubleshooting<br />CPU<br />PowerCLI Example<br />Get-Stat -cpu<br />Get-Stat -Entity <VM> -stat cpu.ready.summation -realtime<br />Very cool script code at:<br />http://www.peetersonline.nl/index.php/vmware/examine-vmware-cpu-ready-times-with-powershell/<br />
    33. 33. vSphere Performance Monitoring and Troubleshooting<br />CPU – Not much else to see…<br />
    34. 34. vSphere Performance Monitoring and Troubleshooting<br />CPU<br />Possible resources for external monitoring:<br />Vendor specific systems management tools,<br />MS System Center, etc.<br />http://www.peetersonline.nl/index.php/vmware/examine-vmware-cpu-ready-times-with-powershell/<br />
    35. 35. vSphere Performance Monitoring and Troubleshooting<br />Memory Overview<br />
    36. 36. vSphere Performance Monitoring and Troubleshooting<br />Memory<br />Troubleshooting Guidance:<br />1. Physical Issues - Rare and usually catastrophic<br />2. Configuration Issues - Memory overcommit; incorrect configuration of shares, reservations or limits<br />3. Capacity Issues - Physical memory exhaustion<br />4. Thresholds – Active memory swapping<br />
    37. 37. vSphere Performance Monitoring and Troubleshooting<br />Memory – What can we see?<br />
    38. 38. vSphere Performance Monitoring and Troubleshooting<br />Memory<br />vCenter Metrics<br />Swap in rate<br />Swap out rate<br />Swap used<br />
    39. 39. vSphere Performance Monitoring and Troubleshooting<br />Memory<br />ESXTOP Metrics:<br />
    40. 40. vSphere Performance Monitoring and Troubleshooting<br />Memory<br />ESXTOP Commands:<br />esxtop<br />s 2<br />V<br />m<br />f<br />
    41. 41. vSphere Performance Monitoring and Troubleshooting<br />Memory<br />ESXTOP Example:<br />m – Heavy swapping and ballooning<br />
    42. 42. vSphere Performance Monitoring and Troubleshooting<br />Memory<br />PowerCLI Commands:<br />Get-Stat -mem<br />Get-Stat -Entity <VM> -stat mem.swapoutRate.average -realtime<br />Get-Stat -Entity <VM> -stat mem.swapinRate.average -realtime<br />Get-Stat -Entity <VM> -stat mem.vmmemctl.average -realtime<br />Get-Stat -Entity <Host> -stat mem.swapused.average -realtime<br />
    43. 43. vSphere Performance Monitoring and Troubleshooting<br />Memory – The occasional DIMM failure…<br />
    44. 44. vSphere Performance Monitoring and Troubleshooting<br />Memory<br />Possible external monitoring options:<br />Vendor specific systems management tools, MS System Center, etc.<br />Don’t forget vCenter ‘Hardware Status’ reporting<br />
    45. 45. vSphere Performance Monitoring and Troubleshooting<br />Storage Overview<br />
    46. 46. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />Troubleshooting Guidance:<br />1. Physical Issues - A bad cable, a failing switch port or HBA/NIC, or an incompatible/flawed firmware or device driver (use LSI Logic Parallel/SAS as appropriate)<br />2. Configuration Issues - Inconsistent or incorrect configuration of LUN masking, zoning, or multi-pathing; inappropriate resource provisioning; aligning queue depth with storage type<br />3. Capacity Issues - Too many VMs or VMDKs on a LUN; too much IO load for an array or RAID group<br />4. Thresholds – Latency and queuing<br />
    47. 47. vSphere Performance Monitoring and Troubleshooting<br />Storage – What can we see?<br />
    48. 48. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />vCenter Metrics:<br />Datastore<br />Read latency<br />Write latency<br />
    49. 49. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />ESXTOP Metrics:<br />
    50. 50. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />
    51. 51. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />ESXTOP Commands (HBA/LUN):<br />esxtop<br />s 2<br />V<br />d<br />f<br />e vmhba#<br />
    52. 52. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />ESXTOP Commands (LUN/Datastore):<br />esxtop<br />s 2<br />V<br />u<br />L 38<br />f<br />e <devname><br />
    53. 53. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />ESXTOP Commands (VM/VMDK):<br />esxtop<br />s 2<br />V<br />v<br />f<br />e GID<br />
    54. 54. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />ESXTOP Examples: <br />d - Multipathing / Expand adapter to view targets<br />
    55. 55. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />ESXTOP Examples: <br />u - Queuing, Disk or Kernel?<br />
    56. 56. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />ESXTOP Examples:<br />v - Identify the IO consumer<br />
    57. 57. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />vscsiStats Command:<br />[root@host ~]# cd /usr/lib/vmware/bin<br />./vscsiStats -l<br />./vscsiStats -s -w <worldid><br />./vscsiStats -w <worldid> -p all -c > /path/vscsistats.csv<br />./vscsiStats -x<br />
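The vscsiStats commands above form a sequence: start collection for a world ID, let the workload run, export the histograms as CSV, then stop collection. The following is a rough sketch that prints that sequence for a given world ID. `vscsi_plan` is a hypothetical helper, the world ID, wait time and output path are placeholder values, and the sleep between start and export is an assumption about how long to collect.

```shell
#!/bin/sh
# vscsi_plan WORLDID SECONDS OUTFILE -> prints the capture commands.
# Hypothetical helper: it only prints the plan; pipe the output to sh
# on an ESX host to run it.
vscsi_plan() {
    worldid="$1"; seconds="$2"; out="$3"
    bin=/usr/lib/vmware/bin/vscsiStats
    cat <<EOF
$bin -s -w $worldid
sleep $seconds
$bin -w $worldid -p all -c > $out
$bin -x
EOF
}

# Placeholder world ID (take a real one from "vscsiStats -l").
vscsi_plan 12345 60 /tmp/vscsistats.csv
```

Printing the plan first makes it possible to review the world ID and output path before anything runs on the host.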
    58. 58. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />vscsiStats Example:<br />
    59. 59. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />vscsiStats Example:<br />
    60. 60. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />vscsiStats Example:<br />http://dunnsept.wordpress.com/2010/03/11/new-vscsistats-excel-macro/<br />
    61. 61. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />vscsiStats histograms:<br />
    62. 62. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />PowerCLI Commands:<br />Get-Stat -disk<br />Get-Stat -stat disk.totalLatency.average -realtime<br />Get-Stat -stat disk.deviceLatency.average -realtime<br />Get-Stat -stat disk.kernelLatency.average -realtime<br />
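The three latency counters are related: total latency seen by the guest is roughly device latency plus kernel latency. Splitting the total this way helps answer the earlier "Disk or Kernel?" question. A minimal sketch of that split follows; `latency_split` is a hypothetical helper and the sample values are made up for illustration.

```shell
#!/bin/sh
# latency_split DEVICE_MS KERNEL_MS -> total and the share of each part.
# Hypothetical helper; assumes total = device + kernel latency.
latency_split() {
    awk -v d="$1" -v k="$2" 'BEGIN {
        t = d + k
        if (t == 0) { print "total=0.0ms"; exit }
        printf "total=%.1fms device=%.0f%% kernel=%.0f%%\n",
               t, d / t * 100, k / t * 100
    }'
}

latency_split 18 2   # -> total=20.0ms device=90% kernel=10%
latency_split 5 15   # -> total=20.0ms device=25% kernel=75%
```

A high device share points toward the array, fabric or path; a high kernel share points toward queuing inside the host.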
    63. 63. vSphere Performance Monitoring and Troubleshooting<br />Storage – What can’t we see?<br />
    64. 64. vSphere Performance Monitoring and Troubleshooting<br />Storage – More of what we can’t see<br />
    65. 65. vSphere Performance Monitoring and Troubleshooting<br />Storage<br />Possible external monitoring solutions:<br />Vendor specific SAN and fabric/network tools, native Telnet/SSH/HTTP-based tools for most networks, third-party SNMP-based tools<br />
    66. 66. vSphere Performance Monitoring and Troubleshooting<br />Working with PowerCLI<br />PowerCLI Tips:<br />For a complete list of stat objects:<br />Get-StatType -Entity <Host/VM><br />Pipe the outputs to a file:<br />Get-Stat -stat <stat> -realtime | ft -autosize > c:\temp\<filename>.csv<br />Import the file data to a spreadsheet with fixed width parameters<br />Build pretty graphs<br />
    67. 67. vSphere Performance Monitoring and Troubleshooting<br />Working with PowerCLI<br />
    68. 68. vSphere Performance Monitoring and Troubleshooting<br />Way More Information<br />ESXTOP / vscsiStats / PowerCLI:<br />http://www.yellow-bricks.com/esxtop/ Special thanks to Duncan Epping!<br />http://communities.vmware.com/docs/DOC-3930<br />http://communities.vmware.com/docs/DOC-9279<br />http://communities.vmware.com/docs/DOC-10095<br />http://www.vmware.com/support/developer/PowerCLI/PowerCLI41/html/Get-Stat.html<br />http://www.lucd.info/2009/12/30/powercli-vsphere-statistics-part-1-the-basics/<br />http://simongreaves.co.uk/blog/esxtop-guide<br />http://dunnsept.wordpress.com/2010/03/11/new-vscsistats-excel-macro/<br />
    69. 69. vSphere Performance Monitoring and Troubleshooting<br />Easy button?<br />What is the problem with these tools?<br />Limited alerting mechanisms, no collection automation or historical data for comparison, and no correlation of events!<br />vCenter Operations Standard / Enterprise<br />
