© 2010 VMware Inc. All rights reserved
Advanced performance troubleshooting using
esxtop/resxtop
Krishna Raj Raja
Staff Engineer, Performance Group
2
Disclaimer
This session may contain product features that are
currently under development.
This session/overview of the new technology represents
no commitment from VMware to deliver these features in
any generally available product.
Features are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new technologies or features
discussed or presented have not been determined.
“THESE FEATURES ARE REPRESENTATIVE OF FEATURE AREAS UNDER
DEVELOPMENT. FEATURE COMMITMENTS ARE SUBJECT TO CHANGE, AND
MUST NOT BE INCLUDED IN CONTRACTS, PURCHASE ORDERS, OR SALES
AGREEMENTS OF ANY KIND. TECHNICAL FEASIBILITY AND MARKET
DEMAND WILL AFFECT FINAL DELIVERY.”
3
esxtop resources
esxtop manual:
http://www.vmware.com/pdf/vsphere4/r41/vsp_41_resource_mgmt.pdf
VMware Community documents:
http://communities.vmware.com/docs/DOC-9279 - ESX 4.0
http://communities.vmware.com/docs/DOC-11812 - ESX 4.1
esxtop for advanced users:
VMworld 2008 - http://vmworld.com/docs/DOC-2356
VMworld 2009 - http://vmworld.com/docs/DOC-3838
4
Ten things that you need to know about
esxtop
5
esxtop counters
1. esxtop does not create performance metrics
• esxtop derives performance metrics from raw counters exported in the
VMkernel System Info nodes (VSI nodes)
• esxtop can show new counters on older ESX systems if the raw counters
are present in the VMkernel
6
esxtop counters
2. Counter values
• Many raw counters have static values that do not change with time; esxtop
displays them as they are
• Many counters increment monotonically; esxtop reports the delta for these
over the given refresh interval, for instance CMDS/sec and packets
transmitted/sec
• %USED and %RUN: CPU occupancy delta between successive
snapshots
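The delta computation described above can be sketched as follows. This is a minimal illustration, not esxtop's actual code: the `Snapshot` type, its field names, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One reading of a monotonically increasing raw counter (hypothetical type)."""
    timestamp_s: float  # time of the reading, in seconds
    commands: int       # cumulative number of commands issued so far


def rate_per_second(prev: Snapshot, curr: Snapshot) -> float:
    """Turn two cumulative readings into a per-second rate for the interval."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("snapshots must be in increasing time order")
    return (curr.commands - prev.commands) / elapsed


if __name__ == "__main__":
    first = Snapshot(timestamp_s=100.0, commands=50_000)
    second = Snapshot(timestamp_s=105.0, commands=52_500)
    # 2,500 commands over a 5-second refresh interval
    print(rate_per_second(first, second))
```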
7
Refresh interval
3. Graphs will look different depending on the refresh interval
• Many counter values depend on the refresh interval
• A larger refresh interval smooths out spikes and troughs
Graphs shown: 2-second refresh interval and 10-second refresh interval
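The smoothing effect of a larger refresh interval can be reproduced with a small sketch. The sample values are invented, and averaging five 2-second samples is only an approximation of what a 10-second interval would report.

```python
def average_over_interval(samples, interval):
    """Average consecutive groups of `interval` samples.

    A trailing group with fewer than `interval` samples is dropped.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return [
        sum(samples[i:i + interval]) / interval
        for i in range(0, len(samples) - interval + 1, interval)
    ]


if __name__ == "__main__":
    # Hypothetical utilization readings taken every 2 seconds
    two_second = [10, 90, 10, 10, 10, 10, 80, 10, 10, 10]
    # Five 2-second samples approximate one 10-second sample
    ten_second = average_over_interval(two_second, 5)
    print("peak at 2 s:", max(two_second))
    print("peak at 10 s:", max(ten_second))
```

The 90% spike visible in the 2-second data disappears in the 10-second view, which is why the same workload can produce different-looking graphs.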
8
esxtop counters
4. Counter normalization
• By default, counters are shown per group
• In group view, counter values are cumulative
• In expanded view, counters are normalized per entity
Cumulative stats
vcpu world consumes CPU
Pressing the ‘e’ key expands a group
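The relationship between the two views can be sketched as a simple sum. The world names and values below are hypothetical; the point is only that the group row adds up the rows you see after expanding the group.

```python
def group_used_pct(world_used_pct):
    """Sum per-world %USED values into the cumulative group value."""
    return sum(world_used_pct.values())


if __name__ == "__main__":
    # Hypothetical worlds belonging to one 2-vCPU VM
    worlds = {"vcpu-0": 80.0, "vcpu-1": 75.0, "vmx": 1.5}
    # The group total can exceed 100 because it spans several worlds
    print(group_used_pct(worlds))
```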
9
esxtop counters
5. %USED can exceed 100
• Turbo boost can increase the processor clock speed
• Asynchronous work can happen on a different core on behalf of the
VM
Example: a VM on an NFS datastore running an I/O-intensive workload
10
esxtop batch mode
6. Batch mode (-b)
• Produces a Windows perfmon-compatible CSV file
• CSV file compatibility requires a fixed number of columns in every row, so
statistics for VM/world instances that appear after batch mode starts
are not collected
• Only counters specified in the configuration file are collected; the (-a)
option collects all counters
• Counters are named slightly differently than in interactive mode
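Because batch mode writes plain CSV, the output can also be post-processed with a script. The sketch below pulls one counter column out of a batch file. The header layout in `SAMPLE_CSV` (a timestamp column followed by `\\host\Group(instance)\Counter` names) is an assumption modeled on the perfmon CSV style, and the host, VM, and values are invented; check the header of a real file before relying on any name.

```python
import csv
import io

# Hypothetical excerpt of a batch-mode file; real counter names may differ.
SAMPLE_CSV = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\host1\Group Cpu(1:vm-a)\% Used","\\host1\Group Cpu(1:vm-a)\% Ready"',
    r'"06/01/2010 10:00:00","42.5","1.2"',
    r'"06/01/2010 10:00:02","47.5","3.4"',
])


def column_series(csv_text, name_fragment):
    """Return the values of the single column whose header contains name_fragment."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    matches = [i for i, name in enumerate(header) if name_fragment in name]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one matching column, found {len(matches)}")
    index = matches[0]
    # Every row has the same number of columns, so a fixed index is safe
    return [float(row[index]) for row in reader if row]


if __name__ == "__main__":
    used = column_series(SAMPLE_CSV, "% Used")
    print(used, sum(used) / len(used))
```

The fixed column index works only because every row has the same columns, which is the same constraint that prevents batch mode from picking up VMs started later.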
11
esxtop batch mode – importing data into perfmon
12
esxtop batch mode – viewing data in perfmon
13
esxtop batch mode – trimming data
Trimming data
Saving data after trim
14
esxplot
http://labs.vmware.com/flings/esxplot
15
I/O Latencies
7. IO latencies
• I/O latencies are measured per SCSI command, so they are not affected by
the refresh interval
• Reported latencies are averages over all the SCSI commands issued
within the refresh interval window
• Reported average latencies can differ between screens (adapter,
LUN, VM), since each screen accounts for a different group of I/Os
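A small sketch shows why the same commands yield different averages on different screens. The command records, device names, and latencies are made up for illustration; only the grouping idea comes from the bullets above.

```python
from collections import defaultdict


def average_latency_by(commands, key):
    """Average per-command latency after grouping commands by the given field."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [latency sum, command count]
    for cmd in commands:
        bucket = totals[cmd[key]]
        bucket[0] += cmd["latency_ms"]
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Hypothetical SCSI commands completed within one refresh interval
COMMANDS = [
    {"adapter": "vmhba1", "lun": "lun0", "vm": "vm-a", "latency_ms": 2.0},
    {"adapter": "vmhba1", "lun": "lun0", "vm": "vm-b", "latency_ms": 4.0},
    {"adapter": "vmhba1", "lun": "lun1", "vm": "vm-b", "latency_ms": 12.0},
]

if __name__ == "__main__":
    for view in ("adapter", "lun", "vm"):
        print(view, average_latency_by(COMMANDS, view))
```

The adapter view blends the slow LUN with the fast one, while the LUN and VM views isolate it, so none of the three averages match.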
16
resxtop – remote esxtop
8. You can use resxtop to connect to different ESX hosts
• Newer versions of resxtop will connect to older ESX hosts
9. You don’t need root access to view esxtop counters
• resxtop can authenticate using vCenter credentials
17
esxtop CPU usage
10. esxtop can consume a non-trivial amount of CPU
• This happens when you have a very large inventory (VMs, LUNs, virtual disks, virtual NICs,
etc.)
• You can limit the amount of data collected by limiting the fields (columns)
and entities (rows); you can also reduce CPU consumption by locking
entities with the (-l) option
CPU consumption on a host with 512 VMs
CPU consumption with esxtop -l
CPU usage when using resxtop
18
Performance Troubleshooting Using
esxtop
19
esxtop screens
Screens
• c: cpu (default)
• m: memory
• n: network
• d: disk adapter
• u: disk device (added in ESX 3.5)
• v: disk VM (added in ESX 3.5)
• i: Interrupts (new in ESX 4.0)
• p: power management (new in ESX 4.1)
VMkernel components and their screens:
CPU Scheduler: c, i, p
Memory Scheduler: m
Virtual Switch: n
vSCSI: d, u, v
20
Troubleshooting CPU Problems
21
CPU Constrained
SMP VM
High CPU utilization
Both the virtual CPUs
CPU constrained
22
CPU Contention
4 CPUs, all at 100%
3 SMP VMs
VMs don’t get to run all the time
%ready accumulates
23
CPU Limit
Max Limited
CPU Limit
AMAX = -1 : Unlimited
24
Mis-configured SMP VM
vCPU 1 not used by the VM
Incorrect (UP) Kernel/HAL inside the guest, or the application inside the guest is single-threaded
25
Power management – CPU frequency scaling
C states: C0 – busy, C1 – halted, C2 – deep halt
P states: P0 – Highest clock frequency, P11 – Lowest clock frequency
26
VM Power Usage
Experimental feature, not enabled by default.
VMkernel advanced setting: Power.ChargeVMs
27
CPU clock frequency scaling
%USED: CPU usage with reference to base clock frequency
%UTIL: CPU utilization with reference to current clock frequency
%RUN: CPU scheduled time
VM is running all the time but uses only 75% of the clock frequency
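The 75% case above can be approximated with a one-line relation between %RUN and %USED. This is a deliberately simplified sketch: it scales scheduled time by the ratio of current to base clock frequency and ignores the other terms that feed into %USED, such as system time charged to the VM. The function name and the frequencies are assumptions for the example.

```python
def approx_used_pct(run_pct, current_mhz, base_mhz):
    """Approximate %USED by scaling %RUN with the current/base frequency ratio."""
    if base_mhz <= 0:
        raise ValueError("base_mhz must be positive")
    return run_pct * current_mhz / base_mhz


if __name__ == "__main__":
    # Scheduled all the time, but the core is clocked at 75% of its base frequency
    print(approx_used_pct(100.0, 1995.0, 2660.0))
    # Scheduled all the time with the core boosted above its base frequency
    print(approx_used_pct(100.0, 3000.0, 2400.0))
```

The second call illustrates how a boosted clock can push %USED above 100 even though %RUN cannot exceed 100 for a single world.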
28
Hyperthreading
Two VMs running on different cores
Two VMs sharing the same core
%LAT_C counter shows the time de-scheduled due to core sharing
29
Timer interrupt rate
Linux Guests
30
Timer interrupt rate
Windows Guests – Multimedia timer
31
New metrics in CPU screen
%LAT_C : % of time the VM was not scheduled due to a CPU resource issue
%LAT_M : % of time the VM was not scheduled due to a memory resource issue
%DMD : Moving average of CPU utilization over the last minute
EMIN : Minimum CPU resources in MHz that the VM is guaranteed to get
when there is CPU contention
32
Troubleshooting Memory Problems
33
esxtop memory screen (m)
Possible states: high, soft, hard, and low
PMEM – Total physical memory
VMKMEM - Memory managed by the VMkernel
COSMEM - Memory used by the Service Console
34
Not able to power-on a new VM
Memory reservation
820 MB reservation requested
Overhead memory needs to be reserved
4G memory reservation
35
Granted Memory
Granted Memory = Memory touched by the guest
Windows and FreeBSD guests touch (zero) all their memory during boot
Linux guests touch memory when they first use it
36
Ballooning versus Swapping
MCTL: N - Balloon driver not active, tools probably not installed
Memory hog VMs
Swapped in the past but not actively swapping now
Swap target is higher for the VM without the balloon driver
VM with the balloon driver swaps less
37
Memory Compression Stats
COWH : Copy-on-write page hints – amount of memory in MB that is
potentially shareable
CACHESZ: Compression Cache size
CACHEUSD: Compression Cache currently used
ZIP/s, UNZIP/s: Memory compression/decompression rate
38
Wide NUMA - CPU
2 NUMA nodes with ~6G each
NUMA home node not assigned
6-vcpu VM – cannot fit into a NUMA node size of 4 CPUs
4G, can fit into a single node
NUMA affinity not set
NUMA machine with 2 nodes
CPU affinity set to wrong NUMA node
All the memory in remote node
NHN: NUMA Home Node
NLMEM: Memory in local node
NRMEM: Memory in remote node
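NLMEM and NRMEM can be combined into a single locality figure, which makes a mis-set affinity easy to spot. This is a minimal sketch; the function name and sample values are assumptions.

```python
def local_memory_pct(nlmem_mb: float, nrmem_mb: float) -> float:
    """Percentage of the VM's memory that resides on its home node."""
    total = nlmem_mb + nrmem_mb
    if total <= 0:
        raise ValueError("VM has no memory accounted on any node")
    return 100.0 * nlmem_mb / total


if __name__ == "__main__":
    # All memory remote, as in the wrong-affinity case above
    print(local_memory_pct(0.0, 4096.0))
    # Mostly local memory
    print(local_memory_pct(3072.0, 1024.0))
```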
40
Wide NUMA - Memory
2 NUMA nodes with ~6G each
NUMA home node not assigned
VM cannot fit into a single NUMA node
41
Troubleshooting Network Problems
42
vSwitch active uplink
TEAM-PNIC : The uplink that the virtual switch port is currently using
43
Dropped packets at vSwitch
Packet drops usually happen when the traffic has
no flow control (UDP/multicast/broadcast packets)
44
Multicast/Broadcast stats
PKTTXMUL/s – Multicast packets transmitted per second
PKTRXMUL/s – Multicast packets received per second
PKTTXBRD/s – Broadcast packets transmitted per second
PKTRXBRD/s – Broadcast packets received per second
45
NFS stats
DAVG and KAVG are not available for network-backed storage
GAVG gives the end-to-end latency
46
Troubleshooting Disk Problems
47
Disk I/O latency
Host bus adapters (HBAs) - includes SCSI, iSCSI, RAID, and FC-HBA adapters
Latency stats from the Device, Kernel and the Guest
DAVG/cmd - Average latency (ms) from the Device (LUN)
KAVG/cmd - Average latency (ms) in the VMkernel
GAVG/cmd - Average latency (ms) in the Guest
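The three latency counters are related: the latency the guest observes is approximately the device latency plus the time spent in the VMkernel. The sketch below uses that relation to show where the time goes. The function names and sample values are assumptions, and no alerting thresholds are implied.

```python
def guest_latency_ms(davg_ms: float, kavg_ms: float) -> float:
    """Approximate GAVG as the sum of device and kernel latency."""
    return davg_ms + kavg_ms


def dominant_component(davg_ms: float, kavg_ms: float) -> str:
    """Name the layer contributing most of the guest-observed latency."""
    return "device" if davg_ms >= kavg_ms else "kernel"


if __name__ == "__main__":
    # Slow array: most of the latency comes from the device
    print(guest_latency_ms(8.0, 0.5), dominant_component(8.0, 0.5))
    # Queuing in the host: kernel latency dominates
    print(guest_latency_ms(0.2, 5.0), dominant_component(0.2, 5.0))
```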
48
Problem with the disk subsystem
Bad throughput
Good throughput
Device latency is high - cache disabled
Low device latency
49
Insufficient Queue depth
Non-zero KAVG
Queuing at the HBA
50
FC bottleneck
‘v’ – VM view
‘u’ – device view
‘d’ – adapter view
51
vStorage API for Array Integration (VAAI) stats
CLONE_RD, CLONE_WR: Number of Clone read/write requests
CLONE_F: Number of Failed clone operations
MBC_RD/s, MBC_WR/s – Clone read/write MBs/sec
ATS – Number of ATS commands
ATSF – Number of failed ATS commands
ZERO – Number of Zero requests
ZEROF – Number of failed zero requests
MBZERO/s – Megabytes Zeroed per second
52
VAAI - virtual disk creation example
vStorage API for Array Integration (VAAI)
53
SCSI reservation conflicts
54
Other diagnostic tools
55
Other diagnostic tools (1 of 2)
sched-stats and schedtrace
• vm-support -s/-S flag captures sched-stats
• vm-support -c flag captures the scheduler trace, which takes a lot of disk space
memstats
• Provides detailed memory usage stats with resource pool hierarchy
ft-stats
• FT virtual machine stats
• Collected with the vm-support -s/-S flag
56
Other diagnostic tools (2 of 2)
swatchStats
• Stopwatch stats for VMFS, SCSI events
vscsiStats
• Virtual machine SCSI disk I/O stats
• Provides histogram information for latency, IO size, inter-arrival time and
outstanding I/Os
57
vscsiStats
Virtual SCSI disk handle ids - unique across virtual machines
World group leader id
Virtual machine name
# vscsiStats -l
58
vscsiStats – latency histogram
# vscsiStats -p latency -w 118739 -i 8205
Latency in microseconds
I/O distribution count
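A histogram like this one can be reduced to approximate percentiles. The sketch below assumes the bucket upper bounds and counts have already been copied out of the tool's output into a list; it does not parse vscsiStats output, and the sample buckets are invented.

```python
def bucket_for_percentile(buckets, percentile):
    """Return the upper bound of the bucket that contains the given percentile.

    `buckets` is a list of (upper_bound_us, count) pairs sorted by upper bound.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        raise ValueError("histogram is empty")
    threshold = total * percentile / 100.0
    running = 0
    for upper_bound_us, count in buckets:
        running += count
        if running >= threshold:
            return upper_bound_us
    return buckets[-1][0]


# Hypothetical latency histogram: (bucket upper bound in microseconds, I/O count)
LATENCY_BUCKETS = [(100, 10), (500, 60), (1000, 20), (5000, 8), (15000, 2)]

if __name__ == "__main__":
    for p in (50, 95, 100):
        print(p, bucket_for_percentile(LATENCY_BUCKETS, p))
```

Because only bucket boundaries are known, the result is an upper bound for the percentile rather than an exact latency.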
59
vscsiStats – iolength histogram
# vscsiStats -p iolength -w 118739 -i 8205
I/O block size
Distribution count

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 

Advanced performance troubleshooting using esxtop

  • 1. Advanced performance troubleshooting using esxtop/resxtop. Krishna Raj Raja, Staff Engineer, Performance Group. © 2010 VMware Inc. All rights reserved.
  • 2. Disclaimer: This session may contain product features that are currently under development. This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product. Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new technologies or features discussed or presented have not been determined. “THESE FEATURES ARE REPRESENTATIVE OF FEATURE AREAS UNDER DEVELOPMENT. FEATURE COMMITMENTS ARE SUBJECT TO CHANGE, AND MUST NOT BE INCLUDED IN CONTRACTS, PURCHASE ORDERS, OR SALES AGREEMENTS OF ANY KIND. TECHNICAL FEASIBILITY AND MARKET DEMAND WILL AFFECT FINAL.”
  • 3. esxtop resources
      • esxtop manual: http://www.vmware.com/pdf/vsphere4/r41/vsp_41_resource_mgmt.pdf
      • VMware Community documents: http://communities.vmware.com/docs/DOC-9279 (ESX 4.0), http://communities.vmware.com/docs/DOC-11812 (ESX 4.1)
      • esxtop for advanced users: VMworld 2008, http://vmworld.com/docs/DOC-2356; VMworld 2009, http://vmworld.com/docs/DOC-3838
  • 4. Ten things that you need to know about esxtop
  • 5. esxtop counters: 1. esxtop does not create performance metrics
      • esxtop derives performance metrics from raw counters exported in the VMkernel System Info nodes (VSI nodes).
      • esxtop can show new counters on an older ESX system if the raw counters are present in the VMkernel.
  • 6. esxtop counters: 2. Counter values
      • Many raw counters have static values that do not change with time; esxtop displays them as is.
      • Many counters increment monotonically; esxtop reports the delta for these over the given refresh interval, for instance CMDS/sec and packets transmitted/sec.
      • %USED and %RUN: CPU occupancy delta between successive snapshots.
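The delta-per-interval idea on this slide can be sketched as follows. This is a minimal illustration, not esxtop's actual implementation; the `Snapshot` type, the function name, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    timestamp_s: float  # when the raw counter was sampled
    value: int          # monotonically increasing raw counter


def rate_per_second(prev: Snapshot, curr: Snapshot) -> float:
    """Delta of a monotonic counter divided by the elapsed interval."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("snapshots must be in increasing time order")
    return (curr.value - prev.value) / elapsed


# Hypothetical raw command counter sampled 5 seconds apart.
before = Snapshot(timestamp_s=100.0, value=12_000)
after = Snapshot(timestamp_s=105.0, value=14_500)
cmds_per_sec = rate_per_second(before, after)
print(cmds_per_sec)  # 500.0
```

A static counter, by contrast, would simply be displayed from the latest snapshot with no subtraction.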
  • 7. Refresh interval: 3. Graphs will look different depending on the refresh interval
      • Many counter values depend on the refresh interval.
      • A larger refresh interval smooths out spikes and troughs.
      • (Graphs: 2 second refresh interval; 10 second refresh interval)
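To see why a larger refresh interval flattens spikes, the sketch below averages the same made-up per-second series over 2-second and 10-second windows. The sample values and the helper name are assumptions for illustration only.

```python
def average_over_interval(samples, samples_per_interval):
    """Average consecutive samples, producing one value per refresh interval."""
    last_start = len(samples) - samples_per_interval + 1
    return [
        sum(samples[i:i + samples_per_interval]) / samples_per_interval
        for i in range(0, last_start, samples_per_interval)
    ]


# Hypothetical per-second utilization with a short 2-second spike.
per_second = [10, 10, 90, 90, 10, 10, 10, 10, 10, 10,
              10, 10, 10, 10, 10, 10, 10, 10, 10, 10]

two_second = average_over_interval(per_second, 2)
ten_second = average_over_interval(per_second, 10)

print(max(two_second))  # 90.0, the spike is fully visible
print(ten_second)       # [26.0, 10.0], the spike is averaged down
```

The underlying activity is identical in both cases; only the averaging window changes what the graph shows.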
  • 8. esxtop counters: 4. Counter normalization
      • By default, counters are shown for the group.
      • In group view, counter values are cumulative.
      • In expanded view, counters are normalized per entity.
      • (Screenshot annotations: cumulative stats; vcpu world consumes CPU; pressing the ‘e’ key expands a group)
  • 9. esxtop counters: 5. %USED can exceed 100
      • Turbo boost can increase the processor clock speed.
      • Asynchronous work can happen on a different core on behalf of the VM.
      • (Screenshot: VM on an NFS datastore running an I/O intensive workload)
  • 10. esxtop batch mode: 6. Batch mode (-b)
      • Produces a Windows perfmon compatible CSV file.
      • CSV file compatibility requires a fixed number of columns on every row; for this reason, statistics of VM/world instances that appear after batch mode starts are not collected.
      • Only counters that are specified in the configuration file are collected; the (-a) option collects all counters.
      • Counters are named slightly differently.
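Because every row has the same fixed set of columns, a batch-mode CSV can be post-processed with any CSV reader. The sketch below averages the columns whose header contains a given fragment. The header strings, host name, and VM names in `SAMPLE_CSV` are invented for illustration and are not copied from real esxtop output; it also assumes the first column holds the timestamp.

```python
import csv
import io

# Hypothetical two-sample capture; real header names will differ.
SAMPLE_CSV = r'''"(PDH-CSV 4.0) (UTC)(0)","\\host1\Group Cpu(1:vm-a)\% Used","\\host1\Group Cpu(2:vm-b)\% Used"
"06/01/2010 10:00:00","42.5","10.0"
"06/01/2010 10:00:05","57.5","12.0"
'''


def column_average(csv_text, name_fragment):
    """Return {column name: mean value} for columns matching name_fragment."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    result = {}
    for idx, name in enumerate(header):
        if idx == 0 or name_fragment not in name:
            continue  # skip the timestamp column and non-matching counters
        values = [float(row[idx]) for row in data]
        result[name] = sum(values) / len(values)
    return result


averages = column_average(SAMPLE_CSV, "vm-a")
print(averages)
```

Filtering by a name fragment is a convenience chosen here; with a large inventory it keeps the working set small before loading anything into a spreadsheet or perfmon.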
  • 11. esxtop batch mode: importing data into perfmon
  • 12. esxtop batch mode: viewing data in perfmon
  • 13. esxtop batch mode: trimming data (screenshots: trimming data; saving data after trim)
  • 15. I/O latencies: 7. I/O latencies
      • I/O latencies are measured per SCSI command, so they are not affected by the refresh interval.
      • Reported latencies are average values over all the SCSI commands issued within the refresh interval window.
      • Reported average latencies can differ between screens (adapter, LUN, VM), since each screen accounts for a different group of I/Os.
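A small worked example shows how the same commands yield different averages depending on how they are grouped. The per-command latencies below are made up, and the two "VMs" sharing one "LUN" are hypothetical.

```python
def average_latency_ms(latencies_ms):
    """Mean latency over all commands completed in the interval."""
    return sum(latencies_ms) / len(latencies_ms)


# Hypothetical per-command latencies within one refresh interval.
vm_a = [2.0, 2.0, 2.0, 2.0]  # four fast commands
vm_b = [20.0]                # one slow command

vm_a_view = average_latency_ms(vm_a)        # per-VM grouping
vm_b_view = average_latency_ms(vm_b)        # per-VM grouping
lun_view = average_latency_ms(vm_a + vm_b)  # both VMs share the LUN

print(vm_a_view, vm_b_view, lun_view)
```

Here the per-VM averages are 2 ms and 20 ms while the shared-LUN average lands in between, weighted toward the VM that issued more commands.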
  • 16. resxtop (remote esxtop)
      • 8. You can use resxtop to connect to different ESX hosts. A newer version of resxtop will connect to older ESX hosts.
      • 9. You don’t need root access to view esxtop counters. resxtop can authenticate using vCenter credentials.
  • 17. esxtop CPU usage: 10. esxtop can consume a non-trivial amount of CPU
      • This happens when you have a very large inventory (VMs, LUNs, virtual disks, virtual NICs, etc.).
      • You can limit the amount of data collected by limiting the fields (columns) and entities (rows). You can also reduce CPU consumption by locking entities with the (-l) option.
      • (Graphs: CPU consumption on a host with 512 VMs; CPU consumption with esxtop -l; CPU usage when using resxtop)
  • 19. esxtop screens
      • c: cpu (default)
      • m: memory
      • n: network
      • d: disk adapter
      • u: disk device (added in ESX 3.5)
      • v: disk VM (added in ESX 3.5)
      • i: interrupts (new in ESX 4.0)
      • p: power management (new in ESX 4.1)
      • (Diagram: VMs running on the VMkernel; CPU Scheduler: c, i, p; Memory Scheduler: m; Virtual Switch: n; vSCSI: d, u, v)
  • 21. CPU constrained: SMP VM with high CPU utilization; both virtual CPUs are CPU constrained.
  • 22. CPU contention: 4 CPUs, all at 100%; 3 SMP VMs. The VMs don’t get to run all the time, so %ready accumulates.
  • 24. Mis-configured SMP VM: vCPU 1 is not used by the VM. Either an incorrect (UP) kernel/HAL is installed inside the guest, or the application inside the guest is single threaded.
  • 25. Power management (CPU frequency scaling). C states: C0 = busy, C1 = halted, C2 = deep halt. P states: P0 = highest clock frequency, P11 = lowest clock frequency.
  • 26. VM power usage: experimental feature, not enabled by default. VMkernel advanced setting: Power.ChargeVMs
  • 27. CPU clock frequency scaling
      • %USED: CPU usage with reference to the base clock frequency
      • %UTIL: CPU utilization with reference to the current clock frequency
      • %RUN: CPU scheduled time
      • (Screenshot: the VM is running all the time but uses only 75% of the clock frequency)
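The relationship between scheduled time and base-frequency usage can be approximated with simple arithmetic. This is a deliberately simplified sketch: it ignores system time, overlap, and other terms that the real counters account for, and the clock frequencies are hypothetical.

```python
def approx_used_pct(run_pct, current_mhz, base_mhz):
    """Scale scheduled time by the ratio of current to base clock frequency.

    Simplified model for illustration; not the exact %USED formula.
    """
    return run_pct * current_mhz / base_mhz


# Scheduled 100% of the time, but the core is clocked at 75% of base.
scaled_down = approx_used_pct(100.0, current_mhz=1800.0, base_mhz=2400.0)

# Scheduled 100% of the time with the clock boosted above base.
boosted = approx_used_pct(100.0, current_mhz=2640.0, base_mhz=2400.0)

print(scaled_down)  # 75.0
print(boosted)      # 110.0
```

The second case matches the earlier point that %USED can exceed 100 when turbo boost raises the clock above its base frequency.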
  • 28. Hyperthreading: two VMs running on different cores versus two VMs sharing the same core. The %LAT_C counter shows the time de-scheduled due to core sharing.
  • 30. Timer interrupt rate: Windows guests, multimedia timer
  • 31. New metrics in CPU screen
      • %LAT_C: % of time the VM was not scheduled due to a CPU resource issue
      • %LAT_M: % of time the VM was not scheduled due to a memory resource issue
      • %DMD: moving average of CPU utilization over the last one minute
      • EMIN: minimum CPU resources in MHz that the VM is guaranteed to get when there is CPU contention
  • 33. esxtop memory screen (m)
      • Possible states: high, soft, hard and low
      • PMEM: total physical memory
      • VMKMEM: memory managed by the VMkernel
      • COSMEM: memory used by the Service Console
  • 34. Not able to power on a new VM (memory reservation). Screenshot annotations: 820 MB reservation requested; overhead memory needs to be reserved; 4G memory reservation.
  • 35. Granted memory: granted memory = memory touched by the guest. Windows and FreeBSD guests touch (zero) all their memory during boot. Linux guests touch memory when they first use it.
  • 36. Ballooning versus swapping. MCTL: N means the balloon driver is not active, tools probably not installed. (Screenshot annotations: memory hog VMs; swapped in the past but not actively swapping now; the swap target is higher for the VM without the balloon driver; the VM with the balloon driver swaps less)
  • 37. Memory compression stats
      • COWH: copy-on-write page hints, the amount of memory in MB that is potentially shareable
      • CACHESZ: compression cache size
      • CACHEUSD: compression cache currently used
      • ZIP/s, UNZIP/s: memory compression/decompression rate
  • 38. Wide NUMA (CPU): 2 NUMA nodes with ~6G each. NUMA home node not assigned. The 6-vcpu VM cannot fit into a NUMA node of 4 CPUs; its 4G of memory can fit into a single node.
  • 39. NUMA affinity not set: NUMA machine with 2 nodes; CPU affinity set to the wrong NUMA node; all the memory is in the remote node.
      • NHN: NUMA home node
      • NLMEM: memory in local node
      • NRMEM: memory in remote node
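One way to turn the NLMEM and NRMEM values into a single locality figure is to compute the local share of the VM's memory. The helper name and the sample values below are assumptions for illustration.

```python
def local_memory_pct(nlmem_mb, nrmem_mb):
    """Percentage of a VM's memory that resides on its NUMA home node."""
    total = nlmem_mb + nrmem_mb
    if total == 0:
        return 0.0  # nothing allocated yet
    return 100.0 * nlmem_mb / total


# Hypothetical VM with all of its memory on the remote node.
all_remote = local_memory_pct(nlmem_mb=0, nrmem_mb=4096)

# Hypothetical VM with three quarters of its memory local.
mostly_local = local_memory_pct(nlmem_mb=3072, nrmem_mb=1024)

print(all_remote)    # 0.0
print(mostly_local)  # 75.0
```

A low value flags the situation described on this slide, where CPU affinity keeps the VM away from the node that holds its memory.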
  • 40. Wide NUMA (memory): 2 NUMA nodes with ~6G each. NUMA home node not assigned. The VM cannot fit into a single NUMA node.
  • 42. vSwitch active uplink: TEAM-PNIC is the uplink that the virtual switch port is currently using.
  • 43. Dropped packets at vSwitch: packet drops usually happen when the traffic has no flow control (UDP/multicast/broadcast packets).
  • 44. Multicast/broadcast stats
      • PKTTXMUL/s: multicast packets transmitted per second
      • PKTRXMUL/s: multicast packets received per second
      • PKTTXBRD/s: broadcast packets transmitted per second
      • PKTRXBRD/s: broadcast packets received per second
  • 45. NFS stats: DAVG and KAVG are not available for network-backed storage. GAVG gives the end-to-end latency.
  • 47. Disk I/O latency: host bus adapters (HBAs) include SCSI, iSCSI, RAID, and FC-HBA adapters. Latency stats come from the device, the kernel and the guest:
      • DAVG/cmd: average latency (ms) from the device (LUN)
      • KAVG/cmd: average latency (ms) in the VMkernel
      • GAVG/cmd: average latency (ms) in the guest
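The three latency counters are commonly read together: the guest-observed latency is approximately the device latency plus the time spent in the VMkernel. The sketch below encodes that rule of thumb; the sample values and the simple "which side dominates" helper are illustrative assumptions, not thresholds from the slides.

```python
def guest_latency_ms(davg_ms, kavg_ms):
    """Approximate guest-observed latency as device plus kernel latency."""
    return davg_ms + kavg_ms


def dominant_component(davg_ms, kavg_ms):
    """Name the layer contributing more of the latency."""
    return "device" if davg_ms >= kavg_ms else "kernel"


# Hypothetical readings for one LUN.
davg, kavg = 12.5, 0.5
gavg = guest_latency_ms(davg, kavg)

print(gavg)                            # 13.0
print(dominant_component(davg, kavg))  # device
```

When the device side dominates, the array or fabric is the place to look; when the kernel side dominates, queuing inside the host is the more likely cause.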
  • 48. Problem with the disk subsystem: bad throughput when device latency is high (cache disabled); good throughput with low device latency.
  • 50. FC bottleneck: ‘v’ is the VM view, ‘u’ the device view, ‘d’ the adapter view.
  • 51. vStorage API for Array Integration (VAAI) stats
      • CLONE_RD, CLONE_WR: number of clone read/write requests
      • CLONE_F: number of failed clone operations
      • MBC_RD/s, MBC_WR/s: clone read/write MB per second
      • ATS: number of ATS commands
      • ATSF: number of failed ATS commands
      • ZERO: number of zero requests
      • ZEROF: number of failed zero requests
      • MBZERO/s: megabytes zeroed per second
  • 52. vStorage API for Array Integration (VAAI): virtual disk creation example
  • 55. Other diagnostic tools (1 of 2)
      • sched-stats and schedtrace: the vm-support -s/-S flag captures sched-stats; the vm-support -c flag captures the scheduler trace, which takes a lot of disk space
      • memstats: provides detailed memory usage stats with resource pool hierarchy
      • ft-stats: FT virtual machine stats, collected with the vm-support -s/-S flag
  • 56. Other diagnostic tools (2 of 2)
      • swatchStats: stopwatch stats for VMFS, SCSI events
      • vscsiStats: virtual machine SCSI disk I/O stats; provides histogram information for latency, I/O size, inter-arrival time and outstanding I/Os
  • 57. vscsiStats: # vscsiStats -l (output shows the virtual machine name, the world group leader id, and the virtual SCSI disk handle ids, which are unique across virtual machines)
  • 58. vscsiStats latency histogram: # vscsiStats -p latency -w 118739 -i 8205 (output columns: latency in microseconds; I/O distribution count)
  • 59. vscsiStats iolength histogram: # vscsiStats -p iolength -w 118739 -i 8205 (output columns: I/O block size; distribution count)
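The histograms above count I/Os per bucket, where each bucket is defined by an upper bound. A minimal sketch of that bucketing is shown below; the bucket boundaries and latency samples are hypothetical and are not the boundaries vscsiStats itself uses.

```python
from bisect import bisect_left


def histogram(values, upper_bounds):
    """Count values per bucket; the final bucket holds everything above the last bound.

    upper_bounds must be sorted ascending. A value equal to a bound
    falls into that bound's bucket.
    """
    counts = [0] * (len(upper_bounds) + 1)
    for value in values:
        counts[bisect_left(upper_bounds, value)] += 1
    return counts


# Hypothetical latency samples in microseconds and illustrative bucket bounds.
latencies_us = [50, 100, 250, 900, 5000, 20000]
bounds_us = [100, 1000, 10000]

counts = histogram(latencies_us, bounds_us)
print(counts)  # [2, 2, 1, 1]
```

Unlike an average, the bucket counts make a small number of very slow I/Os visible, which is the main reason to reach for a histogram.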