4. AGENDA
1. Introduction by Scott Drummonds
2. CPU troubleshooting
3. Memory troubleshooting
4. Storage troubleshooting
5. Network troubleshooting
6. Troubleshooting tools
5. INTRODUCTION
Scott Drummonds
Technical Director, vSpecialists, APJ at EMC
The performance space is massive. It's nearly impossible to keep up with everything that is happening in this space. With the benefit of close contact with VMware's performance engineering team, I was barely able to hold the reins on that massive beast. The secret is not to try and learn every little thing out there, but to develop a strong handle on troubleshooting using esxtop, vCenter and vscsiStats. Everything comes from there.
6. CPU TROUBLESHOOTING – CPU READY TIME
The vSphere Client real-time graph refreshes every 20 seconds, so a ready time reported in milliseconds converts to a percentage by dividing it by 20,000 ms:
1,000 ms / 20,000 ms = 5%
34 ms / 20,000 ms = 0.17% (no worries)
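The conversion above can be written as a pair of helpers. This is a sketch: the function names are hypothetical, the 20-second default is the vSphere Client real-time interval from the slide, and esxtop's 5-second default interval can be passed explicitly. In vCenter the VM-level ready summation of a multi-vCPU VM adds up all its vCPUs, so the optional `vcpus` argument divides it to get an average per-vCPU figure.

```python
def ready_ms_to_percent(ready_ms: float, interval_ms: float = 20_000, vcpus: int = 1) -> float:
    """Convert a CPU ready summation in milliseconds to % of the sample interval.

    vcpus > 1 spreads a VM-level summation evenly across its vCPUs
    (an averaging assumption for multi-vCPU VMs).
    """
    per_vcpu_ms = ready_ms / vcpus
    return per_vcpu_ms * 100 / interval_ms


def ready_percent_to_ms(ready_pct: float, interval_ms: float = 20_000) -> float:
    """Convert a ready percentage back to milliseconds of the sample interval."""
    return ready_pct * interval_ms / 100


# Examples from the slide: 20-second vSphere Client real-time graph
print(ready_ms_to_percent(1000))  # 5.0
print(ready_ms_to_percent(34))    # 0.17
# A 10% threshold expressed in milliseconds of the 20-second graph
print(ready_percent_to_ms(10))    # 2000.0
# esxtop samples every 5 seconds by default
print(ready_percent_to_ms(17.97, interval_ms=5_000))
```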
7. CPU TROUBLESHOOTING – CPU READY TIME
A %RDY figure of 17.97% means that the virtual machine spent 17.97% of its last sample period waiting for available CPU resources. Esxtop's default refresh interval is 5 seconds. The PCPU AVG value in this example is 100%.
8. CPU TROUBLESHOOTING - FLOWCHART
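Is the VM's CPU ready time high (%RDY > 10%, or more than 2000 mSec in the 20-second vSphere Client graph)?
- NO: the problem lies elsewhere: memory, storage or network.
- YES: is the host CPU busy (%PCPU USED > 90%)?
  - NO: Hmmm, continue with the max limited (%MLMTD) check on the next slides.
  - YES: are the high used and ready times only occasional spikes?
    - YES: no problem.
    - NO: host CPU saturation.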
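The flowchart's decisions can be expressed as one function. This is a sketch of our reading of the diagram: the 10% ready and 90% PCPU USED thresholds come from the slide, while the function name and the `spikes_only` flag (True when high used and ready time appear only as occasional spikes) are hypothetical.

```python
def cpu_ready_flowchart(rdy_pct: float, pcpu_used_pct: float, spikes_only: bool) -> str:
    """Follow the CPU ready-time flowchart and return its verdict.

    spikes_only is a hypothetical input: True when high used and ready
    time appear only as occasional spikes rather than continuously.
    """
    if rdy_pct <= 10:
        # Ready time is acceptable, so look at the other resources
        return "other problem: memory, storage or network"
    if pcpu_used_pct <= 90:
        # The VM waits for CPU although the host is not busy
        return "hmmm: check %MLMTD and %CSTP"
    if spikes_only:
        return "no problem: occasional spikes"
    return "host CPU saturation"


print(cpu_ready_flowchart(17.97, 100, spikes_only=False))  # host CPU saturation
```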
9. CPU TROUBLESHOOTING - MAX LIMITED
%MLMTD - The max limited time is the percentage of time the VM world was ready to run but deliberately wasn't scheduled, because running it would violate the VM's "CPU limit" setting.
%RDY includes %MLMTD, so for CPU contention use "%RDY - %MLMTD". In this example, 99.75 - 99.73 = 0.02, so there's no contention despite the high ready time.
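Because %RDY already contains %MLMTD, the contention figure is a simple subtraction. A minimal sketch with a hypothetical function name, checked against the slide's values:

```python
def cpu_contention(rdy_pct: float, mlmtd_pct: float) -> float:
    """Ready time caused by real CPU contention.

    %RDY already includes %MLMTD (time held back by a CPU limit),
    so subtracting it leaves the time spent waiting for a free pCPU.
    """
    return rdy_pct - mlmtd_pct


# Values from the slide
print(round(cpu_contention(99.75, 99.73), 2))  # 0.02
```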
10. CPU TROUBLESHOOTING - MAX LIMITED
Hmmm: is %MLMTD > 0%?
- YES: the VMkernel deliberately didn't run the VM (a CPU limit applies).
- NO: is it an SMP virtual machine? Check %CSTP (co-scheduling).
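The "Hmmm" branch can be sketched the same way. The function name is hypothetical, and the final fallback for a single-vCPU VM without a limit is our own assumption, since the slide does not cover that case.

```python
def explain_high_ready(mlmtd_pct: float, vcpus: int) -> str:
    """Next step for high ready time on a host that is not saturated."""
    if mlmtd_pct > 0:
        # The VMkernel deliberately did not run the VM: a CPU limit applies
        return "CPU limit in effect"
    if vcpus > 1:
        # SMP VM: sibling vCPUs may be co-stopped
        return "check %CSTP (co-scheduling)"
    # Not covered by the flowchart (hypothetical fallback)
    return "investigate further"


print(explain_high_ready(mlmtd_pct=0.0, vcpus=4))  # check %CSTP (co-scheduling)
```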
11. CPU TROUBLESHOOTING – CO-SCHEDULING
At any particular point in time, each virtual CPU may be scheduled, descheduled, preempted, or blocked waiting for some event.
Without co-scheduling, the vCPUs associated with an SMP VM would be scheduled independently, breaking the guest's assumptions regarding uniform progress.
VMware uses the term "skew" to refer to the difference in execution rates between two or more vCPUs associated with an SMP VM.
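A toy model makes skew concrete: treat each vCPU's progress as a number, measure skew as the gap between the fastest and slowest vCPU, and co-stop any vCPU that runs more than a threshold ahead of its slowest sibling so the sibling can catch up. The progress units, threshold and function names are hypothetical; ESX's relaxed co-scheduler tracks skew in more detail than this.

```python
def skew(progress: list[float]) -> float:
    """Spread in progress between the fastest and slowest vCPU."""
    return max(progress) - min(progress)


def co_stop_candidates(progress: list[float], threshold: float) -> list[int]:
    """Indices of vCPUs running too far ahead of their slowest sibling."""
    slowest = min(progress)
    return [i for i, p in enumerate(progress) if p - slowest > threshold]


# Hypothetical progress of a 3-vCPU VM; vCPU 1 has fallen behind
vcpu_progress = [10.0, 4.0, 9.0]
print(skew(vcpu_progress))                     # 6.0
print(co_stop_candidates(vcpu_progress, 3.0))  # [0, 2]
```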
12. CPU TROUBLESHOOTING – CO-SCHEDULING
Type "e" in esxtop to expand a virtual machine and show all the worlds associated with it. The %CSTP metric shows the percentage of time a world spent co-stopped, i.e. deliberately descheduled by the co-scheduler.
13. CPU TROUBLESHOOTING - RECAP
o If ready time is 5% or less, there's no problem.
o If ready time is between 5% and 10%, there might be an issue.
o If ready time is 10% or more, there's a performance issue.
o Check that the virtual machine's CPU is not limited.
o Check whether the host CPU is overcommitted all the time; occasional spikes are no problem.
o If it's an SMP virtual machine, check whether the application is multithreaded and actually uses the extra vCPUs.
o If the ESX host is saturated, reduce the number of virtual machines.
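The recap's three ready-time bands map directly onto a small classifier. A sketch with a hypothetical function name, using the boundaries exactly as the recap states them:

```python
def classify_ready(rdy_pct: float) -> str:
    """Map a ready-time percentage to the recap's three bands."""
    if rdy_pct <= 5:
        return "no problem"
    if rdy_pct < 10:
        return "might be an issue"
    return "performance issue"


for value in (0.17, 7.0, 17.97):
    print(value, classify_ready(value))
```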
14. MEMORY TROUBLESHOOTING
For each running virtual machine, the ESX host reserves physical memory for the virtual machine's reservation (if any) and for its virtualization overhead. Because of the memory management techniques the ESX host uses, the total memory configured for your VMs can exceed the physical memory available on the host (memory overcommitment).
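A quick way to quantify overcommitment is the ratio of total configured VM memory to host physical memory. This is a simplified sketch: the function name and the host sizing are hypothetical, and per-VM virtualization overhead is ignored.

```python
def memory_overcommit_ratio(vm_configured_mb: list[int], host_physical_mb: int) -> float:
    """Total configured VM memory divided by host physical memory.

    A ratio above 1.0 means the host is overcommitted. Per-VM
    virtualization overhead is ignored in this simplified view.
    """
    return sum(vm_configured_mb) / host_physical_mb


# Hypothetical host: 20 VMs with 4 GB each on a 64 GB host
print(memory_overcommit_ratio([4096] * 20, 65536))  # 1.25
```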
15. MEMORY TROUBLESHOOTING – PAGE SHARING
Transparent Page Sharing
Transparent page sharing (TPS) reclaims memory by consolidating redundant pages with identical content into a single copy, freeing memory that the duplicates would otherwise consume. On modern Intel/AMD processors with hardware-assisted memory virtualization, ESX backs guest memory with large pages, so page sharing only shows up in esxtop when host memory is overcommitted and those large pages are broken into small ones.
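The core idea of page sharing can be sketched as content-based deduplication, assuming pages are 4 KB byte strings: hash each page, then confirm a candidate with a full byte comparison before sharing it. SHA-1 and the function name are illustrative choices, not ESX's implementation, and copy-on-write handling of later writes to a shared page is omitted.

```python
import hashlib

PAGE_SIZE = 4096  # bytes per small page


def share_pages(pages: list[bytes]) -> tuple[list[int], int]:
    """Deduplicate identical pages, TPS-style.

    Returns, for each input page, the index of the shared copy it maps to,
    plus the number of physical copies actually kept.
    """
    store: list[bytes] = []               # one entry per physical copy kept
    by_hash: dict[bytes, list[int]] = {}  # content hash -> indices in store
    mapping: list[int] = []
    for page in pages:
        candidates = by_hash.setdefault(hashlib.sha1(page).digest(), [])
        # A hash match is only a hint; confirm with a full byte comparison
        match = next((i for i in candidates if store[i] == page), None)
        if match is None:
            store.append(page)
            match = len(store) - 1
            candidates.append(match)
        mapping.append(match)
    return mapping, len(store)


zero = bytes(PAGE_SIZE)
ones = b"\x01" * PAGE_SIZE
print(share_pages([zero, ones, zero, zero]))  # ([0, 1, 0, 0], 2)
```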
16. MEMORY TROUBLESHOOTING – PAGE SHARING
When the guest OS frees memory, the guest physical memory is not "freed" back to the host; the page is only moved to the guest's "free" list. The ESX host has no access to the guest's "free" list, so it cannot "reclaim" the memory freed up by the guest.
Sharing happens between virtual machines on the same host, but also within a single virtual machine.
17. MEMORY TROUBLESHOOTING - BALLOONING
Ballooning reclaims memory by artificially increasing the memory pressure inside the guest. It only becomes a performance issue when the guest OS starts paging active memory to its own page file. Ballooning performs better than ESX memory compression or ESX swapping.
18. MEMORY TROUBLESHOOTING - BALLOONING
The VMkernel sets the MCTLTGT (target) value for the VM's memory balloon size and compares it with the MCTLSZ (current size) metric to decide whether to inflate or deflate the balloon for a virtual machine.
If MCTLTGT > MCTLSZ, the VMkernel inflates the balloon.
If MCTLTGT < MCTLSZ, the VMkernel deflates the balloon.
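The two rules above reduce to a comparison. In this sketch the function name is hypothetical, and the "hold" result for equal values is our assumption, since the slide only covers the two unequal cases.

```python
def balloon_action(mctltgt_mb: float, mctlsz_mb: float) -> str:
    """What the VMkernel does with the balloon, given target and current size."""
    if mctltgt_mb > mctlsz_mb:
        return "inflate"   # reclaim more guest memory
    if mctltgt_mb < mctlsz_mb:
        return "deflate"   # give memory back to the guest
    return "hold"          # target reached (our assumption for the equal case)


print(balloon_action(mctltgt_mb=512, mctlsz_mb=128))  # inflate
```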
19. MEMORY TROUBLESHOOTING - LIMIT
Don't configure VM memory limits; set an appropriate VM memory size instead! Virtual machines deployed from a template with a configured memory limit become "ballooning ghosts" once more memory is configured: the limit stays in place, so the memory above it gets reclaimed. Even though there's enough memory available at host level, you will see ballooning up to a maximum of 65% of the VM's configured memory.
20. MEMORY TROUBLESHOOTING - LIMIT
This example shows a virtual machine configured with 1024 MB of memory. Before 20:15 no memory limit is configured; after 20:15 the limit is set to 512 MB. As soon as the VM tries to access memory above 512 MB, ballooning kicks in.
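A simplified model of what happens under a limit: memory touched above the limit must be reclaimed, the balloon takes it first up to 65% of configured memory, and anything beyond that falls to compression or swapping. The ordering and the 800 MB touched figure are assumptions for illustration; the 1024 MB size, 512 MB limit and 65% cap come from the slides.

```python
BALLOON_MAX_FRACTION = 0.65  # maximum balloon size, as stated on the slide


def reclaim_under_limit(configured_mb: float, limit_mb: float, touched_mb: float) -> dict[str, float]:
    """Simplified estimate of how memory above a limit is reclaimed.

    Assumes the balloon is used first, capped at 65% of configured memory,
    and that anything beyond that falls to compression or swapping.
    """
    excess = max(0.0, touched_mb - limit_mb)
    balloon = min(excess, BALLOON_MAX_FRACTION * configured_mb)
    return {"balloon_mb": balloon, "compress_or_swap_mb": excess - balloon}


# The slide's VM: 1024 MB configured, 512 MB limit; 800 MB touched is hypothetical
print(reclaim_under_limit(1024, 512, 800))
```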
21. MEMORY TROUBLESHOOTING - RESERVATION
[Figure: the VM memory reservation (RES) shown against ballooning, compression and swapping]
Be careful with configuring a high VM reservation. As soon as a virtual machine has used or touched its reserved memory, the other virtual machines can't use it anymore. The VM reservation is also used to calculate the slot size in an HA cluster that uses the "number of host failures allowed" admission control policy. Only reserve what is really used and needs to be guaranteed.
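To see why one large reservation hurts HA, here is a simplified sketch of the memory half of the slot calculation, following VMware's documented rule that the memory slot is the largest memory reservation plus overhead among powered-on VMs. The CPU half works the same way with CPU reservations and is omitted; the function names and all numbers are hypothetical.

```python
def memory_slot_mb(vms: list[tuple[int, int]]) -> int:
    """Memory slot size: largest (reservation + overhead) of any powered-on VM."""
    return max(reservation + overhead for reservation, overhead in vms)


def slots_per_host(host_memory_mb: int, slot_mb: int) -> int:
    """Whole slots that fit in a host's memory."""
    return host_memory_mb // slot_mb


# Hypothetical (reservation MB, overhead MB) pairs
vms = [(0, 100), (0, 120), (0, 200)]
print(slots_per_host(65536, memory_slot_mb(vms)))  # 327
vms.append((8192, 200))  # one VM with an 8 GB reservation
print(slots_per_host(65536, memory_slot_mb(vms)))  # 7
```

A single VM with a large reservation inflates the slot size for every VM in the cluster, which is why the slide advises reserving only what must be guaranteed.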
22. MEMORY TROUBLESHOOTING – COMPRESSION
Memory compression reclaims memory by compressing pages that would otherwise be swapped out. If a page can be compressed and stored in a compression cache located in main memory, the next access to the page only causes a page decompression, which can be an order of magnitude faster than a disk access. This reduces the number of future synchronous swap-in operations. A page is only compressed if it shrinks by at least 50%; otherwise it is swapped.
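The 50% rule can be sketched as a per-page decision: compress a 4 KB page and keep the result only if it fits in half a page, otherwise fall back to swapping. zlib is a stand-in here; the compressor ESX actually uses is not stated on the slide, and the function name is hypothetical.

```python
import os
import zlib
from typing import Optional

PAGE_SIZE = 4096                 # bytes per small page
MAX_COMPRESSED = PAGE_SIZE // 2  # must shrink to 50% or less to be kept


def try_compress_page(page: bytes) -> Optional[bytes]:
    """Return the compressed page if it qualifies for the cache, else None (swap it)."""
    compressed = zlib.compress(page)
    return compressed if len(compressed) <= MAX_COMPRESSED else None


print(try_compress_page(bytes(PAGE_SIZE)) is not None)   # True: a zero page compresses well
print(try_compress_page(os.urandom(PAGE_SIZE)) is None)  # True: random data does not
```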
23. MEMORY TROUBLESHOOTING – COMPRESSION
o The CACHESZ value (10% of the VM memory) is the compression cache size.
o The CACHEUSD value is the amount of compression cache currently used.
o ZIP/s and UNZIP/s are the rates at which memory is compressed and decompressed per second.
24. MEMORY TROUBLESHOOTING – SWAP
o SWCUR is the current amount of guest physical memory swapped out to the virtual machine's swap file by the VMkernel. Swapped memory stays on disk until the virtual machine needs it.
o SWTGT is the amount of swapped memory the VMkernel is targeting for the virtual machine.
o If SWTGT > SWCUR, the VMkernel can start swapping when necessary.
o If SWTGT < SWCUR, the VMkernel stops swapping memory.
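The swap target works like the balloon target. A sketch with a hypothetical function name; the equal case is not covered on the slide, so its result is our assumption.

```python
def swap_action(swtgt_mb: float, swcur_mb: float) -> str:
    """Interpret esxtop's swap target versus current swapped memory."""
    if swtgt_mb > swcur_mb:
        return "may swap more when necessary"
    if swtgt_mb < swcur_mb:
        return "stops swapping"
    return "at target"  # equal case: our assumption, not on the slide


print(swap_action(swtgt_mb=256, swcur_mb=64))  # may swap more when necessary
```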
25. MEMORY TROUBLESHOOTING - SWAP
High swap-in latency, which can be tens of milliseconds, can severely degrade guest performance. If available, configure local SSD storage as your virtual machines' swap file location. Performance degrades by 12% with local SSD, versus 69% for Fibre Channel and 83% for local SATA storage.
26. MEMORY TROUBLESHOOTING – SWAP
o SWPWT is the percentage of time that the virtual machine is waiting for memory to be swapped in. This value shouldn't be above 5%.
o SWR/s is the rate at which memory is swapped in from (SSD) disk to active memory.
o SWW/s is the rate at which memory is swapped out of active memory and written to (SSD) disk.
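These three counters can be turned into a quick check. The 5% SWPWT bound is from the slide; treating any non-zero SWR/s or SWW/s as worth reporting, and the function name, are our assumptions.

```python
SWPWT_MAX_PCT = 5.0  # the slide's upper bound for swap wait time


def swap_findings(swpwt_pct: float, swr_rate: float, sww_rate: float) -> list[str]:
    """Flag swap activity from esxtop's SWPWT, SWR/s and SWW/s values."""
    findings = []
    if swpwt_pct > SWPWT_MAX_PCT:
        findings.append("VM waits too long for swap-in")
    if swr_rate > 0:
        findings.append("memory is being swapped in")
    if sww_rate > 0:
        findings.append("memory is being swapped out")
    return findings


print(swap_findings(swpwt_pct=8.2, swr_rate=1.5, sww_rate=0.0))
```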
27. MEMORY TROUBLESHOOTING - RECAP
o Be careful with setting virtual machine memory reservations. Once the VM has touched its reserved memory, the other virtual machines can't use that memory anymore. Only configure what the virtual machine really needs.
o Don't set memory limits; set an appropriate virtual machine memory size instead.
o Do not disable page sharing or the balloon driver. Ballooning is OK as long as the guest OS isn't using its own page file for active memory swapping.
o The use of large pages (2 MB) reduces memory management overhead and can therefore increase hypervisor performance. But take into consideration that with large pages, TPS might not occur until memory overcommitment is high enough to require the large pages to be broken into small pages.
28. STORAGE TROUBLESHOOTING – THE STACK
Application
Guest File System
I/O Drivers (measured inside the guest, e.g. with HD Tune Pro)
VMM
vSCSI (GAVG/cmd)
VMFS
VMkernel
Core Storage (KAVG/cmd)
Driver (DAVG/cmd)
29. STORAGE TROUBLESHOOTING – THE METRICS
DAVG - This is the latency seen at the device driver level. It includes the round-trip time between the HBA and the storage.
KAVG - This counter tracks the latency added by the ESX kernel (VMkernel) while handling the command, for example time spent queued in the kernel.
GAVG - This is the round-trip latency that the guest sees for all I/O requests sent to the virtual storage device (GAVG = KAVG + DAVG).
30. STORAGE TROUBLESHOOTING – IBM DS3400
IBM DS3400 with 2 arrays and 18 logical drives (RAID 5)
ISP2432-based 4 Gb Fibre Channel to PCI Express HBA
31. STORAGE TROUBLESHOOTING – IOMEGA IX2
Iomega StorCenter ix2
with 500 GB - RAID 1
1 Gigabit Ethernet
Jumbo frame support
iSCSI target or CIFS/NFS
32. STORAGE TROUBLESHOOTING - (CONS/S)
The SCSI reservation conflict counter, CONS/s, becomes non-zero when a host tries to place a SCSI reservation on a LUN that already has a SCSI reservation in progress. This happens only when two hosts try to perform metadata operations on the same LUN at exactly the same time.
33. STORAGE TROUBLESHOOTING - (CONS/S)
A SCSI reservation is held for a very short period (a few hundred microseconds), so the chance of a conflict is very low on a small cluster. However, as the number of hosts sharing the LUN increases, conflicts can arise more frequently.
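Why more hosts means more conflicts can be illustrated with a toy probability model: assume each other host sharing the LUN independently holds a reservation with some small probability at the moment a host requests one. The independence assumption, the 0.1% figure and the function name are hypothetical; real conflict rates depend on the metadata workload.

```python
def conflict_probability(hosts: int, p_busy: float) -> float:
    """Chance that a reservation request meets a reservation in progress.

    Toy model: each of the other hosts independently holds a reservation
    on the LUN with probability p_busy at the moment of the request.
    """
    return 1 - (1 - p_busy) ** (hosts - 1)


for n in (2, 4, 8, 16, 32):
    print(n, round(conflict_probability(n, 0.001), 4))
```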
34. STORAGE TROUBLESHOOTING - VSCSISTAT
vscsiStats collects and reports counters on storage activity. Its data is collected at the virtual SCSI device level in the kernel, which means results are reported per VMDK (or RDM) irrespective of the underlying storage protocol. The following data are reported in histogram form:
o IO size
o Seek distance
o Outstanding IOs
o Latency (in microseconds)
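The histogram form is what makes these counters useful: it shows the distribution, not just an average. A minimal sketch of bucketing per-I/O values the same way; the bucket bounds and sample latencies are hypothetical and are not vscsiStats' own bins. The same function works for IO size, seek distance or outstanding IOs.

```python
import bisect


def histogram(values: list[float], bounds: list[float]) -> list[int]:
    """Count values per bucket; bucket i holds values <= bounds[i],
    and the extra last bucket holds values above the largest bound."""
    counts = [0] * (len(bounds) + 1)
    for v in values:
        counts[bisect.bisect_left(bounds, v)] += 1
    return counts


# Hypothetical bucket bounds and I/O latencies, both in microseconds
LATENCY_BOUNDS = [100, 500, 1000, 5000, 15000, 30000, 50000, 100000]
samples = [80, 95, 450, 900, 1200, 4800, 16000, 250000]
print(histogram(samples, LATENCY_BOUNDS))  # [2, 1, 1, 2, 0, 1, 0, 0, 1]
```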
35. STORAGE TROUBLESHOOTING - ALIGNMENT
(Figure: NTFS clusters in the VMDK file, VMFS volume blocks and SAN LUN chunks stacked layer by layer, shown in an unaligned and an aligned layout.)
Like other disk-based file systems, VMFS suffers a
penalty when the partition is unaligned. Use the vSphere
Client to create VMFS partitions, because it automatically
aligns the partitions on a 64 KB boundary.
36. STORAGE TROUBLESHOOTING – ALIGNMENT
• Guest OS alignment is important for Microsoft Windows
Server 2003, Windows XP and Windows 2000. Partitions created on
Windows Server 2008 or Windows 7 are aligned automatically.
• These older Windows versions start the first partition after
63 sectors of 512 bytes (an offset of 32,256 bytes), which does
not fall on a 64 KB boundary. The result is a misaligned partition.
• To resolve this issue, use the Diskpart.exe tool to create
the disk partition and specify a starting offset of 128
sectors (64 KB):
create partition primary align=64
• To verify alignment, check that
((Partition offset) * (Disk sector size)) / (Stripe unit size)
is a whole number, with the partition offset given in sectors.
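The alignment formula can be checked with a few lines of code. The Python sketch below assumes 512-byte sectors and a 64 KB stripe unit. The real stripe unit depends on the array configuration. It compares the 63-sector default of older Windows versions with the 128-sector offset recommended on this slide. The function names are hypothetical.

```python
def alignment_ratio(offset_sectors: int, sector_size: int = 512,
                    stripe_unit: int = 64 * 1024) -> float:
    """((Partition offset) * (Disk sector size)) / (Stripe unit size)."""
    return (offset_sectors * sector_size) / stripe_unit


def is_aligned(offset_sectors: int, sector_size: int = 512,
               stripe_unit: int = 64 * 1024) -> bool:
    """Aligned when the partition start is a whole multiple of the stripe unit.

    Integer arithmetic avoids floating-point rounding in the check.
    """
    return (offset_sectors * sector_size) % stripe_unit == 0


if __name__ == "__main__":
    for offset in (63, 128):
        print(offset, alignment_ratio(offset), is_aligned(offset))
    # 63 0.4921875 False
    # 128 1.0 True
```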
37. STORAGE TROUBLESHOOTING - RECAP
o If KAVG/cmd > 3 ms or DAVG/cmd > 20 ms,
there might be a storage performance problem.
o Check alignment on the array, on VMFS and in the
guest OS.
o Monitor the number of reservation conflicts per
second and be careful with snapshots.
o Pay attention to drive types; the more drives
you use, the more IOPS you will get.
o When creating a VMFS datastore, give it the right size
and keep in mind how many virtual machines
you want to host on that datastore.
o When choosing a VMFS block size, stick to it and use
the same block size across your datastores.
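The first recap rule can be turned into a simple check. The Python sketch below applies the slide's rule-of-thumb thresholds of 3 ms for KAVG/cmd and 20 ms for DAVG/cmd to readings taken, for example, from the esxtop disk view. The function name and the sample readings are hypothetical. A warning is a hint to investigate, not a diagnosis.

```python
KAVG_LIMIT_MS = 3.0   # recap rule: KAVG/cmd > 3 ms
DAVG_LIMIT_MS = 20.0  # recap rule: DAVG/cmd > 20 ms


def storage_warnings(kavg_ms: float, davg_ms: float) -> list:
    """Return one warning string per counter above its rule-of-thumb limit."""
    warnings = []
    if kavg_ms > KAVG_LIMIT_MS:
        warnings.append(f"KAVG/cmd {kavg_ms:.1f} ms exceeds {KAVG_LIMIT_MS:.0f} ms")
    if davg_ms > DAVG_LIMIT_MS:
        warnings.append(f"DAVG/cmd {davg_ms:.1f} ms exceeds {DAVG_LIMIT_MS:.0f} ms")
    return warnings


if __name__ == "__main__":
    # Illustrative readings only.
    for kavg, davg in [(0.2, 8.0), (4.5, 12.0), (0.5, 35.0)]:
        print(kavg, davg, storage_warnings(kavg, davg) or "OK")
```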
39. NETWORK TROUBLESHOOTING – DROPPED PKT
Receive packets might be dropped at the virtual switch if the
virtual machine’s network driver runs out of receive (Rx)
buffers, that is, a buffer overflow.
The dropped receive packets (%DRPRX in esxtop) may be reduced by
increasing the Rx buffers for the virtual network driver.
40. NETWORK TROUBLESHOOTING – NIC SETTINGS
In ESX 4.1, you can configure the
advanced VMXNET3 parameters
from the Device Manager in the
Windows guest OS.
It’s possible to increase the Rx
buffers for the virtual network
driver here.
This also works on an Intel E1000
with the native driver installed in
the guest OS.
41. NETWORK TROUBLESHOOTING – VLAN ID
For VLAN troubleshooting, create a new dvPortgroup
with VLAN trunking. The network traffic is then
delivered to the guest OS with its VLAN tag intact.
Now you can configure the VLAN advanced parameters of
an Intel E1000 or a VMXNET3 adapter in the guest OS and
specify a VLAN ID. This allows you to hop between VLANs.
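With a VLAN trunk, each frame reaches the guest with its IEEE 802.1Q tag still in place. The guest driver reads the VLAN ID from that tag. The Python sketch below shows where the 12-bit VLAN ID sits in a tagged Ethernet frame: the tag type 0x8100 follows the two MAC addresses, and the ID is the low 12 bits of the next 16-bit field. This illustrates the 802.1Q frame format only. It is not the E1000 or VMXNET3 driver code, and the function name is hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


def vlan_id(frame: bytes):
    """Return the 802.1Q VLAN ID of an Ethernet frame, or None if untagged."""
    if len(frame) < 16:
        return None
    # Bytes 0-11 hold the destination and source MAC addresses.
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF  # low 12 bits of the tag control field = VLAN ID


if __name__ == "__main__":
    # Hand-built frame: zero MACs, tag with priority 5 and VLAN 100, IPv4 EtherType.
    frame = (bytes(12) + struct.pack("!HH", TPID_8021Q, (5 << 13) | 100)
             + struct.pack("!H", 0x0800) + b"payload")
    print(vlan_id(frame))  # 100
```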
42. NETWORK TROUBLESHOOTING – LOAD BASED TEAMING
LBT reshuffles port bindings dynamically, based on load
and dvUplink usage, to make efficient use of the
available bandwidth.
When Load Based Teaming reassigns ports, the MAC
address moves to a different physical switch (pSwitch)
port. The pSwitch must allow for this.
43. NETWORK TROUBLESHOOTING – LOAD BASED TEAMING
LBT will only move a flow when the mean send or receive
utilization on an uplink exceeds 75 percent of capacity over
a 30-second period. LBT will not move flows more often
than every 30 seconds. Enable PortFast mode for the
physical switch ports facing the ESX Server.
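The LBT trigger described on this slide can be expressed as a small decision function. The Python sketch below is an illustration of the stated rule, not VMware's implementation. It assumes one utilization sample per second per uplink. A flow move is allowed only when the 30-second mean send or receive utilization exceeds 75 percent of capacity and at least 30 seconds have passed since the last move. The class, function and uplink names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

THRESHOLD = 0.75  # 75 percent of uplink capacity
WINDOW_S = 30     # averaging window and minimum time between moves (seconds)


@dataclass
class Uplink:
    name: str
    capacity_mbps: float
    tx_mbps: List[float]  # assumption: one sample per second, newest last
    rx_mbps: List[float]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def should_move_flow(uplink: Uplink, now_s: float, last_move_s: float) -> bool:
    """Sketch of the LBT rule: move only if the 30 s mean send or receive
    utilization exceeds 75 % and no move happened in the last 30 s."""
    if now_s - last_move_s < WINDOW_S:
        return False  # rate limit: no more than one move every 30 s
    if len(uplink.tx_mbps) < WINDOW_S or len(uplink.rx_mbps) < WINDOW_S:
        return False  # not enough history for a full 30 s window
    tx = _mean(uplink.tx_mbps[-WINDOW_S:])
    rx = _mean(uplink.rx_mbps[-WINDOW_S:])
    return max(tx, rx) > THRESHOLD * uplink.capacity_mbps


if __name__ == "__main__":
    busy = Uplink("dvUplink1", 1000.0, [800.0] * 30, [100.0] * 30)
    print(should_move_flow(busy, now_s=100, last_move_s=60))  # True
    print(should_move_flow(busy, now_s=100, last_move_s=80))  # False
```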
44. NETWORK TROUBLESHOOTING - RECAP
o Enable PortFast mode for the physical switch ports
facing the ESX/ESXi hosts.
o Disable STP for the physical switch ports facing the
ESX/ESXi hosts.
o Use the VMXNET3 virtual network card wherever
possible.
45. TROUBLESHOOTING TOOLS
• Veeam Monitor
• VMTurbo Watchdog
• Quest vFoglight
• VKernel Capacity Analyzer
• VESI VMware Community PowerPack
• VMware Health Check Analyzer
• Bouke Groenescheij -> Graph-VM
• Esxplot and perfmon
• Rob de Veij - RVTools
• Xangati for ESX
46. TROUBLESHOOTING TOOLS – GRAPH-VM
http://www.jume.nl
Bouke Groenescheij has created a framework of scripts
that produce some really nice graphs. Graph-VM uses
PowerShell to gather the information and creates
reports with RRDtool.
47. TROUBLESHOOTING TOOLS – ESXPLOT
http://labs.vmware.com
The following command would run esxtop in batch mode,
updating all statistics to the file perfstats.csv every 10
seconds for 360 iterations (a total of 60 minutes) before
exiting:
esxtop -a -b -d 10 -n 360 > perfstats.csv
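The batch file can be post-processed outside esxtop. The run length is simply the delay times the number of iterations: 10 s × 360 = 3,600 s, or 60 minutes. The Python sketch below computes that and finds the peak value of every column whose header contains a given counter name. It assumes the first CSV row is the header row, as in esxtop batch output. The counter names in a real capture should be taken from that header row. The helper names and the sample data are hypothetical.

```python
import csv
import io
from typing import Dict


def run_length_minutes(delay_s: int, iterations: int) -> float:
    """Length of an esxtop batch run: -d (delay) times -n (iterations)."""
    return delay_s * iterations / 60


def column_peaks(csv_text: str, counter: str) -> Dict[str, float]:
    """Peak value of every column whose header contains `counter`.

    Assumes the first row is the header; non-numeric cells are skipped and
    a column without any numeric cell is left out of the result.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if counter in name]
    peaks: Dict[str, float] = {}
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (IndexError, ValueError):
                continue
            peaks[header[i]] = max(value, peaks.get(header[i], value))
    return peaks


if __name__ == "__main__":
    print(run_length_minutes(10, 360))  # 60.0, matching "-d 10 -n 360"
    sample = "Time,DiskA Latency,CPU\nt1,5.0,1\nt2,12.5,2\nt3,,3\n"
    print(column_peaks(sample, "Latency"))  # {'DiskA Latency': 12.5}
```

For a real capture, read perfstats.csv with `open("perfstats.csv", newline="")` and pass the file contents together with a substring of the counter you are interested in.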
48. TROUBLESHOOTING TOOLS - RVTOOLS
http://www.robware.net
RVTools is a Windows .NET 2.0 application that uses the VI
SDK to display information about your virtual machines and
ESX hosts. RVTools can list information about CPU,
memory, disks, NICs, CD-ROM and floppy drives, snapshots,
VMware Tools, ESX hosts, host NICs, datastores, switches, ports
and health checks.
49. TROUBLESHOOTING TOOLS - XANGATI
http://xangati.com
Xangati for ESX is a free tool designed for smaller-scale
environments with only a few ESX/ESXi hosts. It offers
continuous, real-time visibility into over 100 metrics on an
ESX/ESXi host and its VMs' activity, including
communications, CPU, memory, disk, and storage latency.
50. THANK YOU - QUESTIONS
This presentation is available for download at
http://www.ntpro.nl and http://www.vmug.nl
Don't forget to fill out the Session Evaluation.