How to Resolve Capacity Bottlenecks and Ensure Great Performance in Your VMware ESX Environment
A VMware System Administrator’s Guide to Identifying Capacity Bottlenecks, Predicting Future Capacity Bottlenecks, and Removing Them
A Whitepaper by:
Alex Bakman
Founder and CEO, VKernel Corporation
abakman@vkernel.com
Joan Mealey
Senior Systems Engineer, VKernel Corporation
jmealey@vkernel.com
Table of Contents
The Virtualized Data Center: A Different World
A Quick Primer on VMware ESX Capacity Management Concepts
Environment Changes that Impact Capacity
Finding Current Capacity Bottlenecks
Identifying Future Capacity Bottlenecks
Finding Pockets of Available Capacity in Your Environment
The Virtualized Data Center: A Different World
As we are writing this paper, the overwhelming majority of
organizations are migrating and consolidating their servers from
physical to virtual environments. While the savings of the
virtualized data center are extremely compelling, there is now a
new set of challenges that System Administrators did not have to
deal with in the physical world.
One of the challenges is getting used to the fact that all hardware resources (CPU, memory, storage, and network) are shared between virtual machines. This means that applications and users can impact each other, and therefore resource monitoring becomes critical. Put another way, if you don’t closely monitor resource consumption by each virtual machine and simply keep adding more virtual machines without analyzing how this will impact all four core resources, the result will be poor performance and even system downtime. Ultimately, this means unhappy users and dissatisfied managers. So, you need to find a way to prevent this problem.

"As data centers struggle with server consolidation and server virtualization, capacity planning becomes the key to maintaining or improving service quality while containing costs."
-Forrester, The Capacity Planning Software Market

In this white paper we will give you a formula for ensuring that you won’t run into these problems. In fact, if you closely monitor resource availability, you will create a very stable environment and prove to your users that virtualization is ready for prime time.
A Quick Primer on VMware ESX Capacity Management Concepts
Clusters consist of two or more VMware ESX hosts working
together as one to provide high availability/redundancy. Clusters
allow the sum of all resources from the hosts in the cluster to be
spread among the virtual machines on those hosts. This is a
typical way to deploy VMware ESX hosts in all but the smallest
environments.
Resource pools allow the administrator to allocate and divide
resources among virtual machines in clusters and other resource
pools. They work by using reservations (guaranteed resources),
shares (for when resources are overcommitted), and limits.
Resource pools can be nested and organized in a hierarchical
fashion to match the company’s organization. Reservations are
used for the more critical virtual machines, enabling you to
guarantee a virtual machine gets a specific amount of memory or
CPU time. Shares are used to divide up the remaining resources
among the rest of the virtual machines – the more shares a
virtual machine has, the higher percentage of resources the
virtual machine can use. Resource pools can have a “fixed”
amount of resources or be linked into sharing resources with other
resource pools.
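To make the interplay of reservations, shares, and limits concrete, here is a minimal sketch of a simplified proportional-share model: reservations are granted first, the remaining capacity is split in proportion to shares, and any virtual machine that hits its limit drops out while its unused portion is redistributed. This is an illustration under stated assumptions (every VM can use whatever it is offered, capacity is in MHz, and the VM names and numbers are hypothetical), not VMware's actual scheduling algorithm, which also accounts for demand.

```python
import math
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    reservation: float       # guaranteed MHz
    shares: int              # relative weight when capacity is contended
    limit: float = math.inf  # hard cap in MHz

def allocate(capacity, vms):
    """Simplified model: reservations first, then shares, capped by limits.
    Assumes every VM can use whatever it is offered (demand is not modeled)."""
    alloc = {vm.name: vm.reservation for vm in vms}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("reservations exceed capacity")
    active = [vm for vm in vms if alloc[vm.name] < vm.limit]
    while remaining > 1e-9 and active:
        total_shares = sum(vm.shares for vm in active)
        spent, still_active = 0.0, []
        for vm in active:
            offer = remaining * vm.shares / total_shares
            grant = min(offer, vm.limit - alloc[vm.name])  # never exceed the limit
            alloc[vm.name] += grant
            spent += grant
            if alloc[vm.name] < vm.limit - 1e-9:
                still_active.append(vm)  # not capped yet, can take more
        remaining -= spent  # leftover from capped VMs is redistributed next round
        active = still_active
    return alloc

# Hypothetical pool with 10,000 MHz: one critical VM with a reservation,
# and a batch VM whose limit caps it below its share-based entitlement.
vms = [VM("db", 4000, 2000), VM("web", 0, 1000), VM("batch", 0, 1000, limit=1000)]
alloc = allocate(10000, vms)
print({k: round(v) for k, v in alloc.items()})  # {'db': 7333, 'web': 1667, 'batch': 1000}
```

Note how the 500 MHz that the limit keeps away from "batch" is split between "db" and "web" in their 2:1 share ratio.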
As you can see, a physical host is no longer the “resource
boundary”. Now the resource boundaries can also extend to
clusters and resource pools.
Distributed Resource Scheduling (DRS) is designed to balance
the load across the cluster by migrating virtual machines among
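the hosts in the cluster. DRS can be set to manual, partially automated, or fully automated. The mode you set determines how involved you will be in deciding those migrations: in manual mode DRS only recommends migrations, in partially automated mode it places virtual machines automatically when they are powered on but still only recommends migrations, and in fully automated mode it carries out migrations itself.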
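The idea of balancing load by migrating virtual machines can be illustrated with a simple greedy heuristic: repeatedly move the single VM from the busiest host to the idlest host that most narrows the gap between them. This is a hypothetical sketch of the general technique, not the algorithm DRS actually uses; the host and VM names and the load figures (think percent CPU) are invented. In manual or partially automated mode, the returned moves would correspond to recommendations rather than actions.

```python
def rebalance(hosts, max_moves=10):
    """Greedy sketch: move one VM at a time from the busiest to the idlest host.

    hosts maps host name -> {vm name: load}; it is modified in place.
    Returns the list of (vm, from_host, to_host) moves made.
    """
    moves = []
    for _ in range(max_moves):
        load = {h: sum(vms.values()) for h, vms in hosts.items()}
        busiest = max(load, key=load.get)
        idlest = min(load, key=load.get)
        gap = load[busiest] - load[idlest]
        best_vm, best_gap = None, gap
        for vm, vm_load in hosts[busiest].items():
            new_gap = abs(gap - 2 * vm_load)  # gap between the two hosts after the move
            if new_gap < best_gap:
                best_vm, best_gap = vm, new_gap
        if best_vm is None:
            break  # no single move narrows the gap any further
        hosts[idlest][best_vm] = hosts[busiest].pop(best_vm)
        moves.append((best_vm, busiest, idlest))
    return moves

cluster = {"esx1": {"a": 40, "b": 30, "c": 20}, "esx2": {"d": 10}, "esx3": {"e": 15, "f": 5}}
moves = rebalance(cluster)
print(moves)
print({h: sum(v.values()) for h, v in cluster.items()})  # {'esx1': 40, 'esx2': 40, 'esx3': 40}
```

The `max_moves` cap is there because each migration has a real cost; a production balancer would also weigh that cost against the benefit of each move.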
VMware High Availability (HA) allows companies to provide high availability to any application running in a virtual machine. It continuously monitors the hosts in the cluster and automatically restarts virtual machines in the event of a host failure. VMware HA does not provide zero downtime, and it depends on having enough available resources among the hosts in the cluster. For instance, if you have three hosts and one of them fails, the two surviving hosts must together have enough available resources to restart the virtual machines from the failed host.
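A quick way to sanity-check the three-host example is to simulate each possible single-host failure and ask whether the spare capacity on the surviving hosts covers what was running on the failed one. This minimal sketch looks at memory only and in aggregate; it assumes hypothetical host names and figures and ignores per-VM placement, virtualization overhead, and reservations.

```python
def survives_single_failure(hosts):
    """hosts: {name: (capacity_gb, used_gb)}. Memory only, aggregate view.

    For each host, check whether the spare memory on the other hosts
    can absorb everything running on it.
    """
    report = {}
    for failed, (_, failed_used) in hosts.items():
        spare = sum(cap - used for name, (cap, used) in hosts.items() if name != failed)
        report[failed] = spare >= failed_used
    return report

hosts = {"esx1": (96, 40), "esx2": (64, 40), "esx3": (64, 50)}
print(survives_single_failure(hosts))  # {'esx1': False, 'esx2': True, 'esx3': True}
```

Here losing either smaller host is survivable, but losing the large host "esx1" is not: its 40 GB of load exceeds the 38 GB spare on the other two.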
So, as a system administrator, you can just set up your clusters,
turn on VMware HA, set DRS to automatic and walk away, right?
It’s just not that simple. As you will quickly see, this is a complex
organization of resources that requires constant monitoring to
continue functioning properly.
Environment Changes that Impact Capacity
As changes happen, you need to be aware of the impact they
have on your virtual environment. Here is a list of
"events" that can cause you to run out of capacity in
your VMware ESX data center, resulting in performance
problems or even downtime:
1. Adding new virtual machines through uncontrolled virtual
machine sprawl
2. Removing hosts from clusters, for example for maintenance
3. Enabling VMware HA in your cluster without accounting for
failover
4. Changing failover capacity settings in a cluster
5. Increasing reservations in virtual machines
6. Changing resource pool configurations
7. Powering up many virtual machines that were powered off or
in maintenance
8. Natural growth rates in storage, CPU, memory and network
utilization
9. Workload changes that result in disk I/O bottlenecks
Let’s examine these in more detail.
In a VMware ESX environment, all memory, CPU, storage and
network resources are shared. You are now dealing with a
four-dimensional capacity problem. How will you know which of
these resources you will run out of first?
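One way to answer "which resource first?" for a single host, cluster, or resource pool is to express each of the four resources as remaining headroom (the free fraction of capacity) and sort. This is a minimal sketch with hypothetical capacity and usage figures; in practice the numbers would come from your monitoring data, and you would repeat it at every resource boundary.

```python
def headroom(capacity, used):
    """Fraction of each resource still free, sorted from scarcest to most plentiful."""
    free = {res: 1 - used[res] / capacity[res] for res in capacity}
    return sorted(free.items(), key=lambda kv: kv[1])

# Hypothetical cluster totals for the four core resources.
capacity = {"cpu_ghz": 96.0, "memory_gb": 512.0, "storage_tb": 20.0, "network_gbps": 8.0}
used = {"cpu_ghz": 41.0, "memory_gb": 430.0, "storage_tb": 15.5, "network_gbps": 2.4}
ranking = headroom(capacity, used)
for res, free in ranking:
    print(f"{res}: {free:.0%} free")
```

Normalizing to a fraction is what makes the four dimensions comparable; raw GHz, GB, TB, and Gbps cannot be ranked against each other directly.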
The newly virtualized data center is experiencing constant
change. We have worked with many companies that are adding
hundreds of virtual machines every week. Virtual machine
sprawl is quickly becoming a real issue for many organizations.
Even if you are not adding hundreds of virtual machines per week, every virtual machine that you do add can tip the balance in the wrong direction and cause performance problems. A good way to visualize the problem is to think of virtual machines as cars and your resources as roads and highways. As you add more cars to the roadways while the number of roads and lanes remains fixed, sooner or later you will cause a serious traffic jam. If the roads are not widened with additional lanes, new roads are not built, or the number of cars does not decrease, the result is very predictable.
Virtual machine sprawl is already causing numerous traffic jams
or bottlenecks in many data centers. Therefore, it is essential to
quickly identify current traffic jams and resolve them as soon as
possible. Once you can accomplish this, you can do the same
thing with future bottlenecks and thus stay ahead of the problem
and prevent a situation in which you will run out of capacity.
You also need to take into consideration that your capacity is
getting depleted not only by new virtual machines, but also by
natural growth requirements in your applications. Remember the
age-old rule of computing, which states that your programs will
grow to fill all available memory, CPU and storage. Nothing has
changed in that regard in the virtualized data center. You must
determine the growth rate for each resource type, that is, the
percentage by which your memory, CPU, storage and network
utilization increases each week. By establishing a baseline and
understanding trends, you will be able to proactively monitor your
data center for anomalies and identify problem areas quickly.
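Once you have weekly samples, the growth rate and a rough "time until full" follow directly. The sketch below assumes growth compounds weekly (a linear fit is an equally reasonable choice for some workloads) and uses hypothetical samples and capacities for one cluster.

```python
import math

def weekly_growth_rate(samples):
    """Average compound growth per week from consecutive weekly samples."""
    if len(samples) < 2 or samples[0] <= 0:
        raise ValueError("need at least two weekly samples starting above zero")
    return (samples[-1] / samples[0]) ** (1 / (len(samples) - 1)) - 1

def weeks_until_full(current, capacity, rate):
    """Weeks until `current` reaches `capacity` if growth compounds at `rate`."""
    if current >= capacity:
        return 0.0
    if rate <= 0:
        return math.inf
    return math.log(capacity / current) / math.log(1 + rate)

# Hypothetical weekly samples (oldest first) and capacity for each resource.
history = {
    "memory_gb": ([200.0, 204.0, 208.08, 212.2416], 256.0),
    "storage_tb": ([10.0, 10.1, 10.2, 10.3], 12.0),
}
forecast = {}
for res, (samples, cap) in history.items():
    rate = weekly_growth_rate(samples)
    forecast[res] = weeks_until_full(samples[-1], cap, rate)
    print(f"{res}: {rate:.2%}/week, full in ~{forecast[res]:.1f} weeks")
```

Memory grows at 2% per week here and runs out in under ten weeks, well before storage, even though storage is closer to full in absolute percentage terms at the start.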
Another factor you must take into consideration is the number of
powered down virtual machines and the virtual machines
currently in maintenance mode. In their current state, they are
not consuming resources, but if powered up they will. Unless you
have total control over user behavior, and most of us don’t, you
will never know when these virtual machines get powered up. If
you have a large number of virtual machines powered off and
those suddenly become active, you may not have enough
available capacity.
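A simple guard against surprise power-ons is to add up what the powered-off virtual machines would need and compare it with current free capacity. In this minimal sketch, each VM's requirement (configured memory, a CPU estimate) and all names and numbers are hypothetical; which figure to count per VM, such as configured size, reservation, or typical usage, is a policy choice you would make.

```python
def power_on_shortfall(free, powered_off):
    """Per-resource shortfall if every powered-off VM were started at once.

    free: {resource: free amount}; powered_off: {vm name: {resource: amount}}.
    Returns only the resources that would be overcommitted.
    """
    demand = {}
    for needs in powered_off.values():
        for res, amount in needs.items():
            demand[res] = demand.get(res, 0) + amount
    return {res: need - free.get(res, 0)
            for res, need in demand.items() if need > free.get(res, 0)}

free = {"memory_gb": 48, "cpu_mhz": 12000}
powered_off = {f"vm-test{i:02d}": {"memory_gb": 16, "cpu_mhz": 2000} for i in range(1, 5)}
print(power_on_shortfall(free, powered_off))  # {'memory_gb': 16}
```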
And we are not done yet. There are more changes that can
impact capacity availability in your data center. VMware ESX
provides many ways of organizing your resources. For example, a
resource pool can be configured as fixed or expandable. When
fixed, the resource pool is limited to resources explicitly assigned
to it. When expandable, a resource pool can tap into its parent
resource pool when it can’t satisfy the requests. If this resource
pool setting is changed, your capacity availability will be
impacted.
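The effect of flipping a pool between fixed and expandable can be shown with a small admission check: a VM's reservation is taken from the pool's own reservation first, and only an expandable pool may borrow the shortfall from its parent. This is a simplified, hypothetical model (pool names, MHz figures, and the `admit` helper are illustrative, not a VMware API); the real admission control has more rules.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Pool:
    name: str
    reservation: float            # MHz reserved for this pool
    expandable: bool              # may borrow unreserved capacity from its parent
    parent: Optional["Pool"] = None
    used: float = 0.0             # MHz of this pool's own reservation already handed out

def admit(pool, amount):
    """Simplified admission check for a VM reservation of `amount` MHz.
    Nothing is charged unless the whole request can be satisfied."""
    local = min(amount, max(pool.reservation - pool.used, 0.0))
    shortfall = amount - local
    if shortfall > 0:
        if not pool.expandable or pool.parent is None:
            return False
        if not admit(pool.parent, shortfall):  # borrowed part is charged to the parent
            return False
    pool.used += local
    return True

cluster = Pool("cluster", 20000, expandable=False, used=12000)  # 12000 already reserved by the child pools
prod = Pool("prod", 8000, expandable=True, parent=cluster)
dev = Pool("dev", 4000, expandable=False, parent=cluster)

results = [admit(prod, 6000), admit(prod, 5000), admit(dev, 5000)]
dev.expandable = True                 # one configuration change...
results.append(admit(dev, 5000))      # ...and dev now draws on the cluster's spare capacity too
print(results, cluster.used)          # [True, True, False, True] 16000
```

After the change, only 4,000 MHz of unreserved cluster capacity remains for "prod" to borrow, which is exactly the kind of silent capacity shift the paragraph above warns about.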
Another critical setting that impacts capacity is the failover option in a cluster that has VMware HA enabled. By default it is set to 1, meaning that the cluster should be able to handle one host failure and must have enough capacity on the remaining hosts to run all of the virtual machines that need to be moved off the failed host. Clearly, if a failover occurs, the available resource capacity will change drastically. Moreover, if the setting is higher than 1, you really need to make sure that adequate capacity exists.
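For a failover setting of N, the same aggregate check has to hold for every combination of N failed hosts. The sketch below enumerates those combinations with `itertools.combinations`; it is memory-only and aggregate, with hypothetical hosts and figures. VMware HA's own admission control uses its own calculation (based on slot sizes), which this sketch does not reproduce.

```python
from itertools import combinations

def tolerates_failures(hosts, n):
    """hosts: {name: (capacity_gb, used_gb)}. Aggregate memory view only.

    True if, for every combination of n failed hosts, the survivors' spare
    memory covers everything that was running on the failed hosts.
    """
    for failed in combinations(hosts, n):
        displaced = sum(hosts[h][1] for h in failed)
        spare = sum(cap - used for h, (cap, used) in hosts.items() if h not in failed)
        if spare < displaced:
            return False
    return True

hosts = {f"esx{i}": (64, 30) for i in range(1, 6)}
print([tolerates_failures(hosts, n) for n in (1, 2, 3)])  # [True, True, False]
```

Five hosts at about 47% memory use can tolerate a failover setting of 2 but not 3, which shows how quickly a higher setting eats into usable capacity.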
As you can see, there are many ways to run out of resources. So
now that we know this, the question becomes what we can do to
prevent this situation.
Finding Current Capacity Bottlenecks
The first thing you must do is to identify current capacity
bottlenecks. To identify them you need to clearly understand
utilization of all resources (memory, CPU, storage, network, disk
I/O) on all hosts, clusters, and resource pools.
Our recommendation is to focus first on resources that usually
become the first bottlenecks, such as memory and storage, and
then proceed to examine other resource types. Unfortunately,
with most available tools today that graph resource utilization
over a period of time, identifying current bottlenecks is a very
time-consuming task.
Examine this simple example. Let’s say you have 30 VMware
ESX hosts organized into 4 clusters with 10 resource pools. To
find current capacity bottlenecks, you will have to examine (30
hosts + 4 clusters + 10 resource pools) × 4 resource types =
176 charts.
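The 176 checks are mechanical, which makes them a good candidate for a script that flags only the entity and resource pairs above a utilization threshold instead of requiring someone to read every chart. In this sketch the inventory names, the 85% threshold, and the utilization figures are all hypothetical placeholders.

```python
from itertools import product

def find_bottlenecks(utilization, threshold=0.85):
    """Return (entity, resource) pairs at or above `threshold`, worst first."""
    hot = [(key, u) for key, u in utilization.items() if u >= threshold]
    return sorted(hot, key=lambda item: item[1], reverse=True)

# Hypothetical inventory matching the example: 30 hosts, 4 clusters, 10 pools.
entities = ([f"host{i:02d}" for i in range(1, 31)]
            + [f"cluster{i}" for i in range(1, 5)]
            + [f"pool{i:02d}" for i in range(1, 11)])
resources = ["cpu", "memory", "storage", "network"]
checks = list(product(entities, resources))
print(len(checks))  # 176

# Placeholder utilization figures; real values would come from your monitoring data.
utilization = {key: 0.50 for key in checks}
utilization[("host07", "memory")] = 0.93
utilization[("cluster2", "storage")] = 0.88
print(find_bottlenecks(utilization))
# [(('host07', 'memory'), 0.93), (('cluster2', 'storage'), 0.88)]
```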
And this is not a one-time event. Given how quickly most
organizations are adding new virtual machines, you need to
examine the charts at least once or twice a week. That is a lot of
time to spend on a single issue.
Identifying Future Capacity Bottlenecks
Identifying future capacity bottlenecks is not a trivial process.
Some people are fooled into thinking that as long as you
correctly compute future growth requirements, you are all
set. However, the problem is much more complex than that.
Preventing future capacity bottlenecks requires you to perform many steps, which need to be repeated often enough to avoid running into problems. We recommend that VMware administrators perform the following steps at least several times per week.
1. Compute additional resource requirements needed to support
new virtual machines before deployment and figure out which
hosts, clusters, or resource pools have the necessary resources.
2. Closely monitor growth in resource utilization of every host,
cluster, and resource pool. Here you must examine resource
utilization at every level just as you do to figure out current
capacity bottlenecks (see the formula above). You then have to
compare last week’s capacity status with this week’s in order to
identify the “delta” or what’s changed in the utilization of
resources at every resource boundary level in your infrastructure.
3. You must impose an ironclad change control process on
critical configuration settings that control resource allocation. You
don’t want to magically discover that someone in your
organization changed the failover settings, or changed
reservations, or unilaterally decided to remove a host from a
cluster for maintenance. Your change control process must
extend to powered-down virtual machines, with tight control over
who can power these virtual machines up, where, and when.
4. Closely monitor workloads, especially in terms of disk I/O and
time shifts. As you virtualize more servers, the variability of
workloads will inevitably increase. Some workloads will be I/O
intensive and others will be CPU intensive. Workload peaks will
also shift in time; this is a given. You must find a way to
forecast these situations.
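Steps 1 and 2 above lend themselves to small scripts: filter the hosts or pools that can hold a new virtual machine on every resource, and diff this week's utilization against last week's. Both helpers below are hypothetical sketches; the entity names, resource keys, and figures are invented for illustration.

```python
def candidates(free, vm_needs):
    """Step 1: entities whose free capacity covers every resource the new VM needs."""
    return [entity for entity, avail in free.items()
            if all(avail.get(res, 0) >= need for res, need in vm_needs.items())]

def weekly_delta(last_week, this_week):
    """Step 2: change in utilization for every (entity, resource) measured this week."""
    return {key: this_week[key] - last_week.get(key, 0.0) for key in this_week}

# Hypothetical free capacity per host or resource pool.
free = {
    "esx01": {"cpu_mhz": 3000, "memory_gb": 6, "storage_gb": 400},
    "esx02": {"cpu_mhz": 9000, "memory_gb": 20, "storage_gb": 150},
    "pool-web": {"cpu_mhz": 5000, "memory_gb": 12, "storage_gb": 500},
}
new_vm = {"cpu_mhz": 2000, "memory_gb": 8, "storage_gb": 200}
print(candidates(free, new_vm))  # ['pool-web']

# Hypothetical utilization fractions from two consecutive weekly snapshots.
last_week = {("esx01", "memory"): 0.70, ("esx02", "cpu"): 0.40}
this_week = {("esx01", "memory"): 0.78, ("esx02", "cpu"): 0.41}
delta = weekly_delta(last_week, this_week)
for key, change in sorted(delta.items(), key=lambda kv: kv[1], reverse=True):
    print(key, f"{change:+.0%}")  # largest week-over-week increase first
```

Requiring a fit on every resource matters: "esx02" has the most CPU and memory free but would run out of storage.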
Finding Pockets of Available Capacity in Your Environment
For virtualization projects to be ongoing successes, you must
have control of your available capacity. As you add new virtual
machines, remember these critical questions:
• How many more virtual machines can you add?
• What resources will you run out of first as more virtual
machines are added?
• Where are your current and future capacity bottlenecks?
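The first two questions can be answered together for a given cluster or pool if you assume a typical VM profile: the number of additional VMs is the smallest per-resource quotient of free capacity over per-VM need, and the resource that produces it is the one you will run out of first. The profile and free-capacity figures here are hypothetical; real VMs vary, so treat the result as an estimate.

```python
import math

def remaining_vm_capacity(free, profile):
    """How many more VMs of `profile` fit in `free`, and which resource runs out first."""
    per_resource = {res: math.floor(free[res] / need) for res, need in profile.items()}
    limiting = min(per_resource, key=per_resource.get)
    return per_resource[limiting], limiting

# Hypothetical free capacity in one cluster and an assumed average VM profile.
free = {"cpu_mhz": 30000, "memory_gb": 70, "storage_gb": 1800, "network_mbps": 900}
profile = {"cpu_mhz": 1500, "memory_gb": 4, "storage_gb": 60, "network_mbps": 20}
print(remaining_vm_capacity(free, profile))  # (17, 'memory_gb')
```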
To help VMware administrators deal with these challenges, companies such as VKernel provide products that quickly solve these issues. Be sure to visit our website www.vkernel.com frequently to download our products and check for updates. If you have questions or need help, you are welcome to email us directly: abakman@vkernel.com or jmealey@vkernel.com.

For more information, call 866-370-2733 or visit www.vkernel.com.