Managing Complexity in the x86 Data Center: The User Experience
Destroying Perf Bottlenecks
TECHNICAL BRIEF
Quickly Identify Issues and Restore VM Performance
Written by
Quest Software, Inc.
Destroying Performance Bottlenecks Across Diverse VM Environments
2. Tech Brief – Destroying Performance Bottlenecks Across Diverse VM Environments 2
Contents
Challenge
Determining the Real Root Cause
Achieving the Level of Visibility You Need
Conclusion
Challenge
Managing uptime and performance is critical in any environment. Virtualization changes the game significantly,
however, because of the added complexity of managing the interdependencies between virtual machines (VMs)
and the underlying physical infrastructure that supports them. As organizations become more dependent on
virtualization, managing it effectively and ensuring optimal uptime and availability becomes both more critical
and more challenging.
As organizations mature in their use of virtualization, they encounter challenges along the way that are unique to
virtualized environments. These challenges eventually disrupt many of the management techniques traditionally
used. A reactive management approach typically only provides a temporary fix, which allows problems to resurface
later. The difficulty arises because many organizations are simply unaware of how much additional
complexity virtualization introduces and why it makes management more complicated. A few examples of the
complexities introduced by server virtualization are:
• Traditional infrastructure assigns specific hardware to specific applications, or “share nothing,” providing
  exclusive use of resources (except maybe disk)
• Virtual infrastructure, in most cases, is “share everything”: shared resources across CPU, memory, network
  and disk
More complicated still is the fact that a virtual machine (VM) is just a process that coexists with others, sharing
time on and access to resources according to availability and priority. Shared resources can have a resounding
effect on the performance of your overall VM environment: many performance bottlenecks can be traced directly
to one form of resource contention or another. For example, one VM can starve another of resources such as
storage, CPU and memory, which happens when there are too many VMs running on one host.
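The "too many VMs on one host" condition can be approximated by comparing what has been allocated to guests against what the host physically has. The following is a minimal sketch under stated assumptions: the `VM` and `Host` types, the function names, and the threshold defaults (4:1 for vCPU, 1:1 for memory) are hypothetical and illustrative, not values taken from this brief or from any particular hypervisor.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VM:
    name: str
    vcpus: int
    memory_gb: float


@dataclass
class Host:
    name: str
    physical_cores: int
    memory_gb: float


def overcommit_ratios(host: Host, vms: List[VM]) -> Tuple[float, float]:
    """Return (cpu_ratio, memory_ratio) of allocated to physical resources."""
    cpu_ratio = sum(vm.vcpus for vm in vms) / host.physical_cores
    mem_ratio = sum(vm.memory_gb for vm in vms) / host.memory_gb
    return cpu_ratio, mem_ratio


def contention_risks(
    host: Host,
    vms: List[VM],
    max_cpu_ratio: float = 4.0,  # assumed threshold, tune per environment
    max_mem_ratio: float = 1.0,  # assumed threshold, tune per environment
) -> List[str]:
    """List the resources whose allocation exceeds the assumed thresholds."""
    cpu_ratio, mem_ratio = overcommit_ratios(host, vms)
    risks = []
    if cpu_ratio > max_cpu_ratio:
        risks.append(f"CPU overcommitted {cpu_ratio:.2f}:1 on {host.name}")
    if mem_ratio > max_mem_ratio:
        risks.append(f"memory overcommitted {mem_ratio:.2f}:1 on {host.name}")
    return risks
```

Allocation ratios only indicate risk. Actual contention depends on what the guests are doing at the time, which is why the brief argues for visibility into the virtual layer rather than static sizing alone.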
The hypervisor itself can introduce specific issues. One example is memory contention caused by memory limit
settings, which is often difficult to detect, yet easy to fix. The guest VM may be configured to use a larger
amount of memory than the hypervisor allows it (a hard limit setting). The guest then requests more memory than
the hypervisor will grant, leading to serious performance problems. Metrics commonly used to measure physical
environments (e.g. time metrics like I/O per second) are misleading when read from within a VM guest and do not
provide any value in this case.
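The memory limit mismatch described above can be checked mechanically once both values are known for each guest. This is a minimal sketch, assuming the inventory has already been exported into plain dictionaries; the field names `configured_mb` and `limit_mb` are hypothetical and do not correspond to any specific hypervisor API.

```python
from typing import Iterable, List, Optional, TypedDict


class VMMemoryConfig(TypedDict):
    name: str
    configured_mb: int           # memory the guest believes it has
    limit_mb: Optional[int]      # hypervisor hard limit; None means unlimited


def find_memory_limit_mismatches(vms: Iterable[VMMemoryConfig]) -> List[str]:
    """Return names of VMs whose hard limit is below their configured memory."""
    return [
        vm["name"]
        for vm in vms
        if vm["limit_mb"] is not None and vm["limit_mb"] < vm["configured_mb"]
    ]
```

A guest flagged by this check will ask for memory the hypervisor will not grant, which matches the hard-to-detect but easy-to-fix case in the text: raising or removing the limit resolves it.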
This tech brief explores the primary attributes of a solution that not only detects, diagnoses and resolves performance
bottlenecks in virtualized infrastructures quickly and efficiently, but also reduces risk and improves performance,
uptime and efficiency.
Determining the Real Root Cause
Determining the root cause of problems is the most important step in reducing overall mean time to resolution
(MTTR). In virtualized environments, however, the root cause is even more elusive. Sharing resources across
physical hosts and multiple virtual guests not only makes it difficult to pinpoint the genuine root cause of a
problem, it also makes it difficult to get the information required to solve the issue.
Inefficient and costly practices – like blindly throwing more disk, memory, processor or hardware at a problem – are
not sufficient. In most cases, this approach simply masks the situation (and in all probability, the problem will return
with worse consequences). This is a good example of where virtualization-specific solutions are
needed. Traditional physical monitoring tools simply do not provide sufficient visibility into the virtual layer.
For example, let’s say a VM is experiencing a performance slowdown. Manual and simplistic management
techniques may reveal high disk swapping, but without insight into the virtual layer and proper virtualization-specific
diagnostics, users may not know that the swapping occurred because of an inflated balloon memory driver. This
could lead the user to a false diagnosis. In this case, a virtualization-specific solution provides the user with relevant
data, visibility into the root cause, and a course of action for resolution (e.g. resetting the balloon driver).
Figure 1 – Visual cue of the four core resources: CPU, Memory, Network and Disk
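The slowdown example above comes down to correlating a guest-level symptom (swapping) with a hypervisor-level cause (balloon inflation). The sketch below illustrates that reasoning only; the function name, its parameters and the returned messages are hypothetical, and the inputs are assumed to have been collected already from the guest and the virtual layer.

```python
def diagnose_swapping(
    guest_swap_rate_kbps: float,
    ballooned_mb: float,
    swap_threshold_kbps: float = 0.0,  # assumed: any swapping counts as a symptom
) -> str:
    """Classify guest swapping using balloon data from the virtual layer."""
    if guest_swap_rate_kbps <= swap_threshold_kbps:
        return "no swapping observed"
    if ballooned_mb > 0:
        # The guest-only view would stop at "high swapping"; the balloon
        # figure points to memory reclaimed by the hypervisor as the cause.
        return "swapping with inflated balloon driver: check host memory pressure"
    return "swapping without ballooning: guest may be short of configured memory"
```

With only the first argument available, as in a physical-style monitoring tool, the second and third outcomes cannot be told apart, which is the false diagnosis the paragraph warns about.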
A fast path to resolution is essential for effectively eliminating bottlenecks. A competent tool can provide one
quickly, giving certainty that the root cause was isolated, addressed and resolved. This is best accomplished by
establishing automated best-practice responses for remediating performance bottlenecks.
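One way to picture an automated best-practice response is a lookup from alarm type to a remediation workflow. The following is a minimal sketch, not a description of how any product implements it: the alarm dictionary shape, the alarm type names and the workflow functions are all hypothetical, and the workflows only return a description of the action rather than performing it.

```python
from typing import Callable, Dict

Alarm = Dict[str, str]  # hypothetical shape: {"type": ..., "vm": ...}


def reset_balloon_driver(alarm: Alarm) -> str:
    return f"reset balloon driver on {alarm['vm']}"


def raise_memory_limit(alarm: Alarm) -> str:
    return f"raise memory limit on {alarm['vm']}"


# Each known alarm type maps to exactly one predefined workflow.
WORKFLOWS: Dict[str, Callable[[Alarm], str]] = {
    "balloon_inflated": reset_balloon_driver,
    "memory_limit_too_low": raise_memory_limit,
}


def remediate(alarm: Alarm) -> str:
    """Run the workflow registered for the alarm type, or escalate."""
    workflow = WORKFLOWS.get(alarm["type"])
    if workflow is None:
        return f"no automated workflow for '{alarm['type']}': escalate to an operator"
    return workflow(alarm)
```

Keeping the mapping explicit means every occurrence of the same alarm gets the same response, and unknown alarms are escalated instead of being handled ad hoc.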
Being notified of relevant alarm details and then launching workflows specific to fast remediation of the
problem applies consistency to problem solving and reduces MTTR. This approach accelerates problem diagnosis
and correction, and increases staff efficiency by taking manual troubleshooting out of the equation. With
an effective detect, diagnose and resolve (DDR) foundation, users rely on a tool to present data on the
following:
• Best Practices: A tool must alert users to conditions that are not satisfactory under normal “best practices,”
  such as setting limits and thresholds
• Impact Visibility: Visibility into whether configuration settings are correct, or are the root of a larger issue
• Capacity View: When and where you are going to run out of space
• Changes in Behavior: Understand what caused an event to occur based on any changes in the environment
• Clarity: Clear explanation of the problem that occurred and what can be done to fix it
Figure 2 – Automated remediation through problem identification by essential alerts
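The Capacity View item in the list above is essentially a trend projection. As a minimal sketch, assuming usage grows roughly linearly and that daily usage samples are available, a least-squares line can estimate how long until a resource is full. The function name and sample format are hypothetical, and a real tool may use a different model.

```python
from typing import List, Optional, Tuple


def days_until_full(
    samples: List[Tuple[float, float]],  # (day, used_gb) pairs, oldest first
    capacity_gb: float,
) -> Optional[float]:
    """Estimate days from the last sample until usage reaches capacity.

    Fits a least-squares line through the samples. Returns None when usage
    is flat or shrinking, since no exhaustion date can be projected.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    spread = sum((day - mean_day) ** 2 for day, _ in samples)
    if spread == 0:
        raise ValueError("samples must cover more than one point in time")
    slope = sum((day - mean_day) * (used - mean_used) for day, used in samples) / spread
    if slope <= 0:
        return None
    intercept = mean_used - slope * mean_day
    full_day = (capacity_gb - intercept) / slope
    return max(0.0, full_day - samples[-1][0])
```

For example, a datastore growing from 100 GB by 10 GB per day, sampled on days 0 through 3, is projected to reach a 200 GB capacity 7 days after the last sample.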
Achieving the Level of Visibility You Need
Effective virtualization performance monitoring and management also hinges on having sufficient visibility into and
control of diverse and increasingly complex virtual environments. Users need that visibility, from their own
point of view, either to verify that problems were fixed or to continually monitor the state and health of their virtual
environment (Figure 3). Many tools make it overwhelming to zero in on the values and systems that matter
at any given time. Administrators must be able to take any data point, or observed value, and tailor the
information to what they want or need to see, without being hindered by a dashboard that presents only limited
information.
Figure 3 – An at-a-glance representation of the state and health of the virtual environment
Custom dashboards should also provide visibility into the virtual infrastructure by displaying
information relevant to the specific job functions of various stakeholders. Service views, such as the one in
Figure 4, provide a comprehensive view of a service managed in a virtual environment, complete with data
supporting overall service levels and real-time state and health indicators. These views, along with detailed reporting,
are invaluable resources for communicating with the rest of the business.
Figure 4 – Example of a service dashboard view of an HR application
Conclusion
Success in server virtualization relies on innovative performance management technologies and practices. Going
deep into the virtualization layer with a solution designed from the ground up for managing virtualized
environments is not only key, but necessary. Quest® vFoglight® provides the capabilities and intelligence to
overcome the complexities and challenges of virtualized environments with ease. With vFoglight, users are
equipped to accurately detect, diagnose and resolve problems with:
• Pre-configured alerting based on industry best practices
• Expert advice and clear information to guide users to fix problems and resolve the underlying issues that
  lead to performance problems
• Workflows and automated remediation based on alert context that cut down time and effort
By helping maximize performance, speed problem resolution and ensure capacity across heterogeneous virtual,
physical, and cloud infrastructures, vFoglight simplifies the management of complex virtual environments and
increases levels of availability and uptime, while reducing associated risk.