AVOID RESOURCE CONTENTION
WITH ECO4CLOUD TECHNOLOGY
A PRIMARY TELCO USE CASE
Ph. +39 0984 494276 Piazza Vermicelli
87036 Rende (CS), Italy
www.eco4cloud.com
info@eco4cloud.com
Copyright © 2016 Eco4Cloud. All rights reserved. This product is protected by Italian and international copyright and intellectual property laws.
AVOID RESOURCE CONTENTION WITH E4C TECHNOLOGY
© 2016 Eco4Cloud and/or its affiliates. All rights reserved. This document is Eco4Cloud Public. Page 2
Overcommitment and Contention
1. Introduction
VMware® ESX™ is a hypervisor designed to efficiently manage hardware resources
including CPU, memory, storage and network among multiple concurrent virtual machines
[1]. ESX uses high-level resource management policies to compute a target memory
allocation for each virtual machine (VM), based on the current system load and parameter
settings for the virtual machine (shares, reservation, and limit [2]).
The computed target allocation is used to guide the dynamic adjustment of the memory
allocation for each virtual machine; in case host memory is overcommitted, the target
allocations are achieved by invoking several lower-level mechanisms to reclaim memory
from virtual machines.
VMware ESX enables impressive memory and CPU consolidation ratios; ESX allows
running VMs with total configured resources that exceed the amount available on the
physical machine: this is called overcommitment.
Overcommitment raises the consolidation ratio, increases operational efficiency and lowers
the total cost of operating virtual machines. Left unchecked, however, it leads to Resource
Contention: several VMs compete for the same resources and wait for the VMware scheduler
to assign them.
Contention is the main cause of performance issues in virtualized environments and, as such,
it is the first key performance indicator to monitor in a virtual farm.
Contention is measured via CPU Ready Time and Memory Ballooning.
2. CPU Ready Time
CPU Ready Time is the period of time a VM waits in a ready-to-run state (meaning it has
work to do) before being scheduled by the hypervisor on one or more physical CPUs.
In other words, it shows how long a virtual CPU was ready to run but had to wait for a
physical CPU on its host. In general terms, it is normal for VMs to have small
values of CPU Ready Time, even when the hypervisor is neither oversubscribed nor under
heavy load; it is just the nature of shared scheduling in virtualization. For SMP VMs with multiple
vCPUs, the amount of ready time will generally be higher than for VMs with fewer vCPUs,
since it requires more resources to schedule/co-schedule the VM when necessary and each
CPU accumulates the time separately; under normal operating conditions, this value should
remain under 5%. If ready time values are higher, virtual machines experience bad
performance.
Even in the best-designed environments there will be some CPU contention, and that is okay.
A %ready value below 5% is considered the optimal area. Once %ready climbs to between 5%
and 10%, you need to pay attention when adding more virtual machines and/or more virtual
CPU cores to existing virtual machines; we can call this the warning area. Once %ready climbs
above 10%, you reach the danger area and the affected virtual machines will perform badly.
Note that a host can show only 50% overall CPU utilization and still suffer strong CPU
contention that affects the overall performance of your virtual machines.
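The three areas described above can be sketched as a small classifier. This is only an illustration: the function names are hypothetical, and the millisecond-to-percent conversion assumes the counter is reported as a summation over a sampling interval (20 seconds is used here as an assumed default), which is not stated in this paper.

```python
def ready_ms_to_pct(ready_ms, interval_s=20.0):
    """Convert a CPU ready summation (ms) into a percentage of the interval.

    Assumption: the value was accumulated over `interval_s` seconds.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0)


def classify_ready(ready_pct):
    """Map a per-vCPU %ready value onto the areas described in the text."""
    if ready_pct < 0:
        raise ValueError("ready_pct cannot be negative")
    if ready_pct < 5.0:
        return "optimal"   # below 5%
    if ready_pct <= 10.0:
        return "warning"   # between 5% and 10%
    return "danger"        # above 10%


# Invented sample readings, in milliseconds per 20-second interval.
for ms in (300, 1400, 2600):
    pct = ready_ms_to_pct(ms)
    print(f"{ms} ms -> {pct:.1f}% -> {classify_ready(pct)}")
```

Treating exactly 5% as the start of the warning area is a choice made for this sketch; the text only says "less than 5%" and "higher than 10%".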
To summarize, CPU contention is one of the hidden issues you might find in your
environment, unless you know where to look. The best tools for spotting CPU contention
are ESXTOP from inside the service console of the host, RESXTOP from the vMA appliance,
or third-party tools such as Eco4Cloud. The best defense against CPU contention is
understanding how the scheduler interacts with multi-processor virtual machines; if you
run multi-processor VMs, take that potential issue into account.
High values of CPU Ready Time can occur in a number of scenarios, but two are the most
common. The first is host oversubscription, where too many vCPUs have been allocated
relative to the number of physical CPUs. While ESX 5 supports a maximum of 25 vCPUs per
physical CPU, this is definitely a case where being able to do something does not make it
good practice. As always, your mileage may vary based on your specific VM workloads, but
typically you begin to experience problems when a host is in the range of 2-2.5X
oversubscribed for server workloads.
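As a rough illustration of the oversubscription figure mentioned above, the ratio of allocated vCPUs to physical cores on a host can be computed as follows. The function name and the VM mix are hypothetical, and reading "2-2.5X" as a vCPU-to-physical-core ratio is an interpretation made for this sketch.

```python
def vcpu_overcommit_ratio(vcpus_per_vm, physical_cores):
    """Return allocated vCPUs divided by physical cores for one host."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vcpus_per_vm) / physical_cores


# Hypothetical host with 16 physical cores and an invented mix of VMs.
vms = [4, 4, 2, 2, 2, 2, 8, 4, 2, 2, 1, 1, 4, 2]  # vCPUs per VM (made up)
ratio = vcpu_overcommit_ratio(vms, physical_cores=16)
print(f"{sum(vms)} vCPUs on 16 cores -> {ratio:.2f}x")

if ratio >= 2.0:
    print("At or above the 2-2.5x range: watch CPU Ready Time closely")
```

The ratio alone does not prove contention; it only indicates where CPU Ready Time deserves a closer look.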
The second most common scenario is a larger SMP VM, for example one with 4-8 vCPUs,
running on a host that also runs many smaller VMs with 1-2 vCPUs for application servers.
Depending on the number of physical processors and on the total number of vCPUs allocated
on the host, a larger resource allocation for the VM results in longer waiting time, because
the hypervisor has to preempt the necessary physical CPUs to schedule/co-schedule the
workload. When this happens, the software vendor often reacts to the VM’s performance
problems by raising its vCPU requirements. Unfortunately, if CPU Ready Time is the root
cause, increasing the number of vCPUs does not improve performance; on the contrary, it
makes things worse.
3. Memory Ballooning
One of the main benefits introduced by virtualization is virtual machine isolation, which is
very useful for security and risk management. A drawback of isolation is that the
guest operating system is not aware it is running inside a virtual machine and is not aware of
the state of other virtual machines on the same physical host. When the hypervisor runs
multiple virtual machines and the total amount of free host memory gets low, none of the
virtual machines will release guest physical memory, since the guest operating system
cannot detect the host’s memory shortage.
VMware ballooning is a memory reclamation technique used when an ESXi host is running
low on memory. This allows the physical host system to retrieve unused memory from
certain guest virtual machines (VMs) and share it with others [3].
Ballooning makes the guest operating system aware of the low memory status of the host. In
ESX, a balloon driver is loaded into the guest operating system as a pseudo-device driver. It
has no external interface to the guest operating system and communicates with the
hypervisor through a private channel. The balloon driver polls the hypervisor to obtain a
target balloon size. If the hypervisor needs to reclaim virtual machine memory, it sets a
proper target balloon size for the balloon driver, making it “inflate” by allocating guest
physical pages within the virtual machine.
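The poll-and-inflate loop described above can be pictured with a toy model. This is a deliberately simplified sketch, not the real driver: the class and its fields are hypothetical, and it assumes the balloon can only claim pages the guest currently has free, whereas a real guest may also page out memory to satisfy the balloon.

```python
from dataclasses import dataclass


@dataclass
class ToyBalloonDriver:
    """Toy stand-in for a balloon driver inside one guest."""
    guest_free_pages: int
    balloon_pages: int = 0

    def poll(self, target_pages: int) -> int:
        """Move the balloon toward the hypervisor's target size.

        Returns the number of pages inflated (positive) or deflated
        (negative) during this poll.
        """
        delta = target_pages - self.balloon_pages
        if delta > 0:
            # Simplification: only free guest pages can be claimed.
            delta = min(delta, self.guest_free_pages)
        self.balloon_pages += delta
        self.guest_free_pages -= delta
        return delta


driver = ToyBalloonDriver(guest_free_pages=1000)
for target in (300, 100, 5000):  # invented targets set by the hypervisor
    moved = driver.poll(target)
    print(f"target={target} moved={moved} "
          f"balloon={driver.balloon_pages} free={driver.guest_free_pages}")
```

Pages held by the balloon are the ones the hypervisor can hand to other virtual machines; deflating gives them back to the guest.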
Ballooned memory is a symptom of memory contention. If host free memory drops
towards the 4% threshold, the hypervisor starts to reclaim memory, using ballooning.
VM memory ballooning can degrade performance.
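To make the 4% figure concrete, the check below compares a host's free memory against that threshold. It is a minimal sketch: the function names are hypothetical, the sample values are invented, and treating the threshold as a simple "at or below 4% free" test is an assumption rather than a description of the hypervisor's actual logic.

```python
BALLOON_THRESHOLD = 0.04  # the 4% free-memory threshold quoted in the text


def free_memory_fraction(total_gb, free_gb):
    """Return free host memory as a fraction of total host memory."""
    if total_gb <= 0 or not 0 <= free_gb <= total_gb:
        raise ValueError("expected 0 <= free_gb <= total_gb and total_gb > 0")
    return free_gb / total_gb


def ballooning_likely(total_gb, free_gb, threshold=BALLOON_THRESHOLD):
    """True when free memory is at or below the threshold (assumed rule)."""
    return free_memory_fraction(total_gb, free_gb) <= threshold


# Invented readings for a host with 64 GB of RAM.
for free in (8.0, 3.0, 2.0):
    frac = free_memory_fraction(64, free)
    print(f"free={free} GB ({frac:.1%}) -> ballooning likely: "
          f"{ballooning_likely(64, free)}")
```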
Ballooning is a CPU intensive process, and can eventually lead to memory swapping, when
a balloon driver inflates to the point where the VM no longer has enough memory to run its
processes. This will slow down the VMs, depending upon the amount of memory to recoup
and/or the quality of the storage IOPS delivered to it.
4. Why these counters are important
CPU Ready Time and Ballooned Memory are symptoms of contention on CPU and RAM,
respectively. They are widely recognized in IT literature as the two most
significant indicators that virtual machines are experiencing bad performance.
The generally accepted industry best practice, based on VMware’s guidelines, is that CPU
Ready Time values up to 5% (per vCPU) fall within acceptable parameters.
Memory Ballooning is the first technique the hypervisor uses to reclaim memory. The absence
of ballooning, or very low levels of it, is a sign of good health for a virtual farm.
Eco4Cloud Workload Consolidation intelligence computes the ideal placement of VMs
among physical hosts, in order to decrease both CPU Ready Time and Memory Ballooning,
enabling higher performance and VM density.
5. Test Workflow
A field test compared the performance of VMware® Distributed Resource Scheduler and the
Eco4Cloud Workload Consolidation platform.
VMware® Distributed Resource Scheduler (DRS) aggregates computing capacity across a
set of servers into logical resource pools and intelligently allocates available resources
among the VMs, based on pre-defined rules.
VMware Distributed Power Management (DPM), within VMware DRS, automates power
management and minimizes power consumption across a given collection of servers in a
VMware DRS cluster.
The test was performed on a cluster in a production farm of a leading Italian telco;
the cluster contained 6 physical hosts running VMware vSphere version 5.0.
The hosts were HP ProLiant DL580 G5 servers, each equipped with 64 GB RAM and 4 CPU
sockets. Three hosts mounted 4x Intel® Xeon® CPU E7320 @ 2.13GHz while the other three
mounted 4x Intel® Xeon® CPU X7350 @ 2.93GHz. Each CPU had 4 physical cores, so the total
number of physical cores for each host was 16. The hosts ran about 94 virtual machines, each
with 1 to 8 virtual CPU cores (most of them with 2 or 4) and 1 to 16 GB of assigned RAM
(most of them with 2 or 4 GB). The guest operating systems were: 80% Microsoft Windows
(various editions, 32 and 64 bit), 14% Red Hat Enterprise Linux (5 and 6, 32 and 64 bit),
and 6% Oracle Solaris 10 64 bit.
The average host CPU usage during the test was about 28%.
In order to collect valuable data, a set of tests using VMware DRS and E4C Workload
Consolidation was performed.
The overall test ran for 6 days, divided into two phases of equal length.
During the first phase (3 days), workload placement was managed by VMware® DRS in fully
automated mode and Eco4Cloud Workload Consolidation was disabled.
A second phase of 3 more days followed: Eco4Cloud Workload
Consolidation was enabled and VMware® DRS was put in partially automated mode.
The two phases were comparable, because the production workload on the given cluster did
not change significantly.
6. Results
At the end of the test, it was clear that Eco4Cloud Workload Consolidation had increased
overall cluster performance: on one hand, CPU
Ready Time dropped by 23%; on the other hand, Ballooned Memory was completely
removed, thanks to the intelligent workload placement strategy of Eco4Cloud
Workload Consolidation.
On the memory side, the result is crystal clear: problem solved.
On the CPU side, the result positively affects performance, and 23% is just an aggregate value.
Let’s see how CPU Ready Time decreases in the most important cases, where CPU Ready
Time exceeds the warning and alert thresholds, 5% and 10%, respectively.
As you can see from the following exhibit, CPU Ready Time warnings decrease by 90.26%,
while CPU Ready Time alerts decrease by 42.86%.
It means, in our evaluation scenario:
- 514 fewer warnings/alerts each day, per cluster
- 3598 fewer warnings/alerts each week, per cluster
Given how much time it takes to manage a performance warning or an alert, evaluating how
much time you can save with an intelligent workload placement solution is simple math.
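The "simple math" can be spelled out as below. The daily figure comes from the test above; the handling time per warning or alert is a placeholder assumption that each team should replace with its own value, and the function name is hypothetical.

```python
ALERTS_AVOIDED_PER_DAY = 514  # from the evaluation scenario above


def hours_saved(alerts_avoided, minutes_per_alert):
    """Operator time saved, in hours, for a number of avoided alerts."""
    if alerts_avoided < 0 or minutes_per_alert < 0:
        raise ValueError("inputs must be non-negative")
    return alerts_avoided * minutes_per_alert / 60.0


weekly_alerts = ALERTS_AVOIDED_PER_DAY * 7  # 3598, matching the text

# Placeholder assumption: 2 minutes to look at each warning or alert.
MINUTES_PER_ALERT = 2
print(f"Alerts avoided per week: {weekly_alerts}")
print(f"Hours saved per week at {MINUTES_PER_ALERT} min each: "
      f"{hours_saved(weekly_alerts, MINUTES_PER_ALERT):.1f}")
```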
References
[1] Carl A. Waldspurger. “Memory Resource Management in VMware ESX Server”.
Proceedings of the Fifth Symposium on Operating Systems Design and Implementation,
Boston, Dec 2002.
[2] vSphere Resource Management Guide. VMware.
http://www.vmware.com/pdf/vsphere4/r40/vsp_40_upgrade_guide.pdf
[3] Understanding Memory Resource Management in VMware® ESX™ Server. VMware.
http://www.vmware.com/files/pdf/perf-vsphere-memory_management.pdf
For more information
- E4C Workload Consolidation: http://www.eco4cloud.com/workload-consolidation
- Eco4Cloud Workload Consolidation Product Overview
- Eco4Cloud Workload Consolidation FAQ

More Related Content

What's hot

Accelerating virtualized Oracle 12c performance with vSphere 5.5 advanced fea...
Accelerating virtualized Oracle 12c performance with vSphere 5.5 advanced fea...Accelerating virtualized Oracle 12c performance with vSphere 5.5 advanced fea...
Accelerating virtualized Oracle 12c performance with vSphere 5.5 advanced fea...
Principled Technologies
 
Postgres plus cloud_database_getting_started_guide
Postgres plus cloud_database_getting_started_guidePostgres plus cloud_database_getting_started_guide
Postgres plus cloud_database_getting_started_guide
ice1oog
 

What's hot (14)

VMworld 2013: VMware Disaster Recovery Solution with Oracle Data Guard and Si...
VMworld 2013: VMware Disaster Recovery Solution with Oracle Data Guard and Si...VMworld 2013: VMware Disaster Recovery Solution with Oracle Data Guard and Si...
VMworld 2013: VMware Disaster Recovery Solution with Oracle Data Guard and Si...
 
Benchmark emc vnx7500, emc fast suite, emc snap sure and oracle rac on v-mware
Benchmark   emc vnx7500, emc fast suite, emc snap sure and oracle rac on v-mwareBenchmark   emc vnx7500, emc fast suite, emc snap sure and oracle rac on v-mware
Benchmark emc vnx7500, emc fast suite, emc snap sure and oracle rac on v-mware
 
Accelerating virtualized Oracle 12c performance with vSphere 5.5 advanced fea...
Accelerating virtualized Oracle 12c performance with vSphere 5.5 advanced fea...Accelerating virtualized Oracle 12c performance with vSphere 5.5 advanced fea...
Accelerating virtualized Oracle 12c performance with vSphere 5.5 advanced fea...
 
Postgres plus cloud_database_getting_started_guide
Postgres plus cloud_database_getting_started_guidePostgres plus cloud_database_getting_started_guide
Postgres plus cloud_database_getting_started_guide
 
Wp intelli cache_reduction_iops_xd5.6_fp1_xs6.1
Wp intelli cache_reduction_iops_xd5.6_fp1_xs6.1Wp intelli cache_reduction_iops_xd5.6_fp1_xs6.1
Wp intelli cache_reduction_iops_xd5.6_fp1_xs6.1
 
KoprowskiT_2AMaDisasterJustBeganAD2018
KoprowskiT_2AMaDisasterJustBeganAD2018KoprowskiT_2AMaDisasterJustBeganAD2018
KoprowskiT_2AMaDisasterJustBeganAD2018
 
Citrix PVS Advanced memory and storage considerations for provisioning services
Citrix PVS Advanced memory and storage considerations for provisioning servicesCitrix PVS Advanced memory and storage considerations for provisioning services
Citrix PVS Advanced memory and storage considerations for provisioning services
 
Ibm aix technical deep dive workshop advanced administration and problem dete...
Ibm aix technical deep dive workshop advanced administration and problem dete...Ibm aix technical deep dive workshop advanced administration and problem dete...
Ibm aix technical deep dive workshop advanced administration and problem dete...
 
VMworld 2013: vSphere Data Protection 5.5 Advanced VMware Backup and Recovery...
VMworld 2013: vSphere Data Protection 5.5 Advanced VMware Backup and Recovery...VMworld 2013: vSphere Data Protection 5.5 Advanced VMware Backup and Recovery...
VMworld 2013: vSphere Data Protection 5.5 Advanced VMware Backup and Recovery...
 
Virtualization with Lenovo X6 Blade Servers: white paper
Virtualization with Lenovo X6 Blade Servers: white paperVirtualization with Lenovo X6 Blade Servers: white paper
Virtualization with Lenovo X6 Blade Servers: white paper
 
VMworld 2014: Data Protection for vSphere 101
VMworld 2014: Data Protection for vSphere 101VMworld 2014: Data Protection for vSphere 101
VMworld 2014: Data Protection for vSphere 101
 
A Step-By-Step Disaster Recovery Blueprint & Best Practices for Your NetBacku...
A Step-By-Step Disaster Recovery Blueprint & Best Practices for Your NetBacku...A Step-By-Step Disaster Recovery Blueprint & Best Practices for Your NetBacku...
A Step-By-Step Disaster Recovery Blueprint & Best Practices for Your NetBacku...
 
Optimize Oracle On VMware (Sep 2011)
Optimize Oracle On VMware (Sep 2011)Optimize Oracle On VMware (Sep 2011)
Optimize Oracle On VMware (Sep 2011)
 
Optimize oracle on VMware (April 2011)
Optimize oracle on VMware (April 2011)Optimize oracle on VMware (April 2011)
Optimize oracle on VMware (April 2011)
 

Viewers also liked

Viewers also liked (10)

The benefits of operating systems consolidation in corporate datacenters
The benefits of operating systems consolidation in corporate datacentersThe benefits of operating systems consolidation in corporate datacenters
The benefits of operating systems consolidation in corporate datacenters
 
Nelson Mandela Home Learning
Nelson Mandela Home LearningNelson Mandela Home Learning
Nelson Mandela Home Learning
 
denilson
denilsondenilson
denilson
 
Teoría de las necesidades de abraham maslow ...
Teoría de las necesidades de abraham maslow                                  ...Teoría de las necesidades de abraham maslow                                  ...
Teoría de las necesidades de abraham maslow ...
 
huracanes
huracaneshuracanes
huracanes
 
Saving energy in data centers through workload consolidation
Saving energy in data centers through workload consolidationSaving energy in data centers through workload consolidation
Saving energy in data centers through workload consolidation
 
Music Video Work
Music Video WorkMusic Video Work
Music Video Work
 
Careers our ancestors wouldn't believe
Careers our ancestors wouldn't believeCareers our ancestors wouldn't believe
Careers our ancestors wouldn't believe
 
Eco4Cloud - Company Presentation
Eco4Cloud - Company PresentationEco4Cloud - Company Presentation
Eco4Cloud - Company Presentation
 
How to sell clickbank products fast
How to sell clickbank products fastHow to sell clickbank products fast
How to sell clickbank products fast
 

Similar to Avoid resource contention with e4 c

ovm3-server-pool-459310
ovm3-server-pool-459310ovm3-server-pool-459310
ovm3-server-pool-459310
Enoch Antwi
 
How to become cloud backup provider with Cloudian HyperStore and CloudBerry L...
How to become cloud backup provider with Cloudian HyperStore and CloudBerry L...How to become cloud backup provider with Cloudian HyperStore and CloudBerry L...
How to become cloud backup provider with Cloudian HyperStore and CloudBerry L...
Cloudian
 

Similar to Avoid resource contention with e4 c (20)

Networker integration for optimal performance
Networker integration for optimal performanceNetworker integration for optimal performance
Networker integration for optimal performance
 
Procesamiento multinúcleo óptimo para aplicaciones críticas de seguridad
 Procesamiento multinúcleo óptimo para aplicaciones críticas de seguridad Procesamiento multinúcleo óptimo para aplicaciones críticas de seguridad
Procesamiento multinúcleo óptimo para aplicaciones críticas de seguridad
 
Performance management in the virtual data center
Performance management in the virtual data centerPerformance management in the virtual data center
Performance management in the virtual data center
 
Virtual Memory
Virtual MemoryVirtual Memory
Virtual Memory
 
Cpu ready recomendaciones
Cpu ready    recomendacionesCpu ready    recomendaciones
Cpu ready recomendaciones
 
EMC VNX
EMC VNXEMC VNX
EMC VNX
 
New Features For Your Software Defined Storage
New Features For Your Software Defined StorageNew Features For Your Software Defined Storage
New Features For Your Software Defined Storage
 
VirutualMemory.docx
VirutualMemory.docxVirutualMemory.docx
VirutualMemory.docx
 
Emc vi pr controller
Emc vi pr controllerEmc vi pr controller
Emc vi pr controller
 
Operating Systems - memory management
Operating Systems - memory managementOperating Systems - memory management
Operating Systems - memory management
 
A Survey of Performance Comparison between Virtual Machines and Containers
A Survey of Performance Comparison between Virtual Machines and ContainersA Survey of Performance Comparison between Virtual Machines and Containers
A Survey of Performance Comparison between Virtual Machines and Containers
 
Esxi troubleshooting
Esxi troubleshootingEsxi troubleshooting
Esxi troubleshooting
 
The lies we tell our code, LinuxCon/CloudOpen 2015-08-18
The lies we tell our code, LinuxCon/CloudOpen 2015-08-18The lies we tell our code, LinuxCon/CloudOpen 2015-08-18
The lies we tell our code, LinuxCon/CloudOpen 2015-08-18
 
White Paper: Introduction to VFCache
White Paper: Introduction to VFCache   White Paper: Introduction to VFCache
White Paper: Introduction to VFCache
 
Del 1
Del 1Del 1
Del 1
 
ovm3-server-pool-459310
ovm3-server-pool-459310ovm3-server-pool-459310
ovm3-server-pool-459310
 
How to become cloud backup provider with Cloudian HyperStore and CloudBerry L...
How to become cloud backup provider with Cloudian HyperStore and CloudBerry L...How to become cloud backup provider with Cloudian HyperStore and CloudBerry L...
How to become cloud backup provider with Cloudian HyperStore and CloudBerry L...
 
Accelerating Hyper-Converged Enterprise Virtualization using Proxmox and Ceph
Accelerating Hyper-Converged Enterprise Virtualization using Proxmox and CephAccelerating Hyper-Converged Enterprise Virtualization using Proxmox and Ceph
Accelerating Hyper-Converged Enterprise Virtualization using Proxmox and Ceph
 
Optimizing cpu resources
Optimizing cpu resourcesOptimizing cpu resources
Optimizing cpu resources
 
Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...
 

Recently uploaded

Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
drm1699
 

Recently uploaded (20)

[GRCPP] Introduction to concepts (C++20)
[GRCPP] Introduction to concepts (C++20)[GRCPP] Introduction to concepts (C++20)
[GRCPP] Introduction to concepts (C++20)
 
Abortion Clinic Pretoria ](+27832195400*)[ Abortion Clinic Near Me ● Abortion...
Abortion Clinic Pretoria ](+27832195400*)[ Abortion Clinic Near Me ● Abortion...Abortion Clinic Pretoria ](+27832195400*)[ Abortion Clinic Near Me ● Abortion...
Abortion Clinic Pretoria ](+27832195400*)[ Abortion Clinic Near Me ● Abortion...
 
Abortion Clinic In Pongola ](+27832195400*)[ 🏥 Safe Abortion Pills In Pongola...
Abortion Clinic In Pongola ](+27832195400*)[ 🏥 Safe Abortion Pills In Pongola...Abortion Clinic In Pongola ](+27832195400*)[ 🏥 Safe Abortion Pills In Pongola...
Abortion Clinic In Pongola ](+27832195400*)[ 🏥 Safe Abortion Pills In Pongola...
 
BusinessGPT - Security and Governance for Generative AI
BusinessGPT  - Security and Governance for Generative AIBusinessGPT  - Security and Governance for Generative AI
BusinessGPT - Security and Governance for Generative AI
 
Abortion Pill Prices Jozini ](+27832195400*)[ 🏥 Women's Abortion Clinic in Jo...
Abortion Pill Prices Jozini ](+27832195400*)[ 🏥 Women's Abortion Clinic in Jo...Abortion Pill Prices Jozini ](+27832195400*)[ 🏥 Women's Abortion Clinic in Jo...
Abortion Pill Prices Jozini ](+27832195400*)[ 🏥 Women's Abortion Clinic in Jo...
 
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
 
Auto Affiliate AI Earns First Commission in 3 Hours..pdf
Auto Affiliate  AI Earns First Commission in 3 Hours..pdfAuto Affiliate  AI Earns First Commission in 3 Hours..pdf
Auto Affiliate AI Earns First Commission in 3 Hours..pdf
 
Your Ultimate Web Studio for Streaming Anywhere | Evmux
Your Ultimate Web Studio for Streaming Anywhere | EvmuxYour Ultimate Web Studio for Streaming Anywhere | Evmux
Your Ultimate Web Studio for Streaming Anywhere | Evmux
 
Navigation in flutter – how to add stack, tab, and drawer navigators to your ...
Navigation in flutter – how to add stack, tab, and drawer navigators to your ...Navigation in flutter – how to add stack, tab, and drawer navigators to your ...
Navigation in flutter – how to add stack, tab, and drawer navigators to your ...
 
Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?
 
Rapidoform for Modern Form Building and Insights
Rapidoform for Modern Form Building and InsightsRapidoform for Modern Form Building and Insights
Rapidoform for Modern Form Building and Insights
 
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale IbridaUNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
 
Software Engineering - Introduction + Process Models + Requirements Engineering
Software Engineering - Introduction + Process Models + Requirements EngineeringSoftware Engineering - Introduction + Process Models + Requirements Engineering
Software Engineering - Introduction + Process Models + Requirements Engineering
 
The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)
 
Weeding your micro service landscape.pdf
Weeding your micro service landscape.pdfWeeding your micro service landscape.pdf
Weeding your micro service landscape.pdf
 
Effective Strategies for Wix's Scaling challenges - GeeCon
Effective Strategies for Wix's Scaling challenges - GeeConEffective Strategies for Wix's Scaling challenges - GeeCon
Effective Strategies for Wix's Scaling challenges - GeeCon
 
GraphSummit Milan - Visione e roadmap del prodotto Neo4j
GraphSummit Milan - Visione e roadmap del prodotto Neo4jGraphSummit Milan - Visione e roadmap del prodotto Neo4j
GraphSummit Milan - Visione e roadmap del prodotto Neo4j
 
A Deep Dive into Secure Product Development Frameworks.pdf
A Deep Dive into Secure Product Development Frameworks.pdfA Deep Dive into Secure Product Development Frameworks.pdf
A Deep Dive into Secure Product Development Frameworks.pdf
 
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...
 
Community is Just as Important as Code by Andrea Goulet
Community is Just as Important as Code by Andrea GouletCommunity is Just as Important as Code by Andrea Goulet
Community is Just as Important as Code by Andrea Goulet
 

Avoid resource contention with e4 c

  • 1. AVOID RESOURCE CONTENTION WITH ECO4CLOUD TECHNOLOGY A PRIMARY TELCO USE CASE Ph. +39 0984 494276 Piazza Vermicelli 87036 Rende (CS), Italy www.eco4cloud.com info@eco4cloud.com Copyright © 2016 Eco4Cloud. All rights reserved. This product is protected by Italian and international copyright and intellectual property laws. Eco4Cloud — www.eco4cloud.com | Phone +39 0984494276 | E-mail info@eco4cloud.com
  • 2. AVOID RESOURCE CONTENTION WITH E4C TECHNOLOGY © 2016 Eco4Cloud and/or its affiliates. All rights reserved. This document is Eco4Cloud Public. Page 2 Overcommitment and Contention 1. Introduction VMware® ESX™ is a hypervisor designed to efficiently manage hardware resources including CPU, memory, storage and network among multiple concurrent virtual machines [1]. ESX uses high-level resource management policies to compute a target memory allocation for each virtual machine (VM), based on the current system load and parameter settings for the virtual machine (shares, reservation, and limit [2]). The computed target allocation is used to guide the dynamic adjustment of the memory allocation for each virtual machine; in case host memory is overcommitted, the target allocations are achieved by invoking several lower-level mechanisms to reclaim memory from virtual machines. VMware ESX enables impressive memory and CPU consolidation ratios; ESX allows running VMs with total configured resources that exceed the amount available on the physical machine: this is called overcommitment. Overcommitment raises the consolidation ratio, increases operational efficiency and lowers total cost of operating virtual machines; if out of control, overcommitment leads to Resource Contention, a typical situation where several VMs are competing over the same resources, waiting for the VMware scheduler to assign them. This is the main reason for performance issues in virtualized environment and, as such, it’s the very first key performance indicator to be monitored in a virtual farm. Contention is measured via CPU Ready Time and Memory Ballooning. 2. CPU Ready Time CPU Ready Time is the period of time a VM waits in a ready-to-run state (meaning it has work to do) before being scheduled by the hypervisor on one or more physical CPUs. Therefore, CPU Ready Time is a metric showing how much time virtual CPU is ready to be scheduled on a given physical host. 
In general terms, it is normal for VMs to have small values of CPU Ready Time, even if the hypervisor is not over subscribed, or under heavy activity; it is just the nature of shared scheduling in virtualization. For SMP VMs with multiple vCPUs, the amount of ready time will generally be higher than for VMs with fewer vCPUs, In general terms, it is normal for VMs to have small values of CPU Ready Time
  • 3. AVOID RESOURCE CONTENTION WITH E4C TECHNOLOGY © 2016 Eco4Cloud and/or its affiliates. All rights reserved. This document is Eco4Cloud Public. Page 3 since it requires more resources to schedule/co-schedule the VM when necessary and each CPU accumulates the time separately; under normal operating conditions, this value should remain under 5%. If ready time values are higher, virtual machines experience bad performance. Even in best designed environments there will be some CPU contention and that is okay. Any %ready number less than 5% is considered the optimal area to be in. Once your %ready number climbs in between 5 and 10%, you need to pay attention when adding more virtual machines and/or CPU cores to the virtual machines. We can call this the warning area. Now, once the %ready numbers climb higher than 10%, you will reach the dangerous area and as a consequence bad performance will impact those virtual machines. Your host could show a %50 overall CPU utilization and strong CPU contention in your environment, thus affecting the overall performance of your virtual machines. Just to summarize, CPU contention is one of the hidden issues you might find in your environment, unless you know where looking for. The best tool to use when looking for any CPU contention in your environment is ESXTOP from inside the service console of the host, RESXTOP from the vMA appliance, or other third-party tools, like Eco4Cloud. The best defense against CPU contention is knowledge and comprehension of scheduler interactions with multi-processor virtual machines; if you are using multi-processor systems, take into account that potential issue. While there are a number of scenarios where high values of CPU Ready Time can occur, there are two most common scenarios. 
The first common cause is host oversubscription, where too many vCPUs have been allocated relative to the number of physical CPUs. While ESX 5 supports a maximum of 25 vCPUs per physical CPU, this is definitely a case where being able to do something does not make it good practice. As always, your mileage may vary based on your specific VM workloads, but typically you begin to experience problems when a host is in the range of 2-2.5X oversubscribed for server workloads.

The second most common scenario where CPU Ready Time rises is when a larger SMP VM, for example one with 4-8 vCPUs, runs on a host with many smaller 1-2 vCPU VMs for application servers. Depending on the number of physical processors and on the total number of vCPUs allocated on the host, a larger resource allocation for the VM results in longer waiting time, because the hypervisor has to preempt the necessary physical CPUs to schedule/co-schedule the workload. When this issue occurs, the software vendor often raises the vCPU requirements in response to the VM's performance problems. Unfortunately, if CPU Ready Time is the root cause, increasing the number of vCPUs does not improve performance; on the contrary, it makes things worse.
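The thresholds and the oversubscription ratio discussed in this section can be illustrated with a minimal Python sketch. The function names and the sample host (16 physical cores, 36 allocated vCPUs) are hypothetical and chosen only for illustration; the 5% and 10% boundaries are the ones stated in the text, and the code does not query ESXTOP or any VMware API.

```python
"""Illustrative helpers for the CPU Ready Time areas described in the text."""


def classify_cpu_ready(ready_pct: float) -> str:
    """Map a %ready value to the optimal, warning, or dangerous area."""
    if ready_pct < 0:
        raise ValueError("ready_pct must be non-negative")
    if ready_pct < 5.0:
        return "optimal"      # below 5%
    if ready_pct <= 10.0:
        return "warning"      # between 5% and 10%
    return "dangerous"        # above 10%


def oversubscription_ratio(total_vcpus: int, physical_cores: int) -> float:
    """Return allocated vCPUs per physical core on a host."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return total_vcpus / physical_cores


# Hypothetical host: 16 physical cores with 36 vCPUs allocated to its VMs.
ratio = oversubscription_ratio(total_vcpus=36, physical_cores=16)
print(f"vCPU:pCPU ratio = {ratio:.2f}")

# Hypothetical %ready samples, one per VM.
for sample in (2.0, 7.5, 12.0):
    print(f"%ready {sample:5.1f} -> {classify_cpu_ready(sample)}")
```

A ratio in the 2-2.5 range, as in this hypothetical host, falls in the zone where the text says server workloads typically start to show problems, so the %ready values of its VMs deserve a closer look.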
3. Memory Ballooning
One of the main benefits introduced by virtualization is virtual machine isolation, which is very useful for security and risk management. A drawback of isolation is that the guest operating system is not aware that it is running inside a virtual machine, nor of the state of other virtual machines on the same physical host. When the hypervisor runs multiple virtual machines and the total amount of free host memory gets low, none of the virtual machines will release guest physical memory, since the guest operating system cannot detect the host's memory shortage.

VMware ballooning is a memory reclamation technique used when an ESXi host is running low on memory. It allows the physical host to retrieve unused memory from certain guest virtual machines (VMs) and share it with others [3]. Ballooning makes the guest operating system aware of the low memory status of the host.

In ESX, a balloon driver is loaded into the guest operating system as a pseudo-device driver. It has no external interface to the guest operating system and communicates with the hypervisor through a private channel. The balloon driver polls the hypervisor to obtain a target balloon size. If the hypervisor needs to reclaim virtual machine memory, it sets an appropriate target balloon size for the balloon driver, making it "inflate" by allocating guest physical pages within the virtual machine.

Ballooned memory is a symptom of RAM contention. If host free memory drops towards the 4% threshold, the hypervisor starts to reclaim memory using ballooning. VM memory ballooning can degrade performance.
Ballooning is a CPU-intensive process and can eventually lead to memory swapping, when a balloon driver inflates to the point where the VM no longer has enough memory to run its processes. This slows down the VM to a degree that depends on the amount of memory to be reclaimed and/or the storage performance (IOPS) available to it.

4. Why these counters are important
CPU Ready Time and Ballooned Memory are symptoms of contention on CPU and RAM, respectively. In IT literature, these two metrics are widely recognized as the most significant indicators that virtual machines are experiencing bad performance.

The generally accepted industry best practice, based on VMware's guidelines, is that CPU Ready Time values up to 5% (per vCPU) fall within acceptable parameters.
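For the memory counter, the 4% free-memory threshold from Section 3 can be sketched in the same spirit. This is a simplified illustration, not the hypervisor's actual reclamation logic: the function names, the per-VM balloon figures, and the input format are assumptions, and the 64 GB host size is borrowed from the test cluster only as a convenient example.

```python
"""Illustrative check of the free-memory threshold and ballooned VMs."""
from typing import Dict, List

# Threshold taken from the text: reclamation via ballooning is expected
# as host free memory drops towards 4% of total memory.
BALLOON_FREE_THRESHOLD = 0.04


def free_fraction(free_mb: float, total_mb: float) -> float:
    """Return host free memory as a fraction of total host memory."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return free_mb / total_mb


def reclamation_expected(free_mb: float, total_mb: float,
                         threshold: float = BALLOON_FREE_THRESHOLD) -> bool:
    """True when free memory is at or below the ballooning threshold."""
    return free_fraction(free_mb, total_mb) <= threshold


def ballooned_vms(balloon_mb_by_vm: Dict[str, float]) -> List[str]:
    """Return the names of VMs with a non-zero ballooned memory value."""
    return sorted(name for name, mb in balloon_mb_by_vm.items() if mb > 0)


# Hypothetical host with 64 GB of RAM and 2000 MB currently free.
total_mb = 64 * 1024
print(reclamation_expected(free_mb=2000, total_mb=total_mb))

# Hypothetical per-VM ballooned memory samples, in MB.
print(ballooned_vms({"vm-a": 0.0, "vm-b": 512.0, "vm-c": 0.0}))
```

Any VM returned by `ballooned_vms` is a candidate for closer inspection, since the text treats the absence of ballooning as the healthy state.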
Memory Ballooning is the first technique the hypervisor uses to reclaim memory. The absence of ballooning, or very low levels of it, is a sign of good health for a virtual farm.

Eco4Cloud Workload Consolidation computes the ideal placement of VMs among physical hosts in order to decrease both CPU Ready Time and Memory Ballooning, enabling higher performance and VM density.

5. Test Workflow
A field test was performed to compare the performance of VMware® Distributed Resource Scheduler and the Eco4Cloud Workload Consolidation platform.

VMware® Distributed Resource Scheduler (DRS) aggregates computing capacity across a set of servers into logical resource pools and intelligently allocates available resources among the VMs, based on pre-defined rules. VMware Distributed Power Management (DPM), within VMware DRS, automates power management and minimizes power consumption across a given collection of servers in a VMware DRS cluster.

The test was performed on a cluster in the production farm of a leading Italian telco; the cluster contained 6 physical hosts running VMware vSphere version 5.0. The hosts were HP ProLiant DL580 G5 servers, each equipped with 64 GB of RAM and 4 CPU sockets. Three hosts mounted 4x Intel® Xeon® CPU E7320 @ 2.13GHz, while the other three mounted 4x Intel® Xeon® CPU X7350 @ 2.93GHz. Each CPU had 4 physical cores, so the total number of physical cores for each host was 16.

The hosts ran about 94 virtual machines, with between 1 and 8 virtual CPUs assigned (most of them with 2 or 4) and between 1 and 16 GB of assigned RAM (most of them with 2 or 4 GB). The guest operating systems were: 80% Microsoft Windows (various editions, 32 and 64 bit), 14% Red Hat Enterprise Linux (5 and 6, 32 and 64 bit), and 6% Oracle Solaris 10 64 bit.
The average host CPU usage during the test was about 28%. To collect meaningful data, a set of tests using VMware DRS and E4C Workload Consolidation was performed. The overall test ran for 6 days, divided into two phases of equal length. During the first phase (3 days), workload placement was managed by VMware® DRS in fully automated mode and Eco4Cloud Workload Consolidation was disabled.
After the first phase, a second phase of 3 more days followed: Eco4Cloud Workload Consolidation was enabled and VMware® DRS was put in partially automated mode. The two phases were comparable, because the production workload on the given cluster did not change significantly.

6. Results
Just after the end of the test, it was clear that, with Eco4Cloud Workload Consolidation in use, overall cluster performance had increased: CPU Ready Time dropped by 23%, and Ballooned Memory was completely removed, thanks to the intelligent workload placement strategy of Eco4Cloud Workload Consolidation.

On the memory side, the problem was solved. On the CPU side, the result positively affects performance, but 23% is only an aggregate value. Let's look at how CPU Ready Time decreases in the most important cases, those where CPU Ready Time exceeds the warning and alert thresholds of 5% and 10%, respectively.
As shown in the accompanying exhibit, CPU Ready Time warnings decrease by 90.26%, while CPU Ready Time alerts decrease by 42.86%. In our evaluation scenario, this means:
- 514 fewer warnings/alerts each day, per cluster
- 3598 fewer warnings/alerts each week, per cluster

Given how much time it takes to manage a performance warning or an alert, evaluating how much time you can save with an intelligent workload placement solution is simple math.

References
[1] Carl A. Waldspurger. "Memory Resource Management in VMware ESX Server". Proceedings of the Fifth Symposium on Operating Systems Design and Implementation, Boston, Dec 2002
[2] vSphere Resource Management Guide. VMware. http://www.vmware.com/pdf/vsphere4/r40/vsp_40_upgrade_guide.pdf
[3] Understanding Memory Resource Management in VMware® ESX™ Server. http://www.vmware.com/files/pdf/perf-vsphere-memory_management.pdf

For more information
- E4C Workload Consolidation: http://www.eco4cloud.com/workload-consolidation
- Eco4Cloud Workload Consolidation Product Overview
- Eco4Cloud Workload Consolidation FAQ