Commercial in Confidence | www.metron-athene.com
Virtualisation Oversubscription
(What’s so scary?)
Phil.bell@metron-athene.com
Topics
• What led me here
• Oversubscription Overview
• CPU Oversubscription
• Memory Oversubscription
• What’s the worst that can happen? (Queueing theory, the simple version)
Overcommit vs Oversubscribe
• Overcommit = Oversubscribe
What led me here
• Clients
– “Oh, we don’t oversubscribe”
• Fear
• Misunderstanding
Flying Navigation by Dead Reckoning
• You know where you started
• You know how long you flew for
• You know your air speed
• You know what direction you flew in
• What if the wind changed in the last 8 hours?
• WW2 bombing saw only 1 in 5 bomb loads land within 5 miles of the target
Virtualisation Used Capacity by Dead Reckoning
• You know what you started with
• You know what you provisioned
• You know how much is left
• Not especially efficient
Oversubscription
• Allocating more than you have
– Thin Provisioning
– Deduplication & Compression
[Diagram: bars comparing what is Allocated with what physically Exists and with what is actually Used]
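A rough sketch of the storage example from the speaker notes: 200 Windows servers with thin-provisioned 32GB system drives, each carrying roughly 12GB of identical OS files. The 2GB of genuinely unique data per server is an assumed figure for illustration only.

```python
servers = 200
allocated_gb_each = 32     # minimum Windows 2012 system drive, all thin provisioned
identical_os_gb = 12       # OS files that are the same on every server
unique_gb_each = 2         # assumed per-server unique data (logs, dumps, updates)

allocated = servers * allocated_gb_each                              # 6400 GB handed out
used_without_dedup = servers * (identical_os_gb + unique_gb_each)    # 2800 GB actually written
used_with_dedup = identical_os_gb + servers * unique_gb_each         # 412 GB: one shared OS copy
print(allocated, used_without_dedup, used_with_dedup)
```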
What can be oversubscribed?
• CPUs
• Memory
• Disk
• NICs
– Nobody ever seems to think about that one
– Traffic between VMs on a single host = no physical NIC involved
– Otherwise…
CPU VMware Maximums
• Virtual Machine Maximum
– 128 vCPUs per VM
• Host CPU maximums
– Logical CPUs per host 480
– Virtual machines per host 1024
– Virtual CPUs per host 4096
– Virtual CPUs per core 32
• The achievable number of vCPUs per core depends on the workload and the specifics of the hardware. For more information, see the latest version of Performance Best Practices for VMware vSphere
https://www.vmware.com/pdf/vsphere6/r60/vsphere-60-configuration-maximums.pdf
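A quick sanity-check sketch of a host design against the per-host maximums on this slide. The example host figures are invented; only the limits come from the slide.

```python
host = {"logical_cpus": 40, "cores": 20, "vms": 200, "vcpus": 400}   # assumed example host

limits = {"logical_cpus": 480, "vms": 1024, "vcpus": 4096}           # per-host maximums above
assert all(host[k] <= limits[k] for k in limits)

vcpus_per_core = host["vcpus"] / host["cores"]     # 20:1 oversubscription in this example
assert vcpus_per_core <= 32   # hard maximum; the workable ratio depends on the workload
print(vcpus_per_core)
```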
Memory VMware Maximums
• 6TB per Host
– Well, 12TB on specific hardware
• 4TB per VM
Memory Oversubscription
• How?
– Free Space
– Page Sharing
– Balloon Driver (VMware)
– Reservations
– Shares
Memory
• Transparent Page Sharing
– Deduplication in memory
• Balloon Driver
– The vmmemctl process “steals” memory inside the VM, allowing that memory to be used by other VMs. This may cause the guest OS to page.
• VMkernel Swap
– The VM thinks its pages are in memory, but ESX has put that memory on disk in a VMkernel swap file.
– “Performance is NOT optimal”
Transparent Page Sharing
[Diagram: VM1 and VM2 map identical memory pages to a single shared copy on the ESX host]
Balloon Driver (vmmemctl)
[Diagram: the balloon driver inflates inside VM1, freeing physical memory that ESX can hand to VM2]
Memory test
• Memory vs. disk speed is…?
– A) Memory is 100x faster than disk
– B) Memory is 1,000x faster than disk
– C) Memory is 10,000x faster than disk
– D) Memory is 100,000x faster than disk
– E) Memory is 1,000,000x faster than disk
– F) I have no memory of the event, your honour
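The answer, using the figures from the speaker notes (roughly 5ms per disk I/O and 50ns per memory access):

```python
disk_io_s = 5e-3         # ~5 ms for a modern disk I/O (speaker notes figure)
memory_access_s = 50e-9  # ~50 ns for a memory access

ratio = disk_io_s / memory_access_s
print(ratio)                   # 100000.0 -> answer D, disk is ~100,000x slower
print(ratio / 3600, "hours")   # if a memory access took 1 s, a disk I/O would take ~27.8 hours
```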
VMkernel Swap
[Chart: a VM’s memory split into the portion the balloon driver can take, the portion that can go to the VMkernel swap file, and the reservation (MB)]
Example:
• Assume maximum memory contention
• By default, up to 65% can be taken by the balloon driver
• The example reservation is 30%
• The remaining 5% can end up in the VMkernel swap (.vswp) file
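A minimal sketch of that split, assuming a hypothetical 4GB VM; the 65% balloon ceiling and 30% reservation are the figures from the slide, the VM size is an assumption.

```python
vm_memory_mb = 4096                     # assumed VM size for illustration
reservation_mb = 0.30 * vm_memory_mb    # guaranteed RAM: cannot be ballooned or swapped
balloon_max_mb = 0.65 * vm_memory_mb    # default ceiling for the balloon driver
vswp_worst_mb = vm_memory_mb - reservation_mb - balloon_max_mb   # ~5% can hit the .vswp file

print(reservation_mb, balloon_max_mb, vswp_worst_mb)   # 1228.8 2662.4 204.8
```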
Memory (example VM)
• 433MB Active Memory
• 2.6GB Unique Memory
• 1.4GB Shared Memory
• 50MB Balloon Driver Memory
• 150MB ESX Overhead for the VM
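A rough sketch of why the shared figure matters at the host level. The per-VM numbers are from the slide; the assumption that ten similar VMs share the same 1.4GB (as the speaker notes suggest) is illustrative only.

```python
vms = 10                 # assumed number of similar VMs on the host
unique_gb_per_vm = 2.6   # memory unique to each VM (from the slide)
shared_gb = 1.4          # memory identical across the VMs (from the slide)

without_sharing = vms * (unique_gb_per_vm + shared_gb)   # 40.0 GB
with_sharing = vms * unique_gb_per_vm + shared_gb        # 27.4 GB: one physical copy
print(without_sharing - with_sharing)                    # ~12.6 GB of RAM not needed
```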
Reservations
• Resource Pools or VMs
• If they want it, they get it
• If they don’t want it, it’s available to all
• Cannot reserve more than exists
• Oversubscribe
– Protect core VMs with a reservation
Memory Idle Tax
• Memory has Shares
• Memory Tax associates a value to each page used
• Default Idle Tax rate is 75%
• This makes idle memory cost 4 times as many shares as active memory
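The 4x figure follows from the tax rate: an idle page is charged at 1 / (1 - tax) times the cost of an active page.

```python
idle_tax = 0.75                        # default idle memory tax rate
idle_page_cost = 1 / (1 - idle_tax)    # cost of an idle page relative to an active page
print(idle_page_cost)                  # 4.0 -> an idle page costs as much as four active pages
```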
CPU Oversubscription
• How?
– Time slicing
– Co-Scheduling
– Reservations
– Shares
– Limits
Time Slicing
• Cores are shared between vCPUs in time slices
– 1 vCPU to 1 core at any point in time
• More vCPUs = More time slicing
• Processes do this on CPUs all the time
– So why is it so scary?
– Over 100 processes on my laptop share 4 CPUs
[Diagram: a VM alternating between running and dormant/idle time slices on a core]
VMware Processor Scheduling: vCPU Co-Scheduling & Ready Time
[Animation: VMs with 1, 2 and 4 vCPUs being placed onto 4 logical CPU threads; a VM can only run when enough threads are free for all of its vCPUs at once, and VMs left waiting accumulate Ready time]
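A toy sketch of the strict co-scheduling idea in the animation: a VM only runs when all of its vCPUs can be placed at once. Real ESX uses relaxed co-scheduling (as the speaker notes mention), so treat this purely as an illustration of why Ready and Co-Stop time appear; the VM names and vCPU counts are invented.

```python
def schedule_interval(free_threads, vms):
    """One scheduling interval: place VMs first-come-first-served, strict co-scheduling."""
    running, waiting = [], []
    for name, vcpus in vms:
        if vcpus <= free_threads:     # all of the VM's vCPUs must fit at the same time
            free_threads -= vcpus
            running.append(name)
        else:
            waiting.append(name)      # accumulates Ready / Co-Stop time instead of running
    return running, waiting

vms = [("small-1", 1), ("small-2", 1), ("small-3", 1), ("big", 4)]
print(schedule_interval(4, vms))
# (['small-1', 'small-2', 'small-3'], ['big'])
# A logical CPU sits idle, yet the 4-vCPU VM still cannot run until 4 threads free up together.
```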
Reservations, Shares & Limits
Reservations
[Chart: CPU used by the Production VM vs. CPU used by the Test VM, with the Production VM’s reservation marked]
1) The Production VM wants to use all the CPU available.
2) The Test VM starts and also wants to use all the CPU available.
3) Each uses 50% of the CPU.
4) The Production VM wants 250MHz of CPU while Test wants to use 4000MHz. Production gets 100% of its request; Test does not.
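Working through the numbers on this slide, assuming the host has 4000MHz available (the amount Test wants to use) and the two VMs have equal shares:

```python
capacity_mhz = 4000        # assumed host capacity: the amount Test wants to use
prod_reservation = 250     # MHz guaranteed to the Production VM

# 1-3) Both VMs want everything; with equal shares each gets half.
prod = test = capacity_mhz / 2              # 2000 MHz each (50% / 50%)

# 4) Production now only wants its 250 MHz; the rest flows to Test.
prod = prod_reservation                     # 250 MHz: 100% of its request
test = capacity_mhz - prod                  # 3750 MHz: 93.75% of what Test wants
print(prod, test, test / capacity_mhz)      # 250 3750.0 0.9375
```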
Reservations & Shares
[Chart: CPU used by the Production VM vs. CPU used by the Test VM, with the Production VM’s reservation marked]
1) The Production VM (2000 shares) wants to use all the CPU available.
2) The Test VM (1000 shares) also wants to use all the CPU available.
3) Production gets 66% of the CPU, Test gets 33%.
4) The Production VM wants 250MHz of CPU while Test could still use 4000MHz. Production gets 100% of its request; Test does not.
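The same scenario with the shares weighting applied, again assuming 4000MHz of host capacity:

```python
capacity_mhz = 4000
shares = {"prod": 2000, "test": 1000}

# 1-3) Both want all the CPU: under contention the split is proportional to shares.
total_shares = sum(shares.values())
prod = capacity_mhz * shares["prod"] / total_shares   # ~2666 MHz (66%)
test = capacity_mhz * shares["test"] / total_shares   # ~1333 MHz (33%)

# 4) Production drops to its 250 MHz reservation; Test takes the remainder.
prod = 250
test = capacity_mhz - prod                            # 3750 MHz, 93.75% of Test's demand
print(round(prod), round(test))
```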
Expandable Reservation 1
• Root (RP): Total CPU 10200 MHz
• Software (RP): Reservation 3000 MHz, Expandable: Yes
– Production (RP): Reservation 1200 MHz, Expandable: Yes
– Test (RP): Reservation 1000 MHz, Expandable: No
• VMs: VM1 (Res 400 MHz), VM2 (Res 300 MHz), VM7 (Res 500 MHz)
• Why can’t VM7 start? 1200 MHz of reservation is required, but only 1000 MHz is available.
Expandable Reservation 2
• Root (RP): Total CPU 10200 MHz
• Software (RP): Reservation 3000 MHz, Expandable: Yes
– Production (RP): Reservation 1200 MHz, Expandable: Yes
– Test (RP): Reservation 1000 MHz, Expandable: Yes
• VMs: VM1 (400 MHz), VM2 (300 MHz), VM3 (500 MHz), VM4 (500 MHz), VM5 (500 MHz), VM6 (500 MHz), VM7 (500 MHz)
• Production: 2000MHz requested against a 1200MHz reservation, 2000MHz of the parent used
• Test: 1200MHz requested, 1000MHz available in the parent
• Where is the “extra” taken from? Software (RP): 3200MHz requested against a 3000MHz reservation, with 200MHz used by Test (RP)
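A simplified sketch of the admission rule these two slides illustrate: a pool must cover a new reservation from its own unreserved capacity, and only an expandable pool may borrow the shortfall from its parent. The dictionary layout and the helper below are an assumption built from the slide’s numbers, not VMware’s API.

```python
def admit(pool, request_mhz):
    """Can this pool accept a new child reservation of request_mhz?"""
    unreserved = pool["reservation"] - pool["reserved"]
    if request_mhz <= unreserved:
        pool["reserved"] += request_mhz
        return True
    if pool["expandable"] and pool["parent"] is not None:
        shortfall = request_mhz - max(unreserved, 0)
        if admit(pool["parent"], shortfall):   # borrow the rest from the parent pool
            pool["reserved"] += request_mhz
            return True
    return False

software = {"reservation": 3000, "reserved": 2200, "expandable": True,
            "parent": {"reservation": 10200, "reserved": 3000, "expandable": False, "parent": None}}
test = {"reservation": 1000, "reserved": 700, "expandable": False, "parent": software}

print(admit(test, 500))   # False: 1200 MHz needed, only 1000 MHz in the pool (example 1)
test["expandable"] = True
print(admit(test, 500))   # True: the extra 200 MHz comes from Software (RP) (example 2)
```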
What’s the worst that can happen?
• Memory
– It fills up
– Then bad things happen
• CPU
– Bad things happen
– Then it’s full/maxed
• Queueing Theory
Contention and Queuing
• Finite system resources
• Single workstation = no contention (usually)
• More than One User = Possible Contention
• Contention = Queuing
– This is COMPLETELY NORMAL
– It’s how operating systems work.
• Excessive Queuing = Poor Performance and Long Response Times
Basic Ideas of Queuing
[Diagram: arriving customers/transactions (A) join a queue, are handled by a server, and leave (L)]
• Response Time = Queuing Time (Q) + Service Time (S)
Utilization and Response Time
[Chart: response time vs. utilization; the curve starts at the service time and climbs steeply as utilization approaches 1.0]
• R = S / (1 - U)
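A quick sketch of the R = S / (1 - U) curve; the 10ms service time is an arbitrary assumed figure.

```python
service_time_s = 0.010                   # assumed 10 ms service time
for u in (0.10, 0.50, 0.80, 0.90, 0.95):
    r = service_time_s / (1 - u)         # R = S / (1 - U) from the slide
    print(f"U={u:.2f}  R={r * 1000:.0f} ms")
# Response time has doubled by 50% busy and is 10x the service time at 90% busy.
```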
Benefits of Multiple Servers
[Chart: response time vs. utilization for single-CPU, dual-CPU and 16-way configurations; the more CPUs, the higher the utilization before the curve turns sharply upward]
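The slide only shows the shape of the curves; a standard M/M/c (Erlang C) calculation reproduces the effect. This is a textbook formula rather than anything from the presentation, and the 10ms service time is again an assumption.

```python
from math import factorial

def mmc_response_time(service_time, utilization, servers):
    """Mean response time for an M/M/c queue, via the Erlang C formula."""
    c, rho = servers, utilization
    a = c * rho                                                  # offered load in Erlangs
    erlang_b = (a ** c / factorial(c)) / sum(a ** k / factorial(k) for k in range(c + 1))
    p_wait = erlang_b / (1 - rho + rho * erlang_b)               # probability an arrival queues
    mean_wait = p_wait * service_time / (c * (1 - rho))
    return service_time + mean_wait

for c in (1, 2, 16):
    print(c, round(mmc_response_time(0.010, 0.90, c) * 1000, 1), "ms")
# At 90% busy: ~100 ms on 1 CPU, ~53 ms on 2, and much closer to the 10 ms service time on 16.
```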
Why are we interested in this queue stuff again?
• VMs Queue for free CPUs
– Ready Time
– Co-Stop time
– Higher utilisation = higher contention
– More concerned about CPU busy than vCPU to logical CPU ratio
– Because it’s maths, you can model it
Roundup
• Oversubscription does not equal unacceptable performance
• Virtualisation is expecting you to oversubscribe
– It’s the reason it exists
• Take the fear out of oversubscription through proper planning
– Plan for performance, not ratios
Thank You
www.metron-athene.com
Phil.bell@metron-athene.com
Editor's Notes
  1. When I wrote the Title and Synopsis for this presentation I happened to choose the word oversubscription. It’s been pointed out to me that many people refer to “overcommit” rather than “oversubscribe”. Both words appear to be in use to describe this. Google does some nice work here to return results using either option.
  2. In my role with Metron I get to visit lots of different people working in lots of different industries. They are mostly capacity managers working in IT departments though, so there is some common ground. Over the past year I’ve had a number of mildly frustrating conversations with organisations. These tend to be newer and/or don’t have a good history of Capacity Management. The frustration has been around ‘Oversubscription’ in virtualised environments. I’ll start talking about how we can monitor the environment to ensure good performance in the future, when they’ll stop me and say, quite proudly, “Oh, we don’t oversubscribe, we don’t want to impact performance”. The pride is almost the worst part. They’ll beam a smile at the senior staff as if to say “We’ve got this, don’t worry”. The problem being the overspend that’s required to have that attitude. Ultimately there is a fear in these departments that oversubscription = poor performance. It’s considered to be a 1:1 relationship. The reason for that is, to some extent, a misunderstanding of what oversubscription is. It’s got the word ‘over’ in it, so it must be bad. Nothing in our department is ‘over’. We’re all looking at the same word: they see something bad, I see an opportunity to save some money. Correct me if you think I’m wrong, but saving money is typically thought to be a good thing.
  3. Avoiding oversubscription is a bit like navigating by dead reckoning. You know where you started. You know how long you were flying, and your air speed, and in what direction. You’ve even tried to take account of the wind speed and direction, but you are using a forecast for those that was out of date when you did the planning, never mind some hours into the flight. In WW2 a bomb was considered to be on target if it was within 5 miles of the actual target. We only managed that with 1 in 5. Dead reckoning isn’t very accurate on its own. The situation is just more complex than that, and the same remains true for people avoiding oversubscription.
  4. The essence of what these sites seem to be doing is this: We start with a 5 Host cluster that has 120 Logical CPUs, and 180GB RAM. We’re then going to issue no more than 96 vCPUs and 144GB RAM across the VMs. This allows for a host to fail and we can still run everything. We’ll also have great performance because VMs will get a CPU whenever they want it, because it’s theirs, and the same with memory. All the memory a VM wants is real RAM. I’m not going to deny that performance will be about as good as it can be. But it’s not going to be terribly efficient. Chances are you could turn off 2 hosts and still see no impact on performance. Who wouldn’t like to reduce their ESX licence and related power costs by 20%, while still having a spare host?
  5. So what is oversubscription? Well the most obvious example happens in storage. Thin provisioning has been around a long time, and is the same thing, by another name. With storage you have the LUNs that are allocated. Now traditionally these would have been a physical allocation on disk that was available for use. But with Thin Provisioning you can allocate more space to LUNs than you actually have. The reason being, that most disks on servers are not full. So if the average disk is 30% full, you could get away with only having 50% of your allocated storage as real usable space that exists, and you’d still have plenty of space to grow into. On top of that, some storage systems will do their own deduplication. So if you have 200 Windows 2012 servers, all with a C drive that just has the OS on it. That’s about 12 GB per server storing the same base OS files. Or 2.4TB of space. Now those OS disks need space for things like memory dumps, updates and log files etc, which is the unused space. But do you want to spend 2.4TB of storage storing the same 12GB of files 200 times? Probably not. You’d prefer to store a single copy of all the identical files and let them all access that single copy. So you’re not just ignoring some of the unused space, you’re able to store less as well. So, 200, 32 GB drives (Minimum Windows 2012 requirement), would be 6.4TB. It’s now theoretically come down to something like 20GB used space with thin provisioning and deduplication. Oversubscription is a good thing.
  6. In our virtual world, now that we have broken the link between the OS and the hardware, we can over-provision all sorts of things. CPU, Memory, Disk (as we mentioned) and NICs are all “Oversubscribed”. Disk we already looked at; Memory and CPU we’ll go into more detail on later. But I thought it was worth mentioning NICs here. Typically people seem to be running with 10 - 15 VMs on a single host, which will have significantly fewer NICs installed. A server typically wouldn’t use all the bandwidth of its NIC, so that unused bandwidth is like the unused space on disk. When the VMs talk to other VMs on the same host, that’s not generating traffic through the physical NICs, so we might consider that the equivalent of de-duplication.
  7. CPU and Memory are the main items people consider for Virtualised systems. So let’s lay down the maximums for a moment. A maximum of 480 Logical CPUs. Logical CPUs being simultaneous threads so that might be 240 hyper-threaded cores. 1024 VMs max on a host, with a max of 4096 vCPUs between them. Then we get to the maximum with a caveat. 32 vCPUs to a core, but it depends on “the workload and specifics of the hardware”. This raises 2 points. 1)Clearly it’s ok to oversubscribe CPUs and 2)There is no set number to tell you how much oversubscription is OK.
  8. Memory is a lot simpler. 6TB or 12TB in a host depending on hardware. 4TB in any single VM.
  9. Having set out those few ground rules we can talk about memory oversubscription. Just like disks have a lot of free space, servers have typically run with free space in memory. Then there are a number of tricks that the hypervisor can do to find even more savings. Page Sharing (Deduplication) The Balloon Driver Reservations Shares and Swap space
  10. Transparent Page Sharing is where the OS stores a single page of memory and shares it out to multiple VMs that have the same page in their memory. It’s very much the same as deduplication on storage. Only one copy of the data is stored. This just happens in VMware; it’s not something you’d turn off. But it does mean that if you are doing a 1:1 VM MB to host MB memory allocation, and assuming you’re running a lot of the same OS and applications etc., then you are going to see a lot of spare memory on the hosts (if you bother to go and look). The Balloon driver is a nice little device. Essentially it inflates in the memory of a VM, asking the OS for pages of RAM. As those pages are all the same, only a single copy is needed in RAM. Now the reason it inflates is to free up that memory for use by another VM. So consistent levels of Balloon Driver memory are an indicator of memory pressure on the host. At this point you may have taken oversubscription a touch too far. The other thing is that the OS doesn’t tell the Hypervisor when a page of memory has been released by a process. So by inflating the balloon driver and then deflating it, you can get the OS to allocate unused pages to the balloon driver, then if they don’t get overwritten when the balloon driver deflates, you know the processes in the OS don’t need that page of memory and you can use it for something else. Of course if the balloon driver inflates and the OS is forced to start pushing pages out into its swap file, that’s not great. Swapping is generally bad. When the hypervisor has to swap memory out to disk things have got really bad. You do not want to see this.
  11. Transparent Page Sharing When two or more Virtual Machines have the same pages of data in memory, VMware can store a single copy and present it to all the VMs. Should a VM alter a shared memory page, a copy will be created by VMware and presented to that VM. Example VM1 starts and allocates some unique memory. VM2 starts and allocates some unique memory. VM1 allocates memory for a standard windows dll VM2 also allocates memory for the same standard windows dll VMware maps both systems memory to the same page in RAM.
  12. Balloon Driver (vmmemctl) The Problem A process in VM1 is shut down and its memory is freed in the OS. The “hardware” does not know. The data is still there but only the OS inside the VM knows it can overwrite it. The VMware Solution When memory gets tight on an ESX host, the VMkernel will pick a VM (based on shares), and tell the balloon driver to request some memory. The balloon driver requests memory and “pins” it so it cannot be paged. The memory on the ESX host is then freed up and can be allocated to another system.
  13. Memory test If memory must be copied to or from disk because there is more requested than can be satisfied, what’s the penalty for doing this ? A modern disk will respond to an I/O in about 5 milliseconds (5 * 1/1000 of a second). Access to memory is usually in the order of 50 nanoseconds (50 *1/1,000,000,000 of a second). That makes disk access a hundred thousand (100,000) times SLOWER than memory access. Tiny numbers like this are difficult to comprehend, so imagine that the memory access time was 1 second. To write something to disk would then take about 27 ¾ hours to complete. That’s one good reason for avoiding swapping if at all possible!
  14. VMkernel Swap A reservation is typically set against a resource pool and filters down to give a VM rights against memory. Essentially if a reservation has been set and applies to this VM, then the VM is guaranteed that amount of memory will be made available in RAM on the ESX host. You can never reserve more memory than exists. So reservations can ensure good performance for the VMs you care about. You put the VMs in a resource pool, and allocate a reservation that’s appropriate. That might be your 1:1 ratio with allocated and reservation. Then let other less important VMs worry about oversubscription. When an ESX host is very short of memory it may have to resort to using .vswp swap files for the VM memory. At this point performance will be affected as data that the OS believes is in memory is, in reality, now on disk. A VM by default can have up to 65% of its memory used by the balloon driver. It may also have a memory reservation. The reservation cannot be swapped or taken up by the balloon driver. Any memory outside the 65% used by the balloon driver, and the reservation, can be placed into a .vswp file. In reality you never want this to happen.
  15. If we look at some stats for a single VM: this is a 4GB VM, but it’s only accessing about 400MB on a regular basis. It’s got 2.6GB of memory that’s unique to itself, and 1.4GB that’s shared with other VMs. So at least one other VM is likely to be sharing about 1.4GB of memory as well. Given there are a lot of Windows VMs in that cluster, it’s likely a lot of them have similar amounts of shared memory. If there are 10 VMs on that host then that’s about 15GB of RAM that you don’t have to have installed. Or rather, a few more VMs that will fit on the host. There’s also a couple of hours where the balloon driver steals some memory from the VM. Only about 50MB, and given the VM’s only accessing 400 to 500MB of RAM out of the 2.6GB that it’s using, the OS probably just released some cache to satisfy that request.
  16. Reservations are associated with Resource Pools or individual VMs. Essentially you are setting a value for CPU or Memory that the VM is guaranteed to get. If the VM doesn’t use all its reservation, other VMs can make use of the memory and CPU. The fairly obvious caveat is that you cannot have a total list of reservations that is bigger than the hardware. You can use reservations to ensure that important VMs get the resources they want. So you don’t have to worry about avoiding oversubscription for everything. Pick the VMs you want to perform their best and give them a reservation that ensures that. Then your background VMs can be pushed out the way if required.
  17. Like reservations, a VM also has an associated number of shares. The more shares, the more priority it has over the resource if there is contention. If a virtual machine is not actively using its currently allocated memory, ESX Server charges a memory tax — more for idle memory than for memory that is in use. That is, the idle memory counts more towards the share allocation than memory in use. The default tax rate is 75 percent, that is, an idle page of memory costs as much as four active pages. The end result is that VMs holding onto a lot of idle memory, will be more likely to have the balloon driver inflate inside them to try and release some of that idle memory for use by other VMs.
  18. Memory is fairly easy to describe but there are a lot of things going on. CPU Oversubscription and the technologies involved can be a little more complex to visualise, but there are less tools that the hypervisor has to work with. For a start, time is no longer a constant. The hypervisor has the ability to run time at whatever speed it likes. Just so long as it averages out in the end. Co-Scheduling is where we have to have all the vCPUs for a single VM, mapped to logical CPUs from the hardware. Reservations and Shares apply here also and we’ll have more of a look at how they work. Limits (also exist for memory), but these can be applied to restrict some VMs down to a smaller amount of CPU than their vCPU allocation would otherwise allow them to have.
  19. In a typical vmware host we have more vCPUs assigned to VMs than we do physical cores. The processing time of the physical cores (or logical CPUs if hyper threading is in play), has to be shared among the vCPUs in the VMs. The more vCPUs we have, the less time each can be on the core, and therefore the slower time passes for that VM. To keep the VM in time, extra time interrupts are sent in quick succession when the VM is processing. So time passes slowly and then very fast. Significant improvements have been made in this area over the releases of VMware. vCPUs can be scheduled onto the hardware a few milliseconds apart. But the basic concept remains in place.
  20. Here’s an animation to show the effect of what is happening inside the host to schedule the physical CPUs/cores to the vCPUs of the VMs. Clearly most hosts have more than 4 consecutive threads that can be processed. But let’s keep this simple to follow. 1)VMs that are “ready” are moved onto the Threads. 2)There is not enough space for all the vCPUs in all the VMs. So some are left behind. (CPU Utilisation = 75%, capacity used = 100%) 3)If a single vCPU VM finishes processing, the spare Threads can now be used to process a 2 vCPU vm. (CPU Utilisation = 100%) 4)A 4 vCPU VM needs to process. 5)Even if the 2 single vCPU VMs finish processing, the 4 vCPU VM cannot use the CPU available. 6)And while it’s accumulating Ready Time, other single vCPU VMs are able to take advantage of the available Threads 7)Even if we end up in a situation where only a single vCPU is being used, the 4 vCPU VM cannot do any processing. (CPU utilisation = 25%)
  21. As mentioned when we discussed time slicing, improvements have been made in the area of co-scheduling with each release of VMware. Amongst other things the time between individual CPUs being scheduled onto the physical CPUs has increased, allowing for greater flexibility in scheduling VMs with large number of vCPUs. Acceptable performance is seen from larger VMs. Along with Ready Time, there is also a Co-Stop metric. Ready Time can be accumulated against any VM. Co-Stop is specific to VMs with 2 or more vCPUs and relates to the time “stopped” due to Co-Scheduling contention. E.g. One or more vCPUs has been allocated a physical CPU, but we are stopped waiting on other vCPUs to be scheduled. I’d love to do an animation of that but my powerpoint skills would need seriously improving. Imagine the bottom of a “ready” VM displayed, sliding across to a thread and the top sliding across as other VMs move off the Threads. So the VM is no longer rigid it’s more of an elastic band.
  22. Reservations, Shares and Limits. VMs and Resource Pools can be allocated Reservations, Shares and Limits. These apply to the amount of CPU and Memory a VM or Resource pool can use. In the example above we have an Engineering Resource Pool containing 2 Virtual Machines. Test has 1000 CPU shares and Production has 2000 CPU shares, giving a total of 3000 shares between them. If there is contention for CPU resource then Production will be given twice as much CPU time as Test. Also notice the Resource Pool has an Expandable Reservation. This means that if there is another resource pool not using its reservation, Engineering could claim and use that reservation if required. This could cause problems if the 2nd resource pool wishes to use its reservation, as it will not be able to push Engineering out. So while this may provide flexibility, its use should be monitored.
  23. Reservations Here’s a quick demonstration of what a reservation does. When both VMs want the same amount of resource (and have the same shares), they will get an even share of the CPU. Assuming they both want all of the 4000MHz available they will each get 50% of what they want. As the Production workload reduces, Test will take more and more of the CPU however Production will always have the rights to use 250MHz CPU. At the point where Production is using 250MHz CPU Production is in effect getting 100% of the CPU it wants while Test is getting 93.75% of the CPU it wants. Despite having the same shares values.
  24. Reservations and Shares If we run the scenario again but this time include the Shares values for the VMs the situation is different. When they are both trying to use all of the CPU the effect of the shares will come into play and with only 1000 shares Test will get 1333MHz of the 4000MHz available while Production will get 2666MHz. Or Test gets 33% of what it wants to use and Production gets 66% of what it wants to use. As the Production workload decreases this ratio should be maintained until Production gets to it’s reservation. At which point Production is in effect getting 100% of the CPU it wants while Test is getting 93.75%
  25. Expandable Reservation When a VM starts the Reservation set for that VM is taken from the Reservation available within the Resource Pool. The total reservations of the child VMs may not be more than the Reservation for the Resource Pool. However if Expandable Reservation is turned on then a Resource Pool may satisfy it’s Reservation requirements by using the Reservation of another Resource Pool. This however may stop the 2nd Resource Pool from starting VMs as it itself cannot satisfy the Reservation requirements of the VM which wants to start.
  26. Expandable Reservation When a VM starts the Reservation set for that VM is taken from the Reservation available within the Resource Pool. The total reservations of the child VMs may not be more than the Reservation for the Resource Pool. However if Expandable Reservation is turned on then a Resource Pool may satisfy it’s Reservation requirements by using the Reservation of another Resource Pool. This however may stop the 2nd Resource Pool from starting VMs as it itself cannot satisfy the Reservation requirements of the VM which wants to start.
  27. What’s the worst that can happen? Well if you push things too far, all those things that the Hypervisor can do to try and keep things running will eventually be overwhelmed. If you try to use too much memory you’ll start to see ballooning on a consistent basis, then swapping. At that point performance will degrade rapidly. Watch active memory values and take ballooning increasing as the indication things are getting tight. CPU is as always a more gentle decay in performance. CPU also has it’s indicators that the limits are being approached. CPU Ready and Co-Stop are indicators that VMs are finding it tricky to find CPUs when they want to do some processing. The reason CPU degrades differently to Memory is that it’s used differently. A process is in memory all the time, but only uses a CPU when it needs. So CPU busy is dictated by how frequently the CPU is required and for how long. The performance of a transaction will be dictated by the ‘chance’ that a CPU will not be available when the transaction arrives. If all the CPUs are busy it’ll enter a queue. And this is where queueing theory comes in.
  28. Any system has a finite set of resources. If you only have a single user trying to use one workstation then there is no contention for the use of that workstation. As soon as you have more than one user then there is a chance that they will want to use the workstation at the same time. That’s contention. But it’s perfectly normal and happens inside every OS all the time. There are lots more process threads than there are CPUs, and when there is contention, then the processes queue. Poor performance only occurs when queueing becomes excessive.
  29. Queueing theory is pretty simple. You have a ‘server’. Think of this as the CPU or the person sat at the checkout scanning groceries. They work at a constant pace, and are fed with work from a queue. The Queue is filled by transactions or customers. The response time of a transaction (from arriving to leaving), is the sum of the time spent queueing, and being served. Given identical transactions, or customers, we know the service time is a constant. What can change is the Arrival rate, and the time spent in the Queue.
  30. What we have here is a chart showing response time on the Y-Axis and the utilisation of the server on the X-Axis. The reason the chart starts part way up the Y-Axis is the Service Time. That’s static. As the utilisation of the server becomes higher the chance of the server being busy when a new transaction/customer arrives increases, and therefore the longer the transaction/customer will spend in the queue. As we can see, it’s not a straight line. All of this can be plotted using the formula R = S / (1-U). Where S is the service time and U is the Utilisation of the server.
  31. When we add in multiple Servers, the line ends up having a more sudden degradation. This change is sometimes known as “the knee of the curve”. The more servers or CPUs we include the higher the utilisation of them before the knee of the curve is observed. This is because there is more chance that a CPU will be available at the moment a piece of work arrives. Given most of the hosts in a virtualised environment are going to have high numbers of CPUs this means we can run them with pretty high utilisations before queueing takes over. Consider though that a multiple vCPU VM needs multiple logical CPUs on the host available to do anything. This has the effect of reducing the number of ‘servers’ or CPUs in the system. If all your VMs are 4 vCPUs and you have 16 logical CPUs in the host. That’s the equivalent of a 1 vCPU VM on a 4 CPU host. The moral of the story here being “use as few vCPUs as possible in each VM, and you’ll reduce queueing and improve performance.
  32. The reason we were talking about queueing theory is that it’s part of how the hypervisor copes with CPU oversubscription. By queueing the VMs. You can see when this starts to happen by monitoring ready and Co-Stop metrics. You should typically be more worried about CPU busy than you are the ratio of CPUs in the VMs to the logical CPUs presented by the hardware. Because all this is maths, people have written programs to model this stuff. So you can see how busy you can run your hosts before performance becomes unacceptable.
  33. Hopefully, if there was anybody in the room who considered oversubscription to mean poor performance, I’ve gone some way to showing you that’s not the case. Virtualisation platforms are set up for this, it’s part of the very reason they exist in the first place. Don’t throw that away. It’s going to cost you money. Plan for performance. Look at the metrics on your systems and use them to model the point where performance will degrade because of utilisation. You cannot do that by looking at the ratio of vCPUs to logical CPUs. But you can with utilisation figures.