As a consultant for Metron I engage with many Capacity Managers throughout the year. One thing I find is that many clients are running very conservatively on their VMware or Hyper-V virtualized environments. There appears to be an inherent fear within IT departments that “something bad will happen” if Compute resources are oversubscribed. Virtualization was intended to make better use of our hardware. Virtualization does have other benefits, but let’s hold on to that central concept and make good use of what we have.
These slides explore what oversubscription is and why you don’t need to fear it:
• Oversubscription Overview
• CPU Oversubscription
• Memory Oversubscription
• What’s the worst that can happen? (Queueing theory, the simple version)
Commercial in Confidence | www.metron-athene.com
Topics
• What led me here
• Oversubscription Overview
• CPU Oversubscription
• Memory Oversubscription
• What’s the worst that can happen? (Queueing theory, the simple version)
Flying Navigation by Dead Reckoning
• You know where you started
• You know how long you flew for
• You know your air speed
• You know what direction you flew in
• What if the wind changed in the last 8 hours?
• WW2 bombing saw 1 in 5 bomb loads within 5
miles of the target.
What can be oversubscribed?
• CPUs
• Memory
• Disk
• NICs
– Nobody ever seems to think about that one
– VMs on a single host = no NIC involved
– Otherwise…
CPU VMware Maximums
• Virtual Machine Maximum
– 128 vCPUs per VM
• Host CPU maximums
– Logical CPUs per host 480
– Virtual machines per host 1024
– Virtual CPUs per host 4096
– Virtual CPUs per core 32
• The achievable number of vCPUs per core depends on the workload and specifics of the hardware. For more information, see the latest version of Performance Best Practices for VMware vSphere: https://www.vmware.com/pdf/vsphere6/r60/vsphere-60-configuration-maximums.pdf
Memory
• Transparent Page Sharing
– Deduplication in memory
• Balloon Driver
– Vmmemctl process “steals” memory inside the VM allowing that
memory to be used by other VMs. This may cause the OS to page.
• VMkernel Swap
– VM thinks pages are in memory. ESX has put that memory on disk
in a Vmkernel Swap file.
– “Performance is NOT optimal”
Memory test
• Memory vs. disk speed is…?
– A) Memory is 100x faster than disk
– B) Memory is 1,000x faster than disk
– C) Memory is 10,000x faster than disk
– D) Memory is 100,000x faster than disk
– E) Memory is 1,000,000x faster than disk
– F) I have no memory of the event, your honour
VMkernel Swap
[Chart: VM memory at maximum contention, 0% to 100%, split into Balloon, Swap File and Reservation MB]
Example:
• Assume maximum memory contention
• By default, up to 65% can be taken by the Balloon driver
• Example Reservation is 30%
• 5% in the VMkernel swap (.vswp) file
Reservations
• Resource Pools or VMs
• If they want it, they get it
• If they don’t want it, it’s available to all
• Cannot reserve more than exists
• Oversubscribe
– Protect core VMs with a reservation
Memory Idle Tax
• Memory has Shares
• Memory Tax associates a value to each page used
• Default Idle Tax rate is 75%
• This makes idle memory cost 4 times as many shares as active memory
Time Slicing
• Cores are shared between vCPUs in time slices
– 1 vCPU to 1 core at any point in time
• More vCPUs = More time slicing
• Processes do this on CPUs all the time
– So why is it so scary?
– Over 100 processes on my laptop share 4 CPUs
[Animation: VM1 alternating between Running and Dormant/Idle as time slices are shared]
Reservations
[Chart: CPU used by the Production VM (with a reservation) vs CPU used by the Test VM, from 0% to 100% of the CPU available]
1) The Production VM wants to use all the CPU available.
2) The Test VM starts and also wants to use all the CPU available.
3) Each uses 50% CPU.
4) The Production VM wants 250MHz CPU while Test wants to use 4000MHz CPU. Production gets 100% of its request. Test does not.
Reservations & Shares
[Chart: CPU used by the Production VM (2000 shares, with a reservation) vs CPU used by the Test VM (1000 shares)]
1) The Production VM (2000 shares) wants to use all the CPU available.
2) The Test VM (1000 shares) also wants to use all the CPU available.
3) Production gets 66% CPU, Test gets 33% CPU.
4) The Production VM wants 250MHz CPU while Test could still use 4000MHz CPU. Production gets 100% of its request. Test does not.
Contention and Queuing
• Finite system resources
• Single workstation = no contention (usually)
• More than One User = Possible Contention
• Contention = Queuing
– This is COMPLETELY NORMAL
– It’s how operating systems work.
• Excessive Queuing = Poor Performance and Long Response Times
Why are we interested in this queue stuff again?
• VMs Queue for free CPUs
– Ready Time
– Co-Stop time
– Higher utilisation = higher contention
– More concerned about CPU busy than vCPU to logical CPU ratio
– Because it’s maths, you can model it
Roundup
• Oversubscription does not equal unacceptable performance
• Virtualisation is expecting you to oversubscribe
– It’s the reason it exists
• Take the fear out of oversubscription through proper planning
– Plan for performance, not ratios
When I wrote the Title and Synopsis for this presentation I happened to choose the word oversubscription. It’s been pointed out to me that many people refer to “overcommit” rather than “oversubscribe”.
Both words appear to be in use to describe this. Google does some nice work here to return results using either option.
In my role with Metron I get to visit lots of different people working in lots of different industries. They are mostly capacity managers working in IT departments though, so there is some common ground.
Over the past year I’ve had a number of mildly frustrating conversations with organisations. These tend to be newer organisations and/or ones without a good history of Capacity Management. The frustration has been around ‘Oversubscription’ in virtualised environments. I’ll start talking about how we can monitor the environment to ensure good performance in the future, when they’ll stop me and say, quite proudly, “Oh, we don’t oversubscribe, we don’t want to impact performance”. The pride is almost the worst part. They’ll beam a smile at the senior staff as if to say “We’ve got this, don’t worry”. The problem being the overspend that’s required to have that attitude.
Ultimately there is a fear in these departments that oversubscription = poor performance. It’s considered to be a 1:1 relationship. The reason for that is, to some extent, a misunderstanding of what oversubscription is. It’s got the word ‘over’ in it, so it must be bad. Nothing in our department is ‘over’. We’re all looking at the same word: they see something bad, I see an opportunity to save some money. Correct me if you think I’m wrong, but saving money is typically thought to be a good thing.
Avoiding oversubscription is a bit like navigating by dead reckoning.
You know where you started.
You know how long you were flying, your air speed, and in what direction. You’ve even tried to take account of the wind speed and direction, but you are using a forecast that was already out of date when you did the planning, never mind some hours into the flight.
In WW2 a bomb was considered to be on target if it was within 5 miles of the actual target. We only managed that 1 time in 5. Dead reckoning isn’t very accurate on its own. The situation is just more complex than that, and the same remains true for people avoiding oversubscription.
The essence of what these sites seem to be doing is this:
We start with a 5 Host cluster that has 120 Logical CPUs and 180GB RAM. We’re then going to issue no more than 96 vCPUs and 144GB RAM across the VMs. This allows for a host to fail and we can still run everything. We’ll also have great performance because VMs will get a CPU whenever they want it, because it’s theirs, and the same with memory. All the memory a VM wants is real RAM.
I’m not going to deny that performance will be about as good as it can be. But it’s not going to be terribly efficient. Chances are you could turn off 2 hosts and still see no impact in performance. Who wouldn't like to reduce their ESX licence, and related power costs by 20%, while still having a spare host?
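The sizing rule those sites are following can be written down as a quick sanity check. This is a sketch using the illustrative cluster figures from the text, not a tool:

```python
# The "no oversubscription" sizing rule: a 5-host cluster, sized so that
# everything still fits if one host fails (N-1 sizing).
hosts = 5
logical_cpus = 120   # across the cluster
ram_gb = 180

# Never allocate more than (N-1) hosts' worth of CPU or memory.
vcpu_cap = logical_cpus // hosts * (hosts - 1)
ram_cap_gb = ram_gb // hosts * (hosts - 1)

print(vcpu_cap, ram_cap_gb)  # 96 144
```

The caps match the 96 vCPU / 144GB figures above, and show why the rule leaves a fifth of the hardware permanently idle even before any allocated-but-unused headroom is counted.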
So what is oversubscription? Well the most obvious example happens in storage. Thin provisioning has been around a long time, and is the same thing, by another name.
With storage you have the LUNs that are allocated. Now traditionally these would have been a physical allocation on disk that was available for use. But with Thin Provisioning you can allocate more space to LUNs than you actually have. The reason being, that most disks on servers are not full. So if the average disk is 30% full, you could get away with only having 50% of your allocated storage as real usable space that exists, and you’d still have plenty of space to grow into.
On top of that, some storage systems will do their own deduplication. So if you have 200 Windows 2012 servers, all with a C drive that just has the OS on it. That’s about 12 GB per server storing the same base OS files. Or 2.4TB of space. Now those OS disks need space for things like memory dumps, updates and log files etc, which is the unused space. But do you want to spend 2.4TB of storage storing the same 12GB of files 200 times? Probably not. You’d prefer to store a single copy of all the identical files and let them all access that single copy. So you’re not just ignoring some of the unused space, you’re able to store less as well.
So 200 × 32 GB drives (the minimum Windows 2012 requirement) would be 6.4TB. Theoretically that comes down to something like 20GB of used space with thin provisioning and deduplication.
Oversubscription is a good thing.
In our virtual world, now that we have broken the link between the OS and the hardware, we can over provision all sorts of things.
CPU, Memory, Disk (as we mentioned) and NICs are all “Oversubscribed”.
Disk we already looked at, Memory and CPU we’ll go into more detail on later. But I thought it was worth mentioning NICs here. Typically people seem to be running with 10 - 15 VMs on a single host. Which will have significantly fewer NICs installed.
A server typically wouldn’t use all the bandwidth of its NIC. So that unused bandwidth is like the unused space on disk.
When the VMs talk to other VMs on the same Host, that’s not generating traffic through the physical NICs, so we might consider that the equivalent of deduplication.
CPU and Memory are the main items people consider for Virtualised systems. So let’s lay down the maximums for a moment.
A maximum of 480 Logical CPUs. Logical CPUs being simultaneous threads so that might be 240 hyper-threaded cores.
1024 VMs max on a host, with a max of 4096 vCPUs between them.
Then we get to the maximum with a caveat.
32 vCPUs to a core, but it depends on “the workload and specifics of the hardware”. This raises 2 points: 1) clearly it’s OK to oversubscribe CPUs, and 2) there is no set number to tell you how much oversubscription is OK.
Memory is a lot simpler.
6TB or 12TB in a host depending on hardware.
4TB in any single VM.
Having set out those few ground rules we can talk about memory oversubscription.
Just like disks have a lot of free space, servers have typically run with free space in memory. Then there are a number of tricks that the hypervisor can do to find even more savings.
Page Sharing (Deduplication)
The Balloon Driver
Reservations
Shares
and
Swap space
Transparent Page Sharing is where the hypervisor stores a single page of memory and shares it out to multiple VMs that have the same page in their memory. It’s very much the same as deduplication on storage: only one copy of the data is stored. This just happens in VMware; it’s not something you’d turn off. But it does mean that if you are doing a 1:1 VM MB to Host MB memory allocation, and assuming you’re running a lot of the same OS and applications, then you are going to see a lot of spare memory on the hosts (if you bother to go and look).
The Balloon driver is a nice little device. Essentially it inflates in the memory of a VM, asking the OS for pages of RAM. As those pages are all the same, only a single copy is needed in RAM. The reason it inflates is to free up that memory for use by another VM. So consistent levels of Balloon Driver memory are an indicator of memory pressure on the host; at that point you may have taken oversubscription a touch too far. The other thing is that the OS doesn’t tell the hypervisor when a page of memory has been released by a process. So by inflating the balloon driver and then deflating it, you can get the OS to allocate unused pages to the balloon driver; if they don’t get overwritten after the balloon driver deflates, you know the processes in the OS don’t need those pages and you can use them for something else. Of course, if the balloon driver inflates and the OS is forced to start pushing pages out into its swap file, that’s not great.
Swapping is generally bad. When the hypervisor has to swap memory out to disk things have got really bad. You do not want to see this.
Transparent Page Sharing
When two or more Virtual Machines have the same pages of data in memory, VMware can store a single copy and present it to all the VMs. Should a VM alter a shared memory page, a copy will be created by VMware and presented to that VM.
Example
VM1 starts and allocates some unique memory.
VM2 starts and allocates some unique memory.
VM1 allocates memory for a standard windows dll
VM2 also allocates memory for the same standard windows dll
VMware maps both systems memory to the same page in RAM.
Balloon Driver (vmmemctl)
The Problem
A process in VM1 is shut down and its memory is freed in the OS.
The “hardware” does not know. The data is still there but only the OS inside the VM knows it can overwrite it.
The VMware Solution
When memory gets tight on an ESX host, the Vmkernel will pick a VM (based on shares), and tell the balloon driver to request some memory.
The balloon driver requests memory and “pins” it so it cannot be paged. The Memory on the ESX is then freed up and can be allocated to another system.
Memory test
If memory must be copied to or from disk because there is more requested than can be satisfied, what’s the penalty for doing this?
A modern disk will respond to an I/O in about 5 milliseconds (5 * 1/1000 of a second). Access to memory is usually in the order of 50 nanoseconds (50 *1/1,000,000,000 of a second).
That makes disk access a hundred thousand (100,000) times SLOWER than memory access. Tiny numbers like this are difficult to comprehend, so imagine that the memory access time was 1 second. To write something to disk would then take about 27 ¾ hours to complete.
That’s one good reason for avoiding swapping if at all possible!
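The penalty above is easy to verify with the two latency figures given in the text:

```python
# Putting numbers on the memory-vs-disk penalty described above.
disk_io_s = 5e-3          # ~5 ms per disk I/O
memory_access_s = 50e-9   # ~50 ns per memory access

ratio = disk_io_s / memory_access_s
print(ratio)  # 100000.0 -> disk is ~100,000x slower than memory

# Rescale: if a memory access took 1 second, a disk I/O would take...
hours = ratio / 3600
print(round(hours, 2))  # 27.78 hours
```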
VMkernel Swap
A reservation is typically set against a resource pool and filters down to give a VM rights to memory. Essentially, if a reservation has been set and applies to this VM, then the VM is guaranteed that amount of memory will be made available in RAM on the ESX host. You can never reserve more memory than exists. So reservations can ensure good performance for the VMs you care about: you put the VMs in a resource pool and allocate a reservation that’s appropriate. That might be your 1:1 ratio between allocation and reservation. Then let other, less important VMs worry about oversubscription.
When an ESX host is very short of memory it may have to resort to using .vswp swap files for the VM memory. At this point performance will be affected as data that the OS believes is in memory is, in reality, now on disk.
A VM as default can have up to 65% of its memory used by the balloon driver. It may also have a memory reservation. The reservation cannot be swapped or taken up by the balloon driver. Any memory outside the 65% used by the balloon driver, and the reservation, can be placed into a .vswp file.
In reality you never want this to happen.
If we look at some stats for a single VM.
This is a 4GB VM, but it’s only accessing about 400 MB on a regular basis. It’s got 2.6GB of memory that’s unique to itself, and 1.4GB that’s shared with other VMs.
So at least one other VM is likely to be sharing about 1.4GB of memory as well. Given there are a lot of Windows VMs in that cluster, it’s likely a lot of them have similar amounts of shared memory. If there are 10 VMs on that host then that’s about 15GB of RAM that you don’t have to have installed. Or rather, a few more VMs that will fit on the host.
There’s also a couple of hours where the balloon driver steals some memory from the VM. It’s only about 50MB, and given the VM is only accessing 400 to 500MB of the 2.6GB that it’s using, the OS probably just released some cache to satisfy that request.
Reservations are associated with Resource Pools or individual VMs. Essentially you are setting a value for CPU or Memory that the VM is guaranteed to get. If the VM doesn’t use all its reservation, other VMs can make use of the Memory and CPU.
The fairly obvious caveat is that you cannot have a total list of reservations that are bigger than the hardware.
You can use reservations to ensure that important VMs get the resources they want. So you don’t have to worry about avoiding oversubscription for everything. Pick the VMs you want to perform their best and give them a reservation that ensures that. Then your background VMs can be pushed out the way if required.
Like reservations, a VM also has an associated number of shares. The more shares, the more priority it has over the resource if there is contention.
If a virtual machine is not actively using its currently allocated memory, ESX Server charges a memory tax — more for idle memory than for memory that is in use. That is, the idle memory counts more towards the share allocation than memory in use. The default tax rate is 75 percent, that is, an idle page of memory costs as much as four active pages.
The end result is that VMs holding onto a lot of idle memory, will be more likely to have the balloon driver inflate inside them to try and release some of that idle memory for use by other VMs.
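The idle tax arithmetic above can be sketched as a toy calculation. This is an illustration of the 4x charging described in the text, not the exact ESX internal formula:

```python
# With the default 75% idle tax rate, an idle page is charged at
# 1 / (1 - 0.75) = 4x the shares cost of an active page.
def memory_charge(active_mb, idle_mb, idle_tax=0.75):
    """Relative shares 'charge' for a VM's memory (illustrative only)."""
    idle_cost = 1.0 / (1.0 - idle_tax)  # 4x at the default rate
    return active_mb + idle_mb * idle_cost

# Two VMs each holding 1024 MB: the mostly-idle one is charged far more,
# so it is the likelier target for the balloon driver.
print(memory_charge(1024, 0))    # 1024.0 -- all active
print(memory_charge(256, 768))   # 3328.0 -- 256 + 768 * 4
```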
Memory is fairly easy to describe but there are a lot of things going on. CPU Oversubscription and the technologies involved can be a little more complex to visualise, but there are fewer tools that the hypervisor has to work with.
For a start, time is no longer a constant. The hypervisor has the ability to run time at whatever speed it likes. Just so long as it averages out in the end.
Co-Scheduling is where we have to have all the vCPUs for a single VM, mapped to logical CPUs from the hardware.
Reservations and Shares apply here also and we’ll have more of a look at how they work.
Limits (also exist for memory), but these can be applied to restrict some VMs down to a smaller amount of CPU than their vCPU allocation would otherwise allow them to have.
In a typical VMware host we have more vCPUs assigned to VMs than we do physical cores. The processing time of the physical cores (or logical CPUs if hyper-threading is in play) has to be shared among the vCPUs in the VMs. The more vCPUs we have, the less time each can be on the core, and therefore the slower time passes for that VM. To keep the VM in time, extra timer interrupts are sent in quick succession when the VM is processing. So time passes slowly and then very fast.
Significant improvements have been made in this area over the releases of VMware. vCPUs can be scheduled onto the hardware a few milliseconds apart. But the basic concept remains in place.
Here’s an animation to show the effect of what is happening inside the host to schedule the physical CPUs/cores to the vCPUs of the VMs. Clearly most hosts have more than 4 consecutive threads that can be processed. But let’s keep this simple to follow.
1)VMs that are “ready” are moved onto the Threads.
2)There is not enough space for all the vCPUs in all the VMs. So some are left behind. (CPU Utilisation = 75%, capacity used = 100%)
3)If a single vCPU VM finishes processing, the spare Threads can now be used to process a 2 vCPU vm. (CPU Utilisation = 100%)
4)A 4 vCPU VM needs to process.
5)Even if the 2 single vCPU VMs finish processing, the 4 vCPU VM cannot use the CPU available.
6)And while it’s accumulating Ready Time, other single vCPU VMs are able to take advantage of the available Threads
7)Even if we end up in a situation where only a single vCPU is being used, the 4 vCPU VM cannot do any processing. (CPU utilisation = 25%)
As mentioned when we discussed time slicing, improvements have been made in the area of co-scheduling with each release of VMware. Amongst other things, the allowable time between individual vCPUs being scheduled onto the physical CPUs has increased, allowing for greater flexibility in scheduling VMs with large numbers of vCPUs. Acceptable performance is seen from larger VMs.
Along with Ready Time, there is also a Co-Stop metric. Ready Time can be accumulated against any VM. Co-Stop is specific to VMs with 2 or more vCPUs and relates to the time “stopped” due to Co-Scheduling contention. E.g. One or more vCPUs has been allocated a physical CPU, but we are stopped waiting on other vCPUs to be scheduled.
I’d love to do an animation of that but my PowerPoint skills would need serious improvement. Imagine the bottom of a “ready” VM displayed, sliding across to a thread, and the top sliding across as other VMs move off the Threads. So the VM is no longer rigid; it’s more of an elastic band.
Reservations, Shares and Limits.
VMs and Resource Pools can be allocated Reservations, Shares and Limits.
These apply to the amount of CPU and Memory a VM or Resource pool can use.
In the example above we have an Engineering Resource Pool containing 2 Virtual Machines.
Test has 1000 CPU shares and Production has 2000 CPU shares. Giving a total of 3000 shares between them. If there is contention for CPU resource then Production will be given twice as much CPU time as Test.
Also notice the Resource Pool has an Expandable Reservation. This means that if there is another resource pool not using its reservation, Engineering could claim and use that reservation if required. This could cause problems if the 2nd resource pool wishes to use its reservation, as it will not be able to push Engineering out. So while this may provide flexibility, its use should be monitored.
Reservations
Here’s a quick demonstration of what a reservation does.
When both VMs want the same amount of resource (and have the same shares), they will get an even share of the CPU. Assuming they both want all of the 4000MHz available they will each get 50% of what they want.
As the Production workload reduces, Test will take more and more of the CPU however Production will always have the rights to use 250MHz CPU.
At the point where Production is using 250MHz CPU, Production is in effect getting 100% of the CPU it wants while Test is getting 93.75% of the CPU it wants, despite having the same shares values.
Reservations and Shares
If we run the scenario again but this time include the Shares values for the VMs the situation is different.
When they are both trying to use all of the CPU the effect of the shares will come into play and with only 1000 shares Test will get 1333MHz of the 4000MHz available while Production will get 2666MHz. Or Test gets 33% of what it wants to use and Production gets 66% of what it wants to use.
As the Production workload decreases this ratio should be maintained until Production gets down to its reservation, at which point Production is in effect getting 100% of the CPU it wants while Test is getting 93.75%.
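The two scenarios above can be reproduced with a simplified, work-conserving proportional-share split. This is a sketch of the behaviour described in the text; real ESX scheduling is dynamic and more involved:

```python
# Share-proportional CPU split, capped at each VM's demand; capacity a VM
# doesn't want is re-offered to VMs that still want more (work-conserving).
def split_by_shares(capacity_mhz, vms):
    """vms: {name: (shares, demand_mhz)} -> {name: allocated_mhz}."""
    alloc = {name: 0.0 for name in vms}
    wanting = dict(vms)
    while capacity_mhz > 1e-9 and wanting:
        total_shares = sum(s for s, _ in wanting.values())
        spent, still_wanting = 0.0, {}
        for name, (shares, demand) in wanting.items():
            grant = min(demand - alloc[name],
                        capacity_mhz * shares / total_shares)
            alloc[name] += grant
            spent += grant
            if demand - alloc[name] > 1e-9:
                still_wanting[name] = (shares, demand)
        capacity_mhz -= spent
        wanting = still_wanting
        if spent < 1e-9:
            break
    return alloc

# Both want everything: 2000 vs 1000 shares -> 2/3 and 1/3 of 4000MHz.
print(split_by_shares(4000, {"prod": (2000, 4000), "test": (1000, 4000)}))

# Production only wants its 250MHz: it gets 100% of its ask,
# Test picks up the remaining 3750MHz (93.75% of what it wants).
print(split_by_shares(4000, {"prod": (2000, 250), "test": (1000, 4000)}))
```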
Expandable Reservation
When a VM starts the Reservation set for that VM is taken from the Reservation available within the Resource Pool. The total reservations of the child VMs may not be more than the Reservation for the Resource Pool.
However, if Expandable Reservation is turned on, then a Resource Pool may satisfy its Reservation requirements by using the Reservation of another Resource Pool. This may stop the 2nd Resource Pool from starting VMs, as it can no longer satisfy the Reservation requirements of the VM which wants to start.
What’s the worst that can happen?
Well if you push things too far, all those things that the Hypervisor can do to try and keep things running will eventually be overwhelmed.
If you try to use too much memory you’ll start to see ballooning on a consistent basis, then swapping. At that point performance will degrade rapidly. Watch active memory values and take ballooning increasing as the indication things are getting tight.
CPU is, as always, a more gentle decay in performance. CPU also has its indicators that the limits are being approached: CPU Ready and Co-Stop are signs that VMs are finding it tricky to find CPUs when they want to do some processing.
The reason CPU degrades differently to Memory is that it’s used differently. A process is in memory all the time, but only uses a CPU when it needs to. So CPU busy is dictated by how frequently the CPU is required and for how long. The performance of a transaction will be dictated by the ‘chance’ that a CPU will not be available when the transaction arrives. If all the CPUs are busy it’ll enter a queue. And this is where queueing theory comes in.
Any system has a finite set of resources. If you only have a single user trying to use one workstation then there is no contention for the use of that workstation. As soon as you have more than one user then there is a chance that they will want to use the workstation at the same time. That’s contention. But it’s perfectly normal and happens inside every OS all the time. There are lots more process threads than there are CPUs, and when there is contention, then the processes queue. Poor performance only occurs when queueing becomes excessive.
Queueing theory is pretty simple.
You have a ‘server’. Think of this as the CPU or the person sat at the checkout scanning groceries. They work at a constant pace, and are fed with work from a queue. The Queue is filled by transactions or customers. The response time of a transaction (from arriving to leaving), is the sum of the time spent queueing, and being served. Given identical transactions, or customers, we know the service time is a constant. What can change is the Arrival rate, and the time spent in the Queue.
What we have here is a chart showing response time on the Y-Axis and the utilisation of the server on the X-Axis.
The reason the chart starts part way up the Y-Axis is the Service Time. That’s static. As the utilisation of the server becomes higher the chance of the server being busy when a new transaction/customer arrives increases, and therefore the longer the transaction/customer will spend in the queue. As we can see, it’s not a straight line.
All of this can be plotted using the formula R = S / (1-U). Where S is the service time and U is the Utilisation of the server.
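That formula is easy to play with directly, and tabulating a few points shows why the curve bends upwards:

```python
# The single-server response time formula from the text: R = S / (1 - U),
# where S is service time and U is utilisation.
def response_time(service_time, utilisation):
    assert 0 <= utilisation < 1, "at 100% busy the queue grows without bound"
    return service_time / (1.0 - utilisation)

# A 10ms transaction at increasing server utilisation:
for u in (0.0, 0.5, 0.8, 0.9, 0.95):
    print(f"{u:.0%} busy -> {response_time(0.010, u) * 1000:.0f} ms")
# 10, 20, 50, 100, 200 ms -- clearly not a straight line.
```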
When we add in multiple Servers, the line ends up having a more sudden degradation. This change is sometimes known as “the knee of the curve”. The more servers or CPUs we include the higher the utilisation of them before the knee of the curve is observed. This is because there is more chance that a CPU will be available at the moment a piece of work arrives.
Given most of the hosts in a virtualised environment are going to have high numbers of CPUs this means we can run them with pretty high utilisations before queueing takes over.
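The multi-server effect can be sketched with the standard M/M/c Erlang-C formula. This is a textbook queueing model used here for illustration, not a description of the ESX scheduler itself:

```python
import math

# M/M/c response time: service time plus expected queueing delay, where
# Erlang-C gives the probability that a new arrival has to queue.
def mmc_response_time(service_time, utilisation, servers):
    a = utilisation * servers                      # offered load (Erlangs)
    top = a ** servers / math.factorial(servers)
    below = sum(a ** k / math.factorial(k) for k in range(servers))
    p_wait = top / ((1 - utilisation) * below + top)
    wait = p_wait * service_time / (servers * (1 - utilisation))
    return service_time + wait

# Same 80% utilisation, same 10ms transactions: more CPUs, less queueing,
# so the "knee of the curve" sits at a higher utilisation.
for c in (1, 4, 16):
    print(f"{c:>2} CPUs at 80% busy -> "
          f"{mmc_response_time(0.010, 0.80, c) * 1000:.1f} ms")
```

With one server this reduces exactly to R = S / (1 - U); with 16 servers the response time at 80% busy stays close to the raw service time, which is the point being made above.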
Consider though that a multiple vCPU VM needs multiple logical CPUs on the host available to do anything.
This has the effect of reducing the number of ‘servers’ or CPUs in the system. If all your VMs have 4 vCPUs and you have 16 logical CPUs in the host, that’s the equivalent of a 1 vCPU VM on a 4 CPU host. The moral of the story here being “use as few vCPUs as possible in each VM, and you’ll reduce queueing and improve performance”.
The reason we were talking about queueing theory is that it’s part of how the hypervisor copes with CPU oversubscription. By queueing the VMs.
You can see when this starts to happen by monitoring Ready and Co-Stop metrics. You should typically be more worried about CPU busy than the ratio of vCPUs in the VMs to the logical CPUs presented by the hardware.
Because all this is maths, people have written programs to model this stuff. So you can see how busy you can run your hosts before performance becomes unacceptable.
Hopefully, if there was anybody in the room who considered oversubscription to mean poor performance, I’ve gone some way to showing you that’s not the case.
Virtualisation platforms are set up for this, it’s part of the very reason they exist in the first place. Don’t throw that away. It’s going to cost you money.
Plan for performance. Look at the metrics on your systems and use them to model the point where performance will degrade because of utilisation. You cannot do that by looking at the ratio of vCPUs to logical CPUs. But you can with utilisation figures.