The problem Storage I/O Control addresses is the situation where less important workloads take the majority of I/O bandwidth away from more important applications. In the case of the three applications shown here, the data mining workload is hogging the majority of the storage I/O resource, and the two applications more important to business operations are getting less performance than they need. <Click> What one wants to see is a distribution of I/O aligned with the importance of each virtual machine, where the business-critical applications get the I/O bandwidth they need to stay responsive and the less critical data mining application takes less.
I/O shares can be set at the virtual machine level, and although this capability has existed for a few previous releases, it was not enforced cluster-wide until release 4.1. Prior to 4.1, I/O shares and limits were enforced only among the virtual disks of a single VM, or among the VMs running on a single ESX server. <click> With 4.1, these I/O shares are used to distribute I/O bandwidth across all the ESX servers that have access to the shared datastore.
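The difference between host-local and datastore-wide enforcement can be sketched with a small calculation. The VM names, placement, and share values below are illustrative assumptions, not VMware's implementation; the point is how a VM's effective fraction of I/O changes when shares are normalized across every host sharing the datastore rather than within one ESX server:

```python
# Hypothetical layout: vm -> (host, shares). Values are illustrative.
vms = {
    "online_store": ("esx1", 1000),
    "data_mining":  ("esx1", 250),
    "exchange":     ("esx2", 1000),
}

def fraction_host_local(vm):
    """Pre-4.1 behavior: a VM competes only with VMs on its own host."""
    host, shares = vms[vm]
    peers = sum(s for h, s in vms.values() if h == host)
    return shares / peers

def fraction_cluster_wide(vm):
    """4.1 behavior: shares are weighed across all hosts on the datastore."""
    total = sum(s for _, s in vms.values())
    return vms[vm][1] / total

# Host-local, "exchange" sees no competition on esx2 at all:
print(fraction_host_local("exchange"))      # 1.0
# Datastore-wide, every VM's share is weighed against all the others:
print(round(fraction_cluster_wide("exchange"), 3))
```

Note how, with host-local enforcement, a VM alone on its host receives the full queue regardless of how starved the VMs on other hosts are; cluster-wide enforcement removes that blind spot.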
Shares for I/O are set via Edit Properties on the virtual machine. This screen shows two virtual disks and the ability to set the share priority and a limit on I/Os per second for each.
Once the shares are set on the virtual machines in a VMware cluster, one also needs to enable the “Storage I/O Control” option on the properties screen for each datastore on which Storage I/O Control should operate. The other requirement for Storage I/O Control to kick in is congestion: latency must be sustained on the datastore for a period of time before the throttling engages. The example that comes to mind is a car pool lane, which is typically not enforced when there is little traffic on the highway; it would be of limited value if you could travel at the same speed in the regular lanes as in the car pool lane. In much the same way, Storage I/O Control is not put into action while latency stays below a sustained value of 30 msec.
One can then observe which VMs have which shares and limits set via the Virtual Machines tab for the datastore. As datastores are now objects managed by vCenter, there are several new views in vSphere that let you see which ESX servers are connected to a datastore and which VMs are sharing it. Many of these views also allow one to customize which columns are displayed and to create specific views to report on usage.
The way these I/O shares affect performance is through device queue depth: the queue depth for each ESX server is throttled so that the queue slots available align with the shares assigned to each VM running against the collective pool of storage. In the case of our three VMs shown earlier, the data mining VM gets the fewest queue slots while the other two VMs get many more slots for their I/O.
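The proportional allocation described above can be sketched as follows. This is a simplified illustration, not VMware's actual algorithm, and the VM names, share values, and queue depth are assumptions:

```python
def allocate_queue_slots(shares, total_queue_depth):
    """Split a device queue depth among VMs in proportion to their I/O shares.

    Each VM gets at least one slot so it is never starved entirely.
    """
    total_shares = sum(shares.values())
    return {
        vm: max(1, round(total_queue_depth * s / total_shares))
        for vm, s in shares.items()
    }

# Illustrative shares: two important VMs at 1000, data mining at 250.
shares = {"online_store": 1000, "exchange": 1000, "data_mining": 250}
print(allocate_queue_slots(shares, total_queue_depth=64))
# -> {'online_store': 28, 'exchange': 28, 'data_mining': 7}
```

The data mining VM ends up with a handful of queue slots while the business-critical VMs get the bulk of them, which is exactly the distribution the earlier slide called for.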
It is important to understand that SIOC does not kick in until congestion on the datastore exceeds the 30 ms threshold for a period of time. A weighted average is used to determine that the latency is sustained, not just a minor spike that comes and goes quickly. This threshold value can be modified, but only with great care and consideration: if it is too low, throttling may toggle on and off frequently; if it is too high, it may never kick in at all.
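One simple way to smooth out spikes like this is an exponentially weighted moving average; the sketch below is illustrative only (the smoothing factor is an assumption, not VMware's published parameter), but it shows why a brief spike does not trip the threshold while sustained latency does:

```python
def congested(latency_samples_ms, threshold_ms=30.0, alpha=0.2):
    """Return True if the smoothed latency ends up above the threshold.

    alpha controls how much weight the newest sample gets; a small alpha
    means a single spike is largely averaged away.
    """
    ema = latency_samples_ms[0]
    for sample in latency_samples_ms[1:]:
        ema = alpha * sample + (1 - alpha) * ema
    return ema > threshold_ms

# One 80 ms spike amid ~10 ms samples does not trigger throttling:
print(congested([10, 12, 80, 11, 10, 9]))     # False
# Sustained ~40 ms latency does:
print(congested([40, 42, 41, 43, 40, 44]))    # True
```

This mirrors the car pool analogy: enforcement turns on only once congestion is genuinely sustained, not at the first momentary slowdown.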