Problems that arise when software is run at a privilege level other than the one for which it was written.
Example: the CS register, which points to the code segment. If the PUSH instruction is executed with the CS register, the contents of that register (which include the current privilege level) are pushed on the stack. A guest OS could therefore easily determine that it is not running at privilege level 0.
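The check described above can be sketched in a few lines of C. This is only an illustration: the function names are hypothetical, and the selector value is passed in as a plain integer rather than read from the CPU. It relies on the fact that the low two bits of the selector held in CS give the current privilege level.

```c
#include <stdint.h>

/* The low two bits of a segment selector hold a privilege level.
 * For the selector currently in CS, they equal the current
 * privilege level (CPL). */
static inline unsigned selector_privilege(uint16_t selector)
{
    return selector & 0x3u;
}

/* Hypothetical guest-side check: a kernel written for ring 0 that
 * sees a non-zero level in its own CS knows it was deprivileged. */
static inline int runs_deprivileged(uint16_t cs_selector)
{
    return selector_privilege(cs_selector) != 0;
}
```

A guest kernel that pushed CS and passed the saved value to `runs_deprivileged` would detect ring deprivileging without causing any fault the VMM could intercept.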
OSs expect to have access to the processor’s full virtual address space (in IA-32, the linear address space).
The VMM could run entirely within the guest’s virtual address space, but the VMM’s instructions and data structures would then use a substantial amount of the guest’s virtual address space.
The VMM could run in a separate address space, but it must still use a minimal amount of the guest’s virtual address space for the control structures that manage transitions between guest software and the VMM (the IDT and GDT for IA-32).
The VMM must prevent guest access to those portions of the guest’s virtual address space that the VMM is using. Otherwise the VMM’s integrity could be compromised.
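As a rough sketch of the protection check implied above, the following C fragment tests whether a guest access overlaps the part of the linear address space that the VMM keeps for itself. The `vmm_region` structure and the function name are assumptions made for illustration; a real VMM would typically enforce this through paging or segment limits rather than an explicit test.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical description of the linear-address range reserved by the VMM. */
struct vmm_region {
    uint32_t base;   /* first byte used by the VMM */
    uint32_t size;   /* length in bytes, assumed non-zero */
};

/* Returns true if a guest access of `len` bytes starting at `addr`
 * touches the reserved region. 64-bit arithmetic avoids wraparound
 * at the top of the 4 GiB linear address space. */
static bool touches_vmm_region(const struct vmm_region *r,
                               uint32_t addr, uint32_t len)
{
    uint64_t access_end = (uint64_t)addr + len;         /* exclusive */
    uint64_t region_end = (uint64_t)r->base + r->size;  /* exclusive */
    return len != 0 && addr < region_end && r->base < access_end;
}
```

Any access for which this returns true would have to be intercepted and emulated by the VMM so that the guest never sees or modifies the VMM's structures.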
Ring deprivileging can interfere with the effectiveness of facilities in the IA-32 architecture that accelerate the delivery and handling of transitions to OS software.
For example: the IA-32 SYSENTER and SYSEXIT instructions support low-latency system calls. SYSENTER always effects a transition to privilege level 0, and SYSEXIT faults if executed outside that ring.
With a deprivileged guest OS, the VMM must therefore emulate every execution of SYSENTER and SYSEXIT, causing serious performance problems.
There are instructions that access privileged state and do not fault when executed with insufficient privilege.
For example, the IA-32 registers GDTR, IDTR, LDTR, and TR contain pointers to data structures that control CPU operation. Software can execute the instructions that store from (read) these registers at any privilege level.
Operating systems mask external interrupts to prevent their delivery when the OS is not ready for them, which poses a major challenge for VMM design. The VMM must manage interrupt masking itself; otherwise a guest OS that masks external interrupts could prevent every guest from receiving interrupts.
For example: IA-32 uses the interrupt flag (IF) in the EFLAGS register to control interrupt masking. A value of 0 indicates that interrupts are masked.
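One common way to picture this is a per-guest virtual copy of IF that the VMM consults before injecting an interrupt. The sketch below is illustrative only: the `vcpu` structure and function names are hypothetical, and the only architectural fact it relies on is that IF is bit 9 of EFLAGS.

```c
#include <stdint.h>
#include <stdbool.h>

#define EFLAGS_IF (1u << 9)   /* the interrupt flag is bit 9 of EFLAGS */

/* Hypothetical per-guest state kept by the VMM. */
struct vcpu {
    uint32_t virtual_eflags;     /* the guest's view of EFLAGS */
    bool     interrupt_pending;  /* a virtual interrupt awaits delivery */
};

/* Emulation of the guest's CLI and STI: only the virtual flag changes,
 * so the real IF stays under the VMM's control. */
static void guest_cli(struct vcpu *v) { v->virtual_eflags &= ~EFLAGS_IF; }
static void guest_sti(struct vcpu *v) { v->virtual_eflags |= EFLAGS_IF; }

/* The VMM delivers a pending interrupt only when the guest's
 * virtual IF is 1, i.e. the guest has not masked interrupts. */
static bool can_deliver_interrupt(const struct vcpu *v)
{
    return v->interrupt_pending && (v->virtual_eflags & EFLAGS_IF) != 0;
}
```

Because each guest has its own `virtual_eflags`, one guest clearing its flag does not stop the VMM from receiving physical interrupts on behalf of the others.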
Access to Hidden State
Some components of the processor state are not represented in any software-accessible register.
For example: IA-32 has hidden descriptor caches for the segment registers. A segment-register load copies the referenced descriptor from the GDT or LDT into this cache, which is not modified if software later writes to the descriptor tables.
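The behaviour of the descriptor cache can be modelled with a toy example. Everything here is a simplification for illustration: the table size, the `descriptor` layout, and the function name are assumptions, and only the GDT case is modelled.

```c
#include <stdint.h>

#define TABLE_ENTRIES 8   /* toy table size, chosen for the example */

struct descriptor { uint32_t base; uint32_t limit; };

struct segment_register {
    uint16_t          selector;  /* visible part */
    struct descriptor cached;    /* hidden part: the descriptor cache */
};

static struct descriptor gdt[TABLE_ENTRIES];   /* toy descriptor table */

/* Model of a segment-register load: bits 15..3 of the selector index
 * the table, and the referenced descriptor is copied into the cache. */
static void load_segment(struct segment_register *sr, uint16_t selector)
{
    sr->selector = selector;
    sr->cached = gdt[(selector >> 3) % TABLE_ENTRIES];
}
```

If the table entry is rewritten after the load, `sr->cached` keeps the old value until the next load. A VMM that saves only the visible selector and later reloads it would pick up the new descriptor, not the one the guest was actually using.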
Virtual-machine control structure (VMCS): the data structure that manages VM entries and VM exits.
VMCS is logically divided into:
Guest-state area
Host-state area
VM-execution control fields
VM-exit control fields
VM-entry control fields
VM-exit information fields
VM entries load processor state from the guest-state area.
VM exits save processor state to the guest-state area, record the exit reason, and then load processor state from the host-state area.
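The state movement performed by VM entries and VM exits can be sketched as a small model. This is not the hardware layout of the VMCS: the `cpu_state` fields, the structure names, and the functions are hypothetical and stand in for what the processor does internally.

```c
#include <stdint.h>

/* A tiny, hypothetical subset of processor state. */
struct cpu_state { uint64_t rip; uint64_t rsp; uint64_t cr3; };

/* Simplified model of a VMCS with its two state areas. */
struct vmcs {
    struct cpu_state guest_state;  /* guest-state area */
    struct cpu_state host_state;   /* host-state area, filled in by the VMM */
    uint32_t         exit_reason;  /* part of the VM-exit information */
};

/* VM entry: processor state is loaded from the guest-state area. */
static void vm_entry(struct cpu_state *cpu, const struct vmcs *vmcs)
{
    *cpu = vmcs->guest_state;
}

/* VM exit: guest state is saved, the reason is recorded, and the
 * processor state is loaded from the host-state area. */
static void vm_exit(struct cpu_state *cpu, struct vmcs *vmcs, uint32_t reason)
{
    vmcs->guest_state = *cpu;
    vmcs->exit_reason = reason;
    *cpu = vmcs->host_state;
}
```

In this model the VMM regains control at whatever `host_state.rip` it stored earlier and reads `exit_reason` to decide how to handle the exit before resuming the guest.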
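Figure: VT-x operations. VMXON takes the processor from ordinary IA-32 operation into VMX root operation (rings 0 and 3). VMLAUNCH and VMRESUME enter VMX non-root operation, where VM 1 through VM n each run with their own rings 0 and 3 and their own VMCS (VMCS 1 through VMCS n). A VM exit returns control to VMX root operation.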
It can help a lot when you need to switch tasks or must allocate a certain amount of CPU power to a task. For telecom and networking applications, this makes virtualization a useful tool and possibly a must-have feature. At the other end of the spectrum, it can help media applications such as media PCs and TiVo-type devices. For the business world, it doesn't buy you all that much.
VT-d: Intel® Virtualization Technology for Directed I/O
Provides the capability to ensure improved isolation of I/O resources for greater reliability, security, and availability.
Supports the remapping of I/O DMA transfers and device-generated interrupts.
Provides flexibility to support multiple usage models that may run unmodified, special-purpose, or "virtualization-aware" guest OSs.
Intel® I/O Acceleration Technology (Intel® I/OAT) is a suite of features that accelerates data movement across the platform, from I/O and networking devices to memory and processors, which helps to improve system performance.
Intel® QuickData Technology: designed to maximize the throughput of server data traffic across a broader range of configurations and server environments to achieve faster, scalable, and more reliable I/O.
Direct Cache Access (DCA): enables the CPU to prefetch data, avoiding cache misses and improving application response times.
MSI-X: helps in load-balancing I/O network interrupts.
Low-latency interrupts: automatically tune interrupt interval times depending on the latency sensitivity of the data.
Receive Side Coalescing (RSC): provides lightweight coalescing of receive packets, which increases the efficiency of the host network stack.
In addition to consolidating CPU processes, you also effectively consolidate I/O bandwidth and switch processing capabilities onto the same platform.
The overhead of this switching limits bandwidth, adds CPU overhead, and effectively reduces the benefits of server virtualization. In some cases it creates a new problem: an I/O bottleneck.
On the receive path, VMDq provides a hardware "sorter" or classifier that does the pre-work for the VMM of deciding which destination VM each packet should go to. The NIC or LAN silicon thus performs a hardware assist for the VMM layer.
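The sorting step can be pictured with a short software model. This is a sketch under stated assumptions, not a description of the actual silicon: it assumes frames are classified by destination MAC address only, and the queue count, the `rx_filter` structure, and the function names are all hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define NUM_QUEUES    4   /* hypothetical number of receive queues */
#define DEFAULT_QUEUE 0   /* frames with no matching filter land here */

/* One filter per queue: the MAC address of the VM that owns the queue. */
struct rx_filter { uint8_t mac[6]; int in_use; };

static struct rx_filter filters[NUM_QUEUES];

/* Called by the VMM when it assigns a queue to a VM's MAC address. */
static void assign_queue(int queue, const uint8_t mac[6])
{
    if (queue < 0 || queue >= NUM_QUEUES)
        return;
    memcpy(filters[queue].mac, mac, 6);
    filters[queue].in_use = 1;
}

/* Model of the sorter: the destination MAC is the first six bytes
 * of an Ethernet frame; return the queue whose filter matches it. */
static int classify_frame(const uint8_t *frame)
{
    for (int q = 0; q < NUM_QUEUES; q++)
        if (filters[q].in_use && memcmp(filters[q].mac, frame, 6) == 0)
            return q;
    return DEFAULT_QUEUE;
}
```

With the lookup done before the frame reaches software, the VMM only has to hand each queue to its owning VM instead of inspecting every packet itself.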
The VMCS contains a number of fields that control VMX non-root operation by specifying the instructions and events that cause VM exits.
The VMCS includes controls that support interrupt virtualization:
External-interrupt exiting: if it is set, all external interrupts cause VM exits. The guest is not able to mask these interrupts.
Interrupt-window exiting: if it is set, a VM exit occurs whenever guest software is ready to receive interrupts.
Use TPR shadow: if it is set, accesses to the APIC’s TPR through control register CR8 are handled in a special way: executions of MOV CR8 access a TPR shadow referenced by a pointer in the VMCS. The VMCS also includes a TPR threshold; a VM exit occurs after any instruction that reduces the TPR shadow below the TPR threshold. (FlexPriority)
Exception bitmap: 32 entries, one for each IA-32 exception, specifying which exceptions cause VM exits and which do not.
I/O bitmaps: one entry for each port in the 16-bit I/O space. An I/O instruction causes a VM exit if it attempts to access a port whose entry is set in the I/O bitmaps.
MSR bitmaps: two entries (read and write) for each model-specific register (MSR) currently in use. An execution of RDMSR (or WRMSR) causes a VM exit if it attempts to read (or write) an MSR whose read bit (or write bit) is set in the MSR bitmaps.
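The TPR shadow mechanism can be sketched as follows. This is a simplified model: the `tpr_state` structure and function names are hypothetical, and the shadow and threshold are treated as plain 4-bit priority values that are compared directly.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the two values involved. */
struct tpr_state {
    uint8_t shadow;     /* TPR shadow, the value the guest sees in CR8 */
    uint8_t threshold;  /* TPR threshold programmed by the VMM */
};

/* Model of a guest MOV to CR8: the shadow is updated without a VM exit
 * unless the new value falls below the threshold. The return value
 * says whether a VM exit would follow the instruction. */
static bool guest_write_cr8(struct tpr_state *t, uint8_t value)
{
    t->shadow = value & 0xFu;   /* keep only the 4-bit priority */
    return t->shadow < t->threshold;
}

/* Model of a guest MOV from CR8: served from the shadow, never exits. */
static uint8_t guest_read_cr8(const struct tpr_state *t)
{
    return t->shadow;
}
```

The VMM would set `threshold` to the priority of the highest pending virtual interrupt, so it is notified only when the guest lowers its TPR far enough for that interrupt to become deliverable.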
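The bitmap lookups described above amount to simple bit tests. The sketch below models the exception bitmap and the I/O bitmaps; the `exit_controls` structure and function names are assumptions for illustration, and the I/O bitmap is held as one flat array rather than in the layout the hardware uses. The MSR bitmaps would follow the same pattern with separate read and write bits.

```c
#include <stdint.h>
#include <stdbool.h>

#define IO_PORTS 65536u   /* size of the 16-bit I/O port space */

/* Hypothetical container for two of the controls. */
struct exit_controls {
    uint32_t exception_bitmap;         /* bit n set: exception n exits */
    uint8_t  io_bitmap[IO_PORTS / 8];  /* one bit per I/O port */
};

/* VMM side: mark a port so that guest accesses to it cause VM exits. */
static void trap_port(struct exit_controls *c, uint16_t port)
{
    c->io_bitmap[port / 8] |= (uint8_t)(1u << (port % 8));
}

/* Would a guest access to this single port cause a VM exit? */
static bool io_causes_exit(const struct exit_controls *c, uint16_t port)
{
    return ((c->io_bitmap[port / 8] >> (port % 8)) & 1u) != 0;
}

/* Would this exception vector (0..31) cause a VM exit? */
static bool exception_causes_exit(const struct exit_controls *c,
                                  unsigned vector)
{
    return vector < 32 && ((c->exception_bitmap >> vector) & 1u) != 0;
}
```

Ports and exceptions whose bits are clear are handled directly by the guest, so the VMM pays the cost of a VM exit only for the events it has chosen to intercept.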