In the early days of computing, virtualization as we know it today didn't really exist. Instead, emulation was used.
In emulation, the behavior of a complete computer is reproduced by a software program. The emulation layer talks to an operating system, which in turn talks to the computer hardware. The operating system that you install in the emulation layer cannot tell that it is running in an emulated environment, so you can install it the same way you would install it on real hardware.
Two popular open source emulators are QEMU (http://fabrice.bellard.free.fr/qemu/) and Bochs.
One of the most important properties of emulation is that all hardware is emulated, including the CPU.
This has advantages: for example, you can run an operating system that was developed for another architecture on your own architecture. This advantage also brings the most important disadvantage: emulating a complete CPU carries a heavy performance price.
In the next generation, virtualization was taken to a higher level: a host operating system was no longer required between the virtual machines and the hardware.
Instead, the virtual machine monitor, also known as the hypervisor, was introduced to run directly on the hardware. Because of this new architecture, virtualization became much more efficient. VMware, for example, was very successful with this approach as implemented in VMware ESX.
There are however two different approaches when virtualization is used this way. In the old approach all instructions that were generated by the virtualized machine needed to be translated to the appropriate format for the CPU, which involves a lot of work for the hypervisor.
In the new approach, which is used by Xen, there is no translation between the instructions that leave the virtualized machine and the CPU that executes them. This can be accomplished in two ways.
Option number one is to use a CPU that understands the unmodified instructions that are generated by the virtualized operating system and interprets them (full virtualization).
Option number two is to modify the operating system so that it generates instructions that are optimized for use in a virtualized environment (paravirtualization).
Full versus Para Virtualization
Full virtualization is one way of handling virtualization. Using this method, the virtual machine talks to a component called the virtual machine monitor, and this virtual machine monitor talks to the hardware platform directly.
To use full virtualization in a Xen environment, you need a CPU that understands the unmodified instructions that are generated by the virtualized operating system. Without this special feature on the CPUs, it is not possible to use full virtualization in Xen.
This is because Xen does not translate every instruction generated by the virtualized operating system into a format that every CPU understands; doing so is very resource intensive. Instead, the virtualization feature implemented in modern CPUs lets the virtualized operating system send out unmodified instructions.
The main advantage of full virtualization is that an unmodified operating system is installed. This means that virtually every operating system that runs on the same architecture can be virtualized.
The most efficient approach to virtualization is paravirtualization.
In paravirtualization, the guest operating system uses a specialized API to talk to the virtual machine monitor, which is responsible for handling the virtualization requests and passing them to the real hardware.
Because of this special API, the virtual machine monitor no longer needs to do a resource-intensive translation of instructions before they can be passed to the hardware.
Also, when using the paravirtualization API, the virtualized operating system is capable of generating much more efficient instructions.
A disadvantage, however, is that you need a modified operating system that includes this specific API. For certain operating systems (mainly Windows) this is an important disadvantage, because such an API is not available.
What is virtualization?
• Virtualization is a broad term that refers to the abstraction of computer resources.
• Server virtualization
– Hardware – ex: IBM pSeries and zSeries LPARs
– Software – ex: VMware, Xen, Solaris Containers, SWsoft
– OS instances think they are controlling the “real” machine
– Virtualization layer mediates access to hardware resources
– Permits multiple OS instances to coexist on a single server
– Even incompatible OSes can share a single server
– The “layer” is referred to as a Virtual Machine Monitor (VMM)
Xen supports full virtualization on CPUs that have been designed specifically for virtualization. (Examples include the next-generation AMD processors with AMD-V.) A fully virtualized operating system is one that has not been modified specifically to run in a virtual environment, so it is unaware that it is being virtualized. As a result, the hypervisor traps and emulates every I/O and hardware instruction that is deemed privileged by the hypervisor.
Typically, the overhead of these trapping and emulation operations would have a significant impact on performance. However, the AMD processors with AMD-V have been designed specifically for virtualization. The Xen hypervisor interacts with the virtualization extensions in the AMD processors not only to improve performance and efficiency, but also to provide hardware-based isolation between the unmodified guest operating systems running on a virtualization server.
The main benefit of full virtualization comes from its ability to host legacy operating systems that have not been paravirtualized. The ability to host these legacy operating systems in a virtualized environment is critical to a data center’s server-consolidation efforts. This feature is mandatory for virtualizing proprietary operating systems, including those from Microsoft.
To run fully virtualized guests on Hardware-assisted Virtual Machine (HVM) systems, on Intel or AMD platforms, you must first check that your CPUs have the required capabilities.
To check if you have the CPU flags for Intel support, enter the following:
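A minimal sketch of such a check in Python, assuming a Linux host that lists CPU features on the "flags" lines of /proc/cpuinfo. On such hosts Intel VT-x appears as the vmx flag and AMD-V as the svm flag. The function name virtualization_flags is illustrative, not part of any Xen tool.

```python
def virtualization_flags(cpuinfo_text):
    """Return the hardware-virtualization flags found in /proc/cpuinfo text.

    'vmx' indicates Intel VT-x, 'svm' indicates AMD-V.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU has one line of the form "flags : fpu vme ..."
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            found |= {"vmx", "svm"} & set(value.split())
    return found
```

To use it on a real host, pass in the contents of /proc/cpuinfo, for example virtualization_flags(open("/proc/cpuinfo").read()). An empty result suggests that the CPU lacks the extensions or that they are disabled in the BIOS.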
Xen’s unique performance benefits accrue from its use of paravirtualization. With paravirtualization, the operating system running inside a virtual machine (known as a guest operating system) is modified to run on top of a hypervisor.
The virtualized operating system instance is aware that it is running in a virtualized state and has been fine-tuned for optimal performance in that environment.
Paravirtualization allows the hypervisor to avoid hard-to-virtualize processor instructions by replacing them with procedure calls that provide that functionality. A paravirtualized operating system loads and runs virtual drivers that are capable of interacting with Xen to access resources on the host virtual server. In other words, it does not require complete emulation of computer devices.
Full & Paravirtualization Overview
A 32-bit host runs only 32-bit paravirtual guests, and a 64-bit host runs only 64-bit paravirtual guests. A 64-bit full virtualization host runs 32-bit, 32-bit PAE, or 64-bit guests. A 32-bit full virtualization host runs both PAE and non-PAE full virtualization guests.
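The compatibility rules above can be written down as a small lookup table. This is only an illustrative sketch; the names SUPPORTED_GUESTS and can_run and the label strings are made up for this example.

```python
# Guest types each kind of host can run, as stated in the overview above.
# Keys are (virtualization mode, host word size in bits).
SUPPORTED_GUESTS = {
    ("paravirtual", 32): {"32-bit"},
    ("paravirtual", 64): {"64-bit"},
    ("full", 32): {"32-bit", "32-bit PAE"},
    ("full", 64): {"32-bit", "32-bit PAE", "64-bit"},
}


def can_run(mode, host_bits, guest):
    """Return True if a host of the given mode and word size can run the guest."""
    return guest in SUPPORTED_GUESTS.get((mode, host_bits), set())
```

For example, can_run("full", 64, "32-bit PAE") is true, while can_run("paravirtual", 64, "32-bit") is false.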
guest operating system: An operating system that can run within the Xen environment.
hypervisor: Code running at a higher privilege level than the supervisor code of its guest operating systems. The hypervisor is Xen itself. It goes between the hardware and the operating systems of the various domains. The hypervisor is responsible for checking page tables, allocating resources for new domains, and scheduling domains. It presents the domains with a virtual machine that looks similar but not identical to the native architecture. It is also responsible for booting the machine enough that it can start dom0.
Just as applications can interact with an OS by giving it syscalls, domains interact with the hypervisor by giving it hypercalls. The hypervisor responds by sending the domain an event, which fulfils the same function as an IRQ on real hardware.
virtual machine monitor ("vmm"): In this context, the hypervisor.
domain: A running virtual machine within which a guest OS executes.
domain0 ("dom0"): The first domain, automatically started at boot time. Dom0 has permission to control all hardware on the system, and is used to manage the hypervisor and the other domains.
unprivileged domain ("domU"): A domain with no special hardware access.
full virtualization: An approach to virtualization which requires no modifications to the hosted operating system, providing the illusion of a complete system of real hardware devices.
paravirtualization: An approach to virtualization which requires modifications to the operating system in order to run in a virtual machine. Xen uses paravirtualization but preserves binary compatibility for user space applications.
HVM: Hardware Virtual Machine, which is the full-virtualization mode supported by Xen. This mode requires hardware support, e.g. Intel's Virtualization Technology (VT) and AMD's Pacifica technology.
SVM: full-virtualization support on AMD's Pacifica-enabled processors
VT-x: full-virtualization support on Intel's x86 VT-enabled processors
VT-i: full-virtualization support on Intel's IA-64 VT-enabled processors
backend: one half of a communication end point; interdomain communication is implemented using a frontend and backend device model interacting via event channels.
frontend: the device as presented to the guest; other half of the communication endpoint.
vif: virtual interface; the name of the network backend device connected by an event channel to a network front end on the guest.
vethN: local networking front end on dom0; renamed to ethN by xen network scripts in bridging mode (FIXME)
pethN: real physical device (after renaming)
live migration: A technique for moving a running virtual machine to another physical host, without stopping it or the services running on it.
Hypervisors are currently classified in two types:
A Type 1 (or native or bare-metal) hypervisor is software that runs directly on a given hardware platform (as an operating system control program).
A guest operating system thus runs at the second level above the hardware. The classic type 1 hypervisor was CP/CMS, developed at IBM in the 1960s, ancestor of IBM's current z/VM. More recent examples are Xen, Oracle VM, VMware's ESX Server, L4 microkernels, TRANGO, IBM's LPAR hypervisor (PR/SM), Microsoft's Hyper-V (currently in Beta), and Sun's Logical Domains Hypervisor (released in 2005). A variation of this is embedding the hypervisor in the firmware of the platform, as is done in the case of Hitachi's Virtage hypervisor. KVM, which turns a complete Linux kernel into a hypervisor, is also Type 1.
A Type 2 (or hosted) hypervisor is software that runs within an operating system environment. A "guest" operating system thus runs at the third level above the hardware. Examples include VMware Server (formerly known as GSX), VMware Workstation, VMware Fusion, the open source QEMU, Microsoft's Virtual PC and Microsoft Virtual Server products, InnoTek's VirtualBox, as well as SWsoft's Parallels Workstation and Parallels Desktop.
The term hypervisor apparently originated in IBM's CP-370 reimplementation of CP-67 for the System/370, released in 1972 as VM/370. The term hypervisor call, or hypercall, referred to the paravirtualization interface, by which a "guest" operating system could access services directly from the (higher-level) control program, analogous to making a "supervisor call" to the (same level) operating system. (The term "supervisor" refers to the operating system kernel, which on IBM mainframes runs in supervisor state.)
This configures Xen to output on COM1 at 115,200 baud, 8 data bits, 1 stop bit and no parity. Modify these parameters for your setup.
One can also configure XenLinux to share the serial console; to achieve this, append console=ttyS0 to your module line.
If you wish to be able to log in over the XenLinux serial console it is necessary to add a line into /etc/inittab, just as per regular Linux. Simply add the line:
and you should be able to log in. Note that to successfully log in as root over the serial line will require adding ttyS0 to /etc/securetty in most modern distributions.
Xen Boot Options
These options are used to configure Xen's behaviour at runtime. They should be appended to Xen's command line, either manually or by editing grub.conf.
noreboot Don't reboot the machine automatically on errors. This is useful to catch debug output if you aren't catching console messages via the serial line.
nosmp Disable SMP support. This option is implied by `ignorebiostables'.
watchdog Enable NMI watchdog which can report certain failures.
noirqbalance Disable software IRQ balancing and affinity. This can be used on systems such as Dell 1850/2850 that have workarounds in hardware for IRQ-routing issues.
badpage=<page number>,<page number>, ... Specify a list of pages not to be allocated for use because they contain bad bytes. For example, if your memory tester says that byte 0x12345678 is bad, you would place `badpage=0x12345' on Xen's command line.
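The badpage example is a byte address with its page offset dropped. The sketch below shows that arithmetic, assuming 4 KiB pages (a 12-bit page offset), which is consistent with the 0x12345678 to 0x12345 example. The helper name badpage_option is illustrative.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, so the low 12 bits are the page offset


def badpage_option(bad_byte_addresses):
    """Build a badpage= option from the byte addresses reported as bad."""
    # Several bad bytes may fall in the same page, so deduplicate the page numbers.
    pages = sorted({addr >> PAGE_SHIFT for addr in bad_byte_addresses})
    return "badpage=" + ",".join(hex(page) for page in pages)


print(badpage_option([0x12345678]))  # prints badpage=0x12345
```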
Xen supports up to two 16550-compatible serial ports. For example: `com1=9600, 8n1, 0x408, 5' maps COM1 to a 9600-baud port, 8 data bits, no parity, 1 stop bit, I/O port base 0x408, IRQ 5. If some configuration options are standard (e.g., I/O base and IRQ), then only a prefix of the full configuration string need be specified. If the baud rate is pre-configured (e.g., by the bootloader) then you can specify `auto' in place of a numeric baud rate.
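To make the layout of the serial configuration string concrete, here is a small Python model of it. It is not Xen's own parser; it only follows the field order described above (baud rate, data bits/parity/stop bits, I/O base, IRQ), and the function and key names are made up for this sketch.

```python
def parse_serial_config(value):
    """Parse a string such as '9600, 8n1, 0x408, 5' into a dict.

    Trailing fields may be left out, since only a prefix of the full
    string needs to be given.
    """
    fields = [field.strip() for field in value.split(",")]
    config = {}
    if fields and fields[0]:
        baud = fields[0]
        # 'auto' means the baud rate was already set up, e.g. by the bootloader.
        config["baud"] = baud if baud == "auto" else int(baud)
    if len(fields) > 1:
        data_bits, parity, stop_bits = fields[1]  # e.g. '8n1'
        config.update(data_bits=int(data_bits), parity=parity,
                      stop_bits=int(stop_bits))
    if len(fields) > 2:
        config["io_base"] = int(fields[2], 16)
    if len(fields) > 3:
        config["irq"] = int(fields[3])
    return config
```

With the example from the text, parse_serial_config("9600, 8n1, 0x408, 5") yields baud 9600, 8 data bits, parity "n", 1 stop bit, I/O base 0x408 and IRQ 5.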
Specify the destination for Xen console I/O. This is a comma-separated list of, for example:
vga Use VGA console and allow keyboard input.
com1 Use serial port com1.
com2H Use serial port com2. Transmitted chars will have the MSB set. Received chars must have MSB set.
com2L Use serial port com2. Transmitted chars will have the MSB cleared. Received chars must have MSB cleared.
Force synchronous console output. This is useful if your system fails unexpectedly before it has sent all available output to the console. In most cases Xen will automatically enter synchronous mode when an exceptional event occurs, but this option provides a manual fallback.
Specify how to switch serial-console input between Xen and DOM0. The required sequence is CTRL-<switch-char> pressed three times. Specifying the backtick character disables switching. The <auto-switch-char> specifies whether Xen should auto-switch input to DOM0 when it boots -- if it is `x' then auto-switching is disabled. Any other value, or omitting the character, enables auto-switching. [NB. Default switch-char is `a'.]
Specify what to do with an NMI parity or I/O error. `nmi=fatal': Xen prints a diagnostic and then hangs. `nmi=dom0': Inform DOM0 of the NMI. `nmi=ignore': Ignore the NMI.
Set the physical RAM address limit. Any RAM appearing beyond this physical address in the memory map will be ignored. This parameter may be specified with a B, K, M or G suffix, representing bytes, kilobytes, megabytes and gigabytes respectively. The default unit, if no suffix is specified, is kilobytes.
Set the amount of memory to be allocated to domain0. In Xen 3.x the parameter may be specified with a B, K, M or G suffix, representing bytes, kilobytes, megabytes and gigabytes respectively; if no suffix is specified, the parameter defaults to kilobytes. In previous versions of Xen, suffixes were not supported and the value is always interpreted as kilobytes.
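The suffix rules shared by the two memory parameters above can be sketched as follows. This is an illustration only: it assumes binary multiples (1 K = 1024 bytes) and accepts lower-case suffixes, neither of which is stated in the text, and the name parse_size is made up.

```python
# Assumption: binary multiples, i.e. 1 K = 1024 bytes.
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Return the size in bytes; a bare number is taken as kilobytes."""
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value) * UNITS["K"]
```

Under these assumptions parse_size("512M") and parse_size("524288") describe the same amount of memory.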
Set the size of the per-cpu trace buffers, in pages (default 1). Note that the trace buffers are only enabled in debug builds. Most users can ignore this feature completely.
Select the CPU scheduler Xen should use. The current possibilities are `sedf' (default) and `bvt'.
apic_verbosity=debug,verbose Print more detailed information about local APIC and IOAPIC configuration.
lapic Force use of local APIC even when left disabled by uniprocessor BIOS.
nolapic Ignore local APIC in a uniprocessor system, even if enabled by the BIOS.
apic=bigsmp,default,es7000,summit Specify NUMA platform. This can usually be probed automatically.
In addition, the following options may be specified on the Xen command line. Since domain 0 shares responsibility for booting the platform, Xen will automatically propagate these options to its command line. These options are taken from Linux's command-line syntax with unchanged semantics.
Modify how Xen (and domain 0) parses the BIOS ACPI tables.
Instruct Xen (and domain 0) to ignore timer-interrupt override instructions specified by the BIOS ACPI tables.
Instruct Xen (and domain 0) to ignore any IOAPICs that are present in the system, and instead continue to use the legacy PIC.
Specify the device node to which the Xen virtual console driver is attached. The following options are supported:
`xencons=off': disable virtual console
`xencons=tty': attach console to /dev/tty1 (tty0 at boot-time)
`xencons=ttyS': attach console to /dev/ttyS0
The default is ttyS for dom0 and tty for all other domains.
Virtual Ethernet interfaces
Xen creates, by default, eight pairs of "connected virtual ethernet interfaces" for use by dom0. Think of them as two ethernet interfaces connected by an internal crossover ethernet cable. veth0 is connected to vif0.0, veth1 is connected to vif0.1, and so on, up to veth7 -> vif0.7. You can use them by configuring IP and MAC addresses on the veth# end, then attaching the vif0.# end to a bridge.
Every time you create a running domU instance, it is assigned a new domain id number.
For each new domU, Xen creates new "connected virtual ethernet interfaces", with one end of each pair inside the domU and the other end inside dom0. For Linux domUs, the device name the guest sees is eth0.
The other end of that virtual ethernet interface pair exists within dom0 as interface vif<id#>.0.
For example, domU #5's eth0 is attached to vif5.0.
If you create multiple network interfaces for a domU, its ends will be eth0, eth1, etc., whereas the dom0 ends will be vif<id#>.0, vif<id#>.1, etc.
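The naming scheme can be summarised in a few lines of Python. The helper name interface_pair is illustrative, and the guest-side name assumes a Linux domU as described above.

```python
def interface_pair(domid, index=0):
    """Return (name inside the domU, name inside dom0) for one virtual NIC."""
    return ("eth%d" % index, "vif%d.%d" % (domid, index))


print(interface_pair(5))     # prints ('eth0', 'vif5.0')
print(interface_pair(5, 1))  # prints ('eth1', 'vif5.1')
```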
Logical network cards connected between dom0 and dom1:
When xend starts up, it runs the network-bridge script, which:
creates a new bridge named xenbr0
"real" ethernet interface eth0 is brought down
the IP and MAC addresses of eth0 are copied to virtual network interface veth0
real interface eth0 is renamed peth0
virtual interface veth0 is renamed eth0
peth0 and vif0.0 are attached to bridge xenbr0
the bridge, peth0, eth0 and vif0.0 are brought up
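The renaming steps are easier to follow when traced on a toy model. The sketch below is not the network-bridge script; it only replays the listed steps on a plain dictionary, and all names and the data layout are assumptions made for this illustration. It tracks nothing beyond what the list states.

```python
def network_bridge(interfaces, bridge="xenbr0"):
    """Replay the network-bridge steps on a dict of name -> attributes.

    'interfaces' must contain 'eth0', 'veth0' and 'vif0.0'; each value is
    a dict with the keys 'ip', 'mac' and 'up'. The input is not modified.
    """
    ifs = {name: dict(attrs) for name, attrs in interfaces.items()}
    bridges = {bridge: []}                     # create the bridge
    ifs["eth0"]["up"] = False                  # bring the real eth0 down
    ifs["veth0"]["ip"] = ifs["eth0"]["ip"]     # copy IP and MAC to veth0
    ifs["veth0"]["mac"] = ifs["eth0"]["mac"]
    ifs["peth0"] = ifs.pop("eth0")             # real eth0 becomes peth0
    ifs["eth0"] = ifs.pop("veth0")             # veth0 becomes eth0
    bridges[bridge] += ["peth0", "vif0.0"]     # attach both to the bridge
    for name in ("peth0", "eth0", "vif0.0"):   # bring everything up
        ifs[name]["up"] = True
    return ifs, bridges
```

After the call, the name eth0 refers to the former veth0 and carries the original IP and MAC addresses, while the physical card is reachable only as peth0 through the bridge.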
It is good to have the physical interface and the dom0 interface separated; this way you can, for example, set up a firewall on dom0 that protects dom0 alone and does not affect the traffic to the domUs.
When a domU starts up, xend (running in dom0) runs the vif-bridge script, which:
attaches vif<id#>.0 to xenbr0
vif<id#>.0 is brought up
you can change the bridge name from xenbr0 by putting
(network-script 'network-bridge bridge=mybridge') in xend-config.sxp and then rebooting or restarting xend
you can create multiple network interfaces, and attach them to different bridges using: vif=[ 'mac=00:16:3e:70:01:01,bridge=br0', 'mac=00:16:3e:70:02:01,bridge=br1' ]
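Because xm domain configuration files use Python syntax, the vif list can also be assembled with a small helper. This is a sketch only; vif_entry is a made-up name, and the MAC addresses and bridge names are the ones from the example above.

```python
def vif_entry(mac, bridge):
    """Format one entry of the vif list for a domain configuration file."""
    return "mac=%s,bridge=%s" % (mac, bridge)


# One entry per virtual network interface, each attached to its own bridge.
vif = [
    vif_entry("00:16:3e:70:01:01", "br0"),
    vif_entry("00:16:3e:70:02:01", "br1"),
]
```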
Domain Management Tools
Command line management tasks are also performed using the xm tool. For online help for the commands available, type:
# xm help
You can also type xm help <command> for more information on a given command.
Starting/Stopping a Domain at Boot Time
You can start or stop running domains at any time. Domain0 waits for all running domains to shut down before restarting.
You must place the configuration files of the domains you wish to shut down in the /etc/xen/ directory.
All the domains that you want to start at boot time must be symlinked to /etc/xen/auto .
chkconfig xendomains on
The chkconfig xendomains on command does not automatically start domains; instead it will start the domains on the next boot.
chkconfig xendomains off
Terminates all running Red Hat Virtualization domains. The chkconfig xendomains off command shuts down the domains on the next boot.
The Xend node control daemon performs system management functions related to virtual machines. It forms a central point of control of virtualized resources, and must be running in order to start and manage virtual machines. Xend must be run as root because it needs access to privileged system management functions.
Xend can be started on the command line as well, and supports the following set of parameters:
# xend start start xend, if not already running
# xend stop stop xend if already running
# xend restart restart xend if running, otherwise start it
# xend status indicates xend status by its return code
As xend runs, events will be logged to /var/log/xend.log and (less frequently) to /var/log/xend-debug.log. These, along with the standard syslog files, are useful when troubleshooting problems.
Xend is written in Python. At startup, it reads its configuration information from the file /etc/xen/xend-config.sxp. The Xen installation places an example xend-config.sxp file in the /etc/xen subdirectory which should work for most installations.
An HTTP interface and a Unix domain socket API are available to communicate with Xend. This allows remote users to pass commands to the daemon.
By default, Xend does not start an HTTP server. It does start a Unix domain socket management server, as the low level utility xm requires it. For support of cross-machine migration, Xend can start a relocation server. This support is not enabled by default for security reasons.
From the file:
Comment or uncomment lines in that file to disable or enable features that you require.
Connections from remote hosts are disabled by default:
# Address xend should listen on for HTTP connections, if xend-http-server is
# Specifying the empty string '' (the default) allows all connections.
It is recommended that if migration support is not needed, the xend-relocation-server parameter value be changed to "no" or commented out.
The xm tool is the primary tool for managing Xen from the console. The general format of an xm command line is:
# xm command [switches] [arguments] [variables]
# xm help
This will list the most commonly used commands. The full list can be obtained using xm help --long. You can also type xm help <command> for more information on a given command.
One useful command is
# xm list
which lists all domains running in rows of the following format:
name domid memory vcpus state cputime
The meaning of each field is as follows:
name The descriptive name of the virtual machine.
domid The number of the domain ID this virtual machine is running in.
memory Memory size in megabytes.
vcpus The number of virtual CPUs this domain has.
state Domain state consists of 5 fields:
r running
b blocked
p paused
s shutdown
c crashed
cputime How much CPU time (in seconds) the domain has used so far.
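A row in this format can be split into its fields with a few lines of Python. The sketch assumes whitespace-separated columns, a domain name without spaces, and a state column in which inactive flags are shown as dashes; the sample row and all function and key names are hypothetical.

```python
STATE_FLAGS = {
    "r": "running",
    "b": "blocked",
    "p": "paused",
    "s": "shutdown",
    "c": "crashed",
}


def parse_xm_list_row(row):
    """Split one row of 'name domid memory vcpus state cputime' into a dict."""
    name, domid, memory, vcpus, state, cputime = row.split()
    return {
        "name": name,
        "domid": int(domid),
        "memory_mb": int(memory),
        "vcpus": int(vcpus),
        # Keep only the flags that are set; other characters are ignored.
        "state": [STATE_FLAGS[ch] for ch in state if ch in STATE_FLAGS],
        "cputime_s": float(cputime),
    }


row = parse_xm_list_row("myVM 5 256 1 -b--- 12.3")  # hypothetical sample row
print(row["name"], row["state"])  # prints myVM ['blocked']
```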
The xm list command also supports a long output format when the -l switch is used. This outputs the full details of the running domains in xend's SXP configuration format.
You can get access to the console of a particular domain using the # xm console command (e.g. # xm console myVM).
5.1 Configuration Files
Xen configuration files contain the following standard variables. Unless otherwise stated, configuration items should be enclosed in quotes: see the configuration scripts in /etc/xen/ for concrete examples.
kernel Path to the kernel image.
ramdisk Path to a ramdisk image (optional).
memory Memory size in megabytes.
vcpus The number of virtual CPUs.
console Port to export the domain console on (default 9600 + domain ID).
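As an illustration, the variables above might appear in a configuration file as follows. xm configuration files use Python syntax; the paths and sizes here are hypothetical placeholders, memory and vcpus are written as plain numbers on the assumption that they are among the items that need no quotes, and default_console_port is a made-up helper that only restates the default given above.

```python
# Hypothetical example values; adjust paths and sizes for a real guest.
kernel = "/boot/vmlinuz-xen-guest"      # path to the kernel image (hypothetical)
ramdisk = "/boot/initrd-xen-guest.img"  # optional ramdisk image (hypothetical)
memory = 256                            # memory size in megabytes
vcpus = 1                               # number of virtual CPUs


def default_console_port(domid):
    """Default port on which the domain console is exported."""
    return 9600 + domid


print(default_console_port(5))  # prints 9605
```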