Stub Domain Device Model Domain and PV-GRUB Kwon-yong Lee Distributed Computing & Communication Lab. (URL: http://dcclab.sogang.ac.kr) Dept. of Computer Science Sogang University Seoul, Korea Tel : +82-2-3273-8783 Email : email@example.com
Domain0 Disaggregation
Dom0 runs a lot of Xen components, and most of them run as root. The hypervisor cannot itself schedule them appropriately. The idea is to move these components into separate domains: a driver domain, a builder domain, device model domains, etc.
PyGRUB
Acts as a "PV bootloader": allows booting a kernel that resides within the DomU disk or partition image.
[Diagram: PyGRUB runs under xend in Dom0 and reads menu.lst, vmlinuz, and initrd from the PV domain's disk image, below the Xen hypervisor]
Needs to be root in Dom0 to access the guest disk.
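As a sketch, a PV guest set up to boot via PyGRUB looks roughly like this (xm/xend configuration files use Python syntax; the guest name and disk path here are illustrative, not from the slides):

```python
# Illustrative xm/xend PV guest configuration.
# With a bootloader set, no Dom0-side kernel path is needed:
# PyGRUB reads the guest's own /boot/grub/menu.lst from the disk image.
name = "pv-guest"
memory = 512
bootloader = "/usr/bin/pygrub"                # runs as root in Dom0
disk = ['file:/var/xen/pv-guest.img,xvda,w']  # guest disk image (illustrative path)
vif = ['']
```

Because the `bootloader` process runs in Dom0 with root privileges and parses untrusted guest data, this is exactly the kind of component that Dom0 disaggregation aims to move out of Dom0.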
Mini-OS
A sample PV guest for the Xen hypervisor. It relies completely on the hypervisor to access the machine, using the Xen network, block, and console frontend/backend mechanisms. It has a single virtual memory address space (no user space).
Mini-OS
It has been extended to the point of running the newlib C library and the lwIP stack, thus providing a basic POSIX environment, including TCP/IP networking. See xen-3.3.1/extras/mini-os/.
(PS: it is reportedly being tested at Cisco for IOS.)
xen-3.3.1/extras/mini-os/README:

Minimal OS
----------

This shows some of the stuff that any guest OS will have to set up. This includes:

 * installing a virtual exception table
 * handling virtual exceptions
 * handling asynchronous events
 * enabling/disabling async events
 * parsing start_info struct at start-of-day
 * registering virtual interrupt handlers (for timer interrupts)
 * a simple page and memory allocator
 * minimal libc support
 * minimal Copy-on-Write support
 * network, block, framebuffer support
 * transparent access to FileSystem exports (see tools/fs-back)

- to build it just type make.
- to build it with TCP/IP support, download LWIP 1.3 source code and type
  make LWIPDIR=/path/to/lwip/source
- to build it with much better libc support, see the stubdom/ directory
- to start it do the following in domain0 (assuming xend is running)
  # xm create domain_config

This starts the kernel and prints out a bunch of stuff and then once every second the system time.

If you have setup a disk in the config file (e.g. disk = [ 'file:/tmp/foo,hda,r' ] ), it will loop reading it. If that disk is writable (e.g. disk = [ 'file:/tmp/foo,hda,w' ] ), it will write data patterns and re-read them.

If you have setup a network in the config file (e.g. vif = [''] ), it will print incoming packets.

If you have setup a VFB in the config file (e.g. vfb = ['type=sdl'] ), it will show a mouse with which you can draw color squares.

If you have compiled it with TCP/IP support, it will run a daytime server on TCP port 13.
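Putting the README's config-file snippets together, a complete domain_config for the Mini-OS example might look like this (the kernel path depends on where mini-os.gz was built and is an assumption here; the disk, vif, and vfb lines are taken from the README):

```python
# Illustrative domain_config for the Mini-OS sample guest.
kernel = "/root/xen-3.3.1/extras/mini-os/mini-os.gz"  # assumed build location
memory = 32
name = "mini-os"
disk = ['file:/tmp/foo,hda,w']  # writable disk: Mini-OS writes data patterns and re-reads them
vif = ['']                      # network: Mini-OS prints incoming packets
vfb = ['type=sdl']              # framebuffer: draw color squares with the mouse
```

Start it with `xm create -c domain_config` in Dom0 (assuming xend is running); the `-c` flag attaches the console immediately.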
POSIX Environment on top of Mini-OS
[Diagram: the application runs over newlib, lwIP, and additional glue code (getpid, sig, mmap, …), on top of Mini-OS (scheduler, MM, console/network/block/FS/framebuffer frontends), on the Xen hypervisor]
POSIX Environment on top of Mini-OS
lwIP provides a lightweight TCP/IP stack that just connects to the network frontend of Mini-OS. It is a widely used open-source TCP/IP stack designed for embedded systems (originally 8-bit microcontrollers), reducing resource usage while still providing a full-scale TCP.
POSIX Environment on top of Mini-OS
Provides the standard C library functions:
getpid and similar functions return e.g. 1, since there is no notion of a Unix process.
The sig functions can be no-ops, since there are no signals either.
mmap is only implemented for one case.
POSIX Environment on top of Mini-OS
A FileSystem frontend gives access to part of the Dom0 file system, through the FileSystem frontend/backend mechanism.
By using a very simple virtualized kernel, the JavaGuest project avoids all the complicated semantics of a full-featured kernel, and hence permits far easier certification of the semantics of the JVM.
POSIX Environment on top of Mini-OS
Running a Mini-OS example: xm create -c domain_config
To detach the console from the domain, press 'Ctrl+]'.
A cross-compilation environment is used: binutils, gcc, newlib, lwip.
Old HVM Device Model (< Xen 3.3)
ioemu, a modified version of qemu, provides HVM domains with virtual hardware. It used to run in Dom0 as a root process, since it needs to directly access disks and the tap network, and the qemu code base was not particularly meant to be safe.
When an HVM guest performs an I/O operation, the hypervisor hands control to Dom0, which then may not schedule the ioemu process immediately, leading to uneven performance.
Old HVM Device Model
[Diagram: an IN/OUT instruction in the HVM domain traps to the Xen hypervisor, which forwards it to the qemu process running in Dom0 Linux]
The guest has to wait for Dom0 Linux to schedule qemu.
Xen 3.3.1 (compared to 3.2)
- Power management (P & C states) in the hypervisor
- HVM emulation domains (qemu-on-minios) for better scalability, performance and security
- PV-GRUB: boot PV kernels using real GRUB inside the PV domain
- Better PV performance: domain lock removed from pagetable-update paths
- Shadow3: optimizations to make this the best shadow pagetable algorithm yet, making HVM performance better than ever
- Hardware Assisted Paging enhancements: 2MB page support for better TLB locality
- CPUID feature leveling: allows safe domain migration across systems with different CPU models
- PVSCSI drivers for SCSI access direct into PV guests
- HVM framebuffer optimizations: scan for framebuffer updates more efficiently
- Device pass-through enhancements
- Full x86 real-mode emulation for HVM guests on Intel VT: supports a much wider range of legacy guest OSes
- New qemu merge with upstream development
- Many other changes in both x86 and IA64 ports
HVM Device Model Domain (Xen 3.3 Feature)
In Xen 3.3, ioemu can run in a stub domain: a dedicated device model domain for each HVM domain. It processes the I/O requests of the HVM guest and uses the regular PV interface to actually perform disk and network I/O.
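Following the Xen 3.3 stubdom documentation, an HVM guest is switched to a stub device-model domain by pointing `device_model` at the stubdom-dm wrapper; the sketch below assumes that path and uses an illustrative guest name and disk image:

```python
# Sketch of an HVM guest using a stub device-model domain (Xen 3.3).
kernel = "/usr/lib/xen/boot/hvmloader"
builder = "hvm"
memory = 1024
name = "hvm-guest"
# Run ioemu in a Mini-OS stub domain instead of as a Dom0 root process
# (path as given in the Xen 3.3 stubdom README).
device_model = "/usr/lib/xen/bin/stubdom-dm"
disk = ['file:/var/xen/hvm-guest.img,hda,w']  # illustrative path
vif = ['']
```

With this setup, the CPU time spent emulating devices is accounted to the stub domain rather than to Dom0, so the hypervisor's scheduler can charge each HVM guest for its own I/O.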
Stub Domain Helper domains for HVM guest Because the emulated devices are processes in Dom0, their execution time is accounted to Dom0. An HVM guest performing a lot of I/O can cause Dom0 to use an inordinate amount of CPU time, preventing other guests from getting their fair share of the CPU. Each HVM guest would have its own stub domain, responsible for its I/O.
Small stub domains run nothing other than the device emulators.
Stub Domain
The current schedulers in Xen assume that virtual machines are, for the most part, independent: if domain 2 is under-scheduled, this has no negative effect on domain 3. With an HVM and stub domain pair, however, the HVM guest is likely to be performance-limited by the amount of time allocated to the stub domain: when the stub domain is under-scheduled, the HVM domain sits around waiting for I/O.
Stub Domain
One proposal borrows an idea from the Spring operating system and, later, Solaris: allow a process to delegate the rest of its scheduling quantum to another. The stub domain would run whenever the pair needed to be scheduled; it would perform pending I/O emulation and then issue a "delegate" scheduler operation (instead of "yield") toward the HVM guest, which would run for the remainder of the quantum.
Stub Domain
Another proposal, by IBM, is based on work in the Nemesis Exokernel and is conceptually similar to the N:M threading model: the hypervisor's scheduler schedules a scheduler domain, which is then responsible for dividing time amongst the others in its group. In this way, the scheduler domain fulfills the same role as the user-space component of an N:M threading library.
HVM Device Model Domain
Provides better CPU usage accounting: the hypervisor schedules the stub domain directly.
[Diagram: an IN/OUT instruction in the HVM domain traps to the Xen hypervisor, which forwards it to qemu running on Mini-OS in the stub domain; Dom0 runs Linux as a regular PV domain]
HVM Device Model Domain
lnb: latency of I/O port accesses, i.e., the round-trip time between the application in the HVM domain and the virtual device emulation part of qemu.
HVM Device Model Domain
[Charts: CPU usage (%) and further measurements comparing the Dom0 device model with the stub-domain device model]
PV-GRUB
PyGRUB used to act as a "PV bootloader"; PV-GRUB is the real GRUB source code recompiled against Mini-OS. It runs inside the PV domain that will host the PV guest: it detects the PV disks and network interfaces of the domain, uses them to access the PV guest's menu.lst, uses the regular PV console to show the GRUB menu, and uses the PV interface to load the kernel image from the guest disk image.
It uses only the resources that the PV guest itself will use.
Allows "live" booting of a new kernel over the currently running one.
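In configuration terms, the guest's `kernel` becomes the PV-GRUB image itself, and `extra` tells it where to find the menu inside the guest disk. The image path and menu location below follow the Xen 3.3 stubdom documentation but should be treated as assumptions; the guest name and disk path are illustrative:

```python
# Sketch of a PV guest booted via PV-GRUB instead of PyGRUB.
# PV-GRUB runs *inside* the guest domain as its initial kernel,
# so no root-privileged bootloader runs in Dom0.
name = "pv-guest"
memory = 512
kernel = "/usr/lib/xen/boot/pv-grub-x86_32.gz"  # real GRUB linked against Mini-OS
extra = "(hd0,0)/boot/grub/menu.lst"            # menu read via the PV block interface
disk = ['file:/var/xen/pv-guest.img,xvda,w']    # illustrative path
vif = ['']
```

Contrast this with the PyGRUB configuration: the `bootloader` line in Dom0 disappears, and the menu is parsed entirely within the guest's own domain, using only the guest's own resources.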
PV-GRUB
Replaces GRUB's native drivers with Mini-OS drivers and adds a PV-kexec implementation to boot the loaded kernel. It uses only the target PV guest's resources.
References
- Samuel Thibault (Citrix/XenSource), "Stub Domains: A Step Towards Dom0 Disaggregation".
- Samuel Thibault and Tim Deegan, "Improving Performance by Embedding HPC Applications in Lightweight Xen Domains", HPCVIRT'08, Oct. 2008.
- David Chisnall, "The Definitive Guide to the Xen Hypervisor".
- "Xen 3.3 Features: Stub Domains".
- "Xen 3.3 Features: HVM Device Model Domain".
- "Xen 3.3 Features: PV-GRUB".
HVM Configuration
HVM (hardware virtual machine): hardware support is needed to trap privileged instructions, using a trap-and-emulate approach.
- VMX: Virtual Machine Extensions (Intel CPUs)
- SVM: Secure Virtual Machine (AMD CPUs)
In Intel's VT architecture, each such trap uses VM exit and VM entry transitions, which are costly.