  1. Stub Domain, Device Model Domain and PV-GRUB. Kwon-yong Lee, Distributed Computing & Communication Lab. (URL: http://dcclab.sogang.ac.kr), Dept. of Computer Science, Sogang University, Seoul, Korea. Tel: +82-2-3273-8783. Email: dlrnjsdyd@sogang.ac.kr
  2. Domain0 Disaggregation <ul><li>Big Dom0 Problems </li></ul><ul><ul><li>Running a lot of Xen components </li></ul></ul><ul><ul><ul><li>Physical device drivers </li></ul></ul></ul><ul><ul><ul><li>Domain manager </li></ul></ul></ul><ul><ul><ul><li>Domain builder </li></ul></ul></ul><ul><ul><ul><li>ioemu device models </li></ul></ul></ul><ul><ul><ul><li>PyGRUB </li></ul></ul></ul><ul><ul><li>Security issues </li></ul></ul><ul><ul><ul><li>Most of the components run as root. </li></ul></ul></ul><ul><ul><li>Scalability issues </li></ul></ul><ul><ul><ul><li>The hypervisor cannot itself schedule them appropriately. </li></ul></ul></ul><ul><li>Goal </li></ul><ul><ul><li>Move the components to separate domains </li></ul></ul><ul><ul><li>Helper domains </li></ul></ul><ul><ul><ul><li>Driver domain, Builder domain, Device model domains, etc. </li></ul></ul></ul>
  3. PyGRUB <ul><li>Acts as a “PV bootloader” </li></ul><ul><li>Allows booting a kernel that resides within the DomU disk or partition image </li></ul><ul><li>Needs to be root to access the guest disk </li></ul><ul><ul><li>Security issues </li></ul></ul><ul><li>Cannot network-boot </li></ul><ul><li>Re-implements GRUB </li></ul>(Diagram: Dom0 running xend, Linux and PyGRUB, and a PV Domain holding menu.lst, vmlinuz and initrd, on the Xen Hypervisor)
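For illustration only (this is not PyGRUB's actual code), the essence of what a PV bootloader extracts from the guest's menu.lst can be sketched as:

```python
# Illustrative sketch: pull each boot entry's kernel, arguments, and
# initrd out of a GRUB-legacy menu.lst. Function and field names are
# our own, not PyGRUB's.
def parse_menu_lst(text):
    entries, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("title "):
            current = {"title": line.split(None, 1)[1]}
            entries.append(current)
        elif current is not None and line.startswith("kernel "):
            parts = line.split()
            current["kernel"] = parts[1]           # path inside the guest image
            current["args"] = " ".join(parts[2:])  # kernel command line
        elif current is not None and line.startswith("initrd "):
            current["initrd"] = line.split()[1]
    return entries

sample = """\
default 0
timeout 5
title Linux
    kernel /boot/vmlinuz-2.6.18 root=/dev/xvda1 ro
    initrd /boot/initrd-2.6.18.img
"""
entries = parse_menu_lst(sample)
```

The point of the sketch is that the bootloader must open and interpret the guest's filesystem, which is why PyGRUB has to run as root in Dom0.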
  4. Mini-OS <ul><li>A sample PV guest for the Xen hypervisor </li></ul><ul><ul><li>Very simple </li></ul></ul><ul><ul><li>Relies completely on the hypervisor to access the machine </li></ul></ul><ul><ul><ul><li>Uses the Xen network, block, and console frontend/backend mechanism </li></ul></ul></ul><ul><ul><li>Supports only </li></ul></ul><ul><ul><ul><li>Non-preemptive threads </li></ul></ul></ul><ul><ul><ul><li>One virtual memory address space (no user space) </li></ul></ul></ul><ul><ul><ul><li>Single CPU (mono-VCPU) </li></ul></ul></ul>
  5. Mini-OS <ul><li>Xen 3.3 </li></ul><ul><ul><li>It has been extended to run the newlib C library and the lwIP stack, thus providing a basic POSIX environment, including TCP/IP networking. </li></ul></ul><ul><ul><li>xen-3.3.1/extras/mini-os/ </li></ul></ul><ul><li>PS) Also being tested at Cisco for IOS </li></ul>
  6. xen-3.3.1/extras/mini-os/README Minimal OS ---------- This shows some of the stuff that any guest OS will have to set up. This includes: * installing a virtual exception table * handling virtual exceptions * handling asynchronous events * enabling/disabling async events * parsing start_info struct at start-of-day * registering virtual interrupt handlers (for timer interrupts) * a simple page and memory allocator * minimal libc support * minimal Copy-on-Write support * network, block, framebuffer support * transparent access to FileSystem exports (see tools/fs-back) - to build it just type make. - to build it with TCP/IP support, download LWIP 1.3 source code and type make LWIPDIR=/path/to/lwip/source - to build it with much better libc support, see the stubdom/ directory - to start it do the following in domain0 (assuming xend is running) # xm create domain_config This starts the kernel and prints out a bunch of stuff and then once every second the system time. If you have setup a disk in the config file (e.g. disk = [ 'file:/tmp/foo,hda,r' ] ), it will loop reading it. If that disk is writable (e.g. disk = [ 'file:/tmp/foo,hda,w' ] ), it will write data patterns and re-read them. If you have setup a network in the config file (e.g. vif = [''] ), it will print incoming packets. If you have setup a VFB in the config file (e.g. vfb = ['type=sdl'] ), it will show a mouse with which you can draw color squares. If you have compiled it with TCP/IP support, it will run a daytime server on TCP port 13.
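Xen domain configuration files are plain Python syntax; the README's example settings could be collected into a single hypothetical domain_config like the following (the kernel path is a placeholder for the built Mini-OS image, not a path from the slides):

```python
# Hypothetical domain_config for the Mini-OS test kernel, collecting
# the example settings quoted in the README above.
kernel = "mini-os.gz"            # placeholder: the built Mini-OS image
memory = 32
name = "mini-os"
disk = ['file:/tmp/foo,hda,w']   # writable: Mini-OS writes test patterns and re-reads them
vif = ['']                       # with a network, it prints incoming packets
vfb = ['type=sdl']               # with a VFB, you can draw color squares with the mouse
```

Started with `xm create -c domain_config`, this prints the system time once per second as described in the README.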
  7. POSIX Environment on top of Mini-OS (Diagram: an Application running on newlib, lwIP and additional code (getpid, sig, mmap, …), on top of Mini-OS (Sched, MM, and console/network/block/FS/FB frontends), on the Xen Hypervisor)
  8. POSIX Environment on top of Mini-OS <ul><li>lwIP (lightweight IP) </li></ul><ul><ul><li>Provides a lightweight TCP/IP stack </li></ul></ul><ul><ul><ul><li>Just connects to the network frontend of Mini-OS </li></ul></ul></ul><ul><ul><li>Widely used open-source TCP/IP stack designed for embedded systems </li></ul></ul><ul><ul><li>Reduces resource usage while still providing a full-scale TCP </li></ul></ul><ul><li>PS) uIP </li></ul><ul><ul><li>TCP/IP stack for 8-bit microcontrollers </li></ul></ul>
  9. POSIX Environment on top of Mini-OS <ul><li>newlib </li></ul><ul><ul><li>Provides the standard C library functions </li></ul></ul><ul><ul><li>(The GNU libc could be used instead.) </li></ul></ul><ul><li>Others </li></ul><ul><ul><li>getpid and similar functions just return e.g. 1. </li></ul></ul><ul><ul><ul><li>Mini-OS has no notion of a Unix process. </li></ul></ul></ul><ul><ul><li>sig functions can be no-ops. </li></ul></ul><ul><ul><ul><li>There are no signals either. </li></ul></ul></ul><ul><ul><li>mmap is implemented for only one case. </li></ul></ul><ul><ul><ul><li>Anonymous memory </li></ul></ul></ul>
  10. POSIX Environment on top of Mini-OS <ul><li>Disk frontend </li></ul><ul><li>FrameBuffer frontend </li></ul><ul><li>FileSystem frontend (to access part of the Dom0 FS) </li></ul><ul><ul><li>Through the FileSystem frontend/backend mechanism </li></ul></ul><ul><ul><ul><li>Imported from JavaGuest </li></ul></ul></ul><ul><ul><ul><ul><li>By using a very simple virtualized kernel, the JavaGuest project avoids all the complicated semantics of a full-featured kernel, and hence permits far easier certification of the semantics of the JVM. </li></ul></ul></ul></ul><ul><li>More advanced MM </li></ul><ul><ul><li>Read-only memory </li></ul></ul><ul><ul><li>CoW for zeroed pages </li></ul></ul>
  11. POSIX Environment on top of Mini-OS <ul><li>Running a Mini-OS example </li></ul><ul><ul><li>Prints a timestamp once per second </li></ul></ul><ul><ul><li>xm create -c domain_config </li></ul></ul><ul><ul><li>Press ‘Ctrl+]’ to disconnect the console from the domain </li></ul></ul><ul><li>Cross-compilation environment </li></ul><ul><ul><li>binutils, gcc, newlib, lwip </li></ul></ul><ul><ul><li>Ex) ‘Hello World!’ </li></ul></ul><ul><ul><ul><li>xen-3.3.1/stubdom/c/ </li></ul></ul></ul>
  12. Old HVM Device Model (< Xen 3.3) <ul><li>Modified version of qemu, ioemu </li></ul><ul><ul><li>To provide HVM domains with virtual hardware </li></ul></ul><ul><ul><li>Used to run in Dom0 as a root process, since it needs to directly access disks and the tap network </li></ul></ul><ul><ul><li>Problems </li></ul></ul><ul><ul><ul><li>Security </li></ul></ul></ul><ul><ul><ul><ul><li>The qemu code base was not particularly meant to be safe. </li></ul></ul></ul></ul><ul><ul><ul><li>Efficiency </li></ul></ul></ul><ul><ul><ul><ul><li>When an HVM guest performs an I/O operation, the hypervisor hands control to Dom0, which then may not schedule the ioemu process immediately, leading to uneven performance. </li></ul></ul></ul></ul>
  13. Old HVM Device Model <ul><li>Has to wait for Dom0 Linux to schedule qemu </li></ul><ul><li>Consumes Dom0 CPU time </li></ul>(Diagram: the HVM Domain's IN/OUT operations are serviced by qemu running on Linux in Dom0, on the Xen Hypervisor)
  14. Xen 3.3.1 (compared to 3.2) <ul><li>Power management (P & C states) in the hypervisor </li></ul><ul><li>HVM emulation domains (qemu-on-minios) for better scalability, performance and security </li></ul><ul><li>PVGrub: boot PV kernels using real GRUB inside the PV domain </li></ul><ul><li>Better PV performance: domain lock removed from pagetable-update paths </li></ul><ul><li>Shadow3: optimizations to make this the best shadow pagetable algorithm yet, making HVM performance better than ever </li></ul><ul><li>Hardware Assisted Paging enhancements: 2MB page support for better TLB locality </li></ul><ul><li>CPUID feature leveling: allows safe domain migration across systems with different CPU models </li></ul><ul><li>PVSCSI drivers for SCSI access direct into PV guests </li></ul><ul><li>HVM frame-buffer optimizations: scan for frame-buffer updates more efficiently </li></ul><ul><li>Device pass-through enhancements </li></ul><ul><li>Full x86 real-mode emulation for HVM guests on Intel VT: supports a much wider range of legacy guest OSes </li></ul><ul><li>New qemu merge with upstream development </li></ul><ul><li>Many other changes in both x86 and IA64 ports </li></ul>
  15. HVM Device Model Domain (Xen 3.3 Feature) <ul><li>In Xen 3.3, ioemu can be run in a Stub Domain. </li></ul><ul><ul><li>Dedicated Device Model Domain for each HVM domain </li></ul></ul><ul><ul><li>Device Model Domain </li></ul></ul><ul><ul><ul><li>Processes the I/O requests of the HVM guest </li></ul></ul></ul><ul><ul><ul><li>Uses the regular PV interface to actually perform disk and network I/O </li></ul></ul></ul>
  16. Stub Domain <ul><li>Helper domains for HVM guests </li></ul><ul><ul><li>Because the emulated devices are processes in Dom0, their execution time is accounted to Dom0. </li></ul></ul><ul><ul><ul><li>An HVM guest performing a lot of I/O can cause Dom0 to use an inordinate amount of CPU time, preventing other guests from getting their fair share of the CPU. </li></ul></ul></ul><ul><ul><li>Each HVM guest has its own stub domain, responsible for its I/O. </li></ul></ul><ul><ul><ul><li>Small stub domains run nothing other than the device emulators. </li></ul></ul></ul><ul><ul><li>Based on Mini-OS </li></ul></ul><ul><ul><li>xen-3.3.1/stubdom/ </li></ul></ul>
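As a sketch of how a guest opts into this, the HVM guest's configuration selects the stub-domain device model by pointing device_model at the stubdom-dm wrapper instead of the ordinary qemu-dm. The exact paths below are assumptions based on a typical Xen 3.3 install layout, not taken from the slides:

```python
# Sketch of the relevant lines in an HVM guest config when using a
# stub-domain device model (paths are assumptions; check your install).
kernel = "/usr/lib/xen/boot/hvmloader"
builder = "hvm"
device_model = "/usr/lib/xen/bin/stubdom-dm"  # run ioemu in a stub domain
memory = 512
disk = ['file:/path/to/guest.img,hda,w']
vif = ['']
```

The stub domain itself then performs the guest's disk and network I/O through the regular PV interfaces, as described above.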
  17. Stub Domain <ul><li>Tricky scheduling </li></ul><ul><ul><li>The current schedulers in Xen are based on the assumption that virtual machines are, for the most part, independent. </li></ul></ul><ul><ul><ul><li>If domain 2 is under-scheduled, this has no negative effect on domain 3. </li></ul></ul></ul><ul><ul><li>With HVM and stub domain pairs, </li></ul></ul><ul><ul><ul><li>The HVM guest is likely to be performance-limited by the amount of time allocated to the stub domain. </li></ul></ul></ul><ul><ul><ul><li>In cases where the stub domain is under-scheduled, the HVM domain sits around waiting for I/O. </li></ul></ul></ul><ul><ul><li>Potential solutions </li></ul></ul><ul><ul><ul><li>Doors </li></ul></ul></ul><ul><ul><ul><li>Scheduler domains </li></ul></ul></ul>
  18. Stub Domain <ul><li>Doors </li></ul><ul><ul><li>From the Spring operating system and later Solaris </li></ul></ul><ul><ul><li>An IPC mechanism </li></ul></ul><ul><ul><ul><li>Allows a process to delegate the rest of its scheduling quantum to another </li></ul></ul></ul><ul><ul><ul><li>The stub domain would run whenever the pair needed to be scheduled. </li></ul></ul></ul><ul><ul><ul><li>It would then perform pending I/O emulation and “delegate” (rather than “yield”) to the HVM guest, which would then run for the remainder of the quantum. </li></ul></ul></ul>
  19. Stub Domain <ul><li>Scheduler domains </li></ul><ul><ul><li>Proposed by IBM based on work in the Nemesis Exokernel </li></ul></ul><ul><ul><li>Conceptually similar to the N:M threading model </li></ul></ul><ul><ul><ul><li>The hypervisor’s scheduler would schedule this domain, and it would be responsible for dividing time amongst the others in the group. </li></ul></ul></ul><ul><ul><ul><li>In this way, the scheduler domain fulfills the same role as the user-space component of an N:M threading library. </li></ul></ul></ul>
  20. HVM Device Model Domain <ul><li>Almost unmodified qemu </li></ul><ul><li>Relieves Dom0 </li></ul><ul><li>Provides better CPU usage accounting </li></ul><ul><li>More efficient </li></ul><ul><ul><li>Lets the hypervisor schedule it directly </li></ul></ul><ul><ul><li>More lightweight OS </li></ul></ul><ul><li>A lot safer </li></ul>(Diagram: the HVM Domain's IN/OUT operations are serviced by qemu on Mini-OS in the stub domain, which performs PV I/O through Linux in Dom0, all on the Xen Hypervisor)
  21. HVM Device Model Domain <ul><li>Performance </li></ul><ul><ul><li>inb : latency of I/O port accesses </li></ul></ul><ul><ul><ul><li>The round-trip time between the application in the HVM domain and the virtual device emulation part of qemu </li></ul></ul></ul>
  22. HVM Device Model Domain <ul><ul><li>Disk performance </li></ul></ul>(Chart: CPU %)
  23. HVM Device Model Domain <ul><ul><li>Network performance </li></ul></ul><ul><ul><ul><li>e1000 </li></ul></ul></ul>
  24. HVM Device Model Domain <ul><ul><li>Network performance </li></ul></ul><ul><ul><ul><li>bicore (dual-core) </li></ul></ul></ul>
  25. PV-GRUB <ul><li>PyGRUB used to act as a “PV bootloader.” </li></ul><ul><li>PV-GRUB </li></ul><ul><ul><li>Real GRUB source code recompiled against Mini-OS </li></ul></ul><ul><ul><li>Runs inside the PV domain that will host the PV guest </li></ul></ul><ul><ul><li>Boots inside the PV domain </li></ul></ul><ul><ul><li>Detects the PV disks and network interfaces of the domain </li></ul></ul><ul><ul><li>Uses them to access the PV guest’s menu.lst </li></ul></ul><ul><ul><li>Uses the regular PV console to show the GRUB menu </li></ul></ul><ul><ul><li>Uses the PV interface to load the kernel image from the guest disk image </li></ul></ul><ul><li>More secure than PyGRUB </li></ul><ul><ul><li>Uses only the resources that the PV guest will use </li></ul></ul>
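As a hedged sketch of the configuration side: instead of `bootloader = 'pygrub'`, the guest config boots the PV-GRUB image itself as the domain's kernel, with `extra` naming the menu.lst inside the guest's own disk. The image name and paths below are assumptions that vary by build and architecture:

```python
# Sketch: booting a PV guest through PV-GRUB instead of PyGRUB
# (image name and paths are assumptions; adjust for your build).
kernel = "/usr/lib/xen/boot/pv-grub-x86_32.gz"  # the GRUB-on-Mini-OS image
extra = "(hd0,0)/boot/grub/menu.lst"            # menu.lst on the guest's own disk
memory = 256
disk = ['file:/path/to/guest.img,xvda,w']
vif = ['']   # PV-GRUB can also network-boot over this interface
```

PV-GRUB then kexecs the chosen kernel in place, so only the guest's own resources are ever touched.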
  26. PV-GRUB <ul><li>Start </li></ul>
  27. PV-GRUB <ul><li>Loading </li></ul>
  28. PV-GRUB <ul><li>Loaded </li></ul><ul><li>kexec (kernel execution) </li></ul><ul><ul><li>Allows “live” booting of a new kernel over the currently running one </li></ul></ul>
  29. PV-GRUB
  30. PV-GRUB <ul><li>Executes upstream GRUB </li></ul><ul><ul><li>Replaces native drivers with Mini-OS drivers </li></ul></ul><ul><ul><li>Adds a PV-kexec implementation </li></ul></ul><ul><li>Uses only the target PV guest’s resources </li></ul><ul><li>Improves security </li></ul><ul><li>Provides network boot </li></ul>
  31. Reference <ul><li>Samuel Thibault, Citrix/XenSource, “Stub Domains: A Step Towards Dom0 Disaggregation” </li></ul><ul><li>Samuel Thibault and Tim Deegan, “Improving Performance by Embedding HPC Applications in Lightweight Xen Domains”, HPCVIRT’08, Oct. 2008. </li></ul><ul><li>David Chisnall, “The Definitive Guide to the Xen Hypervisor” </li></ul><ul><li>http://blog.xen.org </li></ul><ul><ul><li>Xen 3.3 Features: Stub Domains </li></ul></ul><ul><ul><li>Xen 3.3 Features: HVM Device Model Domain </li></ul></ul><ul><ul><li>Xen 3.3 Features: PV-GRUB </li></ul></ul>
  32. HVM Configuration <ul><li>Para-virtualization </li></ul><ul><ul><li>Hypercall </li></ul></ul><ul><li>HVM (hardware virtual machine) </li></ul><ul><ul><li>Hardware support is needed to trap privileged instructions. </li></ul></ul><ul><ul><li>Trap-and-emulate approach </li></ul></ul><ul><ul><li>Processor flag </li></ul></ul><ul><ul><ul><li>vmx : virtual machine extensions – Intel CPU </li></ul></ul></ul><ul><ul><ul><li>svm : secure virtual machine – AMD CPU </li></ul></ul></ul><ul><ul><li>In Intel’s VT architecture </li></ul></ul><ul><ul><ul><li>Uses VMexit and VMentry operations, which are costly </li></ul></ul></ul>
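On Linux, the vmx and svm processor flags appear in the flags line of /proc/cpuinfo. A small illustrative parser (the helper name and sample strings are ours, not from the slides):

```python
# Sketch: detect hardware virtualization support from /proc/cpuinfo-style
# text ("vmx" = Intel VT, "svm" = AMD-V). Pass in the file contents so the
# check is testable without reading the live /proc filesystem.
def hvm_capable(cpuinfo_text):
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            return "vmx" in flags or "svm" in flags
    return False

# Hypothetical sample flag lines for illustration:
intel = "flags\t\t: fpu vme pse tsc msr vmx sse2"
amd   = "flags\t\t: fpu vme pse tsc msr svm sse2"
old   = "flags\t\t: fpu vme pse tsc msr sse2"
```

In practice one would call `hvm_capable(open("/proc/cpuinfo").read())`; without either flag, only para-virtualized guests can run.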