12393401222009.04.02.ppt Presentation Transcript

  • 1. Stub Domain Device Model Domain and PV-GRUB Kwon-yong Lee Distributed Computing & Communication Lab. (URL: http://dcclab.sogang.ac.kr) Dept. of Computer Science Sogang University Seoul, Korea Tel : +82-2-3273-8783 Email : dlrnjsdyd@sogang.ac.kr
  • 2. Domain0 Disaggregation
    • Big Dom0 Problems
      • Running a lot of Xen components
        • Physical device drivers
        • Domain manager
        • Domain builder
        • ioemu device models
        • PyGRUB
      • Security issues
        • Most of the components run as root.
      • Scalability issues
        • The hypervisor itself cannot schedule them appropriately.
    • Goal
      • Move the components to separate domains
      • Helper domains
        • Driver domain, Builder domain, Device model domains, etc.
  • 3. PyGRUB
    • Acts as a “PV bootloader”
    • Allows booting from a kernel that resides within the DomU disk or partition image
    • Needs to be root to access guest disk
      • Security issues
    • Can’t network boot
    • Re-implements GRUB
    [Diagram: Xen Hypervisor; Dom0 running Linux, xend, and PyGRUB; PV Domain with menu.lst, vmlinuz, initrd]
  • 4. Mini-OS
    • A sample PV guest for the Xen hypervisor
      • Very simple
      • Relies completely on the hypervisor to access the machine
        • Uses the Xen network, block, and console frontend/backend mechanism
      • Supports only
        • Non-preemptive threads
        • One virtual memory address space (no user space)
        • Single CPU (mono-VCPU)
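The "non-preemptive threads" restriction can be illustrated with a small model. This is only a sketch in Python, not Mini-OS code: `worker` and `run` are hypothetical names, and generators stand in for threads that give up the CPU only at explicit yield points.

```python
from collections import deque

def worker(name, steps, log):
    """A cooperative 'thread': it runs until it explicitly yields."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # explicit yield point; nothing preempts the thread in between

def run(threads):
    """Round-robin over ready threads, switching only when a thread yields."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # run until the next yield
            ready.append(thread)  # still alive: back of the ready queue
        except StopIteration:
            pass                  # thread finished

log = []
run([worker("a", 2, log), worker("b", 1, log)])
```

A thread that never reaches a yield point would monopolize the single VCPU in this model, which is the usual trade-off of a non-preemptive design.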
  • 5. Mini-OS
    • Xen 3.3
      • It has been extended to run the newlib C library and the lwIP stack, thus providing a basic POSIX environment, including TCP/IP networking.
      • xen-3.3.1/extras/mini-os/
    • Note: being tested at Cisco for IOS
  • 6. xen-3.3.1/extras/mini-os/README
    Minimal OS
    ----------
    This shows some of the stuff that any guest OS will have to set up. This includes:
      * installing a virtual exception table
      * handling virtual exceptions
      * handling asynchronous events
      * enabling/disabling async events
      * parsing start_info struct at start-of-day
      * registering virtual interrupt handlers (for timer interrupts)
      * a simple page and memory allocator
      * minimal libc support
      * minimal Copy-on-Write support
      * network, block, framebuffer support
      * transparent access to FileSystem exports (see tools/fs-back)
    - to build it just type make.
    - to build it with TCP/IP support, download LWIP 1.3 source code and type
        make LWIPDIR=/path/to/lwip/source
    - to build it with much better libc support, see the stubdom/ directory
    - to start it do the following in domain0 (assuming xend is running)
        # xm create domain_config
    This starts the kernel and prints out a bunch of stuff and then once every second the system time.
    If you have setup a disk in the config file (e.g. disk = [ 'file:/tmp/foo,hda,r' ] ), it will loop reading it. If that disk is writable (e.g. disk = [ 'file:/tmp/foo,hda,w' ] ), it will write data patterns and re-read them.
    If you have setup a network in the config file (e.g. vif = [''] ), it will print incoming packets.
    If you have setup a VFB in the config file (e.g. vfb = ['type=sdl'] ), it will show a mouse with which you can draw color squares.
    If you have compiled it with TCP/IP support, it will run a daytime server on TCP port 13.
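The options quoted in the README could be combined into one `domain_config` file roughly as follows. This is a sketch only: xm configuration files use Python syntax, the `disk`, `vif`, and `vfb` values are the README's own examples, and the kernel file name, memory size, and domain name are assumptions.

```python
# Hypothetical domain_config for the Mini-OS example.
kernel = "mini-os.gz"            # assumed name of the image produced by `make`
memory = 32                      # MB; assumed value
name = "Mini-OS"                 # assumed domain name

disk = ['file:/tmp/foo,hda,r']   # read-only disk: Mini-OS loops reading it
vif = ['']                       # default network interface: incoming packets are printed
vfb = ['type=sdl']               # virtual framebuffer shown in an SDL window
```

Changing the disk mode from `r` to `w` would, per the README, make Mini-OS write data patterns and re-read them.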
  • 7. POSIX Environment on top of Mini-OS
    [Diagram: Application; newlib, lwIP, and additional code (getpid, sig, mmap, …); Mini-OS with Sched, MM, and Console, Network, Block, FS, and FB frontends; Xen Hypervisor]
  • 8. POSIX Environment on top of Mini-OS
    • lwIP (lightweight IP)
      • Provides a lightweight TCP/IP stack
        • Just connect to the network frontend of Mini-OS
      • Widely used open source TCP/IP stack designed for embedded systems
      • Reduces resource usage while still providing a full-scale TCP implementation
    • Note: uIP
      • TCP/IP stack for 8-bit microcontrollers
  • 9. POSIX Environment on top of Mini-OS
    • newlib
      • Provides the standard C library functions
      • Or GNU libc
    • Others
      • getpid and similar functions return a fixed value, e.g. 1.
        • Mini-OS has no notion of Unix processes
      • Signal functions can be no-ops.
        • Mini-OS has no signals either
      • mmap is implemented for only one case.
        • Anonymous memory
  • 10. POSIX Environment on top of Mini-OS
    • Disk frontend
    • FrameBuffer frontend
    • FileSystem frontend (to access part of the Dom0 FS)
      • Through the FileSystem frontend/backend mechanism
        • Imported from JavaGuest
          • By using a very simple virtualized kernel, the JavaGuest project avoids all the complicated semantics of a full-featured kernel, and hence permits far easier certification of the semantics of the JVM.
    • More advanced MM
      • Read-only memory
      • CoW for zeroed pages
  • 11. POSIX Environment on top of Mini-OS
    • Running a Mini-OS example
      • A timestamp is printed once per second
      • xm create -c domain_config
      • To disconnect from the domain's console, press ‘Ctrl+]’
    • Cross-compilation environment
      • binutils, gcc, newlib, lwip
      • Ex) ‘Hello World!’
        • xen-3.3.1/stubdom/c/
  • 12. Old HVM Device Model (< Xen 3.3)
    • Modified version of qemu, ioemu
      • To provide HVM domains with virtual hardware
      • Used to run in Dom0 as a root process, since it needs direct access to disks and to the tap network device
      • Problems
        • Security
          • The qemu code base was not particularly meant to be safe
        • Efficiency
          • When an HVM guest performs an I/O operation, the hypervisor hands control to Dom0, which may not schedule the ioemu process immediately, leading to uneven performance.
  • 13. Old HVM Device Model
    • Has to wait for Dom0 Linux to schedule qemu
    • Consumes Dom0 CPU time
    [Diagram: Xen Hypervisor; Dom0 running Linux and qemu; HVM Domain issuing IN/OUT]
  • 14. Xen 3.3.1 (compared to 3.2)
    • Power management (P & C states) in the hypervisor
    • HVM emulation domains (qemu-on-minios) for better scalability, performance and security
    • PVGrub: boot PV kernels using real GRUB inside the PV domain
    • Better PV performance: domain lock removed from pagetable-update paths
    • Shadow3: optimizations to make this the best shadow pagetable algorithm yet, making HVM performance better than ever
    • Hardware Assisted Paging enhancements: 2MB page support for better TLB locality
    • CPUID feature leveling: allows safe domain migration across systems with different CPU models
    • PVSCSI drivers for SCSI access direct into PV guests
    • HVM frame-buffer optimizations: scan for frame-buffer updates more efficiently
    • Device pass-through enhancements
    • Full x86 real-mode emulation for HVM guests on Intel VT: supports a much wider range of legacy guest OSes
    • New qemu merge with upstream development
    • Many other changes in both x86 and IA64 ports
  • 15. HVM Device Model Domain (Xen 3.3 Feature)
    • In Xen 3.3, ioemu can be run in a Stub Domain.
      • Dedicated Device Model Domain for each HVM domain
      • Device Model Domain
        • Processes the I/O requests of the HVM guest
        • Uses the regular PV interface to actually perform disk and network I/O
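As an illustration of running ioemu in a stub domain, an HVM guest's xm configuration (Python syntax) might select the stub-domain device model as sketched below. The paths are assumptions based on a default Xen 3.3 installation and may differ on a given system; xen-3.3.1/stubdom/README is the authoritative reference for the exact setup, including the companion configuration for the device model domain itself.

```python
# Hypothetical HVM guest config fragment; all paths and values are assumptions.
kernel = "/usr/lib/xen/boot/hvmloader"
builder = "hvm"
memory = 512
name = "hvmguest"

# Use the stub-domain launcher instead of a qemu process in Dom0 (assumed path).
device_model = "/usr/lib/xen/bin/stubdom-dm"
```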
  • 16. Stub Domain
    • Helper domains for HVM guest
      • Because the emulated devices are processes in Dom0, their execution time is accounted to Dom0.
        • An HVM guest performing a lot of I/O can cause Dom0 to use an inordinate amount of CPU time, preventing other guests from getting their fair share of the CPU.
      • Each HVM guest would have its own stub domain, responsible for its I/O.
        • Small stub domains run nothing other than the device emulators.
      • Based on Mini-OS
      • xen-3.3.1/stubdom/
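The accounting argument can be made concrete with a toy model. This is a sketch, not Xen code: `account` is a hypothetical helper and the millisecond figures are invented purely for illustration.

```python
def account(emulation_ms, use_stubdom):
    """Return the CPU time (ms) charged to each domain for device emulation.

    emulation_ms maps an HVM guest name to the time its device model consumed.
    """
    charged = {"Dom0": 0}
    for guest, cost in emulation_ms.items():
        # Without stub domains every device model is a Dom0 process.
        owner = f"stubdom-{guest}" if use_stubdom else "Dom0"
        charged[owner] = charged.get(owner, 0) + cost
    return charged

load = {"hvm1": 30, "hvm2": 5}    # invented figures
in_dom0 = account(load, use_stubdom=False)
in_stubdoms = account(load, use_stubdom=True)
```

In the first case all 35 ms land on Dom0; in the second each guest's emulation cost is charged to its own stub domain, so the scheduler can hold the I/O-heavy guest accountable.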
  • 17. Stub Domain
    • Tricky scheduling
      • The current schedulers in Xen are based on the assumption that virtual machines are, for the most part, independent.
        • If domain 2 is under-scheduled, this doesn’t have a negative effect on domain 3.
      • With HVM and stub domain pairs,
        • The HVM guest is likely to be performance-limited by the amount of time allocated to the stub domain.
        • When the stub domain is under-scheduled, the HVM domain sits around waiting for I/O.
      • Potential solutions
        • Doors
        • Scheduler domains
  • 18. Stub Domain
    • Doors
      • From the Spring operating system and later Solaris
      • IPC mechanism
        • Allows a process to delegate the rest of its scheduling quantum to another
        • The stub domain would run whenever the pair needed to be scheduled.
        • It would then perform any pending I/O emulation and issue a “delegate” scheduler operation (instead of “yield”) targeting the HVM guest, which would then run for the remainder of the quantum.
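The difference between "yield" and the proposed "delegate" operation can be sketched with a toy model of a single scheduling slot. This is an illustration only: `run_slot` is a hypothetical function, not a Xen hypercall, and the timings are invented.

```python
def run_slot(quantum_ms, emulation_ms, delegate):
    """Return (stub_ms, guest_ms) consumed in one slot given to the stub/HVM pair."""
    stub_ms = min(emulation_ms, quantum_ms)  # stub domain emulates pending I/O first
    remainder = quantum_ms - stub_ms
    # With "delegate" the rest of the quantum goes to the HVM guest;
    # with "yield" it returns to the scheduler and the guest must wait its turn.
    guest_ms = remainder if delegate else 0
    return stub_ms, guest_ms

with_delegate = run_slot(quantum_ms=30, emulation_ms=4, delegate=True)
with_yield = run_slot(quantum_ms=30, emulation_ms=4, delegate=False)
```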
  • 19. Stub Domain
    • Scheduler domains
      • Proposed by IBM based on work in the Nemesis Exokernel
      • Similar conceptually to the N:M threading model
        • The hypervisor’s scheduler would schedule this domain, and it would be responsible for dividing time amongst the others in the group.
        • In this way, the scheduler domain fulfills the same role as the user-space component of an N:M threading library.
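The two-level idea can be sketched as follows: the hypervisor grants one slice to the group, and the scheduler domain divides it among the members. This is a toy model; `divide_slice` and the weights are hypothetical, and a real scheduler domain could apply any policy it likes.

```python
def divide_slice(slice_ms, weights):
    """Split the slice granted to a scheduler domain among its group members.

    weights maps a member domain to its relative share of the group's time.
    """
    total = sum(weights.values())
    return {domain: slice_ms * weight / total for domain, weight in weights.items()}

# The hypervisor schedules the group as a single entity for 30 ms (invented figure);
# the scheduler domain then decides how the stub domain and HVM guest share it.
shares = divide_slice(30, {"stubdom": 1, "hvm": 2})
```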
  • 20. HVM Device Model Domain
    • Almost unmodified qemu
    • Relieves Dom0
    • Provides better CPU usage accounting
    • More efficient
      • Lets the hypervisor schedule it directly
      • More lightweight OS
    • A lot safer
    [Diagram: Xen Hypervisor; stubdom running qemu on Mini-OS; HVM Domain issuing IN/OUT; Dom0 running Linux; PV]
  • 21. HVM Device Model Domain
    • Performance
      • inb: latency of I/O port accesses
        • The round trip time between the application in the HVM domain and the virtual device emulation part of qemu
  • 22. HVM Device Model Domain
      • Disk performance
    [Chart: CPU %]
  • 23. HVM Device Model Domain
      • Network performance
        • e1000
  • 24. HVM Device Model Domain
      • Network performance
        • Dual-core
  • 25. PV-GRUB
    • PyGRUB used to act as a “PV bootloader”
    • PV-GRUB
      • Real GRUB source code recompiled against Mini-OS
      • Runs inside the PV domain that will host the PV guest
      • Boot inside PV domain
      • Detect the PV disks and network interfaces of the domain
      • Use that to access the PV guests’ menu.lst
      • Use the regular PV console to show the GRUB menu
      • Use the PV interface to load the kernel image from the guest disk image
    • More secure than PyGRUB
      • Uses only the resources that the PV guest will use
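A PV guest configuration that boots through PV-GRUB might look like the sketch below (xm configuration files use Python syntax). The PV-GRUB image path, the disk image path, and the `(hd0,0)` location of menu.lst are assumptions that depend on the installation and the guest's disk layout.

```python
# Hypothetical PV guest config booting via PV-GRUB; all paths are assumptions.
# With PyGRUB one would instead set something like: bootloader = "/usr/bin/pygrub"
kernel = "/usr/lib/xen/boot/pv-grub.gz"      # PV-GRUB image runs as the domain's kernel
extra = "(hd0,0)/boot/grub/menu.lst"         # where PV-GRUB looks for the guest's menu.lst
memory = 256
name = "pvguest"
disk = ['file:/path/to/guest.img,xvda,w']    # placeholder guest disk image
vif = ['']                                   # PV network interface, usable for network boot
```

Because PV-GRUB runs inside the guest's own domain, it only ever sees the disk and network interfaces listed in this file.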
  • 26. PV-GRUB
    • Start
  • 27. PV-GRUB
    • Loading
  • 28. PV-GRUB
    • Loaded
    • kexec (kernel execution)
      • Allows “live” booting of a new kernel over the currently running one
  • 29. PV-GRUB
  • 30. PV-GRUB
    • Executes upstream GRUB
      • Replace native drivers with Mini-OS drivers
      • Add PV-kexec implementation
    • Just uses the target PV guest resources
    • Improve security
    • Provides network boot
  • 31. Reference
    • Samuel Thibault, Citrix/Xensource, “Stub Domains: A Step Towards Dom0 Disaggregation”
    • Samuel Thibault and Tim Deegan, “Improving Performance by Embedding HPC Applications in Lightweight Xen Domains”, HPCVIRT’08, Oct. 2008.
    • David Chisnall, “The Definitive Guide to the Xen Hypervisor”
    • http://blog.xen.org
      • Xen 3.3 Features: Stub Domains
      • Xen 3.3 Features: HVM Device Model Domain
      • Xen 3.3 Features: PV-GRUB
  • 32. HVM Configuration
    • Para-virtualization
      • Hypercall
    • HVM (hardware virtualized machine)
      • Hardware support is needed to trap privileged instructions.
      • Trap-and-emulate approach
      • Processor flag
        • vmx : virtual machine extensions – Intel CPU
        • svm : Secure Virtual Machine – AMD CPU
      • In Intel’s VT architecture
        • Uses VM exit and VM entry operations, which are costly
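On Linux, the processor flags mentioned above can be looked up in /proc/cpuinfo. A minimal sketch, assuming the usual `flags : ...` line format; `hvm_support` is a hypothetical helper name, and the flags may be hidden when the kernel itself runs as a guest.

```python
def hvm_support(cpuinfo_text):
    """Return which hardware virtualization flag appears in /proc/cpuinfo text, if any."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    if "vmx" in flags:
        return "Intel VT (vmx)"
    if "svm" in flags:
        return "AMD SVM (svm)"
    return None

# Typical usage on a Linux host:
#   with open("/proc/cpuinfo") as f:
#       print(hvm_support(f.read()))
sample = "processor\t: 0\nflags\t\t: fpu vme de pse vmx sse2\n"
result = hvm_support(sample)
```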