KVM: Virtualisation The Linux Way
Amit Shah
amit.shah@qumranet.com




GEEP
Virtualisation Strategies

 “Native” Hypervisors
    Have a runtime
    Need a “primary” guest OS
    Examples: Xen, VMware ESX Server, IBM mainframes
 Containers
   Different namespaces for different guests
   Run on host kernel
   Userland can be different from host
   Examples: OpenVZ, FreeVPS, Linux-Vserver
 Paravirtualisation
 Emulation
   Examples: QEMU, PearPC

  Copyright © 2007 Qumranet, Inc. All rights reserved.
KVM: Architectures Supported

    S390
      IBM mainframes: a hypervisor is a must
      Included in 2.6.26
    IA-64
      Included in 2.6.26
    X86
      Included in 2.6.20
      KVM-lite: PV Linux guest on non-VT-x / non-SVM host (proposed)
    PowerPC
      PV
      Architecture support for hypervisor
      Included in 2.6.26
X86 Hardware Extensions

 'guest mode' in addition to user and kernel modes
 Raise a trap for all privileged instructions
 Virtualised registers
 Processor
    Intel VT-x (VMX)
    AMD-V (SVM)
 MM
   EPT (Intel)
   NPT (AMD)
 IO
      VT-d (Intel)
      IOMMU (AMD)
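
A quick way to see whether a host offers the processor extensions listed above is to look for the `vmx` (Intel) or `svm` (AMD) flag in /proc/cpuinfo on an x86 Linux host. The sketch below is only an illustration; the function names are made up for this example and are not part of KVM.

```python
def virt_extensions(cpuinfo_text):
    """Return the hardware virtualisation flags found in /proc/cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            if "vmx" in flags:
                found.add("vmx")  # Intel VT-x
            if "svm" in flags:
                found.add("svm")  # AMD-V
    return found


def read_host_cpuinfo(path="/proc/cpuinfo"):
    """Read the host's cpuinfo; return an empty string if it is unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(virt_extensions(read_host_cpuinfo()) or "no vmx/svm flag found")
```

An empty result means the flag is not exposed to this system; that can also happen when the extension is disabled in firmware.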


  Copyright © 2007 Qumranet, Inc. All rights reserved.
What's handled in the kernel?

     CPU virtualisation (special instructions)
     MMU virtualisation
     Local APIC, PIC, IOAPIC, PIT
     (guest) paravirtualised network and block device
     drivers
          virtio-net
          virtio-block
     (guest) paravirtualised kernel support code
          paravirt_ops
          MMU
     (guest) paravirtualised clock driver

KVM Process Model




 [Figure: ordinary tasks and guests run side by side as processes on top of the Linux kernel]


KVM Process Model (cont'd)

    Guests are scheduled as regular processes
    kill(1), top(1) work as expected
    Guest physical memory is mapped into the task's
    virtual memory space
    Virtual processors in one VM are threads
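
As a loose analogy for the last two points, the toy below uses an anonymous memory mapping as the "guest physical memory" of a process and one thread per "virtual processor". It does not use KVM at all; `run_toy_vm` and the sizes are assumptions chosen for illustration.

```python
import mmap
import threading

GUEST_RAM_BYTES = 4 * mmap.PAGESIZE  # toy size, not a realistic guest


def run_toy_vm(num_vcpus=2):
    # "Guest physical memory" is just an anonymous mapping in this process.
    guest_ram = mmap.mmap(-1, GUEST_RAM_BYTES)

    def vcpu(index):
        # Each "virtual CPU" is a thread of the same process and sees
        # the same mapping; it marks its own byte so we can check it ran.
        guest_ram[index] = index + 1

    threads = [
        threading.Thread(target=vcpu, args=(i,), name=f"vcpu{i}")
        for i in range(num_vcpus)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    result = bytes(guest_ram[:num_vcpus])
    guest_ram.close()
    return result
```

Because the whole "VM" is one process, the usual process tools see it as such, which is the point the slide makes about kill(1) and top(1).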




KVM Execution Model
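 [Figure: execution flow across three columns: Userspace, Kernel, Guest]
    Userspace issues ioctl()
    Kernel switches to guest mode
    Guest executes natively
    Lightweight exit: handled by the kernel exit handler
    Heavyweight exit: handled by the userspace exit handler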
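
The loop in this model can be sketched as a small simulation. Nothing here talks to /dev/kvm: `kvm_run` stands in for the run ioctl(), and the exit names in `KERNEL_HANDLED` are illustrative assumptions about which exits stay in the kernel.

```python
from collections import deque

# Assumption for illustration: exits the kernel resolves on its own.
KERNEL_HANDLED = {"page_fault", "apic_access"}


def kvm_run(pending_exits, log):
    """Stand-in for the run ioctl(): return only when userspace is needed."""
    while pending_exits:
        reason = pending_exits.popleft()  # guest ran natively until this exit
        if reason in KERNEL_HANDLED:
            log.append(("kernel", reason))  # lightweight exit
            continue                        # switch back to guest mode
        return reason                       # heavyweight exit to userspace
    return None                             # toy guest has nothing left to do


def userspace_loop(exit_reasons):
    """Drive the toy guest and record who handled each exit."""
    pending = deque(exit_reasons)
    log = []
    while True:
        reason = kvm_run(pending, log)
        if reason is None:
            break
        log.append(("userspace", reason))   # e.g. device emulation
    return log
```

For example, `userspace_loop(["page_fault", "io"])` records one exit handled in the kernel and one that returned to userspace.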

Flow Example: Memory Access
    Guest accesses an unmapped memory location
    Hardware traps into kernel mode
    kvm walks the guest page table, determines guest
    physical address
    kvm performs guest physical -> host physical
    translation
    kvm installs shadow page table entry containing guest
    virtual -> host physical translation
    Processor restarts execution of faulting instruction
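
The steps above can be sketched with plain dictionaries standing in for page tables. This is a toy model, not kvm code: `ShadowMMU`, the single-level tables and the 4 KiB page size are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size for the toy model


class ShadowMMU:
    def __init__(self, guest_page_table, gpa_to_hpa):
        self.guest_page_table = guest_page_table  # guest virtual page -> guest physical page
        self.gpa_to_hpa = gpa_to_hpa              # guest physical page -> host physical page
        self.shadow = {}                          # guest virtual page -> host physical page
        self.faults = 0

    def access(self, gva):
        """Translate a guest virtual address, filling the shadow table on a miss."""
        page, offset = divmod(gva, PAGE_SIZE)
        if page not in self.shadow:
            self.faults += 1                        # hardware traps into kernel mode
            gpa_page = self.guest_page_table[page]  # walk the guest page table
            hpa_page = self.gpa_to_hpa[gpa_page]    # guest physical -> host physical
            self.shadow[page] = hpa_page            # install shadow entry
            # A missing guest mapping raises KeyError in this toy; a real
            # hypervisor would deliver the fault to the guest instead.
        # "Restart" the access: it now hits the shadow entry.
        return self.shadow[page] * PAGE_SIZE + offset
```

Only the first access to a page takes the trap; later accesses to the same page use the installed shadow entry directly.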




Paravirtualisation

 Modifying guest OS for performance
 Virtio
     Common drivers for all hypervisors
     Hypervisor-specific backend
     KVM backend in qemu
     Faster performance
     Efficient block, net drivers
     Balloon
     lguest, KVM use it already
 PV DMA
     Pass through Ethernet devices
 paravirt_ops
Network Devices

 Fully virtualised device performance not great
    55 Mbps for RTL
    Lots of IO-exits per packet

 Decided to implement a modern e1000
   Advantages:
        All code in userspace (qemu)
        All existing drivers recognise device
   IRQ coalescing
   Only 2-3 IO-exits per packet
   Goes in excess of 800 Mbps




Virtio Net

 Shared memory between host and guest
 Two queues: recv and send
 Ring buffer within each queue
 'available' pointer controlled by guest
 'used' pointer controlled by host
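
A minimal sketch of one such queue, assuming a single array of slots with two counters. The real virtio ring layout is more involved (separate descriptor, available and used structures); `ToyQueue` and its method names are made up for this example.

```python
class ToyQueue:
    """One direction of traffic: the guest produces, the host consumes."""

    def __init__(self, size):
        self.size = size
        self.ring = [None] * size  # stands in for the shared memory
        self.avail = 0             # advanced only by the guest
        self.used = 0              # advanced only by the host

    def guest_add(self, buf):
        """Guest side: publish a buffer, or report that the ring is full."""
        if self.avail - self.used == self.size:
            return False
        self.ring[self.avail % self.size] = buf
        self.avail += 1
        return True

    def host_consume(self):
        """Host side: take the oldest published buffer, or None if empty."""
        if self.used == self.avail:
            return None
        buf = self.ring[self.used % self.size]
        self.used += 1
        return buf
```

Since each side writes only its own counter, the two sides need no shared lock in this toy; a send queue and a receive queue would each be one such object.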




Virtio-net on KVM


 [Figure: virtio BLK and NET drivers over Virtio PCI in the guest kernel; QEMU in userspace; shared memory between them; Linux as the host]




Ideas

 Shared memory between host and guest via virtio-pci
 Shared directory between host and guest using virtio + fuse
 VMGL (OpenGL for Virtual Machines) support
 http://kvm.qumranet.com/kvmwiki/TODO




KVM Pros

 Leverages Linux scheduler, memory management, I/O
 No scheduler involvement for I/O
 Full virtualisation: No changes to the guest necessary
   Paravirt drivers available for better performance
 Uses existing Linux security model
   can run VM as ordinary user
 Uses existing management tools
 Power management
 Guest memory swapping
 Real-time scheduling, NUMA
 Leverages Linux development momentum: all new
 drivers, {cpu, disk} schedulers, file systems, etc.
 supported
Distro / Industry interest

 libvirt
    Managing various guests under a hypervisor
    Support for Xen, KVM
    APIs between UI, middle layer and virtualisation backend
 Distributions
    Debian
    Ubuntu
    Red Hat EL
    SLES
 Qumranet
    Desktop Virtualisation


Release Philosophy

 Development snapshots every 1-2 weeks
    Release early and often
    Features introduced quickly
    Bugs fixed quickly
    Bugs added quickly
    Allows developers and users to track and test the latest
    and greatest
 Stable releases part of Linux 2.6.x
    With bugfixes going into Linux 2.6.x.y




Journey

  Linux 2.6.20 (4 Feb 2007): Initial release
  Linux 2.6.21 (25 Apr 2007): Stability,
  suspend/resume
  Linux 2.6.22 (8 Jul 2007): Stable ABI
    Old userspace, new kernel
    New userspace, old kernel
 Linux 2.6.23 (9 Oct 2007): SMP, performance
 Linux 2.6.24 (24 Jan 2008): In-kernel APIC,
 preemptibility, virtio
 Linux 2.6.25 (16 Apr 2008): Guest swapping,
 paravirt_ops, balloon drv
 Linux 2.6.26 (soon): PowerPC, s390, IA64, NPT,
 EPT, more paravirt (mmu), ...
KVM is Developer-friendly

   No need to reboot (usually)
   Netconsole, oprofile, all the tools work
   Small codebase
   Friendly community




Future

 Consolidate various virtualisation solutions existing in
 the kernel
    Started with move to virt/ from drivers/kvm/
 More hardware features support
 More paravirtualisation support
 Improve guest scaling
 Better support for management layers like libvirt
 Intel Real Mode Emulation




Do Read

 virt/*, arch/[x86|ia64|s390|powerpc]/kvm/*
 KvmForum2007 wiki page on
 http://kvm.qumranet.com
 kvm@vger.kernel.org
 virtualization@lists.osdl.org




Thank You
