Extending bhyve
beyond FreeBD guests
Peter Grehan
grehan@freebsd.org
EuroBSDcon 2013
Sunday, September 29, 2013
What is bhyve ?
• A “minimally viable x86 hypervisor”
• serial console, PCI virtio block/net, 64 bit
host/guests (covers most)
• RequiresVT-x/EPT CPU support (core i*)
• In base-system FreeBSD as of 10.0
Sunday, September 29, 2013
Pieces #1: vmm.ko
• Kernel module implementing:
• VT-x state setup, enter/exit context switching
• Local APIC emulation
• VT-d IOMMU for PCI pass-thru
• Guest physical memory mgmt
• user-space cdev interface
• Small: 64KB text
Sunday, September 29, 2013
Pieces #2: bhyveload
• user-space bootloader
• FreeBSD “userboot” library + bhyve API
• CreatesVM; lays out kernel + metadata;
sets up initialVM register state
• ... and then exits.
Sunday, September 29, 2013
Pieces #3: bhyve
• User-space run loop
• Implements PCI bus/device emulation
• PCI devs - uart, virtio block/net
• Device backends
• stdin/out, tap device, file/block device
• Threads for vCPUs, i/o devs, kqueue loop.
• Small: 130KB text (less than ifconfig)
Sunday, September 29, 2013
Pieces #4: bhyvectl
• Simple utility
• Dump/modifyVM state (registers,VMCS)
• DumpVM stats (e.g. # HLT exits)
• DeleteVMs
Sunday, September 29, 2013
Why non-FreeBSD ?
• Simple: what users expect
• Intent of bhyve was always to run
unmodified guests
• Restricting to modified guest o/s dooms to
slow decay, or as a limited audience
developer-only tool
• UML Linux
Sunday, September 29, 2013
Ordering Development
• Development started with modified guest
o/s - reduces unknowns and complexity
• Then moved to an unmodified guest
• ... older versions of guest
• ... guest o/s’s with source access/buildability
• binary-only guest o/s’s last.
Sunday, September 29, 2013
First “other”: UEFI
• BSD-licensed BIOS replacement
• “OVMF” build target for virtual machines
• Extracts parameters from hypervisor
• Memory map, # CPUs etc
Sunday, September 29, 2013
UEFI issues
• Simple linear loader, but with 16-bit entry
• set up memory area for params
• Wanted 8259/8254 for timer support
• Replaced with HPET
• PCI BARs reprogrammed
• Blew away PCI serial port: changed to use
LPC bridge and ISA-style addresses
Sunday, September 29, 2013
grub
• “grub-emu” build target for user-space
• Mainly for filesystem debug
• Hacked^h^h^h^hCo-opted for a Linux/
*BSD userspace loader
Sunday, September 29, 2013
grub-1.98
• grub-emu builds OOTB on FreeBSD
• However, boot-loader code assumes 32-
bit. Lots of 64-bit issues
• Assumes it is running in target address
space with linear mapping
• Direct pointer derefs
• Trampolines to launch o/s
• Too much conditional code. Abandoned.
Sunday, September 29, 2013
grub-2.00
• grub-emu nightmare build on FreeBSD
• gnulib, aaaargh !!!
• Once done, excellent clean codebase
• Indirection for target mem/reg sets
• Easy interface to bhyve API
• No conditional code in Linux/*BSD loaders
Sunday, September 29, 2013
Linux
• 32-bit “flat” entry point
• Intended for non-BIOS environments
• Well documented !
• RequiresVT-x ‘unrestricted guest’ support
• Lots of trial and error to fix initial TR/TSS
state to avoid “invalid guest state” exit on
vmenter
Sunday, September 29, 2013
Linux #2
• CR0 unconditional write exposed bug
• Lots of CPUID probing
• Required hiding more and more fields
• Lots of MSRs that required emulation/
ignorance
• “Interesting” use of 8254 timer
Sunday, September 29, 2013
Mo’ Linux
• Requires PCI hostbridge, or no PCI probing
• ISO initrd doesn’t have virtio block drivers
• Fixed with GSoC AHCI emulation
• IOAPIC changed to ignore level-trigger
• SMP, virtio devices worked fine
• No custom kernels. Lots of code reading
Sunday, September 29, 2013
OpenBSD
• Using 5.3. Not quite working.
• Well supported by grub-2.00
• No MSI in virtio drivers
• Well... non-standard and bhyve-specific
• 1-line change, committed (thanks!)
Sunday, September 29, 2013
OpenBSD #2
• APIC code exposed instruction emul bugs
• RIP-relative addressing, SIB + displacement
• Required RTC NVRAM implementation
• PCI MSI capability writes to r/o regs
• Fine on real h/w; no point in trapping
• Extensive kernel rebuilds for debug
Sunday, September 29, 2013
Future o/s’s
• NetBSD - no MSI in drivers (fixed?)
• D’flyBSD - BIOS emul for loader
• Illumos
• Requires BIOS emul
• Useful for ZFS development
• Windows - daunting, no source
Sunday, September 29, 2013
Thanks to...
• neel@freebsd.org, who did (and does) all
the hard stuff
• Tycho Nightingale from Pluribus Networks
who contributed the UEFI work
Sunday, September 29, 2013
Questions ?
Sunday, September 29, 2013

Extending bhyve beyond FreeBSD guests - EuroBSDCon 2013

  • 1.
    Extending bhyve beyond FreeBDguests Peter Grehan grehan@freebsd.org EuroBSDcon 2013 Sunday, September 29, 2013
  • 2.
    What is bhyve? • A “minimally viable x86 hypervisor” • serial console, PCI virtio block/net, 64 bit host/guests (covers most) • RequiresVT-x/EPT CPU support (core i*) • In base-system FreeBSD as of 10.0 Sunday, September 29, 2013
  • 3.
    Pieces #1: vmm.ko •Kernel module implementing: • VT-x state setup, enter/exit context switching • Local APIC emulation • VT-d IOMMU for PCI pass-thru • Guest physical memory mgmt • user-space cdev interface • Small: 64KB text Sunday, September 29, 2013
  • 4.
    Pieces #2: bhyveload •user-space bootloader • FreeBSD “userboot” library + bhyve API • CreatesVM; lays out kernel + metadata; sets up initialVM register state • ... and then exits. Sunday, September 29, 2013
  • 5.
    Pieces #3: bhyve •User-space run loop • Implements PCI bus/device emulation • PCI devs - uart, virtio block/net • Device backends • stdin/out, tap device, file/block device • Threads for vCPUs, i/o devs, kqueue loop. • Small: 130KB text (less than ifconfig) Sunday, September 29, 2013
  • 6.
    Pieces #4: bhyvectl •Simple utility • Dump/modifyVM state (registers,VMCS) • DumpVM stats (e.g. # HLT exits) • DeleteVMs Sunday, September 29, 2013
  • 7.
    Why non-FreeBSD ? •Simple: what users expect • Intent of bhyve was always to run unmodified guests • Restricting to modified guest o/s dooms to slow decay, or as a limited audience developer-only tool • UML Linux Sunday, September 29, 2013
  • 8.
    Ordering Development • Developmentstarted with modified guest o/s - reduces unknowns and complexity • Then moved to an unmodified guest • ... older versions of guest • ... guest o/s’s with source access/buildability • binary-only guest o/s’s last. Sunday, September 29, 2013
  • 9.
    First “other”: UEFI •BSD-licensed BIOS replacement • “OVMF” build target for virtual machines • Extracts parameters from hypervisor • Memory map, # CPUs etc Sunday, September 29, 2013
  • 10.
    UEFI issues • Simplelinear loader, but with 16-bit entry • set up memory area for params • Wanted 8259/8254 for timer support • Replaced with HPET • PCI BARs reprogrammed • Blew away PCI serial port: changed to use LPC bridge and ISA-style addresses Sunday, September 29, 2013
  • 11.
    grub • “grub-emu” buildtarget for user-space • Mainly for filesystem debug • Hacked^h^h^h^hCo-opted for a Linux/ *BSD userspace loader Sunday, September 29, 2013
  • 12.
    grub-1.98 • grub-emu buildsOOTB on FreeBSD • However, boot-loader code assumes 32- bit. Lots of 64-bit issues • Assumes it is running in target address space with linear mapping • Direct pointer derefs • Trampolines to launch o/s • Too much conditional code. Abandoned. Sunday, September 29, 2013
  • 13.
    grub-2.00 • grub-emu nightmarebuild on FreeBSD • gnulib, aaaargh !!! • Once done, excellent clean codebase • Indirection for target mem/reg sets • Easy interface to bhyve API • No conditional code in Linux/*BSD loaders Sunday, September 29, 2013
  • 14.
    Linux • 32-bit “flat”entry point • Intended for non-BIOS environments • Well documented ! • RequiresVT-x ‘unrestricted guest’ support • Lots of trial and error to fix initial TR/TSS state to avoid “invalid guest state” exit on vmenter Sunday, September 29, 2013
  • 15.
    Linux #2 • CR0unconditional write exposed bug • Lots of CPUID probing • Required hiding more and more fields • Lots of MSRs that required emulation/ ignorance • “Interesting” use of 8254 timer Sunday, September 29, 2013
  • 16.
    Mo’ Linux • RequiresPCI hostbridge, or no PCI probing • ISO initrd doesn’t have virtio block drivers • Fixed with GSoC AHCI emulation • IOAPIC changed to ignore level-trigger • SMP, virtio devices worked fine • No custom kernels. Lots of code reading Sunday, September 29, 2013
  • 17.
    OpenBSD • Using 5.3.Not quite working. • Well supported by grub-2.00 • No MSI in virtio drivers • Well... non-standard and bhyve-specific • 1-line change, committed (thanks!) Sunday, September 29, 2013
  • 18.
    OpenBSD #2 • APICcode exposed instruction emul bugs • RIP-relative addressing, SIB + displacement • Required RTC NVRAM implementation • PCI MSI capability writes to r/o regs • Fine on real h/w; no point in trapping • Extensive kernel rebuilds for debug Sunday, September 29, 2013
  • 19.
    Future o/s’s • NetBSD- no MSI in drivers (fixed?) • D’flyBSD - BIOS emul for loader • Illumos • Requires BIOS emul • Useful for ZFS development • Windows - daunting, no source Sunday, September 29, 2013
  • 20.
    Thanks to... • neel@freebsd.org,who did (and does) all the hard stuff • Tycho Nightingale from Pluribus Networks who contributed the UEFI work Sunday, September 29, 2013
  • 21.