Implementing BIOS emulation support for BHyVe
Takuya ASADA <syuu@freebsd.org>




Sunday, March 17, 2013
Before talking about BIOS emulation on BHyVe

              Let’s take a quick look at BHyVe’s internal
              structure and at Intel VT-x




BHyVe Overview

              [Figure: 1. Create the VM instance and load the guest kernel
              (bhyveload); 2. Run the VM instance (bhyve), backed by a disk
              image, a tap device and a stdin/stdout console; 3. Destroy the
              VM instance (bhyvectl). All three use libvmmapi, which talks to
              /dev/vmm/${vm_name} (vmm.ko) in the FreeBSD kernel via mmap/ioctl.]

              •   bhyveload loads the guest OS
              •   bhyve is the userland part of the hypervisor; it emulates devices
              •   bhyvectl is a management tool
              •   libvmmapi is the userland API
              •   vmm.ko is the kernel part of the hypervisor

vmm.ko
              • Provides /dev/vmm/${vmname}
              • Each vmm device file holds the state of one VM
                instance
              • The device file is created via the sysctl
                hw.vmm.create
              • It is destroyed via the sysctl hw.vmm.destroy

/dev/vmm/${vmname} interfaces
              • read/write/mmap
                Guest memory can be accessed with standard
                syscalls (which means you can even dump
                guest memory with the dd command)
              • ioctl
                Provides various operations on the VM
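Because the device file supports plain read(2)/pread(2), dumping guest memory needs nothing beyond standard syscalls. The sketch below assumes, as the dd remark suggests, that the file offset equals the guest physical address; `read_guest_mem` is a hypothetical helper, not part of libvmmapi.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy up to `len` bytes of guest memory, starting at guest physical
 * address `gpa`, into `buf`.
 * ASSUMPTION: the file offset of /dev/vmm/<name> is the guest physical
 * address (the same assumption a `dd skip=...` dump relies on).
 * Returns the number of bytes read (short at the end of guest memory),
 * or -1 if the device file cannot be opened.
 */
static ssize_t read_guest_mem(const char *devpath, uint64_t gpa,
                              void *buf, size_t len)
{
    int fd = open(devpath, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, (char *)buf + done, len - done,
                          (off_t)(gpa + done));
        if (n <= 0)             /* error or end of guest memory */
            break;
        done += (size_t)n;
    }
    close(fd);
    return (ssize_t)done;
}
```

With a placeholder VM name, `read_guest_mem("/dev/vmm/guest0", 0x7c00, buf, 512)` would fetch the boot sector the guest loaded at 0x7C00.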



/dev/vmm/${vmname} ioctls
              • VM_MAP_MEMORY: Map a guest memory
                area of the requested size
              • VM_SET/GET_REGISTER: Access registers
              • VM_RUN: Run the guest machine until a virtual
                device is accessed (or some other trap
                happens)



bhyveload
              •   The FreeBSD bootloader ported to userland: userboot
              •   bhyveload loads userboot.so as a dynamic library and calls its loader_main function
              •   Once called, it does the following:
                  •   Parse UFS on the disk image and find the kernel
                  •   Load the kernel into the guest memory area (using mmap)
                  •   Set initial guest register values (using the VM_SET_REGISTER ioctl)
                      •   RIP = kernel entry point
                      •   CR0 = Paging enable | Protected mode enable
                      •   EFER = Long mode enable | Long mode active
                      •   Initialize the page table, set its address in CR3
                      •   Create GDT, IDT, LDT, set their addresses in GDTR, IDTR, LDTR
                      •   Initialize TR
              •   The guest machine starts at the kernel entry point with 64-bit mode enabled
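For reference, the register values in the list above correspond to these architectural bits. This is only a sketch: the struct and helper are hypothetical, a real loader writes each value with the VM_SET_REGISTER ioctl, and CR4.PAE is included because long mode requires it, even though the slide does not list it.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit positions (Intel SDM). */
#define CR0_PE   (1ULL << 0)    /* Protected mode enable */
#define CR0_PG   (1ULL << 31)   /* Paging enable */
#define CR4_PAE  (1ULL << 5)    /* Required for long mode; not on the slide */
#define EFER_LME (1ULL << 8)    /* Long mode enable */
#define EFER_LMA (1ULL << 10)   /* Long mode active */

/* Hypothetical container for the values pushed before the first VM_RUN. */
struct initial_regs {
    uint64_t rip, cr0, cr3, cr4, efer;
};

static struct initial_regs
make_long_mode_regs(uint64_t kernel_entry, uint64_t pml4_gpa)
{
    struct initial_regs r = {
        .rip  = kernel_entry,
        .cr0  = CR0_PG | CR0_PE,
        .cr3  = pml4_gpa,       /* guest physical address of the top-level page table */
        .cr4  = CR4_PAE,
        .efer = EFER_LME | EFER_LMA,
    };
    return r;
}
```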

bhyve

              • The bhyve command runs a loop like this:
               while (1) {
                   ioctl(VM_RUN);          /* run the guest until the next VMExit */
                   device_io_emulation();  /* emulate the access that caused it */
               }
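The loop can be sketched in runnable form by replacing the VM_RUN ioctl with a scripted stand-in. Everything here (`fake_vm_run`, the `exit_reason` values, the counters) is hypothetical and only shows the control flow: the guest runs until a VMExit, and userland dispatches on why it exited.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exit reasons; vmm.ko defines its own set. */
enum exit_reason { EXIT_INOUT, EXIT_VMCALL, EXIT_HLT, EXIT_SHUTDOWN };

struct sim_exit {
    enum exit_reason reason;
    unsigned port;              /* I/O port for EXIT_INOUT */
};

/* Stand-in for ioctl(VM_RUN): replays a scripted list of VMExits. */
struct fake_vcpu {
    const struct sim_exit *script;
    size_t n, pos;
};

static int fake_vm_run(struct fake_vcpu *v, struct sim_exit *out)
{
    if (v->pos >= v->n)
        return -1;              /* like a failing ioctl: stop the loop */
    *out = v->script[v->pos++];
    return 0;
}

struct counters { int io, bios, hlt; };

/* Same shape as the bhyve main loop: run the guest, then handle the exit. */
static void run_loop(struct fake_vcpu *v, struct counters *c)
{
    struct sim_exit ex;

    while (fake_vm_run(v, &ex) == 0) {
        switch (ex.reason) {
        case EXIT_INOUT:            /* device_io_emulation() */
            c->io++;
            break;
        case EXIT_VMCALL:           /* BIOS call emulation, later in this talk */
            c->bios++;
            break;
        case EXIT_HLT:              /* guest idle */
            c->hlt++;
            break;
        case EXIT_SHUTDOWN:         /* guest is done */
            return;
        }
    }
}
```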



Intel VT-x: Hardware-assisted virtualization

              [Figure: VMX root mode (hypervisor) and VMX non-root mode (guest)
              each have User (Ring 3) and Kernel (Ring 0); VMEntry switches
              from root to non-root mode, VMExit switches back.]

              •   New CPU modes:
                  VMX root mode (hypervisor) / VMX non-root mode (guest)
              •   When an event occurs that must be emulated by the hypervisor,
                  the CPU stops the guest and exits to the hypervisor → VMExit



VT-x configuration

              • Which events should be handled by the
                hypervisor?
                It depends on the hypervisor implementation!
              • VT-x is configurable!
                You can enable/disable trapping of each event
              • It can also change some CPU behaviors
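One concrete example of this configurability is the VMCS exception bitmap, which the tracer later in this talk relies on for #DB. The helpers below are a hypothetical model of that 32-bit field; in BHyVe the actual VMCS write happens inside vmm.ko.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 exception vectors that appear later in this talk. */
#define IDT_DB 1    /* #DB debug exception (single step) */
#define IDT_BP 3    /* #BP breakpoint */
#define IDT_GP 13   /* #GP general protection */

/*
 * VMCS exception bitmap: if bit n is set, exception vector n raised in
 * the guest causes a VMExit instead of being delivered through the
 * guest's own IDT. This models the field only.
 */
static uint32_t exc_trap(uint32_t bitmap, unsigned vector)
{
    return bitmap | (UINT32_C(1) << vector);
}

static uint32_t exc_untrap(uint32_t bitmap, unsigned vector)
{
    return bitmap & ~(UINT32_C(1) << vector);
}

static bool exc_causes_exit(uint32_t bitmap, unsigned vector)
{
    return (bitmap >> vector) & 1;
}
```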

BHyVe BIOS emulation project
              • Google Summer of Code ’12
                “BHyVe BIOS emulation to boot legacy
                systems”
              • Project goal:
                Implement BIOS emulation in the BHyVe
                hypervisor, so that BHyVe can support
                more guest OSes


Limitation of bhyveload
              • It’s legacy free! Yay!
              • But...
              • It only supports FreeBSD/amd64
              • A kernel loader has to be implemented for
                each OS
              • We want to run more OSes on BHyVe!

Why not just implement an OS loader?
              •   Better than supporting the ugly legacy BIOS? True! But...
              •   An OS loader depends heavily on the kernel
                  implementation
              •   You would need to implement an OS loader for each OS,
                  e.g. a Linux loader, a NetBSD loader, an OpenBSD loader...
              •   It may be very hard to implement a loader for a proprietary OS
              •   Even if the OS loader works, the guest OS may call a BIOS
                  interrupt handler → DIE!
                  This is common in 32-bit x86 OSes;
                  most 64-bit OSes are legacy free.
                  Most 64bit OS are legacy free.



BIOS interrupt call
         •    Ex: sys/boot/i386/mbr/mbr.s
              main.5:      movw %sp,%di             # Save stack pointer
                           movb 0x1(%si),%dh        # Load head
                           movw 0x2(%si),%cx        # Load cylinder:sector
                           movw $LOAD,%bx           # Transfer buffer
                           testb $FL_PACKET,flags   # Try EDD?
                           jz main.7                # No.
                           pushw %cx                # Save %cx
                           pushw %bx                # Save %bx
                           movw $0x55aa,%bx         # Magic
                           movb $0x41,%ah           # BIOS: EDD extensions
                           int $0x13                #   present?

                           ↑BIOS Interrupt Call




What happens when it is called?
              int 13h: software interrupt (INTx)
                → the CPU reads the interrupt vector
                → it executes the BIOS call handler (in ROM)
                → the handler performs I/O to the hardware with in/out or MMIO
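The "CPU reads the interrupt vector" step can be made concrete. A minimal sketch of the real-mode lookup, assuming `mem` is a host copy of guest low memory; the helper names are hypothetical and the vector contents in the check are arbitrary example values.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode segment:offset to linear address. */
static uint32_t rm_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/*
 * On `int n` in real mode the CPU reads the 4-byte vector at linear
 * address n * 4 (offset in the low word, segment in the high word),
 * pushes FLAGS, CS and IP, clears IF and TF, and jumps to segment:offset.
 * `mem` must cover at least the 1 KiB interrupt vector table.
 */
static uint32_t ivt_handler_addr(const uint8_t *mem, uint8_t vector)
{
    const uint8_t *e = mem + (uint32_t)vector * 4;
    uint16_t off = (uint16_t)(e[0] | (e[1] << 8));   /* little endian */
    uint16_t seg = (uint16_t)(e[2] | (e[3] << 8));
    return rm_linear(seg, off);
}
```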




How Linux KVM handles BIOS
              • KVM uses QEMU as its userland process
              • QEMU ships a real BIOS called “SeaBIOS”,
                an open-source BIOS
              • SeaBIOS performs I/O using in/out instructions
                or MMIO
              • KVM traps this I/O, and QEMU emulates the devices

BIOS call handling on KVM
              int 13h: software interrupt (INTx)
                → the CPU reads the interrupt vector
                → it executes the interrupt handler (SeaBIOS, in the guest)
                → SeaBIOS performs I/O to the virtual hardware
                → VMExit by in/out or MMIO to the hypervisor
                → QEMU emulates the hardware I/O




Bring SeaBIOS into BHyVe?

              • I wanted to use it
              • But we can’t bring the code into FreeBSD
              • Because it’s licensed under the LGPLv3


OK then, is there a BSDL BIOS?
              • Unfortunately, we haven’t found any BSDL
                BIOS
              • But there’s a BSDL DOS emulator in Ports:
                doscmd
              • It has a DOS & BIOS interrupt call emulator and
                runs on FreeBSD/i386



How doscmd works
              •   Map pages in the low memory area (<1MB) to place the DOS app
              •   Set up the interrupt vectors / interrupt handlers (each handler just issues HLT; IRET)
              •   Load the DOS app into the low memory area
              •   Enter virtual 8086 mode (i386_vm86(2)), jumping to the DOS app’s entry address
              •   The CPU executes the DOS app in virtual 8086 mode
              •   When the DOS app makes a DOS/BIOS interrupt call, the interrupt
                  handler runs and issues the HLT instruction
              •   HLT is privileged, so it traps and the CPU leaves virtual 8086 mode
              •   doscmd emulates the DOS/BIOS interrupt call
              •   It then returns to virtual 8086 mode




How doscmd works
              int 13h: software interrupt (INTx)
                → the CPU reads the interrupt vector
                → it executes the interrupt handler, which issues HLT
                  (DOS app in v8086 mode)
                → the HLT instruction traps
                → BIOS emulation: doscmd emulates the BIOS call
                  (doscmd on FreeBSD/i386)

Difference in BIOS handling: QEMU vs doscmd
              • QEMU
                Runs a real BIOS in the guest machine
                Its interrupt handler handles the BIOS interrupt call
                QEMU just emulates the hardware devices
              • doscmd
                Has no real BIOS
                Its interrupt handler only traps out of the vm86
                machine
                doscmd itself emulates the BIOS interrupt call handler


Plan to emulate BIOS on BHyVe
              •   Extract only the necessary code from doscmd and make it a library
                  exporting two functions: biosemul_init() / biosemul_call()
              •   biosemul_init() performs BIOS-compatible initialization
                  (initialize register values, load the boot sector, initialize
                  the interrupt vectors, install the interrupt handlers)
                  •   The interrupt handlers use the VMCALL instruction instead of
                      HLT, because the guest OS may also use HLT and we don’t want
                      that handled by the BIOS emulation code
              •   biosemul_call() handles BIOS interrupt calls, running the
                  emulation with doscmd code
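A sketch of the "install interrupt handler" part of biosemul_init(). The layout (one 8-byte stub per vector at F000:0000) and the function names are assumptions made for illustration, not the project's code; the stub bytes encode the `VMCALL; STI; RETF 2` handler that this talk describes for BIOS calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED layout: stubs at STUB_SEG:0000, STUB_SIZE bytes apart, so the
 * hypervisor can recover the vector from the IP of the trapping VMCALL. */
#define STUB_SEG   0xF000u
#define STUB_SIZE  8u

static const uint8_t stub[STUB_SIZE] = {
    0x0F, 0x01, 0xC1,   /* vmcall : VMExit to bhyve */
    0xFB,               /* sti */
    0xCA, 0x02, 0x00,   /* retf 2 : pop IP and CS, drop the saved FLAGS */
    0x90,               /* nop    : padding */
};

/* `guest_mem` must cover at least the first 1 MiB of guest memory. */
static void install_bios_stubs(uint8_t *guest_mem)
{
    for (unsigned vec = 0; vec < 256; vec++) {
        uint16_t off = (uint16_t)(vec * STUB_SIZE);
        memcpy(guest_mem + (STUB_SEG << 4) + off, stub, sizeof stub);

        uint8_t *ivt = guest_mem + vec * 4;   /* IVT entry: offset, segment */
        ivt[0] = off & 0xFF;
        ivt[1] = off >> 8;
        ivt[2] = STUB_SEG & 0xFF;
        ivt[3] = STUB_SEG >> 8;
    }
}

/* On a VMCALL exit, IP still points at the VMCALL: map it back to a vector. */
static int stub_vector(uint16_t cs, uint16_t ip)
{
    if (cs != STUB_SEG || ip % STUB_SIZE != 0 || ip / STUB_SIZE > 255)
        return -1;
    return (int)(ip / STUB_SIZE);
}
```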



How to handle a BIOS interrupt call in BHyVe
              int 13h: software interrupt (INTx)
                → the CPU reads the interrupt vector
                → it executes the interrupt call handler, which issues VMCALL
                  (guest)
                → VMExit by VMCALL
                → BIOS emulation in the hypervisor: doscmd code emulates the BIOS call
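On the hypervisor side, the VMExit has to be turned back into a BIOS function. The sketch below handles a single function, INT 13h AH=41h (the EDD check that mbr.s issues), and is hypothetical: the register struct, the reported version (30h) and the feature bits (CX bit 0) are example values, and the IP arithmetic assumes the exit leaves IP on the 3-byte VMCALL.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_CF 0x0001u

/* Hypothetical register snapshot passed to the BIOS emulation. */
struct bios_regs {
    uint32_t eax, ebx, ecx, edx, eflags;
    uint16_t ip;
};

#define AH(r)        (((r)->eax >> 8) & 0xFF)
#define SET_AH(r, v) ((r)->eax = ((r)->eax & ~0xFF00u) | ((uint32_t)(v) << 8))

/* INT 13h, AH=41h (EDD installation check) only. */
static void int13_emulate(struct bios_regs *r)
{
    if (AH(r) == 0x41 && (r->ebx & 0xFFFF) == 0x55AA) {
        SET_AH(r, 0x30);                        /* EDD version (example) */
        r->ebx = (r->ebx & ~0xFFFFu) | 0xAA55;  /* signature */
        r->ecx = (r->ecx & ~0xFFFFu) | 0x0001;  /* packet interface supported */
        r->eflags &= ~EFLAGS_CF;                /* success */
    } else {
        SET_AH(r, 0x01);                        /* invalid function */
        r->eflags |= EFLAGS_CF;
    }
}

/* Called on a VMCALL exit whose stub maps to `vector`; others omitted. */
static void biosemul_call_sketch(int vector, struct bios_regs *r)
{
    if (vector == 0x13)
        int13_emulate(r);
    /* CF is set in the live EFLAGS; since "retf 2" discards the saved
     * FLAGS, this value is what the caller sees. */
    r->ip += 3;   /* step over the 3-byte VMCALL before resuming */
}
```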




Why not trap the interrupt directly?
              •   Intel VT-x can trap interrupts directly
                  (no need to issue VMCALL in the
                  interrupt handler)
              •   Why shouldn’t we use that for BIOS emulation?
                  Because after entering protected mode, the guest OS may use
                  the BIOS interrupt call vector numbers for other
                  software interrupts
              •   Bootloaders may also invoke an interrupt handler by
                  jumping to its address (btx does this)


Problems(1)
              •   doscmd is not 64-bit safe!
                  Some type definitions had to be rewritten
                  Ex: u_long → uint32_t
              •   doscmd maps the guest memory area at 0x0
                  We could perhaps also mmap the guest memory area at 0x0
                  on BHyVe, but I rewrote the code instead
                  Ex:
                  *(char *)(0x400) = 0;
                        ↓
                  *(char *)(0x400 + guest_mem) = 0;
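The `guest_mem` rewrite generalizes to a translation helper. A minimal sketch with hypothetical names, which also rejects addresses outside guest RAM, something the original absolute-pointer code could not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * doscmd assumed guest physical 0 == host virtual 0. Here guest RAM is
 * a host buffer (for example, what mmap of /dev/vmm/<name> returned),
 * so every guest address is rebased and bounds-checked.
 * Fixed-width types keep the layout identical on i386 and amd64, where
 * u_long is 4 and 8 bytes respectively.
 */
struct guest_mem {
    uint8_t *base;   /* host address of guest physical 0 */
    size_t   size;   /* bytes of guest RAM mapped */
};

/* Host pointer for [gpa, gpa + len), or NULL if it leaves guest RAM. */
static void *gpa2hva(const struct guest_mem *gm, uint32_t gpa, size_t len)
{
    if (gpa > gm->size || len > gm->size - gpa)
        return NULL;
    return gm->base + gpa;
}
```

The slide’s example then becomes `uint8_t *p = gpa2hva(&gm, 0x400, 1); if (p) *p = 0;`.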


Problems(2)
              • Guest register storage
                doscmd stores register values in its own
                structure, but BHyVe requires an ioctl
                to set/get each guest register

                I decided to copy all registers first,
                emulate the BIOS interrupt call, and then
                write back the modified registers
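The copy-in / emulate / write-back strategy can be sketched with the ioctls abstracted behind function pointers. The register enum, the fake in-memory backend and `demo_emulate` are hypothetical stand-ins so the sketch runs without vmm.ko; only registers that changed are written back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register ids; bhyve defines its own enum for these ioctls. */
enum reg { R_EAX, R_EBX, R_ECX, R_EDX, R_ESI, R_EDI, R_EFLAGS, R_NREGS };

/* Stand-ins for wrappers around VM_GET_REGISTER / VM_SET_REGISTER. */
typedef int (*get_reg_fn)(void *ctx, enum reg r, uint64_t *val);
typedef int (*set_reg_fn)(void *ctx, enum reg r, uint64_t val);

/* Returns the number of registers written back, or -1 on error. */
static int emulate_with_snapshot(void *ctx, get_reg_fn get, set_reg_fn set,
                                 void (*emulate)(uint64_t regs[R_NREGS]))
{
    uint64_t before[R_NREGS], regs[R_NREGS];

    for (int i = 0; i < R_NREGS; i++)           /* copy everything in */
        if (get(ctx, (enum reg)i, &before[i]) != 0)
            return -1;
    memcpy(regs, before, sizeof regs);

    emulate(regs);                              /* doscmd-style emulation */

    int written = 0;
    for (int i = 0; i < R_NREGS; i++) {         /* write back changes only */
        if (regs[i] == before[i])
            continue;
        if (set(ctx, (enum reg)i, regs[i]) != 0)
            return -1;
        written++;
    }
    return written;
}

/* In-memory fake vCPU so the sketch runs without vmm.ko. */
struct fake_vcpu_regs { uint64_t r[R_NREGS]; int sets; };

static int fake_get(void *ctx, enum reg r, uint64_t *val)
{
    *val = ((struct fake_vcpu_regs *)ctx)->r[r];
    return 0;
}

static int fake_set(void *ctx, enum reg r, uint64_t val)
{
    struct fake_vcpu_regs *v = ctx;
    v->r[r] = val;
    v->sets++;
    return 0;
}

/* Example emulation: answer an EDD check (BX = AA55h, CF cleared). */
static void demo_emulate(uint64_t regs[R_NREGS])
{
    regs[R_EBX] = 0xAA55;
    regs[R_EFLAGS] &= ~(uint64_t)1;
}
```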


Debugging the BIOS emulator
              •   When I started implementing BIOS emulation, I added a register
                  dump for each BIOS interrupt call
              •   In practice, one dump per BIOS interrupt call is too coarse to
                  determine what’s going on
                  •   And the emulation didn’t work: execution eventually jumped
                      to a strange EIP and crashed, and I had no idea why
              •   I couldn’t find a way to run BHyVe on an emulator and get an
                  instruction-level trace
                  •   BHyVe can run on VMware, but I couldn’t find a way to do
                      tracing there
              •   So I decided to implement instruction-level tracing in BHyVe



Implementing an instruction-level tracer on BHyVe(1)
              •   If the guest CPU is emulated, dumping each instruction is
                  very easy:
                  just dump everything whenever the instruction decoder is called
              •   But on BHyVe the guest program runs natively,
                  because it uses VT-x
              •   This means there is no way to inspect instructions or
                  dump registers until a VMExit occurs
              •   So we can raise an exception on every instruction
              •   You could insert an instruction that raises an exception, but x86 has
                  a flag for single-step debugging (the TF bit in EFLAGS)
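TF is bit 8 (0x100) of EFLAGS. The sketch below, with hypothetical names, sets it and decodes a flags value; it also explains values such as `flags:102` and `flags:146` in this talk’s trace output (TF plus the always-one bit 1, and TF|ZF|PF respectively).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EFLAGS_CF 0x0001u
#define EFLAGS_PF 0x0004u
#define EFLAGS_ZF 0x0040u
#define EFLAGS_TF 0x0100u   /* trap flag: #DB after every instruction */
#define EFLAGS_IF 0x0200u

static uint32_t enable_single_step(uint32_t eflags)
{
    return eflags | EFLAGS_TF;
}

/* Render the set flags as e.g. "TF ZF PF" into buf (size n). */
static void eflags_str(uint32_t f, char *buf, size_t n)
{
    static const struct { uint32_t bit; const char *name; } tbl[] = {
        { EFLAGS_IF, "IF" }, { EFLAGS_TF, "TF" }, { EFLAGS_ZF, "ZF" },
        { EFLAGS_PF, "PF" }, { EFLAGS_CF, "CF" },
    };

    buf[0] = '\0';
    for (size_t i = 0; i < sizeof tbl / sizeof tbl[0]; i++) {
        if (!(f & tbl[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " ", n - strlen(buf) - 1);
        strncat(buf, tbl[i].name, n - strlen(buf) - 1);
    }
}
```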



Implementing an instruction-level tracer on BHyVe(2)
              • At first, I implemented the following:
                • Set the TF bit in EFLAGS and enable VMExit on
                  #DB exceptions
                • bhyve handles the #DB exception, disassembles the
                  instruction at EIP, steps EIP forward,
                  and does VMEnter again
              • Then I realized the VMExit occurs BEFORE the
                instruction executes! USELESS!!


Implementing an instruction-level tracer on BHyVe(3)
              •   I changed approach and handled it the same way as BIOS interrupt
                  calls (the #DB interrupt handler issues VMCALL → VMExit)
              •   The interrupted EIP and some registers are still on the stack,
                  because the handler has not returned yet, so they must be
                  fetched from the stack to dump them
                  •   OLD_EIP = *(uint16_t *)(ESP)
                  •   OLD_CS = *(uint16_t *)(ESP + 2)
                  •   OLD_EFLAGS = *(uint16_t *)(ESP + 4)
                  •   OLD_ESP = ESP + 6
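A sketch of fetching that frame from guest memory, assuming real mode (linear address = SS × 16 + SP) and a #DB handler that pushes nothing before its VMCALL; the struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct rm_int_frame { uint16_t ip, cs, flags, sp; };

static uint16_t rd16(const uint8_t *mem, uint32_t lin)
{
    return (uint16_t)(mem[lin] | (mem[lin + 1] << 8));   /* little endian */
}

/*
 * Real-mode interrupt delivery pushed FLAGS, CS, IP, so at the VMCALL:
 *   SS:SP+0 = IP, SS:SP+2 = CS, SS:SP+4 = FLAGS
 * and the interrupted code's SP was SP + 6 (a value, not a memory read).
 */
static struct rm_int_frame read_int_frame(const uint8_t *mem,
                                          uint16_t ss, uint16_t sp)
{
    uint32_t top = ((uint32_t)ss << 4) + sp;
    struct rm_int_frame f = {
        .ip    = rd16(mem, top),
        .cs    = rd16(mem, top + 2),
        .flags = rd16(mem, top + 4),
        .sp    = (uint16_t)(sp + 6),
    };
    return f;
}
```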



Instruction-level tracer output
              [trace] 16bit ip:7c3e cs:0 flags:102 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:cld
              [trace] 16bit ip:7c3f cs:0 flags:102 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:xor %cx, %cx
              [trace] 16bit ip:7c41 cs:0 flags:146 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov %cx, %es
              [trace] 16bit ip:7c43 cs:0 flags:146 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov %cx, %ds
              [trace] 16bit ip:7c45 cs:0 flags:146 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov %cx, %ss
              [trace] 16bit ip:7c4a cs:0 flags:146 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov %sp, %si
              [trace] 16bit ip:7c4c cs:0 flags:146 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov $0x700, %di
              [trace] 16bit ip:7c4f cs:0 flags:146 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:incb %ch
              [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:100 edx:80 insn:rep movsw
              [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:ff edx:80 insn:rep movsw
              [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fe edx:80 insn:rep movsw
              [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fd edx:80 insn:rep movsw
              [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fc edx:80 insn:rep movsw
              [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fb edx:80 insn:rep movsw
              [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fa edx:80 insn:rep movsw
              [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:f9 edx:80 insn:rep movsw
              [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:f8 edx:80 insn:rep movsw
              [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:f7 edx:80 insn:rep movsw

Tracing suddenly stops!(1)
              • TF in EFLAGS can be cleared under some conditions
               • popf can clear TF:
                  the #DB exception still occurs immediately
                  after the popf instruction, so setting the TF
                  bit in OLD_EFLAGS (on the stack) solves
                  the issue
                  (the guest machine restores EFLAGS with
                  IRET)


Tracing suddenly stops!(2)
              •   TF in EFLAGS can be cleared under some conditions
                  •   VMExit on a BIOS interrupt call:
                      the CPU clears the TF flag when it delivers an interrupt
                      doscmd uses the following interrupt handler for
                      BIOS interrupt calls:
                      VMCALL; STI; RETF 2
                      RETF 2 pops IP and CS but discards the saved FLAGS, so
                      changing OLD_EFLAGS (on the stack) has no effect
                      Just setting the TF bit in the live EFLAGS solves the issue
                  •   But we must not set the TF bit in EFLAGS when the interrupt is
                      the #DB exception itself:
                      it causes an infinite loop
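Both workarounds (the saved FLAGS for the #DB handler after popf, the live EFLAGS for RETF 2 stubs) reduce to one decision per VMCALL exit. A hypothetical sketch: `vector` is the interrupt the trapping stub belongs to, `live_eflags` is the guest’s current EFLAGS, and `saved_flags_on_stack` is the FLAGS word of the interrupt frame.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_TF 0x0100u
#define VEC_DB    1

/*
 * - #DB handler (returns with IRET): TF comes back from the saved FLAGS,
 *   so set it there; this also undoes a popf that cleared TF. Setting TF
 *   in the live EFLAGS here would re-trap inside the handler forever.
 * - BIOS stubs (return with RETF 2): the saved FLAGS is discarded and
 *   INT already cleared TF, so set TF in the live EFLAGS instead.
 */
static void rearm_single_step(int vector, uint32_t *live_eflags,
                              uint16_t *saved_flags_on_stack)
{
    if (vector == VEC_DB)
        *saved_flags_on_stack |= EFLAGS_TF;
    else
        *live_eflags |= EFLAGS_TF;
}
```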



Tracing suddenly stops!(3)
              •   lidt just before switching to protected mode
                  •   After IDTR changes, the #DB exception can no longer be handled
                  •   Because the #DB handler was only installed in the real-mode interrupt
                      vector table, not in the IDT
                  •   I modified the IDT and implemented a #DB handler in btx
                  •   But #DB exceptions still didn’t work in real mode after the lidt
                      instruction
                  •   Probably because the IDT for protected mode is not valid in real
                      mode
                  •   After switching to protected mode, tracing could be resumed by setting
                      the TF flag in EFLAGS



Exception causes an exception
              • Not really sure, but it looks like an exception
                is raised inside an exception handler
              • Because of this, the error can’t be printed on the
                console
              • I inserted VMCALL at the beginning of the
                exception handler to dump everything



BTX interrupt call causes an exception
              [trace] 32bit-kern eip:9332 cs:18 eflags:106 ss:10 esp:17b8 ds:10 cr0:31 eax:31 ebx:9357 ecx:0 edx:70000 insn:decb %al
              [trace] 32bit-kern eip:9334 cs:18 eflags:106 ss:10 esp:17b8 ds:10 cr0:31 eax:30 ebx:9357 ecx:0 edx:70000 insn:mov %eax, %cr0
              [trace] 32bit-kern eip:9097 cs:8 eflags:146 ss:0 esp:1800 ds:0 cr0:31 eax:102 ebx:2820 ecx:0 edx:708ee insn:mov $0x10, %cl
              [trace] 32bit-kern eip:9099 cs:8 eflags:146 ss:0 esp:1800 ds:0 cr0:31 eax:102 ebx:2820 ecx:10 edx:708ee insn:mov %ecx, %ss
              [trace] 32bit-kern eip:909d cs:8 eflags:146 ss:10 esp:1800 ds:0 cr0:31 eax:102 ebx:2820 ecx:38 edx:708ee insn:ltr %cx
              [except] 32bit-kern exception:13 error_code:38 eip:909d cs:8 eflags:10146 ss:10 esp:1800 insn:ltr %cx ds:0 cr0:31 eax:102 ebx:2820 ecx:38 edx:708ee

         •    INT 0x31 (a BIOS call from a BTX app) causes an exception at the LTR instruction

         •    I have no idea why... → Tried skipping all BIOS calls in boot2 & loader, using in/out instead
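One thing the `[except]` line does reveal: exception 13 is #GP, and its error code (0x38) equals the selector in %cx, which matches the selector-format error code that LTR reports when it rejects a selector. A small decoder, with hypothetical names, shows it refers to GDT entry 7; it does not explain why the LTR failed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Selector-format error code:
 *   bits 15..3 = descriptor index, bit 2 = TI (0: GDT, 1: LDT),
 *   bit 1 = IDT (index refers to an IDT gate), bit 0 = EXT (external event).
 */
struct sel_error { unsigned index; int ti, idt, ext; };

static struct sel_error decode_sel_error(uint32_t code)
{
    struct sel_error e = {
        .index = (code >> 3) & 0x1FFF,
        .ti    = (code >> 2) & 1,
        .idt   = (code >> 1) & 1,
        .ext   = code & 1,
    };
    return e;
}
```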


rep causes an exception in loader
              [trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0200 ebx:201000 ecx:52f edx:50000a insn:rep movsb
              [trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0200 ebx:201000 ecx:52e edx:50000a insn:rep movsb
              [trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0200 ebx:201000 ecx:52d edx:50000a insn:rep movsb
              [trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0200 ebx:201000 ecx:52c edx:50000a insn:rep movsb
              [trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0290 ebx:201000 ecx:52b edx:50000a insn:rep movsb
              [trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a027b ebx:201000 ecx:52a edx:50000a insn:rep movsb
              [except] 32bit-kern exception:3 error_code:0 eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc insn:rep movsb ds:10 cr0:31 eax:a0236 ebx:201000 ecx:529 edx:50000a


         •    I really have no good idea why...


Demonstration



Conclusion
              •   A test implementation of a BIOS emulator for BHyVe
                  has been written
              •   An instruction-level tracer was implemented on top of it for debugging
              •   It reaches the /boot/loader stage, but dies before loading
                  a kernel
              •   Advice from bootloader developers is really needed
              •   Advice on better debugging methods is also needed
                  (Is there a hardware debugger for x86?
                  Or maybe VMware has a cool debugging feature?)



13年3月17日日曜日

Implements BIOS emulation support for BHyVe

  • 1.
    Implements BIOS emulation support for BHyVe Takuya ASADA<syuu@freebsd.org> 13年3月17日日曜日
  • 2.
    Before talk aboutBIOS Emulation on BHyVe Let’s quickly looking into BHyVe internal structure and Intel VT-x 13年3月17日日曜日
  • 3.
    BHyVe Overview 2. Run VM instace Disk image • bhyveload loads guest 1. Create VM instance, tap device OS load guest kernel stdin/stdout Guest kernel N Console 3. Destroy VM • bhyve is userland part of H D I C instance Hypervisor bhyveload bhyve bhyvectl Emulates devices • libvmmapi mmap/ioctl bhyvectl is a management tool /dev/vmm/${vm_name} (vmm.ko) FreeBSD kernel • libvmmapi is userland API • vmm.ko is kernel part of Hypervisor 13年3月17日日曜日
  • 4.
    vmm.ko • Provides /dev/vmm/${vmname} • Each vmm device file contains each VM instance state • The device file can create via sysctl: hw.vmm.create • Destroy via sysctl: hw.vmm.destroy 13年3月17日日曜日
  • 5.
    /dev/vmm/${vmname} interfaces • read/write/mmap Can access guest memory area by standard syscall (Which means you even can dump guest memory by dd command) • ioctl Provides various operation to VM 13年3月17日日曜日
  • 6.
    /dev/vmm/${vmname} ioctls • VM_MAP_MEMORY: Map guest memory area as requested size • VM_SET/GET_REGISTER: Access registers • VM_RUN: Run guest machine, until virtual devices accessed (Or some other trap happened) 13年3月17日日曜日
  • 7.
    bhyveload • FreeBSD bootloader ported to userland: userboot • bhyveload loads userboot.so as dynamic link library, call loader_main function • Once it called, it does following things: • Parse UFS on diskimage, find kernel • Load kernel to guest memory area (using mmap) • Set initial guest register values (using VM_SET_REGISTER ioctl) • RIP = kernel entry point • CR0 = Paging enable | Protected mode enable • EFER = Long mode enable | Long mode active • Initialize Page Table, set addr to CR3 • Create GDT, IDT, LDT, set addr to GDTR, IDTR, LDTR • Initialize TR • Guest machine starts from kernel entry point, with 64bit mode enabled 13年3月17日日曜日
  • 8.
    bhyve • bhyve command runs like following rules: while (1) { ioctl(VM_RUN); device_io_emulation(); } 13年3月17日日曜日
  • 9.
    Intel VT-x: Hardware assisted virtualization VMX VMX root mode non-root mode User User (Ring 3) VMEntry (Ring 3) Kernel VMExit Kernel (Ring 0) (Ring 0) • New CPU mode: VMX root mode(hypervisor) / VMX non-root mode(guest) • If some event which need to emulate in hypervisor, CPU stops guest, exit to hypervisor → VMExit 13年3月17日日曜日
  • 10.
    VT-x configuration • Which event should be handled by hypervisor? It depends hypervisor implementation! • VT-x is configurable! You can disable/enable each event • Also can change some behavior of CPU 13年3月17日日曜日
  • 11.
    BHyVe BIOS emulation project • Google Summer of Code ’12 “BHyVe BIOS emulation to boot legacy systems” • Project Goal: Implement BIOS emulation on BHyVe hypervisor, to make BHyVe able to support more guest OSes 13年3月17日日曜日
  • 12.
    Limitation of bhyveload • It’s legacy free! yay! • But... • Only supports FreeBSD/amd64 • You need to implement kernel loader for each OSes • Want to run more OSes on BHyVe! 13年3月17日日曜日
  • 13.
    Why don’t youjust implement OS loader? • Better than supporting legacy ugly BIOS? True! But... • OS loader will be heavily dependent kernel implementation • You’ll be need to implement OS loader for each OSes ex: Linux loader, NetBSD loader, OpenBSD loader... • Maybe it’s very hard to implement proprietary OS loader • Even OS loader could worked, Guest OS may call BIOS interrupt handler → DIE! It’s common on 32bit x86 OSes. Most 64bit OS are legacy free. 13年3月17日日曜日
  • 14.
    BIOS interrupt call • Ex: sys/boot/i386/mbr/mbr.s main.5: movw %sp,%di # Save stack pointer movb 0x1(%si),%dh # Load head movw 0x2(%si),%cx # Load cylinder:sector movw $LOAD,%bx # Transfer buffer testb $FL_PACKET,flags # Try EDD? jz main.7 # No. pushw %cx # Save %cx pushw %bx # Save %bx movw $0x55aa,%bx # Magic movb $0x41,%ah # BIOS: EDD extensions int $0x13 # present?    ↑BIOS Interrupt Call 13年3月17日日曜日
  • 15.
    What happen whenit called? int 13h Software interrupt(INTx) CPU reads interrupt vector On the ROM Execute BIOS call handler Perform IO by in/out or MMIO Hardware 13年3月17日日曜日
  • 16.
    How Linux KVM handles BIOS • KVM uses QEMU for userland process • QEMU has real BIOS called “SeaBIOS”, opensource BIOS • SeaBIOS perform I/O by in/out instruction or MMIO • KVM handles these I/O, emulate devices 13年3月17日日曜日
  • 17.
    BIOS call handlingon KVM int 13h Software interrupt(INTx) CPU reads interrupt vector Execute interrupt handler SeaBIOS preforms IO VMExit by in/out or MMIO to virtual HW QEMU HW Guest Emulation HyperVisor QEMU emulates HW IO 13年3月17日日曜日
  • 18.
    Bring SeaBIOS in BHyVe? • I wanted to use it • But we can’t bring the code in FreeBSD • Because it’s GPLv3 licensed 13年3月17日日曜日
  • 19.
    OK then, isthere BSDL BIOS? • Unfortunately, we haven’t find any BSDL BIOS • But, there’s BSDL DOS emulator on Ports: doscmd • It has DOS & BIOS interrupt call emulator runs on FreeBSD/i386 13年3月17日日曜日
  • 20.
    How doscmd works • Map pages on low memory area to place DOS app(<1MB) • Setup interrupt vector / interrupt handler(It just issues HLT;IRET) • Load DOS app on low memory area • Enter virtual 8086 mode(i386_vm86(2)), entry DOS app entry address • CPU executes DOS app in virtual 8086 mode • When DOS app calls DOS/BIOS interrupt call, it handled by interrupt handler, the handler issues HLT instruction • Once HLT instruction issued, CPU leaves from virtual 8086 mode • doscmd emulates DOS/BIOS interrupt call virtual 8086 • return to virtual 8086 mode mode 13年3月17日日曜日
  • 21.
    How doscmd works int 13h Software interrupt(INTx) CPU reads interrupt vector Issue HLT instruction Execute interrupt handler HLT instruction Trap DOS app on BIOS Emulation v8086 mode doscmd emulates BIOS call doscmd on FreeBSD/i386 13年3月17日日曜日
  • 22.
    Difference of BIOShandling on QEMU vs doscmd • QEMU Runs real BIOS in guest machine Interrupt handler handles BIOS interrupt call QEMU just emulates hardware devices • doscmd Hasn’t real BIOS Interrupt handler is just for trap vm86 machine doscmd emulates BIOS interrupt call handler 13年3月17日日曜日
  • 23.
    Plan to emulateBIOS on BHyVe • Extract only necessary code from doscmd, make it library Export two function: biosemul_init() / biosemul_call() • In biosemul_init(), perform BIOS compatible initialization (initialize register value, boot sector loading, initialize interrupt vector, install interrupt handler) • On interrupt handler, use VMCALL instruction instead of HLT instruction Because GuestOS also may use HLT, and we don’t want to handle it by BIOS emulation code • biosemul_call() handles BIOS interrupt call Executes BIOS interrupt call emulation using doscmd code 13年3月17日日曜日
  • 24.
    How to handleBIOS interrupt call in BHyVe int 13h Software interrupt(INTx) CPU reads interrupt vector Execute interrupt call handler Issue VMCALL VMExit by VMCALL instruction BIOS Emulation Guest HyperVisor doscmd emulates BIOS call 13年3月17日日曜日
  • 25.
    Why don’t youtrap interrupt directly? • Intel VT-x has ability to trap interrupt directly (no need to issue VMCALL instruction in interrupt handler) • Why we shouldn’t use it for BIOS emulation? Because guest OS may use BIOS interrupt call vector numbers for different software interrupt after entering protected mode • Bootloaders may invoke interrupt handler by jumping address (btx does it) 13年3月17日日曜日
  • 26.
    Problems(1) • doscmd is 64bit unsafe! Need to rewrite some type definition Ex: u_long → uint32_t • doscmd maps guest memory area at 0x0 Maybe we also can mmap guest memry area at 0x0 on BHyVe, but I rewrited code Ex: *(char *)(0x400) = 0;       ↓ *(char *)(0x400 + guest_mem) = 0; 13年3月17日日曜日
  • 27.
    Problems(2) • Guest register storage doscmd stores register value in their structure, but BHyVe requires to issue ioctl to set/get guest register I decided to copy all register first, then emulate BIOS interrupt call, writeback modified register after that 13年3月17日日曜日
  • 28.
    Debugging BIOS emulator • When I started implementing BIOS emulation, I inserted register dump for each BIOS interrupt call • Actually, dumping for each BIOS interrupt call is too few to determine what’s going on • And the emulation doesn’t worked fine, it finally jumped away to strange EIP and commit suicide, I have no idea • I haven’t find a way to run BHyVe on an emulator and getting instruction level trace • BHyVe can run on VMware, but I haven’t find a way to do tracing on it • Decided to implement instruction level trace on BHyVe 13年3月17日日曜日
    Implementing an instruction-level tracer on BHyVe (1)
    • If the guest CPU is emulated, dumping each instruction is very easy
      Just dump everything whenever the instruction decoder is called
    • But on BHyVe, the guest program runs natively
      Because it uses VT-x
    • This means there is no way to inspect instructions or dump registers until a VMExit occurs
    • So, we can raise an exception on every instruction
    • You could insert an instruction that raises an exception, but x86 has a flag for single-step debugging (the TF bit in EFLAGS)
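The two options in the last bullet differ in how much the tracer must know about the guest code. Planting INT3 (opcode 0xCC) requires saving and restoring the original byte, and the tracer has to work out where the next instruction starts before it can plant the next breakpoint. TF needs a single bit, after which the CPU raises #DB after every instruction. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFLAGS_TF  0x0100u   /* bit 8: trap after each instruction */
#define OP_INT3    0xCCu     /* one-byte breakpoint instruction */

/* Option 1: patch INT3 over an instruction; the original byte must be kept. */
struct sw_bp {
	uint32_t addr;
	uint8_t saved;
};

static void bp_plant(uint8_t *mem, struct sw_bp *bp, uint32_t addr)
{
	assert(mem != NULL && bp != NULL);
	bp->addr = addr;
	bp->saved = mem[addr];
	mem[addr] = OP_INT3;
}

static void bp_remove(uint8_t *mem, const struct sw_bp *bp)
{
	mem[bp->addr] = bp->saved;
}

/* Option 2: set TF once; the CPU then raises #DB after every instruction. */
static uint32_t enable_single_step(uint32_t eflags)
{
	return eflags | EFLAGS_TF;
}
```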
    Implementing an instruction-level tracer on BHyVe (2)
    • At first, I implemented the following rule:
    • Set the TF bit in EFLAGS and enable VMExit on #DB exceptions
    • bhyve handles the #DB exception, disassembles the instruction at EIP, steps EIP forward, and does VMEnter again
    • Then I suddenly realized the VMExit occurs BEFORE the instruction is executed! USELESS!!
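Enabling VMExit on #DB comes down to the VMCS exception bitmap: for vectors other than #PF (which has extra filtering), setting bit n makes exception vector n cause a VM exit instead of being delivered through the guest's interrupt table. The sketch below only computes the bitmap value; writing it into the VMCS is left out, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IDT_DB  1u   /* debug exception (single-step trap) */
#define IDT_BP  3u   /* breakpoint (INT3) */

static uint32_t exc_bitmap_set(uint32_t bitmap, unsigned vector)
{
	assert(vector < 32);
	return bitmap | (UINT32_C(1) << vector);
}

static uint32_t exc_bitmap_clear(uint32_t bitmap, unsigned vector)
{
	assert(vector < 32);
	return bitmap & ~(UINT32_C(1) << vector);
}

/* Nonzero if the given exception vector would cause a VM exit. */
static int exc_causes_exit(uint32_t bitmap, unsigned vector)
{
	return vector < 32 && (bitmap & (UINT32_C(1) << vector)) != 0;
}
```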
    Implementing an instruction-level tracer on BHyVe (3)
    • I changed my approach and handled #DB just like a BIOS interrupt call (the interrupt handler issues a VMCALL instruction → VMExit)
    • The interrupted EIP and some registers are still on the stack, because the handler hasn't returned yet
      Need to fetch them from the stack (at SS:SP) to dump them
    • OLD_EIP = *(uint16_t *)(ESP)
    • OLD_CS = *(uint16_t *)(ESP + 2)
    • OLD_EFLAGS = *(uint16_t *)(ESP + 4)
    • OLD_ESP = ESP + 6 (a real-mode interrupt does not push SP; the old stack pointer is just past the 3-word frame)
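A sketch of fetching the interrupted context, assuming real mode, a flat `guest_mem` buffer, and a handler whose first instruction is VMCALL (so only the interrupt frame is on the stack). Real-mode addresses are SS * 16 + SP, the frame holds IP, CS, FLAGS from the lowest address up, and the SP before the interrupt is simply SP + 6. `read_int_frame()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interrupted context as seen by a real-mode handler whose first instruction is VMCALL. */
struct rm_int_frame {
	uint16_t ip, cs, flags;
	uint16_t sp;                    /* SP before the interrupt */
};

/* Read a little-endian word at SS:off, wrapping within the 64 KiB segment. */
static uint16_t rd16_seg(const uint8_t *mem, uint32_t base, uint16_t off)
{
	uint16_t hi = (uint16_t)(off + 1);

	return (uint16_t)(mem[base + off] | (mem[base + hi] << 8));
}

/* Real-mode INT pushes FLAGS, CS, IP, so IP ends up at the lowest address. */
static int read_int_frame(const uint8_t *guest_mem, size_t mem_size,
    uint16_t ss, uint16_t sp, struct rm_int_frame *f)
{
	uint32_t base = (uint32_t)ss * 16;

	assert(guest_mem != NULL && f != NULL);
	if ((size_t)base + 0x10000 > mem_size)  /* whole segment must be mapped */
		return -1;
	f->ip = rd16_seg(guest_mem, base, sp);
	f->cs = rd16_seg(guest_mem, base, (uint16_t)(sp + 2));
	f->flags = rd16_seg(guest_mem, base, (uint16_t)(sp + 4));
	f->sp = (uint16_t)(sp + 6);     /* INT does not push SP */
	return 0;
}
```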
    Instruction-level tracer output
    [trace] 16bit ip:7c3e cs:0 flags:102 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:cld
    [trace] 16bit ip:7c3f cs:0 flags:102 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:xor %cx, %cx
    [trace] 16bit ip:7c41 cs:0 flags:146 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov %cx, %es
    [trace] 16bit ip:7c43 cs:0 flags:146 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov %cx, %ds
    [trace] 16bit ip:7c45 cs:0 flags:146 ss:0 sp:7ffe ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov %cx, %ss
    [trace] 16bit ip:7c4a cs:0 flags:146 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov %sp, %si
    [trace] 16bit ip:7c4c cs:0 flags:146 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:mov $0x700, %di
    [trace] 16bit ip:7c4f cs:0 flags:146 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:0 edx:80 insn:incb %ch
    [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:100 edx:80 insn:rep movsw
    [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:ff edx:80 insn:rep movsw
    [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fe edx:80 insn:rep movsw
    [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fd edx:80 insn:rep movsw
    [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fc edx:80 insn:rep movsw
    [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fb edx:80 insn:rep movsw
    [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:fa edx:80 insn:rep movsw
    [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:f9 edx:80 insn:rep movsw
    [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:f8 edx:80 insn:rep movsw
    [trace] 16bit ip:7c51 cs:0 flags:102 ss:0 sp:7c00 ds:0 cr0:30 eax:0 ebx:0 ecx:f7 edx:80 insn:rep movsw
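The `flags:` and `eflags:` fields in the trace are raw hexadecimal EFLAGS values. A small decoder, sketched below with hypothetical names, makes them easier to read: 0x102 is TF plus the always-set reserved bit 1, which is what single-stepping should show, and 0x146 adds ZF and PF after `xor %cx, %cx` produces zero. The reserved bit is not printed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const struct {
	uint32_t bit;
	const char *name;
} eflags_bits[] = {
	{ UINT32_C(1) << 0, "CF" },  { UINT32_C(1) << 2, "PF" },
	{ UINT32_C(1) << 4, "AF" },  { UINT32_C(1) << 6, "ZF" },
	{ UINT32_C(1) << 7, "SF" },  { UINT32_C(1) << 8, "TF" },
	{ UINT32_C(1) << 9, "IF" },  { UINT32_C(1) << 10, "DF" },
	{ UINT32_C(1) << 11, "OF" }, { UINT32_C(1) << 16, "RF" },
	{ UINT32_C(1) << 17, "VM" },
};

/* Write the names of the set flags, space separated, into buf (truncating if needed). */
static void eflags_decode(uint32_t eflags, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(eflags_bits) / sizeof(eflags_bits[0]); i++) {
		size_t used, need;

		if ((eflags & eflags_bits[i].bit) == 0)
			continue;
		used = strlen(buf);
		need = (used > 0 ? 1 : 0) + strlen(eflags_bits[i].name);
		if (used + need >= len)
			break;          /* keep room for the terminating NUL */
		if (used > 0)
			strcat(buf, " ");
		strcat(buf, eflags_bits[i].name);
	}
}
```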
    Tracing suddenly stops! (1)
    • The TF bit in EFLAGS can be cleared under some conditions
    • popf can clear TF: a #DB exception still occurs immediately after the popf instruction, so setting the TF bit in OLD_EFLAGS (on the stack) solves the issue (the guest restores EFLAGS with IRET)
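Patching the saved flags can be sketched as below, assuming the real-mode frame layout of an interrupt (IP, CS, FLAGS at SS:SP, SP+2, SP+4) and a flat `guest_mem` buffer. Since TF is bit 8, only the high byte of the saved FLAGS word needs to change. `set_tf_in_saved_flags()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFLAGS_TF 0x0100u               /* bit 8 */

/*
 * Set TF in the FLAGS word saved at SS:SP+4 by a real-mode interrupt,
 * so that the guest's IRET turns single-stepping back on.
 */
static int set_tf_in_saved_flags(uint8_t *guest_mem, size_t mem_size,
    uint16_t ss, uint16_t sp)
{
	/* TF is bit 0 of the high byte, which sits at SP+5. */
	uint32_t hi = (uint32_t)ss * 16 + (uint16_t)(sp + 5);

	assert(guest_mem != NULL);
	if (hi >= mem_size)
		return -1;
	guest_mem[hi] |= (uint8_t)(EFLAGS_TF >> 8);
	return 0;
}
```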
    Tracing suddenly stops! (2)
    • The TF bit in EFLAGS can be cleared under some conditions
    • VMExit for a BIOS interrupt call: the CPU clears the TF flag when it dispatches an interrupt
      doscmd uses the following interrupt call handler for BIOS interrupt calls: VMCALL; STI; RETF 2
      RETF 2 pops IP and CS but discards the saved FLAGS, so changing OLD_EFLAGS (on the stack) has no effect
      Just setting the TF bit in the live EFLAGS solves the issue
    • But we must not set the TF bit in EFLAGS when the interrupt is the #DB exception
      It causes an infinite loop
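The difference between IRET and RETF 2 shows up in a tiny real-mode model. `sim_int()` does what INT does (push FLAGS, CS, IP, then clear IF and TF), `sim_sti_retf2()` mimics the doscmd stub, and `on_vmcall()` is a hypothetical hypervisor policy that sets TF in the live EFLAGS at the VMCALL exit unless the vector is #DB. Everything here is an illustrative model, not bhyve code. (The stub presumably uses RETF 2 rather than IRET so that result flags such as CF, which BIOS services use to report errors, survive the return.)

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_TF 0x0100u
#define EFLAGS_IF 0x0200u
#define IDT_DB    1u

/* Just enough real-mode state for the model (SS fixed at 0). */
struct rm_cpu {
	uint16_t ip, cs, sp, flags;
	uint8_t stack[0x10000];
};

static void push16(struct rm_cpu *c, uint16_t v)
{
	c->sp = (uint16_t)(c->sp - 2);
	c->stack[c->sp] = (uint8_t)(v & 0xFF);
	c->stack[(uint16_t)(c->sp + 1)] = (uint8_t)(v >> 8);
}

static uint16_t pop16(struct rm_cpu *c)
{
	uint16_t v = (uint16_t)(c->stack[c->sp] |
	    (c->stack[(uint16_t)(c->sp + 1)] << 8));

	c->sp = (uint16_t)(c->sp + 2);
	return v;
}

/* INT n: push FLAGS, CS, IP, clear IF and TF, jump to the handler. */
static void sim_int(struct rm_cpu *c, uint16_t cs, uint16_t ip)
{
	push16(c, c->flags);
	push16(c, c->cs);
	push16(c, c->ip);
	c->flags &= (uint16_t)~(EFLAGS_IF | EFLAGS_TF);
	c->cs = cs;
	c->ip = ip;
}

/* Hypothetical hypervisor policy at the VMCALL exit: re-arm TF, except for #DB. */
static void on_vmcall(struct rm_cpu *c, unsigned vector)
{
	if (vector != IDT_DB)
		c->flags |= EFLAGS_TF;
}

/* STI; RETF 2: IP and CS are popped, the saved FLAGS word is skipped. */
static void sim_sti_retf2(struct rm_cpu *c)
{
	c->flags |= EFLAGS_IF;
	c->ip = pop16(c);
	c->cs = pop16(c);
	c->sp = (uint16_t)(c->sp + 2);
}
```

Without `on_vmcall()`, TF stays clear after the stub returns even though the FLAGS copy on the stack still has it set, because RETF 2 throws that copy away.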
    Tracing suddenly stops! (3)
    • lidt is executed just before switching to protected mode
    • After IDTR changes, the #DB exception can no longer be handled
    • Because the #DB handler is installed only in the real-mode interrupt vector table, not in the IDT
    • So I modified the IDT and implemented a #DB handler in btx
    • #DB exceptions weren't raised in real mode after the lidt instruction
    • Probably because the IDT for protected mode is not valid for real mode
    • After switching to protected mode, tracing could be resumed by setting the TF flag in EFLAGS
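For reference, a 32-bit interrupt gate in the IDT is an 8-byte descriptor: handler offset bits 0 to 15, code segment selector, a zero byte, a type/attribute byte (0x8E for a present, DPL 0, 32-bit interrupt gate), then offset bits 16 to 31. btx itself is assembly; this C sketch only shows the layout and assumes a little-endian host. `idt_set_db()` is a hypothetical helper, and the selector and handler address used with it are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IDT_DB       1u
#define GATE_INTR32  0x8Eu   /* present, DPL 0, 32-bit interrupt gate */

/* Build an 8-byte IDT gate descriptor (as a little-endian 64-bit value). */
static uint64_t idt_gate(uint32_t offset, uint16_t selector, uint8_t type_attr)
{
	return (uint64_t)(offset & 0xFFFFu)             /* bits 0..15: offset low */
	    | ((uint64_t)selector << 16)                /* bits 16..31: selector */
	    | ((uint64_t)type_attr << 40)               /* bits 40..47: type/attr */
	    | ((uint64_t)(offset >> 16) << 48);         /* bits 48..63: offset high */
}

/* Hypothetical helper: point the #DB entry of an IDT at a handler. */
static void idt_set_db(uint64_t *idt, uint32_t handler, uint16_t code_sel)
{
	assert(idt != NULL);
	idt[IDT_DB] = idt_gate(handler, code_sel, GATE_INTR32);
}
```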
    Exception causes exception
    • Not really sure, but it looks like an exception is raised inside an exception handler
    • Because of this, it can't print an error on the console
    • Inserted VMCALL at the beginning of the exception handler and dumped everything
    BTX interrupt call causes exception
    [trace] 32bit-kern eip:9332 cs:18 eflags:106 ss:10 esp:17b8 ds:10 cr0:31 eax:31 ebx:9357 ecx:0 edx:70000 insn:decb %al
    [trace] 32bit-kern eip:9334 cs:18 eflags:106 ss:10 esp:17b8 ds:10 cr0:31 eax:30 ebx:9357 ecx:0 edx:70000 insn:mov %eax, %cr0
    [trace] 32bit-kern eip:9097 cs:8 eflags:146 ss:0 esp:1800 ds:0 cr0:31 eax:102 ebx:2820 ecx:0 edx:708ee insn:mov $0x10, %cl
    [trace] 32bit-kern eip:9099 cs:8 eflags:146 ss:0 esp:1800 ds:0 cr0:31 eax:102 ebx:2820 ecx:10 edx:708ee insn:mov %ecx, %ss
    [trace] 32bit-kern eip:909d cs:8 eflags:146 ss:10 esp:1800 ds:0 cr0:31 eax:102 ebx:2820 ecx:38 edx:708ee insn:ltr %cx
    [except] 32bit-kern exception:13 error_code:38 eip:909d cs:8 eflags:10146 ss:10 esp:1800 insn:ltr %cx ds:0 cr0:31 eax:102 ebx:2820 ecx:38 edx:708ee
    • INT 0x31 (a BIOS call from a BTX app) causes an exception at the LTR instruction
    • I have no idea why... → Tried skipping all BIOS calls in boot2 & loader, using in/out instead
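Exception 13 is #GP, and when such a fault is tied to a segment selector its error code has the selector format decoded below. `error_code:38` decodes to GDT index 7, i.e. selector 0x38, which is the value in ECX (`ecx:38`) that `ltr %cx` was loading, so the fault concerns that descriptor. Per the Intel SDM, LTR raises #GP(selector) when the selector does not point to an available (non-busy) TSS, which may be one thing to check. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a selector-format error code (as pushed with #GP, #TS, #NP, #SS). */
struct sel_errcode {
	unsigned ext;    /* bit 0: caused by an event external to the program */
	unsigned idt;    /* bit 1: index refers to an IDT gate */
	unsigned ti;     /* bit 2: 0 = GDT, 1 = LDT (when idt is 0) */
	unsigned index;  /* bits 3..15: descriptor index */
};

static struct sel_errcode decode_sel_errcode(uint32_t code)
{
	struct sel_errcode e = {
		.ext = code & 1u,
		.idt = (code >> 1) & 1u,
		.ti = (code >> 2) & 1u,
		.index = (code >> 3) & 0x1FFFu,
	};

	return e;
}
```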
    rep causes exception in loader
    [trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0200 ebx:201000 ecx:52f edx:50000a insn:rep movsb
    [trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0200 ebx:201000 ecx:52e edx:50000a insn:rep movsb
    [trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0200 ebx:201000 ecx:52d edx:50000a insn:rep movsb
    [trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0200 ebx:201000 ecx:52c edx:50000a insn:rep movsb
    [trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a0290 ebx:201000 ecx:52b edx:50000a insn:rep movsb
    [trace] 32bit-kern eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc ds:10 cr0:31 eax:a027b ebx:201000 ecx:52a edx:50000a insn:rep movsb
    [except] 32bit-kern exception:3 error_code:0 eip:2000c4 cs:8 eflags:10106 ss:10 esp:ffc insn:rep movsb ds:10 cr0:31 eax:a0236 ebx:201000 ecx:529 edx:50000a
    • I really don't have a good idea...
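The `exception:` field in these traces is the raw vector number. A lookup table, sketched below with hypothetical names, turns it into the usual mnemonic: 13 is #GP, which pushes an error code (hence `error_code:38` on the previous slide), and 3 is #BP, the breakpoint exception normally raised by INT3, which pushes no error code. The table covers the architectural vectors 0 to 19 only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *const exc_names[] = {
	"#DE", "#DB", "NMI", "#BP", "#OF", "#BR", "#UD", "#NM",
	"#DF", "CSO" /* coprocessor segment overrun */, "#TS", "#NP",
	"#SS", "#GP", "#PF", "reserved", "#MF", "#AC", "#MC", "#XM",
};
#define NEXC (sizeof(exc_names) / sizeof(exc_names[0]))

/* Mnemonic for a vector, or "?" outside the table. */
static const char *exc_name(unsigned vector)
{
	return vector < NEXC ? exc_names[vector] : "?";
}

/* Reverse lookup, e.g. for filtering trace output; returns -1 if unknown. */
static int exc_vector(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < NEXC; i++)
		if (strcmp(exc_names[i], name) == 0)
			return (int)i;
	return -1;
}

/* Vectors in 0..19 that push an error code. */
static int exc_has_error_code(unsigned vector)
{
	switch (vector) {
	case 8: case 10: case 11: case 12: case 13: case 14: case 17:
		return 1;
	default:
		return 0;
	}
}
```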
    Conclusion
    • A test implementation of a BIOS emulator for BHyVe has been written
    • An instruction-level tracer has been implemented on top of it for debugging
    • It reaches the /boot/loader stage, but dies before loading a kernel
    • Advice from bootloader developers is really needed
    • Advice on better debugging methods is also needed (Is there a hardware debugger for x86? Or maybe VMware has a cool debugging feature?)