Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

XPDDS17: Xen Test Framework: Testing, From a Guest's Perspective - Andrew Cooper, Citrix

163 views

Published on

Sensible testing of hypervisor behaviour is a complicated task. Checking whether guest OSes boot and install properly is certainly useful, but this only covers a fraction of the guest/hypervisor interfaces. x86 in particular has large quantities of architecture which isn't used by any modern OS.

Unit testing on the other hand would be a great, if unit testing a kernel were a plausible task in general. XTF takes an alternative approach, and allows for component level testing from a unikernel-like perspective.

It is amazing what you find from this viewpoint.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

XPDDS17: Xen Test Framework: Testing, From a Guest's Perspective - Andrew Cooper, Citrix

  1. 1. Xen Test Framework Testing from a guest’s perspective Andrew Cooper Citrix XenServer Thursday 13th July 2017 Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 1 / 13
  2. 2. Views on testing Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 2 / 13
  3. 3. Views on testing Ever thought? “This bugfix is so simple, it is clearly correct” Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 2 / 13
  4. 4. Views on testing Ever thought? “This bugfix is so simple, it is clearly correct” “I’ve only got time for a dev test at the moment” Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 2 / 13
  5. 5. Views on testing Ever thought? “This bugfix is so simple, it is clearly correct” “I’ve only got time for a dev test at the moment” “Writing a test for that will be too hard” Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 2 / 13
  6. 6. Views on testing Ever thought? “This bugfix is so simple, it is clearly correct” “I’ve only got time for a dev test at the moment” “Writing a test for that will be too hard” Who actually likes testing? Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 2 / 13
  7. 7. Views on testing Ever thought? “This bugfix is so simple, it is clearly correct” “I’ve only got time for a dev test at the moment” “Writing a test for that will be too hard” Who actually likes testing? Testing is frequently a lower priority activity than it should be Therefore, it tends not to get done Especially if there is no pressure to do so Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 2 / 13
  8. 8. Testing is perceived as being hard Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 3 / 13
  9. 9. Testing is perceived as being hard Because it is... ... without good infrastructure Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 3 / 13
  10. 10. Testing is perceived as being hard Because it is... ... without good infrastructure Technical problems boil down to the fact that: Writing tests is hard Automating written tests is hard Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 3 / 13
  11. 11. Testing is perceived as being hard Because it is... ... without good infrastructure Technical problems boil down to the fact that: Writing tests is hard Automating written tests is hard The Xen Test Framework, and associated infrastructure, intend to remove these obstacles Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 3 / 13
  12. 12. “Writing tests is hard” Easy 4-step guide: Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 4 / 13
  13. 13. “Writing tests is hard” Easy 4-step guide: root@testbox:~/xtf# ./make-new-test.sh mytest Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 4 / 13
  14. 14. “Writing tests is hard” Easy 4-step guide: root@testbox:~/xtf# ./make-new-test.sh mytest root@testbox:~/xtf# $EDITOR tests/mytest/main.c Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 4 / 13
  15. 15. “Writing tests is hard” Easy 4-step guide: root@testbox:~/xtf# ./make-new-test.sh mytest root@testbox:~/xtf# $EDITOR tests/mytest/main.c root@testbox:~/xtf# make Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 4 / 13
  16. 16. “Writing tests is hard” Easy 4-step guide: root@testbox:~/xtf# ./make-new-test.sh mytest root@testbox:~/xtf# $EDITOR tests/mytest/main.c root@testbox:~/xtf# make root@testbox:~/xtf# ./xtf-runner mytest Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 4 / 13
  17. 17. “Writing tests is hard” Easy 4-step guide: root@testbox:~/xtf# ./make-new-test.sh mytest root@testbox:~/xtf# $EDITOR tests/mytest/main.c root@testbox:~/xtf# make root@testbox:~/xtf# ./xtf-runner mytest Combined test results: test-pv32pae-mytest SUCCESS test-pv64-mytest SUCCESS test-hvm32-mytest SUCCESS test-hvm32pse-mytest SUCCESS test-hvm32pae-mytest SUCCESS test-hvm64-mytest SUCCESS Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 4 / 13
  18. 18. “Writing tests is hard” #include <xtf.h> const char test_title[] = "XSA-203 PoC"; bool test_needs_fep = true; void test_main(void) { asm volatile (_ASM_XEN_FEP "1: vmfunc; 2:" /* Ignore #UD on older Xen versions. */ _ASM_EXTABLE(1b, 2b) :: "a" (0)); /* If Xen is alive, it didn’t hit the NULL pointer. */ xtf_success("Success: Not vulnerable to XSA-203n"); } Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 5 / 13
  19. 19. “Automating written tests is hard” Easy 2-step guide: Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 6 / 13
  20. 20. “Automating written tests is hard” Easy 2-step guide: Get patch to me via xen-devel Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 6 / 13
  21. 21. “Automating written tests is hard” Easy 2-step guide: Get patch to me via xen-devel me@box:~/xtf$ git push upstream master Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 6 / 13
  22. 22. “Automating written tests is hard” Easy 2-step guide: Get patch to me via xen-devel me@box:~/xtf$ git push upstream master New test will be picked up automatically by OSSTest Will start blocking pushes to master if a regression is detected Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 6 / 13
  23. 23. History of XTF XSA-106 was initially reported to XenServer (August 2014) “x86_emulate() doesn’t perform DPL checks for software exceptions” These are: int3, into, int $x and icebp A windows userspace PoC existed, which would cause a BSOD Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 7 / 13
  24. 24. History of XTF XSA-106 was initially reported to XenServer (August 2014) “x86_emulate() doesn’t perform DPL checks for software exceptions” These are: int3, into, int $x and icebp A windows userspace PoC existed, which would cause a BSOD On inspection: No checks of the IDT Entry at all Also important to check the descriptor type and present bit All software events injected as hardware exceptions Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 7 / 13
  25. 25. History of XTF XSA-106 was initially reported to XenServer (August 2014) “x86_emulate() doesn’t perform DPL checks for software exceptions” These are: int3, into, int $x and icebp A windows userspace PoC existed, which would cause a BSOD On inspection: No checks of the IDT Entry at all Also important to check the descriptor type and present bit All software events injected as hardware exceptions First stab at a fix changed the BSOD, but didn’t fix it Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 7 / 13
  26. 26. History of XTF XSA-106 was initially reported to XenServer (August 2014) “x86_emulate() doesn’t perform DPL checks for software exceptions” These are: int3, into, int $x and icebp A windows userspace PoC existed, which would cause a BSOD On inspection: No checks of the IDT Entry at all Also important to check the descriptor type and present bit All software events injected as hardware exceptions First stab at a fix changed the BSOD, but didn’t fix it Needed an easier repro! Started with hvmloader Removed all BIOS-related bits, added an IDT Borrowed the Force Emulation Prefix from PV guests A mess, but it did work Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 7 / 13
  27. 27. History of XTF The successful case semantics were wrong Should have had Trap semantics, actually had Fault semantics Fixed the test code up to detect and break infinite loops Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 8 / 13
  28. 28. History of XTF The successful case semantics were wrong Should have had Trap semantics, actually had Fault semantics Fixed the test code up to detect and break infinite loops The icebp instruction bypasses DPL checks Also sets the External bit in error codes Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 8 / 13
  29. 29. History of XTF The successful case semantics were wrong Should have had Trap semantics, actually had Fault semantics Fixed the test code up to detect and break infinite loops The icebp instruction bypasses DPL checks Also sets the External bit in error codes The first proposed fix didn’t actually work on non-Intel hardware Without NRIPS, AMD hardware can’t correctly inject a software exception which faults for IDT-related reasons With NRIPS, AMD hardware experimentally still can’t inject icebp properly if it would fault Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 8 / 13
  30. 30. History of XTF The successful case semantics were wrong Should have had Trap semantics, actually had Fault semantics Fixed the test code up to detect and break infinite loops The icebp instruction bypasses DPL checks Also sets the External bit in error codes The first proposed fix didn’t actually work on non-Intel hardware Without NRIPS, AMD hardware can’t correctly inject a software exception which faults for IDT-related reasons With NRIPS, AMD hardware experimentally still can’t inject icebp properly if it would fault XSA-106 was released (September 2014) Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 8 / 13
  31. 31. History of XTF The successful case semantics were wrong Should have had Trap semantics, actually had Fault semantics Fixed the test code up to detect and break infinite loops The icebp instruction bypasses DPL checks Also sets the External bit in error codes The first proposed fix didn’t actually work on non-Intel hardware Without NRIPS, AMD hardware can’t correctly inject a software exception which faults for IDT-related reasons With NRIPS, AMD hardware experimentally still can’t inject icebp properly if it would fault XSA-106 was released (September 2014) XSA-156 was released (November 2015) The fix was buggy, and caused infinite loops when the guest used int3 Should have been caught during development Would have been caught if the XSA-106 code had been in automation Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 8 / 13
  32. 32. So what is XTF, exactly? A microkernel core: Performs minimal setup at boot (pagetables, exceptions, console, etc) Non-essential infrastructure available from the library Uniform reporting mechanism (success, skip, error, failure) Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 9 / 13
  33. 33. So what is XTF, exactly? A microkernel core: Performs minimal setup at boot (pagetables, exceptions, console, etc) Non-essential infrastructure available from the library Uniform reporting mechanism (success, skip, error, failure) A multi-guest build system: Write one main.c, compile for any/all environments PV or HVM, 32bit or 64bit, unpaged/PSE/PAE/Long mode Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 9 / 13
  34. 34. So what is XTF, exactly? A microkernel core: Performs minimal setup at boot (pagetables, exceptions, console, etc) Non-essential infrastructure available from the library Uniform reporting mechanism (success, skip, error, failure) A multi-guest build system: Write one main.c, compile for any/all environments PV or HVM, 32bit or 64bit, unpaged/PSE/PAE/Long mode A utility to run one or more tests from dom0: ./xtf-runner $MYTEST Scriptable Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 9 / 13
  35. 35. So what is XTF, exactly? A microkernel core: Performs minimal setup at boot (pagetables, exceptions, console, etc) Non-essential infrastructure available from the library Uniform reporting mechanism (success, skip, error, failure) A multi-guest build system: Write one main.c, compile for any/all environments PV or HVM, 32bit or 64bit, unpaged/PSE/PAE/Long mode A utility to run one or more tests from dom0: ./xtf-runner $MYTEST Scriptable A suite of tests: Sorted into broad categories (functional, XSA, utilities) Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 9 / 13
  36. 36. So what is XTF, exactly? A microkernel core: Performs minimal setup at boot (pagetables, exceptions, console, etc) Non-essential infrastructure available from the library Uniform reporting mechanism (success, skip, error, failure) A multi-guest build system: Write one main.c, compile for any/all environments PV or HVM, 32bit or 64bit, unpaged/PSE/PAE/Long mode A utility to run one or more tests from dom0: ./xtf-runner $MYTEST Scriptable A suite of tests: Sorted into broad categories (functional, XSA, utilities) Quick and easy to use: time ./xtf-runner --all --quiet Haswell test box, 7.2s for all current tests (62) to run sequentially My Edit/Compile/Rerun cycle is a matter of seconds Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 9 / 13
  37. 37. Examples of tests CPUID Faulting (all environments) Xen 4.8 introduced guest CPUID faulting support 71 LoC (137 inc. docs), test/probe/enable/retest/disable/retest cycle Many subsequent CPUID changes in Xen, made with full confidence Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 10 / 13
  38. 38. Examples of tests CPUID Faulting (all environments) Xen 4.8 introduced guest CPUID faulting support 71 LoC (137 inc. docs), test/probe/enable/retest/disable/retest cycle Many subsequent CPUID changes in Xen, made with full confidence NMI Taskswitch Priv (hvm32pae) Xen 4.9 regression with task switching (fixed around rc9!) 93 LoC (186 inc. docs), NMI Task Gate, userspace-initiated self-NMI Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 10 / 13
  39. 39. Examples of tests CPUID Faulting (all environments) Xen 4.8 introduced guest CPUID faulting support 71 LoC (137 inc. docs), test/probe/enable/retest/disable/retest cycle Many subsequent CPUID changes in Xen, made with full confidence NMI Taskswitch Priv (hvm32pae) Xen 4.9 regression with task switching (fixed around rc9!) 93 LoC (186 inc. docs), NMI Task Gate, userspace-initiated self-NMI XSA-203 CVE-2016-10025 (hvm32, shown before) NULL pointer dereference with vmfunc emulation 12 LoC (42 inc. docs) Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 10 / 13
  40. 40. Examples of tests CPUID Faulting (all environments) Xen 4.8 introduced guest CPUID faulting support 71 LoC (137 inc. docs), test/probe/enable/retest/disable/retest cycle Many subsequent CPUID changes in Xen, made with full confidence NMI Taskswitch Priv (hvm32pae) Xen 4.9 regression with task switching (fixed around rc9!) 93 LoC (186 inc. docs), NMI Task Gate, userspace-initiated self-NMI XSA-203 CVE-2016-10025 (hvm32, shown before) NULL pointer dereference with vmfunc emulation 12 LoC (42 inc. docs) XSA-186 CVE-2016-7093 (hvm32, hvm64) Bad truncation of %eip, underflowing the instruction cache 119 LoC (241 inc. docs), 16bit code segment, executing above 64k Suspected Broadwell TLB erratum Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 10 / 13
  41. 41. Pagetable emulation test Started when cleaning up the XSA-176 PoC Mysteriously descheduled itself and ceased executing Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 11 / 13
  42. 42. Pagetable emulation test Started when cleaning up the XSA-176 PoC Mysteriously descheduled itself and ceased executing Found a large number of pagetable walking bugs Comprehensive test of Xen’s pagewalk against real hardware Skylake with PKRU, 4.2s to run, 2252800 unique pagewalks Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 11 / 13
  43. 43. Pagetable emulation test Started when cleaning up the XSA-176 PoC Mysteriously descheduled itself and ceased executing Found a large number of pagetable walking bugs Comprehensive test of Xen’s pagewalk against real hardware Skylake with PKRU, 4.2s to run, 2252800 unique pagewalks Issues fixed: * 2-level PSE36 superpages now return the correct translation. * 2-level L2 superpages without CR0.PSE now return the correct translation. * SMEP now inhibits a user instruction fetch even if NX isn’t active. * Supervisor writes without CR0.WP now set the leaf dirty bit. * L4e._PAGE_GLOBAL is strictly reserved on AMD. * 3-level l3 entries have all reserved bits checked. * 3-level entries can no longer alias Xen’s idea of paged or shared. Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 11 / 13
  44. 44. Pagetable emulation test Started when cleaning up the XSA-176 PoC Mysteriously descheduled itself and ceased executing Found a large number of pagetable walking bugs Comprehensive test of Xen’s pagewalk against real hardware Skylake with PKRU, 4.2s to run, 2252800 unique pagewalks Issues fixed: * 2-level PSE36 superpages now return the correct translation. * 2-level L2 superpages without CR0.PSE now return the correct translation. * SMEP now inhibits a user instruction fetch even if NX isn’t active. * Supervisor writes without CR0.WP now set the leaf dirty bit. * L4e._PAGE_GLOBAL is strictly reserved on AMD. * 3-level l3 entries have all reserved bits checked. * 3-level entries can no longer alias Xen’s idea of paged or shared. Still under development. Outstanding issues: Xen leaks EFER.NX into guests on Intel hardware AMD Zen doesn’t order A/D updates with loads Intel and AMD’s implementation of SMAP differs Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 11 / 13
  45. 45. Further testing ideas Currently, tests are “boot microkernel, wait for it to exit” Easy, and very effective for a lot of tasks Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 12 / 13
  46. 46. Further testing ideas Currently, tests are “boot microkernel, wait for it to exit” Easy, and very effective for a lot of tasks Device Model / Introspection testing Dom0 test agent connects to the IOREQ/VM EVENT ring Microkernel makes a set of specific actions Test agent checks for correct requests in the ring, and responds Microkernel checks for correct results of the responses Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 12 / 13
  47. 47. Further testing ideas Currently, tests are “boot microkernel, wait for it to exit” Easy, and very effective for a lot of tasks Device Model / Introspection testing Dom0 test agent connects to the IOREQ/VM EVENT ring Microkernel makes a set of specific actions Test agent checks for correct requests in the ring, and responds Microkernel checks for correct results of the responses Configuration testing In dom0, iterate over VM configuration options Boot the microkernel for each configuration, and check Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 12 / 13
  48. 48. Further testing ideas Currently, tests are “boot microkernel, wait for it to exit” Easy, and very effective for a lot of tasks Device Model / Introspection testing Dom0 test agent connects to the IOREQ/VM EVENT ring Microkernel makes a set of specific actions Test agent checks for correct requests in the ring, and responds Microkernel checks for correct results of the responses Configuration testing In dom0, iterate over VM configuration options Boot the microkernel for each configuration, and check Performance testing Microbenchmarking or multi-domain stress testing Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 12 / 13
  49. 49. Further testing ideas Currently, tests are “boot microkernel, wait for it to exit” Easy, and very effective for a lot of tasks Device Model / Introspection testing Dom0 test agent connects to the IOREQ/VM EVENT ring Microkernel makes a set of specific actions Test agent checks for correct requests in the ring, and responds Microkernel checks for correct results of the responses Configuration testing In dom0, iterate over VM configuration options Boot the microkernel for each configuration, and check Performance testing Microbenchmarking or multi-domain stress testing Combine with coverage See which paths are actually covered by a test Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 12 / 13
  50. 50. Any Questions? Source: git://xenbits.xen.org/xtf.git http://xenbits.xen.org/gitweb/?p=xtf.git Documentation: http://xenbits.xen.org/docs/xtf/ Examples: Already 37 tests upstream and running automatically http://xenbits.xen.org/docs/xtf/test-index.html Discussion: xen-devel@lists.xenproject.org irc://#xendevel@chat.freenode.net Andrew Cooper (Citrix XenServer) Xen Test Framework Thursday 13th July 2017 13 / 13

×