XenTT: Deterministic Systems Analysis in Xen


Published on

This talk details XenTT, an open-source framework for deterministic replay and systems analysis in development at the University of Utah. The framework consists of two main parts: a set of Xen extensions that implement efficient, deterministic replay, and a powerful analysis engine that extracts information from systems during replay executions.

Deterministic replay promises to change how people analyze and debug software systems. As software stacks grow in complexity, traditional ways of understanding failures, explaining anomalous executions, and analyzing performance are reaching their limits in the face of emergent behavior, unrepeatability, cross-component execution, software aging, and adversarial changes to code. Replay-based, whole-system analyses offer precise solutions to these problems.

XenTT extends Xen with the ability to replay and analyze the execution of VM guests. A number of careful design choices ensure that our implementation, which supports single-CPU, paravirtual, Linux guests, is efficient, maintainable, and extensible. XenTT's run-time checks and offline log-comparison tools enabled us to efficiently scale the recording layer by detecting and debugging errors in the determinism of replay.

Our analysis engine seeks to overcome the semantic gap between an analysis algorithm and the low-level state of a guest. Using debug information to reconstruct functions and data structures within the guest, the engine provides a convenient API for implementing systems analyses. The engine implements a powerful debug-symbol and VM introspection library, which enables an analysis to access the state of the guest through familiar terms. To further simplify the development of new analyses, the engine provides primitives that support common exploration patterns, e.g., breakpoints, watchpoints, and control-flow integrity checking. To enable performance analyses of recorded executions, XenTT provides a performance modeling interface, which faithfully replays performance parameters of the original run.

Beyond describing the design and implementation of XenTT, this talk will present examples of how we have used deterministic replay to implement security and performance analyses.

Published in: Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

XenTT: Deterministic Systems Analysis in Xen

  1. 1. XenTT: Deterministic Systems Analysisin XenAnton Burtsev, David Johnson, Chung Hwan Kim, Mike Hibler,Eric Eide, John RegehrUniversity of Utah
  2. 2. • Record execution history of a guest VM• Recreate it in an instruction-accurate way 2
  3. 3. • Execution replay is a right way to analyze systems• We need a practical tool! 3
  4. 4. Deterministic replay 4
  5. 5. DeterminismCPU is deterministic Execution history Interrupt, device I/O 5
  6. 6. Recording• Determinism of the execution environment• Instruction-accurate position of events Event log 6
  7. 7. Instruction-accurate position of eventslabel: ... mov • Number of instructions shr since boot mov • Intel has a hardware rep movsl counter test • It’s not accurate jne label ...• Hardware instruction counter Position = { EIP, branch #, ECX} • Preempt execution of a system at the same instruction • Hardware instruction counter + single-stepping 7
  8. 8. Determinism in Xen 8
  9. 9. Nondeterministic events Simple Model System = memory pages + registers Events = memory updates (time, device I/O, system calls)Control flow updates = registers + stack (interrupts, events) 9
  10. 10. Some examples• Instruction emulation (e.g. cpuid, rdtsc, in/out) • Return the values of the original run• Hypercalls • Re-execute to ensure determinism of the hypervisor• Time • Shared info page + rdtsc• Exceptions • Deterministic, re-execute• Interrupts • Force re-execution of the interrupt frame (bounce frame) code in entry.S• Shared info updates • Replay original values• Memory • Shadow page tables 10
  11. 11. 11
  12. 12. Xen devices 12
  13. 13. Device interposition• Devd ensures determinism of updates to the guest’s shared ring buffers 13
  14. 14. Replay touches many parts of Xen• Device discovery • Devd implements a concept of a device bus • Discovers new devices in Xenstore • Binds new devices with drivers• Xenstore transactions • During replay, transactions from replayed guest can’t fail • They will not be re-executed• Out-of-order device responses • Disk and network responses can arrive out-of-order• Disk logging • Disk payload is deterministic • LVM snapshots• Network logging • In-kernel logging of the network payload 14
  15. 15. Low-overhead logging 15
  16. 16. Are we sure that executions identical?• Intel branch store trace facility• Record all taken branches in a memory buffer • Compare original and replay runs 16
  17. 17. Analysis Engine andVirtual Machine Introspection 17
  18. 18. Analysis framework 18
  19. 19. Virtual Machine Introspection (VMI) 19
  20. 20. Performance model • Account for effects of replay • Translate performance between original and replay runs• Re-execution approach to performance Virt cntr = Virt cntrstart + ∆ (Real cntr) 20
  21. 21. Analysis Examples 21
  22. 22. NFS request processing path• How much time requests spend in each subsystem?• Request tracking • Address of the kernel data structure is a unique identifier • Join identifiers when requests move between subsystems 22
  23. 23. Execution context tracking• Execution context • Context switches • schedule.switch_tasks • User/kernel • System call transitions • system_call • Interrupts and exceptions • do_IRQ • do_pagefault • do_* (divide_error, debug, nmi, int3...)• Make analysis context aware • Filter probes by context, e.g. • All pagefaults from the process “foo” 23
  24. 24. Control Flow Integrity (CFI) • Simple CFI model: • Returns should match calls • Dynamic calls are “sane”, e.g. within kernel address space • Detect ROP attacks, stack smashing, etc. 24
  25. 25. Execution tracesys_sendfile do_sendfile fget_light • CFI records a trace of rw_verify_area fget_light function calls rw_verify_area shmem_file_sendfile • sys_sendfile is do_shmem_file_read shmem_getpage the last system call find_lock_page radix_tree_lookup before control shmem_recalc_inode shmem_swp_alloc flow jumps to 0x0 shmem_swp_entry kmap_atomic __kmap_atomic page_address kunmap_atomic find_get_page radix_tree_lookup file_send_actor sock_sendpage UNKNOWN FUNCTION (addr:0x00000000) 25
  26. 26. Intrusion backtracking• Track accesses to “/etc/passwd”• Probe sys_open • Filter by file name • Find process ID, branch counter 26
  27. 27. Intrusion backtracking: pass 1• Process or it’s parent escalated privileges• Watch write accesses to &task->uid • Filter by parents of the offending process 27
  28. 28. Intrusion backtracking: pass 1• Find the syscall inside which privileges are escalated• Probe sys_* - all system call entry and exit points • Filter by the offending process ID 28
  29. 29. Intrusion backtracking: pass 1• Privilege escalation is a CFI violation• Start CFI analysis from the last system call• Find %EIP at which CFI is violated, and location of the shell code (0x0) 29
  30. 30. Intrusion backtracking: pass 1• Find at which point address 0x0000000 gets mapped• Probe do_page_fault 30
  31. 31. More mechanisms• Execution traces • BTS trace of all taken branches • Instruction traces• Memory (variable) access traces • Intersect with the execution trace • See where variables get accessed 31
  32. 32. How much overhead? 32
  33. 33. • 32-bit x86 PV-guests• xen-unstable near v3.0.4 • We rely on a working shadow page tables• 1-CPU time-traveling guests • No SMP replay • Dom0 and Xen are SMP of course• Test machine • 4 cores • 1Gbps network • 130 MB/s disks 33
  34. 34. CPU 140 120 100 80Score 60 40 Xen XenTT 20 0 34
  35. 35. Network throughput 140 120 100 80MB/s Xen 60 XenTT XenTT (32MB buffers) 40 20 0 TCP Send TCP Receive 35
  36. 36. Network delay 0.16 0.14 0.12 0.1ms 0.08 Xen 0.06 XenTT 0.04 0.02 0 domU to dom0 domU to external host Ping 36
  37. 37. Disk I/O 140 120 100 80MB/s Xen 60 XenTT 40 20 0 Serial write Serial read 37
  38. 38. Raw Compressed (gzip)Linux bootEvent log 4.3 GB 0.8 GBIdle overnight (14 hours)Event log 6.2 GB 1.6 GBGrowth rate 440 MB/hour 114 MB/hour (2.7GB/day)TCP receive (1.63 GB data stream)Event log 1.76 GB 342 MBPayload log 1.79 GB Payload dependentDisk write (4 GB file)Event log 1.8 GB 350 MBDisk read (4 GB file)Event log 0.59 GB 38
  39. 39. Lessons learnt 39
  40. 40. Scaling development• Extending Xen with BTS support • Debug crashes in Xen, and dom0• Execution comparison tools • BTS traces to understand what went wrong • Support for resolving symbols• Run-time comparison tools • Compare guest’s state between original and replay runs• Trace from all parts of your system • Xen, dom0, domU• Support performance tracing • Xentrace messages 40
  41. 41. What we didn’t predict• I/O delay goes up • Not sure if Linux has adequate low-latency user-level processing support • Maybe need an in-kernel interposition component• Branch counters are fragile • Our code works on several server CPUs • Fails on a laptop with the CPU from the same model/family line 41
  42. 42. Conclusions 42
  43. 43. Practical replay analysis is feasible• Performance overheads are reasonable • Realistic systems • Realistic workloads • Minor setup costs (just install Xen)• Analysis engine is an amazing tool • And we’re growing it• We need your help to port it upstream • Porting effort • Shadow memory for PV guests • Support for HVM guests 43
  44. 44. Thank you.All code is GPLv2 and will be available soon (available on-request now). Questions: aburtsev@flux.utah.edu 44
  45. 45. Backup slides 45
  46. 46. Execution environment Deterministic• CPU Machine• Virtual hardware • Memory System • Disk• Software External Events • External I/O 46
  47. 47. How do we find nondeterministic events? 47
  48. 48. Device interface 48
  49. 49. Reading a local variablebsym_skb = target_lookup_sym(probe->target, "netif_poll.skb", ".", NULL, flags);lval_skb = bsymbol_load(bsym_skb, flags);skb_addr = *(unsigned long*) lval_skb->buf; 49
  50. 50. Controlled re-execution or replay• Types of non-deterministic events • Synchronous • Hypercalls • Asynchronous • Interrupts • Best effort • Time updates• Branch counters • The biggest problem of this solution 50