XenTT: Deterministic Systems Analysis in Xen

This talk details XenTT, an open-source framework for deterministic replay and systems analysis in development at the University of Utah. The framework consists of two main parts: a set of Xen extensions that implement efficient, deterministic replay, and a powerful analysis engine that extracts information from systems during replay executions.

Deterministic replay promises to change how people analyze and debug software systems. As software stacks grow in complexity, traditional ways of understanding failures, explaining anomalous executions, and analyzing performance are reaching their limits in the face of emergent behavior, unrepeatability, cross-component execution, software aging, and adversarial changes to code. Replay-based, whole-system analyses offer precise solutions to these problems.

XenTT extends Xen with the ability to replay and analyze the execution of VM guests. A number of careful design choices ensure that our implementation, which supports single-CPU, paravirtual, Linux guests, is efficient, maintainable, and extensible. XenTT's run-time checks and offline log-comparison tools enabled us to efficiently scale the recording layer by detecting and debugging errors in the determinism of replay.

Our analysis engine seeks to overcome the semantic gap between an analysis algorithm and the low-level state of a guest. Using debug information to reconstruct functions and data structures within the guest, the engine provides a convenient API for implementing systems analyses. The engine implements a powerful debug-symbol and VM introspection library, which enables an analysis to access the state of the guest through familiar terms. To further simplify the development of new analyses, the engine provides primitives that support common exploration patterns, e.g., breakpoints, watchpoints, and control-flow integrity checking. To enable performance analyses of recorded executions, XenTT provides a performance modeling interface, which faithfully replays performance parameters of the original run.

Beyond describing the design and implementation of XenTT, this talk will present examples of how we have used deterministic replay to implement security and performance analyses.




Presentation Transcript

  • 1. XenTT: Deterministic Systems Analysis in Xen. Anton Burtsev, David Johnson, Chung Hwan Kim, Mike Hibler, Eric Eide, John Regehr. University of Utah
  • 2. Record the execution history of a guest VM; recreate it in an instruction-accurate way
  • 3. Execution replay is the right way to analyze systems, and we need a practical tool!
  • 4. Deterministic replay
  • 5. Determinism: the CPU itself is deterministic; the execution history diverges only at nondeterministic inputs such as interrupts and device I/O
  • 6. Recording requires determinism of the execution environment and the instruction-accurate position of events, captured in an event log
  • 7. Instruction-accurate position of events. Counting the number of instructions since boot seems natural, and Intel provides a hardware counter, but it is not accurate. Instead, a position is the tuple { EIP, branch #, ECX }: the hardware branch counter plus single-stepping lets replay preempt execution of the system at exactly the same instruction, with ECX disambiguating iterations of rep-prefixed string instructions such as rep movsl
  • 8. Determinism in Xen
  • 9. Nondeterministic events, a simple model: the system is memory pages + registers; events are memory updates (time, device I/O, system calls) and control-flow updates, i.e., registers + stack (interrupts, events)
  • 10. Some examples:
    - Instruction emulation (e.g., cpuid, rdtsc, in/out): return the values from the original run
    - Hypercalls: re-execute to ensure determinism of the hypervisor
    - Time: shared info page + rdtsc
    - Exceptions: deterministic, re-execute
    - Interrupts: force re-execution of the interrupt-frame (bounce frame) code in entry.S
    - Shared info updates: replay original values
    - Memory: shadow page tables
  • 12. Xen devices
  • 13. Device interposition: devd ensures determinism of updates to the guest's shared ring buffers
  • 14. Replay touches many parts of Xen:
    - Device discovery: devd implements the concept of a device bus, discovers new devices in Xenstore, and binds new devices to drivers
    - Xenstore transactions: during replay, transactions from the replayed guest can't fail, because they will not be re-executed
    - Out-of-order device responses: disk and network responses can arrive out of order
    - Disk logging: the disk payload is deterministic; LVM snapshots
    - Network logging: in-kernel logging of the network payload
  • 15. Low-overhead logging
  • 16. Are we sure the executions are identical? Use the Intel Branch Trace Store (BTS) facility to record all taken branches in a memory buffer, then compare the original and replay runs
  • 17. Analysis Engine and Virtual Machine Introspection
  • 18. Analysis framework
  • 19. Virtual Machine Introspection (VMI)
  • 20. Performance model: account for the effects of replay and translate performance between the original and replay runs. Re-execution approach to performance: Virt_cntr = Virt_cntr_start + Δ(Real_cntr)
  • 21. Analysis Examples
  • 22. NFS request processing path: how much time do requests spend in each subsystem? Request tracking: the address of a kernel data structure serves as a unique identifier; join identifiers when requests move between subsystems
  • 23. Execution context tracking. Execution context covers:
    - Context switches: schedule.switch_tasks
    - User/kernel system-call transitions: system_call
    - Interrupts and exceptions: do_IRQ, do_page_fault, do_* (divide_error, debug, nmi, int3, ...)
    This makes analyses context-aware: filter probes by context, e.g., all page faults from the process "foo"
  • 24. Control Flow Integrity (CFI). A simple CFI model: returns should match calls, and dynamic calls are "sane", e.g., within the kernel address space. Detects ROP attacks, stack smashing, etc.
  • 25. Execution trace. CFI records a trace of function calls; sys_sendfile is the last system call before control flow jumps to 0x0:
    sys_sendfile, do_sendfile, fget_light, rw_verify_area, fget_light, rw_verify_area, shmem_file_sendfile, do_shmem_file_read, shmem_getpage, find_lock_page, radix_tree_lookup, shmem_recalc_inode, shmem_swp_alloc, shmem_swp_entry, kmap_atomic, __kmap_atomic, page_address, kunmap_atomic, find_get_page, radix_tree_lookup, file_send_actor, sock_sendpage, UNKNOWN FUNCTION (addr: 0x00000000)
  • 26. Intrusion backtracking: track accesses to "/etc/passwd". Probe sys_open, filter by file name, and find the process ID and branch counter
  • 27. Intrusion backtracking, pass 1: the process or its parent escalated privileges. Watch write accesses to &task->uid, filtered by parents of the offending process
  • 28. Intrusion backtracking, pass 2: find the system call inside which privileges are escalated. Probe sys_* (all system-call entry and exit points), filtered by the offending process ID
  • 29. Intrusion backtracking, pass 3: privilege escalation is a CFI violation. Start CFI analysis from the last system call; find the %EIP at which CFI is violated, and the location of the shellcode (0x0)
  • 30. Intrusion backtracking, pass 4: find the point at which address 0x00000000 gets mapped. Probe do_page_fault
  • 31. More mechanisms. Execution traces: BTS trace of all taken branches; instruction traces. Memory (variable) access traces: intersect them with the execution trace to see where variables get accessed
  • 32. How much overhead?
  • 33. Setup: 32-bit x86 PV guests on xen-unstable near v3.0.4 (we rely on working shadow page tables); single-CPU time-traveling guests (no SMP replay, though dom0 and Xen are of course SMP). Test machine: 4 cores, 1 Gbps network, 130 MB/s disks
  • 34. CPU (chart: benchmark score, Xen vs. XenTT)
  • 35. Network throughput (chart: MB/s for TCP send and TCP receive; Xen vs. XenTT vs. XenTT with 32 MB buffers)
  • 36. Network delay (chart: ping latency in ms, domU to dom0 and domU to an external host; Xen vs. XenTT)
  • 37. Disk I/O (chart: MB/s for serial write and serial read; Xen vs. XenTT)
  • 38. Log sizes, raw / compressed (gzip):
    - Linux boot: event log 4.3 GB / 0.8 GB
    - Idle overnight (14 hours): event log 6.2 GB / 1.6 GB; growth rate 440 MB/hour / 114 MB/hour (2.7 GB/day)
    - TCP receive (1.63 GB data stream): event log 1.76 GB / 342 MB; payload log 1.79 GB raw (compressed size is payload dependent)
    - Disk write (4 GB file): event log 1.8 GB / 350 MB
    - Disk read (4 GB file): event log 0.59 GB
  • 39. Lessons learnt
  • 40. Scaling development:
    - Extending Xen with BTS support: debug crashes in Xen and dom0
    - Execution comparison tools: BTS traces to understand what went wrong; support for resolving symbols
    - Run-time comparison tools: compare the guest's state between original and replay runs
    - Trace from all parts of your system: Xen, dom0, domU
    - Support performance tracing: Xentrace messages
  • 41. What we didn't predict:
    - I/O delay goes up: we are not sure Linux has adequate support for low-latency user-level processing; we may need an in-kernel interposition component
    - Branch counters are fragile: our code works on several server CPUs but fails on a laptop whose CPU is from the same model/family line
  • 42. Conclusions
  • 43. Practical replay analysis is feasible:
    - Performance overheads are reasonable for realistic systems and realistic workloads, with minor setup costs (just install Xen)
    - The analysis engine is an amazing tool, and we're growing it
    - We need your help to port it upstream: the porting effort, shadow memory for PV guests, and support for HVM guests
  • 44. Thank you. All code is GPLv2 and will be available soon (available on request now). Questions: aburtsev@flux.utah.edu
  • 45. Backup slides
  • 46. Execution environment: the deterministic machine (CPU), the system (virtual hardware: memory, disk; software), and external events (external I/O)
  • 47. How do we find nondeterministic events?
  • 48. Device interface
  • 49. Reading a local variable:

    bsym_skb = target_lookup_sym(probe->target, "netif_poll.skb", ".", NULL, flags);
    lval_skb = bsymbol_load(bsym_skb, flags);
    skb_addr = *(unsigned long *) lval_skb->buf;
  • 50. Controlled re-execution, or replay. Types of nondeterministic events: synchronous (hypercalls), asynchronous (interrupts), best effort (time updates). Branch counters are the biggest problem with this solution