Performance Profiling of Virtual Machines

  1. Performance Profiling of Virtual Machines
     Jiaqing Du+, Nipun Sehrawat*, Willy Zwaenepoel+
     +EPFL, Switzerland
     *University of Illinois at Urbana-Champaign
  2. Performance Profiling
     • Use CPU performance counters
     • Monitor software runtime behavior
     • Incur very low overhead
     • Used extensively: OProfile, VTune, …

     %CYCLE    Function            Module
     98.5529   vmx_vcpu_run        kvm-intel.ko
     0.2226    (no symbols)        libc.so
     0.1034    hpet_cpuhp_notify   vmlinux
     0.1034    native_patch        vmlinux

     Jiaqing Du, VEE, March 9, 2011
  3. Terminology
     (1) native profiling: the profiler runs in the OS, directly on the CPU PMU
     (2) guest-wide profiling: the profiler runs in the guest, on top of the VMM
     (3) system-wide profiling: profilers run in both the guest and the VMM
  4. Profiling with Virtual Machines

                             Para-virtualization   Hardware assistance   Binary translation
     Guest-wide profiling    ?                     ?                     ?
     System-wide profiling   XenOprof              ?                     ?

     Profilers do not work well with virtual machines.
  5. Contributions
     (1) Give solutions

                             Para-virtualization   Hardware assistance   Binary translation
     Guest-wide profiling    ?                     ?                     ?
     System-wide profiling   XenOprof              ?                     ?

     (2) Implement prototypes
  6. Outline
     • Native profiling
     • Guest-wide profiling
     • System-wide profiling
     • Evaluation
  7. Native Profiling
     • Performance monitoring unit (PMU)
       – consists of a set of event counters
       – generates an interrupt when a counter overflows
     • PMU-based profiler
       – user level: Control, Interpret
       – kernel: Configure, Collect
       – hardware: CPU PMU
       – sample contents: previous PC value, process identifier
  8. Guest-wide Profiling
     • Profiler runs in the guest and only profiles the guest
     • Diagram: Guest (Control, Interpret, Configure, Collect) on top of the VMM, on top of the CPU PMU
     • Injected interrupts should be handled right after the guest resumes execution.
     Challenge: synchronous interrupt delivery to the guest
  9. System-wide Profiling (1/3)
     • Reveal runtime behavior of both VMM and guest(s)
     • Diagram: Guest1 and Guest2 on top of the VMM (Control, Interpret, Configure, Collect), on top of the CPU PMU
     • The VMM does not know the internals of a guest.
     Challenge: interpret samples belonging to the guest
  10. System-wide Profiling (2/3)
     • Interpret guest samples: full delegation
     • Diagram: Guest (Control, Interpret, Configure, Collect) on top of the VMM (Control, Interpret, Configure, Collect), on top of the CPU PMU
  11. System-wide Profiling (3/3)
     • Interpret guest samples: interpretation delegation
     • Diagram: Guest (Control, Interpret, Configure, Collect) and VMM (Control, Interpret, Configure, Collect) connected by a Shared Buffer, on top of the CPU PMU
  12. PMU Multiplexing
     • When to save & restore performance counters?
     • Timeline: guest1 | VMM (I/O for guest1) | guest2 | VMM (I/O for guest2) | guest2
     • CPU switch
       – only in-guest execution is accounted to the guest
       – the guest1 segment is accounted to guest1; each guest2 segment is accounted to guest2
     • Domain switch
       – in-VMM execution is also accounted to the guest
       – guest1 and its VMM I/O segment are accounted to guest1; both guest2 segments and the VMM I/O segment between them are accounted to guest2
  13. Implementation

                             Para-virtualization   KVM   QEMU
     Guest-wide profiling    ?                     √     ?
     System-wide profiling   XenOprof              √     √
  14. Evaluation question #1
     How much does profiling slow down programs?
  15. Profiling Overhead
     • Measure execution time
       – a computation-intensive program
       – with and without profiling
       – about 400 counter overflows per second

     Profiling environment   Increased execution time
     Native Linux            0.04% ± 0.004%
     KVM guest-wide          0.39% ± 0.045%
     KVM system-wide         0.44% ± 0.043%
     QEMU system-wide        0.94% ± 0.044%
  16. Evaluation question #2
     Are profiling results accurate?
  17. Profiling Accuracy (1/4)
     • A computation-intensive benchmark
     • compute_{a|b}() does floating point arithmetic
     • Monitor CPU cycles

     int main(int argc, char *argv[])
     {
         while (1) {
             compute_a();
             compute_b();
         }
     }
  18. Profiling Accuracy (2/4)
     • Comparison with native profiling
     • Figure: cycle % (axis 0 to 90) per routine (compute_a, compute_b) for Native, KVM guest-wide, KVM system-wide, and QEMU system-wide
  19. Profiling Accuracy (3/4)
     • A memory-intensive benchmark
     • Randomly access a fixed-size region of memory
     • Monitor last level cache misses

     struct item {
         struct item *next;      /* next item in the random chain */
         long pad[NUM_PAD];      /* padding that sets the item size */
     };

     void chase_pointer(void)
     {
         /* randomly_connected_items: first item of the chain, set up elsewhere */
         struct item *p = &randomly_connected_items;

         while (p != NULL)
             p = p->next;
     }
  20. Profiling Accuracy (4/4)
     • Comparison with native profiling
     • Figure: cache misses per memory access (axis 0 to 1.6) versus working set size (256 KB to 3072 KB) for Native, KVM guest-wide, KVM system-wide, and QEMU system-wide
  21. Evaluation question #3
     What is the difference between CPU switch and domain switch?
  22. Recap
     • Timeline: guest1 | VMM (I/O for guest1) | guest2 | VMM (I/O for guest2) | guest2
     • CPU switch: only the in-guest segments are accounted, the guest1 segment to guest1 and each guest2 segment to guest2
     • Domain switch: the VMM I/O segments are accounted too, so guest1 plus its I/O goes to guest1 and guest2 plus its I/O goes to guest2
  23. Profiling Packet Receive (1/2)
     • Experiment
       – push packets to a Linux guest in KVM
       – run OProfile in the guest
       – monitor instruction retirements
     • Diagram: a Linux guest with a virtual NIC on KVM, on hardware with a NIC, connected to a Linux machine on hardware with a NIC
  24. Profiling Packet Receive (2/2)

     CPU switch
     INSTR   Function                    Category
     167     csum_partial                packet processing
     106     csum_partial_copy_generic   packet processing
     74      copy_to_user                packet processing
     47      ipt_do_table                packet processing
     38      tcp_v4_rcv                  packet processing
     …       …

     Domain switch
     INSTR   Function                    Category
     2261    cp_interrupt                I/O related
     1336    cp_rx_poll                  I/O related
     1034    cp_start_xmit               I/O related
     421     native_apic_mem_write       I/O related
     374     native_apic_mem_read        I/O related
     …       …
     191     csum_partial                packet processing
     105     csum_partial_copy_generic   packet processing
     94      copy_to_user                packet processing
     79      ipt_do_table                packet processing
     51      tcp_v4_rcv                  packet processing
     …       …

     Domain switch gives more insight for I/O operations.
  25. Related Work
     • XenOprof
       – first profiler targeting virtual machines
       – system-wide profiling for Xen
     • Linux perf
       – a profiling infrastructure for Linux
       – limited support for profiling a Linux guest on KVM
     • VMware vmkperf
       – only reads and writes CPU performance counters
  26. Conclusions

                             Para-virtualization   Hardware assistance   Binary translation
     Guest-wide profiling    √                     √                     √
     System-wide profiling   XenOprof              √                     √
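
The two accounting policies from the PMU Multiplexing slide can be illustrated with a small sketch. It is not the authors' implementation: it only models the slide's timeline as a list of segments and sums the events charged to one guest under each policy. All names here (`struct segment`, `account`, `CPU_SWITCH`, `DOMAIN_SWITCH`) are hypothetical, and it assumes each in-VMM segment is tagged with the guest it was serving.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the timeline; none of these names come from the slides. */
enum ctx { CTX_GUEST, CTX_VMM };               /* who was executing */
enum policy { CPU_SWITCH, DOMAIN_SWITCH };     /* when counters are saved and restored */

struct segment {
    enum ctx ctx;          /* guest code or VMM code */
    int guest;             /* guest that ran, or guest the VMM was serving (assumption) */
    unsigned long events;  /* PMU events counted during this segment */
};

/*
 * Sum the events charged to `guest`.
 * CPU switch: only in-guest segments are charged to the guest.
 * Domain switch: in-VMM segments run on the guest's behalf are charged too.
 */
unsigned long account(const struct segment *t, size_t n, int guest, enum policy p)
{
    unsigned long total = 0;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (t[i].guest != guest)
            continue;                      /* segment belongs to another guest */
        if (t[i].ctx == CTX_VMM && p == CPU_SWITCH)
            continue;                      /* VMM time is not charged under CPU switch */
        total += t[i].events;
    }
    return total;
}
```

Calling `account` on the same timeline with both policies gives a difference equal to the in-VMM work done for that guest, which is the extra I/O-related activity the domain-switch column shows in the packet receive experiment.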
