Performance Profiling of Virtual Machines

Transcript

  • 1. Performance Profiling of Virtual Machines
    Jiaqing Du+, Nipun Sehrawat*, Willy Zwaenepoel+
    +EPFL, Switzerland
    *University of Illinois at Urbana-Champaign
  • 2. Performance Profiling
    • Use CPU performance counters
    • Monitor software runtime behavior
    • Incur very low overhead
    • Used extensively: OProfile, VTune, …
    %CYCLE   Function           Module
    98.5529  vmx_vcpu_run       kvm-intel.ko
    0.2226   (no symbols)       libc.so
    0.1034   hpet_cpuhp_notify  vmlinux
    0.1034   native_patch       vmlinux
    Jiaqing Du, VEE, March 9, 2011
  • 3. Terminology
    (1) native profiling: profiler in the OS, on top of the CPU PMU
    (2) guest-wide profiling: profiler in the guest, on top of the VMM and the CPU PMU
    (3) system-wide profiling: profilers in both the guest and the VMM, on top of the CPU PMU
  • 4. Profiling with Virtual Machines
                           Para-virtualization  Hardware assistance  Binary translation
    Guest-wide profiling   ?                    ?                    ?
    System-wide profiling  XenOprof             ?                    ?
    Profilers do not work well with virtual machines.
  • 5. Contributions
    (1) Give solutions
                           Para-virtualization  Hardware assistance  Binary translation
    Guest-wide profiling   ?                    ?                    ?
    System-wide profiling  XenOprof             ?                    ?
    (2) Implement prototypes
  • 6. Outline
    • Native profiling
    • Guest-wide profiling
    • System-wide profiling
    • Evaluation
  • 7. Native Profiling
    • Performance monitoring unit (PMU)
      – consists of a set of event counters
      – generates an interrupt when a counter overflows
    • PMU-based profiler
      – user level: Control, Interpret
      – kernel: Configure, Collect (previous PC value, process identifier)
      – hardware: CPU PMU
  • 8. Guest-wide Profiling
    • Profiler runs in the guest and only profiles the guest
      – guest: Control, Interpret, Configure, Collect
      – below the guest: VMM, CPU PMU
    • Injected interrupts should be handled right after guest resumes execution.
    • Challenge: synchronous interrupt delivery to the guest
  • 9. System-wide Profiling (1/3)
    • Reveal runtime behavior of both VMM and guest(s)
      – guests: Guest1, Guest2
      – VMM: Control, Interpret, Configure, Collect
      – hardware: CPU PMU
    • Do not know the internals of a guest.
    • Challenge: interpret samples belonging to the guest
  • 10. System-wide Profiling (2/3)
    • Interpret guest samples: full delegation
      – guest: Control, Interpret, Configure, Collect
      – VMM: Control, Interpret, Configure, Collect
      – hardware: CPU PMU
  • 11. System-wide Profiling (3/3)
    • Interpret guest samples: interpretation delegation
      – guest: Control, Interpret, Configure, Collect
      – Shared Buffer between guest and VMM
      – VMM: Control, Interpret, Configure, Collect
      – hardware: CPU PMU
  • 12. PMU Multiplexing
    • When to save & restore performance counters?
    • CPU switch
      – only in-guest execution is accounted to the guest
      – timeline: guest1 | VMM (I/O for guest1) | guest2 | VMM (I/O for guest2) | guest2
      – the guest1 segment is accounted to guest 1; each guest2 segment is accounted to guest 2
    • Domain switch
      – in-VMM execution is also accounted to the guest
      – timeline: guest1 | VMM (I/O for guest1) | guest2 | VMM (I/O for guest2) | guest2
      – guest1 and the VMM I/O for guest1 are accounted to guest1; guest2 and the VMM I/O for guest2 are accounted to guest2
  • 13. Implementation
                           Para-virtualization  KVM  QEMU
    Guest-wide profiling   ?                    √    ?
    System-wide profiling  XenOprof             √    √
  • 14. Evaluation question #1
    How much does profiling slow down programs?
  • 15. Profiling Overhead
    • Measure execution time
      – a computation-intensive program
      – with and without profiling
      – about 400 counter overflows per second
    Profiling environment  Increased execution time
    Native Linux           0.04% ± 0.004%
    KVM guest-wide         0.39% ± 0.045%
    KVM system-wide        0.44% ± 0.043%
    QEMU system-wide       0.94% ± 0.044%
  • 16. Evaluation question #2
    Are profiling results accurate?
  • 17. Profiling Accuracy (1/4)
    • A computation-intensive benchmark
    • compute_{a|b}() does floating point arithmetic
    • Monitor CPU cycles
        int main(int argc, char *argv[])
        {
            while (1) {
                compute_a();
                compute_b();
            }
        }
  • 18. Profiling Accuracy (2/4)
    • Comparison with native profiling
    [Bar chart: cycle % (y-axis, 0 to 90) by routine name (compute_a, compute_b) for Native, KVM guest-wide, KVM system-wide, and QEMU system-wide]
  • 19. Profiling Accuracy (3/4)
    • A memory-intensive benchmark
    • Randomly access a fixed-size region of memory
    • Monitor last level cache misses
        struct item {
            struct item *next;
            long pad[NUM_PAD];
        };

        void chase_pointer()
        {
            /* follow the randomly linked list until it ends */
            struct item *p = &randomly_connected_items;
            while (p != NULL)
                p = p->next;
        }
  • 20. Profiling Accuracy (4/4)
    • Comparison with native profiling
    [Chart: cache misses per memory access (y-axis, 0 to 1.6) by working set size in KB (x-axis, 256 to 3072) for Native, KVM guest-wide, KVM system-wide, and QEMU system-wide]
  • 21. Evaluation question #3
    What is the difference between CPU switch and domain switch?
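  • 22. Recap
    • CPU switch
      – timeline: guest1 | VMM (I/O for guest1) | guest2 | VMM (I/O for guest2) | guest2
      – the guest1 segment is accounted to guest 1; each guest2 segment is accounted to guest 2
    • Domain switch
      – timeline: guest1 | VMM (I/O for guest1) | guest2 | VMM (I/O for guest2) | guest2
      – guest1 and the VMM I/O for guest1 are accounted to guest1; guest2 and the VMM I/O for guest2 are accounted to guest2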
  • 23. Profiling Packet Receive (1/2)
    • Experiment
      – push packets to a Linux guest in KVM
      – run OProfile in the guest
      – monitor instruction retirements
    [Diagram: a Linux machine with a hardware NIC, and a second machine with a hardware NIC running Linux and KVM, hosting a Linux guest with a virtual NIC]
  • 24. Profiling Packet Receive (2/2)
    CPU Switch
      INSTR  Function
      167    csum_partial
      106    csum_partial_copy_generic
      74     copy_to_user
      47     ipt_do_table
      38     tcp_v4_rcv
      …      …
    Domain Switch
      INSTR  Function                   Category
      2261   cp_interrupt               I/O Related
      1336   cp_rx_poll                 I/O Related
      1034   cp_start_xmit              I/O Related
      421    native_apic_mem_write      I/O Related
      374    native_apic_mem_read       I/O Related
      …      …
      191    csum_partial               Packet Processing
      105    csum_partial_copy_generic  Packet Processing
      94     copy_to_user               Packet Processing
      79     ipt_do_table               Packet Processing
      51     tcp_v4_rcv                 Packet Processing
    Domain switch gives more insight for I/O operations.
  • 25. Related Work
    • XenOprof
      – first profiler targeting virtual machines
      – system-wide profiling for Xen
    • Linux perf
      – a profiling infrastructure for Linux
      – limited support of profiling KVM Linux guest
    • VMware vmkperf
      – only read and write CPU performance counters
  • 26. Conclusions
                           Para-virtualization  Hardware assistance  Binary translation
    Guest-wide profiling   √                    √                    √
    System-wide profiling  XenOprof             √                    √