* Distributed System Lab 1 Analysis and experimental ...


    1. Analysis and experimental evaluation of data plane virtualization with Xen 04/03/10 Distributed System Lab 游清權
    2. Outline <ul><li>Introduction </li></ul><ul><li>Virtual Network with Xen </li></ul><ul><ul><li>Data path in Xen </li></ul></ul><ul><ul><li>Routers data plane virtualization with Xen </li></ul></ul><ul><ul><li>Performance problem statement </li></ul></ul><ul><li>Experiments and Analysis </li></ul><ul><li>Related work </li></ul><ul><li>Conclusion and perspectives </li></ul>
    3. Introduction <ul><li>System virtualization provides </li></ul><ul><ul><li>Isolation </li></ul></ul><ul><ul><li>Mobility </li></ul></ul><ul><ul><li>Dynamic reconfiguration </li></ul></ul><ul><ul><li>Fault tolerance of distributed systems </li></ul></ul><ul><ul><li>Increased security due to isolation </li></ul></ul>
    4. Introduction <ul><li>Virtualization could potentially solve the main issues of today's Internet (security, mobility, reliability, configurability) </li></ul><ul><ul><li>but it adds overhead due to the additional layers </li></ul></ul><ul><li>It also implies sharing resources such as </li></ul><ul><ul><li>the network interfaces </li></ul></ul><ul><ul><li>the processors </li></ul></ul><ul><ul><li>the memory (buffer space) </li></ul></ul><ul><ul><li>the switching fabric </li></ul></ul><ul><li>Getting predictable, stable and optimal performance is a challenge! </li></ul>
    5. Virtual Network with Xen <ul><li>Data path in Xen </li></ul>
    6. Data path in Xen <ul><li>VMs in Xen access the network hardware through the virtualization layer </li></ul><ul><li>Each domU has a virtual interface for each physical network interface </li></ul><ul><li>A virtual interface is accessed via a split device driver (frontend driver in domU, backend driver in dom0) </li></ul>
    7. 1. Data path in Xen <ul><li>Network packets emitted on a VM </li></ul><ul><li>are copied to a segment of shared memory by the Xen hypervisor and transmitted to dom0 </li></ul><ul><li>Packets are bridged (path 1) or routed (path 2) between the virtual interfaces and the physical ones </li></ul><ul><li>The dashed line shows the additional path a packet takes </li></ul><ul><li>Overhead: </li></ul><ul><ul><li>copies through shared memory </li></ul></ul><ul><ul><li>multiplexing and demultiplexing </li></ul></ul>
    8. 2. Routers data plane virtualization with Xen <ul><li>Xen can be used for fully (i.e. control plane and data plane) virtualized software routers </li></ul><ul><li>Figure 2: architecture with software routers loaded into two virtual machines to create virtual routers </li></ul><ul><li>VMs have no direct access to the physical hardware interfaces </li></ul><ul><li>Packets are forwarded between each virtual interface and the corresponding physical interface (multiplexing and demultiplexing) </li></ul>
    9. (figure)
    10. 3. Performance problem statement
    11. 3. Performance problem statement <ul><li>Efficiency is defined in terms of throughput </li></ul><ul><li>Fairness of the inter-virtual-machine resource sharing is derived from the classical Jain index [6]: J = (Σxi)² / (n · Σxi²) </li></ul><ul><li>n: number of VMs sharing the physical resources </li></ul><ul><li>xi: the metric achieved by each virtual machine i </li></ul>
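The Jain index used as the fairness metric above can be computed directly. A minimal sketch; the function name and the sample throughput values are illustrative, not taken from the paper:

```python
def jain_index(throughputs):
    """Jain fairness index: J = (sum of x_i)^2 / (n * sum of x_i^2).
    J = 1 means perfectly fair sharing; J = 1/n means one VM gets everything."""
    n = len(throughputs)
    return sum(throughputs) ** 2 / (n * sum(x ** 2 for x in throughputs))

# 4 VMs sharing a 938 Mb/s link fairly vs. one VM monopolizing it
print(jain_index([234.5, 234.5, 234.5, 234.5]))  # -> 1.0
print(jain_index([938.0, 0.0, 0.0, 0.0]))        # -> 0.25
```

The index is scale-free: multiplying all throughputs by a constant leaves J unchanged, so it isolates fairness from efficiency.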
    12. Experiments and Analysis <ul><li>1. Experimental setup </li></ul><ul><li>All experiments are executed on the fully controlled, reservable and reconfigurable French national testbed Grid’5000 [4]. </li></ul><ul><li>End-hosts are IBM eServer 325 machines </li></ul><ul><ul><li>with 2 AMD Opteron 246 CPUs (2.0 GHz / 1 MB) </li></ul></ul><ul><ul><li>with one core each </li></ul></ul><ul><ul><li>2 GB of memory and a 1 Gb/s NIC </li></ul></ul>
    13. Experiments and Analysis <ul><li>Virtual routers are hosted on IBM eServer 326m machines </li></ul><ul><ul><li>with 2 AMD Opteron 246 CPUs (2.0 GHz / 1 MB) </li></ul></ul><ul><ul><li>with one core each </li></ul></ul><ul><ul><li>2 GB of memory and two 1 Gb/s NICs </li></ul></ul><ul><li>Xen 3.1.0 and 3.2.1, each with its correspondingly modified Linux kernel (2.6.18-3 for 3.1.0) </li></ul><ul><li>Measurement tools </li></ul><ul><ul><li>iperf for TCP throughput </li></ul></ul><ul><ul><li>netperf for UDP rate </li></ul></ul><ul><ul><li>xentop for CPU utilization </li></ul></ul><ul><ul><li>the classical ping utility for latency </li></ul></ul>
    14. Experiments and Analysis <ul><li>Evaluation of virtual end-hosts </li></ul><ul><ul><li>Network performance on virtual end-hosts implemented with Xen 3.1 and Xen 3.2 </li></ul></ul><ul><ul><li>Some results with Xen 3.1 were not satisfying, dom0 being the bottleneck </li></ul></ul><ul><ul><li>A second run on Xen 3.1, attributing more CPU time to dom0 (up to 32 times the share attributed to a domU), is called Xen 3.1a </li></ul></ul>
    15. Sending performance <ul><li>First experiment </li></ul><ul><ul><li>TCP sending throughput on 1, 2, 4 and 8 virtual hosts </li></ul></ul><ul><ul><li>Figure 3: throughput per VM and aggregate throughput </li></ul></ul><ul><ul><li>On 3.1 and 3.2, throughput is close to the classical Linux throughput Rclassical(T/R) = 938 Mb/s </li></ul></ul><ul><ul><li>On 3.1a and 3.2, the aggregate throughput obtained by the VMs is roughly higher than on 3.1 </li></ul></ul>
    16. (figure)
    17. Sending performance <ul><li>We conclude that in all three cases </li></ul><ul><ul><li>the system is efficient and predictable (in throughput) </li></ul></ul><ul><li>The throughput per VM corresponds to the fair share of the available bandwidth of the link (Rtheoretical/N). </li></ul>
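The fair-share baseline Rtheoretical/N is a simple division. A sketch to make the per-VM targets concrete; the 938 Mb/s reference value is from the slides, while the helper name is ours:

```python
R_THEORETICAL = 938.0  # Mb/s, classical Linux TCP throughput on the 1 Gb/s link

def fair_share(n_vms, link_rate=R_THEORETICAL):
    """Per-VM fair share of the link: R_theoretical / N."""
    return link_rate / n_vms

for n in (1, 2, 4, 8):
    print(f"{n} VMs -> {fair_share(n):.2f} Mb/s each")
# 8 VMs -> 117.25 Mb/s each
```

A per-VM throughput matching this value, together with a Jain index near 1, is what the slides call efficient and predictable sharing.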
    18. Sending performance <ul><li>Figure 4 shows the average CPU utilization for each guest domain </li></ul><ul><li>For a single domU </li></ul><ul><ul><li>the two CPUs are used at around 50% in all three setups (Xen 3.1, 3.1a and 3.2) </li></ul></ul><ul><li>On a Linux system without virtualization </li></ul><ul><ul><li>only Cclassical(E) = 32% of both CPUs is in use </li></ul></ul><ul><li>With 8 domUs </li></ul><ul><ul><li>both CPUs are used at over 70% </li></ul></ul>
    19. (figure)
    20. Sending performance <ul><li>3.1a: increasing dom0’s CPU weight </li></ul><ul><li>Even if virtualization introduces a processing overhead, two processors allow achieving a throughput equivalent to the maximum theoretical throughput with 8 concurrent VMs on a 1 Gb/s link. </li></ul><ul><li>The fairness index is here close to 1 (bandwidth and CPU time are fairly shared) </li></ul>
    21. 2. Receiving performance <ul><li>Figure 5 </li></ul><ul><li>Xen 3.1: the aggregate throughput decreases slightly </li></ul><ul><ul><li>(depending on the number of VMs) </li></ul></ul><ul><li>Only 882 Mb/s on a single domU </li></ul><ul><li>Only 900 Mb/s on a set of 8 concurrent domUs </li></ul><ul><ul><li>which corresponds to around 95% of the throughput Rclassical(T/R) = 938 Mb/s on a classical Linux system </li></ul></ul>
    22. Receiving performance <ul><li>The throughput efficiency E </li></ul><ul><ul><li>varies between 0.94 for a single domU and 0.96 for 8 domUs </li></ul></ul><ul><li>Changing the scheduler parameters (Xen 3.1a) </li></ul><ul><ul><li>improves the aggregate throughput to about 970 Mb/s on 8 virtual machines </li></ul></ul>
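The efficiency values above follow directly from the measured rates divided by the native-Linux reference. A sketch; the 882, 900 and 938 Mb/s figures are the slides', the function name is ours:

```python
R_CLASSICAL = 938.0  # Mb/s, receive throughput of a non-virtualized Linux host

def throughput_efficiency(aggregate_mbps, reference=R_CLASSICAL):
    """Efficiency E: achieved aggregate throughput relative to native Linux."""
    return aggregate_mbps / reference

print(round(throughput_efficiency(882.0), 2))  # single domU on Xen 3.1 -> 0.94
print(round(throughput_efficiency(900.0), 2))  # 8 domUs on Xen 3.1 -> 0.96
```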
    23. Receiving performance <ul><li>On Xen 3.1, with a growing number of domUs, the bandwidth sharing between the domUs is very unfair </li></ul><ul><li>This comes from unfair treatment of the events and has been fixed in Xen 3.2 </li></ul><ul><li>Simply providing dom0 with more CPU time </li></ul><ul><ul><li>(3.1a) improves fairness in Xen 3.1 by giving dom0 enough time to treat all the events </li></ul></ul>
    24. Receiving performance <ul><li>Fair resource sharing </li></ul><ul><ul><li>makes performance much more predictable </li></ul></ul><ul><li>Xen 3.2 behaves similarly to Xen 3.1a </li></ul><ul><ul><li>throughput increases by about 6% </li></ul></ul><ul><ul><li>(compared to the default 3.1 version) </li></ul></ul>
    25. Receiving performance (figure)
    26. Receiving performance <ul><li>Total CPU cost </li></ul><ul><ul><li>varies between 70% and 75% (Xen 3.1 and 3.2) </li></ul></ul><ul><ul><li>an important overhead compared to a Linux system without virtualization </li></ul></ul><ul><ul><li>where network reception takes Cclassical(R) = 24% </li></ul></ul><ul><li>Notice that on default Xen 3.1 </li></ul><ul><ul><li>the throughput efficiency decreases, yet the available CPU time is not entirely consumed </li></ul></ul><ul><ul><li>the cause is unfairness </li></ul></ul>
    27. Receiving performance <ul><li>The proposal (3.1a) improves fairness but increases CPU usage </li></ul><ul><li>Xen 3.2 </li></ul><ul><ul><li>domU CPU sharing is fair (dom0’s CPU usage decreases slightly) </li></ul></ul><ul><ul><li>less total CPU overhead while achieving better throughput </li></ul></ul><ul><li>Conclusion: </li></ul><ul><ul><li>important improvements have been implemented in Xen 3.2 to decrease the excessive dom0 CPU overhead </li></ul></ul>
    28. Receiving performance (figure)
    29. 3. Evaluation of virtual routers <ul><li>Forwarding performance of virtual routers with 2 NICs </li></ul><ul><ul><li>UDP receiving throughput over the VMs </li></ul></ul><ul><ul><li>Maximum-sized packets are sent at maximum link speed over the virtual routers and the TCP throughput is measured </li></ul></ul><ul><ul><li>Latency over the virtual routers is also measured </li></ul></ul><ul><li>Xen 3.2a </li></ul><ul><ul><li>Xen 3.2 in its default configuration, but </li></ul></ul><ul><ul><li>with an increased weight parameter for dom0 in CPU scheduling </li></ul></ul>
    30. 3. Evaluation of virtual routers <ul><li>Forwarding performance </li></ul>
    31. 3. Evaluation of virtual routers <ul><li>Performance of virtual routers </li></ul><ul><ul><li>Generate UDP traffic over one or several virtual routers (1 to 8) sharing a single physical machine </li></ul></ul><ul><ul><ul><li>with maximum-sized (1500 bytes) </li></ul></ul></ul><ul><ul><ul><li>and minimum-sized (64 bytes) packets </li></ul></ul></ul><ul><li>Figure 7: obtained UDP bit rate and TCP throughput </li></ul><ul><li>Expected packet loss rate with maximum-sized packets on each VM: </li></ul><ul><li>1 − Rtheoretical/(N × Rtheoretical) = 1 − 1/N </li></ul><ul><li>Classical Linux router: Rclassical(F) = 957 Mb/s </li></ul>
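The loss-rate formula above reduces to 1 − 1/N: N virtual routers each offer traffic at the full link rate, but the shared link delivers only one link's worth. A sketch; the function name is ours:

```python
def expected_loss_rate(n_routers):
    """Expected UDP loss when N virtual routers each offer max-rate traffic:
    aggregate offered load is N * R, the link delivers R, so the dropped
    fraction is 1 - R / (N * R) = 1 - 1/N."""
    return 1.0 - 1.0 / n_routers

print(expected_loss_rate(2))  # -> 0.5
print(expected_loss_rate(8))  # -> 0.875
```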
    32. 3. Evaluation of virtual routers <ul><li>Detailed UDP packet rates and loss rates per domU with maximum- and minimum-sized packets. </li></ul>
    33. 3. Evaluation of virtual routers <ul><li>The aggregate UDP rate is in some cases a bit higher than the theoretical value </li></ul><ul><ul><li>due to small variations in the start times of the different flows </li></ul></ul><ul><li>Resource sharing is fair </li></ul><ul><ul><li>the performance of this setup is predictable </li></ul></ul><ul><li>With minimum-sized packets on 4 or 8 virtual routers, dom0 becomes overloaded </li></ul><ul><li>Giving a bigger CPU share to dom0 (Xen 3.2a) </li></ul><ul><ul><li>increases the overall TCP throughput </li></ul></ul>
    34. 3. Evaluation of virtual routers <ul><li>Virtual router (VR) latency </li></ul><ul><ul><li>Concurrent virtual routers sharing the same physical machine are either idle or stressed forwarding max-rate TCP flows. </li></ul></ul>
    35. Related work <ul><li>The performance of virtual packet transmission in Xen is a crucial subject and has been treated in several papers </li></ul>
    36. Conclusion and perspectives <ul><li>Virtualization mechanisms are costly </li></ul><ul><ul><li>additional copies </li></ul></ul><ul><ul><li>I/O scheduling of the virtual machines sharing the physical devices </li></ul></ul><ul><li>Virtualizing the data plane by forwarding packets in domU is becoming an increasingly promising approach </li></ul>
    37. Conclusion and perspectives <ul><li>End-host throughput improved in Xen 3.2 compared to 3.1 </li></ul><ul><li>Virtual routers behave similarly to classical Linux routers when forwarding large packets </li></ul><ul><li>Latency is impacted by the number of concurrent virtual routers </li></ul><ul><li>Our next goal is to evaluate performance on 10 Gbit/s links and implement virtual routers on the Grid’5000 platform. </li></ul>