Achieving 10 Gb/s Using Xen Para-virtualized Network Drivers

  1. Achieving 10 Gb/s Using Xen Para-virtualized Network Drivers
     Kaushik Kumar Ram*, J. Renato Santos+, Yoshio Turner+, Alan L. Cox*, Scott Rixner*
     +HP Labs, *Rice University
     (Xen Summit, February 2009)
  2. Xen PV Driver on 10 Gig Networks
     • Focus of this talk: RX (the receive path)
     [Figure: throughput on a single TCP connection (netperf)]
  3. Network Packet Reception in Xen
     [Figure: RX path across the driver domain, the guest domain, and Xen — the frontend posts a grant on the I/O channel; the incoming packet is DMAed into the driver domain and the NIC raises an IRQ; the software bridge demultiplexes the packet to the backend driver, which grant-copies the data into the guest buffer and sends an event; the frontend then pushes the packet into the guest network stack. A sketch of the grant-copy step follows this slide.]
     • Mechanisms to reduce driver domain cost:
       • Use of a multi-queue NIC
         • Avoids the data copy
         • Packet demultiplexing is done in hardware
       • Grant reuse mechanism
         • Reduces the cost of grant operations
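
The grant-copy step above can be illustrated with Xen's standard GNTTABOP_copy operation. This is a minimal sketch, not the actual netback code: the function name copy_pkt_to_guest() and its parameters are invented for illustration, while the gnttab_copy structure, flags, and hypercall are taken from the public Xen grant-table interface.

    /* Hedged sketch: backend in the driver domain copies a received packet
     * from a local page into a buffer the guest has granted. */
    #include <linux/errno.h>
    #include <xen/interface/xen.h>
    #include <xen/interface/grant_table.h>
    #include <asm/xen/hypercall.h>

    static int copy_pkt_to_guest(unsigned long local_gmfn, unsigned int len,
                                 grant_ref_t guest_gref, domid_t guest_domid)
    {
        struct gnttab_copy op = {
            /* Source: the driver-domain page holding the DMAed packet. */
            .source.u.gmfn = local_gmfn,
            .source.domid  = DOMID_SELF,
            .source.offset = 0,
            /* Destination: the grant the guest posted on the I/O channel. */
            .dest.u.ref    = guest_gref,
            .dest.domid    = guest_domid,
            .dest.offset   = 0,
            .len           = len,
            .flags         = GNTCOPY_dest_gref,    /* destination given as a grant ref */
        };

        if (HYPERVISOR_grant_table_op(GNTTABOP_copy, &op, 1))
            return -EFAULT;                        /* hypercall itself failed */

        return op.status == GNTST_okay ? 0 : -EIO; /* per-operation status from Xen */
    }
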
  4. Using Multi-Queue NICs
     [Figure: RX path with a multi-queue NIC — one RX queue per guest, demultiplexed by guest MAC address in the device; the frontend posts a grant on the I/O channel, the backend maps the buffer and posts it on the guest's device queue, the NIC DMAs the incoming packet directly into it, the backend unmaps the buffer and sends an event, and the frontend pushes the packet into the network stack. A sketch of the map-and-post step follows this slide.]
     • Advantages of multi-queue:
       • Avoids the data copy
       • Avoids the software bridge
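
The map-and-post step can be sketched as follows. The grant-map structures and flags are the standard Xen interface; struct guest_rx_queue and mq_nic_post_rx_buffer() are hypothetical stand-ins for the multi-queue NIC driver API, which is not shown in the slides.

    /* Hedged sketch: map the guest's granted buffer and post it on that
     * guest's dedicated RX queue so the NIC can DMA directly into it. */
    #include <linux/errno.h>
    #include <linux/types.h>
    #include <xen/interface/grant_table.h>
    #include <asm/xen/hypercall.h>

    struct guest_rx_queue;                          /* hypothetical per-guest queue handle */
    int mq_nic_post_rx_buffer(struct guest_rx_queue *q, uint64_t bus_addr); /* hypothetical */

    static int post_guest_buffer(struct guest_rx_queue *q, uint64_t host_vaddr,
                                 grant_ref_t gref, domid_t guest_domid,
                                 grant_handle_t *handle)
    {
        struct gnttab_map_grant_ref map = {
            .host_addr = host_vaddr,                /* mapping address in the driver domain */
            .flags     = GNTMAP_host_map | GNTMAP_device_map,
            .ref       = gref,                      /* grant posted by the frontend */
            .dom       = guest_domid,
        };

        if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &map, 1) ||
            map.status != GNTST_okay)
            return -EIO;

        *handle = map.handle;                       /* kept for the later unmap */
        /* Hand the bus address to this guest's RX queue for direct DMA. */
        return mq_nic_post_rx_buffer(q, map.dev_bus_addr);
    }
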
  5. Performance Impact of Multi-Queue
     [Figure: driver domain CPU cost]
     • Savings due to multi-queue: grant copy and bridge
     • Most of the remaining cost: grant hypercalls (grant + Xen functions)
  6. Using Grants with a Multi-Queue NIC
     [Figure: the frontend creates a grant; the backend issues a grant map hypercall, uses the page for I/O, then issues a grant unmap hypercall]
     • Multi-queue replaces one grant hypercall (copy) with two hypercalls (map/unmap)
     • Grant hypercalls are expensive
       • Map/unmap calls are made for every I/O operation (see the unmap sketch after this slide)
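
To complete the per-I/O hypercall pair discussed above, here is the unmap half, matching the map sketch two slides back. unmap_guest_buffer() is an invented name; the gnttab structures and hypercall are the standard Xen interface.

    /* Hedged sketch: after the NIC has DMAed the packet, the backend must
     * unmap the guest page again — the second hypercall per received packet. */
    #include <linux/errno.h>
    #include <linux/types.h>
    #include <xen/interface/grant_table.h>
    #include <asm/xen/hypercall.h>

    static int unmap_guest_buffer(uint64_t host_vaddr, uint64_t dev_bus_addr,
                                  grant_handle_t handle)
    {
        struct gnttab_unmap_grant_ref unmap = {
            .host_addr    = host_vaddr,     /* virtual mapping created at map time */
            .dev_bus_addr = dev_bus_addr,   /* device mapping created at map time */
            .handle       = handle,         /* handle returned by the map hypercall */
        };

        if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &unmap, 1))
            return -EFAULT;
        return unmap.status == GNTST_okay ? 0 : -EIO;
    }
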
  7. Reducing Grant Cost
     • Grant reuse (sketched after this slide)
       • Do not revoke the grant after the I/O completes
       • Keep the buffer page in a pool of unused I/O pages
       • Reuse already-granted pages from the buffer pool for future I/O operations
         • Avoids map/unmap on every I/O
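
A minimal sketch of the grant-reuse idea on the guest side, assuming a Linux-style frontend: when an I/O completes, the page is parked — still granted — in a free pool and handed out again for the next I/O. All names here (struct rx_buf, buf_pool, pool_lock, get_rx_buf, put_rx_buf) are invented for illustration; the slides do not give the prototype's data structures.

    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <linux/mm_types.h>
    #include <xen/grant_table.h>

    struct rx_buf {
        struct list_head link;
        struct page     *page;   /* I/O buffer page */
        grant_ref_t      gref;   /* grant kept alive across I/Os */
    };

    static LIST_HEAD(buf_pool);
    static DEFINE_SPINLOCK(pool_lock);

    /* Reuse an already-granted buffer if one is available; the caller falls
     * back to allocating and granting a fresh page when NULL is returned. */
    static struct rx_buf *get_rx_buf(void)
    {
        struct rx_buf *buf = NULL;

        spin_lock(&pool_lock);
        if (!list_empty(&buf_pool)) {
            buf = list_first_entry(&buf_pool, struct rx_buf, link);
            list_del(&buf->link);
        }
        spin_unlock(&pool_lock);
        return buf;
    }

    /* I/O finished: keep the grant and return the buffer to the pool. */
    static void put_rx_buf(struct rx_buf *buf)
    {
        spin_lock(&pool_lock);
        list_add(&buf->link, &buf_pool);
        spin_unlock(&pool_lock);
    }
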
  8. Revoking a Grant when the Page is Mapped in the Driver Domain
     • The guest may need to reclaim an I/O page for other uses (e.g., memory pressure in the guest)
     • The page must be unmapped in the driver domain before the guest kernel can use it
       • To preserve memory isolation (e.g., to protect against driver bugs)
     • A handshake between frontend and backend is needed to revoke the grant
       • This can be slow, especially if the driver domain is not running
  9. Approach to Avoid the Handshake when Revoking Grants
     • Observation: with a multi-queue NIC there is no need to map the guest page into the driver domain
       • Software does not need to look at the packet header, since demux is performed in the device
       • Only the page address is needed for the DMA operation
     • Approach: replace the grant map hypercall with a shared memory interface to the hypervisor
       • A shared memory table provides the translation from guest grant to page address
       • No need to unmap the page when the guest revokes the grant (no handshake)
  10. Software I/O Translation Table (SIOTT)
     [Figure: the frontend creates a grant for the buffer page, issues the set hypercall (which validates the grant, pins the page, and updates the SIOTT), and sends the grant over the I/O channel; the backend sets the "use" field, reads the page address, and uses the page for DMA, resetting "use" and sending an event when the I/O completes; the clear hypercall checks "use" and revokes the grant]
     • SIOTT: software I/O translation table (a sketch of a possible entry layout follows this slide)
       • Indexed by grant reference
       • "pg" field: guest page address and permission
       • "use" field: indicates whether the grant is in use by the driver domain
     • set/clear hypercalls
       • Invoked by the guest
       • set validates the grant, pins the page, and writes the page address into the SIOTT
       • clear requires that "use" = 0
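
Purely as an illustration of the fields named on this slide, a SIOTT entry could look like the sketch below. The real layout and sharing arrangement used in the prototype are not shown in the slides, so everything here is an assumption: siott_entry, siott[], and siott_lookup() are invented names.

    #include <stdint.h>

    struct siott_entry {
        uint64_t pg;    /* guest page address plus permission bits, written by
                         * the "set" hypercall after Xen validates and pins the
                         * grant */
        uint32_t use;   /* nonzero while the driver domain is using the page;
                         * the "clear" hypercall only succeeds when use == 0 */
    };

    /* Table indexed by grant reference, shared in memory with the hypervisor
     * so the backend needs no hypercall on the I/O path (assumption). */
    extern struct siott_entry siott[];

    /* Backend-side lookup: translate a grant reference to a page address for
     * DMA, marking the entry busy so the guest cannot revoke it meanwhile. */
    static inline uint64_t siott_lookup(uint32_t gref)
    {
        siott[gref].use = 1;
        return siott[gref].pg;
    }
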
  11. Grant Reuse: Avoiding the pin/unpin Hypercall on Every I/O
     [Figure: the frontend reuses a buffer and its grant from the I/O buffer pool when one is available; otherwise it creates a grant and issues the set hypercall (validate, pin, and update the SIOTT); after the I/O completes, the buffer is returned to the pool and the grant is kept; under kernel memory pressure the grant is revoked, the clear hypercall clears the SIOTT entry, and the page is returned to the kernel. A sketch of this frontend flow follows this slide.]
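
The frontend flow on this slide can be sketched as below, reusing the hypothetical pool helpers from the earlier sketch. siott_set() and siott_clear() stand in for the prototype's set/clear hypercalls; none of these names come from the slides, and the error handling is simplified.

    #include <xen/grant_table.h>

    /* Simplified buffer descriptor (list linkage from the pool sketch omitted). */
    struct rx_buf {
        struct page *page;
        grant_ref_t  gref;
    };

    struct rx_buf *get_rx_buf(void);             /* pool helpers from the earlier sketch */
    void put_rx_buf(struct rx_buf *buf);
    struct rx_buf *alloc_and_grant_buf(void);    /* hypothetical: allocate a page and create a grant */
    int siott_set(grant_ref_t gref);             /* stand-in for the "set" hypercall */
    int siott_clear(grant_ref_t gref);           /* stand-in for the "clear" hypercall */

    /* Prepare one RX buffer for the backend, preferring an already-granted one. */
    static struct rx_buf *prepare_rx_buffer(void)
    {
        struct rx_buf *buf = get_rx_buf();       /* reuse buffer & grant from the pool */

        if (!buf) {
            buf = alloc_and_grant_buf();         /* new buffer: create grant ... */
            if (buf && siott_set(buf->gref) < 0)
                return NULL;                     /* ... and publish it via "set" */
        }
        return buf;                              /* its gref is then sent over the I/O channel */
    }

    /* Under kernel memory pressure, shrink the pool: only now is the grant revoked. */
    static void release_rx_buffer(struct rx_buf *buf)
    {
        if (siott_clear(buf->gref) == 0) {
            /* SIOTT entry cleared; the page can be returned to the kernel. */
        }
    }
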
  12. Performance Impact of Grant Reuse with the Software I/O Translation Table
     [Figure: driver domain CPU cost — cost saving: grant hypercalls]
  13. Impact of the Optimizations on Throughput
     [Figure: data rate and CPU utilization]
     • Multi-queue with grant reuse significantly reduces driver domain cost
     • The bottleneck shifts from the driver domain to the guest
     • Higher cost in the guest than in native Linux still limits throughput in Xen
  14. Additional Optimizations in the Guest Frontend Driver
     • LRO (Large Receive Offload) support in the frontend
       • Consecutive packets on the same connection are combined into one large packet
       • Reduces the cost of processing packets in the network stack
     • Software prefetch (sketched after this slide)
       • Prefetch the next packet and its socket buffer struct into the CPU cache while processing the current packet
       • Reduces cache misses in the frontend
     • Avoid full-page buffers
       • Use half-page (2 KB) buffers (the maximum packet size is 1500 bytes)
       • Reduces the TLB working set and therefore TLB misses
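
The software-prefetch optimization can be illustrated as follows: while packet N is being handed to the network stack, the next packet's socket buffer and data are prefetched so they are already warm in the cache when packet N+1 is processed. The loop structure and names (rx_ring, deliver_rx_batch) are simplified placeholders, not the actual netfront code; prefetch() and netif_receive_skb() are the standard Linux helpers.

    #include <linux/prefetch.h>
    #include <linux/skbuff.h>
    #include <linux/netdevice.h>

    static void deliver_rx_batch(struct sk_buff **rx_ring, int count)
    {
        int i;

        for (i = 0; i < count; i++) {
            struct sk_buff *skb = rx_ring[i];

            if (i + 1 < count) {
                struct sk_buff *next_skb = rx_ring[i + 1];

                prefetch(next_skb);           /* socket buffer struct of the next packet */
                prefetch(next_skb->data);     /* its packet headers */
            }

            netif_receive_skb(skb);           /* push the current packet into the stack */
        }
    }
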
  15. Performance Impact of the Guest Frontend Optimizations
     [Figure: guest domain CPU cost]
     • The optimizations bring the CPU cost in the guest close to native Linux
     • Remaining cost difference:
       • Higher cost in netfront than in the physical driver
       • Xen functions to send and deliver events
  16. Impact of All Optimizations on Throughput
     [Figure: throughput of the current PV driver, the optimized PV driver (1 guest), the optimized PV driver (2 guests), direct I/O (1 guest), and native Linux]
     • Multi-queue with the software optimizations achieves the same throughput as direct I/O (~8 Gb/s)
     • Two or more guests are able to saturate the 10 gigabit link
  17. Conclusion
     • Multi-queue support in modern NICs enables high-performance networking with Xen PV drivers
       • An attractive alternative to direct I/O
         • Same throughput, although with some additional CPU cycles in the driver domain
         • Avoids hardware dependence in the guests
       • A lightweight driver domain enables scalability to multiple guests
         • The driver domain can now handle 10 Gb/s data rates
         • Multiple guests can leverage multiple CPU cores and saturate the 10 gigabit link
  18. Status
     • Performance results were obtained on a modified netfront/netback implementation using the original Netchannel1 protocol
     • The mechanisms are currently being ported to Netchannel2
       • Basic multi-queue support is already available in the public netchannel2 tree
       • The additional software optimizations are still under discussion with the community and should be included in netchannel2 soon
     • Thanks to
       • Mitch Williams and John Ronciak from Intel for providing samples of Intel NICs and for adding multi-queue support to their driver
       • Ian Pratt, Steven Smith, and Keir Fraser for helpful discussions
