
  1. Remote Direct Memory Access (RDMA) over IP
     PFLDNet 2003, Geneva
     Stephen Bailey, Sandburst Corp., steph@sandburst.com
     Allyn Romanow, Cisco Systems, allyn@cisco.com
  2. RDDP Is Coming Soon
     • "ST [RDMA] Is The Wave Of The Future" – S. Bailey & C. Good, CERN 1999
     • Need:
       – standard protocols
       – host software
       – accelerated NICs (RNICs)
       – faster host buses (for > 1 Gb/s)
     • Vendors are finally serious:
       – Broadcom, Intel, Agilent, Adaptec, Emulex, Microsoft, IBM, HP (Compaq, Tandem, DEC), Sun, EMC, NetApp, Oracle, Cisco & many, many others
  3. Overview
     • Motivation
     • Architecture
     • Open Issues
  4. CFP: SIGCOMM Workshop
     • NICELI, SIGCOMM 2003 workshop
     • Workshop on Network-I/O Convergence: Experience, Lessons, Implications
     • http://www.acm.org/sigcomm/sigcomm2003/workshop/niceli/index.html
  5. High Speed Data Transfer
     • Bottlenecks
       – Protocol performance
       – Router performance
       – End station performance, host processing
         · CPU utilization
         · The I/O bottleneck
           - Interrupts
           - TCP checksum
           - Copies
  6. What Is RDMA?
     • Avoids copying by letting the network adapter, under application control, steer data directly into application buffers (see the sketch below)
     • Bulk data transfer, or kernel bypass for small messages
     • Grid, cluster, supercomputing, data centers
     • Historically on special purpose fabrics – Fibre Channel, VIA, InfiniBand, Quadrics, ServerNet
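A minimal sketch of the register-and-steer model the slide describes, written against the libibverbs API (which post-dates this talk but exposes the same RDMA model used by RNICs). It assumes an already-connected queue pair `qp` and that the peer has advertised `remote_addr` and `rkey` for a buffer it registered; connection setup, completion handling, and error paths are omitted.

```c
#include <infiniband/verbs.h>
#include <stddef.h>
#include <stdint.h>

int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp,
                       void *local_buf, size_t len,
                       uint64_t remote_addr, uint32_t rkey)
{
    /* Register (pin) the local buffer so the RNIC can DMA directly from it. */
    struct ibv_mr *mr = ibv_reg_mr(pd, local_buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    /* Scatter/gather element describing the local source of the transfer. */
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    /* RDMA Write work request: the adapter places the data straight into the
     * remote application buffer named by (remote_addr, rkey); the remote CPU
     * is not involved in placing the data and no intermediate copy is made. */
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,
        .send_flags = IBV_SEND_SIGNALED,
    };
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    struct ibv_send_wr *bad_wr = NULL;
    return ibv_post_send(qp, &wr, &bad_wr);
}
```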
  7. Traditional Data Center Servers
     [Diagram: a machine running the application and database attaches to three separate fabrics – Ethernet/IP toward the outside world, a storage network (Fibre Channel), and an intermachine network (VIA, IB, proprietary).]
  8. Why RDMA over IP? The Business Case
     • TCP/IP is not used for high bandwidth interconnection; host processing costs are too high
     • High bandwidth transfer will become more prevalent – 10 GbE, data centers
     • Special purpose interfaces are expensive
     • IP NICs are cheap and high volume
  9. The Technical Problem: The I/O Bottleneck
     • With TCP/IP, host processing can't keep up with link bandwidth, especially on receive
     • Per byte costs dominate – Clark (1989)
     • Well researched by the distributed systems community in the mid 1990s; industry experience
     • Memory bandwidth doesn't scale; processor–memory performance gap – Hennessy (1997), D. Patterson, T. Anderson (1997)
     • STREAM benchmark (see the sketch below)
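The STREAM benchmark cited above measures sustainable memory bandwidth with simple array kernels. The sketch below reimplements its Copy kernel in plain C to show what that number means for the copy-dominated receive path; it is illustrative only (the real benchmark is McCalpin's stream.c, and the array size and timing here are arbitrary choices).

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)   /* 16M doubles = 128 MB per array */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *c = malloc(N * sizeof *c);
    if (!a || !c)
        return 1;
    for (size_t j = 0; j < N; j++)
        a[j] = 1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t j = 0; j < N; j++)   /* STREAM "Copy" kernel: c[j] = a[j] */
        c[j] = a[j];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    /* 2 * N * sizeof(double): each element is read once and written once. */
    printf("Copy bandwidth: %.1f MB/s\n",
           2.0 * N * sizeof(double) / secs / 1e6);
    free(a);
    free(c);
    return 0;
}
```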
  10. Copying
     • Using IP transports (TCP & SCTP) requires data copying (see the sketch below)
     [Diagram: received data flows NIC → packet buffer → user buffer, labeled "2 data copies".]
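For contrast with the RDMA sketch above, this is the conventional sockets receive path the diagram depicts: the NIC delivers data into kernel packet buffers, and recv() copies it into the application buffer. A minimal sketch; the port number is an arbitrary example and error handling is omitted.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int ls = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(5001),  /* example port */
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(ls, (struct sockaddr *)&addr, sizeof addr);
    listen(ls, 1);
    int s = accept(ls, NULL, NULL);

    char buf[65536];          /* 64 KB I/Os, as in the slide's test setup */
    long long total = 0;
    ssize_t n;
    /* Each recv() is a kernel-to-user copy of up to 64 KB: the data has
     * already been placed in kernel packet buffers and is copied again. */
    while ((n = recv(s, buf, sizeof buf, 0)) > 0)
        total += n;

    printf("received %lld bytes\n", total);
    close(s);
    close(ls);
    return 0;
}
```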
  11. Why Is Copying Important?
     • Heavy resource consumption at high speed (1 Gb/s and up)
       – Uses a large % of the available CPU
       – Uses a large fraction of the available bus bandwidth – at least 3 trips across the bus (worked example below)

     64 KB window, 64 KB I/Os, 2P 600 MHz PIII, 9000 B MTU

     Test                    Throughput (Mb/s)   Tx CPUs   Rx CPUs
     1 GbE, TCP              769                 0.5       1.2
     1 Gb/s RDMA SAN - VIA   891                 0.2       0.2
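A back-of-the-envelope check of the "at least 3 trips across the bus" claim, assuming the three crossings are the NIC DMA into the packet buffer, the CPU read of the packet buffer, and the CPU write into the user buffer (illustrative arithmetic, not measurement data):

```c
#include <stdio.h>

int main(void)
{
    const double link_gbps[] = { 1.0, 10.0 };   /* 1 GbE and 10 GbE */

    for (int i = 0; i < 2; i++) {
        double wire_MBps = link_gbps[i] * 1e9 / 8 / 1e6;  /* data rate on the wire */
        double bus_MBps  = 3 * wire_MBps;                 /* 3 bus crossings per byte */
        printf("%4.0f Gb/s link -> %6.0f MB/s on the wire, "
               "~%6.0f MB/s of bus/memory bandwidth\n",
               link_gbps[i], wire_MBps, bus_MBps);
    }
    return 0;
}
```

At 1 Gb/s this is roughly 375 MB/s of bus traffic just to receive data, which is why the table above shows more than a full CPU consumed on the receive side.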
  12. What's In RDMA For Us?
     • Network I/O becomes "free" (still have latency, though)
     [Illustration: the work of 2500 machines that spend 30% of their CPU on I/O can be done by 1750 machines that spend 0% CPU on I/O.]
  13. Approaches to Copy Reduction
     • On-host: special purpose software and/or hardware, e.g. zero-copy TCP, page flipping
       – Unreliable, idiosyncratic, expensive
     • Memory-to-memory copies, using network protocols to carry placement information
       – Satisfactory experience – Fibre Channel, VIA, ServerNet
     • FOR HARDWARE, not software
  14. RDMA over IP Standardization
     • IETF RDDP (Remote Direct Data Placement) WG
       – http://ietf.org/html.charters/rddp-charter.html
     • RDMAC (RDMA Consortium)
       – http://www.rdmaconsortium.org/home
  15. RDMA over IP Architecture
     • Two layers:
       – DDP – Direct Data Placement
       – RDMA – control
     [Protocol stack: ULP over RDMA control over DDP over an IP transport; a placement sketch follows below.]
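An illustrative sketch of what the DDP layer does with a tagged segment: the steering tag (STag) names a pre-registered application buffer and the tagged offset (TO) says where in it the payload belongs, so arriving data can be placed directly, in any order, without intermediate packet buffers. The table layout and function names below are hypothetical, not taken from the DDP specification.

```c
#include <stdint.h>
#include <string.h>

struct registered_buffer {
    uint32_t stag;     /* steering tag advertised to the peer */
    uint8_t *base;     /* start of the application buffer */
    size_t   length;   /* registered length */
};

/* Hypothetical registration table kept by the DDP implementation. */
static struct registered_buffer table[64];
static size_t table_entries;

/* Place one tagged DDP segment's payload straight into the app buffer. */
int ddp_place_tagged(uint32_t stag, uint64_t to,
                     const uint8_t *payload, size_t len)
{
    for (size_t i = 0; i < table_entries; i++) {
        struct registered_buffer *rb = &table[i];
        if (rb->stag != stag)
            continue;
        if (to + len > rb->length)
            return -1;                        /* out-of-bounds placement */
        memcpy(rb->base + to, payload, len);  /* the only data movement */
        return 0;
    }
    return -1;                                /* unknown STag */
}
```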
  16. Upper and Lower Layers
     • ULPs: SDP (Sockets Direct Protocol), iSCSI, MPI
     • DAFS is standardized NFSv4 on RDMA
     • SDP provides the SOCK_STREAM API (see the sketch below)
     • Runs over a reliable transport – TCP, SCTP
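Because SDP preserves SOCK_STREAM semantics, the application side is ordinary sockets code; the sketch below is a plain client to make that point. How SDP is selected underneath (a preload library or a dedicated address family) is deployment specific and not shown; the host address and port are placeholder examples.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);   /* unchanged sockets API */

    struct sockaddr_in peer = { .sin_family = AF_INET,
                                .sin_port = htons(5001) };
    inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);

    if (connect(s, (struct sockaddr *)&peer, sizeof peer) == 0) {
        const char msg[] = "hello over SOCK_STREAM";
        send(s, msg, sizeof msg - 1, 0);       /* same call, RNIC or not */
    }
    close(s);
    return 0;
}
```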
  17. Open Issues
     • Security
     • TCP order processing, framing
     • Atomic ops
     • Ordering constraints – performance vs. predictability
     • Other transports: SCTP, TCP, unreliable
     • Impact on network & protocol behaviors
     • What is the next performance bottleneck?
     • What new applications?
     • Does it eliminate the need for large MTUs (jumbos)?
