  • I’m not sure what equipment costs and topological flexibility are-?
  • Under Test you have 1GBE and 1Gb/s – not the same, needs to be fixed, I assume.


  • Remote Direct Memory Access (RDMA) over IP
    • PFLDNet 2003, Geneva
    • Stephen Bailey, Sandburst Corp., steph@sandburst.com
    • Allyn Romanow, Cisco Systems, allyn@cisco.com
  • RDDP Is Coming Soon
    • “ST [RDMA] Is The Wave Of The Future” – S Bailey & C Good, CERN 1999
    • Need:
      • standard protocols
      • host software
      • accelerated NICs (RNICs)
      • faster host buses (for > 1G)
    • Vendors are finally serious:
      • Broadcom, Intel, Agilent, Adaptec, Emulex, Microsoft, IBM, HP (Compaq, Tandem, DEC), Sun, EMC, NetApp, Oracle, Cisco & many, many others
  • Overview
    • Motivation
    • Architecture
    • Open Issues
  • CFP SigComm Workshop
    • NICELI SigComm 03 Workshop
    • Workshop on Network-I/O Convergence: Experience, Lessons, Implications
    • http://www.acm.org/sigcomm/sigcomm2003/workshop/niceli/index.html
  • High Speed Data Transfer
    • Bottlenecks
      • Protocol performance
      • Router performance
      • End station performance, host processing
        • CPU Utilization
        • The I/O Bottleneck
          • Interrupts
          • TCP checksum
          • Copies
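The TCP checksum is a per-byte cost: every payload byte has to pass through the CPU (or an offload engine) once. As a minimal sketch of that work, here is a 16-bit one's-complement checksum in the style of RFC 1071. The function name is illustrative and is not taken from the slides.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    # Every byte is read once: this loop is the per-byte cost.
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")
    print(hex(internet_checksum(sample)))
```

The loop touches every byte, so its cost grows with the amount of data rather than with the number of packets.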
  • What is RDMA?
    • Avoids copying by allowing the network adapter, under control of the application, to steer data directly into application buffers
    • Bulk data transfer or kernel bypass for small messages
    • Grid, cluster, supercomputing, data centers
    • Historically, special purpose fabrics – Fibre Channel, VIA, Infiniband, Quadrics, Servernet
  • Traditional Data Center
    • [Diagram labels: A Machine; application; The World; Ethernet/IP; Storage Network (Fibre Channel); Database; Intermachine Network (VIA, IB, Proprietary); Servers]
  • Why RDMA over IP? Business Case
    • TCP/IP is not used for high-bandwidth interconnection: host processing costs are too high
    • High-bandwidth transfer will become more prevalent – 10 GE, data centers
    • Special-purpose interfaces are expensive
    • IP NICs are cheap (high volume)
  • The Technical Problem: I/O Bottleneck
    • With TCP/IP, host processing can’t keep up with link bandwidth on receive
    • Per-byte costs dominate, Clark (89)
    • Well researched by the distributed systems community in the mid-1990s. Industry experience.
    • Memory bandwidth doesn’t scale; processor-memory performance gap – Hennessy (97), D. Patterson, T. Anderson (97)
    • Stream benchmark
  • Copying
    • Using IP transports (TCP & SCTP) requires data copying
    • [Diagram labels: NIC, Packet Buffer, Packet Buffer, User Buffer; data copies numbered 1 and 2 (2 data copies)]
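The copy path on this slide can be modelled in a few lines. The sketch below is a toy model, not a real network stack: it assumes the two copies run from a NIC-filled packet buffer to a second packet buffer and then to the user buffer, and it contrasts that with direct placement into the user buffer. All function and variable names are hypothetical.

```python
def receive_with_copies(wire_data):
    """Toy model of a conventional receive path.

    Returns (user_buffer, cpu_copies).
    """
    cpu_copies = 0
    # Assumed: the NIC fills this buffer by DMA, so it is not a CPU copy.
    nic_packet_buffer = bytes(wire_data)
    # Copy 1: into a second packet buffer.
    kernel_packet_buffer = bytearray(nic_packet_buffer)
    cpu_copies += 1
    # Copy 2: into the application's buffer.
    user_buffer = bytearray(kernel_packet_buffer)
    cpu_copies += 1
    return user_buffer, cpu_copies


def receive_direct(wire_data, user_buffer, offset=0):
    """Toy model of direct placement: data lands in the user buffer.

    The slice assignment stands in for the adapter writing into
    application memory; it returns the number of CPU copies, which is 0.
    """
    user_buffer[offset:offset + len(wire_data)] = wire_data
    return 0


if __name__ == "__main__":
    buf, copies = receive_with_copies(b"payload")
    print(copies)
    target = bytearray(7)
    print(receive_direct(b"payload", target), bytes(target))
```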
  • Why Is Copying Important?
    • Heavy resource consumption at high speed (1 Gbit/s and up)
      • Uses a large % of available CPU
      • Uses a large fraction of available bus bandwidth – minimum 3 trips across the bus
    • Test conditions: 64 KB window, 64 KB I/Os, 2P 600 MHz PIII, 9000 B MTU
      Test                   | Throughput (Mb/sec) | Tx CPUs | Rx CPUs
      1 GBE, TCP             | 769                 | 0.5     | 1.2
      1 Gb/s RDMA SAN - VIA  | 891                 | 0.2     | 0.2
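One way to compare the two measurements on this slide (769 Mb/sec at 0.5 Tx and 1.2 Rx CPUs for TCP; 891 Mb/sec at 0.2 Tx and 0.2 Rx CPUs for VIA) is to normalise CPU use by throughput. The sketch below does that. Adding sender and receiver CPUs into a single figure is an assumption made here for illustration, not something the slide does.

```python
# Values copied from the slide: (throughput in Mb/sec, Tx CPUs, Rx CPUs).
RESULTS = {
    "1 GBE, TCP": (769, 0.5, 1.2),
    "1 Gb/s RDMA SAN - VIA": (891, 0.2, 0.2),
}


def cpus_per_gbps(throughput_mbps, tx_cpus, rx_cpus):
    """Total CPUs (sender plus receiver) consumed per Gb/s moved."""
    return (tx_cpus + rx_cpus) / (throughput_mbps / 1000.0)


if __name__ == "__main__":
    for test, (mbps, tx, rx) in RESULTS.items():
        print(f"{test}: {cpus_per_gbps(mbps, tx, rx):.2f} CPUs per Gb/s")
```

Under that assumption, TCP costs roughly 2.2 CPUs per Gb/s and VIA roughly 0.45.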
  • What’s In RDMA For Us?
    • Network I/O becomes ‘free’ (still have latency though)
    • Example: 2500 machines using 30% CPU for I/O do the same work as 1750 machines using 0% CPU for I/O
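The machine counts on this slide follow from simple arithmetic: if 30% of each machine's CPU goes to I/O, only 70% does application work, so 2500 such machines supply the compute of 2500 × 0.70 = 1750 machines that spend nothing on I/O. A small sketch, with an illustrative function name and integer arithmetic to avoid rounding:

```python
def machines_needed(current_machines: int, io_cpu_percent: int) -> int:
    """Machines needed for the same application work if I/O cost no CPU.

    Rounds up so the result never undersizes the cluster.
    """
    useful = current_machines * (100 - io_cpu_percent)
    return -(-useful // 100)  # ceiling division


if __name__ == "__main__":
    print(machines_needed(2500, 30))
```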
  • Approaches to Copy Reduction
    • On-host – Special purpose software and/or hardware e.g., Zero Copy TCP, page flipping
      • Unreliable, idiosyncratic, expensive
    • Memory to memory copies, using network protocols to carry placement information
      • Satisfactory experience – Fibre Channel, VIA, Servernet
    • FOR HARDWARE, not software
  • RDMA over IP Standardization
    • IETF RDDP (Remote Direct Data Placement) WG
      • http://ietf.org/html.charters/rddp-charter.html
    • RDMAC (RDMA Consortium)
      • http://www.rdmaconsortium.org/home
  • RDMA over IP Architecture
    • Two layers:
    • DDP – Direct Data Placement
    • RDMA – control
    • [Diagram: layer stack with labels IP, Transport, DDP, RDMA control, ULP]
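The idea behind the DDP layer is that each segment carries enough placement information for the receiver to write the payload straight into a registered application buffer, whatever order the segments arrive in. The sketch below is a toy model of that idea only. The class, the `tag` and `offset` fields, and the error handling are illustrative assumptions, not the wire format or API defined by the RDDP working group.

```python
class PlacementTable:
    """Toy receiver-side table of buffers registered for direct placement."""

    def __init__(self):
        self._buffers = {}

    def register(self, tag: int, size: int) -> bytearray:
        """Advertise an application buffer under a (hypothetical) tag."""
        buf = bytearray(size)
        self._buffers[tag] = buf
        return buf

    def place(self, tag: int, offset: int, payload: bytes) -> None:
        """Write a segment's payload at the offset it names.

        Bounds are checked so a segment cannot write outside the
        buffer the application registered.
        """
        buf = self._buffers.get(tag)
        if buf is None:
            raise KeyError(f"unknown tag {tag}")
        if offset < 0 or offset + len(payload) > len(buf):
            raise ValueError("segment falls outside the registered buffer")
        buf[offset:offset + len(payload)] = payload


if __name__ == "__main__":
    table = PlacementTable()
    app_buffer = table.register(tag=7, size=10)
    # Segments arrive out of order; each one is still placed correctly.
    table.place(7, 5, b"world")
    table.place(7, 0, b"hello")
    print(bytes(app_buffer))
```

Because each segment names its own destination, no intermediate reassembly buffer is needed in this model.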
  • Upper and Lower Layers
    • ULPs: SDP (Sockets Direct Protocol), iSCSI, MPI
    • DAFS is standardized NFSv4 on RDMA
    • SDP provides the SOCK_STREAM API
    • Runs over a reliable transport – TCP, SCTP
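For reference, this is what application code written against the SOCK_STREAM API looks like. The sketch uses ordinary TCP on the loopback interface via Python's standard `socket` module. It does not use SDP or any RDMA hardware, and the helper names are illustrative. The point of the slide is that an application written to this interface would not need to change.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Accept one connection, read until EOF, and echo everything back."""
    conn, _ = server_sock.accept()
    with conn:
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        conn.sendall(b"".join(chunks))


def roundtrip(message: bytes) -> bytes:
    """Send a message to a local echo server over a stream socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)  # signal end of request
            reply = []
            while True:
                chunk = client.recv(4096)
                if not chunk:
                    break
                reply.append(chunk)
        worker.join()
    return b"".join(reply)


if __name__ == "__main__":
    print(roundtrip(b"ping"))
```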
  • Open Issues
    • Security
    • TCP order processing, framing
    • Atomic ops
    • Ordering constraints – performance vs. predictability
    • Other transports, SCTP, TCP, unreliable
    • Impact on network & protocol behaviors
    • Next performance bottleneck?
    • What new applications?
    • Eliminates the need for large MTU (jumbos)?