Rdma presentation-kisti-v2

Published in: Technology

  1. 1. RDMAoE collaboration with KISTI Tuesday 6/7/2011 10:00am-11:00am (50B-2222) mbalman@lbl.gov
  2. 2. RDMA for High Performance Data Movement Network I/O operations are costly: − CPU load − Context switching − Memory latency Zero-copy networking − NIC copies data directly to/from application memory IB transport (HPC applications) iWARP (TCP stack / TOE)
  3. 3. RDMA model One-sided operations Get/Put semantics • Send/receive Direct data placement • RDMA Write • RDMA Read Asynchronous − Work Queue (send queue – receive queue) − Completion Queue
  4. 4. RDMA Programming Model Objects • Queue Pairs (protection domain) • Send queue (RDMA write, RDMA read) • Receive queue • Modify state • Completion queue (poll) • Memory region (MR) Functions (verbs) − IB (libmlx4) iWARP (libcxgb3) librdmacm (connection setup)
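
The asynchronous model on slide 3 (post work to a queue, return immediately, collect results later from a completion queue) can be illustrated with a toy simulation. This is a minimal sketch in plain Python, not the verbs API: `ToyQueuePair`, `post_send`, `progress`, and `poll_cq` are hypothetical names chosen for illustration, and `progress` stands in for work the NIC would do on its own.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkRequest:
    wr_id: int    # caller-chosen identifier, echoed back in the completion
    opcode: str   # label only, e.g. "RDMA_WRITE" or "RDMA_READ"
    length: int   # payload size in bytes (not used by the toy model)


@dataclass
class Completion:
    wr_id: int
    status: str


class ToyQueuePair:
    """Hypothetical stand-in for a queue pair; it moves no real data."""

    def __init__(self):
        self.send_queue = deque()
        self.completion_queue = deque()

    def post_send(self, wr):
        # Posting only enqueues the request and returns immediately.
        self.send_queue.append(wr)

    def progress(self, max_ops=1):
        # Stand-in for the NIC: finish up to max_ops posted requests, in order.
        for _ in range(min(max_ops, len(self.send_queue))):
            wr = self.send_queue.popleft()
            self.completion_queue.append(Completion(wr.wr_id, "SUCCESS"))

    def poll_cq(self, max_entries):
        # Non-blocking poll: return whatever has completed, possibly nothing.
        done = []
        while self.completion_queue and len(done) < max_entries:
            done.append(self.completion_queue.popleft())
        return done


if __name__ == "__main__":
    qp = ToyQueuePair()
    for i in range(3):
        qp.post_send(WorkRequest(wr_id=i, opcode="RDMA_WRITE", length=4096))
    print(qp.poll_cq(8))   # nothing has completed yet
    qp.progress(max_ops=2)
    print(qp.poll_cq(8))   # completions for the first two requests
```

The point of the sketch is the decoupling: the caller learns the outcome of a request only by polling, so the application has to keep each buffer alive until its completion shows up.
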
  5. 5. RDMA/iWARP Implicit RDMA support Explicit RDMA support iWARP − encapsulate RDMA traffic at a high level − Use TCP stack − Is it beneficial without TOE?
  6. 6. Alternative Approaches RDMA over Converged Ethernet (RoCE) − Lightweight RDMA transport over Ethernet • Widely deployed technology • Supports kernel bypass • OFED 1.5.1 supports RoCE SoftRDMAs... − SoftRoCE (OFED 1.5.1 supports SoftRoCE) − SoftiWARP (new TCP kernel stack)
  7. 7. Hidden Cost Memory Registration − RDMA Read/Write Connection Setup − librdmacm → Bulk data movement? Asynchronous Model − Buffer Management
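
One common way to contain the memory registration and buffer management costs listed on slide 7 is to register a fixed set of buffers once and recycle them. The sketch below only simulates that idea: `ToyBufferPool` is a hypothetical class, and `_register` merely counts calls where a real implementation would pin and register memory with the NIC. The slides do not say this is the approach the authors took.

```python
class ToyBufferPool:
    """Hypothetical pool that 'registers' buffers once and reuses them."""

    def __init__(self, count, size):
        self.registrations = 0
        self._free = [self._register(size) for _ in range(count)]

    def _register(self, size):
        # Assumption: stands in for an expensive memory registration call.
        self.registrations += 1
        return bytearray(size)

    def acquire(self):
        # Returns None when every buffer is still in flight.
        return self._free.pop() if self._free else None

    def release(self, buf):
        # Called once the transfer using buf has completed.
        self._free.append(buf)


def send_chunks(pool, n_chunks):
    """Simulate n_chunks transfers, each completing before the next starts."""
    sent = 0
    for _ in range(n_chunks):
        buf = pool.acquire()
        if buf is None:
            break          # a real sender would wait for a completion here
        sent += 1          # stand-in for posting the transfer
        pool.release(buf)  # stand-in for handling its completion
    return sent


if __name__ == "__main__":
    pool = ToyBufferPool(count=4, size=1 << 20)
    print(send_chunks(pool, 100), pool.registrations)
```

However many chunks are sent, the registration count stays at the pool size, which is the effect the pooling is meant to achieve.
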
  8. 8. Challenges in Bulk Transfer Application Level Adjustments Request Aggregation − Small data files − Is an FTP-like transfer mechanism appropriate for RDMA? File System Overhead − Asynchronous Operations Connection Caching / Multiple Connections?
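
Request aggregation for small data files (slide 8) can be sketched as grouping consecutive files into batches of bounded size, so that each transfer request carries more data. The function name `aggregate`, the `(name, size)` input format, and the `batch_bytes` threshold are assumptions made for this illustration; the slides do not specify a policy.

```python
def aggregate(files, batch_bytes):
    """Group consecutive (name, size) entries into batches.

    A batch is closed when adding the next file would push it past
    batch_bytes. A single file larger than batch_bytes gets a batch
    of its own rather than being split.
    """
    batches = []
    current, current_size = [], 0
    for name, size in files:
        if current and current_size + size > batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    files = [("a", 4), ("b", 4), ("c", 4), ("d", 10), ("e", 1)]
    print(aggregate(files, batch_bytes=8))
```

With the sample input, files `a` and `b` share one batch, while `c`, `d`, and `e` each end up alone because the oversized `d` sits between them.
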
  9. 9. Local Area / Wide Area IB RDMA designed for local area − How does RDMA perform in Wide Area? iWARP − No promising results - Over TCP (with TOE?) − SoftiWARP ??? RoCE − Isolated traffic ? / much less CPU usage − softRoCE?
  10. 10. GridFTP over RDMA XIO driver for GridFTP − Experimented using Chelsio cards (cxgb3) − 10GE − WAN testing in progress! − Local area: 910 MBps – 1175 MBps − Much better than GridFTP over TCP • Much less CPU load (1/2)
  11. 11. FTP100 – FTP over RDMA Experimented with Mellanox Cards − Local area – 10GE − iWARP • Did not perform well compared to TCP − No significant gain − RoCE tests • In progress (have some initial results) • Limited by the disk performance • Mem2mem: − Can already saturate the 10GE link
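
To put slide 10's local-area figures next to the 10GE line rate, the conversion below assumes they are megabytes per second with 1 MB = 10^6 bytes, and it ignores protocol overhead. Both assumptions are mine, and the helper names are hypothetical.

```python
def mbytes_to_gbits(mbytes_per_s):
    """Convert MB/s to Gbit/s, assuming 1 MB = 10**6 bytes."""
    return mbytes_per_s * 8 / 1000


def link_utilization(mbytes_per_s, link_gbps=10.0):
    """Fraction of the nominal link rate, ignoring protocol overhead."""
    return mbytes_to_gbits(mbytes_per_s) / link_gbps


if __name__ == "__main__":
    for rate in (910, 1175):
        gbps = mbytes_to_gbits(rate)
        share = link_utilization(rate)
        print(f"{rate} MB/s = {gbps:.2f} Gbit/s ({share:.0%} of 10GE)")
```

Under these assumptions the reported range corresponds to roughly 7.28 to 9.40 Gbit/s, or about 73% to 94% of the nominal 10 Gbit/s.
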
  12. 12. What is Next? Experiments RDMA model over WAN SoftiWARP from IBM Zurich − TCP kernel stack implementing/defining RDMA iverbs SoftRoCE – OFED 1.5.2-rxe distribution − Multiple connections?
  13. 13. Transfer Applications over RDMA Simple Client/Server: − Developing a prototype for transferring climate datasets using RDMA protocols − Asynchronous memory management module Application level tuning? − Memory regions (max/min?) − Multiple QPs
  14. 14. Climate Analysis Climate Applications are Data-Intensive Shared data repository: − Data files need to be downloaded for further processing and analysis − Data retrieval is the main bottleneck − Multiple clients (working as VM instances) • Cannot depend on HW support • SoftRoCE ? softiWARP
  15. 15. What can we do for WAN testing? Q&A?→ https://sdm.lbl.gov/climate100/
