Rdma presentation-kisti-v2


  1. RDMAoE collaboration with KISTI, Tuesday 6/7/2011 10:00am-11:00am (50B-2222), mbalman@lbl.gov
  2. RDMA for High Performance Data Movement: Network I/O operations are costly: − CPU load − Context switching − Memory latency Zero-copy networking − NIC copies data directly to/from application memory IB transport (HPC applications) iWARP (TCP stack / TOE)
  3. RDMA model: One-sided operations Get/Put semantics • Send/receive Direct data placement • RDMA Write • RDMA Read Asynchronous − Work Queue (send queue – receive queue) − Completion Queue
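
The asynchronous work-queue / completion-queue pattern named on this slide can be illustrated with a toy model. This is a minimal sketch in plain Python, not the verbs API: `ToyQueuePair`, `post_send`, `process`, and `poll_cq` are hypothetical names chosen for illustration, and `process` merely stands in for the NIC doing the work.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkRequest:
    wr_id: int      # caller-chosen identifier, echoed back in the completion
    opcode: str     # e.g. "RDMA_WRITE", "RDMA_READ", "SEND"
    length: int     # payload size in bytes


@dataclass
class Completion:
    wr_id: int
    status: str


class ToyQueuePair:
    """Toy stand-in for a queue pair (hypothetical, not a real verbs object)."""

    def __init__(self):
        self.send_queue = deque()
        self.completion_queue = deque()

    def post_send(self, wr):
        # Posting only enqueues the request and returns immediately.
        self.send_queue.append(wr)

    def process(self, max_requests=1):
        # Stands in for the NIC working through the send queue in order.
        for _ in range(min(max_requests, len(self.send_queue))):
            wr = self.send_queue.popleft()
            self.completion_queue.append(Completion(wr.wr_id, "SUCCESS"))

    def poll_cq(self, max_entries=16):
        # Non-blocking poll: returns the completions that are ready, possibly none.
        done = []
        while self.completion_queue and len(done) < max_entries:
            done.append(self.completion_queue.popleft())
        return done


qp = ToyQueuePair()
for i in range(3):
    qp.post_send(WorkRequest(wr_id=i, opcode="RDMA_WRITE", length=4096))

early = qp.poll_cq()          # nothing has been processed yet
qp.process(max_requests=2)    # the "NIC" finishes two requests
ready = [c.wr_id for c in qp.poll_cq()]
```

The point of the model is that posting and completing are decoupled: the application keeps posting work and learns about finished requests only by polling, which is why buffer ownership has to be tracked until the matching completion arrives.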
  4. RDMA Programming Model: Objects • Queue Pairs (protection domain) • Send queue (RDMA write, RDMA read) • Receive queue • Modify state • Completion queue (poll) • Memory region (MR) Functions (verbs) − IB (libmlx4) iWARP (libcxgb3) librdmacm (connection setup)
  5. RDMA/iWARP: Implicit RDMA support Explicit RDMA support iWARP − Encapsulates RDMA traffic at a high level − Uses TCP stack − Is it beneficial without TOE?
  6. Alternative Approaches: RDMA over Converged Ethernet (RoCE) − Lightweight RDMA transport over Ethernet • Widely deployed technology • Supports kernel bypass • OFED 1.5.1 supports RoCE SoftRDMAs... − SoftRoCE (OFED 1.5.1 supports SoftRoCE) − SoftiWARP (new TCP kernel stack)
  7. Hidden Cost: Memory Registration − RDMA Read/Write Connection Setup − librdmacm → Bulk data movement? Asynchronous Model − Buffer Management
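
One common way to keep the memory-registration cost off the data path is to register a fixed set of buffers once and reuse them. The sketch below only models that idea in plain Python: `RegisteredBufferPool` is a hypothetical class, and `_register` is a placeholder for the expensive pin-and-register step; nothing here is taken from the presentation's own code.

```python
class RegisteredBufferPool:
    """Toy pool that pays the (simulated) registration cost once per buffer."""

    def __init__(self, count, size):
        self.registrations = 0
        self._free = [self._register(size) for _ in range(count)]

    def _register(self, size):
        # Placeholder for the expensive step: pinning and registering memory.
        self.registrations += 1
        return bytearray(size)

    def acquire(self):
        if not self._free:
            raise RuntimeError("no free buffers; wait for a completion")
        return self._free.pop()

    def release(self, buf):
        # Called once the transfer using this buffer has completed.
        self._free.append(buf)


pool = RegisteredBufferPool(count=4, size=1 << 20)
for _ in range(100):
    buf = pool.acquire()
    # ... a transfer would read into or write from buf here ...
    pool.release(buf)
```

In this model, one hundred transfers cause only four registrations, which is the trade-off behind the "Buffer Management" item: fewer registrations in exchange for tracking which buffers are in flight.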
  8. Challenges in Bulk Transfer: Application Level Adjustments Request Aggregation − Small data files − Is an FTP-like transfer mechanism appropriate for RDMA? File System Overhead − Asynchronous Operations Connection Caching / Multiple Connections?
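
Request aggregation for small files can be sketched as greedy batching: consecutive files are grouped until the next one would overflow a fixed block size. This is an illustrative assumption about how aggregation might work, not the method used in the presentation; `aggregate`, the file names, and the sizes are all hypothetical.

```python
def aggregate(files, block_size):
    """Group (name, size) pairs into batches of at most block_size bytes.

    Order is preserved. A file larger than block_size gets a batch of its own.
    """
    batches, current, used = [], [], 0
    for name, size in files:
        # Close the current batch if this file would push it past the limit.
        if current and used + size > block_size:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches


# Hypothetical file list: (name, size in bytes).
files = [("a.nc", 300), ("b.nc", 500), ("c.nc", 400), ("d.nc", 2000), ("e.nc", 100)]
batches = aggregate(files, block_size=1000)
```

Each batch could then be sent as one request instead of one request per file, which is the kind of application-level adjustment the slide raises.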
  9. Local Area / Wide Area: IB RDMA designed for local area − How does RDMA perform in Wide Area? iWARP − No promising results - Over TCP (with TOE?) − SoftiWARP ??? RoCE − Isolated traffic? / much less CPU usage − SoftRoCE?
  10. GridFTP over RDMA: XIO driver for GridFTP − Experimented using Chelsio cards (cxgb3) − 10GE − WAN testing in progress! − Local area: 910 MBps – 1175 MBps − Much better than GridFTP over TCP • Much less CPU load (1/2)
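
To put the local-area figures in context, they can be compared with the raw line rate of a 10GE link. The calculation below assumes the two figures on the slide are megabytes per second with 1 MB = 10^6 bytes (the slide's units are ambiguous) and ignores protocol overhead; both are assumptions, not statements from the presentation.

```python
LINK_GBPS = 10  # 10GE link, in gigabits per second

# Raw line rate in megabytes per second: 10e9 bits/s / 8 / 1e6.
line_rate_mbytes = LINK_GBPS * 1e9 / 8 / 1e6


def utilization(rate_mbytes):
    """Fraction of the raw line rate used at the given transfer rate."""
    return rate_mbytes / line_rate_mbytes


low = utilization(910)    # lower reported figure
high = utilization(1175)  # upper reported figure
print(f"line rate: {line_rate_mbytes:.0f} MB/s, "
      f"utilization: {low:.1%} to {high:.1%}")
```

Under those assumptions the reported range corresponds to roughly 73% to 94% of the raw 10GE line rate.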
  11. FTP100 – FTP over RDMA: Experimented with Mellanox cards − Local area – 10GE − iWARP • Did not perform well compared to TCP − No significant gain − RoCE tests • In progress (have some initial results) • Limited by the disk performance • Mem2mem: − Can already saturate the 10GE link
  12. What is Next? Experiments RDMA model over WAN SoftiWARP from IBM Zurich − TCP kernel stack implementing/defining RDMA iverbs SoftRoCE – OFED 1.5.2-rxe distribution − Multiple connections?
  13. Transfer Applications over RDMA: Simple Client/Server: − Developing a prototype for transferring climate datasets using RDMA protocols − Asynchronous memory management module Application level tuning? − Memory regions (max/min?) − Multiple QPs
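
The two tuning knobs on this slide, memory-region size and number of QPs, can be explored with a simple planning function. The sketch below assumes one possible policy, splitting a transfer into region-sized chunks and assigning them to QPs round-robin; `plan_chunks` and its parameters are hypothetical and do not describe the prototype itself.

```python
def plan_chunks(total_bytes, region_size, num_qps):
    """Split a transfer into chunks and assign them to QPs round-robin.

    Returns a list of (qp_index, offset, length) tuples in offset order.
    """
    if region_size <= 0 or num_qps <= 0:
        raise ValueError("region_size and num_qps must be positive")
    plan = []
    offset = 0
    index = 0
    while offset < total_bytes:
        # The last chunk may be shorter than region_size.
        length = min(region_size, total_bytes - offset)
        plan.append((index % num_qps, offset, length))
        offset += length
        index += 1
    return plan


plan = plan_chunks(total_bytes=10, region_size=4, num_qps=2)
```

Varying `region_size` and `num_qps` in a model like this makes the question on the slide concrete: larger regions mean fewer work requests, while more QPs mean more requests in flight at once.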
  14. Climate Analysis: Climate Applications are Data-Intensive Shared data repository: − Data files need to be downloaded for further processing and analysis − Data retrieval is the main bottleneck − Multiple clients (working as VM instances) • Cannot depend on HW support • SoftRoCE? SoftiWARP
  15. What can we do for WAN testing? Q&A? → https://sdm.lbl.gov/climate100/
