RDMAoE collaboration with KISTI
Tuesday 6/7/2011, 10:00am-11:00am (50B-2222)
firstname.lastname@example.org

RDMA for High Performance Data Movement
Network I/O operations are costly:
  − CPU load
  − Context switching
  − Memory latency
Zero-copy networking
  − NIC copies data directly to/from application memory
IB transport (HPC applications)
iWARP (TCP stack / TOE)

RDMA model
One-sided operations
Get/Put semantics
Send/receive
Direct data placement
RDMA Write
RDMA Read
Asynchronous
  − Work Queue (send queue – receive queue)
  − Completion Queue

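The asynchronous work queue / completion queue model above can be illustrated with a toy simulation. This is only a sketch of the idea, not the verbs API: `ToyQueuePair`, `post_send`, `process`, and `poll_cq` are hypothetical names, and the "NIC" is an ordinary method call.

```python
from collections import deque


class ToyQueuePair:
    """Toy model of a send work queue plus a completion queue (not real verbs)."""

    def __init__(self):
        self.send_queue = deque()        # posted, not yet executed work requests
        self.completion_queue = deque()  # finished work, waiting to be polled

    def post_send(self, wr_id, payload):
        # Posting only enqueues the request and returns immediately.
        self.send_queue.append((wr_id, payload))

    def process(self, remote_memory):
        # Stand-in for the NIC: drain the work queue and place the data
        # directly into the (simulated) remote buffer.
        while self.send_queue:
            wr_id, payload = self.send_queue.popleft()
            remote_memory.extend(payload)
            self.completion_queue.append((wr_id, "success"))

    def poll_cq(self):
        # Non-blocking poll: return one completion, or None if the queue is empty.
        return self.completion_queue.popleft() if self.completion_queue else None


qp = ToyQueuePair()
remote = bytearray()
qp.post_send(1, b"hello ")
qp.post_send(2, b"world")
early = qp.poll_cq()                      # nothing has completed yet
qp.process(remote)
completions = [qp.poll_cq(), qp.poll_cq()]
```

The point of the sketch is the separation of steps: the application posts work and continues, and it learns about finished transfers only by polling the completion queue.
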
RDMA/iWARP
Implicit RDMA support
Explicit RDMA support
iWARP
  − Encapsulates RDMA traffic at a high level
  − Uses the TCP stack
  − Is it beneficial without TOE?

Alternative Approaches
RDMA over Converged Ethernet (RoCE)
  − Lightweight RDMA transport over Ethernet
Widely deployed technology
Supports kernel bypass
OFED 1.5.1 supports RoCE
SoftRDMAs...
  − SoftRoCE (OFED 1.5.1 supports SoftRoCE)
  − SoftiWARP (new TCP kernel stack)

Hidden Cost
Memory Registration
  − RDMA Read/Write
Connection Setup
  − librdmacm → Bulk data movement?
Asynchronous Model
  − Buffer Management

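One common way to hide the memory registration cost is to register a fixed set of buffers once and reuse them for every transfer. The sketch below is a toy model under that assumption: `RegisteredBufferPool` is a hypothetical name, and `_register` merely counts calls where a real program would perform the expensive registration.

```python
class RegisteredBufferPool:
    """Toy pool: 'register' a fixed set of buffers once, then reuse them."""

    def __init__(self, count, size):
        self.registrations = 0
        self.free = [self._register(size) for _ in range(count)]

    def _register(self, size):
        # Stand-in for an expensive memory registration call.
        self.registrations += 1
        return bytearray(size)

    def acquire(self):
        # Hand out an already-registered buffer; fail loudly if none are free.
        if not self.free:
            raise RuntimeError("no free registered buffers")
        return self.free.pop()

    def release(self, buf):
        # Return the buffer to the pool so a later transfer can reuse it.
        self.free.append(buf)


pool = RegisteredBufferPool(count=4, size=1 << 20)
for _ in range(100):          # 100 transfers reuse the same 4 buffers
    buf = pool.acquire()
    pool.release(buf)
```

In this model the registration count stays at the pool size no matter how many transfers run, which is the buffer management burden the asynchronous model places on the application.
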
Challenges in Bulk Transfer
Application Level Adjustments
Request Aggregation
  − Small data files
  − Is an FTP-like transfer mechanism appropriate for RDMA?
File System Overhead
  − Asynchronous Operations
Connection Caching / Multiple Connections?

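Request aggregation for small files could look roughly like the following sketch, which packs files into larger blocks and keeps an index for unpacking on the receiver. The function name `aggregate`, the block size, and the file names are illustrative assumptions, not part of any existing tool.

```python
def aggregate(files, block_size):
    """Pack (name, data) pairs into blocks of roughly block_size bytes.

    Returns (blocks, index), where index maps name -> (block_no, offset, length).
    Files larger than block_size are not split in this sketch; each one
    ends up in an oversized block of its own.
    """
    blocks, index = [], {}
    current = bytearray()
    for name, data in files:
        # Start a new block when this file would overflow the current one.
        if current and len(current) + len(data) > block_size:
            blocks.append(bytes(current))
            current = bytearray()
        index[name] = (len(blocks), len(current), len(data))
        current += data
    if current:
        blocks.append(bytes(current))
    return blocks, index


files = [("a.nc", b"x" * 300), ("b.nc", b"y" * 500), ("c.nc", b"z" * 400)]
blocks, index = aggregate(files, block_size=1000)
```

With these inputs the three files travel as two blocks instead of three separate requests, and the receiver can recover each file from its index entry.
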
Local Area / Wide Area
IB RDMA designed for local area
  − How does RDMA perform in the wide area?
iWARP
  − No promising results - over TCP (with TOE?)
  − SoftiWARP ???
RoCE
  − Isolated traffic? / much less CPU usage
  − SoftRoCE?

GridFTP over RDMA
XIO driver for GridFTP
  − Experimented using Chelsio cards (cxgb3)
  − 10GE
  − WAN testing in progress!
  − Local area: 910 MBps – 1175 MBps
  − Much better than GridFTP over TCP
Much less CPU load (1/2)

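To put the local-area figures in context, the short calculation below compares them with the raw 10GE line rate. It assumes the reported numbers are megabytes per second with 1 MB = 10^6 bytes, and it ignores protocol overhead; both are assumptions rather than statements from the measurements.

```python
LINK_GBPS = 10  # 10GE line rate, ignoring protocol overhead


def utilization(mbytes_per_s, link_gbps=LINK_GBPS):
    """Fraction of the raw line rate used, assuming 1 MB = 10**6 bytes."""
    link_mbytes_per_s = link_gbps * 1000 / 8   # 10 Gb/s -> 1250 MB/s
    return mbytes_per_s / link_mbytes_per_s


low = utilization(910)     # about 0.73 of the line rate
high = utilization(1175)   # about 0.94 of the line rate
```

Under these assumptions the upper figure is close to what a single 10GE link can carry.
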
FTP100 – FTP over RDMA
Experimented with Mellanox cards
  − Local area – 10GE
  − iWARP
Did not perform well compared to TCP
  − No significant gain
  − RoCE tests in progress (have some initial results)
Limited by the disk performance
Mem2mem:
  − Can already saturate the 10GE link

What is Next?
Experiments
RDMA model over WAN
SoftiWARP from IBM Zurich
  − TCP kernel stack implementing/defining RDMA iverbs
SoftRoCE – OFED 1.5.2-rxe distribution
  − Multiple connections?

Transfer Applications over RDMA
Simple Client/Server:
  − Developing a prototype for transferring climate datasets using RDMA protocols
  − Asynchronous memory management module
Application level tuning?
  − Memory regions (max/min?)
  − Multiple QPs

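The two tuning knobs above, memory region size and number of QPs, can be explored with a small planning sketch. It splits a transfer into region-sized chunks and assigns them round-robin to QPs. `plan_transfer` and the sizes used are hypothetical, and round-robin is just one possible policy, not the prototype's actual design.

```python
def plan_transfer(total_bytes, region_size, num_qps):
    """Split a transfer into region-sized chunks, assigned round-robin to QPs.

    Returns a list with one entry per QP; each entry is a list of
    (offset, length) chunks for that QP.
    """
    if region_size <= 0 or num_qps <= 0:
        raise ValueError("region_size and num_qps must be positive")
    plan = [[] for _ in range(num_qps)]
    offset, i = 0, 0
    while offset < total_bytes:
        # The last chunk may be shorter than a full region.
        length = min(region_size, total_bytes - offset)
        plan[i % num_qps].append((offset, length))
        offset += length
        i += 1
    return plan


plan = plan_transfer(total_bytes=10_000, region_size=4_096, num_qps=2)
```

Varying `region_size` and `num_qps` in such a plan gives a simple way to enumerate configurations before measuring them on real hardware.
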
Climate Analysis
Climate applications are data-intensive
Shared data repository:
  − Data files need to be downloaded for further processing and analysis
  − Data retrieval is the main bottleneck
  − Multiple clients (working as VM instances)
Cannot depend on HW support
SoftRoCE? SoftiWARP

What can we do for WAN testing?
Q&A?
→ https://sdm.lbl.gov/climate100/