Open Fabrics Interfaces Architecture Introduction
Sean Hefty 
Intel Corporation
Current State of Affairs 
•OFED SW was not designed around HPC 
•Hardware and fabric features are changing 
–Divergence is driving competing APIs 
•Interfaces are being extended, and new APIs introduced 
–Long delay in adoption 
•Size of clusters and single node core counts greatly increasing 
•More applications want to take advantage of high-performance fabrics
OFED software 
•Widely adopted low-level RDMA API 
•Ships with upstream Linux 
but…
Solution 
Design software interfaces that are aligned with application requirements 
Evolve OpenFabrics 
Target needs of HPC 
Support multiple interface semantics 
Fabric and vendor agnostic 
Supportable in upstream Linux
Enabling through OpenFabrics 
•Leveraging existing open source community 
•Broad ecosystem 
–Application developers and vendors 
–Active community engagement 
•Drive high-performance software APIs 
–Take advantage of available hardware features 
–Support multiple product generations 
Open Fabrics Interfaces Working Group
OFIWG Charter 
•Develop an extensible, open source framework and interfaces aligned with ULP and application needs for high-performance fabric services
•Software leading hardware 
–Enable future hardware features 
–Minimal impact to applications 
•Minimize impedance mismatch between ULPs and network APIs
•Craft optimal APIs 
–Detailed analysis on MPI, SHMEM, and other PGAS languages 
–Focus on other applications –storage, databases, … 
Call for Participation 
•OFI WG is open to participation
–Contact the ofiwg mailing list for meeting details
–ofiwg@lists.openfabrics.org 
•Source code available through github 
–github.com/ofiwg 
•Presentations / meeting minutes available from OFA download directory 
Help OFI WG understand workload requirements and drive software design
Enable.. 
Optimized software path to hardware 
•Independent of hardware interface, version, features 
Scalability 
High performance 
App-centric 
Extensible 
Reduced cache and memory footprint 
•Scalable address resolution and storage 
•Tight data structures 
Analyze application needs 
•Implement them in a coherent, concise, high-performance manner 
More agile development 
•Time-boxed, iterative development 
•Application focused APIs 
•Adaptable
Verbs Semantic Mismatch 
Allocate WR 
Allocate SGE 
Format SGE –3 writes 
Format WR –6 writes 
Loop 1 
Checks –9 branches
Loop 2
Check 
Loop 3 
Checks –3 branches 
Checks –3 branches 
Direct call –3 writes 
Checks –2 branches 
50-60 lines of C-code 
25-30 lines of C-code 
generic send call 
optimized send call 
Reduce setup cost
-Tighter data
Eliminate loops and branches
-Remaining branches predictable
Selective optimization paths to HW
-Manual function expansion
Current RDMA APIs 
Evolved Fabric Interfaces
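The contrast between the generic and the optimized send call can be sketched in C as follows. This is only an illustration: `mock_queue`, `mock_desc`, `generic_post_send`, and `fast_send` are hypothetical names, not the verbs or libfabric API, and the mock queue just records descriptors so the two paths can be compared.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware descriptor: the three values the NIC needs. */
struct mock_desc { uint64_t addr; uint32_t len; uint64_t ctx; };

/* Hypothetical send queue; a fixed array, for illustration only. */
#define MOCK_QDEPTH 8
struct mock_queue { struct mock_desc ring[MOCK_QDEPTH]; unsigned tail; };

/* Generic, verbs-style request: linked work requests with SGE lists. */
struct mock_sge { uint64_t addr; uint32_t length; uint32_t lkey; };
enum mock_opcode { MOCK_OP_SEND, MOCK_OP_WRITE };
struct mock_wr {
    uint64_t wr_id;
    struct mock_wr *next;
    struct mock_sge *sg_list;
    int num_sge;
    enum mock_opcode opcode;
    int send_flags;
};

/* Generic path: loops over requests and SGEs, branching on opcode,
 * flags, and sizes. Only plain sends are modeled in this sketch. */
int generic_post_send(struct mock_queue *q, struct mock_wr *wr,
                      struct mock_wr **bad_wr)
{
    for (; wr != NULL; wr = wr->next) {
        if (wr->opcode != MOCK_OP_SEND || wr->send_flags != 0 ||
            wr->num_sge < 1) {
            *bad_wr = wr;
            return -1;
        }
        for (int i = 0; i < wr->num_sge; i++) {
            if (q->tail == MOCK_QDEPTH || wr->sg_list[i].length == 0) {
                *bad_wr = wr;
                return -1;
            }
            struct mock_desc *d = &q->ring[q->tail++];
            d->addr = wr->sg_list[i].addr;
            d->len  = wr->sg_list[i].length;
            d->ctx  = wr->wr_id;
        }
    }
    return 0;
}

/* Optimized path: one check, then three writes straight to the queue. */
int fast_send(struct mock_queue *q, const void *buf, uint32_t len, void *ctx)
{
    if (q->tail == MOCK_QDEPTH)
        return -1;
    struct mock_desc *d = &q->ring[q->tail++];
    d->addr = (uint64_t)(uintptr_t)buf;
    d->len  = len;
    d->ctx  = (uint64_t)(uintptr_t)ctx;
    return 0;
}
```

For a single buffer both functions queue the same descriptor; the difference is the setup the caller performs and the branches taken before the three writes.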
Application-Centric Interfaces 
•Collect application requirements 
•Identify common, fast path usage models 
–Too many use cases to optimize them all 
•Build primitives around fabric services 
–Not device specific interface 
Reducing instruction count requires a better application impedance match
OFA Software Evolution 
libfabric 
FI Provider 
IB Verbs 
Verbs Provider 
Verbs 
Fabric Interfaces 
Transition from disjoint APIs 
to a cohesive set of fabric interfaces 
RDMA CM 
CM 
uVerbs Command
Fabric Interfaces Framework 
•Take growth into consideration 
•Reduce effort to incorporate new application features
–Addition of new interfaces, structures, or fields 
–Modification of existing functions 
•Allow time to design new interfaces correctly 
–Support prototyping interfaces prior to integration 
www.openfabrics.org 11 
Focus on longer-lived interfaces – software leading hardware
Fabric Interfaces 
Fabric Interfaces 
Message Queue 
Control 
Interface 
RMA 
Atomics 
Addressing Services 
Tag Matching 
Triggered Operations 
CM Services 
Fabric Provider Implementation 
Message Queue 
CM Services 
RMA 
Tag Matching 
Control Interface 
Framework defines multiple interfaces 
Vendors provide optimized implementations 
Addressing Services 
Atomics 
Triggered Operations
Fabric Interfaces 
•Defines philosophy for interfaces and extensions 
–Focus interfaces on the semantics and services offered by the hardware and not the hardware implementation
•Exports a minimal API
–Control interface 
•Defines fabric interfaces 
–API sets for specific functionality 
•Defines core object model 
–Object-oriented design, but C-interfaces 
Fabric Interfaces Architecture 
•Based on object-oriented programming concepts 
•Derived objects define interfaces 
–New interfaces exposed 
–Define behavior of inherited interfaces 
–Optimize implementation 
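One common way to get object-oriented behavior behind a C interface is a base structure embedded in each derived object plus tables of function pointers. The sketch below shows that pattern under stated assumptions: every name in it (`obj_base`, `endpoint`, `msg_ops`, `prov_send`, and so on) is hypothetical and chosen for illustration, not taken from the framework's headers.

```c
#include <stddef.h>

/* Hypothetical base object: every fabric object embeds this first. */
struct obj_base;
struct base_ops {
    int (*close)(struct obj_base *obj);
};
struct obj_base {
    const struct base_ops *ops;
    int is_open;
};

/* Derived object: an endpoint adds a message interface on top. */
struct endpoint;
struct msg_ops {
    long (*send)(struct endpoint *ep, const void *buf, size_t len);
};
struct endpoint {
    struct obj_base base;        /* inherited interface */
    const struct msg_ops *msg;   /* new interface exposed by the derived type */
    size_t bytes_sent;
};

/* One provider's implementation of both interfaces. */
static int prov_close(struct obj_base *obj)
{
    obj->is_open = 0;
    return 0;
}

static long prov_send(struct endpoint *ep, const void *buf, size_t len)
{
    (void)buf;                   /* a real provider would hand this to hardware */
    ep->bytes_sent += len;
    return (long)len;
}

static const struct base_ops prov_base_ops = { .close = prov_close };
static const struct msg_ops prov_msg_ops = { .send = prov_send };

/* Constructor: the provider wires its function tables into the object. */
void endpoint_init(struct endpoint *ep)
{
    ep->base.ops = &prov_base_ops;
    ep->base.is_open = 1;
    ep->msg = &prov_msg_ops;
    ep->bytes_sent = 0;
}

/* Application-facing wrappers dispatch through the tables. */
int obj_close(struct obj_base *obj)
{
    return obj->ops->close(obj);
}

long ep_send(struct endpoint *ep, const void *buf, size_t len)
{
    return ep->msg->send(ep, buf, len);
}
```

A provider that wants a faster path for a particular endpoint configuration can install a different `msg_ops` table without changing what the application calls.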
Control Interfaces 
•Application specifies desired functionality 
•Discover fabric providers and services 
•Identify resources and addressing 
fi_getinfo 
•Open a set of fabric interfaces and resources 
fi_fabric 
•Dynamic providers publish control interfaces 
fi_register 
FI Framework 
fi_getinfo 
fi_fabric 
fi_register
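The register-then-discover flow can be illustrated with a small mock. The slide names fi_getinfo, fi_fabric, and fi_register but gives no signatures, so the sketch deliberately uses its own hypothetical functions (`mock_register`, `mock_getinfo`) and capability bits rather than guessing at the real ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits an application may request. */
#define CAP_MSG    (1u << 0)
#define CAP_RMA    (1u << 1)
#define CAP_TAGGED (1u << 2)
#define CAP_ATOMIC (1u << 3)

struct mock_provider {
    const char *name;
    uint32_t caps;               /* capabilities this provider implements */
};

#define MAX_PROVIDERS 8
static struct mock_provider registry[MAX_PROVIDERS];
static size_t registry_len;

/* Registration step: a provider publishes itself to the framework. */
int mock_register(const char *name, uint32_t caps)
{
    if (registry_len == MAX_PROVIDERS)
        return -1;
    registry[registry_len].name = name;
    registry[registry_len].caps = caps;
    registry_len++;
    return 0;
}

/* Discovery step: return the first provider that supports every
 * requested capability, or NULL if none does. */
const struct mock_provider *mock_getinfo(uint32_t hints)
{
    for (size_t i = 0; i < registry_len; i++)
        if ((registry[i].caps & hints) == hints)
            return &registry[i];
    return NULL;
}
```

The application states what it needs in `hints`; the framework answers with a provider that can deliver it, which the application would then open.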
Application Semantics 
•Progress 
–Application or hardware driven 
–Data versus control interfaces 
•Ordering 
–Message ordering 
–Data delivery order 
•Multi-threading and locking model 
–Compile and run-time options 
Get / set using control interfaces
Fabric 
Fabric Object Model 
Fabric Interfaces 
NIC 
Physical Fabric 
NIC 
NIC 
Resource Domain 
Resource Domain 
Event Queue 
Event Counter 
Active Endpoint 
Address Vectors 
Event Queue 
Event Counter 
Active Endpoint 
Address Vectors 
Passive Endpoint 
Event Queue 
Boundary of resource sharing 
Provider abstracts multiple NICs 
Software objects usable across multiple HW providers
Endpoint Interfaces 
Type 
Protocol 
CM 
Message Transfers 
RMA 
Tagged 
Atomics 
Triggered 
Properties 
Interfaces 
Endpoint 
Communication interfaces 
Software path to hardware optimized based on endpoint properties 
Capabilities 
Message Transfers 
RMA 
Tagged 
Atomics 
Triggered 
Aliasing supports multiple software paths to hardware
Application Configured Interfaces 
lg. msg 
RMA 
NIC 
Message Queue Ops 
RMA Ops 
Endpoint 
Communication type 
Capabilities 
Data transfer flags 
sm. msg 
inline send 
send 
write 
read 
Provider directs app to best API sets 
App specifies comm model
Event Queues 
Format 
Wait Object 
Context only 
Data 
Tagged 
Generic 
None 
fd 
mwait 
Properties 
Interface Details 
Event Queues 
Asynchronous event reporting 
User specified wait object 
Compact optimized data structures 
Optimize interface around reporting successful operations 
Domain 
Event counters support lightweight event reporting
Event Queues 
read CQ 
optimized CQ 
Generic completion 
Op context 
Send: +4-6 writes, +2 branches 
Recv: +10-13 writes, +4 branches 
+1 write, +0 branches 
App selects completion structure 
Generic verbs completion example 
Application optimized completion
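A minimal sketch of an application-selected completion format follows. The two entry layouts and the names `cq_format`, `cq_entry_context`, `cq_entry_data`, `hw_completion`, and `cq_write_entry` are assumptions made for illustration; they are not the structures from the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion formats an application can choose between. */
enum cq_format { CQ_FORMAT_CONTEXT, CQ_FORMAT_DATA };

struct cq_entry_context {        /* smallest form: operation context only */
    void *op_context;
};

struct cq_entry_data {           /* richer form: context plus transfer details */
    void *op_context;
    uint64_t flags;
    size_t len;
    uint64_t data;
};

/* What the (mock) hardware reported for one finished operation. */
struct hw_completion {
    void *op_context;
    uint64_t flags;
    size_t len;
    uint64_t data;
};

/* Provider side: fill in only the fields of the selected format.
 * Returns the number of bytes written into the caller's buffer.
 * A real provider could fix the format when the queue is created and
 * remove this branch from the reporting path entirely. */
size_t cq_write_entry(enum cq_format fmt, const struct hw_completion *hw,
                      void *out)
{
    if (fmt == CQ_FORMAT_CONTEXT) {
        struct cq_entry_context *e = out;
        e->op_context = hw->op_context;      /* a single write */
        return sizeof *e;
    }
    struct cq_entry_data *e = out;
    e->op_context = hw->op_context;
    e->flags = hw->flags;
    e->len = hw->len;
    e->data = hw->data;
    return sizeof *e;
}
```

An application that only needs to know which request finished picks the context-only format and pays for one pointer-sized write per completion.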
Address Vectors 
Store addresses/host names 
-Insert range of addresses with single call 
Start Range   End Range   Base LID   SL
host10        host1000    50         1
host1001      host4999    2000       2
Share between processes 
Enable provider optimization techniques 
-Greatly reduce storage requirements 
Reference entries by handle or index 
-Handle may be encoded fabric address
Reference vector for group communication 
Example only 
Fabric specific addressing requirements
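The range-based storage idea can be sketched in C using the example ranges from this slide. It assumes, as the example suggests but does not state, that LIDs are assigned consecutively within a range; `av_range`, `av_addr`, `av_lookup`, and `example_av` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One compressed entry: hosts start..end map to consecutive LIDs
 * (an assumption of this sketch). */
struct av_range {
    uint32_t start;      /* first host number in the range */
    uint32_t end;        /* last host number in the range, inclusive */
    uint16_t base_lid;   /* LID of the first host */
    uint8_t  sl;         /* service level shared by the whole range */
};

struct av_addr { uint16_t lid; uint8_t sl; };

/* Resolve a host number to an address.
 * Returns 0 on success, -1 if the host is in no range. */
int av_lookup(const struct av_range *av, size_t n, uint32_t host,
              struct av_addr *out)
{
    for (size_t i = 0; i < n; i++) {
        if (host >= av[i].start && host <= av[i].end) {
            out->lid = (uint16_t)(av[i].base_lid + (host - av[i].start));
            out->sl = av[i].sl;
            return 0;
        }
    }
    return -1;
}

/* The two example ranges: host10..host1000 and host1001..host4999. */
static const struct av_range example_av[] = {
    { 10,   1000, 50,   1 },
    { 1001, 4999, 2000, 2 },
};
```

Two entries describe 4,990 hosts here, which is the kind of storage reduction the slide refers to; a per-peer table would need one entry for each host.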
Summary 
•These concepts are necessary, not revolutionary 
–Communication addressing, optimized data transfers, app-centric interfaces, future looking
•Want a solution where the pieces fit tightly together
Repeated Call for Participation 
•Co-chair (sean.hefty@intel.com) 
–Meets Tuesdays from 9-10 PST / 12-1 EST 
•Links 
–Mailing list subscription 
•http://lists.openfabrics.org/mailman/listinfo/ofiwg 
–Document downloads 
•https://www.openfabrics.org/downloads/OFIWG/ 
–libfabric source tree
•www.github.com/ofiwg/libfabric
–libfabric sample programs
•www.github.com/ofiwg/fabtests 
Backup 
Verbs API Mismatch 
struct ibv_sge {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};
struct ibv_send_wr {
    uint64_t wr_id;
    struct ibv_send_wr *next;
    struct ibv_sge *sg_list;
    int num_sge;
    enum ibv_wr_opcode opcode;
    int send_flags;
    uint32_t imm_data;
    ...
};
Application request 
<buffer, length, context> 
3 x 8 = 24 bytes of data needed 
SGE + WR = 88 bytes allocated 
Requests may be linked - next must be set to NULL 
Must link to separate SGL and initialize count 
App must set and provider must switch on opcode 
Must clear flags 
28 additional bytes initialized 
Significant SW overhead
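The byte counts on this slide can be checked with `sizeof`. The sketch below assumes a typical 64-bit (LP64) platform with 8-byte pointers and 4-byte `int` and enum types. The structures are hypothetical mirrors of only the fields shown; the elided tail of the work request is left out, so the 88-byte total is not reproduced. Attributing the 28 bytes to `next`, `sg_list`, `num_sge`, `opcode`, and `send_flags` is one reading that matches the figure, not something the slide spells out.

```c
#include <stddef.h>
#include <stdint.h>

/* What the application has in hand for a simple send. */
struct app_request { void *buffer; size_t length; void *context; };

/* Mirrors of the fields shown above. The elided tail of the work
 * request is deliberately omitted, so sizeof(struct wr_head) is
 * smaller than the full structure. */
struct sge { uint64_t addr; uint32_t length; uint32_t lkey; };
enum wr_opcode { WR_SEND };
struct wr_head {
    uint64_t wr_id;
    struct wr_head *next;
    struct sge *sg_list;
    int num_sge;
    enum wr_opcode opcode;
    int send_flags;
    uint32_t imm_data;
};

/* Bytes the app must initialize beyond <buffer, length, context>:
 * next, sg_list, num_sge, opcode, and send_flags. */
size_t extra_bytes_initialized(void)
{
    struct wr_head w;
    return sizeof w.next + sizeof w.sg_list + sizeof w.num_sge +
           sizeof w.opcode + sizeof w.send_flags;
}
```

Under those assumptions `sizeof(struct app_request)` is 24, `sizeof(struct sge)` is 16, and `extra_bytes_initialized()` returns 28.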
Verbs Provider Mismatch 
For each work request 
Check for available queue space 
Check SGL size 
Check valid opcode 
Check flags x 2 
Check specific opcode 
Switch on QP type 
Switch on opcode 
Check flags 
For each SGE 
Check size 
Loop over length 
Check flags 
Check 
Check for last request 
Other checks x 3 
19+ branches including loops 
100+ lines of C code
50-60 lines of code to HW
Most often 1 (overlap operations)
Often 1 or 2 (fixed in source)
Artifact of API
QP type usually fixed in source 
Flags may be fixed or app may have taken branches
Verbs Completions Mismatch 
struct ibv_wc {
    uint64_t wr_id;
    enum ibv_wc_status status;
    enum ibv_wc_opcode opcode;
    uint32_t vendor_err;
    uint32_t byte_len;
    uint32_t imm_data;
    uint32_t qp_num;
    uint32_t src_qp;
    int wc_flags;
    uint16_t pkey_index;
    uint16_t slid;
    uint8_t sl;
    uint8_t dlid_path_bits;
};
Application accessed fields 
Provider must fill out all fields, even those ignored by the app 
Developer must determine if fields apply to their QP 
App must check both return code and status to determine if a request completed successfully 
Single structure is 48 bytes; likely to cross a cache line boundary
Provider must handle all types of completions from any QP
RDMA CM Mismatch 
struct rdma_route { 
struct rdma_addr addr; 
struct ibv_sa_path_rec *path_rec; 
... 
}; 
struct rdma_cm_id {...}; 
rdma_create_id() 
rdma_resolve_addr() 
rdma_resolve_route() 
rdma_connect() 
Src/dst addresses stored per endpoint
456 bytes per endpoint 
Path record per endpoint 
Resolve single address and path at a time 
All to all connected model for best performance 
Want: reliable data transfers, zero copies to thousands of processes 
RDMA interfaces expose:
Progress 
•Ability of the underlying implementation to complete processing of an asynchronous request 
•Need to consider ALL asynchronous requests
–Connections, address resolution, data transfers, event processing, completions, etc. 
•HW/SW mix 
All(?) current solutions require significant software components
Progress 
•Support two progress models 
–Automatic and implicit 
•Separate operations as belonging to one of two progress domains 
–Data or control 
–Report progress model for each domain 
SAMPLE
          Implicit    Automatic
Data      Software    Hardware offload
Control   Software    Kernel services
Automatic Progress 
•Implies hardware offload model 
–Or standard kernel services / threads for control operations 
•Once an operation is initiated, it will complete without further user intervention or calls into the API 
•Automatic progress meets implicit model by definition 
Implicit Progress 
•Implies significant software component 
•Occurs when reading or waiting on EQ(s) 
•Application can use separate EQs for control and data 
•Progress limited to objects associated with selected EQ(s) 
•App can request automatic progress 
–E.g. app wants to wait on native wait object 
–Implies provider allocated threading 
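The implicit model can be illustrated with a mock in which posted operations make progress only when the application reads the event queue. This is a sketch under stated assumptions, not provider code: `mock_eq`, `mock_post`, `mock_progress`, and `mock_eq_read` are hypothetical names.

```c
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical provider state with software-driven (implicit) progress. */
struct mock_eq {
    void *pending[MAX_PENDING];    /* posted, not yet processed */
    size_t npending;
    void *done[MAX_PENDING];       /* processed, waiting to be reported */
    size_t ndone;
};

/* Posting only queues the request; no processing happens here. */
int mock_post(struct mock_eq *eq, void *context)
{
    if (eq->npending == MAX_PENDING)
        return -1;
    eq->pending[eq->npending++] = context;
    return 0;
}

/* Provider work that would otherwise need a thread or hardware offload. */
static void mock_progress(struct mock_eq *eq)
{
    while (eq->npending > 0 && eq->ndone < MAX_PENDING)
        eq->done[eq->ndone++] = eq->pending[--eq->npending];
}

/* Reading the EQ is what drives progress in the implicit model.
 * Returns 1 and stores a context if a completion is available, else 0. */
int mock_eq_read(struct mock_eq *eq, void **context)
{
    mock_progress(eq);
    if (eq->ndone == 0)
        return 0;
    *context = eq->done[--eq->ndone];
    return 1;
}
```

Under automatic progress the equivalent of `mock_progress` would run in hardware or in a provider-allocated thread, so a posted operation would complete even if the application never read the queue.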
Ordering 
•Applies to a single initiator endpoint performing data transfers to one target endpoint over the same data flow 
–Data flow may be a conceptual QoS level or path through the network
•Separate ordering domains 
–Completions, message, data 
•Fenced ordering may be obtained using the fi_sync operation
Completion Ordering 
•Order in which operation completions are reported relative to their submission 
•Unordered or ordered 
–No defined requirement for ordered completions 
•Default: unordered 
Message Ordering 
•Order in which message (transport) headers are processed 
–I.e. whether transport messages are received in or out of order
•Determined by selection of ordering bits 
–[Read | Write | Send] After [Read | Write | Send] 
–RAR, RAW, RAS, WAR, WAW, WAS, SAR, SAW, SAS 
•Example: 
–fi_order = 0 // unordered
–fi_order = RAR | RAW | RAS | WAW | WAS | SAW | SAS // IB/iWarp-like ordering
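The nine ordering bits can be modeled as a small bitmask. The bit positions and the names `ORDER_BIT`, `ORDER_IB_LIKE`, and `must_stay_ordered` are assumptions of this sketch; only the nine abbreviations and the IB/iWarp-like combination come from the slide. The sketch reads "XAY" as: a later X must be processed after an earlier Y.

```c
#include <stdint.h>

enum op_type { OP_READ = 0, OP_WRITE = 1, OP_SEND = 2 };

/* Hypothetical bit layout: one bit per (later, earlier) pair. */
#define ORDER_BIT(later, earlier) (1u << ((later) * 3 + (earlier)))

#define ORDER_RAR ORDER_BIT(OP_READ,  OP_READ)
#define ORDER_RAW ORDER_BIT(OP_READ,  OP_WRITE)
#define ORDER_RAS ORDER_BIT(OP_READ,  OP_SEND)
#define ORDER_WAR ORDER_BIT(OP_WRITE, OP_READ)
#define ORDER_WAW ORDER_BIT(OP_WRITE, OP_WRITE)
#define ORDER_WAS ORDER_BIT(OP_WRITE, OP_SEND)
#define ORDER_SAR ORDER_BIT(OP_SEND,  OP_READ)
#define ORDER_SAW ORDER_BIT(OP_SEND,  OP_WRITE)
#define ORDER_SAS ORDER_BIT(OP_SEND,  OP_SEND)

/* The IB/iWarp-like combination from the example: everything
 * except write-after-read and send-after-read. */
#define ORDER_IB_LIKE (ORDER_RAR | ORDER_RAW | ORDER_RAS | \
                       ORDER_WAW | ORDER_WAS | ORDER_SAW | ORDER_SAS)

/* Nonzero if a later operation must not pass an earlier one. */
int must_stay_ordered(uint32_t fi_order, enum op_type later,
                      enum op_type earlier)
{
    return (fi_order & ORDER_BIT(later, earlier)) != 0;
}
```

With `ORDER_IB_LIKE`, a write may pass an earlier read, but a read may not pass an earlier write; with `fi_order` set to 0 nothing is ordered.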
Data Ordering 
•Delivery order of transport data into target memory 
–Ordering per byte-addressable location 
–I.e. access to the same byte in memory 
•Ordering constrained by message ordering rules 
–Must at least have message ordering first 
Data Ordering 
•Ordering limited to message order size 
–E.g. MTU 
–In order data delivery if transfer <= message order size 
–WAW, RAW, WAR sizes? 
•Message order size = 0 
–No data ordering 
•Message order size = -1 
–All data ordered 
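The message order size rule reduces to a short predicate. The sketch below assumes the size is carried as a `size_t`, so that -1 becomes the largest representable value; `ORDER_SIZE_NONE`, `ORDER_SIZE_ALL`, and `data_delivered_in_order` are hypothetical names.

```c
#include <stddef.h>

#define ORDER_SIZE_NONE ((size_t)0)     /* no data ordering */
#define ORDER_SIZE_ALL  ((size_t)-1)    /* all data ordered */

/* Nonzero if a transfer of transfer_len bytes is delivered in order
 * for the given message order size. */
int data_delivered_in_order(size_t order_size, size_t transfer_len)
{
    if (order_size == ORDER_SIZE_NONE)
        return 0;
    /* (size_t)-1 is the maximum size_t value, so every length passes. */
    return transfer_len <= order_size;
}
```

For example, with an order size of 4096 a 4096-byte transfer is ordered and a 4097-byte transfer is not.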
Other Ordering Rules 
•Ordering to different target endpoints not defined 
•Per message ordering semantics implemented using different data flows 
–Data flows may be less flexible, but easier to optimize for 
–Endpoint aliases may be configured to use different data flows 
Multi-threading and Locking 
•Support both thread safe and lockless models 
–Compile time and run time support 
–Run-time limited to compiled support 
•Lockless (based on MPI model) 
–Single –single-threaded app 
–Funneled –only 1 thread calls into interfaces 
–Serialized –only 1 thread at a time calls into interfaces 
•Thread safe 
–Multiple –multi-threaded app, with no restrictions 
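The four threading levels and the "run-time limited to compiled support" rule can be sketched as below. The enum and both functions are hypothetical, and capping the requested level at the compiled maximum is one interpretation of that rule rather than documented behavior.

```c
/* Threading levels, following the MPI-style names on the slide. */
enum thread_level {
    THREAD_SINGLE,       /* single-threaded app */
    THREAD_FUNNELED,     /* only one thread calls into the interfaces */
    THREAD_SERIALIZED,   /* one thread at a time calls into the interfaces */
    THREAD_MULTIPLE      /* multi-threaded app, no restrictions */
};

/* Only the unrestricted model needs locking inside the library;
 * the other three are lockless from the library's point of view. */
int library_must_lock(enum thread_level level)
{
    return level == THREAD_MULTIPLE;
}

/* Run-time selection cannot exceed what was compiled in. The result
 * is the level the application has to respect. */
enum thread_level effective_level(enum thread_level requested,
                                  enum thread_level compiled_max)
{
    return requested <= compiled_max ? requested : compiled_max;
}
```

An application that asks for `THREAD_MULTIPLE` from a build compiled without locking would, under this reading, get `THREAD_SERIALIZED` back and have to serialize its own calls.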
Buffering 
•Support both application and network buffering 
–Zero-copy for high-performance 
–Network buffering for ease of use 
•Buffering in local memory or NIC 
–In some cases, buffered transfers may be higher-performing (e.g. “inline”)
•Registration option for local NIC access 
–Migration to fabric managed registration 
•Required registration for remote access 
–Specify permissions 
Scalable Transfer Interfaces 
•Application optimized code paths based on usage model
•Optimize call(s) for single work request 
–Single data buffer 
–Still support more complex WR lists/SGL 
•Per endpoint send/receive operations 
–Separate RMA function calls 
•Pre-configure data transfer flags 
–Known before post request 
–Select software path through provider 
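Pre-configuring transfer flags lets a provider choose its send routine once, so the per-request path carries no flag checks. The following is a minimal sketch of that idea; `XFER_INLINE`, `INLINE_MAX`, `mock_ep`, `ep_set_flags`, and `mock_ep_send` are all hypothetical, and the two send routines only count calls instead of touching hardware.

```c
#include <stddef.h>
#include <stdint.h>

#define XFER_INLINE (1u << 0)   /* hypothetical flag: copy payload into descriptor */
#define INLINE_MAX  32          /* hypothetical inline size limit, in bytes */

struct mock_ep;
typedef int (*send_fn)(struct mock_ep *ep, const void *buf, size_t len);

struct mock_ep {
    send_fn send;               /* chosen once, when flags are configured */
    unsigned inline_sends;
    unsigned buffer_sends;
};

/* Path for small messages copied into the descriptor. */
static int send_inline(struct mock_ep *ep, const void *buf, size_t len)
{
    (void)buf;
    if (len > INLINE_MAX)
        return -1;
    ep->inline_sends++;
    return 0;
}

/* Path for messages sent by reference to the application buffer. */
static int send_by_reference(struct mock_ep *ep, const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    ep->buffer_sends++;
    return 0;
}

/* Flags are known before any request is posted, so the branch on
 * them happens here rather than on every send. */
void ep_set_flags(struct mock_ep *ep, uint32_t flags)
{
    ep->send = (flags & XFER_INLINE) ? send_inline : send_by_reference;
}

/* Fast path: one indirect call, no flag checks. */
int mock_ep_send(struct mock_ep *ep, const void *buf, size_t len)
{
    return ep->send(ep, buf, len);
}
```

The trade-off is that changing flags costs a reconfiguration call, which suits applications whose flags are fixed in source.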

 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Open Fabrics Interfaces Architecture Introduction

  • 1. Open Fabrics Interfaces Architecture Introduction. Sean Hefty, Intel Corporation
  • 2. Current State of Affairs. OFED software: • Widely adopted low-level RDMA API • Ships with upstream Linux, but… • OFED SW was not designed around HPC • Hardware and fabric features are changing – Divergence is driving competing APIs • Interfaces are being extended, and new APIs introduced – Long delay in adoption • Size of clusters and single-node core counts are greatly increasing • More applications want to take advantage of high-performance fabrics
  • 3. Solution: design software interfaces that are aligned with application requirements. Evolve OpenFabrics; target needs of HPC; support multiple interface semantics; fabric and vendor agnostic; supportable in upstream Linux
  • 4. Enabling through OpenFabrics • Leveraging existing open source community • Broad ecosystem – Application developers and vendors – Active community engagement • Drive high-performance software APIs – Take advantage of available hardware features – Support multiple product generations. Open Fabrics Interfaces Working Group
  • 5. OFIWG Charter • Develop an extensible, open source framework and interfaces aligned with ULP and application needs for high-performance fabric services • Software leading hardware – Enable future hardware features – Minimal impact to applications • Minimize impedance mismatch between ULPs and network APIs • Craft optimal APIs – Detailed analysis on MPI, SHMEM, and other PGAS languages – Focus on other applications: storage, databases, …
  • 6. Call for Participation • OFI WG is open participation – Contact the ofiwg mail list for meeting details – ofiwg@lists.openfabrics.org • Source code available through GitHub – github.com/ofiwg • Presentations / meeting minutes available from OFA download directory. Help OFI WG understand workload requirements and drive software design
  • 7. Enable: Scalability, High performance, App-centric, Extensible. Optimized software path to hardware • Independent of hardware interface, version, features. Reduced cache and memory footprint • Scalable address resolution and storage • Tight data structures. Analyze application needs • Implement them in a coherent, concise, high-performance manner. More agile development • Time-boxed, iterative development • Application focused APIs • Adaptable
  • 8. Verbs Semantic Mismatch. Current RDMA APIs versus Evolved Fabric Interfaces. Steps shown: Allocate WR; Allocate SGE; Format SGE – 3 writes; Format WR – 6 writes; Loop 1; Checks – 9 branches; Loop 2; Check; Loop 3; Checks – 3 branches; Checks – 3 branches; Direct call – 3 writes; Checks – 2 branches. 50-60 lines of C code (generic send call) versus 25-30 lines of C code (optimized send call). Reduce setup cost: tighter data. Eliminate loops and branches: remaining branches predictable. Selective optimization paths to HW: manual function expansion
  • 9. Application-Centric Interfaces • Collect application requirements • Identify common, fast path usage models – Too many use cases to optimize them all • Build primitives around fabric services – Not device specific interface. Reducing instruction count requires a better application impedance match
  • 10. OFA Software Evolution. Transition from disjoint APIs to a cohesive set of fabric interfaces. [Diagram labels: libfabric; FI Provider; IB Verbs; Verbs Provider; Verbs; Fabric Interfaces; RDMA CM; CM; uVerbs Command]
  • 11. Fabric Interfaces Framework • Take growth into consideration • Reduce effort to incorporate new application features – Addition of new interfaces, structures, or fields – Modification of existing functions • Allow time to design new interfaces correctly – Support prototyping interfaces prior to integration. Focus on longer-lived interfaces – software leading hardware. www.openfabrics.org
  • 12. Fabric Interfaces. Framework defines multiple interfaces; vendors provide optimized implementations. Fabric Interfaces: Message Queue, Control Interface, RMA, Atomics, Addressing Services, Tag Matching, Triggered Operations, CM Services. Fabric Provider Implementation: Message Queue, CM Services, RMA, Tag Matching, Control Interface, Addressing Services, Atomics, Triggered Operations
  • 13. Fabric Interfaces • Defines philosophy for interfaces and extensions – Focus interfaces on the semantics and services offered by the hardware and not the hardware implementation • Exports a minimal API – Control interface • Defines fabric interfaces – API sets for specific functionality • Defines core object model – Object-oriented design, but C interfaces
  • 14. Fabric Interfaces Architecture • Based on object-oriented programming concepts • Derived objects define interfaces – New interfaces exposed – Define behavior of inherited interfaces – Optimize implementation
  • 15. Control Interfaces (FI Framework: fi_getinfo, fi_fabric, fi_register) • fi_getinfo: Application specifies desired functionality; Discover fabric providers and services; Identify resources and addressing • fi_fabric: Open a set of fabric interfaces and resources • fi_register: Dynamic providers publish control interfaces
  • 16. Application Semantics • Progress – Application or hardware driven – Data versus control interfaces • Ordering – Message ordering – Data delivery order • Multi-threading and locking model – Compile and run-time options. Get / set using control interfaces
  • 17. Fabric Object Model. Software objects usable across multiple HW providers. Provider abstracts multiple NICs. Boundary of resource sharing. [Diagram labels: Fabric; Fabric Interfaces; Physical Fabric; NIC (x3); Resource Domain (x2); Event Queue; Event Counter; Active Endpoint; Address Vectors; Passive Endpoint]
  • 18. Endpoint Interfaces. Communication interfaces. Software path to hardware optimized based on endpoint properties. Aliasing supports multiple software paths to hardware. [Diagram labels: Endpoint; Properties; Interfaces; Type; Protocol; Capabilities; CM; Message Transfers; RMA; Tagged; Atomics; Triggered]
  • 19. Application Configured Interfaces. App specifies comm model. Provider directs app to best API sets. [Diagram labels: Endpoint; Communication type; Capabilities; Data transfer flags; Message Queue Ops; RMA Ops; NIC; sm. msg; lg. msg; RMA; inline send; send; write; read]
  • 20. Event Queues. Asynchronous event reporting. User specified wait object. Compact optimized data structures. Optimize interface around reporting successful operations. Event counters support lightweight event reporting. [Diagram labels: Domain; Event Queues; Properties; Interface Details; Format: Context only, Data, Tagged, Generic; Wait Object: None, fd, mwait]
  • 21. Event Queues. App selects completion structure. Generic verbs completion example (read CQ, generic completion): Send: +4-6 writes, +2 branches; Recv: +10-13 writes, +4 branches. Application optimized completion (optimized CQ, op context): +1 write, +0 branches
  • 22. Address Vectors. Store addresses/host names – Insert range of addresses with single call. Example only (Start Range, End Range, Base LID, SL): (host10, host1000, 50, 1); (host1001, host4999, 2000, 2). Fabric specific addressing requirements. Share between processes. Enable provider optimization techniques – Greatly reduce storage requirements. Reference entries by handle or index – Handle may be encoded fabric address. Reference vector for group communication
  • 23. Summary • These concepts are necessary, not revolutionary – Communication addressing, optimized data transfers, app-centric interfaces, future looking • Want a solution where the pieces fit tightly together
  • 24. Repeated Call for Participation • Co-chair (sean.hefty@intel.com) – Meets Tuesdays from 9-10 PST / 12-1 EST • Links – Mailing list subscription • http://lists.openfabrics.org/mailman/listinfo/ofiwg – Document downloads • https://www.openfabrics.org/downloads/OFIWG/ – libfabric source tree • www.github.com/ofiwg/libfabric – libfabric sample programs • www.github.com/ofiwg/fabtests
  • 26. Verbs API Mismatch. struct ibv_sge { uint64_t addr; uint32_t length; uint32_t lkey; }; struct ibv_send_wr { uint64_t wr_id; struct ibv_send_wr *next; struct ibv_sge *sg_list; int num_sge; enum ibv_wr_opcode opcode; int send_flags; uint32_t imm_data; ... }; Application request: <buffer, length, context>; 3 x 8 = 24 bytes of data needed. SGE + WR = 88 bytes allocated. Requests may be linked: next must be set to NULL. Must link to separate SGL and initialize count. App must set and provider must switch on opcode. Must clear flags. 28 additional bytes initialized. Significant SW overhead
  • 27. Verbs Provider Mismatch. For each work request: Check for available queue space; Check SGL size; Check valid opcode; Check flags x 2; Check specific opcode; Switch on QP type; Switch on opcode; Check flags; For each SGE: Check size; Loop over length; Check flags; Check; Check for last request; Other checks x 3. 19+ branches including loops. 100+ lines of C code; 50-60 lines of code to HW. Annotations: Most often 1 (overlap operations). Often 1 or 2 (fixed in source). Artifact of API. QP type usually fixed in source. Flags may be fixed or app may have taken branches
  • 28. Verbs Completions Mismatch. struct ibv_wc { uint64_t wr_id; enum ibv_wc_status status; enum ibv_wc_opcode opcode; uint32_t vendor_err; uint32_t byte_len; uint32_t imm_data; uint32_t qp_num; uint32_t src_qp; int wc_flags; uint16_t pkey_index; uint16_t slid; uint8_t sl; uint8_t dlid_path_bits; }; Application accessed fields. Provider must fill out all fields, even those ignored by the app. Developer must determine if fields apply to their QP. App must check both return code and status to determine if a request completed successfully. Single structure is 48 bytes, likely to cross cacheline boundary. Provider must handle all types of completions from any QP
  • 29. RDMA CM Mismatch. Want: reliable data transfers, zero copies to thousands of processes. RDMA interfaces expose: struct rdma_route { struct rdma_addr addr; struct ibv_sa_path_rec *path_rec; ... }; struct rdma_cm_id {...}; rdma_create_id(), rdma_resolve_addr(), rdma_resolve_route(), rdma_connect(). Src/dst addresses stored per endpoint. 456 bytes per endpoint. Path record per endpoint. Resolve single address and path at a time. All to all connected model for best performance
  • 30. Progress • Ability of the underlying implementation to complete processing of an asynchronous request • Need to consider ALL asynchronous requests – Connections, address resolution, data transfers, event processing, completions, etc. • HW/SW mix. All(?) current solutions require significant software components
  • 31. Progress • Support two progress models – Automatic and implicit • Separate operations as belonging to one of two progress domains – Data or control – Report progress model for each domain. SAMPLE (domain: implicit / automatic): Data: Software / Hardware offload; Control: Software / Kernel services
  • 32. Automatic Progress • Implies hardware offload model – Or standard kernel services / threads for control operations • Once an operation is initiated, it will complete without further user intervention or calls into the API • Automatic progress meets implicit model by definition
  • 33. Implicit Progress • Implies significant software component • Occurs when reading or waiting on EQ(s) • Application can use separate EQs for control and data • Progress limited to objects associated with selected EQ(s) • App can request automatic progress – E.g. app wants to wait on native wait object – Implies provider allocated threading
  • 34. Ordering • Applies to a single initiator endpoint performing data transfers to one target endpoint over the same data flow – Data flow may be a conceptual QoS level or path through the network • Separate ordering domains – Completions, message, data • Fenced ordering may be obtained using fi_sync operation
  • 35. Completion Ordering • Order in which operation completions are reported relative to their submission • Unordered or ordered – No defined requirement for ordered completions • Default: unordered
  • 36. Message Ordering • Order in which message (transport) headers are processed – I.e. whether transport messages are received in or out of order • Determined by selection of ordering bits – [Read | Write | Send] After [Read | Write | Send] – RAR, RAW, RAS, WAR, WAW, WAS, SAR, SAW, SAS • Example: – fi_order = 0 // unordered – fi_order = RAR | RAW | RAS | WAW | WAS | SAW | SAS // IB/iWarp like ordering
  • 37. Data Ordering • Delivery order of transport data into target memory – Ordering per byte-addressable location – I.e. access to the same byte in memory • Ordering constrained by message ordering rules – Must at least have message ordering first
  • 38. Data Ordering • Ordering limited to message order size – E.g. MTU – In-order data delivery if transfer <= message order size – WAW, RAW, WAR sizes? • Message order size = 0 – No data ordering • Message order size = -1 – All data ordered
  • 39. Other Ordering Rules • Ordering to different target endpoints not defined • Per message ordering semantics implemented using different data flows – Data flows may be less flexible, but easier to optimize for – Endpoint aliases may be configured to use different data flows
  • 40. Multi-threading and Locking • Support both thread safe and lockless models – Compile time and run time support – Run-time limited to compiled support • Lockless (based on MPI model) – Single: single-threaded app – Funneled: only 1 thread calls into interfaces – Serialized: only 1 thread at a time calls into interfaces • Thread safe – Multiple: multi-threaded app, with no restrictions
  • 41. Buffering • Support both application and network buffering – Zero-copy for high performance – Network buffering for ease of use • Buffering in local memory or NIC – In some cases, buffered transfers may be higher-performing (e.g. “inline”) • Registration option for local NIC access – Migration to fabric managed registration • Required registration for remote access – Specify permissions
  • 42. Scalable Transfer Interfaces • Application optimized code paths based on usage model • Optimize call(s) for single work request – Single data buffer – Still support more complex WR lists/SGL • Per endpoint send/receive operations – Separate RMA function calls • Pre-configure data transfer flags – Known before post request – Select software path through provider