2019 Storage Developer Conference India © All Rights Reserved.
NVMe over Fabrics Demystified
Rob Davis
Mellanox
© 2019 Mellanox Technologies
Why NVMe over Fabrics?
[Figure: access time in microseconds (logarithmic axis, ticks at 0.1, 10, and 1000) by storage media technology: HDD, SSD, and NVM (PM)]
NVMe Technology
▪ Optimized for flash and persistent memory (PM)
  ▪ Traditional SCSI interfaces were designed for spinning disk
  ▪ NVMe bypasses unneeded layers
▪ NVMe flash outperforms SAS/SATA flash
  ▪ 2.5x more bandwidth, 50% lower latency, 3x more IOPS
“NVMe over Fabrics” was the Logical and Historical next step
▪ Sharing NVMe-based storage across multiple servers/CPUs was the next step
  ▪ Better utilization: capacity, rack space, power
  ▪ Scalability, management, fault isolation
▪ NVMe over Fabrics standard
  ▪ 50+ contributors
  ▪ Version 1.0 released in June 2016
▪ Pre-standard demos in 2014
▪ Able to almost match local NVMe performance
[Figure: chart with axis in Gb/s]
NVMe over Fabrics (NVMe-oF) Transports
▪ The NVMe-oF standard is not fabric specific
▪ Instead, there is a separate Transport Binding specification for each Transport Layer
▪ RDMA was first
▪ Fibre Channel came later
▪ NVM Express just released a new binding specification for TCP
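The split between a fabric-independent core and per-transport bindings can be pictured as one interface with an implementation per transport. The sketch below is only an illustrative model: the class and function names (`TransportBinding`, `send_capsule`, `submit`) are hypothetical and do not come from the NVMe-oF specification or any library.

```python
from abc import ABC, abstractmethod


class TransportBinding(ABC):
    """Hypothetical stand-in for one transport binding specification."""

    name: str

    @abstractmethod
    def send_capsule(self, capsule: bytes) -> str:
        """Describe how this transport would carry a capsule."""


class RdmaBinding(TransportBinding):
    name = "RDMA"

    def send_capsule(self, capsule: bytes) -> str:
        return f"{len(capsule)}-byte capsule over RDMA"


class FibreChannelBinding(TransportBinding):
    name = "Fibre Channel"

    def send_capsule(self, capsule: bytes) -> str:
        return f"{len(capsule)}-byte capsule over Fibre Channel"


class TcpBinding(TransportBinding):
    name = "TCP"

    def send_capsule(self, capsule: bytes) -> str:
        return f"{len(capsule)}-byte capsule over TCP"


def submit(binding: TransportBinding, capsule: bytes) -> str:
    # The fabric-independent layer never inspects transport details.
    return binding.send_capsule(capsule)


BINDINGS = [RdmaBinding(), FibreChannelBinding(), TcpBinding()]
for binding in BINDINGS:
    print(binding.name, "->", submit(binding, bytes(64)))
```

The point of the structure is that adding a new transport means adding one more binding, while the code that builds and submits capsules stays unchanged.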
How Does NVMe-oF Maintain NVMe Performance?
▪ By extending NVMe efficiency over a fabric
▪ NVMe commands and data structures are transferred end to end
▪ Bypassing legacy stacks for performance
▪ First products and early demos all used RDMA
▪ Performance is impressive
[Figure labels: SAS/SATA, Device, over Fabrics, NVMe/RDMA Transport, NVMe/TCP Transport, or IB]
https://www.theregister.co.uk/2018/08/16/pavilion_fabrics_performance/
[Figure, later build of the same slide: adds Fibre Channel and NVMe/FC over Fabrics, plus a value labeled ~150 (unit not recoverable)]
Faster Storage Needs a Faster Network
[Figure labels: 10GbE, Fibre Channel]
Faster Network Wires Solve Some of the Network Bottleneck Problem…
Ethernet & InfiniBand
End-to-End 25, 40, 50, 56, 100, 200Gb
Going to 400Gb
Faster Protocols Solve the Rest
NVMe, NVMe-oF, and RDMA Protocols
NVMe/RDMA adapter-based transport
1) Ethernet
  ▪ RoCE
  ▪ iWARP
2) InfiniBand
3) Omni-Path
[Diagram: NVMe-oF over RoCE]
NVMe Commands Encapsulated
[Sequence diagram: an NVMe initiator and an NVMe target, each with an RNIC, exchange capsules across the network.]
Step labels, in extraction order: Post Send (CC); Send: Command Capsule; Ack; Completion; Completion; Post NVMe command; Wait for completion; Free receive buffer; Post Send (RC); Send: Response Capsule; Completion; Ack; Completion; Free send buffer; Free send buffer; Post Send (Write data); Write first; Write last; Ack; Completion; Free allocated buffer.
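As a rough illustration of what "encapsulated" means here, the sketch below packs a simplified command capsule: a 64-byte NVMe submission queue entry followed by optional in-capsule data. The matching response capsule would carry a 16-byte completion queue entry. The helper name `build_command_capsule` is hypothetical, only a few fields are filled in, and the data pointer that a real transport would populate is left zeroed, so treat this as a teaching aid rather than a wire-accurate encoder.

```python
import struct

SQE_SIZE = 64  # NVMe submission queue entry carried in a command capsule
CQE_SIZE = 16  # NVMe completion queue entry carried in a response capsule
OPCODE_READ = 0x02  # NVM command set Read


def build_command_capsule(opcode, cid, nsid, slba, nblocks, in_capsule_data=b""):
    """Pack a simplified command capsule (hypothetical helper)."""
    sqe = bytearray(SQE_SIZE)
    # Dword 0: opcode, flags (left 0), command identifier; dword 1: namespace ID.
    struct.pack_into("<BBHI", sqe, 0, opcode, 0, cid, nsid)
    # Dwords 10-11: starting LBA; low 16 bits of dword 12: block count, zero-based.
    struct.pack_into("<QH", sqe, 40, slba, nblocks - 1)
    return bytes(sqe) + in_capsule_data


capsule = build_command_capsule(OPCODE_READ, cid=7, nsid=1, slba=0, nblocks=8)
print(len(capsule))  # 64
```

Passing `in_capsule_data` appends the payload directly after the 64-byte entry, which is how a small write could travel in the same message as its command.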
Importance of Latency with NVMe-oF
[Figure, logarithmic scale: request/response latency for a common switch and adapter versus a low-latency switch and adapter, shown against the newest NVMe SSD]
▪ Network hops multiply latency
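One way to read "network hops multiply latency": every switch hop is crossed once by the request and once by the response, so per-hop delay is paid twice per I/O. The sketch below is a deliberately simple additive model; the function name and all latency figures are illustrative assumptions, not measurements from this talk.

```python
def round_trip_network_ns(hops, switch_ns, adapter_ns):
    """Network time added to one request/response, in nanoseconds.

    Simple additive model (an assumption): each direction crosses two
    adapters (sender and receiver) plus `hops` switches.
    """
    one_way = 2 * adapter_ns + hops * switch_ns
    return 2 * one_way


# Illustrative numbers only, chosen to show the trend.
for hops in (1, 3):
    print(hops, round_trip_network_ns(hops, switch_ns=300, adapter_ns=1000))
```

When the storage device itself responds in a few microseconds, even a model this crude shows the network share of total latency growing with each added hop.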
Composable Infrastructure Use Case
▪ Also called Compute Storage Disaggregation and Rack Scale
▪ Dramatically improves data center efficiency
▪ NVMe over Fabrics enables Composable Infrastructure
  ▪ Low latency
  ▪ High bandwidth
  ▪ Nearly local disk performance
[Diagram labels: Switch, Compute (repeated)]
Hyperconverged and Scale-Out Storage Use Case
▪ Scale-out
▪ Cluster of commodity servers
▪ Software provides storage functions
▪ Hyperconverged collapses compute & storage
▪ Integrated compute-storage nodes & software
▪ NVMe-oF performs like local/direct-attached SSD
[Diagram labels: Scale-out Storage, Mellanox, x86, Switch, Compute Nodes, Storage Application, VM, NVMe, Storage App, HCI Nodes]
Backend Scale Out Use Case
[Diagram labels: Backend Network, JBOF, Frontend]
NVMe-oF Use Cases: Classic SAN
▪ SAN features at higher performance
▪ Better utilization: capacity, rack space, and power
▪ Scalability
▪ Management
▪ Fault isolation
NVMe-oF Target Hardware Offloads
No Offload Mode
How Target Offload Works
▪ Offload
  ▪ Only control path, management, and exceptions go through Target CPU software
  ▪ Data path and NVMe commands are handled by the network adapter
Offload vs No Offload Performance
[Diagram labels: Data Path, DDR4, PCIe Switch, NVMe SSD, Initiator x86, ConnectX-5; the offload diagram adds SOC]
No-offload target, two 100Gb initiators
▪ 6M IOPS, 512B block size
▪ 2M IOPS, 4K block size
▪ ~15 usec latency (not including SSD)
Offload target, two 100Gb initiators
▪ 8M IOPS, 512B block size
▪ 5M IOPS, 4K block size
▪ ~5 usec latency (not including SSD)
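The IOPS figures can be turned into approximate link utilization by multiplying by block size. The short calculation below does that, assuming "4K" means 4096 bytes, counting payload only (no protocol overhead), and comparing against the two 100Gb initiator links shown on the slide; the function name is just for illustration.

```python
def payload_gbps(iops, block_bytes):
    """Payload bandwidth in Gb/s (10**9 bits per second)."""
    return iops * block_bytes * 8 / 1e9


LINK_GBPS = 2 * 100  # two 100Gb initiators

results = {
    "no offload, 512B": payload_gbps(6_000_000, 512),
    "no offload, 4K": payload_gbps(2_000_000, 4096),
    "offload, 512B": payload_gbps(8_000_000, 512),
    "offload, 4K": payload_gbps(5_000_000, 4096),
}

for case, gbps in results.items():
    print(f"{case}: {gbps:.1f} Gb/s ({gbps / LINK_GBPS:.0%} of link)")
```

Under these assumptions the offloaded 4K case is the only one that approaches the combined link rate, at roughly 164 Gb/s out of 200 Gb/s; the 512B cases are limited by operation rate rather than bandwidth.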
NVMe Emulation
[Diagram: Local Physical Storage to Hardware Emulated Storage]
▪ Physical Local NVMe Storage: Host Server, OS/Hypervisor, NVMe Standard Driver, PCIe bus, NVMe physical local storage
▪ NVMe Drive Emulation: Host Server, OS/Hypervisor, NVMe Standard Driver, PCIe bus, NVMe emulated storage, remote storage
NVMe/TCP
▪ NVMe-oF commands are sent over standard TCP/IP sockets
▪ Each NVMe queue pair is mapped to a TCP connection
▪ Easy to support: NVMe over TCP needs no changes to the existing TCP/IP network
▪ Good for distance, stranded servers, and out-of-band management connectivity
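The queue-to-connection mapping can be pictured with a small model. The sketch below is not a working NVMe/TCP initiator and opens no sockets; the class names are hypothetical, the address is a documentation placeholder, and port 4420 is used because it is the port registered for NVMe over Fabrics.

```python
from dataclasses import dataclass, field

ADMIN_QID = 0  # queue ID 0 is the admin queue; I/O queues start at 1


@dataclass(frozen=True)
class TcpConnection:
    """Placeholder for one TCP connection (no real socket is opened)."""
    remote_addr: str
    remote_port: int
    qid: int


@dataclass
class NvmeTcpHost:
    """Hypothetical host-side model: one TCP connection per queue pair."""
    target_addr: str
    target_port: int = 4420
    connections: dict = field(default_factory=dict)

    def create_queue_pair(self, qid: int) -> TcpConnection:
        if qid in self.connections:
            raise ValueError(f"queue {qid} already has a connection")
        conn = TcpConnection(self.target_addr, self.target_port, qid)
        self.connections[qid] = conn
        return conn


host = NvmeTcpHost("192.0.2.10")
host.create_queue_pair(ADMIN_QID)
for qid in range(1, 5):  # four I/O queues, for example one per CPU core
    host.create_queue_pair(qid)
print(len(host.connections))  # 5
```

Because each queue pair owns its connection, commands on different queues do not share a TCP byte stream, which keeps the per-queue parallelism of NVMe visible on the network side.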
Latency: NVMe-RDMA vs NVMe-TCP
[Figure: cumulative latency distributions for Local SSD Write, RDMA Write, and TCP Write, with the tail latency region marked. Axis: fraction of IOs with this or less latency.]
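The chart's axis, fraction of IOs with this or less latency, is an empirical cumulative distribution, and tail latency is what the high percentiles of that distribution report. A minimal sketch of both calculations follows; the sample latencies are made up for illustration and are not data from the talk.

```python
import math


def fraction_at_or_below(latencies_us, threshold_us):
    """Empirical CDF: share of IOs that finished within threshold_us."""
    return sum(1 for x in latencies_us if x <= threshold_us) / len(latencies_us)


def percentile(latencies_us, p):
    """Nearest-rank percentile for 0 < p <= 100."""
    ordered = sorted(latencies_us)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up samples in microseconds: mostly fast, with two slow outliers.
samples = [20, 22, 21, 25, 23, 24, 22, 21, 90, 300]

print(fraction_at_or_below(samples, 25))  # 0.8
print(percentile(samples, 50))            # 22
print(percentile(samples, 99))            # 300
```

The median hides the outliers entirely, while the 99th percentile is dominated by them, which is why transports are better compared as full distributions than as averages.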
NVMe over Fabrics Maturity
▪ UNH-IOL, a neutral environment for multi-vendor interoperability since 1988
▪ Four plugfests for NVMe-oF since May 2017
▪ Tests require participating vendors to mix and match in both Target and Initiator positions
▪ June 2018 test included Mellanox, Broadcom, and Marvell ASIC solutions
▪ List of vendors who approved publishing their results: https://www.iol.unh.edu/registry/nvmeof
NVMe Market Projection – $60B by 2021
▪ ~$20B in NVMe-oF revenue projected by 2021
▪ NVMe-oF adapter shipments will exceed 1.5M units by 2021
▪ This does not include ASICs, Custom Mezz Cards, etc. inside AFAs and other Storage Appliances
Some NVMe-oF Storage Players
Conclusions
▪ NVMe-oF brings the value of networked storage to NVMe-based solutions
▪ NVMe-oF is supported across many network technologies
▪ The performance advantages of NVMe are not lost with NVMe-oF
▪ Especially with RDMA
▪ There are many suppliers of NVMe-oF solutions across a variety of important data center use cases
Thank You
NVMe over Fabrics Demystified

  • 1. 2019 Storage Developer Conference India © All Rights Reserved. 1 NVMe over Fabrics Demystified Rob Davis Mellanox
  • 2. © 2019 Mellanox Technologies 22 Why NVMe over Fabrics? 0.1 10 1000 HD SSD NVM AccessTime(micro-Sec) StorageMedia Technology AccessTimeinMicroSeconds HDD PM
  • 3. © 2019 Mellanox Technologies 33 NVMe Technology ▪Optimized for flash and PM ▪ Traditional SCSI interfaces designed for spinning disk ▪ NVMe bypasses unneeded layers ▪NVMe Flash Outperforms SAS/SATA Flash ▪ +2.5x more bandwidth, +50% lower latency, +3x more IOPS
  • 4. © 2019 Mellanox Technologies 44 “NVMe over Fabrics” was the Logical and Historical next step ▪Sharing NVMe based storage across multiple servers/CPUs was the next step ▪ Better utilization: capacity, rack space, power ▪ Scalability, management, fault isolation ▪NVMe over Fabrics standard ▪ 50+ contributors ▪ Version 1.0 released in June 2016 ▪Pre-standard demos in 2014 ▪Able to almost match local NVMe performance Gb/s
  • 5. © 2019 Mellanox Technologies 55 NVMe over Fabrics (NVMe-oF) Transports ▪ The NVMe-oF standard is not Fabric specific ▪ Instead there is a separate Transport Binding specification for each Transport Layer ▪ RDMA was 1st ▪ Later Fibre Channel ▪ NVM.org just released a new binding specification for TCP InfiniBand
  • 6-8. How Does NVMe-oF Maintain NVMe Performance? ▪ By extending NVMe efficiency over a fabric ▪ NVMe commands and data structures are transferred end to end ▪ Bypassing legacy stacks for performance ▪ First products and early demos all used RDMA ▪ Performance is impressive [Diagram labels across the three builds of this slide: SAS/SATA, Device, over Fabrics, Transport, NVMe/RDMA, NVMe/TCP, NVMe/FC, Fibre Channel, "or IB", ~150] https://www.theregister.co.uk/2018/08/16/pavilion_fabrics_performance/
  • 9. Faster Storage Needs a Faster Network [Diagram labels: 10GbE, Fibre Channel]
  • 10. Faster Network Wires Solve Some of the Network Bottleneck Problem… Ethernet & InfiniBand End-to-End: 25, 40, 50, 56, 100, 200Gb; going to 400Gb
  • 11-12. Faster Protocols Solve the Rest
  • 13. NVMe, NVMe-oF, and RDMA Protocols
  • 14. NVMe/RDMA Adapter-Based Transport [Diagram label: NVMe-oF over RoCE]
  • 15. NVMe/RDMA Adapter-Based Transport 1) Ethernet ▪ RoCE ▪ iWARP 2) InfiniBand 3) OmniPath [Diagram label: NVMe-oF over RoCE]
  • 16. NVMe Commands Encapsulated [Diagram label: Network]
  • 17. NVMe Commands Encapsulated [Sequence diagram; columns: NVMe Initiator, RNIC, Network, RNIC, NVMe Target. Step labels in extraction order: Post Send (CC) ▪ Send – Command Capsule ▪ Ack ▪ Completion ▪ Completion ▪ Post NVMe command ▪ Wait for completion ▪ Free receive buffer ▪ Post Send (RC) ▪ Send – Response Capsule ▪ Completion ▪ Ack ▪ Completion ▪ Free send buffer ▪ Free send buffer ▪ Post Send (Write data) ▪ Write first ▪ Write last ▪ Ack ▪ Completion ▪ Free allocated buffer]
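The sequence diagram on slide 17 did not survive extraction in order, so the following is only a rough sketch of how one read command might flow between initiator and target over RDMA. The step labels come from the slide; the ordering, the actor assignment, and the helper names `steps_for` and `happens_before` are assumptions for illustration, and buffer-free and ack steps are left out.

```python
from typing import List, Tuple

# (actor, step) pairs. Labels are taken from the slide; the ordering below is
# an assumed reconstruction for a read command, not the slide's exact diagram.
READ_FLOW: List[Tuple[str, str]] = [
    ("initiator", "Post Send (CC)"),
    ("initiator RNIC", "Send - Command Capsule"),
    ("target", "Post NVMe command"),
    ("target", "Wait for completion"),
    ("target", "Post Send (Write data)"),
    ("target RNIC", "Write first ... Write last"),
    ("target", "Post Send (RC)"),
    ("target RNIC", "Send - Response Capsule"),
    ("initiator", "Completion"),
]


def steps_for(actor: str) -> List[str]:
    """Return the steps performed by one actor, in flow order."""
    return [step for who, step in READ_FLOW if who == actor]


def happens_before(first: str, second: str) -> bool:
    """True if step `first` appears earlier in the flow than step `second`."""
    labels = [step for _, step in READ_FLOW]
    return labels.index(first) < labels.index(second)


if __name__ == "__main__":
    for who, step in READ_FLOW:
        print(f"{who:>15}: {step}")
```

The point of the sketch is that the NVMe command travels inside a command capsule and the completion comes back inside a response capsule, with the data moved by RDMA writes in between.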
  • 18. Importance of Latency with NVMe-oF. Network hops multiply latency. [Chart: request/response latency on a logarithmic scale; series: common switch & adapter, low-latency switch & adapter, newest NVMe SSD]
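Slide 18's point that network hops multiply latency can be illustrated with a small back-of-the-envelope model. This is a sketch under stated assumptions: the request and the response each cross both adapters and every switch hop once, and all numeric values in the usage example are hypothetical, not figures from the slide.

```python
def round_trip_latency_us(ssd_us: float, adapter_us: float,
                          switch_us: float, hops: int) -> float:
    """Estimate request/response latency for one remote I/O.

    Assumed model: each direction crosses two adapters (initiator and
    target) plus `hops` switches; the SSD access time is paid once.
    """
    one_way = 2 * adapter_us + hops * switch_us
    return ssd_us + 2 * one_way


if __name__ == "__main__":
    # Hypothetical inputs: a 10 us SSD behind 1 to 3 switch hops.
    for hops in (1, 2, 3):
        total = round_trip_latency_us(ssd_us=10.0, adapter_us=1.0,
                                      switch_us=0.3, hops=hops)
        print(f"{hops} hop(s): {total:.1f} us")
```

Because every hop is paid twice per I/O, per-hop switch and adapter latency matters more as the SSD itself gets faster.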
  • 19. Composable Infrastructure Use Case ▪Also called Compute Storage Disaggregation and Rack Scale ▪Dramatically improves data center efficiency ▪NVMe over Fabrics enables Composable Infrastructure ▪ Low latency ▪ High bandwidth ▪ Nearly local disk performance [Diagram: compute nodes attached to switches]
  • 20. Hyperconverged and Scale-Out Storage Use Case ▪Scale-out ▪Cluster of commodity servers ▪Software provides storage functions ▪Hyperconverged collapses compute & storage ▪Integrated compute-storage nodes & software ▪NVMe-oF performs like local/direct-attached SSD [Diagram labels: Scale-out Storage, Mellanox, x86, Switch, Compute Nodes, Storage Application, VM, NVMe, Storage App, HCI Nodes]
  • 21. Backend Scale-Out Use Case [Diagram labels: Frontend, Backend Network, JBOF]
  • 22. NVMe-oF Use Cases: Classic SAN ▪SAN features at higher performance ▪Better utilization: capacity, rack space, and power ▪Scalability ▪Management ▪Fault isolation
  • 23. NVMe-oF Target Hardware Offloads [Diagram label: No Offload Mode]
  • 24. How Target Offload Works ▪ Offload ▪ Only control path, management, and exceptions go through target CPU software ▪ Data path and NVMe commands are handled by the network adapter
  • 25-26. Offload vs No Offload Performance ▪ No-offload target, two 100Gb initiators: 6M IOPS at 512B block size; 2M IOPS at 4K block size; ~15 usec latency (not including SSD) ▪ Offload target, two 100Gb initiators: 8M IOPS at 512B block size; 5M IOPS at 4K block size; ~5 usec latency (not including SSD) [Diagram labels: Data Path, DDR4, PCIe Switch, NVMe SSD, Initiator x86, ConnectX-5, SOC]
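To put the IOPS figures from the offload comparison in context, they can be converted to line-rate bandwidth. This is a minimal sketch: it assumes 4K means 4096 bytes and 1 Gb means 10^9 bits, and it counts payload only, ignoring protocol overhead. The IOPS values are the ones quoted on the slide; the function name is illustrative.

```python
def throughput_gbps(iops: float, block_bytes: int) -> float:
    """Payload bandwidth in Gb/s for a given IOPS rate and block size."""
    return iops * block_bytes * 8 / 1e9


# IOPS figures as quoted on the slide, keyed by (mode, block size in bytes).
SLIDE_IOPS = {
    ("no offload", 512): 6e6,
    ("no offload", 4096): 2e6,
    ("offload", 512): 8e6,
    ("offload", 4096): 5e6,
}

LINK_GBPS = 2 * 100  # two 100Gb initiators, per the slide

if __name__ == "__main__":
    for (mode, block), iops in SLIDE_IOPS.items():
        gbps = throughput_gbps(iops, block)
        print(f"{mode:>10} {block:>5}B: {gbps:7.2f} Gb/s "
              f"({gbps / LINK_GBPS:.0%} of {LINK_GBPS} Gb/s)")
```

Under these assumptions the 4K results differ most: the offload case moves well over twice the payload bandwidth of the no-offload case across the same two links.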
  • 27. NVMe Emulation: Local Physical Storage to Hardware Emulated Storage [Diagram, two stacks. Physical local NVMe storage: Host Server, OS/Hypervisor, NVMe Standard Driver, PCIe Bus, NVMe physical local storage. NVMe drive emulation: Host Server, OS/Hypervisor, NVMe Standard Driver, PCIe Bus, NVMe emulated storage, Remote Storage]
  • 28. NVMe/TCP ▪NVMe-oF commands are sent over standard TCP/IP sockets ▪Each NVMe queue pair is mapped to a TCP connection ▪Easy to support NVMe over TCP with no changes ▪Good for distance, stranded server, and out-of-band management connectivity
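As an illustration of slide 28's statement that NVMe-oF commands are sent over standard TCP/IP sockets, the sketch below frames one 64-byte NVMe command as a command-capsule PDU, the unit that would be written to a queue pair's TCP connection. The 8-byte common header layout (type, flags, header length, data offset, total length) and the type code 0x04 reflect my reading of the NVMe/TCP transport binding and should be checked against the specification. The function names are illustrative, and digests and in-capsule data are omitted.

```python
import struct

# Assumed 8-byte PDU common header: type (1 byte), flags (1), header length (1),
# PDU data offset (1), total PDU length (4 bytes, little-endian).
COMMON_HDR = struct.Struct("<BBBBI")

PDU_TYPE_CAPSULE_CMD = 0x04  # assumed type code for a command capsule
NVME_SQE_SIZE = 64           # one NVMe submission queue entry


def build_command_capsule(sqe: bytes) -> bytes:
    """Frame one NVMe command as a PDU with no in-capsule data or digests."""
    if len(sqe) != NVME_SQE_SIZE:
        raise ValueError("an NVMe submission queue entry is 64 bytes")
    hlen = COMMON_HDR.size + NVME_SQE_SIZE
    # No data follows the header, so the total length equals the header length.
    header = COMMON_HDR.pack(PDU_TYPE_CAPSULE_CMD, 0, hlen, 0, hlen)
    return header + sqe


def parse_common_header(pdu: bytes) -> dict:
    """Decode the common header fields from the start of a PDU."""
    pdu_type, flags, hlen, pdo, plen = COMMON_HDR.unpack_from(pdu)
    return {"type": pdu_type, "flags": flags, "hlen": hlen,
            "pdo": pdo, "plen": plen}


if __name__ == "__main__":
    pdu = build_command_capsule(bytes(NVME_SQE_SIZE))
    print(len(pdu), parse_common_header(pdu))
```

In a real initiator the resulting bytes would be passed to `sendall()` on the socket that backs the queue pair; the sketch stops at framing so that it runs without a network.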
  • 29-31. Latency: NVMe-RDMA vs NVMe-TCP [Chart: fraction of IOs with this or less latency, highlighting tail latency; series: local SSD write, RDMA write, TCP write]
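The latency charts plot the fraction of IOs completing at or below a given latency, which is an empirical cumulative distribution; tail latency is read off its upper end. The sketch below shows one way such values could be computed from raw samples. The sample data is hypothetical, and the nearest-rank percentile is one of several common definitions, chosen here for simplicity.

```python
import math
from typing import Sequence


def fraction_at_or_below(latencies_us: Sequence[float], threshold_us: float) -> float:
    """Fraction of IOs whose latency is at or below the threshold."""
    if not latencies_us:
        raise ValueError("need at least one sample")
    hits = sum(1 for x in latencies_us if x <= threshold_us)
    return hits / len(latencies_us)


def percentile_nearest_rank(latencies_us: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not latencies_us:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_us)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical write latencies in microseconds, with one slow outlier.
    samples = [10, 12, 15, 20, 100]
    print(fraction_at_or_below(samples, 15))
    print(percentile_nearest_rank(samples, 50))
    print(percentile_nearest_rank(samples, 99))
```

A single slow outlier barely moves the median but dominates the 99th percentile, which is why the slides single out tail latency when comparing RDMA and TCP.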
  • 32. NVMe over Fabrics Maturity ▪UNH-IOL, a neutral environment for multi-vendor interoperability since 1988 ▪Four plugfests for NVMe-oF since May 2017 ▪Tests require participating vendors to mix and match in both Target and Initiator positions ▪June 2018 test included Mellanox, Broadcom, and Marvell ASIC solutions ▪List of vendors who approved public results: https://www.iol.unh.edu/registry/nvmeof
  • 33. NVMe Market Projection – $60B by 2021 ▪~$20B in NVMe-oF revenue projected by 2021 ▪NVMe-oF adapter shipments will exceed 1.5M units by 2021 ▪This does not include ASICs, Custom Mezz Cards, etc. inside AFAs and other Storage Appliances
  • 34. Some NVMe-oF Storage Players
  • 35. Conclusions ▪NVMe-oF brings the value of networked storage to NVMe-based solutions ▪NVMe-oF is supported across many network technologies ▪The performance advantages of NVMe are not lost with NVMe-oF ▪Especially with RDMA ▪There are many suppliers of NVMe-oF solutions across a variety of important data center use cases
  • 36. Thank You