Mellanox Storage Solutions
Yaron Haviv, VP Datacenter Solutions
VMworld 2013 – San Francisco, CA
Maximizing Data Center Return on Investment
• 97% reduction in database recovery time: from 7 days to 4 hours!
  - Case: Tier-1 Fortune 100 Web 2.0 company
• Database performance improved up to 10X
  - Cases: Oracle, Teradata, IBM, Microsoft
• Big Data needs big pipes
  - High-throughput, low-latency server and storage interconnect
• 2X faster data analytics = expose your data value!
  - With Mellanox 56Gb/s RDMA interconnect solutions
• 3X the virtual machines per physical server, at 33% lower application cost
  - Cases: Microsoft, Oracle, Atlantic, ProfitBricks and more
• Storage:
  - Consolidation of network and storage I/O for lower OPEX
  - More than 10X higher storage performance: millions of IOPS
  - 60% lower TCO, 50% lower CAPEX
SSD Adoption Grows Significantly, Driving the Need for Faster I/O
[Chart: SSD adoption forecast. Source: IT Brand Pulse]
SSDs are 100x Faster, Require Faster Networks and RDMA
The Storage Delivery Bottleneck
A server with 24 x 2.5" SATA 3 SSDs can deliver roughly 12 GB/s, which takes either:
• 15 x 8Gb/s Fibre Channel ports, OR
• 10 x 10Gb/s iSCSI ports (with offload), OR
• just 2 x 40-56Gb/s IB/Eth ports (with RDMA)
SSD & Flash Mandate High-Speed Interconnect
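In round numbers, and assuming ~500 MB/s of sustained throughput per SATA 3 SSD (an assumption, not a figure from the slide), all of the port counts land near the same 12 GB/s:

$$24 \times 0.5\,\mathrm{GB/s} = 12\,\mathrm{GB/s} \approx 15 \times 0.8\,\mathrm{GB/s} \approx 10 \times 1.25\,\mathrm{GB/s} \approx 2 \times 6\,\mathrm{GB/s}$$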
Solving the Storage (Synchronous) IOPs Bottleneck With RDMA
Round-trip latency (software + network + media) for synchronous, back-to-back I/O, and the IOPs that result:
  The Old Days (~6 msec: ~100 µs software + ~200 µs network + ~6,000 µs disk): 180 IOPs
  With SSDs (~0.5 msec): 3,000 IOPs
  With a Fast Network (~0.2 msec): 4,300 IOPs
  With RDMA (~0.05 msec): 20,000 IOPs
  With Full OS Bypass & Cache (~0.007 msec, with single-digit-microsecond stages): 100,000 IOPs
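These IOPs figures are simply the reciprocal of the round-trip latency when I/Os are issued synchronously, one at a time:

$$\mathrm{IOPs_{sync}} = \frac{1}{t_{\mathrm{round\ trip}}}, \qquad \text{e.g. } \frac{1}{50\,\mu\mathrm{s}} = 20{,}000\ \mathrm{IOPs}$$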
Compare Interconnect Technologies

Storage Application/Protocol Support (rows of the original matrix; the per-technology check marks were graphics and are not recoverable):
  Block | NAS and Object | Big Data (Hadoop) | Storage Backplane/Clustering | Messaging

Technology Features:
  Feature                   FC        10GbE/TCP   10GbE/RoCE   40GbE/RoCE   InfiniBand
  Bandwidth [GB/s]          0.8/1.6   1.25        1.25         5            7
  $/GBps [NIC/Switch]*      500/500   200/150     200/150      120/90       80/50
  Credit Based (Lossless)   (check marks not recoverable; the two ** marks appear under the RoCE columns)
  Built-in L2 Multi-path    (check marks not recoverable)
  Latency                   (ratings not recoverable)

* Based on Google Product Search
** Mellanox end-to-end can be configured as true lossless
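The bandwidth row follows from line rate divided by eight bits per byte; FC's lower 0.8/1.6 GB/s figures additionally reflect its encoding overhead:

$$40\,\mathrm{Gb/s} \div 8 = 5\,\mathrm{GB/s}, \qquad 56\,\mathrm{Gb/s} \div 8 = 7\,\mathrm{GB/s}$$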
Standard iSCSI over TCP/IP
• iSCSI's main performance deficiencies stem from TCP/IP:
  - TCP is a complex protocol requiring significant processing
  - It is stream-based, making it hard to separate data and headers
  - It requires copies that increase latency and CPU overhead
  - Its checksums are weak, requiring additional CRCs in the ULP
iSCSI PDU layout, carried inside TCP/IP protocol frames:
  BHS (Basic Header Segment) | AHS (Additional Header Segment, optional) | HD (Header Digest, optional) | Data | DD (Data Digest, optional)
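To make the framing problem concrete, here is a minimal, runnable Python sketch of BHS parsing using the fixed 48-byte layout from RFC 3720 (the field offsets are from the RFC; the helper function itself is illustrative, not part of any iSCSI stack):

import struct

BHS_LEN = 48  # the Basic Header Segment is always 48 bytes (RFC 3720)

def parse_bhs(bhs: bytes):
    """Pull the framing fields out of an iSCSI Basic Header Segment."""
    assert len(bhs) == BHS_LEN
    opcode = bhs[0] & 0x3F                          # low 6 bits of byte 0
    total_ahs_len = bhs[4] * 4                      # AHS length, in 4-byte words
    data_seg_len = int.from_bytes(bhs[5:8], "big")  # 3-byte data segment length
    init_task_tag = struct.unpack(">I", bhs[16:20])[0]
    # Over a TCP byte stream the receiver must read and parse these 48 bytes
    # before it even knows how many more bytes belong to this PDU; iSER
    # avoids this by delivering each PDU as a discrete RDMA message.
    return opcode, total_ahs_len, data_seg_len, init_task_tag

# Example: a zeroed header advertising a 512-byte data segment
hdr = bytearray(BHS_LEN)
hdr[5:8] = (512).to_bytes(3, "big")
print(parse_bhs(bytes(hdr)))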
iSCSI Mapping to iSER / RDMA Transport
• iSER eliminates these bottlenecks through:
  - Zero copy using RDMA
  - CRCs calculated by hardware
  - Working with message boundaries instead of streams
  - A transport protocol implemented in hardware (minimal CPU cycles per IO)
• Enabling unparalleled performance
[Diagram: the same iSCSI PDU (BHS | AHS | HD | Data | DD) is carried over RDMA protocol frames; control maps to RC Send and data to RC RDMA Read/Write, with the Header and Data Digests handled in hardware]
iSER Protocol Overview (Read)
• SCSI Reads
  - The Initiator sends a Command PDU (Protocol Data Unit) to the Target
  - The Target returns the data using RDMA Write
  - The Target sends a Response PDU back when the transaction completes
  - The Initiator receives the Response and completes the SCSI operation
[Sequence diagram, initiator side (iSCSI, iSER, HCA) to target side (HCA, iSER Target, Target Storage):
  1. Initiator iSER: Send_Control (SCSI Read Cmd) with buffer advertisement
  2. Target iSER: Control_Notify to the target, then Data_Put (Data-In PDU) for the Read, carried out as an RDMA Write for the data
  3. Target iSER: Send_Control (SCSI Response)
  4. Initiator iSER: Control_Notify, and the SCSI operation completes]
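The exchange can be modeled in a few lines of plain Python. This is a conceptual, runnable simulation only, with illustrative class names standing in for the RDMA verbs interface; the target places data directly into the initiator's advertised buffer, just as an RDMA Write would:

# Conceptual model of the iSER SCSI Read flow; all names are illustrative.
class Initiator:
    def __init__(self):
        self.buffer = bytearray(4096)        # pre-registered read buffer

    def send_read(self, target, lba, length):
        # Send_Control: SCSI Read command plus buffer advertisement
        view = memoryview(self.buffer)
        response = target.handle_read(lba, length, view)
        # Control_Notify: response PDU arrived, complete the SCSI operation
        return response, bytes(self.buffer[:length])

class Target:
    def __init__(self, storage: bytes):
        self.storage = storage

    def handle_read(self, lba, length, remote_buf):
        data = self.storage[lba : lba + length]
        remote_buf[: len(data)] = data       # "RDMA Write": direct placement,
                                             # no receive-side copy or parsing
        return "SCSI GOOD"                   # Send_Control: SCSI Response PDU

target = Target(storage=b"x" * 1024)
status, data = Initiator().send_read(target, lba=0, length=512)
print(status, len(data))

The point of the model is that the initiator's software never parses a data PDU: the payload simply appears in its buffer, and only the short control messages consume CPU cycles.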
Mellanox Unbeatable Storage Performance
[Chart: IO latency @ 4K IO (µsec), iSCSI/TCP at only 131K IOPs vs iSCSI/RDMA at 2,300K IOPs: 5-10% of the latency under 20x the workload]
KIOPs @ 4K IO size:
  iSCSI (TCP/IP): 130
  1 x FC 8Gb port: 200
  4 x FC 8Gb ports: 800
  iSER, 1 x 40GbE/IB port: 1,100
  iSER, 2 x 40GbE/IB ports (+Acceleration): 2,300
We Deliver Significantly Faster IO Rates and Lower Access Times!
Accelerating IO Performance (Accessing a Single LUN)
Bandwidth & IOPs, single LUN, 3 threads:
[Chart: bandwidth (MB/s) vs IO size (1-256 KB) for iSCSI/TCP, iSER std, and iSER BD (bypass); the iSER curves reach the PCIe limit]
[Chart: IOPs (K/s) vs IO size; iSCSI/TCP peaks at only 80K IOPs, iSER std at 235K IOPs, and iSER BD (bypass) at 624K IOPs]
IO latency and % of CPU in I/O wait @ 4KB IO size and max IOPs:
[Chart: Read/Write/R-W IO latency (microseconds) for iSCSI/TCP, iSER std, and iSER BD (bypass)]
[Chart: Read/Write/R-W CPU I/O wait (%) for iSCSI/TCP, iSER std, and iSER BD (bypass)]
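Bandwidth and IOPs in these charts are two views of the same measurement, related by the I/O size:

$$\mathrm{Bandwidth} = \mathrm{IOPs} \times \mathrm{IO\ size}, \qquad \text{e.g. } 624{,}000 \times 4\,\mathrm{KB} \approx 2.5\,\mathrm{GB/s}$$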
Mellanox & LSI Accelerate VDI, Enable 2.5x More VMs
• Mellanox & LSI address the critical storage latency and IOPs bottlenecks in Virtual Desktop Infrastructure (VDI)
  - LSI Nytro MegaRAID accelerates disk access through SSD-based caching
  - Mellanox ConnectX®-3 10/40GbE adapters with RDMA accelerate access from hypervisors to fast shared storage over Ethernet, and enable zero-overhead replication
• When tested with Login VSI's VDI load generator, the solution delivered an unprecedented VM density of 150 VMs per ESX server
  - Using iSCSI/RDMA (iSER) enabled 2.5x more VMs compared to iSCSI over TCP/IP on the exact same setup
[Chart: number of virtual desktop VMs per server for Intel 10GbE iSCSI/TCP vs ConnectX-3 10GbE iSCSI/RDMA (iSER) vs ConnectX-3 40GbE iSCSI/RDMA (iSER); iSER delivers 2.5x more VMs]
[Diagram: redundant primary/secondary storage cluster; each node runs an iSCSI/RDMA (iSER) target over software RAID (MD) and an LSI caching Flash/RAID controller, with replication through a Mellanox SX1012 10/40GbE switch]
Benchmark configuration (redundant storage cluster):
  - 2 x Xeon E5-2650 processors
  - Mellanox ConnectX®-3 Pro, 40GbE/RoCE
  - LSI Nytro MegaRAID NMR 8110-4i
Native Integration Into OpenStack Cinder
• Uses OpenStack's built-in components and management (Open-iSCSI, the tgt target, Cinder): no additional software is required, RDMA is already inbox and used by our OpenStack customers!
• Mellanox enables faster performance with much lower CPU utilization
• Next step: bypass the hypervisor layers and add NAS & object storage
[Diagram: compute servers run VMs on a KVM hypervisor with Open-iSCSI w/ iSER on the adapter; a switching fabric connects them to storage servers running the iSCSI/iSER target (tgt) with an RDMA cache in front of local disks, provisioned by OpenStack (Cinder): using RDMA to accelerate iSCSI storage]
[Chart: write bandwidth (MB/s) vs I/O size (1-256 KB) for iSER with 4/8/16 VMs and iSCSI with 8/16 VMs; iSER reaches the PCIe limit, up to 6X the iSCSI/TCP throughput]
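A rough sketch of the inbox path described above. The driver and option names below follow the Havana-era Cinder LVM/iSER driver and Open-iSCSI's iser transport; treat the exact option names, addresses, and IQN as illustrative assumptions rather than values from this deck:

# cinder.conf on the storage server (Cinder drives the tgt iSER target);
# names per the assumed Havana-era LVMISERDriver:
[DEFAULT]
volume_driver = cinder.volume.drivers.lvm.LVMISERDriver
iser_ip_address = 192.168.10.5    # hypothetical storage-network address
iser_port = 3260

# On the compute node, Open-iSCSI logs in over the iSER transport
# instead of TCP (the target IQN here is hypothetical):
iscsiadm -m node -T iqn.2010-10.org.openstack:volume-0001 \
         -p 192.168.10.5:3260 --interface=iser --login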