The Consequences of Infinite Storage Bandwidth
Allen Samuels, Engineering Fellow, Systems and Software Solutions
May 5, 2016
Disclaimer
During the presentation today, we may make forward-looking statements. Any statement that refers to expectations, projections, or other characterizations of future events or circumstances is a forward-looking statement, including those relating to industry predictions and trends, future products and their projected availability, and evolution of product capacities. Actual results may differ materially from those expressed in these forward-looking statements due to a number of risks and uncertainties, including among others: industry predictions may not occur as expected, products may not become available as expected, and products may not evolve as expected; and the factors detailed under the caption “Risk Factors” and elsewhere in the documents we file from time to time with the SEC, including, but not limited to, our annual report on Form 10-K for the year ended January 3, 2016. This presentation contains information from third parties, which reflect their projections as of the date of issuance. We undertake no obligation to update these forward-looking statements, which speak only as of the date hereof or the date of issuance by a third party.
What Do I Mean By Infinite Bandwidth?
Network, Storage and DRAM Trends (log scale)
• Use DRAM bandwidth as a proxy for CPU throughput
• A reasonable approximation for DMA-heavy and/or poor-cache-hit workloads (e.g. storage)
• Note the big difference in slope!
Data is for informational purposes only and may contain errors
Network, Storage and DRAM Trends (linear scale)
• Same data as the last slide, but for the log-impaired (chart callout: “Infinite Storage Bandwidth”)
• Storage bandwidth is not literally infinite
• But the ratio of network and storage bandwidth to CPU throughput is widening very quickly
Data is for informational purposes only and may contain errors
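As a reading aid, here is a minimal Python sketch of the compounding at work in these two trend charts. The growth factors are assumptions chosen only to illustrate the shape of the curves, not the data behind the slides.

```python
# Illustrative only: assumed annual growth factors, not the slides' data.
dram_growth = 1.2      # DRAM bandwidth (the CPU-throughput proxy)
storage_growth = 1.5   # aggregate storage bandwidth

dram_bw = storage_bw = 1.0   # normalize both to 1.0 in year 0
for year in range(11):
    print(f"year {year}: storage/CPU bandwidth ratio = {storage_bw / dram_bw:.2f}")
    dram_bw *= dram_growth
    storage_bw *= storage_growth
# Whenever storage bandwidth grows faster than DRAM bandwidth, the ratio
# compounds geometrically -- which is what the linear-scale chart makes obvious.
```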
SSDs / CPU Socket
[Chart: SSDs per CPU socket by year, 1990–2025]
Data is for informational purposes only and may contain errors
SSDs / CPU Socket @ 20% Max BW
[Chart: SSDs per CPU socket, with each drive run at 20% of its maximum bandwidth, by year, 1995–2025]
Data is for informational purposes only and may contain errors
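A rough reconstruction of the quantity these two SSDs-per-socket charts plot, in Python. The bandwidth figures are assumptions for illustration, not the numbers behind the slides.

```python
# How many SSDs can one CPU socket keep busy, using per-socket DRAM bandwidth
# as the proxy for CPU throughput? All numbers below are assumptions.
def ssds_per_socket(socket_dram_bw_gbs, ssd_max_bw_gbs, utilization=0.20):
    """SSDs one socket can service when each drive runs at `utilization`
    of its maximum bandwidth (0.20 matches the '@ 20% Max BW' chart)."""
    return socket_dram_bw_gbs / (ssd_max_bw_gbs * utilization)

print(ssds_per_socket(socket_dram_bw_gbs=60.0, ssd_max_bw_gbs=3.0))   # ~100 drives
print(ssds_per_socket(socket_dram_bw_gbs=90.0, ssd_max_bw_gbs=10.0))  # ~45 drives
# Because SSD bandwidth is growing faster than DRAM bandwidth, this count
# shrinks over time even with each drive heavily derated.
```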
What happens as we get closer to the limit?
Let’s Get Small!
 New denser server form factors
– Blades
– Sleds
 Good short-term solutions
Effects Of The CPU/DRAM Bottleneck
 Storage Cost = Media + Access + Management
 Shared-nothing architecture conflates access and management
 Storage costs will become dominated by management cost
 Storage costs become CPU/DRAM costs
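A hedged sketch of this cost argument, with every dollar and bandwidth figure an assumption: in a shared-nothing design each unit of media bandwidth has to pass through a CPU socket, so the management (CPU/DRAM) term grows with drive bandwidth while the media term does not.

```python
import math

# Illustrative cost model only -- all figures are assumptions, not SanDisk data.
def storage_cost(num_drives, drive_bw_gbs,
                 media_cost_per_drive=1500.0,    # assumed media $ per drive
                 socket_bw_gbs=60.0,             # assumed usable bandwidth per CPU socket
                 mgmt_cost_per_socket=6000.0):   # assumed CPU+DRAM "management" $ per socket
    media = num_drives * media_cost_per_drive
    # Shared-nothing: every byte of drive bandwidth crosses a CPU socket,
    # so the number of sockets (and the management cost) tracks drive bandwidth.
    sockets = math.ceil(num_drives * drive_bw_gbs / socket_bw_gbs)
    management = sockets * mgmt_cost_per_socket
    return media, management

print(storage_cost(num_drives=64, drive_bw_gbs=1.0))   # media dominates
print(storage_cost(num_drives=64, drive_bw_gbs=12.0))  # management approaches the media cost
```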
Embracing The CPU/DRAM Bottleneck
 Move management to upper layers, where CPU can be right-sized by the client
 What kind of media access do I want?
– Simple enough functionality to be done directly in drive hardware – NO CPU
– Allow direct access throughout the compute cluster over a network
– Just enough machinery to enable coarse-grained sharing
 In short, you really want a SAN!
– Or, more technically, Fabric Connected Storage
Not Your Father’s SAN
 Three problems with current SAN
– Fibre channel transport
– SCSI access protocol
– Drive oriented storage allocation
 All of these want to be updated
– Fibre channel is brittle and costly
– SCSI initiators have long code paths catering to seldom used configurations
– Drive-oriented allocation should give way to robust sub-drive storage allocation
SAN 2.0
 NVMe over Fabrics
 1.0 Spec is out for review, hopefully done in May
 Simple enough for direct hardware execution of data path ops
 Minimal initiator code path lengths improve performance
 Namespaces allow sub-drive allocations (see the toy sketch after this list)
 Not mature enough for enterprise deployment – yet
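To make “sub-drive allocations” concrete, here is a toy Python sketch (not the NVMe or NVMe-oF API) of carving one physical drive into per-tenant namespaces.

```python
# Toy model only: namespaces as sub-drive allocations on one physical SSD.
from dataclasses import dataclass, field

@dataclass
class Drive:
    capacity_gb: int
    namespaces: dict = field(default_factory=dict)   # nsid -> size in GB

    def create_namespace(self, nsid: int, size_gb: int) -> None:
        if nsid in self.namespaces:
            raise ValueError("namespace id already in use")
        if sum(self.namespaces.values()) + size_gb > self.capacity_gb:
            raise ValueError("not enough free capacity on this drive")
        self.namespaces[nsid] = size_gb

drive = Drive(capacity_gb=8000)     # one 8TB drive
drive.create_namespace(1, 2000)     # tenant A gets a 2TB slice
drive.create_namespace(2, 3000)     # tenant B gets a 3TB slice
print(drive.namespaces)             # {1: 2000, 2: 3000}
```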
SAN 2.0
 What storage network?
– Current candidates are FC, Infiniband and Ethernet
 Ethernet has best economics – if you can make it work
 RoCE is easy on the edge, but hard on the interior
– Only controlled environments have shown multi-switch scalability
– General scalability in a multi-vendor environment likely to be difficult
– Wonderful for intra-rack storage networking
 iWARP is hard on the edge, but easy on the interior
– Scarcity of implementations inhibits deployment
 Storage over IP will see limited cross rack deployment until this is resolved
First Generation Of SAN 2.0
 Implementations using off-the-shelf (OTS) components are in progress
 Server-side implementations look pretty conventional too
 4-5 MIOPS have been shown
 Seems like 10 MIOPS isn’t unreasonable to expect
[Diagram: conventional server data path – NIC, CPU and DRAM, and SSD connected over PCIe]
Second Generation SAN 2.0
 Soon, NICs will forward NVMe operations to local PCIe devices
 CPU removed from the software part of the data path
 CPU is still needed for the hardware part of the data path (data still flows through the host’s PCIe complex and DRAM)
 IOPS improve, BW is unchanged
 Significant CPU freed for application processing
 Getting closer to the wall!
Third Generation SAN 2.0, Imagined
 New generation of combined SSD controller and NIC
– Rethink of interfaces eliminates DRAM buffering
 Network goes right into the drive
 No CPU to be found
 Works well with rack scale architecture
Let’s Get Really Small
 Disaggregated / Rack Scale Architecture
– Fabric connected
– Independently scale compute, networking and storage
Call To Action
 Fabric-connected storage isn’t well managed by existing FOSS
 Lots of upper layer management software is available
– OpenStack, Ceph, Gluster, Cassandra, MongoDB, SheepDog, etc.
 Lower layer cluster management still primitive
What’s It All Mean?
 New form factors are in everybody's future
 The coming avalanche of storage bandwidth wants to be free
– Not imprisoned by a CPU
 Rack Scale Architecture allows new Storage/Compute configs
 Storage will be increasingly “Software Defined” as the HW evolves
Product Pitch!
Software-defined All-Flash Storage
The disaggregated model for scale
Old Model
 Monolithic, large upfront investments, and fork-lift upgrades
 Proprietary storage OS
 Costly: $$$$$
New SD-AFS Model
 Disaggregate storage, compute, and software for better scaling and costs
 Best-in-class solution components
 Open source software - no vendor lock-in
 Cost-efficient: $
InfiniFlash™ Storage Platform
Scalable Raw Performance
• 2M IOPS, latency 1-3ms
• 12-15 GB/s throughput
Capacity
• 512TB raw – all Flash!
• All-Flash 3U JBOD of Flash (JBOF)
• Up to 64 x 8TB SAS Drive Cards
• 4TB cards also available soon
8TB Flash-Card Innovations
• Enterprise-grade power-fail safe
• Alerts & monitoring
• Latching integrated & monitored
• Directly samples air temp
• Form factor enables lowest-cost SSD
Operational Efficiency & Resilience
• Hot-swappable architecture, easy FRU
• Low power – typical workload 400-500W; 150W (idle) - 750W (max)
• MTBF 1.5+ million hours
Hot Swappable!
• Fans, SAS expander boards, power supplies, Flash cards
Host Connectivity
• Connect up to 8 servers through 8 SAS ports
• Multi-path enabled
[Image: Flash Drive Card]
InfiniFlash IF500 All-Flash Storage System
Block and Object Storage Powered by Ceph
 Ultra-dense High Capacity Flash storage
– 512TB in 3U, Scale-out software for PB scale capacity
 Highly scalable performance
– Industry leading IOPS/TB
 Cinder, Glance and Swift storage
– Add/remove server & capacity on-demand
 Enterprise-Class storage features
– Automatic rebalancing
– Hot Software upgrade
– Snapshots, replication, thin provisioning
– Fully hot swappable, redundant
 Ceph Optimized for SanDisk flash
– Tuned & Hardened for InfiniFlash
InfiniFlash SW + HW Advantage
Software designed for all systems does not work well with any system:
 Ceph has over 50 tuning parameters; tuning them yields a 5x – 6x performance improvement
 Fixed-CPU, fixed-RAM hyperconverged nodes do not work well for all workloads
Software tuned for hardware
• Ceph modifications for Flash
• Both Ceph and the host OS tuned for InfiniFlash
• SW defects that impact Flash identified & mitigated
Hardware configured for software
• Right balance of CPU, RAM, and storage
• Rack-level designs for optimal performance & cost
InfiniFlash for OpenStack with Disaggregation
 Compute & storage disaggregation enables optimal resource utilization
 Allows for the additional CPU that OSDs need under small-block workloads
 Allows for the higher bandwidth provisioning required for large-object workloads
 Independent scaling of compute and storage
 Higher storage capacity needs don’t force you to add more compute, and vice versa
 Leads to optimal ROI for PB-scale OpenStack deployments
[Diagram: a compute farm (QEMU/KVM with LibRBD for Nova/Cinder/Glance LUNs, RGW web servers for the Swift object store, KRBD with iSCSI targets for iSCSI LUNs) connected to a storage farm of InfiniFlash HSEB A/B enclosures whose OSDs are attached over SAS]
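For the block path in the diagram above, a compute-farm client ends up talking to the OSDs through librbd. Below is a minimal sketch using Ceph’s Python bindings; the pool name, image name, and sizes are made-up examples.

```python
import rados
import rbd

# Connect to the Ceph cluster (assumes a local /etc/ceph/ceph.conf and keyring).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')            # example pool name

# Create a 10 GiB RBD image and write to it -- roughly what Cinder/Nova
# do on a guest's behalf via QEMU/KVM + librbd.
rbd.RBD().create(ioctx, 'vm-volume-01', 10 * 1024**3)
image = rbd.Image(ioctx, 'vm-volume-01')
image.write(b'hello from the compute farm', 0)
image.close()

ioctx.close()
cluster.shutdown()
```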
IF500 - Enhancing Ceph for Enterprise Consumption
IF500 provides usability and performance utilities without sacrificing Open Source principles
• The SanDisk Ceph Distro ensures packaging with stable, production-ready code of consistent quality
• All Ceph performance improvements developed by SanDisk are contributed back to the community
Available with either the SanDisk distribution or the community distribution:
 Out-of-the-box configurations tuned for performance with Flash
 Sizing & planning tool
 InfiniFlash drive management integrated into Ceph management (coming soon)
 Ceph installer built specifically for InfiniFlash
 High-performance iSCSI storage
 Better diagnostics with a log collection tool
 Enterprise-hardened SW + HW QA
InfiniFlash Performance Advantage
900K random-read IOPS with 384TB of storage
Flash performance unleashed:
• Out-of-the-box configurations tuned for performance with Flash
• Read & write data-path changes for Flash
• 3x–12x block performance improvement, depending on workload
• Almost linear performance scaling with the addition of InfiniFlash nodes
• Write performance is WIP with NV-RAM journals
Test notes:
• Measured with 3 InfiniFlash nodes of 128TB each
• Average latency with 4K blocks is ~2ms; 99.9th-percentile latency is under 10ms
• For smaller block sizes, performance is CPU-bound at the storage node
• Maximum bandwidth of 12.2GB/s measured at 64KB blocks
InfiniFlash Ceph Performance Advantage
 Single InfiniFlash unit performance
– 1 x 512TB InfiniFlash unit connected to 8 nodes
– 4K RR IOPS: ~1 million – 85% of bare-metal performance
• Corresponding bare-metal IF100 IOPS is 1.1 million
– All 8 hosts are CPU-saturated for 4K random reads
• More performance potential with more CPU cycles
– With 64k IO size we are able to utilize the full IF150 bandwidth of over 12GB/s
– librbd and krbd performance are comparable
– Write performance is on a 3x-copy configuration; the more common 2x copy will result in a 33% improvement
Random Write (LIBRBD):
IO Profile | IOPS
4k Random Write | 54k
64k Random Write | 34k
256k Random Write | 11.3k
Random Read Block Performance (LIBRBD):
IO Profile | IOPS
4k Random Read | 1,123,175
64k Random Read | 349,247
256k Random Read | 87,369
[Chart also plots the corresponding bandwidth in GBps for each block size]
InfiniFlash Ceph Performance Advantage
 Linear scaling with 2 InfiniFlash units
– 2 x 512TB InfiniFlash units connected to 16 nodes
– 1.8M 4K IOPS – 80% of the bare-metal performance
– Performance scales almost linearly – almost double the performance of a single IF150 with Ceph
– Write performance is 2x that of the 8-node cluster
Random Read (LIBRBD):
IO Profile | IOPS | BW (MB/s)
4k RR | 1800k | 7194
64k RR | 225k | 14412
256k RR | 53k | 13366
InfiniFlash OS – Hardened Enterprise-Class Ceph
 Hardened and tested for hyperscale deployments and workloads
 Platform-focused testing enables us to deliver a complete and hardened storage solution
 Single-vendor support for both hardware & software
Enterprise Level Hardening
 9,000 hours of cumulative IO tests
 1,100+ unique test cases
 1,000 hours of cluster rebalancing tests
 1,000 hours of IO on iSCSI
Testing at Scale
 Over 100 server-node clusters
 Over 4PB of Flash storage
Failure Testing
 2,000-cycle node reboot
 1,000 abrupt node power cycles
 1,000 storage failures
 1,000 network failures
 IO for 250 hours at a stretch
IF500 Reference Configurations
Model: Entry | Mid | High
InfiniFlash: 128TB | 256TB | 512TB
Servers [1]: 2 x Dell R630 2U | 4 x Dell R630 2U | 4 x Dell R630 2U [2]
Processor per server: dual-socket Intel Xeon E5-2690 v3 (all models)
Memory per server: 128GB RAM (all models)
HBA per server: (1) LSI 9300-8e PCIe 12Gbps (all models)
Network per server: (1) Mellanox ConnectX-3 dual-port 40GbE (all models)
Boot drive per server: (2) SATA 120GB SSD (all models)
1 - For larger-block or less CPU-intensive workloads, the OSD node could use a single-socket server. Dell servers can be substituted with other vendors’ servers that match the specs.
2 - For small-block workloads, 8 servers are recommended
InfiniFlash TCO Advantage
[Chart: 3-year TCO comparison* (TCA + 3-year Opex, $0–$80M) for Traditional ObjStore on HDD, IF500 ObjStore w/ 3 Full Replicas on Flash, IF500 w/ EC - All Flash, and IF500 - Flash Primary & HDD Copies]
[Chart: “Total Rack” comparison (0–100) for the same four configurations]
 Reduce the replica count thanks to the higher reliability of flash
- 2 copies on InfiniFlash vs. 3 copies on HDD
 InfiniFlash’s disaggregated architecture reduces compute usage, thereby reducing HW & SW costs
- Flash allows the use of an erasure-coded storage pool without performance limitations
- Protection equivalent to 2x storage with only 1.2x storage (see the sketch below)
 Power, real estate, and maintenance cost savings over a 5-year TCO
* TCO analysis based on a US customer’s OPEX & Cost data for a 100PB deployment
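The 1.2x figure corresponds to an erasure-coding profile along the lines of k=10 data chunks plus m=2 coding chunks; the exact profile is an assumption here, since the slide does not name one. A quick check of the overhead arithmetic:

```python
def ec_overhead(k: int, m: int) -> float:
    """Raw bytes stored per usable byte for a k+m erasure-coded pool."""
    return (k + m) / k

print(ec_overhead(k=10, m=2))      # 1.2 -> 1.2x raw capacity, survives loss of any 2 chunks
print(3 / ec_overhead(k=10, m=2))  # vs. 3x replication: 2.5x more usable data per raw byte
```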
©2016 SanDisk Corporation. All rights reserved. SanDisk is a trademark of SanDisk Corporation, registered in the
United States and other countries. Other brands mentioned herein are for identification purposes only and may be
the trademarks of their holder(s).