Ceph on All-Flash Storage –
Breaking Performance Barriers
Zhou Hao
Technical Marketing Engineer
June 6th, 2015
Forward-Looking Statements
During our meeting today we will make forward-looking statements.
Any statement that refers to expectations, projections or other characterizations of future events or
circumstances is a forward-looking statement, including those relating to market growth, industry
trends, future products, product performance and product capabilities. This presentation also
contains forward-looking statements attributed to third parties, which reflect their projections as of the
date of issuance.
Actual results may differ materially from those expressed in these forward-looking statements due
to a number of risks and uncertainties, including the factors detailed under the caption “Risk Factors”
and elsewhere in the documents we file from time to time with the SEC, including our annual and
quarterly reports.
We undertake no obligation to update these forward-looking statements, which speak only as
of the date hereof or as of the date of issuance by a third party, as the case may be.
Requirement from Big Data @ PB Scale
CONTENT REPOSITORIES
 Mixed media container, active-archiving, backup, locality of data
 Large containers with application SLAs
BIG DATA ANALYTICS
 Internet of Things, sensor analytics
 Time-to-Value and Time-to-Insight
 Hadoop, NoSQL, Cassandra, MongoDB
MEDIA SERVICES
 High read-intensive access from billions of edge devices
 Hi-Def video driving even greater demand for capacity and performance
 Surveillance systems, analytics
InfiniFlash™ System
• Ultra-dense All-Flash Appliance
- 512TB in 3U
• Scale-out software for massive capacity
- Unified Content: Block, Object
- Flash optimized software with programmable interfaces (SDK)
• Enterprise-Class storage features
- Snapshots, replication, thin provisioning
• Enhanced Performance for Block and Object
- 10x Improvement for Block Reads
- 2x Improvement for Object Reads
IF500 with InfiniFlash OS (Ceph)
Ideal for large-scale storage & best in class $/IOPS/TB
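The deck lists snapshots, replication and thin provisioning as enterprise features of the Ceph-based IF500; on the block side these are exposed through the standard RBD toolchain. A minimal sketch with hypothetical pool and image names (RBD images are thin-provisioned by default):

    # RBD snapshot / clone sketch (hypothetical names)
    rbd create rbd/vol02 --size 204800 --image-format 2   # thin-provisioned ~200 GiB image
    rbd snap create rbd/vol02@base                        # point-in-time snapshot
    rbd snap protect rbd/vol02@base                       # required before cloning
    rbd clone rbd/vol02@base rbd/vol02-clone              # writable, space-efficient clone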
InfiniFlash Hardware System
Capacity 512TB* raw
 All-Flash 3U Storage System
 64 x 8TB Flash Cards with power-fail (Pfail) protection
 8 SAS ports total
Operational Efficiency and Resilience
 Hot-swappable components, easy FRU
 Low power: 450W (avg), 750W (active)
 MTBF 1.5+ million hours
Scalable Performance**
 780K IOPS
 7GB/s Throughput
 Upgrade to 12GB/s in Q315
* 1TB = 1,000,000,000,000 bytes. Actual user capacity less.
** Based on internal testing of InfiniFlash 100. Test report available.
Innovating Performance @ InfiniFlash OS
 Major Improvements to Enhance Parallelism
 Backend Optimizations – XFS and Flash
 Messenger Performance Enhancements
• Message signing
• Socket read-aheads
• Resolved severe lock contentions
• Reduced CPU usage by ~2 cores with improved file path resolution from object ID
• CPU- and lock-optimized fast path for reads
• Disabled throttling for Flash
• Index Manager caching and shared FdCache in FileStore
• Removed single dispatch queue bottlenecks for OSD and client (librados) layers
• Shared thread pool implementation
• Major lock reordering
• Improved lock granularity – reader/writer locks
• Granular locks at object level
• Optimized OpTracking path in OSD, eliminating redundant locks
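Several of these optimizations correspond to tunables that already exist in stock Ceph of this era (Giant/Hammer). The fragment below is only an illustrative, assumed starting point for an all-flash OSD node; the values are not IFOS defaults and should be validated per deployment:

    # ceph.conf sketch - flash-oriented tunables (assumed values, not IFOS defaults)
    [global]
    cephx_sign_messages = false            # skip per-message signing on a trusted cluster network

    [osd]
    osd_enable_op_tracker = false          # avoid OpTracker locking overhead on fast media
    osd_op_num_shards = 8                  # sharded op queues reduce dispatch-queue contention
    osd_op_num_threads_per_shard = 2
    filestore_fd_cache_size = 1024         # larger shared fd cache in FileStore
    filestore_queue_max_ops = 5000         # relax throttles that were sized for spinning disks
    filestore_queue_max_bytes = 1073741824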
Open Source with SanDisk Advantage
InfiniFlash OS – Enterprise Level Hardened Ceph
Enterprise Level Hardening
 9,000 hours of cumulative IO tests
 1,100+ unique test cases
 1,000 hours of cluster rebalancing tests
 1,000 hours of IO on iSCSI
Testing at Hyperscale
 Over 100 server node clusters
 Over 4PB of flash storage
Failure Testing
 2,000 cycle node reboot
 1,000 times node abrupt power cycle
 1,000 times storage failure
 1,000 times network failure
 IO for 250 hours at a stretch
Enterprise Level Support
 Enterprise class support and services from SanDisk
 Risk mitigation through long term support and a reliable long term roadmap
 Continual contribution back to the community
Test Configuration – Single InfiniFlash System
Performance improves 2x to 12x depending on the Block size
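The deck does not name the load generator, so the following is only a hedged sketch of how such a queue-depth / read-mix sweep is commonly driven: fio with its librbd engine, using hypothetical pool and image names. Running one job per RBD image (2 images per client x 4 clients here) approximates the configuration on the following slides.

    # fio job sketch (assumed parameters; requires fio built with the rbd engine)
    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd                   # hypothetical pool
    rbdname=bench-img-1        # hypothetical pre-created RBD image
    time_based=1
    runtime=300
    rw=randrw

    [bs8k-qd16-read75]
    bs=8k                      # repeat with 64k and 256k
    iodepth=16                 # repeat with 1 and 4
    rwmixread=75               # repeat with 0, 25, 50 and 100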
Performance Improvement: Stock Ceph vs IF OS
8K Random Blocks
[Charts: IOPS and average latency (ms), Stock Ceph (Giant) vs IFOS 1.0; top row = queue depth (1/4/16), bottom row = % read IOs (0/25/50/75/100)]
• 2 RBD/Client x Total 4 Clients
• 1 InfiniFlash node with 512TB
Performance Improvement: Stock Ceph vs IF OS
64K Random Blocks
[Charts: IOPS and average latency (ms), Stock Ceph vs IFOS 1.0; top row = queue depth (1/4/16), bottom row = % read IOs (0/25/50/75/100)]
• 2 RBD/Client x Total 4 Clients
• 1 InfiniFlash node with 512TB
Performance Improvement: Stock Ceph vs IF OS
256K Random Blocks
[Charts: IOPS and average latency (ms), Stock Ceph vs IFOS 1.0; top row = queue depth (1/4/16), bottom row = % read IOs (0/25/50/75/100)]
• 2 RBD/Client x Total 4 Clients
• 1 InfiniFlash node with 512TB
Test Configuration – 3 InfiniFlash Systems (128TB each)
Performance scales linearly with additional InfiniFlash nodes
Scaling with Performance
8K Random Blocks
[Charts: IOPS and average latency (ms); top row = queue depth (1/8/64), bottom row = % read IOs (0/25/50/75/100)]
• 2 RBD/Client x 5 Clients
• 3 InfiniFlash nodes with 128TB each
Scaling with Performance
64K Random Blocks
[Charts: IOPS and average latency (ms); top row = queue depth, bottom row = % read IOs (0/25/50/75/100)]
• 2 RBD/Client x 5 Clients
• 3 InfiniFlash nodes with 128TB each
Scaling with Performance
256K Random Blocks
[Charts: IOPS and average latency (ms); top row = queue depth, bottom row = % read IOs (0/25/50/75/100)]
• 2 RBD/Client x 5 Clients
• 3 InfiniFlash nodes with 128TB each
Flexible Ceph Topology with InfiniFlash
[Diagram: client application servers in the compute farm access LUNs through SCSI targets backed by RBDs/RGW; OSD nodes in the storage farm attach over SAS to InfiniFlash enclosures (HSEB A / HSEB B); read and write IO flows between the two farms]
 Disaggregated Architecture
 Optimized for Performance
 Higher Utilization
 Reduced Costs
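The deck does not specify the SCSI target stack; one common way to realize the "SCSI Targets" boxes is to map an RBD image with the kernel client and export it through LIO/targetcli. A minimal sketch with hypothetical image and IQN names:

    # Map an RBD image and export it as an iSCSI LUN (hypothetical names)
    rbd create vol01 --size 102400                       # ~100 GiB image
    rbd map vol01                                        # yields a kernel device, e.g. /dev/rbd0
    targetcli /backstores/block create name=vol01 dev=/dev/rbd0
    targetcli /iscsi create iqn.2015-06.com.example:vol01
    targetcli /iscsi/iqn.2015-06.com.example:vol01/tpg1/luns create /backstores/block/vol01
    targetcli /iscsi/iqn.2015-06.com.example:vol01/tpg1/acls create iqn.2015-06.com.example.client:app1
    targetcli saveconfig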
Flash + HDD with Data Tiering
Flash Performance with TCO of HDD
 InfiniFlash OS performs automatic data placement and movement between tiers, transparent to applications
 User-defined policies for data placement on tiers
 Can be used with erasure coding to further reduce the TCO
Benefits
 Flash-based performance with HDD-like TCO
 Lower performance requirements on the HDD tier enable use of denser and cheaper SMR drives
 Denser and lower power compared to an HDD-only solution
 InfiniFlash for high-activity data and SMR drives for low-activity data
 60+ HDDs per server on the HDD tier
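The deck does not show how the tiering is configured; in stock Ceph the same idea maps onto cache tiering, with a flash pool overlaid on an (optionally erasure-coded) HDD pool. A hedged sketch with hypothetical pool names and thresholds (each pool would also need a CRUSH rule steering it to the right device type):

    # Ceph cache-tiering sketch (hypothetical pool names and values)
    ceph osd pool create hdd-objects 4096 4096 erasure    # capacity tier; EC further reduces TCO
    ceph osd pool create flash-cache 2048 2048            # InfiniFlash-backed performance tier
    ceph osd tier add hdd-objects flash-cache
    ceph osd tier cache-mode flash-cache writeback
    ceph osd tier set-overlay hdd-objects flash-cache     # client IO is serviced by the flash tier first
    ceph osd pool set flash-cache hit_set_type bloom      # placement/eviction policy knobs
    ceph osd pool set flash-cache target_max_bytes 400000000000000
    ceph osd pool set flash-cache cache_target_dirty_ratio 0.4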
Flash Primary + HDD Replicas
Flash Performance with TCO of HDD
[Diagram: compute farm; primary replica on InfiniFlash; HDD-based data node for the 2nd local replica; HDD-based data node for the 3rd DR replica]
 Higher affinity of the primary replica ensures much of the compute runs against InfiniFlash data
 2nd and 3rd replicas on HDDs are primarily for data protection
 High throughput of InfiniFlash provides data protection and movement for all replicas without impacting application IO
 Eliminates the cascade data propagation requirement for HDD replicas
 Flash-based accelerated object performance for replica 1 allows denser and cheaper SMR HDDs for replicas 2 and 3
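The deck does not give the placement rule; in stock Ceph the flash-primary/HDD-copies layout can be expressed with a CRUSH rule that takes the first copy from a flash root and the remaining copies from an HDD root, plus primary affinity to keep primaries off HDD OSDs. A hedged sketch using hypothetical bucket names (pre-Luminous CRUSH syntax):

    # CRUSH rule sketch: replica 1 on flash hosts, replicas 2..n on HDD hosts
    rule flash_primary_hdd_copies {
        ruleset 1
        type replicated
        min_size 2
        max_size 3
        step take flash                        # hypothetical root of InfiniFlash-backed OSDs
        step chooseleaf firstn 1 type host
        step emit
        step take hdd                          # hypothetical root of HDD-backed OSDs
        step chooseleaf firstn -1 type host    # the remaining replicas
        step emit
    }

    # Keep primaries off HDD OSDs (requires mon_osd_allow_primary_affinity = true)
    ceph osd primary-affinity osd.12 0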
TCO Example - Object Storage
Scale-out Flash Benefits at the TCO of HDD
[Charts: 3-year TCO comparison (TCA + 3-year opex, $ x 10,000) for 96PB of object storage – traditional ObjStore on HDD vs. InfiniFlash ObjectStore with 3 full replicas on flash vs. InfiniFlash with erasure coding (all flash) vs. InfiniFlash with flash primary & HDD copies – plus total racks (0-100) per option]
• Weekly failure rate for a 100PB deployment: 15-35 HDDs vs. 1 InfiniFlash card
• HDD cannot handle simultaneous egress/ingress
• HDD long rebuild times, multiple failures and data rebalancing result in service disruption
• Flash provides guaranteed & consistent SLA
• Flash capacity utilization >> HDD due to reliability & ops
• Flash low power consumption: 450W (avg), 750W (active)
Note that operational/maintenance cost and performance benefits are not accounted for in these models!!!
InfiniFlash™ System
The First All-Flash Storage System Built for High Performance Ceph
© 2015 SanDisk Corporation. All rights reserved. SanDisk is a trademark of SanDisk Corporation, registered in the United States and other countries. InfiniFlash is a trademark of SanDisk Enterprise IP LLC. All other product and company names are used for identification purposes and may be trademarks of their respective holder(s).
http://bigdataflash.sandisk.com/infiniflash
Steven.Xi@SanDisk.com Sales
Tonny.Ai@SanDisk.com Sales Engineering
Hao.Zhou@SanDisk.com Technical Marketing
Venkat.Kolli@SanDisk.com Production Management