Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers
1. Ceph on All-Flash Storage – Breaking Performance Barriers
Zhou Hao
Technical Marketing Engineer
June 6th, 2015
2. Forward-Looking Statements
During our meeting today we will make forward-looking statements.
Any statement that refers to expectations, projections or other characterizations of future events or
circumstances is a forward-looking statement, including those relating to market growth, industry
trends, future products, product performance and product capabilities. This presentation also
contains forward-looking statements attributed to third parties, which reflect their projections as of the
date of issuance.
Actual results may differ materially from those expressed in these forward-looking statements due
to a number of risks and uncertainties, including the factors detailed under the caption “Risk Factors”
and elsewhere in the documents we file from time to time with the SEC, including our annual and
quarterly reports.
We undertake no obligation to update these forward-looking statements, which speak only as
of the date hereof or as of the date of issuance by a third party, as the case may be.
3. Requirements from Big Data @ PB Scale
CONTENT REPOSITORIES
• Mixed media containers, active archiving, backup, locality of data
• Large containers with application SLAs
BIG DATA ANALYTICS
• Internet of Things, sensor analytics
• Time-to-Value and Time-to-Insight
• Hadoop, NoSQL (Cassandra, MongoDB)
MEDIA SERVICES
• High read-intensive access from billions of edge devices
• Hi-def video driving even greater demand for capacity and performance
• Surveillance systems, analytics
4. InfiniFlash™ System
IF500 with InfiniFlash OS (Ceph)
• Ultra-dense all-flash appliance
– 512TB in 3U
• Scale-out software for massive capacity
– Unified content: block, object
– Flash-optimized software with programmable interfaces (SDK)
• Enterprise-class storage features
– Snapshots, replication, thin provisioning
• Enhanced performance for block and object
– 10x improvement for block reads
– 2x improvement for object reads
Ideal for large-scale storage & best-in-class $/IOPS/TB
5. InfiniFlash Hardware System
All-flash 3U storage system
• Capacity: 512TB* raw
• 64 x 8TB flash cards with power-fail protection
• 8 SAS ports total
Operational efficiency and resilience
• Hot-swappable components, easy FRU
• Low power: 450W (avg), 750W (active)
• MTBF 1.5+ million hours
Scalable performance**
• 780K IOPS
• 7GB/s throughput
• Upgrade to 12GB/s in Q3'15
* 1TB = 1,000,000,000,000 bytes. Actual user capacity less.
** Based on internal testing of InfiniFlash 100. Test report available.
6. Innovating Performance @ InfiniFlash OS
Messenger performance enhancements
• Message signing
• Socket read-aheads
• Resolved severe lock contentions
Backend optimizations – XFS and flash
• Reduced ~2 CPU cores of usage with improved file-path resolution from object ID
• CPU- and lock-optimized fast path for reads
• Disabled throttling for flash
• Index Manager caching and shared FdCache in FileStore
Major improvements to enhance parallelism
• Removed single dispatch-queue bottlenecks in the OSD and client (librados) layers
• Shared thread pool implementation
• Major lock reordering
• Improved lock granularity – reader/writer locks
• Granular locks at the object level
• Optimized OpTracking path in the OSD, eliminating redundant locks
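The dispatch-queue change above is the classic sharding pattern: replace one globally locked queue with N independently locked queues, selected by a hash of the object ID, so operations on unrelated objects never contend while per-object ordering is preserved. A minimal sketch of the idea (illustrative Python, not Ceph's actual C++ implementation; class and method names are invented for this example):

```python
import threading
from queue import Queue

class ShardedDispatchQueue:
    """N independent queues, each with its own internal lock, instead of
    one global queue -- ops on different objects rarely contend."""

    def __init__(self, num_shards=8):
        # Each queue.Queue carries its own lock; there is no shared one.
        self.shards = [Queue() for _ in range(num_shards)]

    def _shard_for(self, object_id):
        # Hashing the object ID maps the same object to the same shard
        # every time, which preserves per-object op ordering.
        return self.shards[hash(object_id) % len(self.shards)]

    def enqueue(self, object_id, op):
        self._shard_for(object_id).put((object_id, op))

    def dequeue(self, shard_idx):
        # Worker threads from the shared pool each drain one shard,
        # so they never serialize on a single global dispatch lock.
        return self.shards[shard_idx].get()

q = ShardedDispatchQueue(num_shards=4)
q.enqueue("rbd_data.1", "write")
q.enqueue("rbd_data.1", "read")
shard = hash("rbd_data.1") % 4
first = q.dequeue(shard)   # ops for one object stay ordered within its shard
```

The same hashing idea underlies the granular object-level locks bullet: contention is split across many small locks instead of one hot one.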
7. Open Source with SanDisk Advantage
InfiniFlash OS – Enterprise-Hardened Ceph
Enterprise-level hardening
• 9,000 hours of cumulative IO tests
• 1,100+ unique test cases
• 1,000 hours of cluster rebalancing tests
• 1,000 hours of IO on iSCSI
Testing at hyperscale
• Clusters of over 100 server nodes
• Over 4PB of flash storage
Failure testing
• 2,000-cycle node reboot
• 1,000 abrupt node power cycles
• 1,000 storage failures
• 1,000 network failures
• IO for 250 hours at a stretch
Enterprise-level support
• Enterprise-class support and services from SanDisk
• Risk mitigation through long-term support and a reliable long-term roadmap
• Continual contribution back to the community
8. Test Configuration – Single InfiniFlash System
Performance improves 2x to 12x depending on the block size
16. Flexible Ceph Topology with InfiniFlash
[Diagram: a compute farm of client applications, each consuming LUNs through SCSI targets backed by RBDs/RGW; the OSDs run on hosts attached over SAS to InfiniFlash enclosures (HSEB A / HSEB B pairs), which form a separate storage farm; read and write IO flow from the clients through the OSD layer to flash.]
Disaggregated architecture
• Optimized for performance
• Higher utilization
• Reduced costs
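As a sketch of the client path in this topology, a LUN can be carved out of the Ceph layer as an RBD image and handed to the SCSI target box in the diagram. Pool and image names below are illustrative, and the deck does not specify which iSCSI target stack sits on top:

```shell
# Create a pool and an RBD image that a SCSI target can export as a LUN
ceph osd pool create blockpool 128          # 128 placement groups (illustrative)
rbd create blockpool/lun0 --size 102400     # 100 GiB image (size is in MB)
rbd map blockpool/lun0                      # exposes /dev/rbd0 on the target host
# /dev/rbd0 can then be exported over iSCSI by the target layer
# (e.g. LIO/targetcli or tgt) -- the "SCSI Targets" box in the diagram.
```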
17. Flash + HDD with Data Tiering
Flash performance with the TCO of HDD
• InfiniFlash OS performs automatic data placement and data movement between tiers, transparently to applications
• User-defined policies govern data placement across tiers
• Can be combined with erasure coding to further reduce TCO
Benefits
• Flash-based performance with HDD-like TCO
• Lower performance requirements on the HDD tier enable denser, cheaper SMR drives
• Denser and lower power than an HDD-only solution
• InfiniFlash for high-activity data, SMR drives (60+ HDD per server) for low-activity data
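In upstream Ceph of this era, this kind of policy-driven tiering maps onto the cache-tiering feature: a flash pool is overlaid on an HDD (optionally erasure-coded) base pool, and Ceph flushes and evicts objects between them according to pool settings. A hedged sketch, with illustrative pool names and thresholds:

```shell
# Erasure-coded HDD base pool plus a replicated flash cache pool
ceph osd pool create hdd-base 512 512 erasure
ceph osd pool create flash-cache 128
# Overlay the flash pool on the base pool in writeback mode
ceph osd tier add hdd-base flash-cache
ceph osd tier cache-mode flash-cache writeback
ceph osd tier set-overlay hdd-base flash-cache
# Policy knobs controlling when objects move down to the HDD tier
ceph osd pool set flash-cache hit_set_type bloom
ceph osd pool set flash-cache cache_target_dirty_ratio 0.4
ceph osd pool set flash-cache cache_target_full_ratio 0.8
```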
18. Flash Primary + HDD Replicas
Flash performance with the TCO of HDD
• Primary replica on InfiniFlash; HDD-based data nodes hold the 2nd (local) replica and the 3rd (DR) replica
• Higher affinity of the primary replica ensures much of the compute runs against InfiniFlash data
• 2nd and 3rd replicas on HDDs are primarily for data protection
• High throughput of InfiniFlash handles data protection and movement for all replicas without impacting application IO
• Eliminates the cascade data-propagation requirement for HDD replicas
• Flash-accelerated object performance for replica 1 allows denser, cheaper SMR HDDs for replicas 2 and 3
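The "higher affinity" point corresponds to Ceph's primary-affinity mechanism: lowering the affinity of HDD OSDs makes CRUSH choose flash OSDs as primaries, so client IO is served from flash while the HDDs hold secondary replicas. A sketch with illustrative OSD IDs (in releases of this era the feature must first be enabled on the monitors):

```shell
# Enable primary-affinity adjustments (ceph.conf on the monitors, pre-Jewel era):
#   [mon]
#   mon osd allow primary affinity = true

# Flash OSDs keep full affinity; HDD OSDs are effectively replica-only
ceph osd primary-affinity osd.0 1.0   # flash OSD (illustrative ID)
ceph osd primary-affinity osd.1 1.0   # flash OSD
ceph osd primary-affinity osd.2 0     # HDD OSD, never chosen as primary
ceph osd primary-affinity osd.3 0     # HDD OSD, never chosen as primary
```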
19. TCO Example – Object Storage
Scale-out flash benefits at the TCO of HDD
[Chart: 3-year TCO (TCA + 3-year opex, $ x10,000) and total racks for 96PB of object storage, comparing four builds: traditional object store on HDD; InfiniFlash with 3 full replicas on flash; InfiniFlash with erasure coding, all flash; InfiniFlash with flash primary + HDD copies.]
• Weekly failure rate for a 100PB deployment: 15–35 HDDs vs. 1 InfiniFlash card
• HDD cannot handle simultaneous egress/ingress
• Long HDD rebuild times, multiple concurrent failures, and data rebalancing cause service disruption
• Flash provides guaranteed, consistent SLAs
• Flash capacity utilization is much higher than HDD's, due to reliability and operational factors
• Flash has low power consumption: 450W (avg), 750W (active)
Note: operational/maintenance cost and performance benefits are not accounted for in these models.