Intel QLC: Cost-effective Ceph on NVMe
Ceph Month 06/11/2021
Anthony D’Atri, Solutions Architect anthony.datri@intel.com
Yuyang Sun, Product Marketing Manager yuyang.sun@intel.com
Ceph Month June 2021
Legal Disclaimers
Performance varies by use, configuration, and other factors. Learn more at www.Intel.com/PerformanceIndex.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
4
Ceph Month June 2021
§ SSDs are too expensive
§ SSDs are too small
§ QLC is too slow and DWPD
are too low
§ HDDs are more reliable
SSD vs HDD:
The Reality
The Myth
I’d like to use SSDs for Ceph
OSDs but they can’t compete
with HDDs
Cost: TCO crossover soon … or today!
§ Competitive now; subtle factors beyond calculators [1]
§ HDDs may be short-stroked or capacity-restricted: interface bottleneck and recovery time
§ HDDs run out of IOPS before capacity: extra drives are required to meet IOPS needs
§ Expand clusters faster than data inflow: priceless!
§ TB/chassis, TB/RU, TB/watt, OpEx, racks, cost of RMA [2] / crushing failed drives
§ Cluster maintenance without prolonged and risky reduced redundancy
§ How much does a degraded user/customer experience cost? Especially during recovery?
See appendix for footnotes.
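The IOPS-before-capacity point can be made concrete: the drive count of an HDD cluster is often set by IOPS, not space. A minimal sketch, with per-device figures that are illustrative assumptions rather than numbers from this deck:

```python
# Drives needed to satisfy both a capacity target and an IOPS target.
# Per-device numbers below are illustrative assumptions.
import math

def drives_needed(target_tb, target_iops, drive_tb, drive_iops):
    by_capacity = math.ceil(target_tb / drive_tb)
    by_iops = math.ceil(target_iops / drive_iops)
    return max(by_capacity, by_iops)

# A hypothetical 1 PB / 100K IOPS cluster:
hdds = drives_needed(1000, 100_000, drive_tb=18, drive_iops=200)        # ~200 IOPS per HDD
ssds = drives_needed(1000, 100_000, drive_tb=30.72, drive_iops=800_000)  # QLC NVMe

print(hdds)  # 500: IOPS-bound, far above the 56 drives capacity alone would need
print(ssds)  # 33: capacity-bound
```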
Capacity: large capacity means fewer chassis, RUs, and racks
§ 144-layer QLC NAND enables high-capacity devices
§ Intel® NVMe QLC SSDs are available in capacities up to 30TB [3]
§ Up to 1.5PB raw per RU with E1.L EDSFF drives [4]
§ Abundance of IOPS allows flexible capacity provisioning
See appendix for footnotes.
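Raw density translates to usable capacity only after data protection overhead; a small sketch using the slide's 1.5 PB/RU figure and standard Ceph replication/EC arithmetic:

```python
# Usable capacity from raw, for common Ceph data-protection schemes.
def usable_tb(raw_tb, scheme):
    if scheme == "replica3":
        return raw_tb / 3
    k, m = scheme                 # erasure coding: k data + m parity chunks
    return raw_tb * k / (k + m)

raw_per_ru = 1500  # ~1.5 PB raw per rack unit with E1.L drives (slide figure)
print(usable_tb(raw_per_ru, "replica3"))  # 500.0 TB usable
print(usable_tb(raw_per_ru, (4, 2)))      # 1000.0 TB usable with EC 4+2
```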
Performance: fast and wide
§ The Intel® SSD D5-P5316 NVMe QLC delivers up to 800K 4KB random read IOPS, a 38% increase gen over gen [3]
§ Up to 7000 MB/s sequential read, 2x+ gen over gen [3]
§ SATA saturates at ~550 MB/s [5]; PCIe Gen 4 NVMe crushes the SATA bottleneck
§ Two or more OSDs per device improve throughput, IOPS, and tail latency [6]
See appendix for footnotes. Results may vary.
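Running multiple OSDs per NVMe device maps to ceph-volume's batch mode. A hedged command sketch: the device paths are placeholders, and the flag's behavior should be verified against your Ceph release.

```shell
# Provision two OSDs on each NVMe device (placeholder device paths).
# --osds-per-device carves each drive into multiple LVs, one per OSD.
ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1 /dev/nvme1n1
```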
Performance: operational advantages
§ RGW is prone to hotspots and QoS events
§ One strategy to mitigate latency and IOPS bottlenecks is to cap HDD size, e.g. at 8TB
§ Adjusting scrub intervals, adding a CDN front end, and throttling at the load balancer can help, but upweighting a single replacement HDD can still take weeks
§ OSD crashes can impact API availability
§ Replacing HDDs with Intel QLC SSDs for bucket data can markedly improve QoS and serviceability
Reliability and Endurance: better than you think, and more than you need!
§ Most SSD failures are firmware-related, and fixable in situ [7]
§ 99% of SSDs never exceed 15% of rated endurance [7,8]
§ One RGW deployment projects seven years of endurance using previous-gen Intel QLC; the current gen provides even more
See appendix for footnotes.
Reliability and Endurance: get with the program [erase cycle]
§ The 30TB Intel® SSD D5-P5316 QLC SSD is rated at ≥ 22PB of IU-aligned random writes [9]
§ A 1-DWPD 7.68TB TLC SSD is rated at <15PB of 4K random writes [9]
§ Endurance is tunable via overprovisioning [13]
See appendix for footnotes.
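The rated-writes figures convert directly to DWPD. A quick check using the slide's 22 PBW rating for the 30.72 TB drive, assuming a 5-year warranty period:

```python
# Convert rated petabytes written (PBW) to drive writes per day (DWPD).
def dwpd(pbw, capacity_tb, years=5):
    return (pbw * 1000) / (capacity_tb * 365 * years)

# 30.72 TB D5-P5316, rated >= 22 PB of IU-aligned random writes (slide figure):
print(round(dwpd(22, 30.72), 2))            # ~0.39 DWPD over 5 years
# For comparison, a 7.68 TB TLC drive at 1 DWPD over the same period:
print(round(1 * 7.68 * 365 * 5 / 1000, 1))  # ~14.0 PB written
```

The comparison shows why the "<15PB" TLC figure and the 22PB QLC rating are closer than raw DWPD numbers suggest: capacity is part of the endurance equation.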
Reliability and OpEx: drive failures cost money and QoS
§ 8TB HDD: 0.44% AFR spec, 1-2% actual [9]
§ Intel DC QLC NAND SSD: AFR <0.44% [9]
§ Greater temperature range [9]
§ Better UBER [9]
§ What does it cost to have hands replace a failed drive? To RMA it?
See appendix for footnotes.
Endurance: Intel® QLC SSD delivers up to 104 PBW, significantly outperforming HDDs

HDD and SSD endurance in Petabytes Written (PBW); higher is better:
§ Western Digital Ultrastar DC HC650 20TB: 2.75 PBW
§ Seagate Exos X18 18TB: 2.75 PBW
§ Intel® SSD D7-P5510 7.68TB (64K random write): 14.016 PBW
§ Intel® SSD D5-P5316 30.72TB (64K random write): 22.93 PBW
§ Intel® SSD D5-P5316 24.58TB (64K random write, 20% OP): 56.71 PBW
§ Intel® SSD D5-P5316 30.72TB (64K sequential writes): 104.55 PBW

Note: the HDDs allow only 2.75PB of combined read/write IO before exceeding the AFR target.
See appendix for sources 8, 9, 11, 12. Results may vary.
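One way to read the PBW numbers above is as lifetime at a sustained write rate; the 500 MB/s rate below is an illustrative assumption, not a deck figure:

```python
# Years to exhaust a rated endurance figure at a sustained write rate.
SECONDS_PER_YEAR = 365 * 24 * 3600

def endurance_years(pbw, write_mb_per_s):
    total_mb = pbw * 1e9          # 1 PB = 1e9 MB
    return total_mb / write_mb_per_s / SECONDS_PER_YEAR

# D5-P5316 at 104.55 PBW (sequential), writing 500 MB/s around the clock:
print(round(endurance_years(104.55, 500), 1))  # ~6.6 years
# The HDDs' 2.75 PB workload rating at the same rate:
print(round(endurance_years(2.75, 500), 2))    # ~0.17 years
```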
Optimize endurance and performance: align to IU size
§ bluestore_min_alloc_size = 16k | 64k
§ Writes aligned to IU multiples enhance performance and endurance
§ Metadata is a small percentage of the overall workload
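As a configuration sketch (option name per upstream BlueStore docs): note that min_alloc_size is baked in at OSD creation, so it must be set before the OSDs are deployed.

```ini
# ceph.conf fragment: align BlueStore allocations to the SSD's indirection unit.
# 16K shown; use 65536 for a 64K-IU drive. Applies only to newly created OSDs.
[osd]
bluestore_min_alloc_size_ssd = 16384
```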
Example use cases
§ RGW: large objects
§ RBD: backup, archive, media
§ CephFS: 4MB block size, mostly used for larger files
§ Metadata and RocksDB are a small fraction of the overall write workload
Additional optimizations: to be explored, because better is still better
§ RocksDB block size aligned to the IU
§ RocksDB universal compaction
§ Other RocksDB tuning
§ Optane acceleration of WAL+DB; write shaping
§ Crimson; a RocksDB successor
§ Separate pools for large and small objects: EC & replication, QLC & TLC. An internal RGW enhancement? A Lua script to change storage class?
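A hedged sketch of the RocksDB-side direction: the option value below is an assumption to illustrate the idea, not a validated setting, and should be checked against the RocksDB options-string syntax accepted by your Ceph release.

```ini
# ceph.conf fragment: illustrative, unvalidated BlueStore RocksDB tuning.
[osd]
# Switch the BlueStore metadata DB to universal compaction (assumption):
bluestore_rocksdb_options = compaction_style=kCompactionStyleUniversal
```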
Appendix
1. https://www.snia.org/forums/cmsi/ssd-endurance
2. Author's professional experience: RMA cost not worth the effort for devices worth < USD 500
3. https://newsroom.intel.com/wp-content/uploads/sites/11/2021/04/Intel-D5-P5316_product_Brief-728323.pdf and https://www.intel.com/content/www/us/en/products/docs/memory-storage/solid-state-drives/data-center-ssds/d5-p5316-series-brief
4. https://echostreams.com/products/flachesan2n108m-un and https://www.supermicro.com/en/products/system/1U/1029/SSG-1029P-NES32R.cfm
5. https://www.isunshare.com/computer/why-the-max-sata-3-speed-is-550mbs-usually.html
6. https://ceph.io/community/part-4-rhcs-3-2-bluestore-advanced-performance-investigation
7. https://searchstorage.techtarget.com/post/Monitoring-the-Health-of-NVMe-SSDs and https://searchstorage.techtarget.com/tip/4-causes-of-SSD-failure-and-how-to-deal-with-them
8. https://www.usenix.org/system/files/fast20-maneas.pdf
9. https://www.intel.com/content/dam/www/central-libraries/us/en/documents/qlc-nand-ready-for-data-center-white-paper.pdf
10. https://searchstorage.techtarget.com/post/Monitoring-the-Health-of-NVMe-SSDs and https://searchstorage.techtarget.com/tip/4-causes-of-SSD-failure-and-how-to-deal-with-them
11. https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc600-series/data-sheet-ultrastar-dc-hc650.pdf
12. https://www.seagate.com/files/www-content/datasheets/pdfs/exos-x18-channel-DS2045-1-2007GB-en_SG.pdf
13. https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/over-provisioning-nand-based-ssds-better-endurance-whitepaper.pdf
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 

7
Ceph Month June 2021
• 144-layer QLC NAND enables high-capacity devices
• Intel® NVMe QLC SSD is available in capacities up to 30TB3
• Up to 1.5PB raw per RU with E1.L EDSFF drives4
• Abundance of IOPS allows flexible capacity provisioning
Capacity
Large capacity: fewer chassis, RUs, and racks
See appendix for footnotes.
8
Ceph Month June 2021
§ Intel® SSD D5-P5316 NVMe QLC delivers up to 800K 4KB random read IOPS, a 38% increase gen over gen3
§ Up to 7000 MB/s sequential read, 2x+ gen over gen3
§ SATA saturates at ~550 MB/s5
§ PCIe Gen 4 NVMe crushes the SATA bottleneck
§ Two or more OSDs per device improve throughput, IOPS, and tail latency6
Performance
Fast and wide
See appendix for footnotes. Results may vary.
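One common way to provision multiple OSDs per NVMe device is ceph-volume's batch mode. A minimal sketch, assuming a recent Ceph release and an example device path:

```shell
# Preview how ceph-volume would split the device into two equal LVM
# logical volumes, one per OSD, without making any changes:
ceph-volume lvm batch --osds-per-device 2 --report /dev/nvme0n1

# Apply the layout (device path is illustrative):
ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1
```

Running two OSDs per device roughly doubles the number of BlueStore write paths contending for the drive, which is how large NVMe devices that would otherwise be bottlenecked on a single OSD thread pool recover throughput and tail latency.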
10
Ceph Month June 2021
§ RGW is prone to hotspots and QoS events
§ One strategy to mitigate latency and IOPS bottlenecks is to cap HDD size, e.g. at 8TB
§ Adjusting scrub intervals, a CDN front end, and load balancer throttling can help, but upweighting a single HDD OSD can still take weeks
§ OSD crashes can impact API availability
§ Replacing HDDs with Intel QLC SSDs for bucket data can markedly improve QoS and serviceability
Performance
Operational Advantages
11
Ceph Month June 2021
§ Most SSD failures are firmware – and fixable in-situ7
§ 99% of SSDs never exceed 15% of rated endurance7,8
§ One RGW deployment projects seven years of endurance using previous-gen Intel QLC
§ Current gen provides even more
Reliability and Endurance
Better than you think, and more than you need!
See appendix for footnotes.
12
Ceph Month June 2021
§ 30TB Intel® SSD D5-P5316 QLC SSD rated at ≥ 22PB of IU-aligned random writes9
§ 1DWPD 7.68TB TLC SSD rated at <15PB of 4K random writes9
§ Tunable endurance via overprovisioning13
Reliability and Endurance
Get with the program [erase cycle]
See appendix for footnotes.
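The PBW ratings above can be converted to the more familiar drive-writes-per-day figure. A quick back-of-the-envelope check, assuming a five-year warranty period (the warranty length is an assumption, not from the slides):

```python
def dwpd(pbw: float, capacity_tb: float, years: float = 5.0) -> float:
    """Drive writes per day implied by a PBW endurance rating."""
    total_writes_tb = pbw * 1000.0      # PB -> TB, decimal units
    days = years * 365.0
    return total_writes_tb / (capacity_tb * days)

# 30.72 TB QLC rated for 22 PBW of IU-aligned random writes:
print(round(dwpd(22, 30.72), 2))   # roughly 0.39 DWPD over 5 years

# A 7.68 TB TLC drive rated near 14 PBW of 4K random writes:
print(round(dwpd(14, 7.68), 2))    # roughly 1.0 DWPD over 5 years
```

The point of the comparison: the QLC drive's lower DWPD is offset by its much larger capacity, so total bytes written over the drive's life is in the same order as smaller TLC parts.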
13
Ceph Month June 2021
§ 8TB HDD: 0.44% AFR spec, 1-2% actual9
§ Intel DC QLC NAND SSD: AFR <0.44%9
§ Greater temperature range9
§ Better UBER9
§ Cost to have hands replace a failed drive? To RMA?
Reliability and OpEx
Drive failures cost money and QoS
See appendix for footnotes.
14
Ceph Month June 2021
Intel® QLC SSD delivers up to 104 PBW, significantly outperforming HDDs
HDD and SSD endurance in Petabytes Written (PBW), higher is better:
§ Western Digital Ultrastar DC HC650 20TB: 2.75 PBW
§ Seagate Exos X18 18TB: 2.75 PBW
§ Intel® SSD D7-P5510 7.68TB (64K random write): 14.016 PBW
§ Intel® SSD D5-P5316 30.72TB (64K random write): 22.93 PBW
§ Intel® SSD D5-P5316 24.58TB (64K random write, 20% OP): 56.71 PBW
§ Intel® SSD D5-P5316 30.72TB (64K sequential writes): 104.55 PBW
The HDDs only allow 2.75PB of combined read/write IO before exceeding the AFR target.
See appendix for sources 8, 9, 11, 12. Results may vary.
15
Ceph Month June 2021
§ bluestore_min_alloc_size = 16k | 64k
§ Writes aligned to IU multiples enhance performance and endurance
§ Metadata is a small percentage of the overall workload
Optimize endurance and performance
Align to IU size
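In practice the allocation size is set through Ceph's config store. A sketch, assuming an upstream Ceph release where BlueStore exposes `bluestore_min_alloc_size_ssd`; note that the value is baked in at OSD creation time:

```shell
# Align BlueStore's allocation unit for SSD-backed OSDs to 64 KiB so that
# writes land on QLC indirection-unit (IU) boundaries. This only affects
# OSDs created AFTER the change; existing OSDs must be redeployed.
ceph config set osd bluestore_min_alloc_size_ssd 65536

# Confirm the configured value:
ceph config get osd bluestore_min_alloc_size_ssd
```

Because the setting is fixed at mkfs time, a rolling OSD redeploy is the usual way to migrate an existing cluster to the new allocation size.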
16
Ceph Month June 2021
§ RGW: large objects
§ RBD: backup, archive, media
§ CephFS: 4MB block size, mostly used for larger files
§ Metadata and RocksDB are a small fraction of the overall write workload
Example use cases
17
Ceph Month June 2021
§ RocksDB block size aligned to IU
§ RocksDB universal compaction
§ Other RocksDB tuning
§ Optane acceleration of WAL+DB, write shaping
§ Crimson, RocksDB successor
§ Separate pools for large/small objects: EC & replication, QLC & TLC. Internal RGW enhancement? Lua script to change storage class?
Additional optimizations
To be explored, because better is still better:
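One possible way to experiment with the RocksDB knobs above is BlueStore's `bluestore_rocksdb_options` string, which is passed through to RocksDB's options parser. The specific values here are illustrative assumptions, not tested recommendations:

```shell
# Illustrative only: switch RocksDB to universal compaction and raise the
# table block size toward the drive IU. Applies when an OSD (re)starts;
# benchmark before using in production.
ceph config set osd bluestore_rocksdb_options \
  "compression=kNoCompression,compaction_style=kCompactionStyleUniversal,block_based_table_factory={block_size=65536}"
```

Universal compaction trades read amplification for lower write amplification, which is the relevant direction on write-endurance-limited QLC media.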
18
Ceph Month June 2021
Appendix
1. https://www.snia.org/forums/cmsi/ssd-endurance
2. Author's professional experience: RMA cost not worth the effort for devices worth < USD 500
3. https://newsroom.intel.com/wp-content/uploads/sites/11/2021/04/Intel-D5-P5316_product_Brief-728323.pdf
   https://www.intel.com/content/www/us/en/products/docs/memory-storage/solid-state-drives/data-center-ssds/d5-p5316-series-brief
4. https://echostreams.com/products/flachesan2n108m-un
   https://www.supermicro.com/en/products/system/1U/1029/SSG-1029P-NES32R.cfm
5. https://www.isunshare.com/computer/why-the-max-sata-3-speed-is-550mbs-usually.html
6. https://ceph.io/community/part-4-rhcs-3-2-bluestore-advanced-performance-investigation
7. https://searchstorage.techtarget.com/post/Monitoring-the-Health-of-NVMe-SSDs
   https://searchstorage.techtarget.com/tip/4-causes-of-SSD-failure-and-how-to-deal-with-them
8. https://www.usenix.org/system/files/fast20-maneas.pdf
9. https://www.intel.com/content/dam/www/central-libraries/us/en/documents/qlc-nand-ready-for-data-center-white-paper.pdf
10. https://searchstorage.techtarget.com/post/Monitoring-the-Health-of-NVMe-SSDs
    https://searchstorage.techtarget.com/tip/4-causes-of-SSD-failure-and-how-to-deal-with-them
11. https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc600-series/data-sheet-ultrastar-dc-hc650.pdf
12. https://www.seagate.com/files/www-content/datasheets/pdfs/exos-x18-channel-DS2045-1-2007GB-en_SG.pdf
13. https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/over-provisioning-nand-based-ssds-better-endurance-whitepaper.pdf