Ceph Day Taipei - Delivering cost-effective, high performance Ceph cluster
Peggy Shen, Software Solutions Architect, Intel Corp.
Jack Zhang, Enterprise Architect, Intel Corp.
2016-08
Legal notices
Copyright © 2016 Intel Corporation.
All rights reserved. Intel, the Intel logo, Xeon, Intel Inside, and 3D XPoint are trademarks of Intel Corporation in the U.S. and/or
other countries.
*Other names and brands may be claimed as the property of others.
FTC Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not
unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations.
Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured
by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product
User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision
#20110804
The cost reduction scenarios described in this document are intended to enable you to get a better understanding of how
the purchase of a given Intel product, combined with a number of situation-specific variables, might affect your future cost
and savings. Nothing in this document should be interpreted as either a promise of or contract for a given level of costs.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software,
or configuration will affect actual performance. Consult other sources of information to evaluate performance as you
consider your purchase. For more complete information about performance and benchmark results, visit
http://www.intel.com/performance.
2
Agenda
• Introduction, Ceph at Intel
• All-flash Ceph configurations and benchmark data
• OEMs/ISVs/Intel Ceph Reference Architectures/Recipes
• Future Ceph* with Intel NVM Technologies
3D XPoint™ and 3D NAND SSD
• Summary
3*Other names and brands may be claimed as the property of others.
4
Acknowledgements
This is team work. Thanks for the contributions of the Intel team:
PRC team: Jian Zhang, Yuan Zhou, Haodong Tang, Jianpeng Ma, Ning Li
US team: Daniel Ferber, Tushar Gohad, Orlando Moreno, Anjaneya Chagam
5
Introduction, Ceph at Intel
6
Ceph at Intel – A brief introduction
Optimize for Intel® platforms, flash and networking
• Compression, Encryption hardware offloads (QAT & SOCs)
• PMStore (for 3D XPoint DIMMs)
• RBD caching and Cache tiering with NVM
• IA optimized storage libraries to reduce latency (ISA-L, SPDK)
Performance profiling, analysis and community contributions
• All flash workload profiling and latency analysis, performance portal http://01.org/cephperf
• Streaming, Database and Analytics workload driven optimizations
Ceph enterprise usages and hardening
• Manageability (Virtual Storage Manager)
• Multi Data Center clustering (e.g., async mirroring)
End Customer POCs with focus on broad industry influence
• CDN, Cloud DVR, Video Surveillance, Ceph Cloud Services, Analytics
• Working with 50+ customers to help them enable Ceph-based storage solutions
POCs
Ready to use IA, Intel NVM optimized systems & solutions from OEMs & ISVs
• Intel system configurations, white papers, case studies
• Industry events coverage
Go to market
Intel® Storage
Acceleration Library
(Intel® ISA-L)
Intel® Storage Performance
Development Kit (Intel® SPDK)
Intel® Cache Acceleration
Software (Intel® CAS)
Virtual Storage Manager Ce-Tune Ceph Profiler
7
Intel Ceph Contribution Timeline
2014 2015 2016
* Right Edge of box indicates approximate release date
New Key/Value Store
Backend (rocksdb)
Giant* Hammer Infernalis Jewel
CRUSH Placement
Algorithm improvements
(straw2 bucket type)
Bluestore Backend
Optimizations for NVM
Bluestore SPDK
Optimizations
RADOS I/O Hinting
(35% better EC Write performance)
Cache-tiering with SSDs
(Write support)
PMStore
(NVM-optimized backend
based on libpmem)
RGW, Bluestore
Compression, Encryption
(w/ ISA-L, QAT backend)
Virtual Storage Manager
(VSM) Open Sourced
CeTune
Open Sourced
Erasure Coding
support with ISA-L
Cache-tiering with SSDs
(Read support)
Client-side Block Cache
(librbd)
8
Intel Ceph value add-on areas
Support
• Sizing
• Deploying
• Benchmarking
• Tunings
• Troubleshooting
• Upgrade
Feature
• Management
• Interface – iSCSI
• Compression
• Deduplication
• Encryption
Performance
• Caching
• SSD
• SPDK – speedup
Stability
• Stable code
• Production
ready
Solutions
• Customizing
• IOPS optimized,
Throughput
optimized,
Capacity-archive
optimized
Tools & BKMs
VSM &
Upstream work
& Library
Upstream
features &
Reference
solutions
Upstream POC solutions
Ceph Powered by Intel
9
All-flash Ceph configurations and benchmark data
Ceph and NVM SSDs
10
* NVM – Non-volatile Memory
Suggested Configurations for Ceph* Storage Node
Standard/good (baseline):
Use cases/applications that need high-capacity storage with high throughput performance
 NVMe*/PCIe* SSD for Journal + Caching, HDDs as OSD data drive
 Example: 1x 1.6TB Intel® SSD DC P3700 as Journal + Intel® Cache
Acceleration Software (Intel® CAS) + 12 HDDs
Better IOPS
Use cases/applications that need higher performance, especially for throughput, IOPS and SLAs, with medium storage capacity requirements
 NVMe/PCIe SSD as Journal, no caching, High capacity SATA SSD for
data drive
 Example: 1x 800GB Intel® SSD DC P3700 + 4 to 6x 1.6TB DC S3510
Best performance
Use cases/applications that need the highest performance (throughput and IOPS) and low latency.
 All NVMe/PCIe SSDs
 Example: 4 to 6 x 2TB Intel SSD DC P3700 Series
More Information: https://intelassetlibrary.tagcmd.com/#assets/gallery/11492083/details
*Other names and brands may be claimed as the property of others.
11
Ceph* storage node --Good
CPU Intel(R) Xeon(R) CPU E5-2650v3
Memory 64 GB
NIC 10GbE
Disks 1x 1.6TB P3700 + 12 x 4TB HDDs (1:12 ratio)
P3700 as Journal and caching
Caching software Intel(R) CAS 3.0, option: Intel(R) RSTe/MD4.3
Ceph* Storage node --Better
CPU Intel(R) Xeon(R) CPU E5-2690
Memory 128 GB
NIC Dual 10GbE
Disks 1x Intel(R) DC P3700(800G) + 4x Intel(R) DC S3510 1.6TB
Ceph* Storage node --Best
CPU Intel(R) Xeon(R) CPU E5-2699v3
Memory >= 128 GB
NIC 2x 40GbE, 4x dual 10GbE
Disks 4 to 6 x Intel® DC P3700 2TB
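For readers who want to reproduce the "Better IOPS" layout above, a minimal provisioning sketch follows, assuming Jewel-era ceph-disk and hypothetical device names (one P3700 shared as journal device, four S3510 data SSDs); these are not the exact commands used for the published results:

# Hypothetical devices: /dev/nvme0n1 = Intel SSD DC P3700 (journal),
# /dev/sdb.../dev/sde = Intel SSD DC S3510 1.6TB (OSD data).
# Each ceph-disk call carves a new journal partition out of the NVMe device.
for data in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
  ceph-disk prepare --fs-type xfs "$data" /dev/nvme0n1
  ceph-disk activate "${data}1"
done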
12
All Flash (PCIe* SSD + SATA SSD) Ceph Configuration
2x10Gb NIC
Test Environment
CEPH1
MON
OSD1 OSD8…
FIO FIO
CLIENT 1
1x10Gb NIC
.
FIO FIO
CLIENT 2
FIO FIO
CLIENT 3
FIO FIO
CLIENT 4
FIO FIO
CLIENT 5
CEPH2
OSD1 OSD8…
CEPH3
OSD1 OSD8…
CEPH4
OSD1 OSD8…
CEPH5
OSD1 OSD8…
“Better IOPS Ceph Configuration”¹
More Information: https://intelassetlibrary.tagcmd.com/#assets/gallery/11492083/details
*Other names and brands may be claimed as the property of others.
¹ For configuration see Slide 5
5x Client Node
• Intel® Xeon® processor E5-
2699 v3 @ 2.3GHz, 64GB
mem
• 10Gb NIC
5x Storage Node
• Intel® Xeon® processor E5-
2699 v3 @ 2.3 GHz
• 128GB Memory
• 1x 1T HDD for OS
• 1x Intel® DC P3700 800G
SSD for Journal (U.2)
• 4x 1.6TB Intel® SSD DC
S3510 as data drive
• 2 OSD instances on each Intel® DC S3510 SSD
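Since this cluster runs 2 OSD instances on each S3510, every data SSD has to be split in two; a minimal sketch of one way to do that is shown below (GPT partitioning plus Jewel-era ceph-disk, hypothetical device names; not necessarily the exact provisioning used in this test):

# Hypothetical: /dev/sdb is one 1.6TB S3510, /dev/nvme0n1 is the shared P3700 journal SSD.
parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart osd-data-1 0% 50%     # first half -> OSD instance 1
parted -s /dev/sdb mkpart osd-data-2 50% 100%   # second half -> OSD instance 2
ceph-disk prepare --fs-type xfs /dev/sdb1 /dev/nvme0n1   # journal partition created on the NVMe
ceph-disk prepare --fs-type xfs /dev/sdb2 /dev/nvme0n1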
13
Ceph* on All Flash Array
--Tuning and optimization efforts
• Up to 16x performance improvement for 4K random read, peak throughput
1.08M IOPS
• Up to 7.6x performance improvement for 4K random write, 140K IOPS
Tuning level | 4K Random Read Tunings | 4K Random Write Tunings
Default | Single OSD | Single OSD
Tuning-1 | 2 OSD instances per SSD | 2 OSD instances per SSD
Tuning-2 | Tuning-1 + debug=0 | Tuning-2 + debug 0
Tuning-3 | Tuning-2 + jemalloc | Tuning-3 + op_tracker off, tuning fd cache
Tuning-4 | Tuning-3 + read_ahead_size=16 | Tuning-4 + jemalloc
Tuning-5 | Tuning-4 + osd_op_thread=32 | Tuning-4 + RocksDB to store omap
Tuning-6 | Tuning-5 + rbd_op_thread=4 | N/A
[Chart: normalized 4K random read and 4K random write performance across the Default through Tuning-6 configurations.]
Performance numbers are Intel Internal estimates
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
Intel and Intel logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries
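A rough sketch of how a few of these tunings translate into runtime knobs is shown below; the jemalloc library path and the reading of read_ahead_size as the block-device read_ahead_kb are assumptions for illustration, not taken from the slide:

# Tuning-2: turn off debug logging on all OSDs at runtime
ceph tell osd.* injectargs '--debug_osd 0/0 --debug_ms 0/0 --debug_filestore 0/0'
# Tuning-3: run the OSD daemons with jemalloc instead of tcmalloc (library path is distro specific)
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 ceph-osd -i 0
# Tuning-4: shrink device readahead for small random reads (assumed mapping of read_ahead_size=16)
echo 16 > /sys/block/sdb/queue/read_ahead_kb
# Tuning-5/6: thread counts go into ceph.conf (see the full tuning list in the backup)
#   osd_op_threads = 32
#   rbd_op_threads = 4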
14
Ceph* on All Flash Array
--Tuning and optimization efforts
 1.08M IOPS for 4K random read, 144K IOPS for 4K random write with tunings
and optimizations
[Chart: random read performance, RBD scale test - latency (ms) vs. IOPS for 4K/8K/16K/64K random reads]
• 1.08M 4K random read IOPS @ 3.4 ms
• 500K 8K random read IOPS @ 8.8 ms
• 300K 16K random read IOPS @ 10 ms
• 63K 64K random read IOPS @ 40 ms
[Chart: random write performance, RBD scale test - latency (ms) vs. IOPS for 4K/8K/16K/64K random writes]
• 144K 4K random write IOPS @ 4.3 ms
• 132K 8K random write IOPS @ 4.1 ms
• 88K 16K random write IOPS @ 2.7 ms
• 23K 64K random write IOPS @ 2.6 ms
Excellent random read performance and Acceptable random write performance
Performance numbers are Intel Internal estimates
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
Intel and Intel logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries
Ceph* on All Flash Array
--Ceph*: SSD Cluster vs. HDD Cluster
• Both journal on PCI Express*/NVM Express* SSD
• 4K random write: needs an ~58x larger HDD cluster (~2,320 HDDs) to get the same performance
• 4K random read: needs an ~175x larger HDD cluster (~7,024 HDDs) to get the same performance
All-SSD Ceph* helps provide excellent TCO (both CapEx and OpEx): not only performance, but also space, power, failure rate, etc.
Client Node
• 5 nodes with Intel® Xeon® processor E5-2699 v3 @ 2.30GHz,
64GB memory
• OS : Ubuntu* Trusty
Storage Node
• 5 nodes with Intel® Xeon® processor E5-2699 v3 @ 2.30GHz,
128GB memory
• Ceph* Version : 9.2.0, OS : Ubuntu* Trusty
• 1 x Intel(R) DC P3700 SSDs for Journal per node
Cluster difference:
SSD cluster : 4 x Intel(R) DC S3510 1.6TB for OSD per node
HDD cluster : 10 x SATA 7200RPM HDDs as OSD per node
15
[Chart: normalized performance comparison, HDD vs. SSD cluster - the SSD cluster delivers ~58.2x for 4K random write and ~175.6x for 4K random read.]
Performance numbers are Intel Internal estimates
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
Intel and Intel logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries
16
All-NVMe Ceph Cluster for MySQL Hosting
5-node all-NVMe Ceph cluster (Supermicro 1028U-TN10RT+):
• Dual Intel® Xeon® E5 2699v4 @ 2.2GHz (44 cores with HT), 128GB DDR4 per node
• RHEL 7.2 (kernel 3.10-327), Ceph v10.2.0, BlueStore with async messenger
• 20x 1.6TB Intel® SSD DC P3700 (4 per node), 80 OSDs total (16 per node, 4 per NVMe SSD)
• 2x replication, 19TB effective capacity, tests run at 82% cluster fill level
• Cluster network 2x 10GbE; public network 2x 10GbE
10x client systems:
• Dual-socket Intel® Xeon® E5 2699v3 @ 2.3GHz (36 cores with HT), 128GB DDR4
• MySQL DB server containers on krbd: 16 vCPUs, 32GB memory, 200GB RBD volume, 100GB MySQL dataset, 25GB InnoDB buffer cache (25%)
• Sysbench client containers: 16 vCPUs, 32GB RAM, FIO 2.8, Sysbench 0.5
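To make the per-container layout concrete, here is a minimal sketch of attaching one MySQL container to a 200GB RBD volume over krbd; the pool name, image name, mount point and MySQL image are illustrative assumptions, not the exact scripts used in this test:

# Create a 200GB RBD image with only basic features so the kernel client can map it (size in MB)
rbd create db01 --size 204800 --pool rbd --image-feature layering
rbd map rbd/db01                        # shows up as /dev/rbd0 on the client node
mkfs.xfs /dev/rbd0
mkdir -p /mnt/db01 && mount /dev/rbd0 /mnt/db01
# Run the MySQL DB server container with its datadir on the RBD-backed mount
docker run -d --name mysql-db01 --cpuset-cpus=0-15 --memory=32g \
  -v /mnt/db01:/var/lib/mysql \
  -e MYSQL_ROOT_PASSWORD=secret mysql:5.7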
FIO 4K Random Read/Write Performance and Latency
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or
software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark
parameters.
[Chart: IO depth scaling - average latency (ms) vs. IOPS for 100% random read, 100% random write, and 70/30 4K random mix. 5 nodes, 80 OSDs, dual-socket Xeon E5 2699v4 / 128GB RAM / 2x10GbE, Ceph 10.2.1 w/ BlueStore, 6x RBD FIO clients.]
• ~1.4M 4K random read IOPS @ ~1 ms average latency
• ~1.6M 4K random read IOPS @ ~2.2 ms average latency
• ~220K 4K random write IOPS @ ~5 ms average latency
• ~560K 70/30% (OLTP) random IOPS @ ~3 ms average latency
First Ceph cluster to break ~1.4 million 4K random IOPS at ~1 ms response time in 5U
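These RBD-side FIO data points can be approximated with fio's built-in rbd engine; a minimal sketch under assumed pool, image and client names follows (not the exact job files used here):

# 4K random read against a pre-created RBD image, using the librbd-backed fio engine
fio --name=4k-randread --ioengine=rbd --clientname=admin --pool=rbd --rbdname=fio_test \
    --rw=randread --bs=4k --iodepth=16 --numjobs=1 \
    --runtime=600 --time_based --group_reporting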
17
Sysbench MySQL OLTP Performance
(100% SELECT, 16KB Avg IO Size, QD=2-8 Avg)
InnoDB buf pool = 25%, SQL dataset = 100GB
[Chart: Sysbench thread scaling - average latency (ms) vs. aggregate queries per second (QPS), 100% read (point SELECTs). 5 nodes, 80 OSDs, dual-socket Xeon E5 2699v4 / 128GB RAM / 2x10GbE, Ceph 10.1.2 w/ BlueStore, 20 Docker-rbd Sysbench clients (16 vCPUs, 32GB).]
• ~55,000 QPS with 1 client
• ~1 million QPS with 20 clients @ ~11 ms average latency (2 Sysbench threads/client)
• ~1.3 million QPS with 20 Sysbench clients (8 Sysbench threads/client)
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or
software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark
parameters.
18
Database page size = 16KB
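A minimal sketch of one such read-only run with Sysbench 0.5 from a client container is shown below; the oltp.lua path, host name, credentials and table size are assumptions, not the exact harness used for these numbers:

# 100% point-SELECT workload against one MySQL DB container, 2 threads per client
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
  --mysql-host=mysql-db01 --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest \
  --oltp-table-size=40000000 \
  --oltp-read-only=on --oltp-point-selects=10 \
  --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0 --oltp-distinct-ranges=0 \
  --num-threads=2 --max-time=600 --max-requests=0 --report-interval=10 run
# The chart scales threads per client from 2 up to 8 (--num-threads=8).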
Sysbench MySQL OLTP Performance
(100% UPDATE, 70/30% SELECT/UPDATE)
[Chart: Sysbench thread scaling - average latency (ms) vs. aggregate QPS, 100% write (index UPDATEs) and 70/30% OLTP. 5 nodes, 80 OSDs, dual-socket Xeon E5 2699v4 / 128GB RAM / 2x10GbE, Ceph 10.2.1 w/ BlueStore, 20 Docker-rbd Sysbench clients (16 vCPUs, 32GB).]
• 70/30% OLTP: ~400K QPS @ ~50 ms average latency; ~25,000 QPS with 1 Sysbench client (4-8 threads)
• 100% write: ~100K QPS @ ~200 ms average latency (aggregate, 20 clients); ~5,500 QPS with 1 Sysbench client (2-4 threads)
InnoDB buf pool = 25%, SQL dataset = 100GB
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or
software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark
parameters.
19
Database page size = 16KB
20
OEMs/ISVs/Intel Ceph Reference Architectures/Recipes
Available Reference Architectures (recipes)
• http://www.redhat.com/en/files/resources/en-rhst-cephstorage-supermicro-
INC0270868_v2_0715.pdf
• http://www.qct.io/account/download/download?order_download_id=1065&dtype=Reference%20
Architecture
• https://www.redhat.com/en/resources/red-hat-ceph-storage-hardware-configuration-guide
• https://www.percona.com/resources/videos/accelerating-ceph-database-workloads-all-pcie-ssd-cluster
• https://www.percona.com/resources/videos/mysql-cloud-head-head-performance-lab
• https://www.thomas-krenn.com/en/products/storage-systems/suse-enterprise-storage/ses-appliance-performance.html
• https://intelassetlibrary.tagcmd.com/#assets/gallery/11492083
23
Future Ceph* with Intel NVM Technologies
3D XPoint™ and 3D NAND
Technology Driven: NVM Leadership
Moore's Law Continues to Disrupt the Computing Industry
• 3D MLC and TLC NAND: building block enabling expansion of SSD into HDD segments
• 3D XPoint™: building blocks for ultra high performance storage & memory
From the first Intel® SSD for commercial usage (1992, 12MB) to U.2 SSDs (2017, >10TB): 1,000,000x the capacity while shrinking the form factor.
SSD capacity projections: 2014: >6TB, 2017: >10TB, 2018: >30TB, 2019: 1xx TB.
Source: Intel projections on SSD capacity
25
3D XPoint™ Technology (relative latency and size of data across the storage/memory hierarchy):
• SRAM: latency 1X, size of data 1X
• DRAM: latency ~10X, size of data ~100X
• 3D XPoint™: latency ~100X, size of data ~1,000X
• NAND: latency ~100,000X, size of data ~1,000X
• HDD: latency ~10 MillionX, size of data ~10,000X
Technology claims are based on comparisons of latency, density and write cycling metrics amongst memory technologies recorded on published specifications of
in-market memory products against internal Intel specifications.
Performance numbers are Intel Internal estimates
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
Intel and Intel logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries
27
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance.
Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark
results, visit http://www.intel.com/performance. Server Configuration: 2x Intel® Xeon® E5 2690 v3 NVM Express* (NVMe) NAND based SSD: Intel P3700 800 GB, 3D
Xpoint based SSD: Optane NVMe OS: Red Hat* 7.1
Intel® Optane™ storage (prototype) vs Intel® SSD DC
P3700 Series at QD=1
3D XPoint™ Technology - NSG Software Readiness
CAS-Linux Latency Optimizations
Configuration & Methodology
• Latency measured in microseconds (us)
• Intel® Xeon® Dual Socket E5 2699 v3 @ 2.3GHz,
64GB ECC DDR4 DRAM
• Red Hat Enterprise Linux Server release 7.0, kernel
version: 3.10.0-123.13.2.el7
• fio-2.2.8, zipfian distribution with theta 1.2
(random_distribution=zipf:1.2), test file size=50GiB
• 100% cache hits for following workload: 4K read, 1
worker, 1 qd, all IO requests are served from cache
• Cache Device: Intel® DC P3700 400GB, raw access
to block device (no file system on SSD Cache SSD)
• Latency comparisons between major Intel® CAS
Linux software releases
Dec’14 June’15 Dec’16 Jul’16
Intel technologies may require enabled hardware, specific software, or services activation. Software and workloads used in performance tests may have been
optimized for performance only on Intel microprocessors. Performance tests, such as Optistruct* are measured using specific computer systems, components,
software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests
to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
*Other names and brands may be claimed as the property of others
29
[Charts: Intel® Optane™ (prototype) vs. Intel® SSD DC P3700 (PCIe SSD) at QD=1 - 2X the throughput (higher is better) and 5X lower 99th percentile latency (lower is better).]
*Benchmarked on early prototype samples, 2S Haswell/Broadwell Xeon platform single server. Data produced without any tuning. We expect performance to improve with tuning.
Performance numbers are Intel Internal estimates
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
Intel and Intel logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries
Storage Hierarchy Tomorrow
• Hot: DRAM (10GB/s per channel, ~100 nanosecond latency); 3D XPoint™ DIMMs (~6GB/s per channel, ~250 nanosecond latency); NVM Express* (NVMe) 3D XPoint™ SSDs (PCI Express* (PCIe*) 3.0 x4 link, ~3.2 GB/s, <10 microsecond latency)
• Warm: NVMe 3D NAND SSDs (PCIe 3.0 x4, x2 link, <100 microsecond latency)
• Cold: NVMe 3D NAND SSDs; SATA or SAS HDDs (SATA* 6Gbps, minutes offline)
Comparisons between memory technologies based on in-market product specifications and internal Intel specifications.
Target usages: server side and/or AFA (Business Processing, High Performance/In-Memory Analytics, Scientific, Cloud Web/Search/Graph); Big Data Analytics (Hadoop*); Object Store / Active-Archive (Swift, lambert, HDFS, Ceph*); low cost archive.
30
31
3D XPoint™ & 3D NAND Enable
High performance & cost effective solutions
Enterprise class, highly reliable, feature rich, and
cost effective AFA solution:
‒ NVMe as Journal, 3D NAND TLC SSD as data store
Enhance value through special software optimizations on the filestore and bluestore backends
[Diagram: Ceph node evolution - today's node with 1x Intel® SSD DC P3700 800GB (U.2) for journal plus 4x Intel® SSD DC S3510 1.6TB for data, evolving to a node with P3700 and 3D XPoint™ SSDs for the performance tier plus Intel® SSD DC P4500 4TB 3D NAND SSDs for the capacity tier.]
32
3D XPoint™ opportunities: Bluestore backend
• Three usages for PMEM device
• Backend of bluestore: raw PMEM block device or
file of dax-enabled FS
• Backend of rocksdb: raw PMEM block device or
file of dax-enabled FS
• Backend of rocksdb’s WAL: raw PMEM block
device or file of DAX-enabled FS
• Two methods for accessing PMEM devices
• libpmemblk
• mmap + libpmemlib
• https://github.com/ceph/ceph/pull/8761
[Diagram: BlueStore on PMEM - BlueStore data and metadata, RocksDB and BlueFS each backed by a PMEMDevice, accessed either through the libpmemblk API or via mmap load/store (libpmemlib) on files in a DAX-enabled file system.]
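For the "file of DAX-enabled FS" option above, a minimal sketch of exposing a persistent-memory block device through a DAX filesystem is shown below (device name and mount point are assumptions; BlueStore/RocksDB integration with PMEM is the experimental work referenced in the pull request above):

# /dev/pmem0 is a persistent-memory (or pmem-emulated) block device
mkfs.xfs /dev/pmem0
mkdir -p /mnt/pmem
mount -o dax /dev/pmem0 /mnt/pmem    # DAX bypasses the page cache; mmap gives direct load/store
# Files under /mnt/pmem can then back BlueStore, RocksDB or the RocksDB WAL and be
# accessed via mmap + libpmemlib, or managed as libpmemblk pools.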
Summary
• Strong demand and clear trends toward all-flash array Ceph* solutions
• IOPS/SLA-based applications such as SQL databases can be backed by all-flash Ceph
• NVM technologies such as 3D XPoint™ and 3D NAND enable new performance capabilities and expedite all-flash adoption
• Bluestore shows a significant performance increase compared with filestore, but still needs improvement
• Let’s work together to make Ceph* more efficient with all-flash array!
33
THANK YOU!
34
Legal notices
Copyright © 2016 Intel Corporation.
All rights reserved. Intel, the Intel logo, Xeon, Intel Inside, and 3D XPoint are trademarks of Intel Corporation in the U.S. and/or
other countries.
*Other names and brands may be claimed as the property of others.
FTC Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not
unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations.
Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured
by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product
User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision
#20110804
The cost reduction scenarios described in this document are intended to enable you to get a better understanding of how
the purchase of a given Intel product, combined with a number of situation-specific variables, might affect your future cost
and savings. Nothing in this document should be interpreted as either a promise of or contract for a given level of costs.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software,
or configuration will affect actual performance. Consult other sources of information to evaluate performance as you
consider your purchase. For more complete information about performance and benchmark results, visit
http://www.intel.com/performance.
35
Backup
37
Storage interface
Use FIO on RBD as the storage interface
Tool
• Use “dd” to prepare data for R/W tests
• Use fio (ioengine=libaio, direct=1) to generate 4 IO patterns: sequential write/read, random write/read
• Access Span: 60GB
Run rules
• Drop OSD page caches (echo 1 > /proc/sys/vm/drop_caches)
• 100 secs for warm up, 600 secs for data collection
• Run 4KB/64KB tests under different # of rbds (1 to 120)
Testing Methodology
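A minimal sketch of one 4K random-read data point under these run rules is shown below; the RBD device path, IO depth and job layout are assumptions, not the exact job files used:

echo 1 > /proc/sys/vm/drop_caches       # drop page caches on the OSD nodes first
fio --name=4k-randread --ioengine=libaio --direct=1 \
    --rw=randread --bs=4k --iodepth=8 \
    --filename=/dev/rbd0 --size=60G \
    --ramp_time=100 --runtime=600 --time_based --group_reporting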
39
[global]
debug paxos = 0/0
debug journal = 0/0
debug mds_balancer = 0/0
debug mds = 0/0
mon_pg_warn_max_per_osd = 10000
debug lockdep = 0/0
debug auth = 0/0
debug mds_log = 0/0
debug mon = 0/0
debug perfcounter = 0/0
debug monc = 0/0
debug rbd = 0/0
debug throttle = 0/0
debug mds_migrator = 0/0
debug client = 0/0
debug rgw = 0/0
debug finisher = 0/0
debug journaler = 0/0
debug ms = 0/0
debug hadoop = 0/0
debug mds_locker = 0/0
debug tp = 0/0
debug context = 0/0
debug osd = 0/0
debug bluestore = 0/0
debug objclass = 0/0
debug objecter = 0/0
osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k,delaylog
osd_mkfs_type = xfs
filestore_queue_max_ops = 5000
osd_client_message_size_cap = 0
objecter_inflight_op_bytes = 1048576000
ms_dispatch_throttle_bytes = 1048576000
osd_mkfs_options_xfs = -f -i size=2048
filestore_wbthrottle_enable = True
filestore_fd_cache_shards = 64
objecter_inflight_ops = 1024000
filestore_queue_committing_max_bytes = 1048576000
osd_op_num_threads_per_shard = 2
filestore_queue_max_bytes = 10485760000
osd_op_threads = 32
osd_op_num_shards = 16
filestore_max_sync_interval = 10
filestore_op_threads = 16
osd_pg_object_context_cache_count = 10240
journal_queue_max_ops = 3000
journal_queue_max_bytes = 10485760000
journal_max_write_entries = 1000
filestore_queue_committing_max_ops = 5000
journal_max_write_bytes = 1048576000
osd_enable_op_tracker = False
filestore_fd_cache_size = 10240
osd_client_message_cap = 0
Ceph* All-Flash Tunings
debug log = 0
debug filer = 0/0
debug mds_log_expire = 0/0
debug crush = 0/0
debug optracker = 0/0
debug rados = 0/0
debug heartbeatmap = 0/0
debug buffer = 0/0
debug asok = 0/0
debug objectcacher = 0/0
debug filestore = 0/0
debug timer = 0/0
mutex_perf_counter = True
rbd_cache = False
ms_crc_header = False
ms_crc_data = False
osd_pool_default_pgp_num = 32768
osd_pool_default_size = 2
rbd_op_threads = 4
cephx require signatures = False
cephx sign messages = False
osd_pool_default_pg_num = 32768
throttler_perf_counter = False
auth_service_required = none
auth_cluster_required = none
auth_client_required = none
More Related Content

What's hot

Ceph Day Seoul - The Anatomy of Ceph I/O
Ceph Day Seoul - The Anatomy of Ceph I/OCeph Day Seoul - The Anatomy of Ceph I/O
Ceph Day Seoul - The Anatomy of Ceph I/OCeph Community
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephSage Weil
 
Ceph Day Tokyo -- Ceph on All-Flash Storage
Ceph Day Tokyo -- Ceph on All-Flash StorageCeph Day Tokyo -- Ceph on All-Flash Storage
Ceph Day Tokyo -- Ceph on All-Flash StorageCeph Community
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Danielle Womboldt
 
Ceph on 64-bit ARM with X-Gene
Ceph on 64-bit ARM with X-GeneCeph on 64-bit ARM with X-Gene
Ceph on 64-bit ARM with X-GeneCeph Community
 
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...Patrick McGarry
 
Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce Ceph Community
 
Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Community
 
Ceph Day Taipei - Bring Ceph to Enterprise
Ceph Day Taipei - Bring Ceph to EnterpriseCeph Day Taipei - Bring Ceph to Enterprise
Ceph Day Taipei - Bring Ceph to EnterpriseCeph Community
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Community
 
Ceph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephCeph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephDanielle Womboldt
 
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...Danielle Womboldt
 
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash TechnologyCeph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash TechnologyCeph Community
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureDanielle Womboldt
 
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Community
 

What's hot (17)

Ceph Day Seoul - The Anatomy of Ceph I/O
Ceph Day Seoul - The Anatomy of Ceph I/OCeph Day Seoul - The Anatomy of Ceph I/O
Ceph Day Seoul - The Anatomy of Ceph I/O
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
 
Ceph Day Tokyo -- Ceph on All-Flash Storage
Ceph Day Tokyo -- Ceph on All-Flash StorageCeph Day Tokyo -- Ceph on All-Flash Storage
Ceph Day Tokyo -- Ceph on All-Flash Storage
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
 
Ceph on 64-bit ARM with X-Gene
Ceph on 64-bit ARM with X-GeneCeph on 64-bit ARM with X-Gene
Ceph on 64-bit ARM with X-Gene
 
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
 
Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce
 
MySQL Head-to-Head
MySQL Head-to-HeadMySQL Head-to-Head
MySQL Head-to-Head
 
Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data
 
Ceph Day Taipei - Bring Ceph to Enterprise
Ceph Day Taipei - Bring Ceph to EnterpriseCeph Day Taipei - Bring Ceph to Enterprise
Ceph Day Taipei - Bring Ceph to Enterprise
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK
 
Ceph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephCeph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for Ceph
 
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
 
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash TechnologyCeph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
 
Bluestore
BluestoreBluestore
Bluestore
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
 
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
 

Similar to Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster

Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Community
 
Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...
Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...
Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...inwin stack
 
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSAccelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSCeph Community
 
3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdfhellobank1
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Community
 
Intel ssd dc data center family for PCIe
Intel ssd dc data center family for PCIeIntel ssd dc data center family for PCIe
Intel ssd dc data center family for PCIeLow Hong Chuan
 
Forwarding Plane Opportunities: How to Accelerate Deployment
Forwarding Plane Opportunities: How to Accelerate DeploymentForwarding Plane Opportunities: How to Accelerate Deployment
Forwarding Plane Opportunities: How to Accelerate DeploymentCharo Sanchez
 
Cерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 fin
Cерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 finCерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 fin
Cерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 finDEPO Computers
 
Ceph Day Beijing - Storage Modernization with Intel & Ceph
Ceph Day Beijing - Storage Modernization with Intel & Ceph Ceph Day Beijing - Storage Modernization with Intel & Ceph
Ceph Day Beijing - Storage Modernization with Intel & Ceph Ceph Community
 
Ceph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and CephCeph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and CephDanielle Womboldt
 
Ceph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in CephCeph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in CephCeph Community
 
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Community
 
Konsolidace Oracle DB na systémech s procesory M7
Konsolidace Oracle DB na systémech s procesory M7Konsolidace Oracle DB na systémech s procesory M7
Konsolidace Oracle DB na systémech s procesory M7MarketingArrowECS_CZ
 
Impact of Intel Optane Technology on HPC
Impact of Intel Optane Technology on HPCImpact of Intel Optane Technology on HPC
Impact of Intel Optane Technology on HPCMemVerge
 
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red_Hat_Storage
 
Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage Ceph Community
 
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryAccelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryDatabricks
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsRed_Hat_Storage
 

Similar to Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster (20)

Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
 
Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...
Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...
Intel - optimizing ceph performance by leveraging intel® optane™ and 3 d nand...
 
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSAccelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
 
3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
 
Intel ssd dc data center family for PCIe
Intel ssd dc data center family for PCIeIntel ssd dc data center family for PCIe
Intel ssd dc data center family for PCIe
 
Forwarding Plane Opportunities: How to Accelerate Deployment
Forwarding Plane Opportunities: How to Accelerate DeploymentForwarding Plane Opportunities: How to Accelerate Deployment
Forwarding Plane Opportunities: How to Accelerate Deployment
 
Cерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 fin
Cерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 finCерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 fin
Cерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 fin
 
Ceph Day Beijing - Storage Modernization with Intel & Ceph
Ceph Day Beijing - Storage Modernization with Intel & Ceph Ceph Day Beijing - Storage Modernization with Intel & Ceph
Ceph Day Beijing - Storage Modernization with Intel & Ceph
 
Ceph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and CephCeph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and Ceph
 
Ceph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in CephCeph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in Ceph
 
Security a SPARC M7 CPU
Security a SPARC M7 CPUSecurity a SPARC M7 CPU
Security a SPARC M7 CPU
 
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
 
Konsolidace Oracle DB na systémech s procesory M7
Konsolidace Oracle DB na systémech s procesory M7Konsolidace Oracle DB na systémech s procesory M7
Konsolidace Oracle DB na systémech s procesory M7
 
Yeni Nesil Sunucular ile Veritabanınız
Yeni Nesil Sunucular ile VeritabanınızYeni Nesil Sunucular ile Veritabanınız
Yeni Nesil Sunucular ile Veritabanınız
 
Impact of Intel Optane Technology on HPC
Impact of Intel Optane Technology on HPCImpact of Intel Optane Technology on HPC
Impact of Intel Optane Technology on HPC
 
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
 
Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage
 
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryAccelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
 

Recently uploaded

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Recently uploaded (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster

  • 1. Peggy Shen, Software Solutions Architect, Intel Corp. Jack Zhang, Enterprise Architect, Intel Corp. 2016-08
  • 2. Legalnotices Copyright © 2016 Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Intel Inside, and 3D XPoint are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. FTC Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 The cost reduction scenarios described in this document are intended to enable you to get a better understanding of how the purchase of a given Intel product, combined with a number of situation-specific variables, might affect your future cost and savings. Nothing in this document should be interpreted as either a promise of or contract for a given level of costs. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. 2
  • 3. Agenda • Introduction, Ceph at Intel • All-flash Ceph configurations and benchmark data • OEMs/ISVs/Intel Ceph Reference Architects/Recipes • Future Ceph* with Intel NVM Technologies 3D XpointTM and 3D NAND SSD • Summary 3*Other names and brands may be claimed as the property of others.
  • 4. 4 Acknowledgements This is team work. Thanks for the contributions of Intel Team: PRC team: Jian Zhang, Yuan Zhou, Haodong Tang, Jianpeng Ma, Ning Li US team: Daniel Ferber, Tushar Gohad, Orlando Moreno, Anjaneya Chagam
  • 6. 6 Ceph at Intel – A brief introduction Optimize for Intel® platforms, flash and networking • Compression, Encryption hardware offloads (QAT & SOCs) • PMStore (for 3D XPoint DIMMs) • RBD caching and Cache tiering with NVM • IA optimized storage libraries to reduce latency (ISA-L, SPDK) Performance profiling, analysis and community contributions • All flash workload profiling and latency analysis, performance portal http://01.org/cephperf • Streaming, Database and Analytics workload driven optimizations Ceph enterprise usages and hardening • Manageability (Virtual Storage Manager) • Multi Data Center clustering (e.g., async mirroring) End Customer POCs with focus on broad industry influence • CDN, Cloud DVR, Video Surveillance, Ceph Cloud Services, Analytics • Working with 50+ customers to help them enabling Ceph based storage solutions POCs Ready to use IA, Intel NVM optimized systems & solutions from OEMs & ISVs • Ready to use IA, Intel NVM optimized systems & solutions from OEMs & ISVs • Intel system configurations, white papers, case studies • Industry events coverage Go to market Intel® Storage Acceleration Library (Intel® ISA-L) Intel® Storage Performance Development Kit (Intel® SPDK) Intel® Cache Acceleration Software (Intel® CAS) Virtual Storage Manager Ce-Tune Ceph Profiler
  • 7. 7 Intel Ceph Contribution Timeline 2014 2015 2016 * Right Edge of box indicates approximate release date New Key/Value Store Backend (rocksdb) Giant* Hammer Infernalis Jewel CRUSH Placement Algorithm improvements (straw2 bucket type) Bluestore Backend Optimizations for NVM Bluestore SPDK Optimizations RADOS I/O Hinting (35% better EC Write erformance) Cache-tiering with SSDs (Write support) PMStore (NVM-optimized backend based on libpmem) RGW, Bluestore Compression, Encryption (w/ ISA-L, QAT backend) Virtual Storage Manager (VSM) Open Sourced CeTune Open Sourced Erasure Coding support with ISA-L Cache-tiering with SSDs (Read support) Client-side Block Cache (librbd)
  • 8. 8 Intel Ceph value add-on areas Support • Sizing • Deploying • Benchmarking • Tunings • Troubleshooting • upgrade Feature • Management • Interface – iscsi • Compression • Deduplication • Encryption Performance • Caching • SSD • SPDK – speedup Stability • Stable code • Production ready Solutions • Customizing • IOPS optimized, Throughput optimized, Capacity-archive optimized Tools & BKMs VSM & Upstream work & Library Upstream features & Reference solutions Upstream POC solutions Ceph Powered by Intel
  • 9. 9 All-flash Ceph configurations and benchmark data
  • 10. Ceph and NVM SSDs 10 * NVM – Non-volatile Memory
  • 11. Suggested Configurations for Ceph* Storage Node
Standard/good (baseline). Use cases/applications that need high-capacity storage with high-throughput performance:
 NVMe*/PCIe* SSD for journal + caching, HDDs as OSD data drives
 Example: 1x 1.6TB Intel® SSD DC P3700 as journal + Intel® Cache Acceleration Software (Intel® CAS) + 12 HDDs
Better IOPS. Use cases/applications that need higher performance, especially for throughput, IOPS and SLAs, with medium storage-capacity requirements:
 NVMe/PCIe SSD as journal, no caching, high-capacity SATA SSDs as data drives
 Example: 1x 800GB Intel® SSD DC P3700 + 4 to 6x 1.6TB Intel® SSD DC S3510
Best performance. Use cases/applications that need the highest performance (throughput and IOPS) and low latency:
 All NVMe/PCIe SSDs
 Example: 4 to 6x 2TB Intel® SSD DC P3700 Series
Ceph* storage node --Good: CPU Intel(R) Xeon(R) CPU E5-2650v3; memory 64 GB; NIC 10GbE; disks 1x 1.6TB P3700 + 12x 4TB HDDs (1:12 ratio), P3700 as journal and caching; caching software Intel(R) CAS 3.0 (option: Intel(R) RSTe/MD4.3)
Ceph* storage node --Better: CPU Intel(R) Xeon(R) CPU E5-2690; memory 128 GB; NIC dual 10GbE; disks 1x Intel(R) DC P3700 (800GB) + 4x Intel(R) DC S3510 1.6TB
Ceph* storage node --Best: CPU Intel(R) Xeon(R) CPU E5-2699v3; memory >= 128 GB; NIC 2x 40GbE, 4x dual 10GbE; disks 4 to 6x Intel® DC P3700 2TB
More Information: https://intelassetlibrary.tagcmd.com/#assets/gallery/11492083/details
*Other names and brands may be claimed as the property of others.
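As a rough illustration of how the baseline ("Standard/good") layout is typically provisioned, each HDD-backed OSD gets its journal on a partition of the NVMe SSD. The sketch below uses the Jewel-era ceph-deploy tool; the host name and device names (ceph-node1, sdb/sdc, /dev/nvme0n1p1 and p2) are hypothetical, and the exact syntax varies with the ceph-deploy version:

  # One OSD per HDD, journal on a dedicated NVMe partition (illustrative devices)
  ceph-deploy osd create ceph-node1:sdb:/dev/nvme0n1p1
  ceph-deploy osd create ceph-node1:sdc:/dev/nvme0n1p2
  # ...repeat for the remaining HDDs, one NVMe journal partition per OSD

Intel CAS caching in front of the HDDs is configured separately with the CAS tooling and is not shown here.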
  • 12. 12 All Flash (PCIe* SSD + SATA SSD) Ceph Configuration -- "Better IOPS Ceph Configuration"¹
[Diagram: 5 FIO client nodes (1x 10Gb NIC each) driving a 5-node Ceph cluster (CEPH1 to CEPH5, 2x 10Gb NIC each); CEPH1 also hosts the MON; each storage node runs 8 OSD instances]
5x Client Node
• Intel® Xeon® processor E5-2699 v3 @ 2.3GHz, 64GB mem
• 10Gb NIC
5x Storage Node
• Intel® Xeon® processor E5-2699 v3 @ 2.3GHz, 128GB memory
• 1x 1TB HDD for OS
• 1x Intel® SSD DC P3700 800GB for journal (U.2)
• 4x 1.6TB Intel® SSD DC S3510 as data drives
• 2 OSD instances on each Intel® SSD DC S3510
More Information: https://intelassetlibrary.tagcmd.com/#assets/gallery/11492083/details
*Other names and brands may be claimed as the property of others.
¹ For configuration see Slide 5
  • 13. 13 Ceph* on All Flash Array --Tuning and optimization efforts
• Up to 16x performance improvement for 4K random read, peak throughput 1.08M IOPS
• Up to 7.6x performance improvement for 4K random write, 140K IOPS
Tuning steps (4K random read / 4K random write):
• Default: single OSD / single OSD
• Tuning-1: 2 OSD instances per SSD / 2 OSD instances per SSD
• Tuning-2: Tuning-1 + debug=0 / Tuning-2 + debug 0
• Tuning-3: Tuning-2 + jemalloc / Tuning-3 + op_tracker off, tuning fd cache
• Tuning-4: Tuning-3 + read_ahead_size=16 / Tuning-4 + jemalloc
• Tuning-5: Tuning-4 + osd_op_thread=32 / Tuning-4 + RocksDB to store omap
• Tuning-6: Tuning-5 + rbd_op_thread=4 / N/A
[Chart: normalized 4K random read/write performance by tuning level]
Performance numbers are Intel Internal estimates. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Intel and Intel logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries
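Several of the knobs above map directly to ceph.conf settings (the complete tuned configuration is reproduced on slide 39), while the read-ahead and allocator changes are applied at the OS/process level. A rough sketch of one plausible way to apply them; the block device name, the jemalloc library path, and the interpretation of read_ahead_size as the block-layer read_ahead_kb are assumptions:

  # ceph.conf fragment (see slide 39 for the full tuned set)
  osd_op_threads = 32
  osd_enable_op_tracker = false
  filestore_fd_cache_size = 10240
  filestore_omap_backend = rocksdb    ; assumption for "RocksDB to store omap"
  rbd_op_threads = 4

  # OS-level read-ahead on the data SSDs (illustrative device name)
  echo 16 > /sys/block/sdb/queue/read_ahead_kb

  # run the OSD daemon with jemalloc (library path is distribution-dependent)
  LD_PRELOAD=/usr/lib/libjemalloc.so.1 ceph-osd -i 0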
  • 14. 14 Ceph* on All Flash Array --Tuning and optimization efforts
 1.08M IOPS for 4K random read, 144K IOPS for 4K random write with tunings and optimizations
[Charts: random read and random write performance, latency vs IOPS, RBD scale test]
Random read:
• 1.08M 4K random read IOPS @ 3.4 ms
• 500K 8K random read IOPS @ 8.8 ms
• 300K 16K random read IOPS @ 10 ms
• 63K 64K random read IOPS @ 40 ms
Random write:
• 144K 4K random write IOPS @ 4.3 ms
• 132K 8K random write IOPS @ 4.1 ms
• 88K 16K random write IOPS @ 2.7 ms
• 23K 64K random write IOPS @ 2.6 ms
Excellent random read performance and acceptable random write performance.
Performance numbers are Intel Internal estimates. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Intel and Intel logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries
  • 15. Ceph* on All Flash Array --Ceph*: SSD Cluster vs. HDD Cluster
• Both clusters keep journals on PCI Express*/NVM Express* SSDs
• 4K random write: an HDD cluster ~58x the size (~2,320 HDDs) would be needed to reach the same performance
• 4K random read: an HDD cluster ~175x the size (~7,024 HDDs) would be needed to reach the same performance
All-SSD Ceph* helps provide excellent TCO (both CapEx and OpEx): not only performance, but also space, power, failure rate, etc.
Client Node: 5 nodes with Intel® Xeon® processor E5-2699 v3 @ 2.30GHz, 64GB memory; OS: Ubuntu* Trusty
Storage Node: 5 nodes with Intel® Xeon® processor E5-2699 v3 @ 2.30GHz, 128GB memory; Ceph* version 9.2.0; OS: Ubuntu* Trusty; 1x Intel(R) DC P3700 SSD for journal per node
Cluster difference: SSD cluster: 4x Intel(R) DC S3510 1.6TB for OSD per node; HDD cluster: 10x SATA 7200RPM HDDs as OSD per node
[Chart: normalized 4K random write/read performance comparison, HDD vs. SSD (~58.2x and ~175.6x)]
Performance numbers are Intel Internal estimates. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Intel and Intel logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries
  • 16. 16 All-NVMe Ceph Cluster for MySQL Hosting
[Diagram: 5-node all-NVMe Ceph cluster serving MySQL and Sysbench containers on 10 client systems]
5-node all-NVMe Ceph cluster:
• Supermicro 1028U-TN10RT+, dual Xeon E5 2699v4 @ 2.2GHz (44 cores, HT), 128GB DDR4
• RHEL 7.2, kernel 3.10-327, Ceph v10.2.0, BlueStore, async messenger
• 20x 1.6TB P3700 SSDs, 80 OSDs, 2x replication, 19TB effective capacity
• Cluster network 2x 10GbE, public network 2x 10GbE
• Tests at cluster fill level of 82%
10x client systems:
• Dual-socket Xeon E5 2699v3 @ 2.3GHz (36 cores, HT), 128GB DDR4
• DB containers (Ceph RBD client via krbd): MySQL DB server, 16 vCPUs, 32GB mem, 200GB RBD volume, 100GB MySQL dataset, InnoDB buffer cache 25GB (25%)
• Client containers: Sysbench client, 16 vCPUs, 32GB RAM, FIO 2.8, Sysbench 0.5
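For reference, a database volume like the ones described above is typically provisioned as an RBD image and handed to a MySQL container through the kernel RBD (krbd) client. A minimal sketch; the pool, image, mount point and container names are hypothetical, and the resource limits are only shown to mirror the 16 vCPU / 32GB container sizing:

  # 200GB RBD image for one MySQL instance (names are illustrative; size is in MB)
  rbd create mysql-vol01 --size 204800 --pool rbd
  rbd map rbd/mysql-vol01              # exposes a /dev/rbdX device via krbd
  mkfs.xfs /dev/rbd0
  mount /dev/rbd0 /mnt/mysql-vol01

  # hand the volume to a MySQL container
  docker run -d --name mysql01 --cpuset-cpus 0-15 --memory 32g \
      -v /mnt/mysql-vol01:/var/lib/mysql mysql:5.7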
  • 17. FIO 4K Random Read/Write Performance and Latency
First Ceph cluster to break ~1.4 million 4K random IOPS at ~1 ms response time in 5U
[Chart: IO-depth scaling, latency vs IOPS for 100% random read, 100% random write and a 70/30 random mix; 5 nodes, 80 OSDs, dual-socket Xeon E5 2699v4, 128GB RAM, 2x 10GbE, Ceph 10.2.1 with BlueStore, 6x RBD FIO clients]
• ~1.4M 4K random read IOPS @ ~1 ms average latency (~1.6M @ ~2.2 ms)
• ~220K 4K random write IOPS @ ~5 ms average latency
• ~560K 70/30% (OLTP) random IOPS @ ~3 ms average latency
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
  • 18. Sysbench MySQL OLTP Performance (100% SELECT, 16KB avg IO size, QD=2-8 avg)
InnoDB buffer pool = 25%, SQL dataset = 100GB, database page size = 16KB
[Chart: Sysbench thread scaling, latency vs aggregate QPS for 100% read (point SELECTs); 5 nodes, 80 OSDs, dual-socket Xeon E5 2699v4, 128GB RAM, 2x 10GbE, Ceph 10.1.2 with BlueStore, 20 Docker-rbd Sysbench clients (16 vCPUs, 32GB)]
• ~55,000 QPS with 1 client
• ~1 million QPS with 20 clients @ ~11 ms average latency, 2 Sysbench threads per client
• ~1.3 million QPS with 20 clients, 8 Sysbench threads per client
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
  • 19. Sysbench MySQL OLTP Performance (100% UPDATE, 70/30% SELECT/UPDATE)
InnoDB buffer pool = 25%, SQL dataset = 100GB, database page size = 16KB
[Chart: Sysbench thread scaling, latency vs aggregate QPS for 100% write (index UPDATEs) and 70/30% OLTP; 5 nodes, 80 OSDs, dual-socket Xeon E5 2699v4, 128GB RAM, 2x 10GbE, Ceph 10.2.1 with BlueStore, 20 Docker-rbd Sysbench clients (16 vCPUs, 32GB)]
• 70/30% read/write: ~25,000 QPS with 1 Sysbench client (4-8 threads); ~400K OLTP QPS @ ~50 ms average latency (aggregate, 20 clients)
• 100% random write: ~5,500 QPS with 1 Sysbench client (2-4 threads); ~100K write QPS @ ~200 ms average latency (aggregate, 20 clients)
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
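The OLTP numbers on the last two slides come from Sysbench 0.5 driving MySQL inside the client containers. Roughly, a read-only (point-SELECT) run looks like the sketch below; the lua script path, credentials, table sizing and thread count are illustrative assumptions, not the exact parameters used in these tests:

  # 100% read (point SELECTs), 8 threads per client
  sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
    --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
    --oltp-table-size=40000000 --oltp-read-only=on \
    --num-threads=8 --max-time=600 --max-requests=0 run

  # For the mixed and write-heavy runs, drop --oltp-read-only=on so oltp.lua
  # also issues UPDATE/INSERT/DELETE traffic; approximating the 70/30 and
  # 100%-UPDATE mixes means adjusting the per-transaction query counts
  # (e.g. --oltp-point-selects, --oltp-index-updates).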
  • 22. Available Reference Architectures (recipes)
• http://www.redhat.com/en/files/resources/en-rhst-cephstorage-supermicro-INC0270868_v2_0715.pdf
• http://www.qct.io/account/download/download?order_download_id=1065&dtype=Reference%20Architecture
• https://www.redhat.com/en/resources/red-hat-ceph-storage-hardware-configuration-guide
• https://www.percona.com/resources/videos/accelerating-ceph-database-workloads-all-pcie-ssd-cluster
• https://www.percona.com/resources/videos/mysql-cloud-head-head-performance-lab
• https://www.thomas-krenn.com/en/products/storage-systems/suse-enterprise-storage/ses-appliance-performance.html
• https://intelassetlibrary.tagcmd.com/#assets/gallery/11492083
  • 23. 23 Future Ceph* with Intel NVM Technologies: 3D XPoint™ and 3D NAND
  • 25. Moore's Law Continues to Disrupt the Computing Industry
[Timeline graphic: from the first Intel® SSD for commercial usage in 1992 (12MB) to a >10TB U.2 SSD in 2017, 1,000,000x the capacity while shrinking the form factor. Source: Intel projections on SSD capacity: 2014 >6TB, 2017 >10TB, 2018 >30TB, 2019 1xxTB]
  • 26. 3D XPoint™ Technology
[Figure: memory/storage hierarchy, latency and size of data relative to SRAM = 1X]
• SRAM: latency 1X, size of data 1X
• DRAM: latency ~10X, size of data ~100X
• 3D XPoint™ technology: latency ~100X, size of data ~1,000X
• NAND: latency ~100,000X, size of data ~1,000X
• HDD: latency ~10 MillionX, size of data ~10,000X
Technology claims are based on comparisons of latency, density and write cycling metrics amongst memory technologies recorded on published specifications of in-market memory products against internal Intel specifications. Performance numbers are Intel Internal estimates. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Intel and Intel logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries
  • 27. 27 Intel® Optane™ storage (prototype) vs Intel® SSD DC P3700 Series at QD=1
Server configuration: 2x Intel® Xeon® E5 2690 v3; NVM Express* (NVMe) NAND-based SSD: Intel P3700 800 GB; 3D XPoint-based SSD: Optane NVMe; OS: Red Hat* 7.1
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.
  • 28. 3D XPoint™ Technology -- NSG Software Readiness: Intel® CAS Linux Latency Optimizations
Configuration & methodology:
• Latency measured in microseconds (us)
• Intel® Xeon® dual-socket E5 2699 v3 @ 2.3GHz, 64GB ECC DDR4 DRAM
• Red Hat Enterprise Linux Server release 7.0, kernel version 3.10.0-123.13.2.el7
• fio-2.2.8, zipfian distribution with theta 1.2 (random_distribution=zipf:1.2), test file size = 50GiB
• 100% cache hits for the following workload: 4K read, 1 worker, 1 QD, all IO requests served from cache
• Cache device: Intel® DC P3700 400GB, raw access to the block device (no file system on the cache SSD)
• Latency comparisons between major Intel® CAS Linux software releases
[Chart: latency across Intel® CAS Linux releases (Dec'14, June'15, Jul'16, Dec'16)]
Intel technologies may require enabled hardware, specific software, or services activation. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as Optistruct*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. *Other names and brands may be claimed as the property of others
  • 29. 29 Intel Optane (prototype) vs. NAND PCIe SSD
• 2X the throughput (higher is better)
• 5X lower 99th-percentile latency (lower is better)
*Benchmarked on early prototype samples, 2S Haswell/Broadwell Xeon platform, single server. Data produced without any tuning. We expect performance to improve with tuning.
Performance numbers are Intel Internal estimates. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Intel and Intel logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries
  • 30. Storage Hierarchy Tomorrow
Hot tier:
• DRAM: 10GB/s per channel, ~100 nanosecond latency
• 3D XPoint™ DIMMs: ~6GB/s per channel, ~250 nanosecond latency
• NVM Express* (NVMe) 3D XPoint™ SSDs: PCI Express* (PCIe*) 3.0 x4 link, ~3.2 GB/s, <10 microsecond latency
Warm tier:
• NVMe 3D NAND SSDs: PCIe 3.0 x4, x2 link, <100 microsecond latency
Cold tier:
• NVMe 3D NAND SSDs
• SATA or SAS HDDs: SATA* 6Gbps, minutes offline
Workloads: server side and/or AFA (business processing, high performance/in-memory analytics, scientific, cloud, web/search/graph); big data analytics (Hadoop*); object store / active-archive (Swift, lambert, HDFS, Ceph*); low cost archive
Comparisons between memory technologies based on in-market product specifications and internal Intel specifications.
  • 31. 31 3D XPoint™ & 3D NAND enable high-performance and cost-effective solutions
• Enterprise-class, highly reliable, feature-rich, and cost-effective AFA solution: NVMe SSD as journal, 3D NAND TLC SSD as data store
• Enhance value through software optimization of the filestore and bluestore backends
[Diagram: today's Ceph node with 1x P3700 U.2 800GB (performance) + 4x S3510 1.6TB (capacity), alongside a future Ceph node with P3700 / 3D XPoint™ SSD (performance) + 4x P4500 4TB 3D NAND (capacity)]
  • 32. 32 3D XPoint™ opportunities: Bluestore backend
• Three usages for a PMEM device:
• Backend of bluestore: raw PMEM block device or a file on a DAX-enabled FS
• Backend of rocksdb: raw PMEM block device or a file on a DAX-enabled FS
• Backend of rocksdb's WAL: raw PMEM block device or a file on a DAX-enabled FS
• Two methods for accessing PMEM devices:
• libpmemblk
• mmap + libpmemlib
• https://github.com/ceph/ceph/pull/8761
[Diagram: BlueStore and RocksDB/BlueFS on PMEMDevice, accessed either through the libpmemblk API or via mmap load/store through libpmemlib on files of a DAX-enabled file system]
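To make the libpmemblk option above concrete, the sketch below opens a pool of fixed-size blocks on persistent memory and performs an atomic block write and read. This is generic NVML/PMDK libpmemblk usage rather than code from the pull request, and the pool path, block size and pool size are arbitrary assumptions:

  /* build with: gcc pmemblk_sketch.c -lpmemblk */
  #include <libpmemblk.h>
  #include <stdio.h>
  #include <string.h>

  #define POOL_PATH  "/mnt/pmem/bluestore-demo"   /* file on a DAX-enabled FS (assumed mount) */
  #define BLOCK_SIZE 4096
  #define POOL_SIZE  (1024L * 1024 * 1024)        /* 1 GiB pool */

  int main(void)
  {
      /* create the pmem-resident block pool, or open it if it already exists */
      PMEMblkpool *pbp = pmemblk_create(POOL_PATH, BLOCK_SIZE, POOL_SIZE, 0666);
      if (pbp == NULL)
          pbp = pmemblk_open(POOL_PATH, BLOCK_SIZE);
      if (pbp == NULL) {
          perror("pmemblk_create/open");
          return 1;
      }

      char buf[BLOCK_SIZE];
      memset(buf, 0, sizeof(buf));
      strcpy(buf, "hello bluestore on pmem");

      /* block writes are atomic with respect to power failure */
      if (pmemblk_write(pbp, buf, 0) < 0)
          perror("pmemblk_write");
      if (pmemblk_read(pbp, buf, 0) < 0)
          perror("pmemblk_read");

      printf("blocks in pool: %zu, block 0: %s\n", pmemblk_nblock(pbp), buf);
      pmemblk_close(pbp);
      return 0;
  }

The mmap + libpmemlib alternative maps the device or file directly and uses regular load/store instructions, relying on the library's persistence calls (e.g. pmem_persist) to flush stores to the media.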
  • 33. Summary
• Strong demand and a clear trend toward all-flash-array Ceph* solutions
• IOPS/SLA-driven applications such as SQL databases can be backed by all-flash Ceph
• NVM technologies such as 3D XPoint™ and 3D NAND enable new performance capabilities and accelerate all-flash adoption
• Bluestore shows a significant performance increase compared with filestore, but still needs improvement
• Let's work together to make Ceph* more efficient with all-flash arrays!
  • 35. Legalnotices Copyright © 2016 Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Intel Inside, and 3D XPoint are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. FTC Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 The cost reduction scenarios described in this document are intended to enable you to get a better understanding of how the purchase of a given Intel product, combined with a number of situation-specific variables, might affect your future cost and savings. Nothing in this document should be interpreted as either a promise of or contract for a given level of costs. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. 35
  • 38. Testing Methodology
Storage interface: fio with RBD as the storage interface
Tools:
• Use "dd" to prepare data for read/write tests
• Use fio (ioengine=libaio, direct=1) to generate 4 IO patterns: sequential write/read, random write/read
• Access span: 60GB
Run rules:
• Drop the OSDs' page caches (echo 1 > /proc/sys/vm/drop_caches)
• 100 secs for warm-up, 600 secs for data collection
• Run 4KB/64KB tests under different numbers of RBD volumes (1 to 120)
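As a concrete illustration of the methodology above, a single 4KB random-read job against one kernel-mapped RBD volume could be expressed as the fio job file below. The device path is a placeholder, and the real tests swept 1 to 120 volumes, 4KB and 64KB block sizes, and all four IO patterns:

  ; fio job sketch matching the stated parameters (libaio, direct I/O, 60GB access span)
  [global]
  ioengine=libaio
  direct=1
  ramp_time=100
  runtime=600
  time_based=1
  size=60g
  bs=4k
  iodepth=8

  ; one job per mapped RBD volume; /dev/rbd0 is a placeholder device,
  ; and rw can be randread, randwrite, read or write
  [rbd-vol-1]
  filename=/dev/rbd0
  rw=randread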
  • 39. Ceph* All-Flash Tunings (ceph.conf)
[global]
debug paxos = 0/0
debug journal = 0/0
debug mds_balancer = 0/0
debug mds = 0/0
debug lockdep = 0/0
debug auth = 0/0
debug mds_log = 0/0
debug mon = 0/0
debug perfcounter = 0/0
debug monc = 0/0
debug rbd = 0/0
debug throttle = 0/0
debug mds_migrator = 0/0
debug client = 0/0
debug rgw = 0/0
debug finisher = 0/0
debug journaler = 0/0
debug ms = 0/0
debug hadoop = 0/0
debug mds_locker = 0/0
debug tp = 0/0
debug context = 0/0
debug osd = 0/0
debug bluestore = 0/0
debug objclass = 0/0
debug objecter = 0/0
debug log = 0
debug filer = 0/0
debug mds_log_expire = 0/0
debug crush = 0/0
debug optracker = 0/0
debug rados = 0/0
debug heartbeatmap = 0/0
debug buffer = 0/0
debug asok = 0/0
debug objectcacher = 0/0
debug filestore = 0/0
debug timer = 0/0
mutex_perf_counter = True
throttler_perf_counter = False
mon_pg_warn_max_per_osd = 10000
rbd_cache = False
rbd_op_threads = 4
ms_crc_header = False
ms_crc_data = False
ms_dispatch_throttle_bytes = 1048576000
objecter_inflight_op_bytes = 1048576000
objecter_inflight_ops = 1024000
osd_pool_default_pg_num = 32768
osd_pool_default_pgp_num = 32768
osd_pool_default_size = 2
cephx require signatures = False
cephx sign messages = False
auth_service_required = none
auth_cluster_required = none
auth_client_required = none
osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k,delaylog
osd_mkfs_type = xfs
osd_mkfs_options_xfs = -f -i size=2048
osd_client_message_size_cap = 0
osd_client_message_cap = 0
osd_enable_op_tracker = False
osd_op_threads = 32
osd_op_num_shards = 16
osd_op_num_threads_per_shard = 2
osd_pg_object_context_cache_count = 10240
filestore_queue_max_ops = 5000
filestore_queue_max_bytes = 10485760000
filestore_queue_committing_max_ops = 5000
filestore_queue_committing_max_bytes = 1048576000
filestore_wbthrottle_enable = True
filestore_fd_cache_shards = 64
filestore_fd_cache_size = 10240
filestore_max_sync_interval = 10
filestore_op_threads = 16
journal_queue_max_ops = 3000
journal_queue_max_bytes = 10485760000
journal_max_write_entries = 1000
journal_max_write_bytes = 1048576000