Accelerate Ceph via SPDK
XSKY’s BlueStore as a case study
Danny.Kuo@intel.com
Haomai Wang, XSKY
Outline
• Background
• SPDK introduction
• XSKY’s BlueStore
• Conclusion
• Low performance of Ceph's storage service
• Ceph's original architecture was designed for low-speed storage devices (millisecond-level latency)
• There are more and more fast devices in both network and storage
• Network: 10G/25G/40G/100G (low performance → high performance)
• Storage: HDD → SATA SSD → NVMe SSD → NVDIMM (high latency → low latency)
• Challenge: software design and implementation in Ceph is the bottleneck
• Equipped with these fast devices, the software needs to be refreshed to push the limits of the hardware.
Background – Performance Driven
Traditional SAN (Storage Area Network):
• Application servers connect over separate, dedicated networks
• Data stored in proprietary storage hardware
• Optimized to run only a specific workload
• Grows by scale-up (capacity and performance within one array)
Ceph cluster:
• Standard Ethernet network
• Data distributed across multiple nodes or clusters
• Flexible design to support multiple workloads
• Grows by scale-out (capacity and performance added node by node)
[Chart: Hardware vs. Software Latency – share of drive read latency vs. software overhead for 7200 RPM HDD, 15000 RPM HDD, SATA NAND, Enterprise NAND, Optane SSD, and 3D XPoint DIMMs]
[Chart: Media vs. Network + Software Latency – share of drive read latency vs. network latency (200 usec) for the same media]
Background – New Hardware Needs a New Balance
FileStore and problems in Ceph
• FileStore
• PG = collection = directory
• Object = file
• Advantages:
• Most are simple via POSIX
interface
• Disadvantages:
• hard to extend with advanced features like compression/checksum
• POSIX falls short:
• transaction atomicity → double write (increases latency; see the sketch below)
• enumeration → directory tree built by hash-value prefix (costs extra computation)
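To make the double-write cost concrete, here is a minimal sketch (not FileStore's actual code) of write-ahead journaling over POSIX calls: the payload is first appended to a journal and synced, and only then applied to the object file, so every byte hits the media twice.

```cpp
#include <sys/types.h>
#include <unistd.h>

// Simplified write-ahead journaling in the FileStore style: the same payload
// is written twice (journal append + in-place apply) to get atomicity on POSIX.
ssize_t journaled_write(int journal_fd, int data_fd,
                        const void *buf, size_t len, off_t obj_off) {
    // 1) append the payload to the journal and make it durable
    if (write(journal_fd, buf, len) != (ssize_t)len) return -1;
    if (fdatasync(journal_fd) != 0) return -1;          // commit latency is paid here

    // 2) apply the same bytes to the object file: the second copy of the data
    if (pwrite(data_fd, buf, len, obj_off) != (ssize_t)len) return -1;
    return (ssize_t)len;
}
```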
Potential Solutions
• Invent a new ObjectStore/FileStore design and implementation along the following lines:
API Change
• Synchronous APIs → Asynchronous APIs (POSIX → non-POSIX); sketched below
• Benefit: gain performance by keeping several requests in flight instead of completing them one at a time.
I/O stack optimization:
• Replace kernel I/O stacks with user-space stacks (e.g., network I/O, storage I/O)
• Benefit: no context switches, no data copies between kernel and user space, locked architecture → lock-free architecture
SPDK (Storage Performance Development Kit, https://www.spdk.io/)
provides a set of libraries to address such issues.
asynchronous, polled mode, zero-copy
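As an illustration of the API change above, a minimal sketch with hypothetical names (this is not an actual Ceph or SPDK interface): instead of blocking in pread() per request, the caller submits many operations and polls for their completions.

```cpp
#include <cstdint>
#include <cstddef>
#include <functional>

// Hypothetical asynchronous block-device interface, for illustration only.
struct AsyncDev {
    using Callback = std::function<void(int result)>;
    // Queue a read; returns immediately. The callback fires on completion.
    virtual int submit_read(void* buf, uint64_t off, size_t len, Callback cb) = 0;
    // Poll the device for finished requests; invokes their callbacks.
    virtual int poll_completions(unsigned max) = 0;
    virtual ~AsyncDev() = default;
};

// Keep many requests in flight instead of waiting for each one.
void read_many(AsyncDev& dev, void* bufs[], int n, size_t len) {
    int done = 0;
    for (int i = 0; i < n; ++i)
        dev.submit_read(bufs[i], uint64_t(i) * len, len,
                        [&done](int) { ++done; });
    while (done < n)
        dev.poll_completions(32);   // polled mode: no interrupts, no blocking syscall
}
```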
Outline
• Background
• SPDK introduction
• XSKY’s BlueStore
• Conclusion
Scalable and Efficient
Millions of IOPS per core
Linear scaling with more cores
iSCSI and NVMe over Fabrics targets
IA-Optimized Storage Reference Architecture
Lockless, polled-mode drivers and protocol libraries
Designed for 3D XPoint® media latencies
BSD licensed drivers via github.com/spdk
User-Space & Polled-Mode, End-to-End
No Kernel/Interrupt context switching overhead
Drops latencies from microsecond to nanosecond
Storage Performance
Development Kit
Built on Intel® Data Plane Development Kit (DPDK)
Software infrastructure to accelerate the packet input/output to Intel CPU
*Other names and brands may be claimed as the property of others.
Storage Performance
Development Kit
User space Network Services (UNS)
TCP/IP stack implemented as polling, lock-light library, bypassing
kernel bottlenecks, and enabling scalability
User space NVMe, Intel® Xeon®/Intel® Atom™
Processor DMA, and Linux* AIO drivers
Optimizes back end driver performance and prevents kernel
bottlenecks from forming at the back end of the I/O chain
Reference Software with Example Application
Customer-relevant example application leveraging Intel® Storage
Acceleration Libraries (ISA-L) is included; support provided on a best-effort basis
[Diagram: DPDK framework]
• Core libraries (user space): EAL, MBUF, MEMPOOL, RING, TIMER; kernel modules: KNI, IGB_UIO, VFIO, UIO_PCI_GENERIC
• PMDs (native & virtual): E1000, IGB, IXGBE, I40E, FM10K, ETHDEV, VMXNET3, VIRTIO, ENIC, ENA, CXGBE, BNX2X, MLX4, MLX5, NFP, SZEDATA2, MPIPE, XENVIRT, PCAP, AF_PKT, RING, NULL, BONDING
• Classify: HASH, LPM, ACL; Extensions: JOBSTAT, DISTRIB, IP FRAG, KNI, REORDER, POWER, VHOST, IVSHMEM; QoS: SCHED, METER; Pkt Framework: PIPELINE, PORT, TABLE
• Crypto accelerators: QAT, AESNI MB, AESNI GCM, SNOW 3G, NULL, ISA-L, future TBD
• Network functions (cloud, enterprise, comms)
DPDK Framework
Performance-optimizing Intel® ISA-L functions:
• Data protection: XOR (RAID 5), P+Q (RAID 6), Reed-Solomon erasure code
• Compression: "DEFLATE", IGZIP fast compression
• Cryptographic hashing: multi-buffer SHA-1, SHA-256, SHA-512, MD5
• Data integrity: CRC-T10, CRC-IEEE (802.3), CRC32-iSCSI
• Encryption: XTS-AES 128, XTS-AES 256
[Diagrams: hashing example, CRC sender/receiver check, and public/private-key encryption flow]
Intel® ISA-L Functions
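For a flavor of how these primitives are consumed, a small data-integrity sketch using ISA-L's CRC routine; the crc32_ieee(init_crc, buf, len) prototype and the <isa-l/crc.h> include path are assumptions based on common ISA-L packaging, so check the headers shipped with your version.

```cpp
#include <isa-l/crc.h>   // header location and names may vary by ISA-L version
#include <cstdint>
#include <cstring>
#include <cstdio>

int main() {
    unsigned char block[4096];
    std::memset(block, 0xA5, sizeof(block));

    // Compute the IEEE 802.3 CRC32 over a data block, seeding with 0.
    uint32_t crc = crc32_ieee(0, block, sizeof(block));
    std::printf("crc32_ieee = 0x%08x\n", crc);

    // The receiver recomputes the CRC over the same bytes and compares;
    // a mismatch means the block was corrupted in flight or at rest.
    uint32_t check = crc32_ieee(0, block, sizeof(block));
    std::printf("match: %s\n", crc == check ? "yes" : "no");
    return 0;
}
```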
Extends Data Plane Development Kit concepts through an end-to-end storage context
 Optimized, user-space lockless polling in the NIC driver, TCP/IP stack, iSCSI target, and NVMe driver
 iSCSI and NVMe over Fabrics targets integrated
Exposes the performance potential of current and next-generation storage media
 Media latencies moving from µsec to nsec, storage software architectures must keep up
 Permissive open source license for user-space media drivers: NVMe & CBDMA drivers are on github.com
 Media drivers support both Linux* and FreeBSD*
NVMf Application and Protocol Library:
 Provisioning, Fabric Interface Processing , Memory Allocation, Fabric Connection Handling, RDMA Data Xfer
 Discovery, Subsystems, Logical Controller, Capsule Processing, Manage Interface with NVMe Driver library
*Other names and brands may be claimed as the property of others.
[Diagram: SPDK architecture – iSCSI target and NVMf target over a block device abstraction, user-space TCP/IP (UNS) and RDMA verbs, DPDK NIC driver and libraries, user-space NVMe and CBDMA (I/OAT DMA) drivers, alongside existing Linux* kernel and OFED components; legend distinguishes customer, existing, and enhanced software]
SPDK architecture Overview
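To show what "user-space, polled mode" means at the driver boundary, here is a condensed sketch against the SPDK NVMe library. spdk_nvme_ns_cmd_read and spdk_nvme_qpair_process_completions are real SPDK calls, but controller/namespace/queue-pair setup via spdk_nvme_probe is omitted, the 4KB-per-LBA assumption is mine, and exact signatures vary across SPDK releases.

```cpp
#include "spdk/nvme.h"   // SPDK user-space NVMe driver API
#include "spdk/env.h"

static bool g_done = false;

// Completion callback: runs inside spdk_nvme_qpair_process_completions().
static void read_done(void *arg, const struct spdk_nvme_cpl *cpl) {
    (void)arg; (void)cpl;
    g_done = true;
}

// Issue one read and poll for its completion. 'ns' and 'qpair' are assumed to
// have been obtained via spdk_nvme_probe() and an I/O queue pair allocation.
void read_one_block(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair) {
    // DMA-safe buffer from SPDK's environment layer (hugepage-backed).
    void *buf = spdk_dma_zmalloc(4096, 4096, NULL);

    // Asynchronous submission: returns immediately, no syscall, no interrupt.
    int rc = spdk_nvme_ns_cmd_read(ns, qpair, buf,
                                   0 /* starting LBA */,
                                   1 /* LBA count; assumes a 4KB-formatted namespace */,
                                   read_done, NULL, 0 /* io_flags */);
    if (rc != 0) { spdk_dma_free(buf); return; }

    // Polled mode: the application drives completion processing itself.
    while (!g_done)
        spdk_nvme_qpair_process_completions(qpair, 0 /* 0 = no limit */);

    spdk_dma_free(buf);
}
```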
Licensed Package Includes:
 Media Drivers: I/OAT DMA (CBDMA) and NVMe
Protocols: iSCSI and NVMe over Fabrics (NVMf)
Optimized Libraries: DPDK and UNS TCP/IP Stack
 User space support code (written in C):
 POSIX compliant
 Demo/Usage, Unit test (functional correctness),
Basic performance
 API manuals – may include links or copy key papers
 Release.txt – release notes, version, etc.
Source Agreement
 BSD licensed code distributed via https://github.com/spdk
 Licensed version (including UNS and other components in
development) is available under non-commercial restricted
license and full software license agreement
 All code is provided as reference software with best-effort
support model
SPDK
Packaging and
Contents
Performance comparison:
User-space NVMe driver vs. Kernel NVMe driver
4KB Random Read Performance: Partition variants on 4 NVMe SSD Drives
Single-Core Intel® Xeon® Processor
[Chart: IOPS (in thousands) for 1, 2, 4, 8, and 16 partitions – Kernel NVMe driver vs. SPDK NVMe driver]
SPDK NVMe driver delivers up to 6x performance improvement
vs. Kernel NVMe driver with a single-core Intel® Xeon® processor
4KB Random Read Performance: 1 to 4 NVMe SSD Drives
Single-Core Intel® Xeon® Processor
SPDK NVMe driver scales linearly in performance
from 1 to 4 NVMe drives with a single-core Intel® Xeon® processor
[Chart: IOPS (in thousands) for 1, 2, and 4 NVMe drives – Kernel NVMe driver vs. SPDK NVMe driver]
Performance comparison:
iSCSI target in SPDK vs. Linux-IO target
[Charts: 4KB iSCSI random read, SPDK vs. LIO – bars show IOPS (thousands, higher is better), lines show latency (msec) and cores utilized (lower is better), across queue depths 1–32; series: SPDK 2-core, SPDK 4-core, LIO 2-core, LIO unlimited cores]
SPDK can provide similar IOps and latency characteristics as LIO
while utilizing up to 8 fewer cores
Intel® Xeon® Processor v3 – 4KB - iSCSI Random Read:
SPDK vs. LIO
Intel® Xeon® Processor v3 – 4KB - iSCSI Random Write:
SPDK vs. LIO
[Chart: 4KB iSCSI random write – bars show IOPS (thousands, higher is better), lines show cores utilized (lower is better), across queue depths 1–32]
[Chart: 4KB iSCSI random write – bars show IOPS (thousands, higher is better), lines show latency (msec, lower is better), across queue depths 1–32]
SPDK can provide similar IOps and latency characteristics as LIO
while utilizing up to 2 fewer cores
IOps vs. LATENCY and IOps vs. CORE UTILIZATION (series: SPDK 2 Core, SPDK 4 Core, LIO 2 Core, LIO unlimited cores)
Intel® Xeon® Processor E5-2620v2-iSCSI Read/Write:
4 KB Data
[Charts: iSCSI 4KB IOPS (in thousands) – absolute performance (SPDK 1-core, SPDK 2-core, LIO 6-core) and performance per core (SPDK vs. LIO)]
NVM Express backend
PERFORMANCE and PERFORMANCE/CORE: up to 650% increase in max performance per core
Workloads: 4KB random 100% read, 4KB random 70% read / 30% write, 4KB random 100% write
Performance demonstration:
User-space NVMf target in SPDK
SPDK NVMf Performance Approaches Local NVMe
Efficiency and Scalable Performance
 NVMe >2M IOPS per Xeon-D core
 NVMf 1.2M IOPS per Xeon-D core
Optimized for Intel® Architecture and
NVMe
 Latency and jitter reduced
 Leaves CPU cycles for storage
application and value
 4X efficiency of kernel NVMe driver
[Chart: single-core performance comparison, IOPS in millions (higher is better), for 1x, 2x, and 4x NVMe – Kernel NVMe driver, SPDK NVMe driver, SPDK NVMf target; Intel® Xeon® processor D, Intel P3700 800GB SSDs, FIO-2.2.9, direct=1, iodepth=128 per LUN]
Configuration details – NVMf target: Intel Xeon-D processor D-1567 (1 socket, 12 cores / 12 threads), 32GB memory, Mellanox ConnectX-4 EN 25Gbps dual-port NIC, MTU 1500, Fedora 23, Linux kernel 4.4.3-300.fc23.x86_64, Intel SPDK NVMf target; storage: 4x Intel SSD DC P3700 series 800GB. NVMf initiator: Dell PowerEdge 730xd, Intel Xeon processor E5-2699 v3 (45M cache, 2.30 GHz), 2 sockets populated (18 cores / 18 threads per socket), 132GB memory, Mellanox ConnectX-4 EN 25Gbps dual-port NIC, MTU 1500, RHEL 7.2, Linux kernel 4.5.0-rc3. Tested by Intel, 3/22/2016.
[Diagram: NVMf I/O latency model – FIO (libaio, block I/O, direct, 1 worker per device, queue depth 1) on the NVMf client issues nvmf_read; the SPDK NVMf target calls into the SPDK NVMe library (spdk_lib_read_start/complete) and the NVMe controller; 93 usec round trip]
• 93 usec round trip time measured from the NVMf client
• Of the 93 usec, ~80 usec is spent in the NVMe controller
• 12–13 usec measured time over the fabric
• The SPDK NVMf target adds ~3% to fabric overhead
[Diagram annotations: measured segments of ~85.6–85.7 usec and ~6.5–6.7 usec along the NVMf read path]
NVMf IO Latency Model, 4KB 100% Random read
What SPDK can do to improve Ceph?
• Accelerate the backend I/Os in Ceph OSD (Object storage service)
• Key solution: Replace the Kernel drivers with user-space NVMe drivers provided by
SPDK to accelerate the I/Os on NVMe SSDs.
• Accelerate the client I/O performance on Ceph Cluster
• Key solution: use the accelerated iSCSI application and user-space NVMe drivers in
SPDK to build a caching solution in front of Ceph Clusters.
• Accelerate the network performance (TCP/IP) of Ceph's internal network.
• Key solution: Replace the existing kernel network stack on each OSD node with DPDK plus a user-space TCP/IP stack (e.g., LIBUNS, SEASTAR, MTCP, etc.).
Outline
• Background
• SPDK introduction
• XSKY’s BlueStore
• Conclusion
• key/value database (RocksDB) for metadata
• data written directly to block device
 Write through cache
• pluggable block Allocator (policy)
• Adaptive driver policy
 Kernel and user-space coexist
BlueStore
Consume raw block device
From Sage 06/21 Talk
Performance Status -- Sequential Write(HDD)
From Sage 06/21 Talk
Performance Status -- Random Write(HDD)
• Done
• fully functional IO path with checksums and compression
• fsck
• bitmap-based allocator and freelist
• Current efforts
• optimize metadata encoding efficiency
• performance tuning
• ZetaScale key/value db as RocksDB alternative
• bounds on compressed blob occlusion
• Coming Soon
• per-pool properties that map to compression, checksum, IO hints
• more performance optimization
• native SMR support (high-density HDD)
• leverage SPDK (bypass kernel for NVMe devices)
From Sage 06/21 Talk
BlueStore Status
High performance gap
Performance Bottleneck – kernel AIO library
• Non-local connections
• NIC RX and the application run on different cores
• Global TCP control block management
• Socket API overhead (connection setup)
Other Kernel Bottleneck
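For reference, the kernel AIO path flagged as a bottleneck looks roughly like the libaio sketch below (BlueStore's stock kernel block backend submits I/O through this API); io_setup, io_submit, and io_getevents are each system calls, which is exactly the per-I/O overhead a polled user-space driver removes. The device path is only an example.

```cpp
#include <libaio.h>      // kernel AIO userspace wrapper
#include <fcntl.h>
#include <cstdlib>
#include <cstdio>

// One 4KB read via kernel AIO: three system calls are involved
// (io_setup once, then io_submit + io_getevents per batch).
int main() {
    int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);   // example device path
    if (fd < 0) return 1;

    io_context_t ctx = 0;
    if (io_setup(128, &ctx) < 0) return 1;                // syscall: create AIO context

    void *buf = nullptr;
    if (posix_memalign(&buf, 4096, 4096) != 0) return 1;  // O_DIRECT needs aligned buffers

    struct iocb cb;
    struct iocb *cbs[1] = { &cb };
    io_prep_pread(&cb, fd, buf, 4096, 0);                 // describe a 4KB read at offset 0

    if (io_submit(ctx, 1, cbs) != 1) return 1;            // syscall: hand the request to the kernel

    struct io_event ev;
    io_getevents(ctx, 1, 1, &ev, nullptr);                // syscall: wait for (or reap) completion
    std::printf("read returned %ld\n", (long)ev.res);

    io_destroy(ctx);
    std::free(buf);
    return 0;
}
```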
BlueStore Architecture with DPDK/SPDK
DPDK-Messenger plugin creates an alternative data path
• TCP, IP, ARP, DPDK device:
• hardware feature offloads
• ported from the Seastar TCP/IP stack
• integrated with Ceph's libraries
• Event-driven:
• user-space event center (like epoll); sketched below
• NetworkStack API:
• basic network interface with zero-copy or non-zero-copy paths
• ensures PosixStack ↔ DPDKStack compatibility
• AsyncMessenger:
• a collection of Connections
• network error policy
Design
asynchronous, polled mode, zero-copy
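A schematic of the event-driven model above; the class and method names are simplified stand-ins, not Ceph's actual AsyncMessenger/EventCenter API. Each worker owns its connections and keeps polling a user-space event center instead of sleeping in epoll_wait.

```cpp
#include <functional>
#include <unordered_map>
#include <vector>

// Simplified stand-ins for the user-space event center and its handlers.
struct Event { int fd; bool readable; };

class UserEventCenter {
public:
    using Handler = std::function<void(const Event&)>;
    void register_fd(int fd, Handler h) { handlers_[fd] = std::move(h); }

    // Poll the DPDK-backed stack for ready events; returns how many fired.
    // With a kernel stack this would wrap epoll_wait; here it never sleeps.
    int process_events() {
        std::vector<Event> ready = poll_stack_nonblocking();
        for (const Event& ev : ready)
            handlers_[ev.fd](ev);           // run-to-completion on this core
        return (int)ready.size();
    }
private:
    // Stub: a real implementation would pull ready connections from the
    // DPDK-backed TCP/IP stack. Returning empty keeps this sketch self-contained.
    std::vector<Event> poll_stack_nonblocking() { return {}; }
    std::unordered_map<int, Handler> handlers_;
};

// One messenger worker per core: it owns its connections and just keeps polling.
void worker_loop(UserEventCenter& center, volatile bool& stop) {
    while (!stop)
        center.process_events();            // no blocking syscalls, no context switches
}
```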
• Local listen table → low latency
• Local connection processing → run-to-completion
• TCP 5-tuple → RX/TX core selection (RSS); see the sketch below
• mbufs go through the whole IO stack → no context switches
Shared Nothing TCP/IP (local TCP/IP)
asynchronous, polled mode, zero-copy
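The "TCP 5-tuple → RX/TX core" mapping can be pictured as below. Real NICs use a Toeplitz hash for RSS, so this mix function is purely illustrative; the point is that one flow always lands on the same core, keeping its state core-local.

```cpp
#include <cstdint>

// A TCP flow is identified by its 5-tuple.
struct FiveTuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protocol;            // IPPROTO_TCP
};

// Illustrative mix of the 5-tuple into a hash; hardware RSS uses a Toeplitz
// hash with a configurable key, but the effect is the same: same flow, same hash.
static uint32_t mix_tuple(const FiveTuple& t) {
    uint32_t h = 2166136261u;                       // FNV-1a style mixing
    auto mix = [&h](uint32_t v) { h = (h ^ v) * 16777619u; };
    mix(t.src_ip); mix(t.dst_ip);
    mix((uint32_t(t.src_port) << 16) | t.dst_port);
    mix(t.protocol);
    return h;
}

// Every packet of a flow lands on the same RX queue/core, so connection state
// stays core-local (shared-nothing) and can be processed run-to-completion.
inline unsigned core_for_flow(const FiveTuple& t, unsigned num_cores) {
    return mix_tuple(t) % num_cores;
}
```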
• Status
• user-space NVMe library (SPDK)
• already in Ceph master branch
• DPDK integrated
• IO data flows from the NIC (DPDK mbuf) to the device
• Missing part (planned for Q4'16)
• user-space cache
NVMe Device
asynchronous, polled mode, zero-copy
Details
[Charts: random 4KB read and write – IOPS and average latency, kernel vs. user space]
Improvements
• Core Logics
• no signal/wait
• future/promise
• full async
• Memory Allocation
• rte_malloc isn't efficient enough
• mbuf life-cycle control
• Full user-space logic
Bluestore Roadmap
asynchronous, polled mode, zero-copy
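A toy illustration of the future/promise style listed under "Core Logics"; this is a minimal single-threaded stand-in, not Seastar's or Ceph's implementation. The completion drives a chained continuation, so there is no signal/wait.

```cpp
#include <functional>
#include <optional>
#include <utility>

// A minimal future that carries either a value or a pending continuation.
template <typename T>
class Future {
public:
    void then(std::function<void(T)> cont) {
        if (value_) cont(*value_);            // already resolved: run inline
        else        cont_ = std::move(cont);  // otherwise remember the continuation
    }
    void resolve(T v) {
        value_ = std::move(v);
        if (cont_) cont_(*value_);            // completion drives the next step
    }
private:
    std::optional<T> value_;
    std::function<void(T)> cont_;
};

// Usage: the I/O path resolves the future from its poll loop; the caller
// chains what should happen next instead of sleeping on a condition variable.
inline void example() {
    Future<int> write_done;
    write_done.then([](int result) {
        // e.g. update metadata, then ack the client
        (void)result;
    });
    // ... later, from the polling thread, when the device reports completion:
    write_done.resolve(0);
}
```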
Outline
• Background
• SPDK introduction
• XSKY’s BlueStore
• Conclusion
• There are performance issues in Ceph with the emerging fast network and storage devices.
• Storage systems need to be refactored to catch up with the hardware.
• Ceph is expected to move toward a shared-nothing implementation.
• We mainly introduced SPDK and BlueStore to address the current issues in Ceph.
• SPDK: libraries (e.g., the user-space NVMe driver) can be used for performance acceleration.
• BlueStore: a new store that implements a lockless, asynchronous, high-performance storage service.
• Lots of details still need work (coming soon).
Summary
THANK YOU
Overview: ObjectStore and Data Model
• ObjectStore
• abstract interface for storing local data
• decouples data and metadata
• implementations: EBOFS, FileStore
• EBOFS highlight
• a user-space extent-based object file system
• deprecated in favor of FileStore on BTRFS in 2009
• Object – "file"
• data (file-like byte stream)
• attributes (small key/value)
• omap (unbounded key/value)
• Collection – "directory"
• placement group shard (slice of the RADOS pool)
• sharded by 32-bit hash value
• All writes are transactions (sketched below)
• Atomic + Consistent + Durable
• Isolation provided by OSD
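To ground the "all writes are transactions" point, here is a sketch shaped after Ceph's ObjectStore::Transaction API (method names follow src/os/ObjectStore.h, but treat exact signatures, especially queue_transaction, as approximate and release-dependent): data, xattr, and omap updates are batched and applied atomically by the backend.

```cpp
// Sketch only: assumes Ceph's build tree; method names follow
// ObjectStore::Transaction but exact signatures differ by release.
#include "os/ObjectStore.h"

void queue_object_update(ObjectStore* store,
                         ObjectStore::CollectionHandle& ch,
                         const coll_t& cid, const ghobject_t& oid,
                         bufferlist& data, bufferlist& attr_val,
                         std::map<std::string, bufferlist>& omap_updates) {
    ObjectStore::Transaction t;
    t.write(cid, oid, 0, data.length(), data);   // byte payload of the object
    t.setattr(cid, oid, "_", attr_val);          // small key/value attribute
    t.omap_setkeys(cid, oid, omap_updates);      // unbounded key/value metadata

    // The backend (FileStore, BlueStore, ...) applies all three mutations
    // atomically and durably; isolation is provided by the OSD layer.
    store->queue_transaction(ch, std::move(t));
}
```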
More Related Content

What's hot

BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephSage Weil
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화OpenStack Korea Community
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux KernelAdrian Huang
 
Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunheut2008
 
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionCeph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionKaran Singh
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking ExplainedThomas Graf
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InSage Weil
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific DashboardCeph Community
 
Understanding DPDK algorithmics
Understanding DPDK algorithmicsUnderstanding DPDK algorithmics
Understanding DPDK algorithmicsDenys Haryachyy
 
Cilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPCilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPThomas Graf
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSHSage Weil
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing GuideJose De La Rosa
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelDivye Kapoor
 

What's hot (20)

Dpdk performance
Dpdk performanceDpdk performance
Dpdk performance
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
 
Dpdk pmd
Dpdk pmdDpdk pmd
Dpdk pmd
 
DPDK In Depth
DPDK In DepthDPDK In Depth
DPDK In Depth
 
Intel dpdk Tutorial
Intel dpdk TutorialIntel dpdk Tutorial
Intel dpdk Tutorial
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux Kernel
 
Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zun
 
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionCeph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year In
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
 
PCI Drivers
PCI DriversPCI Drivers
PCI Drivers
 
Understanding DPDK algorithmics
Understanding DPDK algorithmicsUnderstanding DPDK algorithmics
Understanding DPDK algorithmics
 
Cilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPCilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDP
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSH
 
Understanding DPDK
Understanding DPDKUnderstanding DPDK
Understanding DPDK
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 
Ceph
CephCeph
Ceph
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux Kernel
 

Viewers also liked

Ceph Day Seoul - Ceph on Arm Scaleable and Efficient
Ceph Day Seoul - Ceph on Arm Scaleable and Efficient Ceph Day Seoul - Ceph on Arm Scaleable and Efficient
Ceph Day Seoul - Ceph on Arm Scaleable and Efficient Ceph Community
 
End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationCeph Community
 
Ceph de facto storage backend for OpenStack
Ceph de facto storage backend for OpenStack Ceph de facto storage backend for OpenStack
Ceph de facto storage backend for OpenStack eNovance
 
Ceph Performance and Optimization - Ceph Day Frankfurt
Ceph Performance and Optimization - Ceph Day Frankfurt Ceph Performance and Optimization - Ceph Day Frankfurt
Ceph Performance and Optimization - Ceph Day Frankfurt Ceph Community
 
Ceph Day Shanghai - Ceph in Ctrip
Ceph Day Shanghai - Ceph in CtripCeph Day Shanghai - Ceph in Ctrip
Ceph Day Shanghai - Ceph in CtripCeph Community
 
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Community
 
Ceph Day Shanghai - Recovery Erasure Coding and Cache Tiering
Ceph Day Shanghai - Recovery Erasure Coding and Cache TieringCeph Day Shanghai - Recovery Erasure Coding and Cache Tiering
Ceph Day Shanghai - Recovery Erasure Coding and Cache TieringCeph Community
 
Ceph Day Taipei - Bring Ceph to Enterprise
Ceph Day Taipei - Bring Ceph to EnterpriseCeph Day Taipei - Bring Ceph to Enterprise
Ceph Day Taipei - Bring Ceph to EnterpriseCeph Community
 
Ceph Day Shanghai - CeTune - Benchmarking and tuning your Ceph cluster
Ceph Day Shanghai - CeTune - Benchmarking and tuning your Ceph cluster Ceph Day Shanghai - CeTune - Benchmarking and tuning your Ceph cluster
Ceph Day Shanghai - CeTune - Benchmarking and tuning your Ceph cluster Ceph Community
 
AF Ceph: Ceph Performance Analysis and Improvement on Flash
AF Ceph: Ceph Performance Analysis and Improvement on FlashAF Ceph: Ceph Performance Analysis and Improvement on Flash
AF Ceph: Ceph Performance Analysis and Improvement on FlashCeph Community
 
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster Ceph Community
 
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Community
 
Ceph Day Seoul - Ceph on All-Flash Storage
Ceph Day Seoul - Ceph on All-Flash Storage Ceph Day Seoul - Ceph on All-Flash Storage
Ceph Day Seoul - Ceph on All-Flash Storage Ceph Community
 
Ceph Day Shanghai - Ceph Performance Tools
Ceph Day Shanghai - Ceph Performance Tools Ceph Day Shanghai - Ceph Performance Tools
Ceph Day Shanghai - Ceph Performance Tools Ceph Community
 
iSCSI Target Support for Ceph
iSCSI Target Support for Ceph iSCSI Target Support for Ceph
iSCSI Target Support for Ceph Ceph Community
 
Ceph Day Taipei - Ceph Tiering with High Performance Architecture
Ceph Day Taipei - Ceph Tiering with High Performance Architecture Ceph Day Taipei - Ceph Tiering with High Performance Architecture
Ceph Day Taipei - Ceph Tiering with High Performance Architecture Ceph Community
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
Ceph Day KL - Bluestore
Ceph Day KL - Bluestore Ceph Day KL - Bluestore
Ceph Day KL - Bluestore Ceph Community
 
Ceph Day Seoul - Community Update
Ceph Day Seoul - Community UpdateCeph Day Seoul - Community Update
Ceph Day Seoul - Community UpdateCeph Community
 

Viewers also liked (20)

Ceph Day Seoul - Ceph on Arm Scaleable and Efficient
Ceph Day Seoul - Ceph on Arm Scaleable and Efficient Ceph Day Seoul - Ceph on Arm Scaleable and Efficient
Ceph Day Seoul - Ceph on Arm Scaleable and Efficient
 
End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph Replication
 
Ceph de facto storage backend for OpenStack
Ceph de facto storage backend for OpenStack Ceph de facto storage backend for OpenStack
Ceph de facto storage backend for OpenStack
 
Ceph Performance and Optimization - Ceph Day Frankfurt
Ceph Performance and Optimization - Ceph Day Frankfurt Ceph Performance and Optimization - Ceph Day Frankfurt
Ceph Performance and Optimization - Ceph Day Frankfurt
 
Ceph Day Shanghai - Ceph in Ctrip
Ceph Day Shanghai - Ceph in CtripCeph Day Shanghai - Ceph in Ctrip
Ceph Day Shanghai - Ceph in Ctrip
 
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
 
Ceph Day Shanghai - Recovery Erasure Coding and Cache Tiering
Ceph Day Shanghai - Recovery Erasure Coding and Cache TieringCeph Day Shanghai - Recovery Erasure Coding and Cache Tiering
Ceph Day Shanghai - Recovery Erasure Coding and Cache Tiering
 
Ceph Day Taipei - Bring Ceph to Enterprise
Ceph Day Taipei - Bring Ceph to EnterpriseCeph Day Taipei - Bring Ceph to Enterprise
Ceph Day Taipei - Bring Ceph to Enterprise
 
librados
libradoslibrados
librados
 
Ceph Day Shanghai - CeTune - Benchmarking and tuning your Ceph cluster
Ceph Day Shanghai - CeTune - Benchmarking and tuning your Ceph cluster Ceph Day Shanghai - CeTune - Benchmarking and tuning your Ceph cluster
Ceph Day Shanghai - CeTune - Benchmarking and tuning your Ceph cluster
 
AF Ceph: Ceph Performance Analysis and Improvement on Flash
AF Ceph: Ceph Performance Analysis and Improvement on FlashAF Ceph: Ceph Performance Analysis and Improvement on Flash
AF Ceph: Ceph Performance Analysis and Improvement on Flash
 
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
 
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
 
Ceph Day Seoul - Ceph on All-Flash Storage
Ceph Day Seoul - Ceph on All-Flash Storage Ceph Day Seoul - Ceph on All-Flash Storage
Ceph Day Seoul - Ceph on All-Flash Storage
 
Ceph Day Shanghai - Ceph Performance Tools
Ceph Day Shanghai - Ceph Performance Tools Ceph Day Shanghai - Ceph Performance Tools
Ceph Day Shanghai - Ceph Performance Tools
 
iSCSI Target Support for Ceph
iSCSI Target Support for Ceph iSCSI Target Support for Ceph
iSCSI Target Support for Ceph
 
Ceph Day Taipei - Ceph Tiering with High Performance Architecture
Ceph Day Taipei - Ceph Tiering with High Performance Architecture Ceph Day Taipei - Ceph Tiering with High Performance Architecture
Ceph Day Taipei - Ceph Tiering with High Performance Architecture
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Ceph Day KL - Bluestore
Ceph Day KL - Bluestore Ceph Day KL - Bluestore
Ceph Day KL - Bluestore
 
Ceph Day Seoul - Community Update
Ceph Day Seoul - Community UpdateCeph Day Seoul - Community Update
Ceph Day Seoul - Community Update
 

Similar to Ceph Day Taipei - Accelerate Ceph via SPDK

Ceph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in CephCeph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in CephCeph Community
 
Ceph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephCeph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephDanielle Womboldt
 
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSAccelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSCeph Community
 
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red_Hat_Storage
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Community
 
JetStor portfolio update final_2020-2021
JetStor portfolio update final_2020-2021JetStor portfolio update final_2020-2021
JetStor portfolio update final_2020-2021Gene Leyzarovich
 
Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...Michelle Holley
 
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Odinot Stanislas
 
M|18 Intel and MariaDB: Strategic Collaboration to Enhance MariaDB Functional...
M|18 Intel and MariaDB: Strategic Collaboration to Enhance MariaDB Functional...M|18 Intel and MariaDB: Strategic Collaboration to Enhance MariaDB Functional...
M|18 Intel and MariaDB: Strategic Collaboration to Enhance MariaDB Functional...MariaDB plc
 
Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Community
 
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster Ceph Community
 
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...PT Datacomm Diangraha
 
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph clusterCeph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph clusterCeph Community
 
DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingMichelle Holley
 
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmarkThe Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmarkLenovo Data Center
 
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...Intel IT Center
 
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI ConvergenceDAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergenceinside-BigData.com
 
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryAccelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryDatabricks
 
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Community
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...In-Memory Computing Summit
 

Similar to Ceph Day Taipei - Accelerate Ceph via SPDK (20)

Ceph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in CephCeph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in Ceph
 
Ceph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephCeph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for Ceph
 
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSAccelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
 
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
 
JetStor portfolio update final_2020-2021
JetStor portfolio update final_2020-2021JetStor portfolio update final_2020-2021
JetStor portfolio update final_2020-2021
 
Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...
 
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
 
M|18 Intel and MariaDB: Strategic Collaboration to Enhance MariaDB Functional...
M|18 Intel and MariaDB: Strategic Collaboration to Enhance MariaDB Functional...M|18 Intel and MariaDB: Strategic Collaboration to Enhance MariaDB Functional...
M|18 Intel and MariaDB: Strategic Collaboration to Enhance MariaDB Functional...
 
Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
 
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
 
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
 
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph clusterCeph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
 
DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet Processing
 
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmarkThe Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
 
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
 
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI ConvergenceDAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence
 
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryAccelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
 
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
 

Recently uploaded

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Recently uploaded (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Ceph Day Taipei - Accelerate Ceph via SPDK

  • 9. Storage Performance Development Kit
    • Built on Intel® Data Plane Development Kit (DPDK): software infrastructure that accelerates packet input/output to the Intel CPU
    • User space Network Services (UNS): TCP/IP stack implemented as a polling, lock-light library, bypassing kernel bottlenecks and enabling scalability
    • User-space NVMe, Intel® Xeon®/Intel® Atom™ processor DMA, and Linux* AIO drivers: optimize back-end driver performance and prevent kernel bottlenecks from forming at the back end of the I/O chain
    • Reference software with example application: a customer-relevant example application leveraging Intel® Storage Acceleration Libraries (ISA-L) is included; support is provided on a best-effort basis
    • *Other names and brands may be claimed as the property of others.
  • 10. DPDK Framework (component diagram, summarized)
    • Core libraries (EAL): MBUF, MEMPOOL, RING, TIMER
    • Kernel interfaces: KNI, IGB_UIO, VFIO, UIO_PCI_GENERIC
    • Poll-mode drivers (native and virtual): E1000, IGB, IXGBE, I40E, FM10K, MLX4, MLX5, BNX2X, CXGBE, ENIC, NFP, SZEDATA2, MPIPE, ENA, VIRTIO, VMXNET3, XENVIRT, PCAP, RING, NULL, AF_PKT, BONDING (all behind ETHDEV)
    • Extensions: HASH, LPM, ACL, JOBSTAT, DISTRIB, IP FRAG, REORDER, POWER, VHOST, IVSHMEM
    • QoS: SCHED, METER; packet framework: PIPELINE, PORT, TABLE
    • Crypto accelerators: QAT, AESNI MB, AESNI GCM, SNOW 3G, NULL, future TBD
    • Targets network functions (cloud, enterprise, comms); integrates with VHOST and ISA-L
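To make the poll-mode model above concrete, a minimal DPDK receive loop looks roughly like the sketch below. It assumes the EAL, one Ethernet port (port 0), and an mbuf pool have already been initialized as in DPDK's basic forwarding example; rte_eth_rx_burst() and rte_pktmbuf_free() are standard DPDK calls, but all setup code is elided.

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    // Minimal polled RX loop: no interrupts, no syscalls on the data path.
    // Assumes rte_eal_init(), port configuration, and rte_eth_dev_start(0)
    // have already been done (see DPDK's skeleton/basicfwd example).
    static void poll_port0() {
        const uint16_t port_id = 0;
        const uint16_t queue_id = 0;
        struct rte_mbuf *bufs[32];

        for (;;) {
            // Busy-poll the NIC RX ring; returns immediately with 0..32 packets.
            uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, bufs, 32);
            for (uint16_t i = 0; i < nb_rx; i++) {
                // A user-space TCP/IP stack would consume the packet here.
                rte_pktmbuf_free(bufs[i]);   // placeholder: just drop it
            }
        }
    }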
  • 11. Intel® ISA-L Functions (performance-optimizing primitives)
    • Data protection: XOR (RAID 5), P+Q (RAID 6), Reed-Solomon erasure code
    • Compression: "DEFLATE", IGZIP fast compression
    • Cryptographic hashing: multi-buffer SHA-1, SHA-256, SHA-512, MD5
    • Data integrity: CRC-T10, CRC-IEEE (802.3), CRC32-iSCSI
    • Encryption: XTS-AES 128, XTS-AES 256
    • (Sender/receiver CRC and encryption flow diagrams omitted.)
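As a small illustration of the data-integrity functions listed above, ISA-L exposes CRC routines such as an IEEE 802.3 CRC32. The sketch below assumes the crc32_ieee() entry point from isa-l's crc.h with a (seed, buffer, length) signature; header path and exact prototypes can vary between releases, so check your installed headers.

    #include <isa-l/crc.h>   // assumption: header and function name as in recent isa-l releases
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    int main() {
        unsigned char buf[4096];
        std::memset(buf, 0xab, sizeof(buf));

        // Seed of 0; ISA-L dispatches to an optimized (e.g. PCLMULQDQ) routine at runtime.
        uint32_t crc = crc32_ieee(0, buf, sizeof(buf));
        std::printf("crc32_ieee = 0x%08x\n", crc);
        return 0;
    }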
  • 12. SPDK architecture overview
    • Extends Data Plane Development Kit concepts through an end-to-end storage context
      • Optimized, user-space lockless polling in the NIC driver, TCP/IP stack, iSCSI target, and NVMe driver
      • iSCSI and NVMe over Fabrics targets integrated
    • Exposes the performance potential of current and next-generation storage media
      • As media latencies move from µsec to nsec, storage software architectures must keep up
      • Permissive open-source license for user-space media drivers: NVMe and CBDMA drivers are on github.com
      • Media drivers support both Linux* and FreeBSD*
    • NVMf application and protocol library
      • Provisioning, fabric interface processing, memory allocation, fabric connection handling, RDMA data transfer
      • Discovery, subsystems, logical controller, capsule processing; manages the interface with the NVMe driver library
    • (Architecture diagram omitted: cloud read/write traffic → DPDK NIC driver → TCP/IP (UNS) → iSCSI target → block device abstraction → NVMe/CBDMA drivers, plus an NVMf target over RDMA verbs/RNIC via Linux* OFED.)
    • *Other names and brands may be claimed as the property of others.
  • 13. SPDK Packaging and Contents
    • Licensed package includes:
      • Media drivers: I/OAT DMA (CBDMA) and NVMe
      • Protocols: iSCSI and NVMe over Fabrics (NVMf)
      • Optimized libraries: DPDK and the UNS TCP/IP stack
    • User-space support code (written in C):
      • POSIX compliant
      • Demo/usage, unit tests (functional correctness), basic performance tests
      • API manuals – may include links to or copies of key papers
      • Release.txt – release notes, version, etc.
    • Source agreement
      • BSD-licensed code distributed via https://github.com/spdk
      • Licensed version (including UNS and other components in development) is available under a non-commercial restricted license and full software license agreement
      • All code is provided as reference software with a best-effort support model
  • 14. Performance comparison: User-space NVMe driver vs. Kernel NVMe driver
  • 15. 4KB random read performance: partition variants on 4 NVMe SSDs, single-core Intel® Xeon® processor
    • Chart: IOPS (thousands) at 1, 2, 4, 8, and 16 partitions, kernel NVMe driver vs. SPDK NVMe driver
    • Takeaway: the SPDK NVMe driver delivers up to 6x higher performance than the kernel NVMe driver on a single Intel® Xeon® core
    • Disclaimer: software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.
  • 16. 4KB random read performance: 1 to 4 NVMe SSDs, single-core Intel® Xeon® processor
    • Chart: IOPS (thousands) with 1, 2, and 4 NVMe drives, kernel NVMe driver vs. SPDK NVMe driver
    • Takeaway: the SPDK NVMe driver scales linearly in performance from 1 to 4 NVMe drives on a single Intel® Xeon® core
    • Disclaimer: same Intel performance disclaimer as slide 15.
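The usage pattern behind these numbers is SPDK's asynchronous, polled-mode NVMe API: the application submits an I/O with a completion callback and then polls the queue pair instead of sleeping on an interrupt. The sketch below assumes a controller, namespace, and I/O queue pair have already been set up as in SPDK's hello_world example; spdk_nvme_ns_cmd_read() and spdk_nvme_qpair_process_completions() are real SPDK calls, but the setup and DMA buffer allocation are elided and signatures can differ slightly between SPDK releases.

    #include <spdk/nvme.h>

    struct read_ctx { bool done = false; };

    // Completion callback, invoked from spdk_nvme_qpair_process_completions().
    static void read_done(void *arg, const struct spdk_nvme_cpl *cpl) {
        static_cast<read_ctx *>(arg)->done = true;
    }

    // Submit a one-block read of LBA 0 and busy-poll until it completes.
    // 'ns', 'qpair', and the DMA-able 'buf' come from SPDK init code (not shown).
    static void read_one_block(struct spdk_nvme_ns *ns,
                               struct spdk_nvme_qpair *qpair,
                               void *buf) {
        read_ctx ctx;
        int rc = spdk_nvme_ns_cmd_read(ns, qpair, buf,
                                       /*lba=*/0, /*lba_count=*/1,
                                       read_done, &ctx, /*io_flags=*/0);
        if (rc != 0) return;              // queue full or invalid request

        while (!ctx.done) {
            // Polled mode: reap completions ourselves, no interrupt, no syscall.
            spdk_nvme_qpair_process_completions(qpair, /*max_completions=*/0);
        }
    }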
  • 17. Performance comparison: iSCSI target in SPDK vs. Linux-IO target
  • 18. Intel® Xeon® Processor v3 – 4KB iSCSI random read: SPDK vs. LIO
    • Charts: IOPS (thousands, bars) vs. latency in msec (lines), and IOPS vs. number of cores utilized, at queue depths 1–32, for SPDK (2 and 4 cores) and LIO (2 cores and unlimited cores)
    • Takeaway: SPDK can provide similar IOPS and latency characteristics to LIO while using up to 8 fewer cores
    • Disclaimer: same Intel performance disclaimer as slide 15.
  • 19. Intel® Xeon® Processor v3 – 4KB iSCSI random write: SPDK vs. LIO
    • Charts: IOPS (thousands, bars) vs. latency in msec (lines), and IOPS vs. number of cores utilized, at queue depths 1–32, for SPDK (2 and 4 cores) and LIO (2 cores and unlimited cores)
    • Takeaway: SPDK can provide similar IOPS and latency characteristics to LIO while using up to 2 fewer cores
    • Disclaimer: same Intel performance disclaimer as slide 15.
  • 20. Intel® Xeon® Processor E5-2620 v2 – iSCSI read/write, 4KB data, NVM Express backend
    • Charts: IOPS (thousands) for 4KB random 100% read, 70% read / 30% write, and 100% write; SPDK (1 and 2 cores) vs. LIO (6 cores)
    • Takeaway: up to a 650% increase in maximum performance per core
    • Source: Intel internal measurements as of 22 August 2014; see backup slides #10–13 for configuration details
    • Disclaimer: same Intel performance disclaimer as slide 15, plus Intel's compiler optimization notice (revision #20110804): Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors (SSE2, SSE3, and SSSE3 instruction sets and other optimizations); Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel; certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors.
  • 22. SPDK NVMf performance approaches local NVMe
    • Efficiency and scalable performance: NVMe >2M IOPS per Xeon-D core; NVMf 1.2M IOPS per Xeon-D core
    • Optimized for Intel® architecture and NVMe: latency and jitter reduced; leaves CPU cycles for the storage application and value-add; 4x the efficiency of the kernel NVMe driver
    • Chart: single-core IOPS (millions) with 1x/2x/4x NVMe drives, comparing the kernel NVMe driver, SPDK NVMe driver, and SPDK NVMf target (Intel® Xeon® processor D, Intel P3700 800GB SSDs, FIO 2.2.9, direct=1, iodepth=128 per LUN)
    • Configuration – SPDK NVMf target: Intel Xeon-D processor D-1567; 1 socket; 12 cores / 12 threads per socket; 32GB memory; Mellanox ConnectX-4 EN adapter (25Gbps), dual port; MTU 1500; Fedora 23; Linux kernel 4.4.3-300.fc23.x86_64; NVMf target: Intel SPDK
    • Configuration – storage array: 4x Intel SSD DC P3700 series, 800GB
    • Configuration – NVMf initiator: Dell PowerEdge 730xd; Intel Xeon processor E5-2699 v3 (45M cache, 2.30 GHz); 2 sockets populated; 18 cores / 18 threads per socket; 132GB memory; Mellanox ConnectX-4 EN adapter (25Gbps), dual port; MTU 1500; RHEL 7.2; Linux kernel 4.5.0-rc3
    • Tested by Intel, 3/22/2016
  • 23. NVMf I/O latency model, 4KB 100% random read
    • Setup: FIO (libaio, block I/O, direct=1, 1 worker per device, queue depth 1) on the NVMf client → SPDK NVMf target → SPDK NVMe library → NVMe controller
    • ~93 µsec round-trip time measured from the NVMf client
    • Of the 93 µsec, ~80 µsec is spent in the NVMe controller (diagram annotations show 85.6–85.7 µsec between spdk_lib_read_start and spdk_lib_read_complete)
    • 12–13 µsec of measured time over the fabric (~6.5–6.7 µsec per segment in the diagram)
    • The SPDK NVMf target adds roughly 3% to the fabric overhead
    • Disclaimer: same Intel performance disclaimer as slide 15.
  • 24. What can SPDK do to improve Ceph?
    • Accelerate the backend I/O in the Ceph OSD (object storage daemon)
      • Key solution: replace the kernel driver with the user-space NVMe driver provided by SPDK to accelerate I/O on NVMe SSDs (see the configuration sketch after this list)
    • Accelerate client I/O performance against the Ceph cluster
      • Key solution: use the accelerated iSCSI application and the user-space NVMe driver in SPDK to build a caching solution in front of Ceph clusters
    • Accelerate network (TCP/IP) performance on Ceph's internal network
      • Key solution: replace the existing kernel network stack on each OSD node with DPDK plus a user-space TCP/IP stack (e.g., libuns, Seastar, mTCP, etc.)
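For illustration, the first and third items map onto Ceph configuration switches: pointing BlueStore's block device at an SPDK-managed NVMe device and selecting the DPDK-backed AsyncMessenger. The fragment below is a hypothetical ceph.conf sketch; the option names (bluestore_block_path, ms_type, ms_dpdk_*) exist in BlueStore-era Ceph, but the exact SPDK device-path syntax and the available ms_dpdk_* options vary by release, so treat every value as a placeholder.

    # Hypothetical ceph.conf fragment (values are placeholders)
    [osd]
    # Hand BlueStore's data device to SPDK's user-space NVMe driver.
    # The "spdk:" prefix selects the SPDK-backed NVMeDevice; the identifier
    # format (device serial vs. PCIe transport id) differs between releases.
    bluestore_block_path = spdk:0000:01:00.0

    [global]
    # Use the DPDK-backed AsyncMessenger instead of the kernel TCP/IP stack.
    ms_type = async+dpdk
    ms_dpdk_coremask = 0x3                 # cores dedicated to DPDK polling
    ms_dpdk_host_ipv4_addr = 172.16.0.10
    ms_dpdk_gateway_ipv4_addr = 172.16.0.1
    ms_dpdk_netmask_ipv4_addr = 255.255.255.0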
  • 25. Outline • Background • SPDK introduction • XSKY’s BlueStore • Conclusion
  • 26. BlueStore: consume the raw block device
    • Key/value database (RocksDB) for metadata (see the sketch after this list)
    • Data written directly to the block device
      • Write-through cache
    • Pluggable block allocator (policy)
    • Adaptive driver policy
      • Kernel and user-space drivers coexist
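BlueStore's metadata (object extents, checksums, omap, allocator state) lands in RocksDB as atomic write batches. The sketch below is not BlueStore code; it only illustrates, with the stock RocksDB C++ API, what "metadata goes through a key/value transaction" means. The key names are invented for illustration.

    #include <rocksdb/db.h>
    #include <rocksdb/write_batch.h>
    #include <cassert>

    int main() {
        rocksdb::DB *db = nullptr;
        rocksdb::Options options;
        options.create_if_missing = true;

        // Open (or create) the metadata store; BlueStore actually embeds
        // RocksDB on BlueFS / a small partition rather than a plain directory.
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/kv-sketch", &db);
        assert(s.ok());

        // One object write = one atomic batch of metadata updates.
        rocksdb::WriteBatch batch;                       // illustrative keys only
        batch.Put("onode/pool1/object_A", "extent map, size, csums ...");
        batch.Put("omap/pool1/object_A/user_key", "user value");

        rocksdb::WriteOptions wopts;
        wopts.sync = true;                               // durable on commit
        s = db->Write(wopts, &batch);
        assert(s.ok());

        delete db;
        return 0;
    }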
  • 27. Performance status – sequential write (HDD); chart from Sage's 06/21 talk
  • 28. Performance status – random write (HDD); chart from Sage's 06/21 talk
  • 29. BlueStore status (from Sage's 06/21 talk)
    • Done
      • Fully functional I/O path with checksums and compression
      • fsck
      • Bitmap-based allocator and freelist
    • Current efforts
      • Optimize metadata encoding efficiency
      • Performance tuning
      • ZetaScale key/value DB as a RocksDB alternative
      • Bounds on compressed blob occlusion
    • Coming soon
      • Per-pool properties that map to compression, checksum, and I/O hints; more performance optimization
      • Native SMR support for high-density HDDs
      • Leverage SPDK (bypass the kernel for NVMe devices)
  • 31. Performance Bottleneck – kernel AIO library
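For reference, the kernel AIO path that this slide calls out works roughly as below: every submission and every completion reap is a system call into the kernel, which is exactly the per-I/O overhead a user-space polled driver avoids. This is a generic libaio sketch against a hypothetical file, not BlueStore's KernelDevice code.

    #include <libaio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdlib>
    #include <cstring>

    int main() {
        int fd = open("aio-sketch.dat", O_RDWR | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) return 1;

        io_context_t ctx = nullptr;
        io_setup(128, &ctx);                      // syscall #1: create the AIO context

        void *buf = nullptr;
        posix_memalign(&buf, 4096, 4096);         // O_DIRECT needs aligned buffers
        std::memset(buf, 0, 4096);

        struct iocb cb;
        struct iocb *cbs[1] = { &cb };
        io_prep_pwrite(&cb, fd, buf, 4096, 0);    // describe one 4KB write at offset 0
        io_submit(ctx, 1, cbs);                   // syscall #2: submit

        struct io_event events[1];
        io_getevents(ctx, 1, 1, events, nullptr); // syscall #3: block until completion

        io_destroy(ctx);
        free(buf);
        close(fd);
        return 0;
    }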
  • 32. Other kernel bottlenecks
    • Non-local connections: NIC RX and the application run on different cores
    • Global TCP control block management
    • Socket API overhead (building the connection)
  • 34. DPDK-Messenger plugin: creates an alternative data path
  • 35. Design — asynchronous, polled mode, zero-copy
    • TCP, IP, ARP, DPDK device:
      • Hardware feature offloads
      • TCP/IP stack ported from Seastar
      • Integrated with Ceph's libraries
    • Event-driven:
      • User-space event center (epoll-like); see the sketch below
    • NetworkStack API:
      • Basic network interface with zero-copy or non-zero-copy paths
      • Keeps the PosixStack and the DPDK stack compatible
    • AsyncMessenger:
      • A collection of connections
      • Network error policy
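To show what an epoll-like, user-space "event center" amounts to, here is a deliberately tiny polling dispatcher. It is not Ceph's EventCenter API (which has its own registration and driver interfaces); it only illustrates the register-callback / poll-ready / dispatch cycle that both the POSIX and DPDK backends plug into.

    #include <functional>
    #include <unordered_map>
    #include <vector>

    // Minimal event-center sketch: connections register a readable callback;
    // a backend driver (epoll for PosixStack, DPDK polling for the DPDK stack)
    // reports which ids are ready; we dispatch without blocking.
    class TinyEventCenter {
    public:
        using Callback = std::function<void()>;

        void register_event(int id, Callback on_readable) {
            handlers_[id] = std::move(on_readable);
        }

        // 'poll_backend' stands in for epoll_wait() or a DPDK rx-burst scan.
        void run_once(const std::function<std::vector<int>()> &poll_backend) {
            for (int id : poll_backend()) {
                auto it = handlers_.find(id);
                if (it != handlers_.end())
                    it->second();               // run-to-completion on this core
            }
        }

    private:
        std::unordered_map<int, Callback> handlers_;
    };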
  • 36. Shared-nothing TCP/IP (local TCP/IP) — asynchronous, polled mode, zero-copy
    • Local listen table → low latency
    • Local connection processing → run-to-completion
    • TCP 5-tuple → RX/TX core selection (RSS); toy example below
    • The mbuf travels through the whole I/O stack → no context switch
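The point of steering by 5-tuple is that every packet of a given connection lands on the same core, so no TCP state is shared between cores. Real NICs do this in hardware with a Toeplitz hash programmed through RSS; the toy function below only shows the idea of a deterministic flow-to-core mapping and is not the actual RSS algorithm.

    #include <cstdint>
    #include <functional>

    struct FiveTuple {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint8_t  proto;
    };

    // Toy stand-in for NIC RSS: same flow -> same core, always.
    // (Hardware uses a Toeplitz hash over these fields; this is just std::hash.)
    inline unsigned flow_to_core(const FiveTuple &t, unsigned nb_cores) {
        uint64_t key = (uint64_t(t.src_ip) << 32) ^ t.dst_ip;
        key ^= (uint64_t(t.src_port) << 16) ^ t.dst_port;
        key ^= uint64_t(t.proto) << 40;
        return std::hash<uint64_t>{}(key) % nb_cores;
    }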
  • 37. NVMe device — asynchronous, polled mode, zero-copy
    • Status
      • User-space NVMe library (SPDK)
      • Already in the Ceph master branch
      • DPDK integrated
      • I/O data flows from the NIC (DPDK mbuf) to the device
    • Missing part (planned for Q4'16)
      • User-space cache
  • 39. Improvements
    • Charts: IOPS and average latency for random 4KB read and random 4KB write, comparing the kernel and user-space stacks
  • 40. BlueStore roadmap — asynchronous, polled mode, zero-copy
    • Core logic
      • No signal/wait
      • Future/promise style (see the sketch below)
      • Fully asynchronous
    • Memory allocation
      • rte_malloc is not efficient enough
      • mbuf life-cycle control
    • Full user-space logic
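The future/promise style mentioned in the roadmap can be illustrated with plain std::future/std::promise: the submitter gets a future back immediately, the completion path fulfils the promise, and the caller polls for readiness instead of sleeping on a condition. Seastar/Ceph use their own future type with continuations on a single reactor thread; the standard-library version below is only meant to show the control flow.

    #include <chrono>
    #include <future>
    #include <memory>
    #include <thread>

    // Stand-in for an async write submission: returns a future right away,
    // while a separate completion path (here, a thread) fulfils the promise.
    std::future<int> submit_async_write(std::thread &completion) {
        auto prom = std::make_shared<std::promise<int>>();
        std::future<int> fut = prom->get_future();

        completion = std::thread([prom] {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            prom->set_value(0);                 // write "completed" with rc 0
        });
        return fut;
    }

    int main() {
        std::thread completion;
        std::future<int> fut = submit_async_write(completion);

        // Polled-mode consumer: check readiness, do other work in between.
        while (fut.wait_for(std::chrono::seconds(0)) != std::future_status::ready) {
            /* poll NIC / NVMe queues, run other continuations ... */
        }
        int rc = fut.get();
        completion.join();
        return rc;
    }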
  • 41. Outline • Background • SPDK introduction • XSKY’s BlueStore • Conclusion
  • 42. Summary
    • Ceph has performance issues with the emerging fast network and storage devices.
    • Storage systems need to be refactored to catch up with the hardware.
    • Ceph should move toward a shared-nothing implementation.
    • We introduced SPDK and BlueStore to address the current issues in Ceph:
      • SPDK: libraries (e.g., the user-space NVMe driver) that can be used for performance acceleration.
      • BlueStore: a new object store that implements a lockless, asynchronous, high-performance storage service.
    • Lots of details remain to be worked out (coming soon).
  • 44. Overview: ObjectStore and data model (backup)
    • ObjectStore
      • Abstract interface for storing local data
      • Decouples data and metadata: EBOFS, FileStore
    • EBOFS highlights
      • A user-space, extent-based object file system
      • Deprecated in favor of FileStore on btrfs in 2009
    • Object – "file"
      • Data (file-like byte stream)
      • Attributes (small key/value)
      • Omap (unbounded key/value)
    • Collection – "directory"
      • Placement group shard (slice of the RADOS pool)
      • Sharded by a 32-bit hash value
    • All writes are transactions (see the sketch below)
      • Atomic + consistent + durable
      • Isolation provided by the OSD
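The "all writes are transactions" property surfaces even at the client: librados lets you batch several updates to one object into a single ObjectWriteOperation that the OSD applies atomically, and internally the OSD expresses this as an ObjectStore::Transaction against the backend (FileStore/BlueStore). The sketch below uses the public librados C++ API; the pool name and object name are placeholders.

    #include <rados/librados.hpp>
    #include <string>

    int main() {
        librados::Rados cluster;
        cluster.init("admin");                    // client.admin; keyring from ceph.conf
        cluster.conf_read_file(nullptr);          // default config search path
        if (cluster.connect() < 0) return 1;

        librados::IoCtx io;
        if (cluster.ioctx_create("rbd", io) < 0)  // placeholder pool name
            return 1;

        // Several updates to one object, applied atomically by the OSD.
        librados::bufferlist data, xattr;
        data.append(std::string("hello bluestore"));
        xattr.append(std::string("v1"));

        librados::ObjectWriteOperation op;
        op.write_full(data);                      // replace object data
        op.setxattr("version", xattr);            // and set an attribute

        int r = io.operate("object_A", &op);      // placeholder object name
        cluster.shutdown();
        return r == 0 ? 0 : 1;
    }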

Editor's Notes

  1. Encryption is for maintaining data confidentiality and requires the use of a key (kept secret) in order to return to plaintext. Hashing is for validating the integrity of content by detecting any modification through obvious changes to the hash output. CRCs are good at catching random errors in transmission but provide little protection against an intentional attack on your data. RAID provides the parity functions; erasure coding provides encode/decode.
  2. Summary of DPDK as background (this is NOT a description of SPDK): if you look at a storage system, you have a wire that goes into the box, and that wire has a NIC driver. The NIC driver comes from DPDK. It is a very tailored driver that runs in user space and does what is called "polling" — DPDK does not take an interrupt — which makes DPDK very fast compared to traditional operating systems.
  DPDK runs in user space, not kernel space. That matters because kernel space is very close to the hardware and is where drivers normally run; you have extra privileges there and can damage your system because of those privileges. User space is a little more protected, with fewer privileges. When you transition from user space to kernel space there is a context switch (just like the interrupt described earlier). For every I/O operation there are many more user-to-kernel transitions than interrupts — hundreds of thousands per second — and taken as a whole this consumes a painful amount of CPU. A lot of what DPDK does is get rid of those context switches.
  DPDK is also software that does not need locks. A general-purpose operating system has to be set up to handle anything, at any time, for any application that comes in over the wire, which requires a lot of synchronization between cores and threads, and that synchronization is expensive. You get rid of that problem by creating software that is essentially lockless, which lets the software scale: every core you add can increase performance linearly — a second core can double performance, a third can triple it.
  SPDK takes these concepts and applies them to storage with iSCSI and NVMe. Taken together, you end up with a specific instance of a storage system. Looking at the data flow: bits come in from the left and go through the NIC; the DPDK NIC driver runs in user space and polls; it passes the bits to a TCP/IP stack, in this case user-mode network services (UNS). The TCP/IP stack does a lot of work to make sure those bits were intended for you and are properly formed, then hands them to an iSCSI target, which speaks the block-based language of SCSI. The user storage application looks at the SCSI request and decides how to service it — a SCSI read, a SCSI write, a SCSI inquiry asking what the device is capable of, and so on. The storage application is where our customers come in: it is their opportunity to add differentiating value such as inline deduplication, erasure coding, compression, hashing, etc. After the user storage application completes its operation, it persists the data somewhere — in this case via NVMe, the Intel-prescribed way of building PCIe-based SSDs. NVMe is an open standard, so we can create a driver that can be open sourced. From there, the data gets stored.
  WKB is currently under development. It will run on Linux or FreeBSD, targeting a WKB release in 2015. WKB is also a good system-level vehicle for demonstrating ISA-L, since it lets us provide real-world performance numbers for ISA-L; previously, all we could show was cycles per byte for the algorithm.
  3. These are not the easiest charts to read. In the left chart, the lines are latency and the bars are throughput. The key takeaway is that even at a reasonably high queue depth (16) the SPDK stack delivers sub-millisecond latency at 500K IOPS using just two cores. In the right chart, the bars are the same, but the lines show the number of cores consumed — this highlights the difference in efficiency between similar levels of performance. The dark blue bar (LIO + kernel TCP/IP) tracks along with the SPDK performance, but the line shows how many more cores are required to keep up. Even at a queue depth of 1, SPDK is twice as efficient as the kernel stack, and that advantage grows as queue depth (and thus overall throughput) increases. Latency is measured from the initiator side; it is the "avg. latency" metric computed by FIO, which is the sum of submission and completion latencies. All latency measurements are in milliseconds (msec). The LIO 2-core and SPDK 2-core data were collected by restricting the system to only 2 cores: # echo 0 > /sys/devices/system/cpu/cpu{}/online
  4. This is where it would be good to have actual data to update this foil. Our demo runs at the performance of the right-most bar for 4 NVMe devices. Configuration details:
  SPDK NVMf target: Intel Xeon-D processor D-1567; 1 socket; 12 cores / 12 threads per socket; 32GB memory; Mellanox ConnectX-4 EN adapter (25Gbps), dual port; MTU 1500; Fedora 23; Linux kernel 4.4.3-300.fc23.x86_64; NVMf target: Intel SPDK.
  Storage array: 4x Intel SSD DC P3700 series, 800GB.
  NVMf initiator: Dell PowerEdge 730xd; Intel Xeon processor E5-2699 v3 (45M cache, 2.30 GHz); 2 sockets populated; 18 cores / 18 threads per socket; 132GB memory; Mellanox ConnectX-4 EN adapter (25Gbps), dual port; MTU 1500; RHEL 7.2; Linux kernel 4.5.0-rc3.