© 2018 Arm Limited
Jun He, jun.he@arm.com
Tone Zhang, tone.zhang@arm.com
2018/3/9
Accelerate Ceph by SPDK on AArch64
SPDK
What’s SPDK?
• Storage Performance Development Kit
• A set of tools and libraries for building high-performance, scalable, user-mode storage applications
• Designed for new storage hardware (NVMe); can achieve millions of IOPS per core with better tail latency
[Figure: SPDK architecture diagram]
SPDK on AArch64
• Several Arm-related patches have been merged
  • Memory barriers
  • Virtual address (VA) space
• 17.10 release verified
  • Kernel: 4.11, 48-bit/42-bit VA, 4KB page size
  • UIO/VFIO
SPDK Performance on AArch64
• SPDK perf
• UIO/4K pagesize
• FIO
[Chart: RandRead, Kernel vs. SPDK, showing IOPS, bandwidth and latency]
[Chart: RandWrite, Kernel vs. SPDK, showing IOPS, bandwidth and latency]
FIO configuration: direct=1, bs=4096, rwmixread=50, iodepth=32, ramp=30s, run_time=180s, jobs=1
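The IOPS and bandwidth metrics in these charts are tied together by the block size. As a rough sketch, assuming the bs=4096 value from the FIO configuration above (the IOPS input below is a hypothetical round number, not a measurement from the slides), the conversion can be written as:

```python
def bandwidth_mib_per_s(iops: float, block_size_bytes: int = 4096) -> float:
    """Bandwidth in MiB/s implied by an IOPS rate at a fixed block size."""
    return iops * block_size_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical input, chosen only to illustrate the conversion.
    example_iops = 1_000_000
    mib_s = bandwidth_mib_per_s(example_iops)
    print(f"{example_iops} IOPS at 4 KiB blocks is about {mib_s:.2f} MiB/s")
```

Because the block size is fixed, the IOPS and bandwidth bars for a given run carry the same information at different scales.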
SPDK Performance on AArch64
• FIO
[Chart: RandRW - read, Kernel vs. SPDK, showing IOPS, bandwidth and latency]
[Chart: RandRW - write, Kernel vs. SPDK, showing IOPS, bandwidth and latency]
FIO configuration: direct=1, bs=4096, rwmixread=50, iodepth=32, ramp=30s, run_time=180s, jobs=1
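For readers who want to reproduce a run like this, the configuration line above could be turned into a fio job file roughly as follows. This is a sketch under assumptions: the slide's "ramp", "run_time" and "jobs" are mapped to fio's `ramp_time`, `runtime` and `numjobs` options, while the `libaio` engine, the `randrw` mode, the job name and the device path are guesses for the kernel-driver case and do not come from the slides.

```python
import configparser
import io


def build_fio_job(device: str = "/dev/nvme0n1") -> str:
    """Render a fio job file for a mixed random read/write run.

    The device path is a placeholder. Writing to a raw device destroys
    its contents, so point this at a disposable device only.
    """
    job = configparser.ConfigParser()
    job["global"] = {
        "direct": "1",
        "bs": "4096",
        "rwmixread": "50",
        "iodepth": "32",
        "ramp_time": "30s",   # slide: ramp=30s
        "runtime": "180s",    # slide: run_time=180s
        "numjobs": "1",       # slide: jobs=1
    }
    job["randrw-kernel"] = {   # hypothetical job name
        "ioengine": "libaio",  # assumption: kernel path driven through libaio
        "rw": "randrw",
        "filename": device,
    }
    out = io.StringIO()
    # fio job files use key=value, so drop the spaces around "=".
    job.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    print(build_fio_job())
```

Saving the output to a file and passing that file to fio would run the job; the SPDK case needs a different I/O engine and is not covered by this sketch.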
What’s next?
• Optimization with ASIMD and Crypto extensions
• Tuning with different page sizes (16KB/64KB)
• Cache strategy improvement for better read/write performance
Ceph
What’s Ceph?
• Ceph is a unified, distributed storage system designed for excellent performance, reliability and scalability
• Ceph provides the following services
  • Object storage
  • Block storage
  • File system
• Backend storage types
  • FileStore
  • BlueStore
BlueStore
BlueStore is a new storage backend for Ceph.
• Built-in compression for all data
• Checksums for all data
• Better performance
• Bypasses the file system and writes all data to the raw device through the asynchronous libaio interface
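The idea behind checksumming all data can be illustrated with a toy example. The sketch below is not BlueStore code: it is a hypothetical in-memory store that keeps a CRC next to every block and verifies it on read, with `zlib.crc32` standing in for whichever checksum algorithm a real deployment is configured to use.

```python
import zlib


class ChecksummedStore:
    """Toy in-memory block store that verifies a CRC on every read."""

    def __init__(self) -> None:
        self._blocks = {}  # offset -> (data, crc)

    def write(self, offset: int, data: bytes) -> None:
        # Store the checksum alongside the data at write time.
        self._blocks[offset] = (data, zlib.crc32(data))

    def read(self, offset: int) -> bytes:
        # Recompute the checksum and refuse to return corrupted data.
        data, stored_crc = self._blocks[offset]
        if zlib.crc32(data) != stored_crc:
            raise IOError(f"checksum mismatch at offset {offset}")
        return data


if __name__ == "__main__":
    store = ChecksummedStore()
    store.write(0, b"hello ceph")
    print(store.read(0))
```

Verifying on every read lets silent corruption on the device surface as an explicit error instead of being returned to the client.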
Ceph on AArch64
• Already integrated with OpenStack
• Validated and released by the Linaro SDI team
• Many patches committed to fix functional faults and improve performance
• “Ceph + SPDK” validated on top of NVMe devices
• Ceph performance tuned on AArch64
Ceph + SPDK on AArch64
• Dependencies
  • NVMe device
  • SPDK/DPDK
  • BlueStore
• Enabled SPDK in Ceph on AArch64
  • Extended the virtual address map in DPDK from 47 to 48 bits
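To make the 47-bit versus 48-bit difference concrete, the short sketch below computes the size of each address space and checks whether an address fits. The example address is hypothetical, chosen only because it has bit 47 set; it is not taken from DPDK or from the slides.

```python
TIB = 1 << 40  # bytes in one tebibyte


def va_space_tib(bits: int) -> int:
    """Size, in TiB, of an address space with the given number of VA bits."""
    return (1 << bits) // TIB


def fits_in_va(addr: int, bits: int) -> bool:
    """True if addr is representable with the given number of VA bits."""
    return 0 <= addr < (1 << bits)


if __name__ == "__main__":
    for bits in (42, 47, 48):
        print(f"{bits}-bit VA: {va_space_tib(bits)} TiB")
    # Hypothetical user-space address with bit 47 set.
    addr = 0xFFFF_8000_0000
    print(hex(addr), "fits in 47 bits:", fits_in_va(addr, 47))
    print(hex(addr), "fits in 48 bits:", fits_in_va(addr, 48))
```

Any address with bit 47 set falls outside a 47-bit map, so a map of that size cannot describe every address a process may receive on a 48-bit VA kernel.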
Ceph + SPDK on AArch64
BlueStore is a new storage backend for Ceph.
• BlueStore can utilize SPDK
• Replace kernel driver with SPDK user space NVMe driver
• Abstract BlockDevice on top of SPDK NVMe driver
[Architecture diagram: the CEPH RBD, Object and CEPHFS services sit on top of FileStore or BlueStore. In BlueStore, metadata goes through RocksDB, BlueRocksENV and BlueFS to a BlockDevice, which is backed by either the kernel NVMe driver or the SPDK NVMe driver on the NVMe device.]
Ceph + SPDK Performance test on AArch64
Test case
• Ceph cluster
  • Two OSDs, one MON, no MDS or RGW
  • One NVMe card per OSD
  • CPU: 2.4GHz multi-core
• Client
  • CPU: 2.0GHz multi-core
• Test tool
  • Fio (v2.2.10)
• Test cases
  • Sequential write with different block sizes (4KB, 8KB and 16KB)
  • 1 and 2 fio streams
[Diagram: a client connected to a Ceph cluster consisting of OSD1, OSD2 and MON]
Write performance result
1 stream
[Charts: IOPS at 4KB, 8KB and 16KB, and latency in msec at 4KB, 8KB and 16KB, each comparing Kernel NVMe and SPDK with 1, 2 and 4 cores]
1 fio stream, FIO configuration: bs=4K/8K/16K, rw=write, iodepth=384, run_time=40s, jobs=1, ioengine=rbd
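With a fixed queue depth, IOPS and mean latency are two views of the same result. As a sketch, assuming the queue of 384 outstanding requests stays full and the system is in steady state, Little's law gives the mean latency directly; the IOPS inputs below are hypothetical values, not readings from the charts.

```python
def mean_latency_ms(iodepth: int, iops: float) -> float:
    """Little's law: mean latency = outstanding requests / completion rate."""
    return 1000.0 * iodepth / iops


if __name__ == "__main__":
    # Hypothetical IOPS values, chosen only to illustrate the relationship.
    for iops in (1500, 3000):
        ms = mean_latency_ms(384, iops)
        print(f"iodepth=384 at {iops} IOPS gives about {ms:.0f} msec")
```

This is why latencies of hundreds of milliseconds are expected at this queue depth: each request waits behind the others already queued.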
Write performance result
2 streams
[Charts: IOPS at 4K, 8K and 16K, each comparing SPDK and Kernel NVMe with 1, 2 and 4 cores]
2 fio streams, FIO configuration: bs=4K/8K/16K, rw=write, iodepth=384, run_time=40s, jobs=1, ioengine=rbd
Performance improvement
SPDK accelerates Ceph in the following ways:
• More IOPS
• Lower latency
• Linear scaling with the number of CPU cores
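One way to check a linear-scaling claim against measured numbers is to compute scaling efficiency, meaning the IOPS at N cores divided by N times the per-core IOPS of the smallest configuration. The helper below is a sketch, and the sample figures in it are hypothetical, not results from this talk.

```python
def scaling_efficiency(iops_by_cores: dict) -> dict:
    """Efficiency per core count, relative to the smallest configuration.

    A value of 1.0 means perfectly linear scaling.
    """
    base_cores = min(iops_by_cores)
    per_core_base = iops_by_cores[base_cores] / base_cores
    return {
        cores: iops / (cores * per_core_base)
        for cores, iops in iops_by_cores.items()
    }


if __name__ == "__main__":
    # Hypothetical measurements for 1, 2 and 4 cores.
    sample = {1: 1000.0, 2: 1800.0, 4: 3000.0}
    for cores, eff in scaling_efficiency(sample).items():
        print(f"{cores} core(s): efficiency {eff:.2f}")
```

Values close to 1.0 at every core count would support the linear-scaling statement; values that fall as cores are added would point to a shared bottleneck.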
What’s next?
• Continue improving Ceph performance on top of SPDK
• Enable NVMe-oF and RDMA
• Enable zero-copy in Ceph
• Simplify the locking in Ceph to improve OSD daemon performance
• Switch PAGE_SIZE to 16KB and 64KB to improve memory performance
• Modify NVMEDEVICE to improve its performance with different PAGE_SIZE values
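One reason page size matters is the number of pages, and therefore address translations, needed to cover a buffer of a given size. The sketch below counts pages for the three page sizes mentioned in this talk; the 2 MiB buffer is a hypothetical example, and the link to performance is an assumption rather than a result from the slides.

```python
import os


def pages_needed(buffer_bytes: int, page_size: int) -> int:
    """Number of pages required to cover a buffer, rounded up."""
    return -(-buffer_bytes // page_size)


if __name__ == "__main__":
    buffer_bytes = 2 * 1024 * 1024  # hypothetical 2 MiB I/O buffer
    for page_kib in (4, 16, 64):
        count = pages_needed(buffer_bytes, page_kib * 1024)
        print(f"{page_kib:>2} KiB pages: {count}")
    # Page size of the machine running this script (Unix only).
    print("current PAGE_SIZE:", os.sysconf("SC_PAGE_SIZE"))
```

Going from 4KB to 64KB pages cuts the page count for the same buffer by a factor of 16, which is the kind of effect the PAGE_SIZE experiments would measure.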
Thank You
Danke
Merci
谢谢
ありがとう
Gracias
Kiitos
감사합니다
धन्यवाद
‫תודה‬
HKG18-112 - Accelerate Ceph by SPDK on AArch64
