Emerging Storage Solutions (EMS) SanDisk Confidential 1
CEPH Performance on XIO
Emerging Storage Solutions (EMS) SanDisk Confidential 2
Setup
 4 OSDs, one per SSD (4TB)
 4 pools, 4 rbd images (one per pool)
 1 physical client box; 4 fio_rbd clients in total, each running 8 jobs (num_jobs) * 32 iodepth = 256 QD (see the fio sketch below)
 Block size = 4K, 100% random read (RR)
 Working set ~4 TB
 Code base is latest ceph master
 Server has 40 cores and 64 GB RAM
 Shards : thread_per_shard = 25:1
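For reference, here is a minimal sketch of how one of the four fio_rbd clients could be driven with the parameters above (rbd ioengine, 8 jobs, iodepth 32, 4K random read). The pool/image names, the cephx user, and the Python wrapper are illustrative assumptions; the slides do not show the exact harness used.

```python
import subprocess

# Hypothetical names: the slides only say "4 pools, 4 rbd images (one per pool)".
POOL, IMAGE = "testpool1", "testimage1"

# One fio_rbd client: 8 jobs x iodepth 32 = 256 outstanding 4K random reads.
cmd = [
    "fio",
    "--name=rbd_4k_randread",
    "--ioengine=rbd",        # librbd engine, talks to the cluster directly
    "--clientname=admin",    # cephx user (assumption)
    f"--pool={POOL}",
    f"--rbdname={IMAGE}",
    "--rw=randread",         # 100% random read
    "--bs=4k",
    "--numjobs=8",
    "--iodepth=32",
    "--group_reporting",
    "--time_based",
    "--runtime=300",
]
subprocess.run(cmd, check=True)
```

Four such clients (one per pool/image) run in parallel from the single client box to generate the load described above.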
Emerging Storage Solutions (EMS) SanDisk Confidential 3
Result
Transport | IOPS  | BW    | Read served from disk (%) | User %cpu | Sys %cpu | %idle
TCP       | ~50K  | ~200M | ~99                       | ~15       | ~12      | ~55
RDMA      | ~130K | ~520M | ~99                       | ~40       | ~19      | ~11
Summary:
• ~1.5X performance gain
• TCP iops/core = 2777, XIO iops/core = 3651
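The iops/core figures quoted throughout the deck appear to be IOPS divided by the number of busy cores, i.e. total cores scaled by (1 - %idle). A small sketch under that assumption, which reproduces the numbers above from this slide's table:

```python
def iops_per_core(iops: float, total_cores: int, pct_idle: float) -> float:
    """IOPS divided by the cores actually kept busy."""
    busy_cores = total_cores * (1.0 - pct_idle / 100.0)
    return iops / busy_cores

# 40-core server, 4K random read (values from the table above).
print(iops_per_core(50_000, 40, 55))    # TCP: ~2778 (slide quotes 2777)
print(iops_per_core(130_000, 40, 11))   # XIO: ~3652 (slide quotes 3651)
```

The same calculation against the client-node %idle columns reproduces the client-side iops/core figures quoted on later slides.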
Emerging Storage Solutions (EMS) SanDisk Confidential 4
Setup
 16 OSDs, one per SSD (4TB)
 4 pools, 4 rbd images (one per pool)
 1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD
 Block size = 4K, 100% RR
 Working set ~4 TB
 Code base is latest ceph master
 Server has 40 cores and 64 GB RAM
 Shards : thread_per_shard = 25:1, 10:1
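The shard : threads-per-shard ratios above refer to the OSD's sharded op work queue. Below is a hedged sketch of the ceph.conf fragment we assume was varied between runs; osd_op_num_shards and osd_op_num_threads_per_shard are the standard Ceph option names, and ms_type = xio is our assumption for selecting the XIO messenger on the RDMA runs (the exact branch configuration may differ).

```python
# Render the assumed ceph.conf overrides for one run.
shards, threads_per_shard = 25, 1   # the 25:1 case; 10:1 was also tried on this slide
messenger = "xio"                   # assumed: "xio" for RDMA runs, default messenger for TCP

print(f"""[global]
ms_type = {messenger}

[osd]
osd_op_num_shards = {shards}
osd_op_num_threads_per_shard = {threads_per_shard}
""")
```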
Emerging Storage Solutions (EMS) SanDisk Confidential 5
Result
Transport | IOPS  | BW    | Disk read (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
TCP       | ~118K | ~470M | ~99%          | ~3                       | ~26                     | ~16%
RDMA      | ~120K | ~480M | ~99%          | ~7                       | ~25                     | ~28%
Summary:
• TCP is catching up; TCP iops/core = 3041, XIO iops/core = 3225 in cluster nodes
• More memory consumed by XIO
Emerging Storage Solutions (EMS) SanDisk Confidential 6
Setup
 16 OSDs, one per SSD (4TB)
 2 hosts, 8 OSDs each
 4 pools, 4 rbd images (one per pool; see the pool/image creation sketch after this list)
 1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD
 Block size = 4K, 100% RR
 Working set ~6 TB
 Code base is latest ceph master
 Server has 40 cores and 64 GB RAM
 Shards : thread_per_shard = 25:1, 10:1
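Since every run uses the same 4-pool / 4-image layout, here is a hedged sketch of how that layout could be created with the python-rados and python-rbd bindings. Pool/image names and the image size are assumptions; the slides do not show the provisioning commands.

```python
import rados
import rbd

# Image size is an assumption: sized so the four images cover the stated working set
# (~1.5 TB each for the ~6 TB runs, ~1 TB each for the ~4 TB runs).
IMAGE_SIZE = 1536 * 1024 ** 3

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

for i in range(1, 5):
    pool = f"testpool{i}"            # hypothetical names
    cluster.create_pool(pool)
    ioctx = cluster.open_ioctx(pool)
    rbd.RBD().create(ioctx, f"testimage{i}", IMAGE_SIZE)
    ioctx.close()

cluster.shutdown()
```

The images would presumably be pre-filled before the 100% read runs, so that reads actually land on the SSDs rather than returning unwritten zeroes.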
Emerging Storage Solutions (EMS) SanDisk Confidential 7
Result
Transport | IOPS  | BW    | Disk read (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
TCP       | ~175K | ~700M | ~99%          | ~8                       | ~18                     | ~16%
RDMA      | ~238K | ~952M | ~99%          | ~14                      | ~20                     | ~28%
Summary:
• ~36% performance gain
• TCP iops/core = 4755, XIO iops/core = 6918 in cluster nodes
• RDMA uses more than 10 percentage points more memory per cluster node (~28% vs ~16%)
Emerging Storage Solutions (EMS) SanDisk Confidential 8
Setup
 32 OSDs, one per SSD (4TB)
 2 hosts, 16 OSDs each
 4 pools, 4 rbd images (one per pool)
 1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD
 Block size = 4K, 100% RR
 Working set ~6 TB
 Code base is latest ceph master
 Server has 40 cores and 64 GB RAM
 Shards : thread_per_shard = 25:1, 10:1,15:1,5:2
Emerging Storage Solutions (EMS) SanDisk Confidential 9
Result
Transport | IOPS  | BW    | Disk read (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
TCP       | ~214K | ~775M | ~99%          | ~9                       | ~12                     | ~16%
RDMA      | ~230K | ~870M | ~99%          | ~12                      | ~18                     | ~28%
Summary:
• TCP is catching up again; not much of a gain from XIO
• TCP iops/core = 2939, XIO iops/core = 3267 in cluster nodes
• More memory usage per cluster node by XIO
Emerging Storage Solutions (EMS) SanDisk Confidential 10
Did some testing with a more powerful setup
 8 OSDs, one per SSD (4TB)
 4 pools, 4 rbd images (one per pool)
 1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD
 Block size = 4K, 100% RR
 Working set ~4 TB
 Code base is latest ceph master
 Server has 56 cores (Xeon E5-2697 v3 @ 2.60GHz) and 64 GB RAM
 Shards : thread_per_shard = 25:1
Emerging Storage Solutions (EMS) SanDisk Confidential 11
Result
Transport | IOPS  | BW    | Read served from disk (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
TCP       | ~148K | ~505M | ~99%                      | ~15                      | ~68                     | ~11%
RDMA      | ~166K | ~665M | ~99%                      | ~18                      | ~73                     | ~19%
Summary:
• ~12% performance gain
• TCP iops/core = 3109, XIO iops/core = 3616 in cluster nodes.
• For client node, TCP iops/core = 8258, XIO iops/core = 10978
• RDMA uses more than 8 percentage points more memory per cluster node (~19% vs ~11%)
Emerging Storage Solutions (EMS) SanDisk Confidential 12
Result no disk hit
Transport | IOPS  | BW     | Read served from disk (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
TCP       | ~265K | ~1037M | ~0                        | ~35                      | ~40                     | ~11%
RDMA      | ~276K | ~1084M | ~0                        | ~60                      | ~63                     | ~19%
Summary:
• Not much difference throughput-wise
• But a significant difference in efficiency: TCP iops/core = 7280, XIO iops/core = 12,321 in cluster nodes
• RDMA uses more than 8 percentage points more memory per cluster node (~19% vs ~11%)
Emerging Storage Solutions (EMS) SanDisk Confidential 13
Bumping up OSDs on the same setup
 16 OSDs, one per SSD (4TB)
 4 pools, 4 rbd images (one per pool)
 1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD
 Block size = 4K, 100% RR
 Working set ~4 TB
 Code base is latest ceph master
 Server has 56 cores (Xeon E5-2697 v3 @ 2.60GHz) and 64 GB RAM
 Shards : thread_per_shard = 10:1, 4:2, 25:1
 A bit of experimentation with the xio_portal_thread settings
Emerging Storage Solutions (EMS) SanDisk Confidential 14
Result
Transport | IOPS  | BW    | Read served from disk (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
TCP       | ~142K | ~505M | ~99%                      | ~18                      | ~68                     | ~18%
RDMA      | ~166K | ~665M | ~99%                      | ~18                      | ~73                     | ~38%
Summary:
• TCP iops/core = 3092, XIO iops/core = 3614 in cluster nodes
• For client nodes, TCP iops/core = 7924, XIO iops/core = 10978
• More than 2X memory usage by RDMA (~38% vs ~18%)
• Not much scaling between 8 and 16 OSDs for either TCP or RDMA, even though nothing is saturated at this point.
Emerging Storage Solutions (EMS) SanDisk Confidential 15
Result no disk hit
Transport | IOPS                                                   | BW     | Read served from disk (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
TCP       | ~268K                                                  | ~1049M | ~0                        | ~37                      | ~37                     | ~17%
RDMA      | ~400K (when OSD-side portal threads = 2, client side = 8) | ~1600M | ~0                     | ~40                      | ~42                     | ~40%
Summary:
• Suspecting some lock contention in the OSD layer, we started experimenting with XIO portal threads
• With fewer portal threads (2) on the OSD node, no-disk-hit performance jumped to ~400K (assumed config sketched below)
• Increasing the XIO portal thread count in the OSD layer decreases performance in this case
• Tried several shard options, but TCP stays close to the 8-OSD case; this appears to be a limit.
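A hedged sketch of the portal-thread override that the ~400K run above implies. The slide only states OSD side = 2, client side = 8; xio_portal_threads is our assumption of the relevant XIO messenger option name.

```python
# Assumed ceph.conf overrides for the ~400K no-disk-hit run.
print("""[osd]
xio_portal_threads = 2   # fewer portals on the OSD side helped here

[client]
xio_portal_threads = 8   # client side kept at 8
""")
```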
Emerging Storage Solutions (EMS) SanDisk Confidential 16
Checking the scale-out nature
 32 OSDs, one per SSD (4TB)
 2 nodes with 16 OSDs each
 4 pools, 4 rbd images (one per pool)
 1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD
 Block size = 4K, 100% RR
 Working set ~4 TB
 Code base is latest ceph master
 Server has 56 cores (Xeon E5-2697 v3 @ 2.60GHz) and 64 GB RAM
 Shards : thread_per_shard = 10:1, 4:2, 25:1
 A bit of experimentation with the xio_portal_thread settings
Emerging Storage Solutions (EMS) SanDisk Confidential 17
Result no disk hit
Transport | IOPS  | BW     | Read served from disk (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
TCP       | ~323K | ~1263M | ~0                        | ~40                      | ~12                     | ~18.7%
RDMA      | ~343K | ~1339M | ~0                        | ~55                      | ~30                     | ~37.5%
Summary:
• TCP is scaling, but XIO is not!
• In fact, XIO gives less throughput than the 16-OSD setup!
• TCP iops/core = 4806, XIO iops/core = 6805 in cluster nodes
• In the client nodes, TCP iops/core = 6565, XIO iops/core = 8750; the gap is even more significant
• XIO mem usage per node is again ~2X
Emerging Storage Solutions (EMS) SanDisk Confidential 18
Result
Transport | IOPS  | BW     | Read served from disk (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
TCP       | ~249K | ~973M  | ~99%                      | ~22                      | ~18                     | ~15.5%
RDMA      | ~258K | ~1006M | ~99%                      | ~24                      | ~40                     | ~38%
Summary:
• TCP/XIO similar throughput
• For client nodes, TCP iops/core = 5422, XIO iops/core = 7678; significant gain with XIO on the client side
• XIO mem usage per node is again more than 2X
Emerging Storage Solutions (EMS) SanDisk Confidential 19
Trying out bigger block sizes
 32 OSDs, one per SSD (4TB)
 2 nodes with 16 OSDs each
 4 pools, 4 rbd images (one per pool)
 1 physical client box. Total 1 fio_rbd client, with 8 (num_jobs) * 32 iodepth = 256 QD
 Could not run 4 clients in parallel with XIO
 Block size = 16K/64K, 100% RR
 Working set ~4 TB
 Code base is latest ceph master
 Server has 56 cores (Xeon E5-2697 v3 @ 2.60GHz) and 64 GB RAM
 Shards : thread_per_shard = 10:1, 4:2, 25:1
 A bit of experimentation with the xio_portal_thread settings
Emerging Storage Solutions (EMS) SanDisk Confidential 20
Result (32 OSDs, 16K, 1 client)
Transport | IOPS          | BW     | Read served from disk (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
TCP       | ~150K         | ~2354M | ~99%                      | ~35                      | ~48                     | ~15.5%
RDMA      | ~152K (spiky) | ~2355M | ~99%                      | ~40                      | ~60                     | ~38%
Summary:
• TCP/XIO similar throughput
• XIO is very spiky
• Couldn’t run more than 1 client (8 num_jobs) with XIO.
• But the CPU gain is visible
Emerging Storage Solutions (EMS) SanDisk Confidential 21
Result (32 OSDs, 64K, 1 client)
Transport | IOPS             | BW     | Read served from disk (%) | Cluster node CPU (%idle) | Client node CPU (%idle) | Cluster node mem (%)
TCP       | ~53K             | ~3312M | ~99%                      | ~57                      | ~74                     | ~15.5%
RDMA      | ~55K (but spiky) | ~3625M | ~99%                      | ~57                      | ~82                     | ~39%
Summary:
• TCP/XIO similar throughput
• XIO is very spiky
• Couldn’t run more than 1 client (8 num_jobs) with XIO.
• But the CPU gain is visible, especially on the client side (see the bandwidth consistency check below)
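As a sanity check, the reported bandwidth on the last two slides is consistent with IOPS x block size, which also supports labeling this slide as the 64K case (a small sketch; MiB/s assumed for the "M" figures).

```python
def expected_mib_per_s(iops: int, block_bytes: int) -> float:
    """Expected bandwidth in MiB/s for a given IOPS and block size."""
    return iops * block_bytes / (1 << 20)

print(expected_mib_per_s(150_000, 16 * 1024))  # 2343.75 -> table shows ~2354M (16K slide)
print(expected_mib_per_s(53_000, 64 * 1024))   # 3312.5  -> table shows ~3312M (this slide)
```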
Emerging Storage Solutions (EMS) SanDisk Confidential 22
Summary
 Highlights:
– Definite improvement on iops/core
– Single client is much more efficient with XIO messenger
– A lower number of OSDs can deliver high throughput
– If we can fix the internal XIO messenger contention, it has potential to outperform TCP in a big way
 Lowlights:
– TCP is catching up fast with increasing OSDs
– TCP also seems to scale out better than XIO
– XIO's present state is *unstable*: some crash/peering problems
– Startup time for a connection is much higher for XIO
– An XIO connection takes time to stabilize to a fixed throughput
– Memory requirement is considerably higher