CEPH PERFORMANCE
Profiling and Reporting
Brent Compton, Director Storage Solution Architectures
Kyle Bader, Sr Storage Architect
Veda Shankar, Sr Storage Architect
HOW WELL CAN CEPH PERFORM?
WHICH OF MY WORKLOADS CAN IT HANDLE?
HOW WILL CEPH PERFORM ON MY SERVERS?
Questions that continually surface
FAQ FROM THE COMMUNITY
PERCEIVED RANGE OF CEPH PERF
ACTUAL (MEASURED) RANGE OF CEPH PERF
Finding the right server and network config for the job
HOW WELL CAN CEPH PERFORM?
https://github.com/ceph/ceph-brag (email pmcgarry@redhat.com for access)
Ceph performance leaderboard (ceph-brag) coming to ceph.com
INVITATION TO BE PART OF THE ANSWER
Posted throughput results
A LEADERBOARD FOR CEPH PERF RESULTS
Looking for Beta submitters prior to general availability on Ceph.com
LEADERBOARD ATTRIBUTION AND DETAILS
Still under construction
EMERGING LEADERBOARD FOR IOPS
Cluster sizes:
• OpenStack Starter: 64 TB
• S: 256 TB+
• M: 1 PB+
• L: 2 PB+
Node types:
• MySQL Perf Node: IOPS-optimized
• Digital Media Perf Node: throughput-optimized
• Archive Node: cost/capacity-optimized
MAPPING CONFIGS TO WORKLOAD IO CATEGORIES
Some pertinent measures
• MBps
• $/MBps
• MBps/provisioned-TB
• Watts/MBps
• MTTR (self-heal from server failure)
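The throughput measures above are simple ratios of cluster price, power, and provisioned capacity to measured throughput. A minimal sketch of the arithmetic, using hypothetical inputs (the server price, drive count, per-drive MBps, and wattage below are illustrative assumptions, not figures from this deck):

```python
# Hypothetical inputs for one throughput-optimized node: illustrative only.
server_price_usd = 15000.0   # assumed loaded server cost
raw_tb = 12 * 8              # assumed 12 drives x 8 TB raw
replication = 3              # 3x replication
measured_mbps = 12 * 80      # assumed 12 OSDs x ~80 MBps each
watts = 450.0                # assumed power draw under load

# Provisioned (usable) capacity shrinks by the replication factor.
provisioned_tb = raw_tb / replication

dollars_per_mbps = server_price_usd / measured_mbps
mbps_per_provisioned_tb = measured_mbps / provisioned_tb
watts_per_mbps = watts / measured_mbps

print(f"$/MBps:              {dollars_per_mbps:.2f}")
print(f"MBps/provisioned-TB: {mbps_per_provisioned_tb:.2f}")
print(f"Watts/MBps:          {watts_per_mbps:.3f}")
```

With erasure coding, the same formulas apply; only the provisioned-capacity divisor changes (e.g. 5/3 for EC3:2 instead of 3).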
Range of MBps measured with Ceph on different server configs
DIGITAL MEDIA PERF NODES
[Chart: 4M read and 4M write MBps per drive, HDD sample vs SSD sample; scale 0–500 MBps]
Sequential Read Throughput vs IO Block Size
THROUGHPUT PER OSD DEVICE (READ)
[Chart: MB/sec per OSD device (0–100) vs IO block size (64, 512, 1024, 4096 KB); series:
D51PH-1ULH - 12xOSD+3xSSD, 2x10G (3xRep)
D51PH-1ULH - 12xOSD+0xSSD, 2x10G (EC3:2)
T21P-4U/Dual - 35xOSD+2xPCIe, 1x40G (3xRep)
T21P-4U/Dual - 35xOSD+2xPCIe, 1x40G (EC2:2)
T21P-4U/Dual - 35xOSD+0xSSD, 10G+10G (EC2:2)]
Sequential Write Throughput vs IO Block Size
THROUGHPUT PER OSD DEVICE (WRITE)
[Chart: MB/sec per OSD device (0–25) vs IO block size (64, 512, 1024, 4096 KB); series:
D51PH-1ULH - 12xOSD+3xSSD, 2x10G (3xRep)
D51PH-1ULH - 12xOSD+3xSSD, 2x10G (EC3:2)
T21P-4U/Dual - 35xOSD+2xPCIe, 1x40G (3xRep)
T21P-4U/Dual - 35xOSD+0xPCIe, 1x40G (EC2:2)
T21P-4U/Dual - 35xOSD+0xSSD, 10G+10G (EC2:2)]
Sequential Throughput vs Different Server Sizes
SERVER SCALABILITY
[Chart: MBytes/sec/disk (0–100) for 12 disks/OSDs (D51PH) vs 35 disks/OSDs (T21P); series: Rados 4M seq read per disk, Rados 4M seq write per disk]
Sequential Throughput vs Different Protection Methods (Replication v. Erasure-coding)
DATA PROTECTION METHODS
[Chart: MBytes/sec/disk (0–100) for Rados 4M seq reads and writes per disk; series:
D51PH-1ULH - 12xOSD+0xSSD, 2x10G (EC3:2)
D51PH-1ULH - 12xOSD+3xSSD, 2x10G (EC3:2)
D51PH-1ULH - 12xOSD+3xSSD, 2x10G (3xRep)]
Sequential IO Latency vs Different Journal Approaches
JOURNALING
[Chart: latency in msec (0–4000) for Rados 4M seq read and write; series:
T21P-4U/Dual - 35xOSD+2xPCIe, 1x40G (3xRep)
T21P-4U/Dual - 35xOSD+0xPCIe, 1x40G (3xRep)]
Sequential Throughput vs Different Network Bandwidth
NETWORK
[Chart: MBytes/sec/disk (0–100) for Rados 4M seq reads and writes per disk; series:
T21P-4U/Dual - 35xOSD+2xPCIe, 1x40G (3xRep)
T21P-4U/Dual - 35xOSD+2xPCIe, 10G+10G (3xRep)]
Sequential Throughput v. Different OSD Media Types (All-flash v. Magnetic)
MEDIA TYPE
Different Configs vs $/MBps (lowest = best)
PRICE/PERFORMANCE
[Chart: $/MBps for Price/Perf (write) and Price/Perf (read); configs:
D51PH-1ULH - 12xOSD+3xSSD, 2x10G (3xRep)
T21P-4U/Dual - 35xOSD+2xPCIe, 1x40G (3xRep)]
Some pertinent measures
• MySQL Sysbench requests/sec
• IOPS (4K, 16K random)
• $/IOP
• IOPS/provisioned-GB
• Watts/IOP
Range of IOPS measured with Ceph on different server configs
MYSQL PERF NODES
[Chart: 4K read and 4K write IOPS per drive, HDD sample vs SSD sample; scale 0–60,000]
AWS provisioned-IOPS v. Ceph all-flash configs
SYSBENCH REQUEST/SEC
[Chart: Sysbench read, write, and 70/30 R/W requests/sec (0–80,000); configs: AWS P-IOPS m4.4XL; Ceph cluster, cl: 16 vcpu/64MB (1 instance, 14% capacity); Ceph cluster, cl: 16 vcpu/64MB (10 instances, 87% capacity)]
AWS use of IOPS/GB throttles
GETTING DETERMINISTIC IOPS
[Chart: MySQL IOPS/GB for Sysbench reads and writes (0–35); configs: P-IOPS m4.4XL, P-IOPS r3.2XL, GP-SSD r3.2XL]
Ceph IOPS/GB varying with instance quantity and cluster capacity utilization
MYSQL INSTANCES AND CLUSTER CAPACITY
[Chart: IOPS/GB (0–100): P-IOPS m4.4XL = 26; Ceph cluster, cl: 16 vcpu/64MB (1 instance, 14% capacity) = 87; Ceph cluster, cl: 16 vcpu/64MB (10 instances, 87% capacity) = 19]
Collect baseline measures
METHODOLOGY: BASELINING
1. Determine benchmark measures most representative of business need
2. Determine cluster access method (block, object, file)
3. Collect baseline measures:
   1. Look up manufacturer drive specifications (IOPS, MBps, latency)
   2. Single-node IO baseline (max IOPS and MBps to all drives concurrently)
   3. Network baseline (consistent bandwidth across full route mesh)
   4. Rados baseline (max sequential throughput per drive)
   5. RBD baseline (max IOPS per drive)
   6. Sysbench baseline (max DB requests/sec per drive)
   7. RGW baseline (max object ops/sec per drive)
4. Calculate drive efficiency at each level up the stack
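Step 4's "drive efficiency" is just per-drive throughput at each layer divided by the raw drive spec. A minimal sketch of the calculation, assuming hypothetical per-drive baselines (the MBps values below are illustrative, not measurements from this deck):

```python
# Hypothetical per-drive sequential baselines (MBps) at each layer of the
# stack, from raw drive spec down through the Ceph layers: illustrative only.
baselines = {
    "drive spec":     180.0,  # manufacturer sequential spec
    "single-node IO": 165.0,  # all drives driven concurrently (e.g. fio)
    "rados bench":     90.0,  # 4M sequential, per drive
    "rbd":             80.0,  # block layer, per drive
}

spec = baselines["drive spec"]
for layer, mbps in baselines.items():
    # Efficiency = fraction of raw drive capability surviving at this layer.
    efficiency = 100.0 * mbps / spec
    print(f"{layer:15s} {mbps:7.1f} MBps  {efficiency:5.1f}% of drive spec")
```

A sharp efficiency drop between two adjacent layers localizes where throughput is being lost (for example, journal double-writes typically show up between the single-node IO and rados baselines on write workloads).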
Towards deterministic performance
METHODOLOGY: WATERMARKS
1. Identify IOPS/GB at 35% and 70% cluster utilization (with corresponding MySQL instances)
2. Identify MBps/TB at 35% and 70% cluster utilization
3. Determine target IOPS/GB or MBps/TB at target cluster utilization
4. (experimental) Set block device IO throttles to cap consumption by any single client
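The watermark steps above reduce to interpolating between two measured points. A minimal sketch, assuming hypothetical IOPS/GB readings at the 35% and 70% utilization watermarks (the values are illustrative, and linear interpolation is a rough planning estimate, not a performance model):

```python
# Hypothetical watermark data: {cluster utilization fraction: measured IOPS/GB}.
# Illustrative values only; real numbers come from the baselining runs.
watermarks = {0.35: 60.0, 0.70: 25.0}

def iops_per_gb_at(util, low=0.35, high=0.70):
    """Linearly interpolate IOPS/GB between the two measured watermarks."""
    lo, hi = watermarks[low], watermarks[high]
    frac = (util - low) / (high - low)
    return lo + frac * (hi - lo)

# Deterministic-IOPS planning: what can we promise per GB at 50% full?
print(f"{iops_per_gb_at(0.50):.1f} IOPS/GB at 50% utilization")
```

The same sketch works for MBps/TB: swap in the throughput watermarks from step 2. The interpolated figure at the target utilization is what step 4's per-client IO throttle would be sized against.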
Towards comparable results
COMMON TOOLS
1. CBT – Ceph Benchmarking Tool
https://github.com/ceph/ceph-brag (email pmcgarry@redhat.com for access)
Ceph performance leaderboard (ceph-brag) coming to ceph.com
INVITATION TO BE PART OF THE ANSWER
plus.google.com/+RedHat
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHatNews
THANK YOU
4K Random Write IOPS v. Different Controllers and software configs
RAID CONTROLLER WRITE-BACK (HDD OSDS)
