Ceph Day NYC: Ceph Performance & Benchmarking
Mark Nelson from Inktank discusses his performance and benchmarking efforts with Ceph.

That's Ceph, I use Ceph now, Ceph is Cool.
Who's the crazy guy speaking?
What about Ceph?
[Architecture diagram: RBD (kernel module and QEMU), RGW, and CephFS (kernel and FUSE) clients sit on librbd and libcephfs, which speak the Ceph Storage Cluster Protocol (librados) to the OSDs, Monitors, and MDSs.]
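Everything in that diagram funnels through librados. A minimal sketch of talking to the cluster directly with the python-rados bindings (the config path and the 'data' pool name are assumptions, not from the talk):

    import rados

    # Connect using the local cluster config and keyring (assumed default path).
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Open an I/O context on a pool; 'data' is just an example pool name.
    ioctx = cluster.open_ioctx('data')
    ioctx.write_full('hello_object', b'written via librados')
    print(ioctx.read('hello_object'))

    ioctx.close()
    cluster.shutdown()

RBD, RGW, and CephFS are all built on these same primitives, which is why the benchmarks later in the deck start with raw RADOS.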
DISTRIBUTED EVERYTHING
CRUSH:
Hash Based
Deterministic Data Placement
Pseudo-Random, Weighted Distribution
Hierarchically Defined Failure Domains
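A toy sketch (not the real CRUSH algorithm, and with hypothetical OSD names and weights) of what hash-based, weighted, deterministic placement means in practice:

    import hashlib

    def crush_like_choose(object_name, nodes, replicas=3):
        # Toy placement, NOT real CRUSH: deterministically pick `replicas`
        # distinct nodes for an object using weighted hash "straws".
        chosen = []
        for r in range(replicas):
            best, best_straw = None, -1.0
            for name, weight in nodes.items():
                if name in chosen:
                    continue  # real CRUSH handles collisions by retrying
                h = hashlib.md5(f"{object_name}:{r}:{name}".encode()).digest()
                draw = int.from_bytes(h, "big") / 2**128  # uniform-ish in [0, 1)
                straw = draw * weight                     # weight biases the draw
                if straw > best_straw:
                    best, best_straw = name, straw
            chosen.append(best)
        return chosen

    # Hypothetical OSDs and weights; every client computes the same answer,
    # so there is no central lookup table to consult.
    osds = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 1.0, "osd.3": 2.0}
    print(crush_like_choose("rbd_data.1234.0000", osds))

The real algorithm also walks a hierarchy of buckets (hosts, racks, rows) so replicas land in separate failure domains, which this flat sketch omits.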
ADVANTAGES:
Avoids Centralized Data Lookups
Even Data Distribution
Healing is Distributed
Abstracted Storage Backends
CHALLENGES:
Ceph Loves Homogeneity (Per Pool)
Ceph Loves Concurrency
Data Integrity is Expensive
Data Movement is Unavoidable
Distributed Storage is Hard!
BORING!
How fast can we go?
Let's test something Fun!
Supermicro SC847A 36-drive Chassis
2x Intel XEON E5-2630L
4x LSI SAS9207-8i Controllers
24x 1TB 7200rpm spinning disks
8x Intel 520 SSDs
Bonded 10GbE Network
Total Cost: ~$12k
[Chart: Cuttlefish RADOS Bench 4M Object Throughput, 4 processes, 128 concurrent operations. Write and read throughput in MB/s (0-2500 scale) for BTRFS, EXT4, and XFS.]
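These numbers come from the stock rados bench tool. A rough way to reproduce the shape of the test (pool name, runtime, and the per-process thread split are assumptions; the slide's 4 processes and 128 concurrent operations imply several parallel invocations):

    import subprocess

    pool, seconds = "benchpool", 60  # assumed pool name and runtime

    # Write phase: 4 MB objects, 32 concurrent ops per process,
    # keeping the objects around so they can be read back afterwards.
    subprocess.run(["rados", "-p", pool, "bench", str(seconds), "write",
                    "-b", str(4 * 1024 * 1024), "-t", "32",
                    "--no-cleanup"], check=True)

    # Sequential read-back of the objects written above.
    subprocess.run(["rados", "-p", pool, "bench", str(seconds), "seq",
                    "-t", "32"], check=True)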
Yeah, yeah, the bonded 10GbE network is maxed
out. Good for you Mark.
Who cares about RADOS Bench though?
I've moved to the cloud and do lots of small writes
on block storage.
OK, if Ceph is so awesome why are you only
testing 1 server? How does it scale?
Oak Ridge National Laboratory
4 Storage Servers, 8 Client Nodes
DDN SFA10K Storage Chassis
QDR Infiniband Everywhere
A Boatload of Drives!
[Chart: ORNL Multi-Server RADOS Bench Throughput, 4MB IOs, 8 client nodes. Writes, reads, and writes including journals in MB/s (0-14000 scale) versus 1-4 server nodes (11 OSDs each), with disk fabric max and client network max shown for reference.]
So RADOS is scaling nicely.
How much does data replication hurt us?
[Chart: ORNL 4MB RADOS Bench Throughput versus replication level 1-3. Write, read, and total write (including journals) in MB/s (0-12000 scale).]
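Back-of-the-envelope arithmetic (illustrative numbers, not measurements) explains the shape of this chart: each client write is stored once per replica, and each replica write also lands in that OSD's journal first, so raw disk traffic grows much faster than what the client sees.

    # Illustrative only: relate aggregate disk write bandwidth to what clients see.
    raw_disk_write_mb_s = 12000   # hypothetical aggregate backend write bandwidth
    journal_factor = 2            # each replica write is also journaled

    for replication in (1, 2, 3):
        amplification = replication * journal_factor
        client_visible = raw_disk_write_mb_s / amplification
        print(f"rep={replication}: ~{amplification}x write amplification, "
              f"~{client_visible:.0f} MB/s visible to clients")

Reads touch only one replica and skip the journal entirely, which is one reason read throughput holds up much better as replication increases.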
This is an HPC site. What about CephFS?
NOTE: CephFS is not production ready!
(Marketing and sales can now sleep again)
[Chart: ORNL 4M CephFS (IOR) Throughput. Max and average write and read in MiB/s (0-7000 scale) versus 1-8 client nodes (8 processes each).]
Hundreds of Cluster Configurations
Hundreds of Tunable Settings
Hundreds of Potential IO Patterns
Too Many Permutations to Test Everything!
When performance is bad, how do you diagnose?
Ceph Admin Socket
Collectl
Blktrace & Seekwatcher
perf
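As an example of the first tool on that list, every Ceph daemon exposes an admin socket that can be polled for perf counters while a benchmark runs. A minimal sketch (the socket path assumes a default deployment with an osd.0 on the local host):

    import json
    import subprocess

    # Ask a running OSD for its performance counters via its admin socket.
    asok = "/var/run/ceph/ceph-osd.0.asok"
    raw = subprocess.check_output(["ceph", "--admin-daemon", asok, "perf", "dump"])
    perf = json.loads(raw)

    # Dump one section; op and journal latencies are usually the first stop.
    print(json.dumps(perf.get("osd", {}), indent=2))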
Where are we going from here?
More Testing and Bug Fixes!
Erasure Coding
Cloning from Journal Writes (BTRFS)
RSOCKETS/RDMA
Tiering
THANK YOU