Ceph Performance and Optimization - Ceph Day Frankfurt
CephDays Frankfurt 2014
💥 Sébastien Han
💥 French Cloud Engineer working for eNovance
💥 Daily job focused on Ceph and OpenStack
Personal blog: http://www.sebastien-han.fr/blog/
Company blog: http://techs.enovance.com/
Last Ceph Days presentation
How does Ceph perform?*
*The Hitchhiker's Guide to the Galaxy
As soon as an IO goes into an OSD, it gets written twice.
Journal and OSD data on the same disk
Journal penalty on the disk
Since we write twice, storing the journal on the same disk as the
OSD data results in the following:
sdb1 - journal 50.11
sdb2 - osd_data 40.25
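A common mitigation is to move the journal to a dedicated device, typically an SSD. A minimal sketch using ceph-disk, assuming /dev/sdb holds the OSD data and /dev/sdc is a spare SSD (both device names are illustrative):

$ sudo ceph-disk prepare /dev/sdb /dev/sdc   # data device, then journal device
$ sudo ceph-disk activate /dev/sdb1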
• Objects are stored as files on the OSD filesystem
• Several IO patterns with different block sizes increase filesystem fragmentation
• Possible root cause: image sparseness
• A one-year-old cluster ends up with significant fragmentation (see the allocsize mount option for XFS):
$ sudo xfs_db -c frag -r /dev/sdd
actual 196334, ideal 122582, fragmentation factor 37.56%
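Fragmentation can be mitigated by mounting the OSD filesystems with a larger allocsize, so XFS pre-allocates bigger extents. A sketch of the corresponding ceph.conf entry (the 4M value is illustrative, not a recommendation):

[osd]
osd mount options xfs = rw,noatime,inode64,allocsize=4M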
No parallelized reads
• Ceph will always serve the read request from the primary OSD
• Room for an Nx speedup, where N is the replica count
Blueprint from Sage for the Giant release
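To see which OSD serves the reads for a given object, you can ask the cluster for its mapping. A quick check, assuming a pool named rbd and an illustrative object name:

$ ceph osd map rbd my-object
# the first OSD listed in the acting set is the primary,
# which serves every read for this object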
• Consistent object check at the PG level
• Compares replica versions against each other (an fsck for objects)
• Light scrubbing (daily) checks the object size and attributes.
• Deep scrubbing (weekly) reads the data and uses checksums to ensure data integrity.
• Corruption exists: use ECC memory; enterprise disks still see roughly one unrecoverable bit error per 10^15 bits read.
• No pain, no gain
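If scrubbing gets in the way of client IO, it can be throttled or temporarily disabled. A sketch with illustrative interval values (in seconds):

[osd]
osd scrub min interval = 86400        # light-scrub a PG at most once a day
osd deep scrub interval = 604800      # deep-scrub each PG once a week

$ ceph osd set nodeep-scrub     # pause deep scrubbing during peak hours
$ ceph osd unset nodeep-scrub   # resume it afterwards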
How to start?
Things that you must consider:
• IO profile: Bandwidth? IOPS? Mixed?
• How many IOPS or how much bandwidth per client do I want to deliver?
• Do I use Ceph standalone, or combined with another software solution?
• Amount of data (usable, not raw; see the sizing sketch after this list)
• Replica count
• Do I have a data growth plan?
• How much data am I willing to lose if a node fails? (%)
• Am I ready to be annoyed by the scrubbing process?
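A back-of-the-envelope sketch of the usable-versus-raw arithmetic (all numbers illustrative): with a replica count of 3, every usable terabyte needs three raw terabytes on disk:

$ usable=100; replicas=3         # 100 TB usable, 3 replicas
$ echo $((usable * replicas))    # raw TB to provision
300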
Things that you must not do
• Don't put a RAID underneath your OSD
• Ceph already manages replication
• A degraded RAID hurts performance
• It reduces the cluster's usable space
• Don't build high density nodes with a tiny cluster
• Failure considerations and the amount of data to re-balance (see the sketch at the end of this list)
• Risk of a full cluster
• Don't run Ceph on your hypervisors (unless you're broke)
• Well maybe…
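To make the density point concrete, a back-of-the-envelope sketch (all numbers illustrative): the smaller the cluster, the bigger the share of data a single node failure forces it to re-balance:

$ echo $((4 * 12 * 4))   # 4 nodes, 12 x 4 TB disks each -> raw TB
192
$ echo $((192 / 4))      # one dead node means ~1/4 of the data re-balances (TB)
48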