SlideShare a Scribd company logo
Ceph performance
CephDays Frankfurt 2014
Whoami
💥 Sébastien Han
💥 French Cloud Engineer working for eNovance
💥 Daily job focused on Ceph and OpenStack
💥 Blogger
Personal blog: http://www.sebastien-han.fr/blog/
Company blog: http://techs.enovance.com/
Last Cephdays presentation
How does Ceph perform?
42*
*The Hitchhiker's Guide to the Galaxy
The Good
Ceph IO pattern
CRUSH: deterministic object
placement
As soon as a client writes into Ceph, the operation is computed
and the client decides to which OSD the object should belong
Aggregation: cluster level
As soon as you write into Ceph, all the objects get equally spread across the entire
Cluster, understanding machines and disks..
Aggregation: OSD level
As soon as an IO goes into an OSD, no matter how the original pattern was,
it becomes sequential.
The Bad
Ceph IO pattern
Journaling
As soon as an IO goes into an OSD, it gets written twice.
Journal and OSD data on the same
disk
Journal penalty on the disk
Since we write twice, if the journal is stored on the same disk as
the OSD data this will result in the following:
Device: wMB/s
sdb1 - journal 50.11
sdb2 - osd_data 40.25
Filesystem fragmentation
• Objects are stored as files on the OSD filesystem
• Several IO patterns with different block sizes increase filesystem
fragmentation
• Possible root cause: image sparseness
• One year old cluster ends up with (see allocsize options for
XFS):
$ sudo xfs_db -c frag -r /dev/sdd
actual 196334, ideal 122582, fragmentation factor 37.56%
No parallelized reads
• Ceph will always serve the read request from the primary OSD
• Room for Nx times speed up where N is the replica count
Blueprint from Sage for the Giant release
Scrubbing impact
• Consistent object check at the PG level
• Compare replicas versions between each others (Fsck for objects)
• Light scrubbing (daily) checks the object size and attributes.
• Deep scrubbing (weekly) reads the data and uses checksums to ensure
data integrity.
• Corruption exists – ECC memory (10^15 for enterprise disk)
~113TB
• No pain No gain
The Ugly
Ceph IO pattern
IOs to the OSD disk
One IO into Ceph leads to 2 writes, well… the second write is the worst!
The problem
• Several objects map to the same physical disks
• Sequential streams get mixed all together
• Result: The disk seeks like hell
Even worse with erasure coding?
This is just an assumption!
•Since erasure coding does chunks of chunks we can possibly
have this phenomena amplified
CLUSTER
How to build it?
How to start?
Things that you must consider:
•Use case
• IO profile: Bandwidth? IOPS? Mixed?
• How many IOPS or Bandwidth per client do I want to deliver?
• Do I use Ceph in standalone or is it combined with a software solution?
•Amount of data (usable not RAW)
• Replica count
• Do I have a data growth planning?
•Leftover
• How much data am I willing to lose if a node fails? (%)
• Am I ready to be annoyed by the scrubbing process?
•
Things that you must not do
• Don't put a RAID underneath your OSD
• Ceph already manages the replication
• Degraded RAID breaks performances
• Reduce usable space on the cluster
• Don't build high density nodes with a tiny cluster
• Failure consideration and data to re-balance
• Potential full cluster
• Don't run Ceph on your hypervisors (unless you're broke)
• Well maybe…
Firefly: Interesting things
going on
Object store multi-backend
• ObjectStore is born
• Aims to support several backends:
• levelDB (default)
• RocksDB
• Fusionio NVMKV
• Seagate Kinetic
• Yours!
Why is it so good?
• No more journal! Yay!
• Object backends have built-in atomic functions
Firefly leveldb
• Relatively new
• Need to be tested with your workload first
• Tend to be more efficient with small objects
Many thanks!
Questions?
Contact: sebastien@enovance.com
Twitter: @sebastien_han
IRC: leseb

More Related Content

What's hot

OpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick Hamon
OpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick HamonOpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick Hamon
OpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick HamoneNovance
 
Disaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoFDisaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoF
ShapeBlue
 
Multiple Sites and Disaster Recovery with Ceph: Andrew Hatfield, Red Hat
Multiple Sites and Disaster Recovery with Ceph: Andrew Hatfield, Red HatMultiple Sites and Disaster Recovery with Ceph: Andrew Hatfield, Red Hat
Multiple Sites and Disaster Recovery with Ceph: Andrew Hatfield, Red Hat
OpenStack
 
How to Survive an OpenStack Cloud Meltdown with Ceph
How to Survive an OpenStack Cloud Meltdown with CephHow to Survive an OpenStack Cloud Meltdown with Ceph
How to Survive an OpenStack Cloud Meltdown with Ceph
Sean Cohen
 
OpenStack in Action 4! Vincent Untz - Running multiple hypervisors in your Op...
OpenStack in Action 4! Vincent Untz - Running multiple hypervisors in your Op...OpenStack in Action 4! Vincent Untz - Running multiple hypervisors in your Op...
OpenStack in Action 4! Vincent Untz - Running multiple hypervisors in your Op...eNovance
 
Manila, an update from Liberty, OpenStack Summit - Tokyo
Manila, an update from Liberty, OpenStack Summit - TokyoManila, an update from Liberty, OpenStack Summit - Tokyo
Manila, an update from Liberty, OpenStack Summit - Tokyo
Sean Cohen
 
Enabling Disaster Recovery as Service (DRaaS) on OpenStack
Enabling Disaster Recovery as Service (DRaaS) on OpenStack Enabling Disaster Recovery as Service (DRaaS) on OpenStack
Enabling Disaster Recovery as Service (DRaaS) on OpenStack
haribabu kasturi
 
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOKCEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
Ceph Community
 
Re-Think of Virtualization and Containerization
Re-Think of Virtualization and ContainerizationRe-Think of Virtualization and Containerization
Re-Think of Virtualization and Containerization
Xu Wang
 
Open stack in action enovance-quantum in action
Open stack in action enovance-quantum in actionOpen stack in action enovance-quantum in action
Open stack in action enovance-quantum in actioneNovance
 
Containers and HPC
Containers and HPCContainers and HPC
Containers and HPC
Olli-Pekka Lehto
 
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
ClusterHQ
 
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang HuiStor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
Ceph Community
 
Antoine Coetsier - billing the cloud
Antoine Coetsier - billing the cloudAntoine Coetsier - billing the cloud
Antoine Coetsier - billing the cloud
ShapeBlue
 
Stateful set in kubernetes implementation & usecases
Stateful set in kubernetes implementation & usecases Stateful set in kubernetes implementation & usecases
Stateful set in kubernetes implementation & usecases
Krishna-Kumar
 
Red Hat Summit 2017: Wicked Fast PaaS: Performance Tuning of OpenShift and D...
Red Hat Summit 2017:  Wicked Fast PaaS: Performance Tuning of OpenShift and D...Red Hat Summit 2017:  Wicked Fast PaaS: Performance Tuning of OpenShift and D...
Red Hat Summit 2017: Wicked Fast PaaS: Performance Tuning of OpenShift and D...
Jeremy Eder
 
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
NETWAYS
 
Which Hypervisor is Best?
Which Hypervisor is Best?Which Hypervisor is Best?
Which Hypervisor is Best?
Kyle Bader
 
XCP-ng - past, present and future
XCP-ng - past, present and futureXCP-ng - past, present and future
XCP-ng - past, present and future
ShapeBlue
 
Ceph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOceanCeph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOcean
Ceph Community
 

What's hot (20)

OpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick Hamon
OpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick HamonOpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick Hamon
OpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick Hamon
 
Disaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoFDisaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoF
 
Multiple Sites and Disaster Recovery with Ceph: Andrew Hatfield, Red Hat
Multiple Sites and Disaster Recovery with Ceph: Andrew Hatfield, Red HatMultiple Sites and Disaster Recovery with Ceph: Andrew Hatfield, Red Hat
Multiple Sites and Disaster Recovery with Ceph: Andrew Hatfield, Red Hat
 
How to Survive an OpenStack Cloud Meltdown with Ceph
How to Survive an OpenStack Cloud Meltdown with CephHow to Survive an OpenStack Cloud Meltdown with Ceph
How to Survive an OpenStack Cloud Meltdown with Ceph
 
OpenStack in Action 4! Vincent Untz - Running multiple hypervisors in your Op...
OpenStack in Action 4! Vincent Untz - Running multiple hypervisors in your Op...OpenStack in Action 4! Vincent Untz - Running multiple hypervisors in your Op...
OpenStack in Action 4! Vincent Untz - Running multiple hypervisors in your Op...
 
Manila, an update from Liberty, OpenStack Summit - Tokyo
Manila, an update from Liberty, OpenStack Summit - TokyoManila, an update from Liberty, OpenStack Summit - Tokyo
Manila, an update from Liberty, OpenStack Summit - Tokyo
 
Enabling Disaster Recovery as Service (DRaaS) on OpenStack
Enabling Disaster Recovery as Service (DRaaS) on OpenStack Enabling Disaster Recovery as Service (DRaaS) on OpenStack
Enabling Disaster Recovery as Service (DRaaS) on OpenStack
 
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOKCEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
 
Re-Think of Virtualization and Containerization
Re-Think of Virtualization and ContainerizationRe-Think of Virtualization and Containerization
Re-Think of Virtualization and Containerization
 
Open stack in action enovance-quantum in action
Open stack in action enovance-quantum in actionOpen stack in action enovance-quantum in action
Open stack in action enovance-quantum in action
 
Containers and HPC
Containers and HPCContainers and HPC
Containers and HPC
 
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
DockerCon 2016 Ecosystem - Everything You Need to Know About Docker and Stora...
 
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang HuiStor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
 
Antoine Coetsier - billing the cloud
Antoine Coetsier - billing the cloudAntoine Coetsier - billing the cloud
Antoine Coetsier - billing the cloud
 
Stateful set in kubernetes implementation & usecases
Stateful set in kubernetes implementation & usecases Stateful set in kubernetes implementation & usecases
Stateful set in kubernetes implementation & usecases
 
Red Hat Summit 2017: Wicked Fast PaaS: Performance Tuning of OpenShift and D...
Red Hat Summit 2017:  Wicked Fast PaaS: Performance Tuning of OpenShift and D...Red Hat Summit 2017:  Wicked Fast PaaS: Performance Tuning of OpenShift and D...
Red Hat Summit 2017: Wicked Fast PaaS: Performance Tuning of OpenShift and D...
 
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
 
Which Hypervisor is Best?
Which Hypervisor is Best?Which Hypervisor is Best?
Which Hypervisor is Best?
 
XCP-ng - past, present and future
XCP-ng - past, present and futureXCP-ng - past, present and future
XCP-ng - past, present and future
 
Ceph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOceanCeph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOcean
 

Viewers also liked

Ceph Day Seoul - Ceph on Arm Scaleable and Efficient
Ceph Day Seoul - Ceph on Arm Scaleable and Efficient Ceph Day Seoul - Ceph on Arm Scaleable and Efficient
Ceph Day Seoul - Ceph on Arm Scaleable and Efficient
Ceph Community
 
End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph Replication
Ceph Community
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Community
 
What's new in Jewel and Beyond
What's new in Jewel and BeyondWhat's new in Jewel and Beyond
What's new in Jewel and Beyond
Sage Weil
 
Ceph, Now and Later: Our Plan for Open Unified Cloud Storage
Ceph, Now and Later: Our Plan for Open Unified Cloud StorageCeph, Now and Later: Our Plan for Open Unified Cloud Storage
Ceph, Now and Later: Our Plan for Open Unified Cloud Storage
Sage Weil
 
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networks
Marian Marinov
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year In
Sage Weil
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
Sage Weil
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSH
Sage Weil
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
Sage Weil
 

Viewers also liked (10)

Ceph Day Seoul - Ceph on Arm Scaleable and Efficient
Ceph Day Seoul - Ceph on Arm Scaleable and Efficient Ceph Day Seoul - Ceph on Arm Scaleable and Efficient
Ceph Day Seoul - Ceph on Arm Scaleable and Efficient
 
End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph Replication
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK
 
What's new in Jewel and Beyond
What's new in Jewel and BeyondWhat's new in Jewel and Beyond
What's new in Jewel and Beyond
 
Ceph, Now and Later: Our Plan for Open Unified Cloud Storage
Ceph, Now and Later: Our Plan for Open Unified Cloud StorageCeph, Now and Later: Our Plan for Open Unified Cloud Storage
Ceph, Now and Later: Our Plan for Open Unified Cloud Storage
 
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networks
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year In
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSH
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
 

Similar to Ceph Performance and Optimization - Ceph Day Frankfurt

Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons LearnedCeph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
Ceph Community
 
Ceph in the GRNET cloud stack
Ceph in the GRNET cloud stackCeph in the GRNET cloud stack
Ceph in the GRNET cloud stack
Nikos Kormpakis
 
Webinar - Getting Started With Ceph
Webinar - Getting Started With CephWebinar - Getting Started With Ceph
Webinar - Getting Started With Ceph
Ceph Community
 
Ceph
CephCeph
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Odinot Stanislas
 
In-Ceph-tion: Deploying a Ceph cluster on DreamCompute
In-Ceph-tion: Deploying a Ceph cluster on DreamComputeIn-Ceph-tion: Deploying a Ceph cluster on DreamCompute
In-Ceph-tion: Deploying a Ceph cluster on DreamCompute
Patrick McGarry
 
How swift is your Swift - SD.pptx
How swift is your Swift - SD.pptxHow swift is your Swift - SD.pptx
How swift is your Swift - SD.pptx
OpenStack Foundation
 
Surge2012
Surge2012Surge2012
Surge2012
davidapacheco
 
Unite2013-gavilan-pdf
Unite2013-gavilan-pdfUnite2013-gavilan-pdf
Unite2013-gavilan-pdfDavid Gavilan
 
Open Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNETOpen Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNET
Nikos Kormpakis
 
Ceph, Xen, and CloudStack: Semper Melior
Ceph, Xen, and CloudStack: Semper MeliorCeph, Xen, and CloudStack: Semper Melior
Ceph, Xen, and CloudStack: Semper Melior
Patrick McGarry
 
Erasure Code at Scale - Thomas William Byrne
Erasure Code at Scale - Thomas William ByrneErasure Code at Scale - Thomas William Byrne
Erasure Code at Scale - Thomas William Byrne
Ceph Community
 
PhegData X - High Performance EBS
PhegData X - High Performance EBSPhegData X - High Performance EBS
PhegData X - High Performance EBS
Hanson Dong
 
Introduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersIntroduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developers
Julien Anguenot
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
Fred de Villamil
 
Ceph & OpenStack - Boston Meetup
Ceph & OpenStack - Boston MeetupCeph & OpenStack - Boston Meetup
Ceph & OpenStack - Boston Meetup
Patrick McGarry
 
Ceph and openstack at the boston meetup
Ceph and openstack at the boston meetupCeph and openstack at the boston meetup
Ceph and openstack at the boston meetup
Kamesh Pemmaraju
 
Debugging ZFS: From Illumos to Linux
Debugging ZFS: From Illumos to LinuxDebugging ZFS: From Illumos to Linux
Debugging ZFS: From Illumos to Linux
Serapheim-Nikolaos Dimitropoulos
 
OpenStack and Ceph: the Winning Pair
OpenStack and Ceph: the Winning PairOpenStack and Ceph: the Winning Pair
OpenStack and Ceph: the Winning Pair
Red_Hat_Storage
 
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Community
 

Similar to Ceph Performance and Optimization - Ceph Day Frankfurt (20)

Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons LearnedCeph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
 
Ceph in the GRNET cloud stack
Ceph in the GRNET cloud stackCeph in the GRNET cloud stack
Ceph in the GRNET cloud stack
 
Webinar - Getting Started With Ceph
Webinar - Getting Started With CephWebinar - Getting Started With Ceph
Webinar - Getting Started With Ceph
 
Ceph
CephCeph
Ceph
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
 
In-Ceph-tion: Deploying a Ceph cluster on DreamCompute
In-Ceph-tion: Deploying a Ceph cluster on DreamComputeIn-Ceph-tion: Deploying a Ceph cluster on DreamCompute
In-Ceph-tion: Deploying a Ceph cluster on DreamCompute
 
How swift is your Swift - SD.pptx
How swift is your Swift - SD.pptxHow swift is your Swift - SD.pptx
How swift is your Swift - SD.pptx
 
Surge2012
Surge2012Surge2012
Surge2012
 
Unite2013-gavilan-pdf
Unite2013-gavilan-pdfUnite2013-gavilan-pdf
Unite2013-gavilan-pdf
 
Open Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNETOpen Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNET
 
Ceph, Xen, and CloudStack: Semper Melior
Ceph, Xen, and CloudStack: Semper MeliorCeph, Xen, and CloudStack: Semper Melior
Ceph, Xen, and CloudStack: Semper Melior
 
Erasure Code at Scale - Thomas William Byrne
Erasure Code at Scale - Thomas William ByrneErasure Code at Scale - Thomas William Byrne
Erasure Code at Scale - Thomas William Byrne
 
PhegData X - High Performance EBS
PhegData X - High Performance EBSPhegData X - High Performance EBS
PhegData X - High Performance EBS
 
Introduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersIntroduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developers
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
 
Ceph & OpenStack - Boston Meetup
Ceph & OpenStack - Boston MeetupCeph & OpenStack - Boston Meetup
Ceph & OpenStack - Boston Meetup
 
Ceph and openstack at the boston meetup
Ceph and openstack at the boston meetupCeph and openstack at the boston meetup
Ceph and openstack at the boston meetup
 
Debugging ZFS: From Illumos to Linux
Debugging ZFS: From Illumos to LinuxDebugging ZFS: From Illumos to Linux
Debugging ZFS: From Illumos to Linux
 
OpenStack and Ceph: the Winning Pair
OpenStack and Ceph: the Winning PairOpenStack and Ceph: the Winning Pair
OpenStack and Ceph: the Winning Pair
 
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
 

Recently uploaded

Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 

Recently uploaded (20)

Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 

Ceph Performance and Optimization - Ceph Day Frankfurt

  • 2. Whoami 💥 Sébastien Han 💥 French Cloud Engineer working for eNovance 💥 Daily job focused on Ceph and OpenStack 💥 Blogger Personal blog: http://www.sebastien-han.fr/blog/ Company blog: http://techs.enovance.com/ Last Cephdays presentation
  • 3. How does Ceph perform? 42* *The Hitchhiker's Guide to the Galaxy
  • 5. CRUSH: deterministic object placement As soon as a client writes into Ceph, the operation is computed and the client decides to which OSD the object should belong
  • 6. Aggregation: cluster level As soon as you write into Ceph, all the objects get equally spread across the entire Cluster, understanding machines and disks..
  • 7. Aggregation: OSD level As soon as an IO goes into an OSD, no matter how the original pattern was, it becomes sequential.
  • 8. The Bad Ceph IO pattern
  • 9. Journaling As soon as an IO goes into an OSD, it gets written twice.
  • 10. Journal and OSD data on the same disk Journal penalty on the disk Since we write twice, if the journal is stored on the same disk as the OSD data this will result in the following: Device: wMB/s sdb1 - journal 50.11 sdb2 - osd_data 40.25
  • 11. Filesystem fragmentation • Objects are stored as files on the OSD filesystem • Several IO patterns with different block sizes increase filesystem fragmentation • Possible root cause: image sparseness • One year old cluster ends up with (see allocsize options for XFS): $ sudo xfs_db -c frag -r /dev/sdd actual 196334, ideal 122582, fragmentation factor 37.56%
  • 12. No parallelized reads • Ceph will always serve the read request from the primary OSD • Room for Nx times speed up where N is the replica count Blueprint from Sage for the Giant release
  • 13. Scrubbing impact • Consistent object check at the PG level • Compare replicas versions between each others (Fsck for objects) • Light scrubbing (daily) checks the object size and attributes. • Deep scrubbing (weekly) reads the data and uses checksums to ensure data integrity. • Corruption exists – ECC memory (10^15 for enterprise disk) ~113TB • No pain No gain
  • 14. The Ugly Ceph IO pattern
  • 15. IOs to the OSD disk One IO into Ceph leads to 2 writes, well… the second write is the worst!
  • 16. The problem • Several objects map to the same physical disks • Sequential streams get mixed all together • Result: The disk seeks like hell
  • 17. Even worse with erasure coding? This is just an assumption! •Since erasure coding does chunks of chunks we can possibly have this phenomena amplified
  • 19. How to start? Things that you must consider: •Use case • IO profile: Bandwidth? IOPS? Mixed? • How many IOPS or Bandwidth per client do I want to deliver? • Do I use Ceph in standalone or is it combined with a software solution? •Amount of data (usable not RAW) • Replica count • Do I have a data growth planning? •Leftover • How much data am I willing to lose if a node fails? (%) • Am I ready to be annoyed by the scrubbing process? •
  • 20. Things that you must not do • Don't put a RAID underneath your OSD • Ceph already manages the replication • Degraded RAID breaks performances • Reduce usable space on the cluster • Don't build high density nodes with a tiny cluster • Failure consideration and data to re-balance • Potential full cluster • Don't run Ceph on your hypervisors (unless you're broke) • Well maybe…
  • 22. Object store multi-backend • ObjectStore is born • Aims to support several backends: • levelDB (default) • RocksDB • Fusionio NVMKV • Seagate Kinetic • Yours!
  • 23. Why is it so good? • No more journal! Yay! • Object backends have built-in atomic functions
  • 24. Firefly leveldb • Relatively new • Need to be tested with your workload first • Tend to be more efficient with small objects