Ceph: de facto storage backend for OpenStack

OpenStack Summit 2013
Hong Kong
Whoami
➜ Sébastien Han
➜ French Cloud Engineer working for eNovance
➜ Daily job focused on Ceph and OpenStack
➜ Blogger
Personal blog: http://www.sebastien-han.fr/blog/
Company blog: http://techs.enovance.com/

Worldwide offices
We design, build and run clouds – anytime coverage
Ceph
What is it?
The project
➜ Unified distributed storage system
➜ Started in 2006 as a PhD project by Sage Weil
➜ Open source under LGPL license
➜ Written in C++
➜ Build the future of storage on commodity hardware
Key features
➜ Self managing/healing
➜ Self balancing
➜ Painless scaling
➜ Data placement with CRUSH
Controlled Replication Under Scalable Hashing
➜ Pseudo-random placement algorithm
➜ Statistically uniform distribution
➜ Rule-based configuration
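
In practice, "rule-based configuration" means editing a small text CRUSH map. A minimal sketch of a replicated rule, assuming a hierarchy that declares hosts and racks under a root bucket named "default" (names are illustrative; syntax is the classic decompiled CRUSH map format):

  rule replicated_across_racks {
      ruleset 1
      type replicated
      min_size 1
      max_size 10
      # start from the root of the hierarchy
      step take default
      # pick one leaf (OSD) in as many distinct racks as the pool has replicas
      step chooseleaf firstn 0 type rack
      step emit
  }

Because the same map also describes the physical topology (root, datacenter, rack, host, OSD), this mechanism can express constraints such as "never place two replicas in the same rack".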
Overview
Building a Ceph cluster
General considerations
How to start?
➜ Use case
• IO profile: Bandwidth? IOPS? Mixed?
• Guaranteed IO: how many IOPS or how much bandwidth do I want to deliver per client?
• Usage: do I use Ceph standalone or combined with a software solution?

➜ Amount of data (usable, not raw; see the sizing example below)
• Replica count
• Failure ratio: how much data am I willing to rebalance if a node fails?
• Do I have a data growth plan?

➜ Budget :-)
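
A back-of-the-envelope sizing example (all numbers invented for illustration): with 10 nodes of 12 x 4 TB disks, 3x replication and one node's worth of failure headroom,

  raw capacity            10 x 12 x 4 TB              = 480 TB
  after 3x replication    480 TB / 3                  = 160 TB
  failure headroom        keep ~1 node free: 160 x 0.9 ≈ 144 TB
  comfortable fill level  stay under ~80%:   144 x 0.8 ≈ 115 TB usable

So roughly a quarter of the raw capacity is what can safely be promised to clients, which is why sizing should start from usable data, not raw disks.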
Things that you must not do
➜ Don't put RAID underneath your OSDs
• Ceph already manages the replication
• A degraded RAID breaks performance
• It reduces the usable space of the cluster

➜ Don't build high-density nodes with a tiny cluster
• Failure consideration and data to re-balance
• Risk of filling up the cluster

➜ Don't run Ceph on your hypervisors (unless you're broke)
State of the integration
Including Havana's best additions
Why is Ceph so good?
It unifies OpenStack components
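
One concrete illustration of that unification is Glance storing its images directly in RADOS. A minimal glance-api.conf sketch for the Havana-era RBD store (pool and user names are illustrative):

  default_store = rbd
  rbd_store_ceph_conf = /etc/ceph/ceph.conf
  rbd_store_user = glance
  rbd_store_pool = images
  rbd_store_chunk_size = 8

With images, volumes and VM disks all living in the same cluster, snapshots, clones and boot-from-volume become copy-on-write operations inside RADOS rather than full copies over the network.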
Havana’s additions
➜ Complete refactor of the Cinder driver:
• Librados and librbd usage
• Flatten volumes created from snapshots
• Clone depth

➜ Cinder backup with a Ceph backend:
• Backing up within the same Ceph pool (not recommended)
• Backing up between different Ceph pools
• Backing up between different Ceph clusters
• Support for RBD stripes
• Differential backups

➜ Nova libvirt_images_type = rbd (see the configuration sketch below)
• Directly boot all the VMs in Ceph
• Volume QoS
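
A hedged configuration sketch of how these Havana pieces fit together (option names follow the Havana-era RBD documentation; pool names, user names and the secret UUID are illustrative):

  # cinder.conf: RBD volume driver
  volume_driver = cinder.volume.drivers.rbd.RBDDriver
  rbd_pool = volumes
  rbd_user = cinder
  rbd_secret_uuid = <libvirt secret UUID>
  rbd_flatten_volume_from_snapshot = false
  rbd_max_clone_depth = 5

  # cinder.conf: Ceph backup driver
  backup_driver = cinder.backup.drivers.ceph
  backup_ceph_user = cinder-backup
  backup_ceph_pool = backups

  # nova.conf: boot every VM directly in Ceph
  libvirt_images_type = rbd
  libvirt_images_rbd_pool = vms
  libvirt_images_rbd_ceph_conf = /etc/ceph/ceph.conf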
Today’s Havana integration
Is Havana the perfect stack?
…
Well, almost…
What’s missing?
➜ Direct URL download for Nova
• Already in the pipeline, probably for 2013.2.1

➜ Nova’s snapshots integration
• Ceph snapshot

https://github.com/jdurgin/nova/commits/havana-ephemeralrbd
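
For context, the native RBD operations that this integration aims to use instead of full image copies look roughly like this (pool and image names are illustrative):

  # snapshot a VM disk, protect the snapshot, then clone it copy-on-write
  rbd snap create vms/instance-0001_disk@snap1
  rbd snap protect vms/instance-0001_disk@snap1
  rbd clone vms/instance-0001_disk@snap1 images/new-image
  # optionally detach the clone from its parent
  rbd flatten images/new-image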
Icehouse and beyond
Future
Tomorrow’s integration
Icehouse roadmap
➜ Implement “bricks” for RBD
➜ Re-implement the snapshot function to use RBD snapshots
➜ RBD on Nova bare metal
➜ Volume migration support
➜ RBD stripes support

« J » potential roadmap

➜ Manila support
Ceph, what’s coming up?
Roadmap
Firefly
➜ Tiering - cache pool overlay
➜ Erasure code
➜ ZFS support for the Ceph OSD
➜ Full support of OpenStack Icehouse
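
For a flavour of what the tiering work looked like once it landed in Firefly (commands are the Firefly-era CLI, not something shown in the talk; pool names are illustrative), a cache pool overlay is wired up roughly like this:

  # put a fast replicated pool in front of a slower base pool
  ceph osd tier add coldpool hotpool
  ceph osd tier cache-mode hotpool writeback
  ceph osd tier set-overlay coldpool hotpool

  # and an erasure-coded pool is created instead of a replicated one with
  ceph osd pool create ecpool 128 128 erasure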
Many thanks!
Questions?

Contact: sebastien@enovance.com
Twitter: @sebastien_han
IRC: leseb


Editor's Notes

  • #5 Insist on commodity hardware: open source, so no vendor lock-in; no software or hardware lock-in either. You don't need big boxes anymore, and you can mix diverse hardware (old, cheap, recent), which means the cluster moves along with your needs and your budget. It also makes Ceph easy to test.
  • #6 It provides numerous features.
    Self healing: if something breaks, the cluster reacts and triggers a recovery process.
    Self balancing: as soon as you add a new disk or a new node, the cluster moves and re-balances data.
    Self managing: periodic tasks such as scrubbing check object consistency, and if something is wrong Ceph repairs the object.
    Painless scaling: it is fairly easy to add a new disk or node, especially with all the tools out there to deploy Ceph (Puppet, Chef, ceph-deploy; see the ceph-deploy sketch after these notes).
    Intelligent data placement: you can logically reflect your physical infrastructure and build placement rules; objects are automatically placed, balanced and migrated in a dynamic cluster.
  • #7 Controlled Replication Under Scalable Hashing: a pseudo-random placement algorithm; fast calculation with no lookup; repeatable and deterministic; statistically uniform distribution; rule-based configuration; infrastructure-topology aware; adjustable replication. The way CRUSH is configured is somewhat unique: instead of defining pools for different data types, workgroups, subnets or applications, CRUSH is configured with the physical topology of your storage network. You tell it how many buildings, rooms, shelves, racks and nodes you have, and you tell it how you want data placed. For example, you could tell CRUSH that it's okay to have two replicas in the same building, but not on the same power circuit. You also tell it how many copies to keep.
  • #8 RADOS is a distributed object store, and it's the foundation for Ceph. On top of RADOS we have built three systems that provide several ways to access data:
    RGW: native RESTful API, S3- and Swift-compatible, multi-tenant with quotas, multi-site capabilities, disaster recovery.
    RBD: thinly provisioned, full and incremental snapshots, copy-on-write cloning, native Linux kernel driver support, supported by KVM and Xen.
    CephFS: POSIX-compliant semantics, subdirectory snapshots.
  • #13 Ceph tightly interacts with the OpenStack components.
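
  To make the "painless scaling" point concrete, adding a node with ceph-deploy (one of the tools mentioned above) is roughly the following; host and device names are illustrative, and the exact invocation depends on the ceph-deploy version:

    # install the packages and push the cluster configuration to the new node
    ceph-deploy install ceph-node4
    ceph-deploy admin ceph-node4
    # turn each empty disk into an OSD; CRUSH then rebalances data automatically
    ceph-deploy osd create ceph-node4:sdb
    ceph-deploy osd create ceph-node4:sdc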