Ceph: de facto storage backend for OpenStack

FOSDEM 2014

- Sébastien Han
- French Cloud Engineer working for eNovance
- Daily job focused on Ceph and OpenStack
Ceph
What is it?
Unified distributed storage system
➜ Started in 2006 | Open Source LGPL | Written in C++
➜ Self managing/healing
➜ Self balancing (uniform distribution)
➜ Painless scaling
➜ Data placement with CRUSH

➜ Pseudo-random placement algorithm
➜ Rule-based configuration
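
To make the data placement point concrete, here is a minimal python-rados sketch (not from the slides): the client computes an object's placement with CRUSH locally and talks to the responsible OSDs directly, with no central lookup. It assumes a reachable cluster, a standard /etc/ceph/ceph.conf and a pool named 'rbd' (both are placeholders).

    import rados

    # Connect using the local cluster configuration (assumed path).
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    ioctx = cluster.open_ioctx('rbd')                   # placeholder pool name
    ioctx.write_full('fosdem-object', b'hello ceph')    # placement is computed via CRUSH, no lookup table
    print(ioctx.read('fosdem-object'))

    ioctx.close()
    cluster.shutdown()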
Overview
State of the integration
OpenStack Havana
Today’s Havana integration
Havana is not the perfect stack…
➜ Nova RBD ephemeral backend is buggy:
https://github.com/jdurgin/nova/commits/havana-ephemeral-rbd
Icehouse status
Future
Tomorrow’s integration
Icehouse progress
BLUEPRINTS / BUGS | STATUS

➜ Swift RADOS backend | In progress
➜ DevStack Ceph | In progress
➜ RBD TGT for other hypervisors | Not started
➜ Enable cloning for rbd-backed ephemeral disks | In progress
➜ Clone non-raw images in Glance RBD backend | Implemented
➜ Nova ephemeral backend dedicated pool and user | Implemented
➜ Volume migration support | Not started
➜ Use RBD snapshot instead of qemu-img | Not started
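
Two items above (cloning for rbd-backed ephemeral disks, cloning non-raw images in the Glance RBD backend) build on RBD's copy-on-write cloning. As a rough illustration only, not the actual Nova or Glance code, the following python-rbd sketch snapshots a base image and clones it into a new disk; the pool name 'images' and the image names are placeholder assumptions.

    import rados
    import rbd

    # Connect to the cluster (assumes a standard /etc/ceph/ceph.conf).
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('images')                # placeholder pool name

    # Create a layering-enabled base image, snapshot it and protect the snapshot.
    rbd.RBD().create(ioctx, 'base-image', 10 * 1024**3,
                     old_format=False, features=rbd.RBD_FEATURE_LAYERING)
    image = rbd.Image(ioctx, 'base-image')
    image.create_snap('golden')
    image.protect_snap('golden')
    image.close()

    # Copy-on-write clone: the new disk shares unmodified data with the snapshot.
    rbd.RBD().clone(ioctx, 'base-image', 'golden',
                    ioctx, 'vm-disk-0', features=rbd.RBD_FEATURE_LAYERING)

    ioctx.close()
    cluster.shutdown()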
Ceph, what’s coming up?
Roadmap
Firefly
➜ Tiering - cache pool overlay
➜ Erasure code
➜ Ceph OSD on ZFS
➜ Filestore multi-backend
Many thanks!
Questions?
Contact: sebastien@enovance.com
Twitter: @sebastien_han
IRC: leseb
Company blog: http://techs.enovance.com/
Personal blog: http://www.sebastien-han.fr/blog/


Editor's Notes

  • #4 It provides numerous features. Self healing: if something breaks, the cluster reacts and triggers a recovery process. Self balancing: as soon as you add a new disk or a new node, the cluster moves and re-balances data. Self managing: periodic tasks such as scrubbing check object consistency, and if something is wrong Ceph repairs the object. Painless scaling: it is fairly easy to add a new disk or node, especially with all the tools out there to deploy Ceph (Puppet, Chef, ceph-deploy). Intelligent data placement: you can logically reflect your physical infrastructure and build placement rules; objects are automatically placed, balanced and migrated in a dynamic cluster. CRUSH (Controlled Replication Under Scalable Hashing) is a pseudo-random placement algorithm: fast calculation, no lookup, repeatable and deterministic, rule-based configuration, infrastructure topology aware, adjustable replication. The way CRUSH is configured is somewhat unique. Instead of defining pools for different data types, workgroups, subnets, or applications, CRUSH is configured with the physical topology of your storage network. You tell it how many buildings, rooms, shelves, racks, and nodes you have, and you tell it how you want data placed. For example, you could tell CRUSH that it's okay to have two replicas in the same building, but not on the same power circuit. You also tell it how many copies to keep.
  • #5 RADOS is a distributed object store. On top of RADOS we have built three systems that allow us to store data, so there are several ways to access it. RGW: native RESTful API, S3 and Swift compatible, multi-tenant with quotas, multi-site capabilities, disaster recovery. RBD: thinly provisioned, full and incremental snapshots, copy-on-write cloning, native Linux kernel driver support, supported by KVM and Xen. CephFS: POSIX-compliant semantics, subdirectory snapshots.
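
Since the RADOS Gateway speaks the S3 API, any S3 client can talk to it. A minimal boto sketch, assuming a reachable RGW endpoint and credentials created on your deployment (the host name and keys below are placeholders):

    import boto
    import boto.s3.connection

    # Connect to the RADOS Gateway through its S3-compatible endpoint.
    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',            # placeholder
        aws_secret_access_key='SECRET_KEY',        # placeholder
        host='rgw.example.com',                    # placeholder RGW endpoint
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

    # Create a bucket, upload an object, then list the buckets we own.
    bucket = conn.create_bucket('fosdem-demo')
    key = bucket.new_key('hello.txt')
    key.set_contents_from_string('hello from the RADOS Gateway')
    print([b.name for b in conn.get_all_buckets()])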