The document summarizes a presentation about Ceph storage and its integration with OpenStack. It discusses:
- Ceph is an open source distributed storage system that is self-managing, self-healing, and scales easily. It uses a pseudo-random placement algorithm called CRUSH to distribute data.
- Ceph has become the de facto storage backend for OpenStack. The presentation discusses the current status of Ceph integration in OpenStack Havana and upcoming improvements planned for Icehouse.
- Future releases such as Firefly will add new features to Ceph, including tiering, erasure coding, ZFS support, and multi-backend filestore capabilities.
1. Ceph: de facto storage backend for OpenStack
FOSDEM 2014
- Sébastien Han
- French Cloud Engineer working for eNovance
- Daily job focused on Ceph and OpenStack
3. Unified distributed storage system
➜ Started in 2006 | Open Source LGPL | Written in C++
➜ Self managing/healing
➜ Self balancing (uniform distribution)
➜ Painless scaling
➜ Data placement with CRUSH
➜ Pseudo-random placement algorithm
➜ Rule-based configuration
10. Icehouse progress
BLUEPRINTS / BUGS | STATUS
Swift RADOS backend | In progress
DevStack Ceph | In progress
RBD TGT for other hypervisors | Not started
Enable cloning for rbd-backed ephemeral disks | In progress
Clone non-raw images in Glance RBD backend | Implemented
Nova ephemeral backend dedicated pool and user | Implemented
Volume migration support | Not started
Use RBD snapshot instead of qemu-img | Not started
It provides numerous features:
Self healing: if something breaks, the cluster reacts and triggers a recovery process
Self balancing: as soon as you add a new disk or a new node, the cluster moves and rebalances data
Self managing: periodic tasks such as scrubbing check object consistency, and if something is wrong Ceph repairs the object
Painless scaling: it is fairly easy to add a new disk or node, especially with all the tools out there to deploy Ceph (Puppet, Chef, ceph-deploy); see the short sketch after this list
Intelligent data placement: you can logically reflect your physical infrastructure and build placement rules
Objects are automatically placed, balanced, and migrated in a dynamic cluster
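As an illustration of the "painless scaling" point above, here is a minimal, hypothetical ceph-deploy session that adds a disk on a new node to a running cluster; the hostname and device name are made up, and Puppet or Chef modules achieve the same result:

    ceph-deploy install node4          # install the Ceph packages on the new node
    ceph-deploy disk zap node4:sdb     # wipe the disk that will back the new OSD
    ceph-deploy osd create node4:sdb   # prepare and activate the OSD on that disk
    ceph osd tree                      # verify the new OSD joined the CRUSH map
    ceph -s                            # watch the cluster rebalance data onto it

Once the OSD is in, CRUSH redistributes placement groups automatically; no manual data migration is required.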
Controlled Replication Under Scalable Hashing
pseudo-random placement algorithm
fast calculation, no lookup
repeatable, deterministic
rule-based configuration
infrastructure topology aware
adjustable replication
The way CRUSH is configured is somewhat unique. Instead of defining pools for different data types, workgroups, subnets, or applications, CRUSH is configured with the physical topology of your storage network. You tell it how many buildings, rooms, shelves, racks, and nodes you have, and you tell it how you want data placed. For example, you could tell CRUSH that it’s okay to have two replicas in the same building, but not on the same power circuit. You also tell it how many copies to keep.
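To make the rule-based configuration concrete, here is a hypothetical rule in the decompiled CRUSH map syntax of Ceph releases from that era; the rule name is invented, and it simply spreads replicas across distinct racks under the default root:

    rule replicated_racks {
            ruleset 1
            type replicated
            min_size 2
            max_size 3
            step take default
            step chooseleaf firstn 0 type rack
            step emit
    }

A pool created with this ruleset gets each of its replicas on OSDs that live in different racks, which is how constraints such as "not in the same rack" or "not on the same power circuit" are expressed.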
RADOS is a distributed object store. On top of RADOS, we have built three systems that allow us to store data
Several ways to access data (short usage sketches follow the feature lists below)
RGW
Native RESTful
S3 and Swift compatible
Multi-tenant and quota
Multi-site capabilities
Disaster recovery
RBD
Thinly provisioned
Full and Incremental Snapshots
Copy-on-write cloning
Native Linux kernel driver support
Supported by KVM and Xen
CephFS
POSIX-compliant semantics
Subdirectory snapshots
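As a small usage sketch for the S3-compatible RGW API listed above, the snippet below uses the boto library against a RADOS Gateway; the endpoint, credentials, and bucket name are placeholders:

    import boto
    import boto.s3.connection

    # Connect to the RADOS Gateway through its S3-compatible endpoint.
    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',        # placeholder credentials
        aws_secret_access_key='SECRET_KEY',
        host='rgw.example.com',                # hypothetical RGW endpoint
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

    # Create a bucket and store a small object in it.
    bucket = conn.create_bucket('demo-bucket')
    key = bucket.new_key('hello.txt')
    key.set_contents_from_string('stored in RADOS through RGW')

The same objects are also reachable through the Swift-compatible API with a Swift client pointed at the gateway.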
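The RBD features above (thin provisioning, snapshots) can also be driven programmatically; this sketch uses the librados/librbd Python bindings, with pool, image, and snapshot names that are purely illustrative:

    import rados
    import rbd

    # Connect to the cluster using the local ceph.conf and default keyring.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')          # default 'rbd' pool

    # Create a thin-provisioned 10 GB image, then take a point-in-time snapshot.
    rbd.RBD().create(ioctx, 'demo-image', 10 * 1024 ** 3)
    image = rbd.Image(ioctx, 'demo-image')
    image.create_snap('first-snap')

    image.close()
    ioctx.close()
    cluster.shutdown()

In OpenStack, Cinder and Glance use these same Python bindings, while Nova attaches RBD images to guests through libvirt and QEMU's built-in librbd support.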