Ceph in Mirantis OpenStack
Mountain View, 2014
1. What is Ceph?
2. What is Mirantis OpenStack?
3. How does Ceph fit into OpenStack?
4. What has Fuel ever done for Ceph?
5. What does it look like?
6. Things we’ve done
7. Disk partitioning for Ceph OSD
8. Cephx authentication settings
9. Types of VM migrations
10. Live VM migrations with Ceph
11. Things we left undone
12. Diagnostics and troubleshooting
What is Ceph?
Ceph is a free clustered storage platform that provides unified
object, block, and file storage.
Object Storage: RADOS objects support snapshotting and replication.
Block Storage: RBD block devices are thinly provisioned over
RADOS objects and can be accessed by QEMU via librbd.
File Storage: CephFS metadata servers (MDS) provide a
POSIX-compliant overlay over RADOS.
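For illustration, all three interfaces can be exercised from the command line of a running cluster (a sketch; the pool, image, and file names below are made up, and the CephFS mount assumes a running MDS):

```shell
# Object storage: store and fetch a RADOS object directly
rados -p test put greeting ./hello.txt
rados -p test get greeting ./hello-copy.txt

# Block storage: create a thin-provisioned 1 GiB RBD image
rbd create test/vm-disk --size 1024

# File storage: mount CephFS through the kernel client
mount -t ceph mon-host:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
```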
What is Mirantis OpenStack?
OpenStack is an open source cloud computing platform.
Mirantis ships hardened OpenStack packages and provides the Fuel
utility to simplify deployment of OpenStack and Ceph.
Fuel uses Cobbler, MCollective, and Puppet to discover
nodes, provision the OS, and set up OpenStack services.
[Diagram: Fuel master node]
How does Ceph fit into OpenStack?
RBD drivers for OpenStack make libvirt
configure the QEMU interface to librbd.
Multi-node striping and redundancy
for block storage (Cinder volumes
and Nova ephemeral drives)
Copy-on-write cloning of images to
volumes and instances
Unified storage pool for all types of
storage (object, block, POSIX)
Live migration of Ceph-backed instances
Problems: sensitivity to clock drift, multi-site replication (async
replication in Emperor), block storage density (erasure coding in
Firefly), Swift API gap (RBD backend for Swift)
What has Fuel ever done for Ceph?
1. Fuel deploys Ceph Monitors and OSDs on dedicated nodes or
in combination with OpenStack components.
2. Creates partitions for OSDs when nodes are provisioned.
3. Creates separate RADOS pools and sets up Cephx
authentication for Cinder, Glance, and Nova.
4. Configures Cinder, Glance, and Nova to use the RBD backend
with the right pools and credentials.
5. Deploys RADOS Gateway (S3 and Swift API frontend to
Ceph) behind HAProxy on controller nodes.
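Step 3 above amounts to creating one RADOS pool per service; a sketch of the kind of commands involved (the PG counts here are placeholders, not Fuel's actual values):

```shell
# One pool per OpenStack service; PG counts are placeholders
ceph osd pool create volumes 128
ceph osd pool create images 128
ceph osd pool create compute 128
```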
What does it look like?
Select storage options ⇒ assign roles to nodes ⇒ allocate disks:
Things we’ve done
1. Set the right GPT type GUIDs on OSD and journal partitions
for udev automount rules
2. ceph-deploy: set up root SSH between Ceph nodes
3. Basic Ceph settings: cephx, pool size, networks
4. Cephx: ceph auth command line can’t be split
5. RADOS Gateway: requires Inktank’s fork of FastCGI; set
an infinite revocation interval for UUID auth tokens to work
6. Patch Cinder to convert non-raw images when creating an
RBD backed volume from Glance
7. Patch Nova: clone RBD backed Glance images into RBD
backed ephemeral volumes, pass RBD user to qemu-img
8. Ephemeral RBD: disable SSH key injection, set up Nova,
libvirt, and QEMU for live migrations
Disk partitioning for Ceph OSD
Flow of disk partitioning information during discovery,
configuration, provisioning, and deployment:
[Diagram: Fuel master node]
GPT partition type GUIDs according to ceph-disk:
JOURNAL_UUID = '45b0969e-9b03-4f30-b4c6-b4b80ceff106'
OSD_UUID     = '4fbd7e29-9d25-41b8-afd0-062c0ceff05d'
If more than one device is allocated for OSD Journal, journal
devices are evenly distributed between OSDs.
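The even distribution can be sketched as a simple round-robin over the journal devices (pure shell; the device names are made up):

```shell
#!/bin/sh
# Round-robin: OSD disk number i (0-based) gets journal device (i mod njournals),
# which spreads journals evenly across the available journal devices.
assign_journal() {
    echo $(( $1 % $2 ))
}

# Example: five OSD disks, two journal devices (names are illustrative)
journals="sdg sdh"
i=0
for osd in sdb sdc sdd sde sdf; do
    idx=$(assign_journal $i 2)
    j=$(echo "$journals" | cut -d' ' -f$(( idx + 1 )))
    echo "$osd -> journal on $j"
    i=$(( i + 1 ))
done
```

With five OSD disks and two journal devices, sdg ends up serving three OSDs and sdh two, which is as even as the split can get.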
Cephx authentication settings
Monitor ACL is the same for all Cephx users.
OSD ACLs vary per OpenStack component:
Glance: allow class-read object_prefix rbd_children,
        allow rwx pool=images
Cinder: allow class-read object_prefix rbd_children,
        allow rwx pool=volumes,
        allow rx pool=images
Nova:   allow class-read object_prefix rbd_children,
        allow rwx pool=volumes,
        allow rx pool=images,
        allow rwx pool=compute
Watch out: Cephx is easily tripped up by unexpected whitespace in
ceph auth command line parameters, so we have to keep them all
on a single line.
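Putting the Cinder ACL above into practice, the key is registered with a single ceph auth invocation, kept entirely on one line to avoid the whitespace pitfall (the mon 'allow r' capability here is our assumption; the OSD capabilities are from the slide):

```shell
# Entire command on one line: ceph auth trips over stray whitespace in caps
ceph auth get-or-create client.volumes mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images'
```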
Types of VM migrations
Live vs offline: is the VM stopped during migration?
Block vs shared storage vs volume-backed: is VM data shared
between nodes? Is VM metadata (e.g. the libvirt domain
definition) shared?
Native vs tunneled: is VM state transferred directly between
hypervisors or tunneled through libvirtd?
Direct vs peer-to-peer: is migration controlled by the libvirt client or by
the libvirt daemon?
Managed vs unmanaged: is migration controlled by libvirt or by
the hypervisor?
Our choice: live, volume-backed*, native, peer-to-peer, managed.
Live VM migrations with Ceph
Enable native peer-to-peer live migration:
[Diagram: migration between source and destination compute nodes]
libvirt VIR_MIGRATE_* flags: LIVE, PEER2PEER,
Patch Nova to decouple shared volumes from shared libvirt
metadata logic during live migration
Set VNC listen address to 0.0.0.0 and block VNC from outside
the management network in iptables
Open ports 49152+ between compute nodes for QEMU migrations
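The settings above map to configuration of roughly this shape (a sketch for the Havana-era stack; the option names and values are our assumptions, not Fuel's literal output):

```ini
# /etc/libvirt/libvirtd.conf: allow unencrypted peer-to-peer migration traffic
listen_tls = 0
listen_tcp = 1

# /etc/nova/nova.conf: native peer-to-peer live migration flags,
# VNC bound on all interfaces (firewalled down to the management network)
live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
vncserver_listen = 0.0.0.0
```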
Things we left undone
1. Non-root user with sudo for ceph-deploy
2. Calculate PG numbers based on the number of OSDs
3. Ceph public network should go to a second storage network
instead of management
4. Dedicated Monitor nodes, list all Monitors in ceph.conf on
each Ceph node
5. Multi-backend configuration for Cinder
6. A better way to configure pools for OpenStack services (than
CEPH_ARGS in the init script)
7. Make Nova update VM’s VNC listen address to
vncserver_listen of the destination compute after migration
8. Replace ’qemu-img convert’ with clone_image() in
LibvirtDriver.snapshot() in Nova
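Item 2 can be sketched with the common rule of thumb: target about 100 placement groups per OSD, divide by the replica count, and round up to a power of two (the constant 100 is our assumption, not a Fuel default):

```shell
#!/bin/sh
# Rule-of-thumb PG count: (100 * num_osds) / pool_size,
# rounded up to the next power of two.
pg_num() {
    osds=$1; size=$2
    target=$(( 100 * osds / size ))
    pg=1
    while [ $pg -lt $target ]; do
        pg=$(( pg * 2 ))
    done
    echo $pg
}

pg_num 6 3    # 6 OSDs, 3 replicas -> 256
pg_num 40 3   # 40 OSDs, 3 replicas -> 2048
```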
Diagnostics and troubleshooting
ceph osd tree
cinder create 1
qemu-img convert -O raw cirros.qcow2 cirros.raw
glance image-create --name cirros-raw --is-public yes \
    --container-format bare --disk-format raw < cirros.raw
nova boot --flavor 1 --image cirros-raw vm0
nova live-migration vm0 node-3
disk partitioning failed during provisioning – check if traces of
previous partition tables are left on any drives
’ceph-deploy config pull’ failed – check if the node can ssh to the
primary controller over the management network
HEALTH_WARN: clock skew detected – check your ntpd settings,
make sure your NTP server is reachable from all nodes
ENOSPC when storing small objects in RGW – try setting a
smaller rgw object stripe size
Read the docs:
Get the code:
Mirantis OpenStack ISO image and VirtualBox scripts,
ceph Puppet module for Fuel,
Josh Durgin’s havana-ephemeral-rbd branch for Nova.
Vote on Nova bugs:
#1226351, #1261675, #1262450, #1262914.
Sign up for Mirantis and Inktank webcast on Ceph and OpenStack.