Ceph as storage for CloudStack

Wido den Hollander <wido@42on.com>
Who am I?
●

Wido den Hollander
  – Co-owner of a Dutch hosting company
  – Part of the Ceph community since 2010
  – Committer and PMC member for Apache CloudStack

● Developed:
  – rados-java
  – libvirt RBD storage pool support
  – phprados
  – CloudStack integration

● Work as a Ceph and CloudStack consultant
Ceph
Ceph is a unified, open source distributed object store
Auto recovery
● Recovery when an OSD fails
● Data migration when the cluster expands or contracts
Traditional vs Distributed
● Traditional storage systems don't scale that well
  – All have their limitations: number of disks, shelves, CPUs, network connections, etc.
  – Scaling usually meant buying a second system
    ● Migrating data requires service windows
    ● We don't want to watch rsync copying over data and wasting our time
● Ceph clusters can grow and shrink without service interruptions
● Ceph runs on commodity hardware
  – Just add more nodes to add capacity
  – Ceph fits in smaller budgets
Hardware failure is the rule
● As systems grow, hardware failure becomes more frequent
  – A system with 1,000 nodes will see daily hardware issues
  – We don't want to get out of bed when a machine fails at 03:00 on a Sunday morning
● Commodity hardware is cheaper, but less reliable; Ceph mitigates that
RBD: the RADOS Block Device
● Ceph is an object store
  – Store billions of objects in pools
  – RADOS is the heart of Ceph
● RBD block devices are striped over RADOS objects (see the arithmetic sketch below)
  – Default stripe size is 4MB
  – All objects are distributed over all available Object Storage Daemons (OSDs)
  – A 40GB image consists of up to 10,000 potential objects
  – Thin provisioned
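
As a concrete illustration of the striping above, a small self-contained Java sketch (names and numbers are illustrative only; the real object names are generated by librbd, and 40GiB / 4MiB gives 10,240 objects, i.e. roughly the 10,000 quoted above):

    public class RbdStriping {
        // Default RBD object/stripe size: 4MB
        static final long STRIPE_SIZE = 4L * 1024 * 1024;

        public static void main(String[] args) {
            long imageSize = 40L * 1024 * 1024 * 1024;   // a 40GB image
            long objectCount = imageSize / STRIPE_SIZE;  // 10,240 potential RADOS objects
            long offset = 12L * 1024 * 1024 * 1024;      // a write at byte offset 12GB...
            long objectIndex = offset / STRIPE_SIZE;     // ...touches object index 3,072

            System.out.println("objects backing the image: " + objectCount);
            System.out.println("object index for the write: " + objectIndex);
            // Thin provisioning: an object only exists once data has been written to
            // its 4MB region, so an empty image consumes almost no space.
        }
    }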
RADOS Block Device (diagram slide)
RBD for Primary Storage
● In 4.0, RBD support for Primary Storage for KVM was added (libvirt pool sketch below)
  – No support for VMware or Xen
  – Xen support is being worked on (not by me)
● Live migration is supported
● Snapshot and backup support (4.2)
● Cloning when deploying from templates
● Run System VMs from RBD (4.2)
● Uses the rados-java bindings
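
To make the integration concrete: for KVM, Primary Storage on RBD maps onto a libvirt RBD storage pool. Below is a minimal sketch using the libvirt-java bindings; it is not the actual CloudStack agent code, and the monitor host, Ceph pool name and secret UUID are placeholders:

    import org.libvirt.Connect;
    import org.libvirt.StoragePool;

    public class DefineRbdPool {
        public static void main(String[] args) throws Exception {
            // XML for a libvirt RBD storage pool; all values below are placeholders
            String poolXml =
                "<pool type='rbd'>" +
                "  <name>cloudstack-primary</name>" +
                "  <source>" +
                "    <name>cloudstack</name>" +                      // the RADOS pool holding the RBD images
                "    <host name='mon1.example.com' port='6789'/>" +  // a Ceph monitor
                "    <auth username='admin' type='ceph'>" +
                "      <secret uuid='00000000-0000-0000-0000-000000000000'/>" + // cephx key stored as a libvirt secret
                "    </auth>" +
                "  </source>" +
                "</pool>";

            Connect conn = new Connect("qemu:///system");
            StoragePool pool = conn.storagePoolDefineXML(poolXml, 0);
            pool.create(0); // start the pool; its volumes are the RBD images
            System.out.println("Defined RBD pool: " + pool.getName());
            conn.close();
        }
    }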
RBD for Primary Storage (diagram slide)
System Virtual Machines
● Perform cluster tasks, e.g.:
  – Serving metadata to Instances
  – Load balancing
  – Copying data between clusters
  – DHCP
● Run in between user Instances
● They can now run from RBD due to a change in the way they get their metadata
  – The old way was dirty and had to be replaced
    ● It created a small disk with metadata files
rados-java bindings
● Developed to have the KVM Agent perform snapshotting and cloning
  – libvirt doesn't know how to do this, but it would be best if it did
● Uses JNA, so deployment is easy
● Binds both librados and librbd (usage sketch below)
● Available at github.com/ceph/rados-java
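
As an illustration of how the KVM Agent drives these bindings, here is a minimal snapshot sketch. The monitor address, key, pool, image and snapshot names are placeholders, and the exact class and method names should be checked against the rados-java README for the version you use:

    import com.ceph.rados.Rados;
    import com.ceph.rados.IoCTX;
    import com.ceph.rbd.Rbd;
    import com.ceph.rbd.RbdImage;

    public class RbdSnapshotSketch {
        public static void main(String[] args) throws Exception {
            Rados rados = new Rados("admin");               // cephx user id
            rados.confSet("mon_host", "mon1.example.com");  // placeholder monitor
            rados.confSet("key", "<cephx-key>");            // placeholder key
            rados.connect();

            IoCTX io = rados.ioCtxCreate("cloudstack");     // RADOS pool used as Primary Storage
            Rbd rbd = new Rbd(io);

            RbdImage image = rbd.open("i-2-42-VM-ROOT");    // placeholder image name
            image.snapCreate("cloudstack-snapshot-1");      // the snapshot CloudStack asked for
            rbd.close(image);

            rados.ioCtxDestroy(io);
        }
    }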
Future plans
● Add RBD write caching
  – Write-cache setting per Disk Offering
    ● none (default), write-back and write-through
  – Probably in 4.3
● Native RADOS support for Secondary Storage
  – Secondary Storage already supports S3
  – Ceph has an S3-compatible gateway (client sketch below)
● Moving logic from the KVM Agent into libvirt
  – Like snapshotting and cloning RBD images
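
The Secondary Storage plan builds on the fact that any S3 client can be pointed at the RADOS Gateway instead of Amazon. A minimal sketch with the AWS SDK for Java; the endpoint, credentials, bucket and template file are placeholders:

    import java.io.File;

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;

    public class RgwSecondaryStorageSketch {
        public static void main(String[] args) {
            AmazonS3Client s3 = new AmazonS3Client(
                    new BasicAWSCredentials("RGW_ACCESS_KEY", "RGW_SECRET_KEY"));
            s3.setEndpoint("http://rgw.example.com");  // RADOS Gateway instead of s3.amazonaws.com

            s3.createBucket("secondary-storage");
            s3.putObject("secondary-storage",
                         "templates/example-template.qcow2",       // placeholder object key
                         new File("/tmp/example-template.qcow2")); // placeholder local file
        }
    }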
Help is needed!
● Code is tested, but testing is always welcome
● Adding more RBD logic into libvirt
  – Cloning RBD images
  – Snapshotting RBD images
  – This makes the CloudStack code cleaner and helps other users who also use libvirt with RBD
● Improving the rados-java bindings
  – Not feature complete yet
Thanks
● Find me on:
  – E-mail: wido@42on.com
  – IRC: widodh @ Freenode / wido @ OFTC
  – Skype: widodh / contact42on
  – Twitter: widodh
