1. Ceph as storage for CloudStack
Wido den Hollander <wido@42on.com>
2. Who am I?
● Wido den Hollander
  – Co-owner of a Dutch hosting company
  – Part of the Ceph community since 2010
  – Committer and PMC member for Apache CloudStack
● Developed:
  – phprados
  – rados-java
  – libvirt RBD storage pool support
  – CloudStack integration
● Work as a Ceph and CloudStack consultant
3. Ceph
Ceph is a unified, open source distributed object store
4. So why Ceph?
● As Sage already explained... :-)
● Traditional storage systems don't scale that well
  – All have their limitations: number of disks, shelves, CPUs, network connections, etc.
  – Scaling usually meant buying a second system
● Migrating data requires service windows and watching rsync...
● Ceph clusters can grow and shrink without service interruptions
● Ceph runs on commodity hardware
  – Just add more nodes to add capacity
  – Ceph fits in smaller budgets
5. Hardware failure is the rule
● As systems grow, hardware failure becomes more frequent
  – A system with 1,000 nodes will see daily hardware issues
● Commodity hardware is cheaper, but less reliable. Ceph mitigates that
6. RBD: the RADOS Block Device
● Ceph is an object store
  – Store billions of objects in pools
  – RADOS is the heart of Ceph
● RBD block devices are striped over RADOS objects (see the sketch below)
  – Default stripe size is 4MB
  – All objects are distributed over all available Object Store Daemons (OSDs)
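As a rough sketch of what that striping means in practice, the snippet below counts how many 4MB RADOS objects back a single image; the 40GB image size is only an illustrative number, not something from the slides.

public class RbdStripeCount {
    public static void main(String[] args) {
        long objectSize = 4L * 1024 * 1024;          // default RBD stripe/object size: 4 MB
        long imageSize  = 40L * 1024 * 1024 * 1024;  // example image size: 40 GB (illustrative)

        // Ceiling division: every started 4 MB chunk becomes one RADOS object,
        // and RADOS spreads those objects over all available OSDs.
        long objects = (imageSize + objectSize - 1) / objectSize;
        System.out.println("40 GB image -> " + objects + " RADOS objects"); // 10240
    }
}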
7. RBD for Primary Storage
● In 4.0, RBD support for Primary Storage for KVM was added (see the libvirt sketch below)
  – No support for VMware or Xen
  – Xen support is being worked on (not by me)
● Live migration is supported
● Snapshot and backup support (4.2)
● Cloning when deploying from templates
● Run System VMs from RBD (4.2)
● Uses the rados-java bindings
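For context on how the KVM side reaches such storage: the Agent goes through libvirt's RBD storage pool support. The fragment below is a hypothetical sketch using the libvirt Java bindings; the pool name, Ceph pool and monitor address are made up, and cephx authentication (an <auth> element plus a libvirt secret) is left out for brevity.

import org.libvirt.Connect;
import org.libvirt.StoragePool;

// Hypothetical sketch: define and start an RBD-backed libvirt storage pool,
// similar in spirit to what the CloudStack KVM Agent sets up for Primary Storage.
public class DefineRbdPool {
    public static void main(String[] args) throws Exception {
        String poolXml =
            "<pool type='rbd'>" +
            "  <name>cloudstack-primary</name>" +              // made-up pool name
            "  <source>" +
            "    <name>rbd</name>" +                            // Ceph pool holding the images
            "    <host name='mon.example.com' port='6789'/>" +  // made-up monitor address
            "  </source>" +
            "</pool>";

        Connect conn = new Connect("qemu:///system");
        StoragePool pool = conn.storagePoolDefineXML(poolXml, 0); // persistent pool definition
        pool.create(0);                                           // activate the pool
    }
}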
10. System Virtual Machines
● Perform cluster tasks, e.g.:
  – DHCP
  – Serving metadata to Instances
  – Load balancing
  – Copying data between clusters
● Run in between user Instances
● They can now run from RBD due to a change in the way they get their metadata
  – Old way was dirty and had to be replaced
    ● It created a small disk with metadata files
11. Performance
● Parallel performance is high
  – Run 100s or 1000s of Instances with 200 IOps each
● Serial performance is moderate
  – A single Instance won't get 20k IOps
12. rados-java bindings
● Developed to have the KVM Agent perform snapshotting and cloning (see the sketch below)
  – libvirt doesn't know how to do this, but it would be best if it did
● Uses JNA, so easy deployment
● Binds both librados and librbd
● Available on github.com/ceph/rados-java
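The kind of call the Agent makes through these bindings looks roughly like the sketch below. It is based on the public rados-java API, so exact method names may differ between versions, and the pool, image and snapshot names are made up.

import java.io.File;
import com.ceph.rados.Rados;
import com.ceph.rados.IoCTX;
import com.ceph.rbd.Rbd;
import com.ceph.rbd.RbdImage;

// Sketch: create a snapshot of an RBD image through rados-java,
// roughly what the KVM Agent does when CloudStack requests a volume snapshot.
public class RbdSnapshotSketch {
    public static void main(String[] args) throws Exception {
        Rados rados = new Rados("admin");                     // cephx user id
        rados.confReadFile(new File("/etc/ceph/ceph.conf"));  // monitors, keyring, etc.
        rados.connect();

        IoCTX io = rados.ioCtxCreate("rbd");                  // pool containing the image
        Rbd rbd = new Rbd(io);

        RbdImage image = rbd.open("vm-volume-0001");          // made-up image name
        image.snapCreate("manual-snapshot");                  // made-up snapshot name
        rbd.close(image);

        rados.ioCtxDestroy(io);
    }
}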
13. Future plans
● Add RBD write caching
  – Write-cache setting per Disk Offering
    ● none (default), write-back and write-through
  – Probably in 4.3
● Native RADOS support for Secondary Storage
  – Secondary Storage already supports S3
  – Ceph has an S3-compatible gateway
● Moving logic from the KVM Agent into libvirt
  – Like snapshotting and cloning RBD images
14. Help is needed!
● Code is tested, but testing is always welcome
● Adding more RBD logic into libvirt
  – Snapshotting RBD images
  – Cloning RBD images
  – This makes the CloudStack code cleaner and helps other users who also use libvirt with RBD
● Improving the rados-java bindings
  – Not feature complete yet