What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Meetup October 14 2014

October 2014 overview of the Ceph distributed storage system architecture, integration with OpenStack, and plans for future development.

  1. Colorado OpenStack Meetup, 14 OCT 2014
  2. HISTORICAL TIMELINE
      • 2004: Project starts at UCSC
      • 2006: Open source
      • 2010: Mainline Linux kernel
      • 2011: OpenStack integration
      • MAY 2012: Launch of Inktank
      • SEPT 2012: Production-ready Ceph
      • 2012: CloudStack integration
      • 2013: Xen integration
      • OCT 2013: Inktank Ceph Enterprise launch
      • FEB 2014: RHEL-OSP certification
      • APR 2014: Inktank acquired by Red Hat
  3. OPENSTACK USER SURVEY, 05/2014 (DEV/QA, PROOF OF CONCEPT, PRODUCTION)
  4. A STORAGE REVOLUTION
  5. ARCHITECTURE
  6. ARCHITECTURAL COMPONENTS (APP, HOST/VM, CLIENT)
  7. ARCHITECTURAL COMPONENTS (APP, HOST/VM, CLIENT)
  8. OBJECT STORAGE DAEMONS (btrfs, xfs, ext4, zfs?)
  9. RADOS CLUSTER
  10. RADOS COMPONENTS
      OSDs:
      • 10s to 10,000s in a cluster
      • One per disk (or one per SSD, RAID group…)
      • Serve stored objects to clients
      • Intelligently peer for replication & recovery
      Monitors:
      • Maintain cluster membership and state
      • Provide consensus for distributed decision-making
      • Small, odd number
      • These do not serve stored objects to clients
  11. WHERE DO OBJECTS LIVE?
  12. A METADATA SERVER?
  13. CALCULATED PLACEMENT (A-G, H-N, O-T, U-Z)
  14. EVEN BETTER: CRUSH! (PLACEMENT GROUPS (PGs), CLUSTER)
  15. CRUSH IS A QUICK CALCULATION (RADOS CLUSTER)
  16. CRUSH: DYNAMIC DATA PLACEMENT
      • Pseudo-random placement algorithm
      • Fast calculation, no lookup
      • Repeatable, deterministic
      • Statistically uniform distribution
      • Stable mapping
        • Limited data migration on change
      • Rule-based configuration
        • Infrastructure topology aware
        • Adjustable replication
        • Weighting
  17. CRUSH
      • Step 1: hash(object name) % num pg
      • Step 2: CRUSH(pg, cluster state, rule set)
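     A minimal sketch of the two-step mapping behind the formulas on slide 17: the object name is hashed and taken modulo the pool's PG count to choose a placement group, and CRUSH then maps that placement group to a set of OSDs. The toy crush_map() below only stands in for the real CRUSH algorithm (which walks the cluster map, failure domains, weights and rules); the PG count, OSD count and replica count are made-up values for illustration.

        import hashlib

        PG_NUM = 128      # placement groups in a hypothetical pool
        NUM_OSDS = 12     # OSDs in a hypothetical cluster
        REPLICAS = 3

        def object_to_pg(object_name):
            """Step 1: hash(object name) % num pg."""
            h = int(hashlib.md5(object_name.encode()).hexdigest(), 16)
            return h % PG_NUM

        def crush_map(pg, num_osds=NUM_OSDS, replicas=REPLICAS):
            """Step 2 (stand-in): deterministically pick `replicas` distinct
            OSDs for a PG; real CRUSH consults the cluster state and rule set."""
            osds = []
            attempt = 0
            while len(osds) < replicas:
                h = int(hashlib.md5(("%s:%d" % (pg, attempt)).encode()).hexdigest(), 16)
                osd = h % num_osds
                if osd not in osds:
                    osds.append(osd)
                attempt += 1
            return osds

        pg = object_to_pg("my-object")
        print("object 'my-object' -> PG %d -> OSDs %s" % (pg, crush_map(pg)))

     Because the calculation is deterministic and needs no lookup table, any client can compute placement independently, which is the point of slides 15 and 16.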
  18.-25. (diagram-only slides)
  26. ARCHITECTURAL COMPONENTS (APP, HOST/VM, CLIENT)
  27. ACCESSING A RADOS CLUSTER (socket, RADOS CLUSTER)
  28. LIBRADOS: RADOS ACCESS FOR APPS
      LIBRADOS:
      • Direct access to RADOS for applications
      • C, C++, Python, PHP, Java, Erlang
      • Direct access to storage nodes
      • No HTTP overhead
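     To make the LIBRADOS bullets concrete, a short Python sketch using the rados bindings that ship with Ceph; the config path, pool name and object name are assumptions rather than anything from the slides.

        import rados

        # Connect using a local ceph.conf (path and credentials assumed).
        cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
        cluster.connect()
        print("cluster FSID:", cluster.get_fsid())

        # Open an I/O context on an existing pool (pool name assumed).
        ioctx = cluster.open_ioctx('rbd')

        # Write an object and read it back: no HTTP, no gateway, straight to RADOS.
        ioctx.write_full('hello-object', b'Hello from librados')
        print(ioctx.read('hello-object'))

        ioctx.close()
        cluster.shutdown()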
  29. ARCHITECTURAL COMPONENTS (APP, HOST/VM, CLIENT)
  30. THE RADOS GATEWAY (REST, socket, RADOS CLUSTER)
  31. RADOSGW MAKES RADOS WEBBY
      RADOSGW:
      • REST-based object storage proxy
      • Uses RADOS to store objects
      • API supports buckets, accounts
      • Usage accounting for billing
      • Compatible with S3 and Swift applications
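     Because RADOSGW exposes an S3-compatible API, a stock S3 client can be pointed straight at it. A sketch using the boto library; the gateway hostname and the access/secret keys are placeholders for whatever the radosgw endpoint and its user were set up with.

        import boto
        import boto.s3.connection

        conn = boto.connect_s3(
            aws_access_key_id='ACCESS_KEY',          # placeholder credentials
            aws_secret_access_key='SECRET_KEY',
            host='rgw.example.com',                  # placeholder RGW endpoint
            is_secure=False,
            calling_format=boto.s3.connection.OrdinaryCallingFormat(),
        )

        bucket = conn.create_bucket('my-bucket')
        bucket.new_key('hello.txt').set_contents_from_string('Hello from RADOSGW')

        for b in conn.get_all_buckets():
            print(b.name, b.creation_date)

     A Swift client would work the same way against the gateway's Swift-compatible API.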
  32. ARCHITECTURAL COMPONENTS (APP, HOST/VM, CLIENT)
  33. STORING VIRTUAL DISKS (RADOS CLUSTER)
  34. SEPARATE COMPUTE FROM STORAGE (RADOS CLUSTER)
  35. KERNEL MODULE FOR MAX FLEXIBLE! (RADOS CLUSTER)
  36. RBD STORES VIRTUAL DISKS
      RADOS BLOCK DEVICE:
      • Storage of disk images in RADOS
      • Decouples VMs from host
      • Images are striped across the cluster (pool)
      • Snapshots
      • Copy-on-write clones
      • Support in:
        • Mainline Linux kernel (2.6.39+)
        • Qemu/KVM, native Xen coming soon
        • OpenStack, CloudStack, Nebula, Proxmox
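     A small sketch of the RBD Python bindings layered on librados: create an image in a pool, write to it, and take a snapshot. The pool name, image name and size are assumptions.

        import rados
        import rbd

        cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
        cluster.connect()
        ioctx = cluster.open_ioctx('rbd')                    # pool name assumed

        rbd.RBD().create(ioctx, 'vm-disk-1', 4 * 1024 ** 3)  # 4 GiB image

        image = rbd.Image(ioctx, 'vm-disk-1')
        image.write(b'first bytes of a guest disk', 0)       # write at offset 0
        image.create_snap('base')                            # point-in-time snapshot
        print([snap['name'] for snap in image.list_snaps()])
        image.close()

        ioctx.close()
        cluster.shutdown()

     The same image could then be attached to a guest through the kernel rbd module or Qemu/KVM, as the slide notes.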
  37. RBD SNAPSHOTS
      • Export snapshots to geographically dispersed data centers
        ▪ Institute disaster recovery
      • Export incremental snapshots
        ▪ Minimize network bandwidth by only sending changes
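     The incremental export mentioned above is usually driven with the rbd CLI (export-diff / import-diff). A hedged sketch, wrapped in Python to match the other examples; the image, the snapshot names and the destination host are assumptions, and both sites are assumed to already hold the image and the starting snapshot.

        import subprocess

        # Stream only the delta between snap1 and snap2 to the remote site;
        # '-' makes rbd write to stdout / read from stdin.
        export = subprocess.Popen(
            ['rbd', 'export-diff', '--from-snap', 'snap1',
             'rbd/vm-disk-1@snap2', '-'],
            stdout=subprocess.PIPE)
        subprocess.check_call(
            ['ssh', 'dr-site', 'rbd', 'import-diff', '-', 'rbd/vm-disk-1'],
            stdin=export.stdout)
        export.stdout.close()
        export.wait()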
  38. ARCHITECTURAL COMPONENTS (APP, HOST/VM, CLIENT)
  39. SEPARATE METADATA SERVER (metadata, data, RADOS CLUSTER)
  40. SCALABLE METADATA SERVERS
      METADATA SERVER:
      • Manages metadata for a POSIX-compliant shared filesystem
        • Directory hierarchy
        • File metadata (owner, timestamps, mode, etc.)
      • Stores metadata in RADOS
      • Does not serve file data to clients
      • Only required for shared filesystem
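     The shared filesystem that the metadata servers back is mounted like any other filesystem; a hedged sketch of the two common clients (monitor address, credentials and mount point are assumptions), again wrapped in Python for consistency with the other examples.

        import subprocess

        # Kernel client: mount CephFS from a monitor (requires root).
        subprocess.check_call([
            'mount', '-t', 'ceph', '10.0.0.1:6789:/', '/mnt/cephfs',
            '-o', 'name=admin,secretfile=/etc/ceph/admin.secret',
        ])

        # FUSE client alternative:
        # subprocess.check_call(['ceph-fuse', '-m', '10.0.0.1:6789', '/mnt/cephfs'])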
  41. CALAMARI
  42. CALAMARI ARCHITECTURE (ADMIN NODE, CEPH STORAGE CLUSTER)
  43. USE CASES
  44. WEB APPLICATION STORAGE (S3/Swift)
  45. MULTI-SITE OBJECT STORAGE
  46. ARCHIVE / COLD STORAGE (CEPH STORAGE CLUSTER)
  47. ERASURE CODING
      Full copies of stored objects:
      • Very high durability
      • Quicker recovery
      One copy plus parity:
      • Cost-effective durability
      • Expensive recovery
  48. ERASURE CODING: HOW DOES IT WORK? (OSDs, ERASURE CODED POOL, CEPH STORAGE CLUSTER)
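     A quick worked comparison of the raw-capacity cost behind the two columns on slide 47: triple replication versus an erasure-coded layout of k data chunks plus m coding chunks. The k=8, m=3 profile is only an example; it survives the loss of any three chunks while storing far less raw data than three full copies.

        def raw_per_byte_replicated(replicas):
            """Raw bytes stored per byte of user data with full copies."""
            return float(replicas)

        def raw_per_byte_erasure(k, m):
            """Raw bytes stored per byte of user data with k data + m coding chunks."""
            return float(k + m) / k

        print("3x replication:", raw_per_byte_replicated(3), "x raw capacity")
        print("EC k=8, m=3   :", round(raw_per_byte_erasure(8, 3), 3), "x raw capacity")
        # 3.0x versus ~1.375x: parity is much cheaper in capacity, but a
        # recovery has to read from k surviving chunks instead of one replica.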
  49. CACHE TIERING (Read/Write, CEPH STORAGE CLUSTER)
  50. CACHE TIERING (Write, Read, CEPH STORAGE CLUSTER)
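     Cache tiering is configured per pool pair with the ceph CLI; a hedged sketch of the usual three commands (pool names are assumptions, and hit-set / target-size tuning is omitted), wrapped in Python for consistency.

        import subprocess

        def ceph(*args):
            """Thin wrapper around the ceph CLI (assumes admin credentials)."""
            subprocess.check_call(['ceph'] + list(args))

        # Put a fast pool in front of a slower backing pool (names assumed).
        ceph('osd', 'tier', 'add', 'cold-storage', 'hot-cache')
        ceph('osd', 'tier', 'cache-mode', 'hot-cache', 'writeback')
        ceph('osd', 'tier', 'set-overlay', 'cold-storage', 'hot-cache')
        # Clients keep addressing 'cold-storage'; RADOS services hot objects
        # from 'hot-cache' and flushes / evicts them to the backing tier.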
  51. WEBSCALE APPLICATIONS (Native Protocol)
  52. ARCHIVE / COLD STORAGE (CEPH STORAGE CLUSTER, Site A, Site B)
  53. DATABASES (Native Protocol)
  54. WHAT ABOUT CEPH AND OPENSTACK?
  55. CEPH AND OPENSTACK (RADOS CLUSTER)
  56. OPENSTACK ADDITIONS
      JUNO:
      • Enable cloning for rbd-backed ephemeral disks
      KILO:
      • Volume migration from one backend to another
      • Implement proper snapshotting for Ceph-based ephemeral disks
      • Improve backup in Cinder
  57. Future Ceph Roadmap
  58. CEPH ROADMAP (Giant, Hammer, I-Release)
  59. NEXT STEPS
  60. NEXT STEPS: WHAT NOW?
      • Read about the latest version of Ceph: http://ceph.com/docs
      • Deploy a test cluster using ceph-deploy: http://ceph.com/qsg
      • Deploy a test cluster on the AWS free-tier using Juju: http://ceph.com/juju
      • Ansible playbooks for Ceph: https://www.github.com/alfredodeza/ceph-ansible
      • Most discussion happens on the mailing lists ceph-devel and ceph-users. Join or view archives at http://ceph.com/list
      • IRC is a great place to get help (or help others!): #ceph and #ceph-devel. Details and logs at http://ceph.com/irc
      • Download the code: http://www.github.com/ceph
      • The tracker manages bugs and feature requests. Register and start looking around at http://tracker.ceph.com
      • Doc updates and suggestions are always welcome. Learn how to contribute docs at http://ceph.com/docwriting
  61. THANK YOU!
      Ian Colle
      Global Director of Software Engineering
      icolle@redhat.com
      303.601.7713
      @ircolle
