SCALING STORAGE WITH CEPH
Ross Turk, Inktank
Build A Cloud Day - Chicago
  • People have been trying to capture knowledge for a very long time. I guess the first form of captured knowledge is the cave painting.
  • TODO: change this slide. Man + magnet + tape = magnetic tape. 1000 books on one tape
  • People learned how to store data on magnetic tape. Many, many, many books could be stored on a single tape.
  • TODO: animate so that they show progressively

    1. 1. SCALING STORAGE WITH CEPH. Ross Turk, Inktank
    2. 2. WHO? Ross Turk, VP Community, Inktank. ross@inktank.com, @rossturk. inktank.com | ceph.com
    3. 3. me
    4. 4. APP, APP, HOST/VM, CLIENT. RADOSGW: A bucket-based REST gateway, compatible with S3 and Swift. RBD: A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS: A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. LIBRADOS: A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
    5. 5. IN THE BEGINNING (Magic Madzik, Flickr / CC BY 2.0)
    6. 6. EARLY INFORMATION STORAGE (Chico.Ferreira, Flickr / CC BY 2.0)
    7. 7. WRITING > CAVE PAINTINGS (kevingessner, Flickr / CC BY-SA 2.0)
    8. 8. ==x1000 x1
    9. 9. PEOPLE BEGIN WRITING A LOT (Moyan_Brenn, Flickr / CC BY-ND 2.0)
    10. 10. WRITING IS TIME-CONSUMING (trekkyandy, Flickr / CC BY 2.0)
    11. 11. THE INDUSTRIALIZATION OF WRITING (FateDenied, Flickr / CC BY 2.0)
    12. 12. magnet + tape = magnetic tape == x1000 x1
    13. 13. STORAGE BECOMES MECHANICAL (Erik Pitti, Wikipedia / CC BY-ND 2.0)
    14. 14. Diagram: HUMAN, ROCK; HUMAN, INK, PAPER; HUMAN, COMPUTER, TAPE
    15. 15. COMPUTERS NEED PEOPLE TO WORK (USDAgov, Flickr / CC BY 2.0)
    16. 16. Diagram: HUMAN, COMPUTER, TAPE
    17. 17. 11101011 10110110 10110101 10101001 00100100 01001001== 10100100 10100101 01011010 01101010 10101010 10101010 01010110 01010011
    18. 18. THROUGHPUT BECOMES IMPORTANT (Zane Luke, Flickr / CC BY-ND 2.0)
    19. 19. LAZ0R B3AMS CHANGE EVERYTHING!! (Jeff Kubina, Flickr / CC-BY-SA 2.0)
    20. 20. HARD DRIVES ARE TOTALLY BETTER. Chart labels: amazing spinny hard drives; sucky stupid tape; slow
    21. 21. EVERYTHING GETS MESSY (Rob!, Flickr / CC BY 2.0)
    22. 22. aa ab 111010 ac101 ba bb bc 111 010da 110 db 011 010 000 dc000 110 001
    23. 23. file. owner: rturk; created: aug12; last viewed: aug17; size: 42025; perms: 644. Data: 11101011 10110110 10110101 10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010
    24. 24. aa ab 111010 ac101 ba bb bc 111 010da 110 db 01 010 000 dc 10000 110 001
    25. 25. WE OUTGROW THE HARD DRIVE (Mr. T in DC, Flickr / CC BY 2.0)
    26. 26. Diagram: HUMAN ×3, COMPUTER ×1, DISK ×7
    27. 27. Diagram: many HUMANs and DISKs around one (COMPUTER) (actually more like this…)
    28. 28. Diagram: COMPUTER + DISK ×12, HUMAN ×3
    29. 29. aa ab 111010 ac101 ba bb bc 111 010da 110 db 011 010 000 dc000 110 001
    30. 30. object. pace: quick; driver: frog; license: expired; expression: agog. Data: 11101011 10110110 10110101 10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010
    31. 31. Diagram: COMPUTER + DISK ×12, APP ×1
    32. 32. Diagram: COMPUTER + DISK ×13
    33. 33. Diagram: COMPUTER + DISK ×12, VM ×3
    34. 34. STORAGE THROUGHOUT HISTORY. Timeline labels: Ceph; Cloud computing; Distributed storage; Shared storage; Computers; Writing; Painting. Time-scale: Roughly logarithmic. Content: Whatever the opposite of “scientific” is.
    35. 35. Diagram: COMPUTER + DISK ×12, HUMAN ×3
    36. 36. Diagram: COMPUTER + DISK ×12
    37. 37. Diagram: C D ×12
    38. 38. Diagram: C D ×12, HUMAN ×3
    39. 39. STORAGE APPLIANCES (Michael Moll, Wikipedia / CC BY-SA 2.0)
    40. 40. 6.4 MILLION SQ FT OF FACTORIES (Dude94111, Flickr / CC BY 2.0)
    41. 41. TECHNOLOGY IS A COMMODITY (RaeAllen, Flickr / CC-BY 2.0)
    42. 42. COMMODITY PRICES FLUCTUATE. Chart axis: May-07, May-08, May-09, May-10, May-11, May-12
    43. 43. Hardware Appliances are Mysterious Black Boxes (Abode of Chaos, Flickr / CC BY 2.0)
    44. 44. Diagram: C D ×12, HUMAN !!, [DEVELOPER]
    45. 45. C D C DC C D C D D C DC+ C D C D+ C D C D C D C D
    46. 46. C D C DC C D C D D C DC+ C D C D+ C D C D C D C D
    47. 47. THE WORLD NEEDS AN OPEN STORAGE TECHNOLOGY THAT SCALES
    48. 48. SAGE WEIL: Co-founder of DreamHost; Inventor of Ceph; CEO of Inktank
    49. 49. philosophy: OPEN SOURCE. design: (empty)
    50. 50. OPEN SOURCE SPREADS IDEAS (orchidgalore, Flickr / CC BY 2.0)
    51. 51. philosophy: OPEN SOURCE; COMMUNITY-FOCUSED. design: (empty)
    52. 52. WE ARE SMARTER TOGETHER (rturk, Linkedin Inmap)
    53. 53. CEPH BELONGS TO ALL OF US (wackybadger, Flickr / CC BY 2.0)
    54. 54. philosophy: OPEN SOURCE; COMMUNITY-FOCUSED. design: SCALABLE
    55. 55. CEPH IS BUILT TO SCALE. Labels: Ceph; Too much for a room; Too much for a computer; Too much for a drive; Too much for a book; Too much for a cave
    56. 56. philosophy: OPEN SOURCE; COMMUNITY-FOCUSED. design: SCALABLE; NO SINGLE POINT OF FAILURE
    57. 57. ARIOLIMAX CALIFORNICUS (aroid, Flickr / CC BY 2.0)
    58. 58. THE OCTOPUS (A METAPHOR). Labels: single point of failure; highly-available; replicated. I love speaking in metaphors.
    59. 59. THE BEEHIVE (ANOTHER METAPHOR) (blumenbiene, Flickr / CC BY 2.0)
    60. 60. philosophy: OPEN SOURCE; COMMUNITY-FOCUSED. design: SCALABLE; NO SINGLE POINT OF FAILURE; SOFTWARE BASED
    61. 61. C D C DC C D C D D C DC+ C D C D+ C D C D C D C D
    62. 62. C D C DC C D C D D C DC+ C D C D+ C D C D C D C D
    63. 63. philosophy: OPEN SOURCE; COMMUNITY-FOCUSED. design: SCALABLE; NO SINGLE POINT OF FAILURE; SOFTWARE BASED; SELF-MANAGING
    64. 64. DISKS = JUST TINY RECORD PLAYERS (jon_a_ross, Flickr / CC BY 2.0)
    65. 65. D D D D D D = D Dx 1 MILLION 55 times / day
    66. 66. IT ALL STARTED WITH A DREAM
    67. 67. +
    68. 68. APP, APP, HOST/VM, CLIENT. RADOSGW: A bucket-based REST gateway, compatible with S3 and Swift. RBD: A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS: A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. LIBRADOS: A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
    69. 69. APP, APP, HOST/VM, CLIENT. RADOSGW: A bucket-based REST gateway, compatible with S3 and Swift. RBD: A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS: A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. LIBRADOS: A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
    70. 70. Diagram: five OSDs, each on an FS (btrfs, xfs, ext4) on a DISK; three monitors (M M M)
    71. 71. Diagram: HUMAN, M M M
    72. 72. Monitors (M): Maintain cluster map; Provide consensus for distributed decision-making; Must have an odd number; These do not serve stored objects to clients. OSDs: One per disk (recommended); At least three in a cluster; Serve stored objects to clients; Intelligently peer to perform replication tasks; Support object classes
    73. 73. APP, APP, HOST/VM, CLIENT. RADOSGW: A bucket-based REST gateway, compatible with S3 and Swift. RBD: A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS: A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. LIBRADOS: A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
    74. 74. Diagram: APP, LIBRADOS, native, M M M
    75. 75. LIBRADOS (L): Provides direct access to RADOS for applications; C, C++, Python, PHP, Java; No HTTP overhead
    76. 76. APP, APP, HOST/VM, CLIENT. RADOSGW: A bucket-based REST gateway, compatible with S3 and Swift. RBD: A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS: A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. LIBRADOS: A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
    77. 77. Diagram: APP ×2, REST, RADOSGW ×2 (each with LIBRADOS), native, M M M
    78. 78. RADOS Gateway: REST-based interface to RADOS; Supports buckets, accounting; Compatible with S3 and Swift applications
    79. 79. APP, APP, HOST/VM, CLIENT. RADOSGW: A bucket-based REST gateway, compatible with S3 and Swift. RBD: A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS: A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. LIBRADOS: A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
    80. 80. Diagram: VM, VIRTUALIZATION CONTAINER, LIBRBD, LIBRADOS, M M M
    81. 81. Diagram: VM; CONTAINER ×2 (each with LIBRBD and LIBRADOS); M M M
    82. 82. Diagram: HOST, KRBD (KERNEL MODULE), LIBRADOS, M M M
    83. 83. RADOS Block Device: Storage of virtual disks in RADOS; Allows decoupling of VMs and containers (Live migration!); Images are striped across the cluster; Boot support in QEMU, KVM, and OpenStack Nova; Mount support in the Linux kernel
    84. 84. APP, APP, HOST/VM, CLIENT. RADOSGW: A bucket-based REST gateway, compatible with S3 and Swift. RBD: A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS: A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. LIBRADOS: A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
    85. 85. Diagram: CLIENT, metadata, data, 01 10, M M M
    86. 86. Metadata Server: Manages metadata for a POSIX-compliant shared filesystem (Directory hierarchy; File metadata such as owner, timestamps, mode, etc.); Stores metadata in RADOS; Does not serve file data to clients; Only required for shared filesystem
    87. 87. WHAT MAKES CEPH UNIQUE?
    88. 88. HOW DO YOU FIND YOUR KEYS? (azmeen, Flickr / CC BY 2.0)
    89. 89. Diagram: APP ??, C D ×12
    90. 90. Diagram: APP looks for F*; C D nodes grouped by name range: A-G, H-N, O-T, U-Z
    91. 91. I ALWAYS PUT MY KEYS ON THE HOOK (vitamindave, Flickr / CC BY 2.0)
    92. 92. Diagram: APP, C D ×12
    93. 93. DEAR DIARY: KEYS = IN THE KITCHEN (Barnaby, Flickr / CC BY 2.0)
    94. 94. HOW DO YOU FIND YOUR KEYS WHEN YOUR HOUSE IS INFINITELY BIG AND ALWAYS CHANGING?
    95. 95. THE ANSWER: CRUSH!! (pasukaru76, Flickr / CC SA 2.0)
    96. 96. Diagram: objects (10 10 01 01 10 10 01 11 01 10) are grouped by hash(object name) % num pg, then placed by CRUSH(pg, cluster state, rule set)
    97. 97. 10 10 01 01 10 10 01 11 01 1010 10 01 01 10 10 01 11 01 10
    98. 98. CRUSH Pseudo-random placement algorithm Ensures even distribution Repeatable, deterministic Rule-based configuration  Replica count  Infrastructure topology  Weighting
    99. 99. CLIENT ??
    100. 100. CLIENT ??
    101. 101. Diagram: VM, VIRTUALIZATION CONTAINER, LIBRBD, LIBRADOS, M M M
    102. 102. HOW DO YOU SPIN UP THOUSANDS OF VMs INSTANTLY AND EFFICIENTLY?
    103. 103. instant copy: 144 + 0 + 0 + 0 + 0 = 144
    104. 104. CLIENT write (×4): 144 + 4 = 148
    105. 105. CLIENT read (×3): 144 + 4 = 148
    106. 106. HOW DO YOU MANAGE DIRECTORY HIERARCHY WITHOUT A SINGLE POINT OF FAILURE?
    107. 107. FILESYSTEMS REQUIRE METADATA (Barnaby, Flickr / CC BY 2.0)
    108. 108. Diagram: CLIENT, 01 10, M M M
    109. 109. Diagram: M M M
    110. 110. one tree, three metadata servers ??
    111. 111. DYNAMIC SUBTREE PARTITIONING
    112. 112. AND NOW BACKPEDALING
    113. 113. ALMOST EVERYTHING WORKS
    114. 114. APP, APP, HOST/VM, CLIENT. RADOSGW (AWESOME): A bucket-based REST gateway, compatible with S3 and Swift. RBD (AWESOME): A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS (NEARLY AWESOME): A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. LIBRADOS (AWESOME): A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOS (AWESOME): A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
    115. 115. *LAN SCALE!!* OR REALLY REALLY SCARY FAST WAN
    116. 116. CEPH AND CLOUDSTACK (tableatny, Flickr / CC BY 2.0)
    117. 117. RBD SUPPORT IN CLOUDSTACK: Allows storage of virtual disks inside RADOS (Works with KVM only right now; No snapshots yet); Upcoming in CloudStack 4; More information can be found on the mailing list: ceph-devel / incubator-cloudstack-dev: http://article.gmane.org/gmane.comp.file-systems.ceph.devel/7505
    118. 118. QUESTIONS? Ross Turk, VP Community, Inktank. ross@inktank.com, @rossturk. inktank.com | ceph.com
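
Slides 96 and 98 describe placement as two computed steps: hash(object name) % num pg, then CRUSH(pg, cluster state, rule set). The sketch below illustrates that shape only. It is not the real CRUSH algorithm: the second step swaps in weighted rendezvous hashing, SHA-256 stands in for whatever hash Ceph uses, and the function names and the flat `osd_weights` dictionary are hypothetical. It does share the properties the slides list: it is repeatable, deterministic, weight-aware, and needs no lookup table.

```python
import hashlib
import math


def stable_hash(*parts):
    """Return a deterministic 64-bit integer for the given parts.

    SHA-256 is a stand-in chosen for this sketch, not the hash Ceph uses.
    """
    data = "/".join(str(p) for p in parts).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def object_to_pg(object_name, num_pg):
    """Step 1 from slide 96: hash(object name) % num pg."""
    return stable_hash(object_name) % num_pg


def place_pg(pg, osd_weights, replicas):
    """Step 2 stand-in: choose `replicas` distinct OSDs for one placement group.

    This is weighted rendezvous hashing, NOT the real CRUSH algorithm.
    `osd_weights` is a hypothetical, flat "cluster state": {osd_id: weight}.
    """
    def score(osd):
        # Turn the hash into a float strictly between 0 and 1.
        u = ((stable_hash(pg, osd) >> 12) + 0.5) / 2.0 ** 52
        # A larger weight makes a high score more likely.
        return -osd_weights[osd] / math.log(u)

    return sorted(osd_weights, key=score, reverse=True)[:replicas]


def locate(object_name, num_pg, osd_weights, replicas):
    """Compute where an object lives; any client can repeat this calculation."""
    pg = object_to_pg(object_name, num_pg)
    return pg, place_pg(pg, osd_weights, replicas)


if __name__ == "__main__":
    cluster = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 1.0}
    print(locate("my-object", num_pg=64, osd_weights=cluster, replicas=3))
```

Because every client computes the same answer from the same inputs, nobody has to ask a central directory where the keys are. In this stand-in, removing an OSD that was not chosen for a placement group leaves that group's placement unchanged.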

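
The arithmetic on slides 103 to 105 (144 + 0 + 0 + 0 + 0 = 144, then 144 + 4 = 148) reads like copy-on-write cloning: a clone stores nothing until it is written to, and reads of untouched data fall through to the parent. Below is a toy model of that idea, assuming the 144 is a count of stored units in a parent image. The `Image` class, its methods, and the per-block granularity are all hypothetical and are not the RBD API.

```python
class Image:
    """A toy block image that may be layered on top of a parent image."""

    def __init__(self, blocks=None, parent=None):
        self.blocks = dict(blocks or {})  # only the blocks this image owns
        self.parent = parent

    def clone(self):
        # Instant copy: the child starts empty and points at its parent.
        return Image(parent=self)

    def write(self, index, data):
        # Writes land in the clone; the parent is never modified.
        self.blocks[index] = data

    def read(self, index):
        # Reads fall through to the parent for blocks never written here.
        if index in self.blocks:
            return self.blocks[index]
        if self.parent is not None:
            return self.parent.read(index)
        return None


def stored_units(images):
    """Total number of blocks physically stored across all images."""
    return sum(len(image.blocks) for image in images)


golden = Image(blocks={i: b"base" for i in range(144)})
clones = [golden.clone() for _ in range(4)]
print(stored_units([golden] + clones))  # 144 + 0 + 0 + 0 + 0

for i in range(4):
    clones[0].write(i, b"new")
print(stored_units([golden] + clones))  # 144 + 4
```

Creating a clone copies no data, which is why thousands of VMs could start from one image almost instantly, and storage grows only with the blocks each VM actually changes.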