GigaOM Structure SF - June 2012


Slides by our VP of Community, Ross Turk, from his Structure talk in July 2012.


  1. Inktank: Delivering the Future of Storage
  2. ME ME ME ME ME ME. I made a slide today. It’s all about me.
  3. ME ME ME ME ME ME. I made a slide today. It’s all about me. Ross Turk, VP Community, Inktank. ross@inktank.com | @rossturk | inktank.com | ceph.com
  4. (no text on this slide)
  5. Ceph architecture overview:
      •  RADOSGW (used by APP): A bucket-based REST gateway, compatible with S3 and Swift
      •  RBD (used by HOST/VM): A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
      •  CEPH FS (used by CLIENT): A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
      •  LIBRADOS (used by APP): A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
      •  RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
  6. Let’s Start With a Good, Old-Fashioned Origin Story (JD Hancock, Flickr / CC BY 2.0)
  7. The Evolution of Storage: A brief history of information storage technology
  8. Cave Paintings: The Earliest Form (maybe) of Information Storage (Chico.Ferreira, Flickr / CC BY 2.0)
  9. Technology Review: Cave Painting
      The good:
      •  Low cost per smudge
      •  Multitouch
      The bad:
      •  Limited storage capacity
      •  10 caveman ideas per wall
      •  No support for CIFS
  10. Diagram: HUMAN + = WRITING == x1000 x1
  11. Technology Review: Books and Libraries (columns: The good, The bad)
      •  Cost per scroll is high
          •  Can be eased w/slave labor
      •  No automatic replication
          •  Must complete backups before Caesar’s invasion of Egypt!
  12. Books (Strahov, Prague Library) (Moyan_Brenn, Flickr / CC BY-ND 2.0)
  13. Printing Press (FateDenied, Flickr / CC BY 2.0)
  14. Diagram: magnet + tape = magnetic tape == x1000 x1
  15. IBM System/360 Tape Drives (Erik Pitti, Wikipedia / CC BY-ND 2.0)
  16. Diagram: HUMAN, ROCK; HUMAN, INK, PAPER; HUMAN, COMPUTER, TAPE
  17. Diagram: 11101011 10110110 10110101 10101001 00100100 01001001 == 10100100 10100101 01011010 01101010 10101010 10101010 01010110 01010011
  18. Tape Is Stupid (Mrs. Gemstone, Flickr / CC BY-SA 2.0)
  19. Computers Need Programmers (and Operators) (USDAgov, Flickr / CC BY 2.0)
  20. Diagram: HUMAN, COMPUTER, TAPE
  21. Throughput Becomes Important (rfduck, Flickr / CC BY-ND 2.0)
  22. Hard Drive (Jeff Kubina, Flickr / CC-BY-SA 2.0)
  23. Hard Drives Are Totally Better: amazing spinny hard drives, sucky stupid tape
  24. Diagram: disk blocks labeled aa, ab, ac, ba, bb, bc, da, db, dc, each holding a few bits
  25. file
      •  owner: rturk
      •  created: aug12
      •  last viewed: aug17
      •  size: 42025
      •  perms: 644
      •  data: 11101011 10110110 10110101 10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010
  26. Diagram: the same disk blocks, aa through dc, with slightly different contents
  27. Humanity Outgrows the Hard Drive (Mr. T in DC, Flickr / CC BY 2.0)
  28. Diagram: one HUMAN, one COMPUTER, seven DISKs
  29. What Happens When Two HUMANs Need Access to the Same Resource? (wFourier, Flickr / CC BY 2.0)
  30. Diagram: three HUMANs, one COMPUTER, seven DISKs
  31. Diagram: many HUMANs and many DISKs around a single (COMPUTER) (actually more like this…)
  32. Diagram: three HUMANs and twelve COMPUTER + DISK nodes
  33. object
      •  texture: crunch
      •  flavor: smoke, salt
      •  nutrition: none
      •  color: bacon
      •  data: 11101011 10110110 10110101 10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010
  34. Diagram: the disk blocks aa through dc, crossed out (X)
  35. Diagram: one APP and twelve COMPUTER + DISK nodes
  36. Diagram: a cluster of COMPUTER + DISK nodes
  37. Diagram: three VMs and twelve COMPUTER + DISK nodes
  38. The Current State of Storage: How people store information today, and why it’s still not perfect yet
  39. Chart: “How Much Store Things” across “All Human History!!”, with milestones Painting, Writing, Computers, Shared storage, Distributed storage, Cloud computing, Ceph. Time-scale: Roughly logarithmic. Content: Whatever the opposite of “scientific” is.
  40. Diagram: three HUMANs and twelve COMPUTER + DISK nodes
  41. Diagram: twelve COMPUTER + DISK nodes
  42. Diagram: twelve nodes, abbreviated C D
  43. Diagram: three HUMANs and twelve C D nodes
  44. Storage Hardware (Michael Moll, Wikipedia / CC BY-SA 2.0)
  45. 6.4 Million Square Feet of Expensive Factory Buildings (Dude94111, Flickr / CC BY 2.0)
  46. Storage Hardware Vendors Have Bills to Pay (CarbonNYC, Flickr / CC BY 2.0)
  47. …Which Means That Customers Do Too (401K 2012, Flickr / CC BY-SA 2.0)
  48. Technology Is Becoming a Commodity (RaeAllen, Flickr / CC-BY 2.0)
  49. Commodity Prices Fluctuate (chart, May-07 to May-12)
  50. Growing With Hardware Appliances
      First PB:
      •  Proprietary storage hardware
      •  Well-known storage vendor
      •  $14 b’zillion
      Second PB:
      •  Proprietary storage hardware
      •  Same storage vendor
      •  Another $14 b’zillion
  51. Dedicated Hardware Appliances Are OLD TECHNOLOGY (Paul Keller, Flickr / CC BY 2.0)
  52. Source: http://www.cpubenchmark.net/high_end_cpus.html
  53. FLAGSHIP PRODUCT
  54. “I’m sick of paying for hardware with a three-year-old proc in it!” (Mel B., Flickr / CC BY 2.0)
  55. Hardware Appliances Are Mysterious Black Boxes (Abode of Chaos, Flickr / CC BY 2.0)
  56. Diagram: a cluster of C D nodes and a C++ label
  57. Diagram: the same cluster and C++ label, crossed out (X)
  58. Diagram: HUMAN [DEVELOPER] !! and twelve C D nodes
  59. Give More Money To The Big Proprietary Vendors: It will make them very, very happy.
  60. Storage Should Be Better
      People need storage solutions that…
      •  …are open
      •  …are easy to manage
      •  …satisfy their requirements
          •  performance
          •  functional
          •  financial
  61. The Birth of a New Storage Solution: We think our roots are showing
  62. DreamHost
  63. Sage Weil: Co-founder of DreamHost, Inventor of Ceph, CEO of Inktank
  64. DreamHost: DreamHost is staffed by extraordinarily hip people
  65. +
  66. New Monthly Code Commits (chart, 0 to 700 commits, 2004-06 to 2011-07)
  67. Ceph Starts Popping Up
  68. philosophy: OPEN SOURCE; design: (empty)
  69. Open Source Is the Best Way to Spread Ideas (orchidgalore, Flickr / CC BY 2.0)
  70. philosophy: OPEN SOURCE, COMMUNITY-FOCUSED; design: (empty)
  71. All of Us Are Smarter Than Some of Us (rturk, LinkedIn InMap)
  72. philosophy: OPEN SOURCE, COMMUNITY-FOCUSED; design: SCALABLE
  73. Chart: Ceph Is Built to Scale, with steps Too much for a cave, Too much for a book, Too much for a drive, Too much for a computer, Too much for a room, Ceph. Time-scale: Roughly logarithmic. Content: Whatever the opposite of “scientific” is.
  74. philosophy: OPEN SOURCE, COMMUNITY-FOCUSED; design: SCALABLE, NO SINGLE POINT OF FAILURE
  75. Ariolimax Californicus (aroid, Flickr / CC BY 2.0)
  76. The Octopus (A Metaphor): I love speaking in metaphors. Labels: single point of failure, replicated, replicated
  77. The Beehive (A Better Metaphor) (blumenbiene, Flickr / CC BY 2.0)
  78. philosophy: OPEN SOURCE, COMMUNITY-FOCUSED; design: SCALABLE, NO SINGLE POINT OF FAILURE, SOFTWARE BASED
  79. Diagram: a cluster of C D nodes and a C++ label
  80. Diagram: the same cluster and C++ label, checked (✔)
  81. philosophy: OPEN SOURCE, COMMUNITY-FOCUSED; design: SCALABLE, NO SINGLE POINT OF FAILURE, SOFTWARE BASED, SELF-MANAGING
  82. Hard Drives Are Tiny Record Players and They Fail Often (jon_a_ross, Flickr / CC BY 2.0)
  83. Diagram: D x 1 MILLION = 55 times / day
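The arithmetic on the slide above can be checked in a few lines. This is a minimal sketch, assuming the slide means that a fleet of one million drives sees about 55 drive failures per day; the function name is illustrative and not from the talk.

```python
def implied_annual_failure_rate(drives: int, failures_per_day: float) -> float:
    """Fraction of the fleet that fails in a 365-day year at a steady daily rate."""
    return failures_per_day * 365 / drives


# Assumption: 1,000,000 drives and 55 failures per day, as read from the slide.
afr = implied_annual_failure_rate(1_000_000, 55)
print(f"{afr:.2%}")  # prints 2.01%
```

Read this way, the slide implies that roughly 2% of drives fail per year, which is why a large cluster has to treat drive failure as routine.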
  84. Enter: Ceph. An architectural and functional overview of the Ceph system
  85. Ceph architecture overview (repeats slide 5)
  86. Ceph architecture overview (repeats slide 5)
  87. Diagram: five OSDs, each on a file system (FS: btrfs, xfs, or ext4) on a DISK; three monitors (M)
  88. Diagram: HUMAN and three monitors (M)
  89. Monitors (M):
      •  Maintain cluster map
      •  Provide consensus for distributed decision-making
      •  Must have an odd number
      •  These do not serve stored objects to clients
      OSDs:
      •  One per disk (recommended)
      •  At least three in a cluster
      •  Serve stored objects to clients
      •  Intelligently peer to perform replication tasks
      •  Support object classes
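The "odd number" bullet for monitors follows from majority-based consensus. A minimal sketch, assuming decisions need a strict majority of monitors to agree; the function names are illustrative.

```python
def quorum_size(monitors: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can be down while a majority is still reachable."""
    return monitors - quorum_size(monitors)


for count in range(1, 8):
    print(count, quorum_size(count), tolerated_failures(count))
```

Under this assumption, three and four monitors both tolerate a single failure, so adding a monitor to an odd-sized set adds hardware without adding fault tolerance.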
  90. Ceph architecture overview (repeats slide 5)
  91. Diagram: APP, LIBRADOS, socket, M M M
  92. LIBRADOS (L):
      •  Provides direct access to RADOS for applications
      •  C, C++, Python, PHP, Java
      •  No HTTP overhead
  93. Ceph architecture overview (repeats slide 5)
  94. Diagram: APP, REST, RADOSGW, LIBRADOS, socket, M M M
  95. RADOS Gateway:
      •  REST-based interface to RADOS
      •  Supports buckets, accounting
      •  Compatible with S3 and Swift applications
  96. Ceph architecture overview (repeats slide 5)
  97. Diagram: VM in a VIRTUALIZATION CONTAINER, LIBRBD, LIBRADOS, M M M
  98. Diagram: a VM and two CONTAINERs, each with LIBRBD and LIBRADOS, M M M
  99. Diagram: HOST, KRBD (KERNEL MODULE), LIBRADOS, M M M
  100. RADOS Block Device:
      •  Storage of virtual disks in RADOS
      •  Allows decoupling of VMs and containers
          •  Live migration!
      •  Images are striped across the cluster
      •  Boot support in QEMU, KVM, and OpenStack Nova
      •  Mount support in the Linux kernel
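The "striped across the cluster" bullet can be illustrated with a little offset arithmetic. This is a sketch under stated assumptions: the image is cut into fixed-size objects, the 4 MiB object size is an assumed value, and the function names are hypothetical rather than part of any RBD API.

```python
from typing import List, Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size in bytes (illustrative)


def locate(offset: int, object_size: int = OBJECT_SIZE) -> Tuple[int, int]:
    """Map a byte offset in the image to (object index, offset inside that object)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, object_size)


def objects_for_range(offset: int, length: int,
                      object_size: int = OBJECT_SIZE) -> List[int]:
    """List the object indexes touched by a read or write of `length` bytes."""
    if length <= 0:
        return []
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return list(range(first, last + 1))


MIB = 1024 * 1024
print(locate(10 * MIB))                   # (2, 2097152)
print(objects_for_range(3 * MIB, 2 * MIB))  # [0, 1]
```

Because each object can land on a different storage node, one large sequential read or write is spread over many disks instead of one.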
  101. Ceph architecture overview (repeats slide 5)
  102. Diagram: CLIENT with metadata (01) and data (10), M M M
  103. Metadata Server:
      •  Manages metadata for a POSIX-compliant shared filesystem
          •  Directory hierarchy
          •  File metadata (owner, timestamps, mode, etc.)
      •  Stores metadata in RADOS
      •  Does not serve file data to clients
      •  Only required for shared filesystem
  104. What Makes Ceph Unique? Part one: CRUSH
  105. Diagram: APP ?? and twelve C D nodes
  106. Diagram: APP and twelve C D nodes
  107. How Long Did It Take You To Find Your Keys This Morning? (azmeen, Flickr / CC BY 2.0)
  108. Dear Diary: Today I Put My Keys on the Kitchen Counter (Barnaby, Flickr / CC BY 2.0)
  109. Diagram: APP looking for F* among C D nodes grouped by name range: A-G, H-N, O-T, U-Z
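The name-range layout on the slide above amounts to a small lookup table. A minimal sketch, assuming one group of nodes per range; the node-group names are hypothetical.

```python
import bisect

# First letter of each range on the slide: A-G, H-N, O-T, U-Z.
RANGE_STARTS = ["A", "H", "O", "U"]
# Hypothetical names for the node groups that own each range.
RANGE_NODES = ["nodes-1", "nodes-2", "nodes-3", "nodes-4"]


def lookup(name: str) -> str:
    """Return the node group whose letter range contains the first letter of `name`."""
    key = name[:1].upper()
    if not ("A" <= key <= "Z"):
        raise ValueError("name must start with a letter A-Z")
    index = bisect.bisect_right(RANGE_STARTS, key) - 1
    return RANGE_NODES[index]


print(lookup("Foo"))  # nodes-1
```

The table is simple, but every client needs an up-to-date copy of it, and the ranges have to be redrawn whenever nodes are added or removed.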
  110. I Always Put My Keys on the Hook By the Door (vitamindave, Flickr / CC BY 2.0)
  111. HOW DO YOU FIND YOUR KEYS WHEN YOUR HOUSE IS INFINITELY BIG AND ALWAYS CHANGING?
  112. The Answer: CRUSH!!!!! (pasukaru76, Flickr / CC SA 2.0)
  113. Diagram: objects (10 10 01 01 10 10 01 11 01 10) are mapped to placement groups by hash(object name) % num pg, then placed by CRUSH(pg, cluster state, rule set)
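The first step on the slide above, hash(object name) % num pg, can be sketched directly. SHA-256 is used here only as a convenient stand-in and is an assumption, not the hash function Ceph itself uses.

```python
import hashlib


def placement_group(object_name: str, num_pg: int) -> int:
    """Map an object name to a placement group number: hash(object name) % num pg."""
    if num_pg <= 0:
        raise ValueError("num_pg must be positive")
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_pg


pg = placement_group("my-object", 128)
print(pg)
```

Any client that knows the object name and the number of placement groups computes the same answer, so no lookup table has to be consulted.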
  114. Diagram: 10 10 01 01 10 10 01 11 01 10 / 10 10 01 01 10 10 01 11 01 10
  115. CRUSH:
      •  Pseudo-random placement algorithm
      •  Ensures even distribution
      •  Repeatable, deterministic
      •  Rule-based configuration
          •  Replica count
          •  Infrastructure topology
          •  Weighting
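The properties listed above (pseudo-random, deterministic, rule-based with replica count, topology, and weighting) can be demonstrated with a much simpler technique. The sketch below uses weighted rendezvous (highest-random-weight) hashing plus a one-replica-per-host rule. It is not the CRUSH algorithm, and the cluster layout, OSD names, and weights are all hypothetical.

```python
import hashlib
import math
from typing import Dict, List, Tuple

# Hypothetical cluster: OSD name -> (host, weight).
CLUSTER: Dict[str, Tuple[str, float]] = {
    "osd.0": ("host-a", 1.0),
    "osd.1": ("host-a", 1.0),
    "osd.2": ("host-b", 2.0),
    "osd.3": ("host-c", 1.0),
}


def _unit_hash(pg: int, osd: str) -> float:
    """Deterministic pseudo-random number strictly between 0 and 1 for a (pg, osd) pair."""
    digest = hashlib.sha256(f"{pg}:{osd}".encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (value + 0.5) / 2**52


def place(pg: int, cluster: Dict[str, Tuple[str, float]], replicas: int) -> List[str]:
    """Pick `replicas` OSDs for a placement group, at most one per host."""
    def score(osd: str) -> float:
        weight = cluster[osd][1]
        # Weighted rendezvous score: heavier OSDs win proportionally more often.
        return -weight / math.log(_unit_hash(pg, osd))

    chosen: List[str] = []
    used_hosts = set()
    for osd in sorted(cluster, key=score, reverse=True):
        host = cluster[osd][0]
        if host in used_hosts:
            continue  # topology rule: never two replicas on the same host
        chosen.append(osd)
        used_hosts.add(host)
        if len(chosen) == replicas:
            break
    return chosen


print(place(7, CLUSTER, replicas=2))
```

Every client that has the same cluster description and rule computes the same placement without asking a central directory, which is the property the key-finding slides are building toward.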
  116. Diagram: CLIENT ??
  117. (no text on this slide)
  118. (no text on this slide)
  119. Diagram: CLIENT ??
  120. What Makes Ceph Unique? Part two: thin provisioning
  121. Diagram: VM in a VIRTUALIZATION CONTAINER, LIBRBD, LIBRADOS, M M M
  122. HOW DO YOU SPIN UP THOUSANDS OF VMs INSTANTLY AND EFFICIENTLY?
  123. Diagram: instant copy: 144 0 0 0 0 = 144
  124. Diagram: CLIENT write: 144 4 = 148
  125. Diagram: CLIENT read: 144 4 = 148
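The numbers on the three slides above (144, then 144 plus 4 making 148) match a copy-on-write clone. A minimal sketch, assuming the figures count stored blocks; the class names are illustrative and not part of any Ceph API.

```python
from typing import Dict, List


class BaseImage:
    """A fully written, read-only parent image."""

    def __init__(self, block_count: int) -> None:
        self.blocks: Dict[int, bytes] = {i: b"base" for i in range(block_count)}

    def read(self, index: int) -> bytes:
        return self.blocks[index]

    def stored_blocks(self) -> int:
        return len(self.blocks)


class Clone:
    """A copy-on-write clone: stores only the blocks written after cloning."""

    def __init__(self, parent: BaseImage) -> None:
        self.parent = parent
        self.own: Dict[int, bytes] = {}

    def write(self, index: int, data: bytes) -> None:
        self.own[index] = data

    def read(self, index: int) -> bytes:
        # Serve the clone's own block if it has one, else fall through to the parent.
        if index in self.own:
            return self.own[index]
        return self.parent.read(index)

    def stored_blocks(self) -> int:
        return len(self.own)


def total_stored(base: BaseImage, clones: List[Clone]) -> int:
    return base.stored_blocks() + sum(c.stored_blocks() for c in clones)


base = BaseImage(144)
clones = [Clone(base) for _ in range(4)]   # instant copy: nothing is duplicated
total_after_copy = total_stored(base, clones)

for index in range(4):                     # one clone overwrites four blocks
    clones[0].write(index, b"new")
total_after_write = total_stored(base, clones)

print(total_after_copy, total_after_write)  # 144 148
```

Creating a clone costs no extra storage, and each clone grows only by what it changes, which is how many VMs can start from one image at once.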
  126. What Makes Ceph Unique? Part three: clustered metadata
  127. Metadata for a POSIX-Compliant Filesystem (Barnaby, Flickr / CC BY 2.0)
  128. Diagram: CLIENT, 01, 10, M M M
  129. Diagram: M M M
  130. one tree, three metadata servers ??
  131. (no text on this slide)
  132. (no text on this slide)
  133. (no text on this slide)
  134. (no text on this slide)
  135. DYNAMIC SUBTREE PARTITIONING
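The idea of splitting one directory tree across three metadata servers can be sketched as a load-balancing problem. This is a toy illustration only: it assumes each subtree has a measured load and greedily hands the busiest subtrees to the least-loaded server. The paths, load numbers, and server names are hypothetical, and the balancing logic in Ceph's metadata servers is more involved than this.

```python
import heapq
from typing import Dict, List


def partition(subtree_load: Dict[str, int], servers: List[str]) -> Dict[str, str]:
    """Assign each subtree to a server, busiest subtrees first, least-loaded server first."""
    heap = [(0, server) for server in servers]
    heapq.heapify(heap)
    assignment: Dict[str, str] = {}
    for path, load in sorted(subtree_load.items(), key=lambda item: item[1], reverse=True):
        total, server = heapq.heappop(heap)
        assignment[path] = server
        heapq.heappush(heap, (total + load, server))
    return assignment


# Hypothetical load figures (for example, metadata operations per second).
load = {"/home": 50, "/var": 30, "/usr": 20, "/etc": 10}
assignment = partition(load, ["mds.a", "mds.b", "mds.c"])
print(assignment)
```

Running the same function again with fresh load figures moves hot subtrees to different servers, which is the "dynamic" part of the slide title.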
  136. And Now: Backpedaling
  137. ALMOST EVERYTHING WORKS
  138. Ceph architecture overview (repeats slide 5), with ratings:
      •  RADOSGW: AWESOME
      •  RBD: AWESOME
      •  CEPH FS: NEARLY AWESOME
      •  LIBRADOS: AWESOME
      •  RADOS: AWESOME
  139. *LAN SCALE!!* OR REALLY REALLY SCARY FAST WAN
  140. What is Inktank? I really like your polo shirt, please tell me what it means!
  141. (no text on this slide)
  142. Who?
      •  Ceph’s inventor and (most) developers
  143. Why?
      •  To ensure the long-term success of Ceph
      •  To help companies adopt Ceph through services, support, training, and consulting
  144. When?
      •  Founded: December 28, 2011
      •  Brand Launched: April 2012
  145. What do we want from you??
      •  Try Ceph! Tell us what you think. Ask if you need help. Help others if you can!
      •  Are you a company? Consider dedicating dev resources to the project.
  146. Questions? Ross Turk, VP Community, Inktank. ross@inktank.com | @rossturk | inktank.com | ceph.com
