OSDC 2014: Martin Gerhard Loschwitz - What's next for Ceph?

Ceph has recently gained considerable momentum as a possible replacement for conventional storage technologies. Every new Ceph release brings a number of important improvements and interesting features such as Erasure Coding and multi-site replication. Work is under way to make CephFS, the POSIX-compatible Ceph file system, ready for enterprise usage, and the number of companies using Ceph is steadily increasing. More than enough reasons to take a closer look at recent Ceph developments: what is hot right now, and which features do the Ceph developers have on their list for implementation next?

  1. What’s next for Ceph? On the future of scalable storage. Martin Gerhard Loschwitz. © 2014 hastexo Professional Services GmbH. All rights reserved.
  2. Who?
  3. Quick reminder: Object Storage
  4. [Diagram: users store objects across many disks, each disk with its own local file system]
  5. Cephalopod (Wikipedia, user Nhobgood)
  6. RADOS
  7. Redundant Autonomic Distributed Object Store
  8. 2 Components
  9. OSDs
  10. [Diagram: users, objects, and disks, each disk with its own local file system]
  11. [Diagram: the disk/file-system pairs become OSDs that store the users’ objects]
  12. Unified Storage
  13.-16. [Diagram sequence: users’ objects spread across a growing set of OSDs]
  17. MONs
  18. [Diagram: three MONs added alongside the OSDs]
  19. Data Placement
  20.-26. [Diagram sequence: data placement step by step, with the MONs involved]
  27. Parallelization
  28.-31. [Diagram sequence: objects 1 and 2 are placed onto different OSDs in parallel]
  32. CRUSH
  33. Controlled Replication Under Scalable Hashing
  34. By configuring CRUSH, you make the cluster rack-aware.
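
To make the “no central lookup table” idea concrete: clients compute placement themselves from a shared cluster map, and the map can encode failure domains such as racks. The following toy Python sketch shows such a deterministic, rack-aware computation; it is not the real CRUSH algorithm, and the cluster map in it is invented for the example.

    # Toy illustration only -- NOT the real CRUSH algorithm. It just shows the
    # idea: every client derives the same placement from the same map, and the
    # map can encode failure domains such as racks.
    import hashlib

    # Invented cluster map for the example: rack name -> OSD ids
    CLUSTER_MAP = {
        "rack1": [0, 1, 2],
        "rack2": [3, 4, 5],
        "rack3": [6, 7, 8],
    }

    def score(obj_name, item):
        """Deterministic pseudo-random score for an (object, item) pair."""
        digest = hashlib.sha1("{0}:{1}".format(obj_name, item).encode()).hexdigest()
        return int(digest, 16)

    def place(obj_name, replicas=2):
        """Pick one OSD in each of `replicas` distinct racks (rack-aware)."""
        racks = sorted(CLUSTER_MAP, key=lambda r: score(obj_name, r), reverse=True)
        return [max(CLUSTER_MAP[r], key=lambda o: score(obj_name, "osd.%d" % o))
                for r in racks[:replicas]]

    print(place("object-1"))   # same result on every client, no lookup service needed
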
  35. [Diagram: the RADOS cluster (MONs and OSDs) with its access methods] RADOS Block Device: block-level interface driver for RADOS. RADOS Gateway: RESTful API to access RADOS. CephFS: POSIX file system access to RADOS.
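
Besides those three front ends, applications can also talk to RADOS directly through librados. A minimal sketch with the Python bindings (python-rados); the pool name "data" and the ceph.conf path are assumptions for the example.

    # Minimal librados sketch (python-rados); pool name and conf path assumed.
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("data")            # I/O context for the pool "data"
        ioctx.write_full("greeting", b"hello RADOS")  # store a binary object
        print(ioctx.read("greeting"))                 # read it back
        ioctx.close()
    finally:
        cluster.shutdown()
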
  36. “Booooring!”
  37. Cool Stuff ahead: Erasure Coding, Tiering, Multi-DC Setups, Automation, CephFS, Enterprise Support
  38. Erasure Coding
  39. [Diagram: objects 1 and 2 stored on the OSDs, placed via the MONs]
  40. Until now, Ceph has really worked like a standard RAID 1.
  41. Every binary object exists two times.
  42. [Diagram: every object exists twice in the cluster]
  43. Works great. But it also reduces the net capacity by 50%. At least.
  44. That is where Erasure Coding comes in. It makes Ceph work like a RAID 5.
  45. Mostly developed by Loic Dachary
  46. Idea: Split binary objects into even smaller chunks
  47.-50. [Diagram sequence: an object is split into chunks that are spread across the OSDs]
  51. This reduces the amount of space required for replicas enormously!
  52. Different replication factors available
  53. But: the lower that level is, the longer it takes to re-calculate missing chunks.
  54. Available in Ceph 0.80.
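
As a back-of-the-envelope illustration of the k+m idea (here k=2 data chunks plus m=1 XOR parity chunk, the RAID-5-like case): losing any one chunk still lets you rebuild the object. Ceph 0.80 itself uses pluggable erasure-code libraries, not this toy code.

    # Toy k=2, m=1 erasure coding sketch: split an object into two data chunks
    # plus one XOR parity chunk, then rebuild a lost data chunk from the rest.
    def encode(obj):
        half = -(-len(obj) // 2)                       # ceil division
        c0, c1 = obj[:half], obj[half:].ljust(half, b"\0")
        parity = bytes(a ^ b for a, b in zip(c0, c1))
        return c0, c1, parity

    def rebuild_c0(c1, parity):
        """Recover the first data chunk from the surviving chunk and the parity."""
        return bytes(a ^ b for a, b in zip(c1, parity))

    c0, c1, parity = encode(b"hello erasure coding")
    assert rebuild_c0(c1, parity) == c0                # 1.5x raw space instead of 2x

In Ceph 0.80 itself, the coding parameters live in an erasure-code profile attached to a pool, e.g. 'ceph osd erasure-code-profile set myprofile k=2 m=1' followed by creating a pool with that profile.
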
  55. Tiering
  56. Not all data stored in Ceph is equal.
  57. Frequently needed, fresh data is usually expected to be served quickly.
  58. Also, customers may be willing to accept slower performance in exchange for lower prices.
  59. Until now, that wasn’t easy to implement in RADOS due to a number of limitations.
  60. With Ceph 0.80, pools will allow data to be stored on different hardware, based on its performance.
  61. Wait. Pools?
  62. Pools are a logical unit in RADOS. A pool is a bunch of Placement Groups.
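
Pools are also visible from client code. A minimal sketch, again via the python-rados bindings, for creating and listing pools; the pool name "hot-ssd" is made up.

    # Creating and listing pools via python-rados; the pool name is made up.
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    if not cluster.pool_exists("hot-ssd"):
        cluster.create_pool("hot-ssd")   # a new pool gets its own placement groups
    print(cluster.list_pools())
    cluster.shutdown()
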
  63. By using tiering, pools can be tied to specific hardware components.
  64. All replication happens intra-pool.
  65. Data may be moved from one pool to another pool in RADOS.
  66. Available in Ceph 0.80.
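
A sketch of how the tiering pieces fit together in 0.80: create a slow pool and a fast pool, then attach the fast one as a writeback cache tier in front of the slow one. The commands are driven from Python via subprocess; pool names, PG counts, and the CRUSH rule id are invented for the example.

    # Sketch: attach a fast pool as a writeback cache tier in front of a slow
    # pool using the Ceph 0.80 CLI. Pool names, PG counts and the CRUSH rule
    # id are example values.
    import subprocess

    def ceph(*args):
        subprocess.check_call(["ceph"] + list(args))

    ceph("osd", "pool", "create", "cold-sata", "128")            # big, slow pool
    ceph("osd", "pool", "create", "hot-ssd", "128")              # small, fast pool
    ceph("osd", "pool", "set", "hot-ssd", "crush_ruleset", "1")  # assumes rule 1 maps to SSD hosts
    ceph("osd", "tier", "add", "cold-sata", "hot-ssd")           # hot-ssd becomes a tier of cold-sata
    ceph("osd", "tier", "cache-mode", "hot-ssd", "writeback")
    ceph("osd", "tier", "set-overlay", "cold-sata", "hot-ssd")   # clients are redirected to the cache
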
  67. Multi-DC Setups
  68. Ceph was designed for high-performance, synchronous replication.
  69. Off-site replication is typically asynchronous.
  70. Bummer!
  71. But starting with Ceph 0.67, the RADOS Gateway supports “Federation”
  72.-73. [Diagram sequence: two data centers, each with its own RADOS cluster and RADOS Gateway, kept in sync by sync agents; an object written in DC 1 shows up in DC 2]
  74. In fact, the federation feature adds asynchronous replication on top of the RADOS storage cluster.
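
The actual sync agent (radosgw-agent) replays the gateways’ metadata and data logs between the zones. Purely as a conceptual picture of what “asynchronous replication on top of the cluster” means, here is a naive copy loop between the two gateways’ S3 APIs using boto; endpoints, credentials and the bucket name are invented, and this is not how the real agent works.

    # Conceptual only: a naive asynchronous copy loop between two RADOS Gateways
    # via their S3 API (boto 2.x). The real radosgw-agent works from the
    # gateways' metadata/data logs instead. Endpoints, keys and the bucket
    # name below are invented for the example.
    import boto
    import boto.s3.connection

    def rgw(host, access, secret):
        return boto.connect_s3(
            aws_access_key_id=access, aws_secret_access_key=secret,
            host=host, is_secure=False,
            calling_format=boto.s3.connection.OrdinaryCallingFormat())

    src = rgw("rgw.dc1.example.com", "SRC_KEY", "SRC_SECRET").get_bucket("photos")
    dst = rgw("rgw.dc2.example.com", "DST_KEY", "DST_SECRET").get_bucket("photos")

    already_there = set(k.name for k in dst.list())
    for key in src.list():                       # copy whatever DC 2 is missing
        if key.name not in already_there:
            dst.new_key(key.name).set_contents_from_string(
                key.get_contents_as_string())
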
  75. Still needs better integration with the other Ceph components.
  76. Automation
  77. Ceph clusters will almost always be deployed using tools for automation.
  78. Thus, Ceph needs to play together well with Chef, Puppet & Co.
  79. Chef: Yay! Chef cookbooks are maintained and provided by Inktank.
  80. Puppet: Ouch. Inktank does not provide Puppet modules for Ceph deployment.
  81. Right now, at least 6 competing modules exist on GitHub, some forks of each other.
  82. None of these use ceph-deploy, though.
  83. But there is hope: puppet-cephdeploy does use ceph-deploy.
  84. Needs some additional work, but generally looks very promising and already works.
  85. Plays together nicely even with ENCs such as the Puppet Dashboard or the Foreman project.
  86. CephFS
  87. Considered vaporware by some people already. But that’s not fair!
  88. CephFS is already available and works. Well, mostly.
  89. For CephFS, the really critical component is the Metadata Server (MDS).
  90. Running CephFS today with exactly one active MDS is fine and will most likely not cause trouble.
  91. But Sage wants the MDS to scale out properly, so that running several active MDSes at a time works.
  92. That’s called Subtree Partitioning. Every active MDS will be responsible for the metadata of a certain subtree of the POSIX-compatible FS.
  93. Right now, subtree partitioning is what’s causing trouble.
  94. CephFS is not Inktank’s main priority; likely to be released as “stable” in Q4 2014.
  95. Enterprise Support
  96. Major companies willing to run Ceph need some type of support contract.
  97. Inktank has started to offer that support through a product called “Inktank Ceph Enterprise” (ICE).
  98. Gives users long-term support for certain Ceph releases (such as 0.80) and hot-fixes for problems.
  99. Also brings Calamari, Inktank’s Ceph GUI.
  100. Distribution Support
  101. Inktank already does a lot to make installing Ceph on different distributions as smooth as possible.
  102. Ye olde OSes: Ubuntu 12.04, Debian Wheezy, RHEL 6, SLES 11
  103. Ubuntu 14.04: May 2014
  104. RHEL 7: December 2014
  105. Release Schedule
  106. Firefly (0.80): May 2014, along with ICE 1.2
  107. Giant: Summer 2014 (non-LTS version)
  108. The “H” release: December 2014, along with ICE 2.0
  109. Ceph Days
  110. Ceph Days are information events run by Inktank all over the world.
  111. Two have happened in Europe so far: London (October 2013) and Frankfurt (February 2014)
  112. Ceph Days let you gather with others willing to use Ceph and exchange experiences.
  113. And you can meet Sage Weil
  114. No shit. You can meet Sage Weil!
  115. Special thanks to Sage Weil (Twitter: @liewegas) & crew for Ceph, and to Inktank (Twitter: @inktank) for the Ceph logo
  116. martin.loschwitz@hastexo.com goo.gl/S1sYZ (me on Google+) twitter.com/hastexo hastexo.com
