Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandra Summit 2016

The Anti-Entropy process used by nodetool repair is the way of ensuring consistency of data on disk. Over the many years of the Apache Cassandra project it has also been the biggest pain point for teams running Cassandra. With a solid repair process in place you can be confident that deleted data will not come back to life, and that data is fully distributed when nodes fail.

In this talk Alexander Dejanovski, Consultant at The Last Pickle, will explain how Anti-Entropy works and why it should be run on your cluster. He will discuss the different options such as "primary range" repair, sub-range repairs, and incremental repair introduced in version 2.1.
He will also introduce additional tools such as the Spotify Reaper and the range repair script, and future optimisations incremental repair could bring to the read path.

About the Speaker
Alexander Dejanovski, Consultant, The Last Pickle

Alexander has been working as a software developer for the last 18 years, mainly for the French leader in express shipments. There he led the effort to build a Cassandra-based architecture and migrate services to it from traditional RDBMS. He is involved in the Cassandra community through the development of a JDBC wrapper for the DataStax Java Driver. He recently joined The Last Pickle as a Cassandra consultant and now helps customers get the best out of it.

  1. Real world tales of repair
  2. CASSANDRA SUMMIT - SEPTEMBER 2016
     Alexander Dejanovski, @alexanderdeja, Consultant, www.thelastpickle.com
     DataStax MVP for Apache Cassandra
     Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
  3. About The Last Pickle
     We help people deliver and improve Apache Cassandra based solutions, with staff in 5 countries and over 50 years combined experience in Apache Cassandra.
  4. What and why? - Full repair - Incremental repair - How to make it work - Automated repairs
  5. What is repair?
     A maintenance operation that (briefly) restores strong consistency throughout the cluster
  6. Why do we need repair?
     - Eventual consistency - Downtime / failure recovery - Safe deletes
  7. Tombstones need repair too
     Missing tombstones can lead to zombie data (repair within gc_grace_seconds)
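
The zombie-data risk on slide 7 can be illustrated with a small Python sketch. This is a toy model and not Cassandra code: the `Replica` class, the `repair` and `read` helpers and the timestamps are hypothetical. The only behaviour borrowed from Cassandra is that a tombstone may be purged once it is older than gc_grace_seconds (10 days by default).

```python
GC_GRACE_SECONDS = 864000  # 10 days, the default gc_grace_seconds of a table


class Replica:
    """Toy replica: maps key -> (write_time, value); value None marks a tombstone."""

    def __init__(self):
        self.cells = {}

    def write(self, key, value, ts):
        current = self.cells.get(key)
        if current is None or ts > current[0]:  # last write wins
            self.cells[key] = (ts, value)

    def delete(self, key, ts):
        self.write(key, None, ts)  # a delete is a write of a tombstone

    def purge_tombstones(self, now):
        """Drop tombstones older than gc_grace_seconds, as compaction may do."""
        for key, (ts, value) in list(self.cells.items()):
            if value is None and now - ts > GC_GRACE_SECONDS:
                del self.cells[key]


def repair(replicas):
    """Toy repair: every replica ends up with the newest cell for each key."""
    keys = set()
    for replica in replicas:
        keys.update(replica.cells)
    for key in keys:
        newest = max((r.cells[key] for r in replicas if key in r.cells),
                     key=lambda cell: cell[0])
        for replica in replicas:
            replica.cells[key] = newest


def read(replicas, key):
    """Return the newest value seen across replicas (None if deleted or absent)."""
    found = [r.cells[key] for r in replicas if key in r.cells]
    return max(found, key=lambda cell: cell[0])[1] if found else None


def scenario(repair_in_time):
    a, b, c = Replica(), Replica(), Replica()
    for replica in (a, b, c):
        replica.write("k", "v", ts=0)
    a.delete("k", ts=100)  # replica c was down and missed the delete
    b.delete("k", ts=100)
    if repair_in_time:
        repair([a, b, c])  # the tombstone reaches c before it can be purged
    now = 100 + GC_GRACE_SECONDS + 1
    for replica in (a, b, c):
        replica.purge_tombstones(now)
    return read([a, b, c], "k")
```

Without repair, `scenario(False)` returns the deleted value again, because the only surviving copy is the stale one on the replica that never saw the tombstone. With a repair inside the grace period, `scenario(True)` returns `None`.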
  8. What and why? - Full repair - Incremental repair - How to make it work - Automated repairs
  9. How does anti-entropy repair work?
     - Reads all data
  10. How does anti-entropy repair work?
      - Reads all data - Calculates hashes
  11. How does anti-entropy repair work?
      - Reads all data - Calculates hashes - Compares hashes
  12. How does anti-entropy repair work?
      - Reads all data - Calculates hashes - Compares hashes - Streams mismatching partitions
  13. How does anti-entropy repair work?
  14. A Merkle tree is requested from all replicas
  15. Validation compaction
  16. Merkle tree comparison
  17. Streaming
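
The four steps on slides 9 to 17 (read, hash, compare, stream) can be sketched in Python. This is a simplified illustration under stated assumptions, not Cassandra's implementation: the function names, the fixed number of equal-width leaves, the use of SHA-256 and the sample data are all choices made for this example.

```python
import hashlib


def leaf_hashes(partitions, lo, hi, num_leaves):
    """Hash one replica's partitions into num_leaves equal token buckets.

    partitions maps token -> row content; tokens lie in [lo, hi).
    num_leaves must be a power of two so the tree stays balanced.
    """
    width = (hi - lo) / num_leaves
    buckets = [hashlib.sha256() for _ in range(num_leaves)]
    for token in sorted(partitions):
        index = min(int((token - lo) / width), num_leaves - 1)
        buckets[index].update(f"{token}:{partitions[token]}".encode())
    return [bucket.digest() for bucket in buckets]


def build_tree(leaves):
    """Return the tree as a list of levels, from the leaves up to the root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([hashlib.sha256(prev[i] + prev[i + 1]).digest()
                       for i in range(0, len(prev), 2)])
    return levels


def mismatching_leaves(tree_a, tree_b):
    """Walk both trees from the root and return the indexes of differing leaves."""
    def walk(level, index):
        if tree_a[level][index] == tree_b[level][index]:
            return []       # identical subtree: nothing to stream
        if level == 0:
            return [index]  # differing leaf: its whole token range is streamed
        return walk(level - 1, 2 * index) + walk(level - 1, 2 * index + 1)
    return walk(len(tree_a) - 1, 0)


# Two replicas that disagree on a single partition (token 70).
replica_a = {5: "a", 20: "b", 70: "c", 90: "d"}
replica_b = {5: "a", 20: "b", 70: "stale", 90: "d"}
tree_a = build_tree(leaf_hashes(replica_a, 0, 100, 4))
tree_b = build_tree(leaf_hashes(replica_b, 0, 100, 4))
differing = mismatching_leaves(tree_a, tree_b)
```

Here `differing` is `[2]`, the leaf covering tokens 50 to 75. Only that bucket would need streaming, but every partition hashed into it goes along, whether or not it actually differs.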
  18. How do we run repair?
      nodetool repair
  19. Improving repair
  20. Improving repair
  21. Improving repair
  22. Improving repair
      Repairing each range once is enough
  23. Improving repair
      nodetool repair -pr
  24. Improving repair
      nodetool repair -pr: not suitable for node recovery
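
Why repairing each range once is enough can be shown by counting. The Python sketch below assumes a single-token ring with a SimpleStrategy-like placement (each range is replicated on its owner and on the next nodes clockwise). The function names and the token values are hypothetical.

```python
from collections import Counter


def replicated_ranges(tokens, rf):
    """Map each node index to the ranges it replicates on a single-token ring.

    tokens is the sorted list of node tokens. Range i is (tokens[i-1], tokens[i]];
    node i owns it as its primary range and the next rf - 1 nodes replicate it.
    """
    n = len(tokens)
    ranges = {node: [] for node in range(n)}
    for i in range(n):
        token_range = (tokens[i - 1], tokens[i])
        for step in range(rf):
            ranges[(i + step) % n].append(token_range)
    return ranges


def repairs_per_range(tokens, rf, primary_only):
    """Count how often each range is repaired when repair runs on every node."""
    counts = Counter()
    if primary_only:
        for i in range(len(tokens)):  # like `nodetool repair -pr` on each node
            counts[(tokens[i - 1], tokens[i])] += 1
    else:
        for node_ranges in replicated_ranges(tokens, rf).values():
            counts.update(node_ranges)  # like plain `nodetool repair` on each node
    return counts


tokens = [0, 25, 50, 75]
full = repairs_per_range(tokens, rf=3, primary_only=False)
primary = repairs_per_range(tokens, rf=3, primary_only=True)
```

With a replication factor of 3, `full` shows every range repaired three times, while `primary` shows each range repaired once. The same counting explains the node recovery caveat: a `-pr` run on a single node covers only that node's primary range, not the other ranges it replicates.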
  25. Repair too slow?
      Sequential repair is the default since C* 2.0
  26. Repair too slow?
      nodetool repair -par
  27. The problem with dense nodes
      Overstreaming: leaves of the Merkle tree contain several partitions.
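
A back-of-the-envelope estimate shows how overstreaming grows with node density. In the Python sketch below, the function name, the tree depth of 15 and the partition figures are illustrative assumptions, not values taken from the talk.

```python
def overstreaming_estimate(partitions_in_range, tree_depth, avg_partition_bytes):
    """Estimate what one out-of-sync partition costs to stream.

    Returns (partitions per leaf, bytes streamed for one mismatching leaf).
    """
    leaves = 2 ** tree_depth
    per_leaf = max(1, -(-partitions_in_range // leaves))  # ceiling division
    return per_leaf, per_leaf * avg_partition_bytes


# Assumed figures: 100 million partitions of about 1 KiB each, tree depth 15.
per_leaf, streamed = overstreaming_estimate(100_000_000, 15, 1024)
```

Under these assumptions a single differing partition drags along the roughly 3,000 partitions that share its leaf, about 3 MB streamed instead of 1 KiB. Repairing a smaller token range with the same tree depth lowers the partitions-per-leaf figure.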
  28. The solutions with dense nodes
      cassandra_range_repair (Matt Stump & Brian Gallew): breaks the repair sessions into n steps
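
The general idea of breaking a repair into n steps can be sketched as follows. This is not the code of the cassandra_range_repair script: the function names and the keyspace name are hypothetical, wrap-around ranges are not handled, and the commands are only built as strings, never executed. The `-st` and `-et` options are nodetool's start-token and end-token flags.

```python
def split_range(start, end, steps):
    """Split the token range (start, end] into `steps` contiguous sub-ranges."""
    if steps < 1 or end <= start:
        raise ValueError("need steps >= 1 and start < end")
    width = end - start
    bounds = [start + width * i // steps for i in range(steps)] + [end]
    # Skip empty sub-ranges that appear when steps exceeds the range width.
    return [(bounds[i], bounds[i + 1])
            for i in range(steps) if bounds[i] < bounds[i + 1]]


def repair_commands(keyspace, start, end, steps):
    """Build one `nodetool repair` command per sub-range (not executed here)."""
    return [f"nodetool repair -st {s} -et {e} {keyspace}"
            for s, e in split_range(start, end, steps)]


commands = repair_commands("my_keyspace", 0, 100, 4)
```

Each command repairs a quarter of the original range, so each Merkle tree covers fewer partitions and a mismatch streams less data.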
  29. The solutions with dense nodes
      vnodes: one repair session per vnode
      Drawback: if you have many vnodes, repair takes longer
  30. Repair in…
  31. The early days of your cluster
      Node density is low, repair works just fine however you run it.
  32. The early days of your cluster
      So maybe, like I did, you run « nodetool repair » on all nodes… at the same time
  33. The (not so) early days of your cluster
      As nodes get higher in density, repair takes longer… and longer…
  34. The (not so) early days of your cluster
      … and latencies rise as repair is a CPU and I/O intensive operation
  35. Your cluster is a grown up now
      … until it breaks your cluster
  36. How can it break?
      - Load gets too high
  37. How can it break?
      - Load gets too high - You don’t meet your latency SLA anymore
  38. How can it break?
      - Load gets too high
  39. How can it break?
      - Load gets too high - Streams get stuck
  40. How can it break?
      - Load gets too high - Streams get stuck - And out of nowhere, all nodes start to eat all your CPU doing nothing
  41. The fun part?
      You need to run repair to recover from the repair outage!
  42. The cluster keeps growing
      And you realize orchestration is needed to stop blowing up your cluster
  43. Orchestrating repair
      Repair must not run on all nodes at the same time
  44. Tools to orchestrate repairs
      - OpsCenter repair service (DSE users) - Spotify Reaper
  45. Spotify Reaper
      https://github.com/spotify/cassandra-reaper
  46. Spotify Reaper
      - Performs subrange repair
  47. Spotify Reaper
      - Performs subrange repair - Limits repair pressure
  48. Spotify Reaper
      - Performs subrange repair - Limits repair pressure - Retries failed sessions
  49. Spotify Reaper
      - Performs subrange repair - Limits repair pressure - Retries failed sessions - Schedules cyclic repairs
  50. Spotify Reaper
      - Performs subrange repair - Limits repair pressure - Retries failed sessions - Schedules cyclic repairs - Optimizes cluster load
  51. Spotify Reaper - with UI (thx Stefan Podkowinski)
      GUI screenshots
  52. What and why? - Full repair - Incremental repair - How to make it work - Automated repairs
  53. What if we stopped repairing repaired data?
  54. Here comes the savior!
      C* 2.1 introduces incremental repair
      Default repair mode since C* 2.2
  55. How does incremental repair work?
  56. Anticompaction
  57. Anticompaction (repair on all ranges on local node)
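
Anticompaction can be pictured as splitting an SSTable in two along the repaired token ranges. The Python sketch below is a toy model under stated assumptions: SSTables are plain dicts, the function names are hypothetical, ranges do not wrap around, and a `repaired_at` value of 0 stands for "unrepaired".

```python
def in_ranges(token, ranges):
    """True if token falls in one of the (start, end] ranges (no wrap-around)."""
    return any(start < token <= end for start, end in ranges)


def anticompact(sstable, repaired_ranges, repaired_at):
    """Split one toy SSTable into a repaired and an unrepaired SSTable.

    sstable is a dict: {"repaired_at": 0, "partitions": {token: data}}.
    Partitions inside the repaired ranges go to the repaired SSTable,
    which is stamped with the repair time; the rest stay unrepaired.
    """
    repaired = {"repaired_at": repaired_at, "partitions": {}}
    unrepaired = {"repaired_at": 0, "partitions": {}}
    for token, data in sstable["partitions"].items():
        target = repaired if in_ranges(token, repaired_ranges) else unrepaired
        target["partitions"][token] = data
    return repaired, unrepaired


sstable = {"repaired_at": 0, "partitions": {10: "a", 60: "b", 90: "c"}}
repaired, unrepaired = anticompact(sstable, [(0, 50)], repaired_at=1473000000)
```

Only the partition at token 10 lands in the repaired SSTable. When the repair covers every range held by the node, as on slide 57, the unrepaired side of the split comes out empty.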
  58. Incremental repair looks awesome…
      …but has flaws and drawbacks
  59. Incremental repair caveats
      Carefully prepare your switch to incremental repair
  60. Incremental repair caveats
      Carefully prepare your switch to incremental repair, i.e. do not run « nodetool repair -inc » straight away…
  61. Incremental repair caveats
      It doesn’t handle missing/corrupted data that was already repaired
  62. Incremental repair caveats
      It splits SSTables in 2 sets that cannot be compacted together (think tombstone purge)
  63. Incremental repair caveats
      It is incompatible with subrange repair (anticompaction)
  64. Incremental repair caveats
      It doesn’t like concurrency very much, even in C* 3.x
  65. Incremental repair caveats
      Validator.java:261 - Failed creating a merkle tree for [repair #e4c782d0-11fc-11e6-b616-51a3849870bb on table_v2/table_attributes, [(8835460833482333317,8838777311566358575], (-7300486781514672850,-7298192396576668423], (-959298474675167225,-959177964106074209]]], /10.10.10.33 (see log for details)
  66. Incremental repair caveats
      CompactionManager.java:1320 - Cannot start multiple repair sessions over the same sstables
  67. Incremental repair caveats
      CASSANDRA-8316: a running anticompaction prevents validation compaction
  68. Incremental repair caveats
      Do not use -pr with incremental repair
  69. Incremental repair caveats
      Do not use -pr with incremental repair
      - Useless: data is repaired once only
  70. Incremental repair caveats
      Do not use -pr with incremental repair
      - Useless: data is repaired once only - Expensive: anticompaction overhead
  71. Incremental repair will not…
      Fix a poor repair strategy
  72. Incremental repair will not…
      Prevent you from having to run full repair
  73. Reaper does not support incremental repair
      But this fork does: https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-that-works
  74. Reaper does not support incremental repair
      And this one embeds the modded UI: https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
  75. Reaper does not support incremental repair
      Not enough time to write those urls? github.com/adejanovski
  76. Reaper inc repair fork
      - No subrange repair
  77. Reaper inc repair fork
      - No subrange repair - Single repair thread => no concurrency
  78. What and why? - Full repair - Incremental repair - How to make it work - Automated repairs
  79. Repair best practices
      Put your repair strategy in place on day 1
  80. Repair best practices
      Use appropriate tooling or build your own
  81. Repair best practices
      Spread repair over a gc_grace_seconds cycle
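
Spreading repair over a gc_grace_seconds cycle comes down to simple arithmetic. The Python sketch below is one possible way to lay out such a schedule, not how Reaper or any other tool does it: the function name, the node names and the 20% safety margin are assumptions made for illustration.

```python
def repair_schedule(nodes, gc_grace_seconds, margin_percent=20):
    """Spread one repair per node evenly over a cycle shorter than gc_grace_seconds.

    Returns a list of (node, start_offset_seconds). Nodes start one after
    the other, never at the same time. margin_percent keeps part of the
    window free for retries and slow runs (an arbitrary illustrative value).
    """
    if not nodes:
        return []
    cycle = gc_grace_seconds * (100 - margin_percent) // 100
    slot = cycle // len(nodes)
    return [(node, index * slot) for index, node in enumerate(nodes)]


# Four hypothetical nodes and the default gc_grace_seconds of 10 days.
schedule = repair_schedule(["n1", "n2", "n3", "n4"], 864000)
```

With these inputs each node gets a two-day slot (172,800 seconds), and the whole cycle finishes in eight days, leaving two days of slack before tombstones become eligible for purging.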
  82. Repair best practices
      Adjust repair pressure on your cluster (Reaper does that)
  83. Repair best practices
      Don’t repair everything! Pick tables with deletes and those with critical data
  84. Repair best practices
      If all data is critical, then none is ;)
  85. Repair best practices
      Be tight on your schedule with inc repair: tombstones and anticompaction
  86. Repair best practices
      Avoid concurrency with inc repair: one node at a time
  87. What and why? - Full repair - Incremental repair - How to make it work - Automated repairs
  88. The bright future?
      Not having to think about cyclic maintenance repairs
  89. CASSANDRA-10070: Automatic repair scheduling
  90. CASSANDRA-8911: Mutation-based Repairs
  91. Thanks! @alexanderdeja
