Fault Tolerance in Cassandra

A short talk on how Cassandra deals with various failure modes. It discusses replication and consistency levels and how they can be used to survive many kinds of failure, and ends with an explanation of recovery methods: repair, hinted handoff and read repair.

  1. Fault tolerance in Cassandra
     Richard Low (richard@acunu.com, @acunu, @richardalow)
     Cassandra London Meetup, 5 Sept 2011
  2. Menu
     • Failure modes
     • Maintaining availability
     • Recovery
  3. Failure modes
  4. Failures are the norm
     • With more than a few nodes, something goes wrong all the time
     • Don't want to be down all the time
  5. Failure causes
     • Hardware failure
     • Bug
     • Power
     • Natural disaster
  6. Failure modes
     • Data centre failure
     • Node failure
     • Disk failure
  7. Failure modes
     • Data centre failure
     • Node failure
     • Disk failure
       • Temporary
       • Permanent
  8. Failure modes
     • Network failure
       • One node
       • Network partition
       • Whole data centre
  9. Failure modes
     • Operator failure
       • Delete files
       • Delete entire database
       • Incorrect configuration
 10. Failure modes
     • Want a system that can tolerate all the above failures
     • Make assumptions about probabilities of multiple events
     • Be careful when assuming independence
 11. Solutions
     • Do nothing
     • Make boxes bulletproof
     • Replication
 12. Availability
 13. How do we maintain availability in the presence of failure?
 14. Replication
     • Buy cheap nodes and cheap disks
     • Store multiple copies of the data
     • Don't care if some disappear
 15. Replication
     • What about consistency?
     • What if I can't tolerate out-of-date reads?
     • How do we restore a replica?
 16. RF and CL
     • Replication factor (RF)
       • How many copies of the data are stored
       • Determines how much failure can be tolerated
     • Consistency level (CL)
       • How many replicas must be contactable for an operation to succeed
 17. Simple example
     • Replication factor 3, uniform network topology
     • Read and write at CL.QUORUM
       • Strong consistency
       • Available if any one node is down
       • Can recover if any two nodes fail
 18. In general
     • RF N, reads and writes at CL.QUORUM
     • Available if up to ceil(N/2) - 1 nodes fail
     • Can recover if up to N - 1 nodes fail
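
The arithmetic behind these numbers: QUORUM means floor(N/2) + 1 replicas must respond, so any read quorum and any write quorum overlap in at least one replica, which is where the strong consistency comes from. A minimal sketch of the calculation (plain Python, my own illustration rather than anything from the slides):

    def quorum(rf):
        # Replicas that must respond for a QUORUM read or write
        return rf // 2 + 1

    def tolerances(rf):
        # Failures survivable while staying available at CL.QUORUM,
        # and failures after which the data is still recoverable
        available = rf - quorum(rf)   # equals ceil(rf/2) - 1
        recoverable = rf - 1          # any one surviving replica suffices
        return available, recoverable

    for rf in (1, 3, 5):
        avail, recover = tolerances(rf)
        # Read and write quorums overlap: quorum(rf) + quorum(rf) > rf
        print(f"RF={rf}: quorum={quorum(rf)}, available with {avail} down, "
              f"recoverable with {recover} down")
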
 19. Multi data centre
     • Cassandra knows the location of hosts, through the snitch
     • Can ensure replicas are placed in each DC: NetworkTopologyStrategy
     • => can cope with whole data centre failure
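
For illustration, defining a keyspace with replicas in each data centre might look like this with the DataStax Python driver and modern CQL syntax (both postdate this 2011 talk; the keyspace name and DC names here are made up):

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])   # contact point; adjust for your cluster
    session = cluster.connect()

    # Three replicas in each data centre; the DC names must match what
    # the snitch reports for your nodes.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS demo
        WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'dc1': 3,
            'dc2': 3
        }
    """)
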
 20. Recovery
 21. Recovery
     • Want to maintain the replication factor
       • Ensures recovery guarantees
     • Methods:
       • Automatic
       • Manual
 22. Automatic
 23. Automatic processes
     • Eventually move replicas towards consistency
     • The 'eventual' in 'eventual consistency'
 24. Hinted handoff
     • Hints
       • Stored on any node when a target node is temporarily unavailable
       • Delivered when the node comes back
     • Can use CL.ANY
       • Writes are then not immediately readable
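
As a sketch of what a CL.ANY write looks like from a client, again using the modern Python driver (the demo.users table is hypothetical, and session is the connection from the earlier sketch):

    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    # CL.ANY: the write succeeds as long as any node accepts it, even if
    # only as a hint. Durable, but not readable until the hint is
    # delivered to an actual replica.
    insert = SimpleStatement(
        "INSERT INTO demo.users (id, name) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ANY,
    )
    session.execute(insert, (42, "richard"))
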
 25. Read repair
     • Since we've done a read anyway, might as well repair any old copies
     • Compare values across replicas, update any that are out of sync
 26. Manual
 27. Repair: method
     • Ensures a node is up to date
     • Run 'nodetool -h <node> repair'
       • Reads through the entire data on the node
       • Builds a Merkle tree
       • Compares it with replicas
       • Streams the differences
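
To make the Merkle tree step concrete, here is a toy sketch of the idea, not Cassandra's actual implementation: hash ranges of data into a binary tree, then walk two trees top-down so in-sync subtrees are skipped and only out-of-sync ranges are reported for streaming (assumes a power-of-two number of ranges):

    import hashlib

    def build(leaves):
        # Merkle levels bottom-up: leaf hashes first, root last.
        # Each leaf stands in for a hash over one range of the node's data.
        levels = [[hashlib.sha256(d).digest() for d in leaves]]
        while len(levels[-1]) > 1:
            prev = levels[-1]
            levels.append([hashlib.sha256(prev[i] + prev[i + 1]).digest()
                           for i in range(0, len(prev), 2)])
        return levels

    def diff(a, b, level=None, index=0):
        # Matching subtree hashes mean the whole range is in sync: skip it.
        if level is None:
            level = len(a) - 1                 # start at the root
        if a[level][index] == b[level][index]:
            return []
        if level == 0:
            return [index]                     # leaf range to stream
        return (diff(a, b, level - 1, 2 * index) +
                diff(a, b, level - 1, 2 * index + 1))

    local  = build([b"r0", b"r1", b"r2", b"r3"])
    remote = build([b"r0", b"rX", b"r2", b"r3"])
    print(diff(local, remote))                 # -> [1], the out-of-sync range
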
 28. Repair: when
     • After a node has been down a long time
     • After increasing the replication factor
     • Every 10 days (the default gc_grace_seconds) to ensure tombstones are propagated before they can be garbage collected
     • Can be used to restore a failed node
 29. Replace a node: method
     • Bootstrap the new node with token <old_token> - 1
     • Tell the existing nodes the old node is dead: nodetool remove
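
The <old_token> - 1 trick works because the new node then owns the dead node's entire range except the single old token value; a sketch of the arithmetic, assuming RandomPartitioner's token space of 2**127 values:

    TOKEN_SPACE = 2 ** 127   # RandomPartitioner token range (assumed here)

    def replacement_token(old_token):
        # One less than the dead node's token, wrapping around the ring
        return (old_token - 1) % TOKEN_SPACE

    print(replacement_token(0))   # token 0 wraps to the top of the ring
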
 30. Replace a node: when
     • Complete node failure
     • Cannot replace a failed disk
     • Corruption
 31. Restore from backup: method
     • Stop Cassandra on the node
     • Copy SSTables from backup
     • Restart Cassandra
     • May take a while reading indexes
 32. Restore from backup: when
     • Disk failure with no RAID rebuild available
     • Operator error
     • Corruption
     • Hacker
 33. Thanks :)
     www.acunu.com
     richard@acunu.com
     @acunu @richardalow