Understanding AntiEntropy in Cassandra


Introduction to the anti-entropy mechanisms in Cassandra. Covers the write and read paths as well as node repair.

Published in: Technology, Business

Understanding AntiEntropy in Cassandra

  1. When Bad Things Happen to Good Data: Understanding Anti-Entropy in Cassandra
     #cassandra13
     Jason Brown, @jasobrown
  2. About me
     • Senior Software Engineer, Netflix
     • Apache Cassandra committer
     • E-commerce Architect, Major League Baseball Advanced Media
     • Wireless Developer (J2ME and BREW)
  3. Maintaining consistent state is hard in a distributed system
     CAP theorem is working against you
  4. Inconsistencies creep in
     • Node is down
     • Network partition
     • Dropped mutations
     • Process crash before flush
     • File corruption
  5. Anti-Entropy Overview
     • Write time
       • Tunable consistency
       • Atomic batches
       • Hinted handoff
     • Read time
       • Consistent reads
       • Read repair
     • Maintenance time
       • Node repair
  6. Write Time
  7. C* Write Basics
     • Determine all replica nodes, in all DCs
     • Send to all replicas in local DC
     • Send to one replica in remote DCs
       • It will forward to peers
     • All respond back to coordinator
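
The routing described on this slide can be sketched roughly as follows. This is an illustration only, not Cassandra's code: the function name `plan_write`, the node names, and the choice of the first remote replica as the forwarding node are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def plan_write(replicas_by_dc: Dict[str, List[str]],
               local_dc: str) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (nodes the coordinator contacts directly, forwarding map).

    The forwarding map says which remote replica relays the write to
    which of its peers in the same remote DC.
    """
    # Every replica in the coordinator's own DC gets the write directly.
    direct = list(replicas_by_dc.get(local_dc, []))
    forwards: Dict[str, List[str]] = {}
    for dc, replicas in replicas_by_dc.items():
        if dc == local_dc or not replicas:
            continue
        # Assumption: pick the first replica as the relay for its DC.
        relay, *peers = replicas
        direct.append(relay)
        forwards[relay] = peers
    return direct, forwards


if __name__ == "__main__":
    direct, forwards = plan_write(
        {"us-east": ["a1", "a2", "a3"], "eu-west": ["b1", "b2", "b3"]},
        local_dc="us-east",
    )
    print(direct)    # nodes contacted by the coordinator
    print(forwards)  # relay -> peers it forwards to
```

Sending one message per remote DC keeps cross-DC traffic down; every replica still responds to the coordinator, as the last bullet notes.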
  8. Writes – request path
  9. Writes – response path
  10. Tunable consistency
      Coordinator blocks for specified count of replicas to respond
      Consistency levels:
      • ANY
      • ONE / TWO / THREE
      • LOCAL_QUORUM
      • EACH_QUORUM
      • ALL
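
As a rough illustration of "blocks for specified count of replicas", the sketch below computes how many acknowledgements a coordinator would wait for at each level listed on the slide. The helper names and the replication-factor map are hypothetical, and the quorum formula (half the replicas, rounded down, plus one) is the usual majority definition rather than something stated on the slide.

```python
from typing import Dict


def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def required_acks(level: str, rf_by_dc: Dict[str, int], local_dc: str) -> int:
    """Number of responses the coordinator blocks for (illustrative only)."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level == "ANY":
        # A single acknowledgement is enough; a stored hint can count.
        return 1
    if level in fixed:
        return fixed[level]
    if level == "LOCAL_QUORUM":
        return quorum(rf_by_dc[local_dc])
    if level == "EACH_QUORUM":
        # A quorum in every datacenter.
        return sum(quorum(rf) for rf in rf_by_dc.values())
    if level == "ALL":
        return sum(rf_by_dc.values())
    raise ValueError(f"unknown consistency level: {level}")


if __name__ == "__main__":
    rf = {"us-east": 3, "eu-west": 3}  # hypothetical keyspace settings
    for level in ("ANY", "ONE", "LOCAL_QUORUM", "EACH_QUORUM", "ALL"):
        print(level, required_acks(level, rf, "us-east"))
```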
  11. Hinted Handoff
      Save a copy of the write for down nodes, and replay later
      Hint = target replica ID + mutation data
  12. Hinted Handoff - storing
      • On coordinator, store hint for nodes not up
      • Also, if a replica doesn’t respond within write_request_timeout_in_ms, store a hint
      • max_hint_window_in_ms – max time a node will create hints for a dead node
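
A minimal sketch of the storing rule, assuming a hint is just the pair named on slide 11. The `Hint` class and `maybe_create_hint` function are hypothetical names for this example, and the window value used in the demo is an arbitrary illustration, not a statement about Cassandra's default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hint:
    target_replica_id: str  # which replica missed the write
    mutation: bytes         # the serialized write to replay later


def maybe_create_hint(replica_id: str,
                      mutation: bytes,
                      down_for_ms: int,
                      max_hint_window_in_ms: int) -> Optional[Hint]:
    """Create a hint unless the replica has been down longer than the window."""
    if down_for_ms > max_hint_window_in_ms:
        # The node has been dead too long; stop accumulating hints for it.
        return None
    return Hint(replica_id, mutation)


if __name__ == "__main__":
    window = 60_000  # illustrative value only
    print(maybe_create_hint("node-3", b"row-update", 5_000, window))
    print(maybe_create_hint("node-3", b"row-update", 120_000, window))
```

Writes that are skipped because the window has passed are not recovered by hint replay, which is one reason the later slides on read repair and node repair matter.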
  13. Hinted Handoff - replay
      • Try to send hints to nodes
      • Runs every ten minutes
      • Multithreaded (C* 1.2)
      • Throttleable (KB per second)
  14. Hinted Handoff – down node
  15. Hinted Handoff – replay
  16. What if coordinator dies?
  17. Atomic Batches
      • Coordinator stores incoming mutation to two peers in same DC
      • Deletes batch from peers on successful completion
      • Peers will play batch if not deleted
      • Runs every 60 seconds
      • With C* 1.2, all mutates use atomic batch
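
The store, apply, delete sequence on this slide can be modelled with a toy simulation. Everything here is hypothetical (the `Peer` class, `write_batch`, and the `crash_before_apply` flag used to simulate a dead coordinator); it only shows why an undeleted copy on a peer lets the batch complete.

```python
from typing import Callable, Dict, List, Tuple

Mutation = Tuple[str, str]  # (key, value); a stand-in for a real mutation


class Peer:
    """Toy node that holds copies of in-flight batches."""

    def __init__(self) -> None:
        self.batchlog: Dict[str, List[Mutation]] = {}

    def store(self, batch_id: str, mutations: List[Mutation]) -> None:
        self.batchlog[batch_id] = list(mutations)

    def delete(self, batch_id: str) -> None:
        self.batchlog.pop(batch_id, None)

    def replay(self, apply: Callable[[Mutation], None]) -> None:
        # Stands in for the periodic task: play any batch still present.
        for batch_id in list(self.batchlog):
            for mutation in self.batchlog.pop(batch_id):
                apply(mutation)


def write_batch(batch_id: str, mutations: List[Mutation], peers: List[Peer],
                apply: Callable[[Mutation], None],
                crash_before_apply: bool = False) -> bool:
    for peer in peers:                 # 1. store the batch on the peers
        peer.store(batch_id, mutations)
    if crash_before_apply:             # simulated coordinator death
        return False
    for mutation in mutations:         # 2. apply the mutations
        apply(mutation)
    for peer in peers:                 # 3. success: delete the copies
        peer.delete(batch_id)
    return True


if __name__ == "__main__":
    table: Dict[str, str] = {}
    peers = [Peer(), Peer()]
    write_batch("b1", [("k1", "v1"), ("k2", "v2")], peers,
                lambda m: table.__setitem__(m[0], m[1]),
                crash_before_apply=True)
    for peer in peers:
        peer.replay(lambda m: table.__setitem__(m[0], m[1]))
    print(table)
```

Both peers replay the same batch in this sketch, which is harmless only because applying the same mutation twice leaves the same result.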
  18. Read time
  19. Cassandra reads - setup
      • Determine replicas to invoke
        • consistency level vs. read repair
      • First data node responds with full data set, others send digests
      • Coordinator waits for consistency_level nodes to respond
  20. LOCAL_QUORUM read
  21. Consistent reads
      • Compare digests
      • If any mismatches
        • re-request to same nodes (full data set)
        • compare full data sets, send updates
        • block until out of date replicas respond successfully
      • Return merged data set to client
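
The digest comparison and merge steps can be sketched as below. This is a simplified model, not Cassandra's implementation: each replica's row is assumed to be a dict of column name to `(value, timestamp)`, the newest timestamp is assumed to win, and the function names and the use of MD5 for the digest are choices made for the example.

```python
import hashlib
from typing import Dict, Iterable, Tuple

Row = Dict[str, Tuple[str, int]]  # column -> (value, write timestamp)


def digest(row: Row) -> str:
    """Hash of the row contents, cheap to ship instead of the full data."""
    h = hashlib.md5()
    for column in sorted(row):
        value, timestamp = row[column]
        h.update(f"{column}={value}@{timestamp};".encode())
    return h.hexdigest()


def merge(rows: Iterable[Row]) -> Row:
    """Per column, keep the value with the newest timestamp."""
    merged: Row = {}
    for row in rows:
        for column, (value, timestamp) in row.items():
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return merged


def consistent_read(replica_rows: Dict[str, Row]) -> Tuple[Row, Dict[str, Row]]:
    """Return (merged row, updates to send to each out-of-date replica)."""
    if len({digest(row) for row in replica_rows.values()}) == 1:
        # All digests match: any replica's data is the answer.
        return next(iter(replica_rows.values())), {}
    merged = merge(replica_rows.values())
    repairs: Dict[str, Row] = {}
    for replica, row in replica_rows.items():
        stale = {c: v for c, v in merged.items() if row.get(c) != v}
        if stale:
            repairs[replica] = stale
    return merged, repairs


if __name__ == "__main__":
    rows = {
        "node-a": {"name": ("bob", 10)},
        "node-b": {"name": ("robert", 20)},
    }
    print(consistent_read(rows))
```

In the flow on the slide, the coordinator would send each entry in `repairs` to its replica and block until those writes succeed before returning the merged row to the client.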
  22. Read repair
      • Synchronizes the client-requested data amongst all replicas
      • Piggy-backs on normal reads, but waits for all replicas to respond (asynchronously)
      • Compares the digests and follows the same algorithm as consistent reads
  23. Read Repair
      Green lines = LOCAL_QUORUM nodes
      Blue lines = nodes for read repair
  24. Read repair configuration
      • Setting per column family
      • Percentage of all reads to CF
      • Local DC vs. Global
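
One way to picture "percentage of all reads" with a local-versus-global choice is a single random draw per read compared against two per-column-family probabilities. The sketch below is an assumption-laden illustration: the parameter names `read_repair_chance` and `dclocal_read_repair_chance` are how I recall the Cassandra 1.x table options and should be checked against your version, and the decision order is a guess at the behavior rather than a quote of the source.

```python
import random
from typing import Optional


def read_repair_decision(read_repair_chance: float,
                         dclocal_read_repair_chance: float,
                         draw: Optional[float] = None) -> str:
    """Decide whether this read also triggers read repair, and how widely.

    Returns "GLOBAL" (all replicas in all DCs), "DC_LOCAL" (replicas in
    the coordinator's DC only), or "NONE".
    """
    if draw is None:
        draw = random.random()  # uniform in [0.0, 1.0)
    if read_repair_chance > draw:
        return "GLOBAL"
    if dclocal_read_repair_chance > draw:
        return "DC_LOCAL"
    return "NONE"


if __name__ == "__main__":
    # Hypothetical settings: 10% global, 30% local-DC.
    counts = {"GLOBAL": 0, "DC_LOCAL": 0, "NONE": 0}
    for _ in range(10_000):
        counts[read_repair_decision(0.1, 0.3)] += 1
    print(counts)
```

Passing `draw` explicitly makes the decision deterministic, which is convenient when reasoning about or testing a particular configuration.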
  25. Read repair fixes data that is actually requested, … but what about data that isn’t requested?
  26. Node repair - introduction
      • Repairs inconsistencies across all replicas for a given range
      • nodetool repair
        • repairs the ranges the node contains
        • one or more column families (within the same keyspace)
        • can choose local datacenter only (C* 1.2)
  27. Node Repair - cautions
      • Should be part of standard C* operations
        • Especially if you delete data
      • Repair is IO and CPU intensive
  28. Node Repair – details, 1
      • Determine peer nodes with matching ranges
      • Triggers a major (validation) compaction on peer nodes
        • read and generate hash for every row in CF
        • add result to a Merkle Tree
        • return tree to initiator
  29. Node Repair – details, 2
      • Initiator awaits trees from participating nodes
      • Compares every tree to every other tree
      • If any differences detected, the differing nodes exchange conflicting range(s)
      • Written out as new, local SSTables
  30. Read Repair – example
  35. Anti-Entropy – Wrap Up
      • CAP Theorem lives, tradeoffs must be understood and made
      • C* contains processes to make diverging data sets consistent
      • Tunable controls exist at write and read times, as well as on demand
  36. Thank you! Q & A time
      @jasobrown