When Bad Things Happen to Good Data:
Understanding Anti-Entropy in Cassandra
Jason Brown
@jasobrown jasedbrown@gmail.com
About me
•  Senior Software Engineer @ Netflix
•  Apache Cassandra committer
•  E-Commerce Architect, Major League Baseball Advanced Media
•  Wireless developer (J2ME and BREW)
Maintaining consistent state is hard in a distributed system
CAP theorem works against you
Inconsistencies creep in
•  Node is down
•  Network partition
•  Dropped mutations
•  Process crash before commit log flush
•  File corruption
Cassandra trades consistency (C) for availability and partition tolerance (AP)
Anti-Entropy Overview
•  write time
o  tunable consistency
o  atomic batches
o  hinted handoff
•  read time
o  consistent reads
o  read repair
•  maintenance time
o  node repair
Write Time
Cassandra Writes Basics
•  determine all replica nodes in all DCs
•  send to replicas in local DC
•  send to one replica node in each remote DC
o  it forwards the write to its peers in that DC
•  all respond back to original coordinator
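The write fan-out above can be sketched as a small planning function. This is a minimal illustration, not Cassandra's code: the `(node, dc)` input format, the name `plan_write`, and the rule of picking the first replica in each remote DC as the forwarder are all assumptions made for the example.

```python
from collections import defaultdict


def plan_write(replicas, local_dc):
    """Decide which replicas a coordinator writes to directly.

    replicas: list of (node, dc) pairs for every replica of the row key
    (a hypothetical input format). Returns (direct, forwards):
    direct   -> replicas in the coordinator's own DC, written directly
    forwards -> {forwarder: [peers]} with one forwarder per remote DC
    """
    by_dc = defaultdict(list)
    for node, dc in replicas:
        by_dc[dc].append(node)

    direct = list(by_dc.get(local_dc, []))
    forwards = {}
    for dc, nodes in by_dc.items():
        if dc == local_dc:
            continue
        # Only one message crosses the WAN per remote DC; that replica
        # relays the write to its local peers. Picking the first node is
        # an arbitrary choice made for this sketch.
        forwarder, *peers = nodes
        forwards[forwarder] = peers
    return direct, forwards
```

Every replica, including the ones reached through a forwarder, still acknowledges to the original coordinator, which is what lets the consistency level be counted in one place.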
Writes - request path
Writes - response path
Writes - Tunable consistency
The coordinator blocks until the number of replicas required by the consistency level has responded
•  consistency level
o  ALL
o  EACH_QUORUM
o  LOCAL_QUORUM
o  ONE / TWO / THREE
o  ANY
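A sketch of when the coordinator can stop blocking for each consistency level listed above. It assumes quorum means a simple majority (`rf // 2 + 1`) and that a stored hint counts toward ANY; the function and parameter names are hypothetical.

```python
NUMERIC_LEVELS = {"ONE": 1, "TWO": 2, "THREE": 3}


def quorum(rf):
    """A majority of rf replicas."""
    return rf // 2 + 1


def is_satisfied(cl, acks_by_dc, rf_by_dc, local_dc, hints_stored=0):
    """Return True once the coordinator may stop blocking for consistency level cl.

    acks_by_dc: replica acknowledgements received so far, per DC
    rf_by_dc:   replication factor per DC
    """
    total_acks = sum(acks_by_dc.values())
    if cl == "ALL":
        return total_acks >= sum(rf_by_dc.values())
    if cl == "EACH_QUORUM":
        # A quorum is needed in every DC, not just in aggregate.
        return all(acks_by_dc.get(dc, 0) >= quorum(rf) for dc, rf in rf_by_dc.items())
    if cl == "LOCAL_QUORUM":
        return acks_by_dc.get(local_dc, 0) >= quorum(rf_by_dc[local_dc])
    if cl in NUMERIC_LEVELS:
        return total_acks >= NUMERIC_LEVELS[cl]
    if cl == "ANY":
        # A stored hint counts toward ANY even if no replica has acked.
        return total_acks + hints_stored >= 1
    raise ValueError(f"unknown consistency level: {cl}")
```

With RF 3 in each of two DCs, LOCAL_QUORUM needs 2 local acks, while EACH_QUORUM needs 2 acks in *each* DC (4 in total, but a lopsided 3 + 1 does not qualify).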
Hinted handoff
Save a copy of the write for down nodes, and
replay later
hint = target replica + mutation data
Hinted handoff - storing
•  on coordinator, store a hint for any nodes not
currently 'up'
•  if a replica doesn't respond within
write_request_timeout_in_ms, store a hint
•  max_hint_window_in_ms - maximum amount of time after a host goes down during which hints are still generated for it
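The storing rules above (hint for down nodes, hint on timeout, stop after the hint window) can be sketched as follows. Only `max_hint_window_in_ms` comes from the slides; the `Hint` and `HintStore` types and the `record_outcome` method are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hint:
    target: str      # replica the mutation was meant for
    mutation: bytes  # the serialized write itself


@dataclass
class HintStore:
    max_hint_window_ms: int
    hints: List[Hint] = field(default_factory=list)

    def record_outcome(self, target, mutation, *, is_up, acked,
                       down_since_ms: Optional[int], now_ms):
        """Store a hint for a replica that missed a write; return True if stored."""
        if is_up and acked:
            return False  # delivered normally, nothing to hint
        if not is_up and now_ms - down_since_ms > self.max_hint_window_ms:
            return False  # down too long: stop hinting, leave it to repair
        # Either the replica is down (within the window) or it timed out.
        self.hints.append(Hint(target, mutation))
        return True
```

Once a node has been down longer than the window, no more hints accumulate for it, so whatever it missed after that point has to be recovered by node repair.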
Hinted handoff - replay
•  try to send stored hints to their target nodes
•  runs every ten minutes
•  multithreaded (as of 1.2)
•  throttleable (KB per second)
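A sketch of one replay pass with a KB-per-second throttle. The ten-minute scheduling and the multithreading are not modeled; `replay_hints`, the `(target, payload)` hint format, and the injectable `send` and `sleep` callables are assumptions for the example.

```python
import time


def replay_hints(hints, target, send, throttle_kb_per_sec, sleep=time.sleep):
    """Replay the hints stored for `target`, throttled to throttle_kb_per_sec.

    hints: list of (target, payload_bytes). send(payload) -> bool reports
    delivery. Returns the hints that still need to be kept.
    """
    bytes_per_sec = throttle_kb_per_sec * 1024
    others = [h for h in hints if h[0] != target]
    pending = [h for h in hints if h[0] == target]
    for i, (_, payload) in enumerate(pending):
        if not send(payload):
            # Target went away again: keep everything not yet delivered.
            return others + pending[i:]
        # Pause so that, on average, no more than the budget is sent per second.
        sleep(len(payload) / bytes_per_sec)
    return others
```

Throttling matters because a node that has just come back up may otherwise be flooded with hint traffic from every peer at once.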
Hinted Handoff - R2 down
R2 down, coordinator (R1) stores hint
Hinted handoff - replay
R2 comes back up, R1 replays hints for it
What if the coordinator dies?
Atomic Batches
•  coordinator stores the incoming batch on two peers in the same DC (the batchlog)
o  deletes it from those peers on successful completion
•  peers will replay the batch if it was not deleted
o  replay runs every 60 seconds
•  with 1.2, all mutates use atomic batch
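The batchlog flow above can be sketched with two small pieces: a peer that holds entries and replays orphans, and a coordinator that logs, applies, and cleans up. The `timeout` guard (only replay entries old enough that the coordinator has probably died) is an assumption of this sketch, and the 60-second scheduler is left as an outer loop; all names are hypothetical.

```python
import uuid


class BatchlogPeer:
    """A peer holding batchlog entries on behalf of coordinators."""

    def __init__(self):
        self.entries = {}  # batch_id -> (written_at, mutations)

    def store(self, batch_id, mutations, now):
        self.entries[batch_id] = (now, mutations)

    def delete(self, batch_id):
        self.entries.pop(batch_id, None)

    def replay_due(self, now, timeout, apply):
        """Replay (then drop) batches that were never deleted by their coordinator."""
        for batch_id, (written_at, mutations) in list(self.entries.items()):
            if now - written_at >= timeout:  # old enough: assume the coordinator died
                for m in mutations:
                    apply(m)
                del self.entries[batch_id]


def coordinate_batch(mutations, peers, apply, now):
    """Coordinator side: log to two peers, apply the batch, then clean up."""
    batch_id = uuid.uuid4()
    logged_to = peers[:2]
    for p in logged_to:
        p.store(batch_id, mutations, now)
    for m in mutations:
        apply(m)
    for p in logged_to:
        p.delete(batch_id)
    return batch_id
```

If the coordinator dies after logging, both peers may replay the same batch; that is tolerable because a timestamped write applied twice yields the same result.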
Read Time
Cassandra Reads - setup
•  determine endpoints to invoke
o  consistency level vs. read repair
•  one data node sends back the full data set; the other nodes return only a digest
•  wait until the number of nodes required by the CL has responded
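A sketch of the read setup: one replica gets the full data request, the rest get digest requests, and read repair widens the set to every replica. The proximity-sorted input list and the name `plan_read` are assumptions for illustration.

```python
def plan_read(replicas_by_proximity, block_for, read_repair):
    """Split the contacted replicas into one data request and digest requests.

    replicas_by_proximity: live replicas, closest first (assumed to be
    pre-sorted). block_for: responses the consistency level requires.
    """
    if block_for > len(replicas_by_proximity):
        raise ValueError("not enough live replicas for this consistency level")
    # Read repair widens the request to every replica; otherwise only the
    # minimum needed for the consistency level is contacted.
    contacted = replicas_by_proximity if read_repair else replicas_by_proximity[:block_for]
    data_node, *digest_nodes = contacted
    return data_node, digest_nodes
```

Either way the coordinator still waits for only `block_for` responses before answering; the extra read-repair replicas are handled afterwards.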
LOCAL_QUORUM read
Pink nodes contain requested row key
Consistent reads
•  compare the digests of the returned data sets
•  if any digests mismatch, send the request again to the same CL nodes
o  this time requesting full data sets, not digests
•  compare the full data sets and send updates to out-of-date replicas
•  block until those replicas acknowledge the fixes
•  return data to caller
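The consistent-read steps can be sketched as follows, assuming rows are `{column: (value, timestamp)}` dicts reconciled column by column with the highest timestamp winning. Tombstones and equal-timestamp ties are ignored, MD5 over JSON is an arbitrary digest choice, and `fetch_full` / `send_repair` are hypothetical callables standing in for network round trips.

```python
import hashlib
import json


def digest(row):
    """Hash of a row; MD5 over a JSON encoding is an arbitrary choice here."""
    return hashlib.md5(json.dumps(row, sort_keys=True).encode()).hexdigest()


def reconcile(rows):
    """Merge rows column by column; the highest timestamp wins."""
    merged = {}
    for row in rows:
        for col, (value, ts) in row.items():
            if col not in merged or ts > merged[col][1]:
                merged[col] = (value, ts)
    return merged


def consistent_read(data_replica, data_row, digests, fetch_full, send_repair):
    """data_row: full row from one replica; digests: {replica: digest} from the rest."""
    expected = digest(data_row)
    if all(d == expected for d in digests.values()):
        return data_row  # everyone agrees
    # Mismatch: ask the same replicas again, this time for full data.
    rows = {r: fetch_full(r) for r in [data_replica, *digests]}
    merged = reconcile(rows.values())
    for replica, row in rows.items():
        stale = {c: v for c, v in merged.items() if row.get(c) != v}
        if stale:
            send_repair(replica, stale)  # assumed to return only once acked
    return merged
```

In the common case the digests match and the expensive second round trip never happens.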
Read Repair
•  synchronizes the client-requested data among all replicas
•  piggybacks on normal reads, but asynchronously waits for all replicas to respond
•  then, just like consistent reads, compares the digests and fixes any differences
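A minimal asyncio sketch of that behavior: the caller gets an answer as soon as `block_for` replicas respond, while a background task waits for the remaining replicas and repairs stale ones. For brevity it compares full rows rather than digests; `read_replica`, `repair`, and the other names are hypothetical.

```python
import asyncio


def newest(rows):
    """Last-write-wins merge: for each column keep the highest timestamp."""
    merged = {}
    for row in rows:
        for col, (value, ts) in row.items():
            if col not in merged or ts > merged[col][1]:
                merged[col] = (value, ts)
    return merged


async def background_repair(tasks, repair):
    """Wait for every replica, then push the merged result to stale ones."""
    responses = dict(await asyncio.gather(*tasks))
    merged = newest(responses.values())
    for replica, row in responses.items():
        stale = {c: v for c, v in merged.items() if row.get(c) != v}
        if stale:
            await repair(replica, stale)


async def read_with_repair(replicas, block_for, read_replica, repair):
    """Answer after block_for replicas respond; finish read repair in the background.

    read_replica(r) is an async callable returning (r, row).
    Returns (answer, repair_task); keep a reference to repair_task.
    """
    tasks = [asyncio.create_task(read_replica(r)) for r in replicas]
    done = set()
    while len(done) < block_for:
        finished, _ = await asyncio.wait(set(tasks) - done,
                                         return_when=asyncio.FIRST_COMPLETED)
        done |= finished
    answer = newest(t.result()[1] for t in done)
    repair_task = asyncio.create_task(background_repair(tasks, repair))
    return answer, repair_task
```

The client's latency is bounded by the fastest `block_for` replicas; the slow or stale replica is fixed without the client waiting for it.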
Read Repair
green lines = LOCAL_QUORUM nodes
blue lines = nodes for read repair
Read Repair - configuration
•  setting per column family
•  expressed as a percentage (chance) of all reads to the CF
•  separate chances for local DC only vs. global (all DCs)
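A sketch of how the two chances might be turned into a per-read decision. The parameters are modeled on the column family options `read_repair_chance` and `dclocal_read_repair_chance`; the single random draw checked against the global chance first is believed to match Cassandra's behavior but should be treated as an assumption, and both function names are hypothetical.

```python
import random


def read_repair_decision(read_repair_chance, dclocal_read_repair_chance, rng=random.random):
    """Roll once per read and pick GLOBAL, DC_LOCAL, or NONE."""
    chance = rng()
    if read_repair_chance > chance:
        return "GLOBAL"
    if dclocal_read_repair_chance > chance:
        return "DC_LOCAL"
    return "NONE"


def repair_targets(decision, replicas_by_dc, local_dc, cl_replicas):
    """Replicas to contact: the CL set, widened according to the decision."""
    if decision == "GLOBAL":
        return [r for nodes in replicas_by_dc.values() for r in nodes]
    if decision == "DC_LOCAL":
        # Preserve order and drop duplicates between the two lists.
        return list(dict.fromkeys(cl_replicas + replicas_by_dc[local_dc]))
    return list(cl_replicas)
```

Keeping the global chance low while using a higher local-DC chance limits cross-DC traffic while still repairing frequently read data.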
Read repair fixes data that is actually requested,
... but what about data that isn't requested?
Node Repair - introduction
•  repairs inconsistencies across all replicas for a given range
•  nodetool repair
o  repairs the ranges the node contains
o  one or more column families (within the same keyspace)
o  can choose local datacenter only (C* 1.2)
Node Repair - cautions
•  should be part of standard operational maintenance for C*, especially if you delete data
o  ensures tombstones are propagated, and avoids resurrected data
•  repair is IO and CPU intensive
Node Repair - details 1
•  determine peer nodes with matching ranges
•  triggers a major (validation) compaction on peer nodes
o  reads and generates a hash for every row in the CF
o  adds the result to a Merkle tree
o  returns the tree to the initiator
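A sketch of the tree a validation compaction produces: hash every row, bucket the hashes into equal token subranges (the leaves), then hash pairs upward to a root. The tiny token space, SHA-256, and hashing each leaf over its concatenated row hashes are simplifications chosen for the example, not Cassandra's actual parameters.

```python
import hashlib

TOKEN_SPACE = 2**16  # hypothetical tiny token range, for illustration only


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_merkle_tree(rows, depth):
    """rows: {token: value}. Returns levels, leaves first and [root] last.

    Each of the 2**depth leaves covers an equal slice of TOKEN_SPACE.
    """
    num_leaves = 2 ** depth
    width = TOKEN_SPACE // num_leaves
    buckets = [[] for _ in range(num_leaves)]
    for token in sorted(rows):
        # Hash each row and file it under the leaf owning its token.
        buckets[token // width].append(_h(f"{token}:{rows[token]}".encode()))
    levels = [[_h(b"".join(bucket)) for bucket in buckets]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels
```

Two replicas holding identical data produce identical roots, so only the root has to match for the whole range to be considered in sync.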
Node Repair - details 2
•  initiator awaits trees from all nodes
•  compares each tree to every other tree
•  if any differences exist, the two nodes exchange the conflicting ranges
o  these ranges get written out as new, local sstables
'ABC' node is repair initiator
Nodes sharing range A
Nodes sharing range B
Nodes sharing range C
Five nodes participating in repair
Anti-Entropy wrap-up
•  CAP Theorem lives, tradeoffs must be made
•  C* contains processes to make diverging data sets consistent
•  Tunable controls exist at write and read times, as well as on demand
Thank you!
Q & A time
@jasobrown
Notes from Netflix
•  carefully tune read repair chance (RR_chance)
•  schedule repair operations
•  tickler
•  weigh storing more hints vs. running repair

C* Summit 2013: When Bad Things Happen to Good Data: A Deep Dive Into How Cassandra Resolves Inconsistent Data by Jason Brown
