Automating Cassandra
Repairs
Radovan Zvoncek
zvo@spotify.com
github.com/spotify/cassandra-reaper
#CassandraSummit
About zvo
Likes pancakes
Does this for the 3rd time
Works at Spotify
Working at Spotify
Squads are autonomous
Responsible for their full stack
Including Cassandra
Cassandra
Node’s data
Replication
Running Cassandra
Requires many things
One of them is keeping data consistent
Otherwise it can get lost or reappear
Eventually
Eventual consistency
Read Repairs
Hinted Handoff
Anti-entropy Repair
Anti-entropy Repair
Coordinated process
Four steps:
1: Hash
2: Compare
3: Stream
4: Merge
Can go wild...
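The hash/compare/stream steps can be sketched in miniature. This is a toy illustration only, not Cassandra's actual Merkle-tree implementation: each replica hashes its partitions into per-range digests, and any range whose digests disagree is a range that must be streamed. The bucket count and hash choices here are assumptions made up for the example.

```python
import hashlib

def bucket_of(key, buckets):
    # Deterministically map a partition key to a token "bucket" (toy partitioner).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % buckets

def leaf_hashes(rows, buckets=4):
    # Step 1 (Hash): fold every partition's data into its bucket's digest.
    leaves = [hashlib.sha256() for _ in range(buckets)]
    for key in sorted(rows):
        leaves[bucket_of(key, buckets)].update(f"{key}={rows[key]}".encode())
    return [leaf.hexdigest() for leaf in leaves]

def ranges_to_stream(replica_a, replica_b, buckets=4):
    # Step 2 (Compare): mismatching digests mark the ranges to stream (step 3).
    a = leaf_hashes(replica_a, buckets)
    b = leaf_hashes(replica_b, buckets)
    return [i for i in range(buckets) if a[i] != b[i]]
```

The point of the tree of hashes is that two replicas only exchange digests up front; actual data (step 3, Stream) moves only for the ranges that disagree, after which it is merged in (step 4).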
Repair gone wild
Eats a lot of disk IO
● because of hashing all the data
Saturates the network
● because of streaming a lot of data around
Fills up the disk
● because of receiving all replicas, possibly from all other data centers
Causes a ton of compactions
● because of having to merge the received data
… one better be careful
Careful repair
All three intervals
● nodetool repair
Partitioner range
● nodetool repair -pr
This interval only (start & end tokens)
● nodetool repair -st <start_token> -et <end_token>
A part of an interval only
● requires splitting the ring into smaller intervals
Smaller intervals mean less data
Less data means fewer repairs gone wild
Smaller intervals also mean more intervals
More intervals mean more actual repairs
Repairs need to be babysat :(
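Splitting one node's token interval into subranges is simple arithmetic. A minimal sketch, assuming a numeric token range and a hypothetical keyspace name; real tools (including Reaper) compute the bounds from the cluster's actual partitioner and token assignments:

```python
def split_range(start_token, end_token, splits):
    # Cut one token interval into `splits` contiguous subranges.
    width = (end_token - start_token) // splits
    bounds = [start_token + i * width for i in range(splits)] + [end_token]
    return list(zip(bounds[:-1], bounds[1:]))

def repair_commands(start_token, end_token, splits, keyspace="my_keyspace"):
    # One subrange repair command per interval; `my_keyspace` is a placeholder.
    return [
        f"nodetool repair -st {st} -et {et} {keyspace}"
        for st, et in split_range(start_token, end_token, splits)
    ]
```

Each command repairs only a slice of the data, which is exactly what keeps a single repair from going wild; the price is that someone (or something) now has to run and watch many of them.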
The Spotify way
Feature teams are meant to do features
Not waste time operating their C* clusters
Cron-ing nodetool repair is no good
● mostly due to no feedback loop
This all led to the creation of the Reaper
The Reaper
REST(ish) service
Does a lot of JMX
Orchestrates repairs for you
The reaping
You:
curl http://reaper/cluster --data '{"seedHost": "my.cassandra.host.net"}'
The Reaper:
● Figures out cluster info (e.g. name, partitioner)
The reaping
You:
curl http://reaper/repair_run --data '{"clusterName": "myCluster"}'
The Reaper:
● Prepares repair intervals
The reaping
You:
curl -X PUT http://reaper/repair_run/42 -d state=RUNNING
The Reaper:
● Starts triggering repairs of repair intervals
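The three curl calls above form one small workflow: register the cluster, create a repair run, then flip it to RUNNING. A sketch that assembles the same calls as data (base URL, run id 42, and the payload shapes are taken from the slides; anything beyond that is an assumption, and actually sending them is left to your HTTP client of choice):

```python
import json

REAPER = "http://reaper"  # base URL from the slides; adjust for your deployment

def reaping_calls(seed_host, cluster_name, run_id):
    # The reaping workflow as (method, url, body) tuples, in order.
    return [
        ("POST", f"{REAPER}/cluster", json.dumps({"seedHost": seed_host})),
        ("POST", f"{REAPER}/repair_run", json.dumps({"clusterName": cluster_name})),
        ("PUT", f"{REAPER}/repair_run/{run_id}", "state=RUNNING"),
    ]
```

After the PUT, the Reaper takes over: it triggers repairs of the prepared intervals one by one, so no cron jobs and no babysitting.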
Reaper’s features
Carefulness - doesn’t kill a node
● checks for node load
● backs off after repairing an interval
Resilience - retries when things break
● because things break all the time
Parallelism - no idle nodes
● multiple small intervals in parallel
Scheduling - set up things only once
● regular full-ring repairs
Persistence - state saved somewhere
● a bit of extra resilience
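The carefulness and resilience points boil down to a retry loop with pauses between attempts. A toy sketch of that idea, not Reaper's actual code (function names and the linear back-off are made up for illustration):

```python
import time

def run_segment(repair_fn, max_attempts=3, backoff_s=1.0):
    """Run one repair segment, retrying on failure with a pause between
    attempts. Returns the attempt number that succeeded; re-raises the
    last error once max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            repair_fn()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_s * attempt)  # back off before retrying
```

Because things break all the time, a failed segment is retried rather than failing the whole run; because state is persisted, a restarted Reaper can pick up where it left off.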
What we reaped
First repair done 2015-01-28
1,700 repairs since then, recently 90 per week
176,000 (16%) segments failed at least once
60 repair failures
Reaper’s Future
CASSANDRA-10070
Whatever is needed until then
Greatest benefit
Cassandra Reaper automates a very tedious
maintenance operation of Cassandra clusters
in a rather smart, efficient and careful manner
while requiring minimal Cassandra expertise
Thank you!
github.com/spotify/cassandra-reaper
#CassandraSummit
