2. I: Goal
II: Design
III: Implementation
Distributed Counters in Cassandra
Friday, August 13, 2010
3. I: Goal
4. Goal
Low Latency,
Highly Available
Counters
5. II: Design
6. I: Traditional Counter Design
II: Abstract Strategy
III: Distributed Counter Design
7. Design
I: Traditional Counter Design
8. Traditional Counter Design
Atomic Counters
1. single machine
2. one order of execution
3. strongly consistent
9. Traditional Counter Design
Problems
1. SPOF / single master
2. high latency
3. manually sharded
10. Traditional Counter Design
Question
What constraints can we relax?
11. Design
II: Abstract Strategy
12. Abstract Strategy
Constraints to Relax
1. one order of execution
2. strong consistency
13. Abstract Strategy
Relax: One Order of Execution
commutative operation:
- operations must be re-orderable
14. Abstract Strategy
Relax: Strong Consistency
partitioned work:
- each op must occur once
- unique partition identifier
idempotent repair:
- recognize ops from other partitions
15. Design
III: Distributed Counter Design
16. Distributed Counter Design
Requirements
1. commutative operation
2. partitioned work
3. idempotent repair
17. Distributed Counter Design
Commutative Operation
addition:
- commutative operation
- sum ops performed by all replicas
- a + b = b + a
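A minimal sketch of why commutativity lets the order of execution be relaxed: applying the same increments in every possible order yields the same total. The deltas and the helper name `apply_in_order` are made up for illustration.

```python
import itertools

def apply_in_order(deltas):
    """Apply each delta to a running total, in the order given."""
    total = 0
    for delta in deltas:
        total += delta
    return total

# Hypothetical increments arriving at a counter.
deltas = [5, -2, 7]

# Collect the final total for every possible arrival order.
totals = {apply_in_order(order) for order in itertools.permutations(deltas)}
```

`totals` holds a single value, so replicas that see the same ops in different orders still agree on the sum.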
18. Distributed Counter Design
Partitioned Work
each op assigned to a replica:
- every replica sums all of its ops
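Partitioned work can be sketched as follows, assuming each op is tagged with the replica it was assigned to. The replica names, deltas, and the helper `partition_ops` are hypothetical.

```python
from collections import defaultdict

def partition_ops(ops):
    """Sum the deltas per replica; each op belongs to exactly one replica."""
    counts = defaultdict(int)
    for replica_id, delta in ops:
        counts[replica_id] += delta
    return dict(counts)

# Hypothetical (assigned replica, delta) pairs.
ops = [("A", 1), ("B", 3), ("A", 2), ("C", 4)]

per_replica = partition_ops(ops)
# The counter's value is the sum of every replica's partial count.
total = sum(per_replica.values())
```

Because every op is counted by exactly one replica, summing the partial counts never double-counts an op.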
19. Distributed Counter Design
Idempotent Repair
save counts from remote replicas:
- keep highest count seen
prevent multiple execution:
- do not transfer the target replica’s count
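A rough sketch of the "keep highest count seen" rule on the receiving side, showing that applying the same repair twice changes nothing. It assumes the sender has already left out the target replica's own count; the dict layout and the name `merge_remote` are illustrative, not the actual implementation.

```python
def merge_remote(known, incoming):
    """Merge remote counts, keeping the highest count seen per replica."""
    merged = dict(known)
    for replica_id, count in incoming.items():
        merged[replica_id] = max(merged.get(replica_id, count), count)
    return merged

# Hypothetical remote counts already stored on this replica.
known = {"B": 2, "C": 9}
# Hypothetical repair payload (the target's own count is not included).
incoming = {"B": 4, "C": 6, "D": 1}

once = merge_remote(known, incoming)
twice = merge_remote(once, incoming)  # replaying the same repair
```

Taking the maximum rather than adding is what makes the repair idempotent: a replayed or duplicated repair leaves the stored counts unchanged.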
20. III: Implementation
21. I: Data Structure
II: Single Node
III: Eventual Consistency
22. I: Data Structure
23. Data Structure
Requirements
local counts:
- incrementally update
remote counts:
- independently track partitions
24. Data Structure
Context Format
list of (replica id, count) tuples:
[(replica A, count), (replica B, count), ...]
25. Data Structure
Context Mutations
local write:
add the write's delta to the local count
note: memtable
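A minimal sketch of a local write against the context format, assuming a context is a plain list of (replica id, count) tuples. The function name `local_write` and the sample values are hypothetical.

```python
def local_write(context, local_id, delta):
    """Return a new context with `delta` added to the local replica's count."""
    updated = []
    found = False
    for replica_id, count in context:
        if replica_id == local_id:
            updated.append((replica_id, count + delta))
            found = True
        else:
            # Remote counts are never touched by a local write.
            updated.append((replica_id, count))
    if not found:
        # First write seen by this replica: start its count at the delta.
        updated.append((local_id, delta))
    return updated

context = [("A", 10), ("B", 4)]
after_write = local_write(context, "A", 3)
first_write = local_write([], "A", 3)
```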
26. Data Structure
Context Mutations
remote repair:
for each replica,
keep highest count seen
(local or from repair)
27. II: Single Node
28. Single Node
Write Path
client
1. construct column
- value: delta (big-endian long)
- clock: empty
2. thrift: insert / batch_mutate
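The column value is the delta encoded as a big-endian long. A small sketch of that encoding, using Python's `struct` module with a signed 64-bit format; the helper names are made up, and the Thrift call itself is not shown.

```python
import struct

def encode_delta(delta):
    """Encode a delta as an 8-byte, big-endian, signed 64-bit integer."""
    return struct.pack(">q", delta)

def decode_delta(raw):
    """Decode an 8-byte big-endian value back into an integer."""
    return struct.unpack(">q", raw)[0]

encoded = encode_delta(5)
round_trip = decode_delta(encode_delta(-3))
```

The resulting bytes would be sent as the column value, with the clock left empty.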
30. Single Node
Write Path
target replica
insert:
1. memtable does not contain column
2. insert column into memtable
31. Single Node
Write Path
target replica
update:
1. memtable contains column
2. retrieve existing column
3. create new column
- context: local count plus the delta from the write
4. replace column in ConcurrentSkipListMap
5. if the replace fails (another writer changed the column first), go to step 2.
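The insert and update steps form a compare-and-replace retry loop. The sketch below imitates it with a hypothetical `MemtableStandIn` class that uses a lock to provide atomic `put_if_absent` and `replace`; it stands in for the concurrent map and is not the actual memtable code. For brevity the stored value is just the local count rather than a full column.

```python
import threading

class MemtableStandIn:
    """Hypothetical stand-in for a concurrent map with compare-and-replace."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put_if_absent(self, key, value):
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def replace(self, key, expected, new):
        """Swap in `new` only if the current value still equals `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True

def apply_delta(memtable, key, delta):
    while True:
        existing = memtable.get(key)          # retrieve existing column
        if existing is None:
            if memtable.put_if_absent(key, delta):
                return                        # insert path
            continue                          # lost the race; retry as update
        if memtable.replace(key, existing, existing + delta):
            return                            # update path succeeded
        # replace failed: another writer got in first, so retry

memtable = MemtableStandIn()
threads = [threading.Thread(target=apply_delta, args=(memtable, "hits", 1))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_count = memtable.get("hits")
```

No write is lost under contention, because a writer only succeeds when the value it read is still the value in the map.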
32. Single Node
Write Path
Interesting Note:
Memtables (MTs) are serialized to SSTables (SSTs) as-is
- each SST encapsulates the updates made while it was an MT
- local count total must be aggregated across the MT and all SSTs
33. Single Node
Read Path
target replica
read:
1. construct collating iterator over:
- frozen snapshot of MT
- all relevant SSTs
2. resolve column
- local counts: sum
- remote counts: keep max
3. construct value
- sum local and remote counts (big-endian long)
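The read steps above can be sketched as follows, assuming each source (the MT snapshot and each SST) contributes one context, given as a list of (replica id, count) tuples. The names `resolve` and `counter_value` and the sample data are hypothetical.

```python
import struct

def resolve(local_id, fragments):
    """Resolve contexts from the MT and SSTs: sum local counts, max remote counts."""
    local_total = 0
    remote_max = {}
    for context in fragments:
        for replica_id, count in context:
            if replica_id == local_id:
                # Each fragment holds only part of the local count.
                local_total += count
            else:
                remote_max[replica_id] = max(remote_max.get(replica_id, count), count)
    return local_total, remote_max

def counter_value(local_total, remote_max):
    """Sum local and remote counts and encode as a big-endian long."""
    return struct.pack(">q", local_total + sum(remote_max.values()))

# Replica A reads its own data: one MT snapshot and two SSTs.
memtable_snapshot = [("A", 2)]
sstable_1 = [("A", 5), ("B", 4)]
sstable_2 = [("A", 1), ("B", 7), ("C", 3)]

local_total, remote_max = resolve("A", [memtable_snapshot, sstable_1, sstable_2])
value = counter_value(local_total, remote_max)
```

Local counts are summed because each fragment records only the deltas written while it was the MT, whereas each remote count is already a running total, so only the highest one is kept.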
34. Single Node
Compaction
replica
compaction:
1. construct collating iterator over all SSTs
2. resolve every column in the CF
- local counts: sum
- remote counts: keep max
3. write out resolved CF
36. Eventual Consistency
Read Repair
coordinator / replica
read repair:
1. calculate resolved (superset) CF
- resolve every column (local: sum, remote: max)
2. return resolved CF to client
37. Eventual Consistency
Read Repair
coordinator / replica
read repair:
1. calculate repair CF for each replica
- calculate diff CF between resolved and received
- modify columns to remove target replica’s counts
2. send repair CF to each replica
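A simplified sketch of building the repair for one replica: diff the resolved counts against what that replica returned, and drop the target replica's own count so it is never applied twice. Counts are modeled as a dict of replica id to count; `build_repair` and the sample data are illustrative assumptions.

```python
def build_repair(resolved, received, target_id):
    """Return the counts the target replica is missing or has stale."""
    repair = {}
    for replica_id, count in resolved.items():
        if replica_id == target_id:
            continue  # the target owns this count; never send it back
        if received.get(replica_id) != count:
            repair[replica_id] = count
    return repair

# Hypothetical resolved (superset) counts at the coordinator.
resolved = {"A": 8, "B": 7, "C": 3}

# What replicas B and A each returned for the read.
received_from_b = {"A": 8, "B": 7}
received_from_a = {"A": 8, "B": 4}

repair_for_b = build_repair(resolved, received_from_b, "B")
repair_for_a = build_repair(resolved, received_from_a, "A")
```

Replica B only needs C's count, while replica A needs the newer count for B plus C's count; neither receives its own count.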
38. Eventual Consistency
Anti-Entropy Service
sending replica
AES:
1. follow normal AES code path
- calculate repair SST based on shared ranges
- send repair SST
39. Eventual Consistency
Anti-Entropy Service
receiving replica
AES:
1. post-process streamed SST
- re-build streamed SST
- note: strip out local replica’s counts
2. remove temporary descriptor
3. add to SSTableTracker
40. Questions?
41. More Information
Issues:
#580: Vector Clocks
#1072: Distributed Counters
Related Work:
Helland and Campbell, "Building on Quicksand", CIDR (2009), Sections 5 & 6.
My email address:
kakugawa@gmail.com