Introduction to Cassandra: Replication and Consistency
A short introduction to replication and consistency in the Cassandra distributed database. Delivered April 28th, 2010 at the Seattle Scalability Meetup.
Dynamo-like Features
- Symmetric, P2P architecture: no special nodes/SPOFs
- Gossip-based cluster management
- Distributed hash table for data placement
- Pluggable partitioning
- Pluggable topology discovery
- Pluggable placement strategies
- Tunable, eventual consistency
BigTable-like Features
- Sparse, “columnar” data model
- Optional, 2-level maps called Super Column Families
- SSTable disk storage
  - Append-only commit log
  - Memtable (buffer and sort)
  - Immutable SSTable files
- Hadoop integration
CL Options

WRITE levels:
- ZERO: cross fingers
- ANY: 1st response, including hinted handoff (weak)
- ONE: 1st response (weak)
- QUORUM: N/2 + 1 replicas (strong)
- ALL: all replicas (strong)

READ levels:
- ONE: 1st response (weak)
- QUORUM: N/2 + 1 replicas (strong)
- ALL: all replicas (strong)
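A rough sketch of what these levels mean in terms of replica responses; the function name and structure are my own, not the deck's or Cassandra's code:

    # Illustrative only: replica responses a coordinator waits for, given
    # replication factor N, for the levels listed above.
    def responses_required(n, level):
        if level == "ZERO":
            return 0               # cross fingers: wait for nobody
        if level in ("ANY", "ONE"):
            return 1               # first response (ANY also accepts a hinted handoff)
        if level == "QUORUM":
            return n // 2 + 1      # majority of the N replicas
        if level == "ALL":
            return n               # every replica
        raise ValueError(level)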
A Side Note on CL
Consistency Level is based on Replication Factor (N), not on the number of nodes in the system.
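Continuing the illustrative helper above with numbers of my own choosing, the quorum size depends only on N, not on the cluster size:

    responses_required(3, "QUORUM")   # -> 2, in a 5-node or a 500-node cluster
    responses_required(5, "QUORUM")   # -> 3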
A Question of Time
[Diagram: a row holding several columns, each column with a value and a timestamp]
- All columns have a value and a timestamp
- Timestamps are provided by clients; microsecond resolution by convention
- Latest timestamp wins
- Vector clocks may be introduced in 0.7
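A minimal sketch of "latest timestamp wins" reconciliation, assuming each replica returns a (value, timestamp) pair for a column; the helper and the sample timestamps are mine:

    # Illustrative only: pick the column version with the highest client-supplied timestamp.
    def reconcile(versions):
        # versions: iterable of (value, timestamp_usec) pairs from different replicas
        return max(versions, key=lambda v: v[1])

    reconcile([("blue", 1_272_499_200_000_000), ("green", 1_272_499_205_000_000)])
    # -> ("green", 1_272_499_205_000_000): the later write wins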
Read Repair
- Query all replicas on every read
- Data from one replica; checksums/timestamps from all others
- If there is a mismatch: pull all data and merge, then write back to out-of-sync replicas
Weak vs. Strong
- Weak consistency (reads): perform read repair after returning results
- Strong consistency (reads): perform read repair before returning results
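A sketch of the read path described in the last two slides; this is a simplification of my own, not Cassandra's code, and the replica objects and their methods are hypothetical. One replica returns data, the rest return digests, and a mismatch triggers a merge that is written back either before the result is returned (strong) or in the background afterwards (weak).

    import hashlib
    import threading

    def column_digest(column):
        # Stand-in checksum over a column's value and timestamp.
        return hashlib.md5(repr(column).encode()).hexdigest()

    def read_with_repair(replicas, key, strong=True):
        # replicas: hypothetical objects exposing get(key) -> (value, timestamp),
        # digest(key) -> checksum string, and put(key, column).
        data = replicas[0].get(key)                           # data from one replica
        if any(r.digest(key) != column_digest(data) for r in replicas[1:]):
            versions = [r.get(key) for r in replicas]         # mismatch: pull all data
            data = max(versions, key=lambda v: v[1])          # merge: latest timestamp wins

            def repair():
                for r in replicas:
                    r.put(key, data)                          # write back to out-of-sync replicas

            if strong:
                repair()                                      # strong read: repair before returning
            else:
                threading.Thread(target=repair).start()       # weak read: repair after returning
        return data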
R + W > N
Please imagine this inequality has huge fangs, dripping with the blood of innocent enterprise developers, so you can best appreciate the terror it inspires.
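A worked example with numbers of my own: with N = 3, a QUORUM write waits for W = 2 replicas and a QUORUM read waits for R = 2, so R + W = 4 > 3. Any two replicas read must overlap any two replicas written in at least one node, so the read is guaranteed to see the latest acknowledged write. With W = 1 and R = 1, R + W = 2 <= 3, and a read can land entirely on replicas the write never reached.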
Tokens
- A TOKEN is a partitioner-dependent element on the ring
- Each NODE has a single, unique TOKEN
- Each NODE claims a RANGE of the ring from its TOKEN to the token of the previous node on the ring
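A minimal sketch of range ownership under these rules, assuming integer tokens and a ring kept as a sorted list; the helper name and sample tokens are mine.

    from bisect import bisect_left

    def owning_node(ring, token):
        # ring: list of (node_token, node) sorted by node_token. Each node claims
        # the range from the previous node's token (exclusive) up to its own
        # token (inclusive), wrapping around at the top of the ring.
        tokens = [t for t, _ in ring]
        i = bisect_left(tokens, token)
        return ring[i % len(ring)][1]

    owning_node([(0, "A"), (85, "B"), (170, "C")], 100)   # -> "C"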
Partitioning
Map from key space to token space.
- RandomPartitioner: tokens are integers in the range 0 to 2^127; MD5(Key) -> Token. Good: even key distribution. Bad: inefficient range queries.
- OrderPreservingPartitioner: tokens are UTF-8 strings in the range '' to ∞; Key -> Token. Good: efficient range queries. Bad: uneven key distribution.
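A rough sketch of the two mappings, assuming string keys; the exact token arithmetic in Cassandra differs in detail.

    import hashlib

    RING = 2 ** 127

    def random_partitioner_token(key):
        # MD5 yields 128 bits; fold the digest into the 0 .. 2**127 token range.
        digest = hashlib.md5(key.encode()).digest()
        return int.from_bytes(digest, "big") % RING

    def order_preserving_token(key):
        # The key itself is the token, so ring order matches key order, which is
        # what makes range queries efficient and key distribution uneven.
        return key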
Snitching
Map from nodes to physical location.
- EndpointSnitch: guess at rack and datacenter based on IP address octets.
- DatacenterEndpointSnitch: specify IP subnets for racks, grouped per datacenter.
- PropertySnitch: specify arbitrary mappings from individual IP addresses to racks and datacenters.
- Or write your own!
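An illustration of octet-based guessing only; which octets name the datacenter and rack here is an assumption of mine, not something the deck specifies.

    def infer_location(ip):
        # Assumed convention: second octet names the datacenter, third the rack.
        octets = ip.split(".")
        return {"datacenter": "DC" + octets[1], "rack": "RAC" + octets[2]}

    infer_location("10.20.114.7")   # -> {'datacenter': 'DC20', 'rack': 'RAC114'}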
Placement
Map from token space to nodes.
- The first replica is always placed on the node that claims the range in which the token falls.
- Strategies determine where the rest of the replicas are placed.
RackUnaware
Place replicas on the N-1 subsequent nodes around the ring, ignoring topology.
[Diagram: datacenter A and datacenter B, each with rack 1 and rack 2]
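A sketch of RackUnaware placement, reusing the sorted-ring lookup from the Tokens sketch above; names and sample tokens are mine.

    from bisect import bisect_left

    def rack_unaware_replicas(ring, token, n):
        # ring: list of (node_token, node) sorted by node_token. Start with the
        # node that claims the token's range, then take the next n - 1 nodes
        # clockwise, ignoring racks and datacenters.
        tokens = [t for t, _ in ring]
        start = bisect_left(tokens, token)
        return [ring[(start + i) % len(ring)][1] for i in range(n)]

    rack_unaware_replicas([(0, "A"), (85, "B"), (170, "C")], 100, 2)   # -> ['C', 'A']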
RackAware
Place the second replica in another datacenter, and the remaining N-2 replicas on nodes in other racks in the same datacenter.
[Diagram: datacenter A and datacenter B, each with rack 1 and rack 2]
DatacenterShard
Place M of the N replicas in another datacenter, and the remaining N - (M + 1) replicas on nodes in other racks in the same datacenter.
[Diagram: datacenter A and datacenter B, each with rack 1 and rack 2]
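To make the arithmetic concrete (numbers mine): with N = 3 and M = 1, one replica sits on the node that claims the range, M = 1 replica goes to the other datacenter, and N - (M + 1) = 1 replica lands on a different rack in the primary datacenter. With N = 5 and M = 2 the split is 1 + 2 + 2.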