Introduction to Cassandra: Replication and Consistency

A short introduction to replication and consistency in the Cassandra distributed database. Delivered April 28th, 2010 at the Seattle Scalability Meetup.

Transcript

    1. Cassandra Replication & Consistency Benjamin Black, b@b3k.us 2010-04-28
    2. Dynamo + BigTable = Cassandra. From Dynamo: cluster management, replication, fault tolerance. From BigTable: sparse, columnar data model, storage architecture.
    3. Dynamo-like Features: Symmetric, P2P architecture; no special nodes/SPOFs; gossip-based cluster management; distributed hash table for data placement; pluggable partitioning; pluggable topology discovery; pluggable placement strategies; tunable, eventual consistency.
    4. BigTable-like Features: Sparse, “columnar” data model; optional, 2-level maps called Super Column Families; SSTable disk storage; append-only commit log; Memtable (buffer and sort); immutable SSTable files; Hadoop integration.
    5. Topic(s) for Today Replication & Consistency
    6. [1]
    7. Replication How many copies of each piece of data do we want in the system? N=3
    8. Consistency Level: How many replicas must respond to declare success? W=2, R=2
    9. CL Options
       Write levels: ZERO (cross fingers), ANY (1st response, hinted handoff counts), ONE (1st response), QUORUM (N/2 + 1 replicas), ALL (all replicas).
       Read levels: ONE (1st response), QUORUM (N/2 + 1 replicas), ALL (all replicas).
       ZERO, ANY, and ONE are weak; QUORUM and ALL are strong.
    10. A Side Note on CL Consistency Level is based on Replication Factor (N), not on the number of nodes in the system.
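
    To make slides 9 and 10 concrete, here is a minimal sketch (illustrative Python, not Cassandra's code) of how each level maps to a number of required replica responses, computed from the replication factor N rather than from the cluster size:

        # Sketch: replica responses required for each consistency level,
        # derived from the replication factor N, not the node count.
        def responses_required(level: str, n: int) -> int:
            table = {
                "ZERO": 0,             # fire and forget ("cross fingers")
                "ANY": 1,              # 1st response; hinted handoff counts (writes only)
                "ONE": 1,              # 1st real replica response
                "QUORUM": n // 2 + 1,  # majority of replicas
                "ALL": n,              # every replica
            }
            return table[level]

        # Example: with N=3, QUORUM needs 2 responses and ALL needs 3.
        assert responses_required("QUORUM", 3) == 2
        assert responses_required("ALL", 3) == 3
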
    11. A Question of Time [diagram: a row containing columns, each column holding a value and a timestamp] All columns have a value and a timestamp. Timestamps are provided by clients (microsecond resolution by convention). Latest timestamp wins. Vector clocks may be introduced in 0.7.
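
    A tiny sketch of the latest-timestamp-wins rule from slide 11; the Column type here is made up for illustration:

        from collections import namedtuple

        # A column carries a value and a client-supplied timestamp
        # (microseconds by convention).
        Column = namedtuple("Column", ["value", "timestamp"])

        def resolve(a: Column, b: Column) -> Column:
            # Reconciliation: the column with the latest timestamp wins.
            return a if a.timestamp >= b.timestamp else b

        assert resolve(Column("old", 100), Column("new", 200)).value == "new"
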
    12. Read Repair: Query all replicas on every read; the data comes from one replica, checksums/timestamps from all others. If there is a mismatch: pull all data, merge, and write the result back to out-of-sync replicas.
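
    Continuing that sketch, a rough illustration of the read repair flow on slide 12; the replica objects and their read/write methods are hypothetical stand-ins, not a real client API:

        def merge_rows(rows):
            # Merge replica copies of a row column-by-column (Column as sketched
            # above); the column with the latest timestamp wins.
            merged = {}
            for row in rows:
                for name, col in row.items():
                    if name not in merged or col.timestamp > merged[name].timestamp:
                        merged[name] = col
            return merged

        def read_with_repair(replicas, key):
            # replicas: hypothetical objects with read(key) -> {name: Column}
            # and write(key, row); not a real Cassandra client API.
            rows = [replica.read(key) for replica in replicas]
            merged = merge_rows(rows)
            for replica, row in zip(replicas, rows):
                if row != merged:                  # this copy was out of date
                    replica.write(key, merged)     # repair it with the merged row
            return merged
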
    13. Weak vs. Strong: Weak consistency (reads) performs repair after returning results; strong consistency (reads) performs repair before returning results.
    14. R+W>N Please imagine this inequality has huge fangs, dripping with the blood of innocent enterprise developers, so you can best appreciate the terror it inspires.
    15. Our Guarantee: R+W>N guarantees overlap of read and write quorums (W=2, R=2, N=3).
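
    A small, self-contained check of the R + W > N claim on slides 14 and 15: when it holds, every set of W acknowledging replicas intersects every set of R queried replicas (simple counting, not Cassandra code):

        from itertools import combinations

        def quorums_overlap(n: int, w: int, r: int) -> bool:
            # Pigeonhole: any W-subset and any R-subset of N replicas must
            # share at least one node whenever R + W > N.
            replicas = range(n)
            return all(set(ws) & set(rs)
                       for ws in combinations(replicas, w)
                       for rs in combinations(replicas, r))

        assert quorums_overlap(3, 2, 2)        # R + W = 4 > N = 3: reads see the latest write
        assert not quorums_overlap(3, 1, 1)    # R + W = 2 <= 3: a read can miss the write
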
    16. A Matter of Perspective: view consistency vs. replica consistency.
    17. [2]
    18. The Ring [ring diagram: node tokens 0, 125, 250, 375 placed around the ring, with the range a node claims highlighted]
    19. Tokens: A TOKEN is a partitioner-dependent element on the ring. Each NODE has a single, unique TOKEN. Each NODE claims a RANGE of the ring from its TOKEN to the token of the previous node on the ring.
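
    To make slide 19 concrete, a minimal sketch of finding the node that claims a given token: walk clockwise to the first node token at or past it, wrapping around the ring (tokens borrowed from the ring diagram above):

        import bisect

        def owner_of(token, node_tokens):
            # node_tokens: sorted list of the tokens the nodes have claimed.
            # The owner is the first node whose token is >= the given token,
            # wrapping to the lowest node token when we run off the end.
            i = bisect.bisect_left(node_tokens, token)
            return node_tokens[i % len(node_tokens)]

        ring = [0, 125, 250, 375]
        assert owner_of(113, ring) == 125   # falls in the range (0, 125]
        assert owner_of(380, ring) == 0     # wraps around the ring
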
    20. Partitioning: Map from Key Space to Token Space. RandomPartitioner: tokens are integers in the range 0 to 2^127; MD5(Key) -> Token. Good: even key distribution. Bad: inefficient range queries. OrderPreservingPartitioner: tokens are UTF8 strings in the range '' to ∞; Key -> Token. Good: efficient range queries. Bad: uneven key distribution.
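
    And a sketch of how a key becomes a token under the two partitioners on slide 20; the function names are made up, and the real implementations differ in detail:

        import hashlib

        def random_partitioner_token(key: bytes) -> int:
            # MD5(key) interpreted as an integer, reduced into the 0..2**127
            # token space; keys spread evenly but lose their ordering.
            return int.from_bytes(hashlib.md5(key).digest(), "big") % (2 ** 127)

        def order_preserving_token(key: str) -> str:
            # The key itself is the token, so range scans over keys map
            # directly onto ring ranges.
            return key

        print(random_partitioner_token(b"user:42"))   # evenly distributed token
        print(order_preserving_token("user:42"))      # token preserves key ordering
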
    21. Snitching: Map from Nodes to Physical Location. EndpointSnitch: guesses at rack and datacenter based on IP address octets. DatacenterEndpointSnitch: specify IP subnets for racks, grouped per datacenter. PropertySnitch: specify arbitrary mappings from individual IP addresses to racks and datacenters. Or write your own!
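
    For slide 21, a toy snitch: any mapping from a node's address to (datacenter, rack) will do. This hard-coded dictionary just stands in for the pluggable implementations:

        # Hypothetical, hard-coded topology map: IP -> (datacenter, rack).
        TOPOLOGY = {
            "10.0.1.11": ("DC1", "rack1"),
            "10.0.1.12": ("DC1", "rack2"),
            "10.1.1.11": ("DC2", "rack1"),
        }

        def locate(ip: str) -> tuple:
            # A snitch answers one question: where does this node live?
            return TOPOLOGY.get(ip, ("unknown-dc", "unknown-rack"))

        assert locate("10.0.1.12") == ("DC1", "rack2")
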
    22. Placement: Map from Token Space to Nodes. The first replica is always placed on the node that claims the range in which the token falls. Strategies determine where the rest of the replicas are placed.
    23. RackUnaware: Place replicas on the N-1 subsequent nodes around the ring, ignoring topology (sketched in code after slide 26). [diagram: datacenters A and B, two racks each]
    24. RackAware: Place the second replica in another datacenter, and the remaining N-2 replicas on nodes in other racks in the same datacenter. [diagram: datacenters A and B, two racks each]
    25. DatacenterShard: Place M of the N replicas in another datacenter, and the remaining N - (M + 1) replicas on nodes in other racks in the same datacenter. [diagram: datacenters A and B, two racks each]
    26. Or write your own!
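
    As a closing sketch, the simplest strategy above (RackUnaware, slide 23) in code: start at the node that claims the token and take the next N-1 nodes clockwise, ignoring racks and datacenters:

        import bisect

        def rack_unaware_replicas(token, ring, n):
            # ring: node tokens sorted ascending; returns the N nodes holding replicas.
            # The first replica goes to the node claiming the token's range, the
            # rest to the next N-1 nodes clockwise, ignoring racks and datacenters.
            start = bisect.bisect_left(ring, token) % len(ring)
            return [ring[(start + i) % len(ring)] for i in range(n)]

        # With node tokens 0, 125, 250, 375 and N=3, a key whose token is 113
        # gets replicas on the nodes at 125, 250, and 375.
        assert rack_unaware_replicas(113, [0, 125, 250, 375], 3) == [125, 250, 375]
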
    27. [fin]
    28. Cassandra http://cassandra.apache.org
    29. Amazon Dynamo http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html Google BigTable http://labs.google.com/papers/bigtable.html Facebook Cassandra http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
    30. Thank you! Questions?
