
Cassandra EU 2012 - Highly Available: The Cassandra Distribution Model by Sam Overton


Sam Overton's talk from Cassandra Europe on March 28th 2012

Published in: Technology

  1. Highly Available: The Cassandra Distribution Model. Sam Overton, Cassandra Europe 2012
  2. Cassandra is:
     ● built for scalability
     ● built to tolerate failure
     In this talk:
     ● Cassandra distribution overview
     ● Partitioning and placement
     ● Replication
     ● Consistency
  4. Overview
     ● High availability
     ● Partition tolerant
     ● Tunable consistency
     ● Scalable
     ● Replication
     ● No single point of failure
  6. Partitioning and placement
     Should...
     ● Assign data to hosts
     ● Have no single point of failure (SPOF) for routing clients to data
     ● Balance load
     ● Allow scaling without moving too much data
  7. Consistent Hashing [diagram]
  8. Consistent Hashing [diagram: keys (k1, v1), (k2, v2), (k3, v3) placed on the ring]
  9. Consistent Hashing
     ● The partitioner maps each key to a ring token
     ● Hosts' tokens determine the placement of keys and the proportion of data assigned to each host
     ● Each row is stored on one host
     ● Wide rows can cause hot-spotting!
     So how does it scale?
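
The token lookup described on slide 9 can be sketched as follows. This is a minimal illustration rather than Cassandra's actual code: the host names, the evenly spaced tokens, the helpers `token_for` and `owner`, and the use of the full 128-bit MD5 value as the token space are all assumptions made for the example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 128  # assumed token space: the full range of an MD5 digest


def token_for(key: str) -> int:
    """Map a row key to a ring token (hypothetical helper, MD5-based)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def owner(ring: list[tuple[int, str]], token: int) -> str:
    """Return the host with the first token >= the given token, wrapping around."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


# Four hypothetical hosts with evenly spaced tokens, sorted by token.
ring = sorted([
    (0, "host-a"),
    (RING_SIZE // 4, "host-b"),
    (RING_SIZE // 2, "host-c"),
    (3 * RING_SIZE // 4, "host-d"),
])

# Each key lands on exactly one host, determined only by its token.
for key in ("k1", "k2", "k3"):
    print(key, "->", owner(ring, token_for(key)))
```

Because the lookup needs only the sorted token list, any node that knows every host's token can route a request without a central directory.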
  10. Consistent Hashing [diagram]
  11. Consistent Hashing: bootstrapping a new node [diagram]
  12. Consistent Hashing: range is transferred from old host to new host [diagram]
  13-15. Consistent Hashing [diagrams]
  16. Consistent Hashing: decommission is the reverse process [diagram]
  17. Consistent Hashing [diagram]
  18. Consistent Hashing
     ● Tokens can be assigned manually, automatically or randomly
     ● Every node has full knowledge of placement
     ● Client connects to any node, max 1 hop to data
     ● Node status is gossiped
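
To illustrate why bootstrapping moves only one range, here is a small sketch on a toy ring with tokens 1 to 100. The host names, token values and the `owner` helper are hypothetical; real token spaces are far larger.

```python
import bisect


def owner(tokens: dict[int, str], token: int) -> str:
    """Return the host with the first token >= the given token, wrapping around."""
    ordered = sorted(tokens)
    i = bisect.bisect_left(ordered, token)
    return tokens[ordered[i % len(ordered)]]


# Toy ring before bootstrapping: token -> host.
before = {25: "host-a", 50: "host-b", 75: "host-c", 100: "host-d"}

# Bootstrap a new node with token 62, splitting host-c's range (50, 75].
after = dict(before)
after[62] = "host-e"

# Key tokens whose owner changed because of the bootstrap.
moved = [t for t in range(1, 101) if owner(before, t) != owner(after, t)]
print(moved)
```

Only the tokens in (50, 62] change owner, and all of them move from the old host to the new one. Deleting token 62 again restores the original assignment, which is the decommission case.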
  19. Partitioners
     ● Converts a row key (from client data) into a token on the ring
     ● RandomPartitioner
     ● OrderPreservingPartitioner
  20. Partitioners: RandomPartitioner
     ● token = hash(key)
     ● good load balancing
     ● no range queries across row keys
  21. Partitioners: OrderPreservingPartitioner
     ● token = key
     ● requires manual load balancing
     ● careful selection of tokens around the ring
     ● allows range queries across row keys
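
The difference between the two partitioners can be sketched in a few lines. The function names and sample keys below are hypothetical; the sketch uses MD5 for the hash and does not reproduce Cassandra's exact token encoding.

```python
import hashlib


def random_partitioner_token(key: str) -> int:
    """token = hash(key): an MD5-based token spreads keys around the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def order_preserving_token(key: str) -> str:
    """token = key: neighbouring keys stay neighbours on the ring."""
    return key


keys = ["user:0001", "user:0002", "user:0003", "user:0004"]

# Ring order under each partitioner.
opp_order = sorted(keys, key=order_preserving_token)
rp_order = sorted(keys, key=random_partitioner_token)

print("order preserving:", opp_order)
print("random:          ", rp_order)
```

Under the order-preserving scheme a range of row keys maps to a contiguous arc of the ring, so a range query touches adjacent hosts. Under the hashed scheme the ring order generally bears no relation to key order, which is why range queries across row keys are not available.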
  22. Partitioners
     ● Get it right first time!
     ● Design the data model for RandomPartitioner (RP)
     ● Custom partitioners are possible if necessary
  24. Replication
     ● For availability
     ● For redundancy
     ● Can increase read bandwidth
  25. Replication
     ● Replication Factor (RF) is the number of copies of data
     ● Defined per keyspace
     ● Can be changed (eg. if data becomes more/less valuable)
     ● Determines how many failures can be tolerated
  26. Replication Strategy
     ● Determines how replicas are assigned for each host
     ● Defined per keyspace (like RF)
     ● SimpleStrategy
     ● NetworkTopologyStrategy
     ● Custom strategies can be written
  27. Replication Strategy: SimpleStrategy [diagram: eg. RF=3, keys (k1, v1) and (k2, v2)]
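
A rough sketch of SimpleStrategy-style placement: the first replica goes to the host that owns the token, and the remaining replicas go to the next hosts clockwise around the ring. The toy ring, the function name and the one-token-per-host assumption are illustrative only.

```python
import bisect


def simple_strategy_replicas(ring: list[tuple[int, str]], token: int, rf: int) -> list[str]:
    """Return rf hosts: the token's owner plus the next hosts clockwise.

    Assumes ring is sorted by token and each host has exactly one token.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(rf, len(ring))  # cannot place more replicas than there are hosts
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(25, "host-a"), (50, "host-b"), (75, "host-c"), (100, "host-d")]

print(simple_strategy_replicas(ring, 60, rf=3))  # owner host-c, then clockwise
print(simple_strategy_replicas(ring, 90, rf=3))  # wraps past the end of the ring
```

This placement ignores racks and datacentres, which is the gap NetworkTopologyStrategy is meant to fill.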
  28. Replication Strategy: NetworkTopologyStrategy [diagram]
  29. Replication Strategy: NetworkTopologyStrategy, multi-datacentre support [diagram: DC1, DC2]
  30. Replication Strategy: NetworkTopologyStrategy [diagram]
  31. Snitches
     ● Enable routing of requests according to node proximity
     ● Used by the replication strategy to determine rack and DC membership
     ● Custom snitches can be written
  32. SimpleSnitch
     ● Every host is in the same rack & DC, with equal proximity
     RackInferringSnitch
     ● Infers the rack & DC from the IP address of the host
     ● Eg. 123.8.2.100: second octet (8) = DC, third octet (2) = rack, fourth octet (100) = host
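
The octet mapping on slide 32 can be written out as a short sketch. The function name `infer_location` is hypothetical and the code handles only dotted IPv4 strings.

```python
def infer_location(ip: str) -> tuple[str, str]:
    """Infer (datacentre, rack) from an IPv4 address.

    Follows the slide's example: second octet = DC, third octet = rack.
    """
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected a dotted IPv4 address, got {ip!r}")
    return octets[1], octets[2]


dc, rack = infer_location("123.8.2.100")
print(f"DC={dc} rack={rack}")
```

Two hosts are treated as being in the same rack only when both values match, so this scheme works only if the network addressing actually follows the physical layout.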
  33. Ec2Snitch
     ● DC = EC2 region
     ● Rack = EC2 availability zone
     PropertyFileSnitch
     ● Rack and DC membership are read from a configuration file
  34. DynamicSnitch
     ● Wraps each of the other snitches
     ● Records latency stats from read operations
     ● Avoids routing to slow hosts
     ● Configurable update intervals
  36. Consistency
     ● Replication and failures/partitions cause inconsistency
     ● Old versions of data can be returned
     Timestamps:
     ● Chosen by the client
     ● Can be used to avoid read-modify-write
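
One way to picture the role of timestamps is a "highest timestamp wins" reconciliation, sketched below. The `Column` type and `reconcile` function are hypothetical, and tie-breaking between equal timestamps is not modelled.

```python
from typing import NamedTuple


class Column(NamedTuple):
    value: str
    timestamp: int  # chosen by the client that wrote the value


def reconcile(versions: list[Column]) -> Column:
    """Pick the version with the highest client-supplied timestamp."""
    return max(versions, key=lambda c: c.timestamp)


# Three replicas answer; one has already seen the newer write.
responses = [Column("v1", 100), Column("v2", 200), Column("v1", 100)]
print(reconcile(responses))
```

Because the winner is decided by the timestamp alone, a client can issue a write without reading the current value first, which is the sense in which timestamps help avoid read-modify-write.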
  37. Consistency
     ● Cassandra allows a trade-off between partition-tolerance and consistency
     For strong consistency:
     ● R + W > N
     ● Eg. with 5 replicas (RF = N = 5): write to 3, read from 3
     [diagram: five replicas holding values 1, 1, 1, 1, 1]
  38. Consistency (same text as slide 37) [diagram: write; replicas now hold 2, 1, 2, 2, 1]
  39. Consistency (same text as slide 37) [diagram: read; replicas hold 2, 1, 2, 2, 1]
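
The condition R + W > N can be checked by brute force: if it holds, every possible read set shares at least one replica with every possible write set. The function names below are hypothetical and the enumeration is only practical for small N.

```python
from itertools import combinations


def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    """The rule from the slide: reads and writes must overlap."""
    return r + w > n


def always_overlaps(r: int, w: int, n: int) -> bool:
    """Enumerate every write set and read set and check that they intersect."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# The slide's example: N = 5, write to 3, read from 3.
print(is_strongly_consistent(3, 3, 5), always_overlaps(3, 3, 5))
# Reading from only 2 replicas can miss all 3 that took the write.
print(is_strongly_consistent(2, 3, 5), always_overlaps(2, 3, 5))
```

In the slide's diagram this is the case where three replicas hold the new value and two hold the old one: any read of three replicas must include at least one of the updated ones.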
  40. Consistency Level
     ● ANY (only for writes)
     ● ONE, TWO, THREE
     ● QUORUM (N/2 + 1, with N/2 rounded down)
     ● LOCAL_QUORUM
     ● ALL
     ● Relax strong consistency for partition tolerance
     ● To tolerate 1 node failure with strong consistency, use RF=3 with CL=QUORUM
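
The quorum arithmetic behind the RF=3 recommendation can be worked through in a short sketch. The helper names are hypothetical.

```python
def quorum(rf: int) -> int:
    """QUORUM = N/2 + 1 using integer division."""
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """Replicas that can be down while a QUORUM request still succeeds."""
    return rf - quorum(rf)


for rf in (1, 2, 3, 4, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates {tolerated_failures(rf)} down")
```

With RF=3 a quorum is 2, so one replica can fail and quorum reads and writes still overlap (2 + 2 > 3). With RF=2 a quorum is also 2, so no failure is tolerated at that level.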
  41. Increasing Consistency
     ● Read repair
     ● Hinted hand-off
     ● Anti-entropy repair
  42-45. Read Repair [diagrams]
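
The read repair slides are diagrams only, so the following is a sketch of the general idea rather than a transcription of them: a read compares the versions returned by the replicas, answers with the newest, and pushes that version back to any replica that was stale. The in-memory dictionaries and the `read_with_repair` helper are assumptions for illustration.

```python
from typing import Optional

Versioned = tuple[str, int]  # (value, timestamp)


def read_with_repair(replicas: list[dict[str, Versioned]], key: str) -> Optional[str]:
    """Return the newest value for key and update stale replicas in place."""
    responses = [replica[key] for replica in replicas if key in replica]
    if not responses:
        return None
    newest = max(responses, key=lambda vt: vt[1])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # repair the stale or missing copy
    return newest[0]


replicas = [{"k1": ("v2", 2)}, {"k1": ("v1", 1)}, {"k1": ("v2", 2)}]
print(read_with_repair(replicas, "k1"))
print(replicas)
```

After the read, all three copies hold the newest version, so later reads at a low consistency level are less likely to return old data.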
  46. Hinted Hand-off [diagram: eg. RF=2; replicas hold (k1, v1) and (k1, v1)]
  47-49. Hinted Hand-off [diagrams: eg. RF=2; replicas hold (k1, v1) and (k1, v1); Write (k1, v2)]
  50. Hinted Hand-off [diagram: eg. RF=2; replicas hold (k1, v1) and (k1, v1); Write (k1, v2); hint (k1, v2)]
  51. Hinted Hand-off [diagram: eg. RF=2; replicas hold (k1, v2) and (k1, v1); hint (k1, v2)]
  52-53. Hinted Hand-off [diagrams: eg. RF=2; replicas hold (k1, v2) and (k1, v2); hint (k1, v2)]
  54. Hinted Hand-off
     ● Hinted writes do not count towards the chosen consistency level
     ● … except with CL=ANY, which succeeds even if all replicas are down
     ● Don't rely on hints: hints cannot be read!
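
The hinted hand-off sequence (RF=2, one replica down during the write) can be sketched as below. The `Node` class, the `write` and `replay_hints` functions and their parameters are hypothetical, and the sketch ignores timestamps when a hint is replayed.

```python
class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: dict[str, str] = {}
        self.hints: list[tuple["Node", str, str]] = []  # (target, key, value)


def write(coordinator: Node, replicas: list[Node], key: str, value: str,
          required_acks: int, cl_any: bool = False) -> bool:
    """Write to live replicas; store a hint on the coordinator for each dead one."""
    acks = 0
    hinted = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = value
            acks += 1
        else:
            coordinator.hints.append((replica, key, value))
            hinted += 1
    if cl_any:
        return acks + hinted >= 1  # CL=ANY: a stored hint is enough
    return acks >= required_acks   # hints do not count towards other levels


def replay_hints(coordinator: Node) -> None:
    """Deliver stored hints to targets that have come back up."""
    remaining = []
    for target, key, value in coordinator.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    coordinator.hints = remaining


a, b, c = Node("a"), Node("b"), Node("c")
a.data["k1"] = b.data["k1"] = "v1"
b.up = False
print(write(c, [a, b], "k1", "v2", required_acks=1))  # succeeds on a, hint kept for b
b.up = True
replay_hints(c)
print(a.data, b.data, c.hints)
```

Note that reads never consult `hints`, which mirrors the warning on slide 54: a write acknowledged at CL=ANY while every replica is down is not readable until some replica recovers and the hint is replayed.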
  55. Anti-entropy repair
     ● Manual maintenance process
     ● Compares all data stored on a host with the replicas
     ● Differences are streamed to restore consistency
     ● Must be run at least every 10 days (the default gc_grace_seconds) to ensure tombstones are replicated
  56. Cassandra is:
     ● built for scalability
     ● built to tolerate failure
     In this talk:
     ● Cassandra distribution overview
     ● Partitioning and placement
     ● Replication
     ● Consistency
     fin.
