Rethinking Topology In Cassandra (ApacheCon NA)

  1. Rethinking Topology in Cassandra. ApacheCon North America, February 28, 2013. Eric Evans, eevans@acunu.com, @jericevans
  2. DHT 101
  3. DHT 101: partitioning (ring diagram with nodes Z and A)
  4. DHT 101: partitioning (ring diagram with nodes Z, A, Y, B, C)
  5. DHT 101: partitioning. Key = Aaa hashes to a position on the ring and is owned by the node whose range covers it (nodes Z, A, Y, B, C; see the consistent-hashing sketch after the slide list)
  6. DHT 101: replica placement. Key = Aaa is also copied to the next nodes clockwise from its owner (nodes Z, A, Y, B, C)
  7. DHT 101: consistency. Consistency, Availability, Partition tolerance (the CAP properties)
  8. DHT 101 scenario: consistency level = ONE (diagram: write W acknowledged by replica A; the other two replicas marked ?)
  9. DHT 101 scenario: consistency level = ALL (diagram: read R at replica A; the other two replicas marked ?)
  10. DHT 101 scenario: quorum write, R + W > N (diagram: write W acknowledged by replicas A and B; the third replica marked ?)
  11. DHT 101 scenario: quorum read, R + W > N (diagram: read R answered by replicas B and C; the first replica marked ?; see the quorum arithmetic example after the slide list)
  12. Awesome, yes?
  13. Well...
  14. Problem: Poor load distribution
  15-19. Distributing Load (ring diagrams: nodes Z, A, Y, B, C, M)
  20-22. Distributing Load (ring diagrams: a new node A1 is inserted between A and Y)
  23. Problem: Poor data distribution
  24-29. Distributing Data (ring diagrams: starting from nodes A, B, C, D; node E joins; later nodes F, G, H join as well)
  30. Virtual Nodes
  31. In a nutshell... (diagram: many virtual nodes mapped onto each of three hosts)
  32. Benefits • Operationally simpler (no token management) • Better distribution of load • Concurrent streaming involving all hosts • Smaller partitions mean greater reliability • Supports heterogeneous hardware
  33. Strategies • Automatic sharding • Fixed partition assignment • Random token assignment
  34. Strategy: Automatic Sharding • Partitions are split when data exceeds a threshold • Newly created partitions are relocated to a host with lower data load • Similar to the sharding performed by Bigtable, or Mongo auto-sharding
  35. Strategy: Fixed Partition Assignment • Namespace divided into Q evenly-sized partitions • Q/N partitions assigned per host (where N is the number of hosts) • Joining hosts "steal" partitions evenly from existing hosts (see the partition-stealing sketch after the slide list) • Used by Dynamo and Voldemort (described in the Dynamo paper as "strategy 3")
  36. Strategy: Random Token Assignment • Each host is assigned T random tokens • T random tokens are generated for joining hosts; new tokens divide existing ranges • Similar to libketama; identical to classic Cassandra when T=1 (see the random-token sketch after the slide list)
  37. Considerations: 1. Number of partitions 2. Partition size 3. How (1) changes with more nodes and data 4. How (2) changes with more nodes and data
  38. Evaluating (B ~ total data size, N ~ number of hosts; see the worked numbers after the slide list)

          Strategy        No. partitions    Partition size
          Random          O(N)              O(B/N)
          Fixed           O(1)              O(B)
          Auto-sharding   O(B)              O(1)
  39. Evaluating • Automatic sharding: partition size constant (great); number of partitions scales linearly with data size (bad) • Fixed partition assignment • Random token assignment
  40. Evaluating • Automatic sharding • Fixed partition assignment: number of partitions is constant (good); partition size scales linearly with data size (bad); higher operational complexity (bad) • Random token assignment
  41. Evaluating • Automatic sharding • Fixed partition assignment • Random token assignment: number of partitions scales linearly with number of hosts (ok); partition size increases with more data and decreases with more hosts (good)
  42. Evaluating • Automatic sharding • Fixed partition assignment • Random token assignment
  43. Cassandra
  44. Configuration: conf/cassandra.yaml

          # Comma separated list of tokens,
          # (new installs only).
          initial_token: <token>,<token>,<token>

      or

          # Number of tokens to generate.
          num_tokens: 256
  45. Configuration: nodetool info

          Token            : (invoke with -T/--tokens to see all 256 tokens)
          ID               : 64090651-6034-41d5-bfc6-ddd24957f164
          Gossip active    : true
          Thrift active    : true
          Load             : 92.69 KB
          Generation No    : 1351030018
          Uptime (seconds) : 45
          Heap Memory (MB) : 95.16 / 1956.00
          Data Center      : datacenter1
          Rack             : rack1
          Exceptions       : 0
          Key Cache        : size 240 (bytes), capacity 101711872 (bytes ...
          Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, ...
  46. Configuration: nodetool ring

          Datacenter: datacenter1
          ==========
          Replicas: 2
          Address    Rack   Status  State   Load      Owns     Token
                                                               9022770486425350384
          127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%   -9182469192098976078
          127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%   -9054823614314102214
          127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%   -8970752544645156769
          127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%   -8927190060345427739
          127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%   -8880475677109843259
          127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%   -8817876497520861779
          127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%   -8810512134942064901
          127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%   -8661764562509480261
          127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%   -8641550925069186492
          127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%   -8636224350654790732
          ...
  47-49. Configuration: nodetool status

          Datacenter: datacenter1
          =======================
          Status=Up/Down
          |/ State=Normal/Leaving/Joining/Moving
          --  Address   Load     Tokens  Owns   Host ID                               Rack
          UN  10.0.0.1  97.2 KB  256     66.0%  64090651-6034-41d5-bfc6-ddd24957f164  rack1
          UN  10.0.0.2  92.7 KB  256     66.2%  b3c3b03c-9202-4e7b-811a-9de89656ec4c  rack1
          UN  10.0.0.3  92.6 KB  256     67.7%  e4eef159-cb77-4627-84c4-14efbc868082  rack1
  50. Migration (ring diagram: nodes A, B, C)
  51. Migration: edit conf/cassandra.yaml and restart

          # Number of tokens to generate.
          num_tokens: 256

  52. Migration: convert to T contiguous tokens in existing ranges (ring diagram)
  53. Migration: shuffle (ring diagram)
  54. Shuffle • Range transfers are queued on each host • Hosts initiate transfer of ranges to self • Pay attention to the logs!
  55. Shuffle: bin/shuffle

          Usage: shuffle [options] <sub-command>

          Sub-commands:
              create      Initialize a new shuffle operation
              ls          List pending relocations
              clear       Clear pending relocations
              en[able]    Enable shuffling
              dis[able]   Disable shuffling

          Options:
              -dc, --only-dc         Apply only to named DC (create only)
              -tp, --thrift-port     Thrift port number (Default: 9160)
              -p,  --port            JMX port number (Default: 7199)
              -tf, --thrift-framed   Enable framed transport for Thrift (Default: false)
              -en, --and-enable      Immediately enable shuffling (create only)
              -H,  --help            Print help information
              -h,  --host            JMX hostname or IP address (Default: localhost)
              -th, --thrift-host     Thrift hostname or IP address (Default: JMX host)
  56. Performance
  57. removenode (bar chart comparing Cassandra 1.2 and Cassandra 1.1)
  58. bootstrap (bar chart comparing Cassandra 1.2 and Cassandra 1.1)
  59. The End. References:
      • DeCandia, Giuseppe, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. "Dynamo: Amazon's Highly Available Key-value Store." Web.
      • Low, Richard. "Improving Cassandra's uptime with virtual nodes." Web.
      • Overton, Sam. "Virtual Nodes Strategies." Web.
      • Overton, Sam. "Virtual Nodes: Performance Results." Web.
      • Jones, Richard. "libketama - a consistent hashing algo for memcache clients." Web.
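
Sketch for slides 3-6 (partitioning and replica placement). The ring diagrams compress the mechanics of consistent hashing: a key is hashed to a position on the ring, the first node clockwise owns it, and the next distinct nodes hold replicas. A minimal Python sketch of that idea, assuming an MD5-based token space and hypothetical node names; it illustrates the concept, not Cassandra's actual partitioner code.

    import bisect
    import hashlib

    class Ring:
        """Toy consistent-hash ring; each node owns one token (classic, pre-vnode layout)."""

        def __init__(self, nodes):
            # Hypothetical: derive each node's token by hashing its name.
            self.tokens = sorted((self._hash(n), n) for n in nodes)

        @staticmethod
        def _hash(key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def replicas(self, key, n=2):
            """Walk clockwise from the key's position, returning the next n distinct nodes."""
            start = bisect.bisect(self.tokens, (self._hash(key),))
            picked = []
            for i in range(len(self.tokens)):
                node = self.tokens[(start + i) % len(self.tokens)][1]
                if node not in picked:
                    picked.append(node)
                if len(picked) == n:
                    break
            return picked

    ring = Ring(["A", "B", "C", "Y", "Z"])
    print(ring.replicas("Aaa"))   # the owner of key 'Aaa' plus the next node clockwise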
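
Example for slides 10-11 (quorum reads and writes). The R + W > N rule guarantees that any read quorum intersects any write quorum in at least one replica that saw the write. A few lines of Python to make the arithmetic concrete:

    def quorum(n):
        """Smallest majority of n replicas."""
        return n // 2 + 1

    N = 3                   # replication factor
    W = quorum(N)           # 2 replicas must acknowledge the write
    R = quorum(N)           # 2 replicas must answer the read
    assert R + W > N        # 2 + 2 > 3, so the read set and write set overlap
    print("guaranteed overlap:", R + W - N, "replica(s)")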
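
Sketch for slide 35 (fixed partition assignment). The token space is pre-divided into Q equal partitions and each of the N hosts holds roughly Q/N of them; a joining host "steals" an even share from the others. A rough Python sketch of that bookkeeping under those assumptions (hypothetical helper names, not Dynamo's or Voldemort's code):

    def rebalance(assignment, new_host):
        """Give new_host roughly Q/N partitions by stealing from the most-loaded hosts.

        assignment: dict host -> list of partition ids; Q (total partitions) never changes.
        """
        assignment[new_host] = []
        q = sum(len(parts) for parts in assignment.values())   # Q is fixed for the cluster's life
        target = q // len(assignment)                           # ~Q/N for the joining host
        while len(assignment[new_host]) < target:
            donor = max(assignment, key=lambda h: len(assignment[h]))   # most-loaded existing host
            assignment[new_host].append(assignment[donor].pop())        # steal one partition
        return assignment

    # Q = 8 partitions over hosts a and b; host c joins and ends up with ~Q/N of them.
    layout = {"a": [0, 1, 2, 3], "b": [4, 5, 6, 7]}
    print(rebalance(layout, "c"))   # e.g. {'a': [0, 1, 2], 'b': [4, 5, 6], 'c': [3, 7]}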
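
Sketch for slide 36 (random token assignment). Every host picks T random tokens, and each new token splits whatever range it lands in, so a joining host takes a small slice from many existing hosts. An illustrative Python sketch, assuming a 2^127 token space and T = 256 to match the num_tokens example on slide 44; this is not Cassandra's source.

    import random
    from collections import Counter

    TOKEN_SPACE = 2 ** 127      # illustrative token space
    T = 256                     # tokens per host, like num_tokens: 256

    def join(ring, host, t=T):
        """Assign t random tokens to a joining host; each token splits an existing range."""
        for _ in range(t):
            ring[random.randrange(TOKEN_SPACE)] = host   # token -> owning host
        return ring

    ring = {}
    for h in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]:
        join(ring, h)

    # Each host ends up owning ~T small ranges scattered around the ring, so a new
    # host relieves every existing host a little instead of halving one neighbour.
    print(Counter(ring.values()))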
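
Worked numbers for slide 38. Plugging made-up figures into the table (B = 12 TB across N = 6 hosts, T = 256 tokens, Q = 1024 fixed partitions, a 64 MB auto-shard split threshold; all of these values are assumptions chosen only to make the asymptotics concrete):

    B_GB = 12 * 1024          # B: total data, in GB
    N = 6                     # N: number of hosts
    T = 256                   # tokens per host (random assignment)
    Q = 1024                  # fixed partition count (fixed assignment)
    SPLIT_GB = 0.0625         # 64 MB auto-sharding split threshold

    strategies = {
        #                 number of partitions    size of one partition (GB)
        "random":        (N * T,                  B_GB / (N * T)),
        "fixed":         (Q,                      B_GB / Q),
        "auto-sharding": (int(B_GB / SPLIT_GB),   SPLIT_GB),
    }
    for name, (count, size) in strategies.items():
        print(f"{name:14s} partitions={count:7d}  size={size:9.3f} GB")

With these numbers the trade-off in the table is visible directly: random keeps both partition count and size moderate, fixed keeps the count small but lets partitions grow with the data, and auto-sharding caps the size but lets the count grow.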
