Virtual Nodes: Rethinking Topology in Cassandra

A presentation on the recent work to transition Cassandra from its naive one-partition-per-node distribution to a proper virtual nodes implementation.

Transcript

  • 1. Rethinking Topology in Cassandra. Cassandra Summit, June 11, 2013. Eric Evans, eevans@opennms.com, @jericevans. (#Cassandra13)
  • 2. DHT 101
  • 3. DHT 101: partitioning [ring diagram: nodes A and Z]
  • 4. DHT 101: partitioning [ring diagram: nodes A, B, C, Y, Z]
  • 5. DHT 101: partitioning [ring diagram: Key = Aaa hashes to a position on the ring]
  • 6. DHT 101: replica placement [ring diagram: replicas for Key = Aaa]
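The ring diagrams above can be sketched in code: a key hashes to a token, the first node clockwise from that token owns the key, and replicas continue around the ring. A minimal illustration only; the hash function, token space, and node layout below are made up, and Cassandra itself uses a Murmur3-based partitioner:

```python
import bisect
import hashlib

RING = 2**64  # illustrative token space

def token(key: str) -> int:
    # A stable hash onto the ring (not Cassandra's actual partitioner).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big") % RING

def replicas(key: str, node_tokens: dict, n: int) -> list:
    """Walk clockwise from the key's token, collecting n distinct nodes."""
    ring = sorted((t, node) for node, t in node_tokens.items())
    toks = [t for t, _ in ring]
    i = bisect.bisect_right(toks, token(key))
    out = []
    while len(out) < n:
        node = ring[i % len(ring)][1]
        if node not in out:
            out.append(node)
        i += 1
    return out

# Five nodes spaced evenly around the ring, as in the diagram.
nodes = {name: i * RING // 5 for i, name in enumerate("ABCYZ")}
print(replicas("Aaa", nodes, 3))
```

The owner of the key is simply the first entry of the returned list; the next two nodes clockwise hold the additional replicas.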
  • 7. DHT 101: consistency (Consistency, Availability, Partition tolerance)
  • 8. DHT 101, scenario: consistency level = ONE [diagram: write W acknowledged by A; two replicas unknown]
  • 9. DHT 101, scenario: consistency level = ALL [diagram: read R against replicas A, ?, ?]
  • 10. DHT 101, scenario: quorum write [diagram: write W acknowledged by A and B; third replica unknown]
  • 11. DHT 101, scenario: quorum read [diagram: read R answered by A and B; third replica unknown]
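The quorum scenarios above hinge on overlap: a read is guaranteed to see the latest write whenever the read and write replica sets must intersect, i.e. R + W > N. A small brute-force check of that claim (illustrative Python, not Cassandra code):

```python
from itertools import combinations

# Brute-force check that any write quorum and any read quorum intersect
# when R + W > N (N = replication factor). Illustrative only.

def quorum(n: int) -> int:
    return n // 2 + 1

def always_overlap(n: int, w: int, r: int) -> bool:
    """True if every w-subset of n replicas meets every r-subset."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))

assert quorum(3) == 2
assert always_overlap(3, 2, 2)      # quorum write + quorum read: safe
assert not always_overlap(3, 1, 1)  # CL=ONE write + CL=ONE read: can miss
```

With RF=3, a quorum is 2, so the diagrams on slides 10 and 11 need only two acknowledging replicas each, yet every read quorum is certain to include at least one replica from every write quorum.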
  • 12. Awesome, yes?
  • 13. Well...
  • 14. Problem: poor request/stream distribution
  • 15.–22. Distribution [sequence of ring diagrams: nodes A, B, C, Y, Z and point M; a new node A1 joins]
  • 23. Problem: poor data distribution
  • 24.–29. Distribution [sequence of ring diagrams: nodes A, B, C, D; node E joins; nodes F, G, H join]
  • 30. Virtual Nodes
  • 31. In a nutshell...
  • 32. Benefits
    ● Operationally simpler (no token management)
    ● Better distribution of load
    ● Concurrent streaming (all hosts)
    ● Smaller partitions mean greater reliability
    ● Better supports heterogeneous hardware
  • 33. Strategies
    ● Automatic sharding
    ● Fixed partition assignment
    ● Random token assignment
  • 34. Strategy: automatic sharding
    ● Partitions are split when data exceeds a threshold
    ● Newly created partitions are relocated to a host with less data
    ● Similar to Bigtable, or Mongo auto-sharding
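A toy sketch of the split-on-threshold idea described above. The threshold, the ranges, and the assumption that data splits evenly at the midpoint are all hypothetical, chosen only to show the mechanism:

```python
THRESHOLD = 100  # bytes per partition; arbitrary for illustration

def split(partition):
    """partition = (lo, hi, size). Split at the midpoint while the
    partition is over the threshold, assuming data is evenly spread."""
    lo, hi, size = partition
    if size <= THRESHOLD or hi - lo < 2:
        return [partition]
    mid = (lo + hi) // 2
    return split((lo, mid, size // 2)) + split((mid, hi, size - size // 2))

parts = split((0, 1000, 250))
print(parts)  # 250 bytes splits twice: four partitions, each under the threshold
```

In a real system the resulting pieces would then be relocated to lightly loaded hosts, which is the step that makes the partition count grow with total data size.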
  • 35. Strategy: fixed partition assignment
    ● Namespace divided into Q evenly-sized partitions
    ● Q/N partitions assigned per host (where N is the number of hosts)
    ● Joining hosts “steal” partitions evenly from existing hosts
    ● Used by Dynamo/Voldemort (“strategy 3” in the Dynamo paper)
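The Q/N scheme above can be sketched as follows. Q and the host names are made up for illustration, and a real system would balance replicas as well, not just primary ownership:

```python
from collections import Counter

# Hypothetical sketch of fixed partition assignment: Q partitions,
# roughly Q/N per host; a joining host "steals" partitions evenly.

Q = 12  # fixed number of partitions for the cluster's lifetime

def counts(assignment):
    return Counter(assignment.values())

def join(assignment, new_host):
    """The joining host steals one partition at a time from whichever
    host currently owns the most, until it holds its fair share."""
    assignment = dict(assignment)
    fair_share = Q // (len(counts(assignment)) + 1)
    for _ in range(fair_share):
        busiest = counts(assignment).most_common(1)[0][0]
        stolen = next(p for p, h in assignment.items() if h == busiest)
        assignment[stolen] = new_host
    return assignment

before = {p: ["A", "B", "C"][p % 3] for p in range(Q)}  # 4 partitions each
after = join(before, "D")                               # 3 partitions each
print(sorted(counts(after).items()))
```

Because Q never changes, only partition ownership moves; as data grows, each of the Q partitions grows with it, which is the O(B) partition size noted in the evaluation later.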
  • 36. Strategy: random token assignment
    ● Each host assigned T random tokens
    ● T random tokens generated for joining hosts; new tokens divide existing ranges
    ● Similar to libketama; identical to classic Cassandra when T=1
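Random token assignment can be sketched like this: each host draws T random tokens, and owns the span from each of its tokens back to the previous token on the ring. The seed, host names, and use of Python's `random` are arbitrary choices for the example:

```python
import random

RING = 2**64
T = 256  # tokens per host, matching the num_tokens example later

def assign_tokens(hosts, rng):
    return {h: [rng.randrange(RING) for _ in range(T)] for h in hosts}

def ownership(tokens_by_host):
    """Fraction of the ring owned by each host."""
    ring = sorted((t, h) for h, ts in tokens_by_host.items() for t in ts)
    owned = dict.fromkeys(tokens_by_host, 0)
    prev = ring[-1][0] - RING  # the last token wraps to precede the first
    for t, h in ring:
        owned[h] += t - prev
        prev = t
    return {h: o / RING for h, o in owned.items()}

shares = ownership(assign_tokens(["A", "B", "C"], random.Random(42)))
print(shares)  # with T=256 tokens per host, each share lands close to 1/3
```

This is why many small tokens beat one token per host: with T=1 a single unlucky draw skews ownership badly, while with T=256 the per-host variance averages out.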
  • 37. Considerations
    1. Number of partitions
    2. Partition size
    3. How 1 changes with more nodes and data
    4. How 2 changes with more nodes and data
  • 38. Evaluating
    Strategy        No. partitions   Partition size
    Random          O(N)             O(B/N)
    Fixed           O(1)             O(B)
    Auto-sharding   O(B)             O(1)
  • 39. Evaluating: automatic sharding
    ● Partition size is constant (great)
    ● Number of partitions scales linearly with data size (bad)
  • 40. Evaluating: fixed partition assignment
    ● Number of partitions is constant (good)
    ● Partition size scales linearly with data size (bad)
    ● Greater operational complexity (bad)
  • 41. Evaluating: random token assignment
    ● Number of partitions scales linearly with number of hosts (OK)
    ● Partition size increases with more data; decreases with more hosts (good)
  • 42. Evaluating
    ● Automatic sharding
    ● Fixed partition assignment
    ● Random token assignment
  • 43. Cassandra
  • 44. Configuration: conf/cassandra.yaml
    # Comma-separated list of tokens (new installs only).
    initial_token: <token>,<token>,<token>
    or
    # Number of tokens to generate.
    num_tokens: 256
  • 45. Configuration: nodetool info
    Token            : (invoke with -T/--tokens to see all 256 tokens)
    ID               : 6a8dc22c-1f37-473f-8f7e-47742f4b83a5
    Gossip active    : true
    Thrift active    : true
    Load             : 42.92 MB
    Generation No    : 1370016307
    Uptime (seconds) : 221
    Heap Memory (MB) : 998.72 / 1886.00
    Data Center      : datacenter1
    Rack             : rack1
    Exceptions       : 0
    Key Cache        : size 1128 (bytes), capacity 98566144 (bytes), 42 hits, 54 re...
    Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN ...
  • 46. Configuration: nodetool ring
    Datacenter: datacenter1
    ==========
    Replicas: 0
    Address    Rack   Status  State   Load      Owns    Token
                                                        3074457345618258602
    127.0.0.1  rack1  Up      Normal  42.92 MB  33.33%  -9223372036854775808
    127.0.0.1  rack1  Up      Normal  42.92 MB  33.33%  3098476543630901247
    127.0.0.1  rack1  Up      Normal  42.92 MB  33.33%  3122495741643543892
    127.0.0.1  rack1  Up      Normal  42.92 MB  33.33%  3146514939656186537
    127.0.0.1  rack1  Up      Normal  42.92 MB  33.33%  3170534137668829183
    127.0.0.1  rack1  Up      Normal  42.92 MB  33.33%  3194553335681471828
    127.0.0.1  rack1  Up      Normal  42.92 MB  33.33%  321857253369411447
    127.0.0.1  rack1  Up      Normal  42.92 MB  33.33%  3242591731706757118
    ...
  • 47. Configuration: nodetool status
    Datacenter: datacenter1
    =======================
    Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
    --  Address    Load      Tokens  Owns   Host ID                               Rack
    UN  127.0.0.1  42.92 MB  256     33.3%  6a8dc22c-1f37-473f-8f7e-47742f4b83a5  rack1
    UN  127.0.0.2  60.17 MB  256     33.3%  26263a2b-768e-4a79-8d41-3624a14b13a8  rack1
    UN  127.0.0.3  56.85 MB  256     33.3%  5b3e208f-6d36-4c7b-b2bb-b7c476a1af66  rack1
  • 50. Migration [ring diagram: nodes A, B, D]
  • 51. Migration: edit conf/cassandra.yaml and restart
    # Number of tokens to generate.
    num_tokens: 256
  • 52. Migration: convert to T contiguous tokens in existing ranges [ring diagram: each node's range cut into many contiguous tokens]
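The conversion step above is just arithmetic: a node's single range is cut into T equal, contiguous slices, so it ends up with T tokens that still cover exactly the same span (hence the shuffle step that follows). The small numbers here are illustrative; real tokens live in a 2**64 space:

```python
def contiguous_tokens(lo: int, hi: int, t: int) -> list:
    """Return the upper bound of each of t equal slices of (lo, hi],
    i.e. the node's t new contiguous tokens."""
    step = (hi - lo) // t
    return [lo + i * step for i in range(1, t)] + [hi]

tokens = contiguous_tokens(0, 1024, 8)
print(tokens)  # [128, 256, 384, 512, 640, 768, 896, 1024]
```

Because the new tokens are contiguous, no data moves during this step; distribution only improves once the shuffle relocates the slices to random positions.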
  • 53. Migration: shuffle [ring diagram]
  • 54. Shuffle
    ● Range transfers are queued on each host
    ● Hosts initiate transfers to themselves
    ● Pay attention to the logs!
  • 55. Shuffle
    Usage: shuffle [options] <sub-command>

    Sub-commands:
      create      Initialize a new shuffle operation
      ls          List pending relocations
      clear       Clear pending relocations
      en[able]    Enable shuffling
      dis[able]   Disable shuffling

    Options:
      -dc, --only-dc        Apply only to named DC (create only)
      -u,  --username       JMX username
      -tp, --thrift-port    Thrift port number (Default: 9160)
      -p,  --port           JMX port number (Default: 7199)
      -tf, --thrift-framed  Enable framed transport for Thrift (Default: false)
      -en, --and-enable     Immediately enable shuffling (create only)
      -pw, --password       JMX password
      -H,  --help           Print help information
      -h,  --host           JMX hostname or IP address (Default: localhost)
      -th, --thrift-host    Thrift hostname or IP address (Default: JMX host)
  • 56. Performance
  • 57. removenode [bar chart: Cassandra 1.2 vs. Cassandra 1.1, y-axis 0–450]
  • 58. bootstrap [bar chart: Cassandra 1.2 vs. Cassandra 1.1, y-axis 0–600]
  • 59. The End
    ● DeCandia, Giuseppe; Hastorun, Deniz; Jampani, Madan; Kakulapati, Gunavardhan; Lakshman, Avinash; Pilchin, Alex; Sivasubramanian, Swaminathan; Vosshall, Peter; and Vogels, Werner. “Dynamo: Amazon’s Highly Available Key-value Store.” Web.
    ● Low, Richard. “Improving Cassandra’s uptime with virtual nodes.” Web.
    ● Overton, Sam. “Virtual Nodes Strategies.” Web.
    ● Overton, Sam. “Virtual Nodes: Performance Results.” Web.
    ● Jones, Richard. “libketama - a consistent hashing algo for memcache clients.” Web.
