Virtual nodes: Operational Aspirin

Cassandra SF meetup, October 2013

Published in: Technology, Education

  1. Virtual nodes: Operational Aspirin. Nicolas Favre-Felix, nicolas@acunu.com, @yowgi. Thursday, 24 October 13
  2. 1-minute recap on Cassandra distribution
     • Nodes are clustered in a “ring”
     • Each node has a token in [0, 2^127 - 1]:
       • 0
       • 42535295865117307932921825928971026432
       • 85070591730234615865843651857942052864
       • 127605887595351923798765477786913079296
     • Keys are hashed using MD5 (now Murmur3)
     • Each node owns a share of the key-space
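The four tokens listed on slide 2 are what an evenly spaced ring of four nodes looks like in the [0, 2^127 - 1] token space. A minimal sketch of that arithmetic and of key lookup, under some assumptions: `even_tokens`, `key_token` and `owner_index` are hypothetical helper names, and the hashing shown (an MD5 digest reduced modulo 2^127) is a simplification, not Cassandra's exact partitioner arithmetic.

```python
import bisect
import hashlib

RING_SIZE = 2**127  # token space from the slide: [0, 2**127 - 1]


def even_tokens(node_count):
    """Evenly spaced tokens, one per node (hypothetical helper)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def key_token(key):
    """Simplified key hashing: MD5 digest reduced into the token space.
    This is an assumption for illustration, not Cassandra's exact method."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner_index(tokens, key):
    """Index of the node owning `key`: the first token >= the key's token,
    wrapping to the first node when the key hashes past the last token.
    `tokens` must be sorted in ascending order."""
    position = bisect.bisect_left(tokens, key_token(key))
    return position % len(tokens)


tokens = even_tokens(4)
for token in tokens:
    print(token)
print("owner of 'user:42': node index", owner_index(tokens, "user:42"))
```

Each node here owns the range ending at its own token, which is why every topology change on such a ring means recomputing and moving tokens by hand.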
  3. Cassandra distribution limitations
     • Operational complexity
     • Rebuild cost for capacity-bound clusters
     • Impact on maintenance operations
     • Impact on topology changes
     • No native support for heterogeneous hardware
  4. Adding a node to an existing cluster
  5. Insert the new node...
  6. Recalculate ranges and rebalance by hand
  7. Usually just double the number of nodes
  8. Add/remove node
     • Need to rebalance ranges between nodes
     • Move more data than is optimal (optimal would be 1/N)
     • Impacts at most RF nodes (prefer to spread load across cluster)
     • Manual, tedious, error-prone, painful...
  9. Removing a node
     • nodetool removetoken (removenode from 1.2)
     • Dead host's token removed from ring
     • Next host in ring assumes range
     • Replica count restored
     • Involves at most 2 * RF - 1 nodes
     • If we can make it faster, we can store more data!
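The "at most 2 * RF - 1 nodes" figure on slide 9 can be checked with a small model. This is a sketch under stated assumptions, not Cassandra's implementation: one token per node, replicas placed on the RF consecutive nodes clockwise from a range's owner, and every surviving replica counted as a possible streaming source. The function names are hypothetical.

```python
def replicas(ring, start, rf, skip=None):
    """First `rf` nodes met walking clockwise from position `start`,
    ignoring `skip` (the removed node) if given."""
    found = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)]
        if node != skip and len(found) < rf:
            found.append(node)
    return found


def nodes_involved_in_removal(ring, dead, rf):
    """Surviving nodes that may send or receive data when `dead` is removed."""
    involved = set()
    for start in range(len(ring)):
        before = replicas(ring, start, rf)
        if dead not in before:
            continue  # this range never had a replica on the dead node
        after = replicas(ring, start, rf, skip=dead)
        sources = set(before) - {dead}      # survivors that still hold the data
        targets = set(after) - set(before)  # node gaining the missing replica
        involved |= sources | targets
    return involved


ring = [f"node{i}" for i in range(10)]  # nodes listed in token order
involved = nodes_involved_in_removal(ring, dead="node5", rf=3)
print(sorted(involved))
print(len(involved), "nodes involved; bound is", 2 * 3 - 1)
```

With ten nodes and RF=3 only the five neighbours of the dead node take part, while the rest of the cluster sits idle. A cluster smaller than 2 * RF nodes involves fewer, hence "at most".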
  10. Virtual Nodes!
  11. Virtual nodes in Cassandra 1.2+
      • More than one token per node
      • Random token assignment
      • Incremental cluster resize, one node at a time
      • Streaming to/from all nodes, not just neighbors
      • Only random partitioners are supported
      • Multi-DC support still works in the same way
  12. Different virtual nodes strategies
      Strategy                 Number of partitions  Partition size
      Random (Cassandra 1.2+)  O(N)                  O(B/N)
      Fixed (Riak)             O(1)                  O(B)
      Auto-sharding (MongoDB)  O(B)                  O(1)
      N = number of nodes, B = size of dataset
      (read more at http://bit.ly/virtualnodes)
  13. Virtual Nodes!
      • New in 1.2
      • Enabled by default in 2.0
      • → set num_tokens: 256 in cassandra.yaml
  14. Adding nodes to a cluster
      • From a single node...
      • Multiple tokens
      • Ranges of different sizes
  15. Adding nodes to a cluster
      • We add a second node
      • “Steals” ranges from the existing node
  16. Adding nodes to a cluster
      • And a third one...
      • “Steals” ranges from the existing nodes
      • Distribution is close to 1/3 each
  17. An ideal distribution
  18. Actually more like this
  19. Bootstrap
      • Assign a new host T random tokens (T=256)
      • New tokens split ranges from existing nodes
      • Each existing node contributes to the bootstrap
      • Optimal data movement
      • No need to rebalance, or double cluster size
      • No need to calculate tokens
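The bootstrap behaviour on slide 19 can be imitated with a short simulation. This is a sketch under assumptions rather than Cassandra's code: the ring is a plain dict from token to node name, the token space is taken to be 2^64 values, and `add_node` and `ownership` are hypothetical helpers. Token collisions are ignored because they are vanishingly unlikely at this ring size.

```python
import random
from fractions import Fraction

RING_SIZE = 2**64  # assumed token space size for this sketch
NUM_TOKENS = 256   # T on the slide (the num_tokens setting)


def add_node(ring, node, rng, num_tokens=NUM_TOKENS):
    """Give `node` num_tokens random tokens; `ring` maps token -> node."""
    for _ in range(num_tokens):
        ring[rng.randrange(RING_SIZE)] = node


def ownership(ring):
    """Exact fraction of the token space owned by each node.
    A token owns the range from the previous token (exclusive) to itself."""
    tokens = sorted(ring)
    shares = {}
    for i, token in enumerate(tokens):
        previous = tokens[i - 1]  # i == 0 wraps around to the last token
        size = (token - previous - 1) % RING_SIZE + 1
        node = ring[token]
        shares[node] = shares.get(node, 0) + Fraction(size, RING_SIZE)
    return shares


rng = random.Random(2013)  # fixed seed so the run is repeatable
ring = {}
for name in ("node1", "node2", "node3", "node4"):
    add_node(ring, name, rng)
before = ownership(ring)

add_node(ring, "node5", rng)  # bootstrap a fifth node
after = ownership(ring)

for name in sorted(before):
    given_up = before[name] - after[name]
    print(f"{name}: {float(before[name]):.3f} -> {float(after[name]):.3f} "
          f"(gave up {float(given_up):.3f})")
print(f"node5 now owns {float(after['node5']):.3f}")
```

Every existing node gives up a slice of its ranges to the newcomer, and the slices add up exactly to what the new node owns, so no other data has to move.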
  20. Removing nodes from a cluster: removing the node with blue ranges
  21. Removing nodes from a cluster
  22. Removing a node
      • nodetool removetoken is now removenode
      • nodetool removenode <host_id>
      • Dead host's tokens removed from ring
      • Ranges recalculated & data moved
      • All nodes participate!
  23. nodetool ring becomes useless...
      $ nodetool ring
      Datacenter: datacenter1
      ==========
      Address        Rack   Status  State   Load      Owns    Token
                                                              908086307850
      192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -92138833317
      192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -91449505236
      192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -89961709812
      192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -89833237466
      192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -89829145910
      192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -88349645925
      192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -87940053784
      192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -87315744643
      192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -86833403935
      192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -86172092729
      192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -85207698040
      192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -85134888150
      192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -85110178049
      192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -84965473082
  24. nodetool status
      $ nodetool status
      Datacenter: datacenter1
      =======================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address        Load      Tokens  Owns   Host ID
      UN  192.168.100.2  48.18 KB  256     19.5%  bb84e34e-b929-41c2-a5
      UN  192.168.100.3  48.21 KB  256     19.9%  50e8c1b1-a28f-431a-85
      UN  192.168.100.1  46.19 KB  256     21.3%  67bdd989-1b34-4bbd-a5
      UN  192.168.100.5  48.13 KB  256     18.9%  1a88e040-84fd-4461-80
      UN  192.168.100.4  48.15 KB  256     20.5%  3be40484-8225-467c-82
  25. Heterogeneity: fewer tokens, less data!
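The "fewer tokens, less data" rule on slide 25 follows from random token assignment: a node's expected share of the ring is proportional to its token count. A rough sketch of that effect, assuming a 2^64 token space and using a hypothetical `ownership_by_token_count` helper with made-up node names and token counts:

```python
import random

RING_SIZE = 2**64  # assumed token space size for this sketch


def ownership_by_token_count(token_counts, seed=0):
    """Approximate share of the ring per node, given each node's token count.
    Hypothetical helper: tokens are drawn uniformly at random."""
    rng = random.Random(seed)
    ring = {}
    for node, count in token_counts.items():
        for _ in range(count):
            ring[rng.randrange(RING_SIZE)] = node
    tokens = sorted(ring)
    shares = dict.fromkeys(token_counts, 0.0)
    for i, token in enumerate(tokens):
        # Each token owns the gap back to the previous token (wrapping at i == 0).
        size = (token - tokens[i - 1] - 1) % RING_SIZE + 1
        shares[ring[token]] += size / RING_SIZE
    return shares


# Illustrative counts: a large machine gets the full 256 tokens, weaker ones fewer.
counts = {"big": 256, "medium": 128, "small": 64}
shares = ownership_by_token_count(counts)
total_tokens = sum(counts.values())
for node, share in shares.items():
    print(f"{node}: owns {share:.3f}, expected about {counts[node] / total_tokens:.3f}")
```

The simulated shares land near, but not exactly on, the proportional values, which is the same approximate balance the deck describes for equal token counts.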
  26. Modeled with simulated token assignment [Figure: frequency vs. range size (arbitrary units), mean range size marked]
  27. How does this lead to balanced load?!
      • Each host has the same distribution of range sizes
      • So will assume roughly equal portions of the key-space
      • Modelled with simulated data inserted into ranges...
  28. [Figure: normalised data load per virtual node (location in key-space)]
  29. How balanced is balanced? [Figure: frequency vs. normalised load (arbitrary units)]
  30. A balanced cluster
      • Keys are randomly distributed
      • Each v-node partition takes on load proportional to its size
      • Load tends towards balance as the number of nodes increases
        • 2 nodes: 48.4% and 51.6%
        • 3 nodes: 34.3%, 33.0%, 32.7%
        • 4 nodes: 24.3%, 25.2%, 24.9%, 25.6%
  31. Performance testing
      • 17 nodes, EC2 m1.large
      • Inserted 460 million keys at RF=3
      • Timed removenode and then bootstrap
      • Results at http://bit.ly/vnodesperf
  32. Performance testing [Figure: time in seconds (axis 0 to 500) for removenode and bootstrap, Cassandra 1.2 vs. Cassandra 1.1]
  33. Migration path for a non-vnode cluster
      • Several techniques to migrate to vnodes
      • The “simplest” is to rebuild your cluster
        • With downtime: restore from backup
        • Without downtime: twice the hardware
      • “shuffle” is the proposed alternative
        • Migrate all nodes to vnodes, shuffle ranges
        • Very few success stories
  34. Conclusion
      • You should already be using virtual nodes!
      • Token management is a thing of the past
      • Embrace the randomness
      • Scale up and down without pain
  35. Thanks! @yowgi @acunu
  36. We’re hiring!
      • Acunu suggested and developed virtual nodes
      • Patches by @samoverton and @jericevans
      • Eric Evans also contributed much of CQL
      • We are looking for developers to work on Apache Cassandra, contributing features and enhancements to the open-source project
