Dynamo concepts in depth (@pavlobaron)

Slides of the talk I did at the NoSQL Roadshow 2012 in Basel

  1. Dynamo concepts in depth. Pavlo Baron, codecentric AG. Friday, August 31, 12
  2. Pavlo Baron, pavlo.baron@codecentric.de, @pavlobaron
  3. The shopping cart case
  4. The 2 AM alarm call case
  5. The Tower of Babel case
  6. The Neo vs. Smiths case
  7. The Pavlo case
  9. So Dynamo isn’t about speed. It’s about immediate, reliable writes. It’s about operation relaxation. It’s about distribution and fault tolerance. It’s about almost linear scalability.
  10. Time and timestamps
  11. Clocks. V(i), V(j): competing. Conflict resolution: 1: siblings, client; 2: merge, system; 3: voting, system
  12. Vector clocks. Node 1: (1,0,0), (2,2,0), (3,2,0), (4,3,3). Node 2: (1,1,0), (1,2,0), (1,3,3), (4,4,3). Node 3: (1,0,1), (1,2,2), (1,2,3), (4,3,4)
  13. Vector clocks. Node 1, Node 2, Node 3, Node 4. Clock values shown: (1,0,0,0), (1,1,0,0), (1,2,0,0), (1,3,0,3), (1,0,1,0), (1,0,2,0), (1,0,0,1), (1,2,0,2), (1,2,0,3)
  14. O(1) for data lookups / delta tracking #
  15. Merkle trees. N, M: nodes. HT(N), HT(M): hash trees. M needs update: obtain HT(N); calc delta(HT(M), HT(N)); pull keys(delta)
  16. Merkle trees (diagram). Node a.1: a; ab, ac; abc, abd, acb, acc. Node a.2 (leaves first): abe, abd, ada, adb; ab, ad; a
  17. Merkle trees (diagram). Node a.1: a; ab; abc, abd. Node a.2 (leaves first): abd, ada, adb; ab, ad; a
  18. “Equal” nodes based decentralized distribution
  19. Consensus, agreement, voting, quorum
  20. Consistent hashing - the ring. X bit integer space: 0 <= N <= 2 ^ X; or as an angle: 0 <= A <= 2 * Pi, x(N) = cos(A), y(N) = sin(A)
  21. Quorum. V: vnodes holding a key; W: write quorum; R: read quorum; DW: durable write quorum. W > 0.5 * V; R + W > V
  22. Insert key (sloppy quorum). Key = “foo”; # = N, W = 2; replicate; N; ok
  23. Add node (diagram labels): copy, leave, leave, copy, leave, copy
  24. Lookup key (sloppy quorum). Key = “foo”; # = N, R = 2; N; Value = “bar”
  25. Remove node (diagram labels): copy, leave
  26. Gossip – node down/up. Node 1, Node 2, Node 3, Node 4 (diagram labels: update, read, update; update; 4 down; 4 up; update; read)
  27. Eventual consistency
  28. BASE: Basically Available, Soft-state, Eventually consistent. Opposite to ACID
  29. Read your write consistency. FE1, FE2: write, read, write, read; v2, v2, v1, v1. Data store: v1, v2, v3
  30. Session consistency. FE, Session 1, Session 2: write, read, write, read; v2, v2, v1, v1. Data store: v1, v2, v3
  31. Monotonic read consistency. FE1, FE2: read, read, read, read, read; v2, v2, v3, v3, v4. Data store: v1, v2, v3, v4
  32. Monotonic write consistency. FE1, FE2: write, write, read, read; v1, v2, v3, v3. Data store: v1, v2, v3, v4
  33. Eventual consistency. FE1, FE2: read, read, read, read, write; v1, v2, v2, v3, v3. Data store: v1, v2, v3
  34. Hinted handoff. N: node, G: group including N. node(N) is unavailable: replicate to G or store data(N) locally; hint handoff for later. node(N) is alive: handoff data to node(N)
  35. Direct replica fails. Key = “foo”, # = N -> handoff hint = true; Key = “foo”; N; replicate
  36. Replica recovers: handoff
  37. All replicas fail. Key = “foo”, # = N -> handoff hint = true; N
  38. All replicas recover: handoff, replicate
  40. Latency is an adjustment screw
  41. Availability is an adjustment screw
  42. CAP – the variations. CA – irrelevant. CP – eventually unavailable, offering maximum consistency. AP – eventually inconsistent, offering maximum availability
  43. CAP – the tradeoff: A, C
  44. CP. Replica 1, Replica 2 (diagram labels: v1, read, v2, write, v2, v2, v1, read)
  45. CP (partition). Replica 1, Replica 2 (diagram labels: v1, read, v2, write, v2, v1, read)
  46. AP. Replica 1, Replica 2 (diagram labels: v1, write, v2, v2, read, replicate, v2, v1, read)
  47. AP (partition). Replica 1, Replica 2 (diagram labels: v1, write, v2, v2, read, hint handoff, v2, v1, read)
  48. Frequent structure changes
  49. Thank you
  50. Many graphics I’ve created myself. Some images originate from istockphoto.com, except a few taken from Wikipedia and product pages.
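
Slides 11 to 13 show competing vector clocks V(i), V(j) and a merge as one way to resolve them. The sketch below is one possible reading of that idea, not code from the talk: clocks are plain dicts keyed by node name, and the helper names (`increment`, `merge`, `compare`) are assumptions made for illustration.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node name -> counter (representation is an assumption)


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Component-wise maximum: the smallest clock that has seen both a and b."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if a is greater than or equal to b in every component."""
    return all(a.get(n, 0) >= count for n, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify two clocks as equal, ordered, or concurrent (competing)."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"


# Two writes that both started from (1,0,0), as in the first column of slide 12.
base = increment({}, "node1")        # (1,0,0)
v_i = increment(base, "node2")       # (1,1,0)
v_j = increment(base, "node3")       # (1,0,1)
print(compare(v_i, v_j))             # concurrent: neither has seen the other
print(compare(merge(v_i, v_j), v_i)) # after: the merged clock covers v_i
```

When `compare` reports "concurrent", the store has to pick one of the strategies from slide 11: hand both siblings to the client, merge, or vote.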
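
Slide 15 describes the Merkle tree exchange as three steps: obtain HT(N), calc delta(HT(M), HT(N)), pull keys(delta). A minimal sketch of those steps follows. The bucketing scheme, SHA-256, and all function names are assumptions chosen to keep the example small; a real store would build its tree over key ranges and also handle keys deleted on N.

```python
import hashlib
from typing import Dict, List


def _h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def bucket_of(key: str, buckets: int) -> int:
    """Assign a key to a leaf bucket (hypothetical scheme for this sketch)."""
    return int(_h(key.encode()), 16) % buckets


def build_tree(store: Dict[str, str], buckets: int = 8) -> List[List[str]]:
    """Return hash levels: levels[0] are the leaves, levels[-1] is [root].

    `buckets` must be a power of two so every level pairs up evenly.
    """
    grouped: List[List[str]] = [[] for _ in range(buckets)]
    for key in sorted(store):
        grouped[bucket_of(key, buckets)].append(f"{key}={store[key]}")
    level = [_h("\n".join(items).encode()) for items in grouped]
    levels = [level]
    while len(level) > 1:
        level = [_h((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def delta(mine: List[List[str]], theirs: List[List[str]]) -> List[int]:
    """Walk down from the root and return the leaf buckets whose hashes differ."""
    differing = [0] if mine[-1] != theirs[-1] else []
    for depth in range(len(mine) - 2, -1, -1):
        differing = [child
                     for parent in differing
                     for child in (2 * parent, 2 * parent + 1)
                     if mine[depth][child] != theirs[depth][child]]
    return differing


def keys_to_pull(source: Dict[str, str], changed: List[int],
                 buckets: int = 8) -> List[str]:
    """Keys on the up-to-date node that live in a differing bucket."""
    wanted = set(changed)
    return sorted(k for k in source if bucket_of(k, buckets) in wanted)


store_n = {"a": "1", "b": "2", "c": "3"}   # node N, up to date
store_m = {"a": "1", "b": "old"}           # node M, needs update
changed = delta(build_tree(store_m), build_tree(store_n))
for key in keys_to_pull(store_n, changed):
    store_m[key] = store_n[key]            # pull keys(delta)
print(store_m == store_n)
```

Only subtrees with differing hashes are visited, so two nearly identical nodes compare a handful of hashes instead of every key.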
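
Slides 20 and 22 to 25 place keys and nodes on a ring and replicate a key to N nodes. The following sketch shows one common way to do that with a sorted list of virtual nodes. The 32-bit ring size, SHA-1, the `Ring` class, and the vnode naming are all assumptions for illustration; `to_xy` mirrors the x(N) = cos(A), y(N) = sin(A) mapping on slide 20.

```python
import bisect
import hashlib
import math
from typing import List, Tuple

RING_BITS = 32  # the "X bit integer space" from slide 20; 32 is an assumption


def ring_position(name: str) -> int:
    """Hash a key or vnode name onto the integer ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest[:RING_BITS // 8], "big")


def to_xy(position: int) -> Tuple[float, float]:
    """Map a ring position to a point on the unit circle."""
    angle = 2 * math.pi * position / 2 ** RING_BITS
    return math.cos(angle), math.sin(angle)


class Ring:
    def __init__(self, nodes: List[str], vnodes_per_node: int = 16) -> None:
        self.vnodes_per_node = vnodes_per_node
        self._points: List[Tuple[int, str]] = []  # sorted (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes_per_node):
            bisect.insort(self._points, (ring_position(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if p[1] != node]

    def preference_list(self, key: str, n: int) -> List[str]:
        """First n distinct nodes found walking clockwise from the key."""
        start = bisect.bisect_right(self._points, (ring_position(key), ""))
        result: List[str] = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result


ring = Ring(["node1", "node2", "node3", "node4"])
print(ring.preference_list("foo", 3))  # the N = 3 replicas for key "foo"
```

Adding a node only takes over ring segments next to its own vnodes, so every key either keeps its owner or moves to the new node; nothing is reshuffled between the existing nodes.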
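
Slide 21 states the quorum conditions W > 0.5 * V and R + W > V. The short check below illustrates why the second condition matters: by brute force it confirms whether every possible read set shares at least one replica with every possible write set. The function names are made up for this sketch.

```python
from itertools import combinations


def satisfies_slide_rules(v: int, r: int, w: int) -> bool:
    """The two inequalities from slide 21."""
    return w > 0.5 * v and r + w > v


def quorums_always_overlap(v: int, r: int, w: int) -> bool:
    """Brute force: does every R-subset of V replicas intersect every W-subset?"""
    replicas = range(v)
    return all(set(read_set) & set(write_set)
               for read_set in combinations(replicas, r)
               for write_set in combinations(replicas, w))


for v, r, w in [(3, 2, 2), (3, 1, 2), (5, 3, 3), (5, 1, 3)]:
    print(v, r, w, satisfies_slide_rules(v, r, w), quorums_always_overlap(v, r, w))
```

With V = 3, R = 2, W = 2 any read touches at least one replica that took the latest write; with R = 1 a read can land entirely on the one replica the write skipped.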
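
Slides 34 to 38 outline hinted handoff: while node(N) is unavailable, another node stores the data with a hint, and hands it over once node(N) is alive again. A small in-memory simulation of that flow, under the assumption that a coordinator knows which nodes are alive; the `Node` class and both functions are hypothetical names, not an API of any Dynamo-style store.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    alive: bool = True
    data: Dict[str, str] = field(default_factory=dict)
    # intended owner name -> writes held on that owner's behalf
    hints: Dict[str, Dict[str, str]] = field(default_factory=dict)


def write(key: str, value: str, replicas: List[Node],
          fallbacks: List[Node]) -> int:
    """Write to every replica; park writes for dead replicas on a live fallback.

    Returns the number of acknowledgements (direct plus hinted).
    """
    acks = 0
    spare = [n for n in fallbacks if n.alive]
    for i, replica in enumerate(replicas):
        if replica.alive:
            replica.data[key] = value
            acks += 1
        elif spare:
            holder = spare[i % len(spare)]
            holder.hints.setdefault(replica.name, {})[key] = value
            acks += 1
    return acks


def recover(node: Node, cluster: List[Node]) -> None:
    """Mark the node alive and hand off every write held for it."""
    node.alive = True
    for other in cluster:
        node.data.update(other.hints.pop(node.name, {}))


n1, n2, n3, n4 = (Node(f"node{i}") for i in range(1, 5))
n3.alive = False                                   # direct replica fails
acks = write("foo", "bar", replicas=[n1, n2, n3], fallbacks=[n4])
print(acks, n4.hints)                              # node4 holds the hint for node3
recover(n3, [n1, n2, n3, n4])                      # replica recovers: handoff
print(n3.data)
```

Counting the hinted write as an acknowledgement is what makes the quorum on slides 22 and 24 "sloppy": the write succeeds even though one of the intended replicas never saw it.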
