Dynamo concepts in depth. Pavlo Baron, codecentric AG. Friday, August 31, 2012

Dynamo concepts in depth (@pavlobaron)

Slides of the talk I did at the NoSQL Roadshow 2012 in Basel

  1. Dynamo concepts in depth. Pavlo Baron, codecentric AG
  2. Pavlo Baron, pavlo.baron@codecentric.de, @pavlobaron
  3. The shopping cart case
  4. The 2 AM alarm call case
  5. The Tower of Babel case
  6. The Neo vs. Smiths case
  7. The Pavlo case
  8. (no text on this slide)
  9. So Dynamo isn’t about speed. It’s about immediate, reliable writes. It’s about operation relaxation. It’s about distribution and fault tolerance. It’s about almost linear scalability.
  10. Time and timestamps
  11. Clocks. V(i), V(j): competing. Conflict resolution: 1: siblings, client; 2: merge, system; 3: voting, system
  12. Vector clocks. Node 1: [1,0,0] [2,2,0] [3,2,0] [4,3,3]. Node 2: [1,1,0] [1,2,0] [1,3,3] [4,4,3]. Node 3: [1,0,1] [1,2,2] [1,2,3] [4,3,4]
  13. Vector clocks (diagram with Node 1, Node 2, Node 3, Node 4). Clock values shown: [1,0,0,0] [1,1,0,0] [1,2,0,0] [1,3,0,3] [1,0,1,0] [1,0,2,0] [1,0,0,1] [1,2,0,2] [1,2,0,3]
  14. O(1) for data lookups / delta tracking: #
  15. Merkle Trees. N, M: nodes. HT(N), HT(M): hash trees. M needs update: obtain HT(N); calc delta(HT(M), HT(N)); pull keys(delta)
  16. Merkle Trees (diagram of the hash trees on Node a.1 and Node a.2). Tree labels: a, ab, ac, abc, abd, acb, acc, abe, abd, ada, adb, ab, ad, a
  17. Merkle Trees (diagram of the hash trees on Node a.1 and Node a.2). Tree labels: a, ab, abc, abd, abd, ada, adb, ab, ad, a
  18. Decentralized distribution based on “equal” nodes
  19. Consensus, agreement, voting, quorum
  20. Consistent hashing - the ring. X bit integer space: 0 <= N <= 2 ^ X. Or: 2 x Pi: 0 <= A <= 2 x Pi, x(N) = cos(A), y(N) = sin(A)
  21. Quorum. V: vnodes holding a key. W: write quorum. R: read quorum. DW: durable write quorum. W > 0.5 * V, R + W > V
  22. Insert key (sloppy quorum). Key = “foo”, # = N, W = 2. Diagram labels: replicate, N, ok
  23. Add node. Diagram labels: copy, leave
  24. Lookup key (sloppy quorum). Key = “foo”, # = N, R = 2, Value = “bar”
  25. Remove node. Diagram labels: copy, leave
  26. Gossip – node down/up. Diagram labels: Node 1, Node 2, Node 3, Node 4; update, read, update, update; 4 down, 4 up; update, read
  27. Eventual consistency
  28. BASE: Basically Available, Soft-state, Eventually consistent. Opposite to ACID
  29. Read your write consistency. Diagram labels: FE1, FE2; write, read, write, read; v2, v2, v1, v1, v1, v2, v3; Data store
  30. Session consistency. Diagram labels: FE; Session 1, Session 2; write, read, write, read; v2, v2, v1, v1, v1, v2, v3; Data store
  31. Monotonic read consistency. Diagram labels: FE1, FE2; read, read, read, read, read; v2, v2, v3, v3, v4, v1, v2, v3, v4; Data store
  32. Monotonic write consistency. Diagram labels: FE1, FE2; write, write, read, read; v1, v2, v3, v3, v1, v2, v3, v4; Data store
  33. Eventual consistency. Diagram labels: FE1, FE2; read, read, read, read, write; v1, v2, v2, v3, v3, v1, v2, v3; Data store
  34. Hinted handoff. N: node, G: group including N. node(N) is unavailable: replicate to G or store data(N) locally; hint handoff for later. node(N) is alive: handoff data to node(N)
  35. Direct replica fails. Key = “foo”, # = N -> handoff hint = true. Diagram labels: Key = “foo”, N, replicate
  36. Replica recovers. Diagram label: handoff
  37. All replicas fail. Key = “foo”, # = N -> handoff hint = true. Diagram label: N
  38. All replicas recover. Diagram labels: handoff, replicate
  39. (no text on this slide)
  40. Latency is an adjustment screw
  41. Availability is an adjustment screw
  42. CAP – the variations. CA – irrelevant. CP – eventually unavailable, offering maximum consistency. AP – eventually inconsistent, offering maximum availability
  43. CAP – the tradeoff. Diagram labels: A, C
  44. CP. Diagram labels: Replica 1, Replica 2; v1, read, v2, write, v2, v2, v1, read
  45. CP (partition). Diagram labels: Replica 1, Replica 2; v1, read, v2, write, v2, v1, read
  46. AP. Diagram labels: Replica 1, Replica 2; v1, write, v2, v2, read, replicate, v2, v1, read
  47. AP (partition). Diagram labels: Replica 1, Replica 2; v1, write, v2, v2, read, hint handoff, v2, v1, read
  48. Frequent structure changes
  49. Thank you
  50. Many graphics I’ve created myself. Some images originate from istockphoto.com, except a few taken from Wikipedia and product pages
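
Slides 11 to 13 treat two clocks V(i) and V(j) as "competing" when neither one descends from the other. The following is a minimal Python sketch of that comparison, not code from the talk. The dictionary representation, the node names `n1` to `n3`, and the function names are all assumptions made for illustration.

```python
from typing import Dict

# Assumed representation: node name -> counter; a missing node counts as 0.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Component-wise maximum of two clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every event that `b` has seen."""
    return all(a.get(n, 0) >= count for n, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify `a` relative to `b`: equal, after, before or concurrent."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "after"
    if b_ge:
        return "before"
    return "concurrent"  # competing versions: keep siblings, merge, or vote


# Values taken from slide 12: [1,1,0] on node 2 and [1,0,1] on node 3.
v_i = {"n1": 1, "n2": 1, "n3": 0}
v_j = {"n1": 1, "n2": 0, "n3": 1}
resolved = increment(merge(v_i, v_j), "n1")
```

Here `compare(v_i, v_j)` reports the two clocks as concurrent, which is the case where one of the three resolution strategies on slide 11 has to be picked. After a merge and a new event on `n1`, the resolved clock descends from both inputs.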
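
The three steps on slide 15 (obtain HT(N), calculate the delta, pull the keys in the delta) can be sketched roughly as follows. This is an illustration under stated assumptions, not the layout any particular Dynamo-style store uses: keys are assigned to a fixed, power-of-two number of buckets by hash, each bucket is one leaf, and the helper names `build_tree`, `delta` and `keys_to_pull` are hypothetical.

```python
import hashlib
from typing import Dict, List, Set


def _h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def bucket_of(key: str, buckets: int) -> int:
    """Assumed placement rule: hash the key into one of `buckets` leaves."""
    return int(_h(key.encode()), 16) % buckets


def build_tree(store: Dict[str, str], buckets: int = 8) -> List[List[str]]:
    """Return hash levels, leaves first and root last (buckets: power of two)."""
    groups: List[List[str]] = [[] for _ in range(buckets)]
    for key in sorted(store):
        groups[bucket_of(key, buckets)].append(f"{key}={store[key]}")
    level = [_h("\n".join(group).encode()) for group in groups]
    levels = [level]
    while len(level) > 1:
        level = [_h((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def delta(ht_m: List[List[str]], ht_n: List[List[str]]) -> Set[int]:
    """Descend from the root and collect the leaf buckets whose hashes differ."""
    differing: Set[int] = set()

    def walk(depth: int, index: int) -> None:
        if ht_m[depth][index] == ht_n[depth][index]:
            return  # identical subtree: nothing below needs to be compared
        if depth == 0:
            differing.add(index)
            return
        walk(depth - 1, 2 * index)
        walk(depth - 1, 2 * index + 1)

    walk(len(ht_m) - 1, 0)
    return differing


def keys_to_pull(store_n: Dict[str, str], changed: Set[int],
                 buckets: int = 8) -> Set[str]:
    """Keys on node N that live in a bucket flagged by delta()."""
    return {k for k in store_n if bucket_of(k, buckets) in changed}


node_n = {"a": "1", "b": "2", "c": "3"}
node_m = {"a": "1", "b": "old"}
changed = delta(build_tree(node_m), build_tree(node_n))
pulled = keys_to_pull(node_n, changed)
```

Equal subtree hashes prune the comparison early, so only differing branches are walked. This sketch only pulls keys that N has; keys that exist on M but were deleted on N would need separate handling.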
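
Slide 20 describes the ring as an X bit integer space that can also be read as an angle between 0 and 2 x Pi. A small sketch of that mapping, together with a clockwise lookup of the nodes responsible for a key, might look like this. The choice of X = 32, SHA-256 as the hash, the node names, and the `Ring` class are assumptions for illustration only.

```python
import bisect
import hashlib
import math
from typing import List, Tuple

X = 32                      # assumed width of the integer space
RING_SIZE = 2 ** X


def ring_position(name: str) -> int:
    """Hash a key or node name to an integer N on the ring."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[: X // 8], "big")


def point(position: int) -> Tuple[float, float]:
    """Map N to the angle A = 2 * Pi * N / 2^X and return (cos(A), sin(A))."""
    a = 2 * math.pi * position / RING_SIZE
    return math.cos(a), math.sin(a)


class Ring:
    def __init__(self, nodes: List[str]) -> None:
        self._entries = sorted((ring_position(n), n) for n in nodes)

    def preference_list(self, key: str, n_replicas: int) -> List[str]:
        """First `n_replicas` nodes found walking clockwise from the key."""
        positions = [p for p, _ in self._entries]
        start = bisect.bisect_right(positions, ring_position(key))
        count = min(n_replicas, len(self._entries))
        return [self._entries[(start + i) % len(self._entries)][1]
                for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.preference_list("foo", 3)
```

With this layout, adding a node only moves the keys that fall between the new node and its predecessor on the ring; every other key keeps its previous owner.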
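
The two inequalities on slide 21, W > 0.5 * V and R + W > V, can be checked mechanically. The sketch below is illustrative only; the function names are made up, and the brute-force overlap check exists just to show why R + W > V matters: it guarantees that every read set shares at least one vnode with every write set.

```python
from itertools import combinations
from typing import List, Tuple


def is_strict_quorum(v: int, r: int, w: int) -> bool:
    """Check the slide's conditions: W > 0.5 * V and R + W > V."""
    return 0 < r <= v and 0 < w <= v and w > 0.5 * v and r + w > v


def valid_settings(v: int) -> List[Tuple[int, int]]:
    """All (R, W) pairs that satisfy both conditions for V vnodes."""
    return [(r, w)
            for r in range(1, v + 1)
            for w in range(1, v + 1)
            if is_strict_quorum(v, r, w)]


def always_overlap(v: int, r: int, w: int) -> bool:
    """Brute force: does every R-sized read set meet every W-sized write set?"""
    vnodes = range(v)
    return all(set(reads) & set(writes)
               for reads in combinations(vnodes, r)
               for writes in combinations(vnodes, w))
```

For V = 3, the combination R = 2, W = 2 satisfies both conditions, while R = 1, W = 2 does not: a read can land on the single vnode that missed the write.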
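
The hinted handoff flow on slides 34 to 38 (store the data elsewhere with a hint while node(N) is unavailable, hand it off once node(N) is alive again) can be sketched with two in-memory nodes. This is a simplified illustration, not how any specific store implements it: the `Node` class, the single fallback node, and keeping hinted data only in a hint list are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    alive: bool = True
    data: Dict[str, str] = field(default_factory=dict)
    # Each hint is (intended node name, key, value), held for a later handoff.
    hints: List[Tuple[str, str, str]] = field(default_factory=list)


def write(key: str, value: str, target: Node, fallback: Node) -> str:
    """Write to `target`, or park the write on `fallback` with a hint."""
    if target.alive:
        target.data[key] = value
        return target.name
    fallback.hints.append((target.name, key, value))
    return fallback.name


def handoff(holder: Node, recovered: Node) -> int:
    """Deliver the hints that `holder` keeps for `recovered`; return the count."""
    if not recovered.alive:
        return 0
    delivered = 0
    remaining: List[Tuple[str, str, str]] = []
    for intended, key, value in holder.hints:
        if intended == recovered.name:
            recovered.data[key] = value
            delivered += 1
        else:
            remaining.append((intended, key, value))
    holder.hints = remaining
    return delivered


n = Node("N", alive=False)
g = Node("G")
accepted_by = write("foo", "bar", n, g)   # N is down, so G takes the write
n.alive = True                            # N recovers
delivered = handoff(g, n)
```

The write succeeds even though N is down, which is the availability side of the tradeoff on slide 47; N only becomes consistent after the handoff runs.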
