Dynamo concepts in depth.

Pavlo Baron, codecentric AG
Friday, August 31, 12
Pavlo Baron
pavlo.baron@codecentric.de
@pavlobaron

The shopping cart case




The 2 AM alarm call case




The Tower of Babel case




The Neo vs. Smiths case




The Pavlo case




So Dynamo isn’t about speed.

It’s about immediate, reliable writes.

It’s about operation relaxation.

It’s about distribution and fault tolerance.

It’s about almost linear scalability.

Time and timestamps




Clocks


V(i), V(j): competing

Conflict resolution:
  1: siblings, client
  2: merge, system
  3: voting, system
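
The three strategies listed above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: a version is modelled as a frozenset of cart items (borrowing the shopping cart case), and the names `keep_siblings`, `merge_versions` and `vote` are made up for the example.

```python
from collections import Counter
from typing import FrozenSet, List

Cart = FrozenSet[str]  # assumption: a version is an immutable set of cart items


def keep_siblings(versions: List[Cart]) -> List[Cart]:
    """Strategy 1: keep every distinct competing version and let the client decide."""
    siblings: List[Cart] = []
    for version in versions:
        if version not in siblings:
            siblings.append(version)
    return siblings


def merge_versions(versions: List[Cart]) -> Cart:
    """Strategy 2: the system merges; here the merge is a plain set union."""
    merged: Cart = frozenset()
    for version in versions:
        merged = merged | version
    return merged


def vote(versions: List[Cart]) -> Cart:
    """Strategy 3: the system picks the version that most replicas returned."""
    return Counter(versions).most_common(1)[0][0]


# Three replicas answer a read with competing versions of the same cart.
replies = [frozenset({"book"}), frozenset({"book", "pen"}), frozenset({"book"})]

print(len(keep_siblings(replies)))      # 2
print(sorted(merge_versions(replies)))  # ['book', 'pen']
print(sorted(vote(replies)))            # ['book']
```

A union merge never loses an added item, but it cannot express a removal either, which is one reason a system may prefer to hand siblings back to the client.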



Vector clocks
Node 1:   1,0,0    2,2,0             3,2,0    4,3,3
Node 2:   1,1,0    1,2,0    1,3,3             4,4,3
Node 3:   1,0,1    1,2,2    1,2,3             4,3,4


Vector clocks
Node 1:   1,0,0,0
Node 2:   1,1,0,0    1,2,0,0    1,3,0,3
Node 3:   1,0,1,0                          1,0,2,0
Node 4:   1,0,0,1    1,2,0,2    1,2,0,3
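
A minimal Python sketch of the vector clock operations behind these diagrams follows. The dictionary representation and the function names (`increment`, `merge`, `compare`) are assumptions made for illustration, as are the node names `n1` to `n4`. The example takes 1,2,0,0 and 1,0,0,1 from the four-node slide and reads the 1,2,0,2 entry as Node 4 receiving Node 2's clock and then counting its own event.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node name -> counter; missing entries count as 0


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with the node's own counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Component-wise maximum: a clock that has seen everything a and b have seen."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if a is at least as new as b in every component."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: equal, after, before, or concurrent (competing)."""
    a_newer, b_newer = descends(a, b), descends(b, a)
    if a_newer and b_newer:
        return "equal"
    if a_newer:
        return "after"
    if b_newer:
        return "before"
    return "concurrent"


node2 = {"n1": 1, "n2": 2, "n3": 0, "n4": 0}  # 1,2,0,0
node4 = {"n1": 1, "n2": 0, "n3": 0, "n4": 1}  # 1,0,0,1

print(compare(node2, node4))  # concurrent

# Node 4 receives Node 2's clock, merges it, then counts the event itself.
received = increment(merge(node2, node4), "n4")
print([received[n] for n in ("n1", "n2", "n3", "n4")])  # [1, 2, 0, 2]
print(compare(received, node2))  # after
```

Two clocks that compare as "concurrent" are exactly the competing versions that need conflict resolution.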

O(1) for data lookups / delta tracking




                                     #

Merkle Trees


N, M: nodes
HT(N), HT(M): hash trees

M needs update:
  obtain HT(N)
  calc delta(HT(M), HT(N))
  pull keys(delta)
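
The three steps above can be sketched as follows. This is a simplified illustration, not how any particular store implements it: the tree is keyed by key prefixes (as in the a / ab / abc diagrams), and the sketch assumes that no key is a prefix of another key, that N's store is not empty, and that M only needs to pull (deletions are ignored). `build_tree` and `delta` are hypothetical names.

```python
import hashlib
from typing import Dict, Set


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_tree(store: Dict[str, str]) -> Dict[str, str]:
    """HT(store): map every key prefix to a hash covering the subtree below it."""
    tree: Dict[str, str] = {}

    def subtree_hash(prefix: str) -> str:
        children = sorted({key[: len(prefix) + 1] for key in store
                           if key.startswith(prefix) and len(key) > len(prefix)})
        # An inner node hashes the labels and hashes of its children.
        parts = [child + ":" + subtree_hash(child) for child in children]
        if prefix in store:  # a leaf hashes the stored value
            parts.append("=" + _digest(store[prefix]))
        tree[prefix] = _digest("|".join(parts))
        return tree[prefix]

    subtree_hash("")
    return tree


def delta(local: Dict[str, str], remote: Dict[str, str]) -> Set[str]:
    """Keys whose hash in the remote tree differs from, or is missing in, the local tree."""
    stale: Set[str] = set()

    def walk(prefix: str) -> None:
        if local.get(prefix) == remote[prefix]:
            return  # identical subtree: nothing below it needs a look
        children = [p for p in remote
                    if len(p) == len(prefix) + 1 and p.startswith(prefix)]
        if not children:
            stale.add(prefix)  # a differing leaf: this key must be pulled
        for child in children:
            walk(child)

    walk("")
    return stale


node_n = {"abc": "1", "abd": "2", "acb": "3", "acc": "4"}
node_m = {"abc": "1", "abd": "stale", "acb": "3"}

to_pull = delta(build_tree(node_m), build_tree(node_n))  # calc delta(HT(M), HT(N))
print(sorted(to_pull))  # ['abd', 'acc']

node_m.update({key: node_n[key] for key in to_pull})     # pull keys(delta)
print(build_tree(node_m) == build_tree(node_n))          # True
```

Only the differing branches are walked, so two nearly identical nodes exchange a handful of hashes instead of their whole key space.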



Merkle Trees

Node a.1
                  a
            ab          ac
        abc   abd    acb   acc

        abe   abd    ada   adb
            ab          ad
                  a
Node a.2

Merkle Trees

Node a.1
                  a
            ab
        abc   abd

              abd    ada   adb
            ab          ad
                  a
Node a.2

“Equal” nodes based decentralized distribution




Consensus, agreement, voting, quorum




Consistent hashing - the ring


X bit integer space
  0 <= N < 2 ^ X

or: 2 x Pi
  0 <= A < 2 x Pi
  x(N) = cos(A)
  y(N) = sin(A)
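
The ring can be sketched in Python as below. The choices here are assumptions for illustration only: X = 32, SHA-256 as the hash, one position per node (real rings usually place many vnodes per physical node), and the names `ring_position`, `as_point` and `Ring`. The angle is taken as A = 2 x Pi x N / 2 ^ X, which is one natural way to connect the two views on the slide.

```python
import bisect
import hashlib
import math
from typing import List, Tuple

X = 32               # assumption: a 32 bit ring
RING_SIZE = 2 ** X   # positions N satisfy 0 <= N < 2 ^ X


def ring_position(name: str) -> int:
    """Hash a key or node name to a position N on the ring."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def as_point(position: int) -> Tuple[float, float]:
    """The same position as a point on the unit circle: (cos(A), sin(A))."""
    angle = 2 * math.pi * position / RING_SIZE
    return math.cos(angle), math.sin(angle)


class Ring:
    def __init__(self, nodes: List[str]) -> None:
        self._points = sorted((ring_position(node), node) for node in nodes)

    def owner(self, key: str) -> str:
        """First node at or after the key's position, wrapping around at the end."""
        index = bisect.bisect_left(self._points, (ring_position(key), ""))
        return self._points[index % len(self._points)][1]


small = Ring(["node-1", "node-2", "node-3"])
large = Ring(["node-1", "node-2", "node-3", "node-4"])

keys = [f"key-{i}" for i in range(1000)]
moved = [key for key in keys if small.owner(key) != large.owner(key)]

print(len(moved), "of", len(keys), "keys change owner")
# Every key that moves, moves to the new node; all others stay where they were.
print(all(large.owner(key) == "node-4" for key in moved))  # True
```

That last property is what makes the hashing "consistent": adding a node only takes over one arc of the ring instead of reshuffling every key.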



Quorum


V: vnodes holding a key
W: write quorum
R: read quorum
DW: durable write quorum

  W > 0.5 * V
  R + W > V
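
A small Python check shows why the second inequality matters: when R + W > V, every possible read set shares at least one vnode with every possible write set. The function names are hypothetical, and the brute-force enumeration is only meant for small V.

```python
from itertools import combinations


def satisfies_quorum_rules(v: int, r: int, w: int) -> bool:
    """The two conditions from the slide: W > 0.5 * V and R + W > V."""
    return w > 0.5 * v and r + w > v


def reads_always_overlap_writes(v: int, r: int, w: int) -> bool:
    """Brute force: does every set of R vnodes intersect every set of W vnodes?"""
    vnodes = range(v)
    return all(set(write_set) & set(read_set)
               for write_set in combinations(vnodes, w)
               for read_set in combinations(vnodes, r))


print(satisfies_quorum_rules(3, 2, 2))       # True
print(reads_always_overlap_writes(3, 2, 2))  # True
print(satisfies_quorum_rules(3, 1, 1))       # False
print(reads_always_overlap_writes(3, 1, 1))  # False
```

With V = 3, R = 1 and W = 1 a read can land on a vnode the last write never touched, which is the price paid for lower latency.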



Insert key
(sloppy quorum)

Key = “foo”
# = N, W = 2

[Diagram: ring with node N; labels "replicate" and "ok"]




Add node




[Diagram: ring with an added node; segments labeled "copy" (3x) and "leave" (3x)]




Lookup key
(sloppy quorum)

Key = “foo”
# = N, R = 2
Value = “bar”

[Diagram: ring with node N]



Remove node

[Diagram: ring with a removed node; segments labeled "copy" and "leave"]




Gossip – node down/up
[Diagram: timelines for Node 1 to Node 4, events from left to right]
  Node 1 / Node 2:   update | update, 4 down | update | read, 4 up
  Node 3 / Node 4:   update | read
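
How the news that node 4 is down, and later up again, might spread can be sketched as a merge of membership views. Everything in this sketch is an assumption for illustration: the slides do not say how entries are ordered, so a simple version counter decides which news is newer, and `merge_views` and the node names are hypothetical.

```python
from typing import Dict, Tuple

# Assumption: each entry carries a version number so that newer news wins.
View = Dict[str, Tuple[int, str]]  # node -> (version, "up" or "down")


def merge_views(mine: View, theirs: View) -> View:
    """Keep, for every node, the entry with the higher version."""
    merged = dict(mine)
    for node, (version, status) in theirs.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, status)
    return merged


views = {name: {"node-4": (1, "up")} for name in ("node-1", "node-2", "node-3")}

# node-1 notices that node-4 stopped answering.
views["node-1"]["node-4"] = (2, "down")

# The news travels peer to peer; no node has to talk to everyone.
views["node-2"] = merge_views(views["node-2"], views["node-1"])
views["node-3"] = merge_views(views["node-3"], views["node-2"])
print(views["node-3"]["node-4"])  # (2, 'down')

# node-4 comes back and announces itself to node-3 with a higher version.
views["node-3"] = merge_views(views["node-3"], {"node-4": (3, "up")})
views["node-1"] = merge_views(views["node-1"], views["node-3"])
print(views["node-1"]["node-4"])  # (3, 'up')
```

Until the next exchange, node-2 still believes node-4 is down, which is the kind of temporary disagreement gossip accepts.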


Eventual consistency




BASE


 Basically Available,
 Soft-state,
 Eventually consistent


 The opposite of ACID




Read your write consistency


     FE1                          FE2
          write         read        write   read
           v2            v2          v1      v1




            v1           v2       v3

                           Data store

Session consistency

                               FE
       Session 1                    Session 2
           write        read         write      read
            v2           v2           v1         v1




            v1           v2         v3

                          Data store

Monotonic read consistency


     FE1                                  FE2
           read         read   read         read   read
            v2           v2     v3           v3     v4




            v1           v2       v3     v4

                                      Data store

Monotonic write consistency


     FE1                           FE2
          write         write        read   read
           v1            v2           v3     v3




            v1          v2           v3     v4

                                      Data store

Eventual consistency


     FE1                                         FE2
         read           read   read     read      write
          v1             v2     v2       v3        v3




            v1          v2      v3

                           Data store

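
One common way for a client to get read-your-writes and monotonic reads on top of an eventually consistent store is to remember the newest version it has seen and to reject anything older. The sketch below is a hedged illustration of that idea, not something taken from the slides: the `Session` class, the integer versions and the list of replica answers are all assumptions.

```python
from typing import List, Optional, Tuple

Versioned = Tuple[int, str]  # (version, value); integer versions are an assumption


class Session:
    """Client-side session that remembers the newest version it wrote or read."""

    def __init__(self) -> None:
        self.floor = 0

    def wrote(self, version: int) -> None:
        self.floor = max(self.floor, version)

    def read(self, answers: List[Versioned]) -> Optional[Versioned]:
        """Return the first replica answer that is not older than the floor."""
        for version, value in answers:
            if version >= self.floor:
                self.floor = version
                return version, value
        return None  # no replica is fresh enough yet; the caller may retry


writer = Session()
writer.wrote(2)
print(writer.read([(1, "v1"), (2, "v2")]))  # (2, 'v2')
print(writer.read([(1, "v1")]))             # None

other = Session()
print(other.read([(1, "v1"), (2, "v2")]))   # (1, 'v1')
```

The second session has no floor yet, so it may still see v1: the guarantee is per session, not global.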
Hinted handoff


N: node, G: group including N

node(N) is unavailable
  replicate to G or
  store data(N) locally
  hint handoff for later
node(N) is alive
  handoff data to node(N)
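
The two phases above can be sketched like this. It is a minimal illustration with a single fallback node standing in for the group G; the `Node` class, `write` and `hand_off` are hypothetical names, and the sketch keeps the fallback's copy after the handoff although a real system may delete it once the transfer succeeded.

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.data: Dict[str, str] = {}
        # Hints: (intended owner, key, value) for writes taken on someone's behalf.
        self.hints: List[Tuple[str, str, str]] = []


def write(key: str, value: str, owner: Node, fallback: Node) -> None:
    """Write to the owner, or to the fallback with a hint if the owner is down."""
    if owner.alive:
        owner.data[key] = value
    else:
        fallback.data[key] = value                       # store data(N) locally
        fallback.hints.append((owner.name, key, value))  # hint handoff for later


def hand_off(fallback: Node, owner: Node) -> None:
    """Once the owner is alive again, replay the hinted writes to it."""
    if not owner.alive:
        return
    remaining = []
    for intended, key, value in fallback.hints:
        if intended == owner.name:
            owner.data[key] = value
        else:
            remaining.append((intended, key, value))
    fallback.hints = remaining


n, g = Node("N"), Node("G")

n.alive = False
write("foo", "bar", owner=n, fallback=g)
print(n.data, g.hints)  # {} [('N', 'foo', 'bar')]

n.alive = True
hand_off(g, n)
print(n.data, g.hints)  # {'foo': 'bar'} []
```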


Direct replica fails

Key = “foo”, # = N -> handoff hint = true

[Diagram: ring with node N; labels: Key = “foo”, "replicate"]




Replica recovers

[Diagram: ring; label "handoff"]




All replicas fail

Key = “foo”, # = N -> handoff hint = true

[Diagram: ring with node N]




All replicas recover

[Diagram: ring; labels "handoff" and "replicate"]




Latency is an adjustment screw




Availability is an adjustment screw




CAP – the variations


  CA – irrelevant

  CP – eventually unavailable offering
  maximum consistency

  AP – eventually inconsistent offering
  maximum availability
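
The difference between the CP and AP variations during a partition can be made concrete with a toy model of two replicas. This sketch rests on assumptions that go beyond the slides: a CP system is modelled as refusing the write when it cannot reach the other replica, an AP system as accepting it locally and keeping a hint, and replication without a partition is modelled as immediate. The `Pair` class and `Unavailable` exception are hypothetical.

```python
class Unavailable(Exception):
    """Raised when the system prefers refusing a request over diverging."""


class Pair:
    """Two replicas of one value; `partitioned` cuts the link between them."""

    def __init__(self, mode: str) -> None:
        self.mode = mode              # "CP" or "AP"
        self.replicas = ["v1", "v1"]  # [replica 1, replica 2]
        self.partitioned = False
        self.hint = None              # pending value for replica 2 (AP only)

    def write(self, value: str) -> None:
        """A write arriving at replica 1."""
        if not self.partitioned:
            self.replicas = [value, value]
        elif self.mode == "CP":
            raise Unavailable("replica 2 is unreachable")
        else:
            self.replicas[0] = value
            self.hint = value

    def heal(self) -> None:
        """The partition ends; a hinted write is handed off to replica 2."""
        self.partitioned = False
        if self.hint is not None:
            self.replicas[1] = self.hint
            self.hint = None


cp = Pair("CP")
cp.partitioned = True
try:
    cp.write("v2")
except Unavailable:
    print("CP: write refused, replicas stay", cp.replicas)

ap = Pair("AP")
ap.partitioned = True
ap.write("v2")
print("AP: replicas diverge", ap.replicas)    # ['v2', 'v1']
ap.heal()
print("AP: replicas converge", ap.replicas)   # ['v2', 'v2']
```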




CAP – the tradeoff




         A                           C




Replica 1                          CP

              v1             read

               v2            write
                              v2




                        v2

                        v1   read

  Replica 2

Replica 1                   CP (partition)

              v1             read

               v2            write
                              v2




                        v1   read

  Replica 2

Replica 1                                 AP

              v1                    write
                                     v2
              v2                    read



                        replicate


              v2              v1    read


  Replica 2

Replica 1                        AP (partition)

              v1                  write
                                   v2
              v2                  read
                        hint
                        handoff
                          v2

                          v1      read


  Replica 2

Frequent structure changes




Thank you




Many graphics I’ve created myself.

Some images originate from istockphoto.com,
except a few taken from Wikipedia and product pages.



