The inner workings of Dynamo DB



An introduction to the inner workings of Amazon Dynamo


  1. THE INNER WORKINGS OF AMAZON DYNAMO | Jonathan Lau, Smokehouse Software | Nov 2013
  2. MOTIVATION AND BIO • Early-stage companies • Building bigger systems • Specializing in backend systems
  3. DISTRIBUTED / CENTRALIZED • Data: a distributed store keeps different data on each node; a centralized store keeps one master copy • Replicas: a distributed store replicates a smaller data set for each of the nodes; a centralized store replicates the master copy into read slaves • Scaling: a distributed store shards data across the nodes by default; a centralized store needs extra work to shard
  4. WHAT ABOUT NOSQL? High performance != scaling: a fast single-node solution does not automatically scale
  5. DYNAMO DESIGN CONSIDERATIONS • Distributed key-value store • Incremental scalability: scale one node at a time • Decentralized design: gossip-based protocol for membership and failure detection • Symmetry: all nodes have the same functionality • Heterogeneity: the system will be deployed in an environment with huge variance in hardware and system performance
  6. HIGH LEVEL CONCEPT Distribute the data across N nodes arranged in a ring. [Diagram: nodes A through H on a ring; a put()/get() request for key "K" lands in the range [C, D).]
  7. DYNAMO'S CHALLENGES • Data partitioning • N-1 replicas • High availability for writes • Handling temporary failures • Recovering from permanent failures • Membership and failure detection
  8. PARTITIONING • 128-bit MD5 hash • Consistent hashing for key partitioning • Virtual nodes help even out the load distribution • A request can hit any node on the key's preference list, which then acts as the coordinator (sketched below). [Diagram: request for key K in [B, C) on a ring of nodes A, B, C, D.]
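
A minimal Python sketch of the partitioning idea above, assuming only what the slide states: keys are placed on a ring by their 128-bit MD5 hash, each physical node owns several virtual tokens to smooth the distribution, and a key belongs to the first token clockwise from its hash. The names (HashRing, coordinator) are illustrative, not Dynamo's API.

    import bisect
    import hashlib

    class HashRing:
        def __init__(self, nodes, vnodes=8):
            # Each physical node owns several virtual tokens on the ring,
            # which evens out the load distribution.
            self.ring = sorted(
                (self._hash(f"{node}#{i}"), node)
                for node in nodes
                for i in range(vnodes)
            )
            self.tokens = [token for token, _ in self.ring]

        @staticmethod
        def _hash(key):
            # Dynamo places keys on the ring using a 128-bit MD5 hash.
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def coordinator(self, key):
            # The key belongs to the first token clockwise from its hash.
            idx = bisect.bisect_right(self.tokens, self._hash(key)) % len(self.ring)
            return self.ring[idx][1]

    ring = HashRing(["A", "B", "C", "D"])
    print(ring.coordinator("K"))   # one of A-D, depending on the hash
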
  9. REPLICATION • Replicas are stored on the N-1 successor nodes • The nodes holding the replicas, together with the coordinator node, form the preference list (see the sketch below).
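
Extending the HashRing sketch above, the preference list can be derived by walking clockwise from the key and collecting the first N distinct physical nodes, skipping virtual tokens that belong to a node already chosen. This is a sketch of the idea, not the paper's exact placement logic.

    # Added inside the HashRing class from the previous sketch.
    def preference_list(self, key, n=3):
        # Walk clockwise from the key, collecting the first n distinct
        # physical nodes: the coordinator plus its n-1 successors.
        start = bisect.bisect_right(self.tokens, self._hash(key))
        nodes = []
        for step in range(len(self.ring)):
            node = self.ring[(start + step) % len(self.ring)][1]
            if node not in nodes:
                nodes.append(node)
            if len(nodes) == n:
                break
        return nodes

For example, ring.preference_list("K") might return ['C', 'D', 'A']: C coordinates the request while D and A hold the replicas.
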
  10. AVAILABLE FOR WRITES • Accepts all writes, versioning each modification • Tracks each modification and its base version with a vector clock • Conflicts are detected by examining the vector clocks on the objects and reconciled during the read operation (sketched below) • Consistency issues arise because of network or node failures • The oldest vector clock entries are purged when the clock grows too large
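
A minimal vector-clock sketch in Python. The descends/concurrent semantics follow the standard definition the paper relies on; the class and method names are mine, and truncation of old entries is omitted.

    class VectorClock:
        def __init__(self, counters=None):
            self.counters = dict(counters or {})   # node -> counter

        def increment(self, node):
            # The coordinator bumps its own entry on every write.
            vc = VectorClock(self.counters)
            vc.counters[node] = vc.counters.get(node, 0) + 1
            return vc

        def descends(self, other):
            # True if this clock has seen everything `other` has seen.
            return all(self.counters.get(n, 0) >= c
                       for n, c in other.counters.items())

        def concurrent(self, other):
            # Neither descends from the other: a conflict the reader
            # must reconcile (e.g. by merging shopping-cart contents).
            return not self.descends(other) and not other.descends(self)

    v1 = VectorClock().increment("A")   # write coordinated by node A
    v2 = v1.increment("B")              # later write coordinated by B
    v3 = v1.increment("C")              # sibling write coordinated by C
    print(v2.concurrent(v3))            # True -> reconcile at read time
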
  11. HANDLING TEMPORARY FAILURES • Trade-off between durability and availability • Sloppy quorum: a write or read is only considered successful when enough of the first N healthy nodes from the preference list respond. • Hinted handoff: a write is picked up by another replica when the designated coordinator node is down. The write picked up by the replica carries a hint naming the intended recipient, so the state can be handed back once that node recovers (sketched below).
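
A sketch of sloppy quorum plus hinted handoff under simplifying assumptions: a shared `store` dict stands in for per-node storage, and a `healthy` set stands in for failure detection. Writes spill over to the next healthy nodes on the ring, and spilled writes carry a hint naming the intended recipient.

    def sloppy_write(key, value, preference_list, all_nodes, healthy, store, w=2):
        n = len(preference_list)
        # The first n healthy nodes, preferring the preference list itself
        # and spilling over to the next ring members when nodes are down.
        candidates = preference_list + [m for m in all_nodes
                                        if m not in preference_list]
        targets = [m for m in candidates if m in healthy][:n]
        missed = [m for m in preference_list if m not in targets]
        acks = 0
        for target in targets:
            # Nodes outside the preference list carry a hint naming the
            # intended recipient, so the write can be handed back later.
            hint = missed.pop(0) if target not in preference_list else None
            store.setdefault(target, []).append((key, value, hint))
            acks += 1
        return acks >= w   # sloppy quorum: success once W nodes acknowledge

    store = {}
    ok = sloppy_write("K", "v1", ["A", "B", "C"], ["A", "B", "C", "D", "E"],
                      {"A", "C", "D", "E"}, store)
    print(ok, store["D"])   # True [('K', 'v1', 'B')] -> D holds a hint for B
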
  12. REPLICA SYNCHRONIZATION • Dynamo uses a Merkle tree to track hashes for the keys • Only the root hash needs to be exchanged to validate the synchronization state between replicas • If a replica is deemed out of sync, the nodes can traverse down the tree to find the exact mismatching portion (see the toy example below).
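
A toy Merkle-tree comparison between two replicas, assuming a power-of-two number of key ranges: compare root hashes first, and descend only into subtrees whose hashes differ. This illustrates the idea, not Dynamo's actual anti-entropy code.

    import hashlib

    def h(s):
        return hashlib.md5(s.encode()).hexdigest()

    def build(leaves):
        # Bottom-up: each internal node hashes the concatenation of its
        # two children; tree[-1][0] is the root hash.
        level = [h(leaf) for leaf in leaves]
        tree = [level]
        while len(level) > 1:
            level = [h(level[i] + level[i + 1])
                     for i in range(0, len(level), 2)]
            tree.append(level)
        return tree

    def diff(a, b, level=None, idx=0):
        # Recursively find the leaf indexes where the trees disagree.
        if level is None:
            level = len(a) - 1
        if a[level][idx] == b[level][idx]:
            return []          # identical subtree: nothing to transfer
        if level == 0:
            return [idx]       # mismatching key range found
        return (diff(a, b, level - 1, 2 * idx)
                + diff(a, b, level - 1, 2 * idx + 1))

    r1 = build(["k1=v1", "k2=v2", "k3=v3", "k4=v4"])
    r2 = build(["k1=v1", "k2=XX", "k3=v3", "k4=v4"])
    print(diff(r1, r2))   # [1] -> only that key range needs syncing
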
  13. NODE MEMBERSHIP • Partition and placement information is propagated via a gossip protocol (sketched below) • Each node is aware of the token ranges of its peers • Seed nodes in the cluster speed up membership convergence and the key-range mapping for the ring • Nodes are not fully aware of each other until an explicit membership change, such as a node removal, happens
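
A toy gossip round under assumed data structures: each node holds a view mapping members to the highest membership version it has heard of, and every round it merges views with a random peer, so information (including what the seed knows) spreads epidemically.

    import random

    # Each node's view: member -> highest membership version heard of.
    views = {
        "A":    {"A": 1},
        "B":    {"B": 1},
        "seed": {"A": 1, "B": 1, "seed": 1},   # the seed knows everyone
    }

    def gossip_round():
        # Every node picks a random peer; the pair merge their views,
        # keeping the highest version seen for each member.
        for node in list(views):
            peer = random.choice([n for n in views if n != node])
            merged = {m: max(views[node].get(m, 0), views[peer].get(m, 0))
                      for m in set(views[node]) | set(views[peer])}
            views[node] = dict(merged)
            views[peer] = dict(merged)

    for _ in range(3):
        gossip_round()
    print(views["A"])   # converges toward the full membership map
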
  14. GET() AND PUT() What happens during a read or write request?
  15. GET() AND PUT() • get() and put() are routed either through a generic load balancer or through a partition-aware client library • The top N nodes in the preference list for key K act as coordinators • Requests go down the list, and unhealthy nodes are skipped over • Two configuration parameters, R and W, chosen such that R + W > N
  16. MORE ON GET() AND PUT() When a write happens: • the coordinator generates a vector clock value • it sends the new value along with the vector clock to the N highest-ranked reachable nodes • if at least W-1 of them respond, the write is considered successful. When a read happens: • the coordinator sends a read request to the N highest-ranked reachable nodes • it waits for R nodes to return, then returns the result to the client (see the quorum sketch below).
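
A coordinator-side sketch of the R/W quorum logic with N=3, R=2, W=2 (so R + W > N). Here `replicas` is a hypothetical in-memory stand-in for the storage nodes; the real system sends network requests and counts acknowledgements.

    N, R, W = 3, 2, 2

    def put(key, value, clock, preference_list, replicas):
        # Send the new (value, clock) to the N highest-ranked reachable nodes.
        acks = 0
        for node in preference_list[:N]:
            if node in replicas:               # unreachable nodes are skipped
                replicas[node][key] = (value, clock)
                acks += 1
        return acks >= W                       # successful once W nodes respond

    def get(key, preference_list, replicas):
        # Collect versions until R nodes have answered.
        results = []
        for node in preference_list[:N]:
            if node in replicas and key in replicas[node]:
                results.append(replicas[node][key])
            if len(results) == R:
                break
        # Because R + W > N, the read set overlaps every successful write
        # set, so the latest version is always among the results; causally
        # unrelated versions go back to the client for reconciliation.
        return results

    replicas = {"A": {}, "B": {}, "C": {}}
    put("K", "v1", {"A": 1}, ["A", "B", "C"], replicas)
    print(get("K", ["A", "B", "C"], replicas))
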
  17. WHAT DOES IT ALL MEAN How does all of this tie together?
  18. WHAT DOES IT MEAN? • Dynamo shards the data from day one • Replication and redundancy are baked in from day one • The configuration parameters W and R, subject to W + R > N, have a huge effect on the trade-off between availability and durability • Resolving consistency at read time allows a more controlled conflict resolution strategy
  19. HAPPY SCALING Read the Dynamo design paper @