THE INNER WORKINGS OF AMAZON DYNAMO
Jonathan Lau
Nov 2013

Smokehouse Software | Jonathan Lau | jon@smokehousesoftware.com
MOTIVATION AND BIO
• Early-stage companies
• Building bigger systems
• Specializes in backend systems

DISTRIBUTE / CENTRALIZE
            Distributed                                    Centralized
Data        Different data on each node                    One master copy
Replicas    Each node's smaller data set is replicated     The master copy is replicated into read slaves
Scaling     Data is sharded across the nodes by default    Extra work to shard

WHAT ABOUT NOSQL?
High performance solution != scaling

DYNAMO DESIGN CONSIDERATIONS
• Distributed key-value store
• Incremental scalability - scale out one node at a time
• Decentralized design - a gossip-based protocol handles membership and failure detection
• Symmetry - all nodes have the same functionality
• Heterogeneity - the system is deployed in an environment with huge variance in hardware and system performance

HIGH LEVEL CONCEPT
Distribute the data across N nodes arranged in a ring
[Figure: put() and get() requests on a ring of nodes A through H; the request is for key "K", which falls in [C, D)]
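The ring idea can be sketched in a few lines of Python. This is a minimal illustration, not Dynamo's code: the node positions (0 to 700 on a ring of size 800) and the helper name `coordinator_for` are hypothetical, chosen to mirror the eight nodes in the figure. Following the Dynamo paper, a key belongs to the first node found walking clockwise from the key's position, wrapping past the top of the ring.

```python
import bisect

# Hypothetical ring positions for the eight nodes in the figure (ring size 800).
RING_SIZE = 800
NODES = {"A": 0, "B": 100, "C": 200, "D": 300, "E": 400, "F": 500, "G": 600, "H": 700}

def coordinator_for(key_position: int) -> str:
    """Return the first node clockwise from key_position (strictly after it)."""
    ring = sorted((pos, name) for name, pos in NODES.items())
    positions = [pos for pos, _ in ring]
    # bisect_right finds the first node whose position is larger than the key's.
    idx = bisect.bisect_right(positions, key_position % RING_SIZE)
    # Walking past the last node wraps around to the start of the ring.
    return ring[idx % len(ring)][1]

print(coordinator_for(250))  # key between C (200) and D (300) -> D
print(coordinator_for(750))  # key after H (700) wraps around -> A
```

Adding or removing one node only moves the keys in the arc next to it, which is what makes scaling one node at a time cheap.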

DYNAMO’S CHALLENGES
• Data partitioning
• N-1 replicas
• High availability for writes
• Handling temporary failures
• Recovering from permanent failures
• Membership and failure detection

PARTITIONING
• 128-bit MD5 hash of the key
• Consistent hashing for key partitioning
• Virtual nodes help improve load distribution
• A request can hit any node on the key's preference list; that node acts as the coordinator
[Figure: ring of nodes A, B, C, D; request for key K, which falls in [B, C)]
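A sketch of MD5-based consistent hashing with virtual nodes, assuming each physical node gets several tokens whose positions come from hashing a hypothetical `"<node>#<index>"` label. The `Ring` class and the token naming scheme are illustrative assumptions; only the 128-bit MD5 ring and the clockwise ownership rule come from the slide and the paper. Comparing one token per node against 64 tokens per node usually shows the key counts evening out.

```python
import bisect
import hashlib
from collections import Counter
from typing import List

def md5_position(value: str) -> int:
    """Map a string onto a 2**128 ring using its 128-bit MD5 digest."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")

class Ring:
    """Consistent-hash ring where every physical node owns several tokens."""

    def __init__(self, nodes: List[str], vnodes_per_node: int) -> None:
        # Hypothetical token naming: hash "<node>#<index>" to place each virtual node.
        tokens = sorted(
            (md5_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self.positions = [pos for pos, _ in tokens]
        self.owners = [node for _, node in tokens]

    def owner(self, key: str) -> str:
        """The first token clockwise from the key's hash owns it (wrapping at the end)."""
        idx = bisect.bisect_right(self.positions, md5_position(key))
        return self.owners[idx % len(self.owners)]

nodes = ["A", "B", "C", "D"]
keys = [f"user:{i}" for i in range(10_000)]
for vnodes in (1, 64):
    ring = Ring(nodes, vnodes)
    counts = Counter(ring.owner(k) for k in keys)
    print(f"{vnodes:>2} tokens per node:", dict(sorted(counts.items())))
```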

REPLICATION
• Each key is replicated on the N-1 clockwise successor nodes of its coordinator
• The coordinator node and the nodes holding the replicas form the key's preference list
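Building the preference list can be sketched as a clockwise walk that collects distinct physical nodes, assuming a sorted list of `(token, node)` pairs. With virtual nodes, consecutive tokens can belong to the same machine, so repeats are skipped to keep the N copies on N different machines. The function name and the token values below are hypothetical, and the real preference list also holds extra nodes beyond N to ride out failures; this sketch stops at N.

```python
import bisect
from typing import List, Tuple

def preference_list(ring: List[Tuple[int, str]], key_position: int, n: int) -> List[str]:
    """Walk clockwise from key_position and collect the first n distinct physical nodes.

    The first entry is the coordinator; the rest hold the N-1 replicas.
    """
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_right(positions, key_position)
    chosen: List[str] = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:  # skip extra tokens of a machine already chosen
            chosen.append(node)
            if len(chosen) == n:
                break
    return chosen

# Hypothetical tokens: physical node B owns two adjacent tokens (20 and 30).
RING = [(10, "A"), (20, "B"), (30, "B"), (40, "C"), (50, "D")]
print(preference_list(RING, 15, n=3))  # ['B', 'C', 'D']
print(preference_list(RING, 45, n=3))  # ['D', 'A', 'B']
```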

AVAILABLE FOR WRITES
• Accepts every write; each modification creates a new version of the object
• Tracks each modification, and the version it was based on, with a vector clock
• Accepts each write together with its vector clock
• Resolves conflicts during reads by examining the vector clocks on the stored versions and reconciling them
• Consistency issues (divergent versions) arise from network partitions or node failures
• The oldest vector clock entries are purged once the clock grows past a size limit
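The vector clock bookkeeping can be sketched with plain dictionaries mapping a node name to a counter. This is an illustration under simple assumptions (hypothetical helper names, no clock truncation); the version history mirrors the kind of example the Dynamo paper walks through, where two nodes update the same base version concurrently and a later write reconciles them.

```python
from typing import Dict

Clock = Dict[str, int]  # node name -> counter

def increment(clock: Clock, node: str) -> Clock:
    """Copy of clock with node's counter bumped: node wrote a new version."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a: Clock, b: Clock) -> bool:
    """True if version a has seen every update recorded in version b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: Clock, b: Clock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"  # neither saw the other: keep both, reconcile on read

def merge(a: Clock, b: Clock) -> Clock:
    """Entry-wise maximum, the base for a write that reconciles a and b."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}

d1 = increment({}, "Sx")             # first write, handled by Sx
d2 = increment(d1, "Sx")             # Sx updates it again
d3 = increment(d2, "Sy")             # Sy and Sz both update d2 independently
d4 = increment(d2, "Sz")
print(compare(d3, d4))               # concurrent
d5 = increment(merge(d3, d4), "Sx")  # a client reconciles on read; Sx writes the result
print(sorted(d5.items()))            # [('Sx', 3), ('Sy', 1), ('Sz', 1)]
```

Truncating the oldest entries, as the last bullet describes, would also need a timestamp per entry recording when that node last updated the item; this sketch leaves that out.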

HANDLING TEMPORARY FAILURES
• Trade-off between durability and availability
• Sloppy quorum - reads and writes go to the first N healthy nodes on the preference list, which are not necessarily the first N nodes on the ring; an operation succeeds once R (for reads) or W (for writes) of them respond
• Hinted handoff - if a node that should hold a replica is down, the write goes to another healthy node further down the preference list. That node keeps a hint naming the intended recipient and hands the write back once the recipient recovers, so the intended state can be reconstructed.
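Below is a toy model of hinted handoff for a single key, assuming in-memory nodes, a fixed preference list, and a hypothetical `Node` class. It sends the write to the first N healthy nodes on the preference list; a healthy node standing in for a down node keeps the write in a separate hint list naming the intended recipient, and `handoff` delivers it once that node is back. It does not count acknowledgements against W.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Node:
    name: str
    up: bool = True
    data: Dict[str, str] = field(default_factory=dict)
    # Writes held for a down node, kept apart from this node's own data:
    # (intended_recipient, key, value)
    hints: List[Tuple[str, str, str]] = field(default_factory=list)

def sloppy_write(preference_list: List[Node], n: int, key: str, value: str) -> List[str]:
    """Send the write to the first n healthy nodes on the preference list."""
    top_n = preference_list[:n]
    down = [node.name for node in top_n if not node.up]
    stand_ins = [node for node in preference_list[n:] if node.up]
    written = []
    for node in top_n:
        if node.up:
            node.data[key] = value
            written.append(node.name)
    # Each down node in the top n is covered by the next healthy node further down.
    for recipient, stand_in in zip(down, stand_ins):
        stand_in.hints.append((recipient, key, value))
        written.append(stand_in.name)
    return written

def handoff(cluster: Dict[str, Node]) -> None:
    """Deliver hinted writes to intended recipients that are back up."""
    for node in cluster.values():
        pending = []
        for recipient, key, value in node.hints:
            if cluster[recipient].up:
                cluster[recipient].data[key] = value
            else:
                pending.append((recipient, key, value))
        node.hints = pending

cluster = {name: Node(name) for name in "ABCD"}
cluster["A"].up = False  # A is in the top 3 for key K but temporarily down
print(sloppy_write([cluster[name] for name in "ABCD"], n=3, key="K", value="v1"))  # ['B', 'C', 'D']
cluster["A"].up = True
handoff(cluster)
print(cluster["A"].data, cluster["D"].hints)  # {'K': 'v1'} []
```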

REPLICA SYNCHRONIZATION
• Dynamo uses Merkle trees to track hashes of the keys each replica stores
• Replicas first exchange only the root hashes to check whether they are in sync
• If the roots differ, the nodes traverse down the tree to find exactly which portion of the key range is out of sync
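A minimal sketch of the Merkle comparison, assuming both replicas cover the same key range split into a fixed number of leaf buckets (8 here, a hypothetical choice, with keys assigned to buckets by their MD5 value) and SHA-256 for the tree hashes (the slide does not say which hash Dynamo uses). Replicas compare roots first and only descend into subtrees whose hashes differ, so the search ends at the leaf buckets that actually need syncing.

```python
import hashlib
from typing import Dict, List

NUM_LEAVES = 8  # hypothetical: the key range is split into 8 leaf buckets (a power of 2)

def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_index(key: str) -> int:
    """Assign a key to a leaf bucket by its MD5 position."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") % NUM_LEAVES

def build_tree(store: Dict[str, str]) -> List[List[bytes]]:
    """Return the tree as a list of levels: leaf hashes first, the root last."""
    buckets: List[List[str]] = [[] for _ in range(NUM_LEAVES)]
    for key in sorted(store):
        buckets[leaf_index(key)].append(f"{key}={store[key]}")
    level = [_hash("\n".join(items).encode("utf-8")) for items in buckets]
    levels = [level]
    while len(level) > 1:
        # Each parent hashes the concatenation of its two children.
        level = [_hash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def out_of_sync_leaves(a: List[List[bytes]], b: List[List[bytes]]) -> List[int]:
    """Compare roots, then descend only into subtrees whose hashes differ."""
    if a[-1] == b[-1]:
        return []  # roots match: nothing beyond the root hash had to be compared
    suspects = [0]  # differing node indices at the current level, starting at the root
    for depth in range(len(a) - 2, -1, -1):
        children = [child for i in suspects for child in (2 * i, 2 * i + 1)]
        suspects = [c for c in children if a[depth][c] != b[depth][c]]
    return suspects

replica_1 = {f"k{i}": "v" for i in range(20)}
replica_2 = dict(replica_1, k7="stale")
print(out_of_sync_leaves(build_tree(replica_1), build_tree(replica_2)) == [leaf_index("k7")])  # True
```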

NODE MEMBERSHIP
• Partition and placement information is propagated via a gossip protocol
• Each node learns the token ranges of its peers
• Seed nodes in the cluster speed up the propagation of membership and key-range information around the ring
• Nodes only really notice that a peer is unreachable when an actual request to it fails
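Gossip-based membership can be sketched as a push-pull exchange of views. This is a generic anti-entropy sketch, not Dynamo's exact protocol: each node's view maps a node name to a `(version, tokens)` entry, the token values are hypothetical, and when two nodes gossip they keep whichever entry has the higher version. Starting from views in which every node knows only itself, a few random rounds are usually enough for all views to agree.

```python
import random
from typing import Dict, Tuple

# A membership view maps node name -> (version, tokens owned by that node).
View = Dict[str, Tuple[int, Tuple[int, ...]]]

def merge_views(mine: View, theirs: View) -> View:
    """For every node, keep whichever entry carries the higher version."""
    merged = dict(mine)
    for node, entry in theirs.items():
        if node not in merged or entry[0] > merged[node][0]:
            merged[node] = entry
    return merged

def gossip_round(views: Dict[str, View], rng: random.Random) -> None:
    """Each node reconciles views with one randomly chosen peer (push-pull)."""
    names = list(views)
    for node in names:
        peer = rng.choice([other for other in names if other != node])
        merged = merge_views(views[node], views[peer])
        views[node] = merged
        views[peer] = dict(merged)

def converged(views: Dict[str, View]) -> bool:
    return len({tuple(sorted(view.items())) for view in views.values()}) == 1

# Hypothetical start: every node knows only its own tokens.
views = {name: {name: (1, (i * 100, i * 100 + 50))} for i, name in enumerate("ABCDE")}
rng = random.Random(7)
rounds = 0
while not converged(views) and rounds < 100:
    gossip_round(views, rng)
    rounds += 1
print(f"views agree: {converged(views)} after {rounds} rounds")
```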

GET() AND PUT()
What happens during a read or write request?

GET() AND PUT()
• get() and put() requests are routed either through a generic load balancer or through a partition-aware client library
• Any of the top N nodes in the preference list for key K can act as the coordinator
• Requests go down the list, skipping nodes that are down or unreachable
• Two configuration parameters, R and W: the minimum number of nodes that must take part in a successful read and write, respectively, chosen so that R + W > N
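The reason for R + W > N can be checked by brute force: if every read set of R nodes shares at least one node with every write set of W nodes, a read always reaches some node that took part in the last successful write. The helper below is a hypothetical illustration that enumerates all read and write sets over the same N nodes. With the sloppy quorum from the earlier slide, the N healthy nodes can change between a write and a read, so Dynamo's guarantee is weaker than this idealized overlap.

```python
from itertools import combinations

def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force over N nodes: does every R-node read set meet every W-node write set?"""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )

for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (3, 2, 1)]:
    print(f"N={n} R={r} W={w}: R+W>N is {r + w > n}, overlap is {quorums_always_overlap(n, r, w)}")
```

(N, R, W) = (3, 2, 2) is the configuration the Dynamo paper reports as common; dropping R or W below the quorum buys latency and availability at the cost of possibly stale reads.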

MORE ON GET() AND PUT()
When a write happens:
• the coordinator generates the vector clock for the new version and writes the new version locally
• it sends the new version, along with its vector clock, to the N highest-ranked reachable nodes
• if at least W-1 nodes respond, the write is considered successful (the coordinator's own local write counts toward W)

When a read happens:
• the coordinator sends a read request to the N highest-ranked reachable nodes
• it waits for R nodes to respond, then returns the result to the client
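The coordinator's counting rules can be sketched as two small functions, assuming the replies from the top N nodes have already been collected into a list (`None` for a node that did not answer) and that versions are `(value, vector_clock)` pairs. The function names are hypothetical. The write path needs W-1 acknowledgements because the coordinator's own local write counts toward W; the read path needs R answers and, following the Dynamo paper, returns every version that no other reply supersedes, so a conflict shows up as more than one version.

```python
from typing import Dict, List, Optional, Tuple

Clock = Dict[str, int]
Version = Tuple[str, Clock]  # (value, vector clock)

def descends(a: Clock, b: Clock) -> bool:
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def put_succeeded(acks_from_others: List[bool], w: int) -> bool:
    """The coordinator already wrote locally, so it needs only W-1 acks from the rest."""
    return sum(acks_from_others) >= w - 1

def get_result(replies: List[Optional[Version]], r: int) -> Optional[List[Version]]:
    """One reply slot per top-N node (None = no answer).

    Returns None when fewer than R nodes answered; otherwise drops every version
    that another reply strictly supersedes and returns the rest.
    """
    answered = [reply for reply in replies if reply is not None]
    if len(answered) < r:
        return None
    result: List[Version] = []
    for value, clock in answered:
        superseded = any(descends(other, clock) and other != clock for _, other in answered)
        if not superseded and (value, clock) not in result:
            result.append((value, clock))
    return result

v1 = ("cart=[book]", {"A": 1})
v2 = ("cart=[book, pen]", {"A": 2})
v3 = ("cart=[book, mug]", {"A": 1, "B": 1})  # written concurrently with v2
print(put_succeeded([True, False], w=2))  # True
print(get_result([v2, v1, None], r=2))    # [('cart=[book, pen]', {'A': 2})]
print(get_result([v2, v3, None], r=2))    # both versions: a conflict for the client to reconcile
```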

WHAT DOES IT ALL MEAN?
How does all of this tie together?

WHAT DOES IT MEAN?
• Dynamo shards the data from day 1
• Replication and redundancy are baked in from day 1
• The configuration parameters W and R have a huge effect on the trade-off between availability and durability
   • W + R > N
• Resolving consistency at read time allows a more controlled conflict resolution strategy

HAPPY SCALING
Read the Dynamo design paper at http://bit.ly/QeM8AC

Smokehouse Software | Jonathan Lau | jon@smokehousesoftware.com

More Related Content

What's hot

Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Web Services
 
Hadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneHadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneErik Krogen
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and consSaniya Khalsa
 
Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremGrisha Weintraub
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureGwen (Chen) Shapira
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBasedave_revell
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupSnehal Nagmote
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Evan Chan
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Lucidworks
 
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBaseHBaseCon
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Web Services
 
Citus Architecture: Extending Postgres to Build a Distributed Database
Citus Architecture: Extending Postgres to Build a Distributed DatabaseCitus Architecture: Extending Postgres to Build a Distributed Database
Citus Architecture: Extending Postgres to Build a Distributed DatabaseOzgun Erdogan
 
Realtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLibRealtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLibRyan Bosshart
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streamsYoni Farin
 
HBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseHBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseCloudera, Inc.
 

What's hot (20)

Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech Talks
 
Hadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneHadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of Ozone
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and cons
 
Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theorem
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding Failure
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
 
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
 
Citus Architecture: Extending Postgres to Build a Distributed Database
Citus Architecture: Extending Postgres to Build a Distributed DatabaseCitus Architecture: Extending Postgres to Build a Distributed Database
Citus Architecture: Extending Postgres to Build a Distributed Database
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Realtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLibRealtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLib
 
Unit ii sem-v-hadoop
Unit ii  sem-v-hadoopUnit ii  sem-v-hadoop
Unit ii sem-v-hadoop
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streams
 
HBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseHBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBase
 

Viewers also liked

Advantages of Cassandra's masterless architecture
Advantages of Cassandra's masterless architectureAdvantages of Cassandra's masterless architecture
Advantages of Cassandra's masterless architectureDuy Lâm
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In DepthFabio Fumarola
 
Consistent hashing
Consistent hashingConsistent hashing
Consistent hashingJooho Lee
 
Introduction to aws dynamo db
Introduction to aws dynamo dbIntroduction to aws dynamo db
Introduction to aws dynamo dbOmid Vahdaty
 
Fluid power, Hydraulic & penumatic
Fluid power, Hydraulic & penumaticFluid power, Hydraulic & penumatic
Fluid power, Hydraulic & penumaticMusa Sabri
 
Dynamo DB & RDS Deep Dive - AWS India Summit 2012
Dynamo DB & RDS Deep Dive - AWS India Summit 2012Dynamo DB & RDS Deep Dive - AWS India Summit 2012
Dynamo DB & RDS Deep Dive - AWS India Summit 2012Amazon Web Services
 

Viewers also liked (11)

Advantages of Cassandra's masterless architecture
Advantages of Cassandra's masterless architectureAdvantages of Cassandra's masterless architecture
Advantages of Cassandra's masterless architecture
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
 
Emergform
EmergformEmergform
Emergform
 
4 fluids
4 fluids4 fluids
4 fluids
 
Consistent hashing
Consistent hashingConsistent hashing
Consistent hashing
 
Hydrolic Fluid purpose & properties (chapter 2)
Hydrolic Fluid purpose & properties (chapter 2)Hydrolic Fluid purpose & properties (chapter 2)
Hydrolic Fluid purpose & properties (chapter 2)
 
Introduction to aws dynamo db
Introduction to aws dynamo dbIntroduction to aws dynamo db
Introduction to aws dynamo db
 
Fluid power, Hydraulic & penumatic
Fluid power, Hydraulic & penumaticFluid power, Hydraulic & penumatic
Fluid power, Hydraulic & penumatic
 
Introducing DynamoDB
Introducing DynamoDBIntroducing DynamoDB
Introducing DynamoDB
 
Dynamo DB & RDS Deep Dive - AWS India Summit 2012
Dynamo DB & RDS Deep Dive - AWS India Summit 2012Dynamo DB & RDS Deep Dive - AWS India Summit 2012
Dynamo DB & RDS Deep Dive - AWS India Summit 2012
 
Bigtable and Dynamo
Bigtable and DynamoBigtable and Dynamo
Bigtable and Dynamo
 

Similar to The inner workings of Dynamo DB

HPC Controls Future
HPC Controls FutureHPC Controls Future
HPC Controls Futurercastain
 
Thoughts on Transaction and Consistency Models
Thoughts on Transaction and Consistency ModelsThoughts on Transaction and Consistency Models
Thoughts on Transaction and Consistency Modelsiammutex
 
Thoughts on consistency models
Thoughts on consistency modelsThoughts on consistency models
Thoughts on consistency modelsrogerbodamer
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesDavid Martínez Rego
 
Closing Keynote
Closing KeynoteClosing Keynote
Closing KeynoteNeo4j
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationnathanmarz
 
Adding Real-time Features to PHP Applications
Adding Real-time Features to PHP ApplicationsAdding Real-time Features to PHP Applications
Adding Real-time Features to PHP ApplicationsRonny López
 
Cloud computing Module 2 First Part
Cloud computing Module 2 First PartCloud computing Module 2 First Part
Cloud computing Module 2 First PartSoumee Maschatak
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging振东 刘
 
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...Tristan Penman
 
Springone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorSpringone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorStéphane Maldini
 
Low Latency Streaming Data Processing in Hadoop
Low Latency Streaming Data Processing in HadoopLow Latency Streaming Data Processing in Hadoop
Low Latency Streaming Data Processing in HadoopInSemble
 
[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lakeTimothy Spann
 
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...Continuent
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
Consistency Models in New Generation Databases
Consistency Models in New Generation DatabasesConsistency Models in New Generation Databases
Consistency Models in New Generation Databasesiammutex
 
Consistency-New-Generation-Databases
Consistency-New-Generation-DatabasesConsistency-New-Generation-Databases
Consistency-New-Generation-DatabasesRoger Xia
 

Similar to The inner workings of Dynamo DB (20)

HPC Controls Future
HPC Controls FutureHPC Controls Future
HPC Controls Future
 
Scalable Web Apps
Scalable Web AppsScalable Web Apps
Scalable Web Apps
 
Thoughts on Transaction and Consistency Models
Thoughts on Transaction and Consistency ModelsThoughts on Transaction and Consistency Models
Thoughts on Transaction and Consistency Models
 
Thoughts on consistency models
Thoughts on consistency modelsThoughts on consistency models
Thoughts on consistency models
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Closing Keynote
Closing KeynoteClosing Keynote
Closing Keynote
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computation
 
Adding Real-time Features to PHP Applications
Adding Real-time Features to PHP ApplicationsAdding Real-time Features to PHP Applications
Adding Real-time Features to PHP Applications
 
Cloud computing Module 2 First Part
Cloud computing Module 2 First PartCloud computing Module 2 First Part
Cloud computing Module 2 First Part
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging
 
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...
 
Springone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorSpringone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and Reactor
 
Storm
StormStorm
Storm
 
Low Latency Streaming Data Processing in Hadoop
Low Latency Streaming Data Processing in HadoopLow Latency Streaming Data Processing in Hadoop
Low Latency Streaming Data Processing in Hadoop
 
[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake
 
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Consistency Models in New Generation Databases
Consistency Models in New Generation DatabasesConsistency Models in New Generation Databases
Consistency Models in New Generation Databases
 
Consistency-New-Generation-Databases
Consistency-New-Generation-DatabasesConsistency-New-Generation-Databases
Consistency-New-Generation-Databases
 

Recently uploaded

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 

Recently uploaded (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 

The inner workings of Dynamo DB

  • 1. THE INNER WORKINGS OF AMAZON DYNAMO Jonathan Lau Nov 2013 Smokehouse Software | Jonathan Lau | jon@smokehousesoftware.com
  • 2. MOTIVATION AND BIO • Early stage companies • Build bigger system • Specialize in backend system Smokehouse Software | Jonathan Lau | jon@smokehousesoftware.com
  • 3. DISTRIBUTE / CENTRALIZE Distributed Centralized Data Different data for each node One master copy Replicas Replicate smaller data set for each of the nodes Replicate the master copy into read slaves Scaling Data are shared into the nodes by default Extra work to shard Smokehouse Software | Jonathan Lau | jon@smokehousesoftware.com
  • 4. WHAT ABOUT NOSQL? High performance solution != scaling
  • 5. DYNAMO DESIGN CONSIDERATIONS • Distributed key-value store • Incremental scalability: scale out one node at a time • Decentralized design: a gossip-based protocol for membership and failure detection • Symmetry: all nodes have the same functionality • Heterogeneity: the system is deployed in an environment with large variance in hardware and system performance
  • 6. HIGH LEVEL CONCEPT Distribute the data across N nodes arranged in a ring. [Figure: ring of nodes A through H; put() and get() requests for key "K", which falls in the range [C, D).]
  • 7. DYNAMO’S CHALLENGES • Data partitioning • N-1 replicas • High availability for writes • Handling temporary failures • Recovering from permanent failures • Membership and failure detection
  • 8. PARTITIONING • 128-bit MD5 hash of the key • Consistent hashing for key partitioning • Virtual nodes help improve the load distribution • A request can hit any of the nodes on the key's preference list (the coordinator) [Figure: ring of nodes A, B, C, D; request for key K in [B, C).]
  • 9. REPLICATION • Each key is replicated on the N-1 successor nodes of its coordinator • The coordinator and the nodes holding the replicas form the key's preference list
  • 10. AVAILABLE FOR WRITES • All writes are accepted, each tagged with the version it modified • Modifications and their base versions are tracked with vector clocks • Each write is stored along with its vector clock • Conflicts are resolved by examining the objects' vector clocks and reconciling them during the read operation • Conflicting versions arise from network or node failures • The oldest vector clock entries are purged once a clock grows past a size threshold
  • 11. HANDLING TEMPORARY FAILURES • Trade-off between durability and availability • Sloppy quorum: reads and writes go to the first N healthy nodes on the preference list, skipping unreachable ones, rather than strictly to the first N nodes • Hinted handoff: when a node that should receive a write is down, another healthy node further down the preference list accepts the write. The write carries a hint naming the intended recipient, so it can be handed back and the intended state reconstructed once that node recovers.
  • 12. REPLICA SYNCHRONIZATION • Dynamo uses Merkle trees to track the hashes of the keys • Replicas compare only the root hashes to check whether they are in sync • If a replica is found to be out of sync, the node traverses down the tree to find exactly which portion mismatches
  • 13. NODE MEMBERSHIP • Partition and placement information is propagated via a gossip protocol • Each node is aware of the token ranges of its peers • Seed nodes in the cluster speed up the propagation of membership and key range ownership across the ring • Nodes are not really aware of each other until an actual delete happens
  • 14. GET() AND PUT() What happens during a read or write request?
  • 15. GET() AND PUT() • get() and put() requests are routed either through a generic load balancer or through a partition-aware client library • Any of the top N nodes in the preference list for key K can act as the coordinator • Requests go down the list, and unhealthy nodes are skipped • Two configuration parameters, R and W, set the minimum number of nodes that must respond to a read and to a write respectively, with R + W > N
  • 16. MORE ON GET() AND PUT() When a write happens: • the coordinator generates the vector clock for the new version and writes it locally • it sends the new value, along with the vector clock, to the N highest-ranked reachable nodes • if at least W-1 of those nodes respond, the write is considered successful (the coordinator's own write counts as the W-th). When a read happens: • the coordinator sends a read request to the N highest-ranked reachable nodes • it waits for R responses, then returns the result to the client
  • 17. WHAT DOES IT ALL MEAN? How does all of this tie together?
  • 18. WHAT DOES IT MEAN? • Dynamo shards the data from day 1 • Replication and redundancy are baked in from day 1 • The configuration parameters W and R have a huge effect on the trade-off between availability and durability, subject to W + R > N • Resolving conflicts at read time allows a more controlled conflict resolution strategy
  • 19. HAPPY SCALING Read the Dynamo design paper at http://bit.ly/QeM8AC
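Slide 8 (PARTITIONING) combines a 128-bit MD5 hash, consistent hashing and virtual nodes. The following Python is a minimal sketch of that idea, not Dynamo's own code: the `Ring` class, its `vnodes_per_node` parameter and the `"node#i"` token-naming scheme are hypothetical. Each physical node is hashed to several positions (virtual nodes) on the ring, and a key belongs to the node owning the first position clockwise past the key's own MD5 value.

```python
import bisect
import hashlib


def md5_token(key: str) -> int:
    """Map a string to a position on a 128-bit ring using MD5."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Consistent-hash ring in which each physical node owns several virtual nodes."""

    def __init__(self, nodes, vnodes_per_node=8):
        self._tokens = []  # sorted virtual-node positions
        self._owner = {}   # virtual-node position -> physical node name
        for node in nodes:
            for i in range(vnodes_per_node):
                # Hypothetical naming scheme for virtual-node tokens.
                token = md5_token(f"{node}#{i}")
                self._owner[token] = node
                bisect.insort(self._tokens, token)

    def coordinator(self, key: str) -> str:
        """Return the node owning the first token strictly clockwise of the key."""
        idx = bisect.bisect_right(self._tokens, md5_token(key))
        # Past the largest token, wrap around to the start of the ring.
        return self._owner[self._tokens[idx % len(self._tokens)]]
```

Since only the ranges just before the new node's tokens change owner, adding a node moves some keys onto it and leaves every other key in place, which is what lets the system scale one node at a time (slide 5).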
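Slides 9 and 15 describe the preference list: the coordinator plus its N-1 successors. A sketch under the same assumptions (MD5 tokens, hypothetical `"node#i"` virtual-node names, and illustrative `build_ring` and `preference_list` helpers) might walk the ring clockwise and skip virtual nodes whose physical node is already on the list, so the N replicas land on N distinct machines.

```python
import bisect
import hashlib


def md5_token(key: str) -> int:
    """Map a string to a position on a 128-bit ring using MD5."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def build_ring(nodes, vnodes_per_node=8):
    """Return a sorted list of (token, physical_node) pairs."""
    return sorted(
        (md5_token(f"{node}#{i}"), node)  # hypothetical virtual-node naming
        for node in nodes
        for i in range(vnodes_per_node)
    )


def preference_list(ring, key: str, n: int):
    """Walk clockwise from the key's position, collecting n distinct physical nodes.

    The first entry is the coordinator; the rest are the successors that hold
    replicas. Virtual nodes of an already-chosen physical node are skipped.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_right(tokens, md5_token(key))
    chosen = []
    for step in range(len(ring)):
        _, node = ring[(start + step) % len(ring)]
        if node not in chosen:
            chosen.append(node)
            if len(chosen) == n:
                break
    return chosen
```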
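Slide 10 relies on vector clocks to tell newer versions from conflicting ones. The sketch below models a clock as a dict from node name to counter; `increment`, `descends`, `reconcile` and the `MAX_ENTRIES` limit are illustrative assumptions. Instead of per-entry timestamps, it uses dict order to decide which entry is "oldest" when purging, which is a simplification.

```python
from typing import Dict, List

# A vector clock maps a node name to that node's update counter.
VectorClock = Dict[str, int]

# Assumed size limit; the value 10 is illustrative.
MAX_ENTRIES = 10


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of clock with node's counter bumped by one.

    The bumped entry is moved to the end, so dict order runs from least to
    most recently updated; that order stands in for per-entry timestamps.
    """
    new = dict(clock)
    new[node] = new.pop(node, 0) + 1
    # Purge the least recently updated entries once the clock is too large.
    while len(new) > MAX_ENTRIES:
        del new[next(iter(new))]
    return new


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if a has seen every update b has (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def reconcile(versions: List[VectorClock]) -> List[VectorClock]:
    """Drop clocks that are strict ancestors of another clock.

    More than one survivor means the writes were concurrent, and the
    application has to merge them after the read.
    """
    survivors: List[VectorClock] = []
    for v in versions:
        if any(other != v and descends(other, v) for other in versions):
            continue  # an overwritten ancestor
        if v not in survivors:
            survivors.append(v)
    return survivors
```

For example, starting from {"Sx": 2}, one write coordinated by Sy and another by Sz produce {"Sx": 2, "Sy": 1} and {"Sx": 2, "Sz": 1}. Neither descends from the other, so `reconcile` keeps both and a read would return both versions for the application to merge.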
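Slide 11's sloppy quorum and hinted handoff can be simulated in memory. In this sketch, `Node`, `sloppy_write` and `handoff` are hypothetical names, there is no networking, and a node that is "down" simply receives nothing. With N = 3 and node A down, the write goes to B and C plus D, the next healthy node past the top three, and D holds it as a hint for A.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    up: bool = True
    data: Dict[str, str] = field(default_factory=dict)
    # Hinted writes held for another node, kept apart from this node's own data:
    # (intended_recipient, key, value).
    hints: List[Tuple[str, str, str]] = field(default_factory=list)


def sloppy_write(walk: List[Node], key: str, value: str, n: int, w: int) -> bool:
    """Send a write to the first n healthy nodes found walking the preference list.

    walk is the key's full clockwise walk; its first n entries are the intended
    replicas. Each intended node that is down is replaced by the next healthy
    node beyond them, which stores the write as a hint naming the intended node.
    """
    intended_names = {node.name for node in walk[:n]}
    down = [node.name for node in walk[:n] if not node.up]
    acks = 0
    for node in walk:
        if acks == n:
            break
        if not node.up:
            continue  # skip unreachable nodes instead of failing the write
        if node.name in intended_names:
            node.data[key] = value
        else:
            node.hints.append((down.pop(0), key, value))
        acks += 1
    # The write succeeds once W nodes (intended or stand-in) have accepted it.
    return acks >= w


def handoff(nodes: Dict[str, Node]) -> None:
    """Deliver each hinted write to its intended node once that node is back up."""
    for holder in nodes.values():
        still_waiting = []
        for intended, key, value in holder.hints:
            target = nodes[intended]
            if target.up:
                target.data[key] = value
            else:
                still_waiting.append((intended, key, value))
        holder.hints = still_waiting
```

Calling `handoff` after A comes back moves the hinted write from D to A, restoring the intended replica set.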
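Slide 12's Merkle-tree comparison can be illustrated with a small fixed-shape tree. This is a sketch under assumptions: keys are bucketed into `NUM_LEAVES` leaves by MD5, SHA-256 is a swapped-in choice for the tree hashes, and `bucket_of`, `build_tree` and `diff_leaves` are hypothetical helpers. The Dynamo paper describes one tree per key range a node hosts; here a single dict stands in for one range.

```python
import hashlib
from typing import Dict, List

# Assumed number of leaf buckets per key range; must be a power of two.
NUM_LEAVES = 8


def bucket_of(key: str) -> int:
    """Assign a key to a leaf bucket by its MD5 hash."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") % NUM_LEAVES


def build_tree(store: Dict[str, str]) -> List[List[bytes]]:
    """Return the tree as levels, from the leaf hashes (index 0) up to the root."""
    buckets = [[] for _ in range(NUM_LEAVES)]
    for key, value in store.items():
        buckets[bucket_of(key)].append((key, value))
    # Sorting makes each leaf hash independent of insertion order.
    level = [hashlib.sha256(repr(sorted(b)).encode("utf-8")).digest() for b in buckets]
    levels = [level]
    while len(level) > 1:
        # Each parent hashes the concatenation of its two children.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_leaves(a, b, level=None, index=0) -> List[int]:
    """List leaf buckets where two trees differ, descending only into mismatched subtrees."""
    if level is None:
        level = len(a) - 1  # start at the root
    if a[level][index] == b[level][index]:
        return []  # identical subtree: nothing below it needs to be exchanged
    if level == 0:
        return [index]
    return (diff_leaves(a, b, level - 1, 2 * index)
            + diff_leaves(a, b, level - 1, 2 * index + 1))
```

Two replicas first compare roots and recurse only where hashes differ, so an in-sync range costs one comparison and a single divergent key leads straight to its one mismatched bucket.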
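Slide 13's gossip-based membership can be sketched as nodes repeatedly merging their views of the ring with a random peer. The data model here is assumed for illustration: each view maps a node name to a (version, tokens) pair, the newer version wins on merge, and every node bootstraps knowing only itself and a seed node. `merge`, `bootstrap` and `gossip_round` are hypothetical helpers.

```python
import random
from typing import Dict, List, Tuple

# A membership view maps node name -> (version, tokens that node owns).
View = Dict[str, Tuple[int, Tuple[int, ...]]]


def merge(mine: View, theirs: View) -> View:
    """Combine two views, keeping the higher-versioned entry for each node."""
    merged = dict(mine)
    for node, entry in theirs.items():
        if node not in merged or entry[0] > merged[node][0]:
            merged[node] = entry
    return merged


def bootstrap(names: List[str], seed: str) -> Dict[str, View]:
    """Each node starts knowing its own tokens plus a placeholder entry for the seed."""
    views: Dict[str, View] = {}
    for i, name in enumerate(names):
        view: View = {name: (1, (i * 100,))}  # hypothetical token assignment
        if name != seed:
            view[seed] = (0, ())  # version 0: the real entry arrives via gossip
        views[name] = view
    return views


def gossip_round(views: Dict[str, View], rng: random.Random) -> None:
    """Every node reconciles its view with one random peer it already knows about."""
    for node in list(views):
        peers = [p for p in views[node] if p != node and p in views]
        if not peers:
            continue
        peer = rng.choice(peers)
        merged = merge(views[node], views[peer])
        views[node] = merged
        views[peer] = dict(merged)
```

Because every newcomer contacts the seed first, the seed quickly accumulates the full view and passes it on to the rest of the ring.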
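The R + W > N rule on slides 15 and 18 is a pigeonhole argument: if a read set of R nodes and a write set of W nodes are drawn from the same N replicas and R + W > N, the two sets cannot be disjoint, so the read reaches at least one replica that took part in the write. The brute-force check below (`quorums_overlap` is an illustrative name) confirms this for small N.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Check whether every R-node read set shares a node with every W-node write set."""
    nodes = range(n)
    return all(
        set(read) & set(write)  # a non-empty intersection is truthy
        for read in combinations(nodes, r)
        for write in combinations(nodes, w)
    )
```

The Dynamo paper cites (N, R, W) = (3, 2, 2) as a common configuration. Within the R + W > N constraint, a smaller R makes reads cheaper and a smaller W makes writes more available, and each choice forces the other parameter up.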
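The write and read paths on slide 16 can be modeled with in-memory replicas. In this sketch, `Replica`, `coordinate_put` and `coordinate_get` are hypothetical names, the first reachable node on the preference list acts as the coordinator, every reachable replica acknowledges, and `context` stands for the vector clock the client obtained from an earlier read. The coordinator's local write counts toward W, which is why only W-1 remote acknowledgements are required.

```python
from typing import Dict, List, Optional, Tuple

# A stored version: (value, vector clock).
Version = Tuple[str, Dict[str, int]]


class Replica:
    def __init__(self, name: str, up: bool = True):
        self.name = name
        self.up = up
        self.store: Dict[str, Version] = {}


def coordinate_put(pref: List[Replica], key: str, value: str,
                   context: Dict[str, int], n: int, w: int) -> bool:
    """Write through the N highest-ranked reachable replicas; succeed on W-1 remote acks."""
    reachable = [rep for rep in pref if rep.up][:n]
    if not reachable:
        return False
    coordinator = reachable[0]
    # The coordinator generates the new vector clock from the client's context.
    clock = dict(context)
    clock[coordinator.name] = clock.get(coordinator.name, 0) + 1
    coordinator.store[key] = (value, clock)  # local write
    remote_acks = 0
    for replica in reachable[1:]:
        replica.store[key] = (value, clock)  # stands in for an RPC that acknowledges
        remote_acks += 1
    # The local write is one of the W, so W - 1 remote acks are enough.
    return remote_acks >= w - 1


def coordinate_get(pref: List[Replica], key: str, n: int, r: int) -> Optional[List[Version]]:
    """Read from the N highest-ranked reachable replicas and answer after R replies."""
    reachable = [rep for rep in pref if rep.up][:n]
    if len(reachable) < r:
        return None  # fewer than R replicas can answer: the read fails
    versions: List[Version] = []
    for rep in reachable[:r]:  # treat the first R replies as the read quorum
        found = rep.store.get(key)
        if found is not None and found not in versions:
            versions.append(found)
    return versions
```

A fuller version would also drop replies whose vector clocks are dominated by another reply; this sketch simply returns the distinct versions among the first R replies.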