SlideShare a Scribd company logo
1 of 19
Dynamo: Amazon’s Highly
Available Key-value Store
Giuseppe DeCandia, Deniz Hastorun,
Madan Jampani, Gunavardhan Kakulapati,
Avinash Lakshman, Alex Pilchin, Swaminathan
Sivasubramanian, Peter Vosshall
and Werner Vogels
Motivation
 Build a distributed storage system:
 Scale
 Simple: key-value
 Highly available
 Guarantee Service Level Agreements (SLA)
System Assumptions and Requirements
 Query Model: simple read and write operations to a data
item that is uniquely identified by a key.
 ACID Properties: Atomicity, Consistency, Isolation,
Durability.
 Efficiency: latency requirements which are in general
measured at the 99.9th percentile of the distribution.
 Other Assumptions: operation environment is assumed
to be non-hostile and there are no security related requirements
such as authentication and authorization.
Service Level Agreements (SLA)
 Application can deliver its
functionality in abounded
time: Every dependency in the
platform needs to deliver its
functionality with even tighter
bounds.
 Example: service guaranteeing
that it will provide a response within
300ms for 99.9% of its requests for a
peak client load of 500 requests per
second.
Service-oriented architecture of
Amazon’s platform
Design Consideration
 Sacrifice strong consistency for availability
 Conflict resolution is executed during read
instead of write, i.e. “always writeable”.
 Other principles:
 Incremental scalability.
 Symmetry.
 Decentralization.
 Heterogeneity.
Summary of techniques used in Dynamo
and their advantages
Problem Technique Advantage
Partitioning Consistent Hashing Incremental Scalability
High Availability for writes
Vector clocks with reconciliation
during reads
Version size is decoupled from
update rates.
Handling temporary failures Sloppy Quorum and hinted handoff Provides high availability and
durability guarantee when some of
the replicas are not available.
Recovering from permanent
failures
Anti-entropy using Merkle trees
Synchronizes divergent replicas in
the background.
Membership and failure detection
Gossip-based membership protocol
and failure detection.
Preserves symmetry and avoids
having a centralized registry for
storing membership and node
liveness information.
Partition Algorithm
 Consistent hashing: the output
range of a hash function is treated as a
fixed circular space or “ring”.
 ”Virtual Nodes”: Each node can
be responsible for more than one
virtual node.
Advantages of using virtual nodes
 If a node becomes unavailable the
load handled by this node is evenly
dispersed across the remaining
available nodes.
 When a node becomes available
again, the newly available node
accepts a roughly equivalent
amount of load from each of the
other available nodes.
 The number of virtual nodes that a
node is responsible can decided
based on its capacity, accounting
for heterogeneity in the physical
infrastructure.
Replication
 Each data item is
replicated at N hosts.
 “preference list”: The list of
nodes that is responsible
for storing a particular key.
Data Versioning
 A put() call may return to its caller before the
update has been applied at all the replicas
 A get() call may return many versions of the
same object.
 Challenge: an object having distinct version sub-histories,
which the system will need to reconcile in the future.
 Solution: uses vector clocks in order to capture causality
between different versions of the same object.
Vector Clock
 A vector clock is a list of (node, counter)
pairs.
 Every version of every object is associated
with one vector clock.
 If the counters on the first object’s clock are
less-than-or-equal to all of the nodes in the
second clock, then the first is an ancestor of
the second and can be forgotten.
Vector clock example
Execution of get () and put ()
operations
1. Route its request through a generic load
balancer that will select a node based on
load information.
2. Use a partition-aware client library that
routes requests directly to the appropriate
coordinator nodes.
Sloppy Quorum
 R/W is the minimum number of nodes that
must participate in a successful read/write
operation.
 Setting R + W > N yields a quorum-like
system.
 In this model, the latency of a get (or put)
operation is dictated by the slowest of the R
(or W) replicas. For this reason, R and W are
usually configured to be less than N, to
provide better latency.
Hinted handoff
 Assume N = 3. When A
is temporarily down or
unreachable during a
write, send replica to D.
 D is hinted that the
replica is belong to A and
it will deliver to A when A
is recovered.
 Again: “always writeable”
Other techniques
 Replica synchronization:
 Merkle hash tree.
 Membership and Failure Detection:
 Gossip
Implementation
 Java
 Local persistence component allows for
different storage engines to be plugged in:
 Berkeley Database (BDB) Transactional Data
Store: object of tens of kilobytes
 MySQL: object of > tens of kilobytes
 BDB Java Edition, etc.
Evaluation
Evaluation

More Related Content

Similar to Dynamo.ppt

Fundamentals Of Transaction Systems - Part 1: Causality banishes Acausality ...
Fundamentals Of Transaction Systems - Part 1: Causality banishes Acausality ...Fundamentals Of Transaction Systems - Part 1: Causality banishes Acausality ...
Fundamentals Of Transaction Systems - Part 1: Causality banishes Acausality ...Valverde Computing
 
Cassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User GroupCassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User GroupAdam Hutson
 
An introduction to_rac_system_test_planning_methods
An introduction to_rac_system_test_planning_methodsAn introduction to_rac_system_test_planning_methods
An introduction to_rac_system_test_planning_methodsAjith Narayanan
 
Talon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategyTalon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategySaptarshi Chatterjee
 
Distributed Algorithms
Distributed AlgorithmsDistributed Algorithms
Distributed Algorithms913245857
 
A Distributed Control Law for Load Balancing in Content Delivery Networks
A Distributed Control Law for Load Balancing in Content Delivery NetworksA Distributed Control Law for Load Balancing in Content Delivery Networks
A Distributed Control Law for Load Balancing in Content Delivery NetworksSruthi Kamal
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...DataStax Academy
 
You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]
You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]
You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]Chris Suszyński
 
Apache Cassandra - Drivers deep dive
Apache Cassandra - Drivers deep diveApache Cassandra - Drivers deep dive
Apache Cassandra - Drivers deep diveAlex Thompson
 
REEF: Towards a Big Data Stdlib
REEF: Towards a Big Data StdlibREEF: Towards a Big Data Stdlib
REEF: Towards a Big Data StdlibDataWorks Summit
 
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DEMC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DEAravind NC
 
RAC - The Savior of DBA
RAC - The Savior of DBARAC - The Savior of DBA
RAC - The Savior of DBANikhil Kumar
 

Similar to Dynamo.ppt (20)

Fundamentals Of Transaction Systems - Part 1: Causality banishes Acausality ...
Fundamentals Of Transaction Systems - Part 1: Causality banishes Acausality ...Fundamentals Of Transaction Systems - Part 1: Causality banishes Acausality ...
Fundamentals Of Transaction Systems - Part 1: Causality banishes Acausality ...
 
Cassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User GroupCassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User Group
 
Cl306
Cl306Cl306
Cl306
 
An introduction to_rac_system_test_planning_methods
An introduction to_rac_system_test_planning_methodsAn introduction to_rac_system_test_planning_methods
An introduction to_rac_system_test_planning_methods
 
NoSql Database
NoSql DatabaseNoSql Database
NoSql Database
 
Cassandra
CassandraCassandra
Cassandra
 
Talon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategyTalon systems - Distributed multi master replication strategy
Talon systems - Distributed multi master replication strategy
 
I0935053
I0935053I0935053
I0935053
 
Distributed Algorithms
Distributed AlgorithmsDistributed Algorithms
Distributed Algorithms
 
A Distributed Control Law for Load Balancing in Content Delivery Networks
A Distributed Control Law for Load Balancing in Content Delivery NetworksA Distributed Control Law for Load Balancing in Content Delivery Networks
A Distributed Control Law for Load Balancing in Content Delivery Networks
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
 
You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]
You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]
You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]
 
Apache Cassandra - Drivers deep dive
Apache Cassandra - Drivers deep diveApache Cassandra - Drivers deep dive
Apache Cassandra - Drivers deep dive
 
GARUDA
GARUDAGARUDA
GARUDA
 
Oracle Coherence
Oracle CoherenceOracle Coherence
Oracle Coherence
 
REEF: Towards a Big Data Stdlib
REEF: Towards a Big Data StdlibREEF: Towards a Big Data Stdlib
REEF: Towards a Big Data Stdlib
 
Failover cluster
Failover clusterFailover cluster
Failover cluster
 
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DEMC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
 
No sql (not only sql)
No sql                 (not only sql)No sql                 (not only sql)
No sql (not only sql)
 
RAC - The Savior of DBA
RAC - The Savior of DBARAC - The Savior of DBA
RAC - The Savior of DBA
 

Recently uploaded

(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...ranjana rawat
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Recently uploaded (20)

(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

Dynamo.ppt

  • 1. Dynamo: Amazon’s Highly Available Key-value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels
  • 2. Motivation  Build a distributed storage system:  Scale  Simple: key-value  Highly available  Guarantee Service Level Agreements (SLA)
  • 3. System Assumptions and Requirements  Query Model: simple read and write operations to a data item that is uniquely identified by a key.  ACID Properties: Atomicity, Consistency, Isolation, Durability.  Efficiency: latency requirements which are in general measured at the 99.9th percentile of the distribution.  Other Assumptions: operation environment is assumed to be non-hostile and there are no security related requirements such as authentication and authorization.
  • 4. Service Level Agreements (SLA)  Application can deliver its functionality in abounded time: Every dependency in the platform needs to deliver its functionality with even tighter bounds.  Example: service guaranteeing that it will provide a response within 300ms for 99.9% of its requests for a peak client load of 500 requests per second. Service-oriented architecture of Amazon’s platform
  • 5. Design Consideration  Sacrifice strong consistency for availability  Conflict resolution is executed during read instead of write, i.e. “always writeable”.  Other principles:  Incremental scalability.  Symmetry.  Decentralization.  Heterogeneity.
  • 6. Summary of techniques used in Dynamo and their advantages Problem Technique Advantage Partitioning Consistent Hashing Incremental Scalability High Availability for writes Vector clocks with reconciliation during reads Version size is decoupled from update rates. Handling temporary failures Sloppy Quorum and hinted handoff Provides high availability and durability guarantee when some of the replicas are not available. Recovering from permanent failures Anti-entropy using Merkle trees Synchronizes divergent replicas in the background. Membership and failure detection Gossip-based membership protocol and failure detection. Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information.
  • 7. Partition Algorithm  Consistent hashing: the output range of a hash function is treated as a fixed circular space or “ring”.  ”Virtual Nodes”: Each node can be responsible for more than one virtual node.
  • 8. Advantages of using virtual nodes  If a node becomes unavailable the load handled by this node is evenly dispersed across the remaining available nodes.  When a node becomes available again, the newly available node accepts a roughly equivalent amount of load from each of the other available nodes.  The number of virtual nodes that a node is responsible can decided based on its capacity, accounting for heterogeneity in the physical infrastructure.
  • 9. Replication  Each data item is replicated at N hosts.  “preference list”: The list of nodes that is responsible for storing a particular key.
  • 10. Data Versioning  A put() call may return to its caller before the update has been applied at all the replicas  A get() call may return many versions of the same object.  Challenge: an object having distinct version sub-histories, which the system will need to reconcile in the future.  Solution: uses vector clocks in order to capture causality between different versions of the same object.
  • 11. Vector Clock  A vector clock is a list of (node, counter) pairs.  Every version of every object is associated with one vector clock.  If the counters on the first object’s clock are less-than-or-equal to all of the nodes in the second clock, then the first is an ancestor of the second and can be forgotten.
  • 13. Execution of get () and put () operations 1. Route its request through a generic load balancer that will select a node based on load information. 2. Use a partition-aware client library that routes requests directly to the appropriate coordinator nodes.
  • 14. Sloppy Quorum  R/W is the minimum number of nodes that must participate in a successful read/write operation.  Setting R + W > N yields a quorum-like system.  In this model, the latency of a get (or put) operation is dictated by the slowest of the R (or W) replicas. For this reason, R and W are usually configured to be less than N, to provide better latency.
  • 15. Hinted handoff  Assume N = 3. When A is temporarily down or unreachable during a write, send replica to D.  D is hinted that the replica is belong to A and it will deliver to A when A is recovered.  Again: “always writeable”
  • 16. Other techniques  Replica synchronization:  Merkle hash tree.  Membership and Failure Detection:  Gossip
  • 17. Implementation  Java  Local persistence component allows for different storage engines to be plugged in:  Berkeley Database (BDB) Transactional Data Store: object of tens of kilobytes  MySQL: object of > tens of kilobytes  BDB Java Edition, etc.