Dynamo db



  1. Dynamo: Amazon’s Highly Available Key-value Store & Amazon DynamoDB
     Presented by: Zuhair Khayyat
  2. What is Dynamo
     ● Dynamo is an eventually consistent key-value storage system used in Amazon’s web services to support scalable, highly available data access.
     ● Dynamo is used mainly to manage the state of services, such as S3 and e-commerce.
     ● It is optimized for availability (an "always on" experience) to maximize customer satisfaction, at the expense of:
       – Data consistency
       – Durability
       – Performance
     Dynamo & DynamoDB
  3. Dynamo: Why not a relational database?
     ● Many services on Amazon’s platform that have high reliability requirements need only primary-key access to a data store.
     ● Relational databases are highly optimized for complex query processing; however, they have limited scalability and choose consistency over availability.
     ● The complex features of relational databases require expensive hardware and highly skilled administrators.
  4. Dynamo: Amazon’s Requirements
     ● Simple reads and writes of binary objects no larger than 1 MB; no operation spans multiple data items.
     ● Very fast data access: response times under 300 ms (the paper's example SLA applies this to 99.9% of requests).
     ● Heterogeneous commodity hardware infrastructure.
     ● Used by decentralized, loosely coupled services.
     ● Highly available (always on); small, frequent network and server failures are expected.
  5. Dynamo: Consistency and Replication
     ● When network failures are possible, strong data consistency and high data availability cannot be achieved simultaneously.
     ● “Dynamo is designed to be an eventually consistent data store; that is all updates reach all replicas eventually.”
     ● An “always writable” data store: write operations are not rejected even when replicas are inconsistent.
       – Imagine you are ordering from Amazon.com and the website refuses to add an item to your cart!
     ● Conflict resolution: the application is responsible for resolving data conflicts.
  6. Dynamo vs. Bigtable
     | Aspect                               | Dynamo                                            | Bigtable                                                  |
     |--------------------------------------|---------------------------------------------------|-----------------------------------------------------------|
     | Cluster setup                        | Decentralized                                     | Centralized (GFS)                                         |
     | Data access                          | (primary key, version*)                           | (row key, column key, timestamp)                          |
     | Data partitioning and load balancing | Customized consistent hashing                     | 64K partitions stored on the least-utilized machines (GFS) |
     | Data query                           | Zero-hop DHT                                      | Ask the master (GFS)                                      |
     | Read operation                       | Multiple copies read                              | Single copy read                                          |
     | Typical value size                   | Less than 1 MB                                    | Not specified (GFS)                                       |
     | Writes on inconsistent data          | Accept all write operations and resolve conflicts | Make data unavailable until consistent (GFS)              |
  7. Dynamo: Interface
     ● Key-value storage system with two operations:
       – get(key): returns a single object, or a list of objects with conflicting versions, along with a context.
       – put(key, context, object): places the object and writes its replicas to disk. The context carries metadata about the object, such as its version.
     ● MD5 hashing is applied to the key to generate a 128-bit identifier, which determines the storage nodes responsible for the key.
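The key-to-identifier step can be sketched in a few lines of Python. This is a minimal illustration, not Dynamo's code: reading the 16-byte MD5 digest as a big-endian integer is an assumption about byte order, and `key_to_ring_position` and the key "cart:alice" are hypothetical.

```python
import hashlib

RING_SIZE = 2 ** 128  # MD5 produces 128-bit digests, so positions lie in [0, 2**128)


def key_to_ring_position(key: str) -> int:
    """Hash a key with MD5 and read the 16-byte digest as a 128-bit integer."""
    digest = hashlib.md5(key.encode("utf-8")).digest()  # 16 bytes
    return int.from_bytes(digest, byteorder="big")  # assumption: big-endian reading


if __name__ == "__main__":
    position = key_to_ring_position("cart:alice")  # hypothetical key
    print(f"{position:032x}")
```

The same key always maps to the same position, so any node can compute where a key lives without asking a central directory.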
  8. Dynamo: Partitioning
     ● Dynamo is designed to scale incrementally, one machine at a time.
     ● Consistent hashing maps keys into a fixed output range that is treated as a ring.
     ● Dynamo uses a variant of consistent hashing (virtual nodes) to dynamically repartition and load-balance data across the storage hosts.
     ● Each storage host takes on data in proportion to its capacity.
  9. Dynamo: Consistent Hashing
     [Figure: a ring over the key space 1 to 80, split into ranges [1,10], [11,20], and so on up to [71,80], owned by nodes A to H. Adding a storage host (I) splits neighboring ranges, producing [41,46], [47,54] and [55,60].]
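To make the figure concrete, here is a minimal consistent-hashing ring in Python. It assumes MD5 positions (as on the interface slide) and that a key belongs to the first node clockwise from it; node names and the key set are hypothetical. The demo adds one host and shows the property the figure illustrates: only keys in the new node's range move, and they all move to the new node.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """128-bit MD5 position of a key or node name on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class ConsistentHashRing:
    """Basic consistent hashing: a key belongs to the first node clockwise from it."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted node positions on the ring
        self._owner = {}  # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        position = ring_hash(node)
        bisect.insort(self._positions, position)
        self._owner[position] = node

    def lookup(self, key: str) -> str:
        # First node position >= the key's position; wrap around past the end.
        index = bisect.bisect_left(self._positions, ring_hash(key))
        if index == len(self._positions):
            index = 0
        return self._owner[self._positions[index]]


if __name__ == "__main__":
    keys = [f"key-{i}" for i in range(1000)]
    ring = ConsistentHashRing(["A", "B", "C", "D"])
    before = {k: ring.lookup(k) for k in keys}
    ring.add_node("E")  # add one storage host
    after = {k: ring.lookup(k) for k in keys}
    moved = [k for k in keys if before[k] != after[k]]
    print(f"{len(moved)} of {len(keys)} keys moved, all to E")
```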
  10. Dynamo: Variant of Consistent Hashing
     [Figure: the same 1 to 80 ring, but each physical host appears at several positions (virtual nodes A/A*, B/B*, C/C*, D/D*, E/E*). Adding a storage host splits ranges in several places on the ring, producing [11,16], [17,24], [25,30], [41,46], [47,54] and [55,60].]
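A sketch of the virtual-node variant, again in Python and under stated assumptions: a host with more capacity simply gets more tokens (the 50/150 split and the host names are hypothetical), and each virtual-node position comes from hashing a per-token label. With about three times the tokens, the larger host should receive roughly three times as many keys.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class VirtualNodeRing:
    """Consistent hashing where each physical host owns several ring positions (tokens)."""

    def __init__(self):
        self._positions = []  # sorted virtual-node positions
        self._host_at = {}  # position -> physical host

    def add_host(self, host: str, tokens: int) -> None:
        # Hypothetical naming scheme: the i-th virtual node of a host is hashed as "host#i".
        for i in range(tokens):
            position = ring_hash(f"{host}#{i}")
            bisect.insort(self._positions, position)
            self._host_at[position] = host

    def lookup(self, key: str) -> str:
        # First virtual node clockwise from the key; the modulo wraps past the end.
        index = bisect.bisect_left(self._positions, ring_hash(key)) % len(self._positions)
        return self._host_at[self._positions[index]]


if __name__ == "__main__":
    ring = VirtualNodeRing()
    ring.add_host("small-host", tokens=50)
    ring.add_host("big-host", tokens=150)  # roughly three times the capacity
    load = Counter(ring.lookup(f"key-{i}") for i in range(10_000))
    print(load)  # big-host should hold roughly three quarters of the keys
```

Because a new host's tokens land at many scattered positions, it takes a little load from many existing hosts instead of a large slice from one neighbor.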
  11. Dynamo: Replication
     ● Each key k is assigned to a coordinator node i.
     ● Each value v is replicated at the N-1 clockwise successor nodes in the ring.
     ● Node i is responsible for updating the other N-1 replicas of the keys it owns.
     ● Each key k has a preference list: the physical nodes responsible for storing and serving that key's data. Ring positions belonging to a physical node already in the list are skipped, so the list contains only distinct physical nodes.
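The preference-list rule can be sketched as a clockwise walk that skips virtual nodes of hosts already chosen. This is an illustrative Python version assuming N = 3, eight tokens per host, and hypothetical host names and key; the first host in the returned list plays the coordinator role.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


def build_ring(tokens_per_host):
    """Return sorted virtual-node positions and a position -> physical host map."""
    host_at = {}
    for host, tokens in tokens_per_host.items():
        for i in range(tokens):
            host_at[ring_hash(f"{host}#{i}")] = host  # hypothetical "host#i" token labels
    return sorted(host_at), host_at


def preference_list(key, positions, host_at, n):
    """Walk clockwise from the key, collecting the first n distinct physical hosts."""
    if n > len(set(host_at.values())):
        raise ValueError("fewer than n physical hosts in the ring")
    start = bisect.bisect_left(positions, ring_hash(key))
    hosts = []
    for step in range(len(positions)):
        host = host_at[positions[(start + step) % len(positions)]]
        if host not in hosts:  # skip further virtual nodes of a host already chosen
            hosts.append(host)
            if len(hosts) == n:
                break
    return hosts  # hosts[0] acts as the coordinator


if __name__ == "__main__":
    positions, host_at = build_ring({"A": 8, "B": 8, "C": 8, "D": 8})
    print(preference_list("cart:alice", positions, host_at, n=3))
```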
  12. Dynamo: Data Versioning
     ● An eventual consistency protocol updates all data replicas asynchronously.
     ● put() may return before all replicas have been updated.
     ● As a result, get() can return multiple versions of the same key.
     ● Dynamo treats each modification as a new, immutable version of the data, which supports its “always writable” design.
     ● Dynamo uses vector clocks to track versions and their causal order.
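A minimal vector-clock sketch in Python shows how a coordinator stamps a new version and how two versions are compared. Clocks here are plain dicts from node name to counter, and the function names are hypothetical; the Dynamo paper describes a clock as a list of (node, counter) pairs, which this dict merely models.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced (done by the coordinator on put())."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if version a has seen every update recorded in b (component-wise a >= b)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a is newer"
    if b_ge_a:
        return "b is newer"
    return "concurrent"  # neither descends from the other: a conflict


if __name__ == "__main__":
    v1 = increment({}, "A")  # {'A': 1}
    v2 = increment(v1, "B")  # {'A': 1, 'B': 1}
    v3 = increment(v1, "C")  # {'A': 1, 'C': 1}, written without seeing v2
    print(compare(v2, v1))  # a is newer
    print(compare(v2, v3))  # concurrent
```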
  13. Dynamo: vector clocks example 1
     ● Node A writes Value=100; clock [A:1].
  14. Dynamo: vector clocks example 1
     ● Node B adds 1: Value=101; clock [A:1, B:1].
  15. Dynamo: vector clocks example 1
     ● Node C adds 4: Value=105; clock [A:1, B:1, C:1].
  16. Dynamo: vector clocks example 1
     ● Node B adds 100: Value=205; clock [A:1, B:2, C:1].
  17. Dynamo: vector clocks example 1
     ● Node C adds 110: Value=315; clock [A:1, B:2, C:2].
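  18. Dynamo: vector clocks example 2
     ● As before: A writes Value=100 [A:1]; B adds 1: Value=101 [A:1, B:1]; C adds 4: Value=105 [A:1, B:1, C:1].
     ● B adds 100 to Value=101 without seeing C's update: Value=201; clock [A:1, B:2].
     ● Adding 110 then yields two versions: Value=311 with clock [A:1, B:2, C:1] (from 201) and Value=215 with clock [A:1, B:1, C:2] (from 105).
     ● Conflict! Neither clock descends from the other, so both versions are kept.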
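Both examples can be replayed in Python by keeping only versions that no other version supersedes; whatever remains is the set of conflicting siblings a get() would return. The clocks and values are transcribed from the slides (the reading of example 2's layout is an interpretation), and `surviving_versions` is a hypothetical helper, not a Dynamo API.

```python
def descends(a, b):
    """Component-wise a >= b: version a has seen every update in b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def surviving_versions(versions):
    """Drop every version strictly dominated by another; what remains are conflicting siblings."""
    survivors = []
    for clock, value in versions:
        dominated = any(
            descends(other, clock) and not descends(clock, other)
            for other, _ in versions
        )
        if not dominated:
            survivors.append((clock, value))
    return survivors


# (clock, value) pairs as read from the slides.
EXAMPLE_1 = [
    ({"A": 1}, 100),
    ({"A": 1, "B": 1}, 101),
    ({"A": 1, "B": 1, "C": 1}, 105),
    ({"A": 1, "B": 2, "C": 1}, 205),
    ({"A": 1, "B": 2, "C": 2}, 315),
]
EXAMPLE_2 = [
    ({"A": 1}, 100),
    ({"A": 1, "B": 1}, 101),
    ({"A": 1, "B": 1, "C": 1}, 105),
    ({"A": 1, "B": 2}, 201),
    ({"A": 1, "B": 2, "C": 1}, 311),
    ({"A": 1, "B": 1, "C": 2}, 215),
]

if __name__ == "__main__":
    print([value for _, value in surviving_versions(EXAMPLE_1)])  # [315]
    print([value for _, value in surviving_versions(EXAMPLE_2)])  # [311, 215]
```

In example 1 every version descends from the previous one, so a single version survives; in example 2 two concurrent versions survive and must be reconciled.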
  19. Dynamo: Resolving Conflicts
     ● Syntactic reconciliation:
       – When a new version supersedes the older ones, the system determines the authoritative version automatically.
     ● Semantic reconciliation:
       – When versions have diverged, the client application merges the conflicting branches and may ask the user to review the merged values.
       – Example: Amazon’s shopping cart:
         ● “Add to cart” items are preserved.
         ● Deleted items can resurface.
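The shopping-cart behaviour follows from merging conflicting carts by set union. A minimal Python sketch with hypothetical item names: every added item survives the merge, but so does an item deleted in only one branch, which is how deleted items can resurface.

```python
def merge_carts(siblings):
    """Semantic reconciliation sketch: keep every item present in any conflicting cart version."""
    merged = set()
    for cart in siblings:
        merged |= cart
    return merged


if __name__ == "__main__":
    # Both branches start from the cart {"book", "lamp"} (hypothetical items).
    branch_1 = {"book", "pen"}  # user deleted "lamp" and added "pen"
    branch_2 = {"book", "lamp", "mug"}  # concurrently, "mug" was added
    print(sorted(merge_carts([branch_1, branch_2])))  # ['book', 'lamp', 'mug', 'pen']
```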
  20. Dynamo: Processing put() & get()
     ● A client can route a request in either of two ways:
       – Through a generic load balancer, which directs the request to the least-utilized node.
       – Through a partition-aware client library, which sends the request directly to one of the nodes that own the data.
     ● The system has two configurable values:
       – R: the minimum number of healthy nodes that must respond for a read to succeed.
       – W: the minimum number of healthy nodes that must acknowledge a write for it to succeed.
     ● Setting R + W > N yields a quorum-like system.
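How R and W gate an operation can be sketched in Python. The sketch assumes (N, R, W) = (3, 2, 2), the configuration the Dynamo paper reports as common, and checks by brute force that R + W > N makes every possible read set overlap every possible write set. Function names are hypothetical, and real Dynamo uses a "sloppy" quorum over the first N healthy nodes, so this strict overlap is an idealization.

```python
from itertools import combinations


def operation_succeeds(replica_responded, required):
    """Coordinator view: succeed once `required` of the N replicas have responded."""
    return sum(1 for responded in replica_responded if responded) >= required


def quorums_always_overlap(n, r, w):
    """Brute force: does every set of R replicas share a member with every set of W replicas?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


if __name__ == "__main__":
    N, R, W = 3, 2, 2
    responses = [True, False, True]  # one of three replicas is unreachable
    print(operation_succeeds(responses, W), operation_succeeds(responses, R))  # True True
    print(quorums_always_overlap(N, R, W))  # True, since R + W > N
    print(quorums_always_overlap(3, 1, 1))  # False, since R + W <= N
```

Lowering R or W makes operations faster and more tolerant of slow replicas, at the cost of possibly reading a version that misses the latest write.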
  21. Dynamo: Hinted Handoff
     ● Assuming N=3, a put() that cannot reach node A because A has failed is temporarily handled by node B.
     ● After A recovers, B sends the data from that put() back to A.
     ● Advantage: a temporary failure of A has minimal effect on the application.
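A toy Python model of hinted handoff with N = 3. It assumes the stand-in is the first healthy node outside the key's preference list (so in this sketch it is D rather than B; which node takes over depends on the ring layout), and it keeps hinted replicas in a separate list until the intended node recovers. The class, its methods, and the key are hypothetical.

```python
class Cluster:
    """Toy hinted-handoff model: replicas meant for a down node go to a stand-in with a hint."""

    def __init__(self, nodes):
        self.nodes = list(nodes)  # ring order
        self.up = {node: True for node in self.nodes}
        self.store = {node: {} for node in self.nodes}
        self.hints = {node: [] for node in self.nodes}  # stand-in -> [(intended node, key, value)]

    def fail(self, node):
        self.up[node] = False

    def put(self, key, value, preference_list):
        # Assumption: stand-ins are healthy nodes outside the preference list, in ring order.
        stand_ins = [n for n in self.nodes if n not in preference_list and self.up[n]]
        for target in preference_list:
            if self.up[target]:
                self.store[target][key] = value
            elif stand_ins:
                self.hints[stand_ins.pop(0)].append((target, key, value))

    def recover(self, node):
        """Bring the node back and deliver every hinted replica that was meant for it."""
        self.up[node] = True
        for holder in self.nodes:
            kept = []
            for intended, key, value in self.hints[holder]:
                if intended == node:
                    self.store[node][key] = value
                else:
                    kept.append((intended, key, value))
            self.hints[holder] = kept


if __name__ == "__main__":
    cluster = Cluster(["A", "B", "C", "D"])
    cluster.fail("A")
    cluster.put("cart:alice", ["book"], preference_list=["A", "B", "C"])  # N = 3
    print(cluster.hints["D"])  # D holds A's replica with a hint
    cluster.recover("A")
    print(cluster.store["A"])  # the replica has been handed back to A
```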
  22. Dynamo: Scalability
     ● Adding or removing a node requires an external tool or direct administrator action.
     ● A gossip-based protocol propagates membership changes throughout the cluster and detects failures.
     ● Replicas are synchronized using Merkle hash trees.
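Merkle-tree synchronization can be sketched in Python: each replica hashes its keys into a fixed number of buckets (key ranges), builds a binary hash tree over the buckets, and two replicas descend only into subtrees whose hashes differ. The bucket count of 8, SHA-256 as the tree hash, and all function names are assumptions for illustration; in Dynamo each node keeps a separate tree per key range it hosts.

```python
import hashlib

NUM_LEAVES = 8  # assumption: buckets per tree; must be a power of two here


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") % NUM_LEAVES


def build_tree(store):
    """Return the tree as a list of levels, leaves first and the root last."""
    buckets = [[] for _ in range(NUM_LEAVES)]
    for key, value in store.items():
        buckets[bucket_of(key)].append(f"{key}={value}")
    level = [digest("\n".join(sorted(bucket)).encode("utf-8")) for bucket in buckets]
    levels = [level]
    while len(level) > 1:
        level = [digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_buckets(tree_a, tree_b):
    """Descend from the root into children whose hashes differ; return out-of-sync bucket indexes."""
    top = len(tree_a) - 1
    frontier = [0] if tree_a[top][0] != tree_b[top][0] else []
    for depth in range(top, 0, -1):
        frontier = [
            child
            for parent in frontier
            for child in (2 * parent, 2 * parent + 1)
            if tree_a[depth - 1][child] != tree_b[depth - 1][child]
        ]
    return frontier


if __name__ == "__main__":
    replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
    replica_b = dict(replica_a, k2="stale")
    print(differing_buckets(build_tree(replica_a), build_tree(replica_b)))
```

Identical replicas are detected by comparing only the two roots, and a mismatch is localized by exchanging hashes along a single root-to-leaf path per differing bucket.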
  23. Dynamo: Peak Performance
     ● Shopping cart service during the holiday season:
       – Tens of millions of requests
       – Over 3 million checkouts in a single day
       – 100,000+ concurrent sessions
       – No downtime!
  24. DynamoDB
  25. What is DynamoDB
     ● A NoSQL database service offered publicly through Amazon Web Services (AWS); released in 2012.
     ● Based on Dynamo, a scalable, highly available key-value storage system used by Amazon’s services and published at SOSP 2007.
  26. DynamoDB: Data Model
     ● The database is a collection of tables.
     ● A table is a collection of items.
     ● An item is a collection of attributes.
     ● A primary key is required.
     ● No null values or empty strings (at the time of this presentation).
     ● No schema is required; items can vary in the number of attributes. How is this possible?
  27. DynamoDB: Example
     ● Table name: ProductCatalog
       { Id = 101
         ProductName = "Book 101 Title"
         ISBN = "111-1111111111"
         Authors = [ "Author 1", "Author 2" ]
         Price = -2
         Dimensions = "8.5 x 11.0 x 0.5"
         PageCount = 500
         InPublication = 1
         ProductCategory = "Book" }
       { Id = 201
         ProductName = "18-Bicycle 201"
         Description = "201 description"
         BicycleType = "Road"
         Brand = "Brand-Company A"
         Price = 100
         Gender = "M"
         Color = [ "Red", "Black" ]
         ProductCategory = "Bike" }
       { Id = 202
         ProductName = "21-Bicycle 202"
         Description = "202 description"
         BicycleType = "Road"
         Brand = "Brand-Company A"
         Price = 200
         Gender = "M"
         Color = [ "Green", "Black" ]
         ProductCategory = "Bike" }
  28. DynamoDB: Example
     ● Storage in Dynamo:
       – <Table_List, {ProductCatalog, ...}>
       – <ProductCatalog, {101, 102, 201, 202}>
       – <101, {ProductName={}, ISBN={}, Authors={}, ...}>
     ● or:
       – <Table_List, {ProductCatalog, ...}>
       – <ProductCatalog, {101, 102, 201, 202}>
       – <101, {ProductName, ISBN, Authors, ...}>
       – <101_Authors, {Author 1, Author 2}>
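The second layout can be sketched in Python over a plain dict standing in for the key-value store. Storing every attribute (not only multi-valued ones) under an "<Id>_<attribute>" key is an extrapolation from the slide, and `store_item` and `load_item` are hypothetical helpers; the point is that items with different attribute sets fit without any schema.

```python
def store_item(kv, table, item):
    """Flatten one item into plain key-value pairs (the slide's second layout, extended)."""
    item_id = item["Id"]
    kv.setdefault("Table_List", set()).add(table)  # <Table_List, {ProductCatalog, ...}>
    kv.setdefault(table, set()).add(item_id)  # <ProductCatalog, {101, 201, ...}>
    names = [name for name in item if name != "Id"]
    kv[str(item_id)] = names  # <101, {ProductName, ISBN, Authors, ...}>
    for name in names:
        kv[f"{item_id}_{name}"] = item[name]  # e.g. <101_Authors, [Author 1, Author 2]>


def load_item(kv, item_id):
    """Reassemble an item from its attribute list and per-attribute keys."""
    item = {"Id": item_id}
    for name in kv[str(item_id)]:
        item[name] = kv[f"{item_id}_{name}"]
    return item


if __name__ == "__main__":
    kv = {}
    book = {"Id": 101, "ProductName": "Book 101 Title", "Authors": ["Author 1", "Author 2"]}
    bike = {"Id": 201, "ProductName": "18-Bicycle 201", "Color": ["Red", "Black"]}
    store_item(kv, "ProductCatalog", book)
    store_item(kv, "ProductCatalog", bike)
    print(kv["101"], kv["201"])  # different attribute sets, no schema needed
```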
  29. DynamoDB: Table Primary Keys
     ● A table in DynamoDB must have a primary key.
     ● A primary key is either “hash only” or “hash and range”.
     ● DynamoDB uses an unsorted hash index on the hash attribute and a sorted range index on the range attribute.
     ● A hash-only primary key consists of a single attribute.
     ● A hash-and-range primary key consists of two attributes.
     ● Data types:
       – Scalar types: Number, String, and Binary.
       – Multi-valued types: String Set, Number Set, and Binary Set.
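A toy Python model of a hash-and-range primary key: items are grouped by the hash attribute with no ordering between groups, and within a group they are kept sorted by the range attribute, so a range condition becomes a binary search. The thread/date example and the class name are hypothetical; this does not describe DynamoDB's actual storage engine.

```python
import bisect
from collections import defaultdict


class HashRangeTable:
    """Hash attribute -> list of (range value, item) kept sorted by range value."""

    def __init__(self):
        self._groups = defaultdict(list)  # no ordering between hash values

    def put(self, hash_value, range_value, item):
        rows = self._groups[hash_value]
        index = bisect.bisect_left([r for r, _ in rows], range_value)
        if index < len(rows) and rows[index][0] == range_value:
            rows[index] = (range_value, item)  # same (hash, range) key: overwrite
        else:
            rows.insert(index, (range_value, item))

    def query(self, hash_value, low, high):
        """Items with the given hash value and low <= range value <= high, in range order."""
        rows = self._groups.get(hash_value, [])
        range_values = [r for r, _ in rows]
        start = bisect.bisect_left(range_values, low)
        end = bisect.bisect_right(range_values, high)
        return [item for _, item in rows[start:end]]


if __name__ == "__main__":
    replies = HashRangeTable()
    replies.put("Thread 1", "2012-01-20", "third reply")
    replies.put("Thread 1", "2012-01-05", "first reply")
    replies.put("Thread 1", "2012-01-10", "second reply")
    replies.put("Thread 2", "2012-01-07", "other thread")
    print(replies.query("Thread 1", "2012-01-01", "2012-01-15"))  # ['first reply', 'second reply']
```

A hash-only key corresponds to groups that each hold a single item; the range attribute is what makes ordered queries within one hash value possible.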
  30. DynamoDB: Read Operation
     ● Availability and durability are maintained through data replication.
     ● Propagating a write to every replica takes time; DynamoDB eventually synchronizes all replicas.
     ● DynamoDB supports two kinds of read:
       – Eventually consistent read
         ● May not reflect the most recent write.
         ● Very fast data access; less affected by failures.
       – Strongly consistent read
         ● Always reflects the most recent successful write.
         ● Must wait for up-to-date data, so it is slower and can be affected by network and storage failures.
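The difference between the two reads can be modelled in a few lines of Python. This is a toy under a stated assumption (one replica always holds the latest write and the others lag until a background sync); it is not how DynamoDB is implemented, and the class is hypothetical. In the real service the choice is made per request, for example with the `ConsistentRead` parameter of `GetItem`.

```python
import random


class ReplicatedItem:
    """Toy model: replica 0 applies each write at once; the others catch up on sync()."""

    def __init__(self, initial, replicas=3):
        self.values = [initial] * replicas

    def write(self, value):
        self.values[0] = value  # assumption: one replica is always up to date

    def sync(self):
        self.values = [self.values[0]] * len(self.values)  # background propagation

    def read(self, consistent=False, rng=random):
        if consistent:
            return self.values[0]  # strongly consistent: ask the up-to-date replica
        return rng.choice(self.values)  # eventually consistent: any replica, maybe stale


if __name__ == "__main__":
    item = ReplicatedItem("v0")
    item.write("v1")
    print(item.read(consistent=True))  # v1
    print(item.read())  # v0 or v1, depending on which replica answers
    item.sync()
    print(item.read())  # v1
```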
  31. DynamoDB: Similar Services
     ● Datastore on Google App Engine
     ● Cloudant Data Layer (CouchDB)
  32. DynamoDB: try it today