Dynamo db

Transcript

  • 1. Dynamo: Amazon’s Highly Available Key-value Store & Amazon DynamoDB. Presented by: Zuhair Khayyat
  • 2. What is Dynamo
      ● Dynamo is an eventually consistent key-value storage system used in Amazon’s web services to support scalable, highly available data access.
      ● Dynamo is used mainly to manage the state of services such as S3 and e-commerce.
      ● Optimized for availability (an “always-on” experience) to maximize customer satisfaction, at the expense of:
          – Data consistency
          – Durability
          – Performance
      Dynamo & DynamoDB
  • 3. Dynamo: Why not a relational database
      ● Many services on Amazon’s platform that have high reliability requirements need only primary-key access to a data store.
      ● Relational databases are highly optimized for complex query processing; however, they have limited scalability and choose consistency over availability.
      ● The complicated features of relational databases require expensive hardware and highly skilled administrators.
  • 4. Dynamo: Amazon’s Requirements
      ● Simple reads and writes to binary objects no larger than 1 MB; no operation spans multiple data items.
      ● Very fast data access (< 300 ms response time).
      ● Heterogeneous commodity hardware infrastructure.
      ● Used by decentralized, loosely coupled services.
      ● Highly available (always on); expect small, frequent network and server failures.
  • 5. Dynamo: Consistency and Replication
      ● Strong data consistency and high data availability cannot be achieved simultaneously.
      ● “Dynamo is designed to be an eventually consistent data store; that is all updates reach all replicas eventually.”
      ● An “always writable” data store: write operations are not rejected even if the data is inconsistent.
          – Imagine you are ordering from Amazon.com and the website refuses to add an item to your cart!
      ● Conflict resolution: the application is responsible for resolving data conflicts.
  • 6. Dynamo VS Bigtable
      Aspect                               | Dynamo                                             | Bigtable
      Cluster setup                        | Decentralized                                      | Centralized (GFS)
      Data access                          | (primary key, version*)                            | (row key, col key, timestamp)
      Data partitioning and load balancing | Customized consistent hashing                      | 64K partitions stored in least utilized machines (GFS)
      Data query                           | Zero-hop DHT                                       | Ask the master (GFS)
      Read operation                       | Multiple copies read                               | Single copy read
      Typical value size                   | Less than 1 MB                                     | Not specified (GFS)
      Write operations on inconsistent data | Accept all write operations and resolve conflicts | Make data unavailable until consistent (GFS)
  • 7. Dynamo: Interface
      ● Key-value storage system with two operators:
          – get(key): returns a single object, or a list of objects with conflicting versions.
          – put(key, context, object): places the object and writes its replicas to disk. The context contains information about the object, such as its version.
      ● MD5 hashing is applied to the key to generate a 128-bit identifier.
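The key-hashing step on this slide can be sketched as follows. This is a minimal illustration using Python's standard `hashlib`; the function name `key_to_ring_position` and the sample key are hypothetical and not part of Dynamo's interface.

```python
import hashlib


def key_to_ring_position(key: bytes) -> int:
    """Map a key to a 128-bit identifier by hashing it with MD5."""
    digest = hashlib.md5(key).digest()  # MD5 always yields 16 bytes (128 bits)
    return int.from_bytes(digest, "big")


position = key_to_ring_position(b"cart:12345")
print(f"{position:032x}")  # the identifier as 32 hex digits
```

The same key always maps to the same identifier, which is what lets any node locate the owner of a key without a central directory.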
  • 8. Dynamo: Partitioning
      ● Dynamo is designed to scale incrementally, one machine at a time.
      ● Consistent hashing maps keys onto a fixed output space that is treated as a ring.
      ● Dynamo uses a variant of consistent hashing (virtual nodes) to dynamically repartition and load-balance the data across the storage hosts.
      ● Each storage host takes on an amount of data that depends on its capacity.
  • 9. Dynamo: Consistent Hashing
      [Figure: a hash ring of eight storage hosts, each owning one key range: A [1,10], D [11,20], E [21,30], B [31,40], F [41,50], C [51,60], G [61,70], H [71,80]. After adding a node (storage host) I with range [47,54], F shrinks to [41,46] and C shrinks to [55,60].]
  • 10. Dynamo: Variant of Consistent Hashing
      [Figure: the same ring with virtual nodes, where each physical host appears at two positions (for example A and A*). Adding a storage host E inserts two virtual nodes, E with range [17,24] and E* with range [47,54], each taking over part of its neighbors' ranges.]
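The partitioning scheme on these slides can be sketched as a small ring with virtual nodes. This is a simplified illustration, not Dynamo's implementation: the class `HashRing`, the `vnodes_per_unit` parameter, and the way virtual-node positions are derived from the host name are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Place a string on the ring using MD5 (128-bit output space)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Consistent-hash ring in which each host owns several virtual nodes."""

    def __init__(self, vnodes_per_unit: int = 8):
        self.vnodes_per_unit = vnodes_per_unit
        self._positions = []   # sorted virtual-node positions
        self._owner = {}       # position -> physical host name

    def add_host(self, host: str, capacity: int = 1) -> None:
        # A host with more capacity gets proportionally more virtual nodes.
        for i in range(self.vnodes_per_unit * capacity):
            pos = ring_hash(f"{host}#{i}")
            if pos in self._owner:      # extremely unlikely hash collision
                continue
            bisect.insort(self._positions, pos)
            self._owner[pos] = host

    def lookup(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring has no hosts")
        # First virtual node clockwise from the key, wrapping past the end.
        idx = bisect.bisect_left(self._positions, ring_hash(key)) % len(self._positions)
        return self._owner[self._positions[idx]]


ring = HashRing()
for name in ["A", "B", "C", "D"]:
    ring.add_host(name)
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add_host("E")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

Only keys that now fall into one of E's ranges change owner; every other key stays where it was, which is the property that makes one-machine-at-a-time scaling cheap.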
  • 11. Dynamo: Replication
      ● Each key (k) is assigned to a coordinator node (i).
      ● Each value (v) is replicated to the (N-1) clockwise successor logical nodes in the ring.
      ● Node (i) is responsible for updating the other (N-1) replicas of the keys it owns.
      ● Each key (k) has a preference list of physical nodes that are responsible for maintaining and serving the key’s data.
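A preference list can be sketched by walking the ring clockwise from the key and collecting distinct physical hosts. The function `preference_list` and the `RING` layout below are hypothetical; the tokens are illustrative values in the style of the ring figures, with each token taken as the upper end of a virtual node's range.

```python
import bisect


def preference_list(ring, key_position, n):
    """Return the first n distinct physical hosts clockwise from key_position.

    ring is a list of (token, host) pairs sorted by token; a host may appear
    several times because it owns several virtual nodes.
    """
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, key_position)
    hosts = []
    for step in range(len(ring)):
        _, host = ring[(start + step) % len(ring)]
        if host not in hosts:   # skip further virtual nodes of the same host
            hosts.append(host)
            if len(hosts) == n:
                break
    return hosts


# Illustrative ring: four physical hosts, two virtual nodes each.
RING = [(10, "A"), (20, "D"), (30, "C"), (40, "B"),
        (50, "A"), (60, "C"), (70, "B"), (80, "D")]

# The coordinator comes first, followed by the N-1 replica holders.
print(preference_list(RING, 25, 3))
```

Skipping repeated hosts matters because two adjacent virtual nodes may belong to the same machine, and replicas on one machine would not survive its failure.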
  • 12. Dynamo: Data Versioning
      ● An eventual consistency protocol is used to update all data replicas asynchronously.
      ● put() returns before all replicas have been updated.
      ● get() can return multiple versions for the same key.
      ● Dynamo tracks each data mutation as a new version to support the “write always” protocol.
      ● Dynamo uses vector clocks for versioning.
  • 13. Dynamo: vector clocks example 1
      [Figure, step 1: node A writes Value=100 with clock (A:1); nodes B and C have not written yet.]
  • 14. Dynamo: vector clocks example 1
      [Figure, step 2: B applies +1, giving Value=101 with clock (A:1,B:1).]
  • 15. Dynamo: vector clocks example 1
      [Figure, step 3: C applies +4, giving Value=105 with clock (A:1,B:1,C:1).]
  • 16. Dynamo: vector clocks example 1
      [Figure, step 4: B applies +100, giving Value=205 with clock (A:1,B:2,C:1).]
  • 17. Dynamo: vector clocks example 1
      [Figure, step 5: C applies +110, giving Value=315 with clock (A:1,B:2,C:2).]
  • 18. Dynamo: vector clocks example 2
      [Figure: the first three steps are the same as in example 1. B then applies +100 to its own version 101, giving Value=201 with clock (A:1,B:2). The +110 update yields Value=311 with clock (A:1,B:2,C:1) on one branch and Value=215 with clock (A:1,B:1,C:2) on the other. Neither clock dominates the other. Conflict!]
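The comparison behind the two examples can be sketched with vector clocks stored as plain dictionaries. The helper names `increment`, `descends`, and `compare` are hypothetical; the clock values are the ones shown on the slides.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two clocks as 'equal', 'a_newer', 'b_newer', or 'conflict'."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a_newer"
    if b_ge_a:
        return "b_newer"
    return "conflict"


# Example 1: every write builds on the latest version.
v105 = {"A": 1, "B": 1, "C": 1}
v205 = increment(v105, "B")               # {"A": 1, "B": 2, "C": 1}
v315 = increment(v205, "C")               # {"A": 1, "B": 2, "C": 2}

# Example 2: two writes build on different versions.
v201 = increment({"A": 1, "B": 1}, "B")   # {"A": 1, "B": 2}
v215 = increment(v105, "C")               # {"A": 1, "B": 1, "C": 2}
v311 = increment(v201, "C")               # {"A": 1, "B": 2, "C": 1}

print(compare(v315, v205))   # example 1: one version supersedes the other
print(compare(v311, v215))   # example 2: concurrent versions
```

In example 1 the newer clock is at least as large in every entry, so the older version can be discarded. In example 2 each clock is ahead in one entry, so both versions must be kept and returned by get().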
  • 19. Dynamo: resolving conflicts
      ● Syntactic reconciliation:
          – The system resolves the conflict automatically when one version’s vector clock subsumes the other.
      ● Semantic reconciliation:
          – The application merges the conflicting versions; the user may have to revise the merged values.
          – Example: Amazon’s shopping cart:
              ● “Add to cart” items are preserved.
              ● Deleted items can resurface.
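The shopping-cart behavior can be illustrated with a union merge. This is a deliberately simple sketch: the function `merge_carts` and the representation of a cart as a set of item names are assumptions for the example.

```python
def merge_carts(versions):
    """Merge conflicting cart versions by taking the union of their items."""
    merged = set()
    for cart in versions:
        merged |= cart
    return merged


# Two concurrent versions derived from the cart {"book", "pen"}:
version_1 = {"book", "pen", "lamp"}   # the user added a lamp
version_2 = {"book"}                  # the user deleted the pen

print(sorted(merge_carts([version_1, version_2])))
```

The added lamp is preserved, but the pen deleted in `version_2` comes back because it is still present in `version_1`. That is the "deleted items can resurface" trade-off from the slide.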
  • 20. Dynamo: Processing put() & get()
      ● The user can issue commands in either of the following ways:
          – A generic load balancer directs the user’s requests to the least-utilized node.
          – A partition-aware library directs the request to one of the data owners directly.
      ● The system requires two configurable values:
          – R: the number of available healthy nodes required for a successful read.
          – W: the number of available healthy nodes required for a successful write.
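How R and W interact with the replication factor N can be sketched as below. The `QuorumConfig` class and helper functions are hypothetical. The R + W > N condition is not stated on the slide; it is the usual quorum-overlap rule described in the Dynamo paper and is included here as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    n: int  # number of replicas per key
    r: int  # healthy nodes required for a successful read
    w: int  # healthy nodes required for a successful write

    def __post_init__(self):
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("R and W must be between 1 and N")

    def reads_overlap_writes(self) -> bool:
        # If R + W > N, every read set shares at least one node with every write set.
        return self.r + self.w > self.n


def write_succeeds(cfg: QuorumConfig, healthy_replicas: int) -> bool:
    return min(healthy_replicas, cfg.n) >= cfg.w


def read_succeeds(cfg: QuorumConfig, healthy_replicas: int) -> bool:
    return min(healthy_replicas, cfg.n) >= cfg.r


cfg = QuorumConfig(n=3, r=2, w=2)
print(cfg.reads_overlap_writes(), write_succeeds(cfg, 2), write_succeeds(cfg, 1))
```

Lowering R or W makes the corresponding operation succeed with fewer healthy nodes, at the cost of weaker overlap between reads and writes.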
  • 21. Dynamo: Hinted Handoff
      ● Assuming N=3, a failed put() operation on node A is temporarily handled by B.
      ● After A recovers, B sends the result of the put() operation back to A.
      ● Advantage: a temporary failure has minimal effect on the application.
      [Figure: diagram of nodes A, B, C, and D.]
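The handoff described on this slide can be sketched with two in-memory nodes. Everything here is a toy model: the `Node` class, the `hints` list, and the two helper functions are hypothetical names, and real failure detection is replaced by an `up` flag.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> value stored on this node
        self.hints = []   # (intended owner name, key, value) held for others


def put_with_handoff(owner, fallback, key, value):
    """Write to the owner, or park the write on the fallback if the owner is down."""
    if owner.up:
        owner.data[key] = value
    else:
        fallback.hints.append((owner.name, key, value))


def deliver_hints(fallback, nodes_by_name):
    """Send parked writes back to owners that have recovered."""
    remaining = []
    for owner_name, key, value in fallback.hints:
        owner = nodes_by_name[owner_name]
        if owner.up:
            owner.data[key] = value
        else:
            remaining.append((owner_name, key, value))
    fallback.hints = remaining


a, b = Node("A"), Node("B")
nodes = {"A": a, "B": b}

a.up = False                                  # A fails
put_with_handoff(a, b, "cart:1", ["book"])    # B keeps the write with a hint
a.up = True                                   # A recovers
deliver_hints(b, nodes)                       # B hands the write back to A
print(a.data, b.hints)
```

The client's put() is accepted even while A is down, which is why the failure has little visible effect on the application.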
  • 22. Dynamo: Scalability
      ● Adding or removing a node requires a third-party tool or direct user interaction.
      ● A gossip-based protocol is used to propagate membership changes throughout the cluster and to detect failures.
      ● Replica synchronization is done using Merkle hash trees.
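Replica comparison with a Merkle tree can be sketched as below. This is a minimal illustration: the function `merkle_root`, the use of SHA-256, and the `key=value` leaf encoding are assumptions for the example rather than details from the slides.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(items: dict) -> bytes:
    """Compute a Merkle root over a replica's key-value pairs, ordered by key."""
    level = [_h(f"{k}={v}".encode("utf-8")) for k, v in sorted(items.items())]
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2:              # duplicate the last hash on odd levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


replica_1 = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_2 = {"k3": "v3", "k1": "v1", "k2": "v2"}      # same data, different order
replica_3 = {"k1": "v1", "k2": "stale", "k3": "v3"}   # one divergent value

print(merkle_root(replica_1) == merkle_root(replica_2))
print(merkle_root(replica_1) == merkle_root(replica_3))
```

Two replicas with equal roots need no further data exchange. When the roots differ, comparing child hashes level by level narrows the search to the divergent keys, a step this sketch leaves out.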
  • 23. Dynamo: Peak Performance
      ● Shopping Cart Service during a holiday:
          – 10 million requests
          – 3 million checkouts
          – 100,000+ concurrent sessions
          – No downtime!
  • 24. DynamoDB
  • 25. What is DynamoDB
      ● A NoSQL database service publicly available through Amazon Web Services (AWS); released in 2012.
      ● Based on Dynamo, a scalable, highly available (key, value) storage system used by Amazon’s servers; published at SOSP 2007.
  • 26. DynamoDB: Data Model
      ● The database is a collection of tables.
      ● A table is a collection of items.
      ● An item is a collection of attributes.
      ● A primary key is required.
      ● No nulls or empty strings.
      ● No schema is required; items can vary in their number of attributes. How is this possible?
  • 27. DynamoDB: Example
      ● Table name: ProductCatalog
        {
          Id = 101
          ProductName = "Book 101 Title"
          ISBN = "111-1111111111"
          Authors = [ "Author 1", "Author 2" ]
          Price = -2
          Dimensions = "8.5 x 11.0 x 0.5"
          PageCount = 500
          InPublication = 1
          ProductCategory = "Book"
        }
        {
          Id = 201
          ProductName = "18-Bicycle 201"
          Description = "201 description"
          BicycleType = "Road"
          Brand = "Brand-Company A"
          Price = 100
          Gender = "M"
          Color = [ "Red", "Black" ]
          ProductCategory = "Bike"
        }
        {
          Id = 202
          ProductName = "21-Bicycle 202"
          Description = "202 description"
          BicycleType = "Road"
          Brand = "Brand-Company A"
          Price = 200
          Gender = "M"
          Color = [ "Green", "Black" ]
          ProductCategory = "Bike"
        }
  • 28. DynamoDB: Example
      ● Storage in Dynamo:
          – <Table_List, {ProductCatalog, ...}>
          – <ProductCatalog, {101, 102, 201, 202}>
          – <101, {ProductName={}, ISBN={}, Authors={}, ...}>
        or
          – <Table_List, {ProductCatalog, ...}>
          – <ProductCatalog, {101, 102, 201, 202}>
          – <101, {ProductName, ISBN, Authors, ...}>
          – <101_Authors, {Author 1, Author 2}>
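The first storage layout on this slide can be sketched by flattening a table into plain key-value pairs. This follows the slide's own conjecture about how tables might map onto Dynamo; the function `flatten_tables` is hypothetical, and the scheme assumes item ids are unique across tables.

```python
def flatten_tables(tables):
    """Flatten {table: {item_id: attributes}} into a single key-value mapping."""
    kv = {"Table_List": set(tables)}
    for table_name, items in tables.items():
        kv[table_name] = set(items)           # table name -> ids of its items
        for item_id, attributes in items.items():
            kv[item_id] = dict(attributes)    # item id -> its attributes
    return kv


tables = {
    "ProductCatalog": {
        101: {"ProductName": "Book 101 Title", "ProductCategory": "Book"},
        201: {"ProductName": "18-Bicycle 201", "ProductCategory": "Bike",
              "Color": ["Red", "Black"]},
    }
}

kv = flatten_tables(tables)
print(kv["ProductCatalog"], kv[101]["ProductCategory"])
```

Because each item is stored as its own value, items in the same table can carry different attributes, which is one way a schema-free table could sit on top of a key-value store.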
  • 29. DynamoDB: Table Primary Keys
      ● A table in DynamoDB must have a primary key.
      ● A primary key can be either “hash only” or “hash and range”.
      ● DynamoDB uses an unsorted hash index, while the range index is sorted.
      ● A hash-only primary key is based on a single attribute.
      ● A hash-and-range primary key is based on two attributes.
      ● Data types:
          – Scalar data types: Number, String, and Binary.
          – Multi-valued types: String Set, Number Set, and Binary Set.
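The difference between the unsorted hash index and the sorted range index can be sketched in memory. This is a toy model and not DynamoDB's storage engine: the class `HashRangeTable` and its methods are hypothetical.

```python
import bisect
from collections import defaultdict


class HashRangeTable:
    """Items grouped by hash key (unsorted) and ordered by range key (sorted)."""

    def __init__(self):
        # hash key -> list of (range key, item), kept sorted by range key
        self._partitions = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        keys = [rk for rk, _ in rows]
        i = bisect.bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)        # same primary key: overwrite
        else:
            rows.insert(i, (range_key, item))

    def query(self, hash_key, low, high):
        """Return items with the given hash key and low <= range key <= high."""
        rows = self._partitions.get(hash_key, [])
        keys = [rk for rk, _ in rows]
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [item for _, item in rows[start:end]]


table = HashRangeTable()
table.put("user1", 3, "third")
table.put("user1", 1, "first")
table.put("user1", 2, "second")
table.put("user2", 1, "other")
print(table.query("user1", 2, 3))
```

The hash key only locates a partition, so it supports exact-match lookups; the sorted range key within a partition is what makes range queries possible.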
  • 30. DynamoDB: Read operation
      ● Availability and durability are maintained through data replication.
      ● Updating all the replicas after a data mutation takes some time; DynamoDB eventually synchronizes all the replicas.
      ● DynamoDB supports two read operations:
          – Eventually consistent read
              ● Does not necessarily reflect the last data mutation.
              ● Very fast data access; not affected by failures.
          – Consistent read
              ● Always reflects the last data mutation.
              ● Waits for the data to be consistent in all replicas; affected by network and storage failures.
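The choice between the two read modes is made per request through the `ConsistentRead` parameter of the DynamoDB GetItem operation. The sketch below only builds the request and does not contact AWS; the helper `build_get_item_request` is hypothetical, and the table and key reuse the ProductCatalog example.

```python
def build_get_item_request(table_name, key, consistent=False):
    """Build keyword arguments for a DynamoDB GetItem call."""
    return {
        "TableName": table_name,
        "Key": key,
        "ConsistentRead": consistent,   # False requests an eventually consistent read
    }


request = build_get_item_request(
    "ProductCatalog", {"Id": {"N": "101"}}, consistent=True
)
print(request)

# With AWS credentials configured and boto3 installed, the request could be sent as:
#   import boto3
#   response = boto3.client("dynamodb").get_item(**request)
```

Eventually consistent reads are the default, so a consistent read has to be requested explicitly.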
  • 31. DynamoDB: Similar services
      ● Datastore on Google App Engine
      ● Cloudant Data Layer (CouchDB)
  • 32. DynamoDB: try it today
