DynamoDB
Usage Rights: CC Attribution-NonCommercial-ShareAlike License

DynamoDB Presentation Transcript

  • 1. DynamoDB: Data Example
      Table: 'Waldo-Scores'
        userId   date        value  unlockedAchievments
        hadr-fb  18-07-2012  72     ['10 days', '2 levels day']
        hadr-fb  19-07-2012  1      None
        hadr-fb  20-07-2012  56789  ['top 10 progress']
      Table: 'Players'
        Id    platform  Name     JoinDate    Score
        hadr  fb        Hadrien  31-02-2011  10 457
        hadr  G+        Hadrien  18-07-2012  357
        pior  fb        Pior     12-12-2012  18 951
  • 2. Data types (Lean...)
      Types
        single: string (UTF-8), number (between 10^-128 and 10^+126)
        set: string (UTF-8), number
      Constraints
        no "Embedded Documents"
        no complex types (dates, ...)
  • 3. Sizing 1/2: Big picture
      Units
        units = accesses/s × roundUp(KB per item)
        provisioning updates are... constraining
      Storage
        tables are "elastic"
        64 KB max per item
        overhead = 100 bytes per item
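The unit formula and storage overhead from this slide can be sketched in a few lines. This is only an estimate under the slide's own figures (item size rounded up to whole KB, 64 KB item limit, 100 bytes of overhead per item); the function names are made up for illustration and are not part of any AWS library.

```python
import math

ITEM_OVERHEAD_BYTES = 100    # per-item storage overhead quoted on the slide
MAX_ITEM_BYTES = 64 * 1024   # 64 KB item size limit quoted on the slide


def capacity_units(accesses_per_second, item_size_bytes):
    """Units = accesses/s * roundUp(item size in KB), as stated on the slide."""
    if item_size_bytes > MAX_ITEM_BYTES:
        raise ValueError("item exceeds the 64 KB limit")
    size_kb = max(1, math.ceil(item_size_bytes / 1024))
    return accesses_per_second * size_kb


def storage_bytes(item_count, avg_item_bytes):
    """Storage estimate: raw data plus 100 bytes of overhead per item."""
    return item_count * (avg_item_bytes + ITEM_OVERHEAD_BYTES)
```

For example, 50 accesses per second on 1500-byte items round up to 2 KB each, so `capacity_units(50, 1500)` asks for 100 units.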
  • 4. Sizing 2/2: Traps and constraints
      TRAPS
        Units are divided among the partitions.
        Bigger tables often mean higher throughput.
        Divide tables?
      CONSTRAINTS for throughput
        absolute: min 5, max 10 000
        only 1 table in UPDATING state at a time
        increase: min 10%, max 100%
        decrease: min 10%, max once a day
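The throughput constraints listed on this slide can be checked before sending a provisioning update. The sketch below encodes only the limits quoted here (5 to 10 000 units, increases of 10% to 100%, decreases of at least 10%); it does not model the "once a day" rule or the single UPDATING table rule, and the helper names are hypothetical.

```python
MIN_UNITS = 5
MAX_UNITS = 10_000


def is_valid_update(current, new):
    """Return True if changing `current` units to `new` respects the slide's limits."""
    if not (MIN_UNITS <= new <= MAX_UNITS):
        return False
    if new > current:
        # Increase: at least +10%, at most +100% (integer math avoids float rounding).
        return new * 10 >= current * 11 and new <= current * 2
    if new < current:
        # Decrease: at least -10%.
        return new * 10 <= current * 9
    return False  # no change is not an update


def min_updates_to_reach(current, target):
    """Lower bound on the number of updates needed to scale up, doubling at most each time."""
    steps = 0
    while current < target:
        current *= 2
        steps += 1
    return steps
```

Because an increase is capped at 100%, going from 100 to 1000 units takes at least four successive updates, which is one reason the slide calls provisioning updates constraining.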
  • 5. Integrated Service 1/3: IAM
      API level
      table level (except for "ListTables")
      Example: "Fair" Scores table use
        {
          "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:DeleteItem",
                       "dynamodb:PutItem",
                       "dynamodb:UpdateItem",
                       "dynamodb:GetItem",
                       "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:<region>:<account>:table/Scores"
          }]
        }
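A policy like the one on this slide can also be generated per table. The helper below is a hypothetical convenience, not an AWS API; it assumes the usual lowercase `dynamodb` prefix for actions and ARNs, and the region and account values in the usage line are placeholders.

```python
import json


def table_policy(region, account, table, actions):
    """Build an IAM policy document allowing `actions` on a single table."""
    return {
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:" + action for action in actions],
            "Resource": "arn:aws:dynamodb:%s:%s:table/%s" % (region, account, table),
        }]
    }


# Placeholder region and account id, for illustration only.
read_only = table_policy("us-east-1", "123456789012", "Scores", ["GetItem", "Query"])
policy_json = json.dumps(read_only, indent=2)
```

Restricting the action list to `GetItem` and `Query` gives a read-only variant of the "fair use" policy above.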
  • 6. Integrated Service 2/3: CloudWatch
      Metrics
        SuccessfulRequestLatency
        UserErrors
        SystemErrors
        ThrottledRequests
        ConsumedReadCapacityUnits
        ConsumedWriteCapacityUnits
        ReturnedItemCount
      Metric's context
        Table
        Operation ({Put, Delete, Update, Get, BatchGet}Item, Scan, Query)
  • 7. Integrated Service 3/3: EMR
      out of the scope of this presentation
      basically, Hive integrated with DynamoDB => HiveQL
      use cases
        custom index generation
        export to S3 (backup, data removal)
        data analysis / aggregation
  • 8. Data access 1/3: GetItem
      Fastest: primary key(s)
      0-1 item
      Cost = 1 unit
      Example: 'Hadrien' Player of 'fb' platform
        import boto
        conn = boto.connect_dynamodb()  # credentials come from the boto config
        table = conn.get_table('Players')
        item = table.get_item(
            hash_key='hadr',
            range_key='fb'
        )
  • 9. Data access 2/3: Query
      Fast
      primary key
      range key conditions: =, <, >, <=, >=, startsWith
      0+ item(s)
      Cost = 1 unit per returned item
      Example: All 'Waldo-Scores' of 'hadr-fb' Player
        # conn: a boto DynamoDB connection, e.g. boto.connect_dynamodb()
        table = conn.get_table('Waldo-Scores')
        items = table.query(
            hash_key='hadr-fb',
            #range_key_condition=
        )
  • 10. Data access 3/3: Scan
      Slooooow
      filter on any key
      tests ALL the table!
      0+ item(s)
      Cost = 1/2 unit for each parsed KB! => Starvations
      Use case: get a full (small) table. Ex: 'powerups'
      Example: All days where 'hadr-*' did better than 100
        # conn: a boto DynamoDB connection, e.g. boto.connect_dynamodb()
        from boto.dynamodb.condition import BEGINS_WITH, GT
        table = conn.get_table('Waldo-Scores')
        items = table.scan(
            scan_filter={
                'userId': BEGINS_WITH('hadr-'),
                'value': GT(100)
            })
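Why GetItem and Query are fast while Scan is slow can be illustrated without touching AWS. The toy class below is an in-memory model, not the DynamoDB client: `ToyTable` and its methods are invented for this sketch, and the rows are the 'Waldo-Scores' rows from the first slide. GetItem and Query go through key lookups, while Scan has to examine every item.

```python
from collections import defaultdict

SCORES = [
    {"userId": "hadr-fb", "date": "18-07-2012", "value": 72},
    {"userId": "hadr-fb", "date": "19-07-2012", "value": 1},
    {"userId": "hadr-fb", "date": "20-07-2012", "value": 56789},
]


class ToyTable:
    """In-memory stand-in for a table with a hash key and a range key."""

    def __init__(self, items, hash_attr, range_attr):
        self.items = list(items)
        self.range_attr = range_attr
        self.by_key = {}                   # (hash, range) -> item
        self.by_hash = defaultdict(list)   # hash -> items sharing that hash key
        for item in self.items:
            self.by_key[(item[hash_attr], item[range_attr])] = item
            self.by_hash[item[hash_attr]].append(item)

    def get_item(self, hash_key, range_key):
        """Direct lookup on the full primary key: 0 or 1 item."""
        return self.by_key.get((hash_key, range_key))

    def query(self, hash_key, range_condition=None):
        """Lookup by hash key, optionally filtered on the range key."""
        items = self.by_hash.get(hash_key, [])
        if range_condition is None:
            return list(items)
        return [i for i in items if range_condition(i[self.range_attr])]

    def scan(self, predicate):
        """Test every item in the table; also report how many were examined."""
        examined = 0
        matches = []
        for item in self.items:
            examined += 1
            if predicate(item):
                matches.append(item)
        return matches, examined


table = ToyTable(SCORES, "userId", "date")
```

A scan such as `table.scan(lambda i: i["userId"].startswith("hadr-") and i["value"] > 100)` returns one match here but still examines all three items, which is the behaviour that the per-parsed-KB cost penalises on a large table.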
  • 11. Performance considerations: non-indexed data 1/2
      De-normalisation
        Ex: Waldo and Players tables :)
        big picture: duplicate data to fit the needs of each view
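One way to read the Waldo/Players example is that a score item carries everything its view needs, so displaying it never requires a second lookup. The sketch below assumes the 'hadr-fb' userId is built from the Players Id and platform, which the slides suggest but do not state; `playerName` is a hypothetical duplicated attribute added for illustration.

```python
def score_item(player, date, value):
    """Build a de-normalised 'Waldo-Scores' item from a 'Players' item."""
    return {
        # Assumed composite key: Players Id + platform, e.g. 'hadr-fb'.
        "userId": "%s-%s" % (player["Id"], player["platform"]),
        "date": date,
        "value": value,
        # Hypothetical duplicated attribute, copied so a score view
        # can show the name without reading the 'Players' table.
        "playerName": player["Name"],
    }


hadrien = {"Id": "hadr", "platform": "fb", "Name": "Hadrien"}
item = score_item(hadrien, "18-07-2012", 72)
```

The trade-off is the usual one for duplication: reads get cheaper, but a renamed player has to be updated in every copy.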
  • 12. Performance considerations: non-indexed data 2/2
      Scan
        sloooooow (sequential)
        (bad) unit consumption (sequential)
      EMR
        scales (less slow :p)
        (better) unit consumption (parallel)
      TL;DR: Index your data!
  • 13. Eventual vs strong consistency
      write => propagation ~1s
      read => may not be up to date...
        Consistency  Applications  Cost (Units)  Performance
        strong       critical      1 per KB      good
        eventual     aware         1/2 per KB    maximal
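The cost column of this slide translates directly into how many reads a given provisioning can sustain. The sketch below uses only the slide's figures (1 unit per KB for strong reads, 1/2 unit per KB for eventual reads) and assumes item sizes round up to whole KB; the function name is made up for illustration.

```python
import math

UNITS_PER_KB = {"strong": 1.0, "eventual": 0.5}  # cost column of the slide


def reads_per_second(provisioned_units, item_size_bytes, consistency="eventual"):
    """Reads per second that `provisioned_units` can sustain for items of this size."""
    size_kb = max(1, math.ceil(item_size_bytes / 1024))
    return provisioned_units / (size_kb * UNITS_PER_KB[consistency])
```

With 100 provisioned units and 1 KB items, this gives 100 strongly consistent reads per second or 200 eventually consistent ones, so applications that tolerate slightly stale data get twice the throughput for the same price.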
  • 14. Critical/specific applications
      Redundancy/backup
        managed => no need
        "~ Snapshot" => EMR + S3
      ~ Transactions
        conditional operations (idempotent)
        atomic counter (idempotent BUT strong consistency)
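The "conditional operations (idempotent)" point can be illustrated with a compare-and-set write. This is an in-memory sketch of the idea, not DynamoDB's API: `ToyStore`, `put_if` and `ConditionFailed` are hypothetical names, and a real conditional write would be expressed through the service's own expected-value parameters.

```python
class ConditionFailed(Exception):
    """Raised when the stored value does not match the expected one."""


class ToyStore:
    """In-memory key/value store with a conditional write."""

    def __init__(self):
        self.items = {}

    def put_if(self, key, new_value, expected):
        """Write `new_value` only if the current value equals `expected`.

        `expected=None` means the key must not exist yet.
        """
        if self.items.get(key) != expected:
            raise ConditionFailed(key)
        self.items[key] = new_value


store = ToyStore()
store.put_if("hadr-fb", 72, expected=None)     # first write succeeds
store.put_if("hadr-fb", 56789, expected=72)    # update succeeds: 72 was stored
```

Replaying either call after it has succeeded raises `ConditionFailed` instead of applying the change a second time, which is what makes a retried conditional write safe.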
  • 15. API 1/3: Read
        Method        Consistency      Description        Returns
        GetItem       eventual/strong  load by key        0-1 item
        BatchGetItem  eventual/strong  same, in parallel  0-100 items, 0-1 MB
        Query         eventual/strong  rangeKey filter    0+ items, 0-1 MB
        Scan          eventual         any key filter     0+ items, 0-1 MB
      rule: 0-1 filter per eligible key
      unprocessed => 'UnprocessedKeys', 'LastEvaluatedKey'
      consumed units => 'ConsumedCapacityUnits'
      enforce strong consistency => 'ConsistentRead'
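Since Query and Scan return at most 1 MB per call, a full result has to be collected page by page using 'LastEvaluatedKey'. The loop below is a sketch of that pattern against a stub: `fetch_page` and `make_fake_pager` are hypothetical stand-ins for a real request, assumed to return a dict with 'Items' and, when more data remains, 'LastEvaluatedKey'.

```python
def fetch_all(fetch_page):
    """Collect every item by following 'LastEvaluatedKey' until it is absent."""
    items = []
    start_key = None
    while True:
        page = fetch_page(start_key)
        items.extend(page["Items"])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return items


def make_fake_pager(data, page_size):
    """Stub that pages through a list; the 'key' is simply the next list index."""
    def fetch_page(start_key):
        start = 0 if start_key is None else start_key
        page = {"Items": data[start:start + page_size]}
        if start + page_size < len(data):
            page["LastEvaluatedKey"] = start + page_size
        return page
    return fetch_page
```

With a real client, `fetch_page` would issue the Query or Scan and pass the previous 'LastEvaluatedKey' back as the exclusive start key of the next call.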
  • 16. API 2/3: Edit
        Method          Operation             Condition  Changes
        PutItem         create/replace        yes        1 item
        DeleteItem      delete                yes        1 item, 0-1 MB
        BatchWriteItem  create/update/delete  no         1-25 items
        UpdateItem      create/update/delete  yes        1+ field, 1 item
      not processed / failure => 'UnprocessedItems'
      condition failed => 'ConditionalCheckFailed'
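The 25-item limit and the 'UnprocessedItems' response mean a bulk write needs chunking plus a retry loop. The sketch below shows that pattern against a stub: `send_batch` is a hypothetical callable standing in for a BatchWriteItem request, assumed to return the items it could not process.

```python
MAX_BATCH = 25  # BatchWriteItem limit quoted on the slide


def chunks(items, size=MAX_BATCH):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def batch_write_all(items, send_batch, max_rounds=5):
    """Write items in batches, resubmitting whatever comes back unprocessed.

    Returns the items still unprocessed after `max_rounds` passes.
    """
    pending = list(items)
    for _ in range(max_rounds):
        if not pending:
            break
        unprocessed = []
        for batch in chunks(pending):
            unprocessed.extend(send_batch(batch))
        pending = unprocessed
    return pending
```

In production one would normally add a delay between rounds, since unprocessed items usually signal that the provisioned write throughput has been exceeded.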
  • 17. API 3/3: Structure
        Method         Asynchronous  Description
        CreateTable    yes           Create table, provision units
        DeleteTable    yes           self-explanatory
        DescribeTable  no            Read size, status, throughput
        ListTables     no            Get tables starting with "..."
        UpdateTable    yes           Update provisioning
      A "DELETING" table might answer requests until deleted
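Because table creation, update and deletion are asynchronous, callers typically poll the table status until it settles. The helper below is a sketch of such a loop: `describe` is a hypothetical callable standing in for a DescribeTable request that returns the status string, and the attempt count and delay are arbitrary defaults.

```python
import time


def wait_for_status(describe, wanted="ACTIVE", attempts=30, delay=2.0, sleep=time.sleep):
    """Poll `describe()` until it returns `wanted`; give up after `attempts` tries."""
    for _ in range(attempts):
        if describe() == wanted:
            return True
        sleep(delay)
    return False
```

Passing `sleep` as a parameter keeps the loop testable: a test can supply a no-op instead of actually waiting.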
  • 18. TL;DR
      Let's make it short :)
      Amazon
        scalable
        fully integrated
      Constraints
        throughput provisioning
        index matters