Under the Covers of DynamoDB                Philip Fitzsimons    Manager, Solution Architecture
Overview 1. Getting started 2. Customer story: Royal Opera House 3. Data modeling 4. Partitioning 5. Reporting & Analytics
Getting started
DynamoDB is a managedNoSQL database service.Store and retrieve any amount of data.  Serve any level of request traffic.
Without the operational burden.
Consistent, predictable performance.        Single digit millisecond latency.         Backed on solid-state drives.
Flexible data model.Key/attribute pairs. No schema required.    Easy to create. Easy to adjust.
Seamless scalability.No table size limits. Unlimited storage.            No downtime.
Durable.             Consistent, disk only writes.Replication across data centers and availability zones.
Without the operational burden.
Focus on your app.
Two decisions + three clicks     = ready for use
Level of throughput        Primary keysTwo decisions + three clicks     = ready for use
Level of throughput        Primary keysTwo decisions + three clicks     = ready for use
Provisioned throughput.Reserve IOPS for reads and writes.  Scale up for down at any time.
Pay per capacity unit.Priced per hour of provisioned throughput.
Write throughput.Size of item x writes per second   $0.0065 for 10 write units
Consistent writes.        Atomic increment and decrement.Optimistic concurrency control: conditional writes.
Transactions.    Item level transactions only.Puts, updates and deletes are ACID.
Strong or eventual consistency Read throughput.
Strong or eventual consistency           Read throughput.Provisioned units = size of item x reads per second          $0.0...
Strong or eventual consistency           Read throughput.Provisioned units = size of item x reads per second              ...
Strong or eventual consistency Read throughput.  Same latency expectations. Mix and match at ‘read time’.
Customer storyRoyal Opera House
Royal Opera House     Rob Greig       CTO      @rob_greig
StrategyInnovation usingopen data andcollaborative working       Web 3.0?
>600 pages            Inconsistent structure…and this doesn’t include microsites
Manual interlinkingPoor discoverability
Coherent and consistent linking      throughout the complete           production lifecycle
iPad
Open Data
Customer Service
DynamoDB
Provisioned throughput ismanaged by DynamoDB.
Data is partitioned andmanaged by DynamoDB.
Reserved capacity.Up to 53% for 1 year reservation.Up to 76% for 3 year reservation.
Authentication.   Session based to minimize latency.Uses the Amazon Security Token Service.        Handled by AWS SDKs.   ...
Monitoring.             CloudWatch metrics:latency, consumed read and write throughput,             errors and throttling.
Libraries, mappers and mocks.  ColdFusion, Django, Erlang, Java, .Net,     Node.js, Perl, PHP, Python, Ruby         http:/...
Data Modeling
date = 2012-05-16-id = 100        09-00-10        total = 25.00           date = 2012-05-15-id = 101        15-00-11      ...
Table           date = 2012-05-16-id = 100        09-00-10        total = 25.00           date = 2012-05-15-id = 101      ...
date = 2012-05-16-id = 100        09-00-10        total = 25.00                                                 Item      ...
date = 2012-05-16-id = 100        09-00-10         total = 25.00                                Attribute           date =...
Where is the schema?Tables do not require a formal schema.  Items are an arbitrarily sized hash.
Indexing.Items are indexed by primary and secondary keys.        Primary keys can be composite.    Secondary keys index on...
ID                Date                 Totalid = 100   date = 2012-05-16-09-00-10   total = 25.00id = 101   date = 2012-05...
Hash key   ID                Date                 Totalid = 100   date = 2012-05-16-09-00-10   total = 25.00id = 101   dat...
Hash key                       Range key   ID                              Date               Total           Composite pr...
Hash key          Range key             Secondary range key   ID                Date                      Totalid = 100   ...
Programming DynamoDB.  Small but perfectly formed API.
CreateTable           PutItemUpdateTable           GetItemDeleteTable        UpdateItemDescribeTable       DeleteItemListT...
CreateTable           PutItemUpdateTable           GetItemDeleteTable        UpdateItemDescribeTable       DeleteItemListT...
CreateTable           PutItemUpdateTable           GetItemDeleteTable        UpdateItemDescribeTable       DeleteItemListT...
Conditional updates.PutItem, UpdateItem, DeleteItem can take     optional conditions for operation.UpdateItem performs ato...
One API call, multiple items       BatchGet returns multiple items by key.BatchWrite performs up to 25 put or delete opera...
CreateTable           PutItemUpdateTable           GetItemDeleteTable        UpdateItemDescribeTable       DeleteItemListT...
Query vs ScanQuery for Composite Key queries.Scan for full table scans, exports. Both support pages and limits.Maximum res...
Query patterns   Retrieve all items by hash key.        Range key conditions:==, <, >, >=, <=, begins with, between.  Coun...
EXAMPLE 1:Mapping relationships.
Players user_id =   location =    joined =   mza       Cambridge    2011-07-04 user_id =   location =    joined =  jeffbar...
Players user_id =   location =     joined =   mza       Cambridge     2011-07-04 user_id =   location =     joined =  jeff...
Players user_id =   location =     joined =   mza       Cambridge     2011-07-04 user_id =   location =     joined =  jeff...
Players user_id =   location =     joined =   mza       Cambridge     2011-07-04 user_id =   location =     joined =      ...
Players user_id =   location =     joined =   mza       Cambridge     2011-07-04 user_id =   location =     joined =      ...
EXAMPLE 2:Storing large items.
Unlimited storage.Unlimited attributes per item. Unlimited items per table. Maximum of 64k per item.
Split across items.                                        message =message_id = 1          part = 1                      ...
Store a pointer to S3.                                 message =message_id = 1                        http://s3.amazonaws....
EXAMPLE 3:Time series data
April                Hot and cold tables.        event_id =          timestamp =       key =          1000          2013-0...
December   January   February   March   April
Archive data.  Move old data to S3: lower cost.    Still available for analytics.Run queries across hot and cold data     ...
Partitioning
Uniform workload.         Data stored across multiple partitions.       Data is primarily distributed by primary key.Provi...
To achieve and maintain full provisioned throughput, spreadworkload evenly across hash keys.
Non-Uniform workload.Might be throttled, even at high levels of throughput.
BEST PRACTICE 1:Distinct values for hash keys.   Hash key elements should have a    high number of distinct values.
Lots of users with unique user_id.Workload well distributed across hash key.user_id =        first_name =     last_name = ...
BEST PRACTICE 2:Avoid limited hash key values.    Hash key elements should have a     high number of distinct values.
Small number of status codes.Unevenly, non-uniform workload.status =                   date =  200                2012-04-...
BEST PRACTICE 3:Model for even distribution.Access by hash key value should be evenly     distributed across the dataset.
Large number of devices.Small number which are much more popular than others.           Workload unevenly distributed.    ...
Sample access pattern.Workload randomized by hash key.  mobile_id =          access_date =    100.1           2012-04-01-0...
Reporting & Analytics
Seamless scale.Scalable methods for data processing.Scalable methods for backup/restore.
Amazon Elastic MapReduce.    Managed Hadoop service for     data-intensive workflows.       aws.amazon.com/emr
create external table items_db (id string, votes bigint, views bigint) stored by org.apache.hadoop.hive.dynamodb.DynamoDBS...
select id, likes, viewsfrom items_dborder by views desc;
Summary 1. Getting started 2. Customer story: Royal Opera House 3. Data modeling 4. Partitioning 5. Reporting & Analytics
Free tier.
aws.amazon.com/dynamodb
Thank you!philipfs@amazon.co.uk
Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Upcoming SlideShare
Loading in...5
×

Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

3,519

Published on

In this presentation you'll learn about the decisions that went into designing and building Amazon DynamoDB, and how it allows you to stay focused on your application while enjoying single digit latencies at any scale. We'll dive deep on how to model data, maintain maximum throughput, and drive analytics against your data, while profiling real world use cases, tips and tricks from customers running on Amazon DynamoDB today.

Phil Fitzsimons, Solution Architect, AWS
Rob Greig, CTO, Royal Opera House

Published in: Technology
0 Comments
6 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
3,519
On Slideshare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
70
Comments
0
Likes
6
Embeds 0
No embeds

No notes for slide

Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

  1. 1. Under the Covers of DynamoDB Philip Fitzsimons Manager, Solution Architecture
  2. 2. Overview 1. Getting started 2. Customer story: Royal Opera House 3. Data modeling 4. Partitioning 5. Reporting & Analytics
  3. 3. Getting started
  4. 4. DynamoDB is a managedNoSQL database service.Store and retrieve any amount of data. Serve any level of request traffic.
  5. 5. Without the operational burden.
  6. 6. Consistent, predictable performance. Single digit millisecond latency. Backed on solid-state drives.
  7. 7. Flexible data model.Key/attribute pairs. No schema required. Easy to create. Easy to adjust.
  8. 8. Seamless scalability.No table size limits. Unlimited storage. No downtime.
  9. 9. Durable. Consistent, disk only writes.Replication across data centers and availability zones.
  10. 10. Without the operational burden.
  11. 11. Focus on your app.
  12. 12. Two decisions + three clicks = ready for use
  13. 13. Level of throughput Primary keysTwo decisions + three clicks = ready for use
  14. 14. Level of throughput Primary keysTwo decisions + three clicks = ready for use
  15. 15. Provisioned throughput.Reserve IOPS for reads and writes. Scale up for down at any time.
  16. 16. Pay per capacity unit.Priced per hour of provisioned throughput.
  17. 17. Write throughput.Size of item x writes per second $0.0065 for 10 write units
  18. 18. Consistent writes. Atomic increment and decrement.Optimistic concurrency control: conditional writes.
  19. 19. Transactions. Item level transactions only.Puts, updates and deletes are ACID.
  20. 20. Strong or eventual consistency Read throughput.
  21. 21. Strong or eventual consistency Read throughput.Provisioned units = size of item x reads per second $0.0065 per hour for 50 units
  22. 22. Strong or eventual consistency Read throughput.Provisioned units = size of item x reads per second 2 $0.0065 per hour for 100 units
  23. 23. Strong or eventual consistency Read throughput. Same latency expectations. Mix and match at ‘read time’.
  24. 24. Customer storyRoyal Opera House
  25. 25. Royal Opera House Rob Greig CTO @rob_greig
  26. 26. StrategyInnovation usingopen data andcollaborative working Web 3.0?
  27. 27. >600 pages Inconsistent structure…and this doesn’t include microsites
  28. 28. Manual interlinkingPoor discoverability
  29. 29. Coherent and consistent linking throughout the complete production lifecycle
  30. 30. iPad
  31. 31. Open Data
  32. 32. Customer Service
  33. 33. DynamoDB
  34. 34. Provisioned throughput ismanaged by DynamoDB.
  35. 35. Data is partitioned andmanaged by DynamoDB.
  36. 36. Reserved capacity.Up to 53% for 1 year reservation.Up to 76% for 3 year reservation.
  37. 37. Authentication. Session based to minimize latency.Uses the Amazon Security Token Service. Handled by AWS SDKs. Integrates with IAM.
  38. 38. Monitoring. CloudWatch metrics:latency, consumed read and write throughput, errors and throttling.
  39. 39. Libraries, mappers and mocks. ColdFusion, Django, Erlang, Java, .Net, Node.js, Perl, PHP, Python, Ruby http://j.mp/dynamodb-libs
  40. 40. Data Modeling
  41. 41. date = 2012-05-16-id = 100 09-00-10 total = 25.00 date = 2012-05-15-id = 101 15-00-11 total = 35.00 date = 2012-05-16-id = 101 12-00-10 total = 100.00
  42. 42. Table date = 2012-05-16-id = 100 09-00-10 total = 25.00 date = 2012-05-15-id = 101 15-00-11 total = 35.00 date = 2012-05-16-id = 101 12-00-10 total = 100.00
  43. 43. date = 2012-05-16-id = 100 09-00-10 total = 25.00 Item date = 2012-05-15-id = 101 15-00-11 total = 35.00 date = 2012-05-16-id = 101 12-00-10 total = 100.00
  44. 44. date = 2012-05-16-id = 100 09-00-10 total = 25.00 Attribute date = 2012-05-15-id = 101 15-00-11 total = 35.00 date = 2012-05-16-id = 101 12-00-10 total = 100.00
  45. 45. Where is the schema?Tables do not require a formal schema. Items are an arbitrarily sized hash.
  46. 46. Indexing.Items are indexed by primary and secondary keys. Primary keys can be composite. Secondary keys index on other attributes.
  47. 47. ID Date Totalid = 100 date = 2012-05-16-09-00-10 total = 25.00id = 101 date = 2012-05-15-15-00-11 total = 35.00id = 101 date = 2012-05-16-12-00-10 total = 100.00id = 102 date = 2012-03-20-18-23-10 total = 20.00id = 102 date = 2012-03-20-18-23-10 total = 120.00
  48. 48. Hash key ID Date Totalid = 100 date = 2012-05-16-09-00-10 total = 25.00id = 101 date = 2012-05-15-15-00-11 total = 35.00id = 101 date = 2012-05-16-12-00-10 total = 100.00id = 102 date = 2012-03-20-18-23-10 total = 20.00id = 102 date = 2012-03-20-18-23-10 total = 120.00
  49. 49. Hash key Range key ID Date Total Composite primary keyid = 100 date = 2012-05-16-09-00-10 total = 25.00id = 101 date = 2012-05-15-15-00-11 total = 35.00id = 101 date = 2012-05-16-12-00-10 total = 100.00id = 102 date = 2012-03-20-18-23-10 total = 20.00id = 102 date = 2012-03-20-18-23-10 total = 120.00
  50. 50. Hash key Range key Secondary range key ID Date Totalid = 100 date = 2012-05-16-09-00-10 total = 25.00id = 101 date = 2012-05-15-15-00-11 total = 35.00id = 101 date = 2012-05-16-12-00-10 total = 100.00id = 102 date = 2012-03-20-18-23-10 total = 20.00id = 102 date = 2012-03-20-18-23-10 total = 120.00
  51. 51. Programming DynamoDB. Small but perfectly formed API.
  52. 52. CreateTable PutItemUpdateTable GetItemDeleteTable UpdateItemDescribeTable DeleteItemListTables BatchGetItemQuery BatchWriteItemScan
  53. 53. CreateTable PutItemUpdateTable GetItemDeleteTable UpdateItemDescribeTable DeleteItemListTables BatchGetItemQuery BatchWriteItemScan
  54. 54. CreateTable PutItemUpdateTable GetItemDeleteTable UpdateItemDescribeTable DeleteItemListTables BatchGetItemQuery BatchWriteItemScan
  55. 55. Conditional updates.PutItem, UpdateItem, DeleteItem can take optional conditions for operation.UpdateItem performs atomic increments.
  56. 56. One API call, multiple items BatchGet returns multiple items by key.BatchWrite performs up to 25 put or delete operations. Throughput is measured by IO, not API calls.
  57. 57. CreateTable PutItemUpdateTable GetItemDeleteTable UpdateItemDescribeTable DeleteItemListTables BatchGetItemQuery BatchWriteItemScan
  58. 58. Query vs ScanQuery for Composite Key queries.Scan for full table scans, exports. Both support pages and limits.Maximum response is 1Mb in size.
  59. 59. Query patterns Retrieve all items by hash key. Range key conditions:==, <, >, >=, <=, begins with, between. Counts. Top and bottom n values. Paged responses.
  60. 60. EXAMPLE 1:Mapping relationships.
  61. 61. Players user_id = location = joined = mza Cambridge 2011-07-04 user_id = location = joined = jeffbarr Seattle 2012-01-20 user_id = location = joined = werner Worldwide 2011-05-15
  62. 62. Players user_id = location = joined = mza Cambridge 2011-07-04 user_id = location = joined = jeffbarr Seattle 2012-01-20 user_id = location = joined = werner Worldwide 2011-05-15Scores user_id = game = score = mza angry-birds 11,000 user_id = game = score = mza tetris 1,223,000 user_id = location = score = werner bejewelled 55,000
  63. 63. Players user_id = location = joined = mza Cambridge 2011-07-04 user_id = location = joined = jeffbarr Seattle 2012-01-20 user_id = location = joined = werner Worldwide 2011-05-15Scores Leader boards user_id = game = score = game = score = user_id = mza angry-birds 11,000 angry-birds 11,000 mza user_id = game = score = game = score = user_id = mza tetris 1,223,000 tetris 1,223,000 mza user_id = location = score = game = score = user_id = werner bejewelled 55,000 tetris 9,000,000 jeffbarr
  64. 64. Players user_id = location = joined = mza Cambridge 2011-07-04 user_id = location = joined = Query for scores jeffbarr Seattle 2012-01-20 by user user_id = location = joined = werner Worldwide 2011-05-15Scores Leader boards user_id = game = score = game = score = user_id = mza angry-birds 11,000 angry-birds 11,000 mza user_id = game = score = game = score = user_id = mza tetris 1,223,000 tetris 1,223,000 mza user_id = location = score = game = score = user_id = werner bejewelled 55,000 tetris 9,000,000 jeffbarr
  65. 65. Players user_id = location = joined = mza Cambridge 2011-07-04 user_id = location = joined = High scores by game jeffbarr Seattle 2012-01-20 user_id = location = joined = werner Worldwide 2011-05-15Scores Leader boards user_id = game = score = game = score = user_id = mza angry-birds 11,000 angry-birds 11,000 mza user_id = game = score = game = score = user_id = mza tetris 1,223,000 tetris 1,223,000 mza user_id = location = score = game = score = user_id = werner bejewelled 55,000 tetris 9,000,000 jeffbarr
  66. 66. EXAMPLE 2:Storing large items.
  67. 67. Unlimited storage.Unlimited attributes per item. Unlimited items per table. Maximum of 64k per item.
  68. 68. Split across items. message =message_id = 1 part = 1 <first 64k> message =message_id = 1 part = 2 <second 64k> joined =message_id = 1 part = 3 <third 64k>
  69. 69. Store a pointer to S3. message =message_id = 1 http://s3.amazonaws.com... message =message_id = 2 http://s3.amazonaws.com... message =message_id = 3 http://s3.amazonaws.com...
  70. 70. EXAMPLE 3:Time series data
  71. 71. April Hot and cold tables. event_id = timestamp = key = 1000 2013-04-16-09-59-01 value event_id = timestamp = key = 1001 2013-04-16-09-59-02 value event_id = timestamp = key = 1002 2013-04-16-09-59-02 valueMarch event_id = timestamp = key = 1000 2013-03-01-09-59-01 value event_id = timestamp = key = 1001 2013-03-01-09-59-02 value event_id = timestamp = key = 1002 2013-03-01-09-59-02 value
  72. 72. December January February March April
  73. 73. Archive data. Move old data to S3: lower cost. Still available for analytics.Run queries across hot and cold data with Elastic MapReduce.
  74. 74. Partitioning
  75. 75. Uniform workload. Data stored across multiple partitions. Data is primarily distributed by primary key.Provisioned throughput is divided evenly across partitions.
  76. 76. To achieve and maintain full provisioned throughput, spreadworkload evenly across hash keys.
  77. 77. Non-Uniform workload.Might be throttled, even at high levels of throughput.
  78. 78. BEST PRACTICE 1:Distinct values for hash keys. Hash key elements should have a high number of distinct values.
  79. 79. Lots of users with unique user_id.Workload well distributed across hash key.user_id = first_name = last_name = mza Matt Wooduser_id = first_name = last_name = jeffbarr Jeff Barruser_id = first_name = last_name = werner Werner Vogelsuser_id = first_name = last_name = simone Simone Brunozzi ... ... ...
  80. 80. BEST PRACTICE 2:Avoid limited hash key values. Hash key elements should have a high number of distinct values.
  81. 81. Small number of status codes.Unevenly, non-uniform workload.status = date = 200 2012-04-01-00-00-01status = date = 404 2012-04-01-00-00-01 status date = 404 2012-04-01-00-00-01status = date = 404 2012-04-01-00-00-01
  82. 82. BEST PRACTICE 3:Model for even distribution.Access by hash key value should be evenly distributed across the dataset.
  83. 83. Large number of devices.Small number which are much more popular than others. Workload unevenly distributed. mobile_id = access_date = 100 2012-04-01-00-00-01 mobile_id = access_date = 100 2012-04-01-00-00-02 mobile_id = access_date = 100 2012-04-01-00-00-03 mobile_id = access_date = 100 2012-04-01-00-00-04 ... ...
  84. 84. Sample access pattern.Workload randomized by hash key. mobile_id = access_date = 100.1 2012-04-01-00-00-01 mobile_id = access_date = 100.2 2012-04-01-00-00-02 mobile_id = access_date = 100.3 2012-04-01-00-00-03 mobile_id = access_date = 100.4 2012-04-01-00-00-04 ... ...
  85. 85. Reporting & Analytics
  86. 86. Seamless scale.Scalable methods for data processing.Scalable methods for backup/restore.
  87. 87. Amazon Elastic MapReduce. Managed Hadoop service for data-intensive workflows. aws.amazon.com/emr
  88. 88. create external table items_db (id string, votes bigint, views bigint) stored by org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler tblproperties ("dynamodb.table.name" = "items", "dynamodb.column.mapping" = "id:id,votes:votes,views:views");
  89. 89. select id, likes, viewsfrom items_dborder by views desc;
  90. 90. Summary 1. Getting started 2. Customer story: Royal Opera House 3. Data modeling 4. Partitioning 5. Reporting & Analytics
  91. 91. Free tier.
  92. 92. aws.amazon.com/dynamodb
  93. 93. Thank you!philipfs@amazon.co.uk
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×