Amazon DynamoDB is a managed NoSQL database. These slides introduce DynamoDB and discuss best practices for data modeling and primary key selection.
Transcript of "Building Applications with DynamoDB"

  1. Building Applications with DynamoDB. An Online Seminar - 16th May 2012. Dr Matt Wood, Amazon Web Services
  2. Thank you!
  3. Building Applications with DynamoDB
  4. Building Applications with DynamoDB: Getting started
  5. Building Applications with DynamoDB: Getting started. Data modeling
  6. Building Applications with DynamoDB: Getting started. Data modeling. Partitioning
  7. Building Applications with DynamoDB: Getting started. Data modeling. Partitioning. Analytics
  8. Getting started with DynamoDB: quick review
  9. DynamoDB is a managed NoSQL database service. Store and retrieve any amount of data. Serve any level of request traffic.
  10. Without the operational burden.
  11. Consistent, predictable performance. Single-digit millisecond latencies. Backed by solid-state drives.
  12. Flexible data model. Key/attribute pairs. No schema required. Easy to create. Easy to adjust.
  13. Seamless scalability. No table size limits. Unlimited storage. No downtime.
  14. Durable. Consistent, disk-only writes. Replication across data centres and availability zones.
  15. Without the operational burden.
  16. Without the operational burden. FOCUS ON YOUR APP
  17. Two decisions + three clicks = ready for use
  18. Primary keys + level of throughput. Two decisions + three clicks = ready for use
  19. Provisioned throughput. Reserve IOPS for reads and writes. Scale up (or down) at any time.
  20. Pay per capacity unit. Priced per hour of provisioned throughput.
  21. Write throughput. Units = size of item x writes/second. $0.01 per hour for 10 write units.
  22. Consistent writes. Atomic increment/decrement. Optimistic concurrency control. aka: “conditional writes”.
  23. Transactions. Item-level transactions only. Puts, updates and deletes are ACID.
  24. Read throughput: strongly consistent or eventually consistent.
  25. Read throughput, strongly consistent. Provisioned units = size of item x reads/second. $0.01 per hour for 50 read units.
  26. Read throughput, eventually consistent. Provisioned units = (size of item x reads/second) / 2. $0.01 per hour for 100 read units.
  27. Read throughput, strongly or eventually consistent. Same latency expectations. Mix and match at “read time”.
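The capacity-unit arithmetic on slides 21-26 can be sketched in a few lines of plain Python. This is a hedged illustration of the 2012-era model described on the slides (units scale with item size and request rate; an eventually consistent read needs half the units of a strongly consistent one); the function names are invented for illustration.

```python
import math

def write_units(item_size_kb, writes_per_second):
    """Write capacity units = item size (rounded up to whole KB) x writes/sec."""
    return math.ceil(item_size_kb) * writes_per_second

def read_units(item_size_kb, reads_per_second, eventually_consistent=False):
    """Read capacity units; eventually consistent reads need half the units."""
    units = math.ceil(item_size_kb) * reads_per_second
    return units / 2 if eventually_consistent else units

# A 1 KB item written 10 times a second needs 10 write units.
print(write_units(1, 10))        # 10
print(read_units(1, 100))        # 100 (strongly consistent)
print(read_units(1, 100, True))  # 50.0 (eventually consistent)
```

This is why, per the slides, $0.01/hour buys twice as many eventually consistent read units as strongly consistent ones.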
  28. Two decisions + three clicks = ready for use
  29. Two decisions + three clicks = ready for use
  30. Two decisions + one API call = ready for use
  31. $create_response = $dynamodb->create_table(array(
        'TableName' => 'ProductCatalog',
        'KeySchema' => array(
          'HashKeyElement' => array(
            'AttributeName' => 'Id',
            'AttributeType' => AmazonDynamoDB::TYPE_NUMBER
          )
        ),
        'ProvisionedThroughput' => array(
          'ReadCapacityUnits' => 10,
          'WriteCapacityUnits' => 5
        )
      ));
  32. Two decisions + one API call = ready for use
  33. Two decisions + one API call = ready for development
  34. Two decisions + one API call = ready for production
  35. Two decisions + one API call = ready for scale
  36. Authentication. Session-based to minimize latency. Uses Amazon Security Token Service. Handled by AWS SDKs. Integrates with IAM.
  37. Monitoring. CloudWatch metrics: latency, consumed read and write throughput, errors and throttling.
  38. Libraries, mappers & mocks. ColdFusion, Django, Erlang, Java, .Net, Node.js, Perl, PHP, Python, Ruby. http://j.mp/dynamodb-libs
  39. DynamoDB data models
  40. DynamoDB semantics. Tables, items and attributes.
  41. Tables contain items. Unlimited items per table.
  42. Items are a collection of attributes. Each attribute has a key and a value. An item can have any number of attributes, up to 64 KB total.
  43. Two scalar data types. String: Unicode, UTF-8 binary encoding. Number: 38-digit precision. Multi-value strings and numbers.
  44. id = 100   date = 2012-05-16-09-00-10   total = 25.00
      id = 101   date = 2012-05-15-15-00-11   total = 35.00
      id = 101   date = 2012-05-16-12-00-10   total = 100.00
      id = 102   date = 2012-03-20-18-23-10   total = 20.00
      id = 102   date = 2012-03-20-18-23-10   total = 120.00
  45. Table: the five items above together form a table.
  46. Item: each row is a single item.
  47. Attribute: each key = value pair is a single attribute.
  48. Where is the schema? Tables do not require a formal schema. Items are an arbitrarily sized hash. Just need to specify the primary key.
  49. Items are indexed by primary key. Single hash keys and composite keys.
  50. Hash Key: in the order items above, id is the hash key.
  51. Range key for queries. Querying items by composite key.
  52. Hash Key + Range Key: id plus date form a composite key.
  53. Programming DynamoDB. Small but perfectly formed. Whole programming interface fits on one slide.
  54. CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables; PutItem, GetItem, UpdateItem, DeleteItem; BatchGetItem, BatchWriteItem; Query, Scan
  55. CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables; PutItem, GetItem, UpdateItem, DeleteItem; BatchGetItem, BatchWriteItem; Query, Scan
  56. CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables; PutItem, GetItem, UpdateItem, DeleteItem; BatchGetItem, BatchWriteItem; Query, Scan
  57. Conditional updates. PutItem, UpdateItem, DeleteItem can take optional conditions for operation. UpdateItem performs atomic increments.
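The optimistic concurrency control mentioned on slides 22 and 57 hinges on the Expected condition of an UpdateItem request. As a hedged sketch, the snippet below only builds the request parameters in the shape of the 2012-era low-level API (Key/HashKeyElement, AttributeUpdates, Expected); the table, key, and version attribute are illustrative, and no real AWS client is called.

```python
def conditional_update_request(table, key, new_attrs, expected_version):
    """Build an UpdateItem request that succeeds only if the item's
    'version' attribute still matches expected_version."""
    updates = {name: {"Value": {"S": str(value)}, "Action": "PUT"}
               for name, value in new_attrs.items()}
    # Bump the version as part of the same write, so a concurrent
    # writer using the old version will have its request rejected.
    updates["version"] = {"Value": {"N": str(expected_version + 1)},
                          "Action": "PUT"}
    return {
        "TableName": table,
        "Key": {"HashKeyElement": {"S": key}},
        "AttributeUpdates": updates,
        # The conditional part: only apply if 'version' is unchanged.
        "Expected": {"version": {"Value": {"N": str(expected_version)}}},
    }

req = conditional_update_request("Players", "mza", {"location": "Cambridge"}, 3)
print(req["Expected"]["version"]["Value"]["N"])  # "3"
```

If another writer committed first, the service rejects the write and the application re-reads and retries, which is the "conditional writes" pattern the slide names.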
  58. CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables; PutItem, GetItem, UpdateItem, DeleteItem; BatchGetItem, BatchWriteItem; Query, Scan
  59. One API call, multiple items. BatchGet returns multiple items by primary key. BatchWrite performs up to 25 put or delete operations. Throughput is measured by IO, not API calls.
  60. CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables; PutItem, GetItem, UpdateItem, DeleteItem; BatchGetItem, BatchWriteItem; Query, Scan
  61. Query vs Scan. Query for composite key queries. Scan for full table scans, exports. Both support pages and limits. Maximum response is 1 MB in size.
  62. Query patterns. Retrieve all items by hash key. Range key conditions: ==, <, >, >=, <=, begins with, between. Counts. Top and bottom n values. Paged responses.
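The Query semantics on slide 62 can be emulated locally: fetch every item sharing a hash key, optionally filter by a range-key condition, and return results ordered by range key. This is a plain-Python sketch (no AWS calls); the item shapes reuse the deck's gaming example.

```python
def query(items, hash_key, hash_value, range_key=None, condition=None):
    """Emulate a Query: all items with the given hash key value, optionally
    filtered by a range-key predicate, sorted by range key."""
    matches = [i for i in items if i[hash_key] == hash_value]
    if range_key and condition:
        matches = [i for i in matches if condition(i[range_key])]
    return sorted(matches, key=lambda i: i[range_key]) if range_key else matches

scores = [
    {"user_id": "mza", "game": "angry-birds", "score": 11000},
    {"user_id": "mza", "game": "tetris", "score": 1223000},
    {"user_id": "werner", "game": "bejewelled", "score": 55000},
]
# All of mza's scores for games beginning with "t" (a 'begins with' condition).
result = query(scores, "user_id", "mza", "game", lambda g: g.startswith("t"))
print([i["game"] for i in result])  # ['tetris']
```

A Scan, by contrast, would walk every item in the list regardless of hash key, which is why the slides reserve it for full-table exports.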
  63. Modeling patterns
  64. Patterns 1. Mapping relationships with range keys. No cross-table joins in DynamoDB. Use composite keys to model relationships.
  65. Data model example: online gaming. Storing scores and leader boards. Players with high scores. Leader board for each game.
  66. Data model example: online gaming.
      Players: hash key
      user_id = mza        location = Cambridge   joined = 2011-07-04
      user_id = jeffbarr   location = Seattle     joined = 2012-01-20
      user_id = werner     location = Worldwide   joined = 2011-05-15
  67. Data model example: online gaming. Adds:
      Scores: composite key
      user_id = mza      game = angry-birds   score = 11,000
      user_id = mza      game = tetris        score = 1,223,000
      user_id = werner   game = bejewelled    score = 55,000
  68. Data model example: online gaming. Adds:
      Leader boards: composite key
      game = angry-birds   score = 11,000      user_id = mza
      game = tetris        score = 1,223,000   user_id = mza
      game = tetris        score = 9,000,000   user_id = jeffbarr
  69. Scores by user (and by game): query the Scores table by user_id.
  70. High scores by game: query the Leader boards table by game.
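Because there are no cross-table joins (slide 64), the gaming model above writes each score twice: once into Scores keyed by user, once into Leader boards keyed by game. A hedged sketch, with plain dicts standing in for the two DynamoDB tables:

```python
scores = {}         # composite key (user_id, game) -> item
leader_boards = {}  # composite key (game, score)   -> item

def record_score(user_id, game, score):
    """Write the same fact into both tables, one per access pattern."""
    item = {"user_id": user_id, "game": game, "score": score}
    scores[(user_id, game)] = item       # slide 69: scores by user
    leader_boards[(game, score)] = item  # slide 70: high scores by game

record_score("mza", "tetris", 1223000)
record_score("jeffbarr", "tetris", 9000000)

# Top tetris score: the highest score range key under the 'tetris' hash key.
top = max(s for g, s in leader_boards if g == "tetris")
print(top)  # 9000000
```

The duplication is deliberate: each table's key order matches one query pattern, trading extra writes for cheap reads.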
  71. Patterns 2. Handling large items. Unlimited attributes per item. Unlimited items per table. Max 64 KB per item.
  72. Data model example: large items. Storing more than 64 KB across items.
      Large messages: composite keys
      message_id = 1   part = 1   message = <first 64k>
      message_id = 1   part = 2   message = <second 64k>
      message_id = 1   part = 3   message = <third 64k>
      Split attributes across items. Query by message_id and part to retrieve.
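The splitting on slide 72 is a simple chunking scheme: the payload is cut into 64 KB pieces sharing a message_id hash key, with the part number as the range key. A minimal sketch (the chunk size matches the 2012 item limit; everything else is illustrative):

```python
CHUNK = 64 * 1024  # 64 KB, the 2012-era item size limit

def split_message(message_id, payload):
    """Cut payload into items of at most CHUNK bytes, keyed (message_id, part)."""
    return [
        {"message_id": message_id, "part": n + 1,
         "message": payload[i:i + CHUNK]}
        for n, i in enumerate(range(0, len(payload), CHUNK))
    ]

def join_message(items):
    """Reassemble: query by message_id, sort on the part range key, concatenate."""
    return b"".join(i["message"] for i in sorted(items, key=lambda i: i["part"]))

parts = split_message(1, b"x" * (CHUNK * 2 + 10))
print(len(parts))  # 3
```

Retrieval is then a single Query on message_id, exactly as the slide describes.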
  73. Patterns. Store a pointer to objects in Amazon S3. Large data stored in S3. Location stored in DynamoDB. 99.999999999% data durability in S3.
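The alternative on slide 73 keeps the big object in S3 and only a pointer item in DynamoDB. A hedged sketch with dicts standing in for both services; the bucket name, key format, and attribute names are made up for illustration:

```python
s3 = {}        # s3_key -> object bytes (stand-in for the S3 bucket)
metadata = {}  # item_id -> small DynamoDB pointer item

def store_large(item_id, payload):
    s3_key = f"objects/{item_id}"
    s3[s3_key] = payload                 # big data lives in S3
    metadata[item_id] = {"id": item_id,  # DynamoDB holds only the pointer
                         "s3_bucket": "my-bucket",
                         "s3_key": s3_key,
                         "size": len(payload)}

def fetch_large(item_id):
    """Read the pointer item, then follow it to the object."""
    return s3[metadata[item_id]["s3_key"]]

store_large(42, b"a" * 1_000_000)
print(fetch_large(42) == b"a" * 1_000_000)  # True
```

The pointer item stays far under the 64 KB limit while the payload can be any size S3 accepts.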
  74. Patterns 3. Managing secondary indices. Not supported by DynamoDB. Create your own.
  75. Data model example: secondary indices.
      Users: hash key
      user_id = mza       first_name = Matt     last_name = Wood
      user_id = mattfox   first_name = Matt     last_name = Fox
      user_id = werner    first_name = Werner   last_name = Vogels
  76. Data model example: secondary indices. Adds:
      First name index: composite keys
      first_name = Matt     user_id = mza
      first_name = Matt     user_id = mattfox
      first_name = Werner   user_id = werner
  77. Data model example: secondary indices. Adds:
      Second name index: composite keys
      last_name = Wood     user_id = mza
      last_name = Fox      user_id = mattfox
      last_name = Vogels   user_id = werner
  78. Data model example: secondary indices (as slide 77).
  79. Data model example: secondary indices (as slide 77).
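Since DynamoDB (as of this 2012 deck) has no secondary indexes, slides 74-77 have the application maintain its own index tables: every user write also writes one item per index. A hedged sketch with dicts standing in for the three tables:

```python
users = {}
first_name_index = {}  # composite key (first_name, user_id) -> user_id
last_name_index = {}   # composite key (last_name, user_id)  -> user_id

def put_user(user_id, first_name, last_name):
    """One logical write fans out to the main table plus each index."""
    users[user_id] = {"user_id": user_id,
                      "first_name": first_name, "last_name": last_name}
    first_name_index[(first_name, user_id)] = user_id
    last_name_index[(last_name, user_id)] = user_id

def users_named(first_name):
    """Query the index by its hash key, then fetch full items by user_id."""
    return [users[uid] for (fn, uid) in first_name_index if fn == first_name]

put_user("mza", "Matt", "Wood")
put_user("mattfox", "Matt", "Fox")
put_user("werner", "Werner", "Vogels")
print(sorted(u["user_id"] for u in users_named("Matt")))  # ['mattfox', 'mza']
```

The cost is the same as the leader-board pattern: extra provisioned writes in exchange for hash-key lookups on attributes other than user_id.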
  80. Patterns 4. Time series data. Logging, click-through, ad views, game play data, application usage. Non-uniform access patterns. Newer data is ‘live’. Older data is read only.
  81. Data model example: time series data. Rolling tables for hot and cold data.
      Events table: composite keys
      event_id = 1000   timestamp = 2012-05-16-09-59-01   key = value
      event_id = 1001   timestamp = 2012-05-16-09-59-02   key = value
      event_id = 1002   timestamp = 2012-05-16-09-59-02   key = value
  82. Data model example: time series data. Adds monthly tables:
      Events table for April: composite keys
      event_id = 400   timestamp = 2012-04-01-00-00-01
      event_id = 401   timestamp = 2012-04-01-00-00-02
      event_id = 402   timestamp = 2012-04-01-00-00-03
      Events table for January: composite keys
      event_id = 100   timestamp = 2012-01-01-00-00-01
      event_id = 101   timestamp = 2012-01-01-00-00-02
      event_id = 102   timestamp = 2012-01-01-00-00-03
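The rolling-table pattern on slides 81-91 routes each event to a per-month table, so cold tables can later be dumped to S3 and deleted. A minimal sketch; the table-name format and retention window are illustrative choices, not prescribed by the deck:

```python
from datetime import date

def table_for(day):
    """Events for a given day go to that month's table, e.g. events_2012_05."""
    return f"events_{day.year}_{day.month:02d}"

def tables_to_retire(today, months_to_keep=6):
    """This year's monthly tables that have aged out of the retention window
    (candidates for 'data to S3, delete cold tables' on slide 86)."""
    return [table_for(date(today.year, m, 1))
            for m in range(1, today.month - months_to_keep + 1)]

print(table_for(date(2012, 5, 16)))         # events_2012_05
print(tables_to_retire(date(2012, 12, 1)))  # January through June tables
```

Provisioned throughput then follows the data's temperature: high on the current month's table, dialed down on older ones.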
  83. Patterns: hot and cold tables. Dec Jan Feb Mar April May
  84. Patterns: hot and cold tables. Dec Jan Feb Mar April May (higher throughput on the newest table)
  85. Patterns: hot and cold tables. Dec Jan Feb Mar April May (lower throughput on cold tables, higher throughput on the hot table)
  86. Patterns: hot and cold tables. Dec Jan Feb Mar April May (data to S3, delete cold tables)
  87. Patterns: hot and cold tables. Jan Feb Mar Apr May June
  88. Patterns: hot and cold tables. Feb Mar Apr May June July
  89. Patterns: hot and cold tables. Mar Apr May June July Aug
  90. Patterns: hot and cold tables. Apr May June July Aug Sept
  91. Patterns: hot and cold tables. May June July Aug Sept Oct
  92. Patterns: not out of mind. DynamoDB and S3 data can be integrated for analytics. Run queries across hot and cold data with Elastic MapReduce.
  93. Partitioning best practices
  94. Uniform workloads. DynamoDB divides table data into multiple partitions. Data is distributed primarily by hash key. Provisioned throughput is divided evenly across the partitions.
  95. Uniform workloads. To achieve and maintain full provisioned throughput for a table, spread your workload evenly across the hash keys.
  96. Non-uniform workloads. Some requests might be throttled, even at high levels of provisioned throughput. Some best practices...
  97. Patterns 1. Distinct values for hash keys. Hash key elements should have a high number of distinct values.
  98. Data model example: hash key selection. Well-distributed workloads.
      Users
      user_id = mza        first_name = Matt     last_name = Wood
      user_id = jeffbarr   first_name = Jeff     last_name = Barr
      user_id = werner     first_name = Werner   last_name = Vogels
      user_id = mattfox    first_name = Matt     last_name = Fox
      ...
  99. Data model example: hash key selection. Lots of users with unique user_id. Workload well distributed across user partitions.
  100. Patterns 2. Avoid limited hash key values. Hash key elements should have a high number of distinct values.
  101. Data model example: small hash value range. Non-uniform workload.
      Status responses
      status = 200   date = 2012-04-01-00-00-01
      status = 404   date = 2012-04-01-00-00-01
      status = 404   date = 2012-04-01-00-00-01
      status = 404   date = 2012-04-01-00-00-01
  102. Data model example: small hash value range. Small number of status codes. Uneven, non-uniform workload.
  103. Patterns 3. Model for even distribution of access. Access by hash key value should be evenly distributed across the dataset.
  104. Data model example: uneven access pattern by key. Non-uniform access workload.
      Devices
      mobile_id = 100   access_date = 2012-04-01-00-00-01
      mobile_id = 100   access_date = 2012-04-01-00-00-02
      mobile_id = 100   access_date = 2012-04-01-00-00-03
      mobile_id = 100   access_date = 2012-04-01-00-00-04
      ...
  105. Data model example: uneven access pattern by key. Large number of devices. A small number are much more popular than the others. Workload unevenly distributed.
  106. Data model example: randomize access pattern by key. Towards a uniform workload.
      Devices
      mobile_id = 100.1   access_date = 2012-04-01-00-00-01
      mobile_id = 100.2   access_date = 2012-04-01-00-00-02
      mobile_id = 100.3   access_date = 2012-04-01-00-00-03
      mobile_id = 100.4   access_date = 2012-04-01-00-00-04
      ...
      Randomize access pattern. Workload randomized by hash key.
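The suffixing on slide 106 spreads one hot hash key (mobile_id = 100) across partitions by appending a random suffix, at the price of fanning reads out across every suffix. A hedged sketch; the suffix count is a tuning choice invented for illustration:

```python
import random

SUFFIXES = 10
table = {}  # (hash_key, range_key) -> item, stand-in for the Devices table

def write_access(mobile_id, access_date):
    # "100" becomes "100.1" .. "100.10", spreading writes over partitions.
    shard = f"{mobile_id}.{random.randint(1, SUFFIXES)}"
    table[(shard, access_date)] = {"mobile_id": shard,
                                   "access_date": access_date}

def read_accesses(mobile_id):
    """Reads must query every suffix and merge the results."""
    shards = {f"{mobile_id}.{n}" for n in range(1, SUFFIXES + 1)}
    return [item for (h, r), item in table.items() if h in shards]

for i in range(100):
    write_access("100", f"2012-04-01-00-00-{i:02d}")
print(len(read_accesses("100")))  # 100
```

The write side becomes uniform across hash keys, which is the property slides 94-95 say the partitioning scheme rewards.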
  107. Design for a uniform workload.
  108. Analytics with DynamoDB
  109. Seamless scale. Scalable methods for data processing. Scalable methods for backup/restore.
  110. Amazon Elastic MapReduce. Managed Hadoop service for data-intensive workflows. http://aws.amazon.com/emr
  111. Hadoop under the hood. Take advantage of the Hadoop ecosystem: streaming interfaces, Hive, Pig, Mahout.
  112. Distributed data processing. API driven. Analytics at any scale.
  113. Query flexibility with Hive.
      create external table items_db
        (id string, votes bigint, views bigint)
      stored by
        'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
      tblproperties (
        "dynamodb.table.name" = "items",
        "dynamodb.column.mapping" = "id:id,votes:votes,views:views");
  114. Query flexibility with Hive.
      select id, votes, views
      from items_db
      order by views desc;
  115. Data export/import. Use EMR for backup and restore to Amazon S3.
  116. Data export/import.
      CREATE EXTERNAL TABLE orders_s3_new_export (
        order_id string, customer_id string, order_date int, total double )
      PARTITIONED BY (year string, month string)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION 's3://export_bucket';
      INSERT OVERWRITE TABLE orders_s3_new_export
      PARTITION (year=2012, month=01)
      SELECT * from orders_ddb_2012_01;
  117. Integrate live and archive data. Run queries across external Hive tables on S3 and DynamoDB. Live & archive. Metadata & big objects.
  118. In summary... DynamoDB: predictable performance, provisioned throughput, libraries & mappers.
  119. In summary... DynamoDB: predictable performance, provisioned throughput, libraries & mappers. Data modeling: tables & items, read & write patterns, time series data.
  120. In summary...
      DynamoDB: predictable performance, provisioned throughput, libraries & mappers.
      Data modeling: tables & items, read & write patterns, time series data.
      Partitioning: automatic partitioning, hot and cold data, size/throughput ratio.
  121. In summary...
      DynamoDB: predictable performance, provisioned throughput, libraries & mappers.
      Data modeling: tables & items, read & write patterns, time series data.
      Partitioning: automatic partitioning, hot and cold data, size/throughput ratio.
      Analytics: Elastic MapReduce, Hive queries, backup & restore.
  122. DynamoDB free tier: 5 writes, 10 consistent reads per second, 100 MB of storage.
  123. aws.amazon.com/dynamodb and aws.amazon.com/documentation/dynamodb: best practice + sample code
  124. Thank you!
  125. Q&A. matthew@amazon.com @mza