Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

(DAT401) Amazon DynamoDB Deep Dive

5,273 views

Published on

Amazon DynamoDB is a fully managed NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale. This talk explores DynamoDB capabilities and benefits in detail and discusses how to get the most out of your DynamoDB database. We go over schema design best practices with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others. We also explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including JSON document support, Streams, and more.

Published in: Technology

(DAT401) Amazon DynamoDB Deep Dive

  1. 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Rick Houlihan Principal Solutions Architect Amazon Web Services October 2015 Deep Dive: Amazon DynamoDB DAT401
  2. 2. What to expect from the session • Tables, API, data types, indexes • Scaling • Data modeling • Scenarios and best practices • DynamoDB Streams • Reference architecture
  3. 3. Why NoSQL? Optimized for storage Optimized for compute Normalized/relational Denormalized/hierarchical Ad hoc queries Instantiated views Scale vertically Scale horizontally Yesterday’s solution Built for today’s apps SQL NoSQL
  4. 4. Amazon DynamoDB Fast and Consistent Scales to Any WorkloadDocument or Key-ValueFully Managed NoSQL Event Driven ProgrammingAccess Control
  5. 5. Tables, API, Data Types
  6. 6. Table and item API DynamoDB ListStreams DescribeStream GetShardIterator GetRecords Streams API Admin CRUD Create Table Put/Get Item Update Table Batch Put/Get Item Delete Table Update Item Describe Table Delete Item Query Scan
  7. 7. Data types String (S) Number (N) Binary (B) String Set (SS) Number Set (NS) Binary Set (BS) Boolean (BOOL) Null (NULL) List (L) Map (M) Used for storing nested JSON documents
  8. 8. Table Table Items Attributes Hash Key Range Key Mandatory Key-value access pattern Determines data distribution Optional Model 1:N relationships Enables rich query capabilities All items for a hash key ==, <, >, >=, <= “begins with” “between” sorted results counts top/bottom N values paged responses
  9. 9. 00 FF00 55 A954 AA FF Hash table Hash key uniquely identifies an item Hash key is used for building an unordered hash index Table can be partitioned for scale Id = 1 Name = Jim Hash (1) = 7B Id = 2 Name = Andy Dept = Eng Hash (2) = 48 Id = 3 Name = Kim Dept = Ops Hash (3) = CD Key Space
  10. 10. Hash-range table Hash key and range key together uniquely identify an Item Within unordered hash index, data is sorted by the range key No limit on the number of items (∞) per hash key • Except if you have local secondary indexes 00:0 FF:∞ Hash (2) = 48 Customer# = 2 Order# = 10 Item = Pen Customer# = 2 Order# = 11 Item = Shoes Customer# = 1 Order# = 10 Item = Toy Customer# = 1 Order# = 11 Item = Boots Hash (1) = 7B Customer# = 3 Order# = 10 Item = Book Customer# = 3 Order# = 11 Item = Paper Hash (3) = CD 55 A9:∞54:∞ AA Partition 1 Partition 2 Partition 3
  11. 11. Partitions are three-way replicated Id = 2 Name = Andy Dept = Engg Id = 3 Name = Kim Dept = Ops Id = 1 Name = Jim Id = 2 Name = Andy Dept = Engg Id = 3 Name = Kim Dept = Ops Id = 1 Name = Jim Id = 2 Name = Andy Dept = Engg Id = 3 Name = Kim Dept = Ops Id = 1 Name = Jim Replica 1 Replica 2 Replica 3 Partition 1 Partition 2 Partition N
  12. 12. Indexes
  13. 13. Local secondary index (LSI) Alternate range key attribute Index is local to a hash key (or partition) A1 (hash) A3 (range) A2 (table key) A1 (hash) A2 (range) A3 A4 A5 LSIs A1 (hash) A4 (range) A2 (table key) A3 (projected) Table KEYS_ONLY INCLUDE A3 A1 (hash) A5 (range) A2 (table key) A3 (projected) A4 (projected) ALL 10 GB max per hash key, i.e. LSIs limit the # of range keys!
  14. 14. Global secondary index (GSI) Alternate hash (+range) key Index is across all table hash keys (partitions) A1 (hash) A2 A3 A4 A5 GSIs A5 (hash) A4 (range) A1 (table key) A3 (projected) Table INCLUDE A3 A4 (hash) A5 (range) A1 (table key) A2 (projected) A3 (projected) ALL A2 (hash) A1 (table key) KEYS_ONLY RCUs/WCUs provisioned separately for GSIs Online indexing
  15. 15. How do GSI updates work? Table Primary table Primary table Primary table Primary table Global Secondary Index Client 2. Asynchronous update (in progress) If GSIs don’t have enough write capacity, table writes will be throttled!
  16. 16. LSI or GSI? LSI can be modeled as a GSI If data size in an item collection > 10 GB, use GSI If eventual consistency is okay for your scenario, use GSI!
  17. 17. Scaling
  18. 18. Scaling Throughput • Provision any amount of throughput to a table Size • Add any number of items to a table • Max item size is 400 KB • LSIs limit the number of range keys due to 10 GB limit Scaling is achieved through partitioning
  19. 19. Throughput Provisioned at the table level • Write capacity units (WCUs) are measured in 1 KB per second • Read capacity units (RCUs) are measured in 4 KB per second • RCUs measure strictly consistent reads • Eventually consistent reads cost 1/2 of consistent reads Read and write throughput limits are independent WCURCU
  20. 20. Partitioning math In the future, these details might change… Number of Partitions By Capacity (Total RCU / 3000) + (Total WCU / 1000) By Size Total Size / 10 GB Total Partitions CEILING(MAX (Capacity, Size))
  21. 21. Partitioning example Table size = 8 GB, RCUs = 5000, WCUs = 500 RCUs per partition = 5000/3 = 1666.67 WCUs per partition = 500/3 = 166.67 Data/partition = 10/3 = 3.33 GB RCUs and WCUs are uniformly spread across partitions Number of Partitions By Capacity (5000 / 3000) + (500 / 1000) = 2.17 By Size 8 / 10 = 0.8 Total Partitions CEILING(MAX (2.17, 0.8)) = 3
  22. 22. Example: hot keys Partition Time Heat
  23. 23. Example: periodic spike
  24. 24. Getting the most out of DynamoDB throughput “To get the most out of DynamoDB throughput, create tables where the hash key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.” —DynamoDB Developer Guide Space: access is evenly spread over the key-space Time: requests arrive evenly spaced in time
  25. 25. How does DynamoDB handle bursts? DynamoDB saves 300 seconds of unused capacity per partition This is used when a partition runs out of provisioned throughput due to bursts • Provided excess capacity is available at the node
  26. 26. Burst capacity is built-in 0 400 800 1200 1600 CapacityUnits Time Provisioned Consumed “Save up” unused capacity Consume saved up capacity Burst capacity: 300 seconds (1200 × 300 = 3600 CU)
  27. 27. Burst capacity may not be sufficient 0 400 800 1200 1600 CapacityUnits Time Provisioned Consumed Attempted Burst capacity: 300 seconds (1200 × 300 = 3600 CU) Throttled requests Don’t completely depend on burst capacity… provision sufficient throughput
  28. 28. What causes throttling? If sustained throughput goes beyond provisioned throughput per partition Non-uniform workloads • Hot keys/hot partitions • Very large bursts Mixing hot data with cold data • Use a table per time period From the example before: • Table created with 5000 RCUs, 500 WCUs • RCUs per partition = 1666.67 • WCUs per partition = 166.67 • If sustained throughput > (1666 RCUs or 166 WCUs) per key or partition, DynamoDB may throttle requests • Solution: Increase provisioned throughput
  29. 29. Table examples case class CameraRecord( cameraId: Int, // hash key ownerId: Int, subscribers: Set[Int], hoursOfRecording: Int, ... ) case class Cuepoint( cameraId: Int, // hash key timestamp: Long, // range key type: String, ... )HashKey RangeKey Value Key Segment 1234554343254 Key Segment1 1231231433235
  30. 30. Data modeling
  31. 31. 1:1 relationships or key-values Use a table or GSI with a hash key Use GetItem or BatchGetItem API Example: Given an SSN or license number, get attributes Users Table Hash key Attributes SSN = 123-45-6789 Email = johndoe@nowhere.com, License = TDL25478134 SSN = 987-65-4321 Email = maryfowler@somewhere.com, License = TDL78309234 Users-Email-GSI Hash key Attributes License = TDL78309234 Email = maryfowler@somewhere.com, SSN = 987-65-4321 License = TDL25478134 Email = johndoe@nowhere.com, SSN = 123-45-6789
  32. 32. 1:N relationships or parent-children Use a table or GSI with hash and range key Use Query API Example: • Given a device, find all readings between epoch X, Y Device-measurements Hash Key Range key Attributes DeviceId = 1 epoch = 5513A97C Temperature = 30, pressure = 90 DeviceId = 1 epoch = 5513A9DB Temperature = 30, pressure = 90
  33. 33. N:M relationships Use a table and GSI with hash and range key elements switched Use Query API Example: Given a user, find all games. Or given a game, find all users. User-Games-Table Hash Key Range key UserId = bob GameId = Game1 UserId = fred GameId = Game2 UserId = bob GameId = Game3 Game-Users-GSI Hash Key Range key GameId = Game1 UserId = bob GameId = Game2 UserId = fred GameId = Game3 UserId = bob
  34. 34. Documents (JSON) New data types (M, L, BOOL, NULL) introduced to support JSON Document SDKs • Simple programming model • Conversion to/from JSON • Java, JavaScript, Ruby, .NET Cannot index (S,N) elements of a JSON object stored in M • Only top-level table attributes can be used in LSIs and GSIs without Streams/Lambda JavaScript DynamoDB string S number N boolean BOOL null NULL array L object M
  35. 35. Rich expressions Projection expression • Query/Get/Scan: ProductReviews.FiveStar[0] Filter expression • Query/Scan: #VIEWS > :num Conditional expression • Put/Update/DeleteItem: attribute_not_exists (#pr.FiveStar) Update expression • UpdateItem: set Replies = Replies + :num
  36. 36. Scenarios and best practices
  37. 37. Event logging Storing time series data
  38. 38. Time series tables Events_table_2015_April Event_id (Hash key) Timestamp (range key) Attribute1 …. Attribute N Events_table_2015_March Event_id (Hash key) Timestamp (range key) Attribute1 …. Attribute N Events_table_2015_Feburary Event_id (Hash key) Timestamp (range key) Attribute1 …. Attribute N Events_table_2015_January Event_id (Hash key) Timestamp (range key) Attribute1 …. Attribute N RCUs = 1000 WCUs = 100 RCUs = 10000 WCUs = 10000 RCUs = 100 WCUs = 1 RCUs = 10 WCUs = 1 Current table Older tables HotdataColddata Don’t mix hot and cold data; archive cold data to Amazon S3
  39. 39. Dealing with time series data Use a table per time period Pre-create daily, weekly, monthly tables Provision required throughput for current table Writes go to the current table Turn off (or reduce) throughput for older tables
  40. 40. Product catalog Popular items (read)
  41. 41. Partition 1 2000 RCUs Partition K 2000 RCUs Partition M 2000 RCUs Partition 50 2000 RCU Scaling bottlenecks Product A Product B Shoppers ProductCatalog Table SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT"
  42. 42. RequestsPerSecond Item Primary Key Request Distribution Per Hash Key DynamoDB Requests
  43. 43. Partition 1 Partition 2 ProductCatalog Table User DynamoDB User Cache popular items SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT"
  44. 44. RequestsPerSecond Item Primary Key Request Distribution Per Hash Key DynamoDB Requests Cache Hits
  45. 45. Messaging app Large items Filters vs. indexes M:N Modeling—inbox and outbox
  46. 46. Messages Table Messages App David SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC Inbox SELECT * FROM Messages WHERE Sender ='David' LIMIT 50 ORDER BY Date DESC Outbox
  47. 47. Recipient Date Sender Message David 2014-10-02 Bob … … 48 more messages for David … David 2014-10-03 Alice … Alice 2014-09-28 Bob … Alice 2014-10-01 Carol … Large and small attributes mixed (Many more messages) David Messages Table 50 items × 256 KB each Hash key Range key Large message bodies Attachments SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC Inbox
  48. 48. Computing inbox query cost Items evaluated by query Average item size Conversion ratio Eventually consistent reads 50 * 256KB * (1 RCU / 4KB) * (1 / 2) = 1600 RCU
  49. 49. Recipient Date Sender Subject MsgId David 2014-10-02 Bob Hi!… afed David 2014-10-03 Alice RE: The… 3kf8 Alice 2014-09-28 Bob FW: Ok… 9d2b Alice 2014-10-01 Carol Hi!... ct7r Separate the bulk data Inbox-GSI Messages Table MsgId Body 9d2b … 3kf8 … ct7r … afed … David 1. Query Inbox-GSI: 1 RCU 2. BatchGetItem Messages: 1600 RCU (50 separate items at 256 KB) (50 sequential items at 128 bytes) Uniformly distributes large item reads
  50. 50. Inbox GSI Define which attributes to copy into the index
  51. 51. Outbox Sender Outbox GSI SELECT * FROM Messages WHERE Sender ='David' LIMIT 50 ORDER BY Date DESC
  52. 52. Messaging app Messages Table David Inbox Global secondary index Inbox Outbox Global secondary index Outbox
  53. 53. Reduce one-to-many item sizes Configure secondary index projections Use GSIs to model M:N relationship between sender and recipient Distribute large items Querying many large items at once InboxMessagesOutbox
  54. 54. Multiplayer online gaming Query filters vs. composite key indexes
  55. 55. GameId Date Host Opponent Status d9bl3 2014-10-02 David Alice DONE 72f49 2014-09-30 Alice Bob PENDING o2pnb 2014-10-08 Bob Carol IN_PROGRESS b932s 2014-10-03 Carol Bob PENDING ef9ca 2014-10-03 David Bob IN_PROGRESS Games Table Multiplayer online game data Hash key
  56. 56. Query for incoming game requests DynamoDB indexes provide hash and range What about queries for two equalities and a range? SELECT * FROM Game WHERE Opponent='Bob‘ AND Status=‘PENDING' ORDER BY Date DESC (hash) (range) (?)
  57. 57. Secondary Index Opponent Date GameId Status Host Alice 2014-10-02 d9bl3 DONE David Carol 2014-10-08 o2pnb IN_PROGRESS Bob Bob 2014-09-30 72f49 PENDING Alice Bob 2014-10-03 b932s PENDING Carol Bob 2014-10-03 ef9ca IN_PROGRESS David Approach 1: Query filter BobHash key Range key
  58. 58. Secondary Index Approach 1: query filter Bob Opponent Date GameId Status Host Alice 2014-10-02 d9bl3 DONE David Carol 2014-10-08 o2pnb IN_PROGRESS Bob Bob 2014-09-30 72f49 PENDING Alice Bob 2014-10-03 b932s PENDING Carol Bob 2014-10-03 ef9ca IN_PROGRESS David SELECT * FROM Game WHERE Opponent='Bob' ORDER BY Date DESC FILTER ON Status='PENDING' (filtered out)
  59. 59. Needle in a haystack Bob
  60. 60. Send back less data “on the wire” Simplify application code Simple SQL-like expressions • AND, OR, NOT, () Use query filter Your index isn’t entirely selective
  61. 61. Approach 2: composite key StatusDate DONE_2014-10-02 IN_PROGRESS_2014-10-08 IN_PROGRESS_2014-10-03 PENDING_2014-09-30 PENDING_2014-10-03 Status DONE IN_PROGRESS IN_PROGRESS PENDING PENDING Date 2014-10-02 2014-10-08 2014-10-03 2014-10-03 2014-09-30 + =
  62. 62. Secondary Index Approach 2: composite key Opponent StatusDate GameId Host Alice DONE_2014-10-02 d9bl3 David Carol IN_PROGRESS_2014-10-08 o2pnb Bob Bob IN_PROGRESS_2014-10-03 ef9ca David Bob PENDING_2014-09-30 72f49 Alice Bob PENDING_2014-10-03 b932s Carol Hash key Range key
  63. 63. Opponent StatusDate GameId Host Alice DONE_2014-10-02 d9bl3 David Carol IN_PROGRESS_2014-10-08 o2pnb Bob Bob IN_PROGRESS_2014-10-03 ef9ca David Bob PENDING_2014-09-30 72f49 Alice Bob PENDING_2014-10-03 b932s Carol Secondary Index Approach 2: composite key Bob SELECT * FROM Game WHERE Opponent='Bob' AND StatusDate BEGINS_WITH 'PENDING'
  64. 64. Needle in a sorted haystack Bob
  65. 65. Sparse indexes Id (Hash) User Game Score Date Award 1 Bob G1 1300 2012-12-23 2 Bob G1 1450 2012-12-23 3 Jay G1 1600 2012-12-24 4 Mary G1 2000 2012-10-24 Champ 5 Ryan G2 123 2012-03-10 6 Jones G2 345 2012-03-20 Game-scores-table Award (Hash) Id User Score Champ 4 Mary 2000 Award-GSI Scan sparse hash GSIs
  66. 66. Concatenate attributes to form useful secondary index keys Take advantage of sparse indexes Replace filter with indexes You want to optimize a query as much as possible Status + Date
  67. 67. Real-Time voting Write-heavy items
  68. 68. Requirements for voting Allow each person to vote only once No changing votes Real-time aggregation Voter analytics, demographics
  69. 69. Real-time voting architecture AggregateVotes Table Voters RawVotes Table Voting App
  70. 70. Partition 1 1000 WCUs Partition K 1000 WCUs Partition M 1000 WCUs Partition N 1000 WCUs Votes Table Candidate A Candidate B Scaling bottlenecks Voters Provision 200,000 WCUs
  71. 71. Write sharding Candidate A_2 Candidate B_1 Candidate B_2 Candidate B_3 Candidate B_5 Candidate B_4 Candidate B_7 Candidate B_6 Candidate A_1 Candidate A_3 Candidate A_4 Candidate A_7 Candidate B_8 Candidate A_6 Candidate A_8 Candidate A_5 Voter Votes Table
  72. 72. Write sharding Candidate A_2 Candidate B_1 Candidate B_2 Candidate B_3 Candidate B_5 Candidate B_4 Candidate B_7 Candidate B_6 Candidate A_1 Candidate A_3 Candidate A_4 Candidate A_7 Candidate B_8 UpdateItem: “CandidateA_” + rand(0, 10) ADD 1 to Votes Candidate A_6 Candidate A_8 Candidate A_5 Voter Votes Table
  73. 73. Votes Table Shard aggregation Candidate A_2 Candidate B_1 Candidate B_2 Candidate B_3 Candidate B_5 Candidate B_4 Candidate B_7 Candidate B_6 Candidate A_1 Candidate A_3 Candidate A_4 Candidate A_5 Candidate A_6 Candidate A_8 Candidate A_7 Candidate B_8 Periodic Process Candidate A Total: 2.5M 1. Sum 2. Store Voter
  74. 74. Trade off read cost for write scalability Consider throughput per hash key and per partition Shard write-heavy hash keys Your write workload is not horizontally scalable
  75. 75. Correctness in voting UserId Candidate Date Alice A 2013-10-02 Bob B 2013-10-02 Eve B 2013-10-02 Chuck A 2013-10-02 RawVotes Table Segment Votes A_1 23 B_2 12 B_1 14 A_2 25 AggregateVotes Table Voter 1. Record vote and de-dupe; retry 2. Increment candidate counter
  76. 76. Correctness in aggregation? UserId Candidate Date Alice A 2013-10-02 Bob B 2013-10-02 Eve B 2013-10-02 Chuck A 2013-10-02 RawVotes Table Segment Votes A_1 23 B_2 12 B_1 14 A_2 25 AggregateVotes Table Voter
  77. 77. DynamoDB Streams
  78. 78. Stream of updates to a table Asynchronous Exactly once Strictly ordered • Per item Highly durable • Scale with table 24-hour lifetime Sub-second latency DynamoDB Streams
  79. 79. View Type Destination Old image—before update Name = John, Destination = Mars New image—after update Name = John, Destination = Pluto Old and new images Name = John, Destination = Mars Name = John, Destination = Pluto Keys only Name = John View types UpdateItem (Name = John, Destination = Pluto)
  80. 80. Stream Table Partition 1 Partition 2 Partition 3 Partition 4 Partition 5 Table Shard 1 Shard 2 Shard 3 Shard 4 KCL Worker KCL Worker KCL Worker KCL Worker Amazon Kinesis Client Library Application DynamoDB Client Application Updates DynamoDB Streams and Amazon Kinesis Client Library
  81. 81. DynamoDB Streams Open Source Cross- Region Replication Library Asia Pacific (Sydney) EU (Ireland) Replica US East (N. Virginia) Cross-region replication
  82. 82. DynamoDB Streams and AWS Lambda
  83. 83. Triggers
  84. 84. Real-time voting architecture (improved) AggregateVotes Table Amazon Redshift Amazon EMR Your Amazon Kinesis– Enabled App Voters RawVotes TableVoting App RawVotes DynamoDB Stream
  85. 85. Real-time voting architecture AggregateVotes Table Amazon Redshift Amazon EMR Your Amazon Kinesis- Enabled App Voters RawVotes TableVoting App RawVotes DynamoDB Stream Handle any scale of election
  86. 86. Real-time voting architecture AggregateVotes Table Amazon Redshift Amazon EMR Your Amazon Kinesis- Enabled app Voters RawVotes TableVoting App RawVotes DynamoDB Stream Vote only once, no changing votes
  87. 87. Real-time voting architecture AggregateVotes Table Amazon Redshift Amazon EMR Your Amazon Kinesis– Enabled App Voters RawVotes TableVoting app RawVotes DynamoDB Stream Real-time, fault-tolerant, scalable aggregation
  88. 88. Real-time voting architecture AggregateVotes Table Amazon Redshift Amazon EMR Your Amazon Kinesis– Enabled App Voters RawVotes TableVoting app RawVotes DynamoDB Stream Voter analytics, statistics
  89. 89. Analytics with DynamoDB Streams Collect and de-dupe data in DynamoDB Aggregate data in-memory and flush periodically Performing real-time aggregation and analytics
  90. 90. Architecture
  91. 91. Reference Architecture
  92. 92. Thank you!

×