
SRV404 Deep Dive on Amazon DynamoDB

Explore Amazon DynamoDB capabilities and benefits in detail and learn how to get the most out of your DynamoDB database. We go over best practices for schema design with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others. We explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including JSON document support, DynamoDB Streams, and more. We also provide lessons learned from operating DynamoDB at scale, including provisioning DynamoDB for IoT.


SRV404 Deep Dive on Amazon DynamoDB

  1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Deep Dive: Amazon DynamoDB. Sean Shriver, AWS NoSQL Solutions Architect. April 19, 2017
  2. Plan:
     • Amazon DynamoDB: tables, indexes, partitioning
     • New features: TTL, VPC endpoints (VPCe), DAX
     • Design patterns: dating website (DAX and GSIs); serverless IoT (TTL, Streams, and DAX)
     • Getting started and developer resources
  3. Amazon DynamoDB:
     • Highly available
     • Consistent, single-digit millisecond latency at any scale
     • Fully managed
     • Secure
     • Integrates with AWS Lambda, Amazon Redshift, and more
  4. DynamoDB Table: items are addressed by a partition key and an optional sort key.
     • Partition key (mandatory): key-value access pattern; determines data distribution
     • Sort key (optional): models 1:N relationships; enables rich query capabilities
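
To make the key-value access pattern concrete, here is a minimal Python (boto3) sketch against a table shaped like the one above; the table name "MyTable" and the attribute names are illustrative:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Write an item keyed by partition key (A1) and sort key (A2).
dynamodb.put_item(
    TableName="MyTable",  # hypothetical table
    Item={
        "A1": {"S": "user#123"},    # partition key
        "A2": {"S": "2017-04-19"},  # sort key
        "A3": {"N": "42"},          # additional attribute
    },
)

# Key-value read: both key attributes are required for GetItem.
resp = dynamodb.get_item(
    TableName="MyTable",
    Key={"A1": {"S": "user#123"}, "A2": {"S": "2017-04-19"}},
)
print(resp.get("Item"))
```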
  5. Local Secondary Indexes:
     • Alternate sort key attribute on the same partition key
     • Index is local to a partition key
     • 10 GB max per partition key, i.e. LSIs limit the number of sort keys
  6. Global Secondary Indexes:
     • Alternate partition (and optional sort) key
     • Index spans all table partition keys
     • Can be added or removed at any time
     • RCUs/WCUs provisioned separately from the table
     • Projection types: KEYS_ONLY, INCLUDE, or ALL
  7. DynamoDB Table with indexes (recap):
     • Partition key (mandatory): key-value access pattern; determines data distribution
     • Sort key (optional): models 1:N relationships; enables rich query capabilities
     • Local Secondary Indexes: limit 5 per table; 10 GB max per partition key, so LSIs limit the number of sort keys
     • Global Secondary Indexes: limit 5 per table; RCUs/WCUs provisioned separately; projections KEYS_ONLY, INCLUDE, or ALL
  8. Data types, table creation options, provisioned capacity:
     • Data type mapping (key attributes may be String, Number, or Binary ONLY): String → String; Integer, Float → Number; Timestamp → Number or String; Blob → Binary; Boolean → Bool; Null → Null; List → List; Set → Set of String, Number, or Binary; Map → Map
     • CreateTable (table names are unique to an account and region): TableName and a partition key (AttributeName, type S/N/B) are required; a sort key, LSI schema, and GSI schema are optional; provisioned reads and writes must each be 1+
     • Read Capacity Unit (RCU): 1 RCU per second returns 4 KB of data for strongly consistent reads, or double the data at the same cost for eventually consistent reads; consumption is rounded up to the next whole unit
     • Write Capacity Unit (WCU): 1 WCU per second writes 1 KB of data, and each item consumes a minimum of 1 WCU
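
A sketch of these CreateTable options in Python (boto3), combining a partition key, a sort key, provisioned capacity, and a separately provisioned KEYS_ONLY GSI; the table, attribute, and index names are illustrative:

```python
import boto3

client = boto3.client("dynamodb")

client.create_table(
    TableName="CustomerOrders",
    AttributeDefinitions=[
        {"AttributeName": "CustomerId", "AttributeType": "S"},
        {"AttributeName": "OrderId", "AttributeType": "S"},
        {"AttributeName": "OrderDate", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "CustomerId", "KeyType": "HASH"},  # partition key
        {"AttributeName": "OrderId", "KeyType": "RANGE"},    # sort key
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
    GlobalSecondaryIndexes=[
        {
            "IndexName": "ByOrderDate",
            "KeySchema": [
                {"AttributeName": "OrderDate", "KeyType": "HASH"},
                {"AttributeName": "OrderId", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            # GSI capacity is provisioned separately from the table.
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }
    ],
)
client.get_waiter("table_exists").wait(TableName="CustomerOrders")
```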
  9. Partitioning: the partition key is hashed to locate an item in the keyspace (Hash.MIN = 00, Hash.MAX = FF). CustomerOrdersTable example: Hash(1) = 7B, Hash(2) = 48, Hash(3) = CD; with three partitions, each holds 33.33% of the keyspace and receives 33.33% of provisioned capacity.
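
DynamoDB's internal hash function is not published, so the sketch below uses MD5 purely to illustrate the idea: hashing a high-cardinality partition key spreads items roughly evenly across a fixed keyspace:

```python
import hashlib
from collections import Counter

# Illustrative only: DynamoDB's real partition hash is internal and is not MD5.
def bucket(partition_key: str, partitions: int = 3) -> int:
    digest = hashlib.md5(partition_key.encode()).digest()
    return digest[0] * partitions // 256  # first byte spans 0x00..0xFF

counts = Counter(bucket(f"customer#{i}") for i in range(100_000))
print(counts)  # high-cardinality keys land roughly evenly across partitions
```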
  10. Partition splits:
      • Split for partition size: the desired size of a partition is 10 GB*, and when a partition surpasses this it can split. Only the oversized partition is affected: e.g. Partition C (33.33% of keyspace and capacity) becomes Partitions D and E at 16.66% each.
      • Split for provisioned capacity: the desired capacity of a partition is expressed as 3w + 1r < 3000, where w = WCUs and r = RCUs*. For example, a partition carrying 600 WCUs and 1,200 RCUs gives 3(600) + 1,200 = 3,000, so it is at the split threshold. A table-wide capacity increase can split every partition: three partitions at 33.33% each become six at 16.66%.
      *Subject to change.
  11. Partitioning and replication: data is replicated to three Availability Zones by design (3-way replication). Each partition of CustomerOrdersTable (e.g. Partition A at 1000 RCUs / 100 WCUs) lives on a different host in each of the three zones, and each partition owns a contiguous range of the hash keyspace; Hash(1) = 7B places OrderId 1 on the same partition (Partition B) in all three zones.
  12. DynamoDB Streams: an ordered stream of item changes, read by Amazon Kinesis Client Library (KCL) workers via GetRecords.
      • Exactly once, strictly ordered by key
      • Highly durable and scalable
      • 24-hour retention
      • Sub-second latency
      • Compatible with the Kinesis Client Library
      • Shards have a lineage and automatically close after time or when the associated DynamoDB partition splits
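
For readers not using the Kinesis Client Library, a minimal boto3 sketch of reading a table's stream directly (table name illustrative; shard pagination omitted):

```python
import boto3

dynamodb = boto3.client("dynamodb")
streams = boto3.client("dynamodbstreams")

# Find the stream attached to the table (streams must be enabled on it).
stream_arn = dynamodb.describe_table(TableName="CustomerOrders")["Table"]["LatestStreamArn"]

# Walk each shard from its oldest available record.
for shard in streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]["Shards"]:
    iterator = streams.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard["ShardId"],
        ShardIteratorType="TRIM_HORIZON",  # records are retained for 24 hours
    )["ShardIterator"]
    for record in streams.get_records(ShardIterator=iterator)["Records"]:
        print(record["eventName"], record["dynamodb"].get("Keys"))
```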
  13. Time-To-Live (TTL): removes data that is no longer relevant. An epoch timestamp attribute (e.g. MyTTL: 1492641900 on a CustomerActiveOrder item) marks when an item can be deleted by a background TTL job, without consuming any provisioned capacity. The deletions appear in the table's DynamoDB Streams stream and can be forwarded on, e.g. through Amazon Kinesis to Amazon Redshift.
  14. Time-To-Live (TTL):
      • TTL deletions are identifiable in DynamoDB Streams
      • Configuration is protected by AWS Identity and Access Management (IAM) and auditable with AWS CloudTrail
      • Deletion is eventual, and the feature is free to use
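
A short sketch of enabling TTL and writing an expiring item with boto3, using the MyTTL attribute from the slides; the 90-day window is illustrative:

```python
import time
import boto3

client = boto3.client("dynamodb")

# Enable TTL on an epoch-timestamp attribute.
client.update_time_to_live(
    TableName="CustomerActiveOrder",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "MyTTL"},
)

# Write an item that becomes eligible for background deletion in 90 days.
client.put_item(
    TableName="CustomerActiveOrder",
    Item={
        "OrderId": {"N": "1"},
        "CustomerId": {"N": "1"},
        "MyTTL": {"N": str(int(time.time()) + 90 * 24 * 3600)},
    },
)
```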
  15. Amazon DynamoDB Accelerator (DAX): sits between your app (in a VPC) and Amazon DynamoDB, turning millisecond reads into microsecond reads.
  16. DynamoDB in the VPC:
      • DAX: microsecond-latency in-memory cache; millions of requests per second; fully managed and highly available; role-based access control; no IGW or VPC endpoint required
      • VPC endpoints: DynamoDB-in-the-VPC; access restricted by IAM resource policy
      • Typical layout: web/app servers or AWS Lambda in private subnets across two Availability Zones, each behind security groups, reaching DynamoDB through DAX or a VPC endpoint
  17. DynamoDB Accelerator (DAX):
      • FQDN endpoint for discovery
      • Supports the AWS Java SDK at launch, with more AWS SDKs to come
      • Cluster-based, Multi-AZ
      • Separate query and item caches
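
Although only the Java SDK supported DAX at launch, a Python client (the amazondax package) followed; a sketch, assuming a hypothetical cluster discovery endpoint, with calls mirroring the low-level DynamoDB client so reads hit the item and query caches transparently:

```python
import botocore.session
from amazondax import AmazonDaxClient  # pip install amazondax

session = botocore.session.get_session()
# The cluster discovery endpoint below is hypothetical; DAX resolves the
# cluster's nodes from it.
dax = AmazonDaxClient(
    session,
    region_name="us-east-1",
    endpoints=["mycluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111"],
)

# Same call shape as boto3's DynamoDB client: GetItem is served from the
# item cache, Query/Scan from the query cache.
resp = dax.get_item(
    TableName="CustomerOrders",
    Key={"CustomerId": {"S": "1"}, "OrderId": {"S": "1"}},
)
print(resp.get("Item"))
```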
  18. DynamoDB table key choice: "To get the most out of DynamoDB throughput, create tables where the partition key has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible." (Amazon DynamoDB Developer Guide)
  19. Elements of even access (all three at once):
      1. Key choice: high key cardinality
      2. Uniform access: access is evenly spread over the key-space
      3. Time: requests arrive evenly spaced in time
  20. Use burst sparingly: DynamoDB "saves" up to 300 seconds of unused capacity per partition, which can then be consumed in a burst (e.g. 1200 CU × 300 s = 360k CU). Once the saved-up capacity drains, attempted throughput above the provisioned level is throttled. Don't completely depend on burst capacity: provision sufficient throughput.
  21. What causes throttling?
      • Throttling occurs if sustained throughput goes beyond provisioned throughput on a partition; a throttle comes from a partition.
      • In Amazon CloudWatch, if consumed capacity is well under provisioned and throttling still occurs, it must be partition throttling.
      • To diagnose: disable retries, write your own retry code, and log all throttled or returned keys.
      • Hot traffic often concentrates on a short list of top items, e.g.: Fire TV Stick, Echo Dot – Black, Amazon Fire TV, Amazon Echo – Black, Fire HD 8, Echo Dot – White, Kindle Paperwhite, Fire Tablet with Alexa, Fire HD 8 Tablet with A…
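
A sketch of that diagnostic advice in boto3: disable the SDK's automatic retries so throttles surface immediately, and log the throttled keys; the function and parameter names are illustrative:

```python
import boto3
from botocore.config import Config

# Disable the SDK's automatic retries so every throttle surfaces at once.
client = boto3.client("dynamodb", config=Config(retries={"max_attempts": 0}))

def put_with_logging(table: str, item: dict, key_attrs: tuple) -> bool:
    """Write one item; on throttle, log its key so hot keys can be found later."""
    try:
        client.put_item(TableName=table, Item=item)
        return True
    except client.exceptions.ProvisionedThroughputExceededException:
        print("THROTTLED:", table, {k: item[k] for k in key_attrs})
        return False
```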
  22. Elastic is the new normal: consumed read and write capacity routinely swings over time; the example workload shows a >200% increase from baseline in write capacity units and a >300% increase in read capacity units.
  23. Use cases: Gaming, Ad Tech, Mobile, Web, IoT
  24. Design Patterns
  25. Dating Website (DESIGN PATTERNS: DynamoDB Accelerator and GSIs):
      • Online dating website running on AWS
      • Users have people they like, and conversely people who like them
      • An hourly batch job matches users
      • Data stored in Likes and Matches tables
  26. Schema design:
      • LIKES table. Requirements: (1) get all people I like; (2) get all people that like me; (3) expire likes after 90 days. Table keys: user_id_self (partition key), user_id_other (sort key); MyTTL (TTL attribute); attributes 1..N. GSI "Other": user_id_other (partition key), user_id_self (sort key).
      • MATCHES table. Requirement: get my matches. Table keys: event_id (partition key), timestamp (sort key); UserIdLeft and UserIdRight; attributes 1..N. GSI "Left": UserIdLeft (partition key), with table keys event_id and timestamp projected, plus UserIdRight. GSI "Right": UserIdRight (partition key), with table keys event_id and timestamp projected, plus UserIdLeft.
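
A sketch of requirements 1 and 2 in Python (boto3), assuming the Likes table and its "Other" GSI from the schema above (the user id value is illustrative):

```python
import boto3
from boto3.dynamodb.conditions import Key

likes = boto3.resource("dynamodb").Table("Likes")

# Requirement 1, "get all people I like": query the base table's partition key.
i_like = likes.query(KeyConditionExpression=Key("user_id_self").eq("user#42"))["Items"]

# Requirement 2, "get all people that like me": query the inverted GSI.
likes_me = likes.query(
    IndexName="Other",  # GSI with user_id_other as its partition key
    KeyConditionExpression=Key("user_id_other").eq("user#42"),
)["Items"]
```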
  27. Matchmaking. Requirements: (1) get all new likes every hour; (2) for each like, get the other user's likes; (3) store matches in the Matches table. An Auto Scaling group of matchmaking servers in a public subnet (behind a security group) reads the LIKES table across partitions 1..N.
  28. Matchmaking, continued: the hourly batch concentrates reads on the LIKES table, and a partition gets throttled (THROTTLE!).
  29. Matchmaking vs. even access: the schema satisfies key choice (high key cardinality) and uniform access (access evenly spread over the key-space), but the hourly batch job violates the third element: requests do not arrive evenly spaced in time.
  30. Matchmaking, revised with a new step 0: write each like to the Likes table, then query by user id to warm the cache, then queue the like for batch processing. The matchmaking Auto Scaling group moves to a private subnet and reads through DAX (behind its own security group), so the hourly job hits the cache instead of the table.
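
A sketch of that step 0, assuming a DAX-backed client (see the slide 17 sketch) and a hypothetical SQS queue feeding the batch job:

```python
import boto3

dynamodb = boto3.client("dynamodb")  # writes go to the table
sqs = boto3.client("sqs")

def record_like(self_id: str, other_id: str, dax) -> None:
    # 0a. Write the like to the Likes table.
    dynamodb.put_item(
        TableName="Likes",
        Item={"user_id_self": {"S": self_id}, "user_id_other": {"S": other_id}},
    )
    # 0b. Query back through DAX so the result is cached before the batch runs.
    dax.query(
        TableName="Likes",
        KeyConditionExpression="user_id_self = :u",
        ExpressionAttributeValues={":u": {"S": self_id}},
    )
    # 0c. Queue the like for the hourly matchmaking job (queue URL hypothetical).
    sqs.send_message(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/likes",
        MessageBody=f"{self_id},{other_id}",
    )
```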
  31. Dating Website takeaways (DESIGN PATTERNS: DynamoDB Accelerator and GSIs):
      • Keep DAX warm by querying after writing
      • Use GSIs for many-to-many relationships
  32. Serverless IoT (DESIGN PATTERNS: TTL, DynamoDB Streams, and DAX):
      • A single DynamoDB table stores sensor data
      • Tiered storage archives old events to Amazon S3
      • Data stored in the Data table
  33. Schema design:
      • DATA table. Requirements: (1) get all events for a device; (2) archive old events after 90 days. Keys: DeviceId (partition key), EventEpoch (sort key); MyTTL (TTL attribute); attributes 1..N.
      • USERDEVICES table (references DATA). Requirement: get all devices for a user. Keys: UserId (partition key), DeviceId (sort key); attributes 1..N.
  34. Serverless IoT pipeline: an item such as DeviceId: 1, EventEpoch: 1492641900, MyTTL: 1492736400 expires via TTL; the deletion flows through Amazon DynamoDB Streams to AWS Lambda, which archives the event to an Amazon S3 bucket. A single DynamoDB table stores sensor data, with tiered storage moving old events to S3.
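
A sketch of such an archival Lambda handler, assuming a DynamoDB Streams event source: TTL expiries are distinguishable because the stream record's userIdentity principal is dynamodb.amazonaws.com. The bucket name is hypothetical, and OldImage assumes the stream view includes old images:

```python
import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        # TTL expiries arrive as REMOVE events performed by the DynamoDB service.
        if (record["eventName"] == "REMOVE"
                and record.get("userIdentity", {}).get("principalId")
                == "dynamodb.amazonaws.com"):
            # Requires the OLD_IMAGE or NEW_AND_OLD_IMAGES stream view type.
            image = record["dynamodb"]["OldImage"]
            key = f"archive/{image['DeviceId']['N']}/{image['EventEpoch']['N']}.json"
            s3.put_object(Bucket="my-iot-archive", Key=key, Body=json.dumps(image))
```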
  35. Serverless IoT throttling: a noisy sensor produces data at a rate several times greater than the others, concentrating writes on one partition of the DATA table and throttling it.
  36. Data table keyspace (Hash.MIN = 00, Hash.MAX = FF), split across four partitions at boundaries 3F, 7F, and BF, each holding 25.0% of the keyspace and 25.0% of provisioned capacity.
  37. Serverless IoT: naïve sharding. Naïve sharding is a scheme where the number of shards is not predefined and will grow over time but never contract (contrast with a fixed shard count). New requirement 0: dynamically shard to overcome throttling.
      • DATA table. Requirements: (1) get all events for a device; (2) archive old events after 90 days. Keys: DeviceId_ShardId (partition key), EventEpoch (sort key); MyTTL (TTL attribute); attributes 1..N.
      • SHARD table. Requirements: (1) get the shard count for a given device; (2) only ever grow the count of shards. Keys: DeviceId (partition key); ShardCount (range: 0..1,000).
  38. Naïve sharding request path (e.g. SHARD item: DeviceId 1, ShardCount 10; DATA item: DeviceId_ShardId "1_3", EventEpoch 1492641900, MyTTL 1492736400):
      1. Read ShardCount from the Shard table
      2. Write to a random shard
      3. If throttled, review (and grow) the shard count
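
A sketch of this request path in boto3; a read of all events for a device would then fan out across shards 0..ShardCount-1 (table names from the slides, error policy illustrative):

```python
import random
import time
import boto3

client = boto3.client("dynamodb")

def write_event(device_id: str, attrs: dict) -> None:
    # 1. Read the device's current shard count (a natural fit for DAX caching).
    shard = client.get_item(TableName="Shard", Key={"DeviceId": {"S": device_id}})
    count = int(shard.get("Item", {}).get("ShardCount", {"N": "1"})["N"])

    # 2. Write to a random shard under the composite partition key.
    try:
        client.put_item(
            TableName="Data",
            Item={
                "DeviceId_ShardId": {"S": f"{device_id}_{random.randrange(count)}"},
                "EventEpoch": {"N": str(int(time.time()))},
                **attrs,
            },
        )
    except client.exceptions.ProvisionedThroughputExceededException:
        # 3. Throttled: grow (never shrink) the shard count, then let the caller retry.
        client.update_item(
            TableName="Shard",
            Key={"DeviceId": {"S": device_id}},
            UpdateExpression="ADD ShardCount :one",
            ExpressionAttributeValues={":one": {"N": "1"}},
        )
        raise
```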
  39. With naïve sharding in place, each write picks a random shard (DeviceId_ShardId: 1_Rand(0,10), EventEpoch: 1492641900, MyTTL: 1492736400), so the noisy device's load is spread across partitions A through D instead of hammering one.
  40. Serverless IoT, complete: a single DynamoDB table stores sensor data; TTL, DynamoDB Streams, and AWS Lambda tier old events out to an Amazon S3 bucket (with Amazon Kinesis Firehose as a delivery option); the SHARD table (e.g. DeviceId 1, ShardCount 10) is fronted by DAX so shard lookups stay cheap; and the design dynamically shards to overcome throttling.
  41. Serverless IoT takeaways (DESIGN PATTERNS: TTL, DynamoDB Streams, and DAX):
      • Use naïve write sharding to dynamically expand shards
      • Use DAX for hot reads, especially from Lambda
      • Use TTL to create tiered storage
  42. Getting started? DynamoDB Local, the Document SDKs, and DynamoDB as a target for AWS DMS.
  43. Thank you! Remember to fill out your survey.
