
Deep Dive on Amazon DynamoDB

Explore Amazon DynamoDB capabilities and benefits in detail and learn how to get the most out of your DynamoDB database. We go over best practices for schema design with DynamoDB across multiple use cases, including gaming, IoT, and others. We explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including DynamoDB Accelerator (DAX), DynamoDB Time-to-Live, and more. We also provide lessons learned from operating DynamoDB at scale, including provisioning DynamoDB for IoT.

  1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Deep Dive: Amazon DynamoDB. Sean Shriver, NoSQL Solutions Architect, Amazon Web Services
  2. Relational (SQL) vs. nonrelational (NoSQL)
  3. Relational vs. nonrelational databases: traditional SQL databases scale up (a primary DB with a secondary), while NoSQL databases scale out across many DB nodes.
  4. SQL (relational): a Products table with columns Product ID (primary key, indexed), Type, Price, and Desc. Rows: 1 Book, 2 Album, 3 Movie. Prices shown are $11.50, $8.99, and $14.95; descriptions include "One of 2 major …", "The Partitas", and "Chaplin’s first …".
  5. SQL (relational), normalized: the Products table links to one table per product type. Books: Book ID 1, Title Odyssey, Author Homer, Date 1871. Albums: Album ID 2, Title 6 Partitas, Artist Bach. Movies: Movie ID 3, Title The Kid, Director Chaplin, Genre Drama, Comedy.
  6. SQL (relational) vs. NoSQL (nonrelational): in NoSQL the same data lives in a single Products table whose primary key is a partition key (Product ID) plus a sort key (Type). The schema is defined per item, so each item carries its own attributes: 1 / Book ID: Odyssey, Homer, 1871; 2 / Album ID: 6 Partitas; 2 / Album ID: Track ID: Partita No. 1, Bach; 3 / Movie ID: The Kid, Drama, Comedy, Chaplin.
  7. Why NoSQL? SQL vs. NoSQL: optimized for storage vs. optimized for compute; normalized/relational vs. denormalized/hierarchical; ad hoc queries vs. instantiated views; scale vertically vs. scale horizontally; good for OLAP vs. built for OLTP at scale.
  8. Agenda: tables, API, data types, and indexes; replication; data modeling [1:1, 1:M]; DynamoDB Streams; use cases; reference architecture
  9. Amazon DynamoDB: a managed NoSQL database service with a simple and powerful API. It supports both document and key-value data models, is highly scalable, delivers consistent single-digit millisecond latency at any scale, and is highly available through 3x replication.
  10. Tables and partitioning
  11. Table: a table holds items, and items hold attributes. Partition key (mandatory): key-value access pattern; determines data distribution. Sort key (optional): models 1:N relationships and enables rich query capabilities: all items for a partition key; ==, <, >, >=, <=; "begins with"; "between"; sorted results; counts; top/bottom N values; paged responses.
  12. APIs. Table: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables, UpdateTimeToLive, DescribeTimeToLive. Item: GetItem, Query, Scan, BatchGetItem, PutItem, UpdateItem, DeleteItem, BatchWriteItem. Stream: ListStreams, DescribeStream, GetShardIterator, GetRecords.
  13. Data types: String (S), Number (N), Binary (B), String Set (SS), Number Set (NS), Binary Set (BS), Map (M), List (L), Boolean (BOOL), Null (NULL). Map and List are used for storing nested JSON documents.
  14. Partition table: the partition key uniquely identifies an item and is used to build an unordered hash index, so the table can be partitioned for scale. Each key is hashed into the 00 to FF key space, which is split into partition ranges 00 to 54, 55 to A9, and AA to FF. Example: Hash(1) = 7B for {Id = 1, Name = Jim}; Hash(2) = 48 for {Id = 2, Name = Andy, Dept = Engg}; Hash(3) = CD for {Id = 3, Name = Kim, Dept = Ops}.
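
The hash-partitioning idea on this slide can be sketched in Python. This is an illustration under stated assumptions: DynamoDB's real hash function and partition boundaries are internal, so MD5 and the three ranges below (taken from the slide's 00 to 54, 55 to A9, AA to FF split) are stand-ins, and the computed hashes will not match the slide's 7B, 48, and CD.

```python
import hashlib

# Assumed partition boundaries from the slide: three ranges covering 0x00-0xFF.
PARTITION_RANGES = [(0x00, 0x54), (0x55, 0xA9), (0xAA, 0xFF)]


def key_hash(partition_key: str) -> int:
    """Map a partition key to one byte of hash space (MD5 is a stand-in)."""
    return hashlib.md5(partition_key.encode("utf-8")).digest()[0]


def partition_for(partition_key: str) -> int:
    """Return the 1-based partition whose hash range contains the key."""
    h = key_hash(partition_key)
    for number, (low, high) in enumerate(PARTITION_RANGES, start=1):
        if low <= h <= high:
            return number
    raise ValueError(f"hash {h:#04x} is outside the key space")


if __name__ == "__main__":
    for item_id in ("1", "2", "3"):
        print(f"Id = {item_id}: hash {key_hash(item_id):02X} -> partition {partition_for(item_id)}")
```

Because only the hash decides placement, consecutive Ids scatter across partitions, which is why the hash index is unordered.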
  15. Partitions are three-way replicated: each partition (Partition 1, Partition 2, … Partition N) is stored as Replica 1, Replica 2, and Replica 3, each holding the same items (for example, {Id = 1, Name = Jim}, {Id = 2, Name = Andy, Dept = Engg}, {Id = 3, Name = Kim, Dept = Ops}).
  16. Partition-sort key table: the partition key and sort key together uniquely identify an item. Within the unordered partition key space, data is sorted by the sort key. There is no limit on the number of items (∞) per partition key, except if you have local secondary indexes. Example (key space 00:0 to FF:∞): Partition 1 (00:0 to 54:∞) holds Hash(2) = 48: Customer# = 2 with Order# = 10 (Pen) and Order# = 11 (Shoes). Partition 2 (55:0 to A9:∞) holds Hash(1) = 7B: Customer# = 1 with Order# = 10 (Toy) and Order# = 11 (Boots). Partition 3 (AA:0 to FF:∞) holds Hash(3) = CD: Customer# = 3 with Order# = 10 (Book) and Order# = 11 (Paper).
  17. Indexes
  18. Global secondary index (GSI): an alternate partition key (plus an optional sort key); the index spans all table partition keys. Table: A1 (partition), A2, A3, A4, A5. Example GSIs: KEYS_ONLY: A2 (partition), A1 (table key). INCLUDE A3: A5 (partition), A4 (sort), A1 (table key), A3 (projected). ALL: A4 (partition), A5 (sort), A1 (table key), A2 and A3 (projected). RCUs/WCUs are provisioned separately for GSIs. Online indexing: GSIs can be added to an existing table.
  19. How do GSI updates work? The client writes to the primary table, and the global secondary index is updated asynchronously. If GSIs don’t have enough write capacity, table writes will be throttled!
  20. Local secondary index (LSI): an alternate sort key attribute; the index is local to a partition key. Table: A1 (partition), A2 (sort), A3, A4, A5. Example LSIs: KEYS_ONLY: A1 (partition), A3 (sort), A2 (table key). INCLUDE A3: A1 (partition), A4 (sort), A2 (table key), A3 (projected). ALL: A1 (partition), A5 (sort), A2 (table key), A3 and A4 (projected). With LSIs, all items sharing a partition key are limited to 10 GB in total, so LSIs limit the number of sort keys per partition key!
  21. Scaling
  22. Scaling: throughput. Throughput is provisioned at the table level, and read and write throughput limits are independent. Tables re-partition according to provisioned throughput. • Write capacity units (WCUs) are measured in 1 KB per second • Read capacity units (RCUs) are measured in 4 KB per second • RCUs measure strongly consistent reads • Eventually consistent reads cost 1/2 of strongly consistent reads
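
A sketch of the capacity arithmetic above, in Python. It assumes the per-item rounding described in the DynamoDB documentation (each write rounds the item size up to the next 1 KB, each read up to the next 4 KB); the helper names are illustrative.

```python
import math

WCU_KB = 1  # one WCU: one write per second for an item up to 1 KB
RCU_KB = 4  # one RCU: one strongly consistent read per second up to 4 KB


def write_units(item_size_kb: float) -> int:
    """WCUs for one write: item size rounded up to the next 1 KB."""
    return max(1, math.ceil(item_size_kb / WCU_KB))


def read_units(item_size_kb: float, strongly_consistent: bool = True) -> float:
    """RCUs for one read: size rounded up to the next 4 KB, halved if eventually consistent."""
    units = max(1, math.ceil(item_size_kb / RCU_KB))
    return units if strongly_consistent else units / 2


print(write_units(3.5))      # 4
print(read_units(9))         # 3
print(read_units(9, False))  # 1.5
```

For example, reading a 9 KB item 100 times per second with eventual consistency needs 150 RCUs.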
  23. Scaling: disk size. Storage is virtually limitless: DynamoDB automatically splits partitions after 10 GB. Items are limited to 400 KB in size.
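  24. Partitioning math:
      $\#\,\text{partitions}_{\text{size}} = \left\lceil \dfrac{\text{table size in GB}}{10\ \text{GB}} \right\rceil$
      $\#\,\text{partitions}_{\text{throughput}} = \left\lceil \dfrac{\text{RCUs for reads}}{3000\ \text{RCU}} + \dfrac{\text{WCUs for writes}}{1000\ \text{WCU}} \right\rceil$
      $\#\,\text{partitions}_{\text{total}} = \max\left(\#\,\text{partitions}_{\text{size}},\ \#\,\text{partitions}_{\text{throughput}}\right)$
      In the future, these details might change… (Not used until the table is created.)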
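  25. Partitioning example: table size = 8 GB, RCUs = 5000, WCUs = 500.
      $\#\,\text{partitions}_{\text{size}} = \left\lceil \dfrac{8\ \text{GB}}{10\ \text{GB}} \right\rceil = \lceil 0.8 \rceil = 1$
      $\#\,\text{partitions}_{\text{throughput}} = \left\lceil \dfrac{5000\ \text{RCU}}{3000\ \text{RCU}} + \dfrac{500\ \text{WCU}}{1000\ \text{WCU}} \right\rceil = \lceil 2.17 \rceil = 3$
      $\#\,\text{partitions}_{\text{total}} = \max(1, 3) = 3$
      RCUs per partition = 5000/3 = 1666.67; WCUs per partition = 500/3 = 166.67; data per partition = 8/3 ≈ 2.67 GB. RCUs and WCUs are spread uniformly across partitions on table creation.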
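
The same arithmetic as a small Python helper, using the 2017-era per-partition figures from the two slides above (10 GB, 3000 RCU, 1000 WCU). As the slide warns, these internals may change, so treat this as a back-of-the-envelope model rather than a description of how DynamoDB allocates partitions today.

```python
import math

SIZE_PER_PARTITION_GB = 10
RCU_PER_PARTITION = 3000
WCU_PER_PARTITION = 1000


def partitions_for_size(table_size_gb: float) -> int:
    return math.ceil(table_size_gb / SIZE_PER_PARTITION_GB)


def partitions_for_throughput(rcu: int, wcu: int) -> int:
    return math.ceil(rcu / RCU_PER_PARTITION + wcu / WCU_PER_PARTITION)


def total_partitions(table_size_gb: float, rcu: int, wcu: int) -> int:
    # A table always has at least one partition.
    return max(1, partitions_for_size(table_size_gb), partitions_for_throughput(rcu, wcu))


n = total_partitions(8, 5000, 500)
print(n)                   # 3
print(round(5000 / n, 2))  # 1666.67 RCUs per partition
print(round(500 / n, 2))   # 166.67 WCUs per partition
```

Raising RCUs to 10,000 on the same table gives ceil(10000/3000 + 500/1000) = ceil(3.83) = 4 partitions, and each partition's share of the 500 WCUs drops to 125.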
  26. Getting the most out of DynamoDB throughput: "To get the most out of DynamoDB throughput, create tables where the partition key has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible." (DynamoDB Developer Guide) 1. Space: (a) key choice: high key cardinality; (b) uniform access: access is evenly spread over the key space. 2. Time: requests arrive evenly spaced in time.
  27. Example: key choice or uniform access [heat map: partition vs. time]
  28. Example: time
  29. Example: uniform access
  30. How does DynamoDB handle bursts? DynamoDB saves 300 seconds of unused capacity per partition. Bursting is best effort!
  31. Burst capacity is built in [chart: capacity units (0 to 1600) over time, provisioned vs. consumed]. Unused capacity is "saved up" and later consumed during a burst. Burst: 300 seconds (1200 × 300 = 360k CU).
  32. Burst capacity may not be sufficient [chart: capacity units (0 to 1600) over time, provisioned vs. consumed vs. attempted, with throttled requests]. Don’t completely depend on burst capacity; provision sufficient throughput. Burst: 300 seconds (1200 × 300 = 360k CU).
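
A simplified Python model of the burst behavior on these slides. It assumes a plain token bucket capped at 300 seconds of provisioned capacity per partition; the slides stress that real bursting is best effort, so DynamoDB may serve less than this model predicts.

```python
def simulate_burst(provisioned: int, demand: list[int], burst_seconds: int = 300):
    """Per-second model of one partition's burst allowance.

    Unused capacity accrues up to `burst_seconds` worth of provisioned
    throughput; requests beyond provisioned + saved capacity are throttled.
    Returns (served, throttled) lists with one entry per second.
    """
    cap = provisioned * burst_seconds
    saved = 0
    served, throttled = [], []
    for requested in demand:
        available = provisioned + saved
        used = min(requested, available)
        served.append(used)
        throttled.append(requested - used)
        # Whatever was not used this second (up to the cap) carries over.
        saved = min(cap, available - used)
    return served, throttled


print(1200 * 300)  # 360000: the slide's maximum saved-up capacity
served, throttled = simulate_burst(100, [0, 0, 0, 0, 500], burst_seconds=3)
print(served, throttled)  # [0, 0, 0, 0, 400] [0, 0, 0, 0, 100]
```

In the usage example, four idle seconds fill the (shortened) three-second bucket, so the spike of 500 gets 100 provisioned plus 300 saved units and the remaining 100 requests are throttled.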
  33. What causes throttling? Non-uniform workloads: • hot keys/hot partitions • large collection sizes [100s of MBs under one partition key]. Dilution of throughput across partitions caused by mixing hot data with cold data: • use a table per time period for storing time series data so WCUs and RCUs are applied to the hot data set.
  34. Data modeling: store data based on how you will access it!
  35. 1:1 relationships or key-values: use a table or GSI with a partition key. Use the GetItem or BatchGetItem API on the table; a GSI is read with Query. Example: given a user or email, get attributes. Users table (partition key: UserId): UserId = bob → Email = bob@gmail.com, JoinDate = 2016-11-15; UserId = fred → Email = fred@yahoo.com, JoinDate = 2016-12-01. Users-Email-GSI (partition key: Email): Email = bob@gmail.com → UserId = bob, JoinDate = 2016-11-15; Email = fred@yahoo.com → UserId = fred, JoinDate = 2016-12-01.
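
A sketch of both lookups as argument dictionaries for the low-level boto3 DynamoDB client (`client.get_item(**kwargs)` and `client.query(**kwargs)`). The table name `Users` and the string (`S`) attribute types are assumptions; the index name comes from the slide. GetItem only reads a base table, so the email lookup has to Query the GSI.

```python
def get_user_request(user_id: str) -> dict:
    """GetItem arguments: fetch one user by its partition key."""
    return {
        "TableName": "Users",  # assumed table name
        "Key": {"UserId": {"S": user_id}},
    }


def get_user_by_email_request(email: str) -> dict:
    """Query arguments: GetItem cannot target an index, so query the GSI."""
    return {
        "TableName": "Users",
        "IndexName": "Users-Email-GSI",
        "KeyConditionExpression": "Email = :email",
        "ExpressionAttributeValues": {":email": {"S": email}},
    }


if __name__ == "__main__":
    # With AWS credentials configured, these could be sent as:
    #   import boto3
    #   client = boto3.client("dynamodb")
    #   client.get_item(**get_user_request("bob"))
    #   client.query(**get_user_by_email_request("bob@gmail.com"))
    print(get_user_request("bob"))
    print(get_user_by_email_request("bob@gmail.com"))
```

GSI keys are not required to be unique, so the Query can return more than one item if two users share an email; the application has to enforce uniqueness if it needs it.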
  36. 1:N relationships or parent-children: use a table or GSI with a partition key and sort key, and use the Query API. Example: given a device, find all readings between epoch X and Y. Device-measurements table (partition key: DeviceId; sort key: epoch): DeviceId = 1, epoch = 5513A97C → Temperature = 30, pressure = 90; DeviceId = 1, epoch = 5513A9DB → Temperature = 30, pressure = 90.
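
A sketch of the range query as low-level boto3 `query` arguments, plus a loop that follows the paged responses listed on slide 11. The key types are assumptions: DeviceId as a Number and epoch as a fixed-width hex String, which sorts correctly only if every value has the same width and letter case. `query_fn` would normally be `boto3.client("dynamodb").query`.

```python
def readings_between_request(device_id: int, start: str, end: str) -> dict:
    """Query arguments for one device's readings with start <= epoch <= end."""
    return {
        "TableName": "Device-measurements",
        # "#ts" is a placeholder so the sort key name never collides with a reserved word.
        "KeyConditionExpression": "DeviceId = :device AND #ts BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#ts": "epoch"},
        "ExpressionAttributeValues": {
            ":device": {"N": str(device_id)},
            ":start": {"S": start},
            ":end": {"S": end},
        },
        "ScanIndexForward": True,  # ascending sort-key order
    }


def query_all(query_fn, request: dict) -> list:
    """Follow LastEvaluatedKey until every page of results has been read."""
    items = []
    kwargs = dict(request)
    while True:
        page = query_fn(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key


if __name__ == "__main__":
    print(readings_between_request(1, "5513A97C", "5513A9DB"))
```

Setting `ScanIndexForward` to False and adding `Limit` returns the newest N readings, which is the "top/bottom N values" pattern.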
  37. N:M relationships: use a table and a GSI with the partition and sort key elements switched, and use the Query API. Example: given a user, find all games; or given a game, find all users. User-Games-Table (partition key: UserId; sort key: GameId): bob/Game1, fred/Game2, bob/Game3. Game-Users-GSI (partition key: GameId; sort key: UserId): Game1/bob, Game2/fred, Game3/bob.
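
An in-memory Python model of the key-switching idea. It only simulates what the table and the GSI would return to a Query on each partition key; in DynamoDB the GSI is maintained for you, asynchronously, from the table's writes.

```python
from collections import defaultdict

# (UserId, GameId) items from the User-Games-Table example.
USER_GAMES = [("bob", "Game1"), ("fred", "Game2"), ("bob", "Game3")]


def build_index(pairs) -> dict:
    """Group (partition key, sort key) pairs; each group is sorted by sort key."""
    groups = defaultdict(list)
    for partition_key, sort_key in pairs:
        groups[partition_key].append(sort_key)
    return {pk: sorted(sort_keys) for pk, sort_keys in groups.items()}


# Base table keyed by user; the GSI is the same items with the keys switched.
table = build_index(USER_GAMES)
gsi = build_index((game, user) for user, game in USER_GAMES)

print(table["bob"])  # ['Game1', 'Game3']: all games for a user
print(gsi["Game2"])  # ['fred']: all users for a game
```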
  38. Rich expressions: • Projection expression (Query/Get/Scan): ProductReviews.FiveStar[0] • Filter expression (Query/Scan): #V > :num (#V is a placeholder for the reserved word VIEWS) • Condition expression (Put/Update/DeleteItem): attribute_not_exists(#pr.FiveStar) • Update expression (UpdateItem): SET Replies = Replies + :num
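
The four expressions from this slide, placed in low-level boto3 request dictionaries (for `scan`, `put_item`, and `update_item`). The table names `ProductCatalog` and `Thread`, the `Views` attribute name behind `#V`, and the key shapes are hypothetical; the expression strings are the slide's.

```python
def five_star_scan(min_views: int) -> dict:
    """Scan arguments using the slide's projection and filter expressions."""
    return {
        "TableName": "ProductCatalog",  # hypothetical table
        "ProjectionExpression": "ProductReviews.FiveStar[0]",
        "FilterExpression": "#V > :num",
        "ExpressionAttributeNames": {"#V": "Views"},  # VIEWS is a reserved word
        "ExpressionAttributeValues": {":num": {"N": str(min_views)}},
    }


def put_if_no_five_star(item: dict) -> dict:
    """PutItem arguments that fail if the stored item already has FiveStar reviews."""
    return {
        "TableName": "ProductCatalog",
        "Item": item,
        "ConditionExpression": "attribute_not_exists(#pr.FiveStar)",
        "ExpressionAttributeNames": {"#pr": "ProductReviews"},
    }


def add_replies(thread_key: dict, count: int = 1) -> dict:
    """UpdateItem arguments that increment a counter attribute in place."""
    return {
        "TableName": "Thread",  # hypothetical table
        "Key": thread_key,
        "UpdateExpression": "SET Replies = Replies + :num",
        "ExpressionAttributeValues": {":num": {"N": str(count)}},
    }


if __name__ == "__main__":
    print(five_star_scan(100))
    print(add_replies({"ThreadId": {"S": "t1"}}))
```

`SET Replies = Replies + :num` fails if the item has no Replies attribute yet; `SET Replies = if_not_exists(Replies, :zero) + :num` or `ADD Replies :num` covers the first increment.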
  39. DynamoDB Streams: table partitions (Partition A, B, C) write an ordered stream of item changes to stream shards. Exactly once, strictly ordered by key; highly durable and scalable; 24-hour retention; sub-second latency; compatible with the Kinesis Client Library. Shards have a lineage and automatically close after a time or when the associated DynamoDB partition splits. KCL workers in an Amazon Kinesis Client Library application read the shards with GetRecords.
  40. Use cases
  41. Event logging: storing time series data
  42. Time series tables: one table per month (Events_table_2016_January, Events_table_2016_February, Events_table_2016_March, Events_table_2016_April), each with Event_id (partition key), Timestamp (sort key), and Attribute1 … Attribute N. The current table holds hot data and older tables hold cold data, with throughput reduced as tables age: RCUs = 10000, WCUs = 10000 for the current table; then RCUs = 1000, WCUs = 100; RCUs = 100, WCUs = 1; and RCUs = 10, WCUs = 1 for the oldest. Don’t mix hot and cold data; archive cold data to Amazon S3.
  43. DynamoDB TTL: use DynamoDB TTL and Streams to archive. The current table, Events_table_2016_April (hot data; RCUs = 10000, WCUs = 10000), has Event_id (partition key), Timestamp (sort key), a TTL attribute myTTL (for example, 1489188093), … Attribute N. TTL deletes expired items, and their stream records are used to copy them into Events_Archive (cold data; RCUs = 100, WCUs = 1), which has Event_id (partition key), Timestamp (sort key), Attribute1 … Attribute N.
  44. Dealing with time series data: isolate cold data from hot data. Pre-create daily, weekly, or monthly tables; provision the required throughput for the current table; write to the current table; turn off (or reduce) throughput for older tables, OR move items to a separate table with TTL.
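
A Python sketch of the two mechanics on these slides: routing writes to the current period's table, and computing a TTL value. The naming helper mirrors the slide's Events_table_2016_April style, and the 30-day retention is an arbitrary example. TTL itself is enabled per table with UpdateTimeToLive (in boto3, `update_time_to_live(TableName=..., TimeToLiveSpecification={"Enabled": True, "AttributeName": "myTTL"})`).

```python
import calendar
from datetime import datetime, timedelta, timezone


def table_for(ts: datetime, prefix: str = "Events_table") -> str:
    """Monthly table name in the slide's style, e.g. Events_table_2016_April."""
    # calendar.month_name is locale-dependent; this assumes an English (C) locale.
    return f"{prefix}_{ts.year}_{calendar.month_name[ts.month]}"


def ttl_epoch(created: datetime, retention: timedelta) -> int:
    """Expiry time for a TTL attribute such as myTTL: epoch seconds, stored as a Number."""
    if created.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return int((created + retention).timestamp())


now = datetime(2016, 4, 15, 12, 0, tzinfo=timezone.utc)
print(table_for(now))  # Events_table_2016_April
print(ttl_epoch(now, timedelta(days=30)))
```

Requiring a timezone-aware datetime avoids silently computing the expiry in the host's local time.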
  45. Product catalog: popular items (read)
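  46. Scaling bottlenecks: shoppers request Product A and Product B from the ProductCatalog table. With 100,000 RCUs spread over 50 partitions, $\dfrac{100{,}000\ \text{RCU}}{50\ \text{partitions}} = 2000\ \text{RCU per partition}$ (Partition 1, …, Partition K, …, Partition M, …, Partition 50). Every request for a popular product lands on the single partition that holds it: SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT"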
  47. [Chart: DynamoDB requests per second by item primary key (request distribution per partition key)]
  48. DAX: DynamoDB Accelerator (DAX) is an in-memory, write-through caching cluster built specifically for DynamoDB, sitting between the app and DynamoDB.
  49. What is DAX? The benefits of DAX: • Latency: <1 ms • Throughput: millions of requests per second for a 3-node (highly available) cluster • Simplified caching: DynamoDB API, AWS integration • Hot-key/cost savings: reduce over-provisioning for frequently accessed data
  50. DAX latency: an app reading DynamoDB directly sees about 5 ms; through DAX, cache hits return in about 400 µs, while a miss still incurs the ~5 ms DynamoDB read.
  51. How DAX works, reads (EC2 app with the DAX SDK in an Amazon VPC): check DAX first, then read from DynamoDB. GetItem is synchronous: on a cache miss, DAX reads the item from DynamoDB and returns it to the app.
  52. How DAX works, writes: DAX is a write-through cache. PutItem goes through DAX to DynamoDB.
  53. How DAX works: DAX is API compatible with DynamoDB. Read APIs: GetItem, BatchGetItem, Query, Scan. Modify APIs: PutItem, UpdateItem, DeleteItem, BatchWriteItem. Control plane APIs: not supported (CreateTable, DeleteTable, etc.).
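
A toy Python model of the read-through and write-through behavior described on these slides. It is not the DAX client or its API: a plain dict stands in for the DynamoDB table, and the model has no item TTL, eviction, or cluster, all of which a real DAX cluster handles.

```python
class WriteThroughCache:
    """Toy model of DAX-style caching; `store` stands in for a DynamoDB table."""

    def __init__(self, store: dict):
        self.store = store
        self.cache: dict = {}
        self.hits = 0
        self.misses = 0

    def get_item(self, key):
        # Read path: check the cache first, then fall back to the table.
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        item = self.store.get(key)
        if item is not None:
            self.cache[key] = item  # populate the cache for the next read
        return item

    def put_item(self, key, item):
        # Write path: write to the table, then update the cache so reads see it.
        self.store[key] = item
        self.cache[key] = item


table = {"POPULAR_PRODUCT": {"Description": "hot item"}}
cache = WriteThroughCache(table)
for _ in range(1000):
    cache.get_item("POPULAR_PRODUCT")
print(cache.misses, cache.hits)  # 1 999
```

The usage example is the hot-key case from slide 46: only the first read reaches the table, which is where the over-provisioning savings come from.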
  54. [Chart: DynamoDB requests and cache hits per second by item primary key (request distribution per partition key)]
  55. Promotion tracking: analytics
  56. Requirements for promotion tracking: monitor customer promotion redemption; support ad hoc queries from Hive; archive historical promotion redemption in Redshift; BI requirements are not known in advance.
  57. 1. Create orders. 2. Capture user activity. Promotion redemption table:
      CUSTOMER-ID | ORDER-ITEM-ID | PROMOTION-ID   | STATUS
      CUST-1      | ITEM-a        | PROMO-PRIMEDAY | OK
      CUST-1      | ITEM-b        | PROMO-GOLDBOX  | OK
      CUST-1      | ITEM-c        | PROMO-GOLDBOX  | OK
      CUST-2      | ITEM-a        | PROMO-PRIMEDAY | OK
      CUST-3      | ITEM-a        | PROMO-PRIMEDAY | RETURNED
  58. BI questions: 1. Which promotions were valued by customers aged 18 to 24? 2. How many PrimeDay promotions were returned, by country? 3. How did the PrimeDay sales compare to Black Friday last year? (Sources: the promotion redemption table from slide 57, Amazon S3, Redshift, and other data sources.)
  59. Architecture: 1. Keep all promo redemption history in Redshift: the redemption table in Amazon DynamoDB feeds DynamoDB Streams, then Amazon Kinesis Firehose, which loads Redshift. 2. Support ad hoc queries with HQL: Amazon EMR queries Amazon DynamoDB, Amazon S3, and other data sources.
  60. Use DynamoDB Streams to record changes; use S3 as an intermediary data store; use Hive at low priority for ad hoc queries; analyze items out of band; queries are unknown at design time. (Components: Amazon DynamoDB, DynamoDB Streams, Amazon S3, Redshift, other data sources.)
  61. Real-time voting: write-heavy items
  62. Requirements for voting: allow each person to vote only once; no changing votes; real-time aggregation; voter analytics and demographics.
  63. Real-time voting architecture: voters use the Voting App, which writes to the RawVotes table and the AggregateVotes table.
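  64. Scaling bottlenecks: voters send votes for Candidate A and Candidate B to the Votes table. Even if you provision 200,000 WCUs, each partition (Partition 1, …, Partition K, …, Partition M, …, Partition N) gets only 1000 WCUs, and all votes for one candidate land on the single partition that holds that candidate's item.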
  65. Write sharding: instead of one item per candidate, the Votes table holds shards Candidate A_1 … Candidate A_8 and Candidate B_1 … Candidate B_8, spread across partitions.
  66. Write sharding: for each vote, UpdateItem on "CandidateA_" + rand(0, 10) and ADD 1 to Votes.
  67. Shard aggregation: a periodic process (1) sums Votes across the shards Candidate A_1 … Candidate A_8 (Candidate A total: 2.5M) and (2) stores the result.
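
A Python sketch of write sharding and the periodic roll-up, with the write expressed as arguments for a low-level boto3 `update_item` call. The shard count of eight follows the diagrams (the slide's rand(0, 10) implies a different count), and the `Segment` key name is borrowed from the AggregateVotes example; both are assumptions about the real schema.

```python
import random

SHARD_COUNT = 8  # assumption: the slides draw eight shards per candidate


def shard_key(candidate: str, shard_count: int = SHARD_COUNT) -> str:
    """Pick a random shard suffix so one candidate's writes spread over many keys."""
    return f"{candidate}_{random.randint(1, shard_count)}"


def vote_request(candidate: str) -> dict:
    """UpdateItem arguments that atomically add 1 to a random shard's counter."""
    return {
        "TableName": "Votes",
        "Key": {"Segment": {"S": shard_key(candidate)}},  # assumed key attribute name
        "UpdateExpression": "ADD #votes :one",
        "ExpressionAttributeNames": {"#votes": "Votes"},
        "ExpressionAttributeValues": {":one": {"N": "1"}},
    }


def candidate_total(shard_counts: dict, candidate: str) -> int:
    """The periodic aggregation step: sum every shard that belongs to a candidate."""
    prefix = f"{candidate}_"
    return sum(count for key, count in shard_counts.items() if key.startswith(prefix))


# Shard counts from the AggregateVotes example on slide 69.
counts = {"A_1": 23, "B_2": 12, "B_1": 14, "A_2": 25}
print(candidate_total(counts, "A"))  # 48
print(candidate_total(counts, "B"))  # 26
```

Reading a candidate's total now costs one read per shard (eight GetItems or one BatchGetItem), which is the read-for-write trade-off on slide 68.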
  68. Trade off read cost for write scalability: consider throughput per partition key and per partition, and shard write-heavy partition keys when your write workload is not horizontally scalable.
  69. Correctness in voting: for each voter, (1) record the vote and de-dupe (retry on failure), then (2) increment the candidate counter.
      RawVotes table:
      UserId | Candidate | Date
      Alice  | A         | 2016-10-02
      Bob    | B         | 2016-10-02
      Eve    | B         | 2016-10-02
      Chuck  | A         | 2016-10-02
      AggregateVotes table:
      Segment | Votes
      A_1     | 23
      B_2     | 12
      B_1     | 14
      A_2     | 25
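
A sketch of step 1 (record and de-dupe) as low-level boto3 `put_item` arguments, with a tiny in-memory stand-in for the condition check so the behavior can be run without AWS. It assumes UserId is the RawVotes partition key. The condition makes the put idempotent per user, so the vote can be retried safely; keeping the counter increment in step 2 consistent with it is the separate problem slide 70 raises.

```python
def record_vote_request(user_id: str, candidate: str, date: str) -> dict:
    """PutItem arguments that store a vote only if this user has not voted yet."""
    return {
        "TableName": "RawVotes",
        "Item": {
            "UserId": {"S": user_id},
            "Candidate": {"S": candidate},
            "Date": {"S": date},
        },
        # Assumes UserId is the partition key, so an existing item means a prior vote.
        "ConditionExpression": "attribute_not_exists(UserId)",
    }


def apply_conditional_put(table: dict, request: dict) -> bool:
    """In-memory stand-in for DynamoDB's condition check (illustration only)."""
    user_id = request["Item"]["UserId"]["S"]
    if user_id in table:
        return False  # DynamoDB would raise ConditionalCheckFailedException here
    table[user_id] = request["Item"]
    return True


raw_votes: dict = {}
print(apply_conditional_put(raw_votes, record_vote_request("Alice", "A", "2016-10-02")))  # True
print(apply_conditional_put(raw_votes, record_vote_request("Alice", "B", "2016-10-02")))  # False
```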
  70. Correctness in aggregation? (Same RawVotes and AggregateVotes tables as slide 69.)
  71. DynamoDB Streams
  72. DynamoDB Streams: a stream of updates to a table. Asynchronous; exactly once; strictly ordered (per item); highly durable (scales with the table); 24-hour lifetime; sub-second latency.
  73. View types, for UpdateItem (Name = John, Destination = Pluto) on an item whose Destination was Mars: • Old image (before update): Name = John, Destination = Mars • New image (after update): Name = John, Destination = Pluto • Old and new images: Name = John, Destination = Mars and Name = John, Destination = Pluto • Keys only: Name = John
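
A Python model of what each view type carries for the UpdateItem example above. The view-type names and the `Keys`, `OldImage`, and `NewImage` fields match DynamoDB Streams; the function itself is illustrative, and real stream records include more metadata (event name, sequence number, size).

```python
# The four StreamViewType settings and the images each one adds to a record.
VIEW_TYPES = {
    "KEYS_ONLY": (),
    "NEW_IMAGE": ("NewImage",),
    "OLD_IMAGE": ("OldImage",),
    "NEW_AND_OLD_IMAGES": ("OldImage", "NewImage"),
}


def stream_record(view_type: str, keys: dict, old_image: dict, new_image: dict) -> dict:
    """Model the `dynamodb` section of a stream record for a MODIFY event."""
    images = {"OldImage": old_image, "NewImage": new_image}
    record = {"Keys": keys}
    for image_name in VIEW_TYPES[view_type]:
        record[image_name] = images[image_name]
    return record


keys = {"Name": {"S": "John"}}
old = {"Name": {"S": "John"}, "Destination": {"S": "Mars"}}
new = {"Name": {"S": "John"}, "Destination": {"S": "Pluto"}}
for view_type in VIEW_TYPES:
    print(view_type, stream_record(view_type, keys, old, new))
```

Smaller view types mean smaller records; NEW_AND_OLD_IMAGES is the one to choose when a consumer needs to compute deltas, such as decrementing an old value and incrementing a new one.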
  74. DynamoDB Streams and the Amazon Kinesis Client Library: a DynamoDB client application sends updates to the table (Partitions 1 to 4); each partition's changes go to a stream shard (Shards 1 to 4), and each shard is read by a KCL worker in an Amazon Kinesis Client Library application.
  75. Cross-region replication: the open-source DynamoDB Streams cross-region replication library keeps replicas across US East (N. Virginia), EU (Ireland), and Asia Pacific (Sydney).
  76. Real-time voting architecture (improved): voters use the Voting App, which writes to the RawVotes table; the RawVotes DynamoDB stream feeds your Amazon Kinesis-enabled app, which maintains the AggregateVotes table and feeds Amazon Redshift and Amazon EMR.
  77-80. Real-time voting architecture (animation build steps of the slide 76 diagram)
  81. Analytics with DynamoDB Streams: collect and de-dupe data in DynamoDB, then aggregate data in memory and flush periodically, for real-time aggregation and analytics.
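
A Python sketch of the "aggregate in memory, flush periodically" step, as it might run inside a stream consumer such as a KCL worker. The record shape (`eventName`, `dynamodb.NewImage`) follows DynamoDB Streams and the `Candidate` attribute comes from the RawVotes example; the flush-every-N policy and the `flushed` list (standing in for writes to AggregateVotes) are assumptions.

```python
from collections import Counter


class VoteAggregator:
    """Count new RawVotes items per candidate in memory and flush in batches."""

    def __init__(self, flush_every: int = 100):
        self.pending = Counter()
        self.flush_every = flush_every
        self.seen = 0
        self.flushed = []  # stand-in for batched writes to AggregateVotes

    def process(self, records):
        for record in records:
            if record.get("eventName") != "INSERT":
                continue  # votes can't change, so only new items count
            image = record["dynamodb"]["NewImage"]
            self.pending[image["Candidate"]["S"]] += 1
            self.seen += 1
            if self.seen % self.flush_every == 0:
                self.flush()

    def flush(self):
        if self.pending:
            self.flushed.append(dict(self.pending))
            self.pending.clear()


def vote(candidate: str, event: str = "INSERT") -> dict:
    return {"eventName": event, "dynamodb": {"NewImage": {"Candidate": {"S": candidate}}}}


agg = VoteAggregator(flush_every=2)
agg.process([vote("A"), vote("B"), vote("B", "MODIFY"), vote("A")])
agg.flush()
print(agg.flushed)  # [{'A': 1, 'B': 1}, {'A': 1}]
```

A real consumer would typically also flush on a timer and checkpoint only after a flush succeeds, so a crash replays unflushed records rather than losing them.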
  82. Architecture
  83. Reference architecture
  84. aws.amazon.com/activate. Don’t forget to take the survey! Up next: 11:00 AM to 11:30 AM | Amazon DynamoDB Accelerator
