
Deep Dive - DynamoDB

Amazon DynamoDB is a fully managed, highly scalable distributed database service. In this technical talk, we show you how to use Amazon DynamoDB to build high-scale applications like social gaming, chat, and voting. We show you how to use building blocks such as secondary indexes, conditional writes, consistent reads, and batch operations to build the higher-level functionality such as multi-item atomic writes and join queries. We also discuss best practices such as index projections, item sharding, and parallel scan for maximum scalability.

Speakers:
Philip Fitzsimons, AWS Solutions Architect
Richard Freeman, PhD, Senior Data Scientist/Architect, JustGiving


  1. 1. ©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Deep Dive: Amazon DynamoDB. Fitz (Philip Fitzsimons), Solutions Architecture, Amazon Web Services
  2. 2. Agenda •  Tables, API, data types, indexes •  Scaling •  Data modeling •  Scenarios and best practices •  Customer Story •  DynamoDB Streams •  Reference architecture
  3. 3. Amazon DynamoDB •  Managed NoSQL database service •  Supports both document and key-value data models •  Highly scalable •  Consistent, single-digit millisecond latency at any scale •  Highly available—3x replication •  Simple and powerful API
  4. 4. Tables, API, Data Types
  5. 5. Table: a table contains items, and items contain attributes. Hash Key: mandatory; key-value access pattern; determines data distribution. Range Key: optional; models 1:N relationships; enables rich query capabilities: all items for a hash key; ==, <, >, >=, <=; “begins with”; “between”; sorted results; counts; top/bottom N values; paged responses
  6. 6. DynamoDB API. Table and item API: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables, GetItem, Query, Scan, BatchGetItem, PutItem, UpdateItem, DeleteItem, BatchWriteItem. Stream API (in preview): ListStreams, DescribeStream, GetShardIterator, GetRecords
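A minimal sketch of the table API from Python with boto3 (the table and attribute names here are hypothetical, not from the talk):

    import boto3

    dynamodb = boto3.client("dynamodb")

    # CreateTable with a hash (partition) key and a range (sort) key
    dynamodb.create_table(
        TableName="Orders",
        AttributeDefinitions=[
            {"AttributeName": "CustomerId", "AttributeType": "S"},
            {"AttributeName": "OrderId", "AttributeType": "S"},
        ],
        KeySchema=[
            {"AttributeName": "CustomerId", "KeyType": "HASH"},   # hash key
            {"AttributeName": "OrderId", "KeyType": "RANGE"},     # range key
        ],
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )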
  7. 7. Data types •  String (S) •  Number (N) •  Binary (B) •  String Set (SS) •  Number Set (NS) •  Binary Set (BS) •  Boolean (BOOL) •  Null (NULL) •  List (L) •  Map (M). List and Map are used for storing nested JSON documents
  8. 8. Hash table •  Hash key uniquely identifies an item •  Hash key is used for building an unordered hash index •  Table can be partitioned for scale (diagram: key space 00 to FF; Hash(1) = 7B, Hash(2) = 48, Hash(3) = CD place the items Id = 1, 2, 3 on different partitions)
  9. 9. Partitions are three-way replicated (diagram: the same items, Id = 1, 2, 3, appear on Replica 1, Replica 2, and Replica 3; Partition 1 through Partition N)
  10. 10. Hash-range table •  Hash key and range key together uniquely identify an item •  Within the unordered hash index, data is sorted by the range key •  No limit on the number of items (∞) per hash key, except if you have local secondary indexes (diagram: orders for Customer# 1, 2, 3 hash to Partitions 1 to 3 and are sorted by Order# within each customer)
  11. 11. Indexes
  12. 12. Local secondary index (LSI) •  Alternate range key attribute •  Index is local to a hash key (or partition) •  Projections: KEYS_ONLY, INCLUDE (named attributes, e.g. A3), or ALL •  10 GB max per hash key, i.e. LSIs limit the # of range keys!
  13. 13. Global secondary index (GSI) •  Alternate hash (+range) key •  Index is across all table hash keys (partitions) •  Projections: KEYS_ONLY, INCLUDE (named attributes), or ALL •  RCUs/WCUs provisioned separately for GSIs •  Online indexing
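A minimal sketch of defining a GSI at table creation with boto3 (names hypothetical); note the index has its own key schema, projection, and provisioned throughput:

    import boto3

    dynamodb = boto3.client("dynamodb")

    dynamodb.create_table(
        TableName="Orders",
        AttributeDefinitions=[
            {"AttributeName": "CustomerId", "AttributeType": "S"},
            {"AttributeName": "OrderId", "AttributeType": "S"},
            {"AttributeName": "ProductId", "AttributeType": "S"},
        ],
        KeySchema=[
            {"AttributeName": "CustomerId", "KeyType": "HASH"},
            {"AttributeName": "OrderId", "KeyType": "RANGE"},
        ],
        GlobalSecondaryIndexes=[{
            "IndexName": "ProductId-index",   # alternate hash key
            "KeySchema": [{"AttributeName": "ProductId", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            # RCUs/WCUs are provisioned separately from the table
            "ProvisionedThroughput": {
                "ReadCapacityUnits": 5, "WriteCapacityUnits": 5,
            },
        }],
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )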
  14. 14. How do GSI updates work? The client sends an update request to the table; the table returns the update response; the GSI is then updated asynchronously. If GSIs don’t have enough write capacity, table writes will be throttled!
  15. 15. LSI or GSI? •  LSI can be modeled as a GSI •  If data size in an item collection > 10 GB, use GSI •  If eventual consistency is okay for your scenario, use GSI!
  16. 16. Scaling
  17. 17. Scaling •  Throughput –  Provision any amount of throughput to a table •  Size –  Add any number of items to a table •  Max item size is 400 KB •  LSIs limit the number of range keys due to 10 GB limit •  Scaling is achieved through partitioning
  18. 18. Throughput •  Provisioned at the table level –  Write capacity units (WCUs) are measured in 1 KB per second –  Read capacity units (RCUs) are measured in 4 KB per second •  RCUs measure strictly consistent reads •  Eventually consistent reads cost 1/2 of consistent reads •  Read and write throughput limits are independent
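To make the capacity arithmetic concrete, a small helper (a sketch; the function names are mine, the 4 KB / 1 KB units and the eventual-consistency discount are from the slide):

    import math

    def rcus(item_size_kb, reads_per_sec, eventually_consistent=False):
        # 1 RCU = one strongly consistent read of up to 4 KB per second
        units = math.ceil(item_size_kb / 4) * reads_per_sec
        return units / 2 if eventually_consistent else units

    def wcus(item_size_kb, writes_per_sec):
        # 1 WCU = one write of up to 1 KB per second
        return math.ceil(item_size_kb) * writes_per_sec

    # e.g. 500 strongly consistent reads/sec of 6 KB items:
    # ceil(6/4) = 2 RCUs per read, so 1000 RCUs
    print(rcus(6, 500))   # 1000
    print(wcus(3, 100))   # 300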
  19. 19. Partitioning math: # of partitions (for size) = ceil(table size in GB / 10 GB); # of partitions (for throughput) = ceil(RCUs / 3000 + WCUs / 1000); # of partitions (total) = MAX(# for size, # for throughput). In the future, these details might change…
  20. 20. Partitioning example. Table size = 8 GB, RCUs = 5000, WCUs = 500. # of partitions (for size) = 8 GB / 10 GB = 0.8 → 1. # of partitions (for throughput) = 5000/3000 + 500/1000 = 2.17 → 3. # of partitions (total) = MAX(1, 3) = 3. RCUs per partition = 5000/3 = 1666.67; WCUs per partition = 500/3 = 166.67; data per partition = 8/3 = 2.66 GB. RCUs and WCUs are uniformly spread across partitions
  21. 21. Getting the most out of DynamoDB throughput “To get the most out of DynamoDB throughput, create tables where the hash key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.” —DynamoDB Developer Guide •  Space: access is evenly spread over the key-space •  Time: requests arrive evenly spaced in time
  22. 22. Example: hot keys (heat map: heat per partition over time, showing sustained load on a few partitions)
  23. 23. Example: periodic spike
  24. 24. How does DynamoDB handle bursts? •  DynamoDB saves 300 seconds of unused capacity per partition •  This is used when a partition runs out of provisioned throughput due to bursts –  Provided excess capacity is available at the node
  25. 25. Burst capacity is built-in (chart: provisioned vs. consumed capacity units over time; unused capacity is “saved up”, then the saved-up capacity is consumed during a spike). Burst capacity: 300 seconds (1200 CU × 300 s = 360,000 CU)
  26. 26. Burst capacity may not be sufficient (chart: attempted consumption exceeds provisioned plus burst capacity, producing throttled requests). Burst capacity: 300 seconds (1200 CU × 300 s = 360,000 CU). Don’t completely depend on burst capacity… provision sufficient throughput
  27. 27. What causes throttling? •  If sustained throughput goes beyond provisioned throughput per partition •  From the example before: –  Table created with 5000 RCUs, 500 WCUs –  RCUs per partition = 1666.67 –  WCUs per partition = 166.67 –  If sustained throughput > (1666 RCUs or 166 WCUs) per key or partition, DynamoDB may throttle requests •  Solution: Increase provisioned throughput
  28. 28. What causes throttling? •  Non-uniform workloads –  Hot keys/hot partitions –  Very large bursts •  Dilution of throughput across partitions caused by mixing hot data with cold data –  Use a table per time period for storing time series data so WCUs and RCUs are applied to the hot data set
  29. 29. Data Modeling Store data based on how you will access it!
  30. 30. 1:1 relationships or key-values •  Use a table with a hash key •  Use GetItem or BatchGetItem API. Example: given a user or email, get attributes. Users Table (hash key UserId): UserId = bob → Email = bob@gmail.com, JoinDate = 2011-11-15; UserId = fred → Email = fred@yahoo.com, JoinDate = 2011-12-01. Users-Email-GSI (hash key Email): Email = bob@gmail.com → UserId = bob, JoinDate = 2011-11-15; Email = fred@yahoo.com → UserId = fred, JoinDate = 2011-12-01
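A minimal sketch of both lookups with boto3, using the Users table and Users-Email-GSI from the slide:

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("Users")

    # Given a user, get attributes
    user = table.get_item(Key={"UserId": "bob"}).get("Item")

    # Given an email, query the GSI (GSIs support Query, not GetItem)
    resp = table.query(
        IndexName="Users-Email-GSI",
        KeyConditionExpression=Key("Email").eq("bob@gmail.com"),
    )
    items = resp["Items"]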
  31. 31. 1:N relationships or parent-children •  Use a table with hash and range key •  Use Query API. Example: given a device, find all readings between epoch X and Y. Device-measurements table (hash key DeviceId, range key epoch): DeviceId = 1, epoch = 5513A97C → Temperature = 30, Pressure = 90; DeviceId = 1, epoch = 5513A9DB → Temperature = 30, Pressure = 90
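A minimal sketch of the range query with boto3 (epoch values stored as strings here, matching the slide):

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("Device-measurements")

    # All readings for device 1 between epoch X and epoch Y
    resp = table.query(
        KeyConditionExpression=(
            Key("DeviceId").eq(1) & Key("epoch").between("5513A97C", "5513A9DB")
        )
    )
    readings = resp["Items"]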
  32. 32. N:M relationships •  Use a table and GSI with hash and range key elements switched •  Use Query API. Example: given a user, find all games; or given a game, find all users. User-Games-Table (hash key UserId, range key GameId): bob → Game1, fred → Game2, bob → Game3. Game-Users-GSI (hash key GameId, range key UserId): Game1 → bob, Game2 → fred, Game3 → bob
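A minimal sketch of querying both directions of the N:M relationship with boto3:

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("User-Games-Table")

    # Given a user, find all games (query the table)
    games = table.query(KeyConditionExpression=Key("UserId").eq("bob"))["Items"]

    # Given a game, find all users (query the GSI with the keys switched)
    users = table.query(
        IndexName="Game-Users-GSI",
        KeyConditionExpression=Key("GameId").eq("Game1"),
    )["Items"]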
  33. 33. Documents (JSON) •  New data types (M, L, BOOL, NULL) introduced to support JSON •  Document SDKs –  Simple programming model –  Conversion to/from JSON –  Java, JavaScript, Ruby, .NET •  Cannot index (S, N) elements of a JSON object stored in M –  They need to be modeled as top-level table attributes to be used in LSIs and GSIs. JavaScript → DynamoDB type mapping: string → S, number → N, boolean → BOOL, null → NULL, array → L, object → M
  34. 34. Rich expressions •  Projection expression –  Query/Get/Scan: ProductReviews.FiveStar[0] •  Filter expression –  Query/Scan: #V > :num (#V is a placeholder for the keyword VIEWS) •  Conditional expression –  Put/Update/DeleteItem: attribute_not_exists (#pr.FiveStar) •  Update expression –  UpdateItem: set Replies = Replies + :num
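A minimal sketch of these expressions with boto3 (the table name and Id key are hypothetical; the expressions themselves are the slide’s):

    import boto3

    table = boto3.resource("dynamodb").Table("Forum")  # hypothetical table

    # Update expression guarded by a conditional expression
    table.update_item(
        Key={"Id": 123},
        UpdateExpression="SET Replies = Replies + :num",
        ConditionExpression="attribute_not_exists(#pr.FiveStar)",
        ExpressionAttributeNames={"#pr": "ProductReviews"},
        ExpressionAttributeValues={":num": 1},
    )

    # Projection expression on a read
    resp = table.get_item(
        Key={"Id": 123},
        ProjectionExpression="ProductReviews.FiveStar[0]",
    )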
  35. 35. Scenarios and Best Practices
  36. 36. Event Logging Storing time series data
  37. 37. Time series tables. Each table: Event_id (hash key), Timestamp (range key), Attribute1 …. AttributeN. Current table (hot data): Events_table_2015_April, RCUs = 10000, WCUs = 10000. Older tables (cold data): Events_table_2015_March, RCUs = 1000, WCUs = 100; Events_table_2015_February, RCUs = 100, WCUs = 1; Events_table_2015_January, RCUs = 10, WCUs = 1. Don’t mix hot and cold data; archive cold data to Amazon S3
  38. 38. Dealing with time series data: use a table per time period •  Pre-create daily, weekly, monthly tables •  Provision required throughput for the current table •  Writes go to the current table •  Turn off (or reduce) throughput for older tables
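A minimal sketch of dialing down an older table once writes have moved on (table names and values follow the previous slide):

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Writes now go to Events_table_2015_April; reduce last month's table
    dynamodb.update_table(
        TableName="Events_table_2015_March",
        ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 1},
    )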
  39. 39. Product Catalog Popular items (read)
  40. 40. Scaling bottlenecks. ProductCatalog Table: 100,000 RCUs / 50 partitions ≈ 2000 RCUs per partition (Partition 1 through Partition 50, 2000 RCUs each). All shoppers hit a few popular items (Product A, Product B): SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT"
  41. 41. (Chart: request distribution per hash key; requests per second vs. item primary key, with DynamoDB requests concentrated on a few hot keys)
  42. 42. (Diagram: users issue SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT" against the ProductCatalog table, whose items are spread across Partition 1 and Partition 2)
  43. 43. (Chart: request distribution per hash key, as before, but with cache hits absorbing most of the traffic to the hot keys)
  44. 44. Messaging App: large items; filters vs. indexes; M:N modeling (inbox and outbox)
  45. 45. Messages App. Inbox: SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC. Outbox: SELECT * FROM Messages WHERE Sender='David' LIMIT 50 ORDER BY Date DESC
  46. 46. Large and small attributes mixed. Messages Table (Recipient, Date, Sender, Message): David, 2014-10-02, Bob, …; … 48 more messages for David …; David, 2014-10-03, Alice, …; Alice, 2014-09-28, Bob, …; Alice, 2014-10-01, Carol, … (many more messages). The inbox query SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC reads 50 items × 256 KB each: large message bodies and attachments
  47. 47. Computing inbox query cost: 50 items × 256 KB × (1 RCU / 4 KB) × (1 read / 2 eventually consistent reads) = 1600 RCUs (items evaluated by query × average item size × conversion ratio × eventually consistent read discount)
  48. 48. Separate the bulk data. Inbox-GSI (Recipient, Date, Sender, Subject, MsgId): David, 2014-10-02, Bob, Hi!…, afed; David, 2014-10-03, Alice, RE: The…, 3kf8; Alice, 2014-09-28, Bob, FW: Ok…, 9d2b; Alice, 2014-10-01, Carol, Hi!..., ct7r. Messages Table (MsgId → Body): afed, 3kf8, 9d2b, ct7r. 1. Query Inbox-GSI: 1 RCU (50 sequential items at 128 bytes) 2. BatchGetItem Messages: 1600 RCU (50 separate items at 256 KB). Uniformly distributes large item reads
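A minimal sketch of the two-step read with boto3 (index and attribute names follow the slide; in production any UnprocessedKeys in the batch response should be retried):

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    messages = dynamodb.Table("Messages")

    # 1. Query the small projected index (cheap: ~128-byte items)
    resp = messages.query(
        IndexName="Inbox-GSI",
        KeyConditionExpression=Key("Recipient").eq("David"),
        ScanIndexForward=False,  # newest first
        Limit=50,
    )
    keys = [{"MsgId": item["MsgId"]} for item in resp["Items"]]

    # 2. BatchGetItem the full message bodies for just those 50 keys
    bodies = dynamodb.batch_get_item(
        RequestItems={"Messages": {"Keys": keys}}
    )["Responses"]["Messages"]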
  49. 49. Inbox GSI
  50. 50. Simplified writes David PutItem { MsgId: 123, Body: ..., Recipient: Steve, Sender: David, Date: 2014-10-23, ... } Inbox Global secondary index Messages Table
  51. 51. Outbox: query the Outbox GSI by Sender. SELECT * FROM Messages WHERE Sender='David' LIMIT 50 ORDER BY Date DESC
  52. 52. Messaging app: one Messages Table, with an Inbox GSI serving inbox queries and an Outbox GSI serving outbox queries
  53. 53. Distribute large items (querying many large items at once: Inbox, Messages, Outbox) •  Reduce one-to-many item sizes •  Configure secondary index projections •  Use GSIs to model the M:N relationship between sender and recipient
  54. 54. Multiplayer Online Gaming Query filters vs. composite key indexes
  55. 55. Multiplayer online game data. Games Table (GameId, Date, Host, Opponent, Status): d9bl3, 2014-10-02, David, Alice, DONE; 72f49, 2014-09-30, Alice, Bob, PENDING; o2pnb, 2014-10-08, Bob, Carol, IN_PROGRESS; b932s, 2014-10-03, Carol, Bob, PENDING; ef9ca, 2014-10-03, David, Bob, IN_PROGRESS
  56. 56. Query for incoming game requests •  DynamoDB indexes provide hash and range •  What about queries for two equalities and a range? SELECT * FROM Game WHERE Opponent='Bob' (hash) AND Status='PENDING' (?) ORDER BY Date DESC (range)
  57. 57. Approach 1: Query filter. Secondary Index (Opponent, Date, GameId, Status, Host): Alice, 2014-10-02, d9bl3, DONE, David; Carol, 2014-10-08, o2pnb, IN_PROGRESS, Bob; Bob, 2014-09-30, 72f49, PENDING, Alice; Bob, 2014-10-03, b932s, PENDING, Carol; Bob, 2014-10-03, ef9ca, IN_PROGRESS, David
  58. 58. Approach 1: Query filter. SELECT * FROM Game WHERE Opponent='Bob' ORDER BY Date DESC FILTER ON Status='PENDING'. All of Bob’s games are read from the index; the non-PENDING ones are filtered out after being read
  59. 59. Needle in a haystack
  60. 60. Use a query filter when your index isn’t entirely selective •  Send back less data “on the wire” •  Simplify application code •  Simple SQL-like expressions –  AND, OR, NOT, ()
  61. 61. Approach 2: Composite key. Concatenate Status and Date into a single StatusDate attribute: DONE_2014-10-02, IN_PROGRESS_2014-10-08, IN_PROGRESS_2014-10-03, PENDING_2014-09-30, PENDING_2014-10-03
  62. 62. Approach 2: Composite key. Secondary Index (Opponent, StatusDate, GameId, Host): Alice, DONE_2014-10-02, d9bl3, David; Carol, IN_PROGRESS_2014-10-08, o2pnb, Bob; Bob, IN_PROGRESS_2014-10-03, ef9ca, David; Bob, PENDING_2014-09-30, 72f49, Alice; Bob, PENDING_2014-10-03, b932s, Carol
  63. 63. Approach 2: Composite key. SELECT * FROM Game WHERE Opponent='Bob' AND StatusDate BEGINS_WITH 'PENDING'. The range key is now selective: only Bob’s PENDING games are read
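A minimal sketch of the composite-key query with boto3 (the index name is hypothetical; the key schema follows the slide):

    import boto3
    from boto3.dynamodb.conditions import Key

    games = boto3.resource("dynamodb").Table("Games")

    resp = games.query(
        IndexName="Opponent-StatusDate-index",  # hypothetical GSI name
        KeyConditionExpression=(
            Key("Opponent").eq("Bob") & Key("StatusDate").begins_with("PENDING")
        ),
        ScanIndexForward=False,  # most recent first
    )
    pending_games = resp["Items"]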
  64. 64. Needle in a sorted haystack
  65. 65. Sparse indexes: scan sparse hash GSIs. Game-scores-table (Id (hash), User, Game, Score, Date, Award): 1, Bob, G1, 1300, 2012-12-23; 2, Bob, G1, 1450, 2012-12-23; 3, Jay, G1, 1600, 2012-12-24; 4, Mary, G1, 2000, 2012-10-24, Champ; 5, Ryan, G2, 123, 2012-03-10; 6, Jones, G2, 345, 2012-03-20. Award-GSI (Award (hash), Id, User, Score): Champ, 4, Mary, 2000. Only items that have the Award attribute appear in the index
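A minimal sketch with boto3: because the GSI only contains items that carry the Award attribute, a scan of the index touches very few items:

    import boto3

    table = boto3.resource("dynamodb").Table("Game-scores-table")

    # Scan the sparse index, not the base table
    champions = table.scan(IndexName="Award-GSI")["Items"]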
  66. 66. Replace filters with indexes when you want to optimize a query as much as possible •  Concatenate attributes to form useful secondary index keys (e.g. Status + Date) •  Take advantage of sparse indexes
  67. 67. Customer Story: JustGiving
  68. 68. AWS Big Data Platform Overview Richard Freeman, PhD Senior Data Scientist / Architect richard.freeman@justgiving.com
  69. 69. We are a tech-for-good company connecting the world’s causes with people who care. DONATIONS: $3bn of donations raised since launch, $576m in 2014. CHARITIES: 20,000 charities on the platform. USERS: 23m users have made a transaction. REACH: givers in 164 countries donate to causes in 12 countries. DATA: billions of data points. AMAZON WEB SERVICES: production environment micro-services and big data analytics, reporting and KPIs
  70. 70. The challenge: new data science use cases •  Data sources: clickstream, logs, data warehouse, and external data sources •  Data too big and SQL queries too complex for the data warehouse •  Machine learning algorithms at scale •  Generate KPIs and interactive reports. Requirements for web events: •  High throughput •  Live view in near real time •  Scale out •  Minimum maintenance
  71. 71. Real Time Web Analytics with Kinesis and DynamoDB
  72. 72. The outcome: RAVEN platform, a scalable reporting and analytics cloud platform •  Live view on the data •  Fast to query and scales out •  Minimum support and maintenance •  Needed a pattern to maximise throughput of temporal data in DynamoDB •  AWS managed cloud services and SDKs •  New pattern: resilient and incremental data loads into many Redshift clusters in parallel •  One Redshift cluster per stakeholder, e.g. business, data scientists, analysts, DBAs •  Scale out and run complex queries and machine learning algorithms
  73. 73. Thank you! We are hiring! Contact: richard.freeman@justgiving.com
  74. 74. Derived computations
  75. 75. Real-Time Voting Write-heavy items
  76. 76. Requirements for voting •  Allow each person to vote only once •  No changing votes •  Real-time aggregation •  Voter analytics, demographics
  77. 77. Real-time voting architecture: Voters → Voting App → RawVotes Table → AggregateVotes Table
  78. 78. Scaling bottlenecks. Provision 200,000 WCUs on the Votes Table and the capacity is spread across partitions (Partition 1 through Partition N, 1000 WCUs each), but all voters write to just two hash keys: Candidate A and Candidate B
  79. 79. Write sharding. Votes Table: shard each candidate across many hash keys (Candidate A_1 through Candidate A_8, Candidate B_1 through Candidate B_8); each voter writes to one shard
  80. 80. Write sharding. UpdateItem: “CandidateA_” + rand(0, 200) ADD 1 to Votes. Each vote lands on one of the candidate’s shards
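A minimal sketch of the sharded write with boto3 (the Segment key attribute name is hypothetical; the rand(0, 200) scheme is the slide’s):

    import random
    import boto3

    votes = boto3.resource("dynamodb").Table("Votes")

    def record_vote(candidate):
        # Spread writes for one hot candidate across ~200 hash keys
        shard = "{}_{}".format(candidate, random.randint(0, 200))
        votes.update_item(
            Key={"Segment": shard},
            UpdateExpression="ADD Votes :one",
            ExpressionAttributeValues={":one": 1},
        )

    record_vote("CandidateA")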
  81. 81. Shard aggregation. Votes Table: a periodic process 1. sums the shard counters (Candidate A_1 through Candidate A_8) and 2. stores the total (Candidate A Total: 2.5M)
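A minimal sketch of the periodic aggregation (same hypothetical Segment key as above): read every shard for a candidate and sum, trading read cost for write scalability:

    import boto3

    votes = boto3.resource("dynamodb").Table("Votes")

    def total_votes(candidate, shards=200):
        total = 0
        for i in range(shards + 1):
            resp = votes.get_item(Key={"Segment": "{}_{}".format(candidate, i)})
            total += resp.get("Item", {}).get("Votes", 0)
        return total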
  82. 82. Shard write-heavy hash keys when your write workload is not horizontally scalable •  Trade off read cost for write scalability •  Consider throughput per hash key and per partition
  83. 83. Correctness in voting. The voter’s app: 1. Records the vote and de-dupes in the RawVotes Table (UserId, Candidate, Date), retrying on failure 2. Increments the candidate counter in the AggregateVotes Table (Segment, Votes: A_1 = 23, A_2 = 25, B_1 = 14, B_2 = 12)
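A minimal sketch of step 1 with boto3: a conditional write makes the raw vote idempotent, so retries cannot double-count a user:

    import boto3
    from botocore.exceptions import ClientError

    raw_votes = boto3.resource("dynamodb").Table("RawVotes")

    def vote_once(user_id, candidate, date):
        try:
            raw_votes.put_item(
                Item={"UserId": user_id, "Candidate": candidate, "Date": date},
                # Fails if this user has already voted
                ConditionExpression="attribute_not_exists(UserId)",
            )
            return True   # first vote: safe to increment the aggregate counter
        except ClientError as e:
            if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                return False  # duplicate vote, don't increment
            raise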
  84. 84. Correctness in aggregation? (Same RawVotes and AggregateVotes tables as above: if step 2 fails or repeats, the aggregate can drift from the raw votes)
  85. 85. DynamoDB Streams
  86. 86. •  Stream of updates to a table •  Asynchronous •  Exactly once •  Strictly ordered –  Per item •  Highly durable •  Scale with table •  24-hour lifetime •  Sub-second latency DynamoDB Streams
  87. 87. View types, for UpdateItem (Name = John, Destination = Mars → Pluto): Keys only: Name = John. Old image (before update): Name = John, Destination = Mars. New image (after update): Name = John, Destination = Pluto. Old and new images: both of the above
  88. 88. DynamoDB Streams and Amazon Kinesis Client Library: updates to the table’s partitions flow into the stream’s shards (Shard 1 through Shard 4), and KCL workers in a client application consume the shards
  89. 89. Cross-region replication: DynamoDB Streams plus the open source cross-region replication library, e.g. a US East (N. Virginia) table replicated to EU (Ireland) and Asia Pacific (Sydney)
  90. 90. DynamoDB Streams and AWS Lambda
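A minimal sketch of an AWS Lambda handler subscribed to the RawVotes stream (the aggregation step is elided; the event shape is the standard DynamoDB Streams record format):

    def handler(event, context):
        for record in event["Records"]:
            if record["eventName"] in ("INSERT", "MODIFY"):
                # Attribute values arrive in DynamoDB's typed JSON form
                new_image = record["dynamodb"].get("NewImage", {})
                candidate = new_image.get("Candidate", {}).get("S")
                # ... aggregate in memory / update AggregateVotes here ...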
  91. 91. Real-time voting architecture (improved): Voters → Voting App → RawVotes Table → RawVotes DynamoDB Stream → consumers such as the AggregateVotes Table, Amazon Redshift, Amazon EMR, or your Amazon Kinesis-enabled app
  96. 96. Analytics with DynamoDB Streams •  Collect and de-dupe data in DynamoDB •  Aggregate data in-memory and flush periodically Performing real-time aggregation and analytics
  97. 97. Architecture
  98. 98. Reference Architecture
  99. 99. LONDON
