AWS Webcast - Data Modeling for low cost and high performance with DynamoDB

Efficient schema design reduces cost and eliminates barriers to scalability. This requires a different approach to data modeling, with a focus on optimizing for usage patterns rather than merely describing objects. In this session, you will learn best practice techniques for minimizing payload size and modeling one-to-many relationships in DynamoDB, leveraging the versatility of hash+range primary keys. These methods have been used extensively by customers with substantial workloads on DynamoDB, enabling them to grow their applications quickly and cost-effectively.

Published in: Business, Technology

AWS Webcast - Data Modeling for low cost and high performance with DynamoDB

  1. © 2011 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.
     Data Modeling for Low Cost and High Performance with Amazon DynamoDB
     David Pearson, Siva Raghupathy
  2. What is AWS?
     • Compute
     • Storage
     • AWS Global Infrastructure
     • Database
     • Application Services
     • Deployment & Administration
     • Networking
  3. AWS Database Services: Scalable High Performance Application Storage in the Cloud
     • Amazon DynamoDB: Fast, Predictable, Highly-Scalable NoSQL Data Store
     • Amazon RDS: Managed Relational Database Service for MySQL, Oracle and SQL Server
     • Amazon ElastiCache: In-Memory Caching Service
     • Amazon Redshift: Fast, Powerful, Fully Managed, Petabyte-Scale Data Warehouse Service
  5. Fully Managed, Non-Relational, Predictable Performance, Massively Scalable =
  6. = FAST: develop in days, scale in minutes
     • Low Latency: single-digit millisecond latency with on-disk durability, spanning multiple AZs
     • Admin-Free Scalability: requested-capacity provisioning of read and write throughput
     • Rapid Deployment: simple APIs and no effort needed to configure and maintain a cluster
  7. Data Lifecycle Integration with Redshift
     • Direct integration with COPY command
     • High velocity data ages into Redshift
     • Low cost, high scale option for new apps
     Diagram labels: DynamoDB (OLTP, Web Apps); Redshift (Reporting and BI)
  8. New! Local Secondary Indexes
     • Support for new access patterns
     • Query flexibility
     • Application complexity
     • Consistent latency
  9. Access Pattern Modeling
     Method
     1. Describe the overall use case (maintain context)
     2. Identify the individual access patterns of the use case
     3. Model each access pattern to its own discrete data set
     4. Consolidate data sets into tables and indexes
     Benefits
     • Data is stored in the format in which it is accessed
     • Payloads are minimal for each access
  10. Agenda
      • DynamoDB Recap
      • Modeling Primitives
      • Modeling Examples
  11. Tables, Items, Attributes
      • An AWS account owns a collection of Tables
      • A Table is a collection of Items
      • An Item is an arbitrary collection of Attributes (name-value pairs)
  12. Primary Key
      A table must have a Primary Key
      • Each item must have a unique primary key
      Types of primary keys:
      • Hash
        – One attribute is chosen as the Hash key
        – Unordered hash index
      • Hash and Range
        – Two attributes constitute a composite key
          » First is Hash: unordered hash index
          » Second is Range: sorted range index
        – Sorted collection within a hash bucket
  13. Online Gaming Example
  14. DynamoDB Namespace
      AWS Account1
        Region1
          Table1
            Item1: Attribute1 (Hash key), Attribute2 (Range key), Attribute3
            Item2: Attribute1, Attribute2, …
            …
          Table2
            Item1: Attribute1 (Hash key), Attribute2, …
            Item2: Attribute1, Attribute2, …
            …
          …
        Region2
          …
  15. Indexing
      • Data is indexed by the primary key
      • Local Secondary Indexes provide an “alternate range key” for your table for efficient queries
  16. Partitioning
      DynamoDB automatically partitions data by the hash key
      • Hash key spreads data & workload across partitions
      Auto-partitioning driven by:
      • Data set size
      • Provisioned Throughput
      Tip: A large number of unique hash keys and a uniform distribution of workload across hash keys lend themselves well to massive scale!
  17. Data types
      Scalar data types
      • String (S)
        – Unicode with UTF-8 binary encoding
      • Number (N)
        – Up to 38 digits of precision; values can be between 10^-128 and 10^+126
        – Variable-width encoding can occupy up to 21 bytes
      • Binary (B)
        – Base64-encoded binary data
      Multi-valued types
      • String Set (SS)
      • Number Set (NS)
      • Binary Set (BS)
  18. DynamoDB API
      • Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
      • Read and write items: PutItem, GetItem, UpdateItem, DeleteItem
      • Bulk get or update: BatchGetItem, BatchWriteItem
      • Query specific items OR scan the full table: Query, Scan
  19. Read Patterns
      GetItem
      • Returns a set of attributes for the item that matches the primary key
      Query
      • Only works on a table with a composite (hash and range) key
      • Hash key = ‘xxxxx’ and no range key condition
      • Hash key = ‘xxxxx’ and a range key condition (EQ, GT, LT, GE, LE, BEGINS_WITH, BETWEEN)
      • Count of items (that match a hash key value or hash key + range condition)
      • Top N / Bottom N items (via ScanIndexForward = true/false and Limit N)
      • Paging via Limit N and ExclusiveStartKey = LastEvaluatedKey
      BatchGetItem
      • Returns the attributes for multiple items from multiple tables using their primary keys
      Scan
      • Scans a table from beginning to end and applies filters
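The paging bullet above (Limit N with ExclusiveStartKey = LastEvaluatedKey) can be sketched as a small loop. This is a minimal illustration, not the SDK's own paginator: `paginate` and `fake_query` are hypothetical names, and `fake_query` is an in-memory stand-in that only imitates the shape of a Query response (`Items`, `LastEvaluatedKey`).

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def paginate(query: Callable[..., Page], **params: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item by feeding LastEvaluatedKey back as ExclusiveStartKey."""
    start_key: Optional[Dict[str, Any]] = None
    while True:
        kwargs = dict(params)
        if start_key is not None:
            kwargs["ExclusiveStartKey"] = start_key
        page = query(**kwargs)
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:  # no key returned means this was the last page
            break


# Hypothetical in-memory stand-in for a Query call, used only for this demo.
ROWS = [{"UserId": "bob", "GameId": f"Game{i}"} for i in range(1, 6)]


def fake_query(Limit: int, ExclusiveStartKey: Optional[Dict[str, Any]] = None, **_: Any) -> Page:
    start = 0 if ExclusiveStartKey is None else ROWS.index(ExclusiveStartKey) + 1
    chunk = ROWS[start:start + Limit]
    page: Page = {"Items": chunk}
    if start + Limit < len(ROWS):  # more rows remain, so report where to resume
        page["LastEvaluatedKey"] = chunk[-1]
    return page


all_items = list(paginate(fake_query, Limit=2))
```

With a real client, the same loop would wrap the client's query call in place of `fake_query`; the stopping rule (no `LastEvaluatedKey` in the response) stays the same.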
  20. Write Patterns
      PutItem
      • Add a new item, or replace an existing item with a new item
      • Conditional: insert a new item only if the PK does not exist
      • Can return ALL_OLD
      UpdateItem
      • Add, update or delete an attribute (other than the PK)
      • Increment an attribute (X = X + 10)
      • Atomic increment and get
      • Insert a new item and attributes
      • Conditional: insert a new attribute if it does not exist
      • Can return ALL_OLD, UPDATED_OLD, ALL_NEW or UPDATED_NEW (optional)
      DeleteItem
      • Delete an item
      • Conditional: delete an item if it exists or if it has an expected attribute value
      • Can return ALL_OLD (optional)
      BatchWriteItem
      • Up to 25 put or delete operations (or 1 MB payload) in a single API call
      • Not atomic across multiple items or tables (but individual updates are atomic)
      • No conditional updates or return values
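As a rough sketch of three of the write patterns above (conditional insert, atomic increment and get, and the 25-operation batch limit), the helpers below only build request parameters as plain dictionaries. The helper names are hypothetical, and the parameters use the expression-style form (`ConditionExpression`, `UpdateExpression`) of later API versions rather than whatever form the webcast used, which is an assumption on my part.

```python
from typing import Any, Dict, Iterator, List


def put_if_absent(table: str, item: Dict[str, Any], pk_attr: str) -> Dict[str, Any]:
    """PutItem parameters that only succeed when no item with this key exists."""
    return {
        "TableName": table,
        "Item": item,
        "ConditionExpression": "attribute_not_exists(#pk)",
        "ExpressionAttributeNames": {"#pk": pk_attr},
    }


def atomic_increment(table: str, key: Dict[str, Any], attr: str, amount: int) -> Dict[str, Any]:
    """UpdateItem parameters that add `amount` to a number and return the new value."""
    return {
        "TableName": table,
        "Key": key,
        "UpdateExpression": "ADD #a :inc",
        "ExpressionAttributeNames": {"#a": attr},
        "ExpressionAttributeValues": {":inc": {"N": str(amount)}},
        "ReturnValues": "UPDATED_NEW",
    }


def batches_of(requests: List[Any], size: int = 25) -> Iterator[List[Any]]:
    """Split write requests into groups that respect the 25-operation limit."""
    for i in range(0, len(requests), size):
        yield requests[i:i + size]


put_params = put_if_absent("Users", {"UserId": {"S": "bob"}}, "UserId")
inc_params = atomic_increment(
    "User_Games",
    {"UserId": {"S": "bob"}, "GameId": {"S": "Game1"}},
    "HighScore",
    10,
)
batch_sizes = [len(batch) for batch in batches_of(list(range(60)))]
```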
  21. Agenda
      • DynamoDB Recap
      • Modeling Primitives
      • Modeling Examples
  22. Modeling 1:1 relationships
      Use a table with a hash key
      Examples:
      • Users: Hash key = UserId
      • Games: Hash key = GameId

      Users Table
      Hash key          Attributes
      UserId = bob      Email = bob@gmail.com, JoinDate = 2011-11-15
      UserId = fred     Email = fred@yahoo.com, JoinDate = 2011-12-01, Sex = M

      Games Table
      Hash key          Attributes
      GameId = Game1    LaunchDate = 2011-10-15, Version = 2
      GameId = Game2    LaunchDate = 2010-05-12, Version = 3
      GameId = Game3    LaunchDate = 2012-01-20, Version = 1
  23. Modeling 1:N relationships
      Use a table with a hash and range key
      Example:
      • One (1) User can play many (N) Games
      • User_Games table
        – Hash key = UserId
        – Range key = GameId

      User_Games table
      Hash key         Range key         Attributes
      UserId = bob     GameId = Game1    HighScore = 10500, ScoreDate = 2011-10-20
      UserId = fred    GameId = Game2    HighScore = 12000, ScoreDate = 2012-01-10
      UserId = bob     GameId = Game3    HighScore = 20000, ScoreDate = 2012-02-12
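To make the hash+range idea concrete, here is a toy in-memory model of the User_Games table: one sorted collection of items per hash key. It is only an illustration of the data shape (the class `HashRangeTable` is hypothetical and has nothing to do with how DynamoDB stores data internally).

```python
import bisect
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Tuple[str, Dict[str, Any]]


class HashRangeTable:
    """Toy model: each hash key maps to a list of rows kept sorted by range key."""

    def __init__(self) -> None:
        self._partitions: Dict[str, List[Row]] = defaultdict(list)

    def put(self, hash_key: str, range_key: str, attrs: Dict[str, Any]) -> None:
        rows = self._partitions[hash_key]
        keys = [rk for rk, _ in rows]
        i = bisect.bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, attrs)  # same primary key: replace the item
        else:
            rows.insert(i, (range_key, attrs))  # keep the partition sorted

    def query(self, hash_key: str, forward: bool = True) -> List[Row]:
        """Return all rows for one hash key, ascending or descending by range key."""
        rows = self._partitions.get(hash_key, [])
        return list(rows) if forward else list(reversed(rows))


user_games = HashRangeTable()
user_games.put("bob", "Game3", {"HighScore": 20000, "ScoreDate": "2012-02-12"})
user_games.put("bob", "Game1", {"HighScore": 10500, "ScoreDate": "2011-10-20"})
user_games.put("fred", "Game2", {"HighScore": 12000, "ScoreDate": "2012-01-10"})

bobs_games = [game_id for game_id, _ in user_games.query("bob")]
```

Fetching "all games for bob" touches a single hash key and returns the N related items already ordered, which is the property the 1:N pattern relies on.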
  24. Modeling N:M relationships
      Use two hash and range tables
      Example:
      • One User can play many Games
        – Hash key = UserId
        – Range key = GameId
      • One Game can have many Users
        – Hash key = GameId
        – Range key = UserId

      User_Games
      Hash key          Range key
      UserId = bob      GameId = Game1
      UserId = fred     GameId = Game2
      UserId = bob      GameId = Game3

      Game_Users
      Hash key          Range key
      GameId = Game1    UserId = bob
      GameId = Game2    UserId = fred
      GameId = Game3    UserId = bob
  25. Modeling Multi-tenancy
      • Use the tenant id as the hash key
      • Example: ForumName is the tenant id in the Thread table
  26. Agenda
      • DynamoDB Recap
      • Modeling Primitives
      • Modeling Examples
  27. Example 1: Multi-tenant application for file storing and sharing
      Access Patterns
      1. Users should be able to query all the files they own
      2. Search by File Name
      3. Search by File Type
      4. Search by Date Range
      5. Keep track of Shared Files
      6. Search by descending order of File Size
  28. Entities and Relationships
      Entities:
      • Users
      • Files
      Relationship:
      • One User has many Files (1:N)
  29. DynamoDB Data Model
      Users
      • Hash key = UserId (S)
      • Attributes = User Name (S), Email (S), Address (SS), etc.
      User_Files
      • Hash key = UserId (S); this is also the tenant id
      • Range key = FileId (S)
      • Attributes = Name (S), Type (S), Size (N), Date (S), SharedFlag (S), S3key (S)
  30. Access Pattern 1
      Find all files owned by a user
      • Query (TableName = User_Files, UserId = 2)

      UserId (Hash)  FileId (Range)  Name   Date        Type  SharedFlag  Size  S3key
      1              1               File1  2013-04-23  JPG               1000  bucket1
      1              2               File2  2013-03-10  PDF   Y           100   bucket2
      2              1               File3  2013-03-10  PNG   Y           2000  bucket3
      2              2               File4  2013-03-10  DOC               3000  bucket4
      3              1               File5  2013-04-10  TXT               400   bucket5
  31. Access Pattern 2
      Search by file name
      • Query (TableName = User_Files, IndexName = NameIndex, UserId = 1, Name = File1)

      NameIndex
      UserId  Name   FileId
      1       File1  1
      1       File2  2
      2       File3  1
      2       File4  2
      3       File5  1
  32. Access Pattern 3
      Search for file name by file type
      • Query (TableName = User_Files, IndexName = TypeIndex, UserId = 2, Type = DOC)

      TypeIndex (Name is a projected attribute)
      UserId  Type  FileId  Name
      1       JPG   1       File1
      1       PDF   2       File2
      2       DOC   2       File4
      2       PNG   1       File3
      3       TXT   1       File5
  33. Access Pattern 4
      Search for file name by date range
      • Query (TableName = User_Files, IndexName = DateIndex, UserId = 1, Date between 2013-03-01 and 2013-03-29)

      DateIndex (Name is a projected attribute)
      UserId  Date        FileId  Name
      1       2013-03-10  2       File2
      1       2013-04-23  1       File1
      2       2013-03-10  1       File3
      2       2013-03-10  2       File4
      3       2013-04-10  1       File5
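The date-range query on the slide could be expressed as request parameters along these lines. This is a hedged sketch: the function name is hypothetical, the expression-style parameters come from later API versions rather than from the webcast, and `#d` is a placeholder I introduce because `Date` is a DynamoDB reserved word.

```python
from typing import Any, Dict


def files_in_date_range(user_id: str, start: str, end: str) -> Dict[str, Any]:
    """Query parameters for DateIndex: one user's files dated between start and end."""
    return {
        "TableName": "User_Files",
        "IndexName": "DateIndex",
        "KeyConditionExpression": "UserId = :u AND #d BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#d": "Date"},
        "ExpressionAttributeValues": {
            ":u": {"S": user_id},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }


params = files_in_date_range("1", "2013-03-01", "2013-03-29")

# Dates written as YYYY-MM-DD sort the same way as strings and as dates,
# which is why a string (S) range key can serve a date-range condition.
in_range = "2013-03-01" <= "2013-03-10" <= "2013-03-29"
```

Against the DateIndex rows shown on the slide, this condition would match File2 (2013-03-10) for UserId 1 and skip File1 (2013-04-23).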
  34. Access Pattern 5
      Search for names of shared files
      • Query (TableName = User_Files, IndexName = SharedFlagIndex, UserId = 1, SharedFlag = Y)

      SharedFlagIndex (Name is a projected attribute)
      UserId  SharedFlag  FileId  Name
      1       Y           2       File2
      2       Y           1       File3
  35. Access Pattern 6
      Query for file names by descending order of size
      • Query (TableName = User_Files, IndexName = SizeIndex, UserId = 1, ScanIndexForward = false)

      SizeIndex (Name is a projected attribute; rows shown consistent with the User_Files table in Access Pattern 1)
      UserId  Size  FileId  Name
      1       100   2       File2
      1       1000  1       File1
      2       2000  1       File3
      2       3000  2       File4
      3       400   1       File5
  36. Local Secondary Indexes

      Table Name  Index Name       Attribute to Index  Projected Attribute
      User_Files  NameIndex        Name                KEYS
      User_Files  TypeIndex        Type                KEYS + Name
      User_Files  DateIndex        Date                KEYS + Name
      User_Files  SharedFlagIndex  SharedFlag          KEYS + Name
      User_Files  SizeIndex        Size                KEYS + Name
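The index table above maps fairly directly onto CreateTable parameters. The sketch below builds them as a plain dictionary; the helper `local_index` is hypothetical, the throughput numbers are arbitrary placeholders, and mapping "KEYS" to `KEYS_ONLY` and "KEYS + Name" to an `INCLUDE` projection is my reading of the slide.

```python
from typing import Any, Dict, List, Optional

# (index name, indexed attribute, attribute type, extra projected attributes)
INDEXES = [
    ("NameIndex", "Name", "S", None),
    ("TypeIndex", "Type", "S", ["Name"]),
    ("DateIndex", "Date", "S", ["Name"]),
    ("SharedFlagIndex", "SharedFlag", "S", ["Name"]),
    ("SizeIndex", "Size", "N", ["Name"]),
]


def local_index(name: str, range_attr: str, extra: Optional[List[str]]) -> Dict[str, Any]:
    """One local secondary index: same hash key as the table, alternate range key."""
    if extra:
        projection: Dict[str, Any] = {"ProjectionType": "INCLUDE", "NonKeyAttributes": extra}
    else:
        projection = {"ProjectionType": "KEYS_ONLY"}
    return {
        "IndexName": name,
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},
            {"AttributeName": range_attr, "KeyType": "RANGE"},
        ],
        "Projection": projection,
    }


create_table_params: Dict[str, Any] = {
    "TableName": "User_Files",
    # Every attribute used in a key schema (table or index) needs a definition.
    "AttributeDefinitions": [
        {"AttributeName": "UserId", "AttributeType": "S"},
        {"AttributeName": "FileId", "AttributeType": "S"},
    ] + [{"AttributeName": attr, "AttributeType": typ} for _, attr, typ, _ in INDEXES],
    "KeySchema": [
        {"AttributeName": "UserId", "KeyType": "HASH"},
        {"AttributeName": "FileId", "KeyType": "RANGE"},
    ],
    "LocalSecondaryIndexes": [local_index(n, a, e) for n, a, _, e in INDEXES],
    # Placeholder capacity values, not a recommendation.
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}
```

Local secondary indexes are defined when the table is created, so all five access patterns have to be known up front, which fits the access-pattern-first method of the talk.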
  37. Example 2: Modeling large items
      Two options: store large attributes in Amazon S3, or break large attributes across multiple DynamoDB items

      Store large attributes in Amazon S3
      MESSAGE-ID (hash key)  Attributes
      1                      FROM = ‘user1’, TO = ‘user2’, DATE = ‘12/12/2011’, SUBJECT = ‘DynamoDB Best practices’, BODY = ‘The first few Kbytes…..’, BODY_OVERFLOW = ‘S3bucket+key’

      Break large attributes across multiple DynamoDB items
      MESSAGE-ID (hash key)  PART (range key)  Attributes
      1                      0                 FROM = ‘user1’, TO = ‘user2’, DATE = ‘12/12/2011’, SUBJECT = ‘DynamoDB Best practices’, BODY = ‘The first few Kbytes…..’
      1                      1                 BODY = ‘the next 64k’
      1                      2                 BODY = ‘the next 64k’
      1                      3                 EOM
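The "break large attributes across multiple items" layout can be sketched as a split/join pair. This is only an illustration under stated assumptions: the function names are hypothetical, the chunk size counts characters rather than encoded bytes (real item size limits are in bytes), and the end-of-message marker is modeled as a boolean `EOM` attribute.

```python
from typing import Any, Dict, List


def split_message(message_id: str, headers: Dict[str, str], body: str,
                  chunk_size: int = 64 * 1024) -> List[Dict[str, Any]]:
    """Split one message into PART-numbered items plus a final EOM marker item."""
    chunks = [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)] or [""]
    items: List[Dict[str, Any]] = []
    for part, chunk in enumerate(chunks):
        item: Dict[str, Any] = {"MESSAGE-ID": message_id, "PART": part, "BODY": chunk}
        if part == 0:
            item.update(headers)  # FROM, TO, DATE, SUBJECT live on the first part only
        items.append(item)
    items.append({"MESSAGE-ID": message_id, "PART": len(chunks), "EOM": True})
    return items


def join_message(items: List[Dict[str, Any]]) -> str:
    """Reassemble the body from items in PART order, ignoring the EOM marker."""
    ordered = sorted(items, key=lambda it: it["PART"])
    return "".join(it["BODY"] for it in ordered if "BODY" in it)


body = "x" * 150_000
items = split_message("1", {"FROM": "user1", "TO": "user2"}, body)
restored = join_message(items)
```

Because PART is the range key, a single Query on MESSAGE-ID returns the parts already in order, so the sort in `join_message` is just a safeguard.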
  38. Example 2: Modeling large items
      • Use an overflow table for large attributes
      • Retrieve items via batch get

      MailBox table with the large attribute stored inline:
      ID (hash key), Timestamp (range key), Attribute1, Attribute2, Attribute3, …, AttributeN, LargeAttribute

      MailBox table with an overflow table:
      MailBox table: ID (hash key), Timestamp (range key), Attribute1, Attribute2, Attribute3, …, AttributeN, LargeAttributeUUID
      Overflow table: LargeAttributeUUID, LargeAttribute
  39. Example 3: Modeling Time Series Data
      • Your application wants to keep one year of historic data
      • You can pre-create one table per week (or per day or per month) and insert records into the appropriate table based on timestamp

      Tables (each with Event_id as hash key, Timestamp as range key, and Attribute1 … AttributeN):
      • Events_table_2012
      • Events_table_2012_05_week1
      • Events_table_2012_05_week2
      • Events_table_2012_05_week3
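Routing a record to "the appropriate table based on timestamp" can be as small as a naming function. A minimal sketch, assuming the weekly naming seen on the slide (`Events_table_2012_05_week1`) and assuming that "week" means the week of the month, with days 1 to 7 as week 1; the function name is hypothetical.

```python
from datetime import datetime


def table_for(ts: datetime, prefix: str = "Events_table") -> str:
    """Name of the weekly table that should hold an event with this timestamp."""
    week_of_month = (ts.day - 1) // 7 + 1  # days 1-7 -> 1, 8-14 -> 2, ...
    return f"{prefix}_{ts.year:04d}_{ts.month:02d}_week{week_of_month}"


first = table_for(datetime(2012, 5, 3, 9, 30))
second = table_for(datetime(2012, 5, 10))
last = table_for(datetime(2012, 5, 31))
```

Under this scheme, days 29 to 31 fall into a short fifth week. One reason to split by period is that expired data can then be dropped a whole table at a time.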
  40. Example 4: Modeling Global Secondary Indexes
      Create global secondary indexes
      • Example: First_name_index & Last_name_index
      Query: Get me all the Users data for First_name = ‘Tim’
      • Query First_name_index for hash key = ‘Tim’
      • This will return User_id = (101, 201)
      • BatchGet (Users, [101, 201])

      Users
      User_id (hash key)  First_name  Last_name  …
      101                 Tim         White
      201                 Tim         Black
      301                 Ted         White
      401                 Keith       Brown
      501                 Keith       White
      601                 Keith       Black

      First_name_index
      First_name (hash key)  User_id (range key)
      Tim                    101
      Tim                    201
      Ted                    301
      Keith                  401
      Keith                  501
      Keith                  601

      Last_name_index
      Last_name (hash key)  User_id (range key)
      White                 101
      Black                 201
      White                 301
      Brown                 401
      White                 501
      Black                 601
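The two-step lookup on this slide (query the index table, then batch-get the users) can be modeled in memory with dictionaries. This is a toy sketch of the data flow only: `build_index` and `find_users` are hypothetical names, and the dictionaries stand in for the Users table and its index tables.

```python
from typing import Any, Dict, List

USERS: Dict[int, Dict[str, Any]] = {
    101: {"First_name": "Tim", "Last_name": "White"},
    201: {"First_name": "Tim", "Last_name": "Black"},
    301: {"First_name": "Ted", "Last_name": "White"},
    401: {"First_name": "Keith", "Last_name": "Brown"},
    501: {"First_name": "Keith", "Last_name": "White"},
    601: {"First_name": "Keith", "Last_name": "Black"},
}


def build_index(users: Dict[int, Dict[str, Any]], attr: str) -> Dict[str, List[int]]:
    """Index table: attribute value (hash key) -> sorted User_ids (range key)."""
    index: Dict[str, List[int]] = {}
    for user_id, attrs in users.items():
        index.setdefault(attrs[attr], []).append(user_id)
    for ids in index.values():
        ids.sort()
    return index


def find_users(index: Dict[str, List[int]], users: Dict[int, Dict[str, Any]],
               value: str) -> List[Dict[str, Any]]:
    """Step 1: look up ids in the index. Step 2: fetch those users by primary key."""
    user_ids = index.get(value, [])
    return [users[user_id] for user_id in user_ids]


first_name_index = build_index(USERS, "First_name")
last_name_index = build_index(USERS, "Last_name")
tims = find_users(first_name_index, USERS, "Tim")
```

The cost of this design is that every write to Users must also update each index table, so the application has to keep them in step.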
  41. Example 5: Model for Avoiding Hot Keys
      • Use multiple keys (aliases) instead of a single hot key
      • Generate aliases by prefixing or suffixing a known range (N)
      • Use the BatchGetItem API to retrieve ticket counts for all the aliases (1_Avatar, 2_Avatar, 3_Avatar, …, N_Avatar) and sum them in your client application

      MOVIES
      MNAME (hash key)  Attributes
      1_Avatar          TicketCount = 4,000,000
      2_Avatar          TicketCount = 2,000,000
      3_Avatar          TicketCount = 4,000,000
      …
      N_Avatar          TicketCount = 4,000,000
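The alias scheme can be sketched as a sharded counter: each write picks one of N aliases at random, and a read sums over all of them. The helper names and the choice of N = 10 are assumptions for illustration, and a plain dictionary stands in for the MOVIES table.

```python
import random
from typing import Dict, List

N_ALIASES = 10  # size of the known alias range; a value chosen for this sketch


def alias_for_write(name: str, n: int = N_ALIASES) -> str:
    """Pick one alias at random so writes spread across n hash keys."""
    return f"{random.randint(1, n)}_{name}"


def all_aliases(name: str, n: int = N_ALIASES) -> List[str]:
    """Every alias of a key, i.e. the keys a batch read would request."""
    return [f"{i}_{name}" for i in range(1, n + 1)]


# In-memory stand-in for the MOVIES table: alias -> TicketCount.
ticket_counts: Dict[str, int] = {}

for _ in range(1000):  # simulate 1,000 ticket sales for one popular movie
    key = alias_for_write("Avatar")
    ticket_counts[key] = ticket_counts.get(key, 0) + 1

# Client-side aggregation: sum the counts of all aliases.
total = sum(ticket_counts.get(alias, 0) for alias in all_aliases("Avatar"))
```

The trade-off is a cheaper, better-distributed write path in exchange for a read that has to fetch and add up N items.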
  42. Tips for Minimizing Storage & Throughput Costs
      Keep item size as small as possible
      • Consider compressing attribute values and storing them as binary
      • Keep attribute names succinct
      Use the right storage service
      • Example: keep blobs in S3 and metadata in DynamoDB
      Use an overflow table for large items and do batch gets
      Use one table per time period for time series data
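The first tip, compressing attribute values and storing them as binary, might look like the sketch below. The choice of zlib and the function names are assumptions; the talk does not name a compression scheme.

```python
import zlib


def pack(text: str) -> bytes:
    """Compress a string so it can be stored in a Binary (B) attribute."""
    return zlib.compress(text.encode("utf-8"))


def unpack(blob: bytes) -> str:
    """Reverse of pack: decompress and decode back to the original string."""
    return zlib.decompress(blob).decode("utf-8")


body = "DynamoDB best practices. " * 200
packed = pack(body)
saved_bytes = len(body.encode("utf-8")) - len(packed)
```

Compression pays off for long, repetitive values such as message bodies. Very short values can come out larger than they went in, so it is worth measuring on representative data first.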
  43. Summary
      • Access Pattern Modeling enables applications to scale with minimal overhead and cost
      • Use Case → Access Patterns → Data Design
      • DynamoDB enables 1:M relationships within tables via support for hash+range primary keys
      • Local Secondary Indexes provide complex query support without performance degradation
  44. Questions?
      http://aws.amazon.com/resources/databaseservices/webinars
