Amazon DynamoDB Design Patterns & Best Practices

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1eC7nXb.

Siva Raghupathy discusses DynamoDB Design Patterns & Best Practices for realizing DynamoDB benefits at the right cost. Filmed at qconnewyork.com.

Siva Raghupathy is a Principal Solutions Architect at Amazon Web Services. He helps customers (including Amazon.com) build successful solutions on AWS. Previously, as a Principal Technical Program Manager for AWS Database Services, he gathered emerging NoSQL requirements and wrote the first version of the DynamoDB product specification.

Presentation Transcript

  • © 2011 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc. Amazon DynamoDB Design Patterns & Best Practices
  • Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/amazon-dynamodb-patterns-practices
  • Presented at QCon New York (www.qconnewyork.com)
  • Internet-scale Database Requirements: • Unlimited throughput, e.g. for social applications and online gaming • Elasticity and flexibility: an application could go viral at any time, so the database must handle sudden traffic without code changes • Predictable performance: low latency, with no latency increase or throughput decrease as data set size or throughput grows • No administration
  • What is Amazon DynamoDB? A fully managed NoSQL database service, accessible via simple web service APIs. (Diagram: a client accessing a Movies table with attributes Id, Title, Year: 1, Terminator, 1984; 2, Titanic, 1997.)
  • What can I do with DynamoDB? Offload the operation and scaling of a highly available distributed database cluster to AWS, so you can: • Serve any level of request traffic • Store and retrieve any amount of data • Pay a low price for what you use • Get to market faster
  • DEMO: DynamoDB Speed Test! (www.DynaSpeed.net)
  • DynamoDB Demo Architecture: a cluster controller plus 100 c1.medium instances (200 virtual CPUs) in one Region, spread across three Availability Zones.
  • Demo Architecture (continued): DynamoDB, a master node / cluster controller, and worker nodes.
  • DATA MODEL, DATA TYPES & API
  • Tables, Items, Attributes: A table is a collection of items; an item is a collection of attributes (name-value pairs); a primary key is required. Example UserProfiles table, with userid as the hash key:
      item1: userid=bob, email=bob@gmail.com, joindate=20121221, Sex=M
      item2: userid=ken, email=ken@yahoo.com, joindate=20130210
    Items in the same table need not share the same attributes: ken's item has no Sex attribute.
  • Data Types: • Scalar types: String, Number, Binary • Multi-valued types: String Set, Number Set, Binary Set
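On the wire, every attribute value is tagged with its type. Here is a minimal Python sketch of that encoding. It assumes the low-level JSON format, with the type descriptors S, N, B, SS, NS and BS, numbers sent as strings, and binary sent as base64. The helper names `to_attribute_value` and `to_item` are hypothetical, and SDKs such as boto3 ship their own serializers.

```python
import base64
from decimal import Decimal
from typing import Any, Dict

def _is_number(v: Any) -> bool:
    # bool is a subclass of int in Python; exclude it explicitly.
    return isinstance(v, (int, float, Decimal)) and not isinstance(v, bool)

def to_attribute_value(value: Any) -> Dict[str, Any]:
    """Encode one Python value as a typed DynamoDB attribute value."""
    if isinstance(value, str):
        return {"S": value}
    if _is_number(value):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode("ascii")}
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("DynamoDB sets cannot be empty")
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(_is_number(v) for v in value):
            return {"NS": sorted(str(v) for v in value)}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(base64.b64encode(v).decode("ascii") for v in value)}
        raise TypeError("set elements must all share one scalar type")
    raise TypeError(f"unsupported type: {type(value).__name__}")

def to_item(attrs: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Encode a whole item (attribute name -> Python value)."""
    return {name: to_attribute_value(v) for name, v in attrs.items()}
```

Set members are sorted here only to make the output deterministic. DynamoDB sets themselves are unordered.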
  • Indexing: Data is indexed by primary key. Types of primary key: • Hash • Hash + Range
  • Local Secondary Index: Provides an alternate range key. The index is local to the hash key: all index data lives in the same partition as the table items it indexes. Example FileMetadata table (hash key user, range key file) and its DateIndex (hash key user, range key date):
    FileMetadata table:
      user=bob file=file1.txt date=2013/01/10 size=200 url=s3://bucket/bob/file1.txt
      user=bob file=folder1/file1 date=2013/12/21 size=100 url=s3://bucket/bob/folder1/file1
      user=bob file=folder1/file2 date=2013/01/10 size=100 url=s3://bucket/bob/folder1/file1
      user=ken file=folder1/file1 date=2013/02/10 size=300 shared=Y url=s3://bucket/ken/folder2/file1
      user=ken file=file2.jpg date=2013/02/10 size=300 shared=Y url=s3://bucket/ken/file2.jpg
    DateIndex:
      user=bob date=2013/01/10 file=folder1/file2
      user=bob date=2013/01/10 file=file1.txt
      user=bob date=2013/12/21 file=folder1/file1
      user=ken date=2013/02/10 file=folder1/file1
      user=ken date=2013/02/10 file=file2.jpg
  • Partitioning: Data is automatically partitioned by hash key. Auto-partitioning is driven by: • Table size • Provisioned throughput. (Diagram: a client sending requests to a DynamoDB table split into Partition 1 … Partition N.)
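DynamoDB's actual hash function and partition map are internal and not exposed. The sketch below uses MD5 modulo a fixed partition count purely to illustrate the key idea: every item that shares a hash key lands in the same partition. Real DynamoDB instead splits partitions as size and throughput grow. `partition_for` and `partition_items` are hypothetical names.

```python
import hashlib
from typing import Dict, List

def partition_for(hash_key: str, num_partitions: int) -> int:
    """Map a hash key to a partition index (illustrative, not DynamoDB's real scheme)."""
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an integer and bucket it.
    return int.from_bytes(digest[:8], "big") % num_partitions

def partition_items(items: List[Dict], hash_attr: str, num_partitions: int) -> List[List[Dict]]:
    """Group items by the partition their hash key maps to."""
    partitions: List[List[Dict]] = [[] for _ in range(num_partitions)]
    for item in items:
        partitions[partition_for(item[hash_attr], num_partitions)].append(item)
    return partitions

files = [
    {"user": "bob", "file": "file1.txt"},
    {"user": "bob", "file": "folder1/file1"},
    {"user": "bob", "file": "folder1/file2"},
    {"user": "ken", "file": "folder1/file1"},
    {"user": "ken", "file": "file2.jpg"},
]
partitions = partition_items(files, "user", 4)
```

The same property explains why a single very popular hash key concentrates all of its traffic on one partition.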
  • Provisioned Throughput Model: Throughput is declared and updated via the API or the console • CreateTable (foo, reads/sec = 100, writes/sec = 100) • UpdateTable (foo, reads/sec = 10000, writes/sec = 10000). DynamoDB handles the rest • Capacity is reserved and available when needed • Throughput increases trigger repartitioning and reallocation. The result: high performance at any scale.
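The helpers below sketch what those two calls carry. They build request parameters in the shape of the current low-level CreateTable and UpdateTable APIs, where reads/sec and writes/sec are expressed as ReadCapacityUnits and WriteCapacityUnits. The hash key name `id` and its String type are assumptions, because the slide does not give foo's key schema. The dictionaries could be passed to boto3's `client.create_table(**request)` or `client.update_table(**request)`. The sketch itself makes no network calls.

```python
from typing import Any, Dict, Optional

def create_table_request(name: str, hash_key: str, reads: int, writes: int,
                         range_key: Optional[str] = None) -> Dict[str, Any]:
    """CreateTable parameters for a hash (or hash + range) table of String keys."""
    attribute_definitions = [{"AttributeName": hash_key, "AttributeType": "S"}]
    key_schema = [{"AttributeName": hash_key, "KeyType": "HASH"}]
    if range_key is not None:
        attribute_definitions.append({"AttributeName": range_key, "AttributeType": "S"})
        key_schema.append({"AttributeName": range_key, "KeyType": "RANGE"})
    return {
        "TableName": name,
        "AttributeDefinitions": attribute_definitions,
        "KeySchema": key_schema,
        "ProvisionedThroughput": {"ReadCapacityUnits": reads, "WriteCapacityUnits": writes},
    }

def update_throughput_request(name: str, reads: int, writes: int) -> Dict[str, Any]:
    """UpdateTable parameters that change only the provisioned throughput."""
    return {
        "TableName": name,
        "ProvisionedThroughput": {"ReadCapacityUnits": reads, "WriteCapacityUnits": writes},
    }

create_foo = create_table_request("foo", "id", 100, 100)   # hypothetical key "id"
scale_foo = update_throughput_request("foo", 10000, 10000)
```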
  • API: • Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables • Read and write items: PutItem, GetItem, UpdateItem, DeleteItem • Bulk get or update: BatchGetItem, BatchWriteItem • Query specific items or scan the full table: Query, Scan
  • Read Patterns: • GetItem (table, key) -> Item • Query (table, hash_key, [range_key_condition]) -> Items • BatchGetItem (table1:key1, …, tableN:keyN) -> Items • Scan (table) -> Items
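The toy in-memory model below makes the difference between the read patterns concrete. It is not DynamoDB and not its API. GetItem looks up one item by its full key. Query reads only the items under one hash key, in range-key order, optionally filtered by a range-key condition. Scan touches every item. `ToyTable` is a hypothetical class, and the sample rows come from the User_Games example later in the deck.

```python
from typing import Any, Callable, Dict, List, Optional

class ToyTable:
    """In-memory stand-in for a hash+range table (illustration only, not DynamoDB)."""

    def __init__(self, hash_attr: str, range_attr: str) -> None:
        self.hash_attr = hash_attr
        self.range_attr = range_attr
        self._data: Dict[Any, Dict[Any, dict]] = {}

    def put_item(self, item: dict) -> None:
        bucket = self._data.setdefault(item[self.hash_attr], {})
        bucket[item[self.range_attr]] = dict(item)  # same full key -> overwrite

    def get_item(self, hash_key: Any, range_key: Any) -> Optional[dict]:
        # GetItem: exact lookup by the full primary key.
        return self._data.get(hash_key, {}).get(range_key)

    def query(self, hash_key: Any,
              range_condition: Optional[Callable[[Any], bool]] = None,
              forward: bool = True) -> List[dict]:
        # Query: only items under one hash key, ordered by range key.
        bucket = self._data.get(hash_key, {})
        ordered = sorted(bucket, reverse=not forward)
        return [bucket[k] for k in ordered
                if range_condition is None or range_condition(k)]

    def scan(self) -> List[dict]:
        # Scan: every item in the table is read.
        return [item for bucket in self._data.values() for item in bucket.values()]


games = ToyTable("UserId", "GameId")
for row in [
    {"UserId": "bob", "GameId": "Game1", "HighScore": 10500},
    {"UserId": "fred", "GameId": "Game2", "HighScore": 12000},
    {"UserId": "bob", "GameId": "Game3", "HighScore": 20000},
]:
    games.put_item(row)
```

Note that Query's cost depends only on the items under one hash key, whereas Scan grows with the whole table.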
  • Write Patterns: • PutItem (table, key, [attributes]) • UpdateItem (table, key, [attributes]) • DeleteItem (table, key) • BatchWriteItem (table1:key1[:attributes], …, tableN:keyN[:attributes])
  • DYNAMODB CHARACTERISTICS
  • High Availability and Durability: Multi-datacenter (Availability Zone) replication and failover • If one machine or datacenter fails, another serves your requests • High availability • Protection against data loss
  • What DynamoDB Manages for You: • Hardware provisioning • Cross-Availability Zone replication • Monitoring and handling of hardware failures; replicas are automatically regenerated whenever necessary • Hardware and software updates
  • DynamoDB Scale-out: Data is automatically partitioned, and partitions are fully independent. There are no limits as long as the workload is spread well across hash keys.
  • Consistently Low Latencies: Average Put and Get latencies are typically in the single-digit milliseconds. A custom SSD-based storage platform means: • Performance is independent of table size • The working set does not need to fit in memory
  • Authentication & Wire Format: Session-based authentication • The client establishes a session via the AWS Security Token Service (STS) and retrieves a token • The client signs requests with the session token, which is valid for a few hours • This streamlines authentication to minimize latency. Request and response parameters are encoded in JSON • A widely adopted industry standard • Relatively compact and efficient to parse
  • Consistency: Reads can be strongly or eventually consistent • The choice is specified per API call for maximum flexibility • The tradeoff is throughput, not latency: an eventually consistent read consumes half the read capacity of a strongly consistent one. Writes are strongly consistent • Atomic increment/decrement and get • Conditional writes, a.k.a. optimistic concurrency control. GetItem and Query support both eventually consistent and strongly consistent reads; Scan and BatchGetItem support only eventually consistent reads.
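Conditional writes are a common way to implement optimistic concurrency. The client reads an item, remembers its version, and writes back only if the version has not changed. The sketch builds an UpdateItem request using today's expression syntax (UpdateExpression, ConditionExpression). The 2013-era API expressed the same idea with the Expected parameter. The `version` attribute and the helper name are hypothetical. If another writer got there first, DynamoDB rejects the call with ConditionalCheckFailedException, and the client re-reads and retries.

```python
from typing import Any, Dict

def conditional_update_request(table: str, key: Dict[str, Any],
                               new_attrs: Dict[str, Dict[str, str]],
                               expected_version: int) -> Dict[str, Any]:
    """UpdateItem request that applies new_attrs only if the stored version
    equals expected_version, bumping the version in the same write."""
    names = {"#v": "version"}  # hypothetical version attribute
    values: Dict[str, Any] = {":expected": {"N": str(expected_version)}, ":one": {"N": "1"}}
    set_clauses = ["#v = #v + :one"]
    # Placeholders avoid clashes between attribute names and reserved words.
    for i, (attr, typed_value) in enumerate(sorted(new_attrs.items())):
        names[f"#a{i}"] = attr
        values[f":a{i}"] = typed_value
        set_clauses.append(f"#a{i} = :a{i}")
    return {
        "TableName": table,
        "Key": key,
        "UpdateExpression": "SET " + ", ".join(set_clauses),
        "ConditionExpression": "#v = :expected",
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }
```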
  • Transactions: Item-level transactions are supported • UpdateItem, PutItem and DeleteItem operate at the item level, and their changes are ACID • UpdateItem supports atomic ADD and get. There are no multi-item or cross-table transactions • Although BatchWriteItem operates on multiple items across tables, each individual write is atomic only at the item level.
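UpdateItem's atomic ADD can back a counter without a read-modify-write cycle. The sketch below shows the request in current expression syntax. ReturnValues is set so that the new value comes back in the same call, which covers the "ADD and get" case. The table, key, and attribute names in the usage are hypothetical. A negative amount decrements the counter.

```python
from typing import Any, Dict

def atomic_add_request(table: str, key: Dict[str, Any],
                       counter_attr: str, amount: int) -> Dict[str, Any]:
    """UpdateItem request that atomically adds amount to counter_attr."""
    return {
        "TableName": table,
        "Key": key,
        # If the attribute does not exist yet, ADD treats its starting value as 0.
        "UpdateExpression": "ADD #c :inc",
        "ExpressionAttributeNames": {"#c": counter_attr},
        "ExpressionAttributeValues": {":inc": {"N": str(amount)}},
        # Return the post-update value: increment and get in one call.
        "ReturnValues": "UPDATED_NEW",
    }

sell_ticket = atomic_add_request("Movies", {"MNAME": {"S": "Avatar"}}, "TicketCount", 1)
refund_ticket = atomic_add_request("Movies", {"MNAME": {"S": "Avatar"}}, "TicketCount", -1)
```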
  • MODELING RELATIONSHIPS IN DYNAMODB
  • Modeling 1:1 relationships: Use a table with a hash key. Examples: • Users: hash key = UserId • Games: hash key = GameId
    Users table (hash key: attributes):
      UserId = bob: Email = bob@gmail.com, JoinDate = 2011-11-15
      UserId = fred: Email = fred@yahoo.com, JoinDate = 2011-12-01, Sex = M
    Games table (hash key: attributes):
      GameId = Game1: LaunchDate = 2011-10-15, Version = 2
      GameId = Game2: LaunchDate = 2010-05-12, Version = 3
      GameId = Game3: LaunchDate = 2012-01-20, Version = 1
  • Modeling 1:N relationships: Use a table with a hash and range key. Example: one (1) user can play many (N) games • User_Games table: hash key = UserId, range key = GameId
    User_Games table (hash key, range key: attributes):
      UserId = bob, GameId = Game1: HighScore = 10500, ScoreDate = 2011-10-20
      UserId = fred, GameId = Game2: HighScore = 12000, ScoreDate = 2012-01-10
      UserId = bob, GameId = Game3: HighScore = 20000, ScoreDate = 2012-02-12
  • Modeling N:M relationships: Use two hash-and-range tables. Example: • One user can play many games: User_Games, hash key = UserId, range key = GameId • One game can have many users: Game_Users, hash key = GameId, range key = UserId
    User_Games (hash key, range key):
      UserId = bob, GameId = Game1
      UserId = fred, GameId = Game2
      UserId = bob, GameId = Game3
    Game_Users (hash key, range key):
      GameId = Game1, UserId = bob
      GameId = Game2, UserId = fred
      GameId = Game3, UserId = bob
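With two tables, every new user-game link has to be written twice. The sketch below builds a single BatchWriteItem request, in the low-level request shape, that puts both sides of the pair. `link_request` is a hypothetical helper. As the Transactions slide notes, BatchWriteItem is not atomic across items. A client should therefore resubmit anything returned in UnprocessedItems and tolerate a brief window in which only one side exists.

```python
from typing import Any, Dict

def link_request(user_id: str, game_id: str) -> Dict[str, Any]:
    """BatchWriteItem request that records a user-game link in both directions."""
    user = {"S": user_id}
    game = {"S": game_id}
    return {
        "RequestItems": {
            # One user -> many games
            "User_Games": [{"PutRequest": {"Item": {"UserId": user, "GameId": game}}}],
            # One game -> many users
            "Game_Users": [{"PutRequest": {"Item": {"GameId": game, "UserId": user}}}],
        }
    }
```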
  • Modeling Multi-tenancy: Use the tenant ID as the hash key • Example: UserId is the hash key of both the User Profiles and User Scores tables.
  • MODELING EXAMPLE
  • Example 1: A multi-tenant application for storing file metadata. Access patterns:
      1. Get bob's profile
      2. List files owned by bob
      3. List bob's files created between T1 and T2
      4. List bob's shared files
      5. List bob's files in descending order of file size
  • Entities and Relationships: Entities: • Users • Files. Relationship: • One user has many files (1:N)
  • DynamoDB Data Model: the hash key is the tenant ID (user), and Files adds a range key (file).
    Users (hash):
      user=bob email=bob@gmail.com joindate=‘2012/12/21’
      user=ken email=ken@yahoo.com joindate=‘2013/02/10’
    Files (hash-range):
      user=bob file=file1.txt date=2013/01/10 size=200 url=s3://bucket/bob/file1.txt
      user=bob file=folder1/file1 date=2013/12/21 size=100 url=s3://bucket/bob/folder1/file1
      user=bob file=folder1/file2 date=2013/01/10 size=100 url=s3://bucket/bob/folder1/file1
      user=ken file=folder1/file1 date=2013/02/10 size=300 shared=Y url=s3://bucket/ken/folder2/file1
      user=ken file=file2.jpg date=2013/02/10 size=300 shared=Y url=s3://bucket/ken/file2.jpg
  • Primary Index Get & Query: • Get bob's profile: GetItem (table = Users, user = 'bob') • List files owned by bob: Query (table = Files, user = 'bob')
    Matching items:
      user=bob email=bob@gmail.com joindate=‘2012/12/21’
      user=bob file=file1.txt date=2013/01/10 size=200 url=s3://bucket/bob/file1.txt
      user=bob file=folder1/file1 date=2013/12/21 size=100 url=s3://bucket/bob/folder1/file1
      user=bob file=folder1/file2 date=2013/01/10 size=100 url=s3://bucket/bob/folder1/file1
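Here are the two calls above, written as low-level request parameters in current expression syntax. This is a sketch, because the slide's pseudo-calls predate KeyConditionExpression. `#u` is an attribute-name placeholder, used because `user` appears on DynamoDB's reserved-word list. The helper names are hypothetical.

```python
from typing import Any, Dict

def get_profile_request(user: str) -> Dict[str, Any]:
    """GetItem on Users: the full primary key is just the hash key."""
    return {"TableName": "Users", "Key": {"user": {"S": user}}}

def list_files_request(user: str) -> Dict[str, Any]:
    """Query on Files: all items sharing one hash key, in range-key (file) order."""
    return {
        "TableName": "Files",
        "KeyConditionExpression": "#u = :u",
        "ExpressionAttributeNames": {"#u": "user"},  # placeholder for a reserved word
        "ExpressionAttributeValues": {":u": {"S": user}},
    }
```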
  • Local Secondary Index Query: List bob's files & folders created between T1 and T2 • Query (table = Files, user = bob, IndexName = DateIndex, date BETWEEN 2013/01/10 AND 2013/01/20)
    DateIndex:
      user=bob date=2013/01/10 file=folder1/file2
      user=bob date=2013/01/10 file=file1.txt
      user=bob date=2013/12/21 file=folder1/file1
      user=ken date=2013/02/10 file=folder1/file1
      user=ken date=2013/02/10 file=file2.jpg
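A sketch of the DateIndex query as request parameters, again in current expression syntax with hypothetical helper names. BETWEEN is inclusive on both ends. Comparing dates stored as strings works here only because the yyyy/mm/dd format is zero-padded, so string order matches chronological order. For the same reserved-word reason, `date` gets a placeholder alongside `user`.

```python
from typing import Any, Dict

def files_created_between(user: str, start: str, end: str) -> Dict[str, Any]:
    """Query DateIndex for one user's files whose date lies in [start, end]."""
    return {
        "TableName": "Files",
        "IndexName": "DateIndex",
        # Hash-key equality plus an inclusive range condition on the index range key.
        "KeyConditionExpression": "#u = :u AND #d BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#u": "user", "#d": "date"},
        "ExpressionAttributeValues": {
            ":u": {"S": user},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }
```

Against the DateIndex rows above, this would match bob's two entries dated 2013/01/10 and skip the one dated 2013/12/21.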
  • Local Secondary (Sparse) Index Query: List bob's shared files & folders • Query (table = Files, user = bob, IndexName = SharedIndex, shared = Y) • No matches found: only items that have a shared attribute appear in SharedIndex, and none of bob's files do.
    SharedIndex:
      user=ken shared=Y file=folder1/file1
      user=ken shared=Y file=file2.jpg
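A toy model shows why the index is sparse. It uses a hypothetical helper and is not DynamoDB code. An item is copied into the index only if it has the index's range-key attribute. As a result, SharedIndex holds just ken's two items, and bob has no entries in it at all.

```python
from typing import Dict, List

FILES = [
    {"user": "bob", "file": "file1.txt"},
    {"user": "bob", "file": "folder1/file1"},
    {"user": "bob", "file": "folder1/file2"},
    {"user": "ken", "file": "folder1/file1", "shared": "Y"},
    {"user": "ken", "file": "file2.jpg", "shared": "Y"},
]

def build_sparse_index(items: List[Dict], hash_attr: str, range_attr: str) -> Dict[str, List[Dict]]:
    """Project only the items that carry range_attr into the index."""
    index: Dict[str, List[Dict]] = {}
    for item in items:
        if range_attr not in item:
            continue  # missing attribute -> item is absent from the index
        entry = {hash_attr: item[hash_attr], range_attr: item[range_attr], "file": item["file"]}
        index.setdefault(item[hash_attr], []).append(entry)
    return index

shared_index = build_sparse_index(FILES, "user", "shared")
```

Because the index holds only the flagged items, querying it reads far fewer items than filtering the whole table would.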
  • Local Secondary Index Query (backwards): List bob's files & folders in descending order of size • Query (table = Files, user = bob, IndexName = SizeIndex, ScanIndexForward = false)
    SizeIndex:
      user=bob size=100 file=folder1/file2
      user=bob size=100 file=folder1/file1
      user=bob size=200 file=file1.txt
      user=ken size=0 file=file2.jpg
      user=ken size=300 file=folder1/file1
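A sketch of the backwards query as request parameters, using a hypothetical helper. It assumes size is stored as a Number, so the index orders it numerically rather than as text. Setting ScanIndexForward to False walks the index from the largest size down. Adding Limit then returns just the N largest files.

```python
from typing import Any, Dict, Optional

def files_by_size_desc(user: str, limit: Optional[int] = None) -> Dict[str, Any]:
    """Query SizeIndex for one user's files, largest first."""
    request: Dict[str, Any] = {
        "TableName": "Files",
        "IndexName": "SizeIndex",
        "KeyConditionExpression": "#u = :u",
        "ExpressionAttributeNames": {"#u": "user"},
        "ExpressionAttributeValues": {":u": {"S": user}},
        "ScanIndexForward": False,  # descending range-key (size) order
    }
    if limit is not None:
        request["Limit"] = limit  # e.g. limit=1 -> only the largest file
    return request
```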
  • BEST PRACTICES
  • Storing Large Items, Pattern 1: Either break large attributes across multiple DynamoDB items, or store large attributes in Amazon S3 (at the time of this talk the maximum item size was 64 KB).
    Overflow to S3 (MESSAGE-ID is the hash key):
      MESSAGE-ID=1: FROM='user1', TO='user2', DATE='12/12/2011', SUBJECT='DynamoDB Best practices', BODY='The first few Kbytes…', BODY_OVERFLOW='S3bucket+key'
    Split across items (MESSAGE-ID is the hash key, PART the range key):
      MESSAGE-ID=1, PART=0: FROM='user1', TO='user2', DATE='12/12/2011', SUBJECT='DynamoDB Best practices', BODY='The first few Kbytes…'
      MESSAGE-ID=1, PART=1: BODY='the next 64k'
      MESSAGE-ID=1, PART=2: BODY='the next 64k'
      MESSAGE-ID=1, PART=3: EOM
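Here is a minimal sketch of the split-across-items variant, using in-memory dicts rather than DynamoDB calls. The body is cut into fixed-size chunks stored under PART 0, 1, 2, …, and a final item marks the end of the message. Chunk size is counted in characters here for simplicity. A real implementation would count encoded bytes and leave room for the other attributes within the item size limit. `split_message`, `join_message`, and the boolean EOM attribute are hypothetical.

```python
from typing import Any, Dict, List

def split_message(message_id: int, headers: Dict[str, Any], body: str,
                  chunk_size: int) -> List[Dict[str, Any]]:
    """Split body into items keyed by (MESSAGE-ID, PART), ending with an EOM item."""
    chunks = [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)] or [""]
    # PART 0 carries the headers and the first chunk of the body.
    items = [{"MESSAGE-ID": message_id, "PART": 0, **headers, "BODY": chunks[0]}]
    for part, chunk in enumerate(chunks[1:], start=1):
        items.append({"MESSAGE-ID": message_id, "PART": part, "BODY": chunk})
    # A final marker item tells readers the message is complete.
    items.append({"MESSAGE-ID": message_id, "PART": len(chunks), "EOM": True})
    return items

def join_message(items: List[Dict[str, Any]]) -> str:
    """Reassemble the body from items, ordering by PART as a Query would."""
    return "".join(it.get("BODY", "") for it in sorted(items, key=lambda it: it["PART"]))
```

Reading the message back is a single Query on MESSAGE-ID, which returns the parts already in range-key (PART) order.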
  • Storing Large Items, Pattern 2: Use an overflow table for large attributes, and retrieve items via BatchGetItem.
    Before: MailBox table: ID (hash key), Timestamp (range key), Attribute1 … AttributeN, LargeAttribute
    After: MailBox table: ID (hash key), Timestamp (range key), Attribute1 … AttributeN, LargeAttributeUUID
           Overflow table, keyed by LargeAttributeUUID: LargeAttributeUUID, LargeAttribute
  • Storing Time Series Data: Suppose your application needs to keep one year of historical data. You can pre-create one table per week (or per day, or per month) and insert each record into the appropriate table based on its timestamp. Example tables, each with Event_id (hash key), Timestamp (range key), Attribute1 … AttributeN: Events_table_2012; Events_table_2012_05_week1; Events_table_2012_05_week2; Events_table_2012_05_week3
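The step of picking "the appropriate table based on its timestamp" can be sketched as a pure function. The naming scheme is a hypothetical reading of the slide's table names: week N of a month covers days 7N-6 to 7N, so a month can have a week5. Routing by name means writes go only to the current period's table. Once a table ages out of the retention window, it can be dropped as a whole rather than deleted item by item, which is presumably one motivation for the pattern.

```python
from datetime import datetime

def weekly_table_name(ts: datetime, prefix: str = "Events_table") -> str:
    """Pick the pre-created table for a record's timestamp (hypothetical naming)."""
    week_of_month = (ts.day - 1) // 7 + 1  # days 1-7 -> week1, 8-14 -> week2, ...
    return f"{prefix}_{ts.year}_{ts.month:02d}_week{week_of_month}"
```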
  • Searching across items (with different hash keys): Create additional tables that serve as indexes • Example: First_name_index and Last_name_index. Query: get all user data for First_name = 'Tim' • Query First_name_index for hash key = 'Tim' • This returns User_id = (101, 201) • BatchGetItem (Users, [101, 201])
    Users (User_Id hash key: First_name, Last_name, …):
      101: Tim White; 201: Tim Black; 301: Ted White; 401: Keith Brown; 501: Keith White; 601: Keith Black
    First_name_index (First_name hash key, User_id range key):
      Tim 101; Tim 201; Ted 301; Keith 401; Keith 501; Keith 601
    Last_name_index (Last_name hash key, User_id range key):
      White 101; Black 201; White 301; Brown 401; White 501; Black 601
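The two-step lookup can be modeled in memory with the slide's data. The helper names are hypothetical and no DynamoDB calls are made. Step 1 stands in for a Query on the index table by hash key, and step 2 for a BatchGetItem on Users. Because these are ordinary tables, the application must also write to them itself whenever a user is added or renamed.

```python
from collections import defaultdict
from typing import Dict, List

USERS: Dict[int, Dict[str, str]] = {
    101: {"First_name": "Tim", "Last_name": "White"},
    201: {"First_name": "Tim", "Last_name": "Black"},
    301: {"First_name": "Ted", "Last_name": "White"},
    401: {"First_name": "Keith", "Last_name": "Brown"},
    501: {"First_name": "Keith", "Last_name": "White"},
    601: {"First_name": "Keith", "Last_name": "Black"},
}

def build_index_table(users: Dict[int, Dict[str, str]], attr: str) -> Dict[str, List[int]]:
    """Hash key = attribute value, range key = User_id (kept in sorted order)."""
    index: Dict[str, List[int]] = defaultdict(list)
    for user_id in sorted(users):
        index[users[user_id][attr]].append(user_id)
    return dict(index)

def find_users(users: Dict[int, Dict[str, str]], index: Dict[str, List[int]],
               value: str) -> List[Dict]:
    user_ids = index.get(value, [])  # step 1: Query the index table by hash key
    # step 2: BatchGetItem on Users for exactly those keys
    return [{"User_id": uid, **users[uid]} for uid in user_ids]

first_name_index = build_index_table(USERS, "First_name")
last_name_index = build_index_table(USERS, "Last_name")
```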
  • Avoiding Hot Keys: Use multiple keys (aliases) instead of a single hot key. Generate aliases by prefixing or suffixing a number from a known range (1 to N). Use the BatchGetItem API to retrieve the ticket counts for all the aliases (1_Avatar, 2_Avatar, 3_Avatar, …, N_Avatar) and sum them in your client application.
    MOVIES table (MNAME hash key):
      1_Avatar: TicketCount = 4,000,000
      2_Avatar: TicketCount = 2,000,000
      3_Avatar: TicketCount = 4,000,000
      …
      N_Avatar: TicketCount = 4,000,000
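The alias technique can be shown with a toy model, an in-memory hypothetical class. Each write picks a random alias, so increments spread over N keys and therefore over partitions. A read sums all the aliases, standing in for one BatchGetItem plus a client-side sum. In DynamoDB each increment would be an UpdateItem with an atomic ADD on the chosen alias.

```python
import random
from typing import Dict, List, Optional

class ShardedCounter:
    """Counter spread over N alias keys, e.g. 1_Avatar ... N_Avatar (toy model)."""

    def __init__(self, name: str, num_aliases: int, rng: Optional[random.Random] = None) -> None:
        self.keys: List[str] = [f"{i}_{name}" for i in range(1, num_aliases + 1)]
        self.counts: Dict[str, int] = {k: 0 for k in self.keys}
        self.rng = rng or random.Random()

    def increment(self, amount: int = 1) -> str:
        # Each write goes to a randomly chosen alias, spreading load across keys.
        key = self.rng.choice(self.keys)
        self.counts[key] += amount
        return key

    def total(self) -> int:
        # Stands in for one BatchGetItem over all aliases plus a client-side sum.
        return sum(self.counts[k] for k in self.keys)
```

The design trades a slightly more expensive read (N keys instead of one) for write throughput that is no longer capped by a single key.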
  • When to use and when not to use DynamoDB:
    When to use: • Key-value or simple queries • Very high read/write rates • Need for auto-sharding • Need for online scaling across multiple nodes • Consistently low latency • No size or throughput limits • No tuning • High durability
    When not to use: • Need for multi-item/row or cross-table transactions • Need for complex queries or joins • Need for real-time analytics on historical data
  • Questions?
  • Backup slides
  • DynamoDB / Elastic MapReduce integration: Harness the Hadoop parallel processing pipeline to • Perform complex analytics • Join DynamoDB tables with outside data sources such as S3 • Export data from DynamoDB to S3 • Import data from S3 into DynamoDB. This makes it easy to leverage DynamoDB's scale.
  • (Diagram: S3, EMR and DynamoDB.)