Deep Dive on Amazon DynamoDB - AWS Online Tech Talks
Nov. 9, 2017
Learning Objectives:
- Learn the capabilities of Amazon DynamoDB in detail
- Learn best practices for schema design with DynamoDB
- Learn use cases for Non-relational (NoSQL) Databases
2. Plan
Amazon DynamoDB
o Foundations
o Tables
o Indexes
o Partitioning
New Features
o TTL
o Tagging
o VPC Endpoints
o Auto Scaling
o DAX
Dating Website
o DAX
o GSIs
Serverless IoT
o TTL
o Streams
o DAX
Getting Started
o Developer Resources
4. NoSQL foundations
NoSQL data models: Key-value, Graph, Document, Column-family
"Dynamo: Amazon's Highly Available Key-value Store" (Fall 2007)
[Diagram: example key-value records (0000 → {"Texas"}, 0001 → {"Illinois"}, 0002 → {"Oregon"}), a column-family row, and a document item (Game Heroes, Version 3.4, CRC ADE4); timeline from Fall 2007 to June 2009 to January 2012]
5. Scaling relational vs. non-relational databases
Traditional SQL: scale up a single DB host
NoSQL: scale out to many shards across DB Host 1 … Host n (DynamoDB: partitions)
6. Scaling NoSQL
- A good sharding (partitioning) scheme gives an even distribution of both data and workload as they grow
- Key concept: partition key
- Ideal scaling conditions (illustrated in the sketch below):
1. Partition key is from a high-cardinality set (that grows)
2. Requests are evenly spread over the key space
3. Requests are evenly spread over time
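A minimal sketch of condition 1 in practice, using Python with boto3; the table and attribute names are illustrative, not from the talk.

# Sketch only: assumes boto3 credentials and an existing "Orders" table.
import uuid
import boto3

orders = boto3.resource("dynamodb").Table("Orders")

# Good: partition key drawn from a high-cardinality, growing set (one value
# per customer), so writes spread evenly across the key space.
orders.put_item(Item={
    "CustomerId": str(uuid.uuid4()),   # high-cardinality partition key
    "OrderId": "2017-11-09#0001",      # sort key, unique within the customer
    "Total": 42,
})

# Bad: a partition key such as "Country" has only a handful of values, so a
# few partitions absorb most of the traffic and become hot.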
7. What (some) customers store in NoSQL DBs
Market Orders
Tokenization (PHI, Credit Cards)
Chat Messages
User Profiles (Mobile)
IoT Sensor Data (& device status!)
File Metadata
Social Media Feeds
8. Use case: DataXu's attribution engine
[Architecture diagram: 1st- and 3rd-party data in S3 buckets, AWS Direct Connect, Amazon VPC, Amazon EC2, Amazon RDS, an Amazon EMR job, AWS Data Pipeline, Amazon CloudWatch, Amazon SNS, AWS IAM, and DynamoDB]
"Attribution" is the marketing term for the allocation of credit to individual advertisements that eventually lead to a desired outcome (e.g., purchase).
11. Amazon DynamoDB
Highly available
Consistent, single-digit millisecond latency at any scale
Fully managed
Secure
Integrates with AWS Lambda, Amazon Redshift, and more
12. Elastic is the new normal
[Chart: consumed read and write capacity units over time, showing >200% and >300% increases from baseline]
15. Local Secondary Indexes
• Alternate sort key attribute
• Index is local to a partition key
• 10 GB max per partition key, i.e. LSIs limit the # of sort keys!
[Diagram: three views of the same items, each with partition key A1 and an alternate sort key (A3, A4, or A5), projecting the remaining attributes]
16. Global Secondary Indexes
• Alternate partition (+sort) key
• Index is across all table partition keys
• Can be added or removed anytime (see the sketch below)
• RCUs/WCUs provisioned separately for GSIs
[Diagram: GSIs keyed on A3 (partition key) with A1 (table key), shown with ALL, INCLUDE A2, and KEYS_ONLY projections]
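Because GSIs can be added after the table exists, here is a hedged sketch of doing so with Python and boto3; the table name, index name, and throughput numbers are illustrative, while the A1/A2/A3 attribute names follow the slide.

# Sketch: add a GSI keyed on A3 to an existing table keyed on A1.
import boto3

client = boto3.client("dynamodb")

client.update_table(
    TableName="MyTable",
    AttributeDefinitions=[{"AttributeName": "A3", "AttributeType": "S"}],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "A3-index",
            "KeySchema": [{"AttributeName": "A3", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "INCLUDE",
                           "NonKeyAttributes": ["A2"]},
            # GSIs have their own RCUs/WCUs, provisioned separately from the table.
            "ProvisionedThroughput": {"ReadCapacityUnits": 5,
                                      "WriteCapacityUnits": 5},
        }
    }],
)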
17. Data types
Type → DynamoDB Type
String → String
Integer, Float → Number
Timestamp → Number or String
Blob → Binary
Boolean → Bool
Null → Null
List → List
Set → Set of String, Number, or Binary
Map → Map
18. Table creation options
CreateTable
Required:
o TableName — unique to account and region
o PartitionKey: AttributeName, type [S,N,B] (String, Number, Binary ONLY)
o Provisioned Reads: 1+ (per second)
o Provisioned Writes: 1+ (per second)
Optional:
o SortKey: AttributeName, type [S,N,B]
o LSI Schema
o GSI Schema (with its own Provisioned Reads: 1+, Provisioned Writes: 1+)
o TTL and Auto Scaling
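As a concrete illustration of these options, a minimal CreateTable sketch in Python with boto3; the table name, attribute names, and throughput numbers are hypothetical, not from the talk.

# Sketch: required partition key, optional sort key, provisioned throughput, and an LSI.
import boto3

client = boto3.client("dynamodb")

client.create_table(
    TableName="GameScores",                                   # unique per account and region
    AttributeDefinitions=[
        {"AttributeName": "UserId", "AttributeType": "S"},    # key types: S, N, or B only
        {"AttributeName": "GameTitle", "AttributeType": "S"},
        {"AttributeName": "TopScore", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "UserId", "KeyType": "HASH"},       # partition key (required)
        {"AttributeName": "GameTitle", "KeyType": "RANGE"},   # sort key (optional)
    ],
    LocalSecondaryIndexes=[{                                   # optional LSI schema
        "IndexName": "TopScore-index",
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},
            {"AttributeName": "TopScore", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }],
    ProvisionedThroughput={"ReadCapacityUnits": 10,            # per second, 1+
                           "WriteCapacityUnits": 10},
)

TTL and Auto Scaling are enabled afterwards on the existing table (TTL via UpdateTimeToLive, as sketched later in this deck).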
19. Provisioned capacity
Read Capacity Unit (RCU)
1 RCU returns 4 KB of data for strongly consistent reads, or double the data at the same cost for eventually consistent reads
Write Capacity Unit (WCU)
1 WCU writes 1 KB of data, and each item consumes 1 WCU minimum
Capacity is per second, rounded up to the next whole number
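A quick worked example of that arithmetic, as a Python sketch; only the 4 KB / 1 KB rounding rules above are assumed.

import math

def rcus_for(item_size_kb, reads_per_sec, eventually_consistent=False):
    # Each strongly consistent read of up to 4 KB costs 1 RCU; round up per item.
    per_read = math.ceil(item_size_kb / 4)
    total = per_read * reads_per_sec
    # Eventually consistent reads return double the data at the same cost.
    return math.ceil(total / 2) if eventually_consistent else total

def wcus_for(item_size_kb, writes_per_sec):
    # Each write of up to 1 KB costs 1 WCU; every item costs at least 1 WCU.
    return max(1, math.ceil(item_size_kb)) * writes_per_sec

# e.g. 6 KB items read 100 times per second:
print(rcus_for(6, 100))        # 200 RCUs, strongly consistent
print(rcus_for(6, 100, True))  # 100 RCUs, eventually consistent
print(wcus_for(0.5, 50))       # 50 WCUs (minimum 1 WCU per item)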
20. Horizontal Sharding
[Diagram: CustomerOrdersTable spread across Host 1 … Host 99 … Host n]
Each new host brings compute, storage, and network bandwidth
22. Partitioning
[Diagram: CustomerOrdersTable keyspace from Hash.MIN = 0 to Hash.MAX = FF, split at 55 and AA into Partitions A, B, and C, each owning 33.33% of the keyspace and 33.33% of provisioned capacity; over time, a partition splits into two partitions owning 16.66% each]
Split for partition size: the desired size of a partition is 10 GB*, and when a partition surpasses this it can split
Split for provisioned capacity: the desired capacity of a partition is expressed as 3w + 1r < 3000*, where w = WCU and r = RCU
*= subject to change
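A hedged sketch of how those two split rules combine into an estimated partition count; the 10 GB and 3000 constants are the slide's "subject to change" values, and the formula is an approximation rather than a documented guarantee.

import math

def estimated_partitions(table_size_gb, rcu, wcu):
    # Split for size: keep each partition under ~10 GB (*subject to change).
    by_size = math.ceil(table_size_gb / 10)
    # Split for capacity: keep 3w + 1r under ~3000 per partition (*subject to change).
    by_capacity = math.ceil((rcu + 3 * wcu) / 3000)
    return max(by_size, by_capacity, 1)

# e.g. a 25 GB table provisioned at 6000 RCUs / 1000 WCUs:
print(estimated_partitions(25, 6000, 1000))  # max(3, 3, 1) = 3 partitions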
23. Partitioning: 3-way replication
[Diagram: CustomerOrdersTable Partitions A, B, and C, each provisioned at 1000 RCUs / 100 WCUs, replicated across hosts in Availability Zones A, B, and C]
Data is replicated to three Availability Zones by design
Example item: OrderId: 1, CustomerId: 1, ASIN: [B00X4WHP5E]; Hash(1) = 7B routes the item to a single partition (here, Partition B) in all three zones
24. DynamoDB Streams
Ordered stream of item changes
Exactly once, strictly ordered by key
Highly durable, scalable
24 hour retention
Sub-second latency
Compatible with Kinesis Client Library
Shards have a lineage and automatically close after time or when the associated DynamoDB partition splits
[Diagram: updates to an Amazon DynamoDB table flow into DynamoDB Streams stream shards; an application's KCL workers (Amazon Kinesis Client Library) call GetRecords to consume them]
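Besides KCL workers, a common consumer is an AWS Lambda function subscribed to the stream, as the Serverless IoT slides later in this deck do. A minimal handler sketch in Python; the record layout follows the DynamoDB Streams event format, and the print statements are placeholders.

# Sketch: Lambda handler for a DynamoDB Streams event source mapping.
def handler(event, context):
    for record in event["Records"]:
        event_name = record["eventName"]          # INSERT, MODIFY, or REMOVE
        keys = record["dynamodb"]["Keys"]
        if event_name == "REMOVE":
            # OldImage is present if the stream view type includes old images.
            old_image = record["dynamodb"].get("OldImage", {})
            print("Item removed:", keys, old_image)
        else:
            new_image = record["dynamodb"].get("NewImage", {})
            print("Item changed:", keys, new_image)
    return {"processed": len(event["Records"])}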
25. Time-To-Live (TTL)
An epoch timestamp marking when an item can be deleted by a background TTL job, without consuming any provisioned capacity
Removes data that is no longer relevant
[Diagram: CustomerActiveOrder item (OrderId: 1, CustomerId: 1, MyTTL: 1492641900) in an Amazon DynamoDB table; expirations flow through a DynamoDB Streams stream to Amazon Kinesis and Amazon Redshift]
26. Time-To-Live (TTL)
TTL items identifiable in DynamoDB Streams
Configuration protected by AWS Identity and Access Management (IAM), auditable with AWS CloudTrail
Eventual deletion, free to use
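A hedged sketch of turning TTL on and writing an item that expires, in Python with boto3; the MyTTL attribute name matches the slides, while the table name and key values are illustrative.

# Sketch: enable TTL on the MyTTL attribute, then write an item that expires in 90 days.
import time
import boto3

client = boto3.client("dynamodb")
client.update_time_to_live(
    TableName="CustomerActiveOrder",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "MyTTL"},
)

table = boto3.resource("dynamodb").Table("CustomerActiveOrder")
table.put_item(Item={
    "OrderId": "1",
    "CustomerId": "1",
    "MyTTL": int(time.time()) + 90 * 24 * 3600,  # epoch seconds; deletion is eventual and free
})

Expired items surface in DynamoDB Streams as REMOVE records attributed to the DynamoDB service, which is how downstream consumers can tell a TTL delete from a user delete.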
27. Cost allocation tagging
Features
• Track costs: AWS bills broken down by tags in detailed monthly bills and Cost Explorer
• Flexible: add customizable tags to tables, indexes, and DAX clusters
Key Benefits
• Transparency: know exactly how much your DynamoDB resources cost
• Consistent: report of spend across AWS services
31. DynamoDB in the VPC
[Diagram: Availability Zone #1 and #2, each with a private subnet containing a web app server and DAX behind security groups; AWS Lambda and a VPC endpoint provide paths to DynamoDB]
DAX
o Microseconds latency in-memory cache
o Millions of requests per second
o Fully managed, highly available
o Role-based access control
o No IGW or VPC endpoint required
VPC Endpoints
o DynamoDB-in-the-VPC
o IAM resource policy restricted
32. DynamoDB Accelerator (DAX)
Private IP, client-side discovery
Supports AWS Java and NodeJS SDKs, with more AWS SDKs to come
Cluster based, Multi-AZ
Separate query and item cache
34. DynamoDB key choice
To get the most out of DynamoDB throughput, create tables where the
partition key has a large number of distinct values, and values are requested
fairly uniformly, as randomly as possible.
Amazon DynamoDB Developer Guide
35. Burst capacity is built-in
[Chart: provisioned vs. consumed capacity units over time — unused capacity is "saved up", then consumed during a burst]
DynamoDB "saves" 300 seconds of unused capacity per partition
Burst: 300 seconds (1200 × 300 = 360k CU)
36. Burst capacity may not be sufficient
[Chart: provisioned, consumed, and attempted capacity units over time — attempted traffic exceeds the 300-second burst (1200 × 300 = 360k CU), producing throttled requests]
Don't completely depend on burst capacity… provision sufficient throughput
38. Throttling
Occurs if sustained throughput goes beyond provisioned throughput per partition
• Possible causes
• Non-uniform workloads
• Hot keys/hot partitions
• Very large items
• Mixing hot data with cold data (remedy: use TTL or a table per time period)
• To diagnose: disable retries, write your own retry code, and log all throttled or returned keys (see the sketch below)
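A hedged sketch of that diagnostic advice in Python with boto3; the table and key names are hypothetical, and the logging is a placeholder for whatever retry policy you write yourself.

# Sketch: turn off the SDK's automatic retries so throttles surface immediately,
# then log the keys that were throttled for later analysis.
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

no_retry = Config(retries={"max_attempts": 0})
table = boto3.resource("dynamodb", config=no_retry).Table("MyTable")

def write_with_logging(item):
    try:
        table.put_item(Item=item)
    except ClientError as err:
        if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
            # Log the throttled key; your own backoff/retry policy goes here.
            print("THROTTLED:", item["pk"])
        else:
            raise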
40. Design Patterns: DynamoDB Accelerator and GSIs — Dating Website
Online dating website running on AWS
Users have people they like, and conversely people who like them
Hourly batch job matches users
Data stored in Likes and Matches tables
41. Schema Design Part 1
Requirements:
1. Get all people I like
2. Get all people that like me
3. Expire likes after 90 days
Likes table: user_id_self (partition key), user_id_other (sort key), MyTTL (TTL attribute), … Attribute N
GSI_Other: user_id_other (partition key), user_id_self (sort key)
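A sketch of how these requirements map onto reads and writes, in Python with boto3; the table, index, and attribute names follow the slide, and the 90-day TTL is computed at write time.

# Sketch: write a like with a 90-day TTL, then satisfy both read requirements.
import time
import boto3
from boto3.dynamodb.conditions import Key

likes = boto3.resource("dynamodb").Table("Likes")

# Record that user A likes user B; MyTTL expires the like after 90 days.
likes.put_item(Item={
    "user_id_self": "userA",
    "user_id_other": "userB",
    "MyTTL": int(time.time()) + 90 * 24 * 3600,
})

# 1. Get all people I like: query the base table by my own id.
i_like = likes.query(KeyConditionExpression=Key("user_id_self").eq("userA"))

# 2. Get all people that like me: query the GSI keyed on user_id_other.
likes_me = likes.query(
    IndexName="GSI_Other",
    KeyConditionExpression=Key("user_id_other").eq("userA"),
)
print(len(i_like["Items"]), len(likes_me["Items"]))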
43. Matchmaking
Requirements:
1. Get all new likes every hour
2. For each like, get the other user's likes
3. Store matches in Matches table
[Diagram: matchmaking servers in an Auto Scaling group (public subnet, one Availability Zone, security group) reading the Likes table across Partition 1 … Partition N]
44. Matchmaking
Requirements:
1. Get all new likes every hour
2. For each like, get the other user's likes
3. Store matches in Matches table
[Diagram: the same matchmaking fleet hitting the Likes table across Partition 1 … Partition N — THROTTLE!]
45. Matchmaking
Requirements:
1. Get all new likes every hour
2. For each like, get the other user's likes
3. Store matches in Matches table
Even access:
1. Key choice: high key cardinality
2. Uniform access: access is evenly spread over the key space
3. Time: requests arrive evenly spaced in time
46. Matchmaking
Requirements:
1. Get all new likes every hour
2. For each like, get the other user's likes
3. Store matches in Matches table
0. Write the like to the Likes table, then query by user id to warm the cache, then queue for batch processing (see the sketch below)
[Diagram: the matchmaking fleet (Auto Scaling group, public subnet, security group) now reads the Likes table partitions through a DAX cluster]
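A hedged sketch of step 0 above; the amazon-dax-client Python package and its resource interface are assumptions (at the time of the talk DAX shipped Java and Node.js clients), the cluster endpoint is a placeholder, and the table/key names follow the earlier schema.

# Sketch: write the like to DynamoDB, then read it back through DAX so the
# cache is warm before the hourly batch job runs.
import time
import boto3
from boto3.dynamodb.conditions import Key
from amazondax import AmazonDaxClient  # assumed DAX client package

dynamodb = boto3.resource("dynamodb")
dax = AmazonDaxClient.resource(
    endpoint_url="dax://my-cluster.xxxxxx.dax-clusters.us-east-1.amazonaws.com"  # hypothetical endpoint
)

likes_write = dynamodb.Table("Likes")  # writes go straight to DynamoDB
likes_read = dax.Table("Likes")        # reads go through the DAX cluster

def record_like(self_id, other_id):
    likes_write.put_item(Item={
        "user_id_self": self_id,
        "user_id_other": other_id,
        "MyTTL": int(time.time()) + 90 * 24 * 3600,
    })
    # Query by user id right after the write to populate the DAX query cache.
    likes_read.query(KeyConditionExpression=Key("user_id_self").eq(self_id))
    # ...then enqueue (self_id, other_id) for the hourly batch matcher.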
47. Design Patterns: DynamoDB Accelerator and GSIs — Dating Website
Takeaways:
Keep DAX warm by querying after writing
Use GSIs for many-to-many relationships
48. Design Patterns: TTL, DynamoDB Streams, and DAX — Serverless IoT
Single DynamoDB table for storing sensor data
Tiered storage to archive old events to S3
Data stored in Data table
49. Schema Design
Data table: DeviceId (partition key), EventEpoch (sort key), MyTTL (TTL attribute), … Attribute N
Requirements:
1. Get all events for a device
2. Archive old events after 90 days
UserDevices table: UserId (partition key), DeviceId (sort key), Attribute 1 … Attribute N
Requirements:
1. Get all devices for a user
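A sketch of the first Data-table requirement in Python with boto3; the attribute names follow the slide, while the device id and time range are illustrative (archiving after 90 days is handled by TTL and Streams, shown next).

# Sketch: get all events for a device, here bounded to the last 24 hours.
import time
import boto3
from boto3.dynamodb.conditions import Key

data = boto3.resource("dynamodb").Table("Data")

now = int(time.time())
resp = data.query(
    KeyConditionExpression=Key("DeviceId").eq("1") &
                           Key("EventEpoch").between(now - 24 * 3600, now),
)
for event in resp["Items"]:
    print(event["DeviceId"], event["EventEpoch"])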
50. Serverless IoT
[Diagram: a Data item (DeviceId: 1, EventEpoch: 1492641900, MyTTL: 1492736400) expires from the Amazon DynamoDB tables (DATA, USERDEVICES); the delete flows through Amazon DynamoDB Streams to AWS Lambda, which archives it to an Amazon S3 bucket]
Single DynamoDB table for storing sensor data
Tiered storage to archive old events to S3
Data stored in Data table
51. Serverless IoT
[Diagram: Data table Partitions A, B, C, D — Throttling]
A noisy sensor produces data at a rate several times greater than others
52. Data
[Diagram: keyspace from Hash.MIN = 0 to Hash.MAX = FF, split at 3F, 7F, and BF into Partitions A, B, C, and D, each owning 25.0% of the keyspace and 25.0% of provisioned capacity]
53. Data
[Diagram: the same four-partition keyspace (00–FF split at 3F, 7F, BF)]
Even access:
1. Key choice: high key cardinality
2. Uniform access: access is evenly spread over the key space
3. Time: requests arrive evenly spaced in time
54. Serverless IoT
Requirements:
1. Single DynamoDB table for storing sensor data
2. Tiered storage to archive old events to S3
3. Data stored in Data table
0. Capable of dynamically sharding to overcome throttling
55. Schema Design
Shard table: DeviceId (partition key), ShardCount
Requirements:
1. Get shard count for a given device
2. Always grow the count of shards
Data table: DeviceId (partition key), EventEpoch (sort key), MyTTL (TTL attribute), … Attribute N
Requirements:
1. Get all events for a device
2. Archive old events after 90 days
Naïve sharding: a sharding scheme where the number of shards is not predefined, and will grow over time but never contract; contrast with a fixed shard count (Range: 0..1,000)
56. Serverless IoT: Naïve Sharding
Shard item: DeviceId: 1, ShardCount: 10
Data item: DeviceId_ShardId: 1_3, EventEpoch: 1492641900, MyTTL: 1492736400 (Expiry)
Request path (see the sketch below):
1. Read ShardCount from Shard table
2. Write to a random shard
3. If throttled, review shard count
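A hedged sketch of that request path in Python with boto3; the table and attribute names follow the slide, and the grow-by-one shard policy on a throttle is illustrative.

# Sketch: naive write sharding — read the device's ShardCount, write the event
# to a random shard, and grow the shard count if the write is throttled.
import random
import time
import boto3
from botocore.exceptions import ClientError

ddb = boto3.resource("dynamodb")
shard_table = ddb.Table("Shard")
data_table = ddb.Table("Data")

def write_event(device_id, payload):
    # 1. Read ShardCount from the Shard table (a hot read that suits DAX well).
    resp = shard_table.get_item(Key={"DeviceId": device_id})
    shard_count = int(resp.get("Item", {}).get("ShardCount", 1))

    # 2. Write to a random shard: the partition key becomes DeviceId_ShardId.
    now = int(time.time())
    item = {
        "DeviceId_ShardId": f"{device_id}_{random.randrange(shard_count)}",
        "EventEpoch": now,
        "MyTTL": now + 90 * 24 * 3600,
        **payload,
    }
    try:
        data_table.put_item(Item=item)
    except ClientError as err:
        if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
            # 3. If throttled, review the shard count — shards only ever grow.
            shard_table.update_item(
                Key={"DeviceId": device_id},
                UpdateExpression="ADD ShardCount :one",
                ExpressionAttributeValues={":one": 1},
            )
        raise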
57. Serverless IoT
Pick a random shard to write data to
[Diagram: 1. Shard item (DeviceId: 1, ShardCount: 10) is read; 2. Data item (DeviceId_ShardId: 1_Rand(0,10), EventEpoch: 1492641900, MyTTL: 1492736400) lands on one of Partitions A–D]
58. Serverless IoT
[Diagram: DATA, USERDEVICES, and SHARD tables (Shard item: DeviceId: 1, ShardCount: 10) fronted by DAX; expiring Data items (DeviceId: 1, EventEpoch: 1492641900, MyTTL: 1492736400) flow through Amazon DynamoDB Streams to AWS Lambda and into an Amazon S3 bucket, + Amazon Kinesis Firehose]
Single DynamoDB table for storing sensor data
Tiered storage to archive old events to S3
Data stored in Data table
Capable of dynamically sharding to overcome throttling
59. Design Patterns: TTL, DynamoDB Streams, and DAX — Serverless IoT
Takeaways:
Use naïve write sharding to dynamically expand shards
Use DAX for hot reads, especially from Lambda
Use TTL to create tiered storage