Amazon DynamoDB is a fully managed NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale. This talk explores DynamoDB capabilities and benefits in detail and discusses how to get the most out of your DynamoDB database. We go over schema design best practices with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others. We also explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including JSON document support, Streams, Time-to-Live (TTL), and more.
This spring, the data warehouse team at Ancestry flawlessly migrated and validated nearly half a trillion records from Actian Matrix to Amazon Redshift. During this session, the Ancestry team will describe how they orchestrated the entire migration in less than four months and the technical challenges they faced and overcame along the way, as well as share tips and tricks for breaking through common pitfalls of data warehouse migrations. They will also highlight how they tuned and optimized the Amazon Redshift environment, adopted Redshift Spectrum, and leveraged their collaboration with Amazon to deliver a powerful customer experience.
Learn the fundamentals of Amazon DynamoDB and see the DynamoDB console first-hand as we walk through a demo of building a serverless web application using this high-performance key-value and JSON document store.
Building Big Data Applications with Serverless Architectures - June 2017 AWS...Amazon Web Services
Learning Objectives:
- Use cases and best practices for serverless big data applications
- Leverage AWS technologies such as AWS Lambda and Amazon Kinesis
- Learn to perform ETL, event processing, ad-hoc analysis, real-time processing, and MapReduce with serverless
Building data processing applications is challenging and time-consuming, and often requires specialized expertise to deploy and operate. With serverless computing, you can perform real-time stream processing of multiple data types without needing to spin up servers or install software, allowing you to deploy big data applications quickly and more easily. Come learn how you can use AWS Lambda with Amazon Kinesis to analyze streaming data in real-time and then store the results in a managed NoSQL database such as Amazon DynamoDB. You’ll learn tips and tricks for doing in-line processing, data manipulation, and even distributed MapReduce on large data sets.
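The pattern described above can be sketched in a few lines. The following is a minimal, hypothetical example (the table name, partition key, and record fields are invented for illustration) of an AWS Lambda handler that decodes a Kinesis batch and stores results in DynamoDB with boto3:

import base64
import json
from decimal import Decimal

import boto3

# Table name and record fields below are illustrative assumptions.
table = boto3.resource("dynamodb").Table("events-table")

def handler(event, context):
    # Kinesis delivers records base64-encoded inside the Lambda event.
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        table.put_item(Item={
            "deviceId": payload["deviceId"],  # assumed partition key
            "arrivedAt": int(record["kinesis"]["approximateArrivalTimestamp"]),
            "value": Decimal(str(payload["value"])),  # DynamoDB rejects floats
        })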
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all of your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing, scale-out architecture, and columnar direct-attached storage to minimize I/O time and maximize performance. Learn how you can gain deeper business insights and save money and time by migrating to Amazon Redshift. Take away strategies for migrating from on-premises data warehousing solutions, tuning schema and queries, and utilizing third party solutions.
Learning Objectives:
- Get an overview of Amazon DynamoDB improvements in 2017
- Learn about the new features of Amazon DynamoDB, including Time-to-Live (TTL), Tagging, VPC Endpoints, DynamoDB Accelerator (DAX), Database Migration Service (DMS) support, and more.
- Learn about the benefits these new features deliver to you
AWS re:Invent 2016: How to Launch a 100K-User Corporate Back Office with Micr...Amazon Web Services
Learn how to build a scalable, compliance-ready, and automated deployment of the Microsoft “backoffice” servers for 100K users running on AWS. In this session, we show a reference architecture deployment of Exchange, SharePoint, Skype for Business, SQL Server and Active Directory in a single VPC. We discuss the following: (1) how the solution is automated for 100K users, (2) how the solution is enabled for compliance (e.g., FedRAMP, HIPAA, PCI), and (3) how the solution is built from modular 10K user blocks. Attendees should have knowledge of AWS CloudFormation, PowerShell, instance bootstrapping, VPCs, and Amazon Route 53, as well as the relevant Microsoft technologies.
Explore Amazon DynamoDB capabilities and benefits in detail and learn how to get the most out of your DynamoDB database. We go over best practices for schema design with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others. We explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including JSON document support, DynamoDB Streams, and more. We also provide lessons learned from operating DynamoDB at scale, including provisioning DynamoDB for IoT.
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...Amazon Web Services
Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. In this session, we demonstrate how you can point Amazon QuickSight to AWS data stores, flat files, or other third-party data sources and begin visualizing your data in minutes. We also introduce SPICE, a new Super-fast, Parallel, In-memory Calculation Engine in Amazon QuickSight, which performs advanced calculations and renders visualizations rapidly without requiring any additional infrastructure, SQL programming, or dimensional modeling, so you can seamlessly scale to hundreds of thousands of users and petabytes of data. Lastly, you will see how Amazon QuickSight provides you with smart visualizations and graphs that are optimized for your different data types, to ensure the most suitable visualization for your analysis, and how to share these visualization stories using the built-in collaboration tools.
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon GlacierAmazon Web Services
In this session, storage experts will walk you through Amazon S3 and Amazon Glacier, bulk data repositories that can deliver 99.999999999% durability and scale past trillions of objects worldwide – with cost points competitive against tape archives. Learn about the different ways you can accelerate data transfer into S3 and get a close look at new tools to secure and manage your data more efficiently. Hear about Amazon Glacier and new capabilities to get access to your data faster with expedited retrievals. Learn how AWS customers have built solutions that turn their data from a cost into a strategic asset, and bring your toughest questions straight to our experts.
Data migration at petabyte scale is now a simple service from AWS. You can easily migrate large volumes of data from on-premises environments to the cloud, quickly get started with the cloud as a backup target, or burst workloads between your on-premises environments and the AWS Cloud. Learn about AWS Snowball, AWS Snowball Edge, AWS Snowmobile and AWS Storage Gateway, and understand which one is the right fit for your requirements. We will go through customer use cases, review the different applications used, and help you cut IT spend and management time on hardware and backup solutions.
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal HealthAmazon Web Services
Get a technical deep dive into Amazon Redshift and Redshift Spectrum. Learn best practices for taking advantage of Amazon Redshift’s columnar technology and parallel processing capabilities, to improve overall database performance. This session will explain how to migrate from existing data warehouses, create an optimized schema, efficiently load data, use workload management and use Redshift Spectrum to query data directly in Amazon S3. The session will feature Jeff Battisti, Director Global Cloud BI&A Medical IT at Cardinal Health, and Greg Cantwell, Senior Consultant, Business Metrics / Analytics, who will provide lessons learned and best practices, from creating a new data warehouse to supporting Global Sales & Financial reporting in over 60 countries with Amazon Redshift.
Amazon Web Services (AWS) offers a wide range of database options to fit your application requirements, from fully managed database services that can be launched in minutes with just a few clicks to self-managed databases running on Amazon EC2. AWS managed database services include Amazon Relational Database Service (Amazon RDS), with support for six commonly used database engines; Amazon Aurora, a MySQL- and PostgreSQL-compatible relational database; Amazon DynamoDB, a NoSQL database service; and Amazon Redshift, a petabyte-scale data warehouse service. AWS also provides the AWS Database Migration Service, which makes it easy and inexpensive to migrate your databases to the AWS Cloud.
In this webinar, we take a closer look at the AWS database offerings and learn how to quickly select, set up, operate, and scale your database in the cloud.
Learning Objectives:
• Gain insights into the AWS database offerings and know which to select for your workload.
• Learn how the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS) can facilitate and simplify migrating your business critical applications to Amazon Web Services.
• Learn how Amazon DynamoDB Accelerator (DAX) can reduce Amazon DynamoDB response times from milliseconds to microseconds, even at millions of requests per second.
• Hear from our partners like Version1 and Clckwrk who can help you in your journey towards database freedom.
Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...Amazon Web Services
As organizations look to improve application performance and decrease costs, they are increasingly looking to migrate from commercial database engines to open source. Amazon Aurora is a MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. In this webinar, we will cover how to use AWS Database Migration Service (DMS) to carry out the migration and how to use the AWS Schema Conversion Tool to convert schemas for Amazon Aurora. We'll then follow with a quick demo of the entire process and close with tips and best practices.
Learning Objectives:
Understand how AWS Database Migration Service can help you migrate from a commercial database to Amazon Aurora to improve application performance and decrease database costs.
Database Migration – Simple, Cross-Engine and Cross-Platform MigrationAmazon Web Services
Learn about the new AWS Database Migration Service, which helps you migrate databases with minimal downtime from on-premises and Amazon EC2 environments to Amazon RDS, Amazon Redshift, Amazon Aurora and EC2 databases.
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...Amazon Web Services
In this workshop, you migrate a sample sporting event and ticketing database from Oracle or Microsoft SQL Server to Amazon Aurora or PostgreSQL using the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS). The workshop includes the migration of tables, indexes, procedures, functions, constraints, views, and more. We run SCT on an Amazon EC2 Windows instance, so bring a laptop with Remote Desktop (or some other method of connecting to the Windows instance). Ideally, you should be familiar with relational databases, especially Oracle or SQL Server and PostgreSQL or Aurora, to get the most from this session. Additionally, attendees should be familiar with SCT and DMS. Familiarity with SQL Developer and pgAdmin III will be helpful but is not required.
Prerequisites:
- Participants should have an AWS account established and available for use during the workshop.
- Please bring your own laptop.
This session will begin with an introduction to non-relational (NoSQL) databases and compare them with relational (SQL) databases. We will also explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service. Learn the fundamentals of DynamoDB and see the new DynamoDB console first-hand as we discuss common use cases and benefits of this high-performance key-value and JSON document store.
Amazon DynamoDB is a fully managed NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale. This talk explores DynamoDB capabilities and benefits in detail and discusses how to get the most out of your DynamoDB database. We go over schema design best practices with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others. We also explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including JSON document support, Streams, and more.
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...Amazon Web Services
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all of your data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use workload management.
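To give those schema and loading choices some shape, here is a hedged sketch; the cluster, database, user, table, bucket, and IAM role names are all invented, and the distribution and sort keys are just one plausible design for a date-filtered fact table:

import boto3

client = boto3.client("redshift-data")

# A fact table distributed on its join key and sorted on the filter
# column, then bulk-loaded from S3 with COPY.
ddl = """
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(12,2)
)
DISTKEY (customer_id)
SORTKEY (sale_date);
"""

copy = """
COPY sales FROM 's3://example-bucket/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
FORMAT AS CSV;
"""

for sql in (ddl, copy):
    client.execute_statement(
        ClusterIdentifier="example-cluster",
        Database="dev",
        DbUser="awsuser",
        Sql=sql,
    )

Distributing on the join key co-locates matching rows on a slice, while sorting on the date column lets range-restricted scans skip blocks.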
Strategic Uses for Cost Efficient Long-Term Cloud StorageAmazon Web Services
Compared to storing long-term datasets on-premises, archiving in the cloud is a smart alternative whether you're looking for an active archive solution, tape replacement, or a way to fulfill a compliance requirement. Learn how AWS customers are simplifying their archiving strategy and meeting compliance needs using Amazon Glacier. Hear how customers have evolved their backup and disaster recovery architectures and replaced tape solutions by turning to AWS for a more cost-efficient, durable, and agile solution. We will showcase Sony DADC's active archive deployment on Glacier and demo how some of our financial services customers have set up compliant archives to meet their regulatory objectives.
Amazon Aurora is a MySQL-compatible database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. This session introduces you to Amazon Aurora, explains common use cases for the service, and helps you get started with building your first Amazon Aurora–powered application.
by Edin Zulich, NoSQL Solutions Architect, AWS
Explore Amazon DynamoDB capabilities and benefits in detail and learn how to get the most out of your DynamoDB database. We go over best practices for schema design with DynamoDB across multiple use cases, including gaming, IoT, and others. We explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including DynamoDB Accelerator (DAX), DynamoDB Time-to-Live, and more. We also provide lessons learned from operating DynamoDB at scale, including provisioning DynamoDB for IoT. Level: 200
Learning Objectives:
- Learn the capabilities of Amazon DynamoDB in detail
- Learn best practices for schema design with DynamoDB
- Learn use cases for Non-relational (NoSQL) Databases
AWS July Webinar Series - Getting Started with Amazon DynamoDBAmazon Web Services
This webinar provides an overview of Amazon DynamoDB, a fast, flexible, and fully managed NoSQL database service for Mobile, Web, AdTech, IoT, and Gaming applications that need consistent, single-digit millisecond latency at any scale. The webinar will cover key topics around the general architecture of DynamoDB, data types, throughput provisioning, querying and indexing, and recent features.
The webinar includes a live demo of the basic operations used to read and write data to a DynamoDB table, and how the concept of provisioned IO affects the throughput of these operations; a minimal sketch of those calls follows the learning objectives below.
Learning Objectives:
Enable users to understand how DynamoDB works so that they can evaluate and use DynamoDB as the data store for their application
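As a taste of what that demo covers, here is a minimal boto3 sketch (the table and attribute names are made up) of the two basic operations:

import boto3

table = boto3.resource("dynamodb").Table("Users")  # hypothetical table

# A write consumes provisioned write capacity units.
table.put_item(Item={"userId": "alice", "plan": "free", "logins": 1})

# A strongly consistent read consumes twice the read capacity of an
# eventually consistent one.
resp = table.get_item(Key={"userId": "alice"}, ConsistentRead=True)
print(resp.get("Item"))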
Data collection and storage are primary challenges for any big data architecture. In this session, we will describe the different types of data that customers are handling to drive high-scale workloads on AWS, and help you choose the best approach for your workload. We will cover optimization techniques that improve performance and reduce the cost of data ingestion. AWS services to be covered include Amazon S3, DynamoDB, and Kinesis.
by Rajeev Srinivasan, Sr. Solutions Architect, AWS
Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models. Its flexible data model, reliable performance, and automatic scaling of throughput capacity make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications. We'll take a look at how DynamoDB works and how it can be accelerated by DAX, the DynamoDB Accelerator.
Data collection and storage are primary challenges for any big data architecture. This session will focus on the different types of data that customers are handling to drive high-scale workloads on AWS. Our goal is to help you choose the best approach for your workload. We will dive into optimization techniques that improve performance and reduce the cost of data ingestion, and AWS services including Amazon S3, DynamoDB, and Kinesis.
Created by: Mark Korver, Senior Solutions Architect
by Rajeev Srinivasan, Strategic Solutions Architect, AWS
Database Week at the AWS Loft is an opportunity to learn about Amazon's broad and deep family of managed database services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into Amazon RDS and Amazon Aurora relational databases, Amazon DynamoDB non-relational databases, Amazon Neptune graph databases, and Amazon ElastiCache managed Redis, along with options for database migration, caching, search, and more. You will learn how to get started, how to support applications, and how to scale.
Data warehousing is a critical component for analysing and extracting actionable insights from your data. Amazon Redshift allows you to deploy a scalable data warehouse in a matter of minutes and start analysing your data right away using your existing business intelligence tools.
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
You can gain substantially more business insights and save costs by migrating your existing data warehouse to Amazon Redshift. This session will cover the key benefits of migrating to Amazon Redshift, migration strategies, and tools and resources that can help you in the process. We'll learn about AWS Database Migration Service and the AWS Schema Conversion Tool, which were recently enhanced to import data from six common data warehouse platforms.
Data processing and analysis is where big data is most often consumed, driving business intelligence (BI) use cases that discover and report on meaningful patterns in the data. In this session, we will discuss options for processing, analyzing, and visualizing data. We will also look at partner solutions and BI-enabling services from AWS. Attendees will learn about optimal approaches for stream processing, batch processing, and interactive analytics with AWS services such as Amazon Machine Learning, Amazon Elastic MapReduce (EMR), and Amazon Redshift.
Created by: Jason Morris, Solutions Architect
AWS Webcast - Build high-scale applications with Amazon DynamoDBAmazon Web Services
Review this webinar to learn about Amazon DynamoDB. DynamoDB is a highly scalable, fully managed NoSQL database service. Built for consistent single-digit millisecond latency and high availability, DynamoDB is a great fit for gaming, ad-tech, mobile, and many other applications.
Reasons to review:
• Learn the fundamentals of DynamoDB
• Understand how to design for common access patterns
• Discover best practices
• Hear how others use DynamoDB to build their business
Who should review:
• Software Developers
• Database Administrators
• Solution Architects
• Technical Decision Makers
How to build Forecasting services using ML and deep learn...Amazon Web Services
Forecasting is an important process for very many companies, used in many areas to try to accurately predict things such as the growth and distribution of a product, the resources needed on production lines, financial presentations, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session we will show how to pre-process data that contains a temporal component and then use an algorithm that, starting from the type of data analyzed, produces an accurate forecast.
Big Data for Startups: how to create Big Data applications in Server...Amazon Web Services
The variety and quantity of data created every day is accelerating faster and faster and represents an unrepeatable opportunity to innovate and create new startups.
However, managing large amounts of data can seem complex: building large-scale Big Data clusters looks like an investment accessible only to established companies. But the elasticity of the cloud, and serverless services in particular, allows us to break through these limits.
Let's see, then, how to develop Big Data applications quickly, without worrying about infrastructure, devoting all our resources to developing the ideas that create innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session we will present the main features of the service and how to deploy your application in just a few steps.
Twenty years ago, Amazon went through a radical transformation aimed at increasing the pace of innovation. Over this period we learned how changing our approach to application development allowed us to dramatically increase agility and release speed and, ultimately, enabled us to build more reliable and scalable applications. In this session we will explain how we define modern applications and how building modern apps affects not only application architecture but also organizational structure, development release pipelines, and even the operating model. We will also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to spend up to 90% less with containers and Spot Instances Amazon Web Services
Container usage keeps growing.
When properly designed, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can take advantage of Spot Instances, leading to an average saving of 70% compared with On-Demand Instances. In this session we will discover the characteristics of Spot Instances and how they can easily be used on AWS. We will also learn how Spreaker uses Spot Instances to run applications of various kinds, in production, at a fraction of the on-demand cost!
In recent months, many customers have been asking us how to monetise Open APIs, simplify Fintech integrations, and accelerate adoption of various Open Banking business models. Therefore, AWS and FinConecta would like to invite you to the Open Finance marketplace presentation on October 20th.
Event Agenda:
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a Strategic approach
Q&A
Make your startup's offering unique in the market with Machine Lea...Amazon Web Services
To create value and build a differentiated, recognizable offering of their own, successful startups know how to combine established technologies with innovative components built ad hoc.
AWS provides ready-to-use services and, at the same time, lets you customize and build the differentiating elements of your own offering.
Focusing on Machine Learning technologies, we will see how to select from the artificial intelligence services offered by AWS and, partly through a demo, how to build custom Machine Learning models using SageMaker Studio.
OpsWorks Configuration Management: automate the management and deployment of...Amazon Web Services
With the traditional approach to IT, implementing DevOps techniques was difficult for many years; until now they have often involved manual activities, leading from time to time to application downtime that interrupted users' work. With the advent of the cloud, DevOps techniques are now within everyone's reach, at low cost, for any kind of workload, guaranteeing greater system reliability and resulting in significant improvements to business continuity.
AWS offers AWS OpsWorks as a Configuration Management tool that aims to automate and simplify the management and deployment of EC2 instances by means of Chef and Puppet workloads.
Find out how to use AWS OpsWorks to guarantee the reliability of your application running on EC2 instances.
Microsoft Active Directory on AWS to support your Windows WorkloadsAmazon Web Services
Want to know the options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support group policy management, authentication, and authorization. In this session, we will discuss the options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and deploying Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment into the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to detecting fraud or manufacturing defects, image and video analysis powered by artificial intelligence techniques is evolving and being refined at a rapid pace. In this webinar we will explore the possibilities AWS services offer for applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are holding a free virtual event next Wednesday, October 14, from 12:00 to 13:00, dedicated to VMware Cloud™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a broad range of AWS services, taking full advantage of the AWS cloud while protecting your existing VMware investments.
Build your first serverless ledger-based app with QLDB and NodeJSAmazon Web Services
Many companies today build applications with ledger-like functionality, for example to verify the history of credits and debits in banking transactions or to track the supply-chain flow of their products.
At the core of these solutions are ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log, but they are complex and costly tools to manage.
Amazon QLDB removes the need to build complex custom systems by providing a fully managed, serverless ledger database.
In this session we will see how to build a complete serverless application that uses QLDB's capabilities.
With the rise of microservice architectures and rich mobile and web applications, APIs matter more than ever for giving end users a great user experience. In this session we will learn how to tackle modern API design challenges with GraphQL, an open-source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We will dig into several scenarios, seeing how AppSync can help solve these use cases by building modern APIs with real-time and offline data-update capabilities.
We will also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to users of its web portal.
Oracle databases and VMware Cloud™ on AWS: myths to debunkAmazon Web Services
Many organizations take advantage of the cloud by migrating their Oracle workloads, securing significant benefits in agility and cost efficiency.
Migrating these workloads can create complexity while modernizing and refactoring applications, on top of which performance risks can be introduced when moving applications out of local data centers.
In these slides, AWS and VMware experts present simple, practical tips to ease and simplify the migration of Oracle workloads while accelerating the transformation to the cloud; they dive deeper into the architecture and demonstrate how to take full advantage of VMware Cloud™ on AWS.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies running Docker containers through an orchestration layer that controls deployment and lifecycle. In this session we will present the service's main features, reference architectures for different workloads, and the few simple steps needed to quickly migrate one or more of your containers.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our beloved cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premises strategy we may need to apply it to our own infrastructure and make it work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working in practice.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Securing your Kubernetes cluster: a step-by-step guide to success!KatiaHIMEUR1
Today, after several years of existence, with an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been easier to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
2. Agenda
Fundamentals
o NoSQL
o DynamoDB
o Data Modeling
New Features
o TTL
o VPC Endpoints
o Auto Scaling
o DAX
Design Patterns
o Dating Website – DAX, GSIs
o Serverless IoT – TTL, Streams, DAX
Getting Started
o Developer Resources
3. NoSQL foundations
[Diagram: examples of NoSQL storage models – a key-value store (key 0000 → {"Texas"}, 0001 → {"Illinois"}, 0002 → {"Oregon"}), a column-family layout, and a key-value record (0000-0000-0000-0001 → Game Heroes, Version 3.4, CRC ADE4)]
NoSQL database families: Key-value, Graph, Document, Column-family
Timeline: Dynamo, "Amazon's Highly Available Key-value Store" (Fall 2007) → Amazon SimpleDB (Late 2007) → Amazon DynamoDB (January 2012)
4. Scaling relational vs. non-relational databases
Traditional SQL: scale up – move the database to a bigger and bigger host.
NoSQL: scale out – spread the database across many hosts (Host 1 … Host n), one shard per host (in DynamoDB: partitions).
5. Scaling NoSQL
- A good sharding (partitioning) scheme affords even distribution of both data and workload as they grow
- Key concept: partition key
- Ideal scaling conditions:
1. Partition key is from a high-cardinality set (that grows)
2. Requests are evenly spread over the key space
3. Requests are evenly spread over time
6. Hot key problem in NoSQL
[Diagram: shards 1, 2, 3 … n, with a single item {k=X, v=Y} on one shard receiving most of the traffic]
Extremely unbalanced data or request distribution
9. Why NoSQL?
• Massive scale – at an affordable price
• Predictable, low latency – regardless of the scale or load
• Flexible schema – e.g., in DynamoDB, key-value pairs and JSON documents stored in the same table do not need to be identical in form
Why not NoSQL?
• If you need ad hoc queries – use SQL databases
Polyglot persistence
• Use different databases, depending on how data is used
10. Use cases
• Market orders
• Tokenization (PHI, credit cards)
• Chat messages
• User profiles
• IoT sensor data & device status
• File metadata
• Social media feeds
• Shopping cart
• Sessions
12. Use case: DataXu's attribution engine
[Architecture diagram: 1st- and 3rd-party data lands in S3 buckets (via AWS Direct Connect); AWS Data Pipeline drives an Amazon EMR job whose metadata lives in DynamoDB; the stack runs in an Amazon VPC on Amazon EC2 alongside Amazon RDS, Amazon SNS, Amazon CloudWatch, and AWS IAM]
"Attribution" is the marketing term for the allocation of credit to individual advertisements that eventually lead to a desired outcome (e.g., purchase).
14. Amazon DynamoDB
• Highly available and durable
• Consistently fast at any scale
• Fully managed
• Secure
• Cost-effective
• Integrates with AWS Lambda, Amazon Redshift, and more
15. What's new
• Cost Allocation Tagging
• Time-to-Live (TTL)
• DynamoDB Accelerator (DAX)
• Auto Scaling
• VPC endpoints
• AWS Database Migration Service (DMS) connector for data migration from MongoDB to DynamoDB
16. Highly available and durable
[Diagram: the CustomerOrdersTable spread across nine hosts (Host 1–9) in Availability Zones A, B, and C; each zone holds a copy of Partitions A, B, and C]
Data is always replicated to three Availability Zones: 3-way replication.
An item with OrderId: 1, CustomerId: 1, ASIN: [B00X4WHP5E] hashes to Hash(1) = 7B, which lands in Partition A.
18. Consistently fast at any scale
[Chart: latency (milliseconds) stays at consistent single-digit milliseconds as requests climb into the millions]
22. Secure
Fully integrated with AWS Identity and Access Management (IAM) for authentication and access control.
Provides fine-grained access control at the table, item, or attribute level.
Integrated with AWS CloudTrail to capture changes to DynamoDB configuration and table setup.
Integrated with Amazon CloudWatch to measure metrics around DynamoDB performance and set alarms to track specific events.
25. Cost-effective
- Perpetual free tier: 25 GB of storage, plus 25 writes and 25 reads per second
- Pay-as-you-grow for capacity and storage independently
- Auto Scaling
- Time-to-Live (TTL)
  - Automatically purges data at no extra charge
  - (Deleting tables doesn't incur charges either)
- Cost Allocation Tagging
- DynamoDB Accelerator (DAX)
  - Can help reduce the cost of reads in read-heavy applications
28. Local secondary indexes
• Alternate sort key attribute
• Index is local to a partition key
• 10 GB max per partition key, i.e., LSIs limit the number of sort keys!
[Schema diagrams: the same partition key A1 paired with an alternate sort key – A3, A4, or A5 – with the remaining attributes (A2–A5) projected alongside]
29. Global secondary indexes
• Alternate partition (+sort) key
• Sparse
• Can be added or removed anytime
• Reads and writes are provisioned separately for GSIs
[Schema diagrams: A3 as the index partition key with A1 as the table key, shown with the three projection types – ALL (A2, A4, A7 projected), INCLUDE A2, and KEYS_ONLY]
30. Data types
Type             DynamoDB Type
String           String
Integer, Float   Number
Timestamp        Number or String
Blob             Binary
Boolean          Bool
Null             Null
List             List
Set              Set of String, Number, or Binary
Map              Map
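To make the mapping concrete, here is a minimal sketch (Python/boto3; the table name and item contents are made up for illustration) of how these types appear in the low-level DynamoDB API:

```python
# Hypothetical item showing each type tag from the table above.
import boto3

client = boto3.client("dynamodb")
client.put_item(
    TableName="Products",  # hypothetical table
    Item={
        "ProductId": {"S": "1"},                 # String
        "Price":     {"N": "29.99"},             # Number (always sent as a string)
        "InStock":   {"BOOL": True},             # Bool
        "Thumbnail": {"B": b"\x89PNG..."},       # Binary (blob)
        "Discount":  {"NULL": True},             # Null
        "Tags":      {"SS": ["book", "scifi"]},  # Set of String
        "Ratings":   {"L": [{"N": "5"}, {"N": "4"}]},          # List
        "Detail":    {"M": {"author": {"S": "John Smith"}}},   # Map
    },
)
```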
31. Table creation options (CreateTable)
Required:
o TableName – unique to the Region
o Partition key – AttributeName and type [S, N, B] (String, Number, or Binary ONLY)
o Provisioned Reads: 1+ (per second)
o Provisioned Writes: 1+ (per second)
o Auto Scaling: on (default)
Optional:
o Sort key – AttributeName and type [S, N, B]
o LSI schema
o GSI schema – with its own Provisioned Reads (1+) and Provisioned Writes (1+)
o Stream
o Time-to-Live
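A minimal sketch of these options with boto3 (Python); the table and attribute names are illustrative, and Auto Scaling would be configured separately through Application Auto Scaling:

```python
import boto3

client = boto3.client("dynamodb", region_name="us-east-1")

client.create_table(
    TableName="SensorData",                       # must be unique per Region
    AttributeDefinitions=[                        # key attributes: S, N, or B only
        {"AttributeName": "DeviceId", "AttributeType": "S"},
        {"AttributeName": "Timestamp", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "DeviceId", "KeyType": "HASH"},    # partition key (required)
        {"AttributeName": "Timestamp", "KeyType": "RANGE"},  # sort key (optional)
    ],
    ProvisionedThroughput={                       # per-second capacity, 1+
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 5,
    },
    StreamSpecification={                         # optional stream
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)

# TTL is enabled separately, once the table exists:
client.update_time_to_live(
    TableName="SensorData",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "MyTTL"},
)
```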
32. Provisioned throughput capacity
Per table/GSI; capacity is per second, rounded up to the next whole number.
Read Capacity Unit (RCU): 1 RCU returns 4 KB of data for strongly consistent reads, or double the data for eventually consistent reads.
Write Capacity Unit (WCU): 1 WCU writes 1 KB of data, and each item consumes 1 WCU minimum.
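A back-of-the-envelope sketch of the rounding rules above (plain Python; the item sizes are made up):

```python
import math

def rcu_per_read(item_kb, strongly_consistent=True):
    units = math.ceil(item_kb / 4)     # 1 RCU = one 4 KB strongly consistent read/sec
    return units if strongly_consistent else units / 2  # eventually consistent: half cost

def wcu_per_write(item_kb):
    return max(1, math.ceil(item_kb))  # 1 WCU = one 1 KB write/sec, 1 WCU minimum

print(rcu_per_read(6))         # 2 RCU: 6 KB rounds up to two 4 KB units
print(rcu_per_read(6, False))  # 1.0 RCU for the same read, eventually consistent
print(wcu_per_write(2.5))      # 3 WCU: 2.5 KB rounds up to 3
```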
33. Burst capacity is built in
[Chart: provisioned vs. consumed capacity units over time – the table "saves up" unused capacity, then consumes the saved-up capacity during a spike]
DynamoDB "saves" 300 seconds of unused capacity per partition.
Burst: 300 seconds (e.g., 1200 CU provisioned × 300 s = 360k CU of burst)
34. Burst capacity may not be sufficient
[Chart: provisioned vs. consumed vs. attempted capacity – once the 300-second burst allowance (1200 × 300 = 360k CU) is exhausted, further requests are throttled]
Don't completely depend on burst capacity… provision sufficient throughput.
35. Throttling
- Occurs if sustained throughput goes beyond the provisioned throughput per partition
- Possible causes:
  • Non-uniform workloads
  • Hot keys/hot partitions
  • Very large items
  • Mixing hot data with cold data (remedy: use TTL or a table per time period)
- Disable SDK retries, write your own retry code, and log all throttled or returned keys
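One possible shape for that last bullet, sketched with boto3 (the backoff schedule, table name, and retry count are assumptions, not from the talk):

```python
import time, logging
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# Turn off the SDK's built-in retries so we control (and log) them ourselves.
client = boto3.client("dynamodb", config=Config(retries={"max_attempts": 0}))
log = logging.getLogger("throttles")

def put_with_logging(table, item, attempts=5):
    for attempt in range(attempts):
        try:
            return client.put_item(TableName=table, Item=item)
        except ClientError as e:
            if e.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            # Log every throttled key so hot keys show up in the logs.
            log.warning("throttled: table=%s item=%s attempt=%d", table, item, attempt)
            time.sleep(2 ** attempt * 0.05)  # exponential backoff
    raise RuntimeError("gave up after repeated throttling")
```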
37. Partitioning
[Diagram: the Orders table keyspace, Hash.MIN = 00 to Hash.MAX = FF, divided evenly at 55 and AA among Partitions A, B, and C – each holding 33.33% of the keyspace and 33.33% of the provisioned capacity]
Split for partition size: when one partition (here, Partition C) surpasses the desired size, it splits into two (Partitions D and E, 16.66% each) while the others are unchanged. The desired size of a partition is 10 GB* and when a partition surpasses this it can split.
Split for provisioned capacity: when the table's capacity is increased, every partition splits – Partitions A, B, and C become six partitions (A–F) of 16.66% each. The desired capacity of a partition is expressed as 3w + 1r < 3000*, where w = WCU and r = RCU.
* = subject to change
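These two split rules imply a rough partition-count estimate; a sketch of that heuristic in Python (both limits are subject to change, as noted above):

```python
import math

def estimated_partitions(size_gb, wcu, rcu):
    by_size = math.ceil(size_gb / 10)                # split as partitions near 10 GB
    by_capacity = math.ceil((3 * wcu + rcu) / 3000)  # split as 3w + 1r exceeds 3000
    return max(by_size, by_capacity)

# 25 GB at 1000 WCU / 2000 RCU: max(3, 2) -> roughly 3 partitions,
# each serving about a third of the provisioned capacity.
print(estimated_partitions(25, 1000, 2000))
```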
38. DynamoDB Streams
• Ordered stream of item changes
• Exactly once, strictly ordered by key
• Highly durable, scalable
• 24-hour retention
• Sub-second latency
• Compatible with the Amazon Kinesis Client Library (KCL)
[Diagram: updates to the DynamoDB table's Partitions A, B, and C flow into DynamoDB Stream shards, which KCL workers in the application read via GetRecords. Shards have a lineage and automatically close after time or when the associated DynamoDB partition splits.]
39. DynamoDB Streams and Triggers
Triggers are implemented as AWS Lambda functions:
• Scale automatically
• C#, Java, Node.js, Python
[Diagram: a Lambda function triggered by the stream fans changes out to services such as Amazon SNS, Amazon ES, and Amazon ElastiCache]
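A minimal sketch of such a trigger in Python; the SNS topic ARN and the insert-only filter are hypothetical choices, not from the talk:

```python
import boto3

sns = boto3.client("sns")

def handler(event, context):
    # Each invocation delivers a batch of stream records.
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            new_image = record["dynamodb"]["NewImage"]  # low-level attribute format
            sns.publish(
                TopicArn="arn:aws:sns:us-east-1:123456789012:item-changes",  # hypothetical
                Message=str(new_image),
            )
```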
40. Cost allocation tagging
Features:
• Track costs: AWS bills broken down by tags in detailed monthly bills and Cost Explorer
• Flexible: add customizable tags to tables, indexes, and DAX clusters
Key benefits:
• Transparency: know exactly how much your DynamoDB resources cost
• Consistent: report of spend across AWS services
41. Time-to-Live (TTL)
An epoch timestamp attribute marking when an item can be deleted, without consuming any provisioned capacity.
• Removes data that is no longer relevant
• Doesn't consume capacity – a background TTL job purges expired items
• TTL deletes are identifiable in DynamoDB Streams, so expired items can be archived (e.g., via Amazon Kinesis to Amazon S3 or Amazon Redshift)
• Configuration protected by AWS IAM, auditable with AWS CloudTrail
[Diagram: a CustomerActiveOrder item (OrderId: 1, CustomerId: 1, MyTTL: 1492641900) expires, and the TTL delete flows through the DynamoDB stream to downstream consumers]
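Writing an item with a TTL attribute is just an ordinary write that includes the epoch timestamp; a boto3 sketch reusing the slide's CustomerActiveOrder example (the 90-day window and key schema are assumptions):

```python
import time
import boto3

table = boto3.resource("dynamodb").Table("CustomerActiveOrder")
table.put_item(
    Item={
        "OrderId": 1,
        "CustomerId": 1,
        # Epoch seconds: the item becomes eligible for deletion ~90 days out,
        # at no capacity cost when the TTL job removes it.
        "MyTTL": int(time.time()) + 90 * 24 * 3600,
    }
)
```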
42. Auto Scaling
Features:
• Fully managed, automatic, independent scaling of read and write capacity of base tables and global secondary indexes
• Set only a target utilization % and min/max limits
• Accessed from the management console, CLI, and SDK
Key benefits:
• Removes the guesswork from provisioning adequate capacity
• Increases capacity as application requests increase, ensuring performance
• Decreases capacity as application requests fall, reducing costs
• Full visibility into scaling activities from the console
[Charts: consumed vs. provisioned capacity, with and without Auto Scaling]
44. DynamoDB Accelerator (DAX)
[Diagram: your applications talk to DynamoDB Accelerator, which sits in front of DynamoDB]
Features:
• Fully managed, highly available
• DynamoDB API compatible
• Write-through
• Flexible – use for one or multiple tables
• Scales out to up to 10 read replicas
• Fully integrated AWS service
• Secure
Key benefits:
• Read performance and scale: microsecond response times at millions of reads/sec from a single DAX cluster
• Lower costs: reduce provisioned read capacity for DynamoDB tables with hot data
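Because DAX is DynamoDB API compatible, adopting it is mostly a client swap; a sketch using the amazon-dax-client Python package (the cluster endpoint URL and table keys are hypothetical):

```python
from amazondax import AmazonDaxClient

# Same boto3-style resource interface, served write-through from the DAX cluster.
dax = AmazonDaxClient.resource(
    endpoint_url="daxs://my-cluster.abc123.dax-clusters.us-east-1.amazonaws.com"  # hypothetical
)
table = dax.Table("Likes")

# Cache hits return in microseconds; misses fall through to DynamoDB.
resp = table.get_item(Key={"user_id": "123", "user_id_liked": "456"})
```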
45. DynamoDB in the VPC
[Diagram: web/app servers in private subnets across two Availability Zones, each behind a security group, reaching DynamoDB through a VPC endpoint and through DAX nodes inside the VPC]
VPC endpoints – DynamoDB-in-the-VPC:
o IAM resource policy restricted
DAX:
o Role-based access control
o No IGW or VPC endpoint required
o Private IP, client-side discovery
47. Data modeling: hierarchical data structures as items
• Use a composite sort key to define a hierarchy
• Highly selective result sets with sort queries
• Index anything, scales to any size
Primary key (ProductID + type)   Attributes
1, bookID            title: Some Book, author: John Smith, genre: Science Fiction, publisher: Ballantine, datePublished: Oct-70, ISBN: 0-345-02046-4
2, albumID           title: Some Album, artist: Some Band, genre: Progressive Rock, label: Harvest, studio: Abbey Road, released: 3/1/73, producer: Somebody
2, albumID:trackID   title: Track 1, length: 1:30, music: Mason, vocals: Instrumental
2, albumID:trackID   title: Track 2, length: 2:43, music: Mason, vocals: Mason
2, albumID:trackID   title: Track 3, length: 3:30, music: Smith, vocals: Johnson
3, movieID           title: Some Movie, genre: Scifi Comedy, writer: Joe Smith, producer: 20th Century Fox
3, movieID:actorID   name: Some Actor, character: Joe, image: img2.jpg
3, movieID:actorID   name: Some Actress, character: Rita, image: img3.jpg
3, movieID:actorID   name: Some Actor, character: Frito, image: img1.jpg
48. … or as documents (JSON)
• JSON data types (M, L, BOOL, NULL)
• Document SDKs available
• 400 KB maximum item size (limits hierarchical data structure)
Primary key (ProductID)   Attributes
1 (bookID): title: Some Book, author: Some Guy, genre: Science Fiction, publisher: Ballantine, datePublished: Oct-70, ISBN: 0-345-02046-4
2 (albumID): title: Some Album, artist: Some Band, genre: Progressive Rock, plus a document attribute:
  {label: "Harvest", studio: "Abbey Road", published: "3/1/73", producer: "Pink Floyd",
   tracks: [{title: "Speak to Me", length: "1:30", music: "Mason", vocals: "Instrumental"},
            {title: "Breathe", length: "2:43", music: "Waters, Gilmour, Wright", vocals: "Gilmour"},
            {title: "On the Run", length: "3:30", music: "Gilmour, Waters", vocals: "Instrumental"}]}
3 (movieID): title: Some Movie, genre: Scifi Comedy, writer: Joe Smith, plus a document attribute:
  {producer: "20th Century Fox",
   actors: [{name: "Luke Wilson", dob: "9/21/71", character: "Joe Bowers", image: "img2.jpg"},
            {name: "Maya Rudolph", dob: "7/27/72", character: "Rita", image: "img1.jpg"},
            {name: "Dax Shepard", dob: "1/2/75", character: "Frito Pendejo", image: "img3.jpg"}]}
49. Dating website
DESIGN PATTERNS: DynamoDB Accelerator and Global Secondary Indexes
Online dating service:
• Users have people they like, and people who like them
• An hourly batch job matches users
• Data stored in Likes and Matches tables
50. Schema design, part 1
LIKES | Access patterns:
1. Get all people I like
2. Get all people that like me
3. Expire likes after 90 days
Likes table:
  user_id (partition key), user_id_liked (sort key), MyTTL (TTL attribute), … Attribute N
GSI_LikedBy:
  user_id_liked (partition key), user_id (sort key)
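The first two access patterns map directly onto a table query and a GSI query; a boto3 sketch using the names above (the user id value is made up):

```python
import boto3
from boto3.dynamodb.conditions import Key

likes = boto3.resource("dynamodb").Table("Likes")

# 1. All people I like: query the base table by my user_id.
i_like = likes.query(KeyConditionExpression=Key("user_id").eq("123"))

# 2. All people that like me: query the GSI that inverts the key pair.
likes_me = likes.query(
    IndexName="GSI_LikedBy",
    KeyConditionExpression=Key("user_id_liked").eq("123"),
)
```

Pattern 3 needs no query at all: the MyTTL attribute lets DynamoDB expire old likes on its own.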
51. Schema design, part 2
MATCHES | Access patterns: get my matches – one user in a match (GSI Left), and the other user (GSI Right)
Matches table:
  event_id (partition key), timestamp, UserIdLeft (GSI left), UserIdRight (GSI right), Attribute N
GSI Left:
  UserIdLeft (partition key), event_id (table key), timestamp, UserIdRight
GSI Right:
  UserIdRight (partition key), event_id (table key), timestamp, UserIdLeft
52. Matchmaking
1. Get all new likes every hour
2. For each like, get the other user's likes
3. Store matches in the Matches table
[Diagram: a matchmaking server in an Auto Scaling group (public subnet, behind a security group) reads the LIKES table across Partitions 1…N]
53. Matchmaking (continued)
[Same diagram – but the hourly burst of queries against the LIKES table partitions causes a THROTTLE!]
54. Matchmaking
The hourly job (get all new likes, get the other user's likes, store matches) must respect even access:
1. Key choice: high key cardinality
2. Uniform access: access is evenly spread over the key space
3. Time: requests arrive evenly spaced in time
55. Matchmaking with DAX
0. Write each like to the Likes table, then query by user id to warm the cache, then queue for batch processing
1. Get all new likes every hour
2. For each like, get the other user's likes
3. Store matches in the Matches table
[Diagram: the matchmaking Auto Scaling group now reads through a DAX cluster (in its own security group) in front of the LIKES table partitions]
56. Dating website
DESIGN PATTERNS: DynamoDB Accelerator and GSIs
Takeaways:
• Use GSIs for many-to-many relationships
• Use DAX for read-heavy access
• Keep DAX warm by querying after writing
• Use DynamoDB Streams for event processing
57. Serverless IoT
DESIGN PATTERNS: time series with TTL, DynamoDB Streams, write-sharding, and DAX
• Store sensor data in a DynamoDB table
• Age out data older than 90 days to S3
58. Schema design
DATA | Access patterns:
1. Get all events for a device
2. Archive old events after 90 days
Data table:
  DeviceId (partition key), Timestamp (sort key), MyTTL (TTL attribute), … Attribute N
USERDEVICES | Access pattern:
1. Get all devices for a user
UserDevices table:
  UserId (partition key), DeviceId (sort key), Attribute 1 … Attribute N
59. Serverless IoT
• Single DynamoDB table (DATA) for storing sensor data
• Tiered storage to archive old events to S3
[Diagram: an expiring item (DeviceId: 1, Timestamp: 1492641900, MyTTL: 1492736400) flows from Amazon DynamoDB through Amazon DynamoDB Streams to AWS Lambda, which writes it to an Amazon S3 bucket; the USERDEVICES table sits alongside]
60. Serverless IoT: throttling
[Diagram: the DATA table's Partitions A–D, with one partition throttling]
A noisy sensor produces data at a rate several times greater than the others.
61. Data table keyspace
[Diagram: keyspace Hash.MIN = 00 to Hash.MAX = FF split at 3F, 7F, and BF into Partitions A–D, each holding 25.0% of the keyspace and 25.0% of the provisioned capacity]
Even access:
1. Key choice: high key cardinality
2. Uniform access: access is evenly spread over the key space
3. Time: requests arrive evenly spaced in time
62. Serverless IoT
Requirements:
1. Single DynamoDB table for storing sensor data
2. Tiered storage to archive old events to S3
3. Data stored in the Data table
0. Capable of dynamically sharding to overcome throttling
63. Schema design
SHARD | Access patterns:
1. Get the shard count for a given device
2. Always grow the count of shards
Shard table:
  DeviceId (partition key), ShardCount (range: 0..1,000)
DATA | Access patterns:
1. Get all events for a device
2. Archive old events after 90 days
Data table:
  DeviceId (partition key), Timestamp (sort key), MyTTL (TTL attribute), … Attribute N
Dynamic sharding: the number of shards is not predefined, and may grow over time but never contract. Contrast with a fixed shard count.
64. Serverless IoT: write sharding
Request path:
1. Read ShardCount from the Shard table (SHARD: DeviceId: 1, ShardCount: 10)
2. Write to a random shard (DATA: DeviceId_ShardId: 1_3, Timestamp: 1492641900, MyTTL: 1492736400)
3. If throttled, review the shard count
Expired items still age out via the MyTTL attribute.
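The request path, sketched in Python with boto3 (assuming the DATA table's partition key is the combined DeviceId_ShardId string, as in the slide's example):

```python
import random
import boto3

ddb = boto3.resource("dynamodb")
shard_table = ddb.Table("Shard")
data_table = ddb.Table("Data")

def write_reading(device_id, timestamp, ttl):
    # 1. Read the shard count for this device (a hot read – a good fit for DAX).
    count = int(shard_table.get_item(Key={"DeviceId": device_id})["Item"]["ShardCount"])
    # 2. Write to a random shard: device 1 with 10 shards becomes e.g. "1_3".
    data_table.put_item(Item={
        "DeviceId_ShardId": f"{device_id}_{random.randrange(count)}",
        "Timestamp": timestamp,
        "MyTTL": ttl,
    })
    # 3. If this write is throttled, grow ShardCount (never shrink it) and retry.
```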
65. Serverless IoT
[Diagram: the writer first reads the shard count (1. SHARD: DeviceId: 1, ShardCount: 10), then picks a random shard to write the data to (2. DeviceId_ShardId: 1_Rand(0,10), Timestamp: 1492641900, MyTTL: 1492736400), landing on one of Partitions A–D]
66. Serverless IoT
• Single DynamoDB table (DATA) for storing sensor data
• Tiered storage to archive old events to S3
• Capable of dynamically sharding to overcome throttling
[Diagram: expiring items (DeviceId: 1, Timestamp: 1492641900, MyTTL: 1492736400) flow through the DynamoDB stream to AWS Lambda and on to an Amazon S3 bucket, optionally via Amazon Kinesis Firehose; the SHARD table (DeviceId: 1, ShardCount: 10) is fronted by DAX; USERDEVICES sits alongside]
67. Serverless IoT: alternative approach
Queue-based load leveling:
• Sensor readings (e.g., DeviceId: 123, Timestamp: 1492641900, Temp: 172, MyTTL: 1492736400) go to a Kinesis stream first
• An AWS Lambda function drains the stream and writes to the DATA table's partitions
• Group multiple data points into a single item – save on writes
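A sketch of the grouping idea as a Python Lambda function draining the Kinesis stream (the record payload shape and attribute names are assumptions):

```python
import base64, json
from collections import defaultdict
from decimal import Decimal
import boto3

table = boto3.resource("dynamodb").Table("Data")  # table name as in the slides

def handler(event, context):
    # Coalesce the batch of readings per device before writing.
    by_device = defaultdict(list)
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        point = json.loads(payload, parse_float=Decimal)  # boto3 needs Decimal, not float
        by_device[point["DeviceId"]].append(point)
    for device_id, points in by_device.items():
        table.put_item(Item={
            "DeviceId": device_id,
            "Timestamp": points[0]["Timestamp"],  # batch keyed by its first reading
            "MyTTL": points[0]["MyTTL"],
            "Points": points,                     # many data points, one write
        })
```

Since WCUs are billed per 1 KB item (1 WCU minimum), packing many small readings into one item can cut the write bill substantially.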
68. Serverless IoT
DESIGN PATTERNS: TTL, DynamoDB Streams, and DAX
Takeaways:
• Avoid hot partitions using write sharding or queue-based load leveling
• Use DAX for hot reads, especially from Lambda
• Use TTL to create tiered storage