This document discusses Amazon DynamoDB, a fully managed NoSQL database service. Some key points:
- DynamoDB lets developers offload operational tasks such as capacity provisioning, scaling, and patching to AWS, which simplifies development and reduces costs.
- The document outlines DynamoDB's data model, including tables, items, attributes, and indexes, and explains how DynamoDB automatically partitions and distributes data by hash key to enable massive scale.
- Various AWS services that integrate with DynamoDB for workloads such as search, analytics, and caching are shown, along with best practices for data modeling, queries, and system design.
3. One Database for All Workloads
Figure: a traditional architecture in which a client tier and an app/web tier funnel every workload (key-value access, complex queries, transactions, analytics) into a single RDBMS.
4. Cloud Data Tier Architecture
Figure: the client and app/web tiers sit atop a data tier composed of specialized stores: search, cache, blob store, RDBMS, NoSQL, and data warehouse.
5. Workload Driven Data Store Selection
Each store in the data tier is matched to the workload it serves best:
• NoSQL: key/value access and simple queries
• RDBMS: complex queries and transactions
• Cache: hot reads
• Search: rich search
• Blob store: logging
• Data warehouse: analytics
6. AWS Services for the Data Tier
The same workload-to-store mapping, with the AWS service for each:
• Amazon DynamoDB (NoSQL): key/value access and simple queries
• Amazon RDS (RDBMS): complex queries and transactions
• Amazon ElastiCache (cache): hot reads
• Amazon S3 (blob store): logging
• Amazon Redshift (data warehouse): analytics
• Amazon CloudSearch (search): rich search
7. RDBMS = Default Choice
Relational Era @ Amazon.com
• The Amazon.com page is composed of responses from thousands of independent services
• Query patterns differ from service to service:
  Catalog service is usually heavy key-value
  Ordering service is very write intensive (key-value)
  Catalog search has a different query pattern
• Forcing everything through an RDBMS brought poor availability, limited scalability, and high cost
8. Dynamo = NoSQL Technology
Distributed Era @ Amazon.com
• Replicated DHT with consistency management
• Consistent hashing
• Optimistic replication
• “Sloppy quorum”
• Anti-entropy mechanisms
• Object versioning
Drawbacks: lack of strong consistency, operational complexity, and every engineer needs to learn distributed systems.
9. DynamoDB = NoSQL Cloud Service
Cloud Era @ Amazon.com
• Non-relational
• Fast & predictable performance
• Seamless scalability
• Easy administration
12. Massive and Seamless Scale
• A table is divided into partitions 1..N; DynamoDB automatically partitions data by the hash key
  The hash key spreads data (and workload) across partitions
• Auto-partitioning occurs with:
  Data set size growth
  Provisioned capacity increases
• A large number of unique hash keys + uniform distribution of workload across hash keys = an app that is ready to scale (see the sketch below)
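To make this concrete, here is a minimal Python sketch of how hashing a key spreads items across partitions. The MD5 hash and the fixed partition count are illustrative assumptions only; DynamoDB does not expose its internal partitioning scheme.

import hashlib
from collections import Counter

def partition_for(hash_key, num_partitions):
    # Hash the key and map it onto one of the partitions
    digest = hashlib.md5(hash_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Many unique, uniformly accessed hash keys spread the workload evenly:
counts = Counter(partition_for("user-%d" % i, 8) for i in range(10000))
print(counts)  # roughly 1250 items per partition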
13. Making life easier for developers…
Automated Operations: developers no longer have to handle
• Performance tuning (latency)
• 3-way multi-AZ replication (automatic)
• Scalability (and scaling operations)
• Security inspections, patches, and upgrades
• Software upgrades and patches
• Hardware failover (automatic)
• Improving the underlying hardware
• …and lots of other stuff
14. Provisioned Throughput
Predictable Performance
• Request-based capacity provisioning model
• Throughput is declared and updated via the API or the console (see the SDK sketch below):
  CreateTable (foo, reads/sec = 100, writes/sec = 150)
  UpdateTable (foo, reads/sec = 10000, writes/sec = 4500)
• DynamoDB handles the rest:
  Capacity is reserved and available when needed
  Scaling up triggers repartitioning and reallocation
  No impact to performance or availability
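A minimal sketch of the same two calls using the Python SDK (boto3); the table name foo matches the pseudo-calls above, while the Id key schema is an assumption for illustration.

import boto3

client = boto3.client("dynamodb")

# Declare throughput at creation time: 100 reads/sec, 150 writes/sec
client.create_table(
    TableName="foo",
    AttributeDefinitions=[{"AttributeName": "Id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "Id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 150},
)

# Scale the same table up later; DynamoDB repartitions behind the scenes
client.update_table(
    TableName="foo",
    ProvisionedThroughput={"ReadCapacityUnits": 10000, "WriteCapacityUnits": 4500},
)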
15-16. Durable & Low Latency At Scale
WRITES
• Continuously replicated to 3 AZs
• Quorum acknowledgment
• Persisted to disk (custom SSD)
READS
• Strongly or eventually consistent (chosen per request; see the sketch below)
• No trade-off in latency
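A minimal boto3 sketch of choosing read consistency per request (table and key names are assumptions). Eventually consistent reads, the default, cost half as much as strongly consistent ones.

import boto3

table = boto3.resource("dynamodb").Table("foo")

# Default: eventually consistent read
eventual = table.get_item(Key={"Id": "item-1"})

# Strongly consistent read: sees all previously acknowledged writes
strong = table.get_item(Key={"Id": "item-1"}, ConsistentRead=True)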
18. “DynamoDB has scaled effortlessly to match our company's explosive growth, doesn't burden our operations staff, and integrates beautifully with our other AWS assets.”
“I love how DynamoDB enables us to provision our desired throughput, and achieve low latency and seamless scale, even with our constantly growing workloads.”
19. Fast Development
Weatherbug mobile app
• Lightning detection & alerting for 40M users/month
• Developed and tested in weeks, at “1/20th of the cost of the traditional DB approach”
Super Bowl promotion
• Millions of interactions over a relatively short period of time
• Built the app in 3 days, from design to production-ready
20. Cost Effective
Save Money, Reduce Effort
• “Our previous NoSQL database required almost a full time administrator to run. Now AWS takes care of it.”
• “Being optimized at AdRoll means we spend more every month on snacks than we do on DynamoDB – and almost nothing on an ops team”
27. Hash = Distribution Key
Hash keys:
• are mandatory for all items in a table
• support the key-value access pattern
• determine data distribution across partitions 1..N
28. Hash = Distribution Key
Optimal schema design: a large number of unique hash keys + uniform distribution of workload across hash keys.
29. Range = Query
Range keys (the range portion of a hash + range composite primary key):
• model 1:N relationships
• enable rich query capabilities
Queries can retrieve all items for a hash key, or filter the range with ==, <, >, >=, <=, “begins with”, and “between”; results come back sorted, with counts, top/bottom N values, and paged responses (see the SDK sketch below).
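A boto3 sketch of these range-key queries; the UserEvents table with hash key UserId and range key Timestamp is an assumed schema for illustration.

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("UserEvents")

# All items for one hash key, sorted newest-first, one page of 25 results
resp = table.query(
    KeyConditionExpression=Key("UserId").eq("user-1"),
    ScanIndexForward=False,  # descending sort: top-N / most recent items
    Limit=25,
)

# Range conditions such as "between" (==, <, >, begins_with work similarly)
resp = table.query(
    KeyConditionExpression=Key("UserId").eq("user-1")
    & Key("Timestamp").between("2014-01-01", "2014-02-01"),
)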
30. Index Options
Local secondary indexes (LSI):
• alternate range key + same hash key
• index and table data are co-located (same partition)
A table-creation sketch follows below.
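A sketch of declaring an LSI at table creation with boto3 (all names are illustrative assumptions). The index keeps the table's hash key and swaps in an alternate range key.

import boto3

boto3.client("dynamodb").create_table(
    TableName="UserEvents",
    AttributeDefinitions=[
        {"AttributeName": "UserId", "AttributeType": "S"},
        {"AttributeName": "Timestamp", "AttributeType": "S"},
        {"AttributeName": "EventType", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "UserId", "KeyType": "HASH"},
        {"AttributeName": "Timestamp", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[{
        "IndexName": "ByEventType",
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},      # same hash key
            {"AttributeName": "EventType", "KeyType": "RANGE"},  # alternate range key
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
)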
35. Simple API
Currently 13 operations in total.
• Manage Tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
• Read and Write Items: PutItem, GetItem, UpdateItem, DeleteItem
• Read and Write Multiple Items: BatchGetItem, BatchWriteItem, Query, Scan
36. Data Types
• Scalar data types
  String (S): Unicode with UTF-8 binary encoding
  Number (N): up to 38 digits of precision, in the range 10^-128 to 10^+126
  Variable-width encoding can occupy up to 21 bytes
• Multi-valued types
  String Set (SS)
  Number Set (NS)
  Sets are not ordered
(see the encoding sketch below)
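For illustration, this is how an item using these types looks in DynamoDB's low-level attribute encoding (the item itself is made up). Numbers travel as strings so the full 38 digits of precision survive transport.

item = {
    "Id":      {"N": "120"},                      # Number
    "Title":   {"S": "Book 120 Title"},           # String, UTF-8 encoded
    "Authors": {"SS": ["Author12", "Author22"]},  # String Set (not ordered)
    "Scores":  {"NS": ["1", "2.5", "3"]},         # Number Set (not ordered)
}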
37. Indexing & Partitioning
• Data is indexed by the primary key
• Single hash key: targeted towards object persistence
• Hash + range composite key: a sorted collection within the hash bucket; can store a series of events for a given entity
• Automatic partitioning: the leading hash key spreads data & workload across partitions, so traffic is scaled out and parallelized
38. Other Features
• Consistent reads: inventory, shopping cart applications
• Atomic counters: increment and return the new value in the same operation
• Conditional writes: expected value checked before the write, which fails on mismatch; “state machine” use cases
• Sparse indexes: ideal for sorted lists; fast access to a subset of items; popular for identifying recently updated items, top lists, and leaderboards
(a counter and conditional-write sketch follows below)
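A boto3 sketch of an atomic counter and a conditional write (table, key, and attribute names are assumptions).

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Counters")

# Atomic counter: increment and return the new value in the same operation
resp = table.update_item(
    Key={"Id": "page-views"},
    UpdateExpression="ADD hits :inc",
    ExpressionAttributeValues={":inc": 1},
    ReturnValues="UPDATED_NEW",
)

# Conditional write for a "state machine": fails if the expected value mismatches
try:
    table.update_item(
        Key={"Id": "job-42"},
        UpdateExpression="SET #s = :new",
        ConditionExpression="#s = :expected",
        ExpressionAttributeNames={"#s": "state"},  # 'state' is a reserved word
        ExpressionAttributeValues={":new": "RUNNING", ":expected": "PENDING"},
    )
except ClientError as e:
    if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
        raise  # a real error; a failed condition just means another writer won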
39. How to use DynamoDB?
• Use the API, an SDK, the CLI, or the Management Console to create tables
• Use the AWS SDK to interact with DynamoDB:
  PutItem, UpdateItem, DeleteItem
  Query
  Scan
  etc.
// Create a DynamoDB client from the SDK's service locator
$client = $aws->get("dynamodb");

$tableName = "ProductCatalog";

// Write a single item; formatAttributes() converts native PHP values
// into DynamoDB's typed attribute representation
$response = $client->putItem(array(
    "TableName" => $tableName,
    "Item" => $client->formatAttributes(array(
        "Id"            => 120,
        "Title"         => "Book 120 Title",
        "ISBN"          => "120-1111111111",
        "Authors"       => array("Author12", "Author22"),
        "Price"         => 20,
        "Category"      => "Book",
        "Dimensions"    => "8.5x11.0x.75",
        "InPublication" => 0,
    )),
    "ReturnConsumedCapacity" => 'TOTAL'
));
Interaction options: libraries and SDKs, the web console, and the command line.
Figure: Writing an item to a table via the PHP SDK
40. How to use DynamoDB?
• Higher-level programming interfaces
  Object persistence model for .NET & Java
  Helper classes for .NET
  Transaction library for Java
• Local DynamoDB available for development and testing
• Dynamic DynamoDB for auto-scaling
• Many community-contributed tools/frameworks
// Maps this class to the ProductCatalog table
[DynamoDBTable("ProductCatalog")]
public class Book
{
    [DynamoDBHashKey]              // table hash key
    public int Id { get; set; }
    public string Title { get; set; }
    public string ISBN { get; set; }   // a string, e.g. "120-1111111111"
    [DynamoDBProperty("Authors")]  // maps to the "Authors" attribute
    public List<string> BookAuthors { get; set; }
    [DynamoDBIgnore]               // not persisted to DynamoDB
    public string CoverPage { get; set; }
}
Figure: .NET class using object persistence model
41. Use Libraries and Tools
Transactions (transaction library for Java)
• Atomic transactions across multiple items & tables
• Tracks the status of ongoing transactions via two tables:
  1. Transactions
  2. Pre-transaction snapshots of modified items
Geolocation
• Add location awareness to mobile applications
• Find Yourself sample app
• https://github.com/awslabs
42. Autoscaling with Dynamic DynamoDB
• Third-party library for automating scaling decisions
• Scale up for service levels, scale down for cost
• CloudFormation template for fast deployment
43. Develop and Test Locally: DynamoDB Local
• Disconnected development with full API support
  No network
  No usage costs
Note: DynamoDB Local does not have a durability or availability SLA.
Figure: instead of developing against a dedicated instance (e.g. an m2.4xlarge), run DynamoDB Local.
44. Develop and Test Locally: DynamoDB Local
Some minor differences from Amazon DynamoDB:
• DynamoDB Local ignores your provisioned throughput settings: the values that you specify when you call CreateTable and UpdateTable have no effect
• DynamoDB Local does not throttle read or write activity
• The values that you supply for the AWS access key and the region are only used to name the database file
• Your AWS secret key is ignored but must be specified; a dummy string of characters is recommended (see the connection sketch below)
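A boto3 connection sketch for DynamoDB Local; port 8000 is its default, and the credentials are deliberately dummy strings per the notes above.

import boto3

local = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000",  # DynamoDB Local's default port
    region_name="us-east-1",               # only used to name the database file
    aws_access_key_id="dummy",             # only used to name the database file
    aws_secret_access_key="dummy",         # ignored, but must be present
)
print(list(local.tables.all()))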
45. Monitoring
• DynamoDB reports CloudWatch metrics:
  Latency
  Consumed throughput
  Errors
  Throttling
• CloudWatch alarms can be used to dynamically size throughput (see the sketch below)
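A sketch of one such alarm with boto3 (alarm name and threshold are assumptions): if a table is provisioned at 150 writes/sec, the maximum consumption summed over 5 minutes is 45,000 units, so alarming at 36,000 flags sustained use above 80%.

import boto3

boto3.client("cloudwatch").put_metric_alarm(
    AlarmName="foo-write-capacity-high",
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedWriteCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "foo"}],
    Statistic="Sum",
    Period=300,                # 5-minute windows
    EvaluationPeriods=1,
    Threshold=36000.0,         # 80% of 150 WCU * 300 s
    ComparisonOperator="GreaterThanThreshold",
)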
46. Analytics
• DynamoDB can be used for large data ingest
• Redshift can directly load data from DynamoDB (COPY)
• EMR can directly read from DynamoDB by using Hive:

CREATE EXTERNAL TABLE pc_dynamodb (
  [attributes]
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ([properties]);

CREATE EXTERNAL TABLE pc_s3 (
  [attributes]
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://myawsbucket1/catalog/';

Figure: Hive on EMR reads DynamoDB and S3 through external tables; Redshift loads from DynamoDB via COPY.
47. Pricing
• Provisioned throughput:
  $0.0065 per hour for every 10 units of write capacity (1 write per second for 1 KB items)
  $0.0065 per hour for every 50 units of read capacity (1 consistent read per second for 4 KB items)
• Storage: $0.25 per GB-month
• Free tier: 100 MB of storage + 50 writes/sec + 10 reads/sec each month
(a worked example follows)
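As a worked example at these list prices, the earlier CreateTable call (100 reads/sec, 150 writes/sec) costs 100/50 × $0.0065 = $0.013 per hour for reads plus 150/10 × $0.0065 = $0.0975 per hour for writes, roughly $80 per month before storage.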
49. Access Pattern Modeling
• Method
  1. Describe the overall use case to maintain context
  2. Identify the individual access patterns of the use case
  3. Model each access pattern to its own discrete data set
  4. Consolidate data sets into tables and indexes
• Benefits
  A single table fetch for each query
  Minimal payloads for each access
50. Table Best Practices
• Design for uniform data access across items
  Partition distribution is based on the hash key
  The hash key should be well distributed
  Access frequency should be distributed across different hash keys
• Time series pattern
  Logging
  Focus only on recent data

Hash key value and its efficiency:
• User ID, where the application has many users: Good
• Status code, where there are only a few possible status codes: Bad
• Device ID, where even if there are a lot of devices being tracked, one is by far more popular than all the others: Bad
51. Item Best Practices
• Use one-to-many tables instead of large set attributes
  Break items up into multiple tables
• Use multiple tables to support varied access patterns
  If you frequently access large items but do not use all attributes, store the smaller, frequently accessed attributes in a separate table
• Compress large attributes (see the sketch below)
  Reduces the cost of storage and throughput
• Store large attributes in S3
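A boto3 sketch of compressing a large attribute before writing it (the item id and attribute name are assumptions); bytes are stored as the Binary (B) type.

import zlib
import boto3

table = boto3.resource("dynamodb").Table("ProductCatalog")

long_description = "A very long product description... " * 200
compressed = zlib.compress(long_description.encode("utf-8"))
table.put_item(Item={"Id": 121, "Description": compressed})

# Reading it back: boto3 wraps the bytes in a Binary object with a .value field
resp = table.get_item(Key={"Id": 121})
original = zlib.decompress(resp["Item"]["Description"].value).decode("utf-8")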
52. Query and Scan Best Practices
• Avoid sudden bursts of read activity
  Reduce the page size of Scans
  Isolate scan operations: create separate tables and write to both, a mission-critical table and a shadow table
• Take advantage of parallel scans (see the sketch below)
  Sequential scans take longer
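A parallel-scan sketch with boto3 (the segment count of 4 and the table name are arbitrary assumptions); each worker scans only its own segment, and the small Limit keeps individual pages cheap.

import boto3
from concurrent.futures import ThreadPoolExecutor

TOTAL_SEGMENTS = 4

def scan_segment(segment):
    # Each worker gets its own resource and scans its slice of the key space
    table = boto3.resource("dynamodb").Table("ProductCatalog")
    items = 0
    kwargs = {"Segment": segment, "TotalSegments": TOTAL_SEGMENTS, "Limit": 100}
    while True:
        page = table.scan(**kwargs)
        items += len(page["Items"])
        if "LastEvaluatedKey" not in page:
            return items
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

with ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    counts = list(pool.map(scan_segment, range(TOTAL_SEGMENTS)))
print(sum(counts), "items scanned")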