This document discusses Amazon DynamoDB, a fully managed NoSQL database service. Some key points:
- DynamoDB lets developers offload operational tasks such as capacity provisioning, scaling, and patching to AWS, which simplifies development and reduces costs.
- The document outlines DynamoDB's data model, including tables, items, attributes, and indexes, and explains how DynamoDB automatically partitions and distributes data by hash key to enable massive scale.
- Various AWS services that integrate with DynamoDB for workloads such as search, analytics, and caching are shown, along with best practices for data modeling, queries, and system design.
3. One Database for All Workloads
Figure: a traditional architecture in which a client tier and an app/web tier funnel every workload (key-value access, complex queries, transactions, analytics) into a single RDBMS.
4. Cloud Data Tier Architecture
Figure: the client and app/web tiers sit atop a data tier composed of specialized stores: search, cache, blob store, RDBMS, NoSQL, and data warehouse.
5. Workload Driven Data Store Selection
Each store in the data tier is matched to the workload it serves best:
• NoSQL: key/value access and simple queries
• RDBMS: complex queries and transactions
• Cache: hot reads
• Search: rich search
• Blob store: logging
• Data warehouse: analytics
6. AWS Services for the Data Tier
The same workload-to-store mapping, with the AWS service for each:
• Amazon DynamoDB (NoSQL): key/value access and simple queries
• Amazon RDS (RDBMS): complex queries and transactions
• Amazon ElastiCache (cache): hot reads
• Amazon S3 (blob store): logging
• Amazon Redshift (data warehouse): analytics
• Amazon CloudSearch (search): rich search
7. RDBMS = Default Choice
Relational Era @ Amazon.com
• The Amazon.com page is composed of responses from thousands of independent services
• Query patterns differ from service to service:
  Catalog service is usually heavy key-value
  Ordering service is very write intensive (key-value)
  Catalog search has a different query pattern
• Forcing everything through an RDBMS brought poor availability, limited scalability, and high cost
8. Dynamo = NoSQL Technology
Distributed Era @ Amazon.com
• Replicated DHT with consistency management
• Consistent hashing
• Optimistic replication
• “Sloppy quorum”
• Anti-entropy mechanisms
• Object versioning
Drawbacks: lack of strong consistency, operational complexity, and every engineer needs to learn distributed systems.
9. DynamoDB = NoSQL Cloud Service
Cloud Era @ Amazon.com
• Non-relational
• Fast & predictable performance
• Seamless scalability
• Easy administration
12. Massive and Seamless Scale
• A table is divided into partitions 1..N; DynamoDB automatically partitions data by the hash key
  The hash key spreads data (and workload) across partitions
• Auto-partitioning occurs with:
  Data set size growth
  Provisioned capacity increases
• A large number of unique hash keys + uniform distribution of workload across hash keys = an app that is ready to scale (see the sketch below)
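To make this concrete, here is a minimal Python sketch of how hashing a key spreads items across partitions. The MD5 hash and the fixed partition count are illustrative assumptions only; DynamoDB does not expose its internal partitioning scheme.

import hashlib
from collections import Counter

def partition_for(hash_key, num_partitions):
    # Hash the key and map it onto one of the partitions
    digest = hashlib.md5(hash_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Many unique, uniformly accessed hash keys spread the workload evenly:
counts = Counter(partition_for("user-%d" % i, 8) for i in range(10000))
print(counts)  # roughly 1250 items per partition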
13. Making life easier for developers…
Automated Operations: developers no longer have to handle
• Performance tuning (latency)
• 3-way multi-AZ replication (automatic)
• Scalability (and scaling operations)
• Security inspections, patches, and upgrades
• Software upgrades and patches
• Hardware failover (automatic)
• Improving the underlying hardware
• …and lots of other stuff
14. Provisioned Throughput
Predictable Performance
• Request-based capacity provisioning model
• Throughput is declared and updated via the API or the console (see the SDK sketch below):
  CreateTable (foo, reads/sec = 100, writes/sec = 150)
  UpdateTable (foo, reads/sec = 10000, writes/sec = 4500)
• DynamoDB handles the rest:
  Capacity is reserved and available when needed
  Scaling up triggers repartitioning and reallocation
  No impact to performance or availability
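A minimal sketch of the same two calls using the Python SDK (boto3); the table name foo matches the pseudo-calls above, while the Id key schema is an assumption for illustration.

import boto3

client = boto3.client("dynamodb")

# Declare throughput at creation time: 100 reads/sec, 150 writes/sec
client.create_table(
    TableName="foo",
    AttributeDefinitions=[{"AttributeName": "Id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "Id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 150},
)

# Scale the same table up later; DynamoDB repartitions behind the scenes
client.update_table(
    TableName="foo",
    ProvisionedThroughput={"ReadCapacityUnits": 10000, "WriteCapacityUnits": 4500},
)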
15-16. Durable & Low Latency At Scale
WRITES
• Continuously replicated to 3 AZs
• Quorum acknowledgment
• Persisted to disk (custom SSD)
READS
• Strongly or eventually consistent (chosen per request; see the sketch below)
• No trade-off in latency
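A minimal boto3 sketch of choosing read consistency per request (table and key names are assumptions). Eventually consistent reads, the default, cost half as much as strongly consistent ones.

import boto3

table = boto3.resource("dynamodb").Table("foo")

# Default: eventually consistent read
eventual = table.get_item(Key={"Id": "item-1"})

# Strongly consistent read: sees all previously acknowledged writes
strong = table.get_item(Key={"Id": "item-1"}, ConsistentRead=True)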
18. “DynamoDB has scaled effortlessly to match our company's explosive growth, doesn't burden our operations staff, and integrates beautifully with our other AWS assets.”
“I love how DynamoDB enables us to provision our desired throughput, and achieve low latency and seamless scale, even with our constantly growing workloads.”
19. Fast Development
Weatherbug mobile app
• Lightning detection & alerting for 40M users/month
• Developed and tested in weeks, at “1/20th of the cost of the traditional DB approach”
Super Bowl promotion
• Millions of interactions over a relatively short period of time
• Built the app in 3 days, from design to production-ready
20. Cost Effective
Save Money, Reduce Effort
• “Our previous NoSQL database required almost a full time administrator to run. Now AWS takes care of it.”
• “Being optimized at AdRoll means we spend more every month on snacks than we do on DynamoDB – and almost nothing on an ops team”
27. Hash = Distribution Key
Hash keys:
• are mandatory for all items in a table
• support the key-value access pattern
• determine data distribution across partitions 1..N
28. Hash = Distribution Key
Optimal schema design: a large number of unique hash keys + uniform distribution of workload across hash keys.
29. Range = Query
Range keys (the range portion of a hash + range composite primary key):
• model 1:N relationships
• enable rich query capabilities
Queries can retrieve all items for a hash key, or filter the range with ==, <, >, >=, <=, “begins with”, and “between”; results come back sorted, with counts, top/bottom N values, and paged responses (see the SDK sketch below).
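A boto3 sketch of these range-key queries; the UserEvents table with hash key UserId and range key Timestamp is an assumed schema for illustration.

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("UserEvents")

# All items for one hash key, sorted newest-first, one page of 25 results
resp = table.query(
    KeyConditionExpression=Key("UserId").eq("user-1"),
    ScanIndexForward=False,  # descending sort: top-N / most recent items
    Limit=25,
)

# Range conditions such as "between" (==, <, >, begins_with work similarly)
resp = table.query(
    KeyConditionExpression=Key("UserId").eq("user-1")
    & Key("Timestamp").between("2014-01-01", "2014-02-01"),
)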
30. Index Options
Local secondary indexes (LSI):
• alternate range key + same hash key
• index and table data are co-located (same partition)
A table-creation sketch follows below.
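A sketch of declaring an LSI at table creation with boto3 (all names are illustrative assumptions). The index keeps the table's hash key and swaps in an alternate range key.

import boto3

boto3.client("dynamodb").create_table(
    TableName="UserEvents",
    AttributeDefinitions=[
        {"AttributeName": "UserId", "AttributeType": "S"},
        {"AttributeName": "Timestamp", "AttributeType": "S"},
        {"AttributeName": "EventType", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "UserId", "KeyType": "HASH"},
        {"AttributeName": "Timestamp", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[{
        "IndexName": "ByEventType",
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},      # same hash key
            {"AttributeName": "EventType", "KeyType": "RANGE"},  # alternate range key
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
)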
35. Simple API
Currently 13 operations in total.
• Manage Tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
• Read and Write Items: PutItem, GetItem, UpdateItem, DeleteItem
• Read and Write Multiple Items: BatchGetItem, BatchWriteItem, Query, Scan
36. Data Types
• Scalar data types
  String (S): Unicode with UTF-8 binary encoding
  Number (N): up to 38 digits of precision, in the range 10^-128 to 10^+126
  Variable-width encoding can occupy up to 21 bytes
• Multi-valued types
  String Set (SS)
  Number Set (NS)
  Sets are not ordered
(see the encoding sketch below)
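For illustration, this is how an item using these types looks in DynamoDB's low-level attribute encoding (the item itself is made up). Numbers travel as strings so the full 38 digits of precision survive transport.

item = {
    "Id":      {"N": "120"},                      # Number
    "Title":   {"S": "Book 120 Title"},           # String, UTF-8 encoded
    "Authors": {"SS": ["Author12", "Author22"]},  # String Set (not ordered)
    "Scores":  {"NS": ["1", "2.5", "3"]},         # Number Set (not ordered)
}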
37. Indexing & Partitioning
• Data is indexed by the primary key
• Single hash key: targeted towards object persistence
• Hash + range composite key: a sorted collection within the hash bucket; can store a series of events for a given entity
• Automatic partitioning: the leading hash key spreads data & workload across partitions, so traffic is scaled out and parallelized
38. Other Features
• Consistent reads: inventory, shopping cart applications
• Atomic counters: increment and return the new value in the same operation
• Conditional writes: expected value checked before the write, which fails on mismatch; “state machine” use cases
• Sparse indexes: ideal for sorted lists; fast access to a subset of items; popular for identifying recently updated items, top lists, and leaderboards
(a counter and conditional-write sketch follows below)
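A boto3 sketch of an atomic counter and a conditional write (table, key, and attribute names are assumptions).

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Counters")

# Atomic counter: increment and return the new value in the same operation
resp = table.update_item(
    Key={"Id": "page-views"},
    UpdateExpression="ADD hits :inc",
    ExpressionAttributeValues={":inc": 1},
    ReturnValues="UPDATED_NEW",
)

# Conditional write for a "state machine": fails if the expected value mismatches
try:
    table.update_item(
        Key={"Id": "job-42"},
        UpdateExpression="SET #s = :new",
        ConditionExpression="#s = :expected",
        ExpressionAttributeNames={"#s": "state"},  # 'state' is a reserved word
        ExpressionAttributeValues={":new": "RUNNING", ":expected": "PENDING"},
    )
except ClientError as e:
    if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
        raise  # a real error; a failed condition just means another writer won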
39. How to use DynamoDB?
• Use the API, an SDK, the CLI, or the Management Console to create tables
• Use the AWS SDK to interact with DynamoDB:
  PutItem, UpdateItem, DeleteItem
  Query
  Scan
  etc.
// Create a DynamoDB client from the SDK's service locator
$client = $aws->get("dynamodb");

$tableName = "ProductCatalog";

// Write a single item; formatAttributes() converts native PHP values
// into DynamoDB's typed attribute representation
$response = $client->putItem(array(
    "TableName" => $tableName,
    "Item" => $client->formatAttributes(array(
        "Id"            => 120,
        "Title"         => "Book 120 Title",
        "ISBN"          => "120-1111111111",
        "Authors"       => array("Author12", "Author22"),
        "Price"         => 20,
        "Category"      => "Book",
        "Dimensions"    => "8.5x11.0x.75",
        "InPublication" => 0,
    )),
    "ReturnConsumedCapacity" => 'TOTAL'
));
Interaction options: libraries and SDKs, the web console, and the command line.
Figure: Writing an item to a table via the PHP SDK
40. How to use DynamoDB?
• Higher-level programming interfaces
  Object persistence model for .NET & Java
  Helper classes for .NET
  Transaction library for Java
• Local DynamoDB available for development and testing
• Dynamic DynamoDB for auto-scaling
• Many community-contributed tools/frameworks
// Maps this class to the ProductCatalog table
[DynamoDBTable("ProductCatalog")]
public class Book
{
    [DynamoDBHashKey]              // table hash key
    public int Id { get; set; }
    public string Title { get; set; }
    public string ISBN { get; set; }   // a string, e.g. "120-1111111111"
    [DynamoDBProperty("Authors")]  // maps to the "Authors" attribute
    public List<string> BookAuthors { get; set; }
    [DynamoDBIgnore]               // not persisted to DynamoDB
    public string CoverPage { get; set; }
}
Figure: .NET class using object persistence model
41. Use Libraries and Tools
Transactions (transaction library for Java)
• Atomic transactions across multiple items & tables
• Tracks the status of ongoing transactions via two tables:
  1. Transactions
  2. Pre-transaction snapshots of modified items
Geolocation
• Add location awareness to mobile applications
• Find Yourself sample app
• https://github.com/awslabs
42. Autoscaling with Dynamic DynamoDB
• Third-party library for automating scaling decisions
• Scale up for service levels, scale down for cost
• CloudFormation template for fast deployment
43. Develop and Test Locally: DynamoDB Local
• Disconnected development with full API support
  No network
  No usage costs
Note: DynamoDB Local does not have a durability or availability SLA.
Figure: instead of developing against a dedicated instance (e.g. an m2.4xlarge), run DynamoDB Local.
44. Develop and Test Locally: DynamoDB Local
Some minor differences from Amazon DynamoDB:
• DynamoDB Local ignores your provisioned throughput settings: the values that you specify when you call CreateTable and UpdateTable have no effect
• DynamoDB Local does not throttle read or write activity
• The values that you supply for the AWS access key and the region are only used to name the database file
• Your AWS secret key is ignored but must be specified; a dummy string of characters is recommended (see the connection sketch below)
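A boto3 connection sketch for DynamoDB Local; port 8000 is its default, and the credentials are deliberately dummy strings per the notes above.

import boto3

local = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000",  # DynamoDB Local's default port
    region_name="us-east-1",               # only used to name the database file
    aws_access_key_id="dummy",             # only used to name the database file
    aws_secret_access_key="dummy",         # ignored, but must be present
)
print(list(local.tables.all()))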
45. Monitoring
• DynamoDB reports CloudWatch metrics:
  Latency
  Consumed throughput
  Errors
  Throttling
• CloudWatch alarms can be used to dynamically size throughput (see the sketch below)
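A sketch of one such alarm with boto3 (alarm name and threshold are assumptions): if a table is provisioned at 150 writes/sec, the maximum consumption summed over 5 minutes is 45,000 units, so alarming at 36,000 flags sustained use above 80%.

import boto3

boto3.client("cloudwatch").put_metric_alarm(
    AlarmName="foo-write-capacity-high",
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedWriteCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "foo"}],
    Statistic="Sum",
    Period=300,                # 5-minute windows
    EvaluationPeriods=1,
    Threshold=36000.0,         # 80% of 150 WCU * 300 s
    ComparisonOperator="GreaterThanThreshold",
)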
46. Analytics
• DynamoDB can be used for large data ingest
• Redshift can directly load data from DynamoDB (COPY)
• EMR can directly read from DynamoDB by using Hive:

CREATE EXTERNAL TABLE pc_dynamodb (
  [attributes]
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ([properties]);

CREATE EXTERNAL TABLE pc_s3 (
  [attributes]
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://myawsbucket1/catalog/';

Figure: Hive on EMR reads DynamoDB and S3 through external tables; Redshift loads from DynamoDB via COPY.
47. Pricing
• Provisioned throughput:
  $0.0065 per hour for every 10 units of write capacity (1 write per second for 1 KB items)
  $0.0065 per hour for every 50 units of read capacity (1 consistent read per second for 4 KB items)
• Storage: $0.25 per GB-month
• Free tier: 100 MB of storage + 50 writes/sec + 10 reads/sec each month
(a worked example follows)
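As a worked example at these list prices, the earlier CreateTable call (100 reads/sec, 150 writes/sec) costs 100/50 × $0.0065 = $0.013 per hour for reads plus 150/10 × $0.0065 = $0.0975 per hour for writes, roughly $80 per month before storage.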
49. Access Pattern Modeling
• Method
  1. Describe the overall use case to maintain context
  2. Identify the individual access patterns of the use case
  3. Model each access pattern to its own discrete data set
  4. Consolidate data sets into tables and indexes
• Benefits
  A single table fetch for each query
  Minimal payloads for each access
50. Table Best Practices
• Design for uniform data access across items
  Partition distribution is based on the hash key
  The hash key should be well distributed
  Access frequency should be distributed across different hash keys
• Time series pattern
  Logging
  Focus only on recent data

Hash key value and its efficiency:
• User ID, where the application has many users: Good
• Status code, where there are only a few possible status codes: Bad
• Device ID, where even if there are a lot of devices being tracked, one is by far more popular than all the others: Bad
51. Item Best Practices
• Use one-to-many tables instead of large set attributes
  Break items up into multiple tables
• Use multiple tables to support varied access patterns
  If you frequently access large items but do not use all attributes, store the smaller, frequently accessed attributes in a separate table
• Compress large attributes (see the sketch below)
  Reduces the cost of storage and throughput
• Store large attributes in S3
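A boto3 sketch of compressing a large attribute before writing it (the item id and attribute name are assumptions); bytes are stored as the Binary (B) type.

import zlib
import boto3

table = boto3.resource("dynamodb").Table("ProductCatalog")

long_description = "A very long product description... " * 200
compressed = zlib.compress(long_description.encode("utf-8"))
table.put_item(Item={"Id": 121, "Description": compressed})

# Reading it back: boto3 wraps the bytes in a Binary object with a .value field
resp = table.get_item(Key={"Id": 121})
original = zlib.decompress(resp["Item"]["Description"].value).decode("utf-8")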
52. Query and Scan Best Practices
• Avoid sudden bursts of read activity
  Reduce the page size of Scans
  Isolate scan operations: create separate tables and write to both, a mission-critical table and a shadow table
• Take advantage of parallel scans (see the sketch below)
  Sequential scans take longer
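A parallel-scan sketch with boto3 (the segment count of 4 and the table name are arbitrary assumptions); each worker scans only its own segment, and the small Limit keeps individual pages cheap.

import boto3
from concurrent.futures import ThreadPoolExecutor

TOTAL_SEGMENTS = 4

def scan_segment(segment):
    # Each worker gets its own resource and scans its slice of the key space
    table = boto3.resource("dynamodb").Table("ProductCatalog")
    items = 0
    kwargs = {"Segment": segment, "TotalSegments": TOTAL_SEGMENTS, "Limit": 100}
    while True:
        page = table.scan(**kwargs)
        items += len(page["Items"])
        if "LastEvaluatedKey" not in page:
            return items
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

with ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    counts = list(pool.map(scan_segment, range(TOTAL_SEGMENTS)))
print(sum(counts), "items scanned")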