DAT310_Which Database to Use When

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Which Database to Use When?
Tony Petrossian, Director of Engineering, DynamoDB
Ian Meyers, Head of DBS Customer Advisory Team
DAT310
November 28, 2017

A Short Break from Generalities
Relational vs. Non-Relational
NoSQL vs. SQL
Schema vs. Schema-free
Unstructured vs. Structured

Looking at the Specifics
Purpose of a database Your application needs

Database Workloads
Data Considerations
Shape, Size, Compute

Shape
Purpose-Built For | Optimized for When you need to | Example Workload
Row Store | Operate on a record or group of records | Payroll
Column Store | Aggregations, scans and joins | Analytics
Key-Value Store | Query by key with high throughput & fast ingestion | Tracking devices
Document Store | Index & store documents for query on any property | Patient data
Graph Store | Persist and retrieve relationships | Recommendations
Time-Series Store | Store and process data sequence | Process Engine telemetry
Unstructured Store | Get and put of objects | Store user reviews
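
The row-store and column-store entries above can be illustrated with a small sketch. The payroll records, field names and helper functions below are made up for illustration and are not from the talk; the point is only that a row layout keeps each record together, while a column layout keeps each attribute together so an aggregation reads a single column.

```python
# Hypothetical payroll records; names and values are illustrative only.
records = [
    {"id": 1, "name": "Ana", "dept": "eng", "salary": 120},
    {"id": 2, "name": "Bo", "dept": "ops", "salary": 90},
    {"id": 3, "name": "Cy", "dept": "eng", "salary": 110},
]

# Row layout: each record is stored together, so fetching or updating
# one whole record touches a single entry.
row_store = {r["id"]: r for r in records}

# Column layout: each attribute is stored together, so an aggregation
# reads only the columns it needs.
column_store = {key: [r[key] for r in records] for key in records[0]}


def get_record(record_id):
    """Row-store style access: operate on one record."""
    return row_store[record_id]


def total_salary():
    """Column-store style access: scan a single column."""
    return sum(column_store["salary"])
```

Real engines add compression, indexing and on-disk formats on top of this idea; the sketch only shows the difference in layout.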

Size
Considerations | Example Workload
Size at limit – bounded or unbounded | Number of employees – bounded; number of sensors – unbounded
Working set size & caching | 10 years of sales data but only the last 12 months is queried; session data for users of a streaming service
Retrieval size | Get one row; get one thousand rows
Partitionable or monolithic | Storage and processing of car location data is partitionable; company payroll data has no natural partition boundary
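
As a rough sketch of what "partitionable" means for the car location example, the code below routes each reading to one of several independent stores by hashing the car ID. The partition count, the function names and the use of SHA-256 as the hash are assumptions made for illustration, not something the talk prescribes.

```python
import hashlib

NUM_PARTITIONS = 4  # assumed partition count, chosen only for illustration


def partition_for(car_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a car ID to a partition with a stable hash of the key."""
    digest = hashlib.sha256(car_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# One independent store per partition: {car_id: latest (lat, lon)}.
partitions = [dict() for _ in range(NUM_PARTITIONS)]


def put_location(car_id: str, lat: float, lon: float) -> None:
    """Store the latest location in the partition that owns this car."""
    partitions[partition_for(car_id)][car_id] = (lat, lon)


def get_location(car_id: str):
    """Read the latest location from the owning partition only."""
    return partitions[partition_for(car_id)].get(car_id)
```

Every request for a single car touches exactly one partition, which is what lets this kind of data grow without a single node becoming the bottleneck. A payroll run that must see all employees at once has no such key to split on.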

Compute
Considerations | Example Workload
Compute functions | Sum of sales for the last 12 months; get & put data
Throughput | A million users browsing a product catalogue every second; 50 doctors looking at 300 patient records per day
Latency | Get the location of a car in 5 milliseconds; get the min, max & average deal size for the last 12 months in 5 seconds
Change rate | Inventory counts are frequently updated; sales records are never updated
Rate of ingestion | Location telemetry from cars added to the database every minute; new employee records being added to the database

My [insert your favorite DB] works for everything
General purpose: one size fits all
Special purpose: efficiency at scale

But Which Database to Use When?
Decision points and considerations

Managed database services
DevOps
Build: code, integrate, test
Deploy: provision, configure, rollout
Operate: secure, monitor, scale, HA
All conveniently located at the end of an API call

But Which Databases to Use When?
Why pick just one?

Our Strategy

Operational: transactional, system of record, content management
Analytics: retrospective, streaming, predictive

Back to Generalities
Operational: transactional, system of record, content management
Analytics: retrospective, predictive, streaming

Operational workloads
General characteristics
• Usually a good fit for caching
• Small compute size – few rows, items, documents per request
• Low-latency
• High-throughput
• High-concurrency
• Mission critical HA, DR and data protection
Primary dimensions to consider
• Size at limit – bounded or unbounded
• Rows, key-values or documents
• Need relational capabilities or not
• Partitioned or monolithic
• Push-down compute requirements
• Change velocity
• Ingestion requirements

Amazon RDS
Managed relational database service with a choice of six popular database engines
Easy to administer: No need for infrastructure provisioning, installing and maintaining database software
Highly scalable: Scale database compute and storage with a few mouse clicks with no downtime
Available & Durable: Multi-AZ automatically replicates data to a different AZ. Automated backup, snapshots, failover
Fast: Choose between two SSD-backed storage options for high-performance OLTP

Amazon ElastiCache
Managed, in-memory data store service.
Redis or Memcached to power real-time apps with sub-millisecond latency.
Extreme performance: In-memory data store and cache using optimized stack to deliver sub-millisecond response times
Secure & hardened: VPC for cluster isolation, encryption at rest/transit, and HIPAA compliance
Easily scalable: Read scaling with replicas. Write and memory scaling with sharding. Non-disruptive scaling
Highly available & reliable: Multi-AZ with automatic failover

Caching
ElastiCache can be added to most operational databases to improve read latency and, if required, reduce provisioned read IOPS.
This works if your working set fits in cache and you can get a good hit rate.
Your application needs to be aware of the cache that fronts the database.
It is important to understand the ‘cache aside’ pattern and the impact of stale reads on your application.
[Diagram: read path between Application, ElastiCache and Amazon RDS. Labels: Read, Miss, Read, Write, Respond, Read, Value.]

Caching
[Diagram: write path between Application, ElastiCache and Amazon RDS, illustrating stale cache reads. Labels: Delete, Write, Respond, Write, Stale Cache Reads.]
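
The ‘cache aside’ pattern and the stale-read risk can be sketched as follows. This is a minimal illustration only: plain dictionaries stand in for ElastiCache and Amazon RDS, and the key names and function names are hypothetical.

```python
# In-memory stand-ins: `cache` plays the role of ElastiCache and
# `database` the role of Amazon RDS. Both are assumptions for illustration.
cache = {}
database = {"user:1": "Alice"}


def read(key):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    if key in cache:
        return cache[key]          # cache hit
    value = database.get(key)      # cache miss: read from the database
    if value is not None:
        cache[key] = value         # populate the cache for later reads
    return value


def write_without_invalidation(key, value):
    """Updates only the database, so a cached copy becomes stale."""
    database[key] = value


def write_with_invalidation(key, value):
    """Updates the database, then deletes the cached copy."""
    database[key] = value
    cache.pop(key, None)
```

After `write_without_invalidation`, `read` keeps returning the old cached value until the entry is deleted or expires, which is the stale-read case. Deleting the entry on write narrows that window, but the application still owns the logic, which is why it has to be aware of the cache.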

Amazon DynamoDB
Fast and flexible NoSQL database service for any scale
NoSQL database that supports both document and key-value structures
Fast, consistent performance: Consistently single-digit millisecond latencies at any scale. DAX reduces response times to microseconds.
Highly scalable: Auto-scaling tables serving millions of requests per second, storing hundreds of terabytes of data.
Fully managed: Automatic provisioning and infrastructure management.
Business-critical reliability: Data is replicated across fault-tolerant availability zones, with fine-grained access control.

Amazon DynamoDB Accelerator (DAX)
Fully managed write-through cache for DynamoDB
Fully managed, in-memory cache for DynamoDB.
Reduces DynamoDB response times from milliseconds to microseconds.

Caching
DAX is fully integrated caching for DynamoDB at the API level, so no additional application considerations are needed to use DAX.
If your working set fits in cache and you need the lower latency, DAX is a great option.
[Diagram: write path between Application, DAX and Amazon DynamoDB, with no stale cache reads. Labels: Item, Write, Respond, Write, Write.]
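
For contrast with cache-aside, a write-through cache can be sketched like this. It is a toy model under stated assumptions, not how DAX is implemented: a dictionary stands in for the DynamoDB table, and the class and method names are hypothetical.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a dict-backed store."""

    def __init__(self, store):
        self.store = store   # stand-in for the backing table
        self.items = {}      # cached items

    def put_item(self, key, item):
        # The write goes to the backing store first, then the cache is
        # updated with the same item before the call returns.
        self.store[key] = item
        self.items[key] = item

    def get_item(self, key):
        # Serve from the cache when possible; otherwise read through
        # to the store and remember the result.
        if key in self.items:
            return self.items[key]
        item = self.store.get(key)
        if item is not None:
            self.items[key] = item
        return item
```

Because reads and writes go through the same object, a read that follows a write through this cache sees the new item without any invalidation logic in the application. Writes that bypass the cache and go straight to the store would not be reflected until the cached entry is replaced.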

High Availability and Durability
Amazon DynamoDB
• DynamoDB is always Multi-AZ durable
• Writes are synchronous to two Availability Zones
• Reads are Multi-AZ consistent if requested in the API call
• Consumers can read an item from any of the 3 nodes hosting the partition
Amazon Aurora
• Amazon Aurora is always Multi-AZ durable
• Writes are synchronous to 4 nodes of the 6-node storage clique
• Reads are transactionally consistent from only 3 nodes
• Consumers must read from the primary node
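
The Aurora figures can be checked against the usual quorum arithmetic. The sketch below assumes the slide's numbers describe a write quorum of 4 and a read quorum of 3 over 6 storage nodes; the function names are hypothetical.

```python
def quorums_overlap(nodes: int, write_quorum: int, read_quorum: int) -> bool:
    """True if every read quorum shares at least one node with every write quorum."""
    return write_quorum + read_quorum > nodes


def writes_overlap(nodes: int, write_quorum: int) -> bool:
    """True if any two write quorums share a node, so two conflicting writes cannot both succeed."""
    return 2 * write_quorum > nodes
```

With 6 nodes, a write quorum of 4 and a read quorum of 3 give 4 + 3 = 7 > 6, so any 3 nodes read must include at least one node that acknowledged the latest write. Likewise 2 × 4 = 8 > 6, so two write quorums always intersect.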

Relational capabilities
RDBMSs provide multi-table, multi-record transactions, referential integrity and locking.
DynamoDB provides atomicity, consistency (at the item level), durability, and automatic partitioning at any scale in exchange for relational capabilities.
You must consider the scale of your requirement, the skills of your team, and data model complexity to make a good choice.
[Diagram labels: NoSQL Skills, Data Model, Massive Scale]
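
To make "multi-table transactions and referential integrity" concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for a relational engine. The `customers` and `orders` tables and the helper names are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total INTEGER NOT NULL)"
)
conn.commit()


def place_order(customer_id, name, total):
    """Insert a customer and an order atomically; roll back both on failure."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO customers (id, name) VALUES (?, ?)", (customer_id, name)
            )
            conn.execute(
                "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                (customer_id, total),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def count(table):
    """Return the number of rows in one of the two demo tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

If the second insert fails (for example, a missing `total`), the first insert is rolled back as well, so the two tables never disagree. An order that points at a non-existent customer is rejected by the foreign key. These are the guarantees given up in exchange for item-level operations and automatic partitioning.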

Operational database dimensions
Size at limit – bounded ✔
Size at limit – unbounded ✔
key-values or documents ✔
Rows ✔
Need relational capabilities ✔
Partitioned ✔
Push-down compute requirements ✔
Change Velocity ✔ ✔
Ingestion requirements ✔
Amazon DynamoDB | Amazon RDS
A few Examples…

Analytic workloads
General characteristics
• Almost always columnar
• Large and usually partitioned
• Large compute size – millions of items involved in query
• Heavy compute push down
• Batch writes or trickle inserts
• Little to no updates
• Needs a lot of memory and often in-memory compute capabilities
Primary dimensions to consider
• Streaming or not
• Latency requirements
• ETL or no ETL
• Serverless or dedicated compute
• Always active or occasionally active
• Data formats

Amazon Redshift – Data Warehousing
Fast, powerful, and simple data warehousing at 1/10 the cost
Massively parallel, petabyte scale
Fast: Columnar storage technology to improve I/O efficiency and parallelize queries. Data load scales linearly.
Inexpensive: As low as $1,000 per terabyte per year, 1/10th the cost of traditional data warehouse solutions.
Scalable: Resize your cluster up and down as your performance and capacity needs change.
Secure: Data encrypted at rest and in transit. Isolate clusters with VPC. Manage your own keys with AWS KMS.

Amazon Athena – Interactive analysis
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Query instantly: Zero setup cost. Just point to S3 and start querying
Pay per query: Pay only for queries run. Save 30-90% on per-query costs through compression
Open: ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types
Easy: Serverless. Zero infrastructure. Zero administration

Amazon Kinesis Analytics
Process and Analyze Streaming Data in Real-time with SQL

Amazon Elasticsearch Service
Easy to deploy, secure, operate, and scale Elasticsearch
Customers use Elasticsearch for log analytics, full text search, & application monitoring
Easy to use: Fully managed. Deploy production-ready clusters in minutes.
Open: Direct access to Elasticsearch open-source APIs. Supports Logstash and Kibana.
Secure: Secure access with VPC to keep all traffic within the AWS network.
Available: Zone awareness replicates data between two AZs. Automatically monitors & replaces failed nodes.

Analytics database dimensions
Streaming analytics ✔
Serverless ad-hoc query ✔
Process, prepare and index in-place ✔
Low-latency for reporting and BI dashboards ✔
Pay per query ✔
Data warehouse with multiple enterprise data sources ✔
Query data directly in S3 without format conversions ✔
Directly query CSV, JSON, TSV or text files ✔
Amazon Redshift | Athena | Kinesis Analytics

Why pick one when you can use all three?
[Diagram: Amazon S3, Athena, Amazon Redshift, Kinesis Analytics and Amazon Elasticsearch Service. Labels: Well Modelled Data, Data Exploration, Non-SQL Analytics, Real-Time Analytics, Managed Storage Delivery.]

AWS Databases and Analytics
Broadest and deepest portfolio, purpose-built for builders
Business Intelligence & Machine Learning: QuickSight, Machine Learning
Databases
Relational Databases: RDS, Aurora
Non-Relational Databases: DynamoDB (Key value/Document), ElastiCache (Redis, Memcached)
Analytics: Redshift, EMR, Athena (DW | Big Data Processing | Ad hoc), Kinesis Analytics (Real-time), Elasticsearch Service (Operational)
Data Lake: S3/Glacier (Storage), Glue (ETL & Data Catalog), Macie (Data Protection)
Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams

In Closing
AWS offers a myriad of services designed to help you solve your toughest problems at scale, so there is no need to pick just one service.
When selecting a data service, consider the dimensions and pick the best match for each component of your application.

Thank you!
Please fill in the session survey
We hope you enjoyed the discussion!
DAT310: Which Database to Use When?
