AWS Summit Berlin 2013 - Understanding database options on AWS

Jan Borch - AWS Solutions Architect
#awssummit
Berlin

We want to make it easy for you to start
1. Zero to Application in ____ Minutes
2. Zero to Millions of users in ____ Days
3. Zero to “Profits!” ASAP

https://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems
http://nosql-database.org/

Spectrum of options on AWS
Axes: SQL to NoSQL, low cost to high cost

Axes: do-it-yourself to fully managed (part of the spectrum is not available on AWS)

Do-it-yourself, SQL: MySQL, Oracle, SQL Server, PostgreSQL, your favorite RDBMS
Fully managed, SQL: RDS (MySQL, Oracle, SQL Server)

Do-it-yourself, NoSQL: MongoDB, Cassandra, Redis, Memcached, …
Fully managed, NoSQL: Amazon DynamoDB, Amazon ElastiCache

Thinking about the questions
Should I use SQL or NoSQL?
Should I use MySQL on EC2 or RDS?
Should I use MongoDB, Cassandra, or DynamoDB?
Should I use Redis, Memcached, or ElastiCache?

Actually, thinking about the right questions
What are my scale and latency needs?
What are my transactional and consistency needs?
What are my read/write, storage and IOPS needs?
What are my time to market and server control needs?

Option 1:
Run your databases on EC2

Amazon Elastic Compute Cloud
Amazon EC2

m1.small: 1 virtual core, 1.7 GiB memory, Moderate I/O performance
cc2.8xlarge: 32 virtual cores (2 x Intel Xeon), 60.5 GiB memory, 10 Gbit I/O performance
cr1.8xlarge: 32 virtual cores (2 x Intel Xeon), 240 GiB memory, 240 GB SSD instance store
hi1.4xlarge: 16 virtual cores, 60.5 GiB memory, 2 x 1TB SSD instance store
hs1.8xlarge: 16 virtual cores, 117 GiB memory, 24 x 2TB instance store

Choose an Amazon Machine Image

Leverage AWS services
EBS storage Volumes with EBS Snapshots
S3 for backups (for example Oracle RMAN)
Automation with AWS API or CloudFormation

Option 2:
Let AWS manage my databases

Why Managed Databases?
[Chart: share of database administration effort by task; values shown: 25%, 40%, 5%, 5%]
• backup & recovery, data load & unload
• performance tuning
• scripting & coding
• security planning
• install, upgrade, patch and migrate
• documentation, licensing & training
Differentiated effort increases the uniqueness of an application

We believe in choice
One size does not fit all
Traditional Apps, Relational DB Needs
High Performance, High Scale Data Warehouses
New Web Apps, Massive Scalability
Amazon RDS
Amazon ElastiCache
Amazon DynamoDB
Amazon Redshift

Option 2.1:
Managed SQL database

Amazon Relational Database Service
Amazon RDS
RDS is a fully managed relational database service
that is simple to deploy, easy to scale, reliable and
cost-effective

Rapid deployment via Web Console

Push Button Scaling
Scale …
• vertically up or down
• Storage vertically

High Availability: Multi-AZ Deployments
Price reduction: Multi-AZ price reductions ranging from 15% to 32%

Horizontal Scaling with Read Replicas
New features
• Endpoint renaming
• Read Replica to master promotion
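Horizontal scaling with Read Replicas usually implies that the application sends writes to the master and spreads reads across the replicas. The following is a minimal sketch of that routing idea, not an RDS API: the class name, the endpoint host names and the "starts with SELECT" rule are all assumptions made for illustration. Replicas are updated asynchronously, so a read routed this way may briefly return stale data.

```python
from itertools import cycle


class ReadWriteRouter:
    """Picks a database endpoint per statement (hypothetical helper, not part of RDS)."""

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over the replicas; None means "no replicas configured".
        self._replicas = cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        # Simplifying assumption: only plain SELECT statements count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.master


# Hypothetical endpoint names, for illustration only.
router = ReadWriteRouter(
    master="master.db.example.internal",
    replicas=["replica-1.db.example.internal", "replica-2.db.example.internal"],
)
```

Calling `router.endpoint_for("SELECT ...")` repeatedly alternates between the two replicas, while any other statement goes to the master.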

Security
Oracle Native Network Encryption and Transparent
Data Encryption on Oracle EE
SSL support for SQL Server and MySQL

Availability and performance options
Amazon RDS configuration options: Push-Button Scaling, Multi-AZ, Read Replicas, Provisioned IOPS
Goals: improve availability, increase throughput, reduce latency
[Diagram: a Region with two Availability Zones, showing Multi-AZ, Read Replicas, Push-Button Scaling and Provisioned IOPS]

Who is succeeding with RDS?
Thousands of developers use RDS every single day
Gaming Web Apps Mobile/Social Media

Amazon ElastiCache
Amazon ElastiCache is a fully managed
Memcached-compatible caching service

Option 2.2:
Managed NoSQL database

Amazon DynamoDB
Amazon DynamoDB is a fully managed NoSQL
database service

Single digit millisecond latency.
Backed by solid-state drives.
Consistent, predictable performance

No table size limits. Unlimited storage
No downtime.
Seamless scalability

Consistent, disk only writes.
Replication across data centers and availability
zones.
Durable

Without the operational burden.
managed by DynamoDB

Reserve IOPS for reads and writes.
Scale up or down at any time.
Provisioned throughput.

Pay per capacity unit
READ: Capacity Units = Size of item (KB) x reads per second
• Consistent reads: $0.0065 for 50 read units
• Eventually consistent reads: $0.0065 for 100 read units
WRITE: Capacity Units = Size of item (KB) x writes per second
• $0.0065 for 10 write units
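The capacity-unit arithmetic above can be sketched in a few lines. This is an illustration under stated assumptions, not a pricing tool: the prices are the ones on the slide, the billing period (per hour) is an assumption the slide does not state, and rounding the item size up to a whole KB is also an assumption. All function and constant names are made up for this example.

```python
import math

# Prices as shown on the slide; the "per hour" billing period is an assumption.
PRICE_PER_BLOCK = 0.0065
STRONG_READ_UNITS_PER_BLOCK = 50
EVENTUAL_READ_UNITS_PER_BLOCK = 100
WRITE_UNITS_PER_BLOCK = 10


def capacity_units(item_size_kb, ops_per_second):
    """Capacity units = item size (rounded up to whole KB, an assumption) x operations per second."""
    return math.ceil(item_size_kb) * ops_per_second


def read_cost(item_size_kb, reads_per_second, consistent=True):
    """Cost per billing period for the read throughput needed."""
    units = capacity_units(item_size_kb, reads_per_second)
    per_block = STRONG_READ_UNITS_PER_BLOCK if consistent else EVENTUAL_READ_UNITS_PER_BLOCK
    return units / per_block * PRICE_PER_BLOCK


def write_cost(item_size_kb, writes_per_second):
    """Cost per billing period for the write throughput needed."""
    units = capacity_units(item_size_kb, writes_per_second)
    return units / WRITE_UNITS_PER_BLOCK * PRICE_PER_BLOCK
```

With these numbers an eventually consistent read costs half as much as a consistent read of the same item, which is the trade-off the slide's two read prices express.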

Reserved capacity
Save up to 53% with a 1-year reservation
Save up to 76% with a 3-year reservation

Transactions
Item level transactions only
Puts, updates and deletes are ACID
Atomic increment and decrement
Conditional writes
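Atomic increments and conditional writes are both expressed as parameters of a single-item request. The sketch below only builds the request parameters as plain dictionaries, assuming the expression-style syntax used by current SDKs (which is newer than this talk); they have the shape that boto3's `Table.update_item` and `Table.put_item` accept as keyword arguments. The helper function names are hypothetical.

```python
def atomic_increment_request(key, attribute, amount=1):
    """Parameters for an update that adds `amount` to a numeric attribute.

    A negative amount gives an atomic decrement.
    """
    return {
        "Key": key,
        "UpdateExpression": "ADD #attr :amount",
        "ExpressionAttributeNames": {"#attr": attribute},
        "ExpressionAttributeValues": {":amount": amount},
    }


def put_if_absent_request(item, key_attribute):
    """Parameters for a conditional write: store the item only if no item with this key exists."""
    return {
        "Item": item,
        "ConditionExpression": "attribute_not_exists(#k)",
        "ExpressionAttributeNames": {"#k": key_attribute},
    }


# Example values taken from the data modeling slides; the "visits" attribute is made up.
increment = atomic_increment_request(
    {"id": 100, "date": "2012-05-16-09-00-10"}, "visits", 1
)
create_once = put_if_absent_request(
    {"id": 100, "date": "2012-05-16-09-00-10", "total": 25}, "id"
)
```

Because each request touches exactly one item, this stays within the item-level transaction scope described above.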

Read Consistency
Strong or eventually consistent reads
Same latency expectations for strong
Mix and match at "read time"

Data Modeling
Tables do not require a formal schema
Items are an arbitrarily sized hash.

Data modeling: Table, Item, Attributes
id    date                  total
100   2012-05-16-09-00-10   25.00
101   2012-05-15-15-00-11   35.00
101   2012-05-16-12-00-10   100.00
102   2012-03-20-18-23-10   20.00
102   2012-03-20-18-23-10   120.00
Table: all of the rows above
Item: one row, for example id = 100, date = 2012-05-16-09-00-10, total = 25.00
Attributes: the name = value pairs within an item (id, date, total)

Items are indexed by primary and secondary keys
Primary keys can be composite
Secondary keys are local to the table
Indexing

ID Date Total
Hash key
Indexing

ID Date Total
Hash key Range key
Composite primary key
Indexing

ID Date Total
Hash key Range key Secondary range key
Indexing
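The three indexing slides (hash key, composite hash and range key, secondary range key) map onto a table definition roughly as follows. This is a sketch that only builds the parameter dictionary in the shape the CreateTable operation expects (for example via boto3's `client.create_table(**definition)`); the table name, index name and throughput values are assumptions chosen for illustration.

```python
def orders_table_definition(read_units=5, write_units=5):
    """CreateTable parameters for the ID / Date / Total example (names are hypothetical)."""
    return {
        "TableName": "orders",  # hypothetical table name
        "AttributeDefinitions": [
            {"AttributeName": "id", "AttributeType": "N"},
            {"AttributeName": "date", "AttributeType": "S"},
            {"AttributeName": "total", "AttributeType": "N"},
        ],
        # Composite primary key: hash key plus range key.
        "KeySchema": [
            {"AttributeName": "id", "KeyType": "HASH"},
            {"AttributeName": "date", "KeyType": "RANGE"},
        ],
        # Secondary range key: same hash key, different range attribute.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "total-index",  # hypothetical index name
                "KeySchema": [
                    {"AttributeName": "id", "KeyType": "HASH"},
                    {"AttributeName": "total", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


definition = orders_table_definition()
```

Only the key attributes are declared; other attributes need no definition, consistent with tables not requiring a formal schema.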

Programming DynamoDB.
Small but perfectly formed API.
Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
"Select", "insert", "update" items: PutItem, GetItem, UpdateItem, DeleteItem
Bulk select or update (max 1MB): BatchGetItem, BatchWriteItem
Query specific items OR scan the full table: Query, Scan

Query patterns
Retrieve all items by hash key.
Range key conditions:
==, <, >, >=, <=, begins with, between.
Counts. Top and bottom n values.
Paged responses.
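To make the query patterns concrete, here is a toy in-memory model of a table with a hash key and a range key. It is only an illustration of why these patterns are cheap (items under one hash key are kept sorted by range key), not how DynamoDB is implemented; the class and parameter names are invented, and paging is not modeled.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class HashRangeTable:
    """Toy model: hash key -> items sorted by range key."""

    def __init__(self):
        self._partitions = defaultdict(list)  # hash key -> [(range_key, item), ...]

    def put(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        keys = [k for k, _ in rows]
        i = bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)  # same primary key: overwrite
        else:
            rows.insert(i, (range_key, item))

    def query(self, hash_key, low=None, high=None, prefix=None, limit=None, reverse=False):
        """All items for a hash key, optionally narrowed by range key conditions."""
        rows = self._partitions.get(hash_key, [])
        keys = [k for k, _ in rows]
        start = 0 if low is None else bisect_left(keys, low)       # range key >= low
        end = len(keys) if high is None else bisect_right(keys, high)  # range key <= high
        selected = [
            item for k, item in rows[start:end]
            if prefix is None or k.startswith(prefix)  # "begins with"
        ]
        if reverse:
            selected.reverse()  # newest first, for "top n"
        return selected if limit is None else selected[:limit]


table = HashRangeTable()
table.put(100, "2012-05-16-09-00-10", {"total": 25.00})
table.put(101, "2012-05-15-15-00-11", {"total": 35.00})
table.put(101, "2012-05-16-12-00-10", {"total": 100.00})
```

A count is then `len(table.query(101))`, and the latest item for a hash key is `table.query(101, limit=1, reverse=True)`.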

500,000 WRITES PER SECOND
DURING SUPER BOWL

Amazon DynamoDB: who is succeeding
with it?

Option 2.3:
Managed data warehouse database

OLTP <-> OLAP
OLTP:
SELECT ProductID, Name
FROM Products
WHERE ProductID = 1234;
OLAP:
SELECT ProductID, count(*)
FROM Page_Hits
WHERE hour IN (12, 13)
GROUP BY ProductID;

OLTP <-> OLAP
Transactional Processing (OLTP)
• Transactional context
– Get order total
• Latency
• Indexed access
• Random IO
• Disk seek times
Analytical Processing (OLAP)
• Global context
– Daily revenue report
• Throughput
• Full table scans
• Sequential IO
• Disk transfer rates
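The two query shapes from the OLTP <-> OLAP slide can be run side by side with Python's built-in `sqlite3` module. This is a small sketch for comparing the access patterns only; the sample rows are invented, and SQLite stands in for whatever transactional or analytical engine would be used in practice.

```python
import sqlite3


def run_oltp_and_olap():
    """Runs the slide's two queries against a tiny in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Page_Hits (ProductID INTEGER, hour INTEGER);
        """
    )
    conn.execute("INSERT INTO Products VALUES (1234, 'Example product')")
    conn.executemany(
        "INSERT INTO Page_Hits VALUES (?, ?)",
        [(1234, 12), (1234, 13), (1234, 9), (5678, 12)],
    )

    # OLTP: one row, located through the primary key index.
    oltp_row = conn.execute(
        "SELECT ProductID, Name FROM Products WHERE ProductID = 1234"
    ).fetchone()

    # OLAP: an aggregate that has to read every matching row of the fact table.
    olap_rows = conn.execute(
        "SELECT ProductID, count(*) FROM Page_Hits "
        "WHERE hour IN (12, 13) GROUP BY ProductID ORDER BY ProductID"
    ).fetchall()

    conn.close()
    return oltp_row, olap_rows
```

The first query touches a single indexed row, so latency dominates; the second scans the whole table, so throughput dominates.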

Amazon Redshift is a fast, fully managed, petabyte-scale
data warehouse service
Amazon Redshift

Fast and powerful
Dramatically Reduce I/O
• Direct-attached storage
• Large data block sizes
• Column data store
• Data compression
• Zone maps
Parallelize and Distribute Everything
• MPP
• Load
• Query
• Resize
• Backup
• Restore
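The slide lists zone maps without explaining them. The general technique is to keep the minimum and maximum value of each storage block so that a scan can skip blocks that cannot contain a match. The sketch below illustrates that idea on a single column in plain Python; it is not Redshift's implementation, and the names and block size are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    values: List[int]
    minimum: int  # zone map entry: smallest value in the block
    maximum: int  # zone map entry: largest value in the block


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Splits one column into fixed-size blocks and records min/max per block."""
    blocks = []
    for i in range(0, len(column), block_size):
        chunk = column[i:i + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_equal(blocks: List[Block], target: int) -> Tuple[int, int]:
    """Counts rows equal to target; returns (matches, blocks actually read)."""
    matches = 0
    blocks_read = 0
    for block in blocks:
        if target < block.minimum or target > block.maximum:
            continue  # the zone map rules this block out, so no I/O is needed
        blocks_read += 1
        matches += block.values.count(target)
    return matches, blocks_read


# A sorted column benefits most: each value lives in very few blocks.
blocks = build_blocks(list(range(100)), block_size=10)
```

On this sorted column a lookup reads one block out of ten, which is the kind of I/O reduction the slide refers to; on unsorted data the min/max ranges overlap and fewer blocks can be skipped.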

Fully Managed
Protect Operations
• Redshift data is always encrypted
• Continuously backed up to S3
• Automatic node recovery
• Transparent disk failure
Simplify Provisioning
• Create a cluster in minutes
• Automatic OS and software patching
• Scale up to 1.6PB with a few clicks and no downtime

Amazon Redshift architecture
10 GigE
(HPC)
Ingestion
Backup
Restore
JDBC/ODBC

Best of both worlds: Use both SQL and
NoSQL models in one app

More on Amazon Redshift?
03:15pm to 03:45pm
Introducing the Amazon Redshift data warehouse
Room Zero
Speaker: Steffen Krause, Amazon
