Jan Borch - AWS Solutions Architect
Understanding Database Options on AWS
#awssummit
Berlin
We want to make it easy for you to start
1. Zero to Application in ____ Minutes
2. Zero to Millions of users in ____ Days
3. Zero to “Profits!” ASAP
Spot the critical component!
https://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems
http://nosql-database.org/
Spectrum of options on AWS
SQL <-> NoSQL
Low Cost <-> High Cost
Do-it-yourself <-> Fully Managed
Not available on AWS
Spectrum of options on AWS: SQL
Fully Managed: RDS (MySQL, Oracle, SQL Server)
Do-it-yourself: MySQL, Oracle, SQL Server, PostgreSQL, your favorite RDBMS
Spectrum of options on AWS: NoSQL
Do-it-yourself: MongoDB, Cassandra, Redis, Memcached, …
Fully Managed: Amazon DynamoDB, Amazon ElastiCache
Thinking about the questions
Should I use SQL or NoSQL?
Should I use MySQL on EC2 or RDS?
Should I use MongoDB, Cassandra, or DynamoDB?
Should I use Redis, Memcached, or ElastiCache?
Actually, thinking about the right questions
What are my scale and latency needs?
What are my transactional and consistency needs?
What are my read/write, storage and IOPS needs?
What are my time to market and server control needs?
Focus on your application
Option 1:
Run your databases on EC2
Amazon Elastic Compute Cloud
Amazon EC2
m1.small
Virtual core: 1
Memory: 1.7 GiB
I/O performance: Moderate
cc2.8xlarge
Virtual cores: 32 (2 x Intel Xeon)
Memory: 60.5 GiB
I/O performance: 10 Gbit
cr1.8xlarge
Virtual cores: 32 (2 x Intel Xeon)
Memory: 240 GiB
I/O performance: 10 Gbit
SSD Instance store: 240 GB
hi1.4xlarge
Virtual cores: 16
Memory: 60.5 GiB
I/O performance: 10 Gbit
SSD Instance store: 2 x 1TB
hs1.8xlarge
Virtual cores: 16
Memory: 117 GiB
I/O performance: 10 Gbit
Instance store: 24 x 2TB
Choose an Amazon Machine Image
Leverage AWS services
EBS storage Volumes with EBS Snapshots
S3 for backups (for example Oracle RMAN)
Automation with AWS API or CloudFormation
Option 2:
Let AWS manage my databases
Why Managed Databases?
[Chart: database administration effort. Segments: backup & recovery, data load & unload; performance tuning; scripting & coding; security planning; install, upgrade, patch and migrate; documentation, licensing & training. Shares shown: 25%, 40%, 5%, 5%.]
Differentiated effort increases the uniqueness of an application
We believe in choice
One size does not fit all
Traditional Apps: Relational DB Needs
High Performance, High Scale Data Warehouses
New Web Apps: Massive Scalability
Amazon RDS
Amazon ElastiCache
Amazon DynamoDB
Amazon Redshift
Option 2.1:
Managed SQL database
Amazon Relational Database Service
Amazon RDS
RDS is a fully managed relational database service that is simple to deploy, easy to scale, reliable and cost-effective
Choice of database options
Rapid deployment via Web Console
Backups and Recovery
Push Button Scaling
Scale …
• vertically up or down
• Storage vertically
High Availability: Multi-AZ Deployments
Price reduction: Multi-AZ price reductions ranging from 15% to 32%
A few clicks or one API call
Horizontal Scaling with Read Replicas
New features:
• Endpoint renaming
• Read Replica to master promotion
A few clicks or one API call
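One way an application might take advantage of read replicas is to send writes to the master and spread reads across the replicas. The sketch below illustrates that routing idea only; the class name, the endpoint names, and the "anything starting with SELECT is a read" rule are all assumptions made for this example, not part of RDS.

```python
import itertools


class ReplicaRouter:
    """Route writes to the master endpoint and spread reads over replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master when no replicas are configured.
        self._readers = itertools.cycle(replicas or [master])

    def endpoint_for(self, sql):
        # Assumption: plain SELECT statements are reads, everything else writes.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.master


router = ReplicaRouter(
    master="master.example.internal",            # hypothetical endpoint
    replicas=["replica-1.example.internal",      # hypothetical endpoints
              "replica-2.example.internal"],
)
```

Reads served by a replica can lag slightly behind the master, so an application would typically keep read-after-write queries on the master endpoint.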
High Performance RDS
Security
Oracle Native Network Encryption and Transparent Data Encryption on Oracle EE
SSL support for SQL Server and MySQL
Amazon RDS Configuration
Goals: Improve Availability, Increase Throughput, Reduce Latency
Options: Push-Button Scaling, Multi-AZ, Read Replicas, Provisioned IOPS
[Diagram: a Region with a Multi-AZ deployment across two Availability Zones]
Availability and performance options
Use case
Who is succeeding with RDS?
Thousands of developers use RDS every single day
Gaming Web Apps Mobile/Social Media
Amazon ElastiCache
Amazon ElastiCache is a fully managed Memcached-compatible caching service
Option 2.2:
Managed NoSQL database
Amazon DynamoDB
Amazon DynamoDB is a fully managed NoSQL database service
Single-digit millisecond latency.
Backed by solid-state drives.
Consistent, predictable performance
No table size limits. Unlimited storage
No downtime.
Seamless scalability
Consistent, disk-only writes.
Replication across data centers and Availability Zones.
Durable
Without the operational burden.
Managed by DynamoDB
Three clicks or one API call
Reserve IOPS for reads and writes.
Scale up or down at any time.
Provisioned throughput.
Pay per capacity unit
READ capacity units = size of item (KB) x reads per second
Consistent reads: $0.0065 for 50 read units
Eventually consistent reads: $0.0065 for 100 read units
WRITE capacity units = size of item (KB) x writes per second
$0.0065 for 10 write units
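The capacity-unit arithmetic above can be sketched in a few lines. This is an illustration of the slide's formula and price points only: the function names are made up, rounding the item size up to a whole KB is an assumption, and the slide does not state the billing period the $0.0065 applies to.

```python
import math

PRICE_PER_BLOCK = 0.0065          # USD, price point quoted on the slide
READ_UNITS_PER_BLOCK = {"consistent": 50, "eventual": 100}
WRITE_UNITS_PER_BLOCK = 10


def capacity_units(item_size_kb, ops_per_second):
    # Slide formula: size of item (KB) x operations per second.
    # Assumption of this sketch: item size is rounded up to a whole KB.
    return math.ceil(item_size_kb) * ops_per_second


def read_price(item_size_kb, reads_per_second, consistency="consistent"):
    units = capacity_units(item_size_kb, reads_per_second)
    return units / READ_UNITS_PER_BLOCK[consistency] * PRICE_PER_BLOCK


def write_price(item_size_kb, writes_per_second):
    units = capacity_units(item_size_kb, writes_per_second)
    return units / WRITE_UNITS_PER_BLOCK * PRICE_PER_BLOCK
```

For 2 KB items read 100 times per second, this gives 200 read capacity units; at the quoted price points, eventually consistent reads cost half as much as consistent reads for the same workload.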
Reserved capacity
Up to 53% savings with a 1-year reservation
Up to 76% savings with a 3-year reservation
Transactions
Item level transactions only
Puts, updates and deletes are ACID
Atomic increment and decrement
Conditional writes
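To make the item-level guarantees concrete, here is a toy in-memory model of a conditional write and an atomic counter. It is not the DynamoDB API; `ToyTable`, `ConditionFailed`, and the method names are hypothetical, and a lock stands in for whatever the service does internally.

```python
import threading


class ConditionFailed(Exception):
    """Raised when an expected attribute value does not match the stored one."""


class ToyTable:
    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def put(self, key, item, expected=None):
        # Conditional write: apply only if every expected attribute matches.
        with self._lock:
            current = self._items.get(key, {})
            if expected is not None:
                for attr, value in expected.items():
                    if current.get(attr) != value:
                        raise ConditionFailed(attr)
            self._items[key] = dict(item)

    def increment(self, key, attr, delta=1):
        # Atomic counter: the read-modify-write happens under one lock.
        with self._lock:
            item = self._items.setdefault(key, {})
            item[attr] = item.get(attr, 0) + delta
            return item[attr]

    def get(self, key):
        with self._lock:
            return dict(self._items.get(key, {}))
```

A caller would, for example, move an order from "new" to "paid" with `expected={"status": "new"}`, so that a second attempt fails instead of silently overwriting the item.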
Read Consistency
Strong or eventually consistent reads
Same latency expectations for strong
Mix and match at “read time”
Data Modeling
Tables do not require a formal schema
Items are an arbitrarily sized hash.
id   date                 total
100  2012-05-16-09-00-10  25.00
101  2012-05-15-15-00-11  35.00
101  2012-05-16-12-00-10  100.00
102  2012-03-20-18-23-10  20.00
102  2012-03-20-18-23-10  120.00
Data modeling
Table
Data modeling
Item
Data modeling
Attributes
Items are indexed by primary and secondary keys
Primary keys can be composite
Secondary keys are local to the table
Indexing
Column  Role
ID      Hash key
Date    Range key (ID + Date: composite primary key)
Total   Secondary range key
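The composite primary key can be pictured as a two-level lookup: the hash key selects a group of items and the range key orders the items inside that group. The following is a minimal in-memory sketch of that idea, with a hypothetical class and method names; it says nothing about how DynamoDB stores data internally.

```python
import bisect
from collections import defaultdict


class CompositeKeyTable:
    """Items addressed by (hash key, range key); range keys kept sorted."""

    def __init__(self):
        self._range_keys = defaultdict(list)   # hash key -> sorted range keys
        self._items = {}                       # (hash key, range key) -> item

    def put_item(self, hash_key, range_key, attributes):
        if (hash_key, range_key) not in self._items:
            bisect.insort(self._range_keys[hash_key], range_key)
        self._items[(hash_key, range_key)] = dict(attributes)

    def get_item(self, hash_key, range_key):
        return self._items.get((hash_key, range_key))

    def items_for(self, hash_key):
        # All items sharing one hash key, ordered by range key.
        keys = self._range_keys.get(hash_key, [])
        return [self._items[(hash_key, rk)] for rk in keys]


table = CompositeKeyTable()
table.put_item(101, "2012-05-16-12-00-10", {"total": 100.00})
table.put_item(101, "2012-05-15-15-00-11", {"total": 35.00})
table.put_item(100, "2012-05-16-09-00-10", {"total": 25.00})
```

Here `table.items_for(101)` returns the two orders for id 101 in date order, regardless of insertion order.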
Programming DynamoDB.
Small but perfectly formed API.
Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
“Select”, “insert”, “update” items: PutItem, GetItem, UpdateItem, DeleteItem
Bulk select or update (max 1MB): BatchGetItem, BatchWriteItem
Query specific items OR scan the full table: Query, Scan
Query patterns
Retrieve all items by hash key.
Range key conditions:
==, <, >, >=, <=, begins with, between.
Counts. Top and bottom n values.
Paged responses.
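The query patterns above can be modeled on a plain list of items. This sketch only mimics the semantics (one hash key, an optional range key condition, counts, and paging); the function names and the `start_after` parameter are assumptions for illustration and do not mirror the real Query parameters.

```python
def begins_with(prefix):
    return lambda value: value.startswith(prefix)


def between(low, high):
    return lambda value: low <= value <= high


def query(items, hash_key, range_condition=None, limit=None, start_after=None):
    """Return (page, last_key) for one hash key, ordered by range key."""
    matches = sorted(
        (it for it in items
         if it["id"] == hash_key
         and (range_condition is None or range_condition(it["date"]))
         and (start_after is None or it["date"] > start_after)),
        key=lambda it: it["date"],
    )
    page = matches if limit is None else matches[:limit]
    # last_key is set only when more results remain after this page.
    more = limit is not None and page and len(matches) > limit
    last_key = page[-1]["date"] if more else None
    return page, last_key


ORDERS = [
    {"id": 100, "date": "2012-05-16-09-00-10", "total": 25.00},
    {"id": 101, "date": "2012-05-15-15-00-11", "total": 35.00},
    {"id": 101, "date": "2012-05-16-12-00-10", "total": 100.00},
    {"id": 102, "date": "2012-03-20-18-23-10", "total": 20.00},
]
```

A count is simply the length of the returned page, and passing the returned `last_key` back as `start_after` fetches the next page.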
500,000 WRITES PER SECOND DURING SUPER BOWL
Amazon DynamoDB: who is succeeding with it?
Option 2.3:
Managed data warehouse database
OLTP <-> OLAP
OLTP:
SELECT ProductID, Name
FROM Products
WHERE ProductID = 1234;
OLAP:
SELECT ProductID, count(*)
FROM Page_Hits
WHERE hour IN (12, 13)
GROUP BY ProductID;
Transactional Processing
• Transactional context
– Get order total
• Latency
• Indexed access
• Random IO
• Disk seek times
Analytical Processing
• Global context
– Daily revenue report
• Throughput
• Full table scans
• Sequential IO
• Disk transfer rates
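The two query shapes can be tried side by side with Python's built-in `sqlite3` module. SQLite is only a stand-in here so the example runs anywhere; the table names come from the slide, while the column types and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Page_Hits (ProductID INTEGER, hour INTEGER);
""")
conn.executemany("INSERT INTO Products VALUES (?, ?)",
                 [(1234, "Widget"), (5678, "Gadget")])              # made-up rows
conn.executemany("INSERT INTO Page_Hits VALUES (?, ?)",
                 [(1234, 12), (1234, 13), (5678, 12), (1234, 9)])   # made-up rows

# OLTP: a single row located through the primary-key index.
oltp_row = conn.execute(
    "SELECT ProductID, Name FROM Products WHERE ProductID = ?", (1234,)
).fetchone()

# OLAP: Page_Hits has no index, so every row is read and then aggregated.
olap_rows = conn.execute(
    "SELECT ProductID, count(*) FROM Page_Hits "
    "WHERE hour IN (12, 13) GROUP BY ProductID ORDER BY ProductID"
).fetchall()
```

The first query touches one row, so latency and seek time dominate; the second reads the whole table, so sequential throughput dominates.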
Amazon Redshift is a fast, fully managed, petabyte-scale
data warehouse service
Amazon Redshift
Fast and powerful
Parallelize and Distribute Everything
• MPP
• Load
• Query
• Resize
• Backup
• Restore
Dramatically Reduce I/O
• Direct-attached storage
• Large data block sizes
• Column data store
• Data compression
• Zone maps
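As a rough illustration of how zone maps reduce I/O, the sketch below keeps the minimum and maximum value of each block of a column and skips blocks whose range cannot contain the value being searched for. It is a toy model under simple assumptions (one integer column, fixed block size, made-up data), not a description of Redshift's actual implementation.

```python
def build_zone_map(column, block_size):
    """Split a column into blocks and record each block's min and max."""
    blocks = [column[i:i + block_size] for i in range(0, len(column), block_size)]
    return [(min(b), max(b), b) for b in blocks]


def scan_equal(zone_map, value):
    """Return matching values and how many blocks were actually read."""
    hits, blocks_read = [], 0
    for low, high, block in zone_map:
        if value < low or value > high:
            continue              # the block cannot contain the value: skip it
        blocks_read += 1
        hits.extend(v for v in block if v == value)
    return hits, blocks_read


hours = [0, 1, 1, 2, 5, 6, 6, 7, 12, 12, 13, 13]   # made-up, roughly sorted column
zone_map = build_zone_map(hours, block_size=4)
```

With this data, searching for hour 12 reads only one of the three blocks. The benefit depends on the column being at least roughly sorted, so that block ranges overlap little.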
Fully Managed
Protect Operations
• Redshift data is always encrypted
• Continuously backed up to S3
• Automatic node recovery
• Transparent disk failure
Simplify Provisioning
• Create a cluster in minutes
• Automatic OS and software patching
• Scale up to 1.6PB with a few clicks and no downtime
Amazon Redshift architecture
[Diagram labels: JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore]
Focus on your application
Best of both worlds: Use both SQL and NoSQL models in one app
More on Amazon Redshift?
03:15pm to 03:45pm
Introducing the Amazon Redshift data warehouse
Room Zero
Speaker: Steffen Krause, Amazon

Aws Summit Berlin 2013 - Understanding database options on AWS
