Rethinking the database for the
cloud
AWS database services best practices
Amazon Data Services Japan
Rasmus Ekman

Rethinking the database for the cloud (iJAWS)


Published in: Technology

  1. Rethinking the database for the cloud
     AWS database services best practices
     Amazon Data Services Japan, Rasmus Ekman
  2. Traditional architecture
     [Diagram: Client, Application, Relational database]
  3. Problems with this approach
     [Diagram: Client, Application, Relational database]
     • It doesn’t scale
     • Management is hard
     • High cost
     • Low performance
     • Migration is difficult
  4. Why do we get these problems?
     When all you have is a hammer, everything looks like a nail.
     [Diagram: Client, Application, Relational database]
  5. Rethinking the architecture
     [Diagram: Client, Application, and a Data layer made up of Search, NoSQL, SQL, DWH, Cache, Hadoop, Blob store, and ETL]
  6. AWS service and use case mapping
     Use case      AWS service
     Search        Amazon CloudSearch
     NoSQL         DynamoDB
     SQL           Amazon RDS
     DWH           Amazon Redshift
     Cache         ElastiCache
     Hadoop        Amazon EMR
     Blob store    Amazon S3
     ETL           AWS Data Pipeline
  7. Sample references
  8. Social gaming
     [Diagram: Mobile client, Elastic Load Balancer, Auto Scaling, DynamoDB, Amazon S3 (log files), Amazon Elastic MapReduce]
     Social games have a large number of transactions, all of which require high performance and extreme scalability.
     ① Player data is stored in Amazon DynamoDB, which can scale in both data volume and performance.
     ② Long-term usage log files are sent in parallel to S3 for unlimited, cheap storage.
     ③ Big data analytics are done in EMR, which integrates easily with both DynamoDB and S3.
  9. E-commerce site
     [Diagram: End users, Auto Scaling, ElastiCache, RDS (Master), RDS (Slave), Amazon DynamoDB, Amazon CloudSearch]
     High availability, search performance, and the flexibility to rapidly change data structures to fit new business requirements.
     ① For high-performance, low-latency responses, check the cache in ElastiCache first.
     ② Order and customer information is stored in a traditional but fault-tolerant RDS.
     ③ Item metadata, such as color and title, is stored in DynamoDB for a very flexible data schema.
     ④ For scalable search, metadata is indexed into CloudSearch, which handles full-text search easily.
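Step ① of the e-commerce example describes what is commonly called a cache-aside read path. The following is a minimal sketch of that pattern, not the deck's implementation: plain dictionaries stand in for ElastiCache and RDS, and the class name `OrderStore` and the sample record are hypothetical.

```python
class OrderStore:
    """Cache-aside read path; dicts stand in for ElastiCache and RDS."""

    def __init__(self, database):
        self.database = database  # stand-in for the RDS master (assumption)
        self.cache = {}           # stand-in for ElastiCache (assumption)
        self.db_reads = 0         # counts how often the database is hit

    def get(self, key):
        # 1. Try the cache first.
        if key in self.cache:
            return self.cache[key]
        # 2. Cache miss: fall back to the database.
        self.db_reads += 1
        value = self.database.get(key)
        # 3. Populate the cache so the next read is served from memory.
        if value is not None:
            self.cache[key] = value
        return value


store = OrderStore({"order-1001": {"customer": "alice", "total": 4200}})
first = store.get("order-1001")   # miss: reads the database
second = store.get("order-1001")  # hit: served from the cache
print(store.db_reads)
```

Only the first read reaches the database; a real deployment would also need an expiry or invalidation policy for cached entries, which this sketch leaves out.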
  10. How do I know which service to pick?
      The “data temperature” method
  11. What is “data temperature”?
      Data ?
      http://www.amazon.co.jp/dp/B0016V9FCQ
  12. Data temperature
                     Hot         Warm       Cold
      Volume         MB~GB       GB~TB      PB
      Item size      B~KB        KB~MB      KB~TB
      Latency        ms          ms-s       min-hr
      Durability     Low-high    High       Very high
      Request rate   Very high   High       Low
      Cost/GB        $$~$        $~¢¢       ¢
      The temperature of the data will vary depending on its format and use.
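As a rough illustration of how the Latency row of the table could be turned into a first-pass classification, here is a small sketch. The function name and both thresholds (10 ms and 60 s) are assumptions chosen only to separate the "ms", "ms-s", and "min-hr" bands; the slide gives no exact cut-offs, and a real decision would weigh the other rows too.

```python
def data_temperature(tolerable_latency_seconds):
    """Classify data by the read latency its consumers can tolerate.

    Thresholds are illustrative assumptions, loosely following the
    Latency row: hot = ms, warm = ms to s, cold = min to hr.
    """
    if tolerable_latency_seconds < 0.01:
        return "hot"
    if tolerable_latency_seconds < 60:
        return "warm"
    return "cold"


for latency in (0.002, 1.5, 3600):
    print(latency, data_temperature(latency))
```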
  13. The AWS service heat map
      [Figure: Amazon ElastiCache, Amazon RDS, Amazon DynamoDB, Amazon S3, Amazon Redshift, and Amazon EMR placed along four axes, each running from low to high: data volume, latency, cost/GB, and request rate]
  14. How do I know which service to pick?
      The cost estimation method
  15. Choosing service based on cost estimate
      Example: Should I pick S3 or DynamoDB?
      • “I’m currently scoping out a project that will greatly increase my team’s use of Amazon S3. Hoping you could answer some questions. The current iteration of the design calls for many small files, perhaps up to a billion during peak. The total size would be on the order of 1.5 TB per month…”
      Request rate (writes/s)   Object size (bytes)   Total size (GB/month)   Objects per month
      300                       2048                  1483                    777,600,000
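The last two columns of the table follow from the first two. A minimal sketch of that arithmetic is below; the 30-day month and the binary gigabyte (2^30 bytes) are assumptions that are not stated on the slide but reproduce its numbers, and the function name is hypothetical.

```python
# Assumptions (not stated on the slide): 30-day month, 1 GB = 2**30 bytes.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def monthly_footprint(writes_per_second, object_size_bytes):
    """Return (objects written per month, total size in GB per month)."""
    objects = writes_per_second * SECONDS_PER_MONTH
    total_gb = objects * object_size_bytes / 2**30
    return objects, total_gb


objects, total_gb = monthly_footprint(300, 2048)
print(f"{objects:,} objects, {total_gb:,.0f} GB per month")
# prints: 777,600,000 objects, 1,483 GB per month
```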
  16. Choosing service based on cost estimate
      Example: Should I pick S3 or DynamoDB?
      • Time for …
      ※: http://calculator.s3.amazonaws.com/index.html?lng=ja_JP
  17. Choosing service based on cost estimate
      Example: Should I pick S3 or DynamoDB?
      Request rate (writes/s)   Object size (bytes)   Total size (GB/month)   Objects per month
      300                       2048                  1483                    777,600,000
      DynamoDB monthly cost: $669.56 < Amazon S3 monthly cost: $4325.33
  18. Choosing service based on cost estimate
      Example: Should I pick S3 or DynamoDB?
                   Request rate (writes/s)   Object size (bytes)   Total size (GB/month)   Objects per month   Result
      Scenario 1   300                       2048                  1483                    777,600,000         DynamoDB wins
      Scenario 2   300                       32,768                23,730                  777,600,000         Amazon S3 wins
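One way to see why the result flips between the two scenarios: the number of S3 PUT requests is the same in both, while DynamoDB write throughput is provisioned in write capacity units, each covering one write per second of an item up to 1 KB. The sketch below only counts capacity units. It deliberately uses no prices, since the slides do not list them, and the function name is hypothetical.

```python
import math


def write_capacity_units(writes_per_second, item_size_bytes):
    """Write capacity units needed, at one unit per write/s of up to 1 KB."""
    units_per_write = math.ceil(item_size_bytes / 1024)
    return writes_per_second * units_per_write


scenarios = {"Scenario 1": 2048, "Scenario 2": 32768}
for name, item_size in scenarios.items():
    units = write_capacity_units(300, item_size)
    print(f"{name}: {units} write capacity units, 300 S3 PUTs per second")
```

Growing the object from 2 KB to 32 KB multiplies the DynamoDB write capacity by 16 (600 to 9600 units) while the S3 request count stays flat, which is consistent with S3 coming out ahead in Scenario 2.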
  19. Summary
  20. Summary
      • The era of relational-database-only, on-premises architecture is over.
      • Performance, reliability, and scalability can all be improved by the cloud, but choosing the right architecture is a must.
      • There are several ways of choosing the right service for the job
        – Use the “data temperature” and use case
        – Use the reverse cost estimate method
        – Ask AWS sales
  21. When in doubt, contact us
      https://aws.amazon.com/jp/contact-us/
  22. APPENDIX
      AWS database services - introduction and best practices
  23. Amazon RDS
      A fully managed relational database service
      • Create and scale with a few clicks
      • Automated backups every 5 minutes for DR
      • Manual snapshot feature
      • Automated security patching
      • 4 supported engines
      • Monitoring and automatic recovery
      [Diagram: Master in Availability Zone A and Slave in Availability Zone B, with data sync, automatic failover, and automated backup]
  24. Amazon RDS
      A fully managed relational database service
      When to use
      • Transactions
      • Complex queries
      • Medium to high query/write rate
        – Up to 30 K IOPS (15 K reads + 15 K writes)
      • 100s of GB to low TBs
      • Workload can fit in a single node
      • High durability
      When not to use
      • Massive read/write rates
        – Example: 150 K write requests per second
      • Data size or throughput demands sharding
        – Example: 10s or 100s of terabytes
      • Simple Get/Put and queries that a NoSQL store can handle
      • Complex analytics
  25. DynamoDB
      Fully managed NoSQL service
      • Easy administration and high availability
        – No SPOF
        – Data is replicated across 3 availability zones
        – Storage scales, and data is automatically partitioned
      • No limit on storage
        – Only pay for the storage you use
        – No need to add nodes or disks as storage grows
      [Diagram: Client, Region]
  26. DynamoDB
      Fully managed NoSQL service
      When to use
      • Fast and predictable performance
      • Seamless/massive scale
      • Autosharding
      • Consistent/low latency
      • No size or throughput limits
      • Very high durability
      • Key-value or simple queries
      When not to use
      • Need multi-item/row or cross-table transactions
      • Need complex queries, joins
      • Need real-time analytics on historic data
      • Storing cold data
  27. Amazon Redshift
      Fully managed data warehouse service
      • DWH as a Service: Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service
      • Scalable: 160 GB to petabytes
      • Fast: Amazon Redshift has a massively parallel processing (MPP) architecture, parallelizing and distributing SQL operations to take advantage of all available resources.
      • Low cost: No initial cost, no license fees, and only pay for what you use.
      [Diagram: BI tools connect over JDBC/ODBC to the leader node (SQL end point: parallel queries, create results), which is linked to the compute nodes over a 10GigE mesh; more nodes can be added; S3, DynamoDB, EMR integration]
  28. Amazon Redshift
      Fully managed data warehouse service
      When to use
      • Information analysis and reporting
      • Complex DW queries that summarize historical data
      • Batched large updates, e.g. daily sales totals
      • 10s of concurrent queries
      • 100s of GB to PB
      • Compression
      • Column based
      • Very high durability
      When not to use
      • OLTP workloads
        – 1000s of concurrent users
        – Large number of singleton updates
  29. Amazon S3
      Low cost, highly reliable object storage service
      [Diagram: on the user side, File A, File B, and File C; on the infrastructure side, Datacenter A, Datacenter B, and Datacenter C]
      • Designed for 99.999999999% durability
      • Data automatically replicated
      • Choose from over 9 regions globally
      • Just put data, with no need to worry about scalability, infrastructure, volume expansion, etc.
      • Only pay for what you use
        – Example: 1 GB/month costs about 3 yen
  30. Amazon S3
      Low cost, highly reliable object storage service
      When to use
      • Store large objects
      • Key-value store - Get/Put/List
      • Unlimited storage
      • Versioning
      • Very high durability
        – 99.999999999%
      • Very high throughput (via parallel clients)
      • Use for storing persistent data
        – Backups
        – Source/target for EMR
        – Blob store with metadata in SQL or NoSQL
      When not to use
      • Complex queries
      • Very low latency (ms)
      • Search
      • Read-after-write consistency for overwrites
      • Need transactions
