Scalable Databases for Fast Growing Startups
Blair Layton
Business Development Manager – Database Services
Amazon Web Services - APAC
AWS Activate webinar - Scalable databases for fast growing startups
Fast growing startups building high scale applications demand a lot from their infrastructure and in particular from their databases. Often, databases become the bottleneck of the startups’ technology stack, with the risk of inhibiting fast growth as they are not easy to set up, operate and scale in the cloud. This webinar focuses on how to build scalable databases in the Cloud and covers how to effectively combine the use of relational, NoSQL, and even data warehouse databases, which have become a reality for startups with the launch of Amazon Redshift.
Key takeaways:

Understand the trade-off between SQL and NoSQL and when to go for a hybrid model.
Best practices in setting up your database in the AWS cloud whether using managed services or managing it yourself.
Learn how to minimize the costs of your database with the right architecture and pricing models.
Who should attend:

DBAs
Startup CTOs
Developers
Engineers
Architects
Growth Hackers

Published in: Data & Analytics

  1. 1. Scalable Databases for Fast Growing Startups Blair Layton Business Development Manager – Database Services Amazon Web Services - APAC
  2. 2. Agenda • Self-Managed or Managed Database Services? • NoSQL or Relational? • Performance Tips and Tricks • How to scale from 1 to 10,000,000 users? • How do I save money? • Summary • Q&A
  3. 3. Self-Managed or Managed Database Services?
  4. 4. Why Managed Databases? Chart labels: backup & recovery, data load & unload; performance tuning; scripting & coding; security planning; install, upgrade, patch and migrate; documentation, licensing & training. Chart values (the label each value belongs to is not recoverable from the transcript): 25%, 40%, 5%, 5%.
  5. 5. If You Host Your Databases On-premises Power, HVAC, net Rack & stack Server maintenance OS patches DB s/w patches Database backups Scaling High availability DB s/w installs OS installation you App optimization
  6. 6. Power, HVAC, net Rack & stack Server maintenance OS patches DB s/w patches Database backups Scaling High availability DB s/w installs OS installation you App optimization If You Host Your Databases On-premises
  7. 7. If You Host Your Databases in EC2 Power, HVAC, net Rack & stack Server maintenance OS patches DB s/w patches Database backups Scaling High availability DB s/w installs OS installation you App optimization
  8. 8. OS patches DB s/w patches Database backups Scaling High availability DB s/w installs you App optimization Power, HVAC, net Rack & stack Server maintenance OS installation If You Host Your Databases in EC2
  9. 9. If You Choose a Managed Database Service Power, HVAC, net Rack & stack Server maintenance OS patches DB s/w patches Database backups App optimization High availability DB s/w installs OS installation you Scaling
  10. 10. differentiated effort increases the uniqueness of an application
  11. 11. Amazon RDS Amazon DynamoDB Amazon Redshift Amazon ElastiCache Compute Storage AWS Global Infrastructure Database Application Services Deployment & Administration Networking AWS Database Services Scalable High Performance Application Storage in the Cloud
  12. 12. Relational Databases Fully managed; zero admin MySQL, Oracle, Postgres, SQL Server Trillions of I/O requests/month Amazon RDS
  13. 13. Flipboard relies on Amazon RDS • Flipboard is an online magazine with millions of users and billions of “flips” per month • Uses Amazon RDS and its Multi-AZ capabilities to store mission critical user data "We were able to go from concept to delivered product in about six months with just a handful of engineers." - Greg Scallan, Chief Architect, Flipboard
  14. 14. • Manageability Rapid deployment with pre-configured parameters Patch Management Monitoring and Metrics • Availability and Data Durability Automated Backups and Point-In-Time-Recovery DB Snapshots Automatic Host Replacement (Single-AZ) Multi-AZ deployments • Scalability Push-Button Scaling • Storage, Memory and Compute Read Replicas Key Features
  15. 15. RDS for Production Workloads. Amazon RDS Configuration: Improve Availability, Increase Throughput, Reduce Latency. Features: Push-Button Scaling, Multi-AZ, Read Replicas, Provisioned IOPS. Diagram labels: Read Replicas, Push-Button Scaling, Provisioned IOPS, Region, Multi-AZ, availability zone, availability zone
  16. 16. In-Memory Cache Elastic and reliable Memcached or Redis Fully managed; zero admin Amazon ElastiCache
  17. 17. ElastiCache: Fully Managed Cache Service Easy to Deploy Deploy master- slave(s) configuration with a few button clicks or API calls Easy to Migrate Compatible with memcached or Redis Existing code will work when you update node end points Easy to Administer ElastiCache automatically replaces failed nodes and patches software as needed CloudWatch enables you to monitor cache performance metrics Easy to Secure Supports VPC and Security Group configurations Easy to Scale Provide assisted scale up and scale out capability
  18. 18. Application Server Hot Items Small, frequently-accessed items are ideal candidates for read caching • Reduce server-side latency to <1ms • Eliminate “hot spot” performance barriers • Offload heavy read activity from database
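The read-caching idea on this slide is commonly implemented as a cache-aside lookup: check the cache first, and go to the database only on a miss. The following is a minimal sketch, not the presenter's code. An in-process dict stands in for a Memcached or Redis node, and `CacheAside`, `fake_db_lookup`, and the 60-second TTL are hypothetical names and values chosen for illustration.

```python
import time


class CacheAside:
    """Cache-aside helper: serve from the cache, fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60.0):
        self._load = load_from_db      # function that performs the real database read
        self._ttl = ttl_seconds        # hypothetical expiry; tune per workload
        self._store = {}               # key -> (expires_at, value); stand-in for the cache node
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the database and repopulate the cache.
        self.misses += 1
        value = self._load(key)
        self._store[key] = (now + self._ttl, value)
        return value


db_calls = []


def fake_db_lookup(key):
    """Hypothetical database read; records each call so offloading is visible."""
    db_calls.append(key)
    return {"id": key, "name": f"item-{key}"}


cache = CacheAside(fake_db_lookup)
for _ in range(3):
    item = cache.get(42)
```

Three reads of the same hot item reach the database once; the other two are served from the cache, which is the offloading effect the slide describes.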
  19. 19. NoSQL Database Durable low latency Fully managed; zero admin Massive and seamless scalability Amazon DynamoDB
  20. 20. Durable Low Latency – At Scale. WRITES: continuously replicated to 3 AZs; quorum acknowledgment; persisted to disk (custom SSD). READS: strongly or eventually consistent; no trade-off in latency.
  21. 21. Petabyte scale Massively parallel Relational data warehouse Fully managed; zero admin Amazon Redshift a lot faster a lot cheaper a whole lot simpler
  22. 22. • Load • Query • Resize • Backup • Restore Parallelize and Distribute Everything Compute Node 16TB 10 GigE (HPC) Ingestion Backup Restore SQL Clients / BI Tools Amazon S3 Client VPC Compute Node 16TB Compute Node 16TB Leader Node
  23. 23. Databases on EC2 • Any database that runs on Windows or Linux! • Why? • No managed service exists from AWS, e.g. MongoDB • Full control • Exceed limits of managed service, e.g. > 3TB of storage on RDS
  24. 24. NoSQL or Relational?
  25. 25. Not available on AWS Spectrum of Database Options SQL NoSQL Low Cost High Cost Do-it Yourself Fully Managed
  26. 26. Spectrum of Database Options SQL NoSQL Do-it Yourself Fully Managed
  27. 27. MySQL, Oracle, SQL Server, PostgreSQL Amazon Redshift Spectrum of Database Options SQL NoSQL Do-it Yourself Fully Managed MySQL, Oracle, SQL Server, PostgreSQL, MariaDB, Vertica, ParAccel…
  28. 28. Spectrum of Database Options SQL NoSQL Do-it Yourself Fully Managed MongoDB Cassandra Redis Memcache DynamoDB ElastiCache (Memcache) ElastiCache (Redis) SimpleDB
  29. 29. Thinking About the Questions: Should I use SQL or NoSQL? Should I use MySQL or PostgreSQL? Should I use Redis, Memcache, or ElastiCache? Should I use MongoDB, Cassandra, or DynamoDB?
  30. 30. Actually, Thinking About the Right Questions: What are my scale and latency needs? What are my transactional and consistency needs? What are my read/write, storage and IOPS needs? What are my time to market and server control needs?
  31. 31. Factors to Consider
      Factors      | SQL                                   | NoSQL
      Application  | App with complex business logic?      | Web app with lots of users?
      Transactions | Complex transactions, joins, updates? | Simple data model, updates, queries?
      Scale        | Developer managed                     | Automatic, on-demand scaling
      Performance  | Developer architected                 | Consistent, high performance at scale
      Availability | Architected for fail-over             | Seamless and transparent
      Core Skills  | SQL + Java/Ruby/Python/PHP            | NoSQL + Java/Ruby/Python/PHP
      Best of both worlds: possible to use SQL and NoSQL models in one app
  32. 32. Performance Tips and Tricks
  33. 33. Performance Tips and Tricks • Understand your workload – Read:Write ratio, I/O requirements, CPU requirements • Identify bottlenecks – CPU, Memory, Disk I/O, Network latency/bandwidth – Use CloudWatch and OS metrics • Choose the right instance type – High CPU, High Memory, High Storage, etc. • Understand EBS!
  34. 34. EBS =
  35. 35. Amazon Elastic Block Store (EBS): Amazon EBS Magnetic • IOPS: ~100 IOPS steady-state, with best-effort bursts to hundreds. Variability: 40-200 IOPS. • Throughput: variable by workload, best effort to 10s of MB/s. • Latency: Varies, reads typically <20 ms, writes typically <10 ms. • Capacity: As provisioned, up to 1 TB.
  36. 36. Amazon Elastic Block Store (EBS): Amazon EBS General Purpose • IOPS: 3 IOPS per GB consistent, with bursts to 3,000 IOPS. Bucket principle: the bucket fills while I/O is not used and empties as it is used. • Throughput: variable by workload, best effort to 64 MB/s. • Latency: Low and consistent. • Capacity: As provisioned, up to 1 TB.
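The "bucket principle" on this slide can be pictured as a credit bucket that refills at the baseline rate and drains while the volume bursts. The sketch below is an illustration under stated assumptions, not the actual EBS accounting: the 3 IOPS per GB baseline and the 3,000 IOPS burst ceiling come from the slide, while `bucket_capacity` and the one-second time step are hypothetical parameters chosen to keep the arithmetic easy to follow.

```python
def gp_baseline_iops(size_gb, iops_per_gb=3):
    """Baseline IOPS for a General Purpose volume (3 IOPS per GB, per the slide)."""
    return size_gb * iops_per_gb


def simulate_burst(size_gb, demand_iops, seconds, bucket_capacity, burst_iops=3000):
    """Return the IOPS served in each second under a simple credit-bucket model.

    bucket_capacity is a hypothetical credit limit, not a documented figure.
    """
    baseline = gp_baseline_iops(size_gb)
    ceiling = max(burst_iops, baseline)   # large volumes may have a baseline above the burst figure
    credits = bucket_capacity             # assume the bucket starts full
    served_per_second = []
    for _ in range(seconds):
        # The bucket refills at the baseline rate, up to its capacity.
        credits = min(bucket_capacity, credits + baseline)
        # Serve as much as demand, the burst ceiling, and the remaining credits allow.
        served = min(demand_iops, ceiling, credits)
        credits -= served
        served_per_second.append(served)
    return served_per_second


# A 100 GB volume under sustained 3,000 IOPS demand with a small illustrative bucket.
timeline = simulate_burst(size_gb=100, demand_iops=3000, seconds=4, bucket_capacity=6000)
```

With these illustrative numbers the volume bursts at 3,000 IOPS until the bucket drains, then settles at its 300 IOPS baseline, which is why sustained workloads should be sized on the baseline rather than the burst figure.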
  37. 37. Amazon Elastic Block Store (EBS): Amazon EBS Provisioned IOPS • IOPS: Within 10% of up to 4000 IOPS, 99.9% of a given year, as provisioned. • Throughput: 16 KB per I/O = up to 64 MB/s, as provisioned. • Latency: low and consistent, at recommended queue depth (QD). • Capacity: As provisioned, up to 1 TB.
  38. 38. EC2 Why the ? * An I/O EBS Just because Amazon EC2 sends more work doesn’t mean there’s enough bandwidth to handle it!
  39. 39. EC2 Why the ? * An I/O Without more bandwidth, more EBS volumes or higher PIOPS won’t help!
  40. 40. EBS-Optimized Oh, YEAH!! * EC2 A “boatload” of I/O * EBS w/ PIOPS
  41. 41. Architecting for Performance • IOPS consistency requires EBS-optimized instances • Maximum throughput delivered by Amazon EBS is limited by Amazon EC2 bandwidth • EBS throughput = EBS IOPS × Block size – Ex: 64 MB/s = 4000 IOPS × 16 KB; Max 8k = 2x; Max 4k = 4x*; Max 2k = 8x* (*Maximum IOPS is also limited to ~100,000 per 32 vCPU, irrespective of block size/throughput.)
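The throughput formula on this slide can be checked with a few lines of arithmetic. This is a sketch that assumes 1 MB = 1000 KB, which is what the slide's "64 MB/s = 4000 IOPS × 16 KB" example implies; the function names are illustrative. One reading of the "2x / 4x / 8x" figures is that, for the same throughput ceiling, halving the block size doubles the IOPS that fit, and the second function reproduces that.

```python
def ebs_throughput_mb_s(iops, block_size_kb):
    """Throughput in MB/s = IOPS x block size, taking 1 MB = 1000 KB as the slide does."""
    return iops * block_size_kb / 1000


def iops_for_throughput(throughput_mb_s, block_size_kb):
    """IOPS needed to fill a given throughput at a given block size."""
    return throughput_mb_s * 1000 / block_size_kb


# The slide's example: 4000 IOPS at 16 KB per I/O.
example = ebs_throughput_mb_s(4000, 16)

# The same 64 MB/s expressed at smaller block sizes.
iops_by_block_size = {kb: iops_for_throughput(64, kb) for kb in (16, 8, 4, 2)}
```

At 16 KB the 64 MB/s ceiling corresponds to 4,000 IOPS; at 8 KB, 4 KB, and 2 KB it corresponds to 8,000, 16,000, and 32,000 IOPS, subject to the per-instance IOPS limit the slide's footnote mentions.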
  42. 42. Additional Hints • Mount partitions with “noatime” and “nodiratime” – Removes a write every time a read is done • Turn off file system read ahead if possible – Especially for OLTP systems • Use vendor storage solutions – Oracle ASM • Optimize kernel settings
  43. 43. Scaling from 1 to 10,000,000 Users
  44. 44. So how do we scale?
  45. 45. Hi, I have NO IDEA what I am doing!!
  46. 46. So let’s start from day one, user one ( you )
  47. 47. Day One, User One: • We could potentially get to a few hundred to a few thousand depending on application complexity and traffic • No failover • No redundancy • Too many eggs in one basket EC2 Instance Elastic IP Amazon Route 53 User
  48. 48. “We’re gonna need a bigger box” • Simplest approach • Can now leverage PIOPS • High I/O instances • High memory instances • High CPU instances • High storage instances • Easy to change instance sizes • Will eventually hit a limit. Instance sizes shown: r3.8xlarge, m3.2xlarge, t2.small
  49. 49. Day Two, User >1 First let’s separate out our single host into more than one. • Web • Database – Make use of a database service? Web Instance Database Instance Elastic IP Amazon Route 53 User
  50. 50. Start with the right databases for the job
  51. 51. User >100 First let’s separate out our single host into more than one • Web • Database – Use RDS to make your life easier Web Instance Elastic IP RDS DB Instance Amazon Route 53 User
  52. 52. User > 1000 Next let’s address our lack of failover and redundancy issues • Elastic Load Balancing • Another web instance – In another Availability Zone • Enable Amazon RDS multi-AZ Web Instance RDS DB Instance Active (Multi-AZ) Availability Zone Availability Zone Web Instance RDS DB Instance Standby (Multi-AZ) Elastic Load Balancing Amazon Route 53 User
  53. 53. User >10Ks–100Ks RDS DB Instance Active (Multi-AZ) Availability Zone Availability Zone RDS DB Instance Standby (Multi-AZ) Elastic Load Balancing RDS DB Instance Read Replica RDS DB Instance Read Replica RDS DB Instance Read Replica RDS DB Instance Read Replica Web Instance Web Instance Web Instance Web Instance Web Instance Web Instance Web Instance Web Instance Amazon Route 53 User
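The architecture on this slide relies on the application sending writes to the active instance and spreading reads across the read replicas. A minimal sketch of that routing follows; it is an illustration, not an RDS API. `ReplicaRouter` and the endpoint hostnames are hypothetical, and classifying a statement by its first keyword is a deliberate simplification.

```python
import itertools


class ReplicaRouter:
    """Send SELECTs to read replicas in round-robin order and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # cycle() yields the replicas in order, forever; None means "no replicas configured".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # Writes, DDL, and anything unrecognised go to the primary.
        return self.primary


# Hypothetical endpoint names, for illustration only.
router = ReplicaRouter(
    primary="primary.db.example.internal",
    replicas=["replica-1.db.example.internal", "replica-2.db.example.internal"],
)
targets = [
    router.endpoint_for("SELECT * FROM users WHERE id = 1"),
    router.endpoint_for("SELECT * FROM users WHERE id = 2"),
    router.endpoint_for("UPDATE users SET name = 'x' WHERE id = 1"),
]
```

Because replicas can lag behind the primary, reads that must see a just-committed write are usually sent to the primary as well; this sketch does not model that case.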
  54. 54. This will take us pretty far honestly, but we care about performance and efficiency, so let’s clean this up a bit
  55. 55. Shift Some Load Around Let’s lighten the load on our web and database instances • Move static content from the web instance to Amazon S3 and CloudFront • Move dynamic content from the Elastic Load Balancing to CloudFront • Move session/state and DB caching to ElastiCache or DynamoDB Web Instance RDS DB Instance Active (Multi-AZ) Availability Zone Elastic Load Balancing Amazon S3 Amazon CloudFront Amazon Route 53 User ElastiCache Amazon DynamoDB
  56. 56. User >500k+ Availability Zone Amazon Route 53 User Amazon S3 Amazon CloudFront Availability Zone Elastic Load Balancing DynamoDB RDS DB Instance Read Replica Web Instance Web Instance Web Instance ElastiCache RDS DB Instance Read Replica Web Instance Web Instance Web Instance ElastiCache RDS DB Instance Standby (Multi-AZ) RDS DB Instance Active (Multi-AZ)
  57. 57. From 500K to 1 Million Users • Getting serious now • Significant user base • Plenty of attention if things go wrong • Interesting phase for startups with funding rounds
  58. 58. Time to make some radical improvements at the web & app layers
  59. 59. SOAing Move services into their own tiers or modules. Treat each of these as 100% separate pieces of your infrastructure and scale them independently. Use queues! Amazon.com and AWS do this extensively! It offers flexibility and greater understanding of each component.
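The "Use queues!" advice on this slide decouples the tier that accepts work from the tier that performs it, so each can scale on its own. The sketch below shows the pattern with Python's in-process `queue.Queue` standing in for a managed queue such as Amazon SQS; `run_workers` and the squaring handler are hypothetical and only illustrate the shape of a producer and a pool of workers.

```python
import queue
import threading


def run_workers(jobs, handler, worker_count=2):
    """Push jobs onto a queue and let a pool of worker threads drain it."""
    work_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = work_queue.get()
            if job is None:            # sentinel: no more work for this worker
                work_queue.task_done()
                break
            outcome = handler(job)
            with results_lock:
                results.append(outcome)
            work_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job in jobs:                   # the "web tier" only enqueues and moves on
        work_queue.put(job)
    for _ in threads:                  # one sentinel per worker
        work_queue.put(None)
    work_queue.join()
    for thread in threads:
        thread.join()
    return results


processed = sorted(run_workers(range(5), handler=lambda n: n * n))
```

Raising `worker_count` adds processing capacity without touching the code that enqueues jobs, which is the independent scaling the slide argues for.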
  60. 60. Users > 1 Million RDS DB Instance Active (Multi-AZ) Availability Zone Elastic Load Balancer RDS DB Instance Read Replica RDS DB Instance Read Replica Web Instance Web Instance Web Instance Web Instance Amazon Route 53 User Amazon S3 Amazon CloudFront Amazon DynamoDB Amazon SQS ElastiCache Worker Instance Worker Instance Amazon CloudWatch Internal App Instance Internal App Instance Amazon SES
  61. 61. The next big steps
  62. 62. From 5 to 10 Million Users You may start to run into issues with your database around contention on the write master. How can you solve it? • Federation (splitting into multiple DBs based on function) • Sharding (splitting one data set up across multiple hosts) • Moving some functionality to other types of databases – NoSQL for hot tables, lookup tables, leaderboards/scoring, metadata – Data warehouse for analytics: user behavior, performance monitoring, A/B testing results, KPIs/dashboards.
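Federation and sharding, as described on this slide, both come down to a routing decision made before a query is sent. The sketch below illustrates one simple way to make that decision and is not a recommendation from the deck: the database names in `FEDERATED_DBS` and `USER_SHARDS` are hypothetical, and hash-modulo placement is only one of several sharding schemes.

```python
import hashlib

# Federation: one database per function. Names are hypothetical.
FEDERATED_DBS = {
    "users": "users-db",
    "orders": "orders-db",
    "forums": "forums-db",
}

# Sharding: one data set split across several hosts. Names are hypothetical.
USER_SHARDS = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]


def database_for(function_name):
    """Federation: pick the database that owns this functional area."""
    return FEDERATED_DBS[function_name]


def shard_for(user_id, shards=USER_SHARDS):
    """Sharding: map a key to a host with a stable hash, so the same key always lands on the same shard."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


placement = {user_id: shard_for(user_id) for user_id in range(1000)}
```

A stable hash is used rather than Python's built-in `hash()`, which is randomized per process for strings. Note that with modulo placement, changing the number of shards moves most keys, so resharding needs to be planned for.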
  63. 63. How do I Save Money?
  64. 64. Saving $$$ • Use managed database services – Focus your limited resources on the application – ElastiCache can reduce your database costs • Understand how to scale from the start – Save redesign work and unhappy customers – Start and stop instances as required • Use the AWS platform – Don’t reinvent the wheel, concentrate on your core competency – Using CloudFront will reduce your costs on EC2 dramatically • Purchase Reserved Instances (RIs) and use Spot instances • Constantly monitor and right-size your environment
  65. 65. Sorry, How do I Scale my Database?
  66. 66. Summary • Decide on self-managed or managed database services • Choose the right database for your use case and skillsets to start with • Use Multi-AZ for your infrastructure • Choose the right instance family and size for your workloads • Understand the 3 types of EBS (Magnetic, General Purpose and PIOPS) • Make use of self-scaling services (Elastic Load Balancing, Amazon S3, Amazon SNS, SQS, Amazon SES, etc.) • Build in redundancy at every level • Blend SQL & NoSQL wisely • Use a data warehouse to offload large analytical queries from your main database • Cache data both inside and outside your infrastructure • Purchase RIs and use Spot instances • Split tiers into individual services (SOA) • Use autoscaling once you are ready for it • Use automation tools in your infrastructure • Make sure you have good metrics, monitoring, and logging tools in place • Don’t reinvent the wheel
  67. 67. Q & A