Cloud computing gives you a number of advantages, such as being able to scale your application on demand. As a new business looking to use the cloud, you inevitably ask yourself, "Where do I start?" Join us in this session to understand best practices for scaling your resources from zero to millions of users. We will show you how to best combine different AWS services, make smarter decisions for architecting your application, and best practices for scaling your infrastructure in the cloud.
2. • ME: Simon Elisha – Principal Solutions Architect –
Amazon Web Services – @simon_elisha
• YOU: Here to learn more about scaling infrastructure on
AWS
• TODAY: About best practices and things to think about
when building for large scale
10. Day One, User One
• A single EC2 Instance
– With full stack on this host
• Web app
• Database
• Management
• Etc.
• A single Elastic IP
• Route53 for DNS
EC2
Instance
Elastic IP
Amazon
Route 53
User
11. “We’re gonna need a bigger box”
• Simplest approach
• Can now leverage PIOPs
• High I/O instances
• High memory instances
• High CPU instances
• High storage instances
• Easy to change instance sizes
• Will hit an endpoint eventually
hi1.4xlarge
m2.4xlarge
m1.small
12. “We’re gonna need a bigger box”
• Simplest approach
• Can now leverage PIOPs
• High I/O instances
• High memory instances
• High CPU instances
• High storage instances
• Easy to change instance sizes
• Will hit an endpoint eventually
hi1.4xlarge
m2.4xlarge
m1.small
13. Day One, User One
• We could potentially get
to a few hundred to a few
thousand depending on
application complexity
and traffic
• No failover
• No redundancy
• Too many eggs in one
basket
EC2
Instance
Elastic IP
Amazon
Route 53
User
14. Day One, User One
• We could potentially get
to a few hundred to a few
thousand depending on
application complexity
and traffic
• No failover
• No redundancy
• Too many eggs in one
basket
EC2
Instance
Elastic IP
Amazon
Route 53
User
15. Day Two, User >1
First let’s separate out
our single host into
more than one.
• Web
• Database
– Make use of a database
service?
Web
Instance
Database
Instance
Elastic IP
Amazon
Route 53
User
16. Self-managed Fully Managed
Database Server on
Amazon EC2
Your choice of database
running on Amazon EC2
Bring Your Own License
(BYOL)
Amazon
DynamoDB
Managed NoSQL
database service using
SSD storage
Seamless scalability
Zero administration
Amazon
RDS
Microsoft SQL, Oracle,
Postgres or MySQL as
a managed service
Flexible licensing –
BYOL or license
included
Amazon
Redshift
Massively parallel,
petabyte-scale, data
warehouse service
Fast, powerful and
easy to scale
Database Options
17. But how do I choose
what DB technology I
need? SQL? NoSQL?
21. Why start with SQL?
• Established and well worn technology
• Lots of existing code, communities, books, background,
tools, etc
• You aren’t going to break SQL DBs in your first 10 million
users. But you might break parts of it (hence blended
approach)
• Clear patterns to scalability
22. If your usage is such that you will be
generating several TB ( >5 ) of data
in the first year OR have an
incredibly data intensive workload
you might need NoSQL
23. Why else might you need NoSQL?
• Super low latency applications
• Metadata driven datasets
• Highly unrelational data
• Need schema-less data constructs*
• Massive amounts of data (again, in the TB range)
• Rapid ingest of data (thousands of records/sec)
*Need != “its easier to do dev without schemas”
25. User >100
First let’s separate out
our single host into
more than one
• Web
• Database
– Use RDS to make your life
easier
Web
Instance
Elastic IP
RDS DB
Instance
Amazon
Route 53
User
26. User > 1000
Next let’s address our
lack of failover and
redundancy issues
• Elastic Load Balancing
• Another web instance
– In another Availability Zone
• Enable Amazon RDS multi-AZ
Web
Instance
RDS DB Instance
Active (Multi-AZ)
Availability Zone
Availability Zone
Web
Instance
RDS DB Instance
Standby (Multi-AZ)
Elastic
Load
Balancing
Amazon
Route 53
User
28. User >10 ks–100 ks
RDS DB Instance
Active (Multi-AZ)
Availability Zone
Availability Zone
RDS DB Instance
Standby (Multi-AZ)
Elastic Load
Balancing
RDS DB
Instance Read
Replica
RDS DB
Instance Read
Replica
RDS DB
Instance Read
Replica
RDS DB
Instance Read
Replica
Web
Instance
Web
Instance
Web
Instance
Web
Instance
Web
Instance
Web
Instance
Web
Instance
Web
Instance
Amazon
Route 53
User
30. Shift Some Load Around
Let’s lighten the load on our
web and database instances:
• Move static content from the
web Instance to Amazon S3
and CloudFront
• Move dynamic content from the
Elastic Load Balancing to
CloudFront
• Move session/state and DB
caching to ElastiCache or
Amazon DynamoDB
Web
Instance
RDS DB Instance
Active (Multi-AZ)
AVAILABILITY ZONE
Elastic
Load
Balancer
Amazon
S3
Amazon
CloudFro
nt
Amazon
Route 53
User
ElastiCache
Amazon
DynamoDB
31. Shift Some Load Around
Let’s lighten the load on our web
and database instances
• Move static content from the web
instance to Amazon S3 and
CloudFront
• Move dynamic content from the
Elastic Load Balancing to
CloudFront
• Move session/state and DB
caching to ElastiCache or
DynamoDB
Web
Instance
RDS DB Instance
Active (Multi-AZ)
Elastic
Load
Balancing
Amazon
S3
Amazon
CloudFront
Amazon
Route 53
User
ElastiCache
Amazon
DynamoDB
AVAILABILITY ZONE
32. Shift Some Load Around
Let’s lighten the load on our web
and database instances
• Move static content from the web
instance to Amazon S3 and
CloudFront
• Move dynamic content from the
Elastic Load Balancing to
CloudFront
• Move session/state and DB
caching to ElastiCache or
Amazon DynamoDB
Web
Instance
RDS DB Instance
Active (Multi-AZ)
Elastic
Load
Balancing
Amazon
S3
Amazon
CloudFront
Amazon
Route 53
User
ElastiCache
Amazon
Dynamo
DB
AVAILABILITY ZONE
33. Now that our Web tier is much
more lightweight, we can revisit
the beginning of our talk…
42. User >500k+
Availability Zone
Amazon
Route 53
User
Amazon S3
Amazon
Cloudfront
Availability Zone
Elastic Load
Balancing
DynamoDB
RDS DB Instance
Read Replica
Web
Instance
Web
Instance
Web
Instance
ElastiCache RDS DB Instance
Read Replica
Web
Instance
Web
Instance
Web
Instance
ElastiCacheRDS DB Instance
Standby (Multi-AZ)
RDS DB Instance
Active (Multi-AZ)
Auto
Scaling
Group
Auto
Scaling
Group
47. AWS Marketplace & Partners Can Help
• Customers can find, research, buy
software
• Simple pricing aligns with EC2
usage model
• Launch in minutes
• Marketplace billing integrated into
your AWS account
• 1,000+ products across 20+
categories
Learn more at: aws.amazon.com/marketplace
48. Spend Your Time Wisely
• Managing your infrastructure will become an increasingly
important part of your time. Use tools to automate
repetitive tasks
• Tools to manage AWS resources
• Tools to manage software on and configuration of your
instances
• Automated data analysis of logs and user actions
49. AWS Application Management Solutions
Elastic Beanstalk AWS OpsWorks AWS CloudFormation EC2
Convenience Control
Higher level services Do it yourself
50. Host-based Configuration Management
• Popular offerings
– Opscode Chef
– PuppetLabs Puppet
– Ansible
• Do similar things in slightly different ways
• Works well with tools from the previous slide
• Require some learning time
• Can’t scale easily without this kind of capability
51. From 500K to 1 Million Users
• Getting serious now
• Significant user base
• Plenty of attention if things go wrong
• Interesting phase for startups with funding rounds
52. Time to make some
radical improvements at
the web & app layers
54. SOAing
Move services into their own tiers
or modules. Treat each of these
as 100% separate pieces of your
infrastructure and scale them
independently.
Amazon.com and AWS do this
extensively! It offers flexibility and
greater understanding of each
component.
55. Loose Coupling Sets You Free!
• The looser they're coupled, the bigger they scale
– Use independent components
– Design everything as a black box
– Decouple interactions
– Favor services with built in redundancy and scalability than building
your own
Controller A Controller B
Controller A Controller B
Q Q
Tight Coupling
Use Amazon SQS as Buffers
Loose Coupling
56. Loose Coupling + SOA = Winning
Examples:
• Email
• Queuing
• Transcoding
• Search
Amazon
CloudSearch
Amazon SQSAmazon SNS
Amazon Elastic
Transcoder
Amazon SWF
Amazon SES
In the early days, if someone has a service for it already,
use that instead of building it yourself
Don’t reinvent the wheel
• Databases
• Monitoring
• Metrics
• Logging
58. Amazon S3
Bucket for
Ingest
User
Amazon SNS Topic
RRS
Amazon S3
Bucket to
Serve
Content to
CloudFront
Amazon S3
Bucket for
Originals
CloudFront
Download
Distribution
SQS Queue
Size for Thumbnail
SQS Queue
Size Image for
Mobile
SQS Queue
Size Image for Web
Autoscaling
Group
Instances
Autoscaling
Group
Instances
Autoscaling
Group
Instances
59. Amazon Simple Workflow Service (SWF)
• Provides an orchestration tool across your infrastructure
• Can act as a middle layer to pass messages and setup tasks
• Lets you break down individual tasks into different workers
• Lets you define logic between workers
• Lets you make a worker task from anything that can be scripted
• Includes built-in retries, timeouts, logging
• Features built-in reliability, scalability, and low cost
Deciders Workers
Your code = &
60. Amazon S3
Bucket for
Ingest
User
RRS
Amazon S3
Bucket to
Serve
Content to
CloudFront
Amazon S3
Bucket for
Originals
CloudFront
Download
Distribution
SQS Queue
Size for Thumbnail
SQS Queue
Size Image for
Mobile
SQS Queue
Size Image for Web
Autoscaling
Group
Instances
Autoscaling
Group
Instances
Autoscaling
Group
Instances
Amazon SNS Topic
61. Amazon S3
Bucket for
Ingest
User
RRS
Amazon S3
Bucket to
Serve
Content to
CloudFront
Amazon S3
Bucket for
Originals
CloudFront
Download
Distribution
Autoscaling
Group
Instances
Autoscaling
Group
Instances
Autoscaling
Group
Instances
SWF
Instance Running
Decider
62. Users > 1 Million
Reaching a million and above is going to require some of all the
previous things:
• Multi-AZ
• Elastic Load Balancing between tiers
• Auto Scaling
• Service-oriented architecture
• Serving content smartly (S3/CloudFront)
• Caching off DB
• Moving state off tiers that autoscale
63. Users > 1 Million
RDS DB Instance
Active (Multi-AZ)
Availability Zone
Elastic Load
Balancer
RDS DB Instance
Read Replica
RDS DB Instance
Read Replica
Web
Instance
Web
Instance
Web
Instance
Web
Instance
Amazon
Route 53
User
Amazon S3
Amazon
Cloudfront
Amazon
DynamoDB
Amazon SQS
ElastiCache
Worker
Instance
Worker
Instance
Amazon
CloudWatch
Internal App
Instance
Internal App
Instance
Amazon SES
65. From 5 to 10 Million Users
You may start to run into issues with your database around contention
on the write master.
• How can you solve it?
• Federation (splitting into multiple DBs based on function)
• Sharding (splitting one data set up across multiple hosts)
• Moving some functionality to other types of DBs (NoSQL)
66. Database Federation
• Split up databases by function or
purpose
• Harder to do cross-function queries
• Essentially delays the need for
something like sharding or NoSQL until
much further down the line
• Won’t help with single huge functions
or tables
ForumsDB
UsersDB
ProductsDB
67. Sharded Horizontal Scaling
• More complex at the application
layer
• ORM support can help
• No practical limit on scalability
• Operational complexity and
sophistication
• Shard by function or key space
• RDBMS or NoSQL
User ShardID
002345 A
002346 B
002347 C
002348 B
002349 A
A
B
C
68. Shifting Functionality to NoSQL
• Similar in a sense to federation
• Again, think about the earlier points for when you need
NoSQL vs SQL
• Leverage hosted services like Amazon DynamoDB
• Consider these use cases:
– Leaderboards and scoring
– Rapid ingest of clickstream or log data
– Temporary data needs (cart data)
– “Hot” tables
– Metadata or lookup tables
Amazon
DynamoDB
70. • Use Multi-AZ for your infrastructure
• Make use of self-scaling services (Elastic Load Balancing, Amazon S3, Amazon
SNS, SQS, Amazon SES, etc)
Build in redundancy at every level
• Blend SQL & NoSQL wisely
• Cache data both inside and outside your infrastructure
• Split tiers into individual services (SOA)
• Use autoscaling once you’re ready for it
• Use automation tools in your infrastructure
• Make sure you have good metrics, monitoring, and logging tools in place
• Don’t reinvent the wheel
71. Putting all this together
means we should now
easily be able to handle
10+ million users!
72. Users > 10 Million
Iterating on top of the
patterns seen here will get
you up and over 100
million users.
73. Users > 10 Million
• More fine tuning of your application
• More SOA of features and functionality
• Going from Multi-AZ to multi-region
• Needing to start building custom solutions
• Deep analysis of your whole stack
74. One More Thing
• A fantastic amount of FINANCIAL ENGINEERING
to do as well
• Reserved Instances
• Spot Instances
• Correct use of storage
• Scaling driven by queues
• Correct instance sizes
• Etc…