AWS Summit 2014
Architecting Highly Available
Applications on AWS
Alex Sinner
Solutions Architect
@alexsinner
T1 – Architecting Highly Available Applications on AWS

  1. 1. AWS Summit 2014 Architecting Highly Available Applications on AWS Alex Sinner Solutions Architect @alexsinner
  2. 2. Architecting Highly Available Applications on AWS •  ME: Alex Sinner – AWS Solutions Architect •  YOU: Here to learn more about running highly available, scalable Applications on AWS •  TODAY: about best practices and things to think about when building for large scale
  3. 3. Going from 1 User to >10 Millions
  4. 4. So how do we scale?
  5. 5. Hi, I have NO IDEA what I am doing!!
  6. 6. a lot of things to read
  7. 7. not where we want to start a lot of things to read
  8. 8. Auto Scaling is a tool. It’s not the single thing that fixes everything.
  9. 9. What do we need first?
  10. 10. Some basics…
  11. 11. Regions US-WEST (Oregon) EU-WEST (Ireland) ASIA PAC (Tokyo) US-WEST (N. California) SOUTH AMERICA (Sao Paulo) US-EAST (Virginia) AWS GovCloud (US) ASIA PAC (Sydney) ASIA PAC (Singapore) CHINA (Beijing)
  12. 12. Availability Zones US-WEST (Oregon) EU-WEST (Ireland) ASIA PAC (Tokyo) US-WEST (N. California) SOUTH AMERICA (Sao Paulo) US-EAST (Virginia) AWS GovCloud (US) ASIA PAC (Sydney) ASIA PAC (Singapore) CHINA (Beijing)
  13. 13. Service Reference Model: Compute; Storage & Content Delivery; AWS Global Infrastructure; Database; App Services; Deployment & Administration; Networking
  14. 14. Compute; Storage & Content Delivery; AWS Global Infrastructure; Database; App Services; Deployment & Administration; Networking. Services: Amazon CloudSearch, Amazon SQS, Amazon SNS, Amazon Elastic Transcoder, Amazon SWF, Amazon SES, Amazon DynamoDB, Amazon RDS, Amazon ElastiCache, Amazon Redshift, AWS Storage Gateway, Amazon S3, Amazon Glacier, Amazon CloudFront, Amazon CloudWatch, AWS IAM, AWS CloudFormation, AWS Elastic Beanstalk, AWS Data Pipeline, AWS OpsWorks, AWS CloudTrail, Amazon EC2, Amazon EMR, Amazon VPC, Amazon Route 53, AWS Direct Connect, Amazon Kinesis
  15. 15. So let’s start from day one, user one ( you )
  16. 16. Day One, User One •  A single EC2 Instance –  With full stack on this host •  Web app •  Database •  Management •  Etc. •  A single Elastic IP •  Route53 for DNS EC2 Instance Elastic IP Amazon Route 53 User
  17. 17. “We’re gonna need a bigger box” •  Simplest approach •  Can now leverage Provisioned IOPS (PIOPS) •  High I/O instances •  High memory instances •  High CPU instances •  High storage instances •  Easy to change instance sizes •  Will hit a limit eventually. Example sizes: i2.4xlarge, m3.xlarge, m1.small
  18. 18. “We’re gonna need a bigger box” •  Simplest approach •  Can now leverage Provisioned IOPS (PIOPS) •  High I/O instances •  High memory instances •  High CPU instances •  High storage instances •  Easy to change instance sizes •  Will hit a limit eventually. Example sizes: i2.4xlarge, m3.xlarge, m1.small
  19. 19. Day One, User One: •  We could potentially get to a few hundred to a few thousand depending on application complexity and traffic •  No failover •  No redundancy •  Too many eggs in one basket EC2 instance Elastic IP address Amazon Route 53 User
  20. 20. Day One, User One: •  We could potentially get to a few hundred to a few thousand depending on application complexity and traffic •  No failover •  No redundancy •  Too many eggs in one basket EC2 instance Elastic IP address Amazon Route 53 User
  21. 21. Day Two, User >1: First, let’s separate out our single host into more than one: •  Web •  Database –  Make use of a database service? Web instance Database instance Elastic IP address Amazon Route 53 User
  22. 22. Database Options. Self-managed: database server on Amazon EC2 (your choice of database running on Amazon EC2; Bring Your Own License (BYOL)). Fully managed: Amazon DynamoDB (managed NoSQL database service using SSD storage; seamless scalability; zero administration); Amazon RDS (Microsoft SQL Server, Oracle, MySQL or PostgreSQL as a managed service; flexible licensing, BYOL or License Included); Amazon Redshift (massively parallel, petabyte-scale data warehouse service; fast, powerful and easy to scale)
  23. 23. But how do I choose what DB technology I need? SQL? NoSQL?
  24. 24. Some people won’t like this. But…
  25. 25. Start with SQL databases
  26. 26. Why start with SQL? •  Established and well-worn technology •  Lots of existing code, communities, books, background, tools, etc. •  You aren’t going to break SQL DBs in your first 10 million users. No really, you won’t*. •  Clear patterns to scalability * Unless you are manipulating data at MASSIVE scale; even then, SQL will have a place in your stack
  27. 27. AH HA! You said “massive amounts”, I will have massive amounts!
  28. 28. If your usage is such that you will be generating several TB of data in the first year OR have an incredibly data-intensive workload… you might need NoSQL
  29. 29. Regardless, why NoSQL? •  Super low latency applications •  Metadata driven datasets •  Highly non-relational data •  Need schema-less data constructs* •  Massive amounts of data (again, in the TB range) •  Rapid ingest of data ( thousands of records/sec ) •  Already have skilled staff *Need != “it is easier to do dev without schemas”
  30. 30. But back to the main path… Let’s see how far SQL at the core can grow
  31. 31. User >100 First let’s separate out our single host into more than one •  Web •  Database –  Use RDS to make your life easier Web Instance Elastic IP RDS DB Instance Amazon Route 53 User
  32. 32. User > 1000 Next let’s address our lack of failover and redundancy issues •  Elastic Load Balancing •  Another web instance –  In another Availability Zone •  Enable Amazon RDS multi-AZ Web Instance RDS DB Instance Active (Multi-AZ) Availability Zone Availability Zone Web Instance RDS DB Instance Standby (Multi-AZ) Elastic Load Balancing Amazon Route 53 User
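The failover idea on this slide can be illustrated with a toy round-robin balancer. This is only a sketch of the behaviour, not how Elastic Load Balancing is implemented; the `Instance` and `RoundRobinBalancer` names, the instance names and the zone labels are all hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Instance:
    name: str
    zone: str
    healthy: bool = True


class RoundRobinBalancer:
    """Toy load balancer: rotates over instances, skipping unhealthy ones."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._ring = cycle(self.instances)

    def pick(self):
        # Try each instance at most once per call.
        for _ in range(len(self.instances)):
            candidate = next(self._ring)
            if candidate.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


# Hypothetical setup: one web instance in each of two Availability Zones.
web_a = Instance("web-1", zone="zone-a")
web_b = Instance("web-2", zone="zone-b")
balancer = RoundRobinBalancer([web_a, web_b])

first_two = [balancer.pick().name for _ in range(2)]

# Simulate losing zone-a: all traffic moves to the surviving instance.
web_a.healthy = False
after_failure = [balancer.pick().name for _ in range(3)]
```

With a single web instance there would be nothing left to pick after a failure, which is the "no failover, no redundancy" problem the earlier slides describe.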
  33. 33. Scaling this horizontally and vertically will get us pretty far ( 10s-100s of thousands )
  34. 34. Users >10k–100k RDS DB Instance Active (Multi-AZ) Availability Zone Availability Zone RDS DB Instance Standby (Multi-AZ) Elastic Load Balancing RDS DB Instance Read Replica RDS DB Instance Read Replica RDS DB Instance Read Replica RDS DB Instance Read Replica Web Instance Web Instance Web Instance Web Instance Web Instance Web Instance Web Instance Web Instance Amazon Route 53 User
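Read replicas only help if the application actually sends its reads to them. A minimal sketch of that routing, assuming the application can classify statements by their first keyword; the `ReadWriteRouter` class and the host names are hypothetical, and real drivers or proxies handle this in more robust ways.

```python
import itertools


class ReadWriteRouter:
    """Sends SELECTs to read replicas in turn and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._readers = itertools.cycle(replicas or [primary])

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            return next(self._readers)
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
queries = [
    "SELECT * FROM users WHERE id = 1",
    "UPDATE users SET name = 'x' WHERE id = 1",
    "select count(*) from orders",
    "SELECT 1",
]
targets = [router.route(q) for q in queries]
```

Replicas can lag behind the primary, so a read that must see a write that has just been made is usually safer on the primary.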
  35. 35. Shift some load around: Let’s lighten the load on our web and database instances: •  Move static content from the web instance to Amazon S3 and CloudFront •  Move dynamic content from the load balancer to CloudFront •  Move session/state and DB caching to ElastiCache or Amazon DynamoDB Web instance RDS DB Instance Active (Multi-AZ) Availability Zone Elastic Load Balancer Amazon S3 Amazon CloudFront Amazon Route 53 User ElastiCache Amazon DynamoDB
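Moving DB caching out of the database typically follows the cache-aside pattern. The sketch below uses an in-process dictionary as a stand-in for ElastiCache; the `CacheAside` class, the TTL value and `fake_db_lookup` are assumptions made for illustration.

```python
import time


class CacheAside:
    """Cache-aside lookup with a per-entry time to live (TTL)."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # cache hit
        value = self._load(key)  # cache miss: go to the database
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call after a write so readers do not keep seeing the old value.
        self._entries.pop(key, None)


db_calls = []


def fake_db_lookup(user_id):
    # Hypothetical stand-in for a real database query.
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


cache = CacheAside(fake_db_lookup, ttl_seconds=30.0)
cache.get(42)  # miss: loads from the database
cache.get(42)  # hit: served from the cache
```

Only the first lookup reaches the database; the second is served from memory until the entry expires or is invalidated.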
  36. 36. Now let’s revisit the beginning of our talk…
  37. 37. Auto Scaling!
  38. 38. Auto Scaling: automatic resizing of compute clusters based on demand. Trigger auto-scaling policy (Amazon CloudWatch, Auto Scaling). Feature / Details: Control: define minimum and maximum instance pool sizes and when scaling and cool down occurs. Integrated to Amazon CloudWatch: use metrics gathered by CloudWatch to drive scaling. Instance types: run Auto Scaling for On-Demand and Spot Instances; compatible with VPC. Example: aws autoscaling create-auto-scaling-group --auto-scaling-group-name MyGroup --launch-configuration-name MyConfig --min-size 4 --max-size 200 --availability-zones us-west-2c
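The interplay of pool bounds, a metric threshold and a cool-down period can be sketched as a small decision function. This is an illustration of the idea, not the Auto Scaling service's actual algorithm; the minimum and maximum come from the slide's example command, while the CPU thresholds, the step size and the cool-down length are assumptions.

```python
def desired_capacity(current, avg_cpu, *, min_size=4, max_size=200,
                     scale_out_at=70.0, scale_in_at=30.0, step=2):
    """Return the new instance count for one evaluation of a simple step policy."""
    if avg_cpu >= scale_out_at:
        target = current + step
    elif avg_cpu <= scale_in_at:
        target = current - step
    else:
        target = current
    # Never leave the configured pool bounds.
    return max(min_size, min(max_size, target))


def evaluate(current, avg_cpu, now, last_change, cooldown=300.0, **policy):
    """Skip any change while the previous scaling action is still cooling down."""
    if now - last_change < cooldown:
        return current
    return desired_capacity(current, avg_cpu, **policy)


scaled_out = desired_capacity(4, 85.0)    # busy: add instances
held_at_min = desired_capacity(4, 10.0)   # idle, but already at the minimum
during_cooldown = evaluate(6, 90.0, now=100.0, last_change=0.0)
after_cooldown = evaluate(6, 90.0, now=400.0, last_change=0.0)
```

The cool-down keeps the group from reacting again before the instances it has just launched have had a chance to affect the metric.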
  39. 39. Auto Scaling can scale from one instance to thousands and back down
  40. 40. User >500k+: Availability Zone Amazon Route 53 User Amazon S3 Amazon CloudFront Availability Zone Elastic Load Balancing Amazon DynamoDB RDS DB Instance Read Replica Web instance Web instance Web instance ElastiCache RDS DB Instance Read Replica Web instance Web instance Web instance ElastiCache RDS DB Instance Standby (Multi-AZ) RDS DB Instance Active (Multi-AZ)
  41. 41. ARCHITECTING DATA-DRIVEN MASS PRODUCED VIDEO AWS SUMMIT 2014 | JUNE 10, 2014
  42. 42. JASPER JAGER SENIOR DEVELOPER AND AWS ARCHITECT REDNUN, AMSTERDAM
  43. 43. THIS IS WHAT WE DO ‣ Automatically mass produce data-driven, personalised or profiled video ‣ ING, KLM, Essent België, T-Mobile ‣ Run everything in AWS
  44. 44. HOW WE STARTED ‣ Small campaign, 25.000 personalised videos ‣ Self hosted 3x 8 core Xserves with 96GB RAM ‣ 300 videos an hour
  45. 45. PROBLEMS WITH THE OLD IN-HOUSE SETUP ‣ 250.000 videos would take us 35 days ‣ Or we would have to buy more hardware ‣ Systems which would idle most of the time ‣ Storing and serving all videos - HELP
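The 35-day figure follows directly from the throughput quoted on the previous slide (300 videos an hour on three servers). The short calculation below reproduces it and estimates how many machines a tighter deadline would need, assuming throughput scales linearly with machine count; that assumption and the two-day deadline are illustrative, not from the talk.

```python
import math

VIDEOS_PER_HOUR = 300   # from the slides: throughput of the in-house setup
MACHINES = 3            # from the slides: three Xserves


def days_to_render(videos, videos_per_hour=VIDEOS_PER_HOUR):
    """Days of continuous rendering needed at a fixed hourly throughput."""
    return videos / videos_per_hour / 24


def machines_needed(videos, deadline_days,
                    per_machine_per_hour=VIDEOS_PER_HOUR / MACHINES):
    """Machines required to finish in time, assuming linear scaling (an assumption)."""
    return math.ceil(videos / (deadline_days * 24 * per_machine_per_hour))


in_house_days = days_to_render(250_000)            # about 34.7 days
machines_for_two_days = machines_needed(250_000, 2)
```

Buying that much hardware for a short campaign is exactly the "systems which would idle most of the time" problem listed on the slide.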
  46. 46. REBUILD REDNUN IN THE CLOUD ‣ Ability to start 100’s of machines, based on preconfigured AMI ‣ High availability for our campaign sites, behind load balancers ‣ Big campaign, big Dutch lottery, 1.200.000 ‣ Batched, pre-rendered videos, stored on S3 ‣ Took us just a couple of days
  47. 47. SECOND INFRASTRUCTURE
  48. 48. AUTOSCALE EVERYTHING ‣ Automated daily flows, welcome video, birthday video etc. ‣ API, videos can be produced on the fly ‣ Autoscaling based on CloudWatch metrics for web and app servers ‣ Custom autoscaling scripts for video rendering ‣ Use spot instances when available
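The slides do not show what the custom autoscaling scripts for video rendering look like. One common approach, sketched here purely as an assumption, is to size the worker pool from the backlog of queued jobs; the function name, the rates and the bounds are all hypothetical.

```python
import math


def workers_for_backlog(queued_jobs, jobs_per_worker_per_hour, drain_within_hours,
                        min_workers=0, max_workers=100):
    """Number of render workers needed to drain the queue within the target time."""
    if queued_jobs <= 0:
        return min_workers
    needed = math.ceil(queued_jobs / (jobs_per_worker_per_hour * drain_within_hours))
    return max(min_workers, min(max_workers, needed))


idle = workers_for_backlog(0, 100, 1)               # nothing queued
small_batch = workers_for_backlog(1_200, 100, 2)    # 1,200 jobs in 2 hours
big_campaign = workers_for_backlog(1_200_000, 100, 1)  # capped at max_workers
```

Scaling on backlog rather than CPU suits batch rendering, where workers are busy whenever there is any work at all.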
  49. 49. LOOSE COUPLING ‣ Decoupled components ‣ Use SQS as a buffer ‣ Continuous monitoring and adjusting
  50. 50. AUTOMATE EVERYTHING ‣ CloudFormation and OpsWorks ‣ Flexibility to start environment in different region ‣ Dev and QA environments
  51. 51. CURRENT INFRASTRUCTURE
  52. 52. THE THINGS WE’VE LEARNED ‣ AWS service limits ‣ Autoscale on CloudWatch or custom metrics ‣ Automate your infrastructure ‣ AWS can help you scale with ease
  53. 53. JASPER@REDNUN.NL WWW.REDNUN.NL
  54. 54. On Tools: Managing your infrastructure will become an increasingly important part of your time. Use tools to automate repetitive tasks. •  Tools to manage AWS resources – AWS CloudFormation •  Tools to manage software and configuration on your instances – AWS OpsWorks •  Automated data analysis of logs and user actions
  55. 55. User >500k+: You’ll potentially start to run into issues with speed and performance of your applications: •  Have monitoring/metrics/logging in place –  If you can’t build it internally, outsource it! (3rd party SaaS) •  Pay attention to what customers are saying works well •  Squeeze as much performance as you can out of each service/component
  56. 56. HOST LEVEL METRICS AGGREGATE LEVEL METRICS LOG ANALYSIS EXTERNAL SITE PERFORMANCE
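The difference between host-level and aggregate-level metrics can be shown with a few latency samples. The host names and numbers below are made up for illustration, and `percentile` is a simple nearest-rank helper rather than any particular monitoring tool's definition.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, grouped by host.
host_latency_ms = {
    "web-1": [110, 120, 130, 125],
    "web-2": [115, 118, 900, 122],  # one slow request on this host
}

# Host-level view: the worst request seen on each instance.
per_host_max = {host: max(values) for host, values in host_latency_ms.items()}

# Aggregate-level view: percentiles across the whole fleet.
all_samples = [v for values in host_latency_ms.values() for v in values]
fleet_p50 = percentile(all_samples, 50)
fleet_p95 = percentile(all_samples, 95)
```

The median looks healthy while the high percentile and the per-host view reveal the outlier, which is why both levels are worth collecting.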
  57. 57. Not having proper monitoring/metrics is like flying a plane with an eye mask on in a thunderstorm. Oh, and your wing is on fire.
  58. 58. AWS Marketplace & Partners Can Help •  Customers can find, research, and buy software •  Simple pricing, aligns with Amazon EC2 usage model •  Launch in minutes •  AWS Marketplace billing integrated into your AWS account •  1300+ products across 20+ categories Learn more at: aws.amazon.com/marketplace
  59. 59. There are further improvements to be made in breaking apart our web/app layer
  60. 60. SOA = Service Oriented Architecture
  61. 61. SOA’ing Move services into their own tiers/modules. Treat each of these as 100% separate pieces of your infrastructure and scale them independently. Amazon.com and AWS do this extensively! It offers flexibility and greater understanding of each component.
  62. 62. Loose coupling sets you free! •  The looser they're coupled, the bigger they scale –  Independent components –  Design everything as a black box –  Decouple interactions –  Favor services with built-in redundancy and scalability rather than building your own. Diagram: tight coupling (Controller A connected directly to Controller B) versus loose coupling (Controller A and Controller B connected through queues; use Amazon SQS for buffers)
  63. 63. Loose coupling + SOA = winning Examples: •  Email •  Queuing •  Transcoding •  Search •  Databases •  Monitoring •  Metrics •  Logging. Services shown: Amazon CloudSearch, Amazon SQS, Amazon SNS, Amazon Elastic Transcoder, Amazon SWF, Amazon SES. In the early days, if someone has a service for it already, opt to use that instead of building it yourself. DON’T RE-INVENT THE WHEEL
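Decoupling two components through a buffer can be sketched with Python's in-process `queue.Queue` standing in for Amazon SQS. The job names and the `controller_b` function are hypothetical. A real SQS standard queue differs in important ways, such as at-least-once delivery and no strict ordering, so consumers there need to tolerate duplicates.

```python
import queue
import threading

jobs = queue.Queue(maxsize=100)  # the buffer between the two components
results = []


def controller_b():
    """Consumer: pulls work at its own pace until it sees the stop marker."""
    while True:
        job = jobs.get()
        if job is None:
            break
        results.append(job.upper())


worker = threading.Thread(target=controller_b)
worker.start()

# Controller A only knows about the queue, not about Controller B.
for name in ["welcome", "birthday", "renewal"]:
    jobs.put(name)
jobs.put(None)  # stop marker
worker.join()
```

Because the producer never calls the consumer directly, either side can be slowed down, restarted or scaled out without changing the other.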
  64. 64. On re-inventing the wheel… If you find yourself writing your own: queue, DNS server, database, storage system, monitoring tool
  65. 65. Take a deep breath and stop it. Now.
  66. 66. Back to SOA
  67. 67. Users > 1 Million RDS DB Instance Active (Multi-AZ) Availability Zone Elastic Load Balancer RDS DB Instance Read Replica RDS DB Instance Read Replica Web Instance Web Instance Web Instance Web Instance Amazon Route 53 User Amazon S3 Amazon CloudFront Amazon DynamoDB Amazon SQS ElastiCache Worker Instance Worker Instance Amazon CloudWatch Internal App Instance Internal App Instance Amazon SES
  68. 68. The next big steps
  69. 69. From 5 to 10 Million Users You may start to run into issues with your database around contention on the write master. How can you solve it? •  Federation - splitting into multiple DBs based on function •  Sharding - splitting one data set up across multiple hosts •  Moving some functionality to other types of DBs (NoSQL)
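Federation and sharding both come down to a routing decision made before each query. A minimal sketch, assuming a fixed list of shards and a stable hash of the user ID; every database name here is hypothetical.

```python
import hashlib

# Hypothetical hosts: one data set (users) split across four shards.
SHARDS = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]

# Hypothetical federation map: one database per function.
FEDERATED = {"users": "users-db", "orders": "orders-db", "forum": "forum-db"}


def shard_for(user_id, shards=SHARDS):
    """Sharding: pick a host from a stable hash of the key."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def database_for(function):
    """Federation: pick a database by what the data is used for."""
    return FEDERATED[function]


same_shard_each_time = shard_for(42) == shard_for(42)
shards_in_use = {shard_for(user_id) for user_id in range(1000)}
orders_db = database_for("orders")
```

Plain modulo hashing moves most keys when the shard count changes, so schemes such as consistent hashing or a lookup table are often used when resharding is expected.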
  70. 70. …and there you have it. 10 Million
  71. 71. A Quick Review
  72. 72. Review •  Multi-AZ your infrastructure •  Make use of self-scaling services –  Elastic Load Balancing, Amazon S3, Amazon SNS, Amazon SQS, Amazon SWF, Amazon SES, etc. •  Build in redundancy at every level •  Most likely start with SQL •  Cache data both inside and outside your infrastructure •  Use automation tools in your infrastructure
  73. 73. Review (cont) •  Make sure you have good metrics/monitoring/logging tools in place •  Split tiers into individual services (SOA) •  Use Auto Scaling when you’re ready for it •  Don’t reinvent the wheel •  Move to NoSQL when it really makes sense but do your best not to administer it
  74. 74. Putting all this together means we should now easily be able to handle 10+ million users!
  75. 75. To infinity…..
  76. 76. Thank You! AWS EXPERT? GET CERTIFIED! aws.amazon.com/certification Alex Sinner Solutions Architect @alexsinner