How to Design for High Availability & Scale with AWS



This presentation covers how to optimize your application architecture on the AWS Cloud and build a fault-tolerant architecture that aims for zero downtime, following best practices for fault-tolerant web applications.

Published in: Technology, Business


  1. Blazeclan (Cloud IT Better)
  2. Agenda
     • Introduction: High Availability, Scalability, Fault Tolerance
     • AWS Global Infrastructure
     • Key Design Concepts: Design for Failure, Scaling, Self Healing / Fault Tolerant, Multiple AZ Architecture, Loose Coupling
     • Sample Architectures
  3. Introduction
  4. How Often Do You See This?
  5. Cost of Downtime
     A 2010 report on the top 412 eCommerce sites found:
     • The median length of downtime was 840 minutes (14 hours)
     • On average, each site saw 3,291 minutes of downtime
     Lost revenue:
     • On average, each site lost $800,099 in revenue due to downtime
     • Across all 412 sites, the total revenue lost to downtime was $329,640,928
  6. Online Business & Downtime Facts: Average Hourly Loss Due to Data Center Downtime in 2012
  7. How to Build a HIGHLY AVAILABLE, SCALABLE, DURABLE AND RESILIENT Web Application
  8. High Availability (99.999%)
     • Availability is the uptime of an application
     • Downtime is any planned or unplanned outage in which the application is:
       • Offline, unreachable, or only partially available
       • Too slow to use
     • Goal:
       • No downtime
       • Always available
  9. Scalability
     • The ability of an application to accommodate changes in traffic without architectural changes
     • Availability may be impacted if the application cannot scale
     • Scalability doesn't guarantee availability
     [Chart: resources vs. demand over time]
  10. Fault Tolerance
     • Built-in redundancy, so applications can continue functioning when components fail
     • Fault tolerance is crucial to high availability
  11. AWS Global Infrastructure
  12. AWS Democratizes High Availability
     • Multiple servers
     • Isolated, redundant data centers
     • Regions across the globe
     • Availability Zones within Regions
  13. AWS Capacity
  14. AWS Platform
  15. AWS Building Blocks
     • Inherently highly available and fault-tolerant services (span across AZs): Amazon S3, Amazon DynamoDB, Amazon SNS, Amazon CloudFront, Amazon SES, Amazon Route 53, Amazon SQS
     • Highly available with the right architecture (architect across AZs): Amazon EC2, Amazon EBS, Amazon RDS, Amazon VPC, Amazon SWF, Elastic Load Balancer, …
  16. Design for Failure
  17. "Everything fails, all the time." (Werner Vogels, CTO, Amazon)
     • Avoid single points of failure
     • The application should continue to function
     • Assume everything fails, and work backwards
     • Avoid impact on the business
     [Photo: Obama's prized limo after it broke down during his visit to Israel]
  18. Ask Questions for the Right Architecture
     • What kinds of scenarios do I have to plan for?
     • What are my single points of failure?
     • If there are master and slave nodes in my architecture, what if the master node fails?
     • If a load balancer sits in front of an array of application servers, what if that load balancer fails?
     • What happens if a node in my system fails?
  19. Lots of Questions
     • How do you recognize that failure?
     • How do I replace that node?
     • What if the cache keys grow beyond the memory limit of an instance?
     • How does failover occur, and how is a new slave instantiated and brought into sync with the master?
     • What if a downstream service times out or returns an exception?
  20. Build Mechanisms to Handle Failure
     • Build process threads that resume on reboot
     • Allow the state of the system to re-sync by reloading messages from queues
     • Keep pre-configured and pre-optimized virtual machine images to support the above on launch/boot
     • Avoid in-memory sessions or stateful user context; move them to data stores
     • Have a coherent backup and restore strategy for your data, and automate it
  21. Design for Failure (Source: ra_ftha_04.pdf)
  22. Scaling
  23. Auto Scaling
     • Automatically scales Amazon EC2 capacity up or down
     • Lets you terminate server instances at will
     • Adds more instances in response to increasing load
     • Launches a replacement instance immediately in case of a failure
     • Lets the application transition seamlessly if the primary server fails
  24. Elastic Load Balancing (ELB)
     • Distributes incoming application traffic across several Amazon EC2 instances
     • An ELB is given a DNS host name; requests sent to this host name are delegated to a pool of Amazon EC2 instances
     • ELB detects unhealthy instances within its pool and automatically reroutes traffic to healthy instances until the unhealthy instances are restored
  25. ELB & Auto Scaling
     • Auto Scaling and ELB are an ideal combination
     • ELB gives a single DNS name for addressing
     • Auto Scaling ensures there is always the right number of healthy Amazon EC2 instances to accept requests
  26. Fault Tolerant
  27. Fault Tolerance
     To build fault-tolerant applications on Amazon EC2, follow best practices such as:
     • Being able to quickly commission replacement instances
     • Using Amazon EBS for persistent storage
     • Using multiple Availability Zones and Elastic IP addresses
  28. Multi-AZ Architecture
  29. Multi-AZ Design Considerations
     • Achieve greater fault tolerance by distributing your application geographically
     • The Amazon EC2 service level agreement commitment (at the time of this presentation) is 99.95% availability for each Amazon EC2 Region
     • Deploy your application across multiple Availability Zones
     • Place redundant instances for each tier of the application in distinct Availability Zones
     • ELB can automatically balance traffic across multiple instances and multiple Availability Zones
  30. Multi-AZ Architecture
  31. Loose Coupling
  32. Loosely Coupled Systems
     • Loosely coupled systems are more fault tolerant and can achieve greater scale
     • Loosely coupled systems on AWS:
       • Decoupling systems allows for hybrid models (in the cloud plus a physical data center)
       • Balancing between clusters makes scaling easier
       • Using queues (Amazon SQS) buffers against failures
       • Design for a jumble of black boxes
  33. Decoupling Using SQS
  34. Loose Coupling: Best Practices on AWS
     • Use Amazon SQS to isolate components
     • Use Amazon SQS as buffers between components
     • Design every component so that it exposes a service interface, is responsible for its own scalability, and interacts with other components asynchronously
     • Bundle the logical construct of a component into an Amazon Machine Image (AMI) so that it can be deployed more often
     • Make your applications as stateless as possible; store session state outside the component (in Amazon SimpleDB, if appropriate)
  35. Sample Architectures
  36. High Availability Architecture in RDS
  37. Web Hosting on AWS
  38. Scalable Reader Farm
  39. Design for High Availability & Scale
     Don't let this happen to your business. Our AWS expert solution architects can help you review your architecture. Take advantage of our free 2-hour consultation! For any assistance, please contact us at
  40. Upcoming Webinars: Check out our upcoming webinars
  41. Thank You
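Slide 8 sets a 99.999% availability target. To make such percentages concrete, here is a minimal Python sketch that converts an availability target into the downtime it permits per year. The function name is hypothetical, and it assumes a 365-day year.

```python
# Hypothetical helper: converts an availability target into a downtime budget.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the maximum downtime (in minutes) an availability target permits."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_minutes * (1 - availability_pct / 100)


for target in (99.9, 99.95, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}%: {minutes:.2f} min/year ({minutes / 60:.2f} h)")
```

At 99.999% the budget is about 5.26 minutes per year, while the 99.95% per-Region EC2 figure quoted on slide 29 allows about 262.8 minutes (roughly 4.4 hours) per year.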
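Slide 10 credits built-in redundancy for fault tolerance, and slide 17 warns against single points of failure. The standard reliability arithmetic behind both ideas is that redundant copies combine in parallel, as 1 − (1 − a)^n, while tiers that must all be up combine in series, as a product. The sketch below illustrates this in Python. The function names and the 99% per-instance figure are assumptions, and it treats failures as independent, which real systems only approximate.

```python
from math import prod


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components where any one suffices.

    Assumes failures are independent; instances that share a data center
    can fail together, which is why the slides spread them across AZs.
    """
    return 1 - (1 - component_availability) ** copies


def serial_availability(*tier_availabilities: float) -> float:
    """Availability of a chain where every tier must be up (e.g. LB, app, DB)."""
    return prod(tier_availabilities)


single = 0.99  # hypothetical availability of one instance
for n in (1, 2, 3):
    print(f"{n} instance(s): {parallel_availability(single, n):.6f}")

# Redundancy only helps the tiers it is applied to: a single, non-redundant
# database tier at 99% caps the whole stack below 99%.
web_tier = parallel_availability(single, 3)
db_tier = 0.99
print(f"whole stack: {serial_availability(web_tier, db_tier):.6f}")
```

The last line shows why one remaining single point of failure dominates: the three-instance web tier reaches 0.999999, but the full stack stays just under 0.99.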
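Slide 19 asks what happens if a downstream service times out or returns an exception. A common answer, offered here as an illustration rather than something the slides prescribe, is to retry only transient errors, using capped exponential backoff with random jitter so that many clients do not retry in lockstep. This is a minimal Python sketch. The function name, the defaults, and the choice of which exceptions count as retryable are all assumptions.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    retryable: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying transient failures with backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller degrade gracefully
            # "Full jitter": wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Hypothetical flaky dependency: times out twice, then answers.
calls = {"count": 0}


def flaky_lookup() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("downstream service timed out")
    return "ok"


# sleep is stubbed out so the demo runs instantly.
print(call_with_retries(flaky_lookup, sleep=lambda _: None), calls["count"])
```

Exceptions outside `retryable`, such as a `ValueError` from a bug, propagate immediately, because retrying them would only add load.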
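Slide 20 suggests letting a system re-sync by reloading messages from queues. With Amazon SQS this works because a received message is not removed. It is only hidden for a visibility timeout, and it disappears only when the consumer explicitly deletes it. The toy in-memory queue below models that behavior to show why a worker that crashes mid-task does not lose work. The class and its methods are hypothetical and are not the SQS API.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class InMemoryQueue:
    """Toy model of visibility-timeout semantics (hypothetical, not the SQS API).

    A received message is hidden for `visibility_timeout` ticks. Only delete()
    removes it; if the consumer dies first, the message becomes visible again.
    """
    visibility_timeout: int = 3
    now: int = 0
    messages: Dict[int, str] = field(default_factory=dict)
    hidden_until: Dict[int, int] = field(default_factory=dict)
    ids: itertools.count = field(default_factory=itertools.count)

    def send(self, body: str) -> None:
        self.messages[next(self.ids)] = body

    def receive(self) -> Optional[Tuple[int, str]]:
        """Return (message_id, body) of the oldest visible message, or None."""
        for msg_id, body in self.messages.items():
            if self.hidden_until.get(msg_id, self.now) <= self.now:
                self.hidden_until[msg_id] = self.now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id: int) -> None:
        self.messages.pop(msg_id, None)
        self.hidden_until.pop(msg_id, None)

    def tick(self, n: int = 1) -> None:
        """Advance the simulated clock."""
        self.now += n


queue = InMemoryQueue(visibility_timeout=3)
queue.send("resize image 42")

first = queue.receive()   # worker A takes the job...
print(queue.receive())    # None: the message is hidden while in flight
# ...and crashes before calling delete().

queue.tick(3)             # visibility timeout expires
second = queue.receive()  # a restarted worker receives the same job
queue.delete(second[0])   # delete only after the work has succeeded
queue.tick(3)
print(first == second, queue.receive())  # True None
```

Because a message can be delivered more than once under this model, the processing step should be idempotent.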
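Slide 23 describes Auto Scaling adding instances as load increases and removing them as it falls. EC2 Auto Scaling target tracking policies aim to keep a metric, such as average CPU, near a target value. The Python function below is a simplified proportional model of that idea, not AWS's exact algorithm, which also applies cooldown/warm-up periods. The function name and parameters are hypothetical.

```python
import math


def desired_capacity(current_instances: int, metric_value: float,
                     target_value: float, min_size: int, max_size: int) -> int:
    """Simplified target-tracking estimate of how many instances to run.

    If the fleet's average metric (e.g. CPU %) is above target, grow the fleet
    proportionally; if below, shrink it. Always stay within [min_size, max_size].
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    raw = current_instances * metric_value / target_value
    return max(min_size, min(max_size, math.ceil(raw)))


# 4 instances at 90% CPU with a 60% target: scale out.
print(desired_capacity(4, 90, 60, min_size=2, max_size=10))
# 4 instances at 30% CPU: scale in, but never below min_size.
print(desired_capacity(4, 30, 60, min_size=2, max_size=10))
```

Rounding up and clamping to a minimum size errs on the side of capacity, which matches the slides' availability-first goal.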
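Slide 24 says ELB detects unhealthy instances and reroutes traffic to healthy ones until they recover. ELB health checks use configurable healthy and unhealthy thresholds, meaning a set number of consecutive passed or failed checks. The toy round-robin balancer below models that behavior. The class, its methods, and the instance IDs are hypothetical and are not the ELB API.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    name: str
    healthy: bool = True
    consecutive_failures: int = 0
    consecutive_successes: int = 0


class ToyLoadBalancer:
    """Round-robin balancer that routes only to targets marked healthy."""

    def __init__(self, names: List[str], healthy_threshold: int = 2,
                 unhealthy_threshold: int = 2) -> None:
        self.targets = [Target(n) for n in names]
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self._counter = itertools.count()

    def record_check(self, name: str, passed: bool) -> None:
        """Apply one health-check result, flipping state at the thresholds."""
        t = next(t for t in self.targets if t.name == name)
        if passed:
            t.consecutive_successes += 1
            t.consecutive_failures = 0
            if not t.healthy and t.consecutive_successes >= self.healthy_threshold:
                t.healthy = True
        else:
            t.consecutive_failures += 1
            t.consecutive_successes = 0
            if t.healthy and t.consecutive_failures >= self.unhealthy_threshold:
                t.healthy = False

    def route(self) -> str:
        """Pick the next healthy target in round-robin order."""
        healthy = [t.name for t in self.targets if t.healthy]
        if not healthy:
            raise RuntimeError("no healthy targets")
        return healthy[next(self._counter) % len(healthy)]


lb = ToyLoadBalancer(["i-a", "i-b", "i-c"])
for _ in range(2):
    lb.record_check("i-b", passed=False)  # i-b fails two checks in a row
routed = [lb.route() for _ in range(4)]   # traffic goes only to i-a and i-c
print(routed)
for _ in range(2):
    lb.record_check("i-b", passed=True)   # i-b recovers and rejoins the pool
```

Requiring several consecutive results before changing state keeps a single slow response from flapping an instance in and out of service.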
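Slide 25 says Auto Scaling keeps the right number of healthy instances behind the ELB, and slide 29 recommends placing redundant instances in distinct Availability Zones. The managed service handles this for you. The sketch below only illustrates the shape of such a self-healing control loop: replace unhealthy instances, and launch each replacement in the zone with the fewest healthy instances. The function name, the data shapes, and the instance IDs are hypothetical, and the zone names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple


def reconcile(instances: Dict[str, Tuple[str, bool]], desired: int,
              zones: List[str]) -> Tuple[List[str], List[str]]:
    """One pass of a hypothetical self-healing loop.

    `instances` maps instance ID -> (availability zone, is_healthy).
    Returns (ids_to_terminate, zones_to_launch_in). Scale-in is not handled.
    """
    to_terminate = [i for i, (_, ok) in instances.items() if not ok]
    per_zone = Counter({z: 0 for z in zones})
    for zone, ok in instances.values():
        if ok:
            per_zone[zone] += 1
    launches = []
    for _ in range(desired - sum(per_zone.values())):
        zone = min(zones, key=lambda z: per_zone[z])  # least-populated zone first
        per_zone[zone] += 1
        launches.append(zone)
    return to_terminate, launches


fleet = {
    "i-1": ("us-east-1a", True),
    "i-2": ("us-east-1a", True),
    "i-3": ("us-east-1b", False),  # failed its health check
    "i-4": ("us-east-1b", True),
}
print(reconcile(fleet, desired=4, zones=["us-east-1a", "us-east-1b"]))
```

Here the unhealthy instance in `us-east-1b` is terminated and its replacement is launched in the same zone, which keeps both zones at two healthy instances.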
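Slide 29 recommends deploying across multiple Availability Zones, which raises a sizing question: how many instances should each zone run so the application keeps enough capacity if one zone is lost? Assuming load spreads evenly and every instance has the same capacity, each zone needs ⌈required / (zones − 1)⌉ instances. A minimal Python sketch follows, with a hypothetical function name.

```python
import math


def instances_per_zone(required_capacity: int, zones: int) -> int:
    """Instances per AZ so that `required_capacity` instances survive the loss
    of any single Availability Zone (assumes an even spread, identical instances)."""
    if zones < 2:
        raise ValueError("surviving a zone failure needs at least two zones")
    return math.ceil(required_capacity / (zones - 1))


# Hypothetical need: 6 instances' worth of capacity at peak.
for zones in (2, 3, 4):
    per_zone = instances_per_zone(6, zones)
    print(f"{zones} AZs: {per_zone} per AZ, {per_zone * zones} total")
```

Spreading across more zones lowers the over-provisioning: 12 instances are needed with two zones, 9 with three, and 8 with four.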
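Slides 32 and 34 recommend using Amazon SQS as a buffer between components. The benefit is that a burst of work waits in the queue rather than overwhelming, or being rejected by, the consumer, which then drains the backlog at its own pace. The small Python simulation below shows that effect. The function name and the traffic numbers are assumptions for illustration.

```python
from typing import List


def simulate_backlog(arrivals: List[int], consumer_rate: int) -> List[int]:
    """Queue depth after each tick when a producer enqueues `arrivals[t]`
    messages and a consumer processes up to `consumer_rate` per tick."""
    depth, history = 0, []
    for arriving in arrivals:
        depth += arriving
        depth -= min(depth, consumer_rate)
        history.append(depth)
    return history


# A spike of 50 messages against a consumer that handles 10 per tick.
print(simulate_backlog([5, 50, 5, 5, 5, 5, 5, 5], consumer_rate=10))
```

The producer never waits on the consumer, and the growing backlog is a natural signal for adding consumers. SQS exposes queue depth through CloudWatch metrics such as `ApproximateNumberOfMessagesVisible`.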
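Slides 20 and 34 advise keeping components stateless and storing session state outside them. This lets any instance behind the load balancer serve any request, and lets a failed instance be replaced without losing user context. A minimal Python sketch follows. `SessionStore`, `DictStore`, and `handle_add_to_cart` are hypothetical names. The in-memory store stands in for an external data store, such as the Amazon SimpleDB option the slide mentions.

```python
from __future__ import annotations

from typing import Protocol


class SessionStore(Protocol):
    """Any external key-value store (hypothetical interface)."""

    def get(self, key: str) -> dict | None: ...

    def put(self, key: str, value: dict) -> None: ...


class DictStore:
    """In-memory stand-in for an external store, for local testing only."""

    def __init__(self) -> None:
        self._data: dict[str, dict] = {}

    def get(self, key: str) -> dict | None:
        return self._data.get(key)

    def put(self, key: str, value: dict) -> None:
        self._data[key] = dict(value)


def handle_add_to_cart(store: SessionStore, session_id: str, item: str) -> list[str]:
    """Stateless handler: user context is loaded from and saved back to the
    shared store, so no instance keeps it in local memory."""
    session = store.get(session_id) or {"cart": []}
    session["cart"] = session["cart"] + [item]
    store.put(session_id, session)
    return session["cart"]


shared = DictStore()
# Two requests from the same user may reach different instances behind the ELB;
# both see the same cart because it lives in the shared store.
handle_add_to_cart(shared, "session-123", "book")
print(handle_add_to_cart(shared, "session-123", "lamp"))
```

Because the handler holds no state between calls, Auto Scaling can add or terminate instances freely without logging users out.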