
AWS Summit Benelux 2013 - Architecting for High Availability

Published in: Technology, Business

  1. 1. ARCHITECTING FOR HIGH AVAILABILITY Carlos Conde Sr. Mgr. Solutions Architecture
  2. 2. “LET’S BUILD A ________ WEB APPLICATION”
  3. 3. “LET’S BUILD A HIGHLY AVAILABLE ________ WEB APPLICATION”
  4. 4. “LET’S BUILD A HIGHLY AVAILABLE AND SCALABLE ________ WEB APPLICATION”
  5. 5. “LET’S BUILD A HIGHLY AVAILABLE, DURABLE AND SCALABLE ________ WEB APPLICATION”
  6. 6. “LET’S BUILD A HIGHLY AVAILABLE, DURABLE, RESILIENT AND SCALABLE ________ WEB APPLICATION”
  7. 7. AWS BUILDING BLOCKS Inherently Fault-Tolerant Services: Amazon S3, Amazon DynamoDB, Amazon CloudFront, Amazon SWF, Amazon SQS, Amazon SNS, Amazon SES, Amazon Route53, Elastic Load Balancing, AWS IAM, AWS Elastic Beanstalk, Amazon ElastiCache, Amazon EMR, Amazon Redshift, Amazon CloudSearch. Fault-Tolerant with the right architecture: Amazon EC2, Amazon EBS, Amazon RDS, Amazon VPC
  8. 8. 1. DESIGN FOR FAILURE 2. USE MULTIPLE AZs 3. BUILD FOR SCALE 4. DECOUPLE COMPONENTS
  9. 9. « Everything fails all the time » Werner Vogels CTO of Amazon
  10. 10. YOUR GOAL APPLICATIONS SHOULD CONTINUE TO FUNCTION EVEN IF THE UNDERLYING PHYSICAL HARDWARE FAILS OR IS REMOVED OR REPLACED
  11. 11. #1 DESIGN FOR FAILURE
  12. 12. AVOID SINGLE POINTS OF FAILURE ASSUME EVERYTHING FAILS, AND WORK BACKWARDS
  13. 13. AVOID SINGLE POINTS OF FAILURE ASSUME EVERYTHING FAILS, AND WORK BACKWARDS
  14. 14. HEALTH CHECKS
  15. 15. #2 USE MULTIPLE AVAILABILITY ZONES
  16. 16. US-WEST (N. California) EU-WEST (Ireland) ASIA PAC (Tokyo) ASIA PAC (Singapore) US-WEST (Oregon) SOUTH AMERICA (Sao Paulo) US-EAST (Virginia) GOV CLOUD ASIA PAC (Sydney)
  17. 17. AMAZON RDS MULTI-AZ
  18. 18. #3 BUILD FOR SCALE
  19. 19. AMAZON CLOUDWATCH MONITORING FOR AWS RESOURCES
  20. 20. AUTO SCALING SCALE UP/DOWN EC2 CAPACITY
  21. 21. HEALTH CHECKS + AUTO SCALING
  22. 22. HEALTH CHECKS + AUTO SCALING = SELF-HEALING
  23. 23. #4 DECOUPLE COMPONENTS
  24. 24. BUILD LOOSELY COUPLED SYSTEMS The more loosely they are coupled, the bigger they scale and the more fault tolerant they get…
  25. 25. PUBLISH & NOTIFY RECEIVE TRANSCODE
  26. 26. AMAZON SQS SIMPLE QUEUE SERVICE
  27. 27. PUBLISH & NOTIFY RECEIVE TRANSCODE
  28. 28. PUBLISH & NOTIFY RECEIVE TRANSCODE
  29. 29. PUBLISH & NOTIFY RECEIVE
  30. 30. PUBLISH & NOTIFY RECEIVE TRANSCODE
  31. 31. ARCHITECTURE DESIGN PATTERN
  32. 32. SQS VISIBILITY TIMEOUT
  33. 33. BUFFERING
  34. 34. CLOUDWATCH METRICS FOR AMAZON SQS + AUTO SCALING
  35. 35. PUBLISH & NOTIFY RECEIVE TRANSCODE
  36. 36. PUBLISH & NOTIFY RECEIVE TRANSCODE
  37. 37. CAT? CHECK IMAGE TOO BIG? RESIZE IMAGE NO YES NO OMG, IT’S A CAT! TRANSCODE CAT CHECK START PUBLISH & NOTIFY STOP REJECT
  38. 38. CAT? CHECK IMAGE TOO BIG? RESIZE IMAGE NO YES NO YES TRANSCODE CAT CHECK START PUBLISH & NOTIFY STOP REJECT
  39. 39. CAT? CHECK IMAGE TOO BIG? RESIZE IMAGE NO YES NO YES TRANSCODE CAT CHECK START PUBLISH & NOTIFY STOP REJECT
  40. 40. TASKS DECISIONS HISTORY
  41. 41. TASKS DECISIONS HISTORY STATELESS !
  42. 42. STATELESS SCALES HORIZONTALLY
  43. 43. AMAZON SWF ENABLES RESILIENT, SCALABLE, DISTRIBUTED WORKFLOWS
  44. 44. WORKFLOW ACTORS
  45. 45. DECIDERS: COORDINATION LOGIC 1. Poll for work on a decision list (long polling: 60 seconds). 2. Evaluate workflow execution history (SWF sends full history in JSON format). 3. Return decision to Amazon SWF (usually scheduling another task).
  46. 46. WORKERS: EXECUTION LOGIC 1. Poll for work on a specific task list (long polling: 60 seconds). 2. Execute work, send heartbeats (SWF sends input data from deciders). 3. Return success / failure (detailed data can be provided to deciders).
  47. 47. SWF IS WATCHING TRACKING: • Execution tracking: time to start, time to finish, …; time to finish for overall workflow; timeouts controlled for each of these (and more) • Heartbeats for long-running activities (optional) • Decider is informed of timeouts: schedule retries, “mitigation” strategies or cleanup tasks
  48. 48. NO NEW LANGUAGE TO LEARN YOUR CODE IS YOUR WORKFLOW LANGUAGE AMAZON SWF MAINTAINS STATE
  49. 49. ALL HORIZONTAL SCALING PATTERNS APPLY
  50. 50. CHAINED TASKS WITHOUT DECISIONS? USE AMAZON SQS PUBLISH & NOTIFY RECEIVE TRANSCODE
  51. 51. TASK GRAPH WITH DECISIONS? USE AMAZON SWF SANITY CHECK RECEIVE DATA CHECK FORMAT REJECT ADJUST FORMAT PUBLISH & NOTIFY GOOD LONG OK SPAM TRANSCODE
  52. 52. 1. DESIGN FOR FAILURE 2. USE MULTIPLE AZs 3. BUILD FOR SCALE 4. DECOUPLE COMPONENTS
  53. 53. YOUR GOAL APPLICATIONS SHOULD CONTINUE TO FUNCTION EVEN IF THE UNDERLYING PHYSICAL HARDWARE FAILS OR IS REMOVED OR REPLACED
  54. 54. AWS ARCHITECTURE CENTER http://aws.amazon.com/architecture AWS TECHNICAL ARTICLES http://aws.amazon.com/articles AWS BLOG http://aws.typepad.com AWS PODCAST http://aws.amazon.com/podcast
  55. 55. THANK YOU!
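
Slide 32 (SQS VISIBILITY TIMEOUT) names the mechanism that lets a queue survive a crashed worker: a received message is hidden from other consumers for a while, and it reappears if nobody deletes it in time. The sketch below illustrates that idea with a small in-memory queue. It is not the Amazon SQS API; `ToyQueue`, its methods, the 30-second timeout and the explicit `now` clock argument are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ToyQueue:
    """Hypothetical in-memory queue with a visibility timeout (not the SQS API)."""
    visibility_timeout: float = 30.0
    _messages: Dict[int, str] = field(default_factory=dict)
    _invisible_until: Dict[int, float] = field(default_factory=dict)
    _next_id: int = 0

    def send(self, body: str) -> int:
        """Store a message; it is visible to consumers immediately."""
        msg_id = self._next_id
        self._next_id += 1
        self._messages[msg_id] = body
        self._invisible_until[msg_id] = 0.0
        return msg_id

    def receive(self, now: float) -> Optional[Tuple[int, str]]:
        """Return one visible message and hide it until now + visibility_timeout."""
        for msg_id, body in self._messages.items():
            if self._invisible_until[msg_id] <= now:
                self._invisible_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id: int) -> None:
        """Acknowledge successful processing; the message is never redelivered."""
        self._messages.pop(msg_id, None)
        self._invisible_until.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30.0)
queue.send("transcode video-1")

first = queue.receive(now=0.0)    # worker A takes the message, then crashes
hidden = queue.receive(now=10.0)  # still hidden from other workers
retry = queue.receive(now=31.0)   # timeout expired, so worker B receives it
queue.delete(retry[0])            # worker B finishes and deletes the message
gone = queue.receive(now=100.0)   # nothing is left to redeliver
```

Because a message is only removed by an explicit delete, a worker that dies mid-task costs a delay of one timeout rather than a lost job. The flip side is that the same message can be delivered twice, so the work it triggers should be safe to repeat.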

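Slides 40 to 45 describe deciders as stateless: Amazon SWF keeps the history, and the decider only maps that history to the next decision. A minimal sketch of that shape follows, using one possible reading of the cat-check flowchart on slides 37 to 39. The event format, the task names and the `schedule:`/`complete` strings are hypothetical and do not reflect the actual SWF history schema or decision types.

```python
from typing import Dict, List

# Hypothetical event shape: {"task": <task name>, "result": <outcome>}.
Event = Dict[str, str]


def decide(history: List[Event]) -> str:
    """Return the next step from the full history; no state is kept here."""
    if not history:
        return "schedule:cat_check"  # workflow has just started

    last = history[-1]
    task, result = last["task"], last["result"]

    if result == "timed_out":
        return f"schedule:{task}"  # simple retry of the task that timed out
    if task == "cat_check":
        return "schedule:check_image" if result == "yes" else "schedule:reject"
    if task == "check_image":
        return "schedule:resize_image" if result == "too_big" else "schedule:transcode"
    if task == "resize_image":
        return "schedule:transcode"
    if task == "transcode":
        return "schedule:publish_and_notify"
    return "complete"  # reject or publish_and_notify has finished


history: List[Event] = [
    {"task": "cat_check", "result": "yes"},
    {"task": "check_image", "result": "too_big"},
]
next_step = decide(history)
```

Since `decide` depends only on its argument, any number of decider processes can run it in parallel and any of them can pick up a workflow another one started, which is the point of slide 42 (STATELESS SCALES HORIZONTALLY).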