AWS Summit 2011: Designing Fault-Tolerant Applications


Published in: Technology

  1. 1. Designing Fault-Tolerant Applications
     Miles Ward, Enterprise Solutions Architect
  2. 2. Building Fault-Tolerant Applications on AWS
     • White paper published last year
     • Sharing best practices
     • We’d like to hear your best practices as well
     Copyright © 2011 Amazon Web Services
  3. 3. AWS Fault-Tolerant Building Blocks
     Two approaches:
     1) AWS services that are inherently fault-tolerant and highly available:
        • Amazon Simple Storage Service (S3)
        • Amazon SimpleDB
        • Amazon SQS, SNS, SES, CloudWatch, CloudFront, and more.
     2) AWS services that offer tools and features to design fault-tolerant and highly available systems:
        • Amazon Elastic Compute Cloud (EC2)
          – Availability Zones, Elastic IPs, EBS, etc.
          – Flexible to trade off budget vs. time to recovery
        • Amazon Relational Database Service (RDS)
          – Multi-AZ Deployments
          – Backup/Restore
  4. 4. Amazon EC2 Architecture
     [Diagram labels: Region, Availability Zone, Amazon Machine Image (AMI), EC2 Instance, Ephemeral Storage, Elastic Block Storage (EBS), EBS Snapshots, Security Group(s), Elastic IP Address, CloudWatch, Auto Scaling, Load Balancing, Amazon S3]
  5. 5. EC2 Features
     AMI
     • Packaged, reusable functionality
     On-Instance Storage
     • Lifetime tied to instance lifetime
     • Annualized failure rate (AFR) similar to a standard hard disk (around 5%)
     EBS Volumes
     • Lifetime independent of any particular EC2 instance
     • Redundant within an AZ
     • AFR is 0.1% to 0.5%
     • Incorporate volume mappings into your architecture
     • Use EBS snapshot backups
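The failure rates quoted on slide 5 can be turned into rough fleet-level numbers. The sketch below is only an illustration: it assumes volumes fail independently of one another, and the fleet size of 100 and the function names are hypothetical, not taken from the slides.

```python
# Rough arithmetic on the annualized failure rates (AFR) quoted on slide 5.
# Assumption (not from the slides): failures are independent across volumes.


def expected_failures(volume_count: int, afr: float) -> float:
    """Expected number of volumes that fail within one year."""
    return volume_count * afr


def prob_at_least_one_failure(volume_count: int, afr: float) -> float:
    """Probability that at least one of the volumes fails within one year."""
    return 1.0 - (1.0 - afr) ** volume_count


FLEET_SIZE = 100  # hypothetical fleet size, chosen for illustration

RATES = [
    ("on-instance storage (around 5%)", 0.05),
    ("EBS volume, upper bound (0.5%)", 0.005),
    ("EBS volume, lower bound (0.1%)", 0.001),
]

for label, afr in RATES:
    print(
        f"{label}: expect {expected_failures(FLEET_SIZE, afr):.1f} failures/year, "
        f"P(at least one) = {prob_at_least_one_failure(FLEET_SIZE, afr):.3f}"
    )
```

Under these assumptions the chance of losing at least one volume grows with fleet size even at the EBS rates, which is consistent with the slide's advice to take EBS snapshot backups.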
  6. 6. EC2 Features
     Elastic IP Addresses
     • Map to any EC2 instance within a given Region
     • Detach from failed instance; map to replacement
     Auto Scaling
     • Two ways to use it:
       – Respond to changing conditions by adding or terminating EC2 instances (attach to CloudWatch metrics)
       – Maintain a fixed number of instances running, replacing them if they fail or become unhealthy
     Reserved Instances
     • Guarantees capacity for when it’s needed
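The second Auto Scaling mode on slide 6 (keep a fixed number of instances running and replace unhealthy ones) amounts to a reconcile loop. What follows is a minimal in-memory simulation of that idea, not the Auto Scaling API: the `Instance` class, `launch_instance`, `reconcile`, and the `i-sim…` ids are all hypothetical names introduced for illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Instance:
    """Stand-in for an EC2 instance; only tracks an id and a health flag."""
    instance_id: str
    healthy: bool = True


_id_counter = itertools.count(1)


def launch_instance() -> Instance:
    """Simulate launching a replacement instance with a fresh, made-up id."""
    return Instance(instance_id=f"i-sim{next(_id_counter):04d}")


def reconcile(fleet: list[Instance], desired: int) -> list[Instance]:
    """Drop unhealthy instances, then launch or trim until `desired` remain."""
    survivors = [inst for inst in fleet if inst.healthy]
    while len(survivors) < desired:
        survivors.append(launch_instance())
    return survivors[:desired]


fleet = [launch_instance() for _ in range(3)]
fleet[1].healthy = False  # simulate one instance failing its health check
fleet = reconcile(fleet, desired=3)
print([inst.instance_id for inst in fleet])
```

The same loop covers the Elastic IP bullet in spirit: once a replacement exists, the address would be remapped from the failed instance to the new one.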
  7. 7. EC2 Features
     CloudWatch Alarms
  8. 8. EC2 Features
     Elastic Load Balancing
     • Distributes incoming traffic across multiple instances
     • Sends traffic only to healthy instances
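The two behaviours listed on slide 8 (spread traffic, skip unhealthy instances) can be sketched as a health-aware round-robin. This is a toy model written for illustration, not how Elastic Load Balancing is implemented; the class name, method names, and backend names are assumptions.

```python
class HealthAwareBalancer:
    """Round-robin over backends, skipping any that are marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the next backend to consider

    def mark(self, backend, is_healthy):
        """Record the result of a health check for one backend."""
        if is_healthy:
            self.healthy.add(backend)
        else:
            self.healthy.discard(backend)

    def route(self):
        """Return the next healthy backend, or raise if none are healthy."""
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next % len(self.backends)]
            self._next += 1
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


balancer = HealthAwareBalancer(["web-a", "web-b", "web-c"])
balancer.mark("web-b", False)  # simulate a failed health check
print([balancer.route() for _ in range(4)])
```

Marking `web-b` healthy again with `balancer.mark("web-b", True)` puts it back into rotation on the next pass.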
  9. 9. Amazon EC2 Regions and Availability Zones
     [Diagram: two example regions, US East (Northern Virginia) and EU (Dublin), each containing multiple Availability Zones (labeled Zone A through Zone D)]
     Amazon EC2 Regions: US East (Northern Virginia) / US West (Northern California) / EU (Ireland) / Asia Pacific (Singapore) / Asia Pacific (Tokyo)
  10. 10. Availability Zone Characteristics and Advice
      • Distinct physical locations
      • Low-latency network connections between AZs
      • Independent power, cooling, network, security
      • Always partition app stacks across 2 or more AZs
      • Elastic Load Balance across instances in multiple AZs
  11. 11. Proper Use of Multiple Availability Zones
      [Diagram: incoming requests reach an Elastic Load Balancer, which sends requests and health checks to Availability Zone A and Availability Zone B. Each zone contains a Web Server, an App Server, and a Database Server or RDS DB Instance. Centralized services (S3 backups, SimpleDB, etc.) are shown alongside both zones.]
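The layout on slides 10 and 11 puts every tier in at least two Availability Zones. A minimal sketch of that placement rule follows, assuming a simple alternating assignment; the function name, zone names, and instance names are placeholders rather than real AWS identifiers.

```python
from collections import Counter
from itertools import cycle


def place_tier(tier: str, count: int, zones: list[str]) -> dict[str, str]:
    """Assign `count` instances of one tier to zones in alternating order."""
    if len(zones) < 2:
        raise ValueError("use at least two Availability Zones")
    if count < len(zones):
        raise ValueError("need at least one instance per zone")
    zone_cycle = cycle(zones)
    return {f"{tier}-{n}": next(zone_cycle) for n in range(1, count + 1)}


ZONES = ["zone-a", "zone-b"]  # placeholder zone names

placement = {}
for tier in ("web", "app", "db"):
    placement.update(place_tier(tier, 2, ZONES))

for name, zone in placement.items():
    print(name, "->", zone)
print(Counter(placement.values()))
```

With this assignment, losing either zone still leaves one web, one app, and one database instance running in the other.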
  12. 12. Region Characteristics and Advice
      Regions are:
      • Functionally separate
      • Composed of 2 or more AZs
      • Connected via the public internet
      Use regions to:
      • Have functionality geographically close to customers
      • Comply with national laws and practices
      • Implement a DR strategy
  13. 13. RDS Fault-Tolerant Features
      Multi-AZ Deployments
      • Synchronous replication across AZs
      • Automatic fail-over to standby replica
      Automated Backups
      • Enable point-in-time recovery of the DB instance
      • Retention period is configurable
      Snapshots
      • User-initiated full backup of the DB
      • A new DB can be created from a snapshot
  14. 14. AWS Architectural Guidance
  15. 15. Design For Failure – Basic Principles
      • Avoid single points of failure
      • Assume everything fails, and design backwards
        – Goal: Applications should continue to function even if the underlying physical hardware fails or is removed or replaced.
      • Design your recovery process
      • Trade off business needs vs. cost of high availability
  16. 16. Design For Failure – Use AWS Building Blocks
      • Use Elastic IP addresses for consistent and re-mappable routes
      • Use multiple Amazon EC2 Availability Zones (AZs)
      • Replicate data across multiple AZs
        – Example: Amazon RDS Multi-AZ mode
      • Use real-time monitoring (Amazon CloudWatch)
      • Use Amazon Elastic Block Store (EBS) for persistent file systems
      • Take EBS Snapshots and use S3 for backups
  17. 17. Build Loosely Coupled Systems
      • Use independent components
      • Design everything as a Black Box
      • Load-balance and scale clusters
      • Think about graceful degradation
      Amazon SQS as Buffers
      [Diagram: tight coupling, with Controllers A, B, and C connected directly, versus loose coupling using queues, with Controllers A, B, and C connected through queues (Q)]
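The queue-as-buffer idea on slide 17 can be shown with an in-process queue. In the sketch below, Python's `queue.Queue` stands in for Amazon SQS purely for illustration; it has none of SQS's durability or delivery semantics, and the controller functions and message names are hypothetical.

```python
import queue


def controller_a(items, outbox):
    """Producer: does its own work and hands results to the next stage's queue."""
    for item in items:
        outbox.put(item.upper())


def controller_b(inbox, outbox):
    """Consumer: drains whatever is waiting and returns how many items it handled."""
    processed = 0
    while True:
        try:
            item = inbox.get_nowait()
        except queue.Empty:
            return processed
        outbox.put(f"{item}!")
        processed += 1


q_ab = queue.Queue()  # buffer between controllers A and B
q_bc = queue.Queue()  # buffer between controllers B and C

# Controller B is "down" here, yet A still completes because it only talks to the queue.
controller_a(["order-1", "order-2"], q_ab)
backlog = q_ab.qsize()

# When B comes back, it works through the backlog at its own pace.
handled = controller_b(q_ab, q_bc)
print(f"backlog while B was down: {backlog}, handled after recovery: {handled}")
```

In the tightly coupled variant, A would call B directly and fail whenever B was unavailable; the queue lets the two sides fail and scale independently.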
  18. 18. Implement Elasticity
      • Don’t assume health or fixed location of components
      • Use designs that are resilient to reboot and re-launch
      • Bootstrap your instances
        – “Who am I and what is my role?”
      • Enable dynamic configuration
        – Use configurations in SimpleDB for bootstrapping
      • Use Auto Scaling
      • Use Elastic Load Balancing on each tier
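The bootstrapping question on slide 18 ("Who am I and what is my role?") can be sketched as a lookup against a central configuration store at start-up. Here a plain dictionary stands in for SimpleDB; the key layout, role names, package names, and the `bootstrap` function are assumptions made for illustration, not an AWS API.

```python
# Stand-in for a central configuration store such as SimpleDB (hypothetical contents).
CONFIG_STORE = {
    "role:web": {"packages": ["nginx"], "port": 80},
    "role:app": {"packages": ["app-server"], "port": 8080},
}


def bootstrap(instance_metadata: dict, config_store: dict) -> dict:
    """Resolve an instance's role and return the configuration it should apply."""
    role = instance_metadata.get("role")
    if role is None:
        raise ValueError("instance has no role; cannot bootstrap")
    key = f"role:{role}"
    if key not in config_store:
        raise KeyError(f"no configuration stored for role {role!r}")
    config = dict(config_store[key])  # copy so the shared store is not mutated
    config["instance_id"] = instance_metadata["instance_id"]
    return config


# A freshly launched instance only knows its own identity and role.
me = {"instance_id": "i-sim0001", "role": "web"}
print(bootstrap(me, CONFIG_STORE))
```

Because nothing role-specific is baked into the instance, the same image can be relaunched in any tier, and changing the stored configuration changes what new instances do.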
  19. 19. Implementing Elasticity
      Elastic Load Balancing, CloudWatch, and Auto Scaling
      [Diagram labels: Elastic Load Balancing, utilization metrics, CloudWatch, Auto Scaling]
  20. 20. Use a Chaos Monkey
      From the Netflix blog:
      Simple monkey:
      • Kill any instance in the account
      Complex monkey:
      • Kill instances with specific tags
      • Introduce other faults (e.g. connectivity via Security Group)
      Human monkey:
      • Kill instances from the AWS Management Console
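The simple and complex monkeys on slide 20 differ only in how they pick a victim. The sketch below simulates both against an in-memory fleet; it is not Netflix's tool and makes no AWS calls, and the instance ids, tag names, and function names are invented for illustration.

```python
import random


def simple_monkey(instances, rng):
    """Terminate one running instance chosen at random; return its id or None."""
    running = [inst for inst in instances if inst["state"] == "running"]
    if not running:
        return None
    victim = rng.choice(running)
    victim["state"] = "terminated"
    return victim["id"]


def complex_monkey(instances, rng, required_tags):
    """Like simple_monkey, but only targets instances carrying all required tags."""
    tagged = [
        inst for inst in instances
        if all(inst["tags"].get(key) == value for key, value in required_tags.items())
    ]
    return simple_monkey(tagged, rng)


fleet = [
    {"id": "i-sim0001", "state": "running", "tags": {"tier": "web"}},
    {"id": "i-sim0002", "state": "running", "tags": {"tier": "web"}},
    {"id": "i-sim0003", "state": "running", "tags": {"tier": "db"}},
]

rng = random.Random(2011)  # seeded so a run can be repeated
print("complex monkey killed:", complex_monkey(fleet, rng, {"tier": "web"}))
print("simple monkey killed:", simple_monkey(fleet, rng))
print("still running:", [inst["id"] for inst in fleet if inst["state"] == "running"])
```

In a real exercise the interesting part is what happens next: whether load balancing, Auto Scaling, and the recovery process described earlier keep the application serving requests.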
  21. 21. AWS Architecture Center
      White papers:
      • Cloud architectures
      • Building fault-tolerant applications
      • Web hosting best practices
      • Leveraging different storage options
      • AWS security best practices
  22. 22. Thank You!