Building Fault-Tolerant
Applications in the Cloud
Ryan Holland
Ecosystem Solution Architect
Faults?
Facilities
Hardware
Networking
Code



People
What is “Fault-Tolerant”?
Degrees of risk mitigation - not binary




Automated

Tested!
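To make "degrees, not binary" concrete, here is a minimal sketch of how redundancy changes availability. It assumes component failures are independent, which real systems only approximate, and the function name is hypothetical.

```python
def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one suffices.

    Assumes failures are independent (an idealization).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The system is down only when every copy is down at the same time.
    return 1.0 - (1.0 - single) ** copies


two_copies = parallel_availability(0.99, 2)
three_copies = parallel_availability(0.99, 3)
```

Each extra copy buys a smaller improvement than the last, so the right amount of redundancy is a cost and risk decision rather than a yes/no property.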
Agenda
The AWS Approach

Building Blocks

Design Patterns
Old School Fault-Tolerance: Build Two
Cloud Computing Benefits
No Up-Front Capital Expense
Low Cost
Pay Only for What You Use
Self-Service Infrastructure
Easily Scale Up and Down
Improve Agility & Time-to-Market
Deploy
Cloud Computing Fault-Tolerance Benefits
No Up-Front HA Capital Expense
Low Cost Backups
Pay for DR Only When You Use it
Self-Service DR Infrastructure
Easily Deliver Fault-Tolerant Applications
Improve Agility & Time-to-Recovery
Deploy
AWS Cloud allows Overcast Redundancy

Have the shadow duplicate of your infrastructure ready to go when you need it…

…but only pay for what you actually use
Old Barriers to HA are now Surmountable

Cost

Complexity

Expertise
AWS Building Blocks: Two Strategies
Inherently fault-tolerant services:
  S3
  SimpleDB
  DynamoDB
  CloudFront
  SWF, SQS, SNS, SES
  Route53
  Elastic Load Balancer
  Elastic Beanstalk
  ElastiCache
  Elastic MapReduce
  IAM

Services that are fault-tolerant with the right architecture:
  Amazon EC2
  VPC
  EBS
  RDS
The Stack:
  Resources
  Deployment
  Management
  Configuration
  Networking
  Facilities
  Geographies
The Stack:
  EC2 Instances
  Amazon Machine Images
  CW Alarms - AutoScaling
  CloudFormation - Beanstalk
  Route53 – ElasticIP – ELB
  Availability Zones
  Regions
Regional Diversity

Use Regions for:
  Latency
   • Customers
   • Data Vendors
   • Staff
  Compliance
  Disaster Recovery
  … and Fault Tolerance!
Proper Use of Multiple Availability Zones
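A minimal sketch of the idea, assuming a simple round-robin placement: spread instances evenly across zones so that losing any one zone removes only a fraction of capacity. The zone names and function names are illustrative, not an AWS API.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_count, zones):
    """Assign instances to zones round-robin so the load is as even as possible."""
    if not zones:
        raise ValueError("at least one zone is required")
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(instance_count)]


def surviving_capacity(placement, failed_zone):
    """Count the instances that remain if `failed_zone` goes away entirely."""
    return sum(1 for zone in placement if zone != failed_zone)


# Illustrative zone names; substitute the zones of your own region.
placement = spread_across_zones(6, ["us-east-1a", "us-east-1b", "us-east-1c"])
per_zone = Counter(placement)
remaining = surviving_capacity(placement, "us-east-1a")
```

With six instances over three zones, one zone failure leaves four instances running, so size the fleet so that the surviving count still carries peak load.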
Network Fault-Tolerance Tools
107.22.18.45 isn’t fault-tolerant, but 50.17.200.146 is: Elastic IP (EIP)

Elastic Load Balancing

Automated DNS: Route53

Latency-Based Routing
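The routing idea can be sketched as follows: send each client to the lowest-latency region that is still healthy. This is a toy illustration, not Route53's actual algorithm; the function name, the latency figures, and the health map are all assumptions.

```python
def pick_endpoint(latencies_ms, healthy):
    """Return the healthy region with the lowest latency, or None if none is healthy."""
    # Regions missing from the health map are treated as unhealthy.
    candidates = {
        region: ms
        for region, ms in latencies_ms.items()
        if healthy.get(region, False)
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical measurements from one client's point of view.
latencies = {"us-east-1": 40, "eu-west-1": 120}
normal = pick_endpoint(latencies, {"us-east-1": True, "eu-west-1": True})
failover = pick_endpoint(latencies, {"us-east-1": False, "eu-west-1": True})
```

The same rule that optimizes latency also gives fault tolerance: when the nearest region fails, traffic moves to the next-best one.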
New EC2 VPC feature:
Elastic Network Interface


    Up to 8 Interfaces with 30 Addresses each
    Span Subnets
    Attach/Detach
    Public or Private
CloudFormation – Elastic Beanstalk




  Q: Is your stack unique?
CloudWatch – Alarms – AutoScaling
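A rough sketch of the monitor, alarm, scale loop, assuming an alarm that fires when a metric stays above a threshold for several consecutive periods. The function names, thresholds, and sample values are hypothetical; the real services are configured through their own APIs.

```python
def alarm_breached(samples, threshold, periods):
    """True when the most recent `periods` samples all exceed `threshold`."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    recent = samples[-periods:]
    # Require a full window so a short history never triggers the alarm.
    return len(recent) == periods and all(s > threshold for s in recent)


def next_capacity(current, breached, minimum, maximum, step=1):
    """Add `step` instances on a breach, always staying within [minimum, maximum]."""
    target = current + step if breached else current
    return max(minimum, min(maximum, target))


cpu_percent = [50, 85, 90, 95]  # made-up CPU utilization samples
breached = alarm_breached(cpu_percent, threshold=80, periods=3)
capacity = next_capacity(current=2, breached=breached, minimum=2, maximum=4)
```

Requiring several consecutive breaches avoids scaling on a single spike, and the minimum keeps a floor of redundancy even when load is low.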
AMIs
Maintenance is critical

Alternatives: Chef, Puppet, cfn-init, etc.

When in doubt: 64-bit

Replicate for DR
EC2 Instances
Consistent, reliable building block

100% API controlled

Reserved Instances

EBS

Immense Fleet Scale
Example:
a “fork-lifted” app
Example:
Fault-Tolerant
Why mess with all of that?
Design For Failure




SPOF
Build Loosely Coupled Systems

Copyright © 2011 Amazon Web Services




Tight Coupling
Loose Coupling using Queues
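A minimal sketch of loose coupling with a queue, using Python's in-process `queue.Queue` as a stand-in for a hosted queue such as SQS. The producer only enqueues work; a consumer that fails on a job puts it back for another attempt instead of losing it. The function names and the retry limit are assumptions.

```python
import queue


def process_all(jobs, handler, max_attempts=3):
    """Drain `jobs`, re-queueing any job whose handler raises, up to max_attempts."""
    done, failed = [], []
    while True:
        try:
            payload, attempts = jobs.get_nowait()
        except queue.Empty:
            break
        try:
            done.append(handler(payload))
        except Exception:
            if attempts + 1 < max_attempts:
                jobs.put((payload, attempts + 1))  # retry later
            else:
                failed.append(payload)  # give up after max_attempts tries
    return done, failed


def make_flaky_handler():
    """Build a handler that fails the first time it sees payload 'b'."""
    seen = set()

    def handler(payload):
        if payload == "b" and payload not in seen:
            seen.add(payload)
            raise RuntimeError("simulated consumer crash")
        return payload.upper()

    return handler


jobs = queue.Queue()
for item in ["a", "b", "c"]:
    jobs.put((item, 0))  # (payload, attempts so far)
done, failed = process_all(jobs, make_flaky_handler())
```

Because the job goes back on the queue, the simulated crash delays "b" but does not drop it, and the producer never needs to know a consumer failed.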
Fault-Tolerant Front-end Systems

Addressing: Route53, EIP
Distribution: Multi-AZ, ELB, CloudFront
Redundancy: Auto-Scaling
Monitoring: CloudWatch
Platform: Elastic Beanstalk
Fault-Tolerant Data-Tier Systems

Tuned
Patched
Cached
Sharded
Replicated
Backed Up
Archived
Monitored
Fault-Tolerant Data-Tier Systems

LOTS OF WORK
AWS Fault-Tolerant Data-Tier Services
S3: Amazon Simple Storage Service
SimpleDB: Amazon SimpleDB
EMR: Amazon Elastic MapReduce
DynamoDB: Amazon DynamoDB
RDS: Amazon Relational Database Service
Amazon ElastiCache
RDS Fault-Tolerant Features
Multi-AZ Deployments
Read Replicas
Automated Backups
Snapshots

Diagram: RDS DB Instance and RDS DB Instance Multi-AZ Standby
Storage Gateway
Diagram labels:
  Your Datacenter: Clients, Application Servers, On-premises Host running the AWS Storage Gateway VM, Direct Attached or Storage Area Network Disks
  Link: SSL over the Internet or Direct Connect
  AWS: AWS Storage Gateway Service, Amazon Simple Storage Service (S3), Amazon Elastic Block Store (EBS), Amazon Elastic Compute Cloud (EC2)
Test! Use a Chaos Monkey!
Prudent
Conservative
Professional
Open source
…and all the cool kids are doing it




http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
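The testing idea can be sketched with a toy model: kill a random instance on purpose, then check that the group heals itself. `InstanceGroup` and `unleash_monkey` are hypothetical names invented for this illustration; this is not Netflix's Chaos Monkey code and not an AWS API.

```python
import random


class InstanceGroup:
    """Toy stand-in for a self-healing group of instances."""

    def __init__(self, desired):
        self.desired = desired
        self._next_id = 0
        self.instances = set()
        self.heal()

    def heal(self):
        """Launch replacements until the group is back at its desired size."""
        while len(self.instances) < self.desired:
            self.instances.add(f"i-{self._next_id:04d}")
            self._next_id += 1

    def terminate(self, instance_id):
        self.instances.discard(instance_id)


def unleash_monkey(group, rng):
    """Terminate one randomly chosen instance and return its id."""
    victim = rng.choice(sorted(group.instances))
    group.terminate(victim)
    return victim


rng = random.Random(42)  # seeded so the experiment is repeatable
group = InstanceGroup(desired=3)
victim = unleash_monkey(group, rng)
degraded_size = len(group.instances)
group.heal()
recovered_size = len(group.instances)
```

Running this kind of experiment routinely, rather than waiting for a real outage, is what turns "designed for failure" into "tested for failure".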
Thank You!

Fault Tolerant Applications on AWS