Building Fault-Tolerant
Applications in the Cloud
Ryan Holland
Ecosystem Solution Architect
Faults?
Facilities
Hardware
Networking
Code



People
What is “Fault-Tolerant”?
Degrees of risk mitigation - not binary




Automated

Tested!
Agenda
The AWS Approach

Building Blocks

Design Patterns
Old School Fault-Tolerance: Build Two
Cloud Computing Benefits
No Up-Front Capital Expense
Low Cost
Pay Only for What You Use
Self-Service Infrastructure
Easily Scale Up and Down
Improve Agility & Time-to-Market
Cloud Computing Fault-Tolerance Benefits
No Up-Front HA Capital Expense
Low Cost Backups
Pay for DR Only When You Use It
Self-Service DR Infrastructure
Easily Deliver Fault-Tolerant Applications
Improve Agility & Time-to-Recovery
AWS Cloud allows Overcast Redundancy

Have the shadow duplicate of your infrastructure ready to go when you need it…

…but only pay for what you actually use
Old Barriers to HA
are now Surmountable

Cost

Complexity

Expertise
AWS Building Blocks: Two Strategies

Inherently fault-tolerant services:
S3
SimpleDB
DynamoDB
CloudFront
SWF, SQS, SNS, SES
Route 53
Elastic Load Balancing
Elastic Beanstalk
ElastiCache
Elastic MapReduce
IAM

Services that are fault-tolerant with the right architecture:
Amazon EC2
VPC
EBS
RDS
The Stack:

Resources
Deployment
Management
Configuration
Networking
Facilities
Geographies
The Stack:

EC2 Instances
Amazon Machine Images
CloudWatch Alarms – Auto Scaling
CloudFormation – Elastic Beanstalk
Route 53 – Elastic IP – ELB
Availability Zones
Regions
Regional Diversity

Use Regions for:
  Latency
   • Customers
   • Data Vendors
   • Staff
  Compliance
  Disaster Recovery
  … and Fault Tolerance!
Proper Use of Multiple Availability Zones
Network Fault-Tolerance Tools
107.22.18.45   isn’t fault-tolerant but 50.17.200.146 is: EIP

Elastic Load Balancing

Automated DNS: Route53

Latency-Based Routing
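The latency-based routing idea above can be sketched in a few lines. This is illustrative only, not how Route 53 works internally, and the region names, latencies, and health-check states are made up: choose the lowest-latency endpoint that is still passing health checks.

```python
# Hypothetical per-client latency measurements, in milliseconds.
REGION_LATENCY_MS = {
    "us-east-1": 120.0,
    "sa-east-1": 18.0,
    "eu-west-1": 190.0,
}

def pick_region(latencies, healthy):
    """Return the lowest-latency region that is passing health checks."""
    candidates = {r: ms for r, ms in latencies.items() if r in healthy}
    if not candidates:
        raise RuntimeError("no healthy endpoints")
    return min(candidates, key=candidates.get)

# A Brazilian client with both regions healthy lands on sa-east-1;
# if sa-east-1 fails its health check, traffic shifts automatically.
print(pick_region(REGION_LATENCY_MS, {"us-east-1", "sa-east-1"}))  # sa-east-1
print(pick_region(REGION_LATENCY_MS, {"us-east-1", "eu-west-1"}))  # us-east-1
```

The point of the sketch is that failover is a routing decision, not a manual runbook step.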
New EC2 VPC feature:
Elastic Network Interface

Up to 8 interfaces with 30 addresses each
Span subnets
Attach/Detach
Public or Private
CloudFormation – Elastic Beanstalk




  Q: Is your stack unique?
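One way to picture the CloudFormation side of that answer: the whole stack is described as data, so it can be re-created identically in another AZ or region. This is a minimal hedged sketch, not a production template; the AMI ID, resource names, and instance type are placeholders.

```python
import json

# Hypothetical minimal CloudFormation template: one EC2 instance plus an
# Elastic IP, expressed as plain data so the stack is reproducible.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch: one web server with a stable public address",
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": "ami-12345678",   # placeholder AMI ID
                "InstanceType": "t1.micro",
            },
        },
        "WebServerEIP": {
            "Type": "AWS::EC2::EIP",
            "Properties": {"InstanceId": {"Ref": "WebServer"}},
        },
    },
}

print(json.dumps(template, indent=2))
```

Because the template is just a document, "rebuild the stack somewhere else" becomes a single create-stack call rather than a manual procedure.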
CloudWatch – Alarms – Auto Scaling
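The alarm-to-scaling loop can be sketched as follows. This is illustrative pseudologic rather than the CloudWatch implementation; the thresholds, evaluation periods, and step sizes are invented.

```python
def evaluate_alarm(cpu_samples, threshold=70.0, periods=3):
    """ALARM when the last `periods` samples all breach the threshold,
    mimicking an alarm that requires consecutive breaching periods."""
    recent = cpu_samples[-periods:]
    return len(recent) == periods and all(s > threshold for s in recent)

def scale(desired, in_alarm, step=2, maximum=8, minimum=2):
    """Apply a simple step-scaling policy, clamped to group limits."""
    if in_alarm:
        return min(desired + step, maximum)
    return max(desired - 1, minimum)  # scale in slowly when healthy

samples = [55.0, 80.0, 85.0, 91.0]   # hypothetical CPU utilization history
print(scale(2, evaluate_alarm(samples)))  # alarm fires -> 4 instances
```

The fault-tolerance payoff is the same loop running in reverse: when an instance dies and group capacity drops, the policy replaces it without a human in the path.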
AMI’s
Maintenance is critical

Alternatives: Chef, Puppet, cfn-init, etc.

When in doubt: 64-bit

Replicate for DR
EC2 Instances
Consistent, reliable building block

100% API controlled

Reserved Instances

EBS

Immense Fleet Scale
Example:
a “fork-lifted” app
Example:
Fault-Tolerant
Why mess with all of that?
Design For Failure




SPOF
Build Loosely Coupled Systems

Copyright © 2011 Amazon Web Services
Tight
Coupling
Loose Coupling
using Queues
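The loose-coupling pattern above can be sketched with a stdlib queue standing in for SQS (the job names are illustrative): the producer never waits on the worker, so a slow or restarting worker does not block the front end.

```python
import queue
import threading

jobs = queue.Queue()   # stands in for an SQS queue
results = []

def worker():
    """Consumer tier: pulls work at its own pace."""
    while True:
        job = jobs.get()
        if job is None:        # shutdown sentinel
            break
        results.append(f"processed {job}")
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# Producer tier: enqueues and moves on, regardless of worker pace.
for order_id in (101, 102, 103):
    jobs.put(order_id)

jobs.put(None)   # tell the worker to stop
t.join()
print(results)   # ['processed 101', 'processed 102', 'processed 103']
```

With a durable queue in the middle, either side can fail and be replaced while the messages wait safely, which is exactly what the tight-coupling diagram cannot offer.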
Fault-Tolerant Front-end Systems

Addressing: Route 53, EIP

Distribution: Multi-AZ, ELB, CloudFront

Redundancy: Auto Scaling

Monitoring: CloudWatch

Platform: Elastic Beanstalk

(Diagram: an Elastic Beanstalk environment fronted by Route 53, an Elastic IP, and an Elastic Load Balancer, with Auto Scaling, CloudWatch, and CloudFront around it.)
Fault-Tolerant Data-Tier Systems

Tuned
Patched
Cached
Sharded
Replicated
Backed Up
Archived
Monitored
Fault-Tolerant Data-Tier Systems

Tuned
Patched
Cached
Sharded
Replicated
Backed Up
Archived
Monitored

…LOTS OF WORK
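One of the items in that list, sharding, can be sketched as deterministic key-to-shard routing. This is a simplified illustration (real systems typically use consistent hashing so shards can be added without remapping everything):

```python
import hashlib

def shard_for(key, num_shards=4):
    """Route a key to a fixed shard so reads and writes for that key
    always land on the same database."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Hypothetical user keys spread deterministically across 4 shards.
assignments = {k: shard_for(k) for k in ("alice", "bob", "carol")}
print(assignments)
```

Every one of the list's items needs this kind of engineering effort when you run the data tier yourself, which is the motivation for the managed services on the next slide.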
AWS Fault-Tolerant Data-Tier Services

S3 – Amazon Simple Storage Service
Amazon SimpleDB
EMR – Amazon Elastic MapReduce
Amazon DynamoDB
RDS – Amazon Relational Database Service
Amazon ElastiCache
RDS Fault-Tolerant Features

Multi-AZ Deployments

Read Replicas

Automated Backups

Snapshots

(Diagram: primary RDS DB instance with a Multi-AZ standby in a second Availability Zone.)
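As a hedged sketch of how those features are switched on (assuming boto3; the identifiers, sizes, and credentials below are placeholders), the fault-tolerance choices are just parameters on the create call, which is left commented out here:

```python
# Parameters that enable the RDS fault-tolerance features listed above.
db_params = {
    "DBInstanceIdentifier": "app-db",          # placeholder name
    "Engine": "mysql",
    "DBInstanceClass": "db.m1.large",          # placeholder size
    "AllocatedStorage": 100,
    "MasterUsername": "admin",
    "MasterUserPassword": "change-me",         # placeholder credential
    "MultiAZ": True,                # synchronous standby in a second AZ
    "BackupRetentionPeriod": 7,     # automated backups, point-in-time restore
}

# With AWS credentials configured, the actual call would look like:
# import boto3
# boto3.client("rds").create_db_instance(**db_params)

print(sorted(db_params))
```

Compare this with the "LOTS OF WORK" slide: replication, failover, and backups collapse into two fields.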
Storage Gateway

(Diagram: clients and application servers in your datacenter use direct-attached or SAN disks through an on-premises AWS Storage Gateway VM; the gateway uploads data over SSL, via the Internet or Direct Connect, to the AWS Storage Gateway service and Amazon Simple Storage Service (S3), where it can be restored as Amazon Elastic Block Storage (EBS) volumes for Amazon Elastic Compute Cloud (EC2) instances.)
Test! Use a Chaos Monkey!

Prudent

Conservative

Professional

Open source

…and all the cool kids are doing it

http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
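In the spirit of Netflix's Chaos Monkey, a toy sketch of the idea (the instance IDs are made up, and `discard` stands in for a real terminate call): kill a random instance, then verify the fleet still has capacity.

```python
import random

def chaos_strike(instances, rng=random):
    """Remove one randomly chosen instance from the fleet, simulating
    an unannounced termination."""
    victim = rng.choice(sorted(instances))
    instances.discard(victim)   # stands in for an EC2 terminate call
    return victim

fleet = {"i-aaa", "i-bbb", "i-ccc"}          # hypothetical fleet
killed = chaos_strike(fleet, random.Random(42))  # seeded for repeatability
survivors_can_serve = len(fleet) >= 2        # the real test: capacity survives
print(killed, survivors_can_serve)
```

If losing one random instance during business hours scares you, the architecture is not fault-tolerant yet; that is the whole point of running the monkey continuously.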
Thank You!

13h00 aws 2012-fault_tolerant_applications

Editor's Notes

  • #2 We are going to talk today about building fault-tolerant systems, and more specifically look at how AWS enables the cost-effective and scalable design of these systems in ways that simply cannot be done otherwise.
  • #3 So what types of faults are we trying to survive? Really, there is a wide array of ways most applications can fail. The facilities themselves can have failures ranging from something extremely catastrophic, like the building catching fire, to something as simple as a power outage. Inside the facilities we rely on a number of systems to be operating: a network stack with routers, switches, and firewalls, as well as servers and storage devices, all of which can fail, either through hardware faults or configuration errors. And all of that is before we even get to the code for your applications and the people that manage it, both of which are also potential sources of failure.
  • #4 So what does fault-tolerant mean? First, it's important to point out that this isn't an absolute: there isn't a magic easy button, nor a one-size-fits-all approach to building applications that can survive every possible failure. Generally speaking, there are costs associated with mitigating the risk of different types of failures, as well as likelihoods of those failures occurring, so the design of these applications becomes an exercise in risk mitigation. Take hard drives: the risk of a hard drive failing is pretty high compared to an entire datacenter being destroyed; luckily, the cost of mitigating a failed hard drive is also far lower than building duplicate datacenters. The second bullet is very important: given that people, or human error, are probably the most common cause of application failures, truly fault-tolerant applications must leverage automation in the case of failure. This not only makes recovery much faster but also ensures it happens in a known and controlled manner. And lastly, if you don't test your design, you won't know if it works.
  • #6 So here's how we used to implement fault tolerance; it was really simple: build two of everything. Now there are some significant problems with this approach. The obvious one is cost, since your application just got 100% more expensive, and here in Brazil, which already has much higher server hardware costs, that can make the cost of mitigating many types of failures impractical for many applications. So what ends up happening is, going back to the risk-mitigation idea, someone has to look at the cost of purchasing, maintaining, and operating a second instance of the application and decide if it's worth the cost.
  • #7 I'm sure people are familiar with a lot of the commonly discussed benefits of cloud computing, from removing up-front capital costs to time-to-market and agility, and in the area of fault tolerance these benefits all translate very well.
  • #8 The up-front capital cost of adding a second server or mirroring storage is gone for HA; backups are far simpler to use and extremely cost-effective, and today, with our release of Glacier, which will revolutionize the way businesses back up and archive data, that's never been more true. From a DR perspective, you can stage infrastructure and only launch and pay for it when it's actually needed, versus paying 24/7 for infrastructure you hope to never use. Services that are part of AWS greatly simplify making your applications highly available and fault-tolerant, often at a m
  • #9 With DR this becomes very evident: think of how often you actually use your DR site (hopefully you're thinking of a really small number), and now think of how you're paying for it. With AWS we have massive amounts of infrastructure in 8 different regions around the world, and the ability to stage and programmatically deploy infrastructure to any of those regions. So you can stage your DR site and have it ready to spring into action if needed, but only pay for what you're actually using.
  • #10 The next evolution beyond DR is HA, and HA has traditionally had a number of barriers that limited which applications could be deployed in an HA manner, the first being cost, but also complexity. This is different from DR, where something is broken and you need some method of getting it back online, for example by using a second location. With HA you want components to be able to fail while the system still operates normally, because there are multiple servers that can perform that function, or multiple online replicas of the data. In the traditional datacenter this can be very difficult, complex, and costly, but at AWS we have built HA services you can leverage, which not only bends the cost curve but also makes it extremely simple to do.
  • #11 As you can see here, many of the services we provide are inherently fault-tolerant; we've done all the work to create them in a fashion that is resilient to failure and highly durable, so you don't have to. So now if you need a fault-tolerant NoSQL database, you don't have to worry about how to architect that: you can simply use DynamoDB. With the right design, and by leveraging the services we provide that are inherently fault-tolerant, you can focus on building your application rather than the infrastructure. Some of the services you see on the right are fault-tolerant with the right architecture, and what we mean by that is that we give you options on how to architect and deploy those services; RDS with MySQL, for example, is fault-tolerant when Multi-AZ deployments are used, since it replicates the data to multiple datacenters.
  • #12 So we know there are opportunities for failure at every layer of the stack, from disasters that affect entire geographies or individual buildings, all the way up to the server your application is running on. Now let's see how this translates in AWS and look at the services we have that provide fault tolerance.
  • #13 At AWS we’ve built fault tolerant systems at every level of that stack.
  • #14 Fault Separation: Amazon EC2 provides customers the flexibility to place instances within multiple geographic regions as well as across multiple Availability Zones. Each Availability Zone is designed with fault separation. This means that Availability Zones are physically separated within a typical metropolitan region, on different flood plains, in seismically stable areas. In addition to discrete uninterruptable power source (UPS) and onsite backup generation facilities, they are each fed via different grids from independent utilities to further reduce single points of failure. They are all redundantly connected to multiple tier-1 transit providers. It should be noted that although traffic flowing across the private networks between Availability Zones in a single region is on AWS-controlled infrastructure, all communications between regions is across public Internet infrastructure, so appropriate encryption methods should be used to protect sensitive data. Data are not replicated between regions unless proactively done so by the customer.
  • #15 Distinct physical locations. Low-latency network connections between AZs. Independent power, cooling, network, security. Always partition app stacks across 2 or more AZs. Elastic Load Balance across instances in multiple AZs. Don't confuse AZs with Regions!
  • #18 Note, the question is not "do you need to automate your deployment" or "should I use automation when I'm using the cloud?"; the answer to that is YES! The question is: if you're using fully standard PHP or Java stacks, why manage them? Beanstalk does that great, with zero lock-in. If what you need is more complex, perhaps CloudFormation (note, you can do BOTH!).
  • #22 Three-tier web app has been "fork-lifted" to the cloud. Everything in a single Availability Zone. Load balanced at the web tier and app tier using software load balancers. Master and standby database. Elastic IP on the front-end load balancer only. S3 used as DB backup instead of tape. How can you use AWS features to make this app more highly available?
  • #23 Three-tier web app has been "fork-lifted" to the cloud. Everything in a single Availability Zone. Load balanced at the web tier and app tier using software load balancers. Master and standby database. Elastic IP on the front-end load balancer only. S3 used as DB backup instead of tape. How can you use AWS features to make this app more highly available?
  • #25 Avoid single points of failure. Assume everything fails, and design backwards. Goal: applications should continue to function even if the underlying physical hardware fails or is removed or replaced. Design your recovery process. Trade off business needs vs. cost of high availability.
  • #28 Multiple DNS targets. Load balanced across Availability Zones. Auto-scaled web-cache servers with health checks. Auto-scaled web servers with health checks. Comprehensive config, data, and AMI backup. Monitoring, alarming, and logging.
  • #29 DB-tier load balancing or queueing. Auto-scaled database cache servers with health checks. Redundant relational database systems: mirrored, log-shipped, async or sync replicated; designed to scale horizontally (sharding). Durable NoSQL or KV-store data systems: no-SPOF design; supports automatic re-balancing, replication, and fault recovery. Monitoring, alarming, and logging.
  • #30 DB-tier load balancing or queueing. Auto-scaled database cache servers with health checks. Redundant relational database systems: mirrored, log-shipped, async or sync replicated; designed to scale horizontally (sharding). Durable NoSQL or KV-store data systems: no-SPOF design; supports automatic re-balancing, replication, and fault recovery. Monitoring, alarming, and logging.
  • #32 Multi-AZ deployments: synchronous replication across AZs; automatic failover to standby replica. Automated backups: enable point-in-time recovery of the DB instance; retention period configurable. Snapshots: user-initiated full backup of the DB; a new DB can be created from snapshots.