5. AWS encourages some new HA/DR thinking…
[Diagram: Elastic Load Balancer, Amazon EC2 instances (web servers) in an Auto Scaling Group across AZ-1 and AZ-2 within a Region, Amazon SNS notifications, and an Amazon S3 bucket.]
6. DR on AWS does not mean “rearchitect”
This talk is about using AWS for DR of your datacenters, i.e. your existing, on-premises infrastructure
7. Spot the difference!
Production            DR Site (Datacenter)  DR Site (AWS)
Routers               Routers               Routers
Firewalls             Firewalls             Firewalls
IP Network            IP Network            IP Network
Application Licenses  Application Licenses  Application Licenses
Operating Systems     Operating Systems     Operating Systems
Hypervisor            Hypervisor            Hypervisor
Servers               Servers               Servers
Storage Network       Storage Network       Storage Network
Primary Storage       Snapshot Storage      Snapshot Storage
Backup SW             Backup SW             Backup SW
Backup Tapes          Backup Tapes          Backup Tapes
Tape Silos            Tape Silos            Tape Silos
Archive SW            Archive SW            Archive
Archive Storage       Archive Storage       Archive Storage
8. A different cost model
• Cost savings with AWS
• Ability to scale – no arbitrary time limit to failback
[Chart: Infrastructure Cost vs. Time, with curves for 2nd Site Cost, AWS Cost, and Demand; events marked on the time axis: Test, Test, Failover, Failback.]
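The cost argument can be made concrete with a small calculation. This is a minimal sketch under stated assumptions: the second site is modeled as a fixed monthly cost and AWS as a small baseline plus a higher cost only in months when the DR environment is running. All dollar figures and names are made up for illustration; they are not AWS prices and do not come from the talk.

```python
# All figures below are hypothetical illustrations, not AWS prices.
SECOND_SITE_MONTHLY = 10_000   # assumed fixed cost of an owned second site
AWS_BASELINE_MONTHLY = 500     # assumed steady-state cost (e.g. backup storage)
AWS_ACTIVE_MONTHLY = 8_000     # assumed cost while a test or failover is running


def yearly_costs(active_months):
    """Return (second_site_total, aws_total) for a 12-month period.

    active_months: set of month numbers (1-12) in which the DR
    environment is actually running (tests, or failover until failback).
    """
    second_site = 12 * SECOND_SITE_MONTHLY
    aws = sum(
        AWS_ACTIVE_MONTHLY if month in active_months else AWS_BASELINE_MONTHLY
        for month in range(1, 13)
    )
    return second_site, aws


# Two DR tests in the year, in months 3 and 9.
second_site_total, aws_total = yearly_costs({3, 9})
print(second_site_total, aws_total)
```

Under this model the second-site total does not depend on how often DR is exercised, while the AWS total grows only with the months in which it is actually used.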
14. Disaster Recovery Terms
RTO: Recovery Time Objective
• Acceptable time period within which normal operation (or
degraded operation) needs to be restored after an event
RPO: Recovery Point Objective
• Acceptable data loss measured in time
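Both terms can be read as simple time differences. The sketch below computes the RPO and RTO actually achieved in an incident and compares them with the objectives. Function names and the timeline are hypothetical.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_good_copy: datetime, disaster: datetime) -> timedelta:
    """Data loss measured in time: everything written after the last good copy."""
    return disaster - last_good_copy


def achieved_rto(disaster: datetime, service_restored: datetime) -> timedelta:
    """Time from the event until normal (or degraded) operation is restored."""
    return service_restored - disaster


def meets_objectives(last_good_copy, disaster, service_restored, rpo, rto) -> bool:
    """True if both the achieved RPO and RTO are within the objectives."""
    return (
        achieved_rpo(last_good_copy, disaster) <= rpo
        and achieved_rto(disaster, service_restored) <= rto
    )


# Hypothetical incident timeline.
last_backup = datetime(2024, 1, 1, 2, 0)
disaster = datetime(2024, 1, 1, 9, 30)
restored = datetime(2024, 1, 1, 13, 0)

ok = meets_objectives(
    last_backup, disaster, restored,
    rpo=timedelta(hours=24), rto=timedelta(hours=4),
)
print(achieved_rpo(last_backup, disaster), achieved_rto(disaster, restored), ok)
```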
15. Backup and Restore
[Diagram: data from a traditional server in the on-premises infrastructure is copied to an S3 bucket (with objects) in AWS; Amazon Route 53 and AWS Import/Export are also shown.]
17. Backup and Restore
[Diagram: within an Availability Zone of an AWS Region, an Amazon EC2 instance is quickly provisioned from an AMI pre-bundled with OS and applications; data is copied from objects in the Amazon S3 bucket to the instance's data volume.]
18. Backup and Restore
Advantages
• Simple to get started
• Extremely cost effective (mostly backup storage)
Preparation Phase
• Take backups of current systems
• Store backups in S3
• Describe procedure to restore from backup on AWS
• Know which AMI to use, build your own as needed
• Know how to restore system from backups
• Know how to switch to new system
• Know how to configure the deployment
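The "know..." items above amount to a checklist that should be written down before a disaster. One possible way to keep it honest is a small record with a check for anything left blank. This is a sketch only: the field names, bucket path, and AMI placeholder are hypothetical.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RestorePlan:
    backup_location: str = ""      # where the backups live, e.g. an S3 bucket/prefix
    ami_id: str = ""               # which AMI to start from
    restore_steps: list = field(default_factory=list)     # how to restore from backup
    switchover_steps: list = field(default_factory=list)  # how to switch to the new system
    deployment_config: str = ""    # how the deployment is configured


def missing_items(plan: RestorePlan) -> list:
    """Return the names of plan items that have not been filled in yet."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]


# Hypothetical, partially documented plan.
plan = RestorePlan(
    backup_location="s3://example-backups/app1/",
    ami_id="ami-EXAMPLE",
)
print(missing_items(plan))
```

An empty result would mean every preparation item has at least been documented; it says nothing about whether the procedure has been tested.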
19. Backup and Restore – Storage Gateway
Advantages
• Simple to get started
• Extremely cost effective (mostly backup storage)
Preparation Phase
• Download AWS Storage Gateway software appliance
• Install and configure Storage Gateway
• Use Storage Gateway
• Describe procedure to restore from backup on AWS
• Know which AMI to use, build your own as needed
• Know how to switch to new system
• Know how to configure the deployment
20. Backup and Restore
In Case of Disaster
• Retrieve backups from S3
• Bring up required infrastructure
• EC2 instances with prepared AMIs, Load Balancing, etc.
• Restore system from backup
• Switch over to the new system
• Adjust DNS records to point to AWS
Objectives
• RTO: as long as it takes to bring up infrastructure and restore
system from backups
• RPO: time since last backup
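For this pattern the objectives can be estimated up front. The sketch below assumes the recovery steps run one after another, so the RTO estimate is their sum, and that the worst-case RPO equals the interval between backups. Step names and durations are hypothetical.

```python
from datetime import timedelta


def estimate_rto(step_durations: dict) -> timedelta:
    """Sum the durations of sequential recovery steps."""
    return sum(step_durations.values(), timedelta())


def worst_case_rpo(backup_interval: timedelta) -> timedelta:
    """A disaster just before the next backup loses one full interval of data."""
    return backup_interval


# Hypothetical durations for the steps listed on the slide.
steps = {
    "retrieve backups from S3": timedelta(hours=2),
    "bring up infrastructure": timedelta(minutes=30),
    "restore system from backup": timedelta(hours=3),
    "adjust DNS and switch over": timedelta(minutes=15),
}

print(estimate_rto(steps), worst_case_rpo(timedelta(hours=24)))
```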
21. Pilot Light
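[Diagram: Amazon Route 53 sits between the user or system and two sites, each with web, application, and database servers. On the AWS side the web and application servers are marked "Not Running" and the database server is a "Smaller Instance"; data mirroring/replication links the database servers, each with its own data volume.]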
22. Pilot Light
[Diagram: same layout as the previous slide, with an additional web server and database server shown; the AWS web and application servers are still marked "Not Running" and the AWS database server is a "Smaller Instance".]
23. Pilot Light
[Diagram: same layout, now with the AWS web and application servers labeled "Start in minutes" and the AWS database server labeled "Resize as desired"; data mirroring/replication between the data volumes is still shown.]
24. Pilot Light
Advantages
• Very cost effective (fewer 24/7 resources)
Preparation Phase
• Enable replication of all critical data to AWS
• Prepare all required resources for automatic start
• AMIs, Network Settings, Load Balancing, etc.
• Reserved Instances
25. Pilot Light
In Case of Disaster
• Automatically bring up resources around the replicated core data
set
• Scale the system as needed to handle current production traffic
• Switch over to the new system
• Adjust DNS records to point to AWS
Objectives
• RTO: as long as it takes to detect need for DR and automatically
scale up replacement system
• RPO: depends on replication type
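The "adjust DNS records" step can be prepared ahead of time. The sketch below only builds the request body in the shape that the Route 53 ChangeResourceRecordSets API expects for a change batch; it does not call AWS. The record name, target, and TTL are hypothetical.

```python
def dns_failover_change(record_name: str, aws_target: str, ttl: int = 60) -> dict:
    """Build a change batch that repoints a CNAME at the AWS environment."""
    return {
        "Comment": "DR failover: point traffic at AWS",
        "Changes": [
            {
                # UPSERT creates the record or overwrites the existing one.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": aws_target}],
                },
            }
        ],
    }


# Hypothetical names; a low TTL shortens the time clients keep the old answer.
change = dns_failover_change("www.example.com.", "dr-lb.example.net")
print(change["Changes"][0]["ResourceRecordSet"])
```

With boto3, a dict like this would typically be passed as the ChangeBatch argument of the Route 53 client's change_resource_record_sets call, together with the hosted zone ID.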
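26. Fully-Working Low-Capacity Standby
[Diagram: Amazon Route 53 sits between the user or system and two sites. The production site has web, application, and database servers; the AWS site, labeled "Low Capacity", runs its own web, app, and DB servers. Data mirroring/replication links the database servers, each with its own data volume.]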
28. Fully-Working Low-Capacity Standby
[Diagram: same layout, with the AWS side now labeled "Grow Capacity": additional web, application, and database servers are shown next to the low-capacity web, app, and DB servers; data mirroring/replication between the data volumes is still shown.]
30. Fully-Working Low-Capacity Standby
Advantages
• Can take some production traffic at any time
• Cost savings (IT footprint smaller than full DR)
Preparation
• Similar to Pilot Light
• All necessary components running 24/7, but not scaled for
production traffic
• Best practice – continuous testing
• “Trickle” a statistical subset of production traffic to DR site
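One way to trickle a subset of traffic is weighted DNS routing, where each site receives a share proportional to its weight (Route 53 weighted records work this way). The talk does not say which mechanism is meant, so treat this as an assumption. The sketch computes the resulting share per site; site names and weights are hypothetical.

```python
def traffic_share(weights: dict) -> dict:
    """Return each site's fraction of traffic: its weight over the sum of weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one site needs a positive weight")
    return {site: weight / total for site, weight in weights.items()}


# Hypothetical split: 5% of requests exercise the DR site continuously.
shares = traffic_share({"production": 95, "aws-dr": 5})
print(shares)
```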
31. Fully-Working Low-Capacity Standby
In Case of Disaster
• Immediately fail over most critical production load
• Adjust DNS records to point to AWS
• (Auto) Scale the system further to handle all production load
Objectives
• RTO: for critical load: as long as it takes to fail over; for all other
load, as long as it takes to scale further
• RPO: depends on replication type
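How far to "scale further" is a capacity calculation. The sketch below assumes load is measured in requests per second, that every instance handles the same rate, and that some headroom is wanted; all numbers are hypothetical.

```python
def scale_out_plan(current_instances: int, load_rps: int,
                   per_instance_rps: int, headroom_pct: int = 20):
    """Return (target_instances, instances_to_add) for the given load.

    Uses integer ceiling division so the target never undershoots.
    """
    required = load_rps * (100 + headroom_pct)
    capacity = 100 * per_instance_rps
    target = -(-required // capacity)  # ceiling of required / capacity
    return target, max(0, target - current_instances)


# Hypothetical: standby runs 2 instances, production load is 1000 req/s,
# and one instance handles 150 req/s.
target, to_add = scale_out_plan(2, 1000, 150)
print(target, to_add)
```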
32. Multi-Site Hot Standby
[Diagram: Amazon Route 53 sits between the user or system and two sites: the production site and an AWS site labeled "Full Capacity", each with web, application, and database servers; data mirroring/replication links the database servers, each with its own data volume.]
33. Multi-Site Hot Standby
Advantages
• At any moment can take all production load
Preparation
• Similar to Low-Capacity Standby
• Scales in/out fully with production load
In Case of Disaster
• Immediately fail over all production load
• Adjust DNS records to point to AWS
Objectives
• RTO: as long as it takes to fail over
• RPO: depends on replication type
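With a hot standby, much of the failover time is spent deciding that the primary site is really down. The sketch below shows one common rule: fail over once a number of consecutive health checks have failed. The talk does not specify a detection mechanism, so the function name and threshold are hypothetical.

```python
def should_fail_over(health_results: list, failure_threshold: int = 3) -> bool:
    """True once the most recent `failure_threshold` checks have all failed.

    health_results: oldest-to-newest list of booleans (True = check passed).
    """
    if len(health_results) < failure_threshold:
        return False  # not enough evidence yet
    return not any(health_results[-failure_threshold:])


print(should_fail_over([True, False, False, False]))
```

A higher threshold avoids failing over on a transient blip but adds to the RTO, since each extra check takes one more check interval.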
46. What We Learned
Try It
• Get a good understanding of capabilities and potential
• Get a handle on costs
Test It, and Test Again
• Test systems and data flow to ensure replication is working
correctly
Realize Its Potential
• Quickly scalable to add resources to production
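For the "Test It, and Test Again" point, one simple replication check is to compare content digests between the source and the replica. This is a sketch with in-memory dictionaries standing in for the two data stores; the keys and contents are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of an object's content."""
    return hashlib.sha256(data).hexdigest()


def find_mismatches(source: dict, replica: dict) -> list:
    """Return the keys whose content is missing or different on the replica."""
    return sorted(
        key
        for key, data in source.items()
        if key not in replica or digest(replica[key]) != digest(data)
    )


# Hypothetical stores: "b" was corrupted in transit, "c" never arrived.
source = {"a": b"1", "b": b"2", "c": b"3"}
replica = {"a": b"1", "b": b"X"}
print(find_mismatches(source, replica))
```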