Disaster Recovery Site on AWS:
Minimal Cost Maximum Efficiency
Abdul Sathar Sait, Vikram Garlapati, and Kamal Arora (AWS)
Disaster Recovery Site on AWS - Minimal Cost Maximum Efficiency (STG305) | AWS re:Invent 2013

Implementation of a disaster recovery (DR) site is crucial for the business continuity of any enterprise. Due to the fundamental nature of features like elasticity, scalability, and geographic distribution, DR implementation on AWS can be done at 10-50% of the conventional cost. In this session, we do a deep dive into proven DR architectures on AWS and the best practices, tools and techniques to get the most out of them.


  1. Disaster Recovery Site on AWS: Minimal Cost Maximum Efficiency
     Abdul Sathar Sait, Vikram Garlapati, and Kamal Arora (AWS)
     November 15, 2013
     © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. What you will learn
     • Why AWS for disaster recovery?
     • Common DR architectures
       – Pilot light architecture
         • Demo
         • Code walkthrough
       – Backup and restore
     • Customer case studies
     • Where to go next
  3. Conventional Disaster Recovery sites
     • High cost
     • Low ROI
     • Implemented only for the most critical systems
     • Usually scaled down to 50% of production
     • Systems in a remote region are challenging to operate
     • Costly software licenses based on hardware usage
  4. Disaster Recovery site on AWS
     • Unprecedented capabilities to implement DR sites
     • Easily set up DR sites in different geographic regions
     • Cut down DR site cost by up to 70%
     • Substantial savings on software licenses
  5. Global reach from your desktop
  6. Common DR architectures
     • Backup and restore
     • Pilot light
     • Warm standby
     • Hot standby
  7. Pilot light architecture
  8. Pilot light architecture
     • Create instances from AMIs
  9. Pilot light architecture
     Build resources around replicated dataset
     • Keep ‘pilot light’ on by replicating core databases
     • Build AWS resources around dataset and leave in stopped state
  10. Pilot light architecture
      Build resources around replicated dataset
      • Keep ‘pilot light’ on by replicating core databases
      • Build AWS resources around dataset and leave in stopped state
      Scale resources in AWS in response to a DR event
      • Start up pool of resources in AWS when events dictate
      • Scale up the database instance to handle production capacity
  11. Pilot light architecture
      Switchover to AWS
      • Make necessary DNS changes to redirect traffic to the DR site on AWS
  12. Pilot Light DEMO
  13. Simple DR solution – awsdrdemo.com
      [Diagram] Amazon Route 53 in front of two regions:
      • US East (N. Virginia), active: Elastic Load Balancing, Web/App servers in an Auto Scaling group, Oracle master DB, data volume
      • US West (N. California), passive (scaled-down standby): Web/App server AMI (copy of the AMI), Oracle slave DB, data volume
      • Data replication set up between the Oracle master DB and the Oracle slave DB
  14. Simple DR solution – awsdrdemo.com (DNS failover)
      [Diagram] Amazon Route 53 after DNS failover:
      • US East (N. Virginia), gone: Elastic Load Balancing, Web/App servers, Oracle master DB, data volume
      • US West (N. California), active: Elastic Load Balancing, Web/App servers in an Auto Scaling group (autoscale), Oracle slave DB (scale up DB), data volume
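The "Copy AMI" step in the diagram could be scripted roughly as follows. This is only a sketch under stated assumptions: the function name `copy_ami_to_dr_region` and its arguments are hypothetical, and the deck does not show how the AMI was actually copied. It uses the AWS CLI `ec2 copy-image` command and the `image-available` waiter.

```shell
# Hypothetical helper (not from the slides): copy a Web/App server AMI
# into the DR region and wait until the copy is usable.
# Arguments: $1 source region, $2 DR region, $3 source AMI ID, $4 name for the copy
copy_ami_to_dr_region() {
  local source_region="$1"
  local dr_region="$2"
  local source_ami="$3"
  local copy_name="$4"
  local new_ami

  # copy-image is issued against the destination (DR) region
  new_ami=$(aws --region "$dr_region" --output text ec2 copy-image \
    --source-region "$source_region" \
    --source-image-id "$source_ami" \
    --name "$copy_name" \
    --query ImageId) || return 1

  # Block until the copied AMI reaches the "available" state
  aws --region "$dr_region" ec2 wait image-available --image-ids "$new_ami" || return 1

  # Print the new AMI ID so a caller can reference it in the DR region
  echo "$new_ami"
}

# Example call (the name is a placeholder):
#   copy_ami_to_dr_region us-east-1 us-west-1 "$source_ami_id" webapp-dr-copy
```

AMI IDs are region-specific, so whatever launches instances in the DR region has to reference the ID returned by the copy rather than the original.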
  15. Architecture
      Entry points: awsdrdemo.com (application), failover.awsdrdemo.com (failover app), both resolved through Amazon Route 53
      US East (N. Virginia), active:
      • Active ELB: DRDemoPrimaryELB52152634.us-east1.elb.amazonaws.com
      • Web Servers: i-36af5751
      • VPC ID: vpc-5f9ef53e
      • Subnet IDs: subnet-440c786c, subnet-289ef549, subnet-2c9ef54d
      • Primary Database Server: i-026aad65, private IP 174.168.1.11
      • Primary data volume
      US West (N. California), passive (scaled-down standby):
      • DR ELB: created on failover
      • Web Servers: created on failover
      • Failover App Instance: i-55cfde0e, Elastic IP 54.215.157.25
      • Webserver AMI and Failover App AMI
      • VPC ID: vpc-a4f2efcc
      • Subnet IDs: subnet-bbf2efd3, subnet-884b01ce, subnet-bef2efd6
      • Secondary Database Server: i-3b266960, private IP 174.168.1.11
      • Secondary DB data volume
      Between the regions:
      • AMI Copy (ami-996634f0) of the Web/App server
      • Mirroring / replication from the primary data volume to the secondary DB data volume
  16. Demo – AWS Resources
      console.aws.amazon.com
  17. Demo – Application
      awsdrdemo.com
  18. Demo – Failover Kickoff
      failover.awsdrdemo.com
  19. Demo – Failover Status Updates
      status.awsdrdemo.com/dr
  20. Failover Steps
      • Launch Failover Application
      • Route 53 DNS Updates
      • Resize Target Database Instance
      • Go Live
      • AWS CloudFormation – Launch ELB
      • AWS CloudFormation – Launch web servers
  21. Failover Application Architecture
      Diagram labels: Admin Users, Failover App, Webserver AMI, SNS HTTP Notification, AWS Region
      (1) Trigger DR procedure
      (2) Invoke Shell Script
      (3) Launch CloudFormation
      (4) Script Updates (CLI)
      (5) CF Updates
      (6) Real-time feed from SNS
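One way a shell script could feed status updates to SNS before each failover step is sketched below. The wrapper name `run_step` and its argument order are assumptions for illustration. The slides only show that the script publishes messages with `aws sns publish`, not how the calls were organized.

```shell
# Hypothetical wrapper (not from the slides): publish a status message to an
# SNS topic, then run one failover step. Returns non-zero if either fails.
# Arguments: $1 region, $2 SNS topic ARN, $3 status message, remaining: command to run
run_step() {
  local region="$1"
  local topic_arn="$2"
  local message="$3"
  shift 3

  # Announce the step so subscribers (for example a status page) see progress
  aws --region "$region" sns publish \
    --topic-arn "$topic_arn" --message "$message" >/dev/null || return 1

  # Run the actual step; its exit status becomes the wrapper's exit status
  "$@"
}

# Example call, reusing variable names from the slides:
#   run_step us-west-1 "$snsarn" "Stopping DB instance for resizing" \
#     aws --region us-west-1 ec2 stop-instances --instance-ids "$dbInstanceId"
```

Keeping the notification and the step in one call makes it harder for the status feed and the real progress to drift apart.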
  22. Metadata Requests

      // Sample code for a metadata request using .NET
      using System.IO;
      using System.Net;
      using System.Text;

      string uri = "http://169.254.169.254/latest/meta-data/placement/availability-zone";
      // Create the web request
      HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(uri);
      HttpWebResponse webresponse = (HttpWebResponse)webrequest.GetResponse();
      Encoding enc = Encoding.GetEncoding(1252);
      StreamReader loResponseStream = new StreamReader(webresponse.GetResponseStream(), enc);
      // Get the Availability Zone value
      string availzone = loResponseStream.ReadToEnd();
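The same metadata lookup can be sketched in shell, the language of the failover scripts on the following slides. The function names are hypothetical. The sketch assumes the instance still answers unauthenticated (IMDSv1) metadata requests, as in the slide, and that the region name is the Availability Zone name without its trailing letter.

```shell
# Hypothetical helpers (not from the slides).

# Query the instance metadata endpoint for the Availability Zone.
# Assumes IMDSv1-style requests are allowed on the instance.
get_availability_zone() {
  curl -s --max-time 2 \
    http://169.254.169.254/latest/meta-data/placement/availability-zone
}

# Derive the region from an Availability Zone name by dropping the
# trailing zone letter, e.g. us-west-1a -> us-west-1.
region_from_az() {
  printf '%s\n' "${1%[a-z]}"
}

# Example call on an EC2 instance:
#   region=$(region_from_az "$(get_availability_zone)")
```

Deriving the region at run time would let the same script run in either region without hard-coding `--region` values.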
  23. Amazon Route 53 Updates
      http://vrg.s3.amazonaws.com/downloads/route53.json

      # Retrieve existing ELB details from the Route 53 hosted zone
      domainname=www.awsdrdemo.com
      hostedzoneid="ZXXXXXXXXXXXXR"
      # Retrieve the ELB alias zone ID from the existing Route 53 record
      zoneid=$(aws --region us-west-1 --output text route53 list-resource-record-sets --hosted-zone-id "$hostedzoneid" --start-record-name "$domainname" --start-record-type A --max-items 1 | grep ALIASTARGET | awk '{print $2}')
      # Retrieve the ELB alias DNS name from the same record
      dns=$(aws --region us-west-1 --output text route53 list-resource-record-sets --hosted-zone-id "$hostedzoneid" --start-record-name "$domainname" --start-record-type A --max-items 1 | grep ALIASTARGET | awk '{print $4}')
      # Apply the change batch to the hosted zone
      aws --region us-west-1 route53 change-resource-record-sets --hosted-zone-id "$hostedzoneid" --change-batch file:///usr/local/bin/route53.json
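The slide does not show the contents of route53.json. As a rough illustration only, a change batch that repoints an alias A record at a load balancer could be generated like this. The function name, the comment text and the use of the `UPSERT` action are assumptions, and the original file may well have used a different action or layout.

```shell
# Hypothetical helper (not from the slides): print a Route 53 change batch
# that points an alias A record at a load balancer.
# Arguments: $1 record name, $2 ELB canonical hosted zone ID, $3 ELB DNS name
make_alias_change_batch() {
  local record_name="$1"
  local elb_zone_id="$2"
  local elb_dns_name="$3"
  cat <<EOF
{
  "Comment": "Point ${record_name} at the DR load balancer",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "${record_name}",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "${elb_zone_id}",
          "DNSName": "${elb_dns_name}",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}
EOF
}

# Example call (the output path is a placeholder):
#   make_alias_change_batch www.awsdrdemo.com. "$zoneid" "$dns" > /tmp/route53.json
```

The generated file can then be passed to `aws route53 change-resource-record-sets` through `--change-batch file://...`, as on the slide.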
  24. Resize Database Instance

      # Stop the DB instance for resizing
      aws --region us-west-1 ec2 stop-instances --instance-ids "$dbInstanceId"
      # Publish Amazon SNS messages for actions
      aws --region us-west-1 sns publish --topic-arn "$snsarn" --message "Resizing the stopped instance"
      # Resize the DB instance
      aws --region us-west-1 ec2 modify-instance-attribute --instance-id "$dbInstanceId" --instance-type "{\"Value\": \"m1.small\"}"
      # Start the resized DB instance
      aws --region us-west-1 ec2 start-instances --instance-ids "$dbInstanceId"
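`stop-instances` returns before the instance has actually stopped, and the instance type can only be changed while the instance is stopped. A hedged variant of the resize sequence that waits between steps might look like the sketch below. The function name and arguments are hypothetical, and the `ec2 wait` subcommands are used here although they do not appear on the slide.

```shell
# Hypothetical helper (not from the slides): stop an instance, change its
# type once it is fully stopped, then start it and wait until it runs.
# Arguments: $1 region, $2 instance ID, $3 new instance type
resize_instance() {
  local region="$1"
  local instance_id="$2"
  local new_type="$3"

  aws --region "$region" ec2 stop-instances --instance-ids "$instance_id" &&
  # Block until the instance reports the "stopped" state
  aws --region "$region" ec2 wait instance-stopped --instance-ids "$instance_id" &&
  aws --region "$region" ec2 modify-instance-attribute \
    --instance-id "$instance_id" \
    --instance-type "{\"Value\": \"$new_type\"}" &&
  aws --region "$region" ec2 start-instances --instance-ids "$instance_id" &&
  # Block until the resized instance is running again
  aws --region "$region" ec2 wait instance-running --instance-ids "$instance_id"
}

# Example call (the target type is a placeholder):
#   resize_instance us-west-1 "$dbInstanceId" m1.large
```

Chaining with `&&` stops the sequence at the first failing command, so the instance is never restarted with a half-applied change.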
  25. AWS CloudFormation Stack Launch

      # Launch the DR stack using the AWS CloudFormation template
      launchedstackid=$(aws --region us-west-1 --output text cloudformation create-stack --stack-name "$stackname" --template-body file:///usr/local/bin/ELBWithEC2Instances.template --notification-arns "$snsarn" --parameters ParameterKey="HostedZoneId",ParameterValue="$hostedzoneid")
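`create-stack` only starts the stack creation. A script that must not go live before the load balancer and web servers exist could wait for completion, roughly as sketched below. The function name and arguments are hypothetical, and the `cloudformation wait stack-create-complete` call is an addition that is not on the slide.

```shell
# Hypothetical helper (not from the slides): launch a CloudFormation stack
# and wait until creation has finished.
# Arguments: $1 region, $2 stack name, $3 template file path,
#            $4 SNS topic ARN for stack events, $5 hosted zone ID parameter
launch_dr_stack() {
  local region="$1"
  local stack_name="$2"
  local template_file="$3"
  local sns_arn="$4"
  local hosted_zone_id="$5"
  local stack_id

  stack_id=$(aws --region "$region" --output text cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-body "file://$template_file" \
    --notification-arns "$sns_arn" \
    --parameters "ParameterKey=HostedZoneId,ParameterValue=$hosted_zone_id" \
    --query StackId) || return 1

  # Block until the stack reaches CREATE_COMPLETE (non-zero exit on failure)
  aws --region "$region" cloudformation wait stack-create-complete \
    --stack-name "$stack_id" || return 1

  echo "$stack_id"
}

# Example call, reusing variable names from the slides:
#   launch_dr_stack us-west-1 "$stackname" \
#     /usr/local/bin/ELBWithEC2Instances.template "$snsarn" "$hostedzoneid"
```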
  26. AWS CloudFormation Template
      http://vrg.s3.amazonaws.com/downloads/ELBWithEC2Instances.template
      Slide callouts mark the template sections: HEADERS, PARAMETERS, MAPPINGS, RESOURCES, OUTPUTS.

      {
        "AWSTemplateFormatVersion" : "2010-09-09",
        "Description" : "AWS CloudFormation Template ELBWithEC2Instances: Create a load balanced, Auto Scaled sample website where the instances are locked down to only accept traffic from the load balancer. This script creates an Auto Scaling group behind a load balancer with a simple health check. The web site is available on port 80, however, the instances can be configured to listen on any port (8888 by default).",
        "Parameters" : {
          "KeyPairName" : {
            "Description" : "Name of an existing Amazon EC2 key pair for SSH access",
            "Type" : "String",
            "Default" : "kamalkeydr"
          },
          "InstanceType" : {
            "Description" : "WebServer EC2 instance type",
            "Type" : "String",
            "Default" : "m1.small",
            "AllowedValues" : [ "t1.micro","m1.small","m1.medium","m1.large","m1.xlarge","m2.xlarge","m2.2xlarge","m2.4xlarge","c1.medium","c1.xlarge","cc1.4xlarge","cc2.8xlarge","cg1.4xlarge"],
            "ConstraintDescription" : "must be a valid EC2 instance type."
          },
          "WebServerPort" : {
            "Description" : "TCP/IP port of the web server",
            "Type" : "String",
            "Default" : "80"
          },
          "HostedZoneId" : {
            "Type" : "String",
            "Description" : "The Record Set's Hosted Zone Id for the existing hosted zone",
            "Default" : "Z1M58G0W56PQJA"
          }
        },
        "Mappings" : {
          "AWSInstanceType2Arch" : {
            "t1.micro" : { "Arch" : "64" },
            "m1.small" : { "Arch" : "64" },
            "m1.medium" : { "Arch" : "64" },
            "m1.large" : { "Arch" : "64" },
            "m1.xlarge" : { "Arch" : "64" },
            "m2.xlarge" : { "Arch" : "64" },
            "m2.2xlarge" : { "Arch" : "64" },
            "m2.4xlarge" : { "Arch" : "64" },
            "c1.medium" : { "Arch" : "64" },
            "c1.xlarge" : { "Arch" : "64" }
          },
          "AWSRegionArch2AMI" : {
            "us-west-1" : { "32" : "ami-5e41761b", "64" : "ami-5e41761b" }
          }
        },
        "Resources" : {
          "WebServerGroup" : {
            "Type" : "AWS::AutoScaling::AutoScalingGroup",
            "Properties" : {
              "AvailabilityZones" : [ "us-west-1a"],
              "LaunchConfigurationName" : { "Ref" : "LaunchConfig" },
              "MinSize" : "2",
              "MaxSize" : "2",
              "LoadBalancerNames" : [ { "Ref" : "ElasticLoadBalancer" }],
              "VPCZoneIdentifier" : ["subnet-bbf2efd3"]
            }
          },
          "LaunchConfig" : {
            "Type" : "AWS::AutoScaling::LaunchConfiguration",
            "Properties" : {
              "ImageId" : { "Fn::FindInMap" : [ "AWSRegionArch2AMI", { "Ref" : "AWS::Region" }, { "Fn::FindInMap" : [ "AWSInstanceType2Arch", { "Ref" : "InstanceType" }, "Arch" ] } ] },
              "UserData" : { "Fn::Base64" : { "Ref" : "WebServerPort" }},
              "SecurityGroups" : [ { "Ref" : "InstanceSecurityGroup" } ],
              "InstanceType" : { "Ref" : "InstanceType" },
              "KeyName" : { "Ref" : "KeyPairName" },
              "AssociatePublicIpAddress" : "true"
            }
          },
          "ElasticLoadBalancer" : {
            "Type" : "AWS::ElasticLoadBalancing::LoadBalancer",
            "Properties" : {
              "SecurityGroups" : [ { "Ref" : "LoadBalancerSecurityGroup" } ],
              "Subnets" : ["subnet-bbf2efd3"],
              "Listeners" : [ { "LoadBalancerPort" : "80", "InstancePort" : { "Ref" : "WebServerPort" }, "Protocol" : "HTTP" } ],
              "HealthCheck" : {
                "Target" : { "Fn::Join" : [ "", ["HTTP:", { "Ref" : "WebServerPort" }, "/"]]},
                "HealthyThreshold" : "2",
                "UnhealthyThreshold" : "10",
                "Interval" : "10",
                "Timeout" : "3"
              }
            }
          },
          "LoadBalancerSecurityGroup" : {
            "Type" : "AWS::EC2::SecurityGroup",
            "Properties" : {
              "GroupDescription" : "Enable HTTP access on port 80",
              "VpcId" : "vpc-a4f2efcc",
              "SecurityGroupIngress" : [ { "IpProtocol" : "tcp", "FromPort" : "80", "ToPort" : "80", "CidrIp" : "0.0.0.0/0" } ],
              "SecurityGroupEgress" : [ { "IpProtocol" : "tcp", "FromPort" : { "Ref" : "WebServerPort" }, "ToPort" : { "Ref" : "WebServerPort" }, "CidrIp" : "0.0.0.0/0" }]
            }
          },
          "myDNS" : {
            "Type" : "AWS::Route53::RecordSetGroup",
            "Properties" : {
              "HostedZoneName" : "awsdrdemo.com.",
              "Comment" : "Zone apex alias targeted to myELB LoadBalancer.",
              "RecordSets" : [
                {
                  "Name" : "www.awsdrdemo.com.",
                  "Type" : "A",
                  "AliasTarget" : {
                    "HostedZoneId" : { "Fn::GetAtt" : ["ElasticLoadBalancer", "CanonicalHostedZoneNameID"] },
                    "DNSName" : { "Fn::GetAtt" : ["ElasticLoadBalancer","CanonicalHostedZoneName"] }
                  }
                }
              ]
            }
          },
          "InstanceSecurityGroup" : {
            "Type" : "AWS::EC2::SecurityGroup",
            "Properties" : {
              "GroupDescription" : "Enable SSH access and HTTP access on the inbound port",
              "VpcId" : "vpc-a4f2efcc",
              "SecurityGroupIngress" : [ { "IpProtocol" : "tcp", "FromPort" : { "Ref" : "WebServerPort" }, "ToPort" : { "Ref" : "WebServerPort" }, "CidrIp" : "0.0.0.0/0" }]
            }
          }
        },
      [OUTPUTS section: marked on the slide, not present in the slide text]
  27. Parameters

      "Parameters" : {
        "KeyPairName" : {
          "Description" : "Name of an existing Amazon EC2 key pair for SSH access",
          "Type" : "String"
        },
        "InstanceType" : {
          "Description" : "WebServer EC2 instance type",
          "Type" : "String",
          "Default" : "m1.small",
          "AllowedValues" : [ "t1.micro","m1.small","m1.medium","m1.large","m1.xlarge","m2.xlarge","m2.2xlarge","m2.4xlarge","c1.medium","c1.xlarge","cc1.4xlarge","cc2.8xlarge","cg1.4xlarge"],
          "ConstraintDescription" : "must be a valid EC2 instance type."
        },
        "HostedZoneId" : {
          "Type" : "String",
          "Description" : "The Record Set's Hosted Zone Id for the existing hosted zone"
        }
      }
  28. Resources – Web Servers

      "WebServerGroup" : {
        "Type" : "AWS::AutoScaling::AutoScalingGroup",
        "Properties" : {
          "AvailabilityZones" : [ "us-west-1a"],
          "LaunchConfigurationName" : { "Ref" : "LaunchConfig" },
          "MinSize" : "2",
          "MaxSize" : "2",
          "LoadBalancerNames" : [ { "Ref" : "ElasticLoadBalancer" }],
          "VPCZoneIdentifier" : ["subnet-bbf2efd3"]
        }
      },
      "LaunchConfig" : {
        "Type" : "AWS::AutoScaling::LaunchConfiguration",
        "Properties" : {
          "ImageId" : { "Fn::FindInMap" : [ "AWSRegionArch2AMI", { "Ref" : "AWS::Region" }, { "Fn::FindInMap" : [ "AWSInstanceType2Arch", { "Ref" : "InstanceType" }, "Arch" ] } ] },
          "UserData" : { "Fn::Base64" : { "Ref" : "WebServerPort" }},
          "SecurityGroups" : [ { "Ref" : "InstanceSecurityGroup" } ],
          "KeyName" : { "Ref" : "KeyPairName" }
        }
      }
  29. Demo – Failover Status Updates
      status.awsdrdemo.com/dr
  30. Disaster recovery site on AWS can be for
      • Primary site on customer data center
      • Primary on AWS itself
  31. Primary and DR sites on AWS
  32. Backup & Restore pattern
      Simple to get started
      • Easy starting point for exploring the AWS cloud
      • Low technical barrier to entry
      • Focus on incorporating cloud into your DR strategy, not on complex technical issues related to hot-hot systems
      Cost-effective
      • Very high levels of data durability at low price
      • Cost of storing snapshots in Amazon S3
      • Archiving possibilities beyond tape using Amazon Glacier
  33. Backup and restore
  34. Backup and restore
  35. Backup and restore
      • Create instances from AMIs
      • Restore data from backups
  36. Many ways to backup
  37. Disaster Recovery site on AWS can be for
      • Primary site on customer data center
      • Primary on AWS itself
  38. Primary and DR sites on AWS
  39. Customer case study
  40. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
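The two restore steps on this slide could be sketched with the AWS CLI as follows. Everything here is an assumption for illustration: the deck shows no restore script, the backup is assumed to be an EBS snapshot, the device name `/dev/sdf` is a placeholder, and the function name and arguments are hypothetical. In a VPC without a default subnet, `run-instances` would also need a subnet argument.

```shell
# Hypothetical helper (not from the slides): launch an instance from an AMI,
# create a volume from a snapshot in the same Availability Zone, and attach it.
# Arguments: $1 region, $2 AMI ID, $3 snapshot ID, $4 Availability Zone, $5 instance type
restore_from_backup() {
  local region="$1"
  local ami_id="$2"
  local snapshot_id="$3"
  local az="$4"
  local instance_type="$5"
  local instance_id
  local volume_id

  # Create the instance from the AMI
  instance_id=$(aws --region "$region" --output text ec2 run-instances \
    --image-id "$ami_id" --instance-type "$instance_type" \
    --placement "AvailabilityZone=$az" \
    --query 'Instances[0].InstanceId') || return 1

  # Restore the data: a new volume built from the backup snapshot
  volume_id=$(aws --region "$region" --output text ec2 create-volume \
    --snapshot-id "$snapshot_id" --availability-zone "$az" \
    --query VolumeId) || return 1

  # Both resources must be ready before the volume can be attached
  aws --region "$region" ec2 wait instance-running --instance-ids "$instance_id" || return 1
  aws --region "$region" ec2 wait volume-available --volume-ids "$volume_id" || return 1

  aws --region "$region" ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$instance_id" --device /dev/sdf >/dev/null || return 1

  echo "$instance_id"
}

# Example call (all IDs are placeholders):
#   restore_from_backup us-west-1 "$ami_id" "$snapshot_id" us-west-1a m1.small
```

The volume and the instance have to be in the same Availability Zone, which is why one zone argument is used for both calls.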