AWS Cloud Disaster Recovery Plan Checklist - Are you ready?

Building your DR plan in the AWS cloud can be tricky compared to on-premises methodologies. Make sure you take the following into consideration when designing your AWS DR plan:

• Amazon Web Services required for DR
• High Availability AWS Architecture
• Known AWS disaster types
• Impact of 3rd-party services

Get an in-depth presentation covering DR on the AWS cloud.

Published in: Technology, Business

  1. Solving the problem of downtime in the cloud. AWS Cloud Disaster Recovery Plan Checklist: Are You Ready?
  2. About CloudEndure
     • Founded: 2012
     • Offers Disaster Recovery as a Service for cloud-based applications
     • Uses continuous replication of your entire application stack
     Source: Forrester
     Some of our customers
  3. Agenda
     • DR 101 – Definitions and Terminology
     • Why AWS for DR?
     • AWS Global Infrastructure
     • 4 Types of Disaster
     • 3 Takeaways
     • Q&A
  4. Disaster Recovery in 30 Words
     Disaster recovery (DR) is the set of processes, policies and procedures for preparing for the recovery or continuation of the technology infrastructure that is vital to an organization after a natural or human-induced crisis.
  5. DR Key Terminology
     • RPO (Recovery Point Objective): the maximum tolerable period in which data might be lost.
     • RTO (Recovery Time Objective): the duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences.
     • Data replication: sharing information so as to ensure consistency between redundant resources.
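The two objectives above can be checked against a concrete incident timeline. The following is a minimal sketch, not something from the slides: the function name `meets_objectives` and the example timestamps are hypothetical, and it assumes that all data written after the last good replica is lost.

```python
from datetime import datetime, timedelta


def meets_objectives(last_replica: datetime, failure: datetime,
                     restored: datetime, rpo: timedelta, rto: timedelta) -> bool:
    """Return True if an incident stayed within both recovery objectives.

    Assumption: everything written after `last_replica` is lost, so the
    achieved recovery point is the gap between the last replica and the failure.
    """
    achieved_rpo = failure - last_replica   # how much data (in time) was lost
    achieved_rto = restored - failure       # how long the service was down
    return achieved_rpo <= rpo and achieved_rto <= rto


if __name__ == "__main__":
    # Hypothetical incident: replica at 11:45, failure at 12:00, restored at 12:40.
    t = datetime(2014, 6, 1, 11, 45)
    ok = meets_objectives(
        last_replica=t,
        failure=t + timedelta(minutes=15),
        restored=t + timedelta(minutes=55),
        rpo=timedelta(minutes=15),
        rto=timedelta(hours=1),
    )
    print("objectives met:", ok)
```

Tightening the RPO to 5 minutes in the same example would fail the check, which is the kind of gap that pushes a design from periodic snapshots toward continuous replication.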
  6. DR – What it’s not
     • Unlike backup, which is mostly about data loss prevention, DR is about service availability: low RPO and RTO.
     • DR complements other High Availability activities, but while those deal with disaster prevention, DR is for those times when the preventions failed.
  7. Why DR?
     • 54% of Cloud IT Managers experienced an outage in the past 3 months
     • Top challenges in meeting availability goals: insufficient IT resources, budget limitations, software bugs
     • 79% report a service availability goal of “Three Nines” (99.9%)
     Source: 2014 Cloud Disaster Recovery Survey (available for download in the “Resources” tab of the webinar)
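To make the “Three Nines” goal concrete, an availability percentage can be converted into a downtime budget. This is a small illustrative calculation; the function name `downtime_budget_hours` is made up here, and a 365-day year is assumed.

```python
def downtime_budget_hours(availability_percent: float,
                          period_hours: float = 365 * 24) -> float:
    """Hours of downtime allowed in the period at the given availability."""
    return period_hours * (1 - availability_percent / 100)


if __name__ == "__main__":
    yearly = downtime_budget_hours(99.9)                       # roughly 8.76 hours per year
    monthly_minutes = downtime_budget_hours(99.9, 30 * 24) * 60  # roughly 43 minutes per 30 days
    print(f"99.9% allows about {yearly:.2f} h/year, {monthly_minutes:.1f} min/month")
```

Under these assumptions a 99.9% goal leaves under nine hours of downtime per year, so a single long outage can consume the whole budget.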
  8. Why AWS for DR
     • Flexible: define different recovery objectives for different components and change them on the fly. You can grow and shrink your disaster site whenever necessary (even automatically).
     • Cheap: pay for hourly usage of resources. Only create your disaster site when it’s needed. Don’t pay for two running sites all the time.
     • Easy: DR and HA made easier. No need to build your DR solution from scratch. AWS already has many of the building blocks built in: Auto Scaling, snapshots, CloudFormation…
  9. AWS Global Infrastructure (figure; legend: AWS Region, Availability Zone)
  10. AWS Global Infrastructure
     • Regions
       • 8 publicly available regions.
       • Spread all over the world.
       • Completely independent. Different teams. Different infrastructure.
     • Availability Zones (AZs)
       • Each region contains one or more availability zones.
       • Physically separated, but in the same geographical location.
       • Share teams and software infrastructure.
     • Dynamic Resource Allocation
       • Pay for resources on an hourly basis.
       • Create and destroy resources quickly on demand using AWS dashboard, CLI or API.
       • Automation is built into several services (such as Auto Scaling). APIs let you add additional automation layers.
  11. Types of downtime
     • Single-AZ disaster
     • Whole-region disaster
     • Single-service disaster
     • Single-resource disaster
  12. Disaster Type 1 - Single-resource disaster
     • What is it? A single resource (instance, EBS, ELB…) stops functioning.
     • Frequency: Very high. For example, instances are sometimes terminated by AWS or just stop working without warning.
     • How to prepare? Make sure that no single resource is a point of failure. Use clusters for stateless instances (you can use Auto Scaling and AMIs to help you). Configure RAIDs for volumes. Use services that are managed by AWS, such as RDS, to store your state and data.
  13. Disaster Type 2 - Single-AZ disaster
     • What is it? A whole AZ goes down, but all the other AZs in the region still function.
     • Frequency: More than 10 times a year (may be a different AZ every time).
     • How to prepare? Build your system so that it’s spread across multiple AZs and can survive the failure of any single AZ. Connect subnets in different AZs to your ELB and turn on multi-AZ for RDS.
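Whether a deployment can survive the loss of any single AZ can be checked with a simple count. The sketch below is an illustration only: the function name is hypothetical, the AZ names are examples, and it assumes every instance contributes equal capacity.

```python
from collections import Counter
from typing import Iterable


def survives_any_single_az_failure(instance_azs: Iterable[str],
                                   min_healthy: int) -> bool:
    """Check capacity after losing the most heavily populated AZ.

    `instance_azs` holds one AZ name per running instance. If the fleet
    still has `min_healthy` instances after the largest AZ disappears,
    it also survives the loss of any smaller AZ.
    """
    per_az = Counter(instance_azs)
    total = sum(per_az.values())
    worst_case_loss = max(per_az.values(), default=0)
    return total - worst_case_loss >= min_healthy


if __name__ == "__main__":
    balanced = ["us-east-1a", "us-east-1a", "us-east-1b",
                "us-east-1b", "us-east-1c", "us-east-1c"]
    lopsided = ["us-east-1a", "us-east-1a", "us-east-1a", "us-east-1b"]
    print(survives_any_single_az_failure(balanced, min_healthy=4))
    print(survives_any_single_az_failure(lopsided, min_healthy=2))
```

The lopsided fleet fails the check even though it nominally spans two AZs, because most of its capacity sits in one of them.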
  14. Disaster Type 3 - Single-service disaster
     • What is it? A specific service goes down across the entire region. Almost always contained within a single region.
     • Frequency: Several times a year (a different service every time).
     • How to prepare? Resist the temptation to use AWS services for everything. Choose your services carefully. Be ready to recreate your system in a different region, where the service works well (see next slide).
  15. Disaster Type 4 - Whole-region disaster
     • What is it? An entire region goes down, taking down all the applications running in it.
     • Frequency: Several times a year (a different region every time). See the CloudEndure blog post comparing the uptime of all AWS regions.
     • How to prepare? Implement a cross-region DR methodology. Take snapshots of your instances and copy them to a different region. Use CloudFormation to define your application stack. Copy AMIs to a different region. Use cross-region read replicas for RDS. Use continuous data replication.
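The snapshot-copy step can be sketched as follows. This is one possible shape, not the presenter's tooling: it assumes `dest_ec2` is an EC2 client bound to the DR region (for example one created with boto3's `boto3.client("ec2", region_name=...)`), and it relies only on the client's `copy_snapshot` call. `DryRunEC2` and the snapshot IDs are hypothetical stand-ins so the sketch runs without AWS credentials.

```python
from typing import Dict, Iterable


def copy_snapshots_to_dr_region(dest_ec2, source_region: str,
                                snapshot_ids: Iterable[str]) -> Dict[str, str]:
    """Copy EBS snapshots into the region that `dest_ec2` is bound to.

    Returns a mapping of source snapshot ID to the ID of its copy.
    """
    copies = {}
    for snapshot_id in snapshot_ids:
        response = dest_ec2.copy_snapshot(
            SourceRegion=source_region,
            SourceSnapshotId=snapshot_id,
            Description=f"DR copy of {snapshot_id} from {source_region}",
        )
        copies[snapshot_id] = response["SnapshotId"]
    return copies


class DryRunEC2:
    """Hypothetical stand-in for an EC2 client; makes no AWS calls."""

    def copy_snapshot(self, SourceRegion, SourceSnapshotId, Description):
        return {"SnapshotId": f"copy-of-{SourceSnapshotId}"}


if __name__ == "__main__":
    result = copy_snapshots_to_dr_region(
        DryRunEC2(), "us-east-1", ["snap-aaa", "snap-bbb"])
    print(result)
```

Passing the client in as a parameter keeps the function testable offline; in a real account the copies complete asynchronously, so a production script would also need to wait for them and handle errors.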
  16. Beyond AWS
     • Not all outages are caused by your cloud provider. Downtime of the 3rd-party services you use can take your application down too, for example DNS, CDN, 3rd-party login services…
     • Pick your 3rd-party services carefully. Check the historical stability of the services you are considering. Don’t rely on 3rd-party services more than you need to.
  17. 3 Takeaways
     1. Design DR into your system. The earlier you implement DR, the easier it is to recover. It’s too late to think about DR after disaster strikes.
     2. Take advantage of what AWS offers. AWS provides many building blocks to help you build a DR solution for your application; you don’t need to do everything from scratch.
     3. Understand the impact of relying on services. Each service you use can cause downtime. Check the stability of the services you’re using and design your system to stay up even if some of the services it depends on are down.
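One common way to stay up when a dependency is down is to wrap calls to it in a circuit breaker with a fallback. The slides do not prescribe this pattern; the class below is a simplified sketch with hypothetical names and thresholds.

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency for a while and serve a fallback."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self._max_failures = max_failures
        self._reset_after = reset_after   # cool-down in seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None            # None means the breaker is closed

    @property
    def is_open(self):
        return self._opened_at is not None

    def call(self, func, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                return fallback()         # still cooling down: skip the dependency
            # Cool-down elapsed: close the breaker and try the dependency again.
            self._opened_at = None
            self._failures = 0
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0                # a success resets the failure count
        return result


if __name__ == "__main__":
    breaker = CircuitBreaker(max_failures=2, reset_after=60.0)

    def broken_login_service():
        raise ConnectionError("3rd-party login is down")

    for _ in range(3):
        print(breaker.call(broken_login_service, lambda: "cached session"))
    print("breaker open:", breaker.is_open)
```

While the breaker is open the application keeps answering from the fallback instead of waiting on a service that is known to be failing.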
  18. Thank You
     Leonid Feinberg, VP Products
     leonid@cloudendure.com