Leveraging the Public Cloud for Disaster Recovery


  1. 1. Leveraging the Public Cloud for Disaster Recovery. Lahav Savir, Architect & CEO, Emind Systems Ltd. lahavs@emind.co
  2. 2. About Lahav Savir • 15+ years’ experience in the online industry • Architect and CEO @ Emind Systems. Emind Systems (est. 2006) • Boutique system integrator • ~100 AWS customers • AWS solution provider
  3. 3. Amazon (AWS) Certification Amazon Solution Provider & Consulting Partner https://aws.amazon.com/solution-providers/si/emind-systems-ltd
  4. 4. Disaster Recovery in a Nutshell • Business continuity • Minimize downtime and data loss • Recovery Time Objective (RTO) • Recovery Point Objective (RPO) • Price
  5. 5. DR Approaches • Complete server mirroring • Data mirroring / replication • Configuration replication
  6. 6. Emind’s Best Practice
  7. 7. Why Amazon? Flexible, Global Infrastructure • N. Virginia • Oregon • N. California • Ireland • Singapore • Tokyo • Sydney • São Paulo • GovCloud
  8. 8. Secure • VPC: a Virtual Private Cloud on AWS's infrastructure • Specify a private IP address range • Bridge your on-site IT infrastructure and the VPC with a VPN connection or Direct Connect • Extend your existing security and management policies to the cloud
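Bridging an on-site network to a VPC over a VPN only routes cleanly when the two address ranges do not overlap. The following is a minimal sketch of that check using Python's standard `ipaddress` module; the function name and the sample ranges are illustrative assumptions, not part of the talk.

```python
import ipaddress


def check_vpc_range(vpc_cidr, onprem_cidr):
    """Return a list of problems with a proposed VPC address range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    onprem = ipaddress.ip_network(onprem_cidr)
    problems = []
    # The slide asks for a private range inside the VPC.
    if not vpc.is_private:
        problems.append(f"{vpc} is not a private range")
    # Overlapping ranges cannot be routed across the VPN link.
    if vpc.overlaps(onprem):
        problems.append(f"{vpc} overlaps on-site range {onprem}")
    return problems


# Hypothetical ranges: on-site uses 10.0.0.0/16, the VPC would use 10.1.0.0/16.
print(check_vpc_range("10.1.0.0/16", "10.0.0.0/16"))
```

An empty list means the proposed range passed both checks.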
  9. 9. A different cost model • Ability to scale • Cost savings • No arbitrary time limit to fail back with AWS [Chart: infrastructure cost and demand over time, comparing second-site cost with AWS cost across Test, Test, Failover, and Failback events]
  10. 10. Zooming in on the technical details
  11. 11. Disaster Recovery Terms • RTO: Recovery Time Objective – Acceptable time period within which normal operation (or degraded operation) needs to be restored after an event • RPO: Recovery Point Objective – Acceptable data loss, measured in time
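Both terms are time spans, so an incident can be scored against them with simple date arithmetic. A minimal sketch follows; the function names and the sample timestamps and targets are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def measure_recovery(last_good_copy, failure, restored):
    """Return (data_loss, downtime) for one incident as timedeltas."""
    data_loss = failure - last_good_copy  # compared against the RPO
    downtime = restored - failure         # compared against the RTO
    return data_loss, downtime


def meets_objectives(data_loss, downtime, rpo, rto):
    """True when both the RPO and the RTO were respected."""
    return data_loss <= rpo and downtime <= rto


# Hypothetical incident: last copy at 02:00, failure at 09:30, restored at 11:00.
loss, down = measure_recovery(
    datetime(2013, 1, 1, 2, 0),
    datetime(2013, 1, 1, 9, 30),
    datetime(2013, 1, 1, 11, 0),
)
print(loss, down, meets_objectives(loss, down, timedelta(hours=24), timedelta(hours=4)))
```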
  12. 12. Backup and Restore [Diagram: data is copied from a traditional server in the on-premises infrastructure to objects in an S3 bucket on AWS, optionally via Import/Export; Amazon Route 53 is shown]
  13. 13. Backup and Restore [Diagram: inside an AWS Region and Availability Zone, an Amazon EC2 instance is quickly provisioned from an AMI pre-bundled with OS and applications; data is copied from objects in the S3 bucket to the instance's data volume]
  14. 14. Backup and Restore • Advantages – Simple to get started – Extremely cost effective (mostly backup storage) • Preparation Phase – Take backups of current systems – Store backups in S3 – Describe the procedure to restore from backup on AWS • Know which AMI to use; build your own as needed • Know how to restore the system from backups • Know how to switch to the new system
  15. 15. Backup and Restore • In Case of Disaster – Retrieve backups from S3 – Bring up required infrastructure • EC2 instances with prepared AMIs, Load Balancing, etc. – Restore system from backup – Switch over to the new system • Adjust DNS records to point to AWS • Objectives – RTO: as long as it takes to bring up infrastructure and restore system from backups – RPO: time since last backup
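The two objectives for Backup and Restore can be estimated before a disaster ever happens: the restore point is the newest backup taken before the failure, and the RTO is roughly the sum of the sequential recovery steps. The sketch below assumes backups are tracked as (timestamp, key) pairs; the object keys and step durations are hypothetical placeholders.

```python
from datetime import datetime, timedelta


def pick_restore_point(backups, failure_time):
    """Return the newest (timestamp, key) backup taken at or before the failure."""
    usable = [b for b in backups if b[0] <= failure_time]
    if not usable:
        raise ValueError("no backup predates the failure")
    return max(usable, key=lambda b: b[0])


def estimate_rto(step_durations):
    """Sum the durations of the sequential recovery steps."""
    return sum(step_durations.values(), timedelta())


# Hypothetical nightly backups and a failure on the third morning.
backups = [
    (datetime(2013, 1, 1, 2, 0), "backups/day1.tar.gz"),
    (datetime(2013, 1, 2, 2, 0), "backups/day2.tar.gz"),
    (datetime(2013, 1, 3, 2, 0), "backups/day3.tar.gz"),
]
failure = datetime(2013, 1, 3, 1, 15)
taken_at, key = pick_restore_point(backups, failure)
print(key, "data loss:", failure - taken_at)

# Placeholder durations for the steps listed on the slide.
steps = {
    "retrieve backups from S3": timedelta(minutes=30),
    "bring up infrastructure": timedelta(minutes=20),
    "restore system from backup": timedelta(hours=2),
    "adjust DNS records": timedelta(minutes=10),
}
print("estimated RTO:", estimate_rto(steps))
```

Replacing the placeholder durations with timings from a real restore test gives a figure that can be compared against the agreed RTO.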
  16. 16. Pilot Light [Diagram: a user or system reaches the on-site web, application, and database servers through Amazon Route 53; on AWS, the web and application servers are not running, while a smaller database server instance receives data mirroring/replication onto its data volume]
  17. 17. Pilot Light [Diagram: the same layout with the AWS web server and database server added; the AWS tier is still marked "Not Running" and the database is still a smaller instance fed by data mirroring/replication]
  18. 18. Pilot Light [Diagram: the same layout during recovery; the AWS web and application servers start in minutes and the replicated database server is resized as desired]
  19. 19. Pilot Light • Advantages – Very cost effective (fewer 24/7 resources) • Preparation Phase – Enable replication of all critical data to AWS – Prepare all required resources for automatic start • AMIs, Network Settings, Load Balancing, etc.
  20. 20. Pilot Light • In Case of Disaster – Automatically bring up resources around the replicated core data set – Scale the system as needed to handle current production traffic – Switch over to the new system • Adjust DNS records to point to AWS • Objectives – RTO: as long as it takes to detect the need for DR and automatically scale up the replacement system – RPO: depends on replication type
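Bringing resources up "around the replicated core data set" implies an ordering: the database comes first, and DNS is switched last. One way to sketch that ordering is a dependency graph resolved with the standard `graphlib` module (Python 3.9+). The resource names and dependencies below are assumptions for illustration, not the speaker's runbook.

```python
from graphlib import TopologicalSorter

# Each resource maps to the resources that must be up before it starts.
# Names and dependencies are hypothetical.
DEPENDS_ON = {
    "database (resized)": set(),
    "application servers": {"database (resized)"},
    "web servers": {"application servers"},
    "load balancer": {"web servers"},
    "DNS switch to AWS": {"load balancer"},
}


def start_order(depends_on):
    """Return the resources in an order that respects every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


for step, resource in enumerate(start_order(DEPENDS_ON), start=1):
    print(step, resource)
```

Keeping the dependencies in data rather than in a script makes it easier to review the start sequence during a periodic DR test.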
  21. 21. Fully-Working Low-Capacity Standby [Diagram: a user or system reaches the on-site web, application, and database servers through Amazon Route 53; on AWS, a low-capacity web, app, and DB server stack runs continuously and receives data mirroring/replication onto its data volume]
  22. 22. Fully-Working Low-Capacity Standby [Diagram: the same low-capacity layout as the previous slide]
  23. 23. Fully-Working Low-Capacity Standby [Diagram: the same layout during recovery; the AWS web, application, and database tiers grow capacity by adding servers]
  24. 24. Fully-Working Low-Capacity Standby [Diagram: the same grow-capacity layout as the previous slide]
  25. 25. Fully-Working Low-Capacity Standby • Advantages – Can take some production traffic at any time – Cost savings (IT footprint smaller than full DR) • Preparation – Similar to Pilot Light – All necessary components running 24/7, but not scaled for production traffic – Best practice: continuous testing • “Trickle” a statistical subset of production traffic to the DR site
  26. 26. Fully-Working Low-Capacity Standby • In Case of Disaster – Immediately fail over the most critical production load • Adjust DNS records to point to AWS – (Auto) Scale the system further to handle all production load • Objectives – RTO: for critical load, as long as it takes to fail over; for all other load, as long as it takes to scale further – RPO: depends on replication type
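The "trickle" of production traffic and the later failover can both be pictured as weights on the two sites, which is the idea behind weighted DNS routing. The sketch below only simulates the selection locally with the standard `random` module and does not call Route 53; the site names and the 95/5 split are assumptions.

```python
import random


def pick_site(weights, rng):
    """Choose one site at random, in proportion to its weight."""
    sites = list(weights)
    return rng.choices(sites, weights=[weights[s] for s in sites], k=1)[0]


# Normal operation: trickle roughly 5% of requests to the standby (hypothetical split).
NORMAL = {"on-site": 95, "aws-standby": 5}
# Disaster: send everything to AWS.
FAILOVER = {"on-site": 0, "aws-standby": 100}

rng = random.Random(42)  # seeded so the simulation is repeatable
sample = [pick_site(NORMAL, rng) for _ in range(10_000)]
print("share sent to standby:", sample.count("aws-standby") / len(sample))
print("after failover:", pick_site(FAILOVER, rng))
```

Because the standby already serves a small share of real requests, a failover changes only the weights rather than exercising an untested path.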
  27. 27. Multi-Site Hot Standby [Diagram: a user or system reaches the on-site web, application, and database servers through Amazon Route 53; on AWS, a full-capacity web, app, and DB server stack runs in parallel and receives data mirroring/replication onto its data volume]
  28. 28. Multi-Site Hot Standby • Advantages – Can take all production load at any moment • Preparation – Similar to Low-Capacity Standby – Scales in/out fully with production load • In Case of Disaster – Immediately fail over all production load • Adjust DNS records to point to AWS • Objectives – RTO: as long as it takes to fail over – RPO: depends on replication type
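With a hot standby, the RTO is dominated by how quickly the need to fail over is detected. A common pattern is to fail over only after several consecutive failed health checks, so a single dropped probe does not trigger a switch. The sketch below shows that rule; the threshold of three is an assumption, not a value from the talk.

```python
def should_fail_over(health_checks, threshold=3):
    """True when the last `threshold` health checks of the primary all failed.

    `health_checks` is a list of booleans, oldest first (True = healthy).
    """
    recent = health_checks[-threshold:]
    return len(recent) == threshold and not any(recent)


# One failed probe is ignored; three in a row trigger the failover.
print(should_fail_over([True, True, False]))
print(should_fail_over([True, False, False, False]))
```

A lower threshold shortens detection time at the cost of more false alarms, so the value should be tuned against the RTO target.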
  29. 29. Summary • Plan – Analyze your existing applications and services – Find the right approach per case • Adapt – Match your plan to RTO, RPO and Budget • POC – Validate your plan • Test – Periodic testing • Monitor – Ensure continuous operation of all
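Matching a plan to RTO and budget can be reduced to a simple rule: walk the four approaches from cheapest to most expensive and stop at the first one whose achievable RTO meets the target. The sketch below follows the order of the slides; the RTO figures are placeholders, not measurements, and should be replaced with results from your own proof of concept and tests.

```python
from datetime import timedelta

# Ordered cheapest first, following the order of the slides.
# The RTO figures are placeholders, not measurements.
APPROACHES = [
    ("Backup and Restore", timedelta(hours=24)),
    ("Pilot Light", timedelta(hours=4)),
    ("Fully-Working Low-Capacity Standby", timedelta(hours=1)),
    ("Multi-Site Hot Standby", timedelta(minutes=5)),
]


def cheapest_meeting(rto_target, approaches=APPROACHES):
    """Return the first (cheapest) approach whose RTO fits the target, or None."""
    for name, achievable_rto in approaches:
        if achievable_rto <= rto_target:
            return name
    return None


print(cheapest_meeting(timedelta(hours=2)))
```

The same loop could take an RPO target as a second filter once the replication type of each approach has been decided.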
  30. 30. • goCloud – Emind’s optimal road to the cloud – Secure cloud architecture – Scalable & high-availability design – Customized system deployment – Orchestrating cloud and software – Cloud operation team – Monitoring and alerting – 24x7 SLA
  31. 31. Contact me: lahavs@emind.co @lahavsavir 054-4321688
