Disaster Recovery and Business Continuity - Toronto FSI Symposium - October 2016

Felix Candelario, Global Financial Services Solutions Architect, explains the high-level AWS Cloud architecture, the concepts behind Availability Zones and Regions, and how they relate to the traditional concepts of data centers and pods. He concludes with a presentation on how applications can be architected for the AWS Cloud, and how mission-critical, disaster recovery, and business continuity architectures are in use by financial services customers today.

Published in: Business

  1. Felix Candelario, Global Financial Services Solutions Architect: “Disaster Recovery and Business Continuity”
  2. Agenda
      • AWS Disaster Recovery Concepts & Terminology
      • Architecting for Recovery & Resiliency
      • Disaster Recovery Testing & Assurance
      • Architecting for the Cloud
  3. “Everything fails, all the time” - Werner Vogels (CTO, Amazon.com)
  4. Concepts & Terminology
  5. DR Terminology Map (your data centers: AWS equivalent)
      • Load Balancers: ELB/Appliance
      • Web/App Servers: EC2/Auto Scaling
      • DNS: Route 53
      • Database Servers: Amazon RDS
      • Firewall: Security Groups / ACL
      • Data Centers: Availability Zones / VPC
      • Geographical Redundancy: Multi-region
  6. What is an AWS Region?
      • A geographic location that contains a cluster of Availability Zones in a given metropolitan area
      • Each region is completely isolated and independent from other regions
      • Each region consists of two or more AZs to support high availability (HA) through AZ independence
  7. Highly Reliable Global Footprint
      • Over 1 million active customers per month across 190 countries
      • 2,300 government agencies
      • 7,000 educational institutions
      • 35 Availability Zones + 9 more coming soon
      • 59 edge locations
      • 13+ worldwide regions
  8. What are Availability Zones?
      • Groupings of one or more data centers that are physically isolated
      • AZs are connected to each other over low-latency links within the same region
      • Using two or more AZs within a region supports capabilities such as synchronous database replication, and can yield better pricing when using Amazon EC2 Spot Instances
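The high-availability argument behind using two or more AZs can be illustrated with a small calculation. This is only a sketch under a strong assumption that is not stated on the slides: that AZ failures are statistically independent and each AZ has the same availability. The function name and the 0.99 figure are hypothetical and are not AWS numbers.

```python
def combined_availability(per_az: float, zones: int) -> float:
    """Probability that at least one of `zones` AZs is up.

    Assumes (hypothetically) that AZs fail independently and share the
    same availability `per_az`, expressed as a fraction between 0 and 1.
    """
    if not 0.0 <= per_az <= 1.0:
        raise ValueError("per_az must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # All zones are down at once with probability (1 - per_az) ** zones.
    return 1.0 - (1.0 - per_az) ** zones


if __name__ == "__main__":
    # Illustrative figure only; not a published AWS availability number.
    for n in (1, 2, 3):
        print(n, combined_availability(0.99, n))
```

Real failures are rarely fully independent, so this should be read as an upper bound on the benefit rather than a prediction.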
  9. Availability Zones are Notated as Letters (35 Availability Zones in total)
      • Example: US East 1 (Northern VA) contains us-east-1a, us-east-1b, us-east-1c, us-east-1d, and us-east-1e
      • Diagram: US-EAST-1 containing Availability Zones A, B, C, D, and E
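The naming convention on this slide (region name followed by a single letter) can be sketched as a small parser. The function name and the regular expression are assumptions made for illustration; they follow the pattern of the examples on the slide rather than any official grammar.

```python
import re
from typing import Tuple

# Assumed pattern: a region such as "us-east-1" followed by one zone letter.
AZ_PATTERN = re.compile(r"^(?P<region>[a-z]{2}(?:-[a-z]+)+-\d+)(?P<zone>[a-z])$")


def split_az(name: str) -> Tuple[str, str]:
    """Split an AZ name such as 'us-east-1a' into (region, zone letter)."""
    match = AZ_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised AZ name: {name!r}")
    return match.group("region"), match.group("zone")


if __name__ == "__main__":
    for az in ("us-east-1a", "us-east-1e"):
        print(az, split_az(az))
```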
  10. What is an Amazon VPC?
      • A virtual isolated network that you define, in which you can launch AWS resources such as Amazon EC2 instances
      • Complete control of your virtual networking environment: set your own IP address ranges, create subnets, and configure routing tables and network gateways
      • Allows extension of your corporate network to the AWS Cloud
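Setting an IP address range and creating subnets can be planned offline before any AWS call is made. The sketch below uses only Python's standard `ipaddress` module to carve one subnet per Availability Zone out of a VPC range. The function name, the 10.0.0.0/16 range, and the /24 subnet size are illustrative assumptions, not values from the talk.

```python
import ipaddress
from typing import Dict, Sequence


def plan_subnets(vpc_cidr: str, azs: Sequence[str], new_prefix: int) -> Dict[str, str]:
    """Assign one subnet of size /new_prefix to each AZ, in order."""
    network = ipaddress.ip_network(vpc_cidr)
    candidates = network.subnets(new_prefix=new_prefix)  # lazy generator
    plan = {}
    for az in azs:
        try:
            plan[az] = str(next(candidates))
        except StopIteration:
            raise ValueError("VPC range is too small for the requested subnets") from None
    return plan


if __name__ == "__main__":
    # Hypothetical range and zones, chosen only for illustration.
    print(plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"], 24))
```

Spreading subnets across at least two zones is what later allows the multi-AZ patterns described in the deck.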
  11. VPC Pattern Diagram - Example: separate Amazon VPCs for Development, Integration, Pre-production, and Production
  12. Putting It All Together
  13. What Compute Services are available?
      • Amazon EC2: elastic virtual servers in the cloud
      • Auto Scaling: automated scaling of EC2 capacity
      • Elastic Load Balancing: dynamic traffic distribution
  14. What Network Services are available?
      • Amazon VPC: private, isolated section of the AWS Cloud (diagram spans Availability Zone A and Availability Zone B)
      • AWS Direct Connect: private connectivity between AWS and your data center
      • Amazon Route 53: Domain Name System (DNS) web service
  15. Architecting for Recovery & Resiliency
  16. Resiliency, Backup, Disaster Recovery
      • Resiliency: reducing the likelihood of service failure
      • Backup: maintaining data integrity
      • Disaster Recovery: recovery after loss of availability
      It’s not all or nothing. Choose a strategy that fits the business objective.
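  17. Diagram: a timeline centered on the disaster. The recovery point lies before the disaster, and the gap between them is data loss. The recovery time lies after the disaster, and the gap between them is down time.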
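The relationship between recovery point, recovery time, data loss, and down time can be written out as simple date arithmetic. This is a sketch with hypothetical function names and timestamps; it only restates the timeline on the slide.

```python
from datetime import datetime, timedelta
from typing import Tuple


def measure_outage(last_recovery_point: datetime,
                   disaster: datetime,
                   service_restored: datetime) -> Tuple[timedelta, timedelta]:
    """Return (data_loss, down_time) for one incident."""
    if not last_recovery_point <= disaster <= service_restored:
        raise ValueError("expected recovery point <= disaster <= restoration")
    data_loss = disaster - last_recovery_point   # work done since the last good copy
    down_time = service_restored - disaster      # time the service was unavailable
    return data_loss, down_time


def meets_objectives(data_loss: timedelta, down_time: timedelta,
                     rpo: timedelta, rto: timedelta) -> bool:
    """True when the incident stayed within both the RPO and the RTO."""
    return data_loss <= rpo and down_time <= rto


if __name__ == "__main__":
    # Hypothetical incident, for illustration only.
    loss, down = measure_outage(datetime(2016, 10, 1, 2, 0),
                                datetime(2016, 10, 1, 9, 30),
                                datetime(2016, 10, 1, 11, 0))
    print(loss, down, meets_objectives(loss, down, timedelta(hours=12), timedelta(hours=4)))
```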
  18. Ascending levels of DR options (from "business continuity begins" to "uninterrupted business continuity")
      • Backup & Restore: backup of on-premises data to AWS to use in a DR event. RPO 24 hours, RTO 24 hours, cost $
      • Pilot Light: replicate data and minimal running services into AWS, ready to take over and flare up. RPO 12 hours, RTO 4 hours, cost $$
      • Warm Standby: replicate data and services into AWS, ready to take over. RPO 1-4 hours, RTO 15 min, cost $$$
      • Hot Site: replicated and load-balanced environments that are both actively taking production traffic. RPO <15 min, RTO 0-5 min, cost $$$
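One way to read this slide is as a lookup: pick the first (cheapest) level whose RPO and RTO fit the business objective. The sketch below encodes the slide's figures under two assumptions: ranges are represented by their upper bound, and the order of the list stands in for ascending cost. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class DrOption:
    name: str
    rpo: timedelta   # upper bound of the range shown on the slide
    rto: timedelta   # upper bound of the range shown on the slide
    cost: str        # relative cost marker as printed on the slide


# Listed in ascending order, as on the slide.
OPTIONS = (
    DrOption("Backup & Restore", timedelta(hours=24), timedelta(hours=24), "$"),
    DrOption("Pilot Light", timedelta(hours=12), timedelta(hours=4), "$$"),
    DrOption("Warm Standby", timedelta(hours=4), timedelta(minutes=15), "$$$"),
    DrOption("Hot Site", timedelta(minutes=15), timedelta(minutes=5), "$$$"),
)


def cheapest_option(required_rpo: timedelta, required_rto: timedelta,
                    options: Sequence[DrOption] = OPTIONS) -> Optional[DrOption]:
    """Return the first option meeting both objectives, or None if none does."""
    for option in options:
        if option.rpo <= required_rpo and option.rto <= required_rto:
            return option
    return None


if __name__ == "__main__":
    choice = cheapest_option(timedelta(hours=6), timedelta(hours=1))
    print(choice.name if choice else "no option fits")
```

In practice the figures depend on the workload; the values here are only the indicative ones from the slide.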
  19. Backup and Restore Architecture (~$200/month in US-EAST + VPN)
      • On-premises (active production, corporate data center): internet traffic for www.example.com reaches the web servers, app servers, and DB server; a backup system and a Storage Gateway (iSCSI) sit alongside a 1 TB data volume
      • AWS region (AWS DR failover), reached over a VPN connection: S3 bucket and Glacier archive
      • Monthly costs shown: S3 (1 TB) $31, Glacier (2 TB) $22, Storage Gateway $125; the S3 (1 TB) $31 label appears twice on the slide
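The "~$200 / month" headline can be checked against the line items on the slide. The transcript lists the S3 (1 TB) item twice, and it is not clear whether that is a second bucket or a repeated label, so this sketch computes both readings. The variable and function names are hypothetical, and the prices are the 2016 figures quoted on the slide.

```python
from typing import List, Tuple

# Monthly prices in USD as quoted on the slide.
LINE_ITEMS: List[Tuple[str, int]] = [
    ("S3 (1 TB)", 31),
    ("Glacier (2 TB)", 22),
    ("Storage Gateway", 125),
]


def monthly_total(items: List[Tuple[str, int]]) -> int:
    """Sum the monthly price of every line item."""
    return sum(price for _, price in items)


# Reading 1: each distinct item counted once.
single_s3 = monthly_total(LINE_ITEMS)
# Reading 2: the repeated S3 label is a second 1 TB bucket (an assumption).
double_s3 = monthly_total(LINE_ITEMS + [("S3 (1 TB), second bucket", 31)])

if __name__ == "__main__":
    print(single_s3, double_s3)
```

Both readings land near the slide's rounded figure; the VPN charge is listed separately on the slide and is not included here.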
  20. Backup and Restore Details
      • Suitable for: solutions that can sustain higher technical debt; lower business critical nature; low cost DR option
      • Leverage existing investments in: de-duplication, compression, WAN acceleration
  21. Pilot light
  22. Pilot light: prep (diagram)
      • Corporate data center serving www.example.com: reverse proxy/caching server, application server, master database server
      • Pilot light system in AWS: subordinate database server with its data volume, kept current by data mirroring/replication; reverse proxy/caching server and application server are not running
  23. Pilot light: recovery (diagram)
      • Corporate data center: reverse proxy/caching server, application server, master database server
      • AWS: reverse proxy/caching server and application server start in minutes, with additional capacity added if needed; database server with its data volume now serves www.example.com
  24. Pilot Light Details
      • Suitable for: solutions that need lower RTO & RPO; higher business critical nature; mid-range cost DR option
  25. Warm standby
  26. Warm standby: prep (diagram)
      • Route 53 directs www.example.com to the corporate data center (active): reverse proxy/caching server, application server, master database server
      • AWS region (scaled-down standby, not active for production traffic): Elastic Load Balancer, reverse proxy/caching server, application server, subordinate database server, data volume
      • Mirroring/replication between the sites; application data source cut over
  27. Warm standby: recover (diagram)
      • Route 53 directs www.example.com to the AWS region (active, scaled-up production): Elastic Load Balancer, reverse proxy/caching server, application server, database server, data volume
      • Corporate data center: reverse proxy/caching server, application server, master database server
  28. Hot site
  29. Hot site: prep (diagram)
      • Route 53 directs www.example.com to both sites, and both are active
      • Corporate data center: reverse proxy/caching server, application server, master database server
      • AWS region: Elastic Load Balancer, reverse proxy/caching server, application server, subordinate database server, data volume
      • Mirroring/replication between the sites; application data source cut over
  30. Hot site: recovery (diagram)
      • AWS region (active, scaled up for production use): Elastic Load Balancer, reverse proxy/caching server, application server, database server, data volume
      • Corporate data center: reverse proxy/caching server, application server, master database server
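The hot-site behaviour (two active sites, with all traffic landing on the survivor after a failure) can be sketched as a weighted split over healthy sites. The slides show Route 53 in front of both sites but do not state a routing policy, so the weights, site names, and function below are hypothetical and do not call any AWS API.

```python
from typing import Dict, Set


def traffic_shares(weights: Dict[str, int], healthy: Set[str]) -> Dict[str, float]:
    """Split traffic across healthy sites in proportion to their weights."""
    live = {site: w for site, w in weights.items() if site in healthy and w > 0}
    total = sum(live.values())
    if total == 0:
        raise RuntimeError("no healthy site is available to take traffic")
    return {site: w / total for site, w in live.items()}


if __name__ == "__main__":
    weights = {"corporate-dc": 50, "aws-region": 50}  # illustrative weights
    print(traffic_shares(weights, {"corporate-dc", "aws-region"}))  # both active
    print(traffic_shares(weights, {"aws-region"}))                  # data center lost
```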
  31. Warm Standby and Multi-site Details
      • Suitable for: solutions that require RTO & RPO in minutes; core business critical functions; higher cost DR option
  32. Disaster Recovery Testing & Assurance
  33. Continuous Testing of Infrastructure
      • Continuously and constantly test
      • Regularly execute tests in stable, production, and production-like test environments
      • Infrastructure as Code
      • CI/CD: test in the infrastructure build pipeline
      • Testing of infrastructure during integration test
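Treating infrastructure as code means resiliency rules can be checked in a build pipeline before anything is deployed. The sketch below inspects a CloudFormation-style template, already loaded as a Python dictionary, and checks that its `AWS::EC2::Subnet` resources cover at least two Availability Zones. The function names and the sample template are hypothetical, and zones expressed through intrinsic functions are ignored for simplicity.

```python
from typing import Any, Dict, Set


def availability_zones(template: Dict[str, Any]) -> Set[str]:
    """Collect the literal AvailabilityZone values of all subnet resources."""
    zones = set()
    for resource in template.get("Resources", {}).values():
        if resource.get("Type") != "AWS::EC2::Subnet":
            continue
        zone = resource.get("Properties", {}).get("AvailabilityZone")
        if isinstance(zone, str):  # skip intrinsic functions such as Fn::Select
            zones.add(zone)
    return zones


def spans_multiple_azs(template: Dict[str, Any], minimum: int = 2) -> bool:
    """True when the template's subnets cover at least `minimum` zones."""
    return len(availability_zones(template)) >= minimum


if __name__ == "__main__":
    sample = {  # hypothetical template fragment
        "Resources": {
            "SubnetA": {"Type": "AWS::EC2::Subnet",
                        "Properties": {"AvailabilityZone": "us-east-1a"}},
            "SubnetB": {"Type": "AWS::EC2::Subnet",
                        "Properties": {"AvailabilityZone": "us-east-1b"}},
        }
    }
    print(spans_multiple_azs(sample))
```

A check like this could run as one step of the integration tests mentioned on the slide.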
  34-37. Warm Standby: Testing (four build slides of the same diagram)
      • Route 53 directs www.example.com to the corporate data center (active): reverse proxy/caching server, application server, master database server
      • AWS region (scaled-down standby, not active for production traffic): Elastic Load Balancer, reverse proxy/caching server, application server, subordinate database server, data volume
      • Mirroring/replication between the sites; application data source cut over
      • Slide 37 adds the command: aws rds reboot-db-instance --db-instance-identifier dbInstanceID --force-failover
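A failover test built around the command on slide 37 has two parts: issue the reboot with forced failover, then time how long the service takes to become healthy again. The sketch below builds the argument list for that command without executing it, and measures recovery with a caller-supplied probe. The function names, the probe, and the timeout values are hypothetical.

```python
import time
from typing import Callable, List


def failover_command(db_instance_id: str) -> List[str]:
    """Build (but do not run) the AWS CLI call shown on the slide."""
    if not db_instance_id:
        raise ValueError("db_instance_id must not be empty")
    return ["aws", "rds", "reboot-db-instance",
            "--db-instance-identifier", db_instance_id,
            "--force-failover"]


def time_until_healthy(probe: Callable[[], bool], timeout_s: float,
                       interval_s: float = 1.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll `probe` until it returns True; return the elapsed seconds."""
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError("service did not recover within the timeout")
        sleep(interval_s)


if __name__ == "__main__":
    # Prints the command only; passing it to subprocess.run would execute it.
    print(" ".join(failover_command("dbInstanceID")))
```

The measured time can then be compared with the RTO chosen for the workload.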
  38. Architecting for Cloud
  39. Architecting for Resiliency
  40. Cloud Based Architectures
      • High level of control over the environment
      • Automate everything: utilise AWS APIs
      • Infrastructure as code: CloudFormation
      • Parallel environment
      • Rolling update / all at once
      • Blue/green deployments
      A significant difference between physical and cloud is the control and visibility that cloud provides.
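The difference between a rolling update and an all-at-once update comes down to batch size. A minimal sketch, assuming a plain list of instance identifiers; the function name and the IDs are hypothetical, and no deployment service is called.

```python
from typing import List, Sequence


def rolling_batches(instances: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split instances into consecutive batches to be updated one at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(instances[i:i + batch_size])
            for i in range(0, len(instances), batch_size)]


if __name__ == "__main__":
    fleet = ["i-1", "i-2", "i-3", "i-4", "i-5"]  # placeholder instance IDs
    print(rolling_batches(fleet, 2))           # rolling update, two at a time
    print(rolling_batches(fleet, len(fleet)))  # all at once: a single batch
```

Smaller batches keep more capacity in service during the update at the cost of a longer deployment.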
  41. Common thread: Environment automation
      • Deployment success depends on mitigating risk for: application issues (functional), application performance, people/process errors, infrastructure failure, rollback capability, large costs
      • Strength of automation platform: CloudFormation is the most comprehensive automation platform
      • Scope stacks from network to software
      • Control higher-level automation services: Elastic Beanstalk, ECS, OpsWorks, Auto Scaling
  42. Benefits of deployment on AWS
      • Agile deployments
      • Flexible options
      • RPO/RTO & business continuity objectives
      • Scalable capacity
      • Pay for what you use
      • Automation capabilities
  43. Enterprise Observations
      • Business Enablement
      • Art of the Possible
      • Legacy Tech Debt
  44. Art of the Possible - State of DevOps 2016
      • Frequent deployments: 200x more frequent deployment
      • Faster recovery: 24x faster recovery from failure
      • Lower failure rate: 3x lower change failure rate
      • Less unplanned work: 22% less time spent on unplanned work and rework
      • Shorter lead times: 2,555x shorter lead times
      Source: Puppet Labs - State of DevOps 2016 Report
  45. Thank You
