Application-level Disaster Recovery on OpenStack

Published in: Technology

  1. Cloudifying High Availability: Application-level Disaster Recovery on OpenStack
     Ali Hodroj, Director, Solution Architecture
  2. AGENDA
       • Context and Concepts
       • Regions, Zones, and Single Points of Failure
       • Challenges and Trade-Offs
       • Architecting HA/DR Solutions with Cloudify
       • Case Studies
       • Resources and Q&A
     Copyright 2014 Gigaspaces. All Rights Reserved
  3. Context and Concepts
  4. FOCUS OF THIS SESSION
     HA/DR layers:
       • Application DR, fault isolation strategies, deployment patterns
       • Pacemaker, Corosync messaging, HAProxy, Galera MySQL replication
       • Power, air conditioning, fire protection, etc.
  5. CONCEPTS
     High Availability
       • Fault Tolerance
           • Ability to withstand failure and operate with normal or degraded performance
           • Redundancy and replication
       • High Availability
           • “The nines”: 99.99% ≈ 53 minutes of downtime per year
           • Minutes/hours of downtime per year
       • Single Point of Failure
           • Part of a system that, if it fails, brings down the entire system
     Disaster Recovery
       • RTO: how much downtime are you willing to tolerate?
       • RPO: how much data are you willing to lose?
       • Cost
           • Development effort
           • Redundant environments
  6. CONCEPTS…IN THE REAL WORLD
     High Availability
       • Availability includes both planned and unplanned outages
       • “Everything fails, all the time”
       • Cloud vendor SLAs require multi-zone deployments, and a multi-zone outage, to be effective
       • Chained SLAs compound (downtime per 30-day month): 99.95% (21 minutes) + 99.90% (43 minutes) + 99.90% (43 minutes) + 99.99% (4 minutes) = 99.74% (112.3 minutes)
     Disaster Recovery
       • Accomplishing high levels of redundancy in the cloud is expensive
       • Determining an appropriate RPO and RTO is ultimately a financial calculation
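The downtime figures on the two Concepts slides follow from simple arithmetic. The Python sketch below converts an availability figure into downtime and combines components that must all be up at once. The function names and the 30-day month are assumptions chosen for illustration; they are not part of the original slides.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 (assumes a 30-day month)


def downtime_minutes(availability: float, period_minutes: float) -> float:
    """Expected downtime for an availability in [0, 1] over a period."""
    return (1.0 - availability) * period_minutes


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


if __name__ == "__main__":
    # "The nines": downtime per year at 99.99%
    print(f"{downtime_minutes(0.9999, MINUTES_PER_YEAR):.1f} min/year")

    # Four chained SLAs, as on the "in the real world" slide
    stack = [0.9995, 0.9990, 0.9990, 0.9999]
    combined = serial_availability(stack)
    print(f"combined availability: {combined:.2%}")
    print(f"{downtime_minutes(combined, MINUTES_PER_MONTH):.1f} min/month")
```

Multiplying the four SLA figures gives roughly 99.74%, which is why a stack of individually strong SLAs can still fall well short of any single one of them.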
  7. Regions, Zones, and Single Points of Failure
  8. CLOUD HIGH AVAILABILITY: MATURITY MODEL
     Single points of failure:
       • Single server instance, same data center
       • Same geographical region
       • Same operational procedures, provider
  9. MULTI-ZONE ARCHITECTURE
       • Physically separated data centers within a region
       • Each availability zone
           • Independent power feeds from separate substations
           • Redundant power on each rack and diverse cabling
       • Shared images, security groups, and floating IPs
  10. MULTI-REGION ARCHITECTURE
       • Characteristics
           • Geographically dispersed architecture
       • Disaster Recovery Patterns
           • Replicate stateful tiers, orchestrate stateless upon failure
       • Challenges
           • Data replication costs and performance
           • Network flow
           • Orchestrating recovery
  11. MULTI-CLOUD ARCHITECTURE
       • Characteristics
           • Leverages cloud economics
           • Workload migration (“Own the base, rent the spike”)
           • Fewest single points of failure
       • Disaster Recovery Patterns
           • Replicate stateful tiers, orchestrate stateless upon failure
       • Challenges
           • Bootstrapping data for stateful services (snapshot or async replication?)
           • Data replication challenges over WAN
           • Complex setup
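The pattern "replicate stateful tiers, orchestrate stateless upon failure" appears on both the multi-region and multi-cloud slides. A minimal Python sketch of that split is shown here; the `Tier` type, the `plan_recovery` function, and the tier names are hypothetical and only illustrate the idea, not any Cloudify API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    stateful: bool


def plan_recovery(tiers: list[Tier]) -> dict[str, list[str]]:
    """Split tiers by DR treatment.

    Stateful tiers are replicated to the DR site ahead of time;
    stateless tiers are only provisioned there once a failure occurs.
    """
    plan: dict[str, list[str]] = {
        "replicate_continuously": [],
        "provision_on_failover": [],
    }
    for tier in tiers:
        key = "replicate_continuously" if tier.stateful else "provision_on_failover"
        plan[key].append(tier.name)
    return plan


if __name__ == "__main__":
    app = [Tier("load-balancer", False), Tier("web", False), Tier("database", True)]
    print(plan_recovery(app))
```

Keeping only the stateful tiers running at the secondary site is what keeps the cost of this pattern below that of a full duplicate environment.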
  12. Challenges and Trade-Offs
  13. DEPLOYMENT (ACCIDENTAL) COMPLEXITY
       • Consistent deployment
       • Cross-zone configuration
       • Machine images, security groups, keys
       • Different APIs, zone/region hierarchies
      Accidental complexity: the higher we move on the HA scale, the less manageable the deployments become.
      Replication in itself is useless; it is the recovery orchestration that counts.
  14. COST OF REDUNDANCY
      Cost
       • Compute and storage cost
       • Bandwidth cost
      RTO/RPO impacting
       • VM startup time / instance acquisition
       • Latency/bandwidth across regions
       • General performance (IOPS, SSD)
      http://www.slideshare.net/mingtemp/a-performance-study-on-the-vm-startup-time-in-the-cloud
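The RTO-impacting factors listed above can be turned into a rough back-of-the-envelope estimate. The Python sketch below assumes a cold recovery in which VM startup, software installation, and data bootstrap over the WAN happen one after another. The function names and the sample numbers are illustrative assumptions, not measurements from the talk.

```python
def transfer_seconds(data_gb: float, bandwidth_mbps: float) -> float:
    """Time to move data_gb gigabytes over a bandwidth_mbps link (decimal units)."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    megabits = data_gb * 8 * 1000  # 1 GB = 8,000 megabits
    return megabits / bandwidth_mbps


def estimate_cold_rto_seconds(
    vm_startup_s: float,
    install_s: float,
    data_gb: float,
    bandwidth_mbps: float,
) -> float:
    """Rough RTO if the recovery steps run strictly in sequence."""
    return vm_startup_s + install_s + transfer_seconds(data_gb, bandwidth_mbps)


if __name__ == "__main__":
    # Hypothetical inputs: 2 min VM startup, 5 min install, 50 GB over 1 Gbit/s
    rto = estimate_cold_rto_seconds(120, 300, 50, 1000)
    print(f"estimated cold RTO: {rto / 60:.1f} minutes")
```

An estimate like this ignores latency, retries, and parallel provisioning, so it is better used to compare options than to promise a specific recovery time.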
  15. Architecting HA/DR Solutions with Cloudify
  16. APP CENTRIC DEVOPS
      Cloudify provides the equivalent of Amazon OpsWorks on OpenStack
      http://appcatalog.cloudifysource.org/
       • Nova, Cinder, Neutron, Heat
       • OpenShift, CloudFoundry
  17. ORCHESTRATORS, RECIPES, AND “CLOUDS”
      Cloud driver targets: Existing Data Center, OpenStack Private Cloud, OpenStack Public Cloud, OpenStack Micro Cloud
      Cloud: a set of shared compute, storage, and network resources behind an OpenStack API, e.g. resources in:
       • Availability zone
       • Region
       • Public cloud
      Examples:
       • HP Cloud US-West / AZ1
       • Rackspace Chicago (ORD) region
       • DevStack, Vagrant
       • Recipe development & DR testing
       • Bare metal or virtual environment
  18. KEY PRINCIPLES
       • Automation first (operational processes)
       • Decouple the application from the infrastructure (design for failure)
       • Use a plug-in approach to plug in the right cloud for the job (balance cost, complexity, testing)
       • Aggressive monitoring across the app stack
  19. KEY PRINCIPLES
       • Automation first (operational processes): Provision, Install, Configure, Deploy, Monitor, Scale
      https://github.com/CloudifySource/cloudify-recipes/
  20. KEY PRINCIPLES
       • Decouple the application from the infrastructure (design for failure): Storage, Network, Cloud Templates, Compute
  21. KEY PRINCIPLES
       • Use a plug-in approach to plug in the right cloud for the job (balance cost, complexity, testing)
  22. KEY PRINCIPLES
       • Aggressive monitoring across the app stack: scaling rules, automatic failover
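The last principle pairs monitoring with automatic failover. One common way to avoid failing over on a single missed check is to require several consecutive failures first. The Python sketch below shows that idea with an injectable health probe; the `FailoverMonitor` class, its parameters, and the threshold of three are assumptions for illustration and do not reflect how Cloudify implements monitoring.

```python
from typing import Callable


class FailoverMonitor:
    """Triggers failover once, after N consecutive failed health checks."""

    def __init__(
        self,
        probe: Callable[[], bool],
        on_failover: Callable[[], None],
        threshold: int = 3,
    ) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.probe = probe
        self.on_failover = on_failover
        self.threshold = threshold
        self.consecutive_failures = 0
        self.failed_over = False

    def tick(self) -> bool:
        """Run one health check; return True if failover fired on this tick."""
        if self.failed_over:
            return False
        if self.probe():
            # A healthy check resets the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.failed_over = True
            self.on_failover()
            return True
        return False


if __name__ == "__main__":
    # Simulated probe results: one blip that recovers, then a real outage.
    results = iter([True, False, False, True, False, False, False])
    monitor = FailoverMonitor(lambda: next(results), lambda: print("failing over"))
    for _ in range(7):
        monitor.tick()
```

In a real deployment the probe would be a check against the application stack and `on_failover` would start the recovery orchestration; both are left as callables here so the logic can be tested in isolation.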
  23. Case Studies (putting it all together)
  24. DR ELASTICITY CONTINUUM
       • Cold/Warm Disaster Recovery: higher RTO, lower cost
       • Hot Disaster Recovery: lower RTO, higher cost
      Application tiers along the continuum: Operationally Critical, Business Critical, Mission Critical
  25. OPERATIONALLY CRITICAL: COLD DR
      Characteristics; Design / Recipe Implementation
       • Financial services customer, post-trade processing application
       • Cold Disaster Recovery (clone your recipe on another cloud in case of disaster)
       • Recipes used for Disaster Recovery planning trade-off analysis
  26. BUSINESS CRITICAL: CROSS-REGION DR
      Characteristics; Design / Recipe Implementation
       • Transportation/Logistics
       • Big Data / Realtime Analytics
       • Autoscaling JBoss
       • 4 service recipes deployed across both regions
       • Recipes orchestrate setup, snapshot, and provisioning of PostgreSQL, Cassandra replication
       • Federated data between cloud controllers (failover, polling, SQL master/slave promotion)
  27. MISSION CRITICAL: IN-MEMORY WAN REPLICATION
      Characteristics; Design / Recipe Implementation
       • Transportation/Logistics
       • Replication as a Service: https://github.com/dfilppi/repl-service
       • Low-latency asynchronous replication across regions using in-memory replication technology (GigaSpaces XAP)
       • Topologies: Master-Slave, Master-Master, Hub/Spoke, Ring
       • Reference data, HTTP session sharing
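The four replication topologies named above differ only in which sites replicate to which. The Python sketch below models each as a list of directed links (source, target). The function names and the exact interpretation of each topology are assumptions for illustration; this is not the GigaSpaces XAP API or the code in the repl-service repository.

```python
from itertools import permutations

Link = tuple[str, str]  # (source site, target site): source replicates to target


def master_slave(master: str, slaves: list[str]) -> list[Link]:
    """One-way replication from a single master to each slave."""
    return [(master, slave) for slave in slaves]


def master_master(sites: list[str]) -> list[Link]:
    """Every site replicates to every other site (all ordered pairs)."""
    return list(permutations(sites, 2))


def hub_spoke(hub: str, spokes: list[str]) -> list[Link]:
    """Spokes exchange data only through the hub, in both directions."""
    links: list[Link] = []
    for spoke in spokes:
        links.append((hub, spoke))
        links.append((spoke, hub))
    return links


def ring(sites: list[str]) -> list[Link]:
    """Each site replicates to the next; the last wraps around to the first."""
    if len(sites) < 2:
        return []
    return [(sites[i], sites[(i + 1) % len(sites)]) for i in range(len(sites))]


if __name__ == "__main__":
    regions = ["us-west", "us-east", "eu"]  # hypothetical region names
    print(ring(regions))
    print(hub_spoke("us-west", ["us-east", "eu"]))
```

Counting links makes the trade-off visible: a full master-master mesh grows quadratically with the number of sites, while ring and hub/spoke grow linearly at the price of extra hops.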
  28. Resources
  29. TRY IT OUT TODAY
       • Join the community: http://www.cloudifysource.org
       • Try out and contribute some recipes: https://github.com/CloudifySource/cloudify-recipes
  30. Questions?
