Rightscale Webinar: Outage Proof Your Cloud Applications

What happens when your mission-critical application is unavailable because of a cloud outage? Do you have a disaster recovery plan? Are you prepared to fail over to an alternate cloud, or are you just crossing your fingers that you'll make it through unscathed?
The cloud isn't magic, it's a data center. And it's not "what if" the cloud goes down -- it's "when".
Join RightScale for a webinar to learn from the experts how to outage-proof your cloud applications. At RightScale, we have seen the good, the bad, and the ugly of cloud outages. Now is the time to build for failure and plan for the worst-case scenario.
In this webinar, we will:
- Level-set critical concepts: Fault tolerance, high availability and disaster recovery
- Show you how to design for failure
- Talk you through disaster recovery options that you can tailor based on your uptime requirements
- Share best practices for outage-proofing your cloud applications

Published in: Technology

  1. Outage-Proof Your Cloud Applications
     Brian Adler, Sr. Services Architect
     Roberto Monge, Cloud Solutions Engineer
     RightScale
     December 18, 2012
     Watch the video of this webinar
  2. Your Panel Today
     Presenting
     • Brian Adler, Sr. Services Architect, RightScale
     • Roberto Monge, Cloud Solutions Engineer, RightScale
     Q&A
     • Spencer Adams, Account Manager, RightScale
     • Noel Cohen, Account Manager, RightScale
     Please use the "Questions" window to ask questions any time!
     Cloud Management #rightscale
  3. Agenda
     • High Availability and Disaster Recovery
       • Terminology/Level-Setting
       • Designing for Failure
       • Cloud and component definitions
       • HA and DR configurations
     • Conclusions / Q&A
  4. Terminology
     • Fault tolerance: Ability of a system to continue operating properly (perhaps at a degraded level) if one or more components fail
     • High availability: Fault Tolerant systems are measured by their Availability in terms of planned and unplanned service outages for end users
     • Disaster recovery: The process, policies and procedures related to restoring critical systems after a catastrophic event
  5. Designing for Failure
     1 Large scale failures in the cloud are rare but do happen
     2 Application owners are ultimately responsible for availability and recoverability
     3 Need to balance cost and complexity of HA efforts against risks you are willing to bear
     4 Cloud infrastructure has made DR and HA remarkably affordable
       • Multi-server
       • Multi-Zone
       • Multi-Region
       • Multi-Cloud
  6. Cloud Isolation Definitions
     Definitions (Region / Zone)
     • Resources: Region = One or more geographically proximate Zones / Zone = Datacenter with separate power source
     • API endpoint, control plane: Shared / Shared
     • Local Area Network: Shared / Shared
     Clouds (Region term / Zone term)
     • Amazon Web Services: Region / Availability Zone
     • Rackspace: Region
     • Windows Azure: Region
     • Google Cloud Platform: Region / Availability Group
     • CloudStack: Region / Zone
     • OpenStack: Zone / Availability Zone
  7. Multi-Zone HA
     Diagram labels: DNS; 172.168.7.31 (US-EAST 1a), 172.168.8.62 (US-EAST 1b); LOAD BALANCERS; APP SERVERS (AUTOSCALE); MASTER DB, SLAVE DB; REPLICATE; EBS; SNAPSHOTS; S3
     • Consider distributed NoSQL databases with the same distribution considerations.
     • Spread primary and replica nodes across multiple zones.
     • Place as many APP SERVERS as you need for required resiliency.
     • Snapshot data volume for backups so the database can be readily recovered within the region.
     • Consider local storage for additional slave database to remove dependency on attached volume
     • Place Slave databases in one or more zones for failover.
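The advice to spread servers across zones can be illustrated with a minimal round-robin placement sketch. The function name, server names, and zone identifiers below are hypothetical and are not taken from the slides or from any RightScale API.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(server_names, zones):
    """Assign each server to a zone in round-robin order."""
    if not zones:
        raise ValueError("at least one zone is required")
    zone_cycle = cycle(zones)
    return {name: next(zone_cycle) for name in server_names}


# Hypothetical app servers and zones, for illustration only.
placement = spread_across_zones(
    ["app-1", "app-2", "app-3", "app-4"],
    ["us-east-1a", "us-east-1b"],
)
per_zone = Counter(placement.values())
print(placement)
print(per_zone)
```

With an even spread, losing one zone removes only that zone's share of the servers rather than the whole tier.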
  8. Multi-Region/Cloud DR Options
     • 99.999% availability, 0 downtime: Multi-Cloud HA (Live/Live Config)
     • 99.9% availability, < 5 mins downtime: Hot DR (Least Common)
     • 99.5% availability, < 1 hour downtime: Warm DR (Recommended)
     • 99% availability, > 1 hour downtime: Cold DR (Most Common)
     Cost axis: $ $$ $$$ $$$$
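The slide does not say how its Downtime column relates to the availability percentages. As a rough aid, the sketch below only converts each availability figure into the total downtime it would allow over one non-leap year; the function name is an assumption made for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent):
    """Yearly downtime budget implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Availability levels listed on the slide.
for pct in (99.999, 99.9, 99.5, 99.0):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% -> {minutes:.1f} min/year ({minutes / 60:.1f} h/year)")
```

Under this reading, 99.999% leaves a budget of only a few minutes per year, while 99% leaves several days.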
  9. Multi-Region Cold DR
     Staged Server Configuration and generally no staged data
     • Not recommended if rapid recovery is required
     • Slow to replicate data to other cloud and bring database online
     Diagram labels: DNS 172.168.7.31; CHICAGO, DALLAS; LOAD BALANCERS; APP SERVERS; MASTER DB, SLAVE DB, SLAVE DB; REPLICATE; CBS; SNAPSHOTS; CLOUD FILES
  10. Multi-Region Warm DR
     Staged Server Configuration, pre-staged data and running Slave Database Server
     • Generally recommended DR solution
     • Minimal additional cost and allows fairly rapid recovery
     Diagram labels: DNS 172.168.7.31; CHICAGO, DALLAS; LOAD BALANCERS; APP SERVERS; MASTER DB, SLAVE DB, SLAVE DB; REPLICATE (x2); CBS; SNAPSHOTS (x2); CLOUD FILES
  11. Multi-Region Hot DR
     Parallel Deployment with all servers running but all traffic going to primary
     • Not recommended
     • Very high additional cost to allow rapid recovery
     Diagram labels: DNS 172.168.7.31; CHICAGO, DALLAS; LOAD BALANCERS; APP SERVERS; MASTER DB, SLAVE DB, SLAVE DB; REPLICATE (x2); CBS; SNAPSHOTS (x2); CLOUD FILES
  12. Multi-Cloud HA
     Live/Live configuration. Geo-target IP services to direct traffic to regional LBs.
     • Possible, but not recommended (more to follow…)
     • Max additional cost and max availability, but complex to implement and manage
     Diagram labels: DNS; 172.168.7.31 (US-EAST), 172.168.8.62 (CHICAGO); LOAD BALANCERS; APP SERVERS; MASTER DB, SLAVE DB, SLAVE DB; REPLICATE (x2); EBS; SNAPSHOTS (x2); S3; SWIFT
  13. Multi-Cloud HA
     Looks similar to Multi-Zone… but additional problems to solve as some resources are not shared
     • You need DNS management or a global load balancer.
     • Security is an issue as security groups are Region-specific.
     • Machine Images are specific to the cloud/region.
     Diagram labels: DNS; 172.168.7.31 (US-EAST), 172.168.8.62 (CHICAGO); LOAD BALANCERS; APP SERVERS; MASTER DB, SLAVE DB, SLAVE DB; REPLICATE (x2); EBS VOLUME; SNAPSHOTS (x2); S3; SWIFT
  14. In the Dashboard
     Screenshot callouts:
     • Cost forecasting
     • Multi-region for DR or cloud environment
     • Multi-region Warm DR
     • Staged servers
  15. Automating HA and DR
     • Use dynamic DNS for your database servers
       • Allow app servers to use a single FQDN.
       • Use a low TTL to allow rapid failover in the case of a change in master database
     • Automatic connection of app servers to load balancing servers
       • App servers can connect to all load balancers automatically at launch
       • No manual intervention
       • No DNS modifications
     • Automated promotion of slave to master
       • Process is automated
       • Decision to run process is manual
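Why a low TTL speeds up database failover can be shown with a small simulation. This is a sketch under stated assumptions, not RightScale's mechanism: `TtlResolver`, the FQDN `db-master.example.com`, the private addresses, and the 60-second TTL are all hypothetical, and a dictionary stands in for the DNS service.

```python
import time


class TtlResolver:
    """Cache a name lookup for at most ttl_seconds, then look it up again."""

    def __init__(self, lookup, ttl_seconds, clock=time.monotonic):
        self._lookup = lookup      # callable: fqdn -> address
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}           # fqdn -> (address, expiry time)

    def resolve(self, fqdn):
        now = self._clock()
        cached = self._cache.get(fqdn)
        if cached is not None and now < cached[1]:
            return cached[0]       # still within TTL: reuse the old answer
        address = self._lookup(fqdn)
        self._cache[fqdn] = (address, now + self._ttl)
        return address


# Simulated DNS records and a controllable clock (both hypothetical).
records = {"db-master.example.com": "10.0.1.10"}
fake_now = [0.0]
resolver = TtlResolver(records.__getitem__, ttl_seconds=60,
                       clock=lambda: fake_now[0])

before = resolver.resolve("db-master.example.com")

# A slave is promoted and the single FQDN is repointed at it.
records["db-master.example.com"] = "10.0.2.20"

fake_now[0] = 30.0                 # TTL not yet expired
stale = resolver.resolve("db-master.example.com")

fake_now[0] = 61.0                 # TTL expired
after = resolver.resolve("db-master.example.com")

print(before, stale, after)
```

App servers keep using the old master's address until the cached record expires, so the TTL sets an upper bound on how long they lag behind a promotion.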
  16. How RightScale makes it possible
     MultiCloud Images
     • MultiCloud Images can be launched across regions and clouds without modification
     1 ServerTemplate contains a list of MultiCloud Images (MCIs)
     2 When the Server is created, a specific MCI is chosen.
     3 The appropriate RightImage is used at launch.
     Diagram: MultiCloud Images map Clouds A, B and C to RightImages (Image 1, Image 2); caption: Stability across clouds
  17. How RightScale makes it possible
     ServerTemplates, Tags, and Inputs
     • Automated load balancer registration and database connections
     • Autoscaling across zones
     • Dynamic configuration
  18. DR Cost Comparison Example
     • Multi-Region Cold DR: Total $4480 / month
       • Running, $4470 / month: 3 Load Balancers (Large), 6 App Servers (XLarge), 1 Master DB (2XLarge), 1 Slave DB (2XLarge)
       • Staged, $0 / month: 3 Load Balancers (Large), 6 App Servers (XLarge), 1 Slave DB (2XLarge)
       • Replication, $10 / month: 25GB / day cross-zone
     • Multi-Region Warm DR: Total $5630 / month
       • Running, $5540 / month: 3 Load Balancers (Large), 6 App Servers (XLarge), 1 Master DB (2XLarge), 2 Slave DB (2XLarge)
       • Staged, $0 / month: 3 Load Balancers (Large), 6 App Servers (XLarge)
       • Replication, $90 / month: 25GB / day cross-region
     • Multi-Region Hot DR: Total $8800 / month
       • Running, $8440 / month: 6 Load Balancers (Large), 12 App Servers (XLarge), 1 Master DB (2XLarge), 2 Slave DB (2XLarge)
       • Replication, $360 / month: 100GB / day cross-region
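The monthly figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only the dollar amounts from the slide; treating the Hot DR staged cost as 0 is an assumption, since the slide lists no staged servers for that option.

```python
# Monthly costs in USD, taken from the DR cost comparison slide.
# "staged" for Hot DR is assumed to be 0 (no staged servers are listed).
dr_costs = {
    "Cold DR": {"running": 4470, "staged": 0, "replication": 10, "total": 4480},
    "Warm DR": {"running": 5540, "staged": 0, "replication": 90, "total": 5630},
    "Hot DR": {"running": 8440, "staged": 0, "replication": 360, "total": 8800},
}

baseline = dr_costs["Cold DR"]["total"]
premiums = {}
for name, cost in dr_costs.items():
    computed = cost["running"] + cost["staged"] + cost["replication"]
    # Each total should equal running + staged + replication.
    assert computed == cost["total"], name
    premiums[name] = cost["total"] - baseline
    pct = 100 * premiums[name] / baseline
    print(f"{name}: ${cost['total']}/month, +${premiums[name]} ({pct:.1f}%) vs Cold DR")
```

On these numbers Warm DR costs about a quarter more than Cold DR, while Hot DR nearly doubles the bill.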
  19. Most Common Observed Cloud Outages
     • Outage of specific services in a zone
       • Degraded performance
       • E.g. EBS, ELB, RDS
     • Outage of specific services in a region
       • Control plane error or cascading problems
       • E.g. EBS
     • Outage of power or network in a zone
       • No connectivity
       • E.g. EC2, Azure
     • Capacity availability in a region during an outage
       • Not possible to provision instances, volumes, or other services
  20. Outage-Proofing Best Practices
     Place in >1 zone:
     • Load balancers
     • App servers
     • Databases
     Maintain capacity to absorb zone or region failures
     Replicate data across zones
     • Backup across regions & clouds
     • Monitor, alert, and automate operations to speed up failover
     Replicate data across zones
     Design stateless apps for resilience to reboot / relaunch
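"Maintain capacity to absorb zone or region failures" can be turned into a sizing rule of thumb: provision each zone so that the surviving zones alone can carry the full load. The sketch below is an illustration under that assumption; the function name and the example of 6 required app servers are hypothetical choices, not guidance from the slides.

```python
import math


def servers_per_zone(servers_needed, zone_count, zones_lost=1):
    """Servers to run in each zone so the load survives losing zones_lost zones."""
    surviving = zone_count - zones_lost
    if surviving < 1:
        raise ValueError("at least one zone must survive")
    return math.ceil(servers_needed / surviving)


# Example: the application needs 6 app servers to carry full load.
for zones in (2, 3):
    per_zone = servers_per_zone(6, zones)
    print(f"{zones} zones: {per_zone} per zone, {per_zone * zones} total")
```

Spreading across more zones reduces the spare capacity needed: with 6 servers required, two zones need 12 in total, while three zones need 9.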
  21. Next Steps
     • Learn: Building Scalable Applications in the Cloud Whitepaper
       • http://www.rightscale.com/info_center/white-papers/building-scalable-applications-in-the-cloud.php
     • Analyze: Deployment review of your environment
       • http://www.rightscale.com/about_us/contact_us.php
     • Try: Free Edition
       • www.rightscale.com/free
     Contact RightScale
     (866) 720-0208
     sales@rightscale.com
     www.rightscale.com