RightScale Webinar: Outage Proof Your Cloud Applications

What happens when your mission-critical application is unavailable because of a cloud outage? Do you have a disaster recovery plan? Are you prepared to fail over to an alternate cloud, or are you just crossing your fingers that you'll make it through unscathed?
The cloud isn't magic, it's a data center. And it's not "what if" the cloud goes down -- it's "when".
Join RightScale for a webinar to learn from the experts how to outage-proof your cloud applications. At RightScale, we have seen the good, the bad, and the ugly of cloud outages. Now is the time to build for failure and plan for the worst-case scenario.
In this webinar, we will:
- Level-set critical concepts: Fault tolerance, high availability and disaster recovery
- Show you how to design for failure
- Talk you through disaster recovery options that you can tailor based on your uptime requirements
- Share best practices for outage-proofing your cloud applications

  • Cold DR (Most common... hours): Staged Server Configuration and generally no staged data. Bring up the servers and load the data to fail over. Cold DR failover is typically manual.
  • Warm DR (Recommended... >hour): Staged Server Configuration, pre-staged data and running Database Slave Server. Warm DR failover is typically manual but can be automated.
  • Hot DR (Least common... but needed if <5 min): Parallel Deployment with all servers running but all traffic going to primary. Hot DR failover is normally automated.
  • Hot HA: Live/Live configuration. May use Geo-target IP services to direct traffic to regional load balancers. Fail over to the other region if one has problems. Hot HA is normally seamlessly automated.
  • Note: Other costs such as IOPS, volumes, other bandwidth, object storage, and snapshot storage are additional.

Transcript

  • 1. Outage-Proof Your Cloud Applications. Brian Adler, Sr. Services Architect; Roberto Monge, Cloud Solutions Engineer; RightScale. December 18, 2012.
  • 2. Your Panel Today
    • Presenting: Brian Adler, Sr. Services Architect, RightScale; Roberto Monge, Cloud Solutions Engineer, RightScale
    • Q&A: Spencer Adams, Account Manager, RightScale; Noel Cohen, Account Manager, RightScale
    • Please use the “Questions” window to ask questions any time!
    Cloud Management #rightscale
  • 3. Agenda
    • High Availability and Disaster Recovery
      • Terminology/Level-Setting
      • Designing for Failure
      • Cloud and component definitions
      • HA and DR configurations
    • Conclusions / Q&A
  • 4. Terminology
    • Fault tolerance: Ability of a system to continue operating properly (perhaps at a degraded level) if one or more components fail
    • High availability: Fault Tolerant systems are measured by their Availability in terms of planned and unplanned service outages for end users
    • Disaster recovery: The process, policies and procedures related to restoring critical systems after a catastrophic event
  • 5. Designing for Failure
      1. Large scale failures in the cloud are rare but do happen
      2. Application owners are ultimately responsible for availability and recoverability
      3. Need to balance cost and complexity of HA efforts against risks you are willing to bear
      4. Cloud infrastructure has made DR and HA remarkably affordable
    • Multi-server
    • Multi-Zone
    • Multi-Region
    • Multi-Cloud
  • 6. Cloud Isolation Definitions
    • Resources
      • Region: One or more geographically proximate Zones
      • Zone: Datacenter with separate power source
    • API endpoint, control plane: Region: Shared; Zone: Shared
    • Local Area Network: Region: Shared; Zone: Shared
    • Clouds (term used for Region / term used for Zone)
      • Amazon Web Services: Region / Availability Zone
      • Rackspace: Region
      • Windows Azure: Region
      • Google Cloud Platform: Region / Availability Group
      • CloudStack: Region / Zone
      • OpenStack: Zone / Availability Zone
  • 7. Multi-Zone HA
    • Consider distributed NoSQL databases with the same distribution considerations. Spread primary and replica nodes across multiple zones.
    • Place as many [servers] as you need for required resiliency.
    • Place Slave databases in one or more zones for failover.
    • Snapshot data volume for backups so the database can be readily recovered within the region.
    • Consider local storage for additional slave database to remove dependency on attached volume.
    • [Diagram labels: DNS (172.168.7.31, 172.168.8.62); US-EAST 1a, US-EAST 1b; LOAD BALANCERS; APP SERVERS (AUTOSCALE); MASTER DB, SLAVE DB, REPLICATE; EBS, SNAPSHOTS, S3]
  • 8. Multi-Region/Cloud DR Options (availability, downtime; cost axis runs from $ to $$$$)
    • 99.999%, 0: Multi-Cloud HA (Live/Live Config)
    • 99.9%, < 5 Mins: Hot DR (Least Common)
    • 99.5%, < 1 Hour: Warm DR (Recommended)
    • 99%, > 1 Hour: Cold DR (Most Common)
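
As a rough illustration of the slide above, the sketch below encodes the four options with the availability and downtime figures from the slide, then picks the lowest-cost option whose failover time fits a target. The cost ranking (Cold DR cheapest, Multi-Cloud HA most expensive) is an assumption read off the $ to $$$$ axis, and treating Cold DR's "> 1 Hour" as unbounded is an illustrative choice. The conversion from an availability percentage to an annual downtime budget is ordinary arithmetic, not something the slide states.

```python
import math
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


@dataclass(frozen=True)
class Option:
    name: str
    availability_pct: float      # figure from the slide
    max_failover_minutes: float  # upper bound on downtime per failover
    cost_rank: int               # assumed: 1 = cheapest ($), 4 = most expensive ($$$$)


OPTIONS = [
    Option("Cold DR", 99.0, math.inf, 1),   # "> 1 Hour": no upper bound assumed
    Option("Warm DR", 99.5, 60, 2),
    Option("Hot DR", 99.9, 5, 3),
    Option("Multi-Cloud HA", 99.999, 0, 4),
]


def annual_downtime_minutes(availability_pct):
    """Downtime per year implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def cheapest_option(target_minutes):
    """Lowest-cost option whose worst-case failover time fits the target.

    Pass math.inf when there is no recovery-time requirement.
    """
    candidates = [o for o in OPTIONS if o.max_failover_minutes <= target_minutes]
    return min(candidates, key=lambda o: o.cost_rank)


if __name__ == "__main__":
    for target in (0, 30, 60, math.inf):
        print(target, "->", cheapest_option(target).name)
    for option in OPTIONS:
        budget = annual_downtime_minutes(option.availability_pct)
        print(f"{option.name}: about {budget:.1f} minutes of downtime per year")
```

Because Cold DR has no upper bound in this sketch, it is only selected when no recovery-time target is given; any finite target pushes the choice to Warm DR or above.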
  • 9. Multi-Region Cold DR
    Staged Server Configuration and generally no staged data
    • Not recommended if rapid recovery is required
    • Slow to replicate data to other cloud and bring database online
    • [Diagram labels: DNS (172.168.7.31); CHICAGO, DALLAS; LOAD BALANCERS; APP SERVERS; MASTER DB, SLAVE DB, SLAVE DB, REPLICATE; CBS, SNAPSHOTS, CLOUD FILES]
  • 10. Multi-Region Warm DR
    Staged Server Configuration, pre-staged data and running Slave Database Server
    • Generally recommended DR solution
    • Minimal additional cost and allows fairly rapid recovery
    • [Diagram labels: DNS (172.168.7.31); CHICAGO, DALLAS; LOAD BALANCERS; APP SERVERS; MASTER DB, SLAVE DB, SLAVE DB, REPLICATE; CBS, SNAPSHOTS, CLOUD FILES]
  • 11. Multi-Region Hot DR
    Parallel Deployment with all servers running but all traffic going to primary
    • Not recommended
    • Very high additional cost to allow rapid recovery
    • [Diagram labels: DNS (172.168.7.31); CHICAGO, DALLAS; LOAD BALANCERS; APP SERVERS; MASTER DB, SLAVE DB, SLAVE DB, REPLICATE; CBS, SNAPSHOTS, CLOUD FILES]
  • 12. Multi-Cloud HA
    Live/Live configuration. Geo-target IP services to direct traffic to regional LBs.
    • Possible, but not recommended (more to follow…)
    • Max additional cost and max availability, but complex to implement and manage
    • [Diagram labels: DNS (172.168.7.31, 172.168.8.62); US-EAST, CHICAGO; LOAD BALANCERS; APP SERVERS; MASTER DB, SLAVE DB, SLAVE DB, REPLICATE; EBS, SNAPSHOTS, S3, SWIFT]
  • 13. Multi-Cloud HA
    Looks similar to Multi-Zone… but additional problems to solve as some resources are not shared
    • You need DNS management or a global load balancer.
    • Security is an issue as security groups are Region-specific.
    • Machine Images are specific to the cloud/region.
    • [Diagram labels: DNS (172.168.7.31, 172.168.8.62); US-EAST, CHICAGO; LOAD BALANCERS; APP SERVERS; MASTER DB, SLAVE DB, SLAVE DB, REPLICATE; EBS VOLUME, SNAPSHOTS, S3, SWIFT]
  • 14. In the Dashboard
    • [Screenshot callouts: Cost forecasting; Multi-region for DR or cloud environment; Multi-region Warm DR; Staged servers]
  • 15. Automating HA and DR
    • Use dynamic DNS for your database servers
      • Allow app servers to use a single FQDN.
      • Use a low TTL to allow rapid failover in the case of a change in master database
    • Automatic connection of app servers to load balancing servers
      • App servers can connect to all load balancers automatically at launch
      • No manual intervention
      • No DNS modifications
    • Automated promotion of slave to master
      • Process is automated
      • Decision to run process is manual
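
To make the low-TTL point on this slide concrete, here is a small simulation (a sketch, not RightScale's implementation). An app server resolves a single FQDN for the database through a cache that honors the record's TTL, so once the record is repointed to a promoted slave, the app server keeps using the old address for at most one TTL. The hostname, the addresses, and the `FakeDNS` and `CachingResolver` classes are hypothetical names invented for this sketch, and time is passed in explicitly rather than read from a clock.

```python
class FakeDNS:
    """Stands in for a dynamic DNS service: one record per name, with a TTL."""

    def __init__(self):
        self.records = {}  # name -> (address, ttl_seconds)

    def set_record(self, name, address, ttl_seconds):
        self.records[name] = (address, ttl_seconds)

    def lookup(self, name):
        return self.records[name]


class CachingResolver:
    """App-server side resolver that caches each answer until its TTL expires."""

    def __init__(self, dns):
        self.dns = dns
        self.cache = {}  # name -> (address, expires_at)

    def resolve(self, name, now):
        cached = self.cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # cached answer is still within its TTL
        address, ttl = self.dns.lookup(name)
        self.cache[name] = (address, now + ttl)
        return address


FQDN = "master-db.example.com"  # hypothetical single FQDN used by all app servers

dns = FakeDNS()
dns.set_record(FQDN, "10.0.0.10", ttl_seconds=60)  # original master
app = CachingResolver(dns)

before = app.resolve(FQDN, now=0)                   # answer cached until t=60
dns.set_record(FQDN, "10.0.1.20", ttl_seconds=60)   # slave promoted, record repointed
stale = app.resolve(FQDN, now=30)                   # cache not yet expired
after = app.resolve(FQDN, now=60)                   # TTL expired, fresh lookup

print(before, stale, after)
```

The gap between the record change and the app server noticing it is bounded by the TTL, which is why a lower TTL shortens failover at the price of more frequent lookups.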
  • 16. How RightScale makes it possible: MultiCloud Images
    • MultiCloud Images can be launched across regions and clouds without modification
      1. ServerTemplate contains a list of MultiCloud Images (MCIs)
      2. When the Server is created, a specific MCI is chosen.
      3. The appropriate RightImage is used at launch.
    • Stability across clouds
    • [Diagram labels: MultiCloud Images; Cloud A, Cloud B, Cloud C; Image 1, Image 2; RightImage]
  • 17. How RightScale makes it possible: ServerTemplates, Tags, and Inputs
    • Automated load balancer registration and database connections
    • Autoscaling across zones
    • Dynamic configuration
  • 18. DR Cost Comparison Example
    • Multi-Region Cold DR: Total $4480 / month
      • Running, $4470 / month: 3 Load Balancers (Large), 6 App Servers (XLarge), 1 Master DB (2XLarge), 1 Slave DB (2XLarge)
      • Staged, $0 / month: 3 Load Balancers (Large), 6 App Servers (XLarge), 1 Slave DB (2XLarge)
      • Replication, $10 / month: 25GB / day cross-zone
    • Multi-Region Warm DR: Total $5630 / month
      • Running, $5540 / month: 3 Load Balancers (Large), 6 App Servers (XLarge), 1 Master DB (2XLarge), 2 Slave DB (2XLarge)
      • Staged, $0 / month: 3 Load Balancers (Large), 6 App Servers (XLarge)
      • Replication, $90 / month: 25GB / day cross-region
    • Multi-Region Hot DR: Total $8800 / month
      • Running, $8440 / month: 6 Load Balancers (Large), 12 App Servers (XLarge), 1 Master DB (2XLarge), 2 Slave DB (2XLarge)
      • Replication, $360 / month: 100GB / day cross-region
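
The arithmetic behind this comparison can be checked with a few lines of Python. The monthly figures are copied from the slide; the staged cost for Hot DR is not listed there and is assumed to be $0 in this sketch, and the "premium over Cold DR" percentage is a derived number that the slide does not state.

```python
# Monthly costs in USD, copied from the slide.
COSTS = {
    "Cold DR": {"running": 4470, "staged": 0, "replication": 10},
    "Warm DR": {"running": 5540, "staged": 0, "replication": 90},
    # The slide shows no staged line for Hot DR; 0 is an assumption.
    "Hot DR": {"running": 8440, "staged": 0, "replication": 360},
}


def monthly_total(tier):
    """Sum of running, staged, and replication costs for one DR tier."""
    return sum(COSTS[tier].values())


def premium_over_cold_pct(tier):
    """Extra monthly cost of a tier relative to Cold DR, as a percentage."""
    base = monthly_total("Cold DR")
    return 100 * (monthly_total(tier) - base) / base


if __name__ == "__main__":
    for tier in COSTS:
        print(f"{tier}: ${monthly_total(tier)} / month, "
              f"{premium_over_cold_pct(tier):.1f}% over Cold DR")
```

The sums reproduce the slide's totals of $4480, $5630, and $8800 per month. As the speaker note on costs points out, IOPS, volumes, other bandwidth, object storage, and snapshot storage would come on top of these figures.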
  • 19. Most Common Observed Cloud Outages
    • Outage of specific services in a zone
      • Degraded performance
      • E.g. EBS, ELB, RDS
    • Outage of specific services in a region
      • Control plane error or cascading problems
      • E.g. EBS
    • Outage of power or network in a zone
      • No connectivity
      • E.g. EC2, Azure
    • Capacity availability in a region during an outage
      • Not possible to provision instances, volumes, or other services
  • 20. Outage-Proofing Best Practices
    • Place in >1 zone:
      • Load balancers
      • App servers
      • Databases
    • Maintain capacity to absorb zone or region failures
    • Replicate data across zones
    • Back up across regions & clouds
    • Monitor, alert, and automate operations to speed up failover
    • Design stateless apps for resilience to reboot / relaunch
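
One way to read "maintain capacity to absorb zone or region failures" is as N+1 sizing across zones: provision each zone so that the surviving zones can still carry the full load. The sketch below works through that arithmetic under simplifying assumptions that are not from the slides: servers are identical, load rebalances evenly, and the function names are made up for illustration.

```python
import math


def servers_per_zone(servers_needed, zones, zones_lost=1):
    """Servers to run in each zone so the survivors still cover the full load."""
    surviving = zones - zones_lost
    if surviving < 1:
        raise ValueError("at least one zone must survive")
    return math.ceil(servers_needed / surviving)


def total_provisioned(servers_needed, zones, zones_lost=1):
    """Total servers across all zones under the sizing rule above."""
    return zones * servers_per_zone(servers_needed, zones, zones_lost)


if __name__ == "__main__":
    # Example: the load needs 6 app servers; compare 2 zones with 3 zones.
    for zones in (2, 3):
        print(zones, "zones:",
              servers_per_zone(6, zones), "per zone,",
              total_provisioned(6, zones), "in total")
```

Under these assumptions, spreading across more zones lowers the overhead needed to survive the loss of any single zone.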
  • 21. Next Steps
    • Learn: Building Scalable Applications in the Cloud Whitepaper
      • http://www.rightscale.com/info_center/white-papers/building-scalable-applications-in-the-cloud.php
    • Analyze: Deployment review of your environment
      • http://www.rightscale.com/about_us/contact_us.php
    • Try: Free Edition
      • www.rightscale.com/free
    • Contact RightScale: (866) 720-0208, sales@rightscale.com, www.rightscale.com