RightScale Webinar: Outage-Proof Your Cloud Applications

Outage-Proof Your Cloud Applications
Brian Adler, Sr. Services Architect
Roberto Monge, Cloud Solutions Engineer
RightScale

December 18, 2012


2#

Your Panel Today
Presenting
• Brian Adler, Sr. Services Architect, RightScale
• Roberto Monge, Cloud Solutions Engineer, RightScale

Q&A
• Spencer Adams, Account Manager, RightScale
• Noel Cohen, Account Manager, RightScale

Please use the “Questions” window to ask questions any time!


3#

Agenda
• High Availability and Disaster Recovery
• Terminology/Level-Setting
• Designing for Failure
• Cloud and component definitions
• HA and DR configurations
• Conclusions / Q&A


4#

Terminology

• Fault Tolerance: Ability of a system to continue operating properly (perhaps at a degraded level) if one or more components fail
• High Availability: Fault Tolerant systems are measured by their Availability in terms of planned and unplanned service outages for end users
• Disaster Recovery: The process, policies and procedures related to restoring critical systems after a catastrophic event


5#

Designing for Failure

1 Large-scale failures in the cloud are rare but do happen
2 Application owners are ultimately responsible for availability and recoverability
3 Need to balance cost and complexity of HA efforts against risks you are willing to bear
4 Cloud infrastructure has made DR and HA remarkably affordable
  • Multi-server
  • Multi-Zone
  • Multi-Region
  • Multi-Cloud


6#

Cloud Isolation Definitions
                              Region                                       Zone
Resources                     One or more geographically proximate Zones   Datacenter with separate power source
API endpoint, control plane   Shared                                       Shared
Local Area Network            Shared                                       Shared

Clouds                        Region                                       Zone
Amazon Web Services           Region                                       Availability Zone
Rackspace                     Region
Windows Azure                 Region
Google Cloud Platform         Region                                       Availability Group
CloudStack                    Region                                       Zone
OpenStack                     Zone                                         Availability Zone


7#

Multi-Zone HA
[Diagram: DNS (172.168.7.31, 172.168.8.62); zones US-EAST 1a and US-EAST 1b, each with load balancers; autoscaling app servers; master DB replicating to slave DB; EBS snapshots stored in S3]
• Consider distributed NoSQL databases with the same distribution considerations. Spread primary and replica nodes across multiple zones.
• Place as many servers as you need for required resiliency.
• Snapshot data volume for backups so the database can be readily recovered within the region.
• Place Slave databases in one or more zones for failover.
• Consider local storage for additional slave database to remove dependency on attached volume.
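
The placement advice above (spread servers and database replicas across zones) can be sketched as a simple round-robin assignment. This is a minimal illustration only, not RightScale's placement logic: the function names `spread_across_zones` and `survivors` and the node names are assumptions, and the zone names are adapted from the diagram.

```python
from collections import defaultdict
from itertools import cycle


def spread_across_zones(nodes, zones):
    """Assign nodes to zones round-robin so zone sizes differ by at most one."""
    if not zones:
        raise ValueError("at least one zone is required")
    placement = defaultdict(list)
    for node, zone in zip(nodes, cycle(zones)):
        placement[zone].append(node)
    return dict(placement)


def survivors(placement, failed_zone):
    """Return the nodes that remain available after one zone is lost."""
    return [n for z, ns in placement.items() if z != failed_zone for n in ns]


# Hypothetical node names; zones follow the diagram (US-EAST 1a / 1b).
zones = ["us-east-1a", "us-east-1b"]
db_nodes = ["master-db", "slave-db-1", "slave-db-2"]

placement = spread_across_zones(db_nodes, zones)
print(placement)
print(survivors(placement, "us-east-1a"))
```

With two zones and three database nodes, losing the zone that holds the master still leaves a slave in the other zone that can be promoted.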


8#

Multi-Region/Cloud DR Options
Availability   Downtime    Configuration                       Cost
99.999%        0           Multi-Cloud HA (Live/Live Config)   $$$$
99.9%          < 5 Mins    Hot DR (Least Common)               $$$
99.5%          < 1 Hour    Warm DR (Recommended)               $$
99%            > 1 Hour    Cold DR (Most Common)               $
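
For a sense of scale, an availability percentage can be converted into a yearly downtime budget. The sketch below is plain arithmetic, not a RightScale tool, and the function name `allowed_downtime_minutes` is an assumption. Note that the Downtime column above appears to describe time to recover from a single outage, which is a different measure from the yearly budget computed here.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def allowed_downtime_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime permitted in a period at a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return period_minutes * (100 - availability_percent) / 100


for target in (99.0, 99.5, 99.9, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.1f} min/year ({minutes / 60:.1f} h)")
```

At 99% the budget is about 87.6 hours per year; at 99.999% it is a little over 5 minutes per year.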


9#

Multi-Region Cold DR
Staged Server Configuration and generally no staged data
• Not recommended if rapid recovery is required
• Slow to replicate data to other cloud and bring database online
[Diagram: DNS (172.168.7.31); CHICAGO and DALLAS regions; app servers, master DB and slave DBs with replication; CBS snapshots stored in Cloud Files]


10#

Multi-Region Warm DR
Staged Server Configuration, pre-staged data and running Slave Database Server
• Generally recommended DR solution
• Minimal additional cost and allows fairly rapid recovery
[Diagram: DNS (172.168.7.31); CHICAGO and DALLAS regions; database replication; CBS snapshots stored in Cloud Files]


11#

Multi-Region Hot DR
Parallel Deployment with all servers running but all traffic going to primary
• Not recommended
• Very high additional cost to allow rapid recovery
[Diagram: DNS (172.168.7.31); CHICAGO and DALLAS regions; database replication; CBS snapshots stored in Cloud Files]


12#

Multi-Cloud HA
Live/Live configuration. Geo-target IP services to direct traffic to regional LBs.
• Possible, but not recommended (more to follow…)
• Max additional cost and max availability, but complex to implement and manage
[Diagram: DNS (172.168.7.31, 172.168.8.62); US-EAST and CHICAGO; database replication; EBS snapshots stored in S3 and snapshots stored in Swift]


13#

Multi-Cloud HA
Looks similar to Multi-Zone… but additional problems to solve as some resources are not shared
• You need DNS management or a global load balancer.
• Security is an issue as security groups are Region-specific.
• Machine Images are specific to the cloud/region.
[Diagram: DNS (172.168.7.31, 172.168.8.62); load balancers in US-EAST and CHICAGO; database replication; EBS volume snapshots stored in S3 and snapshots stored in Swift]


14#

In the Dashboard

[Dashboard screenshot with callouts:]
• Cost forecasting for DR environment
• Multi-region or cloud
• Multi-region Warm DR
• Staged servers


15#

Automating HA and DR
• Use dynamic DNS for your database servers
  • Allow app servers to use a single FQDN.
  • Use a low TTL to allow rapid failover in the case of a change in master database
• Automatic connection of app servers to load balancing servers
  • App servers can connect to all load balancers automatically at launch
  • No manual intervention
  • No DNS modifications
• Automated promotion of slave to master
  • Process is automated
  • Decision to run process is manual
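
The interplay of a single FQDN, a low TTL, and a manually triggered but automated promotion can be sketched with a toy in-memory model. Everything here is hypothetical: `DynamicDns`, `AppServer`, `promote_slave`, the hostname, and the addresses are illustrative stand-ins, not a real DNS provider API or RightScale script. A simulated clock is used so the example runs instantly.

```python
import time


class DynamicDns:
    """Toy in-memory stand-in for a dynamic DNS provider (hypothetical)."""

    def __init__(self, ttl_seconds=60):
        self.ttl_seconds = ttl_seconds
        self._records = {}

    def update(self, fqdn, address):
        self._records[fqdn] = address

    def resolve(self, fqdn):
        return self._records[fqdn], self.ttl_seconds


class AppServer:
    """Caches the master DB address and re-resolves once the TTL expires."""

    def __init__(self, dns, db_fqdn, clock=time.monotonic):
        self.dns, self.db_fqdn, self.clock = dns, db_fqdn, clock
        self._cached = None
        self._expires_at = 0.0

    def db_address(self):
        now = self.clock()
        if self._cached is None or now >= self._expires_at:
            self._cached, ttl = self.dns.resolve(self.db_fqdn)
            self._expires_at = now + ttl
        return self._cached


def promote_slave(dns, db_fqdn, slave_address, operator_confirmed):
    """Automated promotion steps, gated on a manual go/no-go decision."""
    if not operator_confirmed:
        return False
    # A real promotion would also stop replication and make the slave writable.
    dns.update(db_fqdn, slave_address)
    return True


now = [0.0]  # simulated clock, in seconds
dns = DynamicDns(ttl_seconds=60)
dns.update("master-db.example.com", "10.0.1.10")
app = AppServer(dns, "master-db.example.com", clock=lambda: now[0])

print(app.db_address())  # first lookup, result is cached for the TTL
promote_slave(dns, "master-db.example.com", "10.0.2.20", operator_confirmed=True)
now[0] = 30.0
print(app.db_address())  # TTL not yet expired, cached address is still used
now[0] = 61.0
print(app.db_address())  # TTL expired, the app server re-resolves
```

The TTL bounds how long app servers keep talking to the old master after promotion, which is why a low value matters for rapid failover.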


16#

How RightScale makes it possible
MultiCloud Images
• MultiCloud Images can be launched across regions and clouds without modification
1 ServerTemplate contains a list of MultiCloud Images (MCIs)
2 When the Server is created, a specific MCI is chosen.
3 The appropriate RightImage is used at launch.
• Stability across clouds
[Diagram: a ServerTemplate listing MultiCloud Images ("Cloud A, B, Image 1" and "Cloud A, C, Image 2"); the chosen MCI ("Cloud A, B, Image 1") resolves to a RightImage ("Cloud B, Image 1") at launch]
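
The three steps above can be modelled as a lookup from cloud to image. This is a rough sketch under stated assumptions, not the RightScale API: the classes `MultiCloudImage` and `ServerTemplate`, the method names, and the image identifiers are hypothetical, and the "first matching MCI" rule is an assumption because the slide does not say how an MCI is chosen.

```python
from dataclasses import dataclass, field


@dataclass
class MultiCloudImage:
    name: str
    images_by_cloud: dict  # cloud name -> image identifier

    def image_for(self, cloud):
        """Step 3: resolve the concrete image for the launch cloud."""
        try:
            return self.images_by_cloud[cloud]
        except KeyError:
            raise LookupError(f"{self.name} has no image for {cloud}") from None


@dataclass
class ServerTemplate:
    name: str
    mcis: list = field(default_factory=list)  # Step 1: a list of MCIs

    def choose_mci(self, cloud):
        """Step 2: pick an MCI for the server (assumed rule: first match)."""
        for mci in self.mcis:
            if cloud in mci.images_by_cloud:
                return mci
        raise LookupError(f"no MCI in {self.name} supports {cloud}")


template = ServerTemplate(
    name="app-server",
    mcis=[
        MultiCloudImage("mci-1", {"cloud-a": "image-1", "cloud-b": "image-1b"}),
        MultiCloudImage("mci-2", {"cloud-a": "image-2", "cloud-c": "image-2c"}),
    ],
)

for cloud in ("cloud-a", "cloud-b", "cloud-c"):
    mci = template.choose_mci(cloud)
    print(cloud, mci.name, mci.image_for(cloud))
```

The point of the indirection is that the same template definition can be launched in any cloud for which one of its MCIs carries an image.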


17#

How RightScale makes it possible
ServerTemplates, Tags, and Inputs
• Automated load balancer registration and database connections
• Autoscaling across zones
• Dynamic configuration


18#

DR Cost Comparison Example
              Multi-Region Cold DR        Multi-Region Warm DR        Multi-Region Hot DR
Total         $4480 / month               $5630 / month               $8800 / month

Running       $4470 / month               $5540 / month               $8440 / month
              3 Load Balancers (Large)    3 Load Balancers (Large)    6 Load Balancers (Large)
              6 App Servers (XLarge)      6 App Servers (XLarge)      12 App Servers (XLarge)
              1 Master DB (2XLarge)       1 Master DB (2XLarge)       1 Master DB (2XLarge)
              1 Slave DB (2XLarge)        2 Slave DB (2XLarge)        2 Slave DB (2XLarge)

Staged        $0 / month                  $0 / month
              3 Load Balancers (Large)    3 Load Balancers (Large)
              6 App Servers (XLarge)      6 App Servers (XLarge)
              1 Slave DB (2XLarge)

Replication   $10 / month                 $90 / month                 $360 / month
              25GB / day cross-zone       25GB / day cross-region     100GB / day cross-region
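
The totals above are the sum of the running, staged, and replication lines. The short check below reproduces them and shows the premium each option carries over Cold DR. The figures are copied from the comparison; the Hot DR staged cost is entered as 0 on the assumption that the empty cell means there are no staged servers.

```python
# Monthly figures in USD, copied from the comparison above.
costs = {
    "Cold DR": {"running": 4470, "staged": 0, "replication": 10},
    "Warm DR": {"running": 5540, "staged": 0, "replication": 90},
    # Assumption: no staged servers in Hot DR, so staged cost is 0.
    "Hot DR": {"running": 8440, "staged": 0, "replication": 360},
}


def monthly_total(parts):
    """Sum the cost components of one DR option."""
    return sum(parts.values())


baseline = monthly_total(costs["Cold DR"])
for option, parts in costs.items():
    total = monthly_total(parts)
    premium = total - baseline
    print(f"{option}: ${total}/month, +${premium} ({premium / baseline:.0%}) over Cold DR")
```

Warm DR comes out roughly 26% above Cold DR, while Hot DR is roughly 96% above it, which matches the recommendation of Warm DR as the usual trade-off.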


19#

Most Common Observed Cloud Outages
• Outage of specific services in a zone
  • Degraded performance
  • E.g. EBS, ELB, RDS
• Outage of specific services in a region
  • Control plane error or cascading problems
  • E.g. EBS
• Outage of power or network in a zone
  • No connectivity
  • E.g. EC2, Azure
• Capacity availability in a region during an outage
  • Not possible to provision instances, volumes, or other services


20#

Outage-Proofing Best Practices

• Place in >1 zone:
  • Load balancers
  • App servers
  • Databases
• Maintain capacity to absorb zone or region failures
• Replicate data across zones
• Backup across regions & clouds
• Monitor, alert, and automate operations to speed up failover
• Design stateless apps for resilience to reboot / relaunch
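
"Maintain capacity to absorb zone or region failures" implies running more servers than steady-state load requires. A minimal sizing sketch, assuming load is spread evenly across zones and that the surviving zones must carry the full load; the function name `servers_per_zone` and the figure of 6 servers are illustrative assumptions.

```python
import math


def servers_per_zone(servers_needed, zones, zone_failures_tolerated=1):
    """Servers to run in each zone so the surviving zones can still
    carry the full load after the tolerated number of zone failures."""
    surviving = zones - zone_failures_tolerated
    if surviving < 1:
        raise ValueError("need at least one surviving zone")
    return math.ceil(servers_needed / surviving)


needed = 6  # servers required to serve peak load (illustrative)
for zones in (2, 3):
    per_zone = servers_per_zone(needed, zones)
    print(f"{zones} zones: {per_zone} per zone, {per_zone * zones} total")
```

With two zones the fleet doubles (12 servers for a load of 6), while three zones need only 50% headroom (9 servers), so adding zones reduces the cost of absorbing a single-zone failure.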


21#

Next Steps
• Learn: Building Scalable Applications in the Cloud Whitepaper
  • http://www.rightscale.com/info_center/white-papers/building-scalable-applications-in-the-cloud.php
• Analyze: Deployment review of your environment
  • http://www.rightscale.com/about_us/contact_us.php
• Try: Free Edition
  • www.rightscale.com/free

Contact RightScale
(866) 720-0208
sales@rightscale.com
www.rightscale.com
