4. Link Aggregation
Storage Multi-pathing
VM Live Migration
Multi-tier Apps
VM Backup/Snapshots
Multi-site Redundancy
Chaos Monkey
Ephemeral Resources
Traditional Workload: Expect reliability. Back up the entire cloud. Admin-controlled failure handling. Think Server Virtualization 1.0.
Distributed Cloud-era Workload: Expect failure. Design the app for failure. Self-service failure handling. Think Amazon Web Services.
Workload reliability drives unique requirements
5. CloudStack
CloudStack Supports Both Workloads
[Diagram labels]
Cloud-Era Availability Zone: Software Defined Networks (e.g., Security Groups, EIP, ELB, ...), server racks, Elastic Block Storage
Traditional Availability Zone: vCenter, ESXi clusters, Enterprise Networking (e.g., VLAN), Enterprise Storage (e.g., SAN)
6. Problem Statement
• How to measure business continuance
• High availability
• Fault tolerance
• Disaster Recovery
• How to build applications for clouds
• How does Apache CloudStack enable highly available and fault-tolerant applications?
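One common way to put a number on high availability is steady-state availability, MTBF / (MTBF + MTTR), combined across redundant copies. The sketch below is a minimal illustration, assuming failures are independent; the function names and the sample figures are illustrative assumptions and do not come from the slides.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one component: uptime / total time."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(single: float, copies: int) -> float:
    """Availability of N redundant copies, assuming independent failures.

    The service is down only when every copy is down at the same time.
    """
    return 1.0 - (1.0 - single) ** copies


def downtime_minutes_per_year(avail: float) -> float:
    """Expected downtime in minutes over a 365-day year."""
    return (1.0 - avail) * 365 * 24 * 60


if __name__ == "__main__":
    # Assumed figures: a zone that fails every 990 hours and takes 10 to repair.
    one_zone = availability(mtbf_hours=990.0, mttr_hours=10.0)
    two_zones = redundant_availability(one_zone, copies=2)
    print(f"one zone:  {one_zone:.4f}, {downtime_minutes_per_year(one_zone):.0f} min/year")
    print(f"two zones: {two_zones:.6f}, {downtime_minutes_per_year(two_zones):.0f} min/year")
```

The independence assumption is the weak point: zones that share power, network, or a management plane fail together more often than this arithmetic suggests.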
7. Solution set
• CloudStack offers a rich set of features for building highly available and fault-tolerant applications
• VM high availability
• Snapshotting VM and Volumes
• Automated snapshotting and backup
• Anti-affinity and user dispersing planners
• Auto-Scaling
• VM health checks
• Load balancing
• Global Server Load balancing
VM operations
HA & FT with load balancing
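How VM health checks and load balancing combine to mask an instance failure can be sketched with a small simulation. This is not CloudStack's or any load balancer's actual implementation; `Backend`, `HealthCheckedBalancer`, and the `probe` callback are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Backend:
    name: str
    healthy: bool = True


class HealthCheckedBalancer:
    """Round-robin balancer that skips backends whose last health check failed."""

    def __init__(self, backends: List[Backend]) -> None:
        self.backends = backends
        self._next = 0  # round-robin cursor

    def run_health_checks(self, probe: Callable[[Backend], bool]) -> None:
        # `probe` stands in for a real check such as an HTTP GET or TCP connect.
        for backend in self.backends:
            backend.healthy = probe(backend)

    def pick(self) -> Backend:
        # Try each backend at most once, starting from the cursor.
        for offset in range(len(self.backends)):
            index = (self._next + offset) % len(self.backends)
            if self.backends[index].healthy:
                self._next = (index + 1) % len(self.backends)
                return self.backends[index]
        raise RuntimeError("no healthy backend available")


if __name__ == "__main__":
    vms = [Backend("vm1"), Backend("vm2"), Backend("vm3")]
    lb = HealthCheckedBalancer(vms)
    failed = {"vm2"}  # pretend vm2 stopped answering its health check
    lb.run_health_checks(lambda b: b.name not in failed)
    print([lb.pick().name for _ in range(4)])
```

Clients keep using one address while the failed VM is skipped. Auto-scaling would then replace the lost capacity.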
8. Distribute applications geographically
• Regions
• Availability zones
• Object store
Masking instance failures
• Elastic IP address
• Portable IP address
Network service SPOF
• Redundant virtual router
11. Enablers for DR and Multi-Site Redundancy
[Diagram labels: Users, GSLB, West-Zone 1 Data Center, West-Zone 2 Data Center, Portable IP (two instances), Object Store]
12. Designing a zone for a cloud workload
[Diagram labels: Cloud-Era Cloud, three Availability Zones, Object Storage, CloudStack Mgmt. Server]
• Workloads are distributed across availability zones
• No guarantee on zone reliability
• Applications are designed to handle node-level failure
• DBs and templates are snapshotted to the object store
• In the event of failure, images are recreated in a new availability zone
• Dramatically less expensive
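The snapshot-and-recreate flow for a cloud-era zone can be sketched as a toy model. It uses in-memory dictionaries rather than any real CloudStack or object-store API; `ObjectStore`, `Zone`, and their methods are hypothetical names chosen for the example.

```python
from typing import Dict, List


class ObjectStore:
    """Region-wide store that outlives any single zone (in-memory stand-in)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def keys(self) -> List[str]:
        return sorted(self._objects)


class Zone:
    """An availability zone holding volumes and templates by id."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.volumes: Dict[str, bytes] = {}

    def snapshot_all(self, store: ObjectStore) -> None:
        # Copy every volume out of the zone so it survives a zone failure.
        for volume_id, data in self.volumes.items():
            store.put(volume_id, data)

    def restore_all(self, store: ObjectStore) -> None:
        # Recreate volumes in this zone from the copies held in the store.
        for volume_id in store.keys():
            self.volumes[volume_id] = store.get(volume_id)


if __name__ == "__main__":
    store = ObjectStore()
    zone1 = Zone("west-zone-1")
    zone1.volumes = {"db-volume": b"rows", "web-template": b"image"}
    zone1.snapshot_all(store)   # scheduled snapshot while zone 1 is healthy
    zone2 = Zone("west-zone-2")
    zone2.restore_all(store)    # zone 1 is lost; rebuild in zone 2
    print(sorted(zone2.volumes))
```

Anything written after the last snapshot is lost in this model, so the snapshot interval bounds how much data a zone failure can cost.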
13. Portable IP Across Zones
• User acquires a Portable IP to communicate externally and to protect against zone failures
• Routing element sends an RHI (route health injection) to the upstream router to inject a route to the Portable IP via OSPF or BGP
• Incoming traffic is directed to the LB with the Portable IP
• On zone failure, the Portable IP can be transferred (via API or UI) to another zone
• Routing in the new zone sends an RHI to the upstream router
• Traffic is directed to the new LB with the Portable IP
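The failover sequence in these bullets can be sketched as a small simulation. It models route injection as updates to a dictionary on the upstream router; it is not an OSPF/BGP implementation and does not use the CloudStack API. `UpstreamRouter`, `PortableIp`, and the router names are hypothetical; only the address 8.1.1.11 is taken from the slide.

```python
from typing import Dict, Optional


class UpstreamRouter:
    """Core router that learns host routes from zone routers (OSPF/BGP stand-in)."""

    def __init__(self) -> None:
        # portable IP -> {advertising zone router: route cost}
        self.routes: Dict[str, Dict[str, int]] = {}

    def inject(self, ip: str, zone_router: str, cost: int = 1) -> None:
        self.routes.setdefault(ip, {})[zone_router] = cost

    def withdraw(self, ip: str, zone_router: str) -> None:
        self.routes.get(ip, {}).pop(zone_router, None)

    def next_hop(self, ip: str) -> Optional[str]:
        candidates = self.routes.get(ip, {})
        if not candidates:
            return None
        return min(candidates, key=lambda router: candidates[router])  # lowest cost


class PortableIp:
    """A public IP that can be moved between zones of one region."""

    def __init__(self, address: str, upstream: UpstreamRouter) -> None:
        self.address = address
        self.upstream = upstream
        self.zone_router: Optional[str] = None

    def transfer_to(self, zone_router: str) -> None:
        # In a real outage the old route would age out or be withdrawn by the
        # routing protocol; this sketch removes it explicitly.
        if self.zone_router is not None:
            self.upstream.withdraw(self.address, self.zone_router)
        self.upstream.inject(self.address, zone_router)  # the RHI step
        self.zone_router = zone_router


if __name__ == "__main__":
    core = UpstreamRouter()
    ip = PortableIp("8.1.1.11", core)
    ip.transfer_to("west-zone-1-l3")   # initial acquisition in zone 1
    print(core.next_hop("8.1.1.11"))
    ip.transfer_to("west-zone-2-l3")   # zone 1 fails; move the IP to zone 2
    print(core.next_hop("8.1.1.11"))
```

Because the address itself does not change, clients and DNS records stay as they are; only the upstream route moves.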
[Diagram: Portable IP across zones. Labels: abc.cloud.xyztelco.com; CloudStack Region-West; XYZTelco core router; OSPF / BGP route entry "8.1.1.11 LB1 Cost: 1"; Portable IP 8.1.1.11; West-Zone 1 with L3 router, LB1, VM1, VM2 (private IP 10.1.1.12); West-Zone 2 with L3 router, LB2, VM3, VM4 (private IP 10.1.2.12); an X marking a failure]
19. DR as a Service with NetScaler
[Diagram labels: Public DNS; TENANT-A.cloud.xyztelco.com and TENANT-B.cloud.xyztelco.com; Private DNS for cloud.xyztelco.com, pointing each tenant name to ADNS LB1 or ADNS LB2; XYZTelco cloud; CloudStack Region-West; West-Zone 1 datacenter with NetScaler ADNS LB1, Tenant A network (VM1, VM2), Tenant B network (VM3, VM4); West-Zone 2 datacenter with NetScaler ADNS LB2, Tenant A network (VM5, VM6), Tenant B network (VM7, VM8); MEP; Object Store]