Stacking up with OpenStack: Building for High Availability


Published in: Technology, Business

  1. Stacking up with OpenStack: Building for High Availability. Utpal Thakrar, Sr. Product Manager. April 17, 2013
  2. My relationship with HA: 1975. Cloud Management #rightscale
  3. My relationship with HA: 1991
  4. My relationship with HA: 2001. How many 9s can your product do?
  5. So what did they mean by 5-9s?
      Availability    Allowed downtime each year
      99%             3.65 days
      99.9%           8.76 hours
      99.99%          52.56 minutes
      99.999%         5.26 minutes
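The downtime figures on this slide follow from simple arithmetic. The sketch below reproduces them, assuming a 365-day (non-leap) year; the function name is illustrative and not from the deck.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, which matches the slide's figures


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime budget, in hours, for a given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        hours = allowed_downtime_hours(pct)
        print(f"{pct}%: {hours / 24:.2f} days = {hours:.2f} hours = {hours * 60:.2f} minutes")
```

Each extra nine divides the downtime budget by ten, which is why the jump from 99.99% to 99.999% leaves only about five minutes per year.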
  6. Stuff happens, are you prepared?
  7. Who dunnit?…
  8. And you see these …
  9. Is 100% Outage-proofing possible?
  10. Old School Fault-Tolerance: Build Two
  11. Golden Age of Cloud Computing
      • No Up-Front Capital Expense
      • Low Cost
      • Pay Only for What You Use
      • Self-Service Infrastructure
      • Easily Scale Up and Down
      • Improve Agility & Time-to-Market
      [Figure label: Deploy]
  12. Golden Age for Fault-Tolerance
      • No Up-Front HA Capital Expense
      • Low Cost Backups
      • Pay for DR Only When You Use it
      • Self-Service DR Infrastructure
      • Easily Deliver Fault-Tolerant Applications
      • Improve Agility & Time-to-Recovery
      [Figure label: Deploy]
  13. Yeah, but … What about my private cloud?
      Applications deployed in private clouds have to worry about:
      • Private Cloud Infrastructure being HA
      • Application architecture HA / DR
      • With Public Clouds: well, you get what your provider gives you
  14. Private Cloud Infrastructure HA
      Several single points of failure in an OpenStack deployment:
      • OpenStack API services
      • MySQL
      • RabbitMQ
      Solved in various ways:
      • Pacemaker cluster management
      • Keepalived (e.g., RAX Private Cloud)
      • MySQL (Galera), RabbitMQ (active-active mirrored queues)
      Eliminate SPoFs as best as you can.
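To see why each single point of failure matters, it can help to compare a chain of single components with redundant pairs. The sketch below uses the standard series and parallel availability formulas and assumes failures are independent; the 99% per-component figure is a hypothetical input, not a measurement from the deck.

```python
from math import prod


def serial(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant(availability, copies=2):
    """Availability of `copies` independent replicas where any one suffices."""
    return 1 - (1 - availability) ** copies


if __name__ == "__main__":
    component = 0.99  # hypothetical availability of one API, MySQL, or RabbitMQ node
    single = serial([component] * 3)
    paired = serial([redundant(component)] * 3)
    print(f"three single components in series: {single:.6f}")
    print(f"three redundant pairs in series:   {paired:.6f}")
```

Under these assumptions the chain of single components is less available than any one of them, while pairing each component recovers most of the loss.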
  15. What about my app?
      Design for failure:
      • If your application relies on the cloud infrastructure SLA for its HA needs, you are STUCK with that vendor / infrastructure
      • Need to balance cost and complexity against risk tolerance
      • Design your application so that you:
        • Build for server failure
        • Build for zone failure
        • Build for cloud failure
        • Keep management layer separate from infrastructure
  16. Build for Server Failure
      • Set up auto-scaling
      • Set up database mirroring, master/slave configuration
      • Use static public IPs
      • Use Dynamic DNS for private IPs
  17. Build for Zone Failure
      [Diagram labels: Static Public IPs; DNS; Zone 1; Zone 2; load balancers (×2); app servers (autoscale); master DB; slave DB; replicate; block snapshots; object store]
      • Where possible, use a NoSQL DB like Cassandra or MongoDB
      • Place slave databases in one or more zones for failover
      • Snapshot the data volume for backups so the database can be readily recovered within the region
      • A creative deployment model would be to make your private cloud an “AZ” by placing it in close physical proximity to a public cloud provider
  18. Build for Cloud Failure (Cold DR)
      Staged server configuration and generally no staged data. Cost: $
      • Not recommended if rapid recovery is required
      • Slow to replicate data to other cloud and bring database online
      [Diagram labels: DNS; Private; Dallas; load balancers (×2); app servers (×2); master DB; slave DB (×2); replicate; block snapshots; Cloud Files]
  19. Build for Cloud Failure (Warm DR)
      Staged server configuration, pre-staged data and running slave database server. Cost: $$
      • Generally recommended DR solution
      • Minimal additional cost and allows fairly rapid recovery
      [Diagram labels: DNS; Private; Dallas; load balancers (×2); app servers (×2); master DB; slave DB (×2); replicate (×2); block snapshots (×2); Cloud Files]
  20. Build for Cloud Failure (Hot DR)
      Parallel deployment with all servers running but all traffic going to primary. Cost: $$$
      • Not recommended
      • Very high additional cost to allow rapid recovery
      [Diagram labels: DNS; Private; Dallas; load balancers (×2); app servers (×2); master DB; slave DB (×2); replicate (×2); block snapshots (×2); Cloud Files]
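The three DR options trade cost against recovery speed. As a rough sketch, they can be written down as data and queried for the cheapest option that recovers fast enough. The cost values mirror the number of "$" signs on the slides; the `recovery_rank` ordering (1 = slow, 3 = rapid) is a hypothetical encoding of the slides' wording, not a measured recovery time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DRTier:
    name: str
    cost: int           # number of "$" signs shown on the slide
    recovery_rank: int  # hypothetical ordinal: higher means faster recovery


TIERS = [
    DRTier("cold", cost=1, recovery_rank=1),  # "slow to replicate data"
    DRTier("warm", cost=2, recovery_rank=2),  # "fairly rapid recovery"
    DRTier("hot", cost=3, recovery_rank=3),   # "rapid recovery"
]


def cheapest_tier(min_recovery_rank: int) -> DRTier:
    """Return the lowest-cost tier whose recovery rank meets the requirement."""
    candidates = [t for t in TIERS if t.recovery_rank >= min_recovery_rank]
    if not candidates:
        raise ValueError("no tier recovers fast enough")
    return min(candidates, key=lambda t: t.cost)


if __name__ == "__main__":
    print(cheapest_tier(2).name)
```

Asking for at least "fairly rapid" recovery selects the warm tier, which is consistent with the deck calling Warm DR the generally recommended option.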
  21. Availability vs. Cost
      [Dial figure: Cost and Availability, each ranging from Min to Max]
  22. Make sure workload is portable across clouds
  23. Automate and test everything
      • Automate backups of your data
      • Set up monitoring and alerts
      • Run fire drills! Plan and practice your recovery procedures!
  24. Separate Management layer from Infrastructure
      • Keep the keys to the car outside the car
  25. Automating HA and DR
      • Use dynamic DNS for your database servers
        • Allow app servers to use a single FQDN
        • Use a low TTL to allow rapid failover in the case of a change in master database
      • Automatic connection of app servers to load balancing servers
        • App servers can connect to all load balancers automatically at launch
        • No manual intervention
        • No DNS modifications
      • Automated promotion of slave to master
        • Process is automated
        • Decision to run process is manual
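The pattern on this slide can be sketched with a small in-memory model: app servers reach the database through a single FQDN with a low TTL, new app servers register with every load balancer at launch, and slave promotion is an automated procedure that only runs after a person decides to run it. All class names, hostnames, and addresses below are hypothetical and stand in for whatever DNS and load-balancing services a deployment actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class DnsRecord:
    fqdn: str              # single name that every app server connects to
    address: str           # current master database address
    ttl_seconds: int = 60  # low TTL so clients pick up a new master quickly


@dataclass
class DatabaseCluster:
    record: DnsRecord
    slaves: list = field(default_factory=list)

    def promote_slave(self, operator_approved: bool) -> str:
        """Run the automated promotion, but only after a manual decision."""
        if not operator_approved:
            raise PermissionError("promotion requires an operator decision")
        if not self.slaves:
            raise RuntimeError("no slave available to promote")
        new_master = self.slaves.pop(0)
        self.record.address = new_master  # repoint the FQDN at the new master
        return new_master


@dataclass
class LoadBalancer:
    name: str
    backends: set = field(default_factory=set)


def register_app_server(address: str, load_balancers) -> None:
    """Attach a newly launched app server to every load balancer."""
    for lb in load_balancers:
        lb.backends.add(address)


if __name__ == "__main__":
    record = DnsRecord("db-master.example.internal", "10.0.0.10")
    cluster = DatabaseCluster(record, slaves=["10.0.1.10"])
    cluster.promote_slave(operator_approved=True)
    print(record.fqdn, "->", record.address)
```

Because app servers only ever know the FQDN, promotion changes a single record and clients follow it once the TTL expires, with no per-server reconfiguration.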
  26. Samsung SDS: Mr. Kirk Kim. Copyright © 2013 Samsung SDS Co., Ltd. All rights reserved
  27. Hybrid Cloud Network Architecture
      [Diagram labels: Internet traffic; CF Router; Public ASN: XXXX; Firewall; IPS; VPN Gateway; Compute; EIP: e.x.y.a; EIP: e.x.y.b; VMs; Private Network; VPC Virtual GW; Private: 10.x.x.x/24; Public: *.*.*.0/24; Internet GW; Object Storage; SPCS; Public Cloud]
      Legend:
      • Between SPCS and Public Cloud using public IP
      • Between SPCS and Public Cloud using private IP
      • Internet traffic to SPCS and Public Cloud using public IP
  28. How RightScale makes it possible: RightScale ServerTemplates™
      • Reproducible: Predictable deployment
      • Dynamic: Configuration from scripts at boot time
      • Multi-cloud: Cloud agnostic and portable
      • Modular: Role and behavior abstracted from cloud infrastructure
  29. How RightScale makes it possible: MultiCloud Images
      • MultiCloud Images can be launched across regions and clouds without modification
      Steps shown in the diagram:
      1. ServerTemplate contains a list of MultiCloud Images (MCIs).
      2. When the Server is created, a specific MCI is chosen.
      3. The appropriate RightImage is used at launch.
      [Diagram labels: MultiCloud Images; Cloud A; Cloud B; Cloud C; Image 1; Image 2; RightImage; Stability across clouds]
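The three steps can be pictured as a lookup from a template, to one of its images, to the cloud-specific image used at launch. The sketch below is an illustrative data model only: the class names, fields, and image identifiers are hypothetical and do not represent the RightScale API.

```python
from dataclasses import dataclass


@dataclass
class MultiCloudImage:
    name: str
    images_by_cloud: dict  # cloud name -> cloud-specific image identifier


@dataclass
class ServerTemplate:
    name: str
    mcis: list  # step 1: the template carries a list of MultiCloud Images


def resolve_image(template: ServerTemplate, mci_name: str, cloud: str) -> str:
    """Steps 2 and 3: pick one MCI, then the image matching the target cloud."""
    mci = next((m for m in template.mcis if m.name == mci_name), None)
    if mci is None:
        raise KeyError(f"{mci_name!r} is not listed in template {template.name!r}")
    if cloud not in mci.images_by_cloud:
        raise LookupError(f"{mci_name!r} has no image for cloud {cloud!r}")
    return mci.images_by_cloud[cloud]


if __name__ == "__main__":
    template = ServerTemplate(
        name="app-server",
        mcis=[
            MultiCloudImage("Image 1", {"Cloud A": "img-a-1", "Cloud B": "img-b-1"}),
            MultiCloudImage("Image 2", {"Cloud C": "img-c-2"}),
        ],
    )
    print(resolve_image(template, "Image 1", "Cloud B"))
```

Keeping the cloud-specific identifier behind the lookup is what lets the same template launch on a different cloud without modification.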
  30. Outage-Proofing Best Practices
      Place in >1 zone:
      • Load balancers
      • App servers
      • Databases
      Maintain capacity to absorb zone or region failures
      Replicate data across zones:
      • Backup across regions & clouds
      • Monitor, alert, and automate operations to speed up failover
      Replicate data across zones:
      • Design stateless apps for resilience to reboot / relaunch
  31. Thank you! Sign up for a free account at: www.rightscale.com. Check out our job postings. We are hiring!