Avoiding Cloud Outage

Building cross-region and cross-cloud high availability into your app: a real-life use case by GigaSpaces. Nati Shalom, Founder & CTO, GigaSpaces.
Achieving high levels of availability and disaster recovery in a cloud environment requires the implementation of patterns and practices that introduce redundancy through multi-zone, multi-region, and multi-cloud deployments. As we move towards implementing higher availability, we cannot escape the direct increase in the accidental complexity of the deployment architecture resulting from lack of cloud portability and deployment lifecycle automation. We present how high availability and disaster recovery were achieved in reality by using the Cloudify open source framework on top of AWS. This approach applies to not just AWS but also other public clouds and private cloud environments such as Eucalyptus. The resulting reference architecture provides portable PostgreSQL replication and disaster recovery as well as application tier scalability across zones, regions, and public/private clouds through a unified deployment workflow.



  1. Protect your app from Outages. Nati Shalom, CTO, GigaSpaces. @natishalom. May 2013
  2. AGENDA: AWS and outages; Outage impact; Disaster Recovery – it’s all about redundancy!; Cloudify as a solution for redundancy; Demo with Cloudify on EC2. ® Copyright 2013 GigaSpaces Ltd. All Rights Reserved
  3. AWS USAGE: AWS – around 0.5M servers; Facebook – less than 0.1M servers; Google – around 1M servers
  5. OUTAGE – APRIL 21, 2011
  6. OUTAGE – JUNE 29, 2012
  7. OUTAGE – OCTOBER 22, 2012
  8. OUTAGE – CHRISTMAS EVE 2012
  9. NOT ONLY AMAZON: 28 December 2012 – some owners of Microsoft’s Xbox 360 gaming console were unable to access some of their cloud-based storage files. 26 July 2012 – service for Microsoft’s Windows Azure Europe region went down for more than two hours. 29 February 2012 – the ultimate result was service impacts of 8-10 hours for users of Azure data centers in Dublin, Ireland; Chicago; and San Antonio.
  10. THAT’S WHAT YOU EXPECT? Downtime per year by availability level: 99% – 3.65 days; 99.9% – 8.76 hours; 99.99% – 53 minutes; 99.999% – 5.26 minutes
  11. OUTAGE IMPACT – DESIGN FOR FAILURES: Outage could cost… $89K per hour for Amadeus; $225K per hour for PayPal!
  13. MULTI CLOUD
  14. PREPARE FOR DISASTER RECOVERY: Dedicated expert for DR architecture; Define target recovery time & point; Assume every tier can fail; Use monitoring and alerts; Document your operational processes
  15. CHAOS MONKEY
  18. CLONE YOUR DATA
  20. Leverage Existing Automation Frameworks: Configuration Centric; APP Centric (PaaS)
  22. BUILT IN SUPPORT FOR MANAGING DATA IN THE CLOUD. Categories: Real Time; Relational DB Clusters; NoSQL Clusters; Hadoop. Products shown: Storm, MySQL, MongoDB, Hadoop (Hive, Pig,..), Elastic Caching XAP, Postgres, Cassandra, ZooKeeper, Couchbase, ElasticSearch
  24. VERIFI (CURRENT) DEPLOYMENT ARCHITECTURE: Availability region (US-West: Oregon). Components: Internet; EC2 instances running mod_cluster, JBoss, PostgreSQL, and Cassandra; data volumes. 4 recipes.
  25. TARGET ARCHITECTURE: Availability Region (US-West Oregon): Internet; EC2 instances running mod_cluster, JBoss, the Postgres master, and Cassandra; data volumes. Availability Region (US-East Virginia): EC2 instances running mod_cluster, JBoss, the Postgres slave, and Cassandra; data volumes. Replication links the Postgres master and slave. Bootstrap two EC2 clouds in different regions and install the “verifi” application on each. The second cloud has a slightly modified (extended) Postgres recipe so that it acts as a slave, and it runs no app servers. Upon failure of the primary zone, the second cloud spins up app server instances and turns its data instance into the master, then bootstraps another “slave” cloud in another zone.
  26. FAILOVER SCENARIO: Before failure: cloud #1, Region (US-West Oregon), runs the app servers and PostgreSQL; cloud #2, Region (US-East Virginia), runs PostgreSQL; a liveness poll links the two clouds. Upon initial deployment, the primary deployment of the application is bootstrapped onto cloud #1. Another, slightly modified application recipe is bootstrapped as cloud #2, which polls cloud #1 for failure and acts as a PostgreSQL DB slave. When a region failure occurs: turn the Postgres slave into the master and start app server instances*, so that cloud #2, Region (US-East Virginia), runs the app servers and PostgreSQL. Then bootstrap another cloud in a different region using the same application recipe used to bootstrap cloud #2 above*: cloud #3, Region (US-West California), runs PostgreSQL, again with a liveness poll.
  27. NEXT STEPS (axis: Availability): Across AWS zones – 1 application + overrides, 1 cloud driver; Across AWS regions – 1 application + overrides, 1 cloud driver; Across clouds (AWS, Rackspace, Azure… etc) – 1 application + overrides, several cloud drivers. Label: Supported by Verifi phase #1
  28. EVOLUTION PATH (axes: Availability, Complexity): Multi cloud/provider; Multi region; Multi zone; Multi instance
  29. SUMMARY: AWS and outages; Outage impact; Disaster Recovery – it’s all about redundancy!; Cloning your environment – app stack; Cloning your DB – Replication; Cloudify as a solution for redundancy; Use recipes to work on any cloud; Fast and customized data replication; Demo with Cloudify on EC2
  30. QUESTIONS & ANSWERS: Thank You! @natishalom
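
The availability figures on slide 10 follow from simple arithmetic. The sketch below assumes a 365-day year and recomputes the yearly downtime budget for each level; the function names are illustrative and do not come from the deck. Under that assumption, 99.99% works out to 52.56 minutes, which the slide rounds up to 53.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year allowed at the given availability level."""
    return (100.0 - availability_percent) / 100.0 * MINUTES_PER_YEAR


def humanize(minutes: float) -> str:
    """Format a duration in the largest unit (days, hours, minutes) that fits."""
    if minutes >= 24 * 60:
        return f"{minutes / (24 * 60):.2f} days"
    if minutes >= 60:
        return f"{minutes / 60:.2f} hours"
    return f"{minutes:.2f} minutes"


if __name__ == "__main__":
    for level in (99.0, 99.9, 99.99, 99.999):
        print(f"{level}% -> {humanize(annual_downtime_minutes(level))}")
```

Each extra "nine" cuts the budget by a factor of ten, which is why the later slides move from multi-instance to multi-zone, multi-region, and multi-cloud redundancy.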
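
The failover sequence on slides 25 and 26 (poll the primary, promote the Postgres slave, start app servers, bootstrap a new standby) can be sketched as a small state model. This is not Cloudify's API: the `Cloud` class, `fail_over`, `watch`, and the rule of three consecutive missed polls are all assumptions made for illustration, and a real poller would also wait between polls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Cloud:
    """One bootstrapped deployment of the application (hypothetical model)."""
    name: str
    region: str
    db_role: str               # "master" or "slave"
    app_servers_running: bool


def fail_over(standby: Cloud, spare_region: str, next_name: str) -> Cloud:
    """Promote the standby in place and return a newly bootstrapped standby."""
    standby.db_role = "master"            # turn the Postgres slave into the master
    standby.app_servers_running = True    # start app server instances
    # Bootstrap another cloud from the same recipe the old standby used.
    return Cloud(next_name, spare_region, db_role="slave", app_servers_running=False)


def watch(primary_is_alive: Callable[[], bool], standby: Cloud,
          spare_region: str, next_name: str,
          max_missed_polls: int = 3, max_polls: int = 100) -> Optional[Cloud]:
    """Poll the primary; fail over after max_missed_polls consecutive misses.

    Returns the new standby if a failover happened, otherwise None.
    The consecutive-miss threshold is an assumption, not taken from the slides.
    """
    missed = 0
    for _ in range(max_polls):
        missed = 0 if primary_is_alive() else missed + 1
        if missed >= max_missed_polls:
            return fail_over(standby, spare_region, next_name)
    return None


if __name__ == "__main__":
    # Simulated liveness results for cloud #1: healthy twice, then down.
    polls = iter([True, True, False, False, False])
    cloud2 = Cloud("cloud-2", "US-East Virginia", "slave", False)
    cloud3 = watch(lambda: next(polls), cloud2, "US-West California", "cloud-3")
    print(cloud2)
    print(cloud3)
```

Requiring several consecutive misses before promoting is one way to avoid failing over on a single dropped poll; the right threshold depends on the recovery time target the deck recommends defining.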