Best Practices for Surviving Outages

Site disruptions happen, often when you least expect them. When your business depends on application uptime or access to critical data, a strategy for high availability (HA) and disaster recovery (DR) is essential. Carefully architecting and implementing an HA and DR strategy helps you minimize risk, strengthen fault tolerance, and rapidly redeploy your application and data in case of a disruption.

This presentation gives an overview of HA and DR and offers some best practices from the Engine Yard team.

The full on-demand webcast can be viewed here: http://pages.engineyard.com/BestPracticesforSurvivingOutagesWebcast.html

  • Introduction: roles and titles. Melissa, Sal, Avrohom, Matt
  • What is Disaster Recovery? The processes, policies, and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster.
  • Seven Tiers of Disaster Recovery
      0: No off-site data; possibly no recovery
      1: Data backup with no hot site
      2: Data backup with a hot site
      3: Electronic vaulting
      4: Point-in-time copies
      5: Transaction integrity
      6: Zero or near-zero data loss
      7: Highly automated, business-integrated solution
  • High Availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period. Sal to explain, Matt to cover diagram.
  • Avrohom to talk about complexity. Why should we implement an H/A environment?
      Avoiding revenue loss
      More consistent uptime
      Higher client satisfaction
      Better level of protection for critical systems
      Insurance
    Things to know up front about implementing an H/A environment:
      Cost
      Additional complexity
  • Avrohom: Implementation of a High Availability system
      Needs assessment
      H/A is implemented using geo-redundant systems
      Databases are kept in sync using replication
      Assets are ideally stored on a storage system such as Amazon S3, but can be kept in sync using rsync
      File system synchronized between locations
      Code is deployed to both systems
      Stack changes applied to both systems
      Create escalation flow chart
      Bring up secondary site
      One-week test cycle
      Failover test
      Live
  • Application considerations
      Environment-specific configurations: store as a template in Chef, or store on the filesystem and symlink on deploy with Capistrano
      Asset hosting: assets must be synced if stored locally, which adds complexity and strain on resources
      Page caching: sync the page cache to prevent higher response times while the cache warms
      Other data stores: dump and sync data at select intervals and during failover
      Background processing: when failing over, wait for jobs to finish, and consider where jobs are stored
      Cron jobs: use a gem such as whenever to automate cron jobs
  • Failover
      The decision to fail over is mutual; there is no automatic failover
      A DBA is brought in to check the state of the database and perform the manual failover
      Client uptime needs are designated in the client flow chart
      After the decision to fail over is made, the DBA promotes the redundant database to master
      After a quick test of the redundant system, DNS is updated
        A low TTL should be set
        DNS load balancing such as DynECT Managed DNS can be used to minimize downtime during the IP switch
      Re-establish replication: when the former master environment is back online, configure the former master database as a read-only slave
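The HA definition in the notes ties availability to a "prearranged level of operational performance" over a "contractual measurement period". A minimal sketch, assuming that level is expressed as an availability percentage (the deck does not give a target, so the percentages and the 30-day period below are illustrative assumptions), of turning such a target into a downtime budget. For example, 99.9% over 30 days leaves 0.1% of 43,200 minutes, or 43.2 minutes.

```python
from datetime import timedelta


def allowed_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Return the maximum downtime an availability target permits over `period`."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    # The unavailable fraction of the measurement period is the downtime budget.
    return period * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    month = timedelta(days=30)  # illustrative 30-day measurement period (assumption)
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = allowed_downtime(target, month)
        print(f"{target}% over 30 days -> {budget.total_seconds() / 60:.1f} minutes")
```

Comparing the budget with how long a manual failover actually takes (contacting the client, promoting the database, waiting out the DNS TTL) is one way to check whether a manual process can meet the agreed target.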

    1. Best Practices for Surviving Outages: Designing and implementing a High Availability and Disaster Recovery strategy
        Sal Cardello, Director of Pro Services; Matt Dolian, System Engineer; Avroham Katz, System Engineer
    2. Disaster Recovery
        Photo credit: naturaldisasterss.com/wp-content/uploads/2011/12/Natural-Disaster-Images.jpg
    3. Tiers of Disaster Recovery
        0 - No off-site data
        1 - Data backup with no hot site
        2 - Data backup with hot site
        3 - Electronic vaulting
        4 - Point-in-time copies
        5 - Transaction integrity
        6 - Zero or near-zero data loss
        7 - Highly automated, business-integrated solution
        Citation: http://en.wikipedia.org/wiki/Seven_tiers_of_disaster_recovery
    4. Definition: High Availability
        “Design approach & associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period”
        Citation: http://en.wikipedia.org/wiki/High_availability
    5. High Availability Architecture
    6. Why implement HA?
    7. Best Practices for High Availability (diagram labels: Environment, Validate, Analysis, Synchronization, Geographic Mirroring, Escalation Plan, Database Replication, Test, Store Assets, Launch, Replication)
        Photo credit: http://bit.ly/z9OEwG
    8. Application Considerations
        • Environment-Specific Configurations
        • Asset Hosting
        • Page Caching
        • Other Data Stores
        • Background Processing
        • Cron Jobs
        Photo credit: http://www.flickr.com/photos/dseneste/5912382808/
    9. Failover Process at Engine Yard
        Manual, customer-owned decision
        1. Client contacted per terms of SLA
        2. Engine Yard syncs database and performs manual failover
        3. Redundant database promoted to master
        4. DNS is updated
        5. Replication to former master is re-established
    10. Questions?
    11. Get in touch
        Contact us: Sal Cardello, Director of Pro Services, proservices@engineyard.com
        Learn more: http://www.engineyard.com/services
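The failover process on slide 9, together with the speaker notes, is a fixed sequence carried out by people, not by automation. A minimal sketch, in Python, of a checklist that enforces that order; the `Step` names and the `FailoverRunbook` class are hypothetical, and the code does not call any Engine Yard, database, or DNS API. It only records that a human has completed each step and refuses to skip ahead.

```python
from enum import Enum
from typing import List, Optional


class Step(Enum):
    # Hypothetical step names. Order follows slide 9, with the quick test of
    # the redundant system taken from the speaker notes.
    CONTACT_CLIENT = "client contacted per terms of SLA"
    CLIENT_DECIDES = "client agrees to fail over (manual, customer-owned decision)"
    SYNC_DATABASE = "DBA checks the state of the database and syncs it"
    PROMOTE_REPLICA = "redundant database promoted to master"
    TEST_SECONDARY = "quick test of the redundant system"
    UPDATE_DNS = "DNS updated (low TTL assumed to be set in advance)"
    REESTABLISH_REPLICATION = "former master reconfigured as a read-only slave"


class FailoverRunbook:
    """Records manual failover steps and rejects any step taken out of order."""

    def __init__(self) -> None:
        self.completed: List[Step] = []

    def next_step(self) -> Optional[Step]:
        steps = list(Step)  # members in definition order
        if len(self.completed) < len(steps):
            return steps[len(self.completed)]
        return None  # every step has been recorded

    def record(self, step: Step) -> None:
        expected = self.next_step()
        if step is not expected:
            raise RuntimeError(f"out of order: expected {expected}, got {step}")
        self.completed.append(step)

    @property
    def finished(self) -> bool:
        return len(self.completed) == len(Step)


if __name__ == "__main__":
    runbook = FailoverRunbook()
    runbook.record(Step.CONTACT_CLIENT)
    try:
        # Updating DNS before the client decides and the DBA promotes is refused.
        runbook.record(Step.UPDATE_DNS)
    except RuntimeError as err:
        print(err)
```

Keeping the checklist strictly sequential reflects the notes' point that DNS should only move after the promoted database has been tested, and that replication to the former master is re-established last.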
