DR Strategies with CM
Mandi Walls
CfgMgmtCamp
3 FEB 2014

Monday, February 3, 14
whoami
• Mandi Walls
• Technical Practice Manager, CHEF
• mandi@getchef.com
• @lnxchk

What is Disaster Recovery

http://www.flickr.com/photos/61617934@N03/6196510705/sizes/z/in/photostream/
Reasons to Make DR Plans
• Your business insurance requires it
• Things are going to happen, whether you are ready or not

Tornado Events in Loudoun County, VA
September 17, 2004, 3:55 pm

Everybody Else

http://www.tornadohistoryproject.com/tornado/Virginia/Loudoun/map

Hurricane Sandy, NYC, October 2012
[Photo labels: 111 8th, 121 Varick, 60 Hudson, 65 Broadway, 25 Broadway, 375 Pearl, 33 Whitehall, 75 Broad, My Apartment; the lucky ones in BPC with newer infrastructure]
Photo: Iwan Baan and New York Magazine

Current State of DR
• Event horizon for modern DR was 9/11
• Same neighborhood as Hurricane Sandy

• Most of the literature reflects the state of IT at that time

Goals of DR Planning
• Name staff and services that are key to business continuity
• Provide clear guidance for making decisions in real time
• Set rules for escalation, communication, participation
• Document all of these things, publish the results, keep them updated
on a regular basis

Advantages of CM when Planning DR
• Topology and service definition
• Settings and relationships
• Documentation
• Tooling and workflows

Old Rules that Still Apply
• Accessible off site backups, with periodically tested restores
• Documentation should also be available if your normal services are
not
• Documents need to be updated on a regular schedule, and personnel
should be trained on their potential roles

New Rules

http://www.flickr.com/photos/26058810@N02/5650149188/sizes/z/in/photostream/
Rule 1: Your availability is your responsibility
• Cloud / managed hosting allows us to outsource a number of worries
• Bandwidth, power, cooling

• That’s awesome, but does your vendor care as much about your
customers or users as you do?
• You must assess your tolerance for risk vs cost
• You are no longer entirely dependent on getting budget for full-scale “DR sites”

Rule 1: To the Cloud!
• DR planning is much easier to justify when it doesn’t require massive
quantities of capital for emergency capacity
• If your applications are not tightly coupled to custom services from your
IaaS provider, you have more flexibility during outages
• Commonly missed items include
• Keeping passwords in a single location that may be inaccessible during an outage
• Not having accurate information about the operating systems and server
capacities you will need, and how to translate them among providers
• Not engaging with security and network teams to ensure all required access is in place
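The “how to translate among providers” item can live in code next to the rest of your CM data. Here is a minimal Ruby sketch that picks the smallest listed size meeting a node’s requirements. It assumes you maintain your own size catalog for each provider. The provider keys, size names, vCPU counts and RAM figures below are hypothetical placeholders, not real offerings.

```ruby
# Hypothetical size catalogs: provider keys, size names, vCPU counts and RAM
# are placeholders, not real offerings. Keep your own current data here.
SIZE_CATALOG = {
  provider_a: [
    { name: 'a.small',  vcpus: 1, ram_gb: 2 },
    { name: 'a.medium', vcpus: 2, ram_gb: 4 },
    { name: 'a.large',  vcpus: 4, ram_gb: 16 }
  ],
  provider_b: [
    { name: 'b-2gb',  vcpus: 1, ram_gb: 2 },
    { name: 'b-8gb',  vcpus: 4, ram_gb: 8 },
    { name: 'b-32gb', vcpus: 8, ram_gb: 32 }
  ]
}.freeze

# Returns the name of the smallest listed size (fewest vCPUs, then least RAM)
# that satisfies the requirement, or raises if nothing in the catalog fits.
def translate_size(provider, vcpus:, ram_gb:)
  sizes = SIZE_CATALOG.fetch(provider)
  fits = sizes.select { |s| s[:vcpus] >= vcpus && s[:ram_gb] >= ram_gb }
  best = fits.min_by { |s| [s[:vcpus], s[:ram_gb]] }
  raise ArgumentError, "no #{provider} size fits #{vcpus} vCPU / #{ram_gb} GB" unless best

  best[:name]
end
```

With this catalog, a node that needs 2 vCPUs and 4 GB maps to a.medium on one provider and b-8gb on the other. The mismatch is the kind of detail worth recording before an outage rather than during one.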

Knife Plugins
$ knife rackspace server create (options)
$ knife linode server create (options)
$ knife ec2 server create (options)
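Each plugin exposes the same “server create” shape, so the provider choice can be reduced to data. Below is a Ruby sketch that builds the command for one DR node but does not run it. The create_command and to_shell helpers are hypothetical. The sketch assumes each plugin accepts -N (node name) and -r (run list); confirm this with “knife <provider> server create --help” for the plugin versions you have installed. Image, flavor, region and credential flags differ per plugin, so they are passed through untouched.

```ruby
require 'shellwords'

# knife subcommands from the knife-rackspace, knife-linode and knife-ec2 plugins.
CREATE_COMMANDS = {
  rackspace: %w[knife rackspace server create],
  linode:    %w[knife linode server create],
  ec2:       %w[knife ec2 server create]
}.freeze

# Builds (but does not execute) a server-create command as an argument array.
# Assumption: -N and -r are accepted by every plugin listed above; provider-specific
# flags go in extra_options.
def create_command(provider, node_name:, run_list:, extra_options: [])
  base = CREATE_COMMANDS.fetch(provider) { raise ArgumentError, "unknown provider: #{provider}" }
  base + ['-N', node_name, '-r', run_list.join(',')] + extra_options
end

# Renders an argument array as a single shell-quoted line for a runbook or script.
def to_shell(command)
  Shellwords.join(command)
end
```

Keeping the command as an array until the last moment avoids quoting surprises: run lists such as role[base] contain characters the shell would otherwise treat as glob patterns.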

Rule 2: Assessing realistic risk
• Do not bikeshed all possible events along all
potential space-time continua
• Assess risk based on affected services

http://badassoftheweek.com/godzilla.html
Rule 2: Planning for the Extent of an Event
• Service level
• Datacenter level
• Regional level
• National level

Service-Level and Datacenter-Level Events
• These are the easiest to deal with when you’re using CM!
• If your infrastructure is in code, you can move services to fresh hosts
by redeploying them

Spiceweasel
• https://github.com/mattray/spiceweasel
• Define groups of infrastructure in Ruby, JSON, or YAML
• Spiceweasel will translate into knife commands to recreate the
running infrastructure

Spiceweasel
nodes:
- serverA:
    run_list: role[base]
    options: -i ~/.ssh/mray.pem -x user --sudo
- serverB serverC:
    run_list: role[base]
    options: -i ~/.ssh/mray.pem -x user --sudo -E production
- windows_winrm winboxA:
    run_list: role[base],role[iisserver]
    options: -x Administrator -P 'super_secret_password'
- windows_ssh winboxB winboxC:
    run_list: role[base],role[iisserver]
    options: -x Administrator -P 'super_secret_password'
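Here is a rough Ruby approximation of how a nodes: list like this one expands into knife bootstrap commands, one per host named in each key. It is an illustration under assumptions, not Spiceweasel’s code, and the exact commands Spiceweasel emits may differ, so compare against its real output for your manifest. It assumes the windows_winrm and windows_ssh prefixes map to the knife-windows subcommands “knife bootstrap windows winrm” and “knife bootstrap windows ssh”.

```ruby
require 'yaml'
require 'shellwords'

# Key prefixes assumed to select knife-windows bootstrap subcommands.
WINDOWS_PREFIXES = {
  'windows_winrm' => %w[windows winrm],
  'windows_ssh'   => %w[windows ssh]
}.freeze

# Expands a Spiceweasel-style `nodes:` list into knife bootstrap argument arrays.
# Illustrative approximation only, not Spiceweasel's implementation.
def bootstrap_commands(yaml_text)
  manifest = YAML.safe_load(yaml_text) || {}
  manifest.fetch('nodes', []).flat_map do |entry|
    entry.flat_map do |key, attrs|
      attrs ||= {}
      hosts = key.split # "serverB serverC" names two hosts sharing one definition
      subcommand = WINDOWS_PREFIXES[hosts.first]
      hosts.shift if subcommand
      prefix = %w[knife bootstrap] + (subcommand || [])
      options = Shellwords.split(attrs.fetch('options', ''))
      run_list = attrs['run_list']
      hosts.map do |host|
        command = prefix + [host] + options
        run_list ? command + ['-r', run_list] : command
      end
    end
  end
end
```

The point of the exercise is that the whole running topology is recoverable from one small file. If that file is kept somewhere reachable during an outage, rebuilding reduces to replaying the generated commands against the new provider.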
Regional Events
• Storms, volcanoes, large telecom cuts, worker strikes, etc.
• When regional civil infrastructure is affected
• May provide more warning - hurricanes may take several days to form
• Your staff may be without power or the ability to be physically present
in your office or datacenter
• Prioritization of services, training of backup staff

National Events
• Political unrest
• Other large natural disasters
• Decide if you even need a strategy for these cases
• If your service is down, but all of your customers are also offline, does it make
sense to pursue an extensive plan?

Kind of a Bummer

http://i.imgur.com/CH5J6Uz.jpg
Rule 3: Comprehensive plans require all players
• You may find yourself faced with an event in which your organization
is able to provide only Minimum Viable Product-level services
• Scaling back to only the critical core components requires
decision-making and planning by product, dev, ops, security, etc.
• Minimize the need to also bring along extraneous services like VPNs
and specialized gear

Getting an MVP Up
App LBs
Cache
App Servers
DB Cache
DB slaves
DBs
[Diagram annotations: Baseline Capacity; Maintain Interfaces?]
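When rebuilding toward an MVP, the tiers have to come up in dependency order, data first. Below is a small Ruby sketch using the standard library’s TSort. The dependency edges are an assumption read from the top-to-bottom order of the tiers on the slide, with each tier depending on the one below it. Substitute the relationships your CM tool actually encodes.

```ruby
require 'tsort'

# Assumed dependencies: each tier needs the tier listed below it on the slide.
# Replace with the relationships recorded in your own CM data.
TIER_DEPENDENCIES = {
  'App LBs'     => ['Cache'],
  'Cache'       => ['App Servers'],
  'App Servers' => ['DB Cache'],
  'DB Cache'    => ['DB slaves'],
  'DB slaves'   => ['DBs'],
  'DBs'         => []
}.freeze

# Wraps a dependency hash so TSort can order it.
class TierGraph
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  def tsort_each_child(tier, &block)
    @dependencies.fetch(tier, []).each(&block)
  end
end

# Returns tiers with their dependencies first, i.e. a safe bring-up order.
# TSort raises TSort::Cyclic if the dependencies contain a loop.
def bring_up_order(dependencies = TIER_DEPENDENCIES)
  TierGraph.new(dependencies).tsort
end
```

With the assumed edges, the order runs from DBs up to App LBs. The payoff comes with a less linear graph, where the sort also catches circular dependencies that would otherwise surface in the middle of an outage.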
Tackling a Reduced Topology
• Container for metadata related to the DR topology
• Chef environment, data bags for storing new info
• Separate from existing infrastructure metadata

http://www.flickr.com/photos/psd/9626226855/sizes/z/in/photostream/
DR Environment
• In Chef, an environment is a logical grouping for nodes
• Environments belonging to the same organization share other Chef
components like cookbooks and role definitions
• The environment allows you to customize settings for the nodes that
live in the environment

DR Environment
$ cat environments/dr.rb
name "dr-app1"
description "DR for App1"
override_attributes(
  :app1 => {
    :db_conn => "ro"
  }
)
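Nodes assigned to the dr-app1 environment get this override on top of whatever the cookbook sets by default. Below is a deliberately simplified Ruby model of that layering. It assumes only two levels, cookbook defaults and environment overrides, merged recursively. Chef’s real attribute precedence has more levels and its own merge rules. The pool_size default is a hypothetical example value.

```ruby
# Recursively merges override into base; nested hashes are merged key by key,
# any other value in override simply replaces the one in base.
def deep_merge(base, override)
  base.merge(override) do |_key, base_value, override_value|
    if base_value.is_a?(Hash) && override_value.is_a?(Hash)
      deep_merge(base_value, override_value)
    else
      override_value
    end
  end
end

# Hypothetical cookbook defaults for App1.
COOKBOOK_DEFAULTS = { app1: { db_conn: 'rw', pool_size: 20 } }.freeze
# Mirrors override_attributes in environments/dr.rb.
DR_OVERRIDES = { app1: { db_conn: 'ro' } }.freeze

# Simplified view of the attributes a node would see in a given environment.
def effective_attributes(environment)
  environment == 'dr-app1' ? deep_merge(COOKBOOK_DEFAULTS, DR_OVERRIDES) : COOKBOOK_DEFAULTS
end
```

Only the keys the DR plan cares about are overridden. Everything else, such as pool_size here, still comes from the shared cookbooks, which is what keeps the DR environment small and separate from existing infrastructure metadata.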
Rule 4: Prioritize
• Determine the hierarchy of all critical services
• Your list may have a different order depending on:
• Day of week / month / quarter - is accounting software P1 on the 10th of the
month?
• Length of outage - can a service be down a short time with fewer risks?
• Amount of time necessary to recover - how long will it take your data analytics
system to catch up after an outage of N hours? More than N additional hours?
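The last question can be answered with back-of-the-envelope arithmetic. Suppose data keeps arriving at rate r_in while the pipeline is down for N hours, and the recovered system processes at rate r_proc. The backlog is then N × r_in, and it drains at r_proc - r_in, so catch-up takes N × r_in / (r_proc - r_in) hours. Below is a Ruby sketch that assumes constant rates, which is a simplification since real load varies through the day.

```ruby
# Back-of-the-envelope catch-up estimate for a batch or analytics pipeline.
# Assumes a constant ingest rate during and after the outage and a constant
# processing rate once the system is back (both hypothetical inputs).
def catch_up_hours(outage_hours:, ingest_per_hour:, process_per_hour:)
  spare = process_per_hour - ingest_per_hour
  raise ArgumentError, 'processing must outpace ingest to ever catch up' unless spare.positive?

  backlog = outage_hours * ingest_per_hour
  backlog.fdiv(spare)
end
```

For example, with 100 records per hour arriving and 150 per hour processed, a 4-hour outage leaves a backlog of 400 that drains at 50 per hour. That is 8 more hours, which is more than N additional hours.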

User Behavior
[Chart: App 1 (series: App1 Avg) by time of day, 0600 to 0600; y-axis 0 to 150]
Managing Complexity
• Your CM tool is composed of atomic units representing your
infrastructure
• Rely on those to help you manage the additional complexity of
instantiating new resources in emergencies
• All relationships should be well defined and encoded in the CM tools
• Eliminate the need for specialized knowledge for your DR planning

Rule 5: Don’t plan for heroism
• When catastrophic events occur, safety of your people is primary
• Large events affect the availability of people resources
• If your staff has reason to be concerned for their welfare, or the
welfare of their families, those are priorities

DR for People
• Resist the urge to hide your config management from different teams
• You can’t predict which members of your team will be able to help

Checklist
• Identify providers to be used in the case of an outage
• Are you going to use AWS? Use idle or underutilized infrastructure in other
locations? Will there be DNS changes, etc.?

• Make sure all accounts, billing, and personnel access are up to date
• Check this on a regular basis. Add new staff to access lists promptly.

• All new service deployments must include an emergency plan
• Plan for your primary folks to be unavailable
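Keeping access lists current is easier to check on a regular basis if it is scripted. Below is a minimal Ruby sketch that reports who is missing access and which accounts are stale. It assumes you can export the staff roster and the list of accounts at your DR provider as plain lists of usernames. Both inputs here are hypothetical.

```ruby
require 'set'

# Compares the current staff roster with the accounts that exist at a DR provider.
def access_gaps(roster, provider_accounts)
  staff = roster.to_set
  accounts = provider_accounts.to_set
  {
    missing: (staff - accounts).to_a.sort, # on staff but cannot log in
    stale:   (accounts - staff).to_a.sort  # can log in but no longer on staff
  }
end
```

Running a check like this on a schedule, for every provider in the plan, turns “add new staff to access lists promptly” into something you can verify instead of something you hope happened.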

TL;DR
• Start with baseline
• Add components
over time
• Rebuild and return to
initial infrastructure
if / when possible

Other Stuff to Take into Consideration
• SaaS solutions for temporary infrastructures
• Monitoring and metrics, CDNs, code repositories
• Also for backoffice: email services, document storage

• Often scary for security and compliance folks
• Speed time to recovery in large-loss events

fin
• Time to rewrite DR practices for a new
generation of tools and services
• Send me your stories if you can share
mandi@getchef.com

http://i.imgur.com/KdRnwZK.jpg
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 

Disaster Recovery Strategies with Config Management

  • 1. DR Strategies with CM Mandi Walls CfgMgmtCamp 3 FEB 2014 Monday, February 3, 14
  • 2. whoami • Mandi Walls • Technical Practice Manager, CHEF • mandi@getchef.com • @lnxchk
  • 3. What is Disaster Recovery http://www.flickr.com/photos/61617934@N03/6196510705/sizes/z/in/photostream/
  • 4. Reasons to Make DR Plans • Your business insurance requires it • Things are going to happen, whether you are ready or not
  • 5. Tornado Events in Loudoun County, VA http://www.tornadohistoryproject.com/tornado/Virginia/Loudoun/map
  • 6. Tornado Events in Loudoun County, VA September 17, 2004 3:55 pm http://www.tornadohistoryproject.com/tornado/Virginia/Loudoun/map
  • 9. Tornado Events in Loudoun County, VA September 17, 2004 3:55 pm Everybody Else http://www.tornadohistoryproject.com/tornado/Virginia/Loudoun/map
  • 10. Hurricane Sandy, NYC, October 2012 Photo: Iwan Baan and New York Magazine
  • 11. Hurricane Sandy, NYC, October 2012 33 Whitehall Photo: Iwan Baan and New York Magazine
  • 12. Hurricane Sandy, NYC, October 2012 60 Hudson 33 Whitehall Photo: Iwan Baan and New York Magazine
  • 13. Hurricane Sandy, NYC, October 2012 375 Pearl 60 Hudson 33 Whitehall Photo: Iwan Baan and New York Magazine
  • 14. Hurricane Sandy, NYC, October 2012 375 Pearl 60 Hudson 65 Broadway 33 Whitehall Photo: Iwan Baan and New York Magazine
  • 15. Hurricane Sandy, NYC, October 2012 375 Pearl 60 Hudson 65 Broadway 33 Whitehall 25 Broadway Photo: Iwan Baan and New York Magazine
  • 16. Hurricane Sandy, NYC, October 2012 111 8th 60 Hudson 65 Broadway 375 Pearl 33 Whitehall 25 Broadway Photo: Iwan Baan and New York Magazine
  • 17. Hurricane Sandy, NYC, October 2012 111 8th 60 Hudson 65 Broadway 25 Broadway 375 Pearl 33 Whitehall 75 Broad Photo: Iwan Baan and New York Magazine
  • 18. Hurricane Sandy, NYC, October 2012 111 8th 121 Varick 60 Hudson 65 Broadway 25 Broadway 375 Pearl 33 Whitehall 75 Broad Photo: Iwan Baan and New York Magazine
  • 19. Hurricane Sandy, NYC, October 2012 111 8th 121 Varick 60 Hudson 65 Broadway 25 Broadway 375 Pearl 33 Whitehall 75 Broad My Apartment Photo: Iwan Baan and New York Magazine
  • 20. Hurricane Sandy, NYC, October 2012 111 8th 121 Varick 60 Hudson 65 Broadway 25 Broadway Bitches in BPC with newer infrastructure 375 Pearl 33 Whitehall 75 Broad My Apartment Photo: Iwan Baan and New York Magazine
  • 21. Current State of DR • Event horizon for modern DR was 9/11 • Same neighborhood as Hurricane Sandy • Most of the literature reflects the state of IT at that time
  • 22. Goals of DR Planning • Name staff and services that are key to business continuity • Provide clear guidance for making decisions in real time • Set rules for escalation, communication, and participation • Document all of these things, publish the results, and keep them updated on a regular basis
  • 23. Advantages of CM when Planning DR • Topology and service definition • Settings and relationships • Documentation • Tooling and workflows
  • 24. Old Rules that Still Apply • Accessible off-site backups, with periodically tested restores • Documentation should also be available if your normal services are not • Documents need to be updated on a regular schedule, and personnel should be trained on their potential roles
  • 26. Rule 1: Your availability is your responsibility • Cloud / managed hosting allows us to outsource a number of worries • Bandwidth, power, cooling • That’s awesome, but does your vendor care as much about your customers or users as you do? • You must assess your tolerance for risk vs. cost • No longer entirely dependent on getting budget for full-scale “DR sites”
  • 27. Rule 1: To the Cloud! • Justifying DR planning is much easier when you don’t also have to justify massive quantities of capital for emergency capacity • If your applications are not tightly coupled to custom services offered by your IaaS provider, you have more flexibility in outage events • Commonly missed items include • Keeping passwords in a single location that may be inaccessible in outages • Not having correct information about the operating systems or server capacities that will be needed, or how to translate them among providers • Not engaging with security and network teams to ensure all access works
  • 28. Knife Plugins
      $ knife rackspace server create (options)
      $ knife linode server create (options)
      $ knife ec2 server create (options)
  • 29. Rule 2: Assessing realistic risk • Do not bikeshed all possible events along all potential space-time continua • Assess risk based on affected services http://badassoftheweek.com/godzilla.html
  • 30. Rule 2: Planning for the Extent of an Event • Service level • Datacenter level • Regional level • National level
  • 31. Service-Level and Datacenter-Level Events • These are the easiest to deal with when you’re using CM! • If your infrastructure is in code, move services to new blades of grass by redeploying
  • 32. Spiceweasel • https://github.com/mattray/spiceweasel • Define groups of infrastructure in Ruby, JSON, or YAML • Spiceweasel translates these definitions into knife commands that recreate the running infrastructure
  • 33. Spiceweasel
      nodes:
      - serverA:
          run_list: role[base]
          options: -i ~/.ssh/mray.pem -x user --sudo
      - serverB serverC:
          run_list: role[base]
          options: -i ~/.ssh/mray.pem -x user --sudo -E production
      - windows_winrm winboxA:
          run_list: role[base],role[iisserver]
          options: -x Administrator -P 'super_secret_password'
      - windows_ssh winboxB winboxC:
          run_list: role[base],role[iisserver]
          options: -x Administrator -P 'super_secret_password'
  • 34. Regional Events • Storms, volcanoes, large telecom cuts, worker strikes, etc. • When regional civil infrastructure is affected • May provide more warning: hurricanes may take several days to form • Your staff may be without power or unable to be physically present in your office or datacenter • Prioritization of services, training of backup staff
  • 35. National Events • Political unrest • Other large natural disasters • Decide if you even need a strategy for these cases • If your service is down, but all of your customers are also offline, does it make sense to pursue an extensive plan?
  • 36. Kind of a Bummer http://i.imgur.com/CH5J6Uz.jpg
  • 37. Rule 3: Comprehensive plans require all players • You may find yourself faced with an event in which your organization is only able to provide Minimum Viable Product-level services • Scaling back services to only critical core components requires decision making and planning by product, dev, ops, security, etc. • Minimize the need to also bring along extraneous services like VPNs and specialized gear
  • 38. Getting an MVP Up [Diagram: service tiers App LBs, Cache, App Servers, DB Cache, DB slaves, DBs]
  • 39. Getting an MVP Up [Diagram: the same tiers, annotated "Baseline Capacity"]
  • 40. Getting an MVP Up [Diagram: the same tiers, annotated "Baseline Capacity" and "Maintain Interfaces?"]
  • 41. Tackling a Reduced Topology • Container for metadata related to the DR topology • Chef environment, data bags for storing new info • Separate from existing infrastructure metadata http://www.flickr.com/photos/psd/9626226855/sizes/z/in/photostream/
  • 42. DR Environment • In Chef, an environment is a logical grouping for nodes • Environments belonging to the same organization share other Chef components like cookbooks and role definitions • The environment allows you to customize settings for the nodes that live in the environment
  • 43. DR Environment
      $ cat environments/dr.rb
      name "dr-app1"
      description "DR for App1"
      override_attributes(
        :app1 => { :db_conn => "ro" }
      )
  • 44. Rule 4: Prioritize • Determine the hierarchy of all critical services • Your list may have a different order depending on: • Day of week / month / quarter - is accounting software P1 on the 10th of the month? • Length of outage - can a service be down a short time with fewer risks? • Amount of time necessary to recover - how long will it take your data analytics system to catch up after an outage of N hours? More than N additional hours?
  • 45. User Behavior [Chart: "App 1" traffic and "App1 Avg" across one day, 0600 to 0600; y-axis 0 to 150]
  • 46. Managing Complexity • Your CM tool is composed of atomic units representing your infrastructure • Rely on those to help you manage the additional complexity of instantiating new resources in emergencies • All relationships should be well-defined and encoded in the CM tools • Eliminate the need for specialized knowledge for your DR planning
  • 47. Rule 5: Don’t plan for heroism • When catastrophic events occur, the safety of your people comes first • Large events affect the availability of people resources • If your staff has reason to be concerned for their welfare, or the welfare of their families, those are priorities
  • 48. DR for People • Resist the urge to hide your config management from different teams • You can’t predict which members of your team will be able to help
  • 49. Checklist • Identify providers to be used in the case of an outage • Are you going to use AWS? Use idle or underutilized infrastructure in other locations? Will there be DNS changes, etc.? • Make sure all accounts, billing, and personnel access are up to date • Check this on a regular basis. Add new staff to access lists promptly. • All new service deployments must include an emergency plan • Plan for your primary folks to be unavailable
  • 50. TL;DR • Start with baseline • Add components over time • Rebuild and return to initial infrastructure if / when possible
  • 61. Other Stuff to Take into Consideration • SaaS solutions for temporary infrastructures • Monitoring and metrics, CDNs, code repositories • Also for backoffice: email services, document storage • Often scary for security and compliance folks • Speed time to recovery in large-loss events
  • 62. fin • Time to rewrite DR practices for new generation of tools and services • Send me your stories if you can share mandi@getchef.com http://i.imgur.com/KdRnwZK.jpg
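Slide 27 lists not knowing the server capacities you will need, or how to translate them among providers, as a commonly missed item. One way to prepare is to record each server's needs (for example from the CPU and memory data Ohai reports for Chef nodes) and match them against each candidate provider's sizes ahead of time. In this minimal sketch the provider names, flavor names, and sizes are entirely hypothetical placeholders; real catalogs come from each provider.

```ruby
# Hypothetical flavor catalogs: provider names, flavor names, and sizes are
# placeholders, not any real provider's offerings.
Flavor = Struct.new(:name, :vcpus, :ram_gb)

CATALOGS = {
  'provider_a' => [
    Flavor.new('a.small', 1, 2),
    Flavor.new('a.medium', 2, 4),
    Flavor.new('a.large', 4, 16)
  ],
  'provider_b' => [
    Flavor.new('b-2', 2, 2),
    Flavor.new('b-4', 2, 8),
    Flavor.new('b-8', 8, 32)
  ]
}.freeze

# Return the name of the smallest flavor (by RAM, then vCPUs) that meets the
# recorded requirement, or nil if the provider has nothing big enough.
def translate(requirement, provider)
  candidates = CATALOGS.fetch(provider).select do |f|
    f.vcpus >= requirement[:vcpus] && f.ram_gb >= requirement[:ram_gb]
  end
  candidates.min_by { |f| [f.ram_gb, f.vcpus] }&.name
end

# Hypothetical recorded capacity for one app server.
app_server = { vcpus: 2, ram_gb: 4 }
translate(app_server, 'provider_a') # => "a.medium"
translate(app_server, 'provider_b') # => "b-4"
```

Keeping a table like this in version control alongside the cookbooks means the translation is decided before an outage rather than during one.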
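Slide 32 says Spiceweasel translates a manifest like the one on slide 33 into knife commands. The Ruby below is a hypothetical, much-simplified sketch of that idea, not Spiceweasel's actual code: it reads only plain `nodes` entries (no cloud providers, counts, or Windows subcommands) and turns each listed host into a `knife bootstrap` line, passing the run list with `-r`. The manifest reuses the two non-Windows entries from slide 33.

```ruby
require 'yaml'

# The non-Windows entries from the slide 33 manifest.
MANIFEST = <<~YAML
  nodes:
  - serverA:
      run_list: role[base]
      options: -i ~/.ssh/mray.pem -x user --sudo
  - serverB serverC:
      run_list: role[base]
      options: -i ~/.ssh/mray.pem -x user --sudo -E production
YAML

# Simplified translation: one knife bootstrap command per host. Each list
# item is a one-key hash of "host1 host2" => { run_list, options }.
def knife_commands(manifest)
  YAML.safe_load(manifest).fetch('nodes').flat_map do |entry|
    hosts, settings = entry.first
    hosts.split.map do |host|
      "knife bootstrap #{host} #{settings['options']} -r '#{settings['run_list']}'"
    end
  end
end

puts knife_commands(MANIFEST)
```

Because `serverB serverC` shares one entry, it expands into two commands with identical options. For real rebuilds, use Spiceweasel itself, which handles far more cases than this sketch.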
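Slide 43's `dr-app1` environment overrides `db_conn` for App1. As a rough illustration of why that works, this sketch models attribute precedence as a deep merge in which the higher-precedence layer wins key by key. It is a deliberate simplification: Chef defines more precedence levels than the two shown here, and the `'rw'` default and the `pool_size` attribute are hypothetical.

```ruby
# Simplified two-layer precedence: keys in `top` win, nested hashes merge.
def deep_merge(base, top)
  base.merge(top) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

# Hypothetical cookbook defaults for app1.
cookbook_defaults = { app1: { db_conn: 'rw', pool_size: 20 } }
# Mirrors override_attributes in the dr-app1 environment.
dr_environment_overrides = { app1: { db_conn: 'ro' } }

effective = deep_merge(cookbook_defaults, dr_environment_overrides)
# effective[:app1][:db_conn] is now 'ro'; pool_size keeps its default of 20.
p effective
```

Only the keys the DR environment names are changed, so the same cookbooks and roles serve both the primary and the reduced DR topology.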
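Slide 44 asks whether a data analytics system needs more than N extra hours to catch up after an N-hour outage. Under the simplifying assumption that input keeps arriving at a constant rate and the recovered system processes it k times faster than it arrives, the backlog of N hours of input shrinks by (k - 1) hours of input per hour, so catching up takes N / (k - 1) hours. That exceeds N whenever k < 2. The slide 45 chart shows traffic is not constant across the day, so treat this as a first estimate only.

```ruby
# Hours needed to drain a backlog after an outage, assuming constant input and
# a recovered system that processes `speedup` times faster than input arrives.
def catch_up_hours(outage_hours, speedup)
  raise ArgumentError, 'speedup must exceed 1 or the backlog never drains' unless speedup > 1

  # Backlog is outage_hours of input; it shrinks by (speedup - 1) hours of input per hour.
  outage_hours / (speedup - 1.0)
end

catch_up_hours(6, 3.0) # => 3.0
catch_up_hours(6, 2.0) # => 6.0
catch_up_hours(6, 1.5) # => 12.0
```

A system with only 50% headroom needs twice the outage length to catch up, which can push it higher on the priority list than its normal importance suggests.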
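Slide 46 argues that relationships should be encoded in the CM tool rather than held as specialized knowledge. Once dependencies are written down, a bring-up order for a reduced topology like the tiers on slides 38 to 40 can be derived mechanically. This sketch uses Ruby's standard `TSort` module; the dependency map itself is hypothetical, since the slides show the tiers but not their exact relationships.

```ruby
require 'tsort'

# Wraps a "service => services it depends on" map so TSort can order it.
class BringUpOrder
  include TSort

  def initialize(deps)
    @deps = deps
  end

  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @deps.fetch(node, []).each(&block)
  end
end

# Hypothetical dependencies among the slide 38 tiers.
DEPENDS_ON = {
  'App LBs'     => ['App Servers'],
  'Cache'       => [],
  'App Servers' => ['Cache', 'DB Cache', 'DB slaves'],
  'DB Cache'    => ['DB slaves'],
  'DB slaves'   => ['DBs'],
  'DBs'         => []
}.freeze

# Dependencies come before the services that need them.
order = BringUpOrder.new(DEPENDS_ON).tsort
puts order
```

`tsort` lists each service after everything it depends on, so the databases come up before the load balancers. If the map contains a cycle, `tsort` raises `TSort::Cyclic`, which doubles as a check on the encoded relationships.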