The document discusses disaster recovery (DR) strategies for teams using configuration management (CM). It recommends starting with a baseline of essential services and adding components over time. It offers guidelines for prioritizing services and for assessing risk at each level of event, from a single service outage through datacenter and regional events to a national disaster. It emphasizes encoding infrastructure relationships in CM tools to keep DR complexity manageable, and it stresses planning for staff availability and safety during catastrophic events.
Disaster Recovery Strategies with Config Management
1. DR Strategies with CM
Mandi Walls
CfgMgmtCamp
3 FEB 2014
Monday, February 3, 14
2. whoami
• Mandi Walls
• Technical Practice Manager, CHEF
• mandi@getchef.com
• @lnxchk
3. What is Disaster Recovery
http://www.flickr.com/photos/61617934@N03/6196510705/sizes/z/in/photostream/
4. Reasons to Make DR Plans
• Your business insurance requires it
• Things are going to happen, whether you are ready or not
5-9. Tornado Events in Loudoun County, VA
Map annotations added across slides 6-9: September 17, 2004, 3:55 pm; Everybody Else
http://www.tornadohistoryproject.com/tornado/Virginia/Loudoun/map
10-20. Hurricane Sandy, NYC, October 2012
Map labels added one per slide across slides 11-20: 33 Whitehall, 60 Hudson, 375 Pearl, 65 Broadway, 25 Broadway, 111 8th, 75 Broad, 121 Varick, My Apartment, Bitches in BPC with newer infrastructure
Photo: Iwan Baan and New York Magazine
21. Current State of DR
• Event horizon for modern DR was 9/11
• Same neighborhood as Hurricane Sandy
• Most of the literature reflects the state of IT at that time
22. Goals of DR Planning
• Name staff and services that are key to business continuity
• Provide clear guidance for making decisions in real time
• Set rules for escalation, communication, participation
• Document all of these things, publish the results, keep them updated on a regular basis
23. Advantages of CM when Planning DR
• Topology and service definition
• Settings and relationships
• Documentation
• Tooling and workflows
24. Old Rules that Still Apply
• Accessible off-site backups, with periodically tested restores
• Documentation should also be available when your normal services are not
• Documents need to be updated on a regular schedule, and personnel should be trained on their potential roles
26. Rule 1: Your availability is your responsibility
• Cloud / managed hosting allows us to outsource a number of worries
• Bandwidth, power, cooling
• That’s awesome, but does your vendor care as much about your customers or users as you do?
• You must assess your tolerance for risk vs. cost
• No longer entirely dependent on getting budget for full-scale “DR sites”
27. Rule 1: To the Cloud!
• DR planning is much easier to justify when it doesn’t require massive quantities of capital for emergency capacity
• If your applications are not tightly coupled to custom services offered by your IaaS provider, you have more flexibility during outage events
• Commonly missed items include:
• Keeping passwords in a single location that may be inaccessible during outages
• Not having accurate information about the operating systems and server capacities that will be needed, and how to translate them among providers
• Not engaging with security and network teams to ensure all required access is in place
28. Knife Plugins
$ knife rackspace server create (options)
$ knife linode server create (options)
$ knife ec2 server create (options)
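The three commands above differ only in the plugin name, which is what makes switching providers practical in an outage. A minimal Ruby sketch, assuming the same Chef run list should be applied wherever the DR nodes land. The PROVIDERS table, the server_create_command helper, and the role names are hypothetical, and the `--run-list` flag is an assumption to confirm with `knife <provider> server create --help` for the plugin versions you have installed (knife-ec2, knife-rackspace, and knife-linode ship as separate gems).

```ruby
# Hypothetical table mapping each provider to its knife cloud plugin
# subcommand. Each plugin is a separate gem: knife-ec2, knife-rackspace,
# knife-linode.
PROVIDERS = {
  ec2:       %w[knife ec2 server create],
  rackspace: %w[knife rackspace server create],
  linode:    %w[knife linode server create]
}.freeze

# Build the argument vector for creating one DR node.
# The run list stays the same whichever provider is chosen; provider-specific
# options (image, flavor, region, credentials) differ per plugin, so they are
# passed through untouched. The --run-list flag is an assumption; verify it
# with `knife <provider> server create --help`.
def server_create_command(provider, run_list, provider_opts = [])
  base = PROVIDERS.fetch(provider) do
    raise ArgumentError, "unknown provider: #{provider}"
  end
  base + ["--run-list", run_list.join(",")] + provider_opts
end

# Example: the same roles, targeted at Linode instead of EC2.
cmd = server_create_command(:linode, ["role[base]", "role[app1]"])
puts cmd.join(" ")
```

Passing the array as `system(*cmd)` runs knife directly without a shell, so the brackets in role names need no quoting.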
29. Rule 2: Assessing realistic risk
• Do not bikeshed all possible events along all potential space-time continua
• Assess risk based on affected services
http://badassoftheweek.com/godzilla.html
30. Rule 2: Planning for the Extent of an Event
• Service level
• Datacenter level
• Regional level
• National level
31. Service-Level and Datacenter-Level Events
• These are the easiest to deal with when you’re using CM!
• If your infrastructure is in code, move services to new blades of grass by redeploying
34. Regional Events
• Storms, volcanoes, large telecom cuts, worker strikes, etc.
• When regional civil infrastructure is affected
• May provide more warning: hurricanes may take several days to form
• Your staff may be without power or unable to be physically present in your office or datacenter
• Prioritization of services, training of backup staff
35. National Events
• Political unrest
• Other large natural disasters
• Decide if you even need a strategy for these cases
• If your service is down, but all of your customers are also offline, does it make sense to pursue an extensive plan?
36. Kind of a Bummer
http://i.imgur.com/CH5J6Uz.jpg
37. Rule 3: Comprehensive plans require all players
• You may find yourself facing an event in which your organization can provide only Minimum Viable Product-level services
• Scaling back services to only critical core components requires decision-making and planning by product, dev, ops, security, etc.
• Minimize the need to also bring along extraneous services like VPNs and specialized gear
38-40. Getting an MVP Up
App LBs
Cache
App Servers
DB Cache
DB slaves
DBs
Annotations added across slides 39-40: Baseline Capacity; Maintain Interfaces?
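One way to turn "Baseline Capacity" into numbers a CM tool can act on is to size each tier as a fraction of production. A minimal Ruby sketch under stated assumptions: the node counts, the 25% fraction, and the choice of Cache and DB Cache as optional tiers that the MVP can start without are all hypothetical, and baseline_capacity is an illustrative helper, not part of Chef.

```ruby
# Hypothetical production node counts, listed top of the stack first.
PRODUCTION = {
  "app_lbs"     => 2,
  "cache"       => 4,
  "app_servers" => 20,
  "db_cache"    => 2,
  "db_slaves"   => 4,
  "dbs"         => 2
}.freeze

# Assumption: the application still works, more slowly, without these tiers,
# so the MVP can start without them and add them back later.
OPTIONAL_TIERS = %w[cache db_cache].freeze

# Baseline DR capacity: a fraction of production for each required tier,
# rounded up and never below one node; optional tiers start at zero.
def baseline_capacity(production, fraction, optional)
  production.to_h do |tier, count|
    nodes = optional.include?(tier) ? 0 : [(count * fraction).ceil, 1].max
    [tier, nodes]
  end
end

baseline_capacity(PRODUCTION, 0.25, OPTIONAL_TIERS).each do |tier, nodes|
  puts "#{tier}: #{nodes}"
end
```

Later phases can raise the fraction, or drop tiers from OPTIONAL_TIERS, until the counts match production again, which mirrors starting with a baseline and adding components over time.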
41. Tackling a Reduced Topology
• Container for metadata related to the DR topology
• Chef environment, data bags for storing new info
• Separate from existing infrastructure metadata
http://www.flickr.com/photos/psd/9626226855/sizes/z/in/photostream/
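A minimal Ruby sketch of keeping the DR topology in its own data bag, separate from production metadata, assuming a chef-repo layout with a data_bags/ directory. The bag name dr_topology, the write_dr_topology helper, and the item fields are hypothetical; the one Chef rule it relies on is that every data bag item needs an "id" key, which becomes the item's name.

```ruby
require "json"
require "fileutils"
require "tmpdir"

# Write a data bag item describing an application's reduced DR topology
# into a chef-repo checkout and return the path of the JSON file.
def write_dr_topology(repo_dir, app, topology)
  bag_dir = File.join(repo_dir, "data_bags", "dr_topology")
  FileUtils.mkdir_p(bag_dir)
  # Every data bag item needs an "id"; Chef uses it as the item's name.
  item = topology.merge("id" => app)
  path = File.join(bag_dir, "#{app}.json")
  File.write(path, JSON.pretty_generate(item))
  path
end

# Demo in a throwaway directory; point repo_dir at your chef-repo instead.
Dir.mktmpdir do |repo|
  path = write_dr_topology(repo, "app1", {
    "provider"  => "ec2",            # hypothetical DR-only facts
    "dns_zone"  => "dr.example.com",
    "db_access" => "read-only"
  })
  puts File.read(path)
end
```

The item could then be uploaded with `knife data bag create dr_topology` followed by `knife data bag from file dr_topology data_bags/dr_topology/app1.json`.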
42. DR Environment
• In Chef, an environment is a logical grouping for nodes
• Environments belonging to the same organization share other Chef components like cookbooks and role definitions
• The environment allows you to customize settings for the nodes that live in the environment
43. DR Environment
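$ cat environments/dr.rb
# Chef environment for App1's reduced DR topology
name "dr-app1"
description "DR for App1"
# Override attributes take precedence over default attributes, so every
# node in dr-app1 sees db_conn = "ro" (presumably a read-only DB connection)
override_attributes(
  :app1 => {
    :db_conn => "ro"
  }
)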
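To make the effect of override_attributes concrete, here is a deliberately simplified Ruby model: it applies the dr-app1 override on top of a hypothetical set of cookbook defaults with a recursive hash merge. Chef's real attribute precedence has many more levels than this, so treat deep_merge and the sample attributes as illustrative assumptions, not as how Chef computes node attributes.

```ruby
# Simplified model, not Chef's precedence engine: one layer of overrides
# recursively merged over cookbook defaults.
def deep_merge(base, overrides)
  base.merge(overrides) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

# Hypothetical cookbook defaults for App1.
COOKBOOK_DEFAULTS = {
  app1: { db_conn: "rw", db_host: "db1.example.internal", pool_size: 20 }
}.freeze

# The same override the dr-app1 environment sets.
DR_OVERRIDES = { app1: { db_conn: "ro" } }.freeze

p deep_merge(COOKBOOK_DEFAULTS, DR_OVERRIDES)
```

Only db_conn changes; the rest of App1's settings pass through, which is why a small environment file can repoint a whole tier.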
44. Rule 4: Prioritize
• Determine the hierarchy of all critical services
• Your list may have a different order depending on:
• Day of week / month / quarter: is accounting software P1 on the 10th of the month?
• Length of outage: can a service be down for a short time with lower risk?
• Amount of time necessary to recover: how long will it take your data analytics system to catch up after an outage of N hours? More than N additional hours?
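The last two questions can be worked through with a little arithmetic. A minimal Ruby sketch, assuming hypothetical services and priorities, a made-up rule that accounting becomes P1 on the 10th of the month, and an analytics system that replays its backlog at a constant rate of r times real time while new data keeps arriving. Under that assumption, after t hours of catch-up the system has processed r * t hours of data, which must equal the N-hour backlog plus the t new hours, so t = N / (r - 1). At r = 1.5, a 6-hour outage needs 12 more hours; the answer to "more than N additional hours?" is yes whenever r < 2.

```ruby
require "date"

# Hypothetical base priorities: lower number = restore sooner.
BASE_PRIORITY = {
  "storefront" => 1,
  "accounting" => 3,
  "analytics"  => 4
}.freeze

# Hypothetical calendar rule: accounting is P1 on the 10th of the month.
def priority_for(service, date)
  return 1 if service == "accounting" && date.day == 10
  BASE_PRIORITY.fetch(service)
end

# Restoration order for a given date; ties are broken by service name.
def restore_order(date)
  BASE_PRIORITY.keys.sort_by { |service| [priority_for(service, date), service] }
end

# Hours needed to clear a backlog, assuming the system replays data at a
# constant `rate` times real time while new data keeps arriving at 1x:
#   rate * t = outage_hours + t   =>   t = outage_hours / (rate - 1)
def catch_up_hours(outage_hours, rate)
  raise ArgumentError, "rate must exceed 1.0 or the backlog never clears" unless rate > 1.0
  outage_hours / (rate - 1.0)
end

puts restore_order(Date.new(2014, 2, 10)).inspect  # accounting promoted
puts restore_order(Date.new(2014, 2, 3)).inspect
puts catch_up_hours(6, 1.5)
```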
46. Managing Complexity
• Your CM tool is composed of atomic units representing your infrastructure
• Rely on those to help you manage the additional complexity of instantiating new resources in emergencies
• All relationships should be well defined and encoded in the CM tools
• Eliminate the need for specialized knowledge for your DR planning
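Encoding relationships explicitly also lets tooling, rather than specialized knowledge, answer "what has to come up first?". A minimal Ruby sketch using the standard library's TSort module on a hypothetical dependency map for the tiers from the MVP slides; the ServiceGraph class and the dependencies are assumptions. In practice that information would live in your CM tool rather than in a hand-maintained hash.

```ruby
require "tsort"

# Wraps a service => [services it depends on] map so TSort can order it.
class ServiceGraph
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  # Required by TSort: yield every node.
  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  # Required by TSort: yield the nodes this one depends on.
  def tsort_each_child(node, &block)
    @dependencies.fetch(node, []).each(&block)
  end
end

# Hypothetical relationships for the MVP tiers.
DEPENDENCIES = {
  "app_lbs"     => ["app_servers"],
  "app_servers" => ["cache", "db_cache"],
  "cache"       => [],
  "db_cache"    => ["db_slaves"],
  "db_slaves"   => ["dbs"],
  "dbs"         => []
}.freeze

# tsort lists dependencies before the services that need them, which is a
# valid bring-up order. It raises TSort::Cyclic if the map has a cycle.
bring_up_order = ServiceGraph.new(DEPENDENCIES).tsort
puts bring_up_order.join(" -> ")
```

A cycle in the map surfaces as an exception during planning instead of as a surprise during an outage.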
47. Rule 5: Don’t plan for heroism
• When catastrophic events occur, safety of your people is primary
• Large events affect the availability of people resources
• If your staff have reason to be concerned for their own welfare or that of their families, those concerns come first
48. DR for People
• Resist the urge to hide your config management from different teams
• You can’t predict which members of your team will be able to help
49. Checklist
• Identify providers to be used in the case of an outage
• Are you going to use AWS? Use idle or underutilized infrastructure in other locations? Will there be DNS changes, etc.?
• Make sure all accounts, billing, and personnel access are up to date
• Check this on a regular basis. Add new staff to access lists promptly.
• All new service deployments must include an emergency plan
• Plan for your primary folks to be unavailable
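Parts of this checklist can be verified mechanically on a schedule. A minimal Ruby sketch, assuming you can export three lists: current staff, the users who have access to the DR provider account, and the services that have a documented emergency plan. All names and the dr_audit helper are hypothetical.

```ruby
require "set"

# Compare who should have DR access with who actually has it, and find
# services that were deployed without an emergency plan.
def dr_audit(staff:, dr_access:, services:, services_with_plans:)
  staff = staff.to_set
  dr_access = dr_access.to_set
  {
    missing_access: (staff - dr_access).sort,  # new staff not yet added
    stale_access:   (dr_access - staff).sort,  # departed staff still listed
    missing_plans:  (services.to_set - services_with_plans.to_set).sort
  }
end

report = dr_audit(
  staff:               %w[alice bob carol dave],
  dr_access:           %w[alice bob erin],
  services:            %w[app1 billing search],
  services_with_plans: %w[app1 search]
)
report.each { |check, items| puts "#{check}: #{items.empty? ? 'ok' : items.join(', ')}" }
```

Running a report like this from cron and treating any non-empty list as a failure keeps the checklist from going stale between reviews.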
50. TL;DR
• Start with baseline
• Add components over time
• Rebuild and return to initial infrastructure if / when possible
61. Other Stuff to Take into Consideration
• SaaS solutions for temporary infrastructures
• Monitoring and metrics, CDNs, code repositories
• Also for back-office services: email, document storage
• Often scary for security and compliance folks
• Speed time to recovery in large-loss events
62. fin
• Time to rewrite DR practices for a new generation of tools and services
• Send me your stories if you can share
mandi@getchef.com
http://i.imgur.com/KdRnwZK.jpg