
Building a Business Continuity Capability



A detailed overview of the business continuity / disaster recovery planning process. Offers numerous tips for developing the plan effectively. Emphasizes building a true recovery capability through exercises that reveal weaknesses in the plan or technology, leading to improvements.

Published in: Technology


  1. A Detailed Overview of Business Continuity Planning. Rod Davis, CRISC, CBCP. Version 1.07
  2. • Formal education: BS in Electrical Engineering
      • 26 years with SIL International; 13 years in Mali, West Africa
      • Information Technology, Contingency Planning, Data Recovery, and Business Continuity
      • IT certifications: Security+, Network+, A+, MCSA
      • CRISC: Certified in Risk and Information Systems Control
      • CBCP: Certified Business Continuity Professional
  3. Physical Security, Business Continuity, and Cybersecurity, linked by information sharing:
      • Identify what/who they are in your organization.
      • Champion and facilitate information sharing.
      • Create a forum for shared decision-making.
  4. • Defining ‘Disaster’
      • Disaster Recovery is a subset of Business Continuity
      • Business Continuity Planning helps achieve Organizational Resilience
      • The Business Continuity Planning Cycle
  5. • What if a natural disaster struck a data center, rendering critical IT services unavailable?
      • What if a terrorist attack targeted an overseas regional center?
      • What if a pandemic threatened global operations for your mission?
  6. • The occurrence of some events could cause a temporary disruption of mission-critical services.
      • Some scenarios could actually result in long-term loss of mission-critical capacity.
      • The ‘unthinkable’ might include disruption or shutdown of the programs that these services and capacity support.
  7. Disaster – an event, which causes the loss of an essential service, or part of it, for a length of time which imperils mission achievement. ― Andrew Hiles, Business Continuity: Best Practices
  8. Disaster – An event that compromises an organization’s ability to provide critical functions, processes, or services for some unacceptable period of time. ― Disaster Recovery Journal
  9. Disaster Recovery Planning: The activities associated with the continuing availability and restoration of the IT infrastructure. ― BCI Dictionary of BC Management Terms
  10. Organizations that experience major data loss without disaster recovery plans:*
      • 43% never reopen
      • 51% close within two years
      • 6% survive long-term
      * Cummings, Haag, & McCubbrey (2005). Management Information Systems for the Information Age.
  11. Business Continuity Planning is the process of developing prior arrangements and procedures that enable an organization to respond to an event in such a manner that critical business functions can continue within planned levels of disruption. The end result of the planning process is the BC Plan. ― BCI Dictionary of BC Management Terms
  12. A resilient organization is one that is able to achieve its core objectives in the face of adversity. ―
  13. The Business Continuity Planning Cycle: Project Initiation → Risk Assessment → Business Impact Analysis → Business Continuity Strategies → Business Continuity Plan Development → Training, Testing, Evaluation
      Business Continuity Planning is project-oriented, iterative, ongoing, and multi-phased, and it requires testing.
  14. “It does not do to leave a live dragon out of your calculations, if you live near him.” ― Gandalf in ‘The Hobbit’, by J.R.R. Tolkien
      Two questions to ask:
      • Is he alive?
      • Does he live near you?
  15. • The Threat Equation
      • Determine Relevant Threats
      • Threats, Vulnerability, and Probability
  16. The Threat Equation: Risk = Threat + Probability + Vulnerability + Impact
  17. Natural/Environmental Threats: Fire • Flood • Hurricane • Pandemic • Winter storm • Tornado • Lightning • Drought • Earthquake • Volcano • Tsunami
      Human Threats: Fire (accidental or arson) • Cyber-attack • Data theft or loss • Terrorist attack • Sabotage/Vandalism • Workplace violence • Civil unrest • Coup d'état • Civil war • Chemical or biological hazard
      Infrastructure Threats: Power grid failure • Petroleum supply disruption • Food or water contamination • Public utility failure (water, sewer, etc.) • Heating/Cooling system failure (affects IT & people) • Public transport disruption
  18. • Threat Assessment: Determine the most relevant threats, e.g., dangerous lightning occurs frequently in your location.
      • Probability Assessment: A high frequency of electrical storms means a high probability of a lightning strike.
      • Vulnerability Assessment: A lack of lightning suppression means high vulnerability to a lightning strike.
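As a rough illustration of how the threat equation and the three assessments might be combined, the Python sketch below rates each factor on a hypothetical 1-to-5 scale and adds the ratings, following the slide's additive form. The scale, the sample ratings, and the names ThreatRating and rank_threats are assumptions for illustration only; many risk frameworks multiply the factors rather than summing them.

```python
from dataclasses import dataclass


# Hypothetical 1-5 rating for each factor (1 = low, 5 = high).
@dataclass
class ThreatRating:
    name: str
    threat: int         # relevance of the threat at this location
    probability: int    # likelihood that it occurs
    vulnerability: int  # how exposed we are if it occurs
    impact: int         # severity of the resulting disruption

    def risk(self) -> int:
        # Additive form, per the slide: Risk = Threat + Probability + Vulnerability + Impact
        return self.threat + self.probability + self.vulnerability + self.impact


def rank_threats(ratings: list[ThreatRating]) -> list[tuple[str, int]]:
    """Return (name, risk score) pairs, highest risk first."""
    return sorted(((r.name, r.risk()) for r in ratings),
                  key=lambda pair: pair[1], reverse=True)


ratings = [
    # Frequent storms (high probability) and no suppression (high vulnerability).
    ThreatRating("Lightning strike", threat=4, probability=5, vulnerability=5, impact=4),
    ThreatRating("Earthquake", threat=1, probability=1, vulnerability=3, impact=5),
    ThreatRating("Power grid failure", threat=4, probability=4, vulnerability=3, impact=4),
]

for name, score in rank_threats(ratings):
    print(f"{name}: {score}")
```

The ranking, not the absolute score, is what matters here: it points planning effort at the threats that are both relevant and likely to hurt.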
  19. • What is Business Impact Analysis?
      • Impact Rating System
      • Defining ‘Mission-Critical’
      • Recovery Point Objective
      • Recovery Time Objective
  20. The process of analyzing business functions and the effect that a business disruption might have upon them. ― Business Continuity Institute
  21. A process used to identify and prioritize:
      • Critical business functions and processes
      • Essential IT services and data
      • Required staff and equipment
  22. • Identify mission-critical business functions.
      • Which ones require the highest level of risk mitigation?
      • Determine the impact of disruptions over time.
      • Establish recovery priorities in case of disruption.
  23. Timeline: Point of last data backup → Disaster strikes! → Systems fully recovered. The span from the last backup to the disaster is the data at risk (governed by the RPO); the span from the disaster to full recovery is the downtime (governed by the RTO).
      • RPO (Recovery Point Objective): The maximum data loss, measured as time before the disruption, that an organization will tolerate. Data and systems must be restored to this point after a disruption.
      • RTO (Recovery Time Objective): The maximum period of time that an organization accepts for recovery of business functions, systems, and processes.
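A minimal sketch, assuming periodic backups, of how a system's RPO and RTO could be checked against its actual backup schedule and estimated recovery time. With periodic backups, the worst-case data loss is roughly one backup interval (a disaster striking just before the next backup runs). The function names and the sample 4-hour RPO and 24-hour RTO are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # A disaster just before the next backup loses everything written
    # since the previous backup (backup duration is ignored here).
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    return worst_case_data_loss(backup_interval) <= rpo


def meets_rto(estimated_recovery: timedelta, rto: timedelta) -> bool:
    return estimated_recovery <= rto


# Hypothetical system: 4-hour RPO, 24-hour RTO.
rpo = timedelta(hours=4)
rto = timedelta(hours=24)
print(meets_rpo(timedelta(hours=24), rpo))  # nightly backups: up to 24 h of data at risk
print(meets_rpo(timedelta(hours=1), rpo))   # hourly backups
print(meets_rto(timedelta(hours=36), rto))  # 36 h rebuild exceeds the RTO
```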
  24. • List of mission-critical business functions and their IT dependencies
      • Recovery Time Objectives (RTOs) for these priorities
      • Recovery Point Objectives (RPOs) for IT assets
      • Recovery priorities: What do you recover first?
  25. • Mission-critical business functions are those whose sustained failure could severely impair the business or lead to its imminent failure.
      • Examples of disruptions to business functions/processes:
        o Inability to meet employee payroll
        o Inability to process critical bank transfers
        o Corruption of critical financial data
  26. • Mission-critical refers to any network, system, or application whose sustained failure would severely disrupt business operations.
      • Examples of disruptions to technology functions include:
        o Fire in a server room (destroying critical data)
        o Storm causing a sustained power & Internet outage
        o Ransomware encrypting critical files on a data server
  27. • Department Managers should decide: They understand which processes and services are most critical to their department’s successful operation.
      • The Directors should decide: They know what is critical to the accomplishment of the overall corporate mission.
      • Do the IT Admins decide? Often, without clear direction from leadership, they decide by default which data and services are critical to protect.
      • Point: Leadership should decide what is mission-critical, and IT staff should implement those decisions.
  28. Impact ratings and recovery times*:
      • Mission-Critical: 12-48 hours. Highest priority for rapid recoverability.
      • Vital: 3-5 days. Essential to operations but not as critical.
      • Important: 1-4 weeks. Long-term absence has an eventual impact.
      • Minor: Months. Absence causes minimal impact.
      * Recovery times shown are arbitrary and will vary greatly depending on the type of business.
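One way to turn these ratings into recovery priorities (What do you recover first?) is to classify each business function by its RTO and recover in ascending RTO order. This Python sketch uses the slide's illustrative ranges; because those ranges leave gaps (for example, between 48 hours and 3 days), it assumes each tier covers everything up to its upper bound. The function names and sample RTOs are hypothetical.

```python
# Upper bounds in hours, taken from the illustrative ranges on the slide.
# Gaps between ranges are closed by treating each bound as inclusive.
TIERS = [
    ("Mission-Critical", 48),     # 12-48 hours
    ("Vital", 5 * 24),            # 3-5 days
    ("Important", 4 * 7 * 24),    # 1-4 weeks
]


def classify(rto_hours: float) -> str:
    for tier, upper_bound in TIERS:
        if rto_hours <= upper_bound:
            return tier
    return "Minor"  # months


def recovery_order(functions: dict[str, float]) -> list[tuple[str, str]]:
    """Sort functions by RTO (shortest first) and attach each one's tier."""
    ordered = sorted(functions.items(), key=lambda item: item[1])
    return [(name, classify(rto)) for name, rto in ordered]


# Hypothetical RTOs in hours.
functions = {"Payroll": 72, "Bank transfers": 24, "Newsletter": 2000, "Reporting": 336}
for name, tier in recovery_order(functions):
    print(f"{name}: {tier}")
```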
  29. • Identify low-hanging fruit
      • Group related functions together
      • Deciding what to protect
  30. Use the Risk Assessment together with your Recovery Priorities to identify the risk mitigations that will produce the greatest positive impact for the least investment.
  31. • Example: The Risk Assessment discovers that an IT system and its only backup device are stored in the same room. The Recovery Priorities mark this as a mission-critical system.
      • Solution: Move the backup device to another building on campus.
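To find the low-hanging fruit, one could score each candidate mitigation by its estimated risk reduction, weighted by the recovery priority of the system it protects, per unit of cost. The sketch below is one such heuristic, not a prescribed method; the weights, costs, and risk-reduction figures are hypothetical, and the first entry mirrors the backup-device example.

```python
from dataclasses import dataclass

# Hypothetical weights: higher recovery priority counts for more.
PRIORITY_WEIGHT = {"Mission-Critical": 4, "Vital": 3, "Important": 2, "Minor": 1}


@dataclass
class Mitigation:
    description: str
    system_tier: str       # recovery priority of the protected system
    risk_reduction: float  # estimated drop in risk score
    cost: float            # estimated cost, in any consistent unit

    def value(self) -> float:
        # Weighted risk reduction per unit of investment.
        return self.risk_reduction * PRIORITY_WEIGHT[self.system_tier] / self.cost


mitigations = [
    Mitigation("Move backup device to another building", "Mission-Critical", 6, 500),
    Mitigation("Install lightning suppression", "Vital", 8, 5000),
    Mitigation("Replace newsletter server", "Minor", 3, 2000),
]

ranked = sorted(mitigations, key=Mitigation.value, reverse=True)
for m in ranked:
    print(f"{m.description}: {m.value():.4f}")
```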
  32. • Don’t attempt to identify every single business function/process, every single server, etc.
      • Instead, identify RELATED business functions and GROUP them together into LOGICAL SYSTEMS.
      • Example: FINANCE SYSTEM. All the component parts of that system have to be working for the system to function:
        o Servers
        o Processes
        o Staff roles
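The idea of a logical system can be sketched as a set of components (servers, processes, staff roles) that must all be available for the system to function. In this hypothetical Python model, the Finance System's component names are invented for illustration; the point is that losing any one component makes the whole logical system unavailable.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalSystem:
    name: str
    servers: set[str] = field(default_factory=set)
    processes: set[str] = field(default_factory=set)
    staff_roles: set[str] = field(default_factory=set)

    def components(self) -> set[str]:
        return self.servers | self.processes | self.staff_roles

    def missing(self, available: set[str]) -> set[str]:
        """Components that are currently unavailable."""
        return self.components() - available

    def is_functional(self, available: set[str]) -> bool:
        # The system works only if every component is available.
        return not self.missing(available)


finance = LogicalSystem(
    "Finance System",
    servers={"accounting-db", "file-server"},
    processes={"payroll run", "bank transfer approval"},
    staff_roles={"finance manager", "accountant"},
)

available = finance.components() - {"accounting-db"}  # one server is down
print(finance.is_functional(available))
print(finance.missing(available))
```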
  33. • Your Business Impact Analysis identifies mission-critical functions and ranks which should be recovered first.
      • Choose five systems that you think are among the most mission-critical.
      • Choose one system from that list of five, and develop System Documentation and Recovery Procedures for that one system.
      • Apply lessons learned from this example to your remaining systems.
  34. For example:
      • Finance Systems
      • Project Funding
      • Power Systems
      • Network
      • Data Storage and Backup Systems
  35. • For in-house hosted systems, use both local backup and cloud backup.
        o Examples of cloud-based backup vendors: CrashPlan (Business/Enterprise), Carbonite.
      • For cloud-based systems (e.g., Google Drive), use cloud-to-cloud backup.
        o Examples of cloud-to-cloud backup vendors: Datto/Backupify, Spanning.
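The backup guidance above can be expressed as a simple rule check over a system inventory: in-house hosted systems need both local and cloud backup, and cloud-based systems need cloud-to-cloud backup. The inventory and the labels "local", "cloud", and "cloud-to-cloud" are hypothetical; this sketch does not model any particular vendor's product.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    hosting: str       # "in-house" or "cloud"
    backups: set[str]  # backup types in place: "local", "cloud", "cloud-to-cloud"


# Required backup types per hosting model, following the guidance above.
REQUIRED = {
    "in-house": {"local", "cloud"},
    "cloud": {"cloud-to-cloud"},
}


def backup_gaps(systems: list[System]) -> dict[str, set[str]]:
    """Map each under-protected system to the backup types it lacks."""
    gaps = {}
    for s in systems:
        missing = REQUIRED[s.hosting] - s.backups
        if missing:
            gaps[s.name] = missing
    return gaps


inventory = [
    System("Finance server", "in-house", {"local"}),
    System("Shared documents", "cloud", set()),
    System("Project database", "in-house", {"local", "cloud"}),
]
print(backup_gaps(inventory))
```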
  36. • Minimum Deliverables
      • Recovery Priorities
      • Recovery Operations Leader
      • Response and Recovery Teams
      • Guidelines for Writing Recovery Documentation
      • Templates for System Documentation and Recovery Procedures
  37. Business Continuity Theory
      Priorities:
        o Mission-critical systems identified and prioritized
        o RPOs and RTOs established
        o Vital records, databases, IT services
      Teams:
        o Designated roles and responsibilities
        o Contact information
      Procedures:
        o Recovery procedures for mission-critical functions, processes, and systems
        o Business owners test/certify recovered systems
      • Plan Activation: Transition point from emergency response to plan activation
      • Declaration: Criteria for escalating from a disruptive event to a disaster
  38. Recovery Operations Leader:
      • Has authority to declare an emergency or disaster
      • Can direct staff to stay home or move to an alternate location
      • Allows departmental staff and the IT Department to focus on the recovery effort
  39. Response Team: Primary focus is on the initial response to the event.
      • Ensures safety
      • Secures IT assets
      • Gives preliminary recovery time estimates
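The slides list declaration criteria as a deliverable but do not define them. As one possible rule, assumed here rather than taken from the slides, a disaster could be declared when the preliminary recovery estimate for a system exceeds that system's RTO, which matches the idea of a disruption lasting an unacceptable period of time. A minimal Python sketch with hypothetical systems and times:

```python
from datetime import timedelta


def should_declare_disaster(estimated_outage: timedelta, rto: timedelta) -> bool:
    # Assumed criterion: the preliminary recovery estimate exceeds the RTO.
    return estimated_outage > rto


def triage(estimates: dict[str, timedelta], rtos: dict[str, timedelta]) -> list[str]:
    """Return the systems whose estimated outage exceeds their RTO."""
    return [name for name, est in estimates.items()
            if should_declare_disaster(est, rtos[name])]


# Hypothetical RTOs and preliminary estimates from the response team.
rtos = {"Finance System": timedelta(hours=24), "Email": timedelta(days=3)}
estimates = {"Finance System": timedelta(hours=30), "Email": timedelta(hours=12)}
print(triage(estimates, rtos))
```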
  40. Disaster Recovery Team: Focus is on recovery from a disruptive event.
      • Business: Recover business functions (activity of the business owner)
      • Information Technology: Recover IT systems (activity of IT Administrators)
      • Validation: Departmental staff validate the functionality of services
  41. • Documentation should be developed by the system owner, i.e., the person who by default manages that system.
      • In other words, don’t assign one individual to write all the documentation; task the people responsible for those systems to write it.
  42. The person writing the recovery procedure should write it with the following assumptions:
      • The person performing the recovery is not normally responsible for this service.
      • The person performing the recovery has sufficient competence.
  43. Recovery Procedure for Business Functions: Include key staff roles and the actual recovery procedure.
  44. Recovery Procedure for IT Systems: Include the business functions served and the actual recovery procedures.
  45. • Do not attempt to design for the worst-case scenario.
      • Initially, focus your efforts on recovery from smaller-scale events, e.g.:
        o Finance Systems have failed and need to be rebuilt and tested.
      • Grow your planning efforts to handle more disastrous events, e.g.:
        o A fire has destroyed your data center; all in-house hosted systems are down and need to be rebuilt.
  46. Exercise your staff and your plan to develop a true recovery capability!
  47. • The goal of testing your disaster recovery plan is not to find out if it works, but to determine where it fails.
      • A planned test should never, never cause a business interruption!
        o Don’t lose your data in the process of testing your data recovery plan!
        o Don’t shut down a mission-critical service as a result of the test.
  48. • Testing the disaster recovery plan reveals weaknesses and also trains staff.
        o As you execute the test, weaknesses are revealed.
        o The staff evaluates the results of the test, and this helps staff ‘own’ the plan.
      • Training staff helps test your plan.
        o As you describe the plan to your staff, they may notice inconsistencies or weaknesses.
  49. • Document Review: Validate the disaster recovery plan through stakeholders’ review of the recovery documentation.
      • Table-Top Testing: A simple walkthrough of the plan in a safe environment, e.g., a conference room.
      • Advanced Table-Top Exercise: A directed simulation of activating the disaster recovery plan against a specific business disruption scenario.
      • Component Test: Evaluation of a single threat event impacting a single mission-critical function.
      • Comprehensive Simulation: An exercise to evaluate overall recovery capability in a high-stress environment.
  50. “Good planning and hard work lead to prosperity, but hasty shortcuts lead to poverty.” ― Proverbs 21:5
      “You can't plow a field simply by turning it over in your mind.” ― Gordon B. Hinckley
  51. • Business Continuity Planning focuses on recovery of the broader business; Disaster Recovery Planning focuses on recovery of its IT infrastructure.
      • The two are closely related and use a nearly identical process; realistically, you cannot fully consider one without the other.
      • Use the Risk Assessment with your Recovery Priorities to identify high-ROI risk mitigations.
      • Identify RELATED business functions and GROUP them together into LOGICAL SYSTEMS.
      • Pick the top five business functions and/or IT systems, then pick one and complete the BC/DR planning process for that one.
      • Then apply lessons learned to the other four, and eventually to the remaining systems.
  52. References for further research
  53. • Business Continuity Institute: six ‘Good Practice Guidelines’
      • Disaster Recovery Institute International: ten professional practices
      • The International Consortium for Organizational Resilience
      • Disaster Recovery Journal
  54. • Dictionary of Business Continuity Management Terms: Business Continuity Institute (BCI)
      • International Glossary for Resiliency, maintained by DRI International
      • Business Continuity Glossary, Disaster Recovery Journal (DRJ)
  55. • ISO 22301:2012, Societal security. This has emerged as the predominant ‘gold standard’.
      • NIST Special Publication 800-34 Rev. 1, Contingency Planning Guide for Federal Information Systems