Structured Approach to IT Business System Availability and Continuity Planning, Analysis and Design
Transcript

  • 1. Structured Approach to IT Business System Availability and Continuity Planning, Analysis and Design Alan McSweeney
  • 2. Objectives • To provide details on a structured approach to analyse and define availability and continuity requirements for IT systems • To provide background information on the changing landscape of availability and continuity February 18, 2010 2
  • 3. Agenda • Availability and Continuity Overview • Availability Management • Continuity Management • Summary
  • 4. Availability and Continuity • Availability is the ability of a system or service to perform its required function at a stated instant or over a stated period of time. • Availability is expressed as the availability ratio − The proportion of time that the service is actually available for use by the customers within the agreed service hours • Continuity is concerned with preparing to address unwanted occurrences − May relate to the recovery of IT systems or entire business processes. • Continuity is concerned with ensuring that IT services are recovered within agreed timescales • Availability is a superset of Continuity and encompasses the continued operation of systems in the event of a disaster • Continuity ensures availability in extreme circumstances • Availability defines what is to be available in these extreme circumstances
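A minimal sketch of the availability ratio described on slide 4, assuming it is computed as (agreed service hours − downtime) / agreed service hours. The function name and the sample figures are illustrative assumptions, not taken from the presentation.

```python
def availability_ratio(agreed_service_hours: float, downtime_hours: float) -> float:
    """Proportion of the agreed service hours during which the service was usable."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed_service_hours must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime_hours must lie within the agreed service hours")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours


# Illustrative figures: 12 agreed hours a day over 20 working days, 3 hours of outage.
ratio = availability_ratio(agreed_service_hours=12 * 20, downtime_hours=3)
print(f"{ratio:.4%}")  # prints 98.7500%
```

Only downtime inside the agreed service hours counts against the ratio, which is why the denominator is the agreed hours and not calendar time.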
  • 5. Availability and Continuity Relationship (diagram) • Availability provides availability criteria to Continuity • Continuity provides business impact analysis to Availability
  • 6. Availability and Continuity Relationships with Other IT Management Processes (diagram) • Service Planning and Management − Ensures that continuity and availability are incorporated into service agreements and provisions • Finance Management − Puts a cost on lack of availability − Controls expenditure on availability and continuity • Security Management − Controls security that may impact continuity and availability • Capacity Planning and Management − Defines the capacity required for continuity and availability • IT Architecture − Ensures systems and infrastructure are designed to incorporate continuity and availability • Change Management − Controls change that may impact availability or require continuity to be invoked • Availability provides availability criteria to Continuity; Continuity provides business impact analysis to Availability
  • 7. Availability and Continuity • Availability − Defines availability of service during operating hours • Under normal circumstances • Under extraordinary circumstances • Continuity − Defines continued operations of critical services and their availability • Time until services are available and state of service after recovery • Under extraordinary circumstances
  • 8. Availability and Continuity (diagram) • Availability of services during normal operations, Primary IT Facilities − Service 1: Components 1, 2, 3 − Service 2: Components 1, 4, 5 − Service 3: Components 1, 5, 6 − Service 4: Components 1, 2, 7 • Continuity of operations links the two facilities • Availability of services after continuity, Recovery IT Facilities − Service 1: Components 1, 2, 3 − Service 3: Components 1, 5, 6
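The service and component layout on slide 8 can be expressed as data to work out which components the recovery facility must host. This is a sketch only: the mapping is transcribed from the diagram, while the function and variable names are assumptions made for illustration.

```python
# Service-to-component mapping transcribed from the slide's diagram.
primary = {
    "Service 1": {"Component 1", "Component 2", "Component 3"},
    "Service 2": {"Component 1", "Component 4", "Component 5"},
    "Service 3": {"Component 1", "Component 5", "Component 6"},
    "Service 4": {"Component 1", "Component 2", "Component 7"},
}
# Services shown at the Recovery IT Facilities.
recovered_services = ["Service 1", "Service 3"]


def components_needed(services, catalogue):
    """Union of the components required by the named services."""
    needed = set()
    for name in services:
        needed |= catalogue[name]
    return needed


recovery_components = components_needed(recovered_services, primary)
not_recovered = components_needed(primary, primary) - recovery_components
print(sorted(recovery_components))
print(sorted(not_recovered))
```

Shared components (Component 1 here) only need to be provisioned once at the recovery site, which is one reason to document components per service before sizing continuity.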
  • 9. Availability and Continuity: Full View of Availability (diagram) • Primary IT Facilities − Service 1: Components 1, 2, 3 − Service 2: Components 1, 4, 5 − Service 3: Components 1, 5, 6 − Service 4: Components 1, 2, 7 • Continuity of operations links the two facilities • Recovery IT Facilities − Service 1: Components 1, 2, 3 − Service 3: Components 1, 5, 6
  • 10. Availability and Continuity (diagram relating Business Continuity to three areas) • Continuous Operation − Non-disruptive system maintenance such as data backup combined with continuous availability of agreed business systems • High Availability − Fault-tolerant, failure-resistant infrastructure supporting continuous availability of agreed business systems • Disaster Recovery − Protection against unplanned outages such as disasters through reliable and predictable recovery and continuity of operations
  • 11. Availability and Continuity (diagram: range from Availability to Continuity) • Availability During Normal Operations • Availability During Housekeeping and Maintenance Operations • Availability After Some Component Failures • Availability After Complete Failure of Primary Facility
  • 12. Availability and Continuity Heat Map (diagram) • RPO axis: Recovery Point Objective (RPO) – Amount of Data Loss Tolerable After Recovery: Days, Hours, Minutes, Last Transaction • RTO axis: Recovery Time Objective (RTO) – Time to Recover Service/Time By Which Service Needs to be Recovered: Days, Hours, Minutes, Seconds, Instantly • Arrow: Increasing Availability (and Continuity) Requirements
  • 13. RTO and RPO • Recovery Point Objective (RPO) − Amount of Data Loss Tolerable After Recovery • Either amount of data immediately available after recovery or amount of data available for some time after recovery • Can be different • Provide some data for minimal operations initially • Provide more/all data • Recovery Time Objective (RTO) − Time to Recover Service/Time By Which Service Needs to be Recovered
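One way to make RPO and RTO concrete is to express both as time windows and compare what a recovery arrangement achieves against what the business requires. The sketch below assumes that representation; the class, the function and the sample durations are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    rpo: timedelta  # maximum tolerable data loss, expressed as a time window
    rto: timedelta  # maximum tolerable time to restore the service


def meets(required: RecoveryObjective, achieved: RecoveryObjective) -> bool:
    """True when the achieved data loss and recovery time are both within the requirement."""
    return achieved.rpo <= required.rpo and achieved.rto <= required.rto


# Hypothetical requirement: lose at most 15 minutes of data, be back within 4 hours.
required = RecoveryObjective(rpo=timedelta(minutes=15), rto=timedelta(hours=4))
nightly_backup = RecoveryObjective(rpo=timedelta(hours=24), rto=timedelta(hours=8))
replication = RecoveryObjective(rpo=timedelta(minutes=5), rto=timedelta(hours=1))

print(meets(required, nightly_backup))  # False
print(meets(required, replication))     # True
```

Where the initial and eventual RPO differ, as the slide allows, the same check can be run twice with two separate requirements.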
  • 14. RTO and RPO With Cost of Lack of Availability (diagram with three axes) • Recovery Point Objective (RPO) – Amount of Data Loss Tolerable After Recovery • Recovery Time Objective (RTO) – Time to Recover Service/Time By Which Service Needs to be Recovered • Cost of Lack of Availability of Service/Cost Benefit of Providing High Availability and High Continuity • Highlighted region: Business Critical Services Requiring Immediate Access With Very Limited/No Data Loss and Requiring Continued Operation in the Event of a Disaster • Add extra dimension to Availability and Continuity Heat Map to allow for explicit identification of those systems that need to be continuously available
  • 15. What is a Business Critical Application? • Applications deemed business/mission critical − 2006 – 16% − 2007 – 36% − 2008 – 56% − 2009 – 60% • Availability and continuity are merging as most applications are being deemed mission critical
  • 16. How Often Have You Had to Invoke a Continuity Plan in the Last Five Years? (chart) Once 14%; Twice 6%; Three 3%; Four 2%; Five or More 2%; None 73% • 27% of organisations have declared at least one disaster in the last five years
  • 17. What Were the Causes of Having to Invoke Continuity Plans? (chart) Power Failure 22.5%; Hardware Failure 16.6%; Network Failure 11.2%; Software Failure 8.9%; Human Error 8.4%; Flood 6.3%; Other 6.3%; Hurricane 5.6%; Fire 3.9%; Winter Storm 3.5%; Terrorism 1.9%; Not Specified 1.9%; Earthquake 1.5%; Tornado 1.1%; Chemical Spill 0.4%
  • 18. Continuity Testing Seen as Disruptive • 40% of organisations state that continuity testing impacts customers • 32% of organisations state that continuity testing impacts sales • Reasons for lack of testing − Lack of time and resources − Lack of technology − Disruption to employees − Budget − Disruption to customers − Disruption to sales − Disruption to production systems − Not seen as a priority
  • 19. Business Impact of Lack of Availability and Continuity Increases Exponentially Over Time (chart) • Axes: Financial Loss against Duration of Loss of Continuity (Seconds, Minutes, Hour, Hours, Day, Days) • Plotted: Revenue Loss, Staff Productivity Loss, Reputational Damage, Financial Performance
  • 20. Availability Design and Management • Availability design optimises the capability of the IT infrastructure, services and supporting organisation to deliver a cost effective and sustained level of availability that enables the business to satisfy its business objectives − Ensures IT systems and infrastructure are designed to deliver the levels of availability required by the business − Provides a range of availability reporting to ensure that agreed levels of availability are continuously measured and monitored − Optimises the availability of the IT infrastructure to deliver cost effective improvements that deliver real benefits to the business − Ensures shortfalls in availability are recognised and corrective actions are identified and performed − Reduces problems and incidents that impact availability − Creates and maintains an Availability Plan aimed at improving the overall availability and infrastructure components to ensure business availability requirements can be satisfied
  • 21. Continuity Design and Management • Continuity design is concerned with responding to and recovering business operations in the event of an outage or disaster that has a significant impact on the organisation − Supports the business by ensuring that the required IT facilities can be recovered within required and agreed business timescales − Provides the strategic and operational framework to review the way the organisation continues to provide its services while increasing its ability to recover from disruption, interruption or loss − Depends both on management and operations − Requires management commitment
  • 22. People, Process, Technology • Start availability and continuity design with a business impact analysis and risk assessment • Technology exists to support availability and continuity design, but technology does not constitute a plan • Focus on prevention before investing in technology • However, availability and continuity are seen as the preserve of IT − The business frequently does not have the required project focus or experience • Embed availability and continuity into IT architectures
  • 23. Questions • Do you have adequate control over prevention of business process or IT infrastructure downtime? • Do you have adequate IT capabilities to ensure continuous operations? • Do you know the risks your business and its business systems face? • What would the cost and impact of downtime be to your business? • Is your current continuity plan sufficient to meet your RPO and RTO objectives? • Do you know how much business continuity will cost? • What business problems will implementing availability and continuity solve even if you do not experience an unplanned IT outage? • What is the overall business value of availability and continuity to the business? • How should we define what level of business continuity we really need?
  • 24. Availability Design and Management
  • 25. Availability Design and Management Process (diagram) • Availability Design and Management consists of two parallel sub-processes • Availability Process Quality Control − 1. Availability Reporting − 2. Availability Report Evaluation and Improvement − 3. Management Escalations of Service Availability Violations • Availability Process Design and Management − 1. Availability Requirements Analysis − 2. Document System and Application Architecture − 3. Gap Analysis and Recommendations − 4. Availability Review
  • 26. Structured Approach to Availability Design and Management • Can be used for an individual system or application or a service that consists of a number of systems or applications or the entire IT landscape • Scope is to define a plan to implement agreed availability
  • 27. Scope of Availability Design and Management • Planning for service availability • Designing for service availability by anticipating disruptions, estimating and measuring reliability and maintainability • Planning for availability within SLAs and reporting on them • Ensuring cost effectiveness of availability solutions • Reducing the duration of problems and incidents affecting availability • Ensuring that security requirements are defined and incorporated within the overall availability design
  • 28. Availability Design and Management Driven by Requirements • Availability requirements are based on the needs of the business • Requirements are gathered, defined, and validated by the key users and business management • Includes hours of uptime as well as planned and unplanned downtime • Includes ongoing support and procedures to address service disruptions
  • 29. Benefits of a Structured Approach to Availability Design and Management • Reduce Risks − SLAs will incorporate availability design based on architecture − Reduced risk of violating SLAs • Cost Reduction − A defined and agreed acceptable level of service prevents over-delivery − Unnecessary expenditure on maintenance and resilience building is avoided • Improved Service Agility − Changing business availability requirements are addressed quickly − Cost of changes in availability of different levels is defined or can be assessed quickly • Improved Service Quality − Improvement in service quality results from reduced incidents as well as a reduced time to restore service
  • 30. Structured Approach to Availability Design and Management: Availability Analysis and Design
    1. Availability Requirements Analysis
       1.1 Understand Service Goals
       1.2 Document Availability Requirements
       1.3 Validate with Service Level Management Function
    2. Document System and Application Architecture
       2.1 Define Service Critical Components
       2.2 Document Service Critical Components and Their Relationships
       2.3 Document and Review Components Monitoring Capability
       2.4 Document System and Application Architecture
    3. Gap Analysis and Recommendations
       3.1 Perform Gap and Risk Analysis
       3.2 Identify Single Points of Failure
       3.3 Evaluate Alternative Approaches and Costs
       3.4 Produce Gap Closure Recommendation and Specification
       3.5 Plan and Summarise Downtime
       3.6 Create Statement of Work to Implement
    4. Availability Review
       4.1 Define Availability Measurement Model
       4.2 Perform Trend Analysis
       4.3 Analyse Expanded Incident Lifecycle
       4.4 Investigate Major Outages
       4.5 Analyse Availability Reports
  • 31. Step 1 - Availability Requirements Analysis
    1. Availability Requirements Analysis
       Scope: Determine availability requirements related to supporting the needs of the business; validate with other IT management processes; create draft service agreement and assess for feasibility from availability perspective
       Inputs: Request for new service or changes to existing service; request for change to availability
       Outputs: Documented and agreed availability requirements
    1.1 Understand Service Goals
       Scope: Document business goals for the service
       Inputs: Service design specification
       Outputs: Documented and agreed business goals
    1.2 Document Availability Requirements
       Scope: Produce draft availability requirements based on understanding of business goals
       Inputs: Draft service level agreement
       Outputs: Documented and agreed availability requirements
    1.3 Validate with Service Level Management Function
       Scope: Validate draft availability requirements with service level agreements and overall service management plan
       Inputs: Overall service management plan
       Outputs: Validated availability requirements
  • 32. Step 2 - Document System and Application Architecture
    2. Document System and Application Architecture
       Scope: Analyse operating environment of the individual components that comprise the service
       Inputs: Service design specification; configurations of individual components that comprise the service level agreement
       Outputs: Documented and agreed existing architecture for service delivery
    2.1 Define Service Critical Components
       Scope: Define the configurations of individual components that comprise the service
       Inputs: Service design specification; configurations of individual components that comprise the service
       Outputs: Documented and agreed list of individual components that comprise the service
    2.2 Document Service Critical Components and Their Relationships
       Scope: Document the structure of the service breakdown - individual components and relationships that deliver the service
       Inputs: Configurations of individual components, their attributes and their relationships
       Outputs: Representation of individual components, their attributes and relationships
    2.3 Document and Review Components Monitoring Capability
       Scope: Review existing service monitoring facilities and update or replace if required
       Inputs: Existing service monitoring procedures
       Outputs: Defined service monitoring criteria
    2.4 Document System and Application Architecture
       Scope: Complete architecture document that describes how the service is delivered according to the service level agreement
       Inputs: Representation of individual components, their attributes and relationships; defined service monitoring criteria
       Outputs: Architecture document
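The "representation of individual components, their attributes and relationships" produced in step 2.2 could be held as a simple dependency map. The sketch below is one possible representation, not the presentation's own; every service and component name in it is hypothetical.

```python
# Hypothetical dependency map: item -> items it directly relies on.
depends_on = {
    "online ordering": ["web server", "order database"],
    "web server": ["network"],
    "order database": ["storage array", "network"],
    "storage array": [],
    "network": [],
}


def critical_components(service, depends_on):
    """All components a service relies on, directly or indirectly."""
    seen = set()
    stack = list(depends_on[service])
    while stack:
        item = stack.pop()
        if item not in seen:
            seen.add(item)
            stack.extend(depends_on.get(item, []))
    return seen


print(sorted(critical_components("online ordering", depends_on)))
```

Walking the relationships transitively matters because a service can fail through a component it never references directly, such as the storage array behind the database in this example.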
  • 33. Step 3 - Gap Analysis and Recommendations
    3. Gap Analysis and Recommendations
       Scope: Perform gap analysis and recommend suitable approach, create specifications and cost justification
       Inputs: Validated availability requirements; architecture document; service problem and incident history
       Outputs: Availability design
    3.1 Perform Gap and Risk Analysis
       Scope: Based on knowledge derived from incident and problem data, identify gaps in current services
       Inputs: Problem and incident data; availability requirements; architecture document
       Outputs: Gaps analysed and risks identified and documented
    3.2 Identify Single Points of Failure
       Scope: Identify individual components whose failure can cause service disruption
       Inputs: Component attributes and relationships
       Outputs: Identified points of failure
    3.3 Evaluate Alternative Approaches and Costs
       Scope: Explore various options within the approved range and identify a suitable approach based on requirements and cost justification
       Inputs: IT strategy and architecture; gaps analysed and risks identified and documented
       Outputs: Approach for required availability
    3.4 Produce Gap Closure Recommendation and Specification
       Scope: Decision on how the closure should be implemented based on financial and business reasons; develop specifications for the availability design and architecture
       Inputs: Approach for required availability; cost information
       Outputs: Decision on design and implementation; specifications for the availability design and architecture
    3.5 Plan and Summarise Downtime
       Scope: Plan downtime for components and aggregate downtime across services
       Inputs: Decision on design and implementation
       Outputs: Planned downtime
    3.6 Create Statement of Work to Implement
       Scope: Initiate project for implementing changes to address availability issues
       Inputs: Specifications for the availability design and architecture
       Outputs: Statement of work for project
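Step 3.2 can be illustrated with a small check over documented components. The sketch assumes that a component deployed as a single instance is a single point of failure for every service that depends on it, which is a simplification; the inventory, services and function name are hypothetical.

```python
# Hypothetical inventory: component -> number of independent instances deployed.
instances = {"web server": 2, "database": 1, "load balancer": 1, "storage array": 2}

# Hypothetical service breakdown: service -> components it depends on.
services = {
    "ordering": ["load balancer", "web server", "database"],
    "reporting": ["database", "storage array"],
}


def single_points_of_failure(services, instances):
    """Map each component with no redundancy to the services that depend on it."""
    spof = {}
    for service, components in services.items():
        for component in components:
            if instances[component] < 2:
                spof.setdefault(component, []).append(service)
    return spof


for component, affected in single_points_of_failure(services, instances).items():
    print(f"{component}: affects {', '.join(affected)}")
```

Listing the affected services next to each component gives a first ordering for step 3.3, since a component shared by several services is usually the more valuable one to make redundant.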
  • 34. Step 4 - Availability Review
    4. Availability Review
       Scope: Assess, review and update availability design if required
       Inputs: Incident, problem, fault reports
       Outputs: Identified availability concerns and amended design if required
    4.1 Define Availability Measurement Model
       Scope: Define availability measurement model
       Inputs: Documented and agreed availability requirements
       Outputs: Defined data sources for availability measurement
    4.2 Perform Trend Analysis
       Scope: Analyse incident and problem data to arrive at a high level view of availability
       Inputs: Incident and problem trend reports
       Outputs: Identified availability concerns
    4.3 Analyse Expanded Incident Lifecycle
       Scope: Analyse expanded incident lifecycle
       Inputs: Analyse breakdown of incident resolution to validate and update design considerations
       Outputs: Identified specific areas which need improvement
    4.4 Investigate Major Outages
       Scope: Investigate large outages and update availability design if required
       Inputs: Detailed incident analysis for specific incidents, fault, problems and performance reports
       Outputs: Identified availability concerns
    4.5 Analyse Availability Reports
       Scope: Review availability reports and update infrastructure if required
       Inputs: Availability reports
       Outputs: Identified availability concerns; statement of work for identified changes
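A trend analysis of the kind named in step 4.2 might start from an incident log and an agreed service window. The sketch below is an assumption about how that could be done; the log entries, the agreed minutes per month and the 99.2% target are invented for illustration.

```python
from collections import defaultdict

# Hypothetical incident log: (month, minutes of unplanned downtime).
incidents = [("2010-01", 30), ("2010-01", 45), ("2010-02", 120), ("2010-03", 15)]

# Assumed agreed service window: 20 working days of 10 service hours each.
AGREED_MINUTES_PER_MONTH = 20 * 10 * 60
TARGET = 0.992  # assumed availability target


def monthly_availability(incidents, agreed_minutes):
    """Availability ratio per month, computed from summed downtime."""
    downtime = defaultdict(int)
    for month, minutes in incidents:
        downtime[month] += minutes
    return {m: (agreed_minutes - d) / agreed_minutes for m, d in sorted(downtime.items())}


trend = monthly_availability(incidents, AGREED_MINUTES_PER_MONTH)
below_target = [month for month, ratio in trend.items() if ratio < TARGET]
for month, ratio in trend.items():
    print(month, f"{ratio:.3%}")
print("Months below target:", below_target)
```

Months with no recorded incidents do not appear in the result, so a fuller model would also enumerate the reporting period explicitly.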
  • 35. Core Principles • Core principles ensure consistency of work and outputs • Ensure processes will meet the requirements of the business • Work will be of a high quality • Core principles should serve as a checklist against which all work is assessed
  • 36. Availability Design and Management Core Principles 1. Availability requirements are based on the agreed and defined needs of the business 2. The IT function will determine the overall requirement of availability, performance and recoverability of systems under the terms of a service agreement with the business 3. Infrastructure needs to be designed to routinely incorporate availability requirements 4. The availability design and management process must adhere to security policies and procedures 5. An availability plan will be used to track and manage availability requirements and information collected 6. Data on service reliability, maintainability and resiliency must be collected and monitored 7. The IT function will use continuous process improvement to achieve and maintain the level of service availability 8. Planned downtime must be minimised for business-critical functions and unplanned downtime is handled by service management processes including Incident Management, Service Request Management and Continuity Management
  • 37. Core Principle 1 - Availability Requirements Are Based On The Agreed And Defined Needs Of The Business • Elements − Conditions for availability must be aligned with the needs of the business − Relevant availability data must be gathered and analysed − Input and validation of requirements must be solicited from the business − Availability requirements must be documented and distributed for agreement and approval • Benefits − Expectations are clearly defined and accepted − User satisfaction is increased − Growth can be forecast more easily − Problem areas can be identified
  • 38. Core Principle 2 - The IT Function Determines The Overall Requirement Of Availability, Performance And Recoverability Of Systems • Elements − Requirements are met under defined and agreed service agreements − Good working relationships need to exist with key suppliers and vendors − Changes to environment must be reflected in service agreements • Benefits − There is a structure of supporting contracts in place from suppliers and vendors to meet business availability requirements
  • 39. Core Principle 3 - Infrastructure Needs To Be Designed To Routinely Incorporate Availability Requirements • Elements − Changes in infrastructure and business needs must be reflected in availability planning and design − Availability and recovery requirements need to be explicitly incorporated at the design stage • Benefits − Availability requirements and expectations are clearly defined and accepted
  • 40. Core Principle 4 - Availability Design And Management Process Must Adhere To Security Policies And Procedures • Elements − Access to IT services must be provided in a secure environment − Availability processes must be aligned with security policies • Benefits − Security measures will be followed − There will be an ability to differentiate between security problems and availability problems
  • 41. Core Principle 5 - Availability Plan Will Be Used To Track And Manage Availability Requirements And Information Collected • Elements − An availability plan must be developed and distributed − Availability planning must be outlined − The availability plan must define the details about the data to be collected: what, how often, analysis, reporting, distribution, responses required • Benefits − Availability management goals are clearly defined and documented − There will be a clearly defined and communicated process for availability planning and reporting − Data is provided for availability reporting, analysis and forecasting
  • 42. Core Principle 6 - Data On Service Reliability, Maintainability, Resiliency Must Be Collected And Monitored • Elements − The data to be collected and monitored must be defined, documented and communicated − A supporting procedure to collect and monitor data, including response to potential problems, must be defined − Data needs to be reviewed on a regular and consistent basis • Benefits − Availability management will be proactive and responsive rather than reactive − The expectations of the business can be set accurately − There will be an ability to prepare for potentially increased future requirements − Availability trends can be identified and addressed
  • 43. Core Principle 7 - IT Function Will Use Continuous Process Improvement To Achieve And Maintain Level Of Service Availability • Elements − Collected availability data will be used to identify areas requiring improvement − Implementation of any availability process improvement must be controlled by the change management process to control impact • Benefits − The business is enabled to make recommendations on availability improvements
  • 44. Core Principle 8 - Planned Downtime Must Be Minimised For Business-Critical Functions And Unplanned Downtime Is Handled By Service Management Processes • Elements − Planned and unplanned downtime must be clearly notified to the business − Acceptable versus unacceptable unplanned downtime for business-critical functions must be defined − Escalation procedures will be developed and distributed • Benefits − Expectations are set with the business − IT demonstrates commitment to supporting business-critical functions
  • 45. Use Core Principles as Checklist for Independent Verification of Availability Design and Processes
    1 Availability requirements are based on the agreed and defined needs of the business
       1.1 Conditions for availability must be aligned with the needs of the business
       1.2 Relevant availability data must be gathered and analysed
       1.3 Input and validation of requirements must be solicited from the business
       1.4 Availability requirements must be documented and distributed for agreement and approval
    2 The IT function will determine the overall requirement of availability, performance and recoverability of systems under the terms of a service agreement with the business
       2.1 Requirements are met under defined and agreed service agreements
       2.2 Good working relationships need to exist with key suppliers and vendors
       2.3 Changes to environment must be reflected in service agreements
    3 Infrastructure needs to be designed to routinely incorporate availability requirements
       3.1 Changes in infrastructure and business needs must be reflected in availability planning and design
       3.2 Availability and recovery requirements need to be explicitly incorporated at the design stage
    4 Availability design and management process must adhere to security policies and procedures
       4.1 Access to IT services must be provided in a secure environment
       4.2 Availability processes must be aligned with security policies
  • 46. Continuity Design and Management
  • 47. Continuity Design and Management Process (diagram) • Continuity Design and Management consists of two parallel sub-processes • Continuity Process Quality Control − 1. Continuity Reporting − 2. Continuity Report Evaluation and Improvement − 3. Management Escalations of Service Continuity Violations • Continuity Process Design and Management − 1. Conduct Risk and Disaster Avoidance Assessment − 2. Conduct Business Impact Analysis − 3. Determine Data Backup and Recovery Options − 4. Form Continuity and Disaster Recovery Team − 5. Design and Develop Disaster Recovery Plan − 6. Continuity Processing for Critical Service Components − 7. Conduct Continuity and Disaster Recovery Rehearsal − 8. Maintain Continuity and Disaster Recovery Plan
  • 48. Structured Approach to Continuity Design and Management • Can be used for an individual system or application or a service that consists of a number of systems or applications or the entire IT landscape • Scope is to define a plan to implement agreed continuity
  • 49. Scope of Continuity Design and Management • Conducting impact analyses on loss of business systems • Designing for service continuity by anticipating disruptions, estimating and measuring reliability and maintainability • Supporting business critical functions • Designing and developing a Disaster Recovery Plan • Designing and developing Disaster Recovery Training • Planning for and performing disaster mitigation and avoidance • Assessing and managing risk
  • 50. Structured Approach to Continuity Design and Management: Continuity Analysis and Design
    1. Conduct Risk and Disaster Avoidance Assessment
       1.1 Identify Potential Threats
       1.2 Assess Probability of Threats
       1.3 Evaluate Current Disaster Avoidance Measures
       1.4 Assess Risk Controls to Mitigate Threats
       1.5 Determine Impact of Reduced Controls
       1.6 Determine Value of Additional Controls
    2. Conduct Business Impact Analysis
       2.1 Define Business Impact Analysis Methodology
       2.2 Identify Business Functions to be Analysed
       2.3 Define Business Function Criticality Categorisation
       2.4 Design Questions and Conduct Interviews
       2.5 Analyse Results of Interviews
       2.6 Summarise and Present Results
    3. Determine Backup and Recovery Options
       3.1 Identify Backup and Recovery Options for Critical Functions
       3.2 Evaluate Operation of Backup and Recovery Options
       3.3 Determine Backup and Recovery Options for Critical Functions
       3.4 Design Backup and Recovery Procedures
    4. Form Continuity and Disaster Recovery Team
       4.1 Define Recovery Team Structure
       4.2 Define Recovery Team Functions
       4.3 Define Team Leaders and Members
       4.4 Define Team Charter
    5. Design and Develop Disaster Recovery Plan (DRP)
       5.1 Determine DRP Structure and Methodology
       5.2 Define DRP Notification Schedule and Process
       5.3 Define DRP Escalation Process
       5.4 Define Key Recovery Objectives
       5.5 Define Recovery Steps
       5.6 Define Critical Function Restoration Process
    6. Continuity Processing for Critical Service Components
       6.1 Identify Critical Components for Continuity
       6.2 Develop Options for Continuity
       6.3 Develop Continuity Processing Steps
       6.4 Develop Return from Continuity Process
    7. Conduct Continuity and Disaster Recovery Rehearsal
       7.1 Design Rehearsal Programme
       7.2 Develop Rehearsal Scenarios
       7.3 Plan and Schedule Rehearsals
       7.4 Develop Rehearsal Evaluation Criteria
       7.5 Conduct Rehearsals
       7.6 Review and Analyse Rehearsals
    8. Maintain Continuity and Disaster Recovery Plan
       8.1 Assign Responsibility for DRP Maintenance
       8.2 Establish DRP Review and Maintenance Procedures and Schedule
       8.3 Integrate DRP Maintenance into Change Management
       8.4 Agree and Maintain DRP Distribution List
  • 51. Step 1 - Conduct Risk and Disaster Avoidance Assessment
    1. Conduct Risk and Disaster Avoidance Assessment
       Scope: Identify and quantify risks and vulnerabilities to the organisation
       Inputs: Risks and threats, historical data, current environment, current policies, processes and procedures
       Outputs: Risk assessment report with recommendations for improvements
    1.1 Identify Potential Threats
       Scope: Identify potential threats, internal and external, including weaknesses in the organisation that will cause failure of IT systems
       Inputs: Agreement on scope of continuity recovery plan
       Outputs: Potential threats affecting IT systems are identified
    1.2 Assess Probability of Threats
       Scope: Assess the probability of the identified potential threats affecting IT systems
       Inputs: Identified potential threats affecting IT systems
       Outputs: Assessment of probability of identified potential threats
    1.3 Evaluate Current Disaster Avoidance Measures
       Scope: Evaluate current disaster avoidance measures
       Inputs: Identified potential threats affecting IT systems and their probability
       Outputs: Evaluation of current disaster avoidance measures
    1.4 Assess Risk Controls to Mitigate Threats
       Scope: Determine the effectiveness of controls in deterring threats
       Inputs: Current avoidance measures
       Outputs: Assessment of risk controls to reduce threats
    1.5 Determine Impact of Reduced Controls
       Scope: Determine how effective a control would be in deterring the threat, limiting the cost of the risk and minimising the impact threats have
       Inputs: Assessment of risk controls to reduce threats
       Outputs: Impact to organisation without adequate disaster recovery controls
    1.6 Determine Value of Additional Controls
       Scope: Determine which risks the organisation is willing to accept and those to be controlled
       Inputs: Assessment of risk controls to reduce threats, impact to organisation
       Outputs: Value to organisation of additional controls
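Steps 1.2 to 1.6 can be illustrated with a common risk-scoring approach: expected annual loss as probability multiplied by impact, and the value of a control as the loss it avoids minus its cost. The presentation does not prescribe this formula, so treat it as an assumption; the threat, probabilities and costs below are invented.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    annual_probability: float  # assumed estimate between 0 and 1
    impact_cost: float         # assumed cost of one occurrence


def expected_annual_loss(threat: Threat) -> float:
    """Probability-weighted yearly cost of a threat."""
    return threat.annual_probability * threat.impact_cost


def control_value(threat: Threat, residual_probability: float, annual_control_cost: float) -> float:
    """Loss avoided by a control that lowers the probability, net of the control's cost."""
    reduced = Threat(threat.name, residual_probability, threat.impact_cost)
    return expected_annual_loss(threat) - expected_annual_loss(reduced) - annual_control_cost


# Hypothetical figures for a power failure and a standby generator.
power_failure = Threat("Power Failure", annual_probability=0.2, impact_cost=500_000)
value = control_value(power_failure, residual_probability=0.05, annual_control_cost=30_000)
print(round(expected_annual_loss(power_failure)), round(value))
```

A positive value suggests the control is worth considering; a negative one suggests the risk may be cheaper to accept, which is the decision step 1.6 asks for.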
  • 52. Step 2 - Conduct Business Impact Analysis
    • 2. Conduct Business Impact Analysis
      − Scope: Conduct business impact analysis in order to know which functions are the most critical to the organisation for survival
      − Inputs: Risk and disaster avoidance assessment
      − Outputs: Critical function categorisation; list of recovery requirements for processing critical functions
    • 2.1 Define Business Impact Analysis Methodology
      − Scope: Define methodology and process to be used in Business Impact Analysis based on the risk and disaster avoidance assessment
      − Inputs: Business systems
      − Outputs: Agreed methodologies and processes to be used in Business Impact Analysis
    • 2.2 Identify Business Functions to be Analysed
      − Scope: Identify business functions to be analysed for risk and disasters
      − Inputs: Agreed methodologies and processes to be used in Business Impact Analysis
      − Outputs: Business functions identified for analysis
    • 2.3 Define Business Function Criticality Categorisation
      − Scope: Define categorisation criteria for each business function
      − Inputs: Identified business functions
      − Outputs: Criteria for categorising business functions
    • 2.4 Design Questions and Conduct Interviews
      − Scope: Design and validate questions and conduct interviews
      − Inputs: Defined criteria for categories of business functions
      − Outputs: Validation of business losses
    • 2.5 Analyse Results of Interviews
      − Scope: Analyse the data and validate findings if necessary
      − Inputs: Validation of business losses
      − Outputs: Analysis of data
    • 2.6 Summarise and Present Results
      − Scope: Develop conclusions and present final report regarding Business Impact Analysis
      − Inputs: Analysis of data
      − Outputs: Conclusions and final report of Business Impact Analysis
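The criticality categorisation in step 2.3 could be expressed as a simple rule. The sketch below is illustrative only: the three category names, the use of maximum tolerable outage as the single criterion and the 4-hour and 48-hour thresholds are all assumptions, not values from the presentation.

```python
from enum import Enum


class Criticality(Enum):
    """Hypothetical categories for business functions (step 2.3)."""
    CRITICAL = 1
    IMPORTANT = 2
    DEFERRABLE = 3


def categorise(max_tolerable_outage_hours: float) -> Criticality:
    """Map a function's tolerable outage to a category (thresholds are assumed)."""
    if max_tolerable_outage_hours <= 4:
        return Criticality.CRITICAL
    if max_tolerable_outage_hours <= 48:
        return Criticality.IMPORTANT
    return Criticality.DEFERRABLE


def rank_functions(functions: dict[str, float]) -> list[tuple[str, Criticality]]:
    """Order functions from least to most tolerant of an outage and label each."""
    ordered = sorted(functions.items(), key=lambda item: item[1])
    return [(name, categorise(hours)) for name, hours in ordered]
```

In practice the criteria would come from the interviews in step 2.4 and would usually combine several factors, such as financial, legal and operational loss.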
  • 53. Step 3 - Determine Data Backup and Recovery Options
    • 3. Determine Data Backup and Recovery Options
      − Scope: Determine data backup and recovery options based on the requirements for recovering critical functions and the type of disaster or interruption being catered for
      − Inputs: Available time to back up and recover; acceptable downtime; recovery requirements
      − Outputs: Recovery objectives; list of backup options; supporting procedures
    • 3.1 Identify Backup and Recovery Options for Critical Functions
      − Scope: Work with business units to identify possible backup options for critical business functions
      − Inputs: Conclusions and final report of Business Impact Analysis
      − Outputs: Backup options for critical functions
    • 3.2 Evaluate Operation of Backup and Recovery Options
      − Scope: Evaluate how the previously identified backup options would operate in various scenarios
      − Inputs: Backup options for critical functions
      − Outputs: Evaluated backup options for critical business functions
    • 3.3 Determine Backup and Recovery Options for Critical Functions
      − Scope: Determine backup options for those critical business functions that currently do not have any backup options or where the options do not work correctly
      − Inputs: Evaluated backup options for critical business functions
      − Outputs: Backup options for all critical business functions
    • 3.4 Design Backup and Recovery Procedures
      − Scope: Design backup procedures for all critical business functions
      − Inputs: Backup options for critical business functions
      − Outputs: Backup procedures for critical business functions
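The evaluation in step 3.2 compares each backup option against the inputs listed for step 3: the time available to back up and the acceptable downtime. A minimal sketch of that comparison follows; the `BackupOption` fields and the example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BackupOption:
    """Hypothetical description of one backup option (step 3.1)."""
    name: str
    backup_hours: float    # time needed to take the backup
    restore_hours: float   # time needed to recover from it


def viable_options(options: list[BackupOption],
                   available_backup_hours: float,
                   acceptable_downtime_hours: float) -> list[BackupOption]:
    """Keep only the options that fit both the backup window and the downtime limit."""
    return [
        option for option in options
        if option.backup_hours <= available_backup_hours
        and option.restore_hours <= acceptable_downtime_hours
    ]
```

An empty result for a critical function corresponds to the situation step 3.3 addresses: no existing option works, so a new one has to be determined.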
  • 54. Step 4 - Form Continuity and Disaster Recovery Team
    • 4. Form Continuity and Disaster Recovery Team
      − Scope: Establish recovery teams and specify what each team is to do in the event of a broad range of possibilities
      − Inputs: Business needs; recovery requirements
      − Outputs: Recovery team structure; recovery team charter and members; recovery procedures
    • 4.1 Define Recovery Team Structure
      − Scope: Define structure of disaster recovery team
      − Inputs: Decision to proceed
      − Outputs: Structure of disaster recovery team
    • 4.2 Define Recovery Team Functions
      − Scope: Define the function of each individual disaster recovery team for each business unit
      − Inputs: Structure of disaster recovery team
      − Outputs: Functions for recovery team
    • 4.3 Define Team Leaders and Members
      − Scope: Define team leader, alternate leader and other team members for each type of disaster and business unit
      − Inputs: Functions for recovery team
      − Outputs: Recovery team leader, alternate team leader and members
    • 4.4 Define Team Charter
      − Scope: Define charter for each team along with the defined roles and responsibilities; define recovery procedures for each team relevant to their team role and charter
      − Inputs: Recovery team leader, alternate team leader and members
      − Outputs: Charter and recovery procedures along with roles and responsibilities for each recovery team
  • 55. Step 5 - Design and Develop Disaster Recovery Plan
    • 5. Design and Develop Disaster Recovery Plan
      − Scope: Develop and validate processes and procedures to support the critical business functions
      − Inputs: Recovery objectives; scope of plan; business function classification; disaster definitions and classification; recovery team organisation
      − Outputs: Recovery Plan
    • 5.1 Determine DRP Structure and Methodology
      − Scope: Determine the structure and methodology of how the plan will be developed
      − Inputs: Structure of disaster recovery team
      − Outputs: Structure and methodology of developing DRP
    • 5.2 Define DRP Notification Schedule and Process
      − Scope: Define the notification schedule and process of recovery
      − Inputs: Structure and methodology of developing DRP
      − Outputs: Notification schedule and recovery process
    • 5.3 Define DRP Escalation Process
      − Scope: Define the DRP escalation criteria and procedure
      − Inputs: Notification schedule and recovery process
      − Outputs: Escalation procedure
    • 5.4 Define Key Recovery Objectives
      − Scope: Consider the organisation’s key recovery objectives and policies while designing DRP
      − Inputs: Escalation procedure
      − Outputs: Consideration of key recovery objectives and policies
    • 5.5 Define Recovery Steps
      − Scope: Define the framework for disaster recovery to ensure it contains the required recovery steps
      − Inputs: Consideration of key recovery objectives and policies
      − Outputs: Disaster recovery steps
    • 5.6 Define Critical Function Restoration Process
      − Scope: Discuss the DRP with business units to get acceptance, define the final restoration process and define training to be provided
      − Inputs: Disaster recovery steps
      − Outputs: Accepted restoration process
  • 56. Step 6 - Alternate Processing for Critical Service Components
    • 6. Alternate Processing for Critical Service Components
      − Scope: Evaluate critical business function components to determine if alternate processing procedures are necessary and feasible for the period between a disaster and recovery and how recovery should be achieved
      − Inputs: Critical business function components; alternatives for processing critical components
      − Outputs: Critical business function components timelines; alternate procedures
    • 6.1 Identify Critical Components for Continuity
      − Scope: Work with business units to identify critical components that need alternate processing
      − Inputs: Accepted restoration process
      − Outputs: Critical components identified
    • 6.2 Develop Options for Continuity
      − Scope: Develop options for alternate processing for critical components in coordination with business units
      − Inputs: Critical components identified
      − Outputs: Options for alternate processing
    • 6.3 Develop Continuity Processing Steps
      − Scope: Develop processing steps based on the options for alternate processing for critical components
      − Inputs: Options for alternate processing
      − Outputs: Alternate processing steps
    • 6.4 Develop Return from Continuity Process
      − Scope: Develop procedure to return from alternate processing to normal processing
      − Inputs: Alternate processing steps
      − Outputs: Steps to return critical components to normal processing from alternate processing
  • 57. Step 7 - Conduct Continuity and Disaster Recovery Rehearsal
    • 7. Conduct Continuity and Disaster Recovery Rehearsal
      − Scope: Conduct rehearsals to validate the organisation’s ability to respond to and recover from a disaster
      − Inputs: Rehearsal plan; recovery procedures; alternate procedures; rehearsal objectives
      − Outputs: Lessons learned; rehearsal report
    • 7.1 Design Rehearsal
      − Scope: Design programmes for rehearsals
      − Inputs: Disaster Recovery Plan
      − Outputs: Programmes for rehearsals
    • 7.2 Develop Rehearsal Scenarios
      − Scope: Develop rehearsal scenarios based on the design of rehearsals
      − Inputs: Programmes for rehearsals
      − Outputs: Rehearsal scenarios
    • 7.3 Plan and Schedule Rehearsals
      − Scope: Plan and schedule rehearsals, both planned and unannounced
      − Inputs: Rehearsal scenarios
      − Outputs: Scheduled rehearsals
    • 7.4 Develop Rehearsal Evaluation Criteria
      − Scope: Develop evaluation techniques and criteria for each rehearsal scenario
      − Inputs: Scheduled rehearsals
      − Outputs: Evaluation techniques and criteria
    • 7.5 Conduct Rehearsals
      − Scope: Conduct rehearsals in coordination with all other members
      − Inputs: Scheduled rehearsals
      − Outputs: Conducted rehearsals
    • 7.6 Review and Analyse Rehearsals
      − Scope: Document and distribute outcomes of the rehearsals to all the members along with lessons learned
      − Inputs: Conducted rehearsals
      − Outputs: Reports on conducted rehearsals and review reports
  • 58. Step 8 - Maintain Continuity and Disaster Recovery Plan
    • 8. Maintain Continuity and Disaster Recovery Plan
      − Scope: Conduct scheduled reviews of the contents of the continuity plan; update the plan as part of the change management process and with other related changes
      − Inputs: Disaster recovery plan; review schedule; list of reviewers; review criteria and objectives
      − Outputs: Recommendations for improvements or changes; approval list from reviewer
    • 8.1 Assign Responsibility for DRP Maintenance
      − Scope: Identify reviewers responsible for plan maintenance and assign responsibility
      − Inputs: Rehearsal review reports; DRP; review criteria and objectives
      − Outputs: Assigned responsibilities for review and maintenance of DRP
    • 8.2 Establish DRP Review and Maintenance Procedures and Schedule
      − Scope: Establish review and maintenance procedures and schedules
      − Inputs: Assigned responsibilities for review and maintenance of DRP
      − Outputs: Procedure for review and maintenance of DRP
    • 8.3 Integrate DRP Maintenance into Change Management
      − Scope: Integrate maintenance process with change management processes to assess changes for their potential impact on the continuity plans
      − Inputs: Review feedback and inputs
      − Outputs: Updated DRP
    • 8.4 Agree and Maintain DRP Distribution List
      − Scope: After updating the DRP, create a list of those to whom the DRP has to be distributed
      − Inputs: Updated DRP
      − Outputs: Distribution list
  • 59. Continuity Design and Management Core Principles
    1. Scope of continuity plan must contain clear and realistic recovery objectives and recovery timeframes
    2. Risk management and disaster avoidance measures should be in place and practiced
    3. Continuity plan including disaster recovery should be designed and developed to support recovery of agreed critical business functions
    4. Continuity plan should be rehearsed regularly
    5. Continuity and recovery strategies or plans should be integrated into design and deployment of changes to infrastructure
    6. Continuity and recovery processes or plans should be reviewed and updated on a regular basis
  • 60. Core Principle 1 - Scope Of Continuity Plan Must Contain Clear And Realistic Recovery Objectives And Recovery Timeframes
    • Elements
      − Recovery process must be aligned to support business objectives
      − It must be ensured that business impact and recovery investments have direct relationship
      − Recovery time and objectives need to be communicated and validated
      − The disasters that the continuity plan will and will not address must be defined
      − Scope of planning efforts must be stated
    • Benefits
      − Clear objectives
      − Defined scope of efforts
      − Expectations are agreed and defined
      − Coordinated recovery efforts
  • 61. Core Principle 2 - Risk Management And Disaster Avoidance Measures Should Be In Place And Practiced
    • Elements
      − Ensure that environment is constructed and operated to prevent potential disasters
      − As infrastructure changes and business needs change, ensure risks and exposures are addressed
    • Benefits
      − Control of preventable, predictable disasters
      − Minimising and deterring potential disasters
  • 62. Core Principle 3 - Continuity Plan Including Disaster Recovery Should Be Designed And Developed To Support Recovery Of Agreed Critical Business Functions
    • Elements
      − Investment for adequate preventative, proactive, and recovery methods for critical business functions
      − All business functions and their criticality must be defined and communicated to the organisation
      − It must be ensured that the key customers are reassured of continuity management process
    • Benefits
      − Expectations are set and agreed upon
      − Minimise significant losses to the organisation in terms of financial, legal, and operational issues
  • 63. Core Principle 4 - Continuity Plan Should Be Rehearsed Regularly
    • Elements
      − Regular rehearsals must be conducted, both planned and unannounced
      − Partial and full rehearsals must be conducted
      − A variety of rehearsal techniques must be used
      − Rehearsal objectives and success criteria must be clearly defined
    • Benefits
      − Potential for successful recovery is high
      − Reinforces learning and commitment
      − Demonstrates value to organisation
      − Identification of potential weaknesses in plan
  • 64. Core Principle 5 - Continuity And Recovery Strategies Or Plans Should Be Integrated Into Design And Deployment Of Changes To Infrastructure
    • Elements
      − Must ensure the plans for changes to infrastructure are considered with continuity in mind
      − Recovery procedures must be requested for new applications, systems, networks
    • Benefits
      − Continuity is critical component of operating environment
      − Continuity strategies and plan have important role in design and deployment decisions and plans
  • 65. Core Principle 6 - Continuity And Recovery Processes Or Plans Should Be Reviewed And Updated On A Regular Basis
    • Elements
      − Regular reviews of continuity plans must be defined and scheduled
      − Make sure reviewers are not involved in the development of the plan and are objective
      − Integration into the change management process for plan updates must be ensured
      − Revision, tracking, and distribution list must be defined and documented
    • Benefits
      − Keeps continuity plan as a living document
      − Ensures the plan is kept current
      − Reminder of continuing purpose of plan and its benefits to the organisation
  • 66. Use Core Principles as Checklist for Independent Verification of Continuity Design and Processes
    1 Scope Of Continuity Plan Must Contain Clear And Realistic Recovery Objectives And Recovery Timeframes
      1.1 Recovery process must be aligned to support business objectives
      1.2 It must be ensured that business impact and recovery investments have direct relationship
      1.3 Recovery time and objectives need to be communicated and validated
      1.4 The disasters that the continuity plan will and will not address must be defined
    2 Risk Management And Disaster Avoidance Measures Should Be In Place And Practiced
      2.1 Ensure that environment is constructed and operated to prevent potential disasters
      2.2 As infrastructure changes and business needs change, ensure risks and exposures are addressed
    3 Continuity Plan Including Disaster Recovery Should Be Designed And Developed To Support Recovery Of Agreed Critical Business Functions
      3.1 Investment for adequate preventative, proactive, and recovery methods for critical business functions
      3.2 All business functions and their criticality must be defined and communicated to the organisation
      3.3 It must be ensured that the key customers are reassured of continuity management process
    4 Continuity Plan Should Be Rehearsed Regularly
      4.1 Regular rehearsals must be conducted, both planned and unannounced
      4.2 Partial and full rehearsals must be conducted
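A checklist like the one above can be recorded and summarised mechanically during an independent verification. The following sketch assumes a hypothetical representation in which each checklist item ID (such as "1.2") maps to a pass or fail result; the function names are illustrative and do not come from the presentation.

```python
def principle_status(results: dict[str, bool]) -> dict[str, bool]:
    """Group item results such as '1.2' by principle number and report
    whether every recorded item under each principle was met."""
    status: dict[str, bool] = {}
    for item_id, met in results.items():
        principle = item_id.split(".")[0]
        status[principle] = status.get(principle, True) and met
    return status


def unmet_items(results: dict[str, bool]) -> list[str]:
    """List the checklist items that failed verification, in input order."""
    return [item_id for item_id, met in results.items() if not met]
```

For example, a review that records items 1.1 as met and 1.2 as not met would report principle 1 as not satisfied and list 1.2 as the item to follow up.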
  • 67. Process Quality Control
  • 68. Common Process Quality Control Procedures for Availability and Continuity
    • Continuity Process Quality Control
      − 1. Continuity Reporting
      − 2. Continuity Report Evaluation and Improvement
      − 3. Management Escalations of Service Continuity Violations
    • Availability Process Quality Control
      − 1. Availability Reporting
      − 2. Availability Report Evaluation and Improvement
      − 3. Management Escalations of Service Availability Violations
  • 69. Structured Approach to Availability and Continuity Process Quality Control: Availability and Continuity Process Quality Control
    • 1. Generate Report Metrics and Reports
      − 1.1 Develop Management Reports Based on Agreed Metrics
      − 1.2 Schedule Report
      − 1.3 Generate Reports
      − 1.4 Distribute Reports
      − 1.5 Review Report Schedule
      − 1.6 Update Reporting Schedule
    • 2. Evaluation and Improvement
      − 2.1 Evaluate Process for Improvement
      − 2.2 Develop Improvements and Implementation Plan
      − 2.3 Create and Submit Improvement Implementation Plan
      − 2.4 Implement Improvement Plan
      − 2.5 Review Implementation
      − 2.6 Update Process Improvement Plan
    • 3. Management Escalations of Service Continuity Violations
  • 70. Step 1 - Generate Report Metrics and Reports
    • 1. Generate Report Metrics and Reports
      − Scope: Generate report metrics and periodic and ad hoc reports as per requirement or plan
      − Inputs: Report schedule; request for ad hoc reports
      − Outputs: Generated or distributed reports
    • 1.1 Develop Management Reports Based on Agreed Metrics
      − Scope: Report to management the contributions made by this process to overall service management
      − Inputs: Report requirements
      − Outputs: Accepted reports, frequency and costs
    • 1.2 Schedule Report
      − Scope: Update the report schedule
      − Inputs: Report schedule
      − Outputs: Updated report schedule
    • 1.3 Generate Reports
      − Scope: Generate reports according to schedule or in response to ad hoc requirements
      − Inputs: Collected metrics
      − Outputs: Generated reports
    • 1.4 Distribute Reports
      − Scope: Distribute the generated report to the target recipients
      − Inputs: Generated reports
      − Outputs: Distributed reports
    • 1.5 Review Report Schedule
      − Scope: Regularly review the report requirements
      − Inputs: Report schedule; report details
      − Outputs: Review results
    • 1.6 Update Reporting Schedule
      − Scope: Update report schedule with the new reports
      − Inputs: Report schedule
      − Outputs: Updated report schedule
  • 71. Step 2 - Evaluation and Improvement
    • 2. Evaluation and Improvement
      − Scope: Perform periodic reviews for process performance improvement
      − Inputs: Process metrics; future directives; service level expectations; review schedule; improvement plan
      − Outputs: Implemented improvements; reduced costs; improved process efficiency and effectiveness
    • 2.1 Evaluate Process for Improvement
      − Scope: Review the effectiveness and efficiency of the continuity management process regularly
      − Inputs: Improvement plan
      − Outputs: Gap analysis report
    • 2.2 Develop Improvements and Implementation Plan
      − Scope: Develop and review proposed process improvements
      − Inputs: Improvement plan; gap analysis report; revised business requirements
      − Outputs: Improvement strategy
    • 2.3 Create and Submit Improvement Implementation Plan
      − Scope: Create and submit improvement implementation plan
      − Inputs: Improvement strategy
      − Outputs: Submitted improvement implementation plan
    • 2.4 Implement Improvement Plan
      − Scope: Manage and coordinate the implementation of the process improvement plan
      − Inputs: Approved improvement implementation plan; improvement strategy
      − Outputs: Implemented improvements; reduced costs; improved process efficiency and effectiveness
    • 2.5 Review Implementation
      − Scope: Monitor implementation to ensure that process is not disrupted and that the changes are working as intended
      − Inputs: Implemented improvements
      − Outputs: Closed improvement implementation plan; review results
    • 2.6 Update Process Improvement Plan
      − Scope: Update the process improvement plan with any changes
      − Inputs: Process improvement plan; review cycle
      − Outputs: Updated process improvement plan
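The gap analysis report produced in step 2.1 compares process metrics with service level expectations. The sketch below shows one simple way to compute such gaps; it assumes that every metric is "higher is better", and the metric names and values in the usage note are invented for illustration.

```python
def gap_analysis(targets: dict[str, float],
                 measured: dict[str, float]) -> dict[str, float]:
    """Return the shortfall for each metric whose measured value is below target.

    Assumes higher values are better. A metric with no measurement is treated
    as zero, so its whole target is reported as the gap.
    """
    gaps: dict[str, float] = {}
    for metric, target in targets.items():
        actual = measured.get(metric, 0.0)
        if actual < target:
            gaps[metric] = target - actual
    return gaps
```

For instance, with a hypothetical availability target of 99.5 percent and a measured 99.0 percent, the function reports a gap of 0.5 for that metric, which would then feed the improvement strategy in step 2.2.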
  • 72. Summary
    • Availability and continuity are merging into a single unbroken requirement
    • Availability and continuity can be a significant overhead to an organisation so their cost should yield benefits elsewhere
    • Most business systems and processes are defined as business critical
    • Management commitment is needed to ensure availability and continuity receive the required attention and resources
    • Use core principles for availability and continuity for independent verification of processes and designs
    • Availability and continuity should be embedded into system architectures and designs rather than being an afterthought
  • 73. More Information Alan McSweeney alan@alanmcsweeney.com