Key Considerations for Business Resiliency



  1. Key Considerations for Business Resiliency. Steve Suther, Product Manager, CISM. March 18, 2010
  2. Agenda ► Business Resiliency, what is it? – Crisis Management – Incident Response – Business Continuance – Disaster Recovery ► Testing Methods ► “Return To Normal” ► Heterogeneous Approach ► Final Thoughts
  3. Business Resiliency – What is It? ► Consolidation of multiple common elements into a single program – Command and Control – Incident Response – Business Continuance – Disaster Recovery ► Provides organization the ability to deal with business impacting events in a structured and organized fashion ► Proactive instead of reactive approach
  4. Crisis Management ► Umbrella for all other capabilities ► Comprised of senior leadership and key stakeholders ► Responsible for crisis identification, classification, management, and resolution ► Uses pre-determined scenarios for guideline development ► Utilizes generic and specific guidelines for activities
  5. Crisis Management: Command and Control ► People, processes, procedures, and facilities to identify, analyze, and react appropriately to business impacting events ► Formulates action plans for pre-determined and unidentified scenarios ► Addresses critical initial 72 hours – Requires the most advanced and prescriptive planning
  6. Crisis Management: Leadership Identification and Availability ► Identify key leaders and stakeholders and their functional knowledge – Documentation, responsibilities, financial and signature authority, contacts, etc. ► Do not assume senior leadership will be available in a crisis – Validate or develop delegation of authority – Ensure multiple backups are identified and briefed – Geographically separated whenever possible
  7. Crisis Management: Communication Plan – Initial ► Everyone will want to know what happened and what is being done to resolve the situation ► Misinformation will run rampant if a clear communication plan is not established and utilized – Rumors perceived as reality ► Zero hour communications should be pre-established and approved – Generic language for initial communications – Include when and how future updates will be provided
  8. Crisis Management: Communication Plan - Ongoing ► Communication should be performed through multiple platforms – Web, blog, telephone, press release, and in-person briefings ► Update schedule should be structured – Initial updates more frequent than future updates ► Updates should be provided on schedule even if there are no updates – Provides confidence in organization's capability to resolve issue – No update introduces mistrust and perception of possible deception ► Consistency for both internal and external communications – Information will leak – Internal updates should include authentication layer for accountability and traceability ► Interactive updates important at least once every 24 hours during initial 72 hours
  9. Crisis Management: Communication Plan - External Assistance ► Critical to have an external entity assist in crisis management activities – Do not use regular public relations firm – Establish retainer relationship – Ensure call center and communication capabilities available ► Provide zero hour communication plans in advance – No content approvals required ► Educate firm about your business and your industry – Identify industry hot buttons and key issues
  10. Command and Control: Legal Considerations ► Identify internal and external legal resources – External counsel involvement assists with public opinion ► Establish legally documented delegations of authority – Enable expanded signature authority – Provide proof of authority to internal and external parties ► Develop declaration and completion of incident documents – Enable special powers for designated individuals to be legally recognized – Ensure powers are removed at the end of the incident
  11. Command and Control: Information Infrastructure ► Establish rally points for command and control activities – Physical site – Conference bridge ► Ensure sites include redundant capabilities for power, communication, and life safety ► Establish multiple rally points – Geographically separated if possible ► Identify single points of failure and scheduled refresh for supplies and equipment – Base requirements on recovery time and point objectives
  12. Command and Control: Grab and Go Books ► Contain essential information for crisis management – Contact information – Processes and procedures – Forms – Communication plans ► Require highest level of data protection controls – Access control and encryption ► Important to constantly update – Electronic versions ideal for data synchronization • Directory and data store synchronization ► Store electronic versions in secure distant location
  13. Tiered Response Model ► Each tier invokes different capabilities and resource availability – Minimizes disruption to normal business activities ► Command and control oversees incident response, business continuance, and disaster recovery – Operational response overseen by operations management – Trust people to do their jobs
  14. Incident Response: Events versus Incidents versus Investigations ► Events and incidents require different levels of investigation and response ► Events highlight business impacting activities to investigate – Can lead to incidents ► Incidents require structured and focused response – Identify, analyze, remediate, and document – Formal documentation
  15. Incident Response: Operational versus Forensic Response ► Incident identification process classifies response type – Operational or forensic ► Operational response focuses on return to normal activities – Minimal disruption to business activities ► Forensic response focuses on preservation and integrity of evidence (e.g., e-Discovery) – Required for litigation activities – Potential for business disruption
  16. Incident Response: Recognition of Incident Completion ► Important to identify incident completion – Reduce or discontinue incident response resource usage ► Completion of physical incidents easier to identify than logical incidents – Dormant attack code and multi-phase attacks ► Reduce to operational response instead of discontinuing efforts completely – Operational response team can monitor situation for “flare-ups” ► Engage legal counsel for opinion in forensic response – Evidence preservation – Chain of custody
  17. Business Continuance: Overview ► Focuses on ability of enterprise to operate effectively while encountering a business-debilitating incident ► Based on business processes, not facilities and technology ► Includes partial and complete business disruptions
  18. Business Continuance: Key Business Process Identification ► Mapping of revenue streams is the traditional approach to identifying key business processes – Revenue required for business survival ► Other considerations – Compliance requirements – Contractual arrangements • Service level agreements – Customer expectations – Public and customer opinions
  19. Business Continuance: Partner and Vendor Impact ► Businesses are typically customers and consumers of other businesses ► Contractual availability requirements may exist – Service Level Agreements (SLAs) – Legal and financial consequences if requirements are not met can be significant ► Important to establish secondary capabilities to minimize impact to partners and vendors – Reciprocal arrangements with similar organizations – Establish arrangements in advance
  20. Business Continuance: Business Impact Analysis ► Enumerates impact of loss of some or all business process capabilities ► Typically performed through surveys and questionnaires – Highlight obvious processes and impacts – Often miss critical considerations and data points ► Business process mapping key to success – Provides visual depiction of all business process elements and dependencies
  21. Business Continuance: Business Impact Analysis (continued) ► Identify information infrastructure and data elements – People, processes, procedures, technical infrastructure and data used in business process ► Account for partial loss as well as full loss – Ensures response is measured and appropriate ► Perform Threat and Vulnerability Analysis – High likelihood and business impact
  22. Business Continuance: Business Impact Analysis – Recovery Objectives ► Recovery point objectives – Establish key business process and information infrastructure requirements for business resumption ► Recovery time objectives – Establish time windows to reach established recovery points ► Define level of effort and investment for recovery efforts ► Provide realistic metrics – Prioritization schedule for recovery activities – When efforts can be reduced – When efforts should be discontinued – When it is not appropriate to recover business process
  23. Business Continuance: Competency Models ► Inventory of skills and knowledge required for recovery and operation activities – Move knowledge from brain to documentation ► Map to Human Resource skill inventories – Regularly identify gaps in available staff capabilities ► Should be simple enough to provide to staffing organizations ► Focused on needs during recovery, not on normal business activities – Job descriptions are not sufficient
  24. Business Continuance: Staff Availability ► Be careful of the “Hero” assumption – Staff will typically be less effective during incident – Staff may not be willing or able to participate ► Gain commitment to participate in advance – Brief staff on expectations and requirements ► Contract with third party staffing firms in advance for key skill areas – Provide competency models to third parties ► Ensure deep bench of staff available for key skills – Maintain current contact database of candidates
  25. Business Continuance: Financial Planning and Reserves ► Availability of funds key to success – Payroll and capital expense plan key during incident – Reserves for initial costs and finance plan – Utilize insurance for long term financial coverage ► Ensure contingencies in place for financial mechanisms – Confidence built after first payments made – Accounts payable and receivable capabilities need to be a high priority
  26. Business Continuance: Unavailable / Remote Workforce ► Most unavailable workforce scenarios affect regional areas – Pandemic – Natural disaster – Hazardous material incident ► Limited services available from infrastructure providers – Limited internet bandwidth from service providers – Limited telephony capabilities ► Develop remote capabilities which utilize minimal bandwidth – Limited use of graphics – Text based services – Minimal file transfers • Off hour scheduling – No screen scrape applications
  27. Disaster Recovery ► Focused on physical and technical infrastructure ► Typically utilize mirrored capabilities in separate location – Data centers – Data replication – Working space ► Typically overlook logical disruptions – Account for physical disruptions only – Assume staff capabilities and business process resiliency
  28. Disaster Recovery: Recover Remote or In Place ► Recovery location should be determined by using recovery time and point objective analysis ► Recover in place typically preferred if metrics can be met – Least disruptive to organization – Typically most cost effective – Fastest return to normal ► Ensure remote site appropriately configured and available – Providers with shared service model may not have availability – Data and infrastructure synchronization ► Focus on business process impact – What will cause the least disruption to business activities
  29. Disaster Recovery: Overlooked Threat Scenarios ► Logical attacks are more likely than physical attacks – Delayed attack scenario using malicious code – Integrity attacks – Denial of service – Cryptographic attacks ► Insiders are most dangerous adversaries – Trust but verify ► Simple countermeasures can counteract wide range of logical threats – Integrity checks – N+1 access controls for sensitive environments
  30. Disaster Recovery: Overlooked Threat Scenarios (continued) ► Access and Availability of Facilities – Replenishment capabilities challenged in regional disaster ► Local authorities may not allow access to facilities – Arrangements for clearance need to be made in advance • Physical access • Telephony and networking ► Network, power, and cooling at shared facilities may not be adequate during regional outage at shared provider ► Backup of backup facilities should be identified – Reduced capabilities compared to primary recovery facility • Virtualization and Software as a Service providers
  31. Table Top Versus Actual Tests ► Table top tests identify obvious challenges – Provide false sense of security – Typically not representative of actual incident ► Actual tests – Full test should be performed with key personnel at least once per year • Perform during time with least disruption • Should be unannounced – Test key elements on regular basis • Communication plan • Activation of alternative environments ► Lessons learned activity essential to testing activities – Lessons learned exercise will enumerate areas for improvement – Important to document findings and update plans appropriately
  32. Return to Normal Considerations ► Easier to activate capabilities than to deactivate them – Crisis will drive cooperation for plan inception – The longer capabilities are in place, the harder it is to back out of them ► Detailed plans for return to normal as important as inception plans – Utilize same methods and practices as other elements of business resiliency capabilities • Business impact analysis • Recovery point and time objectives ► Phased approach to de-escalation provides checkpoints and minimal business disruption – Ensure operational effectiveness checks are made at each level of reduction
  33. Heterogeneous Approach ► Do not develop capabilities independent of each other – Focus on business processes ► Multiple capabilities often used during business disrupting event – Interdependencies between elements become apparent quickly ► Cooperative development will minimize costs and ensure interoperability between elements – Consistent language, methods, practices, processes, and procedures ► Develop test cases which utilize multiple elements – Wide scale disruption of services – Case studies of scenarios which have affected similar organizations
  34. Final Thoughts ► Business Resiliency is maturation and consolidation of traditionally separate capabilities – Command and Control, Incident Response, Disaster Recovery, and Business Continuance ► In order to be effective, advance planning must be utilized – Perform threat and vulnerability analysis – Define recovery time and point objectives – Validate assumptions – Consider details – Develop capabilities heterogeneously ► Capabilities will not account for all possibilities – Develop capabilities which are flexible enough to adapt to any scenario but detailed enough for full recovery of key business processes ► Business Resiliency capabilities will constantly evolve – Business evolution will drive capability evolution as well – Ability to adapt is key to success
  35. Business Continuity Management. Automate your approach to business continuity and disaster recovery planning, and enable rapid, effective crisis management in one solution. Overview ► Centralize business continuity and disaster recovery plans, business impact analyses and recovery tasks. ► Prioritize business processes based on the impact to your business in the event of process disruption or failure. ► Test plans to identify process gaps and determine the time it will take to restore processes and infrastructure. ► Track crisis events in real time. ► Implement rapid response plans, contacting emergency responders through phased notification plans. ► Report on plan testing, gap analyses and remediation efforts using real-time reports and graphical dashboards. Benefits ► Automate and streamline your plan creation, review, testing and activation. ► Reduce effort and expense through a “create once, use many times” approach. [Figure: Business Continuity Management Dashboard]
  36. Steve Suther, eGRC Solution Manager, RSA, the Security Division of EMC
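
The recovery objectives slide (slide 22) calls for a prioritization schedule for recovery activities and for recognizing when it is not appropriate to recover a business process. One way to picture that is the minimal Python sketch below. It is an illustration only and not part of the deck: the process names, hour values, `BusinessProcess` class, and `recovery_schedule` function are all hypothetical, and ordering by shortest recovery time objective first is an assumed rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessProcess:
    """One business process from a business impact analysis (hypothetical model)."""
    name: str
    rto_hours: float      # recovery time objective, in hours (illustrative values)
    recover: bool = True  # False when the analysis says recovery is not appropriate


def recovery_schedule(processes):
    """Return the names of processes to recover, shortest RTO first.

    Processes flagged as not appropriate to recover are left out.
    """
    candidates = [p for p in processes if p.recover]
    return [p.name for p in sorted(candidates, key=lambda p: p.rto_hours)]


# Hypothetical inventory; none of these names or numbers come from the slides.
PROCESSES = [
    BusinessProcess("payroll", 48),
    BusinessProcess("order intake", 4),
    BusinessProcess("legacy reporting", 240, recover=False),
    BusinessProcess("accounts payable", 24),
]

if __name__ == "__main__":
    for position, name in enumerate(recovery_schedule(PROCESSES), start=1):
        print(position, name)
```

A real schedule would also weigh dependencies between processes, contractual obligations, and the level of effort and investment the slides mention. Sorting on a single number only shows the starting point.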