
Business Continuation The Basics


Published in: Business, Technology

Copyright 2009 All Rights Reserved. Business As Usual, Inc.

Terminology

The terms Disaster Recovery and Business Continuation / Business Continuity are sometimes used interchangeably. However, the terms apply to two very different contexts.

In the early days of data processing, the mainframe computer was usually housed in a large room with very large windows so everyone could see the computer. This led to the term “Glass House”. The term “Disaster Recovery” was originally associated with restoring the “Glass House” environment. Now the term “Disaster Recovery” commonly refers to the restoration of the IT infrastructure and its many technical components. In today’s complex business environment we have to take into consideration not only the “Glass House” but also the client / server computer networks, which are spread across many locations.

There is also a particular focus on restoring a business unit’s processes, which is commonly referred to as “Business Continuation” or “Business Continuity”. Business Continuation also involves work-area space and needed facilities (e.g., desks, chairs, telephones, office supplies). Another very important aspect of Business Continuation, which is sometimes overlooked, is the human factor: the business unit staff who will actually perform the recovery of the business unit’s processes.

What is a disaster?

When asked about disasters, most people think of fires, floods, tornadoes, earthquakes, and airplane crashes. These are certainly disasters. However, there are other types of events that many people don’t normally think of but that can certainly be deemed disasters too:

  • A building is evacuated because of a gas leak. The building and all its contents are undamaged, but they are placed off-limits.
  • A disgruntled employee sabotages essential computer hardware / software / data.
  • A database becomes hopelessly corrupted.
  • A person digging a ditch with a backhoe cuts a fiber optic cable. Network communications are disrupted.
  • An employee downloads a file from the internet that happens to be infected with a computer virus. The virus is unintentionally introduced into the company’s network and causes a multitude of significant problems. Major portions of the IT infrastructure have to be taken offline during the cleanup process.

So, with all this variety taken into consideration, the classic definition of a disaster is: an event of significant proportion that disrupts normal business activities for an extended period of time.

Disasters occur every day in varying degrees of magnitude. Sometimes there is little or no warning. Sometimes there is little, if anything, that you can do to prevent them. However, you and your company do have control over how you prepare for a disaster.

Why do we need Business Continuation?

It should be fair to say that most operating companies would like to remain just that: operational. However, statistics have shown that a company that experiences a disaster without a Business Continuation Plan (Plan) usually ceases operating within a relatively short period of time.

A prime reason for going out of business is lost computer data. Without critical computer data (e.g., Accounts Receivable, Order Entry) a company quickly loses its business focus. With a Plan, a company can at least limp along on its few critical processes. This buys time to determine the extent of the damage and to estimate the length of the outage.

If you can’t take care of your customers while experiencing a disaster, your competition most certainly will. Once a customer is lost, it is extremely difficult to get them back.

For publicly owned companies, the fiduciary duty to protect the business and corporate assets continues to gain legal standing. Senior management may be vulnerable to stockholder litigation if reasonable and prudent steps have not been taken to address risks that may threaten the company’s continued operation.

In some industries, primarily financial institutions, the Federal Government has mandated that Business Continuation Programs be established and maintained.

How is a Business Continuation Strategy developed?

A Business Continuation Strategy is best described as a program (a continuous activity) rather than a project (a one-time activity with a specific start and a specific end). A Business Continuation Strategy has a lifecycle:

  • Business Impact Analysis (B.I.A.)
  • Solutions
  • Planning
  • Testing
  • Maintenance

Each of these lifecycle components is discussed in more detail below.

Business Impact Analysis

The classic first step in developing a Plan is conducting a B.I.A. survey. The B.I.A. is a consistent and objective way to gather data regarding a business unit and its processes. Elements of risk are also identified. The survey asks a wide variety of questions. To list only a few:

  • Regarding the business unit’s mission critical processes: What are these processes? What resources are needed to perform them? Who provides inputs to these processes? Who receives the output from these processes? What are the processes’ deadlines?
  • How much adverse financial impact (in dollars) would result from this business unit’s downtime?
  • What is the bare minimum number of persons that would be needed to re-establish the business unit’s critical processes at the recovery site? Specifically, who will perform these critical processes? Specifically, who are their trained backups?
  • What computer applications are absolutely critical to the business unit’s processes?
  • Who are the business unit’s customers and/or contacts (both internal and external to the company) that would be most affected by its downtime? Do you have their contact information stored off-site?
  • Where is the original application software install media stored?
  • Who are your critical vendors? Do you have their contact information, and is it stored off-site?
  • What is the maximum amount of downtime that can be tolerated?
  • Are there any vital documents, reports, or microfiche? If so, what are they and where are they stored?
  • Are there fines or breach of contract conditions that can result from this business unit’s downtime?
  • Are there highly modified or one-of-a-kind devices currently in production that are critical to the business unit’s operation? If so, do they have backups?
  • Are the business unit’s servers and/or PC hard drives backed up on a regular basis? If so, is the backup media taken to a secure off-site location in a timely manner?

Another reason to perform a B.I.A. is to determine the relative degree of criticality of the company’s business units. If you asked all the company’s managers about their business unit’s degree of “criticality”, they would all say their business unit’s processes are very important to the company’s survival. No one wants to refer to their business unit as non-critical. However, in reality, only a few business units will be truly mission critical, many will be fairly important, and several will be non-critical.

The reason for realistically classifying the business units by their degree of “criticality” is that you want to recover the most critical business processes first. Business processes considered less critical can be recovered later in a phased manner. In many cases it is appropriate to tell the staff of non-critical business units to stay home until instructed to return to work, or to reassign them to assist mission critical business units. It is unwise, and probably impossible, to attempt to restore everyone at the same time. The B.I.A. process assists senior management in making this criticality evaluation objectively by providing facts and information that can be compared across the wide variety of business units.

How frequently should B.I.A.s be conducted?

It is common Business Continuation practice to conduct company-wide B.I.A.s annually. It is also recommended that a B.I.A. be reviewed any time there is a major change in a business unit’s business processes, a major change in computer hardware / software, or a major business unit organizational change (e.g., a merger or divestiture). Any significant changes identified should also be reflected in the business unit’s Plan in a timely manner.

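The phased recovery described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BusinessUnit` fields, the two tier cut-offs, and the sample figures are hypothetical and do not come from any B.I.A. standard. A real program would take its thresholds from its own survey results and management decisions.

```python
from dataclasses import dataclass

# Hypothetical tier cut-offs, in hours of tolerable downtime (assumption).
MISSION_CRITICAL_MAX_HOURS = 24
IMPORTANT_MAX_HOURS = 72


@dataclass
class BusinessUnit:
    name: str
    max_tolerable_downtime_hours: float  # answer taken from the B.I.A. survey
    daily_loss_dollars: float            # estimated financial impact per day down


def tier(unit: BusinessUnit) -> str:
    """Classify a unit by how long it can tolerate being down."""
    if unit.max_tolerable_downtime_hours <= MISSION_CRITICAL_MAX_HOURS:
        return "mission critical"
    if unit.max_tolerable_downtime_hours <= IMPORTANT_MAX_HOURS:
        return "important"
    return "non-critical"


def recovery_order(units):
    """Shortest tolerable downtime first; ties go to the larger daily loss."""
    return sorted(
        units,
        key=lambda u: (u.max_tolerable_downtime_hours, -u.daily_loss_dollars),
    )


# Hypothetical survey results, for illustration only.
units = [
    BusinessUnit("Marketing", 240, 2_000),
    BusinessUnit("Order Entry", 8, 50_000),
    BusinessUnit("Accounts Receivable", 48, 20_000),
]
for unit in recovery_order(units):
    print(f"{unit.name}: {tier(unit)}")
```

Sorting on data gathered the same way for every unit is the point of the exercise: the ordering comes from the survey answers rather than from each manager’s own view of their unit’s importance.
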
Another set of questions normally answered at the preliminary stage makes up the Risk Assessment. A Risk Assessment asks about the events (e.g., hurricanes, ice / snow, riots, computer hardware failure, floods) that a particular company location may encounter, the estimated probability of each event occurring, and the estimated level of impact if it occurs.

Solutions

Once it has been determined which business units need a Plan (many low criticality business units may not need one), the next step is to determine the most appropriate response method(s) for recovering each business unit’s processes. These “Solutions” will be the foundation for the subsequent step, the actual writing of the Plan.

Several factors are taken into consideration. For example:

  • Based on the B.I.A., how quickly does the business unit really need to recover its critical business processes? The term for this recovery time aspect is Recovery Time Objective (R.T.O.). R.T.O. is the maximum length of time, in hours or days, that can elapse before the loss of a business process becomes unacceptable.
  • How much money has been budgeted for the business unit’s recovery efforts and/or recovery resources?
  • Do mission critical business units depend on the output of this business unit for their continued processing?
  • Is management willing, based on a cost justification analysis, to assume the risk / consequences of a business unit’s processes not being performed in the usual manner?

Examples of common response solutions:

  • Call Center: re-route incoming calls to a sister site that handles similar calls.
  • Out-source work to an external vendor.
  • Go to a formal recovery site (e.g., a site internal or external to the company, or business unit staff working from home) for recovery efforts.
  • “Hold Until Restored”: identify, hold, and manage incoming work until the computer system’s functionality is restored. Work is subsequently processed on a First In / First Out or prioritized basis.

Planning

The classic approach to preparing for a disaster is to create a detailed written Plan for each business unit, as deemed appropriate via the B.I.A. Plans can be developed in a variety of ways:

  • There are many commercial proprietary software packages (e.g., LDRPS, Paragon) that accommodate data input, plan generation, and reporting.
  • Many organizations (both large and small) use Microsoft Word and internally developed templates.

There are pros and cons to each of the development methods (e.g., cost, flexibility, maintainability).

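The “Hold Until Restored” solution listed under Solutions above processes held work on either a First In / First Out or a prioritized basis. The following is a minimal sketch of both modes, not a prescribed implementation. The `HeldWorkQueue` class, its method names, and the convention that a lower priority number is processed sooner are all assumptions made for illustration.

```python
import heapq
from collections import deque
from itertools import count


class HeldWorkQueue:
    """Holds incoming work items while the computer system is unavailable."""

    def __init__(self, prioritized: bool = False):
        self.prioritized = prioritized
        self._fifo = deque()     # used in First In / First Out mode
        self._heap = []          # used in prioritized mode
        self._arrival = count()  # arrival sequence; keeps equal priorities in FIFO order

    def hold(self, item: str, priority: int = 0) -> None:
        """Record an item. Assumption: lower priority number = processed sooner."""
        if self.prioritized:
            heapq.heappush(self._heap, (priority, next(self._arrival), item))
        else:
            self._fifo.append(item)  # priority is ignored in FIFO mode

    def release(self):
        """Yield held items in processing order once the system is restored."""
        while self._fifo:
            yield self._fifo.popleft()
        while self._heap:
            _, _, item = heapq.heappop(self._heap)
            yield item


# Hypothetical work items, for illustration only.
queue = HeldWorkQueue(prioritized=True)
queue.hold("routine address change", priority=5)
queue.hold("rush customer order", priority=1)
queue.hold("routine statement request", priority=5)
for item in queue.release():
    print(item)
```

Storing the arrival sequence alongside the priority means items of equal priority still come out in the order they were received, so the prioritized mode falls back to First In / First Out within each priority level.
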
Regardless of how the Plan is written, at a minimum it should answer the following basic recovery questions:

1. Where do we go?

Recovery sites can have varying degrees of preparedness:

  • Hot Site: ready with minimal preparation; walk in, sit down, and begin processing. Minimal delays are expected before a business unit can begin its recovery steps.
  • Warm Site: some recovery resources are in place, but some degree of preparation is still needed (e.g., hardware may need to be reconfigured from a test environment to a production environment). Some delays can be expected before a business unit can begin its recovery steps.
  • Cold Site: the bare basics are in place (e.g., raised floor, HVAC), but an extensive amount of preparation / build out is still needed before the site is fully functional for a recovery effort. Considerable delays are to be expected before a business unit can begin its recovery steps.

The degree of preparedness depends on how quickly the business unit needs to be back in operation and what kind of budget has been allocated. The cost to recover rises steeply as the recovery time objective shortens.

There are several recovery location options available:

  • An external commercial recovery resource provider. For a subscription fee, these companies will provide various resources (e.g., hardware, telecomm connectivity, work-area recovery space, system software, mobile recovery units) needed for a recovery. The recovery resources to be provided are specified in a contract. Recovering companies are responsible for supplying their own application software, data, and personnel.
  • A reciprocal recovery agreement between two companies, or between business units within a company. The agreement basically says that if one party experiences a disaster it can use the other party’s computer / facility / resources. This alternative is particularly suited to large companies with large data centers located around the country that can serve as backups for each other. It is common practice to formalize this recovery relationship in a Memorandum of Understanding. The relationship may be bilateral or unilateral.
  • Commandeered internal work-area / resources. It is common for the work-area / resources of non-critical business unit staff to be taken over by mission critical business units during a recovery effort.
  • Employees’ homes. With high speed networks and many computer applications being offered via the internet, recovering business units’ processes at employees’ homes may be an option.

2. What should we do when we get there?

The moment a business unit manager receives a phone call in the middle of the night informing them that their building is on fire is probably not the best time for them to start thinking about what needs to be done. It is really hard to think clearly, objectively, and rationally in this kind of situation. A better approach is to develop a portion of the Plan called the “Day 1 List”. This portion of the Plan summarizes the critical few items (no more than 10 to 15) that MUST be performed in the first twelve hours of a disaster. This way the business unit manager and/or their key personnel can work straight down the list and not overlook something important. If you can act in an efficient, organized manner in the first twelve hours of a disaster, you’re a long way down the road to survival and, hopefully, to recovery.

A major aspect of the “Day 1 List” is contacting a wide variety of persons (internal and external to the business unit) and/or organizations and informing them that you are experiencing a problem. This portion of the Plan should detail their contact information (e.g., name, address, telephone number, fax number, after-hours telephone number, office and home email addresses, pager number, and cell phone number). If the person to be contacted is a VIP, who within the organization should make the call? People can normally be very understanding and helpful if they know up front that you are experiencing a problem. What they don’t like are surprises.

3. What resources are needed at the recovery site?

Some needed items (e.g., desks and chairs, PCs, general office supplies, a copy of the Plan) can be pre-staged at the recovery site. However, there are some items (e.g., backup data from off-site storage, special pre-printed forms from your vendor’s warehouse) that the business unit will need to obtain once staff arrive at the recovery site. These items need to be detailed in the Plan along with the vendor’s contact information and the time frame in which the items are needed (e.g., immediately, within 1 to 2 days).

Another important aspect of the Plan is to have it stored at several secure off-site locations (e.g., key employees’ homes) so it can be quickly and easily accessed. The Plan should be marked and treated as a confidential document. It is recommended that, on an annual basis, the individual Plans be presented to senior management for review and approval.

Testing

The initial writing of the Plan is important, but even more important is the subsequent testing of the Plan. A very lengthy, eloquently worded Plan can be written and placed in a fancy binder, but is this Plan really capable of doing its intended job (i.e., facilitating the recovery of the respective business unit’s critical business processes)? There is only one way to find out. Test it! As a Plan is tested, items will be identified that necessitate additions and/or corrections to the Plan.

A testing program should be established with a crawl, walk, run philosophy in mind. Begin with basic test goals and steadily increase the scope and complexity of the tests over time. This builds the confidence of the test participants and ferrets out the most basic “Gaps In Excellence” first.

There are a variety of ways that a Plan can be tested:

  • A table-top exercise. Normally the exercise is set in the context of a specific scenario (e.g., the building is inaccessible due to storm damage). Business unit staff discuss their part in the response based on the steps outlined in the Plan.
  • A formal scheduled test with limited and specific test objectives, conducted with a limited number of business unit staff in a test environment.
  • A formal scheduled test of all the business unit’s mission critical recovery activities, conducted by the business unit’s key staff in a test environment.
  • An unscheduled test of all the business unit’s mission critical recovery activities, conducted by the business unit’s key staff in a production environment. This is the ultimate recovery test challenge.

How frequently the Plan is to be tested depends on the business unit’s degree of criticality. At a minimum, the Plan should be tested once a year. For business units that are deemed “mission critical”, more frequent testing is recommended. It depends on the business unit’s situation and criticality.

Components of a good test:

  • Specific test goals set prior to the test.
  • Adequate preparation by testing participants.
  • During the test, participants record specific issues on an Issue Form as they occur.
  • A formal test Post-mortem conducted by the test participants to determine what went well, what didn’t go so well (with specific recommendations to address the situation), and whether the actual time for the recovery met the business unit’s stated R.T.O.

The goal of the Post-mortem is not to assign blame or point the finger. Its goal is to determine how the response / recovery processes can be improved. The outcome of the Post-mortem should be a detailed list of action items needed to address the “Gaps In Excellence”. Each action item should be assigned a responsible person and a deadline for completion. The action items should be completed, at the latest, before the next test.

It is recommended that the test results, Post-mortem documentation, and the improvement action items be presented to senior management for review and approval.

Some Business Continuation professionals grade the test’s outcome as pass / fail. In a recovery event either the business process is recovered or it isn’t. There is not much middle ground. This may sound a little harsh, but Mother Nature / God won’t cut you any slack when you attempt a recovery.

Even if you are successful with a test or a real recovery, the recovery event will be a learning experience. There will always be portions of the recovery event that can be improved in some way. Always look at a recovery event as a learning experience with the goal of improving the response / recovery process.

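One Post-mortem question above, whether the actual recovery time met the business unit’s stated R.T.O., comes down to a simple elapsed-time comparison, which also fits the pass / fail view of test outcomes. The sketch below is illustrative only: the function name `met_rto`, its parameters, and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def met_rto(outage_start: datetime, process_restored: datetime, rto: timedelta) -> bool:
    """Return True if the business process was restored within its R.T.O."""
    if process_restored < outage_start:
        raise ValueError("restoration time precedes the start of the outage")
    return (process_restored - outage_start) <= rto


# Hypothetical test timings, for illustration only.
start = datetime(2009, 3, 2, 8, 0)      # test outage declared
restored = datetime(2009, 3, 2, 21, 30)  # critical process back in operation
stated_rto = timedelta(hours=12)

elapsed = restored - start
result = "pass" if met_rto(start, restored, stated_rto) else "fail"
print(f"Elapsed: {elapsed}, R.T.O.: {stated_rto}, result: {result}")
```

Recording the start and restoration times on the Issue Form during the test makes this comparison a matter of record rather than recollection.
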
Maintenance

Have you ever seen a business unit that didn’t have personnel turnover and/or changes in its business processes? Probably not many. The Plan must be kept current and reflect an accurate picture of the business unit, its processes, supporting information, and its recovery / response strategies. Maintenance of the Plan should be performed periodically, at a minimum once a year, or whenever there is a significant change in:

  • business unit staff
  • computer hardware / software
  • business processes
  • business unit organization (e.g., a merger or divestiture)

Lifecycle Conclusion

This lifecycle is iterative. Once we finish the sequence, we start all over again. So, we’re never really finished.

Summary

This article is obviously highly summarized and an oversimplification of the Business Continuation Planning process. However, it at least illustrates some of the data gathering, data analysis, and thought processes that go into developing a Business Continuation Strategy. This article is also meant to be thought provoking:

  • If your senior management asked business unit managers some of the sample B.I.A. questions, what kind of answers do you think they’d get?
  • Are your company and its business units adequately prepared to address the three basic Business Continuation questions? Where do we go? What should we do when we get there? What resources are needed at the recovery site?

Today’s business environments are very complicated and the recovery window is ever shrinking. This places great challenges on management to be ready for whatever may come their company’s way. Being prepared for a disaster may mean the difference between the company staying in business or not.

If there are any questions, please call:

John Small, C.B.C.P.
Business As Usual, Inc.
3532 Pinehurst Drive, Suite 350
Plano, Texas 75075
Office 972-743-2631