REGIS University ARNe Network Data Backups and Disaster Recovery ...



  1. 1. REGIS University ARNe Network Data Backups and Disaster Recovery Plans for the ARNe Network by Anthony O. Ayodele An Operational Guideline and Procedure for Academic Research Network Practicum Paper submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Information Technology School for Professional Studies Regis University Denver, Colorado 07-30-05
  2. 2. School for Professional Studies Regis University MSCIT Program Certification of Authorship of Professional Project Work Submitted to Dan Likarish Student’s Name: Anthony Ayodele Date of Submission: Title of Submission: Data Backups and Disaster Recovery Plans for ARNe Network Certification of Authorship: I hereby certify that I am the author of this document and that any assistance I received in its preparation is fully acknowledged and
  3. 3. disclosed in the document. I have also cited all sources from which I obtained data, ideas, or words that are copied directly or paraphrased in the document. Sources are properly credited according to accepted standards for professional publications. I also certify that this paper was prepared by me for the purpose of partial fulfillment of requirements for the MSCIT degree. Student’s Signature:
  4. 4. School for Professional Studies Regis University MSCIT program IT Section: Data Access Group Procedure Name: Data Backups and Disaster Recovery Plans Created by: Anthony Ayodele Approval by: Document Library # Date Created: 07-30-2005 Date Approved:
  5. 5. Introduction: This procedure will walk the reader through the backup and disaster recovery procedure plans for the ARNe Network Precedence or Reference:
  6. 6. School for Professional Studies Regis University MSCIT program Advisor/MSC 696 Faculty Approval Forum Student’s Name: Anthony Ayodele Professional Project Title: Data Backups and Disaster Recovery Plan for the ARNe Network Advisor’s Declaration: I have advised this student through the Professional project Process and approve of this final document as acceptable to be submitted as fulfillment of partial completion of requirements for the MSC 696
  7. 7. course. The student has received project approval from the Advisory Board and has followed due process in the completion of the project and subsequent documentation. ADVISOR Dan Likarish. Asst. Professor Signature Date
  8. 8. Abstract Data Backup and Disaster Recovery Plan for the ARNe Network This research paper will serve as a safeguard and best-practice procedure plan for the ARNe network with respect to data backups and disaster recovery, so that we can be fully prepared when disaster strikes. This paper details the methodology that ARNe will use to implement its data backup and disaster recovery plan.
  9. 9. Acknowledgment I would like to start by thanking the Author of Life, God the creator of all things. My special thanks go to Asst. Professor Dan Likarish for his invaluable comments and constructive feedback throughout the course of writing this paper. Dan laid out the standards and leadership direction that served as a guide through the practicum class. My special appreciation goes to Dr Jame Lupo for his assistance on this project. As a co-coordinator for the practicum lab, Dr Lupo provided direction and useful insight into this project. I offer my heartfelt thanks to all my course mates for their peer review of the Review of Literature and Research section of this paper. Finally, my thanks go to my late father, Bishop Joseph Ayodele, for his moral support and contribution towards my education.
  10. 10. Table of Contents
     1.0 Introduction
     2.0 Review of Literature and Research
     2.1 Microsoft Operations Framework (MOF)
     2.2 Service-oriented architecture (SOA)
     I. Literature and research that is specific/relevant to the project
     II. What is known and unknown about the project topic
     III. Contribution this project will make to the Academic Research Network
     3.0 Methodology
     4.0 Project History
     4.1 Data Backup and Recovery
     4.2 Disaster Recovery
     5.0 Lessons Learned and Next Evolution of the Project
     5.1 Conclusion
     Practicum Support Documentation
     List of Tables
     Table 4.1 Data Backup and Recovery Support Matrix
     Table 4.2 Backup and Recovery Support Responsibilities
     Table 4.3 Data Backup Configuration and Management
     Table 4.4 Data Daily Monitoring and Failure Notification
     Table 4.5 File Restoration and Recovery of Corrupt or Deleted Files
     Table 4.6 Media Labels
     List of Figures
     Figure 2.1 MOF (Microsoft Operational Framework) Quadrant
     Figure 2.2 SMF (Service Management Function) of each MOF Quadrant
     Figure 2.3 SOA (Service Oriented Architecture)
     Figure 3.0 SDLC (System Development Life Cycle)
     Bibliography
     References
     Definition of Terms
  11. 11. Chapter 1 1.0 Introduction This project describes the methods and procedures to be used by ARNe for data backup and restore. It also acts as a safeguard procedure in the event of a disaster. Data corruption, viruses, hard disk failure, power failure, accidental or malicious data deletion, theft and natural disasters are all situations that call for a meaningful disaster recovery policy. Security risk analysis, otherwise known as risk assessment, is fundamental to the security of any organization. It is essential in ensuring that controls and expenditure are fully commensurate with the risks to which the organization is exposed. A critical part of handling any serious emergency situation is the management of the Disaster Recovery Phase. By definition, the Disaster Recovery Phase is likely to involve, to a significant degree, external emergency services. The priority during this phase is the safety and well-being of the employees and other involved persons, the minimization of the emergency itself, the removal or minimization of the threat of further injury or damage, and the re-establishment of external services such as power, communications and water. A significant task during this phase is also the completion of Damage Assessment Forms.
  12. 12. The Disaster Recovery Phase may involve different personnel depending upon the type of emergency, and a Disaster Recovery Team should be nominated according to the requirements of each specific crisis. Today, business continuity planning and disaster recovery planning are generally acknowledged as vital elements of an organization’s business activity plans. However, the creation and maintenance of a sound business continuity and disaster recovery plan is a complex undertaking, involving a series of steps. An organization must analyze what needs to be achieved in order to carry on as though the disaster never happened. Data and assets must be identified for restoration, documentation and reservation to reduce loss. Prior to creation of the plan itself, it is essential to consider the potential impacts of disaster and to understand the underlying risks: these are the foundations upon which a sound business continuity plan or disaster recovery plan should be built. Following these activities the plan itself must be constructed. The plan must then be maintained, tested and audited to ensure that it remains appropriate to the needs of the organization.
  13. 13. Chapter 2 2.0 Review of Literature and Research In supporting and managing the ARNe network, the Microsoft Operations Framework (MOF) and Service-Oriented Architecture (SOA) will be used as the standard and best practice for the System Engineering and Application Development Practicum (SEADP). The strategic nature of the ARNe network calls for an operational framework that can stand the test of time, in order to achieve high 2.1 Microsoft Operations Framework (MOF) The framework is divided into four quadrants, namely: 1) Optimizing 2) Supporting 3) Operating 4) Changing See figure 2.1 (MOF) Overview of MOF Quadrant 2.1.1 Optimizing: delivering the best service possible a) Service Level Management - All service providers (Qwest) of the network will be required to meet a certain level of service agreement based on the needs of the network, in order for it to serve its purpose. Business-focused service levels will be created, managed, met and improved.
  14. 14. b) Capacity Management - Meet demands on services by controlling capacity requirements. c) Availability Management - The ARNe network will be up and running 24/7, except when maintenance is being carried out. d) Financial Management - The running cost of 100k will be maintained. e) Workforce Management - Students in the practicum will support and maintain the network within the budget constraint. An MSCIT faculty member will be in charge of the daily operation of the network. There will be new improvements to service and delivery as the network continues to grow. To accommodate any proposed changes to services, the approval process includes confirming business priority, cost/benefit analysis and release plans. 2.1.2 Changing: Managing changes in the enterprise a) Change Management - All hardware and software changes to the network will be recorded, tracked, assessed and monitored. b) Configuration Management - All configuration and updates on any network infrastructure will conform to standard procedures and business rules. c) Release Management - All software and hardware releases into the ARNe network will be deployed in the most efficient manner, without any disruption of service.
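The change management item above says that every change is recorded, tracked, assessed and monitored. A minimal sketch of such a record is given here for illustration only. The class names, the status values and their order are assumptions made for this example; they are not defined by the paper or by MOF.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class ChangeStatus(Enum):
    RECORDED = "recorded"
    ASSESSED = "assessed"
    APPROVED = "approved"
    RELEASED = "released"


# Hypothetical ordering of states, chosen for illustration only.
_NEXT = {
    ChangeStatus.RECORDED: ChangeStatus.ASSESSED,
    ChangeStatus.ASSESSED: ChangeStatus.APPROVED,
    ChangeStatus.APPROVED: ChangeStatus.RELEASED,
}


@dataclass
class ChangeRecord:
    summary: str
    kind: str  # "hardware" or "software"
    status: ChangeStatus = ChangeStatus.RECORDED
    history: list = field(default_factory=list)

    def advance(self, who: str, when: datetime) -> ChangeStatus:
        """Move the change to its next state and log who did it and when."""
        if self.status not in _NEXT:
            raise ValueError("change is already released")
        self.status = _NEXT[self.status]
        self.history.append((when, who, self.status))
        return self.status
```

Keeping the history on the record itself means the tracking and monitoring steps can be answered from one place, which is the point of the change management function.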
  15. 15. There will be an adequate plan and a release readiness review before the release of any new product into the network, to ensure that changes happen smoothly with minimal disruption to IT services. 2.1.3 Supporting: Responsive high quality support a) Service Desk - Practicum MSCIT students with the required skills will serve as the first point of contact in problem resolution. b) Incident Management - The Track-it service desk will be used to report, monitor and escalate all incidents in conformity with the SLA (Service Level Agreement) with all parties. c) Problem Management - All problems will be determined, managed, resolved and documented in a centralized knowledge management database, with the aim of proactively preventing problems from happening. This will ensure customer satisfaction by reviewing the IT performance delivered for the services against the targets documented within the Service Level Agreement (SLA). 2.1.4 Operating: Successful, reliable and predictable day-to-day IT Operations a) System Administration - MSCIT faculty will provide day-to-day administrative services and will be responsible for providing direction for operations.
  16. 16. b) Security Administration - The implementation of Single Sign On and firewall security will ensure that IT is safe, confidential, accurate and available. c) Service Monitoring and Control - All network resources will be monitored for optimization, availability and efficiency. Notifications will be sent so that the right people know what is going on. d) Network Administration - Access to the server and all physical components of the network will be restricted to authorized students. e) Directory Services Administration - Application delivery will be through the Citrix server; this will ensure that all students and faculty have access to the right information and applications whenever they need them. f) Storage Management - It is important to have high-performance SAN storage with outstanding scalability (IBM Total Storage DS 4300 and the Hitachi Storage). ARNe will make use of these systems as the practicum grows. See figure 2.2 (MOF) Overview of Services Management functions of each Quadrant 2.2 Service-oriented architecture (SOA) Successful integration for today’s business must accommodate a high level of variety and change involving a large number of systems, applications, data formats, standards and connectivity for both legacy systems and new applications. Driven by business and technical factors, this growing volatility makes the goal of
  17. 17. enterprise integration a complex, hard-to-reach moving target for today’s professionals. Service-Oriented Architecture offers a fresh approach to business integration that provides more flexibility through technologies such as Web Services, Asynchronous Messaging, Business Process Management (BPM) and the Enterprise Service Bus (ESB). A service-oriented architecture is essentially a collection of services. These services communicate with each other. The communication can involve either simple data passing or two or more services coordinating some activity. Some means of connecting services to each other is needed. Service-oriented architectures are not a new thing. SOA and its related technologies are being adopted across a range of industries by both large and small to medium-sized businesses. For many people, the first service-oriented architecture they encountered was based on DCOM or on Object Request Brokers (ORBs) built to the CORBA specification. 1) Services If a service-oriented architecture is to be effective, we need a clear understanding of the term service. A service is a function that is well-defined, self-contained, and does not depend on the context or state of other services. 2) Connections
  18. 18. The technology of Web services is the most likely connection technology of service-oriented architectures. Web services essentially use XML to create a robust connection. The following figure (see figure 2.3) illustrates a basic service-oriented architecture. It shows a service consumer at the right sending a service request message to a service provider at the left. The service provider returns a response message to the service consumer. The request and subsequent response connections are defined in some way that is understandable to both the service consumer and service provider. How those connections are defined is explained in Web Services explained. A service provider can also be a service consumer. With integration processes as the key building blocks of a flexible integration strategy, SOA can accommodate variety and change, thereby fully delivering on the promises of the agile enterprise. I. Literature and research that is specific/relevant to the project The main focus of this project is to develop an operational procedure for data backup and disaster recovery on the ARNe network. The areas to be examined are: 1) Storage management 2) Continuity management
  19. 19. Each area of the Regis ARN will be analyzed, and suggested guidelines will then be created for use in changing the network into a more structured format. a) Storage management: The purpose of storage management is to properly maintain, monitor, and develop policies for storing, backing up, and restoring data. The roles involved with the functions of storage management are storage manager, media librarian, and capacity manager. The storage manager has total responsibility for ensuring that proper storage management processes are being followed. The practicum data access groups are responsible for tracking all media used for backup operations. The practicum faculty will be responsible for ensuring that current storage capacities and processes meet the requirements of the organization, and for projecting changes to such agreements based upon foreseeable storage growth. Microsoft “best-practices” will be used to ensure proper storage configuration management. Media sets, off-site rotation schedules, scheduled restoration tests, and server space storage checklists will be used to ensure that proper storage management is being conducted. b) Continuity management: This is concerned with ensuring that critical services remain available to customers. Continuity management is usually associated with disaster recovery procedures and maintaining high availability of services.
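The media sets and off-site rotation schedule mentioned under storage management can be illustrated with a short sketch. The paper does not specify a rotation scheme, so everything here is an assumption made for the example: the media-set labels, the use of four daily sets and four weekly sets, and the rule that the Friday set goes off-site.

```python
from datetime import date

# Hypothetical media-set labels; the paper does not define a labeling scheme here.
DAILY_SETS = ["DAILY-MON", "DAILY-TUE", "DAILY-WED", "DAILY-THU"]
WEEKLY_SETS = ["WEEKLY-1", "WEEKLY-2", "WEEKLY-3", "WEEKLY-4"]


def media_set_for(day: date):
    """Return (label, goes_off_site) for a backup date, or None on weekends."""
    weekday = day.weekday()  # Monday == 0
    if weekday >= 5:
        return None  # assumed: no scheduled backup on Saturday or Sunday
    if weekday < 4:
        # Monday to Thursday reuse the same four daily sets every week.
        return DAILY_SETS[weekday], False
    # Fridays: rotate through the weekly sets by ISO week number (assumed rule).
    week = day.isocalendar()[1]
    return WEEKLY_SETS[week % len(WEEKLY_SETS)], True
```

A media librarian could use a function like this to check which set should be in the drive on a given day and which one is due to leave the site.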
  20. 20. The major focus of this project is disaster recovery for the purpose of business continuity and operation in the face of any failure defined within other functional areas. Each new NLP participant will be required to become familiar with the processes laid out in this project. The facilitators of the NLP must ensure that exiting students have updated all processes and that entering students understand the goals, design, structure, and processes for the entire NLP and each site/domain. II. What is known and unknown about the project topic The current data backup is not fully operational, though we have all the resources needed to put it into operation. Faculty and practicum students are aware of the existence of all the tools. Since all operations of the practicum need appropriate documentation, it is necessary to have a laid-down procedure to serve as a guide for future use and improvement, especially with the advent of SharePoint (a repository for all documentation and communication). One unknown factor for this project is the budget available for its implementation. I am aware that NLP faculty are always sourcing funds to make the NLP a success. The goals and guidelines proposed by this project may never be accepted by Regis administration as a cost-effective resource for student learning. Another unknown factor is the general acceptance by the NLP faculty of the procedures and plans described in this project. Student participation at the various campuses is not equal, and this might hamper the implementation of this project.
  21. 21. III. Contribution this project will make to the Academic Research Network This project is to serve as a standard procedure for data backup and disaster recovery activities on the ARNe. Since there is no previous procedure for data backup and disaster recovery, this project will serve not just as a starting point but as a foundation that subsequent practicum students can build on. The procedures presented in this project will serve as a guide for ARN practicum students who will be involved in the actual data backup activities. The objectives and guidelines presented in this project offer standards to be followed at each location. In addition, this project will lay out fundamental procedures for maintaining continuity between cycles of NLP students. This project will give practicum students a basic understanding of data backup and disaster recovery.
  22. 22. Chapter 3 3.0 Methodology A system development methodology provides guidelines to follow for completing every activity in the systems development life cycle, including specific models, tools, and techniques. These methodologies examine the need for, and the risk associated with, the implementation of the proposed Data Backup and Disaster Recovery Plan for the ARNe Network. The development phases for this project will follow the standard format of any systems development life cycle (SDLC): planning, analysis, design, implementation, and support. The real-world implementation of this project may or may not occur. The implementation of this project rests solely on ARN NLP management and the support of practicum students.
  23. 23. [Figure 3.0: System Development Life Cycle for the Project. Recoverable labels: Planning; Analysis; Design; Implementation; Technology Vendors/Service Provider; Support/Maintenance; Update and Review Project documentation; Suspend]
  24. 24. 1) Planning phase: The planning phase involves justifying the feasibility of the project, that is, whether it is worth investing time and energy in. A standardized operating procedure for data backup and disaster recovery in the ARN NLP is very important. The structure for determining guidelines (i.e. the Microsoft Operational Framework) was already chosen by ARN NLP management before this project was constructed. Therefore a need and an operating template were already in place. The only thing that required planning was limiting the scope of the project to an area that was manageable. This was accomplished by examining only nine areas of the Microsoft Operational Framework that need to be addressed concerning operational control guidelines, change control guidelines, and continuity planning. 2) Analysis phase: Given the rapid growth and expansion of the ARNe network, it is paramount to have a procedure for data backup and disaster recovery to safeguard against the loss of valuable data and assets. MOF (Microsoft Operation Framework) and SOA (Service Oriented Architecture) best practices were used as the baseline for the analysis of the ARNe network. The tools used for this project are Microsoft Visio and the MOF and SOA instruction guides. The instruction guides were examined to determine the “best-practices” to be used within the ARN NLP. Visio was used to create logical, physical, and organizational diagrams. Life-cycle models to be followed
  25. 25. 3) Design phase: This will involve listing all assets and other valuable data to be protected and backed up. Software and hardware upgrades will also be part of this process. It might require changing or replacing some of the systems to accommodate any anticipated capacity. With the full participation of future ARN NLP students, this phase will be an opportunity to get hands-on experience. 4) Implementation phase: This will involve the actual execution of the Backup and Disaster Recovery Procedure in steps, to ensure that it meets all its intended purposes. This phase will be the easiest to carry out once all resources have been put in place. However, this phase can also be daunting, owing to the same factors that inhibit the third phase. Participation of all ARN NLP practicum members, especially the Data Access group, will be needed. Funding is another cogent factor here, as lack of funding will make implementation unrealistic. Effective training and constant update and review of documentation are also important. 5) Support phase: This is not an actual phase on its own. After the implementation phase, the support phase will involve maintenance of the systems. Creating a knowledge base during the support phase will provide a repository for troubleshooting and will help improve the procedures. Data Access group members should be fully involved in support.
  26. 26. Chapter 4 4.0 Project History I. How the project began The Data Access group of the Regis System Engineering and Application Development Practicum is charged with maintaining the security and integrity of data. To accomplish this task the group is required to: a) monitor server health and performance b) perform backups and disaster recovery c) verify and adjust the security configuration of servers and desktops d) maintain the storage devices e) create and maintain UNIX and Windows user accounts. This project is one of the core assignments of the Data Access group. At the very beginning of this project, it was realized that no specific procedure or standard for data backups and recovery was in place, though there was previous documentation on data backups and recovery. Adam Brennen’s project “Combine_6_12_04.doc” was one reference for this project. Through my participation in various group meetings and consultation with practicum faculty, I was able to establish the basis for this project. After due consultation with SEADP faculty (Dan Likarish and Dr. Jim Lupo), it became necessary to have a written procedure on best practices for data backup and disaster recovery on the ARNe network.
  27. 27. An appreciable knowledge of the ARNe network, its assets and its distributed applications helped in the creation and documentation of this project. II. How the project was managed The first phase of this project was to define its goals and objectives. Then a proposal was submitted to the faculty to establish the rationale behind the project. A project outline with an anticipated timeline was also included in the proposal. Another vigorous phase of this project was the research and fact-finding activity, which was done through various consultations with the SEADP faculty and practicum group members. After thorough research and consultation with other group members, and with input from SEADP faculty, a guideline for this project was established. III. Significant events/milestones in the project The most significant event of this project was its change of direction. The project began with one focus and evolved into a much broader one. The realization of a need for organizational guidelines, rather than immediate documentation, was significant to the development of this project. Another major milestone was the research itself. Almost all of the processes that need to be implemented in a smoothly-run IT organization are already defined within the Service Management Functions (SMFs) of the Microsoft Operational
  28. 28. Framework. The processes just needed to be modified to meet the organizational goals of the NLP. The separation of the ARN development network from the ARN production network reduced the coverage area of the project. NLP management decided not to change the structure or operations of the ARN production network; its current operations did not need to be changed. Therefore, considerations of how the development network interacted with the production network could be eliminated from the scope of the project. Interviews with NLP management demonstrated that not every single process needs to be defined according to the recommended Microsoft “best-practices”. Because the ARN development network lacks a true business environment, certain service level agreements (SLAs) are not required. Without those SLAs, implementing all of the SMFs defined within the MOF (e.g. high availability management) is not justified. This realization also reduced the area of concern for this project. IV. Changes to the project plan My initial intention was to start doing daily and weekly data backups on the ARNe network, but I realized that there was no specific written procedure for how and when backup activities should be carried out on the ARNe network.
  29. 29. Due to this development, I decided to concentrate on developing a procedure which subsequent practicum students can follow and update accordingly. V. Evaluation of whether or not the project met project goals This project met its goals and objectives: (1) An operational framework was established (2) A basic procedure for data backup was established (3) Disaster recovery plans were formulated. All goals and objectives of this project were structured to accommodate future inputs and expansion. VI. Discussion of what went right and what went wrong in the project The area that went wrong with the project concerns the lack of current documentation of the existing network structure that is being used at each campus of the DTC. The needs identified by NLP management may not be the needs of each individual campus. High-level ARN management defined the needs for the entire organization. Therefore the guidelines that were developed are going to be directed downward. This may cause unrest and contention amongst members of mid-level ARN management.
  30. 30. In addition, the DTC NLP participants were making changes to the ARN that were not aligned with the guidelines being developed in this project. The implementations that the DTC, and possibly other campuses, were making to the ARN during the development of these guidelines may have to be scrapped and redesigned if management actually follows through with the implementation of the guidelines proposed in this project. The major area that went right in this project is that the chosen operational framework fits well with the goals defined within the project. The fact that the Microsoft Operational Framework had already been decided upon by ARN management made the research and development of guidelines very easy to facilitate. VII. Discussion of project variables and their impact on the project The greatest project variable was the decisions of ARN management. The guidelines presented in this project offer a template for “best practices” in an environment that has funding and everyday administration. Many responsibilities fall on high-level and mid-level management to ensure that guidelines are followed. Management may not view such highly “expensive” guidelines as necessary. However, the guidelines should be used as a “golden state” template, and management will tweak the guidelines to meet the needs of the ARN accordingly. VIII. Findings / analysis results The analyses of the results of this project are:
  31. 31. (1) an organizational structure was defined; (2) needs were identified and prioritized; (3) a logical/physical network structure was designed; and (4) operational guidelines were created for daily network operations. IX. Summary of results The analyses of the results of this project are: (1) an organizational structure was defined; (2) needs were identified and prioritized; (3) a logical/physical network structure was designed; and (4) operational guidelines were created for daily network operations.
  32. 32. 4.1 Data Backup and Recovery (DTCBACK01 Server as A case study) Backup Procedure: Login to DTCBACK01 From you desktop select Program >>>>> then Control Panel >>>> Then Remote Desktop connection >>> Then enter as show below, then connect 2) Then you should have this screen below
  33. 33. 3) After entering you password you should have the screen below 4) Then select Remote Desktop Connection from the desktop then login again with you username and password. After you successfully login, you should be looking at the screen below.
  34. 34. 5) You can now select VERITAS Backup Exec from the desktop or go through the Start menu as shown below.
  35. 35. Finally, you should see the Veritas overview screen and the available options, as shown below. See table 4.1 (Data Backup and Recovery Support Matrix). See table 4.2 (Data Backup and Recovery Support Responsibilities). 4.1.1 Provide Documentation The Backup and Recovery Team will be responsible for providing documentation on the installation and configuration guidelines of the backup toolsets. The documentation will be posted on the SEAD practicum SharePoint site (www.arn- ) and will be reviewed and updated on a 6-month review cycle.
  36. 36. 4.1.2 Provide Access to the Toolset The Backup and Recovery Team, in conjunction with practicum faculty, will be responsible for providing all files needed for the installation of each supported version of the backup tool, its agents, and software build updates. The locations of the files are given in the documentation. 4.1.3 Install Software and Patches The faculty and the Backup and Recovery Team will coordinate software installations on new servers using the documentation and toolset provided by the Backup and Recovery Team. The Backup and Recovery Team will install software patches, updates and fixes as necessary. 4.1.4 Request and Provide Licenses The Faculty lead will provide appropriate Veritas backup licenses to the Backup and Recovery Team for new servers being brought into production, hardware upgrades, database installations, or database upgrades. Different versions of backup software may not be compatible when used in the same backup scheme. 4.1.5 Create Change Control The change implementer will normally submit changes using the MOF change management approach. Backup and Recovery Team support will submit changes
  37. 37. for software and update installation. The change must include Faculty and the Backup and Recovery Team Lead as approvers. See table 4.3 (Data Backup Configuration and Management). 4.1.6 Configure and Verify Backups The Backup and Recovery Team will be responsible for configuring server and file backups. The backup configuration will be checked and verified every 4-6 months and whenever a failure occurs. 4.1.7 Coordinate Database Backups Faculty and the Data Access Team lead will coordinate all backup processes to ensure proper security of data. 4.1.8 Save and Document Backup Scheme Backup configurations are saved automatically on a regular basis as part of the normal log collection process. 4.1.9 Check Failures and Update Reports The Backup and Recovery Team will investigate the failure reports first thing in the morning, Monday through Friday. After failure detection and resolution, the failures are logged. 4.1.10 Failure Notification and Resolution The Backup and Recovery Team will try to identify and resolve any failure.
  38. 38. In the case of an unresolved or second consecutive failure, the Backup and Recovery Team will notify their group team lead and the faculty in charge to coordinate the best solution. The Backup and Recovery Team Lead will work with faculty to determine the root cause of the problem. The knowledge base and other means of troubleshooting will be utilized. A hardware or network failure that impacts a backup will be treated as an Unresolved or Second Consecutive Failure if a workaround cannot be established. The Backup and Recovery Team will notify the faculty in charge to determine whether the backup should be moved to other available resources or the risk of subsequent failures accepted. See table 4.4 (Daily Monitoring and Failure Notification). 4.1.11 Handling Loss of Drives The Backup and Recovery Team (Data Access group) will notify the faculty in charge of the practicum that a drive is missing media immediately after it is detected. Faculty, with the Backup and Recovery Team lead, will take appropriate action. 4.1.12 Team Knowledge Database The Data Backup team will create a knowledge base document so that known problems can be resolved easily. The knowledge base will be updated periodically. The Backup and Recovery “Team Knowledge Base” is to be used to: • Track progress on large backup issues
  39. 39. • Provide a central location for viewing progress • Allow updates by multiple parties • Provide technical reference for similar issues. The Backup and Recovery Team Lead will create and monitor the entry when the above needs cannot be met using standard failure notification. 4.1.13 File Restoration and Recovery of Corrupt or Deleted Files The Backup and Recovery Team will carry out file restoration. The team will liaise with the faculty to determine which files to restore and make sure the appropriate media has been received and mounted. The Backup and Recovery Team will then coordinate and perform the restore. See table 4.5 (File Restoration and Recovery of Corrupt or Deleted Files). 4.1.14 Label Media The Backup and Recovery Team will accurately label all backup media according to the media rotation and retention defined by the faculty. Media requirements should be communicated to the Backup and Recovery Team by the faculty. Refer to “Sample Media Labeling” (table 4.6) in the appendix of this document. NetBackup uses barcode media labels in the robotic devices. These can be any 6-character alphanumeric combination. Labels exceeding the 6-character limit will be truncated from the left. Typically, cleaning tapes are designated with a CLN### label. Bar codes should not be transferred from a tape with usable data, or the NetBackup database will become corrupted and data recovery will be jeopardized.
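The labeling rules in 4.1.14 can be illustrated with a short Python sketch. It is only an illustration, not part of NetBackup: the function names are hypothetical, "truncated from the left" is assumed to mean that the leading characters are dropped and the last six are kept, and the "###" in CLN### is assumed to stand for three digits.

```python
import re

MAX_LABEL_LEN = 6  # 6-character limit stated in section 4.1.14


def effective_label(barcode: str) -> str:
    """Return the label as it would read after truncation.

    Assumption: truncating "from the left" drops the leading characters
    and keeps the last six.
    """
    cleaned = barcode.strip()
    if not cleaned.isalnum():
        raise ValueError(f"label must be alphanumeric: {barcode!r}")
    return cleaned[-MAX_LABEL_LEN:]


def is_cleaning_label(label: str) -> bool:
    """True for labels of the form CLN### (assumed: CLN plus three digits)."""
    return re.fullmatch(r"CLN\d{3}", label) is not None
```

A check like this could be run over a list of planned labels before they are printed, so that two long labels that collapse to the same six characters are caught early.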
  40. 40. 4.1.15 Maintain Cleaning Schedule It is recommended to run a cleaning tape proactively at least once every 30-45 days, or as needed, unless otherwise specified by the tape device manufacturer. The Backup and Recovery Team will maintain a cleaning schedule similar to the “Sample Cleaning Tracking Report” found in the appendix of this document. 4.1.16 Change Media and Acquire Off-Site Media Onsite support will mount the correct media in accordance with the backup retention and rotation prior to the start time of the nightly backup. When the backup has exceeded the capacity of the media or when the current media has become damaged, the Backup and Recovery Team will follow up with the faculty. 4.1.17 Server Access and Administration The faculty on site will allow the Backup and Recovery Team appropriate access to a server whenever the need arises. 4.1.18 Sample Cleaning Tracking Report The name of the tape device is usually associated with the server to which it is connected. TAPE DEVICE JANUARY FEBRUARY MARCH APRIL DTCBACK01XXX Monday 1st Monday 5th Monday 5th
  41. 41. After approximately 12-15 uses the cleaning tape should be replaced.
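The cleaning guidance in 4.1.15 and 4.1.18 (clean every 30-45 days, replace the cleaning tape after approximately 12-15 uses) could be tracked with a small helper such as the Python sketch below. The function names are hypothetical, and using 12 uses as the replacement threshold is an assumption that takes the conservative end of the stated range.

```python
from datetime import date, timedelta

CLEAN_MIN_DAYS = 30  # earliest recommended interval (section 4.1.15)
CLEAN_MAX_DAYS = 45  # latest recommended interval (section 4.1.15)
MAX_TAPE_USES = 12   # assumption: conservative end of the 12-15 use range


def next_cleaning_window(last_cleaned: date) -> tuple[date, date]:
    """Return the earliest and latest dates for the next cleaning run."""
    return (
        last_cleaned + timedelta(days=CLEAN_MIN_DAYS),
        last_cleaned + timedelta(days=CLEAN_MAX_DAYS),
    )


def cleaning_tape_needs_replacement(uses: int) -> bool:
    """True once the cleaning tape has reached the assumed use limit."""
    return uses >= MAX_TAPE_USES
```

For example, a drive cleaned on 3 January 2005 would be due again between 2 February and 17 February 2005, unless the device manufacturer specifies otherwise.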
  42. 42. Data Backup Process Flowchart [Figure: Data Backup and Restore Procedure. Box labels: Check for problem; Problem detected (Yes/No); Open failure report; Failure notice; Contact faculty support; Troubleshoot with other Data Access group members; Problem (Yes): follow escalation; Update the daily problem log / knowledge base; Verify that jobs are now ready for backup or restore; Return.]
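The notification rule from sections 4.1.9 and 4.1.10 (log failures that are resolved, and notify the team lead and faculty on an unresolved or second consecutive failure) can be sketched in Python as follows. The class, field names, and returned strings are hypothetical and only illustrate the decision; they are not part of Backup Exec or any ARNe tooling.

```python
from dataclasses import dataclass


@dataclass
class BackupResult:
    job: str
    succeeded: bool
    resolved: bool = True  # only meaningful when succeeded is False


def next_action(previous: BackupResult, current: BackupResult) -> str:
    """Decide what to do after the morning check of the failure report."""
    if current.succeeded:
        return "none"
    second_consecutive = not previous.succeeded
    if second_consecutive or not current.resolved:
        # Section 4.1.10: unresolved or second consecutive failure
        return "notify team lead and faculty"
    # Section 4.1.9: a detected and resolved failure is logged
    return "log failure"
```

Keeping the rule in one function makes it easy to adjust if faculty later decide that, for example, hardware failures should always be escalated.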
  43. 43. 4.2 Disaster Recovery 4.2.1 Perform Risk Assessment and Audit To build an effective disaster recovery plan for the ARNe network, this project considers the following threats as potential disasters that can affect the network: a) Accidental: loss of power; b) Natural: floods, earthquakes, hurricanes, tornadoes; c) Internal: sabotage, theft, etc. Inventory of Assets on the ARNe Network and Severity
  Sites, in column order: CS | ILB | DTC (each asset is followed by its severity level)
  Primary W2K3 DC Ghost Server: Mission Critical | Broomfield Server = ILOFS03: Important | Firewall: Mission critical
  Secondary W2K3 DC/Ghost Server: Important | Broomfield Server = ILOFS04 = 192.168.X.X: Mission critical | Citrix 03: Mission critical
  44. 44. W2K3 Citrix Server: Mission critical | SaintMary (Citrix Server = 192.168.X.X: Mission critical | VMware Servers: Mission Critical
  Solaris 10 x86: Important | SQL server (NLP-XXXXXXXX) =: Important | Sun Solaris Server: Important
  Win2000: Mission critical | Saintluke (DC/AD): Mission Critical | NETGEAR fast Ethernet switch (fs116) 16 port
  NETGEAR fast Ethernet switch (fs 108): Mission critical | NetIQ: Mission critical
  Cisco router 2500: Mission critical | DTCBACKUP01 (Backup Server): Mission Critical
  45. 45. Mission critical NLP T1 Red Switch | Acadunix.regis.xxxx: Mission Critical
  SEVERITY BASED ON IMPACT IN CASE OF DISRUPTIVE EVENT
  Mission Critical: This will cause extreme disruption to the network, and practicum students will not be able to function.
  Important: This will cause a moderate disruption to the network.
  4.2.2 Disaster Recovery Plan for the ARNe Network The plan will entail the following: 1) The purpose and scope: The recovery scope will cover all assets on the ARNe network in all three locations (DTC, ILB and CS). All detailed
  46. 46. documentation (software and hardware), physical safeguards, insurance considerations, and contingencies are considered. Also, computer services and telecommunication links to all three sites are taken into consideration to avoid undue disruption. 2) Creating, maintaining and protecting backups: All data backups made at the DTC location through the DTCBACK01 server will be kept off-site. Up-to-date backups of all applications and data will be maintained. This step ensures that data can be recovered in case of any loss and helps maintain data integrity in case of any disparity. All tape backups will be protected against strong magnetic fields, which can destroy the tapes. 3) Disaster Recovery Team (DRT): ARNe management will set up a recovery team comprising the various practicum groups (Data Access group, Development group, Operation group and System Network group) on the ARNe network for effective response to any incident. This team will be organized into areas of responsibility such as collecting and analyzing evidence, containing and preventing further intrusions, and updating the recovery plan. The recovery team will have an established line of communication (email, phones, etc.). 5) Disaster Recovery Procedure: In case of any disaster on the ARNe network, the disaster recovery team needs to be assembled. The team will decide which of the alternate sites (CS, ILB or DTC) to utilize
  47. 47. for business continuity purposes, depending on which sites are affected by the disaster. Personnel must be evacuated immediately if practicum students are onsite at the time of the incident. On completion of the initial disaster recovery phase, the DRT leader(s) should prepare a report on the activities undertaken. The report should contain information on the emergency, who was notified and when, and the actions taken by members of the DRT together with the outcomes arising from those actions. The report will also contain an assessment of the impact on normal business operations. The report should be given to the practicum faculty management Business Recovery Team (beyond this project), with a copy to any management team, as appropriate. 6) Recovery procedure: After the initial response has been put in place and operations have shifted to an unaffected site, the recovery team will start using the data backups that have been kept off-site. The procedures for business continuity (beyond the scope of this project) and restoration of data will be observed. Full recovery from the disaster and a prompt return to normal business operations also need to be addressed. 7) Preparing for a disaster and testing the plan: Procedures and safeguards will be reviewed constantly to reduce the risk of a disaster and evaluate the level of impact. This includes general procedures and software safeguards. Also, all systems (desktops and servers), network devices,
  48. 48. communication links, and office facilities will be tested on a periodic basis to ascertain readiness for recovery from a disaster. A realistic test of the components of a business continuity plan should be conducted and analyzed so that modifications can be made as necessary. In the event of serious injury or even death of an employee, it would be beneficial if the person notifying had access to counseling service contact numbers in order to be able to offer this type of support and advice. 8) Assessing Potential Business Impact of the Emergency Assessments need to be made at various stages during the recovery process as to the potential scale of the emergency from a business perspective. During the disaster recovery process, these will include a preliminary damage assessment. The initial assessments will normally be carried out by the Disaster Recovery Team, who may call on other specialists to help them with this process as appropriate. The assessments will be based on the particular circumstances applying, and the following five-point scale may be considered appropriate.
  49. 49. 9) Maintaining Event Log during Disaster Recovery Phase It is important that all key events during the disaster recovery phase are recorded. An event log should be maintained by the leader of the Disaster Recovery Team. This event log should be started at the commencement of the emergency, and a copy of the log passed on to the Business Recovery Team once the initial dangers have been controlled. The format should include the date, time, title of the event, brief description of the event and outcomes. It should also include follow-up action needed, as appropriate. Chapter 5 5.0 Lessons Learned and Next Evolution of the Project No disaster recovery plan is a static document, including this one, but it represents the starting point for the ongoing maintenance necessary to keep any such plan current and to assist those who will be responsible for maintaining and safeguarding the ARNe network. Each student exiting the NLP is
  50. 50. responsible for ensuring that any relevant information pertaining to data backup and disaster recovery is well documented as an ongoing process. 5.1 Conclusion Data backup and disaster recovery are critical for any organization. Business continuity planning and disaster recovery planning are now generally acknowledged as vital elements of an organization's business activity plans. In conclusion, a sound data backup and disaster recovery plan is essential to protect the well-being of an organization. Practicum Support Documentation List of Tables Table 4.1 Data Backup and Recovery Support Matrix Columns: Action; Data Access Group; Approval (Faculty). Actions: Provide Software Documentation; Provide Access to Toolset; Install Software and Patches (Staffed Sites); Install Software and Patches;
  51. 51. Request Licenses; Order Licenses; Create Change Control for Software Installation; Configure Backup Jobs; Verify Backup Configuration; Coordinate Database Backups; Save and Document Backup Configuration; Check Failure Report; Failure Notification; Update Failure Logs; Check Failure on Web Reports; Hardware Failure; Create Issue in Tracking Database; Update and Monitor Issue in Tracking Database; Recover Corrupt or Deleted Files; Mount Correct Media; Attain Tapes from Off-Site Storage Location; Complete Test Restores;
  52. 52. Label Media; Maintain Cleaning Schedule; Change Media for Backups; Communicate Additional Media Needs; Provide Appropriate Server Access. Table 4.2 Backup and Recovery Support Responsibilities Columns: Action; Implementation Group (Data Access group); Approval. Actions: Provide documentation; Provide access to the toolset; Install software and patches; Install software and patches; Request appropriate licenses; Acquire and provide appropriate licenses; Create change control. Table 4.3 Data Backup Configuration and Management Columns: Action; Implementation (Data Access group); Approval (Faculty). Actions: Configure backup jobs; Verify backup configuration;
  53. 53. Coordinate specific backup for databases; Save and document backup. Table 4.4 Daily Monitoring and Failure Notification Columns: Action; Practicum Student (Data Access group); Faculty. Actions: Check failure reports; Immediate failure notification; 2 or more failure notification; Update the failure logs; Check reports on web site; Hardware failure. Table 4.5 File Restoration and Recovery of Corrupt or Deleted Files Columns: Implementation (Data Access group); Faculty. Actions: Recover corrupt or deleted files; Mount correct media;
  54. 54. Attain tapes from off-site storage location; Monitor and report; Complete test restores. Table 4.6 Media Labels Daily Media: “SERVERNAME” Monday; “SERVERNAME” Tuesday; “SERVERNAME” Wednesday; “SERVERNAME” Thursday. Weekly Media: “SERVERNAME” Week 1; “SERVERNAME” Week 2; “SERVERNAME” Week 3; “SERVERNAME” Week 4. Monthly Media: “SERVERNAME” January; “SERVERNAME” February; “SERVERNAME” March; “SERVERNAME” April; …..
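The label pattern in Table 4.6 can be generated programmatically. The Python sketch below is a hypothetical helper, not an existing tool; it assumes that the monthly list, which the table shows only through April, continues through December.

```python
DAILY = ["Monday", "Tuesday", "Wednesday", "Thursday"]
WEEKLY = ["Week 1", "Week 2", "Week 3", "Week 4"]
# Assumption: the table lists January-April followed by an ellipsis,
# so the remaining months are taken to follow the same pattern.
MONTHLY = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def media_labels(server_name: str) -> dict[str, list[str]]:
    """Build the daily, weekly and monthly media labels for one server."""
    return {
        "daily": [f"{server_name} {day}" for day in DAILY],
        "weekly": [f"{server_name} {week}" for week in WEEKLY],
        "monthly": [f"{server_name} {month}" for month in MONTHLY],
    }
```

Calling media_labels("DTCBACK01") would give the full set of 20 labels for the backup server used as the case study, which can then be printed or checked against the media on hand.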
  55. 55. List of Figures Figure 2.1 MOF (Microsoft Operations Framework) Quadrant
  56. 56. Figure 2.2 SMF (Service Management Function) of each MOF Quadrant Figure 2.3 SOA (Service Oriented Architecture)
  57. 57. Figure 3.0 SDLC (System Development Life Cycle) [Figure labels: Planning; Analysis; Design; Implementation; Support/Maintenance; Technology Vendors/Service Provider; Update and Review; Suspend Project; documentation]
  58. 58. Bibliography 1) Mayo, Sophie. "Service Oriented Architecture: The Services Opportunity." An IDC Report Series. 21 Jul. 2005>. This IDC report series discusses how the emergence of service-oriented architectures (SOA) and Web services promises to be a crucial enabler in the dynamic, on-demand IT and business computing journey. The report series, Service Oriented Architecture: The Services Opportunity, examines the most pertinent topics in this area to help service providers enhance their services portfolios and guide their strategic direction in this rapidly evolving area. 2) "MOF Process Model for Operations." Microsoft Operations Framework (MOF). 17 2005. Microsoft Inc. 21 Jul. 2005 < of/mofpm.mspx#ECAA>. This paper describes the Microsoft Operations Framework (MOF) Process Model, one of the two core MOF models. (The other is the MOF Team Model.) The MOF Process Model describes Microsoft's approach to the IT operations and service management life cycle. The Process Model
  59. 59. organizes the life cycle into quadrants, with each quadrant having a specific focus and set of tasks that are carried out through its corresponding set of service management functions (SMFs). 3) "Storage Area Network: An Approach to Data Backup and Recovery." Storage Environment. Brocade Communication Systems Inc. 21 Jul. 2005 < broca.pdf>. This article highlights some of the advantages of implementing SANs. As enterprise data becomes an increasingly essential business asset, ensuring its stability and protection is more critical than ever. Many organizations have faced the challenge of having to back up growing data within shrinking backup windows. The backup server receives data from other servers across a LAN or wide area network (WAN), then stores that data on centrally owned disk and tape resources. SANs improve storage resource management through centralization, even within distributed information technology (IT) architectures.
  60. 60. References 1 Ciampa, Mark. Security+ Guide to Network Security. 2nd ed. Boston: Thomson Course Technology, 2005. 2 Holden, Greg. Guide to Network Defense and Countermeasures. 2nd ed. Boston: Thomson Course Technology, 2003. 3 Hoskins, Micheal. "Developing SOA Solutions To Accommodate Variety and Change." Pervasive Software. 21 Jul. 2005 < ons.pdf 4 Johnson, Judith J. "Disaster Recovery Planning With a Focus On Data Backup/Recovery." 26 2001. SANS Institute. 18 Jul. 2005 <>. 6 Web Services and Service-Oriented Architectures. Barry & Associates, Inc. 19 Jul. 2005 <http://www.service->. 7 "Developing an effective data backup/recovery procedure." The do's and don'ts of backup. 10 Jul. 2005 ure.pdf 8 "Data Center Contingency Management / Disaster Recovery Plan." 25 1998. Disaster Recovery Plan. 12 Jul. 2005>.
  61. 61. 9 "Contingency Planning & Business Continuity Plan Development: Disaster Recovery Plans." 2005. Contingency Planning Technologies. 12 Jul. 2005 <>. 10 "Disaster Recovery: Best Practices White Paper." Cisco Systems, Inc. 17 Jul. 2005 <>. 11 Disaster Recovery Journal. 19 Jul. 2005 <>. Definition of Terms Definition of a Disaster: “An event that creates an inability on an organization’s part to provide critical business functions for some predetermined period of time.” [10] Definition of a Disaster Recovery plan:
  62. 62. “The document that defines the resources, actions, tasks and data required to manage the business recovery process in the event of a business interruption. The plan is designed to assist in restoring the business process within the stated disaster” [10] Definition of Disaster Recovery: “The ability to respond to an interruption in services by implementing a disaster recovery plan to restore an organization’s critical business functions.” [10] Definition of Data Backups: “The backup of system, application, program and/or production files to media that can be stored both on and/or offsite. Data backups can be used to restore corrupted or lost data or to recover entire systems and databases in the event of a disaster. Data backups should be considered confidential and should be kept secure from physical damage and theft.” [10] Definition of Backups (Data): “A process to copy electronic or paper-based data in some form to be available if the original data is lost, destroyed or corrupted.” [10] Definition of Data Recovery: “The restoration of computer files from backup media to restore programs and production data to the state that existed at the time of the last safe backup.” [10]
  63. 63. Definition of Business Continuity: “The ability of an organization to ensure continuity of service and support for its customers and to maintain its viability before, after and during an event”.[10] Definition of Risk Assessment / Analysis: “Process of identifying the risks to an organization, assessing the critical functions necessary for an organization to continue business operations, defining the controls in place to reduce organization exposure and evaluating the cost for such controls. Risk analysis often involves an evaluation of the probabilities of a particular event”. [10]