Recovering From the Ground Up Case Study

Dell builds a blueprint for recovery, leading to a service that can help customers minimize risk when disaster strikes.

Published in: Technology, Business

SOLUTION • BACKUP/RECOVERY/ARCHIVING

Recovering From the Ground Up

Dell builds a blueprint for recovery, leading to a service that can help customers minimize risk when disaster strikes

Customer Profile
Country: United States
Industry: Technology
Founded: 1984
Number of employees: 80,000
Web Address:

Challenge
Dell’s failover plans for disaster recovery needed to be refined to minimize downtime for essential applications and processes and help ensure business continuity in the event of a disaster.

Solution
The Dell disaster recovery team assessed the potential for loss, prioritized applications and processes, and established thorough and repeatable best practices, ultimately evolving its methodology into Dell Disaster Recovery Consulting Services.

Benefits
Run It Better
  • Dedication to documentation, testing, and improvement has helped Dell speed failover from eight hours to mere minutes in some cases
  • Working with auditors to improve processes and create an accountable methodology for fixes has accelerated the audit process and minimized the impact on daily workload
  • Planning risk avoidance alongside Dell’s insurance carrier allowed Dell to reduce its monthly premium after implementing a secondary data center and failover procedures

The ability to stay up and running in the face of a hurricane, fire, malicious attack, or even a blackout can be crucial to the long-term survival of every business. Without effective disaster recovery (DR) and business continuity plans, businesses of all sizes risk lost income, productivity, or worse. As a result, more organizations are developing plans to resume business as usual after an unplanned outage while balancing the risks and costs of disaster recovery.
“At Dell, the Disaster Recovery Policy requires that every class 1 application conducts a disaster recovery test every year. Now our staff knows the procedure well enough to fail over quickly because they have tested year after year.”
Debi Higdon, practice lead for DR Services, Dell

At Dell, disaster recovery and business continuity are top priorities, and have been for years. A successful disaster recovery plan is most often proven via audits that assess clear and concise plans for recovery procedures, particularly in regard to mission-critical applications. Auditors also look for proof of successful disaster recovery testing. Since launching its first disaster recovery plan into production in September of 2002, Dell has successfully passed every annual internal and external audit.

Over the years, Dell has refined and improved those plans, developing real-world experience that can help other companies in need of effective disaster recovery. Those efforts have today evolved into Dell Disaster Recovery Consulting Services, a division dedicated to sharing the disaster recovery knowledge assembled over the last eight years.

“We started with a blank whiteboard and an absence of preconceived notions, and worked on our own to develop a recovery plan from the ground up,” says Debi Higdon, practice lead for DR Services at Dell, and DR Test Manager at Dell from 2001 to 2008. “As a result, we quickly developed a set of core best practices around assessing needs and provisioning a plan to meet those needs.”

Aligning IT With Business Units
An effective disaster recovery and business continuity plan depends on an enterprise’s ability to identify critical processes and technologies and balance risks with the costs of continuity efforts. In order to achieve the critical assessment necessary for success, Dell recommends first closely aligning IT and business staff to make decisions as a team. “We asked ourselves, who from the business would call Event Management the quickest if an application is down?” says Higdon. “While direct, that usually provides a great impression of how much downtime your business units can withstand. Once you have a starting point, IT and business staff can refine that feedback into clearly articulated business goals.”

How It Works

Hardware
  • Dell™ PowerEdge™ R900 and Dell PowerEdge 2950 servers with Intel® Xeon® processors

Software
  • Microsoft® SharePoint® Server
  • Oracle® Database 10g
  • Oracle Data Guard software
  • Oracle Enterprise Manager 10g Grid Control
  • Oracle Real Application Clusters 10g

Services
  • Dell Disaster Recovery Consulting Services
Defining Priorities in a Business Context
The Dell disaster recovery team worked with Dell business units to conduct a rigorous analysis, identifying the applications and business processes that were most critical and thus needed to be online first after a disaster. “At Dell, our customers come first, so we classified the most critical applications as those that touch sales, manufacturing, shipping, or service,” explains Higdon. “But it’s not always about revenue. For example, health care and government organizations will have very different priorities in the event of a disaster. To establish rules for criticality, each organization has to determine what could happen in a disaster and what the impact would be.”

Prioritizing processes, applications, and data according to their business impact helps to ensure that the appropriate investment is made to recover the most crucial systems first. In the tiered system at Dell, class 1 applications fail over to a secondary data center within four hours, while class 2 applications have a recovery time objective (RTO) of 4 to 48 hours after an incident. Class 3 applications are recovered as a “best effort” whenever resources become available. If the mission-critical systems are not set up to be active/active across the entire application stack, those applications will need to be prioritized if there are not enough resources to support the recovery at the time of disaster.

Establishing Recovery Objectives
Dell’s business units and recovery IT staff also worked together to establish RTOs and recovery point objectives (RPOs). Like most organizations, Dell had grown rapidly and in an ad hoc manner, making a complete application inventory a necessary first step. “We went through each application one at a time and identified how much downtime the application could withstand,” says Higdon. Dell’s mission-critical applications had much less leeway when considering RPO. “RPO is really about how much data you can afford to lose,” explains Higdon. “Since our mission-critical applications are centered on customer interactions, we want to establish a low RPO.”

Thinking Outside the Data Center Box
In many cases, enterprises build a disaster recovery plan that assumes communications modes will be operational and the network will be available. Dell also learned that a company of its size couldn’t depend on every employee to be available should service be interrupted. “You definitely have to consider logistics when establishing a plan or building a disaster recovery site,” says Higdon. “How quickly can people get to this site? How much of the IT staff can work remotely? How can you ensure power can be supplied 24/7? These are essential considerations.”

Balancing Recovery and Business Needs With Oracle RAC on Dell PowerEdge Servers
Once RTOs and RPOs were determined, Dell began exploring disaster recovery solutions. By assessing the potential financial losses of a disaster as well as the risks to its data center, the company could better balance business needs with the cost of recovery. “To determine an appropriate budget for disaster recovery, we calculated all of the potential financial risks associated with a worst-case scenario,” says Higdon. “Working with our financial teams, insurance carriers, and even a local meteorologist helped us establish a realistic budget for a disaster recovery plan.”

Dell settled on an active/active approach for some of its mission-critical applications that would provide rapid failover to a secondary facility in the event of a disaster, helping to ensure as short an RTO and RPO as possible. Dell data centers run Oracle® Real Application Clusters (RAC) 10g technology on Dell™ PowerEdge™ R900 and Dell PowerEdge 2950 servers with Intel® Xeon® processors. Dell data centers also use Oracle Data Guard software, which helps manage standby databases, and Oracle Enterprise Manager 10g Grid Control, which provides a single point of management for all of the Dell global production databases.

Oracle RAC technology helps to ensure high application availability. If one system in a cluster fails or is taken down for maintenance, the others can pick up its workload instantly. “About 72 percent of the Oracle databases we have in production are Oracle RAC 10g,” says Logan McLeod, IT strategist at Dell. “Oracle RAC provides high availability and scalability, and it enables us to dynamically respond to ever-changing workloads in our environment.”

While supporting extremely short RTOs and RPOs was critical to helping Dell reduce the impact of downtime on sales and customer-oriented processes, other organizations may have different priorities. “As a general rule, shorter recovery times mean more expensive solutions,” says Higdon. “If data isn’t needed to keep the business operational in the near term, tape backups stored in a secure off-site location may be appropriate. Virtualization can also present a cost-effective option for many businesses, providing flexible disaster recovery in a very secure one-to-many relationship.”

Reducing RTO to Within Minutes With Disaster Recovery Best Practices
Over time, Dell’s continued dedication to improving its disaster recovery plan, processes, and the use of new technologies has led to a drastic improvement in its RTO. “In building a disaster recovery practice from the ground up, we began with what appeared to be a very complex set of systems and then simplified them through process analysis, automation, and rigorous testing,” says Higdon. “While Dell’s initial recovery efforts resulted in six to eight hours to fully transition to the failover environment, persistence and experience have helped us shorten failover of some mission-critical systems to minutes.”

Regularly Updating Documentation Encourages Reliable, Repeatable Processes
To develop and enforce a reliable, repeatable disaster recovery plan, Dell documents the recovery process for each of its mission-critical applications and infrastructure elements.

“Our rigorous dedication to step-by-step documentation has been a secret to our success,” says Higdon. “Over time, we’ve developed a solid template for all applications that includes step-by-step instructions for failover and information about where the servers are located, IP addresses, upstream and downstream dependencies, schematics, and more.”

Those documents, as well as related information like application failover plans, master recovery plans, classification information, test plans, test requirements, and scripts, are stored on a central Microsoft® SharePoint® server dedicated to disaster recovery and also kept on CDs in three separate locations. Even if a disaster cripples phone service, network availability, or transportation, Dell can still begin the recovery process. “If documentation is not updated at least once every six months, a red flag is raised at the executive level through the disaster recovery scorecard,” says Higdon. “By keeping the step-by-step recovery process front of mind, Dell ensures that its plan is ready when the company needs it most. Dell is moving forward with virtualization within its own environment.”

Assessing Application and Process Interaction Prevents Domino-Effect Downtime
Over time, Dell Disaster Recovery Consulting Services has learned to assess the interactions of applications and business processes both before and after an outage. “Good planning requires looking ahead a step,” says Higdon. “If a class 2 application could cause a domino effect in your mission-critical applications by going down, it should be reclassified as class 1. Likewise, chronological priorities should be considered when planning recovery. For example, shipping shouldn’t come back online before order management. You have to be able to prioritize applications so that if you don’t have enough resources to bring everything back up at the same time, you can bring them up in an order that makes sense.”

Rigorous Testing Reduces Failover Time and Prepares for Complex Recovery
Dell Disaster Recovery Consulting Services is quick to point out that disaster recovery should not end with the failover plan. “Our number-one best practice is to test our disaster recovery plan, again and again,” says Higdon. “At Dell, the Disaster Recovery Policy requires that every class 1 application conducts a disaster recovery test every year. Now our staff knows the procedure well enough to fail over quickly because they have tested year after year. And once we’re running from the secondary data center, we process real transactions from some applications to make sure everything’s running smoothly.”

By adding integration testing to its course of failover testing, Dell gets a clearer idea of what could happen during a real recovery process. “Failing over a single application may be easy,” says Higdon. “But when that application talks to 20 other applications and they all go down at the same time, that’s when you really know what’s going to happen in a real disaster. By simulating catastrophic outages, Dell learned how to account for application interdependencies as it restores service.”

Cross-Training Improves Disaster Preparedness
By ensuring that multiple staff members are prepared to recover any given system, Dell has improved disaster preparedness. “Realistically, we had to assume a certain degree of chaos in the event of a real disaster,” explains Higdon. “To counter that, we’ve instituted a large degree of cross-pollination when it comes to recovery assignments in IT. During a DR test, we rely on a well-trained networking team halfway around the world that is far removed from any disaster at our headquarters. Database administrators rotate the support of the databases during a disaster recovery test. If something catastrophic happens, I feel confident because we don’t have just a single employee who knows any given application.”

Recovery Planning Eases Auditing Compliance Processes
In addition to yearly internal audits, Dell’s extensive disaster recovery planning has drastically eased compliance with internal and external audits as well as yearly Sarbanes-Oxley compliance audits. “Instead of forming an adversarial relationship with auditors, we’ve learned to work closely with them and incorporate their feedback,” says Higdon. “As a result, action items are assigned a case handler and a management action plan until they are completed. Our documentation and processes have helped accelerate the auditing process and minimize the amount of time it takes to conduct an audit.”

End-to-End Planning Prevents Single Points of Failure
“There can be no gaps,” says Higdon. “A single point of failure could be a particular application or database server, a lone backup generator in a data center, or the long-haul network itself. Organizations should perform a specific and detailed single-point-of-failure analysis across the entire infrastructure. That kind of gap analysis can help prevent a major outage if a relatively minor component fails.”

The Dell disaster recovery team also determined that its disaster recovery plans had to look past keeping the data center up and running. Effective business continuity planning must support all vital business functions, such as shipping and manufacturing. “It’s not just about applications; it’s about buildings, infrastructure, and people,” says Higdon. “By creating a plan to reroute manufacturing orders from a destroyed manufacturing facility or rerouting calls to a call center that has been taken off the grid, we can improve business continuity for processes outside of our data centers.”

Protecting Systems Helps Lower Insurance Premiums
Dell also discovered a hidden cost benefit as it completed the first phase of its disaster recovery plan. By implementing a secondary data center and failover procedures, the company saved on its insurance premium. “From day one, we worked with our insurance company. Who better to teach you about risk than someone that deals with insurance claims every day?” says Higdon. “Once a disaster recovery solution is in place and working, Dell Disaster Recovery Consulting Services recommends contacting your insurance carrier to ask about a reduction in premium.”

Leveraging Recovery Experience to Benefit Customers
Dell now has tested disaster recovery plans that are ready to help the organization recover mission-critical applications rapidly and continue to serve customers even in the event of disasters, as well as the experience that comes from developing those plans from the ground up. “Everything that Dell has developed (templates, documentation, best practices, and methodology) is now being shared with our customers through the DR Service Offering,” says Higdon. “Dell Disaster Recovery Consulting Services helps customers design solutions that fit their unique needs while also better protecting their business should the unthinkable happen.”

For more information on this case study or to read additional case studies, go to DELL.COM/Casestudies.

Simplify your total solution at DELL.COM/Simplify

August 2009. © 2009 Dell, Inc. Dell is a trademark of Dell Inc. Intel, the Intel logo, and Intel Xeon are registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. Microsoft and the Microsoft logo are registered trademarks of Microsoft Corporation in the United States and/or other countries. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. This case study is for informational purposes only. DELL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS CASE STUDY. G910009224