VMware Disaster Recovery Planning: Essential Checklist



Planning and implementing a VMware disaster recovery (DR) plan is not a task to be taken lightly. This white paper functions as a checklist that can guide you in creating a top-notch VMware disaster recovery plan.

Published in: Business, Technology


Drop the baggage and simplify your Disaster Recovery strategy

Sean Clark, VCP 2, 3, 4 and VMware vExpert 2009
Executive summary

Planning and implementing a VMware disaster recovery (DR) plan is not a task to be taken lightly. If you jump in without some minimum planning, you’ll end up with surprises that you, your boss or your budget won’t find agreeable. Worse yet, if you skip key steps and neglect to collect key information before designing, you can end up with an unwieldy DR solution that doesn’t meet the needs of the business.

This eBook will serve as a checklist that can guide you in creating a top-notch VMware disaster recovery plan. The checklist follows industry best practices for implementing complex technology solutions by taking a phased approach to instituting a DR plan. This approach includes the following phases:

 • Assessment: Gather key requirements for the DR solution
 • Design: Create a DR plan to meet business and technical requirements
 • Deploy: Stand up the necessary infrastructure. Install, configure and test the solution
 • Manage: Test your DR plan as frequently as possible

This approach should be reapplied as your business requirements change or to take advantage of technology advancements that can reduce costs and enhance DR capabilities. The result is a DR plan that is flexible enough to adapt with the times and your business. Since we’re talking about recovery of VMware environments, we will focus on leveraging their unique capabilities to the maximum degree. Those capabilities allow us to create the ultimate test-driven DR plan and can be a catalyst for moving to 100% virtualized environments. The focus of this eBook is the planning side of DR rather than the execution and operation of the DR plan. As such, we’ll focus on the Assess and Design phases.

Assess
 • Business impact analysis
 • Determine RPO (recovery point objective) and RTO (recovery time objective)
 • Understand your budget
 • Understand application dependencies
 • Automate VMware environment data collection

Design
 • Virtualize stragglers
 • Analyze resource requirements
 • Design for easiest restore
 • Decide on infrastructure configuration
 • Test-drive DR plan
Business Impact Analysis

Let’s be frank. As cool as server virtualization technology is, it doesn’t provide your business with revenue, unless you are a company like VMware or Veeam. As IT professionals, your job is to design, develop and manage the technology systems that support the business’s ability to turn a profit. Equally important is the ability to restore these systems as rapidly as possible in a disaster. Before you begin spending money on disaster recovery software, hardware and facilities, you need to know where to start. You should first assess your business and gain an intimate understanding of which processes, systems and data are most critical to the success and future survival of your company. This process is referred to as a business impact analysis (BIA). Without one, you open yourself to the risk of wasting resources or overprotecting assets of little value to your business. Worse yet, without a BIA, you may end up neglecting to plan recovery for key IT systems that your mission-critical systems depend on.

What’s in a BIA?

A BIA can be as complex and time consuming as you want it to be. Complexity and time are also a direct function of the size and complexity of your business. Either way, all business impact assessments aim to accomplish the same fundamental tasks. For example, the following list was gathered from a great free BIA template published by the U.S. Centers for Disease Control and Prevention to help guide the development of a DR plan. The template addresses the following main areas of assessment:

Key BIA activities
 1. Identify critical business systems
 2. Identify system resource dependencies
 3. Identify key support personnel or teams
 4. Estimate disruption impact
 5. Determine resource recovery priority

Having this granular knowledge of the business impact of your critical systems will not only help you in a real disaster scenario, but also help you test your preparation for the disaster. Knowing where to focus your DR planning efforts will help you greatly streamline and prioritize your disaster recovery exercises, or tests.

The BIA is not just a technical system inventory; you’ll need to work with the business side to define RPO and RTO for key business services and the IT services they depend on. Setting these objectives gives DR planners and managers something to design and manage to. Without defining these key requirements for your disaster recovery planning, you risk spending too much on a higher level of data protection than is required. The other risk is that you don’t provide enough protection for your critical resources and end up costing the business time and money in the event of a disaster. In the following section, we’ll walk through why RPO and RTO planning matters.
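As an illustration of how the five BIA activities above might be recorded, here is a minimal Python sketch. The `BusinessSystem` fields, the sample systems and their RPO/RTO and impact figures are all hypothetical, and the ordering rule (tightest RTO first, then highest hourly impact, with each dependency restored before the system that needs it) is an assumption made for the example, not a rule taken from the CDC template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BusinessSystem:
    """One row of a hypothetical BIA catalog."""
    name: str
    rpo_hours: float        # maximum tolerable data loss
    rto_hours: float        # maximum tolerable downtime
    impact_per_hour: float  # estimated cost of disruption (any currency)
    support_team: str = ""
    depends_on: List[str] = field(default_factory=list)


def recovery_priority(systems):
    """Rank systems: tightest RTO first, ties broken by higher hourly impact."""
    return sorted(systems, key=lambda s: (s.rto_hours, -s.impact_per_hour))


def restore_order(systems):
    """Walk systems in priority order, restoring each dependency first.

    Assumes every dependency is itself in the catalog and that there are
    no circular dependencies.
    """
    by_name = {s.name: s for s in systems}
    ordered, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in by_name[name].depends_on:
            visit(dep)
        ordered.append(name)

    for system in recovery_priority(systems):
        visit(system.name)
    return ordered


# Illustrative catalog; every value here is made up.
catalog = [
    BusinessSystem("file-server", 24, 48, 200, "infrastructure", ["active-directory"]),
    BusinessSystem("order-entry", 0.25, 1, 5000, "applications", ["sql-db", "active-directory"]),
    BusinessSystem("sql-db", 0.25, 1, 4000, "dba", ["active-directory"]),
    BusinessSystem("active-directory", 24, 4, 1000, "infrastructure"),
]

priority_names = [s.name for s in recovery_priority(catalog)]
order = restore_order(catalog)
print("Priority:", priority_names)
print("Restore order:", order)
```

Note that the business priority and the restore order differ: the directory service ranks third by RTO, yet it has to come back first because the higher-priority systems depend on it.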
Determine RPO and RTO

During your BIA, you should spend a good deal of time developing an intimate understanding of how your business makes money and catalog the key resources and processes necessary to enable revenue creation. In some companies, this may be a single business process that translates to clear guidance on RTO/RPO requirements. In larger companies with diversified products and services, you will likely catalog multiple use cases to address with different DR strategies. Regardless of the business size, the process of DR planning is similar, and we apply the same process to categorize the use cases we discover in our DR assessments. The following are some example DR use cases. By reviewing these common use cases and identifying their RPO/RTO, you should be able to apply similar logic to your own business and gather these guiding requirements.

Educational Institution: Keeping costs low

On one end of the spectrum would be a grade school with a fixed budget based on enrollment and no revenue generation based on system uptime. However, the core product and service coming out of the school is the quality of the children’s education. In an environment like this, it’s important to keep costs as low as possible while still providing a system that reduces the cost of recovery, which will likely call for third-party IT staff to help recover systems. Most systems can withstand some loss of data, and recovery needs to be timely, but certainly not instant. Since schools can still operate effectively without IT, it is likely that an RTO measured in days to a week will result from this assessment. The RPO would likely be around 24 to 48 hours for most systems.

Accounting Firm: CDP during tax season

Throughout most of the year, accounting firms do a certain amount of business and are tolerant of a 24 to 48 hour RPO and RTO. But in the United States, February through April is the busy tax season. Overtime and packed daytime schedules are the reality. Data loss and system downtime are NOT an option. Near-zero RTOs and RPOs are required for this use case. For small and midsized firms, expensive active-active accounting system designs are not an option, and neither is expensive array-level replication. This is a perfect use case for Veeam’s SmartCDP. SmartCDP is a near-continuous data protection (near-CDP) solution in which virtual machines (VMs) are continuously replicated to a safe site. This brings RPO down to around 5 minutes, and RTO can be as fast as the accounting firm’s IT professionals and executives are ready to declare a disaster and restart the affected VMs.

Healthcare: Ensuring quality patient care

In a healthcare setting there exists the perfect storm for DR planning. Today’s healthcare IT systems can actually be a profit center for many organizations. These systems need to be functioning to allow profitable procedures like MRIs, ultrasounds or any of the cardiac-related procedures. The profit side of the equation is important, but it is not the only concern. In healthcare, patient care is king. If technology is not available to serve patient needs, then the profits are meaningless. Healthcare operates around the clock and pressures IT to deliver services at a very high level. These highly critical systems demand a low RTO to guarantee that patient care goals are met and that healthcare organizations can remain profitable through systems disasters. Data loss isn’t necessarily a financial risk,
unless lawsuits are considered, but a high quality of patient care dictates that the best patient health information be available to ensure continuously good care through shift changes and doctor rotations. So, in most healthcare organizations, budget becomes the determining factor in what you’ll end up choosing for RTO and RPO. While working within your fixed budget, you may establish RTOs and RPOs that are as low as possible during the times you require. Veeam Backup & Replication, with its ability to back up, replicate and provide near-CDP in one product, provides the flexibility to address all healthcare needs.

Understand Your Budget

Virtualized DR is much less expensive than traditional physical DR, but it is still an additional cost on top of your already sizable investment in virtualization software, supported server hardware and new storage systems. Some companies will have the budget freedom to design the “Cadillac” of DR solutions. The rest of us have to be very aware of our budget limitations and the factors contributing to them. Knowing your budget limit and some key strategies will allow you to optimize your DR plan and get the most capability out of your limited resources.

Budget purchases wisely

One recommendation for DR budget planning is to consolidate all disaster recovery options inside a single product so that you’re not paying twice for two software products, two backup infrastructures and the associated operational costs that can sink you. Instead, plan to make a strategic investment in virtualized DR, go “all-in” with Veeam Backup & Replication and drop strategies that leverage legacy components. This can reduce ongoing license and support costs as well as simplify operations, creating substantial cost savings.

Phased Budget

Ideally, you should propose a phased DR assessment budget to the business or to your customer. In this method, you are allowed a fixed budget to conduct your assessment and initial planning, with the expectation that the outcome will be a more accurate budget estimate for the final solution. This method builds trust with the business or customer and assures them that you are not making these recommendations on a whim and that the DR plan reflects due diligence. But once the assessment is complete and a preliminary design has been produced, you’ll have a decision to make in determining how much you will ask for.

Communicate the Vision

Just as companies realized during their initial server virtualization efforts, you may have to spend money to save money with virtualized DR. Those first pioneering IT organizations realized that the sooner they virtualized their entire environment, the sooner they could start reaping the rewards of reduced power consumption, reduced server hardware cost and greater flexibility in operations. Delaying the ROI of virtualizing your physical servers was not recommended, and the same is true of a virtualized DR plan. The longer you have to maintain a legacy DR solution alongside a virtualized one, the more risk your business accepts, the more costly DR exercises are, and the more operational costs IT departments take on to maintain two or more separate DR solutions. At this point in the budget process, it’s important to lay out the long-term vision for 100%
virtualized DR and communicate the benefits of legacy-free disaster recovery. If business leaders understand the true value of virtualized DR, you should have success funding the project properly to realize the full benefits. A good rule of thumb is to shoot for the plan that can create the lowest TCO (total cost of ownership) over the next 3 to 5 years.

Executive Champion

Virtualization can be a complex topic for business leaders to understand. Throwing DR planning on top of that can sometimes put non-technical leadership into a corner they are uncomfortable with. To make sure you get the DR plan your company needs, you’ll need to understand the concerns and realities of the business and be able to clearly communicate the solution’s benefits to the company. If you are dreading these conversations, you should consider identifying an executive champion to get involved. This champion is usually an executive familiar with IT but accustomed to planning with other executives and speaking at a technical level they can understand. For companies without formal technology executive positions like a chief information officer (CIO), an executive champion dedicated to the DR planning project can be a critical component.

Understand Application Dependencies

It’s crucial to understand the dependencies of core applications when creating your DR plan. Does an application depend on an external database on another VM? What restore order is needed to test the application? And even today, we still need to be concerned about which servers are still physical. These dependencies are critical to catalog and plan for. No matter how well you protect the critical VMs, if you forget the weakest link in the dependency chain, you might as well not have protected any systems.

Active Directory, DNS and DHCP

Everyone has some critical applications that are core to the organization’s success, and many people look to protect these applications first. This is a mistake that can be avoided by continuing to follow the chain of dependencies down to the lowest level. Assuming you have basic physical facilities and network access accounted for in the DR plan, you need to make sure that core infrastructure services are next on the plan. Like buying a car without wheels, restoring your critical applications without these core infrastructure services will have you going nowhere fast.

Core Infrastructure Services Needed for DR
 • IP address assignment (DHCP, etc.): Local to each site and required for any communication on the network
 • DNS: Ensures that servers and PCs understand how to reach each other as well as Internet resources
 • Active Directory: Provides directory service to secure access to recovered systems

These services can be provided in a number of ways, but in the majority of small and mid-sized companies, we are talking about Microsoft Windows Server VMs running Active Directory. Before Active Directory servers providing
DHCP and DNS can be recovered, you need to ensure you have a strategy in place to guarantee their recovery. If Active Directory servers are restored incorrectly, you will waste precious hours conducting manual recovery steps to restore function to this critical service. Ensuring a successful, trouble-free restore actually starts with a VM backup and replication tool that fully supports VSS (Volume Shadow Copy Service) for both backup and restore. Veeam Backup & Replication has provided this functionality since its first release. This VSS-aware technology is the key layer of defense for situations in which you are unable to replicate Active Directory to your disaster recovery facility. Example scenarios include single-site businesses without the budget for a DR site, or businesses that choose to maintain DR contracts only with cloud providers and don’t want to maintain active VMs that incur monthly fees.

Mission Critical Applications

Mission critical applications are the lifeblood of the company and consist of stateless application server VMs as well as VMs containing persistent data like databases or file system objects. With traditional legacy backups, documenting and staying current on all the specific files and folders inside a VM that need to be backed up can be a terrible chore, especially when application upgrades may create new files and folders not included in the original backup configuration. Focusing on entire VMs for a virtualized DR plan shifts the focus away from micro-managing data protection and toward identifying the higher-level service components that need to be protected and restored.

Similar to the Active Directory backups mentioned earlier, if your mission critical database is running SQL Server, you need to leverage VSS-aware products to ensure application consistency is maintained during backup and replication. Without this key technology, you will be left with crash-consistent databases, which will slow your recovery time after a disaster or during disaster exercises. Microsoft SQL Server provides what’s called a VSS writer, which is one of the key pieces that Veeam can leverage with its own advanced VSS requestor technology to ensure application-consistent backups. However, if you are not running a database or OS that supports VSS, you will have to take other precautions. In the non-Windows world, there are various ways to ensure database and application consistency during backup and replication windows. You will need to identify these non-VSS-enabled databases and work with good database administrators (DBAs) who understand the application requirements so you can create a backup and replication method that can guarantee application consistency. This may require pre- and post-scripts to be developed, purchased or borrowed from the community or a vendor. These scripts can help guarantee recovery consistency by shutting down the database service and freezing I/O on the mission critical VM until Veeam can initiate the VM snapshot.

Stateless Application Servers

Not all servers contain persistent data, but all servers have a rebuild time. A rebuild can be as simple as an automated deployment or as large as an effort involving significant time and labor costs. These stateless application servers can be important to protect, but they will not be protected in the same way as busier servers with critical business data being created daily. These servers can be protected less frequently, with backups triggered only during critical periods of change. For instance, prior to application upgrades or major software changes, you could initiate a whole-VM backup and take small, quick incremental backups for a
period of time to ensure you have a known good state to restore to. Then, shortly after the software change has been determined to be successful, you would take another series of full and incremental backups to provide a known good state of the VM to restore to if disaster occurs after the VM change. Once you have proven you have good backups of these stateless VMs, the schedule can be relaxed to save bandwidth and I/O. As long as you continue to test restores of the entire application stack, along with its dependent data tier, this can be an option for certain DR requirements.

Protect Your Deployment Systems

Some businesses choose to invest heavily in their ability to quickly re-install application stacks on stateless VMs rather than protecting more data than is technically needed. This can be a great strategy for scale-out workloads like Terminal Server-based application servers, Java application servers or web servers. In these situations there is a large ROI (return on investment) from automating the provisioning of complex server configurations and using it often for upgrades, server refreshes or scale-out operations. This doesn’t mean you’re off the hook, though. You’ll need to follow the dependencies and identify the systems responsible for re-deploying these stateless VMs to make sure that they are protected and available in the DR site. This exercise will raise some interesting questions about what should and should not be re-deployed as part of a disaster, ultimately calling into question the value of end-to-end provisioning systems. In short, end-to-end provisioning strategies should also include re-deploying tested whole-VM images as well as whole application stacks. Your ultimate decision will just come down to numbers: how many VMs are based on a common configuration and how quickly they need to be restored in the event of a disaster.

Automate VMware Environment Data Collection

VM inventory automation can help quickly and accurately collect information on the virtual environment that’s invaluable to designing your DR plan. Whether you have 25 VMs or 2,500, automating the collection of VM information can jumpstart you on the way to accurately designing your DR plan. There are a multitude of tools available to assist with this data collection task. Since we’re planning for virtualized DR, I’m a big fan of leveraging virtualization management software from Veeam to help ease this task.

Veeam Reporter

Veeam Reporter is a great tool that gives you insight into your environment very quickly. There is both a free version and a full version of Veeam Reporter. Both can be invaluable in quickly collecting inventory information from existing VMware environments. This information helps document your primary virtual datacenter and can be a guide for setting up the disaster recovery site. Output from this tool includes spreadsheets, Word documents and even Visio diagrams of your VMs, ESX(i) hosts, VMware datastores and networks. Veeam Reporter can grab a lot of data in just a few minutes.
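Once an inventory has been exported as a spreadsheet, a few lines of scripting can roll it up into the totals a DR design needs. The Python sketch below assumes a CSV export with hypothetical column names (`vm_name`, `host`, `datastore`, `vcpus`, `memory_gb`, `provisioned_gb`); these are illustrative only and are not the actual Veeam Reporter output format, so adjust them to whatever your tool produces.

```python
import csv
import io
from collections import defaultdict

# Hypothetical inventory export; column names and values are made up.
SAMPLE_EXPORT = """vm_name,host,datastore,vcpus,memory_gb,provisioned_gb
dc01,esx01,ds-gold,2,4,60
sql01,esx01,ds-gold,4,16,500
web01,esx02,ds-silver,2,4,40
"""


def summarize_inventory(csv_text):
    """Total up VM count, vCPUs, memory and provisioned disk from a CSV export."""
    totals = {"vms": 0, "vcpus": 0, "memory_gb": 0.0, "provisioned_gb": 0.0}
    per_datastore_gb = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals["vms"] += 1
        totals["vcpus"] += int(row["vcpus"])
        totals["memory_gb"] += float(row["memory_gb"])
        totals["provisioned_gb"] += float(row["provisioned_gb"])
        per_datastore_gb[row["datastore"]] += float(row["provisioned_gb"])
    return totals, dict(per_datastore_gb)


totals, per_datastore_gb = summarize_inventory(SAMPLE_EXPORT)
print(totals)
print(per_datastore_gb)
```

For a real export, read the file with `open(path, newline="")` and pass its contents to `summarize_inventory`.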
Veeam Monitor

Whether you are looking to build out the bare minimum DR infrastructure or trying to determine at what point your DR solution is just getting gaudy, you’ll need good statistics on current resource utilization to properly size your DR infrastructure. Veeam Monitor can be used to assist you with collecting this resource utilization data. Again, there is a free version, but the full version is the way to go if you would like to continue charting virtual infrastructure utilization after the planning phase. When measuring virtual infrastructure utilization, we are trying to capture a few key areas that will determine the ultimate cost of the solution. CPU and memory utilization are important for sizing the servers required for the recovery site. Here we’ll focus on average CPU GHz used and GB of memory consumed.

Storage Consumption with Veeam Backup & Replication

Storage requirements can be complex to discover. Although Veeam Monitor can report on how much storage you are using today, it is not designed around a virtualized DR plan. This means it doesn’t give you an easy way to estimate your storage requirements for backup and replication of whole VMs. To accurately assess storage needs, you should consider using Veeam Backup & Replication and conducting an actual proof-of-concept (POC) of the software. This will not only show you how the product works in your environment but also give you real-world values for backup storage needs. Anyone can determine the cost of storing full backups, since it is just a multiple of the original disk size, but incremental replication passes can be tricky. By running Veeam Backup & Replication for a few days on actual workloads, you’ll accurately record real-world values for the daily change rate as well as gain the statistics required to size your DR storage systems. This daily change rate of the VM data will be critical in determining future replication bandwidth requirements as well. So, with a simple POC of your possible DR solution, you’ll be able to gather the statistics needed to properly size two of your most expensive DR resources: storage and network bandwidth.
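To make the arithmetic concrete, here is a rough Python sketch of how a measured daily change rate could feed both estimates. The function names and all input figures are hypothetical, and the model is deliberately simple: it ignores compression, deduplication and protocol overhead, all of which a real POC would capture, and it uses decimal units (1 GB = 8,000 megabits).

```python
def backup_storage_gb(total_vm_gb, full_copies, daily_change_rate, retained_incrementals):
    """Estimate backup storage: full copies plus retained daily incrementals.

    daily_change_rate is a fraction, e.g. 0.05 for 5% of VM data changing per day.
    """
    fulls = total_vm_gb * full_copies
    incrementals = total_vm_gb * daily_change_rate * retained_incrementals
    return fulls + incrementals


def replication_mbps(total_vm_gb, daily_change_rate, window_hours):
    """Average bandwidth (megabits/s) needed to move one day's changes in the window."""
    changed_gb = total_vm_gb * daily_change_rate
    megabits = changed_gb * 8 * 1000  # GB to megabits, decimal units
    return megabits / (window_hours * 3600)


# Illustrative inputs only: 2 TB of protected VMs, 5% daily change,
# 2 full copies, 14 retained incrementals, 8-hour replication window.
storage_needed = backup_storage_gb(2000, 2, 0.05, 14)
bandwidth_needed = replication_mbps(2000, 0.05, 8)
print(f"Backup storage: {storage_needed:.0f} GB")
print(f"Replication bandwidth: {bandwidth_needed:.1f} Mbps")
```

With these made-up inputs the estimate comes to roughly 5,400 GB of backup storage and about 28 Mbps of sustained bandwidth; the value of the POC is replacing the guessed 5% with a measured figure.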
Virtualize the Stragglers

If you still have physical servers, it’s time to make the switch. The benefits of virtualized DR are well known and have been written about and practiced for more than 5 years. No matter how good your DR solution for physical servers is, it can’t come close to approaching the capabilities and cost of virtualized DR. But many workloads have avoided migrating to virtual for one reason or another, which throws a wrench into the DR planning machinery.

VMware Is the Best DR Platform for Business Critical Workloads

You probably have a large portion of your environment virtualized with VMware, but it’s likely that you have some business critical systems that remain on physical systems because of their importance. Unfortunately, placing your business critical systems on physical servers because they’re important is a mistake. If they are that critical to your business, don’t you think that providing the ultimate DR solution for them should be your organization’s starting point? These critical systems should be virtualized to experience the full benefits of virtualized DR, but not every application owner and DBA fully understands our virtualized DR zealotry. You’ll need to communicate the benefits inherent in today’s virtualized DR to these application owners to win them over.

Virtualized DR Benefits
 • Snapshots: Be able to test critical patches and roll back if failure occurs
 • Restore an entire server image (operating system [OS], application, data) to any hardware without messing with drivers
 • Refresh hardware with zero downtime
 • Restores can be automated and tested often
 • Files and other application items can be restored from VM backup images
 • VSS integration can ensure application consistency

VM Performance Is No Longer a Barrier

Stubborn DBAs and critical-application owners may love to take you up on DR capabilities alone, but they still have one trump card in their hand: VM performance concerns. Five years ago, they might have been justified in playing that trump card. But with advances in virtualization software and hardware purpose-built for virtualization, these concerns are no longer warranted.

 • VMs can now scale up to 8 processors with 255 GB of RAM
 • VMs are capable of providing north of 300,000 IOPS when configured with a powerful enough backend storage system
 • 10 Gb networking erases concerns of VMs being starved of bandwidth by other VMs
 • The next vSphere version, due out in 2011, will likely increase these limits further

Yes, VMs can scale larger than ever before, but without proper VMware capacity planning practices, the benefits are negated. Planners still need to ensure they
have the right consolidation ratios given the virtualized workloads and underlying hardware resources. Having the tools to properly plan these consolidation ratios, monitor utilization and give application owners relevant virtual hardware performance statistics is critical. Tools like Veeam Monitor can provide the performance view at the hypervisor level and provide OS-level statistics in a single view. Having the right plan and the right visibility into performance will help gain trust and drive more use of virtualization where you need it most.

The Singular Option: 1 VM Per Server

All the management tools and virtualization know-how may still not be enough to gain permission to virtualize critical servers. Many times these mission critical workloads have seen duty on non-x86 server platforms in the past, such as an AS/400 or a proprietary RISC-based platform. These options were not cheap and were many times more expensive than a physical x86 server, so their owners are not easily swayed by server consolidation benefits. With today’s powerful virtualization-enabled hardware and VMware software to provide the DR platform, you have a singular option available to meet the performance and DR requirements of the critical workload. That means dedicating a single host server to a critical VM. Who says that just because you are running VMware on your servers you have to maintain hero consolidation ratios? Putting performance (and egos) first and not being afraid to consider limited use of 1:1 consolidation ratios will help you win over the stodgiest application owner. Over time, you may be able to win their trust to allow higher consolidation ratios as long as performance requirements are met. The singular option offers a little of both, and when compared to the lack of DR features for physical servers or the cost of the proprietary past, it can make good sense.

On-Demand Sandbox Testing

Once virtualized, another unique benefit you can offer owners of business critical applications is the ability to troubleshoot on mirror-image systems in real time without disrupting the production server image. How is this possible? Veeam Backup & Replication offers On-Demand Sandbox functionality. This sandbox functionality allows you to boot VMs directly from an NFS server presented by the Veeam backup host, without requiring a time-consuming restore or provisioning extra storage to the vSphere environment. This capability allows problems to be resolved quickly, since administrators will have exact copies of production systems on which to test, restart services or try fixes without fear of disrupting operations on the live systems. This enables the system administrator’s version of the Hippocratic Oath, so administrators “do no harm” to production systems in the process of fixing them. This is no replacement for good system deployment capabilities, but there are plenty of instances where, due to the number of systems deployed, it doesn’t make sense to spend time automating the deployment. In this case, having the entire VM backed up for redeployment is a big benefit for developers.

Analyze Resource Requirements

It should go without saying, but DR planning is about preparing for the loss of your primary datacenter. This means you need to size and budget for a recovery site capable of meeting your requirements in a disaster scenario. During the assessment phase of our DR planning, automating the collection of some
  12. 12. VMware Disaster Recovery Planning: Essential Checklistkey virtual infrastructure inventory as well as resource utilization statistics isrecommended. It’s now time to use that information to properly size your DRsolution.ComputeIts best to start with compute statistics, that is, how much CPU and memory areutilized. If youre planning to restore every single server and maintain identicalcapacity, this exercise is easy and youll duplicate your production environmentat the DR facility. In more budget-minded organizations, youre going to analyzethe statistics of production workloads and identify only the critical workloads thatneed to be running in order to establish the estimated DR infrastructure requiredin a disaster. These utilization statistics translate into CPU sockets required.StorageWhen analyzing the storage requirements, we want to make sure that wehave enough raw storage space to store what is necessary. But we also wantto know what kind of performance characteristics are required to drive yourprimary workloads. For storage space requirements you will start with the totalgigabytes of all the VMs that you plan to recover to the DR site. Add to that,the amount of full backups or replicas youll want to keep and the amount ofdaily incremental backups. Generally speaking, DR storage is based on the VMsconfigured memory and storage allocation, so calculations could be derivedfrom the statistics gathered with Veeam Reporter.In addition to the raw space, its important to understand the performancerequired of your storage systems. This is usually measured as IOPS and storagebandwidth. These two statistics describe how active your VM storage is, andwhether you can get by with SATA drives, SAS drives, or whether youd be a goodcandidate for an auto-tiered storage system with enterprise flash or SSDs for tier0, SAS for tier 1 and SATA for tier 2. 
Many people make the mistake of buying large-capacity SATA for DR because they can save on storage purchase costs. However, when it comes time to rely on that storage in a disaster, the availability of their systems is in jeopardy due to performance. It’s understandable to want to save money on your DR, but for an investment of this size, you need to make sure you’re not shooting yourself in the foot by taking on too much risk. Analyze the statistics and budget accordingly.

Network

We talked about calculating the daily change rate of data earlier. If you decide to use replication over a secured VPN connection over the Internet or a leased WAN circuit, you’ll want to know how much data will need to be moved across the network in a single day or replication window. This will help you forecast how much Internet bandwidth you need and whether you need to upgrade your WAN circuits. If your options for Internet bandwidth are limited, this analysis will be crucial to understanding whether you are a good candidate for replication or whether whole-VM backups to tape-backed disk archives are a better option for you.

WAN Acceleration Need

In analyzing your network connectivity, it’s critical that you understand your bandwidth, the latency between your primary site and DR site, and the reliability
of the connection. If your connections suffer from high latency and packet loss, you may not be able to meet your backup windows and, consequently, you will suffer worse RPOs. Products like HyperIP from Netex offer WAN acceleration technology that is purpose-built for accelerating large data transfers over network links with packet loss and high latency. If you are a Veeam customer, they even offer a 1-year free trial version of HyperIP to allow you to thoroughly kick the tires before purchasing.

Design for Easiest Restore

It is imperative that a good DR plan be designed for the easiest restore possible. In a disaster situation, there can be a lot of confusion, different environments, and possibly a different or missing workforce. A disaster is not the time to have many complex, manual steps to follow while under the stress of knowing your business’s ability to pay your next paycheck may hang in the balance. Simplicity is king in a disaster situation. You can’t count on having your best DR expert available to coordinate the recovery. You should plan for other personnel to be coordinating the recovery while you or your DR expert is stuck on a Caribbean island with the cell phone turned off. Being successful in this situation will require your skeleton DR staff to have plenty of DR exercises under their belt and have the simplest restore procedures possible.

Don’t Reinvent the Wheel, Restore the Whole VM

In a VMware DR plan, you want to utilize the most powerful features of VMs: encapsulation and hardware independence. VMs are just a group of files and can be copied to other ESX(i) servers (regardless of hardware vendor) and can be powered on and returned to service, sometimes without any further modification. For this reason, whole-VM protection methods are superior to legacy methods that rely on multiple manual steps to create a workload from start to finish. From the comparison chart below it’s pretty easy to see why whole-VM recovery is preferred.

VM Recovery Steps Comparison

Legacy Recovery Method
  1. Provision empty VM
  2. Reinstall a fresh system at the DR site
  3. Patch the OS
  4. Install application binaries and other dependencies
  5. Install backup agent and then proceed with restore of unverified application data
  6. Configure application to work with recovered data
  7. VM is ready to be restarted and application verified for first time

Whole-VM Recovery with Veeam
  1. Restore whole VM
  2. Power on VM that was pre-verified with SureBackup
  Note: Replication would be a single step to power on the VM.
Using legacy protection measures on virtualized workloads introduces complexity, cost and risk into your DR plan that your company can’t afford. By restoring the whole VM you drastically cut the number of steps required and you open the door to being able to test and verify the recovery of the VM prior to needing it.

Start with Replication

There are two general methods to accomplish whole-VM recovery: VM backup and VM replication. The first requires you to back up the entire VM to some external storage media or a replicated file system. Then, when recovery is required, you restore the VM to the recovery site ESX(i) servers through a simple file copy. Veeam Backup can make this process slightly quicker in that it offers instant-on features allowing you to present an NFS export to the ESX(i) server and boot a VM directly from its backup file. Eventually you would need to perform a Storage vMotion of the VM to primary storage or do a full cold restore of the VM from a backup media/file system to the recovery VMware environment.

The other recovery method is to replicate the entire VM to the recovery site ESX(i) servers so that the replica VM is already pre-staged and ready to boot. To recover the VM, you power it on. As simple as VM backup is, one-step recovery with VM replication is very hard to beat, and it is almost impossible to provide a better RTO unless you call active-active geo clustering a “recovery technique.” When designing for easiest restore, VM replication has to be at the top of your list of tools to consider.

Decide on Infrastructure Configuration

After all the interviews, data collection and analysis, you’ll eventually have to chart a direction and make some decisions.
You’ll have to decide on a final configuration for hardware, software, off-site data transport method, and recovery processes. There is no one right way to do this, but armed with business requirements for recovery, application dependencies, and your budget guidelines, you should have enough information to start to make some decisions.

Servers

Any x86 server hardware will do here, but the question is more about what kind of capacity you require in a disaster and what the most cost-effective way to meet that need at the DR site is. Those 5-year-old 2U rackmount servers with 4 total processor cores and 16GB of RAM might do okay in a pinch for a small portion of your DR environment, but only if you don’t recover the whole environment. Although those servers might be free, they don’t look so good when you’re paying for rack space and power for dozens of servers at a co-location facility. It may be cost advantageous to purchase new servers that have 10 times the capacity, which can reduce DR licensing costs for Veeam and VMware while cutting your physical space and power requirements by a factor of 10. Whatever your decision, make sure you provide for adequate capacity based on real-world measurements from your production VMware environment and guided by the Business Impact Analysis (BIA).

Storage

If you need vMotion and High Availability at the DR site, you’ll need to invest in shared storage to go with the VMware ESX(i) servers that you’ll be replicating to or restoring VM backups to. Choosing NAS, iSCSI and Fibre Channel are all good
decisions, and most are valid options. If you are a small business or a small remote office, shared storage for VMware may not always be possible and you may be required to use local storage contained within the recovery servers. Although these setups aren’t as efficient to manage in a production environment, they can be good enough in a disaster to allow your business to provide revolutionary DR capability at a bargain price. In configurations with locally attached storage, you’ll be happy to know that Veeam Backup & Replication can support that option as well, since it can write to any VMware datastore visible to the ESX(i) server.

Network

We talked earlier about network considerations. Basically, there is enough network bandwidth or there is not. There either is high latency or there is not. If you have the budget, make the investment in high-bandwidth links between your recovery site and your primary datacenter. This can allow the most reliability and lowest operational cost for your backups and replication, since no error-prone manual or physical methods are required to move data to the recovery site.

Whether it is due to budget-related or geography-related limitations, not everyone has the network bandwidth available to replicate critical assets. That’s why the old adage remains true: “Never underestimate the bandwidth of a van full of tapes driving down the highway.” Your network realities may dictate a whole-VM backup to disk or tape that is then trucked off-site for safekeeping or for test restoration at the DR site. In these situations you will be sacrificing the ultimate RTO from the start, so it’s not as important to have all servers racked, stacked, powered and ready to go.
You might even consider alternate means for provisioning server resources in these scenarios.

DR in the Cloud?

In the case where replication is not an option and you have VM backup images that can be restored to any ESX(i) servers in the world, why not restore to the cloud? There are countless VMware hosting providers available today that can rent you resource pools or whole VMware environments. Rather than investing in expensive, duplicate datacenter locations that will only be used in the unlikely event that an actual disaster occurs, you can instead bank that money and only pay a small portion in the event a disaster happens or in the event you’d like to test your recovery. If you do decide to move forward with restoring to a VMware hosting provider, you may need to do some advance planning on the contract side to help speed your recovery if needed. Although we’re getting closer to the dream world that allows you to whip out the company credit card and spin up a DR site in minutes, it’s more likely that you’ll want to sign a contract in advance in order to get some guarantee that the resources you’ll require will be available should you declare a disaster. Of course this insurance will cost you, but it will be much less than if you purchased the resources full time or if you stood up your own DR site.

Test-Driven DR Plan

In the software development world, a popular software development process is test-driven development (TDD). In TDD, developers first create automated unit tests that will only pass successfully if the new piece of code under development fulfills all criteria. By writing the test first and then developing the code, quality is
improved since code can’t be released until the unit tests pass successfully. This process has proven very successful for software developers, and a derivation of the process can now be applied to virtualized DR. This derivation can be called test-driven DR and it turns traditional DR planning on its head by planning for the restore and verification of the restore before you plan to do your first backup.

Design with Testing in Mind

It takes more than petabytes of shiny deduplicated, compressed backups to save the day in a disaster situation. If you can’t restore successfully and quickly from those fancy backups, then what was the point? With today’s VMware virtualized systems, there is no longer an excuse to not test your recovery as often as possible. If replica VMs are ready to boot in a test recovery environment, you are just a PowerCLI script or web service call away from creating custom workflows to orchestrate the recovery of your protected VMs. By building upon what you learned with application dependencies and target RTOs, you can create custom workflows that will bring up your protected VMs in the order required to test your ability to restore. And since we’re talking about virtualized environments, you can easily adjust the VMware networking with a simple web service call to ensure that recovered VMs are placed in an isolated network that you prepared in advance. Your functionality is only limited by your time and scripting ability.

However, not everyone is a programmer or has the time to create these workflows and verification scripts. Veeam SureBackup is the solution to help programmers and non-programmers create a foundation for their test-driven DR plan.

SureBackup Recovery Verification Is Test-Driven DR

With the SureBackup functionality in Veeam Backup & Replication, you can verify the recoverability of every VM backup every time.
This recovery verification of individual VMs is the essence of test-driven DR planning, since you are designing tests that will fail until you properly design and execute your automated backup and restore system. As in TDD, if a SureBackup verification test fails, you will reconfigure your recovery or backup until the verification passes all tests. Since it’s all virtual and able to automatically run within an isolated network, you can run these DR tests repeatedly until you find the hidden dependency, incorrect IP addressing, or out-of-order recovery step. This cycle of DR plan refinement can verify your entire DR plan within a few short days for small DR plans, or within weeks for more complex DR plans.

Test-Driven DR Enables Continuous Testing

With traditional DR exercises leveraging legacy DR solutions, businesses are lucky if they have the time and budget to test their DR plan annually, let alone resolve all outstanding issues with the recovery. The technology is available to make test-driven DR planning a continuous process for your virtualized environment. You should work toward a goal of near-daily automated recovery verifications to bulletproof your DR plan. Limiting yourself to annual or quarterly DR tests is a relic of DR planning past that no longer applies. When creating your VMware DR plan, be sure to not only eliminate legacy technologies of the past, but to scrap the legacy processes as well. Freeing your mind of yesterday’s DR baggage will help you embrace the possibilities of virtualized DR and fully experience its benefits for your business.
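
For readers who do want to script their own test workflows, the "bring up VMs in the order required" idea from this section can be sketched as a dependency sort. The example below is a minimal illustration in Python (3.9 or later, for the standard-library graphlib module). The VM names and the dependency map are hypothetical, and the sketch only computes the order; the actual power-on in an isolated network would still be done with PowerCLI, a web service call, or a product such as SureBackup.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each VM maps to the VMs that must be running first.
DEPENDENCIES = {
    "dns01": set(),
    "ad01": {"dns01"},
    "sql01": {"ad01"},
    "file01": {"ad01"},
    "app01": {"ad01", "sql01"},
    "web01": {"app01"},
}


def recovery_waves(dependencies):
    """Group VMs into waves; each wave depends only on earlier waves.

    VMs in the same wave have no dependencies on each other, so they could
    be powered on in parallel during a test recovery.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(recovery_waves(DEPENDENCIES), start=1):
        print(f"Wave {number}: {', '.join(wave)}")
```

A circular dependency in the map raises an error before any VM is touched, which is the kind of hidden problem a test-driven DR plan is meant to surface early.
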
About the Author

Sean Clark, VMware vExpert

Sean is a ten-year IT veteran with a background in software development, database administration, security coordination, and IT management. For the last five years, he has focused on developing his expertise in VMware virtualization and surrounding technologies. He has kept current VMware Certified Professional (VCP) status on VI 2.5, VI 3.5 and vSphere 4. In 2009 Sean was awarded VMware vExpert status, one of 300 globally to receive the award recognizing their contribution to the virtualization community. Since then, Sean has been an active member of the virtualization community as a notorious Twitter contributor with the handle of @vSeanClark, as co-instigator of the popular vmunderground.com community party at VMworld, and as a random blogger at http://seanclark.us. He has provided guidance on virtualization strategy to businesses of all sizes and from all industries, and is currently a virtualization consultant with TEKsystems working on a long-term cloud computing project for a Fortune 500 company.

About Veeam Software

Veeam Software, an Elite VMware Technology Alliance Partner, develops innovative software to manage VMware vSphere®. Veeam vPower™ provides advanced Virtualization-Powered Data Protection™ and is the underlying technology in Veeam Backup & Replication™, the #1 virtualization backup solution. Veeam nworks extends enterprise monitoring to VMware and includes the nworks Management Pack™ for VMware management in Microsoft System Center and the nworks Smart Plug-in™ for VMware management in HP Operations Manager.
Veeam ONE™ provides a single solution to optimize the performance, configuration and utilization of VMware environments and includes: Veeam Monitor™ for easy-to-deploy VMware monitoring; Veeam Reporter™ for VMware capacity planning, change management, and reporting and chargeback; and Veeam Business View™ for VMware business service management and categorization. Learn more about Veeam Software by visiting www.veeam.com.
VMware Disaster Recovery Planning: Essential Checklist

DR planning in a VMware environment requires old-fashioned DR planning fundamentals but also requires fully leveraging virtualization’s unique characteristics. This checklist of 10 proven DR planning activities provides you a jumpstart towards an award-winning VMware DR plan.

• Conduct a Business Impact Analysis (BIA). A BIA helps you identify critical business systems, their IT and human dependencies, and an estimated disruption impact to your business. You can then determine which applications are most important.

• Know your recovery point objective (RPO) and your recovery time objective (RTO). Develop an intimate understanding of how your business runs and catalog the key resources and processes necessary to enable revenue creation. Translating the findings from the BIA into RTO/RPO requirements for each application helps you focus your resources where they are needed most.

• Understand your budget. Virtualized DR is much less expensive than traditional physical DR, but it is still an additional cost on top of your already sizable investment in virtualization software, supported server hardware and new storage systems. Avoid the expense of maintaining both legacy and virtualization-aware backup systems by migrating to all-virtual DR.

• Understand application dependencies. Most applications have dependencies external to the virtual machine (VM) that they run on. In a disaster, it’s critical to have cataloged these dependencies because you will have to recover each one to restore end-to-end functioning for that application. Start at the base infrastructure services like DHCP, DNS and Active Directory. But don’t forget to account for file shares, databases or other non-virtualized servers recovered through legacy means.

• Automate VMware environment data collection. VMware inventory automation can quickly and accurately collect information on the virtual environment that’s invaluable to designing your DR plan. Tools from Veeam can help ease the task. Veeam Reporter can catalog the configuration of the VMware environment, even providing Visio diagrams to reference. Veeam Monitor can provide the performance statistics you need to size your DR infrastructure. Plus, a Veeam Backup & Replication proof of concept (POC) is a good way to learn what your daily data change rate is so you can appropriately size your network connections to the recovery location.

• Virtualize stragglers. If you still have physical servers, it’s time to make the switch. The benefits of virtualized DR are well known and have been written about and practiced for more than 5 years. No matter how good your DR solution for physical servers is, it can’t come close to the capabilities and efficiencies of virtualized DR. Virtualize your remaining physical servers to achieve the most benefit from your DR plan.

• Analyze resource requirements. The data collected in the assessment phase of your DR plan is invaluable in sizing the DR site and creating your DR budget for servers and storage. The largest limiting factor to the ideal DR plan for VMware is the bandwidth required to replicate all necessary VMs. Consider products, such as HyperIP from Netex, that offer WAN acceleration technology purpose-built for accelerating large data transfers over high-packet-loss and high-latency network links. This can allow for better use of available bandwidth without breaking your budget.

• Design for easiest restore. Simplicity is king in a DR situation. Rather than reinvent the wheel by reinstalling operating systems and applications and restoring individual files, you should restore the entire application as a VM to minimize restore time. Replicating the VM with Veeam Backup & Replication can provide the lowest RTO possible since VMs only need to be powered on to restore service.

• Decide on infrastructure configuration. In today’s cost-conscious IT environment, it’s good to know there are options for your recovery site configuration. Although you can choose to self-host DR options in your own facilities, you can also take advantage of VMware service providers that could provide your DR infrastructure as an on-demand cloud service.

• Test-drive your DR plan. Setting up a DR plan for VMware environments is not a one-time activity. You need to ensure that you test, test and test. Manual tests are good, but since you’re working with VMware technology, there’s no reason testing can’t be automated and run as often as daily if needed. Using Veeam Backup & Replication’s SureBackup automated backup verification feature is a great way to do this.
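
The checklist items on daily change rate and replication bandwidth lend themselves to a quick back-of-the-envelope calculation. The Python sketch below is one rough way to do it. The function name, the 70% link-efficiency default and the sample figures are assumptions made for illustration, not values from this eBook or from any Veeam or Netex product.

```python
def required_mbps(changed_gb_per_day, window_hours, efficiency=0.7):
    """Sustained link speed (megabits per second) needed to replicate
    one day's changed data within the replication window.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable after protocol overhead, latency and packet loss.
    """
    if changed_gb_per_day < 0:
        raise ValueError("changed_gb_per_day must not be negative")
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")

    megabits = changed_gb_per_day * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = window_hours * 3600
    return megabits / seconds / efficiency


if __name__ == "__main__":
    # Hypothetical site: 200 GB of daily change, 8-hour replication window.
    print(f"{required_mbps(200, 8):.1f} Mbit/s")
```

With these sample inputs the result is roughly 79 Mbit/s. If the figure exceeds what your circuits can deliver, that is a signal to look at WAN acceleration, a longer replication window, or shipping whole-VM backups off-site instead.
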