VMworld 2011 (BCO3276)


In this session we heard customer experiences facing some of the biggest DR challenges ever. We heard how Site Recovery Manager was used in Japan after the great earthquake disaster and in New Zealand after the earthquake at Christchurch. We also learned about a case in which Site Recovery Manager was used for site migration.

Published in: Technology, Business

  1. BCO3276: Disaster Recovery and Site Migration with Site Recovery Manager: Customer Experiences from Around the World
     • Gil Haberman, Product Marketing Manager, Business Continuity and Disaster Recovery, VMware, Inc.
     • Alan Baird, VMware, Inc.
     • Christopher Wells, TUV Rheinland Japan Ltd.
     • Paul Schlosser, VMware, Inc.
     • Robert Busillo, Independence Blue Cross
  2. Disclaimer
     This session may contain product features that are currently under development. This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product. Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new technologies or features discussed or presented have not been determined.
  3. Agenda
     • SRM and vSphere For Simple and Reliable DR
     • TÜV Rheinland, Japan
     • Mainfreight, New Zealand
     • Independence Blue Cross, USA
  4. SRM and vSphere For Simple and Reliable DR
  5. Disasters Happen. Do You Need Protection?
     • 43% of companies experiencing disasters never re-open, and 29% close within two years. (McGladrey and Pullen)
     • 93% of businesses that lost their data center for 10 days went bankrupt within one year. (National Archives & Records Administration)
     • 40% of all companies that experience a major disaster will go out of business if they cannot gain access to their data within 24 hours. (Gartner)
     • Top executives say 10 hours to recovery; IT managers say up to 30 hours. (Harris Interactive)
  6. vCenter Site Recovery Manager Ensures Simple, Reliable DR
     Site Recovery Manager complements vSphere to provide the simplest and most reliable disaster protection and site migration for all applications.
     Provide cost-efficient replication of applications to failover site:
     • Built-in vSphere Replication
     • Broad support for storage-based replication
     Simplify management of recovery and migration plans:
     • Replace manual runbooks with centralized recovery plans
     • From weeks to minutes to set up new plan
     Automate failover and migration processes for reliable recovery:
     • Enable frequent non-disruptive testing
     • Ensure fast, automated failover
     • Automate failback processes
     [Diagram: Site A (Primary) and Site B (Recovery), each with VMware vCenter Server, Site Recovery Manager, VMware vSphere, and servers]
  7. SRM Momentum
     • Introduced in Q2 2008
     • 125,000+ units sold
     • 5,000+ customers
     • 50% annual growth in 2010
     “If your organization is already taking advantage of virtualization, then adding Site Recovery Manager to handle disaster recovery is a no-brainer.” – Jerry Wilkin, Senior Systems Administrator, Dayton Superior Corp
  8. What’s New In Site Recovery Manager 5.0?
     vSphere Replication (expand DR coverage to Tier 2 apps and smaller sites):
     • Bundled with SRM at no additional cost
     • Provides simple, cost-efficient replication between vSphere clusters
     Automated failback:
     • Bi-directional recovery plans
     • Automates failback to original site
     Planned migration (streamline planned migrations for disaster avoidance, planned maintenance, …):
     • New workflow that can be applied to any recovery plan
     • Ensures no data-loss, application-consistent migrations of virtual machines
     Others:
     • More granular control over VM startup order
     • Protection-side APIs
     • IPv6 support
  9. Beyond DR: Disaster Avoidance And Planned Migrations
     3 typical use-cases for SRM:
     Disaster Failover: recover from unexpected site failure
     • Full or partial site failure
     • The most critical but least frequent use-case
     • Unexpected site failures do not happen often
     • When they do, fast recovery is critical to the business
     Disaster Avoidance: anticipate potential datacenter outages
     • For example: in case of an anticipated hurricane, floods, forced evacuation, etc.
     • Initiate preventive failover for smooth migration
     • Leverage SRM ‘planned migration’ to ensure no data-loss
     • ‘Automated failback’ enables easy return to original site
     Planned Migration: most frequent SRM use case
     • Planned datacenter maintenance
     • Global load balancing
     • Streamline routine migrations across sites
     • Test to minimize risk
     • Execute partial failovers
     • Leverage SRM ‘planned migration’ to ensure no data-loss
     • ‘Automated failback’ enables bi-directional migrations
  10. TÜV Rheinland
  11. Background
      • TÜV Rheinland was started in Germany in 1872 to perform safety testing of steam pressure vessels.
      • Today TÜV Rheinland is active in 61 countries and 39 different business fields.
      • Provides technical certification of a wide range of technology products and services. Examples: PV cells, X-ray machines, photocopiers, computer monitors, computer mice/keyboards.
      • Also performs Business Continuity Management, Data Protection Management, Information Security and ITIL services.
  12. Justification
      • Propensity for seismicity in Japan.
      • Already had infrastructure at more than one location.
      • Services hosted for external customers required specific SLA.
      • Simplify difficult process of disaster recovery.
  13. Status Quo
      • Before the earthquake, companies were using physical servers at their DR site, or had no DR site at all!
      • Companies in Japan are now conscious of a need for DR and BCP solutions.
      • Many Japanese VMware customers are only familiar with the vSphere base product, not complementary solutions.
      • VMware is now more actively marketing the SRM products as a result of the recent earthquake.
  14. History
      • Prior to SRM, DR process was manual.
      • Already had implemented SAN replication, so running SRM was the next logical step.
      • DR testing was non-existent due to manual overhead involved with testing.
      • Leveraged VMware snapshots to reduce RTO during failback.
  15. Implementation
      • Met with VMware and a local reseller for guidance.
      • Set up a POC and learned the product, especially with the help of official documentation and books by 3rd party authors.
      • Performed tests of the recovery plan.
      • Leveraged IP address mapping CSV.
      • 3-4 months later, put system into production.
  16. Use Cases
      • General use of VMware products helps conserve power (useful during power shortages).
      • Shift workloads from areas under power consumption constraints/reductions to unaffected areas.
      • Typical DR protection between Eastern and Western Japan offices.
      • Temporary fail-over to remote site for planned power outage situations (once per year).
  17. Disaster & Aftermath
      • On March 11th, at 2:46 PM JST, our disaster recovery plan went into motion.
      • Immediately following the initial shock, systems were functional.
      • Performed testing of the SRM recovery plans as extra precaution.
      • Rolling power outages were implemented by TEPCO, necessitating failover process.
      • Systems not covered by SRM (physical machines) had RTO of >24 hours.
  18. Lessons & Suggestions
      • Planning for the initial disaster is not enough; you must also plan for energy and other supply shortages.
      • Ensure there is a chain of command to kick off recovery, and ensure more than 1 person can initiate it.
      • Make sure newly created VMs are configured in the Recovery Plan.
      • Be sure to back up the SRM configuration (local files) and DB backend prior to upgrade.
      • Perform frequent disaster tests.
      • Provide more user-friendly way to map IP addresses.
      • Alert administrators about unprotected or misconfigured VMs.
  19. Pray for Japan!
  20. Thanks! For more information, follow me on Twitter: @wygtya
  21. Mainfreight
  22. New Zealand - We are here! Confidential
  23. Challenges we face
      Natural Disasters:
      • Earthquakes (3 major and 250 minor in the last 12 months)
      • Tsunami
      • Volcanic – 2 active
      Remote:
      • 3 hour flight to Australia
      Stability of Power:
      • 1998 Auckland power crisis
      • Reliance on hydro electricity
      WAN Considerations:
      • Cost and bandwidth limitations
  24. What was learnt from Christchurch
      Christchurch was considered low risk for earthquakes.
      Servers and desktops:
      • Unable to return to the office 6 months later
      • Servers were protected but desktops were lost
      Reliance on backup media:
      • Slow and potentially unreliable
      The human factor:
      • Other priorities
      • Civil unrest
      The value of virtualisation:
      • DR with SRM becomes viable
  25. SRM - Customer Experience From Around the Globe
      David Hall, Mainfreight Group IT Infrastructure Manager
  26. Who are we
      “A company with a 100 year vision”
      • Mainfreight is a global supply chain logistics provider
      • Commenced business in 1978
      • Today has a market capitalisation of $993 million
      • Sales revenues in excess of $1.75 billion
      • 4,600+ team members
      • Unique culture & philosophy
      • We have a quality focus and aim to delight our customers.
  27. Where We Are
      “Ready, Fire, Aim!”
  28. Our Challenges
      “Do more with less”
      • Hybrid model consisting of mostly physical
      • Cost of DR & BCP
      • Previous DR process worked but was complex & time consuming
      • Recent Christchurch earthquakes reiterated to our business the reality of disaster occurring & the importance of DR & BCP
      • Costs of ~$10,000 every hour the systems are down
  29. When Disaster Strikes - Christchurch
  30. About our environment
      “Top performing organisations are those that have harnessed the true potential of today's cutting edge technologies”
      Hardware / Software:
      • HP servers & storage
      • Cisco network
      • Microsoft, Citrix, vSphere/SRM 4.x
      • Active–Active data centres
      Applications protected with SRM:
      • Maintrak – Web-based consignment tracking system
      • MIMs – Inventory management system
      • Cargowise – International freight forwarding system
      • On Account – Accounting system
      • On Sale – CRM system
      [Diagram: South Auckland (Production) and Central Auckland (Recovery)]
  31. SRM Highlights
      “DR is only as good as the last time it was tested”
      • Reduced DR test times from ~15 hours to 4 hours
      • Reduced the number of team members needed for DR from 4 to 2
      • Minimised downtime costs – estimated at $10k per hour
      • Achieved 99.999% availability
      • SRM has been proven and used in ‘anger’ – SAN failure
      • Installation well planned and implemented
      • Project completed on time and on budget
      • Minimal external consultancy required
      • Provided a platform to deliver DR for future business applications
  32. Thank you
      “VMware has provided us with a flexible, reliable IT platform to support the business and deliver IT services in more responsive and cost-effective ways.” – Kevin Drinkwater, Global Chief Information Officer
  33. IBC
  34. Company Background
      VMware History: IBC started in 2004 to convert physical servers to VMs in a company-wide effort to consolidate hardware and drive down maintenance cost and datacenter space/utilities.
      Servers Virtualized: We currently manage about 800 VMs residing on 60-plus ESX hosts running ESX 4.1 and ESXi. Since 2005 we have converted over 300 physical servers to VMs.
      Storage: EMC DMX 4 (Production and DR) and NetApp (Test, Dev and QA).
      Uses for VMware: We run Windows 2003, 2008, and Red Hat v5 (64- and 32-bit OSs). We have many Tier 1 applications running in our VM environment: SQL, SharePoint, Citrix, Hyperion/Informatics and our claims processing servers.
  35. Business Needs
      What was needed: We were moving our data center in the summer of 2009 from Philadelphia to Hershey, PA and needed to migrate 300+ production VMs to our new location.
      SRM Review: VMware came onsite to present the SRM product for a future IBC project (DR insourcing). After the product presentation we saw the potential in using this product for our datacenter move. Working with VMware Professional Services proved very beneficial for IBC.
      Did it solve the problem? Yes. SRM made our D.C. move less stressful and more streamlined; it also solved our plans for DR insourcing and a redundant production environment.
  36. Business Needs
      Why VMware solution: When we saw the SRM product and how it could help us move 300+ production VMs from our Center City Philadelphia D.C. to our new Hershey, PA D.C., it was clear to us that this product would save us many man-hours that we needed elsewhere on our D.C. move weekend.
      SRM Characteristics: The SRM advantages that IBC leveraged were the pre-move testing, streamlining and automation of the overall D.C. move script, in which we could plan out the recovery sequence from Tier 1 Prod VMs to Tier 3 Test VMs. The overall reliability of this product saved our company many admin man-hours, pre- and post-migration.
  37. Business Needs
      Time outages avoided: We saved hours of production server outage time by using SRM instead of a manual migration, and countless admin man-hours were saved, allowing our staff to be utilized in other areas of the move weekend.
      What was needed:
      • SRM plugin for Virtual Center
      • EMC – SRDF
      • VMware Professional Services – The professional services contact was very knowledgeable in the SRM product and how to integrate it with our EMC storage.
      • SRM script and planning – Setting up your server priority migration planning.
  38. Data Center Migration
      How much time till DC cutover: Professional Services came out a few months prior to the DC move and were onsite for 2 days to prepare the plan and gather information about the environment.
      What was the setup and integration process: We worked with VMware to set up our migration script and verify that the EMC storage was replicating correctly.
  39. Data Center Migration
      Services needed:
      • Replication of data – Our initial sync was about 50 LUNs and about 30 TB of data.
      • We then set up daily replication of about 1 TB a day.
      • Set up our server priority script (which servers to power down last and which servers to power up first).
      • VMware came onsite 1 more day for verification that all was well before the final move date.
  40. Data Center Migration
      What happened on Labor Day move weekend? VMware was on site Friday night when we kicked off SRM; there was about 1 TB of changes left to be synced. We then disconnected our EMC storage at the old datacenter and failed over to the new datacenter storage. We had fewer than 10 VMs that needed some attention to get back online.
      I would highly recommend VMware Professional Services. They were on site a total of 4 days and walked us through the whole datacenter migration.
  41. Today
      How is SRM running today? We currently insource our Disaster Recovery Drill at our D.R./Redundant Production datacenter in Reading, PA, utilizing SRM and VMware to get us through the DR drill with replication and failover. We currently run these tests 3-4 times a year.
  42. Next Steps
  43. Where Can I Learn More?
      At VMworld:
      • Visit us at the booth
      • Multiple great sessions on SRM:
        BCO 1269 – SRM 5 technical – Tue 4:30 PM; Wed 1 PM
        BCO 1562 – SRM 5 technical – Tue 12 PM; Wed 10 AM
        BCO 2527 – SRM 5 technical – Tue 3 PM
        BCO 3334 – Cloud DR – Mon 10 AM; Wed 4 PM
        BCO 3336 – Cloud DR – SP perspective – Mon 11:30 AM; Tue 12
      • Product Page –
      • Overview, datasheet, webinars, docs, community links
      • Free 60-day Evaluation – all you need to get started!
      • Solutions from VMware –
  44. Questions? © 2011 VMware Inc. All rights reserved
  45. BCO3276: Disaster Recovery and Site Migration with Site Recovery Manager: Customer Experiences from Around the World
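
Worked example (not part of the original deck): the Mainfreight figures on slides 28 and 31 can be turned into a rough back-of-the-envelope calculation. The sketch below is illustrative only. It takes the quoted numbers at face value (about $10,000 per hour of downtime, 99.999% availability, DR tests cut from roughly 15 hours with 4 people to 4 hours with 2 people) and assumes a 365-day year. The function names are hypothetical and have nothing to do with SRM itself.

```python
# Figures quoted on slides 28 and 31 of the deck; everything else is an assumption.
DOWNTIME_COST_PER_HOUR = 10_000    # dollars per hour of downtime, as quoted
AVAILABILITY = 0.99999             # "five nines", as quoted
MINUTES_PER_YEAR = 365 * 24 * 60   # assumes a 365-day year


def allowed_downtime_minutes(availability, period_minutes=MINUTES_PER_YEAR):
    """Downtime budget implied by an availability target over a period."""
    return (1.0 - availability) * period_minutes


def downtime_cost(minutes, cost_per_hour=DOWNTIME_COST_PER_HOUR):
    """Cost of an outage of the given length at a flat hourly rate."""
    return minutes / 60 * cost_per_hour


def person_hours(duration_hours, team_size):
    """Staff effort for one DR test."""
    return duration_hours * team_size


if __name__ == "__main__":
    budget = allowed_downtime_minutes(AVAILABILITY)
    print(f"Downtime budget at 99.999%: {budget:.2f} minutes per year")
    print(f"Cost of that downtime: ${downtime_cost(budget):,.0f} per year")
    print(f"DR test effort before SRM: {person_hours(15, 4)} person-hours")
    print(f"DR test effort with SRM: {person_hours(4, 2)} person-hours")
```

Under these assumptions, five-nines availability leaves a budget of a little over five minutes of downtime per year, and the effort for one DR test drops from about 60 person-hours to 8. The slide's "~15 hours" is approximate, so the first figure is a rough estimate.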