Implementing dr w. hyper v clustering

  1. 1. Implementing Affordable Disaster Recovery with Hyper-V and Multi-Site Clustering<br />Greg Shields, MVP<br />Partner and Principal Technologist<br />www.ConcentratedTech.com<br />
  2. 2. This slide deck was used in one of our many conference presentations. We hope you enjoy it, and invite you to use it within your own organization however you like.<br />For more information on our company, including information on private classes and upcoming conference appearances, please visit our Web site, www.ConcentratedTech.com.<br />For links to newly-posted decks, follow us on Twitter: @concentrateddon or @concentratdgreg<br />This work is copyright © Concentrated Technology, LLC<br />
  3. 3. What Makes a Disaster?<br /><ul><li>Which of the following would you consider a disaster?
  4. 4. A naturally-occurring event, such as a tornado, flood, or hurricane, impacts your datacenter and causes damage. That damage causes the entire processing of that datacenter to cease.
  5. 5. A widespread incident, such as a water leakage or long-term power outage, that interrupts the functionality of your datacenter for an extended period of time.
  6. 6. A problem with a virtual host creates a “blue screen of death”, immediately ceasing all processing on that server.
  7. 7. An administrator installs a piece of code that causes problems with a service, shutting down that service and preventing some action from occurring on the server.
  8. 8. An issue with power connections causes a server or an entire rack of servers to inadvertently and rapidly power down.</li></ul>DISASTER!<br />JUST A BAD DAY!<br />
  15. 15. What Makes a Disaster?<br />Your decision to “declare a disaster” and move to “disaster ops” is a major one.<br />The technologies used for disaster protection are different than those used for high-availability.<br />More complex.<br />More expensive.<br />Failover and failback processes involve more thought.<br />You might not be able to just “fail back” with a click of a button.<br />
  16. 16. A Disastrous Poll<br />Where are We? Who Here is…<br />Planning a DR Environment?<br />In Process of Implementing One?<br />Already Enjoying One?<br />What’s a “DR Environment” ???<br />
  18. 18. Multi-Site Hyper-V == Single-Site Hyper-V<br />DON’T PANIC: Multi-site Hyper-V looks very much the same as single-site Hyper-V.<br />Microsoft has not done a good job of explaining this fact!<br />Some Hyper-V hosts.<br />Some networking and storage.<br />Virtual machines that Live Migrate around.<br />But there are some major differences too…<br />VMs can Live Migrate across sites.<br />Sites typically have different subnet arrangements.<br />Data in the primary site must be replicated to the DR site.<br />Clients need to know where your servers go!<br />
  19. 19. Constructing Site-Proof Hyper-V: Three Things You Need<br />At a very high level, Hyper-V disaster recovery is three things:<br />A storage mechanism<br />A replication mechanism<br />A set of target servers and a cluster to receive virtual machines and their data<br />Once you have these three things, layering Hyper-V atop is easy.<br />
  20. 20. Constructing Site-Proof Hyper-V: Three Things You Need<br />Replication Mechanism<br />Storage Device(s)<br />Target Servers<br />
  22. 22. Thing 1: A Storage Mechanism<br />Typically, two SANs in two different locations<br />Fibre Channel, iSCSI, FCoE, heck, JBOD.<br />Often similar model or manufacturer.<br />Some replication mechanisms need this similarity to function properly; others do not.<br />Backup SAN doesn’t necessarily need to be of the same size or speed as the primary SAN<br />Replicated data isn’t always the full set of data.<br />You may not need disaster recovery for everything.<br />DR Environments: Where Old SANs Go To Die.<br />
  24. 24. Thing 2: A Replication Mechanism<br />Replication between SANs must occur.<br />There are two commonly-accepted ways to accomplish this…<br />Synchronously<br />Changes are made on one node at a time.<br />Subsequent changes on primary SAN must wait for ACK from backup SAN.<br />Asynchronously<br />Changes on backup SAN will eventually be written.<br />Changes queued at primary SAN to be transferred at intervals.<br />
  25. 25. Thing 2: A Replication Mechanism<br />Synchronously<br />Changes are made on one node at a time. Subsequent changes on primary SAN must wait for ACK from backup SAN.<br />
  26. 26. Thing 2: A Replication Mechanism<br />Asynchronously<br />Changes on backup SAN will eventually be written. Changes are queued at primary SAN to be transferred at intervals.<br />
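The difference between the two modes can be sketched with a toy model. This is a minimal illustration in Python, not how any SAN actually implements replication: the `ReplicatedVolume` class, its `transfer` method, and the `demo` helper are hypothetical names invented for this example, and the "transfer interval" is simulated by a single explicit call.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedVolume:
    """Toy model of a primary SAN volume replicating to a backup SAN."""
    synchronous: bool
    primary: list = field(default_factory=list)
    backup: list = field(default_factory=list)
    queue: list = field(default_factory=list)  # async changes awaiting transfer

    def write(self, block: str) -> None:
        self.primary.append(block)
        if self.synchronous:
            # Synchronous: the write completes only once the backup SAN
            # also holds the change (the ACK described on the slide).
            self.backup.append(block)
        else:
            # Asynchronous: complete the write now, ship the change later.
            self.queue.append(block)

    def transfer(self) -> None:
        """Ship queued changes to the backup SAN (one transfer interval)."""
        self.backup.extend(self.queue)
        self.queue.clear()

    def lost_if_primary_fails(self) -> list:
        """Changes that currently exist only at the primary site."""
        return [b for b in self.primary if b not in self.backup]


def demo(synchronous: bool) -> list:
    vol = ReplicatedVolume(synchronous=synchronous)
    for i in range(5):
        vol.write(f"block-{i}")
        if i == 2:
            vol.transfer()  # a transfer interval elapses after the third write
    return vol.lost_if_primary_fails()


if __name__ == "__main__":
    print("sync  would lose:", demo(True))
    print("async would lose:", demo(False))
```

In this model the asynchronous volume loses whatever was written after the last transfer, which is exactly the window a Recovery Point Objective is meant to bound.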
  27. 27. Class Discussion<br />Which would you choose? Why?<br />
Synchronous<br /><ul><li>Assures no loss of data.</li>
<li>Requires a high-bandwidth and low-latency connection.</li>
<li>Write and acknowledgement latencies impact performance.</li>
<li>Requires shorter distances between storage devices.</li></ul>
Asynchronous<br /><ul><li>Potential for loss of data during a failure.</li>
<li>Leverages smaller-bandwidth connections, more tolerant of latency.</li>
<li>No performance impact.</li>
<li>Potential to stretch across longer distances.</li></ul>
Your Recovery Point Objective makes this decision…<br />
  39. 39. Thing 2½: Replication Processing Location<br />There are also two locations for replication processing…<br />Storage Layer<br />Replication processing is handled by the SAN itself.<br />Agents are often installed on virtual hosts or machines to ensure crash consistency.<br />Easier to set up, fewer moving parts. More scalable.<br />Concerns about crash consistency.<br />OS / Application Layer<br />Replication processing is handled by software in the VM OS.<br />This software also operates as the agent.<br />More challenging to set up, more moving parts. More installations to manage/monitor. Scalability and cost are linear.<br />Fewer concerns about crash consistency.<br />
  40. 40. Thing 3: Target Servers and a Cluster<br />Finally, you need target servers and a cluster in the backup site.<br />
  44. 44. Clustering’s Sordid History<br />Windows NT 4.0<br />Microsoft Cluster Service “Wolfpack”.<br />“As the corporate expert in Windows clustering, I recommend you don’t use Windows clustering.”<br />Windows 2000<br />Greater availability, scalability. Still painful.<br />Windows 2003<br />Added iSCSI storage to traditional Fibre Channel.<br />SCSI Resets still used as method of last resort (painful).<br />Windows 2008<br />Eliminated use of SCSI Resets.<br />Eliminated full-solution HCL requirement.<br />Added Cluster Validation Wizard and pre-cluster tests.<br />Clusters can now span subnets (ta-da!)<br />Windows 2008 R2<br />Improvements to Cluster Validation Wizard and Migration Wizard.<br />Additional cluster services.<br />Cluster Shared Volumes (!) and Live Migration (!)<br />
  46. 46. So, What IS a Cluster?<br />Quorum Drive & Storage for Hyper-V VMs<br />
  47. 47. So, What IS a Multi-Site Cluster?<br />
  50. 50. Quorum: Windows Clustering’s Most Confusing Configuration<br />Ever been to a Kiwanis meeting…?<br />A cluster “exists” because it has quorum between its members. That quorum is achieved through a voting process.<br />Different Kiwanis clubs have different rules for quorum.<br />Different clusters have different rules for quorum.<br />If a cluster “loses quorum”, the entire cluster shuts down and ceases to exist. It stays down until quorum is regained.<br />This is very different from a resource failover, which is the reason clusters are implemented.<br />Multiple quorum models exist.<br />
  51. 51. Four Options for Quorum<br />Node and Disk Majority<br />Node Majority<br />Node and File Share Majority<br />No Majority: Disk Only<br />
  55. 55. Quorum in Multi-Site Clusters<br />Node and Disk Majority<br />Node Majority<br />Node and File Share Majority<br />No Majority: Disk Only<br />Microsoft recommends using the Node and File Share Majority model for multi-site clusters.<br />This model provides the best protection for a full-site outage.<br />Surviving a full-site outage requires a file share witness in a third geographic location.<br />
  56. 56. Quorum in Multi-Site Clusters<br />Use the Node and File Share Majority quorum model<br />Prevents entire-site outage from impacting quorum.<br />Enables creation of multiple clusters if necessary.<br />Third Site for Witness Server<br />
  57. 57. I Need a Third Site? Seriously?<br />Here’s where Microsoft’s ridiculous quorum notion gets unnecessarily complicated…<br />What happens if you put the quorum’s file share in the primary site?<br />The secondary site might not automatically come online after a primary site failure.<br />Votes in secondary site < Votes in primary site<br />Let’s count on our fingers…<br />
  58. 58. I Need a Third Site? Seriously?<br />Here’s where Microsoft’s ridiculous quorum notion gets unnecessarily complicated…<br />What happens if you put the quorum’s file share in the secondary site?<br />A failure in the secondary site could cause the primary site to go down.<br />Votes in secondary site > votes in primary site.<br />More fingers…<br />This problem gets even weirder as time passes and the number of servers changes in each site.<br />
  59. 59. I Need a Third Site? Seriously?<br />Third Site for Witness Server<br />
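The finger-counting on the preceding slides can be written out as a short sketch. It assumes a simplified vote model in which every node and the file share witness each hold exactly one vote and quorum means a strict majority of all votes; the function names and the two-nodes-per-site layout are hypothetical, chosen only to illustrate the argument.

```python
def surviving_votes(nodes_per_site: dict, witness_site: str, failed_site: str) -> int:
    """Count the votes still reachable after one whole site fails."""
    votes = sum(n for site, n in nodes_per_site.items() if site != failed_site)
    if witness_site != failed_site:
        votes += 1  # the file share witness is still reachable
    return votes


def has_quorum(nodes_per_site: dict, witness_site: str, failed_site: str) -> bool:
    """Quorum holds only with a strict majority of all configured votes."""
    total = sum(nodes_per_site.values()) + 1  # every node plus the witness
    return 2 * surviving_votes(nodes_per_site, witness_site, failed_site) > total


if __name__ == "__main__":
    # Assumed layout: two Hyper-V hosts per site, no hosts at the third site.
    sites = {"primary": 2, "secondary": 2, "third": 0}
    for witness in sites:
        for failed in ("primary", "secondary"):
            ok = has_quorum(sites, witness, failed)
            print(f"witness in {witness:9} | {failed:9} site fails | quorum: {ok}")
```

With the witness in either production site, losing that site leaves two of five votes and the cluster stops. With the witness in a third site, either production site can fail and three of five votes remain.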
  60. 60. -DEMO- Multi-Site Clustering<br />
  61. 61. Multi-Site Cluster Tips/Tricks<br />Install servers to sites so that your primary site always contains more servers than backup sites.<br />Eliminates some problems with quorum during site outage.<br />
  64. 64. Multi-Site Cluster Tips/Tricks<br />Manage Preferred Owners & Persistent Mode options.<br />Make sure your servers fail over to servers in the same site first.<br />But also make sure they have options on failing over elsewhere.<br />Consider carefully the effects of Failback.<br />Failback is a great solution for resetting after a failure.<br />But Failback can be a massive problem-causer as well.<br />Its effects are particularly pronounced in Multi-Site Clusters.<br />Recommendation: Turn it off (until you’re ready).<br />
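The "same site first, elsewhere second" rule amounts to an ordering over candidate hosts. The sketch below is only an illustration of that ordering in Python. It does not call the Failover Clustering API, and the `failover_order` function and the host names are hypothetical.

```python
def failover_order(current: str, node_sites: dict) -> list:
    """Order failover candidates: same-site hosts first, other sites after."""
    home_site = node_sites[current]
    candidates = [node for node in node_sites if node != current]
    # sorted() is stable, so hosts keep their listed order within each group.
    # The key is False for same-site hosts, which sorts before True.
    return sorted(candidates, key=lambda node: node_sites[node] != home_site)


if __name__ == "__main__":
    # Hypothetical layout: two hosts in the primary site, two in the DR site.
    hosts = {"HV1": "primary", "HV2": "primary", "HV3": "dr", "HV4": "dr"}
    print(failover_order("HV1", hosts))
    print(failover_order("HV3", hosts))
```

A list built this way keeps a local host at the front while still leaving remote hosts available as a fallback, which is the intent of the two bullets above.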
  68. 68. Multi-Site Cluster Tips/Tricks<br />Resist creating clusters that support other services.<br />A Hyper-V cluster is a Hyper-V cluster is a Hyper-V cluster.<br />Use disk “dependencies” as Affinity/Anti-Affinity rules.<br />Hyper-V all by itself doesn’t have an elegant way to affinitize.<br />Setting disk dependencies against each other is a work-around.<br />Add Servers in Pairs<br />Ensures that a server loss won’t cause site split brain.<br />This is less of a problem with the File Share Witness configuration.<br />
  69. 69. Multi-Site Cluster Tips/Tricks<br />Segregate traffic!!!<br />
  71. 71. Most Important!<br />Ensure that networking remains available when VMs migrate from primary to backup site.<br />Clustering can span subnets! This is good, but only if you plan for it…<br />Remember that crossing subnets also means changing IP address, subnet mask, gateway, etc., at the new site.<br />This can be done automatically using DHCP and dynamic DNS; otherwise it must be updated manually.<br />DNS replication is also a problem. Clients will require time to update their local cache.<br />Consider reducing DNS TTL or clearing client cache.<br />
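To see why reducing the TTL matters, the following sketch estimates how long a client could keep using a VM's old address after a cross-site move. It assumes a deliberately simplified model: the client's DNS server learns the new record after some replication delay, and the client may have cached the old record just before that, so it holds it for one full TTL. The function name and all numbers are hypothetical.

```python
def worst_case_stale_seconds(record_ttl: int, dns_replication_delay: int) -> int:
    """Upper bound, in this simple model, on how long a client keeps the old IP."""
    # The client may cache the old answer right before its DNS server is
    # updated, then keep that answer for one full TTL.
    return dns_replication_delay + record_ttl


if __name__ == "__main__":
    replication_delay = 900  # assumed: 15 minutes for DNS servers to converge
    for ttl in (3600, 1200, 300):
        stale = worst_case_stale_seconds(ttl, replication_delay)
        print(f"TTL {ttl:>4}s -> clients may be stale for up to {stale / 60:.0f} min")
```

Real environments add more caching layers (intermediate resolvers, applications), so treat the result as a lower bound on the worst case rather than a guarantee.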