CenturyLink Cloud's SafeHaven disaster recovery service replicates customer virtual machines and data to an off-site location to protect against disasters such as datacenter outages. It creates replicas of the protected virtual machines and keeps them in sync at a remote site over replication traffic. If the primary site goes down, control traffic directs failover to the recovery virtual machines at the remote site so that services stay available. SafeHaven protects on-premises workloads by recovering them into the CenturyLink Cloud.
Slide 13. SafeHaven: How it works . . .
[Diagram of two sites, On-Prem and CenturyLink Cloud. Labeled components: Protected VMs, Physical Servers, Local Disk, MIRROR, LAN, SRN, CMS, Console, Proxy, Recovery VM, Remote Replicas, Rollback Images.]
[Legend: Control Traffic; Local I/O Traffic; Replication Traffic; Failover Traffic; CMS = Central Mgmt. Server; VM = Production VM / Recovery VM; SRN = SafeHaven Rep Node.]
Editor's Notes
Introduction
Give the history
Ask question of who experienced a disaster before
Give my story
Time to update resume, but resume was on shared drive
DR plans: does anyone have one? Probe, because everyone lies
Bet they never tested
Scare the shit out of them
Capital cost vs opex cost
Won't pay for disaster recovery till it happens to them, and talk about Sharknado
Note that disasters do happen and even squirrels suck
Even a Star Wars dude has a bad day.
DR plan is the most important, we give you the hammer, but it’s what you do with that hammer
Talk about how DR planning is the most critical step
How DR to Cloud is fucking awesome
At this point we have covered how SafeHaven benefits customers; now we will go into a deeper explanation of how the technology works. Before we delve into the weeds on the technology, this is a good place to pause for questions and comments.
Recall that we said each subscribing organization receives a SafeHaven cluster dedicated to them. This slide represents a SafeHaven Cluster with just two data centers: a private data center on-site at the customer premises and a recovery data center in the CenturyLink Cloud. The slide will build gradually to help us understand how SafeHaven works.
First, we download a “SafeHaven Replication Node” or “SRN” into the production data center. This is a light-weight virtual appliance that is responsible for replicating data into the CenturyLink Cloud. It is normally provisioned with 2 vCPU, 4 GB RAM and 15 GB disk space. With these allocations, it normally protects about ten VMs. The SRN acts as a virtual storage appliance and presents iSCSI targets to the guest fabric.
To get data onto the SRN, we then install a driver on the protected servers. The driver performs two functions. First, it copies the base images of the protected VMs onto the SRN. Second, while that copy is in progress and afterward, it mirrors every new write onto the SRN.
The SRN can keep a local copy of the protected VMs in the production data center for local protection, but it can also just buffer the changes there and asynchronously transmit the data into the CenturyLink Cloud.
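For anyone who wants to see the buffering idea in miniature, here is a rough Python sketch. It is not SafeHaven code: the class name ReplicationBuffer, the block-number keys, and the dict standing in for the cloud side are all made up for illustration. It only shows the pattern described above, where a mirrored write lands on the local SRN right away and is shipped to the cloud later.

```python
from collections import deque


class ReplicationBuffer:
    """Toy stand-in for the on-prem SRN (hypothetical, for illustration)."""

    def __init__(self):
        self.local_copy = {}    # block number -> data, kept in the production site
        self.pending = deque()  # changes buffered but not yet sent to the cloud

    def write(self, block, data):
        # A mirrored write from the driver: stored locally and queued.
        # It returns without waiting on the cloud (asynchronous replication).
        self.local_copy[block] = data
        self.pending.append((block, data))

    def drain(self, remote):
        # Later, ship the buffered changes to the remote side in write order.
        sent = 0
        while self.pending:
            block, data = self.pending.popleft()
            remote[block] = data
            sent += 1
        return sent


buf = ReplicationBuffer()
cloud = {}                 # stand-in for the sister SRN's storage
buf.write(1, b"a")
buf.write(1, b"b")
buf.write(2, b"c")
shipped = buf.drain(cloud)  # the cloud now matches the local copy
```

Sending changes in write order is what lets the remote copy pass through the same states the production disk did, just a little later.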
There, a sister SRN receives the data and stores persistent copies in the cloud. In addition to the persistent copies, the SRN also maintains a scrolling log of up to 2048 checkpoints for Continuous Data Protection. The purpose of these checkpoints is to ensure that you can roll back to previous points in time if your production VMs get corrupted or infected with a virus. Many disruption events will involve some data corruption. If SafeHaven merely copied this corrupted data faithfully into the cloud, it wouldn’t be doing you much good. You need the ability to roll back in time to the instant before the corruption event occurred. The interval between checkpoints can be set differently for each “protection group” (a concept that we will explain further in the next slide). The interval can be as little as 30 seconds or as much as several hours. While SafeHaven can keep up to 2048 checkpoints in its scrolling log, the actual number it will retain is determined by the amount of storage that you allocate to the checkpoint pool.
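The checkpoint arithmetic can be sketched in a few lines of Python. This is an illustration, not how SafeHaven is implemented: the CheckpointLog class is hypothetical, and it assumes every checkpoint costs the same fixed amount of pool storage, which a real system would not guarantee. The only figure taken from these notes is the 2048 ceiling.

```python
from bisect import bisect_left
from collections import deque

MAX_CHECKPOINTS = 2048  # ceiling stated in the notes


class CheckpointLog:
    """Toy scrolling log: the oldest checkpoint falls off when the log is full."""

    def __init__(self, pool_bytes, checkpoint_bytes):
        # Assumption: fixed cost per checkpoint, so the pool size caps retention.
        self.capacity = min(MAX_CHECKPOINTS, pool_bytes // checkpoint_bytes)
        self.timestamps = deque(maxlen=self.capacity)

    def take(self, timestamp):
        # Timestamps are assumed to arrive in increasing order.
        self.timestamps.append(timestamp)

    def rollback_target(self, corruption_time):
        """Newest retained checkpoint strictly before corruption_time, or None."""
        stamps = list(self.timestamps)
        idx = bisect_left(stamps, corruption_time)
        return stamps[idx - 1] if idx > 0 else None


# A pool that fits 10 checkpoints, one checkpoint every 30 seconds.
log = CheckpointLog(pool_bytes=10 * 2**30, checkpoint_bytes=2**30)
for t in range(0, 600, 30):
    log.take(t)
target = log.rollback_target(400)  # corruption noticed at t=400
```

With only ten slots, the checkpoints taken before t=300 have already scrolled off, so a corruption event older than that cannot be rolled back. That is the practical reason the size of the checkpoint pool matters.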
The replica disk copies that the SRN maintains in the cloud are presented to “proxy recovery servers”. For every protected VM you have in the production data center, SafeHaven maintains a corresponding “Proxy” in the cloud. The proxy is kept powered off, and you will not be charged for any memory or CPU unless a disruption event causes you to fail over into the cloud. Even so, the disk images of the Proxies are kept up to date by the SRN so that they are current replicas of the VMs in your production data center.
There are just a couple more components and then we are done with the architecture. There is a “SafeHaven Console” that you use to manage your disaster protection environment. This is a rich Java application that is installed on one or more administrative clients that you use to manage your data center. Commands from this console are sent to a “Central Management Server” or “CMS”. The CMS plays a role analogous to that of VMware’s vCenter Server. It is like a quarterback or a clearing house, relaying commands from the management console to the SafeHaven Cluster. The CMS sends commands to the appropriate SRNs in the appropriate data centers; they execute the commands locally and then report back to the CMS. The CMS then reports back to the SafeHaven management console.
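The command path (Console to CMS to SRNs and back) can be mocked up as below. All class names, the data center labels, and the "failover" command string are hypothetical placeholders, not the SafeHaven API; the sketch only mirrors the relay-and-report flow described above.

```python
class SRN:
    """Toy replication node that acknowledges whatever it is told to do."""

    def __init__(self, name, data_center):
        self.name = name
        self.data_center = data_center

    def execute(self, command):
        # A real SRN would act on storage here; this one only reports back.
        return {"srn": self.name, "command": command, "status": "ok"}


class CMS:
    """Toy central management server: routes commands, collects reports."""

    def __init__(self):
        self.srns_by_data_center = {}

    def register(self, srn):
        self.srns_by_data_center.setdefault(srn.data_center, []).append(srn)

    def dispatch(self, command, data_center):
        # Send the command to every SRN in the named data center.
        srns = self.srns_by_data_center.get(data_center, [])
        return [srn.execute(command) for srn in srns]


class Console:
    """Toy management console: never talks to an SRN directly."""

    def __init__(self, cms):
        self.cms = cms

    def run(self, command, data_center):
        return self.cms.dispatch(command, data_center)


cms = CMS()
cms.register(SRN("srn-onprem", "on-prem"))
cms.register(SRN("srn-cloud", "cloud"))
reports = Console(cms).run("failover", "cloud")
```

Keeping the console one step removed from the SRNs is the "clearing house" idea: the CMS is the single place that knows which SRN lives in which data center.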
This is how the SafeHaven system works. The SRN is the workhorse responsible for replicating data to the cloud; the mirroring driver gets data onto the SRN; the CMS is like a quarterback in charge of the entire Cluster; and the whole system is managed through the SafeHaven console.
This shit is awesome.
Talk about how the financial benefits are awesome and the flexibility is awesome!
Demo that DR, but make sure the wireless you are on doesn't block outgoing ports.