DevConf.cz 2016 1/21
Host fencing in oVirt
Fixing the unknown and allowing
VMs to be highly available
Martin Peřina
Software Engineer at Red Hat
Agenda
● Introduction
● Fencing in real life
● Future plans
Introduction
oVirt architecture
[Diagram: Engine, VDSM, VDSM, Storage, Cluster, Data Center]
Terminology
● Host - physical server that runs the hypervisor
● Cluster - set of hosts with the same architecture/capabilities,
which enables VM migration between those hosts
● Data Center - set of clusters and storage
● Highly Available VM - VM that is automatically
restarted on a different host in case of failure
Terminology
● Power Management Interface - interface of a host
that allows power management (PM) operations to be performed on it
● Fence Agent - tool that exposes the PM interface of a host
through a common API
● Non Responsive Host - host that has not responded to
engine communication requests for some time
● Fence Proxy - host on which the fence agent is executed to
perform a power management operation on a non
responsive host
Host detail / Power Management tab
Power management operation
[Diagram: Engine, Fence Proxy Host, Target Host]
- Power Management Restart
- Fence Agent Call
Fence proxy selection
● Process that selects the host on which the fence agent will be
executed
● Hosts are preferred according to their status
● Hosts are evaluated by their location:
– Cluster
– Data Center
– Other Data Center (not by default)
● Proxy host location preference can be customized either
globally or per host
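The selection rules above can be sketched in a few lines of Python. This is only an illustration, not the engine's actual code: the `Host` class, the status names in `STATUS_RANK`, the scope names ("cluster", "dc", "other_dc") and the function `select_fence_proxy` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Host:
    name: str
    cluster: str
    data_center: str
    status: str  # illustrative values: "up", "maintenance", "non_responsive"


# Assumed ranking of usable statuses (lower is better).
# Hosts whose status is not listed are never used as a proxy.
STATUS_RANK = {"up": 0, "maintenance": 1}


def in_scope(candidate: Host, target: Host, scope: str) -> bool:
    """Check whether candidate lies in the given location scope of target."""
    if scope == "cluster":
        return (candidate.data_center == target.data_center
                and candidate.cluster == target.cluster)
    if scope == "dc":
        return candidate.data_center == target.data_center
    if scope == "other_dc":
        return candidate.data_center != target.data_center
    raise ValueError(f"unknown scope: {scope}")


def select_fence_proxy(
    target: Host,
    hosts: Sequence[Host],
    preference: Sequence[str] = ("cluster", "dc"),  # "other_dc" is off by default
) -> Optional[Host]:
    """Walk the location preference in order; within a scope pick the best status."""
    for scope in preference:
        candidates = [
            h for h in hosts
            if h.name != target.name
            and h.status in STATUS_RANK
            and in_scope(h, target, scope)
        ]
        if candidates:
            return min(candidates, key=lambda h: STATUS_RANK[h.status])
    return None  # no suitable proxy found
```

Passing a different `preference` tuple per call mirrors the idea that the location preference can be set globally or overridden per host; for example `("cluster", "dc", "other_dc")` would also allow a proxy from another data center.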
Power management proxy preference
Fencing
● Process that tries to make a non responsive host
responsive again using various techniques
● Successful detection of the host dumping flow or successful
execution of a power management stop is the only way
to ensure that VMs running on the host are no
longer alive -> those VMs can then be restarted on a different
host
● Preventing data corruption is the most important goal
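The safety rule above can be written as a small predicate. This is a minimal sketch under assumed names: the `FenceOutcome` enum and `can_restart_ha_vms` are hypothetical and exist only for this illustration.

```python
from enum import Enum, auto


class FenceOutcome(Enum):
    """Possible results of trying to fence a non responsive host (illustrative)."""
    KDUMP_DETECTED = auto()     # host dumping flow was detected
    PM_STOP_SUCCEEDED = auto()  # power management stop was executed successfully
    PM_STOP_FAILED = auto()     # power management stop was attempted but failed
    UNKNOWN = auto()            # nothing could be confirmed about the host


# Only these outcomes prove that VMs on the host are no longer alive.
SAFE_OUTCOMES = frozenset({
    FenceOutcome.KDUMP_DETECTED,
    FenceOutcome.PM_STOP_SUCCEEDED,
})


def can_restart_ha_vms(outcome: FenceOutcome) -> bool:
    """Return True only when restarting the VMs elsewhere cannot corrupt data."""
    return outcome in SAFE_OUTCOMES
```

Any outcome outside `SAFE_OUTCOMES` is treated as unsafe on purpose: a VM that may still be writing to shared storage must not be started a second time on another host.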
Fencing flow steps
1. SSH Soft Fencing
– Attempt to restart VDSM using an SSH connection
2. Kdump Detection
– Detect whether the host is dumping and wait until it finishes
dumping to preserve kdump data
3. Power Management Restart
– Restart the host using the power management interface
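The ordering of the three steps above can be sketched as a short function. The four callables and the `FlowResult` names are hypothetical stand-ins, not engine APIs. The sketch also assumes that once a kdump flow is detected and has finished, no power management restart is issued.

```python
from enum import Enum
from typing import Callable


class FlowResult(Enum):
    RECOVERED_BY_SSH = "recovered_by_ssh_soft_fencing"
    KDUMP_COMPLETED = "kdump_completed"
    RESTARTED_BY_PM = "restarted_by_power_management"
    FAILED = "fencing_failed"


def run_fencing_flow(
    ssh_soft_fence: Callable[[], bool],       # True if VDSM came back after restart
    host_is_dumping: Callable[[], bool],      # True if a kdump flow was detected
    wait_for_dump_end: Callable[[], None],    # blocks until the dump has finished
    pm_restart: Callable[[], bool],           # True if the PM restart succeeded
) -> FlowResult:
    """Run the steps in order and stop at the first one that resolves the host."""
    # 1. SSH Soft Fencing: cheapest option, no VM is affected if it works.
    if ssh_soft_fence():
        return FlowResult.RECOVERED_BY_SSH
    # 2. Kdump Detection: do not interrupt a dump, wait so its data is preserved.
    if host_is_dumping():
        wait_for_dump_end()
        return FlowResult.KDUMP_COMPLETED
    # 3. Power Management Restart: last resort via the power management interface.
    if pm_restart():
        return FlowResult.RESTARTED_BY_PM
    return FlowResult.FAILED
```

Injecting the steps as callables keeps the ordering logic separate from how SSH, kdump notifications, or fence agents are actually reached, which also makes each scenario easy to simulate.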
Fencing in real life
VDSM crashed
[Diagram: Engine, Fence Proxy Host, Non Responding Host]
- SSH Soft Fencing
Link failure - simple network configuration
[Diagram: Engine, Fence Proxy Host, Non Responding Host; X marks the failed link]
- SSH Soft Fencing
- Power Management Restart
- Fence Agent Call
Host is dumping
[Diagram: Engine, Fence Proxy Host, Dumping Host]
- SSH Soft Fencing
- Host starts dumping - notification to engine
- Host finishes dumping - notification to engine
Link failure - advanced network configuration
[Diagram: Engine, Fence Proxy Host (cluster 2), Non Responding Host (cluster 1), Storage; x marks the failed link]
- SSH Soft Fencing
- Power Management Restart
- Fence Agent Call
Cluster Fencing Policy
Future plans
Fencing – Future plans
● Storage fencing
● Detection of hardware failures
THANK YOU!
http://www.ovirt.org
mperina@redhat.com
mperina at #ovirt (irc.oftc.net)
