Integrating kdump into oVirt 3.5
Martin Peřina
Software Engineer at Red Hat
August 26th, 2014
Agenda
● Motivation
● What is kdump?
● What is fence_kdump?
● How is it all coupled together?
● Configuration
● Future features
Motivation
Host kernel crash on oVirt <= 3.4:
1. host kernel crashed, process which gathers crash
information started (this process can take a long time)
2. after some time engine detected the host as
non-responsive and executed fencing on it
3. if host is fenced during crash gathering, all crash
information is lost
Goal for oVirt 3.5
● Try to detect whether host is in kdump flow prior to fence
execution
● If host is in kdump flow, do not execute fencing and wait for
host to finish gathering its crash information
What is kdump?
What is kdump?
● kexec-based kernel crash dumping mechanism (when
standard kernel crashes, capture kernel is booted)
● dumps memory content of crashed kernel into file on local
or remote target
● dumping is executed from capture kernel, crashed kernel
memory is preserved
● capture kernel needs reserved memory in standard kernel
Standard and capture kernel
How does kdump work?
1. Standard kernel crashes
2. Kexec boots capture kernel
3. Memory dump is executed in capture kernel
4. Memory dump file is stored to specified target
5. Host is rebooted
Kdump configuration
kdump configuration is stored in:
● /etc/kdump.conf
● static configuration that can be changed by
administrator
● capture kernel initial ramdisk file
● created from /etc/kdump.conf on kdump service restart
Sample kdump.conf
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31
Kdump requirements
● kexec-tools package which contains tools to set up and
execute kdump
● crashkernel=MEM_SIZE command line parameter needs to
be configured for standard kernel (on RHEL/CentOS enabled
by default, on Fedora administrator is required to enable it)
● kdump service has to be enabled
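
The crashkernel requirement above can be checked by inspecting the kernel command line. The following is a minimal sketch, not part of the original slides; the function name crashkernel_value is hypothetical, and it assumes the command line is passed in as a string (on a running Linux host it would typically be read from /proc/cmdline).

```python
def crashkernel_value(cmdline: str):
    """Return the value of the crashkernel= parameter, or None if it is absent."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            # Keep everything after the first '=' (for example "auto" or "256M")
            return token.split("=", 1)[1]
    return None
```

A return value of None suggests that no memory is reserved for the capture kernel, so kdump could not start after a crash.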
What is fence_kdump?
What is fence_kdump?
● set of command line tools to send messages from dumping
host and receive them on another predefined host
● part of fence-agents-kdump package
● it uses UDP protocol for messaging
● it uses port 7410 (can be changed)
● it sends messages every 10 seconds (can be changed)
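
To illustrate the UDP messaging described above, here is a minimal sketch of a receiver that waits for a single datagram and reports the source IP address. It is an illustration only, not the fence_kdump implementation: the function names are hypothetical, and the payload is returned as raw bytes without being parsed, because the slides do not describe the message layout.

```python
import socket

def open_listener(address: str = "0.0.0.0", port: int = 7410) -> socket.socket:
    """Bind a UDP socket; 7410 is the default fence_kdump port named above."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((address, port))
    return sock

def receive_one(sock: socket.socket, timeout: float = 10.0):
    """Wait for one datagram; return (source_ip, payload) or None on timeout."""
    sock.settimeout(timeout)
    try:
        payload, (source_ip, _source_port) = sock.recvfrom(4096)
    except socket.timeout:
        return None
    return source_ip, payload
```

Because UDP is connectionless, the source IP address returned by recvfrom is the only information the receiver has about the sender in this sketch.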
Kdump and fence_kdump
/etc/kdump.conf contains two options to set up fence_kdump:
● fence_kdump_nodes
● list of hosts to send messages to
● fence_kdump_args
● additional parameters for fence_kdump_send
kdump.conf with fence_kdump
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31
fence_kdump_nodes mperina.brq.redhat.com
fence_kdump_args -p 7410 -i 5
fence_kdump limitations
● fence_kdump destination host(s) have to be predefined and
they are part of capture kernel initial ramdisk
● fence_kdump receiver can be used to determine if host is
kdumping only for one host at a time and it cannot be
used to determine if host finished kdumping
● fence_kdump messages are sent unencrypted using UDP
protocol
● fence_kdump messages are not signed, sender can be
identified only by source IP address
How is it coupled
together?
oVirt kdump integration
New fence_kdump listener
● new standalone fence_kdump listener was implemented as
a part of oVirt kdump integration
● it can receive messages from multiple kdumping hosts at
once
● it can determine that host finished kdumping using timeout
from last received message
● it communicates with engine using engine database
● it's executed as a service on the same host as engine
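
The timeout-based detection described above can be sketched as a small in-memory session table. This is a hedged illustration rather than the listener's actual code: the class and method names are hypothetical, hosts are identified only by source IP address, and the 30 second default is an illustrative value.

```python
import time

class KdumpSessions:
    """Track the time of the last message per host, keyed by source IP address."""

    def __init__(self, finished_timeout: float = 30.0):
        self.finished_timeout = finished_timeout
        self.last_seen = {}  # source IP -> timestamp of the last message

    def message_received(self, source_ip: str, now: float = None) -> None:
        """Record a message; the first message from a host opens its session."""
        self.last_seen[source_ip] = time.monotonic() if now is None else now

    def collect_finished(self, now: float = None) -> list:
        """Remove and return hosts that were silent longer than the timeout."""
        now = time.monotonic() if now is None else now
        finished = [ip for ip, seen in self.last_seen.items()
                    if now - seen > self.finished_timeout]
        for ip in finished:
            del self.last_seen[ip]
        return sorted(finished)
```

Keeping one entry per source address is what allows several kdumping hosts to be followed at once; a host is treated as finished only because it stopped sending, not because it reported completion.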
Integration – host deploy 1/3
● kdump integration can be enabled for each host by setting
an option in Power Management tab of Host detail popup in
webadmin
● host needs to be redeployed after kdump integration was
enabled
● kdump integration is not bound to cluster level, it can be
enabled even for < 3.5 cluster levels
Integration – host deploy 2/3
● during host deploy, checks are executed to verify that kdump
integration can be enabled:
● host kernel has crashkernel=MEM_SIZE option set
● correct version of kexec-tools is available
● kdump destination address (engine FQDN) can be
resolved
● if any of these checks is not successful, host deploy
finishes successfully, but kdump integration is not
configured and a warning is displayed
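
The "deploy succeeds, but kdump integration is skipped with a warning" behaviour above can be sketched as follows. This is an assumption-laden illustration, not host-deploy code: can_resolve and evaluate_checks are hypothetical names, the check results are passed in as plain booleans, and only the name resolution check is actually implemented here.

```python
import socket

def can_resolve(hostname: str) -> bool:
    """Return True if the name resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False

def evaluate_checks(checks: dict) -> dict:
    """Summarize check results given as a mapping of description -> bool.

    Failed checks never fail the deploy itself; they only prevent kdump
    integration from being configured and are reported as warnings.
    """
    failed = [name for name, ok in checks.items() if not ok]
    return {
        "deploy_succeeded": True,
        "kdump_configured": not failed,
        "warnings": failed,
    }
```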
Integration – host deploy 3/3
● if all checks are successful
● fence_kdump options are updated in /etc/kdump.conf
● kdump service is restarted
● if kdump integration was not successfully configured
during host deploy, administrator can fix the issues later
manually and redeploy the host
UI: New Host popup
UI: Host Detail
Host deploy part limitations
● host deploy updates only fence_kdump options in
kdump.conf, other options are untouched
● administrator is responsible for manually setting correct kdump
target
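
The "update only fence_kdump options, leave everything else untouched" behaviour can be sketched as a text transformation on the contents of kdump.conf. This is an illustration under stated assumptions, not the host-deploy implementation: the function name is hypothetical, and the new options are simply appended at the end of the file.

```python
FENCE_OPTIONS = ("fence_kdump_nodes", "fence_kdump_args")

def update_fence_kdump_options(conf_text: str, nodes: list, args: str) -> str:
    """Replace fence_kdump_* lines and keep every other line unchanged."""
    kept = []
    for line in conf_text.splitlines():
        words = line.split()
        if words and words[0] in FENCE_OPTIONS:
            continue  # drop the old fence_kdump setting
        kept.append(line)  # path, core_collector, comments, etc. are preserved
    kept.append("fence_kdump_nodes " + " ".join(nodes))
    kept.append("fence_kdump_args " + args)
    return "\n".join(kept) + "\n"
```

Since the dump target lines pass through unchanged, a wrong or missing target stays wrong after deploy, which is why setting it remains the administrator's job.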
Integration – kdumping 1/2
Integration – kdumping 2/2
UI: Host start dumping
UI: Host finished dumping
Configuration
fence_kdump listener config
Listener configuration is stored in text files:
● They need to have .conf suffix
● They have to be located under
/etc/ovirt-engine/ovirt-fence-kdump-listener.d directory
● They are simple property based text files
Service restart is needed after config files are changed:
systemctl restart ovirt-fence-kdump-listener
Listener config file sample
LISTENER_ADDRESS=0.0.0.0
LISTENER_PORT=7410
HEARTBEAT_INTERVAL=30
SESSION_SYNC_INTERVAL=5
REOPEN_DB_CONNECTION_INTERVAL=30
KDUMP_FINISHED_TIMEOUT=30
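
As an illustration of how such property files could be read, the sketch below collects KEY=VALUE pairs from every .conf file in a directory. It is not the listener's own loader: the function name is hypothetical, and two behaviours are assumptions made for this example, namely that files are processed in sorted name order with later files overriding earlier ones, and that lines starting with # are comments.

```python
import glob
import os

def load_listener_config(conf_dir: str) -> dict:
    """Read KEY=VALUE pairs from every *.conf file in conf_dir."""
    config = {}
    for path in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        with open(path) as conf_file:
            for line in conf_file:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue  # skip blanks, comments, and malformed lines
                key, value = line.split("=", 1)
                config[key.strip()] = value.strip()
    return config
```

Values are returned as strings; a caller would still need to convert ports and intervals to integers.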
fence_kdump listener options 1/3
LISTENER_ADDRESS
● IP address(es) that fence_kdump listener listens on
● It can contain either 0.0.0.0 (default) or one specific IP
address
LISTENER_PORT
● port that fence_kdump listener listens on (default 7410)
fence_kdump listener options 2/3
HEARTBEAT_INTERVAL
● Defines the interval in seconds (default 30) of listener's
heartbeat updates to database
SESSION_SYNC_INTERVAL
● Defines the interval in seconds (default 5) to synchronize
listener's host kdumping sessions in memory to database
fence_kdump listener options 3/3
REOPEN_DB_CONNECTION_INTERVAL
● Defines the interval in seconds (default 30) to reopen
database connection which was previously unavailable
KDUMP_FINISHED_TIMEOUT
● Defines maximum timeout in seconds since last received
message from a kdumping host, after which the host
kdump flow is marked as FINISHED
fence_kdump engine config 1/4
● fence_kdump options which are not related to listener are
stored in database and they can be changed using
engine‑config tool
● it's required to restart ovirt-engine (and sometimes also
redeploy hosts) after these values are changed
fence_kdump engine config 2/4
FenceKdumpDestinationAddress
● Defines the hostname(s) or IP address(es) to send
fence_kdump messages to
● If empty (default), engine FQDN is used
FenceKdumpDestinationPort
● Defines the port (default 7410) to send fence_kdump
messages to
fence_kdump engine config 3/4
FenceKdumpMessageInterval
● Defines interval in seconds (default 5) between messages
sent by fence_kdump
FenceKdumpListenerTimeout
● Defines max timeout in seconds (default 90) since last
heartbeat to consider fence_kdump listener alive.
fence_kdump engine config 4/4
KdumpStartedTimeout
● Defines maximum timeout in seconds (default 30) to wait
until 1st message from kdumping host is received (to
detect that host kdump flow started)
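
How the two timeouts could interact before fencing a non-responsive host is sketched below. This is a simplified reading of the options above, not the engine's actual logic: the function name and return values are hypothetical, and it assumes that the engine falls back to fencing when the listener's heartbeat is too old. The defaults of 90 and 30 seconds are the ones given for FenceKdumpListenerTimeout and KdumpStartedTimeout.

```python
def fencing_decision(seconds_since_listener_heartbeat: float,
                     kdump_message_received: bool,
                     seconds_waited: float,
                     listener_timeout: float = 90.0,
                     kdump_started_timeout: float = 30.0) -> str:
    """Return "fence", "wait", or "skip_fencing" for a non-responsive host."""
    if seconds_since_listener_heartbeat > listener_timeout:
        # Assumption: without a live listener kdump cannot be detected
        return "fence"
    if kdump_message_received:
        # Host is in kdump flow, so let it finish gathering crash information
        return "skip_fencing"
    if seconds_waited < kdump_started_timeout:
        return "wait"  # keep waiting for the first message
    return "fence"  # no message arrived in time, host is not kdumping
```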
Future features
Future features
● Extend kdump to send its flow status as a part of
fence_kdump message (starting, dumping, finished,
error, ...)
● Extend fence_kdump protocol to:
● use message sequence number
● include unique host id (not to rely just on IP address)
● include HMAC signature for message
THANK YOU!
mperina@redhat.com
mperina at #ovirt (irc.oftc.net)
