4. What will I cover?
➢ How to gather crash data
➢ How to analyze crash dumps
5. Ways to gather crash data
➢ Serial console, netconsole
➢ Kmsg dumpers: ramoops, mtdoops
➢ Kdump: core dump of the whole kernel
➢ Pstore: persistent store filesystem
➢ NVRAM: Non-Volatile RAM (in progress)
➢ MCE: hardware errors
6. Gather the OOPS
➢ Serial console
➢ it is not widespread
➢ it is limited to a few meters from the machine
➢ Netconsole
➢ allows for sending oopses over the network
➢ if compiled as a module, allows reconfiguration
➢ relies on UDP
➢ if the network is broken or the network stack is the
one experiencing issues - IT DOES NOT WORK :)
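On the receiving side, netconsole only needs something listening for UDP datagrams. The following is a minimal sketch of such a listener, assuming the crashing machine is configured to send to this host on port 6666 (netconsole's documented default target port). The function names `open_listener` and `receive_messages` are illustrative, not part of any existing tool.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming netconsole datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock, count, timeout=5.0):
    """Return up to `count` (sender_ip, text) pairs; stop early on timeout."""
    sock.settimeout(timeout)
    messages = []
    while len(messages) < count:
        try:
            data, (sender_ip, _port) = sock.recvfrom(65535)
        except socket.timeout:
            break
        # Kernel messages are not guaranteed to be valid UTF-8.
        messages.append((sender_ip, data.decode("utf-8", errors="replace")))
    return messages
```

Since UDP gives no delivery guarantee, a listener like this can only record what actually arrives; lost datagrams are simply missing from the log.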
8. Pstore
➢ Pstore: persistent store filesystem
➢ Relies on APEI or UEFI
➢ ACPI Platform Error Interface (APEI)
➢ Provides a generic FS layer on top of low-level
persistent storage
➢ Relies on platform drivers
➢ Available since 2010
# dmesg|grep persistent
pstore: Registered erst as persistent store backend
# mount -t pstore none /sys/fs/pstore
9. Pstore
# ls -l /sys/fs/pstore
total 0
-r--r--r--. 1 root root 1016 May 13 07:46 dmesg-efi-1
-r--r--r--. 1 root root 1012 May 13 07:46 dmesg-efi-10
-r--r--r--. 1 root root 948 May 13 07:46 dmesg-efi-11
-r--r--r--. 1 root root 943 May 13 07:46 dmesg-efi-2
-r--r--r--. 1 root root 677 May 13 07:46 dmesg-efi-3
-r--r--r--. 1 root root 993 May 13 07:46 dmesg-efi-4
-r--r--r--. 1 root root 1010 May 13 07:46 dmesg-efi-5
-r--r--r--. 1 root root 999 May 13 07:46 dmesg-efi-6
-r--r--r--. 1 root root 976 May 13 07:46 dmesg-efi-7
-r--r--r--. 1 root root 1006 May 13 07:46 dmesg-efi-8
-r--r--r--. 1 root root 949 May 13 07:46 dmesg-efi-9
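The listing above is sorted lexically, so `dmesg-efi-10` comes before `dmesg-efi-2`. A small sketch of collecting the records in numeric order and saving them to a single file could look like this. The helper names are hypothetical, the `dmesg-efi-` prefix is taken from the listing, and whether ascending numbers match chronological order depends on the backend, which is not checked here.

```python
import re
from pathlib import Path


def pstore_records(directory="/sys/fs/pstore", prefix="dmesg-efi-"):
    """Return record paths sorted by their numeric suffix."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    found = []
    for path in Path(directory).iterdir():
        match = pattern.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


def save_records(directory, destination, prefix="dmesg-efi-"):
    """Concatenate all records into one file, with a header per record."""
    with open(destination, "w", encoding="utf-8") as out:
        for path in pstore_records(directory, prefix):
            out.write(f"==== {path.name} ====\n")
            out.write(path.read_text(encoding="utf-8", errors="replace"))
            out.write("\n")
```

Called as `save_records("/sys/fs/pstore", "/var/tmp/last-oops.txt")`, this would copy the records off the persistent store so they survive later cleanup; the destination path is only an example.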
11. Kmsg dumpers
➢ ramoops
➢ utilizes the pstore for storing oopses and panics
➢ since 2011
➢ mtdoops
➢ utilizes Memory Technology Devices found on
some SoC
➢ available since 2007
12. NVRAM and MCE
➢ NVRAM
➢ still not widely available
➢ MCE - mainly EDAC
➢ Error Detection And Correction
13. Kdump
➢ No dependencies, theoretically ideal, but...
➢ Based on kexec
➢ Not all architectures support kexec
➢ Not easy to set up
➢ Boots a second kernel to retrieve the crash vmcore
➢ Almost useless in cases of HW failure
➢ Needs assistance of other tools for analysis
15. Kdump
➢ A second kernel needs to be started when
crashing
➢ Not all drivers work fine in the second kernel
➢ Very limited memory for the second kernel
➢ We need to construct a new initrd for the
second kernel
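Because the setup has several moving parts, it helps to verify that memory is actually reserved and that a crash kernel is loaded. The sketch below reads the `crashkernel=` parameter from the kernel command line and the `/sys/kernel/kexec_crash_loaded` flag. The function names are illustrative, and the check says nothing about whether the second kernel's initrd really works.

```python
def crashkernel_parameter(cmdline):
    """Return the value of crashkernel= from a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token[len("crashkernel="):]
    return None


def kdump_status(cmdline_path="/proc/cmdline",
                 loaded_path="/sys/kernel/kexec_crash_loaded"):
    """Report the memory reservation and whether a crash kernel is loaded."""
    with open(cmdline_path, encoding="utf-8") as handle:
        reservation = crashkernel_parameter(handle.read())
    try:
        with open(loaded_path, encoding="utf-8") as handle:
            loaded = handle.read().strip() == "1"
    except OSError:
        # The file is absent when the kernel was built without kexec support.
        loaded = False
    return {"crashkernel": reservation, "crash_kernel_loaded": loaded}
```

A result such as `{"crashkernel": None, "crash_kernel_loaded": False}` means a panic on that machine would not produce a vmcore.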
18. Analyzing the crashed kernel
➢ Try using ksymoops on the collected
oops/panic
➢ System.map - kernel function addresses
➢ /proc/kallsyms (/proc/ksyms on 2.4 kernels) - list of kernel symbols
➢ /proc/kcore - the system memory
➢ vmlinux - the uncompressed kernel, can be
disassembled using objdump
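To illustrate what System.map is used for, here is a rough sketch that maps a raw address from an oops back to `symbol+offset`. It assumes the usual three-column System.map layout (address, type, name) and that the addresses in the oops match the map, i.e. no KASLR offset is involved. The function names and the sample addresses in the usage note are made up for the example.

```python
import bisect


def load_system_map(text):
    """Parse System.map text into a sorted list of (address, name) pairs."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return (name, offset) of the nearest symbol at or below `address`."""
    addresses = [addr for addr, _name in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None
    base, name = symbols[index]
    return name, address - base
```

With a map containing `ffffffff811b48c0 t lookup_fast`, resolving `0xffffffff811b48fe` yields `("lookup_fast", 0x3e)`, which is the same `symbol+offset` form the kernel prints in a backtrace.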
19. Analyzing the crashed kernel
# gdb namei.o
(gdb) list *(lookup_fast+0x3e)
0x48fe is in lookup_fast (fs/namei.c:1551).
1546 * going to fall back to non-racy lookup.
1547 */
1548 if (nd->flags & LOOKUP_RCU) {
1549 unsigned seq;
1550 bool negative;
1551 dentry = __d_lookup_rcu(parent, &nd->last, &seq);
1552 if (unlikely(!dentry)) {
1553 if (unlazy_walk(nd, NULL, 0))
1554 return -ECHILD;
1555 return 0;
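The `symbol+offset` pairs fed to gdb above come straight from the oops text, so extracting them can be scripted. The following sketch pulls every `symbol+0xOFF/0xSIZE` token out of an oops and turns it into a `list *(symbol+0xOFF)` command. The regular expression and function name are assumptions made for this example, and deciding which object file to open in gdb is still left to the reader.

```python
import re

# Matches tokens such as "lookup_fast+0x3e/0x2f0" in oops output.
SYMBOL_OFFSET = re.compile(
    r"\b([A-Za-z_][\w.]*)\+(0x[0-9a-fA-F]+)/(0x[0-9a-fA-F]+)"
)


def gdb_list_commands(oops_text):
    """Return one `list *(symbol+offset)` command per unique frame, in order."""
    commands = []
    for symbol, offset, _size in SYMBOL_OFFSET.findall(oops_text):
        command = f"list *({symbol}+{offset})"
        if command not in commands:
            commands.append(command)
    return commands
```

The resulting lines can be pasted into a gdb session opened on the matching object file or on vmlinux built with debug information.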
20. Analyzing the crashed kernel
# cd /usr/src/kernels/KERNEL
# grep -r lookup_fast
.....
fs/namei.c:static int lookup_fast(struct nameidata *nd,
.....
21. Using the crash utility
➢ Download and build the latest version
➢ https://github.com/crash-utility/crash
➢ Run it on the same architecture
➢ mixing 32-bit and 64-bit architectures is not supported
➢ Check out the help information
➢ http://people.redhat.com/anderson/help.html
22. Using the crash utility
➢ Most used commands:
➢ bt - backtrace
➢ log - print the kernel log buffer
➢ ps - list all processes
➢ files - list the open files of a task/PID
➢ whatis - show data or type information for a symbol
24. Analyzing the crashed kernel
➢ Usually you would examine the crash dump manually
➢ But a little automation can be nice:
# cat extract-basic-info
bt
log
ps
exit
# crash vmlinux vmcore < extract-basic-info > report
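The same redirect-based automation can be wrapped in a script, which is handy when many dumps need the same first-pass report. This is a sketch only: `build_script` and `run_crash` are hypothetical helpers, and the `crash` binary, `vmlinux` and `vmcore` paths are assumed to exist on the analysis machine.

```python
import subprocess


def build_script(commands):
    """Join crash commands into stdin text, making sure the session exits."""
    lines = [command.strip() for command in commands if command.strip()]
    if not lines or lines[-1] not in ("exit", "q"):
        lines.append("exit")
    return "\n".join(lines) + "\n"


def run_crash(vmlinux, vmcore, commands, report_path, crash_binary="crash"):
    """Feed the commands to crash on stdin and save its output as a report."""
    result = subprocess.run(
        [crash_binary, vmlinux, vmcore],
        input=build_script(commands),
        capture_output=True,
        text=True,
        check=False,
    )
    with open(report_path, "w", encoding="utf-8") as report:
        report.write(result.stdout)
    return result.returncode
```

For example, `run_crash("vmlinux", "vmcore", ["bt", "log", "ps"], "report")` mirrors the shell one-liner above and appends the final `exit` automatically.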
37. Analyzing the crashed kernel
crash> struct cred 0xffff880845a2ec00
struct cred {
usage = {
counter = 48
},
uid = {
val = 1849
},
gid = {
val = 1845
},
suid = {
val = 1849
},
sgid = {
val = 1845
},
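When such output is captured in a report, it is sometimes useful to pick individual fields out of it, for example the uid of the task that crashed. Below is a rough sketch of a parser for the `name = {` / `name = value` layout shown above. It is an assumption-laden helper rather than anything crash provides: values stay as strings, and inline arrays are kept verbatim.

```python
import re

ASSIGN_OPEN = re.compile(r"^(\w+) = \{$")
ASSIGN_VALUE = re.compile(r"^(\w+) = (.+?),?$")


def parse_struct(text):
    """Parse crash `struct` output into nested dicts of string values."""
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("struct "):
            continue
        if line in ("}", "},"):
            # Tolerate truncated output: never pop the root itself.
            if len(stack) > 1:
                stack.pop()
            continue
        if line == "{":
            # Anonymous member: keep it under a synthetic key.
            child = {}
            stack[-1][f"<anonymous {len(stack[-1])}>"] = child
            stack.append(child)
            continue
        opened = ASSIGN_OPEN.match(line)
        if opened:
            child = {}
            stack[-1][opened.group(1)] = child
            stack.append(child)
            continue
        assigned = ASSIGN_VALUE.match(line)
        if assigned:
            stack[-1][assigned.group(1)] = assigned.group(2)
    return root
```

Applied to the output above, `parse_struct(text)["uid"]["val"]` gives the string `"1849"`.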
38. Using the crash utility
➢ Most used commands:
➢ sys - show the system information stored in the
crash dump
➢ ipcs - show System V IPC facilities (shared memory, semaphores, message queues)
➢ vm - examine the virtual memory in the crash dump
➢ dev - list all devices