Linux Kernel Crashdump

Analyzing Linux kernel crash dumps
Marian Marinov <mm@1h.com>
hackman @ irc.freenode.net
https://github.com/hackman

What will I cover?
➢ How to gather crash data
➢ How to analyze crash dumps

Ways to gather crash data
➢ Serial console, netconsole
➢ Kmsg dumpers: ramoops, mtdoops
➢ Kdump: core dump of the whole kernel
➢ Pstore: persistent store filesystem
➢ NVRAM: Non-Volatile RAM (in progress)
➢ MCE: hardware errors

Gather the OOPS
➢ Serial console
➢ it is not widespread
➢ it is limited to a few meters of cable from the machine
➢ Netconsole
➢ allows for sending oopses over the network
➢ if compiled as a module, allows reconfiguration
➢ relies on UDP
➢ if the network is broken or the network stack is the
one experiencing issues - IT DOES NOT WORK :)
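As a rough sketch of what a netconsole setup can look like: the module takes one parameter of the form `netconsole=src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac`. The ports, addresses, interface name and MAC below are placeholders, and the helper name `netconsole_param` is made up for this example. The command is only printed, since loading the module needs root.

```shell
#!/bin/sh
# Build the netconsole= module parameter.
# Layout: src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
netconsole_param() {
    # $1 source port, $2 source IP, $3 network device,
    # $4 target port, $5 target IP, $6 target MAC address
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# All values here are placeholders for illustration.
param=$(netconsole_param 6665 192.0.2.10 eth0 6666 192.0.2.20 00:11:22:33:44:55)

# Print the command instead of running it, so it can be reviewed first.
echo "modprobe netconsole $param"
```

On the receiving machine any UDP listener on the target port will do; the messages arrive as plain text.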

Pstore
➢ Pstore: persistent store filesystem
➢ Relies on APEI or UEFI
➢ ACPI Platform Error Interface(APEI)
➢ Provides a generic filesystem layer on top of
low-level persistent storage
➢ Relies on platform drivers
➢ Available since 2010
# dmesg|grep persistent
pstore: Registered erst as persistent store backend
# mount -t pstore none /sys/fs/pstore

Pstore
# ls -l /sys/fs/pstore
total 0
-r--r--r--. 1 root root 1016 May 13 07:46 dmesg-efi-1

Pstore
# cat /sys/fs/pstore/dmesg-efi-4
Panic#2 Part4
<1>[ 306.271891] IP: [<ffffffff813ba3e6>] sysrq_handle_crash+0x16/0x20
<4>[ 306.271917] PGD 80a98c067 PUD 807e8e067 PMD 0
<4>[ 306.271937] Oops: 0002 [#1] SMP
<4>[ 306.271952] Modules linked in:
tcp_lp rfcomm fuse xt_CHECKSUM nf_conntrack_netbios_ns
nf_conntrack_broadcast ipt_MASQUERADE ........
➢ The IP line names the function that triggered the crash: sysrq_handle_crash
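Records in pstore occupy scarce backend storage, and removing a file from the pstore mount tells the backend to reclaim it, so it can make sense to copy the records somewhere else first. A minimal sketch, assuming the mount point shown above; the function name `save_pstore_records` and the archive directory are hypothetical.

```shell
#!/bin/sh
# Copy pstore dmesg records into an archive directory and print how
# many files were copied.
save_pstore_records() {
    src=$1    # pstore mount point, e.g. /sys/fs/pstore
    dst=$2    # archive directory (hypothetical location)
    mkdir -p "$dst" || return 1
    count=0
    for f in "$src"/dmesg-*; do
        [ -e "$f" ] || continue      # the glob matched nothing
        cp "$f" "$dst/" && count=$((count + 1))
    done
    echo "$count"
}

# Prints 0 when there are no records (or pstore is not mounted).
save_pstore_records /sys/fs/pstore "${TMPDIR:-/tmp}/pstore-archive"
```

The sketch deliberately does not delete anything from /sys/fs/pstore; whether to clean up after copying is left to the administrator.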

Kmsg dumpers
➢ ramoops
➢ utilizes the pstore for storing oopses and panics
➢ since 2011
➢ mtdoops
➢ utilizes Memory Technology Devices (MTD) found on
some SoCs
➢ available since 2007

➢ NVRAM
➢ still not widely available
➢ MCE - mainly EDAC
➢ Error Detection And Correction

Kdump
➢ No dependencies, theoretically ideal, but...
➢ Based on kexec
➢ Not all architectures support kexec
➢ Not easy to set up
➢ Boots a second kernel to retrieve the crash vmcore
➢ Almost useless in cases of HW failure
➢ Needs assistance of other tools for analysis

Kdump
➢ A second kernel needs to be started when
crashing
➢ Not all drivers work fine in the second kernel
➢ Very limited memory for the second kernel
➢ We need to construct a new initrd for the
second kernel
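Two things have to be in place before Kdump can catch a crash: memory reserved for the second kernel (the `crashkernel=` boot parameter) and the second kernel loaded into it (`kexec -p`). The kernel exposes both states in /sys/kernel. The sketch below only reads those files; the function name `kdump_status` and the message texts are made up for the example.

```shell
#!/bin/sh
# Report whether kdump is ready, based on two sysfs files:
#   kexec_crash_size   - bytes reserved for the crash kernel
#   kexec_crash_loaded - 1 if a crash kernel has been loaded
kdump_status() {
    dir=${1:-/sys/kernel}    # overridable so the logic can be tested
    size=$(cat "$dir/kexec_crash_size" 2>/dev/null) || size=0
    loaded=$(cat "$dir/kexec_crash_loaded" 2>/dev/null) || loaded=0
    if [ "${size:-0}" -eq 0 ]; then
        echo "no memory reserved: boot with crashkernel=<size>"
    elif [ "${loaded:-0}" -eq 0 ]; then
        echo "memory reserved, crash kernel not loaded: use kexec -p"
    else
        echo "kdump armed"
    fi
}

kdump_status
```

Once it reports "kdump armed", a test crash can be triggered on a disposable machine with `echo c > /proc/sysrq-trigger`, which is the sysrq_handle_crash path seen in the pstore example.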

Analyzing the crashed kernel
general protection fault: 0000 [#2] SMP
Modules linked in: module list here
CPU: 4 PID: 6839 Comm: iceweasel Tainted: G D 3.16-2-amd64 #1 Debian 3.16.3-2
Hardware name: Gigabyte 990FXA-UD5, BIOS FB 01/23/2013
task: ffff88009c063370 ti: ffff8801f7c94000 task.ti: ffff8801f7c94000
RIP: 0010:[<ffffffff811bcd08>] [<ffffffff811bcd08>] __d_lookup_rcu+0xc8/0x160
RSP: 0018:ffff8801f7c97cb0 EFLAGS: 00010212
RAX: 0000000000000015 RBX: ffff8800984a2b60 RCX: 000000000000000c
RDX: ffff0800984a2b90 RSI: ffff8801f7c97e10 RDI: 6461657262757065
RBP: ffff8800984a2cd8 R08: ffff88009c19308c R09: ffff88009c19308c
R10: 0000000000000015 R11: ffffffffffffffff R12: ffff8800984a2b58
R13: 00000015067b0bda R14: ffff8801f7c97e10 R15: ffff8801f7c97d0c
FS: 00007f4f52f7d740(0000) GS:ffff88023fd00000(0000) knlGS:00000000f55ffb40
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4f37200018 CR3: 0000000206380000 CR4: 00000000000007e0

Analyzing the crashed kernel
Stack:
ffff88009c063370 ffff8801f4e60c10 ffff88009c063370 ffff8801f7c97d78
ffff8801f7c97d68 0000000000000041 ffff8802356b80a0 ffff8800984a2cd8
ffff8801f7c97e00 ffffffff811aedde 02ffff8000000000 0000000200000000
Call Trace:
[<ffffffff811aedde>] ? lookup_fast+0x3e/0x2b0
[<ffffffff811b0865>] ? path_lookupat+0x155/0x780
[<ffffffffa0536c8a>] ? jfs_readdir+0x1ba/0xf90 [jfs]
[<ffffffff811b0eb6>] ? filename_lookup+0x26/0xc0
[<ffffffff811b4fa4>] ? user_path_at_empty+0x54/0x90
[<ffffffff810e908e>] ? from_kgid_munged+0xe/0x20
[<ffffffff811a9f0a>] ? cp_new_stat+0x13a/0x160
[<ffffffff811a9ab6>] ? vfs_fstatat+0x46/0x90
[<ffffffff811a9f4a>] ? SYSC_newstat+0x1a/0x40
[<ffffffff8150c26d>] ? system_call_fast_compare_end+0x10/0x15
Code: 6b 18 75 cf 41 89 07 4d 89 c8 48 8b 53 20 44 89 d0 eb 12 48 39 fe 75 bb 48 83
c2 08 49 83 c0 08 83 e8 08 74 26 49 8b 38 83 f8 07 <48> 8b 32 77 e3 8d 0c c5 00 00
00 00 4c 89 d8 48 31 fe 48 d3 e0
RIP [<ffffffff811bcd08>] __d_lookup_rcu+0xc8/0x160
RSP <ffff8801f7c97cb0>
---[ end trace d7e9304af4a09ee6 ]---
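The most useful single item in the oops is the `function+offset/size` pair on the RIP line: the fault happened `offset` bytes into a function that is `size` bytes long. A small sketch of pulling that out of a saved oops follows; the helper names `oops_rip_symbol` and `split_symbol` are invented here, and the sample lines are taken from the oops above.

```shell
#!/bin/sh
# Print the first function+0xOFF/0xSIZE token found on a RIP line.
oops_rip_symbol() {
    grep '^RIP' |
        grep -oE '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' |
        head -n 1
}

# Split "func+0xOFF/0xSIZE" into: func, offset and size in decimal.
split_symbol() {
    sym=${1%%+*}       # text before the first '+'
    rest=${1#*+}       # 0xOFF/0xSIZE
    off=${rest%%/*}
    size=${rest#*/}
    echo "$sym $(($off)) $(($size))"
}

oops='RIP: 0010:[<ffffffff811bcd08>] [<ffffffff811bcd08>] __d_lookup_rcu+0xc8/0x160
RSP: 0018:ffff8801f7c97cb0 EFLAGS: 00010212
RIP [<ffffffff811bcd08>] __d_lookup_rcu+0xc8/0x160'

loc=$(printf '%s\n' "$oops" | oops_rip_symbol)
split_symbol "$loc"    # prints: __d_lookup_rcu 200 352
```

So the faulting instruction sits 200 bytes into the 352-byte __d_lookup_rcu, which is the offset to hand to gdb or objdump.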

➢ Try using ksymoops on the collected
oops/panic
➢ System.map - kernel function addresses
➢ /proc/kallsyms (/proc/ksyms on 2.4 kernels) - list of kernel symbols
➢ /proc/kcore - the system memory
➢ vmlinux - the uncompressed kernel, can be
disassembled using objdump
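When an oops contains only raw addresses, System.map is enough to name the function: the containing symbol is the last entry whose start address is not greater than the address. The sketch below assumes addresses written as 16 lowercase hex digits, as in System.map, so that plain string comparison orders them correctly. The function name `lookup_symbol` and the two neighbouring map entries are made up; the __d_lookup_rcu start address is simply RIP ffffffff811bcd08 minus the 0xc8 offset from the oops.

```shell
#!/bin/sh
# Sample System.map excerpt. Only the __d_lookup_rcu line is derived
# from the oops; the other two entries are hypothetical.
map=$(mktemp)
cat > "$map" <<'EOF'
ffffffff811bcda0 T hypothetical_next_function
ffffffff811bcc40 T __d_lookup_rcu
ffffffff811bcb00 T hypothetical_previous_function
EOF

# Print "symbol start-address" for the symbol containing an address.
lookup_symbol() {
    # $1: address as 16 lowercase hex digits, $2: System.map file
    LC_ALL=C sort "$2" | LC_ALL=C awk -v addr="$1" '
        # Appending "" forces string comparison of the hex fields.
        ($1 "") <= (addr "") { sym = $3; start = $1 }
        END { if (sym != "") print sym, start }'
}

lookup_symbol ffffffff811bcd08 "$map"   # prints: __d_lookup_rcu ffffffff811bcc40
```

The System.map must come from exactly the kernel build that crashed, otherwise the addresses will not line up.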

# gdb namei.o
(gdb) list *(lookup_fast+0x3e)
0x48fe is in lookup_fast (fs/namei.c:1551).
1546 * going to fall back to non-racy lookup.
1547 */
1548 if (nd->flags & LOOKUP_RCU) {
1549 unsigned seq;
1550 bool negative;
1551 dentry = __d_lookup_rcu(parent, &nd->last, &seq);
1552 if (unlikely(!dentry)) {
1553 if (unlazy_walk(nd, NULL, 0))
1554 return -ECHILD;
1555 return 0;

# cd /usr/src/kernels/KERNEL
# grep -r lookup_fast
.....
fs/namei.c:static int lookup_fast(struct nameidata *nd,
.....

Using the crash utility
➢ Download and build the latest version
➢ https://github.com/crash-utility/crash
➢ Run it on the same architecture
➢ mixing 32-bit and 64-bit architectures is not supported
➢ Check out the help information
➢ http://people.redhat.com/anderson/help.html

➢ Most used commands:
➢ bt - backtrace
➢ log - print the kernel buffer
➢ ps - list all processes
➢ files - list all file descriptors related to task/PID
➢ whatis - gives you data or type information

➢ Usually you would manually examine the crash
➢ But a small automation may be nice:
# cat extract-basic-info
bt
log
ps
exit
# crash vmlinux vmcore < extract-basic-info > report
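The same idea extends to generating the command file, for example to collect the backtrace and open files of a few interesting tasks. A sketch, assuming the `bt` and `files` commands are given a PID as argument; the function name `make_crash_script`, the file name per-task-info and the choice of PIDs are only illustrative.

```shell
#!/bin/sh
# Emit a crash command file: system info first, then a backtrace and
# the open files for every PID passed as an argument.
make_crash_script() {
    echo "sys"
    for pid in "$@"; do
        echo "bt $pid"
        echo "files $pid"
    done
    echo "exit"
}

# PIDs are placeholders; pick them from the ps output of the dump.
make_crash_script 5093 1 > per-task-info

# Printed, not executed: it needs the real vmlinux and vmcore.
echo "crash vmlinux vmcore < per-task-info > report"
```

Keeping `exit` as the last command matters, otherwise crash stays at its prompt waiting for more input.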

RIP: 0010:[<ffffffff9bd376d0>] [<ffffffff9bd376d0>] __list_del_entry+0x0/0xb0
RSP: 0018:ffff88002a4e3d20 EFLAGS: 00010006
RAX: dead000000000100 RBX: dead000000000100 RCX: 0000000000000001
RDX: 0000000000000101 RSI: 0000000000000001 RDI: dead000000000100
RBP: ffff88006a6e6028 R08: 0000000000000101 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88006a6e6008
R13: 0000000000000246 R14: deacffffffffff18 R15: ffff880036a22098
FS: 00007f2970ff9700(0000) GS:ffff88006fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efdc457c000 CR3: 0000000069a27000 CR4: 00000000000006f0

➢ Code Segment (CS): 0010
➢ the two lowest bits hold the privilege level (CPL)
➢ if the value is even (CPL 0)
➢ kernel space
➢ if it is odd (CPL 3)
➢ user space
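The check can be done mechanically by masking the two lowest bits of the CS value. In this sketch the function name `cs_privilege` is made up; 0x0010 is the value from the register dump above, and 0x0033 is the usual CS of a 64-bit user-space process on x86_64 Linux, included for contrast.

```shell
#!/bin/sh
# Decode the privilege level from a CS selector value (hex, e.g. 0x0010).
cs_privilege() {
    cpl=$(( $1 & 3 ))    # the two lowest bits of the selector
    case $cpl in
        0) echo "kernel space (CPL 0)" ;;
        3) echo "user space (CPL 3)" ;;
        *) echo "CPL $cpl" ;;
    esac
}

cs_privilege 0x0010    # prints: kernel space (CPL 0)
cs_privilege 0x0033    # prints: user space (CPL 3)
```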

[4005105.249407] Kernel panic - not syncing: Hard LOCKUP
[4005105.249409] CPU: 16 PID: 18891 Comm: kworker/u96:0 Tainted: G O 4.4.14-clouder2 #1
[4005105.249411] Workqueue: ipoib_wq ipoib_mcast_join_task [ib_ipoib]
[4005105.249412] 0000000000000000 ffff883fff285b10 ffffffff812f4269 ffffffff81a05545
[4005105.249414] ffff883fff285ba0 ffff883fff285b90 ffffffff8112738d ffffffff00000008
[4005105.249416] ffff883fff285ba0 ffff883fff285b38 0000000000000000 0000000000000046
[4005105.249417] Call Trace:
[4005105.249418] <NMI> [<ffffffff812f4269>] dump_stack+0x67/0x9e
[4005105.249422] [<ffffffff8112738d>] panic+0xc5/0x20b
[4005105.249424] [<ffffffff810e1dcd>] watchdog_overflow_callback+0xdd/0xe0
[4005105.249426] [<ffffffff8111f5f8>] __perf_event_overflow+0x88/0x250
[4005105.249427] [<ffffffff81120174>] perf_event_overflow+0x14/0x20
[4005105.249429] [<ffffffff8101e228>] intel_pmu_handle_irq+0x1c8/0x430
[4005105.249432] [<ffffffff81165bc6>] ? vunmap_page_range+0x1a6/0x310
[4005105.249434] [<ffffffff81165d41>] ? unmap_kernel_range_noflush+0x11/0x20
[4005105.249436] [<ffffffff81382ab8>] ? ghes_copy_tofrom_phys+0x118/0x1e0
[4005105.249437] [<ffffffff81034cff>] ? native_apic_wait_icr_idle+0x1f/0x30
[4005105.249439] [<ffffffff8100a275>] ? arch_irq_work_raise+0x35/0x40
[4005105.249441] [<ffffffff81016b48>] perf_event_nmi_handler+0x28/0x50
[4005105.249443] [<ffffffff81008efd>] nmi_handle+0x6d/0x140
[4005105.249445] [<ffffffff81009480>] default_do_nmi+0x40/0x100
[4005105.249446] [<ffffffff81009641>] do_nmi+0x101/0x150
[4005105.249447] [<ffffffff81616687>] end_repeat_nmi+0x1a/0x1e
[4005105.249450] [<ffffffffa02dd7fc>] ? ipoib_mcast_join_task+0x14c/0x330 [ib_ipoib]

[4005105.249450] [<ffffffffa02dd7fc>] ? ipoib_mcast_join_task+0x14c/0x330 [ib_ipoib]
crash> list *(ipoib_mcast_join_task+0x14c)
list: invalid argument: *(ipoib_mcast_join_task+0x14c)
crash>
# grep -r ipoib_mcast_join_task
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:void ipoib_mcast_join_task(struct work_struct *work)
# gdb drivers/infiniband/ulp/ipoib/ipoib_multicast.o
(gdb) list *(ipoib_mcast_join_task+0x14c)
0xffc is in ipoib_mcast_join_task (drivers/infiniband/ulp/ipoib/ipoib_multicast.c:641)
636 }
637 } else if (!delay_until ||
638 time_before(mcast->delay_until, delay_until))
639 delay_until = mcast->delay_until;
640 }
641 }
642
643 mcast = NULL;
644 ipoib_dbg_mcast(priv, "successfully started all multicast joins\n");
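The `list: invalid argument` error appears because crash has its own `list` command, which walks kernel linked lists, so the gdb-style `list *(...)` never reaches gdb. A possible way to stay inside crash, sketched under the assumption that debug information for ib_ipoib is available where crash can find it: load the module symbols with `mod -s`, then ask for source line information with `dis -l`, or hand the request to the embedded gdb with the `gdb` prefix. The file name resolve-module-symbol is hypothetical, and whether source lines resolve depends on the installed debuginfo.

```shell
#!/bin/sh
# Write a crash command file that resolves an offset inside a module.
cat > resolve-module-symbol <<'EOF'
mod -s ib_ipoib
dis -l (ipoib_mcast_join_task+0x14c)
gdb list *(ipoib_mcast_join_task+0x14c)
exit
EOF

# Printed, not executed: it needs the real vmlinux and vmcore.
echo "crash vmlinux vmcore < resolve-module-symbol"
```

If crash cannot locate the module object on its own, `mod -s` also accepts the path to the object file as a second argument.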
645

# crash vmlinux vmcore
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
....
5093 1 4 ffff880856d30c80 IN 0.0 36276 6340 hald
....
crash> whatis ffff880856d30c80
unsigned long
crash> p ffff880856d30c80
$1 = 18446612168130628736

crash> task 5093
PID: 5093 TASK: ffff880856d30c80 CPU: 4 COMMAND: "hald"
struct task_struct {
state = 1,
stack = 0xffff8808429b0000,
flags = 4211008,
ptrace = 0,
real_cred = 0xffff880845a2ec00,
cred = 0xffff880845a2ec00,
on_cpu = 0,
prio = 120,
static_prio = 120,
normal_prio = 120,

crash> whatis task_struct.cred
struct task_struct {
[1456] const struct cred *cred;
}
crash> whatis struct cred
struct cred {
kuid_t uid;
kuid_t suid;
kuid_t euid;
kuid_t fsuid;
kernel_cap_t cap_inheritable;
kernel_cap_t cap_permitted;
kernel_cap_t cap_effective;
struct user_struct *user;
struct user_namespace *user_ns;
struct group_info *group_info;

crash> struct cred 0xffff880845a2ec00
struct cred {
usage = {
counter = 48
},
uid = {
val = 1849
},
gid = {
val = 1845
},
suid = {
val = 1849
},
sgid = {
val = 1845
},
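The walk from PID to credentials can also be scripted without printing whole structures. This sketch assumes a crash version where `task -R` and `struct` accept a comma-separated member list; the file names inspect-cred and cred-report are hypothetical, while the PID and the cred address are the ones from the session above.

```shell
#!/bin/sh
# Write a crash command file that prints only the interesting members.
cat > inspect-cred <<'EOF'
task -R cred,comm 5093
struct cred.uid,gid,euid 0xffff880845a2ec00
exit
EOF

# Printed, not executed: it needs the real vmlinux and vmcore.
echo "crash vmlinux vmcore < inspect-cred > cred-report"
```

The cred address is specific to this dump; in another dump it has to be read from the `task` output first.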

➢ Most used commands:
➢ sys - show the system information stored in the
crash dump
➢ ipcs - show the shared memory segments
➢ vm - examine the virtual memory in the crash dump
➢ dev - list all devices

