Crash Dump Analysis 101
 

Introduction to illumos Crash Dump Analysis


    • CRASH DUMP ANALYSIS 101
      John S. Howard (john.howard@nexenta.com)
      © Copyright Nexenta 2012
    • AGENDA
        • Terminology
        • Core Dumps and Crash Dumps
        • C Language Basics
        • The Mechanism of a Panic
        • mdb Overview
        • Basic Crash Dump Analysis
    • PROCESS, THREAD, LWP
        • Process
            • A program in execution
            • May be composed of one or more threads (LWPs)
        • Thread
            • The smallest unit of scheduling
            • Threads in a process share its address space and resources
        • Light Weight Process (LWP)
            • The kernel-visible execution context in which a user thread runs
            • Each LWP is backed one-to-one by a kernel thread; illumos maps each user thread to its own LWP
            • Provides user-level multitasking
    • INTERRUPTS AND TRAPS
        • Interrupts are asynchronous messages notifying the kernel of external device events
        • Some interrupts are handled as traps
        • Traps are synchronous messages, essentially a software interrupt
        • Bus errors are issued to a processor when it references a location that can't be resolved or located
    • HANGS, CRASHES, AND PANICS
        • Hang
            • Potentially limited or no forensic information
            • System up, but unresponsive
        • Crash
            • Potentially limited forensic information
            • System down or rebooted
        • Panic
            • Maximum potential forensic information
            • System down or rebooted
    • FORENSIC INFORMATION SOURCES
        • Console
        • syslog, typically logged to /var/adm/messages
        • Core file or crash dump
    • CORE FILE
        • A dump of the contents of all memory allocated to the process
        • An inert, static record of state
        • Process core files are dumped to the process's working directory by default
        • Core file properties are managed via coreadm
        • Reading a core file requires the same libraries the process was using
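Independently of the coreadm settings, a process's RLIMIT_CORE resource limit (the value `ulimit -c` reports) caps the size of any core file it writes, and a limit of 0 suppresses the core file entirely. A minimal sketch, assuming a POSIX system such as illumos, that raises the soft limit to the hard limit; `enable_core_dumps()` is a hypothetical helper name, not part of any Nexenta or illumos tool.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: raise this process's soft core-file size limit
 * to its hard limit. Returns 0 on success, -1 if getrlimit() or
 * setrlimit() fails (errno is left as set by the failing call).
 */
static int
enable_core_dumps(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_CORE, &rl) != 0)
		return (-1);
	/* An unprivileged process may raise its soft limit up to the hard limit. */
	rl.rlim_cur = rl.rlim_max;
	if (setrlimit(RLIMIT_CORE, &rl) != 0)
		return (-1);
	return (0);
}
```

Called early in main(), this lets a later fatal signal whose default action dumps core (or a call to abort()) leave a core file wherever coreadm directs it, which is the working directory by default.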
    • CRASH DUMP
        • A dump of the contents of all memory allocated to the kernel
        • An inert, static record of state
        • Written to the pre-specified dump device or swap partition
        • Written "backwards"
        • Reading requires the same OS version
        • The kernel core file facility is managed via dumpadm
    • DUMPADM
        • dumpadm with no options shows the current settings:
            # dumpadm
                  Dump content: kernel pages
                   Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
            Savecore directory: /var/crash/myhost
              Savecore enabled: yes
        • To force a crash dump:
            # savecore -L
        • Note that savecore -L does not quiesce the system, so memory contents keep changing while the dump is taken
        • To force a panic and crash dump (these take the system down):
            # uadmin 5 0
            # reboot -dn
    • PANIC
        • The kernel has detected an inconsistency
        • It protects the system by stopping rather than continuing
        • Three major tasks are performed in a system panic:
            • record information about the panic in memory (making it part of the crash dump)
            • synchronize the file systems to preserve user file data
            • generate the crash dump
    • C PROGRAMMING LANGUAGE DATATYPES
        • Built-ins
            • int, float, char
        • struct
            • A grouping of data
        • union
            • Variant records
            • All constituent data items are overlaid
        • typedef
        • Pointers
            • A reference to a memory location
    • C DATATYPES EXAMPLES
        int ap;
        char buf[128];
        int *user = sr;
        typedef struct smb_mtype {
                char    *mt_name;
                int     mt_namelen;
                int     mt_flags;
        } smb_mtype_t;
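A small compilable sketch of these datatypes. smb_mtype_t is copied from the slide; word_t, low_address_byte() and name_length() are hypothetical names added only to show a union's overlaid members and a pointer dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* struct: members are laid out one after another, each with its own storage. */
typedef struct smb_mtype {
	char	*mt_name;
	int	mt_namelen;
	int	mt_flags;
} smb_mtype_t;

/*
 * union (hypothetical type): every member starts at the same address,
 * so writing one member and reading another reinterprets the same bytes.
 */
typedef union word {
	uint32_t	w_u32;
	uint8_t		w_bytes[4];
} word_t;

/* Return the byte stored at the lowest address of a 32-bit value. */
static uint8_t
low_address_byte(uint32_t v)
{
	word_t w;

	w.w_u32 = v;
	return (w.w_bytes[0]);	/* 0x78 for 0x12345678 on little-endian x86 */
}

/* Pointer: follow a pointer to a struct to reach one of its members. */
static int
name_length(const smb_mtype_t *mt)
{
	return (mt->mt_namelen);
}
```

The byte returned by low_address_byte() depends on the machine's byte order, which is exactly the kind of overlay effect to keep in mind when reading raw memory in a dump.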
    • C FUNCTIONS
        • Declaration
        • Definition
        • Parameters are passed by value
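Because C passes every argument by value, a callee can change its caller's data only through a pointer it was handed (the pointer itself is still a copy). A minimal sketch; set_copy() and set_through_pointer() are hypothetical names.

```c
#include <assert.h>

/* x is a private copy of the caller's argument; changing it has no effect outside. */
static void
set_copy(int x)
{
	x = 42;
	(void) x;
}

/*
 * The pointer is also passed by value, but it holds the caller's
 * address, so writing through it changes the caller's variable.
 */
static void
set_through_pointer(int *xp)
{
	*xp = 42;
}
```

For example, with `int a = 1;`, calling `set_copy(a)` leaves `a` at 1, while `set_through_pointer(&a)` sets it to 42.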
    • C FUNCTION EXAMPLES
        Declaration:
            static void smb_tree_log(smb_request_t *, const char *,
                const char *, ...);
        Definition:
            static void
            smb_tree_log(smb_request_t *sr, const char *sharename,
                const char *fmt, ...)
            {
                    .
                    .
                    .
            }
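The trailing `...` makes smb_tree_log() variadic: such a function walks its extra arguments with the <stdarg.h> macros, typically handing them to a vprintf-family function. The sketch below is a hypothetical user-space function with the same shape that formats into a caller-supplied buffer; it is not the illumos implementation, whose body the slide elides.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: format "<sharename>: <message>" into buf, where the
 * message comes from fmt and the variable arguments after it.
 * Returns the length the full string would have, as snprintf() does,
 * or a negative value on an encoding error.
 */
static int
tree_log_sketch(char *buf, size_t len, const char *sharename,
    const char *fmt, ...)
{
	va_list ap;
	size_t used;
	int n, m;

	n = snprintf(buf, len, "%s: ", sharename);
	if (n < 0)
		return (n);
	/* Clamp so a truncated prefix never pushes us past the buffer. */
	used = ((size_t)n < len) ? (size_t)n : len;

	va_start(ap, fmt);
	m = vsnprintf(buf + used, len - used, fmt, ap);
	va_end(ap);

	return (m < 0 ? m : n + m);
}
```

For example, `tree_log_sketch(buf, sizeof (buf), "public", "%s connected (%d)", "alice", 3)` fills buf with `public: alice connected (3)`.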
    • PANIC()
        • panic(), cmn_err()
            • Common entry points for vpanic()
            • Responsible for providing panic information
        • die()
        • vpanic()
            • Assembly language function for saving register state
        • ASSERT(condition)
            • Halts execution of the kernel if condition is false
            • Evaluated and executed only when the DEBUG compilation symbol is defined
        • VERIFY(condition)
            • Similar to ASSERT, but active even when DEBUG isn't defined
            • The stack will contain assfail() near the top
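A user-space sketch of the ASSERT/VERIFY pattern, loosely modeled on the macros in illumos <sys/debug.h>. MY_ASSERT, MY_VERIFY and my_assfail() are hypothetical names; in the kernel the failure path ends in a panic, whereas here it prints a message and calls abort().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical failure handler standing in for the kernel's assfail(). */
static int
my_assfail(const char *expr, const char *file, int line)
{
	(void) fprintf(stderr, "assertion failed: %s, file: %s, line: %d\n",
	    expr, file, line);
	abort();
	/* NOTREACHED */
	return (0);
}

/* Always compiled in: the condition is evaluated in every build. */
#define	MY_VERIFY(EX)	((void)((EX) || my_assfail(#EX, __FILE__, __LINE__)))

/* Compiled in only when DEBUG is defined; otherwise EX is never evaluated. */
#ifdef DEBUG
#define	MY_ASSERT(EX)	MY_VERIFY(EX)
#else
#define	MY_ASSERT(EX)	((void)0)
#endif
```

Because the non-DEBUG MY_ASSERT discards its argument, any side effect written inside an assertion silently disappears from non-DEBUG builds.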
    • EXAMPLE 1: PANIC STRING
        panic[cpu1]/thread=ffffff000e4e7c60: BAD TRAP: type=e (#pf Page fault)
        rp=ffffff000e4e77c0 addr=0 occurred in module "unix" due to a NULL pointer dereference
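In a BAD TRAP message, `type=` is the x86 exception vector number in hex (e is 14, the page-fault vector; 0 is the divide-error vector), and for a page fault `addr=` is the address whose access faulted. A minimal C sketch that extracts both fields and names a few vectors; parse_bad_trap(), trap_mnemonic(), the subset of vectors, and the first-page heuristic in looks_like_null_deref() are illustrative assumptions, not the kernel's own decoding logic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mnemonics for a few x86 exception vectors (a subset, for illustration). */
static const char *
trap_mnemonic(unsigned type)
{
	switch (type) {
	case 0x0:
		return ("#de Divide error");
	case 0x6:
		return ("#ud Invalid opcode");
	case 0x8:
		return ("#df Double fault");
	case 0xd:
		return ("#gp General protection");
	case 0xe:
		return ("#pf Page fault");
	case 0x12:
		return ("#mc Machine check");
	default:
		return ("(not in this table)");
	}
}

/*
 * Pull "type=<hex>" and "addr=<hex>" out of a BAD TRAP panic string.
 * Returns 0 on success, -1 if either field is missing or malformed.
 */
static int
parse_bad_trap(const char *s, unsigned *typep, unsigned long long *addrp)
{
	const char *t = strstr(s, "type=");
	const char *a = strstr(s, "addr=");

	if (t == NULL || a == NULL)
		return (-1);
	if (sscanf(t, "type=%x", typep) != 1 ||
	    sscanf(a, "addr=%llx", addrp) != 1)
		return (-1);
	return (0);
}

/*
 * Heuristic (an assumption): a page fault on an address inside the
 * first 4 KB page usually means a NULL pointer, or a small offset
 * from one, was dereferenced.
 */
static int
looks_like_null_deref(unsigned type, unsigned long long addr)
{
	return (type == 0xe && addr < 0x1000);
}
```

Applied to this panic string, the sketch yields vector 0xe (#pf) with addr 0, which the heuristic flags as a likely NULL pointer dereference, consistent with the kernel's own message.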
    • EXAMPLE 1: STACK TRACE
        ffffff000e4e76a0 unix:die+dd ()
        ffffff000e4e77b0 unix:trap+177b ()
        ffffff000e4e77c0 unix:cmntrap+e6 ()
        ffffff000e4e78c0 unix:strcasecmp+16 ()
        ffffff000e4e7a50 smbsrv:smb_tree_log+b3 ()
        ffffff000e4e7a90 smbsrv:smb_tree_connect_core+14a ()
        ffffff000e4e7ac0 smbsrv:smb_tree_connect+35 ()
        ffffff000e4e7ae0 smbsrv:smb_com_tree_connect_andx+16 ()
        ffffff000e4e7b80 smbsrv:smb_dispatch_request+4a9 ()
        ffffff000e4e7bb0 smbsrv:smb_session_worker+6c ()
        ffffff000e4e7c40 genunix:taskq_d_thread+b1 ()
        ffffff000e4e7c50 unix:thread_start+8 ()
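Each line of the panic stack trace has the form `<frame pointer> <module>:<function>+<hex offset> ()`. Reading from the top, die(), trap() and cmntrap() are the trap-handling path, so the first frame below them (here strcasecmp(), called from smb_tree_log()) is where the fault happened. A minimal C sketch that splits one such line into its fields; struct frame and parse_frame() are hypothetical names, and the parser does no validation beyond its sscanf() pattern.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical representation of one frame from a panic stack trace. */
struct frame {
	unsigned long long	fp;		/* frame pointer (first column) */
	char			module[64];	/* e.g. "unix", "smbsrv" */
	char			func[128];	/* e.g. "strcasecmp" */
	unsigned		offset;		/* offset into func, printed in hex */
};

/*
 * Parse "ffffff000e4e78c0 unix:strcasecmp+16 ()" into a struct frame.
 * Returns 0 on success, -1 if the line does not match that shape.
 */
static int
parse_frame(const char *line, struct frame *f)
{
	if (sscanf(line, "%llx %63[^:]:%127[^+]+%x",
	    &f->fp, f->module, f->func, &f->offset) != 4)
		return (-1);
	return (0);
}
```

For example, the frame `ffffff000e4e7a50 smbsrv:smb_tree_log+b3 ()` parses to module `smbsrv`, function `smb_tree_log`, offset 0xb3.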
    • MDB – MODULAR DEBUGGER
        • Extensible utility for low-level debugging and editing
        • On the live kernel:
            # mdb -k
            # mdb -kw     (to edit; VERY DANGEROUS)
        • On a core file:
            mdb syseventd.core.125
        • On a crash dump:
            # mdb -k unix.3 vmcore.3
    • ANALYZE-CRASH.SH
        • Extracts the crash dump from the dump device (savecore -vf filename) if necessary
        • Scripted mdb commands for basic crash information:
            • Panic string and registers
            • dmesg buffer
            • Stack
            • Thread list
        • Executed automatically by the NMC `support` command (NS 3.1.2 and later)
    • HAVE I SEEN THIS BEFORE?
        • Footprints
            • Known problem or new?
        • Redmine
            • Search illumos Hg issues: https://www.illumos.org/issues/
        • SunSolve is gone; however, "We Sun Solve" is rescuing the data from SunSolve.Sun.COM: http://wesunsolve.net/bsearch
        • illumos source browser: http://src.illumos.org/source/
    • EXAMPLE 2: PANIC INFO
        panic[cpu5]/thread=ffffff000fd72c60: BAD TRAP: type=0 (#de Divide error) rp=ffffff000fd72a40 addr=ffffff02da92e900

        sched: #de Divide error
        addr=0xffffff02da92e900
        pid=0, pc=0xfffffffff7ad977b, sp=0xffffff000fd72b30, eflags=0x10246
        cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
        cr2: fffffd7fff2a60c8 cr3: 5000000 cr8: c

        rdi: ffffff02d282e840 rsi:                0 rdx:                0
        rcx:               64  r8: ffffff000fd72c60  r9:                0
        rax:                0 rbx:                0 rbp: ffffff000fd72b90
        r10:                0 r11: ffffff02f46e8264 r12: ffffff02da316338
        r13: ffffff02da3163d0 r14: ffffff02d5061a50 r15: ffffff02da92e900
        fsb:                0 gsb: ffffff02da9a1540  ds:               4b
         es:               4b  fs:                0  gs:              1c3
        trp:                0 err:                0 rip: fffffffff7ad977b
         cs:               30 rfl:            10246 rsp: ffffff000fd72b30
         ss:               38
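The flags values in this dump are bit masks: eflags=0x10246 (rfl: 10246) has PF, ZF, IF and RF set, plus bit 1, which is reserved and always reads as 1. A minimal C sketch of that decoding; the table covers only a subset of the architecturally defined RFLAGS bits, and decode_rflags() and its lowercase names are choices made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A subset of the architecturally defined x86 RFLAGS bits. */
static const struct {
	unsigned	bit;
	const char	*name;
} rflags_bits[] = {
	{ 0, "cf" }, { 2, "pf" }, { 4, "af" }, { 6, "zf" },
	{ 7, "sf" }, { 8, "tf" }, { 9, "if" }, { 10, "df" },
	{ 11, "of" }, { 14, "nt" }, { 16, "rf" }, { 17, "vm" },
	{ 18, "ac" }, { 21, "id" },
};

/*
 * Write the names of the set flags into buf, comma separated and
 * lowest bit first, truncating if buf is too small. Bits missing from
 * the table (such as reserved bit 1) are skipped. Returns buf.
 */
static char *
decode_rflags(uint64_t rfl, char *buf, size_t len)
{
	size_t i;

	if (len == 0)
		return (buf);
	buf[0] = '\0';
	for (i = 0; i < sizeof (rflags_bits) / sizeof (rflags_bits[0]); i++) {
		if ((rfl & (1ULL << rflags_bits[i].bit)) == 0)
			continue;
		if (buf[0] != '\0')
			(void) strncat(buf, ",", len - strlen(buf) - 1);
		(void) strncat(buf, rflags_bits[i].name, len - strlen(buf) - 1);
	}
	return (buf);
}
```

For the value in this dump, `decode_rflags(0x10246, buf, sizeof (buf))` produces `pf,zf,if,rf`.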
    • EXAMPLE 2: STACK
        ffffff000fd72920 unix:die+10f ()
        ffffff000fd72a30 unix:trap+1555 ()
        ffffff000fd72a40 unix:cmntrap+e6 ()
        ffffff000fd72b90 cpudrv:cpudrv_monitor+1cb ()
        ffffff000fd72c40 genunix:taskq_thread+285 ()
        ffffff000fd72c50 unix:thread_start+8 ()

        syncing file systems... done
        dumping to /dev/zvol/dsk/syspool/dump, offset 65536, content: kernel + curproc

        STACK ---
        ffffff000fd72b90 cpudrv_monitor+0x1cb(ffffff02da316338)
        ffffff000fd72c40 taskq_thread+0x285(ffffff02da859140)
        ffffff000fd72c50 thread_start+8()
    • EXAMPLE 2: THREAD LIST
        ffffff000fd72c60 fffffffffbc2dbf0                0   0  60                0
          PC: panicsys+0x9b    TASKQ: cpudrv_cpudrv_monitor
          stack pointer for thread ffffff000fd72c60: ffffff000fd726e0
            xc_insert+0x36()
            0xffffff0200000000()
            cpudrv_monitor+0x1cb()
            taskq_thread+0x285()
            thread_start+8()
    • EXAMPLE 2: SOURCE CODE
        From cpudrv_monitor(), lines 1109-1113:

            /*
             * Adjust counts based on the delay added by timeout and taskq.
             */
            idle_cnt = (idle_cnt * cur_spd->quant_cnt) / tick_cnt;
            user_cnt = (user_cnt * cur_spd->quant_cnt) / tick_cnt;
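On x86, integer division raises #de (the trap in this panic) when the divisor is zero or the quotient overflows, and both statements above divide by tick_cnt. One plausible reading, which the choice of excerpt suggests but the dump shown here does not prove, is that tick_cnt was 0. A minimal sketch of a guarded version under that assumption; speed_sketch_t and adjust_counts() are hypothetical stand-ins, not the actual cpudrv structures or any official fix.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the one field the excerpt reads from cur_spd. */
typedef struct speed_sketch {
	uint64_t	quant_cnt;
} speed_sketch_t;

/*
 * Scale the idle and user counts by quant_cnt / tick_cnt, as the
 * excerpt does, but refuse to divide when tick_cnt is zero.
 * Returns 0 if the counts were adjusted, -1 if they were left untouched.
 */
static int
adjust_counts(uint64_t *idle_cnt, uint64_t *user_cnt,
    const speed_sketch_t *cur_spd, uint64_t tick_cnt)
{
	if (tick_cnt == 0)
		return (-1);	/* dividing here would raise #de on x86 */
	*idle_cnt = (*idle_cnt * cur_spd->quant_cnt) / tick_cnt;
	*user_cnt = (*user_cnt * cur_spd->quant_cnt) / tick_cnt;
	return (0);
}
```

Skipping the adjustment is only one option; a real fix would depend on what a zero tick count means to the surrounding monitor logic.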
    • HARDWARE, FIRMWARE, OR SOFTWARE?
        • Crash dumps are inconclusive on hardware errors
        • Correlate with fmdump output
        • PCI-X panics are the most common hardware-caused panic
        • PCI Vendor Database: http://pcidatabase.com
        • KB Article: "Understanding and decoding PCI(-X) Express Fatal Error panics"
    • EXAMPLE 3: PANIC STRING AND STACK TRACE
        panic[cpu7]/thread=ffffff005cbdbc60: pcieb-3: PCI(-X) Express Fatal Error. (0x101)

        ffffff005cbdbbb0 pcieb:pcieb_intr_handler+228 ()
        ffffff005cbdbc00 unix:av_dispatch_autovect+7c ()
        ffffff005cbdbc40 unix:dispatch_hardint+33 ()
        ffffff005cbaba80 unix:switch_sp_and_call+13 ()
        ffffff005cbabad0 unix:do_interrupt+b8 ()
        ffffff005cbabae0 unix:_interrupt+b8 ()
        ffffff005cbabbd0 unix:i86_mwait+d ()
        ffffff005cbabc20 unix:cpu_idle_mwait+f1 ()
        ffffff005cbabc40 unix:idle+114 ()
        ffffff005cbabc50 unix:thread_start+8 ()
    • IDENTIFYING THE PCI-X COMPONENT
        Mar 30 2011 00:53:53.606674454 ereport.io.pci.fabric
        nvlist version: 0
                class = ereport.io.pci.fabric
                ena = 0xbcd565541a801401
                detector = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = dev
                        device-path = /pci@0,0/pci8086,3408@1
                (end detector)
                bdf = 0x8
                device_id = 0x3408
                vendor_id = 0x8086
    • IDENTIFYING THE VENDOR
        Device ID  Chip Description                   Vendor ID  Vendor Name
        0x3408     Intel 7500 Chipset PCIe Root Port  0x8086     Intel Corporation

        device-path = /pci@0,0/pci8086,3408@1
        device-path = /pci@0,0/pci8086,3408@1/pci108e,484c@0
        device-path = /pci@0,0/pci8086,3408@1/pci108e,484c@0,1

        If neither the PCI vendor database nor `/usr/share/hwdata/pci.ids` has an entry, grep `/etc/path_to_inst`:

        "/pci@0,0/pci8086,3408@1" 0 "pcie_pci"
        "/pci@0,0/pci8086,3408@1/pci108e,484c@0" 0 "igb"
        "/pci@0,0/pci8086,3408@1/pci108e,484c@0,1" 1 "igb"

        igb is the Intel Gigabit NIC driver
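Each device-path component such as `pci8086,3408@1` carries two hex IDs and, after the `@`, a unit address. Under the IEEE 1275 PCI binding those IDs are the vendor and device IDs only when the device has no subsystem IDs; otherwise they are the subsystem vendor and subsystem IDs (0x108e in `pci108e,484c` is Sun's vendor ID, not Intel's). A minimal C sketch that splits the last path component, and one `/etc/path_to_inst` line, into fields; parse_pci_node() and parse_path_to_inst() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Split the last component of a device path, e.g. "/pci@0,0/pci8086,3408@1",
 * into its two hex IDs and the unit address after '@'.
 * Returns 0 on success, -1 if the component is not "pci<hex>,<hex>@<unit>".
 */
static int
parse_pci_node(const char *path, unsigned *id1, unsigned *id2,
    char *unit, size_t unitlen)
{
	const char *last = strrchr(path, '/');
	char ubuf[32];

	last = (last == NULL) ? path : last + 1;
	if (sscanf(last, "pci%x,%x@%31s", id1, id2, ubuf) != 3)
		return (-1);
	(void) snprintf(unit, unitlen, "%s", ubuf);
	return (0);
}

/*
 * Split one /etc/path_to_inst line, e.g.
 *     "/pci@0,0/pci8086,3408@1/pci108e,484c@0" 0 "igb"
 * into the physical path, instance number and driver name.
 * Returns 0 on success, -1 on a malformed line.
 */
static int
parse_path_to_inst(const char *line, char *path, size_t pathlen,
    int *instp, char *drv, size_t drvlen)
{
	char p[256], d[64];

	if (sscanf(line, " \"%255[^\"]\" %d \"%63[^\"]\"", p, instp, d) != 3)
		return (-1);
	(void) snprintf(path, pathlen, "%s", p);
	(void) snprintf(drv, drvlen, "%s", d);
	return (0);
}
```

Matching the fmdump `device-path` against the paths in `/etc/path_to_inst` is what ties the faulting PCIe root port to the igb instances beneath it.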
    • DETERMINE DRIVER AND PACKAGE DETAILS
        # dpkg -S igb | grep /kernel
        sunwigb: /var/lib/dpkg/alien/sunwigb/reloc/kernel/drv/igb.conf
        sunwigb: /kernel/drv/amd64/igb
        sunwigb: /var/lib/dpkg/alien/sunwigb/reloc/kernel/drv
        sunwigb: /kernel/drv/igb
        sunwigb: /var/lib/dpkg/alien/sunwigb/reloc/kernel

        Examine the package details:
        # dpkg -l sunwigb
        Desired=Unknown/Install/Remove/Purge/Hold
        | Status=Not/Installed/Config-f/Unpacked/Failed-cfg/Half-inst/t-aWait/T-pend
        |/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
        ||/ Name                    Version                Description
        +++-=======================-======================-======================================
        ii  sunwigb                 5.11.134-31-8234-1     Intel 82575 1Gb PCI Express NIC Driver
    • A PCI-X CONCLUSION, OF SORTS
        • Searching Redmine for "igb driver" will find a bug, but also check for any Intel 82575 gigabit issues
        • Next, determine:
            • Is the driver down-revision?
            • Is the firmware down-revision?
        • If the driver and firmware are current, then this is most likely a hardware problem
        • Crash dump analysis (CDA) is inconclusive for proving hardware failures