Problem Reporting and Analysis Linux on System z - How to survive a Linux Critical Situation
 


Learn about systems monitoring, how to dump a Linux on System z, real customer cases, and more. For more information, visit http://ibm.co/PNo9Cb.



    • Problem Reporting and Analysis Linux on System z - How to survive a Linux Critical Situation
      Sven Schuetz, Linux on System z Development and Service, sven@de.ibm.com
      © 2011 IBM Corporation
    • IBM Live Virtual Class – Linux on System z
      Agenda
      – Introduction
      – How to help us to help you
      – Systems monitoring
      – How to dump a Linux on System z
      – Real customer cases
    • Introductory remarks
      – Problem analysis looks straightforward on the charts, but it might have taken weeks to get it done.
      – A problem does not necessarily show up at its place of origin.
      – The more information is available, the sooner the problem can be solved, because gathering and submitting additional information again and again usually introduces delays.
      – This presentation can only introduce some tools and how they can be used; comprehensive documentation of their capabilities is found in the documentation of the corresponding tool.
      – Do not forget to update your systems.
    • Describe the problem
      Get as much information as possible about the circumstances:
      – What is the problem?
      – When did it happen? (date and time, important for digging into logs)
      – Where did it happen? One or more systems, production or test environment?
      – Is this a first-time occurrence?
      – If it occurred before: how frequently does it occur?
      – Is there any pattern?
      – Was anything changed recently?
      – Is the problem reproducible?
      Write down as much information as possible about the problem!
    • Describe the environment
      Machine setup
      – Machine type (z196, z10, z9, ...)
      – Storage server (ESS800, DS8000, other vendors' models)
      – Storage attachment (FICON, ESCON, FCP, how many channels)
      – Network (OSA (type, mode), HiperSockets) ...
      Infrastructure setup
      – Clients
      – Other computer systems
      – Network topologies
      – Disk configuration
      Middleware setup
      – Databases, web servers, SAP, TSM (including version information)
    • Troubleshooting First-Aid Kit (1/2)
      Install packages required for debugging
      – s390-tools/s390-utils
        • dbginfo.sh
      – sysstat
        • sadc/sar
        • iostat
      – procps
        • vmstat, top, ps
      – net-tools
        • netstat
      – dump tools crash / lcrash
        • lcrash (lkcdutils) available with SLES9 and SLES10
        • crash available on SLES11
        • crash in all RHEL distributions
    • Troubleshooting First-Aid Kit (2/2)
      Collect dbginfo.sh output
      – Proactively in a healthy system
      – When problems occur, then compare with the healthy system
      Collect system data
      – Always archive syslog (/var/log/messages)
      – Start the sadc (System Activity Data Collection) service when appropriate (please include disk statistics)
      – Collect z/VM MONWRITE data if running under z/VM, when appropriate
      When the system hangs
      – Take a dump
        • Include System.map, Kerntypes (if available) and the vmlinux file
      – See the “Using the dump tools” book at http://download.boulder.ibm.com/ibmdl/pub/software/dw/linux390/docu/l26ddt02.pdf
      Enable extended tracing in /sys/kernel/debug/s390dbf for the subsystem in question
    • dbginfo Script (1/2)
      – dbginfo.sh is a script that collects various system-related files for debugging purposes. It generates a tar archive which can be attached to PMRs / Bugzilla entries.
      – It is part of the s390-tools package in SUSE and recent Red Hat distributions.
        • dbginfo.sh is continuously improved by service and development.
      – It can be downloaded directly from the developerWorks website: http://www.ibm.com/developerworks/linux/linux390/s390-tools.html
      – It is similar to the Red Hat tool sosreport or supportconfig from Novell.
      root@larsson:~> dbginfo.sh
      Create target directory /tmp/DBGINFO-2011-01-15-22-06-20-t6345057
      Change to target directory /tmp/DBGINFO-2011-01-15-22-06-20-t6345057
      [...]
    • dbginfo Script (2/2)
      Linux information:
      – /proc/[version, cpu, meminfo, slabinfo, modules, partitions, devices ...]
      – System z specific device driver information: /proc/s390dbf (RHEL 4 only) or /sys/kernel/debug/s390dbf
      – Kernel messages: /var/log/messages
      – Reads configuration files in directory /etc/ [ccwgroup.conf, modules.conf, fstab]
      – Uses several commands: ps, dmesg
      – Query setup scripts
        • lscss, lsdasd, lsqeth, lszfcp, lstape
      – And much more
      z/VM information:
      – Release and service level: q cplevel
      – Network setup: q [lan, nic, vswitch, v osa]
      – Storage setup: q [set, v dasd, v fcp, q pav ...]
      – Configuration/memory setup: q [stor, v stor, xstore, cpus...]
      – When the system runs as a z/VM guest, ensure that the guest has the appropriate privilege class authorities to issue the commands
    • SADC/SAR
      Capture Linux performance data with sadc/sar
      – CPU utilization
      – Disk I/O overview and on device level
      – Network I/O and errors on device level
      – Memory usage/swapping
      – … and much more
      – Reports statistics data over time and creates average values for each item
      SADC example (for more see man sadc)
      – System Activity Data Collector (sadc) --> data gatherer
      – /usr/lib64/sa/sadc [options] [interval [count]] [binary outfile]
      – /usr/lib64/sa/sadc 10 20 sadc_outfile
    • SADC/SAR (contd)
      – /usr/lib64/sa/sadc -d 10 sadc_outfile
      – -d option: statistics for disk
      – Should be started as a service during system start
      SAR example (for more see man sar)
      – System Activity Report (sar) command --> reporting tool
      – sar -A
      – -A option: reports all the collected statistics
      – sar -A -f sadc_outfile > sar_outfile
      Please include the binary sadc data and the sar -A output when submitting SADC/SAR information to IBM support
    • CPU utilization
      Per-CPU values: watch out for
      – system time (kernel time)
      – iowait time (slow I/O subsystem)
      – steal time (time taken by other guests)
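The three values named on this slide can also be derived by hand from the kernel's tick counters. The following is a minimal sketch, assuming the usual /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal); the function name cpu_shares and the tick counts are made up for illustration. Because /proc/stat counts ticks since boot, the result is a share since boot, not a rate over an interval as sar reports it.

```shell
#!/bin/sh
# Print system, iowait and steal time as a percentage of all CPU ticks.
# Input: one aggregate "cpu ..." line in /proc/stat format, read from stdin.
# Assumed field order: user nice system idle iowait irq softirq steal
cpu_shares() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9; i++) total += $i
        if (total == 0) exit 1
        printf "system=%.1f iowait=%.1f steal=%.1f\n",
               100 * $4 / total, 100 * $6 / total, 100 * $9 / total
    }'
}

# Example with made-up tick counts (not measured data):
echo "cpu 600 0 150 100 50 0 0 100" | cpu_shares
```

On a live system the input could be taken with grep '^cpu ' /proc/stat instead of the echo.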
    • Disk I/O rates
      Read/write operations
      – per I/O device
      – tps: transactions
      – rd/wr_secs: sectors
      Is your I/O balanced? Maybe you should stripe your LVs
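One rough way to answer the "is your I/O balanced?" question is to compare each device's tps with the mean over all devices. This is only a sketch: the function name io_imbalance, the factor of 2 and the sample numbers are assumptions for illustration, and the input is a simplified "device tps" list rather than real sar output.

```shell
#!/bin/sh
# Flag devices whose tps exceeds FACTOR times the mean tps of all devices.
# Input on stdin: "<device> <tps>" per line.
io_imbalance() {
    factor=${1:-2}
    awk -v f="$factor" '
        { dev[NR] = $1; tps[NR] = $2; sum += $2 }
        END {
            if (NR == 0) exit
            mean = sum / NR
            for (i = 1; i <= NR; i++)
                if (tps[i] > f * mean) print dev[i]
        }'
}

# Made-up sample: one device carries most of the load.
printf '%s\n' "dasda 10" "dasdb 12" "dasdc 90" "dasdd 8" | io_imbalance 2
```

A device that shows up here repeatedly is a candidate for striping the logical volume across more disks.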
    • Linux on System z dump tools
      DASD dump tool
      – Writes dump directly on DASD partition
      – Uses s390 standalone dump format
      – ECKD and FBA DASDs supported
      – Single volume and multiple volume (for large systems) dump possible
      – Works in z/VM and in LPAR
      SCSI dump tool
      – Writes dump into filesystem
      – Uses lkcd dump format
      – Works in z/VM and in LPAR
      VMDUMP
      – Writes dump to VM spool space (VM reader)
      – z/VM specific dump format, dump must be converted
      – Only available when running under z/VM
      Tape dump tool
      – Writes dump directly on ESCON/FICON tape device
      – Uses s390 standalone dump format
    • DASD dump tool – general usage
      1. Format and partition dump device
         root@larsson:~> dasdfmt -f /dev/dasd<x> -b 4096
         root@larsson:~> fdasd /dev/dasd<x>
      2. Prepare dump device in Linux
         root@larsson:~> zipl -d /dev/dasd<x1>
      3. Stop all CPUs
      4. Store status
      5. IPL dump device
      6. Copy dump to Linux
         root@larsson:~> zgetdump /dev/dasd<x1> > dump_file
    • DASD dump under z/VM
      Prepare the dump device under Linux, if possible in a 64-bit environment:
      root@larsson:~> zipl -d /dev/dasd<x1>
      After the Linux crash, issue these commands on the 3270 console:
      #cp cpu all stop
      #cp cpu 0 store status
      #cp i <dasd_devno>
      Wait until the dump is saved on the device:
      00: zIPL v1.6.0 dump tool (64 bit)
      00: Dumping 64 bit OS
      00: 00000087 / 00000700 MB 0 ...
      00: Dump successful
      – On older distributions, only a disabled wait PSW is shown
      – Attach the dump device to a Linux system with the dump tools installed
      – Store the dump to the Linux file system from the dump device (e.g. zgetdump)
    • DASD dump on LPAR (1/2)
    • DASD dump on LPAR (2/2)
    • Multi volume dump
      zipl can now dump to multiple DASDs. It is now possible to dump system images that are larger than a single DASD.
      – You can specify up to 32 ECKD DASD partitions for a multi-volume dump
      What are dumps good for?
      – Full snapshot of system state taken at any point in time (e.g. after a system has crashed, or of a running system)
      – Can be used to analyse system state beyond messages written to the syslog
      – Internal data structures not exported anywhere
      – Obtain messages which have not been written to the syslog due to a crash
    • Multi volume dump (contd)
      How to prepare a set of ECKD DASD devices for a multi-volume dump? (64-bit systems only)
      – We use two DASDs in this example:
        root@larsson:~> dasdfmt -f /dev/dasdc -b 4096
        root@larsson:~> dasdfmt -f /dev/dasdd -b 4096
      – Create the partitions with fdasd. The sum of the partition sizes must be sufficiently large (the memory size + 10 MB):
        root@larsson:~> fdasd /dev/dasdc
        root@larsson:~> fdasd /dev/dasdd
      – Create a file called sample_dump_conf containing the device nodes (e.g. /dev/dasdc1) of the two partitions, separated by one or more line feed characters
      – Prepare the volumes using the zipl command:
        root@larsson:~> zipl -M sample_dump_conf
        [...]
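The sizing rule and the configuration file from this slide can be sketched in a few lines of shell. The helper names required_dump_mb and write_dump_conf are hypothetical, the 4096 MB guest size is an example value, and the configuration file is written to a temporary file here so the sketch can be run on any Linux system without touching real DASDs.

```shell
#!/bin/sh
# required_dump_mb MEM_MB: minimum total partition size in MB
# (memory size + 10 MB, as stated on the slide).
required_dump_mb() {
    echo $(( $1 + 10 ))
}

# write_dump_conf FILE PARTITION...: write one device node per line,
# which satisfies "separated by one or more line feed characters".
write_dump_conf() {
    conf=$1
    shift
    printf '%s\n' "$@" > "$conf"
}

conf=$(mktemp)
write_dump_conf "$conf" /dev/dasdc1 /dev/dasdd1
cat "$conf"
echo "need at least $(required_dump_mb 4096) MB in total"
# On the real system the next step would be: zipl -M "$conf"
```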
    • Multi volume dump (contd)
      To obtain a dump with the multi-volume DASD dump tool, perform the following steps:
      – Stop all CPUs, store status on the IPL CPU.
      – IPL the dump tool using one of the prepared volumes, either 4711 or 4712.
      – After the dump tool is IPLed, you'll see messages that indicate the progress of the dump. Then you can IPL Linux again.
        #cp cpu all stop
        #cp cpu 0 store status
        #cp ipl 4711
      Copying a multi-volume dump to a file
      – Use zgetdump without any option to copy the dump parts to a file:
        root@larsson:~> zgetdump /dev/dasdc > mv_dump_file
    • Multi volume dump (contd)
      Display information about the involved volumes:
      root@larsson:~> zgetdump -d /dev/dasdc
      /dev/dasdc is part of Version 1 multi-volume dump, which is spread along the following DASD volumes:
      0.0.4711 (online, valid)
      0.0.4712 (online, valid)
      [...]
      Display information about the dump itself:
      root@larsson:~> zgetdump -i /dev/dasdc
      Dump device: /dev/dasdc
      >>>  Dump header information  <<<
      Dump created on: Fri Aug  7 15:12:41 2009
      [...]
      Multi-volume dump: Disk 1 (of 2)
      Reading dump contents from 0.0.4711.................................
      Dump ended on:   Fri Aug  7 15:12:52 2009
      Dump End Marker found: this dump is valid.
    • SCSI dump tool – general usage
      1. Create partition with PCBIOS disk layout (fdisk)
      2. Format partition with ext2 or ext3 filesystem
      3. Install dump tool:
         – Mount and prepare disk:
           root@larsson:~> mount /dev/sda1 /dumps
           root@larsson:~> zipl -D /dev/sda1 -t dumps
         – Optional: /etc/zipl.conf:
           dumptofs=/dev/sda1
           target=/dumps
      4. Stop all CPUs
      5. Store status
      6. IPL dump device
      The dump tool creates dumps directly in the filesystem
      SCSI dump is supported for LPARs and as of z/VM 5.4
    • SCSI dump under z/VM
      SCSI dump from z/VM is supported as of z/VM 5.4
      Issue SCSI dump:
      #cp cpu all stop
      #cp cpu 0 store status
      #cp set dumpdev portname 47120763 00ce93a7 lun 47120000 00000000 bootprog 0
      #cp ipl 4b49 dump
      To access the dump, mount the dump partition
    • SCSI dump on LPAR
      – Select CPC image for LPAR to dump
      – Go to the Load panel
      – Issue SCSI dump
        • FCP device
        • WWPN
        • LUN
    • VMDUMP
      – The only method to dump NSSes or DCSSes under z/VM
      – Works nondisruptively
      – Create dump:
        #cp vmdump to cmsguest
      – Receive dump:
        • Store the dump from the reader into a CMS dump file:
          #cp dumpload
        • Transfer the dump to Linux from CMS, e.g. via FTP
        • NEW: vmur device driver:
          root@larsson:~> vmur rec <spoolid> vmdump
      – Linux tool to convert vmdump to lkcd format:
        root@larsson:~> vmconvert vmdump linux.dump
      – Problem: the dump process is relatively slow
    • How to obtain information about a dump
      Display information about the involved volume:
      root@larsson:~> zgetdump -d /dev/dasdb
      /dev/dasdb is Version 0 dump device.
      Dump size limit: none
      Display information about the dump itself:
      root@larsson:~> zgetdump -i /dev/dasdb1
      Dump device: /dev/dasdb1
      Dump created on: Thu Oct  8 15:44:49 2009
      Magic number:  0xa8190173618f23fd
      Version number:  3
      Header size:  4096
      Page size:  4096
      Dumped memory:  1073741824
      Dumped pages:  262144
      Real memory:  1073741824
      cpu id:  0xff00012320978000
      System Arch:  s390x (ESAME)
      Build Arch:  s390x (ESAME)
      >>>  End of Dump header  <<<
      Dump ended on:  Thu Oct  8 15:45:01 2009
      Dump End Marker found: this dump is valid.
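A quick plausibility check on such a header is that the number of dumped pages multiplied by the page size equals the dumped memory. The sketch below only redoes that arithmetic with the values shown in the zgetdump -i output above; the variable names are chosen for illustration.

```shell
#!/bin/sh
# Values copied from the zgetdump -i output on the slide.
page_size=4096
dumped_pages=262144
dumped_memory=1073741824

# 262144 pages * 4096 bytes = 1073741824 bytes = 1024 MB
if [ $(( dumped_pages * page_size )) -eq "$dumped_memory" ]; then
    echo "header consistent: $(( dumped_memory / 1024 / 1024 )) MB dumped"
else
    echo "header inconsistent"
fi
```

Here the dumped memory also equals the real memory, so the whole 1 GB guest was captured.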
    • How to obtain information about a dump (contd)
      To display information about the dump itself (dump header) and check whether the dump is valid, use lcrash with the options -i and -d.
      root@larsson:~> lcrash -i -d /dev/dasdb1
      Dump Type: s390 standalone dump
      Machine: s390x (ESAME)
      CPU ID: 0xff00012320978000
      Memory Start: 0x0
      Memory End: 0x40000000
      Memory Size: 1073741824
      Time of dump: Thu Oct  8 15:44:49 2009
      Number of pages: 262144
      Kernel page size: 4096
      Version number: 3
      Magic number: 0xa8190173618f23fd
      Dump header size: 4096
      Dump level: 0x4
      Build arch: s390x (ESAME)
      Time of dump end: Thu Oct  8 15:45:01 2009
      End Marker found! Dump is valid!
    • Automatic dump on panic (SLES 10/11, RHEL 5/6): dumpconf
      The dumpconf tool configures a dump device that is used for an automatic dump in case of a kernel panic.
      – The command can be installed as a service script under /etc/init.d/dumpconf or can be called manually.
      – Start service: # service dumpconf start
      – It reads the configuration file /etc/sysconfig/dumpconf.
      – Example configuration for CCW dump device (DASD) and reipl after dump:
        ON_PANIC=dump_reipl
        DUMP_TYPE=ccw
        DEVICE=0.0.4711
    • Automatic dump on panic (SLES 10/11, RHEL 5): dumpconf (contd)
      – Example configuration for FCP dump device (SCSI disk):
        ON_PANIC=dump
        DUMP_TYPE=fcp
        DEVICE=0.0.4714
        WWPN=0x5005076303004712
        LUN=0x4047401300000000
        BOOTPROG=0
        BR_LBA=0
      – Example configuration for re-IPL without taking a dump, if a kernel panic occurs:
        ON_PANIC=reipl
      – Example of executing a CP command, and rebooting from device 4711 if a kernel panic occurs:
        ON_PANIC=vmcmd
        VMCMD_1="MSG <vmguest> Starting VMDUMP"
        VMCMD_2="VMDUMP"
        VMCMD_3="IPL 4711"
    • Get dump and send it to service organization
      DASD/Tape:
      – Store dump to Linux file system from dump device:
        root@larsson:~> zgetdump /dev/<device node> > dump_file
      – Alternative: lcrash (compression possible)
        root@larsson:~> lcrash -d /dev/dasdxx -s <dir>
      SCSI:
      – Get dump from filesystem
      Additional files needed for dump analysis:
      – SUSE (lcrash tool): /boot/System.map-xxx and /boot/Kerntypes-xxx
      – Red Hat & SUSE (crash tool): vmlinux file (kernel with debug info) contained in debug kernel rpms:
        • Red Hat: kernel-debuginfo-xxx.rpm and kernel-debuginfo-common-xxx.rpm
        • SUSE: kernel-default-debuginfo-xxx.rpm
    • Handling large dumps
      – Compress the dump and split it into parts of 1 GB:
        root@larsson:~> zgetdump /dev/dasdc1 | gzip | split -b 1G
        Several compressed files such as xaa, xab, xac, ... are created
      – Create md5 sums of the compressed files:
        root@larsson:~> md5sum xa* > dump.md5
      – Upload all parts together with the md5 information
      – Verification of the parts for a receiver:
        root@larsson:~> md5sum -c dump.md5
        xaa: OK
        [....]
      – Merge the parts and uncompress the dump:
        root@larsson:~> cat xa* | gunzip -c > dump
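The compress, split, checksum, verify and merge sequence above can be rehearsed on any Linux system before it is needed in a real incident. The sketch below wraps the slide's commands in two hypothetical helpers, pack_dump and unpack_dump, and runs them on random stand-in data with a small piece size; on a real system the input would come from zgetdump and the piece size would be 1G.

```shell
#!/bin/sh
# pack_dump DIR SIZE: compress stdin, split it into pieces of SIZE bytes
# inside DIR (xaa, xab, ...) and record their checksums in dump.md5.
pack_dump() {
    dir=$1
    size=$2
    ( cd "$dir" && gzip -c | split -b "$size" - x && md5sum x* > dump.md5 )
}

# unpack_dump DIR: verify the checksums (report goes to stderr),
# then merge the pieces and uncompress the result to stdout.
unpack_dump() {
    ( cd "$1" && md5sum -c dump.md5 >&2 && cat x* | gunzip -c )
}

# Rehearsal with stand-in data instead of a real dump.
work=$(mktemp -d)
head -c 200000 /dev/urandom > "$work.src"
pack_dump "$work" 65536 < "$work.src"
unpack_dump "$work" | cmp - "$work.src" && echo "round trip OK"
```

Because unpack_dump stops before merging when md5sum -c reports a mismatch, a damaged upload is noticed before time is spent on analysing a broken dump.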
    • Transferring dumps
      Transferring single volume dumps with ssh:
      root@larsson:~> zgetdump /dev/dasdc1 | ssh user@host "cat > dump_file_on_target_host"
      Transferring multi-volume dumps with ssh:
      root@larsson:~> zgetdump /dev/dasdc | ssh user@host "cat > multi_volume_dump_file_on_target_host"
      Transferring a dump with ftp:
      – Establish an ftp session with the target host, log in and set the transfer mode to binary
      – Send the dump to the host:
        ftp> put |"zgetdump /dev/dasdc1" <dump_file_on_target_host>
    • Dump tool summary
      – DASD and tape (stand-alone tools)
        • Environment: VM & LPAR
        • Preparation: zipl -d /dev/<dump_dev>
        • Creation: stop CPU & store status, ipl <dump_dev_CUU>
        • Dump medium: ECKD or FBA (DASD), tape cartridges (tape)
        • Copy to filesystem: zgetdump /dev/<dump_dev> > dump_file
      – SCSI (stand-alone tool)
        • Environment: VM & LPAR
        • Preparation: mkdir /dumps/mydumps, zipl -D /dev/sda1 ...
        • Creation: stop CPU & store status, ipl <dump_dev_CUU>
        • Dump medium: Linux file system on a SCSI disk
        • Copy to filesystem: ---
      – VMDUMP
        • Environment: VM
        • Preparation: ---
        • Creation: vmdump
        • Dump medium: VM reader
        • Copy to filesystem: dumpload, ftp ..., vmconvert ...
      – Viewing (all tools): lcrash or crash
      See the “Using the dump tools” book at http://www.ibm.com/developerworks/linux/linux390/documentation_dev.html
    • Customer Cases
    • Availability: Guest spontaneously reboots
      Configuration:
      – HA cluster: Oracle RAC server or other HA solution under z/VM
      Problem description:
      – Occasionally guests spontaneously reboot without any notification or console message
      Tools used for problem determination:
      – cp instruction trace of (re)IPL code
      – Crash dump taken after trace was hit
      [Figure: Linux 1 and Linux 2, each running an Oracle RAC server, communicating with each other and sharing one Oracle RAC database]
    • Availability: Guest spontaneously reboots - Steps to find root cause
      Question: Who rebooted the system?
      Step 1
      – Find out the address of the (re)ipl code in the system map
      – Use this address to set an instruction trace
      cd /boot
      grep machine_restart System.map-2.6.16.60-0.54.5-default
      000000000010c364 T machine_restart
      00000000001171c8 t do_machine_restart
      0000000000603200 D _machine_restart
    • Availability: Guest spontaneously reboots - Steps to find root cause (contd)
      Step 2
      – Set CP instruction trace on the reboot address
      – The system is halted at that address when a reboot is triggered
      CP CPU ALL TR IN R 10C364.4
      HCPTRI1027I An active trace set has turned RUN off
      CP Q TR
      NAME  INITIAL     (ACTIVE)
        1     INSTR   PSWA  0010C364-0010C367
              TERM    NOPRINT  NORUN SIM
              SKIP 00000  PASS 00000 STOP 00000  STEP 00000
              CMD  NONE
      -> 000000000010C364  STMF    EBCFF0780024 >> 000000003A557D48      CC 2
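Steps 1 and 2 can be combined so that the trace command is derived from the system map instead of being typed by hand. This is a sketch only: trace_cmd is a hypothetical helper, the map content is the grep output from step 1 written to a temporary file, and the ".4" length and the missing leading zeros simply follow the command shown on the slide.

```shell
#!/bin/sh
# trace_cmd SYMBOL SYSTEM_MAP: print the CP instruction trace command
# for a text symbol (type T or t) found in a System.map file.
trace_cmd() {
    awk -v sym="$1" '
        $3 == sym && ($2 == "T" || $2 == "t") {
            addr = toupper($1)
            sub(/^0+/, "", addr)   # the slide writes the address without leading zeros
            print "CP CPU ALL TR IN R " addr ".4"
            exit
        }' "$2"
}

# Stand-in for /boot/System.map-<version>, using the lines from step 1.
map=$(mktemp)
cat > "$map" <<'EOF'
000000000010c364 T machine_restart
00000000001171c8 t do_machine_restart
0000000000603200 D _machine_restart
EOF
trace_cmd machine_restart "$map"
```

Matching on the exact symbol name avoids picking up do_machine_restart or the data symbol _machine_restart, which a plain grep also returns.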
    • Availability: Guest spontaneously reboots - Steps to find root cause (contd)
      Step 3
      – Take a dump when the (re)ipl code is hit
      cp cpu all stop
      cp store status
      Store complete.
      cp i 4fc6
      Tracing active at IPL
      HCPGSP2630I The virtual machine is placed in CP mode due to a SIGP stop and store status from CPU 00.
      zIPL v1.6.3-0.24.5 dump tool (64bit)
      Dumping 64 bit OS
      00000128 / 00001024 MB
      ......
      00001024 / 00001024 MB
      Dump successful
      HCPIR450W CP entered, disabled wait PSW 00020000 80000000 00000000 00000000
    • Availability: Guest spontaneously reboots - Steps to find root cause (contd)
      Step 4
      – Save dump in a file
      zgetdump /dev/dasdb1 > dump_file
      Dump device: /dev/dasdb1
      >>>  Dump header information  <<<
      Dump created on: Wed Oct 27 12:00:40 2010
      Magic number:  0xa8190173618f23fd
      Version number:  4
      Header size:  4096
      Page size:  4096
      Dumped memory:  1073741824
      Dumped pages:  262144
      Real memory:  1073741824
      cpu id:  0xff00012320948000
      System Arch:  s390x (ESAME)
      Build Arch:  s390x (ESAME)
      >>>  End of Dump header  <<<
      Reading dump content ................................
      Dump ended on:  Wed Oct 27 12:00:52 2010
      Dump End Marker found: this dump is valid.
    • Availability: Guest spontaneously reboots - Steps to find root cause (contd)
      Step 5
      – Use (l)crash to find out which process has triggered the reboot
       STACK:
       0 start_kernel+950 [0x6a690e]
       1 _stext+32 [0x100020]
      ================================================================
      TASK HAS CPU (1): 0x3f720650 (oprocd.bin):
       LOWCORE INFO:
        -psw      : 0x0704200180000000 0x000000000010c36a
        -function : machine_restart+6
        -prefix   : 0x3f438000
        -cpu timer: 0x7fffffff 0xff9e6c00
        -clock cmp: 0x00c6ca69 0x22337760
        -general registers:
      <snip>
      /var/opt/oracle/product/crs/bin/oprocd.bin
       STACK:
       0 __handle_sysrq+248 [0x361240]
       1 write_sysrq_trigger+98 [0x2be796]
       2 sys_write+392 [0x225a68]
       3 sysc_noemu+16 [0x1179a8]
    • Availability: Guest spontaneously reboots (contd)
      Problem origin:
      – HA component erroneously detected a system hang
        • hangcheck_timer module did not receive timer IRQ
        • z/VM time bomb switch
        • TSA monitor
      z/VM cannot guarantee real-time behavior if overloaded
      – Longest hang observed: 37 seconds(!)
      Solution:
      – Offload HA workload from overloaded z/VM
        • e.g. use separate z/VM
        • or: run large Oracle RAC guests in LPAR
    • Network: network connection is too slow
      Configuration:
      – z/VSE running CICS, connection to DB2 in Linux on System z
      – HiperSockets connection from Linux to z/VSE
      – But also applies to HiperSockets connections between Linux and z/OS
      Problem description:
      – When CICS transactions were monitored, some transactions took a couple of seconds instead of milliseconds
      Tools used for problem determination:
      – dbginfo.sh
      – s390 debug feature
      – sadc/sar
      – CICS transaction monitor
    • Network: network connection is too slow (contd)
      s390 debug feature
      – Check for qeth errors:
        cat /sys/kernel/debug/s390dbf/qeth_qerr
        00 01282632346:099575 2 - 00 0000000180b20218  71 6f 75 74 65 72 72 00 | qouterr.
        00 01282632346:099575 2 - 00 0000000180b20298  20 46 31 35 3d 31 30 00 |  F15=10.
        00 01282632346:099576 2 - 00 0000000180b20318  20 46 31 34 3d 30 30 00 |  F14=00.
        00 01282632346:099576 2 - 00 0000000180b20390  20 71 65 72 72 3d 41 46 |  qerr=AF
        00 01282632346:099576 2 - 00 0000000180b20408  20 73 65 72 72 3d 32 00 |  serr=2.
      dbginfo file
      – Check for buffer count:
        cat /sys/devices/qeth/0.0.1e00/buffer_count
        16
      Problem origin:
      – Too few inbound buffers
    • Network: network connection is too slow (contd)
      Solution:
      – Increase inbound buffer count (default: 16, max 128)
      – Check actual buffer count with lsqeth -p
      – Set the inbound buffer count in the appropriate config file:
        • SUSE SLES10:
          - in /etc/sysconfig/hardware/hwcfg-qeth-bus-ccw-0.0.F200
          - add QETH_OPTIONS="buffer_count=128"
        • SUSE SLES11:
          - in /etc/udev/rules.d/51-qeth-0.0.f200.rules add
            ACTION=="add", SUBSYSTEM=="ccwgroup", KERNEL=="0.0.f200", ATTR{buffer_count}="128"
        • Red Hat:
          - in /etc/sysconfig/network-scripts/ifcfg-eth0
          - add OPTIONS="buffer_count=128"
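Whether a device still runs with the default can be checked by reading its buffer_count attribute. The following sketch uses a hypothetical helper, check_buffer_count, and a temporary stand-in file so that it runs anywhere; the threshold of 16 and the maximum of 128 are the values quoted on the slide.

```shell
#!/bin/sh
# check_buffer_count FILE: report the inbound buffer count stored in FILE
# and return 1 if it is still at (or below) the default of 16.
check_buffer_count() {
    count=$(cat "$1") || return 2
    if [ "$count" -le 16 ]; then
        echo "buffer_count=$count: consider raising it (max 128)"
        return 1
    fi
    echo "buffer_count=$count"
}

# Demo against a stand-in file. On a real system the argument would be a
# sysfs attribute such as /sys/devices/qeth/0.0.1e00/buffer_count.
f=$(mktemp)
echo 16 > "$f"
check_buffer_count "$f" || true
```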
    • High Disk response times
      Configuration:
      – z10, HDS storage server (HyperPAV enabled)
      – z/VM, Linux with Oracle Database
      – VM-controlled minidisks attached to Linux, LVM on top
      Problem description:
      – I/O throughput not matching expectations
      – Oracle Database shows poor performance because of that
      – One LVM volume showing significant stress
      Tools used for problem determination:
      – dbginfo.sh
      – sadc/sar
      – z/VM monitor data
    • High Disk response times (contd): Observation in Linux
      dm-9 0.00 0.00 49.75 0.00 19790.50 0.00 795.56 17.89 15.79 2.01 100.00
      (slide callouts on this line: throughput, response time, utilization)
      Conclusion: PAV not being utilized
      – No HyperPAV support in SLES10 SP2
      – Static PAV not possible with current setup (VM-controlled minidisks)
      Need to look for other ways to get more parallel I/O
      – Link same minidisk multiple times to a guest
      – Use smaller minidisks and increase striping in Linux
    • High Disk response times (contd): Initial and proposed setup
      [Figure: physical disk, logical disk(s) in VM, logical disk(s) in Linux, LVM / device mapper]
      1. Initial setup
      2. Link minidisks to guest multiple times (multipath)
      3. Smaller disks, more stripes (striping)
    • High Disk response times (contd): New observation in Linux
      – Response times stay equal
      – Throughput equal
      – No PAV being used!!
    • High Disk response times (contd): Solution: check PAV setup in VM
    • High Disk response times (contd): Finally
      dasdaen         133.33     0.00  278.11    0.00 12298.51     0.00    88.44     0.79    2.83   1.48  41.29
      dasdcbt         161.19     0.00  248.26    0.00 12260.70     0.00    98.77     0.91    3.47   1.88  46.77
      dasdfwc         149.75     0.00  266.17    0.00 12374.13     0.00    92.98     1.88    7.07   2.54  67.66
      dasdael         162.19     0.00  250.25    0.00 12483.58     0.00    99.77     1.90    7.57   2.86  71.64
      dasddyz         134.83     0.00  277.61    0.00 12431.84     0.00    89.56     0.75    2.71   1.68  46.77
      dasdaem         151.24     0.00  266.17    0.00 12595.02     0.00    94.64     2.01    7.61   2.82  75.12
      dasdcbr         169.65     0.00  242.79    0.00 12386.07     0.00   102.03     1.72    7.05   2.83  68.66
      dasdfwd         162.69     0.00  249.25    0.00 12348.26     0.00    99.08     1.92    7.70   2.83  70.65
      dasddyy         157.21     0.00  259.70    0.00 12409.95     0.00    95.57     2.58    9.96   3.05  79.10
      dasddyx         174.63     0.00  237.81    0.00 12374.13     0.00   104.07     1.76    7.38   2.93  69.65
      dasdcbs         144.78     0.00  272.14    0.00 12264.68     0.00    90.14     2.53    9.31   2.89  78.61
      dasda             0.00     0.00    0.00    1.00     0.00     3.98     8.00     0.01   10.00   5.00   0.50
      dasdq             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
      dasdss            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
      dasdadx           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0. [...]
      dasdawh           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
      dasdamk           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
      dasdaek         160.70     0.00  255.22    0.00 12382.09     0.00    97.03     2.27    8.95   2.88  73.63
      dasdcbq         148.76     0.00  265.67    0.00 12372.14     0.00    93.14     2.14    8.01   2.85  75.62
      dasddyw         162.19     0.00  254.23    0.00 12384.08     0.00    97.42     2.12    8.40   2.90  73.63
      dasdfwe         146.27     0.00  271.64    0.00 12419.90     0.00    91.44     2.63    9.71   2.80  76.12
      dasdfwf         162.19     0.00  249.75    0.00 12455.72     0.00    99.75     0.71    2.83   1.79  44.78
      dasdb             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
      dm-0              0.00     0.00 1646.77    0.00 49494.53     0.00    60.11     5.08    3.04   0.36  59.70
      dm-1              0.00     0.00 1665.17    0.00 49482.59     0.00    59.43    15.00    9.04   0.56  93.53
      dm-2              0.00     0.00 1660.70    0.00 49432.84     0.00    59.53    13.46    8.11   0.55  90.55
      dm-3              0.00     0.00 1647.26    0.00 49490.55     0.00    60.09    12.05    7.32   0.53  87.56
      dm-4              0.00     0.00 1646.77    0.00 49494.53     0.00    60.11     5.08    3.04   0.36  59.70
      dm-5              0.00     0.00 1665.17    0.00 49482.59     0.00    59.43    15.00    9.04   0.56  93.53
      dm-6              0.00     0.00 1660.70    0.00 49432.84     0.00    59.53    13.46    8.11   0.55  90.55
      dm-7              0.00     0.00 1647.26    0.00 49490.55     0.00    60.09    12.06    7.32   0.53  87.56
      dm-8              0.00     0.00    0.00    1.99     0.00     7.96     8.00     0.00    0.00   0.00   0.00
      dm-9              0.00     0.00  497.51    0.00 197900.50    0.00   795.56     7.89   15.79   2.01 100.00
    • Bonding throughput not matching expectations
      Configuration:
      – SLES10 system, connected via OSA card and using the bonding driver
      Problem description:
      – Bonding only working with 100 Mbps
      – FTP also slow
      Tools used for problem determination:
      – dbginfo.sh, netperf
      Problem origin:
      – ethtool cannot determine the line speed correctly because qeth does not report it
      Solution:
      – Ignore the 100 Mbps message, or upgrade to SLES11
      Message in question:
      bonding: bond1: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full
    • Availability: Unable to mount file system after LVM changes
      Configuration:
      – Linux HA cluster with two nodes
      – Accessing the same DASDs, which are exported via OCFS2
      Problem description:
      – Added one node to the cluster, brought the logical volume online
      – Unable to mount the filesystem from any node after that
      Tools used for problem determination:
      – dbginfo.sh
      [Figure: Linux 1 and Linux 2 sharing, via OCFS2, a logical volume built from DASDs a to f]
    • Availability: Unable to mount file system after LVM changes (contd)
      Problem origin:
      – LVM metadata was overwritten when adding the 3rd node
      – e.g. superblock not found
      Solution:
      – Extract the metadata from a running node (/etc/lvm/backup) and write it to disk again
      [Figure: Linux 1, Linux 2 and Linux 3 sharing the logical volume via OCFS2; {pv|vg|lv}create issued from the newly added node]
    • Kernel panic: Low address protection
      Configuration:
      – z10 only
      – High workload
      – More likely the more multithreaded applications are running
      Problem description:
      – Concurrent access to pages that are to be removed from the page table
      Tools used for problem determination:
      – crash/lcrash
      Problem origin:
      – Race condition in memory management in firmware
      Solution:
      – Upgrade to latest firmware!
      – Upgrade to latest kernels: fix to be integrated in all supported distributions