Information Gathering 2


  • BIOS is a collection of APIs. It also consists of configuration information, user-selectable variables, ACPI memory information, and SEL and DMI logs, and it sets registers to determine behaviour (PowerNow and various enable/disable options).
  • Use the following example of events:
      f02 | 03/21/2007 | 06:08:09 | System Firmware Progress | Motherboard initialization | Asserted
      1002 | 03/21/2007 | 06:08:09 | Fan ft0.fm2.f1.speed | Lower Non-recoverable going low | Reading 2600 < Threshold 3000 RPM
      1102 | 03/21/2007 | 06:08:10 | System Firmware Progress | Video initialization | Asserted
      1202 | 03/21/2007 | 06:08:11 | Fan ft0.fm1.f0.speed | Lower Non-recoverable going low | Reading 2600 < Threshold 3000 RPM
      1302 | 03/21/2007 | 06:08:14 | Fan | Predictive Failure Asserted
      1402 | 03/21/2007 | 06:08:15 | Fan ft0.fm2.f1.speed | Lower Non-recoverable going high | Reading 7200 > Threshold 3000 RPM
      1502 | 03/21/2007 | 06:08:15 | Fan | Predictive Failure Asserted
      1602 | 03/21/2007 | 06:08:16 | Fan ft0.fm1.f0.speed | Lower Non-recoverable going high | Reading 7100 > Threshold 3000 RPM

    1. 1. Information Gathering <ul><li>Michael Johnson </li></ul><ul><li>SYS-TSC-VSP </li></ul>
    2. 2. Information Gathering <ul><li>BIOS / DMI / POST events - overview </li></ul><ul><li>IPMI - Generic overview </li></ul><ul><li>SEL, SP, IPMI, DMI and register data </li></ul><ul><li>How to view gathered data and analysis of events </li></ul><ul><li>Types of events available and where to look </li></ul><ul><li>Normal / expected output </li></ul><ul><li>Explorer, siga, sysreport, mpsreport, msinfo32 </li></ul><ul><li>Working with non-service processor platforms </li></ul>
    3. 3. A Briefer on BIOS Execution Stages <ul><li>BIOS = Basic Input Output System </li></ul><ul><li>BIOS has certain execution phases, namely checkpoints, defined and reported throughout its execution life </li></ul><ul><li>These checkpoints are written out to a specific I/O port (80, 81) and, where possible, reported at the bottom of the POST screen, via the LCD display or via the SP (sp get port80, vX0z). Progress is also available through sundiag (current-port80) on the Galaxy SP </li></ul>
    4. 4. BIOS Output (HT Sync Flood Example)
    5. 5. BIOS Stages <ul><li>Initial jump to Reset Vector </li></ul><ul><ul><ul><ul><li>At this stage, the first CPU (also called the Boot Strap Processor, or BSP) has a hard-coded jump instruction to a globally defined address of F000:FFF0 </li></ul></ul></ul></ul><ul><li>Boot block execution from Flash ROM </li></ul><ul><ul><ul><ul><li>Consists of initial and extremely efficient code for very early bring-up of a minimalist system. During this phase: </li></ul></ul></ul></ul><ul><ul><ul><ul><li>HyperTransport devices which are interconnected to CPU0 are detected and linked to with minimum speed/bandwidth </li></ul></ul></ul></ul><ul><li>Available and working DIMM memory which is attached to each North Bridge is arranged in local memory map registers </li></ul>
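The F000:FFF0 reset vector quoted in the slide above can be turned into a linear address with ordinary real-mode segment arithmetic. The following is only a minimal sketch of that arithmetic; the function name is illustrative and not part of any BIOS tool, and it deliberately ignores how a particular CPU maps the segment base immediately after reset.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Return the 20-bit linear address for a real-mode segment:offset pair."""
    # Real-mode addressing: the segment is shifted left by 4 bits, then the offset is added.
    return ((segment << 4) + offset) & 0xFFFFF


reset_vector = real_mode_linear(0xF000, 0xFFF0)
print(hex(reset_vector))        # 0xffff0
print(0x100000 - reset_vector)  # 16 bytes remain below the 1 MB boundary
```

Only 16 bytes remain between that address and the 1 MB boundary, which is why the slide describes the code there as a single hard-coded jump.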
    6. 6. BIOS Stages cont <ul><li>Potential Sync Flood is detected and flagged in CMOS </li></ul><ul><li>Compressed BIOS binary checksum is verified </li></ul><ul><li>Motherboard initialization: </li></ul><ul><ul><ul><li>All general chips are initialized </li></ul></ul></ul><ul><ul><ul><li>Hyper Transport, North Bridges </li></ul></ul></ul><ul><ul><ul><li>South Bridge, Tunnels, etc </li></ul></ul></ul><ul><li>Video initialization: Graphic is initialized </li></ul><ul><li>USB resource configuration: </li></ul><ul><ul><ul><li>South Bridge USB controller is initialized </li></ul></ul></ul><ul><ul><ul><li>Attached USB devices are recognized </li></ul></ul></ul>
    7. 7. General Execution Stages of BIOS cont <ul><li>Option ROM initialization: </li></ul><ul><ul><ul><li>All ROM binaries of available PCI devices are sequentially executed </li></ul></ul></ul><ul><ul><ul><li>(like SAS, GB LAN, etc.) </li></ul></ul></ul><ul><li>System boot initiation: </li></ul><ul><ul><ul><li>Execution is handed over to boot loader of an OS on certain Boot device </li></ul></ul></ul><ul><ul><ul><li>(Solaris on Disk0) </li></ul></ul></ul><ul><li>User-initiated system setup: </li></ul><ul><ul><ul><li>Alternate entry into BIOS Setup Page has happened </li></ul></ul></ul><ul><ul><ul><li>By pressing F2 during BIOS POST </li></ul></ul></ul>
    8. 8. IPMI <ul><li>Ipmitool is a utility for interfacing with devices that support the Intelligent Platform Management Interface specification. IPMI is an open standard for machine health, inventory, and remote power control. </li></ul><ul><li>Ipmitool can communicate with IPMI-enabled devices through either a kernel driver such as OpenIPMI or over the RMCP (Remote Management Control Protocol) LAN protocol defined in the IPMI specification. </li></ul><ul><li>IPMIv2 adds support for encrypted LAN communications and remote Serial-over-LAN functionality. </li></ul><ul><li>Ipmitool is found in /usr/sfw/bin on Solaris operating systems. </li></ul><ul><li>Ipmitool has to be installed as an option on Linux and Windows. </li></ul>
    9. 9. IPMI v2.0 Architecture [block diagram: Baseboard Mgmt. Controller (BMC) with SDR, SEL, FRU NV store; IPMB (I2C) to chassis FRU SEEPROM, satellite mgmt. controller and remote mgmt. card; ICMB bridge controller; sensors and control circuitry on I2C/SMBus; in-band system interface on the baseboard system bus; out-of-band LAN, PCI and RS-232 modem/serial paths carrying "side-band" IPMI messages]
    10. 10. IPMI Block Diagram V20z:
    11. 11. IPMI <ul><li>SDR - Sensor Data Repository: displays sensor values read via I2C </li></ul><ul><li>SEL - System Event Log: displays the SMBIOS log events </li></ul><ul><li>FRU - Field Replaceable Units: displays the contents of a FRU PROM chip built onto the component itself. If the device does not have a FRU PROM chip, its values cannot be displayed </li></ul><ul><li>URL: </li></ul><ul><li>URL: </li></ul><ul><li>IPMI Specification Second Generation v2.0 </li></ul><ul><li>Or of course, man ipmitool </li></ul>
    12. 12. Ipmitool Command Options: <ul><li>Commands: </li></ul><ul><li>raw - Send a RAW request and print response </li></ul><ul><li>i2c - Send I2C master write-read command and print response </li></ul><ul><li>lan - Configure LAN channels </li></ul><ul><li>chassis - Get chassis status and set power state </li></ul><ul><li>power - Shortcut to chassis power commands </li></ul><ul><li>event - Send pre-defined events to MC </li></ul><ul><li>mc - Management controller status and global enables </li></ul><ul><li>sdr - Print sensor data repository entries and readings </li></ul><ul><li>sensor - Print detailed sensor information </li></ul><ul><li>fru - Print built-in FRU and scan SDR for FRU locators </li></ul><ul><li>sel - Print system event log (SEL) </li></ul><ul><li>pef - Configure platform event filtering (PEF) </li></ul><ul><li>sol - Configure and connect IPMIv2.0 Serial-over-LAN </li></ul><ul><li>tsol - Configure and connect with Tyan IPMIv1.5 Serial-over-LAN </li></ul><ul><li>isol - Configure IPMIv1.5 Serial-over-LAN </li></ul><ul><li>user - Configure management controller users </li></ul><ul><li>channel - Configure management controller channels </li></ul><ul><li>session - Print session information </li></ul><ul><li>sunoem - OEM commands for Sun servers </li></ul><ul><li>kontronoem - OEM commands for Kontron devices </li></ul><ul><li>picmg - Run a PICMG/ATCA extended cmd </li></ul><ul><li>fwum - Update IPMC using Kontron OEM Firmware Update Mgr </li></ul><ul><li>exec - Run list of commands from file </li></ul><ul><li>set - Set runtime variable for shell and exec </li></ul>
    13. 13. IPMI Command Options <ul><li>ipmitool [-chvV] -I lan -H hostname [-p <port>] [-U <username>] [-f <password_file>] [-S <sdrcache>] <command> </li></ul><ul><li>ipmitool [-chvV] -I lanplus -H hostname [-p <port>] [-U <username>] [-f <password_file>] [-S <sdrcache>] <command> </li></ul><ul><li>ipmitool [-chvV] [-S <sdrcache>] -I bmc <command> </li></ul><ul><ul><ul><li>-c Present output in CSV (comma separated variable) format. </li></ul></ul></ul><ul><ul><ul><li>-f <password_file> </li></ul></ul></ul><ul><ul><ul><li>-v Increase verbose output level. Each instance increases verbosity. </li></ul></ul></ul><ul><ul><ul><li>-V Display version information. </li></ul></ul></ul>
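The three usage forms above (lan, lanplus and local bmc) differ only in the interface and connection arguments, so a collection script can assemble them in one place. The sketch below only builds the argument list and does not run ipmitool; the helper name, the host name and the password file path are hypothetical, and the local interface name is taken from the slide (other operating systems may use a different one).

```python
from typing import List, Optional


def build_ipmitool_argv(
    command: List[str],
    host: Optional[str] = None,
    user: Optional[str] = None,
    password_file: Optional[str] = None,
    interface: str = "lanplus",
    csv: bool = False,
) -> List[str]:
    """Assemble an ipmitool argument list from the options shown on the slide."""
    argv = ["ipmitool"]
    if csv:
        argv.append("-c")                      # CSV output
    if host is None:
        argv += ["-I", "bmc"]                  # local interface, as named on the slide
    else:
        argv += ["-I", interface, "-H", host]  # lan or lanplus over the network
        if user:
            argv += ["-U", user]
        if password_file:
            argv += ["-f", password_file]      # keeps the password off the command line
    return argv + command


# Hypothetical host and password file, for illustration only.
print(build_ipmitool_argv(["sel", "elist"], host="sp.example.com",
                          user="root", password_file="/tmp/ipmi.pw"))
print(build_ipmitool_argv(["sdr", "elist"]))
```

The resulting list can be passed to `subprocess.run` unchanged, which avoids shell quoting problems with host names or file paths.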
    14. 14. IPMI Shell Command <ul><li>An alternative to this is to use the provided shell interface to issue repeated commands that will all use the same automatically generated cache. The shell interface is not available with all platforms and sometimes it is more advantageous to use the cache method (SDR), but if you are going to be analyzing the SEL chances are you will want to issue multiple commands and the shell interface makes this much easier with command history and editing. </li></ul><ul><ul><ul><ul><li>ipmitool -I lanplus -H IPADDR -U root -P changeme shell </li></ul></ul></ul></ul><ul><ul><ul><ul><li>ipmitool> sel elist </li></ul></ul></ul></ul><ul><ul><ul><ul><li>100 | Pre-Init Time-stamp | Entity Presence ps1.prsnt | Device Absent </li></ul></ul></ul></ul><ul><ul><ul><ul><li>200 | Pre-Init Time-stamp | Entity Presence io.f0.prsnt | Device Absent </li></ul></ul></ul></ul><ul><ul><ul><ul><li>300 | Pre-Init Time-stamp | Power Supply ps0.vinok | State Asserted </li></ul></ul></ul></ul><ul><li>NOTE: In order to improve readability and avoid repeating useless command line arguments all further examples will assume that the shell interface is being used and that an appropriate session is already established either over LAN interface or using KCS interface and an OS kernel driver. </li></ul>
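When many SEL lines have to be reviewed, it can help to split the pipe-separated `sel elist` output into fields. The following is a rough sketch, assuming the layout seen in the sample output above (record ID, either a "Pre-Init Time-stamp" marker or separate date and time fields, the sensor, then free-form detail fields) and assuming that record IDs are printed in hexadecimal. The class and function names are illustrative.

```python
from typing import List, NamedTuple


class SelEntry(NamedTuple):
    record_id: int
    timestamp: str
    sensor: str
    details: List[str]


def parse_sel_line(line: str) -> SelEntry:
    """Split one 'sel elist' line into its fields (no error handling for short lines)."""
    fields = [f.strip() for f in line.split("|")]
    record_id = int(fields[0], 16)  # assumption: IDs are hexadecimal
    if fields[1].startswith("Pre-Init"):
        # No real-time clock value was available when the event was logged.
        timestamp, rest = fields[1], fields[2:]
    else:
        # Date and time arrive as two separate fields.
        timestamp, rest = f"{fields[1]} {fields[2]}", fields[3:]
    return SelEntry(record_id, timestamp, rest[0], rest[1:])


for raw in (
    "100 | Pre-Init Time-stamp | Entity Presence ps1.prsnt | Device Absent",
    "300 | Pre-Init Time-stamp | Power Supply ps0.vinok | State Asserted",
):
    print(parse_sel_line(raw))
```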
    15. 15. IPMI Sel Elist for X4450 <ul><li>Typically an uncorrectable or correctable memory error is reported as the following: </li></ul><ul><ul><li>0 | Pre-Init Time-stamp | Memory | Correctable Error | Asserted | CPU 0 DIMM(PAIR) 0 </li></ul></ul><ul><ul><li>0 | Pre-Init Time-stamp | Memory | Uncorrectable Error | Asserted | CPU 0 DIMM(PAIR) 0 </li></ul></ul><ul><li>Note the firmware update that adds the term (PAIR) to avoid misconceptions about which FRU part is affected. </li></ul><ul><li>The BIOS is responsible for DIMM ECC handling. When a CE/UE error occurs, the chipset generates an SMI; the BIOS detects it and sends a SEL event to the BMC. The IPMI specification defines the SEL format for DIMM ECC events. In addition, since a UE may cause the system to hang or reset, the BIOS checks the UE status bit during the early POST stage and fires a SEL event to the BMC as appropriate. </li></ul><ul><li>Make sure the system is running the latest firmware. Earlier firmware reported as follows: </li></ul><ul><ul><li>19 | 02/14/2008 | 19:53:51 | Memory #0x7a | Correctable ECC | Asserted </li></ul></ul>
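Because the newer and older firmware word the same memory event differently ("Correctable Error" versus "Correctable ECC"), a simple filter has to accept both. The sketch below is one hedged way to classify SEL lines as CE or UE; it relies only on the three sample lines shown above, and the function name is illustrative.

```python
from typing import Optional


def classify_memory_event(line: str) -> Optional[str]:
    """Return 'UE', 'CE', or None for a pipe-separated SEL line."""
    fields = [f.strip().lower() for f in line.split("|")]
    # Only consider events whose sensor field is a memory sensor.
    if not any(f.startswith("memory") for f in fields):
        return None
    for f in fields:
        # Test 'uncorrectable' first: it contains 'correctable' as a substring.
        if f.startswith("uncorrectable"):
            return "UE"
        if f.startswith("correctable"):
            return "CE"
    return None


samples = [
    "0 | Pre-Init Time-stamp | Memory | Correctable Error | Asserted | CPU 0 DIMM(PAIR) 0",
    "0 | Pre-Init Time-stamp | Memory | Uncorrectable Error | Asserted | CPU 0 DIMM(PAIR) 0",
    "19 | 02/14/2008 | 19:53:51 | Memory #0x7a | Correctable ECC | Asserted",
]
for s in samples:
    print(classify_memory_event(s), "<-", s)
```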
    16. 16. Entities <ul><li>An important foundation for sensors and events and FRUs is the concept of entities. In IPMI every sensor is assigned an entity ID and instance, which at its most basic level is a classification system that helps define what device type and number a sensor monitors. These are closely related to the concept of a Field Replaceable Unit in that a particular entity ID and instance can be used to describe a FRU. </li></ul><ul><li>An entity ID is mapped to a physical device through a table defined in the IPMI specification. For our purposes the following entities are used on the Sun Fire X4000 platform: </li></ul>Their primary usefulness is for grouping and querying sensors, and in fact they come in very handy with ipmitool to do specific sensor queries.
    17. 17. Entities cont <ul><li>For example to see all power supply related sensors: </li></ul><ul><li>ipmitool> sdr entity 10 </li></ul><ul><li>ps0.prsnt | 1Ch | ok | 10.0 | Device Present </li></ul><ul><li>ps0.pwrok | 1Dh | ok | 10.0 | State Deasserted </li></ul><ul><li>ps0.vinok | 1Eh | ok | 10.0 | State Asserted </li></ul><ul><li>ps1.prsnt | 1Fh | ok | 10.1 | Device Present </li></ul><ul><li>ps1.pwrok | 20h | ok | 10.1 | State Deasserted </li></ul><ul><li>ps1.vinok | 21h | ok | 10.1 | State Deasserted </li></ul><ul><li>As you can see above the Entity ID and Instance is provided in the sdr elist command output in the form: </li></ul><ul><li><entity ID>.<entity instance>. </li></ul>
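The `<entity ID>.<entity instance>` column makes it straightforward to group sensors by the device they belong to. The following sketch parses the sample `sdr entity 10` output from the slide above; the five-column layout is assumed from that sample, and the function name is illustrative.

```python
from collections import defaultdict

SAMPLE = """\
ps0.prsnt | 1Ch | ok | 10.0 | Device Present
ps0.pwrok | 1Dh | ok | 10.0 | State Deasserted
ps0.vinok | 1Eh | ok | 10.0 | State Asserted
ps1.prsnt | 1Fh | ok | 10.1 | Device Present
ps1.pwrok | 20h | ok | 10.1 | State Deasserted
ps1.vinok | 21h | ok | 10.1 | State Deasserted
"""


def group_by_entity(text: str) -> dict:
    """Map (entity_id, instance) to a list of (name, sensor_number, reading)."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 5:
            continue  # skip anything that is not a five-column sensor line
        name, number, _status, entity, reading = fields
        entity_id, instance = (int(part) for part in entity.split("."))
        # Sensor numbers are printed as hex with a trailing 'h', e.g. '1Ch'.
        groups[(entity_id, instance)].append((name, int(number.rstrip("h"), 16), reading))
    return dict(groups)


for entity, sensors in group_by_entity(SAMPLE).items():
    print(entity, [name for name, _, _ in sensors])
```

In this sample, both power supplies are present but only ps0 reports its input voltage as OK, which becomes obvious once the sensors are grouped per instance.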
    18. 18. Entities cont <ul><li>Entities can also be used for entity lookups whereby you can associate a sensor with other sensors by finding all other sensors with the same entity ID and instance as the one you are looking up. This allows you to associate a sensor (and event). </li></ul><ul><ul><li>Sensor ID : ft0.fm2.f1.speed (0x49) </li></ul></ul><ul><ul><li>Entity ID : 29.2 </li></ul></ul><ul><ul><li>Sensor Type (Analog) : Fan </li></ul></ul><ul><ul><li>Sensor Reading : 7000 (+/- 0) RPM </li></ul></ul><ul><ul><li>Status : ok </li></ul></ul><ul><ul><li>Lower Non-Recoverable : 3000.000 </li></ul></ul><ul><ul><li>Lower Critical : na </li></ul></ul><ul><ul><li>Lower Non-Critical : na </li></ul></ul><ul><ul><li>Upper Non-Critical : na </li></ul></ul><ul><ul><li>Upper Critical : na </li></ul></ul><ul><ul><li>Upper Non-Recoverable : 22000.000 </li></ul></ul><ul><ul><li>Assertions Enabled : lnr- unr+ </li></ul></ul><ul><ul><li>Deassertions Enabled : lnr- unr+ </li></ul></ul><ul><ul><li>SEL Record ID : 1002 </li></ul></ul><ul><ul><li>Record Type : 02 </li></ul></ul><ul><ul><li>Timestamp : 03/21/2007 06:08:09 </li></ul></ul><ul><ul><li>Generator ID : 0020 </li></ul></ul><ul><ul><li>EvM Revision : 04 </li></ul></ul><ul><ul><li>Sensor Type : Fan </li></ul></ul><ul><ul><li>Sensor Number : 49 </li></ul></ul><ul><ul><li>Event Type : Threshold </li></ul></ul><ul><ul><li>Event Direction : Assertion Event </li></ul></ul><ul><ul><li>Event Data (RAW) : 541a1e </li></ul></ul><ul><ul><li>Trigger Reading : 2600.000 RPM </li></ul></ul><ul><ul><li>Trigger Threshold : 3000.000 RPM </li></ul></ul><ul><ul><li>Description : Lower Non-recoverable going low </li></ul></ul>
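The "Event Data (RAW) : 541a1e" field in the record above can be decoded by hand. In the IPMI threshold event format, the low nibble of the first byte is the event offset, and the two upper bit pairs indicate whether bytes 2 and 3 carry the trigger reading and trigger threshold. The sketch below follows that layout; the factor of 100 RPM per raw count is an assumption inferred from this one sample (0x1a gives 2600 RPM), since the real conversion factors come from the sensor's SDR. The names are illustrative.

```python
THRESHOLD_OFFSETS = {
    0x0: "Lower Non-critical going low",
    0x1: "Lower Non-critical going high",
    0x2: "Lower Critical going low",
    0x3: "Lower Critical going high",
    0x4: "Lower Non-recoverable going low",
    0x5: "Lower Non-recoverable going high",
    0x6: "Upper Non-critical going low",
    0x7: "Upper Non-critical going high",
    0x8: "Upper Critical going low",
    0x9: "Upper Critical going high",
    0xA: "Upper Non-recoverable going low",
    0xB: "Upper Non-recoverable going high",
}


def decode_threshold_event(raw_hex: str, units_per_count: int = 100) -> dict:
    """Decode three bytes of threshold event data (units_per_count is an assumed factor)."""
    byte1, byte2, byte3 = bytes.fromhex(raw_hex)
    offset = byte1 & 0x0F
    has_reading = (byte1 >> 6) == 0b01              # bits 7:6 = 01: reading in byte 2
    has_threshold = ((byte1 >> 4) & 0b11) == 0b01   # bits 5:4 = 01: threshold in byte 3
    return {
        "description": THRESHOLD_OFFSETS.get(offset, "unknown"),
        "reading": byte2 * units_per_count if has_reading else None,
        "threshold": byte3 * units_per_count if has_threshold else None,
    }


print(decode_threshold_event("541a1e"))
```

With those assumptions, the decoded values match the Trigger Reading, Trigger Threshold and Description lines that ipmitool prints for record 1002.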
    19. 19. Threshold <ul><li>Threshold (or analog) sensors are used for inputs like temperature, voltages, and fan speeds. These are connected to a sensor chip and read by the service processor using I2C. They can have both upper and lower thresholds set, and multiple different events can be configured around each of them. </li></ul><ul><li>There can be zero or more of the following thresholds configured as upper or lower bounds for each threshold sensor: </li></ul><ul><li>* Non-Critical * Critical * Non-Recoverable </li></ul><ul><li>These are commonly represented in short form with ipmitool; here is how the short names map to the longer names: </li></ul><ul><li>Each threshold can also have both a direction and a status flag attached to it. A "+" or "-" indicates whether that particular threshold applies to a reading that is going high or going low, respectively. The status flag indicates whether the event generated is an assertion or a deassertion event. Each of these can be configured independently by setting various bits in the sensor data record. </li></ul>
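The short threshold names that ipmitool prints (for example "lnr- unr+" in an "Assertions Enabled" line) combine a threshold abbreviation with the direction suffix described above. The sketch below expands such a string; the abbreviation table reflects the names ipmitool uses for the six thresholds, while the function name is illustrative.

```python
SHORT_NAMES = {
    "lnr": "Lower Non-Recoverable",
    "lcr": "Lower Critical",
    "lnc": "Lower Non-Critical",
    "unc": "Upper Non-Critical",
    "ucr": "Upper Critical",
    "unr": "Upper Non-Recoverable",
}

DIRECTIONS = {"+": "going high", "-": "going low"}


def expand_threshold_mask(mask: str) -> list:
    """Expand e.g. 'lnr- unr+' into readable threshold/direction strings."""
    expanded = []
    for token in mask.split():
        name, direction = token[:-1], token[-1]  # last character is '+' or '-'
        expanded.append(f"{SHORT_NAMES[name]} {DIRECTIONS[direction]}")
    return expanded


print(expand_threshold_mask("lnr- unr+"))
```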
    20. 20. Threshold cont <ul><li>Example Thresholds: </li></ul>
    21. 21. Discrete Sensors <ul><li>While threshold sensors represent analog readings, discrete sensors can be considered to represent digital readings. There are different types of discrete sensors as well: </li></ul><ul><li>Generic Discrete: these sensors can have various flags from a generic pre-defined table. None of these sensors are used on this platform. </li></ul><ul><li>Digital Discrete: similar to the above but are for sensors that only represent one of two possible states. Think of a GPIO pin where the state is either 1 or 0. There are many sensors of this type used on this platform, everything from presence detection sensors to LED state sensors. </li></ul><ul><li>Sensor-Specific Discrete: similar to the first type these sensor types have any number of flags defined depending on the sensor type. This makes it possible to represent a large number of sensors, each with different states. There is only one of these types of sensor on this platform and it belongs to the Chassis Intrusion sensor. </li></ul>
    22. 22. BIOS Error Handling and Reporting : Correctable Errors – DMI Log
    23. 23. BIOS Error Handling & Reporting Overview <ul><li>Sync Flooding is a HyperTransport™ method used to stop data propagation in the case of a serious error </li></ul><ul><li>The device that detects the error initiates sync flood packets </li></ul><ul><li>All other HT devices cease operation </li></ul><ul><li>They transmit sync flood packets out of all HT links </li></ul><ul><li>Packets finally reach the platform South Bridge (AMD8111/NF2200/3400) </li></ul><ul><li>BIOS has pre-programmed the SB to trigger the system RESET signal when a sync flood is detected, so the system reboots </li></ul><ul><li>During Boot Block and POST, BIOS analyzes the related error bits in all HT nodes and reports the Sync Flood reasons </li></ul><ul><li>BIOS detects some of the error sources, but not all of them </li></ul>
    24. 24. Error Handling and Reporting Uncorrectable ECC Errors
    25. 25. Rev.F BIOS Error Handling Framework <ul><li>General Conditions: </li></ul><ul><li>1- Two lines of SEL Log will be reported for each Node with valid fault and MCA address </li></ul><ul><li>2- One line of SEL Log will be reported for each node with evidence of a generic sync flood and no valid address </li></ul><ul><li>3- If a node has detected CRC errors on incoming HT links... </li></ul><ul><ul><ul><li>Link# will be appended after "FEDBADF00D" (indicator of a CRC fed to HT link)... </li></ul></ul></ul><ul><ul><ul><li>"000201" comes in four least significant nibbles (00[future link4, zero for now][link2][link1][link0]) </li></ul></ul></ul><ul><ul><ul><ul><li>each nibble content represents: </li></ul></ul></ul></ul><ul><ul><ul><ul><li>0 = no CRC Error was received on this link </li></ul></ul></ul></ul><ul><ul><ul><ul><li>1 = incoming Byte0 Link received the CRC error </li></ul></ul></ul></ul><ul><ul><ul><ul><li>2 = incoming Byte1 Link received the CRC Error </li></ul></ul></ul></ul><ul><ul><ul><ul><li>3 = incoming Bytes0+1 Links both saw CRC Errors on incoming links </li></ul></ul></ul></ul>
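The link code that follows "FEDBADF00D" can be unpacked one nibble at a time, as the slide above describes. This is only a sketch of that decoding for the three named links (link0 in the least significant nibble); the function name is illustrative, and the reserved "future" nibble is ignored.

```python
NIBBLE_MEANING = {
    0: "no CRC error",
    1: "CRC error on incoming Byte0 link",
    2: "CRC error on incoming Byte1 link",
    3: "CRC errors on both incoming Byte0 and Byte1 links",
}


def decode_crc_links(code: str) -> dict:
    """Decode a link code such as '000201' into a per-link status."""
    value = int(code, 16)
    return {
        f"link{n}": NIBBLE_MEANING.get((value >> (4 * n)) & 0xF, "undefined")
        for n in range(3)  # link0 is the least significant nibble
    }


print(decode_crc_links("000201"))
```

For the sample value "000201", link0 reports a Byte0 CRC error, link1 is clean and link2 reports a Byte1 CRC error.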
    26. 26. Data Gathering Tools <ul><ul><ul><ul><ul><ul><li>Windows - MPSreports </li></ul></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><ul><li>Solaris - Explorer </li></ul></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><ul><li>Red Hat - sysreport </li></ul></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><ul><li>SUSE - Siga, supportconfig </li></ul></ul></ul></ul></ul></ul>
    27. 27. Windows MPSreports <ul><li>Microsoft Premier Field Engineering (PFE) Reporting Utility </li></ul><ul><ul><ul><li>Supported Operating Systems: </li></ul></ul></ul><ul><ul><ul><li>Windows NT 4.0 </li></ul></ul></ul><ul><ul><ul><li>Windows 2000 </li></ul></ul></ul><ul><ul><ul><li>Windows XP </li></ul></ul></ul><ul><ul><ul><li>Windows Server 2003 x86 </li></ul></ul></ul><ul><ul><ul><li>Windows Server 2003 x64 (AMD64) </li></ul></ul></ul><ul><ul><ul><li>Windows Server 2003 for Itanium </li></ul></ul></ul>
    28. 28. Windows MPSreports cont <ul><li>The PFE version of MPS_REPORTS gathers a wide range of diagnostic information from Windows and limited information for installed server applications such as SQL or Exchange. </li></ul><ul><li>The user running the utility must have administrative privileges on the system. </li></ul><ul><li>While it runs, the utility creates uncompressed text files that can consume 100 MB or more of disk space, depending on the size of the event logs. Most of the space is freed when the reporting tool finishes, leaving behind only the tools folder and the .cab file, which can be deleted if no longer needed. </li></ul>
    29. 29. Windows MPSreports <ul><li>Microsoft Product Support Reporting Tool </li></ul><ul><li>Supported Operating Systems: </li></ul><ul><ul><ul><li>Windows 2000 </li></ul></ul></ul><ul><ul><ul><li>Windows NT </li></ul></ul></ul><ul><ul><ul><li>Windows Server 2003 </li></ul></ul></ul><ul><ul><ul><li>Windows XP </li></ul></ul></ul><ul><li>There are 8 specialty versions, one for each of the following support scenario categories: </li></ul><ul><li>Alliance, Directory Services (not for Windows NT 4.0), Networking, Clustering, SQL, Software Update Services, MDAC and Base/Setup/Storage/Print/Performance. </li></ul><ul><li>Each version gathers some of the same basic information but there are specific reports unique to each of the support scenario categories. </li></ul><ul><li>Please read the readme.txt files for more details about each version. </li></ul>
    30. 30. Windows MPSreports cont <ul><li>The Microsoft Product Support Reporting Tool facilitates the gathering of critical system and logging information used in troubleshooting support issues. The reporting tool DOES NOT make any registry changes or modifications to the operating system. </li></ul><ul><li>Files created are zipped CAB files; cabextract can be used on Solaris to extract them. </li></ul><ul><li>Microsoft Product Support Reporting Tool </li></ul><ul><li>Cabextract website for Solaris and other packages. </li></ul>/usr/local/gnu/bin/cabextract --help
    31. 31. Windows Utilities “msconfig” <ul><li>The System.ini file </li></ul><ul><li>The Win.ini file </li></ul><ul><li>The Boot.ini file </li></ul><ul><li>Programs that are set to load during the startup process </li></ul><ul><li>Environment settings </li></ul><ul><li>International settings </li></ul><ul><li>Description of Windows XP System Information (Msconfig.exe) Tool </li></ul>
    32. 32. Windows Utilities “msinfo32” <ul><li>Hardware Resources </li></ul><ul><li>Components </li></ul><ul><li>Software Environment </li></ul><ul><li>Applications </li></ul><ul><li>Internet Explorer </li></ul><ul><li>Description of Windows XP System Information (Msinfo32.exe) Tool </li></ul>
    33. 33. Windows Event Viewer <ul><li>The Event Logs for Windows include: </li></ul><ul><li>System Event Log </li></ul><ul><li>Security Event Log </li></ul><ul><li>Application Event Log </li></ul><ul><li>Directory Service </li></ul><ul><li>File Replication Service </li></ul>
    34. 34. Other Windows Utilities <ul><li>System Monitor - perfmon.exe </li></ul><ul><li>Ipconfig - DOS utility for network information </li></ul><ul><li>Device Manager / Control Panel </li></ul><ul><li>Network sniffers - Ethereal, Wireshark </li></ul><ul><li>Dumps - three memory dump file types: </li></ul><ul><ul><ul><li>Complete memory dump </li></ul></ul></ul><ul><ul><ul><li>Kernel memory dump </li></ul></ul></ul><ul><ul><ul><li>Small memory dump (64 KB) </li></ul></ul></ul><ul><li>Overview of memory dump file options for Windows Server 2003 and XP </li></ul>
    35. 35. Windows DRAM Limitations <ul><li>Distribution | 32-bit | 64-bit </li></ul><ul><li>Windows Server 2003 SP2, Datacenter | 128GB | 4TB </li></ul><ul><li>Windows Server 2003 SP2, Enterprise Edition | 8GB | 2TB </li></ul><ul><li>Windows Storage Server 2003, Enterprise Edition | 8GB | - </li></ul><ul><li>Windows Storage Server 2003 | 4GB | - </li></ul><ul><li>Windows Server 2003 R2 Datacenter Edition | 128GB | 1TB </li></ul><ul><li>Windows Server 2003 SP1, Datacenter Edition | 128GB | 1TB </li></ul><ul><li>Windows Server 2003 R2 Enterprise Edition | 64GB | 1TB </li></ul><ul><li>Windows Server 2003 SP1, Enterprise Edition | 64GB | 1TB </li></ul><ul><li>Windows Server 2003, Standard Edition | 4GB | 16GB </li></ul>
    36. 36. Solaris Explorer <ul><li>A new version, 5.8, has been released; it is available from the Sun download site </li></ul><ul><li>Sun Explorer Data Collector version 5.8 </li></ul><ul><li>Supported Platforms </li></ul><ul><ul><ul><li>Solaris 7/8/9/10 </li></ul></ul></ul><ul><ul><ul><li>SPARC/Solaris 9/10 </li></ul></ul></ul><ul><ul><ul><li>X86 Platform </li></ul></ul></ul>
    37. 37. Solaris Explorer <ul><li>The following describes the changes made in Sun Explorer 5.8: </li></ul><ul><li>New Features </li></ul><ul><li>Explorer 5.8 now collects /usr/sbin/smbios output. (6412712) </li></ul><ul><li>Modified Features </li></ul><ul><li>Now supports transport target with HTTP protocol in addition to HTTPS protocol. </li></ul><ul><li>Because some companies have problems using HTTPS internally, such as in the DMZ net, Explorer supports the use of HTTP from the DMZ to the company's own HTTP endpoint. (6471251) </li></ul><ul><li>Explorer 5.8 restores the ability to gather system controller data via telnet with the scextended and 1280extended modules. </li></ul><ul><li>A defect in Explorer 5.7 caused these telnet connections to be blocked unless an ssh connection was made to accept the host key, before running Explorer. (6521236, 6528445 ) </li></ul>
    38. 38. Explorer Modules “ilomextended” <ul><li>explorer -if <ipmiinput.txt-file> -w <module-list> </li></ul><ul><li>Collects remote Integrated Lights Out Manager (ILOM) Intelligent Platform Management Interface (IPMI) data from Galaxy systems. </li></ul><ul><li>Commands Collected </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} mc info </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} mc getenables </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} chassis poh </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} chassis power status </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} fru print </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} pef status </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} pef list </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} sdr list full </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} sel info </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} sel elist </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} sensor list </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} user summary </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} user list </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} sunoem led get </li></ul><ul><li>/usr/sfw/bin/ipmitool -H {host} -p {port} -U {user} -f {pwfile} chassis restart_cause </li></ul>
    39. 39. Explorer Modules “ipmi” <ul><li>Collects local Intelligent Platform Management Interface (IPMI) data on x86 platform. </li></ul><ul><li>The following commands are collected: </li></ul><ul><li>/usr/sfw/bin/ipmitool chassis status </li></ul><ul><li>/usr/sfw/bin/ipmitool chassis poh </li></ul><ul><li>/usr/sfw/bin/ipmitool chassis power status </li></ul><ul><li>/usr/sfw/bin/ipmitool chassis restart_cause </li></ul><ul><li>/usr/sfw/bin/ipmitool fru </li></ul><ul><li>/usr/sfw/bin/ipmitool fru print </li></ul><ul><li>/usr/sfw/bin/ipmitool mc getenables </li></ul><ul><li>/usr/sfw/bin/ipmitool mc info </li></ul><ul><li>/usr/sfw/bin/ipmitool pef status </li></ul><ul><li>/usr/sfw/bin/ipmitool pef list </li></ul><ul><li>/usr/sfw/bin/ipmitool sel info </li></ul><ul><li>/usr/sfw/bin/ipmitool sel elist </li></ul><ul><li>/usr/sfw/bin/ipmitool sdr elist full </li></ul><ul><li>/usr/sfw/bin/ipmitool sdr list all info </li></ul><ul><li>/usr/sfw/bin/ipmitool sensor list </li></ul><ul><li>/usr/sfw/bin/ipmitool sunoem led get </li></ul>
    40. 40. Griffon Knowledge Engine <ul><li>Sun Alerts and FINS checked against </li></ul><ul><li>Recommended Patches for Solaris 7, 8, 9 and 10 (sparc) (x86) checked against </li></ul><ul><li>Firmware Patches checked against </li></ul><ul><li>ASP Group Patches checked against </li></ul><ul><li>BADPATCH/WITHDRAWN Patches checked against </li></ul><ul><li>Griffon Knowledge Engine Home page Griffon System Analysis Job Submission Form </li></ul><ul><li>Features of the Griffon System Analysis Job Submission Form </li></ul><ul><ul><ul><li>Explorer files may be referenced via a network path. </li></ul></ul></ul><ul><ul><ul><li>Provides a high-level matrix to highlight critical problem areas. </li></ul></ul></ul><ul><ul><ul><li>Individual System Analysis Reports </li></ul></ul></ul><ul><ul><ul><li>Container files of hostids to group results </li></ul></ul></ul><ul><ul><ul><li>Hostid references utilize the current Explorer file Proactive servers </li></ul></ul></ul><ul><ul><ul><li>Patch references are linked to external SunSolve. </li></ul></ul></ul>
    41. 41. Sun Gathering Debug Data (Sun GDD) <ul><li>Tools provide the right approach to problem resolution by leveraging proactive actions and best practices to help you gather the required debug data needed for further analysis. </li></ul><ul><li>For each product covered, GDD tools provide documentation and scripts which detail the relevant data the Sun Technical Support Center requires for analyzing your problem </li></ul>
    42. 42. Sun Gathering Debug Data (Sun GDD) <ul><li>The tools gather 90% of the debug data frequently requested by the Sun Technical Support Center - including data for more common critical situations including memory, start/stop, installation, hang, and crash issues. By collecting this data before you initiate a Service Request, you can substantially reduce the time needed to analyze and resolve the problem. </li></ul><ul><ul><ul><li>Sun Java System Calendar Server </li></ul></ul></ul><ul><ul><ul><li>Sun Java System Directory Editor </li></ul></ul></ul><ul><ul><ul><li>Sun Java System Directory Proxy Server </li></ul></ul></ul><ul><ul><ul><li>Sun Java System Directory Server </li></ul></ul></ul><ul><ul><ul><li>Sun Java System Messaging Server </li></ul></ul></ul><ul><ul><ul><li>Sun Java System Portal Server </li></ul></ul></ul><ul><ul><ul><li>Sun Java System Web Proxy Server 3.6 Service Pack 9 </li></ul></ul></ul><ul><ul><ul><li>Sun Java System Web Server </li></ul></ul></ul><ul><li>External GDD sun website </li></ul><ul><li>Sun Gathering Debug Data (GDD) (WZT-0253) - Web based </li></ul>
    43. 43. SUSE “supportconfig” <ul><li>Detailed system information and logs are collected and organized in a manner that helps reduce service request resolution times. Private system information can be disclosed when using this tool. If this is a concern, please prune private data from the log files. Several startup options are available to exclude more sensitive information. Refer to the man page to see these options. </li></ul><ul><li>The output format is better than that of the SIGA scripts; however, the tool needs to be installed from the SUSE web site. </li></ul>
    44. 44. Supportconfig Installation <ul><li>The script has been renamed to supportconfig. It has been repackaged as an RPM for easy installation and updating. There is now a supportconfig man page. </li></ul><ul><li>1. Download supportconfig-2.13-0.4.noarch.rpm </li></ul><ul><li>2. Install the RPM: rpm -Uvh supportconfig-2.13-0.4.noarch.rpm </li></ul><ul><li>3. Run the supportconfig command as root </li></ul><ul><li>4. To view the information files, assuming the tarball filename is: </li></ul><ul><li>/var/log/nts_host1_070406_1300.tar.bz2 </li></ul><ul><li>cd /var/log </li></ul><ul><li>tar jxf nts_host1_070406_1300.tar.bz2 </li></ul><ul><li>cd nts_host1_070406_1300 </li></ul><ul><li>ls -l </li></ul>
    45. 45. VMware <ul><li>ESX Server includes a script called vm-support, which collects information that VMware support might request and packages that information into one file. </li></ul><ul><li>Make sure you have version 1.14 or later of the script. Version 1.14 (and later) of the vm-support script produces more diagnostic information than previous versions of the script. </li></ul><ul><li>ESX Server 3.x includes a version of vm-support newer than 1.14. You don't need to check the version of this script or update it. </li></ul><ul><li>ESX Server 2.x might include a version earlier than 1.14. To see which version is installed on your system, run the command with no options. For example: </li></ul><ul><ul><ul><ul><li>[user@esx2host]$ cd /tmp </li></ul></ul></ul></ul><ul><ul><ul><ul><li>[user@esx2host]$ /usr/bin/vm-support </li></ul></ul></ul></ul><ul><ul><ul><ul><li>VMware ESX Server Support Script 0.94 </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Preparing Files: [Ctrl+C to cancel] </li></ul></ul></ul></ul><ul><li>If you have a version earlier than 1.14, follow the instructions later in this presentation. </li></ul>
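The version check described above can be automated by reading the banner line that vm-support prints. The sketch below parses a banner of the form shown on the slide and compares it with the 1.14 minimum; it assumes the version is always "major.minor" and that the components compare numerically, and the function names are illustrative rather than part of any VMware tool.

```python
import re
from typing import Optional, Tuple


def parse_vm_support_version(banner: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a 'Support Script X.Y' banner, or None."""
    match = re.search(r"Support Script (\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_upgrade(banner: str, minimum: Tuple[int, int] = (1, 14)) -> bool:
    """True if the banner is unreadable or reports a version older than minimum."""
    version = parse_vm_support_version(banner)
    return version is None or version < minimum


print(needs_upgrade("VMware ESX Server Support Script 0.94"))  # True
print(needs_upgrade("VMware ESX Server Support Script 1.14"))  # False
```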
    46. 46. VMware cont <ul><li>To collect diagnostic information using the script: </li></ul><ul><li>1. Log on to the service console as root </li></ul><ul><li>2. Confirm what version of VMware ESX Server you are running: </li></ul><ul><ul><ul><ul><li>[root@esxhost]# vmware -v </li></ul></ul></ul></ul><ul><li>3. Change to the directory where you want the output to appear. For example: </li></ul><ul><ul><ul><ul><li>[root@esxhost]# cd /tmp </li></ul></ul></ul></ul><ul><li>If you run the script in /usr/bin, the output will appear in that directory and remain there until you delete it </li></ul><ul><li>4. Run the script: </li></ul><ul><ul><ul><ul><li>[root@esxhost]# /usr/bin/vm-support </li></ul></ul></ul></ul><ul><li>You don't need to power off your virtual machines before running this script </li></ul><ul><li>For ESX Server performance issues, VMware Technical Support might ask you to collect performance snapshots using the -s and -S switches. Please refer to: </li></ul><ul><ul><ul><ul><li> for more information </li></ul></ul></ul></ul><ul><li>5. When the script finishes, it informs you of the output filename and location </li></ul>
    47. 47. VMware Upgrading “vm-support” <ul><li>Download vm-support script and move it to the /tmp directory on the service console of the ESX Server system </li></ul><ul><li>Make a backup copy of your existing script: </li></ul><ul><ul><li>[root@esxhost]# cp /usr/bin/vm-support /usr/bin/vm-support.old </li></ul></ul><ul><li>Extract the archive to /usr/bin/, replacing the original vm-support script: </li></ul><ul><ul><li>[root@esxhost]# cd /usr/bin </li></ul></ul><ul><ul><li>[root@esxhost]# tar xvzf /tmp/653_fvm-support_114.tgz </li></ul></ul><ul><li>The archive places the vm-support script in the current directory. </li></ul><ul><li>Note: When running on an older version of ESX Server, the updated script might report errors about missing commands. This is normal. </li></ul><ul><li>Supports Versions: </li></ul><ul><ul><li>VMware ESX Server 2.0.x VMware ESX Server 2.1.x VMware ESX Server 2.5.x VMware ESX Server 3.0.x </li></ul></ul>
    48. 48. Various Other Tools Available <ul><li>Dmidecode: available on Solaris and Linux (later versions) </li></ul><ul><li>Biosdecode: available on Linux, part of dmidecode 2.9 </li></ul><ul><li>Smbios: Solaris </li></ul><ul><li> Graphics 'X' data gather </li></ul><ul><li>Kstat: Solaris, various information gathered </li></ul><ul><li>Scanpci: Linux, Solaris </li></ul><ul><li>Pcitweak: Linux (SUSE), Solaris </li></ul><ul><li>Setpci: Linux (Red Hat) </li></ul>
    49. 49. Working With Non-SP Machines <ul><li>You have no IPMI data </li></ul><ul><li>You are likely to have no DMI logs either </li></ul><ul><li>Some systems will pause on boot to indicate a sync flood and other serious events </li></ul><ul><li>Platform messages </li></ul><ul><li>PCI.exe could gather northbridge registers </li></ul><ul><li>POST codes, POST LEDs </li></ul><ul><li>What does occur: LEDs, fans, graphics </li></ul>
    50. 50. Lab Connecting To Remote Platforms <ul><li>Ultra 24 </li></ul><ul><li>X2100/X2200 </li></ul><ul><li>X4100/X4200 Not available </li></ul><ul><li>X4150 Should be available by 12pm </li></ul><ul><li>X4600 </li></ul><ul><li>Collect ipmi SEL, SDR and FRU data in/out of band </li></ul>
    51. 51. Thank you <ul><li>Michael Johnson </li></ul><ul><ul><li>[email_address] </li></ul></ul>