Xtw01t11v0901 troubleshooting

  • {DESCRIPTION} {TRANSCRIPT} Welcome. This section covers the troubleshooting aspects of servers. This is Topic 11 in the series of topics that make up the System x Technical Principles Course – XTW01.
  • {DESCRIPTION} {TRANSCRIPT} At the completion of this topic, you should be able to: identify basic troubleshooting questions to consider; identify the six possible states of a system; and identify the diagnostic tools that are available to gather and analyze information for each given system state.
  • {DESCRIPTION} {TRANSCRIPT} This course is designed to familiarize you with troubleshooting tools that will help you determine the appropriate solution to your system’s problem. We will examine the six system states used to gather and analyze problem information, introduce the basic tools that help identify and determine system issues, and introduce the Dynamic System Analysis (DSA) tool, which collects and analyzes system information to aid in diagnosing system problems. The next slide describes the troubleshooting questions to consider.
  • {DESCRIPTION} {TRANSCRIPT} When troubleshooting System x servers and BladeCenter blade servers, knowing the answers to these questions can lead you toward a quicker fix. The answers will also give you an idea of how to enter the diagnostic package and where to look for error indications. In the next slides, we will look at other tools available to help you diagnose and solve hardware-related problems, as well as software installation and configuration options.
  • {DESCRIPTION} {TRANSCRIPT} This section examines the six system states.
  • {DESCRIPTION} {TRANSCRIPT} When trying to identify a system problem, consider the six possible states of a system. All IBM xSeries, eServer (AMD processor-based) and System x servers start in a uniform manner, and all have a common set of interfaces that indicate how far through the power-up sequence the server has progressed. Knowing how far the system gets tells you what should have happened next, which helps in your problem analysis.
  • {DESCRIPTION} {TRANSCRIPT} All servers are supported by documentation, which forms part of the tool set for both information gathering and information analysis. For example, the Problem Determination and Service Guide (PDSG) contains a list of errors that may occur during POST (information gathering) and also the probable causes of each error (information analysis). Not all information sources are available in all system states. Let’s take a closer look at the individual states.
  • {DESCRIPTION} {TRANSCRIPT} In state 1 (no AC power), there are likely to be no indicators from the system. However, that fact is an indicator in itself. A system with no AC power cannot start, so there will be no fan noise, the disks will not spin and there will be no lights. Your information gathering tools are your eyes and ears. Sight and sound clues are often overlooked, but a lack of visual or audible indicators can be used for problem isolation. The PDSG contains information in several places about such situations and offers guidance on how to test the condition and what action to take to resolve it.
  • {DESCRIPTION} {TRANSCRIPT} In state 2, AC power is present but there is no DC output. There may be a visual indicator on the outside of the power supply. Depending on the configuration of the system, there may also be event logs from a BMC, or from an RSA or management module if one is installed. The information gathering tools for the BMC are the SMBridge utility and the IBM SVCCon tool. The BMC log is in IPMI standard format and may require some interpretation to identify any problems. The RSA log is in human-readable format and indicates more clearly what is wrong. Remember that the tools from state 1 are also available in this state. The PDSG will be useful in understanding any available event log information.
  • {DESCRIPTION} {TRANSCRIPT} In state 3, something has caused the system to fail during POST. If POST can start, there may be audible warnings, or visual warnings in the form of POST messages such as checkpoint codes and adapter messages. In addition, if POST is able to load the setup (F1) and diagnostic (F2) routines, you may be able to use these tools; both can view the system event logs, giving you another way to check system status. If POST fails on an adapter, the adapter may display messages that give clues to what is failing. Information on adapter failure messages may be found in the PDSG, in RETAIN tips or on the IBM Support Web site. All of these tools are useful in understanding the nature of the problem.
  • {DESCRIPTION} {TRANSCRIPT} In state 4, POST has completed. This is a strong indicator that the hardware has checked out and is working as designed. However, it does not mean that the hardware is ‘fit for purpose’. That ‘purpose’ is for the hardware to behave as a server, and something is stopping the software from loading. Areas to examine include diagnostic checks on the disks, the disk boot sequence and any RAID configuration, to confirm that a viable RAID array exists and can load an OS. Again, RETAIN tips and the IBM Support Web site may have information that helps diagnose why the OS is failing to start.
  • {DESCRIPTION} {TRANSCRIPT} In state 5, the OS has started its boot process but fails to complete it and provide a console. At this stage, the OS itself may give indicators of what is wrong. For example, a Microsoft Windows Server OS shows progress indicators on the screen and may report software error messages. These messages are likely to be documented on the Microsoft Support Web site, and some may also be covered by IBM RETAIN tips. Check all available sources and, where possible, confirm any fault diagnosis with two independent sources.
  • {DESCRIPTION} {TRANSCRIPT} Finally, state 6 indicates that the system started correctly. This means that, at some point, everything was working as designed and the server was fit for purpose, but something subsequently failed. An example is a disk that fails after the OS loads, causing the system to reboot. Depending on the fault-tolerant characteristics of the server, the restart may succeed. If it does, this opens up new opportunities for information gathering and analysis, in the form of OS event logs and the DSA tool. As with all previous states, if multiple information sources are available, check them all for matching symptoms to increase your confidence in the diagnosis.
  • {DESCRIPTION} {TRANSCRIPT} Multiple information sources are not always available for a given fault. When they are, always check every source against the available analysis tools. Ideally, you are looking for a match. If two information sources disagree, you may need to find a third source before you can be confident that you have identified the cause of the problem.
  • {DESCRIPTION} {TRANSCRIPT} Always use recognized reference points. Tools such as RETAIN tips, the PDSG and the IBM Support Web site contain the collective knowledge of the teams who designed the system, along with the accumulated knowledge of the many people who support the product. Your experience is, of course, extremely valuable. If you have seen a fault many times and you are working with someone who is seeing it for the first time, you can bring your experience to bear and help your team members. Remember to explain why a fault exists, not just how to fix it, so that next time your team members can fully understand the circumstances surrounding the fault.
  • {DESCRIPTION} {TRANSCRIPT} This section introduces some of the basic data-gathering diagnostic tools, starting with light path diagnostics.
  • {DESCRIPTION} {TRANSCRIPT} Light path diagnostics allow you to quickly identify the type of system error that occurred by monitoring and reporting the health of the processors, main memory, hard disk drives, PCI adapters, fans, power supplies, VRMs and the internal system temperature. The server is designed so that any illuminated LEDs remain lit when the server shuts down, as long as the power source is good. This helps you isolate the problem if an error causes the server to shut down. The system board also has LEDs beside specific components, such as DIMM slot 12, that identify the failed part. Light path diagnostics continue to work for a limited time even when the server is unplugged (up to 12 hours, according to the Light Path Diagnostics slide). The two buttons on the light path panel are: Remind: use this button to put the system-error LED on the operator information panel into Remind mode. Pressing it acknowledges the error but indicates that you will not take immediate action. The system-error LED flashes while it is in Remind mode and stays in Remind mode until one of the following occurs: all known errors are corrected; the server is restarted; or a new error occurs, causing the system-error LED to be lit again. Reset: use this button to force an immediate system restart.
  • {DESCRIPTION} {TRANSCRIPT} This section identifies the three types of service processor: Baseboard Management Controller (BMC), Remote Supervisor Adapter (RSA) and Advanced Management Module (AMM).
  • {DESCRIPTION} {TRANSCRIPT} This table lists the three types of service processor hardware management options available for IBM System x servers and BladeCenter chassis. These are separate service processors used to control power to the device and perform management and diagnostic functions. The baseboard management controller (BMC), or mini baseboard management controller (Mini-BMC), is a specialized microcontroller embedded on the motherboard and used to control some System x servers. It is the intelligence in the Intelligent Platform Management Interface (IPMI) architecture. The BMC manages the interface between systems management software and platform hardware. Different types of sensors built into the server report to the BMC on parameters such as temperature, cooling fan speeds, power mode and operating system (OS) status. The BMC monitors these sensors and can send alerts to a system administrator over the network if any parameter moves outside its preset limits, indicating a potential failure of the system. The administrator can also communicate remotely with the BMC to take corrective action, such as resetting or power cycling the system to get a hung OS running again. Standard in some System x servers and optional in others, the RSA expands on BMC capability by allowing you to perform systems management functions whether or not your server is operational. The RSA can be accessed either in-band through a device driver, or out-of-band over serial or Ethernet. The AMM provides systems management functions and keyboard/video/mouse (KVM) switching for all blade servers in a BladeCenter chassis that support KVM. Each BladeCenter chassis comes with at least one Advanced Management Module. The Remote Supervisor Adapter or Remote Supervisor Adapter II, and the Advanced Management Module, monitor and report the status and health of your system’s components through standalone Web interfaces.
The Web interfaces contain similar tasks, such as System Status and Event Log, where you can view errors that may indicate a potential failure, along with messages captured from the server’s hardware environment.
  • {DESCRIPTION} {TRANSCRIPT} This section introduces the Dynamic System Analysis (DSA) tools.
  • {DESCRIPTION} {TRANSCRIPT} IBM Dynamic System Analysis (DSA) collects and analyzes system information to aid in diagnosing system problems. The system information is collected into a compressed XML file that can be sent to IBM Service. By default, DSA output is created in the \IBM_Support directory of the drive defined by the %SystemDrive% environment variable. Users can also view the system information through optionally generated HTML Web pages. DSA creates a merged log that makes it easy to identify cause-and-effect relationships between different log sources in the system. Optionally, DSA can also run diagnostics on the installed components. Two versions of DSA are available. The first, DSA Portable Edition, runs from the command prompt on a supported system without altering any system files or system settings. It expands to temporary space on the target system, runs, and deletes all intermediate files after execution completes. Its design and packaging allow it to collect system information in sensitive customer environments with only temporary use of system resources. The second, DSA Installable Edition, provides a permanent installation of DSA on a system and shares a similar command prompt interface with the Portable Edition. With DSA Installable Edition, you can get an UpdateXpress comparison analysis to verify whether your firmware and drivers are current.
  • {DESCRIPTION} {TRANSCRIPT} DSA Portable Edition is not installed on the target system. When run, it expands to a temporary directory, which is removed after DSA information collection is complete. DSA Portable Edition is designed to fit on removable media such as a CD or USB key; the removable media must be supported for use with the server on which you plan to run it. Installation of DSA requires 15 MB of disk space. DSA requires 50 to 100 MB of available memory during the data collection process; the exact amount depends on the size of the logs being collected from the system. To view the information collected by DSA, you must use Internet Explorer 6.0 with Service Pack 1 (or later), Mozilla 1.4.0 (or later) or Firefox 1.04 (or later). Displaying the DSA data in a Web browser requires 30 to 100 MB of available memory, depending on the size of the logs being viewed. At the time of writing, DSA is launched by running a single MS Windows executable file (ibm_utl_dsa_200p_windows_noarch.exe) or a Linux distribution-specific shell script. It supports various command-line options, and the command might vary according to the released version of the program. The -t option sends the collected data to IBM System x Service and Support; DSA uses File Transfer Protocol (FTP) to transfer the compressed XML output file to IBM Service. When DSA is run with the -v option, it creates a subdirectory containing HTML files that you can view with a Web browser.
  • {DESCRIPTION} {TRANSCRIPT} DSA Installable Edition can be run from the Start menu, from a command prompt or from a Linux shell prompt, and has the same requirements as the Portable Edition. The installation can be performed either manually or in unattended mode. Once installed, the DSA executable file is collectall.exe and, as with the Portable Edition, it supports command-line options. When the file is launched with the -u option, the user can specify a fully qualified path to an UpdateXpress CD or CD image. Alternatively, the -ul option instructs DSA to use the online UpdateXpress index. An additional command-line utility, rtdcli.exe, is included with DSA Installable Edition to provide more control over the execution of the diagnostic tests. When DSA is launched from the Start menu, the HTML files are generated automatically.
  • {DESCRIPTION} {TRANSCRIPT} IBM developed an embedded Preboot DSA (Dynamic System Analysis) diagnostics tool for the IBM System x3850 M2 and x3950 M2 because previous tools, such as PC Doctor, did not meet the requirements of the newer IBM System x systems. Preboot DSA is an NVRAM-based version of the Dynamic System Analysis tool that Technical Support teams use to collect system-level and component-level information, operating system driver information, hardware event logs from the various hardware components, and operating system event logs, in order to diagnose system problems. DSA and Preboot DSA collect information that can be viewed locally or uploaded to an IBM internal FTP server, so that Technical Support teams in any location can access it at any time if a deeper analysis of system state information or error logs is required. Preboot DSA is the primary method of testing the major components of the server. This image shows the options available from the main Preboot DSA interface. At this time, Preboot DSA is a feature of selected servers only.
  • {DESCRIPTION} {TRANSCRIPT} You can access Preboot DSA once the system reaches state 4 (completion of POST) by pressing the F2 key when the prompt is displayed on screen. By default, the system first displays the memory test menu; use the right arrow key to select Quit, then select ‘Quit to DSA’. The Preboot DSA diagnostic program might appear to be unresponsive for an unusual length of time (up to 10 minutes) when you start it. This is normal while the program loads. Power on all attached devices before powering on the server.
  • {DESCRIPTION} {TRANSCRIPT} Next, Preboot DSA displays a command main menu from which you can make the following selections by typing a command: gui takes you to the graphical environment; cmd offers various commands as options; copy copies DSA results to removable media; exit exits the program; and help is also available. If you type cmd, as shown in the second image, the IBM DSA Interactive menu is displayed, which allows you to collect and view various tests and results for your system components.
  • {DESCRIPTION} {TRANSCRIPT} This is a screen shot of the Diagnostics option. You can use this submenu to perform a variety of diagnostic tests on system hardware, such as CPU or memory stress tests.
  • {DESCRIPTION} {TRANSCRIPT} This is a screen shot of the System Information option. Use this submenu to obtain an overview of your system or your multinode partition.
  • {DESCRIPTION} {TRANSCRIPT} The advice above is not specific to any server; it applies to problem handling in general. Comparing the configuration and software setup of “working” and “non-working” systems will often lead to problem resolution. With the variety of hardware and software combinations that can be encountered, the following information can assist you in problem determination. If possible, have this information available when requesting assistance from Service Support and Engineering functions.
  • {DESCRIPTION} {TRANSCRIPT} Any time parts are removed to be reseated or replaced, or any cables are unplugged for any reason, the potential to disturb other components of the system is very high. Especially in multi-board systems, it is vital that code levels are matched, that is, that the code on each board can work with the code on the others. You “must” also run a full pass of diagnostics on the replaced part and then, if that passes, run a “quick test” on the entire system to make sure that no other errors were introduced during the maintenance activity. You should also run the Light Path Diagnostics LED test to ensure that the LEDs will be effective in indicating future errors.
  • {DESCRIPTION} {TRANSCRIPT} This slide presents a glossary of acronyms and terms used in this topic.
  • {DESCRIPTION} {TRANSCRIPT} Having completed this topic, you should be able to: identify basic troubleshooting questions to consider; identify the six possible states of a system; and identify the diagnostic tools that are available to gather and analyze information for each given system state.
  • {DESCRIPTION} This screen displays HTML links. {TRANSCRIPT} Listed are some additional resources that will help you learn more about IBM System x. IBM offers a rich library of resources on a variety of topics, from customized Web-based education to downloadable brochures, planning and installation guides on popular solutions, and material on maintaining IBM Systems.
  • {DESCRIPTION} Displays the statement of “End of Presentation” in the center of the slide. {TRANSCRIPT} Thank you!
  • Xtw01t11v0901 troubleshooting

    1. System x Basic Troubleshooting, XTW01 Topic 11
       IBM Systems & Technology Group Education & Sales Enablement. © 2006 IBM Corporation; © 2008 IBM Corporation.
       This presentation is intended for the education of IBM and Business Partner sales personnel. It should not be distributed to customers.
    2. Course Objectives
       At the completion of this topic, you should be able to:
       > Identify basic troubleshooting questions to consider
       > Identify the six possible states of a system
       > Identify diagnostic tools that are available to gather and analyze information for each given system state
    3. Topic 11 - Course Agenda
       > *IBM System x Troubleshooting Questions*
       > Six System States
       > Data Gathering Diagnostic Tools
         - Light Path Diagnostic
         - BMC, RSA and AMM
         - Dynamic System Analysis (DSA)
    4. Questions to Ask When Troubleshooting IBM System x Servers
       When working with problems on System x servers, consider asking the following questions:
       > Will the system power up?
       > Did it ever power up?
       > Is there a POST error message?
       > If yes, what is it?
       > Does the NOS load?
       > Are any error lights illuminated?
       > Is the BMC configured for remote access?
       > Is an RSA-II or AMM installed?
       > Can the log be captured for analysis?
    5. Topic 11 - Course Agenda
       > IBM System x Troubleshooting Questions
       > *Six System States*
       > Data Gathering Diagnostic Tools
         - Light Path Diagnostic
         - BMC, RSA and AMM
         - Dynamic System Analysis (DSA)
    6. IBM System x – Six States PD: Identifying the Six System States
       [Diagram: power-up progression AC → AC/DC → POST → NOS start → NOS complete → stop]
       System state #1 – There is no AC power
       System state #2 – There is AC power but there is no DC output
       System state #3 – There is both AC and DC power but the system fails to complete POST
       System state #4 – There is both AC and DC power, the system completes POST but the NOS fails to start loading
       System state #5 – There is both AC and DC power, the system completes POST but the NOS fails to complete loading
       System state #6 – There is both AC and DC power, the system completes POST and the NOS completes loading but stops during operation
    7. Information Gathering and Analysis Tools
       Information Gathering:
       > Eyes and ears
       > HMM and PDSG
       > Light Path diagnostics
       > BMC
       > RSA
       > Boot sequence options
         - F1 setup, F2 diagnostics
         - Adapter BIOS messages
       > NOS start-up messages
       > NOS failure messages
       > Dynamic System Analysis
       > NOS event logs
       Information Analysis:
       > HMM and PDSG
       > Light Path diagnostics
       > BIOS messages
         - Checkpoint codes
         - Adapter BIOS warnings
       > SVCCon, SMBridge, F1 setup and F2 diagnostics
         - Access BMC event logs
       > Web browser
         - Access RSA event logs
       > RETAIN tips
       > IBM Support Web site
       > DSA
    8. State 1 - No AC Power
       System state: 1. There is no AC power
       Data gathering: Visual
       Data analysis: PDSG/HMM
    9. State 2 - AC Power But No DC Output
       System state: 2. There is AC power but no DC output
       Data gathering: BMC; RSA and AMM; Light path
       Data analysis: SvcCon, SMBridge; RSA and AMM event log
    10. State 3 - System Fails To Complete POST
        System state: 3. There is AC and DC power but the system fails to complete POST
        Data gathering: Checkpoint codes; F1 and F2; Beep codes; Adapter BIOS msgs (Adaptec, LSI, etc.)
        Data analysis: PDSG; RETAIN tips; IBM support Web site
    11. State 4 - System Completes POST But NOS Fails To Start Loading
        System state: 4. There is AC and DC power, the system completes POST but the NOS fails to start loading
        Data gathering: ServeRAID Manager; F2 diagnostics
        Data analysis: PDSG; RETAIN tips; F2 diagnostics
    12. State 5 - NOS Fails To Complete Loading
        System state: 5. There is AC and DC power, the system completes POST but the NOS fails to complete loading
        Data gathering: NOS boot messages; ‘Blue screen’; ‘Safe’ mode
        Data analysis: NOS vendor messages
    13. State 6 - NOS Loads But Stops During Normal Operations
        System state: 6. There is AC and DC power, the system completes POST and the NOS completes loading but stops during operation
        Data gathering: DSA; NOS event logs
        Data analysis: DSA
    14. Gathering Information - Tip
        If multiple sources are available, look for confirmations:
        > Two sources pointing at the same probable cause increase confidence in the information
        > Two sources pointing at different probable causes reduce confidence in the information
          - Search for a third source to clarify the information being presented
    15. Analyzing Information - Tip
        Formal reference points are proven:
        > RETAIN tips are based on factual evidence from previous case histories
        > The PDSG is based on the collective knowledge of the system designers and senior support teams
        Guessing is NOT an option:
        > If the information is unclear, seek help
        Experience is very valuable:
        > Consult with team members if you are unsure of what the information is telling you
        > Offer guidance to less experienced co-workers
    16. Topic 11 - Course Agenda
        > IBM System x Troubleshooting Questions
        > Six System States
        > Data Gathering Diagnostic Tools
          - *Light Path Diagnostic*
          - BMC, RSA and AMM
          - Dynamic System Analysis (DSA)
    17. Light Path Diagnostics
        > Allows quick diagnosis of any type of server error
          - Introduced in 1998, now included in most System x, BladeCenter, and Blade Servers
        > Level 1 – Drop-down panel containing system status LEDs
          - LEDs that correspond to major server components
          - Includes Remind and Reset buttons
        > Level 2 – LED identifying suspect component
          - LEDs placed throughout server next to individual server components
          - Even without power to the server, can be used for up to 12 hours
        [Images: pop-out operator information panel; blade server front panel LEDs]
    18. Topic 11 - Course Agenda
        > IBM System x Troubleshooting Questions
        > Six System States
        > Data Gathering Diagnostic Tools
          - Light Path Diagnostic
          - *BMC, RSA and AMM*
          - Dynamic System Analysis (DSA)
    19. IBM Systems Management Hardware Portfolio
        Mini Baseboard Management Controller (Mini-BMC)
        • IPMI 1.5 compliant
        • Monitor voltages, temps, battery
        • Drive system LEDs except LightPath
        • Power control, system reset, and reboot
        • Used in value servers
        Baseboard Management Controller (BMC)
        • Same features as mini-BMC plus the following:
        • IPMI 1.5 or 2.0 compliant, depending on system
        • Serial over LAN (SOL)
        • Drives LightPath
        • On all but value servers
        Remote Supervisor Adapter
        • Web interface and full SSL and other security module integrations
        • LDAP integration for authentication
        • Remote KVM support
        • Remote disk support
        • DNS, DHCP, SNMP, SLP
        • Standard in select servers and optional for most other servers in portfolio
        BladeCenter Advanced Management Module
        • Web interface and full SSL and other security module integrations
        • LDAP integration for authentication
        • Remote KVM support
        • Remote disk support
        • DNS, DHCP, SNMP, SLP
        • USB Virtualization
        • With concurrent capable blade:
          - Concurrent KVM capable
          - Concurrent Remote Drive capable
          - Concurrent Media Tray capable
    20. Topic 11 - Course Agenda
        > IBM System x Troubleshooting Questions
        > Six System States
        > Data Gathering Diagnostic Tools
          - Light Path Diagnostic
          - BMC, RSA and AMM
          - *Dynamic System Analysis (DSA)*
    21. Dynamic System Analysis
        Product download page: http://www.ibm.com/systems/management/dsa.html
        DSA collects and analyzes information about various aspects of a system to aid in troubleshooting.
        Creates a merged log with all the retrieved information:
        > Compressed XML file for IBM Support personnel
        > Optionally, HTML pages can be created for all users
        Portable Edition:
        > Runs without altering target system
        > Removes any created temporary files
        Installable Edition:
        > Permanent
        > Integrates with UpdateXpress input to rapidly identify down-level firmware and drivers
        Analyzed components:
        > System configuration
        > Installed applications and hot fixes
        > Device drivers and system services
        > Network interfaces and settings
        > Performance data and details for running processes
        > Hardware inventory, including PCI information
        > Vital product data, firmware, and basic input/output system (BIOS) information
        > SCSI device sense data
        > EXA chipset uncorrectable error register information
        > ServeRAID configuration
        > Event logs for the operating system, applications, security, ServeRAID controllers, and service processors
    22. Dynamic System Analysis - Portable Edition
    23. Dynamic System Analysis - Installable Edition
    24. New Preboot Dynamic System Analysis
> Provides problem isolation, configuration analysis, and error log collection
> Primary method of testing the major server components
> Results can be viewed locally or uploaded to an internal FTP server
> Standard on System x and BladeCenter servers
    25. Preboot DSA - Access
> Press the F2 key during POST
> By default, this opens the IBM Memory Test
  - Select Quit to exit to DSA
> DSA can take up to 10 minutes to load
> Power on all attached devices before powering on the server
[Figure: Preboot DSA memory tests]
    26. Preboot DSA - Command Line
> Preboot DSA offers several options in a command-line menu system
> IBM DSA Interactive
  - Several command-line instructions are available
    27. Preboot DSA - Graphical Diagnostics
Selecting ‘Diagnostics’ from the main menu loads the diagnostic tests page.
    28. Preboot DSA - Graphical Interface
Select ‘System Information GUI’ to enter the graphical user menu.
    29. Problem Determination - Information Gathering
> Machine type and model
> Microprocessor or hard disk upgrades
> Failure symptom
  - Do diagnostics fail?
  - What, when, where; single or multiple systems?
  - Is the failure repeatable?
  - Has this configuration ever worked?
  - If it has been working, what changes were made before it failed?
  - Is this the originally reported failure?
> Diagnostics version: type and version level
> Hardware configuration
  - Print (print screen) the configuration currently in use
  - BIOS level
> Operating system software: type and version level
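The information-gathering checklist above maps naturally onto a record with one field per question, which makes it easy to see what is still unknown before escalating a problem. This is a hypothetical sketch: the class and field names are invented for illustration, and `None` is assumed to mean "not yet gathered".

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProblemReport:
    """Problem-determination data from the checklist (hypothetical layout)."""
    machine_type_model: Optional[str] = None
    upgrades: Optional[str] = None              # microprocessor or hard disk upgrades
    symptom: Optional[str] = None               # what, when, where
    diagnostics_fail: Optional[bool] = None
    multiple_systems: Optional[bool] = None
    repeatable: Optional[bool] = None
    ever_worked: Optional[bool] = None
    recent_changes: Optional[str] = None        # changes made before the failure
    original_failure: Optional[bool] = None
    diagnostics_version: Optional[str] = None
    configuration: Optional[str] = None         # printout of the configuration in use
    bios_level: Optional[str] = None
    os_version: Optional[str] = None

    def missing(self) -> List[str]:
        """Return the names of fields that have not been filled in yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


if __name__ == "__main__":
    report = ProblemReport(machine_type_model="hypothetical-MTM", repeatable=True)
    print(report.missing()[:3])  # ['upgrades', 'symptom', 'diagnostics_fail']
```

Boolean answers such as "Is the failure repeatable?" use `Optional[bool]` so that "no" (`False`) is distinguishable from "not asked yet" (`None`).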
    30. When Solving Problems
> When solving problems, especially those that involve a component replacement, ensure that you:
  - Apply code updates so that the code on all boards is at matched levels and provides a working system
  - Run the embedded diagnostics program to test the new component
  - Run a “quick test” on the entire system
  - Clear the BMC event log in readiness for any subsequent events
> The embedded diagnostics programs are the primary method of testing the major server components after parts replacement
> Event logs are limited in capacity
  - Once a problem has been resolved, clear the logs so that useful information can be captured if another fault occurs
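The post-replacement steps above are order-dependent: the event log is cleared only after updates and tests have run, so that the log starts fresh for the next fault. The sketch below encodes that order as data. Only the final step is mapped to a command, assuming the open-source `ipmitool` utility (not named in the slides): `ipmitool sel clear` clears the BMC System Event Log, and when run on the server itself with no `-H` option it reaches the local BMC through the operating system's IPMI driver. The step and function names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    description: str
    command: Optional[List[str]] = None  # None means a manual or tool-specific step


# Post-replacement checklist from the slide, in the order given.
# Only the last step has an assumed command (ipmitool, local in-band access).
POST_REPAIR_STEPS: List[Step] = [
    Step("Apply code updates so code levels match across all boards"),
    Step("Run the embedded diagnostics program on the new component"),
    Step("Run a quick test on the entire system"),
    Step("Clear the BMC event log", ["ipmitool", "sel", "clear"]),
]


def remaining(done: int) -> List[str]:
    """Return descriptions of the steps not yet completed, in order."""
    if not 0 <= done <= len(POST_REPAIR_STEPS):
        raise ValueError("done must be between 0 and the number of steps")
    return [s.description for s in POST_REPAIR_STEPS[done:]]


if __name__ == "__main__":
    for i, step in enumerate(POST_REPAIR_STEPS, start=1):
        cmd = " ".join(step.command) if step.command else "(manual)"
        print(f"{i}. {step.description}: {cmd}")
```

Keeping the checklist as ordered data, rather than as separate scripts, makes it hard to clear the log before the diagnostics have confirmed the repair.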
    31. Glossary of Terms
Advanced Management Module (AMM)
Baseboard Management Controller (BMC)
Common Information Model (CIM)
Dynamic System Analysis (DSA)
Intelligent Platform Management Interface (IPMI)
Light Path Diagnostics
Multiple processing (MP)
Problem Determination and Service Guide (PDSG)
Remote Supervisor Adapter (RSA) II
    32. Course Summary
Having completed this topic, you should be able to:
> Identify basic troubleshooting questions to consider
> Identify the six possible states of a system
> Identify diagnostic tools that are available to gather and analyze information for each given system state
    33. Additional Resources
IBM STG SMART Zone, for more education (webinars, web lectures, and so on):
> Internal: http://lt.be.ibm.com/smartzone/modulartechnical
> Business Partners (BP): http://www.ibm.com/services/weblectures/dlv/partnerworld
IBM System x
> http://www-03.ibm.com/systems/x/
IBM BladeCenter Chassis
> http://www-03.ibm.com/systems/bladecenter/
IBM BladeCenter Blade Servers
> http://www-03.ibm.com/systems/bladecenter/hardware/servers/index.html
IBM BladeCenter Redbooks
> http://www.redbooks.ibm.com/
IBM ServerProven
> http://www-03.ibm.com/servers/eserver/serverproven/compat/us/
IBM System x Support
> http://www-304.ibm.com/systems/support/supportsite.wss/brandmain?brandind=5000008
    34. End of Presentation
