© 2006 IBM Corporation
This presentation is intended for the education of IBM and Business Partner sales personnel. It should not be distributed to customers.
IBM Systems & Technology Group Education & Sales Enablement © 2008 IBM
Corporation
System x Basic Troubleshooting
XTW01
Topic 11
Course Objectives
At the completion of this topic, you should be able to:
> Identify basic troubleshooting questions to consider
> Identify the six possible states of a system
> Identify diagnostic tools that are available to gather and analyze information for
each given system state
> * IBM System x Troubleshooting Questions *
> Six System States
> Data Gathering Diagnostic Tools
  - Light Path Diagnostic
  - BMC, RSA and AMM
  - Dynamic System Analysis (DSA)
Topic 11 - Course Agenda
When working with problems on the System x servers, consider asking the
following questions:
> Will the system power up?
> Did it ever power up?
> Is there a POST error message?
> If yes, what is it?
> Does the NOS load?
> Are any error lights illuminated?
> Is the BMC configured for remote access?
> Is an RSA II or AMM installed?
> Can the log be captured for analysis?
Questions To Ask
Troubleshooting IBM System x Servers
> IBM System x Troubleshooting Questions
> * Six System States *
> Data Gathering Diagnostic Tools
  - Light Path Diagnostic
  - BMC, RSA and AMM
  - Dynamic System Analysis (DSA)
Topic 11 - Course Agenda
[Diagram: power-on sequence stages AC, AC/DC, POST, NOS (Start, Complete, Stop)]
System state #1 - There is no AC power
System state #2 - There is AC power but there is no DC output
System state #3 - There is both AC and DC power but the system fails to complete POST
System state #4 - There is both AC and DC power, the system completes POST but the NOS fails to start loading
System state #5 - There is both AC and DC power, the system completes POST but the NOS fails to complete loading
System state #6 - There is both AC and DC power, the system completes POST and the NOS completes loading but stops during operation
> Identifying the Six System States
IBM System x – Six States PD
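The six states form an ordered sequence, so a system can be placed in a state by finding the first stage that did not succeed. The following is a minimal sketch of that idea, not an IBM tool; the class, field, and function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class SystemState(IntEnum):
    """The six system states, numbered as on the slide."""
    NO_AC_POWER = 1
    NO_DC_OUTPUT = 2
    POST_INCOMPLETE = 3
    NOS_NOT_STARTED = 4
    NOS_NOT_LOADED = 5
    STOPS_DURING_OPERATION = 6


@dataclass
class Observations:
    # Hypothetical field names: each is a yes/no answer from the technician.
    ac_power: bool
    dc_output: bool
    post_completed: bool
    nos_started_loading: bool
    nos_completed_loading: bool


def classify(obs: Observations) -> SystemState:
    """Return the state matching the first stage of the sequence that failed."""
    if not obs.ac_power:
        return SystemState.NO_AC_POWER
    if not obs.dc_output:
        return SystemState.NO_DC_OUTPUT
    if not obs.post_completed:
        return SystemState.POST_INCOMPLETE
    if not obs.nos_started_loading:
        return SystemState.NOS_NOT_STARTED
    if not obs.nos_completed_loading:
        return SystemState.NOS_NOT_LOADED
    # Every stage succeeded, so the reported fault occurred during operation.
    return SystemState.STOPS_DURING_OPERATION
```

For example, classify(Observations(True, True, True, False, False)) returns SystemState.NOS_NOT_STARTED, which corresponds to state #4.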
Information Gathering and Analysis Tools
Information Gathering:
> Eyes and ears
> HMM and PDSG
> Light Path diagnostics
> BMC
> RSA
> Boot sequence options
  - F1 setup, F2 diagnostics
  - Adapter BIOS messages
> NOS start-up messages
> NOS failure messages
> Dynamic System Analysis
> NOS event logs
Information Analysis:
> HMM and PDSG
> Light Path diagnostics
> BIOS messages
  - Checkpoint codes
  - Adapter BIOS warnings
> SvcCon, SMBridge, F1 setup and F2 diagnostics
  - Access BMC event logs
> Web browser
  - Access RSA event logs
> RETAIN tips
> IBM Support Web site
> DSA
State 1 - No AC Power
System state: There is no AC power
Data gathering: Visual
Data analysis: PDSG/HMM

State 2 - AC Power But No DC Output
System state: There is AC power but no DC output
Data gathering: BMC; RSA and AMM; Light path
Data analysis: SvcCon, SMBridge; RSA and AMM event log

State 3 - System Fails To Complete POST
System state: There is AC and DC power but the system fails to complete POST
Data gathering: Checkpoint codes; F1 and F2; Beep codes; Adapter BIOS messages (Adaptec, LSI, etc.)
Data analysis: PDSG; RETAIN tips; IBM support Web site

State 4 - System Completes POST But NOS Fails To Start Loading
System state: There is AC and DC power, the system completes POST but the NOS fails to start loading
Data gathering: ServeRAID Manager; F2 diagnostics
Data analysis: PDSG; RETAIN tips; F2 diagnostics

State 5 - NOS Fails To Complete Loading
System state: There is AC and DC power, the system completes POST but the NOS fails to complete loading
Data gathering: NOS boot messages; 'Blue screen'; 'Safe' mode
Data analysis: NOS vendor messages

State 6 - NOS Loads But Stops During Normal Operations
System state: There is AC and DC power, the system completes POST and the NOS completes loading but stops during operation
Data gathering: DSA; NOS event logs
Data analysis: DSA
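The state-by-state tool table can be sketched as a simple lookup keyed by state number. This is an illustrative sketch only: the names Tools, TOOLS_BY_STATE, and tools_for are hypothetical, and the split of each row into gathering and analysis columns is an assumption read from the flattened slide layout.

```python
from typing import Dict, NamedTuple, Tuple


class Tools(NamedTuple):
    gathering: Tuple[str, ...]
    analysis: Tuple[str, ...]


# Assumption: column assignment inferred from the slide tables.
TOOLS_BY_STATE: Dict[int, Tools] = {
    1: Tools(("Visual",), ("PDSG/HMM",)),
    2: Tools(("BMC", "RSA and AMM", "Light path"),
             ("SvcCon, SMBridge", "RSA and AMM event log")),
    3: Tools(("Checkpoint codes", "F1 and F2", "Beep codes",
              "Adapter BIOS messages"),
             ("PDSG", "RETAIN tips", "IBM support Web site")),
    4: Tools(("ServeRAID Manager", "F2 diagnostics"),
             ("PDSG", "RETAIN tips", "F2 diagnostics")),
    5: Tools(("NOS boot messages", "'Blue screen'", "'Safe' mode"),
             ("NOS vendor messages",)),
    6: Tools(("DSA", "NOS event logs"), ("DSA",)),
}


def tools_for(state: int) -> Tools:
    """Look up the suggested tools for a system state (1 to 6)."""
    try:
        return TOOLS_BY_STATE[state]
    except KeyError:
        raise ValueError(f"unknown system state: {state}") from None
```

Calling tools_for(3).analysis, for instance, lists the references to consult when the system fails to complete POST.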
Gathering Information - Tip
If multiple sources are available, look for confirmations
> Two sources pointing at the same probable cause increase confidence in the information
> Two sources pointing at different probable causes reduce confidence in the information
  - Search for a third source to clarify the information being presented
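The confirmation tip can be sketched as a small tally of which probable cause each source points at. The function name assess and the status labels are hypothetical. Treating a cause named by more sources than any other as the "majority" result is an assumption of this sketch; the slide only says to look for a third source when two disagree.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def assess(causes_by_source: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Summarize agreement between sources.

    causes_by_source maps a source name (for example "Light path") to the
    probable cause that source suggests.
    """
    counts = Counter(causes_by_source.values())
    if not counts:
        return ("no-data", None)
    ranked = counts.most_common()
    top_cause, top_votes = ranked[0]
    if len(ranked) == 1:
        # All sources agree; one source alone is not yet a confirmation.
        status = "confirmed" if top_votes >= 2 else "single-source"
        return (status, top_cause)
    # Sources disagree: only report a cause if it clearly leads (assumption).
    if top_votes > ranked[1][1]:
        return ("majority", top_cause)
    return ("conflict", None)
```

A "conflict" result is the cue to search for another source before acting.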
Analyzing Information - Tip
Formal reference points are proven
> RETAIN tips are based on factual evidence from previous case histories
> The PDSG is based on the collective knowledge of the system designers and senior
support teams
Guessing is NOT an option
> If the information is unclear, seek help
Experience is very valuable
> Consult with team members if you are unsure of what the information is telling you
> Offer guidance to less experienced co-workers
> IBM System x Troubleshooting Questions
> Six System States
> Data Gathering Diagnostic Tools
  - * Light Path Diagnostic *
  - BMC, RSA and AMM
  - Dynamic System Analysis (DSA)
Topic 11 - Course Agenda
Light Path Diagnostics
> Allows quick diagnosis of any type of server error
  - Introduced in 1998, now included in most System x, BladeCenter, and Blade Servers
> Level 1 – Drop-down panel containing system status LEDs
  - LEDs that correspond to major server components
  - Includes Remind and Reset buttons
> Level 2 – LED identifying suspect component
  - LEDs placed throughout server next to individual server components
  - Even without power to server, can be used for up to 12 hours
[Images: pop-out Operator Information Panel; blade server front panel LEDs]
> IBM System x Troubleshooting Questions
> Six System States
> Data Gathering Diagnostic Tools
  - Light Path Diagnostic
  - * BMC, RSA and AMM *
  - Dynamic System Analysis (DSA)
Topic 11 - Course Agenda
IBM Systems Management Hardware Portfolio
Mini Baseboard Management Controller (Mini-BMC)
• IPMI 1.5 compliant
• Monitors voltages, temperatures, battery
• Drives system LEDs except Light Path
• Power control, system reset, and reboot
• Used in value servers
Baseboard Management Controller (BMC)
• Same features as Mini-BMC plus the following:
• IPMI 1.5 or 2.0 compliant, depending on system
• Serial over LAN (SOL)
• Drives Light Path
• On all but value servers
Remote Supervisor Adapter (RSA)
• Web interface and full SSL and other security module integrations
• LDAP integration for authentication
• Remote KVM support
• Remote disk support
• DNS, DHCP, SNMP, SLP
• Standard in select servers and optional for most other servers in portfolio
BladeCenter Advanced Management Module (AMM)
• Web interface and full SSL and other security module integrations
• LDAP integration for authentication
• Remote KVM support
• Remote disk support
• DNS, DHCP, SNMP, SLP
• USB virtualization
• With concurrent capable blade
• Concurrent KVM capable
• Concurrent Remote Drive capable
• Concurrent Media Tray capable
> IBM System x Troubleshooting Questions
> Six System States
> Data Gathering Diagnostic Tools
  - Light Path Diagnostic
  - BMC, RSA and AMM
  - * Dynamic System Analysis (DSA) *
Topic 11 - Course Agenda
Product download page:
http://www.ibm.com/systems/management/dsa.html
Dynamic System Analysis
DSA collects and analyzes
information about various aspects
of a system to aid in troubleshooting
Creates a merged log with all the
retrieved information
> Compressed XML file for IBM Support
personnel
> Optionally, HTML pages can be created
for all users
Portable Edition
> Runs without altering target system
> Removes any created temporary files
Installable Edition
> Permanent
> Integrates with UpdateXpress input to
rapidly identify down-level firmware and
drivers
Analysed components:
> System configuration
> Installed applications and hot fixes
> Device drivers and system services
> Network interfaces and settings
> Performance data and details for
running processes
> Hardware inventory, including PCI
information
> Vital product data, firmware, and
basic input/output system (BIOS)
information
> SCSI device sense data
> EXA chipset uncorrectable error
register information
> ServeRAID configuration
> Event logs for the operating system,
applications, security, ServeRAID
controllers, and service processors
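As a rough illustration of working with a compressed XML log, the sketch below counts elements by tag name using only the Python standard library. It assumes the file is gzip-compressed, which the slide does not state (it says only "compressed XML"), and it assumes nothing about the DSA schema. The function name tag_counts is hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(path: str) -> Counter:
    """Count XML elements by tag name in a gzip-compressed XML file."""
    counts: Counter = Counter()
    with gzip.open(path, "rb") as fh:
        # iterparse streams the document, so a large merged log
        # does not have to be held in memory all at once.
        for _event, elem in ET.iterparse(fh, events=("end",)):
            counts[elem.tag] += 1
            elem.clear()  # release children that have already been counted
    return counts
```

A quick tally like this only shows what kinds of records a log contains; interpreting them would still require the log's actual schema.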
Dynamic System Analysis - Portable Edition
Dynamic System Analysis - Installable Edition
> Provide problem isolation,
configuration analysis,
error log collection
> Primary method of testing
the major components
> Viewed locally or
uploaded to an internal
FTP server
> Standard for System x and
BladeCenter servers
New Preboot Dynamic System Analysis
> Press the F2 key during POST
> By default, it takes you to the IBM Memory Test
  - Select Quit to exit to DSA
> Can take up to 10 minutes to load
> Power on all attached devices before powering on the server
[Image: Preboot DSA memory tests]
Preboot DSA - Access
> Preboot DSA offers several
options in a command line
menu system
> IBM DSA Interactive
  - Several command line instructions are available
Preboot DSA - Command Line
Selecting ‘Diagnostics’ from the main menu will load the diagnostic tests
page
Preboot DSA - Graphical Diagnostics
Preboot DSA - Graphical Interface
Select System Information GUI to enter the Graphical User Menu
Problem Determination - Information Gathering
> Machine type and model
> Microprocessor or hard disk upgrades
> Failure symptom
  - Do diagnostics fail?
  - What, when, where, single, or multiple systems?
  - Is the failure repeatable?
  - Has this configuration ever worked?
  - If it has been working, what changes were made prior to it failing?
  - Is this the original reported failure?
> Diagnostics: type and version level
> Hardware configuration
  - Print (print screen) configuration currently in use
  - BIOS level
> Operating system software: type and version level
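The information-gathering list above can be sketched as a record with one field per item, plus a helper that reports what has not been collected yet. This is an illustrative structure only; the class and field names are hypothetical and are not taken from any IBM form or tool.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ProblemReport:
    # None means "not yet collected".
    machine_type_model: Optional[str] = None
    upgrades: List[str] = field(default_factory=list)  # CPU or disk upgrades
    failure_symptom: Optional[str] = None
    diagnostics_fail: Optional[bool] = None
    failure_repeatable: Optional[bool] = None
    ever_worked: Optional[bool] = None
    recent_changes: Optional[str] = None
    is_original_failure: Optional[bool] = None
    diagnostics_version: Optional[str] = None
    hardware_configuration: Optional[str] = None
    bios_level: Optional[str] = None
    operating_system: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of fields still set to None (an empty list counts as answered)."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

Checking missing() before escalating a case is one way to make sure no item on the list was skipped.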
> When solving problems, especially ones that involve a component replacement, ensure the following:
  - Apply code updates so that code levels match across all boards and the system works as a whole
  - Run the embedded diagnostics program to test the new component
  - Run a “quick test” on the entire system
  - Clear the BMC event log in readiness for any subsequent events
> The embedded diagnostics programs are the primary method of testing the major components of the server following parts replacement
> Event logs are limited in capacity
  - Once a problem has been resolved, clear the logs so that useful information can be captured, should another fault occur
When Solving Problems
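The reason for clearing a limited-capacity log can be illustrated with a toy model. The sketch below assumes a log that simply rejects new entries once it is full; that behavior is an assumption made for illustration, and real BMC firmware may handle a full log differently. The class name BoundedEventLog is hypothetical.

```python
from typing import List


class BoundedEventLog:
    """Toy fixed-capacity log that rejects new entries once full (assumed behavior)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.entries: List[str] = []
        self.dropped = 0  # events lost because the log was full

    def record(self, event: str) -> bool:
        """Store the event and return True, or return False if the log is full."""
        if len(self.entries) >= self.capacity:
            self.dropped += 1
            return False
        self.entries.append(event)
        return True

    def clear(self) -> None:
        """Empty the log after a problem is resolved, making room for new events."""
        self.entries.clear()
        self.dropped = 0
```

In this model, a log left full of old entries from a resolved fault would miss the first events of the next fault, which is the situation the slide warns against.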
Advanced Management Module (AMM)
Baseboard Management Controller (BMC)
Common Information Model (CIM)
Dynamic System Analysis (DSA)
Intelligent Platform Management Interface
(IPMI)
Light Path Diagnostic
Multiple processing (MP)
Problem Determination and Service Guide
(PDSG)
Remote Supervisor Adapter (RSA) II
Glossary of terms
Course Summary
Having completed this topic, you should be able to:
> Identify basic troubleshooting questions to consider
> Identify the six possible states of a system
> Identify diagnostic tools that are available to gather and analyze
information for each given system state
Additional Resources
IBM STG SMART Zone for more education (webinars, Web lectures, etc.):
> Internal: http://lt.be.ibm.com/smartzone/modulartechnical
> BP: http://www.ibm.com/services/weblectures/dlv/partnerworld
IBM System x
> http://www-03.ibm.com/systems/x/
IBM BladeCenter Chassis
> http://www-03.ibm.com/systems/bladecenter/
IBM BladeCenter Blade Servers
> http://www-03.ibm.com/systems/bladecenter/hardware/servers/index.html
IBM BladeCenter Redbooks
> http://www.redbooks.ibm.com/
IBM ServerProven
> http://www-03.ibm.com/servers/eserver/serverproven/compat/us/
IBM System x Support
> http://www-304.ibm.com/systems/support/supportsite.wss/brandmain?brandind=5000008
End of Presentation

How The Hustle Milestone Referral Program Got 300K Subscribers
 
Record of Module Forensic photography in
Record of Module Forensic photography inRecord of Module Forensic photography in
Record of Module Forensic photography in
 
Streamlining Your Accounting A Guide to QuickBooks Migration Tools.pptx
Streamlining Your Accounting A Guide to QuickBooks Migration Tools.pptxStreamlining Your Accounting A Guide to QuickBooks Migration Tools.pptx
Streamlining Your Accounting A Guide to QuickBooks Migration Tools.pptx
 
Shopclues: Failure & Solutions in Business Model
Shopclues: Failure & Solutions in Business ModelShopclues: Failure & Solutions in Business Model
Shopclues: Failure & Solutions in Business Model
 
"InShorts: A Game-Changer in the Digital News Age"
"InShorts: A Game-Changer in the Digital News Age""InShorts: A Game-Changer in the Digital News Age"
"InShorts: A Game-Changer in the Digital News Age"
 
Project Work on Consumer Behavior in Fast Food Restaurants. Their behavior to...
Project Work on Consumer Behavior in Fast Food Restaurants. Their behavior to...Project Work on Consumer Behavior in Fast Food Restaurants. Their behavior to...
Project Work on Consumer Behavior in Fast Food Restaurants. Their behavior to...
 
Unleashing the Power of Fandom: A Short Guide to Fan Business
Unleashing the Power of Fandom: A Short Guide to Fan BusinessUnleashing the Power of Fandom: A Short Guide to Fan Business
Unleashing the Power of Fandom: A Short Guide to Fan Business
 
unfinished legacy it is a clothing brand
unfinished legacy it is a clothing brandunfinished legacy it is a clothing brand
unfinished legacy it is a clothing brand
 
The Smart Bridge Interview now Veranda Learning
The Smart Bridge Interview now Veranda LearningThe Smart Bridge Interview now Veranda Learning
The Smart Bridge Interview now Veranda Learning
 
Pitch Deck Teardown: SuperScale's $5.4M Series A deck
Pitch Deck Teardown: SuperScale's $5.4M Series A deckPitch Deck Teardown: SuperScale's $5.4M Series A deck
Pitch Deck Teardown: SuperScale's $5.4M Series A deck
 
WAM Corporate Presentation Mar 12 2024.pdf
WAM Corporate Presentation Mar 12 2024.pdfWAM Corporate Presentation Mar 12 2024.pdf
WAM Corporate Presentation Mar 12 2024.pdf
 
The 10 Most Influential Women Making Difference In 2024.pdf
The 10 Most Influential Women Making Difference In 2024.pdfThe 10 Most Influential Women Making Difference In 2024.pdf
The 10 Most Influential Women Making Difference In 2024.pdf
 
NewBase 14 March 2024 Energy News issue - 1707 by Khaled Al Awadi_compress...
NewBase  14 March  2024  Energy News issue - 1707 by Khaled Al Awadi_compress...NewBase  14 March  2024  Energy News issue - 1707 by Khaled Al Awadi_compress...
NewBase 14 March 2024 Energy News issue - 1707 by Khaled Al Awadi_compress...
 
NVIDIA's overall business overview Presentation.pptx
NVIDIA's overall business overview Presentation.pptxNVIDIA's overall business overview Presentation.pptx
NVIDIA's overall business overview Presentation.pptx
 
Reframing Requirements: A Strategic Approach to Requirement Definition, with ...
Reframing Requirements: A Strategic Approach to Requirement Definition, with ...Reframing Requirements: A Strategic Approach to Requirement Definition, with ...
Reframing Requirements: A Strategic Approach to Requirement Definition, with ...
 
We are inviting you on board, to move forward together in the Right Direction
We are inviting you on board, to move forward together in the Right DirectionWe are inviting you on board, to move forward together in the Right Direction
We are inviting you on board, to move forward together in the Right Direction
 

Xtw01t11v0901 troubleshooting

  • 1. © 2006 IBM Corporation This presentation is intended for the education of IBM and Business Partner sales personnel. It should not be distributed to customers. IBM Systems & Technology Group Education & Sales Enablement © 2008 IBM Corporation System x Basic Troubleshooting XTW01 Topic 11
  • 2. Course Objectives
    At the completion of this topic, you should be able to:
    > Identify basic troubleshooting questions to consider
    > Identify the six possible states of a system
    > Identify diagnostic tools that are available to gather and analyze information for each given system state
  • 3. Topic 11 - Course Agenda
    > * IBM System x Troubleshooting Questions *
    > Six System States
    > Data Gathering Diagnostic Tools
      - Light Path Diagnostic
      - BMC, RSA and AMM
      - Dynamic System Analysis (DSA)
  • 4. Troubleshooting IBM System x Servers: Questions To Ask
    When working with problems on System x servers, consider asking the following questions:
    > Will the system power up?
    > Did it ever power up?
    > Is there a POST error message?
    > If yes, what is it?
    > Does the NOS load?
    > Are any error lights illuminated?
    > Is the BMC configured for remote access?
    > Are the RSA-II and AMM installed?
    > Can the log be captured for analysis?
  • 5. Topic 11 - Course Agenda
    > IBM System x Troubleshooting Questions
    > * Six System States *
    > Data Gathering Diagnostic Tools
      - Light Path Diagnostic
      - BMC, RSA and AMM
      - Dynamic System Analysis (DSA)
  • 6. IBM System x - Six States PD: Identifying the Six System States
    Diagram labels: AC, AC/DC, POST, NOS, Start, Complete, Stop
    > System state #1 - There is no AC power
    > System state #2 - There is AC power but there is no DC output
    > System state #3 - There is both AC and DC power but the system fails to complete POST
    > System state #4 - There is both AC and DC power, the system completes POST but the NOS fails to start loading
    > System state #5 - There is both AC and DC power, the system completes POST but the NOS fails to complete loading
    > System state #6 - There is both AC and DC power, the system completes POST and the NOS completes loading but stops during operation
  • 7. Information Gathering and Analysis Tools
    Information Gathering:
    > Eyes and ears
    > HMM and PDSG
    > Light Path diagnostics
    > BMC
    > RSA
    > Boot sequence options
      - F1 setup, F2 diagnostics
      - Adapter BIOS messages
    > NOS start-up messages
    > NOS failure messages
    > Dynamic System Analysis
    > NOS event logs
    Information Analysis:
    > HMM and PDSG
    > Light Path diagnostics
    > BIOS messages
      - Checkpoint codes
      - Adapter BIOS warnings
    > SVCCon, SMBridge, F1 setup and F2 diagnostics
      - Access BMC event logs
    > Web browser
      - Access RSA event logs
    > RETAIN tips
    > IBM Support Web site
    > DSA
  • 8. State 1 - No AC Power
    System state: 1. There is no AC power
    Data gathering: Visual
    Data analysis: PDSG/HMM
  • 9. State 2 - AC Power But No DC Output
    System state: 2. There is AC power but no DC output
    Data gathering: BMC; RSA and AMM; Light path
    Data analysis: SvcCon, SMBridge; RSA and AMM event log
  • 10. State 3 - System Fails To Complete POST
    System state: 3. There is AC and DC power but the system fails to complete POST
    Data gathering: Checkpoint codes; F1 and F2; Beep codes; Adapter BIOS msgs (Adaptec, LSI, etc.)
    Data analysis: PDSG; RETAIN tips; IBM support Web site
  • 11. State 4 - System Completes POST But NOS Fails To Start Loading
    System state: 4. There is AC and DC power, the system completes POST but the NOS fails to start loading
    Data gathering: ServeRAID Manager; F2 diagnostics
    Data analysis: PDSG; RETAIN tips; F2 diagnostics
  • 12. State 5 - NOS Fails To Complete Loading
    System state: 5. There is AC and DC power, the system completes POST but the NOS fails to complete loading
    Data gathering: NOS boot messages; 'Blue screen'; 'Safe' mode
    Data analysis: NOS vendor messages
  • 13. State 6 - NOS Loads But Stops During Normal Operations
    System state: 1. There is no AC power
    Data gathering: Visual
    Data analysis: PDSG/HMM
    System state: 2. There is AC power but no DC output
    Data gathering: BMC; RSA and AMM; Light path
    Data analysis: SvcCon, SMBridge; RSA and AMM event log
    System state: 3. There is AC and DC power but the system fails to complete POST
    Data gathering: Checkpoint codes; F1 and F2; Beep codes; Adapter BIOS msgs (Adaptec, LSI, etc.)
    Data analysis: PDSG; RETAIN tips; IBM support Web site
    System state: 4. There is AC and DC power, the system completes POST but the NOS fails to start loading
    Data gathering: ServeRAID Manager; F2 diagnostics
    Data analysis: PDSG; RETAIN tips
    System state: 5. There is AC and DC power, the system completes POST but the NOS fails to complete loading
    Data gathering: NOS boot messages; 'Blue screen'; 'Safe' mode
    Data analysis: NOS vendor messages
    System state: 6. There is AC and DC power, the system completes POST and the NOS completes loading but stops during operation
    Data gathering: DSA; NOS event logs
    Data analysis: DSA
  • 14. Gathering Information - Tip
    If multiple sources are available, look for confirmations
    > Two sources pointing at the same probable cause increase confidence in the information
    > Two sources pointing at different probable causes reduce confidence in the information
      - Search for a third source to clarify the information being presented
  • 15. Analyzing Information - Tip
    Formal reference points are proven
    > RETAIN tips are based on factual evidence from previous case histories
    > The PDSG is based on the collective knowledge of the system designers and senior support teams
    Guessing is NOT an option
    > If the information is unclear, seek help
    Experience is very valuable
    > Consult with team members if you are unsure of what the information is telling you
    > Offer guidance to less experienced co-workers
  • 16. Topic 11 - Course Agenda
    > IBM System x Troubleshooting Questions
    > Six System States
    > Data Gathering Diagnostic Tools
      - * Light Path Diagnostic *
      - BMC, RSA and AMM
      - Dynamic System Analysis (DSA)
  • 17. Light Path Diagnostics
    > Allows quick diagnosis of any type of server error
      - Introduced in 1998, now included in most System x, BladeCenter, and Blade Servers
    > Level 1 - Drop-down panel containing system status LEDs
      - LEDs that correspond to major server components
      - Includes Remind and Reset buttons
    > Level 2 - LED identifying suspect component
      - LEDs placed throughout server next to individual server components
      - Even without power to server, can be used for up to 12 hours
    Image labels: Pop out Operator Information Panel; Blade server Front Panel LEDs
  • 18. Topic 11 - Course Agenda
    > IBM System x Troubleshooting Questions
    > Six System States
    > Data Gathering Diagnostic Tools
      - Light Path Diagnostic
      - * BMC, RSA and AMM *
      - Dynamic System Analysis (DSA)
  • 19. IBM Systems Management Hardware Portfolio
    Mini-BMC (Mini Baseboard Management Controller)
    > IPMI 1.5 compliant
    > Monitors voltages, temps, battery
    > Drives system LEDs except LightPath
    > Power control, system reset, and reboot
    > Used in value servers
    BMC (Baseboard Management Controller)
    > Same features as mini-BMC plus the following:
    > IPMI 1.5 or 2.0 compliant, depending on system
    > Serial over LAN (SOL)
    > Drives LightPath
    > On all but value servers
    Remote Supervisor Adapter
    > Web interface and full SSL and other security module integrations
    > LDAP integration for authentication
    > Remote KVM support
    > Remote disk support
    > DNS, DHCP, SNMP, SLP
    > Standard in select servers and optional for most other servers in portfolio
    BladeCenter Advanced Management Module
    > Web interface and full SSL and other security module integrations
    > LDAP integration for authentication
    > Remote KVM support
    > Remote disk support
    > DNS, DHCP, SNMP, SLP
    > USB Virtualization
    > With concurrent capable blade
      - Concurrent KVM capable
      - Concurrent Remote Drive capable
      - Concurrent Media Tray capable
  • 20. Topic 11 - Course Agenda
    > IBM System x Troubleshooting Questions
    > Six System States
    > Data Gathering Diagnostic Tools
      - Light Path Diagnostic
      - BMC, RSA and AMM
      - * Dynamic System Analysis (DSA) *
  • 21. Dynamic System Analysis
    Product download page: http://www.ibm.com/systems/management/dsa.html
    DSA collects and analyzes information about various aspects of a system to aid in troubleshooting
    Creates a merged log with all the retrieved information
    > Compressed XML file for IBM Support personnel
    > Optionally, HTML pages can be created for all users
    Portable Edition
    > Runs without altering target system
    > Removes any created temporary files
    Installable Edition
    > Permanent
    > Integrates with UpdateXpress input to rapidly identify down-level firmware and drivers
    Analyzed components:
    > System configuration
    > Installed applications and hot fixes
    > Device drivers and system services
    > Network interfaces and settings
    > Performance data and details for running processes
    > Hardware inventory, including PCI information
    > Vital product data, firmware, and basic input/output system (BIOS) information
    > SCSI device sense data
    > EXA chipset uncorrectable error register information
    > ServeRAID configuration
    > Event logs for the operating system, applications, security, ServeRAID controllers, and service processors
  • 22. Dynamic System Analysis - Portable Edition
  • 23. Dynamic System Analysis - Installable Edition
  • 24. New Preboot Dynamic System Analysis
    > Provide problem isolation, configuration analysis, error log collection
    > Primary method of testing the major components
    > Viewed locally or uploaded to an internal FTP server
    > Standard for System x and BladeCenter servers
  • 25. Preboot DSA - Access
    > Press F2 key during POST
    > By default, it takes you to the IBM Memory Test
      - Select Quit to exit to DSA
    > Can take up to 10 minutes to load
    > Power on all attached devices before powering on the server
    Image label: Preboot DSA memory tests
  • 26. Preboot DSA - Command Line
    > Preboot DSA offers several options in a command line menu system
    > IBM DSA Interactive
      - Several command line instructions are available
  • 27. Preboot DSA - Graphical Diagnostics
    Selecting 'Diagnostics' from the main menu will load the diagnostic tests page
  • 28. Preboot DSA - Graphical Interface
    Select System Information GUI to enter the Graphical User Menu
  • 29. Problem Determination - Information Gathering
    > Machine type and model
    > Microprocessor or hard disk upgrades
    > Failure symptom
      - Do diagnostics fail?
      - What, when, where, single, or multiple systems?
      - Is the failure repeatable?
      - Has this configuration ever worked?
      - If it has been working, what changes were made prior to it failing?
      - Is this the original reported failure?
    > Diagnostics version: type and version level
    > Hardware configuration
      - Print (print screen) configuration currently in use
      - BIOS level
    > Operating system software: type and version level
  • 30. When Solving Problems
    > When solving problems, especially ones that involve a component replacement, ensure the following:
      - Apply code updates so that code levels match across all boards and provide a working system
      - Run the embedded diagnostics program to test the new component
      - Run a "quick test" on the entire system
      - Clear the BMC event log in readiness for any subsequent events
    > The embedded diagnostics programs are the primary method of testing the major components of the server following parts replacement
    > Event logs are limited in capacity
      - Once a problem has been resolved, clear the logs so that useful information can be captured, should another fault occur
  • 31. Glossary of terms
    > Advanced Management Module (AMM)
    > Baseboard Management Controller (BMC)
    > Common Information Model (CIM)
    > Dynamic System Analysis (DSA)
    > Intelligent Platform Management Interface (IPMI)
    > Light Path Diagnostic
    > Multiple processing (MP)
    > Problem Determination and Service Guide (PDSG)
    > Remote Supervisor Adapter (RSA) II
  • 32. Course Summary
    Having completed this topic, you should be able to:
    > Identify basic troubleshooting questions to consider
    > Identify the six possible states of a system
    > Identify diagnostic tools that are available to gather and analyze information for each given system state
  • 33. Additional Resources
    IBM STG SMART Zone, for more education through webinars, Web lectures, and so on:
    > Internal: http://lt.be.ibm.com/smartzone/modulartechnical
    > BP: http://www.ibm.com/services/weblectures/dlv/partnerworld
    IBM System x
    > http://www-03.ibm.com/systems/x/
    IBM BladeCenter Chassis
    > http://www-03.ibm.com/systems/bladecenter/
    IBM BladeCenter Blade Servers
    > http://www-03.ibm.com/systems/bladecenter/hardware/servers/index.html
    IBM BladeCenter Redbooks
    > http://www.redbooks.ibm.com/
    IBM ServerProven
    > http://www-03.ibm.com/servers/eserver/serverproven/compat/us/
    IBM System x Support
    > http://www-304.ibm.com/systems/support/supportsite.wss/brandmain?brandind=5000008
  • 34. End of Presentation
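
The six system states on slide 6 can be read as an ordered series of checks: the first check that fails gives the state number. The following is a minimal Python sketch of that reading, not an IBM tool. The `Observation` fields and the `system_state` function are hypothetical names chosen for illustration, and returning 0 for "no fault observed" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What a technician has observed so far (hypothetical field names)."""
    ac_power: bool
    dc_output: bool
    post_completed: bool
    nos_started_loading: bool
    nos_completed_loading: bool
    still_running: bool


def system_state(obs: Observation) -> int:
    """Return the slide 6 state number (1-6), or 0 if no fault is observed."""
    if not obs.ac_power:
        return 1  # no AC power
    if not obs.dc_output:
        return 2  # AC power but no DC output
    if not obs.post_completed:
        return 3  # fails to complete POST
    if not obs.nos_started_loading:
        return 4  # POST completes, NOS fails to start loading
    if not obs.nos_completed_loading:
        return 5  # NOS starts but fails to complete loading
    if not obs.still_running:
        return 6  # NOS loaded but stopped during operation
    return 0      # assumption: 0 means no fault observed
```

Because the checks are ordered, later fields are ignored once an earlier one is false, which mirrors the idea that knowing how far the system gets narrows down where to look.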

Editor's Notes

  1. Welcome. This section covers the troubleshooting aspects of servers. This is Topic 11 in the System x Technical Principles Course, XTW01.
  2. At the completion of this topic, you should be able to identify basic troubleshooting questions to consider, identify the six possible states of a system, and identify the diagnostic tools that are available to gather and analyze information for each given system state.
  3. This course is designed to familiarize you with troubleshooting tools that help determine the appropriate solution to a system problem. We will examine the six system states used to gather and analyze problem information, introduce the basic tools that help identify system issues, and introduce the Dynamic System Analysis (DSA) tool, which collects and analyzes system information to aid in diagnosing system problems. The next slide describes the troubleshooting questions to consider.
  4. When troubleshooting System x and BladeCenter blade servers, knowing the answers to these questions can lead you toward a quicker fix. It will also give you an idea of how to enter the diagnostic package and where to look for error indications. In the next slides, we will look at other tools available to help you diagnose and solve hardware-related problems, as well as software installation and configuration options.
  5. This section examines the six system states.
  6. When trying to identify a system problem, consider the six possible states of a system. All IBM xSeries, eServer (AMD processor-based) and System x servers start in a uniform manner. All have a common set of interfaces that indicate how far through the power-up sequence the server has progressed. Knowing how far the system gets helps in determining what should have happened, which helps your problem analysis.
  7. All servers are supported by documentation, which forms part of the tool set for both information gathering and information analysis. For example, a Problem Determination and Service Guide (PDSG) contains a list of errors that may occur during POST (information gathering) and also contains probable causes of each error (information analysis). Not all information sources are available in all system states. Let's take a closer look at the individual states.
  8. In this state, there are likely to be no indicators from the system. However, that fact is an indicator in itself. A system with no AC power cannot start, so there will be no fan noise, the disks will not spin and there will be no lights. Your information gathering tools are your eyes and ears. Sight and sound clues are often overlooked, but a lack of visual or audible indicators can be used for problem isolation. The PDSG contains information in several places about such situations and offers guidance on how to test the condition and what action to take to resolve it.
  9. Here, AC power is present. There may be a visual indicator on the outside of the power supply. Depending on the configuration of the system, there may also be event logs from a BMC, or from an RSA or management module if one is installed in the system. The information gathering tools for the BMC are the SMBridge utility and the IBM SVCCon tool. The BMC log is in IPMI standard format and may require some interpretation to identify any problems. The RSA log is in human-readable format and will indicate more clearly what is wrong. Remember that the tools from system state 1 are also available in this state. The PDSG will be useful in understanding any event log information that is available.
  10. In this state, something has caused the system to fail in POST. If POST can start, there may be audible warnings, or visual warnings in the form of POST messages such as checkpoint codes and adapter messages. In addition, if POST is able to load the setup and diagnostic routines, you may be able to use these tools. If so, both can display the event logs of the system, giving you another way to view system status. If POST fails on an adapter, the adapter may display messages that give clues to what is failing. Information on adapter failure messages may be contained in the PDSG, in RETAIN tips or on the IBM Support Web site. All of these tools are useful in trying to understand the nature of the problem.
  11. Here, POST has completed. This is a strong indicator that the hardware has checked out and is working as designed. However, it does not mean that the hardware is 'fit for purpose'. The 'purpose' is to have the hardware behave as a server, and something is stopping the software from loading. Areas to examine here may include diagnostic checks on disks, the disk boot sequence and any RAID configurations, to confirm that a viable RAID array exists and can load an OS. Again, RETAIN tips and the IBM support Web site may have information that can help diagnose why the OS is failing to start.
  12. In this state, the OS has started its boot process but fails to complete it and provide a console. At this stage, the OS itself may give indicators of what is wrong. For example, a Microsoft Windows server OS shows progress indicators on the screen and may report software error messages. This information is likely to be documented on the Microsoft support Web site. Some of the information may also be contained within IBM RETAIN tips. It is important to check all available sources and to verify any fault diagnosis with two confirmations if they are available.
  13. Finally, this state indicates that the system started correctly. This means that, at some point, everything was working as designed and the server was fit for purpose. However, something subsequently failed. An example is a disk that fails after the OS loads, causing the system to reboot. Depending on any fault-tolerant characteristics of the server, the restart may have been successful. If it was, this opens up new opportunities for information gathering and analysis, in the form of OS event logs and the DSA tool. As with all previous states, if multiple information sources are available, all should be checked for matching symptoms to increase your confidence in the diagnosis.
  14. Multiple information sources may not always be available for a given fault. If they are, you must always check all sources against the available analysis tools. Ideally, you are looking for a match. If there is a mismatch between two information sources, you may need to look for a third source to be confident that you have identified the cause of the problem.
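
The confirmation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `confirmed_cause` function is a hypothetical helper, each source is reduced to a single probable-cause string, and "confident" is taken to mean that at least two sources agree and no other cause has as many votes.

```python
from collections import Counter
from typing import List, Optional


def confirmed_cause(probable_causes: List[str]) -> Optional[str]:
    """Return a cause only when two or more sources agree on it.

    Each list entry is the probable cause suggested by one information
    source. None means: not confirmed yet, look for another source.
    """
    if len(probable_causes) < 2:
        return None  # a single source cannot confirm itself
    ranked = Counter(probable_causes).most_common()
    top_cause, top_votes = ranked[0]
    if top_votes < 2:
        return None  # every source points somewhere different
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return None  # tie between two causes, still ambiguous
    return top_cause
```

For example, two sources that disagree return `None`, and adding a third source that matches one of them returns that cause.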
  15. Always use recognized reference points. Tools such as RETAIN tips, the PDSG and the IBM support Web site contain the collective knowledge of the teams who designed the system, along with the accumulated knowledge of the many people who support the product. Your experience is, of course, extremely valuable. If you have seen a fault many times and you are working with someone who is seeing it for the first time, you can bring your experience to bear and help your team members. Remember to explain why a fault exists, not just how to fix it, so that next time your team members can fully understand the circumstances surrounding the fault.
  16. This section introduces some of the basic data gathering diagnostic tools, starting with the light path diagnostics tool.
  17. Light path diagnostics allows you to quickly identify the type of system error that occurred by monitoring and reporting the health of the processors, main memory, hard disk drives, PCI adapters, fans, power supplies, VRMs, and the internal system temperature. The server is designed so that any LEDs that are illuminated remain illuminated when the server shuts down, as long as the power source is good. This feature helps you isolate the problem if an error causes the server to shut down. The system board also contains LEDs beside specific components, such as DIMM slot 12, that identify the failed part. Light path diagnostics works even when the server is unplugged. The two buttons shown on the light path panel are:
      Remind: Use the Remind button on the light path diagnostics panel to put the system-error LED on the operator information panel into Remind mode. When you press the Remind button, you acknowledge the error but indicate that you will not take immediate action. The system-error LED flashes while it is in Remind mode and stays in Remind mode until one of the following conditions occurs: all known errors are corrected; the server is restarted; or a new error occurs, causing the system-error LED to be lit again.
      Reset: Use this button to force an immediate system restart.
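
The Remind behavior described above is a small state machine, and a sketch can make the transitions easier to follow. The Python below is a model for illustration only, not firmware behavior taken from IBM documentation. The class name, method names and error labels are hypothetical, and the LED state after a restart with errors still outstanding (solid lit) is an assumption, since the note only says that a restart ends Remind mode.

```python
class SystemErrorLed:
    """Toy model of the system-error LED and Remind mode."""

    def __init__(self) -> None:
        self.active_errors = set()
        self.state = "off"  # one of "off", "lit", "flashing"

    def report_error(self, name: str) -> None:
        # A new error ends Remind mode and lights the LED again.
        self.active_errors.add(name)
        self.state = "lit"

    def press_remind(self) -> None:
        # Acknowledge the error without fixing it: the LED flashes.
        if self.state == "lit":
            self.state = "flashing"

    def correct_error(self, name: str) -> None:
        # Remind mode persists until all known errors are corrected.
        self.active_errors.discard(name)
        if not self.active_errors:
            self.state = "off"

    def restart_server(self) -> None:
        # A restart ends Remind mode. Assumption: errors that are still
        # present light the LED solid again after the restart.
        self.state = "lit" if self.active_errors else "off"
```

A typical sequence is: an error lights the LED, Remind makes it flash, a second error lights it solid again, and correcting both errors turns it off.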
  18. This section identifies the three types of service processor: Baseboard Management Controller (BMC), Remote Supervisor Adapter (RSA), and Advanced Management Module (AMM).
  19. {DESCRIPTION} {TRANSCRIPT} This table lists the three types of service processor hardware management options available for IBM System x servers and BladeCenter chassis. These are separate service processors used to control power to the device and perform management and diagnostic functions. The baseboard management controller (BMC), or mini baseboard management controller (Mini-BMC), is a specialized microcontroller embedded on the motherboard and used to control some System x servers. It is the intelligence in the Intelligent Platform Management Interface (IPMI) architecture. The BMC manages the interface between system management software and platform hardware. Different types of sensors built into the server report to the BMC on parameters such as temperature, cooling fan speeds, power mode, and operating system (OS) status. The BMC monitors these sensors and can send alerts to a system administrator via the network if any of the parameters do not stay within preset limits, indicating a potential failure of the system. The administrator can also communicate with the BMC remotely to take corrective action, such as resetting or power cycling the system to get a hung OS running again. Standard in some System x servers and an option in others, the RSA expands BMC capability by allowing you to perform systems management functions whether your server is operational or not. The RSA can be accessed either in-band through a device driver or out-of-band over serial or Ethernet. The AMM provides system-management functions and keyboard/video/mouse (KVM) switching for all of the blade servers in a BladeCenter chassis that support KVM. Each BladeCenter chassis comes with at least one advanced management module. The Remote Supervisor Adapter (or Remote Supervisor Adapter II) and the Advanced Management Module (AMM) monitor and report the status and health of your system’s components via standalone Web interfaces.
The Web interfaces contain similar tasks, such as System Status and Event Log, where you can view possible errors that may indicate a potential failure, along with messages captured about the server’s hardware environment.
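The threshold monitoring described for the BMC can be illustrated with a minimal sketch. This is not BMC firmware or an IPMI API; the sensor names, the limit values, and the `check_readings` function are hypothetical and chosen only to show the idea of comparing sensor readings against preset limits and raising an alert when a value falls outside them.

```python
from dataclasses import dataclass


@dataclass
class SensorLimit:
    """Preset lower and upper bounds for one sensor (hypothetical values)."""
    low: float
    high: float


# Illustrative limits only; real limits are defined by the server hardware.
LIMITS = {
    "cpu_temp_c": SensorLimit(5.0, 85.0),
    "fan1_rpm": SensorLimit(1500.0, 12000.0),
}


def check_readings(readings, limits=LIMITS):
    """Return one alert string for every reading outside its preset limits."""
    alerts = []
    for name, value in readings.items():
        limit = limits.get(name)
        if limit is None:
            continue  # no preset limit for this sensor, so nothing to check
        if not (limit.low <= value <= limit.high):
            alerts.append(f"{name}={value} outside [{limit.low}, {limit.high}]")
    return alerts


if __name__ == "__main__":
    # An overheating CPU with a healthy fan produces a single alert.
    for alert in check_readings({"cpu_temp_c": 91.0, "fan1_rpm": 3000.0}):
        print("ALERT:", alert)
```

In a real service processor the alert would be sent to an administrator over the network rather than printed.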
  20. {DESCRIPTION} {TRANSCRIPT} This section introduces the Dynamic System Analysis (DSA) tools.
  21. {DESCRIPTION} {TRANSCRIPT} IBM Dynamic System Analysis (DSA) collects and analyzes system information to aid in diagnosing system problems. The system information is collected into a compressed XML file that can be sent to IBM Service. By default, DSA output is created in the \IBM_Support directory of the hard disk defined by the %SystemDrive% environment variable. Additionally, users can view the system information through optionally generated HTML Web pages. DSA creates a merged log that allows users to easily identify cause-and-effect relationships across the different log sources in the system. Optionally, DSA can also run diagnostics on the installed components. Two versions of DSA are available. The first, DSA Portable Edition, runs from the command prompt on a supported system without altering any system files or system settings. It expands to temporary space on the target system, runs, and deletes all intermediate files after execution completes. Its design and packaging allow it to collect system information in sensitive customer environments with only temporary use of system resources. The second version, DSA Installable Edition, provides a permanent installation of DSA on a system. This installation shares a similar command prompt interface with the Portable Edition. With DSA Installable Edition, you can get an UpdateXpress comparison analysis to verify whether your firmware and drivers are current.
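The idea of a merged log can be sketched as follows. This is not DSA's implementation or its output format; the entry layout (timestamp, source name, message), the sample entries, and the `merge_logs` function are assumptions made for illustration. The sketch interleaves several time-ordered logs into one timeline so that an event in one source can be read next to its likely effect in another.

```python
import heapq
from datetime import datetime


def merge_logs(*sources):
    """Merge time-sorted logs into a single timeline ordered by timestamp.

    Each source is a list of (timestamp, source_name, message) tuples that is
    already sorted by timestamp.
    """
    return list(heapq.merge(*sources, key=lambda entry: entry[0]))


# Hypothetical sample entries, for illustration only.
os_log = [
    (datetime(2008, 3, 1, 10, 0, 5), "os", "Disk driver reported I/O timeout"),
    (datetime(2008, 3, 1, 10, 2, 0), "os", "Service restarted"),
]
sp_log = [
    (datetime(2008, 3, 1, 10, 0, 1), "service-processor", "Drive 2 fault detected"),
]

merged = merge_logs(os_log, sp_log)

if __name__ == "__main__":
    for timestamp, source, message in merged:
        print(timestamp.isoformat(), source, message)
```

Read in order, the hardware fault appears immediately before the operating system timeout, which is the kind of cause-and-effect relationship a merged view makes visible.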
  22. {DESCRIPTION} {TRANSCRIPT} DSA Portable Edition is not installed on the target system. When run, DSA Portable Edition expands to a temporary directory, which is removed after DSA information collection is completed. DSA Portable Edition is designed to fit on removable media such as a CD or USB key. The removable media must be supported for use with the server on which you plan to run DSA Portable Edition. Installation of DSA requires 15 MB of disk space. DSA requires 50 to 100 MB of available memory during the data collection process. The amount of memory required for this process depends on the size of the logs being collected from the system. To view the information that is collected by DSA, you must use Internet Explorer 6.0 with Service Pack 1 (or later), Mozilla 1.4.0 (or later), or Firefox 1.04 (or later). To display the DSA data in a Web browser, 30 to 100 MB of available memory is required. The exact amount of memory required depends on the size of the logs being viewed. At the time of writing, DSA is launched by running a single MS Windows executable file (ibm_utl_dsa_200p_windows_noarch.exe) or a Linux distribution-specific shell script. It supports various command-line options, and the command might vary according to the released version of the program. The -t option sends the collected data to IBM System x Service and Support. DSA uses File Transfer Protocol (FTP) to transfer the compressed XML output file to IBM Service. When DSA is run with the -v option, it creates a subdirectory that contains HTML files that you can view with a Web browser.
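As a rough illustration of how the two options named in this slide combine, the sketch below assembles a command line for DSA Portable Edition without running it. The `build_dsa_command` helper and its parameter names are hypothetical; only the executable file name and the -t and -v options come from the transcript, and the options may differ in other released versions of the program.

```python
def build_dsa_command(executable, send_to_ibm=False, html_output=False):
    """Build (but do not run) a DSA Portable Edition command line.

    Only the -v and -t options described in the transcript are used; check
    the documentation of your DSA release before relying on them.
    """
    command = [executable]
    if html_output:
        command.append("-v")  # create a subdirectory of HTML files for viewing
    if send_to_ibm:
        command.append("-t")  # send the collected data to IBM Service via FTP
    return command


if __name__ == "__main__":
    # Executable name as given at the time of writing; it varies by release.
    print(" ".join(build_dsa_command(
        "ibm_utl_dsa_200p_windows_noarch.exe", html_output=True)))
```

Building the command as a list keeps each argument separate, which makes it straightforward to pass to a process launcher later if desired.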
  23. {DESCRIPTION} {TRANSCRIPT} DSA Installable Edition can be run from the Start menu, from a command prompt, or from a Linux shell prompt, and it has the same requirements as the Portable Edition. The installation can be performed either manually or in unattended mode. Once installed, the DSA executable file is collectall.exe and, as in the Portable Edition, it supports command-line options. When the file is launched with the -u option, the user can designate a fully qualified path to an UpdateXpress CD or CD image. Alternatively, the -ul option instructs DSA to use the online UpdateXpress index. An additional command-line utility, rtdcli.exe, is included with DSA Installable Edition to provide more control over the execution of the diagnostic tests. When DSA is launched from the Start menu, the HTML files are automatically generated.
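The purpose of an UpdateXpress comparison, checking whether installed firmware and drivers are current, can be illustrated with a small sketch. This does not reflect the actual UpdateXpress index format or DSA's comparison logic; the component names, the dotted version strings, and the `find_outdated` function are hypothetical.

```python
def parse_version(version):
    """Turn a dotted version string such as '1.04' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def find_outdated(installed, latest):
    """Return {component: (installed, latest)} for components that are behind."""
    outdated = {}
    for component, version in installed.items():
        newest = latest.get(component)
        if newest is not None and parse_version(version) < parse_version(newest):
            outdated[component] = (version, newest)
    return outdated


if __name__ == "__main__":
    # Hypothetical levels, for illustration only.
    installed_levels = {"bios": "1.04", "bmc": "2.10"}
    latest_levels = {"bios": "1.06", "bmc": "2.10"}
    print(find_outdated(installed_levels, latest_levels))
```

Components missing from the index are skipped rather than reported, since there is nothing to compare them against.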
  24. {DESCRIPTION} {TRANSCRIPT} IBM has developed an embedded Preboot DSA (Dynamic System Analysis) diagnostics tool for the IBM System x3850 M2 and x3950 M2 because previous tools such as PC Doctor didn’t meet the requirements of the newer IBM System x systems. The Preboot DSA is an NVRAM-based version of the Dynamic System Analysis tool that the Technical Support teams use to collect system- and component-level information, operating system driver information, hardware event logs from different hardware components, and operating system event logs in order to diagnose system problems. The DSA and Preboot DSA collect information that can be viewed locally or uploaded to an IBM internal FTP server, giving the Technical Support teams remote access at any time and from anywhere in the world when a deeper analysis of system state information or error logs is required. It is the primary method of testing the major components of the server. This image shows the options available from the main Preboot DSA interface. At this time, the Preboot DSA is a feature of selected servers.
  25. {DESCRIPTION} {TRANSCRIPT} You can access the Preboot DSA once the system reaches state 4, completion of POST, by pressing the F2 key when the prompt is displayed on screen. By default, the system first displays the memory test menu. Select Quit using the right arrow key on your keyboard, then select ‘Quit to DSA’. The Preboot DSA diagnostic program might appear to be unresponsive for an unusual length of time (up to 10 minutes) when you start the program. This is normal operation while the program loads. It is recommended that you power on all attached devices before powering on the server.
  26. {DESCRIPTION} {TRANSCRIPT} Next, the Preboot DSA displays a command main menu that allows you to make a selection by typing one of the following commands: gui takes you to the graphical environment; cmd offers various commands as options; copy copies DSA results to removable media; exit exits the program; help is also available. If you type the option “cmd”, as shown in the second image, it provides the IBM DSA Interactive list of options, which allows you to collect, view, or display various tests and results for your system components.
  27. {DESCRIPTION} {TRANSCRIPT} This is a screen shot of the Diagnostics option. You can use this submenu to perform a variety of diagnostic tests on system hardware, such as CPU or memory stress tests.
  28. {DESCRIPTION} {TRANSCRIPT} This is a screen shot of the System Information option. Use the submenu to obtain an overview of your system or your multinode partition.
  29. {DESCRIPTION} {TRANSCRIPT} The advice above is not specific to any one server; it is generic to any problem handling. Comparing the configuration and software set-up between “working” and “non-working” systems will often lead to problem resolution. With the variety of hardware and software combinations that can be encountered, the following information can assist you in problem determination. If possible, have this information available when requesting assistance from Service Support and Engineering functions.
  30. {DESCRIPTION} {TRANSCRIPT} Any time parts are removed for reseating or replaced with a new part, or any cables are unplugged for any reason, the potential to disturb other components of the system is very high. Especially in multi-board systems, it is vital that code levels are matched; that is, that they can work together across the boards. You must also run a full pass of diagnostics on the part that was replaced and then, if that passes, run a “quick test” on the entire system to make sure no other errors were induced during the maintenance activity. You should also run the Light Path Diagnostic LED test to ensure that the LEDs will be effective in indicating future errors.
  31. {DESCRIPTION} {TRANSCRIPT} This slide presents a glossary of acronyms and terms used in this topic.
  32. {DESCRIPTION} {TRANSCRIPT} Having completed this topic, you should be able to: identify basic troubleshooting questions to consider; identify the six possible states of a system; and identify diagnostic tools that are available to gather and analyze information for each given system state.
  33. {DESCRIPTION} This screen displays HTML links. {TRANSCRIPT} Listed are some additional resources that will help you learn more about IBM System x. IBM offers a rich library of resources on a variety of topics, from customized Web-based education to downloadable brochures and planning and installation guides on popular solutions, as well as resources on maintaining IBM Systems.
  34. {DESCRIPTION} Displays the statement of “End of Presentation” in the center of the slide. {TRANSCRIPT} Thank you!