Looking at RAC, GI/Clusterware Diagnostic Tools

Leighton L. Nelson
Oracle DBA Team Lead (10 yrs experience, 6 years with RAC)
RAC SIG US Events Chair and IOUG Liaison

Session# 373

Clusterware, ASM & RAC Diagnostics

• Diagcollection

• Cluster Verification Utility (cluvfy)

• Cluster Health Monitor (CHM)

• Remote Diagnostic Agent (RDA)

• ADRCI/Support Workbench

• OS Utilities

Diagcollection
• Gathers and packages Clusterware logs and traces, plus OS logs and core files (core files only with the --core option)

• $ORA_CRS_HOME/bin/diagcollection.pl --collect --crshome $ORA_CRS_HOME (10gR2)

• $GRID_HOME/bin/diagcollection.pl --collect --core|--crs|--all (11gR2)

• Logs can be filtered by date/time with --adr --beforetime --aftertime

• Allocate enough space in current directory for diagnostic files
• Needs to be run on every node in the cluster
• Collects only limited information if not run as root
• In 11.2, diagcollection was enhanced to collect ADR and CHM data
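The points above could be wrapped in a small pre-flight script. This is only a sketch, assuming a POSIX shell: the function names, the default Grid home path and the 1 GB free-space threshold are illustrative and not part of diagcollection; only the flags come from the slides. The command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helpers; only the diagcollection.pl flags come from the slides.

# Print the 11gR2 collection command for a Grid home ($1) and a scope ($2).
diag_cmd() {
    case "$2" in
        core|crs|all|chmos)
            printf '%s/bin/diagcollection.pl --collect --%s\n' "$1" "$2" ;;
        *)
            echo "unknown scope: $2" >&2
            return 1 ;;
    esac
}

# Succeed if directory $1 has at least $2 KB available.
has_free_kb() {
    hf_avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$hf_avail" -ge "$2" ]
}

# Warn about the two preconditions, then print the command (it is not run here).
[ "$(id -u)" -eq 0 ] || echo "warning: not root, collection will be limited" >&2
has_free_kb . 1048576 || echo "warning: less than 1 GB free in $(pwd)" >&2   # threshold is an assumption
diag_cmd "${GRID_HOME:-/u01/app/11.2.0/grid}" all
```

Because the tool must run on every node, a script like this would typically be invoked once per node, for example over ssh.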

diagcollection example
[root@oelgrid02 u02]# /u01/app/11.2.0/grid/bin/diagcollection.sh --collect

Production Copyright 2004, 2010, Oracle. All rights reserved

Cluster Ready Services (CRS) diagnostic collection tool

The following CRS diagnostic archives will be created in the local directory:

crsData_oelgrid02_20120225_1723.tar.gz -> logs, traces and cores from CRS home.
Note: core files will be packaged only with the --core option.

ocrData_oelgrid02_20120225_1723.tar.gz -> ocrdump, ocrcheck etc

coreData_oelgrid02_20120225_1723.tar.gz -> contents of CRS core files in text
format

osData_oelgrid02_20120225_1723.tar.gz -> logs from Operating System

Collecting crs data
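When archives from several nodes are gathered in one place, the file names identify the node and collection time. The sketch below splits a name into its parts; the pattern `<type>Data_<host>_<YYYYMMDD>_<HHMM>.tar.gz` is inferred from the sample output above and is an assumption, and `parse_archive` is a hypothetical helper.

```shell
# Split a diagcollection archive name into type, host, date and time.
# Assumed pattern (inferred from the sample output):
#   <type>Data_<host>_<YYYYMMDD>_<HHMM>.tar.gz
parse_archive() {
    pa_base=${1##*/}            # drop any directory part
    pa_base=${pa_base%.tar.gz}  # drop the extension
    pa_time=${pa_base##*_}      # last field: HHMM
    pa_rest=${pa_base%_*}
    pa_date=${pa_rest##*_}      # next-to-last field: YYYYMMDD
    pa_rest=${pa_rest%_*}
    pa_type=${pa_rest%%Data_*}  # text before "Data_"
    pa_host=${pa_rest#*Data_}   # text after "Data_"
    printf '%s %s %s %s\n' "$pa_type" "$pa_host" "$pa_date" "$pa_time"
}

parse_archive crsData_oelgrid02_20120225_1723.tar.gz
# prints: crs oelgrid02 20120225 1723
```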

Cluster Verification Utility

• Cluvfy runs in stage mode or component mode

• Can be executed from the Grid Infrastructure Home in 11gR2 or from
installation media

• New resource in 11.2.0.2.0 - ora.cvu

• "cluvfy comp -list" displays components that can be checked

• For standalone cluvfy, set CV_HOME, CV_JDKHOME and CV_DESTLOC

• Use stage mode during installation/upgrade
• Use component mode to diagnose components after
Clusterware installation
• Doesn’t diagnose all components e.g. HAIP
• $GRID_HOME/bin/cluvfy
• $INSTALL_DISK/runcluvfy.sh

• New in 11.2.0.3.0 :
cluvfy comp healthcheck
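The two entry points and the two modes can be combined as in the sketch below. It assumes a POSIX shell; `cluvfy_bin`, the default paths and the node names `node1,node2` are placeholders, and the commands are printed rather than executed.

```shell
# Pick the cluvfy entry point: the installed Grid home ($1) if it has cluvfy,
# otherwise runcluvfy.sh on the installation media ($2).
cluvfy_bin() {
    if [ -x "$1/bin/cluvfy" ]; then
        printf '%s/bin/cluvfy\n' "$1"
    else
        printf '%s/runcluvfy.sh\n' "$2"
    fi
}

# Default paths below are placeholders.
CLUVFY=$(cluvfy_bin "${GRID_HOME:-/u01/app/11.2.0/grid}" "${INSTALL_DISK:-/mnt/grid}")

# Commands are printed, not executed.
echo "$CLUVFY stage -pre crsinst -n node1,node2   # stage mode, before installation"
echo "$CLUVFY comp -list                          # list checkable components"
echo "$CLUVFY comp healthcheck                    # 11.2.0.3 and later"
```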


cluvfy comp -list output

Cluster Health Monitor (CHM)

• Cluster Health Monitor (CHM) monitors and collects OS and
clusterware metrics in real time

• Installed by default in 11.2.0.2+

• Collects metrics at 1 sec interval in 11.2.0.2 and 5 sec interval in
11.2.0.3

• Command Line Interface $GRID_HOME/bin/oclumon

• Collect CHM data with diagcollection.pl --collect --chmos
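The change in sampling interval affects how much detail a given time window holds. A rough sketch of the arithmetic, with `chm_samples` as a hypothetical helper; the oclumon line is printed rather than executed and assumes the default Grid home path shown.

```shell
# Number of samples CHM records per node in a window.
# $1 = sampling interval in seconds, $2 = window length in seconds
chm_samples() {
    echo $(( $2 / $1 ))
}

echo "11.2.0.2 (1 s interval): $(chm_samples 1 900) samples in 15 minutes"
echo "11.2.0.3 (5 s interval): $(chm_samples 5 900) samples in 15 minutes"

# Printed, not executed: dump the last 15 minutes of metrics for all nodes.
echo "${GRID_HOME:-/u01/app/11.2.0/grid}/bin/oclumon dumpnodeview -allnodes -last \"00:15:00\""
```

So a 15-minute window holds 900 samples per node at the 1-second interval and 180 at the 5-second interval.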

Cluster Health Monitor (CHM)

• Useful for root cause analysis of node reboots/hangs, instance
evictions, performance degradation, etc.
• The OTN version of CHM and the version bundled with 11.2.0.2 are
incompatible; with 11.2.0.2 installed, you cannot install the OTN version
• Uses OS APIs to collect metrics, reducing overhead
• Clusterware resource called ora.crf
• CHM doesn’t require RAC or Clusterware

OS Watcher Black Box
• OS Watcher v4.0 has been renamed to OS Watcher Black Box (OSWbb)

• UNIX shell scripts for monitoring the OS (ps, top, mpstat, iostat, netstat, vmstat)

• Useful for diagnosing OS resource and performance problems, node reboots

• Should run on all nodes in a cluster

• Set up private interconnect monitoring

• Execute startOSWbb.sh arg1 arg2, where arg1 = collection interval in
seconds and arg2 = hours of archive data to retain
nohup ./startOSWbb.sh 60 48 &
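To get a feel for how much data a given pair of arguments keeps, the retained snapshot count can be estimated as below. `oswbb_snapshots` is a hypothetical helper, and the figure is a per-collector estimate rather than something OSWbb reports.

```shell
# Snapshots retained per collector = retention (hours) * 3600 / interval (seconds).
# $1 = collection interval in seconds, $2 = retention in hours
oswbb_snapshots() {
    echo $(( $2 * 3600 / $1 ))
}

oswbb_snapshots 60 48   # the arguments used above; prints 2880
```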


• Bundled with OS Watcher Black Box Analyzer
(OSWbba)

• Requires Java 1.4.2 or greater

• Correlate OS statistics using the analyzer profile

• Generates graphs and reports for memory, cpu, disk

• Use CLI option to script profile generation for
troubleshooting

OSWbb Free Memory Graph

RACcheck –
RAC Configuration Audit Tool

• RACCHECK OUTPUT

RACcheck –
RAC Configuration Audit Tool

• Assess the configuration of RAC, Clusterware and ASM

• Useful for pre-upgrade and post-upgrade system verification

• Uses “Best Practices” to report configuration problems –
PASS/WARNING/FAIL/INFO

• Generates detailed and summary reports with scorecard

Remote Diagnostic Agent (RDA)

• The diagnostics tool recommended by MOS

• Collects a wealth of information based on configuration –
OS/Clusterware/Database logs

• Runs AWR/Statspack report for Performance problems

• Generates reports in HTML format

Procwatcher
• Debug Oracle & Clusterware processes using
oradebug short_stack or OS debugger (e.g. gdb,
pstack)

• Run as Oracle process owner to debug database or as
root for clusterware processes

• Can be deployed as a Clusterware resource

• Useful for troubleshooting session hangs, severe
performance problems, instance evictions

Procwatcher
grid@node1[+ASM1]-/u02 >./prw.sh start all

Wed Feb 25 02:30:26 CDT 2012: Starting Procwatcher

Wed Feb 25 02:30:26 CDT 2012: Thank you for using Procwatcher.
:-)

Wed Feb 25 02:30:26 CDT 2012: Please add a comment to Oracle
Support Note 459694.1

Wed Feb 25 02:30:26 CDT 2012: if you have any comments,
suggestions, or issues with this tool.

Wed Feb 25 02:30:26 CDT 2012: Started Procwatcher

ADRCI/Support Workbench

• Automatic Diagnostic Repository (ADR) stores database
diagnostic information

• Package diagnostics files using ADRCI or Support Workbench

• Manages incidents and problems from alert logs

• Enterprise Manager provides a GUI for ADR called Support
Workbench
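Packaging from the command line can be scripted through adrci's batch mode. The sketch below only builds and prints the command; `adrci_pack_cmd` is a hypothetical helper, and the ADR home path, incident number and output directory are placeholders.

```shell
# Build an adrci batch command that packages one incident into a zip file.
# $1 = ADR home path, $2 = incident ID, $3 = output directory
adrci_pack_cmd() {
    printf 'adrci exec="set homepath %s; ips pack incident %s in %s"\n' "$1" "$2" "$3"
}

# Placeholder values; list real ones first with: adrci exec="show homes"
adrci_pack_cmd diag/rdbms/orcl/orcl1 12345 /tmp
```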

RACDIAG.SQL

• Gathers debug information for RAC Session Hangs

• One-time data capture

• Performs hanganalyze dumps

• Certain types of hangs will prevent it from running

OS Utilities

• truss/strace – trace system calls and signals

• pstack – dump stack trace for process

• pmap/procmap – maps process memory

• nmon/nmon analyzer – collects and analyzes OS stats

• collectl/collectl-utils – collects and analyzes OS stats
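Which tracer is available depends on the platform. A minimal sketch, assuming strace on Linux and truss on Solaris and AIX; `tracer_for`, the PID and the output path are illustrative, and the command is printed rather than executed.

```shell
# Choose the system-call tracer for a platform name as reported by `uname -s`.
tracer_for() {
    case "$1" in
        Linux)      echo strace ;;
        SunOS|AIX)  echo truss ;;
        *)          return 1 ;;
    esac
}

pid=12345   # hypothetical process ID
if tracer=$(tracer_for "$(uname -s)"); then
    # -f follows child processes, -o writes the trace to a file, -p attaches to the PID
    echo "$tracer -f -o /tmp/trace.$pid -p $pid"
else
    echo "no tracer known for this platform" >&2
fi
```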

Summary
Tool/Utility     Instance Evictions   Node Reboots   Clusterware   RAC Performance Problems
diagcollection   ✓                    ✓              ✓             ✗
cluvfy           ✗                    ✗              ✓             ✗
CHM              ✓                    ✓              ✓             ✓
OSWbb/OSWbba     ✓                    ✓              ✓             ✓
RDA              ✓                    ✓              ✓             ✓
RACcheck         ✓                    ✓              ✓             ✗
Procwatcher      ✓                    ✗              ✓             ✓
ADRCI/SW         ✗                    ✗              ✗             ✓

MOS Notes
• OS Watcher Black Box User Guide [ID 301137.1]

• OS Watcher Black Box Analyzer User Guide [ID 461053.1]

• Data Gathering for Troubleshooting Oracle Clusterware (CRS or GI) Issues [ID 289690.1]

• CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide [ID 330358.1]

• Diagnosability for Oracle Clusterware (CRS or Grid Infrastructure) Component and Resource [ID 357808.1]

• Data Gathering for Troubleshooting RAC Issues [ID 556679.1]

• Cluster Health Monitor (CHM) FAQ [ID 1328466.1]

• Introducing Cluster Health Monitor (IPD/OS) [ID 736752.1]

• RACcheck - RAC Configuration Audit Tool [ID 1268927.1]

• Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware Processes [ID 459694.1]

• Script to Collect RAC Diagnostic Information (racdiag.sql) [ID 135714.1]

Contact Information

• Website - blogs.griddba.com

• LinkedIn – Leighton Nelson

• Twitter - @leight0nn

• Email: leighton.nelson@mercy.net
