1. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
19 Troubleshooting Tips and Tricks for
Database 19c
Sandesh Rao
VP AIOps , Autonomous Database
1
@sandeshr
https://www.linkedin.com/in/raosandesh/
2. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, timing, and pricing of any
features or functionality described for Oracle’s products may change and remains at the
sole discretion of Oracle Corporation.
2
3. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
1 – Best Practices
Automation and visualization
3
4. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Collection Manager Setup
1. Ensure APEX 5.x or higher is installed & configured
2. Follow Collection Manager installation in Collection Manager User Guide
3. Login to Collection Manager Application via a URL like the following
o Format will depend on choices during installation
http://hostname:port/apex/f?p=ApplicationID
http://hostname:port/pls/apex/f?p=ApplicationID
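If you script access to the Collection Manager, the login URL can be assembled from the host, port, and application ID. A minimal sketch (the helper name and its default format are illustrative, not part of any Oracle tool):

```python
def apex_login_url(host, port, app_id, pls_prefix=False):
    """Assemble the Collection Manager login URL in either of the two
    formats shown above; which one applies depends on choices made
    during APEX installation."""
    prefix = "/pls/apex" if pls_prefix else "/apex"
    return f"http://{host}:{port}{prefix}/f?p={app_id}"
```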
4
5. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Automatic Start From Install
• ORAchk will be automatically setup as part of TFA install
– Only on Linux or Solaris
– Only for root install
– Only on non-engineered systems
• Autostart will configure the daemon to restart at 1am every morning to rediscover any environment
changes
• Full Local client run will be triggered at 2am every morning
• Most impactful checks will be run every 2 hours via the oratier1 profile
• Any collections older than 2 weeks will be automatically purged
• To configure Autostart from orachk standalone install use: orachk -autostart
• Once enabled daemon settings can be changed as per normal
• Remove with: orachk -autostop or tfactl run orachk -autostop
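The two-week purge rule above can be sketched as a simple age check (an illustrative re-implementation, not ORAchk's actual code):

```python
from datetime import datetime, timedelta

# Collections older than 2 weeks are automatically purged by the daemon.
PURGE_AGE = timedelta(weeks=2)

def should_purge(collected_at, now):
    """Return True when a collection is old enough for the daemon's
    automatic purge (illustrative sketch only)."""
    return now - collected_at > PURGE_AGE
```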
6. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
$ ./orachk -setdbupload all
Enter value for
RAT_UPLOAD_CONNECT_STRING:(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=myserver44.acompany.com
)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orachkcm.acompany.com)))
Enter value for RAT_UPLOAD_PASSWORD:******
Database upload parameters successfully stored in orachk_wallet. orachk run will keep
uploading the collections in database until it is unset using ./orachk -unsetdbupload
all/<env variable name>
6
Configure Details for Upload of Collection Results
• Collection Manager upload configuration only requires connection string & connection password
• The required connection details can be specified using -setdbupload
• You will then be prompted to enter the values for the connection string and password
• These values will be stored in the encrypted wallet file.
./orachk -setdbupload all or ./exachk -setdbupload all
7. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Tip: To configure many ORAchk/EXAchk instances
1. Create wallet once with: -setdbupload all then enter values when
prompted
2. Copy resulting wallet to each ORAchk/EXAchk location
The environment variable RAT_WALLET_LOC can also be used to point to
the location of the wallet directory.
$ ./orachk -checkdbupload
Configuration is good to upload result to database.
• Verify ORAchk/EXAchk can make a successful connection to the database for upload using -checkdbupload:
• If connection details are set ORAchk/EXAchk will attempt to upload the results at the end of Health Check
collection
7
Configure Details for Upload of Collection Results
./orachk -checkdbupload or ./exachk -checkdbupload
8. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Collection Manager Dashboard
8
9. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Dashboard Filters
Filter by
Interval
Filter by
configurable
business units Filter by
systems
Click on color coded
area to drill down
9
10. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Most Failures & Warnings
Click to see the
recommendation details
10
11. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Most Failures
Click to drill into
failures
11
12. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Most Warnings
Click to drill into
warnings
12
13. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Recent Collections
Flag legend:
• No difference, or no regression failed in current collection
• At least one regression from non-WARNING to WARNING, or a WARNING regression found in current collection
• At least one regression from non-FAIL to FAIL, or a FAIL regression found in current collection
• Non-clickable green flag: preceding collection not found
Columns: Health Score, Fail count, Warning count, Info count, Pass count, Ignore count
13
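The flag logic in this legend can be sketched as follows, assuming the usual green/orange/red color coding for the three regression states (the function and color names are illustrative, not ORAchk's implementation):

```python
def collection_flag(previous, current):
    """previous/current map check-id -> status ("PASS", "WARNING", "FAIL").
    Returns a flag color per the legend above (illustrative sketch)."""
    if previous is None:
        return "green-nonclickable"   # preceding collection not found
    for check, status in current.items():
        if status == "FAIL" and previous.get(check) != "FAIL":
            return "red"              # non-FAIL regressed to FAIL
    for check, status in current.items():
        if status == "WARNING" and previous.get(check) not in ("WARNING", "FAIL"):
            return "orange"           # non-WARNING regressed to WARNING
    return "green"                    # no regression in current collection
```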
14. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
View Collection
Collection
Link
14
15. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
View Collection
Recommendation
15
16. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
2 – ADDM in a
Multitenant Environment
16
17. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
ADDM in a Multitenant Environment
• Starting with Oracle Database 12c, ADDM is enabled by default in the root container of a multitenant
container database (CDB)
• Starting with Oracle Database 19c, you can also use ADDM in a pluggable database (PDB)
– In a CDB, ADDM works in the same way as it works in a non-CDB
– ADDM analysis is performed each time an AWR snapshot is taken on a CDB root or a PDB
– ADDM does not work in a PDB by default, because automatic AWR snapshots are disabled
• To enable ADDM in a PDB:
– Set the AWR_PDB_AUTOFLUSH_ENABLED initialization parameter to TRUE in the PDB using the following command:
SQL> ALTER SYSTEM SET AWR_PDB_AUTOFLUSH_ENABLED=TRUE;
– Set the AWR snapshot interval to greater than 0 in the PDB, as shown in the following example:
SQL> EXEC dbms_workload_repository.modify_snapshot_settings(interval=>60);
• ADDM results on a PDB provide only PDB-specific findings and recommendations
18. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
3 – Histogram of alert
log content
18
19. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
19
Investigate Logs & Look for Errors
• Analyze all important recent log entries:
tfactl analyze -last 1d
• Search recent log entries, for example for "ora-00600":
tfactl analyze -search "ora-00600" -last 8h
20. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Examples
/opt/oracle/tfa/tfa_home/bin/tfactl analyze -since 5h
Show summary of events from alert logs, system messages in last 5 hours.
/opt/oracle/tfa/tfa_home/bin/tfactl analyze -comp os -since 1d
Show summary of events from system messages in last 1 day.
/opt/oracle/tfa/tfa_home/bin/tfactl analyze -search "ORA-" -since 2d
Search string ORA- in alert and system logs in past 2 days
/opt/oracle/tfa/tfa_home/bin/tfactl analyze -search "/Starting/c" -since 2d
Search case sensitive string "Starting" in past 2 days
/opt/oracle/tfa/tfa_home/bin/tfactl analyze -comp os -for "Feb/24/2019 11" -search "."
Show all system log messages at time Feb/24/2019 11
/opt/oracle/tfa/tfa_home/bin/tfactl analyze -comp osw -since 6h
Show OSWatcher Top summary in last 6 hours
/opt/oracle/tfa/tfa_home/bin/tfactl analyze -comp oswslabinfo -from "Feb/26/2019 05:00:01" -to "Feb/26/2019
06:00:01"
Show OSWatcher slabinfo summary for specified time period
/opt/oracle/tfa/tfa_home/bin/tfactl analyze -since 1h -type generic
Analyze all generic messages in last one hour.
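The -since window and -search semantics above (plain strings match case-insensitively, "/string/c" matches case-sensitively) can be sketched as follows; this is an illustrative re-implementation, not tfactl's code:

```python
from datetime import datetime, timedelta

UNITS = {"m": "minutes", "h": "hours", "d": "days"}

def parse_since(spec):
    """Convert a -since/-last value like '30m', '8h' or '1d' to a timedelta."""
    return timedelta(**{UNITS[spec[-1]]: int(spec[:-1])})

def analyze(entries, now, since, search=None):
    """entries: list of (timestamp, text) log records. Keep records inside
    the time window, optionally filtered by a search pattern."""
    cutoff = now - parse_since(since)
    kept = []
    for ts, text in entries:
        if ts < cutoff:
            continue
        if search is not None:
            if search.startswith("/") and search.endswith("/c"):
                if search[1:-2] not in text:          # case-sensitive form
                    continue
            elif search.lower() not in text.lower():  # case-insensitive form
                continue
        kept.append((ts, text))
    return kept
```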
21. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
tfactl analyze
$ ./tfactl analyze -type generic -since 7d
INFO: analyzing all (Alert and Unix System Logs) logs for the last 10080 minutes... Please wait...
INFO: analyzing host: myhost1
Report title: Analysis of Alert,System Logs
Report date range: last ~7 day(s)
Report (default) time zone: PST - Pacific Standard Time
Analysis started at: 03-Mar-2019 02:41:52 PM PST
Elapsed analysis time: 3 second(s).
Configuration file: /opt/oracle/tfa/tfa_home/ext/tnt/conf/tnt.prop
Configuration group: all
Total message count: 54,807, from 28-Jan-2019 04:26:28 PM PST to 03-Mar-2019 02:41:34
Messages matching last ~7 day(s): 3,139, from 24-Feb-2019 02:46:23 PM PST to 03-Mar-2019 02:41:34
last ~7 day(s) generic count: 3,139, from 24-Feb-2019 02:46:23 PM PST to 03-Mar-2019 02:41:34
last ~7 day(s) ignored generic count: 0
last ~7 day(s) unique generic count: 94
Message types for last ~7 day(s)
Occurrences percent server name type
----------- ------- -------------------- -----
3,139 100.0% myhost1 generic
----------- -------
3,139 100.0%
22. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
tfactl analyze
Unique generic messages for last ~7 day(s)
Occurrences percent server name generic
----------- ------- -------------------- -----
1,504 47.9% myhost1 : [crflogd(13931)]CRS-9520:The storage of Grid Infrastructure Managem...
487 15.5% myhost1 : [crflogd(13931)]CRS-9520:The storage of Grid Infrastructure Managem...
336 10.7% myhost1 myhost1 smartd[13812]: Device: /dev/sdv, SMART Failure: FAILURE...
336 10.7% myhost1 myhost1 smartd[13812]: Device: /dev/sdag, SMART Failure: FAILURE ...
103 3.3% myhost1 myhost1 last message repeated 9 times
103 3.3% myhost1 myhost1 kernel: oracle: sending ioctl 2285 to a partition!
53 1.7% myhost1 myhost1 init: Re-reading inittab
53 1.7% myhost1 myhost1 kernel: scsi_verify_blk_ioctl: 160 callbacks suppressed
27 0.9% myhost1 myhost1 kernel: scsi_verify_blk_ioctl: 75 callbacks suppressed
21 0.7% myhost1 myhost1 kernel: scsi_verify_blk_ioctl: 415 callbacks suppressed
12 0.4% myhost1 myhost1 auditd[10412]: Audit daemon rotating log files
7 0.2% myhost1 Starting background process VKRM
6 0.2% myhost1 Closing scheduler window
…snipping for brevity…
…error and warning message analytics also available
23. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Pattern Match Search Output
tfactl analyze [example, -search]
$ /opt/oracle/tfa/tfa_home/bin/tfactl analyze -search "ORA-" -since 7d
…snipping for brevity…
[Source: /u01/app/oracle/diag/rdbms/ratoda/RATODA1/trace/alert_RATODA1.log, Line: 9494]
Feb 25 22:00:02 2014
Errors in file /u01/app/oracle/diag/rdbms/ratoda/RATODA1/trace/RATODA1_j003_10948.trc:
ORA-12012: error on auto execute of job "ORACLE_OCM"."MGMT_CONFIG_JOB_2_1"
ORA-29280: invalid directory path
ORA-06512: at "ORACLE_OCM.MGMT_DB_LL_METRICS", line 2436
ORA-06512: at line 1
End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
[Source: /u01/app/oracle/diag/rdbms/ratoda/RATODA1/trace/alert_RATODA1.log, Line: 9538]
Feb 28 22:00:05 2014
Errors in file /u01/app/oracle/diag/rdbms/ratoda/RATODA1/trace/RATODA1_j001_14640.trc:
ORA-12012: error on auto execute of job "ORACLE_OCM"."MGMT_CONFIG_JOB_2_1"
ORA-29280: invalid directory path
ORA-06512: at "ORACLE_OCM.MGMT_DB_LL_METRICS", line 2436
ORA-06512: at line 1
24. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
OS Watcher top Data
$ /opt/oracle/tfa/tfa_home/bin/tfactl analyze -comp osw -since 6h
INFO: analyzing host: myhost1
Report title: OSW top logs
Report date range: last ~6 hour(s)
Report (default) time zone: PST - Pacific Standard Time
Analysis started at: 03-Mar-2019 03:14:01 PM PST
Elapsed analysis time: 0 second(s).
Configuration file: /opt/oracle/tfa/tfa_home/ext/tnt/conf/tnt.prop
Configuration group: osw
Parameter:
Total osw rec count: 2,189, from 03-Mar-2019 09:00:05 AM PST to 03-Mar-2019 03:13:55 PM PST
OSW recs matching last ~6 hour(s): 2,107, from 03-Mar-2019 09:14:06 AM PST to 03-Mar-2019 03:13:55 PM PST
statistic: t first highest (time) lowest (time) average non zero 3rd last 2nd last last trend
top.cpu.util.id: % 98.0 99.7 @10:35AM 72.8 @03:11PM 97.3 2,059 95.2 96.8 96.0 -2%
top.cpu.util.st: % 0.1 0.1 @09:14AM 0.0 @09:14AM 0.0 889 0.0 0.0 0.0 -100%
top.cpu.util.us: % 0.1 8.8 @11:31AM 0.0 @09:14AM 0.6 1,966 4.3 0.8 3.4 3300%
top.cpu.util.wa: % 1.7 18.7 @03:11PM 0.1 @10:35AM 1.1 2,059 0.3 0.4 0.4 -76%
top.loadavg.last01min: 1.17 3.12 @09:44AM 0.07 @12:45PM 0.93 1,823 0.31 0.26 0.22 -81%
top.loadavg.last05min: 0.94 2.26 @09:44AM 0.27 @12:45PM 0.93 1,823 0.82 0.79 0.77 -18%
top.loadavg.last15min: 0.79 1.60 @09:46AM 0.44 @01:18PM 0.92 1,823 0.96 0.95 0.94 18%
top.mem.buffers: k 808232 808388 @09:41AM 785608 @02:57PM 796511 2,093 785744 785744 785744 -2%
top.mem.free: k 1130332 1291344 @10:02AM 927576 @09:43AM 1188576 2,093 1244020 1265248 1265188 11%
top.swap.used: k 47556 48088 @03:00PM 47556 @09:14AM 47828 2,097 48088 48088 48088 1%
top.tasks.running: 1 4 @12:04PM 1 @09:14AM 1 1,996 1 2 2 100%
top.tasks.total: 514 527 @02:57PM 509 @09:18AM 514 1,996 518 521 520 1%
top.tasks.zombie: 0 5 @11:04AM 0 @09:14AM 0 62 0 0 0 n/a
top.users: 5 6 @03:00PM 5 @09:14AM 5 1,823 6 6 6 20%
…snipping for brevity…
25. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
OS Watcher slabinfo Data
$ /opt/oracle/tfa/tfa_home/bin/tfactl analyze -comp oswslabinfo -from "Feb/26/2019 05:00:01" -to "Feb/26/2019 06:00:01"
INFO: analyzing host: myhost1
Report title: OSW slabinfo logs
Report date range: 26/Feb/2019 to 26/Feb/2019
Report (default) time zone: PST - Pacific Standard Time
Analysis started at: 03-Mar-2019 03:17:45 PM PST
Elapsed analysis time: 10 second(s).
Configuration file: /opt/oracle/tfa/tfa_home/ext/tnt/conf/tnt.prop
Configuration group: oswslabinfo
Parameter:
Total osw rec count: 45,752, from 26-Feb-2019 05:00:06 AM PST to 03-Mar-2019 03:17:51 PM PST
OSW recs matching 26/Feb/2014 to 26/Feb/2014: 351, from 26-Feb-2019 05:00:06 AM PST to 26-Feb-2019 05:59:55 AM PST
statistic: t first highest (time) lowest (time) average non zero 3rd last 2nd last last trend
slabinfo.acfs_ccb_cache.active_objs: 4 38 @05:52AM 0 @05:01AM 10 294 3 1 8 100%
slabinfo.inet_peer_cache.active_objs: 23 39 @05:59AM 23 @05:00AM 23 351 23 23 39 69%
slabinfo.sigqueue.active_objs: 385 768 @05:28AM 285 @05:27AM 554 351 712 621 577 49%
slabinfo.skbuff_fclone_cache.active_objs: 55 133 @05:51AM 11 @05:20AM 69 351 56 77 70 27%
slabinfo.names_cache.active_objs: 126 180 @05:00AM 110 @05:23AM 146 351 171 166 156 23%
slabinfo.sgpool-8.active_objs: 135 228 @05:31AM 59 @05:11AM 152 351 180 165 157 16%
slabinfo.UDP.active_objs: 568 675 @05:28AM 492 @05:17AM 597 351 630 596 626 10%
slabinfo.size-8192.active_objs: 174 209 @05:36AM 160 @05:14AM 181 351 205 187 188 8%
slabinfo.task_delay_info.active_objs: 1477 1856 @05:28AM 1334 @05:57AM 1574 351 1529 1411 1579 6%
slabinfo.pid.active_objs: 1608 1980 @05:29AM 1452 @05:21AM 1678 351 1564 1487 1689 5%
slabinfo.blkdev_requests.active_objs: 720 880 @05:04AM 651 @05:54AM 745 351 707 736 761 5%
slabinfo.size-256.active_objs: 1116 1305 @05:06AM 846 @05:11AM 1091 351 1245 1143 1166 4%
slabinfo.ip_dst_cache.active_objs: 1497 1800 @05:28AM 1279 @05:36AM 1517 351 1594 1466 1560 4%
slabinfo.sock_inode_cache.active_objs: 2168 2329 @05:11AM 2106 @05:56AM 2225 351 2322 2278 2232 2%
slabinfo.size-512.active_objs: 3036 3152 @05:38AM 3007 @05:01AM 3088 351 3136 3112 3075 1%
****
26. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
4 – Keep track of the attribute of
important files pre-post patching
26
27. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
$ ./orachk -fileattr start
CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to
/u01/app/11.2.0.4/grid?[y/n][y]
Checking ssh user equivalency settings on all nodes in cluster
Node mysrv22 is configured for ssh user equivalency for oradb user
Node mysrv23 is configured for ssh user equivalency for oradb user
List of directories(recursive) for checking file attributes:
/u01/app/oradb/product/11.2.0/dbhome_11203
/u01/app/oradb/product/11.2.0/dbhome_11204
orachk has taken snapshot of file attributes for above directories at:
/orahome/oradb/orachk/orachk_mysrv21_20170504_041214
• Track changes to the attributes of important files with -fileattr
– Looks at all files & directories within Grid Infrastructure and Database homes by default
– The list of monitored directories and their contents can be configured to your specific requirements
– Use -fileattr start to take the first snapshot
27
Keep Track of Changes to the Attributes of Important Files
Note - 1268927.2
./orachk -fileattr start
28. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Note:
• Use the same arguments with check that you used with start
• Will proceed to perform standard health checks after attribute checking
• File Attribute Changes will also show in HTML report output
$ ./orachk -fileattr check -includedir "/root/myapp/config" -excludediscovery
CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to
/u01/app/18/19c.0/grid?[y/n][y]
Checking for prompts on myserver18 for oragrid user...
Checking ssh user equivalency settings on all nodes in cluster
Node myserver17 is configured for ssh user equivalency for root user
List of directories(recursive) for checking file attributes:
/root/myapp/config
Checking file attribute changes...
.
"/root/myapp/config/myappconfig.xml" is different:
Baseline : 0644 oracle root /root/myapp/config/myappconfig.xml
Current : 0644 root root /root/myapp/config/myappconfig.xml
…etc
…etc
• Compare current attributes against the first snapshot using -fileattr check
28
Keep Track of Changes to the Attributes of Important Files
./orachk -fileattr check
• Results of the snapshot comparison will also be shown in the HTML report output
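The snapshot-and-compare behavior can be sketched with a stat-based baseline; this is illustrative only (orachk records more attributes than mode/owner/group and handles multi-node checks):

```python
import os
import stat

def snapshot(root):
    """Record mode/uid/gid for every file under root (recursive),
    loosely mimicking what 'orachk -fileattr start' captures."""
    attrs = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            attrs[path] = (stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid)
    return attrs

def compare(baseline, current):
    """Yield (path, baseline_attrs, current_attrs) for changed files,
    like the Baseline/Current lines in the output above."""
    for path, base in baseline.items():
        cur = current.get(path)
        if cur != base:
            yield path, base, cur
```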
29. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
5- Event Notification
29
30. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
ORAchk/EXAchk email Notification
• Automatically started & configured to run Critical Health Checks
• You only need to configure your email for notification
30
tfactl orachk/exachk -set "NOTIFICATION_EMAIL=SOME.BODY@COMPANY.COM"
31. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
ORAchk/EXAchk
Report
31
32. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Use an External SMTP Service for Notifications
• TFA can send email notification when faults are detected
– Configure SMTP with: tfactl set smtp
– Check the current SMTP configuration: tfactl print smtp
– Verify configuration by sending a test email with: tfactl sendmail {email_address}
• To set notification email for any problem detected:
tfactl set notificationAddress=john.doe@oracle.com
• To set notification email for specific ORACLE_HOMEs, include the OS owner:
tfactl set notificationAddress=oracle:another.person@oracle.com
32
33. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 33
Event Notification
34. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
6 – Self Analysis in MOS
using TFA uploads
34
37. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
37
One Command SRDC
tfactl diagcollect -srdc <srdc_type>
Interactive Mode:
1. Enter default for event date/time and database name
2. Scans the system to identify the 10 most recent events (ORA-600 example shown)
3. Once the relevant event is chosen, proceeds with diagnostic collection
4. All required files are identified
5. Trimmed where applicable
6. Packaged in a zip ready to provide to support
41. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
7 – Diag tools which
run by default
41
42. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Automatically run on Exadata user domain (DomU)
• EXAchk will be automatically setup on Exadata user domain (DomU) as part of TFA
install
• Autostart will configure the daemon to restart at 1am every morning to rediscover any
environment changes
• Full Local client run will be triggered at 2am every morning
• Most impactful checks will be run every 2 hours via the exatier1 profile
• Any collections older than 2 weeks will be automatically purged
• To configure Autostart from exachk standalone install use: exachk -autostart
• Once enabled daemon settings can be changed as per normal
• Remove with: exachk -autostop or tfactl run exachk -autostop
42
43. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Automatic Start From Install
• ORAchk will be automatically setup as part of TFA install
– Only on Linux or Solaris
– Only for root install
– Only on non-engineered systems
• Autostart will configure the daemon to restart at 1am every morning to rediscover any environment
changes
• Full Local client run will be triggered at 2am every morning
• Most impactful checks will be run every 2 hours via the oratier1 profile
• Any collections older than 2 weeks will be automatically purged
• Once enabled daemon settings can be changed as per normal
• Remove with: orachk -autostop or tfactl run orachk -autostop
44. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
8 – Restrict the maximum size of
files collected for data upload
44
45. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Option to restrict the collection of excessively large files
• During collection (both on-demand and automated) TFA will verify the size
of each file to be collected
– If a file is greater than MaxFileCollectSize MB, only the last 1,000 lines will be collected
– Adds a file called skipped_files.txt to the TFA collection, listing all the files that were too large
tfactl set MaxFileCollectSize <size_mb>
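The trimming behavior can be sketched as follows (an illustrative sketch of the rule described above, not TFA's real implementation):

```python
import os

TAIL_LINES = 1000  # only the last 1,000 lines of oversized files are kept

def plan_collection(paths, max_file_collect_size_mb):
    """Split candidate files into full copies vs tail-only copies, and
    build the skipped_files.txt content listing the oversized files."""
    limit = max_file_collect_size_mb * 1024 * 1024
    full, tails, skipped = [], [], []
    for path in paths:
        if os.path.getsize(path) > limit:
            tails.append(path)      # collect only the last 1,000 lines
            skipped.append(path)    # recorded in skipped_files.txt
        else:
            full.append(path)
    return full, tails, "\n".join(skipped)
```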
46. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Option to restrict the collection of excessively large files
tfactl print config
.------------------------------------------------------------------------------------.
| myserver |
+-----------------------------------------------------------------------+------------+
| Configuration Parameter | Value |
+-----------------------------------------------------------------------+------------+
| TFA Version | 19.1.0.0.0 |
…
…
| Max Collection Size of Core Files (MB) | 500 |
| Max File Collection Size (MB) | 5120 |
| Minimum Free Space to enable Alert Log Scan (MB) | 500 |
| Time interval between consecutive Disk Usage Snapshot(minutes) | 60 |
| Age of Purging Collections (Hours) | 12 |
| TFA IPS Pool Size | 5 |
…
…
| AUTO Collection will be generated for CHA EVENTS | false |
'-----------------------------------------------------------------------+------------'
47. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
9 – REST Interfaces for Orachk
and TFA to programmatically
execute diagnostics
47
48. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
REST service
• REST support allows invocation & query over HTTPS
• Oracle REST Data Services (ORDS) is included within the install
• Once ORDS is running, REST invocations can be made via: https://host:port/ords/api
• Example: a POST to https://myhost:9090/ords/tfactl/diagcollect returns:
{
"collectionId" : "20180111011121slc12ekf",
"zipName" : "TFA_DEF_ZIP_20180111011121",
"tagName" : "TFA_DEF_TAG_20180111011121"
}
• The resulting collection can be downloaded with:
https://myhost:9090/ords/tfactl/download/20180111011121slc12ekf
48
tfactl rest [-status|-start|-stop|-uninstall] [-dir <dir>] [-port <port>] [-user <user>] [-debug [-level <level>]]
The tfactl rest command can only be run by the root user.
The standalone ORDS setup feature utilizes file based user authentication and is provided solely for use in test and development environments.
For production use, the included ords.war should be deployed and configured.
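Given the sample diagcollect response above, a client would extract collectionId and build the matching download URL. A minimal sketch (the helper is hypothetical; a real client would make the POST over HTTPS with authentication, e.g. via urllib.request):

```python
import json

def collection_download_url(base_url, diagcollect_response):
    """Extract collectionId from the JSON returned by POSTing to
    /ords/tfactl/diagcollect and build the download URL shown above."""
    collection_id = json.loads(diagcollect_response)["collectionId"]
    return f"{base_url}/ords/tfactl/download/{collection_id}"
```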
49. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
REST service
49
Option Description
-status Prints the current status
-start Starts Oracle Trace File Analyzer REST Services if not already running
-stop Stops Oracle Trace File Analyzer REST services if running
-uninstall Removes the Oracle Trace File Analyzer REST configuration
-dir
The directory to use to store the Oracle Trace File Analyzer REST configuration details.
Defaults to the user's home directory
-port
The port to run ORDS on
Defaults to 9090
-user
The user to start ORDS as
Defaults to the GRID owner
-debug Enables debug
-level
The level of debug to use, where available levels are:
•1 - FATAL
•2 - ERROR
•3 - WARNING
•4 - INFO (Default)
•5 - DEBUG
•6 - TRACE
50. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
• New run commands added to REST API:
REST Service Extensions
50
https://host:port/ords/tfactl/run/alertsummary : Shows summary of events from alert logs
https://host:port/ords/tfactl/run/calog : Shows major events from the cluster event log
https://host:port/ords/tfactl/run/changes : Shows system changes including DB parameters, OS parameters & patches
https://host:port/ords/tfactl/run/events : Reports warnings and errors seen in the logs
https://host:port/ords/tfactl/run/history : Reports history of commands for the tfactl shell
51. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
REST Service Extensions
• New option to upgrade existing ORDS REST service to latest API
• This command will:
1. Check if any new updates are available and if so stop ORDS
2. Upgrade the ORDS configuration to support the latest API updates
3. Restart ORDS again
51
tfactl rest -upgrade
52. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
REST Service Via Tomcat
• TFA includes a WAR file (TFA_HOME/jlib/tfa.war) to enable the REST Service via Apache Tomcat
1. Deploy the WAR file to your Tomcat server
2. Change the tfaadmin user password:
curl -k --user tfaadmin:tfaadmin https://host/tfa/tfactl/user/update '[{"password" : "some_new_password" }]'
3. Change the tfarest user password:
curl -k --user tfarest:tfarest https://host/tfa/tfactl/user/update '[{"password" : "some_new_password" }]'
4. Add the user Tomcat runs as to the TFA access list:
tfactl access add -user <tomcat_user>
52
53. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
REST Interface
• ORAchk and EXAchk include full REST support, allowing invocation & query over HTTPS
• Oracle REST Data Services (ORDS) is included within the install
• To enable REST:
1. Start ORDS with: -ordssetup
2. Start the daemon with the -ords option: -d start -ords
• Start a full health check run by accessing the URL: https://<host>:7080/ords/tfaml/orachk/start_client
• Run specific profiles: https://<host>:7080/ords/tfaml/orachk/profile/<profile1>,<profile2>
• Run specific checks: https://<host>:7080/ords/tfaml/orachk/check/<check_id>,<check_id>
• Any request will return a job id, which can then be used to query:
– Status: https://<host>:7080/ords/tfaml/orachk/status/<job_id>
– Download result: https://<host>:7080/ords/tfaml/orachk/download/<job_id>
The standalone ORDS setup feature utilizes file based user authentication and is provided solely for use in test and development environments.
For production use, the included orachk.jar and ords.war should be deployed and configured.
54. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
10 – Blackout option
for scheduled events
54
55. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Temporarily restrict automatic collections for specific events
• blackout option added to tfactl command
• Temporarily prevents automatic diagnostic collections for specific events
• Can be set for certain targets, events & durations
• Examples:
– Do not collect ORA-00600 events on mydb for the next 24hrs (default time):
tfactl blackout add -targettype database -target mydb -event "ORA-00600"
– Do not collect ORA-04031 events on any database for the next hour:
tfactl blackout add -targettype database -target mydb -event "ORA-04031"
– Do not collect any events (during patching):
tfactl blackout add -targettype all -event all -target all -timeout 1h -reason "Disabling all events during patching"
56. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
• Use the print option to see all blackouts in place:
• Use remove to take away a blackout
Temporarily restrict automatic collections for specific events
.-------------------------------------------------------------------------------------------------------------------------------------------------------.
| Target Type | Target | Events | Start Time | End Time | Do Collection | Reason |
+-------------+--------+-----------+------------------------------+------------------------------+---------------+--------------------------------------+
| ALL | ALL | ALL | Tue Feb 19 00:23:47 PST 2019 | Tue Feb 19 01:23:47 PST 2019 | false | Disabling all events during patching |
| DATABASE | ALL | ORA-04030 | Tue Feb 19 00:22:39 PST 2019 | Sun Feb 19 00:22:39 PST 2119 | false | NA |
| DATABASE | ALL | ORA-04031 | Tue Feb 19 00:21:27 PST 2019 | Tue Feb 19 01:21:27 PST 2019 | false | NA |
| DATABASE | MYDB | ORA-00600 | Tue Feb 19 00:20:34 PST 2019 | Wed Feb 20 00:20:34 PST 2019 | false | NA |
'-------------+--------+-----------+------------------------------+------------------------------+---------------+--------------------------------------'
tfactl blackout print
tfactl blackout remove -targettype database -event "ORA-00600" -target mydb
57. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Temporarily restrict automatic collections for specific events
Manage TFA Blackouts
Usage : tfactl blackout add|remove|print -targettype all|crs|asm|asmdg|database|listener|service|os -
target all|name -event all|"event_str1,event_str2" [-timeout nh|nd|none] [-c|-local] [-reason "reason
for blackout"] [-docollection]
add|remove|print Adds, removes, or prints blackout conditions
-targettype <type> Limits the blackout to only the target type given
[all|crs|asm|asmdg|database|listener|service|os] (default all)
-target all|name Target for blackout (default all)
-events all|"str1,str2" Limits blackout to only availability events or event strings which should
not trigger auto collections or be marked as backed out in telemetry JSON
-timeout nh|nd|none Duration for blackout in number of hours or days before timing out
(default 24h)
-c|-local Cluster wide or Local (default local)
-reason <comment> Comment describing the reason for the blackout
-docollection Even though a blackout is set still do an auto collection for this target
tfactl blackout -help
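The -timeout handling described in the usage above (hours, days, or none, defaulting to 24h) can be sketched as follows; note the ORA-04030 row in the earlier print output, where a blackout with no timeout shows an end date roughly 100 years out. Illustrative sketch only:

```python
from datetime import datetime, timedelta

def blackout_end(start, timeout="24h"):
    """Compute a blackout's end time from a '-timeout nh|nd|none' value
    (default 24h). 'none' is modeled here as ~100 years, matching the
    far-future end date seen in the blackout print output."""
    if timeout == "none":
        return start + timedelta(days=365 * 100)  # effectively never expires
    n, unit = int(timeout[:-1]), timeout[-1]
    return start + (timedelta(hours=n) if unit == "h" else timedelta(days=n))
```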
58. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
11 - Detect and Collect
using SRDC’s
58
59. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
59
One Command SRDC
tfactl diagcollect -srdc <srdc_type>
Interactive Mode:
1. Enter default for event date/time and database name
2. Scans the system to identify the 10 most recent events (ORA-600 example shown)
3. Once the relevant event is chosen, proceeds with diagnostic collection
4. All required files are identified
5. Trimmed where applicable
6. Packaged in a zip ready to provide to support
60. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 60
Full List of SRDCs
Type of Problem : SRDC Types
ORA Errors : ORA-00020, ORA-00060, ORA-00600, ORA-00700, ORA-01031, ORA-01555, ORA-01578, ORA-01628, ORA-04030, ORA-04031, ORA-07445, ORA-08102, ORA-08103, ORA-27300, ORA-27301, ORA-27302, ORA-29548, ORA-30036
Database performance : dbperf, dbsqlperf
Database resource : dbunixresources
Other internal database errors : internalerror
Database patching : dbpatchinstall, dbpatchconflict
Transparent Data Encryption (TDE) problems : dbtde
Database Export : dbexp, dbexpdp, dbexpdpapi, dbexpdpperf, dbexpdptts
Database Import : dbimp, dbimpdp, dbimpdpperf
RMAN : dbrman, dbrman600, dbrmanperf
System change number : dbscn
GoldenGate : dbggclassicmode, dbggintegratedmode
Database install / upgrade : dbinstall, dbupgrade, dbpreupgrade
Database storage : dbasm
Corrupt block relative dba : dbblockcorruption
ASM/DBFS/DNFS/ACFS : dnfs
Partition problems : dbpartition
Slow partitioned table/index commands : dbpartitionperf
SQL performance : dbsqlperf
UNDO corruption : dbundocorruption
Exalogic : esexalogic
Listener errors : listener_services
Naming service errors : naming_services
Database Auditing : dbaudit
Excessive SYSAUX Space : dbawrspace
Database resources : dbunixresources
Database startup / shutdown : dbshutdown, dbstartup
XDB : dbxdb
Data Guard : dbdataguard
Enterprise Manager tablespace usage metric : emtbsmetrics
EM general metrics : emmetricalert
EM debug log collection : emdebugon, emdebugoff
EM target discovery : emcliadd, emclusdisc, emdbsys, emgendisc, emprocdisc
EM OMS restart : emrestartoms
EM Agent performance : emagentperf
EM crash : emomscrash
EM java heap usage or performance : emomsheap
EM OMS crash, restart or performance : emomshungcpu
tfactl diagcollect –srdc <srdc_type> -sr <SR#>
61. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Manual Method
1. Generate ADDM reviewing Document 1680075.1 (multiple steps)
2. Identify “good” and “problem” periods and gather AWR reviewing
Document 1903158.1 (multiple steps)
3. Generate AWR compare report (awrddrpt.sql) using “good” and
“problem” periods
4. Generate ASH report for “good” and “problem” periods reviewing
Document 1903145.1 (multiple steps)
5. Collect OSWatcher data reviewing Document 301137.1 (multiple
steps)
6. Collect Hang Analyze output at Level 4
7. Generate SQL Healthcheck for problem SQL id using Document
1366133.1 (multiple steps)
8. Run support provided sql scripts – Log File sync diagnostic output using
Document 1064487.1 (multiple steps)
9. Check alert.log if there are any errors during the “problem” period
10. Find any trace files generated during the “problem” period
11. Collate and upload all the above files/outputs to SR
Targeted Diagnostics – Service Request Data Collections (SRDCs)
Automated One Command TFA SRDC
1. Run: tfactl diagcollect –srdc dbperf [-sr <sr_number>]
61
62. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Collect ORA-00600 SRDC
bash-4.1$ ./tfactl diagcollect -srdc ORA-00600
Enter the time of the ORA-00600 [YYYY-MM-DD HH24:MI:SS,=ALL] :
Enter the Database Name [=ALL] :
1. Oct/18/2018 02:38:37 : [ogg11204] ORA-00600: internal error code, arguments: [ktfbtgex-7], [1015817],
[1024], [1015816], [], [], [], [], [], [], [], []
2. Oct/18/2018 02:38:25 : [ogg11204] ORA-00600: internal error code, arguments: [ksprcvsp2],
[1596993584], [], [], [], [], [], [], [], [], [], []
Please choose the event : 1-2 [1]
62
63. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Collect ORA-00600 SRDC
Selected value is : 1 ( Oct/18/2018 02:38:37 )
Scripts to be run by this srdc: ipspack rdahcve1210 rdahcve1120 rdahcve1110
Components included in this srdc: OS CRS DATABASE
Collecting data for local node(s)
Scanning files from Oct/17/2018 20:38:37 to Oct/18/2018 08:38:37
WARNING: End time entered is after the current system time.
Collection Id : 20181018032231myserver69
Detailed Logging at :
/scratch/app/oragrid/tfa/repository/srdc_ora600_collection_Thu_Oct_18_03_22_31_PDT_2018_node_local/diagcollect_20181018032231_myserver69.log
2018/10/18 03:22:36 PDT : NOTE : Any file or directory name containing the string .com will be renamed to
replace .com with dotcom
63
64. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Collect ORA-00600 SRDC
.--------------------------------------.
|          Collection Summary          |
+------------+-----------+------+------+
| Host       | Status    | Size | Time |
+------------+-----------+------+------+
| myserver69 | Completed | 2MB  | 97s  |
'------------+-----------+------+------'
Logs are being collected to:
/scratch/app/oragrid/tfa/repository/srdc_ora600_collection_Thu_Oct_18_03_22_31_PDT_2018_node_local
/scratch/app/oragrid/tfa/repository/srdc_ora600_collection_Thu_Oct_18_03_22_31_PDT_2018_node_local
/myserver69.tfa_srdc_ora600_Thu_Oct_18_03_22_31_PDT_2018.zip
64
65. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Event specific collection for default diagnostic collection
Choose the event you want to perform a diagnostic collection for:
1. Mar/12/2019 16:08:20 [ db.orcl.orcl ] ORA-04030: out of process memory when trying to allocate
2. Mar/12/2019 16:08:18 [ db.orcl.orcl ] ORA-04031: unable to allocate 8 bytes of shared memory
3. Mar/12/2019 16:08:16 [ db.orcl.orcl ] ORA-00494: enqueue held for too long more than seconds by osid
4. Mar/12/2019 16:08:14 [ db.orcl.orcl ] ORA-29709: Communication failure with Cluster Synchronization
5. Mar/12/2019 16:08:04 [ db.orcl.orcl ] ORA-29702: error occurred in Cluster Group Service operation
6. Mar/12/2019 16:07:59 [ db.orcl.orcl ] ORA-32701: Possible hangs up to hang ID= detected
7. Mar/12/2019 16:07:51 [ db.orcl.orcl ] ORA-07445: exception encountered: core dump [] [] [] [] [] []
8. Mar/12/2019 16:07:49 [ db.orcl.orcl ] ORA-00700: soft internal error, arguments: [700], [], [],[]
9. Mar/11/2019 22:02:19 [ db.oradb.oradb ] DIA0 Critical Database Process Blocked: Hang ID 1 blocks 5
sessions
10. Default diagnostic collection, for no specific event
Please choose the event : 1-10 [] 9
tfactl diagcollect
66. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Event specific collection for default diagnostic collection
Collecting data for all nodes
Scanning files from mar/11/2019 18:02:19 to mar/11/2019 23:02:19
Collection Id : 20190312162708myserver
Detailed Logging at :
/scratch/app/product/18c/tfa/repository/collection_Tue_Mar_12_16_27_09_PDT_2019_node_all/diagcollect_20190312162708_myserver.log
2019/03/12 16:27:12 PDT : NOTE : Any file or directory name containing the string .com will be renamed to replace .com
with dotcom
2019/03/12 16:27:12 PDT : Collection Name : tfa_Tue_Mar_12_16_27_09_PDT_2019.zip
2019/03/12 16:27:12 PDT : Collecting diagnostics from hosts : [myserver]
2019/03/12 16:27:12 PDT : Scanning of files for Collection in progress...
2019/03/12 16:27:12 PDT : Collecting additional diagnostic information...
2019/03/12 16:27:17 PDT : Getting list of files satisfying time range [03/11/2019 18:02:19 PDT, 03/11/2019 23:02:19 PDT]
2019/03/12 16:27:23 PDT : Collecting ADR incident files...
2019/03/12 16:27:28 PDT : Completed collection of additional diagnostic information...
2019/03/12 16:27:33 PDT : Completed Local Collection
.------------------------------------.
| Collection Summary |
+----------+-----------+------+------+
| Host | Status | Size | Time |
+----------+-----------+------+------+
| myserver | Completed | 10MB | 21s |
'----------+-----------+------+------'
Logs are being collected to: /scratch/app/product/18c/tfa/repository/collection_Tue_Mar_12_16_27_09_PDT_2019_node_all
/scratch/app/product/18c/tfa/repository/collection_Tue_Mar_12_16_27_09_PDT_2019_node_all/myserver.tfa_Tue_Mar_12_16_27_09_PDT_2019.zip
67. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Event specific collection for default diagnostic collection
Choose the event you want to perform a diagnostic collection for:
1. Mar/12/2019 16:08:20 [ db.orcl.orcl ] ORA-04030: out of process memory when trying to allocate
2. Mar/12/2019 16:08:18 [ db.orcl.orcl ] ORA-04031: unable to allocate 8 bytes of shared memory
3. Mar/12/2019 16:08:16 [ db.orcl.orcl ] ORA-00494: enqueue held for too long more than seconds by osid
4. Mar/12/2019 16:08:14 [ db.orcl.orcl ] ORA-29709: Communication failure with Cluster Synchronization
5. Mar/12/2019 16:08:04 [ db.orcl.orcl ] ORA-29702: error occurred in Cluster Group Service operation
6. Mar/12/2019 16:07:59 [ db.orcl.orcl ] ORA-32701: Possible hangs up to hang ID= detected
7. Mar/12/2019 16:07:51 [ db.orcl.orcl ] ORA-07445: exception encountered: core dump [] [] [] [] [] []
8. Mar/12/2019 16:07:49 [ db.orcl.orcl ] ORA-00700: soft internal error, arguments: [700], [], [],[]
9. Mar/11/2019 22:02:19 [ db.oradb.oradb ] DIA0 Critical Database Process Blocked: Hang ID 1 blocks 5
sessions
10. Default diagnostic collection, for no specific event
Please choose the event : 1-10 [] 1
Event with an existing SRDC – the SRDC will be used for the collection
Note: the user running the collection needs to be in the DBA group of the database chosen in the event list
tfactl diagcollect
68. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Event with an existing SRDC – SRDC will be used for collection
Event specific collection for default diagnostic collection
Scripts to be run by this srdc: srdc_db_sid_memorysizes_10glower.sql srdc_db_sid_memorysizes_11gplus.sql ipspack rdahcve1210
rdahcve1120 rdahcve1110
Components included in this srdc: OS DATABASE CHMOS
Collecting data for local node(s)
Scanning files from Mar/12/2019 14:08:20 to Mar/12/2019 18:08:20
WARNING: End time entered is after the current system time.
Collection Id : 20190312163524myserver
Detailed Logging at :
/scratch/app/product/18c/tfa/repository/srdc_ora4030_collection_Tue_Mar_12_16_35_25_PDT_2019_node_local/diagcollect_20190312163524_myserver.log
2019/03/12 16:35:30 PDT : NOTE : Any file or directory name containing the string .com will be renamed to replace .com with dotcom
2019/03/12 16:35:30 PDT : Collection Name : tfa_srdc_ora4030_Tue_Mar_12_16_35_25_PDT_2019.zip
2019/03/12 16:35:30 PDT : Scanning of files for Collection in progress...
2019/03/12 16:35:30 PDT : Collecting additional diagnostic information...
2019/03/12 16:35:35 PDT : Getting list of files satisfying time range [03/12/2019 14:08:20 PDT, 03/12/2019 16:35:30 PDT]
2019/03/12 16:35:49 PDT : Collecting ADR incident files...
2019/03/12 16:35:52 PDT : Completed collection of additional diagnostic information...
2019/03/12 16:35:54 PDT : Completed Local Collection
.-------------------------------------.
| Collection Summary |
+----------+-----------+-------+------+
| Host | Status | Size | Time |
+----------+-----------+-------+------+
| myserver | Completed | 2.9MB | 24s |
'----------+-----------+-------+------'
Logs are being collected to:
/scratch/app/product/18c/tfa/repository/srdc_ora4030_collection_Tue_Mar_12_16_35_25_PDT_2019_node_local
/scratch/app/product/18c/tfa/repository/srdc_ora4030_collection_Tue_Mar_12_16_35_25_PDT_2019_node_local/myserver.tfa_srdc_ora4030_Tue_Mar_12_16_35_25_PDT_2019.zip
69. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Running a default, non-event-specific collection
Event specific collection for default diagnostic collection
tfactl diagcollect
Choose the event you want to perform a diagnostic collection for:
1. Mar/12/2019 16:08:20 [ db.orcl.orcl ] ORA-04030: out of process memory when trying to allocate
2. Mar/12/2019 16:08:18 [ db.orcl.orcl ] ORA-04031: unable to allocate 8 bytes of shared memory
3. Mar/12/2019 16:08:16 [ db.orcl.orcl ] ORA-00494: enqueue held for too long more than seconds by
osid
4. Mar/12/2019 16:08:14 [ db.orcl.orcl ] ORA-29709: Communication failure with Cluster
Synchronization
5. Mar/12/2019 16:08:04 [ db.orcl.orcl ] ORA-29702: error occurred in Cluster Group Service operation
6. Mar/12/2019 16:07:59 [ db.orcl.orcl ] ORA-32701: Possible hangs up to hang ID= detected
7. Mar/12/2019 16:07:51 [ db.orcl.orcl ] ORA-07445: exception encountered: core dump [] [] [] [] []
[]
8. Mar/12/2019 16:07:49 [ db.orcl.orcl ] ORA-00700: soft internal error, arguments: [700], [], [],[]
9. Mar/11/2019 22:02:19 [ db.oradb.oradb ] DIA0 Critical Database Process Blocked: Hang ID 1 blocks 5
sessions
10. Default diagnostic collection, for no specific event
Please choose the event : 1-10 [] 10
70. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
12 - Manage logs
70
71. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Automatic Database Log Purge
• TFA can automatically purge database logs
  – OFF by default
  – Except on a Domain Service Cluster (DSC), where it is ON by default
• Turn auto purging on or off:
  tfactl set manageLogsAutoPurge=<ON|OFF>
• Removes logs older than 30 days, configurable with:
  tfactl set manageLogsAutoPurgePolicyAge=<n><d|h>
• Purging runs every 60 minutes, configurable with:
  tfactl set manageLogsAutoPurgeInterval=<minutes>
71
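Putting those settings together, a minimal sketch (the 14-day age and 30-minute interval are illustrative values, not defaults):

```shell
# Enable automatic purging of database logs:
tfactl set manageLogsAutoPurge=ON

# Keep only 14 days of logs instead of the default 30:
tfactl set manageLogsAutoPurgePolicyAge=14d

# Run the purge every 30 minutes instead of every 60:
tfactl set manageLogsAutoPurgeInterval=30
```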
72. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Manual Database Log Purge
• TFA can manage ADR log and trace files
  – Show disk space usage of individual diagnostic destinations
  – Purge these file types based on diagnostic location and/or age:
    "ALERT", "INCIDENT", "TRACE", "CDUMP", "HM", "UTSCDMP", "LOG"
tfactl managelogs <options>
Note: runs as the ADR home owner, so it will only be able to purge files this owner has permission to delete. May take a while for a large number of files.
Option: Description
–show usage: Shows disk space usage per diagnostic directory for both GI and database logs
-show variation –older <n><m|h|d>: Shows the disk usage variation for the specified period per directory, to determine per-directory disk space growth
-purge –older <n><m|h|d>: Removes all ADR files under the GI_BASE directory which are older than the time specified
–gi: Restricts the command to only diagnostic files under the GI_BASE
–database [all | dbname]: Restricts the command to only diagnostic files under the database directory; defaults to all, alternatively specify a database name
-dryrun: Use with –purge to estimate how many files will be affected and how much disk space will be freed by a potential purge command
72
73. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 73
Manual Database Log Purge
tfactl managelogs –show usage tfactl managelogs –show variation –older <n><m|h|d>
Use -gi to only
show grid
infrastructure
Use –database to only
show database
74. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 74
Manual Database Log Purge
tfactl managelogs –purge –older n<m|h|d> -dryrun tfactl managelogs –purge –older n<m|h|d>
Use –dryrun
for a “what if”
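A typical sequence is to estimate first with –dryrun and only then purge for real; a sketch with an illustrative 30-day cutoff:

```shell
# Estimate how many files a purge of everything older than 30 days would
# remove, and how much space it would free, without deleting anything:
tfactl managelogs -purge -older 30d -dryrun

# If the estimate looks right, run the real purge (database logs only):
tfactl managelogs -purge -older 30d -database all
```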
75. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Disk Usage Snapshots
• TFA will track disk usage and record snapshots to:
  – tfa/repository/suptools/<node>/managelogs/usage_snapshot/
• Snapshots happen every 60 minutes, configurable with:
  tfactl set diskUsageMonInterval=<minutes>
• Disk usage monitoring is ON by default, configurable with:
  tfactl set diskUsageMon=<ON|OFF>
75
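For example (the 30-minute interval is an illustrative value):

```shell
# Take disk usage snapshots every 30 minutes instead of the default 60:
tfactl set diskUsageMonInterval=30

# Or switch disk usage monitoring off entirely:
tfactl set diskUsageMon=OFF
```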
76. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
13 - Find events and execute
code based on them
76
77. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Find Events
-bash-4.1# tfactl events
Output from host : myserver70
INFO :0
ERROR :0
WARNING :0
Event Timeline:
No Events Found
Output from host : myserver71
INFO :0
ERROR :0
WARNING :0
Event Timeline:
No Events Found
77
78. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Find Events
Output from host : myserver69
INFO :2
ERROR :2
WARNING :0
Event Timeline:
[Oct/18/2018 02:38:25.000]: [db.ogg11204.ogg112041]: Incident details in:
/scratch/app/oradb/diag/rdbms/ogg11204/ogg112041/incident/incdir_102702/ogg112041_ora_5001_i102702.trc
[Oct/18/2018 02:38:25.000]: [db.ogg11204.ogg112041]: ORA-00600: internal error code, arguments: [ksprcvsp2], [1596993584], [], [], [], [], [], [], [], [], [], []
[Oct/18/2018 02:38:37.000]: [db.ogg11204.ogg112041]: Incident details in:
/scratch/app/oradb/diag/rdbms/ogg11204/ogg112041/incident/incdir_102703/ogg112041_ora_5001_i102703.trc
[Oct/18/2018 02:38:37.000]: [db.ogg11204.ogg112041]: ORA-00600: internal error code, arguments: [ktfbtgex-7], [1015817], [1024], [1015816], [], [], [], [], [], [], [], []
78
79. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Metadata search capability
• All metadata stored in the TFA index is searchable:
• Searching for all events for a database between certain dates:
79
tfactl search -showdatatypes|-json [json_details]
tfactl search -json
'{
"data_type":"event",
"content":"oracle",
"database":"rac11g",
"from":"10/01/2018 00:00:00",
"to":"10/21/2018 00:00:00"
}'
80. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Metadata search capability
• Listing all index events:
• Listing all available datatypes:
80
tfactl search -json '{"data_type":"event"}'
tfactl search -showdatatypes
81. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
14 – Monitor multiple
logs
81
82. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
tail files
-bash-4.1# tfactl tail alert
Output from host : myserver69
------------------------------
==> /scratch/app/11.2.0.4/grid/log/myserver69/alertmyserver69.log <==
2018-11-25 23:28:22.532:
[ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with the mean cluster time. No
action has been taken as the Cluster Time Synchronization Service is running in observer mode.
2018-11-25 23:58:22.964:
[ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with the mean cluster time. No
action has been taken as the Cluster Time Synchronization Service is running in observer mode.
2018-11-26 00:28:23.395:
82
83. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
tail files
==> /scratch/app/oradb/diag/rdbms/apxcmupg/apxcmupg_2/trace/alert_apxcmupg_2.log <==
Sun Nov 25 06:00:00 2018
VKRM started with pid=82, OS id=4903
Sun Nov 25 06:00:02 2018
Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
Sun Nov 25 06:00:37 2018
End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
Sun Nov 25 23:00:28 2018
Thread 2 advanced to log sequence 759 (LGWR switch)
Current log# 3 seq# 759 mem# 0: +DATA/apxcmupg/onlinelog/group_3.289.917164707
Current log# 3 seq# 759 mem# 1: +FRA/apxcmupg/onlinelog/group_3.289.917164707
83
84. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
tail files
==> /scratch/app/oradb/diag/rdbms/ogg11204/ogg112041/trace/alert_ogg112041.log <==
Clearing Resource Manager plan via parameter
Sun Nov 25 05:59:59 2018
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Sun Nov 25 05:59:59 2018
Starting background process VKRM
Sun Nov 25 05:59:59 2018
VKRM started with pid=36, OS id=4901
Sun Nov 25 22:00:31 2018
Thread 1 advanced to log sequence 305 (LGWR switch)
Current log# 1 seq# 305 mem# 0: +DATA/ogg11204/redo01.log
84
85. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
tail files
==> /scratch/app/oragrid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log <==
Thu Nov 22 04:42:22 2018
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 2323] opening OCR file
Fri Nov 23 01:05:39 2018
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 16591] opening OCR file
Fri Nov 23 01:05:41 2018
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 16603] opening OCR file
Fri Nov 23 01:21:12 2018
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 1803] opening OCR file
Fri Nov 23 01:21:12 2018
NOTE: [ocrcheck.bin@myserver69 (TNS V1-V3) 1816] opening OCR file
85
86. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
15 - vi files with
wildcards
86
87. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
vi files
-bash-4.1# tfactl vi alert
2018-11-25 19:58:19.481:
[ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with the mean cluster time. No
action has been taken as the Cluster Time Synchronization Service is running in observer mode.
2018-11-25 20:28:19.911:
[ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with the mean cluster time. No
action has been taken as the Cluster Time Synchronization Service is running in observer mode.
2018-11-25 20:58:20.346:
[ctssd(5630)]CRS-2409:The clock on host myserver69 is not synchronous with the mean cluster time. No
action has been taken as the Cluster Time Synchronization Service is running in observer mode.
87
88. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
16 - Monitor Database
performance
88
89. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 89
oratop (Support Tools Bundle) - 1500864.1
Near Real-Time Database Monitoring
• Single instance & RAC
• Monitoring current database activities
• Database performance
• Identifying contentions and bottlenecks
• Process & SQL Monitoring
• Real time wait events
• Active Data Guard support
• Multitenant Database (CDB) support
90. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
-bash-4.1# tfactl run oratop -database ogg19c
Monitor Database performance
90
91. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Monitor Database performance
91
• Section 1 DATABASE: Global
database information
• Section 2 INSTANCE:
Database instance Activity
• Section 3 EVENT: AWR-like
"Top 5 Timed Events"
• Section 4 PROCESS | SQL:
Processes or SQL mode
information
92. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
17 - Analyze OS Metrics
92
93. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 93
OS Watcher (Support Tools Bundle)
Collect & Archive OS Metrics
• Executes standard UNIX utilities (e.g. vmstat, iostat, ps,
etc.) at regular intervals
• Built in Analyzer functionality to summarize, graph and
report upon collected metrics
• Output is Required for node reboot and performance
issues
• Simple to install, extremely lightweight
• Runs on ALL platforms (Except Windows)
• MOS Note: 301137.1 – OS Watcher Users Guide
94. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Analyze OS Metrics
-bash-4.1# tfactl run oswbb
Starting OSW Analyzer V8.1.2
OSWatcher Analyzer Written by Oracle Center of
Expertise
Copyright (c) 2017 by Oracle Corporation
Parsing Data. Please Wait...
Scanning file headers for version and platform info...
Parsing file rws1270069_iostat_18.11.24.0900.dat ...
Parsing file rws1270069_iostat_18.11.24.1000.dat ...
...
94
95. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Analyze OS Metrics
Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs
Enter GC to Generate All CPU Gif Files
Enter GM to Generate All Memory Gif Files
Enter GD to Generate All Disk Gif Files
Enter GN to Generate All Network Gif Files
Enter L to Specify Alternate Location of Gif Directory
Enter Z to Zoom Graph Time Scale (Does not change
analysis dataset)
Enter B to Return to Baseline Graph Time Scale
(Does not change analysis dataset)
Enter R to Remove Currently Displayed Graphs
Enter X to Export Parsed Data to Flat File
Enter S to Analyze Subset of Data (Changes analysis
dataset including graph time scale)
Enter A to Analyze Data
Enter D to Generate DashBoard
Enter Q to Quit Program
Please Select an Option:1
95
96. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Analyze OS Metrics
96
myserver69
97. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Analyze OS Metrics
97
myserver69
98. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
18 - Diagnose cluster
health
98
99. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 99
Cluster Health Monitor (CHM)
Generates Diagnostic Metrics View of Cluster and Databases
[Diagram: an osysmond process on each node feeds OS data to the master ologgerd, which stores it in the 12c Grid Infrastructure Management Repository (GIMR)]
• Always on - Enabled by default
• Provides Detailed OS Resource Metrics
• Assists Node eviction analysis
• Locally logs all process data
• User can define pinned processes
• Listens to CSS and GIPC events
• Categorizes processes by type
• Supports plug-in collectors (e.g. traceroute, netstat, ping, etc.)
• New CSV output for ease of analysis
Confidential – Oracle Internal/Restricted/Highly Restricted
100. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 100
Oclumon CLI or Full Integration with EM Cloud Control
Cluster Health Monitor (CHM)
101. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Cluster Health Advisor (CHA)*
Discovers Potential Cluster & DB Problems - Notifies with Corrective Actions
[Diagram: the ochad daemon feeds OS data (from CHM) and DB data into the Node Health and Database Health Prognostics Engines, storing results in the GIMR]
• Always on - Enabled by default
• Detects node and database performance problems
• Provides early-warning alerts and corrective action
• Supports on-site calibration to improve sensitivity
• Integrated into EMCC Incident Manager and notifications
• Standalone Interactive GUI Tool
* Requires, and is included with, a RAC or RAC One Node (R1N) license
101
102. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Calibrating CHA to your RAC deployment
102
Choosing a Data Set for Calibration – Defining “normal”
$ chactl query calibration –cluster –timeranges ‘start=2016-10-28 07:00:00,end=2016-10-28 13:00:00’
Cluster name : mycluster
Start time : 2016-10-28 07:00:00
End time : 2016-10-28 13:00:00
Total Samples : 11524
Percentage of filtered data : 100%
1) Disk read (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
0.11 0.00 2.62 0.00 114.66
<25 <50 <75 <100 >=100
99.87% 0.08% 0.00% 0.02% 0.03%
2) Disk write (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
0.01 0.00 0.15 0.00 6.77
<50 <100 <150 <200 >=200
100.00% 0.00% 0.00% 0.00% 0.00%
3) Disk throughput (ASM) (IO/sec)
MEAN MEDIAN STDDEV MIN MAX
2.20 0.00 31.17 0.00 1100.00
<5000 <10000 <15000 <20000 >=20000
100.00% 0.00% 0.00% 0.00% 0.00%
4) CPU utilization (total) (%)
MEAN MEDIAN STDDEV MIN MAX
9.62 9.30 7.95 1.80 77.90
<20 <40 <60 <80 >=80
92.67% 6.17% 1.11% 0.05% 0.00%
103. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Calibrating CHA to your RAC deployment
• Create and store the new model
$ chactl calibrate cluster –model daytime –timeranges 'start=2018-10-28 07:00:00,
end=2018-10-28 13:00:00'
• Begin using the new model
$ chactl monitor cluster –model daytime
• Confirm the new model is being used
$ chactl status –verbose
monitoring nodes svr01, svr02 using model daytime
monitoring database qoltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
103
Creating a new CHA Model with CHACTL
104. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Cluster Health Advisor – Command Line Operations
104
Monitoring Your Databases and Nodes with CHACTL
Enable CHA monitoring on RAC database with optional model
$ chactl monitor database –db oltpacdb [-model model_name]
Check what is currently being monitored, with optional verbose output
$ chactl status –verbose
monitoring nodes svr01, svr02 using model DEFAULT_CLUSTER
monitoring database oltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
105. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
CHA Command Line Operations
105
Checking for Health Issues and Corrective Actions with CHACTL QUERY DIAGNOSIS
$ chactl query diagnosis -db oltpacdb -start "2016-10-28 01:52:50" -end "2016-10-28 03:19:15"
2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2016-10-28 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2016-10-28 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
Problem: DB Control File IO Performance
Description: CHA has detected that reads or writes to the control files are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that reads or writes to the control files were
slow because of an increase in disk IO.
The slow control file reads and writes may have an impact on checkpoint and Log Writer (LGWR) performance.
Action: Separate the control files from other database files and move them to faster disks or Solid
State Devices.
Problem: DB Log File Switch
Description: CHA detected that database sessions are waiting longer than expected
for log switch completions.
Cause: The Cluster Health Advisor (CHA) detected high contention during log switches
because the redo log files were small and the redo logs switched frequently.
Action: Increase the size of the redo logs.
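Redo logs cannot be resized in place, so the usual way to follow this advice is to add larger groups and drop the small ones once they are inactive. A minimal sketch run from SQL*Plus (the group numbers and 1G size are illustrative, and on RAC the new groups are added per thread):

```shell
sqlplus / as sysdba <<'EOF'
-- Add larger redo log groups (numbers and size are illustrative):
ALTER DATABASE ADD LOGFILE GROUP 4 SIZE 1G;
ALTER DATABASE ADD LOGFILE GROUP 5 SIZE 1G;
-- Force switches so the small groups become INACTIVE, then drop them:
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;
ALTER DATABASE DROP LOGFILE GROUP 1;
EOF
```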
106. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Cluster Health Advisor – Command Line Operations
106
HTML Diagnostic Health Output Available (-html <file_name>)
107. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Diagnose cluster health
-bash-4.1# chactl query diagnosis -db oltpacdb -start "2018-11-26 02:52:50.0" -end "2018-11-26 03:19:15.0"
2018-11-26 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2018-11-26 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2018-11-26 02:52:15.0 Database oltpacdb DB CPU Utilization (oltpacdb_2) [detected]
2018-11-26 02:52:50.0 Database oltpacdb DB CPU Utilization (oltpacdb_1) [detected]
2018-11-26 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2018-11-26 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
107
108. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
19 - Find if anything has
changed
108
109. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Has anything changed recently?
-bash-4.1# tfactl changes
Output from host : myserver69
------------------------------
[Oct/17/2018 04:54:15.397]: Parameter: fs.aio-nr: Value: 95488 => 97024
[Oct/17/2018 04:54:15.397]: Parameter: fs.inode-nr: Value: 764974 131561 => 740744 131259
[Oct/17/2018 04:54:15.397]: Parameter: kernel.pty.nr: Value: 2 => 1
[Oct/17/2018 04:54:15.397]: Parameter: kernel.random.entropy_avail: Value: 189 => 158
[Oct/17/2018 04:54:15.397]: Parameter: kernel.random.uuid: Value: 36269877-9bc9-40a3-82e0-1619865096f2 => 7551c5e7-c59f-40fa-b55f-5bd170e8b1ab
[Oct/17/2018 05:46:15.397]: Parameter: fs.aio-nr: Value: 119680 => 122880
[Oct/17/2018 05:46:15.397]: Parameter: fs.inode-nr: Value: 1580316 810036 => 1562320 768555
[Oct/17/2018 05:46:15.397]: Parameter: kernel.pty.nr: Value: 19 => 18
[Oct/17/2018 05:46:15.397]: Parameter: kernel.random.uuid: Value: 37cc31aa-ee31-459e-8f2a-0766b34b1b64 => f5176cdc-6390-415d-882e-02c4cff2ae4e
109
110. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Has anything changed recently?
Output from host : myserver70
------------------------------
[Oct/17/2018 04:54:15.397]: Parameter: fs.aio-nr: Value: 95488 => 97024
[Oct/17/2018 04:54:15.397]: Parameter: fs.inode-nr: Value: 764974 131561 => 740744 131259
[Oct/17/2018 04:54:15.397]: Parameter: kernel.pty.nr: Value: 2 => 1
[Oct/17/2018 04:54:15.397]: Parameter: kernel.random.entropy_avail: Value: 189 => 158
[Oct/17/2018 04:54:15.397]: Parameter: kernel.random.uuid: Value: 36269877-9bc9-40a3-82e0-1619865096f2 => 7551c5e7-c59f-40fa-b55f-5bd170e8b1ab
[Oct/17/2018 05:46:15.397]: Parameter: fs.aio-nr: Value: 119680 => 122880
[Oct/17/2018 05:46:15.397]: Parameter: fs.inode-nr: Value: 1580316 810036 => 1562320 768555
[Oct/17/2018 05:46:15.397]: Parameter: kernel.pty.nr: Value: 19 => 18
[Oct/17/2018 05:46:15.397]: Parameter: kernel.random.uuid: Value: 37cc31aa-ee31-459e-8f2a-0766b34b1b64 => f5176cdc-6390-415d-882e-02c4cff2ae4e
[Oct/17/2018 16:56:15.398]: Parameter: fs.aio-nr: Value: 97024 => 98560
110