This document discusses using the Autonomous Health Framework (AHF) to manage Exadata environments. AHF includes EXAchk for compliance checking and fault detection on Exadata. EXAchk can be run automatically or on-demand to check for compliance issues and potential problems. It integrates with tools like Enterprise Manager, MOS, and TFA to provide centralized reporting and issue resolution. The document provides instructions for installing and configuring AHF and EXAchk for optimal use.
How to Use EXAchk Effectively to Manage Exadata Environments
1. VP AIOps for the Autonomous Database
Sandesh Rao
#DOAG2021
How to Use EXAchk Effectively to Manage
Exadata Environments
@sandeshr
https://www.linkedin.com/in/raosandesh/
https://www.slideshare.net/SandeshRao4
3. Install
OEDA Dec 2019+: AHF is already installed in your base image
Earlier versions: install either from an RU or from Doc 1070954.1
Installation and staying up to date
Exacloud ECS installer: one release behind latest
Release Updates: one release behind latest
MOS download: latest release (Doc 1070954.1)
Upgrade Upgrade Upgrade
4. EXAchk
Automatic proactive warnings
before you’re impacted
Results viewable in the
tool of your choice
Regular emails with
check results
Compliance checks for most
impactful reoccurring problems
No need to send
anything to Oracle
REDUCE
YOUR RISK
5. Oracle Stack Coverage
Engineered Systems
Oracle Exadata Database Machine
Oracle SuperCluster
Oracle Private Cloud Appliance
Oracle Database Appliance
Oracle Big Data Appliance
Oracle Zero Data Loss Recovery
Appliance
Oracle ZFS Storage Appliance
Systems
Oracle Solaris
Cross stack checks
Solaris Cluster
OVN
ASR
Oracle Database
Autonomous Database
Standalone Database
Grid Infrastructure & RAC
Maximum Availability Architecture
(MAA) Scorecard
Upgrade Readiness Validation
GoldenGate
Application Continuity
Enterprise Manager Cloud
Control
Repository
Agent
OMS
Middleware
Oracle Identity and Access
Management Suite (Oracle IAM)
Oracle CRM
Oracle Project Billing
Siebel
Database best practices
PeopleSoft
Database best practices
SAP
Exadata best practices
6. Performed as root user
On one bare-metal (BM) database server, or one domU/VM in a virtual deployment, in the cluster:
1. Unzip the latest AHF installer:
unzip AHF-LINUX_<version>.zip
2. Run the installer:
./ahf_setup -ahf_loc /opt -data_dir <ORACLE_BASE of Grid owner>
e.g.:
./ahf_setup -ahf_loc /opt -data_dir /u01/app/grid
AHF will upgrade any previous versions of EXAchk and TFA found on the system
Make sure you have enough room in /opt
Upgrading AHF on Exadata Bare Metal or Virtual DomU
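The reminder about free space in /opt can be turned into a pre-check that runs before the installer. The sketch below is an illustration only, not part of AHF: the function name has_free_kb is hypothetical, and the 2 GB threshold in the usage comment is an assumed value, since the slides do not say how much space the installer needs.

```sh
#!/bin/sh
# Hypothetical helper (not part of AHF): succeed only if the filesystem that
# holds <dir> has at least <required_kb> kilobytes available.
has_free_kb() (
  dir=$1
  required_kb=$2
  # df -P prints a single POSIX-format data line per filesystem; with -k the
  # fourth column is the available space in 1024-byte blocks.
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$required_kb" ]
)

# Example use before running ahf_setup (the threshold is an assumption):
#   has_free_kb /opt 2097152 || { echo "not enough room in /opt" >&2; exit 1; }
```

The function returns a non-zero status both when space is short and when the directory does not exist, so either condition stops the upgrade script early.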
7. Performed as root user
On each database server Xen dom0/KVM host in the cluster, perform the following steps:
1. Unzip the latest AHF installer:
unzip AHF-LINUX_<version>.zip
2. Run the installer:
./ahf_setup -ahf_loc /opt -silent -local -data_dir /opt
AHF will upgrade any previous versions of EXAchk and TFA found on the system
After AHF is installed local to each database server, the TFA daemons will discover each other, typically within 5 to 10 minutes
Make sure you have enough room in /opt
Upgrading Exadata Dom0 (Xen or KVM)
8. Performed as root user
On each database server in the cluster, perform the following steps:
1. Unzip the latest AHF installer:
unzip AHF-LINUX_<version>.zip
2. Run the installer:
./ahf_setup -ahf_loc /opt -silent -local -data_dir /u02
AHF will upgrade any previous versions of EXAchk and TFA found on the system
After AHF is installed local to each database server, the TFA daemons will discover each other, typically within 5 to 10 minutes
Make sure you have enough room in /opt
Upgrading AHF on Exadata Cloud@Customer
9. Architecture Options
ORAchk / EXAchk (many instances): run health checks; results delivered as HTML reports and email
Collection Manager (one instance, Oracle Database): view enterprise-wide results via the Collection Manager interface
Enterprise Manager (one instance, Oracle Database): view enterprise-wide results via the Enterprise Manager interface
ELK Stack (one instance, Elasticsearch): view enterprise-wide results via Kibana dashboards
AHF Service (one instance, Object Store): view enterprise-wide results via the AHF Service UI
Result formats shown in the diagram: SQL, XML, JSON
10. Building compliance with best practices
Development methodology
1. Idea: reports from development, testing, support, etc.
2. Expert review: weekly meetings to review and test
3. MOS Note 757552.1: published Exadata best practices
4. Default deployment: bake best practices back into the default deployment
5. ORAchk / EXAchk check: generation of new checks
11. EXAchk
AUTOMATED (recommended): run automatically and monitor the diffs. In virtualized Exadata, autorun runs only on domU
ON-DEMAND: run once a month; in virtualized Exadata, run on dom0, cells and switches
CONFIGURATION: run before and after configuration changes
MAINTENANCE: run before and after any planned software and hardware maintenance
EXAchk compliance use cases
12. Enterprise Manager
EXAchk compliance checks are
integrated into the OEM Compliance
Check Framework Dashboards and
Compliance Standards via the
Engineered System plug-in
AHF
AHF EXAchk is integrated with other
Oracle Health Check and compliance
management software
Cluster Verification Utility
CVU checks are run:
• During full EXAchk runs
• In -profile preinstall
• In -preupgrade
AutoUpgrade Utility
AutoUpgrade utility checks are run:
• In -preupgrade
DBSAT
EXAchk is also integrated with DBSAT
• In -profile security
Integration
13. Run on-demand: exachk
If you need support: -debug
Limit checks:
-profile: one or more of 40+ different component-focused check categories
-preupgrade: helps you plan your upgrade
-postupgrade: helps confirm a successful upgrade
Limit targets: -cells, -clusternodes, -ibswitches, -dbnames
Options
14. TFA scheduler used to run EXAchk, which results in reduced process overhead
Critical checks automatically run once a day at 2am; this can be changed with:
exachk -id exachk.autostart_client_exatier1 -set "AUTORUN_SCHEDULE=minute hour day month day_of_week"
Full checks run once a week at 3am Sunday; this can be changed with:
exachk -id exachk.autostart_client -set "AUTORUN_SCHEDULE=minute hour day month day_of_week"
For example, to change Critical checks to run at 8am every Monday & Thursday use:
exachk -id exachk.autostart_client_exatier1 -set "AUTORUN_SCHEDULE=* 8 * * 1,4"
EXAchk now runs via the TFA scheduler
TFA Scheduler EXAchk
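Because the schedule string has five positional fields, a misplaced or missing field is easy to overlook. The following sketch, under the assumption that you want to review the command before running it, only prints the exachk command and checks the field count; build_schedule_cmd is a hypothetical helper name, and the scheduler ids and the AUTORUN_SCHEDULE syntax are taken from the slide above.

```sh
#!/bin/sh
# Hypothetical helper: print (do not run) the command that changes an EXAchk
# autorun schedule, after checking that the schedule has exactly 5 fields.
# Usage: build_schedule_cmd <scheduler_id> "<minute hour day month day_of_week>"
build_schedule_cmd() (
  id=$1
  schedule=$2
  set -f              # keep "*" fields literal while splitting
  set -- $schedule    # split the schedule on whitespace
  if [ "$#" -ne 5 ]; then
    echo "schedule needs 5 fields: minute hour day month day_of_week" >&2
    return 1
  fi
  printf 'exachk -id %s -set "AUTORUN_SCHEDULE=%s"\n' "$id" "$*"
)

# Example: reproduce the Monday & Thursday 8am command from the slide.
build_schedule_cmd exachk.autostart_client_exatier1 '* 8 * * 1,4'
```

The function body runs in a subshell, so the temporary set -f does not affect the calling shell.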
15. 1. Proactive notification of compliance failures:
exachk -set "NOTIFICATION_EMAIL=SOME.BODY@COMPANY.COM"
2. Fault notification:
tfactl set notificationAddress=some.body@example.com
3. Database specific fault notification:
tfactl set notificationAddress=<db_owner_name>:another.person@example.com
4. Optionally configure an SMTP server:
tfactl set smtp
5. Confirm email notification works:
tfactl sendmail <email_address>
Configure email notification
16. Store your MOS credentials securely in an encrypted wallet, ready for future upload:
tfactl setupload -name mos_config -type https -url https://transport.oracle.com/upload/issue -proxy www-proxy.acme.com:80 -user john.doe@acme.com
Configure MOS upload
17. Upload your collection to MOS to get help from Oracle Support with a single command:
tfactl upload -name mos_config -id <sr-number> -file <file-name>
e.g.:
tfactl upload -name mos_config -id 3-123456789 -file my_TFA_collection.zip
exachk -name mos_config -id <sr-number> -zipfile <file-name>
e.g.:
exachk -name mos_config -id 3-123456789 -zipfile my_exachk_coll.zip
Upload to MOS
18. Run as root (recommended)
o ORAchk/EXAchk will su to lower privileged owners
of RDBMS or grid homes
o To specify a user other than root for these situations:
Run as RDBMS or GRID Homeowner
o User must be able to switch to root for root level
checks – several options:
1. Provide the root userid password at prompts
or
2. Set up sudo
or
3. Pre-configure passwordless SSH connectivity
or
4. Allow ORAchk/EXAchk to configure private keys for
remote nodes
Which User to Run as
Connect via SSH & run checks on | Default user | Change user by exporting user id in this environment variable
Exadata Storage Server | root | RAT_CELL_SSH_USER
InfiniBand switches | root (when run as root), nm2user (when run as other user) | RAT_IBSWITCH_USER
Note: You may only choose from the provided lower privileged account
Note:
•On SuperCluster you can use Role Based Access Control (RBAC) to execute root
privileged checks, no root user required.
•root checks must be run as a user with a root equivalent access role
•On Exalogic it is only supported to run as root
19. EXAchk will:
1. Prompt for remote node password
2. Login to remote node and generate private and public key pair on remote node
3. Copy the contents of the public key into the .ssh/authorized_keys file of the remote node and delete the
public key from the remote node
4. Copy private key of remote node into local node and use as identity file to make future
connections
Alternatively, you can provide the private key file yourself
Run:
ssh-keygen -f $HOME/.ssh/id_dsa.host.user -N ''
E.g.:
ssh-keygen -f $HOME/.ssh/id_dsa.myhost67.root -N ''
This will generate the following key pair in the $HOME/.ssh/ directory:
• id_dsa.myhost67.root (private key / identity file)
• id_dsa.myhost67.root.pub (public key)
Remote node connection without passwordless SSH
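When keys are needed for several remote nodes, the id_dsa.<host>.<user> naming can be derived rather than typed by hand. This is a minimal sketch; identity_file_for is a hypothetical helper name, and the optional third argument (the key directory) is an illustrative addition that defaults to $HOME/.ssh.

```sh
#!/bin/sh
# Hypothetical helper: print the identity-file path that follows the
# id_dsa.<host>.<user> naming convention.
# Usage: identity_file_for <host> <user> [ssh_dir]
identity_file_for() {
  printf '%s/id_dsa.%s.%s\n' "${3:-$HOME/.ssh}" "$1" "$2"
}

# Example: generate the key pair for root on myhost67 with an empty passphrase.
#   ssh-keygen -f "$(identity_file_for myhost67 root)" -N ''
```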
20. 1. TFA SECURE SOCKETS
Easier setup of ORAchk and EXAchk access to remote Database servers
[Diagram: Host 1, Host 2 and Host 3 each run ORAchk / EXAchk and TFA; the TFA instances are connected through secure sockets]
Used by EXAchk instead
of passwordless SSH
21. Subsequent emails compare results to previous run
• Easily see if something has changed
• Email attachment has:
o Latest report
o Previous report
o Diff Report
Email Notification
24. • ORAchk_Health_Check_Catalog.html
• EXAchk_Health_Check_Catalog.html
• Contains all published checks
• Filterable & searchable
• Product Area / Engineered System
• Profiles
• Alert Level
• Release Check Authored
• Platforms
• Privileged User
• Look up check id without running report
Health Check Catalog
25. 1. Checks run against all database nodes in the cluster by default
o To specify only a subset of nodes use: -clusternodes <node_1>,<node_2>
o Only local node: -localonly
2. Automatically discovers all databases and prompts for which should be checked
o Do not prompt but run all checks on all discovered databases: -dball
o Do not prompt and skip all database related checks: -dbnone
o Only run checks against a subset of databases: -dbnames <db_1>,<db_2>
o Only run checks against a subset of PDBs: -pdbnames <pdb_1>,<pdb_2>
Database Checks
26. Easier to stay up to date with Cluster Verification Utility checks
ORAchk
EXAchk
CVU
CVU
ORAchk
EXAchk
FULLY INTEGRATED
ORAchk/EXAchk will verify you
have a relevant CVU version
and if not, help in downloading
one
CVU checks are run by default
when you run ORAchk or
EXAchk
No CVU checks: -excludecvu
Only CVU checks: -cvuonly
27. Temporary files will be created during execution
Default location is $HOME
Location can be changed by setting RAT_TMPDIR:
export RAT_TMPDIR=<TEMP_DIR>
If using sudo access to root from a lower privileged user id, the temporary directory must be reflected in the /etc/sudoers file:
<user> ALL=(root) NOPASSWD:<TEMPDIR>/root_[orachk|exachk].sh
e.g.:
oracle ALL=(root) NOPASSWD:/tmp/root_orachk.sh
Root privilege checks run from root_orachk.sh or root_exachk.sh
• If you want the root script in a different directory to RAT_TMPDIR use RAT_ROOT_SH_DIR:
export RAT_ROOT_SH_DIR=/mylocation
oracle ALL=(root) NOPASSWD:/mylocation/root_exachk.sh
Temporary Working Directory
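The sudoers entry has to match the directory that actually holds the root script, so it can help to generate the line from the same values you export. The sketch below assumes a hypothetical helper named sudoers_entry; it only prints the line in the format shown above and does not touch /etc/sudoers.

```sh
#!/bin/sh
# Hypothetical helper: print the sudoers entry for the root script.
# Usage: sudoers_entry <user> <orachk|exachk> <script_dir>
# <script_dir> is the value of RAT_ROOT_SH_DIR if you set it, else RAT_TMPDIR.
sudoers_entry() (
  user=$1
  tool=$2
  script_dir=$3
  case $tool in
    orachk|exachk) ;;
    *) echo "tool must be orachk or exachk" >&2; return 1 ;;
  esac
  # Strip one trailing slash so the path has a single separator.
  printf '%s ALL=(root) NOPASSWD:%s/root_%s.sh\n' "$user" "${script_dir%/}" "$tool"
)

# Example: print the entry for the oracle user with RAT_ROOT_SH_DIR=/mylocation.
sudoers_entry oracle exachk /mylocation
```

The printed line is meant to be reviewed and then added with visudo, which validates sudoers syntax before saving.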
28. Database collections are executed in parallel if possible
The default number of slave processes used is calculated automatically
Default can be changed with -dbparallel <# slave processes> or -dbparallelmax
Parallel execution can be disabled altogether if required with -dbserial
Parallel Execution
29. Collections are typically of the format:
[orachk|exachk]_<dbserver>_<database>_<date>_<timestamp>.html
Tag collections so output contains another word to help differentiate it:
-tag <tag_name>
[orachk|exachk]_<dbserver>_<database>_<date>_<timestamp>_<tag_name>.html
Merge multiple reports into one with -merge and a list of collection directories or zip files:
-merge <collection_1>,<collection_2>
Compare collections with -diff:
-diff <collection_1>,<collection_2>
Tagging, Merging & Comparing Reports
30. 1. Profiles provide logical grouping of checks which are about similar topics
• Run only checks in a specific profile: -profile <profile>
• Run everything except checks in a specific profile: -excludeprofile <profile>
Profiles
31. Create user defined profiles by providing a comma separated list of check ids:
-createprofile <profile_name> <check_ids>
Once a user defined profile has been created, it can be modified:
-modifyprofile <profile_name> <check_ids>
• This list of check_ids can contain both new checks to be added and existing checks to be removed; ORAchk/EXAchk will add/remove as necessary
Delete a user defined profile:
-deleteprofile <profile_name>
User defined profiles
32. Granular control to execute or exclude a single check
Ideal for testing new checks or troubleshooting
Run only specific check(s):
-check <check_id_1>,<check_id_2>
Exclude a specific check:
-excludecheck <check_id_1>,<check_id_2>
Find check id either from report or Health Check Catalog
Run or exclude individual checks
33. 1. Generate a health check report
2. Fix the issues identified
3. Generate another health check report verifying only the issues that failed before
Only Run Checks that Previously Failed
-failedchecks <previous_result>
34. • Track changes to the attributes of important files with -fileattr
– Looks at all files & directories within Grid Infrastructure and Database homes by default
– The list of monitored directories and their contents can be configured to your specific requirements
– Use -fileattr start to take the first snapshot
Keep Track of Changes to the Attributes of Important Files
$ ./orachk -fileattr start
CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to
/u01/app/11.2.0.4/grid?[y/n][y]
Checking ssh user equivalency settings on all nodes in cluster
Node mysrv22 is configured for ssh user equivalency for oradb user
Node mysrv23 is configured for ssh user equivalency for oradb user
List of directories(recursive) for checking file attributes:
/u01/app/oradb/product/11.2.0/dbhome_11203
/u01/app/oradb/product/11.2.0/dbhome_11204
orachk has taken snapshot of file attributes for above directories at:
/orahome/oradb/orachk/orachk_mysrv21_20170504_041214
./orachk -fileattr start
35. Include other directories with -includedir <directories>, using a comma-separated list of directories
Keep Track of Changes to the Attributes of Important Files
./orachk -fileattr start -includedir "/home/oradb,/etc/oratab"
• Exclude the default discovered directories with -excludediscovery
./orachk -fileattr start -includedir "/home/oradb,/etc/oratab" -excludediscovery
36. Note:
• Use the same arguments with check that you used with start
• Will proceed to perform standard health checks after attribute checking
• File Attribute Changes will also show in HTML report output
Compare current attributes against first snapshot using -fileattr check
Keep Track of Changes to the Attributes of Important Files
$ ./orachk -fileattr check -includedir "/root/myapp/config" -excludediscovery
CRS stack is running and CRS_HOME is not set. Do you want to set CRS_HOME to
/u01/app/12.2.0/grid?[y/n][y]
Checking for prompts on myserver18 for oragrid user...
Checking ssh user equivalency settings on all nodes in cluster
Node myserver17 is configured for ssh user equivalency for root user
List of directories(recursive) for checking file attributes:
/root/myapp/config
Checking file attribute changes...
.
"/root/myapp/config/myappconfig.xml" is different:
Baseline : 0644 oracle root /root/myapp/config/myappconfig.xml
Current : 0644 root root /root/myapp/config/myappconfig.xml
…etc
./orachk -fileattr check
37. To prevent standard health checking after attribute checking add -fileattronly:
-fileattr check -fileattronly
To use a different snapshot baseline use -baseline:
-fileattr check -baseline <snapshot>
To remove all snapshots use -fileattr remove:
-fileattr remove
Keep Track of Changes to the Attributes of Important Files
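Since the check step should reuse the arguments given to the start step, one way to keep them in sync is to define the directory arguments once and derive both commands from them. The sketch below is a dry run that only prints the two commands; fileattr_commands is a hypothetical helper name, and combining -fileattronly with -includedir and -excludediscovery in one call is an assumption based on the options listed in these slides.

```sh
#!/bin/sh
# Dry-run sketch: print the snapshot and comparison commands with identical
# directory arguments. Nothing is executed.
# Usage: fileattr_commands "<dir_1>,<dir_2>"
fileattr_commands() (
  # Arguments shared by the start and check steps.
  common="-includedir \"$1\" -excludediscovery"
  printf './orachk -fileattr start %s\n' "$common"
  printf './orachk -fileattr check %s -fileattronly\n' "$common"
)

# Example: snapshot and later compare a single application directory.
fileattr_commands /root/myapp/config
```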
38. ORAchk and EXAchk can encrypt the resulting collection zip file
To use encryption add the option -encryptzip, e.g.:
-profile dba -encryptzip
• This will prompt for the password
• Once the zip file is encrypted, the original zip and directory will be deleted
To decrypt a zip use:
-decryptzip <zip_filename>
Encrypted resulting zip file
The encrypt/decrypt feature is only supported on Linux and Solaris platforms.
39. ORAchk and EXAchk include full REST support, allowing invocation & query over HTTPS
Oracle REST Data Services (ORDS) is included within the install
To enable REST:
1. Start ORDS:
-ordssetup
2. Start the daemon, using the -ords option:
-d start -ords
Start a full health check run by accessing the URL: https://<host>:7080/ords/tfaml/orachk/start_client
Run specific profiles: https://<host>:7080/ords/tfaml/orachk/profile/<profile1>,<profile2>
Run specific checks: https://<host>:7080/ords/tfaml/orachk/check/<check_id>,<check_id>
Any request will return a job id, which can then be used to query:
• Status: https://<host>:7080/ords/tfaml/orachk/status/<job_id>
• Download result: https://<host>:7080/ords/tfaml/orachk/download/<job_id>
REST Interface
The standalone ORDS setup feature utilizes file-based user authentication and is provided solely for use in test and development environments.
For production use, the included orachk.jar and ords.war should be deployed and configured.
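The REST endpoints listed above share one base path, so scripts that call them can build the URLs from a host name, an action and an optional argument. This is a minimal sketch: rest_url is a hypothetical helper name, and the port and path are copied from the slide rather than discovered from a running instance.

```sh
#!/bin/sh
# Hypothetical helper: build REST URLs in the form listed above.
# Usage: rest_url <host> start_client
#        rest_url <host> profile|check|status|download <argument>
rest_url() (
  host=$1
  action=$2
  arg=${3:-}
  base="https://${host}:7080/ords/tfaml/orachk"
  case $action in
    start_client)
      printf '%s/start_client\n' "$base" ;;
    profile|check|status|download)
      if [ -z "$arg" ]; then
        echo "$action needs an argument" >&2
        return 1
      fi
      printf '%s/%s/%s\n' "$base" "$action" "$arg" ;;
    *)
      echo "unknown action: $action" >&2
      return 1 ;;
  esac
)

# Example: URL that runs two profiles on myhost.
rest_url myhost profile dba,security
```

The resulting URL can be passed to any HTTPS client; how the client authenticates depends on your ORDS configuration and is not covered by the slide.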
45. No difference OR no regression failed in current collection
At least one regression from non-WARNING to WARNING OR found WARNING regression in current collection
At least one regression from non-FAIL to FAIL OR found FAIL regression in current collection
Non-clickable green flag: preceding collection not found
Recent Collections
Columns: Health Score, Warning count, Fail count, Info count, Pass count, Ignore count
48. User Defined Checks
• Use as a Health Checking Platform
• You write your own business
specific User Defined Checks
• Collection Manager authoring UI
very similar to Oracle’s internal
authoring tool
• OS or SQL logic
• Generates user_defined_checks.xml
sample in install directory
• Utilizes framework features such as
result output, email notification,
CM storage etc
49. User Defined Checks
• Have their own profile: user_defined_checks
• Can be excluded: -excludeprofile user_defined_checks
• Have their own section of the report
• Can be run on their own: -profile user_defined_checks
• Can have customized check names, pass and fail messages:
<existing_check_code>
echo "CUSTOM_CHECK_NAME=<customized_check_name>" >> CUSTOMIZE_CHECK_PARAMS
echo "CUSTOM_PASS_MSG=<customized_pass_message>" >> CUSTOMIZE_CHECK_PARAMS
echo "CUSTOM_FAIL_MSG=<customized_fail_message>" >> CUSTOMIZE_CHECK_PARAMS
50. 1. First time installation done via the APEX
workspace (5.x or higher)
2. Use the sql script applicable for your APEX
version:
• Eg APEX 5.x: Apex5_CollectionManager_App.sql
3. Follow Health Check Collection Manager
installation in the User Guide
4. Login to Collection Manager Application
via a URL like the following:
http://hostname:port/apex/f?p=ApplicationID
http://hostname:port/pls/apex/f?p=ApplicationID
Collection Manager upgrade done from orachk / exachk:
-cmupgrade
Will determine the APEX version you have and install the latest applicable Collection Manager app
If the Collection Manager schema changes in the future then ORAchk will prompt for auto upgrade
Setup
51. • Collection zip files are stored in the RCA13_DOCS table, which is created during Collection Manager installation
• Provide ORAchk details of where to upload collection results with -setdbupload all and complete the prompts:
-setdbupload all
• Get current values with -getdbupload:
-getdbupload
• Unset values with -unsetdbupload <parameter>:
-unsetdbupload RAT_UPLOAD_PASSWORD
Collection Storage Table
52. • Set all with: -setdbupload all
• Set specific variables by specifying a comma separated list: -setdbupload RAT_UPLOAD_CONNECT_STRING,RAT_UPLOAD_PASSWORD
• Unset all with: -unsetdbupload all
• Check if variables are set correctly: -checkdbupload
Store DB Upload Variables in Wallet
Other upload parameters (not set by default) | Description
RAT_UPLOAD_USER | The user to connect as (default is ORACHKCM)
RAT_UPLOAD_TABLE | The table name to store non-zipped collection results
RAT_PATCH_UPLOAD_TABLE | The table name to store non-zipped patch results
RAT_UPLOAD_ORACLE_HOME | The ORACLE_HOME used during establishing connection and uploading (uses GI HOME discovered by ORAchk by default)
RAT_UPLOAD_TABLE & RAT_PATCH_UPLOAD_TABLE: only needed if you are using your own custom application to view collection results, rather than Collection Manager.
53. Enterprise Manager Integration
•Check results integrated into EM compliance framework via plugin
•View results in native EM compliance dashboards
•Related checks grouped into compliance
standards
•View targets checked, violations & average score
•Drill down into compliance standard to see
individual check results
•View break down by target
54. Use the Enterprise Manager provisioning feature and
select ORAchk/EXAchk
Once selected, this launches the provisioning
wizard; choose the system type
Provision
55. Drill into applicable standard and
view individual checks & target
status
View Results by Compliance Standard
Filter by “Exachk%”
Click individual checks for
recommendation details
56. JSON Output to Integrate with Kibana, Elastic Search etc
• The JSON provides many tags to allow
dashboard filtering based on facts such as:
• Engineered System type
• Engineered System version
• Hardware type
• Node name
• OS version
• Rack identifier
• Rack type
• Database version
• And more...
• Kibana can be used to view health check
compliance across your data center
• Results can also be filtered based on any
combination of exposed system attributes
57. Results are also output in JSON format in the upload directory of the collection
Writing JSON Results With syslog
1. JSON output results can be sent to the syslogd daemon with the -syslog option, e.g.:
-set "AUTORUN_FLAGS=-syslog"
2. Message levels used are "crit", "err", "warn" and "info"
3. You can verify syslog configuration by running the following commands:
4. Then verify in your configured message location (e.g. /var/adm/messages) that each test message was written
JSON Result Output
58. Sensitive information can be hidden from diagnostics
Machine learning algorithms determine sensitive data like:
• Host names
• IP addresses
• MAC addresses
• Oracle Database names
• Tablespace names
• Service names
• Ports
• Operating system user names
Sanitize or mask sensitive information
59. Add -sanitize or -mask to any command
• -sanitize replaces a sensitive value with random characters
• myhost123 >>>> JnsF3km9
• -mask replaces a sensitive value with a series of ‘X’
• myhost123 >>>> XXXXXXXX
Sanitize or mask sensitive information
65. Understand what the repair command will do with:
Understand what the repair command does
tfactl orachk -showrepair 8300E0A2FFE48253E053D298EB0A76CC
TFA using ORAchk : /opt/oracle.ahf/orachk/orachk
Repair Command:
currentUserName=$(whoami)
if [ "$currentUserName" = "root" ]
then
repair_report=$(rpm -e stix-fonts 2>&1)
else
repair_report="$currentUserName does not have priviedges to run
$CRS_HOME/bin/crsctl set resource use 1"
fi
echo -e "$repair_report"
66. Run the checks again and repair everything that fails:
tfactl orachk -repaircheck all
Run the checks again and repair only the specified checks:
tfactl orachk -repaircheck <check_id_1>,<check_id_2>
Run the checks again and repair all checks listed in the file:
tfactl orachk -repaircheck <file>
Run the repair command
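A cautious workflow is to look at each repair command with -showrepair before asking for the repair itself. The sketch below is a dry run under that assumption: review_then_repair is a hypothetical helper name, and it only prints the tfactl commands for a list of check ids so they can be reviewed and run manually.

```sh
#!/bin/sh
# Dry-run sketch: for the given check ids, print one -showrepair command per
# check, followed by a single -repaircheck command covering all of them.
# Usage: review_then_repair <check_id_1> [<check_id_2> ...]
review_then_repair() (
  if [ "$#" -eq 0 ]; then
    echo "usage: review_then_repair <check_id>..." >&2
    return 1
  fi
  for id in "$@"; do
    printf 'tfactl orachk -showrepair %s\n' "$id"
  done
  IFS=,   # make "$*" join the ids with commas
  printf 'tfactl orachk -repaircheck %s\n' "$*"
)

# Example with the check id shown on the previous slide.
review_then_repair 8300E0A2FFE48253E053D298EB0A76CC
```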
67. ORAchk and EXAchk local language support
INTERNATIONALIZATION of MESSAGES & REPORTS
Spanish (es)
German (de)
French (fr)
Italian (it)
Japanese (ja)
Korean (ko)
Portuguese-Brazil (pt_BR)
Simplified Chinese (zh_CN)
Traditional Chinese (zh_TW)
export RAT_LANG={value}
e.g.:
export RAT_LANG=es
orachk
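A wrapper script can guard against mistyped language codes before exporting RAT_LANG. This is a minimal sketch; is_supported_rat_lang is a hypothetical helper name, and the accepted codes are exactly the nine listed on this slide, so any value not shown here is rejected.

```sh
#!/bin/sh
# Hypothetical helper: succeed only for the language codes listed on the slide.
is_supported_rat_lang() {
  case $1 in
    es|de|fr|it|ja|ko|pt_BR|zh_CN|zh_TW) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use in a wrapper script:
#   if is_supported_rat_lang "$lang"; then export RAT_LANG="$lang"; fi
```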
80. Thank You
Any Questions ?