Beyond Metrics – Oracle AHF Insights for Proactive Database Management
Sandesh Rao
VP AIOps and Machine Learning, Autonomous Database
@sandeshr
https://www.linkedin.com/in/raosandesh/
https://www.slideshare.net/SandeshRao4
Confidential - Oracle Restricted
AHF Stack (on a database system)
AHF Insights
▪ Purpose
  ▪ Provides a bird's-eye view of the system from a diagnostic perspective.
  ▪ Offers insights for effective issue resolution with guidance and correlation.
  ▪ Unifies the AHF stack under a single user interface.
▪ Usage
  ▪ Reactive and proactive.
▪ Target Users
  ▪ Customers, Operations, Support, Development
▪ Collected by default in AHF collections from AHF 24.2
▪ To generate AHF Insights, run:
AHF Insights Overview - Demo
AHF Insights (over command-line)
AHF Insights as part of AHF collection
AHF Insights Overview
Need configuration details while troubleshooting an issue
Configuration Details
▪ BareMetal system (DomU has access to query storage server and fabric switch details)
▪ DomU has limited access
Cluster Details
ASM Details
Database Details
Database Server Configuration
Database Parameters
Kernel Parameters
Troubleshoot: Issue due to inconsistent RPMs
Issue due to inconsistent RPMs
▪ Example Scenario - Inconsistent glibc Package Versions
▪ Issue - Node Eviction and Clusterware Instability
▪ Example
  ▪ In a four-node Oracle RAC cluster, nodes 1 and 2 have glibc version 2.17-307 installed, while nodes 3 and 4 have glibc version 2.17-307.el7.1 installed. This discrepancy can cause several problems.
▪ Impact
  ▪ Node Eviction - Because the glibc versions differ, nodes 3 and 4 might face eviction as the clusterware detects inconsistencies in the environment.
  ▪ Clusterware Instability - The inconsistency in glibc can cause instability in Oracle Clusterware, leading to startup failures and communication errors.
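The glibc scenario above comes down to comparing the package inventory of every node. The following is a minimal sketch of that comparison, not how AHF Insights implements it. The function name `find_inconsistent_packages`, the inventory layout, and the `libaio` entry are assumptions made for illustration; only the glibc versions come from the example.

```python
from collections import defaultdict


def find_inconsistent_packages(inventory):
    """Return {package: {version: [nodes]}} for packages whose version differs across nodes."""
    versions = defaultdict(lambda: defaultdict(list))
    for node, packages in inventory.items():
        for name, version in packages.items():
            versions[name][version].append(node)
    # Keep only packages that appear with more than one distinct version.
    return {
        name: {version: sorted(nodes) for version, nodes in by_version.items()}
        for name, by_version in versions.items()
        if len(by_version) > 1
    }


# Hypothetical inventory mirroring the four-node example (libaio is illustrative filler).
inventory = {
    "node1": {"glibc": "2.17-307", "libaio": "0.3.109-13"},
    "node2": {"glibc": "2.17-307", "libaio": "0.3.109-13"},
    "node3": {"glibc": "2.17-307.el7.1", "libaio": "0.3.109-13"},
    "node4": {"glibc": "2.17-307.el7.1", "libaio": "0.3.109-13"},
}

report = find_inconsistent_packages(inventory)
for name, by_version in report.items():
    for version, nodes in sorted(by_version.items()):
        print(f"{name} {version}: {', '.join(nodes)}")
```

This sketch only flags version mismatches; a package that is missing entirely from one node would need a separate check.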
RPMS & Inconsistencies
Troubleshoot: Issue due to software version lower than MAA Software Recommendations
Recommended Software
All software should be updated regularly. Maintaining software at current or recent releases provides the following benefits:
▪ Better software security
▪ More stable maintenance releases
▪ Continued compatibility with newer related software
▪ Better support and faster resolution of issues
▪ Ability to receive fixes for newly discovered issues
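Checking installed software against a recommended minimum is, at its core, a version comparison. The sketch below shows one way to do it for purely numeric dotted versions. The component names and version numbers are hypothetical and do not reflect the actual MAA recommendations, and `parse_version` and `below_recommended` are illustrative helpers rather than AHF functions.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '19.24.0.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def below_recommended(installed, recommended):
    """Return {component: (installed, minimum)} for components older than the minimum."""
    return {
        name: (installed[name], minimum)
        for name, minimum in recommended.items()
        if name in installed
        and parse_version(installed[name]) < parse_version(minimum)
    }


# Hypothetical values for illustration only.
installed = {"grid_infrastructure": "19.18.0.0", "database": "19.24.0.0"}
recommended = {"grid_infrastructure": "19.24.0.0", "database": "19.24.0.0"}

outdated = below_recommended(installed, recommended)
for name, (current, minimum) in outdated.items():
    print(f"{name}: installed {current}, recommended at least {minimum}")
```

Comparing tuples of integers avoids the string-comparison trap where "19.9" would sort after "19.10".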
Troubleshoot: Issue due to recent changes on the system
Issue due to recent changes on the system
▪ Issue due to application of a new patch
▪ Issue due to changes to an ASM / database parameter
▪ Issue due to a new OS package being installed
▪ Issue due to new Oracle software being installed
System Changes
Troubleshoot: Space Usage Issues
Space Usage Issues
Troubleshoot: Issue due to Best Practice Violations
Best Practice Violations
Troubleshoot: Major Events happening across the cluster
Customer complains of “Grid failure - CRS-8503 []” in SR
▪ For troubleshooting, one needs to know:
  ▪ What type of system does the user have?
  ▪ What is going on around the time of the issue?
  ▪ Can I get a full picture across all nodes?
  ▪ Can I zoom into a specific timeframe?
  ▪ Can I look at the data from various perspectives?
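Two of the questions above, getting a full picture across all nodes and zooming into a specific timeframe, can be sketched as merging per-node event lists and filtering them around the issue time. This is an illustrative sketch only. The event tuples, node names, messages, and the `events_around` helper are assumptions, not AHF data structures or real log output.

```python
import heapq
from datetime import datetime, timedelta


def events_around(per_node_events, issue_time, window):
    """Merge per-node event lists (each sorted by time) and keep events within +/- window."""
    merged = heapq.merge(*per_node_events.values())
    start, end = issue_time - window, issue_time + window
    return [event for event in merged if start <= event[0] <= end]


# Hypothetical events as (timestamp, node, description) tuples.
per_node_events = {
    "node1": [
        (datetime(2024, 1, 10, 9, 0), "node1", "patch applied"),
        (datetime(2024, 1, 10, 11, 58), "node1", "memory pressure high"),
    ],
    "node2": [
        (datetime(2024, 1, 10, 11, 59), "node2", "network heartbeat missed"),
        (datetime(2024, 1, 10, 12, 1), "node2", "node evicted"),
    ],
}

issue_time = datetime(2024, 1, 10, 12, 0)
nearby = events_around(per_node_events, issue_time, timedelta(minutes=5))
for timestamp, node, description in nearby:
    print(timestamp.isoformat(), node, description)
```

Widening or narrowing `window` corresponds to zooming the timeline out or in.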
Customer’s System around the time of Issue
Major Events around the time of issue
Troubleshoot: Operating System Issues
Customer’s System undergoes Node eviction
High Memory Pressure
Increase in RSS consumption by ‘extract’ process
50GB RSS hogged by extract process
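Spotting a process like ‘extract’ amounts to comparing each process's resident set size (RSS) at the start and end of the observation window. The sketch below assumes a simple list of chronological `(process, rss_gb)` samples. All values are hypothetical except the 50 GB end state taken from the slide, and `rss_growth` is an illustrative helper, not part of AHF.

```python
def rss_growth(samples):
    """Return {process: last_rss - first_rss} from chronologically ordered samples."""
    first, last = {}, {}
    for process, rss_gb in samples:
        first.setdefault(process, rss_gb)  # remember the earliest sample only
        last[process] = rss_gb             # keep overwriting with the latest sample
    return {process: last[process] - first[process] for process in first}


# Hypothetical samples in GB, in chronological order.
samples = [
    ("extract", 5), ("oracle", 12),
    ("extract", 20), ("oracle", 12),
    ("extract", 50), ("oracle", 13),
]

growth = rss_growth(samples)
top_process = max(growth, key=growth.get)
print(f"Largest RSS growth: {top_process} (+{growth[top_process]} GB)")
```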
Troubleshoot: Database Anomalies
Database Anomalies as observed by Cluster Health Advisor
Troubleshoot: Node Eviction Problems
Node eviction due to Huge Page over-allocation
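As a rough illustration of what HugePages over-allocation means on Linux, the sketch below parses `/proc/meminfo`-style text and computes the share of physical memory reserved for HugePages. The field names `MemTotal`, `HugePages_Total`, and `Hugepagesize` are standard `/proc/meminfo` entries. The sample values and the 80% threshold are assumptions chosen for illustration, not an Oracle recommendation or the rule AHF Insights applies.

```python
def parse_meminfo(text):
    """Parse '/proc/meminfo'-style lines into {field: integer value}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def hugepage_fraction(meminfo):
    """Fraction of physical memory reserved for HugePages (sizes are in kB)."""
    reserved_kb = meminfo["HugePages_Total"] * meminfo["Hugepagesize"]
    return reserved_kb / meminfo["MemTotal"]


# Hypothetical sample; on a live system this text would come from /proc/meminfo.
SAMPLE = """\
MemTotal:       263932416 kB
HugePages_Total:  120000
HugePages_Free:     1000
Hugepagesize:       2048 kB
"""

ASSUMED_THRESHOLD = 0.80  # illustrative cut-off, not an official limit

meminfo = parse_meminfo(SAMPLE)
fraction = hugepage_fraction(meminfo)
over_allocated = fraction > ASSUMED_THRESHOLD
print(f"HugePages reserve {fraction:.0%} of RAM; over-allocated: {over_allocated}")
```

Memory reserved for HugePages is unavailable to ordinary processes, so a very high share leaves little room for the OS and Grid Infrastructure and can contribute to the memory exhaustion behind an eviction.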
AHF Insights – New Features
✓ Insights in Diagnostic Collections by Default
  ✓ All manual collections and a subset of SRDCs
✓ Ability to schedule automatic AHF Insights generation
✓ Support Insights for Longer Time Ranges
✓ Intelligent OS Resolution reduction
✓ Database Anomalies Advisor (New Section)
✓ Space Usage (New Section)
✓ Support for Single Instance Systems
✓ Augment Storage Cell information from Exawatcher Data
✓ Data Guard support in Insights (New Section)
✓ Detected Problem in Insights (New Section)
✓ Insights Correlation Engine
✓ CHA Bayesian Network
• Performance Reports in Insights (New Section)
✓ Problem Summary - Guided Resolutions
✓ Node Evictions & Performance Issues Are Easier to Resolve
  ✓ Memory exhaustion due to
    ✓ HugePages over-allocated
    ✓ Database or Grid Infrastructure process increasing memory usage
    ✓ New database started
  ✓ Multipath disk failures
  ✓ Hangs and performance issues caused by
    ✓ Archiver stuck
    ✓ Latch contention due to misconfigured target_pdbs parameter
AHF Insights – User Experience Improvements
• Improved Operating System reporting
  • More space to explore the findings with a full widescreen view
  • View the correlated event information in a subplot within the Summary Timeline Gantt Chart
  • View metrics associated with the problem finding in a visual format
  • Easy to jump to the appropriate problem section by interacting with charts
  • Easier to Diagnose Problems with Disks or OS Processes
• Improved Timeline
  • Additional Timeline Views (Database Faceted, Component Faceted)
  • Timeline Includes Patch Information
  • Timeline chart and table now dynamically adjust time ranges
• Performance improvements
  • Report's browser load times
  • Insights Report generation time along with Diagnostic Collections
• Deep linking into individual sections of the Insights report
• Copy Diagnostic Information from AHF Insights as Plain Text from more sections
• Insights Accessibility Improvements
Beyond Metrics – Oracle AHF Insights for Proactive Database Management - DOAG 2024
