Oracle Clusterware and Private Network Considerations - Practical Performance Management for Oracle RAC
 


    • Oracle Clusterware and Private Network Considerations - Practical Performance Management for Oracle RAC
      November 12, 2009
      Guenadi Nedkov Jilevski
    • Agenda
      Oracle RAC Fundamentals and Infrastructure.
      Analysis of Cache Fusion Impact on RAC.
      Private Interconnect Considerations.
      Aggregation.
      Common Problems and Symptoms - from Cache Fusion wait events and statistics.
      Diagnostics and Problem Troubleshooting.
      Q and A
    • Oracle RAC Fundamentals and Infrastructure
      Oracle RAC Architecture
    • Oracle RAC Fundamentals and Infrastructure
      Function and Processes of Global Enqueue Services (GES) and Global Cache Services (GCS)
    • Oracle RAC Fundamentals and Infrastructure
      Global Buffer Cache
    • Analyzing Cache Fusion Impact on RAC
      The cost of block access and cache coherency is represented by:
      Global Cache Services statistics
      Global Cache Services wait events
      The response time for cache fusion transfers is determined by:
      Overhead of the physical interconnect components
      IPC protocol
      GCS protocol
      Response time is generally not affected by disk I/O factors, except for the occasional log write performed when a dirty buffer is sent to another instance in a write-read or write-write situation.
    • Analyzing Cache Fusion Impact on RAC
      Typical Latencies for RAC Operations
      • CR block request time = build time + flush time + send time
      • Current block request time = pin time + flush time + send time
      • Latencies from V$SYSSTAT
      • Other Latencies may be seen in V$SEG_STATISTICS
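
The two formulas above can be worked through with a small calculation. What follows is a minimal sketch in Python, assuming the cumulative counters have already been read from V$SYSSTAT into a dictionary. The statistic names used as keys and the centisecond time unit are assumptions to verify against your release, and the sample figures are invented purely for illustration.

```python
# Assumed V$SYSSTAT statistic names for each latency component.
CR_TIMES = {
    "build": "gc cr block build time",
    "flush": "gc cr block flush time",
    "send": "gc cr block send time",
}
CURRENT_TIMES = {
    "pin": "gc current block pin time",
    "flush": "gc current block flush time",
    "send": "gc current block send time",
}


def avg_service_ms(stats, time_keys, served_key):
    """Average per-block time in milliseconds, split by component.

    Time counters are assumed to be cumulative centiseconds, so each
    value is multiplied by 10 and divided by the blocks served.
    """
    blocks = stats[served_key]
    parts = {}
    for label, key in time_keys.items():
        parts[label] = stats[key] * 10.0 / blocks if blocks else 0.0
    parts["total"] = sum(parts.values())
    return parts


# Invented sample counters, for illustration only.
sample = {
    "gc cr block build time": 120,
    "gc cr block flush time": 30,
    "gc cr block send time": 50,
    "gc cr blocks served": 10000,
    "gc current block pin time": 200,
    "gc current block flush time": 100,
    "gc current block send time": 100,
    "gc current blocks served": 20000,
}

cr = avg_service_ms(sample, CR_TIMES, "gc cr blocks served")
cur = avg_service_ms(sample, CURRENT_TIMES, "gc current blocks served")
print("CR block:", cr)
print("Current block:", cur)
```

Splitting the total into its components shows which term dominates. A large flush component, for example, points at log writes rather than at the interconnect.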
    • Analyzing Cache Fusion Impact on RAC
      Wait Events for RAC
      Wait events help to analyze what sessions are waiting for.
      Wait times are attributed to events that reflect the outcome of a request:
      Placeholders while waiting – wait_time = 0
      Placeholders after waiting – wait_time != 0
      Global cache waits are summarized in a broader category called the Cluster wait class.
      These wait events are used in ADDM to enable Cache Fusion diagnostics.
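
The wait_time rule above can be illustrated with a short sketch. It assumes session wait rows have already been fetched (for example from V$SESSION_WAIT) into a list of dictionaries with hypothetical keys `event` and `wait_time`; the sample rows are invented. It counts global cache events separately for waits still in progress (wait_time = 0) and waits that have completed (wait_time != 0).

```python
from collections import Counter


def split_gc_waits(rows):
    """Count gc events for in-progress and completed waits.

    Only events whose name starts with "gc " are considered.
    wait_time == 0 is treated as "still waiting", anything else
    as "wait already finished", following the rule on the slide.
    """
    waiting = Counter()
    waited = Counter()
    for row in rows:
        event = row["event"]
        if not event.startswith("gc "):
            continue
        if row["wait_time"] == 0:
            waiting[event] += 1
        else:
            waited[event] += 1
    return waiting, waited


# Invented sample rows, for illustration only.
rows = [
    {"event": "gc cr request", "wait_time": 0},
    {"event": "gc cr request", "wait_time": 0},
    {"event": "gc cr block 2-way", "wait_time": 1},
    {"event": "gc current block busy", "wait_time": 3},
    {"event": "db file sequential read", "wait_time": 0},
]

waiting, waited = split_gc_waits(rows)
print("in progress:", dict(waiting))
print("completed:", dict(waited))
```

In this sample the sessions still waiting show only the placeholder event, while the completed waits carry the more specific event names.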
    • Analyzing Cache Fusion Impact on RAC
      Wait Events Views
    • Analyzing Cache Fusion Impact on RAC
      Global Cache Wait Events: Overview
    • Analyzing Cache Fusion Impact on RAC
      2-way Block Request: Example
    • Analyzing Cache Fusion Impact on RAC
      3-way Block Request: Example
    • Analyzing Cache Fusion Impact on RAC
      2-way Grant: Example
    • Analyzing Cache Fusion Impact on RAC
      Enqueues are synchronous.
      Enqueues are global resources in RAC
      The most frequent waits are for:
      TX – row lock waits or ITL waits
      TM – Table Manipulation Enqueue
      TA – Transaction Recovery Enqueue
      SQ – Sequence generation Enqueue
      HW – High Watermark Enqueue
      US – Undo Segment Enqueue to manage undo segment extensions.
      These waits may constitute serious serialization points.
      Global Enqueue Waits: Overview
    • Analyzing Cache Fusion Impact on RAC
      Use V$SYSSTAT to characterize the workload.
      Use V$SESSSTAT to monitor important sessions.
      V$SEGMENT_STATISTICS includes RAC statistics.
      RAC-relevant statistics groups are:
      Global Cache Service statistics
      Global Enqueue Service statistics
      Statistics for messages sent
      V$ENQUEUE_STATISTICS identifies the enqueues with the highest impact.
      V$INSTANCE_CACHE_TRANSFER breaks down GCS statistics into block classes.
      Session and System Statistics
    • Private Interconnect Considerations
      IPC Configuration
    • Private Interconnect Considerations
      Infrastructure Network Packet Processing
    • Private Interconnect Considerations
      Network Packet Processing: Layers, Queues and Buffers
    • Private Interconnect Considerations
      The network between the nodes of a RAC cluster must be private.
      The NIC must have the same name on all nodes in the RAC cluster.
      Supported links: GbE, IB
      Supported transport protocols: UDP, RDS
      Use multiple or dual-ported NICs for redundancy (HA), load balancing, load spreading and increased bandwidth with NIC bonding/aggregation.
      Large (jumbo) frames are recommended for GbE if the global cache workload requires them.
      Bandwidth requirements depend on several factors (e.g. buffer cache size, number of CPUs per node, access patterns) and cannot be predicted precisely for every application.
      For OLTP, 1 Gb/sec is usually sufficient for performance and scalability.
      DSS/DW systems should be designed with > 1 Gb/sec capacity.
      Infrastructure: Private Interconnect
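
Although bandwidth needs cannot be predicted precisely, a rough estimate from an observed block transfer rate is easy to work out. The sketch below is a back-of-the-envelope calculation under stated assumptions: it counts only the data blocks themselves, ignores GCS/GES messages and protocol overhead, and treats 1 Gb as 10^9 bits. The function name and the sample rate are hypothetical.

```python
def interconnect_utilization(blocks_per_sec, block_size_bytes, link_gbps):
    """Fraction of link capacity consumed by block transfers alone.

    Ignores message traffic and protocol headers, so the real
    utilization will be somewhat higher than this estimate.
    """
    bits_per_sec = blocks_per_sec * block_size_bytes * 8
    return bits_per_sec / (link_gbps * 1e9)


# Invented workload: 5,000 blocks/sec of 8 KB blocks on a 1 Gb/sec link.
util = interconnect_utilization(5000, 8192, 1.0)
print(f"Estimated utilization: {util:.1%}")
```

With these invented figures, block traffic alone uses roughly a third of a single GbE link. The same function can be applied to rates taken from AWR for a DSS/DW workload to check whether more than 1 Gb/sec is needed.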
    • Private Interconnect Considerations
      Important Settings:
      Negotiated top bit rate and full duplex mode
      NIC ring buffers
      Ethernet flow control settings
      CPU(s) receiving network interrupts
      Verify your setup:
      CVU performs checks
      Load testing helps eliminate potential problems
      AWR and ADDM give estimations of link utilization
      Buffer overflows, congested links and flow control can have severe consequences for performance
      Block access latencies increase when CPUs are busy and run queues are long
      Immediate LMS scheduling is critical for predictable block access latencies when CPU is > 80% busy
      Fewer and busier LMS processes may be more efficient.
      Monitor their CPU utilization.
      Caveat: 1 LMS can be good for runtime performance but may impact cluster reconfiguration and instance recovery time.
      The default is good for most requirements. The gcs_server_processes init parameter overrides the default.
      Higher priority for LMS is the default.
      The implementation is platform-specific.
      Infrastructure: IPC configuration and Operating System
    • Private Interconnect Considerations
      The interconnect should be a dedicated, non-routable subnet mapped to a single dedicated, non-shared VLAN
      If VLANs are ‘trunked’, the interconnect VLAN traffic should not extend beyond the access switch layer
      Minimize the impact of Spanning Tree events
      Monitor the switch(es) for congestion
      Avoid QoS definitions that may negatively impact interconnect performance
      NIC driver dependent – DEFAULTS GENERALLY SATISFACTORY
      Confirm flow control: rx=on, tx=off
      Confirm full bit rate (1000) for the NICs
      Confirm full duplex auto-negotiate
      Ensure NIC names/slots identical on all nodes
      Configure interconnect NICs on fastest PCI bus
      Ensure compatible switch settings
      802.3ad on NICs = 802.3ad on switch ports
      MTU=9000 on NICs = MTU=9000 on switch ports
      FAILURE TO CONFIGURE THE NICS AND SWITCHES CORRECTLY WILL RESULT IN SEVERE PERFORMANCE DEGRADATION AND NODE FENCING
      The Interconnects, VLANs and NIC settings
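
The checklist above lends itself to an automated comparison. The following is a minimal sketch, assuming the NIC settings of each node have already been collected by whatever tool the platform provides and stored in dictionaries. All key names (`nic`, `speed_mbps`, `duplex`, `flow_rx`, `flow_tx`, `mtu`) and the sample values are hypothetical.

```python
def check_interconnect(nodes, switch_mtu):
    """Return a list of deviations from the slide's checklist."""
    problems = []
    # NIC names must be identical on all nodes.
    names = {cfg["nic"] for cfg in nodes.values()}
    if len(names) > 1:
        problems.append(
            "NIC names differ across nodes: " + ", ".join(sorted(names))
        )
    for node, cfg in sorted(nodes.items()):
        if cfg["speed_mbps"] != 1000:
            problems.append(f"{node}: bit rate is {cfg['speed_mbps']}, not 1000")
        if cfg["duplex"] != "full":
            problems.append(f"{node}: duplex is {cfg['duplex']}, not full")
        if (cfg["flow_rx"], cfg["flow_tx"]) != ("on", "off"):
            problems.append(f"{node}: flow control is not rx=on, tx=off")
        # The NIC MTU must equal the MTU on the switch port.
        if cfg["mtu"] != switch_mtu:
            problems.append(
                f"{node}: MTU {cfg['mtu']} does not match "
                f"switch port MTU {switch_mtu}"
            )
    return problems


# Invented sample settings, for illustration only.
nodes = {
    "node1": {"nic": "eth1", "speed_mbps": 1000, "duplex": "full",
              "flow_rx": "on", "flow_tx": "off", "mtu": 9000},
    "node2": {"nic": "eth2", "speed_mbps": 1000, "duplex": "full",
              "flow_rx": "on", "flow_tx": "off", "mtu": 1500},
}

problems = check_interconnect(nodes, switch_mtu=9000)
for line in problems:
    print(line)
```

An empty result means the collected settings agree with the checklist. It says nothing about settings that were not collected, such as 802.3ad configuration on the switch side.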
    • Private Interconnect Considerations
    • Aggregation
      Cisco Etherchannel based 802.3ad
      AIX Etherchannel
      HP-UX Auto Port Aggregation
      Sun Trunking, IPMP, GLD
      Linux Bonding (only certain modes)
      Windows NIC teaming
      Aggregation Methods
      Load balance/failover/load spreading
      spread on sends/serialize on receives
      Active/Standby
      Oracle Interconnect Requirement
      Both Send/Receive side load balancing
      NIC and Switch port failure detection
    • Common Problems and Symptoms
      gc [current/cr] block lost: This event shows block losses during transfers. High values indicate IPC or downstream network problems. A ‘request retry’ event is likely to be seen as well.
      global cache blocks corrupt: This statistic shows whether any blocks were corrupted during transfers. High values probably indicate an IPC, network or hardware problem.
      global cache open s and global cache open x: These events are generated by the initial access of a particular data block by an instance. The wait should be short, and its completion is most likely followed by a read from disk. The wait occurs because the requested blocks are not cached in any instance of the cluster database. Pre-load heavily used tables into the buffer caches.
      global cache null to s and global cache null to x: These events are generated by inter-instance block pings across the network. An inter-instance block ping occurs when two instances exchange the same block back and forth. Reduce the number of rows per block to reduce the need for block swapping between two instances in the RAC cluster.
      global cache cr request: This event is generated when an instance has requested a consistent read data block and the block has not yet arrived at the requesting instance. It is a placeholder event; look for other gc events.
      gc buffer busy: This event can be associated with disk I/O contention, for example slow disk I/O due to a rogue query. Slow concurrent scans can cause buffer cache contention. Note, however, that there can be multiple symptoms for the same cause. It can be seen together with the ‘db file scattered read’ event. Global cache access and serialization contribute to this event. Serialization is likely to be due to log flush time on another node or immediate block transfers.
      Wait events worth investigation
    • Common Problems and Symptoms
      congested: Events that contain ‘congested’ suggest CPU or LMS saturation, long-running queries, swapping or network configuration issues. Maintain a global view and remember that symptom and cause can be on different instances.
      busy: Events that contain ‘busy’ indicate contention. Investigate by drilling down into either the SQL with the highest cluster wait time or the segment statistics with the highest block transfers. Also look at objects with the highest number of block transfers and global serialization.
      gc [current/cr] [2/3]-way – Increase private interconnect bandwidth and decrease private interconnect latency.
      gc [current/cr] grant 2-way – Increase private interconnect bandwidth and decrease private interconnect latency.
      gc [current/cr] [block/grant] congested – The block or grant was eventually received, but with a delay caused by intensive CPU consumption, lack of memory, LMS overload due to long work queues, paging or swapping. This is worth investigating as it provides room for improvement. We will look at it later.
      gc [current/cr] block busy – The block was received but was not sent immediately because of high concurrency or contention. A block can be busy for a variety of reasons; it simply could not be sent immediately for Oracle-internal reasons.
      gc current grant busy – The grant was received, but with a delay due to many shared block images or load.
      gc [current/cr] [failure/retry] – Failure means that the block image cannot be received; retry means that the problem recovers and the block image is ultimately received after retrying. Investigate IPC or downstream network problems.
      Wait events worth investigation
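
The keyword-based reading of event names described on the last two slides can be sketched as a small lookup. This is an illustrative helper only, not an Oracle facility: the function name, the category labels and the order in which keywords are checked are assumptions, and the hints simply restate the slide text.

```python
# (keywords, category, hint) checked in order; first match wins.
HINTS = [
    (("lost",), "network", "check IPC and downstream network"),
    (("failure", "retry"), "network", "check IPC and downstream network"),
    (("congested",), "load", "check CPU, LMS saturation, paging, swapping"),
    (("busy",), "contention", "drill into SQL and segment statistics"),
    (("2-way", "3-way"), "interconnect",
     "review interconnect bandwidth and latency"),
]


def triage(event):
    """Map a gc wait event name to a (category, hint) pair."""
    name = event.lower()
    for keywords, category, hint in HINTS:
        if any(keyword in name for keyword in keywords):
            return category, hint
    return "unknown", "no hint in this sketch"


for event in ("gc cr block congested", "gc current block busy",
              "gc cr block 2-way", "gc current block lost"):
    print(event, "->", triage(event))
```

The lookup only narrows down where to start. As noted above, symptom and cause can be on different instances, so the hint still needs to be checked cluster-wide.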
    • Diagnostics and Problem Determination
      Tune for a single instance first
      Tune for RAC
      Instance Recovery
      Interconnect traffic
      Points of serialization can be exacerbated
      RAC reactive tuning tools:
      Specific wait events
      System and enqueue statistics
      Enterprise Manager performance pages
      AWR and ASH reports
      RAC proactive tuning tools:
      AWR snapshots
      ADDM reports
    • Diagnostics and Problem Determination
      Application tuning is often the most beneficial.
      Resizing and tuning the buffer cache.
      Reducing long full-table scans in OLTP systems.
      Using Automatic Segment Space Management.
      Increasing sequence caches.
      Using partitioning to reduce inter-instance traffic.
      Avoiding unnecessary parsing.
      Minimizing locking usage.
      Removing unselective indexes.
      Configuring the interconnect properly.
      Most common RAC tuning tips
    • Diagnostics and Problem Determination
    • Oracle Clusterware and Private Network Considerations - Practical Performance Management for Oracle RAC
      Questions & Answers