Oracle Real Application Cluster
Satishbabu Gunukula
12+ Years of Experience in Oracle and SQL Server Database Technologies
Oracle Certified Professional Oracle 8i, 9i, 10g
Oracle Certified Expert Oracle 10g RAC
http://www.oracleracexpert.com
Objective
• What is Real Application Cluster?
• Oracle Cluster Benefits & Components
• Oracle Cluster Ready Services
• Interconnect & Cache Fusion
• Oracle RAC Database & ASM
• Transparent Application Failover (TAF)
• Backup & Recovery
• New Features in Oracle 11g RAC
What is Real Application Cluster
• Lower Cost
• Scalability
• High Availability
• Ease Of Administration
• Transparent to users
• GRID computing
• More than one instance per database
• Instances run on different nodes
• Instances on different nodes write to the same physical
database
What is Real Application Cluster
• Control files, data files, temp files and the spfile reside on
shared storage
• Shared storage may be a clustered file system, ASM
or raw volumes
• Every instance has its own redo log files and undo
segments
• Every instance has its own background processes
• All caches (data buffer, library, data dictionary) are
synchronized by Cache Fusion, and resources are managed
globally
• The database can be backed up and recovered from any instance in the
cluster
What is Real Application Cluster
• Sessions fail over using Transparent Application Failover
(TAF)
• Users can connect to any active instance (load balancing)
• Additional waits due to interconnect traffic
[Figure: two-node RAC architecture]
• Node1 runs Instance 1 and Node2 runs Instance 2; each node keeps its archived logs on local storage
• Shared storage holds the data files, temp files, control files, Flash Recovery Area files, change tracking file and spfile
• Shared storage also holds, for each instance separately, an undo tablespace and a set of online redo log files
Shared Storage
Shared storage is a critical component in Oracle RAC. Both
SAN and NAS are supported.
• Supported file storage
- Raw volumes
- Cluster file system (Oracle ACFS, OCFS, OCFS2)
- Oracle ASM (Automatic Storage Management)
- Direct NFS (new feature in 11g)
• Physical connections to shared storage
- Fully redundant active-active I/O paths
• For iSCSI - Fully redundant I/O paths with multiple
NICs in each server and Gigabit Ethernet switches
Oracle Clusterware
• Oracle Clusterware enables servers to communicate
with each other
• Each server in the cluster has additional processes that
communicate with the other servers.
• Oracle Clusterware manages resources such as
virtual IP addresses, instances, databases, listeners, services, etc.
• You can also use Oracle Clusterware to manage user
applications.
Benefits of using Cluster
• Scalability of Applications
• Use of less expensive commodity hardware
• Ability to fail over
• Ability to program the startup of applications in a planned
order
• Ability to monitor processes
• Ability to restart processes if they stop
• Ability to increase capacity over time by adding servers
Oracle Clusterware Benefits
• Eliminate unplanned downtime
• Reduce or eliminate planned downtime for maintenance
• Increase throughput by enabling applications to run on
all the nodes in a cluster
• Increase throughput on demand for cluster-aware
applications by adding servers
• Reduce the total cost for infrastructure
Oracle Clusterware components
• Software components
Voting Disk - Oracle Clusterware uses this component to
determine which nodes are members of the cluster
Oracle Cluster Registry (OCR) - Oracle Clusterware uses the OCR to
store and manage information about high-availability
components in the cluster, such as the cluster node list, cluster
database instance-to-node mapping, VIP addresses, services and
applications.
• Both the voting disk and the OCR must reside on shared storage that is
accessible by all nodes in the cluster
• Use at least 3 voting disks for redundancy; the maximum is 15
• To ensure high availability, multiplex the OCR in up to 5
locations
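Why three voting disks rather than two can be shown with a little arithmetic. The sketch below assumes the commonly documented rule that a node must be able to access a strict majority (more than half) of the configured voting disks to stay in the cluster. The function names are hypothetical and are not part of any Oracle tool.

```python
def majority_needed(voting_disks: int) -> int:
    """Smallest number of voting disks that forms a strict majority."""
    if voting_disks < 1:
        raise ValueError("at least one voting disk is required")
    return voting_disks // 2 + 1


def tolerated_failures(voting_disks: int) -> int:
    """Number of voting disks that can be lost while a majority stays reachable."""
    return voting_disks - majority_needed(voting_disks)


if __name__ == "__main__":
    # Columns: configured disks, majority needed, failures tolerated
    for n in (1, 2, 3, 4, 5):
        print(n, majority_needed(n), tolerated_failures(n))
```

Under this assumption an even count buys nothing over the odd count below it: four disks tolerate one failure, the same as three.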
Oracle clusterware Network Config
• The VIP addresses must be resolvable by the clients. The
Grid Naming Service (GNS) is linked to the corporate
Domain Name Service (DNS) so that clients can easily
connect to the cluster and the databases running there
• Single Client Access Name (SCAN) - The SCAN is a single
name that resolves to three IP addresses in the public
network. When using GNS and DHCP, Oracle Clusterware
configures the VIP addresses for the SCAN name that is
provided during cluster configuration
• Oracle 11g R2 supports the use of DHCP for all private
interconnect addresses, as well as for most of the VIP
addresses.
Interconnects
• The cluster interconnect is a critical private network
used for communication between the nodes
• Network pings are performed by Cluster Synchronization
Services (ocssd.bin)
• Nodes are connected to each other via a switch
• Traffic over the interconnect introduces new wait events
• Faster interconnect technology has helped Cache Fusion
• For redundancy you can use OS-dependent methods such as bonding on
Unix and teaming on Windows
• An OS-independent redundant interconnect is available from
11.2.0.2 onwards (not on Windows)
Cache Fusion
• Cache coherency is the technique used to keep multiple
copies of a block consistent across different Oracle
instances.
• GCS implements cache coherency by using the Cache
Fusion algorithm
• GES manages all non-Cache Fusion resource operations
• Cache Fusion addresses several types of concurrency:
– Concurrent reads on multiple nodes
– Concurrent reads and writes on different nodes
– Concurrent writes on different nodes
• Cache Fusion was partially implemented in Oracle 8i OPS
Cache Fusion
• Request a block for a Modification
1. Instance 1 submits a request to the GCS to modify the block.
2. The GCS forwards the request to the current holder, instance 2.
3. Instance 2 receives the request, and its LMS process
sends the block to instance 1.
4. On receipt of the block, instance 1 informs the GCS that it holds
the block in exclusive mode.
Cache Fusion
• Write a Block to Disk
1. Instance 2 sends a request to the GCS to write the block to disk.
2. The GCS forwards the request to instance 1.
3. Instance 1 receives the request and writes the block to disk.
4. Instance 1 notifies the GCS that the write has completed.
5. After receiving the notification, the GCS orders PI holders to discard their
PIs.
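The two message flows above (request for modification, then write to disk) can be traced with a toy model of a single block. This is only a sketch: the class and method names are hypothetical, the real GCS protocol has many more states and messages, and it assumes the previous holder had changed the block, so it keeps a past image (PI) after shipping it.

```python
class ToyBlock:
    """Toy model of one data block cached by several RAC instances."""

    def __init__(self, instances, holder):
        if holder not in instances:
            raise ValueError("holder must be one of the instances")
        # None means the instance does not cache the block at all.
        self.state = {name: None for name in instances}
        self.state[holder] = "XCUR"
        self.messages = []

    def _current_holder(self):
        return next(name for name, s in self.state.items() if s == "XCUR")

    def request_for_modification(self, requester):
        holder = self._current_holder()
        if requester == holder:
            return  # already held in exclusive mode, nothing to transfer
        self.messages += [
            f"{requester} -> GCS: request block for modification",
            f"GCS -> {holder}: forward request",
            f"{holder} (LMS) -> {requester}: ship block over interconnect",
            f"{requester} -> GCS: block held in exclusive mode",
        ]
        # Assumption: the old holder had dirtied the block, so it keeps a PI.
        self.state[holder] = "PI"
        self.state[requester] = "XCUR"

    def write_to_disk(self, requester):
        holder = self._current_holder()
        self.messages += [
            f"{requester} -> GCS: request write to disk",
            f"GCS -> {holder}: forward write request",
            f"{holder}: write block to disk",
            f"{holder} -> GCS: write completed",
        ]
        # Once the current version is on disk, past images are not needed.
        for name in list(self.state):
            if self.state[name] == "PI":
                self.state[name] = None
                self.messages.append(f"GCS -> {name}: discard past image")


if __name__ == "__main__":
    block = ToyBlock(["instance1", "instance2"], holder="instance2")
    block.request_for_modification("instance1")
    block.write_to_disk("instance2")
    print("\n".join(block.messages))
```

Running the script prints the messages in the same order as the numbered steps on the two slides.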
Block Access and Buffer states
• To see a buffer's state, query the STATUS column of the
V$BH dynamic performance view.
• Block access mode NULL, buffer state name CR
– An instance can perform a consistent read of the block, that is, if the instance
holds an older version of the data.
• Block access mode S, buffer state name SCUR
– An instance has shared access to the block and can only perform
reads.
• Block access mode X, buffer state name XCUR
– An instance has exclusive access to the block and can modify it.
• Block access mode NULL, buffer state name PI
– An instance has made changes to the block but retains copies of
it as past images to record its state before changes.
Block Access and Buffer states
• SCUR and PI buffer states are RAC specific
• There can be only one copy of any one block buffered in the XCUR
state
• To perform modifications on a block, a process must assign an
XCUR buffer state to the buffer containing the data block.
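The "only one XCUR copy" rule can be expressed as a simple check over a snapshot of buffer states. The sketch below works on hypothetical (instance, block id, status) tuples of the kind one might collect from the STATUS column across instances. It does not query the database, and the function name is an illustration only.

```python
from collections import Counter


def blocks_with_multiple_xcur(rows):
    """Return block ids that appear in XCUR state on more than one buffer.

    rows: iterable of (instance_id, block_id, status) tuples.
    """
    counts = Counter(
        block_id
        for _instance_id, block_id, status in rows
        if status.upper() == "XCUR"
    )
    return sorted(block_id for block_id, n in counts.items() if n > 1)


if __name__ == "__main__":
    snapshot = [
        (1, 100, "xcur"),  # instance 1 holds block 100 exclusively
        (2, 100, "pi"),    # instance 2 keeps a past image of block 100
        (1, 200, "scur"),  # block 200 is shared for reads
        (2, 200, "scur"),
    ]
    print(blocks_with_multiple_xcur(snapshot))
```

An empty result means the snapshot is consistent with the rule stated above.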
Cluster Ready Services
The following processes must be running after CRS installation for
Cluster Ready Services to function
• evmd -- Event manager daemon that starts the racgevt
process to manage callouts.
• ocssd -- Manages cluster node membership and runs as
oracle user; failure of this process results in cluster
restart.
• crsd -- Performs high availability recovery and
management operations such as maintaining the OCR.
Also manages application resources and runs as root
user and restarts automatically upon failure.
Cluster ready service Stack
• Cluster Ready Services (CRS): For managing high
availability operations in a cluster.
• Cluster Synchronization Services (CSS): Manages the
cluster configuration by controlling which nodes are
members of the cluster and by notifying members when
a node joins or leaves the cluster.
• Automatic Storage Management (ASM): Provides disk
management for Oracle Clusterware.
• Cluster Time Synchronization Service (CTSS): Provides
time management in a cluster for Oracle Clusterware.
• Event Management (EVM): A background process that
publishes events that Oracle Clusterware creates.
Cluster ready service Stack
• Oracle Notification Service (ONS): A publish-and-subscribe
service for communicating Fast Application
Notification (FAN) events.
• Oracle Agent (oraagent): Supports Oracle-specific
requirements and complex resources. Runs server
callout scripts when FAN events occur. This process was
known as RACG in Oracle 11g R1.
• Oracle Root Agent (orarootagent): A specialized oraagent process
that helps crsd manage resources owned by root,
such as the network and the Grid virtual IP address.
High Availability Services Stack
• Grid Plug and Play (gpnpd): Provides access to the Grid
Plug and Play profile, and coordinates updates to the
profile among the nodes of the cluster to ensure that all
of the nodes have the most recent profile.
• Grid Interprocess Communication (GIPC): A helper
daemon for communications infrastructure. Currently
has no functionality; to be activated in a later release.
• Multicast Domain Name Service (mDNS): Allows DNS
requests. The mDNS process is a background process on
Linux and UNIX, and a service on Windows.
• Oracle Grid Naming Service (GNS): A gateway between
the cluster mDNS and external DNS servers. The gnsd
process performs name resolution within the cluster.
RAC Database
• Use DBCA (Database Configuration Assistant) to create the
database; you can also use it to configure listeners and
Enterprise Manager
• Before you create the database
– Oracle Cluster Ready Services must be installed and
configured. Shared disks must be in place.
– If you plan to use ASM, the ASM resources should be available.
– Oracle Database software must be installed
• DBCA automatically recognizes the cluster environment
and provides the options for configuring the RAC
environment.
• If the database is created manually, use srvctl to register
the database in the OCR
RAC Specific Background Processes
• LMON: Global Enqueue Service Monitor
– Maintains instance membership within Oracle RAC.
– Handles all non-Cache Fusion inter-instance resource
operations
– Detects instance transitions and
performs reconfiguration of GES and GCS resources.
• LMD: Global Enqueue Service Daemon
– Manages incoming enqueue request messages and
controls access to global enqueues.
– Also performs distributed deadlock detection
• The Global Cache Service and Global Enqueue Service
manage the Global Resource Directory (GRD)
RAC Specific Background Processes
• LMSx: Global Cache Service Processes, where x can be 0
to 10
– Manage resource requests and cross-instance
call operations
– Handle block transfers and other GCS-related messages
• LCKx: Lock processes
– Manage global enqueue requests
and cross-instance broadcasts
• DIAG: Diagnosability process
– Monitors the health of the instance and captures
data for process failures
Automatic Storage Management(ASM)
• ASM provides a portable, high-performance database file
system and simplifies database administration
• ASM spreads data across the disks and distributes the I/O load
across all available resources to optimize performance
• ASM provides integrated mirroring across disks
• Space can be added dynamically without shutting down the
database
• Using a separate ORACLE_HOME for the ASM install is advised
• You can configure ASM using DBCA
• A separate (ASM) instance starts in order to manage ASM
disks, resources and connectivity
• Both ASM and database instances have access to a common
set of disks called disk groups
ASM background processes
• ARBn : Performs the actual rebalance data extent movements in an
Automatic Storage Management instance. More than one process
can run at a time, named ARB0, ARB1, and so on.
• ASMB : Runs in a database instance that is using an ASM disk group
and communicates with the ASM instance in managing storage and
providing statistics.
• GMON: Maintains disk membership in ASM disk groups.
• MARK: This process marks ASM allocation units as stale following a
missed write to an offline disk. This essentially tracks which extents
require resync for offline disks.
• RBAL: This process runs in both database and ASM instances. In the
database instance, it does a global open of ASM disks and in an ASM
instance, it also coordinates rebalance activity for disk groups
New initialization parameters
Unique parameters in RAC Instance
• instance_name - Specifies the unique name of this instance
• instance_number - Specifies the unique number that maps to the
instance
• thread - Specifies the number of the redo thread used by the
instance
Non-Unique parameters in RAC Instance
• cluster_database - Specifies whether RAC is enabled or not
• cluster_database_instances - Equal to the number of instances in the
cluster
• cluster_interconnects - Specifies the additional interconnects
available for use
• active_instance_count - Specifies the number of instances that will
be active within a cluster
New parameters
• remote_listener - Specifies a network name that resolves to an
address or address list of Oracle Net remote listeners
• local_listener - Specifies a network name that resolves to an address
or address list of Oracle Net local listeners
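To show how unique and non-unique parameters sit side by side, the sketch below renders parameter-file lines using the `sid.parameter` prefix for per-instance values and `*.parameter` for values shared by all instances. The database name, the instance naming scheme (name plus number) and the undo tablespace names are assumptions made for illustration, and `undo_tablespace` is included only because each instance needs its own undo. The function is hypothetical, not an Oracle utility.

```python
def render_rac_parameters(db_name: str, instance_count: int) -> str:
    """Build illustrative parameter-file lines for a RAC database."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    # Non-unique parameters: one value shared by every instance.
    lines = [
        "*.cluster_database=TRUE",
        f"*.cluster_database_instances={instance_count}",
    ]
    # Unique parameters: one value per instance, prefixed with its SID.
    for n in range(1, instance_count + 1):
        sid = f"{db_name}{n}"  # assumed naming scheme, e.g. orcl1, orcl2
        lines.append(f"{sid}.instance_number={n}")
        lines.append(f"{sid}.thread={n}")
        lines.append(f"{sid}.undo_tablespace='UNDOTBS{n}'")  # assumed names
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_rac_parameters("orcl", 2))
```

Any generated lines would need to be reviewed against the actual environment before being placed in a real parameter file.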
Parameters in ASM Instance
• instance_type – This parameter must be set to ASM
• asm_diskgroups – Lists the names of the disk groups that will be
mounted by the ASM instance
• asm_diskstring – Limits the set of disks that ASM
considers for discovery
• asm_power_limit – Specifies the maximum power on an ASM
instance for disk rebalance operations
• asm_preferred_read_failure_groups – Specifies the failure groups
that contain preferred read disks
Transparent Application Failover(TAF)
• Transparent Application Failover (TAF) is a client-side
feature that allows clients to reconnect to surviving
nodes when an instance fails.
• The reconnect happens automatically from within the
OCI (Oracle Call Interface) library. Any uncommitted
transactions are rolled back, and server-side program
variables and session properties are lost.
• In some cases SELECT statements are automatically
re-executed on the new connection, with the cursor
positioned on the row on which it was positioned prior
to the failover.
• Failover is configured in the tnsnames.ora file; the TAF
settings are placed in the CONNECT_DATA section of
tnsnames.ora using the FAILOVER_MODE parameter
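A TAF-enabled tnsnames.ora entry could look like the text produced by the sketch below, which places FAILOVER_MODE inside CONNECT_DATA. The alias, service name, host names and the RETRIES and DELAY values are placeholders chosen for illustration. The helper function is hypothetical and only formats a string.

```python
def taf_tns_entry(alias, service, hosts, port=1521, failover_type="SELECT",
                  method="BASIC", retries=20, delay=5):
    """Format an illustrative tnsnames.ora entry with TAF settings."""
    if failover_type not in {"SESSION", "SELECT", "NONE"}:
        raise ValueError("TYPE must be SESSION, SELECT or NONE")
    if method not in {"BASIC", "PRECONNECT"}:
        raise ValueError("METHOD must be BASIC or PRECONNECT")
    addresses = "\n".join(
        f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))"
        for host in hosts
    )
    return (
        f"{alias} =\n"
        "  (DESCRIPTION =\n"
        "    (ADDRESS_LIST =\n"
        "      (LOAD_BALANCE = yes)\n"
        "      (FAILOVER = on)\n"
        f"{addresses})\n"
        "    (CONNECT_DATA =\n"
        f"      (SERVICE_NAME = {service})\n"
        f"      (FAILOVER_MODE = (TYPE = {failover_type})(METHOD = {method})"
        f"(RETRIES = {retries})(DELAY = {delay}))))"
    )


if __name__ == "__main__":
    # Host and service names below are placeholders, not real systems.
    print(taf_tns_entry("SALES", "sales", ["node1-vip", "node2-vip"]))
```

Generating the entry from a function keeps the parentheses balanced, which is the most common source of errors when such entries are edited by hand.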
TAF failover & load balance methods
• Failover modes
- TYPE: TAF supports three types of failover
SESSION: If a user's connection is lost, SESSION failover
automatically creates a new session for the user on the backup
node. This type of failover does not attempt to recover SELECTs
SELECT: If the connection is lost, Oracle Net establishes a
connection to another node and re-executes the SELECT
statements with the cursor positioned on the row on which it was
positioned prior to the failover
NONE: This setting is the default; no failover functionality is
provided. Use this setting to prevent failover.
- METHOD: This parameter determines how failover
occurs from the primary node to the backup node
BASIC: Use this mode to establish connections at failover time; no
work is done on the backup server until failover time
TAF failover & load balance methods
PRECONNECT: Use this mode to pre-establish connections to the backup node.
- RETRIES: Specifies the number of
times to attempt to connect after a failover
- DELAY: Specifies the amount of
time in seconds to wait between connect attempts
• LOAD_BALANCE: accepts YES, ON or TRUE to enable, and NO, OFF or FALSE to disable
• There are two methods of load balancing
- Client load balancing - Distributes new connections
among Oracle RAC nodes so that no one server is
overloaded with connection requests
- Server load balancing - Distributes processing
workload among RAC nodes by directing new user session
connection requests to the least loaded listener
• For failover information, query the GV$SESSION view columns
failover_type, failover_method, failed_over
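The difference between the two load balancing methods can be sketched in a few lines. In this toy model, client load balancing picks an address at random from the configured list, while server load balancing picks the node that currently reports the lowest load. The node names and load figures are made up, and real listeners use load information registered by the instances rather than a simple dictionary.

```python
import random


def client_side_pick(nodes, rng=random):
    """Client load balancing: choose any configured address at random."""
    return rng.choice(list(nodes))


def server_side_pick(load_by_node):
    """Server load balancing: choose the least loaded node.

    Ties are broken alphabetically so the result is deterministic.
    """
    return min(sorted(load_by_node), key=load_by_node.get)


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]  # placeholder node names
    print("client-side:", client_side_pick(nodes))
    print("server-side:", server_side_pick({"node1": 12, "node2": 3, "node3": 7}))
```

The client-side choice ignores load entirely, which is why the two methods are usually used together.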
OCR & Vote Backup and Recovery
• Oracle recommends that you back up the OCR and voting
disk after initial cluster creation
• OCR - There are two methods to back up the OCR
1. Automatically generated OCR backup files under
$CRS_HOME/cdata/crs
2. OCR export (logical backup)
# ocrconfig -export export_file_name
• Use ocrconfig to restore the OCR from a backup
# ocrconfig -restore $CRS_HOME/cdata/crs/day.ocr
or
# ocrconfig -import export_file_name
• Voting disk - Use the dd or ocopy command
Backup - $ dd if=vote_disk_name of=backup_file_name
Recovery - $ dd if=backup_file_name of=vote_disk_name
RAC Database Backup
• Back up the RAC database from any node in the cluster
• You can take full and incremental backups using RMAN
• You can back up to tape, disk and cloud (using the Media
Management Library from Oracle 11g)
• You can use the Flash Recovery Area (FRA) for your backups
• As a best practice, the backup device should be shared
between the nodes for easy recovery
• You can also perform backups of your RAC database using Enterprise
Manager
• Spread the backup load across multiple instances of RAC
RMAN> CONFIGURE DEVICE TYPE DISK PARALLELISM 2;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT 'sys/xxxx@inst1';
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE DISK CONNECT 'sys/xxx@Inst2';
RAC Database Recovery
• Automatic instance recovery occurs for instances that fail
due to hardware or software problems
– SMON determines the blocks needed for recovery and the Global
Resource Directory (GRD) is frozen
– GES remasters enqueues and GCS remasters the resources
– Buffer space for recovery is allocated, blocks not in recovery
remain accessible, and Oracle performs roll-forward recovery.
• As long as one instance survives, RAC performs instance
recovery for any other failed instances
• In case of media failure, recover the database from
any instance in the cluster. After recovery, manually start
the other instances in the cluster.
• If using the Flash Recovery Area, you can use the RMAN SWITCH command
to point the database at image copies kept there during media recovery.
Performance tuning
• ADDM (Automatic Database Diagnostic Monitor) is a
performance monitoring tool that proactively monitors
performance and also captures RAC-related issues
• The statistical data needed for diagnosis of a problem is
saved in the Automatic Workload Repository (AWR).
• Oracle Database 10g uses a scheduled job,
GATHER_STATS_JOB (which runs GATHER_DATABASE_STATS_JOB_PROC),
to collect optimizer statistics; AWR snapshots themselves are
captured automatically by the MMON background process
• ADDM analyzes the data in AWR on a regular basis to
find the root cause of performance problems and provides
recommendations to correct them.
• ADDM is enabled by default and is controlled by the
STATISTICS_LEVEL initialization parameter. This parameter
must be set to TYPICAL or ALL to enable ADDM; the
default setting is TYPICAL.
• For ADDM analysis you can run addmrpt.sql
Performance tuning
• For AWR reports you can run awrrpt.sql, or awrrpti.sql to
report on a specific instance in RAC
• ADDM provides the following benefits:
– Automatic performance diagnostic report every hour by default
– Problem diagnosis based on decades of tuning expertise
– Time-based quantification of problem impacts and
recommendation benefits
– Identification of root cause, not symptoms
– Recommendations for treating the root causes of problems
– Identification of non-problem areas of the system
– Minimal overhead to the system during the diagnostic process
• The v$cache_transfer and v$file_cache_transfer views
are used to examine RAC statistics
Performance tuning
• In RAC, the global services directory processes are the
most important tuning area. GSD communicates
through the cluster interconnect. If the cluster interconnects
do not perform properly, the entire RAC will suffer no
matter how well everything else is tuned.
• The Global Enqueue Service (GES) and Global Cache
Service (GCS) are the main processes
• Wait events can be divided into three categories
– 1. Time-based events
– 2. System-wide events
– 3. Session waits
• The major wait events in Oracle RAC are:
– gc cr request
– gc buffer busy
Performance tuning
• The most important wait events for RAC include
various categories, such as:
• Block-oriented
– gc current block 2-way
– gc current block 3-way
– gc cr block 2-way
– gc cr block 3-way
• Message-oriented
– gc current grant 2-way
– gc cr grant 2-way
• Contention-oriented
– gc current block busy
– gc cr block busy
– gc current buffer busy
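When reviewing wait data, it can help to roll individual events up into the three categories listed above. The sketch below encodes exactly the events from this slide in a lookup table and sums wait time per category. The sample figures in the usage example are invented, and the function names are hypothetical.

```python
RAC_WAIT_CATEGORIES = {
    "block-oriented": {
        "gc current block 2-way",
        "gc current block 3-way",
        "gc cr block 2-way",
        "gc cr block 3-way",
    },
    "message-oriented": {
        "gc current grant 2-way",
        "gc cr grant 2-way",
    },
    "contention-oriented": {
        "gc current block busy",
        "gc cr block busy",
        "gc current buffer busy",
    },
}


def categorize(event_name: str) -> str:
    """Map a wait event name to its category, or 'other' if not listed."""
    name = " ".join(event_name.lower().split())  # normalize case and spacing
    for category, events in RAC_WAIT_CATEGORIES.items():
        if name in events:
            return category
    return "other"


def total_wait_by_category(samples):
    """Sum wait seconds per category from (event_name, seconds) pairs."""
    totals = {}
    for event_name, seconds in samples:
        category = categorize(event_name)
        totals[category] = totals.get(category, 0.0) + seconds
    return totals


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    samples = [("gc cr block 2-way", 1.5), ("gc cr block busy", 2.0)]
    print(total_wait_by_category(samples))
```

A large contention-oriented share would point at hot blocks, while large block-oriented or message-oriented totals point more toward interconnect latency.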