2. Oracle Real Application Clusters (RAC) 12c Release 2 – For Continuous Availability
Copyright © 2016, Oracle and/or its affiliates. All rights reserved.
Markus Michalewicz
Senior Director of Product Management, Oracle RAC Development
Markus.Michalewicz@oracle.com
@OracleRACpm
http://www.linkedin.com/in/markusmichalewicz
http://www.slideshare.net/MarkusMichalewicz
3. Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
4. Oracle Maximum Availability Architecture (MAA)
Production Site:
• RAC – Scalability, Server HA
• ASM – ASM mirroring
• Flashback – Human error correction
• Application Continuity – Application HA
Across sites:
• Edition-based Redefinition, Online Redefinition, Data Guard, GoldenGate – Minimal downtime maintenance, upgrades, migrations
• Active Data Guard – Data protection, DR, query offload
• GoldenGate – Active-active replication, heterogeneous active replica
• RMAN, Oracle Secure Backup – Backup to disk, tape or cloud
• Enterprise Manager Cloud Control – Coordinated site failover
• Global Data Services – Service failover / load balancing
6. Program Agenda
1. High Availability Improvements
2. Continuous Availability Features
8. RAC High Availability Improvements
• Reduced failure detection time for an increased number of monitored components
• Reduced time to recover from local failures due to reduced reconfiguration times
• Prevention of system or database failures using ML-based real-time analysis of diagnostic data
10. More Components Checked More Frequently
Oracle Clusterware checks…
• …more components: multiple public networks checked with Ping Targets
• …more frequently: VIPs checked every second; 30-second CSS misscount default, and zero brownout allows for less
• …more efficiently: agent changes allow for more checks using fewer resources; data from auxiliary systems is taken into account; Engineered System-optimized failure detection and fencing
• …and offline: offline monitoring of failed components for faster recovery
…to detect failures sooner and to recover faster.
12. Smart Fencing
13. Node Eviction Basics
• Pre-12.2, node eviction follows a rather "ignorant" pattern
– Example in a 2-node cluster: the node with the lowest node number survives.
• Customers must not base their application logic on which node survives the split brain.
– As this may(!) change in future releases.
http://www.slideshare.net/MarkusMichalewicz/oracle-clusterware-node-management-and-voting-disks
14. Node Weighting in Oracle RAC 12c Release 2
Idea: everything else being equal, let the majority of work survive.
• Node Weighting is a new feature that considers the workload hosted in the cluster during fencing
• The idea is to let the majority of work survive, if everything else is equal
– Example: in a 2-node cluster, the node hosting the majority of services (at fencing time) is meant to survive
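The tie-break the slide describes can be pictured with a small sketch. This is purely illustrative: the function, its input shape, and the lowest-node-number fallback are assumptions drawn from the surrounding slides, not Oracle's implementation.

```python
# Illustrative model of Node Weighting: when a split brain partitions the
# cluster, prefer the sub-cluster hosting the most services; if that is a
# tie, fall back to the pre-12.2 rule (lowest node number survives).

def choose_surviving_subcluster(subclusters):
    """subclusters: list of dicts like {"nodes": [node_numbers], "services": count}."""
    most = max(s["services"] for s in subclusters)
    candidates = [s for s in subclusters if s["services"] == most]
    if len(candidates) == 1:
        return candidates[0]
    # Fallback scheme: the sub-cluster containing the lowest node number survives.
    return min(candidates, key=lambda s: min(s["nodes"]))

split = [
    {"nodes": [1], "services": 2},
    {"nodes": [2], "services": 5},  # hosts the majority of services
]
print(choose_surviving_subcluster(split)["nodes"])  # [2]
```

With equal service counts the sketch degrades to the old behavior, mirroring the "fallback scheme" mentioned on the next slide.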
15. Let's Define "Equal"
• A three-node cluster will benefit from Node Weighting only if three equally sized sub-clusters are built as a result of the failure, since two differently sized sub-clusters are not equal.
• Secondary failure consideration (e.g. a public network card failure creating a "conflict") can influence which node survives. Secondary failure consideration will be enhanced successively.
• A fallback scheme is applied if considerations do not lead to an actionable outcome.
16. CSS_CRITICAL – Fencing with Manual Override
• CSS_CRITICAL can be set on various levels / components to mark them as "critical" so that the cluster will try to preserve them in case of a failure.
• CSS_CRITICAL will be honored if no other technical reason prohibits survival of the node which has at least one critical component at the time of failure.
• A fallback scheme is applied if CSS_CRITICAL settings do not lead to an actionable outcome.

crsctl set server css_critical {YES|NO}   (+ server restart)

srvctl modify database -help | grep critical
…
-css_critical {YES | NO}   Define whether the database or service is CSS critical

(Diagram: node eviction despite the workload; the workload will fail over – a "conflict" case.)
17. Recovery Buddies
18. Near Zero Reconfiguration Time with Recovery Buddies (a.k.a. Buddy Instances)
Recovery Buddies:
• Track block changes on the buddy instance
• Quickly identify blocks requiring recovery during reconfiguration
• Allow rapid processing of transactions after failures
19. Near Zero Reconfiguration Time with Recovery Buddies – How it works under the hood
• Buddy instance mapping is simple (random) – e.g. I1 → I2, I2 → I3, I3 → I4, I4 → I1
• Recovery buddies are assigned during startup
• RMS0 on each recovery buddy instance maintains an in-memory area for redo log changes
• The in-memory area is used during recovery – this eliminates the need to physically read the redo
(Diagram: cluster "MyCluster" with instances I1–I4; I1's recovery buddy runs on I2, I2's on I3, I3's on I4, and I4's on I1.)
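The round-robin mapping quoted on the slide (I1 → I2, I2 → I3, …) can be sketched in a few lines. The function name and structure are illustrative only; the slide notes the real assignment is effectively random.

```python
# Illustrative round-robin buddy mapping: each instance's redo changes are
# shadowed by the next instance in the list, wrapping at the end.

def assign_recovery_buddies(instances):
    n = len(instances)
    return {inst: instances[(i + 1) % n] for i, inst in enumerate(instances)}

buddies = assign_recovery_buddies(["I1", "I2", "I3", "I4"])
print(buddies)  # {'I1': 'I2', 'I2': 'I3', 'I3': 'I4', 'I4': 'I1'}
```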
20. How Recovery Buddies Help Reduce Recovery Time
Without Recovery Buddies: Detect → Evict → Elect Recovery → Read Redo → Apply Recovery
With Recovery Buddies: the same phases run, but reading redo and applying recovery are sharply shortened because the required redo changes are already tracked in memory on the buddy instance.
Up to 4x faster.
21. Database Hang Manager
22. Overlooked and Underestimated – Hang Manager
Why having a Hang Manager is useful:
• Customers experience database hangs for a variety of reasons
– High system load, workload contention, network congestion, general errors, etc.
• Before Hang Manager was introduced with Oracle RAC 11.2.0.2:
– Oracle required quite some information to troubleshoot a hang, e.g.:
• System state dumps
• For RAC: global system state dumps
– Customers usually had to reproduce "the" hang with additional events to analyze it
23. Introduction to Hang Manager – How it works
• Always on, as enabled by default
• Reliably detects database hangs
• Autonomically resolves hangs
• Considers QoS policies for hang resolution
• Logs all detected hangs and their resolutions
(Flow: Session → DIAG0: DETECT → ANALYZE (hung?) → EVALUATE → VERIFY → Victim, taking the QoS policy into account.)
24. Hang Manager Optimizations with Oracle RAC 12c – Tuning under the hood
• Hang Manager auto-tunes itself by periodically collecting instance- and cluster-wide hang statistics
• Metrics like cluster health and instance health are tracked over a moving average
• This moving average is considered during resolution
• Holders waiting on SQL*Net break/reset are fast-tracked
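Tracking a health metric "over a moving average", as described above, can be sketched minimally. The window size, metric, and class are illustrative assumptions, not Hang Manager internals.

```python
# Minimal moving-average tracker: only the most recent `window` samples
# contribute, so the value adapts as conditions change.
from collections import deque

class MovingAverage:
    def __init__(self, window=12):
        self.samples = deque(maxlen=window)  # old samples fall off automatically

    def add(self, value):
        self.samples.append(value)
        return self.value

    @property
    def value(self):
        return sum(self.samples) / len(self.samples)

health = MovingAverage(window=3)
for sample in (90, 80, 70, 60):
    health.add(sample)
print(health.value)  # 70.0 – only the last 3 samples (80, 70, 60) count
```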
25. DBMS_HANG_MANAGER.Sensitivity – A new SQL interface to set Hang Manager sensitivity
• Early warnings are exposed via a V$ view
• Sensitivity can be set higher if the default level is too conservative
• Hang Manager considers QoS policies and data during the validation process

Hang Sensitivity Level | Description | Note
NORMAL | Hang Manager uses its default internal operating parameters to try to meet typical requirements for any environment. | Default
HIGH   | Hang Manager is more alert to sessions waiting in a chain than at the NORMAL level. |
27. Oracle Autonomous Health Framework (AHF) – Working for You Continuously
• Integrates next-generation tools running as components, 24/7
• Discovers potential issues and notifies, or takes corrective actions
• Speeds up issue diagnosis and recovery
• Preserves database and server availability and performance
• Autonomously monitors and manages resources to maintain SLAs
28. AHF – Availability by Platform

Component                           | Linux x86-64       | zLinux         | Solaris (SPARC)  | HP-UX Itanium   | IBM AIX          | Windows x86-64
Cluster Verification Utility (CVU)  | ✔                  | ✔ (March 2015) | ✔                | ✔ (August 2015) | ✔ (August 2015)  | ✔ (August 2015)
ORAchk                              | ✔                  | ✔              | ✔                | ✔               | ✔                | ✔
Cluster Health Monitor (CHM)        | ✔                  | ✗ not planned  | ✔                | ✗ not planned   | ✔                | ✔
Cluster Health Advisor (CHA)        | ✔ (since 12.2.0.1) | ✗ not planned  | ✗ future release | ✗ not planned   | ✗ future release | ✗ not planned
Trace File Analyzer (TFA)           | ✔                  | ✔              | ✔                | ✔ (no TFA web)  | ✔                | ✔ (no TFA web)
Hang Manager                        | ✔                  | ✔              | ✔                | ✔               | ✔                | ✔
Memory Guard                        | ✔                  | ✗ not planned  | ✔                | ✗ not planned   | ✔                | ✔
Quality of Service Management (QoS) | ✔                  | ✗ not planned  | ✔                | ✗ not planned   | ✔                | ✔
29. Cluster Health Monitor (CHM) – Generates a Diagnostic Metrics View of Cluster and Databases
• Always on – enabled by default
• Provides detailed OS resource metrics
• Assists node eviction analysis
• Locally logs all process data
• Users can define pinned processes
• Listens to CSS and GIPC events
• Categorizes processes by type
• Supports plug-in collectors (e.g. traceroute, netstat, ping, etc.)
• New CSV output for ease of analysis
(Architecture: osysmond on each node sends OS data to ologgerd (master), which stores it in the 12c Grid Infrastructure Management Repository (GIMR).)
30. Introducing Oracle 12c Cluster Health Advisor (CHA) – Proactive Health Prognostics System
• Real-time monitoring of Oracle RAC database systems and their hosts
• Early detection of impending as well as ongoing system faults
• Diagnoses and identifies the most likely root causes
• Provides corrective actions for targeted triage
• Generates alerts and notifications for rapid recovery
Full presentation: http://www.oracle.com/technetwork/database/options/clustering/ahf/learnmore/oracle-12cr2-cha-3623186.pdf
Recorded web seminar: https://www.youtube.com/watch?v=TbdkGsmSgcQ
31. Cluster Health Advisor (CHA) Architecture Overview
• cha – cluster node resource
• Single Java ochad daemon per node
• Reads Cluster Health Monitor data directly from memory
• Reads DB ASH data from SMR without a DB connection
• Uses OS and DB models and data to perform prognostics
• Stores analysis and evidence in the GI Management Repository (GIMR)
• Sends alerts to EMCC Incident Manager per target
(Architecture: ochad feeds OS data (via CHM) and DB data into the Node Health and Database Health Prognostics Engines, which apply the OS and DB models; results go to the GIMR, alerts to EMCC.)
32. Cluster Health Advisor – Scope of Problem Detection
Best-effort immediate guided diagnosis:
• Over 30 node and database problems have been modeled
• Over 150 OS and DB metric predictors identified
• Problem detection in 12.2.0.1 includes:
– Interconnect, Global Cache and cluster problems
– Host CPU and memory, PGA memory stress
– IO and storage performance issues
– Reconfiguration and recovery issues
– Workload and session abnormal variations
33. Cluster Health Advisor – Data Sources and Data Points
• A CHA Data Point contains 150 signals (statistics and events) from multiple sources: OS, ASM, network, and DB (ASH, AWR session, system and PDB statistics).
• Statistics are collected at a 1-second internal sampling rate, synchronized, smoothed and aggregated to a Data Point every 5 seconds.

Example Data Point (excerpt):
Time 15:16:00 | CPU 0.90 | ASM IOPS 4100 | Network % util 13% | Network packets dropped 0 | Log file sync 2 ms | Log file parallel write 600 us | GC CR request 0 | GC current request 0 | GC current block 2-way 300 us | GC current block busy 1.5 ms | Enq: CF - contention 0 | …
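The sampling pipeline above (per-second samples aggregated into one Data Point every 5 seconds) can be sketched simply. Using a plain mean as the "smoothing" step is an illustrative assumption; CHA's actual synchronization and smoothing are not documented here.

```python
# Illustrative aggregation: collapse 1-second samples into per-5-second
# Data Point values by averaging each complete window.

def to_data_points(samples, period=5):
    """Aggregate a list of 1-second samples into per-period means;
    a trailing incomplete window is dropped."""
    points = []
    for i in range(0, len(samples) - period + 1, period):
        window = samples[i:i + period]
        points.append(sum(window) / period)
    return points

cpu_per_second = [0.88, 0.92, 0.90, 0.89, 0.91, 0.40, 0.42, 0.41, 0.40, 0.42]
print([round(p, 2) for p in to_data_points(cpu_per_second)])  # [0.9, 0.41]
```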
34. Models Capture the Dynamic Behavior of all Normal Operation
• The release ships with conservative models to minimize false warnings.
• A model captures the normal load phases and their statistics over time, and thus the characteristics for all load intensities and profiles. During monitoring, any data point similar to one of the vectors is NORMAL.
• One could say that the model REMEMBERS the normal operational dynamics over time.

In-Memory Reference Matrix (part of the "normality" model):
IOPS                    | #### | 2500  | 4900  | 800   | ####
User Commits            | #### | 10000 | 21000 | 4400  | ####
Log File Parallel Write | #### | 2350  | 4100  | 22050 | ####
Log File Sync           | #### | 5100  | 9025  | 4024  | ####
…

(Chart: IOPS, user commits (/sec), log file parallel write (usec) and log file sync (usec) plotted over a day, 10:00 through 6:00, showing the same vectors as distinct load phases.)
35. CHA Model: Find Similarity with Normal Values
CHA estimator/predictor: "Based on my normality model, the value of IOPS should be in the vicinity of ~4900, but it is reported as 10500; this is causing a residual of ~5600 in magnitude."
CHA fault detector: "Such a high magnitude of residuals should be tracked carefully! I'll keep an eye on the incoming sequence of this signal (IOPS) and if it remains deviant I'll generate a fault on it."

In-Memory Reference Matrix (part of the "normality" model):
IOPS                    | #### | 2500  | 4900  | 800   | ####
User Commits            | #### | 10000 | 21000 | 4400  | ####
Log File Parallel Write | #### | 2350  | 4100  | 22050 | ####
Log File Sync           | #### | 5100  | 9025  | 4024  | ####

Observed values (part of a Data Point): IOPS 10500 | User Commits 20000 | Log File Parallel Write 4050 | Log File Sync 10250
Residual values (Observed − Predicted): 5600 | −1000 | −50 | 325
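The estimator/fault-detector dialogue above can be made concrete with a toy sketch: predict each signal from the nearest reference vector in the normality model, compute residuals, and only flag a fault if a signal stays deviant. The nearest-vector rule, threshold, and persistence count are illustrative assumptions, not CHA's actual statistics.

```python
# Toy normality model: two signals per reference vector, taken from the
# reference matrix on the slide.
REFERENCE_VECTORS = [
    {"iops": 2500, "user_commits": 10000},
    {"iops": 4900, "user_commits": 21000},
    {"iops": 800,  "user_commits": 4400},
]

def residuals(observed):
    """Residual = observed - predicted, where the prediction is the
    nearest reference vector (squared-distance match)."""
    nearest = min(REFERENCE_VECTORS,
                  key=lambda v: sum((observed[k] - v[k]) ** 2 for k in v))
    return {k: observed[k] - nearest[k] for k in nearest}

def fault_detector(residual_series, threshold, persistence=3):
    """Generate a fault only if the residual stays above the threshold
    for `persistence` consecutive data points."""
    deviant = 0
    for r in residual_series:
        deviant = deviant + 1 if abs(r) > threshold else 0
        if deviant >= persistence:
            return True
    return False

r = residuals({"iops": 10500, "user_commits": 20000})
print(r)  # {'iops': 5600, 'user_commits': -1000} – matches the slide's example
print(fault_detector([5600, 5400, 5900], threshold=2000))  # True
```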
36. Cluster Health Advisor (CHA) Operation Overview
• SRVCTL lifecycle daemon management
• Enabled by default – activates when the 1st RAC instance starts
• New CHACTL command line tool for all local operations
• Java GUI tool available on OTN soon
• Integrated into EMCC Incident Manager and notifications
• Monitoring has no impact on DB performance or availability
(Architecture: the CHACTL client and CHA Java GUI client operate local to the cluster against CHADDriver; SRVCTL manages the daemon; OS data (via CHM) and DB data feed the Node Health and Database Health Prognostics Engines with their OS and DB models; results are stored in the GIMR and surfaced in EM Cloud Control.)
37. CHA Command Line Operations – Checking for Health Issues and Corrective Actions with CHACTL QUERY DIAGNOSIS
$ chactl query diagnosis -db oltpacdb -start 2016-10-28 01:52:50 -end 2016-10-28 03:19:15
2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2016-10-28 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2016-10-28 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]
Problem: DB Control File IO Performance
Description: CHA has detected that reads or writes to the control files are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that reads or writes to the control files were
slow because of an increase in disk IO.
The slow control file reads and writes may have an impact on checkpoint and Log Writer (LGWR) performance.
Action: Separate the control files from other database files and move them to faster disks or Solid
State Devices.
Problem: DB Log File Switch
Description: CHA detected that database sessions are waiting longer than expected
for log switch completions.
Cause: The Cluster Health Advisor (CHA) detected high contention during log switches
because the redo log files were small and the redo logs switched frequently.
Action: Increase the size of the redo logs.
38. Cluster Health Advisor – Command Line Operations
HTML diagnostic health output is available (-html file_name).
39. Using EMCC for Alerts and Corrective Actions
40. Using the CHA GUI to Perform Root-Cause Analysis – Overview
• Standalone Java GUI client
• Must be run on a local cluster node
• Can be run against the live GIMR or an MDB (dump) file:
chactl export repository -format mdb -start '2017-05-01 00:00:00' -end '2017-05-10 00:00:00'
• Used internally for development
• Will be available and maintained on Oracle Technology Network soon.
41. Calibrating CHA to your RAC Deployment – Overview
• Calibration goal: increase sensitivity and accuracy with sufficient warning
• The release ships with conservative models to minimize false warnings
– DEFAULT_CLUSTER for each cluster node
– DEFAULT_DB for each database instance
• Use your own data for periods of "normal operations" to increase sensitivity
– Recommended minimum: a 6-hour period
– Should include all normal workload phases for that model
• Models may be changed dynamically online using CHACTL
42. Calibrating CHA to your RAC deployment – Choosing a Data Set for Calibration: Defining "normal"
$ chactl query calibration -cluster -timeranges 'start=2016-10-28 07:00:00,end=2016-10-28 13:00:00'
Cluster name : mycluster
Start time : 2016-10-28 07:00:00
End time : 2016-10-28 13:00:00
Total Samples : 11524
Percentage of filtered data : 100%
1) Disk read (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
0.11 0.00 2.62 0.00 114.66
25 50 75 100 =100
99.87% 0.08% 0.00% 0.02% 0.03%
2) Disk write (ASM) (Mbyte/sec)
MEAN MEDIAN STDDEV MIN MAX
0.01 0.00 0.15 0.00 6.77
50 100 150 200 =200
100.00% 0.00% 0.00% 0.00% 0.00%
3) Disk throughput (ASM) (IO/sec)
MEAN MEDIAN STDDEV MIN MAX
2.20 0.00 31.17 0.00 1100.00
5000 10000 15000 20000 =20000
100.00% 0.00% 0.00% 0.00% 0.00%
4) CPU utilization (total) (%)
MEAN MEDIAN STDDEV MIN MAX
9.62 9.30 7.95 1.80 77.90
20 40 60 80 =80
92.67% 6.17% 1.11% 0.05% 0.00%
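The per-signal summary that the calibration query prints (mean/median/stddev/min/max plus the share of samples per bucket) can be sketched as follows. The bucket edges follow the CPU row above; the implementation itself is illustrative, not what chactl runs.

```python
# Illustrative calibration summary: descriptive statistics plus the
# percentage of samples falling into each bucket defined by `edges`.
import statistics

def summarize(samples, edges=(20, 40, 60, 80)):
    buckets = [0] * (len(edges) + 1)
    for s in samples:
        i = sum(s >= e for e in edges)   # index of the bucket s falls in
        buckets[i] += 1
    share = [round(100 * b / len(samples), 2) for b in buckets]
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stddev": statistics.pstdev(samples),
        "min": min(samples),
        "max": max(samples),
        "bucket_pct": share,   # <20, <40, <60, <80, >=80
    }

cpu = [5, 10, 15, 25, 45, 65, 85]
print(summarize(cpu)["bucket_pct"])  # [42.86, 14.29, 14.29, 14.29, 14.29]
```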
43. Calibrating CHA to your RAC deployment – Creating a new CHA Model with CHACTL
• Create and store the new model:
$ chactl calibrate cluster -model daytime -timeranges 'start=2016-10-28 07:00:00,end=2016-10-28 13:00:00'
• Begin using the new model:
$ chactl monitor cluster -model daytime
• Confirm the new model is being used:
$ chactl status -verbose
monitoring nodes svr01, svr02 using model daytime
monitoring database qoltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
45. Availability for applications – Application Continuity
• Continuous Availability
• Availability during Planned Maintenance
47. Oracle Real Application Clusters 12c Release 2 – Continuous Service Availability
Real Application Service Levels – "Always Running":
• Scales PDBs and Services
• 2-second detection on EXA
• Recovery in low seconds
• Drains work gradually
• Recovers in-flight work with AC
48. Oracle Active Data Guard 12c Release 2 – Continuous Service Availability
• Recover in-flight work with Application Continuity
• ADG sessions survive a standby role change
• Drain, then switch over; AC recovers stragglers
(Diagram: Data Guard Observer drives failover between the RAC Primary (Site A) and RAC Standby (Site B); "Switchover to db_resource_name [wait]".)
49. Application Continuity – In-flight work continues
• Replays in-flight work on recoverable errors
• Masks hardware, software, network, storage errors and timeouts
• 12.1: JDBC-Thin, UCP, WebLogic Server, 3rd-party Java application servers
• 12.2: OCI, ODP.NET unmanaged, JDBC Thin on XA, Tuxedo, SQL*Plus
• RAC, RAC One, Active Data Guard
50. Under the Covers
1 – Normal Operation
• The client marks database requests
• The server decides which calls can and cannot be replayed
• As directed, the client holds the original calls, their inputs, and validation data
2 – Outage, Phase 1: Reconnect
• Checks that replay is enabled
• Verifies timeliness
• Creates a new connection
• Checks the target database is valid for replay
• Uses Transaction Guard to guarantee the last outcome
3 – Outage, Phase 2: Replay
• Replays the captured calls
• Ensures results returned to the application match the original
• On success, returns control to the application
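The three phases above can be sketched as a highly simplified capture/reconnect/replay loop. All names here are illustrative assumptions; the real protocol lives in the Oracle drivers and Transaction Guard, not in application code.

```python
# Toy model of the AC phases: capture calls during normal operation,
# reconnect on a recoverable error, replay and validate the results.

class ReplayableSession:
    def __init__(self, connect):
        self.connect = connect          # factory for new connections
        self.conn = connect()
        self.captured = []              # calls + validation data held for replay

    def execute(self, call):
        # Phase "normal operation": run the call and remember it.
        result = self.conn.run(call)
        self.captured.append((call, result))
        return result

    def recover(self):
        # Phase 1: reconnect to a surviving instance.
        self.conn = self.connect()
        # Phase 2: replay; results must match the originals, otherwise
        # the outage is surfaced to the application.
        for call, original in self.captured:
            if self.conn.run(call) != original:
                raise RuntimeError("replay diverged; surface the outage")

class _Conn:
    """Stand-in connection whose results are deterministic."""
    def run(self, call):
        return call.upper()

sess = ReplayableSession(lambda: _Conn())
print(sess.execute("select sysdate from dual"))  # SELECT SYSDATE FROM DUAL
sess.recover()  # reconnects and replays the captured call successfully
```

The divergence check mirrors the slide's "ensures results returned to the app match the original"; a non-deterministic call (the mutable-function problem addressed later in the deck) would make replay diverge.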
51. Steps to use Application Continuity

Check | What to do
Identify Requests | Return connections to the pool – UCP, WebLogic Active GridLink, 3rd-party containers using UCP, OCI Session Pool, ODP.NET Unmanaged, Tuxedo
JDBC Deprecated Classes | Replace non-standard classes (MOS 1364193.1); use the AC checks in orachk to find them
Side Effects | Use disable or another connection if a request should not be replayed
Callbacks | With UCP and WLS labels, do nothing; in 12.2, set FAILOVER_RESTORE=LEVEL1; otherwise register a callback for applications that change state outside requests
Mutable Functions | Grant keeping mutable values, e.g. sequence.nextval
52. Run the AC Assessments (available in ORAchk)
• How effective is Application Continuity for your application?
• Where Application Continuity is not in effect, what steps need to be taken?
No steps required:
1. Analyze and report coverage
2. Report usage of deprecated Java classes
(Assessment tool flow: application traces → orachk → output read by the user.)
https://blogs.oracle.com/WebLogicServer/entry/using_orachk_for_coverage_analysis
53. Grant Mutables – Keep original function results at replay
For owned sequences:
ALTER SEQUENCE.. [sequence] [KEEP|NOKEEP];
CREATE SEQUENCE.. [sequence] [KEEP|NOKEEP];
Grant and revoke for other users:
GRANT [KEEP DATE TIME | KEEP SYSGUID] [to USER];
REVOKE [KEEP DATE TIME | KEEP SYSGUID] [from USER];
GRANT KEEP SEQUENCE on [sequence] [to USER];
REVOKE KEEP SEQUENCE on [sequence] [from USER];
54. Don't Want to Replay – Disable replay for requests that should not be replayed
• Decide if any requests should not be replayed, e.g. autonomous transactions, UTL_HTTP, UTL_URL, UTL_FILE, UTL_FILE_TRANSFER, UTL_SMTP, UTL_TCP, UTL_MAIL, DBMS_JAVA callouts, EXTPROC
• Use another connection or the disable API
55. Configuration
Set service attributes:
• FAILOVER_TYPE = TRANSACTION for Application Continuity
• FAILOVER_RESTORE = LEVEL1 for common states restored at failover
• AQ_HA_NOTIFICATIONS = TRUE for FAN with the OCI driver, ODP.NET, Tuxedo, SQL*Plus
For Java: use a replay data source (local or XA)
replay datasource = oracle.jdbc.replay.OracleDataSourceImpl
For OCI, ODP.NET, Tuxedo, SQL*Plus: on when enabled on the service
56. Killing Sessions – Extended DBA Commands

Command | Replays
alter system kill session … noreplay | BEST METHOD
dbms_service.disconnect_session([service], dbms_service.noreplay) | BEST METHOD
srvctl stop service -db orcl -instance orcl2 -force | YES
srvctl stop service -db orcl -node rws3 -force | YES
srvctl stop service -db orcl -instance orcl2 -noreplay -force |
srvctl stop service -db orcl -node rws3 -noreplay -force |
alter system kill session … immediate | YES
58. What is the best way to apply maintenance?
1 – Update in Place:
• Complex build process repeated for each node
• Error prone
• Longest downtime and maintenance window
• Have to create a backup (no built-in fallback plan)
• How do you enforce standardization?
2 – Clone, Update and Switch:
• Complex build process repeated for each node
• Error prone
• Shorter downtime and maintenance window
• Built-in fallback
• How do you enforce standardization?
3 – Deploy Gold Image, Switch:
• Build the gold image once, use it everywhere
• Fewest steps, simplest process
• Shortest downtime and maintenance window
• Built-in fallback
• Built-in standardization
59. What is the best approach to handling software drift?
Scan:
• Drift not seen until a scan takes place
• Scanning unchanged targets is unnecessary work
• Does not prevent drift
Trigger Alert:
• No time lag between drift and alert
• No extra work
• Does not prevent drift
Prevent:
• Locked configs cannot drift
• Can trigger an alert if unauthorized changes are attempted
• Can trigger an alert if authorized changes are made
60. Streamline the Distribution Process
• Ship only once – to a customer, to a site, to a pool
• Ship to interested parties only – subscribers
• Ship only what is necessary – updated modules, updated files, updated blocks
• Deploy non-disruptively – ship any time, choose when to use it
61. Rapid Home Provisioning and Maintenance
• Simple
• Prevents errors, enables easy corrections
• Uses Gold Images for all scenarios
• Enables mass operations on 1000s of nodes
62. Build an Inventory of Gold Images – Create once on the RHP Server
• Uptake the current estate by promoting existing homes to gold images
• Create new homes and promote them to gold images after validation
• Assign states to images for lifecycle management
• Oracle internal users: import images from GIaaS
(Diagram: installed homes – DB 11.2.0.4.1, DB 12.1.0.2 Custom, Grid 11.2.0.4.3, WLS 12.2.1 – promoted into the RHP Server image inventory.)
63. Supported targets and environments – Manage existing and create new Pools, Homes, and Databases
• Patch and upgrade existing deployments
– No prerequisites (config, agent, daemon…) for targets
– Database and Grid Infrastructure 11.2.0.3, 11.2.0.4, 12.1.0.2, 12.2.0.1
• Provision, scale, patch and upgrade new clusters and databases
– 11.2.0.4, 12.1.0.2, 12.2.0.1
• Bare metal, VMs, CDBs, non-CDBs
• SI (standalone, Restart, Grid Infrastructure), RAC One, RAC
• Linux, Solaris, AIX
• Generic software homes
64. Easy to create the Server, start managing the current estate
• The RHP Server is fully self-contained
– Commodity hardware or engineered systems; can be clustered for HA
– Enabled with a single srvctl command
– Lightweight – can co-exist with other functions
• No new software needed on targets
• No run-time dependency between Server and targets