This is a presentation I created for RMOUG 2014 which I was sadly unable to attend. However, I wanted to share it with the Oracle community so that you can learn a bit about metrics that are frequently cited, frequently demonized, and frequently misused. In this deck we will go through the steps to diagnose issues and what NOT to blame as you go through the process.
The topics and concepts discussed here were originally formed in a blog post on the OracleAlchemist.com site: http://www.oraclealchemist.com/news/these-arent-the-metrics-youre-looking-for/
2. Steve Karam
Technical Manager at Delphix
Oracle Certified Master, ACE, and other
acronyms
Just a little social
Blog: http://www.oraclealchemist.com
Twitter: @OracleAlchemist
Google Plus: +SteveKaram
Facebook: OracleAlchemist
3. Hunting for Metrics
Oracle has more metrics than you can
shake a stick at
Automatic Workload Repository (AWR)
Active Session History (ASH)
STATSPACK
V$ and X$ views
4. These Aren't the Metrics You're
Looking For
The problem is not a
lack of data, it's
knowing how—and
when—to use it.
5. The Database is…
Broken
Slow
Down
Not working
Giving me errors
Step 1: What is the actual problem?
6. I'm going to…
Gather stats
Add an index to something
Bounce the database
Blame the SysAdmins
Blame the code
Kill the backup
Step 2: Don't be hasty. Suppress knee-jerk
reactions. They have no place in problem
analysis.
7. I think I know what to do!
That's great! Thinking is good. But if you only
think you found a solution, chances are good
that there's more to it.
Step 3: Don't immediately think in terms of
fixes. Think in terms of findings and
recommendations.
8. Gathering Stats
Just like the optimizer needs to gather stats for proper query
analysis, you need to gather stats for problem analysis.
Think of it like a popular TV medical drama:
Your database is the patient. Its job is to be sick.
Your end users are the concerned family and friends.
It's their job to be panicky.
You are the doctor and team. It's your job to be brilliant.
Step 4. Be brilliant.
9. Problem Analysis
Okay, so "Be Brilliant" isn't a good step. At
this point, what you really need to do is
choose a path to solving the issue at hand.
There are a few methods for doing this:
Top down: Review events and waits at a
global level and drill down from there.
Scientific Method: Do background
research, form a hypothesis, test your
hypothesis, analyze the outcome.
Differential Diagnosis: narrow down the
list of possible causes using the
process of elimination.
10. The Top Down Approach
Top down tuning is a viable
method, and is almost always
preferable to bottom up tuning.
This is very useful when you
know the issue is global and
you need to drill down into a
root cause. It's good when
things suddenly go wrong;
however, it can be difficult when
there are multiple root causes.
11. Scientific Method
This method is highly
effective at ensuring
factual resolutions to
problems. While it may
not always be suitable
for quickly resolving a
critical issue, it‘s always
suitable for case studies
and post-fix root cause
analysis.
12. Differential Diagnosis (DDX)
This method is great for global issues where the
root cause is unknown and no significant change
has occurred.
Gather information
List symptoms
List possible conditions
based on the symptoms
Test Test Test
Eliminate conditions
Don't kill the patient
13. Speaking of House
In the show "House", the main character
has a saying: Everybody lies.
DBA: So everyone, what
changed?
Developer: Nothing.
SysAdmin: Nothing.
Network Admin: Nothing.
Project Manager: Nothing.
14. You never told us the real Step
4
Step 4 is simple.
Solve the problem.
15. Well sure, but how?
The methods we've discussed are all well
and good for looking into problems and
figuring out the cause and a solution.
For the most part, it will be up to you to:
Gather the right metrics
Synthesize your data
Create findings and recommendations
Test for success
16. What are the right metrics?
There are tons of papers and articles out
there on wait events, metrics, and other
metadata you should look for. We're not
here for that.
There are guides on how to use the
metrics you find. We're not here for that
either.
No, we're here to discuss…
18. #5: db file scattered read
What it is:
An indication of a multiblock I/O
What it is not:
A full table scan
A reason to panic
The culprit (not always, anyway)
19. #5: db file scattered read
The 'db file scattered read' event happens
when Oracle performs a multiblock I/O;
for instance, when a full table scan occurs.
Index full scans and fast full scans also
result in multiblock I/O. But those don‘t
sound so horrible, now do they?
Why is that?
20. #5: db file scattered read
Over the years, DBAs and developers have
cultivated a mortal terror of full table scans. Of
course, they can be a problem, but are they
always the problem? Of course not.
Some facts about db file scattered reads:
They are an incredibly efficient way to use
the disk to gather large amounts of unordered
data
They aren't the only indication of full scans or
multiblock I/O; the 'direct path read' and 'db file
parallel read' events also indicate multiblock reads.
21. #5: db file scattered read
Before you go off on a witch hunt
because of a 'db file scattered read'
event, consider the following:
Are there any indications that full
scans are actually the problem?
Are you sure that an index read
would be more efficient in this
case?
Do your other symptoms match
up with the conclusion that a
query performing a full table scan
is your culprit?
Full table scans
are the devil!
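One quick sanity check before the witch hunt: see how much time the multiblock-read events actually account for relative to their single-block cousin. This is a sketch against the documented V$SYSTEM_EVENT view; which events matter in your workload is still your call.

```sql
-- Sketch: how much time do the multiblock-read events actually account for?
-- V$SYSTEM_EVENT is cumulative since instance startup; AWR deltas are
-- better for analyzing a specific time window.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1000000) AS seconds_waited
  FROM v$system_event
 WHERE event IN ('db file scattered read',
                 'db file sequential read',
                 'direct path read',
                 'db file parallel read')
 ORDER BY time_waited_micro DESC;
```

If 'db file scattered read' barely registers next to everything else, full scans are probably not your culprit.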
22. #4: Parse to Execute Ratio
What it is:
An indication of how often you're parsing
vs. executing queries
What it is not:
An indication of how often you're hard
parsing vs. executing queries
23. #4: Parse to Execute Ratio
Based on this formula:
round(100*(1-:parse/:execute),2)
If you hard parse a query and then
execute it, your Execute to Parse % is 0.
If you soft parse a query and then execute
it, your Execute to Parse % is 0.
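The formula above can be run directly against V$SYSSTAT. A sketch, using the real statistic names 'parse count (total)' and 'execute count'; note this gives an instance-lifetime figure, whereas AWR computes it over a snapshot interval:

```sql
-- Sketch: AWR-style "Execute to Parse %" from cumulative V$SYSSTAT counters
SELECT ROUND(100 * (1 - p.value / e.value), 2) AS execute_to_parse_pct
  FROM (SELECT value FROM v$sysstat WHERE name = 'parse count (total)') p,
       (SELECT value FROM v$sysstat WHERE name = 'execute count') e;
```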
24. #4: Parse to Execute Ratio
What about all those articles and forum
posts that say adding bind variables will
improve your Execute to Parse %?
They're not wrong, just incomplete. Adding
bind variables will improve your Execute
to Parse %... IF you have some form of
statement caching enabled.
25. #4: Parse to Execute Ratio
Hard Parses can take up valuable CPU cycles
Soft Parses can still cripple your Oracle
instance
The best way to reduce library cache
contention is to not touch it at all!
26. #4: Parse to Execute Ratio
Tom Kyte said it best:
"there are three types of parses (well, maybe four) in Oracle...
there is the dreaded hard parse - they are VERY VERY
VERY bad.
there is the hurtful soft parse - they are VERY VERY very
bad.
there is the hated softer soft parse you might be able to
achieve with session cached cursors - they are VERY very
very bad.
then there is the absence of a parse, no parse, silence. This
is golden, this is perfect, this is the goal."
27. #4: Parse to Execute Ratio
To see hard parses vs. soft parses, check
out the Parse Count (total) and Parse
Count (hard) in an AWR report or V$ views
To reduce parsing as a whole (the actual
goal), make sure the code does not
explicitly parse per execution OR that the
client software has statement caching
enabled.
For example, in JBoss, you can set the
prepared-statement-cache-size parameter
SESSION_CACHED_CURSORS is not the
same thing!
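A minimal sketch of the V$ route, using the real V$SYSSTAT statistic names:

```sql
-- Sketch: total vs. hard parses since instance startup
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('parse count (total)', 'parse count (hard)');
```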
28. #3: Buffer Hit Ratio
What it is:
Another ratio
A proportional view of LIOs to PIOs
What it is not:
A silver bullet
A magic ratio
A valuable performance indicator
29. #3: Buffer Hit Ratio
Wait, buffer hit ratio isn't valuable?
Okay, maybe that was a little heavy-handed.
It can be valuable as an "at-a-glance"
metric to see if something is
absolutely abysmal.
30. #3: Buffer Hit Ratio
It is important to remember
that a high buffer hit ratio
doesn't necessarily mean
the data you needed was
available in cache when it
was needed. It also doesn't
mean the queries you're
running are optimal…they
just happen to be getting
their data from cache.
100% of crap in RAM is still
crap. It's just logical crap.
31. #3: Buffer Hit Ratio
So what is it good for?
If you know your queries are perfect
(lolright) then it can indicate that you
don't have enough RAM allocated to
your buffer cache.
That's it, I just have a second bullet here
to keep the other one company.
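For reference, the classic ratio can be computed from V$SYSSTAT like this (the statistic names are real; remember, a pretty number here proves nothing on its own):

```sql
-- Sketch: the classic buffer cache hit ratio from cumulative V$SYSSTAT counters
SELECT ROUND(100 * (1 - phy.value / (db.value + con.value)), 2) AS buffer_hit_pct
  FROM (SELECT value FROM v$sysstat WHERE name = 'physical reads') phy,
       (SELECT value FROM v$sysstat WHERE name = 'db block gets') db,
       (SELECT value FROM v$sysstat WHERE name = 'consistent gets') con;
```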
32. #2: CPU %
What it is:
CPU Usage per CPU
What it is not:
Equivalent to your laptop's CPU %
A viable measure of CPU usage (alone)
A way to diagnose performance
33. #2: CPU %
This isn't your Windows
laptop.
When your PC shows 99%
or 100% CPU usage, you
panic. That‘s because you
only have one CPU
(usually), and 99% means
you can barely drag a
window from one side of
the screen to the other.
34. #2: CPU %
In the multi-processor world, it's not as big a
problem. In fact, it can be a huge benefit.
You have multiple CPUs on your servers.
99% usage of one or more is probably not
a big deal.
The CPU is the part of the
system that performs work (as opposed to
waiting). You want it to be heavily utilized.
35. #2: CPU %
What do you pay licensing
based on?
Number of CPUs.
So what do you actually
want to be as fully utilized
as possible?
36. #2: CPU %
Instead, we should be looking at:
Runqueue length – Provided by
vmstat, uptime, top, and other tools.
Shows the number of processes
actively waiting or working on CPU
at any given time.
Oracle Average Active Sessions –
This metric is usually more pertinent
from the DBA side, as it shows the
number of sessions actively waiting
or working at any given time.
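Oracle exposes Average Active Sessions directly. A sketch against the documented V$SYSMETRIC view (the metric name is real; group_id = 2 selects the long-duration interval):

```sql
-- Sketch: current Average Active Sessions over the last long-duration interval
SELECT metric_name, ROUND(value, 2) AS avg_active_sessions
  FROM v$sysmetric
 WHERE metric_name = 'Average Active Sessions'
   AND group_id = 2;  -- 2 = long-duration (60-second) metric group
```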
37. #2: CPU %
The focus should be on concurrency
Using a single CPU heavily is only a
problem if the other CPUs are fairly
dormant…but that's another issue
entirely.
Even run queue is not a perfect metric—
some things, like uninterruptible I/O
wait, can skew the results.
I/O wait should be part of the bigger picture
along with run queue length.
38. #1: Cost
What it is:
A numerical estimation proportional to
the expected resources necessary to
execute a statement with a given plan.
What it is not:
Anything else.
39. #1: Cost
This one comes up
all. the. time.
Here's a simple thing
to keep in mind:
Oracle's optimizer
is cost based
Your tuning
practices are not
40. #1: Cost
Cost is good to understand, so you can
understand why Oracle chose the plan it
did.
However, you shouldn't try to tune
specifically to reduce cost.
Cost is not proportional to time. A high or
low cost doesn't necessarily mean a
query will be slower or faster.
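To see where the cost number actually lives, a sketch using EXPLAIN PLAN and DBMS_XPLAN (the table and predicate are illustrative, not from the original deck):

```sql
-- Sketch: look at, but don't tune to, the Cost column of a plan
EXPLAIN PLAN FOR
  SELECT * FROM employees WHERE department_id = 10;  -- illustrative query

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- The Cost column is a relative internal estimate, not elapsed time.
```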
41. #1: Cost
Why is cost misused?
"Gather stats" is like the
"restart Windows" of the
Oracle world. Gathering
stats changes plans.
Plans have costs. I
should tune costs.
The cost based
optimizer changed my
plan. It's cost based. I'm
cost based.
42. #1: Cost
Cost is not a bottleneck, nor is it
indicative of actual work. It's indicative of
relative work based on parameters that
exist purely in the calculations of your
particular Oracle instance.
Instead of tuning to reduce cost, tune to
reduce bottlenecks. Those are real
things that cause real wait.
43. #1: Cost
Real things to tune
Reduce block touches (both physical and
logical) by improving your query selectivity,
join order, index usage, etc.
Reduce parses, both hard and soft.
Investigate execution plans and use
statistics, hints, or other methods to improve
Oracle's costing—just don't try to 'tune down
cost' directly.
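To find the block-touch heavy hitters worth tuning, a sketch against V$SQL (the view and columns are real; the FETCH FIRST syntax requires Oracle 12c or later):

```sql
-- Sketch: statements doing the most logical I/O per execution
SELECT sql_id,
       buffer_gets,
       executions,
       ROUND(buffer_gets / executions) AS gets_per_exec
  FROM v$sql
 WHERE executions > 0
 ORDER BY gets_per_exec DESC
 FETCH FIRST 10 ROWS ONLY;
```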
44. Step 4…
Step 4, if you remember, was "solve the
problem."
That advice still stands.
But make sure you use
the right metrics to do it.
And good luck!