Metric Abuse: Frequently Misused Metrics in Oracle


This is a presentation I created for RMOUG 2014 which I was sadly unable to attend. However, I wanted to share it with the Oracle community so that you can learn a bit about metrics that are frequently cited, frequently demonized, and frequently misused. In this deck we will go through the steps to diagnose issues and what NOT to blame as you go through the process.

The topics and concepts discussed here were originally formed in a blog post on the OracleAlchemist.com site: http://www.oraclealchemist.com/news/these-arent-the-metrics-youre-looking-for/

Published in: Technology, Education

Metric Abuse: Frequently Misused Metrics in Oracle

  1. Frequently Misused Metrics in Oracle
  2. Steve Karam
     • Technical Manager at Delphix
     • Oracle Certified Master, ACE, and other acronyms
     • Just a little social
     • Blog: http://www.oraclealchemist.com
     • Twitter: @OracleAlchemist
     • Google Plus: +SteveKaram
     • Facebook: OracleAlchemist
  3. Hunting for Metrics
     • Oracle has more metrics than you can shake a stick at
     • Automatic Workload Repository (AWR)
     • Active Session History (ASH)
     • STATSPACK
     • V$ and X$ views
  4. These Aren't the Metrics You're Looking For
     The problem is not a lack of data; it's knowing how, and when, to use it.
  5. The Database is…
     • Broken
     • Slow
     • Down
     • Not working
     • Giving me errors
     Step 1: What is the actual problem?
  6. I'm going to…
     • Gather stats
     • Add an index to something
     • Bounce the database
     • Blame the SysAdmins
     • Blame the code
     • Kill the backup
     Step 2: Don't be hasty. Suppress knee-jerk reactions. They have no place in problem analysis.
  7. I think I know what to do!
     That's great! Thinking is good. But if you only think you found a solution, chances are good that there's more to it.
     Step 3: Don't immediately think in terms of fixes. Think in terms of findings and recommendations.
  8. Gathering Stats
     Just like the optimizer needs to gather stats for proper query analysis, you need to gather stats for problem analysis. Think of it like a popular TV medical drama:
     • Your database is the patient. It's their job to be sick.
     • Your end users are the concerned family and friends. It's their job to be panicky.
     • You are the doctor and team. It's your job to be brilliant.
     Step 4: Be brilliant.
  9. Problem Analysis
     Okay, so "Be Brilliant" isn't a good step. At this point, what you really need to do is choose a path to solving the issue at hand. There are a few methods for doing this:
     • Top down: Review events and waits at a global level and drill down from there.
     • Scientific Method: Do background research, form a hypothesis, test your hypothesis, analyze the outcome.
     • Differential Diagnosis: Shrink the probability of various issues using the process of elimination.
  10. The Top Down Approach
     Top down tuning is a viable method, and is almost always preferable to bottom up tuning. It is most useful when you know the issue is global and you need to drill down to a root cause. It's good when things suddenly go wrong; however, it can be difficult when there are multiple root causes.
  11. Scientific Method
     This method is highly effective at ensuring factual resolutions to problems. While it may not always be suitable for quickly resolving a critical issue, it's always suitable for case studies and post-fix root cause analysis.
  12. Differential Diagnosis (DDX)
     This method is great for global issues where the root cause is unknown and no significant change has occurred.
     • Gather information
     • List symptoms
     • List possible conditions based on the symptoms
     • Test Test Test
     • Eliminate conditions
     • Don't kill the patient
  13. Speaking of House
     In the show "House", the main character has a saying: Everybody lies.
     DBA: So everyone, what changed?
     Developer: Nothing.
     SysAdmin: Nothing.
     Network Admin: Nothing.
     Project Manager: Nothing.
  14. You never told us the real Step 4
     Step 4 is simple. Solve the problem.
  15. Well sure, but how?
     The methods we've discussed are all well and good for looking into problems and figuring out the cause and a solution. For the most part, it will be up to you to:
     • Gather the right metrics
     • Synthesize your data
     • Create findings and recommendations
     • Test for success
  16. What are the right metrics?
     There are tons of papers and articles out there on wait events, metrics, and other metadata you should look for. We're not here for that. There are guides on how to use the metrics you find. We're not here for that either. No, we're here to discuss…
  17. And how we can fix that
  18. #5: db file scattered read
     What it is:
     • An indication of a multiblock I/O
     What it is not:
     • A full table scan
     • A reason to panic
     • The culprit (not always, anyway)
  19. #5: db file scattered read
     The 'db file scattered read' event happens when Oracle performs a multiblock I/O; for instance, when a full table scan occurs. Index full scans and fast full scans also result in multiblock I/O. But those don't sound so horrible, now do they? Why is that?
  20. #5: db file scattered read
     Over the years, DBAs and developers have cultivated a mortal terror of full table scans. Of course, they can be a problem, but are they always the problem? Of course not. Some facts about db file scattered reads:
     • They are an incredibly optimal way to utilize disk to gather large amounts of unordered data.
     • They aren't the only indication of full scans or multiblock I/O. The 'direct path read' and 'db file parallel read' events are indications too.
  21. #5: db file scattered read
     Before you go off on a witch hunt because of a 'db file scattered read' event, consider the following:
     • Are there any indications that full scans are actually the problem?
     • Are you sure that an index read would be more efficient in this case?
     • Do your other symptoms match up with the conclusion that a query performing a full table scan is your culprit?
     Full table scans are the devil!
  22. #4: Parse to Execute Ratio
     What it is:
     • An indication of how often you're parsing vs. executing queries
     What it is not:
     • An indication of how often you're hard parsing vs. executing queries
  23. #4: Parse to Execute Ratio
     Based on this formula: round(100*(1-:parse/:execute),2)
     • If you hard parse a query and then execute it, your Execute to Parse % is 0.
     • If you soft parse a query and then execute it, your Execute to Parse % is 0.
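
As a minimal sketch, the slide's formula can be written as a small Python function. The function name and the sample counts are illustrative assumptions; only the formula itself comes from the slide.

```python
def execute_to_parse_pct(parses: int, executes: int) -> float:
    """Execute to Parse %, using the slide's formula round(100*(1-parse/execute),2)."""
    if executes == 0:
        raise ValueError("executes must be non-zero")
    return round(100 * (1 - parses / executes), 2)


# One parse per execution gives 0, whether that parse was hard or soft:
# the formula only sees the total parse count.
print(execute_to_parse_pct(parses=1, executes=1))

# Parsing once and executing 100 times pushes the ratio toward 100.
print(execute_to_parse_pct(parses=1, executes=100))
```

The ratio carries no information about the kind of parse, which is why it cannot stand in for a hard parse count.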
  24. #4: Parse to Execute Ratio
     What about all those articles and forum posts that say adding bind variables will improve your Execute to Parse %? They're not wrong, just incomplete. Adding bind variables will improve your Execute to Parse %... IF you have some form of statement caching enabled.
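
The point about bind variables and statement caching can be illustrated with a toy simulation. This is a sketch only: `ToyClient` and `simulate` are hypothetical names, the model just counts parse and execute calls, and it does not represent any real Oracle driver API.

```python
class ToyClient:
    """Toy database client (hypothetical); it only counts parse and execute calls."""

    def __init__(self, statement_cache: bool):
        self.statement_cache = statement_cache
        self.open_cursors = set()  # SQL texts whose parsed handle is kept open
        self.parses = 0
        self.executes = 0

    def run(self, sql: str, binds: tuple = ()) -> None:
        cached = self.statement_cache and sql in self.open_cursors
        if not cached:
            # No reusable handle: a parse call (hard or soft) is issued.
            self.parses += 1
            if self.statement_cache:
                self.open_cursors.add(sql)
        self.executes += 1


def execute_to_parse_pct(parses: int, executes: int) -> float:
    return round(100 * (1 - parses / executes), 2)


def simulate(use_binds: bool, statement_cache: bool, n: int = 100) -> float:
    client = ToyClient(statement_cache)
    for i in range(n):
        if use_binds:
            client.run("SELECT * FROM emp WHERE id = :1", (i,))
        else:
            client.run(f"SELECT * FROM emp WHERE id = {i}")
    return execute_to_parse_pct(client.parses, client.executes)


for binds in (False, True):
    for cache in (False, True):
        print(f"binds={binds!s:5} cache={cache!s:5} -> {simulate(binds, cache)}")
```

In this model only the combination of bind variables and a statement cache moves the ratio off zero. Binds alone turn hard parses into soft parses, which is worthwhile, but the parse call still happens on every execution.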
  25. #4: Parse to Execute Ratio
     • Hard parses can take up valuable CPU cycles
     • Soft parses can still cripple your Oracle instance
     • The best way to reduce library cache contention is to not touch it at all!
  26. #4: Parse to Execute Ratio
     Tom Kyte said it best:
     "there are three types of parses (well, maybe four) in Oracle...
     • there is the dreaded hard parse - they are VERY VERY VERY bad.
     • there is the hurtful soft parse - they are VERY VERY very bad.
     • there is the hated softer soft parse you might be able to achieve with session cached cursors - they are VERY very very bad.
     • then there is the absence of a parse, no parse, silence. This is golden, this is perfect, this is the goal."
  27. #4: Parse to Execute Ratio
     • To see hard parses vs. soft parses, check out the Parse Count (total) and Parse Count (hard) in an AWR report or V$ views.
     • To reduce parsing as a whole (the actual goal), make sure the code does not explicitly parse per execution OR that the client software has statement caching enabled.
     • For example, in JBoss, you can set the prepared-statement-cache-size parameter.
     • SESSION_CACHED_CURSORS is not the same thing!
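
Once you have those counters, the hard/soft split is simple arithmetic. The sketch below assumes the values were already read from the 'parse count (total)', 'parse count (hard)', and 'execute count' statistics; the helper name and the sample numbers are hypothetical, and treating soft parses as total minus hard is a common approximation rather than an exact accounting.

```python
def parse_breakdown(stats: dict) -> dict:
    """Split parse activity into hard and soft from system statistic values."""
    total = stats["parse count (total)"]
    hard = stats["parse count (hard)"]
    executes = stats["execute count"]
    return {
        "hard": hard,
        "soft": total - hard,  # approximation: everything that was not a hard parse
        "hard_pct_of_parses": round(100 * hard / total, 2) if total else 0.0,
        "parses_per_execute": round(total / executes, 2) if executes else 0.0,
    }


# Hypothetical sample: very few hard parses, yet almost one parse per execute.
sample = {
    "parse count (total)": 50_000,
    "parse count (hard)": 500,
    "execute count": 52_000,
}
print(parse_breakdown(sample))
```

A system like this sample looks healthy if you only watch hard parses, while the parses-per-execute figure shows that nearly every execution still pays for a soft parse.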
  28. #3: Buffer Hit Ratio
     What it is:
     • Another ratio
     • A proportional view of LIOs to PIOs
     What it is not:
     • A silver bullet
     • A magic ratio
     • A valuable performance indicator
  29. #3: Buffer Hit Ratio
     Wait, buffer hit ratio isn't valuable? Okay, maybe that was a little heavy handed. It can be valuable as an "at-a-glance" metric to see if something is absolutely abysmal.
  30. #3: Buffer Hit Ratio
     It is important to remember that a high buffer hit ratio doesn't necessarily mean the data you needed was available in cache when it was needed. It also doesn't mean the queries you're running are optimal… they just happen to be getting their data from cache. 100% of crap in RAM is still crap. It's just logical crap.
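
A small worked example makes the point concrete. The sketch assumes the commonly used form of the ratio, 1 minus physical reads over logical reads; the function name and both workloads are invented for illustration.

```python
def buffer_hit_ratio(logical_reads: int, physical_reads: int) -> float:
    """Percentage of logical reads (LIOs) satisfied without a physical read (PIO)."""
    if logical_reads == 0:
        raise ValueError("logical_reads must be non-zero")
    return round(100 * (1 - physical_reads / logical_reads), 2)


# Two hypothetical queries that return the same rows.
wasteful = {"logical_reads": 1_000_000, "physical_reads": 1_000}
efficient = {"logical_reads": 1_000, "physical_reads": 100}

print(buffer_hit_ratio(**wasteful))   # high ratio, a thousand times more block touches
print(buffer_hit_ratio(**efficient))  # lower ratio, far less total work
```

The wasteful query scores the better ratio while touching a thousand times more blocks and doing ten times the physical reads, so the ratio alone says nothing about which query deserves tuning.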
  31. #3: Buffer Hit Ratio
     So what is it good for?
     • If you know your queries are perfect (lolright) then it can indicate that you don't have enough RAM allocated to your buffer cache.
     • That's it, I just have a second bullet here to keep the other one company.
  32. #2: CPU %
     What it is:
     • CPU usage per CPU
     What it is not:
     • Equivalent to your laptop's CPU %
     • A viable measure of CPU usage (alone)
     • A way to diagnose performance
  33. #2: CPU %
     This isn't your Windows laptop. When your PC shows 99% or 100% CPU usage, you panic. That's because you only have one CPU (usually), and 99% means you can barely drag a window from one side of the screen to the other.
  34. #2: CPU %
     In the multi-processor world, it's not as big a problem. In fact, it can be a huge benefit.
     • You have multiple CPUs on your servers. 99% usage of one or more is probably not a big deal.
     • CPU is the processor, and the part of the system that performs work (as opposed to wait). You want this to be heavily utilized.
  35. #2: CPU %
     What do you pay licensing based on? Number of CPUs.
     So what do you actually want to be as fully utilized as possible?
  36. #2: CPU %
     Instead, we should be looking at:
     • Run queue length: Provided by vmstat, uptime, top, and other tools. Shows the number of processes actively waiting or working on CPU at any given time.
     • Oracle Average Active Sessions: This metric is usually more pertinent from the DBA side, as it shows the number of sessions actively waiting or working at any given time.
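
As a rough sketch of how Average Active Sessions is typically derived, the function below divides DB time by elapsed wall-clock time, which is the commonly used definition and an assumption here rather than something stated on the slide. The function names, the sample figures, and the comparison against CPU count are illustrative only.

```python
def average_active_sessions(db_time_seconds: float, elapsed_seconds: float) -> float:
    """Average number of sessions working or waiting during the interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return round(db_time_seconds / elapsed_seconds, 2)


def demand_vs_cpus(active_sessions: float, cpu_count: int) -> str:
    """Crude heuristic: compare concurrent demand with the number of CPUs."""
    if active_sessions <= cpu_count:
        return "demand fits within CPU count"
    return "more active sessions than CPUs"


# Hypothetical interval: 7200 seconds of DB time accumulated in 10 minutes.
aas = average_active_sessions(db_time_seconds=7200, elapsed_seconds=600)
print(aas, demand_vs_cpus(aas, cpu_count=16))
print(aas, demand_vs_cpus(aas, cpu_count=8))
```

Active sessions include sessions that are waiting rather than running on CPU, so exceeding the CPU count is a prompt to look at what they are waiting on, not proof of CPU starvation.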
  37. #2: CPU %
     • The focus should be on concurrency.
     • Using a single CPU heavily is only a problem if the other CPUs are fairly dormant… but that's another issue entirely.
     • Even run queue is not a perfect metric; some things, like uninterruptible I/O wait, can skew the results.
     • I/O wait should be part of the bigger picture along with run queue length.
  38. #1: Cost
     What it is:
     • A numerical estimation proportional to the expected resources necessary to execute a statement with a given plan.
     What it is not:
     • Anything else.
  39. #1: Cost
     This one comes up all. the. time. Here's a simple thing to keep in mind:
     • Oracle's optimizer is cost based
     • Your tuning practices are not
  40. #1: Cost
     • Cost is good to understand, so you can understand why Oracle chose the plan it did.
     • However, you shouldn't try to tune specifically to reduce cost.
     • Cost is not proportional to time. A high or low cost doesn't necessarily mean a query will be slower or faster.
  41. #1: Cost
     Why is cost misused?
     • "Gather stats" is like the "restart Windows" of the Oracle world. Gathering stats changes plans. Plans have costs. I should tune costs.
     • The cost based optimizer changed my plan. It's cost based. I'm cost based.
  42. #1: Cost
     • Cost is not a bottleneck, nor is it indicative of actual work. It's indicative of relative work based on parameters that exist purely in the calculations of your particular Oracle instance.
     • Instead of tuning to reduce cost, tune to reduce bottlenecks. Those are real things that cause real wait.
  43. #1: Cost
     Real things to tune:
     • Reduce block touches (both physical and logical) by improving your query selectivity, join order, index usage, etc.
     • Reduce parses, both hard and soft.
     • Investigate execution plans and use statistics, hints, or other methods to improve Oracle's costing; just don't try to 'tune down cost' directly.
  44. Step 4…
     Step 4, if you remember, was "solve the problem." That advice still stands. But make sure you use the right metrics to do it. And good luck!
  45. Q&A
