Metric Abuse: Frequently Misused Metrics in Oracle



This is a presentation I created for RMOUG 2014 which I was sadly unable to attend. However, I wanted to share it with the Oracle community so that you can learn a bit about metrics that are frequently cited, frequently demonized, and frequently misused. In this deck we will go through the steps to diagnose issues and what NOT to blame as you go through the process.

The topics and concepts discussed here were originally formed in a blog post on the site:

Published in: Technology, Education


  1. 1. Frequently Misused Metrics in Oracle
  2. 2. Steve Karam
     • Technical Manager at Delphix
     • Oracle Certified Master, ACE, and other acronyms
     • Just a little social
     • Blog:
     • Twitter: @OracleAlchemist
     • Google Plus: +SteveKaram
     • Facebook: OracleAlchemist
  3. 3. Hunting for Metrics
     • Oracle has more metrics than you can shake a stick at
     • Automatic Workload Repository (AWR)
     • Active Session History (ASH)
     • STATSPACK
     • V$ and X$ views
  4. 4. These Aren't the Metrics You're Looking For The problem is not a lack of data, it's knowing how, and when, to use it.
  5. 5. The Database is…
     • Broken
     • Slow
     • Down
     • Not working
     • Giving me errors
     Step 1: What is the actual problem?
  6. 6. I'm going to…
     • Gather stats
     • Add an index to something
     • Bounce the database
     • Blame the SysAdmins
     • Blame the code
     • Kill the backup
     Step 2: Don't be hasty. Suppress knee-jerk reactions. They have no place in problem analysis.
  7. 7. I think I know what to do! That's great! Thinking is good. But if you only think you found a solution, chances are good that there's more to it. Step 3: Don't immediately think in terms of fixes. Think in terms of findings and recommendations.
  8. 8. Gathering Stats Just like the optimizer needs to gather stats for proper query analysis, you need to gather stats for problem analysis. Think of it like a popular TV medical drama:
     • Your database is the patient. It's their job to be sick.
     • Your end users are the concerned family and friends. It's their job to be panicky.
     • You are the doctor and team. It's your job to be brilliant.
     Step 4. Be brilliant.
  9. 9. Problem Analysis Okay, so "Be Brilliant" isn't a good step. At this point, what you really need to do is choose a path to solving the issue at hand. There are a few methods for doing this:
     • Top down: Review events and waits at a global level and drill down from there.
     • Scientific Method: Do background research, form a hypothesis, test your hypothesis, analyze the outcome.
     • Differential Diagnosis: Shrink the probability of various issues using the process of elimination.
  10. 10. The Top Down Approach Top down tuning is a viable method, and is almost always preferable to bottom up tuning. This is very useful when you know the issue is global and you need to drill down into a root cause. It's good when things suddenly go wrong; however, it can be difficult when there are multiple root causes.
  11. 11. Scientific Method This method is highly effective at ensuring factual resolutions to problems. While it may not always be suitable for quickly resolving a critical issue, it's always suitable for case studies and post-fix root cause analysis.
  12. 12. Differential Diagnosis (DDX) This method is great for global issues where the root cause is unknown and no significant change has occurred.
      • Gather information
      • List symptoms
      • List possible conditions based on the symptoms
      • Test Test Test
      • Eliminate conditions
      • Don't kill the patient
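To make the elimination step concrete, here's a minimal Python sketch of the DDX loop described above. The conditions, symptoms, and test results are all made up for illustration (they are not from the deck), and `eliminate` is a hypothetical helper, not part of any Oracle tooling.

```python
# Hypothetical conditions and the symptoms each one would be expected to show.
CANDIDATES = {
    "undersized buffer cache": {"high physical reads", "slow single-block reads"},
    "excessive parsing": {"high CPU", "library cache waits"},
    "CPU starvation": {"high CPU", "long run queue"},
}


def eliminate(candidates, test_results):
    """Keep only the conditions whose expected symptoms have not been ruled out.

    test_results maps a symptom to True (confirmed) or False (ruled out).
    Symptoms that have not been tested yet are treated as still possible.
    """
    remaining = {}
    for condition, symptoms in candidates.items():
        if all(test_results.get(symptom, True) for symptom in symptoms):
            remaining[condition] = symptoms
    return remaining


# Hypothetical test outcomes: CPU is busy, but the run queue and physical reads look fine.
results = {"high CPU": True, "long run queue": False, "high physical reads": False}
remaining = eliminate(CANDIDATES, results)
print(sorted(remaining))
```

Anything still in `remaining` is only a candidate, not a diagnosis; the untested symptoms are what you test next.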
  13. 13. Speaking of House In the show "House", the main character has a saying: Everybody lies. DBA: So everyone, what changed? Developer: Nothing. SysAdmin: Nothing. Network Admin: Nothing. Project Manager: Nothing.
  14. 14. You never told us the real Step 4 Step 4 is simple. Solve the problem.
  15. 15. Well sure, but how? The methods we've discussed are all well and good for looking into problems and figuring out the cause and a solution. For the most part, it will be up to you to:
      • Gather the right metrics
      • Synthesize your data
      • Create findings and recommendations
      • Test for success
  16. 16. What are the right metrics? There are tons of papers and articles out there on wait events, metrics, and other metadata you should look for. We're not here for that. There are guides on how to use the metrics you find. We're not here for that either. No, we're here to discuss…
  17. 17. And how we can fix that
  18. 18. #5: db file scattered read
      What it is:
      • An indication of a multiblock I/O
      What it is not:
      • A full table scan
      • A reason to panic
      • The culprit (not always, anyways)
  19. 19. #5: db file scattered read The 'db file scattered read' event happens when Oracle performs a multiblock I/O; for instance, when a full table scan occurs. Index full scans and fast full scans also result in multiblock I/O. But those don't sound so horrible, now do they? Why is that?
  20. 20. #5: db file scattered read Over the years, DBAs and developers have cultivated a mortal terror of full table scans. Of course, they can be a problem, but are they always the problem? Of course not. Some facts about db file scattered reads:
      • They are an incredibly efficient way to use disk to gather large amounts of unordered data
      • They aren't the only indication of full scans or multiblock I/O. The 'direct path read' and 'db file parallel read' events are as well.
  21. 21. #5: db file scattered read Before you go off on a witch hunt because of a 'db file scattered read' event, consider the following:
      • Are there any indications that full scans are actually the problem?
      • Are you sure that an index read would be more efficient in this case?
      • Do your other symptoms match up with the conclusion that a query performing a full table scan is your culprit?
      Full table scans are the devil!
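One way to check whether full scans are "actually the problem" is to see how much of the total wait time the event accounts for. Here's a minimal Python sketch; `wait_time_share` is a hypothetical helper and the numbers are invented for illustration, standing in for whatever you pull from AWR, STATSPACK, or the V$ views.

```python
def wait_time_share(waits):
    """Return (event, percent of total wait time) pairs, largest share first.

    waits maps a wait event name to its total time waited (any consistent unit).
    """
    total = sum(waits.values())
    if total == 0:
        return []
    shares = [(event, round(100 * waited / total, 1)) for event, waited in waits.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Hypothetical time waited per event, for illustration only.
sample = {
    "db file scattered read": 120,
    "db file sequential read": 2400,
    "log file sync": 480,
}
for event, pct in wait_time_share(sample):
    print(f"{event}: {pct}%")
```

In this made-up sample the scattered reads are a small slice of the total, so chasing full table scans first would be a witch hunt.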
  22. 22. #4: Parse to Execute Ratio
      What it is:
      • An indication of how often you're parsing vs. executing queries
      What it is not:
      • An indication of how often you're hard parsing vs. executing queries
  23. 23. #4: Parse to Execute Ratio Based on this formula: round(100*(1-:parse/:execute),2) If you hard parse a query and then execute it, your Execute to Parse % is 0. If you soft parse a query and then execute it, your Execute to Parse % is 0.
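The formula above is easy to try with a few numbers. This is a direct Python translation of `round(100*(1-:parse/:execute),2)`; the function name is just a label chosen for this sketch.

```python
def execute_to_parse_pct(parse_count, execute_count):
    """Python equivalent of round(100*(1-:parse/:execute),2)."""
    return round(100 * (1 - parse_count / execute_count), 2)


# One parse followed by one execute. The formula cannot tell whether
# that parse was hard or soft; the result is the same either way.
print(execute_to_parse_pct(1, 1))

# Parse once, then execute the same cursor 100 times.
print(execute_to_parse_pct(1, 100))
```

The only way to move the number is to execute more often than you parse, which is why the kind of parse never shows up in it.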
  24. 24. #4: Parse to Execute Ratio What about all those articles and forum posts that say adding bind variables will improve your Execute to Parse %? They're not wrong, but incomplete. Adding bind variables will improve your Execute to Parse %... IF you have some form of statement caching enabled.
  25. 25. #4: Parse to Execute Ratio
      • Hard Parses can take up valuable CPU cycles
      • Soft Parses can still cripple your Oracle instance
      • The best way to reduce library cache contention is to not touch it at all!
  26. 26. #4: Parse to Execute Ratio Tom Kyte said it best:
      "there are three types of parses (well, maybe four) in Oracle...
      • there is the dreaded hard parse - they are VERY VERY VERY bad.
      • there is the hurtful soft parse - they are VERY VERY very bad.
      • there is the hated softer soft parse you might be able to achieve with session cached cursors - they are VERY very very bad.
      • then there is the absence of a parse, no parse, silence. This is golden, this is perfect, this is the goal."
  27. 27. #4: Parse to Execute Ratio
      • To see hard parses vs. soft parses, check out the Parse Count (total) and Parse Count (hard) in an AWR report or V$ views
      • To reduce parsing as a whole (the actual goal), make sure the code does not explicitly parse per execution OR that the client software has statement caching enabled.
      • For example, in JBoss, you can set the prepared-statement-cache-size parameter
      • SESSION_CACHED_CURSORS is not the same thing!
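Here's a rough Python sketch of splitting parses into hard and soft once you have the counters in hand. It assumes you have already read the 'parse count (total)', 'parse count (hard)', and 'execute count' statistics into a dict; treating everything that isn't a hard parse as a soft parse is a simplification made for this sketch, and the sample values are invented.

```python
def parse_breakdown(stats):
    """Split total parses into hard and soft and relate them to executions.

    stats is a dict keyed by statistic name, e.g. values read from an
    AWR report or a V$ view (the sample values below are made up).
    """
    total = stats["parse count (total)"]
    hard = stats["parse count (hard)"]
    executes = stats["execute count"]
    soft = total - hard  # simplification: whatever isn't hard is counted as soft
    return {
        "hard": hard,
        "soft": soft,
        "hard_pct_of_parses": round(100 * hard / total, 2),
        "parses_per_execute": round(total / executes, 2),
    }


# Hypothetical counters for illustration only.
sample_stats = {
    "parse count (total)": 10_000,
    "parse count (hard)": 200,
    "execute count": 10_500,
}
print(parse_breakdown(sample_stats))
```

In this sample only 2% of parses are hard, yet the session still parses almost once per execute, which is the part statement caching on the client is meant to fix.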
  28. 28. #3: Buffer Hit Ratio
      What it is:
      • Another ratio
      • A proportional view of LIOs to PIOs
      What it is not:
      • A silver bullet
      • A magic ratio
      • A valuable performance indicator
  29. 29. #3: Buffer Hit Ratio Wait, buffer hit ratio isn't valuable? Okay, maybe that was a little heavy handed. It can be valuable as an "at-a-glance" metric to see if something is absolutely abysmal.
  30. 30. #3: Buffer Hit Ratio It is important to remember that a high buffer hit ratio doesn't necessarily mean the data you needed was available in cache when it was needed. It also doesn't mean the queries you're running are optimal…they just happen to be getting their data from cache. 100% of crap in RAM is still crap. It's just logical crap.
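A quick Python sketch shows why the ratio can flatter a bad query. The formula used here, 1 minus physical reads over logical reads, is the commonly quoted form and is an assumption of this sketch (the deck only describes the ratio as a proportional view of LIOs to PIOs). Both workloads are hypothetical.

```python
def buffer_hit_ratio(logical_reads, physical_reads):
    """Percentage of logical reads (LIOs) that did not need a physical read (PIO)."""
    return round(100 * (1 - physical_reads / logical_reads), 2)


# Two hypothetical versions of the same report query.
# The wasteful one touches a million blocks, nearly all of them cached.
wasteful = buffer_hit_ratio(logical_reads=1_000_000, physical_reads=1_000)
# The tuned one touches only 500 blocks, but 50 of them come from disk.
tuned = buffer_hit_ratio(logical_reads=500, physical_reads=50)

print(f"wasteful query: {wasteful}% hit ratio")
print(f"tuned query:    {tuned}% hit ratio")
```

The wasteful query "wins" on hit ratio while doing 2,000 times the block touches and 20 times the physical reads of the tuned one.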
  31. 31. #3: Buffer Hit Ratio So what is it good for?
      • If you know your queries are perfect (lolright) then it can indicate that you don't have enough RAM allocated to your buffer cache.
      • That's it, I just have a second bullet here to keep the other one company.
  32. 32. #2: CPU %
      What it is:
      • CPU Usage per CPU
      What it is not:
      • Equivalent to your laptop's CPU %
      • A viable measure of CPU usage (alone)
      • A way to diagnose performance
  33. 33. #2: CPU % This isn't your Windows laptop. When your PC shows 99% or 100% CPU usage, you panic. That's because you only have one CPU (usually), and 99% means you can barely drag a window from one side of the screen to the other.
  34. 34. #2: CPU % In the multi-processor world, it's not as big of a problem. In fact, it can be a huge benefit.
      • You have multiple CPUs on your servers. 99% usage of one or more is probably not a big deal.
      • CPU is the processor, and the part of the system that performs work (as opposed to wait). You want this to be heavily utilized.
  35. 35. #2: CPU % What do you pay licensing based on? Number of CPUs. So what do you actually want to be as fully utilized as possible?
  36. 36. #2: CPU % Instead, we should be looking at:
      • Run queue length: Provided by vmstat, uptime, top, and other tools. Shows the number of processes actively waiting or working on CPU at any given time.
      • Oracle Average Active Sessions: This metric is usually more pertinent from the DBA side, as it shows the number of sessions actively waiting or working at any given time.
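For a feel of how Average Active Sessions behaves, here's a small Python sketch. It uses the commonly cited definition of AAS as DB time divided by elapsed wall-clock time, which is an assumption of this sketch rather than something stated in the deck; the sample figures and the CPU count are hypothetical.

```python
def average_active_sessions(db_time_seconds, elapsed_seconds):
    """Average number of sessions working or waiting over the interval."""
    return round(db_time_seconds / elapsed_seconds, 2)


# Hypothetical 10-minute window: 7,200 seconds of DB time accumulated
# across all sessions during 600 seconds of wall-clock time.
aas = average_active_sessions(db_time_seconds=7_200, elapsed_seconds=600)
cpu_count = 16  # hypothetical server size

print(f"Average active sessions: {aas}")
print(f"Active sessions per CPU: {round(aas / cpu_count, 2)}")
```

An AAS of 12 on a 16-CPU box tells you about concurrency in a way that "CPU is at 99%" never will, though how much of that is work and how much is wait still needs a look at the wait events.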
  37. 37. #2: CPU %
      • The focus should be on concurrency
      • Using a single CPU heavily is only a problem if the other CPUs are fairly dormant…but that's another issue entirely.
      • Even run queue is not a perfect metric; some things, like uninterruptible I/O wait, can skew the results.
      • I/O wait should be part of the bigger picture along with run queue length.
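As a rough stand-in for run queue length, you can normalize the system load average (the figure `uptime` reports) by CPU count. This Python sketch assumes a Unix-like host; `load_per_cpu` is a hypothetical helper. On Linux the load average also counts processes in uninterruptible I/O wait, which is exactly the kind of skew mentioned above.

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Load average divided by CPU count; above 1.0 means more demand than CPUs."""
    return round(load_average / cpu_count, 2)


# Read the live 1-minute load average where the platform supports it.
try:
    one_minute, _five, _fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"1-minute load per CPU: {load_per_cpu(one_minute, cpus)}")
except (AttributeError, OSError):
    # os.getloadavg is not available on every platform.
    print("Load average not available on this platform")
```

A load of 8 on a 4-CPU host and a load of 8 on a 32-CPU host are very different stories, which is why the per-CPU figure is the one worth watching.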
  38. 38. #1: Cost
      What it is:
      • A numerical estimation proportional to the expected resources necessary to execute a statement with a given plan.
      What it is not:
      • Anything else.
  39. 39. #1: Cost This one comes up all. the. time. Here's a simple thing to keep in mind:
      • Oracle's optimizer is cost based
      • Your tuning practices are not
  40. 40. #1: Cost
      • Cost is good to understand, so you can understand why Oracle chose the plan it did.
      • However, you shouldn't try to tune specifically to reduce cost.
      • Cost is not proportional to time. A high or low cost doesn't necessarily mean a query will be slower or faster.
  41. 41. #1: Cost Why is cost misused?
      • "Gather stats" is like the "restart Windows" of the Oracle world. Gathering stats changes plans. Plans have costs. I should tune costs.
      • The cost based optimizer changed my plan. It's cost based. I'm cost based.
  42. 42. #1: Cost
      • Cost is not a bottleneck, nor is it indicative of actual work. It's indicative of relative work based on parameters that exist purely in the calculations of your particular Oracle instance.
      • Instead of tuning to reduce cost, tune to reduce bottlenecks. Those are real things that cause real wait.
  43. 43. #1: Cost
      • Real things to tune
        • Reduce block touches (both physical and logical) by improving your query selectivity, join order, index usage, etc.
        • Reduce parses, both hard and soft.
        • Investigate execution plans and use statistics, hints, or other methods to improve Oracle's costing; just don't try to 'tune down cost' directly.
  44. 44. Step 4… Step 4, if you remember, was "solve the problem." That advice still stands. But make sure you use the right metrics to do it. And good luck!
  45. 45. Q&A ?