
Bad metric, bad! - Joseph Ours

Metrics have long been used in the corporate sector, primarily to gain insight into an otherwise invisible world. In addition, “standards bodies” such as CMMI require metrics to achieve a certain maturity level. These two factors tend to drive organizations to blindly adopt a set of metrics to satisfy some process-transparency requirement. Organizations rarely apply statistical or scientific thought to the measures and metrics they establish and interpret. In this talk, we’ll look at some common metrics and why they fail to represent what most people believe they do. We’ll discuss the real purpose of metrics, issues with metric programs, how to leverage metrics effectively, and specific measure and metric pitfalls organizations encounter.

Published in: Technology

  1. 1. NOTICE: Proprietary and Confidential. This material is proprietary to Centric Consulting, LLC. It contains trade secrets and information which is solely the property of Centric Consulting, LLC. This material is solely for the Client’s internal use. This material shall not be used, reproduced, copied, disclosed, transmitted, in whole or in part, without the express consent of Centric Consulting, LLC. © 2013 Centric Consulting, LLC. All rights reserved. Bad Metric. Bad! Teaching an old dog, nothing new
  2. 2. What are some typical metrics that you measure?
  3. 3. Other Examples of Software Testing Metrics. Test Cases: • Test Case Counts by Execution Status • Test Case Percentages by Execution Status • Test Case Execution Status Trend • Test Case Status Planned vs Executed • Test Case Coverage • Test Case Status vs Coverage • Test Case First Run Failure Counts • Test Case Re-Run Counts. Automation extras: • Automation Index (Percent Automatable) • Automation Progress • Automation Test Coverage
  4. 4. More Examples of Software Testing Metrics. Defects: • Defect Counts by Status • Defect Counts by Priority • Defect Status Trend • Defect Density • Defect Removal Efficiency • Defect Leakage • Average Defect Response Time. Other: • Requirements Volatility Index • Testing Process Efficiency
  5. 5. Common Themes • Counts • Metric (Counts/Counts) • Trends
  6. 6. Other Examples of Software Testing Metrics. Test Cases: • Test Case Counts by Execution Status – Count • Test Case Percentages by Execution Status – Count • Test Case Execution Status Trend – Trend • Test Case Executed vs Planned – Metric and Trend • Test Case Coverage – Metric • Test Case Status vs Coverage – Metric • Test Case First Run Failure Counts – Count • Test Case Re-Run Counts – Count. Automation extras: • Automation Index (Percent Automatable) – Metric • Automation Progress – Count • Automation Test Coverage – Metric
  7. 7. More Examples of Software Testing Metrics. Defects: • Defect Counts by Status – Count • Defect Counts by Priority – Count • Defect Status Trend – Trend • Defect Density – Metric • Defect Removal Efficiency – Metric • Defect Leakage – Metric • Average Defect Response Time – Trend. Other: • Requirements Volatility Index – Metric • Testing Process Efficiency – Metric
  8. 8. The Problem We Typically Face? They Fail to Communicate: • Present data instead of information • Offer no interpretation, leaving the user to draw their own conclusion. They Are Often Inaccurate: • The act of measuring lacks consistency • The measures themselves have inherent variability • No one reports margins of error. They Do Not Measure a Control: • Can’t make a decision based on the number • The measurement isn’t a lever to introduce change. They Are Not Tied to Organizational Objectives: • No threshold set for the desired goal • No action or consequence if not achieved
  9. 9. Counting
  10. 10. Counting
  11. 11. Exercise #1 1. Need 3 volunteers 2. Assume 1 scoop equals 1 day’s worth of testing effort 3. Hershey Kisses and Tootsie Rolls are tests, Starbursts are bugs 4. Take a scoop 5. How many tests did you execute? 6. Based on how many tests you ran, how many more scoops do you need to execute the rest (there are 180 total)?
  12. 12. Exercise #1 Questions • Was the same scoop used? Were the results the same? • Was there variability in the number of tests run in each scoop? • Is that typical in testing? • Was there variability in the estimate of the number of tests left? • Is this similar to guessing how much time or effort is left in a test cycle? • Are these numbers reliable? • Are they repeatable?
  13. 13. Exercise #2 1. Need 3 volunteers 2. Assume 1 scoop equals 1 day’s worth of testing effort 3. Hershey Kisses and Tootsie Rolls are tests, Starbursts are bugs (Red are severe) 4. Take a scoop 5. How many tests did you execute? 6. How many defects did you find? 7. Based on how many tests you ran, how many more scoops do you need to execute the rest? 8. Based on how much effort you put in, how many more scoops do you need to find the rest of the defects?
  14. 14. Exercise #2 Questions • Was the same scoop used? Were the results the same? • With an estimate of the number of tests remaining, is it reasonable to estimate the number of defects that will be found? • Do people ask you to guess this type of information? • If you know how many tests (Hershey Kisses and Tootsie Rolls) are left and how many man-hours you will use (scoop size), can you estimate how many scoops are needed to execute all tests (find all Hershey Kisses and Tootsie Rolls)? • Is it accurate? Is it close enough? • Are these numbers reliable? • Are they repeatable? • Does encountering defects (Starbursts) reveal anything about the overall quality (how many Starbursts exist, or what it’ll take to find them)?
  15. 15. Challenges with Counting • Label does not equal content • Inherent variability • Not evenly spaced • Lacks reference for context • Lack of consistency
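
The projection asked for in Exercise #1 can be sketched as a small simulation. This is an illustration only and not part of the original talk: the bowl contents (60 bugs), the range of scoop sizes, and the helper names `take_scoop` and `scoops_remaining` are assumptions. Only the total of 180 tests comes from the slide.

```python
import random


def take_scoop(bowl, scoop_size, rng):
    """Draw one scoop (a random sample without replacement) from the bowl."""
    return rng.sample(bowl, scoop_size)


def scoops_remaining(tests_in_scoop, total_tests=180):
    """Naive projection: assume every later scoop yields the same test count."""
    if tests_in_scoop == 0:
        return float("inf")
    return (total_tests - tests_in_scoop) / tests_in_scoop


rng = random.Random(1)
# Hypothetical bowl: 180 tests (the slide's total) plus an assumed 60 bugs.
bowl = ["test"] * 180 + ["bug"] * 60

estimates = []
for volunteer in range(3):
    # Assumed: scoop size differs between volunteers, as daily effort does.
    size = rng.randint(15, 30)
    scoop = take_scoop(bowl, size, rng)
    tests = scoop.count("test")
    estimates.append(scoops_remaining(tests))
    print(f"volunteer {volunteer + 1}: {tests} tests, ~{estimates[-1]:.1f} more scoops")
```

Each volunteer extrapolates from a single count, so the three projections generally disagree. That spread is the inherent variability and lack of consistency the slide lists.
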
  16. 16. Metrics (Measure over Measure)
  17. 17. Sampling • Target Population • Matched Samples • Independent Samples • Random Sampling • Simple Random Sampling • Stratified Sampling • Cluster Sampling • Quota Sampling • Spatial Sampling • Sampling Variability • Standard Error • Bias • Precision. For each population there are many possible samples. A sample statistic gives information about a corresponding population parameter.
  18. 18. Sampling in Testing Does testing use sampling? Consider in most corporate environments: • We never test the entire application • It is not realistically possible to find every defect • So, does testing use sampling?
  19. 19. Ponder this as we discuss the next section… Is Testing a Methodical Defect Searching Activity?
  20. 20. Sampling Remember, we can’t test everything: not enough time/people/budget. So, which sample approach better approximates an actual measure (e.g. dots per sq. inch)? 5.25 dots/sq. in. 6.5 dots/sq. in.
  21. 21. Ponder this as we discuss the next section… Is Testing a Methodical Defect Searching Activity?
  22. 22. Sampling Which sample approach better approximates an actual measure (e.g. dots per sq. inch)? • Which is more accurate, random or methodical searching? 5.25 dots/sq. in. 6.5 dots/sq. in. 4.95 dots/sq. in. 6.3 dots/sq. in. There are actually 6.6 dots/sq. in.
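
The dots-per-square-inch comparison can be approximated in code. The sketch below is a rough illustration, not the slide's data: the population of 400 squares, the sample size of 20, and the function name `estimate` are all assumptions. It reports each sample mean together with its standard error, one of the sampling terms listed earlier in the deck.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical population: dot counts for 400 one-inch squares.
population = [rng.randint(0, 13) for _ in range(400)]
true_density = statistics.mean(population)


def estimate(sample):
    """Return (mean, standard error of the mean) for a sample of squares."""
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / len(sample) ** 0.5
    return mean, sem


random_sample = rng.sample(population, 20)  # simple random sampling
block_sample = population[:20]              # "methodical": first 20 squares

for label, sample in (("random", random_sample), ("block", block_sample)):
    mean, sem = estimate(sample)
    print(f"{label}: {mean:.2f} +/- {sem:.2f} dots/sq. in. (true {true_density:.2f})")
```

Because this synthetic population has no spatial pattern, the two approaches behave alike here. If dots (or defects) cluster, a contiguous block may be biased in a way the standard error does not reveal.
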
  23. 23. Exercise #3
  24. 24. Exercise #3 1. Need 3 volunteers 2. Assume 1 scoop equals 1 day’s worth of testing effort 3. Hershey Kisses and Tootsie Rolls are tests, Starbursts are bugs (Red are severe) 4. Each volunteer grabs 1 scoop of candy 5. How many (total) tests did you execute? 6. How many (total) defects did you find? 7. Log results 8. Repeat 2 more times
  25. 25. Exercise #3 Questions • Does this graph represent anything useful? • Does a trend line help or mean anything? • Is it possible or reasonable to estimate the # of defects you’ll see based on the number of tests, from even 9 samples? • Compare scoop 1 to scoop 9 – does any scoop seem to be a reasonable estimate?
  26. 26. Challenges with Metrics (Measure over Measure) • Implied Derivations and Forecasting • Counts over Counts • Denominator Rules • Implies Velocity • Measure over Measure
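
One possible reading of "Counts over Counts" and "Denominator Rules" is that a ratio hides how much evidence sits behind it. The sketch below illustrates that reading with hypothetical pass counts; `pass_rate` is an illustrative name, not something from the talk.

```python
from fractions import Fraction


def pass_rate(passed, executed):
    """Counts-over-counts metric: share of executed tests that passed."""
    return Fraction(passed, executed)


# Hypothetical status reports: identical metric, very different evidence.
small = pass_rate(9, 10)
large = pass_rate(900, 1000)
print(small == large)  # True: both reduce to 9/10

# One additional failure moves the small-denominator metric far more.
small_shift = small - pass_rate(8, 10)
large_shift = large - pass_rate(899, 1000)
print(float(small_shift), float(large_shift))  # 0.1 0.001
```

Reporting the two counts alongside the ratio keeps that difference visible.
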
  27. 27. Trends
  28. 28. Trend A trend is a change in a measure (or metric) over a time interval. It has three components: • Direction/Movement • Speed/Size • Cause (Implied)
  29. 29. Exercise #4 1. Need 3 volunteers 2. Assume 1 scoop equals 1 day’s worth of testing effort 3. Hershey Kisses and Tootsie Rolls are tests, Starbursts are bugs (Red are severe) 4. Each volunteer grabs 1 scoop of candy 5. How many of EACH type of test did you execute? 6. How many of EACH type of defect did you find? 7. Log results 8. Repeat 2 more times
  30. 30. Exercise #4 Questions • Does the graph line represent any information of value? • Is there assurance (control) that simply taking a scoop (e.g. executing tests in a given day) will result in defects being found? • Is the shape of the cumulative defect line representative of anything? • If we only look at scoops 1-3 or 7-9, does it tell us anything or mislead us? • What if we took 2 scoops per day (added a tester, but still counted it as 1 day)? Would that affect how things look? • Does Starbursts per scoop, or Starbursts per Hershey Kiss/Tootsie Roll, mean anything?
  31. 31. Challenges with Trends • Affected by challenges of counting • Affected by challenges of metrics • Time Based Series • Intervals and Activity Pause
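
The question of whether scoops 1-3 or 7-9 mislead can be made concrete with a least-squares slope, which captures a trend's direction and size. The defect counts below are hypothetical (including two idle days standing in for an activity pause), and `slope` is an illustrative helper, not a method from the talk.

```python
def slope(ys):
    """Least-squares slope of ys against 1..n (change per interval)."""
    n = len(ys)
    xs = range(1, n + 1)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Hypothetical defects found per scoop (day) for nine scoops.
defects = [2, 5, 9, 4, 0, 0, 6, 3, 1]

print("scoops 1-3:", slope(defects[:3]))  # positive: looks like a rising trend
print("scoops 7-9:", slope(defects[6:]))  # negative: looks like a falling trend
print("all nine:  ", slope(defects))      # close to flat
```

The same data supports three different stories depending on the window chosen, and none of the slopes says anything about cause.
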
  32. 32. Purpose of Metrics • Measure of Performance • Conformance to Best Practice • Deviation from Goal
  33. 33. Issues affecting purpose • Misaligned with strategy • Using metrics as outputs only • Too many metrics • Ease of measure does not equal importance • Lack of context • Limited dimensions • Lack behavioral aspects
  34. 34. Changing the World
  35. 35. How to Leverage Metrics • Explicitly link metrics to goals • Use trends over absolute numbers • Use shorter tracking periods • Change metrics when they stop driving change • Account for error and confidence
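
As one way to "account for error and confidence", a pass rate can be reported with an interval rather than as a bare number. The sketch below uses the Wilson score interval, a standard choice for proportions. The talk does not name a method, so the technique, the function name `wilson_interval`, and the sample counts are assumptions.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical reports: the same 90% pass rate from 10 and from 1000 tests.
for passed, executed in ((9, 10), (900, 1000)):
    low, high = wilson_interval(passed, executed)
    print(f"{passed}/{executed} passed: {low:.1%} to {high:.1%}")
```

The interval for 10 tests is much wider than the one for 1000, which gives a reader a sense of how much weight the headline number can bear.
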
  36. 36. Q&A Joseph Ours Email: Joseph.ours@centricconsulting.com Company Website: https://centricconsulting.com/technology-solutions/software-quality-assurance-and-testing/ Twitter: @justjoehere LinkedIn: www.linkedin.com/josephours Personal Blog: http://josephours.blogspot.com
