Software QA Metrics Dashboard Benchmarking



This benchmarking study presents software metrics best practices: how software metrics are reported to management and used to drive behavior. We learned how leading companies use dashboards to report on quality progress and improvement results. We found that the best organizations focused on the vital few metrics but also had automated systems that could drill down on metrics at the divisional and team levels. In addition, the best normalized their metrics by number of customers or code complexity, and they systematically used root cause analysis to analyze bugs found in the field. Their SW quality metrics often went beyond the strict definition of quality in that they also measured release predictability and feature expectations. Finally, the best companies used external benchmarks to set their quality targets.

Published in: Technology


  1. Software Quality Metrics Benchmark Study: How Software Metrics and Dashboards are Applied in High Technology Companies
     John Carter, TCGen, Inc., Menlo Park, CA. May 1, 2012
     [Cover figures: a "Release Slip Rate Percentage" chart and a Best vs. Rest radar chart with the axes Automated Metrics System, Root Cause Analysis, Uses External Benchmarks, Total Quality (Predictability/Features), and Normalization]
  2. Executive Summary from 10 Public Companies
     The purpose of the benchmark study was to capture best practices in the application of SW metrics dashboards.
     List of participants: 3 highly regulated companies and 7 technology companies (networking, computer, storage).
     Ten technology companies were benchmarked against these questions:
     • What metrics on software quality are reported to management?
     • Internal quality metrics, external field detected metrics?
     • How are they normalized? Customers in field, LOC?
     • What are the most important?
     • Are they tabular, graphical? How many? Are target values shown?
     • How frequently are they reported? How many do you report on?
     • What are key target values you look at for key metrics?
     Key Highlights:
     • There is no standard for the number of metrics, type of metrics, or frequency of reporting
     • However, there are best practices around Software Quality Metrics
       - We can look at what separates the best from the rest
     • The BEST have
       1. Automated metrics tracking and analysis systems that allow drill down and reporting by product, release, customer
       2. Normalization that ensures that the metrics are meaningful as the number of customers or the complexity of code increases
       3. Root Cause Analysis system that systematically analyzes defects that escape the company and are found in the field
       4. Quality metrics that go beyond product defects, and include release predictability and feature expectations
       5. External benchmarks that are used to set goals (created by third parties to establish databases or perform surveys)
     SWQA_Metrics_Benchmark_TCGen
  3. How We Approached the Analysis
     • The Process Capability Maturity Model (CMM) defines five levels of process maturity
       - Level 1 (Initial, Chaotic)
       - Level 2 (Repeatable)
       - Level 3 (Defined)
       - Level 4 (Managed, Measured)
       - Level 5 (Optimizing)
     • Metrics are a key part of the CMM, and Level 4 indicates mastery of metrics
     • SW metrics are well characterized, and are often divided into Product Quality Metrics, In-Process Metrics, and Metrics for SW Maintenance*
     • From our survey of ten companies, we have derived a sense of metrics maturity, and have created our own rating of SW Metrics Maturity using five factors
       - Automated, Root Cause Analysis, Normalized, External Benchmarks, and Total Quality (not just defects)
       - The best tend to have excellent scores on all five dimensions; the rest lag behind in one or more areas
       - The best tend to have measures in the three areas defined above (Product, In-Process, and Maintenance)
     “Best vs. Rest”
     * Stephen Kan, “Metrics and Models in Software Quality Engineering”, Addison-Wesley, 2003
  4. Example SW Metrics Maturity
     1. Automated metrics tracking and analysis systems that allow drill down and reporting by product, release, customer
     2. Normalization that ensures that the metrics are meaningful as the number of customers or the complexity of code increases
     3. Root Cause Analysis system that systematically analyzes defects that escape the company and are found in the field
     4. Quality metrics that go beyond product defects, and include release predictability and feature expectations
     5. External benchmarks that are used to set goals (created by third parties to establish databases or perform surveys)
     Hypothetical Radar Chart: a 5 point scale, where mastery is indicated as a 5 (outermost), and absent is a 0 (innermost)
     [Radar chart comparing Best and Rest on five axes: Automated Metrics System, Root Cause Analysis, Uses External Benchmarks, Total Quality (Predictability/Features), Normalization]
     The nature of the survey did not allow us to complete this chart for each participant, but this treatment would be very useful to evaluate where you are today and where you should focus in the future to close gaps between the best and the rest.
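The radar-chart self-assessment described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `maturity_gaps` and all scores are hypothetical, since the study did not publish per-company ratings.

```python
# The five maturity factors named in the study.
FACTORS = [
    "Automated Metrics System",
    "Root Cause Analysis",
    "Normalization",
    "Total Quality (Predictability/Features)",
    "Uses External Benchmarks",
]


def maturity_gaps(own, best):
    """Return (factor, gap) pairs, largest gap first.

    Scores use the slide's scale: 0 (absent) to 5 (mastery).
    """
    for scores in (own, best):
        for factor in FACTORS:
            if not 0 <= scores[factor] <= 5:
                raise ValueError(f"{factor}: score must be between 0 and 5")
    gaps = [(factor, best[factor] - own[factor]) for factor in FACTORS]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


# Illustrative numbers only (assumption, not survey data).
best = dict.fromkeys(FACTORS, 5)
own = dict(zip(FACTORS, [4, 2, 1, 3, 0]))

for factor, gap in maturity_gaps(own, best):
    print(f"{factor}: gap of {gap}")
```

Sorting by gap puts the factor furthest from "best" first, which matches the slide's suggestion to focus on closing the largest gaps.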
  5. Dashboard – Drawn from Benchmarking
     Guiding Principles:
     • Each metric should be linked to your overall quality objectives, which were derived from your overall strategy
     • There should be between 4 and 8 metrics
     • Two related metrics per screen
     • Text describing & analyzing the data represented
     From the Benchmark Sample, the goals might be:
     • Increasing Net Promoter Score (how highly you are recommended)
     • Increasing Release Predictability
     • Increasing Customer Satisfaction
     • Increasing Reported Quality (Field Quality)
     • Reducing time to repair
     • Reducing the number of Critical Accounts
     Each chart has the following graphical properties:
     • The charts are composed so that the ‘so what’ is very clear, and it is repeated on each chart so that managers who only see them once a quarter know why the metric is there and, if there is any significance to the data, what the significance is
     • Targets should be on all graphs
     • Where benchmark data exists, it will also be shown on the chart
     • Each chart should have the following properties
       - Title & Description
       - So What
       - Consistent Design
       - Labeled Axes
       - Target Curves
       - Narrative
  6. Example Charts: Mean Time to Repair and Percent of Release Slips
     [Two example charts with labeled vertical and horizontal axes; legend entries: Major Release 2, Major Release 3, Benchmark]
     Mean Time to Repair
     This chart plots the average time, in weeks, that the customers had to wait for resolution. Measured in weekly intervals, data captured per release.
     • The target is derived to get to the fastest resolution (and reduce the number outstanding)
     • The increase shown in January, 2012 is driven by the A.x release
     • The new methods for engineering releases should impact this in 2013
     Percent of Release Slips
     This chart plots the percentage of actual versus planned schedule for major and minor releases.
     • The target is derived to get to less than 5% slip by 2014, closing the gap in a straight line, coming down from 22% where we are today
     • The increase shown in November, 2011 is driven by the A.2a release, which had to go through 2 alpha
     • We expect a steeper drop in July, 2012 because of our new “Darken the Sky” program to provide requirements stability
     • Benchmarking indicates that the best in class number is a slip rate of less than 15% (for 9 month release cycles)
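A straight-line target such as the one described for release slips (from 22% down to 5%) could be generated as follows. This is a minimal sketch: the slip formula, the function names, and the choice of eight reporting periods are assumptions for illustration, not details taken from the slides.

```python
def slip_percent(planned_weeks, actual_weeks):
    """Schedule slip as a percentage of the planned duration (assumed definition)."""
    if planned_weeks <= 0:
        raise ValueError("planned_weeks must be positive")
    return 100.0 * (actual_weeks - planned_weeks) / planned_weeks


def straight_line_target(start, end, periods):
    """Evenly spaced target values from start to end, both endpoints included."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    step = (end - start) / periods
    return [start + step * i for i in range(periods + 1)]


# 22% today, 5% at the end; eight periods (for example, quarters) is an assumption.
targets = straight_line_target(22.0, 5.0, 8)

for period, target in enumerate(targets):
    print(f"period {period}: target slip {target:.2f}%")
```

Each reported slip value can then be plotted against the target for the same period, so the chart shows at a glance whether the trend is above or below the line.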
  7. Best Practices
     In benchmarking studies like this, we often see some exemplary practices that demonstrate creative and effective ways to stay ahead.
     Top 5
     1. Use of third party firms to assess how your software defect performance stacks up against the competition, and use of industry standard databases for software quality
     2. Test Escapes Analysis Process to perform root cause analysis on all significant escapes to the field
     3. SW quality reported on the dashboard includes broader measures like predictability and expectations
     4. Automated, integrated system for real time metrics analysis, so that presentation to management is simply a matter of pulling up current data and reviewing it formally
     5. Normalization for complexity and/or accounts in the field to ensure that proper comparisons are made
     Metrics to Consider
     6. Create a compound metric that pulls together several important factors for the business
     7. Institute metrics that show (unit and integration) statement coverage, branch coverage, and all tests passing, and for functional testing, show requirements coverage and all tests passing
     8. Institute metrics that show defect backlog, number of test cases planned, Upgrade/Update failure rate, Early Return Index, and Fault Slip Through
     Other Tips
     9. Bug tool kit that goes to the field with exhaustive and searchable data to help customers avoid reporting defects, learn about workarounds, and search with Google-like strength
     10. If external benchmark targets are not known, track improvement release over release
     11. Focus on what is important. One participant only tracks release predictability and customer satisfaction
     12. Use parametric estimation metrics (for example, 4 days per test case) to ensure high quality, data driven schedule estimates (also helps demonstrate improvements over time)
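The parametric estimation tip (item 12) lends itself to a small worked example. The sketch below assumes a simple linear model, effort equals test cases times days per case, divided across testers; the function names and the historical figures are hypothetical.

```python
import math


def calibrate_days_per_case(history):
    """Average days per test case from past (test_cases, days_spent) records."""
    total_cases = sum(cases for cases, _ in history)
    total_days = sum(days for _, days in history)
    if total_cases <= 0:
        raise ValueError("history must contain at least one test case")
    return total_days / total_cases


def estimate_test_schedule(test_cases, days_per_case=4.0, testers=1):
    """Estimated working days, rounded up to whole days."""
    if test_cases < 0 or days_per_case <= 0 or testers < 1:
        raise ValueError("invalid estimation inputs")
    return math.ceil(test_cases * days_per_case / testers)


# Made-up history: two past releases as (test_cases, days_spent).
rate = calibrate_days_per_case([(20, 90), (30, 110)])
estimate = estimate_test_schedule(30, days_per_case=rate, testers=5)
print(f"{rate:.1f} days per test case, {estimate} working days for 30 cases")
```

Recalibrating the rate after every release is what lets the same metric demonstrate improvement over time, as the slide notes.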
  8. Summary Statistics
     Key Highlights:
     • 8 report customer found defects to management (the remaining 2 report customer satisfaction at a high level)
     • 6 report on the order of 4 metrics to management; the remaining 4 report more or fewer
     • 5 include time to market as a metric in their quality dashboard
     • 4 report escapes or customer found defects caused by bad fixes
     • 4 companies have real time visibility of metrics, and they are automatically updated on a daily basis
     • 3 companies reported on compound metrics that combine reliability, availability, time to fix
     • 3 do not use targets for metrics reported to management, but only report the improvement release to release
     • 3 normalize metrics (LOC on inside, or Units in Field on outside)
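The two normalizations mentioned in the last bullet (lines of code internally, units in the field externally) can be illustrated as follows. This is a sketch with invented numbers; the function names and the per-1000 scaling are assumptions, not something the study prescribes.

```python
def defects_per_kloc(defects, lines_of_code):
    """Internal normalization: defects per thousand lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


def defects_per_1000_units(defects, units_in_field):
    """External normalization: customer found defects per 1000 units in the field."""
    if units_in_field <= 0:
        raise ValueError("units_in_field must be positive")
    return 1000 * defects / units_in_field


# Hypothetical releases: raw defect counts rise while normalized quality improves.
release_a = defects_per_kloc(120, 400_000)
release_b = defects_per_kloc(150, 600_000)
print(f"Release A: {release_a:.2f} defects/KLOC")
print(f"Release B: {release_b:.2f} defects/KLOC")
```

In this made-up case the raw count grows from 120 to 150 defects, yet the normalized rate falls, which is the kind of comparison normalization is meant to make possible as the code base or installed base grows.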
  9. Implications
     • Root cause analysis should be performed on defects from the field that are either critical or from regressions
       - Many companies have special processes for doing this effectively
     • It appears that some participants have higher levels of automation and coverage for unit, integration, and functional test
       - And it is measured
     • Planning metrics, such as the number of days per test case, should be used for prediction and improvement
     • If you are growing, some normalization should be used
       - It should be coarse (like judged Lines of Code, converted from Function Points)
     • Walker Survey, Quest Database, and Manager-Tools are three recommended vendors for metrics and management
       - Walker Survey can determine how you stack up against your competitors regarding quality and satisfaction
       - Quest is a TL 9000 database
       - Manager-Tools are helpful for developing QA managers
     • Where absolute targets don’t exist, a target curve based on prior improvement should be used to answer ‘are we getting better?’
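One way to build the target curve based on prior improvement from the last bullet is to project the average release-over-release improvement forward. The sketch below assumes a lower-is-better metric and a constant improvement ratio; the slides do not specify a method, so this is one plausible choice and the data is invented.

```python
def improvement_target_curve(history, future_releases):
    """Project targets by repeating the average release-over-release ratio.

    history: metric values for past releases, oldest first (lower is better).
    """
    if len(history) < 2 or any(value <= 0 for value in history):
        raise ValueError("need at least two positive historical values")
    # The geometric mean of the per-release ratios equals (last / first) ** (1 / n).
    ratio = (history[-1] / history[0]) ** (1 / (len(history) - 1))
    targets = []
    value = history[-1]
    for _ in range(future_releases):
        value *= ratio
        targets.append(value)
    return targets


# Hypothetical customer found defects for the last three releases.
past = [40.0, 30.0, 22.5]
curve = improvement_target_curve(past, 2)
print([round(target, 2) for target in curve])
```

A release whose actual value lands at or below its projected target is improving at least as fast as before, which gives a direct answer to "are we getting better?".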