This is a presentation I gave at a workshop that was co-located with ESSoS 2010. The presentation is about using Metrics Validation Criteria to choose a valid predictive metric for security vulnerabilities.
Systematic literature review

Phase                  Size of Source List
Literature Index       2,228
Title                  536
Cross-confirmed Title  156
Abstract               44
Full-text              17
Follow-up              20
You have the burden of proof: not just that these metrics point to something, but that they are meaningful.
A metric is a “quantitative scale and method that can be used to determine the value a feature takes for a specific software product”.
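To make that definition concrete, here is a minimal sketch in Python. The metric itself is illustrative and not from the talk: the function is the “method”, and the non-negative integers it returns are the “scale”.

```python
# A hypothetical file-level metric: the function body is the "method",
# and the non-negative integers it produces are the "scale".

def logical_loc(path: str) -> int:
    """Illustrative metric: count non-blank, non-comment lines in a file."""
    count = 0
    with open(path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            stripped = line.strip()
            if stripped and not stripped.startswith(("#", "//")):
                count += 1
    return count
```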
Model for less than a specific value, let’s say 0.20.
Concrete evidence of the proposed metrics, which emanates upward into increasingly abstracted analysis of the information we discovered. These sections are actually from the journal paper we are submitting to EMSE. The process was also backwards-informative: sometimes we would learn something later down the line that would help us go back and do an earlier step better.
Google, CiteSeerX, IEEE Xplore, ACM Portal
So which one of these does prediction fall into?
Again, which one of these does prediction fall into?
You have the burden of proof: not just that these metrics point to something, but that they are meaningful.
47 Total − 21 Removed = 26 Remaining; 26 / 47 ≈ 55%
If we have to redefine a given metric so that it can be applied to the project at hand, making it just a rephrasing of a well-known one, then that’s OK as long as the newly defined metric is predictive. This kind of additivity isn’t a property we necessarily want out of a metric: for example, code coverage shouldn’t increase when concatenating two components together; it should be the average of the two.
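A quick sketch of that point, with made-up component sizes and coverages: concatenating two equal-sized components gives the average of their coverages, while simply adding the two values produces a number that is not a coverage at all.

```python
# Why coverage is not additive: the coverage of a concatenation of two
# components is a size-weighted average, never the sum.
# All numbers below are illustrative.

def combined_coverage(covered_a, total_a, covered_b, total_b):
    """Coverage of the concatenation of components A and B."""
    return (covered_a + covered_b) / (total_a + total_b)

cov_a = 40 / 100   # component A: 40 of 100 branches covered -> 0.40
cov_b = 90 / 100   # component B: 90 of 100 branches covered -> 0.90

print(combined_coverage(40, 100, 90, 100))  # 0.65, the average (equal sizes)
print(cov_a + cov_b)                        # 1.30, nonsensical as a coverage
```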
Imagine a metric that is always predictive and will tell you with 100% accuracy which files in a system are vulnerable, but which takes half a year to calculate and extract. Such a metric is not usable, because by the time you obtain your much-needed values the software system has changed, not to mention that you might have had a release or a complete architectural revamp. Alternatively, imagine a metric which costs twice the budget of the entire project to collect; such a metric, no matter how accurate, is not worth collecting.

The instrument can be a collection method or something as concrete as the tool used to measure some part of the metric. For example, imagine a test coverage utility that doesn’t accurately calculate branch coverage. This version of branch coverage is invalid, even if it’s predictive, because the method to increase the value of the metric is unclear: testing more branches may decrease the value of the measurement, or increase it too much.
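One way to probe that instrument problem, as a hedged sketch (`measure_branch_coverage` and the growing test suites are hypothetical stand-ins for whatever tool is being evaluated): feed the tool a sequence of test suites where each suite covers strictly more branches than the last, and flag any point where the reported coverage drops.

```python
# Sanity check for a coverage instrument: if the tool is valid, covering
# strictly more branches should never lower the reported value.
# `measure_branch_coverage` is a hypothetical stand-in for the tool.

def check_monotonic(measure_branch_coverage, growing_suites):
    """Flag suite pairs where covering more branches lowered the reading."""
    violations = []
    previous = None
    for suite in growing_suites:  # each suite is a superset of the last
        value = measure_branch_coverage(suite)
        if previous is not None and value < previous:
            violations.append((suite, previous, value))
        previous = value
    return violations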