The adoption of machine learning techniques for software defect prediction: An initial industrial validation
Presented at:
11th Joint Conference On Knowledge-Based Software Engineering, JCKBSE, Volgograd, Russia, 2014
Get full text of publication at:
http://rakeshrana.website/index.php/work/publications/
Rakesh Rana¹, Miroslaw Staron¹, Jörgen Hansson¹, Martin Nilsson², Wilhelm Meding³
¹Computer Science & Engineering, Chalmers | University of Gothenburg, Sweden
²Volvo Car Group, Gothenburg, Sweden
³Ericsson, Gothenburg, Sweden
rakesh.rana@gu.se
2. Software Defect Prediction (SDP) methods
Image 1: https://www.reliablesoft.net/how-to-become-an-expert-in-your-niche-even-if-you-are-not/
Image 2: Fenton, Norman, et al. "Predicting software defects in varying development lifecycles using Bayesian nets." Information and Software Technology 49.1 (2007): 32-43.
Image 3: Kan, Stephen H. Metrics and models in software quality engineering. Addison-Wesley Longman Publishing Co., Inc., 2002.
Image 4: http://www.codeodor.com/index.cfm/2009/11/12/Its-Not-Your-Fault-Your-Software-Sucks/3058
3. SDP: Methods based on Machine Learning
• Decision Trees (DTs)
• Support Vector Machines (SVMs)
• Artificial Neural Networks (ANNs)
• Bayesian Belief Networks (BBNs)
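To make the first method in this list concrete, here is a minimal sketch of a one-level decision tree (a "decision stump") that flags defect-prone modules from a single metric. It is an illustration only, not the method used in the study: the metric (code churn), the toy data, and the names `Stump` and `fit_stump` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """One-level decision tree: a single threshold on a single metric."""
    threshold: float

    def predict(self, value: float) -> bool:
        # Modules whose metric exceeds the threshold are predicted defect-prone.
        return value > self.threshold


def fit_stump(values, labels):
    """Pick the threshold that misclassifies the fewest training modules."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct metric values")
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best_errors, best_threshold = None, None
    for t in candidates:
        errors = sum((v > t) != y for v, y in zip(values, labels))
        if best_errors is None or errors < best_errors:
            best_errors, best_threshold = errors, t
    return Stump(threshold=best_threshold)


# Hypothetical toy data (not from the study): changed lines per module and
# whether a defect was later reported against that module.
churn = [12, 30, 45, 210, 260, 400]
defective = [False, False, False, True, True, True]

stump = fit_stump(churn, defective)
```

A full decision tree applies the same split search recursively over several metrics; in practice a library implementation would normally be used rather than hand-written code.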
4. Objective
“What are the factors that are important for companies to make an informed decision to adopt (or not adopt) ML algorithms for the purpose of software defect prediction (SDP)?”
9. Study Design
Unit of analysis (Domain) | Software development process | Current methods for SDP | Current state of adoption of ML for SDP
VCG (Automotive) | V-shaped software development | Focus on status visualization and analogy based prediction | Considering evaluation
Ericsson (Telecom) | Lean and Agile development | Various modes of presenting current status and predictions methods | Considering evaluation
10. Study Design
The interviewees:
VCG, QM
VCG, MetricsTL
Ericsson, QM
Ericsson, MetricsTL
Level | Need and importance (Table 2) | Level of Satisfaction (Table 3) | Level of importance (Table 4)
Very Low (VL) | The information is not needed. | Not satisfactory, improvement is needed. | The attribute is not needed for analysis.
Low (L) | The information is desired, but not considered important. | Not satisfactory, improvement is desired. | The attribute can be considered but not required.
Medium (M) | The information is desired and is considered of value (if available). | Satisfactory, but could be improved. | The attribute is useful for making the analysis.
High (H) | The information is deemed as needed and is considered important. | Satisfaction is high. | The information on given attribute is needed for making the analysis.
Very High (VH) | The information is a must and should be provided with high accuracy. | Satisfaction is very high, with low scope for further improvement. | Cannot make a decision without information about this attribute.
11. Results: Information need and its importance for SDP
Prediction needs w.r.t. software defects | VCG (QM) | VCG (MetricsTL) | Ericsson (QM) | Ericsson (MetricsTL)
Classification of defect prone files/modules | L | H | VH | VH
Expected number of defects in SW components | H | H | L | VH
Expected defect inflow for a project/release | H | H | L | VH
Release readiness/expected latent defects | H | VH | H | VH
Severity classification of defects | VH | M | H | H
VCG, like most OEMs (Original Equipment Manufacturers) in the automotive domain, uses Model Based Development (MBD)
Assessing release readiness is important (High or Very High) for both case units
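The ratings on this slide can be checked mechanically. The sketch below encodes the five-level scale as integers and lists the prediction needs that all four interviewees rated High or above. The numeric encoding (VL=1 through VH=5) and the names `SCALE`, `NEEDS`, and `rated_at_least` are assumptions made for illustration; the ratings themselves are copied from the table on this slide.

```python
# Assumed ordinal encoding of the five-level scale (VL < L < M < H < VH).
SCALE = {"VL": 1, "L": 2, "M": 3, "H": 4, "VH": 5}

# Ratings in column order:
# VCG (QM), VCG (MetricsTL), Ericsson (QM), Ericsson (MetricsTL).
NEEDS = {
    "Classification of defect prone files/modules": ["L", "H", "VH", "VH"],
    "Expected number of defects in SW components": ["H", "H", "L", "VH"],
    "Expected defect inflow for a project/release": ["H", "H", "L", "VH"],
    "Release readiness/expected latent defects": ["H", "VH", "H", "VH"],
    "Severity classification of defects": ["VH", "M", "H", "H"],
}


def rated_at_least(ratings, level):
    """True if every rating is at or above the given level."""
    return all(SCALE[r] >= SCALE[level] for r in ratings)


# Needs that every interviewee rated High or Very High.
agreed_high = [need for need, r in NEEDS.items() if rated_at_least(r, "H")]
print(agreed_high)
```

Under this encoding only the release readiness row passes the check, which matches the observation stated on the slide.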
12. Results: Satisfaction with existing systems
Factors: Satisfaction with existing systems | VCG (QM) | VCG (MetricsTL) | Ericsson (QM) | Ericsson (MetricsTL)
Status information | H | H | H | H
Trend visualization | H | M | M | H
Predictions accuracy | M | M | L | H
Cost (current costs are low) | VH | VH | - | VH
Reliability | VH | H | VH | M
“Cost of obtaining results is very important factor and the current systems we use are very cheap to run and maintain” – QM at VCG.
13. Results: Familiarity and competence with ML techniques
Factors: Familiarity and competence with ML techniques | VCG (QM) | VCG (MetricsTL) | Ericsson (QM) | Ericsson (MetricsTL)
ML tried in previous project | L | L | - | M
Understanding of the technology | L | L | - | M
Ability to implement algorithms in-house | VL | M | - | M
Academic collaboration | M | H | - | M
Ability to interpret the results | H | H | - | M
Ability to assess quality of results | H | M | - | M
The participating companies show medium to high confidence in their ability to interpret the results of such analysis.
14. Results: Perceived Benefits
Factors: Perceived Benefits | VCG (QM) | VCG (MetricsTL) | Ericsson (QM) | Ericsson (MetricsTL)
Accuracy in predicting | H | H | VH | VH
Automation of pattern discovery | M | H | VH | VH
Adaptability to different data sets | M | H | VH | VH
Ability to handle large data | H | H | M | VH
Ability to generate new insights | H | M | H | H
“When it comes to the benefits, accuracy and automation are the top priorities for us” – MetricsTL at Ericsson
15. Results: Tool availability & External factors
Factors: Tool availability & External factors | VCG (QM) | VCG (MetricsTL) | Ericsson (QM) | Ericsson (MetricsTL)
Compatibility with existing systems | M | L | H | VH
Availability of open source tools | L | H | M | VH
Low cost of obtaining results | VH | H | H | M
Support/consulting services | H | M | L | VL
Adoption by other industries | L | L | L | M
Use by competitors | H | M | L | M
“Even if open source tools are available, we typically need a vendor in between to do tool integration, manage upgrades and do maintenance work – we do not have resources for that” – QM at VCG.
“We are not afraid of trying new things and being the first one, but if it is used in automotive sector and we have not tried it surely helps the case” – QM at VCG.
16. Specific challenges in adopting ML techniques in industry for SDP
• Lack of information to make a strong business case
• Uncertainty about the applicability of ML when access to source code is not available
• How to adapt ML techniques for model driven development
• How to effectively use text-based artefacts for SDP
• Uncertainty over where ML fits in the context of compliance with standards
17. ML Adoption for SDP: Conclusions
• ML-based techniques have high potential to aid companies in SDP efforts
• We identified a total of nine important factors and twenty-seven related attributes
• The ML adoption framework helps increase our understanding of the factors and attributes relevant for industrial practitioners
• The ML adoption framework will be useful for companies, researchers, and tool vendors
What are the factors that are important for companies to make an informed decision to adopt (or not adopt) ML algorithms for the purpose of software defect prediction (SDP)?