  1. Survey on Software Defect Prediction
     PhD Qualifying Examination, July 3, 2014
     Jaechang Nam, Department of Computer Science and Engineering, HKUST

  2. Outline
     • Background
     • Software Defect Prediction Approaches
       – Simple metric and defect estimation models
       – Complexity metrics and fitting models
       – Prediction models
       – Just-In-Time prediction models
       – Practical prediction models and applications
       – History metrics from software repositories
       – Cross-project defect prediction and feasibility
     • Summary and Challenging Issues

  3. Motivation
     • General question of software defect prediction
       – Can we identify defect-prone entities (source code file, binary, module, change, ...) in advance?
         • # of defects
         • buggy or clean
     • Why?
       – Quality assurance for large software (Akiyama@IFIP`71)
       – Effective resource allocation
         • Testing (Menzies@TSE`07)
         • Code review (Rahman@FSE`11)

  4. Ground Assumption
     • The more complex, the more defect-prone

  5. Two Focuses on Defect Prediction
     • How complex are the software and its process?
       – Metrics
     • How can we predict whether software has defects?
       – Models based on the metrics

  6. Prediction Performance Goal
     • Recall vs. Precision
     • Strong predictor criteria
       – 70% recall and 25% false positive rate (Menzies@TSE`07)
       – Precision, recall, accuracy ≥ 75% (Zimmermann@FSE`09)

  7. Outline (repeat of Slide 2; transition to Software Defect Prediction Approaches)

  8. Defect Prediction Approaches (roadmap timeline, 1970s-2010s; rows: Metrics / Models / Others)
     • Metrics: LOC
     • Models: Simple Model

  9. Identifying Defect-prone Entities
     • Akiyama's equation (Akiyama@IFIP`71)
       – # of defects = 4.86 + 0.018 * LOC (LOC = Lines Of Code)
         • about 23 defects in 1 KLOC
         • Derived from actual systems
     • Limitation
       – LOC alone is not enough to capture software complexity
     (A worked example of this equation is sketched below.)
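  As a quick illustration (not from the slides), here is a minimal sketch that evaluates Akiyama's fitted equation for a few module sizes; the module sizes are made-up values used only for demonstration.

  ```python
  def akiyama_defects(loc: int) -> float:
      """Akiyama's fitted equation (IFIP 1971): expected defects from lines of code."""
      return 4.86 + 0.018 * loc

  # Hypothetical module sizes (LOC), only to illustrate the equation.
  for loc in (100, 1_000, 10_000):
      print(f"{loc:>6} LOC -> {akiyama_defects(loc):6.1f} expected defects")

  # 1 KLOC -> 4.86 + 0.018 * 1000 = 22.86, i.e. the "~23 defects per KLOC" on the slide.
  ```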
  10. Defect Prediction Approaches (roadmap timeline, 1970s-2010s)
      • Metrics: LOC, Cyclomatic Metric, Halstead Metrics
      • Models: Simple Model, Fitting Model

  11. Complexity Metrics and Fitting Models
      • Cyclomatic complexity metrics (McCabe`76)
        – "Logical complexity" of a program represented in a control flow graph
        – V(G) = #edges - #nodes + 2
      • Halstead complexity metrics (Halstead`77)
        – Metrics based on # of operators and operands
        – Volume = N * log2(n)
        – # of defects = Volume / 3000
      (A small sketch computing these metrics follows.)
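  A minimal sketch of the two formulas on this slide, using made-up edge/node and operator/operand counts; N is the total number of operators and operands, n the number of distinct ones.

  ```python
  import math

  def cyclomatic_complexity(num_edges: int, num_nodes: int) -> int:
      """McCabe's V(G) = E - N + 2 for a single connected control-flow graph."""
      return num_edges - num_nodes + 2

  def halstead_volume(total_operators: int, total_operands: int,
                      distinct_operators: int, distinct_operands: int) -> float:
      """Halstead volume: V = N * log2(n), with program length N and vocabulary n."""
      length = total_operators + total_operands             # N
      vocabulary = distinct_operators + distinct_operands   # n
      return length * math.log2(vocabulary)

  # Made-up counts for a small function, only to show how the formulas combine.
  v_g = cyclomatic_complexity(num_edges=9, num_nodes=8)     # V(G) = 3
  volume = halstead_volume(50, 30, 12, 10)                  # V = 80 * log2(22)
  estimated_defects = volume / 3000                         # the slide's V / 3000 estimate
  print(v_g, round(volume, 1), round(estimated_defects, 3))
  ```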
  12. Complexity Metrics and Fitting Models
      • Limitation
        – Do not capture the complexity (amount) of change.
        – Most studies in the 1970s and early 1980s built fitting models, not prediction models
          • Correlation analysis between metrics and # of defects, via linear regression models
          • Models were not validated on new entities (modules).

  13. Defect Prediction Approaches (roadmap timeline, 1970s-2010s)
      • Metrics: LOC, Cyclomatic Metric, Halstead Metrics, Process Metrics
      • Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification)

  14. Regression Model
      • Shen et al.'s empirical study (Shen@TSE`85)
        – Linear regression model
        – Validated on actual new modules
        – Metrics
          • Halstead metrics, # of conditional statements
          • Process metrics: delta of complexity metrics between two successive system versions
        – Measures
          • Between actual and predicted # of defects on new modules
          • MRE (mean magnitude of relative error): average of |D - D'| / D over all modules,
            where D is the actual and D' the predicted # of defects
          • Reported MRE = 0.48
      (A small sketch of the MRE computation follows.)
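  To make the MRE measure concrete, here is a minimal sketch with made-up actual and predicted defect counts; it simply averages the per-module relative error magnitude.

  ```python
  def mean_relative_error(actual, predicted):
      """Mean magnitude of relative error: mean of |D - D'| / D over modules with D > 0."""
      errors = [abs(d - d_hat) / d for d, d_hat in zip(actual, predicted) if d > 0]
      return sum(errors) / len(errors)

  # Hypothetical per-module defect counts, for illustration only.
  actual    = [4, 10, 2, 7]
  predicted = [6,  8, 1, 5]
  print(round(mean_relative_error(actual, predicted), 2))
  ```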
  15. Classification Model
      • Discriminant analysis by Munson et al. (Munson@TSE`92)
      • Logistic regression
      • High-risk vs. low-risk modules
      • Metrics
        – Halstead and cyclomatic complexity metrics
      • Measures
        – Type I error: false positive rate
        – Type II error: false negative rate
      • Result
        – Accuracy: 92% (6 misclassifications out of 78 modules)
        – Precision: 85%
        – Recall: 73%
        – F-measure: 88%

  16. Defect Prediction Process (Based on Machine Learning)
      • Generate instances from software archives: one instance per entity, with metrics (features) and a label
        (buggy/clean or # of defects)
      • Preprocess the labeled instances into training instances
      • Build a classification / regression model
      • Predict the labels of new (unlabeled) instances with the model
      (A minimal end-to-end sketch of this process follows.)
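  The slide shows this process as a diagram; below is a minimal, hypothetical sketch of the same pipeline using scikit-learn with a synthetic metrics table. The feature values and labels are made up and stand in for metrics such as LOC, complexity, or churn.

  ```python
  import numpy as np
  from sklearn.model_selection import train_test_split
  from sklearn.preprocessing import StandardScaler
  from sklearn.pipeline import make_pipeline
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)

  # 1. Generate instances: one row per module, columns are metrics (features).
  X = rng.normal(size=(200, 4))
  y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=200) > 0.8).astype(int)  # 1 = buggy

  # 2. Split into training instances and "new" instances.
  X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.3, random_state=0)

  # 3. Preprocess and build a classification model.
  model = make_pipeline(StandardScaler(), LogisticRegression())
  model.fit(X_train, y_train)

  # 4. Predict labels (buggy / clean) for the new instances.
  print(model.predict(X_new)[:10])
  print("holdout accuracy:", round(model.score(X_new, y_new), 2))
  ```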
  17. Defect Prediction (Based on Machine Learning)
      • Limitations
        – Limited resources for process metrics
          • Error fixes in the unit-testing phase were conducted informally by individual developers,
            so no error information is available for this phase (Shen@TSE`85)
        – Existing metrics were not enough to capture the complexity of object-oriented (OO) programs
        – Helpful for quality assurance teams but not for individual developers

  18. Defect Prediction Approaches (roadmap timeline, 1970s-2010s)
      • Metrics: LOC, Cyclomatic Metric, Halstead Metrics, CK Metrics, Process Metrics, History Metrics
      • Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification),
        Just-In-Time Prediction Model, Practical Model and Applications

  19. Defect Prediction Approaches (roadmap repeated; next topic: Just-In-Time Prediction Models)

  20. Risk Prediction of Software Changes (Mockus@BLTJ`00)
      • Logistic regression
      • Change metrics
        – LOC added/deleted/modified
        – Diffusion of change
        – Developer experience
      • Result
        – Both false positive and false negative rates: 20% in the best case

  21. Risk Prediction of Software Changes (Mockus@BLTJ`00)
      • Advantage
        – Shows a feasible model in practice
      • Limitation
        – Conducted 3 times per week, so not fully Just-In-Time
        – Validated on only one commercial system (5ESS switching system software)

  22. BugCache (Kim@ICSE`07)
      • Approach: maintain defect-prone entities in a cache
      • Result
        – The top 10% of files account for 73-95% of defects on 7 systems

  23. BugCache (Kim@ICSE`07)
      • Advantages
        – The cache can be updated quickly at low cost (cf. static models based on machine learning)
        – Just-In-Time: always available whenever QA teams want the list of defect-prone entities
      • Limitations
        – The cache is not reusable for other software projects
        – Designed for QA teams
          • Applicable only at certain points after a batch of changes (e.g., end of a sprint)
          • Still of limited use to individual developers during development

  24. Change Classification (Kim@TSE`08)
      • Classification model based on SVM
      • About 11,500 features
        – Change metadata such as changed LOC and change count
        – Complexity metrics
        – Text features from change log messages, source code, and file names
      • Results
        – 78% accuracy and 60% recall on average across 12 open-source projects
      (A toy sketch of such a change classifier is given below.)
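  A minimal, hypothetical sketch of a change classifier in this spirit: bag-of-words features from commit messages combined with simple change metadata, fed to a linear SVM. The example data and feature choices are illustrative only and far smaller than the roughly 11,500 features used in the paper.

  ```python
  import pandas as pd
  from sklearn.compose import ColumnTransformer
  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.pipeline import make_pipeline
  from sklearn.svm import LinearSVC

  # Tiny made-up change history: commit message text plus change metadata.
  changes = pd.DataFrame({
      "message":     ["fix null pointer in parser", "add logging option",
                      "refactor cache eviction",    "fix off-by-one in loop"],
      "loc_added":   [12, 85, 40, 3],
      "loc_deleted": [4, 0, 35, 1],
      "buggy":       [1, 0, 0, 1],   # label: did this change later need a fix?
  })

  features = ColumnTransformer([
      ("text", CountVectorizer(), "message"),                  # word features from log messages
      ("meta", "passthrough", ["loc_added", "loc_deleted"]),   # change metadata
  ])

  clf = make_pipeline(features, LinearSVC())
  clf.fit(changes, changes["buggy"])
  print(clf.predict(changes))   # resubstitution only; a real study would cross-validate
  ```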
  25. Change Classification (Kim@TSE`08)
      • Limitations
        – Heavy model (about 11,500 features)
        – Not validated on commercial software products

  26. Follow-up Studies
      • Studies addressing these limitations
        – "Reducing Features to Improve Code Change-Based Bug Prediction" (Shivaji@TSE`13)
          • With less than 10% of all features, buggy F-measure improves by 21%
        – "Software Change Classification using Hunk Metrics" (Ferzund@ICSM`09)
          • 27 hunk-level metrics for change classification
          • 81% accuracy, 77% buggy hunk precision, and 67% buggy hunk recall
        – "A large-scale empirical study of just-in-time quality assurance" (Kamei@TSE`13)
          • 14 process metrics (mostly from Mockus`00)
          • 68% accuracy and 64% recall on 11 open-source and commercial projects
        – "An Empirical Study of Just-In-Time Defect Prediction Using Cross-Project Models" (Fukushima@MSR`14)
          • Median AUC: 0.72

  27. Challenges of the JIT Model
      • Practical validation is difficult
        – Only 10-fold cross validation in the current literature
        – No validation in a realistic scenario (e.g., online machine learning)
      • Reviewing a huge change is still difficult
        – Needs fine-grained prediction within a change (e.g., line-level prediction)

  28. Next Steps of Defect Prediction (timeline, 1980s-2020s)
      • Online Learning JIT Model; Fine-grained Prediction
        (building on Prediction Models, Just-In-Time Prediction Models, and Process Metrics)

  29. Defect Prediction Approaches (roadmap repeated; next topic: Practical Model and Applications)

  30. Defect Prediction in Industry
      • "Predicting the location and number of faults in large software systems" (Ostrand@TSE`05)
        – Two industrial systems
        – Recall: 86%
        – The 20% most fault-prone modules account for 62% of faults

  31. Case Study for a Practical Model
      • "Does Bug Prediction Support Human Developers? Findings From a Google Case Study" (Lewis@ICSE`13)
        – No identifiable change in developer behavior after using the defect prediction model
      • Required characteristics, but very challenging
        – Actionable messages / obvious reasoning

  32. Next Steps of Defect Prediction (timeline, 1980s-2020s)
      • Actionable Defect Prediction
        (building on Prediction Models, Just-In-Time Prediction Models, Practical Models and Applications, and Process Metrics)

  33. Evaluation Measure for a Practical Model
      • Measure prediction performance based on code review effort
      • AUCEC (Area Under the Cost Effectiveness Curve)
        – [Plot: percent of bugs found vs. percent of LOC inspected, comparing two models M1 and M2]
        – Rahman@FSE`11, "BugCache for inspections: Hit or miss?"
      (A small sketch of an AUCEC computation follows.)
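  A minimal sketch, under the assumption that files are ranked by predicted risk and AUCEC is the area under the curve of cumulative bugs found against cumulative LOC inspected, normalized to [0, 1]. The file data below is made up.

  ```python
  import numpy as np

  def aucec(loc, bugs, risk_scores):
      """Area under the cost-effectiveness curve: rank files by predicted risk, then
      integrate fraction of bugs found against fraction of LOC inspected (trapezoids)."""
      order = np.argsort(-np.asarray(risk_scores, float))     # inspect riskiest files first
      loc_cum = np.cumsum(np.asarray(loc, float)[order])
      bug_cum = np.cumsum(np.asarray(bugs, float)[order])
      x = np.concatenate(([0.0], loc_cum / loc_cum[-1]))      # fraction of LOC inspected
      y = np.concatenate(([0.0], bug_cum / bug_cum[-1]))      # fraction of bugs found
      return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

  # Hypothetical files: LOC, known bug counts, and a model's risk scores.
  loc   = [100, 400, 50, 250, 200]
  bugs  = [  3,   1,  2,   0,   1]
  score = [0.9, 0.2, 0.8, 0.1, 0.4]
  print(round(aucec(loc, bugs, score), 3))   # ~0.5 corresponds to a roughly random ordering
  ```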
  34. Practical Application
      • What else can we do with defect prediction models?
        – Test case selection in regression testing (Engstrom@ICST`10)
        – Prioritizing warnings from FindBugs (Rahman@ICSE`14)

  35. Defect Prediction Approaches (roadmap repeated; next topic: CK Metrics)

  36. Representative OO Metrics
      • CK metrics (Chidamber & Kemerer@TSE`94)

        Metric | Description
        WMC    | Weighted Methods per Class (# of methods)
        DIT    | Depth of Inheritance Tree (# of ancestor classes)
        NOC    | Number of Children
        CBO    | Coupling Between Objects (# of coupled classes)
        RFC    | Response For a Class (WMC + # of methods called by the class)
        LCOM   | Lack of Cohesion in Methods (# of "connected components")

      • Prediction performance of CK vs. code metrics (Basili@TSE`96)
        – F-measure: 70% vs. 60%

  37. Defect Prediction Approaches (roadmap repeated; next topic: History Metrics)

  38. Representative History Metrics

        Name                            | # of metrics | Metric source | Citation
        Relative code change churn      | 8            | SW Repo.*     | Nagappan@ICSE`05
        Change                          | 17           | SW Repo.      | Moser@ICSE`08
        Change Entropy                  | 1            | SW Repo.      | Hassan@ICSE`09
        Code metric churn, Code Entropy | 2            | SW Repo.      | D'Ambros@MSR`10
        Popularity                      | 5            | Email archive | Bacchelli@FASE`10
        Ownership                       | 4            | SW Repo.      | Bird@FSE`11
        Micro Interaction Metrics (MIM) | 56           | Mylyn         | Lee@FSE`11

        * SW Repo. = version control system + issue tracking system

  39. Representative History Metrics
      • Advantage
        – Better prediction performance than code metrics
      • [Bar chart: performance improvement (0% to 60%) of all metrics vs. code complexity metrics for
        Moser`08, Hassan`09, D'Ambros`10, Bacchelli`10, Bird`11, and Lee`11; the underlying measure
        varies by study (F-measure, absolute prediction error, or Spearman correlation).
        Bird`11's results compare two metrics vs. code metrics; no comparison data in Nagappan`05.]

  40. History Metrics
      • Limitations
        – History metrics do not capture particular program characteristics such as developer social
          networks, component networks, and anti-patterns
        – Noisy data
          • Bias in bug-fix datasets (Bird@FSE`09)
        – Not applicable to new projects or projects lacking historical data
  41. Defect Prediction Approaches (full roadmap timeline, 1970s-2010s)
      • Metrics: LOC, Cyclomatic Metric, Halstead Metrics, CK Metrics, Process Metrics, History Metrics, Other Metrics
      • Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification),
        Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction, Universal Model
      • Others: Noise Reduction, Semi-supervised/Active Learning, Cross-Project Feasibility

  42. Defect Prediction Approaches (roadmap repeated; next topic: Other Metrics)

  43. Other Metrics

        Name                     | # of metrics | Metric source                  | Citation
        Component network        | 28           | Binaries (Windows Server 2003) | Zimmermann@ICSE`08
        Developer-Module network | 9            | SW Repo. + binaries            | Pinzger@FSE`08
        Developer social network | 4            | SW Repo.                       | Meneely@FSE`08
        Anti-pattern             | 4            | SW Repo. + design patterns     | Taba@ICSM`13

        * SW Repo. = version control system + issue tracking system

  44. Defect Prediction Approaches (roadmap repeated; next topic: Noise Reduction)

  45. Noise Reduction
      • Noise detection and elimination algorithm (Kim@ICSE`11)
        – Closest List Noise Identification (CLNI)
          • Based on Euclidean distance between instances
        – Average F-measure improvement: 0.504 → 0.621
      • ReLink (Wu@FSE`11)
        – Recovers missing links between bugs and changes
        – 60% → 78% recall for missing links
        – F-measure improvement, e.g. 0.698 (traditional) → 0.731 (ReLink)

  46. Defect Prediction Approaches (roadmap repeated; next topic: approaches for new projects)

  47. Defect Prediction for New Software Projects
      • Universal defect prediction model
      • Semi-supervised / active learning
      • Cross-project defect prediction

  48. Universal Defect Prediction Model (Zhang@MSR`14)
      • Context-aware rank transformation
        – Transforms metric values into ranks from 1 to 10 across all projects
      • Model built from 1,398 projects collected from SourceForge and Google Code
  49. Defect Prediction Approaches (roadmap repeated; next topic: Semi-supervised/Active Learning)

  50. Other Approaches for CPDP
      • Semi-supervised learning with dimension reduction for defect prediction (Lu@ASE`12)
        – Trains a model on a small set of labeled instances together with many unlabeled instances
        – AUC improvement: 0.83 → 0.88 with 2% labeled instances
      • Sample-based semi-supervised / active learning for defect prediction (Li@AESEJ`12)
        – Average F-measure: 0.628 → 0.685 with 10% sampled instances

  51. Defect Prediction Approaches (roadmap repeated; next topic: Cross-Project Prediction)

  52. Cross-Project Defect Prediction (CPDP)
      • For a new project or a project lacking historical data
        – Train a model on project A, test it on project B
      • Only 2% out of 622 cross-project prediction combinations worked (Zimmermann@FSE`09)

  53. Transfer Learning (TL)
      • Traditional machine learning (ML): a separate learning system is trained for each domain
      • Transfer learning: knowledge is transferred from a source learning system to a target one
      • Pan et al.@TNN`10, "Domain Adaptation via Transfer Component Analysis"

  54. CPDP: Adopting Transfer Learning

        Transfer learning | Metric Compensation     | NN Filter                      | TNB                      | TCA+
        Preprocessing     | N/A                     | Feature selection, log-filter  | Log-filter               | Normalization
        Machine learner   | C4.5                    | Naive Bayes                    | TNB                      | Logistic Regression
        # of subjects     | 2                       | 10                             | 10                       | 8
        # of predictions  | 2                       | 10                             | 10                       | 26
        Avg. F-measure    | 0.67 (W: 0.79, C: 0.58) | 0.35 (W: 0.37, C: 0.26)        | 0.39 (NN: 0.35, C: 0.33) | 0.46 (W: 0.46, C: 0.36)
        Citation          | Watanabe@PROMISE`08     | Turhan@ESEJ`09                 | Ma@IST`12                | Nam@ICSE`13

        * NN = nearest neighbor, W = within, C = cross

  55. Metric Compensation (Watanabe@PROMISE`08)
      • Key idea: rescale each target metric so its average matches the source project's average
        – new target metric value = target metric value * (average source metric value / average target metric value)
      (A small sketch of this rescaling is given below.)
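  A minimal sketch of the compensation formula, assuming source and target are given as per-instance metric matrices with the same columns; the arrays are made up.

  ```python
  import numpy as np

  def compensate(target: np.ndarray, source: np.ndarray) -> np.ndarray:
      """Watanabe-style metric compensation: scale each target metric column so that
      its mean matches the corresponding source column mean."""
      return target * (source.mean(axis=0) / target.mean(axis=0))

  # Hypothetical metric matrices (rows = modules, columns = metrics such as LOC, WMC, ...).
  source = np.array([[120., 8.], [300., 15.], [60., 3.]])
  target = np.array([[40., 20.], [80., 35.]])
  print(compensate(target, source))   # target rescaled toward the source's metric scale
  ```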
  56. Metric Compensation (cont.) (Watanabe@PROMISE`08)
      • See the comparison table on Slide 54 (C4.5 learner; avg. F-measure 0.67, W: 0.79, C: 0.58)

  57. NN Filter (Turhan@ESEJ`09)
      • Key idea: train only on source instances that resemble the target ("Hey, you look like me!
        Could you be my model?")
      • Nearest neighbor filter
        – Select the 10 nearest source instances of each target instance
      (A small sketch of this filter follows.)
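  A minimal sketch of the nearest-neighbor filter, assuming source and target instances share the same metric columns; it keeps the union of each target instance's k nearest source instances. The data is synthetic.

  ```python
  import numpy as np
  from sklearn.neighbors import NearestNeighbors

  def nn_filter(source_X, target_X, k=10):
      """Keep the union of the k nearest source instances of every target instance."""
      k = min(k, len(source_X))
      nn = NearestNeighbors(n_neighbors=k).fit(source_X)
      _, idx = nn.kneighbors(target_X)      # shape: (n_target, k)
      return np.unique(idx)                 # de-duplicated source row indices

  rng = np.random.default_rng(1)
  source_X = rng.normal(size=(50, 4))              # hypothetical source-project metrics
  target_X = rng.normal(loc=0.5, size=(8, 4))      # hypothetical target-project metrics
  selected = nn_filter(source_X, target_X, k=10)
  print(len(selected), "of", len(source_X), "source instances kept for training")
  ```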
  58. NN Filter (cont.) (Turhan@ESEJ`09)
      • See the comparison table on Slide 54 (feature selection + log-filter; Naive Bayes;
        avg. F-measure 0.35, W: 0.37, C: 0.26)

  59. Transfer Naive Bayes (Ma@IST`12)
      • Key idea: give more weight to source instances that are similar to the target
        ("Hey, you look like me! You will get more chance to be my best model!") when
        building the Naive Bayes model

  60. Transfer Naive Bayes (cont.) (Ma@IST`12)
      • Transfer Naive Bayes
        – New (weighted) prior probability
        – New (weighted) conditional probability

  61. Transfer Naive Bayes (cont.) (Ma@IST`12)
      • How to find source instances similar to the target
        – A similarity score si per source instance: the number of its k feature values that fall
          within the target's [min, max] range for that feature
        – A weight value derived from si
      • Example (k = 4 features; target min = [1, 2, 0, 1], max = [7, 3, 2, 5]):
        – source instance (5, 4, 2, 2) is in range on 3 features, so si = 3
        – source instance (0, 2, 5, 9) is in range on 1 feature, so si = 1
      (A hedged sketch of this scoring follows.)
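  A sketch of the similarity score from the slide's example table: for each source instance, count how many of its feature values fall within the target's per-feature [min, max] range. The weight shown is a simple stand-in (score / k); the paper defines its own weighting formula, which is not reproduced here.

  ```python
  import numpy as np

  def similarity_scores(source_X: np.ndarray, target_X: np.ndarray) -> np.ndarray:
      """For each source instance, count features whose values lie within the
      target data's [min, max] range for that feature (k = number of features)."""
      t_min = target_X.min(axis=0)
      t_max = target_X.max(axis=0)
      in_range = (source_X >= t_min) & (source_X <= t_max)
      return in_range.sum(axis=1)

  # Numbers from the slide's example (target min = [1, 2, 0, 1], max = [7, 3, 2, 5]).
  target_X = np.array([[1, 2, 0, 1], [7, 3, 2, 5]])
  source_X = np.array([[5, 4, 2, 2],    # 3 features in range -> score 3
                       [0, 2, 5, 9]])   # 1 feature in range  -> score 1
  scores = similarity_scores(source_X, target_X)
  weights = scores / source_X.shape[1]  # stand-in weight; Ma@IST`12 uses its own formula
  print(scores, weights)
  ```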
  62. Transfer Naive Bayes (cont.) (Ma@IST`12)
      • See the comparison table on Slide 54 (log-filter; TNB learner; avg. F-measure 0.39, NN: 0.35, C: 0.33)

  63. TCA+ (Nam@ICSE`13)
      • Key idea: TCA (Transfer Component Analysis)
        – Source and target are different ("Oops, we are different! Let's meet in another world!"),
          so map both into a shared latent feature space (new source, new target)

  64. Transfer Component Analysis (cont.)
      • Feature extraction approach
        – Dimensionality reduction
        – Projection: map the original data into a lower-dimensional feature space
          (e.g., from a 2-dimensional to a 1-dimensional feature space)

  65. TCA (cont.)
      • [Scatter plot: source domain data vs. target domain data before TCA]
      • Pan et al.@TNN`10, "Domain Adaptation via Transfer Component Analysis"

  66. TCA (cont.)
      • [Scatter plot: the same data after TCA, with source and target distributions brought closer]
      • Pan et al.@TNN`10

  67. TCA+ (Nam@ICSE`13)
      • TCA: map source and target into another space, "but we are still a bit different!"
      • TCA+: "normalize us together" before applying TCA

  68. Normalization Options
      • NoN: no normalization applied
      • N1: min-max normalization (max = 1, min = 0)
      • N2: z-score normalization (mean = 0, std = 1)
      • N3: z-score normalization using only the source mean and standard deviation
      • N4: z-score normalization using only the target mean and standard deviation
      (A small sketch of these options is given below.)
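  A minimal sketch of the five options, assuming source and target are metric matrices with matching columns: N1 and N2 scale each dataset by its own statistics, while N3 and N4 scale both datasets with one dataset's statistics. The arrays are toy values for illustration.

  ```python
  import numpy as np

  def minmax(X, ref):
      """Min-max scaling of X using ref's per-feature min and max."""
      lo, hi = ref.min(axis=0), ref.max(axis=0)
      return (X - lo) / (hi - lo)

  def zscore(X, ref):
      """Z-score scaling of X using ref's per-feature mean and std."""
      return (X - ref.mean(axis=0)) / ref.std(axis=0)

  def normalize(source, target, option):
      if option == "NoN":   # no normalization
          return source, target
      if option == "N1":    # each dataset scaled by its own min/max
          return minmax(source, source), minmax(target, target)
      if option == "N2":    # each dataset scaled by its own mean/std
          return zscore(source, source), zscore(target, target)
      if option == "N3":    # both scaled by the source statistics
          return zscore(source, source), zscore(target, source)
      if option == "N4":    # both scaled by the target statistics
          return zscore(source, target), zscore(target, target)
      raise ValueError(option)

  src = np.array([[10., 200.], [20., 400.], [30., 900.]])   # hypothetical source metrics
  tgt = np.array([[1., 50.], [3., 90.]])                    # hypothetical target metrics
  for opt in ("NoN", "N1", "N2", "N3", "N4"):
      s, t = normalize(src, tgt, opt)
      print(opt, s.mean().round(2), t.mean().round(2))
  ```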
  69. Preliminary Results using TCA
      • [Bar chart: F-measure (0 to 0.8) for Project A → Project B and Project B → Project A
        under Baseline*, NoN, N1, N2, N3, and N4]
      • Prediction performance of TCA varies according to the normalization option!
      * Baseline: cross-project defect prediction without TCA and without normalization

  70. TCA+: Decision Rules
      • Find a suitable normalization option for TCA
      • Steps
        – #1: Characterize a dataset
        – #2: Measure similarity between the source and target datasets
        – #3: Apply decision rules

  71. TCA+: #1. Characterize a Dataset
      • Compute the set of pairwise distances between instances within each dataset:
        DIST = {d_ij : 1 ≤ i < j ≤ n}, where d_ij is the distance between instances i and j

  72. TCA+: #2. Measure Similarity between Source and Target
      • Minimum (min) and maximum (max) values of DIST
      • Mean and standard deviation (std) of DIST
      • The number of instances

  73. TCA+: #3. Decision Rules
      • Rule #1: mean and std are the same → NoN
      • Rule #2: max and min are different → N1 (max = 1, min = 0)
      • Rules #3, #4: std and # of instances are different → N3 or N4 (source/target mean = 0, std = 1)
      • Rule #5: default → N2 (mean = 0, std = 1)
      (A hedged sketch of the characterization step and these rules follows.)
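  A hedged sketch of steps #1 and #2 plus a simplified rule lookup. "Same" and "different" are approximated here with a crude relative-difference threshold, which is an assumption for illustration, not the paper's exact similarity procedure.

  ```python
  import numpy as np

  def pairwise_distances(X):
      """DIST = {d_ij : i < j}: Euclidean distances between all instance pairs."""
      diff = X[:, None, :] - X[None, :, :]
      d = np.sqrt((diff ** 2).sum(-1))
      return d[np.triu_indices(len(X), k=1)]

  def dist_characteristics(X):
      """Steps #1/#2: summary statistics of the within-dataset pairwise distances."""
      d = pairwise_distances(X)
      return {"min": d.min(), "max": d.max(), "mean": d.mean(), "std": d.std(), "n": len(X)}

  def roughly_same(a, b, tol=0.1):
      """Crude similarity test (assumption): relative difference below tol."""
      return abs(a - b) <= tol * max(abs(a), abs(b), 1e-12)

  def choose_normalization(src_X, tgt_X):
      s, t = dist_characteristics(src_X), dist_characteristics(tgt_X)
      if roughly_same(s["mean"], t["mean"]) and roughly_same(s["std"], t["std"]):
          return "NoN"            # Rule #1
      if not roughly_same(s["max"], t["max"]) and not roughly_same(s["min"], t["min"]):
          return "N1"             # Rule #2
      if not roughly_same(s["std"], t["std"]) and not roughly_same(s["n"], t["n"]):
          return "N3 or N4"       # Rules #3/#4
      return "N2"                 # Rule #5 (default)

  rng = np.random.default_rng(2)
  src_X = rng.normal(size=(40, 5))                       # hypothetical source metrics
  tgt_X = rng.normal(loc=2.0, scale=3.0, size=(25, 5))   # hypothetical target metrics
  print(choose_normalization(src_X, tgt_X))
  ```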
  74. TCA+ (cont.) (Nam@ICSE`13)
      • See the comparison table on Slide 54 (normalization; logistic regression;
        avg. F-measure 0.46, W: 0.46, C: 0.36)

  75. Current CPDP using TL
      • Advantages
        – Prediction performance comparable to within-project prediction models
        – Benefits from state-of-the-art transfer learning approaches
      • Limitation
        – Performance of some cross-prediction pairs is still poor (negative transfer)

  76. Defect Prediction Approaches (roadmap repeated; next topic: Cross-Project Feasibility)

  77. Feasibility Evaluation for CPDP
      • Solution for negative transfer
        – Decision tree using project characteristic metrics (Zimmermann@FSE`09)
          • e.g., programming language, # of developers, etc.

  78. Follow-up Studies
      • "An investigation on the feasibility of cross-project defect prediction" (He@ASEJ`12)
        – Decision tree using distributional characteristics of a dataset
          • e.g., mean, skewness, peakedness, etc.

  79. Feasibility for CPDP
      • Challenges in current studies
        – The decision trees were not evaluated properly (just fitting models)
        – Low target prediction coverage
          • Only 5 out of 34 target projects were feasible for cross-prediction (He@ASEJ`12)

  80. Next Steps of Defect Prediction (timeline, 1980s-2020s)
      • Cross-Prediction Feasibility Model
        (building on Cross-Project Prediction, Cross-Project Feasibility, and the existing metrics and models)
  81. Defect Prediction Approaches (roadmap repeated, now also listing Personalized Model;
      next topic: remaining cross-prediction challenges)

  82. Cross-prediction Model
      • Common challenge
        – Current cross-prediction models are limited to datasets with the same number of metrics
        – Not applicable to projects with different feature spaces (different domains)
          • NASA dataset: Halstead, LOC
          • Apache dataset: LOC, cyclomatic, CK metrics

  83. Next Steps of Defect Prediction (timeline, 1980s-2020s)
      • Cross-Domain Prediction
        (building on Cross-Project Prediction, the Universal Model, and the existing metric families)

  84. Other Topics

  85. Defect Prediction Approaches (roadmap repeated, now also listing Data Privacy; next topic: Other Topics)

  86. Other Topics
      • Privacy issues with defect datasets
        – MORPH (Peters@ICSE`12)
          • Mutates defect datasets while keeping prediction accuracy
          • Can accelerate cross-project defect prediction with industrial datasets
      • Personalized defect prediction model (Jiang@ASE`13)
        – "Different developers have different coding styles, commit frequencies, and experience levels,
          all of which cause different defect patterns."
        – Results
          • Average F-measure: 0.62 (personalized models) vs. 0.59 (non-personalized models)
  87. Outline (repeat of Slide 2; transition to Summary and Challenging Issues)

  88. Defect Prediction Approaches (full roadmap summary, 1970s-2010s)
      • Metrics: LOC, Cyclomatic Metric, Halstead Metrics, CK Metrics, Process Metrics, History Metrics, Other Metrics
      • Models: Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification),
        Just-In-Time Prediction Model, Practical Model and Applications, Cross-Project Prediction,
        Universal Model, Personalized Model
      • Others: Noise Reduction, Data Privacy, Semi-supervised/Active Learning, Cross-Project Feasibility

  89. Next Steps of Defect Prediction (timeline, 1980s-2020s)
      • Online Learning JIT Model; Fine-grained Prediction; Actionable Defect Prediction;
        Cross-Prediction Feasibility Model; Cross-Domain Prediction
        (building on the existing metrics, models, and other topics above)

  90. Thank you!

  91. (blank divider before backup slides)

  92. Evaluation Measures (classification)
      • Measures for binary classification
        – Confusion matrix:

                          Predicted buggy        Predicted clean
          Actual buggy    True Positive (TP)     False Negative (FN)
          Actual clean    False Positive (FP)    True Negative (TN)

  93. Evaluation Measures (classification)
      • False positive rate (FPR, PF) = FP / (TN + FP)
      • Accuracy = (TP + TN) / (TP + FP + TN + FN)
      • Precision = TP / (TP + FP)
      • Recall = TP / (TP + FN)
      • F-measure = (2 * Precision * Recall) / (Precision + Recall)
      (A small sketch computing these measures follows.)
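  A minimal sketch computing the listed measures directly from raw confusion-matrix counts; the counts are made up.

  ```python
  def classification_measures(tp: int, fp: int, tn: int, fn: int) -> dict:
      """Standard binary-classification measures derived from a confusion matrix."""
      precision = tp / (tp + fp)
      recall = tp / (tp + fn)
      return {
          "fpr": fp / (tn + fp),                          # false positive rate (PF)
          "accuracy": (tp + tn) / (tp + fp + tn + fn),
          "precision": precision,
          "recall": recall,
          "f_measure": 2 * precision * recall / (precision + recall),
      }

  # Hypothetical counts for a buggy-vs-clean classifier.
  print(classification_measures(tp=30, fp=10, tn=50, fn=10))
  ```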
  94. Evaluation Measures (classification)
      • AUC (Area Under the receiver operating characteristic Curve)
        – [ROC plot: true positive rate vs. false positive rate, both ranging from 0 to 1]

  95. Evaluation Measures (classification)
      • AUCEC (Area Under the Cost Effectiveness Curve); see Slide 33
        – [Plot: percent of bugs found vs. percent of LOC inspected, comparing two models M1 and M2]
        – Rahman@FSE`11, "BugCache for inspections: Hit or miss?"

  96. Evaluation Measures (regression)
      • Target
        – Metric values vs. the number of bugs
        – Actual vs. predicted number of bugs
      • Correlation coefficient
        – Spearman / Pearson / R^2
      • Mean squared error

  97. CK Metrics (reference; same table as Slide 36)

        Metric | Description
        WMC    | Weighted Methods per Class (# of methods)
        DIT    | Depth of Inheritance Tree (# of ancestor classes)
        NOC    | Number of Children
        CBO    | Coupling Between Objects (# of coupled classes)
        RFC    | Response For a Class (WMC + # of methods called by the class)
        LCOM   | Lack of Cohesion in Methods (# of "connected components")