- 1. Survey on Software Defect Prediction - PhD Qualifying Examination - July 3, 2014 Jaechang Nam Department of Computer Science and Engineering HKUST
- 2. Outline • Background • Software Defect Prediction Approaches – Simple metric and defect estimation models – Complexity metrics and Fitting models – Prediction models – Just-In-Time Prediction Models – Practical Prediction Models and Applications – History Metrics from Software Repositories – Cross-Project Defect Prediction and Feasibility • Summary and Challenging Issues 2
- 3. Motivation • General question of software defect prediction – Can we identify defect-prone entities (source code file, binary, module, change,...) in advance? • # of defects • buggy or clean • Why? – Quality assurance for large software (Akiyama@IFIP’71) – Effective resource allocation • Testing (Menzies@TSE`07) • Code review (Rahman@FSE’11) 3
- 4. Ground Assumption • The more complex, the more defect- prone 4
- 5. Two Focuses on Defect Prediction • How complex are software and its process? – Metrics • How can we predict whether software has defects? – Models based on the metrics
- 6. Prediction Performance Goal • Recall vs. Precision • Strong predictor criteria – 70% recall and 25% false positive rate (Menzies@TSE`07) – Precision, recall, accuracy ≥ 75% (Zimmermann@FSE`09) 6
- 7. Outline • Background • Software Defect Prediction Approaches – Simple metric and defect estimation models – Complexity metrics and Fitting models – Prediction models – Just-In-Time Prediction Models – Practical Prediction Models and Applications – History Metrics from Software Repositories – Cross-Project Defect Prediction and Feasibility • Summary and Challenging Issues 7
- 8. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model]
- 9. Identifying Defect-prone Entities • Akiyama’s equation (Akiyama@IFIP`71) – # of defects = 4.86 + 0.018 * LOC (LOC = Lines Of Code) • About 23 defects in 1 KLOC • Derived from actual systems • Limitation – LOC alone is not enough to capture software complexity
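As a quick check of the "23 defects in 1 KLOC" figure, the equation on this slide can be evaluated directly. This is a minimal sketch; the function name `akiyama_defects` is illustrative and not from the source.

```python
def akiyama_defects(loc: int) -> float:
    """Estimate the number of defects from lines of code (LOC)
    using the linear equation quoted on the slide."""
    return 4.86 + 0.018 * loc


if __name__ == "__main__":
    # 1 KLOC = 1000 lines of code; rounds to the slide's figure.
    print(round(akiyama_defects(1000)))
```

The intercept means that even a very small module is assigned roughly five defects, which is one reason a LOC-only model is a coarse estimate.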
- 10. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Cyclomatic Metric, Halstead Metrics]
- 11. Complexity Metrics and Fitting Models • Cyclomatic complexity metrics (McCabe`76) – “Logical complexity” of a program represented in control flow graph – V(G) = #edge – #node + 2 • Halstead complexity metrics (Halstead`77) – Metrics based on # of operators and operands – Volume = N * log2(n), where N is the total number of operators and operands and n is the number of distinct operators and operands – # of defects = Volume / 3000
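The three formulas on this slide are simple enough to compute by hand, but a short sketch makes the inputs explicit. The function and parameter names below are illustrative assumptions; counting edges, nodes, operators, and operands from real source code is outside the scope of this sketch.

```python
import math


def cyclomatic_complexity(num_edges: int, num_nodes: int) -> int:
    """V(G) = #edge - #node + 2 for a single control flow graph."""
    return num_edges - num_nodes + 2


def halstead_volume(total_tokens: int, distinct_tokens: int) -> float:
    """Volume = N * log2(n).

    total_tokens (N): total occurrences of operators and operands.
    distinct_tokens (n): number of distinct operators and operands.
    """
    return total_tokens * math.log2(distinct_tokens)


def halstead_defects(volume: float) -> float:
    """# of defects = Volume / 3000."""
    return volume / 3000


if __name__ == "__main__":
    # A graph with 9 edges and 8 nodes has V(G) = 3.
    print(cyclomatic_complexity(9, 8))
    volume = halstead_volume(total_tokens=64, distinct_tokens=16)
    print(volume, halstead_defects(volume))
```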
- 12. Complexity Metrics and Fitting Models • Limitation – Do not capture complexity (amount) of change. – Most studies conducted in the 1970s and early 1980s built fitting models, not prediction models • Correlation analysis between metrics and # of defects – By linear regression models • Models were not validated for new entities (modules).
- 13. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, Process Metrics]
- 14. Regression Model • Shen et al.’s empirical study (Shen@TSE`85) – Linear regression model – Validated on actual new modules – Metrics • Halstead, # of conditional statements • Process metrics – Delta of complexity metrics between two successive system versions – Measures • Between actual and predicted # of defects on new modules – MRE (Mean magnitude of relative error) » average of |D-D’|/D for all modules • D: actual # of defects • D’: predicted # of defects » MRE = 0.48
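A minimal sketch of the MRE measure described on this slide, taking the magnitude of each relative error as the name implies. The function name is illustrative, and skipping modules with zero actual defects (where the relative error is undefined) is an assumption of this sketch, not something the slide specifies.

```python
def mean_relative_error(actual, predicted):
    """Average of |D - D'| / D over modules.

    actual: actual defect counts (D) per module.
    predicted: predicted defect counts (D') per module.
    Modules with D == 0 are skipped (assumption: the relative
    error is undefined for them).
    """
    errors = [abs(d - p) / d for d, p in zip(actual, predicted) if d != 0]
    if not errors:
        raise ValueError("no module with a non-zero actual defect count")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Two modules: |10-5|/10 = 0.5 and |4-6|/4 = 0.5.
    print(mean_relative_error([10, 4], [5, 6]))
```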
- 15. Classification Model • Discriminant analysis by Munson et al. (Munson@TSE`92) • Logistic regression • High risk vs. low risk modules • Metrics – Halstead and Cyclomatic complexity metrics • Measure – Type I error: False positive rate – Type II error: False negative rate • Result – Accuracy: 92% (6 misclassifications out of 78 modules) – Precision: 85% – Recall: 73% – F-measure: 88%
- 16. Defect Prediction Process (Based on Machine Learning) [Figure: software archives → generate instances with metrics (features) and labels (B/C for classification, counts such as 2, 5, 0, 1 for regression) → preprocessing → training instances → build a model (classification / regression) → predict new instances]
- 17. Defect Prediction (Based on Machine Learning) • Limitations – Limited resources for process metrics • Error fix in unit testing phase was conducted informally by an individual developer (no error information available in this phase). (Shen@TSE`85) – Existing metrics were not enough to capture complexity of object-oriented (OO) programs. – Helpful for quality assurance team but not for individual developers 17
- 18. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, Process Metrics, Just-In-Time Prediction Model, Practical Model and Applications, History Metrics, CK Metrics]
- 19. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, Just-In-Time Prediction Model, Practical Model and Applications, Process Metrics, History Metrics, CK Metrics]
- 20. Risk Prediction of Software Changes (Mockus@BLTJ`00) • Logistic regression • Change metrics – LOC added/deleted/modified – Diffusion of change – Developer experience • Result – Both false positive and false negative rate: 20% in the best case 20
- 21. Risk Prediction of Software Changes (Mockus@BLTJ`00) • Advantage – Show the feasible model in practice • Limitation – Conducted 3 times per week • Not fully Just-In-Time – Validated on one commercial system (5ESS switching system software) 21
- 22. BugCache (Kim@ICSE`07) • Maintain defect-prone entities in a cache • Approach • Result – Top 10% files account for 73-95% of defects on 7 systems 22
- 23. BugCache (Kim@ICSE`07) • Advantages – Cache can be updated quickly with less cost. (c.f. static models based on machine learning) – Just-In-Time: always available whenever QA teams want to get the list of defect-prone entities • Limitations – Cache is not reusable for other software projects. – Designed for QA teams • Applicable only in a certain time point after a bunch of changes (e.g., end of a sprint) • Still limited for individual developers in development phase 23
- 24. Change Classification (Kim@TSE`08) • Classification model based on SVM • About 11,500 features – Change metadata such as changed LOC, change count – Complexity metrics – Text features from change log messages, source code, and file names • Results – 78% accuracy and 60% recall on average from 12 open- source projects 24
- 25. Change Classification (Kim@TSE`08) • Limitations – Heavy model (11,500 features) – Not validated on commercial software products. 25
- 26. Follow-up Studies • Studies addressing limitations – “Reducing Features to Improve Code Change-Based Bug Prediction” (Shivaji@TSE`13) • With less than 10% of all features, buggy F-measure improved by 21%. – “Software Change Classification using Hunk Metrics” (Ferzund@ICSM`09) • 27 hunk-level metrics for change classification • 81% accuracy, 77% buggy hunk precision, and 67% buggy hunk recall – “A large-scale empirical study of just-in-time quality assurance” (Kamei@TSE`13) • 14 process metrics (mostly from Mockus`00) • 68% accuracy, 64% recall on 11 open-source and commercial projects – “An Empirical Study of Just-In-Time Defect Prediction Using Cross-Project Models” (Fukushima@MSR`14) • Median AUC: 0.72
- 27. Challenges of JIT model • Practical validation is difficult – Just 10-fold cross validation in current literature – No validation on real scenario • e.g., online machine learning • Still difficult to review huge change – Fine-grained prediction within a change • e.g., Line-level prediction 27
- 28. Next Steps of Defect Prediction [Timeline figure, 1980s to 2020s; rows: Metrics, Models, Others. Items: Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Process Metrics; next steps: Online Learning JIT Model, Fine-grained Prediction]
- 29. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, Just-In-Time Prediction Model, Practical Model and Applications, Process Metrics, History Metrics, CK Metrics]
- 30. Defect Prediction in Industry • “Predicting the location and number of faults in large software systems” (Ostrand@TSE`05) – Two industrial systems – Recall 86% – 20% most fault-prone modules account for 62% faults 30
- 31. Case Study for Practical Model • “Does Bug Prediction Support Human Developers? Findings From a Google Case Study” (Lewis@ICSE`13) – No identifiable change in developer behaviors after using defect prediction model • Required characteristics but very challenging – Actionable messages / obvious reasoning 31
- 32. Next Steps of Defect Prediction [Timeline figure, 1980s to 2020s; rows: Metrics, Models, Others. Items: Prediction Model (Regression), Prediction Model (Classification), Just-In-Time Prediction Model, Practical Model and Applications, Process Metrics; next step: Actionable Defect Prediction]
- 33. Evaluation Measure for Practical Model • Measure prediction performance based on code review effort • AUCEC (Area Under Cost Effectiveness Curve) [Figure: cost-effectiveness curves of two models, M1 and M2; x-axis: percent of LOC (0 to 100%, with 10% and 50% marked); y-axis: percent of bugs found (0 to 100%). Source: Rahman@FSE`11, “BugCache for inspections: hit or miss?”]
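One way to read the AUCEC figure: inspect entities in the order a model ranks them, and track the fraction of bugs found against the fraction of LOC reviewed. The sketch below follows that reading. The `(loc, bugs, risk_score)` tuple layout, the descending-score inspection order, and the trapezoid integration are assumptions of this sketch; the cited paper may define the ordering differently.

```python
def aucec(entities):
    """Area under the cost-effectiveness curve.

    entities: list of (loc, bugs, risk_score) tuples (hypothetical layout).
    Entities are inspected in descending risk_score order; the curve plots
    cumulative fraction of bugs found against cumulative fraction of LOC.
    """
    total_loc = sum(loc for loc, _, _ in entities)
    total_bugs = sum(bugs for _, bugs, _ in entities)
    if total_loc == 0 or total_bugs == 0:
        return 0.0

    ordered = sorted(entities, key=lambda e: e[2], reverse=True)
    area = 0.0
    x_prev = y_prev = 0.0
    cum_loc = cum_bugs = 0
    for loc, bugs, _ in ordered:
        cum_loc += loc
        cum_bugs += bugs
        x = cum_loc / total_loc
        y = cum_bugs / total_bugs
        # Trapezoid between the previous point and the current point.
        area += (x - x_prev) * (y + y_prev) / 2
        x_prev, y_prev = x, y
    return area


if __name__ == "__main__":
    good_ranking = [(10, 1, 0.9), (90, 0, 0.1)]  # small buggy file ranked first
    bad_ranking = [(10, 1, 0.1), (90, 0, 0.9)]   # large clean file ranked first
    print(aucec(good_ranking), aucec(bad_ranking))
```

A model that ranks small, buggy files first gets a larger area, which is the sense in which one curve in the figure dominates the other over part of the LOC range.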
- 34. Practical Application • What else can we do with defect prediction models? – Test case selection on regression testing (Engstrom@ICST`10) – Prioritizing warnings from FindBugs (Rahman@ICSE`14)
- 35. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, CK Metrics, Process Metrics, Practical Model and Applications, Just-In-Time Prediction Model, History Metrics]
- 36. Representative OO Metrics

  | Metric | Description |
  | --- | --- |
  | WMC | Weighted Methods per Class (# of methods) |
  | DIT | Depth of Inheritance Tree (# of ancestor classes) |
  | NOC | Number of Children |
  | CBO | Coupling between Objects (# of coupled classes) |
  | RFC | Response for a Class (WMC + # of methods called by the class) |
  | LCOM | Lack of Cohesion in Methods (# of “connected components”) |

  • CK metrics (Chidamber&Kemerer@TSE`94) • Prediction Performance of CK vs. code (Basili@TSE`96) – F-measure: 70% vs. 60%
- 37. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, CK Metrics, Process Metrics, Practical Model and Applications, Just-In-Time Prediction Model, History Metrics]
- 38. Representative History Metrics

  | Name | # of metrics | Metric source | Citation |
  | --- | --- | --- | --- |
  | Relative code change churn | 8 | SW Repo.* | Nagappan@ICSE`05 |
  | Change | 17 | SW Repo. | Moser@ICSE`08 |
  | Change Entropy | 1 | SW Repo. | Hassan@ICSE`09 |
  | Code metric churn, Code Entropy | 2 | SW Repo. | D’Ambros@MSR`10 |
  | Popularity | 5 | Email archive | Bacchelli@FASE`10 |
  | Ownership | 4 | SW Repo. | Bird@FSE`11 |
  | Micro Interaction Metrics (MIM) | 56 | Mylyn | Lee@FSE`11 |

  \* SW Repo. = version control system + issue tracking system
- 39. Representative History Metrics • Advantage – Better prediction performance than code metrics [Bar chart: performance improvement (%), 0% to 60%, of all metrics vs. code complexity metrics for Moser`08, Hassan`09, D’Ambros`10, Bacchelli`10, Bird`11, and Lee`11; measures used across the studies: F-measure, absolute prediction error, Spearman correlation] * Bird`11’s results are from two metrics vs. code metrics. No comparison data in Nagappan`05.
- 40. History Metrics • Limitations – History metrics do not extract particular program characteristics such as developer social network, component network, and anti-pattern. – Noise data • Bias in Bug-Fix Dataset(Bird@FSE`09) – Not applicable for new projects and projects lacking in historical data 40
- 41. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, CK Metrics, Just-In-Time Prediction Model, Cross-Project Prediction, Practical Model and Applications, Universal Model, Process Metrics, Cross-Project Feasibility, History Metrics, Other Metrics, Noise Reduction, Semi-supervised/active]
- 42. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, CK Metrics, Just-In-Time Prediction Model, Cross-Project Prediction, Practical Model and Applications, Universal Model, Process Metrics, Cross-Project Feasibility, History Metrics, Other Metrics, Noise Reduction, Semi-supervised/active]
- 43. Other Metrics

  | Name | # of metrics | Metric source | Citation |
  | --- | --- | --- | --- |
  | Component network | 28 | Binaries (Windows Server 2003) | Zimmermann@ICSE`08 |
  | Developer-Module network | 9 | SW Repo.* + Binaries | Pinzger@FSE`08 |
  | Developer social network | 4 | SW Repo. | Meneely@FSE`08 |
  | Anti-pattern | 4 | SW Repo. + Design-pattern | Taba@ICSM`13 |

  \* SW Repo. = version control system + issue tracking system
- 44. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, CK Metrics, Just-In-Time Prediction Model, Cross-Project Prediction, Practical Model and Applications, Universal Model, Process Metrics, Cross-Project Feasibility, History Metrics, Other Metrics, Noise Reduction, Semi-supervised/active]
- 45. Noise Reduction • Noise detection and elimination algorithm (Kim@ICSE`11) – Closest List Noise Identification (CLNI) • Based on Euclidean distance between instances – Average F-measure improvement • 0.504 → 0.621 • ReLink (Wu@FSE`11) – Recover missing links between bugs and changes – 60% 78% recall for missing links – F-measure improvement • e.g. 0.698 (traditional) → 0.731 (ReLink)
- 46. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, CK Metrics, Just-In-Time Prediction Model, Cross-Project Prediction, Practical Model and Applications, Universal Model, Process Metrics, Cross-Project Feasibility, History Metrics, Other Metrics, Semi-supervised/active]
- 47. Defect Prediction for New Software Projects • Universal Defect Prediction Model • Semi-supervised / active learning • Cross-Project Defect Prediction
- 48. Universal Defect Prediction Model (Zhang@MSR`14) • Context-aware rank transformation – Transform metric values into ranks ranging from 1 to 10 across all projects. • Model built from 1398 projects collected from SourceForge and Google Code
- 49. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, CK Metrics, Just-In-Time Prediction Model, Cross-Project Prediction, Practical Model and Applications, Universal Model, Process Metrics, Cross-Project Feasibility, History Metrics, Other Metrics, Semi-supervised/active]
- 50. Other approaches for CPDP • Semi-supervised learning with dimension reduction for defect prediction (Lu@ASE`12) – Training a model by a small set of labeled instances together with many unlabeled instances – AUC improvement • 0.83 → 0.88 with 2% labeled instances • Sample-based semi-supervised/active learning for defect prediction (Li@AESEJ`12) – Average F-measure • 0.628 → 0.685 with 10% sampled instances
- 51. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, CK Metrics, Just-In-Time Prediction Model, Cross-Project Prediction, Practical Model and Applications, Universal Model, Process Metrics, Cross-Project Feasibility, History Metrics, Other Metrics, Semi-supervised/active]
- 52. Cross-Project Defect Prediction (CPDP) • For a new project or a project lacking historical data [Figure: a model trained on Project A (training) predicts the unlabeled instances of Project B (test)] • Only 2% out of 622 prediction combinations worked. (Zimmermann@FSE`09)
- 53. Transfer Learning (TL) [Figure: traditional machine learning (ML), with separate learning systems, vs. transfer learning, with knowledge transfer between learning systems. Source: Pan et al.@TNN`10, “Domain Adaptation via Transfer Component Analysis”]
- 54. CPDP • Adopting transfer learning

  | Transfer learning | Metric Compensation | NN Filter | TNB | TCA+ |
  | --- | --- | --- | --- | --- |
  | Preprocessing | N/A | Feature selection, Log-filter | Log-filter | Normalization |
  | Machine learner | C4.5 | Naive Bayes | TNB | Logistic Regression |
  | # of subjects | 2 | 10 | 10 | 8 |
  | # of predictions | 2 | 10 | 10 | 26 |
  | Avg. F-measure | 0.67 (W: 0.79, C: 0.58) | 0.35 (W: 0.37, C: 0.26) | 0.39 (NN: 0.35, C: 0.33) | 0.46 (W: 0.46, C: 0.36) |
  | Citation | Watanabe@PROMISE`08 | Turhan@ESEJ`09 | Ma@IST`12 | Nam@ICSE`13 |

  \* NN = Nearest neighbor, W = Within, C = Cross
- 55. Metric Compensation (Watanabe@PROMISE`08) • Key idea • $\text{new target metric value} = \text{target metric value} \times \dfrac{\text{average source metric value}}{\text{average target metric value}}$ [Figure: source, target, and new target data]
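The compensation formula on this slide can be applied per metric (column). The following is a minimal sketch under stated assumptions: datasets are lists of numeric rows with the same metrics in the same order, the function name is illustrative, and leaving a value unchanged when the target average is zero is a choice made here, not something the slide specifies.

```python
def compensate_target(target_rows, source_rows):
    """Rescale each target metric value by
    (average source metric value / average target metric value)."""
    num_metrics = len(target_rows[0])
    source_avg = [
        sum(row[j] for row in source_rows) / len(source_rows)
        for j in range(num_metrics)
    ]
    target_avg = [
        sum(row[j] for row in target_rows) / len(target_rows)
        for j in range(num_metrics)
    ]
    compensated = []
    for row in target_rows:
        new_row = []
        for j, value in enumerate(row):
            if target_avg[j] == 0:
                new_row.append(value)  # assumption: no rescaling possible
            else:
                new_row.append(value * source_avg[j] / target_avg[j])
        compensated.append(new_row)
    return compensated


if __name__ == "__main__":
    source = [[6.0], [12.0]]  # average 9
    target = [[2.0], [4.0]]   # average 3
    print(compensate_target(target, source))  # target scaled by 9 / 3
```

After compensation, the target's per-metric averages match the source's, so a model trained on the source sees target values on a comparable scale.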
- 56. Metric Compensation (cont.) (Watanabe@PROMISE`08) [CPDP comparison table repeated from slide 54]
- 57. NN filter (Turhan@ESEJ`09) • Key idea • Nearest neighbor filter – Select 10 nearest source instances of each target instance [Figure: a target instance says to a similar source instance, “Hey, you look like me! Could you be my model?”; the selected instances form the new source]
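The selection step can be sketched as follows. Euclidean distance and the union of the per-target selections are assumptions of this sketch (the slide only says "10 nearest source instances of each target instance"), and the function name is illustrative.

```python
import math


def nn_filter(source, target, k=10):
    """Return sorted indices of source instances that are among the
    k nearest neighbors of at least one target instance."""
    selected = set()
    for t in target:
        # Rank all source instances by distance to this target instance.
        ranked = sorted(range(len(source)), key=lambda i: math.dist(source[i], t))
        selected.update(ranked[:k])
    return sorted(selected)


if __name__ == "__main__":
    source = [[0.0], [1.0], [10.0], [11.0]]
    target = [[0.4], [10.6]]
    # With k=1, each target instance keeps only its closest source instance.
    print(nn_filter(source, target, k=1))
```

The filtered source set is then used as training data in place of the full source project.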
- 58. NN filter (cont.) (Turhan@ESEJ`09) [CPDP comparison table repeated from slide 54]
- 59. Transfer Naive Bayes (Ma@IST`12) • Key idea – Provide more weight to similar source instances to build a Naive Bayes model [Figure: a target instance says to a similar source instance, “Hey, you look like me! You will get more chance to be my best model!”; the source instance says to the model builder, “Please, consider me more important than other instances”]
- 60. Transfer Naive Bayes (cont.) (Ma@IST`12) • Transfer Naive Bayes – New prior probability – New conditional probability 60
- 61. Transfer Naive Bayes (cont.) (Ma@IST`12) • How to find similar source instances for target – A similarity score – A weight value

  | | F1 | F2 | F3 | F4 | Score (si) |
  | --- | --- | --- | --- | --- | --- |
  | Max of target | 7 | 3 | 2 | 5 | - |
  | src. inst 1 | 5 | 4 | 2 | 2 | 3 |
  | src. inst 2 | 0 | 2 | 5 | 9 | 1 |
  | Min of target | 1 | 2 | 0 | 1 | - |

  k = # of features, si = score of instance i
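The example table is consistent with a score that counts, for each source instance, how many of its feature values fall inside the target's [min, max] range for that feature. The sketch below implements that reading, which is inferred from the table values rather than stated on the slide; the function name is illustrative, and the weight formula is not reproduced because the slide does not show it.

```python
def similarity_score(instance, target_min, target_max):
    """Count the features of a source instance whose value lies within
    the target's [min, max] range for that feature."""
    return sum(
        1
        for value, low, high in zip(instance, target_min, target_max)
        if low <= value <= high
    )


if __name__ == "__main__":
    # Values from the slide's example table (features F1..F4).
    target_max = [7, 3, 2, 5]
    target_min = [1, 2, 0, 1]
    print(similarity_score([5, 4, 2, 2], target_min, target_max))  # src. inst 1
    print(similarity_score([0, 2, 5, 9], target_min, target_max))  # src. inst 2
```

Both printed scores match the Score column of the table (3 and 1), which supports the range-counting interpretation.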
- 62. Transfer Naive Bayes (cont.) (Ma@IST`12) [CPDP comparison table repeated from slide 54]
- 63. TCA+ (Nam@ICSE`13) • Key idea – TCA (Transfer Component Analysis) [Figure: source and target say, “Oops, we are different! Let’s meet in another world!” and are mapped to a new source and a new target]
- 64. Transfer Component Analysis (cont.) • Feature extraction approach – Dimensionality reduction – Projection • Map original data in a lower-dimensional feature space 64 1-dimensional feature space 2-dimensional feature space
- 65. TCA (cont.) 65 Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis Target domain data Source domain data
- 66. TCA (cont.) 66 TCA Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis
- 67. TCA+ (Nam@ICSE`13) [Figure: TCA vs. TCA+. In both panels, source and target say, “Oops, we are different! Let’s meet in another world!” and are mapped to a new source and a new target. TCA panel caption: “But, we are still a bit different!” TCA+ panel caption: “Normalize us together!”]
- 68. Normalization Options • NoN: No normalization applied • N1: Min-max normalization (max=1, min=0) • N2: Z-score normalization (mean=0, std=1) • N3: Z-score normalization only using source mean and standard deviation • N4: Z-score normalization only using target mean and standard deviation 13
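The five options can be sketched for a single metric (one column of values per dataset). Assumptions in this sketch: N1 and N2 normalize source and target each with their own statistics, the population standard deviation is used, constant columns map to 0.0, and all function names are illustrative.

```python
from statistics import mean, pstdev


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to 0.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def z_score(values, mu, sigma):
    """Standardize values with the given mean and standard deviation."""
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize(source, target, option):
    """Apply one of NoN, N1, N2, N3, N4 to a (source, target) column pair."""
    if option == "NoN":
        return list(source), list(target)
    if option == "N1":
        return min_max(source), min_max(target)
    if option == "N2":
        return (
            z_score(source, mean(source), pstdev(source)),
            z_score(target, mean(target), pstdev(target)),
        )
    if option == "N3":  # source statistics applied to both datasets
        mu, sigma = mean(source), pstdev(source)
        return z_score(source, mu, sigma), z_score(target, mu, sigma)
    if option == "N4":  # target statistics applied to both datasets
        mu, sigma = mean(target), pstdev(target)
        return z_score(source, mu, sigma), z_score(target, mu, sigma)
    raise ValueError(f"unknown normalization option: {option}")


if __name__ == "__main__":
    src, tgt = [0.0, 10.0], [5.0, 15.0]
    for opt in ("NoN", "N1", "N2", "N3", "N4"):
        print(opt, normalize(src, tgt, opt))
```

N3 and N4 differ from N2 in that one dataset's statistics are imposed on the other, so a shift between source and target distributions is preserved rather than removed.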
- 69. Preliminary Results using TCA • Prediction performance of TCA varies according to different normalization options! [Figure: two bar charts of F-measure (0 to 0.8) for Baseline, NoN, N1, N2, N3, and N4; panels: Project A → Project B and Project B → Project A] * Baseline: cross-project defect prediction without TCA and normalization
- 70. TCA+: Decision Rules • Find a suitable normalization for TCA • Steps – #1: Characterize a dataset – #2: Measure similarity between source and target datasets – #3: Decision rules 70
- 71. TCA+: #1. Characterize a Dataset [Figure: instances of Dataset A and Dataset B drawn as numbered points, with pairwise distances such as d1,2, d1,3, d1,5, d2,6, and d3,11 marked] • $DIST_A = \{d_{ij} : 1 \le i < n,\ 1 < j \le n,\ i < j\}$
- 72. TCA+: #2. Measure Similarity between Source and Target • Minimum (min) and maximum (max) values of DIST • Mean and standard deviation (std) of DIST • The number of instances 72
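Steps #1 and #2 can be sketched as computing all pairwise distances within a dataset and then summarizing them with the five values listed above. Euclidean distance, the population standard deviation, and the names `characterize` and the dictionary keys are assumptions of this sketch.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def characterize(instances):
    """Summarize a dataset by its pairwise distances (DIST) and size.

    instances: list of numeric feature vectors (at least two).
    """
    # DIST = {d_ij : i < j}, one distance per unordered pair of instances.
    dist = [math.dist(a, b) for a, b in combinations(instances, 2)]
    return {
        "min": min(dist),
        "max": max(dist),
        "mean": mean(dist),
        "std": pstdev(dist),
        "num_instances": len(instances),
    }


if __name__ == "__main__":
    dataset_a = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
    print(characterize(dataset_a))
```

Comparing the resulting summaries of the source and target datasets gives the similarity information that the decision rules consume.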
- 73. TCA+: #3. Decision Rules • Rule #1 – Mean and Std are same → NoN • Rule #2 – Max and Min are different → N1 (max=1, min=0) • Rule #3, #4 – Std and # of instances are different → N3 or N4 (src/tgt mean=0, std=1) • Rule #5 – Default → N2 (mean=0, std=1)
- 74. TCA+ (cont.) (Nam@ICSE`13) [CPDP comparison table repeated from slide 54]
- 75. Current CPDP using TL • Advantages – Comparable prediction performance to within-prediction models – Benefit from the state-of-the-art TL approaches • Limitation – Performance of some cross-prediction pairs is still poor. (Negative Transfer) 75 Source Target
- 76. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, CK Metrics, Just-In-Time Prediction Model, Cross-Project Prediction, Practical Model and Applications, Universal Model, Process Metrics, Cross-Project Feasibility, History Metrics, Other Metrics, Semi-supervised/active]
- 77. Feasibility Evaluation for CPDP • Solution for negative transfer – Decision tree using project characteristic metrics (Zimmermann@FSE`09) • E.g. programming language, # developers, etc. 77
- 78. Follow-up Studies • “An investigation on the feasibility of cross-project defect prediction.” (He@ASEJ`12) – Decision tree using distributional characteristics of a dataset E.g. mean, skewness, peakedness, etc. 78
- 79. Feasibility for CPDP • Challenges on current studies – Decision trees were not evaluated properly. • Just fitting model – Low target prediction coverage • 5 out of 34 target projects were feasible for cross- predictions (He@ASEJ`12) 79
- 80. Next Steps of Defect Prediction [Timeline figure, 1980s to 2020s; rows: Metrics, Models, Others. Items: Prediction Model (Regression), Prediction Model (Classification), CK Metrics, Just-In-Time Prediction Model, Cross-Project Prediction, Practical Model and Applications, Universal Model, Process Metrics, Cross-Project Feasibility, History Metrics, Other Metrics, Semi-supervised/active; next step: Cross-Prediction Feasibility Model]
- 81. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, CK Metrics, History Metrics, Just-In-Time Prediction Model, Cross-Project Prediction, Other Metrics, Practical Model and Applications, Universal Model, Process Metrics, Cross-Project Feasibility, Semi-supervised/active, Personalized Model]
- 82. Cross-prediction Model • Common challenge – Current cross-prediction models are limited to datasets with same number of metrics – Not applicable on projects with different feature spaces (different domains) • NASA Dataset: Halstead, LOC • Apache Dataset: LOC, Cyclomatic, CK metrics 82 Source Target
- 83. Next Steps of Defect Prediction [Timeline figure, 1980s to 2020s; rows: Metrics, Models, Others. Items: Prediction Model (Regression), Prediction Model (Classification), CK Metrics, Just-In-Time Prediction Model, Cross-Project Prediction, Practical Model and Applications, Universal Model, Process Metrics, Cross-Project Feasibility, History Metrics, Other Metrics, Noise Reduction, Semi-supervised/active, Personalized Model; next step: Cross-Domain Prediction]
- 84. Other Topics 84
- 85. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, CK Metrics, History Metrics, Just-In-Time Prediction Model, Cross-Project Prediction, Other Metrics, Practical Model and Applications, Universal Model, Process Metrics, Cross-Project Feasibility, Data Privacy, Noise Reduction, Semi-supervised/active, Personalized Model]
- 86. Other Topics • Privacy issue on defect datasets – MORPH (Peters@ICSE`12) • Mutate defect datasets while keeping prediction accuracy • Can accelerate cross-project defect prediction with industrial datasets • Personalized defect prediction model (Jiang@ASE`13) – “Different developers have different coding styles, commit frequencies, and experience levels, all of which cause different defect patterns.” – Results • Average F-measure: 0.62 (personalized models) vs. 0.59 (non- personalized models) 86
- 87. Outline • Background • Software Defect Prediction Approaches – Simple metric and defect estimation models – Complexity metrics and Fitting models – Prediction models – Just-In-Time Prediction Models – Practical Prediction Models and Applications – History Metrics from Software Repositories – Cross-Project Defect Prediction and Feasibility • Summary and Challenging Issues 87
- 88. Defect Prediction Approaches [Timeline figure, 1970s to 2010s; rows: Metrics, Models, Others. Items: LOC, Simple Model, Fitting Model, Prediction Model (Regression), Prediction Model (Classification), Cyclomatic Metric, Halstead Metrics, CK Metrics, History Metrics, Just-In-Time Prediction Model, Cross-Project Prediction, Other Metrics, Practical Model and Applications, Data Privacy, Universal Model, Process Metrics, Cross-Project Feasibility, Noise Reduction, Semi-supervised/active, Personalized Model]
- 89. Next Steps of Defect Prediction [Timeline figure, 1980s to 2020s; rows: Metrics, Models, Others. Items: Prediction Model (Regression), Prediction Model (Classification), CK Metrics, History Metrics, Just-In-Time Prediction Model, Cross-Project Prediction, Other Metrics, Practical Model and Applications, Universal Model, Process Metrics, Cross-Project Feasibility, Data Privacy, Noise Reduction, Semi-supervised/active, Personalized Model; next steps: Online Learning JIT Model, Actionable Defect Prediction, Cross-Prediction Feasibility Model, Cross-Domain Prediction, Fine-grained Prediction]
- 90. Thank you! 90
- 92. Evaluation Measures (classification) • Measures for binary classification – Confusion matrix

  | Actual class \ Predicted class | Buggy | Clean |
  | --- | --- | --- |
  | Buggy | True Positive (TP) | False Negative (FN) |
  | Clean | False Positive (FP) | True Negative (TN) |
- 93. Evaluation Measures (classification) • False positive rate (FPR, PF) = FP/(TN+FP) • Accuracy = (TP+TN)/(TP+FP+TN+FN) • Precision = TP/(TP+FP) • Recall = TP/(TP+FN) • F-measure = (2*Precision*Recall)/(Precision+Recall)
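The measures on this slide follow directly from the four confusion-matrix counts. This is a minimal sketch; the function name and dictionary keys are illustrative, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def classification_measures(tp, fp, tn, fn):
    """Compute FPR, accuracy, precision, recall, and F-measure
    from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Convention for this sketch: undefined ratios become 0.0.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "fpr": ratio(fp, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Example counts: 6 TP, 2 FP, 10 TN, 2 FN.
    print(classification_measures(tp=6, fp=2, tn=10, fn=2))
```

Because F-measure is the harmonic mean of precision and recall, it always lies between the two, which is a useful sanity check on reported results.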
- 94. Evaluation Measures (classification) • AUC (Area Under receiver operating characteristic Curve) [Figure: ROC curve; x-axis: false positive rate (0 to 1); y-axis: true positive rate (0 to 1)]
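- 95. Evaluation Measures (classification) • AUCEC (Area Under Cost Effectiveness Curve) [Figure: cost-effectiveness curves of two models, M1 and M2; x-axis: percent of LOC (0 to 100%, with 10% and 50% marked); y-axis: percent of bugs found (0 to 100%). Source: Rahman@FSE`11, “BugCache for inspections: hit or miss?”]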
- 96. Evaluation Measures (Regression) • Target – Metric values vs. the number of bugs – Actual vs. predicted number of bugs • Correlation coefficient – Spearman / Pearson /R2 • Mean squared error 96
- 97. CK metrics

  | Metric | Description |
  | --- | --- |
  | WMC | Weighted Methods per Class (# of methods) |
  | DIT | Depth of Inheritance Tree (# of ancestor classes) |
  | NOC | Number of Children |
  | CBO | Coupling between Objects (# of coupled classes) |
  | RFC | Response for a Class (WMC + # of methods called by the class) |
  | LCOM | Lack of Cohesion in Methods (# of “connected components”) |