
Personalized Defect Prediction

Sung Kim
Associate Prof.
Nov. 18, 2013


  1. Personalized Defect Prediction. Tian Jiang and Lin Tan (University of Waterloo), Sunghun Kim (Hong Kong University of Science and Technology)
  2. How to Find Bugs? • Code Review • Testing • Static Analysis • Dynamic Analysis • Verification • Defect Prediction
  3. Defect Prediction: Software History → Predictor → Future Defects
  7. Developers are Different [Bar chart: % of buggy changes for developers A, B, C, D vs. the project average, broken down by construct (modulo %, for, bitwise OR, continue); Linux kernel, 2005-2010] Personalized models can improve performance.
  10. Successes in Other Fields • Google personalized search • Facebook personalized ad placement
  14. Contributions • Personalized Change Classification (PCC) ✦ One model for each developer • Confidence-based Hybrid PCC (PCC+) ✦ Picks the prediction with the highest confidence • Evaluation on six C and Java projects ✦ Finds up to 155 more bugs when inspecting 20% of LOC ✦ Improves F1 by up to 0.08
  18. What is a Change? A commit (e.g. Commit 09a02f..., Author: John Smith, Message: "I submitted some code.") touching file1.c, file2.c, and file3.c is split into one change per file: Change 1, Change 2, Change 3. Change-level prediction: inspect less code to locate a bug.
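The splitting step above can be sketched in a few lines. This is an illustrative reading of the slide, not the paper's tooling; the diff text and file names are made up, and the parsing assumes a standard unified-diff layout.

```python
# Sketch (not from the paper): split one commit's unified diff into
# per-file "changes", the unit that change classification operates on.
def split_commit_into_changes(diff_text):
    """Return {filename: [diff lines]} for a unified diff."""
    changes = {}
    current = None
    for line in diff_text.splitlines():
        if line.startswith("--- a/"):
            continue  # old-file header; the new-file header names the change
        if line.startswith("+++ b/"):
            current = line[len("+++ b/"):]
            changes[current] = []
        elif current is not None and line[:1] in "+-@ ":
            changes[current].append(line)
    return changes

diff = """--- a/file1.c
+++ b/file1.c
@@ -1,2 +1,3 @@
+int x = 0;
-int y;
--- a/file2.c
+++ b/file2.c
@@ -5,1 +5,1 @@
+return 0;"""
print(sorted(split_commit_into_changes(diff)))  # one change per touched file
```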
  24. Change Classification (CC) Training Phase: Software History → Training Instances (1. Label changes as clean or buggy) → Features (2. Extract features) → Classification Algorithm → Model (3. Build prediction model). Prediction Phase: Model + Future Instances → 4. Predict.
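The four CC steps can be sketched with toy data. The feature vectors, labels, and the trivial 1-nearest-neighbour "model" below are illustrative assumptions; the deck's evaluation uses standard classifiers such as decision trees.

```python
# Sketch of the CC pipeline with toy data and a trivial 1-NN classifier.
# Steps 1-2: labelled training instances (feature vector, clean/buggy).
train = [([2, 1, 0], "buggy"), ([0, 0, 1], "clean"), ([3, 2, 0], "buggy")]

# Step 3: "build" the model (1-NN simply memorises the instances).
def predict(model, x):
    # Step 4: predict the label of a future instance by nearest neighbour.
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return min(model, key=lambda inst: dist(inst[0], x))[1]

print(predict(train, [2, 2, 0]))  # closest training instance is buggy
```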
  28. Label Clean or Buggy [Sliwerski et al. '05] From the revision history, a bug-fixing change (e.g. Commit 1da57..., Message: "I fixed a bug", fileA.c: "- if (i < 128)" / "+ if (i <= 128)") is one whose message contains the keyword "fix" or the ID of a manually verified bug report [Herzig et al. '13]. git blame then traces the fixed lines back to the buggy change that introduced them (e.g. Commit 7a3bc..., Message: "new feature", fileA.c: "+ if (i < 128)").
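The first half of this labelling idea (spotting bug-fixing commits) can be sketched as below. The regex and the verified bug ID are illustrative assumptions, and the second step (tracing fixed lines back with git blame) is not shown.

```python
# Sketch of bug-fix detection in the spirit of [Sliwerski et al. '05]:
# a commit is treated as bug-fixing if its message contains "fix" or
# the ID of a manually verified bug report. Keywords/IDs are made up.
import re

VERIFIED_BUG_IDS = {"BUG-4711"}  # hypothetical manually verified reports

def is_bug_fixing(message):
    if re.search(r"\bfix(ed|es)?\b", message, re.IGNORECASE):
        return True
    return any(bug_id in message for bug_id in VERIFIED_BUG_IDS)

print(is_bug_fixing("I fixed a bug"))  # True
print(is_bug_fixing("new feature"))    # False
```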
  30. Three Types of Features • Metadata • Bag-of-Words • Characteristic Vector
  35. Characteristic Vector: count Abstract Syntax Tree (AST) node types. Example: for (...; ...; ...) { for (...; ...; ...) { if (...) ...; } } yields for: 2, if: 1, while: 0, ...
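The characteristic-vector idea can be sketched directly. The deck's subjects are C and Java code; Python's own `ast` module is used here purely to illustrate counting node types, on a fragment that mirrors the slide's nested loops.

```python
# Sketch: a characteristic vector counts AST node types in a fragment.
import ast
from collections import Counter

def characteristic_vector(source):
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

vec = characteristic_vector(
    "for i in range(2):\n"
    "    for j in range(2):\n"
    "        if i < j:\n"
    "            pass\n"
)
print(vec["For"], vec["If"], vec["While"])  # 2 1 0, as on the slide
```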
  38. CC: Training Training Instances → Model
  40. CC: Prediction Unlabeled Changes → Model → Predicted Changes
  44. PCC: Training Training instances are grouped by developer (Dev 1, Dev 2, Dev 3); training yields one model per developer (Model 1, Model 2, Model 3).
  47. PCC: Prediction Choose a model by developer (e.g. a change by Dev 2 goes to Model 2), then predict.
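PCC's grouping and routing can be sketched on top of the same toy 1-NN idea: one model per developer, and each new change goes to its author's model. Developer names and data are illustrative.

```python
# Sketch of PCC: group training instances by developer, keep one model
# per developer, and route each new change to its author's model.
from collections import defaultdict

def train_pcc(instances):  # instances: (developer, features, label)
    models = defaultdict(list)
    for dev, x, y in instances:
        models[dev].append((x, y))  # a "model" is just a 1-NN's memory
    return models

def predict_pcc(models, dev, x):
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return min(models[dev], key=lambda inst: dist(inst[0], x))[1]

models = train_pcc([
    ("dev1", [2, 0], "buggy"), ("dev1", [0, 2], "clean"),
    ("dev2", [2, 0], "clean"), ("dev2", [0, 2], "buggy"),
])
# The same change gets different predictions under different developers:
print(predict_pcc(models, "dev1", [2, 1]), predict_pcc(models, "dev2", [2, 1]))
```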
  49. PCC+: Prediction Feed changes to all models (CC and PCC); a combiner picks the final prediction.
  53. Confidence Measure • Bugginess ✦ Probability of a change being buggy • Confidence Measure ✦ Comparable measure of confidence across models • Select the prediction with the highest confidence.
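The PCC+ combiner can be sketched as below. Taking the distance of the bugginess probability from 0.5 as the confidence is an assumption for illustration, not necessarily the paper's exact measure; the model names and probabilities are made up.

```python
# Sketch of a PCC+-style combiner: each model reports a bugginess
# probability; the most confident model's verdict wins. Confidence is
# approximated here as distance from 0.5.
def combine(bugginess_by_model):
    # bugginess_by_model: {model_name: P(change is buggy)}
    name, p = max(bugginess_by_model.items(), key=lambda kv: abs(kv[1] - 0.5))
    return name, ("buggy" if p >= 0.5 else "clean")

# PCC is far more confident (0.10 is far from 0.5), so its verdict wins:
print(combine({"CC": 0.55, "PCC": 0.10}))  # ('PCC', 'clean')
```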
  56. Research Questions • RQ1: Do PCC and PCC+ outperform CC? • RQ2: Does PCC outperform CC in other setups? ✦ Classification algorithms ✦ Sizes of training sets
  59. Two Metrics • F1-Score ✦ Harmonic mean of precision and recall • Cost Effectiveness ✦ Relevant in cost-sensitive scenarios ✦ NofB20: number of bugs discovered by inspecting the top 20% of lines of code
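The F1-score on binary clean/buggy labels can be sketched as follows; the label lists are illustrative.

```python
# Sketch: F1-score (harmonic mean of precision and recall) for binary
# clean/buggy predictions, with "buggy" as the positive class.
def f1_score(actual, predicted, positive="buggy"):
    tp = sum(a == positive == p for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

actual    = ["buggy", "buggy", "clean", "clean"]
predicted = ["buggy", "clean", "buggy", "clean"]
print(f1_score(actual, predicted))  # precision 0.5, recall 0.5 -> F1 0.5
```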
  67. Cost Effectiveness Changes are ranked by predicted bugginess and inspected in that order, accumulating LOC: Buggy #1 (true bug, 10 LOC, 10% cumulative), Buggy #2 (5 LOC, 15%), Buggy #3 (true bug, 4 LOC, 19%), Buggy #4 (true bug, 8 LOC, 27%), Buggy #5 (12 LOC), ... (100 LOC total). Inspecting the top 20% of LOC finds three true bugs (#1, #3, #4): NofB20 = 3.
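The NofB20 computation can be sketched on the slide's own numbers. Whether the change that crosses the 20% line still counts is a convention; starting a change whenever the budget is not yet exhausted reproduces the slide's NofB20 = 3, but the paper's exact cutoff rule is an assumption here.

```python
# Sketch of NofB20: rank changes by predicted bugginess, inspect them
# in order until 20% of total LOC is spent, and count true bugs seen.
def nofb20(ranked_changes, total_loc, budget=0.20):
    # ranked_changes: (loc, is_true_bug) pairs, sorted by bugginess.
    inspected_loc, bugs = 0, 0
    for loc, is_bug in ranked_changes:
        if inspected_loc >= budget * total_loc:
            break  # budget exhausted before starting this change
        inspected_loc += loc
        bugs += is_bug
    return bugs

# The slide's example: 100 LOC total, true bugs at ranks 1, 3 and 4.
ranked = [(10, True), (5, False), (4, True), (8, True), (12, False)]
print(nofb20(ranked, 100))  # 3
```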
  68. Test Subjects (Project, Language, LOC, # of Changes): Linux kernel (C, 7.3M, 429K); PostgreSQL (C, 289K, 89K); Xorg (C, 1.1M, 46K); Eclipse (Java, 1.5M, 73K); Lucene* (Java, 828K, 76K); Jackrabbit* (Java, 589K, 61K). * With manually labelled bug report data [Herzig et al. '13]
  70. PCC/PCC+ vs. CC (Decision Tree, NofB20) Project: CC, PCC (Delta), PCC+ (Delta). Linux: 160, 179 (+19), 172 (+12); PostgreSQL: 55, 210 (+155), 175 (+120); Xorg: 96, 159 (+63), 161 (+65); Eclipse: 116, 207 (+91), 200 (+84); Lucene: 177, 254 (+77), 257 (+80); Jackrabbit: 411, 449 (+38), 459 (+48). Average delta: +74 (PCC), +68 (PCC+). Statistically significant deltas are in bold.
  71. PCC and PCC+ outperform CC.
  73. Different Classification Algorithms (NofB20) Project: Naive Bayes CC, PCC (Delta); Logistic Regression CC, PCC (Delta). Linux: 138, 147 (+9); 102, 137 (+35). PostgreSQL: 89, 113 (+24); 46, 56 (+10). Xorg: 84, 101 (+17); 52, 29 (-23). Eclipse: 65, 108 (+43); 54, 55 (+1). Lucene: 152, 139 (-13); 30, 200 (+170). Jackrabbit: 420, 414 (-6); 261, 370 (+109). Average delta: +12 (Naive Bayes), +59 (Logistic Regression). Statistically significant deltas are in bold.
  75. Different Training Set Sizes [Line chart: NofB20 (y-axis, 100 to 300) vs. training set size per developer (x-axis, 10 to 90), comparing PCC and CC]
  76. The improvement persists in other setups.
  77. Related Work • Kim et al., Classifying software changes: Clean or buggy?, TSE '08 • Bettenburg et al., Think locally, act globally: Improving defect and effort prediction models, MSR '12
  78. Conclusions & Future Work • PCC and PCC+ improve prediction performance. • The improvement persists in other setups. • The personalized approach can be applied to other fields: ✦ Recommendation systems ✦ Vulnerability prediction ✦ Top-crash prediction