Dealing with Noise in Defect Prediction

ICSE 2011 technical session talk by Sung Kim

  1. Dealing with Noise in Bug Prediction. Sunghun Kim, Hongyu Zhang, Rongxin Wu and Liang Gong. The Hong Kong University of Science & Technology / Tsinghua University
  2-6. Where are the bugs? Complex files! [Menzies et al.] Modified files! [Nagappan et al.] Nearby other bugs! [Zimmermann et al.] Previously fixed files! [Hassan et al.]
  7-12. Prediction model: training instances (features + labels) feed a Learner, which produces a Prediction for an unlabeled instance.
  13. Training on software evolution is key • Software features can be used to predict bugs • Defect labels are obtained from software evolution (Version Archive + Bug Database) • Supervised learning algorithms
  14-18. Change classification: mark the bug-introducing ("bad") changes in the revision history, BUILD A LEARNER from them, then PREDICT QUALITY of a new change. [Kim, Whitehead Jr., Zhang: Classifying Software Changes: Clean or Buggy? (TSE 2008)]
  19. Training Classifiers: historical changes are encoded as rows of binary (0/1) feature values with buggy/clean labels § Machine learning techniques • Bayesian Network, SVM
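To make slide 19 concrete, here is a minimal sketch of training such a classifier. The tiny 0/1 feature matrix is invented, and scikit-learn's BernoulliNB stands in for the Bayesian Network learner named on the slide; the point is only the shape of the data and the train-then-predict flow.

```python
# Sketch of slide 19: learning a change classifier from historical
# changes encoded as binary feature vectors (e.g., keyword occurrence
# in the diff). BernoulliNB is a stand-in for the Bayesian Network;
# the feature matrix below is made up for illustration.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Rows = historical changes, columns = binary features (0/1).
X_train = np.array([
    [0, 1, 0, 1, 0],   # change 1
    [0, 0, 0, 1, 0],   # change 2
    [0, 1, 1, 1, 0],   # change 3
    [1, 0, 1, 0, 1],   # change 4
])
y_train = np.array([1, 0, 1, 0])   # 1 = bug-introducing ("bad"), 0 = clean

clf = BernoulliNB().fit(X_train, y_train)

new_change = np.array([[0, 1, 1, 0, 0]])
print("predicted label:", clf.predict(new_change)[0])
print("P(buggy):", clf.predict_proba(new_change)[0][1])
```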
  20-26. Where do labels come from? Source Repository vs. Bug Database [Bird et al., "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009]: of all commits C, some are bug fixes Cf; of all bugs B, some are fixed bugs Bf. Only the linked fixes Cfl and linked fixed bugs Bfl can be paired via log messages; fixes that are related but not linked are missed. Noise!
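A rough illustration of the linking step these slides describe: scan commit messages for bug IDs and keep only the commits that mention a fixed bug. The commit log, bug IDs, and regex below are hypothetical; real linkers (as in Bird et al.) are more elaborate, but the failure mode is the same — fixes that never name a bug ID stay unlinked, and the changes they repair end up mislabeled as clean.

```python
# Hypothetical link recovery: match bug IDs in commit messages against
# the set of fixed bugs. Commit c3 is a real fix with no bug ID, so it
# stays unlinked -- exactly the noise this talk is about.
import re

fixed_bugs = {"1234", "1240", "1305"}          # Bf: bugs marked FIXED

commits = [
    ("c1", "Fix NPE in parser, bug 1234"),
    ("c2", "refactor build scripts"),
    ("c3", "fixed the crash reported last week"),   # a fix, but no bug ID
    ("c4", "bug 1305: guard against null widget"),
]

BUG_ID = re.compile(r"bug\s*#?(\d+)", re.IGNORECASE)

linked_fixes = {}                               # Cfl: commit -> bug id
for sha, msg in commits:
    m = BUG_ID.search(msg)
    if m and m.group(1) in fixed_bugs:
        linked_fixes[sha] = m.group(1)

print(linked_fixes)        # {'c1': '1234', 'c4': '1305'} -- c3 is missed
```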
  27-30. Effect of training on superbiased data (Severity) [chart: bug recall, 0%-100%, for models trained on all bugs vs. two biased datasets]. Bias in bug severity affects BugCache. [Bird et al., "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009]
  31. Are defect prediction models learned from noisy data reliable?
  32. Study questions • Q1: How resistant is a defect prediction model to noise? • Q2: How much noise can be detected and removed? • Q3: Can we remove noise to improve defect prediction performance?
  33-37. Study approach: split the data into a Training set and a Testing set, train a Bayes Net on the training set, and evaluate its predictions on the testing set.
  38-40. Making noisy training instances: (1) removing buggy labels from the training set creates false negative noise; (2) adding buggy labels creates false positive noise. The testing set is left untouched.
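The noise-injection step can be sketched in a few lines. Assuming binary labels (1 = buggy), flipping a fraction of one class's labels produces exactly the false negative or false positive noise the slide names; the rate and RNG seed below are arbitrary choices for illustration.

```python
# Sketch of slides 38-40: flip n% of one class's labels in a "golden"
# label vector. kind='fn' turns buggy (1) into clean (0); kind='fp'
# turns clean (0) into buggy (1).
import numpy as np

def inject_noise(y, rate, kind, rng):
    y = y.copy()
    src = 1 if kind == "fn" else 0
    idx = np.flatnonzero(y == src)
    flip = rng.choice(idx, size=int(rate * len(idx)), replace=False)
    y[flip] = 1 - src
    return y

rng = np.random.default_rng(0)
y_golden = rng.integers(0, 2, size=1000)
y_noisy = inject_noise(y_golden, 0.30, "fn", rng)   # 30% FN noise
print("labels changed:", int((y_noisy != y_golden).sum()))
```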
  41-42. Prediction models: change classification labels each change between Rev n and Rev n+1 as buggy or clean; file-level defect prediction labels each file as buggy or clean.
  43. Performance evaluation § 4 possible outcomes from prediction models: classifying a buggy change as buggy (n_b→b), a buggy change as clean (n_b→c), a clean change as clean (n_c→c), a clean change as buggy (n_c→b) § Precision = n_b→b / (n_b→b + n_c→b), Recall = n_b→b / (n_b→b + n_b→c) § F-measure = (2 × Precision × Recall) / (Precision + Recall)
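For concreteness, the slide's formulas computed on invented outcome counts:

```python
# The four outcomes on slide 43, turned into the stated formulas.
# The counts below are made up to show the arithmetic.
n_bb = 40   # buggy change classified as buggy
n_bc = 10   # buggy change classified as clean
n_cc = 80   # clean change classified as clean
n_cb = 20   # clean change classified as buggy

precision = n_bb / (n_bb + n_cb)                    # 40/60 = 0.667
recall    = n_bb / (n_bb + n_bc)                    # 40/50 = 0.800
f_measure = 2 * precision * recall / (precision + recall)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))
```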
  44. Subjects
      change classification:
          subject          # instances   % buggy   # features
          Columba          1,800         29.4%     17,411
          Eclipse (JDT)    659           10.1%     16,192
          Scarab           1,090         50.6%     5,710
      file-level defect prediction:
          subject          # instances   % buggy   # features
          SWT              1,485         44%       18
          Debug            1,065         24.7%     18
  45-48. Experimental results, Columba [chart: buggy F-measure vs. false negative (FN) & false positive (FP) noise rate in the training set; Columba vs. a random-guess baseline]. How the baseline is computed: 1. Random guess (50% buggy, 50% clean) 2. Columba's defect rate is about 30% 3. Precision = 0.3 and Recall = 0.5 4. F-measure = (2 × 0.5 × 0.3)/(0.3 + 0.5) = 0.375
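The random-guess baseline on these charts generalizes beyond Columba: a coin-flip classifier flags half of everything, so on a dataset with defect rate p its precision is p and its recall is 1/2, giving

```latex
% Baseline F-measure of a coin-flip classifier on defect rate p.
\[
F_{\text{guess}} = \frac{2 \cdot p \cdot \tfrac{1}{2}}{p + \tfrac{1}{2}}
                 = \frac{p}{p + \tfrac{1}{2}},
\qquad
F_{\text{guess}}\big|_{p=0.3} = \frac{0.3}{0.8} = 0.375 .
\]
```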
  49-52. The same experiment on the remaining subjects [charts: buggy F-measure vs. FN & FP noise rate for Eclipse (JDT), Scarab, Eclipse (Debug), and Eclipse (SWT), each against its random-guess baseline].
  53-55. Q1: How resistant is a defect prediction model to noise? [chart: buggy F-measure vs. FN & FP noise rate for all five subjects] The models hold up until the noise level reaches roughly 20~30%.
  56. Study questions • Q1: How resistant is a defect prediction model to noise? • Q2: How much noise can be detected and removed? • Q3: Can we remove noise to improve defect prediction performance?
  57-58. Detecting noise: given training data corrupted with false negative noise (removed buggy labels) and false positive noise (added buggy labels), the goal is to recover the original training labels.
  59. [Excerpt from the paper shown on the slide, alongside Figure 4, "Creating biased training set":] "However, it is very hard to get a golden set. In our approach, we carefully select high quality datasets and assume them the golden sets. We then add FPs and FNs intentionally to create a noise set. To add FPs and FNs, we randomly select instances in a golden set and artificially change their labels from buggy to clean or from clean to buggy, inspired by experiments in [4]."
  60. Figure 9 lists the pseudo-code of the CLNI (Closest List Noise Identification) algorithm.
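The pseudo-code itself did not survive extraction, but the idea of CLNI can be sketched from its description: repeatedly flag instances whose closest neighbors mostly disagree with their label, until the flagged set stabilizes. The parameter values (n_neighbors, theta, eps) and the convergence test below are illustrative guesses, not the paper's exact settings.

```python
# Hedged sketch of CLNI: an instance is suspected noisy when most of
# its closest-list neighbors carry the other label; iterate, excluding
# current suspects from neighbor lists, until the set stabilizes.
import numpy as np

def clni(X, y, n_neighbors=5, theta=0.6, eps=0.99, max_rounds=20):
    """Return indices of likely-mislabeled instances."""
    noise = set()
    for _ in range(max_rounds):                    # safety cap
        keep = np.array([j for j in range(len(X)) if j not in noise])
        flagged = set()
        for i in range(len(X)):
            cand = keep[keep != i]                 # neighbors from unflagged rest
            d = np.linalg.norm(X[cand] - X[i], axis=1)
            closest = cand[np.argsort(d)[:n_neighbors]]
            # Flag i when most of its closest list disagrees with its label.
            if len(closest) and np.mean(y[closest] != y[i]) >= theta:
                flagged.add(i)
        # Stop once the noise set barely changes between rounds.
        overlap = len(noise & flagged) / max(len(noise | flagged), 1)
        noise = flagged
        if overlap >= eps or not noise:
            break
    return sorted(noise)

# Toy run: 2-D points labeled by sign of x, with 20 labels flipped.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y_clean = (X[:, 0] > 0).astype(int)
flipped = rng.choice(200, size=20, replace=False)
y_noisy = y_clean.copy()
y_noisy[flipped] = 1 - y_noisy[flipped]

suspects = clni(X, y_noisy)
print(len(suspects), "instances flagged as noisy")
```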
  61. Noise detection performance (noise level = 20%)
                 Precision   Recall   F-measure
          Debug  0.681       0.871    0.764
          SWT    0.624       0.830    0.712
  62. Noise detection performance across noise levels [chart: precision, recall, and F-measure (0-1) vs. FP & FN noise level from 0.1 to 0.5].
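When the noise is injected, the flipped indices are known, so detection quality like that on slides 61-62 can be scored by comparing the flagged set against the ground truth. A small self-contained helper (the index sets below are made up):

```python
# Precision/recall/F-measure of a flagged index set against the set of
# labels that were actually flipped.
def detection_scores(flagged, truth):
    flagged, truth = set(flagged), set(truth)
    tp = len(flagged & truth)
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(truth) if truth else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

print(detection_scores(flagged={3, 7, 9, 12}, truth={3, 7, 12, 40, 41}))
# (0.75, 0.6, 0.666...): 3 of 4 flags are real; 3 of 5 flips were caught
```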
  63. Study questions • Q1: How resistant is a defect prediction model to noise? • Q2: How much noise can be detected and removed? • Q3: Can we remove noise to improve defect prediction performance?
  64-66. Bug prediction using cleaned data [bar chart: SWT F-measure (0-100) at 0%, 15%, 30%, and 45% noise, trained on noisy vs. cleaned data]. Cleaning recovers most of the performance: 76% F-measure even with 45% noise.
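A self-contained sketch of this experiment's shape: inject label noise into the training set, clean it with a one-pass nearest-neighbor disagreement filter (a simplification of CLNI), and compare F-measures on a held-out clean test set. The data, parameters, and filter are all illustrative assumptions; exact numbers will vary with the draw.

```python
# Toy version of slides 64-66: noisy vs. cleaned training data.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(600, 5))
y = (X[:, :2].sum(axis=1) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=7)

# Inject 30% label noise into the training set only.
y_noisy = y_tr.copy()
flip = rng.choice(len(y_tr), size=int(0.3 * len(y_tr)), replace=False)
y_noisy[flip] = 1 - y_noisy[flip]

# One-pass filter: drop instances whose 5 nearest neighbors mostly disagree.
nn = NearestNeighbors(n_neighbors=6).fit(X_tr)
_, idx = nn.kneighbors(X_tr)                  # idx[:, 0] is the point itself
disagree = (y_noisy[idx[:, 1:]] != y_noisy[:, None]).mean(axis=1)
keep = disagree < 0.6

for name, (Xs, ys) in {"noisy": (X_tr, y_noisy),
                       "cleaned": (X_tr[keep], y_noisy[keep])}.items():
    model = GaussianNB().fit(Xs, ys)
    print(name, "F-measure:", round(f1_score(y_te, model.predict(X_te)), 3))
```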
  67. Study limitations • All datasets are collected from open source projects • The golden set used in this paper may not be perfect • The noisy data simulations may not reflect the actual noise patterns in practice
  68. Summary • The prediction models used in our experiments are resistant to up to 20~30% noise • Noise detection is promising • Future work: building oracle defect sets, improving noise detection algorithms, applying to more defect prediction models (regression, BugCache)
