Dealing with Noise in Defect Prediction

  1. Dealing with Noise in Bug Prediction — Sunghun Kim, Hongyu Zhang, Rongxin Wu, and Liang Gong; The Hong Kong University of Science & Technology, Tsinghua University
  2. Where are the bugs? 2
  3. Where are the bugs? Complex files! [Menzies et al.] 2
  4. Where are the bugs? Modified files! Complex files! [Nagappan et al.] [Menzies et al.] 2
  5. Where are the bugs? Modified files! Complex files! [Nagappan et al.] [Menzies et al.] Nearby other bugs! [Zimmermann et al.] 2
  6. Where are the bugs? Modified files! Complex files! [Nagappan et al.] [Menzies et al.] Nearby other bugs! Previously fixed files [Zimmermann et al.] [Hassan et al.] 2
  7. Prediction model training instances (features + labels) 3
  8. Prediction model training instances (features + labels) Learner 3
  9. Prediction model training instances (features + labels) ? Learner 3
  10. Prediction model training instances (features + labels) ? Learner 3
  11. Prediction model training instances (features + labels) ? Learner Prediction 3
  12. Prediction model training instances (features + labels) ? Learner Prediction 3
  13. Training on software evolution is key • Software features can be used to predict bugs • Defect labels obtained from software evolution • Supervised learning algorithms [Diagram: Version Archive + Bug Database] 4
  14. Change classification 5 Kim, Whitehead Jr., Zhang: Classifying Software Changes: Clean or Buggy? (TSE 2008)
  15. Change classification [Diagram: change history with bug-introducing (“bad”) changes marked X] 5 Kim, Whitehead Jr., Zhang: Classifying Software Changes: Clean or Buggy? (TSE 2008)
  16. Change classification [Diagram: BUILD A LEARNER from the bug-introducing (“bad”) changes] 5 Kim, Whitehead Jr., Zhang: Classifying Software Changes: Clean or Buggy? (TSE 2008)
  17. Change classification [Diagram: a new change arrives] 5 Kim, Whitehead Jr., Zhang: Classifying Software Changes: Clean or Buggy? (TSE 2008)
  18. Change classification [Diagram: PREDICT QUALITY of the new change with the learner] 5 Kim, Whitehead Jr., Zhang: Classifying Software Changes: Clean or Buggy? (TSE 2008)
  19. Training Classifiers [Diagram: historical changes encoded as binary feature vectors, e.g. 0 1 0 1 0 1 0 1 … 0 1] § Machine learning techniques • Bayesian Network, SVM
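
To make the training step above concrete, here is a minimal sketch of fitting classifiers on binary change-feature vectors. The tiny hand-made dataset and the use of scikit-learn's BernoulliNB and SVC as stand-ins for the Bayesian Network and SVM learners are assumptions for illustration, not the authors' exact setup.

```python
# Sketch: training change classifiers on binary feature vectors.
# The data and the learners (BernoulliNB as a simple Bayesian stand-in,
# SVC for SVM) are illustrative assumptions, not the paper's exact setup.
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC

# Historical changes: one row of 0/1 features per change (e.g. keyword presence).
X = np.array([
    [0, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
])
# Labels mined from history: 1 = bug-introducing ("buggy"), 0 = clean.
y = np.array([1, 1, 1, 0, 0])

for learner in (BernoulliNB(), SVC(kernel="linear")):
    model = learner.fit(X, y)
    new_change = np.array([[0, 1, 0, 1, 0, 0]])  # feature vector of a new change
    print(type(model).__name__, "predicts:", model.predict(new_change)[0])
```
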
  20. Source Repository / Bug Database [Diagram: all commits C in the source repository; all bugs B and fixed bugs Bf in the bug database] 7 Bird et al. “Fair and Balanced? Bias in Bug-Fix Datasets,” FSE 2009
  21. [Diagram adds: some commits are linked to fixed bugs via log messages] 7 Bird et al. “Fair and Balanced? Bias in Bug-Fix Datasets,” FSE 2009
  22. [Diagram adds: linked fixed bugs Bfl] 7 Bird et al. “Fair and Balanced? Bias in Bug-Fix Datasets,” FSE 2009
  23. [Diagram adds: linked fixes Cfl] 7 Bird et al. “Fair and Balanced? Bias in Bug-Fix Datasets,” FSE 2009
  24. [Diagram adds: bugs and commits that are related, but not linked] 7 Bird et al. “Fair and Balanced? Bias in Bug-Fix Datasets,” FSE 2009
  25. [Diagram adds: all bug fixes Cf, including unlinked ones] 7 Bird et al. “Fair and Balanced? Bias in Bug-Fix Datasets,” FSE 2009
  26. Noise! [Diagram: fixed bugs and fix commits that are related but not linked via log messages are missed, which introduces noise] 7 Bird et al. “Fair and Balanced? Bias in Bug-Fix Datasets,” FSE 2009
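
The “linked via log messages” step in the diagram above is usually implemented by scanning commit messages for bug identifiers. The sketch below illustrates that idea; the regex, sample commit messages, and bug-ID format are hypothetical and would have to match a real project's conventions.

```python
# Sketch: linking commits to fixed bugs by scanning log messages for bug IDs.
# The regex, sample messages and bug-ID format are hypothetical examples.
import re

BUG_ID = re.compile(r"(?:bug|issue|fix(?:es|ed)?)\s*#?(\d+)", re.IGNORECASE)

fixed_bugs = {1234, 1240, 1300}        # Bf: bug IDs marked fixed in the bug database
commits = {
    "a1b2c3": "Fix bug #1234: NPE in parser",
    "d4e5f6": "Refactor build scripts",   # a fix with no bug ID stays unlinked (noise)
    "0719aa": "fixes issue 1240",
}

# Cfl: commits whose log messages mention a fixed bug ID.
linked = {sha: int(m.group(1))
          for sha, msg in commits.items()
          if (m := BUG_ID.search(msg)) and int(m.group(1)) in fixed_bugs}
print(linked)  # {'a1b2c3': 1234, '0719aa': 1240}
```
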
  27. Effect of training on superbiased data (Severity) [Chart: Bug Recall (0%–100%) for models trained on all bugs vs. biased data1 vs. biased data2] 8 Bird et al. “Fair and Balanced? Bias in Bug-Fix Datasets,” FSE 2009
  28. Effect of training on superbiased data (Severity) [Chart: Bug Recall (0%–100%) for models trained on all bugs vs. biased data1 vs. biased data2] 9 Bird et al. “Fair and Balanced? Bias in Bug-Fix Datasets,” FSE 2009
  29. Effect of training on superbiased data (Severity) [Chart: Bug Recall (0%–100%) for models trained on all bugs vs. biased data1 vs. biased data2] 10 Bird et al. “Fair and Balanced? Bias in Bug-Fix Datasets,” FSE 2009
  30. Effect of training on superbiased data (Severity) — Bias in bug severity affects BugCache [Chart: Bug Recall (0%–100%) for models trained on all bugs vs. biased data1 vs. biased data2] 10 Bird et al. “Fair and Balanced? Bias in Bug-Fix Datasets,” FSE 2009
  31. Are defect prediction models learned from noisy data reliable? 11
  32. Study questions • Q1: How resistant is a defect prediction model to noise? • Q2: How much noise could be detected/removed? • Q3: Could we remove noise to improve defect prediction performance? 12
  33. Study approach 13
  34. Study approach 13
  35. Study approach 13
  36. Study approach Training Testing 13
  37. Study approach Training Bayes Net Testing 13
  38. Making noisy training instances Training Testing 14
  39. Making noisy training instances 1 Removing buggy labels False negative noise Training Testing 14
  40. Making noisy training instances 1 Removing buggy labels False negative noise Training Testing 2 Adding buggy labels False positive noise 14
  41. Prediction models [Diagram: change classification — each change between Rev n and Rev n+1 is labeled buggy or clean] 15
  42. Prediction models [Diagram: change classification and file-level defect prediction — each change or file is labeled buggy or clean] 15
  43. Performance evaluation § 4 possible outcomes from prediction models § Classifying a buggy change as buggy (nb->b) § Classifying a buggy change as clean (nb->c) § Classifying a clean change as clean (nc->c) § Classifying a clean change as buggy (nc->b) § Precision = nb->b / (nb->b + nc->b), Recall = nb->b / (nb->b + nb->c) § F-measure = 2 × precision × recall / (precision + recall) 16
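
As a worked example of these measures, the snippet below computes precision, recall, and F-measure from the four outcome counts; the counts are made up purely for illustration.

```python
# Sketch: precision, recall and F-measure from the four prediction outcomes.
# The counts are made-up numbers for illustration only.
n_b_b = 40   # buggy change predicted buggy  (true positives)
n_b_c = 20   # buggy change predicted clean  (false negatives)
n_c_c = 90   # clean change predicted clean  (true negatives)
n_c_b = 10   # clean change predicted buggy  (false positives)

precision = n_b_b / (n_b_b + n_c_b)
recall = n_b_b / (n_b_b + n_b_c)
f_measure = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} F-measure={f_measure:.2f}")
```
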
  44. Subjects — change classification: Columba (1,800 instances, 29.4% buggy, 17,411 features), Eclipse (JDT) (659 instances, 10.1% buggy, 16,192 features), Scarab (1,090 instances, 50.6% buggy, 5,710 features); file-level defect prediction: SWT (1,485 instances, 44% buggy, 18 features), Debug (1,065 instances, 24.7% buggy, 18 features) 17
  45. Experimental results [Chart: Buggy F-measure vs. training-set false negative (FN) & false positive (FP) noise rate] 18
  46. Columba [Chart: Buggy F-measure for Columba vs. a dummy (random-guess) classifier as the training-set FN & FP noise rate increases] 19
  47. Columba — the dummy baseline: 1. Random guess (50% buggy, 50% clean) 2. Columba’s defect rate is about 30% 3. Precision = 0.3 and Recall = 0.5 4. F-measure = 0.375 (2*0.5*0.3)/(0.3+0.5) [Chart: Buggy F-measure for Columba vs. the dummy classifier] 19
  48. Columba [Chart: Buggy F-measure for Columba vs. the dummy classifier as the training-set FN & FP noise rate increases] 20
  49. Eclipse (JDT) [Chart: Buggy F-measure for Eclipse (JDT) vs. a dummy classifier as the training-set FN & FP noise rate increases] 21
  50. Scarab [Chart: Buggy F-measure for Scarab vs. a dummy classifier as the training-set FN & FP noise rate increases] 22
  51. Eclipse (Debug) [Chart: Buggy F-measure for Debug vs. a dummy classifier as the training-set FN & FP noise rate increases] 23
  52. Eclipse (SWT) [Chart: Buggy F-measure for SWT vs. a dummy classifier as the training-set FN & FP noise rate increases] 24
  53. Q1: How resistant is a defect prediction model to noise? [Chart: Buggy F-measure for SWT, Debug, Columba, Eclipse, and Scarab as the training-set FN & FP noise rate increases] 25
  54. Q1: How resistant is a defect prediction model to noise? [Chart: Buggy F-measure for SWT, Debug, Columba, Eclipse, and Scarab as the training-set FN & FP noise rate increases] 26
  55. Q1: How resistant is a defect prediction model to noise? [Chart: F-measures stay roughly stable up to a 20~30% FN & FP noise rate] 26
  56. Study questions • Q1: How resistant is a defect prediction model to noise? • Q2: How much noise could be detected/removed? • Q3: Could we remove noise to improve defect prediction performance? 27
  57. Detecting noise 1 Removing buggy labels False negative noise Original training 2 Adding buggy labels False positive noise 28
  58. Detecting noise False negative noise Original training False positive noise 29
  59. Detecting noise — It is very hard to obtain a golden set; in this approach, high-quality datasets are carefully selected and assumed to be golden sets. FPs and FNs are then added intentionally to create a noisy set: instances in a golden set are randomly selected and their labels artificially changed from buggy to clean or from clean to buggy (e.g., to make FN data sets, n% of buggy instances are randomly selected), inspired by experiments in [4]. [Figure 4: Creating a biased training set] 30
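
The label-flipping procedure described above can be sketched as follows; the noise rates, the 1/0 label encoding, and the fixed random seed are illustrative assumptions rather than the exact experimental setup.

```python
# Sketch: injecting label noise into a "golden" training set by flipping labels.
# Flipping buggy -> clean simulates false-negative (FN) noise;
# flipping clean -> buggy simulates false-positive (FP) noise.
import random

def add_label_noise(labels, fn_rate=0.2, fp_rate=0.2, seed=42):
    """Return a copy of `labels` (1 = buggy, 0 = clean) with noise injected."""
    rng = random.Random(seed)
    noisy = list(labels)
    buggy = [i for i, label in enumerate(noisy) if label == 1]
    clean = [i for i, label in enumerate(noisy) if label == 0]
    for i in rng.sample(buggy, int(fn_rate * len(buggy))):
        noisy[i] = 0        # FN noise: buggy instance relabelled clean
    for i in rng.sample(clean, int(fp_rate * len(clean))):
        noisy[i] = 1        # FP noise: clean instance relabelled buggy
    return noisy

golden = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]      # hypothetical golden-set labels
print(add_label_noise(golden, fn_rate=0.25, fp_rate=0.25))
```
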
  60. [Figure 9: The pseudo-code of the CLNI (Closest List Noise Identification) algorithm] 31
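
Since only the figure is referenced here, the sketch below captures the general closest-list idea behind such noise identification: an instance is flagged as likely mislabelled when most of its nearest neighbours carry the opposite label. The Euclidean distance, neighbourhood size, threshold, and toy data are assumptions, not the paper's exact algorithm or parameters.

```python
# Rough sketch of a closest-list noise identification idea (in the spirit of CLNI):
# flag an instance when most of its nearest neighbours have the opposite label.
# Distance metric, k, threshold and the toy data are assumptions for illustration.
import numpy as np

def identify_noise(X, y, k=5, threshold=0.8):
    """Return indices of instances whose k nearest neighbours mostly disagree."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    noisy = []
    for i in range(len(X)):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                    # exclude the instance itself
        neighbours = np.argsort(dists)[:k]   # the "closest list"
        if np.mean(y[neighbours] != y[i]) >= threshold:
            noisy.append(i)
    return noisy

# Toy example: the last instance sits among class-0 points but is labelled 1.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [0.1, 0.2]]
y = [0, 0, 0, 1, 1, 1, 1]
print(identify_noise(X, y, k=3, threshold=0.67))  # -> [6]
```
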
  61. Noise detection performance (noise level = 20%) — Debug: precision 0.681, recall 0.871, F-measure 0.764; SWT: precision 0.624, recall 0.830, F-measure 0.712 32
  62. Noise detection performance [Chart: precision, recall, and F-measure (0–1) as the FP & FN noise level varies from 0.1 to 0.5] 33
  63. Study questions • Q1: How resistant is a defect prediction model to noise? • Q2: How much noise could be detected/removed? • Q3: Could we remove noise to improve defect prediction performance? 34
  64. Bug prediction using cleaned data [Chart: SWT F-measure (0–100) for noisy vs. cleaned data at 0%, 15%, 30%, and 45% noise levels] 35
  65. Bug prediction using cleaned data [Chart: SWT F-measure (0–100) for noisy vs. cleaned data at 0%, 15%, 30%, and 45% noise levels] 36
  66. Bug prediction using cleaned data — 76% F-measure with 45% noise after cleaning [Chart: SWT F-measure (0–100) for noisy vs. cleaned data at 0%, 15%, 30%, and 45% noise levels] 36
  67. Study limitations • All datasets are collected from open source projects • The golden set used in this paper may not be perfect • The noisy data simulations may not reflect the actual noise patterns in practice 37
  68. Summary • Prediction models (used in our experiments) are resistant to noise levels of up to 20~30% • Noise detection is promising • Future work - Building oracle defect sets - Improving noise detection algorithms - Applying to more defect prediction models (regression, BugCache) 38