Dealing with Noise in Bug Prediction
Sunghun Kim, Hongyu Zhang, Rongxin Wu and Liang Gong
The Hong Kong University of Science & Technology
Tsinghua University
Where are the bugs?
Modified files! [Nagappan et al.]
Complex files! [Menzies et al.]
Nearby other bugs! [Zimmermann et al.]
Previously fixed files! [Hassan et al.]
2
Prediction model
[Diagram: training instances (features + labels) feed a Learner,
which then predicts the label of a new, unknown (?) instance]
3
Training on software evolution is key
• Software features can be used to predict bugs
• Defect labels are obtained from software evolution
• Supervised learning algorithms
[Diagram: Version Archive + Bug Database]
4
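A minimal sketch of how such training instances might be assembled, assuming file-level metrics mined from the version archive and a set of bug-fixed files from the bug database. All file names, metrics, and numbers below are made up for illustration:

```python
# Sketch: building labeled training instances from a version archive and a
# bug database. Field names and values are illustrative assumptions.

def label_instances(file_metrics, files_fixed_for_bugs):
    """Pair each file's features with a defect label mined from history.

    file_metrics: {filename: [feature values, e.g. LOC, churn]}
    files_fixed_for_bugs: filenames touched by bug-fix commits
    """
    instances = []
    for fname, features in sorted(file_metrics.items()):
        label = "buggy" if fname in files_fixed_for_bugs else "clean"
        instances.append((features, label))
    return instances

# Toy example: two files with (LOC, churn) features; one was bug-fixed.
metrics = {"Parser.java": [420, 17], "Utils.java": [80, 2]}
fixed = {"Parser.java"}
data = label_instances(metrics, fixed)
```

These (features, label) pairs are exactly the "training instances" the prediction-model diagram feeds to the learner.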
Change classification
[Diagram: past changes labeled clean or bug-introducing ("bad", marked X);
BUILD A LEARNER from the labeled changes, then PREDICT QUALITY of a new change]
5
Kim, Whitehead Jr., Zhang: Classifying Software Changes: Clean or Buggy? (TSE 2008)
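The build-then-predict loop above can be sketched in a few lines. A 1-nearest-neighbour rule stands in here for a real learner such as the SVM used in the TSE 2008 paper; the change features (lines added, lines deleted, files touched) are illustrative assumptions:

```python
# Minimal sketch of change classification: train on labeled changes
# (feature vectors + clean/buggy labels), then predict a new change.
# 1-NN is a stand-in for the paper's actual learner.

def predict(training, new_change):
    """Return the label of the training change closest to new_change."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(training, key=lambda inst: dist(inst[0], new_change))
    return label

training = [
    ([120, 40, 5], "buggy"),   # large, scattered changes
    ([110, 35, 4], "buggy"),
    ([3, 1, 1], "clean"),      # small, focused changes
    ([8, 2, 1], "clean"),
]
print(predict(training, [100, 30, 4]))  # closest to the buggy examples
```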
Source Repository / Bug Database
[Diagram: all commits C contain bug fixes Cf, which contain linked fixes Cfl;
all bugs B contain fixed bugs Bf, which contain linked fixed bugs Bfl;
linked fixes Cfl are tied to linked fixed bugs Bfl via log messages.
Noise! Many fixes are related to fixed bugs but never linked.]
7
Bird et al. "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009
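The linking step in the diagram is typically a log-message scan. A sketch, where the bug-ID regex and the commit data are illustrative assumptions, not the exact heuristics Bird et al. studied:

```python
import re

# Scan commit log messages for bug identifiers (e.g. "fix #123", "bug 456")
# and join them against the fixed bugs in the bug database.
BUG_ID = re.compile(r"(?:bug|fix(?:es|ed)?|issue)\s*#?\s*(\d+)", re.IGNORECASE)

def link_commits(commits, fixed_bug_ids):
    """Return (linked fixes Cfl, linked fixed bugs Bfl)."""
    linked_fixes, linked_bugs = [], set()
    for sha, message in commits:
        ids = {int(m) for m in BUG_ID.findall(message)}
        hits = ids & fixed_bug_ids
        if hits:
            linked_fixes.append(sha)
            linked_bugs |= hits
    return linked_fixes, linked_bugs

commits = [
    ("a1", "Fix #101: null check in parser"),
    ("b2", "refactor build scripts"),
    ("c3", "bug 202 off-by-one in loop"),
]
cfl, bfl = link_commits(commits, {101, 202, 303})
# Bug 303 was fixed but never mentioned in a log message: it stays
# unlinked, which is exactly the bias/noise the slide points at.
```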
Effect of training on superbiased data (Severity)
[Bar chart: Bug Recall (0%-100%) for models trained on all bugs,
on biased data 1, and on biased data 2]
8
Bird et al. "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009
Effect of training on superbiased data (Severity)
Bias in bug severity affects BugCache
[Same bar chart: Bug Recall for all bugs vs. biased data 1 vs. biased data 2]
10
Bird et al. "Fair and Balanced? Bias in Bug-Fix Datasets," FSE 2009
Study questions
• Q1: How resistant is a defect prediction model to noise?
• Q2: How much noise can be detected and removed?
• Q3: Can we remove noise to improve defect prediction performance?
12
Q1: How resistant is a defect prediction model to noise?
[Line chart: Buggy F-measure (y, 0 to 1.0) vs. training-set false negative (FN)
& false positive (FP) rate (x, 0 to 0.6), for SWT, Debug, Columba,
Eclipse, and Scarab]
25
Q1: How resistant is a defect prediction model to noise?
[Same chart, annotated: Buggy F-measure stays stable up to a 20~30%
FN/FP rate, then degrades]
26
Study questions
• Q1: How resistant is a defect prediction model to noise?
• Q2: How much noise can be detected and removed?
• Q3: Can we remove noise to improve defect prediction performance?
27
Detecting noise
[Diagram: from the original training set, (1) removing buggy labels
introduces false negative noise; (2) adding buggy labels introduces
false positive noise]
28
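The label-flipping procedure sketched in the diagram is straightforward to implement. The rates, data, and function name below are illustrative assumptions:

```python
import random

# Starting from a (presumed) golden set, flip a fraction of buggy labels
# to clean (false negatives) or clean labels to buggy (false positives).

def inject_noise(labels, rate, kind, rng):
    """Flip `rate` fraction of labels; kind='fn' flips buggy->clean,
    kind='fp' flips clean->buggy. Returns a new label list."""
    src, dst = ("buggy", "clean") if kind == "fn" else ("clean", "buggy")
    idx = [i for i, l in enumerate(labels) if l == src]
    noisy = list(labels)
    for i in rng.sample(idx, int(len(idx) * rate)):
        noisy[i] = dst
    return noisy

golden = ["buggy"] * 10 + ["clean"] * 10
noisy = inject_noise(golden, 0.3, "fn", random.Random(0))
# 3 of the 10 buggy instances are now mislabelled as clean.
```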
30
Detecting noise
"... sets. However, it is very hard to get a golden set. In our approach,
we carefully select high-quality datasets and assume them to be the
golden sets. We then add FPs and FNs intentionally to create a noise
set. To add FPs and FNs, we randomly select instances in a golden set
and artificially change their labels from buggy to clean or from clean
to buggy, inspired by experiments in [4]. ... To make FN data sets
(for RQ1), we randomly select n% buggy ..."
[Figure 4. Creating a biased training set: original training data with
injected false positive and false negative noise]
[Figure 9. The pseudo-code of the CLNI (Closest List Noise Identification)
algorithm]
31
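A much-simplified, single-pass sketch of the CLNI idea: an instance whose nearest neighbours overwhelmingly carry the opposite label is flagged as likely mislabelled. The parameters (k, threshold) and the non-iterative structure are simplifications of the full algorithm in Figure 9:

```python
# Simplified CLNI-style noise flagging (the real algorithm iterates over
# a ranked closest list until the flagged set stabilises).

def clni_flags(instances, k=3, threshold=0.99):
    """instances: list of (feature_vector, label). Returns flagged indices."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    flagged = []
    for i, (fi, li) in enumerate(instances):
        neighbours = sorted(
            (dist(fi, fj), lj) for j, (fj, lj) in enumerate(instances) if j != i
        )[:k]
        disagree = sum(1 for _, lj in neighbours if lj != li)
        if disagree / k >= threshold:
            flagged.append(i)
    return flagged

data = [([0, 0], "clean"), ([0, 1], "clean"), ([1, 0], "clean"),
        ([0.5, 0.5], "buggy"),            # sits inside the clean cluster
        ([9, 9], "buggy"), ([9, 8], "buggy"), ([8, 9], "buggy")]
print(clni_flags(data))  # -> [3]
```

Only the "buggy" point embedded in the clean cluster is flagged; the genuine buggy cluster is left alone.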
Study questions
• Q1: How resistant is a defect prediction model to noise?
• Q2: How much noise can be detected and removed?
• Q3: Can we remove noise to improve defect prediction performance?
34
Bug prediction using cleaned data
[Bar chart: SWT F-measure (0-100) at noise levels 0%, 15%, 30%, 45%,
for noisy vs. cleaned training data; with cleaning, 76% F-measure
even with 45% noise]
36
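For reference, the (buggy) F-measure plotted here is the harmonic mean of precision and recall on the buggy class. The counts below are illustrative, chosen only to show a case that works out to 0.76:

```python
# Buggy F-measure from confusion-matrix counts on the buggy class.

def buggy_f_measure(tp, fp, fn):
    precision = tp / (tp + fp)   # predicted-buggy that are truly buggy
    recall = tp / (tp + fn)      # truly-buggy that were predicted buggy
    return 2 * precision * recall / (precision + recall)

# e.g. 38 buggy files found, 12 false alarms, 12 misses:
print(round(buggy_f_measure(38, 12, 12), 2))  # -> 0.76
```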
Study limitations
• All datasets are collected from open source
projects
• The golden set used in this paper may not be
perfect
• The noisy data simulations may not reflect
the actual noise patterns in practice
37
Summary
• Prediction models (used in our experiments)
are resistant to noise (up to 20~30%)
• Noise detection is promising
• Future work
- Building oracle defect sets
- Improving noise detection algorithms
- Applying to more defect prediction models
(regression, BugCache)
38