Developers are Different
[Figure: % of buggy changes (0-80) for developers A, B, C, D and their average, per code construct (Modulo %, FOR, Bitwise OR, CONTINUE). Data: Linux Kernel, 2005-2010.]
Personalized models can improve performance.
Contributions
• Personalized Change Classification (PCC)
  ✦ One model for each developer
• Confidence-based Hybrid PCC (PCC+)
  ✦ Picks the prediction with the highest confidence
• Evaluation on six C and Java projects
  ✦ Finds up to 155 more bugs than CC by inspecting 20% of LOC
  ✦ Improves F1 by up to 0.08
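PCC trains one model per developer and routes each change to the model of its author. A minimal Python sketch of that routing, assuming changes arrive as (author, tokens, label) triples. `TokenRateModel`, `train_pcc`, and `pcc_prob` are hypothetical names; the classifier is a deliberately simple placeholder, and falling back to one global (CC-style) model for developers with no history is an assumption, not something stated on this slide.

```python
from collections import Counter, defaultdict


class TokenRateModel:
    """Hypothetical placeholder classifier: Laplace-smoothed buggy rate per token."""

    def fit(self, docs, labels):
        self.buggy, self.seen = Counter(), Counter()
        for tokens, label in zip(docs, labels):
            for t in set(tokens):
                self.seen[t] += 1
                self.buggy[t] += label == "buggy"
        return self

    def prob_buggy(self, tokens):
        # Average of (buggy + 1) / (seen + 2) over the change's tokens
        rates = [(self.buggy[t] + 1) / (self.seen[t] + 2) for t in tokens]
        return sum(rates) / len(rates) if rates else 0.5


def train_pcc(changes):
    """(author, tokens, label) triples -> one model per author, plus a global model."""
    per_author = defaultdict(lambda: ([], []))
    for author, tokens, label in changes:
        per_author[author][0].append(tokens)
        per_author[author][1].append(label)
    models = {a: TokenRateModel().fit(docs, labels)
              for a, (docs, labels) in per_author.items()}
    global_model = TokenRateModel().fit([t for _, t, _ in changes],
                                        [lab for _, _, lab in changes])
    return models, global_model


def pcc_prob(models, global_model, author, tokens):
    """Use the author's own model; fall back to the global model (assumption)."""
    return models.get(author, global_model).prob_buggy(tokens)


changes = [
    ("A", ["%", "for"], "buggy"), ("A", ["if"], "clean"),
    ("B", ["%", "for"], "clean"), ("B", ["|"], "buggy"),
]
models, cc_model = train_pcc(changes)
for dev in ("A", "B", "C"):
    # A 0.67, B 0.33, C 0.5 (C has no history, so the global model is used)
    print(dev, round(pcc_prob(models, cc_model, dev, ["%"]), 2))
```

With this toy data the same `%` change looks risky for developer A and safe for developer B, the kind of per-developer difference that motivates a personalized model.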
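What is a Change?
One commit, split into one change per modified file:
Commit: 09a02f...
Author: John Smith
Message: I submitted some code.
  Change 1: file1.c (3 lines added, 1 deleted)
  Change 2: file2.c (1 line added, 1 deleted)
  Change 3: file3.c (2 lines added, 1 deleted)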
Change-Level: Inspect less code to locate a bug.
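The figure treats each file touched by a commit as a separate change. A minimal Python sketch of that split, assuming the commit is available as git's unified diff text. `split_into_changes` is a hypothetical helper; it ignores renames, binary files, and paths that contain " b/".

```python
def split_into_changes(diff_text):
    """Split one commit's unified diff (git format) into one change per file."""
    changes = {}
    current, in_hunk = None, False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # Header "diff --git a/<path> b/<path>": keep the b/ path
            current = line.split(" b/", 1)[1]
            changes[current] = {"added": [], "deleted": []}
            in_hunk = False
        elif line.startswith("@@"):
            in_hunk = True
        elif current is None or not in_hunk:
            continue  # index/---/+++ header lines before the first hunk
        elif line.startswith("+"):
            changes[current]["added"].append(line[1:])
        elif line.startswith("-"):
            changes[current]["deleted"].append(line[1:])
    return changes


diff_text = """\
diff --git a/file1.c b/file1.c
--- a/file1.c
+++ b/file1.c
@@ -1 +1,3 @@
-int d;
+int a;
+int b;
+int c;
diff --git a/file2.c b/file2.c
--- a/file2.c
+++ b/file2.c
@@ -1 +1 @@
-x = 0;
+x = 1;
diff --git a/file3.c b/file3.c
--- a/file3.c
+++ b/file3.c
@@ -1 +1,2 @@
-y = 1;
+y = 2;
+z = 3;
"""

changes = split_into_changes(diff_text)
for path, c in changes.items():
    print(path, len(c["added"]), len(c["deleted"]))
```

Tracking whether the parser is inside a hunk keeps the `---`/`+++` file headers from being miscounted as deleted or added lines.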
Change Classification (CC)
Training Phase
1. Label changes in the software history as clean or buggy (training instances)
2. Extract features from the training instances
3. Build a prediction model with a classification algorithm
Prediction Phase
4. Use the model to predict whether future instances are clean or buggy
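The four steps can be sketched end to end in Python. This is an illustration under stated assumptions, not the setup used in the talk: bag-of-words features (tokens of added lines, deleted lines, and the commit message) and a multinomial naive Bayes classifier stand in for whatever feature set and classification algorithm a study would actually use, and `extract_features`, `train_cc`, and `predict_cc` are hypothetical names.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"\w+|[^\w\s]")


def extract_features(change):
    """Step 2: bag-of-words features (an assumed, simplified feature set)."""
    feats = Counter()
    for line in change["added"]:
        feats.update("add:" + t for t in TOKEN.findall(line))
    for line in change["deleted"]:
        feats.update("del:" + t for t in TOKEN.findall(line))
    feats.update("msg:" + w.lower() for w in change["message"].split())
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (one possible algorithm)."""

    def fit(self, X, y):
        self.prior = Counter(y)
        self.counts = {"buggy": Counter(), "clean": Counter()}
        for feats, label in zip(X, y):
            self.counts[label].update(feats)
        self.vocab = set(self.counts["buggy"]) | set(self.counts["clean"])
        return self

    def prob_buggy(self, feats):
        n = sum(self.prior.values())
        log_p = {}
        for c in ("buggy", "clean"):
            total = sum(self.counts[c].values())
            lp = math.log((self.prior[c] + 1) / (n + 2))
            for f, k in feats.items():
                lp += k * math.log((self.counts[c][f] + 1) / (total + len(self.vocab)))
            log_p[c] = lp
        m = max(log_p.values())  # subtract the max before exp() to avoid underflow
        b, cl = math.exp(log_p["buggy"] - m), math.exp(log_p["clean"] - m)
        return b / (b + cl)


def train_cc(history):
    """Steps 1-3: labeled history -> features -> model."""
    X = [extract_features(c) for c in history]
    y = [c["label"] for c in history]
    return NaiveBayes().fit(X, y)


def predict_cc(model, change, threshold=0.5):
    """Step 4: classify a future change."""
    p = model.prob_buggy(extract_features(change))
    return "buggy" if p >= threshold else "clean"


history = [
    {"added": ["if (i < 128)"], "deleted": [], "message": "new feature", "label": "buggy"},
    {"added": ["return 0;"], "deleted": [], "message": "cleanup", "label": "clean"},
]
model = train_cc(history)
print(predict_cc(model, {"added": ["if (i < 128)"], "deleted": [], "message": "another feature"}))  # buggy
```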
Label Clean or Buggy [Sliwerski et al. ’05]
Revision History
Buggy Change
  Commit: 7a3bc...
  Message: new feature
  fileA.c
  +...
  +if (i < 128)
  +...
Bug-Fixing Change
  Commit: 1da57...
  Message: I fixed a bug
  fileA.c
  -if (i < 128)
  +if (i <= 128)
• Bug-fixing change: the commit message contains the keyword “fix” or the ID of a manually verified bug report [Herzig et al. ’13]
• Buggy change: a change fixed by a later change; git blame traces the lines modified by the bug-fixing change back to the change that introduced them
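The labeling step can be sketched in Python under simplifying assumptions. Instead of running git, the hypothetical `label_changes` takes a `blame` callable that plays the role of `git blame` on the fix's parent revision, and it assumes file paths do not change between the buggy change and its fix. The keyword test is the crude substring check the slide describes, so it also matches words such as "prefix". The line number in the example is hypothetical.

```python
import re


def is_bug_fix(message, verified_bug_ids=()):
    """Message contains "fix", or the ID of a manually verified bug report."""
    if "fix" in message.lower():  # crude: also matches e.g. "prefix"
        return True
    return any(re.search(r"\b" + re.escape(b) + r"\b", message) for b in verified_bug_ids)


def label_changes(commits, blame, verified_bug_ids=()):
    """
    commits: dicts with "id", "message", "files" (files touched) and
             "modified": [(file, line_no), ...] lines deleted or changed by the commit
    blame:   stand-in for `git blame` on the parent revision:
             (commit_id, file, line_no) -> id of the commit that introduced that line
    Returns {(commit_id, file): "buggy" | "clean"}, one label per change.
    """
    labels = {(c["id"], f): "clean" for c in commits for f in c["files"]}
    for c in commits:
        if is_bug_fix(c["message"], verified_bug_ids):
            for file, line_no in c["modified"]:
                labels[(blame(c["id"], file, line_no), file)] = "buggy"
    return labels


commits = [
    {"id": "7a3bc", "message": "new feature", "files": ["fileA.c"], "modified": []},
    {"id": "1da57", "message": "I fixed a bug", "files": ["fileA.c"],
     "modified": [("fileA.c", 2)]},  # hypothetical line of "if (i < 128)"
]
origin = {("1da57", "fileA.c", 2): "7a3bc"}  # hypothetical blame result
labels = label_changes(commits, lambda c, f, n: origin[(c, f, n)])
print(labels)  # {('7a3bc', 'fileA.c'): 'buggy', ('1da57', 'fileA.c'): 'clean'}
```

The fix itself stays clean unless a still later fix blames one of its lines.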
Confidence Measure
• Bugginess
  ✦ Probability of a change being buggy
• Confidence Measure
  ✦ A measure of confidence that is comparable across models
• PCC+ selects the prediction with the highest confidence.
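The slide does not define the confidence measure itself. The sketch below assumes confidence = max(P(buggy), 1 - P(buggy)), i.e. how far a model's bugginess is from an undecided 0.5, and assumes the bugginess values from different models are already comparable; `confidence` and `pcc_plus` are hypothetical names.

```python
def confidence(p_buggy):
    """Assumed confidence measure: distance of P(buggy) from 0.5, in [0.5, 1]."""
    return max(p_buggy, 1.0 - p_buggy)


def pcc_plus(bugginess_by_model, threshold=0.5):
    """Pick the most confident model for one change.

    bugginess_by_model maps a model name (e.g. "CC", "PCC") to its P(buggy).
    Returns (chosen model, predicted label).
    """
    name, p = max(bugginess_by_model.items(), key=lambda kv: confidence(kv[1]))
    return name, ("buggy" if p >= threshold else "clean")


print(pcc_plus({"CC": 0.55, "PCC": 0.10}))  # ('PCC', 'clean')
```

Here PCC wins because 0.10 is much further from 0.5 than CC's 0.55, so its "clean" prediction is taken.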
Research Questions
• RQ1: Do PCC and PCC+ outperform CC?
• RQ2: Does PCC outperform CC in other setups?
  ✦ Classification algorithms
  ✦ Sizes of training sets
Two Metrics
• F1-Score
  ✦ Harmonic mean of precision and recall
• Cost Effectiveness
  ✦ Relevant in cost-sensitive scenarios
  ✦ NofB20: number of bugs discovered by inspecting the top 20% of lines of code
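Both metrics are short in code. F1 follows the standard definition. For NofB20 the sketch assumes changes are inspected in descending order of predicted bugginess, inspection stops before the next change would exceed 20% of the total LOC, and each buggy change counts as one bug; these ranking and counting rules are assumptions (ranking by bugginess per line of code is another option), and `f1_score` and `nofb20` are hypothetical names.

```python
def f1_score(predicted, actual, positive="buggy"):
    """Harmonic mean of precision and recall for the positive (buggy) class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def nofb20(changes, budget=0.20):
    """changes: (p_buggy, loc, is_buggy) triples.

    Inspect changes in descending p_buggy order until the LOC budget
    (a fraction of total LOC) would be exceeded; count buggy changes found.
    """
    limit = budget * sum(loc for _, loc, _ in changes)
    inspected = found = 0
    for p, loc, buggy in sorted(changes, key=lambda c: c[0], reverse=True):
        if inspected + loc > limit:
            break
        inspected += loc
        found += buggy
    return found


print(f1_score(["buggy", "buggy", "clean", "clean"],
               ["buggy", "clean", "buggy", "clean"]))  # 0.5
print(nofb20([(0.9, 10, True), (0.8, 10, False), (0.7, 10, True), (0.1, 70, True)]))  # 1
```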
Test Subjects
Project        Language   LOC    # of Changes
Linux kernel   C          7.3M   429K
PostgreSQL     C          289K   89K
Xorg           C          1.1M   46K
Eclipse        Java       1.5M   73K
Lucene*        Java       828K   76K
Jackrabbit*    Java       589K   61K
* With manually labelled bug report data [Herzig et al. ’13]
Related Work
• Kim et al., Classifying software changes: Clean or buggy?, TSE ’08
• Bettenburg et al., Think locally, act globally: Improving defect and effort prediction models, MSR ’12
Conclusions & Future Work
• PCC and PCC+ improve prediction performance.
• The improvement is also present in other setups.
• The personalized approach can be applied to other fields:
  ✦ Recommendation systems
  ✦ Vulnerability prediction
  ✦ Top crash prediction