Analyzing Text Preprocessing and Feature Selection Methods for Sentiment Analysis - Nirav Raje
This was a research project for an undergraduate academic seminar. It analyzed the impact of various text preprocessing techniques, feature weighting schemes (FF, FP, TF-IDF), feature selection methods (filters, wrappers, embedded), lemmatization, and tokenization (unigram, bigram, and 1-to-3-gram) on three open Twitter datasets.
These are the slides from a presentation Terry T. Um gave at Kookmin University on 22 June 2014. Feel free to share them, and please let me know if you find any misconceptions or errors.
(http://t-robotics.blogspot.com)
(http://terryum.io)
In this PDF you will find the basics of Turbo Prolog 2.0, with some good example programs and their output. The second part is coming in the next week or month.
For any query: sohupatel8828@gmail.com
For programs: https://github.com/UltraHopeful/Turbo-Prolog-2.0
NLP techniques used for spell checking: detect errors in a written word and suggest a relevant replacement.
Algorithms: Jaccard coefficient, Levenshtein distance
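A minimal plain-Python sketch of both measures for spell-check candidate ranking (the word list and bigram choice are illustrative assumptions, not the project's actual setup):

```python
def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard coefficient over character n-grams (bigrams by default)."""
    A = {a[i:i + n] for i in range(len(a) - n + 1)}
    B = {b[i:i + n] for i in range(len(b) - n + 1)}
    return len(A & B) / len(A | B) if A | B else 1.0

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# Suggest the dictionary word closest to a misspelling (toy word list).
dictionary = ["receive", "believe", "separate", "definitely"]
word = "recieve"
print(max(dictionary, key=lambda w: jaccard(word, w)))      # receive
print(min(dictionary, key=lambda w: levenshtein(word, w)))  # receive
```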
Research on character-level language modelling using LSTMs for semi-supervised learning. The objective is to learn the inner-layer representations of the language model and transfer them to a classification model.
Generalizing NLP pipelines by using bidirectional LSTMs to learn character (byte)-level embeddings of financial news headlines, up to 8 bits per symbol (values 0 to 2**8 - 1, i.e. UTF-8 code units), in order to study the relationships between character vectors and transfer the learned representations into classification models. Many traditional NLP steps (lemmatization, POS tagging, NER, stemming, ...) are skipped when working at the byte level, making the process universal in scope rather than task-specific.
Build an LLM-powered application using LangChain.pdf - AnastasiaSteele10
LangChain is a framework for building language-model-powered applications. It provides tools, components, and interfaces that simplify LLM-based development: managing interactions with language models, chaining components together, and integrating resources such as APIs and databases. The platform includes a set of APIs that can be embedded in applications, letting developers add language-processing capabilities without starting from scratch.
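A minimal sketch of that pattern, assuming the classic LangChain API (PromptTemplate + LLMChain) and an OPENAI_API_KEY in the environment; module paths vary across LangChain versions, so treat the imports as indicative:

```python
# Hedged sketch: import paths differ between LangChain releases.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["product"],
    template="Suggest one catchy name for a company that makes {product}.",
)
llm = OpenAI(temperature=0.7)             # reads OPENAI_API_KEY from the env
chain = LLMChain(llm=llm, prompt=prompt)  # chains prompt -> model
print(chain.run("artisanal coffee"))
```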
A Beginner's Guide to Machine Learning with Scikit-Learn - Sarah Guido
Given at the PyData NYC 2013 conference (http://vimeo.com/79517341), and will be given at PyTennessee 2014.
Scikit-learn is one of the most well-known machine learning Python modules in existence. But how does it work, and what, for that matter, is machine learning? For those with programming experience but who are new to machine learning, this talk gives a beginner-level overview of how machine learning can be useful, important machine learning concepts, and how to implement them with scikit-learn. We’ll use real world data to look at supervised and unsupervised machine learning algorithms and why scikit-learn is useful for performing these tasks.
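A minimal example of the supervised workflow such a talk typically walks through, using scikit-learn's bundled iris data (the model choice is illustrative):

```python
# Train/test split, fit a classifier, and measure accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```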
Knowledge Based Reasoning: Agents, Facets of Knowledge. Logic and Inferences: Formal Logic, Propositional and First Order Logic, Resolution in Propositional and First Order Logic, Deductive Retrieval, Backward Chaining, Second Order Logic. Knowledge Representation: Conceptual Dependency, Frames, Semantic Nets.
Introduction to Pandas and Time Series Analysis [PyCon DE] - Alexander Hendorf
Most data is allocated to a period or to some point in time. We can gain a lot of insight by analyzing what happened when. The better the quality and accuracy of our data, the better our predictions can become.
Unfortunately, the data we have to deal with is often aggregated, for example on a monthly basis. But not all months are the same: they may have 28 or 31 days, or four or five weekends. The data is made to fit our calendar, which was made to fit the earth's orbit around the sun, not to please data scientists.
Dealing with periodic data can be a challenge. This talk shows how you can handle it with pandas.
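A short illustration of the kind of period handling pandas provides: months really are unequal, and normalizing a monthly aggregate by days-in-month makes values comparable (the data here is synthetic; frequency aliases follow the classic pandas convention):

```python
import numpy as np
import pandas as pd

# Months differ in length; pandas knows by how much.
months = pd.period_range("2024-01", periods=4, freq="M")
print(months.days_in_month.tolist())   # [31, 29, 31, 30]

# Daily observations summed per month, then converted to per-day rates.
days = pd.date_range("2024-01-01", "2024-04-30", freq="D")
s = pd.Series(np.random.default_rng(0).poisson(100, len(days)), index=days)
monthly = s.resample("M").sum()
print(monthly / monthly.index.days_in_month)  # comparable across months
```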
Intro to Machine Learning with H2O and AWS - Sri Ambati
Navdeep Gill @ Galvanize Seattle - May 2016
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Methods of Optimization in Machine Learning - Knoldus Inc.
In this session we discuss various methods to optimize a machine learning model and how to adjust the hyper-parameters to minimize the cost function.
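As a concrete illustration of one common approach (not necessarily the session's exact method), here is a cross-validated grid search over SVM hyper-parameters with scikit-learn:

```python
# Exhaustive search over a small hyper-parameter grid with 5-fold CV.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]},
    cv=5, scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```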
** Data Science Certification using R: https://www.edureka.co/data-science **
This Edureka PPT on "Predictive Analytics Using R" will help you learn how predictive analytics works and how it can be implemented using R to solve real-world problems. Topics covered in this module:
What is Predictive Analytics?
Stages of Predictive Analytics
Predictive Analytics Using R
Predictive Analytics Use case
Demo
Blog Series: http://bit.ly/data-science-blogs
Data Science Training Playlist: http://bit.ly/data-science-playlist
Follow us to never miss an update in the future.
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Overview of the course. Introduction to image sciences, image processing, and computer vision. Basics of machine learning, terminologies, paradigms. No-free-lunch theorem. Supervised versus unsupervised learning. Clustering and K-Means. Classification and regression. Linear least squares and polynomial curve fitting. Model complexity and overfitting. Curse of dimensionality. Dimensionality reduction and principal component analysis. Image representation, semantic gap, image features, and classical computer vision pipelines.
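For a flavor of two of the listed topics, here is a short, illustrative scikit-learn snippet combining PCA for dimensionality reduction with K-Means clustering (the digits dataset is an arbitrary stand-in for the course's material):

```python
# Reduce 64-dimensional digit images to 2D, then cluster into 10 groups.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = load_digits(return_X_y=True)
X2 = PCA(n_components=2).fit_transform(X)     # 64 dims -> 2 dims
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X2)
print(X2.shape, labels[:10])
```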
Evaluating LLM Models for Production Systems: Methods and Practices - alopatenko
This webinar is designed to offer a comprehensive understanding of the evaluation processes for LLMs, particularly in the context of preparing these models for deployment in production environments.
Key Highlights of the Seminar:
In-Depth Analysis of LLM Evaluation Methods: Gain insights into a variety of methods to evaluate LLM models, understanding their strengths and weaknesses.
End-to-End Evaluation Techniques: Explore how LLM augmented systems are assessed from a holistic perspective.
Pragmatic Approach to System Deployment: Learn practical strategies for applying these evaluation techniques to systems intended for real-world application.
Focused Overview on Critical LLM Aspects: Receive an overview of various evaluation techniques that are essential for assessing the most crucial elements of modern LLM systems.
Simplifying the Evaluation Process: Understand how to streamline the evaluation process, making the work of LLM scientists more efficient and productive.
Dr. Andrei Lopatenko is a seasoned expert and executive leader with over 15 years of experience in the tech industry, focusing on search engines, recommendation systems, and large-scale AI, ML, and NLP applications. He has contributed significantly to major companies like Google, Apple, Walmart, eBay, and Zillow, benefiting billions of customers. Dr. Lopatenko earned his PhD in Computer Science from the University of Manchester. He played a key role in developing Google's search engine, initiating Apple Maps, co-founding a Conversational AI startup acquired by Facebook/Meta, and leading Search, LLM, and Generative AI at Zillow.
This is a deep learning presentation based on deep neural networks. It reviews the deep learning concept, related work, and specific application areas. It describes a use-case scenario of deep learning and highlights the current trends and research issues in deep learning.
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015) - Sung Kim
Yida's presentation at MSR 2015!
Abstract—Developers expend significant effort on reviewing source code changes, hence the comprehensibility of code changes directly affects development productivity. Our prior study has suggested that composite code changes, which mix multiple development issues together, are typically difficult to review. Unfortunately, our manual inspection of 453 open source code changes reveals a non-trivial occurrence (up to 29%) of such composite changes.
In this paper, we propose a heuristic-based approach to automatically partition composite changes, such that each sub-change in the partition is more cohesive and self-contained. Our quantitative and qualitative evaluation results are promising in demonstrating the potential benefits of our approach for facilitating code review of composite code changes.
Heuristic design of experiments w meta gradient search - Greg Makowski
Once you have started learning about predictive algorithms, and the basic knowledge discovery in databases process, what is the next level of detail to learn for a consulting project?
* Give examples of the many model training parameters
* Track results in a "model notebook"
* Use a model metric that combines both accuracy and generalization to rank models
* How to strategically search over the model training parameters - use a gradient descent approach (a toy sketch follows this list)
* One way to describe an arbitrarily complex predictive system is by using sensitivity analysis
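A toy sketch of the gradient-style search idea from the bullets above, assuming nothing about the talk's actual tooling: hill-climb one training parameter at a time toward a better cross-validated score (the model and grid are illustrative):

```python
# Greedy coordinate search over hyper-parameters, one axis at a time.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
grid = {"n_estimators": [50, 100, 200], "max_depth": [2, 3, 4]}
current = {k: v[0] for k, v in grid.items()}

def score(params):
    model = GradientBoostingClassifier(random_state=0, **params)
    return cross_val_score(model, X, y, cv=3).mean()

best = score(current)
improved = True
while improved:                       # stop when no single move helps
    improved = False
    for name, values in grid.items():
        for v in values:
            trial = {**current, name: v}
            s = score(trial)
            if s > best:
                best, current, improved = s, trial, True
print(current, round(best, 4))
```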
• Implemented a Path Oriented Decision Making (PODEM) algorithm for an Automatic Test Generator (ATG) for combinational logic circuits with re-convergent fan-out.
• The generated test vectors were verified using deductive fault simulation.
• The fault coverage after implementing Random Test Generator (RTG) was calculated and plotted.
• The ATG, RTG, and fault simulator were all written in more than 1,200 lines of Python.
Updated slides for my talk at the CHAQ meeting in Antwerp. I also added slides on some of my experiences on performing empirical studies with open source and industrial software systems.
Automation of building reliable models - Eszter Szabó
The volume and velocity of bioactivity data available in public or in-house sources represent an immense opportunity to be exploited in novel compound design. An ever wider array of targets with labelled data necessitates efficient solutions for building a large number of individual models. The velocity of data growth makes it possible to achieve higher accuracy through continuous re-training of the existing models. Automatic re-training maximizes the applicability domain and minimizes the risk of an accuracy drop as a project expands into novel chemical series.
Comparing Machine Learning Algorithms in Text Mining - Andrea Gigli
In this project I compare different machine learning algorithms on different text mining tasks.
ML algorithms: Naive Bayes, Support Vector Machines, Decision Trees, Random Forests, and Ordinal Regression.
Tasks considered: classifying positive and negative reviews, predicting review stars, quantifying sentiment over time, and detecting fake reviews. A compact example follows.
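A compact, hypothetical version of such a comparison: the same TF-IDF features fed to two of the listed algorithms on a tiny inline corpus (not the project's data):

```python
# Compare two classifiers on the review-polarity task with shared features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

docs = ["great product, loved it", "terrible, broke in a day",
        "excellent value", "awful customer service"]
labels = [1, 0, 1, 0]                      # 1 = positive review

for clf in (MultinomialNB(), LinearSVC()):
    model = make_pipeline(TfidfVectorizer(), clf).fit(docs, labels)
    print(type(clf).__name__, model.predict(["loved the service"]))
```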
This paper advances the Domain Segmentation based on Uncertainty in the Surrogate (DSUS) framework, a novel approach to characterizing the uncertainty in surrogates. The leave-one-out cross-validation technique is adopted in the DSUS framework to measure local errors of a surrogate. A method is proposed to evaluate the performance of the leave-one-out cross-validation errors as local error measures. This method evaluates local errors by comparing (i) the leave-one-out cross-validation error with (ii) the actual local error estimated within a local hypercube for each training point. The comparison results show that the leave-one-out cross-validation strategy can capture the local errors of a surrogate. The DSUS framework is then applied to key aspects of wind resource assessment and wind farm cost modeling. The uncertainties in the wind farm cost and the wind power potential are successfully characterized, which gives designers/users more confidence when using these models.
MuVM: Higher Order Mutation Analysis Virtual Machine for C - Susumu Tokumoto
Mutation analysis is a method for evaluating the effectiveness of a test suite by seeding faults artificially and measuring the fraction of seeded faults detected by the test suite. The major limitation of mutation analysis is its lengthy execution time because it involves generating, compiling and running large numbers of mutated programs, called mutants. Our tool MuVM achieves a significant runtime improvement by performing higher order mutation analysis using four techniques, metamutation, mutation on virtual machine, higher order split-stream execution, and online adaptation technique. In order to obtain the same behavior as mutating the source code directly, metamutation preserves the mutation location information which may potentially be lost during bitcode compilation and optimization. Mutation on a virtual machine reduces the compilation and testing cost by compiling a program once and invoking a process once. Higher order split-stream execution also reduces the testing cost by executing common parts of the mutants together and splitting the execution at a seeded fault. Online adaptation technique reduces the number of generated mutants by omitting infeasible mutants. Our comparative experiments indicate that our tool is significantly superior to an existing tool, an existing technique (mutation schema generation), and no-split-stream execution in higher order mutation.
Dependability Benchmarking by Injecting Software Bugs - Roberto Natella
Benchmarks have been an established practice for performance evaluation in the computer industry for decades. Examples of successful benchmarking initiatives are the TPC (Transaction Processing Performance Council) and SPEC (Standard Performance Evaluation Corporation). More recently, the research community developed the notion of dependability benchmarking, which evaluates the quality of service (throughput, availability, etc.) of competing products in the presence of faults, using fault injection. The idea of dependability benchmarking has been applied in several domains including transaction processing, telecom, and automotive.
Given that software faults (bugs) are a major cause of failures, it becomes important to assess dependability against these faults. However, emulating software faults in a controlled fault injection experiment is a difficult problem, since bugs originate from human error. This presentation discusses the open challenges and recent advances in the field of emulating software bugs in a representative way.
Developers often wonder how to implement a certain functionality (e.g., how to parse XML files) using APIs. Obtaining an API usage sequence based on an API-related natural language query is very helpful in this regard. Given a query, existing approaches utilize information retrieval models to search for matching API sequences. These approaches treat queries and APIs as bags-of-words and lack a deep understanding of the semantics of the query.
We propose DeepAPI, a deep learning based approach to generate API usage sequences for a given natural language query. Instead of a bag-of-words assumption, it learns the sequence of words in a query and the sequence of associated APIs. DeepAPI adapts a neural language model named RNN Encoder-Decoder. It encodes a word sequence (user query) into a fixed-length context vector, and generates an API sequence based on the context vector. We also augment the RNN Encoder-Decoder by considering the importance of individual APIs. We empirically evaluate our approach with more than 7 million annotated code snippets collected from GitHub. The results show that our approach generates largely accurate API sequences and outperforms the related approaches.
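Not the authors' code: a minimal PyTorch sketch of the RNN Encoder-Decoder pattern the abstract describes (a query word sequence is encoded into a fixed-length context vector, from which an API-token sequence is decoded); all vocabulary sizes and dimensions here are made up.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the word sequence into a fixed-length context vector.
        _, context = self.encoder(self.src_emb(src))
        # Condition the decoder on that context to emit API tokens.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), context)
        return self.out(dec_out)   # logits over the API vocabulary

model = EncoderDecoder(src_vocab=1000, tgt_vocab=500)
query = torch.randint(0, 1000, (1, 6))   # e.g. "how to parse xml files"
apis = torch.randint(0, 500, (1, 4))     # shifted gold API sequence
print(model(query, apis).shape)          # torch.Size([1, 4, 500])
```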
Defect, defect, defect: PROMISE 2012 Keynote - Sung Kim
Software prediction leveraging repositories has received a tremendous amount of attention within the software engineering community, including PROMISE. In this talk, I will first present great achievements in defect prediction research including new defect prediction features, promising algorithms, and interesting analysis results. However, there are still many challenges in defect prediction. I will talk about them and discuss potential solutions for them leveraging prediction 2.0.
Globus Compute with IRI Workflows - GlobusWorld 2024 - Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work, the team is investigating ways to speed up the time to solution for many different parts of the DIII-D workflow, including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks, and we describe a brief proof of concept showing how Globus Compute could help schedule jobs and serve as a tool to connect compute at different facilities.
Providing Globus Services to Users of JASMIN for Environmental Data Analysis - Globus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ... - Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc. I didn't get rich from it, but my extensions reached 63K downloads (powering possibly tens of thousands of websites).
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
Paketo Buildpacks: the best way to build OCI images? DevopsDa... - Anthony Dahanne
Buildpacks have been around for more than 10 years! They were first used to detect and build an application before deploying it on certain PaaS platforms. Then, with their latest generation, the Cloud Native Buildpacks (a CNCF incubating project), we became able to build Docker (OCI) images. Are they a good alternative to Dockerfiles? What are the Paketo buildpacks? Which communities support them, and how?
Come find out in this ignite session.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Field Employee Tracking System | MiTrack App | Best Employee Tracking Solution | ... - informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient... - Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. This is where custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
A Comprehensive Look at Generative AI in Retail App Testing.pdf - kalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Quarkus Hidden and Forbidden Extensions - Max Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
May Marketo Masterclass, London MUG May 22 2024.pdf - Adele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx - rickgrimesss22
Discover the essential features to incorporate in your Winzo clone app to boost business growth, enhance user engagement, and drive revenue. Learn how to create a compelling gaming experience that stands out in the competitive market.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv... - Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G... - Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
2. Software Defect Prediction
[Slide diagram: a model is trained on labeled instances from Project A and then predicts unlabeled instances. Legend: metric value; buggy-labeled instance; clean-labeled instance; ?: unlabeled instance.]
Related work: Munson@TSE`92, Basili@TSE`95, Menzies@TSE`07, Hassan@ICSE`09, Bird@FSE`11, D'Ambros@EMSE`12, Lee@FSE`11, ...
3. What if labeled instances do not exist?
[Slide diagram: Project X has only an unlabeled dataset (?: unlabeled instance; metric values), so no model can be trained. This situation arises in new projects and in projects lacking historical data.]
7. Key Idea
• Consistent defect-proneness tendency of metrics
  - Defect prediction metrics measure the complexity of software and its development process, e.g.:
    - the number of developers touching a source code file (Bird@FSE`11)
    - the number of methods in a class (D'Ambros@ESEJ`12)
    - the number of operands (Menzies@TSE`08)
  - More complexity implies more defect-proneness (Rahman@ICSE`13).
• Distributions between source and target should be the same to build a strong prediction model.
=> Match source and target metrics that have similar distributions.
9. Metric Selection
• Why? (Guyon@JMLR`03)
  - Select informative metrics: remove redundant and irrelevant metrics.
  - Decrease the complexity of the metric matching combination.
• Feature selection approaches (Gao@SPE`11, Shivaji@TSE`13):
  - Gain Ratio
  - Chi-square
  - Relief-F
  - Significance attribute evaluation
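Not from the paper: of the four approaches listed, chi-square is the simplest to demonstrate with scikit-learn; a hypothetical top-k selection over synthetic metric data (k=5 is an arbitrary choice, not the paper's setting):

```python
# Keep the k metrics most associated with the buggy/clean label.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
X = rng.random((100, 20))            # 100 instances x 20 source metrics
y = rng.integers(0, 2, 100)          # buggy / clean labels

selector = SelectKBest(chi2, k=5).fit(X, y)   # chi2 needs non-negative X
print("selected metric indices:", selector.get_support(indices=True))
```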
10. Metric Matching
[Slide diagram: source metrics X1, X2 are matched to target metrics Y1, Y2 by matching score, e.g. X1-Y1 = 0.8 and X2-Y2 = 0.5.]
* We can apply different cutoff values to the matching score.
* It is possible that there is no matching at all.
11. Compute Matching Score
KSAnalyzer
• Uses the p-value of the Kolmogorov-Smirnov test (Massey@JASA`51).
Matching score M of the i-th source metric and the j-th target metric: M_ij = p_ij
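Not the paper's implementation: a small SciPy sketch of the KSAnalyzer idea, using the two-sample Kolmogorov-Smirnov p-value as the matching score on synthetic data; the greedy pairing below stands in for a proper maximum-weight bipartite matching.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
source = rng.normal(size=(200, 3))            # 3 source metrics
target = rng.normal(size=(150, 2))            # 2 target metrics

# M_ij = p-value of the KS test between source metric i and target metric j.
M = np.array([[ks_2samp(source[:, i], target[:, j]).pvalue
               for j in range(target.shape[1])]
              for i in range(source.shape[1])])

cutoff = 0.05                                  # drop poorly matched pairs
pairs = [(i, j) for i in range(M.shape[0]) for j in range(M.shape[1])
         if M[i, j] > cutoff]

# Greedily keep the highest-scoring disjoint pairs.
used_i, used_j, match = set(), set(), []
for i, j in sorted(pairs, key=lambda p: -M[p]):
    if i not in used_i and j not in used_j:
        match.append((i, j)); used_i.add(i); used_j.add(j)
print(match)
```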
14. Baselines
• WPDP: within-project defect prediction.
• CPDP-CM (Turhan@EMSE`09, Ma@IST`12, He@IST`14): cross-project defect prediction using only the common metrics between source and target datasets.
• CPDP-IFS (He@CoRR`14): cross-project defect prediction on an Imbalanced Feature Set (i.e., heterogeneous metric sets); 16 distributional characteristics of an instance's values serve as features (e.g., mean, std, maximum, ...).
15. Research Questions (RQs)
• RQ1: Is heterogeneous defect prediction comparable to WPDP?
• RQ2: Is heterogeneous defect prediction comparable to CPDP-CM?
• RQ3: Is heterogeneous defect prediction comparable to CPDP-IFS?
16. Benchmark Datasets

Group    Dataset       Instances  Buggy (%)     Metrics  Granularity
AEEEM    EQ            325        129 (39.7%)   61       Class
AEEEM    JDT           997        206 (20.7%)   61       Class
AEEEM    LC            399        64 (9.36%)    61       Class
AEEEM    ML            1862       245 (13.2%)   61       Class
AEEEM    PDE           1492       209 (14.0%)   61       Class
MORPH    ant-1.3       125        20 (16.0%)    20       Class
MORPH    arc           234        27 (11.5%)    20       Class
MORPH    camel-1.0     339        13 (3.8%)     20       Class
MORPH    poi-1.5       237        141 (75.0%)   20       Class
MORPH    redaktor      176        27 (15.3%)    20       Class
MORPH    skarbonka     45         9 (20.0%)     20       Class
MORPH    tomcat        858        77 (9.0%)     20       Class
MORPH    velocity-1.4  196        147 (75.0%)   20       Class
MORPH    xalan-2.4     723        110 (15.2%)   20       Class
MORPH    xerces-1.2    440        71 (16.1%)    20       Class
ReLink   Apache        194        98 (50.5%)    26       File
ReLink   Safe          56         22 (39.3%)    26       File
ReLink   ZXing         399        118 (29.6%)   26       File
NASA     cm1           327        42 (12.8%)    37       Function
NASA     mw1           253        27 (10.7%)    37       Function
NASA     pc1           705        61 (8.7%)     37       Function
NASA     pc3           1077       134 (12.4%)   37       Function
NASA     pc4           1458       178 (12.2%)   37       Function
SOFTLAB  ar1           121        9 (7.4%)      29       Function
SOFTLAB  ar3           63         8 (12.7%)     29       Function
SOFTLAB  ar4           107        20 (18.7%)    29       Function
SOFTLAB  ar5           36         8 (22.2%)     29       Function
SOFTLAB  ar6           101        15 (14.9%)    29       Function

600 prediction combinations in total!
17. Experimental Settings
• Logistic Regression
• HDP vs. WPDP, CPDP-CM, and CPDP-IFS
[Slide diagram: each dataset is split into a 50% training set and a 50% test set, repeated 1000 times, across projects 1..n; HDP is compared against WPDP, CPDP-CM, and CPDP-IFS.]
25. Different Feature Selections (median AUCs, Win/Tie/Loss)

Approach      vs. WPDP       vs. CPDP-CM    vs. CPDP-IFS   HDP
              AUC    Win%    AUC    Win%    AUC    Win%    AUC
Gain Ratio    0.657  63.7%   0.645  63.2%   0.536  80.2%   0.720
Chi-Square    0.657  64.7%   0.651  66.4%   0.556  82.3%   0.727
Significance  0.657  66.2%   0.636  66.2%   0.553  82.0%   0.724
Relief-F      0.670  57.0%   0.657  63.1%   0.543  80.5%   0.709
None          0.657  47.3%   0.624  50.3%   0.536  66.3%   0.663
26. Results in Different Cutoffs

Cutoff  vs. WPDP       vs. CPDP-CM    vs. CPDP-IFS   HDP      Target Coverage
        AUC    Win%    AUC    Win%    AUC    Win%    AUC
0.05    0.657  66.2%   0.636  66.2%   0.553  82.4%   0.724*   100%
0.90    0.657  100%    0.761  71.4%   0.624  100%    0.852*   21%
27. Conclusion
• HDP: potential for CPDP across datasets with different metric sets.
• Future work:
  - Filtering out noisy metric matching
  - Determining the best probability threshold
Here is Project A and some software entities. Let's say these entities are source code files.
I want to predict whether these files are buggy or clean.
To do this, we need a prediction model.
Since defect prediction models are trained by machine learning algorithms, we need labeled instances collected from previous releases.
This is a labeled instance. An instance consists of features and a label.
Various software metrics such as LoC, # of functions in a file, and # of authors touching a source file, are used as features for machine learning.
Software metrics measure complexity of software and its development process
Each instance can be labeled by past bug information.
Software metrics and past bug information can be collected from software archives such as version control systems and bug report systems.
With these labeled instances, we can build a prediction model and predict the unlabeled instances.
This prediction is conducted within the same project. So, we call this Within-project defect prediction (WPDP).
There are many studies on WPDP, and they showed good prediction performance (e.g., prediction accuracy around 0.7).
What if there are no labeled instances? This can happen in new projects and in projects lacking historical data.
New projects do not have past defect information to label instances.
Some projects also do not have defect information because they lack historical data in their software archives.
When I participated in an industrial project for Samsung Electronics, it was really difficult to generate labeled instances because their software archives were not well managed by developers.
So, in some real industrial projects, we may not be able to generate labeled instances to build a prediction model.
Without labeled instances, we cannot build a prediction model.
After experiencing this limitation in industry, I decided to address this problem.
There are existing solutions to build a prediction model for unlabeled datasets.
The first solution is cross-project defect prediction. We can reuse labeled instances from other projects.
Various feature selection approaches can be applied.
By doing that, we can investigate how higher matching scores impact defect prediction performance.
16 distribution characteristics: mode, median, mean, harmonic mean, minimum, maximum, range, variation ratio, first quartile, third quartile, interquartile range, variance, standard deviation, coefficient of variance, skewness, and kurtosis.
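Not from the paper's replication package: a rough sketch of how those 16 characteristics could be computed for one instance's metric values with NumPy/SciPy; exact definitions (e.g., of variation ratio, or harmonic mean on non-positive values) are my assumptions.

```python
import numpy as np
from scipy import stats

def distribution_features(v):
    v = np.asarray(v, dtype=float)
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    mode = stats.mode(v, keepdims=False).mode
    return [mode, med, v.mean(),
            stats.hmean(np.abs(v) + 1e-9),    # assumption: shift to positives
            v.min(), v.max(), v.max() - v.min(),
            1 - (v == mode).mean(),           # variation ratio (assumed def.)
            q1, q3, q3 - q1, v.var(), v.std(),
            v.std() / v.mean(),               # coefficient of variance
            stats.skew(v), stats.kurtosis(v)]

print(len(distribution_features([1, 2, 2, 3, 10])))  # 16
```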
AEEEM: object-oriented (OO) metrics, previous-defect metrics, entropy metrics of change and code, and churn-of-source-code metrics [4].
MORPH: McCabe’s cyclomatic metrics, CK metrics, and other OO metrics [36].
ReLink: code complexity metrics
NASA: Halstead metrics and McCabe’s cyclomatic metrics, additional complexity metrics such as parameter count and percentage of comments
SOFTLAB: Halstead metrics and McCabe’s cyclomatic metrics
all 222 prediction combinations among the 600 predictions