Software Defect Prediction
on Unlabeled Datasets
- PhD Thesis Defence -
July 23, 2015
Jaechang Nam
Department of Computer Science and Engineering
HKUST
Software Defect Prediction
• General question of software defect
prediction
– Can we identify defect-prone entities (source
code file, binary, module, change,...) in advance?
• # of defects
• buggy or clean
• Why? (applications)
– Quality assurance for large software
(Akiyama@IFIP’71)
– Effective resource allocation
• Testing (Menzies@TSE`07, Kim@FSE`15)
• Code review (Rahman@FSE’11)
2
[Figure: a model is trained on the labeled instances of Project A and predicts its unlabeled instances. Legend: metric value; buggy-labeled instance; clean-labeled instance; ?: unlabeled instance]
Software Defect Prediction
Related Work
Munson@TSE`92, Basili@TSE`95, Menzies@TSE`07,
Hassan@ICSE`09, Bird@FSE`11, D’Ambros@EMSE`12,
Lee@FSE`11,...
What if labeled instances do not
exist?
[Figure: Project X has only an unlabeled dataset (?: unlabeled instance), so there are no labeled instances to train a model]
Typical cases:
• New projects
• Projects lacking in historical data
This problem is...
[Figure: the unlabeled dataset of Project X (?: unlabeled instance)]
Software Defect Prediction
on Unlabeled Datasets
Existing Solutions?
[Figure: the unlabeled dataset of (new) Project X (?: unlabeled instance)]
Solution 1
Cross-Project Defect Prediction
(CPDP)
[Figure: a model is trained on the labeled instances of Project A (source) and predicts the unlabeled dataset of Project X (target). Legend: metric value; buggy-labeled instance; clean-labeled instance; ?: unlabeled instance]
Related Work
Watanabe@PROMISE08, Turhan@EMSE`09
Zimmermann@FSE`09, Ma@IST`12,
Zhang@MSR`14
Challenges
• Worse than WPDP
– Only 2% out of 622 CPDP combinations worked (Zimmermann@FSE`09)
• Heterogeneous metrics between source and target
– CPDP needs the same metric set (same feature space)
Solution 2
Using Only Unlabeled Datasets
[Figure: a model is trained on the unlabeled dataset of Project X and then predicts that same dataset]
Related Work
Zhong@HASE`04,
Catal@ITNG`09
Challenge
• Manual effort (human intervention)
9
Software Defect Prediction
on Unlabeled Datasets
Sub-problems                                              Proposed Techniques
CPDP comparable to WPDP?                                  Transfer Defect Learning (TCA+)
CPDP across projects with heterogeneous metric sets?      Heterogeneous Defect Prediction (HDP)
DP using only unlabeled datasets without human effort?    CLAMI
CPDP
• Reason for poor prediction
performance of CPDP
– Different distributions of source and target
datasets (Pan et al@TKDE`09)
11
TCA+
12
Normalization + Transfer Component Analysis (TCA)
• Normalization: "Normalize us together!"
• TCA: "Oops, we are different! Let’s meet at another world!"
– Projecting datasets into a latent feature space (Source, Target → New Source, New Target)
• Goal: make the different distributions of source and target similar!
Data Normalization
• Adjust all metric values to the same scale
– E.g., make mean = 0 and std = 1
• Known to help classification algorithms improve prediction performance (Han@`12).
13
Normalization Options
• N1: Min-max Normalization (max=1, min=0) [Han et
al., 2012]
• N2: Z-score Normalization (mean=0, std=1) [Han et
al., 2012]
• N3: Z-score Normalization only using source mean
and standard deviation
• N4: Z-score Normalization only using target mean
and standard deviation
• NoN: No normalization
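The options above can be sketched in plain Python. This is only an illustration under stated assumptions, not the thesis implementation: the function names are hypothetical, each metric is handled as one list of values, the population standard deviation is used, and N1 and N2 are assumed to use each dataset's own statistics.

```python
from statistics import mean, pstdev


def min_max(values):
    """N1: rescale one metric so that min = 0 and max = 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant metric: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values, mu, sigma):
    """Z-score one metric with a given mean and standard deviation."""
    if sigma == 0:  # constant reference metric: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize(source, target, option):
    """Normalize one metric of the source and target datasets."""
    if option == "N1":
        return min_max(source), min_max(target)
    if option == "N2":  # assumption: each dataset uses its own mean and std
        return (z_score(source, mean(source), pstdev(source)),
                z_score(target, mean(target), pstdev(target)))
    if option == "N3":  # both datasets use the source mean and std
        mu, sigma = mean(source), pstdev(source)
        return z_score(source, mu, sigma), z_score(target, mu, sigma)
    if option == "N4":  # both datasets use the target mean and std
        mu, sigma = mean(target), pstdev(target)
        return z_score(source, mu, sigma), z_score(target, mu, sigma)
    if option == "NoN":
        return list(source), list(target)
    raise ValueError(f"unknown option: {option}")


# Made-up values of one metric in a source and a target project
source_metric = [10, 20, 30, 40]
target_metric = [100, 300, 500]

src_n1, tgt_n1 = normalize(source_metric, target_metric, "N1")
src_n3, tgt_n3 = normalize(source_metric, target_metric, "N3")
```

With N3, the target values are expressed in units of the source distribution, so a target project with a much larger scale stays visibly larger after normalization; with N1 or N2, both projects end up on the same range.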
14
Decision Rules for Normalization
• Find a suitable normalization
• Steps
– #1: Characterize a dataset
– #2: Measure similarity
between source and target datasets
– #3: Decision rules
15
Decision Rules for Normalization
#1: Characterize a dataset
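[Figure: instances of Dataset A and Dataset B drawn as numbered points, with distances such as d1,2, d1,3, and d3,11 marked between pairs of instances]
$DIST_A = \{ d_{ij} : 1 \le i < n,\ 1 < j \le n,\ i < j \}$, where $d_{ij}$ is the distance between instances $i$ and $j$ of Dataset A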
Decision Rules for Normalization
#2: Measure Similarity between source and
target
[Figure: pairwise distances d_ij between instances, shown for Dataset A and Dataset B]
Similarity is measured on these characteristics of each dataset:
• Minimum (min) and maximum (max) values of
DIST
• Mean and standard deviation (std) of DIST
• The number of instances
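A small sketch of how these characteristics could be computed for one dataset. The function name is hypothetical, and Euclidean distance and the population standard deviation are assumptions made for the example; the slide only says that pairwise distances are used.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def characterize(instances):
    """Summarize a dataset by the distances between all pairs of instances."""
    # DIST = {d_ij : i < j}; Euclidean distance is an assumption here
    dists = [dist(a, b) for a, b in combinations(instances, 2)]
    return {
        "min": min(dists),
        "max": max(dists),
        "mean": mean(dists),
        "std": pstdev(dists),
        "num_instances": len(instances),
    }


# Made-up metric vectors (one tuple per instance)
dataset_a = [(0, 0), (3, 4), (6, 8)]
dataset_b = [(0, 0), (30, 40), (60, 80), (90, 120)]

char_a = characterize(dataset_a)
char_b = characterize(dataset_b)
```

Comparing `char_a` and `char_b` field by field (for example, whether the max of one dataset is much larger than the other's) gives the kind of similarity information the decision rules rely on.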
Decision Rules for Normalization
#3: Decision Rules
• Rule #1
– Mean and Std are the same → NoN
• Rule #2
– Max and Min are different → N1 (max=1, min=0)
• Rule #3, #4
– Std and # of instances are different → N3 or N4 (src/tgt mean=0, std=1)
• Rule #5
– Default → N2 (mean=0, std=1)
18
TCA
• Key idea
Source Target
New Source New Target
Oops, we are different! Let’s meet at another world!
(Projecting datasets into a latent feature space)
19
TCA (cont.)
[Figure: source domain data and target domain data, shown before and after applying TCA. Legend: buggy source instances; clean source instances; buggy target instances; clean target instances] (Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis)
TCA+
22
Normalization + Transfer Component Analysis (TCA)
• Normalization: "Normalize us together with a suitable option!"
• TCA: "Oops, we are different! Let’s meet at another world!"
– Projecting datasets into a latent feature space (Source, Target → New Source, New Target)
• Goal: make the different distributions of source and target similar!
EVALUATION
23
Research Questions
• RQ1
– What is the cross-project prediction performance
of TCA/TCA+ compared to WPDP?
• RQ2
– What is the cross-project prediction performance
of TCA/TCA+ compared to that of CPDP without
TCA/TCA+?
24
Experimental Setup
• 8 software subjects
• Machine learning algorithm
– Logistic regression
Dataset group                    Projects                                                                    # of metrics (features)
ReLink (Wu et al.@FSE`11)        Apache, Safe, ZXing                                                         26 (source code)
AEEEM (D’Ambros et al.@MSR`10)   Apache Lucene (LC), Equinox (EQ), Eclipse JDT, Eclipse PDE UI, Mylyn (ML)   61 (source code, churn, entropy, …)
25
Experimental Design
Test set
(50%)
Training set
(50%)
Within-project defect prediction (WPDP)
26
Experimental Design
Target project (Test set)
Source project (Training set)
Cross-project defect prediction (CPDP)
27
Experimental Design
Target project (Test set)
Source project (Training set)
Cross-project defect prediction with TCA/TCA+
TCA/TCA+
28
RESULTS
29
ReLink Result
Representative 3 out of 6 combinations
[Figure: bar chart of F-measure (0 to 0.8) for WPDP, CPDP, TCA, and TCA+ on Safe → Apache, Apache → Safe, and Safe → ZXing]
*CPDP: Cross-project defect prediction without TCA/TCA+
30
ReLink Result
F-measure
Source → Target   WPDP   CPDP   TCA    TCA+
Safe → Apache     0.64   0.52   0.64   0.64
ZXing → Apache    0.64   0.69   0.64   0.72
Apache → Safe     0.62   0.49   0.72   0.72
ZXing → Safe      0.62   0.59   0.70   0.64
Apache → ZXing    0.33   0.46   0.45   0.49
Safe → ZXing      0.33   0.10   0.42   0.53
Average           0.53   0.49   0.59   0.61
WPDP is reported once per target project; CPDP, TCA, and TCA+ are cross-project results.
*CPDP: Cross-project defect prediction without TCA/TCA+
AEEEM Result
Representative 3 out of 20 combinations
[Figure: bar chart of F-measure (0 to 0.7) for WPDP, CPDP, TCA, and TCA+ on JDT → EQ, PDE → LC, and PDE → ML]
*CPDP: Cross-project defect prediction without TCA/TCA+
32
AEEEM Result
F-measure
Source → Target   WPDP   CPDP   TCA    TCA+
JDT → EQ          0.58   0.31   0.59   0.60
LC → EQ           0.58   0.50   0.62   0.62
ML → EQ           0.58   0.24   0.56   0.56
…                 …      …      …      …
PDE → LC          0.37   0.33   0.27   0.33
EQ → ML           0.30   0.19   0.62   0.62
JDT → ML          0.30   0.27   0.56   0.56
LC → ML           0.30   0.20   0.58   0.60
PDE → ML          0.30   0.27   0.48   0.54
…                 …      …      …      …
Average           0.42   0.32   0.41   0.41
WPDP is reported once per target project; CPDP, TCA, and TCA+ are cross-project results.
33
Related Work
Transfer learning   Metric Compensation       NN Filter                       TNB                        TCA+
Preprocessing       N/A                       Feature selection, Log-filter   Log-filter                 Normalization
Machine learner     C4.5                      Naive Bayes                     TNB                        Logistic Regression
# of Subjects       2                         10                              10                         8
# of predictions    2                         10                              10                         26
Avg. f-measure      0.67 (W:0.79, C:0.58)     0.35 (W:0.37, C:0.26)           0.39 (NN: 0.35, C:0.33)    0.46 (W:0.46, C:0.36)
Citation            Watanabe@PROMISE`08       Turhan@ESEJ`09                  Ma@IST`12                  Nam@ICSE`13
* NN = Nearest neighbor, W = Within, C = Cross
34
35
Software Defect Prediction
on Unlabeled Datasets
Sub-problems                                              Proposed Techniques
CPDP comparable to WPDP?                                  Transfer Defect Learning (TCA+)
CPDP across projects with heterogeneous metric sets?      Heterogeneous Defect Prediction (HDP)
DP using only unlabeled datasets without human effort?    CLAMI
Motivation
[Figure: CPDP trains a model on Project A (source) and tests it on the unlabeled dataset of Project B (target); both need the same metric set (same feature space). In the experiments of TCA+, datasets in ReLink (e.g., Apache, Safe) could not be paired with datasets in AEEEM (e.g., JDT)]
Motivation
[Figure: a model is trained on Project A (source) and tested on Project C (target), which has a heterogeneous metric set (a different feature space or a different domain)]
Possible to Reuse all the existing defect datasets for CPDP!
Heterogeneous Defect Prediction (HDP)
Key Idea
• Most defect prediction metrics
– Measure the complexity of software and its development process.
• e.g.
– The number of developers touching a source code file (Bird@FSE`11)
– The number of methods in a class (D’Ambros@ESEJ`12)
– The number of operands (Menzies@TSE`08)
More complexity implies more defect-proneness (Rahman@ICSE`13)
Match source and target metrics that have similar distributions
Heterogeneous Defect Prediction (HDP)
- Overview -
40
Source: Project A
X1 X2 X3 X4 Label
1 1 3 10 Buggy
8 0 1 0 Clean
⋮ ⋮ ⋮ ⋮ ⋮
9 0 1 1 Clean
Target: Project B
Y1 Y2 Y3 Y4 Y5 Y6 Y7 Label
3 1 1 0 2 1 9 ?
1 1 9 0 2 3 8 ?
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
0 1 1 1 2 1 1 ?
(1) Metric Selection (source)
X1 X3 X4 Label
1 3 10 Buggy
8 1 0 Clean
⋮ ⋮ ⋮ ⋮
9 1 1 Clean
(2) Metric Matching (target metrics matched to the selected source metrics)
Y7 Y6 Y3 Label
9 1 1 ?
8 3 9 ?
⋮ ⋮ ⋮ ⋮
1 1 1 ?
(3) Cross-prediction model: build (training) on the matched source metrics, predict (test) on the matched target metrics
Metric Selection
• Why? (Guyon@JMLR`03)
– Select informative metrics
• Remove redundant and irrelevant metrics
– Decrease complexity of metric matching
combinations
• Feature Selection Approaches
(Gao@SPE`11,Shivaji@TSE`13)
– Gain Ratio
– Chi-square
– Relief-F
– Significance attribute evaluation
41
Metric Matching
42
[Figure: source metrics X1 and X2 connected to target metrics Y1 and Y2, with matching scores such as 0.8 and 0.5 on the connections]
* We can apply different cutoff values to the matching score.
* It is possible that no metrics match at all.
Compute Matching Score
KSAnalyzer
• Use p-value of Kolmogorov-Smirnov Test
(Massey@JASA`51)
43
Matching score M of the i-th source and j-th target metrics:
$M_{ij} = p_{ij}$, where $p_{ij}$ is the p-value of the Kolmogorov-Smirnov test on the two metrics
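The scoring step can be sketched with the standard library only. Everything named here is hypothetical, and the p-value uses the asymptotic Kolmogorov series, which is an approximation for small samples (a statistics package may compute an exact value instead). The sketch only scores metric pairs and filters them by a cutoff; it does not choose a one-to-one assignment among the remaining candidates.

```python
from bisect import bisect_right
from math import exp, sqrt


def ks_statistic(xs, ys):
    """Largest gap between the empirical CDFs of two samples."""
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    for v in xs + ys:
        cdf_x = bisect_right(xs, v) / len(xs)
        cdf_y = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(cdf_x - cdf_y))
    return gap


def ks_p_value(xs, ys, terms=100):
    """Asymptotic two-sided p-value of the two-sample KS test."""
    d = ks_statistic(xs, ys)
    lam = sqrt(len(xs) * len(ys) / (len(xs) + len(ys))) * d
    if lam < 0.3:  # the series converges slowly here; the p-value is ~1
        return 1.0
    total = sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * total))


def matching_scores(source, target, cutoff=0.05):
    """Return {(source metric, target metric): M_ij} for scores above cutoff."""
    scores = {}
    for s_name, s_values in source.items():
        for t_name, t_values in target.items():
            p = ks_p_value(s_values, t_values)
            if p > cutoff:  # assumption: a pair must exceed the cutoff
                scores[(s_name, t_name)] = p
    return scores


# Made-up metric values: X1 and Y1 are similar, the other pairs are not
source = {"X1": list(range(1, 21)), "X2": [v * 100 for v in range(1, 21)]}
target = {"Y1": list(range(2, 22)), "Y2": [v * 1000 for v in range(1, 21)]}

scores = matching_scores(source, target, cutoff=0.05)
```

Raising `cutoff` keeps only pairs whose distributions are very similar, at the price of leaving some source and target pairs with no matched metrics at all.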
EVALUATION
45
Baselines
• WPDP
• CPDP-CM (Turhan@EMSE`09,Ma@IST`12,He@IST`14)
– Cross-project defect prediction using only
common metrics between source and target
datasets
• CPDP-IFS (He@CoRR`14)
– Cross-project defect prediction on
Imbalanced Feature Set (i.e. heterogeneous
metric set)
– 16 distributional characteristics of values of
an instance as features (e.g., mean, std,
maximum,...)
46
Research Questions (RQs)
• RQ1
– Is heterogeneous defect prediction comparable
to WPDP?
• RQ2
– Is heterogeneous defect prediction comparable
to CPDP-CM?
• RQ3
– Is Heterogeneous defect prediction comparable
to CPDP-IFS?
47
Benchmark Datasets
Group     Dataset        # of instances (All)   Buggy (%)      # of metrics   Granularity
AEEEM     EQ             325                    129 (39.7%)    61             Class
AEEEM     JDT            997                    206 (20.7%)    61             Class
AEEEM     LC             399                    64 (9.36%)     61             Class
AEEEM     ML             1862                   245 (13.2%)    61             Class
AEEEM     PDE            1492                   209 (14.0%)    61             Class
MORPH     ant-1.3        125                    20 (16.0%)     20             Class
MORPH     arc            234                    27 (11.5%)     20             Class
MORPH     camel-1.0      339                    13 (3.8%)      20             Class
MORPH     poi-1.5        237                    141 (75.0%)    20             Class
MORPH     redaktor       176                    27 (15.3%)     20             Class
MORPH     skarbonka      45                     9 (20.0%)      20             Class
MORPH     tomcat         858                    77 (9.0%)      20             Class
MORPH     velocity-1.4   196                    147 (75.0%)    20             Class
MORPH     xalan-2.4      723                    110 (15.2%)    20             Class
MORPH     xerces-1.2     440                    71 (16.1%)     20             Class
ReLink    Apache         194                    98 (50.5%)     26             File
ReLink    Safe           56                     22 (39.3%)     26             File
ReLink    ZXing          399                    118 (29.6%)    26             File
NASA      cm1            327                    42 (12.8%)     37             Function
NASA      mw1            253                    27 (10.7%)     37             Function
NASA      pc1            705                    61 (8.7%)      37             Function
NASA      pc3            1077                   134 (12.4%)    37             Function
NASA      pc4            1458                   178 (12.2%)    37             Function
SOFTLAB   ar1            121                    9 (7.4%)       29             Function
SOFTLAB   ar3            63                     8 (12.7%)      29             Function
SOFTLAB   ar4            107                    20 (18.7%)     29             Function
SOFTLAB   ar5            36                     8 (22.2%)      29             Function
SOFTLAB   ar6            101                    15 (14.9%)     29             Function
600 prediction combinations in total!
Experimental Settings
• Logistic Regression
• HDP vs. WPDP, CPDP-CM, and CPDP-IFS
[Figure: for WPDP, the target project (Project A) is randomly split into a training set (50%) and a test set (50%), repeated 1000 times; CPDP-CM, CPDP-IFS, and HDP build their models from the other projects (Project 1, Project 2, ..., Project n)]
Evaluation Measures
• False Positive Rate = FP/(TN+FP)
• True Positive Rate = Recall
• AUC (Area Under receiver operating characteristic Curve)
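The measures above can be illustrated with a short sketch. The function names and the toy labels and scores are made up for the example; AUC is computed through its rank interpretation, the probability that a randomly chosen buggy instance receives a higher score than a randomly chosen clean one.

```python
def rates(labels, scores, threshold=0.5):
    """Return (false positive rate, true positive rate) at one threshold.

    labels: 1 = buggy, 0 = clean; scores: predicted probability of buggy.
    """
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    return fp / (tn + fp), tp / (tp + fn)


def auc(labels, scores):
    """AUC as the chance that a buggy instance outranks a clean one."""
    buggy = [s for y, s in zip(labels, scores) if y == 1]
    clean = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if b > c else 0.5 if b == c else 0.0
               for b in buggy for c in clean)
    return wins / (len(buggy) * len(clean))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]

fpr, tpr = rates(labels, scores)   # one point on the ROC curve
area = auc(labels, scores)         # threshold-independent summary
```

Unlike the false and true positive rates, which depend on the chosen threshold, AUC summarizes the whole ROC curve, which makes it convenient for comparing models across many prediction combinations.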
[Figure: ROC curve; x-axis: false positive rate (0 to 1); y-axis: true positive rate (0 to 1)]
Evaluation Measures
• Win/Tie/Loss (Valentini@ICML`03, Li@JASE`12, Kocaguneli@TSE`13)
– Wilcoxon signed-rank test (p<0.05) for 1000
prediction results
– Win
• # of HDP prediction combinations that outperform the baseline with
statistical significance (p<0.05)
– Tie
• # of HDP prediction combinations with no statistically significant
difference from the baseline (p≥0.05)
– Loss
• # of HDP prediction combinations that the baseline outperforms with
statistical significance (p<0.05)
51
RESULT
52
Prediction Results in median
AUC
Target         WPDP    CPDP-CM   CPDP-IFS   HDPKS (cutoff=0.05)
EQ             0.583   0.776     0.461      0.783
JDT            0.795   0.781     0.543      0.767
LC             0.575   0.636     0.584      0.655
ML             0.734   0.651     0.557      0.692*
PDE            0.684   0.682     0.566      0.717
ant-1.3        0.670   0.611     0.500      0.701
arc            0.670   0.611     0.523      0.701
camel-1.0      0.550   0.590     0.500      0.639
poi-1.5        0.707   0.676     0.606      0.537
redaktor       0.744   0.500     0.500      0.537
skarbonka      0.569   0.736     0.528      0.694*
tomcat         0.778   0.746     0.640      0.818
velocity-1.4   0.725   0.609     0.500      0.391
xalan-2.4      0.755   0.658     0.499      0.751
xerces-1.2     0.624   0.453     0.500      0.489
Apache         0.714   0.689     0.635      0.717*
Safe           0.706   0.749     0.616      0.818*
ZXing          0.605   0.619     0.530      0.650*
cm1            0.653   0.622     0.551      0.717*
mw1            0.612   0.584     0.614      0.727
pc1            0.787   0.675     0.564      0.752*
pc3            0.794   0.665     0.500      0.738*
pc4            0.900   0.773     0.589      0.682*
ar1            0.582   0.464     0.500      0.734*
ar3            0.574   0.862     0.682      0.823*
ar4            0.657   0.588     0.575      0.816*
ar5            0.804   0.875     0.585      0.911*
ar6            0.654   0.611     0.527      0.640
All            0.657   0.636     0.555      0.724*
HDPKS: Heterogeneous defect prediction using KSAnalyzer
Win/Tie/Loss Results
               Against WPDP        Against CPDP-CM     Against CPDP-IFS
Target         W      T     L      W      T     L      W      T     L
EQ             4      0     0      2      2     0      4      0     0
JDT            0      0     5      3      0     2      5      0     0
LC             6      0     1      3      3     1      3      1     3
ML             0      0     6      4      2     0      6      0     0
PDE            3      0     2      2      0     3      5      0     0
ant-1.3        6      0     1      6      0     1      5      0     2
arc            3      1     0      3      0     1      4      0     0
camel-1.0      3      0     2      3      0     2      4      0     1
poi-1.5        2      0     2      3      0     1      2      0     2
redaktor       0      0     4      2      0     2      3      0     1
skarbonka      11     0     0      4      0     7      9      0     2
tomcat         2      0     0      1      1     0      2      0     0
velocity-1.4   0      0     3      0      0     3      0      0     3
xalan-2.4      0      0     1      1      0     0      1      0     0
xerces-1.2     0      0     3      3      0     0      1      0     2
Apache         6      0     5      8      1     2      9      0     2
Safe           14     0     3      12     0     5      15     0     2
ZXing          8      0     0      6      0     2      7      0     1
cm1            7      1     2      8      0     2      9      0     1
mw1            5      0     1      4      0     2      4      0     2
pc1            1      0     5      5      0     1      6      0     0
pc3            0      0     7      7      0     0      7      0     0
pc4            0      0     7      2      0     5      7      0     0
ar1            14     0     1      14     0     1      11     0     4
ar3            15     0     0      5      0     10     10     2     3
ar4            16     0     0      14     1     1      15     0     1
ar5            14     0     4      14     0     4      16     0     2
ar6            7      1     7      8      4     3      12     0     3
Total          147    3     72     147    14    61     182    3     35
%              66.2%  1.4%  32.4%  66.2%  6.3%  27.5%  82.0%  1.3%  16.7%
Matched Metrics (Win)
[Figure: distributions of metric values for the matched source and target metrics]
(Source metric: RFC, the number of methods invoked by a class; Target metric: the number of operands)
Matching Score = 0.91
AUC = 0.946 (ant-1.3 → ar5)
Matched Metrics (Loss)
[Figure: distributions of metric values for the matched source and target metrics]
(Source metric: LOC; Target metric: average number of LOC in a method)
Matching Score = 0.13
AUC = 0.391 (Safe → velocity-1.4)
Different Feature Selections
(median AUCs, Win/Tie/Loss)
               Against WPDP      Against CPDP-CM   Against CPDP-IFS   HDP
Approach       AUC     Win%      AUC     Win%      AUC     Win%       AUC
Gain Ratio     0.657   63.7%     0.645   63.2%     0.536   80.2%      0.720
Chi-Square     0.657   64.7%     0.651   66.4%     0.556   82.3%      0.727
Significance   0.657   66.2%     0.636   66.2%     0.553   82.0%      0.724
Relief-F       0.670   57.0%     0.657   63.1%     0.543   80.5%      0.709
None           0.657   47.3%     0.624   50.3%     0.536   66.3%      0.663
Results in Different Cutoffs
         Against WPDP      Against CPDP-CM   Against CPDP-IFS   HDP      Target
Cutoff   AUC     Win%      AUC     Win%      AUC     Win%       AUC      Coverage
0.05     0.657   66.2%     0.636   66.2%     0.553   82.4%      0.724*   100%
0.90     0.657   100%      0.761   71.4%     0.624   100%       0.852*   21%
59
Software Defect Prediction
on Unlabeled Datasets
Sub-problems                                              Proposed Techniques
CPDP comparable to WPDP?                                  Transfer Defect Learning (TCA+)
CPDP across projects with heterogeneous metric sets?      Heterogeneous Defect Prediction (HDP)
DP using only unlabeled datasets without human effort?    CLAMI
Motivation
- Loss result of HDP
Still difficult to make different distributions similar!
Motivation
[Figure: training and prediction are both performed on the same unlabeled dataset]
What if...?
How?
• Recall the trend of defect prediction metrics
– Measures complexity of software and its
development process.
• e.g.
– The number of developers touching a source code file
(Bird@FSE`11)
– The number of methods in a class (D’Ambroas@ESEJ`12)
– The number of operands (Menzies@TSE`08)
Higher metric values imply more defect-proneness
(Rahman@ICSE`13)
63
How?
• Recall this trend of defect prediction metrics
– Measure the complexity of software and its development process.
• e.g.
– The number of developers touching a source code file (Bird@FSE`11)
– The number of methods in a class (D’Ambros@ESEJ`12)
– The number of operands (Menzies@TSE`08)
Higher metric values imply more defect-proneness (Rahman@ICSE`13)
(1) Label instances that have higher metric values as buggy!
(2) Generate a training set by removing metrics and instances that violate (1).
CLAMI Approach Overview
[Figure: CLAMI pipeline]
Unlabeled dataset → (1) Clustering → (2) LAbeling → (3) Metric Selection → (4) Instance Selection → Training dataset → Build the CLAMI model
Unlabeled dataset → (5) Metric Selection → Test dataset → Predict with the CLAMI model
CLAMI Approach
- Clustering and Labeling Clusters -
Unlabeled Dataset
          X1  X2  X3  X4  X5  X6  X7  Label
Inst. A   3   1   3   0   5   1   9   ?
Inst. B   1   1   2   0   7   3   8   ?
Inst. C   2   3   2   5   5   2   1   ?
Inst. D   0   0   8   1   0   1   9   ?
Inst. E   1   0   2   5   6   10  8   ?
Inst. F   1   4   1   1   7   1   1   ?
Inst. G   1   0   1   0   0   1   7   ?
Median    1   1   2   1   5   1   8
K = the number of metric values of an instance that are greater than the median of the metric.
(1) Clustering
Cluster, K=4: C
Cluster, K=3: A, E
Cluster, K=2: B, D, F
Cluster, K=0: G
(2) Labeling Clusters
Clusters with higher K (K=4, K=3): buggy clusters. The other clusters (K=2, K=0): clean clusters.
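The clustering and labeling steps can be reproduced on the slide's example with a few lines of Python. The function name is hypothetical, and the rule used to split clusters (the upper half of the distinct K values is labeled buggy) is an assumption that happens to match the labels shown for this example.

```python
from statistics import median

names = ["A", "B", "C", "D", "E", "F", "G"]
data = [
    [3, 1, 3, 0, 5, 1, 9],    # Inst. A
    [1, 1, 2, 0, 7, 3, 8],    # Inst. B
    [2, 3, 2, 5, 5, 2, 1],    # Inst. C
    [0, 0, 8, 1, 0, 1, 9],    # Inst. D
    [1, 0, 2, 5, 6, 10, 8],   # Inst. E
    [1, 4, 1, 1, 7, 1, 1],    # Inst. F
    [1, 0, 1, 0, 0, 1, 7],    # Inst. G
]


def cluster_and_label(rows):
    """Return (K per instance, label per instance)."""
    medians = [median(col) for col in zip(*rows)]
    # K = number of metric values greater than the metric's median
    ks = [sum(v > m for v, m in zip(row, medians)) for row in rows]
    # Assumption: the upper half of the distinct K values forms buggy clusters
    distinct = sorted(set(ks), reverse=True)
    buggy_ks = set(distinct[:len(distinct) // 2])
    labels = ["Buggy" if k in buggy_ks else "Clean" for k in ks]
    return ks, labels


ks, labels = cluster_and_label(data)
for name, k, label in zip(names, ks, labels):
    print(f"Inst. {name}: K={k} -> {label}")
```

Instances with the same K end up in the same cluster, so no distance-based clustering algorithm and no number-of-clusters parameter is needed.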
CLAMI Approach
- Metric Selection -
                  X1  X2  X3  X4  X5  X6  X7  Label
Inst. A           3   1   3   0   5   1   9   Buggy
Inst. B           1   1   2   0   7   3   8   Clean
Inst. C           2   3   2   5   5   2   1   Buggy
Inst. D           0   0   8   1   0   1   9   Clean
Inst. E           1   0   2   5   6   10  8   Buggy
Inst. F           1   4   1   1   7   1   1   Clean
Inst. G           1   0   1   0   0   1   7   Clean
# of Violations   1   3   3   1   4   2   3
Selected Metrics: {X1, X4}
Violation: a metric value that does not follow its label (a buggy instance whose value is not higher than the median, or a clean instance whose value is higher)!
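A sketch of the violation count on the same example. The names are hypothetical; the labels are the ones shown in the table, and the metrics with the fewest violations are kept.

```python
from statistics import median

data = [
    [3, 1, 3, 0, 5, 1, 9],    # Inst. A
    [1, 1, 2, 0, 7, 3, 8],    # Inst. B
    [2, 3, 2, 5, 5, 2, 1],    # Inst. C
    [0, 0, 8, 1, 0, 1, 9],    # Inst. D
    [1, 0, 2, 5, 6, 10, 8],   # Inst. E
    [1, 4, 1, 1, 7, 1, 1],    # Inst. F
    [1, 0, 1, 0, 0, 1, 7],    # Inst. G
]
labels = ["Buggy", "Clean", "Buggy", "Clean", "Buggy", "Clean", "Clean"]


def count_violations(rows, labels):
    """Count, per metric, the values that contradict the instance label."""
    medians = [median(col) for col in zip(*rows)]
    counts = [0] * len(medians)
    for row, label in zip(rows, labels):
        for j, (value, med) in enumerate(zip(row, medians)):
            higher = value > med
            # A buggy instance should have a higher value, a clean one not
            if higher != (label == "Buggy"):
                counts[j] += 1
    return counts


violations = count_violations(data, labels)
fewest = min(violations)
selected = [f"X{j + 1}" for j, c in enumerate(violations) if c == fewest]
print(violations, selected)
```

Keeping only the metrics with the minimum violation count retains the metrics whose values agree best with the automatically assigned labels.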
CLAMI Approach
- Instance Selection -
After metric selection
          X1  X4  Label
Inst. A   3   0   Buggy
Inst. B   1   0   Clean
Inst. C   2   5   Buggy
Inst. D   0   1   Clean
Inst. E   1   5   Buggy
Inst. F   1   1   Clean
Inst. G   1   0   Clean
Final Training Dataset (Inst. A and Inst. E, which still have violations, are removed)
          X1  X4  Label
Inst. B   1   0   Clean
Inst. C   2   5   Buggy
Inst. D   0   1   Clean
Inst. F   1   1   Clean
Inst. G   1   0   Clean
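Instance selection on the example can be sketched as follows. The names are hypothetical; the rows hold only the selected metrics X1 and X4, and the medians are those of the seven instances in the slide.

```python
names = ["A", "B", "C", "D", "E", "F", "G"]
rows = [(3, 0), (1, 0), (2, 5), (0, 1), (1, 5), (1, 1), (1, 0)]  # (X1, X4)
labels = ["Buggy", "Clean", "Buggy", "Clean", "Buggy", "Clean", "Clean"]
medians = (1, 1)  # medians of X1 and X4 over the seven instances


def violates(row, label, medians):
    """True if any selected metric value contradicts the instance label."""
    return any((v > m) != (label == "Buggy") for v, m in zip(row, medians))


# Keep only instances without violations in the selected metrics
training = [(n, r, l) for n, r, l in zip(names, rows, labels)
            if not violates(r, l, medians)]
kept = [n for n, _, _ in training]
print(kept)
```

Inst. A is dropped because its X4 value is not higher than the median although it is labeled buggy, and Inst. E is dropped for the same reason on X1.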
EVALUATION
70
Baselines
• Supervised learning model (i.e. WPDP)
• Defect prediction only using unlabeled
datasets
– Expert-based (Zhong@HASE`04)
• Cluster instances by K-means into 20 clusters
• A human expert labels each cluster
– Threshold-based (Catal@ITNG`09)
• [LoC, CC, UOP, UOpnd, TOp, TOpnd]
= [65, 10, 25, 40, 125, 70]
– Label an instance as buggy if any of its metric values is greater
than the corresponding threshold value
• Manual effort is required to decide threshold values in
advance.
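The threshold-based baseline amounts to a simple rule, sketched here. The threshold values are the ones listed above; the function name and the two example instances are made up.

```python
# Threshold values listed on the slide (Catal@ITNG`09)
THRESHOLDS = {"LoC": 65, "CC": 10, "UOP": 25, "UOpnd": 40, "TOp": 125, "TOpnd": 70}


def label_by_threshold(instance, thresholds=THRESHOLDS):
    """Label an instance buggy if any metric value exceeds its threshold."""
    exceeds = any(instance[m] > t for m, t in thresholds.items())
    return "Buggy" if exceeds else "Clean"


# Made-up instances: the second one exceeds the LoC threshold only
small = {"LoC": 40, "CC": 3, "UOP": 10, "UOpnd": 15, "TOp": 60, "TOpnd": 30}
large = {"LoC": 120, "CC": 3, "UOP": 10, "UOpnd": 15, "TOp": 60, "TOpnd": 30}

small_label = label_by_threshold(small)
large_label = label_by_threshold(large)
```

The rule itself is trivial; the cost lies in choosing threshold values that suit a given project, which is the manual effort mentioned above.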
71
Research Questions (RQs)
• RQ1
– CLAMI vs. Supervised learning model?
• RQ2
– CLAMI vs. Expert-/threshold-based approaches?
(Zhong@HASE`04, Catal@ITNG`09)
72
Benchmark Datasets
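Group     Dataset      # of instances (All)   Buggy (%)      # of metrics                       Prediction Granularity
NetGene   Httpclient   361                    205 (56.8%)    465 (Network, Change genealogy)    File
NetGene   Jackrabbit   542                    225 (41.5%)    465 (Network, Change genealogy)    File
NetGene   Lucene       1671                   346 (10.7%)    465 (Network, Change genealogy)    File
NetGene   Rhino        253                    109 (43.1%)    465 (Network, Change genealogy)    File
ReLink    Apache       194                    98 (50.5%)     26 (code complexity)               File
ReLink    Safe         56                     22 (39.29%)    26 (code complexity)               File
ReLink    ZXing        399                    118 (29.6%)    26 (code complexity)               File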
73
Experimental Settings (RQ1)
- Supervised learning model -
[Figure: the dataset is randomly split into a training set (50%) and a test set (50%), repeated 1000 times; the supervised model (baseline) and the CLAMI model each go through training and prediction]
Experimental Settings (RQ2)
-Comparison to existing approaches -
[Figure: on an unlabeled dataset, the CLAMI model (training, then prediction) is compared with the threshold-based approach (Baseline 1, Catal@ITNG`09) and the expert-based approach (Baseline 2, Zhong@HASE`04)]
Measure
• F-measure
• AUC
76
RESULT
77
Supervised model vs. CLAMI
             F-measure                                  AUC
Dataset      Supervised    CLAMI          +/-%          Supervised    CLAMI          +/-%
             (w/ labels)   (w/o labels)                 (w/ labels)   (w/o labels)
Httpclient   0.729         0.722          -1.0%         0.727         0.772          +6.2%
Jackrabbit   0.649         0.685          +5.5%         0.727         0.751          +3.2%
Lucene       0.508         0.397          -21.8%        0.708         0.595          -15.9%
Rhino        0.639         0.752          +17.7%        0.702         0.777          +10.7%
Apache       0.653         0.720          +10.2%        0.714         0.753          +5.3%
Safe         0.615         0.667          +8.3%         0.706         0.773          +9.5%
ZXing        0.333         0.497          +49.0%        0.605         0.644          +6.4%
Median       0.639         0.685          +7.2%         0.707         0.753          +6.3%
78
Existing approaches vs. CLAMI
f-measure
Dataset Threshold-based Expert-based CLAMI
Httpclient 0.355 0.811 0.756
Jackrabbit 0.184 0.676 0.685
Lucene 0.144 0.000 0.404
Rhino 0.190 0.707 0.731
Apache 0.547 0.701 0.725
Safe 0.308 0.718 0.694
ZXing 0.228 0.402 0.505
Median 0.228 0.701 0.694
79
Distributions of metrics (Safe)
80
Most frequently selected metrics by CLAMI
Metrics with less discriminative power
Distributions of metrics (Lucene)
81
Most frequently selected metrics by CLAMI
Metrics with less discriminative power
82
Software Defect Prediction
on Unlabeled Datasets
Sub-problems                                              Proposed Techniques
CPDP comparable to WPDP?                                  Transfer Defect Learning (TCA+)
CPDP across projects with heterogeneous metric sets?      Heterogeneous Defect Prediction (HDP)
DP using only unlabeled datasets without human effort?    CLAMI
Conclusion
83
Sub-problems                                 Technique 1: TCA+   Technique 2: HDP   Technique 3: CLAMI
Comparable prediction performance to WPDP    O (in f-measure)    O (in AUC)         O
Able to handle heterogeneous metric sets     X                   O                  O
Automated without human effort               O                   O                  O
Publications at HKUST
• Defect Prediction
– Micro Interaction Metrics for Defect Prediction@FSE`11, Taek Lee,
Jaechang Nam, Donggyun Han, Sunghun Kim and Hoh Peter In
– Transfer Defect Learning@ICSE`13, Jaechang Nam, Sinno Jialin Pan and
Sunghun Kim, Nominee, ACM SIGSOFT Distinguished Paper Award
– Heterogeneous Defect Prediction@FSE`15, Jaechang Nam and Sunghun Kim
– REMI: Defect Prediction for Efficient API Testing@FSE`15, Mijung Kim,
Jaechang Nam, Jaehyuk Yeon, Soonhwang Choi, and Sunghun Kim, Industrial
Track
– CLAMI: Defect Prediction on Unlabeled Datasets@ASE`15, Jaechang Nam
and Sunghun Kim
• Testing
– Calibrated Mutation Testing@MUTATION`12, Jaechang Nam, David Schuler,
and Andreas Zeller
• Automated bug-fixing
– Automatic Patch Generation Learned from Human-written
Patches@ICSE`13, Dongsun Kim, Jaechang Nam, Jaewoo Song and Sunghun
Kim, ACM SIGSOFT Distinguished Paper Award Winner
84
[Figure: flowchart of the ensemble model]
Inputs: unlabeled project dataset; existing labeled project datasets
• Same metric set? Yes → TCA+; No → HDP
• Cross-prediction feasibility check: Feasible? Yes → cross-prediction; No → CLAMI
Ensemble model for defect prediction on unlabeled datasets
85
Q&A
THANK YOU!
86

More Related Content

What's hot

Big Data Pipelines and Machine Learning at Uber
Big Data Pipelines and Machine Learning at UberBig Data Pipelines and Machine Learning at Uber
Big Data Pipelines and Machine Learning at UberSudhir Tonse
 
Knowledge Graph Embeddings for Recommender Systems
Knowledge Graph Embeddings for Recommender SystemsKnowledge Graph Embeddings for Recommender Systems
Knowledge Graph Embeddings for Recommender SystemsEnrico Palumbo
 
Microservices Architecture - Cloud Native Apps
Microservices Architecture - Cloud Native AppsMicroservices Architecture - Cloud Native Apps
Microservices Architecture - Cloud Native AppsAraf Karsh Hamid
 
Requirement and Specification
Requirement and SpecificationRequirement and Specification
Requirement and Specificationsarojsaroza
 
Cloud Application architecture styles
Cloud Application architecture styles Cloud Application architecture styles
Cloud Application architecture styles Nilay Shrivastava
 
Natural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jNatural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jWilliam Lyon
 
DevOps: Benefits & Future Trends
DevOps: Benefits & Future TrendsDevOps: Benefits & Future Trends
DevOps: Benefits & Future Trends9 series
 
Semantic kernel - Do you need Python to play with LLM?
Semantic kernel - Do you need Python to play with LLM?Semantic kernel - Do you need Python to play with LLM?
Semantic kernel - Do you need Python to play with LLM?Marco De Nittis
 
Introduction to Service Oriented Architecture
Introduction to Service Oriented ArchitectureIntroduction to Service Oriented Architecture
Introduction to Service Oriented ArchitectureDATA Inc.
 
DevOps Explained
DevOps ExplainedDevOps Explained
DevOps ExplainedDevOpsAnon
 
Modeling Manufacturing With Graph Databases: A Journey Towards a Digital Factory
Modeling Manufacturing With Graph Databases: A Journey Towards a Digital FactoryModeling Manufacturing With Graph Databases: A Journey Towards a Digital Factory
Modeling Manufacturing With Graph Databases: A Journey Towards a Digital FactoryNeo4j
 
Workshop Tel Aviv - Graph Data Science
Workshop Tel Aviv - Graph Data ScienceWorkshop Tel Aviv - Graph Data Science
Workshop Tel Aviv - Graph Data ScienceNeo4j
 
Introduction to DevOps slides.pdf
Introduction to DevOps slides.pdfIntroduction to DevOps slides.pdf
Introduction to DevOps slides.pdfBoreVishnusai
 
Taming the Wild West of NLP
Taming the Wild West of NLPTaming the Wild West of NLP
Taming the Wild West of NLPYunyao Li
 
Monoliths and Microservices
Monoliths and Microservices Monoliths and Microservices
Monoliths and Microservices Bozhidar Bozhanov
 

What's hot (20)

Big Data Pipelines and Machine Learning at Uber
Big Data Pipelines and Machine Learning at UberBig Data Pipelines and Machine Learning at Uber
Big Data Pipelines and Machine Learning at Uber
 
Phd thesis final presentation
Phd thesis   final presentationPhd thesis   final presentation
Phd thesis final presentation
 
Knowledge Graph Embeddings for Recommender Systems
Knowledge Graph Embeddings for Recommender SystemsKnowledge Graph Embeddings for Recommender Systems
Knowledge Graph Embeddings for Recommender Systems
 
Microservices Architecture - Cloud Native Apps
Microservices Architecture - Cloud Native AppsMicroservices Architecture - Cloud Native Apps
Microservices Architecture - Cloud Native Apps
 
Requirement and Specification
Requirement and SpecificationRequirement and Specification
Requirement and Specification
 
DevOps
DevOpsDevOps
DevOps
 
Cloud Application architecture styles
Cloud Application architecture styles Cloud Application architecture styles
Cloud Application architecture styles
 
Natural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jNatural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4j
 
DevOps: Benefits & Future Trends
DevOps: Benefits & Future TrendsDevOps: Benefits & Future Trends
DevOps: Benefits & Future Trends
 
Semantic kernel - Do you need Python to play with LLM?
Semantic kernel - Do you need Python to play with LLM?Semantic kernel - Do you need Python to play with LLM?
Semantic kernel - Do you need Python to play with LLM?
 
Introduction to DevOps
Introduction to DevOpsIntroduction to DevOps
Introduction to DevOps
 
Introduction to microservices
Introduction to microservicesIntroduction to microservices
Introduction to microservices
 
Introduction to Service Oriented Architecture
Introduction to Service Oriented ArchitectureIntroduction to Service Oriented Architecture
Introduction to Service Oriented Architecture
 
DevOps Explained
DevOps ExplainedDevOps Explained
DevOps Explained
 
DevOps introduction
DevOps introductionDevOps introduction
DevOps introduction
 
Modeling Manufacturing With Graph Databases: A Journey Towards a Digital Factory
Modeling Manufacturing With Graph Databases: A Journey Towards a Digital FactoryModeling Manufacturing With Graph Databases: A Journey Towards a Digital Factory
Modeling Manufacturing With Graph Databases: A Journey Towards a Digital Factory
 
Workshop Tel Aviv - Graph Data Science
Workshop Tel Aviv - Graph Data ScienceWorkshop Tel Aviv - Graph Data Science
Workshop Tel Aviv - Graph Data Science
 
Introduction to DevOps slides.pdf
Introduction to DevOps slides.pdfIntroduction to DevOps slides.pdf
Introduction to DevOps slides.pdf
 
Taming the Wild West of NLP
Taming the Wild West of NLPTaming the Wild West of NLP
Taming the Wild West of NLP
 
Monoliths and Microservices
Monoliths and Microservices Monoliths and Microservices
Monoliths and Microservices
 

Similar to Software Defect Prediction on Unlabeled Datasets

Transfer defect learning
Transfer defect learningTransfer defect learning
Transfer defect learningSung Kim
 
Introduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaIntroduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaopenseesdays
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...Anubhav Jain
 
Tim Menzies, directions in Data Science
Tim Menzies, directions in Data ScienceTim Menzies, directions in Data Science
Tim Menzies, directions in Data ScienceCS, NcState
 
Software Defect Prediction on Unlabeled Datasets

  • 1. Software Defect Prediction on Unlabeled Datasets - PhD Thesis Defence - July 23, 2015 Jaechang Nam Department of Computer Science and Engineering HKUST
  • 2. Software Defect Prediction • General question of software defect prediction – Can we identify defect-prone entities (source code file, binary, module, change,...) in advance? • # of defects • buggy or clean • Why? (applications) – Quality assurance for large software (Akiyama@IFIP’71) – Effective resource allocation • Testing (Menzies@TSE`07, Kim@FSE`15) • Code review (Rahman@FSE’11)
  • 3. Software Defect Prediction [Figure: a model is trained on the labeled instances of Project A and predicts its unlabeled instances. Legend: metric value, buggy-labeled instance, clean-labeled instance, ?: unlabeled instance] Related Work: Munson@TSE`92, Basili@TSE`95, Menzies@TSE`07, Hassan@ICSE`09, Bird@FSE`11, D’Ambros@EMSE`12, Lee@FSE`11,...
  • 4. What if labeled instances do not exist? • New projects • Projects lacking in historical data [Figure: Project X with an unlabeled dataset and no labeled instances for training a model. Legend: ?: unlabeled instance, metric value]
  • 5. This problem is... Software Defect Prediction on Unlabeled Datasets [Figure: Project X, unlabeled dataset. Legend: ?: unlabeled instance, metric value]
  • 6. Existing Solutions? [Figure: (New) Project X, unlabeled dataset. Legend: ?: unlabeled instance, metric value]
  • 7. Solution 1: Cross-Project Defect Prediction (CPDP) [Figure: a model trained on Project A (source) predicts the unlabeled dataset of Project X (target); both share the same metric set (same feature space). Legend: metric value, buggy-labeled instance, clean-labeled instance, ?: unlabeled instance] Related Work: Watanabe@PROMISE`08, Turhan@EMSE`09, Zimmermann@FSE`09, Ma@IST`12, Zhang@MSR`14 Challenges • Worse than WPDP • Heterogeneous metrics between source and target • Only 2% out of 622 CPDP combinations worked. (Zimmermann@FSE`09)
  • 8. Solution 2: Using Only Unlabeled Datasets [Figure: a model is trained on, and predicts, the unlabeled dataset of Project X] Related Work: Zhong@HASE`04, Catal@ITNG`09 Challenge: human intervention • Manual effort
  • 9. Software Defect Prediction on Unlabeled Datasets: sub-problems and proposed techniques • CPDP comparable to WPDP? – Transfer Defect Learning (TCA+) • CPDP across projects with heterogeneous metric sets? – Heterogeneous Defect Prediction (HDP) • DP using only unlabeled datasets without human effort? – CLAMI
  • 10. Software Defect Prediction on Unlabeled Datasets: sub-problems and proposed techniques • CPDP comparable to WPDP? – Transfer Defect Learning (TCA+) • CPDP across projects with heterogeneous metric sets? – Heterogeneous Defect Prediction (HDP) • DP using only unlabeled datasets without human effort? – CLAMI
  • 11. CPDP • Reason for poor prediction performance of CPDP – Different distributions of source and target datasets (Pan et al.@TKDE`09)
  • 12. TCA+ = Normalization + Transfer Component Analysis (TCA) • Normalization: “Normalize us together!” • TCA: make the different distributions of source and target similar. Source and Target: “Oops, we are different! Let’s meet at another world!” (projecting datasets into a latent feature space, giving New Source and New Target)
  • 13. Data Normalization • Adjust all metric values to the same scale – E.g., make mean = 0 and std = 1 • Known to help classification algorithms improve prediction performance (Han@`12).
  • 14. Normalization Options • N1: Min-max normalization (max=1, min=0) [Han et al., 2012] • N2: Z-score normalization (mean=0, std=1) [Han et al., 2012] • N3: Z-score normalization using only the source mean and standard deviation • N4: Z-score normalization using only the target mean and standard deviation • NoN: No normalization
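The five normalization options can be sketched for a single metric (one column of values per dataset) as below. This is a minimal illustration, not the thesis implementation: the function names are hypothetical, N1 and N2 are assumed to use each dataset's own statistics, and the population standard deviation is an arbitrary choice.

```python
from statistics import mean, pstdev


def min_max(values):
    """Scale values so that min maps to 0 and max maps to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def z_score(values, mu, sigma):
    """Shift by mu and scale by sigma (a constant column maps to zeros)."""
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def normalize(source, target, option):
    """Apply one of NoN, N1, N2, N3, N4 to one metric of source and target."""
    if option == "NoN":
        return list(source), list(target)
    if option == "N1":
        # Assumption: each dataset is scaled with its own min and max.
        return min_max(source), min_max(target)
    if option == "N2":
        # Assumption: each dataset is standardized with its own mean and std.
        return (z_score(source, mean(source), pstdev(source)),
                z_score(target, mean(target), pstdev(target)))
    if option == "N3":
        mu, sigma = mean(source), pstdev(source)  # source statistics only
    elif option == "N4":
        mu, sigma = mean(target), pstdev(target)  # target statistics only
    else:
        raise ValueError(f"unknown option: {option}")
    return z_score(source, mu, sigma), z_score(target, mu, sigma)
```

With N3 the source column ends up with mean 0 while the target keeps its offset relative to the source; N4 is the mirror image.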
  • 15. Decision Rules for Normalization • Find a suitable normalization • Steps – #1: Characterize a dataset – #2: Measure similarity between source and target datasets – #3: Decision rules
  • 16. Decision Rules for Normalization #1: Characterize a dataset [Figure: the instances of Dataset A and Dataset B drawn as points, with pairwise distances between them such as d1,2, d1,3, d1,5, d2,6 and d3,11] DIST_A = {d_ij : ∀ i, j, 1 ≤ i < n, 1 < j ≤ n, i < j}
  • 17. Decision Rules for Normalization #2: Measure similarity between source and target [Figure: as on the previous slide] DIST_A = {d_ij : ∀ i, j, 1 ≤ i < n, 1 < j ≤ n, i < j} • Minimum (min) and maximum (max) values of DIST • Mean and standard deviation (std) of DIST • The number of instances
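Steps #1 and #2 can be sketched as follows: compute the set DIST of pairwise distances between instances, then summarize it with the characteristics listed above. The slides do not name the distance measure, so Euclidean distance is an assumption here, and `characterize` is a hypothetical helper name.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def characterize(instances):
    """Summarize a dataset by statistics of its pairwise instance distances.

    `instances` is a list of equal-length numeric tuples (metric vectors).
    Euclidean distance is an assumption; the slides only write d_ij.
    """
    # combinations() yields every pair (i, j) with i < j exactly once.
    distances = [dist(a, b) for a, b in combinations(instances, 2)]
    return {
        "min": min(distances),
        "max": max(distances),
        "mean": mean(distances),
        "std": pstdev(distances),
        "n": len(instances),
    }
```

Two datasets can then be compared characteristic by characteristic, for example the `mean` of the source against the `mean` of the target, which is the input the decision rules need.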
  • 18. Decision Rules for Normalization #3: Decision Rules • Rule #1 – Mean and Std are the same → NoN • Rule #2 – Max and Min are different → N1 (max=1, min=0) • Rule #3, #4 – Std and # of instances are different → N3 or N4 (src/tgt mean=0, std=1) • Rule #5 – Default → N2 (mean=0, std=1)
  • 19. TCA • Key idea: Source and Target: “Oops, we are different! Let’s meet at another world!” (projecting datasets into a latent feature space, giving New Source and New Target)
  • 20. TCA (cont.) [Figure: source domain data and target domain data. Legend: buggy source instances, clean source instances, buggy target instances, clean target instances] Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis
  • 21. TCA (cont.) [Figure: the data after TCA] Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis
  • 22. TCA+ = Normalization + Transfer Component Analysis (TCA) • Normalization: “Normalize us together with a suitable option!” • TCA: make the different distributions of source and target similar. Source and Target: “Oops, we are different! Let’s meet at another world!” (projecting datasets into a latent feature space, giving New Source and New Target)
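The projection step of TCA can be sketched with NumPy as below. This follows the commonly described formulation of Transfer Component Analysis (kernel matrix K, MMD coefficient matrix L, centering matrix H, leading eigenvectors of (KLK + μI)⁻¹KHK); it is not the code used in the thesis. The linear kernel, the value of `mu`, and the number of components are assumptions chosen for illustration.

```python
import numpy as np


def tca(Xs, Xt, n_components=2, mu=1.0):
    """Project source and target instances into a shared latent space.

    Xs, Xt: arrays of shape (ns, d) and (nt, d) with the same metrics.
    Assumptions: linear kernel, trade-off parameter mu = 1.0.
    """
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    X = np.vstack([Xs, Xt])
    K = X @ X.T  # linear kernel over all instances

    # L encodes the distance between source and target means (MMD).
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    L = np.outer(e, e)

    # H centers the data so that variance is preserved by the projection.
    H = np.eye(n) - np.ones((n, n)) / n

    # Leading eigenvectors of (K L K + mu I)^-1 K H K are the components.
    A = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    eigvals, eigvecs = np.linalg.eig(A)
    top = np.argsort(-eigvals.real)[:n_components]
    W = eigvecs[:, top].real

    Z = K @ W  # embedded instances: "New Source" and "New Target"
    return Z[:ns], Z[ns:]
```

A classifier such as logistic regression would then be trained on the first returned array with the source labels and applied to the second.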
  • 24. Research Questions • RQ1 – What is the cross-project prediction performance of TCA/TCA+ compared to WPDP? • RQ2 – What is the cross-project prediction performance of TCA/TCA+ compared to that of CPDP without TCA/TCA+?
  • 25. Experimental Setup • 8 software subjects – ReLink (Wu et al.@FSE`11), 26 metrics (source code): Apache, Safe, ZXing – AEEEM (D’Ambros et al.@MSR`10), 61 metrics (source code, churn, entropy,…): Apache Lucene (LC), Equinox (EQ), Eclipse JDT, Eclipse PDE UI, Mylyn (ML) • Machine learning algorithm – Logistic regression
  • 26. Experimental Design: Within-project defect prediction (WPDP) • Training set (50%) • Test set (50%)
  • 27. Experimental Design: Cross-project defect prediction (CPDP) • Source project (training set) • Target project (test set)
  • 28. Experimental Design: Cross-project defect prediction with TCA/TCA+ • Source project (training set) • Target project (test set)
  • 30. ReLink Result: representative 3 out of 6 combinations [Figure: bar chart of F-measure (axis from 0 to 0.8) for WPDP, CPDP, TCA and TCA+ on Safe → Apache, Apache → Safe and Safe → ZXing] *CPDP: Cross-project defect prediction without TCA/TCA+
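  • 31. ReLink Result: F-measure per cross-project pair (source → target)
      – Safe → Apache: CPDP 0.52, TCA 0.64, TCA+ 0.64
      – ZXing → Apache: CPDP 0.69, TCA 0.64, TCA+ 0.72
      – Apache → Safe: CPDP 0.49, TCA 0.72, TCA+ 0.72
      – ZXing → Safe: CPDP 0.59, TCA 0.70, TCA+ 0.64
      – Apache → ZXing: CPDP 0.46, TCA 0.45, TCA+ 0.49
      – Safe → ZXing: CPDP 0.10, TCA 0.42, TCA+ 0.53
      – Average: CPDP 0.49, TCA 0.59, TCA+ 0.61
      – WPDP (per target): Apache 0.64, Safe 0.62, ZXing 0.33, average 0.53
      *CPDP: Cross-project defect prediction without TCA/TCA+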
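For reference, the F-measure reported in these results is the harmonic mean of precision and recall on the buggy class. The sketch below assumes string labels `"buggy"` and `"clean"`; the function name and label encoding are illustrative only.

```python
def f_measure(actual, predicted, positive="buggy"):
    """Harmonic mean of precision and recall for the positive (buggy) class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with two true positives, one false positive and one false negative, precision and recall are both 2/3, so the F-measure is also 2/3.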
  • 32. AEEEM Result: representative 3 out of 20 combinations [Figure: bar chart of F-measure (axis from 0 to 0.7) for WPDP, CPDP, TCA and TCA+ on JDT → EQ, PDE → LC and PDE → ML] *CPDP: Cross-project defect prediction without TCA/TCA+
  • 33. AEEEM Result: F-measure per cross-project pair (source → target)
      – JDT → EQ: CPDP 0.31, TCA 0.59, TCA+ 0.60
      – LC → EQ: CPDP 0.50, TCA 0.62, TCA+ 0.62
      – ML → EQ: CPDP 0.24, TCA 0.56, TCA+ 0.56
      – …
      – PDE → LC: CPDP 0.33, TCA 0.27, TCA+ 0.33
      – EQ → ML: CPDP 0.19, TCA 0.62, TCA+ 0.62
      – JDT → ML: CPDP 0.27, TCA 0.56, TCA+ 0.56
      – LC → ML: CPDP 0.20, TCA 0.58, TCA+ 0.60
      – PDE → ML: CPDP 0.27, TCA 0.48, TCA+ 0.54
      – …
      – Average: CPDP 0.32, TCA 0.41, TCA+ 0.41
      – WPDP (per target): EQ 0.58, …, LC 0.37, ML 0.30, …, average 0.42
  • 34. Related Work: transfer learning techniques
      – Metric Compensation (Watanabe@PROMISE`08): preprocessing N/A; machine learner C4.5; 2 subjects; 2 predictions; avg. F-measure 0.67 (W: 0.79, C: 0.58)
      – NN Filter (Turhan@ESEJ`09): preprocessing feature selection, log-filter; machine learner Naive Bayes; 10 subjects; 10 predictions; avg. F-measure 0.35 (W: 0.37, C: 0.26)
      – TNB (Ma@IST`12): preprocessing log-filter; machine learner TNB; 10 subjects; 10 predictions; avg. F-measure 0.39 (NN: 0.35, C: 0.33)
      – TCA+ (Nam@ICSE`13): preprocessing normalization; machine learner logistic regression; 8 subjects; 26 predictions; avg. F-measure 0.46 (W: 0.46, C: 0.36)
      * NN = Nearest neighbor, W = Within, C = Cross
  • 35. Software Defect Prediction on Unlabeled Datasets: sub-problems and proposed techniques • CPDP comparable to WPDP? – Transfer Defect Learning (TCA+) • CPDP across projects with heterogeneous metric sets? – Heterogeneous Defect Prediction (HDP) • DP using only unlabeled datasets without human effort? – CLAMI
  • 36. Motivation: CPDP needs the same metric set (same feature space) [Figure: a model trained on Project A (source) is tested on the unlabeled dataset of Project B (target). In the experiments of TCA+, a cross (X) marks prediction between datasets in ReLink (Apache, Safe) and datasets in AEEEM (JDT)]
  • 37. Motivation: Heterogeneous Defect Prediction (HDP) [Figure: a model trained on Project A (source) is tested on Project C (target)] • Heterogeneous metric sets (different feature spaces or different domains) • Possible to reuse all the existing defect datasets for CPDP!
  • 38. Key Idea • Most defect prediction metrics – Measure complexity of software and its development process. • e.g. – The number of developers touching a source code file (Bird@FSE`11) – The number of methods in a class (D’Ambros@ESEJ`12) – The number of operands (Menzies@TSE`08) • More complexity implies more defect-proneness (Rahman@ICSE`13)
  • 39. Key Idea • Most defect prediction metrics – Measure complexity of software and its development process. • e.g. – The number of developers touching a source code file (Bird@FSE`11) – The number of methods in a class (D’Ambros@ESEJ`12) – The number of operands (Menzies@TSE`08) • More complexity implies more defect-proneness (Rahman@ICSE`13) • Match source and target metrics that have similar distributions
  • 40. Heterogeneous Defect Prediction (HDP) - Overview - [Figure: Source: Project A (metrics X1 to X4 with Buggy/Clean labels) → Metric Selection → Metric Matching against Target: Project B (metrics Y1 to Y7, labels unknown) → Cross-prediction: the model is built (training) on the matched source metrics and predicts (test) on the matched target metrics]
  • 41. Metric Selection • Why? (Guyon@JMLR`03) – Select informative metrics • Remove redundant and irrelevant metrics – Decrease complexity of metric matching combinations • Feature Selection Approaches (Gao@SPE`11,Shivaji@TSE`13) – Gain Ratio – Chi-square – Relief-F – Significance attribute evaluation 41
  • 42. Metric Matching 42 Source Metrics Target Metrics X1 X2 Y1 Y2 0.8 0.5 * We can apply different cutoff values to the matching score. * It is possible that no metrics are matched at all.
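The cutoff idea on this slide can be illustrated with a small sketch. The slide does not say which matching algorithm is used, so the greedy one-to-one rule, the function name `match_metrics`, and the assignment of the scores 0.8 and 0.5 to particular metric pairs are all assumptions made for illustration.

```python
def match_metrics(scores, cutoff):
    """Greedily pair source and target metrics, best score first.

    scores: dict mapping (source_metric, target_metric) -> matching score.
    Pairs whose score does not exceed the cutoff are never matched.
    """
    candidates = sorted(
        ((score, pair) for pair, score in scores.items() if score > cutoff),
        reverse=True,
    )
    used_sources, used_targets, matched = set(), set(), {}
    for score, (src, tgt) in candidates:
        if src not in used_sources and tgt not in used_targets:
            matched[src] = tgt
            used_sources.add(src)
            used_targets.add(tgt)
    return matched


# 0.8 and 0.5 appear on the slide; the other two scores are made up.
scores = {
    ("X1", "Y1"): 0.8,
    ("X2", "Y2"): 0.5,
    ("X1", "Y2"): 0.3,
    ("X2", "Y1"): 0.4,
}
low = match_metrics(scores, cutoff=0.05)   # permissive cutoff
high = match_metrics(scores, cutoff=0.90)  # strict cutoff
print(low, high)
```

With the strict cutoff no pair survives, which is the "no matching at all" case noted on the slide.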
  • 43. Compute Matching Score KSAnalyzer • Use p-value of Kolmogorov-Smirnov Test (Massey@JASA`51) 43 Matching Score M of i-th source and j-th target metrics: $M_{ij} = p_{ij}$
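A rough sketch of how a KS-based matching score could be computed for one source metric and one target metric. The slide only states that the score is the p-value of the Kolmogorov-Smirnov test; the asymptotic p-value approximation below, and the names `ks_statistic` and `ks_p_value`, are assumptions. In practice a statistics library would normally be used instead.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_p_value(a, b, terms=100):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    d = ks_statistic(a, b)
    n_eff = len(a) * len(b) / (len(a) + len(b))
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.3:
        # The alternating series is unreliable near zero; p is effectively 1.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


similar = ks_p_value([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
different = ks_p_value(list(range(1, 11)), list(range(101, 111)))
print(similar, different)
```

Identical samples give a score of 1, while samples with no overlap give a score far below the 0.05 cutoff.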
  • 44. Heterogeneous Defect Prediction - Overview - 44 X1 X2 X3 X4 Label 1 1 3 10 Buggy 8 0 1 0 Clean ⋮ ⋮ ⋮ ⋮ ⋮ 9 0 1 1 Clean Metric Matching Source: Project A Target: Project B Cross- prediction Model Build (training) Predict (test) Metric Selection Y1 Y2 Y3 Y4 Y5 Y6 Y7 Label 3 1 1 0 2 1 9 ? 1 1 9 0 2 3 8 ? ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ 0 1 1 1 2 1 1 ? 1 3 10 Buggy 8 1 0 Clean ⋮ ⋮ ⋮ ⋮ 9 1 1 Clean 1 3 10 Buggy 8 1 0 Clean ⋮ ⋮ ⋮ ⋮ 9 1 1 Clean 9 1 1 ? 8 3 9 ? ⋮ ⋮ ⋮ ⋮ 1 1 1 ?
  • 46. Baselines • WPDP • CPDP-CM (Turhan@EMSE`09,Ma@IST`12,He@IST`14) – Cross-project defect prediction using only common metrics between source and target datasets • CPDP-IFS (He@CoRR`14) – Cross-project defect prediction on Imbalanced Feature Set (i.e. heterogeneous metric set) – 16 distributional characteristics of values of an instance as features (e.g., mean, std, maximum,...) 46
  • 47. Research Questions (RQs) • RQ1 – Is heterogeneous defect prediction comparable to WPDP? • RQ2 – Is heterogeneous defect prediction comparable to CPDP-CM? • RQ3 – Is heterogeneous defect prediction comparable to CPDP-IFS? 47
  • 48. Benchmark Datasets Group Dataset # of instances # of metrics Granularity All Buggy (%) AEEEM EQ 325 129 (39.7%) 61 Class JDT 997 206 (20.7%) LC 399 64 (9.36%) ML 1862 245 (13.2%) PDE 1492 209 (14.0%) MORPH ant-1.3 125 20 (16.0%) 20 Class arc 234 27 (11.5%) camel-1.0 339 13 (3.8%) poi-1.5 237 141 (75.0%) redaktor 176 27 (15.3%) skarbonka 45 9 (20.0%) tomcat 858 77 (9.0%) velocity-1.4 196 147 (75.0%) xalan-2.4 723 110 (15.2%) xerces-1.2 440 71 (16.1%) 48 Group Dataset # of instances # of metrics Granularity All Buggy (%) ReLink Apache 194 98 (50.5%) 26 File Safe 56 22 (39.3%) ZXing 399 118 (29.6%) NASA cm1 327 42 (12.8%) 37 Function mw1 253 27 (10.7%) pc1 705 61 (8.7%) pc3 1077 134 (12.4%) pc4 1458 178 (12.2%) SOFTLAB ar1 121 9 (7.4%) 29 Function ar3 63 8 (12.7%) ar4 107 20 (18.7%) ar5 36 8 (22.2%) ar6 101 15 (14.9%) 600 prediction combinations in total!
  • 49. Experimental Settings • Logistic Regression • HDP vs. WPDP, CPDP-CM, and CPDP-IFS 49 Test set (50%) Training set (50%) Project 1 Project 2 Project n ... ... X 1000 Project 1 Project 2 Project n ... ... CPDP-CM CPDP-IFS HDP WPDP Project A
  • 50. Evaluation Measures • False Positive Rate = FP/(TN+FP) • True Positive Rate = Recall • AUC (Area Under receiver operating characteristic Curve) 50 [ROC plot: x-axis False Positive Rate, y-axis True Positive Rate, both from 0 to 1]
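The three measures on this slide can be sketched in a few lines. This is an illustrative implementation, not the thesis tooling: the function names and the toy labels and scores are made up, and AUC is computed through its rank interpretation (the probability that a randomly chosen buggy instance is scored above a randomly chosen clean one, with ties counted as half).

```python
def rates(y_true, y_pred):
    """Return (false positive rate, true positive rate) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (tn + fp), tp / (tp + fn)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: 1 = buggy, 0 = clean; scores are predicted buggy probabilities.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

fpr, tpr = rates(y_true, y_pred)
area = auc(y_true, scores)
print(fpr, tpr, area)
```

Unlike the two rates, AUC does not depend on the 0.5 decision threshold, which is one reason it is convenient for comparing models across datasets with different class balance.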
  • 51. Evaluation Measures • Win/Tie/Loss (Valentini@ICML`03, Li@JASE`12, Kocaguneli@TSE`13) – Wilcoxon signed-rank test (p<0.05) for 1000 prediction results – Win • # of HDP prediction combinations that outperform the baseline with statistical significance (p<0.05) – Tie • # of HDP prediction combinations with no statistically significant difference (p≥0.05) – Loss • # of prediction combinations in which the baseline outperforms HDP with statistical significance (p<0.05) 51
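A sketch of how one prediction combination could be classified as a win, tie, or loss. It assumes paired lists of AUC values (one pair per repetition) and uses a plain normal approximation to the Wilcoxon signed-rank test, without tie or continuity corrections. Deciding the direction by comparing medians is also an assumption; the slides do not state how direction is determined.

```python
import math
from statistics import median


def wilcoxon_p(x, y):
    """Two-sided Wilcoxon signed-rank p-value (normal approximation)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # shared rank for tied |differences|
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def win_tie_loss(hdp_aucs, baseline_aucs, alpha=0.05):
    """Classify one prediction combination from HDP's point of view."""
    if wilcoxon_p(hdp_aucs, baseline_aucs) >= alpha:
        return "tie"
    return "win" if median(hdp_aucs) > median(baseline_aucs) else "loss"


# Made-up AUC values for 30 repetitions.
hdp = [0.70 + 0.001 * i for i in range(30)]
base = [0.60 + 0.0005 * i for i in range(30)]
print(win_tie_loss(hdp, base), win_tie_loss(base, hdp), win_tie_loss(hdp, hdp))
```

Note that both a win and a loss require p < 0.05; only the direction of the difference distinguishes them.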
  • 53. Prediction Results in median AUC Target WPDP CPDP-CM CPDP-IFS HDPKS (cutoff = 0.05) EQ 0.583 0.776 0.461 0.783 JDT 0.795 0.781 0.543 0.767 LC 0.575 0.636 0.584 0.655 ML 0.734 0.651 0.557 0.692* PDE 0.684 0.682 0.566 0.717 ant-1.3 0.670 0.611 0.500 0.701 arc 0.670 0.611 0.523 0.701 camel-1.0 0.550 0.590 0.500 0.639 poi-1.5 0.707 0.676 0.606 0.537 redaktor 0.744 0.500 0.500 0.537 skarbonka 0.569 0.736 0.528 0.694* tomcat 0.778 0.746 0.640 0.818 velocity-1.4 0.725 0.609 0.500 0.391 xalan-2.4 0.755 0.658 0.499 0.751 xerces-1.2 0.624 0.453 0.500 0.489 53 Target WPDP CPDP-CM CPDP-IFS HDPKS (cutoff = 0.05) Apache 0.714 0.689 0.635 0.717* Safe 0.706 0.749 0.616 0.818* ZXing 0.605 0.619 0.530 0.650* cm1 0.653 0.622 0.551 0.717* mw1 0.612 0.584 0.614 0.727 pc1 0.787 0.675 0.564 0.752* pc3 0.794 0.665 0.500 0.738* pc4 0.900 0.773 0.589 0.682* ar1 0.582 0.464 0.500 0.734* ar3 0.574 0.862 0.682 0.823* ar4 0.657 0.588 0.575 0.816* ar5 0.804 0.875 0.585 0.911* ar6 0.654 0.611 0.527 0.640 All 0.657 0.636 0.555 0.724* HDPKS: Heterogeneous defect prediction using KSAnalyzer
  • 54. Win/Tie/Loss Results Target Against WPDP Against CPDP-CM Against CPDP-IFS W T L W T L W T L EQ 4 0 0 2 2 0 4 0 0 JDT 0 0 5 3 0 2 5 0 0 LC 6 0 1 3 3 1 3 1 3 ML 0 0 6 4 2 0 6 0 0 PDE 3 0 2 2 0 3 5 0 0 ant-1.3 6 0 1 6 0 1 5 0 2 arc 3 1 0 3 0 1 4 0 0 camel-1.0 3 0 2 3 0 2 4 0 1 poi-1.5 2 0 2 3 0 1 2 0 2 redaktor 0 0 4 2 0 2 3 0 1 skarbonka 11 0 0 4 0 7 9 0 2 tomcat 2 0 0 1 1 0 2 0 0 velocity-1.4 0 0 3 0 0 3 0 0 3 xalan-2.4 0 0 1 1 0 0 1 0 0 xerces-1.2 0 0 3 3 0 0 1 0 2 54 Target Against WPDP Against CPDP-CM Against CPDP-IFS W T L W T L W T L Apache 6 0 5 8 1 2 9 0 2 Safe 14 0 3 12 0 5 15 0 2 ZXing 8 0 0 6 0 2 7 0 1 cm1 7 1 2 8 0 2 9 0 1 mw1 5 0 1 4 0 2 4 0 2 pc1 1 0 5 5 0 1 6 0 0 pc3 0 0 7 7 0 0 7 0 0 pc4 0 0 7 2 0 5 7 0 0 ar1 14 0 1 14 0 1 11 0 4 ar3 15 0 0 5 0 10 10 2 3 ar4 16 0 0 14 1 1 15 0 1 ar5 14 0 4 14 0 4 16 0 2 ar6 7 1 7 8 4 3 12 0 3 Total 147 3 72 147 14 61 182 3 35 % 66.2% 1.4% 32.4% 66.2% 6.3% 27.5% 82.0% 1.3% 16.7%
  • 55. Matched Metrics (Win) 55 Metric Values Distribution (Source metric: RFC, the number of methods invoked by a class; Target metric: the number of operands) Matching Score = 0.91 AUC = 0.946 (ant-1.3 → ar5)
  • 56. Matched Metrics (Loss) 56 Metric Values Distribution (Source metric: LOC; Target metric: average number of LOC in a method) Matching Score = 0.13 AUC = 0.391 (Safe → velocity-1.4)
  • 57. Different Feature Selections (median AUCs, Win/Tie/Loss) 57 Approach Against WPDP Against CPDP-CM Against CPDP-IFS HDP AUC Win% AUC Win% AUC Win% AUC Gain Ratio 0.657 63.7% 0.645 63.2% 0.536 80.2% 0.720 Chi-Square 0.657 64.7% 0.651 66.4% 0.556 82.3% 0.727 Significance 0.657 66.2% 0.636 66.2% 0.553 82.0% 0.724 Relief-F 0.670 57.0% 0.657 63.1% 0.543 80.5% 0.709 None 0.657 47.3% 0.624 50.3% 0.536 66.3% 0.663
  • 58. Results in Different Cutoffs 58 Cutoff Against WPDP Against CPDP-CM Against CPDP-IFS HDP Target Coverage AUC Win% AUC Win% AUC Win% AUC 0.05 0.657 66.2% 0.636 66.2% 0.553 82.4% 0.724* 100% 0.90 0.657 100% 0.761 71.4% 0.624 100% 0.852* 21%
  • 59. 59 Software Defect Prediction on Unlabeled Datasets Sub-problems Proposed Techniques CPDP comparable to WPDP? Transfer Defect Learning (TCA+) CPDP across projects with heterogeneous metric sets? Heterogeneous Defect Prediction (HDP) DP using only unlabeled datasets without human effort? CLAMI
  • 61. Motivation 61 - Loss result of HDP Still difficult to make different distribution similar!
  • 63. How? • Recall the trend of defect prediction metrics – Measures complexity of software and its development process. • e.g. – The number of developers touching a source code file (Bird@FSE`11) – The number of methods in a class (D’Ambros@ESEJ`12) – The number of operands (Menzies@TSE`08) Higher metric values imply more defect-proneness (Rahman@ICSE`13) 63
  • 64. How? • Recall this trend of defect prediction metrics – Measures complexity of software and its development process. • e.g. – The number of developers touching a source code file (Bird@FSE`11) – The number of methods in a class (D’Ambros@ESEJ`12) – The number of operands (Menzies@TSE`08) Higher metric values imply more defect-proneness (Rahman@ICSE`13) 64 (1) Label instances that have higher metric values as buggy! (2) Generate a training set by removing metrics and instances that violate (1).
  • 65. CLAMI Approach Overview 65 Unlabeled Dataset (1) Clustering (2) LAbeling (3) Metric Selection (4) Instance Selection (5) Metric Selection CLAMI Model Build Predict Training dataset Test dataset
  • 66. CLAMI Approach - Clustering and Labeling Clusters - 66 Cluster, K=3 Unlabeled Dataset X1 X2 X3 X4 X5 X6 X7 Label 3 1 3 0 5 1 9 ? 1 1 2 0 7 3 8 ? 2 3 2 5 5 2 1 ? 0 0 8 1 0 1 9 ? 1 0 2 5 6 10 8 ? 1 4 1 1 7 1 1 ? 1 0 1 0 0 1 7 ? 1 1 2 1 5 1 8 Median Inst. A Inst. B Inst. C Inst. D Inst. E Inst. F Inst. G Instances K = the number of metric values that are greater than the median. C Cluster, K=4 A, E B, D, F Cluster, K=2 G Cluster, K=0 (1) Clustering (2) Labeling Clusters Higher values : buggy clusters : clean clusters
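The clustering and labeling steps can be traced on the slide's seven example instances. This is a minimal sketch: the function name is made up, and the rule that the top half of the distinct K values forms the buggy clusters is an assumption about how "higher values" is decided when the clusters are split.

```python
from statistics import median

# Metric values X1..X7 for instances A..G, copied from the slide.
instances = {
    "A": [3, 1, 3, 0, 5, 1, 9],
    "B": [1, 1, 2, 0, 7, 3, 8],
    "C": [2, 3, 2, 5, 5, 2, 1],
    "D": [0, 0, 8, 1, 0, 1, 9],
    "E": [1, 0, 2, 5, 6, 10, 8],
    "F": [1, 4, 1, 1, 7, 1, 1],
    "G": [1, 0, 1, 0, 0, 1, 7],
}


def cluster_and_label(instances):
    """Cluster instances by K and label the higher-K clusters as buggy."""
    columns = list(zip(*instances.values()))
    medians = [median(column) for column in columns]
    # K = number of metric values strictly greater than the metric's median.
    k = {
        name: sum(value > m for value, m in zip(values, medians))
        for name, values in instances.items()
    }
    distinct = sorted(set(k.values()), reverse=True)
    buggy_ks = set(distinct[: len(distinct) // 2])  # assumed: top half
    labels = {
        name: "Buggy" if k[name] in buggy_ks else "Clean" for name in instances
    }
    return medians, k, labels


medians, k, labels = cluster_and_label(instances)
print(medians)
print(k)
print(labels)
```

On this data the computed medians are 1, 1, 2, 1, 5, 1, 8 and the clusters are C (K=4), A and E (K=3), B, D and F (K=2), and G (K=0), which agrees with the slide.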
  • 67. CLAMI Approach - Metric Selection - 67 {X1,X4} X1 X2 X3 X4 X5 X6 X7 Label 3 1 3 0 5 1 9 Buggy 1 1 2 0 7 3 8 Clean 2 3 2 5 5 2 1 Buggy 0 0 8 1 0 1 9 Clean 1 0 2 5 6 10 8 Buggy 1 4 1 1 7 1 1 Clean 1 0 1 0 0 1 7 Clean Inst. A Inst. B Inst. C Inst. D Inst. E Inst. F Inst. G 1 3 3 1 4 2 3 # of Violations Selected Metrics Violation: a metric value that does not follow its label! Higher values are bold-faced. Violations
  • 68. CLAMI Approach - Instance Selection - 68 X1 X4 Label 3 0 Buggy 1 0 Clean 2 5 Buggy 0 1 Clean 1 5 Buggy 1 1 Clean 1 0 Clean Inst. A Inst. B Inst. C Inst. D Inst. E Inst. F Inst. G X1 X4 Label 1 0 Clean 2 5 Buggy 0 1 Clean 1 1 Clean 1 0 Clean Inst. B Inst. C Inst. D Inst. F Inst. G Final Training Dataset
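Metric selection and instance selection can be sketched on the same example. The code assumes a violation means a buggy instance whose value is not above the metric's median, or a clean instance whose value is above it, and that the metrics with the fewest violations are kept. The function names are illustrative only.

```python
from statistics import median

metrics = ["X1", "X2", "X3", "X4", "X5", "X6", "X7"]
# Values and labels copied from the metric selection slide.
rows = {
    "A": ([3, 1, 3, 0, 5, 1, 9], "Buggy"),
    "B": ([1, 1, 2, 0, 7, 3, 8], "Clean"),
    "C": ([2, 3, 2, 5, 5, 2, 1], "Buggy"),
    "D": ([0, 0, 8, 1, 0, 1, 9], "Clean"),
    "E": ([1, 0, 2, 5, 6, 10, 8], "Buggy"),
    "F": ([1, 4, 1, 1, 7, 1, 1], "Clean"),
    "G": ([1, 0, 1, 0, 0, 1, 7], "Clean"),
}


def violates(value, med, label):
    """A value violates its label if 'higher than median' disagrees with 'Buggy'."""
    return (value > med) != (label == "Buggy")


def select(rows, metrics):
    n = len(metrics)
    medians = [median(values[i] for values, _ in rows.values()) for i in range(n)]
    counts = [
        sum(violates(values[i], medians[i], label) for values, label in rows.values())
        for i in range(n)
    ]
    # Metric selection: keep the metrics with the fewest violations.
    kept = [i for i, c in enumerate(counts) if c == min(counts)]
    # Instance selection: drop instances that still violate a kept metric.
    training = {
        name: ([values[i] for i in kept], label)
        for name, (values, label) in rows.items()
        if not any(violates(values[i], medians[i], label) for i in kept)
    }
    return counts, [metrics[i] for i in kept], training


counts, selected, training = select(rows, metrics)
print(counts)
print(selected)
print(training)
```

The violation counts come out as 1, 3, 3, 1, 4, 2, 3, so X1 and X4 are selected. Instances A and E are then removed, leaving B, C, D, F, and G as the final training dataset, as shown on the slides.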
  • 69. CLAMI Approach Overview 69 Unlabeled Dataset (1) Clustering (2) LAbeling (3) Metric Selection (4) Instance Selection (5) Metric Selection CLAMI Model Build Predict Training dataset Test dataset
  • 71. Baselines • Supervised learning model (i.e. WPDP) • Defect prediction only using unlabeled datasets – Expert-based (Zhong@HASE`04) • Cluster instances by K-means into 20 clusters • A human expert labels each cluster – Threshold-based (Catal@ITNG`09) • [LoC, CC, UOP, UOpnd, TOp, TOpnd] = [65, 10, 25, 40, 125, 70] – Label an instance as buggy if any of its metric values is greater than the corresponding threshold value • Manual effort is required to decide threshold values in advance. 71
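The threshold-based baseline reduces to a single rule, sketched below. The threshold values are the ones listed on the slide; the two example instances and the function name `threshold_label` are made up for illustration.

```python
# Thresholds from the slide, keyed by metric name.
THRESHOLDS = {"LoC": 65, "CC": 10, "UOP": 25, "UOpnd": 40, "TOp": 125, "TOpnd": 70}


def threshold_label(instance):
    """Label an instance buggy if any metric exceeds its threshold.

    Returns the label together with the metrics that triggered it.
    """
    exceeded = [m for m, limit in THRESHOLDS.items() if instance[m] > limit]
    return ("Buggy" if exceeded else "Clean"), exceeded


# Hypothetical instances.
small = {"LoC": 40, "CC": 3, "UOP": 10, "UOpnd": 15, "TOp": 60, "TOpnd": 30}
large = {"LoC": 120, "CC": 4, "UOP": 12, "UOpnd": 20, "TOp": 130, "TOpnd": 50}

small_result = threshold_label(small)
large_result = threshold_label(large)
print(small_result, large_result)
```

No training is involved, but the thresholds themselves have to be chosen beforehand, which is the manual effort the slide refers to.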
  • 72. Research Questions (RQs) • RQ1 – CLAMI vs. Supervised learning model? • RQ2 – CLAMI vs. Expert-/threshold-based approaches? (Zhong@HASE`04, Catal@ITNG`09) 72
  • 73. Benchmark Datasets Group Dataset # of instances # of metrics Prediction Granularity All Buggy (%) NetGene Httpclient 361 205 (56.8%) 465 (Network, Change genealogy) File Jackrabbit 542 225 (41.5%) Lucene 1671 346 (10.7%) Rhino 253 109 (43.1%) ReLink Apache 194 98 (50.5%) 26 (code complexity) File Safe 56 22 (39.29%) ZXing 399 118 (29.6%) 73
  • 74. Experimental Settings (RQ1) - Supervised learning model - 74 Test set (50%) Training set (50%) Supervised Model (Baseline) Training Predict X 1000 CLAMI Model Training Predict
  • 75. Experimental Settings (RQ2) -Comparison to existing approaches - 75 Unlabeled Dataset CLAMI Model Predict Training Predict Threshold- Based (Baseline1, Catal@ITNG`09) Expert- Based (Baseline2, Zhong@HASE`04)
  • 78. Supervised model vs. CLAMI Dataset F-measure AUC Supervised (w/ labels) CLAMI (w/o labels) +/-% Supervised (w/ labels) CLAMI (w/o labels) +/-% Httpclient 0.729 0.722 -1.0% 0.727 0.772 +6.2% Jackrabbit 0.649 0.685 +5.5% 0.727 0.751 +3.2% Lucene 0.508 0.397 -21.8% 0.708 0.595 -15.9% Rhino 0.639 0.752 +17.7% 0.702 0.777 +10.7% Apache 0.653 0.720 +10.2% 0.714 0.753 +5.3% Safe 0.615 0.667 +8.3% 0.706 0.773 +9.5% ZXing 0.333 0.497 +49.0% 0.605 0.644 +6.4% Median 0.639 0.685 +7.2% 0.707 0.753 +6.3% 78
  • 79. Existing approaches vs. CLAMI f-measure Dataset Threshold-based Expert-based CLAMI Httpclient 0.355 0.811 0.756 Jackrabbit 0.184 0.676 0.685 Lucene 0.144 0.000 0.404 Rhino 0.190 0.707 0.731 Apache 0.547 0.701 0.725 Safe 0.308 0.718 0.694 ZXing 0.228 0.402 0.505 Median 0.228 0.701 0.694 79
  • 80. Distributions of metrics (Safe) 80 Most frequently selected metrics by CLAMI Metrics with less discriminative power
  • 81. Distributions of metrics (Lucene) 81 Most frequently selected metrics by CLAMI Metrics with less discriminative power
  • 82. 82 Software Defect Prediction on Unlabeled Datasets Sub-problems Proposed Techniques CPDP comparable to WPDP? Transfer Defect Learning (TCA+) CPDP across projects with heterogeneous metric sets? Heterogeneous Defect Prediction (HDP) DP using only unlabeled datasets without human effort? CLAMI
  • 83. Conclusion 83 Sub-problems Technique 1: TCA+ Technique 2: HDP Technique 3: CLAMI Comparable prediction performance to WPDP O (in f-measure) O (in AUC) O Able to handle heterogeneous metric sets X O O Automated without human effort O O O
  • 84. Publications at HKUST • Defect Prediction – Micro Interaction Metrics for Defect Prediction@FSE`11, Taek Lee, Jaechang Nam, Donggyun Han, Sunghun Kim and Hoh Peter In – Transfer Defect Learning@ICSE`13, Jaechang Nam, Sinno Jialin Pan and Sunghun Kim, Nominee, ACM SIGSOFT Distinguished Paper Award – Heterogeneous Defect Prediction@FSE`15, Jaechang Nam and Sunghun Kim – REMI: Defect Prediction for Efficient API Testing@FSE`15, Mijung Kim, Jaechang Nam, Jaehyuk Yeon, Soonhwang Choi, and Sunghun Kim, Industrial Track – CLAMI: Defect Prediction on Unlabeled Datasets@ASE`15, Jaechang Nam and Sunghun Kim • Testing – Calibrated Mutation Testing@MUTATION`12, Jaechang Nam, David Schuler, and Andreas Zeller • Automated bug-fixing – Automatic Patch Generation Learned from Human-written Patches@ICSE`13, Dongsun Kim, Jaechang Nam, Jaewoo Song and Sunghun Kim, ACM SIGSOFT Distinguished Paper Award Winner 84

Editor's Notes

  1. Good afternoon, everyone! I’m JC. Thanks for coming to my PhD defence. The title of my thesis is Software Defect Prediction on Unlabeled Datasets.
  2. The general question of software defect prediction is: can we identify defect-prone software entities in advance? For example, by using a defect prediction technique, we can predict whether a source code file is buggy or clean. After predicting defect-prone software entities, software quality assurance teams can effectively allocate limited resources for software testing and code review to develop reliable software products.
  3. Here is Project A and some software entities. Let’s say these entities are source code files. I want to predict whether these files are buggy or clean. To do this, we need a prediction model. Since defect prediction models are trained by machine learning algorithms, we need labeled instances collected from previous releases. This is a labeled instance. An instance consists of features and a label. Various software metrics, such as LoC, the number of functions in a file, and the number of authors touching a source file, are used as features for machine learning. Software metrics measure the complexity of software and its development process. Each instance can be labeled using past bug information. Software metrics and past bug information can be collected from software archives such as version control systems and bug report systems. With these labeled instances, we can build a prediction model and predict the unlabeled instances. This prediction is conducted within the same project, so we call it within-project defect prediction (WPDP). There are many studies on WPDP, and they have shown good prediction performance (for example, a prediction accuracy of 0.7).
  4. What if there are no labeled instances? This can happen in new projects and in projects lacking historical data. New projects do not have past bug information to label instances. Some projects also do not have bug information because their software archives lack historical data. When I participated in an industrial project for Samsung Electronics, it was really difficult to generate labeled instances because the software archives were not well managed by developers. So, in some real industrial projects, we may not be able to generate labeled instances to build a prediction model. Without labeled instances, we cannot build a prediction model. After experiencing this limitation in industry, I decided to address this problem.
  5. We define this problem as Software Defect Prediction on Unlabeled Datasets.
  6. There are existing solutions to build a prediction model for unlabeled datasets. The first solution is cross-project defect prediction. We can reuse labeled instances from other projects.
  7. Normalization puts all data values on the same scale. For example, we can set the mean of a data set to 0 and its standard deviation to 1. Normalization is also known to help classification algorithms. Because many defect prediction models classify source code as buggy or clean, defect prediction is a classification problem, so we applied normalization to all training and test data sets.
  8. Based on these normalization techniques, we defined several normalization options for defect prediction data sets. N1 is min-max normalization, which maps the maximum and minimum values to 1 and 0 respectively. N2 is z-score normalization, which makes the mean and standard deviation 0 and 1 respectively. We assume that some data sets may not have enough statistical information, so we defined variations of z-score normalization. To normalize both source and target data sets, N3 uses only the mean and standard deviation of the source data (for cases where the target data does not have enough statistical information, for example because it has too few instances). N4 uses only the target information to normalize both source and target data sets.
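The four options can be sketched for a single metric column as follows. This is an illustration under assumptions: the population standard deviation is used, columns are assumed to be non-constant (otherwise the divisions fail), and the helper names are invented here rather than taken from TCA+.

```python
from statistics import mean, pstdev


def min_max(column):
    """N1: map the minimum to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def z_score(column, reference=None):
    """Z-score a column using the mean and std of `reference`.

    With no reference, the column's own statistics are used.
    """
    reference = column if reference is None else reference
    mu, sd = mean(reference), pstdev(reference)
    return [(v - mu) / sd for v in column]


def normalize(source, target, option):
    """Apply option N1..N4 to one metric of the source and target data."""
    if option == "N1":
        return min_max(source), min_max(target)
    if option == "N2":  # each data set uses its own mean and std
        return z_score(source), z_score(target)
    if option == "N3":  # both data sets use the source mean and std
        return z_score(source), z_score(target, reference=source)
    if option == "N4":  # both data sets use the target mean and std
        return z_score(source, reference=target), z_score(target)
    raise ValueError(f"unknown option: {option}")


# Made-up values for one metric.
source = [1.0, 2.0, 3.0]
target = [10.0, 20.0, 60.0]
n1_source, n1_target = normalize(source, target, "N1")
n3_source, n3_target = normalize(source, target, "N3")
print(n1_source, n1_target)
print(n3_source, n3_target)
```

Under N3 the target values stay far from zero because they are scaled with source statistics, which is exactly the difference from N2.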
  9. TCA+ provides decision rules to select a suitable normalization option. For the decision rules, we first characterize both source and target data sets to identify their differences. In the second step, we measure the similarity between the source and target data sets. Based on the degree of similarity, we created the decision rules.
  10. Then, how can we characterize a data set? Here are two data sets. Intuitively, data set A’s distribution is sparser than data set B’s. To quantify this difference, we compute the Euclidean distance between all pairs of instances in each data set. We define the DIST set as the distances of all pairs. Likewise, we can get the DIST set of data set B.
  11. Then, how can we characterize a data set? Here are two data sets. Intuitively, data set A’s distribution is sparser than data set B’s. To quantify this difference, we compute the Euclidean distance between all pairs of instances in each data set. We define the DIST set as the distances of all pairs. Likewise, we can get the DIST set of data set B.
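A small sketch of the DIST set and the summary statistics that could be derived from it. The choice of statistics (mean, median, min, max, std, number of instances) is an assumption based on the quantities mentioned in the decision-rule notes, and the two data sets are invented.

```python
import math
from itertools import combinations
from statistics import mean, median, pstdev


def dist_set(instances):
    """Euclidean distances between all pairs of instances."""
    return [math.dist(a, b) for a, b in combinations(instances, 2)]


def characterize(instances):
    """Summarize a data set by statistics of its DIST set."""
    d = dist_set(instances)
    return {
        "mean": mean(d),
        "median": median(d),
        "min": min(d),
        "max": max(d),
        "std": pstdev(d),
        "n_instances": len(instances),
    }


# Hypothetical two-metric data sets: A is spread out, B is compact.
data_a = [(0, 0), (3, 4), (6, 8)]
data_b = [(0, 0), (0.3, 0.4), (0.6, 0.8)]
profile_a = characterize(data_a)
profile_b = characterize(data_b)
print(profile_a)
print(profile_b)
```

Comparing the two profiles gives a numeric version of "A is sparser than B": every distance statistic of A is ten times that of B here.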
  12. These are the decision rules. If the mean and std are the same, we assume that the source and target distributions are the same, so we apply no normalization. For Rule 2, if the max and min values are different, we use N1 (min-max normalization). For Rules 3 and 4, we consider the std and the number of instances: if the target information is not enough, we use the source mean and std to normalize both data sets. For Rule 5, if no other rule is applicable, we apply the N2 option, which makes the mean and std 0 and 1 respectively.
  13. Here is an example showing how PCA and TCA work. In a two-dimensional space, there are source and target data sets, and we can see that their distributions are clearly different. If we apply PCA and TCA, we get the following results in a one-dimensional space.
  14. Probability density function; probability mass function. In PCA, instances are projected into a one-dimensional space; however, the source and target distributions are still different. In TCA, all instances are also projected into a one-dimensional space, where the source and target distributions are similar. Positive and negative instances of both the training and test domains keep their discriminative power, as shown in this figure. You can find the detailed equations for this algorithm in this paper. [add labels]
  15. 8 software subjects. ReLink (Wu et al.@FSE`11): 3 subjects (Apache / OpenIntent Safe / ZXing), 26 source code metrics (features), manually inspected defect data (golden set). AEEEM (D’Ambros et al.@MSR`10): 5 subjects (Apache Lucene (LC) / Equinox (EQ) / Eclipse JDT / Eclipse PDE UI / Mylyn (ML)), 61 metrics (source code, churn, entropy metrics). Machine learning algorithm: logistic regression.
  16. We report within-project prediction results. In the within-project prediction setting, we used 50:50 random splits, which are widely used in the literature. We repeated the 50:50 random splits 100 times.
  17. Wilcoxon-matched paired test
  18. Wilcoxon-matched paired test
  19. Wilcoxon-matched paired test
  20. Various feature selection approaches can be applied
  21. AEEEM: object-oriented (OO) metrics, previous-defect metrics, entropy metrics of change and code, and churn-of-source-code metrics [4]. MORPH: McCabe’s cyclomatic metrics, CK metrics, and other OO metrics [36]. ReLink: code complexity metrics. NASA: Halstead metrics and McCabe’s cyclomatic metrics, plus additional complexity metrics such as parameter count and percentage of comments. SOFTLAB: Halstead metrics and McCabe’s cyclomatic metrics.
  22. Clustering: group instances that have higher metric values. Labeling: label groups that have higher metric values as buggy. Metric and instance selection: select more informative metrics and instances.
  23. Clustering: group instances that have higher metric values. Labeling: label groups that have higher metric values as buggy. Metric and instance selection: select more informative metrics and instances.
  24. Manual effort to decide threshold Literature Tuning machine: using known bugs, decide threshold values that minimize prediction error. Analysis of multiple releases
  25. In the case of Lucene, all clusters were labeled as clean by the expert. Better results are bold-faced (this is not a statistical test; the experiment was conducted once).