1. Transfer Defect Learning
Jaechang Nam
The Hong Kong University of Science and Technology, China
Sinno Jialin Pan
Institute for Infocomm Research, Singapore
Sunghun Kim
The Hong Kong University of Science and Technology, China
2. Defect Prediction
• Hassan@ICSE`09, Predicting Faults Using the Complexity of Code Changes
• D'Ambros et al.@MSR`10, An Extensive Comparison of Bug Prediction Approaches
• Rahman et al.@ICSE`12, Recalling the "Imprecision" of Cross-Project Defect Prediction
• Hata et al.@ICSE`12, Bug Prediction Based on Fine-Grained Module Histories
• …
[Diagram: Program → Prediction Model (machine learning) → Future defects]
7. Cross-project Defect Prediction
"Training data is often not available, either because a company is too small or it is the first release of a product."
Zimmermann et al.@FSE`09, Cross-project Defect Prediction
"For many new projects we may not have enough historical data to train prediction models."
Rahman, Posnett, and Devanbu@ICSE`12, Recalling the "Imprecision" of Cross-Project Defect Prediction
8. Cross-project defect prediction
• Zimmermann et al.@FSE`09
  – "We ran 622 cross-project predictions and found only 3.4% actually worked."
[Pie chart: worked, 3.4%; did not work, 96.6%]
9. Cross-company defect prediction
• Turhan et al.@ESEJ`09
  – "Within-company data models are still the best."
[Bar chart: average F-measure (0–0.4) for cross-company prediction, cross-company prediction with a NN filter, and within-company prediction]
13. Approaches of Transfer Defect Learning
• Normalization: data preprocessing for training and test data
• TCA: Transfer Component Analysis, a state-of-the-art transfer learning algorithm
• TCA+: TCA adapted for cross-project defect prediction, with decision rules to select a suitable data normalization option
14. Data Normalization
• Scale all feature values to the same range
  – e.g., make mean = 0 and std = 1
• Known to help classification algorithms achieve better prediction performance [Han et al., 2012]
15. Normalization Options
• N1: min-max normalization (max=1, min=0) [Han et al., 2012]
• N2: z-score normalization (mean=0, std=1) [Han et al., 2012]
• N3: z-score normalization using only the source mean and standard deviation
• N4: z-score normalization using only the target mean and standard deviation
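To make the four options concrete, here is a minimal NumPy sketch (the function and variable names are mine, not from the paper; I assume N1 and N2 compute their statistics on each dataset separately, while N3 and N4 standardize both datasets with one side's statistics):

import numpy as np

def min_max(X, lo, hi):
    # N1: scale each feature to [0, 1] using the given column-wise min/max
    return (X - lo) / (hi - lo + 1e-12)  # epsilon guards constant features

def z_score(X, mean, std):
    # N2-N4: standardize each feature using the given column-wise mean/std
    return (X - mean) / (std + 1e-12)

def normalize(src, tgt, option):
    # Apply one normalization option to source and target feature matrices.
    if option == "NoN":
        return src, tgt
    if option == "N1":
        return (min_max(src, src.min(0), src.max(0)),
                min_max(tgt, tgt.min(0), tgt.max(0)))
    if option == "N2":
        return (z_score(src, src.mean(0), src.std(0)),
                z_score(tgt, tgt.mean(0), tgt.std(0)))
    if option == "N3":  # source statistics for both datasets
        m, s = src.mean(0), src.std(0)
        return z_score(src, m, s), z_score(tgt, m, s)
    if option == "N4":  # target statistics for both datasets
        m, s = tgt.mean(0), tgt.std(0)
        return z_score(src, m, s), z_score(tgt, m, s)
    raise ValueError(f"unknown option: {option}")

The point of N3 and N4 is that source and target end up on a common scale, which matters when a model trained on one project is applied to another.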
16. Approaches of Transfer Defect Learning
• Normalization: data preprocessing for training and test data
• TCA: Transfer Component Analysis, a state-of-the-art transfer learning algorithm
• TCA+: TCA adapted for cross-project defect prediction, with decision rules to select a suitable data normalization option
21. A Common Assumption in Traditional ML
• Training and test data are drawn from the same distribution
  – Cross prediction violates this assumption.
  – Transfer learning relaxes it.
Pan and Yang@TKDE`10, A Survey on Transfer Learning
24. Transfer Component Analysis
• Unsupervised transfer learning
  – Target project labels are not known.
• Source and target must have the same feature space.
• Makes the distributions of the training and test datasets similar
Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis
25. Transfer Component Analysis (cont.)
• Feature extraction approach
  – Dimensionality reduction
  – Projection
• Maps the original data into a lower-dimensional feature space
  – e.g., from a 2-dimensional into a 1-dimensional feature space
  – Cf. Principal Component Analysis (PCA)
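As a reference point for the projection idea, this short scikit-learn sketch projects 2-dimensional data into a 1-dimensional space with PCA, the technique the slide compares against (the data and names here are illustrative):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))   # 2-dimensional feature space
X[:, 1] += 0.8 * X[:, 0]        # correlate the two features

pca = PCA(n_components=1)       # project into a 1-dimensional space
Z = pca.fit_transform(X)        # Z has shape (100, 1)

PCA picks the direction of maximum variance; TCA instead picks directions that also make the source and target distributions look alike, which is what cross-project prediction needs.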
31. Transfer Component Analysis (cont.)
[Figure: source domain data and target domain data]
Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis
32. Transfer Component Analysis (cont.)
[Figure: projections learned by PCA vs. TCA on the same source and target data]
Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis
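The following NumPy sketch shows one way to implement the TCA of Pan et al.@TNN`10 with a linear kernel (my simplification and naming; mu and the number of components are tuning parameters): it finds a projection that shrinks the distance between the source and target distributions while preserving data variance.

import numpy as np

def tca(Xs, Xt, n_components=5, mu=1.0):
    # Xs: source features (ns x d), Xt: target features (nt x d).
    # Returns transformed source and target data (ns x k, nt x k).
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    X = np.vstack([Xs, Xt])
    K = X @ X.T  # linear kernel matrix (n x n)

    # MMD matrix L: measures the source/target distribution distance
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    L = np.outer(e, e)

    # Centering matrix H: preserves variance via the K H K term
    H = np.eye(n) - np.ones((n, n)) / n

    # Leading eigenvectors of (K L K + mu I)^-1 K H K are the transfer components
    A = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    vals, vecs = np.linalg.eig(A)
    order = np.argsort(-vals.real)[:n_components]
    W = vecs[:, order].real  # n x k projection

    Z = K @ W  # embed all instances
    return Z[:ns], Z[ns:]

Classification then proceeds on the transformed data: train, e.g., logistic regression on the returned source embedding with the source labels and predict on the target embedding.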
33. Preliminary Results using TCA
[Bar chart: F-measure (0–0.8) for Baseline, NoN, N1, N2, N3, and N4 in two cross-project settings: Safe→Apache and Apache→Safe]
*Baseline: cross-project defect prediction without TCA and normalization
• The prediction performance of TCA varies with the normalization option!
35. Approaches of Transfer Defect Learning
• Normalization: data preprocessing for training and test data
• TCA: Transfer Component Analysis, a state-of-the-art transfer learning algorithm
• TCA+: TCA adapted for cross-project defect prediction, with decision rules to select a suitable data normalization option
36. TCA+: Decision Rules
• Find a suitable normalization option for TCA
• Steps
  – #1: Characterize a dataset
  – #2: Measure the similarity between the source and target datasets
  – #3: Apply decision rules
37. #1: Characterize a dataset
[Figure: instances of Dataset A and Dataset B with their pairwise distances d_1,2, d_1,3, d_1,5, d_2,6, d_3,11, …]
• Compute the set of all pairwise distances between the instances of a dataset:
DIST_A = {d_ij : 1 ≤ i < j ≤ n}, where d_ij is the distance between instances i and j of dataset A and n is its number of instances
38. #2: Measure Similarity between source and target
• Minimum (min) and maximum (max) values of DIST
• Mean and standard deviation (std) of DIST
• The number of instances
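Steps #1 and #2 together reduce to computing the pairwise-distance set and its summary statistics; a minimal sketch using SciPy (the naming is mine):

import numpy as np
from scipy.spatial.distance import pdist

def characteristics(X):
    # Step #1: DIST = {d_ij : 1 <= i < j <= n} via pairwise Euclidean distances.
    # Step #2: the statistics compared between source and target.
    dist = pdist(X)  # condensed vector of all pairwise distances
    return {"min": dist.min(), "max": dist.max(),
            "mean": dist.mean(), "std": dist.std(),
            "n_instances": len(X)}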
39. #3: Decision Rules
• Rule #1
  – Mean and std are the same → NoN
• Rule #2
  – Max and min are different → N1 (max=1, min=0)
• Rules #3, #4
  – Std and # of instances are different → N3 or N4 (z-score with source/target mean and std)
• Rule #5
  – Default → N2 (mean=0, std=1)
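Read as code, the rules might look like the sketch below (this is my reading of the slide: the tolerance for "the same" and the tie-break between N3 and N4 are illustrative assumptions, not the paper's exact conditions):

def choose_normalization(src, tgt, tol=0.05):
    # src, tgt: characteristics dicts from the sketch above.
    def same(a, b):  # "the same" up to a relative tolerance (assumption)
        return abs(a - b) <= tol * max(abs(a), abs(b), 1e-12)

    # Rule #1: distributions already alike -> no normalization
    if same(src["mean"], tgt["mean"]) and same(src["std"], tgt["std"]):
        return "NoN"
    # Rule #2: value ranges differ -> min-max into [0, 1]
    if not same(src["max"], tgt["max"]) and not same(src["min"], tgt["min"]):
        return "N1"
    # Rules #3/#4: spread and size differ -> z-score with one side's statistics
    if not same(src["std"], tgt["std"]) and src["n_instances"] != tgt["n_instances"]:
        return "N3" if src["n_instances"] > tgt["n_instances"] else "N4"
    # Rule #5: default -> z-score on each dataset
    return "N2"

The full TCA+ pipeline is then: characterize the source and target datasets, choose an option, normalize, run TCA, and train the classifier on the transformed source data.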
50. Threats to Validity
• All subject systems are open-source projects.
• The experimental results may not be generalizable.
• The decision rules in TCA+ may not be generalizable.
51. Future Work
• Transfer defect learning across different feature spaces
  – e.g., ReLink → AEEEM and AEEEM → ReLink
• Local models using transfer learning
• Adapting transfer learning to other software engineering (SE) problems
  – e.g., transferring knowledge from mailing lists to the bug triage problem
52. Conclusion
• TCA+
  – TCA: makes the distributions of source and target similar
  – Decision rules to improve TCA
  – Significantly improves cross-project defect prediction performance
• Transfer learning in SE
  – Transfer learning may benefit other prediction and recommendation systems in SE domains.