Heterogeneous Defect
Prediction
ESEC/FSE 2015
September 3, 2015
Jaechang Nam and Sunghun Kim
Department of Computer Science and Engineering
HKUST
2
Predict
Training
?
?
Model
Project A
: Metric value
: Buggy-labeled instance
: Clean-labeled instance
?: Unlabeled instance
Software Defect Prediction
Related Work
Munson@TSE`92, Basili@TSE`95, Menzies@TSE`07,
Hassan@ICSE`09, Bird@FSE`11, D'Ambros@EMSE`12,
Lee@FSE`11, ...
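As a rough illustration of the within-project setting sketched above, the snippet below trains a classifier on labeled metric instances of one project and scores its unlabeled instances. The metric values, labels, and the use of scikit-learn's logistic regression are illustrative assumptions, not the exact setup behind the slides.

```python
# Minimal within-project defect prediction (WPDP) sketch.
# All data and names are hypothetical; rows = files, columns = metrics.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_labeled = np.array([[1, 1, 3, 10],
                      [8, 0, 1, 0],
                      [9, 0, 1, 1]])
y_labeled = np.array([1, 0, 0])          # 1 = buggy, 0 = clean

X_unlabeled = np.array([[2, 1, 4, 7]])   # unlabeled instance from the same project

model = LogisticRegression().fit(X_labeled, y_labeled)
print(model.predict_proba(X_unlabeled))  # predicted probability of being buggy
```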
What if labeled instances do not
exist?
3
?
?
?
?
?
Project X
Unlabeled
Dataset
?: Unlabeled instance
: Metric value
Model
New projects
Projects lacking in
historical data
Existing Solutions?
4
?
?
?
?
?
(New) Project X
Unlabeled
Dataset
?: Unlabeled instance
: Metric value
Cross-Project Defect Prediction
(CPDP)
5
?
?
?
?
?
Training
Predict
Model
Project A
(source)
Project X
(target)
Unlabeled
Dataset
: Metric value
: Buggy-labeled instance
: Clean-labeled instance
?: Unlabeled instance
Related Work
Watanabe@PROMISE`08, Turhan@EMSE`09,
Zimmermann@FSE`09, Ma@IST`12, Zhang@MSR`14,
Panichella@WCRE`14, Canfora@STVR`15
Challenge
Same metric set
(same feature space)
• Heterogeneous
metrics between
source and target
Motivation
6
?
Training
Test
Model
Project A
(source)
Project C
(target)
?
?
?
?
?
?
?
Heterogeneous metric sets
(different feature spaces
or different domains)
Possible to reuse all the existing defect datasets for CPDP!
Heterogeneous Defect Prediction (HDP)
Key Idea
• Consistent defect-proneness tendency of
metrics
– Defect prediction metrics measure complexity of
software and its development process.
• e.g.
– The number of developers touching a source code file
(Bird@FSE`11)
– The number of methods in a class (D'Ambros@EMSE`12)
– The number of operands (Menzies@TSE`08)
More complexity implies more defect-proneness
(Rahman@ICSE`13)
• Distributions between source and target should
be the same to build a strong prediction model.
7
Match source and target metrics that
have similar distribution
Heterogeneous Defect Prediction (HDP)
- Overview -
8
[Overview diagram]
Source: Project A, a labeled dataset with metrics X1-X4 and a Buggy/Clean label for each instance.
Target: Project B, an unlabeled dataset with metrics Y1-Y7.
Steps: (1) Metric Selection on the source metrics; (2) Metric Matching between the selected source metrics and the target metrics; (3) Build (training) a cross-prediction model over the matched metrics; (4) Predict (test) the labels of the target instances.
Metric Selection
• Why? (Guyon@JMLR`03)
– Select informative metrics
• Remove redundant and irrelevant metrics
– Decrease complexity of metric matching combination
• Feature Selection Approaches (Gao@SPE`11,Shivaji@TSE`13)
– Gain Ratio
– Chi-square
– Relief-F
– Significance attribute evaluation
9
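A minimal sketch of the metric selection step described above, using a chi-square score as a stand-in for the listed attribute evaluators (Gain Ratio, Relief-F, and significance attribute evaluation are typically run in Weka and are not reproduced here). The dataset and the selected percentage are hypothetical.

```python
# Illustrative metric (feature) selection via a chi-square score.
# Data, names, and the 15% cutoff are assumptions for the sketch.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

X_source = np.abs(np.random.RandomState(0).randn(100, 10))  # 10 candidate metrics
y_source = np.random.RandomState(1).randint(0, 2, 100)      # buggy / clean labels

k = max(1, int(0.15 * X_source.shape[1]))                    # keep top metrics
selector = SelectKBest(chi2, k=k).fit(X_source, y_source)
selected_metric_indices = selector.get_support(indices=True)
print(selected_metric_indices)
```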
Metric Matching
10
[Matching diagram: each source metric (X1, X2) is compared with each target metric (Y1, Y2); edges are weighted by matching scores, e.g., 0.8 and 0.5]
* Different cutoff values of the matching score can be applied.
* It is possible that no metrics are matched at all.
Compute Matching Score
KSAnalyzer
• Use p-value of Kolmogorov-Smirnov Test
(Massey@JASA`51)
11
Matching score M between the i-th source metric and the j-th target metric:
Mij = pij, where pij is the p-value of the KS test between the two metrics' value distributions.
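The sketch below combines the two steps above: it computes KS-based matching scores Mij and then keeps a one-to-one metric matching above a cutoff. The paper pairs metrics via maximum weight bipartite matching; SciPy's assignment solver plays that role here, and all data and variable names are illustrative.

```python
# Sketch of KSAnalyzer-style metric matching, assuming source/target metric
# values are available as 2-D arrays (rows = instances, columns = metrics).
import numpy as np
from scipy.stats import ks_2samp
from scipy.optimize import linear_sum_assignment

def matching_scores(source, target):
    """M[i, j] = p-value of the KS test between source metric i and target metric j."""
    M = np.zeros((source.shape[1], target.shape[1]))
    for i in range(source.shape[1]):
        for j in range(target.shape[1]):
            M[i, j] = ks_2samp(source[:, i], target[:, j]).pvalue
    return M

def match_metrics(source, target, cutoff=0.05):
    M = matching_scores(source, target)
    rows, cols = linear_sum_assignment(M, maximize=True)   # maximize total score
    # Keep only pairs whose score exceeds the cutoff; possibly none survive.
    return [(i, j, M[i, j]) for i, j in zip(rows, cols) if M[i, j] > cutoff]

rng = np.random.RandomState(0)
pairs = match_metrics(rng.lognormal(size=(200, 3)), rng.lognormal(size=(150, 4)))
print(pairs)
```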
Heterogeneous Defect Prediction
- Overview -
12
[Overview diagram repeated from slide 8: Source (Project A, metrics X1-X4, labeled) and Target (Project B, metrics Y1-Y7, unlabeled) go through Metric Selection, Metric Matching, Build (training), and Predict (test).]
EVALUATION
13
Baselines
• WPDP
• CPDP-CM (Turhan@EMSE`09,Ma@IST`12,He@IST`14)
– Cross-project defect prediction using only
common metrics between source and target
datasets
• CPDP-IFS (He@CoRR`14)
– Cross-project defect prediction on
Imbalanced Feature Set (i.e. heterogeneous
metric set)
– 16 distributional characteristics of values of
an instance as features (e.g., mean, std,
maximum,...)
14
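For concreteness, a hedged sketch of the CPDP-IFS representation described above: each instance is re-encoded by distributional characteristics of its own metric values, so source and target share one feature space regardless of their original metric sets. Only a handful of the 16 characteristics are computed here, and the helper name and data are hypothetical.

```python
# Sketch of the CPDP-IFS instance representation (subset of the 16
# distributional characteristics; see the full list in the speaker notes).
import numpy as np
from scipy.stats import skew, kurtosis

def ifs_features(X):
    """X: (instances x metrics). Returns per-instance distributional features."""
    return np.column_stack([
        X.mean(axis=1),
        np.median(X, axis=1),
        X.std(axis=1),
        X.min(axis=1),
        X.max(axis=1),
        skew(X, axis=1),
        kurtosis(X, axis=1),
    ])

rng = np.random.RandomState(0)
print(ifs_features(rng.rand(5, 20)).shape)   # (5 instances, 7 features)
```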
Research Questions (RQs)
• RQ1
– Is heterogeneous defect prediction comparable
to WPDP?
• RQ2
– Is heterogeneous defect prediction comparable
to CPDP-CM?
• RQ3
– Is heterogeneous defect prediction comparable
to CPDP-IFS?
15
Benchmark Datasets
Group   | Dataset      | # of instances: All | Buggy (%)   | # of metrics | Granularity
AEEEM   | EQ           | 325  | 129 (39.7%) | 61 | Class
AEEEM   | JDT          | 997  | 206 (20.7%) | 61 | Class
AEEEM   | LC           | 399  | 64 (9.36%)  | 61 | Class
AEEEM   | ML           | 1862 | 245 (13.2%) | 61 | Class
AEEEM   | PDE          | 1492 | 209 (14.0%) | 61 | Class
MORPH   | ant-1.3      | 125  | 20 (16.0%)  | 20 | Class
MORPH   | arc          | 234  | 27 (11.5%)  | 20 | Class
MORPH   | camel-1.0    | 339  | 13 (3.8%)   | 20 | Class
MORPH   | poi-1.5      | 237  | 141 (75.0%) | 20 | Class
MORPH   | redaktor     | 176  | 27 (15.3%)  | 20 | Class
MORPH   | skarbonka    | 45   | 9 (20.0%)   | 20 | Class
MORPH   | tomcat       | 858  | 77 (9.0%)   | 20 | Class
MORPH   | velocity-1.4 | 196  | 147 (75.0%) | 20 | Class
MORPH   | xalan-2.4    | 723  | 110 (15.2%) | 20 | Class
MORPH   | xerces-1.2   | 440  | 71 (16.1%)  | 20 | Class
ReLink  | Apache       | 194  | 98 (50.5%)  | 26 | File
ReLink  | Safe         | 56   | 22 (39.3%)  | 26 | File
ReLink  | ZXing        | 399  | 118 (29.6%) | 26 | File
NASA    | cm1          | 327  | 42 (12.8%)  | 37 | Function
NASA    | mw1          | 253  | 27 (10.7%)  | 37 | Function
NASA    | pc1          | 705  | 61 (8.7%)   | 37 | Function
NASA    | pc3          | 1077 | 134 (12.4%) | 37 | Function
NASA    | pc4          | 1458 | 178 (12.2%) | 37 | Function
SOFTLAB | ar1          | 121  | 9 (7.4%)    | 29 | Function
SOFTLAB | ar3          | 63   | 8 (12.7%)   | 29 | Function
SOFTLAB | ar4          | 107  | 20 (18.7%)  | 29 | Function
SOFTLAB | ar5          | 36   | 8 (22.2%)   | 29 | Function
SOFTLAB | ar6          | 101  | 15 (14.9%)  | 29 | Function
600 prediction combinations in total!
Experimental Settings
• Logistic Regression
• HDP vs. WPDP, CPDP-CM, and CPDP-IFS
17
[Setup diagram: each dataset is randomly split into a training set (50%) and a test set (50%), repeated 1000 times (x 1000). For a target project A, WPDP trains on Project A's own training split, while CPDP-CM, CPDP-IFS, and HDP train on the other projects (Project 1 ... Project n); all models are evaluated on the same test split.]
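Below is a sketch of the WPDP side of this harness, assuming repeated random 50/50 splits and scikit-learn's logistic regression (the paper's actual scripts and toolkit are not shown here). The HDP side would instead train on the matched source metrics and test on the same target test split. Data and helper names are hypothetical.

```python
# Sketch of the evaluation harness: repeated random 50/50 splits of the target
# project, a logistic-regression learner, and AUC per repetition.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def wpdp_aucs(X_target, y_target, repetitions=1000):
    aucs = []
    for seed in range(repetitions):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X_target, y_target, test_size=0.5, random_state=seed, stratify=y_target)
        prob = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
        aucs.append(roc_auc_score(y_te, prob))
    return np.array(aucs)

rng = np.random.RandomState(0)
X = rng.rand(300, 10)
y = (X[:, 0] + 0.3 * rng.rand(300) > 0.8).astype(int)       # synthetic labels
print(np.median(wpdp_aucs(X, y, repetitions=20)))           # median AUC (20 reps shown)
```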
Evaluation Measures
• False Positive Rate = FP/(TN+FP)
• True Positive Rate = Recall
• AUC (Area Under receiver operating characteristic Curve)
18
[ROC curve sketch: x-axis = false positive rate (0 to 1), y-axis = true positive rate (0 to 1)]
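The measures above can be computed as follows; the labels, predicted probabilities, and the 0.5 classification threshold are purely illustrative.

```python
# Minimal sketch of the evaluation measures: false positive rate, true
# positive rate (recall), and AUC, from hypothetical predictions.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve, confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.6, 0.55])

fpr, tpr, _ = roc_curve(y_true, y_prob)              # points of the ROC curve
print("AUC =", roc_auc_score(y_true, y_prob))

# Rates at a fixed classification threshold of 0.5 (threshold is illustrative):
tn, fp, fn, tp = confusion_matrix(y_true, (y_prob >= 0.5).astype(int)).ravel()
print("FPR =", fp / (tn + fp), "TPR (recall) =", tp / (tp + fn))
```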
Evaluation Measures
• Win/Tie/Loss (Valentini@ICML`03, Li@JASE`12, Kocaguneli@TSE`13)
– Wilcoxon signed-rank test (p<0.05) for 1000
prediction results
– Win
• # of prediction combinations where HDP outperforms the baseline with
statistical significance (p<0.05)
– Tie
• # of prediction combinations with no statistically significant difference
(p≥0.05)
– Loss
• # of prediction combinations where the baseline outperforms HDP with
statistical significance (p<0.05)
19
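A sketch of a single Win/Tie/Loss decision over 1000 paired prediction results, assuming synthetic AUC values in place of the real ones.

```python
# One Win/Tie/Loss decision: compare paired AUC values of HDP and a baseline
# with the Wilcoxon signed-rank test. The AUC arrays are synthetic placeholders.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.RandomState(0)
auc_hdp = np.clip(rng.normal(0.72, 0.05, 1000), 0, 1)
auc_baseline = np.clip(rng.normal(0.66, 0.05, 1000), 0, 1)

stat, p = wilcoxon(auc_hdp, auc_baseline)
if p < 0.05:
    outcome = "Win" if np.median(auc_hdp) > np.median(auc_baseline) else "Loss"
else:
    outcome = "Tie"
print(outcome, p)
```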
RESULT
20
Prediction Results in median AUC
Target       | WPDP  | CPDP-CM | CPDP-IFS | HDPKS (cutoff=0.05)
EQ           | 0.583 | 0.776   | 0.461    | 0.783
JDT          | 0.795 | 0.781   | 0.543    | 0.767
LC           | 0.575 | 0.636   | 0.584    | 0.655
ML           | 0.734 | 0.651   | 0.557    | 0.692*
PDE          | 0.684 | 0.682   | 0.566    | 0.717
ant-1.3      | 0.670 | 0.611   | 0.500    | 0.701
arc          | 0.670 | 0.611   | 0.523    | 0.701
camel-1.0    | 0.550 | 0.590   | 0.500    | 0.639
poi-1.5      | 0.707 | 0.676   | 0.606    | 0.537
redaktor     | 0.744 | 0.500   | 0.500    | 0.537
skarbonka    | 0.569 | 0.736   | 0.528    | 0.694*
tomcat       | 0.778 | 0.746   | 0.640    | 0.818
velocity-1.4 | 0.725 | 0.609   | 0.500    | 0.391
xalan-2.4    | 0.755 | 0.658   | 0.499    | 0.751
xerces-1.2   | 0.624 | 0.453   | 0.500    | 0.489
Apache       | 0.714 | 0.689   | 0.635    | 0.717*
Safe         | 0.706 | 0.749   | 0.616    | 0.818*
ZXing        | 0.605 | 0.619   | 0.530    | 0.650*
cm1          | 0.653 | 0.622   | 0.551    | 0.717*
mw1          | 0.612 | 0.584   | 0.614    | 0.727
pc1          | 0.787 | 0.675   | 0.564    | 0.752*
pc3          | 0.794 | 0.665   | 0.500    | 0.738*
pc4          | 0.900 | 0.773   | 0.589    | 0.682*
ar1          | 0.582 | 0.464   | 0.500    | 0.734*
ar3          | 0.574 | 0.862   | 0.682    | 0.823*
ar4          | 0.657 | 0.588   | 0.575    | 0.816*
ar5          | 0.804 | 0.875   | 0.585    | 0.911*
ar6          | 0.654 | 0.611   | 0.527    | 0.640
All          | 0.657 | 0.636   | 0.555    | 0.724*
HDPKS: Heterogeneous defect prediction using KSAnalyzer
Win/Tie/Loss Results
Target       | Against WPDP (W/T/L) | Against CPDP-CM (W/T/L) | Against CPDP-IFS (W/T/L)
EQ           | 4/0/0                | 2/2/0                   | 4/0/0
JDT          | 0/0/5                | 3/0/2                   | 5/0/0
LC           | 6/0/1                | 3/3/1                   | 3/1/3
ML           | 0/0/6                | 4/2/0                   | 6/0/0
PDE          | 3/0/2                | 2/0/3                   | 5/0/0
ant-1.3      | 6/0/1                | 6/0/1                   | 5/0/2
arc          | 3/1/0                | 3/0/1                   | 4/0/0
camel-1.0    | 3/0/2                | 3/0/2                   | 4/0/1
poi-1.5      | 2/0/2                | 3/0/1                   | 2/0/2
redaktor     | 0/0/4                | 2/0/2                   | 3/0/1
skarbonka    | 11/0/0               | 4/0/7                   | 9/0/2
tomcat       | 2/0/0                | 1/1/0                   | 2/0/0
velocity-1.4 | 0/0/3                | 0/0/3                   | 0/0/3
xalan-2.4    | 0/0/1                | 1/0/0                   | 1/0/0
xerces-1.2   | 0/0/3                | 3/0/0                   | 1/0/2
Apache       | 6/0/5                | 8/1/2                   | 9/0/2
Safe         | 14/0/3               | 12/0/5                  | 15/0/2
ZXing        | 8/0/0                | 6/0/2                   | 7/0/1
cm1          | 7/1/2                | 8/0/2                   | 9/0/1
mw1          | 5/0/1                | 4/0/2                   | 4/0/2
pc1          | 1/0/5                | 5/0/1                   | 6/0/0
pc3          | 0/0/7                | 7/0/0                   | 7/0/0
pc4          | 0/0/7                | 2/0/5                   | 7/0/0
ar1          | 14/0/1               | 14/0/1                  | 11/0/4
ar3          | 15/0/0               | 5/0/10                  | 10/2/3
ar4          | 16/0/0               | 14/1/1                  | 15/0/1
ar5          | 14/0/4               | 14/0/4                  | 16/0/2
ar6          | 7/1/7                | 8/4/3                   | 12/0/3
Total        | 147/3/72             | 147/14/61               | 182/3/35
%            | 66.2%/1.4%/32.4%     | 66.2%/6.3%/27.5%        | 82.0%/1.3%/16.7%
Matched Metrics (Win)
23
[Distribution plot of the matched metric values]
(Source metric: RFC, the number of methods invoked by a class; Target metric: the number of operands)
Matching Score = 0.91
AUC = 0.946 (ant-1.3 → ar5)
Matched Metrics (Loss)
24
[Distribution plot of the matched metric values]
(Source metric: LOC; Target metric: average number of LOC in a method)
Matching Score = 0.13
AUC = 0.391 (Safe → velocity-1.4)
Different Feature Selections
(median AUCs, Win/Tie/Loss)
25
Approach     | Against WPDP (AUC, Win%) | Against CPDP-CM (AUC, Win%) | Against CPDP-IFS (AUC, Win%) | HDP AUC
Gain Ratio   | 0.657, 63.7%             | 0.645, 63.2%                | 0.536, 80.2%                 | 0.720
Chi-Square   | 0.657, 64.7%             | 0.651, 66.4%                | 0.556, 82.3%                 | 0.727
Significance | 0.657, 66.2%             | 0.636, 66.2%                | 0.553, 82.0%                 | 0.724
Relief-F     | 0.670, 57.0%             | 0.657, 63.1%                | 0.543, 80.5%                 | 0.709
None         | 0.657, 47.3%             | 0.624, 50.3%                | 0.536, 66.3%                 | 0.663
Results in Different Cutoffs
26
Cutoff | Against WPDP (AUC, Win%) | Against CPDP-CM (AUC, Win%) | Against CPDP-IFS (AUC, Win%) | HDP AUC | Target Coverage
0.05   | 0.657, 66.2%             | 0.636, 66.2%                | 0.553, 82.4%                 | 0.724*  | 100%
0.90   | 0.657, 100%              | 0.761, 71.4%                | 0.624, 100%                  | 0.852*  | 21%
Conclusion
• HDP
– Potential for CPDP across datasets with
different metric sets.
• Future work
– Filtering out noisy metric matches
– Determining the best probability threshold
27
Q&A
THANK YOU!
28
Editor's Notes
  1. Oggioni Room 17 PM (Session in 16:30 – 18:00)
  2. Here is Project A and some software entities. Let's say these entities are source code files. I want to predict whether these files are buggy or clean. To do this, we need a prediction model. Since defect prediction models are trained by machine learning algorithms, we need labeled instances collected from previous releases. This is a labeled instance. An instance consists of features and a label. Various software metrics, such as LOC, the number of functions in a file, and the number of authors touching a source file, are used as features for machine learning. Software metrics measure the complexity of software and its development process. Each instance can be labeled by past bug information. Software metrics and past bug information can be collected from software archives such as version control systems and bug report systems. With these labeled instances, we can build a prediction model and predict the unlabeled instances. This prediction is conducted within the same project, so we call it within-project defect prediction (WPDP). There are many studies on WPDP, and they showed good prediction performance (e.g., a prediction accuracy of 0.7).
  3. What if there are no labeled instances? This can happen in new projects and projects lacking historical data. New projects do not have past defect information to label instances. Some projects also do not have defect information because their software archives lack historical data. When I participated in an industrial project for Samsung Electronics, it was really difficult to generate labeled instances because their software archives were not well managed by developers. So, in some real industrial projects, we may not be able to generate labeled instances to build a prediction model. Without labeled instances, we cannot build a prediction model. After experiencing this limitation in industry, I decided to address this problem.
  4. There are existing solutions to build a prediction model for unlabeled datasets. The first solution is cross-project defect prediction. We can reuse labeled instances from other projects.
  5. Various feature selection approaches can be applied
  6. By doing that, we can investigate how higher matching scores can impact defect prediction performance.
  7. 16 distributional characteristics: mode, median, mean, harmonic mean, minimum, maximum, range, variation ratio, first quartile, third quartile, interquartile range, variance, standard deviation, coefficient of variation, skewness, and kurtosis
  8. AEEEM: object-oriented (OO) metrics, previous-defect metrics, entropy metrics of change and code, and churn-of-source-code metrics [4]. MORPH: McCabe's cyclomatic metrics, CK metrics, and other OO metrics [36]. ReLink: code complexity metrics. NASA: Halstead metrics and McCabe's cyclomatic metrics, plus additional complexity metrics such as parameter count and percentage of comments. SOFTLAB: Halstead metrics and McCabe's cyclomatic metrics.
  9. all 222 prediction combinations among 600 predictions