SlideShare a Scribd company logo
MIP - Cross-Project Defect Prediction
Models: l'Union Fait la Force
Annibale Panichella Rocco Oliveto Andrea De Lucia
1
SANER 2014 (Prev. WCRE)
2
SANER 2014 (Prev. WCRE)
3
Cross-Project Defect Prediction
4
File LOC LCOM DIT … Bug
F1 76 0 2 …
F2 101 1 12 …
… … … … … …
Project A
(Source)
AI Model
|——Software Metrics——|
Project B
(Target)
File LOC LCOM DIT … Bug
F1 176 22 1 … ?
F2 433 144 4 … ?
… … … … … ?
|———Same Metrics———|
Training Test
Friedman test + Nemenyi
Liem and Panichella, Arxiv 2020
The Master Predictor?
5
“One Ring to rule them all”
Some models seem to perform better
than others
Differences among libraries/
implementations
A Deeper Analysis
6
Bayes
Net
AUC = 0.56
Logistic AUC = 0.49
If (dit<10)
If (loc > 5)
1
0
…
True False
True False
Decision
Tree
AUC = 0.57
4
BN
DT
Log
8
9
1
38 10
0
Number of true positives
(identi
fi
ed bugs) for Ant v1.7
L’Union Fait La Force
(Unity Is Strength)
7
Inspired by Peer-Reviewing
8
Manuscript
Reviewer1 Reviewer2 Reviewer3
Strong
Reject
Strong
Accept
Strong
Accept
ACCEPT
Discussion
File
If (dit<10)
If (loc > 5)
1
0
…
True False
True False
Model 1 Model 2 Model 3
Combined Model
(CODEP)
Final Decision
Inspired by Peer-Reviewing
9
File
If (dit<10)
If (loc > 5)
1
0
…
True False
True False
Model 1 Model 2 Model 3
Combined Model
(CODEP)
Final Decision
Base models are diverse (diversity is
the key like in peer-reviewing)
Combination can be done with any
base/meta model (e.g., Naive Bayes Net)
CODEP takes as inputs the predicted
scores (con
fi
dence values) and not the
predicted labels (yes/no) of the base models
Replicating the Study 10 Years Later
10
A quick demo…
Replicating the Study 10 Years Later
11
Over multiple runs (random K-folds),
CODEP shows statistically better
performance than the base model
Some models have larger results
variance (less stable) than others
AU
C
The Impact in the Community
12
“Some classi
fi
ers are more consistent in
predicting defects than others.”
Bowes et al., PROMISE 2015
“Despite similar predictive performance values
for these four classi
fi
ers, each detects different
sets of defects.”
Similar results also showed by
• Bowes et al., in SQJ 2017
• Chen et al., JSEP 2019
13
Other combination strategies:
• Ensemble (Majority voting)
• Ensemble Stacking (Different ML models)
• Classi
fi
er selection (Adaptive per target project)
The Impact in the Community
0
3
6
9
12
2016 2017 2018 2019 2020 2021 2022
# Follow-up papers on
Ensemble methods
(Directly citing CODEP)
150+ Papers on
Google scholar
on Ensemble and
combination
methods for
Defect Prediction
Majority voting is the best…
Majority voting is weak…
Bootstrap aggregation is the
best ever…
Voting average is the top-one
14
Further studies on the role of model diversity on combination performances
• Petric et al., ESEM 2016 “Diversity […] is essential"
• Chen et al., IEEETransactions or Reliability, 2023
• Sharma et al., JSS 2023
• …
Combination of different ML models for federated learning (DSA 2023)
Beyond defect prediction:
• Malware detection
• Effort estimation
• GitHub PR prediction
• …
The Impact in the Community
What About Current Trends in SE?
15
The Hot-Topic Waves
16
Standard
ML
Time
Research
Interests
Deep
Learning
LLMs
Random
Forest
Every 2-4 years (wave) we jump on new
techniques hoping to
fi
nd the “master ring”
We often forget the free-lunch theorem
in optimization and machine learning
Even a stopped clock is right twice in a day
(simple baseline are stronger than we think)
We should focus on understanding when to
use X orY rather than focusing on X vs.Y
(l’union fait la force)
MIP - Cross-Project Defect Prediction
Models: l'Union Fait la Force
Annibale Panichella Rocco Oliveto Andrea De Lucia
17

More Related Content

Similar to MIP Award presentation at the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2024

Contrasting Offline and Online Results when Evaluating Recommendation Algorithms
Contrasting Offline and Online Results when Evaluating Recommendation AlgorithmsContrasting Offline and Online Results when Evaluating Recommendation Algorithms
Contrasting Offline and Online Results when Evaluating Recommendation Algorithms
Marco Rossetti
 
From Bugs to Decision Support - Selected Research Highlights
From Bugs to Decision Support - Selected Research HighlightsFrom Bugs to Decision Support - Selected Research Highlights
From Bugs to Decision Support - Selected Research Highlights
Markus Borg
 
Benchmarking machine learning techniques
Benchmarking machine learning techniquesBenchmarking machine learning techniques
Benchmarking machine learning techniques
ijseajournal
 
Master’s Thesis.pptx
Master’s Thesis.pptxMaster’s Thesis.pptx
Master’s Thesis.pptx
usmimukherjee18
 
Cukic Promise08 V3
Cukic Promise08 V3Cukic Promise08 V3
Cukic Promise08 V3
gregoryg
 
A Hybrid Approach to Expert and Model Based Effort Estimation
A Hybrid Approach to Expert and Model Based Effort Estimation  A Hybrid Approach to Expert and Model Based Effort Estimation
A Hybrid Approach to Expert and Model Based Effort Estimation
CS, NcState
 
Predicting More from Less: Synergies of Learning
Predicting More from Less: Synergies of LearningPredicting More from Less: Synergies of Learning
Predicting More from Less: Synergies of Learning
CS, NcState
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
Tao Xie
 
Software Engineering Research: Leading a Double-Agent Life.
Software Engineering Research: Leading a Double-Agent Life.Software Engineering Research: Leading a Double-Agent Life.
Software Engineering Research: Leading a Double-Agent Life.
Lionel Briand
 
Software Design Patterns Lecutre (Intro)
Software Design Patterns Lecutre (Intro)Software Design Patterns Lecutre (Intro)
Software Design Patterns Lecutre (Intro)
VikramJothyPrakash1
 
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
DataScienceConferenc1
 
Technical report jada
Technical report jadaTechnical report jada
Technical report jada
University of Bari (Italy)
 
Promise 2011: "A Principled Evaluation of Ensembles of Learning Machines for ...
Promise 2011: "A Principled Evaluation of Ensembles of Learning Machines for ...Promise 2011: "A Principled Evaluation of Ensembles of Learning Machines for ...
Promise 2011: "A Principled Evaluation of Ensembles of Learning Machines for ...
CS, NcState
 
Towards a Better Understanding of the Impact of Experimental Components on De...
Towards a Better Understanding of the Impact of Experimental Components on De...Towards a Better Understanding of the Impact of Experimental Components on De...
Towards a Better Understanding of the Impact of Experimental Components on De...
Chakkrit (Kla) Tantithamthavorn
 
Abstract.doc
Abstract.docAbstract.doc
Abstract.doc
butest
 
Final Survey On Rule Base Development Slideshare
Final Survey On Rule Base Development SlideshareFinal Survey On Rule Base Development Slideshare
Final Survey On Rule Base Development Slideshare
Valentin Zacharias
 
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
ACM Chicago
 
Intelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software EngineeringIntelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software Engineering
Tao Xie
 
Bits of Evidence
Bits of EvidenceBits of Evidence
Bits of Evidence
Greg Wilson
 
Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...
csandit
 

Similar to MIP Award presentation at the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2024 (20)

Contrasting Offline and Online Results when Evaluating Recommendation Algorithms
Contrasting Offline and Online Results when Evaluating Recommendation AlgorithmsContrasting Offline and Online Results when Evaluating Recommendation Algorithms
Contrasting Offline and Online Results when Evaluating Recommendation Algorithms
 
From Bugs to Decision Support - Selected Research Highlights
From Bugs to Decision Support - Selected Research HighlightsFrom Bugs to Decision Support - Selected Research Highlights
From Bugs to Decision Support - Selected Research Highlights
 
Benchmarking machine learning techniques
Benchmarking machine learning techniquesBenchmarking machine learning techniques
Benchmarking machine learning techniques
 
Master’s Thesis.pptx
Master’s Thesis.pptxMaster’s Thesis.pptx
Master’s Thesis.pptx
 
Cukic Promise08 V3
Cukic Promise08 V3Cukic Promise08 V3
Cukic Promise08 V3
 
A Hybrid Approach to Expert and Model Based Effort Estimation
A Hybrid Approach to Expert and Model Based Effort Estimation  A Hybrid Approach to Expert and Model Based Effort Estimation
A Hybrid Approach to Expert and Model Based Effort Estimation
 
Predicting More from Less: Synergies of Learning
Predicting More from Less: Synergies of LearningPredicting More from Less: Synergies of Learning
Predicting More from Less: Synergies of Learning
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
 
Software Engineering Research: Leading a Double-Agent Life.
Software Engineering Research: Leading a Double-Agent Life.Software Engineering Research: Leading a Double-Agent Life.
Software Engineering Research: Leading a Double-Agent Life.
 
Software Design Patterns Lecutre (Intro)
Software Design Patterns Lecutre (Intro)Software Design Patterns Lecutre (Intro)
Software Design Patterns Lecutre (Intro)
 
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
 
Technical report jada
Technical report jadaTechnical report jada
Technical report jada
 
Promise 2011: "A Principled Evaluation of Ensembles of Learning Machines for ...
Promise 2011: "A Principled Evaluation of Ensembles of Learning Machines for ...Promise 2011: "A Principled Evaluation of Ensembles of Learning Machines for ...
Promise 2011: "A Principled Evaluation of Ensembles of Learning Machines for ...
 
Towards a Better Understanding of the Impact of Experimental Components on De...
Towards a Better Understanding of the Impact of Experimental Components on De...Towards a Better Understanding of the Impact of Experimental Components on De...
Towards a Better Understanding of the Impact of Experimental Components on De...
 
Abstract.doc
Abstract.docAbstract.doc
Abstract.doc
 
Final Survey On Rule Base Development Slideshare
Final Survey On Rule Base Development SlideshareFinal Survey On Rule Base Development Slideshare
Final Survey On Rule Base Development Slideshare
 
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
 
Intelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software EngineeringIntelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software Engineering
 
Bits of Evidence
Bits of EvidenceBits of Evidence
Bits of Evidence
 
Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...Comparative Performance Analysis of Machine Learning Techniques for Software ...
Comparative Performance Analysis of Machine Learning Techniques for Software ...
 

More from Annibale Panichella

Breaking the Silence: the Threats of Using LLMs in Software Engineering
Breaking the Silence: the Threats of Using LLMs in Software EngineeringBreaking the Silence: the Threats of Using LLMs in Software Engineering
Breaking the Silence: the Threats of Using LLMs in Software Engineering
Annibale Panichella
 
Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...
Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...
Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...
Annibale Panichella
 
A Fast Multi-objective Evolutionary Approach for Designing Large-Scale Optica...
A Fast Multi-objective Evolutionary Approach for Designing Large-Scale Optica...A Fast Multi-objective Evolutionary Approach for Designing Large-Scale Optica...
A Fast Multi-objective Evolutionary Approach for Designing Large-Scale Optica...
Annibale Panichella
 
An Improved Pareto Front Modeling Algorithm for Large-scale Many-Objective Op...
An Improved Pareto Front Modeling Algorithm for Large-scale Many-Objective Op...An Improved Pareto Front Modeling Algorithm for Large-scale Many-Objective Op...
An Improved Pareto Front Modeling Algorithm for Large-scale Many-Objective Op...
Annibale Panichella
 
VST2022.pdf
VST2022.pdfVST2022.pdf
VST2022.pdf
Annibale Panichella
 
IPA Fall Days 2019
 IPA Fall Days 2019 IPA Fall Days 2019
IPA Fall Days 2019
Annibale Panichella
 
An Adaptive Evolutionary Algorithm based on Non-Euclidean Geometry for Many-O...
An Adaptive Evolutionary Algorithm based on Non-Euclidean Geometry for Many-O...An Adaptive Evolutionary Algorithm based on Non-Euclidean Geometry for Many-O...
An Adaptive Evolutionary Algorithm based on Non-Euclidean Geometry for Many-O...
Annibale Panichella
 
Speeding-up Software Testing With Computational Intelligence
Speeding-up Software Testing With Computational IntelligenceSpeeding-up Software Testing With Computational Intelligence
Speeding-up Software Testing With Computational Intelligence
Annibale Panichella
 
Incremental Control Dependency Frontier Exploration for Many-Criteria Test C...
Incremental Control Dependency Frontier Exploration for Many-Criteria  Test C...Incremental Control Dependency Frontier Exploration for Many-Criteria  Test C...
Incremental Control Dependency Frontier Exploration for Many-Criteria Test C...
Annibale Panichella
 
Sbst2018 contest2018
Sbst2018 contest2018Sbst2018 contest2018
Sbst2018 contest2018
Annibale Panichella
 
Java Unit Testing Tool Competition — Fifth Round
Java Unit Testing Tool Competition — Fifth RoundJava Unit Testing Tool Competition — Fifth Round
Java Unit Testing Tool Competition — Fifth Round
Annibale Panichella
 
ICSE 2017 - Evocrash
ICSE 2017 - EvocrashICSE 2017 - Evocrash
ICSE 2017 - Evocrash
Annibale Panichella
 
Evolutionary Testing for Crash Reproduction
Evolutionary Testing for Crash ReproductionEvolutionary Testing for Crash Reproduction
Evolutionary Testing for Crash Reproduction
Annibale Panichella
 
Parameterizing and Assembling IR-based Solutions for SE Tasks using Genetic A...
Parameterizing and Assembling IR-based Solutions for SE Tasks using Genetic A...Parameterizing and Assembling IR-based Solutions for SE Tasks using Genetic A...
Parameterizing and Assembling IR-based Solutions for SE Tasks using Genetic A...
Annibale Panichella
 
Security Threat Identification and Testing
Security Threat Identification and TestingSecurity Threat Identification and Testing
Security Threat Identification and Testing
Annibale Panichella
 
Reformulating Branch Coverage as a Many-Objective Optimization Problem
Reformulating Branch Coverage as a Many-Objective Optimization ProblemReformulating Branch Coverage as a Many-Objective Optimization Problem
Reformulating Branch Coverage as a Many-Objective Optimization Problem
Annibale Panichella
 
Results for EvoSuite-MOSA at the Third Unit Testing Tool Competition
Results for EvoSuite-MOSA at the Third Unit Testing Tool CompetitionResults for EvoSuite-MOSA at the Third Unit Testing Tool Competition
Results for EvoSuite-MOSA at the Third Unit Testing Tool Competition
Annibale Panichella
 
Adaptive User Feedback for IR-based Traceability Recovery
Adaptive User Feedback for IR-based Traceability RecoveryAdaptive User Feedback for IR-based Traceability Recovery
Adaptive User Feedback for IR-based Traceability Recovery
Annibale Panichella
 
Diversity mechanisms for evolutionary populations in Search-Based Software En...
Diversity mechanisms for evolutionary populations in Search-Based Software En...Diversity mechanisms for evolutionary populations in Search-Based Software En...
Diversity mechanisms for evolutionary populations in Search-Based Software En...
Annibale Panichella
 
Estimating the Evolution Direction of Populations to Improve Genetic Algorithms
Estimating the Evolution Direction of Populations to Improve Genetic AlgorithmsEstimating the Evolution Direction of Populations to Improve Genetic Algorithms
Estimating the Evolution Direction of Populations to Improve Genetic Algorithms
Annibale Panichella
 

More from Annibale Panichella (20)

Breaking the Silence: the Threats of Using LLMs in Software Engineering
Breaking the Silence: the Threats of Using LLMs in Software EngineeringBreaking the Silence: the Threats of Using LLMs in Software Engineering
Breaking the Silence: the Threats of Using LLMs in Software Engineering
 
Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...
Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...
Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...
 
A Fast Multi-objective Evolutionary Approach for Designing Large-Scale Optica...
A Fast Multi-objective Evolutionary Approach for Designing Large-Scale Optica...A Fast Multi-objective Evolutionary Approach for Designing Large-Scale Optica...
A Fast Multi-objective Evolutionary Approach for Designing Large-Scale Optica...
 
An Improved Pareto Front Modeling Algorithm for Large-scale Many-Objective Op...
An Improved Pareto Front Modeling Algorithm for Large-scale Many-Objective Op...An Improved Pareto Front Modeling Algorithm for Large-scale Many-Objective Op...
An Improved Pareto Front Modeling Algorithm for Large-scale Many-Objective Op...
 
VST2022.pdf
VST2022.pdfVST2022.pdf
VST2022.pdf
 
IPA Fall Days 2019
 IPA Fall Days 2019 IPA Fall Days 2019
IPA Fall Days 2019
 
An Adaptive Evolutionary Algorithm based on Non-Euclidean Geometry for Many-O...
An Adaptive Evolutionary Algorithm based on Non-Euclidean Geometry for Many-O...An Adaptive Evolutionary Algorithm based on Non-Euclidean Geometry for Many-O...
An Adaptive Evolutionary Algorithm based on Non-Euclidean Geometry for Many-O...
 
Speeding-up Software Testing With Computational Intelligence
Speeding-up Software Testing With Computational IntelligenceSpeeding-up Software Testing With Computational Intelligence
Speeding-up Software Testing With Computational Intelligence
 
Incremental Control Dependency Frontier Exploration for Many-Criteria Test C...
Incremental Control Dependency Frontier Exploration for Many-Criteria  Test C...Incremental Control Dependency Frontier Exploration for Many-Criteria  Test C...
Incremental Control Dependency Frontier Exploration for Many-Criteria Test C...
 
Sbst2018 contest2018
Sbst2018 contest2018Sbst2018 contest2018
Sbst2018 contest2018
 
Java Unit Testing Tool Competition — Fifth Round
Java Unit Testing Tool Competition — Fifth RoundJava Unit Testing Tool Competition — Fifth Round
Java Unit Testing Tool Competition — Fifth Round
 
ICSE 2017 - Evocrash
ICSE 2017 - EvocrashICSE 2017 - Evocrash
ICSE 2017 - Evocrash
 
Evolutionary Testing for Crash Reproduction
Evolutionary Testing for Crash ReproductionEvolutionary Testing for Crash Reproduction
Evolutionary Testing for Crash Reproduction
 
Parameterizing and Assembling IR-based Solutions for SE Tasks using Genetic A...
Parameterizing and Assembling IR-based Solutions for SE Tasks using Genetic A...Parameterizing and Assembling IR-based Solutions for SE Tasks using Genetic A...
Parameterizing and Assembling IR-based Solutions for SE Tasks using Genetic A...
 
Security Threat Identification and Testing
Security Threat Identification and TestingSecurity Threat Identification and Testing
Security Threat Identification and Testing
 
Reformulating Branch Coverage as a Many-Objective Optimization Problem
Reformulating Branch Coverage as a Many-Objective Optimization ProblemReformulating Branch Coverage as a Many-Objective Optimization Problem
Reformulating Branch Coverage as a Many-Objective Optimization Problem
 
Results for EvoSuite-MOSA at the Third Unit Testing Tool Competition
Results for EvoSuite-MOSA at the Third Unit Testing Tool CompetitionResults for EvoSuite-MOSA at the Third Unit Testing Tool Competition
Results for EvoSuite-MOSA at the Third Unit Testing Tool Competition
 
Adaptive User Feedback for IR-based Traceability Recovery
Adaptive User Feedback for IR-based Traceability RecoveryAdaptive User Feedback for IR-based Traceability Recovery
Adaptive User Feedback for IR-based Traceability Recovery
 
Diversity mechanisms for evolutionary populations in Search-Based Software En...
Diversity mechanisms for evolutionary populations in Search-Based Software En...Diversity mechanisms for evolutionary populations in Search-Based Software En...
Diversity mechanisms for evolutionary populations in Search-Based Software En...
 
Estimating the Evolution Direction of Populations to Improve Genetic Algorithms
Estimating the Evolution Direction of Populations to Improve Genetic AlgorithmsEstimating the Evolution Direction of Populations to Improve Genetic Algorithms
Estimating the Evolution Direction of Populations to Improve Genetic Algorithms
 

Recently uploaded

Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
Wasswaderrick3
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
Texas Alliance of Groundwater Districts
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
RASHMI M G
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
fafyfskhan251kmf
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
PRIYANKA PATEL
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills MN
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
Abdul Wali Khan University Mardan,kP,Pakistan
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 

Recently uploaded (20)

Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 

MIP Award presentation at the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2024

  • 1. MIP - Cross-Project Defect Prediction Models: l'Union Fait la Force Annibale Panichella Rocco Oliveto Andrea De Lucia 1
  • 4. Cross-Project Defect Prediction 4 File LOC LCOM DIT … Bug F1 76 0 2 … F2 101 1 12 … … … … … … … Project A (Source) AI Model |——Software Metrics——| Project B (Target) File LOC LCOM DIT … Bug F1 176 22 1 … ? F2 433 144 4 … ? … … … … … ? |———Same Metrics———| Training Test
  • 5. Friedman test + Nemenyi Liem and Panichella, Arxiv 2020 The Master Predictor? 5 “One Ring to rule them all” Some models seem to perform better than others Differences among libraries/ implementations
  • 6. A Deeper Analysis 6 Bayes Net AUC = 0.56 Logistic AUC = 0.49 If (dit<10) If (loc > 5) 1 0 … True False True False Decision Tree AUC = 0.57 4 BN DT Log 8 9 1 38 10 0 Number of true positives (identi fi ed bugs) for Ant v1.7
  • 7. L’Union Fait La Force (Unity Is Strength) 7
  • 8. Inspired by Peer-Reviewing 8 Manuscript Reviewer1 Reviewer2 Reviewer3 Strong Reject Strong Accept Strong Accept ACCEPT Discussion File If (dit<10) If (loc > 5) 1 0 … True False True False Model 1 Model 2 Model 3 Combined Model (CODEP) Final Decision
  • 9. Inspired by Peer-Reviewing 9 File If (dit<10) If (loc > 5) 1 0 … True False True False Model 1 Model 2 Model 3 Combined Model (CODEP) Final Decision Base models are diverse (diversity is the key like in peer-reviewing) Combination can be done with any base/meta model (e.g., Naive Bayes Net) CODEP takes as inputs the predicted scores (con fi dence values) and not the predicted labels (yes/no) of the base models
  • 10. Replicating the Study 10 Years Later 10 A quick demo…
  • 11. Replicating the Study 10 Years Later 11 Over multiple runs (random K-folds), CODEP shows statistically better performance than the base model Some models have larger results variance (less stable) than others AU C
  • 12. The Impact in the Community 12 “Some classi fi ers are more consistent in predicting defects than others.” Bowes et al., PROMISE 2015 “Despite similar predictive performance values for these four classi fi ers, each detects different sets of defects.” Similar results also showed by • Bowes et al., in SQJ 2017 • Chen et al., JSEP 2019
  • 13. 13 Other combination strategies: • Ensemble (Majority voting) • Ensemble Stacking (Different ML models) • Classi fi er selection (Adaptive per target project) The Impact in the Community 0 3 6 9 12 2016 2017 2018 2019 2020 2021 2022 # Follow-up papers on Ensemble methods (Directly citing CODEP) 150+ Papers on Google scholar on Ensemble and combination methods for Defect Prediction Majority voting is the best… Majority voting is weak… Bootstrap aggregation is the best ever… Voting average is the top-one
  • 14. 14 Further studies on the role of model diversity on combination performances • Petric et al., ESEM 2016 “Diversity […] is essential" • Chen et al., IEEETransactions or Reliability, 2023 • Sharma et al., JSS 2023 • … Combination of different ML models for federated learning (DSA 2023) Beyond defect prediction: • Malware detection • Effort estimation • GitHub PR prediction • … The Impact in the Community
  • 15. What About Current Trends in SE? 15
  • 16. The Hot-Topic Waves 16 Standard ML Time Research Interests Deep Learning LLMs Random Forest Every 2-4 years (wave) we jump on new techniques hoping to fi nd the “master ring” We often forget the free-lunch theorem in optimization and machine learning Even a stopped clock is right twice in a day (simple baseline are stronger than we think) We should focus on understanding when to use X orY rather than focusing on X vs.Y (l’union fait la force)
  • 17. MIP - Cross-Project Defect Prediction Models: l'Union Fait la Force Annibale Panichella Rocco Oliveto Andrea De Lucia 17