Predicting Defects Using Change Genealogies (ISSRE 2013)
Kim Herzig*, Sascha Just†, Andreas Rau†, Andreas Zeller†
* Microsoft Research, UK
† Saarland University, Germany
Prediction Models
• Goal: determine the likelihood of bugs in code entities
  - Quality assurance is limited by time and money.
  - Can be helpful for project outsiders.
• Trained on “ground truth”
  - Known instances and their properties.
  - Idea: learn from the past to predict the future.
• Predicting / estimating the defect likelihood of new, unknown code entities
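The train-then-predict idea can be sketched with a tiny logistic regression in plain Python. This is a toy illustration under stated assumptions, not the learners used in the study: the single metric per file, its values, and the labels are hypothetical, and the returned value is an estimated defect likelihood.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights: Sequence[float], x: Sequence[float]) -> float:
    """Estimated defect likelihood; weights[0] is the bias term."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return sigmoid(z)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 5000) -> List[float]:
    """Fit weights on known ("ground truth") entities by batch gradient descent on log-loss."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            err = predict_proba(weights, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


# Hypothetical training data: one normalized metric per known file, label 1 = had a defect.
known_metrics = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
known_defective = [0, 0, 0, 1, 1, 1]
model = train_logistic(known_metrics, known_defective)

# Estimate the defect likelihood of new, unseen files.
for value in (0.05, 0.95):
    print(value, round(predict_proba(model, [value]), 3))
```

In practice one would swap in a library learner and many metrics; the point here is only that the model is fitted on past, labeled entities and then applied to new ones.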
Fine-Tuning Prediction Models
• Machine learner
• Training methods
• Metrics (independent variables)
• Prediction target
(Social) Network Metrics
• Some participants are more active and central than others.
• Are these participants also more crucial?
Code Network Metrics

[2008] Zimmermann and Nagappan: “Predicting Defects using Network Analysis on Dependency Graphs”

• Code entities communicate with each other.
• Use the call graph network to compute network metrics.
• Call graphs do not change significantly over time!
• Assumption: “Central binaries tend to be defect-prone”.
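As an illustration of computing network metrics on a dependency graph, the sketch below derives two standard centrality measures (degree and closeness) from a call graph treated as undirected. The binary names and call edges are hypothetical, and Zimmermann and Nagappan use a much larger set of ego and global metrics; these two only show the flavor.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_undirected(calls: Iterable[Tuple[str, str]]) -> Graph:
    """Turn (caller, callee) pairs into a symmetric adjacency map."""
    graph: Graph = {}
    for caller, callee in calls:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set()).add(caller)
    return graph


def degree_centrality(graph: Graph) -> Dict[str, float]:
    """Share of the other nodes each node is directly connected to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def bfs_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(graph: Graph) -> Dict[str, float]:
    """Reachable nodes divided by the sum of distances to them (higher = more central)."""
    result = {}
    for v in graph:
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result


calls = [("a.dll", "b.dll"), ("a.dll", "c.dll"), ("b.dll", "d.dll"),
         ("c.dll", "d.dll"), ("d.dll", "e.dll")]
graph = build_undirected(calls)
closeness = closeness_centrality(graph)
print(max(closeness, key=closeness.get))  # d.dll
```

For disconnected graphs this simple variant only considers reachable nodes; libraries such as NetworkX apply an additional normalization in that case.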
Change Network Metrics
Idea: Use dependencies between code changes.
• Code changes depend on each other.
• Central code changes tend to be crucial.
Change Genealogies
• Assumption: “Code being crucially changed tends to be defect-prone”.
Change Genealogies (in a nutshell)
[2013] Kim Herzig: “Mining and Untangling Change Genealogies” (PhD thesis)

• Directed graph structure
• Method-level dependencies
• Multi-dimensional (space & time)
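A change genealogy can be approximated as a directed graph whose vertices are changes and whose edges point from an earlier change to a later change that depends on it. The sketch assumes one simplified dependency rule (a later change adds a call to a method that an earlier change defined); the thesis distinguishes more dependency types, and the `Change` record, file names, and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Change:
    cid: str
    timestamp: int                                     # time dimension
    files: Set[str] = field(default_factory=set)       # space dimension
    defines: Set[str] = field(default_factory=set)     # methods added by this change
    calls: Set[str] = field(default_factory=set)       # method calls added by this change


def build_genealogy(changes: List[Change]) -> Dict[str, Set[str]]:
    """Map each change to the later changes that depend on it (parent -> children)."""
    ordered = sorted(changes, key=lambda c: c.timestamp)
    latest_definer: Dict[str, str] = {}
    children: Dict[str, Set[str]] = {c.cid: set() for c in ordered}
    for change in ordered:
        # Resolve calls before registering this change's own definitions,
        # so a change never becomes its own parent.
        for method in change.calls:
            parent = latest_definer.get(method)
            if parent is not None:
                children[parent].add(change.cid)
        for method in change.defines:
            latest_definer[method] = change.cid
    return children


history = [
    Change("c1", 1, {"Parser.java"}, defines={"Parser.parse"}),
    Change("c2", 2, {"Lexer.java"}, defines={"Lexer.next"}, calls={"Parser.parse"}),
    Change("c3", 3, {"Main.java"}, calls={"Parser.parse", "Lexer.next"}),
]
genealogy = build_genealogy(history)
print(genealogy)
```

Here `c1` has two children (`c2`, `c3`) and `c2` has one (`c3`), giving the directed, time-ordered structure the slide describes.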
Change Genealogy Metrics
• EGO network metrics
  - Measure the immediate impact of changes on other changes.
• GLOBAL network metrics
  - Express the long-term impact of changes on other changes.
• Considering the type of change
  - e.g. adding a method definition, modifying a method call.
• Considering parent age
  - How old are the parent changes that a change depends on?
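The metric families above can be illustrated on a genealogy stored as typed parent/child edges. This sketch is a simplification under assumptions, not the paper's metric set: it uses the number of parents and children as ego-style metrics, the number of transitive descendants as a global (long-term impact) metric, counts of incoming dependency types, and the mean parent age. The dependency type labels and timestamps are hypothetical.

```python
from collections import Counter, deque
from statistics import mean
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (parent change, child change, dependency type)


def genealogy_metrics(edges: List[Edge], timestamps: Dict[str, int]) -> Dict[str, dict]:
    children: Dict[str, Set[str]] = {c: set() for c in timestamps}
    parents: Dict[str, Set[str]] = {c: set() for c in timestamps}
    dep_types: Dict[str, Counter] = {c: Counter() for c in timestamps}
    for parent, child, dep_type in edges:
        children[parent].add(child)
        parents[child].add(parent)
        dep_types[child][dep_type] += 1

    def descendants(start: str) -> Set[str]:
        """All changes that transitively depend on `start`."""
        seen: Set[str] = set()
        queue = deque(children[start])
        while queue:
            v = queue.popleft()
            if v not in seen:
                seen.add(v)
                queue.extend(children[v])
        return seen

    metrics = {}
    for change, time in timestamps.items():
        ages = [time - timestamps[p] for p in parents[change]]
        metrics[change] = {
            "num_parents": len(parents[change]),             # ego: what it builds on
            "num_children": len(children[change]),           # ego: immediate impact
            "num_descendants": len(descendants(change)),     # global: long-term impact
            "dep_types": dict(dep_types[change]),            # kinds of incoming dependencies
            "mean_parent_age": mean(ages) if ages else 0.0,  # how old its parents are
        }
    return metrics


timestamps = {"c1": 1, "c2": 5, "c3": 9}
edges = [("c1", "c2", "add_call"), ("c1", "c3", "add_call"), ("c2", "c3", "modify_call")]
m = genealogy_metrics(edges, timestamps)
print(m["c3"])
```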

Change genealogy metrics must be aggregated to the source file level.
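Because predictions are made per source file while genealogy metrics are computed per change, each file needs one value per metric. A minimal sketch, assuming sum, maximum, and mean as aggregation functions (the slides do not say which ones the study used) and a hypothetical change-to-file mapping:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Set


def aggregate_to_files(change_metric: Dict[str, float],
                       touched_files: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Collect a per-change metric for every file the change touched, then summarize."""
    values: Dict[str, List[float]] = defaultdict(list)
    for change, files in touched_files.items():
        for path in files:
            values[path].append(change_metric[change])
    return {path: {"sum": sum(v), "max": max(v), "mean": mean(v)}
            for path, v in values.items()}


num_children = {"c1": 2.0, "c2": 1.0, "c3": 0.0}
touched = {"c1": {"Parser.java"}, "c2": {"Parser.java", "Lexer.java"}, "c3": {"Lexer.java"}}
result = aggregate_to_files(num_children, touched)
print(result["Parser.java"])  # {'sum': 3.0, 'max': 2.0, 'mean': 1.5}
```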
Experimental Setup
Comparing change genealogies against:
• Code complexity models (e.g. McCabe)
• Code dependency models (Zimmermann & Nagappan)
• Combined network models (change genealogy & code dependency network metrics)
Experimental Setup

• Study subjects
• Multiple machine learners
Prediction Precision

[Chart: prediction results for NM & CGM (combined), change genealogy metrics, code dependency network metrics (Zimmermann & Nagappan), and code complexity metrics]
• Confirmed: network metrics outperform complexity metrics.
• Change genealogy models report fewer false positives (higher precision).
• Change genealogy models report slightly more false negatives (lower recall).
• Combining network metrics: good recall but worse precision.
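The precision/recall trade-off above follows directly from the confusion counts. The sketch uses made-up predictions for two hypothetical models (not results from the study): a conservative one that flags fewer files and a liberal one that flags more.

```python
from typing import Sequence, Tuple


def precision_recall(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)   # false alarms lower precision
    fn = sum(1 for p, a in pairs if not p and a)   # missed defects lower recall
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


actual = [True, True, True, True, False, False, False, False]
conservative = [True, True, True, False, False, False, False, False]
liberal = [True, True, True, True, True, True, False, False]
print(precision_recall(conservative, actual))  # (1.0, 0.75)
print(precision_recall(liberal, actual))
```

The conservative model has no false positives but misses one defective file; the liberal one finds every defective file at the cost of two false alarms.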
Influential Metrics
• Network efficiency is among the top 10 most influential metrics.
• Metrics on the relationship between changes and the type of dependency are the top 2 metrics (for all projects).
• The higher the number of old parents, the higher the probability of introducing bugs.
  - Code entities combining multiple older functionalities are more defect-prone.
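The slides do not define network efficiency. Assuming it is the structural-holes efficiency from ego-network analysis (effective size divided by the number of alters), it can be computed as below; treating the genealogy as undirected is a further simplifying assumption, and the graph is hypothetical. A low value means the neighbors of a change are tightly interconnected (redundant); a high value means they are mostly unconnected.

```python
from itertools import combinations
from typing import Dict, Set


def ego_efficiency(graph: Dict[str, Set[str]], ego: str) -> float:
    """Burt's efficiency: effective size of the ego network divided by the number of alters.

    For an undirected, unweighted graph, effective size = n - 2t/n,
    where n is the number of alters and t the number of ties among them.
    """
    alters = graph[ego] - {ego}
    n = len(alters)
    if n == 0:
        return 0.0
    ties = sum(1 for a, b in combinations(alters, 2) if b in graph[a])
    return (n - 2 * ties / n) / n


# Hypothetical undirected view of a genealogy around change "x".
graph = {
    "x": {"a", "b", "c", "d"},
    "a": {"x", "b"},
    "b": {"x", "a"},
    "c": {"x"},
    "d": {"x"},
}
print(ego_efficiency(graph, "x"))  # 0.875
```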
Summary

• Adapting social network metrics to change dependency graphs.
• Comparing prediction models.
• Change genealogies are well suited for defect prediction (better precision, comparable recall).
• Code entities combining multiple older functionalities are more defect-prone.

 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 

Predicting Defects Using Change Genealogies (ISSRE 2013)

  • 1. Predicting Defects Using Change Genealogies
    Kim Herzig*, Sascha Just†, Andreas Rau†, Andreas Zeller†
    * Microsoft Research, UK; † Saarland University, Germany
  • 2. Prediction Models
    • Goal: determine the likelihood of bugs in code entities.
      - Quality assurance is limited by time and money.
      - Can be helpful for project outsiders.
    • Trained on “ground truth”:
      - Known instances and their properties.
      - Idea: learn from the past to predict the future.
    • Predicting / estimating the defect likelihood of new, unknown code entities.
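The "learn from the past, predict the future" idea on slide 2 can be illustrated with a deliberately tiny learner. This is a minimal sketch and not the paper's setup. It assumes a single hypothetical per-file metric and uses a one-threshold "decision stump" in place of a real machine learner. The names `train_stump` and `predict` and the training data are invented for illustration.

```python
from typing import List, Tuple


def train_stump(samples: List[Tuple[float, bool]]) -> float:
    """Pick the threshold that classifies the most known (training) files correctly.

    Hypothetical learner: a file is predicted defective when metric >= threshold.
    """
    candidates = sorted({value for value, _ in samples})
    best_threshold, best_correct = candidates[0], -1
    for threshold in candidates:
        correct = sum((value >= threshold) == defective for value, defective in samples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def predict(threshold: float, value: float) -> bool:
    """Apply the learned threshold to a new, unseen file."""
    return value >= threshold


# Hypothetical "ground truth": (metric value, was the file later found defective?)
history = [(1.0, False), (2.0, False), (3.0, False), (7.0, True), (9.0, True)]
threshold = train_stump(history)
print(threshold)                 # 7.0
print(predict(threshold, 8.5))   # True
print(predict(threshold, 2.5))   # False
```

A real study would use many metrics (the independent variables) and a proper learner with cross-validation. The structure stays the same: fit on labeled past instances, then score new ones.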
  • 3. Fine-Tuning Prediction Models
    • Machine learner
    • Training methods
    • Metrics (independent variables)
    • Prediction target
  • 4. (Social) Network Metrics  Some participants more active and central than others.  Are these participants also more crucial?
  • 5. Code Network Metrics
    [2008] Zimmermann and Nagappan: “Predicting Defects using Network Analysis on Dependency Graphs”
    • Code entities communicate with each other.
    • Call graphs do not change significantly over time!
    • Use the call graph network to compute network metrics.
    • Assumption: “Central binaries tend to be defect-prone”.
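Slide 5's idea of computing network metrics on a dependency graph can be sketched with degree centrality, one of the simplest centrality measures. Zimmermann and Nagappan's study uses a broader set of network measures. The binaries, edges, and the `degree_centrality` helper below are hypothetical, and edges are treated as undirected for simplicity.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph between binaries: caller -> callees.
calls: Dict[str, List[str]] = {
    "ui.dll": ["core.dll", "net.dll"],
    "net.dll": ["core.dll"],
    "tools.exe": ["core.dll"],
    "core.dll": [],
}


def degree_centrality(graph: Dict[str, List[str]]) -> Dict[str, float]:
    """Degree centrality ignoring edge direction: distinct neighbours / (n - 1)."""
    neighbours: Dict[str, Set[str]] = {}
    for caller, callees in graph.items():
        neighbours.setdefault(caller, set())
        for callee in callees:
            neighbours[caller].add(callee)
            neighbours.setdefault(callee, set()).add(caller)
    n = len(neighbours)  # assumes at least two binaries
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


centrality = degree_centrality(calls)
most_central = max(centrality, key=centrality.get)
print(most_central)  # core.dll
```

Under the slide's assumption, `core.dll` (connected to every other binary) would be the most defect-prone candidate.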
  • 6. Change Network Metrics
    Idea: use dependencies between code changes.
    • Code changes depend on each other.
    • Central code changes tend to be crucial.
    [Figure: Change Genealogies]
    • Assumption: “Code being crucially changed tends to be defect-prone”.
  • 7. Change Genealogies (in a nutshell)
    [2013] Kim Herzig: “Mining and Untangling Change Genealogies” (PhD thesis)
    • Directed graph structure
    • Method-level dependencies
    • Multi-dimensional (space & time)
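Slide 7 describes a change genealogy as a directed graph over changes, linked by method-level dependencies and spanning both space (code locations) and time. Below is a minimal sketch of such a structure under these assumptions:

* Vertices are change sets.
* An edge points from an earlier change to a later change that depends on it.
* Each edge carries a free-text dependency label.

The class names, fields, and labels are hypothetical and do not reproduce the thesis's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class ChangeSet:
    change_id: str
    timestamp: datetime  # time dimension
    files: List[str]     # space dimension: where the change happened


@dataclass
class Genealogy:
    changes: Dict[str, ChangeSet] = field(default_factory=dict)
    # change id -> [(dependent child id, dependency label)]
    children: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)
    # change id -> [(parent id it depends on, dependency label)]
    parents: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def add_change(self, change: ChangeSet) -> None:
        self.changes[change.change_id] = change
        self.children.setdefault(change.change_id, [])
        self.parents.setdefault(change.change_id, [])

    def add_dependency(self, parent_id: str, child_id: str, label: str) -> None:
        """Record that `child_id` depends on the earlier change `parent_id`."""
        if self.changes[parent_id].timestamp >= self.changes[child_id].timestamp:
            raise ValueError("a change can only depend on an earlier change")
        self.children[parent_id].append((child_id, label))
        self.parents[child_id].append((parent_id, label))


g = Genealogy()
g.add_change(ChangeSet("c1", datetime(2013, 1, 1), ["Parser.java"]))
g.add_change(ChangeSet("c2", datetime(2013, 1, 5), ["Main.java"]))
# Hypothetical: c1 added a method definition, c2 added a call to it.
g.add_dependency("c1", "c2", "added method definition -> added method call")
print(g.parents["c2"])  # [('c1', 'added method definition -> added method call')]
```

Because every edge points forward in time, the graph is acyclic. That ordering also makes a notion like "parent age" well defined.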
  • 8. Change Genealogy Metrics  EGO network metrics  Measures the immediate impact of changes on other changes.  GLOBAL network metrics  Express the long-term impact of changes on other changes.  Considering the type of the change  Adding method definition, modifying method call  Considering parent age  How old are the parent changes a change depends on. Change genealogy metrics must be aggregated to source file level.
  • 9. Experimental Setup
    Comparing change genealogies against:
    • Code complexity models (e.g. McCabe)
    • Code dependency models (Zimmermann & Nagappan)
    • Combined network models (change genealogy & code dependency network metrics)
  • 11. Prediction Precision
    [Figure: prediction precision per model; legend: NM & CGM, change genealogy metrics, code dependency network metrics (Zimmermann & Nagappan), code complexity metrics]
  • 12. Confirmed: network metrics outperform complexity metrics.
    • Change genealogy models report fewer false positives (higher precision).
    • Change genealogy models report slightly more false negatives (lower recall).
    • Combining network metrics: good recall but worse precision.
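Slide 12 trades precision against recall. Given the set of files a model flags as defect-prone and the set actually found defective, the two measures can be computed as below. The file names and the two "models" are hypothetical. They only illustrate how fewer false positives raise precision while more false negatives lower recall.

```python
from typing import Dict, Set


def precision_recall(predicted: Set[str], actual: Set[str]) -> Dict[str, float]:
    tp = len(predicted & actual)  # flagged and actually defective
    fp = len(predicted - actual)  # flagged but clean (false positives)
    fn = len(actual - predicted)  # defective but missed (false negatives)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


defective = {"A.java", "B.java", "C.java", "D.java"}
cautious = {"A.java", "B.java"}                                      # few false positives
eager = {"A.java", "B.java", "C.java", "E.java", "F.java", "G.java"}  # more false positives

print(precision_recall(cautious, defective))  # {'precision': 1.0, 'recall': 0.5}
print(precision_recall(eager, defective))     # {'precision': 0.5, 'recall': 0.75}
```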
  • 13. Influential Metrics
    • Network efficiency is among the top 10 most influential metrics.
    • Metrics on the relationship between changes and the type of dependency are the top 2 metrics (for all projects).
    • The higher the number of old parents, the higher the probability of introducing bugs.
      - Code entities that combine multiple older functionalities are more defect-prone.
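The slides do not define "network efficiency." One common ego-network definition is Burt's structural-hole efficiency. It uses the simplified form for undirected, unweighted graphs: effective size = n - 2t/n (n alters, t ties among them), and efficiency = effective size / n. Whether the paper uses exactly this variant is an assumption here, and the graph and helper name below are hypothetical. Efficiency is 1.0 when no neighbours are connected to each other, and it falls toward 1/n as they become fully interconnected.

```python
from itertools import combinations
from typing import Dict, Set


def ego_efficiency(graph: Dict[str, Set[str]], ego: str) -> float:
    """Burt's efficiency of `ego` in an undirected, unweighted graph.

    effective size = n - 2t / n   (n alters, t ties among the alters)
    efficiency     = effective size / n
    """
    alters = graph[ego]
    n = len(alters)
    if n == 0:
        return float("nan")  # undefined for an isolated vertex
    ties = sum(1 for a, b in combinations(sorted(alters), 2) if b in graph[a])
    effective_size = n - 2 * ties / n
    return effective_size / n


# Hypothetical undirected neighbourhood (adjacency sets are kept symmetric).
graph: Dict[str, Set[str]] = {
    "x": {"a", "b", "c"},
    "a": {"x", "b"},
    "b": {"x", "a"},
    "c": {"x"},
}
print(round(ego_efficiency(graph, "x"), 3))  # 0.778
print(ego_efficiency(graph, "c"))            # 1.0
```

Since change genealogies are directed, applying this measure to them requires either ignoring edge direction (as assumed here) or using a directed variant.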
  • 14. Summary
    • Adapting social network metrics to change dependency graphs.
    • Comparing prediction models.
      - Change genealogies are well suited for defect prediction (better precision, comparable recall).
      - Code entities that combine multiple older functionalities are more defect-prone.