Studying Bug-Introducing Commits in Large Open Source Projects
Stefano Sansone
Dr. Emad Shihab
Dept. of Software Engineering
Software systems are built from many small changes, called software commits. In general, the goal of a commit is to fix a prior error (i.e., a
software bug) or to enhance the functionality of the software system. However, in some cases, commits themselves introduce software bugs.
Such commits can be very costly, so our research focuses on effectively identifying software commits that have a high chance of introducing
bugs.
Results
Approach
Background
RQ 1: Do certain categories of
commits tend to be risky?
RQ 2: Can we accurately predict risky changes? Does
the prediction accuracy differ for the various categories?
RQ 3: How many and what metrics do we need to
accurately predict risky changes? Do the metrics
and the number of metrics differ for the various
categories?
Motivation
Metrics
CHURN – No. of lines added, deleted and modified
ADD – No. of lines added
DEL – No. of lines deleted
NS – No. of modified subsystems
ND – No. of modified directories
NF – No. of modified files
Ent – distribution of modified code across each file
NDEV – No. of developers that changed the modified files
AGE – average time interval between the last and current change
NFC – No. of unique changes to the modified files
EXP – developer experience
REXP – recent developer experience
SEXP – developer experience on a subsystem
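As a rough illustration of the size and diffusion metrics listed above, the sketch below derives CHURN, ADD, DEL, NS, ND, NF and Ent from a per-file summary of one commit. The `FileChange` record, the use of the top-level directory as the "subsystem", the approximation of churn as added plus deleted lines, and the use of Shannon entropy for Ent are all assumptions made for this example; the poster does not give these details.

```python
import math
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class FileChange:
    """Hypothetical per-file summary of one commit (not from the poster)."""
    path: str
    added: int
    deleted: int


def commit_metrics(changes):
    """Compute size and diffusion metrics for a single commit."""
    add = sum(c.added for c in changes)
    dele = sum(c.deleted for c in changes)
    # Assumption: a modified line appears as one deletion plus one addition,
    # so churn is approximated by added + deleted.
    churn = add + dele

    paths = {PurePosixPath(c.path) for c in changes}
    # Assumption: the first path component stands in for the subsystem.
    subsystems = {p.parts[0] for p in paths}
    directories = {str(p.parent) for p in paths}

    # Assumption: Ent is the Shannon entropy of each file's share of churn.
    ent = 0.0
    if churn > 0:
        for c in changes:
            share = (c.added + c.deleted) / churn
            if share > 0:
                ent -= share * math.log2(share)

    return {
        "CHURN": churn, "ADD": add, "DEL": dele,
        "NS": len(subsystems), "ND": len(directories), "NF": len(paths),
        "Ent": ent,
    }


example = commit_metrics([
    FileChange("src/a.py", 5, 5),
    FileChange("src/util/b.py", 10, 0),
    FileChange("docs/x.md", 0, 20),
])
```

The history and experience metrics (NDEV, AGE, NFC, EXP, REXP, SEXP) need the full commit log rather than a single diff, so they are left out of this sketch.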
Largest risky categories
• Corrective
• Feature Addition
• Non Functional
• Good Recall
• Precision could be better
Most used metrics
• Developer Experience
• Lines Modified
• No. Subsystems
• No. Developers
Number of Metrics
• Balance of recall and precision with 7 metrics
• Tradeoff between Precision and Recall
• Precision slightly higher for Corrective category (similar to other categories)
Categorize the commits by their commit messages.
Extract metrics from the commits.
Create two prediction models for each category to identify risky commits: a generalized linear model (GLM) and a model that compares the
median metric values of risky changes.
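One possible reading of the median-based model is sketched below: take the median of each metric over the known risky commits of a category, then flag a new commit as risky when at least half of its metrics reach those medians. The voting rule, the function names and the toy data are assumptions for illustration only; the poster does not specify how the comparison is made.

```python
from statistics import median


def risky_medians(risky_commits):
    """Median of each metric over commits already labeled risky."""
    metrics = risky_commits[0].keys()
    return {m: median(c[m] for c in risky_commits) for m in metrics}


def predict_risky(commit, medians, min_share=0.5):
    """Flag a commit when enough of its metrics reach the risky medians.

    Assumption: higher metric values mean higher risk, and a simple vote
    over the metrics decides the outcome.
    """
    votes = sum(1 for m, threshold in medians.items() if commit[m] >= threshold)
    return votes / len(medians) >= min_share


# Toy training data, invented for this example.
risky = [
    {"CHURN": 120, "NF": 6, "NDEV": 4},
    {"CHURN": 80, "NF": 3, "NDEV": 5},
    {"CHURN": 200, "NF": 9, "NDEV": 2},
]
medians = risky_medians(risky)
large_change = predict_risky({"CHURN": 150, "NF": 7, "NDEV": 1}, medians)
small_change = predict_risky({"CHURN": 10, "NF": 1, "NDEV": 1}, medians)
```

The "higher is riskier" direction would probably need to be reversed for the experience metrics; the sketch ignores that to stay short.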
A bug-introducing (“risky”) commit is any commit that is later changed by another commit (the fix commit). Risky changes are found by
identifying the commits that first introduced a line of code that later needed to be changed. Our research develops prediction models that can
predict whether a change is risky as soon as it is added to the repository. Previous work has focused on predicting bug-introducing (risky) changes
by building prediction models from a set of metrics. Our work studies large open source projects and looks for categories of commits
that have a high percentage of risky commits. Using this separation of commits into categories, we build prediction models based on the metrics of
individual categories to predict risky changes.
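The identification step described above can be sketched as a lookup over blame-style data: for every line a fix commit changes, find the commit that introduced that line. The data structures and commit ids here are hypothetical stand-ins for what a tool such as `git blame` would provide; this is a simplified sketch, not the authors' tooling.

```python
def find_risky_commits(fix_changes, line_origin):
    """Return the commits that introduced lines later changed by a fix.

    fix_changes: iterable of (path, line_number) pairs changed by the fix,
                 numbered as in the version just before the fix.
    line_origin: dict mapping (path, line_number) to the id of the commit
                 that introduced that line (blame-style data, hypothetical).
    """
    risky = set()
    for key in fix_changes:
        origin = line_origin.get(key)
        if origin is not None:  # lines with unknown origin are skipped
            risky.add(origin)
    return risky


# Invented example data.
line_origin = {
    ("src/parser.c", 10): "a1f3",
    ("src/parser.c", 11): "a1f3",
    ("src/parser.c", 12): "c9e2",
    ("src/lexer.c", 40): "77b0",
}
fix_changes = [("src/parser.c", 11), ("src/parser.c", 12), ("src/parser.c", 99)]
risky_ids = find_risky_commits(fix_changes, line_origin)
```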
Metric Usage Percent by Category in GLM Prediction Model

Metric                               All  Corrective  Feature Addition  Non Functional  Perfective  Preventive
Developer Experience                  84          62                70              48          26          46
Lines Modified                        82          98                80              74          38          66
No. Subsystems                        80          58                70              28          46          32
No. Developers                        64          54                60              20          10          52
Developer Experience on a subsystem   56          72                36              42          20          44
Distribution of Modified Code         50          24                44               6          18           8
No. Unique Changes                    44          34                42              24          10           6
Time since last change                36           6                22              20          32          24
Lines Added                           20          20                 0              40           0           8
No. Directories                       20          40                18              26          12          38
No. Files                             20           8                 4              12          16           0
Lines Deleted                          0          16                 0               0           0           0
Recent Developer Experience            0           0                 0               0           0           0
Project      Risky Ratio  Recall  Precision  F-measure
Maven-2      0.235        0.5     0.336      0.403
Perl         0.289        0.5     0.141      0.221
PostgreSQL   0.428        0.582   0.446      0.506
Rails        0.269        0.619   0.111      0.188
Precision - fraction of retrieved instances that are relevant
Recall - fraction of relevant instances that are retrieved
F-measure - weighted harmonic mean of precision and recall
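The three definitions above can be written out as a short sketch. The function names and the `beta` weighting parameter are choices made for this example; with `beta=1` the F-measure is the balanced harmonic mean, which is an assumption about the variant used on the poster.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def precision_recall_f(tp, fp, fn, beta=1.0):
    """Precision, recall and F-measure from prediction counts.

    tp: risky commits predicted risky
    fp: non-risky commits predicted risky
    fn: risky commits predicted non-risky
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall, beta)


# Invented counts: 50 hits, 100 false alarms, 50 misses.
p, r, f = precision_recall_f(tp=50, fp=100, fn=50)
```

Plugging the reported Maven-2 precision and recall (0.336 and 0.5) into `f_measure` gives roughly 0.402, close to the reported 0.403, which is consistent with, but does not confirm, the balanced variant.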
Approach pipeline: Extract Data, Categorize, Extract Metrics, Compare Metrics and Identify Risky Commits
Commit categories: Corrective, Feature Addition, Non Functional, Perfective, Preventive
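The categorization step could be approximated with a keyword match over commit messages, as in the sketch below. The keyword lists, the first-match-wins ordering and the "Uncategorized" fallback are hypothetical; the poster names the five categories but not the rules used to assign them.

```python
import re

# Hypothetical keyword prefixes; the poster does not give the actual rules.
CATEGORY_KEYWORDS = [
    ("Corrective", ["fix", "bug", "error", "fail", "crash"]),
    ("Feature Addition", ["add", "new", "feature", "implement", "support"]),
    ("Preventive", ["test", "junit", "coverage"]),
    ("Perfective", ["refactor", "clean", "simplify", "rename"]),
    ("Non Functional", ["doc", "comment", "license", "typo", "format"]),
]


def categorize(message):
    """Assign a commit message to the first category whose keyword matches."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    for category, keywords in CATEGORY_KEYWORDS:
        # Prefix match so that "fixed" and "fixes" both count as "fix".
        if any(w.startswith(k) for w in words for k in keywords):
            return category
    return "Uncategorized"


labels = [
    categorize("Fixed null pointer crash in parser"),
    categorize("Add support for YAML config"),
    categorize("Refactor session handling"),
    categorize("Update README wording"),
]
```

Prefix matching is deliberately crude (for example, "address" would match "add"), so a real classifier would need a more careful rule set.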
Project      Language  Start Date  End Date  No. Commits  No. Developers  Domain
Maven-2      Java      Sept '03    May '12   5638         34              build automation tool
Perl         C         Dec '87     Jun '13   51107        1181            interpreted, dynamic programming language
PostgreSQL   C         Jul '96     Jun '13   35194        38              object-relational database management system
Rails        Ruby      Nov '04     Jun '13   37955        2342            web application framework
Extract commit data from four large open source projects.
