poster_3.0

Studying Bug-Introducing Commits in Large Open Source Projects
Stefano Sansone
Dr. Emad Shihab
Dept. of Software Engineering
Software systems are built from many small changes, called software commits. In general, the goal of these commits is to fix a prior error (i.e., a
software bug) or to enhance the functionality of the software system. However, in some cases, commits themselves introduce software bugs.
Such commits can be very costly, so our research focuses on effectively identifying software commits that have a high chance of introducing
bugs.
Results
Approach
Background
RQ 1: Do certain categories of commits tend to be risky?
RQ 2: Can we accurately predict risky changes? Does the prediction accuracy differ for the various categories?
RQ 3: How many and what metrics do we need to accurately predict risky changes? Do the metrics and the number of metrics differ for the various categories?
Motivation
Metrics
CHURN – No. of lines added, deleted and modified
ADD – No. of lines added
DEL – No. of lines deleted
NS – No. of modified subsystems
ND – No. of modified directories
NF – No. of modified files
Ent – distribution of modified code across each file
NDEV – No. of developers that changed the modified files
AGE – average time interval between the last and current change
NFC – No. of unique changes to the modified files
EXP – developer experience
REXP – recent developer experience
SEXP – developer experience on a subsystem
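As an illustration of the Ent metric listed above, the following is a minimal sketch that assumes Ent is the Shannon entropy of the share of modified lines falling in each file. The poster does not give the formula, so both that choice and the function name change_entropy are assumptions made for this example.

```python
import math


def change_entropy(lines_changed_per_file):
    """Shannon entropy (in bits) of how a commit's modified lines are
    spread across files. Assumed definition; not taken from the poster."""
    total = sum(lines_changed_per_file)
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in lines_changed_per_file:
        if n > 0:
            p = n / total  # share of the commit's changes in this file
            entropy -= p * math.log2(p)
    return entropy


# A commit confined to one file has entropy 0; a commit spread evenly
# over more files has higher entropy.
single_file = change_entropy([20])
two_files_even = change_entropy([10, 10])
four_files_even = change_entropy([5, 5, 5, 5])
```

Under this assumed definition, a higher value means the change is scattered more evenly across files.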
Largest risky categories
• Corrective
• Feature Addition
• Non Functional
• Good Recall
• Precision could be better
Most used metrics
• Developer Experience
• Lines Modified
• No. Subsystems
• No. Developers
Number of Metrics
• Balance of recall and precision with 7 metrics
• Tradeoff between Precision and Recall
• Precision slightly higher for Corrective category (similar to other categories)
Categorize the commits by their commit messages.
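One way to categorize commits from their messages is keyword matching. The sketch below is only an assumption about how this could be done: the poster names the five categories but not the rule used to assign them, so the keyword lists and the function name categorize are hypothetical.

```python
# Hypothetical keyword lists; the poster does not say which words map to
# which category.
CATEGORY_KEYWORDS = {
    "Corrective": ("fix", "bug", "error", "fail"),
    "Feature Addition": ("add", "new", "feature", "implement"),
    "Non Functional": ("doc", "comment", "license", "merge"),
    "Perfective": ("refactor", "clean", "simplif", "style"),
    "Preventive": ("test", "junit", "coverage"),
}


def categorize(message):
    """Return every category whose keywords occur in the commit message.

    Matching is by case-insensitive substring, so a commit can land in
    several categories and a word such as "address" also matches "add".
    """
    text = message.lower()
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


labels = categorize("Add unit tests for the new scheduler")
# labels == ["Feature Addition", "Preventive"]
```

Returning a list rather than a single label keeps the sketch neutral on whether a commit may belong to more than one category, which the poster does not specify.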
Extract metrics from the commits.
Create two prediction models for each category to identify risky commits: a generalized linear model (GLM) and a model that compares the median metric values of risky changes.
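The median-comparison model is described only in one sentence, so the following is a rough sketch of one possible reading rather than the method actually used. It assumes that a commit is flagged as risky when at least min_votes of its metrics meet or exceed the median observed among known risky commits. The voting rule, the function names, and the sample values are all made up for illustration.

```python
from statistics import median


def fit_risky_medians(risky_commits):
    """Median of each metric over commits already known to be risky."""
    metric_names = risky_commits[0].keys()
    return {m: median(c[m] for c in risky_commits) for m in metric_names}


def predict_risky(commit, risky_medians, min_votes):
    """Flag a commit when enough of its metrics reach the risky median.

    The voting rule is an assumption, not a rule stated in the poster.
    """
    votes = sum(1 for m, med in risky_medians.items() if commit[m] >= med)
    return votes >= min_votes


# Made-up training data using three of the poster's metric names.
known_risky = [
    {"CHURN": 120, "NF": 6, "NDEV": 4},
    {"CHURN": 80, "NF": 3, "NDEV": 5},
    {"CHURN": 200, "NF": 9, "NDEV": 2},
]
medians = fit_risky_medians(known_risky)
flag_large = predict_risky({"CHURN": 150, "NF": 7, "NDEV": 1}, medians, min_votes=2)
flag_small = predict_risky({"CHURN": 10, "NF": 1, "NDEV": 1}, medians, min_votes=2)
```

Raising min_votes would be expected to trade recall for precision, which is the tradeoff the results report.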
A bug-introducing (“risky”) commit is any commit whose changes are later modified by another commit (the fix commit). Risky commits are found by
identifying the commits that first introduced a line of code that later needed to be changed. Our research develops prediction models that can
flag a change as risky as soon as it is added to the repository. Previous work has focused on predicting bug-introducing (risky) changes by
building prediction models from a set of metrics. Our work studies large open source projects and looks for categories of commits
that contain a high percentage of risky commits. Using this separation of commits into categories, we build prediction models based on the metrics of
individual categories to predict risky changes.
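The tracing step described above, from the lines a fix commit changes back to the commits that first introduced them, can be sketched on in-memory data. The sketch assumes a precomputed mapping from (file, line number) to the commit that introduced that line, which in practice would come from the version control system's line-history (blame or annotate) output. The function name find_risky_commits and the sample identifiers are hypothetical.

```python
def find_risky_commits(fix_changed_lines, line_origin):
    """Return the commits that introduced the lines a fix commit changed.

    fix_changed_lines: iterable of (file, line_number) pairs touched by the fix.
    line_origin: dict mapping (file, line_number) to the id of the commit
                 that first introduced that line (assumed to be precomputed).
    """
    return {
        line_origin[location]
        for location in fix_changed_lines
        if location in line_origin  # lines with no recorded origin are skipped
    }


# Made-up example: the fix touches two existing lines and one unknown line.
line_origin = {
    ("parser.c", 10): "c1",
    ("parser.c", 11): "c2",
    ("lexer.c", 5): "c1",
}
risky = find_risky_commits(
    [("parser.c", 11), ("lexer.c", 5), ("new.c", 1)],
    line_origin,
)
# risky == {"c1", "c2"}
```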
Metric Usage Percent by Category in GLM Prediction Model
Metric                               All  Corrective  Feature Addition  Non Functional  Perfective  Preventive
Developer Experience                 84   62          70                48              26          46
Lines Modified                       82   98          80                74              38          66
No. Subsystems                       80   58          70                28              46          32
No. Developers                       64   54          60                20              10          52
Developer Experience on a subsystem  56   72          36                42              20          44
Distribution of Modified Code        50   24          44                6               18          8
No. Unique Changes                   44   34          42                24              10          6
Time since last change               36   6           22                20              32          24
Lines Added                          20   20          0                 40              0           8
No. Directories                      20   40          18                26              12          38
No. Files                            20   8           4                 12              16          0
Lines Deleted                        0    16          0                 0               0           0
Recent Developer Experience          0    0           0                 0               0           0
Project     Risky Ratio  Recall  Precision  F-measure
Maven-2     0.235        0.5     0.336      0.403
Perl        0.289        0.5     0.141      0.221
PostgreSQL  0.428        0.582   0.446      0.506
Rails       0.269        0.619   0.111      0.188
Precision - fraction of retrieved instances that are relevant
Recall - fraction of relevant instances that are retrieved
F-measure - weighted harmonic mean of precision and recall
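The three definitions above can be written out as a short sketch. The poster does not state the weight used in the harmonic mean, so equal weighting (beta = 1, the usual F1 score) is assumed here. With that assumption, recomputing from the reported precision and recall agrees with the reported F-measure column to within rounding. The function names are chosen for this example.

```python
def precision(true_pos, false_pos):
    """Fraction of commits flagged as risky that really are risky."""
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged else 0.0


def recall(true_pos, false_neg):
    """Fraction of truly risky commits that were flagged."""
    actual = true_pos + false_neg
    return true_pos / actual if actual else 0.0


def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta = 1 weights both equally; this weight is an assumption, since
    the poster does not give it.
    """
    if p == 0 and r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Recompute the F-measure from the reported precision and recall.
f_postgresql = f_measure(0.446, 0.582)  # poster reports 0.506
f_rails = f_measure(0.111, 0.619)       # poster reports 0.188
```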
Extract Data -> Categorize -> Extract Metrics -> Compare Metrics and Identify Risky Commits
Corrective
Feature Addition
Non Functional
Perfective
Preventive
Project     Language  Start Date  End Date  No. Commits  No. Developers  Domain
Maven-2     Java      Sept '03    May '12   5638         34              build automation tool
Perl        C         Dec '87     Jun '13   51107        1181            interpreted, dynamic programming language
PostgreSQL  C         Jul '96     Jun '13   35194        38              object-relational database management system
Rails       Ruby      Nov '04     Jun '13   37955        2342            web application framework
Extract commit data from four large open source projects.
