A Tale of Experiments on
Bug Prediction
Martin Pinzger
Professor of Software Engineering
University of Klagenfurt, Austria


Follow me: @pinzger
Software repositories
2
Hmm, wait a minute
3
Can’t we learn “something” from that data?
Goal of software repository mining
Software Analytics
To obtain insightful and actionable information for completing various tasks
around developing and maintaining software systems
Examples
Quality analysis and defect prediction
Detecting “hot-spots”
Preventing defects
Recommender (advisory) systems
Code completion
Suggesting good code examples
Helping in using an API
...
4
Examples from my mining research
My mining research
The relationship between developer contributions and failure-prone Microsoft Vista
binaries (FSE 2008)
Predicting failure-prone methods (ESEM 2012)
Predicting Build Co-Changes with Source Code Change and Commit Categories
(to appear at SANER 2016)
For more see: http://serg.aau.at/bin/view/MartinPinzger/Publications
Surveys on software repository mining
A survey and taxonomy of approaches for mining software repositories in the
context of software evolution, Kagdi et al. 2007
Evaluating defect prediction approaches: a benchmark and an extensive
comparison, D’Ambros et al. 2012
Conference: MSR 2016 http://msrconf.org/
5
Method-Level Bug Prediction
with Emanuel Giger, Marco D’Ambros*, Harald Gall
University of Zurich
*University of Lugano
Many existing studies to predict bug-prone files
A comparative analysis of the efficiency of change metrics and static
code attributes for defect prediction, Moser et al. 2008
Use of relative code churn measures to predict system defect
density, Nagappan et al. 2005
Cross-project defect prediction: a large scale experiment on data vs.
domain vs. process, Zimmermann et al. 2009
Predicting faults using the complexity of code changes, Hassan et al.
2009
7
Prediction granularity
[Figure: classes 1 to n]
11 methods per class on average
4 of them are bug-prone (ca. 36%)
Retrieving bug-prone methods saves manual inspection effort and
testing effort
8
Large files are typically the most bug-prone files
Research questions
How accurately can we predict buggy methods?
Which characteristics (i.e., metrics) indicate bug-prone
methods?
How does the accuracy vary if the number of buggy methods
decreases?
9
Research questions
10
RQ1 What is the accuracy of bug prediction at the
method level?
RQ2 Which characteristics (i.e., metrics)
indicate bug-prone methods?
RQ3 How does the accuracy vary if the number of
buggy methods decreases?
Experiment with 21 Java open source projects
11
Project        #Classes  #Methods  #M-Histories  #Bugs
JDT Core          1,140    17,703        43,134  4,888
Jena2               897     8,340         7,764    704
Lucene              477     3,870         1,754    377
Xerces              693     8,189         6,866  1,017
Derby Engine      1,394    18,693         9,507  1,663
Ant Core            827     8,698        17,993  1,900
Approach overview
[Figure: stepwise overview of the data extraction process.
1. Versioning Data: Evolizer reads log entries from CVS, SVN, or GIT into the RHDB.
2. Bug Data: commit messages are linked to bugs (e.g., #bug123).
3. Source Code Changes (SCC): ChangeDistiller extracts changes by AST
comparison of subsequent versions (e.g., 1.1 and 1.2).
4. Experiment: Support Vector Machine.]
12
Investigated metrics
13
Source code metrics (from the last release)
fanIn, fanOut, localVar, parameters, commentToCodeRatio, countPath, McCabe
Complexity, statements, maxNesting
Change metrics
methodHistories, authors,
stmtAdded, maxStmtAdded, avgStmtAdded,
stmtDeleted, maxStmtDeleted, avgStmtDeleted,
churn, maxChurn, avgChurn,
decl, cond, elseAdded, elseDeleted
Bugs
Count bug references in commit logs for changed methods
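Counting bug references per changed method could look roughly like the sketch below. It is only an illustration: the regular expression, the Commit record, and the sample commits are hypothetical and not taken from the study, and counting distinct bug IDs (rather than every mention) is a design choice made here.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical pattern for references such as "#bug123" or "bug 456".
BUG_REF = re.compile(r"\bbug\s*#?(\d+)", re.IGNORECASE)

@dataclass
class Commit:
    message: str
    changed_methods: list  # names of the methods touched by this commit

def count_bugs_per_method(commits):
    """Count, per method, the distinct bug IDs referenced by commits that changed it."""
    bug_ids = {}
    for commit in commits:
        refs = set(BUG_REF.findall(commit.message))
        for method in commit.changed_methods:
            bug_ids.setdefault(method, set()).update(refs)
    return Counter({method: len(ids) for method, ids in bug_ids.items()})

# Made-up commit history for illustration only.
commits = [
    Commit("Fix NPE in resolve, see #bug123", ["LocalDeclaration.resolve"]),
    Commit("Refactor configure", ["Main.configure"]),
    Commit("bug 456: wrong option parsing", ["Main.configure", "Main.parse"]),
    Commit("Follow-up for #bug123", ["LocalDeclaration.resolve"]),
]
bugs = count_bugs_per_method(commits)
```

Because bug IDs are collected in a set, the two commits that mention #bug123 contribute a single bug to LocalDeclaration.resolve.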
Predicting bug-prone methods
Bug-prone vs. not bug-prone
14
3.1 Experimental Setup
Prior to model building and classification we labeled each
method in our dataset either as bug-prone or not bug-prone
as follows:
\[
\text{bugClass} =
\begin{cases}
\text{not bug-prone} & \text{if } \#\text{bugs} = 0 \\
\text{bug-prone} & \text{if } \#\text{bugs} \geq 1
\end{cases}
\]
These two classes represent the binary target classes for
training and validating the prediction models. Using 0
(respectively 1) as cut-point is a common approach applied in
many studies covering bug prediction models, e.g., [30, … 7, 4, 27, 37].
Other cut-points are applied in literature,
for instance, a statistical lower confidence bound [33] or the
median [16]. Those varying cut-points as well as the div…
Models computed with change metrics (CM) perform best
authors and methodHistories are the most important measures
Accuracy of prediction models
15
Table 4: Median classification results over all projects per classifier and per model
              CM               SCM             CM&SCM
         AUC   P    R     AUC   P    R     AUC   P    R
RndFor   .95  .84  .88    .72  .50  .64    .95  .85  .95
SVM      .96  .83  .86    .70  .48  .63    .95  .80  .96
BN       .96  .82  .86    .73  .46  .73    .96  .81  .96
J48      .95  .84  .82    .69  .56  .58    .91  .83  .89
… values of the code metrics model are approximately 0.7 for
each classifier, which Lessman et al. define as "promising" [26].
However, the source code metrics suffer from considerably low
precision values. The highest median precision …
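For readers unfamiliar with the three measures in Table 4, the sketch below shows one common way to compute precision (P), recall (R), and AUC for a binary classifier. The labels, scores, and the 0.5 decision threshold are made-up assumptions; this is not the evaluation code of the study.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (bug-prone) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def auc(y_true, scores):
    """AUC as the share of (bug-prone, not bug-prone) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: 1 = bug-prone, 0 = not bug-prone.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
y_pred = [s >= 0.5 for s in scores]  # assumed decision threshold

precision, recall = precision_recall(y_true, y_pred)
area = auc(y_true, scores)
```

Precision and recall depend on the chosen threshold, while AUC only depends on how the scores rank the methods.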
Predicting bug-prone methods with diff. cut-points
Bug-prone vs. not bug-prone
p = 75%, 90%, 95% percentiles of #bugs in methods per project
-> predict the top 25%, 10%, and 5% bug-prone methods
16
… how the classification performance varies (RQ3) as the
number of samples in the target class shrinks, and whether we
observe similar findings as in Section 3.2 regarding the
results of the change and code metrics (RQ2). For that, we
applied three additional cut-point values as follows:
\[
\text{bugClass} =
\begin{cases}
\text{not bug-prone} & \text{if } \#\text{bugs} \leq p \\
\text{bug-prone} & \text{if } \#\text{bugs} > p
\end{cases}
\]
where p represents either the value of the 75%, 90%, or 95%
percentile of the distribution of the number of bugs in
methods per project. For example, using the 95% percentile as
cut-point for prior binning would mean to predict the "top
five percent" methods in terms of the number of bugs.
To conduct this study we applied the same experimental
setup as in Section 3.1, except for the differently chosen …
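The cut-point labeling described above can be sketched as follows. The nearest-rank percentile, the function names, and the bug counts are assumptions for illustration; the study may have used a different percentile definition.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def label_methods(bug_counts, q=None):
    """Label methods; q=None applies #bugs >= 1, otherwise #bugs > p for the q% percentile p."""
    p = 0 if q is None else percentile(bug_counts.values(), q)
    return {
        method: ("bug-prone" if n > p else "not bug-prone")
        for method, n in bug_counts.items()
    }

# Made-up number of bugs for 20 methods of one project.
counts = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 4, 5, 8, 13]
bug_counts = {f"m{i}": c for i, c in enumerate(counts)}

def bug_prone_count(q=None):
    labels = label_methods(bug_counts, q)
    return sum(1 for label in labels.values() if label == "bug-prone")
```

With tied bug counts, the share of methods above the cut-point can be smaller than the nominal 25%, 10%, or 5%, which is one reason the target class shrinks unevenly across projects.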
Decreasing the number of bug-prone methods
Models trained with Random Forest (RndFor)
Change metrics (CM) perform best
Precision decreases (as expected)
17
Table 5: Median classification results for RndFor over all projects per cut-point and per model
             CM               SCM             CM&SCM
        AUC   P    R     AUC   P    R     AUC   P    R
GT0     .95  .84  .88    .72  .50  .64    .95  .85  .95
75%     .97  .72  .95    .75  .39  .63    .97  .74  .95
90%     .97  .58  .94    .77  .20  .69    .98  .64  .94
95%     .97  .62  .92    .79  .13  .72    .98  .68  .92
… precision in the case of the 95% percentile (median precision of
.13). Looking at the change metrics and the combined
model, the median precision is significantly higher for the …
Application: file level vs. method level prediction
JDT Core 3.0 - LocalDeclaration.class
Contains 6 methods / 1 affected by post release bugs
LocalDeclaration.resolve(...) was predicted bug-prone with p=0.97
File-level: p=0.17 to guess the bug-prone method
Need to manually rule out 5 methods to reach >0.82 precision 1 / (6-5)
JDT Core 3.0 - Main.class
Contains 26 methods / 11 affected by post release bugs
Main.configure(...) was predicted bug-prone with p=1.0
File-level: p=0.42 to guess a bug-prone method
Need to rule out 13 methods to reach >0.82 precision 11 / (26-13)
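The arithmetic behind the two examples can be checked with a small sketch. The function names are hypothetical; the numbers (6 methods with 1 bug-prone, 26 methods with 11 bug-prone, target precision 0.82) come from the slide.

```python
def inspection_precision(bug_prone, total, ruled_out=0):
    """Chance that a method picked at random from the remaining candidates is bug-prone."""
    return bug_prone / (total - ruled_out)

def min_ruled_out(bug_prone, total, target):
    """Smallest number of clean methods to rule out so that precision exceeds the target."""
    for k in range(total - bug_prone + 1):
        if inspection_precision(bug_prone, total, k) > target:
            return k
    return None

# LocalDeclaration.class: 6 methods, 1 affected by post-release bugs.
local_guess = round(inspection_precision(1, 6), 2)
local_effort = min_ruled_out(1, 6, 0.82)

# Main.class: 26 methods, 11 affected by post-release bugs.
main_guess = round(inspection_precision(11, 26), 2)
main_effort = min_ruled_out(11, 26, 0.82)
```

A file-level prediction only says that the file is bug-prone, so the developer still has to rule out the clean methods by hand before reaching the precision that the method-level prediction offers directly.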
18
What can we learn from that?
Large files are more likely to change and have bugs
Test large files more thoroughly - YES
Bugs are fixed through changes that again lead to bugs
Stop changing our systems - NO, of course not!
Test changing entities more thoroughly - YES
Are we not already doing that?
Do we really need (complex) prediction models for that?
Not sure - might be the reason why these models are not really used, yet
Microsoft started to add prediction models to their quality assurance tools - current status?
But at least use a metrics tool and keep track of your code quality
-> Continuous integration environments, SONAR
19
Can developer-module networks
predict failures?
with Nachi Nagappan, Brendan Murphy
Microsoft Research
Team structure and post-release failures
Results of an initial study with MS Vista
#Authors and #Commits of binaries are correlated with the #post-release failures
We wanted to find out
Are binaries with fragmented contributions from many developers more likely to
have post-release failures?
Should developers focus on one thing?


21
Study with MS Vista project
Data
Released in January, 2007
> 4 years of development
Several thousand developers
Several thousand binaries (*.exe, *.dll)
Several millions of commits
22
Approach in a nutshell
23
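[Figure: change logs and bug data yield a weighted contribution network
between developers (Alice, Bob, Dan, Eric, Fu, Go, Hin) and binaries
(a, b, c); the resulting per-binary measures feed a regression analysis,
validated with data splitting]
Binary  #bugs  #centrality
a       12     0.9
b       7      0.5
c       3      0.2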
Contribution network
24
[Figure: contribution network connecting developers (Alice, Bob, Dan,
Eric, Fu, Go, Hin) to Windows binaries (*.dll) a, b, and c]
Which binary is failure-prone?
Measuring fragmentation
25
[Figure: the contribution network annotated with three centrality
measures: Freeman degree, Bonacich's power, and Closeness]
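To make two of these measures concrete, here is a minimal sketch that computes degree and closeness centrality on a small developer-binary network. The edges are hypothetical (the slide does not list them), the graph is treated as unweighted, and the formulas are the textbook versions rather than the exact variants used in the study.

```python
from collections import deque

# Hypothetical contribution network: developer -> binaries they committed to.
contributions = {
    "Alice": ["a"], "Bob": ["a", "b"], "Dan": ["a"], "Eric": ["a"],
    "Fu": ["a", "b"], "Go": ["b", "c"], "Hin": ["c"],
}

def build_graph(contributions):
    """Undirected bipartite graph as an adjacency map."""
    graph = {}
    for dev, binaries in contributions.items():
        for binary in binaries:
            graph.setdefault(dev, set()).add(binary)
            graph.setdefault(binary, set()).add(dev)
    return graph

def degree(graph, node):
    """Number of direct neighbors (for a binary: number of contributing developers)."""
    return len(graph[node])

def closeness(graph, node):
    """(n - 1) divided by the sum of shortest-path lengths; assumes a connected graph."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return (len(graph) - 1) / sum(dist.values())

graph = build_graph(contributions)
degrees = {b: degree(graph, b) for b in ("a", "b", "c")}
closenesses = {b: closeness(graph, b) for b in ("a", "b", "c")}
```

In this made-up network, binary "a" has the most contributors, while "c" sits at the edge of the network and has the lowest closeness.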
Research questions
Are binaries with fragmented contributions more failure-prone?
Does more fragmentation also mean a higher number of post-release
failures?
Which measures of fragmentation are useful for failure estimation?
26
Correlation analysis
27
           nrCommits  nrAuthors  Power  dPower  Closeness  Reach  Betweenness
Failures   0.7        0.699      0.692  0.74    0.747      0.746  0.503
nrCommits             0.704      0.996  0.773   0.748      0.732  0.466
nrAuthors                        0.683  0.981   0.914      0.944  0.83
Power                                   0.756   0.732      0.714  0.439
dPower                                          0.943      0.964  0.772
Closeness                                                  0.99   0.738
Reach                                                             0.773
Spearman rank correlation
All correlations are significant at the 0.01 level (2-tailed)
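As a reminder of what the Spearman rank correlation measures, the sketch below computes it as the Pearson correlation of ranks. The three data points are the toy values for binaries a, b, and c from the "Approach in a nutshell" slide, not the Vista data; the function names are illustrative.

```python
def ranks(values):
    """Average ranks (1-based), so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

bugs = [12, 7, 3]             # binaries a, b, c
centrality = [0.9, 0.5, 0.2]  # binaries a, b, c
rho = spearman(bugs, centrality)
```

Both toy columns order the binaries identically, so rho is 1 up to floating-point rounding; the real correlations in the table are lower because the rankings only partly agree.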
How to predict failure-prone binaries?
Binary logistic regression of 50 random splits
4 principal components from 7 centrality measures
28
[Figure: three plots showing Precision, Recall, and AUC for the 50 random
splits; value axis from 0.50 to 1.00]
How to predict the number of failures?
Linear regression of 50 random splits
#Failures = b0 + b1*nCloseness + b2*nrAuthors + b3*nrCommits
All correlations are significant at the 0.01 level (2-tailed)
[Figure: three plots showing R-Square, Pearson, and Spearman for the 50
random splits; value axis from 0.50 to 1.00]
29
Which fragmentation measures to use?
30
[Figure: R-Square and Spearman plots for two models, one with nrAuthors
and nrCommits, the other with nCloseness, nrAuthors, and nrCommits;
value axis from 0.30 to 1.00]
Summary of results
Centrality measures can predict more than 83% of failure-prone Vista
binaries
Closeness, nrAuthors, and nrCommits can predict the number of post-
release failures
Closeness or Reach can improve prediction of the number of post-
release failures by 32%
More information
Can Developer-Module Networks Predict Failures?, FSE 2008
31
[Figure: contribution network of developers (Alice, Bob, Dan, Eric, Fu,
Go, Hin) and binaries a, b, and c with weighted edges]
What can we learn from that?
32
What can we learn from that?
Re-organize/restrict developer contributions
Simply: fire Bob!
Find out the reasons why Bob is contributing to both binaries
At MS, a few key developers helped in many places to get Vista running
33
What can we learn from that?
Re-factor central binaries
Check the contract between “a” and “b” - decouple them
E.g., analyze if “a” contains functionality that should be moved to “b” or a new
binary
34
What can we learn from that?
Increase testing of binaries “a” and “b”
Yes, since these binaries are failure prone
35
What did Microsoft do with the results?
Results were kept within MS Research (at least in 2007 and 2008)
Be careful - our findings were purely based on the data
Such findings often do not show the full picture, since not everything is recorded
My results triggered a lot more research on developer contributions
at MS Research
36
Why do researchers want/need/have to
collaborate with industry?
My experiences with open source and industrial
software projects
Open source projects: pros
+ Provide tons of data
E.g., Eclipse project, Apache projects, etc.
+ Easy to access
Almost “all” data is publicly available, e.g., Github
+ No organizational obstacles to get access
+ You get ALL the data, not just parts of it
38
Open source projects: cons
- Research is (sometimes) difficult to motivate
Sometimes researchers analyze open source projects to try out some new
technique/algorithms but without knowing what actual problem they want to
solve
Why are you doing this?
What is the value for research and industry?
- Difficult to get in touch with the developers to validate the results
Ten years ago, I showed our results to Mozilla - they only said “interesting” but
that was it!
Some open source communities are more responsive, e.g., the Eclipse community
Still, you typically do not get the chance to meet them face-to-face
39
Industrial projects: pros
+ Usually provide real problems
But sometimes these problems are not “research” problems - and researchers
want to do research
+ Provide contact to developers to obtain feedback on the findings
Industry as a laboratory
Helps to evaluate what is useful and what is not
+ Potential to see our research results used at least by developers
Processes, tools, algorithms, models, best practices, etc.
40
Industrial projects: cons
- Expectations of researchers and developers often differ a lot
Developers want to get things done - researchers want to publish
Solutions are too complex and/or only applicable to a very specific case study,
therefore not useful
- Note, researchers often provide know-how, not ready-made tools
Is that a good idea? - Not always, but we are typically cheaper than most
consultants
- Industry provides only partial case studies
Often not really useful to perform research on -> we need data and a lot of it
41
How I got in touch with Microsoft (Research)
Met them at the Microsoft developers conference
Got invited to Redmond to show them what we could do FOR them
Agreed on sending a researcher (me) to Redmond for three months
Fully paid by Microsoft
Once within Microsoft, I got access to the data and to some developers
My main contact was with MS Research


Microsoft invests in internships and visiting researchers
They use it to find and hire talented people
42
What is next on my research agenda?
Study defect prediction in industrial software projects
I am still looking for a good industrial partner (and a student)
Ease understanding changes and their effects
What is the effect on the design?
What is the effect on the quality?
Recommender techniques
Identify the sources of problems
Recommend and perform refactorings to solve the problem
Provide advice on the effects of changes to prevent problems
For this I want and need to collaborate with industry!
43
Conclusions
44
Questions?
Martin Pinzger
martin.pinzger@aau.at
… the history of a software system to assemble the dataset for
our experiments: (1) versioning data including lines modified (LM),
(2) bug data, i.e., which files contained bugs and
how many of them (Bugs), and (3) fine-grained source code
changes (SCC).
[Figure 1: Stepwise overview of the data extraction process.
1. Versioning Data (CVS, SVN, GIT; log entries via Evolizer into the RHDB),
2. Bug Data (commit messages linked to bugs, e.g., #bug123),
3. Source Code Changes (SCC; ChangeDistiller, AST comparison of
subsequent versions, e.g., 1.1 and 1.2),
4. Experiment (Support Vector Machine)]
1. Versioning Data. We use EVOLIZER [14] to access the versioning
repositories, e.g., CVS, SVN, or GIT. They provide
log entries that contain information about revisions of files
that belong to a system. From the log entries we extract the
revision number (to identify the revisions of a file in correct
temporal order), the revision timestamp, the name of the developer
who checked-in the new revision, and the commit
message. We then compute LM for a source file as the sum of
lines added, lines deleted, and lines changed per file revision.
2. Bug Data. Bug reports are stored in bug repositories such
as Bugzilla. Traditional bug tracking and versioning repos…
Update Core 595 8’496 251’434 36’151 532 Oct0
Debug UI 1’954 18’862 444’061 81’836 3’120 May
JDT Debug UI 775 8’663 168’598 45’645 2’002 Nov
Help 598 3’658 66’743 12’170 243 May
JDT Core 1’705 63’038 2’814K 451’483 6’033 Jun0
OSGI 748 9’866 335’253 56’238 1’411 Nov
… single source code statements, e.g., method invocation
statements, between two versions of a program by comparing
their respective abstract syntax trees (AST). Each change then
represents a tree edit operation that is required to transform
one version of the AST into the other. The algorithm is
implemented in CHANGEDISTILLER [14] that pairwise compares
the ASTs between all direct subsequent revisions of each file.
Based on this information, we then count the number of
different source code changes (SCC) per file revision.
The preprocessed data from step 1-3 is stored into the
Release History Database (RHDB) [10]. From that data, we then
compute LM, SCC, and Bugs for each source file by aggregating
the values over the given observation period.
3. EMPIRICAL STUDY
In this section, we present the empirical study that we
performed to investigate the hypotheses stated in Sectio… We
discuss the dataset, the statistical methods and machine
learning algorithms we used, and report on the results and
findings of the experiments.
3.1 Dataset and Data Preparation
We performed our experiments on 15 plugins of the Eclipse
platform. Eclipse is a popular open source system that has
been studied extensively before [4,27,38,39].
Table 1 gives an overview of the Eclipse dataset used in
this study with the number of unique *.java files (Fi…
Academia wants/needs/must
collaborate with industry
Industry should invest in such
a collaboration
45

More Related Content

What's hot

Changes and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development ActivitiesChanges and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development ActivitiesThomas Zimmermann
 
A preliminary study on using code smells to improve bug localization
A preliminary study on using code smells to improve bug localizationA preliminary study on using code smells to improve bug localization
A preliminary study on using code smells to improve bug localizationkrws
 
Dissertation Defense
Dissertation DefenseDissertation Defense
Dissertation DefenseSung Kim
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureMasud Rahman
 
Software Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSung Kim
 
A Survey on Automatic Software Evolution Techniques
A Survey on Automatic Software Evolution TechniquesA Survey on Automatic Software Evolution Techniques
A Survey on Automatic Software Evolution TechniquesSung Kim
 
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...SAIL_QU
 
The relationship between test and production code quality (@ SIG)
The relationship between test and production code quality (@ SIG)The relationship between test and production code quality (@ SIG)
The relationship between test and production code quality (@ SIG)Maurício Aniche
 
Zikopis Evangelos Thesis Presentation
Zikopis Evangelos Thesis PresentationZikopis Evangelos Thesis Presentation
Zikopis Evangelos Thesis PresentationISSEL
 
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence LearningDeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence LearningSung Kim
 
Implementation of reducing features to improve code change based bug predicti...
Implementation of reducing features to improve code change based bug predicti...Implementation of reducing features to improve code change based bug predicti...
Implementation of reducing features to improve code change based bug predicti...eSAT Journals
 
Data collection for software defect prediction
Data collection for software defect predictionData collection for software defect prediction
Data collection for software defect predictionAmmAr mobark
 
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...Ali Ouni
 
ICSE2013
ICSE2013ICSE2013
ICSE2013swy351
 
STAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash ReproductionSTAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash ReproductionSung Kim
 
ICSME2014
ICSME2014ICSME2014
ICSME2014swy351
 
A Search-based Testing Approach for XML Injection Vulnerabilities in Web Appl...
A Search-based Testing Approach for XML Injection Vulnerabilities in Web Appl...A Search-based Testing Approach for XML Injection Vulnerabilities in Web Appl...
A Search-based Testing Approach for XML Injection Vulnerabilities in Web Appl...Lionel Briand
 
Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...
Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...
Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...Lionel Briand
 
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)Sung Kim
 
ICSE2014
ICSE2014ICSE2014
ICSE2014swy351
 

What's hot (20)

Changes and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development ActivitiesChanges and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development Activities
 
A preliminary study on using code smells to improve bug localization
A preliminary study on using code smells to improve bug localizationA preliminary study on using code smells to improve bug localization
A preliminary study on using code smells to improve bug localization
 
Dissertation Defense
Dissertation DefenseDissertation Defense
Dissertation Defense
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lecture
 
Software Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled Datasets
 
A Survey on Automatic Software Evolution Techniques
A Survey on Automatic Software Evolution TechniquesA Survey on Automatic Software Evolution Techniques
A Survey on Automatic Software Evolution Techniques
 
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
 
The relationship between test and production code quality (@ SIG)
The relationship between test and production code quality (@ SIG)The relationship between test and production code quality (@ SIG)
The relationship between test and production code quality (@ SIG)
 
Zikopis Evangelos Thesis Presentation
Zikopis Evangelos Thesis PresentationZikopis Evangelos Thesis Presentation
Zikopis Evangelos Thesis Presentation
 
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence LearningDeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
 
Implementation of reducing features to improve code change based bug predicti...
Implementation of reducing features to improve code change based bug predicti...Implementation of reducing features to improve code change based bug predicti...
Implementation of reducing features to improve code change based bug predicti...
 
Data collection for software defect prediction
Data collection for software defect predictionData collection for software defect prediction
Data collection for software defect prediction
 
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
 
ICSE2013
ICSE2013ICSE2013
ICSE2013
 
STAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash ReproductionSTAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash Reproduction
 
ICSME2014
ICSME2014ICSME2014
ICSME2014
 
A Search-based Testing Approach for XML Injection Vulnerabilities in Web Appl...
A Search-based Testing Approach for XML Injection Vulnerabilities in Web Appl...A Search-based Testing Approach for XML Injection Vulnerabilities in Web Appl...
A Search-based Testing Approach for XML Injection Vulnerabilities in Web Appl...
 
Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...
Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...
Known XML Vulnerabilities Are Still a Threat to Popular Parsers ! & Open Sour...
 
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
 
ICSE2014
ICSE2014ICSE2014
ICSE2014
 

Viewers also liked

Viewers also liked (12)

Navneet id
Navneet idNavneet id
Navneet id
 
Innovative Ideas in Literacy Conference flyer
Innovative Ideas in Literacy Conference flyerInnovative Ideas in Literacy Conference flyer
Innovative Ideas in Literacy Conference flyer
 
Test
TestTest
Test
 
Virus informáticos
Virus informáticosVirus informáticos
Virus informáticos
 
Doze joias literarias
Doze joias literariasDoze joias literarias
Doze joias literarias
 
Um assistido sem esclarecimento espiritual
Um assistido sem esclarecimento espiritualUm assistido sem esclarecimento espiritual
Um assistido sem esclarecimento espiritual
 
skateboardmockup
skateboardmockupskateboardmockup
skateboardmockup
 
Southern Cross University International application-for-admission-2016042
Southern Cross University International application-for-admission-2016042Southern Cross University International application-for-admission-2016042
Southern Cross University International application-for-admission-2016042
 
Mighty midgets
Mighty midgetsMighty midgets
Mighty midgets
 
Spar dos pés r
Spar dos pés rSpar dos pés r
Spar dos pés r
 
Guia de estudio tecnologia
Guia de estudio tecnologiaGuia de estudio tecnologia
Guia de estudio tecnologia
 
cer2
cer2cer2
cer2
 

Similar to A Tale of Experiments on Bug Prediction

A tale of experiments on bug prediction
A tale of experiments on bug predictionA tale of experiments on bug prediction
A tale of experiments on bug predictionMartin Pinzger
 
Esem2010 shihab
Esem2010 shihabEsem2010 shihab
Esem2010 shihabSAIL_QU
 
Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)Martin Pinzger
 
A tale of bug prediction in software development
A tale of bug prediction in software developmentA tale of bug prediction in software development
A tale of bug prediction in software developmentMartin Pinzger
 
A survey of fault prediction using machine learning algorithms
A survey of fault prediction using machine learning algorithmsA survey of fault prediction using machine learning algorithms
A survey of fault prediction using machine learning algorithmsAhmed Magdy Ezzeldin, MSc.
 
A value added predictive defect type distribution model
A value added predictive defect type distribution modelA value added predictive defect type distribution model
A value added predictive defect type distribution modelUmeshchandraYadav5
 
A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...IOSR Journals
 
Multi step automated refactoring for code smell
Multi step automated refactoring for code smellMulti step automated refactoring for code smell
Multi step automated refactoring for code smelleSAT Publishing House
 
Multi step automated refactoring for code smell
Multi step automated refactoring for code smellMulti step automated refactoring for code smell
Multi step automated refactoring for code smelleSAT Journals
 
A defect prediction model based on the relationships between developers and c...
A defect prediction model based on the relationships between developers and c...A defect prediction model based on the relationships between developers and c...
A defect prediction model based on the relationships between developers and c...Vrije Universiteit Brussel
 
Bug Triage: An Automated Process
Bug Triage: An Automated ProcessBug Triage: An Automated Process
Bug Triage: An Automated ProcessIRJET Journal
 
Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...
Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...
Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...Annibale Panichella
 
Traps detection during migration of C and C++ code to 64-bit Windows
Traps detection during migration of C and C++ code to 64-bit WindowsTraps detection during migration of C and C++ code to 64-bit Windows
Traps detection during migration of C and C++ code to 64-bit WindowsPVS-Studio
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
DSUS_MAO_2012_Jie
DSUS_MAO_2012_JieDSUS_MAO_2012_Jie
DSUS_MAO_2012_JieMDO_Lab
 
Icse 2011 ds_1
Icse 2011 ds_1Icse 2011 ds_1
Icse 2011 ds_1SAIL_QU
 
Deepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine LearningDeepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine LearningIRJET Journal
 

A Tale of Experiments on Bug Prediction

  • 1. A Tale of Experiments on Bug Prediction. Martin Pinzger, Professor of Software Engineering, University of Klagenfurt, Austria. Follow me: @pinzger
  • 3. Hmm, wait a minute: can’t we learn “something” from that data?
  • 4. Goal of software repository mining. Software analytics: to obtain insightful and actionable information for completing various tasks around developing and maintaining software systems. Examples: quality analysis and defect prediction; detecting “hot-spots”; preventing defects; recommender (advisory) systems; code completion; suggesting good code examples; helping in using an API; ...
  • 5. Examples from my mining research. My mining research: The relationship between developer contributions and failure-prone Microsoft Vista binaries (FSE 2008); Predicting failure-prone methods (ESEM 2012); Predicting Build Co-Changes with Source Code Change and Commit Categories (to appear at SANER 2016). For more see: http://serg.aau.at/bin/view/MartinPinzger/Publications Surveys on software repository mining: A survey and taxonomy of approaches for mining software repositories in the context of software evolution, Kagdi et al. 2007; Evaluating defect prediction approaches: a benchmark and an extensive comparison, D’Ambros et al. 2012. Conference: MSR 2016 http://msrconf.org/
  • 6. Method-Level Bug Prediction with Emanuel Giger, Marco D’Ambros*, Harald Gall University of Zurich *University of Lugano
  • 7. Many existing studies to predict bug-prone files: A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction, Moser et al. 2008; Use of relative code churn measures to predict system defect density, Nagappan et al. 2005; Cross-project defect prediction: a large scale experiment on data vs. domain vs. process, Zimmermann et al. 2009; Predicting faults using the complexity of code changes, Hassan et al. 2009
  • 8. Prediction granularity. Large files are typically the most bug-prone files. 11 methods on average; 4 methods are bug-prone (ca. 36%). Retrieving bug-prone methods saves manual inspection effort and testing effort. (Figure: classes 1 to n and their methods.)
  • 9. Research questions. How accurately can we predict buggy methods? Which characteristics (i.e., metrics) indicate bug-prone methods? How does the accuracy vary if the number of buggy methods decreases?
  • 10. Research questions. RQ1: What is the accuracy of bug prediction on method level? RQ2: Which characteristics (i.e., metrics) indicate bug-prone methods? RQ3: How does the accuracy vary if the number of buggy methods decreases?
  • 11. Experiment with 21 Java open source projects (dots are thousands separators)
      Project        #Classes  #Methods  #M-Histories  #Bugs
      JDT Core          1.140    17.703        43.134  4.888
      Jena2               897     8.340         7.764    704
      Lucene              477     3.870         1.754    377
      Xerces              693     8.189         6.866  1.017
      Derby Engine      1.394    18.693         9.507  1.663
      Ant Core            827     8.698        17.993  1.900
  • 12. Approach overview (figure). 1. Versioning Data: log entries from CVS, SVN, GIT, accessed with Evolizer and stored in the RHDB. 2. Bug Data: bug references in commit messages (e.g., #bug123). 3. Source Code Changes (SCC): ChangeDistiller, AST comparison of subsequent versions (e.g., 1.1 and 1.2). 4. Experiment: Support Vector Machine.
  • 13. Investigated metrics. Source code metrics (from the last release): fanIn, fanOut, localVar, parameters, commentToCodeRatio, countPath, McCabe Complexity, statements, maxNesting. Change metrics: methodHistories, authors, stmtAdded, maxStmtAdded, avgStmtAdded, stmtDeleted, maxStmtDeleted, avgStmtDeleted, churn, maxChurn, avgChurn, decl, cond, elseAdded, elseDeleted. Bugs: count bug references in commit logs for changed methods.
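The bug metric on this slide counts bug references in commit logs for changed methods. A minimal sketch of that counting step, assuming bug references look like `#bug123` (the message shown in the approach overview) and assuming each commit already comes with the list of methods it changed. The function name `count_bugs_per_method` and the sample commits are hypothetical and are not taken from the study.

```python
import re

# Assumed reference format, modeled on the "#bug123" message in the overview figure.
BUG_REF = re.compile(r"#bug(\d+)", re.IGNORECASE)

def count_bugs_per_method(commits):
    """Count distinct bug IDs referenced by the commits that changed each method.

    `commits` is an iterable of (commit_message, changed_methods) pairs.
    """
    bug_ids_by_method = {}
    for message, changed_methods in commits:
        bug_ids = set(BUG_REF.findall(message))
        for method in changed_methods:
            # Methods changed only by commits without bug references keep a count of 0.
            bug_ids_by_method.setdefault(method, set()).update(bug_ids)
    return {method: len(ids) for method, ids in bug_ids_by_method.items()}

# Hypothetical commit history.
commits = [
    ("Fix NPE in resolve, see #bug123", ["LocalDeclaration.resolve"]),
    ("Refactor configure", ["Main.configure"]),
    ("Follow-up for #bug123 and #bug456", ["LocalDeclaration.resolve"]),
]
bug_counts = count_bugs_per_method(commits)
```

Counting distinct bug IDs rather than raw mentions is a design choice of this sketch, so that several commits fixing the same bug are not counted twice.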
  • 14. Predicting bug-prone methods: bug-prone vs. not bug-prone. (Paper excerpt, Experimental Setup:) Prior to model building and classification we labeled each method in our dataset either as bug-prone or not bug-prone as follows:
      \[ \mathit{bugClass} = \begin{cases} \text{not bug-prone} & : \#\mathit{bugs} = 0 \\ \text{bug-prone} & : \#\mathit{bugs} \geq 1 \end{cases} \]
      These two classes represent the binary target classes for training and validating the prediction models. Using 0 (respectively 1) as cut-point is a common approach applied in many studies covering bug prediction models, e.g., [30 7, 4, 27, 37]. Other cut-points are applied in literature, for instance, a statistical lower confidence bound [33] or the median [16]. Those varying cut-points as well as the div...
  • 15. Accuracy of prediction models. Models computed with change metrics (CM) perform best. authors and methodHistories are the most important measures.
      Table 4: Median classification results over all projects per classifier and per model
                      CM               SCM              CM&SCM
                AUC    P    R    AUC    P    R    AUC    P    R
      RndFor    .95  .84  .88    .72  .5   .64    .95  .85  .95
      SVM       .96  .83  .86    .7   .48  .63    .95  .8   .96
      BN        .96  .82  .86    .73  .46  .73    .96  .81  .96
      J48       .95  .84  .82    .69  .56  .58    .91  .83  .89
      (Paper excerpt:) ... values of the code metrics model are approximately 0.7 for each classifier, which is defined by Lessman et al. as "promising" [26]. However, the source code metrics suffer from considerably low precision values. The highest median precision ...
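Table 4 reports AUC, precision (P), and recall (R). As a reminder of what these columns measure, here is a small self-contained sketch using the standard definitions. The labels and scores below are made up for illustration and are not data from the study. The AUC is computed as the probability that a randomly chosen bug-prone method receives a higher score than a randomly chosen not bug-prone one.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (bug-prone) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = bug-prone, 0 = not bug-prone.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1]
y_pred = [s >= 0.5 for s in scores]  # classify at an assumed threshold of 0.5

precision, recall = precision_recall(y_true, y_pred)
area = auc(y_true, scores)
```

In this toy example one clean method is flagged, so precision drops to 0.75 while recall stays at 1.0, which is the same pattern the SCM columns show in a more extreme form.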
  • 16. Predicting bug-prone methods with different cut-points: bug-prone vs. not bug-prone. p = 75%, 90%, 95% percentiles of #bugs in methods per project -> predict the top 25%, 10%, and 5% bug-prone methods. (Paper excerpt:) ... how the classification performance varies (RQ3) as the number of samples in the target class shrinks, and whether we observe similar findings as in Section 3.2 regarding the results of the change and code metrics (RQ2). For that, we applied three additional cut-point values as follows:
      \[ \mathit{bugClass} = \begin{cases} \text{not bug-prone} & : \#\mathit{bugs} \leq p \\ \text{bug-prone} & : \#\mathit{bugs} > p \end{cases} \]
      where p represents either the value of the 75%, 90%, or 95% percentile of the distribution of the number of bugs in methods per project. For example, using the 95% percentile as cut-point for prior binning would mean to predict the “top five percent” methods in terms of the number of bugs. To conduct this study we applied the same experimental setup as in Section 3.1, except for the differently chosen ...
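The cut-point rule on this slide can be sketched in a few lines. The slide does not say which percentile definition was used, so this sketch assumes the nearest-rank definition. The function names and the bug counts are hypothetical. Passing `p = 0` reproduces the basic labeling (#bugs >= 1 means bug-prone).

```python
import math

def percentile_cut_point(bug_counts, percent):
    """Nearest-rank percentile of the per-method bug counts (an assumed definition)."""
    ordered = sorted(bug_counts)
    rank = max(1, math.ceil(percent / 100 * len(ordered)))
    return ordered[rank - 1]

def label_methods(bug_counts, p):
    """Label a method bug-prone if its number of bugs is greater than the cut-point p."""
    return ["bug-prone" if n > p else "not bug-prone" for n in bug_counts]

# Hypothetical number of bugs per method in one project.
bugs = [0, 0, 0, 0, 1, 1, 2, 3, 5, 9]

labels_gt0 = label_methods(bugs, 0)      # basic cut-point: #bugs >= 1
p75 = percentile_cut_point(bugs, 75)     # cut-point at the 75% percentile
labels_p75 = label_methods(bugs, p75)    # only the methods above that percentile
```

Raising the cut-point shrinks the bug-prone class (here from 6 methods to 2), which is the class imbalance that RQ3 investigates.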
  • 17. Decreasing the number of bug-prone methods. Models trained with Random Forest (RndFor). Change metrics (CM) perform best. Precision decreases (as expected).
      Table 5: Median classification results for RndFor over all projects per cut-point and per model
                      CM               SCM              CM&SCM
                AUC    P    R    AUC    P    R    AUC    P    R
      GT0       .95  .84  .88    .72  .50  .64    .95  .85  .95
      75%       .97  .72  .95    .75  .39  .63    .97  .74  .95
      90%       .97  .58  .94    .77  .20  .69    .98  .64  .94
      95%       .97  .62  .92    .79  .13  .72    .98  .68  .92
      (Paper excerpt:) ... in the case of the 95% percentile (median precision of .13). Looking at the change metrics and the combined model the median precision is significantly higher for the ...
  • 18. Application: file level vs. method level prediction. JDT Core 3.0, LocalDeclaration.class: contains 6 methods, 1 affected by post-release bugs; LocalDeclaration.resolve(...) was predicted bug-prone with p=0.97; file-level: p=0.17 to guess the bug-prone method; need to manually rule out 5 methods to reach >0.82 precision: 1 / (6-5). JDT Core 3.0, Main.class: contains 26 methods, 11 affected by post-release bugs; Main.configure(...) was predicted bug-prone with p=1.0; file-level: p=0.42 to guess a bug-prone method; need to rule out 13 methods to reach >0.82 precision: 11 / (26-13).
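The file-level numbers on this slide follow from simple arithmetic: if only the file is flagged, the chance of picking a bug-prone method is the share of bug-prone methods among the methods still under consideration. A short sketch that reproduces the slide's figures, under the assumption that every method ruled out manually is in fact not bug-prone (the function name is hypothetical):

```python
def file_level_precision(total_methods, buggy_methods, ruled_out=0):
    """Share of bug-prone methods among those left after ruling out clean ones.

    Assumes that all ruled-out methods are not bug-prone.
    """
    remaining = total_methods - ruled_out
    if remaining <= 0 or buggy_methods > remaining:
        raise ValueError("cannot rule out more methods than the clean ones")
    return buggy_methods / remaining

# LocalDeclaration.class: 6 methods, 1 affected by post-release bugs.
local_before = file_level_precision(6, 1)               # 1 / 6
local_after = file_level_precision(6, 1, ruled_out=5)   # 1 / (6 - 5)

# Main.class: 26 methods, 11 affected by post-release bugs.
main_before = file_level_precision(26, 11)                # 11 / 26
main_after = file_level_precision(26, 11, ruled_out=13)   # 11 / (26 - 13)
```

Rounded to two decimals, `local_before` and `main_before` give the 0.17 and 0.42 quoted on the slide, and both `*_after` values exceed 0.82.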
  • 19. What can we learn from that? Large files are more likely to change and have bugs: test large files more thoroughly - YES. Bugs are fixed through changes that again lead to bugs: stop changing our systems - NO, of course not! Test changing entities more thoroughly - YES. Are we not already doing that? Do we really need (complex) prediction models for that? Not sure - might be the reason why these models are not really used, yet. Microsoft started to add prediction models to their quality assurance tools - current status? But at least use a metrics tool and keep track of your code quality -> continuous integration environments, SONAR.
  • 20. Can developer-module networks predict failures? with Nachi Nagappan, Brendan Murphy Microsoft Research
  • 21. Team structure and post-release failures. Results of an initial study with MS Vista: #Authors and #Commits of binaries are correlated with the #post-release failures. We wanted to find out: Are binaries with fragmented contributions from many developers more likely to have post-release failures? Should developers focus on one thing?
  • 22. Study with MS Vista project. Data: released in January 2007; > 4 years of development; several thousand developers; several thousand binaries (*.exe, *.dll); several million commits.
  • 23. Approach in a nutshell. Change logs and bugs -> contribution network of developers (Alice, Bob, Dan, Eric, Fu, Go, Hin) and binaries (a, b, c) with weighted edges -> regression analysis -> validation with data splitting. Example:
      Binary  #bugs  #centrality
      a          12          0.9
      b           7          0.5
      c           3          0.2
  • 24. Contribution network (figure: developers Alice, Bob, Dan, Eric, Fu, Go, and Hin connected to the Windows binaries (*.dll) a, b, and c). Which binary is failure-prone?
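To make the idea of centrality in a contribution network concrete, here is a minimal sketch that builds a small developer-binary graph and computes closeness centrality with a breadth-first search. The edges are hypothetical (the figure's actual edges are not recoverable from this transcript), edge weights are ignored, and closeness is only one of the centrality measures used in the study.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Undirected graph from (developer, binary) contribution pairs."""
    graph = defaultdict(set)
    for developer, binary in edges:
        graph[developer].add(binary)
        graph[binary].add(developer)
    return graph

def closeness(graph, source):
    """Closeness = reachable nodes / sum of shortest-path lengths from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical contributions: who committed to which binary.
edges = [
    ("Alice", "a"), ("Bob", "a"), ("Dan", "a"), ("Eric", "a"), ("Go", "a"),
    ("Bob", "b"), ("Eric", "b"), ("Fu", "b"),
    ("Go", "c"), ("Hin", "c"),
]
graph = build_graph(edges)
centrality = {binary: closeness(graph, binary) for binary in ("a", "b", "c")}
ranking = sorted(centrality, key=centrality.get, reverse=True)
```

In this toy network binary "a" receives contributions from the most developers and ends up most central, which is the kind of binary the study asks about when it looks for failure-prone candidates.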
  • 26. Research questions. Are binaries with fragmented contributions more failure-prone? Does more fragmentation also mean a higher number of post-release failures? Which measures of fragmentation are useful for failure estimation?
  • 27. Correlation analysis. Spearman rank correlation. All correlations are significant at the 0.01 level (2-tailed).
                  nrCommits  nrAuthors  Power  dPower  Closeness  Reach  Betweenness
      Failures    0,7        0,699      0,692  0,74    0,747      0,746  0,503
      nrCommits              0,704      0,996  0,773   0,748      0,732  0,466
      nrAuthors                         0,683  0,981   0,914      0,944  0,83
      Power                                    0,756   0,732      0,714  0,439
      dPower                                           0,943      0,964  0,772
      Closeness                                                   0,99   0,738
      Reach                                                              0,773
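The values in this matrix are Spearman rank correlations. As an illustration of how such a coefficient is obtained, the sketch below ranks both variables (ties get their average rank) and then computes the Pearson correlation of the ranks. The sample values for `nr_authors` and `failures` are invented and do not come from the Vista data.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-binary measurements.
nr_authors = [3, 10, 1, 7, 5]
failures = [2, 9, 0, 12, 3]
rho = spearman(nr_authors, failures)
```

Because only ranks matter, the coefficient is insensitive to the skewed value distributions that are typical for commit and failure counts.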
  • 28. How to predict failure-prone binaries? Binary logistic regression of 50 random splits; 4 principal components from 7 centrality measures. (Figure: precision, recall, and AUC per split; y-axes from 0.50 to 1.00.)
  • 29. How to predict the number of failures? Linear regression of 50 random splits: #Failures = b0 + b1*nCloseness + b2*nrAuthors + b3*nrCommits. All correlations are significant at the 0.01 level (2-tailed). (Figure: R-Square, Pearson, and Spearman per split; y-axes from 0.50 to 1.00.)
  • 30. Which fragmentation measures to use? (Figure: R-Square and Spearman per split for two models, one with nrAuthors and nrCommits, and one with nCloseness, nrAuthors, and nrCommits; y-axes from 0.30 to 1.00.)
  • 31. Summary of results. Centrality measures can predict more than 83% of failure-prone Vista binaries. Closeness, nrAuthors, and nrCommits can predict the number of post-release failures. Closeness or Reach can improve prediction of the number of post-release failures by 32%. More information: Can Developer-Module Networks Predict Failures?, FSE 2008
  • 33. What can we learn from that? (Figure: contribution network.) Re-organize/restrict developer contributions. Simply: fire Bob! Find out the reasons why Bob is contributing to both binaries. At MS, a few key developers helped in many places to get Vista running.
  • 34. What can we learn from that? (Figure: contribution network.) Re-factor central binaries. Check the contract between “a” and “b” - decouple them. E.g., analyze if “a” contains functionality that should be moved to “b” or a new binary.
  • 35. What can we learn from that? (Figure: contribution network.) Increase testing of binaries “a” and “b”: yes, since these binaries are failure-prone.
  • 36. What did Microsoft do with the results? Results were kept within MS Research (at least in 2007 and 2008). Be careful - our findings were purely based on the data. Such findings often do not show the full picture, since not everything is recorded. My results triggered a lot more research on developer contributions at MS Research.
  • 37. Why researchers want/need/must collaborate with industry? My experiences with open source and industrial software projects
  • 38. Open source projects: pros. + Provide tons of data: e.g., Eclipse project, Apache projects, etc. + Easy to access: almost “all” data is publicly available, e.g., GitHub. + No organizational obstacles to get access. + You get ALL the data, not just parts of it.
  • 39. Open source projects: cons. - Research is (sometimes) difficult to motivate: sometimes researchers analyze open source projects to try out some new technique/algorithm but without knowing what actual problem they want to solve. Why are you doing this? What is the value for research and industry? - Difficult to get in touch with the developers to validate the results: 10 years back, I showed our results to Mozilla - they only said “interesting” but that was it! Some open source communities are more responsive, e.g., the Eclipse community. Still, you typically do not get the chance to meet them face-2-face.
  • 40. Industrial projects: pros. + Usually provide real problems, but sometimes these problems are not “research” problems - and researchers want to do research. + Provide contact to developers to obtain feedback on the findings: industry as a laboratory; helps to evaluate what is useful and what is not. + Potential to see our research results used at least by developers: processes, tools, algorithms, models, best practices, etc.
  • 41. Industrial projects: cons. - Often expectations between researchers and developers differ a lot: developers want to get things done - researchers want to publish. Solutions are too complex and/or only applicable to a very specific case study, therefore not useful. - Note, researchers often provide know-how, not ready-made tools. Is that a good idea? - Not always, but we are typically cheaper than most consultants. - Industry provides only partial case studies: often not really useful to perform research on -> we need data and a lot of it.
  • 42. How I got in touch with Microsoft (Research). Met them at the Microsoft developers conference. Got invited to Redmond to show them what we could do FOR them. Agreed on sending a researcher (me) to Redmond for three months, fully paid by Microsoft. Once within Microsoft got access to the data and to some developers. My main contact was with MS Research. Microsoft invests in internships and visiting researchers: they use it to find and hire talented people.
  • 43. What is next on my research agenda? Study defect prediction in industrial software projects: I am still looking for a good industrial partner (and a student). Ease understanding changes and their effects: What is the effect on the design? What is the effect on the quality? Recommender techniques: identify the sources of problems; recommend and perform refactorings to solve the problem; provide advice on the effects of changes to prevent problems. For this I want and need to collaborate with industry!
  • 44. Conclusions. Academia wants/needs/must collaborate with industry. Industry should invest in such a collaboration. Questions? Martin Pinzger, martin.pinzger@aau.at (The slide background shows thumbnails of paper excerpts, including the stepwise overview of the data extraction process, and the contribution network figure.)