Populating a Release History Database (ICSM 2013 MIP)
1. ICSM 2013 MIP
Populating a Release History DB
Michael Fischer, Martin Pinzger, Harald C. Gall
University of Zurich, Switzerland
University of Klagenfurt, Austria
3. Motivation back in 2003
Version control and bug tracking systems need to be integrated
they contain large amounts of historical information that can give insights
but they provide only insufficient support for a detailed analysis of software evolution
Our goal was
to populate a Release History Database that combines version data with
bug tracking data and adds missing information not covered by version
control systems such as merge points or bug links.
to enable systematic queries to the structured data to obtain meaningful
views showing the evolution of a software project.
to enable more accurate reasoning of evolutionary aspects.
4. Populating a Release History DB
Problem = re-establishment of links between modification
reports (MRs) and problem reports (PRs) since no mechanisms
are provided by CVS
We use PR IDs found in the MRs of CVS
PR IDs in MRs are detected using a set of regular expressions.
A match is rated according to the confidence value:
high (h), medium (m), or low (l)
confidence is considered high if expressions such as <keyword><ID> can
be detected
confidence is considered low if a six-digit number just appears
somewhere in the text of a modification report without a preceding keyword
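The rating rule above can be sketched as follows. This is a minimal illustration, not the original RHDB implementation: the keyword list (`bug`, `fix`, `pr`) and the function name `rate_pr_ids` are assumptions, and since the slide does not say what earns a medium rating, the sketch only distinguishes high from low.

```python
import re

# Assumed keyword list; the slide only specifies the shape <keyword><ID>.
KEYWORD_ID = re.compile(r"\b(?:bug|fix|pr)\s*[#:]?\s*(\d+)\b", re.IGNORECASE)
# A six-digit number anywhere in the text, with no keyword required.
BARE_SIX_DIGITS = re.compile(r"\b(\d{6})\b")


def rate_pr_ids(message):
    """Return {pr_id: 'h' or 'l'} for candidate PR IDs in one MR text."""
    ratings = {}
    # High confidence: the ID directly follows a keyword.
    for match in KEYWORD_ID.finditer(message):
        ratings[int(match.group(1))] = "h"
    # Low confidence: a bare six-digit number; never downgrades a high rating.
    for match in BARE_SIX_DIGITS.finditer(message):
        ratings.setdefault(int(match.group(1)), "l")
    return ratings


example = rate_pr_ids("Fix for bug 123456, see also 654321")
```

In this example, `123456` is rated `h` because it follows the keyword `bug`, while `654321` is rated `l`. Low-rated matches would still need validation against the bug tracker before being stored as links.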
5. Building a Release History DB
3 main sources:
Modification reports (MR): CVS
Problem reports (PR): Bugzilla
program and patch information: Release Packages
Relevant MRs and PRs are filtered, validated and
stored in a Release History DB (RHDB)
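To make the idea of storing linked MRs and PRs concrete, here is a small sketch using an in-memory SQLite database. The table names, column names, and sample rows are all hypothetical and are not the actual RHDB schema; the sketch only shows how a confidence-rated link table allows the kind of systematic query the slides mention.

```python
import sqlite3


def build_demo_rhdb():
    """Create a tiny in-memory database with a made-up RHDB-like schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE mr (id INTEGER PRIMARY KEY, file TEXT, revision TEXT, message TEXT);
        CREATE TABLE pr (id INTEGER PRIMARY KEY, summary TEXT);
        CREATE TABLE mr_pr_link (
            mr_id INTEGER REFERENCES mr(id),
            pr_id INTEGER REFERENCES pr(id),
            confidence TEXT CHECK (confidence IN ('h', 'm', 'l'))
        );
    """)
    # Sample rows are invented for illustration only.
    db.executemany("INSERT INTO mr VALUES (?, ?, ?, ?)", [
        (1, "a/Foo.java", "1.2", "fix bug 123456"),
        (2, "a/Bar.java", "1.7", "bug 123456: follow-up"),
        (3, "a/Baz.java", "1.1", "cleanup, 654321"),
    ])
    db.executemany("INSERT INTO pr VALUES (?, ?)", [
        (123456, "made-up report"),
        (654321, "made-up report"),
    ])
    db.executemany("INSERT INTO mr_pr_link VALUES (?, ?, ?)", [
        (1, 123456, "h"), (2, 123456, "h"), (3, 654321, "l"),
    ])
    return db


def files_for_pr(db, pr_id, confidence="h"):
    """List the files whose MRs link to pr_id with the given confidence."""
    rows = db.execute(
        "SELECT mr.file FROM mr JOIN mr_pr_link AS l ON l.mr_id = mr.id "
        "WHERE l.pr_id = ? AND l.confidence = ? ORDER BY mr.file",
        (pr_id, confidence),
    )
    return [row[0] for row in rows]
```

Calling `files_for_pr(build_demo_rhdb(), 123456)` returns the two files changed for the same problem report, which is one simple way to see files coupled via a bug.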
11. Conclusions from 2003
RHDB offers some benefits for evolution analysis
qualified links between changes and bugs
files logically coupled via changes and bugs
branch/merge revision data
Data set as a basis for further analyses and
visualizations (e.g. MDS-view)
A basis for data exchange among research groups in
the direction of a meta-model for release data
12. Next steps: outlook from 2003
Further revise and develop meta-model for release data
exchange
Provide a qualified set of queries to the RHDB
Integrate with other evolution analyses and evolution
data in a framework
bug report data
modification report data
test data and properties
feature information
multi-dimensional visualization
18. What is referenced?
...Also, numerical bug IDs mentioned in the commit log, are
linked back to the issue tracking system’s identifiers [21,
44]... (F. Rahman, et al.)
...First, we searched for keywords such as “bug”, “bugs”,
“bug fixes”, and “fixed bug”, or references to bug IDs in log
files; ... [39, 15, 29]... (P. Bhattacharya et al.)
...While modern systems like Subclipse (http://subclipse.tigris.org)
allow to link bug reports and code modifications, most of the
time these links are not available [12]... (W. Poncin, et al.)
20. Mining Software Repositories
Does distributed development affect software quality?
Cross-project defect prediction: when does it work?
Visual (Effort Estimation) Patterns in Issue Tracking Data
Visual Understanding of Source Code Dependencies
Analyzing the co-evolution of comments and code
Predicting the fix time of bugs
Supporting developers with Natural Language Queries
Can Developer-Module Networks Predict Failures?
Interactive Views for Analyzing Problem Reports
21. From RHDB to
Change Type Analysis: Change Distiller
Beat Fluri and Harald Gall
25. Source Code Changes using ASTs
Using tree differencing, we can determine
enclosing entity (root node)
kind of statement which changed (node information)
kind of change (tree edit operation)

    public void method(D d) {
        if (d != null) {
            d.foo();
            d.bar();
        }
    }

    public void method(D d) {
        d.foo();
        d.bar();
    }
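The three pieces of information listed on this slide can be illustrated with a deliberately naive tree comparison. This is not ChangeDistiller's algorithm: the `Node` class, the statement kind names, and the `naive_diff` function are hypothetical, nodes are matched only by identical kind and text, and the sketch assumes each (kind, text) pair occurs once per tree. It also assumes that the version with the null check is the older one, which the slide does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # kind of statement (node information)
    label: str                     # source text of the node
    children: list = field(default_factory=list)


def index(root):
    """Map (kind, label) of every descendant to the label of its parent."""
    parents = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            parents[(child.kind, child.label)] = node.label
            stack.append(child)
    return parents


def naive_diff(old, new):
    """Return sorted (operation, kind, label) tuples for changed nodes."""
    old_idx, new_idx = index(old), index(new)
    changes = []
    for key, parent in old_idx.items():
        if key not in new_idx:
            changes.append(("delete",) + key)
        elif new_idx[key] != parent:
            changes.append(("move",) + key)   # same node, different parent
    for key in new_idx:
        if key not in old_idx:
            changes.append(("insert",) + key)
    return sorted(changes)


def calls():
    return [Node("METHOD_INVOCATION", "d.foo();"),
            Node("METHOD_INVOCATION", "d.bar();")]


# Hand-built trees for the two method versions shown on the slide.
old_version = Node("METHOD", "method(D d)",
                   [Node("IF_STATEMENT", "d != null", calls())])
new_version = Node("METHOD", "method(D d)", calls())

changes = naive_diff(old_version, new_version)
```

Here the root node `method(D d)` is the enclosing entity, and `changes` holds one deletion of the `IF_STATEMENT` and two moves of the method invocations from the `if` body to the method body.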
29. SOFtware Analysis Services
The actual repository analysis is offered as a service
The user selects the analysis with the data to be
analyzed and gets the results (workflow)
Data is key; analyses can be lengthy and expensive!
31. SEON Pyramid(s)
www.se-on.org
[Figure: SEON pyramids for Bugs, Code, and History. Layers: General Concepts, Domain Specific Concepts, System Specific Concepts.]
Concepts shown: Issue Tracking (Bugzilla, Trac), Source Code (Java, C#), Version Control (CVS, SVN, GIT), Change Coupling, Change Types, Software Design Metrics
32. Semantic Links
[Figure: three services and the resources they produce]
Bug History extractor: http://myProject.org/bugs/nr124
Version Control history extractor: http://ifi.uzh.ch/svnImporter/myProject/Foo.java23
Bug-Revision linker: http://myProject.org/bugs/nr124 http://sofas.org/bugOntology/affects http://ifi.uzh.ch/svnImporter/myProject/Foo.java23
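The output of the Bug-Revision linker on this slide reads as a subject, predicate, object triple. A minimal sketch of that reading, using the three URIs from the slide and serializing them in N-Triples style; the helper name `to_ntriple` is an assumption and SOFAS itself may represent links differently.

```python
# URIs taken from the slide.
BUG = "http://myProject.org/bugs/nr124"
AFFECTS = "http://sofas.org/bugOntology/affects"
FILE_REVISION = "http://ifi.uzh.ch/svnImporter/myProject/Foo.java23"


def to_ntriple(subject, predicate, obj):
    """Serialize three URIs as one N-Triples statement."""
    return f"<{subject}> <{predicate}> <{obj}> ."


link = to_ntriple(BUG, AFFECTS, FILE_REVISION)
```

Because both ends of the link are URIs minted by independent services, the linker can connect them without either extractor knowing about the other.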
33. Current SOFAS services
Data Gatherers
Version history extractor for CVS, SVN, GIT, and Mercurial
Issue tracking history for Bugzilla, Trac, SourceForge, Jira
Basic Services
Meta-model extractors for Java and C# (FAMIX)
Change coupling, change types
Issue-revision linker
Metrics service
Composite services
Evolutionary hot-spots
Highly changing Code Clones
and many more ...
36. From RHDB to
Defect Prediction
Emanuel Giger, Martin Pinzger, and Harald Gall
37. RHDB for Defect Prediction
Some of our defect prediction papers
Predicting defect densities in source code files with decision
tree learners (MSR 2006)
Improving defect prediction using temporal features and non
linear models (IWPSE 2007)
Predicting the fix time of bugs (RSSE 2010)
Comparing fine-grained source code changes and code
churn for bug prediction (MSR 2011)
Method-Level Bug Prediction (ESEM 2012)
38. Prediction granularity
Large files are typically the most bug-prone files
[Figure: classes class 1, class 2, class 3, ..., class n]
11 methods per class on average
4 are bug prone (ca. 36%)
Goal: Prediction model to identify bug-prone methods
39. Approach for defect prediction
... how many of them (Bugs), and (3) fine-grained source code changes (SCC).
[Figure: data extraction pipeline]
1. Versioning Data: CVS, SVN, GIT -> Evolizer -> RHDB (Log Entries)
2. Bug Data: Log Entries -> Message (#bug123) -> Bug
3. Source Code Changes (SCC): Subsequent Versions (1.1, 1.2) -> AST Comparison (ChangeDistiller) -> Changes
4. Experiment: Support Vector Machine
41. Results: Product and process metrics
Models computed with change metrics (CM) perform
better than with source-code metrics (SCM)
authors and methodHistories are the most important
measures
Table 4: Median classification results over all projects per classifier and per model

          CM               SCM              CM&SCM
          AUC  P    R      AUC  P    R      AUC  P    R
RndFor    .95  .84  .88    .72  .5   .64    .95  .85  .95
SVM       .96  .83  .86    .7   .48  .63    .95  .8   .96
BN        .96  .82  .86    .73  .46  .73    .96  .81  .96
J48       .95  .84  .82    .69  .56  .58    .91  .83  .89

... values of the code metrics model are approximately 0.7 for
each classifier, which is defined by Lessmann et al. as "promising" [26].
However, the source code metrics suffer from considerably low
precision values. The highest median precision ...
42. Lessons from Defect Prediction
Bug predictions do work
Cross-project predictions do not really work
Data sets (systems) as benchmark
Data preprocessing and learners need to be calibrated
Studies need to be replicable (systematically)
46. Changes in Data Types
Change Type          AmazonEC2  FedEx Rate  FedEx Ship  FedEx Pkg
XSDTypeA                   409         234         157          0
XSDTypeC                   160         295         280          6
XSDTypeD                     2          71          28          0
XSDElementA                208           2          25          0
XSDElementC                  1           0          18          0
XSDElementD                  0           2           0          0
XSDAttributeGroupA           6           0           0          0
XSDAttributeGroupC           5           0           0          0
Total                      791         604         508          6
Data types are added but rarely deleted
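The claim can be checked directly against the table. The short sketch below recomputes the per-service totals and then groups the counts by the last letter of each change type. Reading the suffixes A, C, and D as added, changed, and deleted is an assumption based on the slide's wording; the numbers are copied from the table.

```python
# Counts per service, in the order:
# AmazonEC2, FedEx Rate, FedEx Ship, FedEx Pkg
changes = {
    "XSDTypeA": [409, 234, 157, 0],
    "XSDTypeC": [160, 295, 280, 6],
    "XSDTypeD": [2, 71, 28, 0],
    "XSDElementA": [208, 2, 25, 0],
    "XSDElementC": [1, 0, 18, 0],
    "XSDElementD": [0, 2, 0, 0],
    "XSDAttributeGroupA": [6, 0, 0, 0],
    "XSDAttributeGroupC": [5, 0, 0, 0],
}

# Column sums reproduce the "Total" row of the table.
totals = [sum(column) for column in zip(*changes.values())]

# Group by suffix: A = added, C = changed, D = deleted (assumed meaning).
by_operation = {"A": 0, "C": 0, "D": 0}
for change_type, counts in changes.items():
    by_operation[change_type[-1]] += sum(counts)

deleted_share = by_operation["D"] / sum(by_operation.values())
```

Across all four services this gives 1041 additions, 765 changes, and 103 deletions, so deletions make up roughly 5% of the 1909 recorded changes.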
47. What we learned from WSDL evolution
Users of the FedEx service
Data types change frequently
Operations are more stable
Users of the AmazonEC2 service
New operations are continuously added
Data types change frequently adding new elements
Analyzing the Evolution of Web Services using Fine-Grained Changes
D. Romano and M. Pinzger, ICWS 2012