Populating a Release History Database (ICSM 2013 MIP)

ICSM 2013 MIP
Populating a Release History DB
Michael Fischer, Martin Pinzger, Harald C. Gall
University of Zurich, Switzerland
University of Klagenfurt, Austria

Roadmap
Back to 2003
Impact of the Work
Mining Software Repositories
From RHDB to recent research

Motivation back in 2003
Version control and bug tracking systems need to be integrated
they hold large amounts of historical information that can give insights
but they provide only insufficient support for a detailed analysis of software evolution
Our goal was
to populate a Release History Database that combines version data with bug tracking data and adds missing information not covered by version control systems, such as merge points or bug links
to enable systematic queries on the structured data to obtain meaningful views showing the evolution of a software project
to enable more accurate reasoning about evolutionary aspects

Populating a Release History DB
Problem = re-establishment of links between modification
reports (MRs) and problem reports (PRs) since no mechanisms
are provided by CVS
We use PR IDs found in the MRs of CVS
PR IDs in MRs are detected using a set of regular expressions.
A match is rated according to the confidence value:
high (h), medium (m), or low (l)
confidence is considered high if expressions such as <keyword><ID> can be detected
confidence is considered low if a six-digit number just appears somewhere in the text of a modification report without a preceding keyword
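
The rating scheme above could be sketched in Java roughly as follows. This is a minimal illustration, not the original RHDB expressions, which are not listed on the slides. The class name PrLinkRater, the keyword set (bug, pr, issue), the digit ranges, and the rule used for medium confidence (a "#" directly before the number) are all assumptions made for this example.

```java
import java.util.regex.Pattern;

/** Rates how confidently a CVS log message references a problem report (PR). */
final class PrLinkRater {
    enum Confidence { HIGH, MEDIUM, LOW, NONE }

    // Assumed keyword set; the slides do not list the real keywords.
    private static final String KEYWORDS = "(?:bug|pr|issue)";

    // <keyword><ID>: a keyword directly followed by the number, e.g. "bug 123456" or "bug #123456".
    private static final Pattern KEYWORD_ID = Pattern.compile(
            "\\b" + KEYWORDS + "\\s*[:#]?\\s*\\d{3,6}\\b", Pattern.CASE_INSENSITIVE);

    // Assumption: "#123456" without a keyword is rated medium (the slides do not define medium).
    private static final Pattern HASH_ID = Pattern.compile("#\\d{3,6}\\b");

    // A bare six-digit number anywhere in the text, without a preceding keyword.
    private static final Pattern BARE_ID = Pattern.compile("\\b\\d{6}\\b");

    /** Returns the highest confidence level whose pattern occurs in the message. */
    static Confidence rate(String message) {
        if (KEYWORD_ID.matcher(message).find()) return Confidence.HIGH;
        if (HASH_ID.matcher(message).find()) return Confidence.MEDIUM;
        if (BARE_ID.matcher(message).find()) return Confidence.LOW;
        return Confidence.NONE;
    }

    public static void main(String[] args) {
        String[] messages = {
            "Fix for bug 123456", "see #123456", "merged 123456 from trunk", "cleanup whitespace"
        };
        for (String m : messages) {
            System.out.println(rate(m) + "  " + m);
        }
    }
}
```

Checking the patterns from most to least specific means a message is rated by its strongest evidence, so a keyword match is never downgraded by an additional bare number in the same text.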

Building a Release History DB
3 main sources:
Modification reports (MR): CVS
Problem reports (PR): Bugzilla
program and patch information: Release Packages
Relevant MRs and PRs are filtered, validated and
stored in a Release History DB (RHDB)

Views on Mozilla evolution
50% of files have been modified in the last quarter of the observation period
although only 25% of files have been integrated

Views on Mozilla evolution /2
[Figure not reproduced; labels: modules, size]

Conclusions from 2003
RHDB offers some benefits for evolution analysis
qualified links between changes and bugs
files logically coupled via changes and bugs
branch/merge revision data
Data set as a basis for further analyses and
visualizations (e.g. MDS-view)
A basis for data exchange among research groups in
the direction of a meta-model for release data
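
To illustrate what "files logically coupled via changes" can mean in practice, the following sketch counts how often two files are modified together. It assumes the per-file CVS revisions have already been grouped into change sets; the class name ChangeCoupling, the method name, and the sample file names are hypothetical and not taken from the RHDB.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Counts co-changes: how often two files appear in the same change set. */
final class ChangeCoupling {

    /** Maps each unordered file pair ("A <-> B") to the number of change sets containing both. */
    static Map<String, Integer> coChangeCounts(List<List<String>> changeSets) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> changeSet : changeSets) {
            // Deduplicate and sort so each pair gets one canonical key per change set.
            List<String> files = new ArrayList<>(new TreeSet<>(changeSet));
            for (int i = 0; i < files.size(); i++) {
                for (int j = i + 1; j < files.size(); j++) {
                    counts.merge(files.get(i) + " <-> " + files.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical change sets for illustration only.
        List<List<String>> changeSets = Arrays.asList(
                Arrays.asList("A.java", "B.java"),
                Arrays.asList("B.java", "A.java", "C.java"),
                Arrays.asList("C.java"));
        System.out.println(coChangeCounts(changeSets));
    }
}
```

Pairs with a high count are candidates for logical coupling. The same counting could be applied to sets of files linked to the same problem report instead of the same change set.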

Next steps: outlook from 2003
Further revise and develop meta-model for release data
exchange
Provide a qualified set of queries to the RHDB
Integrate with other evolution analyses and evolution
data in a framework
bug report data
modification report data
test data and properties
feature information
multi-dimensional visualization

Citations
Google Scholar: 387
with MSR 2004 in Edinburgh, 2005 in St. Louis

What is referenced?
...Also, numerical bug IDs mentioned in the commit log, are
linked back to the issue tracking system’s identifiers [21,
44]... (F. Rahman, et al.)
...First, we searched for keywords such as “bug”, “bugs”,
“bug fixes”, and “fixed bug”, or references to bug IDs in log
files; ... [39, 15, 29]... (P. Bhattacharya et al.)
...While modern systems like Subclipse (http://subclipse.tigris.org)
allow to link bug reports and code modifications, most of the time
these links are not available [12]... (W. Poncin, et al.)

From RHDB to

Does distributed development affect software quality?
Cross-project defect prediction: when does it work?
Visual (Effort Estimation) Patterns in Issue Tracking Data
Visual Understanding of Source Code Dependencies
Analyzing the co-evolution of comments and code
Predicting the fix time of bugs
Supporting developers with Natural Language Queries
Can Developer-Module Networks Predict Failures?
Interactive Views for Analyzing Problem Reports

From RHDB to
Change Type Analysis: Change Distiller
Beat Fluri and Harald Gall

Source Code Changes using ASTs
Using tree differencing, we can determine
enclosing entity (root node)
kind of statement which changed (node information)
kind of change (tree edit operation)

Version with null check:
    public void method(D d) {
        if (d != null) {
            d.foo();
            d.bar();
        }
    }

Version without null check:
    public void method(D d) {
        d.foo();
        d.bar();
    }

ChangeDistiller Model
SourceCodeEntity: uniqueName, shortName, type
ChangeOperation: structureEntity, sourceCodeEntity, type
Insert: parentEntity
Delete: parentEntity
Move: oldParentEntity, newParentEntity
Update: newEntity, parentEntity
StructureEntity: uniqueName, type, bodyChanges, declarationChanges
SourceCodeChange: changeType, changeOperations
StructureEntityVersion: structureEntity, version
AttributeHistory: attributeVersions
MethodHistory: methodVersions
ClassHistory: classVersions, attributeHistories, innerClassHistories, methodHistories
BodyChange, DeclarationChange
Revision (link to org.evolizer.model.versioning)
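
The four change operations of the model can be pictured as small Java classes. The sketch below is a simplified illustration and not ChangeDistiller's actual API: the wrapper class ChangeOps, the use of plain strings for entities, and the describe() method are assumptions, while the field names follow the model slide. The edit script in removeNullCheck() is one plausible way to express the null-check example from the AST slide, assuming the version with the null check is the older one. It is not claimed to be the tool's real output.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified tree edit operations; field names follow the model slide. */
final class ChangeOps {
    abstract static class ChangeOperation {
        final String sourceCodeEntity; // the statement or declaration that changed
        ChangeOperation(String sourceCodeEntity) { this.sourceCodeEntity = sourceCodeEntity; }
        abstract String describe();
    }

    static final class Insert extends ChangeOperation {
        final String parentEntity;
        Insert(String entity, String parentEntity) { super(entity); this.parentEntity = parentEntity; }
        @Override String describe() { return "insert '" + sourceCodeEntity + "' into '" + parentEntity + "'"; }
    }

    static final class Delete extends ChangeOperation {
        final String parentEntity;
        Delete(String entity, String parentEntity) { super(entity); this.parentEntity = parentEntity; }
        @Override String describe() { return "delete '" + sourceCodeEntity + "' from '" + parentEntity + "'"; }
    }

    static final class Move extends ChangeOperation {
        final String oldParentEntity;
        final String newParentEntity;
        Move(String entity, String oldParent, String newParent) {
            super(entity); this.oldParentEntity = oldParent; this.newParentEntity = newParent;
        }
        @Override String describe() {
            return "move '" + sourceCodeEntity + "' from '" + oldParentEntity + "' to '" + newParentEntity + "'";
        }
    }

    static final class Update extends ChangeOperation {
        final String newEntity;
        final String parentEntity;
        Update(String entity, String newEntity, String parentEntity) {
            super(entity); this.newEntity = newEntity; this.parentEntity = parentEntity;
        }
        @Override String describe() {
            return "update '" + sourceCodeEntity + "' to '" + newEntity + "' in '" + parentEntity + "'";
        }
    }

    /** One plausible edit script for removing the null check (illustrative assumption). */
    static List<ChangeOperation> removeNullCheck() {
        String method = "method(D)";
        String ifStatement = "if (d != null)";
        return Arrays.<ChangeOperation>asList(
                new Move("d.foo();", ifStatement, method),
                new Move("d.bar();", ifStatement, method),
                new Delete(ifStatement, method));
    }

    public static void main(String[] args) {
        for (ChangeOperation op : removeNullCheck()) {
            System.out.println(op.describe());
        }
    }
}
```

Each operation carries the three pieces of information named on the AST slide: the enclosing entity (the parent), the statement that changed, and the kind of tree edit.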

ChangeDistiller
https://bitbucket.org/sealuzh/tools-changedistiller/wiki/Home

From RHDB to
Software Analysis as a Service (SOFAS)
Giacomo Ghezzi and Harald Gall

SOFtware Analysis Services
The actual repository analysis is offered as a service
The user selects the analysis with the data to be
analyzed and gets the results (workflow)
Data is key; analyses can be lengthy and expensive!

SOFAS scenario
SVN History Service
OO Metrics Service
FAMIX Model Service

SEON Pyramid(s)
www.se-on.org
Layers: General Concepts, Domain Specific Concepts, System Specific Concepts
Bugs, Code, History
Domain specific concepts: Issue Tracking, Change Coupling, Change Types, Source Code, Software Design Metrics, Version Control
System specific concepts: Bugzilla, Trac (issue tracking); Java, C# (source code); CVS, SVN, GIT (version control)

Semantic Links
Bug History extractor: http://myProject.org/bugs/nr124
Version Control history extractor: http://ifi.uzh.ch/svnImporter/myProject/Foo.java23
Bug-Revision linker: http://myProject.org/bugs/nr124 http://sofas.org/bugOntology/affects http://ifi.uzh.ch/svnImporter/myProject/Foo.java23

Current SOFAS services
Data Gatherers
Version history extractor for CVS, SVN, GIT, and Mercurial
Issue tracking history for Bugzilla, Trac, SourceForge, Jira
Basic Services
Meta-model extractors for Java and C# (FAMIX)
Change coupling, change types
Issue-revision linker
Metrics service
Composite services
Evolutionary hot-spots
Highly changing Code Clones
and many more ...

Software Analysis Workflows

Software Facets - a glimpse

From RHDB to
Defect Prediction
Emanuel Giger, Martin Pinzger, and Harald Gall

RHDB for Defect Prediction
Some of our defect prediction papers
Predicting defect densities in source code files with decision
tree learners (MSR 2006)
Improving defect prediction using temporal features and non
linear models (IWPSE 2007)
Predicting the fix time of bugs (RSSE 2010)
Comparing fine-grained source code changes and code
churn for bug prediction (MSR 2011)
Method-Level Bug Prediction (ESEM 2012)

Prediction granularity
Large files are typically the most bug-prone files
class 1, class 2, class 3, ..., class n
11 methods on average
4 are bug prone (ca. 36%)
Goal: Prediction model to identify bug-prone methods

Approach for defect prediction
1. Versioning Data: CVS, SVN, GIT; Evolizer; RHDB; log entries
2. Bug Data: bug references in log messages (#bug123)
3. Source Code Changes (SCC): ChangeDistiller; AST comparison of subsequent versions (1.1, 1.2)
4. Experiment: Support Vector Machine

21 Java open source projects
Project        #Classes  #Methods  #M-Histories  #Bugs
JDT Core           1140     17703         43134   4888
Jena2               897      8340          7764    704
Lucene              477      3870          1754    377
Xerces              693      8189          6866   1017
Derby Engine       1394     18693          9507   1663
Ant Core            827      8698         17993   1900

Results: Product and process metrics
Models computed with change metrics (CM) perform better than with source-code metrics (SCM)
authors and methodHistories are the most important measures
Table 4: Median classification results over all projects per classifier and per model
          CM               SCM              CM&SCM
          AUC  P    R      AUC  P    R      AUC  P    R
RndFor    .95  .84  .88    .72  .5   .64    .95  .85  .95
SVM       .96  .83  .86    .7   .48  .63    .95  .8   .96
BN        .96  .82  .86    .73  .46  .73    .96  .81  .96
J48       .95  .84  .82    .69  .56  .58    .91  .83  .89
... values of the code metrics model are approximately 0.7 for each classifier, which is defined by Lessman et al. as "promising" [26]. However, the source code metrics suffer from considerably low precision values. The highest median precision ...
Lessons from Defect Prediction
Bug predictions do work
Cross-project predictions do not really work
Data sets (systems) as benchmark
Data preprocessing and learners need to be calibrated
Studies need to be replicable (systematically)

From RHDB to
Evolution of Service-oriented Systems
Daniele Romano and Martin Pinzger

Fine-Grained Changes for WSDLs
A: WSDL Parser (org.eclipse.wst.wsdl, org.eclipse.xsd): WSDL Version1 and WSDL Version2 are parsed into WSDL Model1 and WSDL Model2
B: XSD Transformer: WSDL Model1 and WSDL Model2 are transformed into WSDL Model1' and WSDL Model2'
C: Matching Engine (org.eclipse.compare.match): produces the Match Model
D: Differencing Engine (org.eclipse.compare.diff): produces the Diff Model

Changes in WSDLs
Change Type   AmazonEC2  FedEx Rate  FedEx Ship  FedEx Pkg
OperationA          113           1          10          0
OperationC            0           1           0          0
OperationD            9           1           4          0
MessageA            218           2          16          0
MessageC              2           0           2          0
MessageD             10           2           2          0
PartA                27           0           2          0
PartC                34           0           0          0
PartD                27           0           2          0
Total               440           7          38          0
Operations and messages are added but rarely deleted

Changes in Data Types
Change Type          AmazonEC2  FedEx Rate  FedEx Ship  FedEx Pkg
XSDTypeA                   409         234         157          0
XSDTypeC                   160         295         280          6
XSDTypeD                     2          71          28          0
XSDElementA                208           2          25          0
XSDElementC                  1           0          18          0
XSDElementD                  0           2           0          0
XSDAttributeGroupA           6           0           0          0
XSDAttributeGroupC           5           0           0          0
Total                      791         604         508          6
Data types are added but rarely deleted

What we learned from WSDL evolution
Users of the FedEx service
Data types change frequently
Operations are more stable
Users of the AmazonEC2 service
New operations are continuously added
Data types change frequently adding new elements
Analyzing the Evolution of Web Services using Fine-Grained Changes
D. Romano and M. Pinzger, ICWS 2012

What is next?
Timeline (years 2003, 2005, 2007, 2009, 2011, 2013)
RHDB
Evolizer, ChangeDistiller, SOFAS
DA4Java
ArchView
Empirical studies on quality, change and defect prediction
Spreadsheet analysis
Web 2.0 for understanding
Service quality
Ecosystem Evolution
Stakeholders Needs & Views
Replication Studies
Social Coding & Mining

Conclusions
Contact
gall@ifi.uzh.ch
martin.pinzger@aau.at
