ICSM 2013 MIP
Populating a Release History DB
Michael Fischer, Martin Pinzger, Harald C. Gall
University of Zurich, Switzerland
University of Klagenfurt, Austria
Roadmap
Back to 2003
Impact of the Work
Mining Software Repositories
From RHDB to recent research
2
Motivation back in 2003
Version control and bug tracking systems need to be integrated
their large amounts of historical information can give insights
but they provide insufficient support for a detailed analysis of software evolution
Our goal was
to populate a Release History Database that combines version data with bug tracking data and adds missing information not covered by version control systems, such as merge points or bug links
to enable systematic queries on the structured data to obtain meaningful views showing the evolution of a software project
to enable more accurate reasoning about evolutionary aspects
3
Populating a Release History DB
Problem: links between modification reports (MRs) and problem reports (PRs) have to be re-established, since CVS provides no mechanism for them
We use the PR IDs found in the MRs of CVS
PR IDs in MRs are detected using a set of regular expressions
Each match is rated with a confidence value: high (h), medium (m), or low (l)
confidence is considered high if expressions such as <keyword><ID> can be detected
confidence is considered low if a six-digit number just appears somewhere in the text of a modification report without a preceding keyword
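The linking step can be sketched in Java as follows. This is an illustration under assumptions, not the regular expressions used for the RHDB: the keyword list (bug, pr, fix), the accepted ID lengths, and the rule for medium confidence ("#" directly before a number) are hypothetical; only the high rule (<keyword><ID>) and the low rule (a bare six-digit number) follow the slide.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrLinker {
    // The three confidence levels named on the slide.
    enum Confidence { HIGH, MEDIUM, LOW }

    record Link(String prId, Confidence confidence) {}

    // High: <keyword><ID>. Keyword list and ID length are assumptions.
    private static final Pattern HIGH =
            Pattern.compile("(?i)\\b(?:bug|pr|fix)\\s*[#:]?\\s*(\\d{4,6})\\b");
    // Medium (hypothetical rule): "#" directly before a number, no keyword.
    private static final Pattern MEDIUM = Pattern.compile("#(\\d{4,6})\\b");
    // Low: a bare six-digit number anywhere in the message.
    private static final Pattern LOW = Pattern.compile("\\b(\\d{6})\\b");

    /** Returns each PR ID once, rated with the highest confidence that matched it. */
    static List<Link> detect(String message) {
        List<Link> links = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        collect(HIGH, Confidence.HIGH, message, links, seen);
        collect(MEDIUM, Confidence.MEDIUM, message, links, seen);
        collect(LOW, Confidence.LOW, message, links, seen);
        return links;
    }

    private static void collect(Pattern p, Confidence c, String message,
                                List<Link> out, Set<String> seen) {
        Matcher m = p.matcher(message);
        while (m.find()) {
            // Set.add returns false for IDs already rated by a stronger pattern.
            if (seen.add(m.group(1))) {
                out.add(new Link(m.group(1), c));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(detect("Fixed bug 123456: null check in parser"));
        System.out.println(detect("cleanup, see #23456"));
        System.out.println(detect("merge from branch, 345678"));
    }
}
```

Each PR ID is reported once with the strongest rating that matched it: "Fixed bug 123456" yields a high-confidence link, while a lone "345678" yields only a low-confidence one.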
4
Building a Release History DB
Three main sources:
Modification reports (MRs): CVS
Problem reports (PRs): Bugzilla
Program and patch information: release packages
Relevant MRs and PRs are filtered, validated, and stored in a Release History DB (RHDB)
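A minimal in-memory sketch of the filter, validate, and store step, assuming that validating a link means checking that the referenced PR actually exists in the Bugzilla data. The record and method names, and the file names in main, are hypothetical and do not reflect the actual RHDB schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class RhdbImport {
    /** A CVS modification report with the PR IDs detected in its log message. */
    record ModificationReport(String file, String revision, List<String> prIds) {}

    /** Stand-in for the RHDB link table: PR ID -> MRs that reference it. */
    static Map<String, List<ModificationReport>> storeValidatedLinks(
            List<ModificationReport> mrs, Set<String> knownPrIds) {
        Map<String, List<ModificationReport>> linksByPr = new TreeMap<>();
        for (ModificationReport mr : mrs) {
            for (String prId : mr.prIds()) {
                // Validation (assumed rule): drop links to PRs absent from the Bugzilla data.
                if (knownPrIds.contains(prId)) {
                    linksByPr.computeIfAbsent(prId, k -> new ArrayList<>()).add(mr);
                }
            }
        }
        return linksByPr;
    }

    public static void main(String[] args) {
        List<ModificationReport> mrs = List.of(
                new ModificationReport("nsParser.cpp", "1.12", List.of("123456")),
                new ModificationReport("nsParser.h", "1.4", List.of("123456", "999999")),
                new ModificationReport("README", "1.2", List.of()));  // no PR reference
        Set<String> bugzilla = Set.of("123456");
        storeValidatedLinks(mrs, bugzilla).forEach((pr, linked) ->
                System.out.println(pr + " -> " + linked.size() + " MRs"));
    }
}
```

MRs without any PR reference drop out naturally, and the link to the unknown PR 999999 is discarded, so main prints a single line for PR 123456 with two MRs.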
5
Import process
6
RHDB schema (meta-model)
7
Views on Mozilla evolution
50% of the files were modified in the last quarter of the observation period
although only 25% of the files were integrated
8
Views on Mozilla evolution /2
modules
size
9
Feature evolution
10
Conclusions from 2003
RHDB offers some benefits for evolution analysis:
qualified links between changes and bugs
files logically coupled via changes and bugs
branch/merge revision data
Data set as a basis for further analyses and visualizations (e.g. MDS-view)
A basis for data exchange among research groups in the direction of a meta-model for release data
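One way to query "files logically coupled via bugs" is to count, for each pair of files, how many PRs both files were changed for. The sketch below assumes the mapping from PR ID to changed files has already been read from the RHDB; this coupling measure, the class and method names, and the file names are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class LogicalCoupling {
    /**
     * Counts, for each pair of files, how many PRs both files were changed for.
     * Input: PR ID -> files touched by the MRs linked to that PR.
     */
    static Map<String, Integer> couplingViaBugs(Map<String, Set<String>> filesByPr) {
        Map<String, Integer> pairCounts = new TreeMap<>();
        for (Set<String> files : filesByPr.values()) {
            List<String> sorted = List.copyOf(new TreeSet<>(files));
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    // Sorted order gives each unordered pair a single key.
                    pairCounts.merge(sorted.get(i) + " <-> " + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        return pairCounts;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> filesByPr = Map.of(
                "100001", Set.of("a.cpp", "b.cpp"),
                "100002", Set.of("a.cpp", "b.cpp", "c.cpp"),
                "100003", Set.of("c.cpp"));
        couplingViaBugs(filesByPr).forEach((pair, n) -> System.out.println(pair + ": " + n));
    }
}
```

With the data in main, a.cpp and b.cpp are coupled through two PRs, while each pair involving c.cpp is coupled through one; a PR touching a single file contributes no pair.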
11
Next steps: outlook from 2003
Further revise and develop the meta-model for release data exchange
Provide a qualified set of queries to the RHDB
Integrate with other evolution analyses and evolution data in a framework:
bug report data
modification report data
test data and properties
feature information
multi-dimensional visualization
12
Impact of RHDB work
Google Scholar: 387
with MSR 2004 in Edinburgh, 2005 in St. Louis
Citations
14
Paper titles (top 16)
15
Conferences
16
Paper authors (top 100)
17
What is referenced?
...Also, numerical bug IDs mentioned in the commit log, are linked back to the issue tracking system’s identifiers [21, 44]... (F. Rahman, et al.)
...First, we searched for keywords such as “bug”, “bugs”, “bug fixes”, and “fixed bug”, or references to bug IDs in log files; ... [39, 15, 29]... (P. Bhattacharya et al.)
...While modern systems like Subclipse (http://subclipse.tigris.org) allow to link bug reports and code modifications, most of the time these links are not available [12]... (W. Poncin, et al.)
18
From RHDB to
Mining Software Repositories
Mining Software Repositories
Does distributed development affect software quality?
Cross-project defect prediction: when does it work?
Visual (Effort Estimation) Patterns in Issue Tracking Data
Visual Understanding of Source Code Dependencies
Analyzing the co-evolution of comments and code
Predicting the fix time of bugs
Supporting developers with Natural Language Queries
Can Developer-Module Networks Predict Failures?
Interactive Views for Analyzing Problem Reports
20
From RHDB to
Change Type Analysis: ChangeDistiller
Beat Fluri and Harald Gall
Source Code Changes using ASTs
Using tree differencing, we can determine:
enclosing entity (root node)
kind of statement which changed (node information)
kind of change (tree edit operation)

public void method(D d) {
    if (d != null) {
        d.foo();
        d.bar();
    }
}

public void method(D d) {
    d.foo();
    d.bar();
}
25
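To make these three pieces of information concrete, here is a deliberately naive Java sketch that diffs the two versions of method(D d). It assumes every statement label is unique, so nodes can be matched by exact label; ChangeDistiller itself uses a more robust, similarity-based node matching, so treat this only as an illustration of what a change record contains. The node kinds ("IfStatement", "MethodInvocation") are illustrative labels.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ToyTreeDiff {
    // Kind of change: the tree edit operation.
    enum Op { INSERT, DELETE, MOVE }

    // A statement node: its kind (node information) and the label of its parent.
    record Stmt(String kind, String parent) {}

    // One change: enclosing entity, edit operation, node kind, node label, parents.
    record Change(String rootEntity, Op op, String kind, String label,
                  String oldParent, String newParent) {}

    /** Naive diff of two trees given as label -> Stmt; assumes unique labels. */
    static List<Change> diff(String rootEntity, Map<String, Stmt> oldTree, Map<String, Stmt> newTree) {
        List<Change> changes = new ArrayList<>();
        for (Map.Entry<String, Stmt> e : oldTree.entrySet()) {
            Stmt before = e.getValue();
            Stmt after = newTree.get(e.getKey());
            if (after == null) {
                changes.add(new Change(rootEntity, Op.DELETE, before.kind(), e.getKey(), before.parent(), null));
            } else if (!after.parent().equals(before.parent())) {
                changes.add(new Change(rootEntity, Op.MOVE, before.kind(), e.getKey(), before.parent(), after.parent()));
            }
        }
        for (Map.Entry<String, Stmt> e : newTree.entrySet()) {
            if (!oldTree.containsKey(e.getKey())) {
                Stmt added = e.getValue();
                changes.add(new Change(rootEntity, Op.INSERT, added.kind(), e.getKey(), null, added.parent()));
            }
        }
        return changes;
    }

    // The version of method(D d) with the null check.
    static Map<String, Stmt> withNullCheck() {
        Map<String, Stmt> t = new LinkedHashMap<>();
        t.put("if (d != null)", new Stmt("IfStatement", "method(D)"));
        t.put("d.foo();", new Stmt("MethodInvocation", "if (d != null)"));
        t.put("d.bar();", new Stmt("MethodInvocation", "if (d != null)"));
        return t;
    }

    // The version of method(D d) without the null check.
    static Map<String, Stmt> withoutNullCheck() {
        Map<String, Stmt> t = new LinkedHashMap<>();
        t.put("d.foo();", new Stmt("MethodInvocation", "method(D)"));
        t.put("d.bar();", new Stmt("MethodInvocation", "method(D)"));
        return t;
    }

    public static void main(String[] args) {
        // Adding the null check: both calls move under the if, and the if is inserted.
        diff("method(D)", withoutNullCheck(), withNullCheck()).forEach(System.out::println);
    }
}
```

Diffing from the version without the null check to the one with it yields two MOVE operations (d.foo(); and d.bar(); now hang below the if) followed by one INSERT of the IfStatement, all within the enclosing entity method(D). The opposite direction yields one DELETE and two MOVEs.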
ChangeDistiller Model
[Figure: ChangeDistiller model with the classes SourceCodeEntity, ChangeOperation (with Insert, Delete, Move, and Update), StructureEntity, SourceCodeChange, BodyChange, DeclarationChange, StructureEntityVersion, AttributeHistory, MethodHistory, ClassHistory, and Revision (link to org.evolizer.model.versioning)]
26
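The change operations of the model could be mirrored with Java records, as sketched below. The class names and field names (uniqueName, shortName, type, parentEntity, oldParentEntity, newParentEntity, newEntity) are taken from the labels in the model figure, but which field belongs to which class is our reading of a flattened diagram, so treat the structure as an assumption; the entities in main are hypothetical. The sketch needs Java 17 for the sealed interface.

```java
import java.util.List;

public class ChangeModelSketch {
    // Field names follow the figure's labels; their placement is assumed.
    record SourceCodeEntity(String uniqueName, String shortName, String type) {}

    // The four kinds of change operation shown in the model.
    sealed interface ChangeOperation {
        SourceCodeEntity sourceCodeEntity();
    }

    record Insert(SourceCodeEntity sourceCodeEntity, SourceCodeEntity parentEntity)
            implements ChangeOperation {}
    record Delete(SourceCodeEntity sourceCodeEntity, SourceCodeEntity parentEntity)
            implements ChangeOperation {}
    record Move(SourceCodeEntity sourceCodeEntity, SourceCodeEntity oldParentEntity,
                SourceCodeEntity newParentEntity) implements ChangeOperation {}
    record Update(SourceCodeEntity sourceCodeEntity, SourceCodeEntity newEntity,
                  SourceCodeEntity parentEntity) implements ChangeOperation {}

    /** Renders an operation as a short, human-readable sentence. */
    static String describe(ChangeOperation op) {
        String name = op.sourceCodeEntity().shortName();
        if (op instanceof Insert i) {
            return "insert " + name + " into " + i.parentEntity().shortName();
        } else if (op instanceof Delete d) {
            return "delete " + name + " from " + d.parentEntity().shortName();
        } else if (op instanceof Move m) {
            return "move " + name + " from " + m.oldParentEntity().shortName()
                    + " to " + m.newParentEntity().shortName();
        } else {
            // The interface is sealed, so Update is the only remaining case.
            Update u = (Update) op;
            return "update " + name + " to " + u.newEntity().shortName();
        }
    }

    public static void main(String[] args) {
        // Hypothetical entities for the method(D d) example.
        SourceCodeEntity method = new SourceCodeEntity("Foo.method(D)", "method(D)", "METHOD");
        SourceCodeEntity check = new SourceCodeEntity("Foo.method(D)#if", "if (d != null)", "IF_STATEMENT");
        SourceCodeEntity call = new SourceCodeEntity("Foo.method(D)#foo", "d.foo();", "METHOD_INVOCATION");
        List<ChangeOperation> ops = List.of(new Insert(check, method), new Move(call, method, check));
        ops.forEach(op -> System.out.println(describe(op)));
    }
}
```

Making the operation type a sealed interface lets the compiler know that Insert, Delete, Move, and Update are the only kinds of change, which matches the closed set of tree edit operations.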
ChangeDistiller
https://bitbucket.org/sealuzh/tools-changedistiller/wiki/Home
27
From RHDB to
Software Analysis as a Service (SOFAS)
Giacomo Ghezzi and Harald Gall
SOFtware Analysis Services
The actual repository analysis is offered as a service
The user selects an analysis and the data to be analyzed, and gets the results (workflow)
Data is key; analyses can be lengthy and expensive!
29
SOFAS scenario
[Figure: SVN History Service, FAMIX Model Service, and OO Metrics Service]
30
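The scenario can be read as a chain of services in which each one consumes its predecessor's result. The sketch below models that chaining with java.util.function.Function; the order (SVN history, then FAMIX model, then OO metrics), the placeholder result types, and the repository URL are assumptions, and real SOFAS services run remotely rather than as local functions.

```java
import java.util.function.Function;

public class SofasWorkflowSketch {
    // Placeholder result types; real services return much richer data.
    record SvnHistory(String repositoryUrl) {}
    record FamixModel(SvnHistory history) {}
    record OoMetrics(FamixModel model) {}

    // Hypothetical local stand-ins for the three services in the scenario.
    static final Function<String, SvnHistory> SVN_HISTORY_SERVICE = SvnHistory::new;
    static final Function<SvnHistory, FamixModel> FAMIX_MODEL_SERVICE = FamixModel::new;
    static final Function<FamixModel, OoMetrics> OO_METRICS_SERVICE = OoMetrics::new;

    /** Chains the services so that each one consumes its predecessor's result. */
    static Function<String, OoMetrics> workflow() {
        return SVN_HISTORY_SERVICE.andThen(FAMIX_MODEL_SERVICE).andThen(OO_METRICS_SERVICE);
    }

    public static void main(String[] args) {
        OoMetrics result = workflow().apply("https://example.org/svn/myProject");
        System.out.println(result);
    }
}
```

Keeping each service behind the same function shape is what makes composite services possible: a new analysis is just another step appended to the chain.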
[Figure: SEON pyramid(s), www.se-on.org. Levels: General Concepts; Domain Specific Concepts (Bugs, Code, History); System Specific Concepts (CVS, SVN, GIT, Java, C#, Bugzilla, Trac). Further labels: Issue Tracking (Bugzilla, Trac), Version Control (CVS, SVN, GIT), Source Code (Java, C#), Change Coupling, Change Types, Software Design Metrics]
31
Semantic Links
[Figure: the Bug History extractor provides http://myProject.org/bugs/nr124; the Version Control history extractor (http://ifi.uzh.ch/svnImporter) provides myProject/Foo.java23; the Bug-Revision linker connects the two: http://myProject.org/bugs/nr124 http://sofas.org/bugOntology/affects http://ifi.uzh.ch/svnImporter/myProject/Foo.java23]
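Read as RDF, the bug-revision link on this slide is a single triple. This sketch serializes it in N-Triples syntax; the choice of N-Triples and the helper names are assumptions, while the three IRIs are the ones on the slide (the revision IRI joined from its two lines).

```java
public class SemanticLinkSketch {
    /** An RDF statement linking resources produced by two different services. */
    record Triple(String subject, String predicate, String object) {
        /** Serializes the triple as one N-Triples line: three IRIs and a final dot. */
        String toNTriples() {
            return "<" + subject + "> <" + predicate + "> <" + object + "> .";
        }
    }

    /** What a bug-revision linker could emit once it has matched a bug to a file revision. */
    static Triple affects(String bugIri, String revisionIri) {
        return new Triple(bugIri, "http://sofas.org/bugOntology/affects", revisionIri);
    }

    public static void main(String[] args) {
        Triple link = affects("http://myProject.org/bugs/nr124",
                "http://ifi.uzh.ch/svnImporter/myProject/Foo.java23");
        System.out.println(link.toNTriples());
    }
}
```

Because both extractors publish their results under IRIs, the linker only has to add one statement per bug-revision pair; no extractor needs to know about the other's data format.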
Current SOFAS services
Data Gatherers
Version history extractor for CVS, SVN, GIT, and Mercurial
Issue tracking history for Bugzilla, Trac, SourceForge, Jira
Basic Services
Meta-model extractors for Java and C# (FAMIX)
Change coupling, change types
Issue-revision linker
Metrics service
Composite services
Evolutionary hot-spots
Highly changing Code Clones
and many more ...
33
Software Analysis Workflows
34
35
Software Facets - a glimpse
From RHDB to
Defect Prediction
Emanuel Giger, Martin Pinzger, and Harald Gall
RHDB for Defect Prediction
Some of our defect prediction papers
Predicting defect densities in source code files with decision tree learners (MSR 2006)
Improving defect prediction using temporal features and non linear models (IWPSE 2007)
Predicting the fix time of bugs (RSSE 2010)
Comparing fine-grained source code changes and code churn for bug prediction (MSR 2011)
Method-Level Bug Prediction (ESEM 2012)
37
Prediction granularity
Classes have 11 methods on average, of which 4 are bug-prone (ca. 36%)
Goal: Prediction model to identify bug-prone methods
38
Large files are typically the most bug-prone files
Approach for defect prediction
how many of them (Bugs), and (3) fine-grained source code
changes (SCC).
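[Figure: approach in four steps. 1. Versioning data: log entries from CVS, SVN, and GIT imported into the RHDB with Evolizer. 2. Bug data: commit messages with bug references (e.g. #bug123). 3. Source code changes (SCC): extracted by ChangeDistiller through AST comparison of subsequent versions (e.g. 1.1 and 1.2). 4. Experiment: Support Vector Machine]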
39
21 Java open source projects
40
Project       #Classes   #Methods   #M-Histories   #Bugs
JDT Core          1140      17703          43134    4888
Jena2              897       8340           7764     704
Lucene             477       3870           1754     377
Xerces             693       8189           6866    1017
Derby Engine      1394      18693           9507    1663
Ant Core           827       8698          17993    1900
Results: Product and process metrics
Models computed with change metrics (CM) perform better than models computed with source code metrics (SCM)
authors and methodHistories are the most important measures
41
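The two most important change metrics can be derived from method-level change data roughly as follows. This is a sketch under assumptions: methodHistories is taken here as the number of distinct revisions in which a method changed, authors as the number of distinct developers who changed it, and the input records and names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MethodChangeMetrics {
    /** One fine-grained change to a method, e.g. as extracted from subsequent versions. */
    record MethodChange(String method, String author, String revision) {}

    /** Two change metrics for a single method. */
    record Metrics(int methodHistories, int authors) {}

    static Map<String, Metrics> compute(List<MethodChange> changes) {
        Map<String, Set<String>> revisions = new HashMap<>();
        Map<String, Set<String>> authors = new HashMap<>();
        for (MethodChange c : changes) {
            // Sets make repeated changes in one revision (or by one author) count once.
            revisions.computeIfAbsent(c.method(), k -> new HashSet<>()).add(c.revision());
            authors.computeIfAbsent(c.method(), k -> new HashSet<>()).add(c.author());
        }
        Map<String, Metrics> result = new HashMap<>();
        for (String method : revisions.keySet()) {
            result.put(method, new Metrics(revisions.get(method).size(), authors.get(method).size()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<MethodChange> changes = List.of(
                new MethodChange("Parser.parse()", "alice", "1.2"),
                new MethodChange("Parser.parse()", "alice", "1.2"),  // second change, same revision
                new MethodChange("Parser.parse()", "bob", "1.5"),
                new MethodChange("Lexer.next()", "carol", "1.3"));
        compute(changes).forEach((m, metrics) -> System.out.println(m + " " + metrics));
    }
}
```

For the data in main, Parser.parse() gets methodHistories=2 and authors=2, because its two changes in revision 1.2 count as one history entry.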
Table 4: Median classification results over all projects per classifier and per model
            CM                SCM               CM&SCM
Classifier  AUC   P     R     AUC   P     R     AUC   P     R
RndFor      .95   .84   .88   .72   .50   .64   .95   .85   .95
SVM         .96   .83   .86   .70   .48   .63   .95   .80   .96
BN          .96   .82   .86   .73   .46   .73   .96   .81   .96
J48         .95   .84   .82   .69   .56   .58   .91   .83   .89
(AUC = area under the ROC curve, P = precision, R = recall; RndFor = random forest, BN = Bayesian network, J48 = Weka's C4.5 decision tree)
[...] values of the code metrics model are approximately 0.7 for each classifier, which Lessmann et al. define as "promising" [26]. However, the source code metrics suffer from considerably low precision values. The highest median precision [...]
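The three measures in Table 4 can be computed for any classifier's output with a few lines of Java. This sketch computes precision and recall at a fixed decision threshold, and the AUC in its rank form: the probability that a randomly chosen bug-prone method scores higher than a randomly chosen clean one, with ties counting half. The data and the 0.5 threshold in main are made up for illustration, and the study's tooling may compute these values differently.

```java
import java.util.Locale;

public class ClassifierMetrics {
    /** Precision = TP / (TP + FP): share of methods predicted bug-prone that really are. */
    static double precision(boolean[] actual, boolean[] predicted) {
        int tp = 0, fp = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] && actual[i]) tp++;
            if (predicted[i] && !actual[i]) fp++;
        }
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Recall = TP / (TP + FN): share of truly bug-prone methods that were predicted as such. */
    static double recall(boolean[] actual, boolean[] predicted) {
        int tp = 0, fn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] && actual[i]) tp++;
            if (!predicted[i] && actual[i]) fn++;
        }
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** AUC in rank form: P(score of a bug-prone method > score of a clean one), ties count 0.5. */
    static double auc(boolean[] actual, double[] score) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < actual.length; i++) {
            if (!actual[i]) continue;           // i ranges over bug-prone methods
            for (int j = 0; j < actual.length; j++) {
                if (actual[j]) continue;        // j ranges over clean methods
                pairs++;
                if (score[i] > score[j]) wins += 1.0;
                else if (score[i] == score[j]) wins += 0.5;
            }
        }
        return pairs == 0 ? Double.NaN : wins / pairs;
    }

    public static void main(String[] args) {
        boolean[] actual = {true, true, false, false, true, false};
        double[] score = {0.9, 0.6, 0.7, 0.2, 0.8, 0.1};
        boolean[] predicted = new boolean[score.length];
        for (int i = 0; i < score.length; i++) {
            predicted[i] = score[i] >= 0.5;     // fixed decision threshold
        }
        System.out.println(String.format(Locale.ROOT, "P=%.2f R=%.2f AUC=%.2f",
                precision(actual, predicted), recall(actual, predicted), auc(actual, score)));
    }
}
```

For the data in main this prints P=0.75 R=1.00 AUC=0.89: one clean method is flagged (lowering precision), no bug-prone method is missed, and 8 of the 9 bug-prone/clean pairs are ranked correctly. Unlike precision and recall, the AUC does not depend on the threshold, which is one reason it is the headline measure in the table.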
Lessons from Defect Prediction
Bug predictions do work
Cross-project predictions do not really work
Data sets (systems) as benchmark
Data preprocessing and learners need to be calibrated
Studies need to be replicable (systematically)
42
From RHDB to
Evolution of Service-oriented Systems
Daniele Romano and Martin Pinzger
Fine-Grained Changes for WSDLs
44
[Figure: WSDL differencing pipeline with steps A to D: a WSDL parser (org.eclipse.wst.wsdl, org.eclipse.xsd) reads WSDL Version1 and Version2 into WSDL Model1 and Model2; an XSD transformer produces WSDL Model1' and Model2'; a matching engine (org.eclipse.compare.match) produces a match model; a differencing engine (org.eclipse.compare.diff) produces a diff model]
Changes in WSDLs
45
Change Type   AmazonEC2   FedEx Rate   FedEx Ship   FedEx Pkg
OperationA          113            1           10           0
OperationC            0            1            0           0
OperationD            9            1            4           0
MessageA            218            2           16           0
MessageC              2            0            2           0
MessageD             10            2            2           0
PartA                27            0            2           0
PartC                34            0            0           0
PartD                27            0            2           0
Total               440            7           38           0
(suffix A = added, C = changed, D = deleted)
Operations and messages are added but rarely deleted
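The totals in this table follow from summing the counts per suffix. The sketch assumes that the suffixes A, C, and D stand for added, changed, and deleted elements; the class and method names are hypothetical.

```java
import java.util.Map;

public class WsdlChangeSummary {
    /** Sums the counts whose change type ends with the given suffix (A, C or D). */
    static int sumBySuffix(Map<String, Integer> counts, String suffix) {
        int total = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey().endsWith(suffix)) total += e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // AmazonEC2 column of the WSDL change table.
        Map<String, Integer> amazonEc2 = Map.of(
                "OperationA", 113, "OperationC", 0, "OperationD", 9,
                "MessageA", 218, "MessageC", 2, "MessageD", 10,
                "PartA", 27, "PartC", 34, "PartD", 27);
        int added = sumBySuffix(amazonEc2, "A");
        int deleted = sumBySuffix(amazonEc2, "D");
        System.out.println("added=" + added + " deleted=" + deleted
                + " total=" + (added + sumBySuffix(amazonEc2, "C") + deleted));
    }
}
```

With the AmazonEC2 column it prints added=358 deleted=46 total=440, matching the table's total and quantifying "added but rarely deleted": additions outnumber deletions by more than seven to one.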
Changes in Data Types
46
Change Type         AmazonEC2   FedEx Rate   FedEx Ship   FedEx Pkg
XSDTypeA                  409          234          157           0
XSDTypeC                  160          295          280           6
XSDTypeD                    2           71           28           0
XSDElementA               208            2           25           0
XSDElementC                 1            0           18           0
XSDElementD                 0            2            0           0
XSDAttributeGroupA          6            0            0           0
XSDAttributeGroupC          5            0            0           0
Total                     791          604          508           6
Data types are added but rarely deleted
What we learned from WSDL evolution
47
Users of the FedEx service
Data types change frequently
Operations are more stable
Users of the AmazonEC2 service
New operations are continuously added
Data types change frequently, with new elements being added
Analyzing the Evolution of Web Services using Fine-Grained Changes
D. Romano and M. Pinzger, ICWS 2012
What is next?
48
What is next?
[Figure: research timeline, 2003 to 2013, with the labels RHDB; Evolizer, ChangeDistiller, SOFAS; DA4Java; empirical studies on quality, change and defect prediction; spreadsheet analysis; ArchView; Web 2.0 for understanding; service quality; ecosystem evolution; stakeholders' needs and views; replication studies; social coding and mining]
Conclusions
50
Contact
gall@ifi.uzh.ch
martin.pinzger@aau.at

More Related Content

Similar to Populating a Release History Database (ICSM 2013 MIP)

A Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug PredictionA Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug PredictionMartin Pinzger
 
A tale of bug prediction in software development
A tale of bug prediction in software developmentA tale of bug prediction in software development
A tale of bug prediction in software developmentMartin Pinzger
 
Analyzing Changes in Software Systems From ChangeDistiller to FMDiff
Analyzing Changes in Software Systems From ChangeDistiller to FMDiffAnalyzing Changes in Software Systems From ChangeDistiller to FMDiff
Analyzing Changes in Software Systems From ChangeDistiller to FMDiffMartin Pinzger
 
Software Maintenance Bug Triaging
Software Maintenance Bug TriagingSoftware Maintenance Bug Triaging
Software Maintenance Bug TriagingRamis Khan
 
A tale of experiments on bug prediction
A tale of experiments on bug predictionA tale of experiments on bug prediction
A tale of experiments on bug predictionMartin Pinzger
 
What's New in Revolution R Enterprise 6.2
What's New in Revolution R Enterprise 6.2What's New in Revolution R Enterprise 6.2
What's New in Revolution R Enterprise 6.2Revolution Analytics
 
Scalable constrained spectral clustering
Scalable constrained spectral clusteringScalable constrained spectral clustering
Scalable constrained spectral clusteringNishanth Harapanahalli
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureMasud Rahman
 
WSO2 Machine Learner - Product Overview
WSO2 Machine Learner - Product OverviewWSO2 Machine Learner - Product Overview
WSO2 Machine Learner - Product OverviewWSO2
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsMichael Häusler
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computingBAINIDA
 
ICWE2017 BigDataEurope
ICWE2017 BigDataEuropeICWE2017 BigDataEurope
ICWE2017 BigDataEuropeBigData_Europe
 
Modeling Search Computing Applications
Modeling Search Computing ApplicationsModeling Search Computing Applications
Modeling Search Computing ApplicationsMarco Brambilla
 
Svcc services presentation (Silicon Valley code camp 2011)
Svcc services presentation (Silicon Valley code camp 2011)Svcc services presentation (Silicon Valley code camp 2011)
Svcc services presentation (Silicon Valley code camp 2011)Jen Wong
 
PhD Proposal talk
PhD Proposal talkPhD Proposal talk
PhD Proposal talkRay Buse
 

Similar to Populating a Release History Database (ICSM 2013 MIP) (20)

A Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug PredictionA Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug Prediction
 
A tale of bug prediction in software development
A tale of bug prediction in software developmentA tale of bug prediction in software development
A tale of bug prediction in software development
 
Analyzing Changes in Software Systems From ChangeDistiller to FMDiff
Analyzing Changes in Software Systems From ChangeDistiller to FMDiffAnalyzing Changes in Software Systems From ChangeDistiller to FMDiff
Analyzing Changes in Software Systems From ChangeDistiller to FMDiff
 
poster_3.0
poster_3.0poster_3.0
poster_3.0
 
Software Maintenance Bug Triaging
Software Maintenance Bug TriagingSoftware Maintenance Bug Triaging
Software Maintenance Bug Triaging
 
A tale of experiments on bug prediction
A tale of experiments on bug predictionA tale of experiments on bug prediction
A tale of experiments on bug prediction
 
What's New in Revolution R Enterprise 6.2
What's New in Revolution R Enterprise 6.2What's New in Revolution R Enterprise 6.2
What's New in Revolution R Enterprise 6.2
 
Poster (1)
Poster (1)Poster (1)
Poster (1)
 
Scalable constrained spectral clustering
Scalable constrained spectral clusteringScalable constrained spectral clustering
Scalable constrained spectral clustering
 
Of Changes and Their History
Of Changes and Their HistoryOf Changes and Their History
Of Changes and Their History
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lecture
 
WSO2 Machine Learner - Product Overview
WSO2 Machine Learner - Product OverviewWSO2 Machine Learner - Product Overview
WSO2 Machine Learner - Product Overview
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data Applications
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computing
 
ICWE2017 BigDataEurope
ICWE2017 BigDataEuropeICWE2017 BigDataEurope
ICWE2017 BigDataEurope
 
Modeling Search Computing Applications
Modeling Search Computing ApplicationsModeling Search Computing Applications
Modeling Search Computing Applications
 
Software maintenance
Software maintenanceSoftware maintenance
Software maintenance
 
Svcc services presentation (Silicon Valley code camp 2011)
Svcc services presentation (Silicon Valley code camp 2011)Svcc services presentation (Silicon Valley code camp 2011)
Svcc services presentation (Silicon Valley code camp 2011)
 
PhD Proposal talk
PhD Proposal talkPhD Proposal talk
PhD Proposal talk
 

Recently uploaded

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Recently uploaded (20)

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Populating a Release History Database (ICSM 2013 MIP)

  • 1. ICSM 2013 MIP Populating a Release History DB Michael Fischer Martin Pinzger Harald C. Gall University of Zurich, Switzerland University of Klagenfurt, Austria
  • 2. Roadmap Back to 2003 Impact of the Work Mining Software Repositories From RHDB to recent research 2
  • 3. Motivation back in 2003 Version control and bug tracking systems need to be integrated large amounts of historical information can give insights but provide only insufficient support for a detailed analysis of software evolution Our goal was to populate a Release History Database that combines version data with bug tracking data and adds missing information not covered by version control systems such as merge points or bug links. to enable systematic queries to the structured data to obtain meaningful views showing the evolution of a software project. to enable more accurate reasoning of evolutionary aspects. 3
  • 4. Populating a Release History DB Problem = re-establishment of links between modification reports (MRs) and problem reports (PRs) since no mechanisms are provided by CVS We use PR IDs found in the MRs of CVS PR IDs in MRs are detected using a set of regular expressions. A match is rated according to the confidence value: high (h), medium (m), or low (l) confidence is considered high if expressions such as <keyword><ID> can be detected confidence is considered low a six digit number just appearing somewhere in the text of a modification report without preceding keyword 4
  • 5. Building a Release History DB 3 main sources: Modification reports (MR): CVS Problem reports (PR): Bugzilla program and patch information: Release Packages Relevant MRs and PRs are filtered, validated and stored in a Release History DB (RHDB) 5
  • 8. Views on Mozilla evolution 50% of files have been modified in last quarter of observation although only 25% of files have been integrated 8
  • 9. Views on Mozilla evolution /2 modules size 9
  • 11. Conclusions from 2003 RHDB offers some benefits for evolution analysis qualified links between changes and bugs files logically coupled via changes and bugs branch/merge revision data Data set as a basis for further analyses and visualizations (e.g. MDS-view) A basis for data exchange among research groups in the direction of a meta-model for release data 11
  • 12. Next steps: outlook from 2003 Further revise and develop meta-model for release data exchange Provide a qualified set of queries to the RHDB Integrate with other evolution analyses and evolution data in a framework bug report data modification report data test data and properties feature information multi-dimensional visualization 12
  • 14. Google Scholar: 387 with MSR 2004 in Edinburgh, 2005 in St. Louis Citations 14
  • 18. What is referenced? ...Also, numerical bug IDs mentioned in the commit log, are linked back to the issue tracking system’s identifiers [21, 44]... (F. Rahman, et al.) ...First, we searched for keywords such as “bug”, “bugs”, “bug fixes”, and “fixed bug”, or references to bug IDs in log files; ... [39, 15, 29]... (P. Bhattacharya et al.) ..While modern systems like Subclipse (http:// subclipse.tigris.org) allow to link bug reports and code modifications, most of the time these links are not available [12]... (W. Poncin, et al.) 18
  • 19. From RHDB to Mining Software Repositories
  • 20. Mining Software Repositories Does distributed development affect software quality? Cross-project defect prediction: when does it work? Visual (Effort Estimation) Patterns in Issue Tracking Data Visual Understanding of Source Code Dependencies Analyzing the co-evolution of comments and code Predicting the fix time of bugs Supporting developers with Natural Language Queries Can Developer-Module Networks Predict Failures? Interactive Views for Analyzing Problem Reports 20
  • 21. From RHDB to Change Type Analysis: Change Distiller Beat Fluri and Harald Gall
  • 22. Source Code Changes using ASTs Using tree differencing, we can determine public void method(D d) { if (d != null) { d.foo(); d.bar(); } } public void method(D d) { d.foo(); d.bar(); } 22
  • 23. Using tree differencing, we can determine enclosing entity (root node) Source Code Changes using ASTs public void method(D d) { if (d != null) { d.foo(); d.bar(); } } public void method(D d) { d.foo(); d.bar(); } 23
  • 24. Using tree differencing, we can determine enclosing entity (root node) kind of statement which changed (node information) kind of change (tree edit operation) Source Code Changes using ASTs public void method(D d) { d.foo(); d.bar(); } 24 public void method(D d) { if (d != null) { d.foo(); d.bar(); } }
  • 25. Source Code Changes using ASTs. Using tree differencing, we can determine the enclosing entity (root node), the kind of statement that changed (node information), and the kind of change (tree edit operation). Example: two versions of the same method, one with and one without a null check:

```java
public void method(D d) {
    if (d != null) {
        d.foo();
        d.bar();
    }
}
```

```java
public void method(D d) {
    d.foo();
    d.bar();
}
```
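A toy sketch of how tree differencing yields these three pieces of information (enclosing entity, changed statement, edit operation). It is not ChangeDistiller's algorithm: the class `AstDiff`, the node labels, and the matching rule (nodes are matched by type and source text; a matched node whose parent changed counts as a move) are simplifying assumptions for illustration. The example assumes the null check was added; the reverse direction produces a delete instead of an insert.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy sketch of AST change classification; NOT ChangeDistiller's actual algorithm.
final class AstDiff {

    // Simplified AST node: statement type, source text, children.
    record Node(String type, String value, List<Node> children) {
        static Node of(String type, String value, Node... children) {
            return new Node(type, value, List.of(children));
        }

        String label() {
            return type + ":" + value;
        }
    }

    // operation = tree edit operation, node = changed statement,
    // parent = its parent in the version where it exists, enclosingEntity = root node (the method).
    record Change(String operation, String node, String parent, String enclosingEntity) {}

    // Records each node's parent label in pre-order; assumes labels are unique (true for the toy input).
    private static void index(Node node, String parent, Map<String, String> parents) {
        parents.put(node.label(), parent);
        for (Node child : node.children()) {
            index(child, node.label(), parents);
        }
    }

    static List<Change> diff(Node oldRoot, Node newRoot) {
        Map<String, String> oldParents = new LinkedHashMap<>();
        Map<String, String> newParents = new LinkedHashMap<>();
        index(oldRoot, null, oldParents);
        index(newRoot, null, newParents);
        String entity = newRoot.label();

        List<Change> changes = new ArrayList<>();
        for (Map.Entry<String, String> e : newParents.entrySet()) {
            if (!oldParents.containsKey(e.getKey())) {
                changes.add(new Change("INSERT", e.getKey(), e.getValue(), entity));
            } else if (!Objects.equals(oldParents.get(e.getKey()), e.getValue())) {
                changes.add(new Change("MOVE", e.getKey(), e.getValue(), entity));
            }
        }
        for (Map.Entry<String, String> e : oldParents.entrySet()) {
            if (!newParents.containsKey(e.getKey())) {
                changes.add(new Change("DELETE", e.getKey(), e.getValue(), entity));
            }
        }
        return changes;
    }
}
```

For the null-check example, diffing the version without the `if` against the version with it reports one INSERT of the `IF` node under the method, followed by two MOVEs of the calls into the new block.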
  • 28. From RHDB to Software Analysis as a Service (SOFAS) (Giacomo Ghezzi and Harald Gall)
  • 29. SOFtware Analysis Services. The actual repository analysis is offered as a service: the user selects the analysis and the data to be analyzed, and gets the results (workflow). Data is key; analyses can be lengthy and expensive!
  • 31. SEON Pyramid(s) (www.se-on.org). Figure: a pyramid with three layers: General Concepts (Bugs, Code, History), Domain Specific Concepts (Issue Tracking, Version Control, Source Code, Change Coupling, Change Types, Software Design Metrics), and System Specific Concepts (CVS, SVN, GIT, Bugzilla, Trac, Java, C#).
  • 32. Semantic Links. The bug history extractor produces bug resources such as http://myProject.org/bugs/nr124; the version control history extractor produces revision resources such as http://ifi.uzh.ch/svnImporter/myProject/Foo.java23; the bug-revision linker connects the two with the predicate http://sofas.org/bugOntology/affects.
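The linker's output can be pictured as RDF triples. A minimal sketch, assuming a hypothetical `SemanticLinks.link` helper; the URIs and the `affects` predicate are the ones shown above, and the output uses W3C N-Triples syntax. How SOFAS actually stores and queries these links is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a bug-revision linker emitting "affects" links as RDF triples.
final class SemanticLinks {

    // Predicate URI as shown on the slide.
    static final String AFFECTS = "http://sofas.org/bugOntology/affects";

    record Triple(String subject, String predicate, String object) {
        // N-Triples syntax: <subject> <predicate> <object> .
        String toNTriples() {
            return "<" + subject + "> <" + predicate + "> <" + object + "> .";
        }
    }

    // One "affects" triple per revision the linker associated with the bug.
    static List<Triple> link(String bugUri, List<String> revisionUris) {
        List<Triple> triples = new ArrayList<>();
        for (String revision : revisionUris) {
            triples.add(new Triple(bugUri, AFFECTS, revision));
        }
        return triples;
    }
}
```

Linking bug nr124 to Foo.java23 yields the single line `<http://myProject.org/bugs/nr124> <http://sofas.org/bugOntology/affects> <http://ifi.uzh.ch/svnImporter/myProject/Foo.java23> .`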
  • 33. Current SOFAS services. Data gatherers: version history extractors for CVS, SVN, GIT, and Mercurial; issue tracking history for Bugzilla, Trac, SourceForge, and Jira. Basic services: meta-model extractors for Java and C# (FAMIX); change coupling and change types; issue-revision linker; metrics service. Composite services: evolutionary hot-spots; highly changing code clones; and many more.
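Change coupling, one of the basic services listed, is commonly derived by counting how often two files are committed together. A minimal sketch under that assumption; the class name, the pair-key format, and the `minSupport` threshold are hypothetical, and the SOFAS service may weight or normalize couplings differently.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: change coupling as co-change counts of file pairs.
final class ChangeCoupling {

    // For every unordered file pair, counts the commits that changed both files.
    static Map<String, Integer> coChanges(List<Set<String>> commits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> files : commits) {
            // Sort so that each pair gets one canonical key ("A <-> B", never "B <-> A").
            List<String> sorted = List.copyOf(new TreeSet<>(files));
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    counts.merge(sorted.get(i) + " <-> " + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Keeps only pairs that co-changed at least minSupport times.
    static Map<String, Integer> coupled(List<Set<String>> commits, int minSupport) {
        Map<String, Integer> result = new TreeMap<>();
        coChanges(commits).forEach((pair, n) -> {
            if (n >= minSupport) {
                result.put(pair, n);
            }
        });
        return result;
    }
}
```

With the commits {A.java, B.java}, {A.java, B.java, C.java}, and {C.java}, only A.java and B.java reach a support of 2.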
  • 35. Software Facets: a glimpse
  • 36. From RHDB to Defect Prediction (Emanuel Giger, Martin Pinzger, and Harald Gall)
  • 37. RHDB for Defect Prediction. Some of our defect prediction papers: "Predicting defect densities in source code files with decision tree learners" (MSR 2006); "Improving defect prediction using temporal features and non linear models" (IWPSE 2007); "Predicting the fix time of bugs" (RSSE 2010); "Comparing fine-grained source code changes and code churn for bug prediction" (MSR 2011); "Method-Level Bug Prediction" (ESEM 2012).
  • 38. Prediction granularity. Large files are typically the most bug-prone files, but a class has 11 methods on average, of which 4 are bug-prone (ca. 36%). Goal: a prediction model to identify bug-prone methods.
  • 39. Approach for defect prediction (figure): (1) Versioning data: log entries from CVS, SVN, and GIT are imported with Evolizer into the RHDB. (2) Bug data: bug references in log entry messages (e.g., #bug123) link changes to bugs, recording how many bugs affected a method (Bugs). (3) Source code changes (SCC): ChangeDistiller extracts fine-grained changes by AST comparison of subsequent versions (e.g., 1.1 and 1.2). (4) Experiment: a Support Vector Machine is trained on these data.
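The data preparation for the experiment can be sketched as an aggregation over method histories. This is a hypothetical Java sketch: the input record, the `#bug123` reference pattern (taken from the figure), and the labeling rule "at least one linked bug means bug-prone" are assumptions, not the study's exact setup. The `changes` count roughly corresponds to what the results call methodHistories; the learner itself (e.g., the SVM) is out of scope.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turn linked log entries and SCC counts into per-method features.
final class MethodHistory {

    // One row per method touched by a commit, with the number of SCC in that commit.
    record MethodChange(String method, String commitMessage, int scc) {}

    // Aggregated features; bugProne is the label a classifier would learn.
    record Features(String method, int changes, int scc, int bugs, boolean bugProne) {}

    // Bug references written as "#bug123" in the log message.
    private static final Pattern BUG_REF = Pattern.compile("#bug(\\d+)");

    static Map<String, Features> aggregate(List<MethodChange> history) {
        Map<String, int[]> counts = new TreeMap<>();     // method -> {changes, scc}
        Map<String, Set<String>> bugs = new TreeMap<>(); // method -> distinct bug IDs
        for (MethodChange c : history) {
            int[] acc = counts.computeIfAbsent(c.method(), k -> new int[2]);
            acc[0]++;
            acc[1] += c.scc();
            Set<String> ids = bugs.computeIfAbsent(c.method(), k -> new HashSet<>());
            Matcher m = BUG_REF.matcher(c.commitMessage());
            while (m.find()) {
                ids.add(m.group(1));
            }
        }
        Map<String, Features> result = new TreeMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            int nBugs = bugs.get(e.getKey()).size();
            // Assumed labeling rule: at least one linked bug => bug-prone.
            result.put(e.getKey(),
                    new Features(e.getKey(), e.getValue()[0], e.getValue()[1], nBugs, nBugs > 0));
        }
        return result;
    }
}
```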
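  • 40. 21 Java open source projects (rows shown for six of them):

| Project | #Classes | #Methods | #M-Histories (method histories) | #Bugs |
|---|---|---|---|---|
| JDT Core | 1140 | 17703 | 43134 | 4888 |
| Jena2 | 897 | 8340 | 7764 | 704 |
| Lucene | 477 | 3870 | 1754 | 377 |
| Xerces | 693 | 8189 | 6866 | 1017 |
| Derby Engine | 1394 | 18693 | 9507 | 1663 |
| Ant Core | 827 | 8698 | 17993 | 1900 |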
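  • 41. Results: Product and process metrics. Models computed with change metrics (CM) perform better than those computed with source code metrics (SCM); authors and methodHistories are the most important measures.

Table 4: Median classification results over all projects per classifier and per model (AUC = area under the ROC curve, P = precision, R = recall; RndFor = random forest, SVM = support vector machine, BN = Bayesian network, J48 = decision tree)

| Classifier | CM AUC | CM P | CM R | SCM AUC | SCM P | SCM R | CM&SCM AUC | CM&SCM P | CM&SCM R |
|---|---|---|---|---|---|---|---|---|---|
| RndFor | .95 | .84 | .88 | .72 | .5 | .64 | .95 | .85 | .95 |
| SVM | .96 | .83 | .86 | .7 | .48 | .63 | .95 | .8 | .96 |
| BN | .96 | .82 | .86 | .73 | .46 | .73 | .96 | .81 | .96 |
| J48 | .95 | .84 | .82 | .69 | .56 | .58 | .91 | .83 | .89 |

Paper excerpt (truncated on the slide): ... values of the code metrics model are approximately 0.7 for each classifier, which Lessmann et al. define as "promising" [26]. However, the source code metrics suffer from considerably low precision values. The highest median precision ...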
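The three measures in Table 4 can be computed as below. A minimal sketch using the standard definitions (precision = TP/(TP+FP), recall = TP/(TP+FN), AUC as the probability that a randomly chosen bug-prone method receives a higher score than a randomly chosen non-bug-prone one); the class and method names are hypothetical, and this is not the evaluation code used in the study.

```java
// Sketch of the measures reported in Table 4 (P, R, AUC); not the study's evaluation code.
final class ClassifierMetrics {

    // Precision = TP / (TP + FP); NaN if nothing was predicted bug-prone.
    static double precision(boolean[] actual, boolean[] predicted) {
        int tp = 0, fp = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] && actual[i]) tp++;
            if (predicted[i] && !actual[i]) fp++;
        }
        return tp + fp == 0 ? Double.NaN : (double) tp / (tp + fp);
    }

    // Recall = TP / (TP + FN); NaN if there are no actual bug-prone methods.
    static double recall(boolean[] actual, boolean[] predicted) {
        int tp = 0, fn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] && predicted[i]) tp++;
            if (actual[i] && !predicted[i]) fn++;
        }
        return tp + fn == 0 ? Double.NaN : (double) tp / (tp + fn);
    }

    // AUC as the share of (positive, negative) pairs ranked correctly; ties count 1/2.
    // Quadratic in the number of methods, which is fine for a sketch.
    static double auc(boolean[] actual, double[] scores) {
        double wins = 0;
        long pairs = 0;
        for (int i = 0; i < actual.length; i++) {
            if (!actual[i]) continue;
            for (int j = 0; j < actual.length; j++) {
                if (actual[j]) continue;
                pairs++;
                if (scores[i] > scores[j]) wins += 1.0;
                else if (scores[i] == scores[j]) wins += 0.5;
            }
        }
        return pairs == 0 ? Double.NaN : wins / pairs;
    }
}
```

For labels {true, true, false, false}, predictions {true, false, true, false}, and scores {0.9, 0.4, 0.6, 0.1}, precision and recall are both 0.5 and the AUC is 0.75 (3 of the 4 positive/negative pairs are ranked correctly).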
  • 42. Lessons from Defect Prediction. Bug predictions do work; cross-project predictions do not really work; data sets (systems) are needed as benchmarks; data preprocessing and learners need to be calibrated; studies need to be (systematically) replicable.
  • 43. From RHDB to the Evolution of Service-oriented Systems (Daniele Romano and Martin Pinzger)
  • 44. Fine-Grained Changes for WSDLs. Figure (steps A to D): a WSDL Parser (org.eclipse.wst.wsdl, org.eclipse.xsd) reads WSDL Version 1 and WSDL Version 2 into WSDL Model 1 and WSDL Model 2; an XSD Transformer turns these into WSDL Model 1' and WSDL Model 2'; a Matching Engine (org.eclipse.compare.match) produces a Match Model; and a Differencing Engine (org.eclipse.compare.diff) produces a Diff Model.
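The final differencing step can be approximated for operations alone. A minimal sketch, assuming that the suffixes A, C, and D in the change-type tables stand for added, changed, and deleted, and that two versions of an operation are compared by name plus a signature string. It uses plain name matching rather than the EMF-based match and diff engines in the figure, and all names (class, operations, signatures) are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: count added (A), changed (C), and deleted (D) WSDL operations.
final class WsdlOperationDiff {

    // Each version maps operation name -> signature (e.g. "InputMessage>OutputMessage").
    static Map<String, Integer> diff(Map<String, String> v1, Map<String, String> v2) {
        Map<String, Integer> counts =
                new TreeMap<>(Map.of("OperationA", 0, "OperationC", 0, "OperationD", 0));
        for (Map.Entry<String, String> e : v2.entrySet()) {
            String before = v1.get(e.getKey());
            if (before == null) {
                counts.merge("OperationA", 1, Integer::sum);  // only in the new version
            } else if (!before.equals(e.getValue())) {
                counts.merge("OperationC", 1, Integer::sum);  // same name, different signature
            }
        }
        for (String name : v1.keySet()) {
            if (!v2.containsKey(name)) {
                counts.merge("OperationD", 1, Integer::sum);  // only in the old version
            }
        }
        return counts;
    }
}
```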
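  • 45. Changes in WSDLs (change-type suffix: A = added, C = changed, D = deleted)

| Change Type | AmazonEC2 | FedEx Rate | FedEx Ship | FedEx Pkg |
|---|---|---|---|---|
| OperationA | 113 | 1 | 10 | 0 |
| OperationC | 0 | 1 | 0 | 0 |
| OperationD | 9 | 1 | 4 | 0 |
| MessageA | 218 | 2 | 16 | 0 |
| MessageC | 2 | 0 | 2 | 0 |
| MessageD | 10 | 2 | 2 | 0 |
| PartA | 27 | 0 | 2 | 0 |
| PartC | 34 | 0 | 0 | 0 |
| PartD | 27 | 0 | 2 | 0 |
| Total | 440 | 7 | 38 | 0 |

Operations and messages are added but rarely deleted.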
  • 46. Changes in Data Types

| Change Type | AmazonEC2 | FedEx Rate | FedEx Ship | FedEx Pkg |
|---|---|---|---|---|
| XSDTypeA | 409 | 234 | 157 | 0 |
| XSDTypeC | 160 | 295 | 280 | 6 |
| XSDTypeD | 2 | 71 | 28 | 0 |
| XSDElementA | 208 | 2 | 25 | 0 |
| XSDElementC | 1 | 0 | 18 | 0 |
| XSDElementD | 0 | 2 | 0 | 0 |
| XSDAttributeGroupA | 6 | 0 | 0 | 0 |
| XSDAttributeGroupC | 5 | 0 | 0 | 0 |
| Total | 791 | 604 | 508 | 6 |

Data types are added but rarely deleted.
  • 47. What we learned from WSDL evolution. Users of the FedEx service: data types change frequently, while operations are more stable. Users of the AmazonEC2 service: new operations are continuously added, and data types change frequently by adding new elements. Reference: D. Romano and M. Pinzger, "Analyzing the Evolution of Web Services using Fine-Grained Changes", ICWS 2012.
  • 49. What is next? Timeline (2003 to 2013): RHDB; Evolizer, ChangeDistiller, SOFAS; DA4Java; ArchView; empirical studies on quality, change, and defect prediction; spreadsheet analysis; Web 2.0 for understanding; service quality. Next topics: ecosystem evolution; stakeholders' needs and views; replication studies; social coding and mining.
  • 50. Conclusions. Contact: gall@ifi.uzh.ch, martin.pinzger@aau.at