Populating a Release History Database (ICSM 2013 MIP)

The presentation gives a retrospective on our 2003 research on building a release history database, and discusses its impact and future research directions.

Transcript

  • 1. ICSM 2013 MIP: Populating a Release History DB. Michael Fischer, Martin Pinzger, Harald C. Gall. University of Zurich, Switzerland; University of Klagenfurt, Austria
  • 2. Roadmap: Back to 2003; Impact of the Work; Mining Software Repositories; From RHDB to recent research
  • 3. Motivation back in 2003. Version control and bug tracking systems need to be integrated: their large amounts of historical information can give insights, but the systems provide only insufficient support for a detailed analysis of software evolution. Our goal was to populate a Release History Database that combines version data with bug tracking data and adds missing information not covered by version control systems, such as merge points or bug links; to enable systematic queries on the structured data to obtain meaningful views showing the evolution of a software project; and to enable more accurate reasoning about evolutionary aspects.
  • 4. Populating a Release History DB. Problem: re-establishing the links between modification reports (MRs) and problem reports (PRs), since CVS provides no mechanism for this. We use the PR IDs found in the MRs of CVS. PR IDs in MRs are detected using a set of regular expressions. A match is rated with a confidence value: high (h), medium (m), or low (l). Confidence is considered high if an expression such as <keyword><ID> can be detected. Confidence is considered low if a six-digit number just appears somewhere in the text of a modification report without a preceding keyword.
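The confidence rating described on slide 4 could be sketched roughly as follows. The keyword list, both patterns, and the function name `rate_pr_ids` are assumptions made for illustration, not the expressions used in the original work. The slide does not say what counts as medium confidence, so only the high and low cases are modelled.

```python
import re

# Hypothetical keywords; the slide only specifies the shape "<keyword><ID>".
KEYWORD_ID = re.compile(
    r"\b(?:bug|pr|fix(?:es|ed)?)\s*[#:]?\s*(\d{4,6})\b", re.IGNORECASE
)
# A six-digit number standing on its own anywhere in the text.
BARE_SIX_DIGITS = re.compile(r"(?<!\d)(\d{6})(?!\d)")


def rate_pr_ids(message):
    """Return {pr_id: 'h' or 'l'} for PR IDs found in one modification report."""
    ratings = {}
    # High confidence: the ID directly follows a keyword.
    for match in KEYWORD_ID.finditer(message):
        ratings[match.group(1)] = "h"
    # Low confidence: a bare six-digit number not already rated high.
    for match in BARE_SIX_DIGITS.finditer(message):
        ratings.setdefault(match.group(1), "l")
    return ratings


print(rate_pr_ids("Fix for bug 123456, see also 654321"))
```

Keeping the rating next to each ID lets later queries decide which links to trust, for example by ignoring low-confidence matches.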
  • 5. Building a Release History DB. 3 main sources: modification reports (MRs) from CVS; problem reports (PRs) from Bugzilla; program and patch information from release packages. Relevant MRs and PRs are filtered, validated and stored in a Release History DB (RHDB).
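To make the "filter, validate and store" step of slide 5 concrete, here is a minimal sketch using an in-memory SQLite database. The table names, columns, and both function names are hypothetical; the actual RHDB schema is shown only as a figure on slide 7 and is not reproduced here. Validation is reduced to one assumed rule: a link is kept only if the referenced PR exists.

```python
import sqlite3


def build_rhdb(mrs, prs, links):
    """Store MRs, PRs and validated MR-PR links.

    mrs:   [(mr_id, file, message)]
    prs:   [(pr_id, summary)]
    links: [(mr_id, pr_id, confidence)]
    """
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE mr (id INTEGER PRIMARY KEY, file TEXT, message TEXT);
        CREATE TABLE pr (id INTEGER PRIMARY KEY, summary TEXT);
        CREATE TABLE mr_pr (mr_id INTEGER, pr_id INTEGER, confidence TEXT);
        """
    )
    db.executemany("INSERT INTO mr VALUES (?, ?, ?)", mrs)
    db.executemany("INSERT INTO pr VALUES (?, ?)", prs)
    # Assumed validation rule: drop links that point to an unknown PR.
    known = {pr_id for pr_id, _ in prs}
    valid = [link for link in links if link[1] in known]
    db.executemany("INSERT INTO mr_pr VALUES (?, ?, ?)", valid)
    return db


def files_per_bug(db):
    """Example query: number of distinct files touched per linked PR."""
    return db.execute(
        """
        SELECT mr_pr.pr_id, COUNT(DISTINCT mr.file)
        FROM mr_pr JOIN mr ON mr.id = mr_pr.mr_id
        GROUP BY mr_pr.pr_id ORDER BY mr_pr.pr_id
        """
    ).fetchall()
```

Once the data is in relational form, evolution questions become ordinary queries, which is the kind of systematic access the motivation slide asks for.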
  • 6. Import process
  • 7. RHDB schema (meta-model)
  • 8. Views on Mozilla evolution: 50% of files have been modified in last quarter of observation although only 25% of files have been integrated
  • 9. Views on Mozilla evolution /2 (figure labels: modules, size)
  • 10. Feature evolution
  • 11. Conclusions from 2003
      • RHDB offers some benefits for evolution analysis: qualified links between changes and bugs; files logically coupled via changes and bugs; branch/merge revision data
      • Data set as a basis for further analyses and visualizations (e.g. MDS-view)
      • A basis for data exchange among research groups in the direction of a meta-model for release data
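The notion of files being logically coupled via changes and bugs can be illustrated with a small co-change count. This is a simplified sketch, not the measure used in the original work; the function name `change_coupling` and the input shape (one collection of file names per change set or bug fix) are assumptions.

```python
from collections import Counter
from itertools import combinations


def change_coupling(transactions):
    """Count how often each pair of files was modified together.

    transactions: iterable of collections of file names, one collection
    per change set or per bug fix.
    """
    pairs = Counter()
    for files in transactions:
        # Sort so that each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs


coupling = change_coupling([["a.c", "b.c"], ["b.c", "a.c", "c.c"], ["c.c"]])
print(coupling.most_common(1))
```

Pairs with a high count are candidates for logical coupling even when no explicit dependency exists between the files in the source code.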
  • 12. Next steps: outlook from 2003
      • Further revise and develop meta-model for release data exchange
      • Provide a qualified set of queries to the RHDB
      • Integrate with other evolution analyses and evolution data in a framework: bug report data; modification report data; test data and properties; feature information; multi-dimensional visualization
  • 13. Impact of RHDB work
  • 14. Citations. Google Scholar: 387, with MSR 2004 in Edinburgh, 2005 in St. Louis
  • 15. Paper titles (top 16)
  • 16. Conferences
  • 17. Paper authors (top 100)
  • 18. What is referenced?
      • "...Also, numerical bug IDs mentioned in the commit log, are linked back to the issue tracking system’s identifiers [21, 44]..." (F. Rahman, et al.)
      • "...First, we searched for keywords such as “bug”, “bugs”, “bug fixes”, and “fixed bug”, or references to bug IDs in log files; ... [39, 15, 29]..." (P. Bhattacharya et al.)
      • "...While modern systems like Subclipse (http://subclipse.tigris.org) allow to link bug reports and code modifications, most of the time these links are not available [12]..." (W. Poncin, et al.)
  • 19. From RHDB to Mining Software Repositories
  • 20. Mining Software Repositories
      • Does distributed development affect software quality?
      • Cross-project defect prediction: when does it work?
      • Visual (Effort Estimation) Patterns in Issue Tracking Data
      • Visual Understanding of Source Code Dependencies
      • Analyzing the co-evolution of comments and code
      • Predicting the fix time of bugs
      • Supporting developers with Natural Language Queries
      • Can Developer-Module Networks Predict Failures?
      • Interactive Views for Analyzing Problem Reports
  • 21. From RHDB to Change Type Analysis: Change Distiller. Beat Fluri and Harald Gall
  • 22-25. Source Code Changes using ASTs (one slide, built up over four steps). Using tree differencing, we can determine: the enclosing entity (root node); the kind of statement which changed (node information); the kind of change (tree edit operation). The two versions compared on the slide:

        public void method(D d) {
            if (d != null) {
                d.foo();
                d.bar();
            }
        }

        public void method(D d) {
            d.foo();
            d.bar();
        }
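The idea of reading edit operations off two syntax trees can be illustrated with a deliberately simplified sketch. This is not the ChangeDistiller algorithm: it assumes every node label is unique within a tree and compares only parent relationships. The tuple-based tree encoding and the function names are hypothetical, and treating the version without the null check as the older one is an arbitrary choice for the example.

```python
def parent_map(tree, parent=None, out=None):
    """Map each node label to its parent's label; tree = (label, [children])."""
    if out is None:
        out = {}
    label, children = tree
    out[label] = parent
    for child in children:
        parent_map(child, label, out)
    return out


def diff_trees(old, new):
    """Return insert, move and delete operations between two labelled trees."""
    old_parents, new_parents = parent_map(old), parent_map(new)
    ops = []
    for label, parent in new_parents.items():
        if label not in old_parents:
            ops.append(("insert", label, parent))
        elif old_parents[label] != parent:
            ops.append(("move", label, old_parents[label], parent))
    for label, parent in old_parents.items():
        if label not in new_parents:
            ops.append(("delete", label, parent))
    return ops


without_check = ("method(D d)", [("d.foo();", []), ("d.bar();", [])])
with_check = (
    "method(D d)",
    [("if (d != null)", [("d.foo();", []), ("d.bar();", [])])],
)

for op in diff_trees(without_check, with_check):
    print(op)
```

Each operation carries the three pieces of information named on the slide: the kind of change, the statement involved, and the enclosing parent entity.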
  • 26. ChangeDistiller Model (class diagram; classes and fields as far as they can be recovered from the transcript): SourceCodeEntity (uniqueName, shortName, type); ChangeOperation (structureEntity, sourceCodeEntity, type); Insert (parentEntity); Delete (parentEntity); Move (oldParentEntity, newParentEntity); Update (newEntity, parentEntity); StructureEntity (uniqueName, type, bodyChanges, declarationChanges); SourceCodeChange (changeType, changeOperations); StructureEntityVersion (structureEntity, version); AttributeHistory (attributeVersions); MethodHistory (methodVersions); ClassHistory (classVersions, attributeHistories, innerClassHistories, methodHistories); Revision (link to org.evolizer.model.versioning); BodyChange; DeclarationChange
  • 27. ChangeDistiller De https://bitbucket.org/sealuzh/tools-changedistiller/wiki/Home
  • 28. From RHDB to Software Analysis as a Service (SOFAS). Giacomo Ghezzi and Harald Gall
  • 29. SOFtware Analysis Services. The actual repository analysis is offered as a service. The user selects the analysis with the data to be analyzed and gets the results (workflow). Data is key; analyses can be lengthy and expensive!
  • 30. SOFAS scenario (diagram): SVN History Service; OO Metrics Service; FAMIX Model Service
  • 31. SEON Pyramid(s), www.se-on.org (diagram). Layers: General Concepts; Domain Specific Concepts; System Specific Concepts. Labels: Bugs, Code, History; Issue Tracking (Bugzilla, Trac); Source Code (Java, C#); Version Control (CVS, SVN, GIT); Change Coupling; Change Types; Software Design Metrics
  • 32. Semantic Links (diagram). Bug History extractor: http://myProject.org/bugs/nr124. Version Control history extractor: http://ifi.uzh.ch/svnImporter/myProject/Foo.java23. Bug-Revision linker: http://myProject.org/bugs/nr124 http://sofas.org/bugOntology/affects http://ifi.uzh.ch/svnImporter/myProject/Foo.java23
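The output of the Bug-Revision linker on slide 32 reads as a subject, predicate, object triple. A minimal sketch of producing such triples follows, assuming the split revision URL on the slide is one identifier; the function name `link_bugs_to_revisions` and the input shapes are hypothetical and do not describe the SOFAS service interface.

```python
# Predicate URI as shown on the slide.
AFFECTS = "http://sofas.org/bugOntology/affects"


def link_bugs_to_revisions(bug_uris, revisions):
    """Build (bug, affects, revision) triples.

    bug_uris:  {bug_number: bug_uri}, as a bug history extractor might provide
    revisions: [(revision_uri, [bug numbers mentioned in the commit message])]
    """
    triples = []
    for revision_uri, bug_numbers in revisions:
        for number in bug_numbers:
            # Only link bugs that the bug history extractor actually knows.
            if number in bug_uris:
                triples.append((bug_uris[number], AFFECTS, revision_uri))
    return triples


triples = link_bugs_to_revisions(
    {124: "http://myProject.org/bugs/nr124"},
    [("http://ifi.uzh.ch/svnImporter/myProject/Foo.java23", [124, 999])],
)
print(triples)
```

Because both ends of the link are URIs owned by different extractors, the linker only has to add the connecting statement and can leave the extracted data untouched.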
  • 33. Current SOFAS services
      • Data Gatherers: version history extractor for CVS, SVN, GIT, and Mercurial; issue tracking history for Bugzilla, Trac, SourceForge, Jira
      • Basic Services: meta-model extractors for Java and C# (FAMIX); change coupling, change types; issue-revision linker; metrics service
      • Composite services: evolutionary hot-spots; highly changing code clones; and many more ...
  • 34. Software Analysis Workflows
  • 35. Software Facets - a glimpse
  • 36. From RHDB to Defect Prediction. Emanuel Giger, Martin Pinzger, and Harald Gall
  • 37. RHDB for Defect Prediction. Some of our defect prediction papers:
      • Predicting defect densities in source code files with decision tree learners (MSR 2006)
      • Improving defect prediction using temporal features and non linear models (IWPSE 2007)
      • Predicting the fix time of bugs (RSSE 2010)
      • Comparing fine-grained source code changes and code churn for bug prediction (MSR 2011)
      • Method-Level Bug Prediction (ESEM 2012)
  • 38. Prediction granularity. Large files are typically the most bug-prone files. A class (class 1, class 2, class 3, ... class n) has 11 methods on average, of which 4 are bug-prone (ca. 36%). Goal: a prediction model to identify bug-prone methods.
  • 39. Approach for defect prediction (diagram, with a text fragment reading "... how many of them (Bugs), and (3) fine-grained source code changes (SCC)."). Steps: 1. Versioning Data (CVS, SVN, GIT; Evolizer RHDB; log entries); 2. Bug Data (log message "#bug123" linked to a bug); 3. Source Code Changes (SCC) (ChangeDistiller; AST comparison of subsequent versions 1.1 and 1.2; changes); 4. Experiment (Support Vector Machine)
  • 40. 21 Java open source projects (six shown)

        Project        #Classes  #Methods  #M-Histories  #Bugs
        JDT Core           1140     17703         43134   4888
        Jena2               897      8340          7764    704
        Lucene              477      3870          1754    377
        Xerces              693      8189          6866   1017
        Derby Engine       1394     18693          9507   1663
        Ant Core            827      8698         17993   1900
  • 41. Results: Product and process metrics. Models computed with change metrics (CM) perform better than with source-code metrics (SCM). authors and methodHistories are the most important measures.

        Table 4: Median classification results over all projects per classifier and per model

                     CM                SCM               CM&SCM
        Classifier   AUC   P     R     AUC   P     R     AUC   P     R
        RndFor       .95   .84   .88   .72   .5    .64   .95   .85   .95
        SVM          .96   .83   .86   .7    .48   .63   .95   .8    .96
        BN           .96   .82   .86   .73   .46   .73   .96   .81   .96
        J48          .95   .84   .82   .69   .56   .58   .91   .83   .89

    Paper excerpt shown on the slide: "... values of the code metrics model are approximately 0.7 for each classifier, what is defined by Lessman et al. as "promising" [26]. However, the source code metrics suffer from considerably low precision values. The highest median precision ..."
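The P and R columns of Table 4 are precision and recall. As a reminder of what they measure at method level, here is a minimal sketch using the standard definitions; the method names and the predicted set are made up for illustration and are not data from the study.

```python
def precision_recall(actual, predicted):
    """Precision and recall of a predicted set of bug-prone methods."""
    actual, predicted = set(actual), set(predicted)
    true_positives = len(actual & predicted)
    # Precision: share of predicted methods that really are bug-prone.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of bug-prone methods that the model found.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical class with 4 bug-prone methods, of which the model finds 3
# and additionally flags one clean method.
actual_bug_prone = {"m1", "m2", "m3", "m4"}
predicted_bug_prone = {"m1", "m2", "m3", "m7"}
print(precision_recall(actual_bug_prone, predicted_bug_prone))
```

Low precision, as reported for the source code metrics models, means many flagged methods turn out to be clean, which wastes review effort even when recall is acceptable.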
  • 42. Lessons from Defect Prediction
      • Bug predictions do work
      • Cross-project predictions do not really work
      • Data sets (systems) as benchmark
      • Data preprocessing and learners need to be calibrated
      • Studies need to be replicable (systematically)
  • 43. From RHDB to Evolution of Service-oriented Systems. Daniele Romano and Martin Pinzger
  • 44. Fine-Grained Changes for WSDLs (pipeline diagram with stage labels A to D). WSDL Version1 and WSDL Version2 pass through a WSDL Parser (org.eclipse.wst.wsdl, org.eclipse.xsd) to give WSDL Model1 and WSDL Model2; an XSD Transformer gives WSDL Model1’ and WSDL Model2’; the Matching Engine (org.eclipse.compare.match) gives the Match Model; the Differencing Engine (org.eclipse.compare.diff) gives the Diff Model
  • 45. Changes in WSDLs. Operations and messages are added but rarely deleted.

        Change Type   AmazonEC2  FedEx Rate  FedEx Ship  FedEx Pkg
        OperationA          113           1          10          0
        OperationC            0           1           0          0
        OperationD            9           1           4          0
        MessageA            218           2          16          0
        MessageC              2           0           2          0
        MessageD             10           2           2          0
        PartA                27           0           2          0
        PartC                34           0           0          0
        PartD                27           0           2          0
        Total               440           7          38          0
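The table on slide 45 can be summed per kind of change to check the slide's claim. The sketch below assumes that the suffixes A, C and D stand for added, changed and deleted, which the slide itself does not spell out; the counts are copied from the table, and the function name is hypothetical.

```python
SERVICES = ("AmazonEC2", "FedEx Rate", "FedEx Ship", "FedEx Pkg")

# Counts per change type, in the column order of SERVICES.
CHANGES = {
    "OperationA": (113, 1, 10, 0),
    "OperationC": (0, 1, 0, 0),
    "OperationD": (9, 1, 4, 0),
    "MessageA": (218, 2, 16, 0),
    "MessageC": (2, 0, 2, 0),
    "MessageD": (10, 2, 2, 0),
    "PartA": (27, 0, 2, 0),
    "PartC": (34, 0, 0, 0),
    "PartD": (27, 0, 2, 0),
}


def totals_by_kind(changes):
    """Sum the counts per suffix (A, C, D), one total per service."""
    totals = {kind: [0] * len(SERVICES) for kind in "ACD"}
    for change_type, counts in changes.items():
        kind = change_type[-1]  # last letter encodes the kind of change
        totals[kind] = [t + c for t, c in zip(totals[kind], counts)]
    return totals


for kind, counts in totals_by_kind(CHANGES).items():
    print(kind, dict(zip(SERVICES, counts)))
```

Under that reading, AmazonEC2 shows 358 additions against 46 deletions, which matches the statement that operations and messages are added but rarely deleted.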
  • 46. Changes in Data Types. Data types are added but rarely deleted.

        Change Type          AmazonEC2  FedEx Rate  FedEx Ship  FedEx Pkg
        XSDTypeA                   409         234         157          0
        XSDTypeC                   160         295         280          6
        XSDTypeD                     2          71          28          0
        XSDElementA                208           2          25          0
        XSDElementC                  1           0          18          0
        XSDElementD                  0           2           0          0
        XSDAttributeGroupA           6           0           0          0
        XSDAttributeGroupC           5           0           0          0
        Total                      791         604         508          6
  • 47. What we learned from WSDL evolution
      • Users of the FedEx service: data types change frequently; operations are more stable
      • Users of the AmazonEC2 service: new operations are continuously added; data types change frequently adding new elements
      • Source: Analyzing the Evolution of Web Services using Fine-Grained Changes, D. Romano and M. Pinzger, ICWS 2012
  • 48. What is next?
  • 49. What is next? (timeline over the years 2003, 2005, 2007, 2009, 2011, 2013). Labels: RHDB; Evolizer, ChangeDistiller, SOFAS; DA4Java; Empirical studies on quality, change and defect prediction; Spreadsheet analysis; ArchView; Web 2.0 for understanding; Service quality; Ecosystem Evolution; Stakeholders Needs & Views; Replication Studies; Social Coding & Mining
  • 50. Conclusions (repeats the defect prediction approach diagram and the labels Ecosystem Evolution, Stakeholders Needs & Views, Replication Studies, Social Coding & Mining). Contact: gall@ifi.uzh.ch, martin.pinzger@aau.at
