Mining Software Repositories

Mining Software Repositories. Where to get data? Where to publish?

Mining Software Repositories: What to do? And where to get data? Israel Herraiz < [email_address] >, Universidad Alfonso X el Sabio, June 18th 2010
Outline <ul><li>What is Mining Software Repositories? What are repositories? </li><li>Conferences and journals of interest </li></ul><ul><ul><li>And some words about trending topics </li></ul></ul><ul><li>Tools for Mining Software Repositories </li><li>Datasets for Mining Software Repositories </li></ul><ul><ul><li>For replicable and verifiable empirical studies </li></ul></ul>
1. What is Mining Software Repositories?
What is Mining Software Repositories? <ul><li>MSR analyzes the rich data available in software repositories to uncover interesting and actionable information about software systems and projects. </li><li>Popular topic since 2004 </li><ul><li>MSR workshop, co-located with ICSE </li><li>Working Conference since 2008 </li></ul></ul>
What are repositories? <ul><li>Anything that leaves a trail of software development or maintenance activity </li><li>Also includes any software artifact </li><li>Typically </li><ul><li>Version control systems </li><li>Bug tracking systems </li><li>Public communication tools (mailing lists) </li></ul></ul>
Differences between artifact and repository <ul><li>Artifact: source code file hello.c </li><ul><li>
#include <stdio.h>
int main() { printf("Hello world"); return 0; }
</li></ul><li>Repository: change to an artifact (hello.c.diff) plus its meta-information </li><ul><li>
- printf("Hello world");
+ printf("Hello world\n");
</li><li>Author: rms; Date: 20100618 04:34 UTC; Change: +1 -1; Log: Forgot to add new line </li></ul></ul>
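The "Change: +1 -1" field in the meta-information can be derived from the diff itself. Below is a minimal sketch in C of how a mining tool might compute it, assuming the change is available in memory as unified-diff text. The names `change_stats` and `count_diff_lines` are hypothetical and do not come from any tool mentioned in these slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Totals for one change, in the "+added -removed" form used by the
   "Change: +1 -1" meta-information field. */
struct change_stats {
    int added;
    int removed;
};

/* Count added and removed lines in a unified diff held in memory.
   Assumption: lines starting with "+++" or "---" are file headers and
   "@@" lines are hunk headers, so none of them count as changes.
   Simplification: a removed line whose own text starts with "--" would be
   mistaken for a header. */
struct change_stats count_diff_lines(const char *diff)
{
    struct change_stats stats = {0, 0};
    const char *line = diff;

    assert(diff != NULL);
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        int is_header = len >= 3 && (strncmp(line, "+++", 3) == 0 ||
                                     strncmp(line, "---", 3) == 0);

        if (!is_header && len > 0) {
            if (line[0] == '+')
                stats.added++;
            else if (line[0] == '-')
                stats.removed++;
        }
        /* Move to the next line, or to the terminating NUL. */
        line = end ? end + 1 : line + len;
    }
    return stats;
}
```

Applied to the two-line hello.c.diff, it reports one added and one removed line. Recomputing the counts from the diff, rather than trusting a stored field, keeps them reproducible when the raw repository is mined again.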
2. Conferences and journals of interest
Working conferences of interest <ul><li>IEEE Int. Working Conf. Source Code Analysis & Manipulation (SCAM): http://www.ieee-scam.org </li><li>IEEE Int. Working Conf. Mining Software Repositories (MSR): http://msr.uwaterloo.ca </li><li>Deadlines: January (February for the mining challenge); April </li><li>Acceptance rates: 26% (2007), 38% (2008), 45% (2009); 19% (2008), 31% (2010) </li><li>Journal possibilities: JSS, SCP; EMSE, IEEE TSE </li></ul>
Conferences of interest <ul><li>IEEE Int. Conf. Software Engineering (ICSE): http://www.sbs.co.za/ICSE2010/ </li><li>IEEE Int. Conf. Software Maintenance (ICSM): http://icsm2010.upt.ro/ </li><ul><li>Deadlines: April; August; September </li><li>Acceptance rates: 15% (2008), 12% (2009), 14% (2010); 21% (2007), 26% (2008), 22% (2009) </li><li>Journal possibilities: no special issues for either conference </li></ul><li>Empirical Software Eng. & Measurement (ESEM): http://www.esem-conferences.org/ </li><ul><li>Deadline: March; acceptance rate unknown; journal possibilities: EMSE </li></ul></ul>
Other interesting conferences <ul><li>Working Conference on Reverse Engineering (WCRE) </li><ul><li>http://web.soccerlab.polymtl.ca/wcre2010/ </li></ul><li>International Conference on Predictive Models in Software Engineering (PROMISE) </li><ul><li>http://promisedata.org/ </li></ul><li>European Conference on Software Maintenance and Re-engineering (CSMR) </li><ul><li>http://www.sait.escet.urjc.es/csmr2010/ </li></ul></ul>
Journals of interest <ul><li>IEEE Transactions on Software Engineering (TSE) </li><ul><li>http://www.computer.org/tse/ </li></ul><li>ACM Transactions on Software Engineering and Methodology (TOSEM) </li><ul><li>http://tosem.acm.org/ </li></ul><li>Empirical Software Engineering (EMSE) </li><ul><li>http://www.springerlink.com/content/1382-3256 </li></ul><li>Journal of Systems and Software (JSS) </li><ul><li>http://www.elsevier.com/locate/jss </li></ul><li>Journal of Software Maintenance and Evolution (JSME) </li><ul><li>http://eu.wiley.com/WileyCDA/WileyTitle/productCd-SMR.html </li></ul></ul>
Handy links <ul><li>Software Engineering Conferences </li><ul><li>Verification, Formal Methods, Programming Lang. and Compilers, Web, Security </li><li>http://people.engr.ncsu.edu/txie/seconferences.htm </li></ul><li>Upcoming Software Engineering Conferences Map </li><ul><li>http://research.csc.ncsu.edu/ase/semap/ </li></ul></ul>
Trending topics <ul><li>Replication of empirical studies </li><ul><li>The replication package </li></ul><li>Recommendation systems </li><ul><li>Automated Software Engineering </li></ul></ul>
3. Tools for Mining Software Repositories
Tools for Mining Software Repositories <ul><li>Mining tools </li><ul><li>Libresoft Tools: http://tools.libresoft.es/ </li><ul><li>CVSAnaly – log parser for CVS, SVN and Git repositories </li><li>MLStats – parser for Mailman archives and mbox files </li><li>Bicho – parser for Bugzilla and SF.net trackers </li></ul></ul><li>Software Architecture Group (SWAG) – University of Waterloo </li><ul><li>http://www.swag.uwaterloo.ca/tools.html </li></ul></ul>
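Mailing-list parsers such as MLStats first have to split an archive into individual messages. A rough sketch in C of that first step, assuming the classic mbox variants (mboxo/mboxrd) in which every message begins with a line starting with "From " and body lines starting with "From " are escaped as ">From ". `count_mbox_messages` is a hypothetical helper, not part of MLStats.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count the messages in an mbox mailbox read from `in`.
   Assumption: each message starts with a line beginning "From "; a body
   line that would start with "From " is stored as ">From ". */
long count_mbox_messages(FILE *in)
{
    char buf[1024];
    long messages = 0;
    int at_line_start = 1; /* fgets may split a long line into chunks */

    assert(in != NULL);
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (at_line_start && strncmp(buf, "From ", 5) == 0)
            messages++;
        /* The next chunk begins a new line only if this one ended a line. */
        at_line_start = strchr(buf, '\n') != NULL;
    }
    return messages;
}
```

A real parser would go on to decode headers, dates and MIME parts for each message; this sketch only shows why the "From " separator line, and its escaping in message bodies, matters.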
4. Datasets for Mining Software Repositories
MSR Mining Challenge <ul><li>Mirrors of the version archives and bug databases for Mozilla Firefox and Eclipse </li><ul><li>http://msr.uwaterloo.ca/msr2008/challenge/ </li></ul><li>Repository logs of over 500 GNOME projects, XML dumps of the bug databases, and the complete SVN repositories of 69 GNOME projects </li><ul><li>http://msr.uwaterloo.ca/msr2009/challenge/ </li></ul></ul>
Ultimate Debian Database <ul><li>Database with information about packages and bug reports of Debian and Ubuntu </li><ul><li>http://udd.debian.org/ </li></ul></ul>
Eclipse bug database <ul><li>Saarland University </li><li>Datasheets, databases and scripts with information about Eclipse bug reports for several releases </li><li>http://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/ </li></ul>
FLOSSMetrics <ul><li>Databases about ~5000 open source projects </li><li>Version control repositories, mailing list archives, bug tracking databases </li><li>MySQL dumps </li><ul><li>Not very user friendly </li></ul><li>Obtained using the Libresoft Tools </li><li>http://www.flossmetrics.org/ </li></ul>
FLOSSMole <ul><li>Database with information about all the SourceForge.net projects </li><li>~150,000 projects </li><li>Mainly meta-information, obtained by parsing the projects' web pages </li><li>No low-level or fine-grained information </li><li>http://flossmole.org </li></ul>
PROMISE repository <ul><li>Authors of every PROMISE paper must also submit a package with the data used in the paper </li><li>http://promisedata.org/ </li><li>101 datasets </li><ul><li>Defect prediction (58) </li><li>Effort prediction (18) </li><li>General (9) </li><li>Model-based SE (7) </li><li>Text mining (9) </li></ul></ul>