Mining Software Repositories

Mining Software Repositories. Where to get data? Where to publish?

Published in: Technology

  1. Mining Software Repositories: What to do? And where to get data? Israel Herraiz <[email_address]>, Universidad Alfonso X el Sabio, June 18th, 2010
  2. Outline
     - What is Mining Software Repositories? What are repositories?
     - Conferences and journals of interest
       - And some words about trending topics
     - Tools for Mining Software Repositories
     - Datasets for Mining Software Repositories
       - For replicable and verifiable empirical studies
  5. 1. What is Mining Software Repositories?
  6. What is Mining Software Repositories?
     - MSR analyzes the rich data available in software repositories to uncover interesting and actionable information about software systems and projects.
     - Popular topic since 2004
       - MSR workshop, co-located with ICSE
       - Working Conference since 2008
  9. What are repositories?
     - Anything that leaves a trail of software development or maintenance activity
     - Also includes any software artifact
     - Typically
       - Version control systems
       - Bug tracking systems
       - Public communication tools (mailing lists)
  14. Differences between artifact and repository
      Artifact: source code file (hello.c)
          #include <stdio.h>
          int main() { printf("Hello world"); return 0; }
      Repository: change to an artifact plus meta-information (hello.c.diff)
          - printf("Hello world");
          + printf("Hello world\n");
          Author: rms
          Date: 20100618 04:34 UTC
          Change: +1 -1
          Log: Forgot to add new line
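The split between artifact and repository in the hello.c example can be sketched in a few lines of Python. This is only an illustration: the `Change` class, the `count_changed_lines` helper, and the diff text (including its `---`/`+++` header lines and the restored `\n`) are assumptions made for the example, not part of the slides or of any mining tool.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Change:
    """A repository entry: a change to an artifact plus its meta-information."""
    author: str
    date: str
    log: str
    added: int
    removed: int


def count_changed_lines(diff_text: str) -> Tuple[int, int]:
    """Count added and removed lines in a unified-style diff body.

    Simplification: any line starting with '+++' or '---' is treated as a
    file header, so a changed line whose content begins with '++' or '--'
    would be miscounted.
    """
    added = removed = 0
    for line in diff_text.splitlines():
        if line.startswith("+++") or line.startswith("---"):
            continue  # file header, not a changed line
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed


# Hypothetical diff text modelled on the hello.c example.
HELLO_DIFF = '''--- a/hello.c
+++ b/hello.c
-    printf("Hello world");
+    printf("Hello world\\n");
'''

added, removed = count_changed_lines(HELLO_DIFF)
change = Change(author="rms", date="20100618 04:34 UTC",
                log="Forgot to add new line", added=added, removed=removed)
print(f"{change.author}: +{change.added} -{change.removed} ({change.log})")
```

The artifact is the file content itself; everything stored in `Change` is what only the repository knows about it.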
  15. 2. Conferences and journals of interest
  16. Working conferences of interest
      - IEEE Int. Working Conf. Source Code Analysis & Manipulation (SCAM): http://www.ieee-scam.org
      - IEEE Int. Working Conf. Mining Software Repositories (MSR): http://msr.uwaterloo.ca
      Deadlines: January (February for the challenge); April
      Accept rate: 26% (2007), 38% (2008), 45% (2009); 19% (2008), 31% (2010)
      Journal possibilities: JSS, SCP, EMSE, IEEE TSE
  17. Conferences of interest
      - IEEE Int. Conf. Software Engineering (ICSE): http://www.sbs.co.za/ICSE2010/
      - IEEE Int. Conf. Software Maintenance (ICSM): http://icsm2010.upt.ro/
      Deadlines: April, August, September
      Accept rate: 15% (2008), 12% (2009), 14% (2010); 21% (2007), 26% (2008), 22% (2009)
      Journal possibilities: no special issues; no special issues
      - Empirical Software Eng. & Measurement (ESEM): http://www.esem-conferences.org/
      Deadline: March. Accept rate: ? Journal possibilities: EMSE
  18. Other interesting conferences
      - Working Conference on Reverse Engineering (WCRE)
        - http://web.soccerlab.polymtl.ca/wcre2010/
      - International Conference on Predictive Models in Software Engineering (PROMISE)
        - http://promisedata.org/
      - European Conference on Software Maintenance and Re-engineering (CSMR)
        - http://www.sait.escet.urjc.es/csmr2010/
  19. Journals of interest
      - IEEE Transactions on Software Engineering (TSE)
        - http://www.computer.org/tse/
      - ACM Transactions on Software Engineering and Methodology (TOSEM)
        - http://tosem.acm.org/
      - Empirical Software Engineering (EMSE)
        - http://www.springerlink.com/content/1382-3256
      - Journal of Systems and Software (JSS)
        - http://www.elsevier.com/locate/jss
      - Journal of Software Maintenance and Evolution (JSME)
        - http://eu.wiley.com/WileyCDA/WileyTitle/productCd-SMR.html
  20. Handy links
      - Software Engineering Conferences
        - Verification, Formal Methods, Programming Lang. and Compilers, Web, Security
        - http://people.engr.ncsu.edu/txie/seconferences.htm
      - Upcoming Software Engineering Conferences Map
        - http://research.csc.ncsu.edu/ase/semap/
  22. Trending topics
      - Replication of empirical studies
        - The replication package
      - Recommendation systems
        - Automated Software Engineering
  23. 3. Tools for Mining Software Repositories
  24. Tools for Mining Software Repositories
      - Mining tools
        - Libresoft Tools: http://tools.libresoft.es/
        - CVSAnaly: CVS/SVN/Git repository log parser
        - MLStats: Mailman and mbox parser
        - Bicho: Bugzilla and SF.net tracker parser
      - Software Architecture Group (SWAG), University of Waterloo
        - http://www.swag.uwaterloo.ca/tools.html
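To give a feel for what a repository log parser does, here is a minimal Python sketch. It is not how CVSAnaly or any of the tools listed works; the `hash|author|date|subject` line layout, the sample data, and the function names are all assumptions made for the example. Text in this layout could be produced with something like `git log --pretty=format:'%H|%an|%aI|%s'`, but the sketch parses a hard-coded sample so that it runs on its own.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List

# Hypothetical sample in a "hash|author|ISO 8601 date|subject" layout.
SAMPLE_LOG = """\
a1b2c3d|alice|2010-06-18T04:34:00+00:00|Forgot to add new line
e4f5a6b|bob|2010-06-17T21:10:00+00:00|Initial import of hello.c
0c9d8e7|alice|2010-06-17T09:02:00+00:00|Add README
"""


def parse_log(text: str) -> List[Dict[str, object]]:
    """Turn each non-empty log line into a record with typed fields."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=3 keeps any "|" inside the subject intact; an author
        # name containing "|" would still break this simple layout.
        commit_id, author, date, subject = line.split("|", 3)
        commits.append({
            "id": commit_id,
            "author": author,
            "date": datetime.fromisoformat(date),
            "subject": subject,
        })
    return commits


def commits_per_author(commits: List[Dict[str, object]]) -> Counter:
    """Count how many commits each author made."""
    return Counter(c["author"] for c in commits)


commits = parse_log(SAMPLE_LOG)
for author, n in commits_per_author(commits).most_common():
    print(author, n)
```

Once the log is a list of records, questions such as who commits most or how activity evolves over time become simple aggregations, which is the kind of data the tools above typically store in a database.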
  28. 4. Datasets for Mining Software Repositories
  29. MSR Mining Challenge
      - Mirrors of the version archives and bug databases for Mozilla Firefox and Eclipse
        - http://msr.uwaterloo.ca/msr2008/challenge/
      - Repository logs of more than 500 Gnome projects, an XML dump of the bug databases, and the complete SVN repositories of 69 Gnome projects
        - http://msr.uwaterloo.ca/msr2009/challenge/
  30. Ultimate Debian Database
      - Database with information about packages and bug reports of Debian and Ubuntu
        - http://udd.debian.org/
  31. Eclipse bug database
      - Saarland University
      - Datasheets, databases, and scripts with information about Eclipse bug reports for several releases
      - http://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/
  34. FLOSSMetrics
      - Databases about ~5000 open source projects
      - Version control repositories, mailing list archives, bug tracking databases
      - MySQL dumps
        - Not very user friendly
      - Obtained using the Libresoft Tools
      - http://www.flossmetrics.org/
  38. FLOSSMole
      - Database with information about all the SourceForge.net projects
      - ~150,000 projects
      - Mainly meta-information, obtained by parsing the web pages of the projects
      - No low-level or fine-grained information
      - http://flossmole.org
  43. PROMISE repository
      - Authors of all PROMISE papers must also submit a package with the data used in the paper
      - http://promisedata.org/
      - 101 datasets
        - Defect prediction (58)
        - Effort prediction (18)
        - General (9)
        - Model-based SE (7)
        - Text mining (9)
