Mining Software Repositories

Mining Software Repositories. Where to get data? Where to publish?

Usage Rights

CC Attribution-ShareAlike License

    Mining Software Repositories: Presentation Transcript

    • Mining Software Repositories: What to do? And where to get data?
      • Israel Herraiz <[email_address]>
      • Universidad Alfonso X el Sabio
      • June 18th, 2010
    • Outline
      • What is Mining Software Repositories? What are repositories?
      • Conferences and journals of interest
        • And some words about trending topics
      • Tools for Mining Software Repositories
      • Datasets for Mining Software Repositories
        • For replicable and verifiable empirical studies
    • 1. What is Mining Software Repositories?
    • What is Mining Software Repositories?
      • MSR analyzes the rich data available in software repositories to uncover interesting and actionable information about software systems and projects.
      • Popular topic since 2004
        • MSR workshop, colocated with ICSE
        • Working Conference since 2008
    • What are repositories?
      • Anything that leaves a trail about any software development or maintenance activities
      • Also includes any software artifact
      • Typically
        • Version control systems
        • Bug tracking systems
        • Public communication tools (mailing lists)
    • Differences between artifact and repository
      • Artifact: the source code file (hello.c)
          #include <stdio.h>
          int main() {
            printf("Hello world");
            return 0;
          }
      • Repository: a change to an artifact (hello.c.diff) plus its meta-information
          - printf("Hello world");
          + printf("Hello world\n");
          Author: rms
          Date: 20100618 04:34 UTC
          Change: +1 -1
          Log: Forgot to add new line
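The "Change: +1 -1" summary on the slide above can be derived from the diff itself. The following is a minimal sketch, not the method of any particular tool: the function name `diff_stat` is hypothetical, and the sample diff assumes the fix appended a `\n` escape to the string, as the log message suggests.

```python
def diff_stat(diff_text):
    """Count added and removed lines in a diff fragment.

    Lines starting with '+' count as added, lines starting with '-' as
    removed. File header lines ('+++' and '---') are skipped. Limitation:
    a changed line whose own content begins with '--' or '++' would be
    mistaken for a header and skipped as well.
    """
    added = removed = 0
    for line in diff_text.splitlines():
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed


# The change from the slide (raw string keeps the C escape as written).
hello_diff = r'''- printf("Hello world");
+ printf("Hello world\n");'''

print("Change: +%d -%d" % diff_stat(hello_diff))
```

The author, date, and log message are not recoverable from the diff; they come from the repository's meta-information, which is the distinction the slide draws.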
    • 2. Conferences and journals of interest
    • Working conferences of interest
      • IEEE Int. Working Conf. Source Code Analysis & Manipulation (SCAM)
        • http://www.ieee-scam.org
      • IEEE Int. Working Conf. Mining Software Repositories (MSR)
        • http://msr.uwaterloo.ca
      • Deadlines: January (February for the challenge); April
      • Accept rate: 26% (2007), 38% (2008), 45% (2009); 19% (2008), 31% (2010)
      • Journal possibilities: JSS, SCP, EMSE, IEEE TSE
    • Conferences of interest
      • IEEE Int. Conf. Software Engineering (ICSE)
        • http://www.sbs.co.za/ICSE2010/
      • IEEE Int. Conf. Software Maintenance (ICSM)
        • http://icsm2010.upt.ro/
      • Deadlines: April, August, September
      • Accept rate: 15% (2008), 12% (2009), 14% (2010); 21% (2007), 26% (2008), 22% (2009)
      • Journal possibilities: no special issues; no special issues
      • Empirical Software Eng. & Measurement (ESEM)
        • http://www.esem-conferences.org/
        • Deadline: March
        • Accept rate: ?
        • Journal possibilities: EMSE
    • Other interesting conferences
      • Working Conference on Reverse Engineering (WCRE)
        • http://web.soccerlab.polymtl.ca/wcre2010/
      • International Conference on Predictive Models and Software Engineering (PROMISE)
        • http://promisedata.org/
      • European Conference on Software Maintenance and Re-engineering (CSMR)
        • http://www.sait.escet.urjc.es/csmr2010/
    • Journals of interest
      • IEEE Transactions on Software Engineering (TSE)
        • http://www.computer.org/tse/
      • ACM Transactions on Software Engineering and Methodology (TOSEM)
        • http://tosem.acm.org/
      • Empirical Software Engineering (EMSE)
        • http://www.springerlink.com/content/1382-3256
      • Journal of Systems and Software (JSS)
        • http://www.elsevier.com/locate/jss
      • Journal of Software Maintenance and Evolution (JSME)
        • http://eu.wiley.com/WileyCDA/WileyTitle/productCd-SMR.html
    • Handy links
      • Software Engineering Conferences
        • Verification, Formal Methods, Programming Lang. and Compilers, Web, Security
        • http://people.engr.ncsu.edu/txie/seconferences.htm
      • Upcoming Software Engineering Conferences Map
        • http://research.csc.ncsu.edu/ase/semap/
    • Trending topics
      • Replication of empirical studies
        • The replication package
      • Recommendation systems
        • Automated Software Engineering
    • 3. Tools for Mining Software Repositories
    • Tools for Mining Software Repositories
      • Mining tools
        • Libresoft Tools http://tools.libresoft.es/
        • CVSAnaly – CVS/SVN/Git repositories log parser
        • MLStats – Mailman and Mboxes parser
        • Bicho – Bugzilla and SF.net tracker parser
      • Software Architecture Group (SWAG) – University of Waterloo
        • http://www.swag.uwaterloo.ca/tools.html
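To give a feel for what a version control log parser does, here is a minimal sketch in Python. It is not how CVSAnaly or any of the tools above is implemented. It assumes the input text resembles the output of `git log --pretty=format:'@%H|%an|%aI' --numstat`; the function name `parse_log`, the `@` marker, and the sample data (including the shortened hashes) are made up for illustration.

```python
from collections import Counter

# Made-up sample, shaped like the assumed output of:
#   git log --pretty=format:'@%H|%an|%aI' --numstat
SAMPLE_LOG = """@a1b2c3|rms|2010-06-18T04:34:00+00:00
1\t1\thello.c

@d4e5f6|alice|2010-06-17T10:00:00+00:00
10\t0\tREADME
-\t-\tlogo.png
"""


def parse_log(text):
    """Turn the log text into a list of commit dictionaries."""
    commits = []
    for line in text.splitlines():
        if line.startswith("@"):
            # Header line: hash first, date last, author in between
            # (the author name itself may contain '|').
            sha, rest = line[1:].split("|", 1)
            author, date = rest.rsplit("|", 1)
            commits.append(
                {"sha": sha, "author": author, "date": date, "files": []}
            )
        elif line.strip() and commits:
            # numstat line: added<TAB>removed<TAB>path
            added, removed, path = line.split("\t", 2)
            # Binary files are reported with '-' instead of line counts.
            added = None if added == "-" else int(added)
            removed = None if removed == "-" else int(removed)
            commits[-1]["files"].append((path, added, removed))
    return commits


commits = parse_log(SAMPLE_LOG)
commits_per_author = Counter(c["author"] for c in commits)
for author, count in commits_per_author.items():
    print(author, count)
```

Once the log is in this form, it can be loaded into a database and combined with data from bug trackers or mailing lists.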
    • 4. Datasets for Mining Software Repositories
    • MSR Mining Challenge
      • Mirrors of the version archives and bug databases for Mozilla Firefox and Eclipse
        • http://msr.uwaterloo.ca/msr2008/challenge/
      • Repository logs of more than 500 Gnome projects, an XML dump of the bug databases, and the complete SVN repositories of 69 Gnome projects
        • http://msr.uwaterloo.ca/msr2009/challenge/
    • Ultimate Debian Database
      • Database with information about packages and bug reports of Debian and Ubuntu
        • http://udd.debian.org/
    • Eclipse bug database
      • Saarland University
      • Datasheets, databases, scripts, with information about Eclipse bug reports for several releases
      • http://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/
    • FLOSSMetrics
      • Databases about ~5000 open source projects
      • Version control repositories, mailing list archives, bug tracking databases
      • MySQL dumps
        • Not very user friendly
      • Obtained using the Libresoft Tools
      • http://www.flossmetrics.org/
    • FLOSSMole
      • Database with information about all the SourceForge.net projects
      • ~150,000 projects
      • Mainly metainformation, obtained through parsing the web pages of the projects
      • No low level or fine grained information
      • http://flossmole.org
    • PROMISE repository
      • All PROMISE papers must also submit a package with the data used in the paper
      • http://promisedata.org/
      • 101 datasets
        • Defect prediction (58)
        • Effort prediction (18)
        • General (9)
        • Model-based SE (7)
        • Text mining (9)