Python Meetup Talk 21072009

Transcript

  • 1. Mining Software Repositories: Improving software. Pere Urbón Bayes, Data Management Group, Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, purbon@ac.upc.edu. July 2009. Sections: Introduction; Data Mining; And the results are; A vision over the present and the future.
  • 2. Index: Introduction; Data Mining; The results; The future.
  • 3. Introduction (subsections: Motivations, The Situation, Objectives). The problem: Companies need highly available and reliable software. Low-quality software harms both clients and producers. Unfortunately, avoiding defects is a difficult task. Project leaders need to keep an eye on many projects. Software engineers tend not to document software in depth. The complexity of software projects grows every day.
  • 4. The software development process
  • 5. Support tools. Tools used to support software development: version control server; bug tracker server; project management server; life cycle management software; ... Together these tools store a huge amount of information during the development process. Why not use this information to improve our software?
  • 6. Objective and Applications. Objectives: analyse the application of data mining technology to the data stored in support tools, with the aim of improving software quality; develop an experimental prototype tool. Applications: reduce the error rate; provide an unexploited source of documentation; provide a new source of support tools for IDEs.
  • 7. Data Mining (subsections: Introduction, The use of). Data mining: a type of database analysis that attempts to discover useful patterns or relationships in a group of data. The analysis uses advanced statistical methods, such as cluster analysis, and sometimes employs artificial intelligence or neural network techniques. A major goal of data mining is to discover previously unknown relationships among the data, especially when the data come from different databases.
  • 8. Methods. Types: Traditional Data Mining (K-Means, C4.5, Bayesian networks); Relational Data Mining (ILP, Markov logic networks, relational Bayesian methods, dependency networks). Categories: clusterers; classifiers; association rules; network models.
  • 10. Issue detection. Attributes shown: LOC, DefectAppearence2Month, RevisionsAuthor, LineAddedIRLAdd, ReportedI2Month, Revision2Month, LineAddedIRLDel, Revision3Month, Releases, AlterType, DefectAppearence3Month, ReportedI1Month, AgeMonths, ReportedI3Month, ReportedIssues, RevisionAge, Revision5Month, ReportedI5Month, DefectReleases, DefectAppearence5Month, Revision1Month, DefectAppearance1Month. Question: does this file contain an undetected error? The exact number of errors can be predicted too.
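The slide does not say how these attributes are computed. As a rough sketch only, assuming that names such as Revision1Month mean "revisions of the file in the last N months" and that AgeMonths is the age of the file, per-file attributes could be derived from a revision history along these lines. The function name, the 30-day month, the DistinctAuthors attribute and the sample history are all hypothetical.

```python
from datetime import datetime, timedelta


def revision_features(revisions, now, windows=(1, 2, 3, 5)):
    """Derive simple per-file attributes from (timestamp, author) pairs.

    Assumption: a "month" is approximated as 30 days, and the attribute
    names only mimic those on the slide; their real definitions are unknown.
    """
    features = {}
    for months in windows:
        cutoff = now - timedelta(days=30 * months)
        # Count the revisions that fall inside the trailing window.
        features["Revision%dMonth" % months] = sum(
            1 for when, _author in revisions if when >= cutoff
        )
    oldest = min(when for when, _author in revisions)
    features["AgeMonths"] = (now - oldest).days // 30
    # Hypothetical extra attribute: how many different people touched the file.
    features["DistinctAuthors"] = len({author for _when, author in revisions})
    return features


# Made-up history for a single file, for illustration only.
now = datetime(2009, 7, 21)
history = [
    (datetime(2009, 7, 10), "alice"),
    (datetime(2009, 6, 1), "bob"),
    (datetime(2009, 3, 15), "alice"),
    (datetime(2008, 12, 1), "carol"),
]
features = revision_features(history, now)
print(features)
```

One such dictionary per file, plus a label saying whether a defect was later reported, is the kind of row a classifier would be trained on.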
  • 11. Other types of objectives: predict bugs related to a software developer; predict bugs in software components. These techniques could be used in different areas: software understanding; software evolution; software visualization; change propagation; impact analysis; software complexity; fault prediction.
  • 12. And the results are (subsections: Error prediction, Software). Error prediction
    Statistic             Eclipse Project   Firefox Project
    Correctly classified  94.65%            94.822%
    Kappa                 0.893             0.8883
    Precision             0.9465            0.9482
    Recall                0.945             0.949
    AUC ROC               0.9682            0.9808

    Statistic             Eclipse-Firefox   Firefox-Eclipse
    Correctly classified  82.0065%          87.975%
    Kappa                 0.5976            0.7595
    Precision             0.818             0.894
    Recall                0.82              0.88
    AUC ROC               0.805             0.83
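For readers unfamiliar with the statistics reported here, the following minimal sketch shows how accuracy ("correctly classified"), Cohen's kappa, precision and recall follow from a binary confusion matrix. The counts are invented for illustration and are not the talk's data. The function name is hypothetical, and the figures on the slide may be averaged per class, which this sketch does not do. AUC ROC is not computed because it needs ranked scores rather than counts.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common classifier statistics from confusion-matrix counts.

    Assumes every denominator is non-zero (each class is both present
    and predicted at least once).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # of the files flagged, how many were defective
    recall = tp / (tp + fn)      # of the defective files, how many were flagged
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in sorted(metrics.items()):
    print("%-10s %.4f" % (name, value))
```

Kappa is useful next to raw accuracy because it discounts the agreement a classifier would reach by chance, which matters when defective files are rare.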
  • 13. The end App
  • 14. A vision over the present and the future (subsections: Software libraries, An envision). The Prototype. Software being used: Programming: Java. Database: MySQL and MonetDB. Data mining: Weka 3.6 and Proximity 4.3. XML: Apache Xerces 2.9.1. SVN, CVS: svnkit 1.3.0 for SVN; for CVS, the netbeans-cvs lib and a custom RCS file parser. Presentation: Prefuse Visualization Toolkit and Weka drawing facilities.
  • 15. Could Python give us the same? Machine learning: Orange: as of 1.0 this library has many interesting and useful methods for classification, regression and clustering; it is the most similar to Weka. PyML: only has classifier facilities. Shogun: only for Support Vector Machines. RPy: an interface to R. Databases: the most important relational databases are available via DB-API. ZODB: Zope Object Database. Metakit: an embedded database without a clearly defined paradigm. Pygr: Python graph database framework for bioinformatics.
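To illustrate the DB-API point, here is a small sketch that uses the standard-library sqlite3 module as a stand-in for the MySQL or MonetDB drivers mentioned in the talk. The table layout and the rows are hypothetical. Other DB-API drivers follow the same connect, cursor, execute pattern, though the placeholder style can differ (sqlite3 uses `?`, while MySQLdb uses `%s`).

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: one row per (file, revision) pair.
cur.execute(
    "CREATE TABLE revisions (path TEXT, revision INTEGER, author TEXT)"
)
cur.executemany(
    "INSERT INTO revisions VALUES (?, ?, ?)",
    [
        ("src/a.py", 41, "alice"),
        ("src/a.py", 42, "bob"),
        ("src/b.py", 42, "bob"),
    ],
)
conn.commit()

# Number of revisions per file, a typical input attribute for mining.
cur.execute(
    "SELECT path, COUNT(*) FROM revisions GROUP BY path ORDER BY path"
)
rows = cur.fetchall()
conn.close()

for path, count in rows:
    print(path, count)
```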
  • 16. Could Python give us the same? Presentation: Graph drawing: NetworkX, with nice results; there are some others, but they look incomplete. GUI: PyQt, wxWindows, PyGTK; it is a matter of taste XD! SVN, CVS processing: SVN: pysvn, a Python interface to Subversion. CVS: it seems nothing is available. Git: PyGit, Pythonic git bindings targeted towards porcelains. XML processing can be done with the built-in support and with any SAX or DOM parser.
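As a sketch of the built-in XML support, the snippet below reads a small log with the standard-library xml.etree.ElementTree module. The sample document is hand-written to resemble the output of `svn log --xml`; it is an assumption made for illustration, not data from the talk, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like `svn log --xml` output (illustrative only).
SAMPLE_LOG = """<log>
  <logentry revision="42">
    <author>alice</author>
    <date>2009-07-10T09:30:00.000000Z</date>
    <msg>Fix null check in parser</msg>
  </logentry>
  <logentry revision="43">
    <author>bob</author>
    <date>2009-07-11T14:05:00.000000Z</date>
    <msg>Add tests</msg>
  </logentry>
</log>"""

root = ET.fromstring(SAMPLE_LOG)

entries = []
for entry in root.findall("logentry"):
    revision = int(entry.get("revision"))   # attribute on <logentry>
    author = entry.findtext("author")       # text of the child element
    message = entry.findtext("msg")
    entries.append((revision, author, message))

for revision, author, message in entries:
    print("r%d by %s: %s" % (revision, author, message))
```

For very large logs an incremental parser, such as the SAX interface mentioned on the slide, avoids loading the whole document into memory.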
  • 17. The future. Known issues: data preprocessing performance; database performance (is the relational model valid?); dynamic procedure addition. To-do list: develop new procedures for related topics such as software visualization, change support, etc.; develop more mature software, where Python could help in some parts; make the software easily extensible; improve the performance of the whole process.
  • 18. The end. Questions? Pere Urbón Bayes, Data Management Group, Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, purbon@ac.upc.edu