PyDriller: Python Framework for Mining Software Repositories

FSE 2018

Davide Spadini

Published in: Engineering

  1. PyDriller: Python Framework for Mining Software Repositories. Davide Spadini, Mauricio Aniche, Alberto Bacchelli
  2. PyDriller: Python Framework for Mining Software Repositories. Davide Spadini, Mauricio Aniche, Alberto Bacchelli. ishepard, @DavideSpadini
  3. What?
  4. Framework to analyse Git (and soon Mercurial) repositories
  5. Why?
  6. Existing frameworks
     • There are already many frameworks for Git
     • Generally, one for each programming language
       • Java -> JGit
       • Python -> GitPython
       • JavaScript -> nodegit
       • etc.
  7. So, why?
  8. How many commands does Git have?
     • > 20?
     • > 50?
     • > 100?
     • > 150?
     154!!
  9. PyDriller
     • Aim: to ease the extraction of information from Git repositories
     • What is supported:
       • analysing the history of a project
       • retrieving commit information (date, message, authors, etc.)
       • retrieving file information (diff, source code)
     • What is not supported:
       • writing to the repo (git pull, git push, git add, git commit, etc.)
  10. Demo
  11. Statistics
     • Everything is lazily evaluated, so you only “pay” for what you use.
       1. only commit information: immediate (as fast as git log)
       2. commit and file information: about 60 commits/s (1240 commits in 22 seconds)
       3. commit, file and metrics information: about 4 commits/s (1240 commits in ~5 min)
  12. Thank you for your support!
     • Some numbers:
       1. Downloaded approximately 4000 times
       2. 100 times in the last 2 weeks alone
     • Community driven
     • University of Zurich, TU Delft and University of Catania teach PyDriller in their MSR courses
     • SIG uses PyDriller in their quality assessments
  13. What’s next?
     • A company asked me to implement RepositoryMining().traverse_files()
     • Mercurial support
     • Ideas? Talk to me or submit a PR :)
  14. PyDriller
     • Source code: https://github.com/ishepard/pydriller
     • Doc: https://pydriller.readthedocs.io/en/latest/
     • Feel free to leave a star! :)
