PyDriller: Python Framework for
Mining Software Repositories
Davide Spadini, Mauricio Aniche, Alberto Bacchelli
ishepard @DavideSpadini
What?
Framework to analyse Git (and soon Mercurial)
repositories
Why?
• There are already many frameworks for Git
• Generally, one for each programming language
• Java -> JGit
• Python -> GitPython
• JavaScript -> nodegit
• etc.
So, why?
How many commands does Git have?
• > 20?
• > 50?
• > 100?
• > 150?
154!!
PyDriller
• Aim: to ease the extraction of information from Git repositories
• What is supported:
• analysing the history of a project
• retrieving commit information (date, message, authors, etc.)
• retrieving file information (diff, source code)
• What is not supported:
• writing to the repo (git pull, git push, git add, git commit,
etc.)
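A minimal sketch of the read-only workflow listed above: walk the history and print, for each commit, its short hash, author, first message line and touched files. It assumes the PyDriller 1.x names that match the `RepositoryMining` class mentioned in these slides (`traverse_commits()`, `commit.hash`, `commit.msg`, `commit.author.name`, `commit.modifications`, `modification.filename`); newer releases rename some of these (for example `Repository` and `modified_files`), so check the documentation for the installed version. The helper names `summarize_commit` and `print_history` are hypothetical.

```python
def summarize_commit(commit):
    """Return a one-line summary of a commit-like object.

    Assumes PyDriller 1.x attribute names: hash, msg, author.name,
    and modifications (each with a filename).
    """
    files = ", ".join(m.filename for m in commit.modifications)
    # Keep only the subject line of the commit message.
    first_line = commit.msg.splitlines()[0] if commit.msg else ""
    return f"{commit.hash[:7]} {commit.author.name}: {first_line} [{files}]"


def print_history(repo_path):
    """Print a summary line for every commit in the repository at repo_path."""
    # Imported here so summarize_commit can be used without PyDriller installed.
    from pydriller import RepositoryMining  # assumption: PyDriller 1.x

    for commit in RepositoryMining(repo_path).traverse_commits():
        print(summarize_commit(commit))
```

Calling `print_history("path/to/repo")` only reads from the repository; nothing is written back, in line with the unsupported operations listed above.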
Demo
Statistics
• Everything is lazily evaluated, so you “pay” only for what you use.
1. only commit information:
immediate (like git log)
2. commit and file information:
about 60 commits/s (1240 commits in 22 seconds)
3. commit, file and metrics information:
about 4 commits/s (1240 commits in ~5 min)
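The cost model above can be illustrated with a small, library-independent sketch of lazy evaluation. This is not PyDriller's actual implementation; `LazyCommit`, `traverse` and the `load_files` callback are hypothetical names used only to show the idea that file data is computed on first access and then cached.

```python
class LazyCommit:
    """Hypothetical commit whose file data is loaded only on first access."""

    def __init__(self, sha, load_files):
        self.sha = sha
        self._load_files = load_files  # expensive callback, e.g. computing diffs
        self._files = None             # nothing loaded yet

    @property
    def files(self):
        # Run the expensive step once, then reuse the cached result.
        if self._files is None:
            self._files = self._load_files(self.sha)
        return self._files


def traverse(shas, load_files):
    """Yield commits one at a time; creating them does no file work."""
    for sha in shas:
        yield LazyCommit(sha, load_files)
```

Iterating over `traverse(...)` and reading only `sha` never calls `load_files`, which corresponds to case 1 above; touching `files` triggers the extra work of cases 2 and 3 for that commit only.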
Thank you for your support!
• Some numbers:
1. Downloaded approximately 4000 times
2. 100 times in the last 2 weeks alone
• Community driven
• University of Zurich, TU Delft and University of Catania teach
PyDriller in their MSR courses
• SIG uses PyDriller in their quality assessments
What’s next?
• A company asked me to implement
RepositoryMining().traverse_files()
• Mercurial support
• Ideas? Talk to me or submit a PR :)
PyDriller
• Source code: https://github.com/ishepard/pydriller
• Doc: https://pydriller.readthedocs.io/en/latest/
• Feel free to leave a star! :)
