Open Source Scientific Software

Brief talk at BrainHack 2013 on developing open-source scientific software: strategic issues on project positioning,

Published in: Technology, Art & Photos

Open Source Scientific Software

  1. Open source scientific software: what, why, & how. Gaël Varoquaux. Slides on slideshare.
  2. Please allow me to introduce myself. I’m a man of wealth and taste, I’ve been around for a long, long year. 2005..2007: experimental-control software (quantum physics, free-fall airplanes). 2006...: open source scientific Python (Mayavi, scikit-learn, joblib, nipy, nilearn...). 2008: consultant, scientific Python (startup: Enthought, Texas). Scipy/Euroscipy conference chair. G Varoquaux
  3. Open source scientific software. Part 1: What. [Figure labels: data, access, source, science]
  4. Part 1: Open Source definitions. Free redistribution. Access to source code. Allow derived work. No discrimination against persons or groups, or against fields of endeavor (FSL, I am looking at you; universities are commercial entities: Madey vs Duke). OSI: Open Source Initiative, http://opensource.org
  5. Part 1: Open Source definitions. Free redistribution. Access to source code. Allow derived work. No discrimination against persons or groups, or against fields of endeavor (FSL, I am looking at you; universities are commercial entities: Madey vs Duke). Open community: access to code repository, read & write (SPM, FreeSurfer... I am looking at you). OSI: Open Source Initiative, http://opensource.org
  6. Part 1: Choice of license. "Use it, don’t screw my users": BSD, MIT. Viral by code inclusion: LGPL. CopyLeft: GPL. Do you understand the consequences? GPL code cannot be linked to MKL; LGPL code can only be reused in GPL/LGPL code; code with no license cannot be used. http://opensource.org/licenses
  7. Part 1: Choice of license. "Use it, don’t screw my users": BSD, MIT. Viral by code inclusion: LGPL. CopyLeft: GPL. Do you understand the consequences? Don’t invent licenses: legalese should be left to lawyers. http://opensource.org/licenses
  8. Part 1: Choice of license. "Use it, don’t screw my users": BSD, MIT. Viral by code inclusion: LGPL. CopyLeft: GPL. Do you understand the consequences? Don’t invent licenses: legalese should be left to lawyers. Use BSD: foster the private sector; avoid legal difficulties; we need as much reuse as possible; science should not have strings attached. http://opensource.org/licenses
  9. Open source scientific software. Part 2: Why. How do we justify the investment to our bosses, to the funding agencies? (Source: www.phdcomics.com)
  10. Part 2: For the Good of Science. “if it’s not open and verifiable by others, it’s not science, or engineering, or whatever it is you call what we do” (Stodden, 2010). “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment.” (Buckheit & Donoho, 1995). Reproducible science.
  11. Part 2: For the Good of Science. “if it’s not open and verifiable by others, it’s not science, or engineering, or whatever it is you call what we do” (Stodden, 2010). “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment.” (Buckheit & Donoho, 1995). Reproducible science. These conclusions are high-level: we need more down-to-earth arguments.
  12. Part 2: Lab survival, beyond the oral tradition. Can you run the analysis of the lab’s former students? We need basic building blocks. More eyes make bugs shallow.
  13. Part 2: The economics. Code maintenance is expensive: scikit-learn ∼ 300 emails/month; nipy ∼ 45 emails/month; joblib ∼ 45 emails/month; mayavi ∼ 30 emails/month. “Hey Gael, I take it you’re too busy. That’s okay, I spent a day trying to install XXX and I think I’ll succeed myself. Next time though please don’t ignore my emails, I really don’t like it. You can say, ‘sorry, I have no time to help you.’ Just don’t ignore.”
  14. Part 2: The economics. Code maintenance is expensive: scikit-learn ∼ 300 emails/month; nipy ∼ 45 emails/month; joblib ∼ 45 emails/month; mayavi ∼ 30 emails/month. Your “benefits” come from a fraction of the code. Data loading? Standard algorithms? Share the common code... ...to avoid dying under code. Code becomes less precious with time, and somebody might contribute features.
  15. Part 2: Having an impact. To reach our target audience (neuroscientists, MD). To disseminate our ideas. To facilitate new ideas. Can bring citations.
  16. Open source scientific software. Part 3: How.
  17. Part 3: Choice of environment. Python, what else? High-level language: interactive (ipython), easy to debug, general purpose. Scientific computing environment: array computing (numpy), rich ecosystem (scipy, scikit-learn, scikit-image...).
  18. Part 3: 6 steps to a successful project. 1) Focus on quality. 2) Build great docs and examples. 3) Use github. 4) Limit the technicality of your codebase. 5) Releasing and packaging matter. 6) Focus on your contributors, give them credit. http://www.slideshare.net/GaelVaroquaux/scikit-learn-dveloppement-communautaire
  19. Part 3: Scikit-learn, a very successful project. General-purpose machine learning in Python. Over 200 contributors, ∼ 12 core devs. Huge feature list: benefits of a wide team. Success recipe: product vision, great docs, high-level. Documentation: all figures are generated. Crafting simple didactic examples has taught us a lot ⇒ executable docs = textbooks of the future.
  20. Part 3: Nilearn, making multivariate analysis routine (very preliminary). Project scope: machine learning for neuroimaging, make using scikit-learn on neuroimaging easy. The target user base is small. Examples in the docs: run out of the box, downloading open data; produce a clear figure (data from Miyawaki 2008). Routine, simple reproduction of papers.
  21. Open source scientific software: it’s worth it. Do it right: liberal licensing (BSD); realistic engineering compromises; quality and ease of use (the Apple strategy). Work with us on nilearn. Examples = open science. @GaelVaroquaux
  22. Open source: a tragedy. 1/f distribution. Source: Fernando Perez
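
Slide 19 argues that executable docs are the textbooks of the future. The sketch below illustrates that idea under stated assumptions: it uses Python's standard `doctest` module as a stand-in (the slides do not say which tooling scikit-learn uses to run its examples), and the function `center` is hypothetical, introduced only for illustration. The usage example in the docstring is both documentation and a check that gets executed.

```python
import doctest


def center(values):
    """Subtract the mean from a list of numbers (hypothetical helper).

    The example below is read by humans and executed by doctest:

    >>> center([1.0, 2.0, 6.0])
    [-2.0, -1.0, 3.0]
    """
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Collect the examples embedded in the docstring of `center` and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(center, name="center", globs={"center": center}):
    runner.run(test)

# `results` is a (failed, attempted) summary of the executed examples.
results = runner.summarize()
```

If the docstring example drifted away from what the code actually returns, `results.failed` would be non-zero, so the documentation cannot silently go stale.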