Open source scientific software
What, why, & how

Gaël Varoquaux

—

Slides on slideshare
Please allow me to introduce myself
I’m a man of wealth and taste
I’ve been around for a long, long year

2005..2007: Experimental-control software
Quantum physics, free-fall airplanes

2006... Open source scientific Python
Mayavi, scikit-learn, joblib, nipy, nilearn...

2008 Consultant, scientific Python
Startup: Enthought, Texas

Scipy/Euroscipy conference chair

G Varoquaux

2
Open source scientific software

1 What

[Figure: data, access, source, science]
1 Open Source: definitions
Free redistribution
Access to source code
Allow derived work
No discrimination against persons or groups /
against fields of endeavor
FSL, I am looking at you
Universities are commercial entities
(Madey vs Duke)
OSI: Open Source Initiative http://opensource.org
Open community
Access to a code repository: read & write
SPM, FreeSurfer... I am looking at you
1 Choice of license
Use it, don’t screw my users: BSD, MIT
Viral by code inclusion: LGPL
Copyleft: GPL
Do you understand the consequences?
- GPL code cannot be distributed linked to MKL
- LGPL code can only be copied into GPL/LGPL code
- Code with no license cannot be used
http://opensource.org/licenses
Don’t invent licenses
Legalese should be left to lawyers
Use BSD:
- foster private sector
- avoid legal difficulties
- we need as much reuse as possible
- science should not have strings attached
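The remark that code with no license cannot be used can be checked mechanically. The sketch below is only an illustration and is not taken from the talk: it assumes a project convention where each source file carries an `SPDX-License-Identifier` comment in its first lines, and the helper names (`find_license`, `unlicensed_files`, `demo`) are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Assumed convention (not from the slides): an SPDX tag near the top of each file.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(\S+)")


def find_license(path, max_lines=5):
    """Return the SPDX identifier found in the first lines of `path`, or None."""
    with open(path, encoding="utf-8") as f:
        # zip() stops after max_lines, so only the file header is scanned.
        for _, line in zip(range(max_lines), f):
            match = SPDX_TAG.search(line)
            if match:
                return match.group(1)
    return None


def unlicensed_files(root, pattern="*.py"):
    """List the files under `root` that declare no license."""
    return sorted(p for p in Path(root).rglob(pattern) if find_license(p) is None)


def demo():
    """Build a throwaway project with one tagged and one untagged file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "tagged.py").write_text(
            "# SPDX-License-Identifier: BSD-3-Clause\nx = 1\n", encoding="utf-8"
        )
        (root / "untagged.py").write_text("x = 2\n", encoding="utf-8")
        missing = [p.name for p in unlicensed_files(root)]
        return missing, find_license(root / "tagged.py")
```

Calling `demo()` reports `untagged.py` as the only file without a license declaration. Using a standard identifier such as `BSD-3-Clause` rather than custom wording is in line with "Don’t invent licenses".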
Open source scientific software

2 Why

How do we justify the investment
to our bosses
to the funding agencies

www.phdcomics.com

2 For the Good of Science
“if it’s not open and
verifiable by others, it’s
not science, or engineering,
or whatever it is you call
what we do” Stodden, 2010
“An article about computational science in a scientific
publication is not the scholarship itself, it is merely
advertising of the scholarship. The actual scholarship is
the complete software development environment.”
Buckheit & Donoho, 1995
Reproducible science
These are high-level conclusions
Need more down-to-earth arguments
2 Lab survival: beyond the oral tradition

Can you run the analysis
of the lab’s former students?

We need basic building blocks
More eyes make bugs shallow

2 The economics
Code maintenance is expensive
scikit-learn ∼ 300 email/month
nipy ∼ 45 email/month
joblib ∼ 45 email/month
mayavi ∼ 30 email/month
“Hey Gael, I take it you’re too
busy. That’s okay, I spent a day
trying to install XXX and I think
I’ll succeed myself. Next time
though please don’t ignore my
emails, I really don’t like it. You
can say, ‘sorry, I have no time to
help you.’ Just don’t ignore.”
Your “benefits” come from a fraction of the code
Data loading?
Standard algorithms?
Share the common code...
...to avoid dying under code
Code becomes less precious with time
And somebody might contribute features
2 Having an impact
To reach our target audience
(neuroscientists, MD)
To disseminate our ideas
To facilitate new ideas
Can bring citations

Open source scientific software

3 How

3 Choice of environment
Python, what else?
High-level language
- interactive: ipython
- easy to debug
- general purpose
Scientific computing environment
- array-computing: numpy
- rich ecosystem: scipy, scikit-learn, scikit-image...
3 6 steps to a successful project
1 Focus on quality
2 Build great docs and examples
3 Use github
4 Limit the technicality of your codebase
5 Releasing and packaging matter
6 Focus on your contributors, give them credit
http://www.slideshare.net/GaelVaroquaux/scikit-learn-dveloppement-communautaire
3 Scikit-learn: a very successful project
General-purpose machine learning in Python
Over 200 contributors
∼ 12 core devs

Huge feature list: benefits of wide team
Success recipe: product vision, great docs, high-level
Documentation: all figures are generated
Crafting simple didactic examples has taught us a lot
⇒ Executable docs
= textbooks of the future
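A minimal sketch of what "executable docs" can mean in practice. It uses the standard-library `doctest` module as a stand-in and is not how the scikit-learn documentation described above is built. The function `zscore` and the helper `run_examples` are hypothetical names chosen for illustration.

```python
import doctest


def zscore(values):
    """Standardize a list of numbers to zero mean and unit variance.

    The example below is documentation and test at once:

    >>> zscore([1.0, 2.0, 3.0])
    [-1.22, 0.0, 1.22]
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divide by n, not n - 1).
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [round((v - mean) / std, 2) for v in values]


def run_examples(func):
    """Execute the examples in a function's docstring.

    Returns a (failed, attempted) pair of counts.
    """
    parser = doctest.DocTestParser()
    # The function is made available to its own examples through `globs`.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

If `zscore` ever changes behavior, `run_examples(zscore)` reports a failure, so the documented example cannot silently drift from the code.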
3 Nilearn: making multivariate analysis routine
(Very preliminary)
Project scope
Machine learning for neuroimaging:
make using scikit-learn on neuroimaging easy
The target user base is small
Examples in the docs
- Run out of the box, downloading open data
- Produce a clear figure
[Figure: data from Miyawaki 2008]

Routine, simple reproduction of papers
Open source scientific software
It’s worth it
Do it right:
- Liberal licensing (BSD)
- Realistic engineering compromises
- Quality and ease of use (the Apple strategy)

Work with us on nilearn
Examples = open science

@GaelVaroquaux

Open source: a tragedy

1/f distribution

Source: Fernando Perez
