Open source scientific software
What, why, & how

Gaël Varoquaux

Slides on slideshare
Please allow me to introduce myself
I’m a man of wealth and taste
I’ve been around for a long, long year

2005..2007: Experimental-control software
Quantum physics, free-fall airplanes

2006... Open source scientific Python
Mayavi, scikit-learn, joblib, nipy, nilearn...

2008 Consultant, scientific Python
Startup: Enthought, Texas

Scipy/Euroscipy conference chair

G Varoquaux

Open source scientific software

1 What

[Figure: open data, open access, open source, open science]

1 Open Source: definitions
Free redistribution
Access to source code
Allow derived work
No discrimination against persons or groups /
against fields of endeavor
FSL, I am looking at you
Universities are commercial entities
(Madey vs Duke)
Open Community
Access to a code repository: read & write
SPM, FreeSurfer... I am looking at you
OSI: Open Source Initiative http://opensource.org

1 Choice of license
Use it, don’t screw my users: BSD, MIT
Viral by code inclusion: LGPL
Copyleft: GPL
Do you understand the consequences?
- GPL code cannot be linked to MKL
- LGPL code can only be reused in GPL/LGPL code
- Code with no license cannot be used
Don’t invent licenses
Legalese should be left to lawyers
Use BSD:
- foster the private sector
- avoid legal difficulties
- we need as much reuse as possible
- science should not have strings attached
http://opensource.org/licenses

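The point that code with no license cannot be used suggests a cheap check before each release. The sketch below assumes the SPDX convention of a `SPDX-License-Identifier:` comment near the top of each source file (the slides do not mention SPDX); the helper names `spdx_identifier` and `unlicensed_files` are hypothetical. It only checks that some identifier is declared, not that the chosen license is appropriate.

```python
import re
from pathlib import Path

# Assumption: files follow the SPDX convention, e.g.
#   # SPDX-License-Identifier: BSD-3-Clause
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+\-() ]+)")


def spdx_identifier(source, max_lines=5):
    """Return the SPDX identifier declared in the first lines of ``source``, or None."""
    for line in source.splitlines()[:max_lines]:
        match = SPDX_RE.search(line)
        if match:
            return match.group(1).strip()
    return None


def unlicensed_files(root):
    """List the .py files under ``root`` that declare no SPDX identifier."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*.py")
        if spdx_identifier(path.read_text(encoding="utf-8", errors="replace")) is None
    )
```

For example, `unlicensed_files("src")` (with `src` standing in for your package directory) lists the files to fix before tagging a release. Looking only at the first few lines keeps the check fast and avoids matching the string where it appears later in the code.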
Open source scientific software

2 Why

How do we justify the investment
to our bosses
to the funding agencies

www.phdcomics.com

2 For the Good of Science
“if it’s not open and verifiable by others, it’s not science, or engineering, or whatever it is you call what we do” Stodden, 2010
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment.”
Buckheit & Donoho, 1995
Reproducible science
These are high-level conclusions
Need more down-to-earth arguments

2 Lab survival: beyond the oral tradition

Can you run the analysis
of the lab’s former students?

We need basic building blocks
More eyes make bugs shallow

2 The economics
Code maintenance is expensive
scikit-learn ∼ 300 emails/month
nipy ∼ 45 emails/month
joblib ∼ 45 emails/month
mayavi ∼ 30 emails/month
“Hey Gael, I take it you’re too busy. That’s okay, I spent a day trying to install XXX and I think I’ll succeed myself. Next time though please don’t ignore my emails, I really don’t like it. You can say, ‘sorry, I have no time to help you.’ Just don’t ignore.”
Your “benefits” come from a fraction of the code
Data loading?
Standard algorithms?
Share the common code...
...to avoid dying under code
Code becomes less precious with time
And somebody might contribute features

2 Having an impact
To reach our target audience
(neuroscientists, MDs)
To disseminate our ideas
To facilitate new ideas
Can bring citations

Open source scientific software

3 How

3 Choice of environment
Python, what else?
High-level language
- interactive (IPython)
- easy to debug
- general purpose
Scientific computing environment
- array computing (numpy)
- rich ecosystem: scipy, scikit-learn, scikit-image...

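The “array computing” bullet is what numpy brings: operations apply to whole arrays rather than element by element in Python loops. A minimal sketch, using synthetic data and illustrative function names that are not from the talk, standardizes a signal both ways.

```python
import numpy as np

# Synthetic data for illustration only: a noisy 5 Hz sine sampled 1,000 times.
rng = np.random.default_rng(seed=0)
t = np.linspace(0.0, 1.0, 1000)
signal = np.sin(2 * np.pi * 5 * t) + 0.1 * rng.standard_normal(t.size)


def zscore_loop(values):
    """Standardize element by element with plain Python loops."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / variance ** 0.5 for v in values]


def zscore_array(values):
    """Same computation on whole arrays; numpy runs the loops in compiled code."""
    return (values - values.mean()) / values.std()


# Both versions agree numerically.
assert np.allclose(zscore_loop(signal), zscore_array(signal))
```

The array version is shorter, reads like the formula, and is typically much faster on large data. That combination of interactivity and performance is the argument for Python as a scientific environment.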
3 6 steps to a successful project
1 Focus on quality
2 Build great docs and examples
3 Use GitHub
4 Limit the technicality of your codebase
5 Releasing and packaging matter
6 Focus on your contributors, give them credit
http://www.slideshare.net/GaelVaroquaux/scikit-learn-dveloppement-communautaire

3 Scikit-learn: a very successful project
General-purpose machine learning in Python
Over 200 contributors
∼ 12 core devs
Huge feature list: benefits of a wide team
Success recipe: product vision, great docs, high-level
Documentation: all figures are generated
Crafting simple didactic examples has taught us a lot
⇒ Executable docs = textbooks of the future

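“Executable docs” can be illustrated at the smallest scale with Python’s standard `doctest` module, which runs the examples written in a docstring and reports any whose output no longer matches the code. This is an illustrative stand-in: the slide does not say which tooling generates scikit-learn’s figures, and `moving_average` is a hypothetical function.

```python
def moving_average(values, width):
    """Return the moving average of ``values`` over windows of ``width``.

    The examples below are executable documentation: doctest runs them
    and fails if the shown output drifts from the code's behaviour.

    >>> moving_average([1, 2, 3, 4, 5], 2)
    [1.5, 2.5, 3.5, 4.5]
    >>> moving_average([1, 2, 3], 3)
    [2.0]
    """
    if width < 1:
        raise ValueError("width must be a positive integer")
    # One average per full window; shorter inputs yield an empty list.
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


if __name__ == "__main__":
    import doctest
    doctest.testmod(verbose=True)
```

Because the examples are checked on every run, the documentation cannot silently go stale, which is the property the slide credits for making examples trustworthy teaching material.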
3 Nilearn: making multivariate analysis routine
Project scope
Very preliminary
Machine learning for neuroimaging:
make using scikit-learn on neuroimaging easy
The target user base is small
Examples in the docs
Run out of the box, downloading open data
Produce a clear figure
Data from Miyawaki 2008
Routine, simple reproduction of papers

Open source scientific software
It’s worth it
Do it right:
- Liberal licensing (BSD)
- Realistic engineering compromises
- Quality and ease of use (the Apple strategy)

Work with us on nilearn
Examples = open science

@GaelVaroquaux

Open source: a tragedy
1/f distribution
Source: Fernando Perez

TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 

Open Source Scientific Software

  • 1. Open source scientific software: What, why, & how. Gaël Varoquaux. Slides on SlideShare.
  • 2. Please allow me to introduce myself. I’m a man of wealth and taste. I’ve been around for a long, long year. 2005-2007: Experimental-control software (quantum physics, free-fall airplanes). 2006 onward: Open source scientific Python (Mayavi, scikit-learn, joblib, nipy, nilearn...). 2008: Consultant in scientific Python at a startup, Enthought (Texas). Scipy/Euroscipy conference chair.
  • 3. Open source scientific software. Part 1: What (diagram: data, access, source, science)
  • 4. 1 Open Source: definitions. Free redistribution. Access to source code. Allow derived work. No discrimination against persons or groups / against fields of endeavor. FSL, I am looking at you: universities are commercial entities (Madey vs Duke). OSI: Open Source Initiative, http://opensource.org
  • 5. 1 Open Source: definitions. Free redistribution. Access to source code. Allow derived work. No discrimination against persons or groups / against fields of endeavor. FSL, I am looking at you: universities are commercial entities (Madey vs Duke). Open community: a repository with read & write access to the code. SPM, FreeSurfer... I am looking at you. OSI: Open Source Initiative, http://opensource.org
  • 6. 1 Choice of license. Use it, don’t screw my users: BSD, MIT. Viral by code inclusion: LGPL. CopyLeft: GPL. Do you understand the consequences? GPL code cannot be linked to MKL. LGPL code can only be included in GPL/LGPL code. Code with no license cannot legally be reused. http://opensource.org/licenses
  • 7. 1 Choice of license. Use it, don’t screw my users: BSD, MIT. Viral by code inclusion: LGPL. CopyLeft: GPL. Do you understand the consequences? Don’t invent licenses: legalese should be left to lawyers. http://opensource.org/licenses
  • 8. 1 Choice of license. Use it, don’t screw my users: BSD, MIT. Viral by code inclusion: LGPL. CopyLeft: GPL. Use BSD: foster the private sector, avoid legal difficulties, we need as much reuse as possible, science should not have strings attached. Do you understand the consequences? Don’t invent licenses: legalese should be left to lawyers. http://opensource.org/licenses
  • 9. Open source scientific software. Part 2: Why. How do we justify the investment to our bosses? To the funding agencies? (Cartoon: www.phdcomics.com)
  • 10. 2 For the Good of Science. “if it’s not open and verifiable by others, it’s not science, or engineering, or whatever it is you call what we do” (Stodden, 2010). “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment.” (Buckheit & Donoho, 1995). Reproducible science.
  • 11. 2 For the Good of Science. “if it’s not open and verifiable by others, it’s not science, or engineering, or whatever it is you call what we do” (Stodden, 2010). “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment.” (Buckheit & Donoho, 1995). Reproducible science. These conclusions are high-level: we need more down-to-earth arguments.
  • 12. 2 Lab survival: beyond the oral tradition. Can you run the analysis of the lab’s former students? We need basic building blocks. More eyes make bugs shallow.
  • 13. 2 The economics. Code maintenance is expensive: scikit-learn ∼ 300 emails/month, nipy ∼ 45 emails/month, joblib ∼ 45 emails/month, mayavi ∼ 30 emails/month. “Hey Gael, I take it you’re too busy. That’s okay, I spent a day trying to install XXX and I think I’ll succeed myself. Next time though please don’t ignore my emails, I really don’t like it. You can say, ‘sorry, I have no time to help you.’ Just don’t ignore.”
  • 14. 2 The economics. Code maintenance is expensive: scikit-learn ∼ 300 emails/month, nipy ∼ 45 emails/month, joblib ∼ 45 emails/month, mayavi ∼ 30 emails/month. Your “benefits” come from a fraction of the code. Data loading? Standard algorithms? Share the common code... ...to avoid dying under code. Code becomes less precious with time, and somebody might contribute features.
  • 15. 2 Having an impact. To reach our target audience (neuroscientists, MDs). To disseminate our ideas. To facilitate new ideas. Can bring citations.
  • 16. Open source scientific software. Part 3: How
  • 17. 3 Choice of environment: Python, what else? High-level language: interactive (ipython), easy to debug, general purpose. Scientific computing environment: array computing (numpy), rich ecosystem (scipy, scikit-learn, scikit-image...).
  • 18. 3 6 steps to a successful project: 1. Focus on quality. 2. Build great docs and examples. 3. Use github. 4. Limit the technicality of your codebase. 5. Releasing and packaging matter. 6. Focus on your contributors, give them credit. http://www.slideshare.net/GaelVaroquaux/scikit-learn-dveloppement-communautaire
  • 19. 3 Scikit-learn: a very successful project. General-purpose machine learning in Python. Over 200 contributors, ∼ 12 core devs. Huge feature list: benefits of a wide team. Success recipe: product vision, great docs, high-level. Documentation: all figures are generated. Crafting simple didactic examples has taught us a lot ⇒ executable docs = textbooks of the future.
  • 20. 3 Nilearn: making multivariate analysis routine. Project scope (very preliminary): machine learning for neuroimaging; make using scikit-learn on neuroimaging easy. The target user base is small. Examples in the docs run out of the box, downloading open data, and produce a clear figure (data from Miyawaki 2008). Routine, simple reproduction of papers.
  • 21. Open source scientific software: it’s worth it. Do it right: liberal licensing (BSD); realistic engineering compromises; quality and ease of use (the Apple strategy). Work with us on nilearn. Examples = open science. @GaelVaroquaux
  • 22. Open source: a tragedy (figure: 1/f distribution; source: Fernando Perez)
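Slide 17 cites array computing with numpy as a reason to choose Python. As a minimal sketch (the functions `zscore_loop` and `zscore_numpy` are hypothetical, written only for illustration), here is the same per-column standardization written once with explicit Python loops and once with numpy whole-array operations:

```python
import numpy as np


def zscore_loop(data):
    """Standardize each column with explicit Python loops (verbose, slow).

    Hypothetical helper; assumes no constant column (a zero standard
    deviation would divide by zero).
    """
    n_rows, n_cols = len(data), len(data[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [data[i][j] for i in range(n_rows)]
        mean = sum(column) / n_rows
        # Population standard deviation (divide by n), like numpy's default
        std = (sum((x - mean) ** 2 for x in column) / n_rows) ** 0.5
        for i in range(n_rows):
            out[i][j] = (data[i][j] - mean) / std
    return out


def zscore_numpy(data):
    """Same computation expressed on whole arrays.

    Broadcasting subtracts the per-column mean (shape (n_cols,)) from
    every row of the (n_rows, n_cols) array; the loops run in compiled code.
    """
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    print(np.allclose(zscore_loop(X.tolist()), zscore_numpy(X)))
```

The numpy version states the computation once, at the level of arrays, which is the kind of high-level, easy-to-debug code the slide has in mind.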
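Slide 19 argues that executable documentation pays off. One lightweight way to get documentation that is checked by running it, using only the Python standard library, is a `doctest`: the example in a docstring is executed and its result compared with the documented output. This is a minimal sketch with a hypothetical `moving_average` helper; it illustrates the idea only and is not the tooling scikit-learn uses to build its documentation:

```python
def moving_average(values, window):
    """Return the moving average of ``values`` over ``window`` points.

    The examples below are executed by ``doctest``: if the code changes
    and the documented output no longer matches, the check fails.

    >>> moving_average([1, 2, 3, 4, 5], window=2)
    [1.5, 2.5, 3.5, 4.5]
    >>> moving_average([1, 2, 3], window=3)
    [2.0]
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # One average per full window position
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    import doctest

    # Runs every docstring example in this module; silent on success
    doctest.testmod()
```

Running the file directly checks the docstring examples; `python -m doctest -v file.py` does the same from the command line and reports each example.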