SlideShare a Scribd company logo
1 of 86
Download to read offline
Computational practices for reproducible science
Ga¨el Varoquaux
import science
science.reproduce()
Computational practices for reproducible science
Ga¨el Varoquaux
This talk:
High-level advice on code in science
Pointers to good software practices
Computational practices for reproducible science
Ga¨el Varoquaux
Outline:
1 Coding for science, reproducibly
2 Making libraries
3 From software back to science
Where I come from
PhD in physics
Computers:
Matlab
Instrument control
Python for Science:
Startup guy
Mayavi
scikit-learn
joblib
nilearn
Applied maths / machine
learning for neuroscience
G Varoquaux 2
1 Coding for science,
reproducibly
G Varoquaux 3
1 Computational reproducibility
Computational reproducibility:
Automate everything
Control the environment
G Varoquaux 4
1 Automate everything
Just a simple matter of programming
G Varoquaux 5
1 Automate everything
Just a simple matter of programming
Scientific workflow:
Based on intuition
and experimentation
⇒ Iterative
Conjecture
Experiment
G Varoquaux 6
1 Automate everything
Just a simple matter of programming
Scientific workflow:
Based on intuition
and experimentation
⇒ Iterative
Conjecture
Experiment
needs consolidation
keeping flexibility
G Varoquaux 7
1 Automate everything
Just a simple matter of programming
Scientific workflow:
Based on intuition
and experimentation
⇒ Iterative
Conjecture
Experiment
needs consolidation
keeping flexibility
Lost in iterations
G Varoquaux 8
1 Reproducible computational science
Harder than it seemed
Impediments
Missing steps / files
Libraries have changed
Non portable code
Statistical / numerical instabilities
No one knows where the info is
G Varoquaux 9
1 Reproducible computational science
Harder than it seemed
Impediments
Missing steps / files
Libraries have changed
Non portable code
Statistical / numerical instabilities
No one knows where the info is
Technical
Human
Code quality matters
Manual steps are evil
G Varoquaux 9
1 How I work progressive consolidation
Start with a script playing to understand the problem
G Varoquaux 10
1 How I work progressive consolidation
Start with a script playing to understand the problem
Identify blocks/operations ⇒ move to a function
Use functions
Obstacle: local scope
requires identifying input and output variables
That’s a good thing
Interactive debugging / understanding
inside a function: %debug in IPython
Functions are the basic reusable abstraction
G Varoquaux 10
1 How I work progressive consolidation
Start with a script playing to understand the problem
Identify blocks/operations ⇒ move to a function
As they stabilize, move to a module
Modules
enable sharing between experiments
⇒ avoid 1000 lines scripts + commented code
enable testing
Fast experiments as tests
⇒ gives confidence, hence refactorings
G Varoquaux 10
1 How I work progressive consolidation
Start with a script playing to understand the problem
Identify blocks/operations ⇒ move to a function
As they stabilize, move to a module
Clean: delete code & files you have version control
Attentional load makes it impossible
to find or understand things
Where’s Waldo?G Varoquaux 10
1 How I work progressive consolidation
Start with a script playing to understand the problem
Identify blocks/operations ⇒ move to a function
As they stabilize, move to a module
Clean: delete code & files you have version control
Why is it hard?
Long compute times
make us unadventurous
Know your tools
Refactoring editor
Version control
G Varoquaux 10
1 Mitigating long compute times
by storing intermediate results
Makefiles
Accessible intermediate results
Break control flow & error management
G Varoquaux 11
1 Mitigating long compute times
by storing intermediate results
Makefiles
Accessible intermediate results
Break control flow & error management
Memoization / caching
g = mem.cache(f) # ”joblib” Python module
b = g(a) # computes a using f
c = g(a) # retrieves results from store
Fits in coding / experimentation loop
Black-boxy, persistence only implicit
Discourages refactoring (avoid recomputing)
G Varoquaux 11
1 A design pattern in computational experiments
MVC pattern (from Wikipedia
Model
Manages the data
and rules of the
application
View
Output represen-
tation
Possibly several views
Controller
Accepts input
and converts it to
commands
for model and view
Photo-editing software
Filters Canvas Tool palettes
Typical web application
Database Web-pages URLs
G Varoquaux 12
1 A design pattern in computational experiments
MVC pattern (from Wikipedia
Model
Manages the data
and rules of the
application
View
Output represen-
tation
Possibly several views
Controller
Accepts input
and converts it to
commands
for model and view
For data science:
Numerical, data-
processing, & ex-
perimental logic
Results, as files.
Data & plots
Imperative API
Avoid input as files:
not expressive
Module
with functions
Post-processing script
CSV & data files
Script
⇒ for loops
G Varoquaux 12
1 A design pattern in computational experiments
MVC pattern (from Wikipedia
Model
Manages the data
and rules of the
application
View
Output represen-
tation
Possibly several views
Controller
Accepts input
and converts it to
commands
for model and view
For data science:
Numerical, data-
processing, & ex-
perimental logic
Results, as files.
Data & plots
Imperative API
Avoid input as files:
not expressive
Module
with functions
Post-processing script
CSV & data files
Script
⇒ for loops
A recipe
3 types of files:
• modules • command scripts • post-processing scripts
CSVs & intermediate data files
Separate computation from analysis / plotting
Code and text (and data) ⇒ version control
G Varoquaux 12
1 A design pattern in computational experiments
MVC pattern (from Wikipedia
Model
Manages the data
and rules of the
application
View
Output represen-
tation
Possibly several views
Controller
Accepts input
and converts it to
commands
for model and view
For data science:
Numerical, data-
processing, & ex-
perimental logic
Results, as files.
Data & plots
Imperative API
Avoid input as files:
not expressive
Module
with functions
Post-processing script
CSV & data files
Script
⇒ for loops
A recipe
3 types of files:
• modules • command scripts • post-processing scripts
CSVs & intermediate data files
Separate computation from analysis / plotting
Code and text (and data) ⇒ version control
Decouple steps
Goals: Reuse code
Mitigate compute time
G Varoquaux 12
Adopting software-engineering best practices
G Varoquaux 13
1 The ladder of code quality
Use static checking in your editor seriously
G Varoquaux 14
1 The ladder of code quality
Use static checking in your editor seriously
Coding convention, good naming
“Any fool can write code that a computer
can understand. Good programmers write
code that humans can understand”
– Martin Fowler
Your future self will thank you
G Varoquaux 14
1 The ladder of code quality
Use static checking in your editor seriously
Coding convention, good naming
Version control Use git + github
Code review
Unit testing
If it’s not tested, it’s broken or soon will be
Make a package
controlled dependencies and compilation
...
G Varoquaux 14
1 The ladder of code qualityIncreasingcost
?
Use static checking in your editor seriously
Coding convention, good naming
Version control Use git + github
Code review
Unit testing
If it’s not tested, it’s broken or soon will be
Make a package
controlled dependencies and compilation
...
Premature software engineering
is the root of reproducible science evil
G Varoquaux 14
1 The ladder of code qualityIncreasingcost
?
Use static checking in your editor seriously
Coding convention, good naming
Version control Use git + github
Code review
Unit testing
If it’s not tested, it’s broken or soon will be
Make a package
controlled dependencies and compilation
...
Premature software engineering
is the root of reproducible science evil
Over versus under engineering
Our goal is scientific discovery
Experimentation to develop intuitions
⇒ new ideas
As the path becomes clear: consolidation
Heavy engineering too early freezes bad ideas
G Varoquaux 14
1 LibrariesIncreasingcost
?
Use static checking in your editor seriously
Coding convention, good naming
Version control Use git + github
Code review
Unit testing
If it’s not tested, it’s broken or soon will be
Make a package
controlled dependencies and compilation
...
A library
G Varoquaux 15
1 LibrariesIncreasingcost
?
Use static checking in your editor seriously
Coding convention, good naming
Version control Use git + github
Code review
Unit testing
If it’s not tested, it’s broken or soon will be
Make a package
controlled dependencies and compilation
...
A library
Code = Software
Well-defined problem = research
Functions should work as long as input is valid
Targets users, not the authors
Made for reuse
The workhorse of reproducibility
G Varoquaux 15
1 Libraries enable reproducible science
Reproducibility
Rerun and come to the same conclusion
An argument for copying all scripts
for each paper Better: a tag in git
G Varoquaux 16
1 Libraries enable reproducible science
Reproducibility
Rerun and come to the same conclusion
An argument for copying all scripts
for each paper Better: a tag in git
Copying scripts scales very poorly:
Accumulation of half-dead code ⇒ cognitive overload
No consolidation across studies
Each study has different variants of different bugs
Garden of forking code
G Varoquaux 16
1 Libraries enable reproducible science
Reproducibility
Rerun and come to the same conclusion
An argument for copying all scripts
for each paper Better: a tag in git
Kirstie Whitaker
G Varoquaux 16
1 Libraries enable reproducible science
Reproducibility
Rerun and come to the same conclusion
An argument for copying all scripts
for each paper
Frozen food
Reusability
Apply the approach to a new problem
Being able to understand, modify,
run in new settings
Generalization is the true scientific test
Reusable computational science = libraries
G Varoquaux 16
2 Making libraries
to enable reproducibility
G Varoquaux 17
2 Pick the right battles: project scope
Project idea
A software implementing:
i) machine learning
and ii) neuroimaging
and iii) a graphical user interface
and iv) 3D plotting
G Varoquaux 18
2 Pick the right battles: project scope
Project idea
A software implementing:
i) machine learning
and ii) neuroimaging
and iii) a graphical user interface
and iv) 3D plotting
G Varoquaux 18
2 Pick the right battles: project scope
Project idea
A software implementing:
i) machine learning
and ii) neuroimaging
and iii) a graphical user interface
and iv) 3D plotting
Define project scope and vision
Break down projects by expertise
Don’t solve hard problems
Know the software landscape
Need a vision = elevator pitch
G Varoquaux 18
2 Pick the right battles: project scope
Project idea
A software implementing:
i) machine learning
and ii) neuroimaging
and iii) a graphical user interface
and iv) 3D plotting
Define project scope and vision
Break down projects by expertise
Don’t solve hard problems
Know the software landscape
Need a vision = elevator pitch
Your research (PhD) probably does not qualify
⇒ need to cherry-pick contributions
G Varoquaux 18
2 Short cycles, limited ambitions
Keep coming back to your users
Release early, release often
G Varoquaux 19
2 Simplicity
Complexity increase superlinearly
[An Experiment on Unit Increase in Problem Complexity,
Woodfield 1979]
25% increase in problem complexity
⇒ 100% increase in code complexity
G Varoquaux 20
2 Simplicity
Complexity increase superlinearly
[An Experiment on Unit Increase in Problem Complexity,
Woodfield 1979]
25% increase in problem complexity
⇒ 100% increase in code complexity
The 80/20 rule
80% of the usecases can be solved
with 20% of the lines of code
Avoid feature creep
G Varoquaux 20
2 Quality
Quality is free
This is a book, by Philip Crosby
G Varoquaux 21
2 You need quality
Quality will give you users
Bugs give you bad rap
Quality will give you developers
Contribute to learn and improve
Quality avoids scientific errors
Do less, do better
Goes against the grant-system incentive
G Varoquaux 22
2 Quality: what & how
Great documentation
Simplify, but don’t dumb down
Focus on what the user is trying to solve
Great APIs
Example-based development
If something is hard to explain, rethink the concepts
Limit the number of different concepts and objects
Good numerics
Write tests based on mathematical properties
When a user finds an instability, write a new test
G Varoquaux 23
2 Quality: what & how
Great documentation
Simplify, but don’t dumb down
Focus on what the user is trying to solve
Great APIs
Example-based development
If something is hard to explain, rethink the concepts
Limit the number of different concepts and objects
Good numerics
Write tests based on mathematical properties
When a user finds an instability, write a new test
Quality enables reuse
Beyond mere reproducibility
G Varoquaux 23
2 Quality assurance
Code review: pull requests
Can include newcomers
We read each others code
Everything is discussed:
- Should the algorithm go in?
- Are there good defaults?
- Are names meaningfull?
- Are the numerics stable?
- Could it be faster?
G Varoquaux 24
2 Quality assurance
Unit testing
Everything is tested
Great for numerics
Overall tests enforce on all estimators
- consistency with the API
- basic invariances
- good handling of various inputs
G Varoquaux 25
2 Dependencies
Beyond installation, a challenge is to ensure package
versions play way together: correctness of the code
Breakage of backward compability
yields irreconcilable dependencies
G Varoquaux 26
2 Dependencies and their upgrade
It’s a fact: users hate upgrading
If it ain’t broken, don’t fix it
even if it is, apparently
G Varoquaux 27
2 Declaring undependence?
Monolythic packages with no dependencies...
But:
Scaling is hard
Complexity grows as square of codebase size
[Woodfield 1979]
User support grows with userbase size
G Varoquaux 28
3 From software back to science
G Varoquaux 29
3 Beyond computational reproducibility
Make every computational step reproducible,
and good science will emerge
G Varoquaux 30
3 Beyond computational reproducibility
Make every computational step reproducible,
and good science will emerge
Estimating the reproducibility of psychological science
[Science 2015] 36% of effects replicate
Reasons:
Statistical challenges — analysis degrees of freedom
Weak insentives — winner’s curse in publication
Seldom computational reproducibility
G Varoquaux 30
In practice, the best way to improve research
is to use the right (conceptual) tools.
G Varoquaux 31
3 Managing complexity
In practice, the best way to improve research
is to use the right (conceptual) tools.
The everyday roadblock is cognitive load
Machine learning, brain anatomy, psychology
R, Python, shell scripts
Funding agencies, reviewer 3, the Manuscript
G Varoquaux 32
3 Tools that reduce cognitive overload
It’s a design problem
G Varoquaux 33
3 Tools that reduce cognitive overload
Jonathan Ive, an industrial designer, is #4 at Apple
Code different.
G Varoquaux 34
3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 35
3 Some API design principles for the scipy stack
Consistency, consistency, consistency
np.save(file, obj) pickle.dump(obj, file)
fmin(...maxiter=10) lsq linear(...max iter=10)
Creates cognitive overload
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 36
3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
Objects have hidden states,
Objects have no universal interface, entry point, output
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 37
3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
How much do usage patterns carry out across the library?
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 38
3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Facilitates working with multiple libraries together
Easier to get up to speed with a given library
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 39
3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Change of behavior depending on input type
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 40
3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Interfaces define objects
Incompatible behaviors lead to bugs (eg np.matrix)
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 41
3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Properties obfuscate the data model of the object
Properties can create hidden compute costs
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 42
3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Objects are understood by their surface
Composition creates cognitive overload
Error messages matter
Be Pythonic
G Varoquaux 43
3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Explain the problem
Print the offending value
Be Pythonic
G Varoquaux 44
3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
Avoid syntax hacks
G Varoquaux 45
3 Scikit-learn API
Scikit-learn cheat sheet
Scikit-learn
Fit and predict
>>> estimator = Estimator(param1=param1)
>>> estimator.fit(X train, y train)
>>> y test = estimator.predict(X test)
Transform data
>>> X red = estimator.transform(X test)
G Varoquaux 46
3 Scikit-learn API
Scikit-learn cheat sheet
Scikit-learn
Fit and predict
>>> estimator = Estimator(param1=param1)
>>> estimator.fit(X train, y train)
>>> y test = estimator.predict(X test)
Transform data
>>> X red = estimator.transform(X test)
The estimator is a “contract”
(slightly more elaborate than above)
It has created an ecosystem of packages
Based on duck-typing, not inheritence
G Varoquaux 46
3 Example-driven development
The 3-liner as the new cool
Teaching others
Teaching yourself
Write examples that solve end problems
Iterate on your API until these are simple
G Varoquaux 47
3 Example-driven development
The 3-liner as the new cool
Teaching others
Teaching yourself
Write examples that solve end problems
Iterate on your API until these are simple
User flow on the scikit-learn website:
Examples
G Varoquaux 47
3 Example-driven development
The 3-liner as the new cool
Teaching others
Teaching yourself
Write examples that solve end problems
Iterate on your API until these are simple
User flow on the nilearn website:
Examples
G Varoquaux 47
3 Building great documentation
Focus on explaining concepts (hint: write plain English)
Less is more: prioritize, avoid redundancy
Code examples must be short (link to full tutorial examples)
Links everywhere: users will land at the wrong place
Teach with the docs
Plan for maintenance of docs:
Continuous integration
Check links
Runs examples
Doctests
G Varoquaux 48
3 Reusable science
scikit-learn is the new machine-learning textbook
nilearn is the new neuroimaging review article
Experiments reproduced
at each commit
eg: brain reading
nilearn.github.io/auto examples/02 decoding/plot miyawaki reconstruction.html
G Varoquaux 49
3 Reusable science
scikit-learn is the new machine-learning textbook
nilearn is the new neuroimaging review article
Experiments reproduced
at each commit
eg: brain reading
nilearn.github.io/auto examples/02 decoding/plot miyawaki reconstruction.html
Resource intensive Continuous integration:
Data ⇒ Fight for good open data
Computation ⇒ Find good algorithms and tradeoffs
Forces us to distill the literature (as a review)
G Varoquaux 49
3 Reusable science
scikit-learn is the new machine-learning textbook
nilearn is the new neuroimaging review article
Experiments reproduced
at each commit
eg: brain reading
nilearn.github.io/auto examples/02 decoding/plot miyawaki reconstruction.html
Package development consolidates
science and moves it outside the lab
G Varoquaux 49
3 Reusable science
scikit-learn is the new machine-learning textbook
nilearn is the new neuroimaging review article
Experiments reproduced
at each commit
eg: brain reading
nilearn.github.io/auto examples/02 decoding/plot miyawaki reconstruction.html
Package development consolidates
science and moves it outside the lab
scipy-lectures: living book for Python in science
G Varoquaux 49
Coding as a scientist
Final code should be auditable,
ideally reusable
Tension between interactive computing
& automating
Main enemy: cognitive overload
G Varoquaux 50
Coding as a scientist
Final code should be auditable,
ideally reusable
Tension between interactive computing
& automating
Main enemy: cognitive overload
In the industry
Reusable
Verifiable? Not for silicon valley,
but in insurance, healthcare, banking...
Moving data-scientist code
to production?
Software projects going over budget?
G Varoquaux 50
@GaelVaroquaux
Computational practices for reproducible science
Software-engineering skills are superpowers
“text-book” software engineering (yet never taught)
Make you more productive
@GaelVaroquaux
Computational practices for reproducible science
Software-engineering skills are superpowers
“text-book” software engineering (yet never taught)
Make you more productive
But beware of over-engineering
In software development, keep an eye on the end goal
Science is iterative, not every step must be reproduced
@GaelVaroquaux
Computational practices for reproducible science
Software-engineering skills are superpowers
“text-book” software engineering (yet never taught)
Make you more productive
But beware of over-engineering
In software development, keep an eye on the end goal
Science is iterative, not every step must be reproduced
Science is made by humans, not computers
Usability, communication, human factors
Design for reuse & generalization
@GaelVaroquaux
Computational practices for reproducible science
Software-engineering skills are superpowers
“text-book” software engineering (yet never taught)
Make you more productive
But beware of over-engineering
In software development, keep an eye on the end goal
Science is iterative, not every step must be reproduced
Science is made by humans, not computers
Usability, communication, human factors
Design for reuse & generalization
Reproducible research must borrow
from agile development

More Related Content

What's hot

Python for brain mining: (neuro)science with state of the art machine learnin...
Python for brain mining: (neuro)science with state of the art machine learnin...Python for brain mining: (neuro)science with state of the art machine learnin...
Python for brain mining: (neuro)science with state of the art machine learnin...Gael Varoquaux
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingGrigory Sapunov
 
Scikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonScikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonMicrosoft
 
Data Science Accelerator Program
Data Science Accelerator ProgramData Science Accelerator Program
Data Science Accelerator ProgramGoDataDriven
 
Automated Program Repair, Distinguished lecture at MPI-SWS
Automated Program Repair, Distinguished lecture at MPI-SWSAutomated Program Repair, Distinguished lecture at MPI-SWS
Automated Program Repair, Distinguished lecture at MPI-SWSAbhik Roychoudhury
 
Automated Program Repair Keynote talk
Automated Program Repair Keynote talkAutomated Program Repair Keynote talk
Automated Program Repair Keynote talkAbhik Roychoudhury
 

What's hot (9)

Binary Analysis - Luxembourg
Binary Analysis - LuxembourgBinary Analysis - Luxembourg
Binary Analysis - Luxembourg
 
Python for brain mining: (neuro)science with state of the art machine learnin...
Python for brain mining: (neuro)science with state of the art machine learnin...Python for brain mining: (neuro)science with state of the art machine learnin...
Python for brain mining: (neuro)science with state of the art machine learnin...
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image Processing
 
NUS PhD e-open day 2020
NUS PhD e-open day 2020NUS PhD e-open day 2020
NUS PhD e-open day 2020
 
Scikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonScikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in Python
 
APSEC2020 Keynote
APSEC2020 KeynoteAPSEC2020 Keynote
APSEC2020 Keynote
 
Data Science Accelerator Program
Data Science Accelerator ProgramData Science Accelerator Program
Data Science Accelerator Program
 
Automated Program Repair, Distinguished lecture at MPI-SWS
Automated Program Repair, Distinguished lecture at MPI-SWSAutomated Program Repair, Distinguished lecture at MPI-SWS
Automated Program Repair, Distinguished lecture at MPI-SWS
 
Automated Program Repair Keynote talk
Automated Program Repair Keynote talkAutomated Program Repair Keynote talk
Automated Program Repair Keynote talk
 

Similar to Computational practices for reproducible science

Reproducible Research in R and R Studio
Reproducible Research in R and R StudioReproducible Research in R and R Studio
Reproducible Research in R and R StudioSusan Johnston
 
Creating a reasonable project boilerplate
Creating a reasonable project boilerplateCreating a reasonable project boilerplate
Creating a reasonable project boilerplateStanislav Petrov
 
Scientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of dataScientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of dataGael Varoquaux
 
Troubleshooting tips from docker support engineers
Troubleshooting tips from docker support engineersTroubleshooting tips from docker support engineers
Troubleshooting tips from docker support engineersDocker, Inc.
 
1803_STAMP_OpenCloudForum2018
1803_STAMP_OpenCloudForum20181803_STAMP_OpenCloudForum2018
1803_STAMP_OpenCloudForum2018STAMP Project
 
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxBuilding a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxPyData
 
Pipeline as code for your infrastructure as Code
Pipeline as code for your infrastructure as CodePipeline as code for your infrastructure as Code
Pipeline as code for your infrastructure as CodeKris Buytaert
 
Simple tools to fight bigger quality battle
Simple tools to fight bigger quality battleSimple tools to fight bigger quality battle
Simple tools to fight bigger quality battleAnand Ramdeo
 
The Power Of Refactoring (PHPCon Italia)
The Power Of Refactoring (PHPCon Italia)The Power Of Refactoring (PHPCon Italia)
The Power Of Refactoring (PHPCon Italia)Stefan Koopmanschap
 
[ENGLISH] TDC 2015 - PHP Trail - Tests and PHP Continuous Integration Enviro...
[ENGLISH] TDC 2015 - PHP  Trail - Tests and PHP Continuous Integration Enviro...[ENGLISH] TDC 2015 - PHP  Trail - Tests and PHP Continuous Integration Enviro...
[ENGLISH] TDC 2015 - PHP Trail - Tests and PHP Continuous Integration Enviro...Bruno Tanoue
 
When to use Serverless? When to use Kubernetes?
When to use Serverless? When to use Kubernetes?When to use Serverless? When to use Kubernetes?
When to use Serverless? When to use Kubernetes?Niklas Heidloff
 
Start with version control and experiments management in machine learning
Start with version control and experiments management in machine learningStart with version control and experiments management in machine learning
Start with version control and experiments management in machine learningMikhail Rozhkov
 
Bug prediction + sdlc automation
Bug prediction + sdlc automationBug prediction + sdlc automation
Bug prediction + sdlc automationAlexey Tokar
 
It’s 2021. Why are we -still- rebooting for patches? A look at Live Patching.
It’s 2021. Why are we -still- rebooting for patches? A look at Live Patching.It’s 2021. Why are we -still- rebooting for patches? A look at Live Patching.
It’s 2021. Why are we -still- rebooting for patches? A look at Live Patching.All Things Open
 
MobileConf 2021 Slides: Let's build macOS CLI Utilities using Swift
MobileConf 2021 Slides:  Let's build macOS CLI Utilities using SwiftMobileConf 2021 Slides:  Let's build macOS CLI Utilities using Swift
MobileConf 2021 Slides: Let's build macOS CLI Utilities using SwiftDiego Freniche Brito
 
Complete python toolbox for modern developers
Complete python toolbox for modern developersComplete python toolbox for modern developers
Complete python toolbox for modern developersJan Giacomelli
 

Similar to Computational practices for reproducible science (20)

Reproducible Research in R and R Studio
Reproducible Research in R and R StudioReproducible Research in R and R Studio
Reproducible Research in R and R Studio
 
Creating a reasonable project boilerplate
Creating a reasonable project boilerplateCreating a reasonable project boilerplate
Creating a reasonable project boilerplate
 
Scientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of dataScientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of data
 
Troubleshooting tips from docker support engineers
Troubleshooting tips from docker support engineersTroubleshooting tips from docker support engineers
Troubleshooting tips from docker support engineers
 
20150422 repro resr
20150422 repro resr20150422 repro resr
20150422 repro resr
 
groovy & grails - lecture 6
groovy & grails - lecture 6groovy & grails - lecture 6
groovy & grails - lecture 6
 
1803_STAMP_OpenCloudForum2018
1803_STAMP_OpenCloudForum20181803_STAMP_OpenCloudForum2018
1803_STAMP_OpenCloudForum2018
 
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxBuilding a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
 
Pipeline as code for your infrastructure as Code
Pipeline as code for your infrastructure as CodePipeline as code for your infrastructure as Code
Pipeline as code for your infrastructure as Code
 
Fuzzing - Part 2
Fuzzing - Part 2Fuzzing - Part 2
Fuzzing - Part 2
 
Simple tools to fight bigger quality battle
Simple tools to fight bigger quality battleSimple tools to fight bigger quality battle
Simple tools to fight bigger quality battle
 
The Power Of Refactoring (PHPCon Italia)
The Power Of Refactoring (PHPCon Italia)The Power Of Refactoring (PHPCon Italia)
The Power Of Refactoring (PHPCon Italia)
 
[ENGLISH] TDC 2015 - PHP Trail - Tests and PHP Continuous Integration Enviro...
[ENGLISH] TDC 2015 - PHP  Trail - Tests and PHP Continuous Integration Enviro...[ENGLISH] TDC 2015 - PHP  Trail - Tests and PHP Continuous Integration Enviro...
[ENGLISH] TDC 2015 - PHP Trail - Tests and PHP Continuous Integration Enviro...
 
When to use Serverless? When to use Kubernetes?
When to use Serverless? When to use Kubernetes?When to use Serverless? When to use Kubernetes?
When to use Serverless? When to use Kubernetes?
 
Start with version control and experiments management in machine learning
Start with version control and experiments management in machine learningStart with version control and experiments management in machine learning
Start with version control and experiments management in machine learning
 
JavaFX in Action Part I
JavaFX in Action Part IJavaFX in Action Part I
JavaFX in Action Part I
 
Bug prediction + sdlc automation
Bug prediction + sdlc automationBug prediction + sdlc automation
Bug prediction + sdlc automation
 
It’s 2021. Why are we -still- rebooting for patches? A look at Live Patching.
It’s 2021. Why are we -still- rebooting for patches? A look at Live Patching.It’s 2021. Why are we -still- rebooting for patches? A look at Live Patching.
It’s 2021. Why are we -still- rebooting for patches? A look at Live Patching.
 
MobileConf 2021 Slides: Let's build macOS CLI Utilities using Swift
MobileConf 2021 Slides:  Let's build macOS CLI Utilities using SwiftMobileConf 2021 Slides:  Let's build macOS CLI Utilities using Swift
MobileConf 2021 Slides: Let's build macOS CLI Utilities using Swift
 
Complete python toolbox for modern developers
Complete python toolbox for modern developersComplete python toolbox for modern developers
Complete python toolbox for modern developers
 

More from Gael Varoquaux

Evaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic valueEvaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic valueGael Varoquaux
 
Measuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imagingMeasuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imagingGael Varoquaux
 
Machine learning with missing values
Machine learning with missing valuesMachine learning with missing values
Machine learning with missing valuesGael Varoquaux
 
Dirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataGael Varoquaux
 
Representation learning in limited-data settings
Representation learning in limited-data settingsRepresentation learning in limited-data settings
Representation learning in limited-data settingsGael Varoquaux
 
Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Gael Varoquaux
 
Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?Gael Varoquaux
 
Atlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mappingAtlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mappingGael Varoquaux
 
Similarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variablesSimilarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variablesGael Varoquaux
 
Machine learning for functional connectomes
Machine learning for functional connectomesMachine learning for functional connectomes
Machine learning for functional connectomesGael Varoquaux
 
Towards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imagingTowards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imagingGael Varoquaux
 
Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Gael Varoquaux
 
A tutorial on Machine Learning, with illustrations for MR imaging
A tutorial on Machine Learning, with illustrations for MR imagingA tutorial on Machine Learning, with illustrations for MR imaging
A tutorial on Machine Learning, with illustrations for MR imagingGael Varoquaux
 
Estimating Functional Connectomes: Sparsity’s Strength and Limitations
Estimating Functional Connectomes: Sparsity’s Strength and LimitationsEstimating Functional Connectomes: Sparsity’s Strength and Limitations
Estimating Functional Connectomes: Sparsity’s Strength and LimitationsGael Varoquaux
 
Machine learning and cognitive neuroimaging: new tools can answer new questions
Machine learning and cognitive neuroimaging: new tools can answer new questionsMachine learning and cognitive neuroimaging: new tools can answer new questions
Machine learning and cognitive neuroimaging: new tools can answer new questionsGael Varoquaux
 
Social-sparsity brain decoders: faster spatial sparsity
Social-sparsity brain decoders: faster spatial sparsitySocial-sparsity brain decoders: faster spatial sparsity
Social-sparsity brain decoders: faster spatial sparsityGael Varoquaux
 
Inter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRIInter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRIGael Varoquaux
 
Brain maps from machine learning? Spatial regularizations
Brain maps from machine learning? Spatial regularizationsBrain maps from machine learning? Spatial regularizations
Brain maps from machine learning? Spatial regularizationsGael Varoquaux
 
Scikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the projectScikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the projectGael Varoquaux
 
Simple big data, in Python
Simple big data, in PythonSimple big data, in Python
Simple big data, in PythonGael Varoquaux
 

More from Gael Varoquaux (20)

Evaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic valueEvaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic value
 
Measuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imagingMeasuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imaging
 
Machine learning with missing values
Machine learning with missing valuesMachine learning with missing values
Machine learning with missing values
 
Dirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated data
 
Representation learning in limited-data settings
Representation learning in limited-data settingsRepresentation learning in limited-data settings
Representation learning in limited-data settings
 
Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...
 
Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?
 
Atlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mappingAtlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mapping
 
Similarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variablesSimilarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variables
 
Machine learning for functional connectomes
Machine learning for functional connectomesMachine learning for functional connectomes
Machine learning for functional connectomes
 
Towards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imagingTowards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imaging
 
Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities
 
A tutorial on Machine Learning, with illustrations for MR imaging
A tutorial on Machine Learning, with illustrations for MR imagingA tutorial on Machine Learning, with illustrations for MR imaging
A tutorial on Machine Learning, with illustrations for MR imaging
 
Estimating Functional Connectomes: Sparsity’s Strength and Limitations
Estimating Functional Connectomes: Sparsity’s Strength and LimitationsEstimating Functional Connectomes: Sparsity’s Strength and Limitations
Estimating Functional Connectomes: Sparsity’s Strength and Limitations
 
Machine learning and cognitive neuroimaging: new tools can answer new questions
Machine learning and cognitive neuroimaging: new tools can answer new questionsMachine learning and cognitive neuroimaging: new tools can answer new questions
Machine learning and cognitive neuroimaging: new tools can answer new questions
 
Social-sparsity brain decoders: faster spatial sparsity
Social-sparsity brain decoders: faster spatial sparsitySocial-sparsity brain decoders: faster spatial sparsity
Social-sparsity brain decoders: faster spatial sparsity
 
Inter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRIInter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRI
 
Brain maps from machine learning? Spatial regularizations
Brain maps from machine learning? Spatial regularizationsBrain maps from machine learning? Spatial regularizations
Brain maps from machine learning? Spatial regularizations
 
Scikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the projectScikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the project
 
Simple big data, in Python
Simple big data, in PythonSimple big data, in Python
Simple big data, in Python
 

Recently uploaded

《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Tamer Koksalan, PhD
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...navyadasi1992
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptJoemSTuliba
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXDole Philippines School
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 

Recently uploaded (20)

《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 

Computational practices for reproducible science

  • 1. Computational practices for reproducible science Ga¨el Varoquaux import science science.reproduce()
  • 2. Computational practices for reproducible science Ga¨el Varoquaux This talk: High-level advice on code in science Pointers to good software practices
  • 3. Computational practices for reproducible science Ga¨el Varoquaux Outline: 1 Coding for science, reproducibly 2 Making libraries 3 From software back to science
  • 4. Where I come from PhD in physics Computers: Matlab Instrument control Python for Science: Startup guy Mayavi scikit-learn joblib nilearn Applied maths / machine learning for neuroscience G Varoquaux 2
  • 5. 1 Coding for science, reproducibly G Varoquaux 3
  • 6. 1 Computational reproducibility Computational reproducibility: Automate everything Control the environment G Varoquaux 4
  • 7. 1 Automate everything Just a simple matter of programming G Varoquaux 5
  • 8. 1 Automate everything Just a simple matter of programming Scientific workflow: Based on intuition and experimentation ⇒ Iterative Conjecture Experiment G Varoquaux 6
  • 9. 1 Automate everything Just a simple matter of programming Scientific workflow: Based on intuition and experimentation ⇒ Iterative Conjecture Experiment needs consolidation keeping flexibility G Varoquaux 7
  • 10. 1 Automate everything Just a simple matter of programming Scientific workflow: Based on intuition and experimentation ⇒ Iterative Conjecture Experiment needs consolidation keeping flexibility Lost in iterations G Varoquaux 8
  • 11. 1 Reproducible computational science Harder than it seemed Impediments Missing steps / files Libraries have changed Non portable code Statistical / numerical instabilities No one knows where the info is G Varoquaux 9
  • 12. 1 Reproducible computational science Harder than it seemed Impediments Missing steps / files Libraries have changed Non portable code Statistical / numerical instabilities No one knows where the info is Technical Human Code quality matters Manual steps are evil G Varoquaux 9
  • 13. 1 How I work progressive consolidation Start with a script playing to understand the problem G Varoquaux 10
  • 14. 1 How I work progressive consolidation Start with a script playing to understand the problem Identify blocks/operations ⇒ move to a function Use functions Obstacle: local scope requires identifying input and output variables That’s a good thing Interactive debugging / understanding inside a function: %debug in IPython Functions are the basic reusable abstraction G Varoquaux 10
  • 15. 1 How I work progressive consolidation Start with a script playing to understand the problem Identify blocks/operations ⇒ move to a function As they stabilize, move to a module Modules enable sharing between experiments ⇒ avoid 1000 lines scripts + commented code enable testing Fast experiments as tests ⇒ gives confidence, hence refactorings G Varoquaux 10
  • 16. 1 How I work progressive consolidation Start with a script playing to understand the problem Identify blocks/operations ⇒ move to a function As they stabilize, move to a module Clean: delete code & files you have version control Attentional load makes it impossible to find or understand things Where’s Waldo?G Varoquaux 10
  • 17. 1 How I work progressive consolidation Start with a script playing to understand the problem Identify blocks/operations ⇒ move to a function As they stabilize, move to a module Clean: delete code & files you have version control Why is it hard? Long compute times make us unadventurous Know your tools Refactoring editor Version control G Varoquaux 10
  • 18. 1 Mitigating long compute times by storing intermediate results Makefiles Accessible intermediate results Break control flow & error management G Varoquaux 11
  • 19. 1 Mitigating long compute times by storing intermediate results Makefiles Accessible intermediate results Break control flow & error management Memoization / caching g = mem.cache(f) # ”joblib” Python module b = g(a) # computes a using f c = g(a) # retrieves results from store Fits in coding / experimentation loop Black-boxy, persistence only implicit Discourages refactoring (avoid recomputing) G Varoquaux 11
  • 20. 1 A design pattern in computational experiments MVC pattern (from Wikipedia Model Manages the data and rules of the application View Output represen- tation Possibly several views Controller Accepts input and converts it to commands for model and view Photo-editing software Filters Canvas Tool palettes Typical web application Database Web-pages URLs G Varoquaux 12
  • 21. 1 A design pattern in computational experiments MVC pattern (from Wikipedia Model Manages the data and rules of the application View Output represen- tation Possibly several views Controller Accepts input and converts it to commands for model and view For data science: Numerical, data- processing, & ex- perimental logic Results, as files. Data & plots Imperative API Avoid input as files: not expressive Module with functions Post-processing script CSV & data files Script ⇒ for loops G Varoquaux 12
  • 22. 1 A design pattern in computational experiments MVC pattern (from Wikipedia Model Manages the data and rules of the application View Output represen- tation Possibly several views Controller Accepts input and converts it to commands for model and view For data science: Numerical, data- processing, & ex- perimental logic Results, as files. Data & plots Imperative API Avoid input as files: not expressive Module with functions Post-processing script CSV & data files Script ⇒ for loops A recipe 3 types of files: • modules • command scripts • post-processing scripts CSVs & intermediate data files Separate computation from analysis / plotting Code and text (and data) ⇒ version control G Varoquaux 12
  • 23. 1 A design pattern in computational experiments MVC pattern (from Wikipedia Model Manages the data and rules of the application View Output represen- tation Possibly several views Controller Accepts input and converts it to commands for model and view For data science: Numerical, data- processing, & ex- perimental logic Results, as files. Data & plots Imperative API Avoid input as files: not expressive Module with functions Post-processing script CSV & data files Script ⇒ for loops A recipe 3 types of files: • modules • command scripts • post-processing scripts CSVs & intermediate data files Separate computation from analysis / plotting Code and text (and data) ⇒ version control Decouple steps Goals: Reuse code Mitigate compute time G Varoquaux 12
  • 24. Adopting software-engineering best practices G Varoquaux 13
  • 25. 1 The ladder of code quality Use static checking in your editor seriously G Varoquaux 14
  • 26. 1 The ladder of code quality Use static checking in your editor seriously Coding convention, good naming “Any fool can write code that a computer can understand. Good programmers write code that humans can understand” – Martin Fowler Your future self will thank you G Varoquaux 14
  • 27. 1 The ladder of code quality Use static checking in your editor seriously Coding convention, good naming Version control Use git + github Code review Unit testing If it’s not tested, it’s broken or soon will be Make a package controlled dependencies and compilation ... G Varoquaux 14
  • 28. 1 The ladder of code qualityIncreasingcost ? Use static checking in your editor seriously Coding convention, good naming Version control Use git + github Code review Unit testing If it’s not tested, it’s broken or soon will be Make a package controlled dependencies and compilation ... Premature software engineering is the root of reproducible science evil G Varoquaux 14
  • 29. 1 The ladder of code qualityIncreasingcost ? Use static checking in your editor seriously Coding convention, good naming Version control Use git + github Code review Unit testing If it’s not tested, it’s broken or soon will be Make a package controlled dependencies and compilation ... Premature software engineering is the root of reproducible science evil Over versus under engineering Our goal is scientific discovery Experimentation to develop intuitions ⇒ new ideas As the path becomes clear: consolidation Heavy engineering too early freezes bad ideas G Varoquaux 14
  • 30. 1 LibrariesIncreasingcost ? Use static checking in your editor seriously Coding convention, good naming Version control Use git + github Code review Unit testing If it’s not tested, it’s broken or soon will be Make a package controlled dependencies and compilation ... A library G Varoquaux 15
  • 31. 1 LibrariesIncreasingcost ? Use static checking in your editor seriously Coding convention, good naming Version control Use git + github Code review Unit testing If it’s not tested, it’s broken or soon will be Make a package controlled dependencies and compilation ... A library Code = Software Well-defined problem = research Functions should work as long as input is valid Targets users, not the authors Made for reuse The workhorse of reproducibility G Varoquaux 15
  • 32. 1 Libraries enable reproducible science Reproducibility Rerun and come to the same conclusion An argument for copying all scripts for each paper Better: a tag in git G Varoquaux 16
  • 33. 1 Libraries enable reproducible science Reproducibility Rerun and come to the same conclusion An argument for copying all scripts for each paper Better: a tag in git Copying scripts scales very poorly: Accumulation of half-dead code ⇒ cognitive overload No consolidation across studies Each study has different variants of different bugs Garden of forking code G Varoquaux 16
  • 34. 1 Libraries enable reproducible science Reproducibility Rerun and come to the same conclusion An argument for copying all scripts for each paper Better: a tag in git Kirstie Whitaker G Varoquaux 16
  • 35. 1 Libraries enable reproducible science Reproducibility Rerun and come to the same conclusion An argument for copying all scripts for each paper Frozen food Reusability Apply the approach to a new problem Being able to understand, modify, run in new settings Generalization is the true scientific test Reusable computational science = libraries G Varoquaux 16
  • 36. 2 Making libraries to enable reproducibility G Varoquaux 17
  • 37. 2 Pick the right battles: project scope Project idea A software implementing: i) machine learning and ii) neuroimaging and iii) a graphical user interface and iv) 3D plotting G Varoquaux 18
  • 38. 2 Pick the right battles: project scope Project idea A software implementing: i) machine learning and ii) neuroimaging and iii) a graphical user interface and iv) 3D plotting G Varoquaux 18
  • 39. 2 Pick the right battles: project scope Project idea A software implementing: i) machine learning and ii) neuroimaging and iii) a graphical user interface and iv) 3D plotting Define project scope and vision Break down projects by expertise Don’t solve hard problems Know the software landscape Need a vision = elevator pitch G Varoquaux 18
  • 40. 2 Pick the right battles: project scope Project idea A software implementing: i) machine learning and ii) neuroimaging and iii) a graphical user interface and iv) 3D plotting Define project scope and vision Break down projects by expertise Don’t solve hard problems Know the software landscape Need a vision = elevator pitch Your research (PhD) probably does not qualify ⇒ need to cherry-pick contributions G Varoquaux 18
  • 41. 2 Short cycles, limited ambitions Keep coming back to your users Release early, release often G Varoquaux 19
  • 42. 2 Simplicity Complexity increase superlinearly [An Experiment on Unit Increase in Problem Complexity, Woodfield 1979] 25% increase in problem complexity ⇒ 100% increase in code complexity G Varoquaux 20
  • 43. 2 Simplicity Complexity increase superlinearly [An Experiment on Unit Increase in Problem Complexity, Woodfield 1979] 25% increase in problem complexity ⇒ 100% increase in code complexity The 80/20 rule 80% of the usecases can be solved with 20% of the lines of code Avoid feature creep G Varoquaux 20
  • 44. 2 Quality Quality is free This is a book, by Philip Crosby G Varoquaux 21
  • 45. 2 You need quality Quality will give you users Bugs give you bad rap Quality will give you developers Contribute to learn and improve Quality avoids scientific errors Do less, do better Goes against the grant-system incentive G Varoquaux 22
  • 46. 2 Quality: what & how Great documentation Simplify, but don’t dumb down Focus on what the user is trying to solve Great APIs Example-based development If something is hard to explain, rethink the concepts Limit the number of different concepts and objects Good numerics Write tests based on mathematical properties When a user finds an instability, write a new test G Varoquaux 23
  • 47. 2 Quality: what & how Great documentation Simplify, but don’t dumb down Focus on what the user is trying to solve Great APIs Example-based development If something is hard to explain, rethink the concepts Limit the number of different concepts and objects Good numerics Write tests based on mathematical properties When a user finds an instability, write a new test Quality enables reuse Beyond mere reproducibility G Varoquaux 23
  • 48. 2 Quality assurance Code review: pull requests Can include newcomers We read each others code Everything is discussed: - Should the algorithm go in? - Are there good defaults? - Are names meaningfull? - Are the numerics stable? - Could it be faster? G Varoquaux 24
  • 49. 2 Quality assurance Unit testing Everything is tested Great for numerics Overall tests enforce on all estimators - consistency with the API - basic invariances - good handling of various inputs G Varoquaux 25
  • 50. 2 Dependencies Beyond installation, a challenge is to ensure package versions play way together: correctness of the code Breakage of backward compability yields irreconcilable dependencies G Varoquaux 26
  • 51. 2 Dependencies and their upgrade It’s a fact: users hate upgrading If it ain’t broken, don’t fix it even if it is, apparently G Varoquaux 27
  • 52. 2 Declaring undependence? Monolythic packages with no dependencies... But: Scaling is hard Complexity grows as square of codebase size [Woodfield 1979] User support grows with userbase size G Varoquaux 28
  • 53. 3 From software back to science G Varoquaux 29
  • 54. 3 Beyond computational reproducibility Make every computational step reproducible, and good science will emerge G Varoquaux 30
  • 55. 3 Beyond computational reproducibility Make every computational step reproducible, and good science will emerge Estimating the reproducibility of psychological science [Science 2015] 36% of effects replicate Reasons: Statistical challenges — analysis degrees of freedom Weak insentives — winner’s curse in publication Seldom computational reproducibility G Varoquaux 30
  • 56. In practice, the best way to improve research is to use the right (conceptual) tools. G Varoquaux 31
  • 57. 3 Managing complexity In practice, the best way to improve research is to use the right (conceptual) tools. The everyday roadblock is cognitive load Machine learning, brain anatomy, psychology R, Python, shell scripts Funding agencies, reviewer 3, the Manuscript G Varoquaux 32
  • 58. 3 Tools that reduce cognitive overload It’s a design problem G Varoquaux 33
  • 59. 3 Tools that reduce cognitive overload Jonathan Ive, an industrial designer, is #4 at Apple Code different. G Varoquaux 34
  • 60. 3 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic G Varoquaux 35
  • 61. 3 Some API design principles for the scipy stack Consistency, consistency, consistency np.save(file, obj) pickle.dump(obj, file) fmin(...maxiter=10) lsq linear(...max iter=10) Creates cognitive overload Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic G Varoquaux 36
  • 62. 3 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes Objects have hidden states, Objects have no universal interface, entry point, output A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic G Varoquaux 37
  • 63. 3 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts How much do usage patterns carry out across the library? Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic G Varoquaux 38
  • 64. 3 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Facilitates working with multiple libraries together Easier to get up to speed with a given library Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic G Varoquaux 39
  • 65. 3 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Change of behavior depending on input type Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic G Varoquaux 40
  • 66. 3 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Interfaces define objects Incompatible behaviors lead to bugs (eg np.matrix) Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic G Varoquaux 41
  • 67. 3 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Properties obfuscate the data model of the object Properties can create hidden compute costs Shallow is better than deep Error messages matter Be Pythonic G Varoquaux 42
  • 68. 3 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Objects are understood by their surface Composition creates cognitive overload Error messages matter Be Pythonic G Varoquaux 43
  • 69. 3 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Explain the problem Print the offending value Be Pythonic G Varoquaux 44
  • 70. 3 Some API design principles for the scipy stack Consistency, consistency, consistency Functions are easier to understand than classes A library should hinge on a small number of concepts Common data containers make the ecosystem stronger Each function should have one and only one purpose Code for interfaces, but don’t overdo duck typing Properties are for impedance matching Shallow is better than deep Error messages matter Be Pythonic Avoid syntax hacks G Varoquaux 45
  • 71. 3 Scikit-learn API Scikit-learn cheat sheet Scikit-learn Fit and predict >>> estimator = Estimator(param1=param1) >>> estimator.fit(X train, y train) >>> y test = estimator.predict(X test) Transform data >>> X red = estimator.transform(X test) G Varoquaux 46
  • 72. 3 Scikit-learn API Scikit-learn cheat sheet Scikit-learn Fit and predict >>> estimator = Estimator(param1=param1) >>> estimator.fit(X train, y train) >>> y test = estimator.predict(X test) Transform data >>> X red = estimator.transform(X test) The estimator is a “contract” (slightly more elaborate than above) It has created an ecosystem of packages Based on duck-typing, not inheritence G Varoquaux 46
  • 73. 3 Example-driven development The 3-liner as the new cool Teaching others Teaching yourself Write examples that solve end problems Iterate on your API until these are simple G Varoquaux 47
  • 74. 3 Example-driven development The 3-liner as the new cool Teaching others Teaching yourself Write examples that solve end problems Iterate on your API until these are simple User flow on the scikit-learn website: Examples G Varoquaux 47
  • 75. 3 Example-driven development The 3-liner as the new cool Teaching others Teaching yourself Write examples that solve end problems Iterate on your API until these are simple User flow on the nilearn website: Examples G Varoquaux 47
  • 76. 3 Building great documentation Focus on explaining concepts (hint: write plain English) Less is more: prioritize, avoid redundancy Code examples must be short (link to full tutorial examples) Links everywhere: users will land at the wrong place Teach with the docs Plan for maintenance of docs: Continuous integration Check links Runs examples Doctests G Varoquaux 48
  • 77. 3 Reusable science scikit-learn is the new machine-learning textbook nilearn is the new neuroimaging review article Experiments reproduced at each commit eg: brain reading nilearn.github.io/auto examples/02 decoding/plot miyawaki reconstruction.html G Varoquaux 49
  • 78. 3 Reusable science scikit-learn is the new machine-learning textbook nilearn is the new neuroimaging review article Experiments reproduced at each commit eg: brain reading nilearn.github.io/auto examples/02 decoding/plot miyawaki reconstruction.html Resource intensive Continuous integration: Data ⇒ Fight for good open data Computation ⇒ Find good algorithms and tradeoffs Forces us to distill the literature (as a review) G Varoquaux 49
  • 79. 3 Reusable science scikit-learn is the new machine-learning textbook nilearn is the new neuroimaging review article Experiments reproduced at each commit eg: brain reading nilearn.github.io/auto examples/02 decoding/plot miyawaki reconstruction.html Package development consolidates science and moves it outside the lab G Varoquaux 49
  • 80. 3 Reusable science scikit-learn is the new machine-learning textbook nilearn is the new neuroimaging review article Experiments reproduced at each commit eg: brain reading nilearn.github.io/auto examples/02 decoding/plot miyawaki reconstruction.html Package development consolidates science and moves it outside the lab scipy-lectures: living book for Python in science G Varoquaux 49
  • 81. Coding as a scientist Final code should be auditable, ideally reusable Tension between interactive computing & automating Main enemy: cognitive overload G Varoquaux 50
  • 82. Coding as a scientist Final code should be auditable, ideally reusable Tension between interactive computing & automating Main enemy: cognitive overload In the industry Reusable Verifiable? Not for silicon valley, but in insurance, healthcare, banking... Moving data-scientist code to production? Software projects going over budget? G Varoquaux 50
  • 83. @GaelVaroquaux Computational practices for reproducible science Software-engineering skills are superpowers “text-book” software engineering (yet never taught) Make you more productive
  • 84. @GaelVaroquaux Computational practices for reproducible science Software-engineering skills are superpowers “text-book” software engineering (yet never taught) Make you more productive But beware of over-engineering In software development, keep an eye on the end goal Science is iterative, not every step must be reproduced
  • 85. @GaelVaroquaux Computational practices for reproducible science Software-engineering skills are superpowers “text-book” software engineering (yet never taught) Make you more productive But beware of over-engineering In software development, keep an eye on the end goal Science is iterative, not every step must be reproduced Science is made by humans, not computers Usability, communication, human factors Design for reuse & generalization
  • 86. @GaelVaroquaux Computational practices for reproducible science Software-engineering skills are superpowers “text-book” software engineering (yet never taught) Make you more productive But beware of over-engineering In software development, keep an eye on the end goal Science is iterative, not every step must be reproduced Science is made by humans, not computers Usability, communication, human factors Design for reuse & generalization Reproducible research must borrow from agile development