Reconciling bleeding-edge scientific results and reproducible research may seem a conundrum in our fast-paced high-pressure academic world. I discuss the practices that I found useful in computational work. At a high level, it is important to navigate the space between rapid experimentation and industrial-grade software development. I advocate adopting more and more software-engineering best practices as a project matures. I will also discuss how to turn the computational work into libraries, and to ensure the quality of the resulting libraries. And I conclude on how those libraries need to fit in the larger picture of the exercise of research to give better science.
2. Computational practices for reproducible science
Ga¨el Varoquaux
This talk:
High-level advice on code in science
Pointers to good software practices
3. Computational practices for reproducible science
Ga¨el Varoquaux
Outline:
1 Coding for science, reproducibly
2 Making libraries
3 From software back to science
4. Where I come from
PhD in physics
Computers:
Matlab
Instrument control
Python for Science:
Startup guy
Mayavi
scikit-learn
joblib
nilearn
Applied maths / machine
learning for neuroscience
G Varoquaux 2
8. 1 Automate everything
Just a simple matter of programming
Scientific workflow:
Based on intuition
and experimentation
⇒ Iterative
Conjecture
Experiment
G Varoquaux 6
9. 1 Automate everything
Just a simple matter of programming
Scientific workflow:
Based on intuition
and experimentation
⇒ Iterative
Conjecture
Experiment
needs consolidation
keeping flexibility
G Varoquaux 7
10. 1 Automate everything
Just a simple matter of programming
Scientific workflow:
Based on intuition
and experimentation
⇒ Iterative
Conjecture
Experiment
needs consolidation
keeping flexibility
Lost in iterations
G Varoquaux 8
11. 1 Reproducible computational science
Harder than it seemed
Impediments
Missing steps / files
Libraries have changed
Non portable code
Statistical / numerical instabilities
No one knows where the info is
G Varoquaux 9
12. 1 Reproducible computational science
Harder than it seemed
Impediments
Missing steps / files
Libraries have changed
Non portable code
Statistical / numerical instabilities
No one knows where the info is
Technical
Human
Code quality matters
Manual steps are evil
G Varoquaux 9
13. 1 How I work progressive consolidation
Start with a script playing to understand the problem
G Varoquaux 10
14. 1 How I work progressive consolidation
Start with a script playing to understand the problem
Identify blocks/operations ⇒ move to a function
Use functions
Obstacle: local scope
requires identifying input and output variables
That’s a good thing
Interactive debugging / understanding
inside a function: %debug in IPython
Functions are the basic reusable abstraction
G Varoquaux 10
15. 1 How I work progressive consolidation
Start with a script playing to understand the problem
Identify blocks/operations ⇒ move to a function
As they stabilize, move to a module
Modules
enable sharing between experiments
⇒ avoid 1000 lines scripts + commented code
enable testing
Fast experiments as tests
⇒ gives confidence, hence refactorings
G Varoquaux 10
16. 1 How I work progressive consolidation
Start with a script playing to understand the problem
Identify blocks/operations ⇒ move to a function
As they stabilize, move to a module
Clean: delete code & files you have version control
Attentional load makes it impossible
to find or understand things
Where’s Waldo?G Varoquaux 10
17. 1 How I work progressive consolidation
Start with a script playing to understand the problem
Identify blocks/operations ⇒ move to a function
As they stabilize, move to a module
Clean: delete code & files you have version control
Why is it hard?
Long compute times
make us unadventurous
Know your tools
Refactoring editor
Version control
G Varoquaux 10
18. 1 Mitigating long compute times
by storing intermediate results
Makefiles
Accessible intermediate results
Break control flow & error management
G Varoquaux 11
19. 1 Mitigating long compute times
by storing intermediate results
Makefiles
Accessible intermediate results
Break control flow & error management
Memoization / caching
g = mem.cache(f) # ”joblib” Python module
b = g(a) # computes a using f
c = g(a) # retrieves results from store
Fits in coding / experimentation loop
Black-boxy, persistence only implicit
Discourages refactoring (avoid recomputing)
G Varoquaux 11
20. 1 A design pattern in computational experiments
MVC pattern (from Wikipedia
Model
Manages the data
and rules of the
application
View
Output represen-
tation
Possibly several views
Controller
Accepts input
and converts it to
commands
for model and view
Photo-editing software
Filters Canvas Tool palettes
Typical web application
Database Web-pages URLs
G Varoquaux 12
21. 1 A design pattern in computational experiments
MVC pattern (from Wikipedia
Model
Manages the data
and rules of the
application
View
Output represen-
tation
Possibly several views
Controller
Accepts input
and converts it to
commands
for model and view
For data science:
Numerical, data-
processing, & ex-
perimental logic
Results, as files.
Data & plots
Imperative API
Avoid input as files:
not expressive
Module
with functions
Post-processing script
CSV & data files
Script
⇒ for loops
G Varoquaux 12
22. 1 A design pattern in computational experiments
MVC pattern (from Wikipedia
Model
Manages the data
and rules of the
application
View
Output represen-
tation
Possibly several views
Controller
Accepts input
and converts it to
commands
for model and view
For data science:
Numerical, data-
processing, & ex-
perimental logic
Results, as files.
Data & plots
Imperative API
Avoid input as files:
not expressive
Module
with functions
Post-processing script
CSV & data files
Script
⇒ for loops
A recipe
3 types of files:
• modules • command scripts • post-processing scripts
CSVs & intermediate data files
Separate computation from analysis / plotting
Code and text (and data) ⇒ version control
G Varoquaux 12
23. 1 A design pattern in computational experiments
MVC pattern (from Wikipedia
Model
Manages the data
and rules of the
application
View
Output represen-
tation
Possibly several views
Controller
Accepts input
and converts it to
commands
for model and view
For data science:
Numerical, data-
processing, & ex-
perimental logic
Results, as files.
Data & plots
Imperative API
Avoid input as files:
not expressive
Module
with functions
Post-processing script
CSV & data files
Script
⇒ for loops
A recipe
3 types of files:
• modules • command scripts • post-processing scripts
CSVs & intermediate data files
Separate computation from analysis / plotting
Code and text (and data) ⇒ version control
Decouple steps
Goals: Reuse code
Mitigate compute time
G Varoquaux 12
25. 1 The ladder of code quality
Use static checking in your editor seriously
G Varoquaux 14
26. 1 The ladder of code quality
Use static checking in your editor seriously
Coding convention, good naming
“Any fool can write code that a computer
can understand. Good programmers write
code that humans can understand”
– Martin Fowler
Your future self will thank you
G Varoquaux 14
27. 1 The ladder of code quality
Use static checking in your editor seriously
Coding convention, good naming
Version control Use git + github
Code review
Unit testing
If it’s not tested, it’s broken or soon will be
Make a package
controlled dependencies and compilation
...
G Varoquaux 14
28. 1 The ladder of code qualityIncreasingcost
?
Use static checking in your editor seriously
Coding convention, good naming
Version control Use git + github
Code review
Unit testing
If it’s not tested, it’s broken or soon will be
Make a package
controlled dependencies and compilation
...
Premature software engineering
is the root of reproducible science evil
G Varoquaux 14
29. 1 The ladder of code qualityIncreasingcost
?
Use static checking in your editor seriously
Coding convention, good naming
Version control Use git + github
Code review
Unit testing
If it’s not tested, it’s broken or soon will be
Make a package
controlled dependencies and compilation
...
Premature software engineering
is the root of reproducible science evil
Over versus under engineering
Our goal is scientific discovery
Experimentation to develop intuitions
⇒ new ideas
As the path becomes clear: consolidation
Heavy engineering too early freezes bad ideas
G Varoquaux 14
30. 1 LibrariesIncreasingcost
?
Use static checking in your editor seriously
Coding convention, good naming
Version control Use git + github
Code review
Unit testing
If it’s not tested, it’s broken or soon will be
Make a package
controlled dependencies and compilation
...
A library
G Varoquaux 15
31. 1 LibrariesIncreasingcost
?
Use static checking in your editor seriously
Coding convention, good naming
Version control Use git + github
Code review
Unit testing
If it’s not tested, it’s broken or soon will be
Make a package
controlled dependencies and compilation
...
A library
Code = Software
Well-defined problem = research
Functions should work as long as input is valid
Targets users, not the authors
Made for reuse
The workhorse of reproducibility
G Varoquaux 15
32. 1 Libraries enable reproducible science
Reproducibility
Rerun and come to the same conclusion
An argument for copying all scripts
for each paper Better: a tag in git
G Varoquaux 16
33. 1 Libraries enable reproducible science
Reproducibility
Rerun and come to the same conclusion
An argument for copying all scripts
for each paper Better: a tag in git
Copying scripts scales very poorly:
Accumulation of half-dead code ⇒ cognitive overload
No consolidation across studies
Each study has different variants of different bugs
Garden of forking code
G Varoquaux 16
34. 1 Libraries enable reproducible science
Reproducibility
Rerun and come to the same conclusion
An argument for copying all scripts
for each paper Better: a tag in git
Kirstie Whitaker
G Varoquaux 16
35. 1 Libraries enable reproducible science
Reproducibility
Rerun and come to the same conclusion
An argument for copying all scripts
for each paper
Frozen food
Reusability
Apply the approach to a new problem
Being able to understand, modify,
run in new settings
Generalization is the true scientific test
Reusable computational science = libraries
G Varoquaux 16
37. 2 Pick the right battles: project scope
Project idea
A software implementing:
i) machine learning
and ii) neuroimaging
and iii) a graphical user interface
and iv) 3D plotting
G Varoquaux 18
38. 2 Pick the right battles: project scope
Project idea
A software implementing:
i) machine learning
and ii) neuroimaging
and iii) a graphical user interface
and iv) 3D plotting
G Varoquaux 18
39. 2 Pick the right battles: project scope
Project idea
A software implementing:
i) machine learning
and ii) neuroimaging
and iii) a graphical user interface
and iv) 3D plotting
Define project scope and vision
Break down projects by expertise
Don’t solve hard problems
Know the software landscape
Need a vision = elevator pitch
G Varoquaux 18
40. 2 Pick the right battles: project scope
Project idea
A software implementing:
i) machine learning
and ii) neuroimaging
and iii) a graphical user interface
and iv) 3D plotting
Define project scope and vision
Break down projects by expertise
Don’t solve hard problems
Know the software landscape
Need a vision = elevator pitch
Your research (PhD) probably does not qualify
⇒ need to cherry-pick contributions
G Varoquaux 18
41. 2 Short cycles, limited ambitions
Keep coming back to your users
Release early, release often
G Varoquaux 19
42. 2 Simplicity
Complexity increase superlinearly
[An Experiment on Unit Increase in Problem Complexity,
Woodfield 1979]
25% increase in problem complexity
⇒ 100% increase in code complexity
G Varoquaux 20
43. 2 Simplicity
Complexity increase superlinearly
[An Experiment on Unit Increase in Problem Complexity,
Woodfield 1979]
25% increase in problem complexity
⇒ 100% increase in code complexity
The 80/20 rule
80% of the usecases can be solved
with 20% of the lines of code
Avoid feature creep
G Varoquaux 20
45. 2 You need quality
Quality will give you users
Bugs give you bad rap
Quality will give you developers
Contribute to learn and improve
Quality avoids scientific errors
Do less, do better
Goes against the grant-system incentive
G Varoquaux 22
46. 2 Quality: what & how
Great documentation
Simplify, but don’t dumb down
Focus on what the user is trying to solve
Great APIs
Example-based development
If something is hard to explain, rethink the concepts
Limit the number of different concepts and objects
Good numerics
Write tests based on mathematical properties
When a user finds an instability, write a new test
G Varoquaux 23
47. 2 Quality: what & how
Great documentation
Simplify, but don’t dumb down
Focus on what the user is trying to solve
Great APIs
Example-based development
If something is hard to explain, rethink the concepts
Limit the number of different concepts and objects
Good numerics
Write tests based on mathematical properties
When a user finds an instability, write a new test
Quality enables reuse
Beyond mere reproducibility
G Varoquaux 23
48. 2 Quality assurance
Code review: pull requests
Can include newcomers
We read each others code
Everything is discussed:
- Should the algorithm go in?
- Are there good defaults?
- Are names meaningfull?
- Are the numerics stable?
- Could it be faster?
G Varoquaux 24
49. 2 Quality assurance
Unit testing
Everything is tested
Great for numerics
Overall tests enforce on all estimators
- consistency with the API
- basic invariances
- good handling of various inputs
G Varoquaux 25
50. 2 Dependencies
Beyond installation, a challenge is to ensure package
versions play way together: correctness of the code
Breakage of backward compability
yields irreconcilable dependencies
G Varoquaux 26
51. 2 Dependencies and their upgrade
It’s a fact: users hate upgrading
If it ain’t broken, don’t fix it
even if it is, apparently
G Varoquaux 27
52. 2 Declaring undependence?
Monolythic packages with no dependencies...
But:
Scaling is hard
Complexity grows as square of codebase size
[Woodfield 1979]
User support grows with userbase size
G Varoquaux 28
54. 3 Beyond computational reproducibility
Make every computational step reproducible,
and good science will emerge
G Varoquaux 30
55. 3 Beyond computational reproducibility
Make every computational step reproducible,
and good science will emerge
Estimating the reproducibility of psychological science
[Science 2015] 36% of effects replicate
Reasons:
Statistical challenges — analysis degrees of freedom
Weak insentives — winner’s curse in publication
Seldom computational reproducibility
G Varoquaux 30
56. In practice, the best way to improve research
is to use the right (conceptual) tools.
G Varoquaux 31
57. 3 Managing complexity
In practice, the best way to improve research
is to use the right (conceptual) tools.
The everyday roadblock is cognitive load
Machine learning, brain anatomy, psychology
R, Python, shell scripts
Funding agencies, reviewer 3, the Manuscript
G Varoquaux 32
58. 3 Tools that reduce cognitive overload
It’s a design problem
G Varoquaux 33
59. 3 Tools that reduce cognitive overload
Jonathan Ive, an industrial designer, is #4 at Apple
Code different.
G Varoquaux 34
60. 3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 35
61. 3 Some API design principles for the scipy stack
Consistency, consistency, consistency
np.save(file, obj) pickle.dump(obj, file)
fmin(...maxiter=10) lsq linear(...max iter=10)
Creates cognitive overload
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 36
62. 3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
Objects have hidden states,
Objects have no universal interface, entry point, output
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 37
63. 3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
How much do usage patterns carry out across the library?
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 38
64. 3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Facilitates working with multiple libraries together
Easier to get up to speed with a given library
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 39
65. 3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Change of behavior depending on input type
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 40
66. 3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Interfaces define objects
Incompatible behaviors lead to bugs (eg np.matrix)
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 41
67. 3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Properties obfuscate the data model of the object
Properties can create hidden compute costs
Shallow is better than deep
Error messages matter
Be Pythonic
G Varoquaux 42
68. 3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Objects are understood by their surface
Composition creates cognitive overload
Error messages matter
Be Pythonic
G Varoquaux 43
69. 3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Explain the problem
Print the offending value
Be Pythonic
G Varoquaux 44
70. 3 Some API design principles for the scipy stack
Consistency, consistency, consistency
Functions are easier to understand than classes
A library should hinge on a small number of concepts
Common data containers make the ecosystem stronger
Each function should have one and only one purpose
Code for interfaces, but don’t overdo duck typing
Properties are for impedance matching
Shallow is better than deep
Error messages matter
Be Pythonic
Avoid syntax hacks
G Varoquaux 45
71. 3 Scikit-learn API
Scikit-learn cheat sheet
Scikit-learn
Fit and predict
>>> estimator = Estimator(param1=param1)
>>> estimator.fit(X train, y train)
>>> y test = estimator.predict(X test)
Transform data
>>> X red = estimator.transform(X test)
G Varoquaux 46
72. 3 Scikit-learn API
Scikit-learn cheat sheet
Scikit-learn
Fit and predict
>>> estimator = Estimator(param1=param1)
>>> estimator.fit(X train, y train)
>>> y test = estimator.predict(X test)
Transform data
>>> X red = estimator.transform(X test)
The estimator is a “contract”
(slightly more elaborate than above)
It has created an ecosystem of packages
Based on duck-typing, not inheritence
G Varoquaux 46
73. 3 Example-driven development
The 3-liner as the new cool
Teaching others
Teaching yourself
Write examples that solve end problems
Iterate on your API until these are simple
G Varoquaux 47
74. 3 Example-driven development
The 3-liner as the new cool
Teaching others
Teaching yourself
Write examples that solve end problems
Iterate on your API until these are simple
User flow on the scikit-learn website:
Examples
G Varoquaux 47
75. 3 Example-driven development
The 3-liner as the new cool
Teaching others
Teaching yourself
Write examples that solve end problems
Iterate on your API until these are simple
User flow on the nilearn website:
Examples
G Varoquaux 47
76. 3 Building great documentation
Focus on explaining concepts (hint: write plain English)
Less is more: prioritize, avoid redundancy
Code examples must be short (link to full tutorial examples)
Links everywhere: users will land at the wrong place
Teach with the docs
Plan for maintenance of docs:
Continuous integration
Check links
Runs examples
Doctests
G Varoquaux 48
77. 3 Reusable science
scikit-learn is the new machine-learning textbook
nilearn is the new neuroimaging review article
Experiments reproduced
at each commit
eg: brain reading
nilearn.github.io/auto examples/02 decoding/plot miyawaki reconstruction.html
G Varoquaux 49
78. 3 Reusable science
scikit-learn is the new machine-learning textbook
nilearn is the new neuroimaging review article
Experiments reproduced
at each commit
eg: brain reading
nilearn.github.io/auto examples/02 decoding/plot miyawaki reconstruction.html
Resource intensive Continuous integration:
Data ⇒ Fight for good open data
Computation ⇒ Find good algorithms and tradeoffs
Forces us to distill the literature (as a review)
G Varoquaux 49
79. 3 Reusable science
scikit-learn is the new machine-learning textbook
nilearn is the new neuroimaging review article
Experiments reproduced
at each commit
eg: brain reading
nilearn.github.io/auto examples/02 decoding/plot miyawaki reconstruction.html
Package development consolidates
science and moves it outside the lab
G Varoquaux 49
80. 3 Reusable science
scikit-learn is the new machine-learning textbook
nilearn is the new neuroimaging review article
Experiments reproduced
at each commit
eg: brain reading
nilearn.github.io/auto examples/02 decoding/plot miyawaki reconstruction.html
Package development consolidates
science and moves it outside the lab
scipy-lectures: living book for Python in science
G Varoquaux 49
81. Coding as a scientist
Final code should be auditable,
ideally reusable
Tension between interactive computing
& automating
Main enemy: cognitive overload
G Varoquaux 50
82. Coding as a scientist
Final code should be auditable,
ideally reusable
Tension between interactive computing
& automating
Main enemy: cognitive overload
In the industry
Reusable
Verifiable? Not for silicon valley,
but in insurance, healthcare, banking...
Moving data-scientist code
to production?
Software projects going over budget?
G Varoquaux 50
83. @GaelVaroquaux
Computational practices for reproducible science
Software-engineering skills are superpowers
“text-book” software engineering (yet never taught)
Make you more productive
84. @GaelVaroquaux
Computational practices for reproducible science
Software-engineering skills are superpowers
“text-book” software engineering (yet never taught)
Make you more productive
But beware of over-engineering
In software development, keep an eye on the end goal
Science is iterative, not every step must be reproduced
85. @GaelVaroquaux
Computational practices for reproducible science
Software-engineering skills are superpowers
“text-book” software engineering (yet never taught)
Make you more productive
But beware of over-engineering
In software development, keep an eye on the end goal
Science is iterative, not every step must be reproduced
Science is made by humans, not computers
Usability, communication, human factors
Design for reuse & generalization
86. @GaelVaroquaux
Computational practices for reproducible science
Software-engineering skills are superpowers
“text-book” software engineering (yet never taught)
Make you more productive
But beware of over-engineering
In software development, keep an eye on the end goal
Science is iterative, not every step must be reproduced
Science is made by humans, not computers
Usability, communication, human factors
Design for reuse & generalization
Reproducible research must borrow
from agile development