1. © 2021 Anaconda
Making Reproducible
Conda-based Projects
Albert DeFusco
May 19, 2021
2.
Agenda
1 | Reproducible Notebooks?
2 | Gathering materials
3 | Conda Environments
4 | Anaconda Project
5 | Sharing projects
6 | Demonstration
3.
Jupyter notebooks everywhere
4.
Are Jupyter notebooks reproducible?
A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks
João Felipe Pimentel, Leonardo Murta, Vanessa Braganholo, Juliana Freire
● Analyzed about 1.4 million Jupyter notebooks on GitHub
○ Was the notebook readable?
○ Was the notebook named “Untitled.ipynb”?
○ Did the notebook define functions or classes?
○ Was the data provided?
○ Could the notebook be executed?
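Two of the checks listed above can be approximated with the Python standard library. This is a minimal sketch and not the study's actual method: the function names are hypothetical, and it assumes the notebook is stored in the nbformat 4 JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`).

```python
import ast
import json
from pathlib import Path


def load_notebook(path):
    """Read an .ipynb file (plain JSON) into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def cell_source(cell):
    """Return a cell's source as one string (stored as str or list of str)."""
    source = cell.get("source", "")
    return source if isinstance(source, str) else "".join(source)


def is_untitled(path):
    """True for default names such as Untitled.ipynb or Untitled1.ipynb."""
    return Path(path).stem.lower().startswith("untitled")


def defines_functions_or_classes(notebook):
    """True if any code cell defines a function or class."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        try:
            tree = ast.parse(cell_source(cell))
        except SyntaxError:
            # Cells containing IPython magics (%, !) are not plain Python;
            # this sketch simply skips them.
            continue
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                return True
    return False
```

Skipping cells that fail to parse keeps the sketch short, at the cost of missing definitions that share a cell with a magic command.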
http://www.ic.uff.br/~leomurta/papers/pimentel2019a.pdf
5.
Are Jupyter notebooks reproducible?
A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks
João Felipe Pimentel, Leonardo Murta, Vanessa Braganholo, Juliana Freire
“out of 863,878 attempted executions of valid notebooks (i.e., notebooks with
defined Python version and execution order), only 24.11% executed without
errors and only 4.03% produced the same results.”
http://www.ic.uff.br/~leomurta/papers/pimentel2019a.pdf
6.
Gather code and data
7.
Gather what you need in one directory
● Jupyter notebooks
● Script files
● Data files
○ Or links to the data on the web
[Two images, captioned “This is not a good idea” and “Much better!”]
8.
Why some people don’t like Notebooks
● Cells can be run out-of-order
● Cells can be deleted
● No guarantee that the correct execution can be repeated easily
9.
Recommendation: Restart and Run All
Before sharing your work
1. Restart Kernel and Run All Cells
2. Fix the first error
3. Repeat
Do this frequently
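One way to notice that a notebook was saved without a clean Restart and Run All is to inspect the execution counts stored in the file. The following is a minimal sketch, assuming the nbformat 4 JSON layout; `executed_in_order` is a hypothetical helper, and passing this check does not prove that the notebook runs without errors.

```python
import json


def load_notebook(path):
    """Read an .ipynb file (plain JSON) into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def executed_in_order(notebook):
    """True if the non-empty code cells carry execution counts 1, 2, 3, ...

    That is the pattern left behind by a fresh kernel that ran every cell
    from top to bottom exactly once.
    """
    counts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if not isinstance(source, str):
            source = "".join(source)
        if not source.strip():
            # Assumption: empty code cells are not given an execution count.
            continue
        counts.append(cell.get("execution_count"))
    return counts == list(range(1, len(counts) + 1))
```

A notebook whose cells were run out of order, re-run, or never run will fail the check, which is a cue to restart the kernel and run all cells again before sharing.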
11.
The Anaconda base environment
● Over 300 packages already installed
● Data science out of the box
12.
Plan ahead with multiple environments
● Keep multiple package versions installed
● You can use environments to
○ Try new versions of packages
○ Install old versions of packages
○ Install only the packages you need for a project
13.
Create the new environment
● On the Environment pane
○ Click Create
○ Provide a name
○ Select the Python version
● Continue to add more packages
14.
Best practices
● Managing multiple environments can be challenging
● nb_conda
○ Install into your base environment and launch Jupyter
○ Access any environment where ipykernel, notebook, or jupyter has been installed
○ Every notebook will “remember” the environment that was used
15.
Works with Jupyter Notebook and JupyterLab
17.
anaconda-project
● Open-source tool that helps you make reproducible projects
● With a single YAML text file you can
○ Specify Conda and Pip packages that will be installed
○ Specify URLs to automatically download data files
○ Specify executable commands that can launch scripts, applications, or Jupyter Notebooks
● You can easily share projects as
○ Archive files (zip, tar.gz, tar.bz2)
○ Uploads to Anaconda.org
○ Docker images
18.
Install anaconda-project
● Provided with Anaconda Individual Edition
● Install or upgrade anaconda-project using
○ Anaconda Navigator
○ Conda
19.
Initialize your project on the Command Line
To create a new anaconda-project.yml file from scratch in your directory:
$ anaconda-project init
$ anaconda-project add-packages package1 package2 …
$ anaconda-project add-packages --pip package3 package4 ...
20.
Initialize your project on the Command Line
To bootstrap a project from an existing Conda environment:
$ conda env export --from-history -n environment-name > anaconda-project.yml
# add more packages if needed
$ anaconda-project add-packages package1 package2 …
$ anaconda-project add-packages --pip package3 package4 ...
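Without the --from-history flag, the export lists every installed package, including platform-specific dependencies, so the environment may not work on other operating systems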
21.
Runnable commands
With Commands you can define how to execute your project
● Run a script or OS command
● Launch a Jupyter notebook
● Launch an application from a notebook or script. For example:
○ Bokeh or Panel dashboards
○ Flask, Django, or FastAPI web service
$ anaconda-project add-command --type notebook name notebook.ipynb
$ anaconda-project add-command --type unix name "panel serve notebook.ipynb"
$ anaconda-project add-command --type windows name "panel serve notebook.ipynb"
22.
Locking package versions
The anaconda-project.yml file may now look like this:
name: my-project
packages:
- python=3.8
- notebook
- pandas
To completely specify the package versions (including pip freeze)
$ anaconda-project lock
Updating locked dependencies for env spec default…
Resolving conda packages for osx-64
Resolving conda packages for linux-64
Resolving conda packages for win-64
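Before locking, most entries in the example file above carry no version constraint, so the solver is free to choose different versions each time the environment is built. The following small sketch lists such entries. `unpinned_packages` is a hypothetical helper and not part of anaconda-project; it only inspects plain string specs and ignores nested `pip:` entries.

```python
import re

# Characters that signal a version constraint in a package spec,
# for example "python=3.8", "numpy>=1.20", or the space form "numpy 1.20".
_CONSTRAINT = re.compile(r"[=<>!~ ]")


def unpinned_packages(specs):
    """Return the string specs that carry no version constraint at all."""
    return [
        spec
        for spec in specs
        if isinstance(spec, str) and not _CONSTRAINT.search(spec.strip())
    ]


# The package list from the example anaconda-project.yml above.
packages = ["python=3.8", "notebook", "pandas"]
print(unpinned_packages(packages))
```

Note that `python=3.8` counts as constrained here even though it does not fix an exact build; recording exact versions for every platform is what the lock step is for.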
23.
Multiple ways to share your project
● Create a Git repository and push to GitHub like any other project
● Create a Zip or Tar archive
$ anaconda-project archive archive-name.zip
● Create an account on Anaconda.org and upload an archive
$ anaconda-project upload
● Create a Docker image (requires anaconda-project 0.10 or greater)
$ anaconda-project dockerize
25.
Learn more
https://anaconda-project.readthedocs.io
26.
Review
● Use a new Conda environment or initialize an anaconda-project.yml file
● Gather notebooks, code, and data into one directory
● Make sure you can “Restart kernel and run all” on your notebooks
● Special features of anaconda-project
○ Lock package versions for Mac, Linux, and Windows
○ Define runnable commands
○ Easily share the project as an archive, on Anaconda.org, or as a Docker image