Improving your jupyter notebook workflow

Slides for my talk at Women Who Code Talk night in Berlin 17/09/2019

Jupyter notebooks are a great tool for interactive programming, but they
can also quickly turn messy. With their focus on rapid prototyping, it
takes some discipline not to end up with a pile of notebooks full of
messy code. On top of that, Jupyter notebooks are difficult to version
control, which makes it hard to track changes. It is therefore easy
to end up in a situation where you don’t understand your own analysis
from some months ago, not to mention your colleague’s code.
In this talk, I will share a few simple and practical tips that I found
helpful to improve the notebook workflow in our data science team.

Link to Jupyter Savehook Gist:

Improving your jupyter notebook workflow

  1. Improving your notebook workflow, by Corrie Bartelheimer
  2. Messy Notebooks
  3. Messy?
     ● Easy to end up with a messy notebook pile
     ● Rapid prototyping leads to less documentation: "Where is the PyTorch import?"
     ● Reproducibility of results: "Which data was used here?"
  4. Collaboration?
     ● Will future-you still understand your notebooks?
     ● Or your colleague? "Where is the PyTorch import?"
     ● Will they be able to run the notebooks? "Which data was used here?"
  5. Version Control?
     ● Diff too large
     ● Or too cryptic
  6. Tidy Notebooks
  7. Version Control!
     jupyter nbconvert --to markdown notebook.ipynb
     ● Convert notebooks to Markdown
     ● Using nbconvert*
     ● Can be automated via a Jupyter save hook
     * Other options include, for example, jupytext and reviewnb
  8. Version Control!
     ● Diff of both code and results
     ● Rich Markdown diff is also possible
  9. Collaboration!
     ● Use a code (and analysis) review process
     ● In our team, we use the GitHub workflow
     ● A review encourages clean-ups and documentation
  10. Reproducibility!
     ● Document where your data comes from
     ● Or better: access data programmatically
     ● Run the whole notebook before committing
  11. Some more tidying
     ● Include a TL;DR summary of the question you’re trying to solve and your conclusion
     Cookiecutter Data Science:
     ● Introduce a naming convention for notebooks, e.g. 1.0-cba-initial-data-exploration
     ● Use a default folder structure
  12. Summary
     ● Collaborate and use code reviews
     ● Version control with converted notebooks
     ● Make notebooks reproducible
     ● Clean up your notebooks
  13. Thanks! Any questions? @corrieaar corriebar
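
A minimal sketch of the save hook mentioned on slide 7, assuming a Jupyter server that exposes the `FileContentsManager.post_save_hook` setting and a config file such as `jupyter_notebook_config.py`. This is not the gist referenced in the talk. The helper name `nbconvert_command` and the choice to shell out to `jupyter nbconvert` are illustrative assumptions.

```python
import os
import subprocess


def nbconvert_command(notebook_filename):
    """Build the nbconvert call that exports one notebook to Markdown.

    Hypothetical helper, introduced here only to keep the hook readable.
    """
    return ["jupyter", "nbconvert", "--to", "markdown", notebook_filename]


def post_save(model, os_path, contents_manager, **kwargs):
    """Post-save hook: write a Markdown copy next to every saved notebook."""
    if model["type"] != "notebook":
        return  # plain files and directories are left alone
    directory, filename = os.path.split(os_path)
    # Run in the notebook's own directory so the .md file lands beside it.
    subprocess.check_call(nbconvert_command(filename), cwd=directory or None)


# Jupyter injects `c` when it loads the config file; the guard lets this
# module also be imported on its own (for example in a test).
if "c" in globals():
    c.FileContentsManager.post_save_hook = post_save
```

With the hook registered, saving a notebook such as `analysis.ipynb` (a made-up name) would also write `analysis.md` next to it, so the Markdown copy can be committed and diffed alongside the notebook.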