How do you organise a Jupyter (IPython) notebook research project so that you, as well as others, can read, understand and reproduce your work? How big should a notebook be? What should go in one cell? How do the Clean Code principles outlined by Robert C. Martin (aka Uncle Bob) relate to Python and, more specifically, to IPython?
2. @KNerush @Volodymyrk
Volodymyr (Vlad) Kazantsev
Head of Data @ product madness
Product Manager
MBA @LBS
Graphics programming
Writes code for money since 2002
Math degree
Kateryna (Katya) Nerush
Mobile Dev @ Octopus Labs
Dev Lead in Finance
Data Engineer
Web Developer
Writes code for money since 2003
CS degree
Who are Data Scientists, really?
Coding, Stats, Business
“In a nutshell, coding is telling a computer to do something using a language it understands.”
Data Science with Python
“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” - Martin Fowler, 1999
WTF! How am I supposed to validate this??
Sorry, but how can I calculate 7-day retention?
You do it for your own good..
Re-run all A/B test analyses for the last months, by tomorrow
Ideas & Questions → Data → Analysis → Insights → Impact
“Clean Code” ?
“Pleasingly graceful and stylish in appearance or manner” - Bjarne Stroustrup, inventor of C++
“Clean code reads like well-written prose” - Grady Booch, creator of UML
“.. each routine turns out to be pretty much what you expected” - Ward Cunningham, inventor of Wiki and XP
One does not simply start writing clean code..
“First make it work, then make it right, then make it fast and small” - Kent Beck, co-inventor of XP and TDD
Clean code, according to Ron Jeffries, author of Extreme Programming Installed:
- Runs all the tests
- Contains no duplicate code
- Expresses all ideas...
- Minimizes classes and methods
“Leave the campground cleaner than you found it” - The Boy Scouts of America, applied to programming by Uncle Bob
“There are only two hard problems in Computer Science: cache invalidation and naming things” - Phil Karlton
● long_descriptive_names
○ Avoid: x, i, stuff, do_blah()
● Pronounceable and Searchable
○ revenue_per_payer vs. arpdpu
● Avoid encodings, abbreviations, prefixes, suffixes.. if possible
○ bonus_points_on_iphone vs. cns_crm_dip
● Add meaningful context
○ daily_revenue_per_payer
● Don’t be lazy.
○ Spend time naming and renaming things.
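The naming advice above can be sketched in a few lines of Python. The revenue figures below are hypothetical and exist only for illustration; only the names `revenue_per_payer` and `daily_revenue_per_payer` come from the slide.

```python
# Hypothetical data: revenue per paying user for a single day.
revenue_by_payer = {"user_a": 4.99, "user_b": 9.99, "user_c": 0.99}

# Hard to read: what do x, s and r stand for?
x = revenue_by_payer
s = sum(x.values())
r = s / len(x)

# Easier to read: pronounceable, searchable names that carry context.
total_daily_revenue = sum(revenue_by_payer.values())
number_of_payers = len(revenue_by_payer)
daily_revenue_per_payer = total_daily_revenue / number_of_payers
```

Both versions compute the same number; only the second one can be read without tracing every assignment.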
“each routine turns out to be pretty much what you
expected” - Ward Cunningham
● Small
● Do one thing
● One Level of Abstraction
● Have only a few arguments (one is best)
○ Less important in Python, with named arguments.
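A minimal sketch of the points above: small functions that each do one thing, stay at one level of abstraction, and use keyword arguments where several parameters are needed. All function names and the sample data are hypothetical.

```python
def keep_paying_users(users):
    """Return only the users who have spent money."""
    return [user for user in users if user["revenue"] > 0]


def total_revenue(users):
    """Sum revenue over the given users."""
    return sum(user["revenue"] for user in users)


def revenue_per_payer(users):
    """Average revenue among paying users; 0.0 when nobody paid."""
    payers = keep_paying_users(users)
    if not payers:
        return 0.0
    return total_revenue(payers) / len(payers)


def format_money(amount, *, currency="USD", decimals=2):
    """Keyword-only arguments keep the call site self-explanatory."""
    return f"{amount:.{decimals}f} {currency}"


# Hypothetical sample data.
users = [
    {"name": "a", "revenue": 10.0},
    {"name": "b", "revenue": 0.0},
    {"name": "c", "revenue": 20.0},
]
report_line = format_money(revenue_per_payer(users), currency="GBP")
```

Because `currency` and `decimals` must be passed by name, a call such as `format_money(15.0, currency="GBP")` stays readable even as the argument list grows.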
● Use good names
● Avoid obvious comments
● Avoid dead, commented-out code
● Avoid ToDos, licenses, history, markup for documentation and other nonsense
● But there are exceptions..
“When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous” - Kent Beck
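One way to act on this advice, sketched with hypothetical names and a hypothetical 7-day window: replace the explanatory comment with a named constant and well-named functions.

```python
from datetime import date


# Before: the comment has to explain what the condition means.
def is_eligible_before(install_date, today):
    # user installed at least 7 days ago
    return (today - install_date).days >= 7


# After: the names carry the meaning, so the comment is no longer needed.
RETENTION_WINDOW_DAYS = 7


def days_since_install(install_date, today):
    return (today - install_date).days


def has_completed_retention_window(install_date, today):
    return days_since_install(install_date, today) >= RETENTION_WINDOW_DAYS
```

The behaviour is unchanged; the refactoring only moves the explanation from a comment into the code itself.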
“Long functions are where classes are trying to hide” - Robert C. Martin
● Small
● Do one thing
● SOLID, Design Patterns, etc.
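As an illustration, values that are always passed around together in a long function are often a small class waiting to be extracted. The sketch below assumes a hypothetical A/B test summary; `ABTestResult` and `relative_uplift` are invented names.

```python
from dataclasses import dataclass


@dataclass
class ABTestResult:
    """Hypothetical container for one variant of an A/B test."""

    variant: str
    users: int
    conversions: int

    @property
    def conversion_rate(self):
        # Guard against an empty variant instead of dividing by zero.
        return self.conversions / self.users if self.users else 0.0


def relative_uplift(control, treatment):
    """Relative change in conversion rate of treatment versus control.

    Assumes the control variant has at least one conversion.
    """
    return treatment.conversion_rate / control.conversion_rate - 1.0


control = ABTestResult("control", users=1000, conversions=100)
treatment = ABTestResult("treatment", users=1000, conversions=120)
uplift = relative_uplift(control, treatment)
```

The class stays small and does one thing: it holds the counts for a variant and derives the rate from them.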
Code conventions
● A team should produce code in one style, as if it were written by one person
● Team conventions take precedence over language conventions, which take precedence over personal ones
● Automate style formatting
Example: PEP 8 -- Style Guide for Python Code
● Indentation
● Tabs or Spaces?
● Maximum Line Length
● Should a line break before or after a binary operator?
● Blank Lines
● Imports
● Comments
● Naming Conventions
Good:
foo = long_function_name(var_one, var_two,
                         var_three, var_four)
Bad:
foo = long_function_name(var_one, var_two,
    var_three, var_four)
https://www.python.org/dev/peps/pep-0008/
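On the binary operator question, PEP 8 suggests that new code break the line before the operator. The sketch below follows the variable names of PEP 8's own example; the numeric values are made up for illustration.

```python
# Hypothetical figures, used only to illustrate the layout.
gross_wages = 1000
taxable_interest = 50
dividends = 20
qualified_dividends = 5
ira_deduction = 100
student_loan_interest = 30

# Each operator starts its line, so it sits next to the operand it applies to.
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)
```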
My favourite!
This is not Java or C++
● Functions are first-class objects
● Duck-typing as an interface
● No setters/getters
● itertools, zip, enumerate
● etc.
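A short sketch of these idioms using only the standard library. The platform names, revenue figures and the `FakeFile` class are hypothetical.

```python
import io
from itertools import groupby

platforms = ["iphone", "android", "ipad"]
daily_revenue = [120.0, 95.5, 40.0]

# enumerate replaces manual index bookkeeping such as range(len(...)).
ranked = [f"{rank}. {name}" for rank, name in enumerate(platforms, start=1)]

# zip pairs up parallel sequences.
revenue_by_platform = dict(zip(platforms, daily_revenue))

# Functions are first-class objects: a bound method is passed as the key.
best_platform = max(revenue_by_platform, key=revenue_by_platform.get)


# Duck typing as an interface: anything with a read() method will do.
class FakeFile:
    def read(self):
        return "a,b"


def count_columns(readable):
    return len(readable.read().split(","))


columns_in_fake_file = count_columns(FakeFile())
columns_in_string_io = count_columns(io.StringIO("x,y,z"))

# itertools.groupby groups consecutive equal items, so the input is sorted first.
events = sorted(["ipad", "iphone", "ipad", "android"])
events_per_platform = {name: len(list(group)) for name, group in groupby(events)}
```

`count_columns` never checks the type of its argument; it works with `FakeFile`, `io.StringIO`, or an open file, because all of them provide `read()`.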
Typical structure of the ipynb
1. Imports
2. Get Data
3. Transform Data
4. Modelling
5. Visualisation
6. Making sense of the data
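The six stages could look roughly like this when written as notebook cells. This is a dependency-free sketch: the inline CSV is hypothetical and stands in for a real data source, a plain average stands in for a model, and a text bar chart stands in for a plotting library.

```python
# Cell 1: Imports (expected output: nothing, i.e. no import errors)
import csv
import io
from statistics import mean

# Cell 2: Get data (hypothetical inline CSV instead of a database or file)
raw_csv = io.StringIO(
    "day,revenue,payers\n"
    "2016-01-01,100.0,10\n"
    "2016-01-02,150.0,12\n"
    "2016-01-03,90.0,9\n"
)
rows = list(csv.DictReader(raw_csv))

# Cell 3: Transform data
daily_revenue_per_payer = {
    row["day"]: float(row["revenue"]) / int(row["payers"]) for row in rows
}

# Cell 4: Modelling (a plain average as a placeholder for a real model)
average_revenue_per_payer = mean(list(daily_revenue_per_payer.values()))

# Cell 5: Visualisation (text bars, one '#' per rounded unit of revenue)
chart = "\n".join(
    f"{day} {'#' * round(value)}" for day, value in daily_revenue_per_payer.items()
)
print(chart)

# Cell 6: Making sense of the data
best_day = max(daily_revenue_per_payer, key=daily_revenue_per_payer.get)
print(f"Best day: {best_day}")
```

Each commented block corresponds to one cell, and each cell produces at most one visible output.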
Tip 4: each cell should have one logical output
● One “idea - execution - output” triplet per cell
● Import cell: expected output is no import errors
● CMD+SHIFT+P
Tip 5: write tests .. in Jupyter notebooks
https://pypi.python.org/pypi/pytest-ipynb
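A minimal sketch of a test that lives next to the analysis code, using plain `assert` statements. It does not rely on the pytest-ipynb plugin linked above, and the retention definition used here is a hypothetical one chosen only for the example.

```python
def seven_day_retention(installed_users, active_on_day_seven):
    """Share of installed users who were active again on day seven.

    Hypothetical definition, chosen only for this sketch.
    """
    installed = set(installed_users)
    if not installed:
        return 0.0
    retained = installed & set(active_on_day_seven)
    return len(retained) / len(installed)


# Test cell: expected output is nothing, i.e. no AssertionError.
def test_seven_day_retention():
    assert seven_day_retention(["a", "b", "c", "d"], ["a", "c", "x"]) == 0.5
    assert seven_day_retention([], ["a"]) == 0.0
    assert seven_day_retention(["a"], []) == 0.0


test_seven_day_retention()
```

Running the test cell right after the definition means a broken metric fails loudly before any chart is drawn from it.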
Code Smells .. in ipynb
- Cells can’t be executed in order (with Run All and Restart & Run All)
- Prototype (check ideas) code is mixed with “analysis” code
- Debugging cells
- Copy-paste cells
- Duplicate code (in general)
- Multiple notebooks that re-implement the same function
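The first smell can be detected roughly from the notebook file itself: an .ipynb file is JSON, and each code cell stores an `execution_count`. The sketch below is a heuristic under that assumption, with a hypothetical function name and hypothetical notebooks built in memory. It reports False for any notebook with unexecuted or empty code cells, so it only approximates a real Restart & Run All.

```python
import json


def cells_ran_in_order(notebook_json):
    """Return True if code cells were executed top to bottom as 1, 2, 3, ..."""
    notebook = json.loads(notebook_json)
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    return counts == list(range(1, len(counts) + 1))


# Hypothetical notebooks, reduced to the fields the check looks at.
clean_notebook = json.dumps({"cells": [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "execution_count": 1, "source": "import csv"},
    {"cell_type": "code", "execution_count": 2, "source": "rows = []"},
]})
messy_notebook = json.dumps({"cells": [
    {"cell_type": "code", "execution_count": 7, "source": "rows = []"},
    {"cell_type": "code", "execution_count": 3, "source": "import csv"},
]})

clean_is_ordered = cells_ran_in_order(clean_notebook)
messy_is_ordered = cells_ran_in_order(messy_notebook)
```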
Summary: How to organise a Jupyter project
1. A notebook should have one Hypothesis-Data-Interpretation loop
2. Make a multi-project utils library
3. A good Jupyter notebook reads like well-written prose
4. Each cell should have one and only one output
5. Write tests in notebooks
6. Deploy a shared Jupyter server
7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.