Clean Code in Jupyter notebook

@KNerush @Volodymyrk
Clean Code
In Jupyter notebooks, using Python
5th of July, 2016

Volodymyr (Vlad) Kazantsev
Head of Data @ Product Madness
Product Manager
MBA @LBS
Graphics programming
Writes code for money since 2002
Math degree
Kateryna (Katya) Nerush
Mobile Dev @ Octopus Labs
Dev Lead in Finance
Data Engineer
Web Developer
Writes code for money since 2003
CS degree

Why do we end up with messy IPython notebooks?
(Venn diagram: Coding, Stats, Business)

Who are Data Scientists, really?
(Venn diagram: Coding, Stats, Business)
“In a nutshell, coding is telling a computer to do something using a language it understands.” - Data Science with Python

It is not going to production anyway!

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” - Martin Fowler, 1999
WTF! How am I supposed to validate this??
Sorry, but how can I calculate 7-day retention?

From Prototype to ... The Data Science Spiral
(Diagram: Ideas & Questions → Data → Analysis → Insights → Impact)

You do it for your own good.
“Re-run the analysis for all A/B tests from the last few months, by tomorrow”
(Diagram: Ideas & Questions → Data → Analysis → Insights → Impact)

Part 2
What can Data Scientists learn from
Software Engineers?

Robert C. Martin, a.k.a. “Uncle Bob”
https://cleancoders.com/

“Clean Code”?
Pleasingly graceful and stylish in appearance or manner - Bjarne Stroustrup, inventor of C++
“Clean code reads like well-written prose” - Grady Booch, co-creator of UML
“..each routine turns out to be pretty much what you expected” - Ward Cunningham, inventor of the wiki and co-creator of XP

One does not simply start writing clean code..
“First make it work, then make it right, then make it fast and small” - Kent Beck, co-inventor of XP and TDD
“Leave the campground cleaner than you found it” - The Boy Scouts of America, applied to programming by Uncle Bob
Simple code, according to Ron Jeffries (author of Extreme Programming Installed):
- Runs all the tests
- Contains no duplicate code
- Expresses all ideas...
- Minimizes classes and methods

“I'm not a great programmer; I'm just a good programmer with great habits.” - Kent Beck

“There are only two hard problems in Computer Science: cache invalidation and naming things.” - Phil Karlton
● long_descriptive_names
○ Avoid: x, i, stuff, do_blah()
● Pronounceable and Searchable
○ revenue_per_payer vs. arpdpu
● Avoid encodings, abbreviations, prefixes, suffixes.. if possible
○ bonus_points_on_iphone vs. cns_crm_dip
● Add meaningful context
○ daily_revenue_per_payer
● Don’t be lazy.
○ Spend time naming and renaming things.
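The naming advice above is easiest to see as a before/after pair. This is a minimal sketch. The purchase data and both function names are hypothetical, and `arpdpu` is borrowed from the slide only as an example of a name to avoid.

```python
# Before: short, unpronounceable names hide the intent.
def arpdpu(x, y):
    return sum(x) / len(set(y))


# After: descriptive, searchable names with meaningful context (hypothetical example).
def daily_revenue_per_payer(purchase_amounts, payer_ids):
    """One day's total revenue divided by the number of distinct payers that day."""
    number_of_payers = len(set(payer_ids))
    return sum(purchase_amounts) / number_of_payers


todays_purchase_amounts = [4.99, 9.99, 4.99]
todays_payer_ids = ["alice", "bob", "alice"]
revenue_per_payer_today = daily_revenue_per_payer(todays_purchase_amounts, todays_payer_ids)
```

Both functions compute the same number. Only the second one tells the reader what that number is.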

“each routine turns out to be pretty much what you
expected” - Ward Cunningham
● Small
● Do one thing
● One Level of Abstraction
● Have only a few arguments (one is best)
○ Less important in Python, thanks to keyword arguments.
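Here is a small sketch of these rules, with hypothetical function names and data. Each helper does one thing. The top-level function reads as steps at a single level of abstraction. The bare `*` makes the optional parameters keyword-only, which is one way Python keeps longer argument lists readable at the call site.

```python
def clean_prices(raw_prices):
    """Drop missing and negative prices (does one thing)."""
    return [price for price in raw_prices if price is not None and price >= 0]


def average(values):
    """Arithmetic mean, or 0.0 for an empty list (does one thing)."""
    return sum(values) / len(values) if values else 0.0


def describe_average_price(raw_prices, *, currency="USD", decimals=2):
    """Top level: cleaning, averaging and formatting at the same level of abstraction.

    Everything after `*` must be passed by name, e.g. currency="GBP".
    """
    average_price = average(clean_prices(raw_prices))
    return f"{average_price:.{decimals}f} {currency}"


print(describe_average_price([1.0, None, 3.0, -2.0]))
print(describe_average_price([1.0, None, 3.0, -2.0], currency="GBP", decimals=1))
```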

● Use good names
● Avoid obvious comments
● Avoid dead, commented-out code
● Avoid TODOs, licenses, history, documentation markup and other nonsense
● But there are exceptions..
“When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous” - Kent Beck
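This sketch applies the refactor-instead-of-comment advice. The `user` dictionary keys and the 7-day window are hypothetical. The comment in the "before" version is replaced by named helpers and a named constant that say the same thing.

```python
# Before: the comment explains what a cryptic condition means.
def check(user):
    # paying user who logged in during the last 7 days
    return user["total_spend"] > 0 and user["days_since_last_login"] <= 7


# After: names make the comment superfluous (hypothetical keys and window).
ACTIVE_WINDOW_DAYS = 7


def is_payer(user):
    return user["total_spend"] > 0


def is_recently_active(user):
    return user["days_since_last_login"] <= ACTIVE_WINDOW_DAYS


def is_active_payer(user):
    return is_payer(user) and is_recently_active(user)


example_user = {"total_spend": 9.99, "days_since_last_login": 3}
```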

// When I wrote this, only God and I understood what I was doing
// Now, God only knows

// sometimes I believe compiler ignores all my comments

/**
 * Always returns true.
 */
public boolean isAvailable() {
    return false;
}

“Long functions are where classes are trying to hide” - Robert C. Martin
● Small
● Do one thing
● SOLID, Design Patterns, etc.
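Picture a long function that loads purchases, collects payers, sums revenue and divides the two. It keeps passing the same list between its steps, and that shared data is often the hidden class. This is a hypothetical sketch of pulling it out. The list becomes state and each step becomes a small method that does one thing.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PurchaseLog:
    """Hypothetical class extracted from a long function.

    The data the function kept passing around becomes an attribute,
    and each step becomes a small, separately testable method.
    """

    purchases: List[Tuple[str, float]] = field(default_factory=list)  # (user_id, amount)

    def add(self, user_id: str, amount: float) -> None:
        self.purchases.append((user_id, amount))

    def payers(self) -> set:
        return {user_id for user_id, _ in self.purchases}

    def total_revenue(self) -> float:
        return sum(amount for _, amount in self.purchases)

    def revenue_per_payer(self) -> float:
        number_of_payers = len(self.payers())
        return self.total_revenue() / number_of_payers if number_of_payers else 0.0


log = PurchaseLog()
log.add("alice", 5)
log.add("bob", 3)
log.add("alice", 2)
```

`field(default_factory=list)` gives every `PurchaseLog` its own list rather than one shared mutable default.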

Code conventions
● The team should produce code in one style, as if it were written by one person
● Team conventions take precedence over language conventions, which take precedence over personal ones
● Automate style formatting

Part 3
How to write Clean Code in Python?
(i.e. this is not Java)

PEP 8 -- Style Guide for Python Code
● Indentation
● Tabs or Spaces?
● Maximum Line Length
● Should a line break before or after a binary operator?
● Blank Lines
● Imports
● Comments
● Naming Conventions
Example:
Good (continuation aligned with the opening delimiter):
foo = long_function_name(var_one, var_two,
                         var_three, var_four)
Bad (arguments on the first line without vertical alignment):
foo = long_function_name(var_one, var_two,
    var_three, var_four)
https://www.python.org/dev/peps/pep-0008/
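Here are a few of the PEP 8 topics listed above in one place. This is a sketch, not a complete style reference. The `net_income` body is adapted from PEP 8's own example of breaking lines before binary operators. `MINIMUM_SAMPLE_SIZE` and `mean_if_large_enough` are hypothetical names chosen to show the naming conventions.

```python
import statistics  # imports at the top: standard library first, then third-party, then local

MINIMUM_SAMPLE_SIZE = 30  # constants: UPPER_CASE_WITH_UNDERSCORES


def net_income(gross_wages, taxable_interest, dividends,
               qualified_dividends, ira_deduction, student_loan_interest):
    # Continuation line aligned with the opening delimiter.
    # Line breaks go before binary operators, so each operator sits next to its operand.
    return (gross_wages
            + taxable_interest
            + (dividends - qualified_dividends)
            - ira_deduction
            - student_loan_interest)


def mean_if_large_enough(values, minimum_size=MINIMUM_SAMPLE_SIZE):
    """Functions and variables: lower_case_with_underscores."""
    if len(values) < minimum_size:
        return None
    return statistics.mean(values)
```

Two blank lines separate top-level definitions, as PEP 8 recommends. A formatter or linter can enforce all of this automatically.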

Google Python Style Guide
https://google.github.io/styleguide/pyguide.html

My favourite!
This is not Java or C++
● Functions are first-class objects
● Duck-typing as an interface
● No setters/getters
● Itertools, zip, enumerate
● etc.
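Each bullet above maps to a few lines of plain Python. This is an illustrative sketch. `Experiment`, the variants and the events are hypothetical data, and only standard-library calls are used.

```python
import io
from itertools import groupby
from operator import itemgetter


# Functions are first-class objects: pass them around like any other value.
def apply_to_all(function, values):
    return [function(value) for value in values]


# Duck typing as an interface: anything with a .read() method will do.
def count_lines(readable):
    return len(readable.read().splitlines())


# No setters/getters: plain attributes, plus @property for derived values.
class Experiment:
    def __init__(self, name, conversions, visitors):
        self.name = name
        self.conversions = conversions
        self.visitors = visitors

    @property
    def conversion_rate(self):
        return self.conversions / self.visitors if self.visitors else 0.0


absolute_values = apply_to_all(abs, [-1, 2, -3])
line_count = count_lines(io.StringIO("first\nsecond\n"))  # a file object would work too
experiment_b = Experiment("variant_b", conversions=25, visitors=100)

# zip and enumerate instead of index arithmetic.
variants = ["A", "B"]
conversion_counts = [30, 45]
ranked_variants = list(enumerate(zip(variants, conversion_counts), start=1))

# itertools.groupby groups *consecutive* items, so the input must already be sorted by key.
events = [("alice", "login"), ("alice", "purchase"), ("bob", "login")]
events_per_user = {user: len(list(group)) for user, group in groupby(events, key=itemgetter(0))}
```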

Part 4
How to write Clean Python Code in Jupyter
Notebook?

Typical structure of the ipynb
1. Imports
2. Get Data
3. Transform Data
4. Modelling
5. Visualisation
6. Making sense of the data

How big should a notebook file be?
Hypothesis - Data - Interpretation

Keep your notebooks small!
(4-10 cells each)

Tip 1: break a fat notebook into many small ones
Example:
1_data_preparation.ipynb
    df.to_pickle('clean_data_1.pkl')
2_linear_model.ipynb
    df = pd.read_pickle('clean_data_1.pkl')
3_ensemble.ipynb
    df = pd.read_pickle('clean_data_1.pkl')
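The hand-off in Tip 1 can be sketched in one runnable script. It assumes pandas is installed. The two marked sections stand in for the last cell of the data-preparation notebook and the first cell of a modelling notebook. The temporary directory and the tiny DataFrame are hypothetical stand-ins for a real shared data folder and real data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical shared location for intermediate results (a real project would use a fixed folder).
DATA_DIR = tempfile.mkdtemp()
CLEAN_DATA_PATH = os.path.join(DATA_DIR, "clean_data_1.pkl")

# --- last cell of 1_data_preparation.ipynb ---
raw = pd.DataFrame({"user_id": ["a", "b", None], "revenue": [4.99, None, 1.99]})
clean = raw.dropna()  # drop rows with any missing value
clean.to_pickle(CLEAN_DATA_PATH)

# --- first cell of 2_linear_model.ipynb (and 3_ensemble.ipynb) ---
df = pd.read_pickle(CLEAN_DATA_PATH)
```

Each downstream notebook starts from the same cleaned file. It does not need to re-run, or even contain, the preparation code.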

Tip 2: a shared library
● Data access
● Common plotting functionality
● Report generation
● Misc. utils
acme_data_utils/
    data_access.py
    plotting.py
    setup.py
    tests/
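Here is a hypothetical sketch of one helper that could live in the library's `data_access.py`. It uses SQLite from the standard library purely as a stand-in for whatever data source a team actually has. The function name, parameters and default database path are assumptions, not part of the original library.

```python
# acme_data_utils/data_access.py  (hypothetical module in the shared library)
import sqlite3
from contextlib import closing

DEFAULT_DATABASE_PATH = "analytics.db"  # hypothetical default; point this at your real data source


def query(sql, params=(), database_path=DEFAULT_DATABASE_PATH):
    """Run a SQL query and return the rows as a list of dicts (column name -> value)."""
    with closing(sqlite3.connect(database_path)) as connection:
        connection.row_factory = sqlite3.Row  # rows can then be read by column name
        rows = connection.execute(sql, params).fetchall()
    return [dict(row) for row in rows]


# Usage: an in-memory database, so the example needs no files.
answer_rows = query("SELECT ? AS answer", (42,), database_path=":memory:")
```

Once the library is installed (for example with `pip install -e .` next to its `setup.py`), every notebook can import the same helper instead of re-implementing it.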

Tip 3: Don’t just be pythonic. Be IPythonic
Don’t hide the “secret sauce” inside an imported module
[Figure: BAD vs. Good notebook examples]
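Here is a hypothetical illustration of Tip 3. Generic plumbing, such as data access, can be imported from a shared library. The analysis-specific logic, here a 7-day retention calculation, stays in a visible notebook cell where readers can check it. The retention definition used below (a user counts as retained if active exactly on day 7 after install) is an assumption. Other definitions are in common use.

```python
from datetime import date, timedelta

RETENTION_DAY = 7  # assumption: "day-7 retention" = active exactly 7 days after install


def day_n_retention(install_dates, activity, day=RETENTION_DAY):
    """Share of installed users who were active exactly `day` days after installing.

    install_dates: dict of user_id -> install date
    activity: iterable of (user_id, activity date) pairs
    """
    active_user_days = set(activity)
    retained_users = [
        user_id
        for user_id, installed_on in install_dates.items()
        if (user_id, installed_on + timedelta(days=day)) in active_user_days
    ]
    return len(retained_users) / len(install_dates) if install_dates else 0.0


install_dates = {"alice": date(2016, 7, 1), "bob": date(2016, 7, 1)}
activity = [("alice", date(2016, 7, 8)), ("bob", date(2016, 7, 3))]
seven_day_retention = day_n_retention(install_dates, activity)
```

Because the definition sits in the notebook, a reviewer asking "how do you calculate 7-day retention?" can read the answer next to the result.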

“Clean code reads like well-written prose” - Grady Booch

A good Jupyter notebook reads like well-written prose

How big should one Cell be?

Tip 4: each cell should have one logical output
● One “idea - execution - output” triplet per cell
● Import cell: expected output is no import errors
● CMD+SHIFT+P

Tip 5: write tests... in Jupyter notebooks
https://pypi.python.org/pypi/pytest-ipynb
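Tests can live in notebook cells even without a plugin. This sketch does not use pytest-ipynb and makes no claims about how that plugin collects tests. It uses plain `test_*` functions with `assert`, called right away in the same cell. `revenue_per_payer` is a hypothetical function under test.

```python
# Cell: the function under test (hypothetical).
def revenue_per_payer(purchases):
    """purchases: list of (user_id, amount) pairs."""
    payers = {user_id for user_id, _ in purchases}
    return sum(amount for _, amount in purchases) / len(payers) if payers else 0.0


# Cell: tests, called immediately, so Restart & Run All fails loudly if the logic breaks.
def test_revenue_per_payer_counts_each_payer_once():
    assert revenue_per_payer([("a", 5), ("a", 5), ("b", 2)]) == 6


def test_revenue_per_payer_handles_no_purchases():
    assert revenue_per_payer([]) == 0.0


test_revenue_per_payer_counts_each_payer_once()
test_revenue_per_payer_handles_no_purchases()
print("all tests passed")
```

Keeping the `test_` prefix means the same functions can later be moved into a module and collected by pytest unchanged.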

Tip 6: ...to the cloud (deploy a shared Jupyter server)

Code Smells .. in ipynb
- Cells can’t be executed in order (Run All or Restart & Run All fails)
- Prototype (idea-checking) code is mixed with “analysis” code
- Debugging cells
- Copy-pasted cells
- Duplicate code (in general)
- Multiple notebooks that re-implement the same function

Tip 7: Run notebook from another notebook!
analysis.ipynb
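In IPython, the `%run` magic can execute another `.ipynb` file directly, e.g. `%run analysis.ipynb`. Conceptually, that means executing the other notebook's code cells in order. The toy sketch below shows the idea with only the standard library: it reads the notebook's JSON and `exec`s its code cells into a namespace. It is an illustration, not a replacement for `%run`. In particular it does not handle IPython magics or shell commands, and `run_notebook` is a hypothetical helper name.

```python
import json
import os
import tempfile


def run_notebook(path, namespace=None):
    """Execute the code cells of a .ipynb file in order and return the resulting namespace.

    Toy sketch: plain Python cells only; IPython magics (%...) and shell commands (!...) fail.
    """
    with open(path, encoding="utf-8") as notebook_file:
        notebook = json.load(notebook_file)
    namespace = {} if namespace is None else namespace
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # skip markdown and raw cells
        source = cell["source"]
        # The .ipynb format stores a cell's source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        exec(compile(source, path, "exec"), namespace)
    return namespace


# Usage: write a tiny two-cell notebook to a temporary file, then run it.
tiny_notebook = {
    "nbformat": 4,
    "nbformat_minor": 1,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Analysis"},
        {"cell_type": "code", "metadata": {}, "execution_count": None, "outputs": [],
         "source": ["revenue = [4, 2]\n", "total = sum(revenue)\n"]},
    ],
}
notebook_path = os.path.join(tempfile.mkdtemp(), "analysis.ipynb")
with open(notebook_path, "w", encoding="utf-8") as notebook_file:
    json.dump(tiny_notebook, notebook_file)

results = run_notebook(notebook_path)
```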

Make data products from notebooks!

Summary: How to organise a Jupyter project
1. Each notebook should have one Hypothesis-Data-Interpretation loop
2. Make a multi-project utils library
3. A good Jupyter notebook reads like well-written prose
4. Each cell should have one and only one output
5. Write tests in notebooks
6. Deploy a shared Jupyter server
7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.
