Clean code in Jupyter notebooks

Applying clean code practices to data science projects in Jupyter notebooks

Published in: Software

Slide 1: Clean Code in Jupyter Notebooks, using Python (@KNerush @Volodymyrk), 5th of July, 2016.

Slide 2: Volodymyr (Vlad) Kazantsev: Head of Data @ Product Madness; Product Manager; MBA @ LBS; graphics programming; writes code for money since 2002; math degree. Kateryna (Katya) Nerush: Mobile Dev @ Octopus Labs; Dev Lead in finance; Data Engineer; Web Developer; writes code for money since 2003; CS degree.

Slide 3: Why do we end up with messy IPython notebooks? (Diagram: Coding, Stats, Business.)

Slide 4: Who are Data Scientists, really? (Diagram: Coding, Stats, Business.) “In a nutshell, coding is telling a computer to do something using a language it understands.” (Data Science with Python)

Slide 5: It is not going to production anyway!

Slide 6: “Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” (Martin Fowler, 1999) “WTF! How am I supposed to validate this??” “Sorry, but how can I calculate 7-day retention?”

Slide 7: From Prototype to ...: the Data Science Spiral (Ideas & Questions, Data Analysis, Insights, Impact).

Slide 8: You do it for your own good... “Re-run all A/B test analyses for the last months, by tomorrow.” (Spiral: Ideas & Questions, Data Analysis, Insights, Impact.)

Slide 9: Part 2: What can Data Scientists learn from Software Engineers?

Slide 10: Robert C. Martin, a.k.a. “Uncle Bob” (https://cleancoders.com/)

Slide 11: “Clean Code”?
- “Pleasingly graceful and stylish in appearance or manner” (Bjarne Stroustrup, inventor of C++)
- “Clean code reads like well-written prose” (Grady Booch, creator of UML)
- “... each routine turns out to be pretty much what you expected” (Ward Cunningham, inventor of Wiki and XP)

Slide 12: One does not simply start writing clean code...
- “First make it work, then make it right, then make it fast and small.” (Kent Beck, co-inventor of XP and TDD)
- Run all the tests; contain no duplicate code; express all ideas...; minimize classes and methods. (Ron Jeffries, author of Extreme Programming Installed)
- “Leave the campground cleaner than you found it.” (The Boy Scouts of America; applied to programming by Uncle Bob)

Slide 13: “I'm not a great programmer; I'm just a good programmer with great habits.” (Kent Beck)

Slide 14: “There are only two hard problems in Computer Science: cache invalidation and naming things.” (Phil Karlton)
- Use long_descriptive_names. Avoid: x, i, stuff, do_blah().
- Pronounceable and searchable: revenue_per_payer vs. arpdpu.
- Avoid encodings, abbreviations, prefixes, suffixes... if possible: bonus_points_on_iphone vs. cns_crm_dip.
- Add meaningful context: daily_revenue_per_payer.
- Don't be lazy. Spend time naming and renaming things.

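A minimal sketch of the naming advice, in Python. The payment tuples, their (date, player_id, amount) layout and the sample values are hypothetical; both functions compute the same numbers, and only the names differ.

```python
from collections import defaultdict


# Hard to read: an abbreviation as the name, single-letter variables, magic tuple indexes.
def arpdpu(x):
    d = defaultdict(float)
    p = defaultdict(set)
    for i in x:
        d[i[0]] += i[2]
        p[i[0]].add(i[1])
    return {k: d[k] / len(p[k]) for k in d}


# Easier to read: pronounceable, searchable names with meaningful context.
def daily_revenue_per_payer(payments):
    """Return {date: total revenue / number of distinct payers} for each date.

    `payments` is an iterable of (date, player_id, amount) tuples (hypothetical format).
    """
    revenue_by_date = defaultdict(float)
    payers_by_date = defaultdict(set)
    for date, player_id, amount in payments:
        revenue_by_date[date] += amount
        payers_by_date[date].add(player_id)
    return {
        date: revenue_by_date[date] / len(payers_by_date[date])
        for date in revenue_by_date
    }


payments = [
    ("2016-07-04", "alice", 4.99),
    ("2016-07-04", "bob", 0.99),
    ("2016-07-04", "alice", 1.99),
    ("2016-07-05", "carol", 9.99),
]
print(daily_revenue_per_payer(payments))
```

Anyone searching the code base for “revenue” finds the second version; nobody would think to search for `arpdpu`.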
Slide 15: “... each routine turns out to be pretty much what you expected” (Ward Cunningham)
- Small
- Do one thing
- One level of abstraction
- Have only a few arguments (one is best); this matters less in Python, with named arguments.

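A small sketch of “small functions, few arguments, named where it helps”. Each function does one thing, and the keyword-only `decimals` parameter (the bare `*` in the signature) forces call sites to say what the number means. The function names and the retention example are illustrative assumptions, not from the slides.

```python
def retention_rate(returning_users, cohort_size):
    """Share of a cohort that came back: one small job at one level of abstraction."""
    if cohort_size == 0:
        return 0.0
    return returning_users / cohort_size


def format_percentage(fraction, *, decimals=1):
    """Format a fraction such as 0.4 as '40.0%'; `decimals` is keyword-only."""
    return f"{fraction * 100:.{decimals}f}%"


# Named arguments keep the call site readable without looking up the signature.
rate = retention_rate(returning_users=40, cohort_size=100)
print(format_percentage(rate, decimals=1))  # → 40.0%
```

Calling `format_percentage(rate, 1)` raises a `TypeError`, so a bare positional number can never sneak into a call site.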
Slide 16: Use good names. Avoid:
- obvious comments;
- dead, commented-out code;
- TODOs, licenses, history, markup for documentation and other nonsense.
But there are exceptions... “When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous.” (Kent Beck)

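One way to apply “refactor before you comment”: move the condition that the comment explains into well-named functions. The user fields (`days_since_payment`, `is_tester`) and the 30-day rule are hypothetical.

```python
# Before: a comment is needed to explain what the condition means.
def can_claim_bonus_v1(user):
    # paid in the last 30 days and not an internal tester
    return user["days_since_payment"] <= 30 and not user["is_tester"]


# After: the names say it, so the comment becomes superfluous.
RECENT_PAYMENT_WINDOW_DAYS = 30


def paid_recently(user):
    return user["days_since_payment"] <= RECENT_PAYMENT_WINDOW_DAYS


def is_real_player(user):
    return not user["is_tester"]


def can_claim_bonus(user):
    return paid_recently(user) and is_real_player(user)


print(can_claim_bonus({"days_since_payment": 3, "is_tester": False}))  # → True
```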
Slide 17:
// When I wrote this, only God and I understood what I was doing
// Now, God only knows

Slide 18: // sometimes I believe compiler ignores all my comments

Slide 19:

    /**
     * Always returns true.
     */
    public boolean isAvailable() {
        return false;
    }

Slide 20: “Long functions is where classes are trying to hide” (Robert C. Martin)
- Small
- Do one thing
- SOLID, design patterns, etc.

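A sketch of a class that was “hiding” in a long function. Instead of one long A/B-test report function that passes user and conversion counts around, the related data and the small calculations on it are pulled into two dataclasses. The names and the uplift formula (variant rate / control rate - 1) are illustrative assumptions; a real analysis would also need a significance test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbTestGroup:
    """One arm of an A/B test: the data and the logic about it live together."""
    name: str
    users: int
    conversions: int

    @property
    def conversion_rate(self):
        return self.conversions / self.users if self.users else 0.0


@dataclass(frozen=True)
class AbTestResult:
    """Compares two groups; each method stays small and does one thing."""
    control: AbTestGroup
    variant: AbTestGroup

    @property
    def relative_uplift(self):
        baseline = self.control.conversion_rate
        if baseline == 0:
            return 0.0  # assumption: report no uplift when the control never converts
        return self.variant.conversion_rate / baseline - 1


result = AbTestResult(
    control=AbTestGroup("control", users=1000, conversions=50),
    variant=AbTestGroup("variant", users=1000, conversions=60),
)
print(f"{result.variant.name}: {result.relative_uplift:+.1%} vs. {result.control.name}")
```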
Slide 21: Code conventions
- A team should produce code in the same style, as if it were written by one person.
- Team conventions over language conventions, over personal ones.
- Automate style formatting.

Slide 22: Part 3: How to write Clean Code in Python? (i.e. this is not Java)

Slide 23: Example: PEP 8 -- Style Guide for Python Code (https://www.python.org/dev/peps/pep-0008/)
● Indentation
● Tabs or spaces?
● Maximum line length
● Should a line break before or after a binary operator?
● Blank lines
● Imports
● Comments
● Naming conventions

Good:

    foo = long_function_name(var_one, var_two,
                             var_three, var_four)

Bad:

    foo = long_function_name(var_one, var_two,
        var_three, var_four)

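On the binary-operator question: PEP 8 permits breaking before or after a binary operator as long as the convention is locally consistent, and suggests breaking before the operator for new code. The variable names below follow the example in PEP 8 itself; the numeric values are made up so the snippet runs.

```python
# Made-up values, only so the expression below can be evaluated.
gross_wages = 50_000
taxable_interest = 1_200
dividends = 3_000
qualified_dividends = 1_000
ira_deduction = 5_500
student_loan_interest = 2_500

# Break *before* binary operators: each operator sits next to the operand it applies to,
# so the reader can scan the left edge to see what is added and what is subtracted.
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)

print(income)  # → 45200
```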
Slide 24: Google Python Style Guide (https://google.github.io/styleguide/pyguide.html)

Slide 25: My favourite! This is not Java or C++:
- Functions are first-class objects
- Duck typing as an interface
- No setters/getters
- itertools, zip, enumerate, etc.

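A compact sketch of these “not Java” habits. The `Player`/`Team` classes and the helper functions are illustrative assumptions: plain attributes instead of getters and setters (with a `property` only where validation is needed), duck typing instead of declared interfaces, functions passed around like any other value, and `zip`, `enumerate` and `itertools.accumulate` instead of index bookkeeping.

```python
from itertools import accumulate


class Player:
    """Plain attribute for `name`; a property only where validation is needed."""

    def __init__(self, name, level):
        self.name = name
        self.level = level  # goes through the property setter below

    @property
    def level(self):
        return self._level

    @level.setter
    def level(self, value):
        if value < 1:
            raise ValueError("level must be >= 1")
        self._level = value


class Team:
    def __init__(self, name):
        self.name = name


def describe(thing):
    # Duck typing: any object with a `name` attribute works, no interface declaration.
    return f"<{thing.name}>"


def shout(text):
    return text.upper() + "!"


def apply_all(functions, value):
    # Functions are first-class objects: stored in a list, passed as arguments.
    return [function(value) for function in functions]


players = [Player("alice", 3), Player("bob", 5)]
scores = [120, 80]

# zip pairs the lists, enumerate adds a rank, itertools.accumulate builds running totals.
for rank, (player, score) in enumerate(zip(players, scores), start=1):
    print(rank, describe(player), score)
running_totals = list(accumulate(scores))
print(running_totals)
print(describe(Team("acme")), apply_all([str.lower, shout, len], "Hello"))
```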
Slide 26: Part 4: How to write Clean Python Code in a Jupyter Notebook?

Slide 27: Typical structure of an ipynb:
1. Imports
2. Get data
3. Transform data
4. Modelling
5. Visualisation
6. Making sense of the data

Slide 28: How big should a notebook file be?

Slide 29: How big should a notebook file be? Hypothesis - Data - Interpretation

Slide 30: Keep your notebooks small! (4-10 cells each)

Slide 31: Tip 1: break a fat notebook into many small ones. Example:
- 1_data_preparation.ipynb: df.to_pickle('clean_data_1.pkl')
- 2_linear_model.ipynb: df = pd.read_pickle('clean_data_1.pkl')
- 3_ensemble.ipynb: df = pd.read_pickle('clean_data_1.pkl')

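A runnable sketch of the hand-off between small notebooks: the preparation notebook saves a cleaned DataFrame with `DataFrame.to_pickle`, and each modelling notebook starts from `pd.read_pickle`. Here the notebooks are simulated as sections of one script; the temporary directory and the sample data are assumptions. Pickle files should only be loaded from sources you trust.

```python
import os
import tempfile

import pandas as pd

# Temporary folder standing in for the project directory the notebooks share (assumption).
workdir = tempfile.mkdtemp()
clean_data_path = os.path.join(workdir, "clean_data_1.pkl")

# --- 1_data_preparation: load raw data, clean it, save the result ---
raw = pd.DataFrame({"user_id": [1, 2, 2, 3], "revenue": [4.99, None, 0.99, 9.99]})
clean = raw.dropna().reset_index(drop=True)
clean.to_pickle(clean_data_path)

# --- 2_linear_model, 3_ensemble, ...: each starts from the same cleaned data ---
df = pd.read_pickle(clean_data_path)
print(df)
```

Each modelling notebook then stays short, because none of them repeats the cleaning steps.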
Slide 32: Tip 2: shared library
- Data access
- Common plotting functionality
- Report generation
- Misc. utils

Example layout:

    acme_data_utils/
        data_access.py
        plotting.py
        setup.py
        tests/

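A sketch of what one function in such a shared library might look like: 7-day retention defined once, with its test next to it, instead of every notebook re-implementing it. The module paths, the `n_day_retention` name, the input format and the definition used (active exactly n days after install) are all assumptions; teams define retention differently.

```python
# acme_data_utils/metrics.py (hypothetical module inside the shared library)
from datetime import date, timedelta


def n_day_retention(install_dates, active_dates, n=7):
    """Fraction of installed users who were active exactly n days after installing.

    install_dates: {user_id: date of install}
    active_dates:  {user_id: set of dates on which the user was active}
    """
    if not install_dates:
        return 0.0
    retained = sum(
        1
        for user_id, installed_on in install_dates.items()
        if installed_on + timedelta(days=n) in active_dates.get(user_id, set())
    )
    return retained / len(install_dates)


# acme_data_utils/tests/test_metrics.py (hypothetical): one definition, tested once.
def test_n_day_retention():
    installs = {"a": date(2016, 7, 1), "b": date(2016, 7, 1)}
    activity = {"a": {date(2016, 7, 8)}, "b": {date(2016, 7, 3)}}
    assert n_day_retention(installs, activity, n=7) == 0.5
```

A notebook would then import it (for example `from acme_data_utils.metrics import n_day_retention`, with the package installed via its `setup.py`) rather than copy-pasting the logic.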
Slide 33: Tip 3: Don't just be Pythonic. Be IPythonic. Don't hide the “secret sauce” inside an imported module. BAD: [example not captured] Good: [example not captured]

Slide 34: “Clean code reads like well-written prose” (Grady Booch)

Slide 35: A good Jupyter notebook reads like well-written prose.

Slide 36: How big should one cell be?

Slide 37: Tip 4: each cell should have one logical output.
- One “idea - execution - output” triplet per cell.
- Import cell: the expected output is no import errors.
- CMD+SHIFT+P

Slide 38: Tip 5: write tests... in Jupyter notebooks (https://pypi.python.org/pypi/pytest-ipynb)

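A minimal sketch of tests living inside a notebook: plain `assert`-based test functions in a cell right after the code they check, so Restart & Run All stops loudly if the logic breaks. The `clean_revenue` rule (drop missing values and negative values such as refunds) is a hypothetical example. The pytest-ipynb plugin is meant to let pytest's runner discover and execute tests stored in notebooks; its exact conventions are not reproduced here.

```python
# In [1]: the function under test lives in the notebook itself
def clean_revenue(values):
    """Drop missing and negative revenue values (hypothetical cleaning rule)."""
    return [v for v in values if v is not None and v >= 0]


# In [2]: a test cell that runs on every Restart & Run All
def test_clean_revenue_drops_missing_and_negative():
    assert clean_revenue([4.99, None, -0.99, 0.0]) == [4.99, 0.0]


def test_clean_revenue_keeps_valid_values():
    assert clean_revenue([1.0, 2.5]) == [1.0, 2.5]


test_clean_revenue_drops_missing_and_negative()
test_clean_revenue_keeps_valid_values()
print("all notebook tests passed")
```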
Slide 39: Tip 6: ...to the cloud (deploy a shared Jupyter server).

Slide 40: Code smells... in ipynb
- Cells can't be executed in order (with Run All and Restart & Run All)
- Prototype (idea-checking) code is mixed with “analysis” code
- Debugging cells
- Copy-pasted cells
- Duplicate code (in general)
- Multiple notebooks that re-implement the same function

Slide 41: Tip 7: run a notebook from another notebook! (analysis.ipynb)

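Inside Jupyter, IPython's `%run analysis.ipynb` magic executes another notebook's code cells and makes its variables available in the current notebook. The sketch below is a simplified plain-Python stand-in, written only for illustration and not how IPython implements it: it reads the `.ipynb` JSON and executes the code cells in order into a namespace. It starts no kernel, saves no outputs and cannot handle IPython-only syntax such as magics or `!` shell commands. The `run_notebook` name and the tiny generated notebook are assumptions.

```python
import json
import os
import tempfile


def run_notebook(path, namespace=None):
    """Execute the code cells of an .ipynb file, in order, into `namespace`."""
    namespace = {} if namespace is None else namespace
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # skip markdown and raw cells
        source = cell["source"]
        # The notebook format stores source either as one string or as a list of lines.
        code = source if isinstance(source, str) else "".join(source)
        exec(compile(code, path, "exec"), namespace)
    return namespace


# Write a tiny analysis.ipynb so the example is self-contained.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None, "outputs": [],
         "source": ["revenue = [5, 7]\n", "total = sum(revenue)"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}
path = os.path.join(tempfile.mkdtemp(), "analysis.ipynb")
with open(path, "w", encoding="utf-8") as f:
    json.dump(notebook, f)

results = run_notebook(path)
print(results["total"])  # → 12
```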
Slide 42: Make data products from notebooks!

Slide 43: Summary: how to organise a Jupyter project
1. A notebook should have one Hypothesis-Data-Interpretation loop.
2. Make a multi-project utils library.
3. A good Jupyter notebook reads like well-written prose.
4. Each cell should have one and only one output.
5. Write tests in notebooks.
6. Deploy a shared Jupyter server.
7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.
