@KNerush @Volodymyrk
Clean Code
In Jupyter notebooks, using Python
5th of July, 2016
Volodymyr (Vlad) Kazantsev
Head of Data @ Product Madness
Product Manager
MBA @LBS
Graphics programming
Writes code for money since 2002
Math degree
Kateryna (Katya) Nerush
Mobile Dev @ Octopus Labs
Dev Lead in Finance
Data Engineer
Web Developer
Writes code for money since 2003
CS degree
Why do we end up with messy IPython notebooks?
(Diagram: Coding, Stats, Business)
Who are Data Scientists, really?
(Diagram: Coding, Stats, Business)
“In a nutshell, coding is telling a computer to do something using a language it understands.”
Data Science with Python
It is not going to production anyway!
“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” - Martin Fowler, 1999
“WTF! How am I supposed to validate this??”
“Sorry, but how can I calculate 7-day retention?”
From Prototype to ... The Data Science Spiral
(Spiral diagram: Ideas & Questions, Data Analysis, Insights, Impact)
You do it for your own good...
“Re-run all the A/B test analyses for the last few months, by tomorrow!”
(Spiral diagram: Ideas & Questions, Data Analysis, Insights, Impact)
Part 2
What can Data Scientists learn from Software Engineers?
Robert C. Martin, a.k.a. “Uncle Bob”
https://cleancoders.com/
“Clean Code”?
“Pleasingly graceful and stylish in appearance or manner” - Bjarne Stroustrup, inventor of C++
“Clean code reads like well-written prose” - Grady Booch, creator of UML
“... each routine turns out to be pretty much what you expected” - Ward Cunningham, inventor of Wiki and XP
One does not simply start writing clean code...
“First make it work, then make it right, then make it fast and small.” - Kent Beck, co-inventor of XP and TDD
Clean code, according to Ron Jeffries, author of Extreme Programming Installed:
- Runs all the tests
- Contains no duplicate code
- Expresses all ideas...
- Minimizes classes and methods
“Leave the campground cleaner than you found it” - The Boy Scouts of America, applied to programming by Uncle Bob
“I'm not a great programmer; I'm just a good programmer with great habits.” - Kent Beck
“There are only two hard things in Computer Science: cache invalidation and naming things.” - Phil Karlton
long_descriptive_names
Avoid: x, i, stuff, do_blah()
Pronounceable and searchable
revenue_per_payer vs. arpdpu
Avoid encodings, abbreviations, prefixes, suffixes... if possible
bonus_points_on_iphone vs. cns_crm_dip
Add meaningful context
daily_revenue_per_payer
Don’t be lazy.
Spend time naming and renaming things.
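To make the naming rules concrete, here is a small sketch over a made-up list of one day's payments (the data and both function names are hypothetical). Both functions compute the same number, the average revenue per paying user for one day (what `arpdpu` abbreviates, per the speaker notes); only the names differ.

```python
# Hypothetical data: one (user_id, amount) pair per purchase made today.
payments_today = [("u1", 4.99), ("u2", 0.99), ("u1", 1.99), ("u3", 9.99)]


# Avoid: single letters and a vague verb say nothing about the purpose.
def calc(x):
    s = sum(a for _, a in x)
    u = len({i for i, _ in x})
    return s / u


# Prefer: pronounceable, searchable names with meaningful context.
def daily_revenue_per_payer(daily_payments):
    """Average revenue per paying user over one day of payments."""
    total_revenue = sum(amount for _, amount in daily_payments)
    paying_users = {user_id for user_id, _ in daily_payments}
    return total_revenue / len(paying_users)


print(round(daily_revenue_per_payer(payments_today), 2))  # 5.99
```

A reader can search the notebook for `daily_revenue_per_payer` and knows what it returns without opening it; `calc` forces them to read every line.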
“... each routine turns out to be pretty much what you expected” - Ward Cunningham
Small
Do one thing
One level of abstraction
Have only a few arguments (one is best)
Less important in Python, thanks to keyword arguments.
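A sketch of these function rules, assuming a hypothetical "user_id,amount" line format and hypothetical helper names: each helper does one thing at one level of detail, the top-level function reads as a short sequence of steps, and the optional behaviour is a keyword-only argument, so call sites stay readable even with more than one parameter.

```python
from statistics import mean


# Low-level details: each helper does exactly one thing.
def parse_amounts(raw_lines):
    """Turn lines like 'u1,4.99' into a list of floats."""
    return [float(line.split(",")[1]) for line in raw_lines]


def drop_refunds(amounts):
    """Keep only positive amounts (refunds are stored as negatives)."""
    return [amount for amount in amounts if amount > 0]


# High-level policy: one level of abstraction, reads like a recipe.
# The bare `*` makes include_refunds keyword-only, so calls document themselves.
def average_payment(raw_lines, *, include_refunds=False):
    amounts = parse_amounts(raw_lines)
    if not include_refunds:
        amounts = drop_refunds(amounts)
    return mean(amounts)


raw_lines = ["u1,4.99", "u2,-4.99", "u3,9.99"]
print(average_payment(raw_lines))
print(average_payment(raw_lines, include_refunds=True))
```

Writing `average_payment(raw_lines, True)` raises a `TypeError` here, which is the point: the flag has to be named at the call site.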
Use good names
Avoid obvious comments
Avoid dead, commented-out code
Avoid TODOs, licenses, history, markup for documentation and other nonsense
But there are exceptions...
“When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous.” - Kent Beck
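One way to follow the refactor-before-commenting advice, sketched with a hypothetical "days since last session" mapping: the comment in the first version exists only because `f` and `d` say nothing, and the second version moves that information into a named constant and a descriptive function name.

```python
# Before: the comment carries the meaning the names fail to express.
def f(d):
    # keep users active in the last 7 days
    return [u for u, days in d.items() if days <= 7]


# After: the names make the comment superfluous.
ACTIVE_WINDOW_DAYS = 7


def recently_active_users(days_since_last_session):
    return [user for user, days in days_since_last_session.items()
            if days <= ACTIVE_WINDOW_DAYS]


print(recently_active_users({"u1": 2, "u2": 10, "u3": 7}))
```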
// When I wrote this, only God and I understood what I was doing
// Now, God only knows
// sometimes I believe compiler ignores all my comments
/**
 * Always returns true.
 */
public boolean isAvailable() {
    return false;
}
“Long functions are where classes are trying to hide” - Robert C. Martin
Small
Do one thing
SOLID, Design Patterns, etc.
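A hedged illustration of the quote, using a hypothetical retention calculation (the field names and the "session exactly N days after install" definition are assumptions of this sketch). In a long function, the install and session lookups would be local variables shared by several sections; pulling that shared state and those sections out yields a small class whose methods each do one thing.

```python
from dataclasses import dataclass, field


@dataclass
class RetentionReport:
    # Shared state that a long function would otherwise juggle as locals.
    installs: dict                                # user_id -> install day number
    sessions: dict = field(default_factory=dict)  # user_id -> set of session days

    def is_retained(self, user_id, day_offset):
        """True if the user had a session exactly `day_offset` days after install."""
        target_day = self.installs[user_id] + day_offset
        return target_day in self.sessions.get(user_id, set())

    def retention_rate(self, day_offset):
        """Share of installed users retained on the given day offset."""
        retained = sum(self.is_retained(user, day_offset) for user in self.installs)
        return retained / len(self.installs)


report = RetentionReport(
    installs={"a": 0, "b": 1, "c": 2},
    sessions={"a": {7}, "b": {3, 8}},
)
print(report.retention_rate(7))
```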
Code conventions
A team should produce code in one style, as if it were written by one person
Team conventions over language conventions, over personal ones
Automate style formatting
Part 3
How to write Clean Code in Python?
(i.e. this is not Java)
PEP 8 -- Style Guide for Python Code
● Indentation
● Tabs or Spaces?
● Maximum Line Length
● Should a line break before or after a binary operator?
● Blank Lines
● Imports
● Comments
● Naming Conventions
Example:
Good:
foo = long_function_name(var_one, var_two,
                         var_three, var_four)
Bad:
foo = long_function_name(var_one, var_two,
    var_three, var_four)
https://www.python.org/dev/peps/pep-0008/
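The binary-operator question in the list has a documented answer: PEP 8 permits either style if it is locally consistent, and suggests breaking before binary operators in new code, so each operator sits next to the operand it applies to. A small sketch with made-up revenue figures:

```python
# Hypothetical revenue figures for one month.
gross_revenue = 1200.0
store_fees = 360.0
refunds = 45.5
taxes = 120.0

# Break *before* each operator: the operators line up down the left edge.
net_revenue = (gross_revenue
               - store_fees
               - refunds
               - taxes)
print(net_revenue)
```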
Google Python Style Guide
https://google.github.io/styleguide/pyguide.html
My favourite!
This is not Java or C++
Functions are first-class objects
Duck typing as an interface
No setters/getters
itertools, zip, enumerate
etc.
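A compact sketch of the "this is not Java" points, with hypothetical names and data throughout: a function passed as an argument, an object accepted purely because it has `.read()`, plain attributes plus `@property` instead of getters and setters, and `zip`, `enumerate` and `itertools.groupby` instead of index bookkeeping.

```python
import io
from itertools import groupby


# Functions are first-class objects: pass them in like any other value.
def apply_to_all(transform, values):
    return [transform(value) for value in values]


# Duck typing as an interface: anything with .read() will do.
def count_lines(readable):
    return len(readable.read().splitlines())


# No setters/getters: plain attributes, and @property for derived values.
class Player:
    def __init__(self, name, level):
        self.name = name
        self.level = level

    @property
    def is_veteran(self):
        return self.level >= 50


names = ["ann", "bob", "cid"]
levels = [12, 55, 55]

# zip pairs the lists, enumerate supplies a counter: no index bookkeeping.
players = [Player(name, level) for name, level in zip(names, levels)]
for rank, player in enumerate(players, start=1):
    print(rank, player.name, player.is_veteran)

# groupby only groups consecutive items, so sort by the same key first.
by_level = {
    level: [p.name for p in group]
    for level, group in groupby(sorted(players, key=lambda p: p.level),
                                key=lambda p: p.level)
}
upper_names = apply_to_all(str.upper, names)
line_count = count_lines(io.StringIO("first\nsecond\n"))  # StringIO quacks like a file
```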
Part 4
How to write Clean Python Code in Jupyter Notebooks?
Typical structure of the ipynb
1. Imports
2. Get Data
3. Transform Data
4. Modelling
5. Visualisation
6. Making sense of the data
How big should a notebook file be?
Hypothesis - Data - Interpretation
Keep your notebooks small!
(4-10 cells each)
Tip 1: break a fat notebook into many small ones
Example:
1_data_preparation.ipynb
    df.to_pickle('clean_data_1.pkl')
2_linear_model.ipynb
    df = pd.read_pickle('clean_data_1.pkl')
3_ensemble.ipynb
    df = pd.read_pickle('clean_data_1.pkl')
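A runnable version of the hand-off above, as a sketch: the tiny DataFrame and its cleaning steps are hypothetical, and a temporary directory stands in for whatever shared folder a real project would agree on. Only `DataFrame.to_pickle` and `pd.read_pickle` come from the example itself.

```python
import os
import tempfile

import pandas as pd

# --- Last cell of 1_data_preparation.ipynb ---
raw = pd.DataFrame({"user_id": ["u1", "u2", "u2", None],
                    "revenue": [4.99, 0.99, 0.99, 1.99]})
clean = raw.dropna().drop_duplicates()  # drops the missing user and the repeated row

# A temporary directory keeps the sketch self-contained; a real project
# would write clean_data_1.pkl to an agreed, shared location.
data_dir = tempfile.mkdtemp()
pickle_path = os.path.join(data_dir, "clean_data_1.pkl")
clean.to_pickle(pickle_path)

# --- First cell of 2_linear_model.ipynb (and likewise 3_ensemble.ipynb) ---
df = pd.read_pickle(pickle_path)
print(df)
```

Pickle is convenient because it preserves dtypes and the index exactly, but pickle files should only be loaded from trusted sources and can break across pandas versions.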
Tip 2: shared library
Data access
Common plotting functionality
Report generation
Misc. utils
acme_data_utils/
    data_access.py
    plotting.py
    setup.py
    tests/
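What one module of such a library might contain, as a sketch: the `load_payments` function and the CSV layout are hypothetical, and the module and its test are shown in one block so it runs on its own. Once the package is installed (for example with `pip install -e .` from the project root), notebooks would import the function instead of re-implementing it.

```python
# acme_data_utils/data_access.py (hypothetical module in the shared library)
import csv
import io


def load_payments(csv_file):
    """Read 'user_id,amount' rows from any file-like object as (str, float) pairs."""
    return [(row["user_id"], float(row["amount"])) for row in csv.DictReader(csv_file)]


# tests/test_data_access.py (hypothetical): pytest collects test_* functions.
def test_load_payments_parses_amounts():
    sample = io.StringIO("user_id,amount\nu1,4.99\nu2,0.99\n")
    assert load_payments(sample) == [("u1", 4.99), ("u2", 0.99)]


test_load_payments_parses_amounts()
```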
Tip 3: Don’t just be Pythonic. Be IPythonic
Don’t hide the “secret sauce” inside an imported module
BAD:
Good:
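Without reproducing the slide's own BAD/Good examples, here is one plausible reading of the tip, with hypothetical names and data: import the plumbing (data loading) from the shared library, but keep the analytical decision, here an "engaged session" threshold, in a notebook cell where a reviewer can see and change it.

```python
# Plumbing that is fine to hide in a shared module; stubbed here so the
# sketch runs (hypothetical stand-in for an acme_data_utils loader).
def load_session_minutes():
    return [3, 5, 8, 12, 45, 60, 2]


# --- Notebook cell: the "secret sauce" stays visible and easy to tweak ---
session_minutes = load_session_minutes()
ENGAGED_SESSION_MINUTES = 10  # the analytical choice a reviewer should see
engaged_sessions = [m for m in session_minutes if m >= ENGAGED_SESSION_MINUTES]
engaged_share = len(engaged_sessions) / len(session_minutes)
engaged_share  # a notebook displays the last expression of a cell
```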
“Clean code reads like well-written prose” - Grady Booch
A good Jupyter notebook reads like well-written prose
How big should one cell be?
Tip 4: each cell should have one logical output
One “idea - execution - output” triplet per cell
Import cell: the expected output is no import errors
CMD+SHIFT+P
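A sketch of the one-output-per-cell rule: the `# --- Cell N ---` comments mark where notebook cell boundaries would fall (a convention of this sketch only), and the purchase data is hypothetical. Each cell ends in a single expression, which is the one thing the notebook displays for it.

```python
# --- Cell 1: imports. Expected output: no import errors ---
from collections import Counter

# --- Cell 2: idea: "which platforms do payers use?" Output: the raw data ---
purchases = [("u1", "ios"), ("u2", "android"), ("u3", "ios"), ("u1", "ios")]
purchases

# --- Cell 3: execution: count distinct payers per platform. Output: one result ---
payers_per_platform = Counter(platform for _, platform in set(purchases))
payers_per_platform
```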
Tip 5: write tests... in Jupyter notebooks
https://pypi.python.org/pypi/pytest-ipynb
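The simplest form of this tip needs no plugin at all: a plain assert-based test function in the cell after the code it checks, so every Run All re-checks the logic. The `seven_day_retention` metric below is hypothetical; the pytest-ipynb package linked above is the slide's pointer for collecting notebook tests with pytest, and its conventions are not shown here.

```python
# --- Cell: the function under analysis ---
def seven_day_retention(install_day, session_days):
    """Hypothetical metric: True if the user had a session exactly 7 days after install."""
    return install_day + 7 in session_days


# --- Cell: tests right next to the code they check; re-run with Run All ---
def test_seven_day_retention():
    assert seven_day_retention(0, {1, 7})
    assert not seven_day_retention(0, {1, 8})
    assert not seven_day_retention(3, set())


test_seven_day_retention()
```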
Tip 6: ...to the cloud: deploy a shared Jupyter server
Code smells... in ipynb
- Cells can’t be executed in order (with Run All and Restart & Run All)
- Prototype (check ideas) code is mixed with “analysis” code
- Debugging cells
- Copy-paste cells
- Duplicate code (in general)
- Multiple notebooks that re-implement the same function
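The speaker notes give one remedy for the first smell: reload modules that are changing, so that Run All produces the same result as Restart & Run All. A sketch with a hypothetical `acme_metrics` module written to a temporary directory: after the file is edited, the running kernel still sees the old value until `importlib.reload` re-executes the module. IPython's autoreload extension (`%load_ext autoreload`, then `%autoreload 2`) automates this; plain `importlib` is used here so the sketch runs outside IPython.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid reusing a cached .pyc in this quick demo

# Hypothetical shared helper module, written to a temp dir for the demo.
lib_dir = tempfile.mkdtemp()
module_path = os.path.join(lib_dir, "acme_metrics.py")
with open(module_path, "w") as f:
    f.write("RETENTION_DAY = 1\n")
sys.path.insert(0, lib_dir)

import acme_metrics  # first import caches the module in sys.modules

# The library is edited while the notebook kernel keeps running...
with open(module_path, "w") as f:
    f.write("RETENTION_DAY = 7\n")

stale_value = acme_metrics.RETENTION_DAY        # still the old value
acme_metrics = importlib.reload(acme_metrics)   # re-executes the edited source
fresh_value = acme_metrics.RETENTION_DAY        # now matches the file on disk
```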
Tip 7: Run a notebook from another notebook!
analysis.ipynb
Make data products from notebooks!
Summary: How to organise a Jupyter project
1. A notebook should have one Hypothesis-Data-Interpretation loop
2. Make a multi-project utils library
3. A good Jupyter notebook reads like well-written prose
4. Each cell should have one and only one output
5. Write tests in notebooks
6. Deploy a shared Jupyter server
7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.


Editor's Notes

  1. Data Scientists come from various backgrounds. In my company, many came from the business “dark” side.
  2. http://www.slideshare.net/ISchwarz23/clean-code-49797249
  3. Part 2 of the talk is, of course, heavily inspired by the work of Robert C. Martin: his books, his website and his absolutely wonderful video podcast. Shameless plagiarism alert!
  4. So, what is clean code anyway? “Pleasingly graceful and stylish in appearance or manner”, and this guy invented C++?! Graceful and stylish... “Clean code reads like well-written prose” - Grady Booch, creator of UML. Interestingly, UML can’t be read like prose at all... But maybe there is something in there after all... “... each routine turns out to be pretty much what you expected” - Ward Cunningham, inventor of the Wiki. So clean code should not “surprise”. I can definitely relate to that. How often do you open someone else’s code and go “Wow, what is that... This is a very curious and inventive way... But... WHY??” So clean code should aim not to surprise. It should even be “boring” and “predictable”. And consistent... But more on that later.
  5. First make it work, then make it right, then make it fast and small. These are the design rules of Kent Beck, creator and proponent of Test-Driven Development. Another recipe for clean code comes from Ron Jeffries, a leading author on Agile, XP and good practices. Clean code is code that runs all the tests, contains no duplicate code, expresses all ideas... and minimizes classes and methods. Robert C. Martin found a very successful metaphor for writing clean code sustainably. It is called the Boy Scout rule of development: leave the campground cleaner than you found it.
  6. So let’s take a look at how to develop those great habits
  7. Naming things... "There are only two hard things in Computer Science: cache invalidation and naming things." - Phil Karlton, Principal Architect at Netscape. Rule 1: long_descriptive_names. Don’t be afraid to type a long name. All modern IDEs have autocomplete, and if yours doesn’t, get a better one that does! Even Python notebooks have autocomplete for variable and function names. x, i, stuff and do_blah() are real variable or function names that I have seen! Rule 2: a name should be easy to pronounce. arpdpu stands for... Vova, what does this stand for again?? Vova: “Average Revenue per Daily Paying User.” Rule 3: avoid encodings and abbreviations. The only exception may be where everyone in the organisation already knows that DAU stands for Daily Active Users. Avoid Hungarian notation and other nonsense, but do use the terms of your business domain. Rule 4: add relevant context. Of course, long names make it hard to keep lines within 79 characters. But I prefer longer lines (we all have UltraHD Retina monitors these days anyway) to shorter, obscure names. Rule 5: naming is hard. Very hard. So think hard about naming things. And don’t be afraid to refactor and rename if you find a better way to express the purpose of a variable, function or class. A good IDE should have a refactor-rename function. And even Python notebooks have “search and replace” these days...
  8. Functions... The rules of functions: they should be small; they should be smaller than that; they should do one thing (this also applies to classes), do it well, and do it only. One level of abstraction: don’t mix high-level policy and low-level details.
  9. But there are exceptions: complex algorithms, technical notes and warnings, conventions and rules.
  10. And this is where I will take over...
  11. So let’s take a look at how to develop those great habits
  12. So let’s take a look at how to develop those great habits
  13. Same environment, same color: reproducibility.
  14. Advice: reload() modules that are changing, so that Run All produces the same result as Restart & Run All. There should be one and only one narrative (story line) per notebook. Debugging cells are not the same as test cells; we are going to talk about tests and TDD later. Copy-paste cells: my favourite.