@KNerush @Volodymyrk
Clean Code
In Jupyter notebooks, using Python
5th of July, 2016
Volodymyr (Vlad) Kazantsev
Head of Data @ product madness
Product Manager
MBA @LBS
Graphics programming
Writes code for money since 2002
Math degree
Kateryna (Katya) Nerush
Mobile Dev @ Octopus Labs
Dev Lead in Finance
Data Engineer
Web Developer
Writes code for money since 2003
CS degree
Why do we end up with messy ipynb notebooks?
Coding
Stats
Business
Who are Data Scientists, really?
Coding
Stats
Business
“In a nutshell, coding is telling a computer to do something using a language it understands.”
Data Science with Python
It is not going to production anyway!
“Any fool can write code that a computer can understand. Good programmers write code that humans can understand” - Martin Fowler, 1999
“WTF! How am I supposed to validate this??”
“Sorry, but how can I calculate 7-day retention?”
From Prototype to ... The Data Science Spiral
Ideas & Questions
Data
Analysis
Insights
Impact
You do it for your own good..
“Re-run all A/B test analyses for the last months, by tomorrow”
Ideas & Questions
Data
Analysis
Insights
Impact
Part 2
What can Data Scientists learn from Software Engineers?
Robert C. Martin, a.k.a. “Uncle Bob”
https://cleancoders.com/
“Clean Code”?
“Pleasingly graceful and stylish in appearance or manner” - Bjarne Stroustrup, inventor of C++
“Clean code reads like well written prose” - Grady Booch, creator of UML
“.. each routine turns out to be pretty much what you expected” - Ward Cunningham, inventor of Wiki and XP
One does not simply start writing clean code..
“First make it work, then make it right, then make it fast and small” - Kent Beck, co-inventor of XP and TDD
- Run all the tests
- Contains no duplicate code
- Expresses all ideas...
- Minimize classes and methods
Ron Jeffries, author of Extreme Programming Installed
“Leave the campground cleaner than you found it” - The Boy Scouts of America, applied to programming by Uncle Bob
“I'm not a great programmer; I'm just a good programmer with great habits.” - Kent Beck
“There are only two hard problems in Computer Science: cache invalidation and naming things” - Phil Karlton
● long_descriptive_names
○ Avoid: x, i, stuff, do_blah()
● Pronounceable and Searchable
○ revenue_per_payer vs. arpdpu
● Avoid encodings, abbreviations, prefixes, suffixes.. if possible
○ bonus_points_on_iphone vs. cns_crm_dip
● Add meaningful context
○ daily_revenue_per_payer
● Don’t be lazy.
○ Spend time naming and renaming things.
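A small sketch of the naming advice above. The function, the field names, and the sample figures are hypothetical, chosen only to contrast a cryptic name such as arpdpu with a descriptive, searchable one.

```python
from collections import defaultdict


# Avoid: def arpdpu(x): ...  (unpronounceable name, meaningless argument)
def daily_revenue_per_payer(purchases):
    """Return {date: revenue / number of distinct paying users} for each date."""
    revenue_by_date = defaultdict(float)
    payers_by_date = defaultdict(set)
    for purchase in purchases:
        revenue_by_date[purchase["date"]] += purchase["amount"]
        payers_by_date[purchase["date"]].add(purchase["user_id"])
    return {
        date: revenue_by_date[date] / len(payers_by_date[date])
        for date in revenue_by_date
    }


# Hypothetical sample data: two payers on the first day, one on the second.
purchases = [
    {"date": "2016-07-01", "user_id": "a", "amount": 4.0},
    {"date": "2016-07-01", "user_id": "b", "amount": 6.0},
    {"date": "2016-07-02", "user_id": "a", "amount": 3.0},
]
revenue_per_payer_by_date = daily_revenue_per_payer(purchases)
```

The name carries the context (daily, per payer), so a reader of the notebook does not need to open the function to know what the number means.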
“each routine turns out to be pretty much what you
expected” - Ward Cunningham
● Small
● Do one thing
● One Level of Abstraction
● Have only a few arguments (one is best)
○ Less important in Python, with named arguments.
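The points above can be sketched with the 7-day retention question from earlier in the deck. The function names and the simplified definition of retention (active on the start day and active again N days later) are assumptions made for illustration, not the authors' definition.

```python
from datetime import date, timedelta


def users_active_on(events, day):
    """Return the set of user ids with at least one event on the given day."""
    return {user_id for user_id, event_day in events if event_day == day}


def retention_rate(events, *, start_day, days_later):
    """Share of users active on start_day who are active again days_later days after."""
    cohort = users_active_on(events, start_day)
    if not cohort:
        return 0.0
    returned = users_active_on(events, start_day + timedelta(days=days_later))
    return len(cohort & returned) / len(cohort)


# Hypothetical events: (user_id, day of activity).
first_day = date(2016, 7, 1)
events = [
    ("a", first_day),
    ("b", first_day),
    ("a", first_day + timedelta(days=7)),
]
seven_day_retention = retention_rate(events, start_day=first_day, days_later=7)
```

Each function does one thing at one level of abstraction, and the keyword-only arguments keep the call readable even with more than one parameter.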
● Use good names
● Avoid obvious comments.
● Avoid dead, commented-out code
● Avoid ToDo notes, licenses, history, markup for documentation and other nonsense
● But there are exceptions..
“When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous” - Kent Beck
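One way to read the quote above, as a before-and-after sketch. The user fields, the 30-day threshold, and the discount value are hypothetical.

```python
# Before: the comment has to explain what the condition means.
def discount_before(user):
    # paying user who has been inactive for more than 30 days
    if user["total_spent"] > 0 and user["days_since_last_login"] > 30:
        return 0.2
    return 0.0


# After: the explanation moves into names, so the comment is no longer needed.
INACTIVITY_THRESHOLD_DAYS = 30


def is_payer(user):
    return user["total_spent"] > 0


def is_lapsed(user):
    return user["days_since_last_login"] > INACTIVITY_THRESHOLD_DAYS


def discount_after(user):
    if is_payer(user) and is_lapsed(user):
        return 0.2
    return 0.0
```

Both versions behave the same; only the second one stays correct if someone changes the threshold and forgets to update a comment.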
// When I wrote this, only God and I understood what I was doing
// Now, God only knows
// sometimes I believe compiler ignores all my comments
/**
 * Always returns true.
 */
public boolean isAvailable() {
    return false;
}
“Long functions are where classes go to hide” - Robert C. Martin
● Small
● Do one thing
● SOLID, Design Patterns, etc.
Code conventions
● A team should produce code in one style, as if it were written by one person
● Team conventions over language conventions, over personal ones
● Automate style formatting
Part 3
How to write Clean Code in Python?
(i.e. this is not Java)
PEP 8 -- Style Guide for Python Code
● Indentation
● Tabs or Spaces?
● Maximum Line Length
● Should a line break before or after a binary operator?
● Blank Lines
● Imports
● Comments
● Naming Conventions
Example:
Good:
foo = long_function_name(var_one, var_two,
                         var_three, var_four)
Bad:
foo = long_function_name(var_one, var_two,
    var_three, var_four)
https://www.python.org/dev/peps/pep-0008/
Google Python Style Guide
https://google.github.io/styleguide/pyguide.html
My favourite!
This is not Java or C++
● Functions are first-class objects
● Duck-typing as an interface
● No setters/getters
● itertools, zip, enumerate
● etc.
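A compact sketch touching each bullet above. The Player and Bot classes and the describe function are hypothetical and exist only for illustration.

```python
import itertools


class Player:
    """Plain attributes instead of get_/set_ methods."""

    def __init__(self, name, level):
        self.name = name
        self.level = level


class Bot:
    """Unrelated class; it works wherever .name and .level are expected (duck typing)."""

    def __init__(self, level):
        self.name = "bot"
        self.level = level


def describe(participants, label=str.upper):
    """`label` is a function passed as an argument (functions are first-class objects)."""
    return [
        f"{rank}. {label(p.name)} (level {p.level})"
        for rank, p in enumerate(participants, start=1)
    ]


names = ["ann", "bob"]
levels = [3, 5]
players = [Player(name, level) for name, level in zip(names, levels)]
lines = describe(itertools.chain(players, [Bot(level=1)]))
```

No interface declaration, no index counter, and no accessor methods are needed; a Java-style translation of the same logic would be noticeably longer.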
Part 4
How to write Clean Python Code in Jupyter Notebook?
Typical structure of the ipynb
1. Imports
2. Get Data
3. Transform Data
4. Modelling
5. Visualisation
6. Making sense of the data
How big should a notebook file be?
Hypothesis - Data - Interpretation
Keep your notebooks small!
(4-10 cells each)
Tip 1: break a fat notebook into many small ones
Example:
1_data_preparation.ipynb
df.to_pickle('clean_data_1.pkl')
2_linear_model.py
df = pd.read_pickle('clean_data_1.pkl')
3_ensemble.py
df = pd.read_pickle('clean_data_1.pkl')
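The hand-off between notebooks can be sketched as follows. To stay dependency-free, this sketch uses the standard-library pickle module on a list of dicts; with pandas, the df.to_pickle and pd.read_pickle calls from the slide play the same role. The temporary directory and the sample rows are assumptions.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical working directory shared by the notebooks.
work_dir = Path(tempfile.mkdtemp())
clean_data_path = work_dir / "clean_data_1.pkl"

# --- 1_data_preparation: clean the raw rows and save the result ---
raw_rows = [{"user": "a", "revenue": "4.5"}, {"user": "b", "revenue": None}]
clean_rows = [
    {"user": row["user"], "revenue": float(row["revenue"])}
    for row in raw_rows
    if row["revenue"] is not None
]
with open(clean_data_path, "wb") as pickle_file:
    pickle.dump(clean_rows, pickle_file)

# --- 2_linear_model: start from the saved file, not from the raw data ---
with open(clean_data_path, "rb") as pickle_file:
    model_input = pickle.load(pickle_file)
```

Each downstream notebook then opens with a single load step, so it can be re-run without repeating the preparation work.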
Tip 2: shared library
● Data access
● Common plotting functionality
● Report generation
● Misc. utils
acme_data_utils
    data_access.py
    plotting.py
    setup.py
    tests/
Tip 3: Don’t just be pythonic. Be IPythonic
Don’t hide “secret sauce” inside imported module
BAD:
Good:
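The BAD and Good screenshots from the slide are not reproduced in this transcript. One hypothetical reading of the tip, with invented names (load_sessions, run_engagement_analysis, ENGAGED_SESSION_MINUTES) and invented data, might look like this:

```python
from statistics import mean


# Stand-in for a shared utils module: generic, reusable plumbing only.
def load_sessions():
    """Hypothetical data-access helper; returns (user_id, session_minutes) rows."""
    return [("a", 12.0), ("a", 30.0), ("b", 4.0), ("c", 45.0)]


# BAD (as a notebook cell): the whole analysis is one opaque call, e.g.
#     result = acme_data_utils.run_engagement_analysis()   # hypothetical function
# The reader cannot see which threshold or definition produced the number.

# Good (as a notebook cell): the helper only fetches data; the decisions
# that define the analysis stay visible in the notebook itself.
ENGAGED_SESSION_MINUTES = 10.0  # the "secret sauce": an explicit, visible choice
sessions = load_sessions()
engaged_minutes = [
    minutes for _, minutes in sessions if minutes >= ENGAGED_SESSION_MINUTES
]
average_engaged_session = mean(engaged_minutes)
```

The shared library keeps the boring parts out of the way, while anything a reviewer would want to question remains in plain sight.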
“Clean code reads like well written prose” - Grady Booch
A good Jupyter notebook reads like well written prose
How big should one Cell be?
Tip 4: each cell should have one logical output
● One “idea - execution - output” triplet per cell
● Import Cell: expected output is no import errors
● CMD+SHIFT+P
Tip 5: write tests .. in jupyter notebooks
https://pypi.python.org/pypi/pytest-ipynb
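The slide links to the pytest-ipynb plugin but does not show its cell conventions, so the sketch below relies only on plain assert statements, which work in any notebook cell. The seven_day_retention function and its definition are hypothetical.

```python
def seven_day_retention(installs, returns_on_day_7):
    """Share of installing users who came back on day 7."""
    cohort = set(installs)
    if not cohort:
        return 0.0
    return len(cohort & set(returns_on_day_7)) / len(cohort)


# Test cell: the expected output is no AssertionError.
def test_half_of_users_return():
    assert seven_day_retention(["a", "b"], ["a"]) == 0.5


def test_empty_cohort_gives_zero():
    assert seven_day_retention([], ["a"]) == 0.0


test_half_of_users_return()
test_empty_cohort_gives_zero()
```

Keeping the test cell directly below the function it checks means a Restart and Run All also re-validates the analysis.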
Tip 6: ..to the cloud
Code Smells .. in ipynb
- Cells can’t be executed in order (with Run All and Restart & Run All)
- Prototype (check ideas) code is mixed with “analysis” code
- Debugging cells
- Copy-paste cells
- Duplicate code (in general)
- Multiple notebooks that re-implement the same function
Tip 7: Run notebook from another notebook!
analysis.ipynb
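The slide does not show which mechanism was used to run analysis.ipynb; inside Jupyter the %run magic is the likely candidate. As a dependency-free illustration of the idea, the sketch below reads the notebook's JSON and executes its code cells in order. The run_notebook helper and the sample notebook contents are assumptions, and the helper ignores magics and cell outputs.

```python
import json
import tempfile
from pathlib import Path


def run_notebook(path):
    """Execute the code cells of an .ipynb file in order; return the final namespace."""
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    namespace = {}
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell["source"]
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        exec(compile(source, str(path), "exec"), namespace)
    return namespace


# Build a tiny stand-in for analysis.ipynb so the sketch is self-contained.
analysis_path = Path(tempfile.mkdtemp()) / "analysis.ipynb"
analysis_path.write_text(json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["daily_payers = [10, 20, 30]\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["average_payers = sum(daily_payers) / len(daily_payers)\n"]},
    ],
}), encoding="utf-8")

results = run_notebook(analysis_path)
```

The calling notebook can then read results["average_payers"] instead of re-implementing the calculation, which addresses the duplicate-code smell listed earlier.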
Make a Data Product from notebooks!
Summary: How to organise a Jupyter project
1. A notebook should have one Hypothesis-Data-Interpretation loop
2. Make a multi-project utils library
3. A good Jupyter notebook reads like well written prose
4. Each cell should have one and only one output
5. Write tests in notebooks
6. Deploy a shared Jupyter server
7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.
