@KNerush @Volodymyrk
Clean Code
In Jupyter notebooks, using Python
5th of July, 2016
Volodymyr (Vlad) Kazantsev
Head of Data @ product madness
Product Manager
MBA @LBS
Graphics programming
Writes code for money since 2002
Math degree
Kateryna (Katya) Nerush
Mobile Dev @ Octopus Labs
Dev Lead in Finance
Data Engineer
Web Developer
Writes code for money since 2003
CS degree
Why do we end up with messy IPython notebooks?
Coding
Stats Business
Who are Data Scientists, really?
Coding
Stats Business
“In a nutshell, coding is telling a computer to do something using a language it understands.”
Data Science with Python
It is not going to production anyway!
“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” - Martin Fowler, 1999
WTF! How am I supposed to validate this??
Sorry, but how do I calculate 7-day retention?
From Prototype to ... The Data Science Spiral
Ideas & Questions Data Analysis
Insights
Impact
You do it for your own good..
Re-run all A/B test analyses for the last months, by tomorrow
Ideas & Questions Data Analysis
Insights
Impact
Part 2
What can Data Scientists learn from Software Engineers?
Robert C. Martin, a.k.a. “Uncle Bob”
https://cleancoders.com/
“Clean Code”?
Pleasingly graceful and stylish in appearance or manner
Bjarne Stroustrup
Inventor of C++
Clean code reads like well-written prose
Grady Booch
creator of UML
.. each routine turns out to be pretty much what you expected
Ward Cunningham
inventor of Wiki and XP
One does not simply start writing clean code..
First make it work,
Then make it right,
Then make it fast and small
Kent Beck, co-inventor of XP and TDD
- Runs all the tests
- Contains no duplicate code
- Expresses all ideas...
- Minimizes classes and methods
Ron Jeffries, author of Extreme Programming Installed
Leave the campground cleaner than you found it
The Boy Scouts of America, applied to programming by Uncle Bob
I'm not a great programmer;
I'm just a good programmer with great habits.
Kent Beck
“There are only two hard problems in Computer Science: cache invalidation and naming things” - Phil Karlton
long_descriptive_names
Avoid: x, i, stuff, do_blah()
Pronounceable and Searchable
revenue_per_payer vs. arpdpu
Avoid encodings, abbreviations, prefixes, suffixes.. if possible
bonus_points_on_iphone vs. cns_crm_dip
Add meaningful context
daily_revenue_per_payer
Don’t be lazy.
Spend time naming and renaming things.
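A small sketch of the naming advice above. The function name comes from the slide; the input layout (a list of dicts with `user_id` and `amount` keys) and the definition of a payer are assumptions made only for illustration.

```python
# Hard to read: what is x, and what does the result mean?
def calc(x):
    return sum(i["amount"] for i in x) / len({i["user_id"] for i in x})


# Easier to read: the name says what is computed and over what.
def daily_revenue_per_payer(purchases):
    """Total revenue divided by the number of distinct paying users.

    `purchases` is a list of dicts with the hypothetical keys
    'user_id' and 'amount', all recorded on the same day.
    """
    paying_users = {p["user_id"] for p in purchases if p["amount"] > 0}
    if not paying_users:
        return 0.0
    total_revenue = sum(p["amount"] for p in purchases)
    return total_revenue / len(paying_users)


purchases_today = [
    {"user_id": "a", "amount": 4.0},
    {"user_id": "a", "amount": 2.0},
    {"user_id": "b", "amount": 6.0},
]
print(daily_revenue_per_payer(purchases_today))
```

The descriptive version is also searchable: a text search for `daily_revenue_per_payer` finds every use, which is rarely true for `calc` or `x`.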
“each routine turns out to be pretty much what you expected” - Ward Cunningham
Small
Do one thing
One Level of Abstraction
Have only a few arguments (one is best)
Less important in Python, with named arguments.
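One way to read "small" and "do one thing" is sketched below. The row layout and all three function names are hypothetical; the point is only that each function does a single job at one level of abstraction and takes one argument.

```python
def parse_amount(raw_value):
    """Convert a raw string such as ' 4.50 ' into a float."""
    return float(raw_value.strip())


def keep_valid_rows(rows):
    """Drop rows that have no user id."""
    return [row for row in rows if row.get("user_id")]


def total_revenue(rows):
    """Sum the parsed amounts of rows that were already validated."""
    return sum(parse_amount(row["amount"]) for row in rows)


raw_rows = [
    {"user_id": "a", "amount": " 4.50 "},
    {"user_id": "", "amount": "9.99"},
    {"user_id": "b", "amount": "1.50"},
]
valid_rows = keep_valid_rows(raw_rows)
print(total_revenue(valid_rows))
```

Each piece can be checked on its own in a notebook cell, and the final call reads almost like the sentence that describes it.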
Use good names
Avoid obvious comments.
Avoid dead, commented-out code.
Avoid ToDo notes, licenses, history, markup for documentation and other nonsense.
But there are exceptions..
“When you feel the need to write a comment, first try to refactor
the code so that any comment becomes superfluous” Kent Beck
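A minimal sketch of that refactoring, assuming a hypothetical user record with a `seconds_since_last_seen` field. The first version needs a comment to explain a magic number; the second moves the same information into names.

```python
# Before: the comment has to explain both the number and the intent.
def check(user):
    # active if seen in the last 7 days
    return user["seconds_since_last_seen"] < 604800


# After: the names carry the meaning, so the comment is superfluous.
SECONDS_PER_DAY = 24 * 60 * 60
ACTIVE_WINDOW_DAYS = 7


def is_active(user):
    active_window_seconds = ACTIVE_WINDOW_DAYS * SECONDS_PER_DAY
    return user["seconds_since_last_seen"] < active_window_seconds


recent_user = {"seconds_since_last_seen": 3600}
lapsed_user = {"seconds_since_last_seen": 8 * SECONDS_PER_DAY}
print(is_active(recent_user), is_active(lapsed_user))
```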
// When I wrote this, only God and I understood what I was doing
// Now, God only knows
// sometimes I believe compiler ignores all my comments
/**
 * Always returns true.
 */
public boolean isAvailable() {
    return false;
}
“Long functions are where classes are trying to hide” - Robert C. Martin
Small
Do one thing
SOLID, Design Patterns, etc.
Code conventions
The team should produce code in one style, as if it were written by one person
Team conventions take precedence over language conventions, which take precedence over personal ones
Automate style formatting
Part 3
How to write Clean Code in Python?
(i.e. this is not Java)
PEP 8 -- Style Guide for Python Code
https://www.python.org/dev/peps/pep-0008/
● Indentation
● Tabs or Spaces?
● Maximum Line Length
● Should a line break before or after a binary operator?
● Blank Lines
● Imports
● Comments
● Naming Conventions
Example:
Good:
    foo = long_function_name(var_one, var_two,
                             var_three, var_four)
Bad:
    foo = long_function_name(var_one, var_two,
        var_three, var_four)
Google Python Style Guide
https://google.github.io/styleguide/pyguide.html
My favourite!
This is not Java or C++
Functions are first-class objects
Duck-typing as an interface
No setters/getters
itertools, zip, enumerate
etc.
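The idioms listed above can be sketched in a few lines. The data and the names (`revenue_by_day`, `apply_to_values`, `Payer`) are made up for illustration; only `zip`, `enumerate`, `round` and `itertools.chain` are real standard-library features.

```python
from itertools import chain

days = ["Mon", "Tue", "Wed"]
revenue = [120.0, 95.4, 130.25]

# zip pairs up parallel lists, so no index arithmetic is needed.
revenue_by_day = dict(zip(days, revenue))

# enumerate yields (position, item) pairs and replaces range(len(...)).
numbered_days = [f"{position}: {day}" for position, day in enumerate(days, start=1)]


# Functions are first-class objects: pass one in as an argument.
def apply_to_values(transform, mapping):
    return {key: transform(value) for key, value in mapping.items()}


rounded_revenue = apply_to_values(round, revenue_by_day)

# itertools.chain walks several iterables as if they were one.
all_platforms = list(chain(["ios", "android"], ["web"]))


class Payer:
    """Plain attribute access instead of Java-style getters and setters."""

    def __init__(self, name):
        self.name = name


payer = Payer("alice")
payer.name = "bob"  # no set_name() needed
print(numbered_days, rounded_revenue, all_platforms, payer.name)
```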
Part 4
How to write Clean Python Code in Jupyter Notebook?
Typical structure of the ipynb
1. Imports
2. Get Data
3. Transform Data
4. Modelling
5. Visualisation
6. Making sense of the data
How big should a notebook file be?
Hypothesis - Data - Interpretation
Keep your notebooks small!
(4-10 cells each)
Tip 1: break a fat notebook into many small ones
Example:
1_data_preparation.ipynb
df.to_pickle('clean_data_1.pkl')
2_linear_model.py
df = pd.read_pickle('clean_data_1.pkl')
3_ensamble.py
df = pd.read_pickle('clean_data_1.pkl')
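The hand-off between notebooks might look like the sketch below. It assumes pandas is installed; the temporary directory, the column names and the cleaning step are placeholders, and the two halves would live in separate notebooks in practice.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical working directory shared by the notebooks.
work_dir = Path(tempfile.mkdtemp())
clean_data_path = work_dir / "clean_data_1.pkl"

# --- 1_data_preparation: clean once, then persist the result ---
raw = pd.DataFrame({"user_id": [1, 2, 2, 3], "revenue": [5.0, None, 3.0, 7.5]})
clean = raw.dropna(subset=["revenue"]).reset_index(drop=True)
clean.to_pickle(clean_data_path)

# --- 2_linear_model: start from the persisted frame, not from raw data ---
df = pd.read_pickle(clean_data_path)
print(df.shape)
```

Each downstream notebook then starts from the same cleaned frame instead of repeating the preparation cells. Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.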
Tip 2: shared library
Data access
Common plotting functionality
Report generation
Misc. utils
acme_data_utils/
    data_access.py
    plotting.py
    setup.py
    tests/
Tip 3: Don’t just be Pythonic. Be IPythonic
Don’t hide the “secret sauce” inside an imported module
Bad:
Good:
Clean code reads like well-written prose
Grady Booch
A good Jupyter notebook reads like well-written prose
How big should one cell be?
Tip 4: each cell should have one logical output
One “idea - execution - output” triplet per cell
Import cell: expected output is no import errors
CMD+SHIFT+P
Tip 5: write tests .. in Jupyter notebooks
https://pypi.python.org/pypi/pytest-ipynb
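A test can be as simple as a cell of plain `assert` statements. The sketch below does not use any pytest-ipynb conventions, which the slides do not show. The retention definition (a user counts as retained if active exactly 7 days after install) and the input layout are assumptions for illustration.

```python
from datetime import date, timedelta


def seven_day_retention(install_dates, activity_dates):
    """Share of users who were active exactly 7 days after installing.

    install_dates: dict of user_id -> install date
    activity_dates: dict of user_id -> set of dates with activity
    """
    if not install_dates:
        return 0.0
    retained = sum(
        1
        for user_id, installed_on in install_dates.items()
        if installed_on + timedelta(days=7) in activity_dates.get(user_id, set())
    )
    return retained / len(install_dates)


def test_seven_day_retention():
    installs = {"a": date(2016, 7, 1), "b": date(2016, 7, 1)}
    activity = {"a": {date(2016, 7, 8)}, "b": {date(2016, 7, 5)}}
    assert seven_day_retention(installs, activity) == 0.5
    assert seven_day_retention({}, {}) == 0.0


test_seven_day_retention()  # stays silent when every check passes
```

Because the expected output of the test cell is "nothing", it fits the one-output-per-cell rule, and a failed assertion stops a Run All at the right place.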
Tip 6: ..to the cloud
Code Smells .. in ipynb
- Cells can’t be executed in order (with Run All and Restart & Run All)
- Prototype (check ideas) code is mixed with “analysis” code
- Debugging cells
- Copy-paste cells
- Duplicate code (in general)
- Multiple notebooks that re-implement the same function
Tip 7: Run a notebook from another notebook!
analysis.ipynb
Make a Data Product from notebooks!
Summary: How to organise a Jupyter project
1. A notebook should have one Hypothesis-Data-Interpretation loop
2. Make a multi-project utils library
3. A good Jupyter notebook reads like well-written prose
4. Each cell should have one and only one output
5. Write tests in notebooks
6. Deploy a shared Jupyter server
7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.

Clean Code in Jupyter Notebooks Using Python

  • 1. @KNerush @Volodymyrk Clean Code In Jupyter notebooks, using Python 1 5th of July, 2016
  • 2. @KNerush @Volodymyrk Volodymyr (Vlad) Kazantsev Head of Data @ product madness Product Manager MBA @LBS Graphics programming Writes code for money since 2002 Math degree 2 Kateryna (Katya) Nerush Mobile Dev @ Octopus Labs Dev Lead in Finance Data Engineer Web Developer Writes code for money since 2003 CS degree
  • 3. @KNerush @Volodymyrk Why we end-up with messy ipy notebooks? 3 Coding Stats Business
  • 4. @KNerush @Volodymyrk Who are Data Scientists, really? 4 Coding Stats Business “In a nutshell, coding is telling a computer to do something using a language it understands.” Data Science with Python
  • 5. @KNerush @Volodymyrk It is not going to production anyway! 5
  • 6. @KNerush @Volodymyrk “Any fool can write code that a computer can understand. Good programmers write code that humans can understand” - Kent Beck, 1999 6 WTF! How am I suppose to validate this?? Sorry, but how do can I calculate 7 day retention ?
  • 7. @KNerush @Volodymyrk From Prototype to ... The Data Science Spiral 7 Ideas & Questions Data Analysis Insights Impact
  • 8. @KNerush @Volodymyrk You do it for your own good.. 8 Re-run all AB tests analysis for the last months, by tomorrow eas & Questions Data Analysis Insights Impact
  • 9. @KNerush @Volodymyrk Part 2 What can Data Scientists learn from Software Engineers? 9
  • 10. @KNerush @Volodymyrk Robert C. Martin, a.k.a. “Uncle Bob” 10 https://cleancoders.com/
  • 11. @KNerush @Volodymyrk “Clean Code” ? 11 Pleasingly graceful and stylish in appearance or manner Bjarne Stroustrup Inventor of C++ Clean code reads like well written prose Grady Booch creator of UML .. each routine turns out to be pretty much what you expected Ward Cunningham inventor of Wiki and XP
  • 12. @KNerush @Volodymyrk One does not simply start writing clean code.. 12 First make it work, Then make it Right, Then make it fast and small Kent Beck co-inventor of XP and TDD Leave the campground cleaner than you found it - Run all the tests - Contains no duplicate code - Expresses all ideas... - Minimize classes and methods Ron Jeffries author of Extreme Programming Installed The Boy Scouts of America Applied to programming by Uncle Bob
  • 13. @KNerush @Volodymyrk I'm not a great programmer; I'm just a good programmer with great habits. 13 Kent Beck
  • 14. “There are only two hard problems in Computer Science: cache invalidation and naming things” - Phil Karlton. Use long_descriptive_names; avoid: x, i, stuff, do_blah(). Pronounceable and searchable: revenue_per_payer vs. arpdpu. Avoid encodings, abbreviations, prefixes, suffixes.. if possible: bonus_points_on_iphone vs. cns_crm_dip. Add meaningful context: daily_revenue_per_payer. Don’t be lazy. Spend time naming and renaming things.
  • 15. “each routine turns out to be pretty much what you expected” - Ward Cunningham. Functions: small; do one thing; one level of abstraction; have only a few arguments (one is the best). Less important in Python, with named arguments.
  • 16. Use good names. Avoid: obvious comments; dead, commented-out code; ToDo, licenses, history, markup for documentation and other nonsense. But there are exceptions.. “When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous” - Kent Beck
  • 17. // When I wrote this, only God and I understood what I was doing // Now, God only knows
  • 18. // sometimes I believe compiler ignores all my comments
  • 19. /** * Always returns true. */ public boolean isAvailable() { return false; }
  • 20. “Long functions is where classes are trying to hide” - Robert C. Martin. Small. Do one thing. SOLID, Design Patterns, etc.
  • 21. Code conventions: a team should produce code in the same style, as if it were written by one person. Team conventions over language ones, over personal ones. Automate style formatting.
  • 22. Part 3: How to write Clean Code in Python? (e.g. this is not Java)
  • 23. Example: PEP 8 -- Style Guide for Python Code. ● Indentation ● Tabs or Spaces? ● Maximum Line Length ● Should a line break before or after a binary operator? ● Blank Lines ● Imports ● Comments ● Naming Conventions. Good (continuation lines aligned with the opening delimiter): foo = long_function_name(var_one, var_two, var_three, var_four). Bad (arguments on the first line without vertical alignment): foo = long_function_name(var_one, var_two, var_three, var_four). https://www.python.org/dev/peps/pep-0008/
  • 24. Google Python Style Guide: https://google.github.io/styleguide/pyguide.html
  • 25. My favourite! This is not Java or C++. Functions are first-class objects. Duck-typing as an interface. No setters/getters. Itertools, zip, enumerate etc.
  • 26. Part 4: How to write Clean Python Code in Jupyter Notebook?
  • 27. Typical structure of the ipynb: 1. Imports 2. Get Data 3. Transform Data 4. Modelling 5. Visualisation 6. Making sense of the data
  • 28. How big should a notebook file be?
  • 29. How big should a notebook file be? Hypothesis - Data - Interpretation
  • 30. Keep your notebooks small! (4-10 cells each)
  • 31. Tip 1: break fat notebook into many small ones. Example: 1_data_preparation.ipynb: df.to_pickle('clean_data_1.pkl'); 2_linear_model.py: df = pd.read_pickle('clean_data_1.pkl'); 3_ensamble.py: df = pd.read_pickle('clean_data_1.pkl')
  • 32. Tip 2: shared library. Data access; common plotting functionality; report generation; misc. utils. Layout: acme_data_utils: Data_access.py, plotting.py, setup.py, tests/
  • 33. Tip 3: Don’t just be pythonic. Be IPythonic. Don’t hide “secret sauce” inside imported module. BAD: Good:
  • 34. “Clean code reads like well written prose” - Grady Booch
  • 35. Good jupyter notebook reads like well written prose
  • 36. How big should one Cell be?
  • 37. Tip 4: each cell should have one logical output. One “idea - execution - output” triplet per cell. Import Cell: expected output is no import errors. CMD+SHIFT+P
  • 38. Tip 5: write tests .. in jupyter notebooks. https://pypi.python.org/pypi/pytest-ipynb
  • 39. Tip 6: ..to the cloud
  • 40. Code Smells .. in ipynb: - Cells can’t be executed in order (with runAll and Restart&RunAll) - Prototype (check ideas) code is mixed with “analysis” code - Debugging cells - Copy-paste cells - Duplicate code (in general) - Multiple notebooks that re-implement the same function
  • 41. Tip 7: Run notebook from another notebook! analysis.ipynb
  • 42. Make Data Product from notebooks!
  • 43. Summary: How to organise a Jupyter project. 1. A notebook should have one Hypothesis-Data-Interpretation loop 2. Make a multi-project utils library 3. A good Jupyter notebook reads like well written prose 4. Each cell should have one and only one output 5. Write tests in notebooks 6. Deploy a shared Jupyter server 7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.
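
The naming and function advice from slides 14 and 15 can be sketched in Python roughly as follows. This is only an illustration, not code from the talk: the purchase records and the helper names revenue_by_day and payers_by_day are hypothetical, and only daily_revenue_per_payer is a name taken from the slides.

```python
from collections import defaultdict

# Hypothetical purchase records: (date, user_id, revenue). Illustrative data only.
purchases = [
    ("2016-07-01", "user_a", 4.99),
    ("2016-07-01", "user_b", 9.99),
    ("2016-07-01", "user_a", 0.99),
    ("2016-07-02", "user_c", 1.99),
]


def revenue_by_day(purchases):
    """Sum revenue for each calendar day."""
    totals = defaultdict(float)
    for purchase_date, _user_id, revenue in purchases:
        totals[purchase_date] += revenue
    return dict(totals)


def payers_by_day(purchases):
    """Count distinct paying users for each calendar day."""
    payers = defaultdict(set)
    for purchase_date, user_id, _revenue in purchases:
        payers[purchase_date].add(user_id)
    return {day: len(users) for day, users in payers.items()}


def daily_revenue_per_payer(purchases):
    """Combine the two helpers: revenue divided by distinct payers, per day."""
    revenue = revenue_by_day(purchases)
    payers = payers_by_day(purchases)
    return {day: revenue[day] / payers[day] for day in revenue}
```

Each function is small, does one thing and takes one argument, and the top-level name carries its context (daily, revenue, per payer), so a notebook cell that calls daily_revenue_per_payer(purchases) needs no comment to explain it.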

Editor's Notes

  1. Data Scientists come from various backgrounds. In my company, many came from the business “dark” side.
  2. http://www.slideshare.net/ISchwarz23/clean-code-49797249
  3. Part 2 of the talk is, of course, heavily inspired by the work of Robert C. Martin, his books, website and his absolutely wonderful video podcast. Shameless plagiarism alert!
  4. So, what is clean code anyway? “Pleasingly graceful and stylish in appearance or manner” - and this guy invented C++?! Graceful and stylish.. “Clean code reads like well written prose” - Grady Booch, creator of UML. Interestingly, UML can’t be read like prose at all.. But maybe there is something in there after all.. “.. each routine turns out to be pretty much what you expected” - Ward Cunningham, inventor of Wiki. So, clean code should not “surprise”. I can definitely relate to that. How often do you open someone else’s code and go “Wow, what is that.. This is a very curious and inventive way.. But.. WHY??” So clean code should aim not to surprise. It should even be “boring” and “predictable”. And consistent.. But more on that later.
  5. “First make it work, then make it right, then make it fast and small.” These are the design rules of Kent Beck, creator and proponent of Test Driven Development. Another recipe for making clean code comes from Ron Jeffries, a leading book author on Agile, XP and good practices. Clean code is code that: runs all the tests; contains no duplicate code; expresses all ideas...; minimizes classes and methods. Robert C. Martin found a very successful metaphor for writing clean code sustainably. It is called the Boy Scout rule of development: leave the campground cleaner than you found it.
  6. So let’s take a look at how to develop those great habits
  7. Naming things.. “There are only two hard problems in Computer Science: cache invalidation and naming things.” - Phil Karlton, Principal Architect at Netscape. Rule 1: long_descriptive_names. Don’t be afraid to type a long name. All modern IDEs have auto-complete, and if yours doesn’t, get a better one that does! Even Python notebooks have autocomplete for variable and function names. x, i, stuff and do_blah() are real variable or function names that I have seen! Rule 2: a name should be easy to pronounce. arpdpu stands for .. Vova, what does this stand for again?? Vova: “Average Revenue per Daily Paying User”. Rule 3: avoid encodings and abbreviations. The only exception may be where everyone in the organisation already knows that DAU stands for Daily Active Users.. Avoid Hungarian notation and other nonsense. But use domain names. Rule 4: add relevant context. Of course, long names make it hard to keep lines within 79 characters. But I prefer to have longer lines (we all have UltraHD Retina monitors these days anyway) rather than shorter and obscure names. Rule 5: naming is hard. Very hard. So think hard about naming things. And don’t be afraid to refactor and rename if you find a better way to express the purpose of a variable, function or class. A good IDE should have refactor-rename functions. And even Python notebooks have “search and replace” these days..
  8. Functions.. Rules of functions: they should be small; they should be smaller than that; they should do one thing (this also applies to classes), do it well, and do it only. One Level of Abstraction: don’t mix high-level policy and low-level details.
  9. But there are exceptions.. Complex algorithms; technical notes and warnings; conventions and rules.
  10. And this is where I will take over...
  11. So let’s take a look at how to develop those great habits
  12. So let’s take a look at how to develop those great habits
  13. Same environment. Same color - reproducibility.
  14. Advice: reload() modules that are changing, so that RunAll produces the same result as Restart&RunAll. There should be one and only one narrative (story line) per notebook. Debugging cells are not the same as test cells; we are going to talk about Tests and TDD later. Copy-paste cells - my favourite.
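
The reload() advice in the last note can be sketched as below. In Python 3 the function lives in importlib (reload() was a builtin in Python 2). The module name acme_plot_utils and its LINE_COLOUR constant are hypothetical; the sketch writes a throwaway module to a temporary directory only so that it runs on its own.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway module on disk; "acme_plot_utils" is a hypothetical name.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "acme_plot_utils.py")
with open(module_path, "w") as module_file:
    module_file.write("LINE_COLOUR = 'blue'\n")

# Make the directory importable and import the module once,
# as an early notebook cell would.
sys.path.insert(0, module_dir)
import acme_plot_utils

colour_before_edit = acme_plot_utils.LINE_COLOUR

# Simulate editing the shared module while the kernel keeps running.
with open(module_path, "w") as module_file:
    module_file.write("LINE_COLOUR = 'green'\n")

# Without a reload, the kernel would keep using the old module object.
importlib.invalidate_caches()
importlib.reload(acme_plot_utils)

colour_after_reload = acme_plot_utils.LINE_COLOUR
```

Putting the importlib.reload call in the import cell is one way to make RunAll pick up the same code that Restart&RunAll would. IPython's autoreload extension is an alternative.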