@KNerush @Volodymyrk
Clean Code
In Jupyter notebooks, using Python
5th of July, 2016
Volodymyr (Vlad) Kazantsev
Head of Data @ product madness
Product Manager
MBA @LBS
Graphics programming
Writes code for money since 2002
Math degree
Kateryna (Katya) Nerush
Mobile Dev @ Octopus Labs
Dev Lead in Finance
Data Engineer
Web Developer
Writes code for money since 2003
CS degree
Why do we end up with messy IPython notebooks?
[Venn diagram: Coding, Stats, Business]
Who are Data Scientists, really?
[Venn diagram: Coding, Stats, Business]
“In a nutshell, coding is telling a computer to do something using a language it understands.” - Data Science with Python
It is not going to production anyway!
“Any fool can write code that a computer can understand. Good programmers write
code that humans can understand” - Kent Beck, 1999
WTF! How am I supposed to validate this??
Sorry, but how can I calculate 7-day retention?
From Prototype to ... The Data Science Spiral
Ideas &
Questions
Data
Analysis
Insights
Impact
You do it for your own good..
Re-run all A/B test analyses for the past months, by tomorrow
Part 2
What can Data Scientists learn from
Software Engineers?
Robert C. Martin, a.k.a. “Uncle Bob”
https://cleancoders.com/
“Clean Code” ?
“Pleasingly graceful and stylish in appearance or manner” - Bjarne Stroustrup, inventor of C++
“Clean code reads like well-written prose” - Grady Booch, creator of UML
“.. each routine turns out to be pretty much what you expected” - Ward Cunningham, inventor of Wiki and XP
One does not simply start writing clean code..
“First make it work, then make it right, then make it fast and small” - Kent Beck, co-inventor of XP and TDD
“Leave the campground cleaner than you found it” - The Boy Scouts of America, applied to programming by Uncle Bob
Clean code, according to Ron Jeffries, author of Extreme Programming Installed:
- Runs all the tests
- Contains no duplicate code
- Expresses all ideas...
- Minimizes classes and methods
“I'm not a great programmer; I'm just a good programmer with great habits.” - Kent Beck
“There are only two hard problems in Computer Science:
cache invalidation and naming things” - Phil Karlton
● long_descriptive_names
○ Avoid: x, i, stuff, do_blah()
● Pronounceable and Searchable
○ revenue_per_payer vs. arpdpu
● Avoid encodings, abbreviations, prefixes, suffixes.. if possible
○ bonus_points_on_iphone vs. cns_crm_dip
● Add meaningful context
○ daily_revenue_per_payer
● Don’t be lazy.
○ Spend time naming and renaming things.
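As a small illustration of the naming advice above, here is the same calculation written twice. The data and both function names are made up for this sketch; only the contrast between the two styles matters.

```python
# Hypothetical daily revenue per paying user, in dollars (made-up numbers).
daily_revenue_by_payer = {"user_1": 4.99, "user_2": 0.99, "user_3": 9.99}


# Hard to read: what are `f`, `d` and `x`?
def f(d):
    x = 0
    for i in d:
        x += d[i]
    return x / len(d)


# Easier to read: pronounceable, searchable names that carry context.
def daily_revenue_per_payer(revenue_by_payer):
    total_revenue = sum(revenue_by_payer.values())
    number_of_payers = len(revenue_by_payer)
    return total_revenue / number_of_payers


print(daily_revenue_per_payer(daily_revenue_by_payer))
```

Both functions compute the same value; the second one can be found with a text search and read aloud in a meeting.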
“each routine turns out to be pretty much what you
expected” - Ward Cunningham
● Small
● Do one thing
● One Level of Abstraction
● Have only a few arguments (one is best)
○ Less important in Python, thanks to named (keyword) arguments.
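A possible sketch of these points, using the 7-day retention question from earlier in the deck. The retention definition here (share of an install-day cohort seen exactly N days later), the function names and the data are all assumptions made for illustration.

```python
from datetime import date, timedelta


def users_installed_on(installs, install_day):
    """Return the set of users whose install date is install_day."""
    return {user for user, day in installs.items() if day == install_day}


def users_active_on(activity, active_day):
    """Return the set of users seen on active_day."""
    return {user for user, day in activity if day == active_day}


def retention(installs, activity, install_day, days_after=7):
    """Share of the install_day cohort that was active days_after days later."""
    cohort = users_installed_on(installs, install_day)
    if not cohort:
        return 0.0
    returned = users_active_on(activity, install_day + timedelta(days=days_after))
    return len(cohort & returned) / len(cohort)


# Made-up data: user -> install date, and (user, active date) events.
installs = {"ann": date(2016, 7, 1), "bob": date(2016, 7, 1), "cat": date(2016, 7, 2)}
activity = [("ann", date(2016, 7, 8)), ("cat", date(2016, 7, 8)), ("bob", date(2016, 7, 3))]

seven_day_retention = retention(installs, activity, install_day=date(2016, 7, 1), days_after=7)
print(seven_day_retention)
```

Each function is small and does one thing, and the keyword arguments at the call site make the three-argument call readable.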
● Use good names
● Avoid obvious comments.
● Remove dead, commented-out code
● Avoid ToDos, licenses, history, markup for documentation and other nonsense
● But there are exceptions..
“When you feel the need to write a comment, first try to refactor
the code so that any comment becomes superfluous” - Kent Beck
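One way to read the quote above: replace a comment that explains a condition with a function whose name says the same thing. The user fields, the threshold and the bonus value below are hypothetical.

```python
# Before: the comment explains what the condition means.
def bonus_before(user):
    # check if user is a payer who was active in the last 7 days
    if user["total_spend"] > 0 and user["days_since_last_login"] <= 7:
        return 100
    return 0


# After: the condition has a name, so the comment is no longer needed.
def is_recently_active_payer(user):
    return user["total_spend"] > 0 and user["days_since_last_login"] <= 7


def bonus_after(user):
    if is_recently_active_payer(user):
        return 100
    return 0


example_user = {"total_spend": 4.99, "days_since_last_login": 3}
print(bonus_before(example_user), bonus_after(example_user))
```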
// When I wrote this, only God and I understood what I was doing
// Now, God only knows
// sometimes I believe compiler ignores all my comments
/**
 * Always returns true.
 */
public boolean isAvailable() {
    return false;
}
“Long functions are where classes are trying to hide” - Robert C. Martin
● Small
● Do one thing
● SOLID, Design Patterns, etc.
Code conventions
● A team should produce code in one style, as if it were written by one person
● Team conventions over language conventions, over personal ones
● Automate style formatting
Part 3
How to write Clean Code in Python?
(i.e. this is not Java)
PEP 8 -- Style Guide for Python Code
https://www.python.org/dev/peps/pep-0008/
● Indentation
● Tabs or Spaces?
● Maximum Line Length
● Should a line break before or after a binary operator?
● Blank Lines
● Imports
● Comments
● Naming Conventions
Example:
Good:
    foo = long_function_name(var_one, var_two,
                             var_three, var_four)
Bad:
    foo = long_function_name(var_one, var_two,
        var_three, var_four)
Google Python Style Guide
https://google.github.io/styleguide/pyguide.html
My favourite!
This is not Java or C++
● Functions are first-class objects
● Duck-typing as an interface
● No setters/getters
● Itertools, zip, enumerate
● etc.
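The bullets above can be sketched in a few lines. The `Player` class, the helper names and the numbers are invented for this example; the language features themselves (properties, passing functions as values, `zip`, `enumerate`, `itertools.chain`) are standard Python.

```python
from itertools import chain


class Player:
    """No getters/setters: plain attributes, plus a property when logic is needed."""

    def __init__(self, name, revenue, sessions):
        self.name = name
        self.revenue = revenue
        self.sessions = sessions

    @property
    def revenue_per_session(self):
        return self.revenue / self.sessions if self.sessions else 0.0


def apply_to_all(metric, players):
    """Functions are first-class objects: `metric` is passed in like any value."""
    return [metric(player) for player in players]


def total_length(collections):
    """Duck typing: anything iterable works, no interface declaration needed."""
    return sum(1 for _ in chain.from_iterable(collections))


players = [Player("ann", 10.0, 4), Player("bob", 0.0, 0)]
names = ["ann", "bob"]
scores = apply_to_all(lambda player: player.revenue_per_session, players)

# zip and enumerate instead of index arithmetic.
ranking = [f"{position}. {name}: {score}" for position, (name, score)
           in enumerate(zip(names, scores), start=1)]
print(ranking)
```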
Part 4
How to write Clean Python Code in Jupyter
Notebook?
Typical structure of the ipynb
1. Imports
2. Get Data
3. Transform Data
4. Modelling
5. Visualisation
6. Making sense of the data
How big should a notebook file be?
Hypothesis - Data - Interpretation
Keep your notebooks small!
(4-10 cells each)
Tip 1: break a fat notebook into many small ones
Example:
1_data_preparation.ipynb
    df.to_pickle('clean_data_1.pkl')
2_linear_model.py
    df = pd.read_pickle('clean_data_1.pkl')
3_ensamble.py
    df = pd.read_pickle('clean_data_1.pkl')
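A minimal sketch of the hand-off between notebooks. The slide uses pandas' `to_pickle` and `read_pickle`; to stay dependency-free, this sketch swaps in the standard-library `pickle` module and a list of dicts instead of a DataFrame. The helper names and the data are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def prepare_data(raw_rows):
    """Stage 1 (e.g. 1_data_preparation.ipynb): drop rows with missing revenue."""
    return [row for row in raw_rows if row["revenue"] is not None]


def save_stage(data, path):
    with open(path, "wb") as handle:
        pickle.dump(data, handle)


def load_stage(path):
    with open(path, "rb") as handle:
        return pickle.load(handle)


work_dir = Path(tempfile.mkdtemp())
clean_data_path = work_dir / "clean_data_1.pkl"

raw_rows = [{"user": "ann", "revenue": 4.99}, {"user": "bob", "revenue": None}]
save_stage(prepare_data(raw_rows), clean_data_path)

# Stage 2 (a separate notebook) starts from the saved file, not from raw data.
clean_rows = load_stage(clean_data_path)
print(clean_rows)
```

Each downstream notebook then depends only on the saved file, so it can be re-run without repeating the preparation step.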
Tip 2: shared library
● Data access
● Common plotting functionality
● Report generation
● Misc. utils
acme_data_utils/
    data_access.py
    plotting.py
    setup.py
    tests/
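What a data access helper in such a library might look like, as a rough sketch. The function name `run_query` and the `purchases` table are assumptions, and an in-memory SQLite database stands in for whatever data store a team actually uses.

```python
import sqlite3


def run_query(connection, sql, parameters=()):
    """Run a query and return rows as a list of dicts keyed by column name."""
    cursor = connection.execute(sql, parameters)
    column_names = [description[0] for description in cursor.description]
    return [dict(zip(column_names, row)) for row in cursor.fetchall()]


# Usage from a notebook, with an in-memory database standing in for the real one.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE purchases (user TEXT, amount REAL)")
connection.executemany(
    "INSERT INTO purchases VALUES (?, ?)", [("ann", 4.99), ("bob", 0.99)]
)

big_purchases = run_query(
    connection, "SELECT user, amount FROM purchases WHERE amount > ?", (1.0,)
)
print(big_purchases)
```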
Tip 3: Don’t just be pythonic. Be IPythonic
Don’t hide the “secret sauce” inside an imported module
[Screenshots: BAD example vs. Good example]
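One possible reading of this tip, sketched with hypothetical function names and data: generic plumbing may be imported, but the decisions that define the analysis should stay visible in the notebook.

```python
# Hypothetical helpers that could live in a shared module.
def load_sessions():
    return [{"user": "ann", "minutes": 30}, {"user": "bob", "minutes": 5}]


def run_whole_analysis():
    """BAD: every decision (threshold, filtering, metric) is hidden in here."""
    sessions = [s for s in load_sessions() if s["minutes"] >= 10]
    return sum(s["minutes"] for s in sessions) / len(sessions)


# BAD notebook cell: the reader sees a number but not how it was produced.
hidden_result = run_whole_analysis()

# Good notebook cells: plumbing is imported, the analysis itself is visible.
sessions = load_sessions()
minimum_minutes = 10  # the "secret sauce": which sessions count as engaged
engaged_sessions = [s for s in sessions if s["minutes"] >= minimum_minutes]
average_engaged_minutes = sum(s["minutes"] for s in engaged_sessions) / len(engaged_sessions)

print(hidden_result, average_engaged_minutes)
```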
“Clean code reads like well-written prose” - Grady Booch
A good Jupyter notebook reads like well-written prose
How big should one Cell be?
Tip 4: each cell should have one logical output
● One “idea - execution - output” triplet per cell
● Import Cell: expected output is no import errors
● CMD+SHIFT+P
Tip 5: write tests .. in jupyter notebooks
https://pypi.python.org/pypi/pytest-ipynb
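A minimal sketch of tests living next to the code they check, using plain `assert` statements. It does not rely on pytest-ipynb's own conventions, which are not shown on the slide; the function under test and its numbers are made up.

```python
def conversion_rate(payers, users):
    """Share of users who paid; defined as 0.0 when there are no users."""
    if users == 0:
        return 0.0
    return payers / users


def test_conversion_rate_typical_case():
    assert conversion_rate(payers=25, users=100) == 0.25


def test_conversion_rate_no_users():
    assert conversion_rate(payers=0, users=0) == 0.0


# In a notebook, a final cell can simply call the tests and fail loudly.
for test in (test_conversion_rate_typical_case, test_conversion_rate_no_users):
    test()
print("all tests passed")
```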
Tip 6: ..to the cloud
Code Smells .. in ipynb
- Cells can’t be executed in order (with Run All and Restart & Run All)
- Prototype code (for checking ideas) is mixed with “analysis” code
- Debugging cells
- Copy-paste cells
- Duplicate code (in general)
- Multiple notebooks that re-implement the same function
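The first smell can be partly detected from the saved file. The sketch below assumes the notebook is stored in the nbformat 4 JSON layout, where each code cell carries an `execution_count`; the helper names are hypothetical, and a passing check does not prove that Restart & Run All succeeds.

```python
import json


def executed_in_order(notebook_json):
    """True if executed code cells have strictly increasing execution counts."""
    notebook = json.loads(notebook_json)
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    executed = [count for count in counts if count is not None]
    return all(earlier < later for earlier, later in zip(executed, executed[1:]))


def make_notebook(execution_counts):
    """Build a minimal notebook-like JSON string for demonstration purposes."""
    cells = [
        {"cell_type": "code", "execution_count": count, "source": [], "outputs": []}
        for count in execution_counts
    ]
    return json.dumps({"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 0})


print(executed_in_order(make_notebook([1, 2, 3])))
print(executed_in_order(make_notebook([3, 1, 2])))
```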
Tip 7: Run notebook from another notebook!
analysis.ipynb
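The slide does not show how the notebook is run, so the following is only one way it could be done from plain Python: read the notebook's JSON (assuming the nbformat 4 layout) and execute its code cells in order. The function name and the stand-in notebook are hypothetical, and IPython magics are simply skipped.

```python
import json


def run_notebook_code(notebook_json, namespace=None):
    """Execute the code cells of a notebook-like JSON string, in order."""
    namespace = {} if namespace is None else namespace
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        # Skip IPython magics and shell escapes, which plain Python cannot run.
        lines = [line for line in source.splitlines()
                 if not line.lstrip().startswith(("%", "!"))]
        exec("\n".join(lines), namespace)
    return namespace


# A stand-in for analysis.ipynb with one markdown cell and two code cells.
analysis_notebook = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Analysis"]},
        {"cell_type": "code", "source": ["%matplotlib inline\n", "revenue = [4.99, 0.99]\n"]},
        {"cell_type": "code", "source": ["total_revenue = sum(revenue)"]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 0,
})

results = run_notebook_code(analysis_notebook)
print(results["total_revenue"])
```

Only run notebooks you trust this way, since `exec` executes whatever the cells contain.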
Make Data Product from notebooks!
Summary: How to organise a Jupyter project
1. A notebook should have one Hypothesis-Data-Interpretation loop
2. Make a multi-project utils library
3. A good Jupyter notebook reads like well-written prose
4. Each cell should have one and only one output
5. Write tests in notebooks
6. Deploy a shared Jupyter server
7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.

 

Clean Code in Jupyter notebook

  • 1. @KNerush @Volodymyrk Clean Code in Jupyter notebooks, using Python. 5th of July, 2016
  • 2. Volodymyr (Vlad) Kazantsev: Head of Data @ Product Madness; Product Manager; MBA @LBS; Graphics programming; Writes code for money since 2002; Math degree. Kateryna (Katya) Nerush: Mobile Dev @ Octopus Labs; Dev Lead in Finance; Data Engineer; Web Developer; Writes code for money since 2003; CS degree
  • 3. Why do we end up with messy ipy notebooks? Coding, Stats, Business
  • 4. Who are Data Scientists, really? Coding, Stats, Business. “In a nutshell, coding is telling a computer to do something using a language it understands.” (Data Science with Python)
  • 5. It is not going to production anyway!
  • 6. “Any fool can write code that a computer can understand. Good programmers write code that humans can understand” - Kent Beck, 1999. “WTF! How am I supposed to validate this??” “Sorry, but how can I calculate 7 day retention?”
  • 7. From Prototype to ... The Data Science Spiral: Ideas & Questions, Data, Analysis, Insights, Impact
  • 8. You do it for your own good.. “Re-run all AB tests analysis for the last months, by tomorrow”. Ideas & Questions, Data, Analysis, Insights, Impact
  • 9. Part 2: What can Data Scientists learn from Software Engineers?
  • 10. Robert C. Martin, a.k.a. “Uncle Bob” https://cleancoders.com/
  • 11. “Clean Code”? “Pleasingly graceful and stylish in appearance or manner” - Bjarne Stroustrup, inventor of C++. “Clean code reads like well written prose” - Grady Booch, creator of UML. “.. each routine turns out to be pretty much what you expected” - Ward Cunningham, inventor of Wiki and XP
  • 12. One does not simply start writing clean code.. “First make it work, then make it right, then make it fast and small” - Kent Beck, co-inventor of XP and TDD. “Leave the campground cleaner than you found it” - The Boy Scouts of America, applied to programming by Uncle Bob. Run all the tests; Contains no duplicate code; Expresses all ideas...; Minimize classes and methods - Ron Jeffries, author of Extreme Programming Installed
  • 13. “I'm not a great programmer; I'm just a good programmer with great habits.” - Kent Beck
  • 14. “There are only two hard problems in Computer Science: cache invalidation and naming things” - Phil Karlton ● long_descriptive_names ○ Avoid: x, i, stuff, do_blah() ● Pronounceable and Searchable ○ revenue_per_payer vs. arpdpu ● Avoid encodings, abbreviations, prefixes, suffixes.. if possible ○ bonus_points_on_iphone vs. cns_crm_dip ● Add meaningful context ○ daily_revenue_per_payer ● Don’t be lazy. ○ Spend time naming and renaming things.
  • 15. “each routine turns out to be pretty much what you expected” - Ward Cunningham ● Small ● Do one thing ● One Level of Abstraction ● Have only a few arguments (one is best) ○ Less important in Python, with named arguments.
  • 16. ● Use good names ● Avoid obvious comments. ● Dead Commented-out Code ● ToDo, licenses, history, markup for documentation and other nonsense ● But there are exceptions.. “When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous” - Kent Beck
  • 17. // When I wrote this, only God and I understood what I was doing // Now, God only knows
  • 18. // sometimes I believe compiler ignores all my comments
  • 19. /** * Always returns true. */ public boolean isAvailable() { return false; }
  • 20. “Long functions are where classes are trying to hide” - Robert C. Martin ● Small ● Do one thing ● SOLID, Design Patterns, etc.
  • 21. Code conventions ● The team should produce code in the same style, as if it were written by one person ● Team conventions over language conventions, over personal ones ● Automate style formatting
  • 22. Part 3: How to write Clean Code in Python? (e.g. this is not Java)
  • 23. Example: PEP 8 -- Style Guide for Python Code ● Indentation ● Tabs or Spaces? ● Maximum Line Length ● Should a line break before or after a binary operator? ● Blank Lines ● Imports ● Comments ● Naming Conventions. foo = long_function_name(var_one, var_two, var_three, var_four) foo = long_function_name(var_one, var_two, var_three, var_four) (Good / Bad) https://www.python.org/dev/peps/pep-0008/
  • 24. Google Python Style Guide https://google.github.io/styleguide/pyguide.html
  • 25. My favourite! This is not Java or C++ ● Functions are first-class objects ● Duck-typing as an interface ● No setters/getters ● Itertools, zip, enumerate ● etc.
  • 26. Part 4: How to write Clean Python Code in Jupyter Notebook?
  • 27. Typical structure of the ipynb: 1. Imports 2. Get Data 3. Transform Data 4. Modelling 5. Visualisation 6. Making sense of the data
  • 28. How big should a notebook file be?
  • 29. How big should a notebook file be? Hypothesis - Data - Interpretation
  • 30. Keep your notebooks small! (4-10 cells each)
  • 31. Tip 1: break a fat notebook into many small ones. Example: 1_data_preparation.ipynb: df.to_pickle('clean_data_1.pkl') 2_linear_model.py: df = pd.read_pickle('clean_data_1.pkl') 3_ensemble.py: df = pd.read_pickle('clean_data_1.pkl')
  • 32. Tip 2: shared library ● Data access ● Common plotting functionality ● Report generation ● Misc. utils. Example layout: acme_data_utils Data_access.py plotting.py setup.py tests/
  • 33. Tip 3: Don’t just be pythonic. Be IPythonic. Don’t hide the “secret sauce” inside an imported module
  • 34. “Clean code reads like well written prose” - Grady Booch
  • 35. A good Jupyter notebook reads like well written prose
  • 36. How big should one Cell be?
  • 37. Tip 4: each cell should have one logical output ● One “idea - execution - output” triplet per cell ● Import Cell: expected output is no import errors ● CMD+SHIFT+P
  • 38. Tip 5: write tests .. in jupyter notebooks https://pypi.python.org/pypi/pytest-ipynb
  • 39. Tip 6: ..to the cloud
  • 40. Code Smells .. in ipynb - Cells can’t be executed in order (with runAll and Restart&RunAll) - Prototype (check ideas) code is mixed with “analysis” code - Debugging cells - Copy-paste cells - Duplicate code (in general) - Multiple notebooks that re-implement the same function
  • 41. Tip 7: Run notebook from another notebook! analysis.ipynb
  • 42. Make Data Product from notebooks!
  • 43. Summary: How to organise a Jupyter project 1. A notebook should have one Hypothesis-Data-Interpretation loop 2. Make a multi-project utils library 3. A good Jupyter notebook reads like well written prose 4. Each cell should have one and only one output 5. Write tests in notebooks 6. Deploy a shared Jupyter server 7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.
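
The naming and function advice from slides 14 and 15 can be sketched in a few lines of plain Python. Only the name `daily_revenue_per_payer` comes from the slides; the helper functions, the record layout (`day`, `user_id`, `amount`) and the sample data are hypothetical, chosen just to show small, single-purpose functions with descriptive names.

```python
from collections import defaultdict


def revenue_by_day(purchases):
    """Sum purchase amounts per calendar day."""
    totals = defaultdict(float)
    for purchase in purchases:
        totals[purchase["day"]] += purchase["amount"]
    return dict(totals)


def payers_by_day(purchases):
    """Count distinct paying users per calendar day."""
    payers = defaultdict(set)
    for purchase in purchases:
        payers[purchase["day"]].add(purchase["user_id"])
    return {day: len(users) for day, users in payers.items()}


def daily_revenue_per_payer(purchases):
    """Divide each day's revenue by that day's number of distinct payers."""
    revenue = revenue_by_day(purchases)
    payers = payers_by_day(purchases)
    return {day: revenue[day] / payers[day] for day in revenue}


# Hypothetical sample data, for illustration only.
sample_purchases = [
    {"day": "2016-07-01", "user_id": "a", "amount": 4.0},
    {"day": "2016-07-01", "user_id": "a", "amount": 2.0},
    {"day": "2016-07-01", "user_id": "b", "amount": 6.0},
    {"day": "2016-07-02", "user_id": "c", "amount": 5.0},
]

print(daily_revenue_per_payer(sample_purchases))
```

Each function does one thing at one level of abstraction, and the top-level name can be read aloud and searched for, unlike an abbreviation such as `arpdpu`.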
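
Tip 1 (slide 31) hands cleaned data from one small notebook to the next through a pickle file. The slide uses pandas (`df.to_pickle` and `pd.read_pickle`); the sketch below swaps in the standard-library `pickle` module so that it runs without pandas. The helper names `save_clean_data` and `load_clean_data`, the temporary directory and the sample rows are assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def save_clean_data(rows, path):
    """Write the cleaned rows to a pickle file for later notebooks."""
    with open(path, "wb") as pickle_file:
        pickle.dump(rows, pickle_file)


def load_clean_data(path):
    """Read the cleaned rows written by the preparation step."""
    with open(path, "rb") as pickle_file:
        return pickle.load(pickle_file)


# A temporary directory stands in for the project folder (assumption).
work_dir = Path(tempfile.mkdtemp())
clean_data_path = work_dir / "clean_data_1.pkl"

# Step 1, the data-preparation notebook: drop incomplete rows, then save.
raw_rows = [
    {"user_id": "a", "revenue": 3.5},
    {"user_id": "b", "revenue": None},
]
clean_rows = [row for row in raw_rows if row["revenue"] is not None]
save_clean_data(clean_rows, clean_data_path)

# Step 2, a modelling notebook: start from the saved file, not from raw data.
rows_for_model = load_clean_data(clean_data_path)
print(rows_for_model)
```

Because every downstream notebook starts from the same saved file, each one stays short and can be re-run on its own.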