Python clean code for data products

August 2021 Jean Carlo Machado
Python Clean Code for Machine Learning Data Products

Motivation
● Clean ML Code is hard
● Fewer surprises
● Fewer incidents & bugs
● Less technical debt
● Easier handover of projects
● More data science, less operations
● Consistently ship products faster

Outline
1. The problem
2. Large Scale Clean Code
3. Small Scale Clean Code

What is clean code?
“You know you are working on clean
code when each routine you read turns
out to be pretty much what you
expected.”
Ward Cunningham

ML Debt > Software Debt
Clean code related
● Glue code
● Pipeline jungles
● Configuration debt
● Experimental code paths
Not clean code related
● Entanglement
● Hidden feedback loops
● Static analysis of data dependencies
● Correlations drift
D. Sculley et al. (2014)

Readability “Feature Importance”
(- lowers readability, + raises it)
- Size & complexity of each line
- Indentation level
+ Average comments
+ Spacing & blank lines
Buse and Weimer (2008)

Decorators
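Add pre/post behaviour to functions.
import functools

def track_execution(func):
    # Return a wrapper so the extra behaviour runs on every call,
    # not once at decoration time.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"Started {func.__name__}")
        result = func(*args, **kwargs)
        print(f"Finished {func.__name__}")
        return result
    return wrapper

@track_execution
def train():
    print("Training")

train()

$ python decorator.py
Started train
Training
Finished train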

List Comprehension
Reduces indentation, does not invite adding complexity, pythonic
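The code for this slide did not survive extraction, so the following is only a minimal sketch of the idea. The column names and the normalisation rule are hypothetical, chosen to show the same transformation written as a loop and as a comprehension.

```python
# Hypothetical raw column names, e.g. from a CSV header.
raw_columns = ["User ID", "Platform", "", "Session Length"]

# Loop version: two levels of indentation and a mutable accumulator.
normalized_loop = []
for column in raw_columns:
    if column:
        normalized_loop.append(column.strip().lower().replace(" ", "_"))

# Comprehension version: one expression, no nesting to grow into.
normalized = [
    column.strip().lower().replace(" ", "_")
    for column in raw_columns
    if column
]
```

Both versions build the same list; the comprehension leaves less room for unrelated statements to creep into the loop body.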

Avoid Else, Early Return Instead
def with_else():
    # ...
    if df_historic_performance_aggregated is None:
        df_historic_performance_aggregated = df_aggregation
    else:
        # ...
        df_historic_performance_aggregated = (
            # ...
        )
    return df_historic_performance_aggregated

def without_else():
    # ...
    if df_historic_performance_aggregated is None:
        return df_aggregation
    # ...
    df_historic_performance_aggregated = (
        # ...
    )
    return df_historic_performance_aggregated
Reduce indentation and perceived complexity.
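The slide code elides its details, so here is a small runnable sketch of the same early-return pattern. The function name `merge_history` and the use of plain lists instead of DataFrames are assumptions made for illustration only.

```python
from typing import List, Optional

def merge_history(history: Optional[List[int]], aggregation: List[int]) -> List[int]:
    # Guard clause: handle the special case first and leave immediately.
    if history is None:
        return aggregation
    # The main path stays at the top indentation level, with no else branch.
    return history + aggregation

first_run = merge_history(None, [1, 2])
later_run = merge_history([0], [1, 2])
```

The special case is dealt with in two lines, and the rest of the function reads as the normal flow.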

Other Dos and Don’ts
● Metaphor journal
● Import *
● assert out of tests
● default values for functions
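The transcript does not say which of these items are dos and which are don’ts. Reading "assert out of tests" as a don’t, one reason is that Python removes `assert` statements when run with the `-O` flag, so a check written as an assert can silently disappear in production. A minimal sketch with a hypothetical `validate_split_ratio` helper:

```python
def validate_split_ratio(ratio: float) -> float:
    # Explicit check instead of `assert 0 < ratio < 1`:
    # this still runs when Python is started with -O.
    if not 0 < ratio < 1:
        raise ValueError(f"split ratio must be between 0 and 1, got {ratio}")
    return ratio

valid_ratio = validate_split_ratio(0.8)

try:
    validate_split_ratio(1.5)
    rejected = False
except ValueError:
    rejected = True
```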

Type-Systems
from enum import Enum
from dataclasses import dataclass

class Platform(str, Enum):
    mweb = 'mweb'
    ios = 'ios'

@dataclass
class HeaderMypyChecked:
    platform: Platform

HeaderMypyChecked(platform="123")

$ mypy type_system.py
type_system.py:15: error: Argument "platform" to "HeaderMypyChecked" has incompatible type "str"; expected "Platform"
Found 1 error in 1 file (checked 1 source file)

from enum import Enum
from pydantic import BaseModel

class Platform(str, Enum):
    mweb = 'mweb'
    ios = 'ios'

class HeaderRuntimeChecked(BaseModel):
    platform: Platform

HeaderRuntimeChecked(platform="123")

$ python type_system.py
pydantic.error_wrappers.ValidationError: 1 validation error for HeaderRuntimeChecked
platform
  value is not a valid enumeration member; permitted: 'mweb', 'ios' (type=type_error.enum; enum_values=[<Platform.mweb: 'mweb'>, <Platform.ios: 'ios'>])
15% reduction of
software bugs

Domain Driven Design
1. Inside bounded-contexts the same language is spoken
2. Clean and stable contracts between contexts
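One way to picture a contract between contexts is a small immutable type that is the only thing crossing the boundary. The sketch below assumes a hypothetical training context and serving context; `TrainedModelContract`, `publish_from_training`, `load_in_serving`, and the dictionary keys are all invented for illustration and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class TrainedModelContract:
    """Hypothetical contract handed from a training context to a serving context."""
    model_name: str
    version: int
    feature_names: Tuple[str, ...]

def publish_from_training(internal_state: dict) -> TrainedModelContract:
    # Translate the training context's internal vocabulary
    # ("experiment", "run", "cols") into the shared contract.
    return TrainedModelContract(
        model_name=internal_state["experiment"],
        version=internal_state["run"],
        feature_names=tuple(internal_state["cols"]),
    )

def load_in_serving(contract: TrainedModelContract) -> str:
    # The serving context depends only on the contract fields.
    return f"{contract.model_name}:v{contract.version}"

contract = publish_from_training(
    {"experiment": "ranker", "run": 3, "cols": ["price", "rating"]}
)
model_tag = load_in_serving(contract)
```

Because the dataclass is frozen, neither side can mutate the contract after it is published, and the training context is free to rename its internal keys without affecting serving.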

Closing Notes
Much more...
1. DRY
2. KISS
3. YAGNI
“Relatively simple things can tolerate a certain
level of disorganization. However, as
complexity increases, disorganization becomes
suicidal.”
Robert Martin
