Python Clean Code for Machine Learning
Data Products
Jean Carlo Machado, August 2021
Motivation
● Clean ML code is hard
● Fewer surprises
● Fewer incidents and bugs
● Less technical debt
● Easier handover of projects
● More data science, less operations
● Consistently ship products faster
Outline
1. The problem
2. Large Scale Clean Code
3. Small Scale Clean Code
What is clean code?
“You know you are working on clean code when each routine you read turns out to be pretty much what you expected.”
Ward Cunningham
ML Debt > Software Debt
Clean code related
● Glue code
● Pipeline jungles
● Configuration debt
● Experimental code paths
Not clean code related
● Entanglement
● Hidden feedback loops
● Static analysis of data dependencies
● Correlations drift
D. Sculley et al. (2014)
Small Scale Clean Code
Readability “Feature Importance”
- Size & complexity of each line
- Indentation level
+ Average comments
+ Spacing & blank lines
(- lowers readability, + raises it)
Buse and Weimer (2008)
Decorators
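Add pre/post behaviour to functions.
import functools

def track_execution(func):
    # Return a wrapper so the messages print on every call of func.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print("Started " + func.__name__)
        result = func(*args, **kwargs)
        print("Finished " + func.__name__)
        return result
    return wrapper

@track_execution
def train():
    print("Training")

train()
$ python decorator.py
Started train
Training
Finished train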
List Comprehension
Reduces indentation, does not invite adding complexity, pythonic
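A minimal sketch of the point above, comparing a loop with the equivalent list comprehension. The column names and the `feat_` prefix are made up for illustration.

```python
# Hypothetical column names, invented for this example.
columns = ["feat_age", "id", "feat_income", "label"]
prefix = "feat_"

# Loop version: two levels of indentation for one filter-and-transform step.
feature_names_loop = []
for column in columns:
    if column.startswith(prefix):
        feature_names_loop.append(column[len(prefix):])

# Comprehension version: the same result as one flat expression.
feature_names = [
    column[len(prefix):] for column in columns if column.startswith(prefix)
]

print(feature_names)
```

The comprehension leaves little room to bolt extra statements onto the loop body, which is one reading of "does not invite adding complexity".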
Avoid Else, Early Return Instead
def with_else():
    # ...
    if df_historic_performance_aggregated is None:
        df_historic_performance_aggregated = df_aggregation
    else:
        # ...
        df_historic_performance_aggregated = (
            # ...
        )
    return df_historic_performance_aggregated

def without_else():
    # ...
    if df_historic_performance_aggregated is None:
        return df_aggregation
    # ...
    df_historic_performance_aggregated = (
        # ...
    )
    return df_historic_performance_aggregated
Reduce indentation and perceived complexity.
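The slide code elides its details, so here is a small runnable sketch of the same pattern. Plain lists stand in for the data frames, and combining by concatenation is an assumption made only for this example; the function names are hypothetical.

```python
from typing import List, Optional


def merge_with_else(historic: Optional[List[int]], new: List[int]) -> List[int]:
    # Both branches assign to the same variable before a shared return.
    if historic is None:
        result = new
    else:
        result = historic + new
    return result


def merge_early_return(historic: Optional[List[int]], new: List[int]) -> List[int]:
    # Guard clause: handle the special case first and leave.
    if historic is None:
        return new
    # The main path continues at the top indentation level.
    return historic + new


print(merge_early_return(None, [1, 2]))
print(merge_early_return([1, 2], [3]))
```

Both functions return the same values; the second keeps the main path unindented.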
Other Dos and Don’ts
● Metaphor journal
● Import *
● assert out of tests
● default values for functions
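The slide does not spell out the "assert out of tests" item. One reading is that `assert` should stay in test code, since Python strips assert statements when run with `-O`. Under that assumption, a sketch of explicit validation; `validate_split_ratio` is a hypothetical helper.

```python
def validate_split_ratio(ratio: float) -> float:
    """Return ratio if it is a usable train/test split, else raise."""
    # Explicit check: unlike `assert`, this still runs under `python -O`.
    if not 0.0 < ratio < 1.0:
        raise ValueError(f"split ratio must be between 0 and 1, got {ratio}")
    return ratio


print(validate_split_ratio(0.8))
```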
Type-Systems
Static check with mypy:
from dataclasses import dataclass
from enum import Enum

class Platform(str, Enum):
    mweb = 'mweb'
    ios = 'ios'

@dataclass
class HeaderMypyChecked:
    platform: Platform

HeaderMypyChecked(platform="123")
$ mypy type_system.py
type_system.py:15: error: Argument "platform" to "HeaderMypyChecked" has incompatible type "str"; expected "Platform"
Found 1 error in 1 file (checked 1 source file)
Runtime check with pydantic:
from enum import Enum
from pydantic import BaseModel

class Platform(str, Enum):
    mweb = 'mweb'
    ios = 'ios'

class HeaderRuntimeChecked(BaseModel):
    platform: Platform

HeaderRuntimeChecked(platform="123")
$ python type_system.py
pydantic.error_wrappers.ValidationError: 1 validation error for HeaderRuntimeChecked
platform
  value is not a valid enumeration member; permitted: 'mweb', 'ios' (type=type_error.enum; enum_values=[<Platform.mweb: 'mweb'>, <Platform.ios: 'ios'>])
15% reduction of software bugs
Large Scale Clean Code
Domain Driven Design
1. Inside bounded contexts the same language is spoken
2. Clean and stable contracts between contexts
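One way to picture a "clean and stable contract" between two contexts is a small immutable data class that is the only thing crossing the boundary. This is a sketch under that assumption; the training and serving contexts, `ModelArtifact`, and the path layout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelArtifact:
    """Contract handed from the training context to the serving context."""
    name: str
    version: int
    uri: str


def publish_model(name: str, version: int) -> ModelArtifact:
    # Training context: internal details stay here, only the contract leaves.
    return ModelArtifact(name=name, version=version, uri=f"models/{name}/{version}")


def describe_deployment(artifact: ModelArtifact) -> str:
    # Serving context: depends on the contract, not on training internals.
    return f"serving {artifact.name} v{artifact.version} from {artifact.uri}"


artifact = publish_model("ranker", 3)
print(describe_deployment(artifact))
```

Freezing the data class means neither side can mutate the contract after it has been handed over.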
Side-effects
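Only the title of this slide survives. A common reading is that functions which mutate their inputs or touch external state are harder to reason about than functions that just return a value. A minimal sketch under that assumption, with hypothetical function names:

```python
from typing import List


def add_bias_in_place(features: List[int], bias: int) -> None:
    # Side effect: mutates the caller's list and returns nothing.
    for index in range(len(features)):
        features[index] += bias


def add_bias(features: List[int], bias: int) -> List[int]:
    # No side effect: returns a new list and leaves the input untouched.
    return [value + bias for value in features]


original = [1, 2, 3]
shifted = add_bias(original, 1)
print(original, shifted)
```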
Closing Notes
Much more...
1. DRY
2. KISS
3. YAGNI
“Relatively simple things can tolerate a certain level of disorganization. However, as complexity increases, disorganization becomes suicidal.”
Robert Martin
Books