The document discusses clean code practices for machine learning projects. It defines clean code as code where each routine does what is expected. Clean ML code can lead to fewer bugs, less technical debt, and easier handovers. The document outlines practices for small-scale clean code like decorators, list comprehensions, and avoiding else blocks. It also discusses type systems, domain-driven design, and side effects for large-scale clean code. Overall, clean code aims to reduce complexity and make code more readable and maintainable.
Python Clean Code for Machine Learning
August 2021 Jean Carlo Machado
Data Products
Motivation
● Clean ML Code is hard
● Fewer surprises
● Fewer incidents & bugs
● Less technical debt
● Easier handover of projects
● More data science, less operations
● Consistently ship products faster
What is clean code?
“You know you are working on clean
code when each routine you read turns
out to be pretty much what you
expected.”
Ward Cunningham
ML Debt > Software Debt
Clean code related:
● Glue code
● Pipeline jungles
● Configuration debt
● Experimental code paths
Not clean code related:
● Entanglement
● Hidden feedback loops
● Static analysis of data dependencies
● Correlations drift
D. Sculley et al. (2014)
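As an illustration of one item from the clean code related group, configuration debt can be reduced by keeping settings in a single typed, validated object instead of loose dictionaries spread across scripts. The following is a minimal sketch under that assumption; the names TrainingConfig, load_config, and the individual fields are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical training settings, validated once at construction."""

    learning_rate: float
    batch_size: int
    model_name: str = "baseline"

    def __post_init__(self) -> None:
        # Fail fast on values that would otherwise surface as obscure training errors.
        if not 0.0 < self.learning_rate < 1.0:
            raise ValueError(f"learning_rate must be in (0, 1), got {self.learning_rate}")
        if self.batch_size <= 0:
            raise ValueError(f"batch_size must be positive, got {self.batch_size}")


def load_config(raw: dict) -> TrainingConfig:
    """Build a config from a plain dict; unknown or missing keys raise TypeError."""
    return TrainingConfig(**raw)


config = load_config({"learning_rate": 0.01, "batch_size": 32})
```

Because the dataclass is frozen, the configuration cannot be mutated halfway through a pipeline, and a misspelled key fails at load time rather than being silently ignored.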
Closing Notes
Much more...
1. DRY
2. KISS
3. YAGNI
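The first principle in the list above, DRY, can be sketched with a common ML situation: the same scaling logic is needed for training and test data. This is a minimal sketch, not taken from the slides; the function min_max_scale and the sample values are hypothetical.

```python
from __future__ import annotations


def min_max_scale(values: list[float], low: float, high: float) -> list[float]:
    """Scale values using bounds computed elsewhere, e.g. on the training data."""
    span = high - low
    if span == 0:
        # Early return for the degenerate case keeps the main path free of an else block.
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]


train = [2.0, 4.0, 6.0]
test = [3.0, 5.0]

# The bounds come from the training data only and are reused for the test data,
# so the scaling logic lives in exactly one place.
low, high = min(train), max(train)
train_scaled = min_max_scale(train, low, high)
test_scaled = min_max_scale(test, low, high)
```

Keeping the transformation in one function means a fix or a change of scaling strategy happens once, instead of in every notebook cell that copied it.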
“Relatively simple things can tolerate a certain
level of disorganization. However, as
complexity increases, disorganization becomes
suicidal.”
Robert Martin