From Software Engineering To Machine Learning
Alexey Grigorev
Lead Data Scientist at OLX Group
Founder at DataTalks.Club
Timeline: 2010 → 2012 → 2015 → 2018
mlbookcamp.com
https://tech.olx.com/detecting-image-duplicates-at-olx-scale-7f59e4b6aef4
Mostly engineering work!
Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf
You already have 90% of required skills
Learning Plan
● Start with fundamentals
● Learn simple algorithms
● Evaluate your model
● Deploy your model
● Learn complex algorithms
By doing projects!
The fundamentals
● Python
● NumPy
● Pandas
WARNING:
SOMETHING SCARY
It’s not scary!
Matrix multiplication is just a bunch of for loops!
Tip + Home task: implement these operations yourself:
● Vector-vector multiplication
● Matrix-vector multiplication
● Matrix-matrix multiplication
Bonus points:
● Express each operation using one for loop + previous operation
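One possible solution to the home task, as a minimal sketch in plain Python with lists of lists standing in for matrices. The function names are illustrative, not taken from the talk. Each operation uses one main for loop plus the previous operation; the column extraction and the final transpose in `matrix_matrix` are bookkeeping only.

```python
def vector_vector(u, v):
    """Dot product of two equal-length vectors: a single for loop."""
    assert len(u) == len(v), "vectors must have the same length"
    total = 0.0
    for i in range(len(u)):
        total += u[i] * v[i]
    return total


def matrix_vector(U, v):
    """Matrix-vector product: one loop over the rows of U, reusing vector_vector."""
    result = []
    for row in U:
        result.append(vector_vector(row, v))
    return result


def matrix_matrix(U, V):
    """Matrix-matrix product: one loop over the columns of V, reusing matrix_vector."""
    columns = []
    for j in range(len(V[0])):
        column_j = [row[j] for row in V]  # j-th column of V
        columns.append(matrix_vector(U, column_j))
    # columns[j][i] holds result[i][j], so transpose back to rows.
    return [list(row) for row in zip(*columns)]
```

In practice NumPy's `dot` does the same work much faster; writing the loops once is only meant to show that nothing mysterious happens inside.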
The best way to learn:
Learn by doing projects
● Chapter 2: Regression
● Chapter 3: Classification
● Chapter 4: Evaluation
● Chapter 5: Deployment
● Chapter 6: Tree-Based Models
● Chapter 7: Deep Learning (Image Classification)
● Chapter 8: Serverless
● Chapter 9: Kubernetes and Kubeflow
https://github.com/alexeygrigorev/mlbookcamp-code/
Pictures from olx.ua
Project #1: Car Price Prediction
https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/chapter-02-car-price/02-carprice.ipynb
Project #2: Churn
https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/chapter-03-churn-prediction/03-churn.ipynb
Predicted churn: 10%, 20%, 30%, 40%, 45%, 85%
Project #2 Cont’d: Evaluation
https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/chapter-03-churn-prediction/04-metrics.ipynb
IMPORTANT!
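Why evaluation matters can be shown with a small sketch. With an imbalanced target such as churn, accuracy alone can look good while the model misses most churning customers, so precision and recall are worth computing too. The function names and the 0.5 threshold below are assumptions for illustration, not code from the linked notebook.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives for a given decision threshold."""
    tp = fp = fn = tn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision and recall computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For example, a model that predicts "no churn" for everyone reaches 80% accuracy on data where 20% of customers churn, yet its recall is zero.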
Project #2 Cont’d: Deployment
https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/chapter-05-deployment
Churn service (Model): POST /predict
Request
{
"id": "8879-zkjof",
"gender": "female",
"partner": "no",
...
}
Response
{
"probability": 0.06,
"churn": false
}
IMPORTANT!*
* But easy for devs
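A churn service of this shape can be sketched with the Python standard library alone. This is an illustration, not the code from the linked chapter: `http.server` is swapped in to keep the sketch dependency-free, `predict_proba` is a hypothetical stand-in for a trained model, and the 0.5 threshold and port 9696 are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict_proba(customer):
    """Hypothetical stand-in for a trained model: always returns 0.06."""
    return 0.06


def predict(customer, threshold=0.5):
    """Turn the model's probability into the JSON-ready response body."""
    probability = predict_proba(customer)
    return {"probability": probability, "churn": probability >= threshold}


class ChurnHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        # Read and parse the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        customer = json.loads(self.rfile.read(length))
        body = json.dumps(predict(customer)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9696):
    """Start the service; blocks until interrupted."""
    HTTPServer(("localhost", port), ChurnHandler).serve_forever()
```

Calling `serve()` starts the service locally. Keeping `predict` separate from the HTTP handler means the prediction logic can be tested without starting a server.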
Project #3: Credit Risk
https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/chapter-06-trees/06-trees.ipynb
🚗 → Risk Scoring Model → Approve / Decline
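A decision tree, the simplest tree-based model, boils down to nested if/else rules that a training algorithm learns from data. The sketch below writes such rules by hand to show the idea. The field names (`past_defaults`, `assets`, `amount`, `income`) and all thresholds are hypothetical and do not come from the linked notebook.

```python
def assess_risk(application):
    """Hand-written decision tree; features and thresholds are hypothetical."""
    if application["past_defaults"] > 0:
        # Risky history: approve only if assets cover the requested amount.
        if application["assets"] >= application["amount"]:
            return "approve"
        return "decline"
    # Clean history: decline only if the amount is large relative to income.
    if application["amount"] > 10 * application["income"]:
        return "decline"
    return "approve"
```

A trained tree has the same structure; the difference is that the split features and thresholds are chosen from the training data rather than by a person.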
Project #4: Image Classification
https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/chapter-07-neural-nets/07-neural-nets-train.ipynb
inputs → base → vector → outputs
Input (150x150x3) → base_model → GlobalAveragePooling2D → Dense(10) → T-Shirt
keras.Model(inputs, outputs)
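In the spirit of "it's just for loops", here is a rough sketch of what the two layers after `base_model` compute, written in plain Python rather than Keras. It assumes the base model outputs a feature map indexed as `feature_map[h][w][c]` and that the dense layer has no activation, so it returns raw scores. The function names and the tiny shapes are illustrative only.

```python
def global_average_pooling_2d(feature_map):
    """Average feature_map[h][w][c] over height and width: one value per channel."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    totals = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c in range(channels):
                totals[c] += pixel[c]
    return [total / (height * width) for total in totals]


def dense(vector, weights, bias):
    """Linear layer: weights[i][j] connects input i to output j; returns raw scores."""
    outputs = []
    for j in range(len(bias)):
        score = bias[j]
        for i in range(len(vector)):
            score += vector[i] * weights[i][j]
        outputs.append(score)
    return outputs
```

In the slide's model, `Dense(10)` would produce ten such scores, one per clothing class, and the class with the highest score is the prediction.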
Project #4 Cont’d: Deploy with Lambda
{
"tshirt": 0.9993,
"pants": 0.0005,
"shoes": 0.00004
}
https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/chapter-08-serverless
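The overall shape of a Lambda function returning such a response might look like the sketch below. The `lambda_handler(event, context)` signature is the standard one for Python functions on AWS Lambda; everything else is an assumption: the `url` field in the event, the class list (only the three labels visible on the slide), and `predict`, a hypothetical placeholder that returns fixed scores instead of running a model.

```python
CLASSES = ["tshirt", "pants", "shoes"]  # only the labels shown on the slide


def predict(url):
    """Hypothetical placeholder: a real function would fetch the image and run the model."""
    return [0.9993, 0.0005, 0.00004]


def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda; 'url' is an assumed request field."""
    url = event["url"]
    scores = predict(url)
    # Pair each class name with its score to build the JSON response.
    return dict(zip(CLASSES, scores))
```

Locally the handler can be called like any other function, for example `lambda_handler({"url": "https://example.com/tshirt.jpg"}, None)`.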
Project #4 Cont’d: Deploy with Kubernetes
Gateway (Flask): resize and process image
Model (TF-Serving): make predictions
Client → Gateway: HTTP (JSON)
Gateway → Model: gRPC (protobuf), pre-processed image
Model → Gateway: raw predictions
Gateway → Client: Pants
https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/chapter-09-kubernetes
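The gateway's role can be sketched as three steps: pre-process the image, call the model, and turn raw predictions into a label. The sketch below is a simplification under stated assumptions. Resizing is omitted, the scaling to the range -1..1 is an assumed pre-processing step, the label order is hypothetical, and `call_model` is an injected function standing in for the gRPC call to TF-Serving.

```python
LABELS = ["tshirt", "pants", "shoes"]  # hypothetical label order


def preprocess(pixels):
    """Scale 0..255 pixel values to -1..1 (an assumed pre-processing step)."""
    return [p / 127.5 - 1.0 for p in pixels]


def gateway(pixels, call_model):
    """Pre-process, call the model, and convert raw predictions to a label."""
    batch = preprocess(pixels)
    raw = call_model(batch)  # stands in for the gRPC request to the model service
    best = max(range(len(raw)), key=lambda i: raw[i])
    return {"label": LABELS[best], "scores": dict(zip(LABELS, raw))}
```

Passing the model call in as a function keeps the gateway logic testable without a running model server: a fake such as `lambda batch: [0.1, 2.5, -1.0]` is enough to exercise it.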
Next steps
● Data science competitions (Kaggle)
● End-to-end projects — your own!
● You don’t have to do it alone. Join a community
Summary
● You already have 90% of required skills
● Learn fundamentals: Python, NumPy, Pandas
● Don’t be afraid of math (it’s just for loops)
● The best way to learn is by doing projects
● Learn evaluation metrics and cross-validation
● Deployment is easy for you and difficult for data scientists
● Don’t do it alone!
@Al_Grigor
agrigorev
DataTalks.Club
