This document discusses predicting customer churn with machine learning models built on Azure Databricks, scikit-learn, and MLflow. It covers collecting customer data; preprocessing it through encoding, scaling, sampling to address class imbalance, and splitting into train and test sets; training and evaluating several classification models with metrics such as F1 score, precision, and recall; and hyperparameter optimization to improve model performance. The best model is stored and tracked with MLflow, then used to score new data and predict customer churn probabilities. SHAP is used to explain the model's predictions.
5–8. Introduction. Machine learning. Process.
[Process diagram, built up incrementally across slides 5–8: Collect raw data (SQL) → Curate data → Train & Score (Load data → Preprocess → Train/test split → Train model → Evaluate model) → Take Insights Into Actions]
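The train/test split step in the diagram can be sketched in a few lines. This is a stdlib-only illustration (the deck's own stack would use `sklearn.model_selection.train_test_split` on the curated data); the fraction and seed are arbitrary choices for the example.

```python
import random

def train_test_split(rows, test_frac=0.2, seed=42):
    """Shuffle rows deterministically and hold out a test fraction."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```

The model is then fit only on `train`, and the held-out `test` rows are used for the evaluation step.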
10. Introduction. Customer churn
Source: https://www.kaggle.com/blastchar/telco-customer-churn
Features: customer services data, account and demographic information
Target: Churner – a customer who has left the company's service
Goal: Predict the probability that a customer will churn
Business outcome: Actions that improve customer retention
13. Data preparation. Scale & one-hot encode

One Hot Encoding:

ID  Color           ID  Blue Red Green
123 Blue    Encode  123  1    0    0
235 Red       →     235  0    1    0
312 Red             312  0    1    0
455 Green           455  0    0    1

Feature scaling:

Age Income          Age  Income
45  50000   Scale   0.69 0.83
20  35000     →     0    0.58
35  60000           0.54 1
65  45000           1    0.75
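The one-hot encoding step above can be sketched with the slide's toy data. This is a stdlib-only illustration; in the deck's stack this would be `pandas.get_dummies` or scikit-learn's `OneHotEncoder`, and the scaler shown here is plain min-max scaling (one common choice; the slide's exact numbers may come from a different scaler).

```python
# One-hot encode the Color column: one 0/1 indicator per category.
rows = [(123, "Blue"), (235, "Red"), (312, "Red"), (455, "Green")]
categories = ["Blue", "Red", "Green"]

encoded = [
    (cid, *[1 if color == c else 0 for c in categories])
    for cid, color in rows
]
print(encoded[0])  # (123, 1, 0, 0)

# Min-max scale a numeric column to [0, 1].
ages = [45, 20, 35, 65]
lo, hi = min(ages), max(ages)
scaled = [(a - lo) / (hi - lo) for a in ages]
```

Encoding and scaling are fit on the training split only and then applied unchanged to the test split, so no information leaks from test to train.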
14. Data preparation. Sampling
» Data sampling is needed when the classes are highly imbalanced.
» Upsampling – artificially create instances of the smaller class.
» Downsampling – remove some instances from the larger class.
» We found upsampling (ADASYN or SMOTE) to work best.
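A minimal upsampling sketch, stdlib only: it duplicates minority-class rows at random until the classes are balanced. Note that ADASYN and SMOTE (from the imbalanced-learn package, as used in the deck) instead *synthesize* new minority samples by interpolating between neighbors; random duplication is only the simplest baseline.

```python
import random
from collections import Counter

def upsample(rows, label_of, seed=0):
    """Duplicate minority-class rows at random until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(label_of(r), []).append(r)
    target = max(len(members) for members in by_class.values())
    out = []
    for members in by_class.values():
        out.extend(members)
        out.extend(rng.choices(members, k=target - len(members)))
    return out

# Toy imbalanced dataset: 90 non-churners (label 0) vs 10 churners (label 1).
data = [("a", 0)] * 90 + [("b", 1)] * 10
balanced = upsample(data, label_of=lambda r: r[1])
print(Counter(r[1] for r in balanced))  # Counter({0: 90, 1: 90})
```

Sampling is applied to the training split only; the test split keeps the original class distribution so the evaluation metrics stay honest.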
23. Scoring. Getting the best model
» Initialize the MLflow client
» Fetch experiment run IDs
» Fetch/store the run info
» Get the best run ID
» Load the best model
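The selection step above can be sketched as follows. The run records are shown as plain dicts so the logic is self-contained; the experiment name and the choice of F1 as the selection metric are assumptions for illustration. With MLflow installed, the runs would come from something like `MlflowClient().search_runs(experiment_ids=[exp_id])`.

```python
def get_best_run_id(runs, metric="f1"):
    """Return the run_id whose logged `metric` is highest."""
    return max(runs, key=lambda r: r["metrics"][metric])["run_id"]

# Stand-ins for run records fetched from the MLflow tracking server.
runs = [
    {"run_id": "a1", "metrics": {"f1": 0.71}},
    {"run_id": "b2", "metrics": {"f1": 0.78}},
    {"run_id": "c3", "metrics": {"f1": 0.74}},
]
best = get_best_run_id(runs)
print(best)  # b2

# The winning model would then be loaded with e.g.
#   model = mlflow.sklearn.load_model(f"runs:/{best}/model")
```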
24. Scoring. Predict and serve
» Predict probabilities
» Save the results
Can we explain these predictions?
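The predict step can be illustrated with a stdlib-only stand-in. The weights and bias here are made up for the example; in the deck's stack this is simply `model.predict_proba(X)[:, 1]` on the MLflow-loaded model, with the resulting probabilities written back to a table for downstream retention actions.

```python
import math

def churn_probability(features, weights, bias):
    """Logistic link over a linear score: a stand-in for predict_proba(...)[:, 1]."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical customers and model parameters, purely for illustration.
customers = {"c1": [0.2, 1.0], "c2": [0.9, 0.1]}
weights, bias = [1.5, -2.0], 0.3

scores = {cid: churn_probability(f, weights, bias) for cid, f in customers.items()}
print(scores)
```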
25. SHAP (SHapley Additive exPlanations)
“A unified approach to explain the output of any machine learning model”
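SHAP attributes a prediction to features via Shapley values: each feature's value is its marginal contribution to the model output, averaged over all feature orderings. For a tiny model this can be computed exactly by brute force; the toy linear model, instance, and background means below are assumptions for illustration.

```python
from itertools import permutations

def f(x1, x2):
    """Toy linear model to explain."""
    return 3.0 * x1 + 2.0 * x2

mean = {"x1": 1.0, "x2": 2.0}  # background (average) feature values
x = {"x1": 2.0, "x2": 0.0}     # instance to explain

def value(subset):
    # v(S): model output with features in S at the instance's values
    # and the rest at their background means (a common SHAP approximation).
    a = x["x1"] if "x1" in subset else mean["x1"]
    b = x["x2"] if "x2" in subset else mean["x2"]
    return f(a, b)

features = ["x1", "x2"]
phi = {k: 0.0 for k in features}
orderings = list(permutations(features))
for order in orderings:
    seen = set()
    for feat in order:
        before = value(seen)
        seen.add(feat)
        phi[feat] += (value(seen) - before) / len(orderings)

print(phi)  # {'x1': 3.0, 'x2': -4.0}
# For a linear model, phi_i = w_i * (x_i - mean_i): 3*(2-1)=3, 2*(0-2)=-4,
# and the contributions sum to f(x) - f(mean) = 6 - 7 = -1 (efficiency).
```

The `shap` library computes the same quantities efficiently for real models (e.g. `TreeExplainer` for tree ensembles) instead of enumerating all orderings, which is exponential in the number of features.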