Predicting customer churn
Using Azure Databricks, scikit-learn and MLflow
Introduction. Machine learning
Data → Preprocess → Training → Model → Predictions
Introduction. Machine learning. Process.
Collect raw data → Curate data → Train & Score → Take Insights Into Actions
Load data → Preprocess → Train/test split → Train model → Evaluate model
Introduction. Databricks
» Web-based platform for Spark
» Automated cluster management
» IPython-style notebooks
» Collaborative environment
» MLflow – model management repository
Introduction. Customer churn
Source: https://www.kaggle.com/blastchar/telco-customer-churn
Features: customer services data, account and demographic information
Target: Churner – customer that has left the company’s service
Goal: Predict probability that customer will churn
Business outcome: Actions that improve customer retention
Data preparation. Loading the data.
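The loading code from this slide is not preserved in the text. A minimal sketch with pandas could look like the following, assuming the Kaggle CSV exposes columns named `customerID`, `TotalCharges` and `Churn`; the inline sample (made-up values) stands in for the real file, and `load_churn_data` is a hypothetical helper name.

```python
import io

import pandas as pd

# Tiny inline sample standing in for the Kaggle CSV (values are made up).
SAMPLE_CSV = """customerID,tenure,MonthlyCharges,TotalCharges,Churn
A-001,1,29.85,29.85,No
A-002,34,56.95,1889.5,No
A-003,2,53.85,108.15,Yes
A-004,0,52.55, ,No
"""


def load_churn_data(source):
    """Read the churn CSV and return (features, target)."""
    df = pd.read_csv(source)
    # Assumption: blank TotalCharges values should be treated as 0.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce").fillna(0.0)
    # Encode the target: 1 = churner, 0 = customer who stayed.
    y = (df["Churn"] == "Yes").astype(int)
    X = df.drop(columns=["customerID", "Churn"])
    return X, y


X, y = load_churn_data(io.StringIO(SAMPLE_CSV))
print(X.shape, y.tolist())
```

In practice the `io.StringIO` object would be replaced by the path to the downloaded CSV.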
Data preparation. Train / test split
Data → SPLIT → Train data, Test data
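The split can be sketched with scikit-learn's `train_test_split`. The synthetic data, the 25% test size and the use of `stratify` are assumptions for illustration, not settings taken from the slides.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 customers, 3 numeric features, 20% churners.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([1] * 20 + [0] * 80)

# stratify=y keeps the churn rate the same in the train and test parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape, y_train.mean(), y_test.mean())
```

Stratifying matters for churn data because the positive class is usually small, and an unlucky random split could leave the test set with very few churners.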
Data preparation. Scale & one hot

One Hot Encoding

ID   Color
123  Blue
235  Red
312  Red
455  Green

Encode:

ID   Blue  Red  Green
123  1     0    0
235  0     1    0
312  0     1    0
455  0     0    1

Feature scaling

Age  Income
45   50000
20   35000
35   60000
65   45000

Scale (each value divided by its column maximum):

Age   Income
0.69  0.83
0.31  0.58
0.54  1
1     0.75
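Both transformations can be reproduced with scikit-learn. This is a sketch that assumes the scaling is a division by the column maximum, which is what `MaxAbsScaler` does for non-negative data; the category order passed to `OneHotEncoder` is chosen only to match the column order on the slide.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, OneHotEncoder

colors = np.array([["Blue"], ["Red"], ["Red"], ["Green"]])
# Fix the category order so the output columns are Blue, Red, Green.
encoder = OneHotEncoder(categories=[["Blue", "Red", "Green"]])
one_hot = encoder.fit_transform(colors).toarray()
print(one_hot)

numeric = np.array(
    [[45, 50000], [20, 35000], [35, 60000], [65, 45000]], dtype=float
)
# MaxAbsScaler divides each column by its largest absolute value.
scaled = MaxAbsScaler().fit_transform(numeric)
print(scaled.round(2))
```

Other scalers such as `MinMaxScaler` or `StandardScaler` are common alternatives; which one the original notebook used is not stated.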
Data preparation. Sampling
» Data sampling is needed when data classes are highly unbalanced.
» Upsampling – artificially create instances of smaller class.
» Downsampling – remove some instances from bigger class.
» We found upsampling (ADASYN or SMOTE) to work best.
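To illustrate the idea of upsampling, here is a sketch using plain random upsampling with replacement via `sklearn.utils.resample`. This is a simpler stand-in, not SMOTE or ADASYN: those methods synthesize new minority points instead of copying existing ones. The data is synthetic.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([1] * 10 + [0] * 90)  # 10% churners

X_minority, X_majority = X[y == 1], X[y == 0]
# Draw minority rows with replacement until both classes have the same size.
X_minority_up = resample(
    X_minority, replace=True, n_samples=len(X_majority), random_state=42
)
X_balanced = np.vstack([X_majority, X_minority_up])
y_balanced = np.array([0] * len(X_majority) + [1] * len(X_minority_up))
print(np.bincount(y_balanced))
```

Resampling should be applied to the training data only, so that the test set keeps the real class balance.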
Pipeline
The machine learning pipeline consists of three steps:
» Preprocessor
» Sampler
» Classifier
Hyperparameter optimization
Finding the optimal set of hyperparameters to maximize performance
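One common way to do this is an exhaustive grid search with cross-validation. The sketch below uses `GridSearchCV`; the search method, the parameter grid, the F1 scoring and the synthetic data are all assumptions, since the slide does not name them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# Make the label depend on the first feature so there is something to learn.
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0.8).astype(int)

pipeline = Pipeline([
    ("preprocessor", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=1000)),
])
# "<step name>__<parameter>" addresses a parameter inside the pipeline.
param_grid = {"classifier__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

For larger grids, `RandomizedSearchCV` has the same interface and samples the grid instead of trying every combination.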
Training and evaluating the model
» Train the model on the training data.
» Use the model to make predictions on the test data.
» Calculate evaluation metrics.
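These three steps can be sketched as follows. The classifier (`LogisticRegression`) and the synthetic data are placeholders, not the model used in the presentation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0.8).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# 1. Train the model on the training data.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# 2. Predict on the held-out test data.
y_pred = model.predict(X_test)
# 3. Calculate evaluation metrics against the true test labels.
metrics = {
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(metrics)
```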
Classification evaluation metrics
$$f_1 = 2 \times \frac{\mathit{precision} \times \mathit{recall}}{\mathit{precision} + \mathit{recall}}$$

$$\mathit{precision} = \frac{\text{True positive}}{\text{True positive} + \text{False positive}}$$

$$\mathit{recall} = \frac{\text{True positive}}{\text{True positive} + \text{False negative}}$$

                     Actual Positive   Actual Negative
Predicted Positive   True Positive     False Positive
Predicted Negative   False Negative    True Negative

Action            Preferred metric   False Positive cost   False Negative cost
Phone call        High precision     High                  Low
Sending an email  High recall        Low                   High
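As a worked example of these metrics, the sketch below computes precision, recall and F1 directly from confusion-matrix counts. The counts (30 true positives, 10 false positives, 20 false negatives) are made up for illustration, and `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


precision, recall, f1 = classification_metrics(tp=30, fp=10, fn=20)
print(precision, recall, round(f1, 3))  # 0.75 0.6 0.667
```

With these counts, 75% of the customers flagged as churners really churn (precision), while 60% of all real churners are caught (recall). A costly action such as a phone call favors the former; a cheap one such as an email favors the latter.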
MLflow
» Track training parameters
» Track performance metrics
» Store models
» Store graphs or other data
» Easily package and deploy
MLflow GUI
Scoring. Getting the best model
» Initialize MLflow client
» Fetch experiment run ids
» Fetch/store the run info
» Get the best run id
» Load (the best) model
Scoring. Predict and serve
» Predict probabilities
» Save the results
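A sketch of the scoring step: predict churn probabilities for new customers and save them. The model here is a placeholder for the loaded best model, the customer ids are hypothetical, and the results are written to an in-memory buffer where a real job would write to a file or table.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0.8).astype(int)
model = LogisticRegression(max_iter=1000).fit(X, y)  # stand-in for the best model

customer_ids = [f"C-{i:03d}" for i in range(5)]  # hypothetical ids
X_new = rng.normal(size=(5, 4))

scores = pd.DataFrame({
    "customerID": customer_ids,
    # Column 1 of predict_proba is the probability of the positive (churn) class.
    "churn_probability": model.predict_proba(X_new)[:, 1],
})
# Highest-risk customers first, so retention actions can be prioritized.
scores = scores.sort_values("churn_probability", ascending=False)

buffer = io.StringIO()  # replace with a file path or table in practice
scores.to_csv(buffer, index=False)
print(buffer.getvalue())
```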
Can we explain these predictions?
SHAP (SHapley Additive exPlanations)
“A unified approach to explain the output of any machine learning model”
SHAP Plots. Summary plot
SHAP Plots. Force plot (individual)
Customer A
Customer B
THE END
Questions?
