This document discusses predicting customer churn with machine learning models built on Azure Databricks, scikit-learn, and MLflow. It covers collecting customer data; preprocessing it through encoding, scaling, sampling to address class imbalance, and splitting into train and test sets; training and evaluating several classification models with metrics such as F1 score, precision, and recall; and hyperparameter optimization to improve model performance. The best model is stored and tracked with MLflow, then used to score new data and predict customer churn probabilities. SHAP is used to explain the model's predictions.
5–8. Introduction. Machine learning. Process.
[Process diagram, built up incrementally across slides 5–8: Collect raw data (SQL) → Curate data → Train & Score (Load data → Preprocess → Train/test split → Train model → Evaluate model) → Take Insights Into Actions]
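The train/test split step in the diagram can be sketched in a few lines. This is a stdlib-only illustration (the deck's own stack would use `sklearn.model_selection.train_test_split` on the curated data); the fraction and seed are arbitrary choices for the example.

```python
import random

def train_test_split(rows, test_frac=0.2, seed=42):
    """Shuffle rows deterministically and hold out a test fraction."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```

The model is then fit only on `train`, and the held-out `test` rows are used for the evaluation step.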
10. Introduction. Customer churn
Source: https://www.kaggle.com/blastchar/telco-customer-churn
Features: customer services data, account and demographic information
Target: Churner – a customer who has left the company's service
Goal: Predict the probability that a customer will churn
Business outcome: Actions that improve customer retention
13. Data preparation. Scale & one-hot encode

One Hot Encoding:

ID  Color           ID  Blue Red Green
123 Blue    Encode  123  1    0    0
235 Red       →     235  0    1    0
312 Red             312  0    1    0
455 Green           455  0    0    1

Feature scaling:

Age Income          Age  Income
45  50000   Scale   0.69 0.83
20  35000     →     0    0.58
35  60000           0.54 1
65  45000           1    0.75
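The one-hot encoding step above can be sketched with the slide's toy data. This is a stdlib-only illustration; in the deck's stack this would be `pandas.get_dummies` or scikit-learn's `OneHotEncoder`, and the scaler shown here is plain min-max scaling (one common choice; the slide's exact numbers may come from a different scaler).

```python
# One-hot encode the Color column: one 0/1 indicator per category.
rows = [(123, "Blue"), (235, "Red"), (312, "Red"), (455, "Green")]
categories = ["Blue", "Red", "Green"]

encoded = [
    (cid, *[1 if color == c else 0 for c in categories])
    for cid, color in rows
]
print(encoded[0])  # (123, 1, 0, 0)

# Min-max scale a numeric column to [0, 1].
ages = [45, 20, 35, 65]
lo, hi = min(ages), max(ages)
scaled = [(a - lo) / (hi - lo) for a in ages]
```

Encoding and scaling are fit on the training split only and then applied unchanged to the test split, so no information leaks from test to train.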
14. Data preparation. Sampling
» Data sampling is needed when the classes are highly imbalanced.
» Upsampling – artificially create instances of the smaller class.
» Downsampling – remove some instances from the larger class.
» We found upsampling (ADASYN or SMOTE) to work best.
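A minimal upsampling sketch, stdlib only: it duplicates minority-class rows at random until the classes are balanced. Note that ADASYN and SMOTE (from the imbalanced-learn package, as used in the deck) instead *synthesize* new minority samples by interpolating between neighbors; random duplication is only the simplest baseline.

```python
import random
from collections import Counter

def upsample(rows, label_of, seed=0):
    """Duplicate minority-class rows at random until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(label_of(r), []).append(r)
    target = max(len(members) for members in by_class.values())
    out = []
    for members in by_class.values():
        out.extend(members)
        out.extend(rng.choices(members, k=target - len(members)))
    return out

# Toy imbalanced dataset: 90 non-churners (label 0) vs 10 churners (label 1).
data = [("a", 0)] * 90 + [("b", 1)] * 10
balanced = upsample(data, label_of=lambda r: r[1])
print(Counter(r[1] for r in balanced))  # Counter({0: 90, 1: 90})
```

Sampling is applied to the training split only; the test split keeps the original class distribution so the evaluation metrics stay honest.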
23. Scoring. Getting the best model
» Initialize the MLflow client
» Fetch experiment run IDs
» Fetch/store the run info
» Get the best run ID
» Load the best model
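The selection step above can be sketched as follows. The run records are shown as plain dicts so the logic is self-contained; the experiment name and the choice of F1 as the selection metric are assumptions for illustration. With MLflow installed, the runs would come from something like `MlflowClient().search_runs(experiment_ids=[exp_id])`.

```python
def get_best_run_id(runs, metric="f1"):
    """Return the run_id whose logged `metric` is highest."""
    return max(runs, key=lambda r: r["metrics"][metric])["run_id"]

# Stand-ins for run records fetched from the MLflow tracking server.
runs = [
    {"run_id": "a1", "metrics": {"f1": 0.71}},
    {"run_id": "b2", "metrics": {"f1": 0.78}},
    {"run_id": "c3", "metrics": {"f1": 0.74}},
]
best = get_best_run_id(runs)
print(best)  # b2

# The winning model would then be loaded with e.g.
#   model = mlflow.sklearn.load_model(f"runs:/{best}/model")
```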
24. Scoring. Predict and serve
» Predict probabilities
» Save the results
Can we explain these predictions?
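The predict step can be illustrated with a stdlib-only stand-in. The weights and bias here are made up for the example; in the deck's stack this is simply `model.predict_proba(X)[:, 1]` on the MLflow-loaded model, with the resulting probabilities written back to a table for downstream retention actions.

```python
import math

def churn_probability(features, weights, bias):
    """Logistic link over a linear score: a stand-in for predict_proba(...)[:, 1]."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical customers and model parameters, purely for illustration.
customers = {"c1": [0.2, 1.0], "c2": [0.9, 0.1]}
weights, bias = [1.5, -2.0], 0.3

scores = {cid: churn_probability(f, weights, bias) for cid, f in customers.items()}
print(scores)
```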
25. SHAP (SHapley Additive exPlanations)
“A unified approach to explain the output of any machine learning model”
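SHAP attributes a prediction to features via Shapley values: each feature's value is its marginal contribution to the model output, averaged over all feature orderings. For a tiny model this can be computed exactly by brute force; the toy linear model, instance, and background means below are assumptions for illustration.

```python
from itertools import permutations

def f(x1, x2):
    """Toy linear model to explain."""
    return 3.0 * x1 + 2.0 * x2

mean = {"x1": 1.0, "x2": 2.0}  # background (average) feature values
x = {"x1": 2.0, "x2": 0.0}     # instance to explain

def value(subset):
    # v(S): model output with features in S at the instance's values
    # and the rest at their background means (a common SHAP approximation).
    a = x["x1"] if "x1" in subset else mean["x1"]
    b = x["x2"] if "x2" in subset else mean["x2"]
    return f(a, b)

features = ["x1", "x2"]
phi = {k: 0.0 for k in features}
orderings = list(permutations(features))
for order in orderings:
    seen = set()
    for feat in order:
        before = value(seen)
        seen.add(feat)
        phi[feat] += (value(seen) - before) / len(orderings)

print(phi)  # {'x1': 3.0, 'x2': -4.0}
# For a linear model, phi_i = w_i * (x_i - mean_i): 3*(2-1)=3, 2*(0-2)=-4,
# and the contributions sum to f(x) - f(mean) = 6 - 7 = -1 (efficiency).
```

The `shap` library computes the same quantities efficiently for real models (e.g. `TreeExplainer` for tree ensembles) instead of enumerating all orderings, which is exponential in the number of features.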