Customer Churn
A Data Science Use Case in Telecom
Chris Chen - Data Analyst @ Shaw Communications
The Problem
Who?
Why?
How?
CRISP-DM: Cross Industry Standard Process for Data Mining
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Business Understanding
Business objectives:
• Reduce customer churn
• Minimize the cost (effort) of retention
• Generate actionable insights
Success criteria:
• Metrics
• Non-metrics
Data Understanding
Data sources
Internal: customer data, product data, transactions, and customer interactions
External
Data quality: missing values, duplicates, outliers, etc.
First insights: a binary classification problem with skewed (imbalanced) classes
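A first pass at this slide's checks might look like the sketch below (pandas; the customers.csv file and the churn column name are hypothetical stand-ins):

import pandas as pd

# Assumed: a flat export with a binary "churn" column (names are hypothetical).
df = pd.read_csv("customers.csv")

# Data quality: missing-value ratios and duplicate rows.
print(df.isna().mean().sort_values(ascending=False).head(10))
print("duplicate rows:", df.duplicated().sum())

# First insight: how skewed is the target?
print(df["churn"].value_counts(normalize=True))  # e.g. ~0.99 non-churn vs ~0.01 churn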
Data Preparation
ETL
Feature selection
Feature engineering
Train / validation / test split (stratified split sketch below)
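A minimal sketch of the stratified train / validation / test split, using scikit-learn on a synthetic stand-in for the prepared data (all names and proportions illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared feature matrix and churn label (~1% positives).
X, y = make_classification(n_samples=10_000, weights=[0.99], random_state=42)

# 70% train, 15% validation, 15% test; stratify to preserve the imbalance in every split.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)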
Feature Selection (subtraction)
• Expert voting system
• Engage SMEs from various backgrounds: tech vs non-tech, marketing vs customer care, management vs frontline sales
• Shortlist the 10-15 features most likely to impact customer churn / retention
Feature Selection (subtraction) - science
Wrapper-based methods
• Random Forest / Boosting Tree - also good hints for feature engineering
Filter methods
• Missing Values Ratio
• Low Variance Filter (less informative features)
• High Correlation Filter (similar features); correlated pairs can be good candidates for interactions, e.g. Age vs Income
• PCA
(scikit-learn sketch below)
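The filter and wrapper ideas above might look like this in scikit-learn; the data, thresholds, and column names are all illustrative, not the author's pipeline:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical numeric feature frame and churn label for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 20)), columns=[f"f{i}" for i in range(20)])
df["f19"] = df["f0"] + rng.normal(scale=0.01, size=1000)   # near-duplicate feature
y = (df["f0"] + rng.normal(size=1000) > 2).astype(int)     # skewed binary label

# Filter: drop features with too many missing values.
df = df.loc[:, df.isna().mean() < 0.5]

# Filter: drop low-variance (barely informative) features.
df = df.loc[:, df.var() > 1e-3]

# Filter: drop one of each highly correlated pair (also interaction candidates).
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
df = df.drop(columns=[c for c in upper.columns if (upper[c] > 0.95).any()])

# Wrapper/embedded: Random Forest importances hint at what to keep or engineer.
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(df, y)
print(pd.Series(rf.feature_importances_, index=df.columns).nlargest(15))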
Feature Engineering (addition)
Business acumen, combined with domain knowledge and model understanding
Ordinal vs Nominal: label encoding, one-hot encoding
Transformation: normalization, log, and so on
Imputation: missing values
Feature interactions
Time series
(pandas sketch below)
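A sketch of the encoding, imputation, transformation, and interaction steps in pandas; the columns (plan_tier, province, monthly_spend, tenure_months) are invented for illustration:

import numpy as np
import pandas as pd

# Hypothetical raw columns for illustration.
df = pd.DataFrame({
    "plan_tier": ["basic", "plus", "premium", "plus"],   # ordinal
    "province": ["AB", "BC", "AB", "ON"],                # nominal
    "monthly_spend": [40.0, np.nan, 120.0, 65.0],
    "tenure_months": [3, 48, 12, 24],
})

# Ordinal -> label encoding that respects the order.
df["plan_tier"] = df["plan_tier"].map({"basic": 0, "plus": 1, "premium": 2})

# Nominal -> one-hot encoding.
df = pd.get_dummies(df, columns=["province"])

# Imputation: fill missing spend with the median.
df["monthly_spend"] = df["monthly_spend"].fillna(df["monthly_spend"].median())

# Transformation: log to tame a skewed dollar amount.
df["log_spend"] = np.log1p(df["monthly_spend"])

# Feature interaction: spend per month of tenure.
df["spend_per_tenure"] = df["monthly_spend"] / df["tenure_months"]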
Modeling
Classification:
Gradient Boosting Tree (GBT) - I'm a big fan of XGBoost
Random Forest (RF)
Logistic Regression (LR) or Elastic Net (EN)
Neural Network (NN)
Support Vector Machine (SVM)
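A minimal XGBoost sketch, assuming the stratified splits from Data Preparation; the hyperparameters are illustrative, and scale_pos_weight is one common way to account for the class imbalance:

from xgboost import XGBClassifier

# Assumed: X_train, y_train, X_val, y_val from the stratified split above.
# Up-weight the rare churn class (ratio of negatives to positives).
spw = (y_train == 0).sum() / max((y_train == 1).sum(), 1)

model = XGBClassifier(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=4,
    scale_pos_weight=spw,
    eval_metric="auc",
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
churn_prob = model.predict_proba(X_val)[:, 1]   # ranked list of retention targets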
Modeling - Metrics
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
• Accuracy = (TP + TN) / (TP + TN + FP + FN)
For a dataset with 99% non-churn customers and 1% churn customers, predicting every customer as non-churn yields 99% accuracy while catching zero churners.
Area Under Curve (AUC) captures the precision vs recall trade-off and is better suited to skewed classification.
Confusion matrix (Predictions vs Actuals):
• True Positive: churn customers correctly predicted as churn
• False Positive: non-churn customers incorrectly predicted as churn
• False Negative: churn customers incorrectly predicted as non-churn
• True Negative: non-churn customers correctly predicted as non-churn
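The same metrics in scikit-learn, illustrating the accuracy pitfall on a 99/1 split (the lazy all-non-churn model from the slide above):

import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, confusion_matrix, roc_auc_score)

# 99% non-churn, 1% churn; a lazy model predicts "non-churn" for everyone.
y_true = np.array([0] * 990 + [1] * 10)
y_lazy = np.zeros_like(y_true)

print(accuracy_score(y_true, y_lazy))                     # 0.99 - looks great, is useless
print(precision_score(y_true, y_lazy, zero_division=0))   # 0.0 - no positives predicted
print(recall_score(y_true, y_lazy))                       # 0.0 - catches no churners
print(confusion_matrix(y_true, y_lazy))                   # [[TN FP], [FN TP]] = [[990 0], [10 0]]

# AUC needs scores, not labels; a constant score is no better than chance.
print(roc_auc_score(y_true, np.zeros_like(y_true, dtype=float)))  # 0.5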
Modeling - Ensemble
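The slide is title-only, so this is just one plausible reading: a soft-voting ensemble over the classifiers listed under Modeling (a sketch, not the author's exact approach):

from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression

# Average predicted churn probabilities across diverse base models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
        ("gbt", GradientBoostingClassifier(random_state=42)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)   # assumed splits from Data Preparation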
Evaluation - Model excellence vs Business excellence
[Chart: models plotted by accuracy (low to high) vs interpretability (low to high): Random Forest, Boosting, Deep Learning / Neural Network, SVM, Nearest Neighbours, Naive Bayes, Decision Trees, Linear / Logistic Regression]
Deployment
Rule of Thumb:
Business Engagement
Thank you!
