Objective: reduce a telecommunications company’s churn rate (originally 14.49%) by building predictive models on historical customer churn data and by identifying churners’ characteristics with machine learning techniques
Tools used:
>RStudio (ggplot2, dplyr)
>Weka (for model prototyping)
>MS Excel, MS PowerPoint
Techniques used:
>Decision tree
>Naive Bayes
>Random Forest
>K-means clustering analysis
>Recommender system
5. Data Preparation
Correlation plot
Simple logistic regression (attributes most linked to churn)
Significant imbalance in churn attribute
Derived values: NA -> 0
Area code: discretized
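The preparation steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual R code: the field names and sample records are hypothetical, and it assumes the "derived values" are ratios such as price per call, which are undefined (NA) when the denominator is zero.

```python
# Hypothetical records; field names are illustrative, not the original dataset's.
customers = [
    {"area_code": 415, "day_charge": 45.07, "day_calls": 110, "churn": False},
    {"area_code": 408, "day_charge": 0.0, "day_calls": 0, "churn": True},
    {"area_code": 510, "day_charge": 27.47, "day_calls": 123, "churn": False},
    {"area_code": 415, "day_charge": 50.90, "day_calls": 71, "churn": False},
]


def price_per_call(charge, calls):
    """Derived ratio; undefined (NA) when there were no calls, so fall back to 0."""
    return charge / calls if calls else 0.0


def prepare(record):
    """Return a copy of the record with a derived ratio and a categorical area code."""
    prepared_record = dict(record)
    prepared_record["day_price_per_call"] = price_per_call(
        record["day_charge"], record["day_calls"]
    )
    # An area code is a label, not a quantity, so store it as a category string.
    prepared_record["area_code"] = f"area_{record['area_code']}"
    return prepared_record


prepared = [prepare(r) for r in customers]

# Share of churners; a low share signals the class imbalance noted above.
churn_rate = sum(r["churn"] for r in prepared) / len(prepared)
```

Checking the churn share before modeling matters because, with only about 14.49% churners, a model can look accurate while missing most of them.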
9. Predictive Modeling: Classification
Our predictive modeling covers three classification algorithms:
Decision Tree
Naïve Bayes
Random Forest
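To make the decision tree idea concrete, the sketch below searches for the single best split on one attribute, the step a tree learner repeats at each node. It is an illustration under stated assumptions: Gini impurity is chosen here for simplicity (the project's tools may use a different criterion), and the toy data and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of boolean churn labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split.

    The threshold is None if no split improves on the unsplit impurity.
    """
    best = (None, gini(labels))
    # The largest value is skipped: splitting there leaves the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: customer service calls per customer and whether they churned.
service_calls = [0, 1, 1, 2, 3, 4, 5, 6]
churned = [False, False, False, False, False, True, True, True]

threshold, impurity = best_split(service_calls, churned)
```

On this toy data the split "service calls <= 3" separates churners from non-churners perfectly. A random forest builds many such trees on resampled data and combines their votes.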
12. K-Means Clustering of Our ‘Churners’
“Cathy Complainers”
Total charges
Day charges
Day price per call
Price per call
Customer service calls
Voicemail messages
“Danny Daytimes”
Voicemail messages
Customer service calls
Total charges
Day charges
Day price per call
Total price per minute
“Irene Internationals”
Customer service calls
Voicemail messages
International charges
International price per call
*Comparisons amongst ‘churners’ only
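The segmentation above can be illustrated with a minimal k-means sketch on churners only. Everything specific here is an assumption: the three features, their pre-scaled values, the starting centroids, and k = 3 (chosen to match the three named segments). It is not the project's actual procedure or data.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def nearest_index(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, centroids, max_iterations=10):
    """Alternate assignment and centroid updates until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    assignments = [nearest_index(p, centroids) for p in points]
    return centroids, assignments


# Hypothetical churners, features scaled to 0..1:
# (customer service calls, day charges, international charges)
churners = [
    (0.9, 0.8, 0.1), (0.8, 0.9, 0.2),
    (0.1, 0.9, 0.1), (0.2, 0.8, 0.2),
    (0.1, 0.2, 0.9), (0.2, 0.1, 0.8),
]
initial_centroids = [churners[0], churners[2], churners[4]]

centroids, assignments = k_means(churners, initial_centroids)
```

Scaling the features first matters because k-means relies on distances, so an unscaled attribute with a large range would dominate the grouping. Comparing each centroid's coordinates then shows which attributes set a segment apart.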
14. 1. Retention Recommendations
“Cathy Complainers”
Lower-cost retention plan with free voicemail
Offer to high-volume callers to the call centre
“Danny Daytimes”
Retention plan with discount/additional daytime/evening minutes
Offer at early contact or proactively offer
“Irene Internationals”
Retention plan with discount/additional international minutes or better international plan
Offer at early contact or proactively offer