Dr. Marko Mitić
Business Data Analyst at Telenor Serbia
Predicting churn in telco industry:
machine learning approach
yana-ckwivjw2n5n1lxfv@guest.airbnb.com
Contents
• Introduction to machine learning
• Churn definition & telco data
• Algorithm description
• Data exploration
• Modelling in R language
• Conclusion
Introduction to machine learning
Supervised learning
Each training example is a pair consisting of an input object and a desired output value
• Regression (real values)
• Classification (discrete labels)
Unsupervised learning
Draw inferences from datasets without labeled responses
• Clustering
• Dimensionality reduction
Reinforcement learning
Agents ought to take actions in an environment
so as to maximize cumulative reward
Introduction to machine learning
[Figure: illustrations of regression, classification, clustering and reinforcement learning]
Introduction to machine learning
Training set
(observed)
Universal set
(unobserved)
Testing set
(unobserved)
Data acquisition
Practical usage
Classification
Churn definition
Churn rate (sometimes called attrition rate) is a measure of the number of individuals or items moving out of a collective group over a specific period of time.
In business terms: churn = customers leaving
Pay TV
E-mail/website subscribers
Legal sector
Recreation
Newspaper subscribers
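As a worked illustration of the definition above, churn rate is often computed as the share of customers present at the start of a period who left during it. The sketch below assumes that convention; the function name and the customer counts are hypothetical and do not come from the talk.

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of the starting customer base that left during the period.

    Assumption: the denominator is the customer count at the start of the
    period. Other conventions (e.g. average base) exist.
    """
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


# Hypothetical numbers: 10,000 subscribers at the start of the month, 250 left.
monthly_churn = churn_rate(10_000, 250)
print(f"Monthly churn rate: {monthly_churn:.1%}")
```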
Telco data
Real telco data, available in the latest C50 library for the R language
Feature engineering: 3-month and 6-month average usage, average total charge, ...
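The averaged features mentioned above could be derived roughly as follows. This is a minimal sketch in Python rather than R; the monthly usage values are made up, and the `usage_trend` ratio is an illustrative extra feature, not one named in the slides.

```python
def trailing_average(monthly_values, months):
    """Mean of the most recent `months` values (list is ordered oldest first)."""
    if months <= 0 or len(monthly_values) < months:
        raise ValueError("not enough history for the requested window")
    window = monthly_values[-months:]
    return sum(window) / months


# Hypothetical monthly usage in minutes for one customer, oldest month first.
usage = [310.0, 295.0, 280.0, 240.0, 200.0, 150.0]

avg_3m = trailing_average(usage, 3)   # average over the last 3 months
avg_6m = trailing_average(usage, 6)   # average over the last 6 months
usage_trend = avg_3m / avg_6m         # below 1.0 means usage is declining

print(avg_3m, avg_6m, usage_trend)
```

A ratio well below 1.0 signals falling usage, which is the kind of behavioural change a churn model can pick up on.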
Algorithms
1. Logistic Regression
• In logistic regression the outcome variable is binary, and the purpose of the
analysis is to assess the effects of multiple explanatory variables on that outcome
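Odds of success: $\frac{P}{1-P} = e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}$
The joint effect of all explanatory variables put together on the odds is
$\operatorname{logit}(P) = \ln\frac{P}{1-P} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$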
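The relation between probability, odds and logit can be checked numerically. The sketch below is written in Python for illustration; the coefficients and the two feature values are hypothetical and are not fitted to the telco data.

```python
import math


def predict_probability(alpha, betas, xs):
    """P = 1 / (1 + exp(-(alpha + sum_j beta_j * x_j)))."""
    linear = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-linear))


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


# Hypothetical intercept, coefficients and one customer's feature values.
alpha = -2.0
betas = [0.8, 0.05]
customer = [3.0, 12.0]

p = predict_probability(alpha, betas, customer)
odds = p / (1.0 - p)

# The logit of P recovers the linear predictor alpha + beta_1*x_1 + beta_2*x_2.
print(p, odds, logit(p))
```

Here the linear predictor equals 1, so the odds equal e and the logit of P returns 1 again, matching the two formulas above.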
Algorithms
2. Support Vector Machines
• SVMs maximize the margin around the separating
hyperplane.
• The decision function is fully specified by a subset
of training samples, the support vectors.
$w^T x_i + b \ge 1$, if $y_i = 1$
$w^T x_i + b \le -1$, if $y_i = -1$
• Margin: $\rho = \frac{2}{\lVert w \rVert}$
Algorithms
3. Neural Network
• A neural network (NN) is a computational model based on the structure and functions of
biological neural networks.
• A neural network usually involves a large number of processing units with the aim of
successfully mapping the input space to the output space through an iterative process
Algorithms
4. Boosting
• AdaBoost
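The slide names AdaBoost without detail, so the following is only a minimal sketch of the standard discrete AdaBoost procedure with decision stumps as weak learners. It is written in Python, uses a made-up one-feature dataset, and is not the implementation used in the talk.

```python
import math


def stump_predict(stump, x):
    """A stump predicts `polarity` when x[feature] > threshold, else -polarity."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity


def best_stump(X, y, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(w for row, label, w in zip(X, y, weights)
                          if stump_predict(stump, row) != label)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost_fit(X, y, rounds):
    """Return a list of (alpha, stump) pairs; labels must be +1 or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Increase weights of misclassified samples, decrease the rest.
        weights = [w * math.exp(-alpha * label * stump_predict(stump, row))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data that no single stump can separate.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]

single = adaboost_fit(X, y, rounds=1)
boosted = adaboost_fit(X, y, rounds=3)
print([adaboost_predict(single, row) for row in X])
print([adaboost_predict(boosted, row) for row in X])
```

The point of the toy data is that one stump must misclassify some samples, while the weighted vote of a few stumps can fit the pattern.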
Evaluation metrics
Confusion matrix
• Accuracy, Precision, Recall
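The three metrics follow directly from the four cells of the confusion matrix. Below is a small Python sketch with hypothetical labels, where 1 marks a churner and 0 a non-churner.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(actual, predicted, positive=1):
    """Accuracy, precision and recall derived from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical ground truth and model predictions for ten customers.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(confusion_counts(actual, predicted))
print(metrics(actual, predicted))
```

With imbalanced churn data, accuracy alone can look good even when few churners are caught, which is why precision and recall are reported alongside it.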
Evaluation metrics
ROC curve and AUC
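A ROC curve plots the true positive rate against the false positive rate as the classification threshold is lowered, and AUC is the area under that curve. The Python sketch below builds the curve from hypothetical scores and integrates it with the trapezoidal rule; it assumes both classes are present in the labels.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    Labels are 1 (positive) or 0 (negative); both classes must be present.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical churn labels and predicted churn probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

curve = roc_points(labels, scores)
print(curve)
print(auc(curve))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of churners above non-churners.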
Data exploration
Modelling in R (1)
Logistic Regression
Modelling in R (1.1)
ROC and AUC
Modelling in R (2)
Support Vector Machines
Modelling in R (4)
BP Neural Networks
Conclusions
• Three machine learning algorithms for churn prediction were presented
• Logistic Regression and BP Neural Net with boosting gave the best results
• A good base for a successful broadcast campaign towards potential churners
What could work even better
• Implementation of more complex ML algorithms (Random Forest, Gradient
Boosting Machines, Deep NNs)
• Generate hybrid ensemble models
Thank you!
