MIS 637 Final Project
Predicting Churners in a Telecom Company
By
Rahul Bhatia
Student ID: 10398302
ABSTRACT
• "Churn rate" is a business term describing the rate at which customers leave or stop paying for a product or service. It is a critical figure in many businesses, because acquiring new customers is often much more costly than retaining existing ones (in some cases, 5 to 20 times more expensive).
• Understanding what keeps customers engaged is therefore very valuable: it is a logical foundation from which to develop retention strategies and roll out operational practices aimed at keeping customers from walking out the door. Consequently, companies are increasingly interested in better churn-detection techniques, and many look to data mining and machine learning for new and creative approaches.
CROSS-INDUSTRY STANDARD PROCESS FOR DATA MINING (CRISP-DM): 6 PHASES
• Business understanding phase
• Data understanding phase
• Data preparation phase
• Modeling phase
• Evaluation phase
• Deployment phase
Business Understanding
Profound Question:
For this project, I obtained a long-standing customer dataset from a telecom (mobile) company; the goal is to predict whether its customers will churn. The objective of this project is to build a model, learned from historical data, that identifies churners in the company.
Objective:
The classification goal is to derive rules that predict whether a customer will churn (target variable: churn) using the KNN and C4.5 algorithms, and to compare the accuracy of the two models.
Accomplishments:
Using this model, we can make churn prediction more efficient by identifying the main variables that drive churn, and obtain a more rational estimate of which customers are potential churners that we should contact first.
Data Source: This dataset was used in the yhat blog post “Predicting customer churn with scikit-learn” by Eric Chiang.
Data set details:
• The data is straightforward. Each row represents a subscribing telephone customer. Each column contains a customer attribute such as phone number, call minutes used during different times of day, charges incurred for services, lifetime account duration, and whether or not the customer is still a customer. The original dataset contains a total of 3333 rows with 1 dependent variable and 20 independent variables.
Data Understanding
Sample Data
Data Understanding
Attribute Description:
Data Understanding
Attribute Description (continued):
Data Preparation
Data Cleaning and Transformations:
Handling Missing Values and Identifying Outliers:
No missing values or outliers were found in the original data.
Normalization:
Z-score normalization was performed on the input variables Account Length, Number of Voice Mail Messages, Total Day Minutes, Total Day Calls, Total Evening Minutes, Total Evening Calls, Total Night Minutes, Total Night Calls, Total International Minutes, Total International Calls and Customer Service Calls.
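Z-score normalization rescales each variable to mean 0 and standard deviation 1, z = (x − mean) / std, so that variables measured in minutes and in call counts contribute comparably to a distance-based method such as KNN. A minimal Python sketch, assuming the population standard deviation (the slides do not show SPSS Modeler's exact formula; a sample standard deviation would change the values only slightly). The input values are hypothetical.

```python
from statistics import mean, pstdev

def z_score(values):
    """Return (x - mean) / std for each value, using the population std (an assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Hypothetical Customer Service Calls for five customers
calls = [1, 1, 2, 3, 4]
print([round(v, 3) for v in z_score(calls)])
```

In practice the mean and standard deviation are usually computed on the training partition only and then reused for the test partition, so that no test information leaks into the model.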
Data Preparation
• Attribute Selection:
- Attributes State, Area Code and Phone Number were dropped from the model because they are not needed for churn prediction.
- Attributes Total Day Charge, Total Evening Charge, Total Night Charge and Total International Charge were also dropped from the model because they are highly correlated with Total Day Minutes, Total Evening Minutes, Total Night Minutes and Total International Minutes, respectively.
Data Preparation
• Correlation:
Strong correlation between Day Minutes and Day Charge.
Data Preparation
• Correlation:
Strong correlation between Evening Minutes and Evening Charge.
Data Preparation
• Correlation:
Strong correlation between Night Minutes and Night Charge.
Data Preparation
• Correlation:
Strong correlation between International Minutes and International Charge.
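The correlation check on these slides can be sketched with Pearson's correlation coefficient. This is a minimal sketch: the slides do not say which correlation measure or cut-off was used, so Pearson's r and the 0.9 threshold are assumptions, and the minute values and the flat per-minute rate used to build the charges are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical day minutes; charge modelled as a flat per-minute rate (assumption)
day_minutes = [120.5, 180.0, 265.1, 97.3, 210.8]
day_charge = [round(m * 0.17, 2) for m in day_minutes]

THRESHOLD = 0.9  # hypothetical cut-off for "high correlation"
r = pearson_r(day_minutes, day_charge)
print(f"r = {r:.4f}")
if abs(r) > THRESHOLD:
    print("highly correlated: keep Day Minutes, drop Day Charge")
```

When one column is (almost) a linear function of another, keeping both adds no information for the classifier, which is the reasoning behind dropping the four charge columns.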
TRANSFORMED DATASET
The transformed dataset has 13 independent variables and 1 dependent variable (churn).
Sample Data
Data Preparation
Data Division:
After data cleaning, the data set of 3333 records was divided into 2 sets:
Training data set: 80% of the data (2666 records) is used to develop the model.
Testing data set: 20% of the data (667 records) is used to evaluate the model.
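A minimal sketch of an 80/20 random split in Python. The slides do not show how SPSS Modeler assigned records to partitions, so the shuffle-then-cut approach and the fixed seed are assumptions; record IDs stand in for the cleaned rows. Cutting 3333 records at int(0.8 × 3333) = 2666 reproduces the 2666/667 split above.

```python
import random

def train_test_split(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of the records with a fixed seed and cut it at train_fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

record_ids = range(3333)  # stand-in for the 3333 cleaned rows
train, test = train_test_split(record_ids)
print(len(train), len(test))
```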
Modeling
Which Algorithm?
The target variable (churn) is categorical (true/false) rather than continuous, so classification is the right choice.
Classification: builds a model from a training set whose class labels (the values of the classifying attribute) are known, then uses that model to predict the categorical class labels of new data.
Modeling
K-Nearest Neighbors algorithm:
The output is a class membership. An object is classified by a majority vote of its
neighbors, with the object being assigned to the class most common among
its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the
object is simply assigned to the class of that single nearest neighbor.
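The majority-vote rule can be sketched in a few lines of Python. This is an illustrative sketch, not SPSS Modeler's implementation: it assumes Euclidean distance on features that are already z-score normalized, and the training points are hypothetical. Ties in the vote go to the label that appears first among the sorted neighbors, i.e. the label of the closest tied neighbor.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k=5):
    """Return the majority label among the k training points closest to `query`."""
    # Sort training points by Euclidean distance to the query and keep the k nearest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    # most_common() orders equal counts by first occurrence, so the closer neighbor's label wins ties.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical normalized customers: (day minutes z-score, service calls z-score)
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
train_y = ["false", "false", "false", "true", "true", "true"]

print(knn_predict(train_X, train_y, (0.05, 0.1), k=3))  # false
print(knn_predict(train_X, train_y, (3.0, 3.1), k=3))   # true
```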
C4.5 algorithm: An extension of the ID3 algorithm. C4.5 recursively visits each decision node, selecting the optimal split, until no further splits are possible. It uses information gain (entropy reduction), normalized by the split information into the gain ratio, to select the optimal split.
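The split criterion can be illustrated with a small Python sketch of entropy, information gain and C4.5's gain ratio for one categorical attribute. It is a simplification: real C4.5 also handles continuous attributes with threshold splits, missing values and pruning, none of which appear here, and the International Plan and churn values are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature_values, labels):
    """Information gain of splitting on a categorical feature, divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy remaining after the split
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes that fragment the data into many small branches
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical: International Plan perfectly separates churners in this toy sample
plan = ["yes", "yes", "no", "no", "no", "no"]
churn = ["true", "true", "false", "false", "false", "false"]
print(gain_ratio(plan, churn))
```

At each node, the attribute with the highest gain ratio becomes the split; the normalization is what distinguishes C4.5's criterion from plain ID3 information gain.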
Modeling
Software:
SPSS Modeler 17.0 is a data mining and text analytics software application built by IBM.
It is an extensive predictive analytics platform designed to build predictive models, conduct analytic tasks and bring predictive intelligence to decisions by providing a range of advanced algorithms and techniques.
Modeling
Training Data
Modeling
Input & Output Variables
K-nearest neighbor Model
K = 5 was selected because it gave the minimum error.
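Selecting K by minimum error can be sketched as a loop over candidate values that scores each one on held-out records. The slides do not say which candidate range or error estimate SPSS Modeler used, so the candidates (1, 3, 5), the separate validation set and the toy data are assumptions. In the toy data one deliberately mislabeled point sits next to a validation customer, so K = 1 is fooled while larger K outvotes it.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k):
    """Majority label among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], query))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def error_rate(train_X, train_y, val_X, val_y, k):
    """Share of validation records misclassified with a given k."""
    wrong = sum(knn_predict(train_X, train_y, x, k) != y for x, y in zip(val_X, val_y))
    return wrong / len(val_y)

def best_k(train_X, train_y, val_X, val_y, candidates=(1, 3, 5)):
    """Return the k with the lowest validation error (ties favour the earlier candidate)."""
    errors = {k: error_rate(train_X, train_y, val_X, val_y, k) for k in candidates}
    return min(errors, key=errors.get), errors

# Hypothetical data; (0.05, 0.05) is a deliberately mislabeled "true" inside the "false" cluster
train_X = [(0, 0), (0.2, 0), (0, 0.2), (0.05, 0.05), (3, 3), (3.2, 3), (3, 3.2)]
train_y = ["false", "false", "false", "true", "true", "true", "true"]
val_X = [(0.06, 0.06), (3.1, 3.1)]
val_y = ["false", "true"]

best, errors = best_k(train_X, train_y, val_X, val_y)
print(best, errors)
```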
K-nearest neighbor Model
Summary
K-nearest neighbor: Test Dataset Applied to the Training Data Model
Evaluation
K-nearest neighbor Accuracy
87.71% accuracy was achieved.
Modeling (C4.5 algorithm)
The model was set up and executed with cross-validation on the training dataset, achieving 95.1% accuracy.
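The slides do not state how many folds were used, so the sketch below assumes 10-fold cross-validation over the 2666 training records. The contiguous fold layout is also an assumption (records are usually shuffled first), and `evaluate` is a hypothetical callback that trains on one index set and returns accuracy on the other. Each record lands in exactly one validation fold, and the cross-validation accuracy is the average of the per-fold accuracies.

```python
def k_fold_indices(n, folds=10):
    """Yield (train_indices, validation_indices) for each of `folds` contiguous folds."""
    if not 1 < folds <= n:
        raise ValueError("folds must be between 2 and n")
    # Spread the remainder so fold sizes differ by at most one record.
    sizes = [n // folds + (1 if i < n % folds else 0) for i in range(folds)]
    start = 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size

def cross_val_accuracy(evaluate, n, folds=10):
    """Average the accuracy returned by the hypothetical evaluate(train_idx, val_idx)."""
    scores = [evaluate(tr, va) for tr, va in k_fold_indices(n, folds)]
    return sum(scores) / len(scores)

splits = list(k_fold_indices(2666, folds=10))
print([len(val) for _, val in splits])
```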
C4.5: Test Dataset Applied to the Training Data Model
94.9% Accuracy
Evaluation
The C4.5 algorithm (94.9%) is preferred over the K-nearest neighbor algorithm (87.71%) because its model accuracy is higher.
C4.5 Algorithm:
Coincidence Matrix
The matrix shows high accuracy when predicting “false” but low accuracy when predicting “true”. Accuracy can be misleading when the data set is unbalanced, as in this project: the test set contains 558 “false” and 109 “true” records, so a classifier can easily be biased toward classifying every sample as “false”.
However, the lift and gains charts show that the model can still be used to identify “true” churners.
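The imbalance point can be made concrete with the test-set counts above. The sketch computes accuracy, precision and recall from confusion-matrix counts, treating “true” (churn) as the positive class. It is applied only to the degenerate classifier that labels all 667 test customers “false”, because the C4.5 coincidence-matrix cell counts are not reproduced in the text.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts ("true" = positive)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Degenerate baseline: every one of the 667 test customers (558 false, 109 true) labeled "false"
baseline = classification_metrics(tp=0, fp=0, fn=109, tn=558)
print(f"{baseline['accuracy']:.4f}")  # 0.8366
print(baseline["recall"])             # 0.0
```

A "model" that never finds a single churner still scores about 83.7% accuracy, which is why recall on the “true” class and the lift and gains charts are more informative than accuracy alone on this data.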
Evaluation
Lift: 6 times more effective than contacting customers at random
Lift is a measure of the effectiveness of a predictive model, calculated as the ratio between the results obtained with and without the model.
When contacting 10% of customers, we would expect to reach 10% of the positive churners without a model, but 60% of them using this model (a lift of 60% / 10% = 6).
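Lift at a given contact depth can be computed by ranking customers by predicted churn score. A minimal Python sketch; the scores and labels are hypothetical (1 = churner), and rounding the number of contacted customers to the nearest whole customer is an assumption. With the slide's figures, capturing 60% of churners in the top 10% gives 0.60 / 0.10 = 6.

```python
def lift_at(scores, labels, fraction):
    """Lift in the top `fraction` of customers ranked by predicted churn score.

    labels use 1 for a churner and 0 otherwise.
    """
    total_churners = sum(labels)
    if total_churners == 0:
        raise ValueError("need at least one churner")
    # Highest predicted churn score first
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    n_top = max(1, round(len(ranked) * fraction))
    captured_share = sum(ranked[:n_top]) / total_churners  # share of all churners reached
    contacted_share = n_top / len(ranked)                   # share of customers contacted
    return captured_share / contacted_share

scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(lift_at(scores, labels, 0.10))
print(lift_at(scores, labels, 0.40))
```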
Evaluation
Gains Chart
The y-axis shows the percentage of the total possible positive churners (“true”).
The x-axis shows the percentage of customers contacted.
Using this model, we need to contact only 50% of customers to reach 90% of the “true” churners.
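The points of a cumulative gains chart, and the contact depth needed to reach a target share of churners, can be computed as below. This is a sketch on hypothetical scores and labels (1 = churner), not the SPSS Modeler output; `contact_share_for(points, 0.9)` returns the same kind of quantity as the “contact 50% to reach 90%” reading above.

```python
def cumulative_gains(scores, labels):
    """Return (share of customers contacted, share of churners captured) after each customer."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    total_churners = sum(ranked)
    if total_churners == 0:
        raise ValueError("need at least one churner")
    points, found = [], 0
    for i, y in enumerate(ranked, start=1):
        found += y
        points.append((i / len(ranked), found / total_churners))
    return points

def contact_share_for(points, target_capture):
    """Smallest share of customers to contact so that the captured share reaches the target."""
    for contacted, captured in points:
        if captured >= target_capture:
            return contacted
    return 1.0

scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
points = cumulative_gains(scores, labels)
print(contact_share_for(points, 0.9))
```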
Evaluation (Variable Importance)
Day Minutes, Customer Service Calls and International Plan are the most important variables.
Evaluation
Conclusion:
We conclude that Day Minutes, Number of Customer Service Calls, International Plan, Evening Minutes, Number of International Calls and Voice Mail Plan are the most important variables for predicting churners.
Deployment
• Predicting churn is particularly important for businesses with subscription models, such as cell phone, cable, or merchant credit card processing plans.
• Since the model achieved high predictive performance, it can be used to predict churners in telecom companies with similar customer data, helping them prevent customers from churning by improving on the most important variables discussed earlier and saving campaign costs.
References
Data source:
Link to the dataset:
https://raw.githubusercontent.com/EricChiang/churn/master/data/churn.csv
Software:
http://www-01.ibm.com/software/analytics/spss/
Other References:
http://blog.yhathq.com/posts/predicting-customer-churn-with-sklearn.html
Thank you

More Related Content

What's hot

Customer Churn Analysis and Prediction
Customer Churn Analysis and PredictionCustomer Churn Analysis and Prediction
Customer Churn Analysis and PredictionSOUMIT KAR
 
Data analytics telecom churn final ppt
Data analytics telecom churn final ppt Data analytics telecom churn final ppt
Data analytics telecom churn final ppt Gunvansh Khanna
 
Churn Prediction in Practice
Churn Prediction in PracticeChurn Prediction in Practice
Churn Prediction in PracticeBigData Republic
 
Telecommunication Analysis (3 use-cases) with IBM watson analytics
Telecommunication Analysis (3 use-cases) with IBM watson analyticsTelecommunication Analysis (3 use-cases) with IBM watson analytics
Telecommunication Analysis (3 use-cases) with IBM watson analyticssheetal sharma
 
Customer Churn, A Data Science Use Case in Telecom
Customer Churn, A Data Science Use Case in TelecomCustomer Churn, A Data Science Use Case in Telecom
Customer Churn, A Data Science Use Case in TelecomChris Chen
 
Churn in the Telecommunications Industry
Churn in the Telecommunications IndustryChurn in the Telecommunications Industry
Churn in the Telecommunications Industryskewdlogix
 
Customer attrition and churn modeling
Customer attrition and churn modelingCustomer attrition and churn modeling
Customer attrition and churn modelingMariya Korsakova
 
Churn prediction data modeling
Churn prediction data modelingChurn prediction data modeling
Churn prediction data modelingPierre Gutierrez
 
Customer churn classification using machine learning techniques
Customer churn classification using machine learning techniquesCustomer churn classification using machine learning techniques
Customer churn classification using machine learning techniquesSindhujanDhayalan
 
IRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom IndustryIRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom IndustryIRJET Journal
 
Telco churn presentation
Telco churn presentationTelco churn presentation
Telco churn presentationAditya Bahl
 
Customer Churn Prevention Powerpoint Presentation Slides
Customer Churn Prevention Powerpoint Presentation SlidesCustomer Churn Prevention Powerpoint Presentation Slides
Customer Churn Prevention Powerpoint Presentation SlidesSlideTeam
 
A case study on churn analysis1
A case study on churn analysis1A case study on churn analysis1
A case study on churn analysis1Amit Kumar
 
Bank churn with Data Science
Bank churn with Data ScienceBank churn with Data Science
Bank churn with Data ScienceCarolyn Knight
 
Ways to Reduce the Customer Churn Rate
Ways to Reduce the Customer Churn RateWays to Reduce the Customer Churn Rate
Ways to Reduce the Customer Churn RateFORMCEPT
 

What's hot (20)

Customer Churn Analysis and Prediction
Customer Churn Analysis and PredictionCustomer Churn Analysis and Prediction
Customer Churn Analysis and Prediction
 
Data analytics telecom churn final ppt
Data analytics telecom churn final ppt Data analytics telecom churn final ppt
Data analytics telecom churn final ppt
 
Telcom churn .pptx
Telcom churn .pptxTelcom churn .pptx
Telcom churn .pptx
 
Churn Prediction in Practice
Churn Prediction in PracticeChurn Prediction in Practice
Churn Prediction in Practice
 
Telecom Churn Prediction
Telecom Churn PredictionTelecom Churn Prediction
Telecom Churn Prediction
 
Telecommunication Analysis (3 use-cases) with IBM watson analytics
Telecommunication Analysis (3 use-cases) with IBM watson analyticsTelecommunication Analysis (3 use-cases) with IBM watson analytics
Telecommunication Analysis (3 use-cases) with IBM watson analytics
 
Customer Churn, A Data Science Use Case in Telecom
Customer Churn, A Data Science Use Case in TelecomCustomer Churn, A Data Science Use Case in Telecom
Customer Churn, A Data Science Use Case in Telecom
 
Churn in the Telecommunications Industry
Churn in the Telecommunications IndustryChurn in the Telecommunications Industry
Churn in the Telecommunications Industry
 
Customer attrition and churn modeling
Customer attrition and churn modelingCustomer attrition and churn modeling
Customer attrition and churn modeling
 
Telecom customer churn prediction
Telecom customer churn predictionTelecom customer churn prediction
Telecom customer churn prediction
 
Churn prediction data modeling
Churn prediction data modelingChurn prediction data modeling
Churn prediction data modeling
 
Customer churn classification using machine learning techniques
Customer churn classification using machine learning techniquesCustomer churn classification using machine learning techniques
Customer churn classification using machine learning techniques
 
IRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom IndustryIRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom Industry
 
Telco churn presentation
Telco churn presentationTelco churn presentation
Telco churn presentation
 
Customer churn prediction in banking
Customer churn prediction in bankingCustomer churn prediction in banking
Customer churn prediction in banking
 
Customer Churn Prevention Powerpoint Presentation Slides
Customer Churn Prevention Powerpoint Presentation SlidesCustomer Churn Prevention Powerpoint Presentation Slides
Customer Churn Prevention Powerpoint Presentation Slides
 
A case study on churn analysis1
A case study on churn analysis1A case study on churn analysis1
A case study on churn analysis1
 
Bank churn with Data Science
Bank churn with Data ScienceBank churn with Data Science
Bank churn with Data Science
 
Ways to Reduce the Customer Churn Rate
Ways to Reduce the Customer Churn RateWays to Reduce the Customer Churn Rate
Ways to Reduce the Customer Churn Rate
 
Predicting the e-commerce churn
Predicting the e-commerce churnPredicting the e-commerce churn
Predicting the e-commerce churn
 

Viewers also liked

Solving Real Life Problems using Data Science Part - 1
Solving Real Life Problems using Data Science Part - 1Solving Real Life Problems using Data Science Part - 1
Solving Real Life Problems using Data Science Part - 1Sohom Ghosh
 
Pragmatic machine learning for the real world
Pragmatic machine learning for the real worldPragmatic machine learning for the real world
Pragmatic machine learning for the real worldLouis Dorard
 
Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Maximizing a churn campaign’s profitability with cost sensitive predictive an...Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Maximizing a churn campaign’s profitability with cost sensitive predictive an...Alejandro Correa Bahnsen, PhD
 
Telecom Churn Prediction from Customer Usage Data (Igor Tymchuk)
Telecom Churn Prediction from Customer Usage Data (Igor Tymchuk)Telecom Churn Prediction from Customer Usage Data (Igor Tymchuk)
Telecom Churn Prediction from Customer Usage Data (Igor Tymchuk)Lviv IT School
 
Telco Churn Roi V3
Telco Churn Roi V3Telco Churn Roi V3
Telco Churn Roi V3hkaul
 
Predicting churn in telco industry: machine learning approach - Marko Mitić
 Predicting churn in telco industry: machine learning approach - Marko Mitić Predicting churn in telco industry: machine learning approach - Marko Mitić
Predicting churn in telco industry: machine learning approach - Marko MitićInstitute of Contemporary Sciences
 
Pragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML SpainPragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML SpainLouis Dorard
 
Logistic Regression: Behind the Scenes
Logistic Regression: Behind the ScenesLogistic Regression: Behind the Scenes
Logistic Regression: Behind the ScenesChris White
 
2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practice2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practiceAlejandro Correa Bahnsen, PhD
 
Analytics, KPIs for effective Churn & Loyalty management
Analytics, KPIs for effective Churn & Loyalty managementAnalytics, KPIs for effective Churn & Loyalty management
Analytics, KPIs for effective Churn & Loyalty managementEhtisham Rao
 
Presentation Churn Management
Presentation Churn ManagementPresentation Churn Management
Presentation Churn Managementfarhanmajeed
 
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...BAINIDA
 
Combining Logit and Ensemble Modeling for Increased Customer Churn Detection
Combining Logit and Ensemble Modeling  for Increased Customer Churn DetectionCombining Logit and Ensemble Modeling  for Increased Customer Churn Detection
Combining Logit and Ensemble Modeling for Increased Customer Churn DetectionPython Predictions
 
Beyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modelingBeyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modelingPierre Gutierrez
 
Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)Turi, Inc.
 

Viewers also liked (18)

Solving Real Life Problems using Data Science Part - 1
Solving Real Life Problems using Data Science Part - 1Solving Real Life Problems using Data Science Part - 1
Solving Real Life Problems using Data Science Part - 1
 
Pragmatic machine learning for the real world
Pragmatic machine learning for the real worldPragmatic machine learning for the real world
Pragmatic machine learning for the real world
 
Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Maximizing a churn campaign’s profitability with cost sensitive predictive an...Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Maximizing a churn campaign’s profitability with cost sensitive predictive an...
 
Telecom Churn Prediction from Customer Usage Data (Igor Tymchuk)
Telecom Churn Prediction from Customer Usage Data (Igor Tymchuk)Telecom Churn Prediction from Customer Usage Data (Igor Tymchuk)
Telecom Churn Prediction from Customer Usage Data (Igor Tymchuk)
 
Telco Churn Roi V3
Telco Churn Roi V3Telco Churn Roi V3
Telco Churn Roi V3
 
Predicting churn in telco industry: machine learning approach - Marko Mitić
 Predicting churn in telco industry: machine learning approach - Marko Mitić Predicting churn in telco industry: machine learning approach - Marko Mitić
Predicting churn in telco industry: machine learning approach - Marko Mitić
 
Pragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML SpainPragmatic Machine Learning @ ML Spain
Pragmatic Machine Learning @ ML Spain
 
Logistic Regression: Behind the Scenes
Logistic Regression: Behind the ScenesLogistic Regression: Behind the Scenes
Logistic Regression: Behind the Scenes
 
2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practice2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practice
 
Analytics, KPIs for effective Churn & Loyalty management
Analytics, KPIs for effective Churn & Loyalty managementAnalytics, KPIs for effective Churn & Loyalty management
Analytics, KPIs for effective Churn & Loyalty management
 
Presentation Churn Management
Presentation Churn ManagementPresentation Churn Management
Presentation Churn Management
 
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Combining Logit and Ensemble Modeling for Increased Customer Churn Detection
Combining Logit and Ensemble Modeling  for Increased Customer Churn DetectionCombining Logit and Ensemble Modeling  for Increased Customer Churn Detection
Combining Logit and Ensemble Modeling for Increased Customer Churn Detection
 
Beyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modelingBeyond Churn Prediction : An Introduction to uplift modeling
Beyond Churn Prediction : An Introduction to uplift modeling
 
Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)
 
Churn management
Churn managementChurn management
Churn management
 
Churn Predictive Modelling
Churn Predictive ModellingChurn Predictive Modelling
Churn Predictive Modelling
 

Similar to MIS637_Final_Project_Rahul_Bhatia

Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptxAniket Patil
 
Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptxpatilaniket2418
 
Data Mining to Classify Telco Churners
Data Mining to Classify Telco ChurnersData Mining to Classify Telco Churners
Data Mining to Classify Telco ChurnersMohitMhapuskar
 
Data Mining on Customer Churn Classification
Data Mining on Customer Churn ClassificationData Mining on Customer Churn Classification
Data Mining on Customer Churn ClassificationKaushik Rajan
 
Prediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPrediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPranov Mishra
 
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...IRJET Journal
 
churn_detection.pptx
churn_detection.pptxchurn_detection.pptx
churn_detection.pptxDhanuDhanu49
 
Egypt hackathon 2014 analytics & spss session
Egypt hackathon 2014   analytics & spss sessionEgypt hackathon 2014   analytics & spss session
Egypt hackathon 2014 analytics & spss sessionM Baddar
 
Leveragin research, behavioural and demeographic data
Leveragin research, behavioural and demeographic dataLeveragin research, behavioural and demeographic data
Leveragin research, behavioural and demeographic dataMRS
 
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...IRJET Journal
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptopRising Media, Inc.
 
Unfolding the Credit Card Fraud Detection Technique by Implementing SVM Algor...
Unfolding the Credit Card Fraud Detection Technique by Implementing SVM Algor...Unfolding the Credit Card Fraud Detection Technique by Implementing SVM Algor...
Unfolding the Credit Card Fraud Detection Technique by Implementing SVM Algor...IRJET Journal
 
WP_ContactCenterPlanningMethodologies_whitepaper_laser
WP_ContactCenterPlanningMethodologies_whitepaper_laserWP_ContactCenterPlanningMethodologies_whitepaper_laser
WP_ContactCenterPlanningMethodologies_whitepaper_laserBayu Wicaksono
 
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.Souma Maiti
 
Project crm submission sonali
Project crm submission sonaliProject crm submission sonali
Project crm submission sonaliSonali Gupta
 
Black_Friday_Sales_Trushita
Black_Friday_Sales_TrushitaBlack_Friday_Sales_Trushita
Black_Friday_Sales_TrushitaTrushita Redij
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Roger Barga
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An OverviewMachinePulse
 

Similar to MIS637_Final_Project_Rahul_Bhatia (20)

Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptx
 
Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptx
 
Data Mining to Classify Telco Churners
Data Mining to Classify Telco ChurnersData Mining to Classify Telco Churners
Data Mining to Classify Telco Churners
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Data Mining on Customer Churn Classification
Data Mining on Customer Churn ClassificationData Mining on Customer Churn Classification
Data Mining on Customer Churn Classification
 
Prediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPrediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom Industry
 
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
 
churn_detection.pptx
churn_detection.pptxchurn_detection.pptx
churn_detection.pptx
 
Egypt hackathon 2014 analytics & spss session
Egypt hackathon 2014   analytics & spss sessionEgypt hackathon 2014   analytics & spss session
Egypt hackathon 2014 analytics & spss session
 
Leveragin research, behavioural and demeographic data
Leveragin research, behavioural and demeographic dataLeveragin research, behavioural and demeographic data
Leveragin research, behavioural and demeographic data
 
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
 
Unfolding the Credit Card Fraud Detection Technique by Implementing SVM Algor...
Unfolding the Credit Card Fraud Detection Technique by Implementing SVM Algor...Unfolding the Credit Card Fraud Detection Technique by Implementing SVM Algor...
Unfolding the Credit Card Fraud Detection Technique by Implementing SVM Algor...
 
WP_ContactCenterPlanningMethodologies_whitepaper_laser
WP_ContactCenterPlanningMethodologies_whitepaper_laserWP_ContactCenterPlanningMethodologies_whitepaper_laser
WP_ContactCenterPlanningMethodologies_whitepaper_laser
 
Bank Customer Churn Prediction- Saurav Singh.pptx
Bank Customer Churn Prediction- Saurav Singh.pptxBank Customer Churn Prediction- Saurav Singh.pptx
Bank Customer Churn Prediction- Saurav Singh.pptx
 
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
 
Project crm submission sonali
Project crm submission sonaliProject crm submission sonali
Project crm submission sonali
 
Black_Friday_Sales_Trushita
Black_Friday_Sales_TrushitaBlack_Friday_Sales_Trushita
Black_Friday_Sales_Trushita
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
 

MIS637_Final_Project_Rahul_Bhatia

  • 1. MIS 637 Final ProjectMIS 637 Final Project Predicting Churners in aPredicting Churners in a Telecom CompanyTelecom Company By Rahul Bhatia Student Id : 10398302
  • 2. ABSTRACTABSTRACT • "Churn Rate" is a business term describing the rate at which customers leave or cease paying for a product or service. It's a critical figure in many businesses, as it's often the case that acquiring new customers is a lot more costly than retaining existing ones (in some cases, 5 to 20 times more expensive). • Understanding what keeps customers engaged, therefore, is incredibly valuable, as it is a logical foundation from which to develop retention strategies and roll out operational practices aimed to keep customers from walking out the door. Consequently, there's growing interest among companies to develop better churn-detection techniques, leading many to look to data mining and machine learning for new and creative approaches. 1
  • 3. CROSS-INDUSTRY STANDARD PROCESS (CRISP–DM)- 6 Phases •Business understanding phase. •Data understanding phase •Data preparation phase •Modeling phase •Evaluation phase •Deployment phase 3
  • 4. Business UnderstandingBusiness Understanding Profound Question: For this project, I have obtained a longstanding telecom customer dataset of a Telecom (Mobile) company which aims to predict whether its customers will churn or not. The objective of this competition is to build a model, learned using historical data, that will determine churners in the telecom company. Objective: The classification goal is to derive rules and predict whether a customer will churn or not by using KNN and C4.5(variable churn) algorithms and compare both the model accuracies. Accomplishments: By using this model, we can increase churn prediction efficiency by identifying the main variables which result in churning, and have a more rational estimate about which customers are potential churners that we should contact first. 4
  • 5. Data Source: This dataset was used in yhat blog post “Predicting customer churn with scikit-learn” by Eric Chiang. Data set details: •The data is straightforward. Each row represents a subscribing telephone customer. Each column contains customer attributes such as phone number, call minutes used during different times of day, charges incurred for services, lifetime account duration, and whether or not the customer is still a customer. The original dataset contains a total of 3333 rows with 1 dependent variable and 20 independent variables. 5 Data Understanding
  • 9. Data Preparation Data Cleaning and Transformations: Handling missing values and identifying outliers: No missing values or outliers were found in the original data. Normalization: Z-score normalization was performed on the input variables Account Length, Number of Voice Mail Messages, Total Day Minutes, Total Day Calls, Total Evening Minutes, Total Evening Calls, Total Night Minutes, Total Night Calls, Total International Minutes, Total International Calls and Customer Service Calls.
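A minimal pure-Python sketch of the two checks on this slide, assuming the population (not sample) standard deviation and the common |z| > 3 rule of thumb for outliers; neither choice is stated on the slide, and SPSS Modeler's own settings may differ. The helper names `z_scores` and `outlier_indices` and the example column are hypothetical.

```python
import math
from typing import List, Optional


def z_scores(values: List[Optional[float]]) -> List[float]:
    """Z-score normalize one column: (x - mean) / standard deviation."""
    if any(v is None for v in values):
        # Missing values have to be handled (dropped or imputed) first.
        raise ValueError("column contains missing values")
    n = len(values)
    mean = sum(values) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # a constant column has nothing to scale
    return [(v - mean) / std for v in values]


def outlier_indices(z: List[float], threshold: float = 3.0) -> List[int]:
    """Positions whose |z| exceeds the threshold (assumed rule of thumb: 3)."""
    return [i for i, value in enumerate(z) if abs(value) > threshold]


# Hypothetical example column (not taken from the churn dataset).
day_minutes = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_scores(day_minutes)
print(z)                   # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(outlier_indices(z))  # []
```

Normalizing matters mainly for the KNN model, whose distance calculation would otherwise be dominated by large-valued columns such as minutes.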
  • 10. Data Preparation • Attribute Selection: • The attributes State, Area Code and Phone Number were dropped from the model because they are not needed for churn prediction. • The attributes Total Day Charge, Total Evening Charge, Total Night Charge and Total International Charge were also dropped, because they are highly correlated with Total Day Minutes, Total Evening Minutes, Total Night Minutes and Total International Minutes respectively.
  • 11. Data Preparation • Correlation: Strong correlation between Day Minutes and Day Charge.
  • 12. Data Preparation • Correlation: Strong correlation between Evening Minutes and Evening Charge.
  • 13. Data Preparation • Correlation: Strong correlation between Night Minutes and Night Charge.
  • 14. Data Preparation • Correlation: Strong correlation between International Minutes and International Charge.
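The correlation check behind these slides can be sketched with a plain Pearson coefficient. This is an illustration rather than the SPSS Modeler procedure; the 0.95 cut-off, the helper names and the sample values (charge modeled as a fixed multiple of minutes) are all assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def highly_correlated(columns: Dict[str, List[float]],
                      threshold: float = 0.95) -> List[Tuple[str, str, float]]:
    """Every column pair whose |r| reaches the (assumed) threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical values: a flat per-minute tariff makes charge a linear function of minutes.
minutes = [100.0, 200.0, 150.0, 300.0]
columns = {
    "Day Minutes": minutes,
    "Day Charge": [m * 0.17 for m in minutes],
    "Day Calls": [90.0, 110.0, 100.0, 95.0],
}
pairs = highly_correlated(columns)
# Keep the first column of each redundant pair and drop the second.
to_drop = {b for _, b, _ in pairs}
print(to_drop)  # {'Day Charge'}
```

When charge is an exact multiple of minutes, r is 1 up to rounding, so the charge column adds no information the model does not already have from minutes.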
  • 15. TRANSFORMED DATASET The transformed dataset has 13 independent variables and 1 dependent variable (churn). Sample Data
  • 16. Data Preparation Data Division: After data cleaning, the data set of 3333 records is divided into 2 sets. Training data set: 80% of the data (2666 records) is used to develop the model. Testing data set: 20% of the data (667 records) is used to evaluate the model.
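The slide does not say whether the partition was random or stratified by churn. A minimal sketch, assuming a simple random 80/20 split with a fixed seed; `split_records` is a hypothetical helper, and integer row ids stand in for the customer records.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_records(records: Sequence[T], train_fraction: float = 0.8,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records, then cut it into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible partition
    cut = int(len(shuffled) * train_fraction)  # 3333 * 0.8 = 2666.4 -> 2666
    return shuffled[:cut], shuffled[cut:]


train, test = split_records(range(3333))  # stand-ins for the 3333 customer rows
print(len(train), len(test))  # 2666 667
```

Truncating 2666.4 down to 2666 explains why the testing set gets the remaining 667 records.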
  • 17. Modeling Algorithm? The target variable is categorical (true, false) rather than continuous, so classification is the right choice. Classification: predicts categorical class labels; it builds a model from the training set and the values of a class attribute, and uses that model to classify new data.
  • 18. Modeling K-Nearest Neighbors algorithm: The output is a class membership. An object is classified by a majority vote of its neighbors: it is assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of its single nearest neighbor. C4.5 algorithm: An extension of the ID3 algorithm. C4.5 recursively visits each decision node, selecting the optimal split, until no further splits are possible. It selects the optimal split using information gain (entropy reduction), normalized into the gain ratio.
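The split criterion described above can be sketched for a categorical attribute. This covers only the scoring step; real C4.5 also handles numeric attributes (via thresholds), missing values and pruning, which are omitted here. Function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(labels: Sequence[Hashable], attribute: Sequence[Hashable]) -> float:
    """Entropy reduction obtained by splitting the labels on a categorical attribute."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for label, value in zip(labels, attribute):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(labels: Sequence[Hashable], attribute: Sequence[Hashable]) -> float:
    """Information gain divided by the split information of the attribute."""
    split_info = entropy(attribute)
    return information_gain(labels, attribute) / split_info if split_info > 0 else 0.0


# Hypothetical toy data: does "International Plan" separate churners?
churn = [True, True, False, False]
intl_plan = ["yes", "yes", "no", "no"]
print(information_gain(churn, intl_plan), gain_ratio(churn, intl_plan))  # 1.0 1.0
```

Dividing by split information penalizes attributes with many distinct values (such as Phone Number), which plain information gain would otherwise favor.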
  • 19. Modeling Software: SPSS Modeler 17.0 is a data mining and text analytics software application built by IBM. It is an extensive predictive analytics platform designed to build predictive models, conduct analytic tasks and bring predictive intelligence to decisions by providing a range of advanced algorithms and techniques.
  • 22. K-nearest neighbor Model K = 5 was selected because it gives the minimum error.
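How SPSS Modeler searches for k is not shown on the slide. The idea of picking the k with the minimum error can be sketched as follows, assuming Euclidean distance on z-scored features and a separate validation set. The candidate list, helper names and toy points are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (z-scored feature vector, class label)


def knn_predict(train: List[Example], query: Sequence[float], k: int) -> str:
    """Majority vote among the k training examples nearest to the query."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, most_common keeps first-encountered order, i.e. the class
    # whose first member appears earliest in the distance-sorted list wins.
    return votes.most_common(1)[0][0]


def error_rate(train: List[Example], validation: List[Example], k: int) -> float:
    """Fraction of validation examples the k-NN vote gets wrong."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in validation)
    return wrong / len(validation)


def choose_k(train: List[Example], validation: List[Example],
             candidates: Sequence[int] = (1, 3, 5, 7, 9)) -> int:
    """Candidate k with the lowest validation error (the smaller k wins ties)."""
    return min(candidates, key=lambda k: error_rate(train, validation, k))


# Hypothetical toy data: two clusters plus one noisy "true" point in the "false" cluster.
train = [((0, 0), "false"), ((0, 1), "false"), ((1, 0), "false"), ((1, 1), "false"),
         ((0.5, 0.5), "true"),
         ((10, 10), "true"), ((10, 11), "true"), ((11, 10), "true"), ((11, 11), "true")]
validation = [((0.4, 0.4), "false"), ((0.6, 0.6), "false"), ((10.5, 10.5), "true")]
print(error_rate(train, validation, 1), choose_k(train, validation))  # 0.6666666666666666 3
```

With k = 1 the single noisy neighbor decides the vote; a slightly larger k outvotes it, which is the usual reason the error curve dips at a small k > 1.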
  • 23. K-nearest neighbor Model Summary
  • 24. Evaluation K-nearest neighbor: Test Dataset on Training Data Model
  • 25. K-nearest neighbor Accuracy: 87.71% accuracy was achieved.
  • 26. Modeling (C4.5 algorithm) The model was set up and executed with cross-validation on the training dataset, achieving 95.1% accuracy.
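The number of folds used is not stated on the slide. The sketch below assumes 10-fold cross-validation over the 2666 training records; `fit_and_score` is a hypothetical callback where a C4.5 (or any) model would be trained on one index set and scored on the other.

```python
import random
from typing import Callable, List, Tuple


def k_fold_indices(n: int, folds: int = 10,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """(train_idx, held_out_idx) pairs; every index is held out exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    parts = [idx[i::folds] for i in range(folds)]  # near-equal interleaved folds
    return [([j for p, part in enumerate(parts) if p != i for j in part], held_out)
            for i, held_out in enumerate(parts)]


def cross_val_accuracy(n: int,
                       fit_and_score: Callable[[List[int], List[int]], float],
                       folds: int = 10, seed: int = 0) -> float:
    """Mean of the per-fold accuracies reported by a user-supplied callback."""
    scores = [fit_and_score(train_idx, held_out)
              for train_idx, held_out in k_fold_indices(n, folds, seed)]
    return sum(scores) / len(scores)


splits = k_fold_indices(2666)  # the 2666 training records
print([len(held_out) for _, held_out in splits])
# [267, 267, 267, 267, 267, 267, 266, 266, 266, 266]
```

A cross-validated estimate (95.1%) slightly above the held-out test accuracy (94.9%) is typical, since the test set is never seen during model building.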
  • 27. C4.5 Test Dataset on Training Data Model: 94.9% accuracy
  • 28. Evaluation The C4.5 algorithm (94.9%) is preferred over the K-nearest neighbor algorithm (87.71%) because its model accuracy is higher. C4.5 algorithm: The coincidence matrix shows high accuracy when predicting “false” but low accuracy when predicting “true”. This is because a classifier often yields misleading results when the data set is unbalanced: in this project the test set has 558 “false” and 109 “true” records, so the classifier can easily be biased toward classifying all samples as “false”. However, we can still use this model to predict “true” cases, as the lift and gains charts show.
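The imbalance point can be made concrete with the test-set class counts from this slide: a "model" that always answers “false” already reaches about 83.7% accuracy while finding no churners, so recall on the “true” class is worth reporting next to accuracy. `accuracy_and_recall` is a hypothetical helper, and the always-false predictions are illustrative, not the C4.5 output.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def accuracy_and_recall(actual: Sequence[Hashable], predicted: Sequence[Hashable],
                        positive: Hashable = True) -> Tuple[float, float]:
    """Overall accuracy plus recall (hit rate) on the positive class."""
    cells = Counter(zip(actual, predicted))  # coincidence matrix as (actual, predicted) counts
    accuracy = sum(count for (a, p), count in cells.items() if a == p) / len(actual)
    positives = sum(count for (a, _), count in cells.items() if a == positive)
    recall = cells[(positive, positive)] / positives if positives else 0.0
    return accuracy, recall


# Test-set class counts quoted on the slide: 558 "false", 109 "true".
actual = [False] * 558 + [True] * 109
always_false = [False] * len(actual)  # a baseline that never predicts churn
acc, rec = accuracy_and_recall(actual, always_false)
print(f"accuracy {acc:.1%}, churner recall {rec:.0%}")  # accuracy 83.7%, churner recall 0%
```

Seen against this 83.7% baseline, the KNN and C4.5 accuracies should be judged by how much they improve on it, not by their absolute values alone.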
  • 29. Evaluation Lift: 6 times better than no model. Lift is a measure of the effectiveness of a predictive model, calculated as the ratio between the results obtained with and without the predictive model. When contacting 10% of customers, using no model we should reach 10% of the actual churners, whereas using this model we should reach 60% of them.
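The lift of 6 follows directly from the two percentages on this slide, written out at the 10% contact depth:

$$
\text{Lift}(10\%) = \frac{\%\ \text{of churners reached with the model}}{\%\ \text{of churners reached without a model}} = \frac{60\%}{10\%} = 6
$$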
  • 30. Evaluation Gains Chart: The y-axis shows the percentage of all actual churners (“true”) captured. The x-axis shows the percentage of customers contacted. Using this model, we need to contact only 50% of customers to reach 90% of the “true” churners.
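A gains curve like this one can be computed by ranking customers by the model's churn score (for example, its confidence in “true”) and counting how many actual churners fall in the top slice. The scores below are hypothetical; the slide's 50% to 90% figure would be one point on such a curve. Tied scores keep their input order.

```python
from typing import Sequence


def cumulative_gain(scores: Sequence[float], actual: Sequence[bool], percent: int) -> float:
    """Share of all actual churners found when contacting the top `percent`% of customers."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    contacted = (len(ranked) * percent + 99) // 100  # round the contact count up
    found = sum(1 for _, is_churner in ranked[:contacted] if is_churner)
    total = sum(1 for is_churner in actual if is_churner)
    return found / total if total else 0.0


# Hypothetical churn scores for 10 customers, 3 of whom actually churned.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [True, True, False, True, False, False, False, False, False, False]
curve = [(p, cumulative_gain(scores, actual, p)) for p in range(0, 101, 10)]
print(curve)
```

Integer percentages avoid floating-point surprises in the contact count; dividing a point's gain by its contact fraction gives the lift at that depth.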
  • 31. Evaluation (Variable Importance) Day Minutes, Customer Service Calls and International Plan are the most important variables.
  • 32. Evaluation Conclusion: Day Minutes, Number of Customer Service Calls, International Plan, Evening Minutes, Number of International Calls and Voice Mail Plan are the most important variables in predicting churners.
  • 33. Deployment • Predicting churn is particularly important for businesses with subscription models, such as cell phone, cable, or merchant credit card processing plans. • Since the model achieved high predictive performance, it can be used to predict churners in a telecom company and help the company keep its customers from churning by improving on the most important variables discussed earlier, while also saving campaign costs.
  • 34. References • Data source: https://raw.githubusercontent.com/EricChiang/churn/master/data/churn.csv • Software: http://www-01.ibm.com/software/analytics/spss/ • Other references: http://blog.yhathq.com/posts/predicting-customer-churn-with-sklearn.html

Editor's Notes

  1. To evaluate the trained model, we use the test dataset.