SlideShare a Scribd company logo
1 of 25
TELECOM CUSTOMER
CHURN
PREDICTION
By: ANIKET PATIL
INTRODUCTION
The data set used in this project is available in
the Kaggle and contains nineteen columns
(independent variables) that indicate
the characteristics of the clients of a fictional
telecommunications corporation. The Churn
column (response variable) indicates whether the
customer departed within the last month or not.
The class No includes the clients that did not leave
the company last month, while the class YES
contains the clients that decided to terminate
their relations with the company. The objective of
the analysis is to obtain the relation between the
customer’s characteristics and the churn.
PROBLEM STATEMENT
• Predicting customer churn is critical for telecommunication
companies to be able to effectively retain customers. It is
more costly to acquire new customers than to retain existing
ones.
• For this reason, large telecommunications corporations are
seeking to develop models to predict which customers are
more likely to change and take actions accordingly.
• In this article, we build a model to predict how likely a
customer will churn by analyzing its characteristics: (1)
demographic information, (2) account information, and (3)
services information.
• The objective is to obtain a data-driven solution that will
allow us to reduce churn rates and, as a consequence, to
increase customer satisfaction and corporation revenue.
Data Overview
• The data set used in this project is available in the Kaggle.
• It contains nineteen columns (independent variables) with 7043 rows that indicate the characteristics of the clients.
• The Churn column (response variable) indicates whether the customer departed within the last month or not.
• The class No includes the clients that did not leave the company last month, while the class Yes contains the
clients that decided to terminate their relations with the company.
• The objective of the analysis is to obtain the relation between the customer’s characteristics and the churn.
Data Preprocessing
• The data set contains 19 independent variables, which can be classified into 3 groups:
1. Demographic Information: gender, SeniorCitizen, Partner, Dependents
2. Customer Account Information: tenure, Contract, PaperlessBilling, PaymentMethod,
MontlyCharges, TotalCharges
3. Services Information: PhoneService, MultipleLines, InternetServices, OnlineSecurity,
OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies
• Exploratory Data Analysis and Data Cleaning
• Missing values and data types (pandas.DataFrame.info)
• Remove customerID column
• Payment method denominations
• Data Visualization
• Response Variable (Bar Plot)
• Demographic Information (Bar Plot)
• Customer Account Information — Categorical variables & Numerical variables (Bar Plot & Histogram)
• Services Information (Bar Plot)
• Checking Outliers
The following bar plot shows the
percentage of observations that
correspond to each class of the
response variable: no and yes. As
shown, this is an imbalanced data
set because both classes are not
equally distributed among all
observations, being no the majority
class (73.42%). When modeling, this
imbalance will lead to a large
number of false negatives, as we will
see later.
73.42%
26.58%
As shown, each bar is a category of the
independent variable, and it is
subdivided to show the proportion of
each response class (No and Yes).
We can extract the following
conclusions by analyzing demographic
attributes:
•The churn rate of senior citizens is
almost double that of young citizens.
•We do not expect gender to have
significant predictive power. A similar
percentage of churn is shown both
when a customer is a man or a woman.
•Customers with a partner churn less
than customers with no partner.
We can extract the following
conclusions by analyzing customer
account attributes:
• Customers with month-to-month
contracts have higher churn
rates compared to clients with yearly
contracts.
• Customers who opted for
an electronic check as paying method
are more likely to leave the company.
• Customers subscribed to paperless
billing churn more than those who are
not subscribed.
We can extract the following
conclusions by analyzing the
histograms:
• The churn rate tends to be
larger when monthly
charges are high.
• New customers (low tenure) are
more likely to churn.
• Clients with high total
charges are less likely to leave
the company.
We can extract the following conclusions by
evaluating services attributes:
• We do not expect phone
attributes (PhoneService and MultipleLines) to
have significant predictive power. The
percentage of churn for all classes in both
independent variables is nearly the same.
• Clients with online security churn less than
those without it.
• Customers with no tech support tend to
churn more often than those with tech
support.
By looking at all the plots, we can identify the most
relevant attributes for detecting churn. We
expect these attributes to be discriminative in our
future models.
As we can see the boxplot for Numeric features, there
are no outliers.
Feature Importance
Mutual information — analysis of linear and nonlinear relationships
Mutual information measures the mutual dependency between
two variables. In machine learning, we are interested in evaluating
the degree of dependency between each independent variable
and the response variable. Higher values of mutual information
show a higher degree of dependency which indicates that the
independent variable will be useful for predicting the target.
As shown above, gender, PhoneService,
and MultipleLines have a mutual
information score really close to 0,
meaning those variables do not have a
strong relationship with the target.
Feature Engineering
• The SeniorCitizen column is already a binary column and
should not be modified.
• Label Encoding
• In this project, we use label encoding with the
following binary variables: (1) gender, (2) Partner, (3)
Dependents, (4)PaperlessBilling, (5)PhoneService , and
(6)Churn
• One-Hot Encoding
• In this project, we apply one-hot encoding to the
following categorical variables: (1) Contract, (2)
PaymentMethod, (3) MultipleLines, (4)
InternetServices, (5) OnlineSecurity, (6) OnlineBackup,
(7) DeviceProtection, (8) TechSupport, (9) StreamingTV,
and (10)StreamingMovies
• Normalization
• In this project, we will use the min-max
method to rescale the numeric columns
(tenure, MontlyCharges, and
TotalCharges) to a common scale. The
min-max approach (often called
normalization) rescales the feature to a
fixed range of [0,1] by subtracting the
minimum value of the feature and then
dividing by the range.
Splitting the data in training and testing sets
The first step when building a model is to split the data into
two groups, which are typically referred to as training and
testing sets. The training set is used by the machine learning
algorithm to build the model. The test set contains samples
that are not part of the learning process and is used to evaluate
the model’s performance. It is important to assess the quality
of the model using unseen data to guarantee an objective
evaluation.
First, we create a variable X to store the independent
attributes of the dataset. Additionally, we create a variable y to
store only the target variable (Churn).
Then, we can use the train_test_split function from the
sklearn.model_selection package to create both the training
and testing sets.
Accessing Multiple Algorithm
In this project, we compare 4 different algorithms.
• Logistic Regression: Logistic regression is used in customer
churn prediction because it's good at estimating the
probability of a customer leaving or staying, making it suitable
for binary classification tasks.
• Support Vector Machines: Support Vector Classifier (SVC) is
used in customer churn prediction because it finds a clear
boundary between churn and non-churn customers, making it
effective for binary classification tasks.
• Decision Tree Classifier: Decision tree classifiers are used in
customer churn prediction because they create a
straightforward "if-then" tree to predict churn, making it
interpretable and suitable for understanding customer
behavior.
• Random Forest Classifier: Random forest classifiers are used
in customer churn prediction because they combine multiple
decision trees, reducing overfitting and improving accuracy by
considering various aspects of customer behavior for more
robust predictions.
Model Training and Evaluation
The Model was trained using 4 different algorithms and Evaluated using appropriate metrics such as accuracy,
precision, recall, F1-score.
• Logistic Regression Matrix:
• Support Vector Classifier Matrix
• Random Forest Classifier Matrix
• Decision Tree Classifier Matrix
accuracy f1_score precision recall
0.79 0.58 0.6 0.56
accuracy f1_score precision recall
0.72 0.48 0.46 0.51
accuracy f1_score precision recall
0.79 0.55 0.6 0.51
accuracy f1_score precision recall
0.77 0.53 0.56 0.5
Confusion Matrix
True Positive 1142 168 False Negative
False Positive 197 251 True Negative
• Logistic Regression
• Support Vector Machine
True Positive 1157 153 False Negative
False Positive 220 228 True Negative
• Random Forest
True Positive 1134 176 False Negative
False Positive 224 224 True Negative
• Decision Tree
True Positive 1043 267 False Negative
False Positive 220 228 True Negative
• In the model, relation analysis between Independent features and Response variable with
best visualization is summarized.
• Also best model has been predicted with best score was derived and made 79% accuracy
• After model building, Logistic Regression algorithm looks best for the telecom customer
churn dataset, which will predict the churn analysis.
Conclusion
Future Steps
• In upcoming time it is necessary to reduce
further more features in order to obtain better
accuracy.
• And introducing some more machine learning
models for better performance.
Acknowledgement
I'd like to give a big shoutout to the Boston Institute of
Analytics for giving me the amazing opportunity to work
on Telecom Customer Churn Prediction Capstone
Project in Machine Learning.
Huge thanks to my mentor, Mr. Jalal Khan, for being an
awesome guide throughout this journey. This
experience has been epic, and I'm so grateful for the
chance to learn and grow with your support.
Cheers to more data adventures ahead! 🚀
📊🙌
References
• https://www.kaggle.com/
• https://goggle.com/
• ChatGPT
• Umayaparvathi V, Iyakutti K. A survey on customer
churn prediction in telecom industry: datasets,
methods and metric. Int Res J Eng Technol.
2016;3(4):1065–70.
Contact Information
Email: patilaniket2418@gmail.com
Phone: 8422048860
THANK YOU!

More Related Content

Similar to Customer_Churn_prediction.pptx

Data mining and analysis of customer churn dataset
Data mining and analysis of customer churn datasetData mining and analysis of customer churn dataset
Data mining and analysis of customer churn datasetRohan Choksi
 
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...IRJET Journal
 
TELECOM SERVICES: I.T. & ANALYTICS
TELECOM SERVICES: I.T. & ANALYTICSTELECOM SERVICES: I.T. & ANALYTICS
TELECOM SERVICES: I.T. & ANALYTICSGeorge Krasadakis
 
Prediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPrediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPranov Mishra
 
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...IRJET Journal
 
Customer Churn Analysis and Prediction
Customer Churn Analysis and PredictionCustomer Churn Analysis and Prediction
Customer Churn Analysis and PredictionSOUMIT KAR
 
Case Study: It’s All About Data – And the Customer
Case Study: It’s All About Data – And the CustomerCase Study: It’s All About Data – And the Customer
Case Study: It’s All About Data – And the CustomerJill Kirkpatrick
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Project crm submission sonali
Project crm submission sonaliProject crm submission sonali
Project crm submission sonaliSonali Gupta
 
Telecom analytics brochure
Telecom analytics brochure Telecom analytics brochure
Telecom analytics brochure Daniel Thomas
 
IRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom IndustryIRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom IndustryIRJET Journal
 
credit card fraud detection
credit card fraud detectioncredit card fraud detection
credit card fraud detectionjagan477830
 
Machine Learning - Algorithms and simple business cases
Machine Learning - Algorithms and simple business casesMachine Learning - Algorithms and simple business cases
Machine Learning - Algorithms and simple business casesClaudio Mirti
 
Module_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptx
Module_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptxModule_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptx
Module_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptxHarshitGoel87
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptopRising Media, Inc.
 
Automated Feature Selection and Churn Prediction using Deep Learning Models
Automated Feature Selection and Churn Prediction using Deep Learning ModelsAutomated Feature Selection and Churn Prediction using Deep Learning Models
Automated Feature Selection and Churn Prediction using Deep Learning ModelsIRJET Journal
 
Bank Customer Segmentation & Insurance Claim Prediction
Bank Customer Segmentation & Insurance Claim PredictionBank Customer Segmentation & Insurance Claim Prediction
Bank Customer Segmentation & Insurance Claim PredictionIRJET Journal
 

Similar to Customer_Churn_prediction.pptx (20)

Data mining and analysis of customer churn dataset
Data mining and analysis of customer churn datasetData mining and analysis of customer churn dataset
Data mining and analysis of customer churn dataset
 
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
 
TELECOM SERVICES: I.T. & ANALYTICS
TELECOM SERVICES: I.T. & ANALYTICSTELECOM SERVICES: I.T. & ANALYTICS
TELECOM SERVICES: I.T. & ANALYTICS
 
Prediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPrediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom Industry
 
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
 
Customer Churn Analysis and Prediction
Customer Churn Analysis and PredictionCustomer Churn Analysis and Prediction
Customer Churn Analysis and Prediction
 
ML_project_ppt.pdf
ML_project_ppt.pdfML_project_ppt.pdf
ML_project_ppt.pdf
 
Case Study: It’s All About Data – And the Customer
Case Study: It’s All About Data – And the CustomerCase Study: It’s All About Data – And the Customer
Case Study: It’s All About Data – And the Customer
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Project crm submission sonali
Project crm submission sonaliProject crm submission sonali
Project crm submission sonali
 
Telecom analytics brochure
Telecom analytics brochure Telecom analytics brochure
Telecom analytics brochure
 
IRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom IndustryIRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom Industry
 
credit card fraud detection
credit card fraud detectioncredit card fraud detection
credit card fraud detection
 
Machine Learning - Algorithms and simple business cases
Machine Learning - Algorithms and simple business casesMachine Learning - Algorithms and simple business cases
Machine Learning - Algorithms and simple business cases
 
Module_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptx
Module_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptxModule_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptx
Module_6_-_Datamining_tasks_and_tools_uGuVaDv4iv-2.pptx
 
20 ccp using logistic
20 ccp using logistic20 ccp using logistic
20 ccp using logistic
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
 
Automated Feature Selection and Churn Prediction using Deep Learning Models
Automated Feature Selection and Churn Prediction using Deep Learning ModelsAutomated Feature Selection and Churn Prediction using Deep Learning Models
Automated Feature Selection and Churn Prediction using Deep Learning Models
 
Bank Customer Segmentation & Insurance Claim Prediction
Bank Customer Segmentation & Insurance Claim PredictionBank Customer Segmentation & Insurance Claim Prediction
Bank Customer Segmentation & Insurance Claim Prediction
 
Day 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business AnalyticsDay 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business Analytics
 

Recently uploaded

Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 

Recently uploaded (20)

Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 

Customer_Churn_prediction.pptx

  • 2. INTRODUCTION The data set used in this project is available in the Kaggle and contains nineteen columns (independent variables) that indicate the characteristics of the clients of a fictional telecommunications corporation. The Churn column (response variable) indicates whether the customer departed within the last month or not. The class No includes the clients that did not leave the company last month, while the class YES contains the clients that decided to terminate their relations with the company. The objective of the analysis is to obtain the relation between the customer’s characteristics and the churn.
  • 3. PROBLEM STATEMENT • Predicting customer churn is critical for telecommunication companies to be able to effectively retain customers. It is more costly to acquire new customers than to retain existing ones. • For this reason, large telecommunications corporations are seeking to develop models to predict which customers are more likely to change and take actions accordingly. • In this article, we build a model to predict how likely a customer will churn by analyzing its characteristics: (1) demographic information, (2) account information, and (3) services information. • The objective is to obtain a data-driven solution that will allow us to reduce churn rates and, as a consequence, to increase customer satisfaction and corporation revenue.
  • 4. Data Overview • The data set used in this project is available in the Kaggle. • It contains nineteen columns (independent variables) with 7043 rows that indicate the characteristics of the clients. • The Churn column (response variable) indicates whether the customer departed within the last month or not. • The class No includes the clients that did not leave the company last month, while the class Yes contains the clients that decided to terminate their relations with the company. • The objective of the analysis is to obtain the relation between the customer’s characteristics and the churn.
  • 5. Data Preprocessing • The data set contains 19 independent variables, which can be classified into 3 groups: 1. Demographic Information: gender, SeniorCitizen, Partner, Dependents 2. Customer Account Information: tenure, Contract, PaperlessBilling, PaymentMethod, MontlyCharges, TotalCharges 3. Services Information: PhoneService, MultipleLines, InternetServices, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies • Exploratory Data Analysis and Data Cleaning • Missing values and data types (pandas.DataFrame.info) • Remove customerID column • Payment method denominations • Data Visualization • Response Variable (Bar Plot) • Demographic Information (Bar Plot) • Customer Account Information — Categorical variables & Numerical variables (Bar Plot & Histogram) • Services Information (Bar Plot) • Checking Outliers
  • 6. The following bar plot shows the percentage of observations that correspond to each class of the response variable: no and yes. As shown, this is an imbalanced data set because both classes are not equally distributed among all observations, being no the majority class (73.42%). When modeling, this imbalance will lead to a large number of false negatives, as we will see later. 73.42% 26.58%
  • 7. As shown, each bar is a category of the independent variable, and it is subdivided to show the proportion of each response class (No and Yes). We can extract the following conclusions by analyzing demographic attributes: •The churn rate of senior citizens is almost double that of young citizens. •We do not expect gender to have significant predictive power. A similar percentage of churn is shown both when a customer is a man or a woman. •Customers with a partner churn less than customers with no partner.
  • 8. We can extract the following conclusions by analyzing customer account attributes: • Customers with month-to-month contracts have higher churn rates compared to clients with yearly contracts. • Customers who opted for an electronic check as paying method are more likely to leave the company. • Customers subscribed to paperless billing churn more than those who are not subscribed.
  • 9. We can extract the following conclusions by analyzing the histograms: • The churn rate tends to be larger when monthly charges are high. • New customers (low tenure) are more likely to churn. • Clients with high total charges are less likely to leave the company.
  • 10. We can extract the following conclusions by evaluating services attributes: • We do not expect phone attributes (PhoneService and MultipleLines) to have significant predictive power. The percentage of churn for all classes in both independent variables is nearly the same. • Clients with online security churn less than those without it.
  • 11. • Customers with no tech support tend to churn more often than those with tech support.
  • 12. By looking at all the plots, we can identify the most relevant attributes for detecting churn. We expect these attributes to be discriminative in our future models.
  • 13. As we can see the boxplot for Numeric features, there are no outliers.
  • 14. Feature Importance Mutual information — analysis of linear and nonlinear relationships Mutual information measures the mutual dependency between two variables. In machine learning, we are interested in evaluating the degree of dependency between each independent variable and the response variable. Higher values of mutual information show a higher degree of dependency which indicates that the independent variable will be useful for predicting the target. As shown above, gender, PhoneService, and MultipleLines have a mutual information score really close to 0, meaning those variables do not have a strong relationship with the target.
  • 15. Feature Engineering • The SeniorCitizen column is already a binary column and should not be modified. • Label Encoding • In this project, we use label encoding with the following binary variables: (1) gender, (2) Partner, (3) Dependents, (4)PaperlessBilling, (5)PhoneService , and (6)Churn • One-Hot Encoding • In this project, we apply one-hot encoding to the following categorical variables: (1) Contract, (2) PaymentMethod, (3) MultipleLines, (4) InternetServices, (5) OnlineSecurity, (6) OnlineBackup, (7) DeviceProtection, (8) TechSupport, (9) StreamingTV, and (10)StreamingMovies • Normalization • In this project, we will use the min-max method to rescale the numeric columns (tenure, MontlyCharges, and TotalCharges) to a common scale. The min-max approach (often called normalization) rescales the feature to a fixed range of [0,1] by subtracting the minimum value of the feature and then dividing by the range.
  • 16. Splitting the data in training and testing sets The first step when building a model is to split the data into two groups, which are typically referred to as training and testing sets. The training set is used by the machine learning algorithm to build the model. The test set contains samples that are not part of the learning process and is used to evaluate the model’s performance. It is important to assess the quality of the model using unseen data to guarantee an objective evaluation. First, we create a variable X to store the independent attributes of the dataset. Additionally, we create a variable y to store only the target variable (Churn). Then, we can use the train_test_split function from the sklearn.model_selection package to create both the training and testing sets.
  • 17. Accessing Multiple Algorithm In this project, we compare 4 different algorithms. • Logistic Regression: Logistic regression is used in customer churn prediction because it's good at estimating the probability of a customer leaving or staying, making it suitable for binary classification tasks. • Support Vector Machines: Support Vector Classifier (SVC) is used in customer churn prediction because it finds a clear boundary between churn and non-churn customers, making it effective for binary classification tasks. • Decision Tree Classifier: Decision tree classifiers are used in customer churn prediction because they create a straightforward "if-then" tree to predict churn, making it interpretable and suitable for understanding customer behavior. • Random Forest Classifier: Random forest classifiers are used in customer churn prediction because they combine multiple decision trees, reducing overfitting and improving accuracy by considering various aspects of customer behavior for more robust predictions.
  • 18. Model Training and Evaluation The Model was trained using 4 different algorithms and Evaluated using appropriate metrics such as accuracy, precision, recall, F1-score. • Logistic Regression Matrix: • Support Vector Classifier Matrix • Random Forest Classifier Matrix • Decision Tree Classifier Matrix accuracy f1_score precision recall 0.79 0.58 0.6 0.56 accuracy f1_score precision recall 0.72 0.48 0.46 0.51 accuracy f1_score precision recall 0.79 0.55 0.6 0.51 accuracy f1_score precision recall 0.77 0.53 0.56 0.5
  • 19. Confusion Matrix True Positive 1142 168 False Negative False Positive 197 251 True Negative • Logistic Regression • Support Vector Machine True Positive 1157 153 False Negative False Positive 220 228 True Negative • Random Forest True Positive 1134 176 False Negative False Positive 224 224 True Negative • Decision Tree True Positive 1043 267 False Negative False Positive 220 228 True Negative
  • 20. • In the model, relation analysis between Independent features and Response variable with best visualization is summarized. • Also best model has been predicted with best score was derived and made 79% accuracy • After model building, Logistic Regression algorithm looks best for the telecom customer churn dataset, which will predict the churn analysis. Conclusion
  • 21. Future Steps • In upcoming time it is necessary to reduce further more features in order to obtain better accuracy. • And introducing some more machine learning models for better performance.
  • 22. Acknowledgement I'd like to give a big shoutout to the Boston Institute of Analytics for giving me the amazing opportunity to work on Telecom Customer Churn Prediction Capstone Project in Machine Learning. Huge thanks to my mentor, Mr. Jalal Khan, for being an awesome guide throughout this journey. This experience has been epic, and I'm so grateful for the chance to learn and grow with your support. Cheers to more data adventures ahead! 🚀 📊🙌
  • 23. References • https://www.kaggle.com/ • https://goggle.com/ • ChatGPT • Umayaparvathi V, Iyakutti K. A survey on customer churn prediction in telecom industry: datasets, methods and metric. Int Res J Eng Technol. 2016;3(4):1065–70.

Editor's Notes

  1. Links: https://unsplash.com/photos/fzOITuS1DIQ
  2. Links: https://www.pexels.com/photo/photo-of-men-talking-to-each-other-3285197/
  3. Links: https://www.pexels.com/photo/man-in-black-suit-jacket-wearing-white-headphones-3779414/
  4. Links: https://unsplash.com/photos/wD1LRb9OeEo
  5. Links: https://unsplash.com/photos/wD1LRb9OeEo
  6. Links: https://www.pexels.com/photo/photo-of-person-holding-black-tablet-3734641/
  7. Links: https://www.pexels.com/photo/photo-of-people-looking-on-laptop-3182750/
  8. Links: https://www.pexels.com/photo/photo-of-people-looking-on-tablet-3182835/
  9. Links: https://www.pexels.com/photo/photo-of-people-sitting-near-wooden-table-3183188/
  10. Links: https://www.pexels.com/photo/photo-of-person-holding-pen-3184635/
  11. Links: https://unsplash.com/photos/fzOITuS1DIQ