Pranov Shobhan Mishra
Prediction of customer propensity to churn
TELECOM INDUSTRY
Refer to github link for the code
https://github.com/Pranov1984/Prediction-of-customer-propensity-to-churn
Executive Summary
Overview:
At company X, there is concern about the increased churn being experienced by all companies in the
telecom industry. Customer retention and Average revenue per unit (ARPU) have become a strong area of focus,
especially for company X, as its churn rate has been relatively high.
Problem Statement:
Currently the effort to retain customers is very reactive, as an attempt is made only when the subscriber
calls in to close the account. The management team is keen to take more initiative on this front and to have a
targeted, proactive strategy.
Goal Statement:
The aim of this project is to analyse the data and give the company insights on customer behaviour that
would be helpful in devising targeted strategies for retention of customers. The specific goals expected to be
achieved are given below:
• Identify the top variables driving likelihood of churn.
• An independent industry survey suggested that “cost and billing”, “network and service quality” and “data usage
connectivity issues” are key influencing factors of churn. Validate whether this holds true for Company X.
• Build a predictive model to identify customers who have the highest probability of churn, which the company can use for
a proactive retention strategy.
• Build a lift chart so that the company can optimize its efforts by targeting most of the potential churners with the
least contact effort. Here, with 30% of the total customer pool, the model captures 33% of the total
potential churn candidates.
Executive Summary Continued ...
• More than 10 different models were tried in order to identify the model that best
predicts customer behaviour, so that the company can take proactive steps to retain the customer wherever
possible.
• The data was slightly imbalanced (majority class : minority class = 76:24), so an appropriate model evaluation
metric had to be chosen. A combination of the F1 score (the harmonic mean of precision and recall) and Area Under the Curve (AUC)
was used to select the best model.
• The models tried were:
  - Simple models such as Logistic Regression and Discriminant Analysis, with different thresholds for classification
  - Random Forest after balancing the dataset using Synthetic Minority Oversampling Technique (SMOTE)
  - An ensemble of five individual models, with the output predicted by averaging the individual output probabilities
  - Xgboost algorithm
Note: An ensemble model with majority voting and a stacked ensemble were also tried, but they did not give good results and hence are not included in the project report.
• The key insights provided by the logistic regression model, which incidentally gave the best metrics,
indicate that the variables that impact customer behaviour can be broadly classified as below:
  - minutes of usage (both in terms of duration and number of calls),
  - overage charges,
  - network quality (for both voice and data),
  - number of subscribers in the family,
  - handset price and
  - credit class
• A Gains chart was prepared, which gave a cumulative lift of 110% in the first 3 deciles. The customers with the highest
probability of churn were identified. The company could use the customer details to proactively work with them
and retain them.
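As an illustration of the two evaluation metrics named above, the sketch below computes an F1 score and an AUC in base R. The labels and probabilities are made up, the helper names f1_score and auc_score are hypothetical, and this is not the code from the project repository.

```r
# Toy labels (1 = churn) and predicted probabilities; values are made up.
actual <- c(1, 0, 0, 1, 0, 0, 1, 0)
prob   <- c(0.80, 0.30, 0.45, 0.40, 0.10, 0.60, 0.70, 0.20)

# F1 score: harmonic mean of precision and recall at a given threshold.
f1_score <- function(actual, prob, threshold = 0.5) {
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & actual == 1)
  fp <- sum(pred == 1 & actual == 0)
  fn <- sum(pred == 0 & actual == 1)
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

# AUC via the rank-sum (Mann-Whitney) formulation: the share of
# (churner, non-churner) pairs in which the churner gets the higher score.
auc_score <- function(actual, prob) {
  r  <- rank(prob)
  n1 <- sum(actual == 1)
  n0 <- sum(actual == 0)
  (sum(r[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

f1  <- f1_score(actual, prob, threshold = 0.5)
auc <- auc_score(actual, prob)
```

F1 depends on the chosen threshold, while AUC does not, which is one reason the two can be read together on an imbalanced dataset.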
Approach
• Remove Variables: variables with greater than 30% missing values
• Feature Selection: identify significant variables using Boruta
• Missing Value Imputation: kNN used on dataset with significant variables
• Outlier Treatment: Winsor & variable transformation
• Model Building:
  - Logistic Regression
  - Linear Discriminant Analysis
  - Random Forest
  - Ensemble Models – Averaging the output from each individual model
  - Ensemble Models – Majority Voting
  - Stacked Ensemble Models
  - Boosting – Xgboost
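The first step of the approach, dropping variables with more than 30% missing values, could look roughly like the base R sketch below. The data frame and its column names are hypothetical and only serve to show the filter.

```r
# Toy data frame; column names and values are hypothetical.
df <- data.frame(
  a = c(1, NA, NA, 4, 5, 6, 7, 8, 9, 10),    # 20% missing -> kept
  b = c(NA, NA, NA, NA, 5, 6, 7, 8, 9, 10),  # 40% missing -> dropped
  c = 1:10                                   # 0% missing  -> kept
)

# Share of missing values per column.
missing_share <- colMeans(is.na(df))

# Keep only the columns with at most 30% missing values.
df_reduced <- df[, missing_share <= 0.30, drop = FALSE]
```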
Data Summary – Prior to any missing value treatment or data transformation
Variable datatype Max Min SD Average Unique Missing
hnd_webcap factor 0 0 0 0 3 2382
avg6mou integer 5589.00 0.00 527.02 527.02 0 814
avg6qty integer 2759.00 0.00 183.76 183.76 0 814
age1 integer 94.00 0.00 31.16 31.16 0 468
age2 integer 99.00 0.00 21.06 21.06 0 468
hnd_price numeric 499.99 9.99 104.95 104.95 0 254
change_mou numeric 3046.75 -2785.00 -10.21 -10.21 0 161
mou_Mean numeric 7667.75 0.00 533.58 533.58 0 58
totmrc_Mean numeric 399.99 -26.92 47.19 47.19 0 58
rev_Range numeric 1524.39 0.00 44.13 44.13 0 58
mou_Range numeric 6233.00 0.00 378.75 378.75 0 58
ovrrev_Mean numeric 896.09 0.00 13.32 13.32 0 58
rev_Mean numeric 926.08 -2.52 59.36 59.36 0 58
ovrmou_Mean numeric 3472.25 0.00 40.48 40.48 0 58
roam_Mean numeric 488.78 0.00 1.18 1.18 0 58
da_Range numeric 57.42 0.00 1.64 1.64 0 58
datovr_Mean numeric 242.87 0.00 0.25 0.25 0 58
datovr_Range numeric 475.02 0.00 0.71 0.71 0 58
drop_blk_Mean numeric 411.67 0.00 10.22 10.22 0 0
drop_vce_Range integer 313.00 0.00 5.52 5.52 0 0
owylis_vce_Range integer 542.00 0.00 15.90 15.90 0 0
mou_opkv_Range numeric 4783.67 0.00 117.42 117.42 0 0
months integer 60.00 6.00 18.69 18.69 0 0
Data Summary – Continued ...
Prior to any missing value treatment or data transformation
Variable datatype Max Min SD Average Unique Missing
totcalls integer 92076 0 2936.16 2936.16 0 0
eqpdays integer 1812 -5 376.45 376.45 0 0
custcare_Mean numeric 365.6667 0 1.90 1.90 0 0
callwait_Mean numeric 212.6667 0 1.89 1.89 0 0
iwylis_vce_Mean numeric 519.3333 0 8.31 8.31 0 0
callwait_Range integer 143 0 1.92 1.92 0 0
ccrndmou_Range integer 600 0 7.45 7.45 0 0
adjqty integer 92076 0 2895.65 2895.65 0 0
comp_vce_Mean numeric 1812.667 0 112.59 112.59 0 0
plcd_vce_Mean numeric 2180.333 0 149.80 149.80 0 0
avg3mou integer 7270 0 538.48 538.48 0 0
avgmou numeric 6329.4 0 493.95 493.95 0 0
avg3qty integer 3261 0 185.87 185.87 0 0
avgqty numeric 2475.75 0 177.23 177.23 0 0
crclscod factor 0 0 0.00 0.00 49 0
asl_flag factor 0 0 0.00 0.00 2 0
models integer 15 1 1.57 1.57 0 0
actvsubs integer 11 0 1.35 1.35 0 0
uniqsubs integer 12 1 1.52 1.52 0 0
drop_vce_Mean numeric 195.3333 0 6.08 6.08 0 0
adjmou numeric 174383.4 0 7707.23 7707.23 0 0
totrev numeric 13358.37 9.12 1037.70 1037.70 0 0
adjrev numeric 12982.62 8.77 965.30 965.30 0 0
avgrev numeric 588.27 1.13 58.37 58.37 0 0
Customer_ID integer 1099998 1000004 1050532.09 1050532.09 0 0
plcd_dat_Mean numeric 465 0 0.92 0.92 0 0
churn factor 0 0 0.00 0.00 2 0
Significant Variables, Missing Value Imputation & Outlier Treatment
The above variables were removed from the original dataset, and then a Boruta model was built for feature selection.
The model identified the significant variables, which were used to further subset the original dataset.
The significant variables identified are given below.
[Figure: Correlation plot highlighting the correlated numeric variables. Highly correlated variables were considered for removal to reduce multicollinearity.]
Significant Variables, Missing Value Imputation & Outlier Treatment
After subsetting the original dataset to the significant variables, 50 variables remain, 18 of which
still have missing values.
The missing values were imputed using the kNN algorithm (library VIM).
After that, some of the highly correlated variables were removed.
Most of the numeric variables had outliers. The boxplots, which can be seen when the code is run, help visualize the
variables with outliers.
The outliers were treated by winsorizing the variables (library DescTools). Some variables were further log
transformed. See the R code for more details.
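To show what winsorizing does, here is a minimal base R stand-in (not the DescTools call used in the project). The helper name winsorize, the 5th/95th percentile limits and the use of log1p for the log transform are assumptions chosen for illustration.

```r
# Clamp values outside the chosen quantiles to those quantiles.
# The default limits (5th and 95th percentile) are an assumption.
winsorize <- function(x, probs = c(0.05, 0.95)) {
  limits <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

x     <- c(1:19, 1000)   # made-up values with one extreme outlier
x_w   <- winsorize(x)    # the outlier is pulled in to the upper limit
x_log <- log1p(x_w)      # log(1 + x), which is also defined at zero
```

Unlike dropping outliers, this keeps every row and only caps the extreme values.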
Data Visualization
Similar trend for customers who churn and who do not. Higher no. of
customers with low usage and high mean usage.
For customers who churn, the trend is generally decreasing after a value of
30. For customers who don’t churn, the trend is decreasing after a value close to 50.
Similar pattern in rev_Range for both customer behaviors
Similar pattern but differences in densities noticeable for comparable values.
Data Visualization
Similar trend for both customer behaviors.
Similar pattern in rev_Range for both customer behaviors
Similar pattern but differences in densities noticeable for comparable values.
Data Visualization – Using Transformed Variables
Distinct differentiation noticed in customer behavior across models and
Months of usage.
Similarly for Call Drops also as seen below
Distinct differentiation achieved for both the variables after transformation
to two classes based on churn percentage
Model Comparison
Model Comparison – The previously built models and an Ensemble Model
An ensemble of five individual models; the final prediction was made by averaging the predictions from the models
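The averaging step can be sketched as follows. The deck uses five models; this illustration uses three columns with made-up probabilities, and the column names and the 0.5 cut-off are hypothetical.

```r
# Each column holds one model's predicted churn probability for the
# same three customers (made-up numbers, hypothetical model labels).
probs <- cbind(
  logistic = c(0.20, 0.70, 0.40),
  lda      = c(0.30, 0.60, 0.50),
  rf       = c(0.10, 0.90, 0.30)
)

# Averaging ensemble: mean probability across models, per customer.
ensemble_prob <- rowMeans(probs)

# Convert to a class label with an illustrative cut-off of 0.5.
ensemble_class <- as.integer(ensemble_prob >= 0.5)
```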
Model Comparison with Ensemble Modelling
Five Individual models built and final prediction was done by combining the predictions from the models
Model Comparison
• Ten models were tried in order to select the best among them. The evaluation metric comparison is seen
above.
• LG_26 seems to give the best results, with the highest AUC, second highest accuracy and
relatively good sensitivity and specificity.
LG_26 is a logistic regression model with the classification cut-off (threshold) chosen at 26%.
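A small simulation can show why a cut-off below 50% may be chosen on imbalanced data. The data, the single predictor x and the helper names below are all made up; only the 26% cut-off comes from the slide.

```r
set.seed(1)
n <- 500
x <- rnorm(n)                                   # hypothetical predictor
churn <- rbinom(n, 1, plogis(-1.2 + 1.0 * x))   # simulated minority class

fit  <- glm(churn ~ x, family = binomial)
prob <- predict(fit, type = "response")

# Turn probabilities into class labels at a given cut-off.
classify <- function(prob, cutoff) as.integer(prob >= cutoff)

# Sensitivity: share of churners caught. Specificity: share of
# non-churners correctly left alone.
metrics <- function(actual, pred) {
  c(sensitivity = sum(pred == 1 & actual == 1) / sum(actual == 1),
    specificity = sum(pred == 0 & actual == 0) / sum(actual == 0))
}

m50 <- metrics(churn, classify(prob, 0.50))
m26 <- metrics(churn, classify(prob, 0.26))
```

Lowering the cut-off can only flag more customers as churners, so sensitivity does not decrease and specificity does not increase; the choice of 26% trades one against the other.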
Variables with an odds ratio greater than 1, i.e. a 1 unit increase in the respective independent variable
increases the odds of churn
Top ten factors for customer churn are
1. crclscod (Credit class code)
2. hnd_price (Current handset price)
3. avgqty (Average monthly number of calls over the life of the customer)
4. adjmou (Billing adjusted total minutes of use over the life of the customer)
5. uniqsubs (Number of unique subscribers in the household)
6. callwait_Range (Range of number of call waiting calls)
7. datovr_Range (Range of revenue of data overage)
8. drop_blk_Mean (Mean number of dropped or blocked calls)
9. drop_vce_Range (Range of number of dropped (failed) voice calls)
10. rev_Range (Range of revenue (charge amount))
Validation of applicability of Independent Survey Finding for the Telecom Industry (Refer to the Executive Summary)
Shown below is the summary of the logistic model used, with each variable’s significance status.
Variables impacting cost and billing are highly significant. adjmou has one of the top 5 odds ratios.
Mean total monthly recurring charge (totmrc_Mean), range of revenue (charge amount) (rev_Range), billing adjusted total minutes of use (adjmou) etc. are
found to be highly significant, suggesting that cost and billing impact customer behaviour.
Similarly, among network and service quality variables, drop_blk_Mean (mean no. of dropped or blocked calls) is highly significant,
drop_vce_Range is significant, and callwait_Mean is significant at alpha = 10%.
datovr_Range (range of revenue of data overage) is not found to be significant but has an odds ratio of more than 1, indicating that a 1 unit
increase in its value increases the estimated odds of churn, thereby suggesting the need to pay
attention to it. Additionally, the intercept is significant; it captures the effects of the baseline levels of categorical variables that were removed by the
model.
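For readers less familiar with odds ratios, the sketch below shows how one is obtained from a logistic regression in R and what it implies for a probability. The data is simulated, the predictor name overage is hypothetical, and the odds ratio of 1.5 in the second part is an arbitrary illustrative value.

```r
set.seed(42)
n <- 1000
overage <- rnorm(n)                                 # hypothetical predictor
churn <- rbinom(n, 1, plogis(-1 + 0.5 * overage))   # simulated outcome

fit <- glm(churn ~ overage, family = binomial)

# The odds ratio is the exponentiated coefficient: the factor by which
# the odds of churn are multiplied for a 1 unit increase in the predictor.
odds_ratio <- exp(coef(fit))["overage"]

# The resulting change in probability depends on the starting point.
p0    <- 0.10                    # illustrative baseline churn probability
odds1 <- p0 / (1 - p0) * 1.5     # apply a hypothetical odds ratio of 1.5
p1    <- odds1 / (1 + odds1)     # back to a probability
```

An odds ratio above 1 therefore means higher odds of churn per unit increase; it is a multiplier on the odds, and the probability it leads to depends on the customer's baseline.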
Recommend rate plan migration as a proactive retention strategy?
The analysis below suggests the answer is “Yes”.
mou_Mean (mean minutes of usage) is one of the highly significant variables, as seen in the previous slide, and hence it makes sense to work
proactively with customers to increase their MOU so that they are retained for a longer period.
The boxplot below also suggests that customers who are retained tend to have higher MOU.
Additionally, mouR_Factor, a derived variable of mou_Range, is found to be highly significant.
Change in MOU is also highly significant. Change_mF is a derived variable of change_mou.
To complement the above, ovrmou_Mean is also a highly significant variable with an odds ratio of more than 1. The variable
has a positive coefficient estimate, indicating that an increase in overage increases churn.
It would help if the company worked with customers and, based on their usage, migrated them to optimal rate plans to avoid overage
charges.
Use of model for prioritisation of customers for proactive retention campaigns
Gains – Lift Chart
The lift achieved will help the company reach churn candidates by targeting a much smaller share of its total
customer pool. The highest gain is achieved in the first 3 deciles, which
cover about 33% of the customers who are likely to terminate their services. In other words, when the company
selects the top 30% of its customer database by predicted probability, it covers 33% of the people who are likely to leave.
For comparison, contacting a random 30% of customers without a model would be expected to reach about 30% of
all potential churn candidates, which corresponds to the cumulative lift of 110% in the first 3 deciles.
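The arithmetic behind a gains table can be sketched in base R as below. The scores and churn labels are simulated, so the numbers will not match the project's chart, and the name gains_tbl is hypothetical.

```r
set.seed(7)
n <- 1000
score <- runif(n)                     # simulated predicted churn probability
churn <- rbinom(n, 1, score * 0.5)    # higher score -> more likely to churn

# Rank customers from highest to lowest score and cut into 10 equal groups.
ord    <- order(score, decreasing = TRUE)
decile <- ceiling(seq_len(n) / (n / 10))
churn_by_decile <- as.vector(tapply(churn[ord], decile, sum))

gains_tbl <- data.frame(
  decile            = 1:10,
  cum_pct_customers = (1:10) * 10,
  cum_pct_churners  = 100 * cumsum(churn_by_decile) / sum(churn)
)

# Cumulative lift: churners captured relative to random targeting.
gains_tbl$cum_lift <- gains_tbl$cum_pct_churners / gains_tbl$cum_pct_customers
```

A cumulative lift of 1 (100%) in a decile means the model does no better than random selection up to that point; by construction it always reaches 1 in the last decile.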
Identification of the customers with highest probability of terminating services
The 20% of customers who need to be proactively worked with to prevent churn were identified.
They are the customers whose probability of churn is greater than 32.24% and at most 84.7%. The code
to get the customer details is given below
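library(gains)  # gains()
library(dplyr)  # filter(), select(), %>%
# Gains table over 10 groups; column 42 of Telecom_Winsor is dropped before prediction
gains(as.numeric(Telecom_Winsor$churn), predict(LGMF, type = "response", newdata = Telecom_Winsor[, -42]), groups = 10)
Telecom_Winsor$Cust_ID <- mydata$Customer_ID
Telecom_Winsor$prob <- predict(LGMF, type = "response", newdata = Telecom_Winsor[, -42])
# Decile cut-offs of the predicted churn probability
quantile(Telecom_Winsor$prob, probs = c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1))
# Customers whose predicted probability falls in the targeted band
targeted <- Telecom_Winsor %>% filter(prob > 0.3224491 & prob <= 0.8470540) %>% dplyr::select(Cust_ID)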

How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptx
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 

Predict customer churn with logistic regression and ensemble modeling

  • 1. Pranov Shobhan Mishra Prediction of customer propensity to churn TELECOM INDUSTRY Refer to github link for the code https://github.com/Pranov1984/Prediction-of-customer-propensity-to-churn
  • 2. Executive Summary
    Overview: Company X is concerned about the increased churn that all companies in the telecom industry are experiencing. Customer retention and average revenue per unit (ARPU) have become a strong area of focus, especially for Company X, whose churn rate has been relatively high.
    Problem Statement: Currently the effort to retain customers is very reactive: an attempt is made only when the subscriber calls in to close the account. The management team is keen to take more initiative on this front and adopt a targeted, proactive strategy.
    Goal Statement: The aim of this project is to analyse the data and give the company insights into customer behaviour that help in devising targeted retention strategies. The specific goals are:
      - Identify the top variables driving the likelihood of churn.
      - An independent industry survey suggested that “cost and billing”, “network and service quality” and “data usage connectivity issues” are key influencing factors of churn. Validate whether this holds true for Company X.
      - Build a predictive model to identify the customers with the highest probability of churn, which the company can use for a proactive retention strategy.
      - Build a lift chart so that the company can optimize its efforts by reaching most of the potential churners with the least contact effort. By contacting 30% of the total customer pool, the model captures 33% of all potential churn candidates.
  • 3. Executive Summary Continued ...
    - More than 10 different models were tried in order to identify the one that best predicts customer behaviour, so that the company can take proactive steps to retain customers wherever possible.
    - The data was slightly imbalanced (majority class : minority class = 76:24), so an appropriate model evaluation metric had to be chosen. A combination of the F1 score (the harmonic mean of precision and recall) and the Area Under the Curve (AUC) was used to select the best model.
    - Models tried:
        - Simple models such as Logistic Regression and Discriminant Analysis, with different classification thresholds
        - Random Forest after balancing the dataset using the Synthetic Minority Oversampling Technique (SMOTE)
        - An ensemble of five individual models, with the output predicted by averaging the individual output probabilities
        - The XGBoost algorithm
      Note: A majority-voting ensemble and a stacked ensemble were also tried, but they did not give good results and are therefore not included in the project report.
    - The key insights from the logistic regression model, which also gave the best metrics, indicate that the variables that impact customer behaviour can be broadly classified as:
        - minutes of usage (both in terms of duration and number of calls),
        - overage charges,
        - network quality (for both voice and data),
        - number of subscribers in the family,
        - handset price, and
        - credit class.
    - A gains chart was prepared, which gave a cumulative lift of 110% in the first 3 deciles. The customers with the highest probability of churn were identified. The company could use the customer details to work with them proactively and retain them.
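The two evaluation metrics named above can be sketched in base R. This is only an illustration on hypothetical toy data (the vectors `actual` and `prob` and the helper names `f1_score` and `auc_score` are assumptions, not taken from the project code); the AUC uses the rank-sum identity rather than any particular package.

```r
# Hypothetical toy data: 1 = churn, 0 = no churn (not the project's dataset).
actual <- c(1, 0, 0, 1, 0, 0, 1, 0, 0, 0)
prob   <- c(0.80, 0.30, 0.10, 0.55, 0.40, 0.20, 0.35, 0.60, 0.05, 0.15)

# F1 score: harmonic mean of precision and recall at a given cut-off.
f1_score <- function(actual, prob, threshold = 0.5) {
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & actual == 1)
  fp <- sum(pred == 1 & actual == 0)
  fn <- sum(pred == 0 & actual == 1)
  if (tp == 0) return(0)
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

# AUC via the rank-sum (Mann-Whitney) identity: the share of
# (churner, non-churner) pairs in which the churner gets the higher score.
auc_score <- function(actual, prob) {
  r  <- rank(prob)              # average ranks are used for ties
  n1 <- sum(actual == 1)
  n0 <- sum(actual == 0)
  (sum(r[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

f1  <- f1_score(actual, prob, threshold = 0.5)
auc <- auc_score(actual, prob)
```

F1 depends on the chosen cut-off while AUC does not, which is one reason to look at both when classes are imbalanced.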
  • 4. Approach
    - Remove Variables: variables with more than 30% missing values
    - Feature Selection: identify significant variables using Boruta
    - Missing Value Imputation: kNN used on the dataset with significant variables
    - Outlier Treatment: winsorization and variable transformation
    - Model Building:
        - Logistic Regression
        - Linear Discriminant Analysis
        - Random Forest
        - Ensemble Models: averaging the output from each individual model
        - Ensemble Models: majority voting
        - Stacked Ensemble Models
        - Boosting: Xgboost
  • 5. Data Summary – Prior to any missing value treatment or data transformation
    Variable          datatype  Max       Min       SD      Average  Unique  Missing
    hnd_webcap        factor    0         0         0       0        3       2382
    avg6mou           integer   5589.00   0.00      527.02  527.02   0       814
    avg6qty           integer   2759.00   0.00      183.76  183.76   0       814
    age1              integer   94.00     0.00      31.16   31.16    0       468
    age2              integer   99.00     0.00      21.06   21.06    0       468
    hnd_price         numeric   499.99    9.99      104.95  104.95   0       254
    change_mou        numeric   3046.75   -2785.00  -10.21  -10.21   0       161
    mou_Mean          numeric   7667.75   0.00      533.58  533.58   0       58
    totmrc_Mean       numeric   399.99    -26.92    47.19   47.19    0       58
    rev_Range         numeric   1524.39   0.00      44.13   44.13    0       58
    mou_Range         numeric   6233.00   0.00      378.75  378.75   0       58
    ovrrev_Mean       numeric   896.09    0.00      13.32   13.32    0       58
    rev_Mean          numeric   926.08    -2.52     59.36   59.36    0       58
    ovrmou_Mean       numeric   3472.25   0.00      40.48   40.48    0       58
    roam_Mean         numeric   488.78    0.00      1.18    1.18     0       58
    da_Range          numeric   57.42     0.00      1.64    1.64     0       58
    datovr_Mean       numeric   242.87    0.00      0.25    0.25     0       58
    datovr_Range      numeric   475.02    0.00      0.71    0.71     0       58
    drop_blk_Mean     numeric   411.67    0.00      10.22   10.22    0       0
    drop_vce_Range    integer   313.00    0.00      5.52    5.52     0       0
    owylis_vce_Range  integer   542.00    0.00      15.90   15.90    0       0
    mou_opkv_Range    numeric   4783.67   0.00      117.42  117.42   0       0
    months            integer   60.00     6.00      18.69   18.69    0       0
  • 6. Data Summary – Continued ... Prior to any missing value treatment or data transformation
    Variable         datatype  Max       Min      SD          Average     Unique  Missing
    totcalls         integer   92076     0        2936.16     2936.16     0       0
    eqpdays          integer   1812      -5       376.45      376.45      0       0
    custcare_Mean    numeric   365.6667  0        1.90        1.90        0       0
    callwait_Mean    numeric   212.6667  0        1.89        1.89        0       0
    iwylis_vce_Mean  numeric   519.3333  0        8.31        8.31        0       0
    callwait_Range   integer   143       0        1.92        1.92        0       0
    ccrndmou_Range   integer   600       0        7.45        7.45        0       0
    adjqty           integer   92076     0        2895.65     2895.65     0       0
    comp_vce_Mean    numeric   1812.667  0        112.59      112.59      0       0
    plcd_vce_Mean    numeric   2180.333  0        149.80      149.80      0       0
    avg3mou          integer   7270      0        538.48      538.48      0       0
    avgmou           numeric   6329.4    0        493.95      493.95      0       0
    avg3qty          integer   3261      0        185.87      185.87      0       0
    avgqty           numeric   2475.75   0        177.23      177.23      0       0
    crclscod         factor    0         0        0.00        0.00        49      0
    asl_flag         factor    0         0        0.00        0.00        2       0
    models           integer   15        1        1.57        1.57        0       0
    actvsubs         integer   11        0        1.35        1.35        0       0
    uniqsubs         integer   12        1        1.52        1.52        0       0
    drop_vce_Mean    numeric   195.3333  0        6.08        6.08        0       0
    adjmou           numeric   174383.4  0        7707.23     7707.23     0       0
    totrev           numeric   13358.37  9.12     1037.70     1037.70     0       0
    adjrev           numeric   12982.62  8.77     965.30      965.30      0       0
    avgrev           numeric   588.27    1.13     58.37       58.37       0       0
    Customer_ID      integer   1099998   1000004  1050532.09  1050532.09  0       0
    plcd_dat_Mean    numeric   465       0        0.92        0.92        0       0
    churn            factor    0         0        0.00        0.00        2       0
  • 7. Significant Variables, Missing Value Imputation & Outlier Treatment The above variables were removed from the original dataset, and a Boruta model was then built for feature selection. The model identified the significant variables, which were used to further subset the original dataset. The significant variables identified are given below.
  • 8. Correlation plot highlighting the correlated numeric variables. Highly correlated variables were considered for removal to reduce multicollinearity.
  • 9. Significant Variables, Missing Value Imputation & Outlier Treatment After subsetting the original dataset to the significant variables, 50 variables remain, 18 of which still have missing values. The missing values were imputed with the kNN algorithm, using the VIM library. After that, some of the highly correlated variables were removed. Most of the numeric variables had outliers; the boxplots produced when the code is run help visualize which variables are affected. The outliers were treated by winsorizing the variables, using the DescTools library. Some variables were further log transformed. See the R code for more details.
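As a rough sketch of the outlier treatment described above, the following base R code winsorizes a numeric variable and then log transforms it. The slides state that the DescTools library was used; this hand-written `winsorize` helper, the 5%/95% limits, the use of `log1p` and the toy vector `mou` are all assumptions made for illustration only.

```r
# Winsorize: cap values below/above the chosen quantiles at those quantiles.
# The 5% / 95% limits are an assumption, not taken from the project code.
winsorize <- function(x, lower = 0.05, upper = 0.95) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

# Hypothetical usage variable with one extreme outlier.
mou <- c(1:99, 10000)

mou_w   <- winsorize(mou)   # outlier pulled in to the 95th percentile
mou_log <- log1p(mou_w)     # log(1 + x), which stays defined when x is 0
```

Winsorizing keeps every row in the dataset, which matters here because dropping outlier rows would also drop the churn labels attached to them.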
  • 10. Data Visualization Similar trend for customers who churn and those who do not. Higher number of customers with low usage and high mean usage. For customers who churn, the trend is generally decreasing after a value of 30. For customers who do not churn, the trend decreases after a value close to 50. Similar pattern in rev_Range for both customer behaviors. Similar pattern, but differences in densities are noticeable for comparable values.
  • 11. Data Visualization Similar trend for both customer behaviors. Similar pattern in rev_Range for both customer behaviors. Similar pattern, but differences in densities are noticeable for comparable values.
  • 12. Data Visualization – Using Transformed Variables Distinct differentiation is noticeable in customer behavior across models and months of usage, and likewise for call drops, as seen below. Distinct differentiation was achieved for both variables after transformation into two classes based on churn percentage.
  • 14. Model Comparison – The previously built models and an Ensemble Model. An ensemble of five individual models was built, and the final prediction was made by averaging the predictions from the models.
  • 15. Model Comparison with Ensemble Modelling Five Individual models built and final prediction was done by combining the predictions from the models
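The averaging step can be sketched as follows. The slides do not list the five base models in this section, so three logistic regressions on simulated data stand in for them here; the data, the formulas and all object names are hypothetical.

```r
# Simulated data (hypothetical; not the telecom dataset).
set.seed(42)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
churn <- rbinom(n, 1, plogis(-1 + 0.8 * x1 - 0.5 * x2 + 0.3 * x3))
d <- data.frame(churn, x1, x2, x3)

# Stand-in base learners: three logistic regressions on different predictors.
formulas <- list(churn ~ x1, churn ~ x1 + x2, churn ~ x1 + x2 + x3)
models   <- lapply(formulas, function(f) glm(f, family = binomial, data = d))

# One column of predicted churn probabilities per model.
probs <- sapply(models, predict, type = "response")

# Averaging ensemble: the final probability is the row-wise mean.
ensemble_prob <- rowMeans(probs)
```

Averaging probabilities, rather than voting on hard class labels, keeps a continuous score that can still be ranked into deciles or cut at any threshold.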
  • 16. Model Comparison
    - Ten models were tried in order to select the best among them. The evaluation metric comparison is seen above.
    - LG_26 seems to give the best results, with the highest AUC, the second highest accuracy, and relatively good sensitivity and specificity.
    - LG_26 is a logistic regression model with the classification cut-off threshold set at 26%.
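To illustrate what a 26% cut-off does, the sketch below compares accuracy, sensitivity and specificity at 0.26 and at the conventional 0.50 on hypothetical toy predictions. The helper `classify_at` and the data are assumptions for illustration and do not reproduce the LG_26 results.

```r
# Accuracy, sensitivity and specificity at a chosen probability cut-off.
classify_at <- function(actual, prob, threshold) {
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & actual == 1)
  tn <- sum(pred == 0 & actual == 0)
  fp <- sum(pred == 1 & actual == 0)
  fn <- sum(pred == 0 & actual == 1)
  c(accuracy    = (tp + tn) / length(actual),
    sensitivity = tp / (tp + fn),   # share of churners caught
    specificity = tn / (tn + fp))   # share of non-churners left alone
}

# Hypothetical toy data: 1 = churn, 0 = no churn.
actual <- c(1, 0, 0, 1, 0, 0, 1, 0, 0, 0)
prob   <- c(0.80, 0.30, 0.10, 0.55, 0.40, 0.20, 0.35, 0.60, 0.05, 0.15)

m26 <- classify_at(actual, prob, threshold = 0.26)
m50 <- classify_at(actual, prob, threshold = 0.50)
```

On this toy data the lower cut-off catches every churner at the cost of more false alarms, which is the usual trade-off when the minority class is the one of interest.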
  • 17. Variables with an odds ratio greater than 1, i.e. those for which a 1-unit increase in the independent variable raises the odds of churn (described in the slides as more than a 50% probability of changing the customer's decision). The top ten factors for customer churn are:
    1. crclscod (credit class code)
    2. hnd_price (current handset price)
    3. avgqty (average monthly number of calls over the life of the customer)
    4. adjmou (billing-adjusted total minutes of use over the life of the customer)
    5. uniqsubs (number of unique subscribers in the household)
    6. callwait_Range (range of number of call waiting calls)
    7. datovr_Range (range of revenue of data overage)
    8. drop_blk_Mean (mean number of dropped or blocked calls)
    9. drop_vce_Range (range of number of dropped (failed) voice calls)
    10. rev_Range (range of revenue (charge amount))
  • 18. Validation of applicability of the Independent Survey Finding for the Telecom Industry (refer to the Executive Summary) Shown below is the summary of the logistic model used, with each variable's significance status. Variables impacting cost and billing are highly significant. adjmou has one of the top 5 odds ratios. Mean total monthly recurring charge (totmrc_Mean), revenue (charge amount) range (rev_Range), adjmou (billing adjustments) etc. are found to be highly significant, suggesting that cost and billing impact customer behaviour. Similarly, among the network and service quality variables, drop_blk_Mean (mean number of dropped and blocked calls) is highly significant, drop_vce_Range is significant, and callwait_Mean is significant at alpha = 10%. datovr_Range (range of revenue of data overage) is not found to be significant, but it has an odds ratio of more than 1, meaning that a 1-unit increase in its value raises the odds of churn, which suggests the need to pay attention to it. Additionally, the intercept is significant; it captures the effects of the levels of categorical variables that were removed by the model.
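For readers who want to see how odds ratios are read off a logistic regression, here is a minimal base R sketch on simulated data. The predictor names echo the slides but the data are hypothetical, and the fitted values say nothing about the actual project model.

```r
# Simulated data (hypothetical; variable names only echo the slides).
set.seed(7)
n        <- 1000
ovrmou   <- rexp(n, rate = 1 / 40)    # overage minutes
drop_blk <- rpois(n, lambda = 10)     # dropped or blocked calls
churn    <- rbinom(n, 1, plogis(-2 + 0.01 * ovrmou + 0.05 * drop_blk))

fit <- glm(churn ~ ovrmou + drop_blk, family = binomial)

# Odds ratio = exp(coefficient): the factor by which the odds of churn
# are multiplied for a 1-unit increase in the predictor.
odds_ratios <- exp(coef(fit))

# An odds ratio is not a probability. To turn odds into a probability,
# use p = odds / (1 + odds); odds of exactly 1 correspond to p = 0.5.
odds_to_prob <- function(odds) odds / (1 + odds)
```

An odds ratio above 1 therefore corresponds to a positive coefficient, meaning the predictor pushes customers towards churn; how far the predicted probability moves depends on where the customer starts.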
  • 19. Recommend rate plan migration as a proactive retention strategy? The analysis below suggests the answer is “Yes”. mou_Mean (minutes of usage) is one of the highly significant variables, as seen in the previous slide, so it makes sense to work proactively with customers to increase their MOU so that they are retained for a longer period. The boxplot below also suggests that customers who are retained tend to have higher MOU. Additionally, mouR_Factor, a variable derived from mou_Range, is found to be highly significant. Change in MOU is also highly significant: Change_mF is a variable derived from change_mou. To complement the above, ovrmou_Mean is also a highly significant variable with an odds ratio of more than 1. Its coefficient estimate is positive, indicating that an increase in overage increases churn. It would help if the company worked with customers and, based on their usage, migrated them to optimal rate plans to avoid overage charges.
  • 20. Use of the model for prioritisation of customers for proactive retention campaigns Gains – Lift Chart The lift achieved helps the company reach churn candidates by targeting a smaller share of its total customer pool. The highest gain is achieved in the first 3 deciles, which capture about 33% of the customers who are likely to terminate their services. In other words, the company selects 30% of its entire customer database and that covers 33% of the people who are likely to leave. This is better than calling customers at random in the absence of a model, which would be expected to reach only about 30% of potential churn candidates with the same number of calls, and it is consistent with the cumulative lift of 110% reported in the Executive Summary.
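The arithmetic behind a gains and lift table can be sketched in base R as below. The project itself uses the gains package; this hand-rolled `gains_table` helper and the simulated scores are assumptions for illustration only.

```r
# Cumulative gains and lift by decile, with customers ranked from the
# highest to the lowest predicted churn probability.
gains_table <- function(actual, prob, groups = 10) {
  ord    <- order(prob, decreasing = TRUE)
  actual <- actual[ord]
  n      <- length(actual)
  decile <- ceiling(seq_len(n) * groups / n)
  churners  <- as.vector(tapply(actual, decile, sum))
  customers <- as.vector(table(decile))
  cum_pct_customers <- cumsum(customers) / n * 100
  cum_pct_churners  <- cumsum(churners) / sum(actual) * 100
  data.frame(decile = seq_len(groups),
             churners = churners,
             cum_pct_customers = cum_pct_customers,
             cum_pct_churners  = cum_pct_churners,
             # 100 means no better than random targeting
             cum_lift = cum_pct_churners / cum_pct_customers * 100)
}

# Simulated scores and outcomes (hypothetical data).
set.seed(3)
n      <- 1000
prob   <- runif(n)
actual <- rbinom(n, 1, prob)

g <- gains_table(actual, prob)
```

With this definition, capturing 33% of churners in the top 30% of customers gives a cumulative lift of 33 / 30 × 100 = 110, matching the figure quoted in the slides.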
  • 21. Identification of the customers with the highest probability of terminating services The 20% of customers who need to be worked with proactively to prevent churn were identified. They are the customers whose probability of churn is greater than 32.24% and at most 84.71%. The code to get the customer details is given below.
    library(gains)   # provides gains()
    library(dplyr)   # provides %>%, filter() and select()
    # Gains table in 10 groups; column 42 is dropped from newdata (presumably the churn column)
    gains(as.numeric(Telecom_Winsor$churn), predict(LGMF, type = "response", newdata = Telecom_Winsor[, -42]), groups = 10)
    Telecom_Winsor$Cust_ID <- mydata$Customer_ID
    Telecom_Winsor$prob <- predict(LGMF, type = "response", newdata = Telecom_Winsor[, -42])
    # Decile cut-offs of the predicted churn probability
    quantile(Telecom_Winsor$prob, probs = c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1))
    # Keep the customers whose probability falls between the chosen cut-offs
    targeted <- Telecom_Winsor %>% filter(prob > 0.3224491 & prob <= 0.8470540) %>% dplyr::select(Cust_ID)