- 1. Satyam Barsaiyan Great Lakes Institute of Management, Chennai
- 2. Predictive modelling using the CART & Logistic Regression algorithms: What is churn rate & how does it affect companies? Data collection and descriptive statistics. Comparison between the CART & Logistic Regression models and final recommendation.
- 3. Fig 1.1: Venn diagram of High Value Customers, Customers likely to churn, and their overlap (High Value Customers likely to churn).
- 4. Table 1.1: snapshot of the dataset used in the analysis (first 10 of 5000 rows). Columns: state, account_length, area_code, international_plan, voice_mail_plan, number_vmail_messages, total_day_minutes, total_day_calls, total_day_charge, total_eve_minutes, total_eve_calls, total_eve_charge, total_night_minutes, total_night_calls, total_night_charge, total_intl_minutes, total_intl_calls, total_intl_charge, number_customer_service_calls, churn. Table 1.2: data set dimensions: Churn: 5000 observations x 20 variables; Train_Churn: 3333 x 20; Test_Churn: 1667 x 20. The data set used in this analysis is the churn data embedded in the C50 package. It consists of 5000 observations and 20 variables, of which 19 are predictor variables and 1 is the response variable. The data set is partitioned into Train and Test in the ratio 2:1 (two-thirds train, one-third test).
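The 2:1 train/test partition described above can be sketched in Python (illustrative only; the original analysis was done in SAS, and the seed here is an arbitrary assumption):

```python
import random

def train_test_split(n_rows, train_frac=2/3, seed=42):
    """Partition row indices into train and test sets by the given fraction."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)                 # random assignment of rows
    cut = round(n_rows * train_frac)     # 2/3 of the rows go to train
    return indices[:cut], indices[cut:]

train_idx, test_idx = train_test_split(5000)
print(len(train_idx), len(test_idx))  # 3333 1667
```

With 5000 observations this yields exactly the 3333/1667 split reported in Table 1.2.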
- 5. Description, Role & Class of Variables in the Dataset (Table 1.3):
churn (Response, Binary): 0 = customer did not leave the service provider, 1 = customer left the service provider (DV)
state (Predictor, Nominal): state to which the customer belongs (IV)
account_length (Predictor, Numeric): number of days the customer has been with the service provider (IV)
area_code (Predictor, Nominal): area within each state (IV)
international_plan (Predictor, Categorical): Yes (1) = international plan, No (0) = no international plan (IV)
voice_mail_plan (Predictor, Categorical): Yes (1) = active voice mail plan, No (0) = no voice mail plan (IV)
number_vmail_messages, total_day_minutes, total_day_calls, total_day_charge, total_eve_minutes, total_eve_calls, total_eve_charge, total_night_minutes, total_night_calls, total_night_charge, total_intl_minutes, total_intl_calls, total_intl_charge, number_customer_service_calls (Predictor, Numeric): self-explanatory (IV)
DV: Dependent Variable; IV: Independent Variable
Table 1.3 lists the Class, Role and Description of each variable. Churn is the response variable (dependent variable) and the remaining 19 variables are predictors (independent variables). We use all 19 predictors for modelling. Before modelling, we compute descriptive statistics to gain a fair idea of the influence of each variable on churn.
- 6. The next step in model building is descriptive statistics, to get an idea of which predictor variables are likely to be significant; this will eventually be validated by the model (Fig 1.2, Fig 1.3). First and foremost is the calculation of summary statistics, for which we have PROC MEANS in SAS; to better understand the effect of individual predictor variables on churn, we have used box plots, a few of which are shown in the figures. In these two box plots we can clearly see that the distribution of total_day_charge differs significantly between Churn and No-Churn; similarly, the distribution of number_customer_service_calls (number of customer service calls) differs significantly between Churn and No-Churn.
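The PROC MEANS-style comparison above can be sketched in Python (an illustrative equivalent, not the SAS run; the toy records below are invented for demonstration):

```python
from statistics import mean

# Toy (total_day_charge, churn) records; values are made up for illustration.
records = [(45.1, 1), (27.5, 0), (41.4, 1), (50.9, 1), (28.3, 0), (26.7, 0)]

def group_means(rows):
    """Mean total_day_charge for churners (1) vs non-churners (0)."""
    by_group = {0: [], 1: []}
    for charge, churn in rows:
        by_group[churn].append(charge)
    return {g: mean(vals) for g, vals in by_group.items()}

means = group_means(records)
# In this toy sample, churners have a clearly higher mean day charge,
# mirroring the separation visible in the box plots.
```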
- 7. Fig 1.4: In continuation, to understand the effect of a nominal variable like "state", we used Tableau to generate an area map based on longitude and latitude information. From the area map we can clearly see that churn is significantly high in a few states, such as New Jersey (NJ) followed by Texas (TX). Now that we have a fair idea of the relative importance of each variable and have completed the data preparation stage, we shift our focus to the most important part of the analysis: modelling.
- 8. Predictive Model Using the CART (Classification and Regression Tree) Algorithm. Tree-based learning algorithms are considered among the best and most widely used supervised learning methods. Tree-based methods give predictive models high accuracy, stability and ease of interpretation. Unlike linear models, they map non-linear relationships quite well, and they adapt to either kind of problem at hand (classification or regression). A decision tree is a type of supervised learning algorithm (having a pre-defined target variable) that is mostly used in classification problems. It works for both categorical and continuous input and output variables. In this technique, we split the population or sample into two or more homogeneous sets (sub-populations) based on the most significant splitter/differentiator among the input variables. Terminology associated with the decision tree (Fig 1.5): Root Node: represents the entire population or sample, which is further divided into two or more homogeneous sets. Decision Node: a sub-node that splits into further sub-nodes. Leaf/Terminal Node: a node that does not split. Pruning: removing sub-nodes of a decision node; the opposite of splitting. Branch/Sub-Tree: a sub-section of the entire tree.
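The splitting idea described above can be illustrated with a one-level "stump" in Python (a minimal sketch, not the SAS model; the threshold and data are invented):

```python
def split_node(rows, threshold):
    """Split (value, label) rows into left/right child nodes at a threshold."""
    left = [r for r in rows if r[0] <= threshold]
    right = [r for r in rows if r[0] > threshold]
    return left, right

# Toy (total_day_charge, churn) pairs: higher day charges churn here.
rows = [(10, 0), (12, 0), (15, 0), (40, 1), (45, 1), (50, 1)]
left, right = split_node(rows, 30)
# left contains only non-churners and right only churners: a perfectly
# homogeneous split, which is exactly what the splitter criterion seeks.
```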
- 9. SAS Code for CART (Classification & Regression Tree). PROC HPSPLIT: the SAS procedure that builds tree-based statistical models for classification and regression (Fig 1.6). GROW statement: specifies the criterion used to minimize each node's error; entropy is the most common choice when growing a classification tree, and Gini is another popular criterion. PRUNE statement: specifies the method for pruning the tree into a smaller sub-tree; the most common method is cost-complexity pruning, in which the algorithm trades off complexity against error rate.
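The two growing criteria mentioned above have simple closed forms for a binary node; a Python sketch (standard formulas, independent of SAS):

```python
from math import log2

def entropy(p):
    """Shannon entropy of a binary node with churn proportion p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node has zero impurity
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def gini(p):
    """Gini impurity of a binary node with churn proportion p."""
    return 2 * p * (1 - p)

# Both criteria peak at p = 0.5 (maximally mixed node) and are 0 for pure
# nodes; the GROW statement picks the split that reduces them the most.
print(entropy(0.5), gini(0.5))  # 1.0 0.5
```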
- 10. Results for CART (Classification & Regression Tree). In Table 1.4, the split criterion and pruning method are as specified in our code, and the model event level is '0', which means the model is predicting No-Churn. Fig 1.7 plots ASE (average square error), i.e. the average misclassification rate, against cost-complexity. The vertical reference line marks the tree with minimum ASE, in this case the tree with 19 leaves.
- 11. Fig 1.8 (complete tree) and Fig 1.9 (pruned sub-tree). From Fig 1.9 we can clearly see the 4-level sub-tree extracted from the complete tree shown in Fig 1.8. The first level of splitting is based on total_day_charge, followed by number_customer_service_calls and voice_mail_plan at the second level.
- 12. Fig 1.10: ROC curve for dummy_churn (training), AUC = 0.91. From Table 1.4 we can see that the model classifies No-Churn as No-Churn with an error rate of 1.16% (Type I error) and Churn as Churn with an error rate of 23.81% (Type II error). Total misclassification is 4.45%, i.e. the total accuracy of this model is 95.55%, which is good. From Table 1.5, we can see that of the 19 predictor variables only 9 are significant for model building; their relative importance, in decreasing order, is shown in the table.
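The error rates above come straight from a confusion matrix; a Python sketch of the arithmetic (the cell counts below are an assumption, chosen only to approximately reproduce the reported rates, since the slides give rates but not counts):

```python
def classification_errors(tn, fp, fn, tp):
    """Type I, Type II and overall misclassification rates from a confusion matrix."""
    type1 = fp / (tn + fp)                    # No-Churn wrongly flagged as Churn
    type2 = fn / (fn + tp)                    # Churn missed as No-Churn
    misclass = (fp + fn) / (tn + fp + fn + tp)
    return type1, type2, misclass

# Hypothetical counts for the 3333-row training set; NOT from the slides.
t1, t2, mis = classification_errors(tn=2817, fp=33, fn=115, tp=368)
# t1 ~ 1.16%, t2 ~ 23.81%, mis ~ 4.44%: close to the reported figures.
```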
- 13. Introduction to Logistic Regression. What is logistic regression? Logistic regression is a classification algorithm used to predict a binary outcome (1/0, Yes/No, True/False) given a set of independent variables. To represent a binary/categorical outcome, we use dummy variables. We can also think of logistic regression as a special case of linear regression in which the outcome variable is categorical and the log of odds is used as the dependent variable. In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function. Important points about GLMs (Generalized Linear Models): Logistic regression belongs to the larger class of algorithms known as Generalized Linear Models (GLMs). A GLM does not assume a linear relationship between the dependent and independent variables; in the logit model it assumes a linear relationship between the link function and the independent variables. The dependent variable need not be normally distributed. Parameters are estimated not by OLS (ordinary least squares) but by maximum likelihood estimation (MLE). Errors need to be independent but not normally distributed. Performance measures for a logistic regression model: AIC (Akaike Information Criterion): the analogue of adjusted R² in logistic regression; AIC is a measure of fit that penalizes the model for its number of coefficients, so we always prefer the model with the minimum AIC value. Confusion matrix: a tabular representation of actual vs predicted values, which helps us find the accuracy of the model and avoid overfitting.
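The logit link and AIC mentioned above can be written out directly; a Python sketch of the standard formulas:

```python
from math import exp, log

def logistic(z):
    """Inverse of the logit link: maps log-odds z to a probability in (0, 1)."""
    return 1 / (1 + exp(-z))

def logit(p):
    """Log-odds (the logit link function) of a probability p."""
    return log(p / (1 - p))

def aic(n_params, log_likelihood):
    """Akaike Information Criterion: 2k - 2*ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood

# logit and logistic are inverses; a log-odds of 0 means probability 0.5.
print(logistic(0.0))  # 0.5
```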
- 14. SAS code for Logistic Regression The PROC LOGISTIC statement invokes the LOGISTIC procedure and optionally identifies input and output data sets, suppresses the display of results, and controls the ordering of the response levels. Table 1.6
- 15. Results for Logistic Regression. Important results obtained for the logistic regression algorithm are shown in Tables 1.6, 1.7 and 1.8. From Table 1.6, we can see that the model is built with the response variable 'Churn' and the optimization technique used is Fisher's scoring. AIC is a measure of model performance, and the high AIC value in this case indicates a loose fit, i.e. the accuracy of the model is expected to be low. From the Maximum Likelihood Estimates table we can see that the predictor variables circled in red are significant at the 95% confidence level.
- 16. Final model, based on the results in the Maximum Likelihood Estimates (Table 1.8): Logit = -8.6514 + 2.0427*(international_plan) - 2.0248*(voice_mail_plan) + 0.0359*(number_vmail_messages) - 0.0930*(total_intl_calls) + 16.3896*(total_intl_charge) + 0.5136*(number_customer_serv). Confusion matrix on train data (Table 1.9) and on test data (Table 1.10). Overall accuracy on the train data is 89.19%, with a Type II error of 78.46%, which is very high. Overall accuracy on the test data is 87.40%, with a Type II error of 80.80%, which is very high. So overall accuracy looks fine, but the Type II error is very high.
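The fitted equation above can be used to score a customer directly; a Python sketch using the coefficients from this slide (the example customer's input values are invented for illustration):

```python
from math import exp

def churn_probability(international_plan, voice_mail_plan, number_vmail_messages,
                      total_intl_calls, total_intl_charge, number_customer_serv):
    """Score a customer with the final fitted logit and return P(churn)."""
    z = (-8.6514
         + 2.0427 * international_plan
         - 2.0248 * voice_mail_plan
         + 0.0359 * number_vmail_messages
         - 0.0930 * total_intl_calls
         + 16.3896 * total_intl_charge
         + 0.5136 * number_customer_serv)
    return 1 / (1 + exp(-z))

# Hypothetical customer: international plan, no voice mail, 3 intl calls,
# 0.5 intl charge; each extra customer-service call raises churn probability.
p4 = churn_probability(1, 0, 0, 3, 0.5, 4)
p5 = churn_probability(1, 0, 0, 3, 0.5, 5)
```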
- 17. Conclusion & Recommendation. Overall accuracy achieved by the CART model is 95.55% with a Type II error of 23.81%. Overall accuracy achieved by the logistic regression model is approximately 87%, with a Type II error as high as 80.80%. Based on these two key observations, we recommend using CART for telecom churn. Key advantages of CART: Easy to understand: decision tree output is very easy to understand, even for people from a non-analytical background; it does not require statistical knowledge to read and interpret, and its graphical representation is very intuitive, so users can easily relate it to their hypotheses. Less data cleaning required: it requires less data cleaning than some other modelling techniques and is influenced by outliers and missing values only to a fair degree. Data type is not a constraint: it can handle both numerical and categorical variables. Non-parametric method: a decision tree is considered a non-parametric method, meaning it makes no assumptions about the space distribution or the classifier structure. Disadvantages: Overfitting: overfitting is one of the most practical difficulties for decision tree models; it is addressed by setting constraints on model parameters and by pruning (discussed earlier). Not fit for continuous variables: while working with continuous numerical variables, a decision tree loses information when it bins them into different categories.
- 18. References: https://support.sas.com/documentation https://www.analyticsvidhya.com/blog/category/sas/ https://www.r-bloggers.com/a-brief-tour-of-the-trees-and-forests/
