Predicting Customer Churn at QWE Inc.
Group 11:
Qiang Gong
Jiaxuan Han
Meghan Hickey
Xiangheng Ma
Yawen Yang
ID      Days Since Last Login 0-1    Probability of Churn
354     -1                           4.49%
672     2                            4.83%
5203    5                            5.19%
Executive Summary
In this paper, we estimate the probability that each of QWE Inc.’s customers will churn by the end of February 2012. We
use a mix of modelling methods to classify customers by their likelihood of churning. Based on logistic regression
analysis, we found that the Customer Happiness Index (CHI) in December, the CHI change from November to December and
the change in Days Since Last Login between November and December are the three most important factors QWE can use to
predict a customer’s probability of terminating the contract. We explain the application of this method and then look
at how a decision tree approach further validates the findings from logistic regression. To demonstrate our findings,
we use customers 354, 672 and 5203 as illustrations throughout the paper and discuss how the different modelling
techniques apply to each of these three cases. We also provide a list of the 10 customers with the highest churn
probability so that the company can be more proactive in retaining them.
Analysis:
Logistic Regression Analysis
Single Variable Approach
For QWE, Inc., the purpose is to predict whether a customer
will terminate their contract based on certain factors, and to
assess the importance of those factors, so that QWE has an
accurate model for future reference. The first step of our
analysis was to run a logistic regression, because the method
suits a dataset with one or more predictor variables and a
dichotomous outcome (here, churn or no churn).
To find the best predictor, we first determined which
variables are significantly related to the outcome.
We evaluated the variables by their p-values and their
standardized coefficients, and filtered out the following
variables, which are not significant: Age (how long the
customer has been with QWE), Support Case 0-1 (the change
in service requests from Nov. to Dec.), SP 0-1 (the change
in the seriousness of cases reported from Nov. to Dec.),
blog articles 0-1 (the change in blog articles posted
from Nov. to Dec.) and views 0-1 (the change in views
from Nov. to Dec.).
To illustrate, we calculated the probability of churn for
customers 354, 672 and 5203 based on the change in Days
Since Last Login between Nov and Dec. Their probabilities
of churn are listed in the accompanying table.
We can see that the probability of churn is positively
related to the change in Days Since Last Login between
Nov and Dec: the smaller the change, that is, the more
active a customer becomes, the lower the probability that
the customer will churn. We also noticed that the effect is
small: even when a customer becomes more active, the
probability of churn does not decrease by much. All three
probabilities are low.
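As a rough check on the single-variable figures, the sketch below back-calculates a logistic curve from the three table rows (IDs 354, 672 and 5203). It is an illustration only, not the SPSS output: the intercept and slope are recovered by a least-squares line through the log-odds of the three published probabilities, and the names `table`, `logit`, `fit_line` and `churn_probability` are ours.

```python
import math

# Rows from the single-variable table:
# (ID, Days Since Last Login 0-1, probability of churn)
table = [(354, -1, 0.0449), (672, 2, 0.0483), (5203, 5, 0.0519)]

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

xs = [row[1] for row in table]
ys = [logit(row[2]) for row in table]
# Back-calculated from the table, not the coefficients reported by SPSS.
intercept, slope = fit_line(xs, ys)

def churn_probability(days_change):
    """Logistic curve: p = 1 / (1 + exp(-(a + b*x)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * days_change)))

for cid, x, p in table:
    print(cid, x, f"table={p:.2%}", f"curve={churn_probability(x):.2%}")
```

The recovered slope is positive but small (about 0.025 in log-odds per day), which is consistent with the observation that becoming more active lowers the predicted probability only slightly.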
We then compared the absolute values of the standardized
coefficients of the remaining variables. The higher the
absolute value, the bigger the impact the variable has on
the predicted churn probability. “Days Since Last Login
0-1” has the largest absolute value, so we conclude that
it is the most impactful single predictor. This makes
sense intuitively, because the variable measures the
change in recency between Nov and Dec, which tells us
whether a customer has become more or less active.
• USE these probability predictions to help QWE improve
their business
• FIND the best predictor of the probability of churn
• ENABLE QWE to use the wealth of data they possess to
identify customers who are most likely to leave
We then used the Receiver Operating Characteristic (ROC)
curve to evaluate the performance of the single-variable
classifier (see the term explanations of ROC, TPR and FPR).
We started with the variable Days Since Last Login 0-1 to
see whether a logistic regression model with this single
variable is accurate enough. After feeding the model
customer behavior data, we generated probabilities of churn
for all customers in the database. We then ran an ROC
analysis in SPSS and obtained the following graph:
If an ROC curve falls below the diagonal benchmark (the
curve produced by random guessing), the model performs
poorly there, because it does worse than a random guess.
In the case of QWE, one part of the curve does go under the
benchmark. Moreover, the AUC of 0.589 is only slightly
bigger than 0.5, which is the AUC of the diagonal. Combining
these two facts, we conclude that the predictive model using
Days Since Last Login 0-1 alone is not sensitive enough and
does not predict outcomes very accurately.
In fact, when performing the ROC analysis for the
remaining variables, we found that all AUC values are
similarly low (slightly higher than 0.5). Hence, using a
single variable to predict the customer churn probability
may not give us the best result. We think it is necessary to
devise a better model that looks at multiple factors at once,
so that QWE can track the behavior of its customers more
accurately and understand what it means.
Three Factors Approach
Based on the previous logistic regression analysis, we
selected the six variables with the highest significance
(CHI Month 0, CHI 0-1, Support Cases Month 0, Support
Priority 0-1, logins 0-1 and Days Since Last Login 0-1) for
inclusion in a multi-variable logistic regression model.
However, we found that three of them have no significant
impact on the predicted probability, so we kept only the
three that do. With this method, we determined Customer
Happiness Index in December (CHI Month 0), the CHI change
from November to December (CHI 0-1) and the change in Days
Since Last Login between November and December (Days Since
Last Login 0-1) to be the three best factors for our
prediction model, because they contribute the most to the
churn probability.
To illustrate, we used this updated model with three key
variables instead of just one to calculate the churn
probability for customers 354, 672 and 5203. Their
probabilities of churn are 4.73%, 3.59%, and 4.46%
respectively, which are pretty low.
In the ROC graph, the y-axis is TPR and the x-axis is FPR.
(SPSS labels them sensitivity and 1 - specificity
respectively, but these are the same quantities.) The
diagonal line represents the benchmark ROC curve of a
simple guessing method (i.e. flipping a coin) for
predicting a positive or negative outcome.
Term: ROC
ROC is one of the most commonly used methods for measuring
whether a classification is effective, and it has important
advantages. First, it gives you the true positive rate (TPR)
and false positive rate (FPR) at all possible cutoff points
rather than at just one specific cutoff point. Second, the
area under the curve (AUC) is a useful single metric that
summarizes how well the model separates the two classes
overall.
Term: TPR and FPR
TPR denotes the percentage of customers who actually
churned that we correctly predicted would churn. FPR
denotes the percentage of customers who did not churn in
reality that we wrongly predicted would churn.
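To make the two terms concrete, here is a minimal sketch of how an ROC curve and its AUC can be computed by hand. The toy labels and scores are hypothetical and are not taken from the QWE dataset; the function names `roc_points` and `auc` are ours, and SPSS may handle ties and interpolation differently.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per distinct cutoff, starting at (0, 0).

    labels: 1 = churned, 0 = stayed; scores: predicted churn probabilities.
    A customer is predicted to churn when score >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical toy data: three churners and three non-churners.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(points)
print("AUC =", round(auc(points), 3))
```

On this toy data the AUC is 8/9: of the nine churner/non-churner pairs, eight rank the churner higher. A model that ranks at random would score about 0.5, the area under the diagonal.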
ID                           354      672      5203
CHI Month 0                  139      148      37
CHI 0-1                      -29      1        32
Days Since Last Login 0-1    -1       2        5
Probability of Churn         4.73%    3.59%    4.46%
Based on the churn probabilities from the updated logistic
regression model, we created a list of the ten customers with
the highest likelihood of leaving the company. The table below
lists the IDs and corresponding churn probabilities for these
ten risky customers. Note that in our dataset the predicted
probability ranges from 0% to 22.4%. In other words, even
though the absolute probabilities are nowhere near 90% or 100%,
they are large enough to indicate the risk of churn when
customers are compared with one another. Being able to identify
these risky customers is a huge opportunity for QWE, because
it will allow the company to understand the specific forces
that lead to churn. With that information, QWE can act early,
knowing that a customer is likely to terminate their contract
before the customer has even made that decision.
The upgraded logistic regression model also improved on
the ROC curve. Its AUC of 0.634 is higher than that of
any single-variable model, and every part of the curve
lies above the diagonal. We find this model much more
suitable for QWE’s prediction needs. It is not one factor
but a combination of factors that leads to churn, and the
model that predicts it must reflect that nuance.
10 Risky Customers
Customer IDs: 1971, 2076, 1287, 3671, 1929, 4245, 1236, 1616, 2546
Possibility of Churn: 22%, 21%, 19%, 18%, 17%, 16%, 16%, 16%, 16%, 16%
109
Precision 19.25%
Accuracy 87.96%
TPR 42.72%
FPR 9.61%
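The four metrics above, which are those reported for the decision tree in the section that follows, are linked through the share of customers who actually churn. As a hedged arithmetic check (our own derivation, not part of the original analysis), the sketch below solves the precision identity for that implied base rate and then recomputes accuracy from TPR and FPR. The variable names are illustrative.

```python
# Decision tree metrics as reported in the table.
precision, accuracy, tpr, fpr = 0.1925, 0.8796, 0.4272, 0.0961

# precision = tpr*p / (tpr*p + fpr*(1 - p)), solved for the churn base rate p.
base_rate = precision * fpr / (tpr - precision * tpr + precision * fpr)

# accuracy = tpr*p + (1 - fpr)*(1 - p): correct churn calls plus correct stay calls.
implied_accuracy = tpr * base_rate + (1 - fpr) * (1 - base_rate)

# Accuracy of a trivial rule that predicts "no churn" for every customer.
always_stay_accuracy = 1 - base_rate

print(f"implied base rate:    {base_rate:.2%}")
print(f"implied accuracy:     {implied_accuracy:.2%}")
print(f"always-stay accuracy: {always_stay_accuracy:.2%}")
```

Under these identities the implied base rate is about 5%, and the recomputed accuracy agrees with the reported 87.96%. A rule that never predicts churn would score about 95% accuracy, which is why accuracy alone is a weak yardstick when churn is rare.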
Decision Tree Analysis
In order to give QWE the most thorough recommendations possible, we modeled the data with a second
approach. Decision trees are a visual way to predict whether a customer will churn, generating a
clear, specific path of rules that can easily be understood. The following image is the decision tree we
obtained from R using the QWE case data. After manually walking customers 354, 672 and 5203 through this
decision process one by one, we predict that none of these three customers will leave.
This method is useful for showing which variables are key influencers of churn and at which breaking points the
data partitions into meaningful patterns. The higher a variable (node) sits in the tree, the more important the
variable. Both the decision tree and the logistic regression pick Days Since Last Login as the best predictor.
We also evaluated the performance of the decision tree using several accuracy metrics. Although its TPR is only
about average, its Accuracy (88%) is much higher than that of logistic regression. However, this reflects the
nature of decision trees: a tree tends to fit the training dataset as closely as possible, so that its accuracy on
that data can even reach 100%. That is to say, given new customers, the model may predict poorly. Moreover, a
decision tree is extremely sensitive to small changes in the dataset: the structure of the tree changes
correspondingly. In reality such changes are likely, because some customers may edit their profiles and change some
information. In contrast to these two downsides of the decision tree approach, logistic regression can be tailored
to particular business circumstances. In this case, a different cutoff point can be set depending on how the manager
weighs the cost of losing a customer against the cost of retaining one. In conclusion, we recommend that QWE Inc.
adopt the logistic regression approach.
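The cutoff trade-off described above can be sketched as follows. The costs are hypothetical, and the rule assumes, as a simplification of ours, that a retention offer always keeps the customer; under that assumption it pays to contact a customer whenever probability × cost of losing exceeds the cost of the offer. The probabilities are the three-factor model values reported for customers 354, 672 and 5203.

```python
# Three-factor model probabilities reported for the illustration customers.
churn_probability = {354: 0.0473, 672: 0.0359, 5203: 0.0446}

# Hypothetical costs (assumptions for illustration, not from the QWE case).
cost_of_losing = 1200.0    # value lost if the customer churns
cost_of_retaining = 50.0   # cost of one retention offer

# Contact a customer when p * cost_of_losing > cost_of_retaining,
# i.e. when p exceeds the break-even cutoff below.
cutoff = cost_of_retaining / cost_of_losing

flagged = sorted(cid for cid, p in churn_probability.items() if p > cutoff)
print(f"cutoff = {cutoff:.4f}, contact customers: {flagged}")
```

With these made-up costs the break-even cutoff is about 4.2%, so customers 354 and 5203 would be contacted and 672 would not. A more expensive offer or a less valuable customer moves the cutoff up and shrinks the contact list.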
Factors                      Change in this factor    How possibility of churn will be affected
CHI Month 0
CHI 0-1
Days Since Last Login 0-1
Recommendation:
Based on our analysis, we recommend that QWE treat the
Customer Happiness Index in December, the CHI change
from November to December and the change in Days Since
Last Login as the three most important predictors of
churn. Focusing on these variables will allow QWE to
concentrate on the customers who are in the highest danger
of churning and to identify the points at which its
business might fail and these customers might leave.
This knowledge can be applied to strategy in all areas of
the business: marketing, product management, etc. The
models we created will help QWE tighten up its business
and better understand its customers and their behavior.
One specific example of strategy is a customer service
outreach program in which QWE targets the ten riskiest
customers and sends service representatives to engage
with them and offer them incentives to stay with the
company.
Through logistic regression, we found a specific
association between these three factors and the possibility
of churn:
Using our knowledge of these three priority variables, we
have devised the following recommendations for QWE’s
business operations:
Enhance user experience to increase the Customer
Happiness Index. To achieve this goal, QWE can take
measures such as making the user interface friendlier and
speeding up loading times.
Increase user cohesiveness and interaction to
improve customer login recency. It is critical to maintain
users’ level of activity on the platform. There is a clear
benefit to customers being more active on the site, in terms
of both content creation and the sheer volume of activity.
For example, QWE can use better calls to action to encourage
traffic. In addition, making the service more mobile-friendly
will help increase frequency of use.
