2. Problem Statement
Identify patterns of rides and specific customer groups based on ride data, and make recommendations to the
founders regarding specific plans of whom to offer a discount and when.
Insights:
Most travellers like to ride around 2 pm, 4 pm and 11 pm: the average
ride duration was found to be highest during those periods.
Of all the age groups, travellers aged 36-45 have the highest riding
duration, possibly because they are office-going users. Offering them
a discount would be a good move.
In terms of frequency (number of travellers), however, the 0-17 group
visits the most, as this consumer base is more experience-driven and
wants to explore further.
Another set of customers is Casual users, 18-35 (Male) and 18-35 and
36-45 (Female), who ride frequently but are not members. They can be
targeted with discounts during peak times (2 pm and 4 pm) and are a
valuable prospect.
Chart: Travellers (share by age group): 0-17: 25%; 18-35: 31%; 36-45: 24%; 46-50: 17%; 50+: 3%
Chart: Average ride duration (by age group): 0-17: 32.17; 18-35: 32.49; 36-45: 33.29; 46-50: 30.14; 50+: 31.22
Chart: Avg Duration wrt Time (x-axis: Time, y-axis: AvgDuration)
Chart: Days of the week Vs Duration
On average, people spend more time riding on weekends, when they are
looking for fun and convenience and want to spend time with their
family.
Our analysis also showed that among females, the 18-35 age group has
the maximum usage time and count; among males, the same age group
drives the most traffic. Since demand is highest at 2 pm, 4 pm and
11 pm, these groups can be targeted with discounts at those times:
they already use the service heavily, and offering more will increase
not only their liking for the brand but also their brand loyalty.
Wealcan, Betcha can’t stop with one ride
3. Cleaning
We found many blank entries in the dataset and replaced them with the
average value of that column for the respective sub-segment.
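The cleaning step can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the column name `sub_segment` and the sample values are assumptions made for the example, and blanks are represented as `None`.

```python
from collections import defaultdict


def impute_by_segment_mean(rows, segment_key, column):
    """Return copies of rows with blanks (None) in `column` replaced by
    the mean of that column within the same sub-segment."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        value = row[column]
        if value is not None:
            totals[row[segment_key]] += value
            counts[row[segment_key]] += 1
    means = {segment: totals[segment] / counts[segment] for segment in counts}

    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[column] is None:
            # Stays None when the whole sub-segment has no observed value.
            new_row[column] = means.get(new_row[segment_key])
        filled.append(new_row)
    return filled


# Hypothetical sample records (values invented for illustration).
customers = [
    {"sub_segment": "Credit Unions", "Tenure_months": 24.0},
    {"sub_segment": "Credit Unions", "Tenure_months": None},
    {"sub_segment": "Credit Unions", "Tenure_months": 36.0},
    {"sub_segment": "Small Business", "Tenure_months": 10.0},
    {"sub_segment": "Small Business", "Tenure_months": None},
]
cleaned = impute_by_segment_mean(customers, "sub_segment", "Tenure_months")
```

Computing the mean per sub-segment rather than over the whole column keeps, for example, a Credit Union's blank from being filled with a value dominated by Small Business accounts.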
New Variables
We suspected that a change in the Sales Person, Regional Sales Manager
or Director might have a significant impact on the attrition rate, and
hence created new variables for them.
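One way such change variables might be derived is shown below. The flag names `SalesPersonChange`, `RSMChange` and `DirectorChange` appear in the deck; the input column names (`SalesPerson_Jun`, `SalesPerson_Dec`, and so on) and the sample account are hypothetical, chosen only for this sketch.

```python
def add_change_flags(record, roles=("SalesPerson", "RSM", "Director")):
    """Return a copy of record with a 0/1 flag per role that is 1 when
    the person in that role differs between June and December."""
    flagged = dict(record)
    for role in roles:
        june = record[f"{role}_Jun"]
        december = record[f"{role}_Dec"]
        flagged[f"{role}Change"] = int(june != december)
    return flagged


# Hypothetical account: only the Regional Sales Manager changed.
account = {
    "SalesPerson_Jun": "SP-014", "SalesPerson_Dec": "SP-014",
    "RSM_Jun": "RSM-03", "RSM_Dec": "RSM-07",
    "Director_Jun": "DIR-01", "Director_Dec": "DIR-01",
}
flagged_account = add_change_flags(account)
```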
Cross Validation
We divided the data into two parts to perform Stepwise Logistic
Regression, selecting the data so that alternate blocks of 1000
entries fall in each set.
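The split can be illustrated with a short sketch. It assumes that "alternate 1000 entries" means consecutive blocks of 1000 rows are assigned to the two sets in turn; the function name and the stand-in data are hypothetical.

```python
def alternate_block_split(rows, block_size=1000):
    """Assign consecutive blocks of `block_size` rows alternately to the
    first half and the next half."""
    first_half, next_half = [], []
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        if (start // block_size) % 2 == 0:
            first_half.extend(block)
        else:
            next_half.extend(block)
    return first_half, next_half


# Stand-in for 4000 customer records: just their row indices.
rows = list(range(4000))
first_half, next_half = alternate_block_split(rows)
```

With 4000 rows, the first half receives rows 0-999 and 2000-2999, and the next half receives rows 1000-1999 and 3000-3999.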
Variables in the Equation
First half                 Next half
Vol_17_GeoPod_HR           Vol_17_GeoPod_HR
Vol_17_Birth_Event         Vol_17_Birth_Event
Vol_17_COA                 Vol_17_COA
Vol_17_G                   Vol_17_G
Ind_2013_pushback          Ind_2013_pushback
DataCorp_Market_Share      SalesPersonChange
Tenure_months              RSMChange
SalesPersonChange          DirectorChange
RSMChange                  VP_Jun_17
DirectorChange             VP_Dec_17
VP_Jun_17                  Price_Point
VP_Dec_17                  Discount_17
Price_Point                Count_Jurisdicion
Vol_16_GeoPod_Logistics    Vol_16_GeoPod_Logistics
Vol_16_D                   Vol_16_D
We started our analysis with the Raw Test data, found an appropriate
model and applied it to the Business Critical customers to gain
insights and recommend retention strategies.
When we divided the data into two parts and ran logistic regression,
the error rate in the second half was very high. We then ran logistic
regression on both sets and found discrepancies in which independent
variables were significant.
Cluster Analysis
We performed cluster analysis, divided the data of each of the 4
clusters into two halves, fitted logistic regression on the first half
and tested it on the second.
Process: Business Understanding, Data Preparation, Data Modelling, Model Evaluation, Model Deployment, Retention
4. Stepwise Logistic Regression Results
Variables                      B      S.E.    Sig.    Exp(B)
Vol_17_GeoPod_HR            -0.010   0.005   0.047    0.990
Vol_17_Birth_Event           0.002   0.001   0.017    1.002
Vol_17_COA                   0.020   0.004   0.000    1.020
Vol_17_G                    -0.001   0.000   0.002    0.999
Ind_2013_pushback           -0.916   0.284   0.001    0.400
DataCorp_Market_Share       -3.090   0.913   0.001    0.045
Tenure_months               -0.003   0.001   0.024    0.997
Sales Person Change          0.294   0.108   0.006    1.342
RSM Change                   0.683   0.093   0.000    1.979
Director Change             -0.812   0.172   0.000    0.444
VP_Jun_17                    0.981   0.246   0.000    2.668
VP_Dec_17                   -0.900   0.194   0.000    0.407
Discount_17                  2.178   0.285   0.000    8.828
Count_Jurisdicion           -0.010   0.003   0.004    0.990
Price_Point                  1.078   0.208   0.000    2.938
Vol_16_GeoPod_Logistics      0.362   0.087   0.000    1.436
Vol_16_D                    -0.002   0.001   0.001    0.998
Constant                     0.240   0.695   0.729    1.271
RSM Change: Change of Regional Sales Manager from Jun to Dec
Sales Person Change: Change of Sales Person from Jun to Dec
Director Change: Change of Director from Jun to Dec
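Analysis of the Results
Regression Equation
$$\text{Odds Ratio} = e^{\beta_{const}} \times e^{\beta_{var} \times Var}$$
$$\text{p-value} = \frac{\text{Odds Ratio}}{1 + \text{Odds Ratio}}$$
$$\text{Churn Rate} = \text{if}(\text{p-value} \le 0.16,\ \text{"N"},\ \text{"Y"})$$
The factor $e^{\beta_{var} \times Var}$ is applied once for each variable in the equation, and "p-value" here denotes the predicted churn probability used as the cut-off.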
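The regression equation can be traced in a few lines of code. This is a sketch under stated assumptions: only four of the reported coefficients are included to keep the example short, so its outputs are not the full model's predictions, and the function names and the sample customer are hypothetical.

```python
import math

# Illustrative subset of the reported B values; the full model has 17 variables.
COEFFICIENTS = {
    "Discount_17": 2.178,
    "Price_Point": 1.078,
    "RSM Change": 0.683,
    "Director Change": -0.812,
}
CONSTANT = 0.240


def predicted_probability(values, coefficients=COEFFICIENTS, constant=CONSTANT):
    """Multiply e^(B_const) by e^(B_var * Var) for each variable, then
    convert the result to a probability as odds / (1 + odds)."""
    odds = math.exp(constant)
    for name, beta in coefficients.items():
        odds *= math.exp(beta * values.get(name, 0.0))
    return odds / (1.0 + odds)


def classify(probability, cutoff=0.16):
    """Mirror the slide's rule: 'N' at or below the cut-off, else 'Y'."""
    return "N" if probability <= cutoff else "Y"


# Hypothetical customer whose RSM and Director both changed.
customer = {"RSM Change": 1, "Director Change": 1}
label = classify(predicted_probability(customer))
```

Multiplying the exponentiated terms is equivalent to exponentiating the sum of B times Var, which is why each Exp(B) in the results table can be read as the multiplicative change in the odds for a one-unit increase in that variable.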
01 Standard Error and Multicollinearity (VIF < 3)
No independent variable in the analysis had a standard error larger
than 2.0, so there is no indication of numerical problems such as
multicollinearity.
02 Vol_17_COA
This variable entered the logistic regression first and has a high
individual impact.
03 Variables which are more likely to decrease the odds ratio
Vol_17_GeoPod_HR, Vol_17_G, DataCorp_Market_Share, Director Change
04 Variables having a positive impact on the odds ratio
Price_Point, VP_Jun_17, Discount_17
5. p-value = 0.5
The overall accuracy of this model is 87.5%, but only 30
attrition-prone customers were correctly predicted (3.1% accuracy).
The error rate in the True Positive case is therefore very high: the
model fails to flag about 97% of the attrition-prone customers.
p-value = 0.16
The overall accuracy of this model drops to 78.6%, but 374
attrition-prone customers were correctly predicted (41.6% accuracy),
so the error rate in the True Positive case improves to a great
extent. In our case, it is essential to increase the accuracy of the
True Positive case rather than the True Negative case.
True Negative: the observed value is 0 and is rightly predicted as 0
False Positive: the observed value is 0 and is wrongly predicted as 1
False Negative: the observed value is 1 and is wrongly predicted as 0
True Positive: the observed value is 1 and is rightly predicted as 1
In this case we are interested in increasing the accuracy of True
Positive, as we want to predict attrition correctly.
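The effect of the cut-off on these four counts can be sketched with toy data. The observed labels and predicted probabilities below are invented for illustration and are not taken from the deck's dataset; the function names are likewise hypothetical.

```python
def confusion_counts(observed, probabilities, cutoff):
    """Count TN, FP, FN and TP, predicting 1 when the probability
    exceeds the cut-off."""
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for actual, probability in zip(observed, probabilities):
        predicted = 1 if probability > cutoff else 0
        if actual == 0 and predicted == 0:
            counts["TN"] += 1
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1
        elif actual == 1 and predicted == 0:
            counts["FN"] += 1
        else:
            counts["TP"] += 1
    return counts


def accuracy(counts):
    """Share of all customers classified correctly."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


def true_positive_rate(counts):
    """Share of actual churners (observed 1) that the model flags."""
    churners = counts["TP"] + counts["FN"]
    return counts["TP"] / churners if churners else 0.0


# Invented toy data: 1 = churned, 0 = retained.
observed = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
probabilities = [0.05, 0.10, 0.12, 0.20, 0.30, 0.08, 0.17, 0.18, 0.25, 0.55, 0.14]

at_050 = confusion_counts(observed, probabilities, cutoff=0.5)
at_016 = confusion_counts(observed, probabilities, cutoff=0.16)
```

On this toy data, lowering the cut-off from 0.5 to 0.16 reduces overall accuracy slightly but flags three of the four churners instead of one, which is the same trade-off the comparison above describes.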
• The ROC curve plots the True Positive Rate (TPR) against the False
Positive Rate (FPR), with TPR on the y-axis and FPR on the x-axis
• The AUC-ROC curve is a performance measurement for classification
problems at various threshold settings
• The higher the AUC, the better the model is at predicting 0s as 0s
and 1s as 1s
• Hence the cut-off value should be selected in such a way that it
minimizes the misclassification rates in the model
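The ROC construction described in the bullets above can be sketched without any libraries. The labels and scores are invented toy values, the function names are hypothetical, and the area is computed with the trapezoidal rule; the sketch assumes both classes are present in the data.

```python
def roc_points(observed, scores):
    """Return (FPR, TPR) points obtained by sweeping the cut-off over
    every distinct score, from the highest to the lowest."""
    positives = sum(observed)
    negatives = len(observed) - positives
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(observed, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(observed, scores) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented toy data: 1 = churned, 0 = retained.
observed = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

curve = roc_points(observed, scores)
auc = area_under_curve(curve)
```

Each point on the curve corresponds to one candidate cut-off, so comparing the points shows how much FPR must be accepted to gain TPR.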
Optimum Cut-off value
Comparison using Confusion Matrix
6. Chart: No of churned customer by discount bracket (0-10% to 90-100%, in ten-point brackets) for Credit Unions, Investment Firms, Multinational Banks and Small Business
Chart: Churn percentage by change type (None Changed, Only Dir Changed, Only RSM Changed, Only SP Changed, Dir and RSM Changed, Dir and SP Changed, RSM and SP Changed, All Changed)
Investment firms account for only 211 of the 4000 business-critical
customers but have a quite high churn rate (26.54%).
For all the sub-segments except Multinational Banks, the maximum
churn is seen in the 0-10% discount bracket.
Most of the revenue loss comes from Small Businesses, followed by
Multinational Banks.
Churn percentage is highest when the RSM changed from Jun to Dec.
Sub-segment wise Churn rate
Sub-segment            Total Customer   No of customer churn
Credit Unions                     885                    182
Investment Firms                  211                     56
Multinational Banks               612                    119
Small Business                   2292                    501
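The churn rate per sub-segment follows directly from the customer and churn counts in the sub-segment chart. The sketch below assumes the chart's values pair with the sub-segments in the order the labels are listed (Credit Unions, Investment Firms, Multinational Banks, Small Business); the function name is hypothetical.

```python
# (total customers, churned customers) per sub-segment, as read from the chart.
SEGMENTS = {
    "Credit Unions": (885, 182),
    "Investment Firms": (211, 56),
    "Multinational Banks": (612, 119),
    "Small Business": (2292, 501),
}


def churn_rates(segments):
    """Return the churn rate in percent for each sub-segment."""
    return {
        name: 100.0 * churned / total
        for name, (total, churned) in segments.items()
    }


rates = churn_rates(SEGMENTS)

# List sub-segments from the highest churn rate to the lowest.
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rate:.2f}%")
```

Under this pairing, Investment Firms come out at 56 / 211, which matches the 26.54% churn rate quoted for them.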
Total Revenue loss sub-segment wise
Sub-segment            Total Revenue loss
Credit Unions                     1531513
Investment Firms                   826889
Multinational Banks               2536800
Small Business                    4390288
7. Retention Strategies
Small Business
Most of our customers are small businesses, and most of those who
churn are in the 0-10% discount bracket, so we can lure them with a
larger discount. Retaining these customers will in turn offset the
revenue lost to the additional discount.
Credit Unions
Credit unions, the next biggest segment, also seem price sensitive and
have their maximum attrition in the 0-10% bracket, so we can offer the
discount here as well. In addition, we can introduce a community
coalition program: the more members they bring on board with us, the
more discount they receive.
Multi-National Banks
The next biggest segment is multinational banks, which do not seem
very sensitive to the discounts given. Here we can try to gain market
share by raising awareness and educating customers about our product,
and by improving customer experience through a feedback mechanism and
attention to their complaints.
Investment Firms
Investment firms make up a small part of the whole portfolio and seem
somewhat sensitive to the discounts given (29% of churned customers
lie in the 0-10% discount bracket). Here we can introduce referral
programs in which an investment firm is rewarded for the successful
conversion of the promoted Geopod product among its large client base.
Additional Recommendation: Based on our analysis, a change in Regional Sales Manager has the maximum impact on
customer attrition; hence we recommend reducing the frequency of these changes.