Predicting “the” Repeat Customers
22nd March 2016
V1.0
How to identify potential repeat customers?
Type of Customer (since 2005)*              No. of Customers
Repeat Customer                             125,629
Customer who purchased 1 car with ALJF      425,745
How many are potential repeat customers? A “subset” of the 425,745 (about 20%).
We need to identify a time window beyond which the chance of a second purchase is very low.
A single-purchase customer whose first purchase is older than this window can safely be assumed to be a non-repeat customer.
Is there a pattern in the time between subsequent purchases for repeat customers?
[Figure: Distribution of time interval between purchases for all repeat customers. x-axis: years between purchase of subsequent vehicles; y-axis: % of subsequent purchases. Bar labels in order: 5%, 10%, 12%, 10%, 9%, 8%, 7%, 6%, 6%, 5%, 4%, 4%, 3%, 2%, 2%, 1%, 1%, 1%, 1%, 1%, 1%, 0%, 0%, 0%, 0%, 0%, 0%.]
Only 5% of the repeat customers bought their subsequent car more than 8 years after the previous one.
Hence 8 years is taken as the time window for observing repeat customers.
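The window choice can be illustrated with a minimal sketch: find the smallest whole number of years within which a chosen share (here 95%) of repeat purchases occurred. The function name `observation_window`, the 95% coverage default and the sample intervals are assumptions for illustration, not taken from the slides.

```python
from typing import Sequence


def observation_window(intervals_years: Sequence[float], coverage: float = 0.95) -> int:
    """Return the smallest whole number of years that covers at least
    `coverage` of the observed gaps between consecutive purchases."""
    if not intervals_years:
        raise ValueError("need at least one interval")
    total = len(intervals_years)
    window = 0
    while True:
        window += 1
        # Share of repeat purchases that happened within `window` years.
        covered = sum(1 for t in intervals_years if t <= window) / total
        if covered >= coverage:
            return window


# Hypothetical gaps (in years) between purchases, for illustration only.
sample_intervals = [1] * 50 + [3] * 30 + [7] * 15 + [12] * 5
window_years = observation_window(sample_intervals)
```

With the real interval data, the same calculation would be expected to return the 8-year window quoted above.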
Training data for predictive modeling
Subsequent car buying preference generally depends on multiple factors (the additional data we hold on our customers).
Independent variables that determine repeat/non-repeat behavior:
Marital Status, Nationality, Work Place, Job, Car Model, Guest Age Range, Guest Income Range, Guest Gender, Government/Non-Government, No. of Dependents
Customer profile                                                                  Label (Repeat Customer)
All repeat customers from 2005 until now                                          1
Single-purchase customers since 2005 whose purchase is more than 8 years old      0
Data partitioning:
Training (to train the model)       55%
Validation (to test the model)      45%
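The labelling rule and the 55/45 partition could be sketched as follows. This is a minimal illustration, not the tooling used for the deck: the function names, the handling of recent single-purchase customers (left unlabelled) and the random seed are assumptions.

```python
import random
from datetime import date
from typing import List, Optional, Tuple

WINDOW_YEARS = 8  # observation window taken from the slides


def label_customer(purchase_count: int, first_purchase: date, today: date) -> Optional[int]:
    """1 = repeat customer, 0 = single purchase older than the window,
    None = single purchase still inside the window (not usable for training)."""
    if purchase_count >= 2:
        return 1
    years_since = (today - first_purchase).days / 365.25
    return 0 if years_since > WINDOW_YEARS else None


def partition(rows: List, train_share: float = 0.55, seed: int = 0) -> Tuple[List, List]:
    """Shuffle a copy of the rows and split it into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


train_rows, valid_rows = partition(list(range(100)))
```

Customers labelled `None` are the ones later scored by the model, since their outcome is not yet known.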
Probabilistic Binary Classification using Decision Tree
Decision Tree Model: a non-linear prediction model for binary classification that segments the data hierarchically by recursive partitioning.
A tree structure of rules over the input variables is used to classify or predict according to the target variable (Repeat).
The outcome assigns each case a probability, to which a threshold (0.5) is applied. Cases with a probability above the threshold are predicted as repeat customers; cases below it are predicted as non-repeat customers.
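As a small sketch of the probability-and-threshold step: in a decision tree, the probability of a case is commonly the share of repeat customers among the training cases in the leaf it falls into. The helper names below are hypothetical, and a case exactly at the threshold is treated as non-repeat, which the slides do not specify.

```python
from typing import List


def leaf_probability(labels_in_leaf: List[int]) -> float:
    """Share of repeat customers (label 1) among the training cases in one leaf."""
    return sum(labels_in_leaf) / len(labels_in_leaf)


def classify(probability: float, threshold: float = 0.5) -> str:
    """Predict 'repeat' only when the probability is above the threshold."""
    return "repeat" if probability > threshold else "non-repeat"


# A hypothetical leaf holding three repeat customers and one non-repeat customer.
p = leaf_probability([1, 1, 1, 0])
prediction = classify(p)
```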
Decision Tree to predict repeat/non repeat customer
Compare Models
Each model classifies better than a naïve model (randomly selecting customers for solicitation). The decision tree outperforms the other models.
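One common way to compare a model with random selection is lift: the repeat rate among the top-scored customers divided by the overall repeat rate, which is what random selection would achieve on average. The slides do not say which comparison metric was used, so the sketch below is an assumption, as are the function name and the sample data.

```python
from typing import Sequence


def lift_at(scores: Sequence[float], labels: Sequence[int], top_share: float = 0.1) -> float:
    """Repeat rate in the top-scored share of customers, relative to the overall rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * top_share))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)  # expected rate under random selection
    return top_rate / base_rate


# Hypothetical scores and outcomes for ten customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
lift = lift_at(scores, labels, top_share=0.2)
```

A lift above 1 means the model finds repeat customers more often than random selection would.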
Decision Tree Performance:
The validation leaves are populated almost the same way as the training leaves, indicating no overfitting.
The misclassification rate stabilizes beyond 50 leaves, indicating that the model converges. The depth of the tree is sufficient.
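The misclassification rate referred to here is the share of cases whose predicted class differs from the actual class. A minimal sketch of the training-versus-validation check, with hypothetical function names and sample values, might look like this:

```python
from typing import Sequence


def misclassification_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Share of cases where the predicted label differs from the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def overfit_gap(train_rate: float, valid_rate: float) -> float:
    """A validation error far above the training error suggests overfitting."""
    return valid_rate - train_rate


# Hypothetical predictions and outcomes for four validation cases.
rate = misclassification_rate([1, 0, 1, 1], [1, 0, 0, 1])
```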
[Figure: Observing probability levels and actual repeat customer cases. Axes: probability (0 to 1) and customer cases (0 to 1000).]
The plot is more “crowded” at higher probability.
Probability Zone for Maximum Prediction
Probability above 0.8 will have most repeat customer cases!
[Figure: Probability rating on actual cases. Axes: probability (0.1 to 1) and case count (0 to 90,000). Series: Repeat Customer, Non-repeat Customer.]
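The idea of a probability zone can be sketched by counting actual repeat and non-repeat cases per probability band and computing the share of repeat customers in each band. The function name, the ten equal-width bands and the sample data are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Sequence


def purity_by_bin(probabilities: Sequence[float], labels: Sequence[int],
                  n_bins: int = 10) -> Dict[int, float]:
    """Share of actual repeat customers (label 1) in each probability band.
    Band b covers probabilities from b/n_bins up to (b+1)/n_bins."""
    counts = defaultdict(lambda: [0, 0])  # band -> [non-repeat count, repeat count]
    for p, y in zip(probabilities, labels):
        band = min(int(p * n_bins), n_bins - 1)  # put p == 1.0 in the last band
        counts[band][y] += 1
    return {band: c[1] / (c[0] + c[1]) for band, c in sorted(counts.items())}


# Hypothetical scored cases with known outcomes.
purity = purity_by_bin([0.05, 0.15, 0.82, 0.88, 0.95, 1.0], [0, 0, 1, 0, 1, 1])
```

Bands with purity close to 1 are the ones where a predicted repeat customer is most likely to be an actual one.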
But who is “ready” to buy?
Most subsequent purchases occur in this time window.
Superimpose it on the probability model to get a priority rating.
[Figure: Percentage of customers making a 2nd purchase after the 1st purchase, with respect to time interval. Axes: time interval (0 to 10 in steps of 0.5) and share of customers (0 to 0.1).]
Strategic Solicitation (Ranking and Targeting):
1. Score all single-purchase customers within the 8-year window using different predictive models (logistic regression / neural network / decision tree).
2. Identify the region of maximum purity, derived from the actual case frequency at different probability levels.
3. Adjust the score using the “Distribution of time interval between purchases for repeat customers” as a factor that depends on the time since purchase. Assign the peak frequency a weight of 100%, weight every other period by its frequency relative to the peak, and multiply the probability of being a repeat customer by this factor.
4. Remove delinquent customers.
5. Sort the remaining customers in descending order of repeat customer probability score from the model.
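Steps 3 to 5 can be sketched as below. This is an illustration under stated assumptions: the dictionary keys (`id`, `repeat_probability`, `years_since_purchase`, `delinquent`), the whole-year frequency table and the choice to sort on the time-adjusted score are all hypothetical and not confirmed by the slides.

```python
from typing import Dict, List


def priority_ranking(customers: List[dict], interval_freq: Dict[int, float]) -> List[dict]:
    """Weight each model probability by the relative frequency of repurchase
    at the customer's time since purchase, drop delinquent customers,
    and sort by the adjusted score in descending order."""
    peak = max(interval_freq.values())  # the peak period gets a weight of 100%
    ranked = []
    for customer in customers:
        if customer["delinquent"]:
            continue  # step 4: remove delinquent customers
        weight = interval_freq.get(customer["years_since_purchase"], 0.0) / peak
        ranked.append({**customer, "priority": customer["repeat_probability"] * weight})
    return sorted(ranked, key=lambda c: c["priority"], reverse=True)


# Hypothetical repurchase frequencies by years since purchase, and four customers.
freq = {1: 0.05, 2: 0.10, 3: 0.12, 4: 0.10}
customers = [
    {"id": "A", "repeat_probability": 0.90, "years_since_purchase": 1, "delinquent": False},
    {"id": "B", "repeat_probability": 0.80, "years_since_purchase": 3, "delinquent": False},
    {"id": "C", "repeat_probability": 0.95, "years_since_purchase": 3, "delinquent": True},
    {"id": "D", "repeat_probability": 0.60, "years_since_purchase": 2, "delinquent": False},
]
ranking = priority_ranking(customers, freq)
```

In this example customer B ranks first despite a lower model probability than A, because B is at the peak of the repurchase distribution.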
Dashboarding process
Advantages of the dashboard
Integrated data availability for all stakeholders (Operations, Telemarketing).
The summary area provides a snapshot of campaign performance.
Automated process with minimal work to launch a campaign.
Top management can observe telemarketing feedback and actual buying behavior to generate insights that improve sales.
Telemarketing can view the progress of previous calls and pursue customers who planned to buy but have not bought yet.
Future plans
1. Extend this campaign to customer acquisition through external data.
2. Automate telemarketing updates through web pages.
3. Create an SMS broadcast service.
4. Direct the campaign to an auto-dialer.
Thank you!
