How to data model Churn
Real-life examples
Quick quiz
•  How many of you are familiar with the churn problem?
•  With machine learning?
Logistic Regression, Random Forest, Gradient Boosting trees?
(Not the subject here)
•  With SQL?
(we may see some code later)
•  What database tech do you use?
What about EMC Greenplum or Vertica?
Who I am
•  Senior Data Scientist at Dataiku
(worked on churn prediction, fraud detection, bot detection, recommender systems,
graph analytics, smart cities, … )
•  Occasional Kaggle competitor
•  Mostly code in Python and SQL
•  Twitter @prrgutierrez
Churn definition
•  Wikipedia:
“Churn rate (sometimes called attrition rate), in its broadest sense, is a measure of the
number of individuals or items moving out of a collective group over a specific period of
time”
= Customer leaving
Two types of Churn
•  Subscription models:
•  Telco
•  E-gaming (WoW)
•  Ex : Coyote -> 1 year subscription
-> you know when someone leaves
•  Non subscription models:
•  E-Business (Amazon, Price Minister, Vente Privée)
•  E-gaming (Candy Crush, free MMORPG)
-> you approximate someone leaving
Candy Crush: days / weeks
MMORPG: 2 months (holidays)
Price Minister: months
Two types of Churn
•  Blurred Separation:
•  Ex: T-Mobile: 1-month subscription -> paying for each call
•  Ex: WoW: 1-month to 6-month subscriptions
•  Banking?
•  Focus : no subscription:
•  Can be seen as a generalization where you have to approximate the target
•  Bonus : Seller churn
•  Market places
•  Clients who take part in the product's life
•  Forums (Reddit)
•  E-gaming (Korean competitions, guilds etc.)
Dealing with churn
•  Motivations :
•  Saturated market
-> cost of acquiring a new client >>> cost of keeping an existing one
•  Ex : http://www.bain.com/publications/articles/breaking-the-back-of-customer-churn.aspx
•  Wireline company : 2% to 2.5 % churn rate per month.
•  If 5 M customers -> 1.32 M churn per year
•  When reducing from 2.5% to 2%, lowest estimate : 240 M $ in 18 months
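To make the order of magnitude concrete, here is a minimal sketch of the arithmetic, assuming the monthly churn rate compounds over 12 months. The function name is hypothetical and the source article may use a different convention, so the result only approximates the figure quoted above.

```python
def annual_churners(customers: int, monthly_churn: float, months: int = 12) -> float:
    """Expected number of customers lost over `months`, compounding monthly churn.

    Assumption: no new customers join and the monthly rate stays constant.
    """
    retained_fraction = (1.0 - monthly_churn) ** months
    return customers * (1.0 - retained_fraction)


if __name__ == "__main__":
    # 5 M customers, the two monthly churn rates quoted on the slide.
    for rate in (0.020, 0.025):
        lost = annual_churners(5_000_000, rate)
        print(f"{rate:.1%} monthly churn -> {lost / 1e6:.2f} M customers lost per year")
```

At 2.5% per month this gives roughly 1.3 M churners per year, which is in line with the slide.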
Dealing with churn
•  Predict churn :
•  One model for performance <- our focus, short term, more ML
•  One model for understanding <- long term, more Analytics
•  Act on it (short term) :
•  Special offer (telco call, free in game money, discount coupon … )
•  Does it work? Feedback loop needed!
•  Model the probability of leaving given an offer. A/B tests. Multi-armed bandits?
•  Significant LTV for activation?
•  Act on it (long term) :
•  Is there a problem in my purchasing funnel?
•  Is the game too hard at some point?
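The feedback loop on special offers can be checked with a simple A/B test. The sketch below assumes a control group (no offer) and a treated group (offer) and compares their churn rates with a two-proportion z-test. That test is one common choice, not necessarily the one used in the talk, and the function name and the counts are made up for illustration.

```python
from math import erf, sqrt


def two_proportion_z_test(churned_a: int, n_a: int, churned_b: int, n_b: int):
    """Compare the churn rates of two groups; returns (z, two-sided p-value)."""
    p_a = churned_a / n_a
    p_b = churned_b / n_b
    # Pooled churn rate under the null hypothesis "the offer changes nothing".
    pooled = (churned_a + churned_b) / (n_a + n_b)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    # Standard normal CDF expressed with erf, then two-sided p-value.
    cdf = 0.5 * (1.0 + erf(abs(z) / sqrt(2.0)))
    return z, 2.0 * (1.0 - cdf)


if __name__ == "__main__":
    # Hypothetical counts: 120 churners out of 1000 without the offer,
    # 90 churners out of 1000 with the offer.
    z, p = two_proportion_z_test(120, 1000, 90, 1000)
    print(f"z = {z:.2f}, p-value = {p:.3f}")
```

A small p-value suggests the offer changed the churn rate. Whether the offer pays for itself is a separate question that depends on its cost and on customer value.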
Dealing with churn
•  Candy Crush Rumor :
•  Change the distribution of
probabilities of candies / bombs
•  Change the difficulty of the
game
•  Losing a lot makes the game
easier
Modelling Churn
•  Machine learning model (classification) -> target:
•  Known in subscription
•  Unknown in general
•  Step 1 : Maintain customer status
•  Do you care only about your best customers?
•  In any case, the churn action won’t be the same
•  Has a client churned?
-> target = churner = has not bought / visited since time X
-> best = has bought / visited more than y times since time Y
•  Can be refined (“new customer”, several classes of best or inactive, reactivated…)
•  Storage : maintain only the difference!
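Step 1 can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the thresholds (120 days of inactivity for X, more than 5 purchases in the last 365 days for y and Y), the status names, and the function names are all assumptions chosen for the example. The second function illustrates "maintain only the difference" by keeping only customers whose status changed between two snapshots.

```python
from datetime import date, timedelta


def customer_status(purchase_dates, today, churn_after_days=120,
                    best_window_days=365, best_min_purchases=5):
    """Return 'unknown', 'churner', 'best' or 'active' for one customer."""
    seen = [d for d in purchase_dates if d <= today]  # ignore future data
    if not seen:
        return "unknown"  # never active so far
    if (today - max(seen)).days > churn_after_days:
        return "churner"  # no purchase / visit since time X
    window_start = today - timedelta(days=best_window_days)
    recent = sum(1 for d in seen if d > window_start)
    # "best" = more than y purchases since time Y
    return "best" if recent > best_min_purchases else "active"


def status_changes(previous, current):
    """Keep only the customers whose status differs from the previous snapshot."""
    return {cid: s for cid, s in current.items() if previous.get(cid) != s}


if __name__ == "__main__":
    today = date(2015, 6, 1)  # hypothetical snapshot date
    purchases = {
        "a": [date(2015, 5, 20)],
        "b": [date(2014, 12, 1)],
    }
    snapshot = {cid: customer_status(d, today) for cid, d in purchases.items()}
    print(snapshot)
    print(status_changes({"a": "active", "b": "active"}, snapshot))
```

Storing only the changes keeps the status history small, since most customers keep the same status from one snapshot to the next.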
Modelling Churn
•  Machine learning model -> features:
•  Explanatory factors to use as input for the model
•  Step 2 : Maintain customer features
•  Socio-demographic (gender, age, etc.)
•  Behavioral!
•  Utilization / buying rate
•  Trend in utilization / buying rate
•  Ad hoc features :
•  WoW / Social game churn: take into account friend network churn
•  Telco: calls to call centers
•  Beware of time dependence!
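As an illustration of behavioral features and of the time-dependence warning, the sketch below computes a usage rate and its trend using only events strictly before a cutoff date, so that nothing from the target period leaks into the features. The 30-day window, the feature names, and the function name are assumptions for the example.

```python
from datetime import date, timedelta


def behavioral_features(event_dates, cutoff, window_days=30):
    """Usage rate and trend computed only from events strictly before `cutoff`."""
    recent_start = cutoff - timedelta(days=window_days)
    previous_start = cutoff - timedelta(days=2 * window_days)
    # Events in the last window and in the window just before it.
    recent = sum(1 for d in event_dates if recent_start <= d < cutoff)
    previous = sum(1 for d in event_dates if previous_start <= d < recent_start)
    return {
        "rate_recent": recent / window_days,          # events per day, last window
        "rate_previous": previous / window_days,      # events per day, window before
        "trend": (recent - previous) / window_days,   # change in daily rate
    }


if __name__ == "__main__":
    events = [date(2015, 4, 10), date(2015, 5, 10), date(2015, 5, 20),
              date(2015, 6, 5)]  # the last event is after the cutoff and is ignored
    print(behavioral_features(events, cutoff=date(2015, 6, 1)))
```

Passing the cutoff explicitly makes it possible to recompute the same features at any past date, which the train and evaluation schemes rely on.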
Data Model
Computation Dependency diagram
Ex : Train and predict scheme
•  Time axis: T – 4 months -> T (present time)
•  Before T – 4 months: data is used for feature generation.
•  From T – 4 months to T: data is used for target creation (activity during the last 4 months).
•  Train the model using features and target.
•  Use the model to predict future churn.
Ex : Train, evaluation and predict scheme
•  Time axis: T – 8 months -> T – 4 months -> T (present time)
•  Training:
•  Before T – 8 months: data is used for feature generation.
•  From T – 8 months to T – 4 months: data is used for target creation (activity during the last 4 months).
•  Validation set:
•  Before T – 4 months: data is used for feature generation.
•  From T – 4 months to T: data is used for target creation (activity during the last 4 months).
•  Evaluate on the target of the validation set.
•  Use the model to predict future churn.
Thank you for your attention!
