Survival Regression PyData 2018

  1. Survival Analysis: a practical application
  2. Structure of the talk: Me • Tails.com • Lifetime value and retention • Survival analysis: motivation and theory • A survival regression model • Outputs and accuracy measures
  3. About me: University of York, MMath Maths • University of Bristol, PhD in Random Matrix Theory • Department for Education, Operations Research Analyst (Post-16 Education and Funding) • ASI Data Science, Data Science Internship • Tails.com, Data Scientist
  4. Changing the world of pet food for good. Our proposition is based on a one-to-one relationship with each owner and their dog: Customer visits tails.com and enters dog’s details • Perfect product blended to meet pet’s individual requirements as a one-off • Packaging personalised with dog’s name and their unique blend details • Delivered to customer, left in a safe place if necessary • Feeding plan automatically updates as dog ages or after optional owner feedback • Auto-replenishment so the owner never runs out, or has too much • Free adjustable, personalised feeding scoop, making it easy to feed the right amount every day
  5. Tails.com in numbers: over 85,000 dogs UK-wide • deliver 4 million meals every month • average monthly order costs £24 • 3 treat varieties, 15 wet food varieties • over 1 million blends searched in 0.1s to find the optimal blend for your dog • expect sales of well over £20m this year • around 100 employees...
  6. … and around 25 office dogs
  7. customer retention and lifetime value
  8. Why do we care about lifetime value? Lifetime value helps us make smart decisions on: product giveaways • customer refunds • marketing spend • project prioritisation
  9. Retention and lifetime value: Retention (how long you will be a customer) • Frequency (how often you will order) • Order value (how much we make from your orders) • Lifetime value (total profit attributed to you)
  11. survival analysis
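  12. Motivation: What is the average subscription length? The data are censored: for customers who are still subscribed, the end event has not been observed yet, so averaging only over customers who have already churned would understate subscription length.
  13. Survival analysis in action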
  14. Survival analysis definitions: $T \ge 0$ is the time to subscription end for a randomly chosen customer (the time the customer churned) • Survival function $S(t) = P(T > t)$: the probability that the customer hasn’t churned by time t • Hazard function $h(t)$: the instantaneous rate of churn at time t, given that the customer is still active at t • The two are related by $S(t) = \exp\left(-\int_0^t h(u)\,du\right)$
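
The relationship between the two functions can be checked numerically. The sketch below assumes the standard continuous-time definitions, writing S(t) for the survival function and h(t) for the hazard (this notation is an assumption, since the slide's own formulas did not survive in the transcript). The function name `survival_from_hazard` and the 10% monthly churn rate are hypothetical, chosen only for illustration.

```python
import math

def survival_from_hazard(hazard, t, steps=10_000):
    """Approximate S(t) = exp(-integral_0^t h(u) du) with the trapezoid rule."""
    if t == 0:
        return 1.0
    dt = t / steps
    cumulative = 0.0
    for i in range(steps):
        left, right = i * dt, (i + 1) * dt
        cumulative += 0.5 * (hazard(left) + hazard(right)) * dt
    return math.exp(-cumulative)

# Hypothetical constant churn rate of 0.10 per month (illustrative only).
def constant_hazard(u):
    return 0.10

# Probability of still being subscribed after 12 months.
s_12 = survival_from_hazard(constant_hazard, 12.0)
```

With a constant hazard the integral is simply 0.10 × 12, so `s_12` should agree with `math.exp(-1.2)`; a time-varying hazard can be passed in the same way.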
  15. package options
  16. Model choices: Lifelines (Python): lightweight, good visualisations, limited model selection. Cameron Davidson-Pilon, github.com/CamDavidsonPilon • scikit-survival (Python): bigger selection of linear and nonlinear model options. Sebastian Pölsterl, PyCon UK 2017, github.com/sebp/scikit-survival • Survival, KMSurv, OISurv (R): decent model selection; lots of tutorials and lectures on the subject use these packages. Generally slower to train, and less intuitive to use, than the Python options
  17. modelling
  18. Kaplan-Meier estimate of the survival function, using the lifelines package
  19. Kaplan-Meier estimate of the survival function. Plot: x-axis is time since subscription start, y-axis is probability still a customer. Annotation: the point where the probability that the customer is still active = 50% gives the expected time active
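
The code shown on this slide is not in the transcript. As a stand-in, here is a minimal pure-Python sketch of what a Kaplan-Meier estimate computes; it is not the lifelines implementation (lifelines provides this through its `KaplanMeierFitter` class). The function name `kaplan_meier` and the sample data are hypothetical.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time with at least one observed churn.

    durations: time each customer has been (or was) subscribed
    observed:  True if the customer churned, False if still active (censored)
    """
    churned = Counter(t for t, e in zip(durations, observed) if e)
    leaving = Counter(durations)  # everyone leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = churned.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk customers who survive time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Hypothetical data: months subscribed, and whether churn was observed.
durations = [2, 3, 3, 5, 8, 8, 12, 12]
observed = [True, True, False, True, True, False, False, False]
curve = kaplan_meier(durations, observed)
```

Censored customers never trigger a drop in the curve, but they do count in the at-risk denominator until the time they were last seen, which is how the estimator uses their partial information.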
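
Reading the time at which the curve crosses 50% gives the median time active, which is what the slide's annotation marks. A small sketch, assuming the curve is supplied as a sorted list of (time, survival probability) steps; the function name `median_survival_time` and the example curve are hypothetical.

```python
def median_survival_time(curve):
    """First time at which the step-function survival estimate is <= 0.5.

    curve: list of (time, survival_probability), sorted by time.
    Returns None when the curve never reaches 50% (for example, when most
    customers are still active and therefore censored).
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical survival curve in months (illustrative values only).
example_curve = [(2, 0.875), (3, 0.75), (5, 0.6), (8, 0.45)]
median = median_survival_time(example_curve)
```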
  20. Survival regression: Cox Proportional Hazards model. Input features x: everything we know about your dog, you, and your actions in trial. Key assumption: the impact of a factor on the hazard is multiplicative, and that impact is constant over time
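
The multiplicative assumption can be made concrete. Under the standard Cox form, the hazard is h(t | x) = h0(t) · exp(β · x), so the ratio of two customers' hazards does not depend on t, and S(t | x) = S0(t) ** exp(β · x). The sketch below assumes that form; the coefficients, feature vectors, and baseline curve are hypothetical and are not taken from the talk's model.

```python
import math

def hazard_ratio(beta, x_a, x_b):
    """Ratio h(t | x_a) / h(t | x_b) under Cox PH; the baseline hazard cancels."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_a, x_b)))

def survival_given_features(baseline_survival, beta, x):
    """S(t | x) = S0(t) ** exp(beta . x) for each point on a baseline curve."""
    risk = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return [s0 ** risk for s0 in baseline_survival]

# Hypothetical coefficients and feature vectors, purely for illustration.
beta = [0.7, -0.2]
happy = [0.0, 1.0]
uninterested = [1.0, 0.0]
baseline = [1.0, 0.9, 0.8, 0.7]  # assumed baseline survival S0 at four time points

ratio = hazard_ratio(beta, uninterested, happy)
s_happy = survival_given_features(baseline, beta, happy)
s_uninterested = survival_given_features(baseline, beta, uninterested)
```

Because the same baseline curve is raised to a different constant power for each customer, the predicted curves are ordered the same way at every time point and never cross.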
  21. Feature engineering: dealing with categorical data
      One-hot encoding: a category with n options is converted to n - 1 binary features
        pet_id  breed
        1       Jack Russell
        2       Labrador
        3       Dalmatian
      becomes
        pet_id  jack_russell  labrador  dalmatian
        1       1             0         0
        2       0             1         0
        3       0             0         1
      Assign sensible ordering: a logical ordering is given instead of the category, by estimating the median time active based on the population
        pet_id  breed
        1       Jack Russell
        2       Labrador
        3       Dalmatian
      becomes
        pet_id  breed_median_days_active
        1       xxx
        2       yyy
        3       zzz
      Noted on the slide: better clarity on the impact of each individual category • generally more accurate prediction
  22. Training the model …..
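
Both encodings can be sketched in a few lines of plain Python. The helper names `one_hot` and `median_encode` and the days-active figures are hypothetical. Two simplifications are assumed: the one-hot helper drops the first category alphabetically to give n - 1 columns (the slide's table shows all three breed columns), and the median is a plain median that ignores censoring.

```python
from statistics import median

def one_hot(values, drop_first=True):
    """One-hot encode; with drop_first, n categories give n - 1 binary columns."""
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    return [{c: int(v == c) for c in kept} for v in values]

def median_encode(values, days_active):
    """Replace each category by the median days active seen for that category."""
    by_category = {}
    for v, d in zip(values, days_active):
        by_category.setdefault(v, []).append(d)
    medians = {v: median(ds) for v, ds in by_category.items()}
    return [medians[v] for v in values]

# Hypothetical pets and days active (illustrative values only).
breeds = ["Jack Russell", "Labrador", "Dalmatian", "Labrador"]
days = [120, 300, 200, 340]

encoded = one_hot(breeds)
ordered = median_encode(breeds, days)
```

The median encoding yields a single numeric column however many breeds there are, whereas one-hot encoding adds a column per breed.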
  23. evaluation
  24. Concordance index: measure accuracy by successful ordering of pairs of customers. Plot: Mowgli and Mr. Patch placed on an axis of predicted time a customer will be active, starting at 0
  25. Concordance index: measure accuracy by successful ordering of pairs of customers. Mowgli: active for 12 months • Mr. Patch: active for 18 months • Predict Mowgli active for longer: wrong! • Predict Mr. Patch active for longer: correct! • The concordance index lies between 0 and 1: 1 is perfect ordering of pets, 0.5 is as good as random ordering, 0 is perfect anti-ordering
  26. Pet-level survival predictions. Plot: x-axis is time since subscription start, y-axis is probability still a customer. Curves: a happy customer, active for a long time; an uninterested customer, who churns quickly. The lines don’t intersect, due to the underlying proportional hazards assumption
  27. Python, data and dogs: your kind of thing?
  28. lorna@tails.com https://tails.com/careers
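
The pairwise idea can be sketched directly. This is a minimal illustration, not the implementation used in the talk: it assumes a pair is comparable only when the customer with the shorter actual time was observed to churn, and it counts tied predictions as half. The function name and the predicted values are hypothetical; the 12 and 18 month figures come from the slide.

```python
def concordance_index(predicted_time, actual_time, observed):
    """Fraction of comparable pairs that the prediction orders correctly."""
    concordant = 0.0
    comparable = 0
    n = len(actual_time)
    for i in range(n):
        for j in range(n):
            # Customer i must have churned strictly before j's last known time.
            if observed[i] and actual_time[i] < actual_time[j]:
                comparable += 1
                if predicted_time[i] < predicted_time[j]:
                    concordant += 1.0
                elif predicted_time[i] == predicted_time[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Mowgli was active for 12 months, Mr. Patch for 18; both churns observed.
actual = [12, 18]
churned = [True, True]

correct = concordance_index([10, 20], actual, churned)  # predicts Mr. Patch longer
wrong = concordance_index([20, 10], actual, churned)    # predicts Mowgli longer
tied = concordance_index([15, 15], actual, churned)     # cannot separate them
```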