My Entry to the DMEF CLV Contest

As part of my master's thesis "Stochastic Models of Noncontractual Consumer Relationships" I participated in a contest organized by the DMEF to forecast Customer Lifetime Value (CLV). My submitted model finished second (out of 25 entries). These slides summarize my approach and the final model.

Published in: Business, Technology

Transcript

  • 1. THE DMEF CLV COMPETITION AND HOW I ENDED UP IN 2ND PLACE
  • 2. THE CHALLENGE [Timeline figure: donations are observed from 1.1.2002 to 31.8.2006 and unknown from 31.8.2006 to 31.8.2008] Non-contractual setting, non-observable status
  • 3. THE CHALLENGE [Same timeline figure] 21,000 donors acquired in the first half of 2002; 54,000 donations until mid-2006
  • 4. THE GAME PLAN
      • Understand the Data Set ➙ EDA
      • Split Estimation for # Transactions and $ Value
      • Implement Parametric Stochastic Models: NBD, Pareto/NBD, BG/NBD, CBG/NBD, ...
      • Benchmark Data Fit and Predictive Power
      • Try to Improve Predictive Power
  • 5. THE DATA SET SAMPLED TIMING PATTERNS [Figure "Various Timing Patterns": one row of donation marks per sampled donor ID, plotted over the time scale 2002 to 2006]
  • 6. THE DATA SET TRENDS AT AGGREGATE LEVEL [Figure, two panels: number of donations and average donation amount over time, 2002 to 2006, annotated with percentage changes]
  • 7. THE DATA SET TRENDS AT AGGREGATE LEVEL [Figure, two panels for 2002 to 2005: percentage of donors who have donated within that year (yearly values between 18.8% and 29.5%) and average number of donations per active donor (yearly values between 1.42 and 1.55)]
  • 8. THE DATA SET INTERTRANSACTION TIMES [Histogram "Overall Distribution of Intertransaction Times": count by number of months in between donations, 0 to 51, with marked peaks at 1, 12 and 24 months]
  • 9. THE MODELS NBD ASSUMPTIONS (1959)
      A) The number of transactions follows a Poisson process with rate λ
      B) Heterogeneity in λ follows a Gamma distribution with shape parameter r and rate parameter α
      "while there is not enough information to reliably estimate the purchase rate for each person, there will generally be enough to estimate the distribution of it over customers"
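Assumptions A and B together imply that the number of transactions in a period of length t follows a negative binomial distribution. The following is a minimal sketch of that probability mass function; the function names and the parameter values in the usage section are illustrative assumptions, not the values or code used for the contest.

```python
import math


def nbd_pmf(x: int, r: float, alpha: float, t: float) -> float:
    """P(X(t) = x) when counts are Poisson(lambda * t) and lambda ~ Gamma(shape r, rate alpha)."""
    log_p = (
        math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
        + r * math.log(alpha / (alpha + t))
        + x * math.log(t / (alpha + t))
    )
    return math.exp(log_p)


def nbd_mean(r: float, alpha: float, t: float) -> float:
    """Expected number of transactions in a period of length t."""
    return r * t / alpha


if __name__ == "__main__":
    # Illustrative parameters only (assumed for this sketch).
    r, alpha, t = 0.5, 500.0, 365.0
    for x in range(4):
        print(x, round(nbd_pmf(x, r, alpha, t), 4))
    print("expected transactions:", nbd_mean(r, alpha, t))
```

Working on the log scale with `math.lgamma` keeps the computation stable for non-integer r and large counts.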
  • 10. THE MODELS NBD - ESTIMATION r = 0.475, α = 489.5; avg IPT (interpurchase time): 2.9 years, med IPT: 6.6 years
  • 11. THE MODELS PARETO/NBD ASSUMPTIONS (1987)
      NBD part:
      A) The number of transactions follows a Poisson process with rate λ
      B) Heterogeneity in λ follows a Gamma distribution with shape parameter r and rate parameter α
      Pareto part:
      C) Customer lifetime is exponentially distributed with death rate μ
      D) Heterogeneity in μ follows a Gamma distribution with shape parameter s and rate parameter β
      E) λ and μ are distributed independently
  • 12. THE MODELS BG/NBD ASSUMPTIONS (2005)
      A) The number of transactions follows a Poisson process with rate λ
      B) Heterogeneity in λ follows a Gamma distribution with shape parameter r and rate parameter α
      C) Directly after each purchase there is a constant drop-out probability p
      D) Heterogeneity in p follows a Beta distribution with parameters a and b
      E) λ and p are distributed independently
  • 13. THE MODELS CBG/NBD ASSUMPTIONS (2007)
      A) The number of transactions follows a Poisson process with rate λ
      B) Heterogeneity in λ follows a Gamma distribution with shape parameter r and rate parameter α
      C) At time zero and directly after each purchase there is a constant drop-out probability p
      D) Heterogeneity in p follows a Beta distribution with parameters a and b
      E) λ and p are distributed independently
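The only difference between the BG/NBD and CBG/NBD assumptions is the extra drop-out opportunity at time zero. A minimal simulation sketch of one customer under either model makes that concrete; the function names, the `dropout_at_zero` flag and all parameter values are assumptions made for illustration.

```python
import random


def simulate_customer(rng, r, alpha, a, b, horizon, dropout_at_zero):
    """Return the repeat-transaction times in (0, horizon] for one simulated customer."""
    # random.gammavariate takes shape and SCALE, so the rate alpha enters as 1 / alpha.
    lam = rng.gammavariate(r, 1.0 / alpha)
    p = rng.betavariate(a, b)
    times = []
    # CBG/NBD only: the customer may already drop out at time zero.
    if dropout_at_zero and rng.random() < p:
        return times
    t = 0.0
    while True:
        t += rng.expovariate(lam)  # exponential waiting time of a Poisson process
        if t > horizon:
            return times
        times.append(t)
        if rng.random() < p:       # drop-out check directly after each purchase
            return times


def share_without_repeats(customers):
    """Fraction of simulated customers with no repeat transaction."""
    return sum(1 for c in customers if not c) / len(customers)


if __name__ == "__main__":
    rng = random.Random(2008)
    params = dict(r=1.0, alpha=10.0, a=1.0, b=1.0, horizon=50.0)  # illustrative values
    bg = [simulate_customer(rng, dropout_at_zero=False, **params) for _ in range(5000)]
    cbg = [simulate_customer(rng, dropout_at_zero=True, **params) for _ in range(5000)]
    print("BG/NBD  share without repeats:", share_without_repeats(bg))
    print("CBG/NBD share without repeats:", share_without_repeats(cbg))
```

With the same parameters, the CBG/NBD variant produces a larger share of customers without any repeat transaction, because some of them become inactive before their first opportunity to buy again.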
  • 14. THE BENCHMARK DATA FIT [Figure "Actual vs Fitted Frequency of Repeat Transactions": observed frequencies and the fits of NBD, Pareto/NBD, BG/NBD and CBG/NBD for 0, 1, ..., 6 and 7+ repeat transactions] χ² NBD = 366.1; χ² Pareto/NBD = 391.5; χ² BG/NBD = 487.2; χ² CBG/NBD = 363.7
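The fit statistics on this slide appear to be chi-square values comparing observed and fitted frequencies per number of repeat transactions. Assuming the usual Pearson statistic is meant, it can be sketched as follows; the counts in the usage section are made up and are not the contest data.

```python
def pearson_chi_square(observed, expected):
    """Pearson chi-square statistic: sum over bins of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Hypothetical counts for the bins 0, 1, 2, 3+ repeat transactions.
    observed = [620, 210, 100, 70]
    fitted = [600, 230, 95, 75]
    print(round(pearson_chi_square(observed, fitted), 2))
```

A lower value means the fitted frequencies are closer to the observed ones.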
  • 15. THE BENCHMARK PREDICTIVE POWER [Figure "Time Split": the observed years 2002 to 2006 are divided into a calibration period followed by a validation period]
  • 16. THE BENCHMARK PREDICTIVE POWER
      MSLE = Mean Squared Logarithmic Error
      RMSE = Root Mean Squared Error
      MAE = Mean Absolute Error
      Corr = Correlation
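The four error measures can be sketched in a few lines. The slides do not spell out the formulas, so two assumptions are made here: MSLE is computed on log(1 + x), which keeps donors with zero donations defined, and Corr is the Pearson correlation.

```python
import math


def msle(actual, predicted):
    """Mean squared logarithmic error, computed on log(1 + x) (assumed form)."""
    return sum((math.log1p(p) - math.log1p(a)) ** 2
               for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def corr(actual, predicted):
    """Pearson correlation; undefined if either series is constant."""
    n = len(actual)
    mean_a, mean_p = sum(actual) / n, sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


if __name__ == "__main__":
    # Hypothetical numbers of donations per donor in a validation period.
    actual = [0, 1, 0, 3, 2]
    predicted = [0.2, 0.8, 0.1, 1.9, 1.5]
    for metric in (msle, rmse, mae, corr):
        print(metric.__name__, round(metric(actual, predicted), 4))
```

The logarithmic error dampens the influence of a few very active donors, while RMSE is dominated by them.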
  • 17. THE PROBLEM A SIMPLE LINEAR MODEL
  • 18. THE APPROACH INVESTIGATE THE ERRORS [Figure, two panels: timing patterns for the 10 worst underestimated donors and for the 10 worst overestimated donors, each shown across the calibration period and the validation period]
  • 19. REGULARITY IT'S NOT JUST ABOUT RECENCY AND FREQUENCY Two users can have the same recency and frequency, yet one of them is more likely to be active after T.
  • 20. THE POISSON PROCESS PROBLEMATIC IMPLICATIONS
      Poisson implies exponentially distributed IPT
      • Mode Zero: The most likely time of purchase is immediately after a purchase. No dead period.
      • Memoryless Property: No regularity within timing patterns. Succeeding interpurchase times are assumed to be uncorrelated.
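Both implications follow from the exponential density of the interpurchase time T. As a short worked derivation (standard results, with s and t denoting elapsed times):

```latex
\begin{align}
f(t) &= \lambda e^{-\lambda t}, \qquad t \ge 0,\ \lambda > 0 \\
f'(t) &= -\lambda^{2} e^{-\lambda t} < 0
  \quad\Rightarrow\quad \operatorname*{arg\,max}_{t \ge 0} f(t) = 0 \\
P(T > s + t \mid T > s) &= \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}}
  = e^{-\lambda t} = P(T > t)
\end{align}
```

The density is strictly decreasing, so its mode is at zero, and the time already waited since the last purchase carries no information about the remaining waiting time.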
  • 21. THE SOLUTION CBG/CNBD-K ASSUMPTIONS (2008)
      A) While active, transactions occur with Erlang-k (rate parameter λ) distributed waiting times
      B) Heterogeneity in λ follows a Gamma distribution with shape parameter r and rate parameter α
      C) Directly after each purchase there is a constant drop-out probability p
      D) Heterogeneity in p follows a Beta distribution with parameters a and b
      E) λ and p are distributed independently
  • 22. THE SOLUTION ERLANG-K [Figure, four panels: simulated timing patterns on a time axis from 0 to 5 for Erlang 1, Erlang 2, Erlang 3 and Erlang 100 waiting times]
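An Erlang-k waiting time is the sum of k independent exponential stages, so larger k gives more regular timing patterns. The sketch below draws such waiting times and reports their coefficient of variation, which is 1/sqrt(k) in theory. Setting the stage rate equal to k, so that the mean waiting time stays at 1 for every k, is an assumption made here for comparability and is not taken from the slides.

```python
import random
import statistics


def erlang_k(rng, k, rate):
    """Draw one Erlang-k waiting time as the sum of k exponential stages."""
    return sum(rng.expovariate(rate) for _ in range(k))


def coefficient_of_variation(samples):
    """Standard deviation divided by the mean; 1 for exponential waiting times."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


if __name__ == "__main__":
    rng = random.Random(42)
    for k in (1, 2, 3, 100):
        # Assumed stage rate k keeps the mean waiting time at 1 for every k.
        draws = [erlang_k(rng, k, rate=k) for _ in range(5000)]
        print(k, round(statistics.fmean(draws), 3),
              round(coefficient_of_variation(draws), 3))
```

For k = 1 this reduces to the exponential case of the Poisson process; for k = 100 the purchases are almost evenly spaced.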
  • 23. THE SOLUTION CBG/CNBD-K - 2008
  • 24. REGULARITY MEASURES ESTIMATING 'K'
      [Figure "Distribution of Estimated Gamma Shape Parameters": shape parameter r from 0 to 10, with r=1 marked as exponential IPTs and r=2 as Erlang 2 IPTs]
      [Figure "Regularity Measure M": density over 0.0 to 1.0 of the actual distribution of M, compared with the distribution of M for r=1 and for r=2]
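The slides do not define the regularity measure M. One definition that matches the plotted range from 0 to 1 is the ratio proposed by Wheat and Morrison (1990), M = IPT1 / (IPT1 + IPT2), built from two consecutive interpurchase times of the same customer. Under that assumption, M follows a Beta(k, k) distribution for Erlang-k interpurchase times regardless of the customer's rate, so k can be estimated from the variance of M. The sketch below uses simulated data and hypothetical function names.

```python
import random
import statistics


def regularity_m(ipt_1, ipt_2):
    """Assumed definition: share of the first of two consecutive interpurchase times."""
    return ipt_1 / (ipt_1 + ipt_2)


def estimate_shape(m_values):
    """Method-of-moments estimate of k from Var(M) = 1 / (4 * (2k + 1))."""
    variance = statistics.pvariance(m_values)
    return (1.0 - 4.0 * variance) / (8.0 * variance)


def simulate_m_values(rng, k, n_customers):
    """Simulate M for customers with Erlang-k interpurchase times and varying rates."""
    values = []
    for _ in range(n_customers):
        rate = rng.uniform(0.5, 5.0)  # arbitrary heterogeneity; M is scale-free
        ipt_1 = sum(rng.expovariate(rate) for _ in range(k))
        ipt_2 = sum(rng.expovariate(rate) for _ in range(k))
        values.append(regularity_m(ipt_1, ipt_2))
    return values


if __name__ == "__main__":
    rng = random.Random(1990)
    for k in (1, 2, 3):
        m_values = simulate_m_values(rng, k, 20000)
        print("true k:", k, "estimated k:", round(estimate_shape(m_values), 2))
```

For k = 1 the measure is uniformly distributed on [0, 1]; the more the values concentrate around 0.5, the more regular the timing and the larger the estimated k.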
  • 25. THE BENCHMARK
      Model         MSLE     RMSE    MAE     Corr    SUM
      LM            0.0863   0.642   0.262   0.644   -31%
      Pareto/NBD    0.0977   0.653   0.359   0.628   +22%
      BG/NBD        0.0963   0.651   0.362   0.640   +19%
      CBG/NBD       0.0959   0.650   0.360   0.639   +19%
      CBG/CNBD-2    0.0831   0.632   0.293   0.660   -11%
      CBG/CNBD-3    0.0816   0.637   0.275   0.663   -24%
  • 26. THE CONTEST PARTICIPANTS
      Companies: DataLab, Targetbase, Hewlett-Packard, SAS, Alliance Data, Thinkanalytics, LLC, DK Shifflet & Assoc Ltd.
      US Universities: U Pennsylvania, U Connecticut, UT Dallas, U Washington, OK State, Old Dominion U, Georgia State, SUNY New Paltz, U Wisconsin W
      International Universities: U Frankfurt, Tech Uni Munich, Leuven, PUC Chile, U Duisburg-Essen, Comenius U, BU Vienna
  • 27. THE CONTEST MODELS
      • Ad Hoc
      • Linear Regression
      • Hierarchical Bayesian
      • BG/NBD, MBG-NBD, CBG-NBD, Pareto/NBD
      • Bayesian Seemingly Unrelated Regressions
      • Probit / logistic regression
      • Tobit
      • ARIMA
      • ArtXP Time Series
      • Support Vector Machines
      • Trees
      • Kohonen Networks
      • Feedforward Neural Networks
      • Stochastic Microanalytical Simulations
      No Markov chain models though
  • 28. THE CONTEST OUTCOME TASK 1: CUSTOMER EQUITY
  • 29. THE CONTEST OUTCOME TASK 2: CUSTOMER LIFETIME VALUE
  • 30. THE CONTEST WINNING MODEL
      • HP Labs - published paper
      • 8 Segments via Classification & Regression Trees
      • Logit Model for Estimating Activeness
      • Log-Linear Model for Estimating Donation Sum
      • Also used R for computations
  • 31. CONCLUSIONS BY DMEF
      • Even the Best Model is still "bad" (factor 5.4)
      • It is important to get to know your data with EDA
      • CLV Models are not commodities: "It's more the modeler than the model"
      • Duke Teradata Churn Competition
      • Organizations should follow Contest approach
      • Split Data Sets (Modeling, Validation)
      • Stress Tests
      • Benchmark
