
Iterative Multilevel Empirical Bayes (IMEB): An efficient, flexible, and robust solution for large-scale conjoint

Download this presentation on our website https://goo.gl/EVrXiy

We described an approach that fixes the upper level prior based on the data (hence Empirical Bayes) and cross-validation of a scale parameter. This fixed prior eliminates MCMC iterations, making it much faster than standard HB. The Empirical Bayes approach also avoids issues of inadequate convergence and improper scaling that may occur with MCMC.


  1. 1. Iterative Multilevel Empirical Bayes (IMEB): Flexible and Robust Solution for Large Scale Conjoint Kevin Lattery | AMA ART Forum June 2017
  2. 2. 1) What is Iterative Multilevel Empirical Bayes?  Empirical Bayes vs “Standard HB”  Iterating Empirical Bayes  Non-Technical Summary 2) Results with Large Scale Conjoint 3) Additional Benefits of Iterative Empirical Bayes 4) How Does IMEB Perform with Small Nice Data? Agenda 2 Fairly Technical
  3. 3. What is Iterative Empirical Bayes? Answer: Empirical Bayes + Iterations 3 p(y|θ) h(θ|η)
  4. 4. Intro to Empirical Bayes: Prior as Empirical vs Unknown 4 Multivariate Normal(η) Means (Alpha), Cov Matrix Respondent Level Betas (θ) Predict Respondent Choices (y) Logistic Regression Likelihood: p(y|θ) Prior: h(θ|η) Standard HB: η is unknown ∫ p(y|θ) h(θ|η) p(η) dη Empirical Bayes: η fixed empirically p(y|θ) h(θ|η) No Integral Directly Estimate Respondent Betas θ | y,η Complex Posterior • MCMC Converge to Stationary Distribution • Gibbs Sampling?
  5. 5. How to Empirically Estimate Upper Level MVN? 5 Aggregate Level Model (Logistic Regression) → Upper Level Means = Betas; Hessian* → Covariance Matrix of Parameters via Fisher Information (FI) Matrix (p x p). *Maximize likelihood → FI = negative Hessian; minimize negative likelihood → FI = Hessian. Inverse: (FI)⁻¹ ≈ σ²/n (scalar analogy); (FI/n)⁻¹ ≈ σ² (n = num resp); FI/n = Unit Fisher. Prior Cov = (k * FI/n)⁻¹ (Scaled Unit Fisher). I will tell you how to get k later.
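The empirical prior construction on slide 5 can be sketched end to end. The deck's own code is R; this is a compact Python/numpy stand-in that fits a binary logit (in place of the aggregate MNL) by Newton's method, takes the Fisher information as the negative Hessian of the log-likelihood, and inverts the k-scaled unit Fisher to get the prior covariance. All names here are illustrative, not from the deck:

```python
import numpy as np

def unit_fisher_prior(X, y, k, n_resp):
    # Fit an aggregate binary logit by Newton's method (stand-in for the
    # deck's aggregate MNL; X = design matrix, y = 0/1 choices).
    beta = np.zeros(X.shape[1])
    for _ in range(50):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        H = X.T @ (X * W[:, None])        # Fisher information = -Hessian of LL
        grad = X.T @ (y - p)
        beta = beta + np.linalg.solve(H, grad)
    unit_fi = H / n_resp                  # unit Fisher information (FI / n)
    prior_cov = np.linalg.inv(k * unit_fi)  # prior cov = (k * FI/n)^-1
    return beta, unit_fi, prior_cov
```

Note that doubling k exactly halves the prior covariance, i.e. a tighter prior that shrinks respondent betas harder toward the aggregate betas.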
  6. 6. Given Empirical Prior, Do Respondent Level Regression Solve for each respondent: 6 log_pdensity <- function(beta, alpha, fish_inf) { result <- -.5 * ((t(beta - alpha) %*% (fish_inf)) %*% (beta - alpha)); return(result) } Maximize p(y|θ) * h(θ|η): ln[p(y|θ) * h(θ|η)] = ln[p(y|θ)] + ln[h(θ|η)] = LogLikelihood of MNL predicted probability | beta + log density of (beta | multivariate normal)*, with μ = alpha, Σ = fish_inf⁻¹
  7. 7. R Code for Solving p(y|θ) h(θ|η) − One Respondent 7
     LLn_EB <- function(beta, ind, task, dep, alpha, fish_inf) {
       predprob <- PredProb(beta, ind, task)
       LLTot <- -1 * (sum((log(predprob) * dep)) + log_pdensity(beta, alpha, fish_inf))
       return(LLTot)
     } # Negative Log Likelihood (Minimize)
     LLn_EB_grad <- function(beta, ind, task, dep, alpha, fish_inf) {
       predprob <- PredProb(beta, ind, task)
       grad1 <- (t(ind) %*% (predprob - dep)) # gradient of MNL
       grad2 <- (fish_inf) %*% (beta - alpha) # gradient of MVN density
       grad <- grad1 + grad2
       return(grad)
     }
     eb_solve <- optim(par = x0, fn = LLn_EB, gr = LLn_EB_grad, control = list(),
                       method = "L-BFGS-B", lower = lb, upper = ub,
                       ind, task, dep, alpha, fish_inf = scale_fac * inputlist$fish_inf)
     PredProb is a function giving MNL probabilities given betas and the independent variables; dep is the vector of choices. The gradient function is optional. (ind, task, dep = lower level data; alpha, fish_inf = upper level prior.)
  8. 8. Calibrating Scale Factor k for Unit Fisher: Cross-Validation via Jack-Knife (LOO) on Holdout Tasks. Pick N respondents (possibly all); recommend [30, 300]. Assume a specific scale factor k. 1) For each of N respondents, remove 1 task as holdout 2) Do respondent level regressions with Σ⁻¹ = unit FI * k 3) Compute LogLikelihood for holdout task. Repeat steps 1-3 using different holdout tasks 8 Test each scale factor using a line search (I used Golden Section Search). This is the most time-consuming part. I used 300 respondents with 3 holdouts (1 at a time)
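The line search over candidate scale factors can use a textbook golden-section search. A generic Python sketch (the deck's code is R); `f` stands in for the summed LOO holdout log-likelihood as a function of k, which is assumed unimodal over the bracket:

```python
import math

def golden_section_max(f, lo, hi, tol=1e-4):
    # Maximize a unimodal function f on [lo, hi] by golden-section search,
    # one new function evaluation per iteration.
    invphi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:           # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                 # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2
```

Each iteration shrinks the bracket by the golden ratio, so even a wide initial range for k costs only a few dozen evaluations of the (expensive) holdout loop.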
  9. 9. Two Step Basics of Multilevel Empirical Bayes 1) Empirically determine prior MVN (Multivariate Normal) Described How: Fit aggregate level model  Agg Betas, Inverse of Scaled Unit Fisher Information Matrix 2) Respondent Level Regression LogLikelihood of MNL + LN(Density of MVN | Betas) 9 No Integration, MCMC Our Empirical Bayes is double-dipping! Rossi & Allenby JMR May 1993 A Bayesian Approach to Estimating Household Parameters  Scaled Unit Fisher as Prior  But No Cross Validation of Scaling
  10. 10. 10 Iterative: Update Upper MVN and Repeat. 4 to 20 Iterations improve holdout fit: After respondent level regression we have more information about the upper level mean and covariance matrix • I chose Bayesian updating for joint Alpha and Cov • Perhaps more relevant information: Cov of Resp Betas vs Cov of Parameter Estimates → Bryan Orme. Let’s double-dip some more! EM-Like vs “Proper/Full” Bayes → Replaces Lattery 2007 Sawtooth (EM CBC) → Finding Local Optimum Given Agg Model as Starting Point and Goal of Holdout LogLikelihood. Test improvement using LOO on holdout tasks (constant across iterations)
  11. 11. 11 Bayesian Update of Upper Level (Details): Normal-Inverse-Wishart conjugate prior for (μ, Σ); returns the expected value of the posterior.
     next_alpha <- function(betas, prior_alpha, k0 = nrow(betas)) {
       result <- ((k0 * prior_alpha) + colSums(betas)) / (k0 + nrow(betas))
       return(result)
     }
     next_Fish <- function(prior_alpha, prior_cov, betas,
                           v0 = nrow(betas) + ncol(betas), # v0 = n + p
                           k0 = nrow(betas)) {
       # prior_alpha and prior_cov are upper level priors
       # betas are respondent level betas
       # k0 is n size basis of alpha (set to n), v0 is total degrees of freedom (set to n + p)
       n <- nrow(betas); p <- ncol(betas)
       xbar <- colMeans(betas)
       xbar_diff <- xbar - prior_alpha
       alpha_covn <- ((k0 * n) / (k0 + n)) * (xbar_diff %*% t(xbar_diff)) # component 3
       ab_diff <- t(apply(betas, 1, function(x) x - xbar))
       beta_covn <- t(ab_diff) %*% ab_diff # component 2 = C, covariance of betas
       cov_new_mean <- (prior_cov * n + beta_covn + alpha_covn) / (n + v0 - p - 1) # next covariance
       return(solve(cov_new_mean)) # returns inv(cov) = Fisher
     }
  12. 12. Simple Overview of Iterative Multilevel Empirical Bayes 1) Empirically set the MVN (Multivariate Normal Prior) 2) Respondent Level Regression LogLikelihood of MNL + LN(Density of MVN | Betas) 12 Update MVN, scale factor; Repeat 4-20 times
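The overview above is just an outer loop around the two steps. A minimal Python sketch, with `fit_respondents` and `update_prior` as hypothetical callbacks standing in for the respondent-level regressions and the conjugate upper-level update from the earlier slides:

```python
import numpy as np

def imeb(fit_respondents, update_prior, alpha0, fisher0, n_iter=10):
    """Outer IMEB loop: alternate respondent-level MAP regressions with a
    Bayesian update of the upper-level MVN prior.

    fit_respondents(alpha, fisher) -> (n_resp x p) matrix of respondent betas
    update_prior(betas, alpha, fisher) -> next (alpha, fisher)
    Both callbacks are illustrative stand-ins for the deck's R functions."""
    alpha, fisher = alpha0, fisher0
    for _ in range(n_iter):
        betas = fit_respondents(alpha, fisher)              # step 2: lower level
        alpha, fisher = update_prior(betas, alpha, fisher)  # update MVN, repeat
    return alpha, fisher
```

Because the prior is re-estimated from the data between passes rather than integrated over, each iteration is just n independent optimizations plus one closed-form update.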
  13. 13. Large Scale Conjoint Results Standard HB and Iterative EB 13
  14. 14. What’s in Our Large-Scale Conjoint? o Attributes  275 SKUs across several brands  Price  None option o Conjoint Tasks  Each shelf showed 68 SKUs Multiple picks allowed (up to 10) Each respondent evaluated 10 shelves 36 Versions x 10 Tasks 14 3681 Respondents after cleaning 2.5 Million Stacked Rows of Data Many Parameters to Estimate Many Alternatives Per Task Medium # Respondents Later: 1 Million Respondents? No Problem
  15. 15. Test Specifications of Large Scale Conjoint and Speed 15 Sawtooth Prior Var = 1, DF = 5  No Constraints 120,000 Burn-In Iterations + 20,000  74+ Hours • Used Random 1200 of 3681 Respondents (For Speed & Lesson) • 1 Holdout Task Per Respondent • Estimation Using SuperFast Desktop (4.8 GHz, SSD, 64GB RAM) Calibrate scale with 300 respondents x 3 Tasks = 900 LOO holdout tasks 6 hours 32 minutes for 20 iterations 21 Minutes Aggregate Model 19 minutes each iteration Standard HB Iterative Empirical Bayes MUCH Faster!
  16. 16. 16 After 20 Iterations, holdout fit levels off  Other “normal size” studies level off sooner Note: Internal holdouts showed best fit at 20 iterations Yes! Iterations Significantly Help Empirical Bayes Predict Holdouts
  17. 17. IMEB Better than Standard HB (even after tuning) 17 Raw HB Betas  Prior Var = 1  DF = 5 Tuned HB Betas  Raw * .7 Key Caveat: Standard HB did not converge
  18. 18. Check Standard HB Convergence with Gelman-Rubin 18 Run HB 2 or more times @ different starting parameters • Here I used slightly different starting alphas near 0 • Same Prior Cov of 1 • Should be more diverse Pg 282-285 𝑅 = 1.03 Want 𝑅 value close to 1 • Gelman suggests 𝑅 < 1.1 • Practitioners maybe < 1.2 or 1.5 Also Retzer & Pflughoeft ART Forum Poster
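The 𝑅 (potential scale reduction) values on this and the following slides follow the standard Gelman-Rubin formula. A simplified Python sketch for one parameter, without the split-chain refinement:

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin R-hat for one parameter from an (m_chains x n_draws)
    array of post-burn-in draws. Simplified: no split chains."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    var_plus = (n - 1) / n * W + B / n       # pooled posterior variance estimate
    return np.sqrt(var_plus / W)
```

If the chains mix, between-chain and within-chain variance agree and R-hat is near 1; chains stuck in different regions inflate B and push R-hat well above 1.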
  19. 19. 120,000 + 20,000 Iterations Nowhere Near Convergence 19 [Chart: number of parameters falling in each 𝑅 range, binned from Horrible to Good] 120-140k iterations only slightly better than 20-30k iterations
  20. 20. Multiple Chains Clearly Reveal Non-Convergence 20 𝑅 = 2.49, 𝑅 = 6.79 What do we do if we know MCMC does not converge? EB Anyone?
  21. 21. Additional Benefits of Iterative Empirical Bayes 21
  22. 22. Iterative Empirical Bayes: Projectable Model 22 Estimated Priors (Alpha and Scaled Fisher) with 1200 Respondents  Use those priors (iteration 20) for respondent level regression on remaining 2481 respondents  No more calibrations or iterations How Long for n = 2481? 7 minutes and 6 seconds X Note on X: LL for N = 2481 multiplied by 1200/2481 1 Million Respondents? Under 48 hours on 1 PC (1 core) in R Estimation can be split across computers (results are the same with parallel computing) Better than initial sample
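Scoring a new respondent against the frozen prior is just one more MAP regression, which is why the model projects and parallelizes so easily. A Python sketch (the deck's code is R); `neg_ll` and `grad_ll` are hypothetical callbacks for the respondent's MNL negative log-likelihood and its gradient:

```python
import numpy as np
from scipy.optimize import minimize

def score_respondent(neg_ll, grad_ll, alpha, fisher, x0=None):
    """MAP betas for one new respondent under a fixed upper-level MVN prior.
    Because the prior (alpha, fisher) is frozen, respondents are fully
    independent and trivially parallelizable across cores or machines."""
    x0 = np.array(alpha, dtype=float) if x0 is None else x0
    def obj(b):
        d = b - alpha
        return neg_ll(b) + 0.5 * d @ fisher @ d   # -LL + MVN penalty
    def jac(b):
        return grad_ll(b) + fisher @ (b - alpha)
    return minimize(obj, x0, jac=jac, method="L-BFGS-B").x
```

With a quadratic stand-in for the MNL likelihood the optimum has a closed form, which makes the shrinkage toward alpha easy to verify.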
  23. 23. Flexible Power of IMEB 23  Prof Ely Dahan cancer patients  Trackers Can Keep Upper Level (Periodic Update)  On the Fly (in Survey) Utility Estimation Quick & Robust Estimate of New Respondents Based on Initial Respondent Specific Regression Respondent Specific  Constraints  Parameters  Models/ Custom Utility Functions 1 Million Respondents and Hundreds of Parameters? No Problem Better handling of constraints in general?
  24. 24. How Does Iterative EB Perform with Small Nice Data? My Expectation: Much Worse 24
  25. 25. Small Scale Conjoint: Sawtooth Software Prize Data 25 Sawtooth Prior Var = 1, DF = 5 20,000 Burn-In + 10,000 10 minutes • 21 Degrees of Freedom, N = 600 • Single Choice from 4 Alternatives • 15 Tasks Per Respondent  Pulled 2 Tasks each Respondent as Holdouts (600 x 2) Calibrate scale with 300 respondents x 3 Tasks = 900 LOO holdout tasks 3 minutes per iteration (45 Minutes) Standard HB Iterative Empirical Bayes Opposite of Our Large Scale Conjoint Both Methods estimated on Kevin’s Laptop 2.6GHz o Iterative Empirical Bayes will take longer than Standard HB for Small Data  IMEB calibration of scale parameter is costly baseline time  But time scales much better with more respondents/parameters
  26. 26. Standard HB Very Good Convergence (All Below 1.1) 26 [Table: 𝑅 values of the 27 alphas across attributes (Destination, Cruise, #Days, StateRoom, Amenities, Price), all between 1.000 and 1.1; max 1.09964] 𝑅 Values of Covariance Matrix (231 Items): Max 1.24, only 31 have 𝑅 > 1.1. Note: Different starting alphas, Prior Var = 1 for both chains
  27. 27. 27 Iterative EB Matches Performance of Perfectly Tuned HB Tuned HB Betas  Raw * .5  Semi-Cheating Raw HB Utilities Internal Holdouts, Size of Upper Prior Variance
  28. 28. 28 Hit Rates Are Also Comparable Note: Tuning HB scale does not impact Hit Rates (Tuned = Raw Hit Rate) o Iterations significantly help with hit rate and likelihood o Based on other studies, hit rates with EB tend to be slightly lower than HB
  29. 29. 29 Iterative EB Can Fit as Well as Standard HB with Convergence How much practical value do we get from integrating over uncertainty of upper MVN? This was nearly ideal case for Standard HB
  30. 30. Summary 30
  31. 31. Summary 31 “Standard HB” Integrates Over Upper Level Prior (Multivariate Normal) o Theoretically Great (Best?) Solution IF: (1) Estimate Integral Well (Difficult in Large Scale Conjoint) and (2) Proper Scaling (Tuning that Many Ignore). Some Alternatives: Better Methods for Complex Integrals: Gibbs Sampling → Hamiltonian Monte Carlo? (Stan). Empirical Bayes (Iterative): Integration → Iterating From Good Initial to Local Optimum via Cross-Validation
  32. 32. IMEB is Powerful & Flexible Tool o Fast, Especially for Larger Data o Scalable, Apply Upper Level Model to Any Data New Respondents (Dahan’s Cancer Patients) • Instant computer scoring Trackers can keep same upper level (periodic updates) Apply to different data set with subset of parameters o Respondent Level Regressions = Control at Respondent Level Respondent Specific… Constraints, Parameters, Models/ Custom Utility Functions, Links to Other Models Easy Parallel Computing 32 Many Potential Improvements
  33. 33. SKIMgroup.com Thank You! Kevin Lattery VP Methodology & Innovation k.lattery@skimgroup.com Special thanks to: Greg Allenby, Bryan Orme, Elea McFeit, Joe Retzer, Sean O’Connor
