
Bayesian model averaging



Introduction to Bayesian Model Averaging. Presented at Bayesian Mixer in September 2016

Published in: Science


  1. Bayesian Model Averaging. Volodymyrk. Bayesian Mixer, 27.09.2016, London, UK
  2. Bayesian Model Averaging (BMA) - 1 minute version. New project: how much is it worth? Net Present Value: CFO (Model M1) $50m; VP of Growth (Model M2) $100m. CEO belief after evaluating both models and market data: 30% in M1, 70% in M2. 30% × $50m + 70% × $100m = $15m + $70m = $85m. K = 2
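The one-minute arithmetic on slide 2 can be sketched in a few lines of Python. This is only an illustration: the names `estimates` and `beliefs` are made up here, and the figures are the ones quoted on the slide.

```python
# Minimal sketch of the slide-2 arithmetic: a belief-weighted average of
# two model estimates. Variable names are illustrative, not from the talk.

# Net Present Value estimate from each model, in $m (figures from the slide)
estimates = {"M1 (CFO)": 50.0, "M2 (VP of Growth)": 100.0}

# CEO's belief in each model after seeing both models and market data
beliefs = {"M1 (CFO)": 0.30, "M2 (VP of Growth)": 0.70}

# Beliefs over the K = 2 models must form a probability distribution
assert abs(sum(beliefs.values()) - 1.0) < 1e-12

# Averaged estimate: sum over models of belief * estimate
bma_npv = sum(beliefs[m] * estimates[m] for m in estimates)

print(f"Model-averaged NPV: ${bma_npv:.0f}m")
```

The same pattern extends to any K: add one estimate and one belief per model, keeping the beliefs summing to 1.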
  3. Bayesian Model Averaging (BMA) - 3 minute version. Sensitivity analysis for M2 (VP of Growth), by CLV and CAC assumptions; values in $m:

                  CLV $10   CLV $12   CLV $15
       CAC $4        72       129       149
       CAC $6        62       112       133
       CAC $8        51        92       101

     Average = $100.11m. Diagram label: DATA
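Averaging over the sensitivity grid on slide 3 amounts to weighting every combination of assumptions equally. A minimal sketch, assuming a uniform weight per cell (the slide does not state the weighting, but the quoted average is consistent with it); the variable names are illustrative.

```python
# Sketch: average M2's output over a grid of parameter assumptions,
# weighting every (CAC, CLV) combination equally. The uniform weighting is
# an assumption of this sketch. Grid values are copied from the slide, in $m.

clv_assumptions = [10, 12, 15]   # column headers, $
cac_assumptions = [4, 6, 8]      # row headers, $

value_grid = [
    [72, 129, 149],   # CAC $4
    [62, 112, 133],   # CAC $6
    [51, 92, 101],    # CAC $8
]

# Flatten the grid and give each cell the same weight
cells = [v for row in value_grid for v in row]
uniform_weight = 1.0 / len(cells)
average_value = sum(uniform_weight * v for v in cells)

print(f"Average over assumptions: ${average_value:.2f}m")
```

Replacing `uniform_weight` with per-cell weights that sum to 1 would express a non-uniform prior over the assumptions.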
  4. Bayesian Model Averaging (BMA) - 5 minute version. Reference: "Bayesian Model Averaging: A Tutorial", Jennifer A. Hoeting, David Madigan, Adrian E. Raftery and Chris T. Volinsky. Annotations on the equations: "How much do you trust your VP and CFO, before you look at models?"; "Scary normalising term that you can ignore"; "Prior probability for model parameter"
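The equations on slide 4 did not survive in this transcript. For orientation, the standard BMA expressions, in the notation commonly used in the cited tutorial, are sketched below. Matching them to the slide's annotations is a reading of this transcript, not something the slide text confirms: $p(M_k)$ would be the trust in each model before looking at the data, the denominator of the second line would be the normalising term, and $p(\theta_k \mid M_k)$ would be the prior on the model parameters. Here $\Delta$ is the quantity of interest, $D$ the data, and $M_1, \dots, M_K$ the candidate models.

```latex
\begin{align}
p(\Delta \mid D) &= \sum_{k=1}^{K} p(\Delta \mid M_k, D)\, p(M_k \mid D) \\
p(M_k \mid D) &= \frac{p(D \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(D \mid M_l)\, p(M_l)} \\
p(D \mid M_k) &= \int p(D \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k
\end{align}
```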
  5. Bayesian answer to overfitting. Frequentist: model selection, regularisation. Bayesian: BMA, marginalisation
  6. Case Study. You just got the best job in the galaxy
  7. Your new boss. Business domain. Modelling case. Always test your models on synthetic data that you understand and control
  8. Use cases: fraud detection, inventory sourcing. Data
  9. Modelling goals
     - A prediction range is needed, so that you can identify fraudulent transactions (Sand People under-reporting the real transaction size and pocketing the profit)
     - Sale price should be easily explainable as a function of various droid features, so that Jabba can invest in appropriate scavenging/sourcing projects
     - You want the lowest prediction error possible, so that you are not fed to the Sarlacc
  10. Data Generation. Diagram labels: Class-1, Class-2, Class-3, Class-4; durability, circuitry, height, weight, price, ..., age
  11. Data Collection
  12. Model Selection - classical method. Formula: credits ~ height + weight + power + dents + rad + wheels + legs + red + blue + black + temperature + lat + long + ir_emit + dents_log + height_log + weight_log + power_log + rad_log. Adj. R2: 0.884974385182
  13. Model Selection - backward elimination
  14. Final Model. Formula: credits ~ weight + power + dents + rad + wheels + blue + black + temperature + lat + dents_log + height_log + weight_log + power_log. Adj. R2: 0.903544333611
  15. Model Evaluation (out-of-sample)
  16. Ridge regression (L2 regularisation)
  17. Bayesian Model Averaging for Linear Models - a special case. Inclusion probabilities for the regression coefficients are weighted across all possible models. Number of models = combinations of all K features (include/exclude) = 2^K
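The idea on slide 17 can be sketched by brute force for a small K: enumerate all 2^K include/exclude combinations, fit each linear model, weight the models, and sum the weights of the models that contain each feature. The sketch below is not what the talk used. It swaps in a BIC approximation to the marginal likelihood with a uniform model prior, and the function name `bma_linear` and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def bma_linear(X, y):
    """Enumerate all 2^K feature subsets of a linear model and average them.

    Posterior model probabilities are approximated with BIC under a uniform
    model prior. That is an assumption of this sketch, not the method of any
    particular R package.
    Returns (posterior inclusion probabilities, averaged coefficients,
    number of models enumerated).
    """
    n, K = X.shape
    masks, bics, coefs = [], [], []
    for mask in itertools.product([False, True], repeat=K):
        mask = np.array(mask)
        # Design matrix: intercept plus the included features
        design = np.column_stack([np.ones(n), X[:, mask]])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ beta) ** 2))
        # BIC of a Gaussian linear model, up to a constant shared by all models
        bics.append(n * np.log(rss / n) + design.shape[1] * np.log(n))
        full = np.zeros(K)           # excluded features get coefficient 0
        full[mask] = beta[1:]
        masks.append(mask)
        coefs.append(full)
    bics = np.array(bics)
    # exp(-BIC/2) approximates the marginal likelihood; shift by min for stability
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    # Inclusion probability: total weight of the models containing each feature
    pip = weights @ np.array(masks, dtype=float)
    avg_coef = weights @ np.array(coefs)
    return pip, avg_coef, len(masks)

# Synthetic data we understand and control: only features 0 and 2 matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=1.0, size=200)

pip, avg_coef, n_models = bma_linear(X, y)
print("models enumerated:", n_models)
print("posterior inclusion probabilities:", np.round(pip, 3))
print("model-averaged coefficients:", np.round(avg_coef, 3))
```

Full enumeration doubles in cost with every added feature, so it is only practical for small K.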
  18. How to actually do BMA? (in R) Package descriptions: mature, a.k.a. "the original"; developed during PhD research, not maintained; newest, maintained by the Chair of the Department of Statistical Science at Duke
  19. BMA using BMS (R) package

      Method               MSE
      Model Selection      9736.49
      L2 Regularisation    7782.21
      BMA                  7329.44

      It worked! But you can find inputs to the data generator script for which it will not work as well!
  20. Nice things you get from BMA. Posterior Inclusion Probability! How cool is that!
  21. Model ranking! MCMC can be used if the number of features is large. Figure label: best model, according to BMA
  22. Can we use it for more complex models? Equation annotation: normalising term that you can ignore. Warning: very questionable math. Does not work
  23. Can we use BMA to combine complex (incl. hierarchical) models? Figure labels: 1, 3, 2. Model order is somewhat similar. Relative probabilities are not. We need a working Reversible-Jump MCMC or something more sophisticated. Not available in common Bayesian MCMC packages yet.
  24. In Summary
      - BMA is a Bayesian version of ML model ensembles
      - The math behind it is quite beautiful
      - Model averaging is useful for interpretation, not only prediction
      - Invest in synthetic data generation before applying new modelling techniques to real-world data
      - Even if you are not using BMA, fit different models, and combine them if your goal is prediction
      - BMA works very well for common GLMs, but does not yet work for arbitrary models
      - Do try it next time you need to fit OLS, though!
  25. Of course we are hiring! ● (Snr, Mid) Data Scientists ● Solutions Architect ● Ruby Developer ● Data Engineer ● Senior Artist ● Technical Artist ● Unity Developers ● Senior Product Manager ● Product Director