Bayesian Model Averaging
Bayesian Mixer, 27.09.2016
Bayesian Model Averaging (BMA) - 1 minute version
New Project - how much is it worth?
                     CFO     VP of Growth
Net Present Value:   $50m    $100m

After evaluating both models and the market: $15m + $70m = $85m
K = 2
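The $85m figure is a weighted average of the two estimates. A minimal sketch of that arithmetic follows. The weights 0.3 and 0.7 are an assumption: the slide does not state them, they are inferred from $15m / $50m and $70m / $100m. The variable names are illustrative only.

```python
# NPV estimates from the two competing models, in $m (from the slide).
npv = {"CFO": 50.0, "VP of Growth": 100.0}

# Posterior model probabilities. ASSUMPTION: inferred from $15m / $50m and
# $70m / $100m; the slide does not state them directly.
posterior = {"CFO": 0.3, "VP of Growth": 0.7}

# The weights are probabilities over the K = 2 models, so they must sum to 1.
assert abs(sum(posterior.values()) - 1.0) < 1e-12

# BMA estimate: each model's prediction weighted by how much we believe in it.
bma_npv = sum(posterior[m] * npv[m] for m in npv)
print(f"BMA estimate: ${bma_npv:.0f}m")
```

Changing the weights shows how sensitive the combined estimate is to the trust placed in each model.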
Bayesian Model Averaging (BMA) - 3 minute version
VP of Growth
Sensitivity Analysis for M2

        $10    $12    $15
$4       72    129    149
$6       62    112    133
$8       51     92    101
Bayesian Model Averaging (BMA) - 5 minute version
Bayesian Model Averaging: A Tutorial
Jennifer A. Hoeting, David Madigan, Adrian E. Raftery and Chris T. Volinsky
How much do you trust your VP and CFO, before you look
Scary normalising term that you can ignore
Prior probability for
Bayesian answer to overfitting - model selection
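For reference, the standard BMA equations can be written out as below. The notation follows the Hoeting et al. tutorial cited above: Δ is the quantity of interest, D the data, M_k the k-th of K models and θ_k its parameters. How the annotations map onto individual terms is an assumption, since the slides do not preserve the equation itself.

```latex
\begin{align}
p(\Delta \mid D) &= \sum_{k=1}^{K} p(\Delta \mid M_k, D)\, p(M_k \mid D) \\
p(M_k \mid D) &= \frac{p(D \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(D \mid M_l)\, p(M_l)} \\
p(D \mid M_k) &= \int p(D \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k
\end{align}
```

Under that reading, p(M_k) is the prior trust in each model, the denominator of the second line is the normalising term, and p(θ_k | M_k) is the prior over a model's parameters. The integrated likelihood in the third line averages over all parameter values rather than taking the best fit, which is what penalises needlessly flexible models.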
You just got the best job in the galaxy
Your new Boss | Business domain | Modelling case
Always test your models on synthetic data that you understand and control
- Fraud Detection
- Inventory Sourcing
- A prediction range is needed, so that you can identify fraudulent transactions (Sand People under-reporting the real transaction size and pocketing the profit)
- The sale price should be easily explainable as a function of various droid features, so that Jabba can invest in appropriate scavenging/sourcing projects
- You want the lowest possible prediction error, so that you are not fed to the Sarlacc
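Testing on synthetic data means generating observations from a process whose true coefficients you chose yourself, so you can check whether a method recovers them. A minimal sketch follows, assuming a linear price model. The feature subset, coefficients, intercept and noise level are all hypothetical and are not taken from the talk's generator script.

```python
import random

# HYPOTHETICAL ground truth: only these features drive the price.
TRUE_COEFS = {"height": 2.0, "weight": -0.5, "power": 10.0}
# Features with no effect on price; a good method should learn to ignore them.
NOISE_FEATURES = ["dents", "wheels"]
INTERCEPT = 100.0
NOISE_SD = 5.0  # assumed standard deviation of the observation noise


def generate_droids(n, seed=0):
    """Return n synthetic droid records from a known data-generating process."""
    rng = random.Random(seed)  # seeded, so every run is reproducible
    rows = []
    for _ in range(n):
        # Draw every feature uniformly on [0, 10].
        row = {name: rng.uniform(0.0, 10.0)
               for name in list(TRUE_COEFS) + NOISE_FEATURES}
        # Price = linear signal from the true features + Gaussian noise.
        signal = INTERCEPT + sum(c * row[f] for f, c in TRUE_COEFS.items())
        row["credits"] = signal + rng.gauss(0.0, NOISE_SD)
        rows.append(row)
    return rows


data = generate_droids(1000)
print(data[0])
```

Because the true coefficients are known, any fitted model can be judged on whether it keeps the three real features and drops the two irrelevant ones.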
Model Selection - classical method
credits ~ height + weight + power + dents + rad + wheels + legs + red + blue + black + temperature + lat + long + ir_emit + dents_log + height_log + weight_log + power_log + rad_log
Adj. R2: 0.884974385182
Model Selection - backward elimination
credits ~ weight + power + dents + rad + wheels + blue + black + temperature + lat + dents_log + height_log + weight_log + power_log
Adj. R2: 0.903544333611
Bayesian Model Averaging for Linear Models - a special case
Regression coefficients and their inclusion probabilities are averaged across all possible models, weighted by each model's posterior probability
Number of models = combinations of all K features (include/exclude) = 2^K
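To make the size of the model space concrete, the include/exclude combinations can be enumerated directly. This is a minimal sketch; the helper name `all_models` and the three-feature subset are illustrative choices, not part of the talk.

```python
from itertools import product


def all_models(features):
    """Yield every include/exclude combination as a tuple of feature names."""
    for mask in product([False, True], repeat=len(features)):
        yield tuple(f for f, keep in zip(features, mask) if keep)


features = ["height", "weight", "power"]  # small illustrative subset
models = list(all_models(features))
print(len(models))  # 2**3 models, including the empty (intercept-only) one

# The full formula from the classical model selection slide has 19 terms:
print(2 ** 19)  # number of candidate models for K = 19
```

With 19 candidate terms the count is already 524288, which is why exhaustive enumeration stops being practical as K grows.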
How to actually do BMA? (in R)
Mature. A.k.a. “the original”
Developed during PhD research. Not maintained
Newest. Maintained by the Chair of the Department of Statistical Science at Duke
BMA using BMS (R) package
       Model Selection    L2 Regularisation    BMA
MSE    9736.49            7782.21              7329.44
But you can find inputs to the data generator script for which BMA will not work as well!
Nice things you get from BMA
Posterior Inclusion Probability!
How cool is that!
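The posterior inclusion probability (PIP) of a feature is the total posterior probability of all models that contain it. A minimal sketch of that sum follows, using made-up posterior model probabilities over a tiny model space. In practice a BMA package estimates these probabilities from the data; the numbers and names here are hypothetical.

```python
# HYPOTHETICAL posterior model probabilities over a tiny model space.
# Each key is the set of features that a model includes.
model_posterior = {
    frozenset(): 0.05,
    frozenset({"power"}): 0.40,
    frozenset({"weight"}): 0.05,
    frozenset({"power", "weight"}): 0.50,
}


def inclusion_probabilities(model_posterior):
    """PIP of a feature = total posterior mass of the models that include it."""
    pip = {}
    for included, prob in model_posterior.items():
        for feature in included:
            pip[feature] = pip.get(feature, 0.0) + prob
    return pip


pip = inclusion_probabilities(model_posterior)
for feature, p in sorted(pip.items()):
    print(f"{feature}: {p:.2f}")
```

Here "power" appears in models holding 90% of the posterior mass and "weight" in models holding 55%, so the PIP reads directly as how strongly the data support keeping each feature.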
MCMC can be used if the number of features is large
Best model, according to BMA
Can we use it for more complex models?
Warning: very questionable math.
Does not work
Can we use BMA to combine complex (incl. hierarchical) models?
Model order is somewhat similar. Relative probabilities are not.
We need a working Reversible-Jump MCMC or something more sophisticated.
It is not yet available in common Bayesian MCMC packages.
- BMA is a Bayesian version of ML model ensembles
- The math behind it is quite beautiful
- Model averaging is useful for interpretation, not only prediction
- Invest in synthetic data generation before applying new modelling techniques to real-world data
- Even if you are not using BMA, fit different models and combine them, if your goal is prediction
- BMA works very well for common GLMs, but does not yet work for arbitrary models. Do try it next time you need to fit OLS, though!
Of course we are hiring!
● (Snr, Mid) Data Scientists
● Solutions Architect
● Ruby Developer
● Data Engineer
● Senior Artist
● Technical Artist
● Unity Developers
● Senior Product Manager
● Product Director