- 1. Uncertainty in Deep Learning - Christian S. Perone (2019) Uncertainties Bayesian Inference Deep Learning Variational Inference Ensembles Q&A Uncertainty Estimation in Deep Learning A brief introduction Christian S. Perone christian.perone@gmail.com http://blog.christianperone.com
- 2. Agenda
  - Uncertainties: Knowing what you don’t know; The problem; Different Uncertainties; Importance of Uncertainty
  - Bayesian Inference: The frequentist way; The bayesian inference; MCMC Sampling
  - Deep Learning: Short intro; Bayesian Neural Networks
  - Variational Inference: Introduction; Posterior Approximation; Training a BNN; Dropout
  - Ensembles: Introduction; Deep Ensembles; Randomized Prior Functions; Final Remarks
  - Q&A
- 3. Who Am I
  - Christian S. Perone
  - BSc in Computer Science in Brazil (UPF), MSc in Biomedical Eng. in Montreal (Polytechnique/UdeM)
  - Machine Learning / Data Science
  - Working at Jungle
  - Blog at blog.christianperone.com
  - Open-source projects: https://github.com/perone
  - Twitter: @tarantulae
- 4. Section I: Uncertainties
- 6. Knowing what you don’t know
  - "It is correct, somebody might say, that (...) Socrates did not know anything; and it was indeed wisdom that they recognized their own lack of knowledge, (...)." (Karl R. Popper, The World of Parmenides)
  - What does this have to do with statistical learning?
- 9. The problem
  - Let’s say you trained a model to classify an image as having a lesion or not;
  - [Figure: different MRI contrasts (T2/T1). Source: http://www.msdiscovery.org. 2019.]
  - Later you run predictions on volumes with different parametrization, anatomy, etc;
  - The problem: you can still get a prediction with high probability, even if your sample is out-of-distribution.
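The overconfidence described above can be reproduced with a very small model. The sketch below is not the MRI setting from the slide: it assumes hypothetical 1-D toy data and a plain logistic regression trained by gradient descent, and only shows that a point far from anything seen in training still receives a probability close to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two 1-D classes centred at -1 and +1.
x = np.concatenate([rng.normal(-1.0, 0.3, 100), rng.normal(1.0, 0.3, 100)])
y = np.concatenate([np.zeros(100), np.ones(100)])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Logistic regression fitted with plain gradient descent on the mean log loss.
w, b = 0.0, 0.0
for _ in range(2000):
    p = sigmoid(w * x + b)
    w -= 0.1 * np.mean((p - y) * x)
    b -= 0.1 * np.mean(p - y)

# A sample far outside the training range (out-of-distribution).
x_ood = 50.0
p_ood = sigmoid(w * x_ood + b)
print(f"p(class 1 | x = {x_ood}) = {p_ood:.6f}")
```

The model has never seen anything near `x_ood`, yet the predicted probability saturates, because the output of a point-estimate classifier says nothing about how far the input is from the training data.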
- 10. The problem: a simple regression problem. [Figure] Source: Yarin Gal. Uncertainty in Deep Learning. PhD Thesis. 2016.
- 11. The problem: a simple regression problem. [Figure] Source: Ian Osband et al. Using Randomized Prior Functions for Deep Reinforcement Learning. NIPS 2018. Image from: http://blog.christianperone.com
- 14. Different Uncertainties. Two main types of uncertainty, often confused by practitioners, but very different quantities:
  - Aleatoric Uncertainty: information that the data cannot explain, also called data uncertainty or irreducible uncertainty. Collecting more data might not reduce it, although increasing measurement precision can.
  - Epistemic Uncertainty: uncertainty in the model itself, also called model uncertainty or reducible uncertainty. Ex: it can be explained away by increasing the training set size.
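A toy illustration of the difference, under simplifying assumptions that are not from the slides: the "model" is just the estimated mean of noisy measurements, the spread of the measurements stands in for the aleatoric part, and the standard error of the estimated mean stands in for the epistemic part. The constants (`noise_std`, the true mean of 2.0, the sample sizes) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_std = 0.5  # assumed measurement noise (aleatoric)


def uncertainties(n):
    """Return (aleatoric, epistemic) estimates from n noisy measurements."""
    y = 2.0 + rng.normal(0.0, noise_std, size=n)
    aleatoric = y.std(ddof=1)            # spread of the data itself
    epistemic = aleatoric / np.sqrt(n)   # standard error of the estimated mean
    return aleatoric, epistemic


small = uncertainties(20)
large = uncertainties(20000)
print("n=20:    aleatoric=%.3f epistemic=%.4f" % small)
print("n=20000: aleatoric=%.3f epistemic=%.4f" % large)
```

With more data the epistemic term shrinks towards zero, while the aleatoric term stays near `noise_std`; only a better measurement process would reduce it.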
- 21. Importance of Uncertainty
  - Medical imaging (classification, segmentation);
  - Autonomous vehicles (what’s the uncertainty that this object is a tree?);
  - Active Learning (which sample should be labeled?);
  - Explore/exploit dilemma in reinforcement learning;
  - Out-of-distribution detection;
  - Model understanding/dataset understanding;
  - Nearly all applications!
- 22. Example in Reinforcement Learning. The explore/exploit dilemma.
- 23. Example in Reinforcement Learning. Work by Maxime Wabartha et al.:
  - Excerpt shown on the slide: "(...) estimated by taking, for each approach, the pointwise average and standard deviation over 50 sampled functions. We expect the empirical posterior predictive distribution to cover the ground truth function. While we succeed to do so using a MSE loss and the proposed approach, we do not manage to obtain diverse functions using solely anchoring neither using dropout; in our experiments, changing the dropout rate did not improve the quality of the obtained uncertainty. Input bootstrapping does produce functions that better span the width of outputs, but it also disregards by nature certain points of the training set, where we expect the uncertainty to be low given our current knowledge. We also provide in the appendix an example of the functions generated by our function approach when fixing X."
  - [Figure 1 of the excerpt: comparison of the empirical (over 20 sample functions) posterior predictive distribution for dropout, input bootstrapping, anchoring and repulsive constraint. Panels: Dropout 0.2, Input bootstrapping, Anchoring, Repulsive. Legend: ground truth, sample function, standard deviations, training set, reference function.]
  - Excerpt, continued: "3.2 Diverse functions in high-dimensional input space. We apply the method to function approximation in the case of a reinforcement learning problem requiring exploration. More precisely, we showcase how our method can help sample diverse reward functions in a model-based setting. We create a dataset of 43 13x13 frames with the associated reward. We use as function approximator a small CNN outputing a reward for a given frame (see appendix). To illustrate our method, we sample the repulsive points from possible frames, thus directly from the manifold, in or out of the training distribution (see appendix). Figure 2 (rightmost figure) shows how (...)"
  - Source: Maxime Wabartha et al. Sampling diverse neural networks for exploration in reinforcement learning. NIPS 2018.
- 24. Section II: Bayesian Inference
- 28. A simple frequentist regression. In a frequentist linear regression, we have a point estimate for the parameters of our model.
  - First, we define our model: $f(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots = \mathbf{x}^\top \boldsymbol{\beta}$ (vectorial notation);
  - Later, we define a loss such as the MSE (mean squared error): $L = \frac{1}{n} \sum_{i=1}^{n} (f(x_i) - y_i)^2$;
  - Finally, we optimize it: $\hat{\theta} = \arg\min_{\theta} L(f(x), y)$.
  - For a maximum likelihood derivation, take a look at http://blog.christianperone.com/2019/01/mle/.
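The three steps (model, loss, optimization) can be sketched in a few lines of NumPy. The data-generating values (intercept 1, slope 2, noise 0.5) are assumptions for the illustration, not values taken from the slides, and the optimization uses the closed-form least-squares solver rather than an iterative method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample data: y = 1 + 2x + Gaussian noise.
x = rng.uniform(0.0, 1.0, size=50)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=50)

# Model in vector notation: f(x) = X @ theta, with a column of ones for theta_0.
X = np.column_stack([np.ones_like(x), x])


def mse(theta):
    """Mean squared error of the linear model for parameters theta."""
    return np.mean((X @ theta - y) ** 2)


# Optimization: least squares returns the theta that minimizes the MSE.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("point estimate:", theta_hat, "MSE:", mse(theta_hat))
```

The result is a single vector `theta_hat`: a point estimate with no indication of how uncertain the intercept and slope are.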
- 29. A simple frequentist regression. [Figure: "Frequentist regression", plotting y against x, showing the sample data and the regression line.]
- 33. The bayesian way. Bayesian approaches represent the uncertainty using a distribution over parameters. Instead of a point estimate, we have an entire posterior.
  - To formulate our bayesian regression, we first select a likelihood;
  - After that, we select priors over the parameters;
  - Then we compute, or approximate by sampling, the posterior of our model given the data.
- 36. Prior, likelihood and posterior. [Figure: three panels (Prior, Data, Posterior), each plotting Credibility over the values 1, 2 and 3.]
- 39. Prior, likelihood and posterior: $\underbrace{p(\theta \mid X)}_{\text{posterior}} \propto \underbrace{p(X \mid \theta)}_{\text{likelihood}} \, \underbrace{\pi(\theta)}_{\text{prior}}$
- 40. [Figure: three rows of panels over the interval 0 to 1, each showing prior × likelihood ∝ posterior.] Source: Statistical Rethinking/Winter 2019. Richard McElreath.
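The relation posterior ∝ likelihood × prior can be computed directly on a grid when there is a single parameter. The sketch below assumes a hypothetical coin-like experiment (6 successes in 9 trials) with a flat prior; the numbers are illustrative and not taken from the slides.

```python
import numpy as np

# Grid of candidate values for the parameter theta (a proportion in [0, 1]).
grid = np.linspace(0.0, 1.0, 101)

prior = np.ones_like(grid)            # flat prior
successes, trials = 6, 9              # hypothetical data
likelihood = grid**successes * (1.0 - grid) ** (trials - successes)

# Posterior is proportional to likelihood times prior; normalize over the grid.
unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()

map_estimate = grid[np.argmax(posterior)]
print("most credible value of theta:", map_estimate)
```

Grid approximation only works for a handful of parameters, since the number of grid points grows exponentially with the dimension; this is one reason sampling and variational methods are needed later in the deck.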
- 43. Bayesian regression. Let’s reformulate our regression:
  - We will use a simple Gaussian distribution for our observations, defined as: $Y \sim \mathcal{N}(\mu, \sigma^2)$
  - We plug our regression into $\mu$: $Y \sim \mathcal{N}(\underbrace{\alpha + \beta x}_{\text{linear model}}, \sigma^2)$
  - And define the priors: $\alpha \sim \mathcal{N}(0, 20)$, $\beta \sim \mathcal{N}(0, 20)$, $\sigma \sim \mathcal{U}(0, 5)$
- 45. Bayesian Regression in Plate Notation. You can represent the model below with plate notation:
  - $Y \sim \mathcal{N}(\alpha + \beta x, \sigma^2)$, $\alpha \sim \mathcal{N}(0, 20)$, $\beta \sim \mathcal{N}(0, 20)$, $\sigma \sim \mathcal{U}(0, 5)$
- 46. MCMC Sampling. Let’s see a demo of a Markov chain Monte Carlo (MCMC) sampler. Source: MCMC Demos, by Chi Feng
- 47. MCMC Sampling. [Figure: trace plot for Intercept, x and sigma; for each parameter, a histogram of the sampled values (Frequency) and the sample value per draw.] Trace plot generated using PyMC3; you can also use ArviZ.
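The trace plot in the deck was produced with PyMC3. As a library-free illustration of what such a sampler does, here is a minimal random-walk Metropolis sketch for the same regression model. Several things are assumptions of this sketch and not from the slides: the toy data, the step size, the burn-in length, and reading the 20 in $\mathcal{N}(0, 20)$ as a standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y = 1 + 2x + noise with sigma = 0.5.
x = rng.uniform(0.0, 1.0, size=100)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=100)


def log_posterior(params):
    """Unnormalized log posterior: Gaussian likelihood plus the priors."""
    alpha, beta, sigma = params
    if not 0.0 < sigma < 5.0:          # sigma ~ U(0, 5)
        return -np.inf
    # alpha, beta ~ N(0, 20), with 20 taken as the standard deviation.
    log_prior = -0.5 * (alpha / 20.0) ** 2 - 0.5 * (beta / 20.0) ** 2
    resid = y - (alpha + beta * x)
    log_lik = -len(y) * np.log(sigma) - 0.5 * np.sum(resid**2) / sigma**2
    return log_prior + log_lik


def metropolis(n_samples, step=0.05):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    current = np.array([0.0, 0.0, 1.0])
    current_lp = log_posterior(current)
    samples = np.empty((n_samples, 3))
    for i in range(n_samples):
        proposal = current + rng.normal(0.0, step, size=3)
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[i] = current
    return samples


samples = metropolis(20000)
posterior = samples[5000:]             # discard burn-in
alpha_mean, beta_mean, sigma_mean = posterior.mean(axis=0)
print("posterior means:", alpha_mean, beta_mean, sigma_mean)
```

Each column of `posterior` is what one row of a trace plot shows: a histogram of the draws and the draws over time. Note that only the unnormalized posterior is ever evaluated; the evidence $p(X)$ cancels in the acceptance ratio.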
- 48. Bayesian regression. [Figure: "Posterior predictive regression lines", plotting y against x, showing the sample data and the posterior predictive regression lines.]
- 54. Bayesian methods
  - Bayesian methods can give us a full posterior to reason about;
  - Explicit priors;
  - Uncertainty;
  - They’re on the side of algorithms, not models [1];
  - However, the posterior is intractable for many practical cases and large datasets: $p(\theta \mid X) = \frac{p(X \mid \theta)\, \pi(\theta)}{p(X)}$
  - Tuning and using MCMC algorithms can be tricky.
  - [1] Zoubin Ghahramani, History of Bayesian Neural Networks, NIPS 2016
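To see where the intractability comes from, it helps to write out the normalizing constant $p(X)$ of Bayes' rule. The identity below is the standard definition of the evidence, written with the same symbols as the slide:

```latex
p(X) = \int p(X \mid \theta)\, \pi(\theta)\, d\theta
```

The integral runs over every parameter of the model, so for models with many parameters it rarely has a closed form and is too expensive to evaluate numerically. MCMC and variational inference both work around it by never computing $p(X)$ directly.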
- 55. Section III: Deep Learning
- 61. Deep Learning. It’s not a secret that Deep Learning reached an important milestone in Machine Learning:
  - Non-linear function approximators;
  - They can scale to large datasets (thanks to stochastic approximation);
  - They are state-of-the-art for NLP, computer vision, speech, etc;
  - Very expressive and flexible;
  - Representation learning;
- 62. One-slide Intro to Deep Learning
  - [Figure: a multi-layer perceptron (MLP) network overview, with an input layer $x_0, x_1, \dots, x_D$, a 1st hidden layer $y^{(1)}_0, y^{(1)}_1, \dots, y^{(1)}_{m^{(1)}}$, an $L$th hidden layer $y^{(L)}_0, y^{(L)}_1, \dots, y^{(L)}_{m^{(L)}}$ and an output layer $y^{(L+1)}_1, y^{(L+1)}_2, \dots, y^{(L+1)}_C$. Source: David Stutz, 2018, BSD 3-Clause License.]
  - Parametrized models with composition of functions;
  - Trained using backpropagation and SGD;
  - Usually learned by maximizing the log likelihood;
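As a rough sketch of "composition of functions" and "maximizing the log likelihood", the code below runs the forward pass of a small MLP and evaluates the negative log likelihood of some labels. The layer sizes, the tanh activation, the random inputs and the labels are all hypothetical choices for the illustration; training (backpropagation and SGD) is left out.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_mlp(sizes):
    """One (weights, bias) pair per layer, e.g. sizes = [inputs, hidden, classes]."""
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Compose the layers and return log class probabilities (log-softmax)."""
    h = x
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)                      # hidden layers
    W, b = params[-1]
    logits = h @ W + b                              # output layer
    logits = logits - logits.max(axis=1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))


def nll(params, x, labels):
    """Negative log likelihood; minimizing it maximizes the log likelihood."""
    log_probs = forward(params, x)
    return -log_probs[np.arange(len(labels)), labels].mean()


params = init_mlp([4, 8, 3])
x = rng.normal(size=(5, 4))
labels = np.array([0, 1, 2, 1, 0])
loss = nll(params, x, labels)
print("negative log likelihood:", loss)
```

Every entry of `params` is a single number per weight, which is exactly the point estimate that a Bayesian neural network replaces with a distribution.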
- 64. Bayesian Neural Networks. A Bayesian Neural Network (BNN) is a Neural Network with distributions over parameters [2]. [Figure] Source: Weight Uncertainty in Neural Networks. Charles Blundell et al. 2015.
  - [2] Neal, Radford M. (2012). Bayesian learning for neural networks.
- 70. Bayesian Neural Networks. In modern Deep Neural Networks, however, we have some challenges:
  - A lot of data;
  - High-dimensionality in data;
  - Millions of parameters;
  - Highly non-convex surfaces;
  - This makes exact Bayesian inference very difficult for these models, therefore an approximation is required: Variational Inference (variational Bayes).
- 71. Section IV: Variational Inference
- 77. Variational Inference
  - Variational Inference (VI) is often used as an alternative to MCMC;
  - Can be used to approximate the posterior of Bayesian models;
  - Faster than MCMC for complex models and larger datasets;
  - Shift from sampling to optimization;
  - Fewer guarantees than MCMC: VI only finds a density close to the target;
  - For a modern in-depth review please refer to: Variational Inference: A Review for Statisticians. Blei, D. M. et al (2018).
- 80. Variational Inference
  - We have a very complex posterior distribution $p(w \mid D)$ that we want to approximate ($w$ are the parameters, and $D$ is the data);
  - We do this approximation by using an "easier" distribution $q(w \mid \theta)$ (also called the variational distribution, where $\theta$ are the variational parameters);
  - [Figure: variational approximation (green). Source: Eric Jang, 2016. https://blog.evjang.com]
- 82. Posterior approximation
  - If we want to approximate $p(w \mid D)$ with $q(w \mid \theta)$, we need a measure of "closeness";
  - We use the Kullback-Leibler (KL) divergence. [Figure] Source: Flawnson Tong, https://towardsdatascience.com
- 86. Posterior approximation
  - We use the Kullback-Leibler (KL) divergence: $\theta^{*} = \arg\min_{\theta} \mathrm{KL}\left[ q(w \mid \theta) \,\|\, p(w \mid D) \right]$
  - which expands to (expectation taken over w ∼ q(w | θ)): $\theta^{*} = \arg\min_{\theta} \mathbb{E}_{q(w \mid \theta)}\big[ \underbrace{\log q(w \mid \theta)}_{\text{variational posterior}} - \underbrace{\log p(w)}_{\text{prior}} - \underbrace{\log p(D \mid w)}_{\text{log likelihood}} \big]$
  - Why KL-divergence? Because it lets us derive a cost that is tractable to optimize;
  - Not without paying a price, though.
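A short derivation, added here as a sketch, of why the two arg min expressions above agree and why the resulting cost is tractable. It uses only Bayes' rule, p(w | D) = p(D | w) p(w) / p(D); the expectation notation is an addition and may not appear on the original slide.

```latex
\begin{aligned}
\mathrm{KL}\left[ q(w \mid \theta) \,\|\, p(w \mid D) \right]
  &= \mathbb{E}_{q(w \mid \theta)}\left[ \log q(w \mid \theta) - \log p(w \mid D) \right] \\
  &= \mathbb{E}_{q(w \mid \theta)}\left[ \log q(w \mid \theta) - \log p(w) - \log p(D \mid w) \right] + \log p(D)
\end{aligned}
```

The evidence term log p(D) is the intractable part, but it does not depend on θ, so it drops out of the arg min. The remaining expectation can be estimated with samples w ∼ q(w | θ).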
- 87. Forward and Reverse KL. Figure: forms of the KL-divergence; (a) forward KL-divergence, (b) and (c) reverse KL-divergence. Source: Pattern Recognition and Machine Learning. Christopher M. Bishop. 2006.
- 88. Forward KL. Figure source: Colin Raffel, https://colinraffel.com
- 89. Forward KL (misspecification). Figure source: Colin Raffel, https://colinraffel.com
- 90. Reverse KL. Figure source: Colin Raffel, https://colinraffel.com
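To make the forward/reverse distinction concrete, here is a minimal NumPy sketch that fits a single Gaussian q to a bimodal target p by brute-force grid search. The target (an equal mixture of N(-3, 1) and N(3, 1)), the grids, and all function names are assumptions chosen for illustration; they are not taken from the slides or the cited figures.

```python
import numpy as np

# Grid used for numerical integration (simple Riemann sum).
x = np.linspace(-12.0, 12.0, 4801)
dx = x[1] - x[0]

def log_normal(x, mu, sigma):
    """Log-density of N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

# Hypothetical bimodal target: equal mixture of N(-3, 1) and N(3, 1).
p = 0.5 * np.exp(log_normal(x, -3.0, 1.0)) + 0.5 * np.exp(log_normal(x, 3.0, 1.0))
log_p = np.log(p)

def forward_kl(mu, sigma):
    """KL[p || q]: expectation under the target p."""
    return np.sum(p * (log_p - log_normal(x, mu, sigma))) * dx

def reverse_kl(mu, sigma):
    """KL[q || p]: expectation under the approximation q (the VI objective)."""
    log_q = log_normal(x, mu, sigma)
    return np.sum(np.exp(log_q) * (log_q - log_p)) * dx

mus = np.linspace(-4.0, 4.0, 41)
sigmas = np.linspace(0.5, 4.0, 36)

def grid_argmin(kl):
    """Return the (mu, sigma) on the grid with the smallest divergence."""
    values = np.array([[kl(m, s) for s in sigmas] for m in mus])
    i, j = np.unravel_index(np.argmin(values), values.shape)
    return mus[i], sigmas[j]

fwd_mu, fwd_sigma = grid_argmin(forward_kl)
rev_mu, rev_sigma = grid_argmin(reverse_kl)
print("forward KL fit:", fwd_mu, fwd_sigma)
print("reverse KL fit:", rev_mu, rev_sigma)
```

In this toy setting the forward KL fit is wide and sits between the two modes (mass-covering), while the reverse KL fit locks onto one mode with a narrow width (mode-seeking).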
- 92. Quality of the uncertainty estimation
  - Figure: MFVB (mean-field variational Bayes) approximation. Source: Variational Bayes and beyond: Bayesian inference for big data. Tamara Broderick. ICML 2018.
  - Can underestimate the variance severely;
  - Compared to MCMC, the means are usually fine, but the variance can be far off.
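A small worked example of the variance underestimation, under the assumption that the true posterior is a correlated 2D Gaussian. For a Gaussian target, the optimal fully factorized (mean-field) approximation under reverse KL is known in closed form: it matches the mean, and each factor has variance 1 / Λ_ii, where Λ is the precision matrix. The covariance values below are made up for illustration.

```python
import numpy as np

# Hypothetical "true posterior": zero-mean Gaussian with strong correlation.
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
precision = np.linalg.inv(cov)

# True marginal variances are the diagonal of the covariance.
true_var = np.diag(cov)

# Mean-field optimum under reverse KL: variance of factor i is 1 / precision[i, i].
mean_field_var = 1.0 / np.diag(precision)

print("true marginal variances:", true_var)
print("mean-field variances:   ", mean_field_var)
```

The stronger the correlation between parameters, the larger the gap between the mean-field variances and the true marginal variances.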
- 100. Training a Bayesian Neural Network. The training loop for a Bayesian Neural Network (BNN) using Variational Inference is as follows:
  - Sample the parameters of the network from q(w | θ). There are two variational parameters for each weight in q: µ and σ;
  - Parametrize the network with the sampled parameters, often using the reparametrization trick;
  - Forward pass with the data batch;
  - Calculate the combined loss: variational posterior, prior and log likelihood;
  - Compute gradients by backpropagation and optimize with SGD;
  - Repeat;
  - Prediction: multiple forward passes.
  - This method is also called Bayes by Backprop.
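The steps above can be sketched on the smallest possible "network": a single weight w in y = w·x, with gradients written by hand instead of an autodiff framework. This is only an illustrative sketch; the toy data, the Gaussian prior and likelihood, the softplus parametrization of σ, and the learning rate are all assumptions, not details from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): y = 2x + Gaussian noise.
noise_std, prior_std = 0.5, 1.0
x = rng.uniform(-1.0, 1.0, size=50)
y = 2.0 * x + noise_std * rng.normal(size=50)

def softplus(r):
    return np.log1p(np.exp(r))

def sigmoid(r):
    return 1.0 / (1.0 + np.exp(-r))

# Variational parameters of q(w | theta) = N(mu, sigma^2), sigma = softplus(rho).
mu, rho = 0.0, -3.0
lr = 1e-3

for step in range(10000):
    sigma = softplus(rho)
    eps = rng.normal()
    w = mu + sigma * eps                 # sample w via the reparametrization trick
    residual = y - w * x                 # forward pass on the (full) data batch
    # Gradient w.r.t. w of: -log p(w) - log p(D | w)
    grad_w = w / prior_std**2 - np.sum(x * residual) / noise_std**2
    # With w = mu + sigma * eps, log q(w | theta) = -log(sigma) - eps^2 / 2 + const.
    grad_mu = grad_w
    grad_sigma = -1.0 / sigma + grad_w * eps
    # SGD step; chain rule through softplus: d sigma / d rho = sigmoid(rho).
    mu -= lr * grad_mu
    rho -= lr * grad_sigma * sigmoid(rho)

sigma = softplus(rho)

# Prediction: multiple forward passes with freshly sampled weights.
x_new = 0.5
w_samples = mu + sigma * rng.normal(size=1000)
pred_mean, pred_std = np.mean(w_samples * x_new), np.std(w_samples * x_new)

# Closed-form posterior of this conjugate toy model, for comparison only.
post_precision = 1.0 / prior_std**2 + np.sum(x**2) / noise_std**2
post_mean = (np.sum(x * y) / noise_std**2) / post_precision
post_std = post_precision ** -0.5
print("VI:   ", mu, sigma)
print("exact:", post_mean, post_std)
```

Because the true posterior of this toy model is itself Gaussian, the variational fit should land close to the exact posterior; for a real neural network no such closed form is available and an autodiff framework would compute the gradients.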
- 101. Quality of the uncertainty estimation. Figure: HMC vs VI. Source: Bayesian Inference with Anchored Ensembles of Neural Networks, and Application to Exploration in Reinforcement Learning. Tim Pearce. 2018. For more information about the variational approach, please refer to: Weight Uncertainty in Neural Networks. C. Blundell, et al. 2015.
- 102. Dropout as a Bayesian Approximation. Figure: Dropout. Source: Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Nitish Srivastava, et al. 2014.
- 106. Dropout as a Bayesian Approximation
  - In 2015, the work Dropout as a Bayesian Approximation: Insights and Applications (Yarin Gal et al.) found a relationship between Dropout and Bayesian approximation;
  - It turns out that, to do approximate variational inference with a Bernoulli variational distribution in Bayesian NNs, you can just apply dropout during training and at prediction time as well;
  - Quite appealing due to its simplicity, and it also provides an interesting interpretation of dropout;
  - This technique is called "MC Dropout" or "Monte Carlo Dropout".
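A minimal sketch of the prediction side of MC Dropout: keep dropout active at prediction time, run several stochastic forward passes, and summarize them with a mean and a standard deviation. The tiny 1-16-1 MLP, its random (untrained) placeholder weights, the dropout probability, and the function names are all assumptions made for illustration; in practice the weights would come from a network trained with dropout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights for a 1-16-1 MLP (random, NOT trained; illustration only).
W1, b1 = rng.normal(size=(1, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def forward_with_dropout(x, drop_prob=0.5):
    """One stochastic forward pass with dropout applied to the hidden layer."""
    hidden = np.maximum(0.0, x @ W1 + b1)            # ReLU hidden layer
    keep = rng.random(hidden.shape) >= drop_prob     # Bernoulli keep-mask
    hidden = hidden * keep / (1.0 - drop_prob)       # inverted-dropout scaling
    return hidden @ W2 + b2

def mc_dropout_predict(x, num_passes=100):
    """Mean and standard deviation over several stochastic forward passes."""
    samples = np.stack([forward_with_dropout(x) for _ in range(num_passes)])
    return samples.mean(axis=0), samples.std(axis=0)

x = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
mean, std = mc_dropout_predict(x)
print(mean.ravel())
print(std.ravel())
```

The spread across passes is what MC Dropout reads as predictive uncertainty.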
- 107. MC Dropout on a Regression Setting. Some results from MC Dropout on a regression setting. Figure: MC Dropout. Source: Dropout as a Bayesian Approximation: Insights and Applications. Yarin Gal et al. ICML 2015.
- 108. MC Dropout on a Classification Setting. Some results from MC Dropout on a classification setting. Figure: MC Dropout. Source: Dropout as a Bayesian Approximation: Insights and Applications. Yarin Gal et al. ICML 2015.
- 109. Criticism of MC Dropout. Some results from MC Dropout on a regression setting. Figure: MC Dropout with varying number of data points; the gray regions are 1 std. dev. above and below. Source: Randomized Prior Functions for Deep Reinforcement Learning. Ian Osband et al. 2018. It was shown that MC Dropout did not pass a simple sanity check in a linear setting: its uncertainty did not concentrate with more data.
- 110. Section V Ensembles
- 111. Ensembles
  - Uses multiple hypotheses to learn a better one;
  - We can see dropout as an ensemble, but with shared weights;
  - The ensemble variance can be interpreted as uncertainty;
  - Simple intuition for why it works.
  - Diagram: Input Data → Model #1, Model #2, Model #3 → Combine predictions.
- 117. Deep Ensembles. In the work Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles (Lakshminarayanan B., et al. NIPS 2017), the authors proposed a very simple method to compute uncertainty with ensembles:
  - Setting: you have M models, with independent parameters $\theta_1, \theta_2, \ldots, \theta_M$.
  - 1) Initialize the parameters $\theta_1, \theta_2, \ldots, \theta_M$ randomly;
  - 2) Train each network $m \in \{1, \ldots, M\}$ with weights $\theta_m$ individually;
  - 3) Optionally add adversarial training;
  - 4) Combine the predictions by averaging the prediction from each network: $p(y \mid x) = M^{-1} \sum_{m=1}^{M} p_{\theta_m}(y \mid x, \theta_m)$
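Step 4 can be sketched in a few lines for a classification setting. The probability values below are hypothetical outputs of M = 3 already-trained networks; the function name and array layout are assumptions for illustration, and training (steps 1 to 3) is not shown.

```python
import numpy as np

def combine_predictions(member_probs):
    """Average class probabilities over ensemble members.

    member_probs has shape (M, N, C): M members, N inputs, C classes.
    """
    return np.mean(member_probs, axis=0)

# Hypothetical outputs of M = 3 networks for N = 2 inputs and C = 2 classes.
member_probs = np.array([
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.9, 0.1], [0.1, 0.9]],
])

ensemble_probs = combine_predictions(member_probs)
print(ensemble_probs)
```

The members agree on the first input, so the averaged prediction stays confident; they disagree on the second, so the averaged prediction moves toward uniform.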
- 118. Evaluating Entropy on Classification. Figure: plot of the binary entropy function H(p), a measure of the uncertainty.
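For reference, a small sketch of the binary entropy function and of its multi-class counterpart, the predictive entropy. Measuring the binary entropy in bits and the predictive entropy in nats is a choice made here, and the clipping constant is an assumption to avoid log(0); neither is specified on the slide.

```python
import numpy as np

def binary_entropy(p):
    """Binary entropy H(p) in bits; p is the probability of one of two classes."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0 - 1e-12)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def predictive_entropy(probs):
    """Entropy (in nats) of each row of class probabilities, shape (N, C)."""
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return -np.sum(probs * np.log(probs), axis=-1)

print(binary_entropy([0.0, 0.1, 0.5, 0.9, 1.0]))
print(predictive_entropy([[0.5, 0.5], [0.99, 0.01]]))
```

The entropy is highest when the predicted probabilities are uniform and approaches zero as the prediction becomes confident.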
- 119. Evaluating Entropy on Classification. Figure (two histogram panels, "Known classes" and "Unknown classes"; x-axis: entropy values; legend: 1 to 5): ImageNet trained only on dogs. Histogram of the predictive entropy on test examples from known classes (dogs) and unknown classes (non-dogs) with varying ensemble size. Source: Lakshminarayanan B., et al. NIPS 2017.
- 120. Evaluating Entropy on Classification. Figure (panels "Ensemble", "Ensemble + R", "Ensemble + AT" and "MC dropout" in each row; x-axis: entropy values; legend: 1, 5, 10): Histogram of the predictive entropy on test examples from known classes from SVHN (top row) and unknown classes from CIFAR-10 (bottom row). Source: Lakshminarayanan B., et al. NIPS 2017.
- 124. Randomized Priors. In Randomized Prior Functions for Deep Reinforcement Learning. Ian Osband et al. 2018:
  - Very simple and elegant modification of the ensemble method for uncertainty;
  - Developed in the Reinforcement Learning context;
  - Overcomes the problem of injecting a prior into ensemble-based approaches to uncertainty;
  - For a linear Gaussian model, it is equivalent to exact Bayesian inference.
- 128. Bootstrap. Diagram: Population → Sample #1, Sample #2, Sample #3 → a statistic computed on each sample (q1, q2, q3) → bootstrap statistic distribution.
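The bootstrap diagram can be sketched as follows. In practice the population is not available, so resampling with replacement from the observed data stands in for drawing new samples. The data, the choice of the mean as the statistic, and the function name are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed sample standing in for the population.
data = rng.normal(loc=5.0, scale=2.0, size=200)

def bootstrap_statistic(data, statistic, num_resamples=2000):
    """Compute a statistic on many resamples drawn with replacement."""
    n = len(data)
    stats = np.empty(num_resamples)
    for i in range(num_resamples):
        resample = data[rng.integers(0, n, size=n)]
        stats[i] = statistic(resample)
    return stats

boot_means = bootstrap_statistic(data, np.mean)

# The spread of the bootstrap distribution estimates the statistic's uncertainty.
std_error = boot_means.std()
low, high = np.percentile(boot_means, [2.5, 97.5])
print("bootstrap std. error of the mean:", std_error)
print("95% percentile interval:", low, high)
```

For the mean, the bootstrap standard error should be close to the usual estimate, the sample standard deviation divided by the square root of the sample size.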
- 130. Randomized Prior Functions. The key insight is to add a randomized (but fixed) prior and bootstrapped data:
  - for $k = 1, \ldots, K$ do:
    - Initialize $\theta_k \sim$ random;
    - Form $D_k$ with bootstrap;
    - Sample prior function $p_k \sim P$;
    - Optimize $L(f_{\theta} + \lambda p_k; D_k)$;
  - return posterior ensemble $\{f_{\theta_k} + p_k\}_{k=1}^{K}$
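A hedged sketch of the loop above, with two deliberate simplifications that are not from the paper: the neural network is swapped for a linear model on fixed polynomial features, and the optimization step is a closed-form ridge regression solve instead of SGD (so no random initialization of θ is needed). The toy data, `prior_scale`, `ridge`, and all function names are assumptions; the prior scale is applied both during fitting and at prediction time.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x):
    """Fixed polynomial features; a stand-in for a neural network body."""
    return np.stack([np.ones_like(x), x, x**2, x**3], axis=1)

# Hypothetical 1D regression data observed only on [-1, 1].
x_train = rng.uniform(-1.0, 1.0, size=20)
y_train = np.sin(3.0 * x_train) + 0.1 * rng.normal(size=20)

K, prior_scale, ridge = 10, 1.0, 1.0
members = []
for k in range(K):
    # Form D_k with bootstrap: resample the data with replacement.
    idx = rng.integers(0, len(x_train), size=len(x_train))
    phi = features(x_train[idx])
    # Sample a prior function p_k: random weights that stay fixed afterwards.
    prior_w = rng.normal(size=phi.shape[1])
    # Fit the trainable part f_theta so that f_theta + prior_scale * p_k matches D_k.
    target = y_train[idx] - prior_scale * (phi @ prior_w)
    theta = np.linalg.solve(phi.T @ phi + ridge * np.eye(phi.shape[1]), phi.T @ target)
    members.append((theta, prior_w))

def predict(x_new):
    """Ensemble mean and std. dev. of f_theta_k + prior_scale * p_k."""
    phi = features(x_new)
    preds = np.stack([phi @ theta + prior_scale * (phi @ prior_w)
                      for theta, prior_w in members])
    return preds.mean(axis=0), preds.std(axis=0)

mean, std = predict(np.array([0.0, 3.0]))
print("std inside the data range (x=0):", std[0])
print("std far from the data (x=3):   ", std[1])
```

Where there is data, the trainable part compensates for the prior and the members agree; far from the data the fixed priors are not corrected, so the members disagree and the ensemble spread grows.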
- 132. Qualitative Inspection. Some pathological cases. Figure: posterior predictive distributions for 1D regression with a (20, 20)-MLP and ReLUs. Source: Randomized Prior Functions for Deep Reinforcement Learning. Ian Osband et al. 2018. "(...) If an agent has only ever observed zero reward, then no amount of bootstrapping or ensembling will cause it to simulate positive rewards. (...)" – Randomized Prior Functions for Deep Reinforcement Learning. Ian
- 133. Predictive Uncertainty. Figure (plot; axis ticks omitted). Source: Ian Osband et al. Using Randomized Prior Functions for Deep Reinforcement Learning. NIPS 2018. Image from: http://blog.christianperone.com
- 134. Posterior Samples. Figure (plot; axis ticks omitted). Source: Ian Osband et al. Using Randomized Prior Functions for Deep Reinforcement Learning. NIPS 2018. Image from: http://blog.christianperone.com
- 135. Prior Samples. Figure (plot; axis ticks omitted). Source: Ian Osband et al. Using Randomized Prior Functions for Deep Reinforcement Learning. NIPS 2018. Image from: http://blog.christianperone.com
- 141. Final Remarks
  - Many methods, no standardized evaluation, no ground truth for model uncertainty;
  - Performance (CPU/GPU resources) penalty for basically all methods;
  - No scalable solution for MCMC (yet);
  - Choice depends on the application;
  - Always take into consideration the trade-off of guarantees;
  - Significant evolution of methods, frameworks and hardware.
- 142. Learning More - I
  - Statistical Rethinking (excellent book and course), by Richard McElreath. https://xcelab.net/rm/statistical-rethinking/
  - Variational Inference: A Review, by David M. Blei, et al. https://arxiv.org/abs/1601.00670
  - Scalable Bayesian Inference, by David Dunson. NIPS 2018 Talk. https://www.youtube.com/watch?v=0HXpnG_WnlI
  - Variational Bayes and Beyond, by Tamara Broderick. ICML 2018 Tutorial. https://www.youtube.com/watch?v=Moo4-KR5qNg
  - History of Bayesian Neural Networks, by Zoubin Ghahramani. NIPS 2016 Keynote talk. https://www.youtube.com/watch?v=FD8l2vPU5FY
- 143. Learning More - II
  - Uncertainty in Deep Learning, Slides, by Roberto Silveira. http://tiny.cc/c77n9y
  - A Beginner's Guide to Variational Methods, by Eric Jang. https://blog.evjang.com/2016/08/variational-bayes.html
  - Uncertainty in Deep Learning, Thesis, by Yarin Gal. http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf
  - PyMC3, Framework, by PyMC3 developers. https://docs.pymc.io/
  - Pyro, Framework, by Pyro developers. http://pyro.ai/
  - Tensorflow Probability, Framework, by TensorFlow developers. https://www.tensorflow.org/probability
- 144. Section VI Q&A
- 145. Q&A. Hope you liked it! Questions?