Bayesian Neural Network
Natan Katz
Natan.katz@gmail.com
Agenda
• Short Introduction to Bayesian Inference
• Variational Inference
• Bayesian Neural Network
• Numerical Methods
• MNIST Example
Bayesian Inference
The inputs:
Evidence – a sample of observations (numbers, categories, vectors, images)
Hypothesis – an assumption about the probabilistic structure that generated the sample
Objective:
We wish to learn the optimal parameters of this distribution.
• The distribution of the hypothesis given the evidence is called the Posterior.
• We wish to find the optimal parameters for P(H|E)
• Remark: in many books this is called MAP (Maximum A Posteriori) estimation
Let’s Formulate
Z- R.V. that represents the hypothesis
X- R.V. that represents the evidence
Bayes formula:
P(Z|X) = P(Z, X) / P(X)
Let’s Formulate (Cont.)
Pr(Z) – Prior (the parameters’ distribution according to our belief)
Pl(X|Z) – Likelihood (how likely is the sample given the parameters)
P(Z|X) = Pr(Z) Pl(X|Z) / P(X)
Bayesian inference is therefore about working with the RHS terms.
In some cases the denominator is intractable or extremely difficult to calculate.
Example -GMM
We have K Gaussians with known variance σ
Draw μ_k ~ N(0, τ) for k = 1…K from the prior (τ is positive)
For each sample j = 1…n:
  z_j ~ Cat(1/K, 1/K, …, 1/K)
  x_j ~ N(μ_{z_j}, σ)
The evidence is then
p(x_{1…n}) = ∫ Π_{l=1…K} P(μ_l) Π_{j=1…n} Σ_{z_j} p(z_j) P(x_j | μ_{z_j}) dμ_{1:K}  ⇒ pretty nasty to compute
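To make the generative process concrete, here is a minimal sketch of sampling from this GMM (the specific values of K, n, τ and σ are assumptions for illustration):

```python
# Minimal sketch of the GMM generative process above (illustrative values assumed).
import numpy as np

K, n = 3, 1000
tau, sigma = 5.0, 1.0                                # prior variance and known noise scale

mu = np.random.normal(0.0, np.sqrt(tau), size=K)     # mu_k ~ N(0, tau); numpy takes a std dev
z = np.random.randint(0, K, size=n)                  # z_j ~ Cat(1/K, ..., 1/K)
x = np.random.normal(mu[z], sigma)                   # x_j ~ N(mu_{z_j}, sigma)
```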
Some Good news
P(Z|X) = Pr(Z) Pl(X|Z) / P(X)
• We wish to learn Z
• There is no Z in the denominator
⇒ P(Z|X) ∝ Pl(X|Z) Pr(Z)
Solutions
Until 1999
Mostly numerical sampling:
• Metropolis Hastings
• RBM
Variational Inference
“AN INTRODUCTION TO VARIATIONAL METHODS FOR GRAPHICAL MODELS”
VI – Algorithm Overview
• Rather than a numerical sampling method, we use an analytical one:
1. We define a distribution family Q(Z) (bias–variance trade-off)
2. We minimize the KL divergence: min KL(Q(Z) || P(Z|X))
log P(X) = E_Q[log P(X, Z)] − E_Q[log Q(Z)] + KL(Q(Z) || P(Z|X))
ELBO – Evidence Lower Bound
• Maximizing the ELBO ⇒ minimizing the KL
ELBO = E_Q[log P(X, Z)] − E_Q[log Q(Z)] = ∫ Q(Z) log( P(X, Z) / Q(Z) ) dZ = J(Q) – Euler–Lagrange
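As a rough illustration, a single Monte Carlo estimate of this ELBO can be written as below; Q is assumed to be a diagonal Gaussian and `log_joint(x, z)` is a hypothetical user-supplied function returning log P(X, Z):

```python
# Sketch: Monte Carlo ELBO estimate for a diagonal-Gaussian Q(Z), assuming a
# hypothetical log_joint(x, z) that returns log P(X, Z) for one sample of Z.
import torch

def elbo_estimate(log_joint, x, q_mu, q_log_sigma, n_samples=32):
    sigma = q_log_sigma.exp()
    z = q_mu + sigma * torch.randn(n_samples, *q_mu.shape)        # reparameterized Z ~ Q
    log_q = torch.distributions.Normal(q_mu, sigma).log_prob(z).sum(-1)
    log_p = torch.stack([log_joint(x, zi) for zi in z])           # log P(X, Z) per sample
    return (log_p - log_q).mean()                                 # E_Q[log P(X,Z)] - E_Q[log Q(Z)]
```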
MFT (Mean Field Theory)
Scientific “approval”
What Deep Learning doesn’t do
A DL Scenario
• We train a CNN to identify images (men versus women)
• Our accuracy can be amazing (98-99%)
Pretty cool
Let’s get Cruel
• We show the model an image of a basketball
• The model still outputs “man” or “woman”
Why is that?
Mathematical Observation
We trained a function F such that
F : {space of images}->{“man”,”woman”}
Statistical Observation
A basketball image is outside our training data
Anecdotes
Image (Uri Itay)
• Researchers trained a network to classify tanks and trees.
Using 200 images (100 of each kind, 50 train / 50 test), the test accuracy
was 100%.
When they took it to the Pentagon it began to miss. The reason was that all the
tank images were taken on cloudy days whereas the tree images were taken on sunny days.
Text
• In text problems we often see that, rather than finding latent
phenomena, networks use specific words as their anchor.
A plausible corollary
When we train a DL model:
• We hardly ever know what the model learned
• Models cannot “report” about their uncertainties
Is it crucial ?
• Consider an engine that uses AI to decide whether a tumor is
malignant or benign
• Drug treatment upon medical record
• Actions that are taken by an autonomous vehicle
• High frequency trading
What can we do?
• DL models are trained to find the optimal weights
• What if, rather than training the weights pointwise, we train
distributions over the weights?
The Inference
• For each data pair (x, y) we produce a mean and a variance
• This variance reflects the model’s uncertainty
• DL approach – do dropout at inference time
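A minimal sketch of that dropout-at-inference idea (MC Dropout), assuming some trained PyTorch classifier `model` that contains Dropout layers:

```python
# Sketch: MC Dropout - keep dropout active at inference and average many stochastic passes.
import torch

def mc_dropout_predict(model, x, n_samples=100):
    model.train()                                    # keeps Dropout layers active
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
    return probs.mean(0), probs.var(0)               # predictive mean and variance per class
```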
Uncertainty Types
Epistemic Uncertainty :
Episteme= Knowledge
Uncertainty that theoretically we can know but we don’t:
• Model structure issues
• Absence of data
We can use the notion “reducible” too
Uncertainty Types
Aleatoric Uncertainty :
Aleator = Dice Player
Uncertainty that we cannot eliminate:
• The stochasticity of a dice
• Noisy labels
We can use the notion “irreducible” too
Bayesian Neural Network
BNN-Training
• We have a neural network
• We place a prior distribution P over the weights W
• For data D = {(X, Y)}
For measuring uncertainties, we use the posterior distribution P(W | D)
DL Vs. BNN
DL
1. Training uses a loss related to the prediction probability P(Y|X,W)
2. The weights W are trained point-wise with MLE
Bayesian NN
1. Training uses a loss related to the posterior probability P(W|X,Y)
2. We train a distribution over the weights
BNN-Inference
We assume prior knowledge on the weights’ distribution π
As in any NN, we get an input x’ and aim to predict y’:
P(y’ | x’) = ∫ P(y’ | x’, w) P(w | D) dw
This can be rewritten as:
P(y’ | x’) = E_{P(w|D)}[ P(y’ | x’, w) ]
D = {(X, Y)}
Measuring Uncertainty
• At inference, given a data point x*
• Sample weights w_1, …, w_n
• Calculate the statistics:
E[f(x*, w)] ≈ (1/n) Σ_{i=1…n} f(x*, w_i)
V[f(x*, w)] = E[f(x*, w)²] − E[f(x*, w)]²
W is the r.v. of which the w_i are samples
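A minimal sketch of these predictive statistics, assuming a hypothetical `sample_posterior_net()` that returns a network whose weights are drawn from P(w | D):

```python
# Sketch: predictive mean and variance from n networks sampled from the weight posterior.
import torch

def predictive_stats(sample_posterior_net, x_star, n=100):
    outputs = torch.stack([sample_posterior_net()(x_star) for _ in range(n)])
    mean = outputs.mean(0)                        # E[f(x*, w)]
    var = (outputs ** 2).mean(0) - mean ** 2      # E[f(x*, w)^2] - E[f(x*, w)]^2
    return mean, var
```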
Common tools to obtain Posterior Dist.
1. Variational Inference
2. MCMC –Sampling (Metropolis –Hastings, Gibbs)
3. HMC
4. SGLD
Metropolis Hastings
• MCMC sampling algorithm
• The main idea is that we pick samples based on pdf comparisons:
at each step we propose a new sample based on the previous one
and decide whether to accept or reject it
• Unbiased, huge variance and very slow (iterates over the entire data)
• Great History
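A minimal random-walk Metropolis–Hastings sketch; `log_post(z)` is assumed to be a user-supplied unnormalized log posterior, i.e. log Pl(X|Z) + log Pr(Z):

```python
# Sketch: random-walk Metropolis-Hastings over an unnormalized log posterior.
import numpy as np

def metropolis_hastings(log_post, z0, n_steps=10_000, step=0.1):
    z, samples = np.asarray(z0, dtype=float), []
    for _ in range(n_steps):
        proposal = z + step * np.random.randn(*z.shape)          # symmetric proposal
        if np.log(np.random.rand()) < log_post(proposal) - log_post(z):
            z = proposal                                          # accept; otherwise keep z
        samples.append(z.copy())
    return np.array(samples)
```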
What is Hamiltonian?
• A physical operator that measures energy of a dynamical system
Two sets of coordinates
q -State coordinates
p- Momentum
H(p, q) =U(q) +K(p)
U(q) = −log[ π(q) L(q | D) ] ,   K(p) = p² / (2m)
U – potential energy, K – kinetic energy
Hamilton’s equations: dq/dt = ∂H/∂p ,  dp/dt = −∂H/∂q
Hamiltonian Monte Carlo
• Hamiltonians define a deterministic vector field (with trajectories…)
• If we define a Hamiltonian-dependent distribution, we can use this
property for sampling
P(q, p) ∝ e^{−H(q, p)}
Hybrid - MC
• We have the “state space” x
• We can add “momentum” and use Hamiltonian mechanism
Leap Frog Algorithm
We set a time interval δ. For each step i:
1. p_i(t + δ/2) = p_i(t) − (δ/2) ∂U/∂q |_{q(t)}
2. q_i(t + δ) = q_i(t) + δ ∂K/∂p |_{p(t + δ/2)}
3. p_i(t + δ) = p_i(t + δ/2) − (δ/2) ∂U/∂q |_{q(t + δ)}
HMC
Algorithm (Neal 1995, 2012; Duane 1987)
1. Draw x_0 from our prior and p_0 from a standard normal distribution
2. Perform L leapfrog steps
3. Accept or reject x_t following a Metropolis–Hastings step
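A minimal sketch of one HMC transition with the leapfrog integrator above, assuming user-supplied U(q) = −log posterior (unnormalized) and its gradient grad_U(q), with K(p) = p²/2 (m = 1):

```python
# Sketch: one HMC step - sample a momentum, run leapfrog, then Metropolis accept/reject.
import numpy as np

def hmc_step(U, grad_U, q0, n_leapfrog=20, delta=0.01):
    q, p = q0.copy(), np.random.randn(*q0.shape)        # draw momentum p ~ N(0, 1)
    p0 = p.copy()
    p = p - 0.5 * delta * grad_U(q)                     # initial half step for momentum
    for _ in range(n_leapfrog):
        q = q + delta * p                               # full step for position (dK/dp = p)
        p = p - delta * grad_U(q)                       # full step for momentum
    p = p + 0.5 * delta * grad_U(q)                     # turn the last update into a half step
    h0 = U(q0) + 0.5 * np.sum(p0 ** 2)                  # H at the start
    h1 = U(q) + 0.5 * np.sum(p ** 2)                    # H at the end
    return q if np.log(np.random.rand()) < h0 - h1 else q0   # M.H. acceptance
```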
HMC –Pros & Cons
Pros
• It takes points from a wider domain, thus we can describe the
distribution better
• It may take points with lower density
• Faster than plain MCMC
Cons
• It may suffer from low energy barriers
• No minibatches – not nice
• It has to calculate gradients for the entire data!!! Bad
What do we need then?
• A tool that allows sub-sampling
• Fewer Gradients
• Keen knowledge about extrema and how to escape them
Stochastic Gradient Langevin Dynamics (SGLD)
Langevin Equation
The Langevin equation describes the motion of a pollen grain in water:
F − γ v_t + ξ_t = 0 ,   ξ_t ~ N(0, t)
ξ_t is a Brownian force – the collisions with the water molecules
F – external forces
With a suitable choice of F, this equation has an equilibrium (stationary)
distribution, which will be our posterior distribution.
Langevin Equation
Let’s use the following: F = ∇E ,  v_t = dx/dt
The equation in its discrete form becomes:
x_{t+1} = x_t + (dt/γ) ∇E + (dt/γ) ξ_t
(looks familiar, doesn’t it?)
Langevin Equation
Some more rewriting:
x_{t+1} = x_t + ϵ_t ∇E + ε_t ,   where ϵ_t is the step size and ε_t is a stochastic term
Consider this term. Are we in a better situation?
Robbins & Monro (Stoch. Approx. 1951)
• Let F be a function and θ a number
• Assume the equation F(x*) = θ has a unique solution x*
• F itself is unknown
• Y is a measurable r.v. with E[Y(x)] = F(x)
Robbins & Monro (cont.)
The following algorithm converges to x*:
X_{N+1} = X_N + α_N (Y_N − θ) ,   with step sizes α_N satisfying Σ α_N = ∞ and Σ α_N² < ∞
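A minimal sketch of this stochastic-approximation iteration, assuming a hypothetical `noisy_F(x)` with E[noisy_F(x)] = F(x); the sign convention follows the slide (flip the sign of the update if F is increasing near x*):

```python
# Sketch: Robbins-Monro iteration for solving F(x) = theta from noisy evaluations.
def robbins_monro(noisy_F, theta, x0=0.0, n_iter=10_000):
    x = x0
    for n in range(1, n_iter + 1):
        alpha = 1.0 / n                        # satisfies sum(alpha) = inf, sum(alpha^2) < inf
        x = x + alpha * (noisy_F(x) - theta)   # X_{N+1} = X_N + alpha_N (Y_N - theta)
    return x
```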
Back to Langevin
x_{t+1} = x_t + ϵ_t ∇E + ε_t
With a minibatch gradient, ∇E_mb = ∇E + ε_t , so
x_{t+1} = x_t + ϵ_t ∇E_mb
Δθ_t = ϵ_t ( ∇log p(θ_t) + (N/n) Σ_{i=1…n} ∇log p(x_i | θ_t) )
We are almost there
• This equation converges to an optimal solution (MAP).
• But we want samples from the posterior, i.e. a solution of an SDE
• Let’s add a stochastic term:
Δθ_t = ϵ_t ( ∇log p(θ_t) + (N/n) Σ_{i=1…n} ∇log p(x_i | θ_t) ) + η_t ,   η_t ~ N(0, σ)
Variance Analysis
ϵ_t – follows the R&M rules
How big is σ?
As t → ∞ the equation must become the Langevin equation, so the injected
noise must dominate the minibatch-gradient noise.
The variance of η_t must therefore be bigger than ϵ_t · V(∇).
We do the following:
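A minimal sketch of the resulting SGLD update, assuming hypothetical `grad_log_prior(theta)` and `grad_log_lik(theta, batch)` functions for the model at hand and a numpy array `data` of N observations:

```python
# Sketch: SGLD - minibatch gradient step plus injected Gaussian noise eta_t ~ N(0, eps_t).
import numpy as np

def sgld(theta, data, grad_log_prior, grad_log_lik, n_steps=10_000, batch_size=32):
    N = len(data)
    for t in range(1, n_steps + 1):
        eps = 1e-3 * t ** (-0.55)                       # decaying step sizes per the R&M rules
        batch = data[np.random.choice(N, batch_size, replace=False)]
        grad = grad_log_prior(theta) + (N / batch_size) * grad_log_lik(theta, batch)
        theta = theta + eps * grad + np.random.normal(0.0, np.sqrt(eps), size=np.shape(theta))
    return theta
```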
Finally, Example
https://towardsdatascience.com/making-your-neural-network-say-i-dont-know-bayesian-nns-using-pyro-and-pytorch-b1c24e6ab8cd
Problem’s Framework
• MNIST CNN model
• MNIST SOTA ~99.8%
The Experiment
• Training a BNN using VI (a small number of epochs)
• Set a regular decision rule – take the max score over the digits
⇒ Accuracy ~88%
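In the spirit of the linked Pyro/PyTorch article, the VI training loop looks roughly like the sketch below; `model` and `guide` are hypothetical stand-ins for the Bayesian network and its variational family defined in the notebook:

```python
# Sketch: training a BNN with stochastic variational inference (SVI) in Pyro.
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

def train_bnn(model, guide, train_loader, n_epochs=5):
    svi = SVI(model, guide, Adam({"lr": 1e-3}), loss=Trace_ELBO())
    for _ in range(n_epochs):                                    # a small number of epochs
        for images, labels in train_loader:
            svi.step(images.view(images.size(0), -1), labels)    # one ELBO gradient step
```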
Allowing the Network to refuse
• For each image:
• Sample 100 networks from the posterior
• We obtain 100 outputs per image
• We have 10 digits, each with 100 scores
• If the median of a digit’s 100 scores is > 0.2, we accept that digit;
otherwise the network refuses
(Indeed, we can accept more than one result)
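A minimal sketch of that refusal rule, assuming `scores` is a 100 × 10 tensor of the sampled networks’ softmax outputs for one image:

```python
# Sketch: accept digits whose median sampled score exceeds 0.2, otherwise refuse.
import torch

def classify_or_refuse(scores, threshold=0.2):
    medians = scores.median(dim=0).values          # per-digit median over the 100 samples
    accepted = (medians > threshold).nonzero(as_tuple=True)[0]
    return accepted.tolist() if accepted.numel() > 0 else "refuse"
```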
Random Image
Summary
• Accuracy: 96%
• Refusal rate: 12.5%
• 95% of random images were refused
Thanks!!
My process
• https://wjmaddox.github.io/assets/BNN_tutorial_CILVR.pdf
• https://arxiv.org/pdf/2007.06823.pdf
• https://towardsdatascience.com/what-uncertainties-tell-you-in-bayesian-neural-networks-6fbd5f85648e
• https://medium.com/@uriitai/augmentation-and-groups-theory-795c287fec3f
• https://github.com/paraschopra/bayesian-neural-network-mnist/blob/master/bnn.ipynb
• https://towardsdatascience.com/making-your-neural-network-say-i-dont-know-bayesian-nns-using-pyro-and-pytorch-b1c24e6ab8cd
• http://www.stats.ox.ac.uk/~teh/research/compstats/WelTeh2011a.pdf
• https://arxiv.org/pdf/1206.1901.pdf
• http://cgl.elte.hu/~racz/Stoch-diff-eq.pdf
• https://arxiv.org/ftp/arxiv/papers/1103/1103.1184.pdf
• https://henripal.github.io/blog/langevin
• https://www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf