Priors in Bayesian Neural Networks
Tomasz Kuśmierczyk
2022-05-06
Based on:
Wenzel et al.: How Good is the Bayes Posterior in Deep Neural Networks Really?, 2020
Noci et al.: Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect, 2021
Fortuin et al.: Bayesian Neural Network Priors Revisited, 2021
Immer et al.: Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning, 2021
DNN vs BNN
Reminder: Bayesian inference and learning
The posterior over the network weights is proportional to the likelihood times the prior.
We can sample from the posterior using MCMC, or learn an approximation to it using e.g. VI.
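In symbols, a standard way to write this reminder is given below. The notation is an assumption introduced here: w for the network weights, D for the training data, and (x*, y*) for a test input and its label.

```latex
p(w \mid \mathcal{D})
  = \frac{p(\mathcal{D} \mid w)\, p(w)}{p(\mathcal{D})}
  \;\propto\; \underbrace{p(\mathcal{D} \mid w)}_{\text{likelihood}}\;
              \underbrace{p(w)}_{\text{prior}},
\qquad
p(y^{*} \mid x^{*}, \mathcal{D})
  = \int p(y^{*} \mid x^{*}, w)\, p(w \mid \mathcal{D})\, \mathrm{d}w .
```

The second expression is the posterior predictive: a BNN averages predictions over weights drawn from the posterior, whereas a DNN plugs in a single weight estimate.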
How do we learn (=find posteriors) for BNNs?
● (Stochastic-gradient) MCMC: Hamiltonian MC, Langevin dynamics, etc.
● Distributional:
○ Laplace approximation
○ VI (via ELBO)
● “Model-specific”
○ MC-Dropout
○ SWAG
Cold posterior effect
[Equations comparing two posteriors, with the likelihood and prior terms labelled]
Bad prior hypothesis
Trade off the relative influence of the prior term and the likelihood term (by varying the data size):
→ if the CPE becomes stronger as the relative influence of the prior increases, this would indicate that the prior is poor
Noci et al.: Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect, 2021
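For reference, the cold posterior effect (CPE) is usually stated in terms of a tempered posterior. The formulation below is the common one from the CPE literature rather than something taken from the slides, and the symbols (temperature T, energy U, data pairs (x_i, y_i)) are assumptions of this sketch.

```latex
p_T(w \mid \mathcal{D}) \;\propto\; \exp\!\big(-U(w)/T\big),
\qquad
U(w) = -\sum_{i=1}^{n} \log p(y_i \mid x_i, w) \;-\; \log p(w).
```

Setting T = 1 recovers the Bayes posterior; the CPE is the observation that T < 1 often predicts better. The likelihood term in U(w) is a sum over n data points while the prior term is a single term, so shrinking n raises the relative weight of the prior, which is the lever the bad prior hypothesis relies on.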
DNN (SGD trained; no prior) weights
Fortuin et al.: Bayesian Neural Network Priors Revisited, 2021
FC BNN: different priors
Weight correlations
CNN BNN: different priors
How can we learn priors?
Marginal likelihood optimization / type-II MLE / empirical Bayes
Model selection (e.g. choosing priors) by maximizing the log marginal likelihood (ML):
● For a fixed model, it can be estimated using the Laplace approximation, with the generalized Gauss-Newton (GGN) matrix in place of the Hessian
→ Alternate between model updates and the approximation
Will the approximation capture differences between priors?
Immer et al.: Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning, 2021
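A minimal sketch of the idea on a toy problem, not the method of Immer et al.: Bayesian linear regression with an isotropic Gaussian prior of precision `alpha` and known noise precision `beta` (both names, and the grid search, are assumptions of this example). For this model the GGN equals the true Hessian and the Laplace estimate is exact, so it can be checked against the closed-form evidence. In a deep network the same formula would only be an approximation around the MAP weights.

```python
import numpy as np

def laplace_log_marglik(X, y, alpha, beta=1.0):
    """Laplace estimate of log p(y | X, alpha) for a linear-Gaussian model."""
    n, p = X.shape
    # Hessian of the negative log joint; for this model the GGN equals it exactly.
    H = beta * X.T @ X + alpha * np.eye(p)
    w_map = np.linalg.solve(H, beta * X.T @ y)  # MAP weights (the "model update")
    resid = y - X @ w_map
    log_lik = 0.5 * n * np.log(beta / (2 * np.pi)) - 0.5 * beta * resid @ resid
    log_prior = 0.5 * p * np.log(alpha / (2 * np.pi)) - 0.5 * alpha * w_map @ w_map
    _, logdet_H = np.linalg.slogdet(H)
    # log joint at the MAP plus the Gaussian volume term of the Laplace approximation
    return log_lik + log_prior + 0.5 * p * np.log(2 * np.pi) - 0.5 * logdet_H

def exact_log_marglik(X, y, alpha, beta=1.0):
    """Closed-form evidence: y ~ N(0, I / beta + X X^T / alpha)."""
    n = X.shape[0]
    cov = np.eye(n) / beta + X @ X.T / alpha
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(cov, y))

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ rng.normal(scale=0.5, size=5) + rng.normal(size=50)

# Type-II MLE over a grid of prior precisions: refit the MAP, re-estimate, compare.
alphas = np.logspace(-3, 3, 25)
scores = np.array([laplace_log_marglik(X, y, a) for a in alphas])
best_alpha = alphas[np.argmax(scores)]
print(f"selected prior precision: {best_alpha:.3g}")
```

Each grid point refits the weights and then re-evaluates the approximation, which mirrors the alternation on the slide in a very reduced form; a gradient step on the prior parameters would replace the grid in a scalable version.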
Posterior learning and learning priors: VI
● Assume an approximation from some (parametric) family of distributions
● Maximize the ELBO w.r.t. its parameters λ
Will the posterior capture differences between priors? How do we learn so that complex priors are accounted for?
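To make the link between the ELBO and prior learning concrete, here is a small sketch on a conjugate toy model, chosen as an assumption of this example because everything is available in closed form: observations y_i ~ N(w, 1), prior w ~ N(0, tau^2), and variational family q(w) = N(m, s^2), so that λ = (m, s). The ELBO depends on the prior scale `tau` as well as on λ, which means it can be maximized over both.

```python
import numpy as np

def elbo(y, m, s, tau):
    """Closed-form ELBO for y_i ~ N(w, 1), prior w ~ N(0, tau^2), q(w) = N(m, s^2)."""
    n = y.size
    expected_log_lik = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((y - m) ** 2 + s ** 2)
    kl_q_prior = np.log(tau / s) + (s ** 2 + m ** 2) / (2 * tau ** 2) - 0.5
    return expected_log_lik - kl_q_prior

def log_evidence(y, tau):
    """Exact log marginal likelihood: y ~ N(0, I + tau^2 * 1 1^T)."""
    n = y.size
    quad = y @ y - tau ** 2 * y.sum() ** 2 / (1 + n * tau ** 2)
    return -0.5 * (n * np.log(2 * np.pi) + np.log(1 + n * tau ** 2) + quad)

def exact_posterior(y, tau):
    """Mean and standard deviation of the exact Gaussian posterior over w."""
    precision = y.size + 1.0 / tau ** 2
    return y.sum() / precision, precision ** -0.5

# Synthetic data (illustrative only).
rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, size=30)

# For each prior scale, plug in the best q in the family (here: the exact posterior)
# and pick the prior scale with the highest resulting ELBO.
taus = np.logspace(-1, 2, 31)
best_elbos = np.array([elbo(y, *exact_posterior(y, t), t) for t in taus])
tau_hat = taus[np.argmax(best_elbos)]
print(f"prior scale selected via the ELBO: {tau_hat:.3g}")
```

Here the family contains the exact posterior, so the optimized ELBO equals the log evidence for every `tau` and the comparison between priors is faithful. With a restricted family in a real BNN, the gap between ELBO and evidence can differ from prior to prior, which is one way to read the question on the slide.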
MCMC?
● MCMC, e.g. SGHMC:
○ explores the space of parameters and generates a set of samples from the posterior
○ assumes a fixed energy function; parametric priors therefore cannot be learned, but we can think about hierarchical priors:
(Nalisnick et al.: Predictive Complexity Priors, 2020): optimize the KL divergence to the predictive distribution of a reference model for a hierarchical prior
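The role of the fixed energy function can be illustrated with a short sampler. This sketch swaps in plain (unadjusted, full-gradient) Langevin dynamics rather than SGHMC, and uses an assumed toy model: y_i ~ N(w, 1) with prior w ~ N(0, tau^2), so the energy is the negative log joint. The prior scale `tau` is baked into the energy before sampling starts, which is why the sampler by itself cannot learn it.

```python
import numpy as np

def grad_energy(w, y, tau):
    """Gradient of U(w) = -sum_i log N(y_i | w, 1) - log N(w | 0, tau^2)."""
    return -(y - w).sum() + w / tau ** 2

def langevin_samples(y, tau, step=0.01, n_steps=25_000, burn_in=5_000, seed=0):
    """Unadjusted Langevin dynamics: w <- w - step * grad U(w) + sqrt(2 * step) * noise."""
    rng = np.random.default_rng(seed)
    w, samples = 0.0, []
    for t in range(n_steps):
        w = w - step * grad_energy(w, y, tau) + np.sqrt(2 * step) * rng.normal()
        if t >= burn_in:
            samples.append(w)  # keep draws only after the burn-in phase
    return np.array(samples)

# Synthetic data (illustrative only).
rng = np.random.default_rng(2)
y = 1.5 + rng.normal(size=20)
tau = 1.0  # prior scale, fixed for the whole run

samples = langevin_samples(y, tau)

# Exact posterior of this conjugate model, for comparison.
post_prec = y.size + 1 / tau ** 2
post_mean = y.sum() / post_prec
print(f"sample mean {samples.mean():.3f} vs exact posterior mean {post_mean:.3f}")
```

Stochastic-gradient variants replace `grad_energy` with a minibatch estimate, and SGHMC additionally adds momentum and friction. A hierarchical prior would extend the sampled state with the prior's hyperparameters, so that they are inferred along with the weights while the energy itself stays fixed.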
Conclusion
Model selection and learning for BNNs are tied
