Priors in Bayesian Neural Networks
Tomasz Kuśmierczyk
2022-05-06
Based on:
Wenzel et al.: What Are Bayesian Neural Network Posteriors Really Like?, 2020
Noci et al.: Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect, 2021
Fortuin et al.: Bayesian Neural Network Priors Revisited, 2021
Immer et al.: Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning, 2021
DNN vs BNN
Reminder: Bayesian inference and learning
posterior over the network weights ∝ likelihood × prior
We can sample from the posterior using MCMC or learn it using e.g. VI.
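The slide's equation did not survive extraction. In symbols, Bayes' rule for the weights can be written as below; the notation (w for the network weights, D for the data) is an assumption, not necessarily the one used in the talk.

```latex
p(w \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid w)\, p(w)}{p(\mathcal{D})},
\qquad
p(\mathcal{D}) = \int p(\mathcal{D} \mid w)\, p(w)\, dw
```

Here p(D | w) is the likelihood, p(w) is the prior, and the normalizer p(D) is the marginal likelihood.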
How do we learn (=find posteriors) for BNNs?
● (Stochastic Gradient)-MCMCs: Hamiltonian MC, Langevin Dynamics etc.
● Distributional:
○ Laplace approximation
○ VI (via ELBO)
● “Model-specific”
○ MC-Dropout
○ SWAG
Cold posterior effect
posterior (likelihood × prior) vs. tempered (cold) posterior
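For reference, the tempered posterior usually meant in the cold-posterior literature can be written as follows. The slide's own equations were not recovered, so the symbols w, D, U and T are assumptions.

```latex
p_T(w \mid \mathcal{D}) \propto \exp\!\left(-\frac{U(w)}{T}\right),
\qquad
U(w) = -\log p(\mathcal{D} \mid w) - \log p(w)
```

T = 1 recovers the Bayes posterior; the cold posterior effect refers to T < 1 giving better predictive performance.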
Bad prior hypothesis
Trading off the relative influence of the prior term and the likelihood term (via the data size):
→ if the cold posterior effect (CPE) becomes stronger as the relative influence of the prior increases, this would be an indication that the prior is poor
Noci et al.: Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect, 2021
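To make "relative influence of the prior" concrete, here is a toy sketch in a conjugate Gaussian model. It is not taken from Noci et al.; the model and the name `prior_weight` are illustrative assumptions. It shows only that the prior's share of the posterior mean shrinks as the data size n grows.

```python
def prior_weight(n, prior_var=1.0, noise_var=1.0):
    """Share of the posterior mean contributed by the prior mean in the
    toy model x_i ~ N(mu, noise_var), mu ~ N(m0, prior_var)."""
    prior_prec = 1.0 / prior_var   # precision of the prior
    data_prec = n / noise_var      # precision contributed by n observations
    return prior_prec / (prior_prec + data_prec)

# The prior matters less and less as the dataset grows.
for n in (1, 10, 100, 1000):
    print(n, round(prior_weight(n), 4))
```

Shrinking the dataset is therefore one way to increase the prior's influence without changing the prior itself.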
DNN (SGD trained; no prior) weights
Fortuin et al.: Bayesian Neural Network Priors Revisited, 2021
FC BNN: different priors
Weight correlations
CNN BNN: different priors
How can we learn priors?
Marginal likelihood optimization / type-II MLE / empirical Bayes
Model selection (e.g. choosing priors) by maximizing the log marginal likelihood (ML):
● For a fixed model, it can be estimated using the Laplace approximation with the generalized Gauss-Newton (GGN) approximation to the Hessian
→ Alternate between model updates and the approximation
Will the approximation capture differences between priors?
Immer et al.: Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning, 2021
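A minimal sketch of the idea, assuming a one-weight linear regression rather than a neural network: the Laplace estimate of the log marginal likelihood is computed for several prior precisions and the best one is selected. In this toy model the log joint is quadratic in the weight, so the Hessian is exact and no GGN approximation is needed. All names (`laplace_log_evidence`, `alpha`, `noise_var`) are illustrative and not from Immer et al.

```python
import math
import random

def laplace_log_evidence(xs, ys, alpha, noise_var=0.25):
    """Laplace estimate of log p(y | x, alpha) for y_i = w * x_i + noise,
    with prior w ~ N(0, 1/alpha). Exact here because the model is Gaussian."""
    n = len(xs)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    hessian = sxx / noise_var + alpha        # curvature of the negative log joint
    w_map = (sxy / noise_var) / hessian      # MAP estimate of the weight
    sq_err = sum((y - w_map * x) ** 2 for x, y in zip(xs, ys))
    log_lik = -0.5 * n * math.log(2 * math.pi * noise_var) - 0.5 * sq_err / noise_var
    log_prior = 0.5 * math.log(alpha / (2 * math.pi)) - 0.5 * alpha * w_map ** 2
    # log joint at the mode + (d/2) log(2*pi) - (1/2) log det(H), with d = 1
    return log_lik + log_prior + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(hessian)

# Synthetic data (hypothetical): true weight 1.5, noise std 0.5.
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(50)]
ys = [1.5 * x + random.gauss(0, 0.5) for x in xs]

# Empirical Bayes over a grid: keep the prior precision with the highest evidence.
alphas = [10 ** k for k in range(-3, 4)]
best_alpha = max(alphas, key=lambda a: laplace_log_evidence(xs, ys, a))
print("selected prior precision:", best_alpha)
```

For a neural network the MAP estimate would come from training, and the Hessian would be replaced by an approximation such as the GGN, which is where the alternation between model updates and the approximation comes in.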
Posterior learning and learning priors: VI
● Assume an approximation from some (parametric) family of distributions
● Maximize the ELBO w.r.t. its parameters λ
Will the posterior capture differences between priors? How should we learn so that complex priors are accounted for?
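As a sketch of what maximizing the ELBO involves, consider a one-weight regression with a Gaussian approximation q(w) = N(m, s2), so that λ = (m, s2). The ELBO is the expected log-likelihood minus KL(q || prior); the prior enters only through the KL term. The model, the data, and the function names are assumptions chosen so that everything is available in closed form.

```python
import math

def gaussian_kl(m, s2, prior_var):
    """KL( N(m, s2) || N(0, prior_var) ) for scalar Gaussians."""
    return 0.5 * (s2 / prior_var + m * m / prior_var - 1.0 - math.log(s2 / prior_var))

def elbo(xs, ys, m, s2, prior_var=1.0, noise_var=0.25):
    """ELBO for y_i = w * x_i + noise with q(w) = N(m, s2)."""
    n = len(xs)
    # E_q[(y - w x)^2] = (y - m x)^2 + s2 * x^2
    exp_sq_err = sum((y - m * x) ** 2 + s2 * x * x for x, y in zip(xs, ys))
    exp_log_lik = -0.5 * n * math.log(2 * math.pi * noise_var) - 0.5 * exp_sq_err / noise_var
    return exp_log_lik - gaussian_kl(m, s2, prior_var)

# Small hypothetical dataset.
xs = [0.5, -1.0, 2.0, 1.5]
ys = [0.8, -1.4, 3.1, 2.0]

# Exact posterior of this conjugate model, used here as the optimal q.
prior_var, noise_var = 1.0, 0.25
post_prec = sum(x * x for x in xs) / noise_var + 1.0 / prior_var
post_mean = sum(x * y for x, y in zip(xs, ys)) / noise_var / post_prec

best = elbo(xs, ys, post_mean, 1.0 / post_prec)
worse = elbo(xs, ys, post_mean + 0.3, 1.0 / post_prec)
print(best > worse)
```

When q equals the exact posterior, the ELBO coincides with the log marginal likelihood, which is why the ELBO can also serve as an objective for prior parameters. With a restricted family for q, the bound is looser, and that gap is what the questions above are about.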
MCMC?
● MCMC, e.g. SGHMC:
○ explores the space of parameters and generates a set of samples from the posterior
○ assumes a fixed energy function
Parametric priors cannot be learned, but we can think about hierarchical priors:
(Nalisnick et al.: Predictive Complexity Priors, 2020): optimize the KL divergence to the predictive distribution of a reference model for a hierarchical prior
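To illustrate what a fixed energy function means for a sampler, here is a minimal sketch using full-gradient unadjusted Langevin dynamics, a simpler relative of SGHMC and not the method from the slides. The energy is U(w) = -log p(y | x, w) - log p(w) for a one-weight regression; the prior variance is baked into U and never updated during sampling. The data, the step size, and the function names are assumptions.

```python
import math
import random

def grad_energy(w, xs, ys, prior_var=1.0, noise_var=0.25):
    """Gradient of U(w) = -log p(y | x, w) - log p(w) for y_i = w * x_i + noise."""
    grad_neg_log_lik = -sum(x * (y - w * x) for x, y in zip(xs, ys)) / noise_var
    grad_neg_log_prior = w / prior_var   # the prior is fixed inside the energy
    return grad_neg_log_lik + grad_neg_log_prior

def langevin_samples(xs, ys, n_steps=20000, burn_in=2000, step=0.005, seed=0):
    """Unadjusted Langevin dynamics: gradient step on U plus Gaussian noise."""
    rng = random.Random(seed)
    w, samples = 0.0, []
    for t in range(n_steps + burn_in):
        noise = rng.gauss(0.0, 1.0)
        w = w - step * grad_energy(w, xs, ys) + math.sqrt(2.0 * step) * noise
        if t >= burn_in:                 # discard the warm-up phase
            samples.append(w)
    return samples

# Small hypothetical dataset.
xs = [0.5, -1.0, 2.0, 1.5]
ys = [0.8, -1.4, 3.1, 2.0]
samples = langevin_samples(xs, ys)
print("posterior mean estimate:", sum(samples) / len(samples))
```

Because the sampler only ever sees the gradient of U, changing the prior means changing U and rerunning the chain. A hierarchical prior instead places the prior's parameters in the sampled state.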
Conclusion
Model selection and learning for BNNs are tied
