1. Priors in Bayesian Neural Networks
Tomasz Kuśmierczyk
2022-05-06
Based on:
Wenzel et al.: How Good is the Bayes Posterior in Deep Neural Networks Really?, 2020
Noci et al.: Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect, 2021
Fortuin et al.: Bayesian Neural Network Priors Revisited, 2021
Immer et al.: Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning, 2021
8. Bad prior hypothesis: trade off the relative influence of the prior term against that of the likelihood term, e.g. by varying the data size.
→ If the cold posterior effect (CPE) becomes stronger as the relative influence of the prior increases, this is an indication that the prior is poor.
Noci et al.: Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect, 2021
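For reference (not spelled out on the slide), the tempered-posterior formalization from Wenzel et al. (2020), which Noci et al. build on, makes this trade-off explicit:

```latex
% Cold/tempered posterior at temperature T; T = 1 recovers the Bayes posterior,
% T < 1 gives a "cold" (sharper) posterior. The CPE is the observation that
% T < 1 often predicts better than T = 1.
p_T(\theta \mid \mathcal{D}) \propto \exp\big(-U(\theta)/T\big),
\qquad
U(\theta) = -\sum_{i=1}^{n} \log p(y_i \mid x_i, \theta) - \log p(\theta).
```

Since the likelihood term grows with the number of observations n while the prior term does not, subsampling the data is one way to increase the relative influence of the prior.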
9. Empirical distribution of the weights of a DNN trained with SGD (no prior)
Fortuin et al.: Bayesian Neural Network Priors Revisited, 2021
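A minimal sketch (not Fortuin et al.'s code) of how one might check such trained weights for heavy tails; `model` is assumed to be any trained PyTorch module:

```python
import torch

def weight_excess_kurtosis(model: torch.nn.Module) -> float:
    """Excess kurtosis of all weight entries: roughly 0 for Gaussian
    weights, clearly positive for heavier-than-Gaussian tails."""
    w = torch.cat([p.detach().flatten()
                   for name, p in model.named_parameters() if "weight" in name])
    w = (w - w.mean()) / w.std()
    return (w.pow(4).mean() - 3.0).item()
```

Fortuin et al. report that SGD-trained weight marginals tend to be heavier-tailed than a Gaussian, which motivates heavier-tailed (e.g. Student-t or Laplace) priors over the default isotropic Gaussian.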
14. Marginal likelihood optimization / type-II MLE / empirical Bayes
Model selection (e.g. choosing priors) by maximizing the log marginal likelihood (ML):
● For a fixed model, it can be estimated using a Laplace approximation with the generalized Gauss-Newton (GGN) approximation to the Hessian
→ Alternate between model (hyperparameter) updates and refitting the approximation
Will the approximation capture differences between priors?
Immer et al.: Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning, 2021
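A minimal sketch of the alternating scheme, assuming the laplace-torch library (from the same line of work as Immer et al.); the toy model, synthetic data, step counts, and learning rates are all placeholder assumptions:

```python
import torch
import torch.nn.functional as F
from laplace import Laplace  # pip install laplace-torch

# Toy classifier and synthetic data, just to make the sketch self-contained.
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
X, y = torch.randn(256, 10), torch.randint(0, 2, (256,))
n_data = X.shape[0]
train_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(X, y), batch_size=32)

log_prior_prec = torch.zeros(1, requires_grad=True)  # log prior precision (hyperparameter)
hyper_opt = torch.optim.Adam([log_prior_prec], lr=1e-1)
inner_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for outer_step in range(10):
    # (1) MAP-style updates of the weights under the current Gaussian prior;
    # the prior term is divided by n_data since cross_entropy averages per batch.
    prior_prec = log_prior_prec.exp().detach()
    for xb, yb in train_loader:
        inner_opt.zero_grad()
        loss = F.cross_entropy(model(xb), yb) \
            + 0.5 * prior_prec / n_data * sum((p ** 2).sum() for p in model.parameters())
        loss.backward()
        inner_opt.step()

    # (2) refit the Laplace approximation (Kronecker-factored GGN Hessian)
    la = Laplace(model, 'classification',
                 subset_of_weights='all', hessian_structure='kron')
    la.fit(train_loader)

    # (3) gradient step on the Laplace estimate of the log marginal likelihood,
    # which laplace-torch exposes as differentiable w.r.t. the prior precision
    hyper_opt.zero_grad()
    neg_marglik = -la.log_marginal_likelihood(log_prior_prec.exp())
    neg_marglik.backward()
    hyper_opt.step()
```

Whether this captures differences between priors is limited by the approximation itself: the Laplace-GGN marginal likelihood sees the prior only through the MAP estimate and the Gaussian curvature around it.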
15. Posterior learning and learning priors: VI
● Assume the posterior approximation comes from some (parametric) family of distributions
● Maximize the ELBO w.r.t. its parameters λ
Will the posterior capture differences between priors? How do we learn so that complex priors are accounted for?
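For reference, the ELBO for a variational family q_λ makes explicit where the prior enters:

```latex
\mathcal{L}(\lambda)
  = \mathbb{E}_{q_\lambda(\theta)}\big[\log p(\mathcal{D} \mid \theta)\big]
  - \mathrm{KL}\big(q_\lambda(\theta) \,\|\, p(\theta)\big)
  \le \log p(\mathcal{D}).
```

The prior p(θ) appears only in the KL term, so a complex (e.g. heavy-tailed or hierarchical) prior is accounted for only insofar as the KL against it can be computed or estimated for the chosen family.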
16. ● MCMC, e.g. SGHMC:
○ explores the parameter space and generates a set of samples {θ_s} from the posterior
○ assumes a fixed energy function, for example U(θ) = -∑_i log p(y_i | x_i, θ) - log p(θ)
Parametric priors cannot be learned this way, but we can consider hierarchical priors:
(Nalisnick et al.: Predictive Complexity Priors, 2020): optimize the KL divergence to the predictive distribution of a reference model, for a hierarchical prior
MCMC?
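A minimal sketch of one simplified SGHMC update (Chen et al., 2014) for the fixed energy U(θ) above; the identity mass matrix, constant step size, and the friction value are simplifying assumptions, and `grads_U` would be a minibatch estimate of ∇U(θ):

```python
import torch

def sghmc_step(params, momenta, grads_U, lr=1e-4, friction=0.05):
    # Per parameter tensor:
    #   v     <- (1 - friction) * v - lr * grad_U + N(0, 2 * friction * lr)
    #   theta <- theta + v
    with torch.no_grad():
        for theta, v, g in zip(params, momenta, grads_U):
            noise = torch.randn_like(theta) * (2.0 * friction * lr) ** 0.5
            v.mul_(1.0 - friction).add_(g, alpha=-lr).add_(noise)
            theta.add_(v)
```

Because U(θ) is fixed, the prior's hyperparameters cannot be optimized inside this loop; under a hierarchical prior they would instead be treated as additional coordinates of θ and sampled jointly.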