Semi-Supervised Learning with Variational Bayesian Inference and Maximum Uncertainty Regularization


Kien Duc Do

These are my presentation slides from AAAI 2021.


Semi-supervised Learning with Variational Bayesian Inference and Maximum Uncertainty Regularization
Kien Do, Truyen Tran, Svetha Venkatesh
Applied AI Institute (A2I2), Deakin University, Australia

Introduction
• Many large-scale learning systems need a lot of labeled data to perform well.
• However, manual label annotation is expensive and time-consuming.
• Semi-supervised learning (SSL) reduces the need for labels by leveraging similar patterns in unlabeled data to improve classification.
• Recent state-of-the-art (SOTA) SSL methods are mainly based on consistency regularization (CR).

Consistency Regularization for SSL

Two types of perturbation
• Data perturbation: perturb the input.
• Weight perturbation: perturb the classifier's weights.
• Existing CR-based methods focus mainly on data perturbation.

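Some well-known CR-based methods
• Pi-model: penalizes the difference between the classifier's predictions on two randomly perturbed copies of the same input.
• Mean Teacher: penalizes the difference between the predictions of a student network and a teacher network, where the teacher's weights are the exponential moving average of the student's weights.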
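The two losses above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: a toy linear softmax classifier stands in for the network, Gaussian input noise stands in for data augmentation, and consistency is measured by the squared error between class-probability vectors. The helper names and the EMA rate `alpha` are hypothetical choices.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predict(W, x):
    """Class probabilities of a toy linear classifier (hypothetical stand-in for a network)."""
    return softmax(W @ x)

def pi_model_loss(W, x, sigma, rng):
    """Pi-model style consistency: same weights, two independently perturbed copies of x."""
    x1 = x + sigma * rng.standard_normal(x.shape)
    x2 = x + sigma * rng.standard_normal(x.shape)
    return float(np.sum((predict(W, x1) - predict(W, x2)) ** 2))

def mean_teacher_loss(W_student, W_teacher, x, sigma, rng):
    """Mean Teacher style consistency: student vs. teacher on two perturbed copies of x."""
    x_s = x + sigma * rng.standard_normal(x.shape)
    x_t = x + sigma * rng.standard_normal(x.shape)
    return float(np.sum((predict(W_student, x_s) - predict(W_teacher, x_t)) ** 2))

def ema_update(W_teacher, W_student, alpha=0.99):
    """Teacher weights track an exponential moving average of the student weights."""
    return alpha * W_teacher + (1.0 - alpha) * W_student

rng = np.random.default_rng(0)
W_student = rng.standard_normal((3, 5))   # hypothetical sizes: 3 classes, 5 features
W_teacher = W_student.copy()
x = rng.standard_normal(5)                # an unlabeled input
print(pi_model_loss(W_student, x, 0.1, rng))
W_student = W_student + 0.01 * rng.standard_normal(W_student.shape)  # stand-in for an optimizer step
W_teacher = ema_update(W_teacher, W_student)
print(mean_teacher_loss(W_student, W_teacher, x, 0.1, rng))
```

In a real training loop only the student receives gradients; the teacher is updated solely through `ema_update`.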

Can we achieve a better perturbation of data?
• Under weak data perturbation, the perturbed input is often close to the original input, so the classifier can only learn a mapping that is smooth locally around each data point.
• We want the perturbed input to be: i) not too close to the original input, and ii) difficult for the classifier to predict correctly.
• We choose the perturbed input to be the most uncertain virtual point (w.r.t. the classifier).
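One way to write this definition down is shown below. It is a sketch under assumptions, not the slide's exact equation: uncertainty is measured by the predictive entropy H of the classifier p_θ(y | x) over K classes, and the search is restricted to an L2 ball of radius r around x (a radius is varied in the ablation study; the symbols θ, K and r are our notation).

```latex
x^{*} = \operatorname*{arg\,max}_{x' :\ \|x' - x\|_2 \le r} \; H\big(p_\theta(y \mid x')\big),
\qquad
H\big(p_\theta(y \mid x')\big) = -\sum_{k=1}^{K} p_\theta(y = k \mid x') \log p_\theta(y = k \mid x')
```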

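Approximating the most uncertain point
• Recall that the most uncertain virtual point is defined as the input that maximizes the classifier's uncertainty within a neighborhood of the original input.
• However, this objective is difficult to optimize because it usually has multiple local optima. To address this problem, we approximate the most uncertain point by optimizing the first-order Taylor expansion of the uncertainty around the original input, which only requires the gradient of the uncertainty at that input.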
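Assuming uncertainty is the predictive entropy H and the feasible region is an L2 ball of radius r around x (our assumptions and notation), the linearized problem has a closed-form solution via the Cauchy-Schwarz inequality, since g^T(x' - x) ≤ ‖g‖₂ ‖x' - x‖₂ ≤ r ‖g‖₂. This derivation is illustrative and may differ from the formula on the slide:

```latex
H\big(p_\theta(y \mid x')\big) \approx H\big(p_\theta(y \mid x)\big) + g^{\top}(x' - x),
\qquad g = \nabla_{x} H\big(p_\theta(y \mid x)\big)
\\
\max_{\|x' - x\|_2 \le r} g^{\top}(x' - x) = r\,\|g\|_2
\quad \text{attained at} \quad
x^{*} \approx x + r\,\frac{g}{\|g\|_2} \qquad (g \neq 0)
```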

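Approximating the most uncertain point (cont.)
• We can also approximate the most uncertain point using projected gradient descent: each update at step t+1 moves the current point in the direction that increases the uncertainty and then projects it back onto the allowed region around the original input.
• Solving these update equations gives the approximate most uncertain point.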
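Both approximations can be sketched in NumPy for a toy linear softmax classifier. This is a hypothetical illustration, not the authors' implementation: it assumes entropy as the uncertainty measure and an L2 ball of radius r, and the normalized step, step size, number of steps, and "keep the best iterate" rule are our own choices. The entropy gradient follows from dH/dz = -p (log p + H) for logits z = W x.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax of a 1-D logit vector."""
    z = z - z.max()
    return z - np.log(np.sum(np.exp(z)))

def entropy(W, x):
    """Predictive entropy H(p(y|x)) of a toy linear classifier p(y|x) = softmax(W x)."""
    logp = log_softmax(W @ x)
    return float(-np.sum(np.exp(logp) * logp))

def entropy_grad(W, x):
    """Analytic gradient of H w.r.t. x: dH/dz = -p * (log p + H), chained through z = W x."""
    logp = log_softmax(W @ x)
    p = np.exp(logp)
    H = -np.sum(p * logp)
    return W.T @ (-p * (logp + H))

def project_l2_ball(x, center, r):
    """Project x onto the L2 ball of radius r centred at `center`."""
    d = x - center
    n = np.linalg.norm(d)
    return x if n <= r else center + d * (r / n)

def pgd_max_entropy(W, x0, r, step=0.1, n_steps=10):
    """Projected gradient ascent on H inside ||x - x0||_2 <= r.

    Uses normalized gradient steps and returns the iterate with the highest
    entropy seen (x0 included), so the result never scores below x0."""
    x = x0.copy()
    best_x, best_h = x0.copy(), entropy(W, x0)
    for _ in range(n_steps):
        g = entropy_grad(W, x)
        g_norm = np.linalg.norm(g)
        if g_norm == 0.0:
            break  # stationary point (e.g. uniform prediction): no ascent direction
        x = project_l2_ball(x + step * g / g_norm, x0, r)
        h = entropy(W, x)
        if h > best_h:
            best_x, best_h = x.copy(), h
    return best_x

def linearized_max_entropy(W, x0, r):
    """Closed-form maximizer of the first-order Taylor expansion: x0 + r * g / ||g||_2."""
    g = entropy_grad(W, x0)
    return x0 + r * g / np.linalg.norm(g)

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))   # hypothetical sizes: 3 classes, 5 input features
x0 = rng.standard_normal(5)
x_pgd = pgd_max_entropy(W, x0, r=0.5)
x_lin = linearized_max_entropy(W, x0, r=0.5)
print(entropy(W, x0), entropy(W, x_pgd), entropy(W, x_lin))
```

The one-step linearized solution is cheap but always lands on the ball's boundary, while the projected iterations can stop inside the ball if the entropy peaks there.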

Maximum Uncertainty Regularization
• The maximum uncertainty regularization (MUR) loss penalizes the difference between the classifier's prediction at the most uncertain virtual point and its prediction at the original input.
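A MUR-style loss can be sketched as a divergence between the two predictions. This is a hypothetical NumPy sketch, not the paper's exact formula: the choice between squared error and KL divergence, and treating the prediction at the original input as a fixed target, are our assumptions. The virtual point below is a random stand-in at distance r = 0.5; in MUR it would come from the uncertainty-maximization step.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax of a 1-D logit vector."""
    z = z - z.max()
    return z - np.log(np.sum(np.exp(z)))

def mur_loss(W, x, x_star, kind="mse"):
    """Hypothetical MUR-style loss: gap between p(y|x) and p(y|x_star).

    p(y|x) is treated as a fixed target; in an autodiff framework it would be
    wrapped in a stop-gradient so only the prediction at x_star is trained."""
    logp_target = log_softmax(W @ x)        # prediction at the original input
    logp_virtual = log_softmax(W @ x_star)  # prediction at the virtual point
    p_target, p_virtual = np.exp(logp_target), np.exp(logp_virtual)
    if kind == "mse":
        return float(np.sum((p_target - p_virtual) ** 2))
    if kind == "kl":
        return float(np.sum(p_target * (logp_target - logp_virtual)))
    raise ValueError(f"unknown kind: {kind!r}")

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))            # toy linear classifier: 3 classes, 5 features
x = rng.standard_normal(5)                 # an unlabeled input
d = rng.standard_normal(5)
x_star = x + 0.5 * d / np.linalg.norm(d)   # stand-in virtual point at distance 0.5 from x
print(mur_loss(W, x, x_star), mur_loss(W, x, x_star, kind="kl"))
```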

Weight Perturbation via Variational Bayesian Inference
• Unlike data perturbation, weight perturbation is not straightforward: we need some way to generate random weights.
• Variational Bayesian Inference (VBI) is a principled way to do that.
• VBI objective: a KL term forces the weight distribution to match the prior, and an expected log-likelihood term ensures faithful reconstruction.
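The two terms of the VBI objective can be illustrated with a mean-field Gaussian over the weights of a toy linear softmax classifier. This is a minimal sketch under assumptions, not the authors' model: the factorized Gaussian posterior, the standard normal prior, the reparameterization-based Monte Carlo estimate, and the function names are our choices.

```python
import numpy as np

def log_softmax(z):
    """Row-wise numerically stable log-softmax."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kl_gaussian(mu, log_sigma, prior_std=1.0):
    """KL(N(mu, sigma^2) || N(0, prior_std^2)), summed over all weights."""
    sigma2 = np.exp(2.0 * log_sigma)
    return float(np.sum(np.log(prior_std) - log_sigma
                        + (sigma2 + mu ** 2) / (2.0 * prior_std ** 2) - 0.5))

def sample_weights(mu, log_sigma, rng):
    """Reparameterization trick: w = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)

def negative_elbo(mu, log_sigma, X, y, rng, n_samples=8):
    """Monte Carlo estimate of -ELBO = KL(q(w) || p(w)) - E_q[log p(y | X, w)]."""
    expected_nll = 0.0
    for _ in range(n_samples):
        W = sample_weights(mu, log_sigma, rng)
        logp = log_softmax(X @ W.T)                 # (N, K) log class probabilities
        expected_nll -= logp[np.arange(len(y)), y].sum() / n_samples
    # KL term keeps q(w) close to the prior; the NLL term fits the labels
    return kl_gaussian(mu, log_sigma) + expected_nll

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5))      # 4 labeled inputs with 5 features (toy data)
y = np.array([0, 1, 2, 1])           # their class labels, K = 3
mu = np.zeros((3, 5))                # posterior means of the weights
log_sigma = np.full((3, 5), -3.0)    # posterior log standard deviations
print(negative_elbo(mu, log_sigma, X, y, rng))
```

Minimizing this quantity with respect to `mu` and `log_sigma` trades off fitting the labels against staying close to the prior.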

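Consistency under Weight Perturbation
• The consistency under weight perturbation (CWP) loss penalizes the difference between the output of a classifier with randomly sampled weights and the output of the classifier that uses the mean of the weight distribution.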
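A CWP-style loss can be sketched by sampling noisy classifiers from the weight distribution and comparing them with the mean classifier. This is a hypothetical NumPy sketch, not the paper's exact loss: the Gaussian weight distribution, the squared-error distance, using the mean prediction as the target, and the number of samples are our assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cwp_loss(mu, log_sigma, x, rng, n_samples=4):
    """Hypothetical CWP-style loss for a toy linear classifier.

    Averages the squared gap between predictions made with sampled weights
    W ~ N(mu, sigma^2) and the prediction made with the mean weights mu."""
    p_mean = softmax(mu @ x)  # prediction of the mean classifier, used as the target
    loss = 0.0
    for _ in range(n_samples):
        W = mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)  # a noisy classifier
        loss += np.sum((softmax(W @ x) - p_mean) ** 2) / n_samples
    return float(loss)

rng = np.random.default_rng(0)
mu = rng.standard_normal((3, 5))      # hypothetical posterior means (3 classes, 5 features)
log_sigma = np.full((3, 5), -2.0)     # hypothetical posterior log standard deviations
x = rng.standard_normal(5)            # an unlabeled input
print(cwp_loss(mu, log_sigma, x, rng))
```

Because no labels are needed, this term can be applied to unlabeled inputs.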

Final Objective
The final objective combines weight perturbation (via VBI) and data perturbation (via MUR) with a base consistency loss, which can come from any consistency regularization based method such as Pi-model, Mean Teacher, or ICT.
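A hedged sketch of how such a combined objective could look. The coefficient names λ are hypothetical (the ablation study varies coefficient values, but the slide's symbols are not recoverable), L_CR stands for the loss of the chosen base method, and under VBI the supervised term would be an expected negative log-likelihood under the weight distribution q(w):

```latex
\mathcal{L} = \underbrace{\mathbb{E}_{q(w)}\big[-\log p(y \mid x, w)\big]}_{\mathcal{L}_{\text{sup}}}
+ \lambda_{\text{CR}}\,\mathcal{L}_{\text{CR}}
+ \lambda_{\text{MUR}}\,\mathcal{L}_{\text{MUR}}
+ \lambda_{\text{CWP}}\,\mathcal{L}_{\text{CWP}}
+ \lambda_{\text{KL}}\,\mathrm{KL}\big(q(w)\,\|\,p(w)\big)
```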

Results on CIFAR-10/100 and SVHN

Ablation Study
• Performance with different coefficient values

Ablation Study (cont.)
• Performance with different radii
• Random perturbation vs. MUR

Visualization of the most uncertain samples

Conclusion
• We have proposed two new consistency regularization based methods: MUR and CWP.
• MUR finds the most uncertain virtual point and forces its class prediction to be similar to that of the original input.
• CWP leverages Variational Bayesian Inference to perturb weights and forces a noisy classifier to produce consistent outputs.
• Both MUR and CWP improve the classification performance of CR-based SSL methods.

Thank you for your attention!



Editor's Notes

The error of MT+VD is always smaller than the error of MT
