Sungchul Kim
Contents
1. Introduction
2. Method
3. Experimental evaluation
4. Building intuitions with ablations
5. Conclusion
Introduction
• Contrastive Learning
• Learning good representations is an important problem in computer vision
• Contrastive learning achieves state-of-the-art results
• Pulls positive pairs (views of the same image) together and pushes negative pairs (views of different images) apart
• Relies on large batch sizes (SimCLR), memory banks (MoCo), or customized mining strategies for negatives
• Bootstrap Your Own Latent (BYOL)
• Achieves state-of-the-art results without negative pairs
• Bootstraps representations as targets, rather than pseudo-labels or cluster indices
• Uses an online network and a target network that receive differently augmented views
Method
• Motivation
• Step 1 → 1.4% top-1 accuracy
• Fixed, randomly initialized encoder + trainable linear layer
• Trained on the labeled dataset
• Step 2 → 18.8% top-1 accuracy
• Predict on the unlabeled dataset with the randomly initialized encoder and linear layer
• Use the predicted labels to pretrain a new fixed, randomly initialized encoder + trainable linear layer
• Then train on the actual labeled dataset
→ The online network that is actually used needs a target network!
(Self-knowledge distillation + semi-supervised learning?)
https://hoya012.github.io/blog/byol/
Method
• Terminology
• 𝜃 : a set of weights of the online network
• 𝜉 : a set of weights of the target network
[Figure: BYOL architecture, with labels Augmentation, Encoder, Projector, Predictor, and Stop gradient]
Method
• Description of BYOL
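• The target network produces the regression targets that the online network learns to predict
• The target parameters $\xi$ are an exponential moving average of the online parameters $\theta$
$\xi \leftarrow \tau \xi + (1 - \tau)\theta, \quad \tau \in [0, 1]$
• Loss: mean squared error between the $\ell_2$-normalized prediction $q_\theta(z_\theta)$ and the $\ell_2$-normalized target projection $z'_\xi$
$\mathcal{L}_\theta^{\mathrm{BYOL}} \triangleq \left\| \bar{q}_\theta(z_\theta) - \bar{z}'_\xi \right\|_2^2 = 2 - 2 \cdot \frac{\langle q_\theta(z_\theta), z'_\xi \rangle}{\| q_\theta(z_\theta) \|_2 \cdot \| z'_\xi \|_2}$
where $\bar{q}_\theta(z_\theta) \triangleq q_\theta(z_\theta) / \| q_\theta(z_\theta) \|_2$ and $\bar{z}'_\xi \triangleq z'_\xi / \| z'_\xi \|_2$
• Swapping the inputs of the two networks gives the symmetric term $\tilde{\mathcal{L}}_\theta^{\mathrm{BYOL}}$
• $\mathcal{L}_\theta^{\mathrm{BYOL}} + \tilde{\mathcal{L}}_\theta^{\mathrm{BYOL}}$ is minimized with respect to the online parameters $\theta$ only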
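The loss on this slide can be checked numerically. The sketch below is a minimal illustration in plain Python, not the authors' implementation: the vectors stand in for the online prediction and the target projection, and all numbers and names (`byol_loss`, `cosine_form`, `q_view1`, and so on) are made up for the example. It shows that the squared distance between the normalized vectors equals 2 − 2 × cosine similarity, and how the swapped-input term is added.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit l2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def byol_loss(prediction, target_projection):
    """Squared l2 distance between the normalized prediction and target."""
    p = l2_normalize(prediction)
    t = l2_normalize(target_projection)
    return sum((a - b) ** 2 for a, b in zip(p, t))

def cosine_form(prediction, target_projection):
    """Equivalent closed form: 2 - 2 * cosine similarity."""
    dot = sum(a * b for a, b in zip(prediction, target_projection))
    norm_p = math.sqrt(sum(a * a for a in prediction))
    norm_t = math.sqrt(sum(b * b for b in target_projection))
    return 2.0 - 2.0 * dot / (norm_p * norm_t)

# Hypothetical network outputs for the two augmented views (made-up numbers).
q_view1, z_target_view2 = [1.0, 2.0, 2.0], [2.0, 1.0, 2.0]
q_view2, z_target_view1 = [0.5, 1.0, -1.0], [1.0, 1.0, 0.0]

loss = byol_loss(q_view1, z_target_view2)          # first view through the online network
loss_swapped = byol_loss(q_view2, z_target_view1)  # inputs of the two networks swapped
total = loss + loss_swapped                        # symmetrized objective
```

In a real training loop the target projections would be treated as constants (stop gradient), so only the online parameters receive gradients from `total`.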
Method
• Implementation details
• Image augmentations
• Uses the same augmentations as SimCLR
Method
• Implementation details
• Architecture
• ResNet50
• MLP projection head with a 4096-dimensional hidden layer and no batch normalization on its output
• 256-dimension prediction layer
• Optimization
• LARS optimizer
• 1000 epochs with a warm-up period of 10 epochs
• Base learning rate of 0.2, scaled linearly with batch size (LearningRate = 0.2 × BatchSize / 256)
• Global weight decay parameter of $1.5 \cdot 10^{-6}$
• $\tau_{\text{base}} = 0.996$ and $\tau \triangleq 1 - (1 - \tau_{\text{base}}) \cdot (\cos(\pi k / K) + 1) / 2$, with $k$ the current training step and $K$ the total number of training steps
• 4096 batch size split over 512 Cloud TPU v3 cores
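The two schedules in the optimization settings are simple enough to compute directly. The sketch below is an illustration only: the function names are hypothetical and `total_steps=1000` is an arbitrary value chosen for the example, not a setting from the slides. It applies the linear learning-rate scaling rule and the cosine schedule that moves 𝜏 from 𝜏base toward 1 over training.

```python
import math

def scaled_learning_rate(batch_size, base_lr=0.2):
    """Linear scaling rule: LearningRate = base_lr * BatchSize / 256."""
    return base_lr * batch_size / 256

def target_decay(step, total_steps, tau_base=0.996):
    """Cosine schedule: tau = 1 - (1 - tau_base) * (cos(pi * k / K) + 1) / 2."""
    return 1.0 - (1.0 - tau_base) * (math.cos(math.pi * step / total_steps) + 1.0) / 2.0

lr = scaled_learning_rate(4096)       # 0.2 * 4096 / 256
tau_start = target_decay(0, 1000)     # equals tau_base at the first step
tau_mid = target_decay(500, 1000)     # halfway between tau_base and 1
tau_end = target_decay(1000, 1000)    # reaches 1 at the last step
```

With the batch size of 4096 listed above, the scaled learning rate comes out to 3.2, and 𝜏 rises from 0.996 to 1, so the target network changes more and more slowly as training progresses.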
Experimental evaluation
• Linear evaluation on ImageNet
Experimental evaluation
• Semi-supervised training on ImageNet
Experimental evaluation
• Transfer to other classification tasks
Experimental evaluation
• Transfer to other vision tasks
Building intuitions with ablations
• Batch size
• As the batch size shrinks, performance drops less than it does for SimCLR
• Image augmentations
• Robust to the choice of image augmentations
Building intuitions with ablations
• Bootstrapping
• 𝜏 = 0 : the target network equals the online network
• 𝜏 = 1 : the target network is never updated (18.8% top-1 accuracy)
• 𝜏 must be set appropriately so that the online network's weights carry over to the target network
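The two extreme settings of 𝜏 follow directly from the moving-average update 𝜉 ← 𝜏𝜉 + (1 − 𝜏)𝜃. The sketch below is a toy illustration with made-up parameter values; `ema_update` is a hypothetical helper name, and real networks would apply the same rule to every weight tensor.

```python
def ema_update(target_params, online_params, tau):
    """One exponential-moving-average step: xi <- tau * xi + (1 - tau) * theta."""
    return [tau * xi + (1.0 - tau) * theta
            for xi, theta in zip(target_params, online_params)]

online = [0.5, -1.0, 2.0]  # hypothetical online weights (theta)
target = [0.0, 0.0, 0.0]   # hypothetical target weights (xi)

copied = ema_update(target, online, tau=0.0)   # target becomes the online network
frozen = ema_update(target, online, tau=1.0)   # target is left unchanged
slow = ema_update(target, online, tau=0.996)   # target moves 0.4% of the way to online
```

Values of 𝜏 between these extremes trade off how quickly the target tracks the online network against how stable the regression targets are.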
Building intuitions with ablations
• Ablation to contrastive methods
• 𝛽 = 1 : SimCLR
• No predictor, no target network
• 𝛽 = 0 : BYOL
• No negative pairs
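$\mathrm{InfoNCE}_\theta^{\alpha,\beta} \triangleq \frac{2}{B} \sum_{i=1}^{B} S_\theta(v_i, v'_i) - \beta \cdot \frac{2\alpha}{B} \sum_{i=1}^{B} \ln\left( \sum_{j \neq i} \exp\frac{S_\theta(v_i, v_j)}{\alpha} + \sum_{j} \exp\frac{S_\theta(v_i, v'_j)}{\alpha} \right)$
$S_\theta(u_1, u_2) \triangleq \frac{\langle \phi(u_1), \psi(u_2) \rangle}{\| \phi(u_1) \|_2 \cdot \| \psi(u_2) \|_2}$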
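The role of 𝛽 in this objective is easier to see with numbers. The sketch below is a simplified illustration, assuming 𝜙 and 𝜓 are both the identity on precomputed embeddings (an assumption made here, not something the slides state); the function names and the toy batch are hypothetical. With 𝛽 = 0 the negative term vanishes and only the positive-pair similarity remains.

```python
import math

def cosine(u, v):
    """Cosine similarity S(u, v), with phi and psi assumed to be the identity."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def generalized_infonce(views, views_prime, alpha, beta):
    """views[i] and views_prime[i] embed two augmentations of image i."""
    batch = len(views)
    positive_term = 0.0
    negative_term = 0.0
    for i in range(batch):
        positive_term += cosine(views[i], views_prime[i])
        # Same-view negatives (j != i) plus all cross-view pairs.
        same_view = sum(math.exp(cosine(views[i], views[j]) / alpha)
                        for j in range(batch) if j != i)
        other_view = sum(math.exp(cosine(views[i], views_prime[j]) / alpha)
                         for j in range(batch))
        negative_term += math.log(same_view + other_view)
    return (2.0 / batch) * positive_term - beta * (2.0 * alpha / batch) * negative_term

# Toy batch of two images with made-up 2-d embeddings.
v = [[1.0, 0.0], [0.0, 1.0]]
v_prime = [[1.0, 0.0], [0.0, 1.0]]

without_negatives = generalized_infonce(v, v_prime, alpha=0.1, beta=0.0)  # BYOL-like setting
with_negatives = generalized_infonce(v, v_prime, alpha=0.1, beta=1.0)     # SimCLR-like setting
```

This only isolates the effect of 𝛽; the predictor and target network that distinguish BYOL from SimCLR are not modeled here.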
Conclusion
• Learns representations without negative pairs
• But still requires a large batch size
• State-of-the-art on several tasks
• But already outperformed by SimCLRv2
• Robust to the choice of augmentations
• Still, suitable augmentations need to be found
Thank you
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
