5. Method
• Motivation
• Step 1 → 1.4% Top 1 Acc
• Fixed randomly initialized encoder + trainable linear layer
• Trained on the labeled dataset
• Step 2 → 18.8% Top 1 Acc
• Predict labels for an unlabeled dataset with the randomly initialized encoder & linear layer
• Use the predicted labels to pretrain a new fixed randomly initialized encoder + trainable linear layer
• Then train on the actual labeled dataset
→ The online network used in practice needs a target network!
(Self-knowledge distillation + semi-supervised learning?)
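The two-step bootstrap experiment above can be sketched with toy stand-ins. Everything here is an illustrative assumption, not the paper's setup: a frozen random projection plays the encoder (the paper uses a ResNet), and a least-squares fit plays the trained linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_encoder(d_in, d_feat, rng):
    """A fixed, randomly initialized encoder: frozen random projection + ReLU."""
    W = rng.standard_normal((d_in, d_feat)) / np.sqrt(d_in)
    return lambda x: np.maximum(x @ W, 0.0)

def fit_linear_head(feats, onehot):
    """Train only the linear layer on top of frozen features (least squares)."""
    return np.linalg.lstsq(feats, onehot, rcond=None)[0]

d_in, d_feat, n_cls = 32, 64, 10
x_labeled = rng.standard_normal((500, d_in))
y_labeled = rng.integers(0, n_cls, 500)
x_unlabeled = rng.standard_normal((2000, d_in))
onehot = np.eye(n_cls)[y_labeled]

# Step 1: fixed random encoder + trainable linear head on the labeled set.
enc1 = random_encoder(d_in, d_feat, rng)
head1 = fit_linear_head(enc1(x_labeled), onehot)

# Step 2a: step-1 predictions on the unlabeled set become pseudo-labels
# ("bootstrap"), used to pretrain a head on a NEW random encoder.
pseudo = np.eye(n_cls)[(enc1(x_unlabeled) @ head1).argmax(axis=1)]
enc2 = random_encoder(d_in, d_feat, rng)
head2_pre = fit_linear_head(enc2(x_unlabeled), pseudo)

# Step 2b: then train on the real labeled set. (A closed-form refit ignores
# the warm start; with iterative SGD training, the pretraining is what helps.)
head2 = fit_linear_head(enc2(x_labeled), onehot)
```

The point of the toy is the data flow, not the numbers: predictions of one randomly initialized network serve as targets for the next, which is the bootstrapping idea BYOL builds on.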
https://hoya012.github.io/blog/byol/
6. Method
• Terminology
• 𝜃 : a set of weights of the online network
• 𝜉 : a set of weights of the target network
[Figure: BYOL pipeline — Augmentation → Encoder → Projector → Predictor on the online branch; stop-gradient on the target branch]
7. Method
• Description of BYOL
• The target network produces the regression target that the online network is trained to predict
• The target network's parameters 𝜉 are an exponential moving average of the online parameters 𝜃:
𝜉 ⟵ 𝜏𝜉 + (1 − 𝜏)𝜃, 𝜏 ∈ [0, 1]
• Loss: mean squared error between the ℓ2-normalized prediction q̄_𝜃(z_𝜃) and z̄′_𝜉:
ℒ_𝜃^BYOL ≜ ‖q̄_𝜃(z_𝜃) − z̄′_𝜉‖₂² = 2 − 2 ⋅ ⟨q_𝜃(z_𝜃), z′_𝜉⟩ / (‖q_𝜃(z_𝜃)‖₂ ⋅ ‖z′_𝜉‖₂)
where q̄_𝜃(z_𝜃) ≜ q_𝜃(z_𝜃) / ‖q_𝜃(z_𝜃)‖₂ and z̄′_𝜉 ≜ z′_𝜉 / ‖z′_𝜉‖₂
• Swapping the two networks' inputs (feeding each augmented view to the other branch) gives ℒ̃_𝜃^BYOL
• The symmetrized loss ℒ_𝜃^BYOL + ℒ̃_𝜃^BYOL is minimized with respect to the online network's parameters 𝜃 only
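The loss and the EMA update above can be sketched in a few lines of numpy. The batch of random vectors and the 256-dimensional shape are assumptions for illustration; the check at the end confirms the two algebraic forms of the loss agree.

```python
import numpy as np

def byol_loss(q, z_prime):
    """MSE between the l2-normalized prediction q_theta(z_theta) and the
    target projection z'_xi; algebraically equal to 2 - 2 * cosine similarity."""
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    zn = z_prime / np.linalg.norm(z_prime, axis=-1, keepdims=True)
    return np.sum((qn - zn) ** 2, axis=-1)

def ema_update(xi, theta, tau=0.996):
    """Target parameters track the online parameters: xi <- tau*xi + (1-tau)*theta.
    No gradient flows into xi (stop-gradient on the target branch)."""
    return tau * xi + (1.0 - tau) * theta

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 256))   # online prediction q_theta(z_theta)
z = rng.standard_normal((4, 256))   # target projection z'_xi

loss = byol_loss(q, z)              # per-sample loss, in [0, 4]
cos = np.sum(q * z, axis=-1) / (
    np.linalg.norm(q, axis=-1) * np.linalg.norm(z, axis=-1))
print(np.allclose(loss, 2.0 - 2.0 * cos))  # the two forms of the loss agree
```

Because only the angle between q and z′ matters after normalization, the loss is bounded and identical vectors give exactly zero.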
9. Method
• Implementation details
• Architecture
• ResNet50
• 4096-dimension MLP (projection) with no batch normalization
• 256-dimension prediction layer
• Optimization
• LARS optimizer
• 1000 epochs with warm-up period of 10 epochs
• Linear scaled learning rate 0.2 (LearningRate = 0.2 x BatchSize/256)
• Global weight decay of 1.5 ⋅ 10⁻⁶
• 𝜏_base = 0.996 and 𝜏 ≜ 1 − (1 − 𝜏_base) ⋅ (cos(𝜋𝑘/𝐾) + 1)/2, with k the current training step and K the total number of steps
• 4096 batch size split over 512 Cloud TPU v3 cores
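The learning-rate scaling rule and the 𝜏 schedule above are simple enough to verify directly. This is a minimal sketch; the function names are ours, not from the paper.

```python
import math

def scaled_lr(base_lr=0.2, batch_size=4096):
    """Linear LR scaling: LearningRate = 0.2 * BatchSize / 256."""
    return base_lr * batch_size / 256

def tau_schedule(k, K, tau_base=0.996):
    """EMA coefficient increased from tau_base to 1 over training:
    tau = 1 - (1 - tau_base) * (cos(pi * k / K) + 1) / 2."""
    return 1.0 - (1.0 - tau_base) * (math.cos(math.pi * k / K) + 1.0) / 2.0

print(scaled_lr())            # 3.2 for batch size 4096
print(tau_schedule(0, 1000))  # starts at tau_base = 0.996
print(tau_schedule(1000, 1000))  # ends at 1.0 (target frozen to the EMA limit)
```

The schedule makes the target network update quickly early on and become nearly static by the end of training.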
17. Conclusion
• Learns representations without negative pairs
• But a large batch size is still required
• State-of-the-art on several tasks
• Though already surpassed by SimCLRv2..
• Robust to the choice of augmentation options
• Still, finding suitable augmentations remains necessary