SSII2021
Fundamentals and Applications of Contrastive Learning
in Self-Supervised Learning
2021.6.10
Wang Xueting (汪 雪婷), CyberAgent, Inc.
Contrastive Self-Supervised Learning
Self-Supervised Learning
Pretrain-Finetune Pipeline
[Figure: Supervised pipeline. Pretrain an encoder on labeled data (Cat, Tiger) to learn a representation, then finetune a classifier on downstream tasks (Dog).]
• Requires a large amount of data
• High annotation cost
Self-Supervised Learning
Pretrain-Finetune Pipeline
[Figure: Supervised vs. self-supervised pipeline. In the self-supervised pipeline, the encoder is pretrained with a pretext task, with no annotated labels, and then finetuned with a classifier on downstream tasks.]
Self-Supervised Learning
Pretrain-Finetune Pipeline
Traditional supervised learning
• Large amount of data + annotated labels
• Task-specific learning, limited generalization

Self-supervised learning
• Reduces the cost of human annotation
• Supervision comes from the data itself
• General representation learning
• Applicable to a variety of downstream tasks
Self-Supervised Learning
Pretrain-Finetune Pipeline
Point
• How can we learn effective representations from unlabeled data?
• How can we design effective pretext tasks from the data itself?
Self-Supervised Learning
How: Paradigm Overview

Generative / Predictive (e.g., Auto-Encoders, BERT)
• Loss measured in the output space
• Learning to reconstruct/predict the original data
• Better reconstruction/prediction → better representation

Contrastive
• Loss measured in the representation space
• Learning to distinguish whether the anchor and another sample (positive or negative) are similar or not
• No need to reconstruct every detail; focus on distinguishing samples
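To make the distinction concrete, here is a minimal toy sketch (not from the talk) of where each paradigm measures its loss. The tensor shapes, the MSE reconstruction loss, and the cosine-similarity comparison are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 3, 32, 32)        # a toy batch of images
z = torch.randn(8, 128)              # stand-in for encoder(x): the learned representation

# Generative / predictive: the loss lives in the OUTPUT space,
# e.g., reconstruct x from z and compare it to the original pixel by pixel.
x_rec = torch.randn(8, 3, 32, 32)    # stand-in for decoder(z)
reconstruction_loss = F.mse_loss(x_rec, x)

# Contrastive: the loss lives in the REPRESENTATION space,
# e.g., make the anchor similar to its positive and dissimilar to a negative.
anchor, positive, negative = z[0], z[1], z[2]    # illustrative pairing
sim_pos = F.cosine_similarity(anchor, positive, dim=0)
sim_neg = F.cosine_similarity(anchor, negative, dim=0)
# Training would push sim_pos up and sim_neg down; no reconstruction is needed.
```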
Self-Supervised Learning
How: Paradigm Overview

Why perfect reconstruction is not needed for a useful representation:
Top: drawing of a dollar bill from memory.
Bottom: drawing subsequently made with a dollar bill present. [Image source: Epstein, 2016]
We can recognize and distinguish a dollar bill without being able to reproduce its details.
Contrastive Learning
Point: distinguish features among different instances
[Figure: an anchor (x), a positive (x+), and negatives (x_j, the other samples) in representation space. The anchor is pulled toward the positive (similar) and pushed away from the negatives (dissimilar).]
• Trained with the InfoNCE loss, which builds on noise-contrastive estimation [Gutmann+, AISTATS'10]
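For an anchor embedding q with one positive key k+ and negatives k_1..k_K, InfoNCE is L = -log( exp(sim(q, k+)/τ) / Σ_k exp(sim(q, k)/τ) ), where the sum runs over the positive and all negatives, i.e., a softmax classification of the positive against the negatives. Below is a minimal PyTorch sketch of this loss; the function name, the L2 normalization, and the temperature value are illustrative assumptions, not code from any of the cited papers.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE: classify each query's positive key against a shared pool of negatives.

    query:         (N, D) anchor embeddings
    positive_key:  (N, D) one positive per anchor
    negative_keys: (K, D) pool of negatives
    """
    query = F.normalize(query, dim=1)
    positive_key = F.normalize(positive_key, dim=1)
    negative_keys = F.normalize(negative_keys, dim=1)

    l_pos = (query * positive_key).sum(dim=1, keepdim=True)  # (N, 1) positive logits
    l_neg = query @ negative_keys.t()                        # (N, K) negative logits

    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(query.size(0), dtype=torch.long)    # the positive is always index 0
    return F.cross_entropy(logits, labels)
```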
Contrastive Learning
Recent related works (two directions: increasing negatives, increasing positives)
• MoCo: Momentum Contrast for Unsupervised Visual Representation Learning [CVPR'20]
• SimCLR: A Simple Framework for Contrastive Learning of Visual Representations [ICML'20]
MoCo [CVPR'20]
Points (negative samples)
He, Kaiming, et al. "Momentum contrast for unsupervised visual representation learning." CVPR 2020.
MoCo [CVPR'20]
End-to-end:
• Negatives: all the other samples in the batch (excluding the anchor and its positive)
• Two encoders: q encodes the anchor (query), k encodes the positive/negative keys
• Benefits from a large batch size, but this creates a memory problem (a minimal sketch follows below)
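A minimal sketch of the end-to-end setting under illustrative assumptions (toy linear encoders, batch size, and temperature; not the authors' code): both encoders receive gradients, and the negatives for each anchor are the other samples' keys in the same batch, so more negatives require a larger batch and more GPU memory.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy encoders standing in for e.g. ResNet backbones (illustrative assumption).
encoder_q = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
encoder_k = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))

x_q = torch.randn(16, 3, 32, 32)   # one augmented view per image (anchor / query)
x_k = torch.randn(16, 3, 32, 32)   # a second augmented view (positive key)

q = F.normalize(encoder_q(x_q), dim=1)
k = F.normalize(encoder_k(x_k), dim=1)

# In-batch negatives: for anchor i, k[i] is its positive and every k[j], j != i, is a negative.
logits = q @ k.t() / 0.07          # (N, N) similarity matrix, temperature-scaled
labels = torch.arange(q.size(0))   # the diagonal entries are the positives
loss = F.cross_entropy(logits, labels)
loss.backward()                    # end-to-end: gradients flow into BOTH encoders
```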
MoCo [CVPR'20]
Memory bank (MB):
• Negatives: embeddings stored in the memory bank, drawn by random sampling
• The memory bank has to be kept updated as training proceeds
• Problem: computing/storage cost, and stored features lag behind the current encoder (a minimal sketch follows below)
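A minimal memory-bank sketch; the dataset size, feature dimension, momentum value, and class/method names are illustrative assumptions rather than the original implementation.

```python
import torch
import torch.nn.functional as F

class MemoryBank:
    """Keeps one embedding per training sample; negatives are drawn by random sampling."""

    def __init__(self, dataset_size, feat_dim, momentum=0.5):
        self.bank = F.normalize(torch.randn(dataset_size, feat_dim), dim=1)
        self.momentum = momentum

    def sample_negatives(self, num_negatives):
        idx = torch.randint(0, self.bank.size(0), (num_negatives,))
        return self.bank[idx]

    def update(self, indices, new_feats):
        # An entry is refreshed only when its sample is re-encoded, so between visits
        # the stored feature drifts away from what the current encoder would produce.
        mixed = self.momentum * self.bank[indices] + (1 - self.momentum) * new_feats
        self.bank[indices] = F.normalize(mixed, dim=1)

bank = MemoryBank(dataset_size=10000, feat_dim=128)
negatives = bank.sample_negatives(4096)                                    # negatives for one batch
bank.update(torch.tensor([0, 1, 2]), F.normalize(torch.randn(3, 128), dim=1))
```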
MoCo [CVPR'20]
Momentum encoder:
• The momentum encoder processes only the positive (key) sample
• Negatives: past key embeddings, kept in a queue of encoded features
• Updating: the momentum encoder's weights follow the query encoder via a momentum (moving-average) update (a minimal sketch of the momentum update and the queue follows below)
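A minimal sketch of the two ingredients: a momentum update of the key encoder and a FIFO queue of past keys used as negatives. The toy encoders, queue size, momentum coefficient, and temperature are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, queue_size, m = 128, 1024, 0.999

encoder_q = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim))
encoder_k = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim))
encoder_k.load_state_dict(encoder_q.state_dict())              # start from the same weights
queue = F.normalize(torch.randn(queue_size, feat_dim), dim=1)  # negatives: past keys

@torch.no_grad()
def momentum_update(q_net, k_net, m):
    # Key encoder slowly follows the query encoder: theta_k <- m*theta_k + (1-m)*theta_q
    for p_q, p_k in zip(q_net.parameters(), k_net.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1 - m)

x_q = torch.randn(16, 3, 32, 32)   # query view (anchor)
x_k = torch.randn(16, 3, 32, 32)   # key view (positive)

q = F.normalize(encoder_q(x_q), dim=1)
with torch.no_grad():
    momentum_update(encoder_q, encoder_k, m)
    k = F.normalize(encoder_k(x_k), dim=1)

l_pos = (q * k).sum(dim=1, keepdim=True)   # one positive per query
l_neg = q @ queue.t()                      # negatives come from the queue
logits = torch.cat([l_pos, l_neg], dim=1) / 0.07
loss = F.cross_entropy(logits, torch.zeros(q.size(0), dtype=torch.long))
loss.backward()                            # only the query encoder receives gradients

# Enqueue the new keys and drop the oldest ones (FIFO).
queue = torch.cat([k.detach(), queue], dim=0)[:queue_size]
```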
SimCLR [ICML'20]
Points
• Positive samples: two augmented views of the same image via data augmentation (random crops + color distortion); a minimal sketch follows below
• Negative samples: the other samples in a larger batch (end-to-end)
Chen, Ting, et al. "A simple framework for contrastive learning of visual representations." ICML 2020.
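A sketch of SimCLR-style positive-pair construction with torchvision; the exact transform parameters below are illustrative, not the paper's settings.

```python
import torch
from PIL import Image
from torchvision import transforms

# Two random augmentations of the same image form a positive pair.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])

img = Image.new("RGB", (256, 256))             # stand-in for a dataset image
view_1, view_2 = augment(img), augment(img)    # positive pair: same instance, different views
batch = torch.stack([view_1, view_2])          # views of other images in the batch act as negatives
```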
Effect of Recent Works
[Figure source: SimCLR, "A Simple Framework for Contrastive Learning of Visual Representations"]
• Performance approaching supervised methods
• Less labeling cost
• High generalization (e.g., cross-domain transfer)
• Limitation: more training time & parameters
Summary of Recent Works
• Positive samples:
  • Multi-sampling methods: transformations, crops
• Negative samples:
  • Larger batch size
  • Saving previous features: memory bank, queue
• Intra-positives: same instance, same class
• Inter-negatives: different instances
Inter-intra Contrastive Framework [MM'20]
IIC: L. Tao, X. Wang, and T. Yamasaki, "Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework", ACM MM 2020.

Traditional contrastive learning uses:
• Intra-positives: same instance, same class
• Inter-negatives: different instances

The inter-intra contrastive (IIC) learning framework makes the most use of the data by also considering:
• Intra-negatives: same instance, different class
• Inter-positives: different instances, same class
Proposed Concept (video task)
Inter-intra contrastive learning: constraints of our method
• Intra-positive: multiple views (CMC-based)
  • Optical flow
  • Frame difference (residual frames); a minimal sketch follows below
• Inter-negative:
  • Different instances
• Intra-negative samples:
  • The same instance with its temporal information destroyed
CMC: Tian, Yonglong, et al. "Contrastive Multiview Coding." arXiv:1906.05849 (2019).
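A minimal sketch of the frame-difference (residual-frame) view used as an intra-positive; the tensor layout (channels, time, height, width) and the clip size are illustrative assumptions.

```python
import torch

# A video clip as (channels, time, height, width); random values stand in for real frames.
clip = torch.rand(3, 16, 112, 112)

# Residual frames: difference between consecutive frames, which keeps motion
# information while discarding most of the static appearance.
residual = clip[:, 1:, :, :] - clip[:, :-1, :, :]   # (3, 15, 112, 112)

# `clip` and `residual` form an intra-positive pair: two views of the same instance.
```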
Generation of Intra-negative Samples
• Break the temporal relations while keeping similar statistical information
• Two options (sketched below):
  • Frame repeating
  • Frame shuffling
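A minimal sketch of the two intra-negative options, using the same illustrative tensor layout as above; this is not the authors' implementation.

```python
import torch

clip = torch.rand(3, 16, 112, 112)   # (channels, time, height, width), illustrative layout
t = clip.size(1)

# Frame repeating: one randomly chosen frame fills the whole clip along time,
# so appearance statistics stay similar but temporal information is destroyed.
idx = torch.randint(0, t, (1,)).item()
repeated = clip[:, idx:idx + 1, :, :].repeat(1, t, 1, 1)

# Frame shuffling: permute the temporal order of the frames.
perm = torch.randperm(t)
shuffled = clip[:, perm, :, :]

# `repeated` and `shuffled` serve as intra-negative samples for the original `clip`.
```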
Evaluation
Downstream tasks
• Video retrieval
• Video recognition
Results: video retrieval
• On UCF101
• On HMDB51
Results: video recognition
• Self-supervised pretraining on UCF101 split 1
• Finetuned on
  • UCF101 (3 splits)
  • HMDB51 (3 splits)
* indicates results using the same network backbone, R3D.
Summary of IIC
• Introduce intra-negative samples to encourage models to learn
rich temporal information
• Significant improvements over the state-of-the-art methods are
achieved on two video tasks
Project page and GitHub repo are available.
Directions of
Contrastive Self-supervised Learning
• Pretext task selection/design
• Pair samples
  • Finding and using effective positive/negative samples
• Combining with supervised learning
  • Supervised contrastive learning
  • Task-related self-supervised learning
Thank you for your attention!
