Learning Visual Representation
without Manual-Label
kv
kelispinor@gmail.com
Types of Learning
           With Target                               Without Target
Active     Reinforcement Learning, Active Learning   Motivation, Exploration (?)
Passive    Supervised Learning                       Self-Supervised Learning
Active: Non-Stationary Dataset
Passive: Stationary Dataset
Today’s Topic
Visual Representation
● Global: style, semantics
● Local: attribute
● Metric: embedding
Label
● Much more expensive to obtain than unlabelled data (does not scale)
● Usually annotated for the specific task
● Contains far less information than data itself
General visual features
Goal of self-supervised learning
● Explore the structure of data distribution
● Task-driven representations are limited by targets (requirements of the task)
● Rapid generalization to new tasks and applications
Motivation
● A hot research topic
● Performance is approaching the supervised setting
● Related to deep metric learning
Practical Motivation
SSLFTW → Self-Supervised Learning For The Win!
https://twitter.com/ylecun/status/1228763787244843013
Evolution of ResNet-50
R50: # of params = 24M
Date     Training Scenario  Method           Backbone             Label fraction  Top-1 Accuracy  Top-5 Accuracy
2019/11  Semi-sup.          Noisy Student    EfficientNet (480M)  100 + extra     88.4            98.7
2019/05  Semi-sup.          Teacher-Student  R50 (24M)            100 + extra     81.2            -
-        Sup.               -                R50 (24M)            100             76              -
2020/02  Self-sup.          SimCLR           R50 (24M)            0               69.3            89
2019/06  Self-sup.          CMC              R50 (24M)            0               64.1            85.1
2019/12  Self-sup.          CPC v2           R50 (24M)            0               63.8            85.3
2019/12  Self-sup.          PIRL             R50 (24M)            0               63.6            -
2019/11  Self-sup.          MoCo             R50 (24M)            0               60.6            -
2019/07  Self-sup.          BigBiGAN         R50 (24M)            0               56.6            -
2018/05  Self-sup.          InstDisc         R50 (24M)            0               54              -
Outline: Questions to be Discussed
● Are manual labels necessary for learning useful concepts? Does the data itself contain rich information?
● Can we treat each image as its own class? Or even each pixel as a class?
● Can we implicitly increase the batch size? How do we keep the embedding space stable?
● Does the final layer contain a rich representation?
● Have we reached the capacity limit of ResNet-50? If not, what is an efficient training procedure?
● Is data augmentation a trick or an essential part of deep learning?
Part I: Self-Supervised Learning
Pretext Task
Previous Works
Generate `labels` by rule
● Rotation
● Jigsaw Puzzle
● Colorization
Predicting Image Rotations
Unsupervised Representation Learning by Predicting Image Rotations
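A minimal sketch of how a rotation pretext task can generate `labels` by rule, with no human annotation. The image is a toy 2-D grid and the helper names (`rotate90`, `make_rotation_example`) are hypothetical, chosen only for this illustration; it is not the paper's implementation.

```python
import random

def rotate90(img):
    """Rotate a 2-D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def make_rotation_example(img, rng=random):
    """Return (rotated_img, label) where label k means a rotation of k * 90 degrees."""
    k = rng.randrange(4)              # the label is produced by rule, not by a person
    out = [list(row) for row in img]
    for _ in range(k):
        out = rotate90(out)
    return out, k

img = [[1, 2],
       [3, 4]]
rotated, label = make_rotation_example(img, random.Random(0))
```

A network trained to predict `label` from `rotated` has to pick up on object orientation, which is the surrogate signal the pretext task relies on.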
Solving Jigsaw Puzzles
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
Colorization
Tracking Emerges by Colorizing Videos
Two Major Ideas
Pretext Task (Surrogate Loss)
● Rotation
● Jigsaw Puzzle
● Colorization
`Data-Centric` Loss Function
● Mutual Information
● Energy-based model
Aside:
Disentanglement in β TC-VAE
Contrastive Learning
Autoregressive Generative Model & PixelCNN
Product chain rule of probability
Conditional Image Generation with PixelCNN Decoders
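For reference, the "product chain rule" is the standard autoregressive factorization. It is written here for an image of n pixels visited in a fixed raster order; the ordering and the symbol names are notational assumptions, not taken from the slide.

```latex
p(\mathbf{x}) = \prod_{i=1}^{n} p\left(x_i \mid x_1, \ldots, x_{i-1}\right)
```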
Self-Supervised without Reconstruction
● PixelCNN is close to the best likelihood model
● But a log-likelihood model is poorly suited to encoding high-level information
● Deep network learns hierarchical internal representation of the data
● Learn the dataset, not the data points
● Use high-level information to organize low-level data rather than annotate it
CPC: Contrastive Predictive Coding
Rather than modelling the distribution directly, extracting the information shared
between x & c serves the purpose better
Summarize pixels into context (or history into current state)
MI: a generalized measure of correlation
[Figure labels: c, x]
CPC: Contrastive Predictive Coding
Representation Learning with Contrastive Predictive Coding
CPC: Contrastive Predictive Coding
To maximize mutual information,
we model the probability density ratio rather than the density itself
We simply use a log-bilinear model as f, and a linear map to predict the future
Note: the simplest bilinear map is f(u, v) = dot(u, v)
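For reference, the quantities named on this slide as they appear in the CPC paper (Representation Learning with Contrastive Predictive Coding). Here z_{t+k} is the encoded future observation, c_t the context, and W_k the linear map used for prediction step k.

```latex
I(x; c) = \sum_{x, c} p(x, c) \log \frac{p(x \mid c)}{p(x)}

f_k(x_{t+k}, c_t) \propto \frac{p(x_{t+k} \mid c_t)}{p(x_{t+k})}

f_k(x_{t+k}, c_t) = \exp\left( z_{t+k}^{\top} W_k \, c_t \right)
```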
CPC: Contrastive Predictive Coding
● The loss is called InfoNCE; in form it is a categorical cross-entropy (softmax) loss
● 1 positive sample; N-1 negative samples
● N is crucial to performance
● Gives only a loose lower bound on mutual information
all predictions
real future
Learning Deep Representations of Fine-grained Visual Descriptions
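A minimal sketch of the InfoNCE loss for a single query, assuming the scores are already the log of f (for example a dot product or z^T W c). The function name `info_nce` and the toy scores are hypothetical.

```python
import math

def info_nce(scores, positive_index):
    """Cross-entropy of picking the positive among N candidates.

    scores[i] is log f(x_i, c); exactly one entry belongs to the positive sample.
    """
    m = max(scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[positive_index]

# One positive (index 0) and three negatives, so N = 4.
loss_good = info_nce([5.0, 0.0, 0.0, 0.0], positive_index=0)
loss_chance = info_nce([0.0, 0.0, 0.0, 0.0], positive_index=0)
```

With uninformative scores the loss equals log N. The CPC paper's bound, I(x; c) ≥ log N − L, is then zero, which is one way to see why N matters: the bound can never exceed log N.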
CPC: Contrastive Predictive Coding
[Figure labels: feature extractor (patch-based), context predictor (PixelCNN head), predicted future, positive samples, negative samples, downstream task]
Data-Efficient Image Recognition with Contrastive Predictive Coding
Evaluation Protocol
● Linear Evaluation
● Efficient Classification
● Transfer Learning
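Labelled images in ImageNet
(14 million images in full ImageNet; the per-class counts below match the 1000-class ILSVRC-2012 training set of about 1.28 million images)
● 1%: 12.8 per class
● 10%: 128 per class
Balanced distribution over classes.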
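The per-class counts can be checked with a line of arithmetic. This sketch assumes the roughly 1.28 million training images and 1000 classes of ILSVRC-2012; the function name is hypothetical.

```python
def images_per_class(label_percent, train_images=1_280_000, num_classes=1000):
    """Average number of labelled images per class at a given label percentage."""
    return train_images * label_percent / 100 / num_classes

one_percent = images_per_class(1)    # about 12.8 images per class
ten_percent = images_per_class(10)   # about 128 images per class
```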
CPC: Contrastive Predictive Coding
● First paper to show a significant
improvement on a real-world dataset
● Top 1 = 71.5% on ImageNet
● Label efficiency becomes an
advantage
CPC: Contrastive Predictive Coding
From v1 to v2
MC: Model Capacity
BU: Bottom-up Spatial Prediction
LN: Layer Normalization
RC: Random Crop
HP: Horizontal Spatial Prediction
PA: Patch Augmentation
MC & LN
● R101 → R161 & increased feature dimension
● BN statistics may let the model cheat
CPC: Contrastive Predictive Coding
Questions
Are patches necessary?
→ Contrastive consistency is important
Which hyperparameters is performance sensitive to?
→ ResNet-like architecture & feature dimension
CMC: Contrastive Multiview Coding
Contrastive Multiview Coding
Qualitative Study
Remarks
● Pretext task performance does not always translate well
● Skip connections prevent degradation of
representation quality
● Model capacity (depth & representation size) strongly
influences quality
Revisiting Self-Supervised Visual Representation Learning
MoCo: Momentum Contrast
Contrastive Learning as Dictionary Look-Up
Dictionary should be large & consistent
● Context: Right Key
Momentum Contrast for Unsupervised Visual Representation Learning
MoCo: Momentum Contrast
Memory Bank (like a replay buffer in RL)
● Batch size is limited by hardware
● Maintains all keys in memory: O(N)
MoCo
● Dynamic queue rather than memory bank: O(N) → O(K)
● Momentum update is applied to the encoder rather than to the keys
Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination
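A minimal sketch of the two MoCo ingredients listed above: a fixed-size queue of keys and a momentum update of the key encoder. Parameters are plain lists of floats and keys are strings for readability; `momentum_update` and `KeyQueue` are hypothetical names. The default m = 0.999 follows the value reported in the MoCo paper.

```python
from collections import deque

def momentum_update(key_params, query_params, m=0.999):
    """theta_k <- m * theta_k + (1 - m) * theta_q, element-wise."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]

class KeyQueue:
    """FIFO dictionary holding at most K keys; the oldest keys are dropped first."""

    def __init__(self, max_keys):
        self.keys = deque(maxlen=max_keys)

    def enqueue(self, batch_keys):
        self.keys.extend(batch_keys)

queue = KeyQueue(max_keys=4)
queue.enqueue(["k0", "k1", "k2"])
queue.enqueue(["k3", "k4"])          # the queue is full, so "k0" is evicted

theta_k = momentum_update([0.0, 1.0], [1.0, 1.0], m=0.9)
```

The queue decouples the number of negatives K from the batch size, and the slowly moving key encoder keeps the queued keys roughly consistent with each other.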
MoCo: Momentum Contrast
[Figure labels: queue, aug 1, aug 2, key space, fast, slow]
MoCo: Momentum Contrast
● End-to-End: K as batch size
SimCLR: Simple Framework for Contrastive Learning
As of 2020, Hinton has recently published
- Stacked Capsule AE
- SimCLR
- Subclass Distillation
A Simple Framework for Contrastive Learning of Visual Representations
SimCLR: Simple Framework for Contrastive Learning
● Data augmentation is crucial for
contrastive learning
● A non-linear projection head preserves
information in the representation
● Larger batch size
● Normalized embedding
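A minimal sketch of a SimCLR-style contrastive loss over normalized embeddings (NT-Xent in the paper's terminology). Two things are assumptions made for this illustration: the layout, where rows 2i and 2i+1 are the two augmented views of image i, and the temperature value. The function names are hypothetical.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def nt_xent(z, temperature=0.5):
    """Average contrastive loss over 2N projections.

    Assumed layout: rows 2i and 2i+1 are two views of the same image.
    """
    z = [l2_normalize(v) for v in z]
    n = len(z)
    total = 0.0
    for i in range(n):
        j = i ^ 1  # index of the other view of the same image
        # Similarities to every other sample in the batch (in-batch negatives
        # plus the single positive); the sample itself is excluded.
        sims = [dot(z[i], z[k]) / temperature for k in range(n) if k != i]
        pos = dot(z[i], z[j]) / temperature
        m = max(sims)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims))
        total += log_denominator - pos
    return total / n

aligned = nt_xent([[1, 0], [1, 0], [0, 1], [0, 1]])   # views of an image agree
mixed = nt_xent([[1, 0], [0, 1], [1, 0], [0, 1]])     # views of an image disagree
```

Every other sample in the batch acts as a negative, so a larger batch directly means more negatives per positive pair.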
SimCLR: Simple Framework for Contrastive Learning
[Figure labels: aug 1, aug 2, representation, compute loss]
SimCLR: Simple Framework for Contrastive Learning
Random Cropping + Color Distortion
Histogram of different crops in two images
SimCLR: Simple Framework for Contrastive Learning
Recall the ReID strong baseline
● h: triplet embedding
● z: inference embedding & ID loss
● g: batch norm
The contrastive loss induces a loss of information
that is useful for the downstream task.
Projection
Representation
Bag of Tricks and A Strong Baseline for Deep Person Re-identification
SimCLR: Simple Framework for Contrastive Learning
Money Talks
● Batch Size
● Training Epoch
● Simple
SimCLR: Simple Framework for Contrastive Learning
Transfer to smaller datasets
Training Scheme
Method   Affiliation    Backbone     Batch Size  Solver (lr)             Epochs  Machine
CPC v2   DeepMind       ResNet-161   512         Adam + const lr         200     16 GPUs
CMC      MIT            ResNet-50    156 ~ 240   SGD + stepwise          240     *4 GPUs
MoCo     FAIR           ResNet-50    256         SGD + stepwise          200     64 GPUs
SimCLR   Google Brain   ResNet-50    8,192       LARS + cosine schedule  1000    128 TPU cores
Large Batch Training of Convolutional Networks
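A minimal sketch of a cosine learning-rate schedule of the kind listed for SimCLR in the table. Warmup and the LARS layer-wise scaling are left out; the function name and the `min_lr` parameter are assumptions for illustration.

```python
import math

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Decay from base_lr to min_lr along half a cosine period."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

schedule = [cosine_lr(s, 100, base_lr=1.0) for s in (0, 50, 100)]
```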
“We find in-batch negative
sampling suffices.”
Part II: Semi-Supervised Learning
Teacher-Student
Teacher-Student is a kind of self-training
framework
● Train a teacher on labelled data D
● Run the trained teacher on unlabelled
examples D’
● Train a new student on D’
● Fine-tune the student on D
Billion-scale semi-supervised learning for image classification
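The four steps above can be sketched end to end with a toy model. This is an illustration only: the "network" is a 1-D nearest-centroid classifier, the data are made up, and fine-tuning is approximated by a few small updates toward the labelled points. All names are hypothetical.

```python
def fit_centroids(points, labels):
    """'Train' a toy 1-D nearest-centroid classifier: one mean per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def fine_tune(centroids, points, labels, lr=0.1):
    """Nudge each class centroid toward the labelled examples."""
    tuned = dict(centroids)
    for x, y in zip(points, labels):
        tuned[y] += lr * (x - tuned[y])
    return tuned

# D: small labelled set; D_prime: larger unlabelled set (both made up).
D_x, D_y = [0.0, 1.0, 9.0, 10.0], [0, 0, 1, 1]
D_prime = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5]

teacher = fit_centroids(D_x, D_y)                   # 1. train teacher on D
pseudo = [predict(teacher, x) for x in D_prime]     # 2. run teacher on D'
student = fit_centroids(D_prime, pseudo)            # 3. train student on D'
student = fine_tune(student, D_x, D_y)              # 4. fine-tune student on D
```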
Teacher-Student
Noisy Student
● Train a teacher on labelled
data D
● Run the trained teacher on
unlabelled examples D’
● Train an equal-size or larger
student on D & D’ with noise
added to the student
→ Knowledge Expansion
Self-training with Noisy Student improves ImageNet classification
Noisy Student
Data Noise
● RandAug
→ Local Smoothness
Model Noise
● Dropout
● Stochastic Depth
→ Ensemble teacher
Others
● Data Balancing
Deep Networks with Stochastic Depth,
RandAugment: Practical automated data augmentation with a reduced search space
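A minimal sketch of stochastic depth, one of the model-noise sources listed above. During training the residual branch is dropped at random; at evaluation time it is always kept and scaled by its survival probability. Activations are scalars and the non-linearity is omitted for brevity; the function and parameter names are hypothetical.

```python
import random

def residual_block(x, f, survival_prob, training, rng=random):
    """Residual block with stochastic depth applied to the branch f."""
    if training:
        keep = rng.random() < survival_prob
        return x + f(x) if keep else x       # the branch is skipped when dropped
    return x + survival_prob * f(x)          # expected value of the training output

def double(v):
    return 2.0 * v

eval_out = residual_block(1.0, double, survival_prob=0.8, training=False)
```

Averaged over the random drops, training behaves like an implicit ensemble of networks of different depths, which matches the "ensemble teacher" remark above.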
Noisy Student
Sup.
Semi-Sup.
Performance Margin
● Arch. complexity ~ 10%
● Extra Data ~ 3 to 5%
Semi-Supervised Training Scheme
Method           Batch Size      Solver  Epochs     Unlabelled                   Devices
Teacher-Student  1536 (24 x 64)  SGD     1x ?       IG-3B (1500 tags)            64 GPUs
Noisy Student    512 ~ 2048      SGD     300 - 700  JFT-300M (18291 categories)  Cloud TPU v3, 2048 cores
ImageNet: 14M - 1000 classes
Affinity and Diversity
Affinity and Diversity: Quantifying Mechanisms of Data Augmentation
● Affinity: Distribution shift caused by augmentation
● Diversity: Complexity of augmentation applied
(Both are model-dependent measures)
Augmentation increases the effective number
of unique training examples
Affinity and Diversity
Performance Boost Tricks
● Decaying learning rate on an appropriate
schedule
● Turning off l2 regularization at the right time in
training does not hurt performance
● Relaxing architectural constraints mid-training can
boost final performance
● Turning augmentations off and fine-tuning on
clean data can improve final test accuracy
Conclusion
Insights & Techniques
● Usage of Unlabelled or Pseudo Labelled Data
● Contrastive Loss Extracts Representative Features
● Distillation Squeezes Large-Scale Dataset
● Data Balance
● Dynamic Queue for Negative Samples
● Momentum Update for Encoder
● Non-linear Head for Representation Preservation
● Augmentation Composition
● Constraint Relaxation during Training
References
1. Unsupervised Representation Learning by Predicting Image Rotations
2. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
3. Tracking Emerges by Colorizing Videos
4. Conditional Image Generation with PixelCNN Decoders
5. Representation Learning with Contrastive Predictive Coding
6. Learning Deep Representations of Fine-grained Visual Descriptions
7. Data-Efficient Image Recognition with Contrastive Predictive Coding
8. Contrastive Multiview Coding
9. Revisiting Self-Supervised Visual Representation Learning
10. Momentum Contrast for Unsupervised Visual Representation Learning
11. Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination
12. A Simple Framework for Contrastive Learning of Visual Representations
13. Billion-scale semi-supervised learning for image classification
14. Self-training with Noisy Student improves ImageNet classification
15. Deep Networks with Stochastic Depth
16. RandAugment: Practical automated data augmentation with a reduced search space
17. Affinity and Diversity: Quantifying Mechanisms of Data Augmentation

How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 

Recently uploaded (20)

AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 

Learning visual representation without human label

  • 1. Learning Visual Representation without Manual-Label kv kelispinor@gmail.com
  • 2. Types of Learning With Target Without Target Active Reinforcement Learning, Active Learning Motivation, Exploration ? Passive Supervised Learning Self-Supervised Learning Active: Non-Stationary Dataset Passive: Stationary Dataset
  • 3. Today’s Topic Visual Representation ● Global: style, semantics ● Local: attribute ● Metric: embedding
  • 4. Today’s Topic Visual Representation ● Global: style, semantics ● Local: attribute ● Metric: embedding General visual features
  • 5. Today’s Topic Visual Representation ● Global: style, semantics ● Local: attribute ● Metric: embedding Label ● Much more expensive than general data (can not scale) ● Usually annotated for the specific task ● Contains far less information than data itself General visual features
  • 6. Motivation: Goal of self-supervised learning ● Explore the structure of data distribution ● Task-driven representations are limited by targets (requirements of the task) ● Rapid generalization to new tasks and applications
  • 7. Practical Motivation ● Hot research topics ● Performance approaches supervised setting ● Relate to deep metric learning SSLFTW → Self-Supervised Learning F**k The World !!! https://twitter.com/ylecun/status/1228763787244843013
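  • 8. Evolution of ResNet-50 (R50: # of params = 24M)
       Date | Training Scenario | Method | Backbone | Label fraction | Top-1 accuracy | Top-5 accuracy
       2019/11 | Semi Sup. | Noisy Student | EfficientNet (480M) | 100 + extra | 88.4 | 98.7
       2019/05 | Semi Sup. | Teacher-Student | R50 (24M) | 100 + extra | 81.2 | -
       - | Sup. | - | R50 (24M) | 100 | 76 | -
       2020/02 | Self Sup. | SimCLR | R50 (24M) | 0 | 69.3 | 89
       2019/06 | Self Sup. | CMC | R50 (24M) | 0 | 64.1 | 85.1
       2019/12 | Self Sup. | CPC v2 | R50 (24M) | 0 | 63.8 | 85.3
       2019/12 | Self Sup. | PIRL | R50 (24M) | 0 | 63.6 | -
       2019/11 | Self Sup. | MoCo | R50 (24M) | 0 | 60.6 | -
       2019/07 | Self Sup. | BigBiGAN | R50 (24M) | 0 | 56.6 | -
       2018/05 | Self Sup. | InstDisc | R50 (24M) | 0 | 54 | -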
  • 9. Outline: Questions to be Discussed ● Are manual labels necessary for learning useful concepts? Does the data itself contain rich information? ● Can we treat each image as a single class? Or even a single pixel as a class? ● Can we implicitly increase the batch size? How do we keep the embedding space stable? ● Does the final layer contain a rich representation? ● Have we reached the complexity upper bound of ResNet-50? If not, what is an efficient training procedure? ● Is data augmentation a trick or an important DL feature?
  • 12. Previous Works Generate `labels` by rule ● Rotation ● Jigsaw Puzzle ● Colorization
  • 13. Predicting Image Rotations Unsupervised Representation Learning by Predicting Image Rotations
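The rotation pretext task can be sketched as follows: every image is rotated by 0, 90, 180 and 270 degrees, and the rotation index becomes a free label for a 4-way classifier. This is a minimal sketch on nested lists; the helper names `rotate90` and `rotation_pretext_samples` are illustrative and not taken from the paper.

```python
def rotate90(image):
    """Rotate a 2-D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_pretext_samples(image):
    """Return (rotated_image, label) pairs for 0, 90, 180 and 270 degrees.

    The label is the number of 90-degree turns, so no human annotation
    is needed to build the classification target.
    """
    samples = []
    current = [list(row) for row in image]
    for label in range(4):
        samples.append((current, label))
        current = rotate90(current)
    return samples


image = [[1, 2],
         [3, 4]]
samples = rotation_pretext_samples(image)
for rotated, label in samples:
    print(label, rotated)
```

A network trained to predict `label` from `rotated` has to pick up on object orientation, which is the signal the pretext task relies on.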
  • 14. Solving Jigsaw Puzzles Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
  • 15. Colorization Tracking Emerges by Colorizing Videos
  • 16. Two Major Ideas Pretext Task (Surrogate Loss) ● Rotation ● Jigsaw Puzzle ● Colorization `Data-Centric` Loss Function ● Mutual Information ● Energy-based model Aside: Disentanglement in β TC-VAE
  • 18. Autoregressive Generative Model & PixelCNN Product chain rule of probability Conditional Image Generation with PixelCNN Decoders
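The product (chain) rule mentioned on this slide is presumably the PixelCNN factorization, whose formula did not survive extraction. For an image of n × n pixels taken in raster-scan order, it reads:

```latex
p(\mathbf{x}) = \prod_{i=1}^{n^2} p\left(x_i \mid x_1, \ldots, x_{i-1}\right)
```

Each factor is the distribution of one pixel given all pixels above it and to its left.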
  • 19. Self-Supervised without Reconstruction ● PixelCNN is almost the best likelihood model ● But a log-likelihood model is poorly suited to encoding high-level information ● A deep network learns a hierarchical internal representation of the data ● Learn the dataset, not the data points ● Use high-level information to organize low-level data rather than annotate it
  • 20. CPC: Contrastive Predictive Coding Rather than modelling the distribution directly, extracting the information shared between x and c serves the purpose better. Summarize pixels into a context (or history into the current state). MI: a generalized correlation function between c and x
  • 21. CPC: Contrastive Predictive Coding Representation Learning with Contrastive Predictive Coding
  • 22. CPC: Contrastive Predictive Coding To maximize mutual information, We model the ratio of prob. density Simply use log-bilinear model as f, and linear map to predict future Note: bilinear map f(u, v) = dot(u, v) where
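The formulas on this slide were lost in extraction. As a hedged reconstruction from the CPC paper cited on the previous slide (symbols follow that paper: c_t is the context, x_{t+k} the future observation, z_{t+k} its encoding, W_k the linear map for step k), the density ratio and the log-bilinear score are:

```latex
f_k(x_{t+k}, c_t) \propto \frac{p(x_{t+k} \mid c_t)}{p(x_{t+k})},
\qquad
f_k(x_{t+k}, c_t) = \exp\left(z_{t+k}^{\top} W_k \, c_t\right)
```

Here W_k c_t is the linear prediction of the future encoding, and the dot product with z_{t+k} scores how well the prediction matches.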
  • 23. CPC: Contrastive Predictive Coding ● The loss is called InfoNCE (in form, a categorical cross-entropy or softmax loss) ● 1 positive sample; N-1 negative samples ● N is crucial to the performance ● Gives only a loose theoretical lower-bound estimate of the mutual information (figure labels: all predictions, real future) Learning Deep Representations of Fine-grained Visual Descriptions
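The InfoNCE loss above can be sketched in a few lines of plain Python. This is a minimal sketch, assuming the linear map has already been applied to the context so that `prediction` can be scored against each candidate with a dot product; the function and variable names are illustrative.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(prediction, candidates, positive_index):
    """Softmax cross-entropy of picking the positive among N candidates.

    candidates holds 1 positive and N-1 negative encodings; the score of
    each candidate is its dot product with the predicted encoding.
    """
    scores = [dot(prediction, c) for c in candidates]
    m = max(scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[positive_index]


candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
good = info_nce([1.0, 0.0], candidates, positive_index=0)
bad = info_nce([1.0, 0.0], candidates, positive_index=2)
print(good, bad)
```

With an uninformative prediction the loss equals log N, which is why the bound from the CPC paper, I(x; c) ≥ log N − L_N, can only become tighter as N grows.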
  • 24. CPC: Contrastive Predictive Coding feature extractor (patched) context predictor (pixelCNN head) predicted future negative samples downstream task positive samples Data-Efficient Image Recognition with Contrastive Predictive Coding
  • 25. Evaluation Protocol ● Linear Evaluation ● Efficient Classification ● Transfer Learning Labelled images in ImageNet (the 1000-class ILSVRC-2012 subset has about 1.28 million training images; the full ImageNet has about 14 million) ● 1%: 12.8 per class ● 10%: 128 per class Balanced distribution over classes.
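The per-class counts above follow from simple arithmetic. The sketch below assumes the ILSVRC-2012 training set size of 1,281,167 images over 1000 classes, a figure that is not stated on the slide.

```python
TRAIN_IMAGES = 1_281_167  # ILSVRC-2012 training set size (assumption, not from the slide)
NUM_CLASSES = 1000


def images_per_class(label_fraction):
    """Average number of labelled images per class for a given label fraction."""
    return TRAIN_IMAGES * label_fraction / NUM_CLASSES


for fraction in (0.01, 0.10):
    print(f"{fraction:.0%}: about {images_per_class(fraction):.1f} labelled images per class")
```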
  • 26. CPC: Contrastive Predictive Coding ● First paper to show a significant improvement on a real dataset ● Top 1 = 71.5% on ImageNet ● Label efficiency becomes an advantage
  • 27. CPC: Contrastive Predictive Coding From v1 to v2 MC: Model Capacity BU: Bottom-up Spatial Prediction LN: Layer Normalization RC: Random Crop HP: Horizontal Spatial Prediction PA: Patch Augmentation MC & LN ● R101 → R161 & increase feat. dim ● BN statistics may let the model cheat
  • 29. Questions Is Patch Necessary? → Contrastive Consistency is Important What hyperparameters are sensitive? → ResNet-like & Feature Dimension
  • 30. CMC: Contrastive Multiview Coding Contrastive Multiview Coding
  • 31. Qualitative Study Remarks ● Pretext task performance does not always translate well ● Skip connections prevent degradation of representation quality ● Model capacity (depth & rep. size) strongly influences quality Revisiting Self-Supervised Visual Representation Learning
  • 32. MoCo: Momentum Contrast Contrastive Learning as Dictionary Look-Up Dictionary should be large & consistent ● Context: Right Key Momentum Contrast for Unsupervised Visual Representation Learning
  • 33. MoCo: Momentum Contrast Memory Bank (like a Replay Buffer in RL) ● Batch size is limited by hardware ● Maintains all keys in memory, O(N) MoCo ● Dynamic queue rather than memory bank, O(N) → O(K) ● Momentum-update the key encoder rather than the keys Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination
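The two MoCo ingredients listed above, a fixed-size queue of keys and a momentum-updated key encoder, can be sketched without any deep learning framework. This is a minimal sketch: encoder parameters are flat lists of floats, `KeyQueue` and `momentum_update` are illustrative names, and m = 0.999 is the default reported in the MoCo paper.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Move the key encoder slowly towards the query encoder.

    key <- m * key + (1 - m) * query, applied element-wise.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class KeyQueue:
    """Fixed-size FIFO of encoded keys used as negatives (size K, not N)."""

    def __init__(self, max_size):
        self.keys = deque(maxlen=max_size)

    def enqueue(self, batch_keys):
        # Appending past maxlen silently drops the oldest keys.
        self.keys.extend(batch_keys)


queue = KeyQueue(max_size=4)
queue.enqueue(["k1", "k2", "k3"])
queue.enqueue(["k4", "k5"])          # "k1" is evicted
print(list(queue.keys))

key_params = momentum_update([1.0, 1.0], [0.0, 2.0], m=0.9)
print(key_params)
```

Because the key encoder changes slowly, keys produced several batches ago stay roughly consistent with the current ones, which is what lets the queue be much larger than a batch.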
  • 34. MoCo: Momentum Contrast Queue aug 1 aug 2 Key space Fast Slow
  • 36. MoCo: Momentum Contrast ● End-to-End: K as batch size
  • 37. SimCLR: Simple Framework for Contrastive Learning So far in 2020, Hinton has published - Stacked Capsule AE - SimCLR - Subclass Distillation A Simple Framework for Contrastive Learning of Visual Representations
  • 38. SimCLR: Simple Framework for Contrastive Learning ● Data augmentation is crucial for contrastive learning ● A non-linear mapping (projection head) preserves information in the representation ● Larger batch size ● Normalized embedding
  • 39. SimCLR: Simple Framework for Contrastive Learning aug 1, aug 2 ... Representation, Compute Loss
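The normalized embeddings and in-batch negatives mentioned above come together in the SimCLR loss (NT-Xent). The following is a minimal pure-Python sketch under simplifying assumptions: embeddings are short lists of floats, `views_a[i]` and `views_b[i]` are the projections of two augmentations of image i, and the temperature of 0.5 is only an illustrative default.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def nt_xent(views_a, views_b, temperature=0.5):
    """Average normalized-temperature cross-entropy over all 2N views.

    For each view, the other augmentation of the same image is the
    positive and the remaining 2N - 2 views in the batch are negatives.
    """
    z = [l2_normalize(v) for v in views_a + views_b]
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive for view i
        sims = [
            sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
            if k != i
        ]
        positive = sum(a * b for a, b in zip(z[i], z[j])) / temperature
        m = max(sims)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims))
        total += log_denominator - positive
    return total / (2 * n)


views_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = nt_xent(views_a, [[1.0, 0.1], [0.1, 1.0]])
mismatched = nt_xent(views_a, [[0.1, 1.0], [1.0, 0.1]])
print(aligned, mismatched)
```

Since every other view in the batch acts as a negative, a larger batch directly means more negatives per positive, which is one reading of why batch size matters so much here.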
  • 40. SimCLR: Simple Framework for Contrastive Learning
  • 41. SimCLR: Simple Framework for Contrastive Learning Random Cropping + Color Distortion Histogram of different crops in two images
  • 42. SimCLR: Simple Framework for Contrastive Learning Recall the ReID Strong Baseline ● h: triplet embedding ● z: inference embedding & ID loss ● g: batch norm The contrastive loss induces a loss of information that matters for the downstream task. Projection Representation Bag of Tricks and A Strong Baseline for Deep Person Re-identification
  • 43. SimCLR: Simple Framework for Contrastive Learning Money Talks ● Batch Size ● Training Epoch ● Simple
  • 44. SimCLR: Simple Framework for Contrastive Learning Transfer to smaller datasets
  • 45. Training Scheme
       Method | Affiliation | Backbone | Batch Size | Solver (lr) | Epoch | Machine
       CPC v2 | DeepMind | ResNet 161 | 512 | Adam + const lr | 200 | 16 GPUs
       CMC | MIT | ResNet 50 | 156 ~ 240 | SGD + Stepwise | 240 | *4 GPUs
       MoCo | FAIR | ResNet 50 | 256 | SGD + Stepwise | 200 | 64 GPUs
       SimCLR | Google Brain | ResNet 50 | 8,192 | LARS + Cos schedule | 1000 | TPU 128 cores
       Large Batch Training of Convolutional Networks “We find in-batch negative sampling suffices.”
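The "LARS + Cos schedule" entry combines two simple rules, sketched below. The cosine schedule decays the learning rate from its base value to zero, and LARS (from "Large Batch Training of Convolutional Networks") rescales each layer's step by the ratio of weight norm to gradient norm. The function names and the default coefficient values are illustrative assumptions, not the settings used by SimCLR.

```python
import math


def cosine_lr(step, total_steps, base_lr):
    """Cosine decay from base_lr at step 0 to 0 at total_steps."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


def lars_local_lr(weights, grads, trust_coefficient=0.001, weight_decay=0.0):
    """Layer-wise LARS multiplier: eta * ||w|| / (||g|| + beta * ||w||)."""
    w_norm = math.sqrt(sum(w * w for w in weights))
    g_norm = math.sqrt(sum(g * g for g in grads))
    denominator = g_norm + weight_decay * w_norm
    if w_norm == 0.0 or denominator == 0.0:
        return 1.0  # fall back to the global learning rate
    return trust_coefficient * w_norm / denominator


print(cosine_lr(0, 100, 0.3), cosine_lr(50, 100, 0.3), cosine_lr(100, 100, 0.3))
print(lars_local_lr([3.0, 4.0], [0.3, 0.4]))
```

The per-layer multiplier keeps update sizes proportional to weight sizes, which is the property that makes very large batches such as 8,192 trainable.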
  • 47. Teacher-Student Teacher-Student is a kind of self-training framework ● Train a teacher on labelled data D ● Run the trained teacher on unlabelled examples D’ ● Train a new student on D’ ● Fine-tune the student on D Billion-scale semi-supervised learning for image classification
  • 49. Noisy Student ● Train a teacher on labelled data D ● Run the trained teacher on unlabelled examples D’ ● Train an equal-size or larger student on D & D’ with noise added to the student → Knowledge Expansion Self-training with Noisy Student improves ImageNet classification
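The second step above, running the teacher over unlabelled data to build a training set for the student, can be sketched as follows. This is a toy sketch: `teacher` is any callable returning class probabilities, and keeping only the `top_k` most confident examples per class is an assumption made here for illustration, not something stated on the slide.

```python
import math


def pseudo_label(teacher, unlabelled, top_k):
    """Label unlabelled examples with the teacher's most likely class.

    Only the top_k most confident examples of each predicted class are
    kept, which also keeps the pseudo-labelled set roughly balanced.
    """
    by_class = {}
    for x in unlabelled:
        probs = teacher(x)
        label = max(probs, key=probs.get)
        by_class.setdefault(label, []).append((probs[label], x))
    selected = []
    for label, scored in sorted(by_class.items()):
        scored.sort(key=lambda item: item[0], reverse=True)
        selected.extend((x, label) for _, x in scored[:top_k])
    return selected


def toy_teacher(x):
    """Hypothetical stand-in for a trained model: a 1-D logistic classifier."""
    p = 1.0 / (1.0 + math.exp(-x))
    return {"pos": p, "neg": 1.0 - p}


pseudo_labelled = pseudo_label(toy_teacher, [-3.0, -0.2, 0.1, 2.5], top_k=1)
print(pseudo_labelled)
```

The student would then be trained on `pseudo_labelled` and fine-tuned on the original labelled set D.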
  • 50. Noisy Student Data Noise ● RandAug → Local Smoothness Model Noise ● Dropout ● Stochastic Depth → Ensemble teacher Others ● Data Balancing Deep Networks with Stochastic Depth, RandAugment: Practical automated data augmentation with a reduced search space
  • 51. Noisy Student Sup. Semi-Sup. Performance Margin ● Arch. complexity ~ 10% ● Extra Data ~ 3 to 5%
  • 52. Semi-Supervised Training Scheme
       Method | Batch Size | Solver | Epoch | Unlabelled | Devices
       Teacher Student | 1536 (24 x 64) | SGD | 1x ? | IG-3B (1500 tags) | 64 GPUs
       Noisy Student | 512 ~ 2048 | SGD | 300 - 700 | JFT-300M (18291 categories) | Cloud TPU v3 2048 cores
       ImageNet: 14M images in total; the 1000-class ILSVRC-2012 subset has about 1.28M training images
  • 53. Affinity and Diversity Affinity and Diversity: Quantifying Mechanisms of Data Augmentation ● Affinity: Distribution shift caused by augmentation ● Diversity: Complexity of augmentation applied (Both are model-dependent measures) Increase the effective unique number of training data
  • 54. Affinity and Diversity Performance Boost Tricks ● Decaying learning rate on an appropriate schedule ● Turning off l2 regularization at the right time in training does not hurt performance ● Relaxing architectural constraints mid-training can boost final performance ● Turning augmentations off and fine-tuning on clean data can improve final test accuracy
  • 55. Conclusion Insights & Techniques ● Usage of Unlabelled or Pseudo Labelled Data ● Contrastive Loss Extracts Representative Features ● Distillation Squeezes Large-Scale Dataset ● Data Balance ● Dynamic Queue for Negative Samples ● Momentum Update for Encoder ● Non-linear Head for Representation Preservation ● Augmentation Composition ● Constraint Relaxation during Training
  • 56. References 1. Unsupervised Representation Learning by Predicting Image Rotations 2. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles 3. Tracking Emerges by Colorizing Videos 4. Conditional Image Generation with PixelCNN Decoders 5. Representation Learning with Contrastive Predictive Coding 6. Learning Deep Representations of Fine-grained Visual Descriptions 7. Data-Efficient Image Recognition with Contrastive Predictive Coding 8. Contrastive Multiview Coding 9. Revisiting Self-Supervised Visual Representation Learning 10. Momentum Contrast for Unsupervised Visual Representation Learning 11. Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination 12. A Simple Framework for Contrastive Learning of Visual Representations 13. Billion-scale semi-supervised learning for image classification 14. Self-training with Noisy Student improves ImageNet classification 15. Deep Networks with Stochastic Depth 16. RandAugment: Practical automated data augmentation with a reduced search space 17. Affinity and Diversity: Quantifying Mechanisms of Data Augmentation