Self-Supervised Learning (SSL)
Setia Juli Irzal Ismail
5 Nov 2024
Taxonomy: DL Learning Paradigms
• Supervised Learning
• Self-Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
AI today is mostly supervised learning
• Training a machine by showing examples instead of programming it
• When the output is wrong, tweak the parameters of the machine
LeCun, 2019
Weaknesses
1. Supervised Learning
• Labeling is difficult and expensive
• ImageNet, with 14 million images in 1,000 categories, took about 22 human-years to label
• Requires domain experts (e.g., medicine)
2. Unsupervised Learning
• Mostly clustering / dimensionality reduction
• Does it learn useful representations?
Reinforcement Learning
• RL works well for games
• Requires too much trial and error
• That is OK in a game
• It is not OK in the real world
• ~100 hours of play to reach the performance a human reaches in 15 minutes on Atari games
(Hessel et al., arXiv:1710.02298)
• The real world does not provide explicit rewards
SSL
1. Creates labels from the data itself
2. Generates useful representations from the data
3. Adapts to multiple data types & modalities (images, text, audio, video)
Analogy – how babies learn
Analogy – jigsaw puzzle
How does SSL work?
Raw data → learn from unlabeled data → transfer learning
SSL
Pretext task
• Learns from unlabeled data
• Captures patterns & features
• Produces a trained model
Downstream task
• Supervised learning
• Fine-tuning
• Classification
SSL
Pretext task
• Represents how data points relate to one another
• Big dataset
• Often a different dataset than the downstream task
Downstream task
• Train a linear classifier on frozen weights/features (see the linear-probe sketch below)
• Full fine-tuning
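A minimal sketch of the downstream linear-probe setup, assuming a PyTorch encoder that was already pretrained on a pretext task; the stand-in encoder, feature size, and class count are illustrative assumptions, not part of the slides.

```python
import torch
import torch.nn as nn

# Stand-in for an encoder pretrained on a pretext task (hypothetical here).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))

# Freeze the pretrained weights: only the linear head is trained downstream.
for p in encoder.parameters():
    p.requires_grad = False

linear_head = nn.Linear(128, 10)           # 10 downstream classes (assumed)
optimizer = torch.optim.SGD(linear_head.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# One supervised training step on a dummy labeled batch.
images = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, 10, (16,))
with torch.no_grad():                       # frozen features
    features = encoder(images)
logits = linear_head(features)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```

Full fine-tuning would instead leave the encoder parameters trainable and update them together with the head.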
SSL Taxonomy
1. Contrastive learning
2. Masked modeling/Predictive learning
3. Clustering
1. Contrastive Learning
• The model learns to distinguish between similar (positive) and dissimilar (negative) data samples
• Pushes similar pairs closer together
• Pushes dissimilar pairs farther apart
• The model learns features that are invariant to transformations
Chen, SimCLR, 2020
Contrastive learning
Positive vs. negative pairs: how is similarity measured?
• Cosine similarity; dot product
• The model computes a similarity score between the positive pair and each negative pair (see the sketch below)
Misra, PIRL, 2019
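A small sketch of that similarity computation using PyTorch's cosine similarity; the 128-dimensional random embeddings are placeholders.

```python
import torch
import torch.nn.functional as F

# Anchor, positive (an augmented view of the same image), and negative embeddings
anchor   = torch.randn(1, 128)
positive = torch.randn(1, 128)
negative = torch.randn(1, 128)

# Cosine similarity = dot product of L2-normalized vectors, in [-1, 1]
sim_pos = F.cosine_similarity(anchor, positive)   # training pushes this up
sim_neg = F.cosine_similarity(anchor, negative)   # training pushes this down

print(sim_pos.item(), sim_neg.item())
```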
Contrastive Learning components
1. Siamese Network
2. Augmentation (see the sketch below)
• Random cropping
• Rotation
• Color jittering
3. Loss Function
https://ubiai.tools/what-are-the-advantages-anddisadvantages-of-data-augmentation-2023-update/
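A minimal sketch of the augmentation step, assuming torchvision is available; the specific transform parameters are illustrative and not taken from any particular paper.

```python
from torchvision import transforms

# Two random augmented "views" of the same image form a positive pair.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),            # random cropping
    transforms.RandomRotation(degrees=15),        # rotation
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),   # color jittering
    transforms.ToTensor(),
])

# view_1 = augment(image); view_2 = augment(image)  # same image, two views
```

Each image is augmented twice; the two views of the same image form the positive pair, while views of other images serve as negatives.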
Contrastive loss function
• Maximize the similarity of positive pairs
• Minimize the similarity of negative pairs (see the loss sketch below)
• Examples: SimCLR, MoCo, SimSiam, BYOL
Chen, SimCLR, 2020
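A compact sketch of a SimCLR-style NT-Xent loss, written under the assumption that z1 and z2 are the projection-head outputs for two augmented views of the same batch; it is an illustration, not the exact code of any of the methods above.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss over N positive pairs (2N views)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.t() / temperature                          # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                      # exclude self-similarity
    # The positive for view i is the other augmented view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)  # dummy projections of 8 image pairs
print(nt_xent_loss(z1, z2))
```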
2. Masked Modeling / Predictive Learning
• The model learns to predict missing parts of the data
• The model learns complex relationships within the data
https://medium.com/@shaikhrayyan123/a-comprehensive-guide-to-understanding-bert-from-beginners-to-advanced-2379699e2b51
Colorization
Zhang, 2016, https://arxiv.org/abs/1603.08511
Jigsaw puzzle
Masked modeling components
• Data masking
• Predicting the masked parts
• Training objective
• Reconstruction loss, cross-entropy loss (see the sketch below)
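A minimal sketch of the masking and reconstruction objective (MAE-style), using random patch embeddings and an identity placeholder instead of a real encoder-decoder; the 75% masking ratio and tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

# A batch of 4 "images" represented as 16 patch embeddings of dimension 32 each.
patches = torch.randn(4, 16, 32)

# 1. Data masking: randomly hide ~75% of the patches.
mask = torch.rand(4, 16) < 0.75             # True = masked
corrupted = patches.clone()
corrupted[mask] = 0.0                        # replace masked patches with zeros

# 2. Predicting the masked parts: a real model would reconstruct the full
#    input from the corrupted one; here an identity stands in for model(corrupted).
reconstruction = corrupted

# 3. Training objective: reconstruction loss computed only on masked positions.
loss = F.mse_loss(reconstruction[mask], patches[mask])
print(loss)
```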
Examples
• BERT
• RoBERTa
• MAE
• BEiT
3. Clustering
• Grouping similar data into clusters
Misra, 2019
Clustering steps
1. Feature extraction
2. Clustering the embeddings
• k-means, KNN, Sinkhorn-Knopp
3. Self-training with pseudo-labels (see the sketch below)
4. Update cluster assignments
5. Model representations
Caron et al., SwAV, 2019
Morgado, AVID-CMA, https://github.com/facebookresearch/AVID-CMA
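A rough sketch of one DeepCluster-style iteration, assuming scikit-learn's KMeans for the clustering step and random vectors standing in for encoder features.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# 1. Feature extraction (random stand-ins for encoder embeddings of 256 images).
features = torch.randn(256, 64)

# 2. Cluster the embeddings; the cluster indices become pseudo-labels.
kmeans = KMeans(n_clusters=10, n_init=10).fit(features.numpy())
pseudo_labels = torch.tensor(kmeans.labels_, dtype=torch.long)

# 3. Self-training: a classification head is trained to predict the pseudo-labels.
head = nn.Linear(64, 10)
loss = nn.CrossEntropyLoss()(head(features), pseudo_labels)
loss.backward()

# 4. In the full method, the encoder is also updated and steps 1-3 repeat,
#    so cluster assignments are refreshed each epoch.
```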
Clustering examples
• DeepCluster
• SwAV (Swapping Assignments between Views)
• SCAN (Semantic Clustering by Adopting Nearest Neighbors)

Introduction to Self-Supervised Learning – machine learning course, STEI ITB