
streamingo.ai
Self Supervised Learning for Vision Tasks
1 July 2023
Video. Business Intelligence. Insights

Ways to Learn
• Supervised Learning
• Unsupervised Learning
• Self-Supervised Learning

Self-Supervised Learning
• The “dark matter of intelligence”
• Learns from unlabeled data
• Can match, and on some benchmarks surpass, models trained with supervised learning
• SSL works for text, image, video, audio and time-series data

Self-Supervised Learning

Transfer Learning and Fine-Tuning

Why Self-Supervised Learning?
• The learned representations can be reused for a variety of downstream tasks. For example, in NLP the downstream tasks could be summarization, translation or text generation
• In supervised learning, the task has to be defined beforehand
• Unsupervised methods such as clustering find structure in the data but do not, by themselves, learn a representation that transfers to new tasks

Different Types of Learning
• Pretext Learning
• Generative Learning
• Contrastive Learning
• Cross-Modal Appearance

Pretext Learning

Appearance Statistics Prediction
• The model is asked to predict or classify which appearance-modifying augmentation was applied
• Augmentations could be color changes, rotation or random noise
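A minimal sketch of the rotation variant of this pretext task, in plain Python: unlabeled frames (here 2-D lists of pixel values) are rotated by a random multiple of 90 degrees, and the number of rotations becomes the label the model must predict. The function names, the four-way rotation set and the list-of-rows frame format are illustrative assumptions, not details from the slides.

```python
import random

# Illustrative sketch: names and the 4-way rotation set are assumptions.


def rotate90(frame):
    """Rotate a 2-D frame (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*frame[::-1])]


def make_rotation_pretext(frames, seed=0):
    """Turn unlabeled frames into (rotated_frame, label) training pairs.

    The label k in {0, 1, 2, 3} is the number of clockwise 90-degree
    rotations applied, so the supervision comes from the augmentation
    itself rather than from human annotation.
    """
    rng = random.Random(seed)
    samples = []
    for frame in frames:
        k = rng.randrange(4)  # pretext label: 0, 90, 180 or 270 degrees
        rotated = frame
        for _ in range(k):
            rotated = rotate90(rotated)
        samples.append((rotated, k))
    return samples


frame = [[1, 2], [3, 4]]
print(rotate90(frame))  # [[3, 1], [4, 2]]
for rotated, label in make_rotation_pretext([frame] * 3):
    print(label, rotated)
```

Color or noise augmentations could be labeled the same way, with the augmentation type (or strength) as the class.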

Playback Speed
• Take clips of t frames from each video, selecting frames so that the playback speed is altered
• Frames are sampled at a stride set by the playback rate p, either speeding the video up or slowing it down; the model is trained to predict p
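A rough sketch of how such clips could be sampled, in plain Python, working with frame indices only. The candidate rates (0.5x, 1x, 2x, 4x), the clamping at the last frame and the function names are hypothetical choices; the slide does not fix them.

```python
import random

# Illustrative sketch: rates, clamping and names are assumptions.


def sample_clip(num_frames, t, rate, start=0):
    """Frame indices of a t-frame clip played back at `rate`.

    rate > 1 skips frames (the clip looks sped up); rate < 1 repeats
    frames (the clip looks slowed down). Indices are clamped to the
    last frame of the video.
    """
    return [min(int(start + i * rate), num_frames - 1) for i in range(t)]


def make_speed_pretext(num_frames, t, rates=(0.5, 1, 2, 4), seed=0):
    """Sample one clip at a random playback rate.

    Returns (frame_indices, label), where label indexes into `rates`;
    the model is trained to predict this label from the frames alone.
    """
    rng = random.Random(seed)
    label = rng.randrange(len(rates))
    rate = rates[label]
    max_start = max(0, num_frames - int(t * rate))  # keep the clip inside the video
    start = rng.randint(0, max_start)
    return sample_clip(num_frames, t, rate, start), label


print(sample_clip(100, 4, 2))    # [0, 2, 4, 6]
print(sample_clip(100, 4, 0.5))  # [0, 0, 1, 1]
```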

Temporal Order
• Each video V is split into clips of t frames
• Each set of clips contains a single clip in the correct order; the frames of the remaining clips are shuffled, and the model must identify the correctly ordered clip
• For example, (t2, t1, t3) is incorrect and (t1, t2, t3) is correct
• Also called odd-one-out learning
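The sampling procedure on this slide can be sketched in plain Python as follows. Frames are represented by their indices, and the names make_order_pretext, t and num_clips are hypothetical; a real pipeline would load the corresponding frames and feed all clips to a network trained to predict the index of the correctly ordered one.

```python
import random

# Illustrative sketch: function and parameter names are hypothetical.


def make_order_pretext(frames, t, num_clips, seed=0):
    """Build one temporal-order question from a video's frames.

    Returns (clips, answer): `num_clips` clips of t consecutive frames,
    where only clips[answer] keeps the original order and every other
    clip has its frames shuffled. Frames must be distinct and t >= 2,
    otherwise a shuffled clip could not differ from the original.
    """
    if t < 2:
        raise ValueError("t must be at least 2 to change the order")
    rng = random.Random(seed)
    clips = []
    for _ in range(num_clips):
        start = rng.randrange(len(frames) - t + 1)
        clips.append(list(frames[start:start + t]))
    answer = rng.randrange(num_clips)
    for i, clip in enumerate(clips):
        if i == answer:
            continue
        shuffled = clip[:]
        while shuffled == clip:  # reshuffle until the order actually changes
            rng.shuffle(shuffled)
        clips[i] = shuffled
    return clips, answer


clips, answer = make_order_pretext(list(range(30)), t=3, num_clips=4)
print(answer, clips)
```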

Video Jigsaw

Generative Learning

Generative Adversarial Networks

Frame Prediction
• Reconstructing motion or generating motion from RGB frames
• Uses optical flow as the motion signal
• A discriminator and a variational autoencoder (VAE) are used to measure the quality of the generated predictions
• Another approach is to create motion maps and then predict the next frame at various resolutions
• A reconstruction loss measures the quality of the reconstruction
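To make the reconstruction loss concrete, a minimal sketch of a loss computed at several resolutions, in plain Python. Mean squared error, a pyramid built by 2x2 average pooling and equal weighting of the levels are assumptions chosen for illustration. The predictor itself (optical flow, discriminator or VAE) is left out, and a copy-the-current-frame baseline stands in for it.

```python
# Illustrative sketch: MSE, 2x2 average pooling and equal level weights
# are assumptions, not details from the slides.


def downsample(frame):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * i][2 * j] + frame[2 * i][2 * j + 1]
              + frame[2 * i + 1][2 * j] + frame[2 * i + 1][2 * j + 1]) / 4
             for j in range(w)]
            for i in range(h)]


def mse(a, b):
    """Mean squared error between two equally sized 2-D frames."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def multiscale_reconstruction_loss(pred, target, levels=3):
    """Average MSE between predicted and true next frame over an image pyramid.

    Level 0 is full resolution; each further level halves height and width,
    so both frames need sides of at least 2 ** (levels - 1) pixels.
    """
    total = 0.0
    for level in range(levels):
        if level > 0:
            pred, target = downsample(pred), downsample(target)
        total += mse(pred, target)
    return total / levels


# Baseline "predictor": assume the next frame equals the current one.
current = [[0.0] * 4 for _ in range(4)]
next_frame = [[1.0] * 4 for _ in range(4)]
print(multiscale_reconstruction_loss(current, next_frame))  # 1.0
```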

Masked Autoencoders

Sampling

Video Masked Autoencoders (VideoMAE)

Masking in VideoMAE

Multimodal Masked Modeling
• First introduced in NLP as Masked Language Modeling (MLM)
• Bidirectional Encoder Representations from Transformers (BERT) was extended to the video domain (VideoBERT) by transforming raw visual data into a discrete sequence of tokens using hierarchical k-means
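A simplified sketch of the tokenize-then-mask idea, in plain Python. Visual feature vectors are assigned discrete token ids with a two-level hierarchical k-means (top-level clusters, then sub-clusters inside each), and a fraction of the tokens is then hidden as in BERT. The two levels, the 15% default masking ratio, always substituting the mask token (BERT also sometimes keeps or randomizes the chosen tokens) and the reserved id MASK = -1 are assumptions made for brevity.

```python
import random

# Illustrative sketch: two k-means levels, masking ratio and MASK id are assumptions.
MASK = -1  # hypothetical id reserved for the [MASK] token


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids


def hierarchical_tokenize(features, k, seed=0):
    """Two-level hierarchical k-means: token id = top_cluster * k + sub_cluster."""
    top = kmeans(features, k, seed=seed)
    members = [[] for _ in range(k)]
    for f in features:
        members[nearest(f, top)].append(f)
    subs = [kmeans(m, min(k, len(m)), seed=seed) if m else [] for m in members]
    tokens = []
    for f in features:
        t = nearest(f, top)
        tokens.append(t * k + nearest(f, subs[t]))
    return tokens


def mask_tokens(tokens, ratio=0.15, seed=0):
    """Hide a fraction of tokens; return (inputs, {position: original token})."""
    rng = random.Random(seed)
    n = max(1, round(len(tokens) * ratio))
    positions = set(rng.sample(range(len(tokens)), n))
    inputs = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return inputs, targets


features = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
tokens = hierarchical_tokenize(features, k=2)
inputs, targets = mask_tokens(tokens, ratio=0.25)
print(tokens, inputs, targets)
```

The masked positions in targets are what a BERT-style model would be trained to recover from the surrounding (visual and, in the multimodal case, text) tokens.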

Multimodal Masked Modeling

Contrastive Learning

View Augmentation
• Appearance is changed using augmentations such as random resized crop, channel drop, random color jitter, random greyscale and/or random rotation
• Positive pairs are augmented versions of the original clip
• Negative pairs are clips from other videos
• Popular approaches: SimCLR, BYOL and MoCo (BYOL uses positive pairs only, without negatives)
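The positive/negative pairing above is commonly trained with an InfoNCE-style contrastive loss, of which SimCLR and MoCo use variants. A minimal sketch for a single anchor, in plain Python, assuming the clip embeddings have already been computed by some encoder. The cosine similarity and the temperature of 0.1 are illustrative choices, not values from the slides.

```python
import math

# Illustrative sketch: similarity function and temperature are assumptions.


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor embedding.

    The positive is another augmented view of the same clip; the negatives
    are embeddings of clips from other videos. The loss is the negative
    log-probability that the positive wins a softmax over all candidates,
    so it falls as the anchor moves toward the positive and away from the
    negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


anchor = [1.0, 0.0]
view = [0.9, 0.1]                   # augmented view of the same clip
others = [[0.0, 1.0], [-1.0, 0.0]]  # clips from other videos
print(info_nce(anchor, view, others))                  # small loss
print(info_nce(anchor, others[0], [view, others[1]]))  # larger loss
```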

Simple Framework for Contrastive Learning (SimCLR)

Momentum Contrast (MoCo)

Bootstrap Your Own Latent (BYOL)

Temporal Augmentation
• Augmentation used to generate pairs by modifying the temporal order or the start and end of a clip interval
• Maximize a similarity function between two temporally adjacent frames in the same video
• Minimize the similarity between frames from other videos
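A sketch of how such temporal positive pairs might be sampled, in plain Python. Each video contributes an anchor interval and a positive interval shifted by a few frames, and the anchors of the other videos in the batch act as its negatives. Only the start/end shift is shown (not order modification), and the clip length of 8, the maximum shift of 4 and the dictionary layout are hypothetical.

```python
import random

# Illustrative sketch: clip length, shift range and batch layout are assumptions.


def sample_temporal_pair(video_len, clip_len, max_shift, rng):
    """Two [start, end) intervals from the same video, offset by 1..max_shift frames."""
    if video_len < clip_len + max_shift:
        raise ValueError("video too short for this clip length and shift")
    start = rng.randrange(video_len - clip_len - max_shift + 1)
    shift = rng.randint(1, max_shift)
    return (start, start + clip_len), (start + shift, start + shift + clip_len)


def build_batch(video_lengths, clip_len=8, max_shift=4, seed=0):
    """One positive pair per video; other videos' anchors are its negatives.

    A contrastive loss on these pairs would raise the similarity of
    anchor/positive embeddings and lower anchor/negative similarity.
    """
    rng = random.Random(seed)
    batch = []
    for vid, length in enumerate(video_lengths):
        anchor, positive = sample_temporal_pair(length, clip_len, max_shift, rng)
        batch.append({"video": vid, "anchor": anchor, "positive": positive})
    for item in batch:
        item["negatives"] = [(other["video"], other["anchor"])
                             for other in batch if other["video"] != item["video"]]
    return batch


for item in build_batch([40, 50, 60]):
    print(item)
```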

Spatio-Temporal Augmentation
• Inter-frame instance discrimination using a noise-contrastive estimation (NCE) loss for temporal elements
• Intra-frame instance discrimination using cross-entropy for spatial elements

Clustering

Cross-Modal Appearance

Cross-Modal Agreement

Downstream Tasks
• Action Recognition
• Temporal Action Segmentation
• Temporal Action Step Localization
• Video Retrieval
• Text-to-Video Retrieval
• Video Captioning

Thank You!
