streamingo.ai
Self Supervised Learning for Vision Tasks
1 July 2023
Video. Business Intelligence. Insights
Ways to Learn
- Supervised Learning
- Unsupervised Learning
- Self-supervised Learning
Self Supervised Learning
- Called the “dark matter of intelligence”
- Learns from unlabeled data
- Able to match or surpass models trained with a supervised approach
- Works for text, image, video, audio and time-series data
Transfer Learning and Fine Tuning
Why Self Supervised Learning
- Learned representations can be reused for a variety of tasks. For example, in NLP the downstream tasks could be summarization, translation or text generation.
- In supervised learning, the task has to be defined beforehand.
- Unsupervised learning doesn't learn the representation.
Different Types of Learning
- Pretext Learning
- Generative Learning
- Contrastive Learning
- Cross-Modal Appearance
Pretext Learning
Appearance Statistics Prediction
- The model is asked to predict or classify which appearance-modifying augmentation was applied
- Augmentations could be color, rotation or random noise
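As a rough illustration of this pretext task, the sketch below builds one training example for rotation prediction: a frame is rotated by a random number of quarter turns, and that number becomes the label the model must predict. The helper names (`rotate90`, `make_rotation_example`) and the choice of four rotation classes are assumptions made for illustration, not something the slides specify.

```python
import random

def rotate90(image, k):
    """Rotate a 2-D grid (list of rows) clockwise by k quarter turns."""
    for _ in range(k % 4):
        # Reverse the rows, then transpose: one clockwise quarter turn.
        image = [list(row) for row in zip(*image[::-1])]
    return image

def make_rotation_example(image, rng):
    """Return (rotated_image, label); the label is the number of quarter turns."""
    k = rng.randrange(4)
    return rotate90(image, k), k

rng = random.Random(0)
frame = [[1, 2],
         [3, 4]]
rotated, label = make_rotation_example(frame, rng)
```

No human annotation is needed here: the label comes for free from the augmentation itself, which is what makes this a self-supervised signal.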
Playback speed
- Take clips of t frames from each video, selecting frames so that the playback speed is altered.
- Sample every p-th frame, where p is the playback rate, to speed the video up or slow it down.
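A minimal sketch of how such clips might be sampled, assuming integer playback rates of 1 or more (speeding up only; slowing down would need frame repetition or interpolation, which is not shown). The function names and the idea of using the index of the rate as the classification label are illustrative assumptions.

```python
import random

def sample_clip(num_frames, t, rate, start=0):
    """Return t frame indices starting at `start`, taking every `rate`-th frame."""
    indices = [start + i * rate for i in range(t)]
    if indices[-1] >= num_frames:
        raise ValueError("video too short for this clip length and rate")
    return indices

def make_speed_example(num_frames, t, rates, rng):
    """Pick a playback rate at random; the label is its index in `rates`."""
    label = rng.randrange(len(rates))
    return sample_clip(num_frames, t, rates[label]), label

rng = random.Random(0)
clip_indices, label = make_speed_example(num_frames=100, t=8, rates=[1, 2, 4], rng=rng)
```

The model would then see the frames at `clip_indices` and be trained to predict `label`, that is, which playback rate produced the clip.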
Temporal Order
- Each video V is split into clips of t frames.
- Each set of clips contains a single clip in the correct order; the remaining clips are modified by shuffling the order.
- For example, (t1, t2, t3) is correct and (t2, t1, t3) is incorrect.
- Also called odd-one-out learning
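The set construction described above can be sketched as follows. This is an illustrative assumption of how the data might be prepared: `make_odd_one_out` and `shuffled_copy` are hypothetical helpers, and a clip is represented simply as a list of at least two frame identifiers.

```python
import random

def shuffled_copy(clip, rng):
    """Return a copy of `clip` whose frame order differs from the original."""
    if len(clip) < 2:
        raise ValueError("need at least two frames to change the order")
    order = list(range(len(clip)))
    identity = order[:]
    while order == identity:  # reshuffle until the order really changed
        rng.shuffle(order)
    return [clip[i] for i in order]

def make_odd_one_out(clip, num_choices, rng):
    """Return (candidates, label): one candidate keeps the correct order,
    the rest are shuffled; label is the index of the correctly ordered one."""
    candidates = [shuffled_copy(clip, rng) for _ in range(num_choices - 1)]
    label = rng.randrange(num_choices)
    candidates.insert(label, clip[:])
    return candidates, label

clip = ["t1", "t2", "t3"]
candidates, label = make_odd_one_out(clip, num_choices=4, rng=random.Random(0))
```

The model receives all candidates and is trained to pick out index `label`.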
Video Jigsaw
Generative Learning
Generative Adversarial Networks
Frame Prediction
- Reconstructing motion or generating motion from RGB frames
- Uses optical flow as the motion signal
- A discriminator and a variational autoencoder are used to measure the quality of the generated predictions
- Another approach is to create motion maps and then predict the next frame at various resolutions
- A reconstruction loss measures the quality of the reconstruction
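The slides do not say which reconstruction loss is used. As one common choice, and purely as an assumption, the sketch below computes the mean squared error between a predicted frame and the real next frame, with frames given as equal-sized 2-D lists of pixel values.

```python
def reconstruction_loss(predicted, target):
    """Mean squared error between two frames given as equal-sized 2-D lists."""
    total, count = 0.0, 0
    for pred_row, true_row in zip(predicted, target):
        for p, t in zip(pred_row, true_row):
            total += (p - t) ** 2
            count += 1
    return total / count

predicted_frame = [[0, 0], [0, 0]]
next_frame = [[0, 0], [0, 2]]
loss = reconstruction_loss(predicted_frame, next_frame)  # one pixel off by 2 -> 4 / 4 = 1.0
```

A perfect prediction gives a loss of 0, and the loss grows with the squared per-pixel error.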
Masked Auto Encoders
Sampling
Video Masked Auto Encoders (Video MAE)
Masking in Video MAE
Multimodal Masked Modeling
- First introduced in NLP as Masked Language Modeling (MLM)
- Bidirectional Encoder Representations from Transformers (BERT) was extended to the video domain by transforming raw visual data into a discrete sequence of tokens using hierarchical k-means
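Once the visual data has been turned into a sequence of discrete token ids, the masking step itself is simple. The sketch below is a simplified illustration: it assumes the tokens already exist (the k-means step is not shown), uses a hypothetical `MASK` id, and always replaces the chosen positions with the mask token, whereas BERT also sometimes keeps or randomizes them.

```python
import random

MASK = -1  # hypothetical id reserved for the mask token

def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of token ids with MASK.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original token id the model should predict.
    """
    num_masked = max(1, round(len(tokens) * mask_ratio))
    positions = rng.sample(range(len(tokens)), num_masked)
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets

visual_tokens = [5, 3, 8, 1, 9, 2, 7, 4]  # made-up cluster ids, for illustration only
masked, targets = mask_tokens(visual_tokens, mask_ratio=0.25, rng=random.Random(0))
```

Training then asks the model to recover `targets` from `masked`, using the unmasked tokens on both sides as context.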
Contrastive Learning
View Augmentation
- Change in appearance using augmentations such as random resized crop, channel drop, random color jitter, random grey and/or random rotation
- Positive pairs are augmented versions of the original clips
- Negative pairs are clips from other videos
- Popular approaches: SimCLR, BYOL and MoCo
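To make the positive/negative setup concrete, here is a minimal sketch of an InfoNCE-style contrastive loss for a single anchor embedding. It is an illustration under stated assumptions rather than any one method's exact objective: SimCLR and MoCo use losses of this family, while BYOL does not use negative pairs at all. The function names and the temperature value are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Negative log-softmax of the positive similarity over positive + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[0]

anchor = [1.0, 0.0]              # embedding of one augmented view
positive = [0.9, 0.1]            # embedding of another view of the same clip
negatives = [[0.0, 1.0], [-1.0, 0.0]]  # embeddings of clips from other videos
loss_good = info_nce(anchor, positive, negatives)
loss_bad = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

The loss is small when the anchor is closer to its positive than to every negative (`loss_good`) and large when a negative is the closest match (`loss_bad`).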
Simple Framework for Contrastive Learning (SimCLR)
Momentum Contrast (MoCo)
Bootstrap Your Own Latent (BYOL)
Temporal Augmentation
- Augmentations generate pairs by modifying the temporal order or the start and end of a clip interval
- Maximize a similarity function between two temporally adjacent frames in the same video
- Minimize similarity to frames from other videos
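One way the pairs described above might be drawn is sketched below. It assumes each video is known only by its id and frame count, and it treats frames at most `max_gap` apart as "temporally adjacent"; the function name and that parameter are illustrative assumptions.

```python
import random

def sample_temporal_triplet(videos, rng, max_gap=1):
    """Return (anchor, positive, negative) as (video_id, frame_index) tuples.

    `videos` maps a video id to its number of frames. The positive is a
    nearby frame from the anchor's video; the negative comes from another video.
    """
    anchor_id, negative_id = rng.sample(sorted(videos), 2)
    t = rng.randrange(videos[anchor_id] - max_gap)   # leave room for the gap
    gap = rng.randint(1, max_gap)
    negative_t = rng.randrange(videos[negative_id])
    return (anchor_id, t), (anchor_id, t + gap), (negative_id, negative_t)

videos = {"a": 10, "b": 8, "c": 12}  # made-up video ids and lengths
anchor, positive, negative = sample_temporal_triplet(videos, random.Random(0))
```

A contrastive loss would then pull the embeddings of `anchor` and `positive` together and push `negative` away.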
Spatio-Temporal Augmentation
- Inter-frame instance discrimination using NCE loss for temporal elements
- Intra-frame instance discrimination using cross-entropy for spatial elements
Clustering
Cross-Modal Appearance
Cross-Modal Agreement
Downstream Tasks
- Action Recognition
- Temporal Action Segmentation
- Temporal Action Step Localization
- Video Retrieval
- Text-to-Video Retrieval
- Video Captioning
Thank You!
