SlideShare a Scribd company logo
1 of 11
Survey on Self-Supervised Multimodal
Representation Learning and
Foundation Models
SUBMITTED BY
CHIRUDEEP GORLE
016682627
Introduction
Deep learning is currently a need for all contemporary intelligent systems,
particularly those that handle natural language and computer vision.
However, it is costly and impractical to annotate datasets for supervised
learning. To circumvent this issue, researchers have employed self-supervised
learning, in which models learn from data without direct human supervision.
This study looks at the development of self-supervised learning across various
verbal, visual, and auditory media.
Research questions
 A review of the research on representation learning for many modalities,
including language, vision, audio, and robotics, is included in the study's
first section. The study of these modalities combined is then looked at.
This study addresses the goal of self-supervised representations, the
application of diverse learning algorithms for representations across
several modalities, and the benefits of combining these modalities to
create more effective AI agents.
Language
The method of developing statistical models that record the joint probability
distribution of word combinations in sentences and utilize that data to define
languages. The solution to the dimensionality problem is provided by the
work, which trains word representations using neighboring words. The
statistical approach represents words and language as conditional
probabilities of the subsequent word given all the preceding words.
Image
Self-supervised techniques have also been used to learn about visual
representation in the field of computer vision. Similar to NLP, picture
representation learning involves self-supervision through pretext tasks that
generate fictional labels for bogus prediction problems. These pretext
problems allow models to learn representations without explicit supervision.
Audio
Comparable progress has been made in the learning of voice representations
using techniques influenced by BERT and Transformers. Speech features may
be encoded using spectrograms as speech is a waveform of signals. These
speech features are processed by an audio encoder, which then learns an
embedding, represented by the letter "z." The speech features produced by
the decoder using 'z' as input allow for the reconstruction of the original
characteristics. This generative self-supervised learning technique is made
possible by the use of a reconstruction loss, which encourages the
development of rich speech representations.
Robotics
Robotics has been influenced by recent advances in image representation
learning, such as MoCo and SwAV. These techniques are employed in robotics
and reinforcement learning to offer extra incentives when the main RL
environment only offers occasional rewards. By generating self-supervised
additional incentives, these techniques make it possible to build RL agents
that are incredibly successful at robotics tasks. When used in RL scenarios like
games and simulated work, SwAV has shown promising results. This flexibility
helps researchers in robotics to overcome the issue of insufficient incentives
and improve the performance of RL agents.
Multimodal
Studies have focused on combining two or more modalities to address issues
including photo captioning, visual question answering, and music
composition. Combining textual and visual information is necessary for these
positions. Many methods have been developed that enable the learning of
cross-modal embeddings that capture the information from both domains in
order to encode and mix various modalities. In the Vilbert model, for instance,
the mask language model predicts the missing information from both visual
and textual representations, adapting the BERT architecture to handle
multimodal inputs.
Research opportunities
In terms of research, multimodal learning has gone a long way, but there is
still plenty to discover. One such area is the development of learnable
augmentation systems that can go beyond the limitations of hand-made
augmentations. Eliminating trial and error and biases can promote
multimodal learning. Making more effective pretext assignments for teaching
rich feature representations is another possible solution. Transformer-based
models, which are often used in multimodal learning, must be improved in
terms of setup and usefulness.
Conclusion
In-depth analyses of existing and successful self-supervised representation
learning methods in robotics, audio, vision, NLP, and multimodal applications
are presented in this study. It investigates the evolution of these algorithms,
from their use in subsequent applications through their learning of
embedding representations. The positive results of multimodal representation
learning are highlighted. The research also addresses the approaches' current
limitations and offers potential future developments.
Thank You

More Related Content

Similar to CMPE258 Short story.pptx

Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Textkevig
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Textkevig
 
Crafting Your Customized Legal Mastery: A Guide to Building Your Private LLM
Crafting Your Customized Legal Mastery: A Guide to Building Your Private LLMCrafting Your Customized Legal Mastery: A Guide to Building Your Private LLM
Crafting Your Customized Legal Mastery: A Guide to Building Your Private LLMChristopherTHyatt
 
A neural probabilistic language model
A neural probabilistic language modelA neural probabilistic language model
A neural probabilistic language modelc sharada
 
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...IJCI JOURNAL
 
Landscape of AI/ML in 2023
Landscape of AI/ML in 2023Landscape of AI/ML in 2023
Landscape of AI/ML in 2023HyunJoon Jung
 
VIDEO OBJECTS DESCRIPTION IN HINDI TEXT LANGUAGE
VIDEO OBJECTS DESCRIPTION IN HINDI TEXT LANGUAGE VIDEO OBJECTS DESCRIPTION IN HINDI TEXT LANGUAGE
VIDEO OBJECTS DESCRIPTION IN HINDI TEXT LANGUAGE ijmpict
 
Analysis of the evolution of advanced transformer-based language models: Expe...
Analysis of the evolution of advanced transformer-based language models: Expe...Analysis of the evolution of advanced transformer-based language models: Expe...
Analysis of the evolution of advanced transformer-based language models: Expe...IAESIJAI
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentationSurya Sg
 
SENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORK
SENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORKSENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORK
SENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORKijnlc
 
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural NetworkSentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Networkkevig
 
A Flowchart-based Programming Environment for Improving Problem Solving Skill...
A Flowchart-based Programming Environment for Improving Problem Solving Skill...A Flowchart-based Programming Environment for Improving Problem Solving Skill...
A Flowchart-based Programming Environment for Improving Problem Solving Skill...Cynthia Velynne
 
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READING
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READINGLIP READING: VISUAL SPEECH RECOGNITION USING LIP READING
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READINGIRJET Journal
 
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...IRJET Journal
 
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...ijtsrd
 
A Survey on Speech Recognition with Language Specification
A Survey on Speech Recognition with Language SpecificationA Survey on Speech Recognition with Language Specification
A Survey on Speech Recognition with Language Specificationijtsrd
 

Similar to CMPE258 Short story.pptx (20)

Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
 
Crafting Your Customized Legal Mastery: A Guide to Building Your Private LLM
Crafting Your Customized Legal Mastery: A Guide to Building Your Private LLMCrafting Your Customized Legal Mastery: A Guide to Building Your Private LLM
Crafting Your Customized Legal Mastery: A Guide to Building Your Private LLM
 
A neural probabilistic language model
A neural probabilistic language modelA neural probabilistic language model
A neural probabilistic language model
 
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
 
Icwl2015 wahl
Icwl2015 wahlIcwl2015 wahl
Icwl2015 wahl
 
Language Modeling.docx
Language Modeling.docxLanguage Modeling.docx
Language Modeling.docx
 
Landscape of AI/ML in 2023
Landscape of AI/ML in 2023Landscape of AI/ML in 2023
Landscape of AI/ML in 2023
 
Multi-modal NLP Systems in Healthcare
Multi-modal NLP Systems in HealthcareMulti-modal NLP Systems in Healthcare
Multi-modal NLP Systems in Healthcare
 
VIDEO OBJECTS DESCRIPTION IN HINDI TEXT LANGUAGE
VIDEO OBJECTS DESCRIPTION IN HINDI TEXT LANGUAGE VIDEO OBJECTS DESCRIPTION IN HINDI TEXT LANGUAGE
VIDEO OBJECTS DESCRIPTION IN HINDI TEXT LANGUAGE
 
Analysis of the evolution of advanced transformer-based language models: Expe...
Analysis of the evolution of advanced transformer-based language models: Expe...Analysis of the evolution of advanced transformer-based language models: Expe...
Analysis of the evolution of advanced transformer-based language models: Expe...
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentation
 
[IJET-V1I5P9] Author: Prutha Gandhi, Dhanashri Dalvi, Pallavi Gaikwad, Shubha...
[IJET-V1I5P9] Author: Prutha Gandhi, Dhanashri Dalvi, Pallavi Gaikwad, Shubha...[IJET-V1I5P9] Author: Prutha Gandhi, Dhanashri Dalvi, Pallavi Gaikwad, Shubha...
[IJET-V1I5P9] Author: Prutha Gandhi, Dhanashri Dalvi, Pallavi Gaikwad, Shubha...
 
SENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORK
SENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORKSENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORK
SENTIMENT ANALYSIS IN MYANMAR LANGUAGE USING CONVOLUTIONAL LSTM NEURAL NETWORK
 
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural NetworkSentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
 
A Flowchart-based Programming Environment for Improving Problem Solving Skill...
A Flowchart-based Programming Environment for Improving Problem Solving Skill...A Flowchart-based Programming Environment for Improving Problem Solving Skill...
A Flowchart-based Programming Environment for Improving Problem Solving Skill...
 
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READING
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READINGLIP READING: VISUAL SPEECH RECOGNITION USING LIP READING
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READING
 
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
 
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
 
A Survey on Speech Recognition with Language Specification
A Survey on Speech Recognition with Language SpecificationA Survey on Speech Recognition with Language Specification
A Survey on Speech Recognition with Language Specification
 

Recently uploaded

Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 

Recently uploaded (20)

Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 

CMPE258 Short story.pptx

  • 1. Survey on Self-Supervised Multimodal Representation Learning and Foundation Models SUBMITTED BY CHIRUDEEP GORLE 016682627
  • 2. Introduction Deep learning is currently a need for all contemporary intelligent systems, particularly those that handle natural language and computer vision. However, it is costly and impractical to annotate datasets for supervised learning. To circumvent this issue, researchers have employed self-supervised learning, in which models learn from data without direct human supervision. This study looks at the development of self-supervised learning across various verbal, visual, and auditory media.
  • 3. Research questions  A review of the research on representation learning for many modalities, including language, vision, audio, and robotics, is included in the study's first section. The study of these modalities combined is then looked at. This study addresses the goal of self-supervised representations, the application of diverse learning algorithms for representations across several modalities, and the benefits of combining these modalities to create more effective AI agents.
  • 4. Language The method of developing statistical models that record the joint probability distribution of word combinations in sentences and utilize that data to define languages. The solution to the dimensionality problem is provided by the work, which trains word representations using neighboring words. The statistical approach represents words and language as conditional probabilities of the subsequent word given all the preceding words.
  • 5. Image Self-supervised techniques have also been used to learn about visual representation in the field of computer vision. Similar to NLP, picture representation learning involves self-supervision through pretext tasks that generate fictional labels for bogus prediction problems. These pretext problems allow models to learn representations without explicit supervision.
  • 6. Audio Comparable progress has been made in the learning of voice representations using techniques influenced by BERT and Transformers. Speech features may be encoded using spectrograms as speech is a waveform of signals. These speech features are processed by an audio encoder, which then learns an embedding, represented by the letter "z." The speech features produced by the decoder using 'z' as input allow for the reconstruction of the original characteristics. This generative self-supervised learning technique is made possible by the use of a reconstruction loss, which encourages the development of rich speech representations.
  • 7. Robotics Robotics has been influenced by recent advances in image representation learning, such as MoCo and SwAV. These techniques are employed in robotics and reinforcement learning to offer extra incentives when the main RL environment only offers occasional rewards. By generating self-supervised additional incentives, these techniques make it possible to build RL agents that are incredibly successful at robotics tasks. When used in RL scenarios like games and simulated work, SwAV has shown promising results. This flexibility helps researchers in robotics to overcome the issue of insufficient incentives and improve the performance of RL agents.
  • 8. Multimodal Studies have focused on combining two or more modalities to address issues including photo captioning, visual question answering, and music composition. Combining textual and visual information is necessary for these positions. Many methods have been developed that enable the learning of cross-modal embeddings that capture the information from both domains in order to encode and mix various modalities. In the Vilbert model, for instance, the mask language model predicts the missing information from both visual and textual representations, adapting the BERT architecture to handle multimodal inputs.
  • 9. Research opportunities In terms of research, multimodal learning has gone a long way, but there is still plenty to discover. One such area is the development of learnable augmentation systems that can go beyond the limitations of hand-made augmentations. Eliminating trial and error and biases can promote multimodal learning. Making more effective pretext assignments for teaching rich feature representations is another possible solution. Transformer-based models, which are often used in multimodal learning, must be improved in terms of setup and usefulness.
  • 10. Conclusion In-depth analyses of existing and successful self-supervised representation learning methods in robotics, audio, vision, NLP, and multimodal applications are presented in this study. It investigates the evolution of these algorithms, from their use in subsequent applications through their learning of embedding representations. The positive results of multimodal representation learning are highlighted. The research also addresses the approaches' current limitations and offers potential future developments.