Fuse and Adapt: Investigating the Use of Pre-Trained
Self-Supervised Learning Models in Limited Data NLU
Problems
Shamane Siriwardhana
Supervised by Associate Professor Suranga Nanayakkara, Professor Mark Billinghurst & Dr Elliott Wen
Deep Learning and the Cake Analogy 2.0
(Yann LeCun, Chief AI Scientist at Meta)
● Cherry: Reinforcement Learning
● Icing: Supervised Learning
● Cake: Self-Supervised Learning
● Is Self-Supervised Learning the future of AI?
Self-Supervised Learning (SSL)
● 72K GitHub stars since 2020
● 50K citations since 2019
● $5M to train a single model
● Removes the need for humans to label data.
● Models can use context that is naturally available in the data as labels.
SSL workflow
Pretext task, e.g., masked language modeling
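The masked language modeling pretext task shows how SSL turns raw text into its own labels. The sketch below is a minimal illustration, assuming whitespace tokens and a single replace-with-`[MASK]` rule; BERT-style models also swap some selected tokens for random or unchanged tokens, which is omitted here. `mask_tokens` is a hypothetical helper, not part of any library.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Turn unlabelled tokens into an MLM training pair.

    A random subset of positions is replaced by MASK in `inputs`; `labels`
    holds the original token at those positions and None elsewhere, so the
    text itself provides the supervision.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


sentence = "models can use naturally available context as labels".split()
inputs, labels = mask_tokens(sentence, mask_prob=0.3, seed=1)
print(inputs)
print(labels)
```

The model is then trained to predict `labels` at the masked positions only; no human annotation is involved.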
High Availability of Pre-trained SSL Models
Pioneers in pre-training
Open-sourced model checkpoints
The Focus of the Thesis: Utilization of Pre-trained SSL Models
Research Questions
Fusion of multimodal pre-trained SSL features and models
RQ1: How to fuse multimodal features extracted from frozen pre-trained SSL models?
RQ2: How to fuse two pre-trained transformer-based architectures in multimodal settings?
Domain adaptation of pre-trained SSL models with fine-tuning mechanisms
RQ3: How to adapt a generative pre-trained SSL model when no high-quality training data is available?
RQ4: How to adapt compound neural architectures that consist of several pre-trained SSL models to a new domain?
Fusion of Multimodal Features Extracted from Three Different Frozen
Pre-trained SSL Models
Frozen models
Multimodal emotion recognition:
● Data is challenging to collect and annotate
RQ1
Multimodal Frozen SSL Networks
● Fabnet - Video
○ Convolution-based architecture
○ Vector size: 256
○ Seq-len: frames in the video
● Wav2Vec - Speech
○ Temporal convolutions (tc)
○ Vector size: 512
○ Seq-len: strides in tc
● RoBERTa - Text
○ Transformer
○ Vector size: 1024
○ Seq-len: number of words
RQ1
SSL-Embedding Fusion Transformer
Ablation Studies on CMU-MOSEI dataset
Model comparisons
Proposed transformer-based fusion
Multimodal Emotion Recognition With Transformer-Based Self Supervised Feature Fusion (Siriwardhana et al., 2020)
RQ1
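Transformer-based fusion of frozen SSL features can be sketched in a few lines: project each modality's sequence to a shared width, concatenate along the sequence axis, and let self-attention mix tokens across modalities. This is a minimal sketch, not the architecture evaluated in the thesis. It assumes a single attention head, random projections standing in for learned weights, and made-up sequence lengths; only the feature widths (256, 512, 1024) come from the slides, and all function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def random_projection(d_in, d_out, rng):
    """Random (d_in x d_out) matrix; in a real model these weights are learned."""
    scale = 1.0 / math.sqrt(d_in)
    return [[rng.gauss(0.0, scale) for _ in range(d_out)] for _ in range(d_in)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention with Q = K = V = x."""
    d = len(x[0])
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in x] for qi in x]
    weights = [softmax(row) for row in scores]
    return matmul(weights, x)


def fuse(modalities, d_model, rng):
    """Project each frozen-model sequence to d_model, concatenate along the
    sequence axis, apply self-attention, and mean-pool to one vector."""
    tokens = []
    for seq in modalities:
        proj = random_projection(len(seq[0]), d_model, rng)
        tokens.extend(matmul(seq, proj))
    attended = self_attention(tokens)
    return [sum(col) / len(attended) for col in zip(*attended)]


# Toy "frozen" features: (sequence length x feature width) per modality.
rng = random.Random(0)
video = [[rng.random() for _ in range(256)] for _ in range(4)]    # Fabnet-like
speech = [[rng.random() for _ in range(512)] for _ in range(6)]   # Wav2Vec-like
text = [[rng.random() for _ in range(1024)] for _ in range(5)]    # RoBERTa-like
fused = fuse([video, speech, text], d_model=16, rng=rng)
print(len(fused))  # 16
```

Because the SSL backbones stay frozen, only the projections and the fusion layers (plus a classifier head, omitted here) would be trained.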
Some Findings
❖ Dense SSL features extracted from different SSL models have robust
representational capabilities.
➢ They can be fused with transformer-based fusion mechanisms
➢ Self-attention plays an important role when combining sequential
embeddings
❖ Fusing features while keeping the pre-trained models frozen is important
➢ Especially when the pre-trained models have a vast number of parameters
➢ E.g., GPT-3 has 175 billion parameters.
RQ1
Findings related to RQ1 were published as a journal paper in IEEE Access 2020. S. Siriwardhana, T. Kaluarachchi, M. Billinghurst and S. Nanayakkara,
"Multimodal Emotion Recognition With Transformer-Based Self Supervised Feature Fusion," in IEEE Access, vol. 8, pp. 176274-176285, 2020, doi:
10.1109/ACCESS.2020.3026823.
Impact Factor - 3.367
Fusion of Two Transformer Architectures in Multimodal Settings
● Represent different modalities with transformer-based pre-trained models
● Utilize architectural properties in the fusion
RQ2
Transformer models
● RoBERTa (unfrozen) - Text
○ Transformer
○ Vector size: 1024
○ Seq-len: number of words
● Speech-BERT (unfrozen) - Speech
○ Transformer
○ Vector size: 1024
○ Seq-len: sampling frequency
RQ2
Shallow vs Co-attentional fusion
Shallow Fusion Co-attentional fusion
Fusion mechanisms
Model comparisons
Ablation studies
Jointly Fine-Tuning “BERT-like” Self Supervised Models to Improve Multimodal Speech Emotion Recognition (Siriwardhana et al., 2020)
RQ2
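The two fusion styles compared for RQ2 can be contrasted in a few lines. This sketch assumes that the first token of each sequence plays the [CLS] role, uses toy 4-dimensional vectors in place of the 1024-dimensional RoBERTa and Speech-BERT outputs, and omits the learned query/key/value projections and the classifier head; the function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def shallow_fusion(text_seq, speech_seq):
    """Shallow fusion: concatenate the [CLS] vectors (position 0) of both
    encoders; a trained classifier head would sit on top of the result."""
    return text_seq[0] + speech_seq[0]


def co_attention(queries, keys_values):
    """Cross-modal attention: every token of one modality attends over all
    tokens of the other modality (no learned projections in this sketch)."""
    d = len(queries[0])
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        fused.append([sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)])
    return fused


# Toy 4-dimensional token vectors standing in for 1024-dimensional outputs.
text_tokens = [[0.1, 0.3, 0.5, 0.7], [0.2, 0.4, 0.6, 0.8]]
speech_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

cls_fused = shallow_fusion(text_tokens, speech_tokens)      # 8 values
text_to_speech = co_attention(text_tokens, speech_tokens)   # 2 rows of 4
speech_to_text = co_attention(speech_tokens, text_tokens)   # 3 rows of 4
```

Shallow fusion relies only on the [CLS] convention both models already share, while co-attention lets each modality's tokens attend to the other's; in either case both encoders would be fine-tuned jointly.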
Some Findings
❖ Pre-trained SSL models with transformer-based architectures can be fused easily
➢ By employing unique properties such as the [CLS] token
➢ Shallow fusion
❖ Transformer-based SSL models can be fine-tuned stably even with small amounts of data
➢ Fine-tuning is stable with lower learning rates
❖ The transformer architecture is becoming increasingly ubiquitous in self-supervised learning
➢ Transformer-based models represent different data modalities
RQ2
Findings related to RQ2 were presented as a full conference paper at Interspeech 2020. Siriwardhana S, Reis A, Weerasakera R, Nanayakkara S. “Jointly
Fine-Tuning BERT-like Self Supervised Models to Improve Multimodal Speech Emotion Recognition.” Proceedings of the Annual Conference of the
International Speech Communication Association, INTERSPEECH. Vol. 2020.
H index - 100
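The RQ2 finding that transformer-based SSL models fine-tune stably with lower learning rates is often paired in practice with a warmup-then-decay schedule. A minimal sketch of such a schedule follows; the peak rate (1e-5), the warmup length, and the function name are illustrative assumptions, not the settings used in the thesis.

```python
def linear_warmup_decay(step, total_steps, peak_lr=1e-5, warmup_steps=100):
    """Learning rate at `step`: rises linearly from 0 to peak_lr over
    warmup_steps, then decays linearly back to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Sample the schedule every 250 steps of a 1000-step fine-tuning run.
schedule = [linear_warmup_decay(s, total_steps=1000) for s in range(0, 1001, 250)]
print(schedule)
```

The warmup keeps the first updates small while the randomly initialised fusion head settles, which helps avoid disturbing the pre-trained weights early on.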
Domain adaptation of a generative BART model when high-quality training data is missing
Autobiographical Text Summarization
● Privacy issues
● Different language patterns
● Scarcity of records and gold-standard summaries
● BART transformer: generates text
● Works well on generation benchmarks
● Sequence-to-sequence architecture
RQ3
Utilization of Reddit data and high-quality news data
● News summarization
○ Fundamental task
○ Gold-standard datasets
● Reddit (thread → title)
○ Dataset closely related to the domain
○ Titles only
RQ3
Mix Distribution Multitask Learning
● Fine-tuning BART for autobiographical summarization with:
○ Domain-specific weakly labeled dataset
○ Task-specific dataset with gold-standard labels
Model Comparison Factual consistency (FactCC)
Abstractive Summarization System for Autobiographical Text (Siriwardhana et al., 2022)
RQ3
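Mix distribution multitask learning draws training examples from two sources: the weakly labelled, domain-specific set and the gold-standard task set. A minimal sampler sketch follows, assuming a fixed per-example mixing probability (`weak_fraction`, a hypothetical parameter) and placeholder data; the thesis may schedule or weight the two datasets differently.

```python
import random


def mixed_batches(weak_domain, gold_task, batch_size, weak_fraction=0.5, seed=0):
    """Yield batches that mix two training distributions.

    Each example comes from the weakly labelled, domain-specific set with
    probability `weak_fraction`, otherwise from the gold-standard task set.
    """
    rng = random.Random(seed)
    while True:
        batch = []
        for _ in range(batch_size):
            source = weak_domain if rng.random() < weak_fraction else gold_task
            batch.append(rng.choice(source))
        yield batch


# Placeholder (source text, target) pairs; real data would be Reddit
# threads with their titles and news articles with gold summaries.
reddit = [(f"reddit_post_{i}", f"title_{i}") for i in range(100)]
news = [(f"news_article_{i}", f"summary_{i}") for i in range(100)]

batches = mixed_batches(reddit, news, batch_size=8)
first_batch = next(batches)
```

Each batch would then be fed to the same BART model with an ordinary sequence-to-sequence loss, so one set of weights learns from both distributions.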
Human Studies
Abstractive Summarization System for Autobiographical Text (Siriwardhana et al., 2022)
Model comparison with MTurk participants
SummarizeMe (digital diary): user study conducted with 75 users
RQ3
Some of the Findings
❖ SSL models like BART have strong language generation capabilities
➢ Such models have seen a large amount of data during pre-training
➢ BART-like models can perform well even without high-quality data
❖ Data-centric approaches are crucial when adapting models to tasks like autobiographical text summarization
➢ Designing better mechanisms to make use of available domain-specific data
❖ Human studies are essential for evaluating generative models
Findings related to RQ3 have been submitted as a journal paper to ISRE 2022. Siriwardhana S, Kaluarachchi T, Chithralekha G, Scholl P, Dissanayake V,
Nanayakkara S. “SummarizeMe: Abstractive Summarization System for Autobiographical Text.” Proceedings of the Information System Research (ISR)
2022 [Under review]
RQ3
Domain Adaptation of Compound Neural Architectures with Several Pre-trained SSL Models
● Retrieval-Augmented Generation (RAG) model (Meta)
○ Combines information retrieval with seq2seq generation
○ DPR neural retriever and BART generator
● Open-Domain Question Answering (ODQA)
○ Works well for Wikipedia-based knowledge bases
○ Little prior work on domain adaptation of ODQA
RQ4
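How RAG combines its two pre-trained components can be illustrated with a toy calculation. The sketch assumes the RAG-sequence formulation, p(y|x) ≈ Σ_z p(z|x) · p(y|x, z) over the top-k retrieved passages z, with p(z|x) a softmax over DPR-style inner-product scores. The embeddings and generator probabilities are made-up numbers, not outputs of DPR or BART, and the helper names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec, passage_vecs, k):
    """Return (index, score) for the k passages with the highest inner product."""
    scores = [dot(query_vec, p) for p in passage_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


def rag_sequence_prob(query_vec, passage_vecs, generator_prob, k=2):
    """p(y|x) ~= sum over top-k passages z of p(z|x) * p(y|x, z), where
    p(z|x) is a softmax over the top-k retrieval scores."""
    top = retrieve(query_vec, passage_vecs, k)
    m = max(score for _, score in top)
    exps = [math.exp(score - m) for _, score in top]
    total = sum(exps)
    return sum((e / total) * generator_prob(i) for (i, _), e in zip(top, exps))


query = [1.0, 0.0]
passages = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Made-up probability that the generator produces the answer given passage i.
gen_probs = [0.9, 0.1, 0.4]
p_answer = rag_sequence_prob(query, passages, lambda i: gen_probs[i], k=2)
print(round(p_answer, 3))  # 0.711
```

Because the answer probability depends on the retrieval distribution, a retriever trained only on Wikipedia can pull weak passages for a new domain, which is why the retriever matters for domain adaptation.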
Domain Adaptation of the RAG Model
RQ4
RAG-end2end and Introduction of an Auxiliary Signal
Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering (Siriwardhana et al., 2022)
End-to-End RAG retriever Reconstruction auxiliary signal
RQ4
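One way to picture the reconstruction auxiliary signal is as extra training examples mixed in with the QA pairs: a domain sentence serves as the retrieval query and the generator must reproduce it. This sketch assumes a hypothetical control token `<p>` and assumes the sentence is withheld from the generator, so it can only be rebuilt from the retrieved passages; the class, field, and function names are illustrative, not the paper's code.

```python
from dataclasses import dataclass

RECON_TOKEN = "<p>"  # hypothetical control token marking reconstruction examples


@dataclass
class Example:
    retriever_query: str   # text encoded by the question encoder for retrieval
    generator_prefix: str  # text the generator sees alongside retrieved passages
    target: str            # text the generator is trained to produce


def qa_example(question, answer):
    return Example(retriever_query=question, generator_prefix=question, target=answer)


def reconstruction_example(sentence):
    # Assumption: the sentence drives retrieval but is withheld from the
    # generator, so it can only be rebuilt from the retrieved passages.
    return Example(retriever_query=sentence, generator_prefix=RECON_TOKEN, target=sentence)


def build_training_set(qa_pairs, domain_sentences):
    """Mix ordinary QA examples with reconstruction (auxiliary) examples."""
    examples = [qa_example(q, a) for q, a in qa_pairs]
    examples += [reconstruction_example(s) for s in domain_sentences]
    return examples


train = build_training_set(
    qa_pairs=[("example domain question?", "example answer")],
    domain_sentences=["An example sentence from the domain corpus."],
)
for ex in train:
    print(ex)
```

The reconstruction examples need no human labels, since any sentence from the domain corpus can serve as its own target.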
End-to-end retriever training improves domain adaptation
RAG-end2end and auxiliary signals can improve the overall results
Empowering further research in the paradigm of retrieval augmentation
RQ4
Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering (Siriwardhana et al., 2022)
Some of the Findings
❖ Different SSL pre-trained models can be combined to create effective RAG-like pipelines.
❖ Retrieval models play a vital role in the domain adaptation of RAG.
➢ Neural retrieval models like DPR benefit from domain-specific fine-tuning, since they are mainly trained on Wikipedia-based data.
❖ Auxiliary signals can improve the domain adaptation process.
➢ A way to address the scarcity of domain-specific labeled data.
Findings related to RQ4 have been accepted as a journal paper in TACL 2022. Siriwardhana S, Weerasakera R, Kaluarachchi T, Wen E, Rana R, Nanayakkara S.
“Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open-Domain Question-Answering.” Transactions of the Association
for Computational Linguistics (TACL) 2022 (to be presented at EMNLP 2022)
Impact Factor - 9.194
RQ4
Summary of the thesis
Future Work
❖ Model compression techniques are important
➢ SSL model checkpoints are large
➢ Model pruning and knowledge distillation
➢ Can support recent paradigms such as federated learning
❖ Human-centric model evaluation is gaining importance
➢ Hallucination and factual consistency are significant research areas
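Knowledge distillation, one of the compression directions listed above, trains a small student model to match a large teacher. Below is a minimal sketch of the standard soft-target loss from Hinton et al. (2015), assuming logits are given as plain lists and a temperature of 2.0 chosen only for illustration; the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions,
    scaled by T**2 (as suggested by Hinton et al., 2015) to keep its gradient
    scale comparable when combined with a hard-label loss."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


teacher = [4.0, 1.0, -2.0]  # e.g. logits from a large SSL model
student = [2.5, 1.5, -1.0]  # logits from a smaller model being trained
print(distillation_loss(student, teacher))
```

In practice this term is usually added to the ordinary cross-entropy on the true labels, so the student learns from both the data and the teacher's softened predictions.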
Future Work
❖ Retrieval augmentation could play a significant role in the field of AI
➢ Could be a solution to the cost of billion-dollar large pre-trained models
➢ Are we getting closer to human-like intelligence?
Retrieval augmentation lets models go beyond a parametric memory (Source: RETRO, DeepMind (2022))
Directly related publications
● Siriwardhana S, Kaluarachchi T, Billinghurst M, Nanayakkara S. “Multimodal Emotion Recognition With Transformer-Based Self Supervised Feature Fusion.” IEEE Access, 2020.
● Siriwardhana S, Reis A, Weerasakera R, Nanayakkara S. “Jointly Fine-Tuning BERT-like Self Supervised Models to Improve Multimodal Speech Emotion Recognition.” Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2020.
● Siriwardhana S, Kaluarachchi T, Scholl P, Dissanayake V, Nanayakkara S. “SummarizeMe: Abstractive Summarization System for Autobiographical Text.” Proceedings of the Information System Research (ISR), 2022 [Under review]
● Siriwardhana S, Weerasakera R, Kaluarachchi T, Wen E, Rana R, Nanayakkara S. “Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open-Domain Question-Answering.” Transactions of the Association for Computational Linguistics (TACL), 2022 (to be presented at EMNLP 2022)
Other Publications
● Wen, E., Kaluarachchi, T., Siriwardhana, S., Tang, V., Billinghurst, M., Lindeman, R.W., Yao, R., Lin, J. and Nanayakkara, S.C., 2022. VRhook: A Data Collection Tool for VR Motion Sickness Research. Proceedings of the ACM Symposium on User Interface Software and Technology (UIST ’22).
● Kaluarachchi, T., Siriwardhana, S., Wen, E., and Nanayakkara, S. A Corneal Surface Reflections-Based Intelligent System for Lifelogging Application. International Journal of Human-Computer Interaction (IJHCI) 22(4), [Under Review]
Acknowledgements
❖ My supervisors
➢ Prof. Suranga Nanayakkara
➢ Prof. Mark Billinghurst
➢ Dr. Elliott Wen
❖ Examination committee members
➢ Assoc Prof Kwan Hui Lim
➢ Assoc Prof Alan Wang
❖ The University of Auckland Doctoral Scholarship Programme
❖ All my co-authors and lab members
Thank you!
Appendix
● Pre-training is very expensive
● Huge carbon footprint
Performance Matters!
● Pre-trained SSL models perform exceptionally well on many tasks.
● Retrieval-augmented models have some important qualities
Difference between IMA and Co-attention
● Co-attention does not need any modification, such as adding a class token or a few extra transformer layers
IMA modification Co-attention
DL features vs SSL features
● CNN Features off-the-shelf: an Astounding Baseline for Recognition (2014)
● PASS: An ImageNet replacement for self-supervised pretraining without humans
(2021)
● Efficient Self-supervised Vision Transformers for Representation Learning (2022)
(“When transferring to downstream linear classification tasks, EsViT outperforms its supervised counterpart on 17 out of 18 datasets.”)
● Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms (2019)
● How Well Do Self-Supervised Models Transfer? (CVPR 2022)
GPT-3 (OpenAI)
● $12M to train
● 175 billion parameters (365 GB)
● The bigger, the better

Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 

Investigating Pre-Trained SSL Models in Limited Data NLU

  • 1. Fuse and Adapt: Investigating the Use of Pre-Trained Self-Supervised Learning Models in Limited Data NLU Problems Shamane Siriwardhana Supervised by Associate Professor Suranga Nanayakkara, Professor Mark Billinghurst & Dr Elliott Wen
  • 2. Deep Learning and Cake Analogy 2.0 (Yann LeCun, Chief AI Scientist at Meta) ● Cherry - Reinforcement Learning ● Icing - Supervised Learning ● Cake - Self-Supervised Learning
  • 3. ● Is Self-Supervised Learning the future of AI? Self-Supervised Learning (SSL) ● 72K GitHub stars since 2020 ● 50K citations since 2019 ● $5M to train a single model
  • 4. ● Eliminates the need for humans to label data. ● Models can use naturally available context as labels. SSL workflow: pretext task (e.g., Masked Language Modeling)
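The masked language modeling pretext task turns unlabeled text into (input, label) pairs with no human annotation: the hidden words themselves are the labels. A minimal sketch, assuming whitespace tokens, a `[MASK]` placeholder string and a configurable masking rate (BERT commonly uses 15%); real tokenizers and BERT's 80/10/10 replacement rule are deliberately left out:

```python
import random

MASK = "[MASK]"  # placeholder; real models use a tokenizer-specific mask id


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Create a masked-language-modeling training pair from unlabeled tokens.

    Returns (inputs, labels): masked positions hold MASK in `inputs` and the
    original token in `labels`; every other label is None (no loss there).
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)   # the hidden word becomes the training target
        else:
            inputs.append(tok)
            labels.append(None)  # ignored by the loss
    return inputs, labels


sentence = "self supervised models learn from the context of unlabeled text".split()
inputs, labels = mask_tokens(sentence, mask_prob=0.3, seed=0)
print(inputs)
print(labels)
```

The model is then trained to predict `labels` at the masked positions, which is why no annotator is needed.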
  • 5. High Availability of Pre-trained SSL Models ● Pioneers in pre-training ● Open-sourced model checkpoints
  • 6. The Focus of the Thesis: Utilization of Pre-trained SSL Models
  • 7. Research Questions ● Fusion of multimodal pre-trained SSL features and models. RQ1: How to fuse multimodal features extracted from frozen pre-trained SSL models? RQ2: How to fuse two pre-trained transformer-based architectures in multimodal settings? ● Domain adaptation of pre-trained SSL models with fine-tuning mechanisms. RQ3: How to adapt a generative pre-trained SSL model when no high-quality training data is available? RQ4: How to domain-adapt compound neural architectures that consist of several pre-trained SSL models?
  • 8. Fusion of Multimodal Features Extracted from Three Different Frozen Pre-trained SSL Models. Frozen models. Multimodal Emotion Recognition: ● Challenging to collect and annotate data RQ1
  • 9. Multimodal Frozen SSL Networks ● Fabnet - Video ● Convolution-based architecture ● Vector size - 256 ● Seq-len: frames in the video ● Wav2Vec - Speech ● Temporal convolutions (tc) ● Vector size - 512 ● Seq-len: strides in tc ● RoBERTa - Text ● Transformer ● Vector size - 1024 ● Seq-len: number of words RQ1
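Because the three frozen extractors emit vectors of different widths (256, 512 and 1024) and different sequence lengths, one common first step is to project every modality to a shared width before fusion. A stdlib-only sketch, assuming randomly initialised linear projections, a hypothetical shared width `D_MODEL = 64` and random stand-in features; it illustrates shape alignment only and is not the thesis's exact architecture:

```python
import random

D_MODEL = 64  # hypothetical shared width for the fusion transformer
FEATURE_DIMS = {"video": 256, "speech": 512, "text": 1024}  # widths from the slide


def make_linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) matrix standing in for a trainable projection."""
    scale = in_dim ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(sequence, weight):
    """Project every time step: (seq_len x in_dim) -> (seq_len x out_dim)."""
    return [[sum(w * x for w, x in zip(row, vec)) for row in weight] for vec in sequence]


def align_modalities(features, rng):
    """Project each modality to D_MODEL and concatenate along the time axis.

    Sequence lengths may differ per modality (frames, conv strides, words);
    concatenating along time does not require them to match.
    """
    fused, modality_ids = [], []
    for name, seq in features.items():
        weight = make_linear(FEATURE_DIMS[name], D_MODEL, rng)
        projected = project(seq, weight)
        fused.extend(projected)
        modality_ids.extend([name] * len(projected))
    return fused, modality_ids


rng = random.Random(0)
# Stand-in frozen features: 5 video frames, 8 speech steps, 3 words.
lengths = {"video": 5, "speech": 8, "text": 3}
features = {
    name: [[rng.gauss(0, 1) for _ in range(FEATURE_DIMS[name])] for _ in range(n)]
    for name, n in lengths.items()
}
fused, ids = align_modalities(features, rng)
print(len(fused), len(fused[0]))  # prints: 16 64
```

Keeping `modality_ids` alongside the fused sequence makes it easy to add modality embeddings or to pool per modality later.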
  • 10. SSL-Embedding Fusion Transformer ● Ablation studies on the CMU-MOSEI dataset ● Model comparisons ● Proposed transformer-based fusion. Multimodal emotion recognition with transformer-based self supervised feature fusion (Siriwardhana et al., 2020) RQ1
  • 11. Some Findings ❖ Dense SSL features extracted from different SSL models have robust representational capabilities. ➢ They can be fused with transformer-based fusion mechanisms. ➢ Self-attention plays an important role when combining sequential embeddings. ❖ Fusing features while keeping the pre-trained models frozen is important ➢ especially when pre-trained models have a vast number of parameters ➢ (e.g., GPT-3 has 175 billion parameters). RQ1 Findings related to RQ1 were presented as a journal paper in IEEE Access 2020. S. Siriwardhana, T. Kaluarachchi, M. Billinghurst and S. Nanayakkara, "Multimodal Emotion Recognition With Transformer-Based Self Supervised Feature Fusion," in IEEE Access, vol. 8, pp. 176274-176285, 2020, doi: 10.1109/ACCESS.2020.3026823. Impact Factor - 3.367
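The finding that self-attention matters when combining sequential embeddings can be made concrete: in a fused sequence, every time step weighs every other step regardless of which modality it came from. A minimal single-head scaled dot-product self-attention in pure Python, assuming identity query/key/value projections to keep it short (a real fusion transformer learns these projections, uses several heads and adds feed-forward layers):

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Single-head scaled dot-product self-attention with identity Q/K/V.

    sequence: list of seq_len vectors of width d.
    Returns (outputs, weights); weights[i][j] is how much step i attends to step j.
    """
    d = len(sequence[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in sequence:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in sequence]
        attn = softmax(scores)
        # Each output is an attention-weighted average of the value vectors.
        out = [sum(w * v[i] for w, v in zip(attn, sequence)) for i in range(d)]
        outputs.append(out)
        weights.append(attn)
    return outputs, weights


# Toy fused sequence: two "speech" steps followed by one "text" step (width 2).
fused = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(fused)
for row in weights:
    print([round(w, 3) for w in row])
```

Because attention weights are computed from content, a speech step can draw on a text step without any fixed alignment between the two sequences.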
  • 12. Fusion of Two Transformer Architectures in Multimodal Settings ● Represent different modalities with transformer-based pre-trained models ● Utilize architectural properties in the fusion RQ2
  • 13. Transformer Models ● RoBERTa (unfrozen) - Text ● Transformer ● Vector size - 1024 ● Seq-len: number of words ● Speech-BERT (unfrozen) - Speech ● Transformer ● Vector size - 1024 ● Seq-len: sampling frequency RQ2
  • 14. Shallow vs Co-attentional Fusion ● Fusion mechanisms: shallow fusion and co-attentional fusion ● Model comparisons ● Ablation studies. Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition (Siriwardhana et al., 2020) RQ2
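In co-attentional fusion the queries come from one modality while the keys and values come from the other, so each text token gathers evidence from the speech frames and each speech frame gathers evidence from the text. A stdlib-only sketch with identity projections and toy vectors; the layer sizes, head count and learned weights of the Interspeech 2020 model are not reproduced here:

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention where queries and keys/values differ.

    queries: vectors from modality A (e.g. text tokens).
    context: vectors from modality B (e.g. speech frames), used as keys and values.
    Returns one context summary per query vector.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in queries:
        attn = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in context])
        outputs.append([sum(w * v[i] for w, v in zip(attn, context)) for i in range(d)])
    return outputs


def co_attention(text, speech):
    """Symmetric co-attention: text attends to speech and speech attends to text."""
    return cross_attention(text, speech), cross_attention(speech, text)


text = [[1.0, 0.0], [0.0, 1.0]]                 # 2 text tokens
speech = [[0.8, 0.2], [0.1, 0.9], [0.5, 0.5]]   # 3 speech frames
text_ctx, speech_ctx = co_attention(text, speech)
print(len(text_ctx), len(speech_ctx))  # prints: 2 3
```

Unlike a [CLS]-based design, nothing here requires an extra class token; each direction simply produces one output per query.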
  • 15. Some Findings ❖ Pre-trained SSL models with transformer-based architectures can be fused easily ➢ by employing unique properties like the [CLS] token ➢ Shallow fusion ❖ Transformer-based SSL models can be fine-tuned stably even with small amounts of data ➢ Lower learning rates keep fine-tuning stable ❖ The Transformer architecture is becoming increasingly ubiquitous in self-supervised learning ➢ Transformer-based models represent different data modalities RQ2 Findings related to RQ2 were presented as a full conference paper at Interspeech 2020. Siriwardhana S, Reis A, Weerasakera R, Nanayakkara S. "Jointly Fine-Tuning BERT-like Self Supervised Models to Improve Multimodal Speech Emotion Recognition." Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Vol. 2020. H index - 100
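Shallow fusion, as listed in these findings, exploits the [CLS] token that BERT-like models place at position 0: the two [CLS] embeddings are concatenated and passed to a small classifier, and the whole stack can then be fine-tuned jointly. A sketch assuming each encoder's output is already available as a list of token vectors, a hypothetical four-class emotion label set, a toy hidden width and a single randomly initialised linear layer:

```python
import math
import random

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # hypothetical label set


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shallow_fusion(text_states, speech_states, weight, bias):
    """Concatenate the [CLS] vectors (index 0) of both encoders and classify.

    text_states / speech_states: per-token hidden states from each encoder.
    weight: one row per class, each as wide as the two CLS vectors combined.
    """
    fused = text_states[0] + speech_states[0]  # list concatenation of the CLS vectors
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weight, bias)]
    return softmax(logits)


rng = random.Random(0)
hidden = 8  # toy width; the slide lists 1024 for both RoBERTa and Speech-BERT
text_states = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(5)]
speech_states = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(12)]
weight = [[rng.gauss(0, 0.1) for _ in range(2 * hidden)] for _ in EMOTIONS]
bias = [0.0] * len(EMOTIONS)
probs = shallow_fusion(text_states, speech_states, weight, bias)
print(EMOTIONS[probs.index(max(probs))], [round(p, 3) for p in probs])
```

Only the first token of each sequence is used, so the two encoders may produce sequences of very different lengths (here 5 and 12).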
  • 16. Domain Adaptation of the Generative BART Model When High-Quality Training Data Is Missing. Autobiographical Text Summarization ● Privacy issues ● Different language patterns ● Scarcity of records and gold-standard summaries ● BART transformer - generates text ● Works well on generation benchmarks ● Sequence-to-sequence architecture RQ3
  • 17. Utilization of Reddit Data and High-Quality News Data. Thread title ● News summarization ● Fundamental task ● Gold-standard datasets ● Dataset closely related to the domain ● Titles only RQ3
  • 18. Mix Distribution Multitask Learning ● Fine-tuning BART for autobiographical summarization with: ○ a domain-specific weakly labeled dataset ○ a task-specific dataset with gold-standard labels ● Model comparison ● Factual consistency (FactCC). Abstractive Summarization System for Autobiographical Text (Siriwardhana et al., 2022) RQ3
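The mix-distribution setup can be sketched as a batch sampler that draws each example either from the weakly labeled domain data (Reddit post bodies paired with their thread titles) or from the gold-standard news summarization data. The mixing ratio, the toy examples and the function names below are hypothetical; the slides do not give the proportions or scheduling used in the thesis:

```python
import random


def weak_pairs_from_threads(threads):
    """Turn Reddit-style threads into weak (source, target) pairs: body -> title."""
    return [(t["body"], t["title"]) for t in threads if t["body"] and t["title"]]


def mixed_batches(domain_pairs, news_pairs, batch_size, domain_ratio, num_batches, seed=0):
    """Yield batches whose examples come from two distributions.

    domain_ratio is the probability that an example is drawn from the weakly
    labeled domain data; otherwise it is drawn from the gold-standard news data.
    Each example is tagged with its source so losses can be tracked per task.
    """
    rng = random.Random(seed)
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            if rng.random() < domain_ratio:
                batch.append(("domain",) + rng.choice(domain_pairs))
            else:
                batch.append(("news",) + rng.choice(news_pairs))
        yield batch


threads = [
    {"title": "Finally finished my first marathon", "body": "After months of training I crossed the line."},
    {"title": "Moved to a new city", "body": "Last week I packed everything and left home."},
]
news = [("Full text of a news article.", "Reference summary written by an editor.")]
domain = weak_pairs_from_threads(threads)
for batch in mixed_batches(domain, news, batch_size=4, domain_ratio=0.5, num_batches=2):
    print([source for source, _, _ in batch])
```

Tagging each example with its source keeps the two signals separable, for example to weight the weak-label loss differently from the gold-label loss.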
  • 19. Human Studies. Abstractive Summarization System for Autobiographical Text (Siriwardhana et al., 2022) ● Model comparison with MTurk participants ● SummarizeMe (Digital Diary) - user study conducted with 75 users RQ3
  • 20. Some of the Findings ❖ SSL models like BART have strong language generation capabilities ➢ Such models have seen a large amount of data during pre-training ➢ BART-like models can perform well even without high-quality data ❖ Data-centric approaches are crucial when adapting to tasks like autobiographical text summarization ➢ Designing better mechanisms to make use of available domain-specific data ❖ Human studies are essential and beneficial for evaluating generative models. Findings related to RQ3 were submitted as a journal paper to ISR 2022. Siriwardhana S, Kaluarachchi T, Chithralekha G, Scholl P, Dissanayake V, Nanayakkara S. "SummarizeMe: Abstractive Summarization System for Autobiographical Text" Proceedings of the Information System Research (ISR) 2022 [Under review] RQ3
  • 21. Domain Adaptation of Compound Neural Architectures with Several Pre-trained SSL Models ● Retrieval-Augmented Generation (RAG) model (Meta) ● Combines information retrieval and seq2seq generation ● DPR neural retriever and BART generator RQ4 ● Open-Domain Question Answering (ODQA) ● Works well for Wikipedia-based knowledge bases ● Little work on domain adaptation of ODQA
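The RAG pipeline on this slide can be summarised numerically: the DPR retriever scores passages by the inner product of question and passage embeddings, the top-k scores are normalised with a softmax into p(z|x), and the generator's likelihood of the answer is marginalised over the retrieved passages, p(y|x) ≈ Σ_z p(z|x) p(y|x,z) (the RAG-Sequence formulation of Lewis et al., 2020). The embeddings and generator likelihoods below are made-up toy numbers standing in for DPR and BART outputs:

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(question_emb, passage_embs, k):
    """Return indices and inner-product scores of the top-k passages (DPR-style)."""
    scored = sorted(((dot(question_emb, p), i) for i, p in enumerate(passage_embs)), reverse=True)
    top = scored[:k]
    return [i for _, i in top], [s for s, _ in top]


def rag_sequence_likelihood(doc_scores, gen_likelihoods):
    """p(y|x) = sum_z p(z|x) * p(y|x,z) over the retrieved passages."""
    doc_probs = softmax(doc_scores)
    return sum(p * g for p, g in zip(doc_probs, gen_likelihoods))


question = [0.2, 0.9, 0.1]
passages = [[0.1, 1.0, 0.0], [0.9, 0.1, 0.2], [0.3, 0.7, 0.4], [0.0, 0.0, 1.0]]
ids, scores = retrieve(question, passages, k=2)
# Toy generator likelihoods p(y|x,z) of the answer given each retrieved passage.
gen = {0: 0.6, 2: 0.3}
print(ids, rag_sequence_likelihood(scores, [gen[i] for i in ids]))
```

A production system replaces the linear scan in `retrieve` with an approximate nearest-neighbour index over millions of passages.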
  • 22. Domain Adaptation of RAG RQ4
  • 23. RAG-end2end and Introduction of an Auxiliary Signal ● End-to-end RAG retriever ● Reconstruction auxiliary signal. Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering (Siriwardhana et al., 2022) RQ4
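Why end-to-end retriever training helps can be seen from the gradient of the RAG loss with respect to the retrieval scores. For L = -log Σ_z p(z|x) p(y|x,z) with p(z|x) = softmax(s)_z, the derivative with respect to score s_j is p(z_j|x) - p(z_j|x,y): passages whose posterior (after seeing the answer) exceeds their prior get their scores pushed up. In the original RAG only the question encoder is updated through these scores while the passage encoder and index stay fixed; RAG-end2end also updates the passage encoder, which means the knowledge base has to be re-encoded and re-indexed during training. A sketch that checks the analytic gradient against finite differences on toy numbers:

```python
import math


def _prior(scores):
    """Softmax of retriever scores: p(z|x)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rag_loss(scores, gen_likelihoods):
    """Negative log marginal likelihood: -log sum_z p(z|x) * p(y|x,z)."""
    return -math.log(sum(p * g for p, g in zip(_prior(scores), gen_likelihoods)))


def score_gradient(scores, gen_likelihoods):
    """Analytic dL/ds_j = prior_j - posterior_j."""
    prior = _prior(scores)
    joint = [p * g for p, g in zip(prior, gen_likelihoods)]
    marginal = sum(joint)
    posterior = [j / marginal for j in joint]
    return [p - q for p, q in zip(prior, posterior)]


def numerical_gradient(scores, gen_likelihoods, eps=1e-6):
    """Central finite differences, used only to check the analytic formula."""
    grads = []
    for j in range(len(scores)):
        up = list(scores)
        up[j] += eps
        down = list(scores)
        down[j] -= eps
        grads.append((rag_loss(up, gen_likelihoods) - rag_loss(down, gen_likelihoods)) / (2 * eps))
    return grads


scores = [2.0, 1.5, 0.5]   # retriever scores for three passages
gen = [0.05, 0.70, 0.10]   # generator likelihood of the gold answer per passage
print([round(g, 4) for g in score_gradient(scores, gen)])
```

Passage 1 has a lower score than passage 0 but explains the answer far better, so its gradient is negative and a gradient-descent step raises its score.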
  • 24. ● End2end retriever training improves domain adaptation ● RAG-end2end and auxiliary signals can improve the overall results ● Empowering further research in the paradigm of retrieval augmentation RQ4. Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering (Siriwardhana et al., 2022)
  • 25. Some of the Findings ❖ Different SSL pre-trained models can be combined to create effective RAG-like pipelines. ❖ Retrieval models play a vital role in the domain adaptation of RAG. ➢ Neural retrieval models like DPR benefit from domain-specific fine-tuning, since they are mainly trained on Wikipedia-based data. ❖ Auxiliary signals can improve the process of domain adaptation. ➢ A solution to the scarcity of domain-specific labeled data. Findings related to RQ4 were accepted as a journal paper in TACL 2022. Siriwardhana S, Weerasakera R, Kaluarachchi T, Wen E, Rana R, Nanayakkara S. "Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open-Domain Question-Answering" Transactions of the Association for Computational Linguistics (TACL) 2022 (to be presented at EMNLP 2022) Impact Factor - 9.194 RQ4
  • 27. Future Work ❖ Model compression techniques are important ➢ SSL model checkpoints are large ➢ Model pruning and distillation ➢ Can support recent paradigms like federated learning [Figures: model pruning, federated learning, knowledge distillation] ❖ Human-centric model evaluation is gaining importance ➢ Hallucination and factual consistency are a significant area
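Knowledge distillation, one of the compression directions listed above, trains a small student model to match a large teacher's softened output distribution. A sketch of the textbook temperature-scaled loss of Hinton et al. (2015), T² · KL(softmax(z_t/T) || softmax(z_s/T)), with toy logits; the temperature is an arbitrary choice, and since the slides list distillation only as future work this is not an implementation from the thesis:

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    return temperature ** 2 * kl


teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.2, 3.0, 1.0]
print(distillation_loss(close_student, teacher), distillation_loss(far_student, teacher))
```

In practice this term is usually combined with the ordinary cross-entropy on the true labels.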
  • 28. Future Work ❖ Retrieval augmentation could play a significant role in the field of AI ➢ Could offer an alternative to billion-dollar large pre-trained models ➢ Are we getting closer to human-like intelligence? Retrieval augmentation lets models go beyond parametric memory (Source: RETRO, DeepMind, 2022)
  • 29. Directly related publications ● Siriwardhana S, Kaluarachchi T, Billinghurst M, Nanayakkara S. "Multimodal Emotion Recognition With Transformer-Based Self Supervised Feature Fusion." IEEE Access 2020. ● Siriwardhana S, Reis A, Weerasakera R, Nanayakkara S. "Jointly Fine-Tuning BERT-like Self Supervised Models to Improve Multimodal Speech Emotion Recognition." Proceedings of the International Speech Communication Association, INTERSPEECH 2020. ● Siriwardhana S, Kaluarachchi T, Scholl P, Dissanayake V, Nanayakkara S. "SummarizeMe: Abstractive Summarization System for Autobiographical Text." Proceedings of the Information System Research (ISR) 2022 [Under review] ● Siriwardhana S, Weerasakera R, Kaluarachchi T, Wen E, Rana R, Nanayakkara S. "Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open-Domain Question-Answering." Transactions of the Association for Computational Linguistics (TACL) 2022 (to be presented at EMNLP 2022). Other Publications ● Wen E, Kaluarachchi T, Siriwardhana S, Tang V, Billinghurst M, Lindeman RW, Yao R, Lin J, Nanayakkara SC. "VRhook: A Data Collection Tool for VR Motion Sickness Research." Proceedings of the ACM Symposium on User Interface Software and Technology (UIST '22), 2022. ● Kaluarachchi T, Siriwardhana S, Wen E, Nanayakkara S. "A Corneal Surface Reflections-Based Intelligent System for Lifelogging Application." International Journal of Human-Computer Interaction (IJHCI) 22(4) [Under review]
  • 30. Acknowledgement ❖ My supervisors ➢ Prof. Suranga Nanayakkara ➢ Prof. Mark Billinghurst ➢ Dr. Elliott Wen ❖ Examination committee members ➢ Assoc Prof Kwan Hui Lim ➢ Assoc Prof Alan Wang ❖ The University of Auckland Doctoral Scholarship Programme ❖ All my co-authors and lab members
  • 32. Appendix ● Pre-training is expensive ○ It is extremely expensive
  • 33. ● Huge carbon footprint
  • 34. Performance Matters! ● Pre-trained SSL models perform exceptionally well on many tasks.
  • 35. ● Retrieval-augmented models have some important qualities
  • 36. Difference between IMA and Co-attention ● Co-attention does not need any modification, such as adding a class token or a few transformer layers [Figures: IMA modification; co-attention]
  • 37. DL features vs SSL features
  • 38. DL features vs SSL features ● CNN Features off-the-shelf: an Astounding Baseline for Recognition (2014) ● PASS: An ImageNet replacement for self-supervised pretraining without humans (2021) ● Efficient Self-supervised Vision Transformers for Representation Learning (2022) ("When transferring to downstream linear classification tasks, EsViT outperforms its supervised counterpart on 17 out of 18 datasets.") ● Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms (2019) ● How Well Do Self-Supervised Models Transfer? (CVPR 2022)
  • 39. DL features vs SSL features
  • 40. GPT-3 (OpenAI) ● $12M to train ● 175 billion parameters (365GB) ● The bigger, the better

Editor's Notes

  1. Deep learning is important, but it has its limitations. SSL as a savior: it is becoming very popular.
  3. What is SSL? It has two phases. What is a pretext task? The first phase is pre-training on a pretext task, which can take a lot of computational power.
  4. But we do not always have to worry about pre-training: the big players open-source these models.
  5. The focus of the thesis is the utilization of pre-trained models on downstream tasks, especially where we do not have much training data.
  6. The research questions are separated into two main areas: fusion and adaptation.
  7. How to utilize features extracted from frozen pre-trained SSL models. I conducted my experiments on multimodal emotion recognition. Why multimodal? Because it is a challenging area.
  8. I used three different frozen networks with different vector sizes and sequence lengths, so it is not trivial to connect these dense embeddings.
  9. So I introduced a transformer-based fusion mechanism. It showed competitive results.
  11. Motivated by the first research question, I conducted the experiments on multimodal emotion recognition. When SSL first appeared, the Transformer architecture was mainly used for text; then it was introduced to speech. So can we use special properties like the CLS token?
  12. Two transformers represent text and speech. Having the same transformer architecture could help the fusion and improve the results.
  13. Two fusion mechanisms, one directly employing architectural properties. Results on the IEMOCAP dataset.
  18. How DL has improved the field of NLU, and why problems remain due to the scarcity of data.