SlideShare a Scribd company logo
1 of 22
Download to read offline
Multi-source Meta Transfer for Low Resource
Multiple-Choice Question Answering
Daeung Kim
2020.09.18.
dukim@dongguk.edu
INDEX
1. Prerequisites
2. Multi-source Meta Transfer
3. Experiments & Results
4. Conclusions
1. Prerequisites
1. Prerequisites
Low resource & Domain discrepancy
https://slideslive.com/38929127/multisource-meta-transfer-for-low-resource-multiplechoice-question-answering
140
120
113.5
113
108
97.6
13.9
6.1
2.6
0 20 40 60 80 100 120 140 160
SEARCHQA
NEWSQA
SWAG
HOTPOTQA
SQUAD
RACE
SEMEVAL
DREAM
MCTEST
Data Size (K)
Question
Answering
Dataset
Data Size of Question Answering Datasets
• Scenario Text
• Wikipedia
• Exam
• Narrative Text
• Dialogue
• Story
Extractive/
Abstractive
MCQA
Most existing MCQA datasets are small in size Corpus from different domains
1. Prerequisites
How does meta learning work?
• Problem
• Low resource setting
• Domain discrepancy
• Existing Methods
• Transfer learning
• Multi-task learning
Fine-tuning on the target domain
Transfer Learning Multi-task learning
S
S
T
S
T
S
Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
Jin, Di, et al. "Mmm: Multi-stage multi-task learning for multi-choice reading comprehension." arXiv preprint arXiv:1910.00458 (2019).
1. Prerequisites
How does meta learning work?
• Meta-Learning
• Model-Agnostic Meta Learning
𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑙𝑙𝑙𝑙 =: 𝒄𝒄𝒄𝒄𝒄𝒄𝒄𝒄(𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑙𝑙𝑚𝑚)
FFL 𝑦𝑦𝑙𝑙 = 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑙𝑙𝑙𝑙 𝑤𝑤𝑙𝑙, 𝑥𝑥𝑙𝑙
BPL 𝑤𝑤𝑙𝑙 =: 𝑤𝑤𝑙𝑙 + 𝛼𝛼
𝜕𝜕𝐽𝐽𝑙𝑙
𝜕𝜕𝑤𝑤𝑙𝑙
FFL 𝑦𝑦𝑚𝑚 = 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑙𝑙𝑙𝑙 𝑤𝑤𝑙𝑙, 𝑥𝑥𝑚𝑚
BPL 𝑤𝑤𝑚𝑚 =: 𝑤𝑤𝑚𝑚 + 𝛼𝛼
𝜕𝜕𝐽𝐽𝑚𝑚
𝜕𝜕𝑤𝑤𝑚𝑚
𝐽𝐽: 𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙 𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓
𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆 𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 ∶ 𝑥𝑥𝑙𝑙~𝑋𝑋 𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸 𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 ∶ 𝑥𝑥𝑚𝑚~𝑋𝑋
𝑖𝑖𝑖𝑖𝑖𝑖𝑖𝑖 𝑤𝑤𝑚𝑚 from
𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚
Fast Adaptation
Meta-Learning
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." arXiv preprint arXiv:1703.03400 (2017).
1. Prerequisites
How does meta learning work?
• Learn a model that can generalize over the task distribution
2. Multi-source
Meta Transfer
2. Multi-source Meta Transfer(MMT)
Meta-Learning vs. Multi-source Meta Transfer(MMT)
• MMT learns knowledge from multiple source
• Reduce discrepancy between sources and target
2. Multi-source Meta Transfer
Multi-source Meta Transfer
1. Multi-source Meta Learning(MML)
2. Meta-Transfer Learning(MTL)
2 3
4
4
3
1
2
Target
Representation spac
e
1
Input space
Target
MMT model
Supervised MMT
Task in source 3
Task in source 4
Task in source 1
Task in source 2
MMT representation
Source representation
MTL
MML
2. Multi-source Meta Transfer
1. Multi-source Meta Learning(MML)
• Learn knowledge from multiple sources
• Learn a representation near to the target
2 3
4
4
3
1
2
Target
Representation spac
e
1
Input space
Target
MMT model
Supervised MMT
Task in source 3
Task in source 4
Task in source 1
Task in source 2
MMT representation
Source representation
MTL
MML
2. Multi-source Meta Transfer
2. Meta-Transfer Learning(MTL)
• Finetune meta-model to the target source
2 3
4
4
3
1
2
Target
Representation spac
e
1
Input space
Target
MMT model
Supervised MMT
Task in source 3
Task in source 4
Task in source 1
Task in source 2
MMT representation
Source representation
MTL
MML
2. Multi-source Meta Transfer
How MMT samples the task?
• Samples the number of choices equal to the number of choices in the target task.
• The correct answer choice must be included, and the number of choices that the source task
has must be equal to or greater than the number of choices of the target task.
Target Task
(MCQA task that has 3 choices)
2. Multi-source Meta Transfer
How MMT samples the task?
Target Task
𝐾𝐾
𝑁𝑁
𝜏𝜏𝑖𝑖
1
𝜏𝜏𝑖𝑖
2
𝜏𝜏𝑖𝑖
3
2. Multi-source Meta Transfer
How MMT samples the task?
MML
• MMT is agnostic to backbone models
• Sequentially do below algorithm on source datasets
• Sample Support task and Query task from
the same distribution
• Update the learner (𝜃𝜃′) on support task
• Update the meta model (𝜃𝜃′) on query task
• Update the meta model (𝜃𝜃′) on target data
MTL
• Transfer meta model to the target
MML
MTL
3. Experiments
& Results
3. Experiments & Results
Experiments
• Settings
• The backbone model : BERT & RoBERTa
• The maximal sequence input length : 512 for BERT, 256 for RoBERTa
• The model optimization : Adam, initial lr of fast adapt : 1e-3, the rest ones are set to 1e-5
• Dataset
3. Experiments & Results
Results
• MMT(RoBERTa) achieves the best performances overall benchmark datasets
• MMT is able to boost up performance over different pre-trained language models
• MMT is backbone-free, It can improve the performance with the advance of LMs
3. Experiments & Results
How to select sources?
• Source selection is prerequisite step for MMT
• Dissimilar data sources will cause negative transfer when their distribution is far away from
the target one.
• To handle this, the model learns from dissimilar sources to similar one
4. Conclusion
4. Conclusion
• MMT extends meta learning to multiple sources on MCQA task
• MMT provides an algorithm both for supervised and unsupervised meta
training
• MMT gives a guideline to source selection
Q & A

More Related Content

What's hot

Text clustering
Text clusteringText clustering
Text clustering
KU Leuven
 
Probabilistic models (part 1)
Probabilistic models (part 1)Probabilistic models (part 1)
Probabilistic models (part 1)
KU Leuven
 
Tdm probabilistic models (part 2)
Tdm probabilistic  models (part  2)Tdm probabilistic  models (part  2)
Tdm probabilistic models (part 2)
KU Leuven
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
Dustin Smith
 
Indexing for Large DNA Database sequences
Indexing for Large DNA Database sequencesIndexing for Large DNA Database sequences
Indexing for Large DNA Database sequences
CSCJournals
 
Multi label text classification
Multi label text classificationMulti label text classification
Multi label text classification
raghavr186
 
Finding Similar Files in Large Document Repositories
Finding Similar Files in Large Document RepositoriesFinding Similar Files in Large Document Repositories
Finding Similar Files in Large Document Repositories
feiwin
 

What's hot (20)

Improved Text Mining for Bulk Data Using Deep Learning Approach
Improved Text Mining for Bulk Data Using Deep Learning Approach Improved Text Mining for Bulk Data Using Deep Learning Approach
Improved Text Mining for Bulk Data Using Deep Learning Approach
 
Text clustering
Text clusteringText clustering
Text clustering
 
Query expansion
Query expansionQuery expansion
Query expansion
 
Text categorization
Text categorizationText categorization
Text categorization
 
Meta learning with memory augmented neural network
Meta learning with memory augmented neural networkMeta learning with memory augmented neural network
Meta learning with memory augmented neural network
 
Probabilistic models (part 1)
Probabilistic models (part 1)Probabilistic models (part 1)
Probabilistic models (part 1)
 
Extraction Based automatic summarization
Extraction Based automatic summarizationExtraction Based automatic summarization
Extraction Based automatic summarization
 
Tdm probabilistic models (part 2)
Tdm probabilistic  models (part  2)Tdm probabilistic  models (part  2)
Tdm probabilistic models (part 2)
 
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map...
 
Quality Metrics for Linked Open Data
Quality Metrics for  Linked Open Data Quality Metrics for  Linked Open Data
Quality Metrics for Linked Open Data
 
Using lexical chains for text summarization
Using lexical chains for text summarizationUsing lexical chains for text summarization
Using lexical chains for text summarization
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
Meta-Learning Presentation
Meta-Learning PresentationMeta-Learning Presentation
Meta-Learning Presentation
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
 
Information retrieval 7 boolean model
Information retrieval 7 boolean modelInformation retrieval 7 boolean model
Information retrieval 7 boolean model
 
A probabilistic approach to string transformation
A probabilistic approach to string transformationA probabilistic approach to string transformation
A probabilistic approach to string transformation
 
Indexing for Large DNA Database sequences
Indexing for Large DNA Database sequencesIndexing for Large DNA Database sequences
Indexing for Large DNA Database sequences
 
Multi label text classification
Multi label text classificationMulti label text classification
Multi label text classification
 
Finding Similar Files in Large Document Repositories
Finding Similar Files in Large Document RepositoriesFinding Similar Files in Large Document Repositories
Finding Similar Files in Large Document Repositories
 
Neural Summarization by Extracting Sentences and Words
Neural Summarization by Extracting Sentences and WordsNeural Summarization by Extracting Sentences and Words
Neural Summarization by Extracting Sentences and Words
 

Similar to Multi source meta transfer for low resource multiple-choice question answering-

Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganMulti Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Spark Summit
 
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptxPPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
neju3
 

Similar to Multi source meta transfer for low resource multiple-choice question answering- (20)

Unsupervised Neural Machine Translation for Low-Resource Domains
Unsupervised Neural Machine Translation for Low-Resource DomainsUnsupervised Neural Machine Translation for Low-Resource Domains
Unsupervised Neural Machine Translation for Low-Resource Domains
 
Rapid pruning of search space through hierarchical matching
Rapid pruning of search space through hierarchical matchingRapid pruning of search space through hierarchical matching
Rapid pruning of search space through hierarchical matching
 
Transfer Learning in NLP: A Survey
Transfer Learning in NLP: A SurveyTransfer Learning in NLP: A Survey
Transfer Learning in NLP: A Survey
 
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganMulti Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
 
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...
2019 dynamically composing_domain-data_selection_with_clean-data_selection_by...
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Natural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A SurveyNatural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A Survey
 
MongoDB: How We Did It – Reanimating Identity at AOL
MongoDB: How We Did It – Reanimating Identity at AOLMongoDB: How We Did It – Reanimating Identity at AOL
MongoDB: How We Did It – Reanimating Identity at AOL
 
Named Entity Recognition from Online News
Named Entity Recognition from Online NewsNamed Entity Recognition from Online News
Named Entity Recognition from Online News
 
Performance Issue? Machine Learning to the rescue!
Performance Issue? Machine Learning to the rescue!Performance Issue? Machine Learning to the rescue!
Performance Issue? Machine Learning to the rescue!
 
Automated product categorization
Automated product categorizationAutomated product categorization
Automated product categorization
 
Automated product categorization
Automated product categorization   Automated product categorization
Automated product categorization
 
Dstc6 an introduction
Dstc6 an introductionDstc6 an introduction
Dstc6 an introduction
 
Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...
Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...
Searching for Quality: Genetic Algorithms and Metamorphic Testing for Softwar...
 
Incremental Export of Relational Database Contents into RDF Graphs
Incremental Export of Relational Database Contents into RDF GraphsIncremental Export of Relational Database Contents into RDF Graphs
Incremental Export of Relational Database Contents into RDF Graphs
 
Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...
 
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptxPPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
 
5_RNN_LSTM.pdf
5_RNN_LSTM.pdf5_RNN_LSTM.pdf
5_RNN_LSTM.pdf
 
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
 
Distributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsDistributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data Streams
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Multi source meta transfer for low resource multiple-choice question answering-

  • 1. Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering Daeung Kim 2020.09.18. dukim@dongguk.edu
  • 2. INDEX 1. Prerequisites 2. Multi-source Meta Transfer 3. Experiments & Results 4. Conclusions
  • 4. 1. Prerequisites Low resource & Domain discrepancy https://slideslive.com/38929127/multisource-meta-transfer-for-low-resource-multiplechoice-question-answering 140 120 113.5 113 108 97.6 13.9 6.1 2.6 0 20 40 60 80 100 120 140 160 SEARCHQA NEWSQA SWAG HOTPOTQA SQUAD RACE SEMEVAL DREAM MCTEST Data Size (K) Question Answering Dataset Data Size of Question Answering Datasets • Scenario Text • Wikipedia • Exam • Narrative Text • Dialogue • Story Extractive/ Abstractive MCQA Most existing MCQA datasets are small in size Corpus from different domains
  • 5. 1. Prerequisites How does meta learning work? • Problem • Low resource setting • Domain discrepancy • Existing Methods • Transfer learning • Multi-task learning Fine-tuning on the target domain Transfer Learning Multi-task learning S S T S T S Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018). Jin, Di, et al. "Mmm: Multi-stage multi-task learning for multi-choice reading comprehension." arXiv preprint arXiv:1910.00458 (2019).
  • 6. 1. Prerequisites How does meta learning work? • Meta-Learning • Model-Agnostic Meta Learning 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑙𝑙𝑙𝑙 =: 𝒄𝒄𝒄𝒄𝒄𝒄𝒄𝒄(𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑙𝑙𝑚𝑚) FFL 𝑦𝑦𝑙𝑙 = 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑙𝑙𝑙𝑙 𝑤𝑤𝑙𝑙, 𝑥𝑥𝑙𝑙 BPL 𝑤𝑤𝑙𝑙 =: 𝑤𝑤𝑙𝑙 + 𝛼𝛼 𝜕𝜕𝐽𝐽𝑙𝑙 𝜕𝜕𝑤𝑤𝑙𝑙 FFL 𝑦𝑦𝑚𝑚 = 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑙𝑙𝑙𝑙 𝑤𝑤𝑙𝑙, 𝑥𝑥𝑚𝑚 BPL 𝑤𝑤𝑚𝑚 =: 𝑤𝑤𝑚𝑚 + 𝛼𝛼 𝜕𝜕𝐽𝐽𝑚𝑚 𝜕𝜕𝑤𝑤𝑚𝑚 𝐽𝐽: 𝑙𝑙𝑙𝑙𝑙𝑙𝑙𝑙 𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆 𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 ∶ 𝑥𝑥𝑙𝑙~𝑋𝑋 𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸 𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 ∶ 𝑥𝑥𝑚𝑚~𝑋𝑋 𝑖𝑖𝑖𝑖𝑖𝑖𝑖𝑖 𝑤𝑤𝑚𝑚 from 𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚 Fast Adaptation Meta-Learning Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic meta-learning for fast adaptation of deep networks." arXiv preprint arXiv:1703.03400 (2017).
  • 7. 1. Prerequisites How does meta learning work? • Learn a model that can generalize over the task distribution
  • 9. 2. Multi-source Meta Transfer(MMT) Meta-Learning vs. Multi-source Meta Transfer(MMT) • MMT learns knowledge from multiple source • Reduce discrepancy between sources and target
  • 10. 2. Multi-source Meta Transfer Multi-source Meta Transfer 1. Multi-source Meta Learning(MML) 2. Meta-Transfer Learning(MTL) 2 3 4 4 3 1 2 Target Representation spac e 1 Input space Target MMT model Supervised MMT Task in source 3 Task in source 4 Task in source 1 Task in source 2 MMT representation Source representation MTL MML
  • 11. 2. Multi-source Meta Transfer 1. Multi-source Meta Learning(MML) • Learn knowledge from multiple sources • Learn a representation near to the target 2 3 4 4 3 1 2 Target Representation spac e 1 Input space Target MMT model Supervised MMT Task in source 3 Task in source 4 Task in source 1 Task in source 2 MMT representation Source representation MTL MML
  • 12. 2. Multi-source Meta Transfer 2. Meta-Transfer Learning(MTL) • Finetune meta-model to the target source 2 3 4 4 3 1 2 Target Representation spac e 1 Input space Target MMT model Supervised MMT Task in source 3 Task in source 4 Task in source 1 Task in source 2 MMT representation Source representation MTL MML
  • 13. 2. Multi-source Meta Transfer How MMT samples the task? • Samples the number of choices equal to the number of choices in the target task. • The correct answer choice must be included, and the number of choices that the source task has must be equal to or greater than the number of choices of the target task. Target Task (MCQA task that has 3 choices)
  • 14. 2. Multi-source Meta Transfer How MMT samples the task? Target Task 𝐾𝐾 𝑁𝑁 𝜏𝜏𝑖𝑖 1 𝜏𝜏𝑖𝑖 2 𝜏𝜏𝑖𝑖 3
  • 15. 2. Multi-source Meta Transfer How MMT samples the task? MML • MMT is agnostic to backbone models • Sequentially do below algorithm on source datasets • Sample Support task and Query task from the same distribution • Update the learner (𝜃𝜃′) on support task • Update the meta model (𝜃𝜃′) on query task • Update the meta model (𝜃𝜃′) on target data MTL • Transfer meta model to the target MML MTL
  • 17. 3. Experiments & Results Experiments • Settings • The backbone model : BERT & RoBERTa • The maximal sequence input length : 512 for BERT, 256 for RoBERTa • The model optimization : Adam, initial lr of fast adapt : 1e-3, the rest ones are set to 1e-5 • Dataset
  • 18. 3. Experiments & Results Results • MMT(RoBERTa) achieves the best performances overall benchmark datasets • MMT is able to boost up performance over different pre-trained language models • MMT is backbone-free, It can improve the performance with the advance of LMs
  • 19. 3. Experiments & Results How to select sources? • Source selection is prerequisite step for MMT • Dissimilar data sources will cause negative transfer when their distribution is far away from the target one. • To handle this, the model learns from dissimilar sources to similar one
  • 21. 4. Conclusion • MMT extends meta learning to multiple sources on MCQA task • MMT provides an algorithm both for supervised and unsupervised meta training • MMT gives a guideline to source selection
  • 22. Q & A