PERMs
Evaluating Parameter Efficient Learning for Generation
Peng Xu, Mostofa Patwary, Shrimai Prabhumoye et al.
National Yang Ming Chiao Tung University, Hsinchu
Speaker: Po-Chuan Chen
May 9, 2023
Table of contents
1 Abstract
2 Introduction
3 Methodology
4 Experimental Setup
5 Results
6 Conclusion
7 Limitations
Abstract
In this paper, they present a comprehensive evaluation of parameter
efficient learning methods (PERMs) for generation tasks in natural
language processing.
They compare PERMs to finetuning from three new perspectives,
including
1 The impact of sample and model size
2 Generalization to unseen domains and datasets
3 Faithfulness of generations
Their results show that PERMs can outperform finetuning in
certain scenarios, particularly when training with fewer samples
and using larger pre-trained language models.
This study provides valuable insights into the effectiveness of PERMs
for adapting pre-trained language models to downstream tasks.
Introduction
The recent advancements in pre-trained language models (PLMs)
have revolutionized the field of natural language processing (NLP),
enabling state-of-the-art performance on a wide range of tasks.
However, adapting these large and complex models to specific
downstream tasks can be computationally expensive and
time-consuming.
Parameter efficient learning methods (PERMs) have emerged as a
promising solution to this challenge, providing an efficient way for
PLMs to adapt to new tasks with limited training data.
In this paper, they present a comprehensive evaluation of PERMs
for generation tasks in NLP, comparing their performance to
finetuning from three new perspectives.
Their study sheds light on the effectiveness of PERMs for adapting
PLMs to downstream tasks and provides valuable insights into their
potential applications in real-world scenarios.
Contribution
They conducted a thorough evaluation of parameter efficient
learning methods (PERMs) for generating natural language text
They compared PERMs to finetuning from three new
perspectives, including the impact of sample and model size,
generalization to new domains and datasets, and the accuracy
of generated text
Their study provides insights into how PERMs can help
pre-trained language models (PLMs) adapt to new tasks with
limited training data
They offer valuable information on how PERMs can be used in
real-world scenarios where training large models is difficult or
expensive
Methodology
They compare the following four PERMs to finetuning (FT) using
GPT-style models from Megatron-LM
1 Adapter (AP)
2 Prefix Tuning (PF)
3 Prompt Tuning (PT)
4 P-tuning
Adapter
This method adds an extra layer with a bottleneck structure by first
projecting input h to a low dimension using trainable weights Wdown
and then projecting up to the original dimension using trainable
weights Wup.
Adapter(h) = h + g(h · W_down) · W_up
where g is the activation function.
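The bottleneck computation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; the dimensions, random weights, and tanh activation are assumptions.

```python
import numpy as np

# Hypothetical sizes for illustration (not the paper's settings).
d_model, d_bottleneck = 8, 2

rng = np.random.default_rng(0)
W_down = rng.normal(size=(d_model, d_bottleneck))  # trainable down-projection
W_up = rng.normal(size=(d_bottleneck, d_model))    # trainable up-projection

def adapter(h, g=np.tanh):
    """Adapter(h) = h + g(h @ W_down) @ W_up, with a skip connection."""
    return h + g(h @ W_down) @ W_up

h = rng.normal(size=(4, d_model))  # a batch of activations
out = adapter(h)
assert out.shape == h.shape        # the bottleneck preserves the dimension
```

Because of the skip connection, near-zero W_up leaves the PLM's activation almost untouched, which is the "small deviation" the results section attributes Adapter's smooth optimization to.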
Prefix Tuning
It adds trainable prefix tokens at the beginning of each transformer
block.
K ← concat([W_K; K])
V ← concat([W_V; V])
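A minimal NumPy sketch of the idea, assuming a single-head attention block; the sizes and random initialization are illustrative, not the paper's configuration:

```python
import numpy as np

# Hypothetical sizes for illustration.
d, n_prefix, n_tokens = 8, 3, 5
rng = np.random.default_rng(1)

W_K = rng.normal(size=(n_prefix, d))  # trainable prefix keys
W_V = rng.normal(size=(n_prefix, d))  # trainable prefix values

def attention_with_prefix(Q, K, V):
    # K <- concat([W_K; K]), V <- concat([W_V; V])
    K = np.concatenate([W_K, K], axis=0)
    V = np.concatenate([W_V, V], axis=0)
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ V

Q = rng.normal(size=(n_tokens, d))
K = rng.normal(size=(n_tokens, d))
V = rng.normal(size=(n_tokens, d))
out = attention_with_prefix(Q, K, V)
assert out.shape == (n_tokens, d)  # prefixes shift attention, not the output shape
```

Only W_K and W_V are trained; the frozen model attends to these prefix slots alongside the real tokens in every transformer block.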
Prompt Tuning
This method adds extra parameters to the embedding layer and uses
these trainable embeddings to prompt the input.
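Concretely, the trainable embeddings are prepended to the input embeddings while the PLM stays frozen. A minimal sketch, with illustrative sizes that are assumptions rather than the paper's settings:

```python
import numpy as np

# Hypothetical sizes for illustration.
vocab, d, n_prompt = 100, 8, 4
rng = np.random.default_rng(2)

embedding = rng.normal(size=(vocab, d))      # frozen PLM embedding table
prompt_emb = rng.normal(size=(n_prompt, d))  # the only trainable parameters

def embed_with_prompt(token_ids):
    """Prepend trainable prompt embeddings to the input token embeddings."""
    return np.concatenate([prompt_emb, embedding[token_ids]], axis=0)

x = embed_with_prompt(np.array([5, 17, 42]))
assert x.shape == (n_prompt + 3, d)
```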
P-tuning
It adds a prompt encoder to encode pseudo prompts and the encoded
representation is used to prompt the input.
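The difference from prompt tuning is the encoder over the pseudo prompts. As a rough sketch, here the encoder is a small MLP; the original P-tuning uses a different encoder (e.g., an LSTM-based one), so treat the architecture and sizes below as assumptions:

```python
import numpy as np

# Hypothetical sizes for illustration.
n_pseudo, d = 4, 8
rng = np.random.default_rng(3)

pseudo = rng.normal(size=(n_pseudo, d))  # trainable pseudo-prompt embeddings
W1 = rng.normal(size=(d, d))             # prompt-encoder weights; a 2-layer MLP
W2 = rng.normal(size=(d, d))             # stands in for P-tuning's actual encoder

def prompt_encoder(p):
    """Encode pseudo prompts; the encoded vectors are prepended to the input."""
    return np.tanh(p @ W1) @ W2

encoded = prompt_encoder(pseudo)
assert encoded.shape == (n_pseudo, d)
```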
Experimental Setup
Datasets
1 Summarization (XSum): they split the XSum dataset into news
articles for training and sports articles for testing.
2 Dialogue (Wizard of Wikipedia / CMU DoG): they skip the
knowledge-retrieval step and use the gold knowledge for response
generation, testing over all test-set dialogue turns except the first.
Metrics
1 Quality Metrics
2 Faithfulness Metrics
Results
1 In-domain Results
2 Cross-domain and Cross-dataset Generalization
3 Faithfulness
In-domain Results
They attribute this result to the structural bias of Adapter:
the skip-connection structure allows Adapter to add only a small
deviation to the activation, which smooths optimization from the
PLM checkpoint.
Scaling up to the 530B model
Because Adapter outperforms the other methods, they apply AP to one
of the largest GPT models, MT-NLG (530B).
The result shows that a decoder-only model can still beat an
encoder-decoder model, but it needs a much larger model size.
Scaling up varying parameter sizes for PERMs
Here "model size" refers to the size of the trainable parameters,
while "parameters" refers to the extra parameters added at inference.
Cross-domain and Cross-dataset Generalization

Faithfulness
Conclusion
In this paper, they extensively compare PERMs with finetuning over
three main areas:
1 In-domain evaluation by scaling both the sample size and model
size
2 Cross-domain and cross-dataset generalization
3 Faithfulness of generations
Not all PERMs can easily achieve better cross-domain and
cross-dataset scores than finetuning, even with a large PLM;
Adapter is the better choice among PERMs in such cases.
Prefix tuning is the best method for faithfulness.
Limitations
They are only able to qualitatively show the crossover point at
which FT becomes better than AP
The comparison covers only summarization and dialogue generation
when choosing between these methods
For faithfulness, when both the model and the dataset are large
enough, PF achieves scores quite close to FT
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 

Evaluating Parameter Efficient Learning for Generation.pdf

  • 8. PERMs Introduction Introduction In this paper, they present a comprehensive evaluation of PERMs for generation tasks in NLP, comparing their performance to finetuning from three new perspectives. Their study sheds light on the effectiveness of PERMs for adapting PLMs to downstream tasks and provides valuable insights into their potential applications in real-world scenarios. 8 / 30
  • 9. PERMs Introduction Contribution They conducted a thorough evaluation of parameter efficient learning methods (PERMs) for generating natural language text They compared PERMs to finetuning from three new perspectives, including the impact of sample and model size, generalization to new domains and datasets, and the accuracy of generated text Their study provides insights into how PERMs can help pre-trained language models (PLMs) adapt to new tasks with limited training data They offer valuable information on how PERMs can be used in real-world scenarios where training large models is difficult or expensive 9 / 30
  • 10. PERMs Methodology Table of contents 1 Abstract 2 Introduction 3 Methodology 4 Experimental Setup 5 Results 6 Conclusion 7 Limitations 10 / 30
  • 11. PERMs Methodology Methodology They compare the following four PERMs to finetuning (FT) using GPT-style models from Megatron-LM 1 Adapter (AP) 2 Prefix Tuning (PF) 3 Prompt Tuning (PT) 4 P-tuning 11 / 30
  • 12. PERMs Methodology Adapter This method adds an extra layer with a bottleneck structure by first projecting the input h to a low dimension with trainable weights W_down and then projecting back up to the original dimension with trainable weights W_up. Adapter(h) = h + g(h W_down) W_up, where g is the activation function. 12 / 30
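The adapter update above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the dimensions, the GELU activation, and the small-weight initialization are all assumptions for the example.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation (a common choice of g)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def adapter(h, W_down, W_up):
    """Bottleneck adapter: project down, apply the nonlinearity, project up,
    and add the result back to the input through the skip connection."""
    return h + gelu(h @ W_down) @ W_up

rng = np.random.default_rng(0)
d_model, d_bottleneck = 8, 2                      # hypothetical sizes
h = rng.normal(size=(4, d_model))                 # activations for 4 tokens
W_down = rng.normal(size=(d_model, d_bottleneck)) * 0.01
W_up = rng.normal(size=(d_bottleneck, d_model)) * 0.01

out = adapter(h, W_down, W_up)
print(out.shape)  # (4, 8): same shape as the input activations
```

Note how the skip connection makes the adapter a near-identity map at small weights, which matches the slide's point that Adapter only adds a small deviation to the activation.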
  • 13. PERMs Methodology Prefix Tuning It adds trainable prefix tokens at the beginning of each transformer block: K ← concat([W_K; K]), V ← concat([W_V; V]) 13 / 30
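The two concatenations above amount to prepending trainable key/value vectors before attention is computed. The single-head sketch below, with assumed shapes and names, shows only this mechanism, not Megatron-LM's actual attention code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prefix_attention(Q, K, V, W_K, W_V):
    """Single-head attention where trainable prefix keys/values W_K, W_V
    are concatenated in front of the regular keys and values."""
    K = np.concatenate([W_K, K], axis=0)   # K <- concat([W_K; K])
    V = np.concatenate([W_V, V], axis=0)   # V <- concat([W_V; V])
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
seq, prefix_len, d = 5, 3, 8               # hypothetical sizes
Q = rng.normal(size=(seq, d))
K = rng.normal(size=(seq, d))
V = rng.normal(size=(seq, d))
W_K = rng.normal(size=(prefix_len, d))     # trainable prefix keys
W_V = rng.normal(size=(prefix_len, d))     # trainable prefix values

out = prefix_attention(Q, K, V, W_K, W_V)
print(out.shape)  # (5, 8): the prefix changes what is attended to, not the output length
```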
  • 14. PERMs Methodology Prompt Tuning This method adds extra parameters to the embedding layer and uses these trainable embeddings to prompt the input. 14 / 30
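Concretely, prompt tuning keeps the embedding table frozen and learns only a short matrix of soft-prompt vectors that is prepended to the input embeddings. A minimal sketch, with illustrative sizes and names:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d, prompt_len = 100, 8, 4
embedding = rng.normal(size=(vocab, d))         # frozen embedding table
soft_prompt = rng.normal(size=(prompt_len, d))  # the only trainable parameters

def prompt_input(token_ids):
    """Prepend trainable soft-prompt embeddings to the input embeddings."""
    x = embedding[token_ids]
    return np.concatenate([soft_prompt, x], axis=0)

x = prompt_input(np.array([5, 17, 42]))
print(x.shape)  # (7, 8): prompt_len + sequence length
```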
  • 15. PERMs Methodology P-tuning It adds a prompt encoder to encode pseudo prompts and the encoded representation is used to prompt the input. 15 / 30
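The difference from prompt tuning is that the pseudo-prompt embeddings are first passed through a prompt encoder, and the encoded vectors are what get prepended. The sketch below uses a small MLP as a stand-in encoder (the original P-tuning work uses an LSTM-based encoder); all sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
prompt_len, d, d_hidden = 4, 8, 16
pseudo_prompt = rng.normal(size=(prompt_len, d))  # pseudo-prompt embeddings
W1 = rng.normal(size=(d, d_hidden)) * 0.1         # prompt-encoder weights
W2 = rng.normal(size=(d_hidden, d)) * 0.1

def prompt_encoder(p):
    """Encode pseudo prompts; the encoded representation (not the raw
    embeddings) is used to prompt the input."""
    return np.tanh(p @ W1) @ W2

def ptuning_input(x):
    return np.concatenate([prompt_encoder(pseudo_prompt), x], axis=0)

x = rng.normal(size=(3, d))
out = ptuning_input(x)
print(out.shape)  # (7, 8)
```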
  • 16. PERMs Experimental Setup Table of contents 1 Abstract 2 Introduction 3 Methodology 4 Experimental Setup 5 Results 6 Conclusion 7 Limitations 16 / 30
  • 17. PERMs Experimental Setup Experimental Setup Datasets 1 Summarization (Xsum): split the Xsum dataset into news articles for training and sports articles for testing. 2 Dialogue (Wizard of Wikipedia / CMU DoG): they ignore the knowledge retrieval step and take the gold knowledge for response generation, and they test their model on all test-set dialogue turns except the starting one. Metrics 1 Quality Metrics 2 Faithfulness Metrics 17 / 30
  • 18. PERMs Results Table of contents 1 Abstract 2 Introduction 3 Methodology 4 Experimental Setup 5 Results 6 Conclusion 7 Limitations 18 / 30
  • 19. PERMs Results Results 1 In-domain Results 2 Cross-domain and Cross-dataset Generalization 3 Faithfulness 19 / 30
  • 20. PERMs Results In-domain Results They attribute this result to the structural bias of Adapter: the skip-connection structure allows Adapter to add a small deviation to the activation, which makes optimization from the PLM checkpoint smooth. 20 / 30
  • 22. PERMs Results Scaling up to 530b model Because Adapter achieves better performance than the other methods, they apply AP to one of the largest GPT models, MT-NLG. This result shows that a decoder-only model can still beat an encoder-decoder model, but it requires a much larger model size. 22 / 30
  • 23. PERMs Results Scaling up varying parameter sizes for PERMs Here, model size refers to the size of the trainable parameters, while parameters refers to the extra parameters required at inference. 23 / 30
  • 27. PERMs Conclusion Table of contents 1 Abstract 2 Introduction 3 Methodology 4 Experimental Setup 5 Results 6 Conclusion 7 Limitations 27 / 30
  • 28. PERMs Conclusion Conclusion In this paper, they extensively compare PERMs with finetuning over three main areas: 1 In-domain evaluation by scaling both the sample size and model size 2 Cross-domain and cross-dataset generalization 3 Faithfulness of generations Compared to finetuning, not all PERMs can easily achieve better cross-domain and cross-dataset scores, even with a large PLM; Adapter is a better choice than the other PERMs in such cases, and Prefix Tuning is the best method for faithfulness. 28 / 30
  • 29. PERMs Limitations Table of contents 1 Abstract 2 Introduction 3 Methodology 4 Experimental Setup 5 Results 6 Conclusion 7 Limitations 29 / 30
  • 30. PERMs Limitations Limitations They are only able to qualitatively show the crossover point at which FT becomes better than AP The study covers only summarization and dialogue generation when choosing between these methods For faithfulness, when both the model and the dataset are large enough, PF achieves scores quite close to FT 30 / 30