PERMs
Evaluating Parameter Efficient Learning for Generation
Peng Xu, Mostofa Patwary, Shrimai Prabhumoye et al.
National Yang Ming Chiao Tung University, Hsinchu
Speaker: Po-Chuan Chen
May 9, 2023
Table of contents
1 Abstract
2 Introduction
3 Methodology
4 Experimental Setup
5 Results
6 Conclusion
7 Limitations
Abstract
In this paper, they present a comprehensive evaluation of parameter
efficient learning methods (PERMs) for generation tasks in natural
language processing.
They compare PERMs to finetuning from three new perspectives,
including
1 The impact of sample and model size
2 Generalization to unseen domains and datasets
3 Faithfulness of generations
Their results show that PERMs can outperform finetuning in
certain scenarios, particularly when training with fewer samples
and using larger pre-trained language models.
This study provides valuable insights into the effectiveness of PERMs
for adapting pre-trained language models to downstream tasks.
Introduction
Recent advances in pre-trained language models (PLMs) have transformed natural language processing (NLP), enabling state-of-the-art performance on a wide range of tasks.
However, adapting these large models to specific downstream tasks by finetuning all of their parameters is computationally expensive and time-consuming.
Parameter efficient learning methods (PERMs) have emerged as a promising alternative: they keep the PLM largely frozen and train only a small number of additional parameters, which can be especially effective when training data is limited.
In this paper, they present a comprehensive evaluation of PERMs
for generation tasks in NLP, comparing their performance to
finetuning from three new perspectives.
Their study sheds light on the effectiveness of PERMs for adapting
PLMs to downstream tasks and provides valuable insights into their
potential applications in real-world scenarios.
Contribution
They conducted a thorough evaluation of parameter efficient learning methods (PERMs) for natural language generation.
They compared PERMs to finetuning from three new perspectives: the impact of sample and model size, generalization to new domains and datasets, and the faithfulness of the generated text.
Their study provides insights into how PERMs can help pre-trained language models (PLMs) adapt to new tasks with limited training data.
They offer practical guidance on using PERMs in real-world scenarios where training large models is difficult or expensive.
Methodology
They compare the following four PERMs to finetuning (FT), using GPT-style models from Megatron-LM:
1 Adapter (AP)
2 Prefix Tuning (PF)
3 Prompt Tuning (PT)
4 P-tuning
Adapter
This method adds an extra layer with a bottleneck structure: it first projects the input $h$ to a low dimension using trainable weights $W_{\mathrm{down}}$, then projects it back up to the original dimension using trainable weights $W_{\mathrm{up}}$.
$$\mathrm{Adapter}(h) = h + g(h W_{\mathrm{down}})\, W_{\mathrm{up}}$$
where $g$ is the activation function.
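The adapter formula maps directly onto a few lines of NumPy. This is a minimal single-sequence sketch, not the Megatron-LM implementation: the choice of GELU for $g$, the initialization scale, and zero-initializing $W_{\mathrm{up}}$ are all assumptions made for illustration.

```python
import numpy as np


def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of GELU; stands in for the unspecified activation g
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class Adapter:
    """Bottleneck adapter: Adapter(h) = h + g(h W_down) W_up."""

    def __init__(self, hidden: int, bottleneck: int, rng: np.random.Generator):
        # W_down: hidden -> bottleneck (trainable)
        self.W_down = rng.normal(0.0, 0.02, size=(hidden, bottleneck))
        # W_up: bottleneck -> hidden (trainable); zero init is an assumption
        # that makes the adapter start out as the identity map
        self.W_up = np.zeros((bottleneck, hidden))

    def __call__(self, h: np.ndarray) -> np.ndarray:
        # h: (seq_len, hidden). The skip connection means the adapter only
        # adds a deviation on top of the frozen layer's activation.
        return h + gelu(h @ self.W_down) @ self.W_up

    def num_params(self) -> int:
        # only W_down and W_up are trained; the PLM weights stay frozen
        return self.W_down.size + self.W_up.size
```

For example, `Adapter(16, 4, np.random.default_rng(0))` has 2 × 16 × 4 = 128 trainable parameters and, until $W_{\mathrm{up}}$ is updated, returns its input unchanged.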
Prefix Tuning
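It prepends trainable prefix vectors to the keys $K$ and values $V$ of the attention layer in each transformer block, with trainable prefix weights $W_K$ and $W_V$:
$$K \leftarrow \mathrm{concat}([W_K; K]), \qquad V \leftarrow \mathrm{concat}([W_V; V])$$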
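The two concatenations can be seen inside a single attention call. This sketch assumes one head, no batch dimension, and no causal mask, so it only illustrates where $W_K$ and $W_V$ enter; a real decoder would apply masking and use multiple heads.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_with_prefix(Q, K, V, W_K=None, W_V=None):
    """Single-head scaled dot-product attention in which trainable prefix
    vectors W_K, W_V (each prefix_len x d) are prepended to K and V."""
    if W_K is not None:
        K = np.concatenate([W_K, K], axis=0)  # K <- concat([W_K; K])
        V = np.concatenate([W_V, V], axis=0)  # V <- concat([W_V; V])
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # (n_queries, prefix_len + n_keys)
    return softmax(scores) @ V     # (n_queries, d)
```

Only `W_K` and `W_V` would be trained; `Q`, `K`, and `V` come from the frozen model. With a prefix of length zero the function reduces to ordinary attention.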
Prompt Tuning
This method adds extra trainable parameters at the embedding layer: a small set of trainable (soft) prompt embeddings is prepended to the input embeddings and used to prompt the model.
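A minimal sketch of prompt tuning, assuming the PLM's embedding table is available as a NumPy array. The class name and initialization are hypothetical; the point is that the soft prompt is the only trainable tensor.

```python
import numpy as np


class SoftPrompt:
    """Prompt tuning sketch: trainable prompt embeddings prepended to the
    frozen token embeddings of the input."""

    def __init__(self, prompt_len: int, hidden: int, rng: np.random.Generator):
        # the only trainable tensor: (prompt_len, hidden)
        self.prompt = rng.normal(0.0, 0.02, size=(prompt_len, hidden))

    def __call__(self, token_ids: np.ndarray, embedding_table: np.ndarray) -> np.ndarray:
        # embedding_table: the PLM's frozen (vocab_size, hidden) embedding matrix
        input_embeds = embedding_table[token_ids]  # (seq_len, hidden)
        # result: (prompt_len + seq_len, hidden), fed to the frozen transformer
        return np.concatenate([self.prompt, input_embeds], axis=0)
```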
P-tuning
It adds a prompt encoder that encodes a sequence of pseudo prompt tokens; the encoded representation is then used to prompt the input.
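The difference from prompt tuning is that the prompt vectors pass through an encoder before being prepended. The slides do not specify the encoder; the original P-tuning work used an LSTM-based encoder, and this sketch swaps in a hypothetical two-layer ReLU MLP (biases omitted) to keep it short.

```python
import numpy as np


class PTuningPrompt:
    """P-tuning sketch: pseudo prompt embeddings are passed through a prompt
    encoder, and the encoded vectors are prepended to the input embeddings.
    The two-layer ReLU MLP encoder is a simplifying assumption."""

    def __init__(self, prompt_len: int, hidden: int, enc_hidden: int, rng: np.random.Generator):
        # trainable pseudo prompt embeddings
        self.pseudo = rng.normal(0.0, 0.02, size=(prompt_len, hidden))
        # trainable prompt-encoder weights (hypothetical MLP, biases omitted)
        self.W1 = rng.normal(0.0, 0.02, size=(hidden, enc_hidden))
        self.W2 = rng.normal(0.0, 0.02, size=(enc_hidden, hidden))

    def encode(self) -> np.ndarray:
        # (prompt_len, hidden) encoded prompt representation
        return np.maximum(self.pseudo @ self.W1, 0.0) @ self.W2

    def __call__(self, token_ids: np.ndarray, embedding_table: np.ndarray) -> np.ndarray:
        # prepend the encoded prompt to the frozen input embeddings
        return np.concatenate([self.encode(), embedding_table[token_ids]], axis=0)
```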
Experimental Setup
Datasets
1 Summarization (Xsum): the Xsum dataset is split into news articles for training and sports articles for testing.
2 Dialogue (Wizard of Wikipedia / CMU DoG): they skip the knowledge retrieval step and use the gold knowledge for response generation. Models are tested on all test-set dialogue turns except the first one.
Metrics
1 Quality Metrics
2 Faithfulness Metrics
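The dialogue setup described above (gold knowledge instead of retrieval, every turn except the first used as a target) can be sketched as a small data-preparation step. Everything here is hypothetical: the `Turn` record, its `gold_knowledge` field, and the `knowledge: ... history: ...` input format are illustrative assumptions, not the format used in the paper or in the Wizard of Wikipedia / CMU DoG releases.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    speaker: str
    text: str
    gold_knowledge: str  # hypothetical: annotated knowledge for this response


def build_examples(dialogue: List[Turn]) -> List[Tuple[str, str]]:
    """Turn one dialogue into (source, target) pairs for response generation.

    Knowledge retrieval is skipped: the gold knowledge is fed in directly.
    The starting turn is never a target, since it has no dialogue history.
    """
    examples = []
    for i in range(1, len(dialogue)):
        history = " ".join(turn.text for turn in dialogue[:i])
        target = dialogue[i]
        source = f"knowledge: {target.gold_knowledge} history: {history}"
        examples.append((source, target.text))
    return examples
```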
Results
1 In-domain Results
2 Cross-domain and Cross-dataset Generalization
3 Faithfulness
In-domain Results
They attribute Adapter's stronger in-domain results, compared with the other PERMs, to its structural bias.
The skip connection lets Adapter add only a small deviation to the activation, which keeps optimization starting from the PLM checkpoint smooth.
Scaling up to the 530B model
Because Adapter performs better than the other PERMs, they apply AP to one of the largest GPT-style models, the 530B-parameter MT-NLG.
The result shows that a decoder-only model can still beat an encoder-decoder model, but only with a much larger model size.
Scaling up PERMs with varying parameter sizes
Here, "model size" refers to the number of trainable parameters, while "parameters" refers to the extra parameters needed at inference time.
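The distinction between trainable parameters and extra inference parameters can be made concrete with a little arithmetic. The sketch below counts both for each PERM, assuming bias-free weight matrices, prefix tuning without a reparameterization network, and a hypothetical two-layer MLP prompt encoder for P-tuning; the formulas are illustrative, not the exact configurations used in the paper.

```python
from typing import Tuple


def adapter_params(hidden: int, bottleneck: int, layers: int) -> Tuple[int, int]:
    # W_down (hidden x bottleneck) + W_up (bottleneck x hidden) per layer, biases ignored;
    # all of them are trained and all of them are kept at inference
    n = layers * 2 * hidden * bottleneck
    return n, n


def prefix_params(hidden: int, prefix_len: int, layers: int) -> Tuple[int, int]:
    # W_K and W_V (each prefix_len x hidden) per layer, no reparameterization network
    n = layers * 2 * prefix_len * hidden
    return n, n


def prompt_params(hidden: int, prompt_len: int) -> Tuple[int, int]:
    # one (prompt_len x hidden) soft-prompt matrix at the embedding layer
    n = prompt_len * hidden
    return n, n


def ptuning_params(hidden: int, prompt_len: int, enc_hidden: int) -> Tuple[int, int]:
    # trained: pseudo prompts + a hypothetical 2-layer MLP encoder (biases ignored)
    trainable = prompt_len * hidden + 2 * hidden * enc_hidden
    # at inference the encoder output can be computed once and cached,
    # so only the (prompt_len x hidden) encoded prompt is needed
    extra_at_inference = prompt_len * hidden
    return trainable, extra_at_inference
```

With hidden size 1024, a P-tuning prompt of length 100 and encoder width 512 trains 1,150,976 parameters but needs only 102,400 extra parameters at inference, whereas for Adapter the two counts coincide.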
Cross-domain and Cross-dataset Generalization
Faithfulness
Conclusion
In this paper, they extensively compare PERMs with finetuning over
three main areas:
1 In-domain evaluation by scaling both the sample size and model
size
2 Cross-domain and cross-dataset generalization
3 Faithfulness of generations
Not all PERMs easily achieve better cross-domain and cross-dataset scores than finetuning, even with a large PLM; Adapter is a better choice than the other PERMs in such cases.
Prefix Tuning is the best method for faithfulness.
Limitations
They can only show qualitatively the cross point, i.e., the training-set size beyond which FT outperforms AP.
Their guidance on choosing between these methods covers only summarization and dialogue generation.
For faithfulness, when both the model and the dataset are large enough, PF scores quite close to FT.
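A rough way to locate such a cross point from a handful of measurements is to interpolate between the two sample sizes where the score difference changes sign. The helper below is a hypothetical sketch, not the authors' procedure: it assumes both methods are scored at the same increasing sample sizes with a higher-is-better metric, and it interpolates linearly in sample size.

```python
from typing import Optional, Sequence


def cross_point(sizes: Sequence[float], ft_scores: Sequence[float],
                ap_scores: Sequence[float]) -> Optional[float]:
    """Estimate the training-set size at which FT catches up with AP.

    sizes must be increasing; scores are higher-is-better and measured at
    the same sizes for both methods. Returns None if FT never catches up.
    """
    diffs = [f - a for f, a in zip(ft_scores, ap_scores)]
    if diffs[0] >= 0:
        return sizes[0]  # FT is already at least as good at the smallest size
    for i in range(1, len(sizes)):
        d0, d1 = diffs[i - 1], diffs[i]
        if d0 < 0 <= d1:
            # linear interpolation between the two bracketing sample sizes
            t = -d0 / (d1 - d0)
            return sizes[i - 1] + t * (sizes[i] - sizes[i - 1])
    return None
```

Sample sizes in such studies are often log-spaced, so interpolating in log sample size may be a better choice; the linear version is kept here for simplicity.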
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 

Evaluating Parameter Efficient Learning for Generation.pdf

  • 1. PERMs Evaluating Parameter Efficient Learning for Generation Peng Xu, Mostofa Patwary, Shrimai Prabhumoye et al. National Yang Ming Chiao Tung University, Hsinchu Speaker: Po-Chuan Chen May 9, 2023 1 / 30
  • 2. Table of contents: 1 Abstract, 2 Introduction, 3 Methodology, 4 Experimental Setup, 5 Results, 6 Conclusion, 7 Limitations
  • 3. Abstract
  • 4. Abstract: In this paper, they present a comprehensive evaluation of parameter efficient learning methods (PERMs) for generation tasks in natural language processing. They compare PERMs to finetuning from three new perspectives: (1) the impact of sample and model size, (2) generalization to unseen domains and datasets, and (3) faithfulness of generations.
  • 5. Abstract: Their results show that PERMs can outperform finetuning in certain scenarios, particularly when training with fewer samples and using larger pre-trained language models. This study provides valuable insights into the effectiveness of PERMs for adapting pre-trained language models to downstream tasks.
  • 6. Introduction
  • 7. Introduction: Recent advances in pre-trained language models (PLMs) have revolutionized natural language processing (NLP), enabling state-of-the-art performance on a wide range of tasks. However, adapting these large, complex models to specific downstream tasks can be computationally expensive and time-consuming. Parameter efficient learning methods (PERMs) have emerged as a promising solution, letting PLMs adapt efficiently to new tasks with limited training data.
  • 8. Introduction: In this paper, they present a comprehensive evaluation of PERMs for generation tasks in NLP, comparing their performance to finetuning from three new perspectives. The study sheds light on how effective PERMs are for adapting PLMs to downstream tasks and on their potential in real-world applications.
  • 9. Introduction / Contribution: (1) They conduct a thorough evaluation of PERMs for natural language generation. (2) They compare PERMs to finetuning from three new perspectives: the impact of sample and model size, generalization to new domains and datasets, and the faithfulness of the generated text. (3) Their study shows how PERMs can help pre-trained language models (PLMs) adapt to new tasks with limited training data. (4) They offer guidance on using PERMs in real-world scenarios where training large models is difficult or expensive.
  • 10. Methodology
  • 11. Methodology: They compare the following four PERMs to finetuning (FT), using GPT-style models from Megatron-LM: (1) Adapter (AP), (2) Prefix Tuning (PF), (3) Prompt Tuning (PT), and (4) P-tuning.
  • 12. Methodology / Adapter: This method adds an extra layer with a bottleneck structure. It first projects the input $h$ down to a low dimension with trainable weights $W_{\text{down}}$, then projects back up to the original dimension with trainable weights $W_{\text{up}}$: $\mathrm{Adapter}(h) = h + g(h W_{\text{down}}) W_{\text{up}}$, where $g$ is the activation function.
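  • 13. Methodology / Prefix Tuning: It prepends trainable prefix vectors to the attention keys and values in every transformer block: $K \leftarrow \mathrm{concat}([W_K; K])$ and $V \leftarrow \mathrm{concat}([W_V; V])$, where $W_K$ and $W_V$ are the trainable prefixes.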
  • 14. Methodology / Prompt Tuning: This method adds extra parameters to the embedding layer and uses these trainable embeddings to prompt the input.
  • 15. Methodology / P-tuning: It adds a prompt encoder that encodes pseudo prompts; the encoded representation is then used to prompt the input.
  • 16. Experimental Setup
  • 17. Experimental Setup: Datasets: (1) Summarization (Xsum): they split the Xsum dataset by domain, training on news articles and testing on sports articles. (2) Dialogue (Wizard of Wikipedia / CMU DoG): they skip the knowledge retrieval step and use the gold knowledge for response generation, and they evaluate on every dialogue turn in the test set except the first. Metrics: (1) quality metrics, (2) faithfulness metrics.
  • 18. Results
  • 19. Results: (1) in-domain results, (2) cross-domain and cross-dataset generalization, (3) faithfulness.
  • 20. Results / In-domain Results: They attribute Adapter's strong in-domain results to its structural bias. Thanks to its skip connection, Adapter only adds a small deviation to the activation, which keeps the optimization starting from the PLM checkpoint smooth.
  • 22. Results / Scaling up to a 530B model: Because Adapter performs better than the other PERMs, they apply AP to one of the largest GPT-style models, the 530B-parameter MT-NLG. The result shows that a decoder-only model can still beat an encoder-decoder model, but only at a much larger model size.
  • 23. Results / Scaling parameter sizes for PERMs: Here, "model size" refers to the number of trainable parameters, and "parameters" refers to the extra parameters needed at inference.
  • 27. Conclusion
  • 28. Conclusion: In this paper, they extensively compare PERMs with finetuning in three main areas: (1) in-domain evaluation, scaling both sample size and model size; (2) cross-domain and cross-dataset generalization; (3) faithfulness of generations. Not all PERMs easily achieve better cross-domain and cross-dataset scores than finetuning, even with a large PLM; Adapter is a better choice than the other PERMs in such cases. Prefix tuning is the best method for faithfulness.
  • 29. Limitations
  • 30. Limitations: (1) They can only show qualitatively the crossover point at which FT becomes better than AP. (2) Their guidance on choosing between these methods covers only summarization and dialogue generation. (3) For faithfulness, when both the model and the dataset are large enough, PF achieves scores quite close to FT.
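A minimal sketch of the adapter formula from slide 12, Adapter(h) = h + g(h W_down) W_up, in plain Python. The ReLU activation for g, the absence of bias terms, and the helper names (matmul, relu, adapter) are assumptions for illustration, not the authors' Megatron-LM code. Setting W_up to zero shows the skip-connection argument from slide 20: the adapter then returns h unchanged, so if W_up starts near zero (an assumed initialization), training begins from the PLM's own activations and only adds a small deviation.

```python
# Sketch of the bottleneck adapter: Adapter(h) = h + g(h W_down) W_up
# Assumptions (not from the slides): plain Python lists as matrices,
# ReLU as the activation g, no bias terms.

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [
        [sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
        for row in a
    ]


def relu(x):
    """Element-wise ReLU over a matrix (assumed choice for the activation g)."""
    return [[max(0.0, v) for v in row] for row in x]


def adapter(h, w_down, w_up, g=relu):
    """Apply the adapter to every token representation (row) of h.

    h: n x d, w_down: d x r (r < d is the bottleneck), w_up: r x d.
    Only w_down and w_up would be trained; the PLM itself stays frozen.
    """
    delta = matmul(g(matmul(h, w_down)), w_up)  # low-rank deviation
    # Skip connection: add the deviation on top of the frozen activation.
    return [[hv + dv for hv, dv in zip(h_row, d_row)] for h_row, d_row in zip(h, delta)]


h = [[1.0, 2.0, 3.0, 4.0]]                                 # one token, d = 4
w_down = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]  # d x r, r = 2
w_zero = [[0.0] * 4, [0.0] * 4]                            # W_up = 0
w_up = [[0.1, 0.0, 0.0, 0.0], [0.0, 0.1, 0.0, 0.0]]

print(adapter(h, w_down, w_zero))  # [[1.0, 2.0, 3.0, 4.0]]: h passes through unchanged
print(adapter(h, w_down, w_up))    # h plus a small deviation in the first two dims
```

With a bottleneck of size r, the adapter adds only 2·d·r weights per insertion point, which is what keeps it parameter efficient.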
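The prefix-tuning update on slide 13, K ← concat([W_K; K]) and V ← concat([W_V; V]), can be sketched for a single attention head and a single query. The trainable prefixes are placed in front of the frozen keys and values, so attention can also attend to the prefix positions. This is an illustrative sketch under simplifying assumptions (one head, one layer, plain lists, standard scaled dot-product attention); the function names are hypothetical.

```python
import math

# Sketch of prefix tuning for ONE attention head in ONE layer:
#   K <- concat([W_K; K]),  V <- concat([W_V; V])
# Assumptions (not from the slides): single query vector, plain lists,
# standard scaled dot-product attention.


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, keys, values):
    """Scaled dot-product attention of one query over lists of key/value vectors."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def prefix_attention(q, keys, values, prefix_k, prefix_v):
    """Prepend trainable prefix keys/values (W_K, W_V) to the frozen model's K and V."""
    return attention(q, prefix_k + keys, prefix_v + values)


q = [0.0, 0.0]
keys, values = [[1.0, 0.0]], [[2.0, 4.0]]        # produced by the frozen PLM
prefix_k, prefix_v = [[0.0, 1.0]], [[0.0, 0.0]]  # trainable, one prefix position

print(attention(q, keys, values))                             # [2.0, 4.0]
print(prefix_attention(q, keys, values, prefix_k, prefix_v))  # [1.0, 2.0]
```

With the prefix in place, the query splits its attention between the prefix and the original token, which shifts the output even though the frozen keys and values are untouched and the input sequence itself is not lengthened.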
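Prompt tuning (slide 14) can be sketched as prepending trainable "soft prompt" vectors to the frozen token embeddings before they enter the transformer. The function name, the plain-list representation, and placing the prompt at the front of the sequence are assumptions for illustration.

```python
# Sketch of prompt tuning: trainable soft-prompt vectors are prepended to the
# frozen token embeddings. Assumptions (not from the slides): plain lists,
# prompt vectors placed at the front of the input.

def prompt_tuning_inputs(token_ids, embedding_table, soft_prompt):
    """Return the input embedding sequence [soft_prompt; embed(token_ids)].

    embedding_table: frozen vocab x d matrix from the PLM.
    soft_prompt: p x d trainable matrix (the only trained parameters).
    """
    token_embs = [list(embedding_table[t]) for t in token_ids]
    return [list(p) for p in soft_prompt] + token_embs


embedding_table = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # vocab of 3, d = 2
soft_prompt = [[9.0, 9.0], [8.0, 8.0]]                   # p = 2 prompt vectors

inputs = prompt_tuning_inputs([2, 1], embedding_table, soft_prompt)
print(inputs)  # [[9.0, 9.0], [8.0, 8.0], [2.0, 2.0], [1.0, 1.0]]
```

Unlike prefix tuning, the extra vectors live only at the embedding layer, so the trainable parameter count is just p·d.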
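P-tuning (slide 15) differs from prompt tuning in that the pseudo-prompt embeddings pass through a trainable prompt encoder before being used as the prompt. The slides do not specify the encoder, so the two-layer MLP with ReLU below is an assumption, as is prepending the encoded prompts to the input; all names are hypothetical.

```python
# Sketch of P-tuning: pseudo-prompt embeddings go through a trainable prompt
# encoder, and the encoded vectors prompt the input.
# Assumptions (not from the slides): a hypothetical 2-layer MLP encoder with ReLU,
# and encoded prompts simply prepended to the token embeddings.

def linear(x, w, b):
    """x: length-n vector, w: n x m matrix (list of rows), b: length-m bias."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]


def prompt_encoder(pseudo_prompts, w1, b1, w2, b2):
    """Encode each pseudo-prompt vector as relu(x W1 + b1) W2 + b2."""
    encoded = []
    for x in pseudo_prompts:
        hidden = [max(0.0, v) for v in linear(x, w1, b1)]
        encoded.append(linear(hidden, w2, b2))
    return encoded


def p_tuning_inputs(token_ids, embedding_table, pseudo_prompts, encoder_params):
    """Prepend the encoded prompts to the frozen token embeddings."""
    prompts = prompt_encoder(pseudo_prompts, *encoder_params)
    return prompts + [list(embedding_table[t]) for t in token_ids]


encoder_params = (
    [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],  # W1, b1
    [[2.0, 0.0], [0.0, 2.0]], [1.0, 1.0],  # W2, b2
)
pseudo_prompts = [[1.0, -1.0]]
embedding_table = [[0.5, 0.5]]

print(p_tuning_inputs([0], embedding_table, pseudo_prompts, encoder_params))
# [[3.0, 1.0], [0.5, 0.5]]
```

Here both the pseudo prompts and the encoder weights would be trained, while the embedding table and the PLM stay frozen.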
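The data preparation described on slide 17 can be sketched in two parts: a cross-domain Xsum split (train on news, test on sports) and dialogue evaluation instances that use the gold knowledge directly and skip the first turn. The dict field names ("domain", "text", "gold_knowledge") and the function names are hypothetical; the real datasets use their own formats.

```python
# Sketch of the slide-17 data preparation.
# Assumptions (not from the slides): examples are dicts with hypothetical
# "domain", "text" and "gold_knowledge" fields.

def split_by_domain(examples, train_domain="news", test_domain="sports"):
    """Cross-domain Xsum setup: train on one domain, test on another."""
    train = [ex for ex in examples if ex["domain"] == train_domain]
    test = [ex for ex in examples if ex["domain"] == test_domain]
    return train, test


def dialogue_eval_instances(dialogue):
    """One (context, gold knowledge, target response) triple per turn, skipping turn 0.

    No retrieval step: the gold knowledge attached to each turn is used directly.
    """
    instances = []
    for i in range(1, len(dialogue)):  # start at 1: the first turn is not evaluated
        context = [turn["text"] for turn in dialogue[:i]]
        instances.append((context, dialogue[i]["gold_knowledge"], dialogue[i]["text"]))
    return instances


articles = [
    {"domain": "news", "text": "n1"},
    {"domain": "sports", "text": "s1"},
    {"domain": "news", "text": "n2"},
]
train, test = split_by_domain(articles)
print(len(train), len(test))  # 2 1

dialogue = [
    {"text": "turn 0", "gold_knowledge": "k0"},
    {"text": "turn 1", "gold_knowledge": "k1"},
    {"text": "turn 2", "gold_knowledge": "k2"},
]
for context, knowledge, target in dialogue_eval_instances(dialogue):
    print(context, knowledge, target)
# ['turn 0'] k1 turn 1
# ['turn 0', 'turn 1'] k2 turn 2
```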
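Slide 23 compares the methods as their number of trainable parameters grows. As a rough, back-of-the-envelope illustration of how that number scales with each method's main knob (bottleneck size, prefix length, prompt length), the counts below assume one bias-free adapter per layer, key and value prefixes in every layer, prompts only at the input, and no reparameterization. The dimensions are illustrative and are not the paper's configurations.

```python
# Rough trainable-parameter counts per PERM.
# Assumptions (not from the slides): one adapter per transformer layer without
# biases, prefixes on keys and values in every layer, prompts only at the input,
# no reparameterization. Dimensions are illustrative only.

def adapter_params(num_layers, d_model, bottleneck):
    return num_layers * 2 * d_model * bottleneck  # W_down (d x r) + W_up (r x d)


def prefix_params(num_layers, d_model, prefix_len):
    return num_layers * 2 * prefix_len * d_model  # W_K and W_V in every layer


def prompt_params(d_model, prompt_len):
    return prompt_len * d_model  # soft-prompt embeddings only


L, d = 24, 1024
print(adapter_params(L, d, bottleneck=16))  # 786432
print(prefix_params(L, d, prefix_len=10))   # 491520
print(prompt_params(d, prompt_len=10))      # 10240
```

Adapter and prefix tuning grow with depth, while prompt tuning touches only the input embeddings, which is one reason the methods end up at very different points on a parameter-size axis.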