MetaPrompter
Effective Structured Prompting by Meta-Learning
and Representative Verbalizer
ICML, 2023
Weisen Jiang, Yu Zhang, James T. Kwok
Speaker: Po-Chuan Chen
Apr 11, 2024
1 / 45
MetaPrompter
Table of contents
1 Abstract
2 Introduction
3 Preliminaries and Related Work
4 Proposed Method
5 Experiments
6 Conclusion
2 / 45
MetaPrompter
Abstract
Table of contents
1 Abstract
2 Introduction
3 Preliminaries and Related Work
4 Proposed Method
5 Experiments
6 Conclusion
3 / 45
MetaPrompter
Abstract
Abstract
Due to the limited training data, prompt initialization is crucial for
prompt tuning.
MetaPrompting [9] utilizes meta-learning to acquire a shared
initialization for task-specific prompts.
However, a singular initialization proves inadequate for generating
effective prompts across all complex tasks and samples.
Furthermore, tuning the entire MLM incurs substantial resource and
computational burdens.
4 / 45
MetaPrompter
Abstract
Abstract (Cont.)
This paper utilizes a prompt pool to leverage task-specific knowledge
and generate instance-specific prompts using attention mechanisms.
Additionally, it introduces a novel soft verbalizer (RepVerb) that
directly constructs label embeddings from feature embeddings.
MetaPrompter offers parameter efficiency, requiring only tuning of the
prompt pool.
5 / 45
MetaPrompter
Introduction
Table of contents
1 Abstract
2 Introduction
3 Preliminaries and Related Work
4 Proposed Method
5 Experiments
6 Conclusion
6 / 45
MetaPrompter
Introduction
Introduction
Common ways to adapt a pre-trained large language model to downstream tasks:
Fine-tuning [4]
Adapter tuning [10]
Prompt learning [13]
7 / 45
MetaPrompter
Introduction
Prompt learning
It formulates the downstream task as a cloze-style MLM problem.
It wraps an input text with a discrete prompt (e.g., “Topic is
[MASK]”) and feeds it to the MLM to predict a token at the [MASK]
position.
A verbalizer then maps the predicted token to the label.
8 / 45
MetaPrompter
Introduction
Prompt tuning
The input embedding is wrapped with a continuous prompt, which can then be
combined with discrete anchor tokens to create a template, while the MLM
itself remains frozen.
9 / 45
MetaPrompter
Introduction
MetaPrompting
MetaPrompting is the state-of-the-art approach for addressing the
sensitivity of prompt tuning to initialization.
It employs meta-learning [5] to develop a meta-initialization that is
applicable across all task-specific prompts.
10 / 45
MetaPrompter
Introduction
MetaPrompting (Cont.)
However, MetaPrompting suffers from three problems:
1 Crafting effective prompts for all tasks and samples from a single
meta-initialized prompt is difficult when tasks are complex.
2 MetaPrompting employs a custom verbalizer, yet choosing
effective label tokens is labor-intensive and impractical for
extensive label sets.
3 MetaPrompting requires expensive tuning of the whole MLM.
11 / 45
MetaPrompter
Introduction
Meta training and testing procedures of MetaPrompting
12 / 45
MetaPrompter
Introduction
Contribution
This paper focuses on meta-learning a prompt pool, serving as
shared meta-knowledge, to enhance adaptability for complex
tasks.
A new soft verbalizer, called Representative Verbalizer
(RepVerb), is introduced in the paper. It constructs label
embeddings by averaging feature embeddings of respective
training samples.
In contrast to MetaPrompting, this approach requires
significantly fewer parameters, specifically 1000× fewer.
13 / 45
MetaPrompter
Preliminaries and Related Work
Table of contents
1 Abstract
2 Introduction
3 Preliminaries and Related Work
Prompt Learning
Meta-Learning for Prompt Learning
4 Proposed Method
5 Experiments
6 Conclusion
14 / 45
MetaPrompter
Preliminaries and Related Work
Prompt Learning
Prompt Learning
Given a sequence of n tokens (x1, . . . , xn), the MLM uses
x = ([CLS], x1, . . . , xn, [SEP]) as input and encodes it into hidden
representations (h[CLS], h1, . . . , hn, h[SEP]).
With fine-tuning, an extra classifier is added on top of h[CLS] to
predict the label distribution.
On the other hand, prompt learning freezes the pre-trained model
and formulates the downstream task as a cloze-style MLM M(·; 𝝓)
problem.
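For contrast with prompt learning, a minimal sketch of the fine-tuning setup described above (the hidden size and label count are placeholders, not values from the paper): an extra linear classifier is added on top of h[CLS], whereas prompt learning keeps the pre-trained model frozen.

    import torch

    d_o, num_labels = 768, 5
    classifier = torch.nn.Linear(d_o, num_labels)   # extra head added on top of h_[CLS]

    def finetune_logits(h_cls):
        # h_cls: (batch, d_o) hidden representation of the [CLS] token
        return classifier(h_cls)                    # logits over the labels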
15 / 45
MetaPrompter
Preliminaries and Related Work
Prompt Learning
Prompt Learning (Cont.)
In topic classification, “Topic is [MASK]” can be used as the
prompt.
If the discrete prompts are used, the input text x is wrapped with the
prompt and mapped to an input embedding sequence
(E(x), E(Topic), E(is), E([MASK])).
16 / 45
MetaPrompter
Preliminaries and Related Work
Prompt Learning
Prompt Learning (Cont.)
Prompt tuning instead uses a continuous prompt 𝜽 ∈ R^(Lp×di), so the input
embedding sequence becomes (E(x), 𝜽, E([MASK])).
This can be further combined with anchor tokens to form a template:
x̃ = T(x; 𝜽) = (E(x), 𝜽, E(Topic), E(is), E([MASK]))
The MLM outputs h[MASK](x̃) ∈ R^do, from which the token to fill at the
[MASK] position is inferred.
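As a rough PyTorch-style sketch of this template construction (the shapes, token choices, and embedding inputs are assumptions, not the paper's implementation), the wrapped sequence is simply a concatenation of the input embeddings, the continuous prompt, and the anchor/[MASK] embeddings:

    import torch

    L_p, d_i = 8, 768                                   # prompt length and input embedding size
    theta = torch.nn.Parameter(torch.randn(L_p, d_i))   # continuous prompt (learnable)

    def build_template(input_embeds, anchor_embeds, mask_embed):
        # input_embeds:  (n, d_i) token embeddings E(x) of the input text
        # anchor_embeds: (2, d_i) embeddings of the anchor tokens "Topic", "is"
        # mask_embed:    (1, d_i) embedding of [MASK]
        # returns the wrapped sequence (E(x), theta, E(Topic), E(is), E([MASK]))
        return torch.cat([input_embeds, theta, anchor_embeds, mask_embed], dim=0)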
17 / 45
MetaPrompter
Preliminaries and Related Work
Prompt Learning
Prompt Learning (Cont.)
A verbalizer bridges the prediction at the [MASK] position and labels
in prompt learning.
Prompt tuning then optimizes (𝝓, 𝜽) by maximizing the label
probability:
P̂(y | x; 𝝓, 𝜽) = (1 / |Vy|) ∑_{w ∈ Vy} P_M([MASK] = w | T(x; 𝜽))
where Vy is a set of label-relevant tokens. For example, y = SPORTS,
then Vy = {sports, football, basketball}.
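A minimal sketch of this hand-crafted verbalizer (the variable names are hypothetical; mask_token_probs is assumed to be the MLM's softmax output over the vocabulary at the [MASK] position): the label probability is the mean probability over the label-relevant tokens.

    def handcrafted_verbalizer_prob(mask_token_probs, verbalizer_ids):
        # mask_token_probs: (vocab_size,) tensor of P_M([MASK] = w | T(x; theta))
        # verbalizer_ids:   dict mapping each label y to the token ids of V_y
        # returns P_hat(y | x) for each label, i.e. the mean probability over V_y
        return {y: mask_token_probs[ids].mean().item()
                for y, ids in verbalizer_ids.items()}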
18 / 45
MetaPrompter
Preliminaries and Related Work
Prompt Learning
Prompt Learning (Cont.)
The verbalizer is crucial to the performance of prompt learning.
Searching for label tokens in a discrete space is computationally expensive.
Hence, the soft verbalizer [7] was proposed, which maps each label to a
continuous embedding and predicts the label distribution based on the
similarities between the feature embedding and the label embeddings.
However, learning good label embeddings remains challenging in few-shot
learning, where only a few labeled samples are available.
19 / 45
MetaPrompter
Preliminaries and Related Work
Meta-Learning for Prompt Learning
Meta-Learning for Prompt Learning
In meta-learning, a shared meta-parameter is learned from a collection T of
tasks. Each task 𝜏 ∈ T has a support set S𝜏 and a query set Q𝜏, and the
label set of 𝜏 is Y𝜏.
Since prompt tuning is sensitive to prompt initialization in few-shot
tasks, meta-learning can be used to search for a good initialization.
20 / 45
MetaPrompter
Preliminaries and Related Work
Meta-Learning for Prompt Learning
MetaPrompting
It uses MAML to learn a meta-initialization for the task-specific
prompts.
For each iteration t, the base learner uses a task 𝜏 and
meta-parameters (𝝓t−1, 𝜽t−1), and builds a task-specific model
(𝝓t,J, 𝜽t,J) by performing J gradient updates on the support set with
step size 𝛼 and initialization (𝝓t,0, 𝜽t,0) ≡ (𝝓t−1, 𝜽t−1).
The meta-learner then updates the meta-parameters by maximizing the
log-likelihood objective on the query set with step size 𝜂.
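A schematic sketch of this MAML-style procedure (first-order and simplified for brevity; the task interface and loss_fn are placeholders rather than the authors' code): J inner gradient steps on the support set, followed by a meta-update using the query-set loss at the adapted point.

    import torch

    def maml_step(meta_params, task, loss_fn, J=5, alpha=1e-3, eta=1e-4):
        # meta_params: list of tensors, e.g. [phi, theta], shared across tasks
        # task:        object with .support and .query batches (hypothetical interface)
        # loss_fn:     callable(params, batch) -> scalar loss
        # inner loop (base learner): J gradient updates on the support set
        adapted = [p.clone().detach().requires_grad_(True) for p in meta_params]
        for _ in range(J):
            loss = loss_fn(adapted, task.support)
            grads = torch.autograd.grad(loss, adapted)
            adapted = [(p - alpha * g).detach().requires_grad_(True)
                       for p, g in zip(adapted, grads)]
        # outer loop (meta-learner): first-order update of the meta-parameters
        query_loss = loss_fn(adapted, task.query)
        meta_grads = torch.autograd.grad(query_loss, adapted)
        with torch.no_grad():
            for p, g in zip(meta_params, meta_grads):
                p -= eta * g
        return query_loss.item()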
21 / 45
MetaPrompter
Proposed Method
Table of contents
1 Abstract
2 Introduction
3 Preliminaries and Related Work
4 Proposed Method
Representative Verbalizer (RepVerb)
Meta Structured-Prompting
5 Experiments
6 Conclusion
22 / 45
MetaPrompter
Proposed Method
Representative Verbalizer (RepVerb)
Representative Verbalizer (RepVerb)
They propose Representative Verbalizer (RepVerb), which constructs
vy from feature embeddings of the corresponding training samples:
vy = (1 / |S𝜏,y|) ∑_{(x,y) ∈ S𝜏,y} h[MASK](x̃)
23 / 45
MetaPrompter
Proposed Method
Representative Verbalizer (RepVerb)
Representative Verbalizer (RepVerb)
To predict the label of a given x, they measure the cosine similarity
between h[MASK](x̃) and each vy(y ∈ Y𝜏):
P̃(y | x; 𝝓, 𝜽) = exp(𝜌 cos(vy, h[MASK](x̃))) / ∑_{y′ ∈ Y𝜏} exp(𝜌 cos(vy′, h[MASK](x̃)))
The temperature 𝜌 > 0; they set 𝜌 = 10 [12].
As 𝜌 → ∞, P̃(y | x; 𝝓, 𝜽) becomes one-hot.
As 𝜌 → 0, P̃(y | x; 𝝓, 𝜽) becomes uniform.
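Putting the two RepVerb steps from these slides together, a minimal sketch under assumed tensor shapes (not the released code): each label embedding vy is the mean [MASK] embedding of that class's support samples, and the label distribution is a temperature-scaled softmax over cosine similarities.

    import torch
    import torch.nn.functional as F

    def repverb_probs(query_mask_emb, support_mask_embs, support_labels, num_classes, rho=10.0):
        # query_mask_emb:    (d_o,)   h_[MASK] embedding of the query sample
        # support_mask_embs: (N, d_o) h_[MASK] embeddings of the support samples
        # support_labels:    (N,)     integer class labels of the support samples
        # v_y: mean feature embedding of each class's support samples
        label_embs = torch.stack([support_mask_embs[support_labels == y].mean(dim=0)
                                  for y in range(num_classes)])
        sims = F.cosine_similarity(label_embs, query_mask_emb.unsqueeze(0), dim=-1)
        return torch.softmax(rho * sims, dim=-1)   # P_tilde(y | x), temperature rho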
24 / 45
MetaPrompter
Proposed Method
Meta Structured-Prompting
Meta-Learn a Prompt Pool
A prompt pool has K learnable prompts {(ki, 𝜽i) : i = 1, . . . , K} with
key ki ∈ R^do and value 𝜽i ∈ R^(Lp×di) [11].
The attention weights are computed as a = softmax(K qx / √do), where
K = [k1⊤; . . . ; kK⊤] and qx ∈ R^do is the embedding of the [MASK] token
output by the pre-trained and frozen MLM for the wrapped input.
25 / 45
MetaPrompter
Proposed Method
Meta Structured-Prompting
Meta-Learn a Prompt Pool
The generated prompt is a weighted average over all the values (the 𝜽i's):
𝜽x(K, 𝚯) = ∑_{i=1}^{K} ai 𝜽i
where 𝚯 = [𝜽1; . . . ; 𝜽K].
The procedure for meta-learning the prompt pool (K, 𝚯) is called
MetaPrompter.
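A minimal sketch of this instance-dependent prompt generation (the tensor shapes are assumptions): attention weights over the pool keys come from the [MASK] query embedding, and the generated prompt is the weighted average of the pool values.

    import torch

    def instance_prompt(q_x, keys, values):
        # q_x:    (d_o,)        [MASK] embedding of the wrapped input from the frozen MLM
        # keys:   (K, d_o)      learnable prompt keys k_i
        # values: (K, L_p, d_i) learnable prompt values theta_i
        a = torch.softmax(keys @ q_x / keys.shape[-1] ** 0.5, dim=0)   # attention weights
        return torch.einsum('k,kld->ld', a, values)                    # theta_x = sum_i a_i theta_i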
26 / 45
MetaPrompter
Proposed Method
Meta Structured-Prompting
27 / 45
MetaPrompter
Proposed Method
Meta Structured-Prompting
Base learner
They use 𝜽x,j to predict the label probability with both hand-crafted
verbalizer and soft verbalizer:
P(y | x; 𝜽x,j) = (1 − 𝜆)P̂(y | 𝜽x,j) + 𝜆P̃(y | 𝜽x,j)
where 𝜆 ∈ [0, 1].
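In code, the mixture above is a simple convex combination of the two label distributions (names are placeholders):

    def combined_probs(p_hat, p_tilde, lam):
        # p_hat:   label distribution from the hand-crafted verbalizer
        # p_tilde: label distribution from RepVerb
        # lam:     mixing weight lambda in [0, 1]
        return (1.0 - lam) * p_hat + lam * p_tilde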
28 / 45
MetaPrompter
Proposed Method
Meta Structured-Prompting
Meta learner
29 / 45
MetaPrompter
Proposed Method
Meta Structured-Prompting
Meta-Testing
During meta-testing, the MLM predicts the label probability from the input
x̃ ≡ T(x; 𝜽x,J), where the prompt pool has been adapted by the base learner
on S𝜏′.
An instance-dependent prompt 𝜽x,J is constructed for each (x, ·) ∈ Q𝜏′,
where 𝜏′ is an unseen task.
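A sketch of this meta-testing flow, with the base learner and predictor passed in as placeholder callables (hypothetical interfaces, not the paper's code):

    def meta_test(adapt_fn, predict_fn, support_set, query_set):
        # adapt_fn:   base learner; tunes the prompt pool on the support set of the unseen task
        # predict_fn: builds the instance-dependent prompt for x and returns label probabilities
        adapted_pool = adapt_fn(support_set)                  # J inner updates on S_tau'
        return [predict_fn(adapted_pool, x) for x, _ in query_set]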
30 / 45
MetaPrompter
Proposed Method
Meta Structured-Prompting
MetaPrompter is Parameter-Efficient
The number of parameters:
MetaPrompter is K(do + Lp di).
MetaPrompting is d𝜙 + Lp di.
For example, they use BERT (with do = di = 768, d𝜙 ≈ 109 × 10^6)
and K = Lp = 8 in MetaPrompter.
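Plugging the stated BERT configuration into these formulas gives a rough sense of the gap (the exact ratio depends on the MLM size):

    K, L_p, d_o, d_i = 8, 8, 768, 768
    d_phi = 109_000_000                       # approximate number of BERT parameters

    metaprompter  = K * (d_o + L_p * d_i)     # keys + values = 55,296 parameters
    metaprompting = d_phi + L_p * d_i         # whole MLM + prompt, about 1.09e8 parameters
    print(metaprompter, metaprompting)        # the gap is roughly three orders of magnitude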
31 / 45
MetaPrompter
Experiments
Table of contents
1 Abstract
2 Introduction
3 Preliminaries and Related Work
4 Proposed Method
5 Experiments
6 Conclusion
32 / 45
MetaPrompter
Experiments
Setup
In the experiments, they perform few-shot classification on six widely
used data sets:
Table 1: Statistics of the data sets.
33 / 45
MetaPrompter
Experiments
Evaluation on RepVerb
They compare the performance of the proposed RepVerb with
state-of-the-art soft verbalizers:
1 WARP
2 ProtoVerb [3]
Figure 1: t-SNE visualization of [MASK]’s embeddings (crosses) and label
embeddings (circles) for a 5-way 5-shot task randomly sampled from Reuters.
34 / 45
MetaPrompter
Experiments
Evaluation on RepVerb
Table 2: Meta-testing accuracy of various verbalizers on 5-way few-shot
classification.
35 / 45
MetaPrompter
Experiments
Baseline
Prompt-based method
1 MetaPrompting
2 MetaPrompting + WARP
3 MetaPrompting + ProtoVerb
4 MetaPrompting + RepVerb
Non-prompt-based method
1 HATT [6]
2 DS [1]
3 MLADA [8]
4 ContrastNet [2]
36 / 45
MetaPrompter
Experiments
Evaluation on MetaPrompter
Table 3: 5-way 5-shot classification meta-testing accuracy.
Table 4: 5-way 1-shot classification meta-testing accuracy.
37 / 45
MetaPrompter
Conclusion
Table of contents
1 Abstract
2 Introduction
3 Preliminaries and Related Work
4 Proposed Method
5 Experiments
6 Conclusion
38 / 45
MetaPrompter
Conclusion
Conclusion
This paper proposes MetaPrompter, which combines structured
prompting and a novel verbalizer called RepVerb.
A prompt pool structure is used to construct instance-dependent
prompts by attention.
RepVerb builds label embeddings by averaging the feature embeddings of
the corresponding training samples.
39 / 45
MetaPrompter
Conclusion
Reflection
MetaPrompter requires the availability of a set of meta-training tasks.
40 / 45
MetaPrompter
Conclusion
References I
[1] Yujia Bao, Menghua Wu, et al. “Few-shot Text Classification
with Distributional Signatures”. In: International Conference
on Learning Representations. 2019.
[2] Junfan Chen, Richong Zhang, et al. “Contrastnet: A contrastive
learning framework for few-shot text classification”. In:
Proceedings of the AAAI Conference on Artificial Intelligence.
2022, pp. 10492–10500.
[3] Ganqu Cui, Shengding Hu, et al. “Prototypical Verbalizer for
Prompt-based Few-shot Tuning”. In: Proceedings of the 60th
Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers). 2022, pp. 7014–7024.
41 / 45
MetaPrompter
Conclusion
References II
[4] Jacob Devlin, Ming-Wei Chang, et al. “BERT: Pre-training of
Deep Bidirectional Transformers for Language Understanding”.
In: Proceedings of the 2019 Conference of the North American
Chapter of the Association for Computational Linguistics:
Human Language Technologies, Volume 1 (Long and Short
Papers). 2019, pp. 4171–4186.
[5] Chelsea Finn, Pieter Abbeel, et al. “Model-agnostic
meta-learning for fast adaptation of deep networks”. In:
International conference on machine learning. 2017,
pp. 1126–1135.
42 / 45
MetaPrompter
Conclusion
References III
[6] Tianyu Gao, Xu Han, et al. “Hybrid attention-based
prototypical networks for noisy few-shot relation
classification”. In: Proceedings of the AAAI conference on
artificial intelligence. 2019, pp. 6407–6414.
[7] Karen Hambardzumyan, Hrant Khachatrian, et al. “WARP:
Word-level Adversarial ReProgramming”. In: Proceedings of
the 59th Annual Meeting of the Association for Computational
Linguistics and the 11th International Joint Conference on
Natural Language Processing (Volume 1: Long Papers). 2021,
pp. 4921–4933.
43 / 45
MetaPrompter
Conclusion
References IV
[8] Chengcheng Han, Zeqiu Fan, et al. “Meta-Learning Adversarial
Domain Adaptation Network for Few-Shot Text Classification”.
In: Findings of the Association for Computational Linguistics:
ACL-IJCNLP 2021. 2021, pp. 1664–1673.
[9] Yutai Hou, Hongyuan Dong, et al. “MetaPrompting: Learning
to Learn Better Prompts”. In: Proceedings of the 29th
International Conference on Computational Linguistics. 2022,
pp. 3251–3262.
[10] Neil Houlsby, Andrei Giurgiu, et al. “Parameter-efficient
transfer learning for NLP”. In: International conference on
machine learning. 2019, pp. 2790–2799.
44 / 45
MetaPrompter
Conclusion
References V
[11] Junyi Li, Tianyi Tang, et al. “Learning to Transfer Prompts for
Text Generation”. In: Proceedings of the 2022 Conference of
the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies.
2022, pp. 3506–3518.
[12] Boris Oreshkin, Pau Rodríguez López, et al. “Tadam: Task
dependent adaptive metric for improved few-shot learning”. In:
Advances in neural information processing systems (2018).
[13] Alec Radford, Jeffrey Wu, et al. “Language models are
unsupervised multitask learners”. In: OpenAI blog (2019), p. 9.
45 / 45
