4. MetaPrompter
Abstract
Due to the limited training data, prompt initialization is crucial for
prompt tuning.
MetaPrompting [9] utilizes meta-learning to acquire a shared
initialization for task-specific prompts.
However, a single initialization is inadequate for generating
effective prompts for all complex tasks and samples.
Furthermore, tuning the entire MLM incurs substantial resource and
computational burdens.
5. MetaPrompter
Abstract
Abstract (Cont.)
This paper utilizes a prompt pool to leverage task-specific knowledge
and generate instance-specific prompts using attention mechanisms.
Additionally, it introduces a novel soft verbalizer (RepVerb) that
directly constructs label embeddings from feature embeddings.
MetaPrompter offers parameter efficiency, requiring only tuning of the
prompt pool.
8. MetaPrompter
Introduction
Prompt learning
It formulates the downstream task as a cloze-style MLM problem.
It wraps an input text with a discrete prompt (e.g., “Topic is
[MASK]”) and feeds it to the MLM to predict a token at the [MASK]
position.
A verbalizer then maps the predicted token to the label.
9. MetaPrompter
Introduction
Prompt tuning
The input embedding is wrapped with a continuous prompt, which can
then be combined with discrete tokens to create a template, while the
MLM remains frozen.
10. MetaPrompter
Introduction
MetaPrompting
MetaPrompting is the state-of-the-art approach for addressing the
sensitivity of prompt tuning to initialization.
It employs meta-learning [5] to develop a meta-initialization that is
applicable across all task-specific prompts.
11. MetaPrompter
Introduction
MetaPrompting (Cont.)
However, MetaPrompting suffers from three problems:
1 Crafting effective prompts for all tasks and samples from a single
meta-initialized prompt is difficult when tasks are complex.
2 MetaPrompting employs a hand-crafted verbalizer, but choosing
effective label tokens is labor-intensive and impractical for
large label sets.
3 MetaPrompting requires expensive tuning of the whole MLM.
13. MetaPrompter
Introduction
Contribution
This paper focuses on meta-learning a prompt pool, serving as
shared meta-knowledge, to enhance adaptability for complex
tasks.
A new soft verbalizer, called Representative Verbalizer
(RepVerb), is introduced in the paper. It constructs label
embeddings by averaging feature embeddings of respective
training samples.
In contrast to MetaPrompting, this approach requires about 1000×
fewer parameters.
14. MetaPrompter
Preliminaries and Related Work
Table of contents
1 Abstract
2 Introduction
3 Preliminaries and Related Work
Prompt Learning
Meta-Learning for Prompt Learning
4 Proposed Method
5 Experiments
6 Conclusion
15. MetaPrompter
Preliminaries and Related Work
Prompt Learning
Given a sequence of n tokens (x1, . . . , xn), the MLM uses
x = ([CLS], x1, . . . , xn, [SEP]) as input and encodes it into hidden
representations (h[CLS], h1, . . . , hn, h[SEP]).
With fine-tuning, an extra classifier is added on top of h[CLS] to
predict the label distribution.
On the other hand, prompt learning freezes the pre-trained model
and formulates the downstream task as a cloze-style problem for the
MLM M(·; 𝝓).
16. MetaPrompter
Preliminaries and Related Work
Prompt Learning
Prompt Learning (Cont.)
In topic classification, "Topic is [MASK]" can be used as the
prompt.
If a discrete prompt is used, the input text x is wrapped with the
prompt and mapped to an input embedding sequence
(E(x), E(Topic), E(is), E([MASK])).
17. MetaPrompter
Preliminaries and Related Work
Prompt Learning
Prompt Learning (Cont.)
Prompt tuning uses a continuous prompt 𝜽 ∈ R^{Lp×di}.
The input embedding sequence becomes (E(x), 𝜽, E([MASK])).
This can be further combined with anchor tokens to form a template:
x̃ = T(x; 𝜽) = (E(x), 𝜽, E(Topic), E(is), E([MASK]))
The MLM outputs a feature embedding h[MASK](x̃) ∈ R^{do} at the [MASK]
position, from which the token to fill in is inferred.
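A minimal sketch of this template construction is given below: a learnable prompt 𝜽 is concatenated between the input embeddings E(x) and the anchor-token embeddings, then fed to a frozen BERT through its input-embedding interface. The model choice, prompt length, and example text are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of prompt tuning: a learnable continuous prompt theta is spliced into the
# frozen MLM's input embeddings, together with the anchor tokens "Topic is [MASK]".
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
for p in mlm.parameters():          # the MLM stays frozen
    p.requires_grad_(False)

d_i = mlm.config.hidden_size        # input embedding size (768 for BERT-base)
L_p = 8                             # prompt length (illustrative)
theta = torch.nn.Parameter(torch.randn(L_p, d_i) * 0.02)   # continuous prompt

emb = mlm.get_input_embeddings()
x_ids = tok("a news article about the world cup final", return_tensors="pt")["input_ids"]
anchor_ids = tok(f"Topic is {tok.mask_token}.", add_special_tokens=False,
                 return_tensors="pt")["input_ids"]

# x_tilde = T(x; theta) = (E(x), theta, E(Topic is [MASK]))
inputs_embeds = torch.cat([emb(x_ids), theta.unsqueeze(0), emb(anchor_ids)], dim=1)
attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)

logits = mlm(inputs_embeds=inputs_embeds, attention_mask=attention_mask).logits
mask_pos = x_ids.shape[1] + L_p + (anchor_ids[0] == tok.mask_token_id).nonzero().item()
print(logits[0, mask_pos].shape)    # vocabulary distribution at the [MASK] slot
```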
18. MetaPrompter
Preliminaries and Related Work
Prompt Learning
Prompt Learning (Cont.)
A verbalizer bridges the prediction at the [MASK] position and labels
in prompt learning.
Prompt tuning then optimizes (𝝓, 𝜽) by maximizing the label
probability:
P̂(y | x; 𝝓, 𝜽) = (1 / |Vy|) ∑_{w ∈ Vy} PM([MASK] = w | T(x; 𝜽))
where Vy is a set of label-relevant tokens. For example, if y = SPORTS,
then Vy = {sports, football, basketball}.
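As a concrete illustration, the sketch below wraps a sentence with the discrete prompt "Topic is [MASK]", queries a masked LM for the token distribution at the [MASK] position, and averages the probabilities of each label's tokens as in the formula above. The model name, example text, and label-token sets are illustrative assumptions.

```python
# Minimal sketch of prompt learning with a hand-crafted verbalizer.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

verbalizer = {  # V_y: label-relevant tokens for each class (illustrative)
    "SPORTS": ["sports", "football", "basketball"],
    "BUSINESS": ["business", "market", "finance"],
}

text = "The team won the championship after a dramatic final."
wrapped = f"{text} Topic is {tok.mask_token}."          # discrete prompt
enc = tok(wrapped, return_tensors="pt")

with torch.no_grad():
    logits = mlm(**enc).logits                          # (1, seq_len, vocab)

mask_pos = (enc["input_ids"][0] == tok.mask_token_id).nonzero().item()
token_probs = logits[0, mask_pos].softmax(dim=-1)       # P_M([MASK] = w | ...)

# P-hat(y | x) = mean over w in V_y of P_M([MASK] = w | T(x))
for label, words in verbalizer.items():
    ids = tok.convert_tokens_to_ids(words)
    print(label, token_probs[ids].mean().item())
```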
19. MetaPrompter
Preliminaries and Related Work
Prompt Learning
Prompt Learning (Cont.)
The verbalizer is crucial to the performance of prompt learning.
Searching for label tokens in a discrete space is computationally expensive.
Therefore, the soft verbalizer [7] was proposed: it maps each label to a
continuous embedding and predicts the label distribution from the
similarities between the feature embedding and the label embeddings.
However, learning informative label embeddings remains challenging in
few-shot learning, where labeled data is scarce.
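Below is a minimal soft-verbalizer sketch in the spirit of WARP [7]: one learnable embedding per label, with the label distribution obtained from similarities (here a simple dot product) between the [MASK] feature embedding and the label embeddings. The dimensions and similarity choice are assumptions.

```python
# Soft verbalizer sketch: learnable label embeddings, similarity-based prediction.
import torch

d_o, num_labels = 768, 5
label_emb = torch.nn.Parameter(torch.randn(num_labels, d_o) * 0.02)  # learned with the prompt

def soft_verbalizer(h_mask: torch.Tensor) -> torch.Tensor:
    """h_mask: (batch, d_o) feature embeddings at the [MASK] position."""
    scores = h_mask @ label_emb.t()          # similarity to every label embedding
    return scores.softmax(dim=-1)            # label distribution

probs = soft_verbalizer(torch.randn(4, d_o))
print(probs.shape)                            # (4, num_labels)
```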
20. MetaPrompter
Preliminaries and Related Work
Meta-Learning for Prompt Learning
In meta-learning, a shared meta-parameter is learned from a collection
T of tasks. Each task 𝜏 ∈ T has a support set S𝜏 and a query set Q𝜏,
and its label set is Y𝜏.
Since prompt tuning is sensitive to the prompt initialization in few-shot
tasks, meta-learning can be used to search for a good initialization.
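For concreteness, one possible way to sample such an N-way K-shot task (a support set S𝜏 and a query set Q𝜏) from a labeled text corpus is sketched below; the episode sizes and data layout are assumptions, not the paper's data pipeline.

```python
# Sketch of sampling one few-shot task tau = (support, query) from (text, label) pairs.
import random
from collections import defaultdict

def sample_task(examples, n_way=5, k_shot=5, q_query=15, seed=None):
    """examples: list of (text, label). Returns (support, query) for one task."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append(text)
    labels = rng.sample([y for y, xs in by_label.items() if len(xs) >= k_shot + q_query],
                        n_way)                                   # label set Y_tau
    support, query = [], []
    for y in labels:
        texts = rng.sample(by_label[y], k_shot + q_query)
        support += [(t, y) for t in texts[:k_shot]]              # S_tau
        query += [(t, y) for t in texts[k_shot:]]                # Q_tau
    return support, query
```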
21. MetaPrompter
Preliminaries and Related Work
Meta-Learning for Prompt Learning
MetaPrompting
It uses MAML to learn a meta-initialization for the task-specific
prompts.
At each iteration t, the base learner takes a task 𝜏 and the
meta-parameters (𝝓t−1, 𝜽t−1), and builds a task-specific model
(𝝓t,J, 𝜽t,J) by performing J gradient updates on the support set with
step size 𝛼, starting from the initialization (𝝓t,0, 𝜽t,0) ≡ (𝝓t−1, 𝜽t−1).
The meta-learner then updates the meta-parameters by maximizing the
log-likelihood objective on the query set with step size 𝜂.
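The sketch below shows this inner/outer-loop structure with a first-order approximation of the meta-gradient and a toy linear model standing in for the prompted MLM; it only illustrates the update scheme and is not MetaPrompting's actual implementation.

```python
# First-order MAML-style sketch: J inner updates on the support set, one outer
# update of the meta-parameters from the query loss (alpha, eta, J as on the slide).
import copy
import torch
import torch.nn.functional as F

def maml_step(meta_model, support, query, alpha=0.01, eta=0.001, J=5):
    """support/query: (inputs, labels) tensors for one task tau."""
    learner = copy.deepcopy(meta_model)                  # start from the meta-initialization
    inner_opt = torch.optim.SGD(learner.parameters(), lr=alpha)
    for _ in range(J):                                   # J gradient updates on the support set
        inner_opt.zero_grad()
        F.cross_entropy(learner(support[0]), support[1]).backward()
        inner_opt.step()
    query_loss = F.cross_entropy(learner(query[0]), query[1])
    grads = torch.autograd.grad(query_loss, learner.parameters())
    with torch.no_grad():                                # first-order outer update
        for p, g in zip(meta_model.parameters(), grads):
            p -= eta * g
    return query_loss.item()

# toy usage: a linear "model" standing in for the prompted MLM
meta_model = torch.nn.Linear(16, 5)
task = ((torch.randn(25, 16), torch.randint(0, 5, (25,))),
        (torch.randn(75, 16), torch.randint(0, 5, (75,))))
print(maml_step(meta_model, *task))
```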
22. MetaPrompter
Proposed Method
Table of contents
1 Abstract
2 Introduction
3 Preliminaries and Related Work
4 Proposed Method
Representative Verbalizer (RepVerb)
Meta Structured-Prompting
5 Experiments
6 Conclusion
23. MetaPrompter
Proposed Method
Representative Verbalizer (RepVerb)
They propose Representative Verbalizer (RepVerb), which constructs
vy from feature embeddings of the corresponding training samples:
vy = (1 / |S𝜏,y|) ∑_{(x,y) ∈ S𝜏,y} h[MASK](x̃)
where S𝜏,y is the set of support samples in S𝜏 with label y.
24. MetaPrompter
Proposed Method
Representative Verbalizer (RepVerb)
To predict the label of a given x, they measure the cosine similarity
between h[MASK](x̃) and each vy (y ∈ Y𝜏):
P̃(y | x; 𝝓, 𝜽) = exp(𝜌 cos(vy, h[MASK](x̃))) / ∑_{y′ ∈ Y𝜏} exp(𝜌 cos(vy′, h[MASK](x̃)))
The temperature 𝜌 > 0 is set to 10, following [12].
As 𝜌 → ∞, P̃(y | x; 𝝓, 𝜽) approaches a one-hot distribution; as 𝜌 → 0, it
approaches a uniform distribution.
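A minimal RepVerb sketch under these definitions: label embeddings are the per-class means of the support samples' [MASK] feature embeddings, and predictions are a softmax over temperature-scaled cosine similarities with 𝜌 = 10. The tensor shapes and toy features are assumptions.

```python
# RepVerb sketch: v_y = mean of support [MASK] embeddings of class y;
# P-tilde(y | x) = softmax over rho * cos(v_y, h_[MASK](x)).
import torch
import torch.nn.functional as F

def repverb_label_embeddings(support_feats, support_labels, num_classes):
    """support_feats: (N, d_o) h_[MASK] embeddings; support_labels: (N,) in [0, num_classes)."""
    return torch.stack([support_feats[support_labels == y].mean(dim=0)
                        for y in range(num_classes)])           # (num_classes, d_o): v_y

def repverb_predict(query_feats, label_emb, rho=10.0):
    """query_feats: (M, d_o). Returns P-tilde(y | x), shape (M, num_classes)."""
    sims = F.cosine_similarity(query_feats.unsqueeze(1), label_emb.unsqueeze(0), dim=-1)
    return (rho * sims).softmax(dim=-1)

# toy usage with random features standing in for the MLM's [MASK] embeddings
feats, labels = torch.randn(25, 768), torch.arange(5).repeat_interleave(5)
v = repverb_label_embeddings(feats, labels, num_classes=5)
print(repverb_predict(torch.randn(3, 768), v))
```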
25. MetaPrompter
Proposed Method
Meta Structured-Prompting
Meta-Learn a Prompt Pool
A prompt pool has K learnable prompts {(ki, 𝜽i) : i = 1, . . . , K} with
key ki ∈ R^{do} and value 𝜽i ∈ R^{Lp×di} [11].
The attention weights are computed as a = softmax(K qx / √do), where
K = [k1⊤; . . . ; kK⊤] and qx ∈ R^{do} is the embedding at the [MASK]
position output by the pre-trained and frozen MLM for the wrapped input.
26. MetaPrompter
Proposed Method
Meta Structured-Prompting
Meta-Learn a Prompt Pool
The generated prompt is a weighted average over all the values (𝜽i's):
𝜽x(K, 𝚯) = ∑_{i=1}^{K} ai 𝜽i
where 𝚯 = [𝜽1; . . . ; 𝜽K].
The proposed procedure for meta-learning the prompt pool (K, 𝚯) is
called MetaPrompter.
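The prompt pool and the attention-based prompt generation can be sketched as follows, assuming BERT-base dimensions and K = Lp = 8; the class and variable names are illustrative, not the authors' code.

```python
# Prompt pool sketch: K (key, value) pairs; the [MASK] embedding q_x of the wrapped
# input attends over the keys, and theta_x is the attention-weighted average of the values.
import math
import torch

class PromptPool(torch.nn.Module):
    def __init__(self, K=8, L_p=8, d_i=768, d_o=768):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(K, d_o) * 0.02)        # k_i
        self.values = torch.nn.Parameter(torch.randn(K, L_p, d_i) * 0.02) # theta_i
        self.d_o = d_o

    def forward(self, q_x: torch.Tensor) -> torch.Tensor:
        """q_x: (batch, d_o) [MASK] embedding of the wrapped input. Returns (batch, L_p, d_i)."""
        a = torch.softmax(q_x @ self.keys.t() / math.sqrt(self.d_o), dim=-1)  # attention weights
        return torch.einsum("bk,kld->bld", a, self.values)                    # theta_x = sum_i a_i theta_i

pool = PromptPool()
print(pool(torch.randn(4, 768)).shape)   # instance-dependent prompts: (4, 8, 768)
```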
28. MetaPrompter
Proposed Method
Meta Structured-Prompting
Base learner
They use 𝜽x,j to predict the label probability with both the
hand-crafted verbalizer (P̂) and RepVerb (P̃):
P(y | x; 𝜽x,j) = (1 − 𝜆) P̂(y | x; 𝜽x,j) + 𝜆 P̃(y | x; 𝜽x,j)
where 𝜆 ∈ [0, 1].
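A small illustration of this convex combination (the names are illustrative):

```python
# Combine the hand-crafted verbalizer's distribution with RepVerb's, weighted by lambda.
import torch

def combined_label_prob(p_hat: torch.Tensor, p_tilde: torch.Tensor, lam: float = 0.5):
    """p_hat, p_tilde: (batch, num_classes) label distributions; lam in [0, 1]."""
    return (1.0 - lam) * p_hat + lam * p_tilde

p = combined_label_prob(torch.tensor([[0.7, 0.3]]), torch.tensor([[0.4, 0.6]]), lam=0.25)
print(p)   # still a valid distribution, used for the inner-loop cross-entropy loss
```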
30. MetaPrompter
Proposed Method
Meta Structured-Prompting
Meta-Testing
During meta-testing on an unseen task 𝜏′, the base learner adapts the
prompt pool on the support set S𝜏′.
An instance-dependent prompt 𝜽x,J is then constructed for each query
sample (x, ·) ∈ Q𝜏′, and the MLM predicts the label probability from the
wrapped input x̃ ≡ T(x; 𝜽x,J).
31. MetaPrompter
Proposed Method
Meta Structured-Prompting
MetaPrompter is Parameter-Efficient
Number of tunable meta-parameters:
MetaPrompter: K(do + Lp di), for the keys and values of the prompt pool.
MetaPrompting: d𝜙 + Lp di, for the whole MLM plus the prompt.
For example, they use BERT (with do = di = 768, d𝜙 = 109 × 10^6)
and K = Lp = 8 in MetaPrompter.
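Plugging the slide's numbers in gives roughly 55K tunable parameters for MetaPrompter versus about 109M for MetaPrompting, i.e., roughly three orders of magnitude fewer:

```python
# Worked parameter count under the slide's BERT-base example.
d_o = d_i = 768
K = L_p = 8
d_phi = 109_000_000                     # full MLM, as stated on the slide

metaprompter = K * (d_o + L_p * d_i)    # keys + values of the prompt pool
metaprompting = d_phi + L_p * d_i       # whole MLM + one prompt
print(metaprompter, metaprompting)      # 55_296 vs 109_006_144
```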
34. MetaPrompter
Experiments
Evaluation on RepVerb
They compare the performance of the proposed RepVerb with
state-of-the-art soft verbalizers:
1 WARP [7]
2 ProtoVerb [3]
Figure 1: t-SNE visualization of [MASK]’s embeddings (crosses) and label
embeddings (circles) for a 5-way 5-shot task randomly sampled from Reuters.
39. MetaPrompter
Conclusion
This paper proposes MetaPrompter, which combines structured
prompting and a novel verbalizer called RepVerb.
A prompt pool structure is used to construct instance-dependent
prompts by attention.
RepVerb builds label embeddings by averaging the feature embeddings of
the corresponding training samples.
41. MetaPrompter
Conclusion
References I
[1] Yujia Bao, Menghua Wu, et al. “Few-shot Text Classification
with Distributional Signatures”. In: International Conference
on Learning Representations. 2019.
[2] Junfan Chen, Richong Zhang, et al. “Contrastnet: A contrastive
learning framework for few-shot text classification”. In:
Proceedings of the AAAI Conference on Artificial Intelligence.
2022, pp. 10492–10500.
[3] Ganqu Cui, Shengding Hu, et al. “Prototypical Verbalizer for
Prompt-based Few-shot Tuning”. In: Proceedings of the 60th
Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers). 2022, pp. 7014–7024.
42. MetaPrompter
Conclusion
References II
[4] Jacob Devlin, Ming-Wei Chang, et al. “BERT: Pre-training of
Deep Bidirectional Transformers for Language Understanding”.
In: Proceedings of the 2019 Conference of the North American
Chapter of the Association for Computational Linguistics:
Human Language Technologies, Volume 1 (Long and Short
Papers). 2019, pp. 4171–4186.
[5] Chelsea Finn, Pieter Abbeel, et al. “Model-agnostic
meta-learning for fast adaptation of deep networks”. In:
International conference on machine learning. 2017,
pp. 1126–1135.
43. MetaPrompter
Conclusion
References III
[6] Tianyu Gao, Xu Han, et al. “Hybrid attention-based
prototypical networks for noisy few-shot relation
classification”. In: Proceedings of the AAAI conference on
artificial intelligence. 2019, pp. 6407–6414.
[7] Karen Hambardzumyan, Hrant Khachatrian, et al. “WARP:
Word-level Adversarial ReProgramming”. In: Proceedings of
the 59th Annual Meeting of the Association for Computational
Linguistics and the 11th International Joint Conference on
Natural Language Processing (Volume 1: Long Papers). 2021,
pp. 4921–4933.
44. MetaPrompter
Conclusion
References IV
[8] Chengcheng Han, Zeqiu Fan, et al. “Meta-Learning Adversarial
Domain Adaptation Network for Few-Shot Text Classification”.
In: Findings of the Association for Computational Linguistics:
ACL-IJCNLP 2021. 2021, pp. 1664–1673.
[9] Yutai Hou, Hongyuan Dong, et al. “MetaPrompting: Learning
to Learn Better Prompts”. In: Proceedings of the 29th
International Conference on Computational Linguistics. 2022,
pp. 3251–3262.
[10] Neil Houlsby, Andrei Giurgiu, et al. “Parameter-efficient
transfer learning for NLP”. In: International conference on
machine learning. 2019, pp. 2790–2799.
45. MetaPrompter
Conclusion
References V
[11] Junyi Li, Tianyi Tang, et al. “Learning to Transfer Prompts for
Text Generation”. In: Proceedings of the 2022 Conference of
the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies.
2022, pp. 3506–3518.
[12] Boris Oreshkin, Pau Rodríguez López, et al. “Tadam: Task
dependent adaptive metric for improved few-shot learning”. In:
Advances in neural information processing systems (2018).
[13] Alec Radford, Jeffrey Wu, et al. “Language models are
unsupervised multitask learners”. In: OpenAI blog (2019), p. 9.