4. MetaPrompter
Abstract
Due to the limited training data, prompt initialization is crucial for
prompt tuning.
MetaPrompting [9] utilizes meta-learning to acquire a shared
initialization for task-specific prompts.
However, a single initialization is inadequate for generating
effective prompts for all complex tasks and samples.
Furthermore, tuning the entire MLM incurs substantial resource and
computational burdens.
5. MetaPrompter
Abstract
Abstract (Cont.)
This paper utilizes a prompt pool to leverage task-specific knowledge
and generate instance-specific prompts using attention mechanisms.
Additionally, it introduces a novel soft verbalizer (RepVerb) that
directly constructs label embeddings from feature embeddings.
MetaPrompter offers parameter efficiency, requiring only tuning of the
prompt pool.
8. MetaPrompter
Introduction
Prompt learning
It formulates the downstream task as a cloze-style MLM problem.
It wraps an input text with a discrete prompt (e.g., “Topic is
[MASK]”) and feeds it to the MLM to predict a token at the [MASK]
position.
A verbalizer then maps the predicted token to the label.
9. MetaPrompter
Introduction
Prompt tuning
The input embedding is wrapped with a continuous prompt, which can
then be combined with discrete tokens to create a template, while the
MLM remains frozen.
10. MetaPrompter
Introduction
MetaPrompting
MetaPrompting is the state-of-the-art approach for addressing the
sensitivity of prompt tuning to initialization.
It employs meta-learning [5] to develop a meta-initialization that is
applicable across all task-specific prompts.
11. MetaPrompter
Introduction
MetaPrompting (Cont.)
However, MetaPrompting suffers from three problems:
1 Crafting effective prompts for all tasks and samples from a single
meta-initialized prompt is difficult when tasks are complex.
2 MetaPrompting employs a hand-crafted verbalizer, but choosing
effective label tokens is labor-intensive and impractical for
large label sets.
3 MetaPrompting requires expensive tuning of the whole MLM.
13. MetaPrompter
Introduction
Contribution
This paper focuses on meta-learning a prompt pool, serving as
shared meta-knowledge, to enhance adaptability for complex
tasks.
A new soft verbalizer, called Representative Verbalizer
(RepVerb), is introduced in the paper. It constructs label
embeddings by averaging feature embeddings of respective
training samples.
In contrast to MetaPrompting, this approach requires about 1000×
fewer parameters.
14. MetaPrompter
Preliminaries and Related Work
Table of contents
1 Abstract
2 Introduction
3 Preliminaries and Related Work
Prompt Learning
Meta-Learning for Prompt Learning
4 Proposed Method
5 Experiments
6 Conclusion
15. MetaPrompter
Preliminaries and Related Work
Prompt Learning
Given a sequence of n tokens (x1, . . . , xn), the MLM uses
x = ([CLS], x1, . . . , xn, [SEP]) as input and encodes it into hidden
representations (h[CLS], h1, . . . , hn, h[SEP]).
With fine-tuning, an extra classifier is added on top of h[CLS] to
predict the label distribution.
On the other hand, prompt learning freezes the pre-trained model
and formulates the downstream task as a cloze-style problem for the
MLM M(·; 𝝓).
16. MetaPrompter
Preliminaries and Related Work
Prompt Learning
Prompt Learning (Cont.)
In topic classification, "Topic is [MASK]" can be used as the
prompt.
If a discrete prompt is used, the input text x is wrapped with the
prompt and mapped to an input embedding sequence
(E(x), E(Topic), E(is), E([MASK])).
17. MetaPrompter
Preliminaries and Related Work
Prompt Learning
Prompt Learning (Cont.)
Prompt tuning uses a continuous prompt 𝜽 ∈ R^{Lp×di}.
The input embedding sequence becomes (E(x), 𝜽, E([MASK])).
This can be further combined with anchor tokens to form a template:
x̃ = T(x; 𝜽) = (E(x), 𝜽, E(Topic), E(is), E([MASK]))
The MLM outputs a feature embedding h[MASK](x̃) ∈ R^{do} at the [MASK]
position, from which the token to fill in is inferred.
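A minimal sketch of this template construction is given below: a learnable prompt 𝜽 is concatenated between the input embeddings E(x) and the anchor-token embeddings, then fed to a frozen BERT through its input-embedding interface. The model choice, prompt length, and example text are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of prompt tuning: a learnable continuous prompt theta is spliced into the
# frozen MLM's input embeddings, together with the anchor tokens "Topic is [MASK]".
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
for p in mlm.parameters():          # the MLM stays frozen
    p.requires_grad_(False)

d_i = mlm.config.hidden_size        # input embedding size (768 for BERT-base)
L_p = 8                             # prompt length (illustrative)
theta = torch.nn.Parameter(torch.randn(L_p, d_i) * 0.02)   # continuous prompt

emb = mlm.get_input_embeddings()
x_ids = tok("a news article about the world cup final", return_tensors="pt")["input_ids"]
anchor_ids = tok(f"Topic is {tok.mask_token}.", add_special_tokens=False,
                 return_tensors="pt")["input_ids"]

# x_tilde = T(x; theta) = (E(x), theta, E(Topic is [MASK]))
inputs_embeds = torch.cat([emb(x_ids), theta.unsqueeze(0), emb(anchor_ids)], dim=1)
attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)

logits = mlm(inputs_embeds=inputs_embeds, attention_mask=attention_mask).logits
mask_pos = x_ids.shape[1] + L_p + (anchor_ids[0] == tok.mask_token_id).nonzero().item()
print(logits[0, mask_pos].shape)    # vocabulary distribution at the [MASK] slot
```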
18. MetaPrompter
Preliminaries and Related Work
Prompt Learning
Prompt Learning (Cont.)
A verbalizer bridges the prediction at the [MASK] position and labels
in prompt learning.
Prompt tuning then optimizes (𝝓, 𝜽) by maximizing the label
probability:
P̂(y | x; 𝝓, 𝜽) = (1 / |Vy|) ∑_{w ∈ Vy} PM([MASK] = w | T(x; 𝜽))
where Vy is a set of label-relevant tokens. For example, if y = SPORTS,
then Vy = {sports, football, basketball}.
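As a concrete illustration, the sketch below wraps a sentence with the discrete prompt "Topic is [MASK]", queries a masked LM for the token distribution at the [MASK] position, and averages the probabilities of each label's tokens as in the formula above. The model name, example text, and label-token sets are illustrative assumptions.

```python
# Minimal sketch of prompt learning with a hand-crafted verbalizer.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

verbalizer = {  # V_y: label-relevant tokens for each class (illustrative)
    "SPORTS": ["sports", "football", "basketball"],
    "BUSINESS": ["business", "market", "finance"],
}

text = "The team won the championship after a dramatic final."
wrapped = f"{text} Topic is {tok.mask_token}."          # discrete prompt
enc = tok(wrapped, return_tensors="pt")

with torch.no_grad():
    logits = mlm(**enc).logits                          # (1, seq_len, vocab)

mask_pos = (enc["input_ids"][0] == tok.mask_token_id).nonzero().item()
token_probs = logits[0, mask_pos].softmax(dim=-1)       # P_M([MASK] = w | ...)

# P-hat(y | x) = mean over w in V_y of P_M([MASK] = w | T(x))
for label, words in verbalizer.items():
    ids = tok.convert_tokens_to_ids(words)
    print(label, token_probs[ids].mean().item())
```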
19. MetaPrompter
Preliminaries and Related Work
Prompt Learning
Prompt Learning (Cont.)
The verbalizer is crucial to the performance of prompt learning.
Searching for label tokens in a discrete space is computationally expensive.
Therefore, the soft verbalizer [7] was proposed: it maps each label to a
continuous embedding and predicts the label distribution from the
similarities between the feature embedding and the label embeddings.
However, learning informative label embeddings remains challenging in
few-shot learning, where labeled data is scarce.
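Below is a minimal soft-verbalizer sketch in the spirit of WARP [7]: one learnable embedding per label, with the label distribution obtained from similarities (here a simple dot product) between the [MASK] feature embedding and the label embeddings. The dimensions and similarity choice are assumptions.

```python
# Soft verbalizer sketch: learnable label embeddings, similarity-based prediction.
import torch

d_o, num_labels = 768, 5
label_emb = torch.nn.Parameter(torch.randn(num_labels, d_o) * 0.02)  # learned with the prompt

def soft_verbalizer(h_mask: torch.Tensor) -> torch.Tensor:
    """h_mask: (batch, d_o) feature embeddings at the [MASK] position."""
    scores = h_mask @ label_emb.t()          # similarity to every label embedding
    return scores.softmax(dim=-1)            # label distribution

probs = soft_verbalizer(torch.randn(4, d_o))
print(probs.shape)                            # (4, num_labels)
```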
20. MetaPrompter
Preliminaries and Related Work
Meta-Learning for Prompt Learning
In meta-learning, a shared meta-parameter is learned from a collection
T of tasks. Each task 𝜏 ∈ T has a support set S𝜏 and a query set Q𝜏,
and its label set is Y𝜏.
Since prompt tuning is sensitive to the prompt initialization in few-shot
tasks, meta-learning can be used to search for a good initialization.
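For concreteness, one possible way to sample such an N-way K-shot task (a support set S𝜏 and a query set Q𝜏) from a labeled text corpus is sketched below; the episode sizes and data layout are assumptions, not the paper's data pipeline.

```python
# Sketch of sampling one few-shot task tau = (support, query) from (text, label) pairs.
import random
from collections import defaultdict

def sample_task(examples, n_way=5, k_shot=5, q_query=15, seed=None):
    """examples: list of (text, label). Returns (support, query) for one task."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append(text)
    labels = rng.sample([y for y, xs in by_label.items() if len(xs) >= k_shot + q_query],
                        n_way)                                   # label set Y_tau
    support, query = [], []
    for y in labels:
        texts = rng.sample(by_label[y], k_shot + q_query)
        support += [(t, y) for t in texts[:k_shot]]              # S_tau
        query += [(t, y) for t in texts[k_shot:]]                # Q_tau
    return support, query
```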
21. MetaPrompter
Preliminaries and Related Work
Meta-Learning for Prompt Learning
MetaPrompting
It uses MAML to learn a meta-initialization for the task-specific
prompts.
At each iteration t, the base learner takes a task 𝜏 and the
meta-parameters (𝝓t−1, 𝜽t−1), and builds a task-specific model
(𝝓t,J, 𝜽t,J) by performing J gradient updates on the support set with
step size 𝛼, starting from the initialization (𝝓t,0, 𝜽t,0) ≡ (𝝓t−1, 𝜽t−1).
The meta-learner then updates the meta-parameters by maximizing the
log-likelihood objective on the query set with step size 𝜂.
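The sketch below shows this inner/outer-loop structure with a first-order approximation of the meta-gradient and a toy linear model standing in for the prompted MLM; it only illustrates the update scheme and is not MetaPrompting's actual implementation.

```python
# First-order MAML-style sketch: J inner updates on the support set, one outer
# update of the meta-parameters from the query loss (alpha, eta, J as on the slide).
import copy
import torch
import torch.nn.functional as F

def maml_step(meta_model, support, query, alpha=0.01, eta=0.001, J=5):
    """support/query: (inputs, labels) tensors for one task tau."""
    learner = copy.deepcopy(meta_model)                  # start from the meta-initialization
    inner_opt = torch.optim.SGD(learner.parameters(), lr=alpha)
    for _ in range(J):                                   # J gradient updates on the support set
        inner_opt.zero_grad()
        F.cross_entropy(learner(support[0]), support[1]).backward()
        inner_opt.step()
    query_loss = F.cross_entropy(learner(query[0]), query[1])
    grads = torch.autograd.grad(query_loss, learner.parameters())
    with torch.no_grad():                                # first-order outer update
        for p, g in zip(meta_model.parameters(), grads):
            p -= eta * g
    return query_loss.item()

# toy usage: a linear "model" standing in for the prompted MLM
meta_model = torch.nn.Linear(16, 5)
task = ((torch.randn(25, 16), torch.randint(0, 5, (25,))),
        (torch.randn(75, 16), torch.randint(0, 5, (75,))))
print(maml_step(meta_model, *task))
```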
22. MetaPrompter
Proposed Method
Table of contents
1 Abstract
2 Introduction
3 Preliminaries and Related Work
4 Proposed Method
Representative Verbalizer (RepVerb)
Meta Structured-Prompting
5 Experiments
6 Conclusion
23. MetaPrompter
Proposed Method
Representative Verbalizer (RepVerb)
They propose Representative Verbalizer (RepVerb), which constructs
vy from feature embeddings of the corresponding training samples:
vy = (1 / |S𝜏,y|) ∑_{(x,y) ∈ S𝜏,y} h[MASK](x̃)
where S𝜏,y is the set of support samples in S𝜏 with label y.
24. MetaPrompter
Proposed Method
Representative Verbalizer (RepVerb)
To predict the label of a given x, they measure the cosine similarity
between h[MASK](x̃) and each vy (y ∈ Y𝜏):
P̃(y | x; 𝝓, 𝜽) = exp(𝜌 cos(vy, h[MASK](x̃))) / ∑_{y′ ∈ Y𝜏} exp(𝜌 cos(vy′, h[MASK](x̃)))
The temperature 𝜌 > 0 is set to 10, following [12].
As 𝜌 → ∞, P̃(y | x; 𝝓, 𝜽) approaches a one-hot distribution; as 𝜌 → 0, it
approaches a uniform distribution.
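A minimal RepVerb sketch under these definitions: label embeddings are the per-class means of the support samples' [MASK] feature embeddings, and predictions are a softmax over temperature-scaled cosine similarities with 𝜌 = 10. The tensor shapes and toy features are assumptions.

```python
# RepVerb sketch: v_y = mean of support [MASK] embeddings of class y;
# P-tilde(y | x) = softmax over rho * cos(v_y, h_[MASK](x)).
import torch
import torch.nn.functional as F

def repverb_label_embeddings(support_feats, support_labels, num_classes):
    """support_feats: (N, d_o) h_[MASK] embeddings; support_labels: (N,) in [0, num_classes)."""
    return torch.stack([support_feats[support_labels == y].mean(dim=0)
                        for y in range(num_classes)])           # (num_classes, d_o): v_y

def repverb_predict(query_feats, label_emb, rho=10.0):
    """query_feats: (M, d_o). Returns P-tilde(y | x), shape (M, num_classes)."""
    sims = F.cosine_similarity(query_feats.unsqueeze(1), label_emb.unsqueeze(0), dim=-1)
    return (rho * sims).softmax(dim=-1)

# toy usage with random features standing in for the MLM's [MASK] embeddings
feats, labels = torch.randn(25, 768), torch.arange(5).repeat_interleave(5)
v = repverb_label_embeddings(feats, labels, num_classes=5)
print(repverb_predict(torch.randn(3, 768), v))
```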
25. MetaPrompter
Proposed Method
Meta Structured-Prompting
Meta-Learn a Prompt Pool
A prompt pool has K learnable prompts {(ki, 𝜽i) : i = 1, . . . , K} with
key ki ∈ R^{do} and value 𝜽i ∈ R^{Lp×di} [11].
The attention weights are computed as a = softmax(K qx / √do), where
K = [k1⊤; . . . ; kK⊤] and qx ∈ R^{do} is the embedding at the [MASK]
position output by the pre-trained and frozen MLM for the wrapped input.
26. MetaPrompter
Proposed Method
Meta Structured-Prompting
Meta-Learn a Prompt Pool
The generated prompt is a weighted average over all the values (𝜽i's):
𝜽x(K, 𝚯) = ∑_{i=1}^{K} ai 𝜽i
where 𝚯 = [𝜽1; . . . ; 𝜽K].
The proposed procedure for meta-learning the prompt pool (K, 𝚯) is
called MetaPrompter.
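The prompt pool and the attention-based prompt generation can be sketched as follows, assuming BERT-base dimensions and K = Lp = 8; the class and variable names are illustrative, not the authors' code.

```python
# Prompt pool sketch: K (key, value) pairs; the [MASK] embedding q_x of the wrapped
# input attends over the keys, and theta_x is the attention-weighted average of the values.
import math
import torch

class PromptPool(torch.nn.Module):
    def __init__(self, K=8, L_p=8, d_i=768, d_o=768):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(K, d_o) * 0.02)        # k_i
        self.values = torch.nn.Parameter(torch.randn(K, L_p, d_i) * 0.02) # theta_i
        self.d_o = d_o

    def forward(self, q_x: torch.Tensor) -> torch.Tensor:
        """q_x: (batch, d_o) [MASK] embedding of the wrapped input. Returns (batch, L_p, d_i)."""
        a = torch.softmax(q_x @ self.keys.t() / math.sqrt(self.d_o), dim=-1)  # attention weights
        return torch.einsum("bk,kld->bld", a, self.values)                    # theta_x = sum_i a_i theta_i

pool = PromptPool()
print(pool(torch.randn(4, 768)).shape)   # instance-dependent prompts: (4, 8, 768)
```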
28. MetaPrompter
Proposed Method
Meta Structured-Prompting
Base learner
They use 𝜽x,j to predict the label probability with both the
hand-crafted verbalizer (P̂) and RepVerb (P̃):
P(y | x; 𝜽x,j) = (1 − 𝜆) P̂(y | x; 𝜽x,j) + 𝜆 P̃(y | x; 𝜽x,j)
where 𝜆 ∈ [0, 1].
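A small illustration of this convex combination (the names are illustrative):

```python
# Combine the hand-crafted verbalizer's distribution with RepVerb's, weighted by lambda.
import torch

def combined_label_prob(p_hat: torch.Tensor, p_tilde: torch.Tensor, lam: float = 0.5):
    """p_hat, p_tilde: (batch, num_classes) label distributions; lam in [0, 1]."""
    return (1.0 - lam) * p_hat + lam * p_tilde

p = combined_label_prob(torch.tensor([[0.7, 0.3]]), torch.tensor([[0.4, 0.6]]), lam=0.25)
print(p)   # still a valid distribution, used for the inner-loop cross-entropy loss
```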
30. MetaPrompter
Proposed Method
Meta Structured-Prompting
Meta-Testing
During meta-testing on an unseen task 𝜏′, the base learner adapts the
prompt pool on the support set S𝜏′.
An instance-dependent prompt 𝜽x,J is then constructed for each query
sample (x, ·) ∈ Q𝜏′, and the MLM predicts the label probability from the
wrapped input x̃ ≡ T(x; 𝜽x,J).
31. MetaPrompter
Proposed Method
Meta Structured-Prompting
MetaPrompter is Parameter-Efficient
Number of tunable meta-parameters:
MetaPrompter: K(do + Lp di), for the keys and values of the prompt pool.
MetaPrompting: d𝜙 + Lp di, for the whole MLM plus the prompt.
For example, they use BERT (with do = di = 768, d𝜙 = 109 × 10^6)
and K = Lp = 8 in MetaPrompter.
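Plugging the slide's numbers in gives roughly 55K tunable parameters for MetaPrompter versus about 109M for MetaPrompting, i.e., roughly three orders of magnitude fewer:

```python
# Worked parameter count under the slide's BERT-base example.
d_o = d_i = 768
K = L_p = 8
d_phi = 109_000_000                     # full MLM, as stated on the slide

metaprompter = K * (d_o + L_p * d_i)    # keys + values of the prompt pool
metaprompting = d_phi + L_p * d_i       # whole MLM + one prompt
print(metaprompter, metaprompting)      # 55_296 vs 109_006_144
```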
34. MetaPrompter
Experiments
Evaluation on RepVerb
They compare the performance of the proposed RepVerb with
state-of-the-art soft verbalizers:
1 WARP [7]
2 ProtoVerb [3]
Figure 1: t-SNE visualization of [MASK]’s embeddings (crosses) and label
embeddings (circles) for a 5-way 5-shot task randomly sampled from Reuters.
39. MetaPrompter
Conclusion
This paper proposes MetaPrompter, which combines structured
prompting and a novel verbalizer called RepVerb.
A prompt pool structure is used to construct instance-dependent
prompts by attention.
RepVerb builds label embeddings by averaging the feature embeddings of
the corresponding training samples.
41. MetaPrompter
Conclusion
References I
[1] Yujia Bao, Menghua Wu, et al. “Few-shot Text Classification
with Distributional Signatures”. In: International Conference
on Learning Representations. 2019.
[2] Junfan Chen, Richong Zhang, et al. “Contrastnet: A contrastive
learning framework for few-shot text classification”. In:
Proceedings of the AAAI Conference on Artificial Intelligence.
2022, pp. 10492–10500.
[3] Ganqu Cui, Shengding Hu, et al. “Prototypical Verbalizer for
Prompt-based Few-shot Tuning”. In: Proceedings of the 60th
Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers). 2022, pp. 7014–7024.
42. MetaPrompter
Conclusion
References II
[4] Jacob Devlin, Ming-Wei Chang, et al. “BERT: Pre-training of
Deep Bidirectional Transformers for Language Understanding”.
In: Proceedings of the 2019 Conference of the North American
Chapter of the Association for Computational Linguistics:
Human Language Technologies, Volume 1 (Long and Short
Papers). 2019, pp. 4171–4186.
[5] Chelsea Finn, Pieter Abbeel, et al. “Model-agnostic
meta-learning for fast adaptation of deep networks”. In:
International conference on machine learning. 2017,
pp. 1126–1135.
43. MetaPrompter
Conclusion
References III
[6] Tianyu Gao, Xu Han, et al. “Hybrid attention-based
prototypical networks for noisy few-shot relation
classification”. In: Proceedings of the AAAI conference on
artificial intelligence. 2019, pp. 6407–6414.
[7] Karen Hambardzumyan, Hrant Khachatrian, et al. “WARP:
Word-level Adversarial ReProgramming”. In: Proceedings of
the 59th Annual Meeting of the Association for Computational
Linguistics and the 11th International Joint Conference on
Natural Language Processing (Volume 1: Long Papers). 2021,
pp. 4921–4933.
44. MetaPrompter
Conclusion
References IV
[8] Chengcheng Han, Zeqiu Fan, et al. “Meta-Learning Adversarial
Domain Adaptation Network for Few-Shot Text Classification”.
In: Findings of the Association for Computational Linguistics:
ACL-IJCNLP 2021. 2021, pp. 1664–1673.
[9] Yutai Hou, Hongyuan Dong, et al. “MetaPrompting: Learning
to Learn Better Prompts”. In: Proceedings of the 29th
International Conference on Computational Linguistics. 2022,
pp. 3251–3262.
[10] Neil Houlsby, Andrei Giurgiu, et al. “Parameter-efficient
transfer learning for NLP”. In: International conference on
machine learning. 2019, pp. 2790–2799.
45. MetaPrompter
Conclusion
References V
[11] Junyi Li, Tianyi Tang, et al. “Learning to Transfer Prompts for
Text Generation”. In: Proceedings of the 2022 Conference of
the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies.
2022, pp. 3506–3518.
[12] Boris Oreshkin, Pau Rodríguez López, et al. “Tadam: Task
dependent adaptive metric for improved few-shot learning”. In:
Advances in neural information processing systems (2018).
[13] Alec Radford, Jeffrey Wu, et al. “Language models are
unsupervised multitask learners”. In: OpenAI blog (2019), p. 9.