LLaMA-Adapter
LLaMA-Adapter: Efficient Fine-tuning of
Language Models with Zero-init Attention
Renrui Zhang, Jiaming Han, Chris Liu et al.
Speaker: Po-Chuan Chen
Jul 25, 2023
1 / 32
LLaMA-Adapter
Table of contents
1 Abstract
2 Introduction
3 Related Work
4 LLaMA-Adapter
5 Experiment
6 Conclusion
7 Reflection
2 / 32
LLaMA-Adapter
Abstract
Abstract
This paper proposes LLaMA-Adapter¹, a lightweight adaption method to
efficiently fine-tune LLaMA into an instruction-following model.
Specifically, they adopt a set of learnable adaption prompts and
prepend them to the word tokens at the higher transformer layers.
Then, a zero-initialized attention mechanism with zero gating is
proposed, which adaptively injects the new instructional cues into
LLaMA while effectively preserving its pre-trained knowledge.
¹ https://github.com/OpenGVLab/LLaMA-Adapter
3 / 32
LLaMA-Adapter
Abstract
Abstract
LLaMA-Adapter can generate high-quality responses, comparable to
Alpaca [6] with its fully fine-tuned 7B parameters.
It can also be simply extended to multi-modal instructions for
learning an image-conditioned LLaMA model, which achieves superior
reasoning performance on the ScienceQA and COCO Caption benchmarks.
4 / 32
LLaMA-Adapter
Introduction
Table of contents
1 Abstract
2 Introduction
3 Related Work
4 LLaMA-Adapter
5 Experiment
6 Conclusion
7 Reflection
5 / 32
LLaMA-Adapter
Introduction
Introduction
Large-scale Language Models (LLMs) have attracted widespread
attention in both academia and industry. However, many of them are
impeded by closed-source restrictions and high development costs.
To alleviate this, Stanford Alpaca proposes to fine-tune an LLM, i.e.,
LLaMA [8], into an instruction-following model, which is affordable
and replicable.
Alpaca fine-tunes the entire 7B parameters of LLaMA, producing an
exceptional instruction model that performs similarly to GPT-3.5.
However, fine-tuning LLaMA this way is still time-consuming and
computation-intensive.
6 / 32
LLaMA-Adapter
Introduction
Contribution
Figure 1: Characteristics of LLaMA-Adapter.
7 / 32
LLaMA-Adapter
Related Work
Related Work
Instruction-Following Language Models.
1 FLAN
2 InstructGPT
3 GPT-3.5 / GPT-4
4 Stanford Alpaca
5 Alpaca-LoRA [7]
Parameter-Efficient Fine-Tuning (PEFT).
1 Adapters
2 Low-Rank Adaptation (LoRA)
3 Prompt tuning
8 / 32
LLaMA-Adapter
LLaMA-Adapter
Table of contents
1 Abstract
2 Introduction
3 Related Work
4 LLaMA-Adapter
Learnable Adaption Prompts
Zero-initialized Attention
Multi-modal Reasoning
Zero-initialized Attention for other Large Models
5 Experiment
9 / 32
LLaMA-Adapter
LLaMA-Adapter
Learnable Adaption Prompts
Learnable Adaption Prompts
For the learnable adaption prompts used in instruction-following
fine-tuning, they use 52K instruction-output pairs [9] and a
pre-trained LLaMA with an $N$-layer transformer.
The prompts for the topmost $L$ transformer layers are defined as
$\{P_l\}_{l=1}^{L}$, where $P_l \in \mathbb{R}^{K \times C}$, with $K$ denoting the prompt
length for each layer and $C$ the feature dimension of LLaMA's
transformer.
Since they want to tune the language representations with higher-level
semantics, $L \leq N$.
10 / 32
LLaMA-Adapter
LLaMA-Adapter
Learnable Adaption Prompts
Learnable Adaption Prompts (cont.)
The learnable adaption prompt is concatenated with the word tokens
$T_l \in \mathbb{R}^{M \times C}$ along the token dimension as a prefix, formulated as

$[P_l; T_l] \in \mathbb{R}^{(K+M) \times C}$  (1)

where $M$ is the length of the word tokens.
In this way, the instruction knowledge learned within $P_l$ can
effectively guide $T_l$ to generate the subsequent contextual response
via the attention layers in the transformer block.
11 / 32
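Below is a minimal PyTorch sketch of Equation (1): a learnable $K \times C$
prompt prepended to the word tokens of one of the topmost layers. The module
name `AdaptionPrompt` and the toy shapes are illustrative assumptions, not the
authors' implementation.

```python
import torch
import torch.nn as nn

class AdaptionPrompt(nn.Module):
    """Learnable K x C prompt prepended to one layer's word tokens."""
    def __init__(self, prompt_len: int, dim: int):
        super().__init__()
        # P_l in R^{K x C}, learned during fine-tuning
        self.prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, M, C) word tokens T_l of this layer
        prompt = self.prompt.unsqueeze(0).expand(tokens.size(0), -1, -1)
        # Equation (1): [P_l; T_l] in R^{(K+M) x C}
        return torch.cat([prompt, tokens], dim=1)

# Toy usage: K = 10 prompt tokens, M = 32 word tokens, C = 4096.
adapter = AdaptionPrompt(prompt_len=10, dim=4096)
print(adapter(torch.randn(2, 32, 4096)).shape)  # torch.Size([2, 42, 4096])
```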
LLaMA-Adapter
LLaMA-Adapter
Zero-initialized Attention
Zero-initialized Attention
They found that if the adaption prompts are randomly initialized, they
may harm the stability and effectiveness of fine-tuning.
Therefore, they modify the vanilla attention mechanisms at the last $L$
transformer layers to be zero-initialized attention.
12 / 32
LLaMA-Adapter
LLaMA-Adapter
Zero-initialized Attention
Figure 2: Details of LLaMA-Adapter.
13 / 32
LLaMA-Adapter
LLaMA-Adapter
Zero-initialized Attention
Zero-initialized Attention (cont.)
In the attention mechanism, several linear projection layers are first
applied to transform the input tokens into queries, keys, and values:

$Q_l = \mathrm{Linear_q}(t_l)$,  (2)
$K_l = \mathrm{Linear_k}([P_l; T_l; t_l])$,  (3)
$V_l = \mathrm{Linear_v}([P_l; T_l; t_l])$,  (4)

where $t_l \in \mathbb{R}^{1 \times C}$ denotes the current token at the $l$-th layer.
Then, the attention scores of $Q_l$ and $K_l$ before the softmax function
are calculated as

$S_l = Q_l K_l^T / \sqrt{C} \in \mathbb{R}^{1 \times (K+M+1)}$.  (5)
14 / 32
LLaMA-Adapter
LLaMA-Adapter
Zero-initialized Attention
Zero-initialized Attention (cont.)
Meanwhile, $S_l$ can be reformulated into two components as

$S_l = [S_l^K; S_l^{M+1}]^T$,  (6)

where $S_l^K \in \mathbb{R}^{K}$ and $S_l^{M+1} \in \mathbb{R}^{(M+1) \times 1}$ are the attention scores of
the $K$ adaption prompts and the $M+1$ word tokens, respectively.
To this end, they adopt a learnable gating factor, denoted as $g_l$, to
adaptively control the importance of $S_l^K$ in the attention.
Therefore, they independently apply the softmax function to the two
components in Equation (6), and multiply the first term by $g_l$,
formulated as

$S_l^g = [\mathrm{softmax}(S_l^K) \cdot g_l;\ \mathrm{softmax}(S_l^{M+1})]^T$.  (7)
15 / 32
LLaMA-Adapter
LLaMA-Adapter
Zero-initialized Attention
Zero-initialized Attention (cont.)
Finally, they calculate the output of the $l$-th attention layer with a
linear projection layer as

$t_l^o = \mathrm{Linear_o}(S_l^g V_l) \in \mathbb{R}^{1 \times C}$.  (8)
16 / 32
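Putting Equations (2)-(8) together, the following is a minimal single-head
sketch of zero-initialized attention for one generation step. The class name
`ZeroInitAttention`, the single-head simplification, and the toy shapes are
assumptions for illustration; the official repository differs in detail.

```python
import math
import torch
import torch.nn as nn

class ZeroInitAttention(nn.Module):
    """Single-head sketch of zero-initialized attention (Eqs. 2-8)."""
    def __init__(self, dim: int, prompt_len: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.o = nn.Linear(dim, dim)
        self.prompt_len = prompt_len              # K
        self.gate = nn.Parameter(torch.zeros(1))  # g_l, zero-initialized

    def forward(self, prompt, tokens, cur):
        # prompt: (K, C) adaption prompt P_l
        # tokens: (M, C) word tokens T_l
        # cur:    (1, C) current token t_l
        C = cur.size(-1)
        q = self.q(cur)                                  # Eq. (2)
        kv_in = torch.cat([prompt, tokens, cur], dim=0)  # [P_l; T_l; t_l]
        k, v = self.k(kv_in), self.v(kv_in)              # Eqs. (3)-(4)
        s = q @ k.t() / math.sqrt(C)                     # Eq. (5): (1, K+M+1)
        s_prompt, s_word = s[:, :self.prompt_len], s[:, self.prompt_len:]
        # Eq. (7): separate softmaxes; the prompt branch is scaled by the gate
        attn = torch.cat([s_prompt.softmax(-1) * self.gate,
                          s_word.softmax(-1)], dim=-1)
        return self.o(attn @ v)                          # Eq. (8): (1, C)

# With the gate at zero, the output equals vanilla attention over the word
# tokens, so the pre-trained behavior is preserved at the start of training.
layer = ZeroInitAttention(dim=64, prompt_len=10)
out = layer(torch.randn(10, 64), torch.randn(32, 64), torch.randn(1, 64))
print(out.shape)  # torch.Size([1, 64])
```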
LLaMA-Adapter
LLaMA-Adapter
Multi-modal Reasoning
Multi-modal Reasoning
Apart from textual instructions, LLaMA-Adapter is capable of
answering a question based on inputs from other modalities, which
augments the language model with rich cross-modal information.
Figure 3: Multi-modal Reasoning of LLaMA-Adapter.
17 / 32
LLaMA-Adapter
LLaMA-Adapter
Multi-modal Reasoning
Multi-modal Reasoning (cont.)
For an input image as the visual context, they first leverage a
pre-trained visual encoder, e.g., CLIP [5], to extract its multi-scale
global features, denoted as $\{I_m\}_{m=1}^{M}$, where $I_m \in \mathbb{R}^{1 \times C_m}$ and $M$
denotes the number of scales.
These features are then aggregated by a learnable projection network,
formulated as

$I_p = \mathrm{Projection}(\mathrm{Concat}(\{I_m\}_{m=1}^{M}))$,  (9)

where $I_p \in \mathbb{R}^{1 \times C}$ is regarded as the overall image token with the
same feature dimension as the adaption prompts.
18 / 32
LLaMA-Adapter
LLaMA-Adapter
Multi-modal Reasoning
Multi-modal Reasoning (cont.)
They then repeat $I_p$ $K$ times and element-wisely add it onto the
$K$-length adaption prompts at all $L$ inserted transformer layers. For
the $l$-th layer, they denote the acquired multi-modal prompt as

$P_l^v = P_l + \mathrm{Repeat}(I_p) \in \mathbb{R}^{K \times C}$,  (10)

where $P_l^v$ denotes the adaption prompt incorporating visual
information from the given image context.
19 / 32
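A minimal sketch of Equations (9) and (10), assuming the multi-scale CLIP
features have already been extracted. The single `nn.Linear` standing in for
the projection network and the toy shapes are assumptions, not the paper's
exact architecture.

```python
import torch
import torch.nn as nn

def visual_prompt(prompts, image_feats, projection):
    """Eqs. (9)-(10): fold a global image token into the adaption prompts.

    prompts:     (L, K, C) adaption prompts P_l of the L inserted layers
    image_feats: list of (1, C_m) multi-scale global features {I_m}
    projection:  learnable network mapping the concatenated features to (1, C)
    """
    i_p = projection(torch.cat(image_feats, dim=-1))  # Eq. (9): (1, C)
    # Eq. (10): repeat I_p over the K prompt positions and add element-wise
    return prompts + i_p.unsqueeze(0)                 # broadcasts over L and K

# Toy usage with two feature scales (C_1 = 512, C_2 = 768) and C = 4096.
L, K, C = 8, 10, 4096
prompts = torch.randn(L, K, C)
feats = [torch.randn(1, 512), torch.randn(1, 768)]
projection = nn.Linear(512 + 768, C)  # stand-in for the projection network
print(visual_prompt(prompts, feats, projection).shape)  # torch.Size([8, 10, 4096])
```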
LLaMA-Adapter
LLaMA-Adapter
Zero-initialized Attention for other Large Models
Zero-initialized Attention for other Large Models
Here, the vision model is ViT [1] and the language model is RoBERTa [4].
Vision Models. They insert the adaption prompts as a prefix into
the topmost L transformer layers of ViT, and modify the attention
operations to be zero-initialized at all inserted layers.
Language Models. They implement the zero-initialized
attention on top of P-tuning v2 [3], a prompt tuning method for
efficiently adapting large language models. Likewise, they only
enable the prompt tokens in P-tuning v2 and their zero gating
factors to be learnable during fine-tuning.
20 / 32
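For other large models the recipe is the same parameter-efficiency trick:
keep the pre-trained backbone frozen and train only the inserted prompts and
their zero-initialized gates. A minimal sketch, assuming the parameter naming
from the snippets above (not the official code):

```python
# Freeze a backbone (e.g., a ViT or RoBERTa whose top L blocks carry
# adaption prompts and zero-init gates) except for those new parameters.
def mark_trainable(model):
    for name, param in model.named_parameters():
        param.requires_grad = ("prompt" in name) or ("gate" in name)
    n = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"trainable parameters: {n}")
```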
LLaMA-Adapter
Experiment
Table of contents
1 Abstract
2 Introduction
3 Related Work
4 LLaMA-Adapter
5 Experiment
6 Conclusion
7 Reflection
21 / 32
LLaMA-Adapter
Experiment
Experiment
Instruction-following Evaluation
Multi-modal Evaluation
Ablation Study
Zero-initialized Attention for other Large Models
22 / 32
LLaMA-Adapter
Experiment
Figure 4: Instruction-following Comparison.
23 / 32
LLaMA-Adapter
Experiment
Table 1: Question Answering Accuracy (%) on ScienceQA’s test set.
24 / 32
LLaMA-Adapter
Experiment
Ablation Study
The ablation study focuses on the number of inserted layers,
zero-initialized attention, and robustness to over-fitting.
Table 2: Inserted Layers (left) and Zero-initialized Attention (right)
Table 3: Robustness to Over-fitting
25 / 32
LLaMA-Adapter
Experiment
Zero-initialized Attention for other Large Models
Table 4: Vision (left) / Language (right) Model Fine-tuning
This demonstrates the superiority of zero-initialized attention on
traditional vision and language tasks compared to existing fine-tuning
methods.
26 / 32
LLaMA-Adapter
Conclusion
Table of contents
1 Abstract
2 Introduction
3 Related Work
4 LLaMA-Adapter
5 Experiment
6 Conclusion
7 Reflection
27 / 32
LLaMA-Adapter
Conclusion
Conclusion
In this paper, they propose LLaMA-Adapter, an efficient adaption
method for training instruction-following models.
Also, they introduce zero-initialized attention with a gating
mechanism, which adaptively incorporates instructional signals while
preserving the pre-trained knowledge in LLaMA.
LLaMA-Adapter can be generalized to image conditions for
multi-modal reasoning, as well as to traditional vision and language
tasks. Their zero-initialized attention also attains favorable
fine-tuning performance, which indicates strong generalization capacity.
28 / 32
LLaMA-Adapter
Reflection
Reflection
This work presents parameter-efficient tuning of LLaMA: a way to
add additional parameters as prefixes to fine-tune the language model.

$[\mathrm{Softmax}(QK_1^T),\ \alpha \cdot \mathrm{Softmax}(QK_2^T)]\,[V_1^T, V_2^T]^T$  (11)

In their follow-up paper, LLaMA-Adapter V2 [2], they instead add the
parameter inside the scaled dot-product attention. The two
formulations are equivalent, which means LLaMA-Adapter can be
implemented in a flexible way.

$\mathrm{Softmax}(QK_1^T)\,V_1 + \alpha \cdot \mathrm{Softmax}(QK_2^T)\,V_2$  (12)
29 / 32
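The equivalence of Equations (11) and (12) is just the block structure of the
matrix product; a quick numerical check with arbitrarily chosen shapes (the
32/10/64 sizes are illustrative assumptions):

```python
import torch

torch.manual_seed(0)
q = torch.randn(1, 64)
k1, v1 = torch.randn(32, 64), torch.randn(32, 64)  # word-token branch
k2, v2 = torch.randn(10, 64), torch.randn(10, 64)  # prompt branch
alpha = 0.3

# Eq. (11): concatenated attention weights times the stacked values
weights = torch.cat([(q @ k1.t()).softmax(-1),
                     alpha * (q @ k2.t()).softmax(-1)], dim=-1)
out_11 = weights @ torch.cat([v1, v2], dim=0)

# Eq. (12): two separate attention terms, the prompt branch scaled by alpha
out_12 = (q @ k1.t()).softmax(-1) @ v1 + alpha * (q @ k2.t()).softmax(-1) @ v2

print(torch.allclose(out_11, out_12, atol=1e-6))  # True
```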
LLaMA-Adapter
Reflection
References I
[1] Alexey Dosovitskiy et al. An Image is Worth 16x16 Words:
Transformers for Image Recognition at Scale. 2021. arXiv:
2010.11929 [cs.CV].
[2] Peng Gao et al. LLaMA-Adapter V2: Parameter-Efficient Visual
Instruction Model. 2023. arXiv: 2304.15010 [cs.CV].
[3] Xiao Liu et al. P-Tuning v2: Prompt Tuning Can Be Comparable
to Fine-tuning Universally Across Scales and Tasks. 2022. arXiv:
2110.07602 [cs.CL].
[4] Yinhan Liu et al. RoBERTa: A Robustly Optimized BERT
Pretraining Approach. 2019. arXiv: 1907.11692 [cs.CL].
30 / 32
LLaMA-Adapter
Reflection
References II
[5] Alec Radford et al. “Learning Transferable Visual Models From
Natural Language Supervision”. In: Proceedings of the 38th
International Conference on Machine Learning. Ed. by
Marina Meila and Tong Zhang. Vol. 139. Proceedings of
Machine Learning Research. PMLR, July 2021, pp. 8748–8763.
url: https:
//proceedings.mlr.press/v139/radford21a.html.
[6] Rohan Taori et al. Stanford Alpaca: An Instruction-following
LLaMA model.
https://github.com/tatsu-lab/stanford_alpaca. 2023.
[7] tloen. Alpaca-LoRA.
https://github.com/tloen/alpaca-lora. 2023.
31 / 32
LLaMA-Adapter
Reflection
References III
[8] Hugo Touvron et al. LLaMA: Open and Efficient Foundation
Language Models. 2023. arXiv: 2302.13971 [cs.CL].
[9] Yizhong Wang et al. Self-Instruct: Aligning Language Models with
Self-Generated Instructions. 2022. arXiv: 2212.10560 [cs.CL].
32 / 32
