SlideShare a Scribd company logo
1 of 14
Download to read offline
LORA: LOW-RANK ADAPTATION OF LARGE
LANGUAGE MODELS
2023.3.9
유용상
NLP 티타임
Introduction
• We take inspiration from Li et al. (2018a); Aghajanyan et al. (2020) which
show that the learned over-parametrized models in fact reside on a low
intrinsic dimension.
• We hypothesize that the change in weights during model adaptation also
has a low “intrinsic rank”, leading to our proposed Low-Rank Adaptation
(LoRA) approach.
Introduction : Adventages
• A pre-trained model can be shared and used to build many small LoRA modules for
different tasks. We can freeze the shared model and efficiently switch tasks by replacing
the matrices in reducing the storage requirement and task-switching overhead
significantly.
• LoRA makes training more efficient and lowers the hardware barrier to entry by up to 3
times when using adaptive optimizers. optimize the injected, much smaller low-rank
matrices.
• Our simple linear design allows us to merge the trainable matrices with the frozen
weights when deployed, introducing no inference latency compared to a fully fine-tuned
model, by construction.
• LoRA is orthogonal to many prior methods and can be combined with many of them,
such as prefix-tuning.
Problem Statement
175B
82B
178B
530B
280B
Problem Statement
Aren’t Existing Solutions Good Enough?
Low-Rank Parametrized Update Matrices
• Forward Pass
• Update
W0 + ∆W = W0 + BA
• Hypothesis :
가중치에 대한 update도 adaptation 중 intrinsic rank가 낮을
것이다.
Low-Rank Parametrized Update Matrices
Low-Rank Parametrized Update Matrices
Low-Rank Parametrized Update Matrices
Empirical Experiments
Empirical Experiments
Conclusion
• Large-scale model을 효율적으로 튜닝하는 LoRA 제안
• Adapter류의 기법과 다르게 inference latency가 발생하지 않음
• Prefix-tuning과 다르게 usable sequence length를 줄일 필요가 없음
• 가중치 업데이트 행렬이 low intrinsic rank를 가진다고 가정
• 논문에선 LM에 초점을 맞췄지만 이론적으로 모든 dense layer에 적용
가능
PEFT
Blog : https://huggingface.co/blog/peft
https://4n3mone.tistory.com/7
Code : https://github.com/huggingface/peft

More Related Content

What's hot

Few shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningFew shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningﺁﺻﻒ ﻋﻠﯽ ﻣﯿﺮ
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERTshaurya uppal
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]Dongmin Choi
 
Introduction to Few shot learning
Introduction to Few shot learningIntroduction to Few shot learning
Introduction to Few shot learningRidge-i, Inc.
 
Benchmark comparison of Large Language Models
Benchmark comparison of Large Language ModelsBenchmark comparison of Large Language Models
Benchmark comparison of Large Language ModelsMatej Varga
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine LearningKnoldus Inc.
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models Chia-Wen Cheng
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Prakhar Rastogi
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelinesjeykottalam
 
Underfitting and Overfitting in Machine Learning
Underfitting and Overfitting in Machine LearningUnderfitting and Overfitting in Machine Learning
Underfitting and Overfitting in Machine LearningAbdullah al Mamun
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Appsilon Data Science
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...David Talby
 
Large Language Models Are Reasoning Teachers
Large Language Models Are Reasoning TeachersLarge Language Models Are Reasoning Teachers
Large Language Models Are Reasoning TeachersNamgyu Ho
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfPo-Chuan Chen
 

What's hot (20)

gpt3_presentation.pdf
gpt3_presentation.pdfgpt3_presentation.pdf
gpt3_presentation.pdf
 
Few shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningFew shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learning
 
Word embedding
Word embedding Word embedding
Word embedding
 
BERT introduction
BERT introductionBERT introduction
BERT introduction
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]
 
Introduction to Few shot learning
Introduction to Few shot learningIntroduction to Few shot learning
Introduction to Few shot learning
 
Benchmark comparison of Large Language Models
Benchmark comparison of Large Language ModelsBenchmark comparison of Large Language Models
Benchmark comparison of Large Language Models
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine Learning
 
Intro to LLMs
Intro to LLMsIntro to LLMs
Intro to LLMs
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelines
 
Underfitting and Overfitting in Machine Learning
Underfitting and Overfitting in Machine LearningUnderfitting and Overfitting in Machine Learning
Underfitting and Overfitting in Machine Learning
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
 
Ensemble methods
Ensemble methodsEnsemble methods
Ensemble methods
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
 
Large Language Models Are Reasoning Teachers
Large Language Models Are Reasoning TeachersLarge Language Models Are Reasoning Teachers
Large Language Models Are Reasoning Teachers
 
Gpt models
Gpt modelsGpt models
Gpt models
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
 

Similar to LORA: Low-Rank Adaptation of Large Language Models

Roman Kyslyi: Великі мовні моделі: огляд, виклики та рішення
Roman Kyslyi: Великі мовні моделі: огляд, виклики та рішенняRoman Kyslyi: Великі мовні моделі: огляд, виклики та рішення
Roman Kyslyi: Великі мовні моделі: огляд, виклики та рішенняLviv Startup Club
 
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...IEEEMEMTECHSTUDENTPROJECTS
 
2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...
2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...
2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...IEEEFINALSEMSTUDENTPROJECTS
 
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...IEEEFINALSEMSTUDENTPROJECTS
 
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...IEEEGLOBALSOFTSTUDENTPROJECTS
 
Parallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks SummitParallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks SummitRafael Arana
 
[DSC Europe 23] Alexander Kovalchuk - Finetuning Stable Diffusion with low-ra...
[DSC Europe 23] Alexander Kovalchuk - Finetuning Stable Diffusion with low-ra...[DSC Europe 23] Alexander Kovalchuk - Finetuning Stable Diffusion with low-ra...
[DSC Europe 23] Alexander Kovalchuk - Finetuning Stable Diffusion with low-ra...DataScienceConferenc1
 
Integrating lock free and combining techniques for a practical and scalable f...
Integrating lock free and combining techniques for a practical and scalable f...Integrating lock free and combining techniques for a practical and scalable f...
Integrating lock free and combining techniques for a practical and scalable f...jpstudcorner
 
Parallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWParallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWDataWorks Summit
 
Data warehouse 26 exploiting parallel technologies
Data warehouse  26 exploiting parallel technologiesData warehouse  26 exploiting parallel technologies
Data warehouse 26 exploiting parallel technologiesVaibhav Khanna
 
Ratpack and Grails 3 (and Spring Boot) SpringOne 2GX 2014
Ratpack and Grails 3 (and Spring Boot) SpringOne 2GX 2014Ratpack and Grails 3 (and Spring Boot) SpringOne 2GX 2014
Ratpack and Grails 3 (and Spring Boot) SpringOne 2GX 2014Lari Hotari
 
Pretzel: optimized Machine Learning framework for low-latency and high throu...
Pretzel: optimized Machine Learning framework for  low-latency and high throu...Pretzel: optimized Machine Learning framework for  low-latency and high throu...
Pretzel: optimized Machine Learning framework for low-latency and high throu...NECST Lab @ Politecnico di Milano
 
Algorithm Solved IEEE Projects 2012 2013 Java @ Seabirdssolutions
Algorithm Solved IEEE Projects 2012 2013 Java @ SeabirdssolutionsAlgorithm Solved IEEE Projects 2012 2013 Java @ Seabirdssolutions
Algorithm Solved IEEE Projects 2012 2013 Java @ SeabirdssolutionsSBGC
 
Ieee projects 2012 for cse
Ieee projects 2012 for cseIeee projects 2012 for cse
Ieee projects 2012 for cseSBGC
 
Ieee projects 2012 for cse
Ieee projects 2012 for cseIeee projects 2012 for cse
Ieee projects 2012 for cseSBGC
 
Electi Deep Learning Optimization
Electi  Deep Learning OptimizationElecti  Deep Learning Optimization
Electi Deep Learning OptimizationNikolas Markou
 
Scalable analytics for iaa s cloud availability
Scalable analytics for iaa s cloud availabilityScalable analytics for iaa s cloud availability
Scalable analytics for iaa s cloud availabilityNexgen Technology
 

Similar to LORA: Low-Rank Adaptation of Large Language Models (20)

Roman Kyslyi: Великі мовні моделі: огляд, виклики та рішення
Roman Kyslyi: Великі мовні моделі: огляд, виклики та рішенняRoman Kyslyi: Великі мовні моделі: огляд, виклики та рішення
Roman Kyslyi: Великі мовні моделі: огляд, виклики та рішення
 
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud ...
 
2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...
2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...
2014 IEEE DOTNET CLOUD COMPUTING PROJECT Scalable analytics for iaa s cloud a...
 
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
 
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
 
Cla06.ppt
Cla06.pptCla06.ppt
Cla06.ppt
 
Parallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks SummitParallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks Summit
 
[DSC Europe 23] Alexander Kovalchuk - Finetuning Stable Diffusion with low-ra...
[DSC Europe 23] Alexander Kovalchuk - Finetuning Stable Diffusion with low-ra...[DSC Europe 23] Alexander Kovalchuk - Finetuning Stable Diffusion with low-ra...
[DSC Europe 23] Alexander Kovalchuk - Finetuning Stable Diffusion with low-ra...
 
Integrating lock free and combining techniques for a practical and scalable f...
Integrating lock free and combining techniques for a practical and scalable f...Integrating lock free and combining techniques for a practical and scalable f...
Integrating lock free and combining techniques for a practical and scalable f...
 
Parallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWParallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSW
 
Data warehouse 26 exploiting parallel technologies
Data warehouse  26 exploiting parallel technologiesData warehouse  26 exploiting parallel technologies
Data warehouse 26 exploiting parallel technologies
 
Ratpack and Grails 3 (and Spring Boot) SpringOne 2GX 2014
Ratpack and Grails 3 (and Spring Boot) SpringOne 2GX 2014Ratpack and Grails 3 (and Spring Boot) SpringOne 2GX 2014
Ratpack and Grails 3 (and Spring Boot) SpringOne 2GX 2014
 
Pretzel: optimized Machine Learning framework for low-latency and high throu...
Pretzel: optimized Machine Learning framework for  low-latency and high throu...Pretzel: optimized Machine Learning framework for  low-latency and high throu...
Pretzel: optimized Machine Learning framework for low-latency and high throu...
 
Algorithm Solved IEEE Projects 2012 2013 Java @ Seabirdssolutions
Algorithm Solved IEEE Projects 2012 2013 Java @ SeabirdssolutionsAlgorithm Solved IEEE Projects 2012 2013 Java @ Seabirdssolutions
Algorithm Solved IEEE Projects 2012 2013 Java @ Seabirdssolutions
 
Ieee projects 2012 for cse
Ieee projects 2012 for cseIeee projects 2012 for cse
Ieee projects 2012 for cse
 
Ieee projects 2012 for cse
Ieee projects 2012 for cseIeee projects 2012 for cse
Ieee projects 2012 for cse
 
Jooq java object oriented querying
Jooq java object oriented queryingJooq java object oriented querying
Jooq java object oriented querying
 
Electi Deep Learning Optimization
Electi  Deep Learning OptimizationElecti  Deep Learning Optimization
Electi Deep Learning Optimization
 
Scala a case4
Scala a case4Scala a case4
Scala a case4
 
Scalable analytics for iaa s cloud availability
Scalable analytics for iaa s cloud availabilityScalable analytics for iaa s cloud availability
Scalable analytics for iaa s cloud availability
 

More from YongSang Yoo

20230727_tinystories
20230727_tinystories20230727_tinystories
20230727_tinystoriesYongSang Yoo
 
221220_페르소나챗봇
221220_페르소나챗봇221220_페르소나챗봇
221220_페르소나챗봇YongSang Yoo
 
230305_Characterizing English Variation across Social Media Communities with ...
230305_Characterizing English Variation across Social Media Communities with ...230305_Characterizing English Variation across Social Media Communities with ...
230305_Characterizing English Variation across Social Media Communities with ...YongSang Yoo
 
230223_Knowledge_Distillation
230223_Knowledge_Distillation230223_Knowledge_Distillation
230223_Knowledge_DistillationYongSang Yoo
 
221108_Multimodal Transformer
221108_Multimodal Transformer221108_Multimodal Transformer
221108_Multimodal TransformerYongSang Yoo
 

More from YongSang Yoo (10)

20230727_tinystories
20230727_tinystories20230727_tinystories
20230727_tinystories
 
20230608_megabyte
20230608_megabyte20230608_megabyte
20230608_megabyte
 
221220_페르소나챗봇
221220_페르소나챗봇221220_페르소나챗봇
221220_페르소나챗봇
 
220920_AI ETHICS
220920_AI ETHICS220920_AI ETHICS
220920_AI ETHICS
 
230305_Characterizing English Variation across Social Media Communities with ...
230305_Characterizing English Variation across Social Media Communities with ...230305_Characterizing English Variation across Social Media Communities with ...
230305_Characterizing English Variation across Social Media Communities with ...
 
230223_Knowledge_Distillation
230223_Knowledge_Distillation230223_Knowledge_Distillation
230223_Knowledge_Distillation
 
221108_Multimodal Transformer
221108_Multimodal Transformer221108_Multimodal Transformer
221108_Multimodal Transformer
 
221011_BERT
221011_BERT221011_BERT
221011_BERT
 
220910_GatedRNN
220910_GatedRNN220910_GatedRNN
220910_GatedRNN
 
220906_Glove
220906_Glove220906_Glove
220906_Glove
 

Recently uploaded

Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 

Recently uploaded (20)

Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 

LORA: Low-Rank Adaptation of Large Language Models

  • 1. LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS 2023.3.9 유용상 NLP 티타임
  • 2. Introduction • We take inspiration from Li et al. (2018a); Aghajanyan et al. (2020) which show that the learned over-parametrized models in fact reside on a low intrinsic dimension. • We hypothesize that the change in weights during model adaptation also has a low “intrinsic rank”, leading to our proposed Low-Rank Adaptation (LoRA) approach.
  • 3. Introduction : Adventages • A pre-trained model can be shared and used to build many small LoRA modules for different tasks. We can freeze the shared model and efficiently switch tasks by replacing the matrices in reducing the storage requirement and task-switching overhead significantly. • LoRA makes training more efficient and lowers the hardware barrier to entry by up to 3 times when using adaptive optimizers. optimize the injected, much smaller low-rank matrices. • Our simple linear design allows us to merge the trainable matrices with the frozen weights when deployed, introducing no inference latency compared to a fully fine-tuned model, by construction. • LoRA is orthogonal to many prior methods and can be combined with many of them, such as prefix-tuning.
  • 7. Low-Rank Parametrized Update Matrices • Forward Pass • Update W0 + ∆W = W0 + BA • Hypothesis : 가중치에 대한 update도 adaptation 중 intrinsic rank가 낮을 것이다.
  • 13. Conclusion • Large-scale model을 효율적으로 튜닝하는 LoRA 제안 • Adapter류의 기법과 다르게 inference latency가 발생하지 않음 • Prefix-tuning과 다르게 usable sequence length를 줄일 필요가 없음 • 가중치 업데이트 행렬이 low intrinsic rank를 가진다고 가정 • 논문에선 LM에 초점을 맞췄지만 이론적으로 모든 dense layer에 적용 가능