SlideShare a Scribd company logo
LLaMa:
Open and Efficient Foundation
Language Models
허정원 조해창 이기성 황지현 정강민
Touvron, Hugo, et al. arXiv preprint arXiv:2302.13971 (2023).
Contents
• Introduction
• Approach
• Main Results
• Instruction Finetuning
• Conclusion
1. Introduction
• Large Languages Models focus on further scaling
• The best performances are not achieved by the largest models
- Hoffmann et al. (2022)
1. Introduction
• Hoffmann et al. (2022)
recommends training a 10B
model on 200B tokens.
• We find that the performance
of a 7B model continues to
improve even after 1T tokens.
1. Introduction
10x smaller LLaMA-13B
Outperforms
1. Introduction
• Only use publicly available data, making our work compatible with
open-sourcing
• There exist some exceptions, notably OPT(Zhang et al.), GPT-
NeoX(Black et al.), BLOOM(Scaoet et all.) and GLM(Zeng et al.) ,
but none that are competitive with PaLM-62B or Chinchilla.
• An overview of the modifications we made to the transformer
architecture
• Compare with others LLMs on a set of standard benchmarks
2. Approach
Pre-training Data
• Reuse data sources that
have been leveraged to
train other LLMs, with
the restriction of only
using data that is
publicly available.
• Entire training dataset
contains roughly 1.4T
tokens after tokenization.
2. Approach
Architecture
• Training approach is similar to the methods described in
previous work (Brown et al., 2020; Chowdhery et al., 2022)
• Pre-normalization [GPT3]
• SwiGLU activation function [PaLM]
• Rotary Embeddings [GPTNeo]
• AdamW optimizer, with the following hyper-parameters:
β1 = 0.9, β2 = 0.95
2. Approach
Pre-normalization [GPT3] Zhang and Sennrich (2019).
• LayerNorm forces the norm of neurons to be decoupled from the
inputs and weight matrix.(internal covariate shift issue)
• RMSNorm which only focuses on re-scaling invariance and
regularizes the summed inputs simply according to the root mean
square (RMS) statistic:
=> improve the training stability
2. Approach
SwiGLU activation function [PaLM] Shazeer (2020)
• Replace the ReLU non-linearity by the SwiGLU
activation function.
• Use a dimension of
!
"
4𝑑 instead of 4d as in
PaLM
SwiGLU = xW·sigmoid(βxW) @ xV
=> improve the performance
2. Approach
Rotary Embeddings [GPTNeo] Su et al. (2021)
• Remove the absolute positional embeddings, and instead, add rotary positional
embeddings (RoPE)
PE(pos,2i) = sin(pos/100002i/d_model)
PE(pos,2i+1) = cos(pos/100002i/d_model)
2. Approach
Efficient implementation
• Efficient implementation of the causal multi-head attention to reduce memory
usage and runtime
• Reduced the amount of activations that are recomputed during the backward pass
with checkpointing.
• Reduce the memory usage of the model by using model and sequence parallelism
3. Main Results
Zero-shot and few-shot tasks
Report results on a total of 20 benchmarks:
• Common Sense Reasoning
3. Main Results
• Closed-book Question Answering
This model runs on a single V100 GPU during inference.
3. Main Results
• Reading Comprehension
• Mathematical reasoning
• Code generation
• Massive Multitask Language Understanding
3. Results
Bias, Toxicity and Misinformation
• RealToxicity Prompts
• CrowS-Pairs
• WinoGender
• TruthfulQA
3. Results
Carbon footprint wu et al. (2022)
• Estimate that we used 2048 A100-80GB for a period of approximately
5 months to develop our models.
• Wh = GPU-h×(GPU power consumption)×PUE
• tCO2eq = MWh × 0.385.
• 2,638 MWh -> 1,015 tCO2eq.
4. Conclusion
• Show that it is possible to achieve state-of-the-art performance by
training exclusively on publicly available data.
• Accelerate the development of large language models.
• Help efforts to improve their robustness.
• Mitigate known issues such as toxicity and bias.
• We believe that this model will help democratize the access and study
of LLMs, since it can be run on a single GPU.
4. Conclusion
https://github.com/facebookresearch/llama
Q & A

More Related Content

What's hot

Neural Language Generation Head to Toe
Neural Language Generation Head to Toe Neural Language Generation Head to Toe
Neural Language Generation Head to Toe
Hady Elsahar
 
LLMs Bootcamp
LLMs BootcampLLMs Bootcamp
LLMs Bootcamp
Fiza987241
 
Customizing LLMs
Customizing LLMsCustomizing LLMs
Customizing LLMs
Jim Steele
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMs
SylvainGugger
 
Pre trained language model
Pre trained language modelPre trained language model
Pre trained language model
JiWenKim
 
LangChain Intro by KeyMate.AI
LangChain Intro by KeyMate.AILangChain Intro by KeyMate.AI
LangChain Intro by KeyMate.AI
OzgurOscarOzkan
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
Databricks
 
And then there were ... Large Language Models
And then there were ... Large Language ModelsAnd then there were ... Large Language Models
And then there were ... Large Language Models
Leon Dohmen
 
Llama 2 Open Foundation and Fine-Tuned Chat Models.pdf
Llama 2 Open Foundation and Fine-Tuned Chat Models.pdfLlama 2 Open Foundation and Fine-Tuned Chat Models.pdf
Llama 2 Open Foundation and Fine-Tuned Chat Models.pdf
Dr. Yasir Butt
 
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
taozen
 
‘Big models’: the success and pitfalls of Transformer models in natural langu...
‘Big models’: the success and pitfalls of Transformer models in natural langu...‘Big models’: the success and pitfalls of Transformer models in natural langu...
‘Big models’: the success and pitfalls of Transformer models in natural langu...
Leiden University
 
Build an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfBuild an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdf
StephenAmell4
 
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapEpisode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Anant Corporation
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
Data Science Dojo
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
David Talby
 
Conversational AI with Transformer Models
Conversational AI with Transformer ModelsConversational AI with Transformer Models
Conversational AI with Transformer Models
Databricks
 
200109-Open AI Chat GPT-4-3.pptx
200109-Open AI Chat GPT-4-3.pptx200109-Open AI Chat GPT-4-3.pptx
200109-Open AI Chat GPT-4-3.pptx
andre241421
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentation
bhavesh_physics
 
Large Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfLarge Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdf
David Rostcheck
 
Gpt1 and 2 model review
Gpt1 and 2 model reviewGpt1 and 2 model review
Gpt1 and 2 model review
Seoung-Ho Choi
 

What's hot (20)

Neural Language Generation Head to Toe
Neural Language Generation Head to Toe Neural Language Generation Head to Toe
Neural Language Generation Head to Toe
 
LLMs Bootcamp
LLMs BootcampLLMs Bootcamp
LLMs Bootcamp
 
Customizing LLMs
Customizing LLMsCustomizing LLMs
Customizing LLMs
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMs
 
Pre trained language model
Pre trained language modelPre trained language model
Pre trained language model
 
LangChain Intro by KeyMate.AI
LangChain Intro by KeyMate.AILangChain Intro by KeyMate.AI
LangChain Intro by KeyMate.AI
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
 
And then there were ... Large Language Models
And then there were ... Large Language ModelsAnd then there were ... Large Language Models
And then there were ... Large Language Models
 
Llama 2 Open Foundation and Fine-Tuned Chat Models.pdf
Llama 2 Open Foundation and Fine-Tuned Chat Models.pdfLlama 2 Open Foundation and Fine-Tuned Chat Models.pdf
Llama 2 Open Foundation and Fine-Tuned Chat Models.pdf
 
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
 
‘Big models’: the success and pitfalls of Transformer models in natural langu...
‘Big models’: the success and pitfalls of Transformer models in natural langu...‘Big models’: the success and pitfalls of Transformer models in natural langu...
‘Big models’: the success and pitfalls of Transformer models in natural langu...
 
Build an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfBuild an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdf
 
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapEpisode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
 
Conversational AI with Transformer Models
Conversational AI with Transformer ModelsConversational AI with Transformer Models
Conversational AI with Transformer Models
 
200109-Open AI Chat GPT-4-3.pptx
200109-Open AI Chat GPT-4-3.pptx200109-Open AI Chat GPT-4-3.pptx
200109-Open AI Chat GPT-4-3.pptx
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentation
 
Large Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfLarge Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdf
 
Gpt1 and 2 model review
Gpt1 and 2 model reviewGpt1 and 2 model review
Gpt1 and 2 model review
 

Similar to LLaMA Open and Efficient Foundation Language Models - 230528.pdf

Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Deep Learning Italia
 
Algorithm Analysis & Solution Design.pdf
Algorithm Analysis & Solution Design.pdfAlgorithm Analysis & Solution Design.pdf
Algorithm Analysis & Solution Design.pdf
rony setyawansyah
 
Concurrent Root Cut Loops to Exploit Random Performance Variability
Concurrent Root Cut Loops to Exploit Random Performance VariabilityConcurrent Root Cut Loops to Exploit Random Performance Variability
Concurrent Root Cut Loops to Exploit Random Performance Variability
IBM Decision Optimization
 
Recent Advances in CPLEX 12.6.1
Recent Advances in CPLEX 12.6.1Recent Advances in CPLEX 12.6.1
Recent Advances in CPLEX 12.6.1
IBM Decision Optimization
 
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorPR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
Jinwon Lee
 
A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide
A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide
A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide
Joonyoung Yi
 
RCIM 2008 - Modello Scheduling
RCIM 2008 - Modello SchedulingRCIM 2008 - Modello Scheduling
RCIM 2008 - Modello SchedulingMarco Santambrogio
 
Pegasus
PegasusPegasus
Pegasus
Hangil Kim
 
Paper_An Efficient Garbage Collection in Java Virtual Machine via Swap I/O O...
Paper_An Efficient Garbage Collection in Java Virtual  Machine via Swap I/O O...Paper_An Efficient Garbage Collection in Java Virtual  Machine via Swap I/O O...
Paper_An Efficient Garbage Collection in Java Virtual Machine via Swap I/O O...
Hyo jeong Lee
 
On the Value of User Preferences in Search-Based Software Engineering
On the Value of User Preferences in Search-Based Software EngineeringOn the Value of User Preferences in Search-Based Software Engineering
On the Value of User Preferences in Search-Based Software Engineering
Abdel Salam Sayyad
 
Model-Based User Interface Optimization: Part IV: ADVANCED TOPICS - At SICSA ...
Model-Based User Interface Optimization: Part IV: ADVANCED TOPICS - At SICSA ...Model-Based User Interface Optimization: Part IV: ADVANCED TOPICS - At SICSA ...
Model-Based User Interface Optimization: Part IV: ADVANCED TOPICS - At SICSA ...
Aalto University
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuning
taeseon ryu
 
Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)
Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)
Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)
Benoit Combemale
 
Bert
BertBert
Recent MIP Performance Improvements in IBM ILOG CPLEX Optimization Studio
Recent MIP Performance Improvements in IBM ILOG CPLEX Optimization StudioRecent MIP Performance Improvements in IBM ILOG CPLEX Optimization Studio
Recent MIP Performance Improvements in IBM ILOG CPLEX Optimization Studio
IBM Decision Optimization
 
Seed rl paper review
Seed rl paper reviewSeed rl paper review
Seed rl paper review
Kyoungman Lee
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Databricks
 
principles of object oriented class design
principles of object oriented class designprinciples of object oriented class design
principles of object oriented class design
Neetu Mishra
 
Cutting edge hyperparameter tuning made simple with ray tune
Cutting edge hyperparameter tuning made simple with ray tuneCutting edge hyperparameter tuning made simple with ray tune
Cutting edge hyperparameter tuning made simple with ray tune
XiaoweiJiang7
 

Similar to LLaMA Open and Efficient Foundation Language Models - 230528.pdf (20)

Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
 
Algorithm Analysis & Solution Design.pdf
Algorithm Analysis & Solution Design.pdfAlgorithm Analysis & Solution Design.pdf
Algorithm Analysis & Solution Design.pdf
 
Concurrent Root Cut Loops to Exploit Random Performance Variability
Concurrent Root Cut Loops to Exploit Random Performance VariabilityConcurrent Root Cut Loops to Exploit Random Performance Variability
Concurrent Root Cut Loops to Exploit Random Performance Variability
 
Recent Advances in CPLEX 12.6.1
Recent Advances in CPLEX 12.6.1Recent Advances in CPLEX 12.6.1
Recent Advances in CPLEX 12.6.1
 
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorPR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
 
A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide
A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide
A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide
 
RCIM 2008 - Modello Scheduling
RCIM 2008 - Modello SchedulingRCIM 2008 - Modello Scheduling
RCIM 2008 - Modello Scheduling
 
Pegasus
PegasusPegasus
Pegasus
 
Paper_An Efficient Garbage Collection in Java Virtual Machine via Swap I/O O...
Paper_An Efficient Garbage Collection in Java Virtual  Machine via Swap I/O O...Paper_An Efficient Garbage Collection in Java Virtual  Machine via Swap I/O O...
Paper_An Efficient Garbage Collection in Java Virtual Machine via Swap I/O O...
 
On the Value of User Preferences in Search-Based Software Engineering
On the Value of User Preferences in Search-Based Software EngineeringOn the Value of User Preferences in Search-Based Software Engineering
On the Value of User Preferences in Search-Based Software Engineering
 
Callgraph analysis
Callgraph analysisCallgraph analysis
Callgraph analysis
 
Model-Based User Interface Optimization: Part IV: ADVANCED TOPICS - At SICSA ...
Model-Based User Interface Optimization: Part IV: ADVANCED TOPICS - At SICSA ...Model-Based User Interface Optimization: Part IV: ADVANCED TOPICS - At SICSA ...
Model-Based User Interface Optimization: Part IV: ADVANCED TOPICS - At SICSA ...
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuning
 
Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)
Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)
Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)
 
Bert
BertBert
Bert
 
Recent MIP Performance Improvements in IBM ILOG CPLEX Optimization Studio
Recent MIP Performance Improvements in IBM ILOG CPLEX Optimization StudioRecent MIP Performance Improvements in IBM ILOG CPLEX Optimization Studio
Recent MIP Performance Improvements in IBM ILOG CPLEX Optimization Studio
 
Seed rl paper review
Seed rl paper reviewSeed rl paper review
Seed rl paper review
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
 
principles of object oriented class design
principles of object oriented class designprinciples of object oriented class design
principles of object oriented class design
 
Cutting edge hyperparameter tuning made simple with ray tune
Cutting edge hyperparameter tuning made simple with ray tuneCutting edge hyperparameter tuning made simple with ray tune
Cutting edge hyperparameter tuning made simple with ray tune
 

More from taeseon ryu

VoxelNet
VoxelNetVoxelNet
VoxelNet
taeseon ryu
 
OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...
taeseon ryu
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splatting
taeseon ryu
 
JetsonTX2 Python
 JetsonTX2 Python  JetsonTX2 Python
JetsonTX2 Python
taeseon ryu
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptx
taeseon ryu
 
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
taeseon ryu
 
YOLO V6
YOLO V6YOLO V6
YOLO V6
taeseon ryu
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories
taeseon ryu
 
RL_UpsideDown
RL_UpsideDownRL_UpsideDown
RL_UpsideDown
taeseon ryu
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extraction
taeseon ryu
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learning
taeseon ryu
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
taeseon ryu
 
mPLUG
mPLUGmPLUG
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
taeseon ryu
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
taeseon ryu
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithm
taeseon ryu
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networks
taeseon ryu
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarization
taeseon ryu
 
ProximalPolicyOptimization
ProximalPolicyOptimizationProximalPolicyOptimization
ProximalPolicyOptimization
taeseon ryu
 
Dream2Control paper review
Dream2Control paper reviewDream2Control paper review
Dream2Control paper review
taeseon ryu
 

More from taeseon ryu (20)

VoxelNet
VoxelNetVoxelNet
VoxelNet
 
OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splatting
 
JetsonTX2 Python
 JetsonTX2 Python  JetsonTX2 Python
JetsonTX2 Python
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptx
 
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
 
YOLO V6
YOLO V6YOLO V6
YOLO V6
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories
 
RL_UpsideDown
RL_UpsideDownRL_UpsideDown
RL_UpsideDown
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extraction
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learning
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
 
mPLUG
mPLUGmPLUG
mPLUG
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithm
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networks
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarization
 
ProximalPolicyOptimization
ProximalPolicyOptimizationProximalPolicyOptimization
ProximalPolicyOptimization
 
Dream2Control paper review
Dream2Control paper reviewDream2Control paper review
Dream2Control paper review
 

Recently uploaded

一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 

Recently uploaded (20)

一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 

LLaMA Open and Efficient Foundation Language Models - 230528.pdf

  • 1. LLaMa: Open and Efficient Foundation Language Models 허정원 조해창 이기성 황지현 정강민 Touvron, Hugo, et al. arXiv preprint arXiv:2302.13971 (2023).
  • 2. Contents • Introduction • Approach • Main Results • Instruction Finetuning • Conclusion
  • 3. 1. Introduction • Large Languages Models focus on further scaling • The best performances are not achieved by the largest models - Hoffmann et al. (2022)
  • 4. 1. Introduction • Hoffmann et al. (2022) recommends training a 10B model on 200B tokens. • We find that the performance of a 7B model continues to improve even after 1T tokens.
  • 5. 1. Introduction 10x smaller LLaMA-13B Outperforms
  • 6. 1. Introduction • Only use publicly available data, making our work compatible with open-sourcing • There exist some exceptions, notably OPT(Zhang et al.), GPT- NeoX(Black et al.), BLOOM(Scaoet et all.) and GLM(Zeng et al.) , but none that are competitive with PaLM-62B or Chinchilla. • An overview of the modifications we made to the transformer architecture • Compare with others LLMs on a set of standard benchmarks
  • 7. 2. Approach Pre-training Data • Reuse data sources that have been leveraged to train other LLMs, with the restriction of only using data that is publicly available. • Entire training dataset contains roughly 1.4T tokens after tokenization.
  • 8. 2. Approach Architecture • Training approach is similar to the methods described in previous work (Brown et al., 2020; Chowdhery et al., 2022) • Pre-normalization [GPT3] • SwiGLU activation function [PaLM] • Rotary Embeddings [GPTNeo] • AdamW optimizer, with the following hyper-parameters: β1 = 0.9, β2 = 0.95
  • 9. 2. Approach Pre-normalization [GPT3] Zhang and Sennrich (2019). • LayerNorm forces the norm of neurons to be decoupled from the inputs and weight matrix.(internal covariate shift issue) • RMSNorm which only focuses on re-scaling invariance and regularizes the summed inputs simply according to the root mean square (RMS) statistic: => improve the training stability
  • 10. 2. Approach SwiGLU activation function [PaLM] Shazeer (2020) • Replace the ReLU non-linearity by the SwiGLU activation function. • Use a dimension of ! " 4𝑑 instead of 4d as in PaLM SwiGLU = xW·sigmoid(βxW) @ xV => improve the performance
  • 11. 2. Approach Rotary Embeddings [GPTNeo] Su et al. (2021) • Remove the absolute positional embeddings, and instead, add rotary positional embeddings (RoPE) PE(pos,2i) = sin(pos/100002i/d_model) PE(pos,2i+1) = cos(pos/100002i/d_model)
  • 12. 2. Approach Efficient implementation • Efficient implementation of the causal multi-head attention to reduce memory usage and runtime • Reduced the amount of activations that are recomputed during the backward pass with checkpointing. • Reduce the memory usage of the model by using model and sequence parallelism
  • 13. 3. Main Results Zero-shot and few-shot tasks Report results on a total of 20 benchmarks: • Common Sense Reasoning
  • 14. 3. Main Results • Closed-book Question Answering This model runs on a single V100 GPU during inference.
  • 15. 3. Main Results • Reading Comprehension • Mathematical reasoning • Code generation • Massive Multitask Language Understanding
  • 16. 3. Results Bias, Toxicity and Misinformation • RealToxicity Prompts • CrowS-Pairs • WinoGender • TruthfulQA
  • 17. 3. Results Carbon footprint wu et al. (2022) • Estimate that we used 2048 A100-80GB for a period of approximately 5 months to develop our models. • Wh = GPU-h×(GPU power consumption)×PUE • tCO2eq = MWh × 0.385. • 2,638 MWh -> 1,015 tCO2eq.
  • 18. 4. Conclusion • Show that it is possible to achieve state-of-the-art performance by training exclusively on publicly available data. • Accelerate the development of large language models. • Help efforts to improve their robustness. • Mitigate known issues such as toxicity and bias. • We believe that this model will help democratize the access and study of LLMs, since it can be run on a single GPU.
  • 20. Q & A