Large Language Model, Meta AI: LLaMA 2
© Rk Rahul
LLaMA - Overview
● LLaMA is a family of large language models (LLMs)
● LLaMA was trained in four model sizes: 7, 13, 33, and 65 billion parameters
● LLaMA was developed by Meta
● First released in February 2023
LLaMA 2 - Overview
● LLaMA 2 is a family of large language models (LLMs)
● LLaMA 2 is an auto-regressive language model
● First released on July 18, 2023, by Meta in partnership with Microsoft, as an open-source large language model
● LLaMA 2 pretrained models are trained on 2 trillion tokens and have double the context length of LLaMA 1
● Three model sizes were trained: 7, 13, and 70 billion parameters
● LLaMA 2 is available for free for research and commercial use
LLaMA 2 – Can Do
● Generate creative text in many formats, such as poems, code, scripts, musical pieces, emails, and letters
● Translate languages
● Write different kinds of creative content
● Answer questions in an informative way, even if they are open-ended, challenging, or strange
● Help with coding tasks
● Generate dialogue for chatbots and other conversational AI systems
LLaMA 2 - Improvements
● Increased training data: LLaMA 2 is trained on 40% more tokens than LLaMA 1
● Longer context length: the context window is doubled to 4k tokens
● Fine-tuning for dialogue: the fine-tuned versions (labelled LLaMA 2-Chat) are optimized for dialogue applications using Reinforcement Learning from Human Feedback (RLHF)
Fine-Tuning Process and LLaMA-2-Chat
LLaMA 2 Building Process
1. Pre-Training
2. Supervised Fine-Tuning
3. Reinforcement Learning from Human Feedback (RLHF)
4. Reward Model
LLaMA 2 Pre-Training
● The pretraining approach uses an optimized auto-regressive transformer, with several changes to improve performance
● Grouped-query attention (GQA) is used to improve inference scalability (see the sketch after this list)
● Trained on 2 trillion tokens of data for good performance
● The model uses the standard transformer architecture
● Pre-normalization using RMSNorm (Root Mean Square Layer Normalization)
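Grouped-query attention shares each key/value head across a group of query heads, which shrinks the KV cache at inference time. A minimal sketch of the idea, with simplified tensor shapes and no masking or caching; this is an illustration, not the actual LLaMA 2 implementation:

import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
    # with n_kv_heads < n_q_heads, so several query heads share one key/value head
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    repeat = n_q_heads // n_kv_heads
    k = k.repeat_interleave(repeat, dim=1)   # expand shared KV heads to match the query heads
    v = v.repeat_interleave(repeat, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v     # (batch, n_q_heads, seq, head_dim)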
LLaMA 2 Pre-Training Normalization
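Complementing the RMSNorm point above, a minimal PyTorch sketch of pre-normalization with RMSNorm; the epsilon value and the usage comment are assumptions, not taken from the slides:

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root Mean Square Layer Normalization: rescales activations by their RMS,
    with a learned gain and no mean-centering or bias term."""
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms_inv = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms_inv)

# Pre-normalization means each sub-layer normalizes its input rather than its output,
# e.g. x = x + attention(norm(x)) instead of x = norm(x + attention(x))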
LLaMA 2 - Pretraining Functionality
● Trained using the AdamW optimizer (β1 = 0.9, β2 = 0.95, eps = 10⁻⁵)
● Uses the SwiGLU activation function
● Uses a cosine learning rate schedule with a warmup of 2,000 steps, decaying to a final learning rate of 10% of the peak
● Weight decay of 0.1 and gradient clipping of 1.0
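A hedged sketch of how these optimizer settings could be wired up in PyTorch. Only the betas, eps, weight decay, warmup steps, and gradient-clipping value come from the slide; the peak learning rate and total step count are placeholders:

import math
import torch

def build_optimizer(model, peak_lr=3e-4, total_steps=500_000, warmup_steps=2000):
    # AdamW with the hyperparameters listed on the slide
    opt = torch.optim.AdamW(model.parameters(), lr=peak_lr,
                            betas=(0.9, 0.95), eps=1e-5, weight_decay=0.1)

    def cosine_with_warmup(step):
        if step < warmup_steps:                      # linear warmup
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.1 + 0.45 * (1.0 + math.cos(math.pi * progress))  # cosine decay to 10% of peak

    sched = torch.optim.lr_scheduler.LambdaLR(opt, cosine_with_warmup)
    return opt, sched

# During training, clip gradients to 1.0 before each optimizer step:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)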
LLaMA 2 - Training Hardware
● LLaMA 2 was pre-trained on Meta's Research Super Cluster (RSC) as well as internal production clusters
● Both clusters use NVIDIA A100 GPUs
● RSC uses NVIDIA Quantum InfiniBand, while the production clusters use RoCE (RDMA over Converged Ethernet)
LLaMA 2 - Supervised Fine-Tuning (SFT)
● SFT uses a next-token prediction objective that is nearly identical to the one used in pre-training (a training-step sketch follows below)
● Text is encoded with the same LLaMA 2 tokenizer used for pre-training
● Supervised fine-tuning uses a cosine learning rate schedule with an initial learning rate of 2 × 10⁻⁵, a weight decay of 0.1, a batch size of 64, and a sequence length of 4,096 tokens
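A minimal sketch of one SFT training step with the next-token prediction objective, assuming a model that maps token ids to logits; the model and data are placeholders, only the optimizer hyperparameters come from the slide:

import torch
import torch.nn.functional as F

def sft_step(model, optimizer, input_ids):
    # input_ids: (batch=64, seq_len=4096) token ids of concatenated prompt + answer text
    logits = model(input_ids)                     # (batch, seq_len, vocab_size)
    # next-token prediction: predict token t+1 from tokens up to t
    loss = F.cross_entropy(
        logits[:, :-1, :].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )
    # (the Llama 2 paper also zeroes out the loss on the prompt tokens; omitted here)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.1)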
LLaMA 2 - Tokenizer
● To encode text, the tokenizer first splits all numbers into individual digits. Because LLaMA 2 is a subword language model, it can learn to represent numbers using a small number of subwords
● LLaMA 2 uses a byte-pair encoding (BPE) tokenizer based on the SentencePiece implementation
● The total vocabulary size is 32k tokens
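A tiny illustration of the digit-splitting idea; this is not the actual SentencePiece BPE tokenizer, just a sketch of why a number like "2023" never needs its own vocabulary entry:

def split_number(token: str) -> list[str]:
    # numbers are never single vocabulary entries; they are encoded digit by digit,
    # so any number can be built from just the ten digit subwords
    return list(token) if token.isdigit() else [token]

print(split_number("2023"))   # ['2', '0', '2', '3']
print(split_number("llama"))  # ['llama']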
LLaMA 2 - Tokenizer
LLaMA 2 - RLHF
● Reinforcement learning from human feedback (RLHF) is a model training procedure that is applied to a fine-tuned language model to further align its behavior with human preferences and instruction following
● RLHF collects data that represents sampled human preferences: human annotators select which of two model outputs they prefer (a sketch of such a preference record follows below)
● Safety-focused data is also collected during RLHF
● This human feedback is subsequently used to train a reward model, which learns patterns in the preferences of the human annotators and can then automate preference decisions
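A hedged sketch of what one collected preference comparison could look like as a data record; the field names are illustrative, not the actual Llama 2 annotation schema:

from dataclasses import dataclass

@dataclass
class PreferenceExample:
    prompt: str        # prompt shown to the model and the annotator
    chosen: str        # the model output the annotator preferred
    rejected: str      # the other sampled model output
    is_safety: bool    # whether it came from the safety-focused collection

example = PreferenceExample(
    prompt="Explain what a reward model does.",
    chosen="A reward model scores a response for helpfulness and safety ...",
    rejected="idk",
    is_safety=False,
)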
LLaMA 2 - Reward Model
● The reward model is responsible for telling the language model what constitutes a good response, based on how helpful and safe that response is
● The reward model takes a model response and its corresponding prompt as inputs, and outputs a scalar score indicating the quality of the model's generation (a minimal sketch follows below)
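A minimal sketch of the idea: a scalar scoring head on top of a transformer backbone, trained with a binary ranking loss so the chosen response scores higher than the rejected one. The backbone, hidden size, and pooling choice are placeholders, not the actual LLaMA 2 implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone                # transformer trunk without its LM head
        self.score = nn.Linear(hidden_size, 1)  # scalar reward head

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)       # (batch, seq_len, hidden_size)
        return self.score(hidden[:, -1, :]).squeeze(-1)  # one scalar per prompt+response

def ranking_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # push the chosen response's score above the rejected response's score
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

The Llama 2 paper additionally adds a margin term inside the sigmoid that reflects how strongly the annotator preferred the chosen response; it is omitted here for brevity.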
LLaMA 2 - Model Evaluations
Reference
● Deep (Learning) Focus: https://cameronrwolfe.substack.com/p/llama-2-from-the-ground-up
● Meta AI: https://ai.meta.com/
● Research article: Llama 2: Open Foundation and Fine-Tuned Chat Models
Thanks!