Large Language Models: Diving into GPT, LLaMA, and More
Shervin Minaee¹, Tomas Mikolov², Narjes Nikzad³, Meysam Chenaghlu⁴, Richard Socher⁵, Xavier Amatriain⁶, Jianfeng Gao⁷
Presented by: Nikhil Khanchandani
Table of contents
01 Introduction
02 The Evolution of Language Models
03 Transformers
04 LLM Families
05 GPT Family
06 LLaMA
07 PaLM
08 Building LLMs
09 Model Architecture & Fine-Tuning
10 LLM Augmentation
11 Challenges & Future
12 Conclusion
01 Introduction
● This project looks at a research paper that breaks down how Large Language Models (LLMs) like GPT and LLaMA are trained, what makes them powerful, and how they are used in real-world AI tools, from chatbots to reasoning agents.
● LLMs are central to the current wave of generative AI.
● We will look at how LLMs evolved, how they are built, and how they are evaluated.
02 The Evolution of Language Models
● 1950s–1990s: Early models used statistical techniques such as n-grams (a toy bigram model is sketched after this list).
● 2000s: Neural language models emerged.
● 2010s: RNNs and LSTMs improved context modeling but struggled with long-range dependencies.
● 2017: The Transformer architecture was introduced.
● 2020s: Large-scale LLMs such as GPT-3, PaLM, and LLaMA emerged and became widely adopted.
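To make the n-gram idea concrete, here is a minimal sketch (my own illustration, not from the paper) of a bigram language model: it counts adjacent word pairs in a tiny made-up corpus and turns the counts into conditional probabilities.

```python
from collections import Counter, defaultdict

# Toy corpus; a real n-gram model would be estimated from millions of sentences.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

# Count bigrams (pairs of adjacent words), including sentence boundary markers.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for prev, curr in zip(words, words[1:]):
        bigram_counts[prev][curr] += 1

def bigram_prob(prev, curr):
    """P(curr | prev) estimated by relative frequency."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][curr] / total if total else 0.0

print(bigram_prob("the", "cat"))  # 0.25: "the" is followed by "cat" once out of 4 occurrences
```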
03 Transformers
● Replaced recurrence with self-attention (a minimal sketch follows this list)
○ Parallel processing of sequences
○ Long-range context
○ Good scalability
● Modern LLMs such as GPT, PaLM, and LLaMA are all built on this architecture
● The original Transformer uses an encoder-decoder architecture; most modern LLMs are decoder-only variants
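A minimal sketch (my own illustration, not the paper's code) of scaled dot-product self-attention in NumPy. `x` stands for a sequence of token embeddings; the projection matrices are random placeholders rather than learned weights.

```python
import numpy as np

def self_attention(x, d_k=16):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    d_model = x.shape[-1]
    rng = np.random.default_rng(0)
    # Placeholder projection matrices; in a real model these are learned parameters.
    W_q = rng.normal(size=(d_model, d_k))
    W_k = rng.normal(size=(d_model, d_k))
    W_v = rng.normal(size=(d_model, d_k))

    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d_k)                     # pairwise similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the sequence
    return weights @ V                                  # each position mixes information from all others

x = np.random.randn(5, 32)        # 5 tokens, 32-dimensional embeddings
print(self_attention(x).shape)    # (5, 16)
```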
04 LLM Families
● Three Families:
○ GPT (OpenAI): Autoregressive models, from GPT-1 to
GPT-4
○ LLaMA (Meta): Open-source, efficient models
○ PaLM (Google): Large-scale, multilingual, and
reasoning-focused models.
05 GPT Family
● GPT-1 (2018): Generative pretraining + fine-tuning strategy.
● GPT-2 (2019): Strong zero-shot/few-shot performance.
● GPT-3 (2020): 175B parameters; in-context learning and few-shot prompting (see the prompt example after this list).
● GPT-4 (2023): Multimodal capabilities, advanced reasoning, chain-of-thought prompting.
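To illustrate few-shot prompting (my own toy example, not from the paper): a couple of solved examples are packed into the input so the model can infer the task from context alone, with no gradient updates.

```python
# A few-shot sentiment-classification prompt; the model is expected to continue
# the pattern and emit a label for the final review.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It broke after a week and support never replied.
Sentiment: Negative

Review: Setup took five minutes and everything just worked.
Sentiment:"""

# This string would be sent to any text-completion LLM (API or local model);
# the call itself is omitted here since it depends on the provider.
print(few_shot_prompt)
```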
06 LLaMA
● LLaMA (Meta, 2023): Released models ranging from 7B to 65B parameters.
● Focus on efficiency.
● Uses curated, high-quality datasets.
● Popular for research and fine-tuning because the weights are openly available.
● Enables low-cost adaptation via techniques such as (a LoRA sketch follows this list):
○ LoRA
○ QLoRA
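A minimal sketch of the core LoRA idea (my own illustration, not Meta's code): the pretrained weight matrix is frozen and a low-rank update B·A is learned instead, so only a tiny fraction of the parameters is trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False                      # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)   # low-rank factor A
        self.B = nn.Parameter(torch.zeros(out_features, r))         # B starts at zero: no change at init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(512, 512, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192 trainable parameters vs. 262144 frozen ones
```

QLoRA follows the same recipe but stores the frozen base weights in 4-bit quantized form to cut memory further.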
07 PaLM
● PaLM was developed by Google:
○ PaLM-1: 540B parameters, trained
using the Pathways infrastructure.
○ PaLM-2: Better multilingual and
logical reasoning
● Strong performance on BIG-Bench,
MMLU, Multi-task NLP, and coding
benchmarks
● Used by Google's Bard
08 Building LLMs
● LLMs are trained on massive text corpora.
● Data quality matters: deduplication and cleaning.
● Tokenization breaks text into smaller chunks called tokens (a toy example follows this list).
● Training efficiency and model generalization are central concerns at this scale.
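A toy data-preparation sketch (illustration only, not the paper's pipeline): exact deduplication by hashing, followed by a naive whitespace "tokenization". Real pipelines use fuzzy deduplication (e.g., MinHash) and subword tokenizers such as BPE or SentencePiece.

```python
import hashlib

documents = [
    "Large language models are trained on web text.",
    "Large language models are trained on web text.",    # exact duplicate to be removed
    "Transformers replaced recurrence with self-attention.",
]

# Deduplicate: keep only the first copy of each document, identified by its hash.
seen, deduped = set(), []
for doc in documents:
    digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
    if digest not in seen:
        seen.add(digest)
        deduped.append(doc)

# Naive tokenization: split on whitespace and map each token to an integer ID.
vocab = {}
token_ids = [[vocab.setdefault(tok, len(vocab)) for tok in doc.split()] for doc in deduped]

print(len(documents), "->", len(deduped), "documents after deduplication")
print(token_ids[0])   # integer IDs are what the model actually consumes
```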
09 Model Architecture & Fine-Tuning
● Transformers use self-attention to model relationships between all words in a sequence.
● Self-attention alone is order-agnostic, so positional encoding is added to inject word order (see the sketch after this list).
● Supervised Fine-Tuning (SFT): adapts the model to specific tasks or instruction data.
● RLHF (Reinforcement Learning from Human Feedback): aligns outputs with human preferences.
● DPO (Direct Preference Optimization): a simpler alternative that optimizes directly on preference data.
● Decoding strategies (see the sampling sketch after this list):
○ Greedy, Beam Search, Top-k, Top-p
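A minimal sketch of sinusoidal positional encoding, as used in the original Transformer (illustration only; the sequence length and model dimension here are arbitrary).

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # even embedding dimensions
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even indices: sine
    pe[:, 1::2] = np.cos(angles)                       # odd indices: cosine
    return pe

# Added to the token embeddings so each position gets a distinct, order-aware signature.
print(sinusoidal_positional_encoding(seq_len=4, d_model=8).round(2))
```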
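And a minimal sketch of top-k and top-p (nucleus) sampling over a toy next-token distribution (my own example; the vocabulary and probabilities are made up).

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "mat", "ran"]
probs = np.array([0.42, 0.25, 0.18, 0.10, 0.05])   # toy next-token distribution

def top_k_sample(probs, k=3):
    """Keep only the k most likely tokens, renormalize, then sample."""
    top = np.argsort(probs)[-k:]
    masked = np.zeros_like(probs)
    masked[top] = probs[top]
    return rng.choice(len(probs), p=masked / masked.sum())

def top_p_sample(probs, p=0.8):
    """Keep the smallest set of tokens whose cumulative probability reaches p, then sample."""
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    keep = order[:cutoff]
    masked = np.zeros_like(probs)
    masked[keep] = probs[keep]
    return rng.choice(len(probs), p=masked / masked.sum())

print("top-k:", vocab[top_k_sample(probs)])
print("top-p:", vocab[top_p_sample(probs)])
```

Greedy decoding would simply take `np.argmax(probs)` every step; beam search keeps several partial sequences and extends the highest-scoring ones.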
10 LLM Augmentation
● RAG (Retrieval-Augmented Generation):
○ Adds external knowledge by retrieving relevant context at query time and including it in the prompt (a sketch follows this list)
● Tool Use:
○ LLMs can call APIs, use calculators, and run code
● Multimodal Extensions:
○ LLMs that handle images, audio, and video
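A minimal retrieval-augmented generation sketch (my own toy example): documents are scored with a simple word-overlap retriever and the best match is prepended to the prompt. A real system would use dense embeddings, a vector store, and an actual LLM call.

```python
# Toy RAG pipeline: retrieve the most relevant document, then build a grounded prompt.
documents = [
    "LLaMA was released by Meta in 2023 with models from 7B to 65B parameters.",
    "PaLM was trained by Google using the Pathways infrastructure.",
    "GPT-3 has 175 billion parameters and popularized few-shot prompting.",
]

def retrieve(query, docs):
    """Pick the document sharing the most words with the query (stand-in for embedding search)."""
    q_words = set(query.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(query, docs):
    context = retrieve(query, docs)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer using only the context above:"

prompt = build_prompt("Who released LLaMA and when?", documents)
print(prompt)   # this grounded prompt would then be sent to the LLM of your choice
```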
11 Challenges & Future
● Challenges:
○ LLMs can generate incorrect or
made-up information.
○ Outputs can reflect training data
bias, raising concerns.
○ Compute Cost: Training and
inference require massive
resources
○ Closed-Source Limitations: many leading models cannot be freely inspected or reproduced
● Future Directions:
○ Multimodal LLMs: Text + vision
+ audio + video in one model
○ LLMs with planning, memory,
and long-term goals
○ Long-Context Models: Extend input size from 2k → 100k+ tokens
○ More Open-Source Efforts
12 Conclusion
● LLMs like GPT, LLaMA, and PaLM represent a major shift in AI.
● They are built using transformers, massive datasets, and fine-tuning techniques.
● LLMs are increasingly used in real-world tools.
● Despite challenges such as bias and hallucination, research is rapidly evolving.
● The future is multimodal, agentic, and more open-source.
