Large Language Models: Diving into GPT, LLaMA, and More
Shervin Minaee¹, Tomas Mikolov², Narjes Nikzad³, Meysam Chenaghlu⁴, Richard Socher⁵, Xavier Amatriain⁶, Jianfeng Gao⁷
Presented by: Nikhil Khanchandani
Table of contents
01 Introduction
02 The Evolution of Language Models
03 Transformers
04 LLM Families
05 GPT Family
06 LLaMA
07 PaLM
08 Building LLMs
09 Model Architecture & Fine-Tuning
10 LLM Augmentation
11 Challenges & Future
12 Conclusion
01 Introduction
● This project looks at a research paper that breaks down how Large Language Models (LLMs) like GPT and LLaMA are trained, what makes them powerful, and how they are used in real-world AI tools, from chatbots to reasoning agents.
● LLMs are central to the current wave of generative AI.
● We will look at how LLMs evolved, how they are built, and how they are evaluated.
02 The Evolution of Language Models
● 1950s–1990s: Early models used statistical techniques such as n-grams (a toy bigram model is sketched after this list).
● 2000s: Neural language models emerged.
● 2010s: RNNs and LSTMs improved context modeling but struggled with long-range dependencies.
● 2017: The Transformer architecture was introduced.
● 2020s: Large-scale LLMs such as GPT-3, PaLM, and LLaMA emerged and became widely adopted.
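To make the n-gram idea concrete, here is a minimal sketch (my own illustration, not from the paper) of a bigram language model: it counts adjacent word pairs in a tiny made-up corpus and turns the counts into conditional probabilities.

```python
from collections import Counter, defaultdict

# Toy corpus; a real n-gram model would be estimated from millions of sentences.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

# Count bigrams (pairs of adjacent words), including sentence boundary markers.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for prev, curr in zip(words, words[1:]):
        bigram_counts[prev][curr] += 1

def bigram_prob(prev, curr):
    """P(curr | prev) estimated by relative frequency."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][curr] / total if total else 0.0

print(bigram_prob("the", "cat"))  # 0.25: "the" is followed by "cat" once out of 4 occurrences
```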
03 Transformers
● Replaced recurrence with self-attention (a minimal sketch follows this list)
○ Parallel processing of sequences
○ Long-range context
○ Good scalability
● Modern LLMs such as GPT, PaLM, and LLaMA are all built on this architecture
● The original Transformer uses an encoder-decoder architecture; most modern LLMs are decoder-only variants
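A minimal sketch (my own illustration, not the paper's code) of scaled dot-product self-attention in NumPy. `x` stands for a sequence of token embeddings; the projection matrices are random placeholders rather than learned weights.

```python
import numpy as np

def self_attention(x, d_k=16):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    d_model = x.shape[-1]
    rng = np.random.default_rng(0)
    # Placeholder projection matrices; in a real model these are learned parameters.
    W_q = rng.normal(size=(d_model, d_k))
    W_k = rng.normal(size=(d_model, d_k))
    W_v = rng.normal(size=(d_model, d_k))

    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d_k)                     # pairwise similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the sequence
    return weights @ V                                  # each position mixes information from all others

x = np.random.randn(5, 32)        # 5 tokens, 32-dimensional embeddings
print(self_attention(x).shape)    # (5, 16)
```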
04 LLM Families
● Three Families:
○ GPT (OpenAI): Autoregressive models, from GPT-1 to
GPT-4
○ LLaMA (Meta): Open-source, efficient models
○ PaLM (Google): Large-scale, multilingual, and
reasoning-focused models.
05 GPT Family
● GPT-1 (2018): Generative pretraining + fine-tuning strategy.
● GPT-2 (2019): Strong zero-shot/few-shot performance.
● GPT-3 (2020): 175B parameters; in-context learning and few-shot prompting (see the prompt example after this list).
● GPT-4 (2023): Multimodal capabilities, advanced reasoning, chain-of-thought prompting.
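To illustrate few-shot prompting (my own toy example, not from the paper): a couple of solved examples are packed into the input so the model can infer the task from context alone, with no gradient updates.

```python
# A few-shot sentiment-classification prompt; the model is expected to continue
# the pattern and emit a label for the final review.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It broke after a week and support never replied.
Sentiment: Negative

Review: Setup took five minutes and everything just worked.
Sentiment:"""

# This string would be sent to any text-completion LLM (API or local model);
# the call itself is omitted here since it depends on the provider.
print(few_shot_prompt)
```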
06 LLaMA
● LLaMA (Meta, 2023): Released models ranging from 7B to 65B parameters.
● Focus on efficiency.
● Uses curated, high-quality datasets.
● Popular for research and fine-tuning because the weights are openly available.
● Enables low-cost adaptation via techniques such as (a LoRA sketch follows this list):
○ LoRA
○ QLoRA
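A minimal sketch of the core LoRA idea (my own illustration, not Meta's code): the pretrained weight matrix is frozen and a low-rank update B·A is learned instead, so only a tiny fraction of the parameters is trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False                      # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)   # low-rank factor A
        self.B = nn.Parameter(torch.zeros(out_features, r))         # B starts at zero: no change at init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(512, 512, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192 trainable parameters vs. 262144 frozen ones
```

QLoRA follows the same recipe but stores the frozen base weights in 4-bit quantized form to cut memory further.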
07 PaLM
● PaLM was developed by Google:
○ PaLM-1: 540B parameters, trained
using the Pathways infrastructure.
○ PaLM-2: Better multilingual and
logical reasoning
● Strong performance on BIG-Bench,
MMLU, Multi-task NLP, and coding
benchmarks
● Used by Google's Bard
08 Building LLMs
● LLMs are trained on massive text corpora.
● Data quality matters: deduplication and cleaning.
● Tokenization breaks text into smaller chunks called tokens (a toy example follows this list).
● Training efficiency and model generalization are central concerns at this scale.
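A toy data-preparation sketch (illustration only, not the paper's pipeline): exact deduplication by hashing, followed by a naive whitespace "tokenization". Real pipelines use fuzzy deduplication (e.g., MinHash) and subword tokenizers such as BPE or SentencePiece.

```python
import hashlib

documents = [
    "Large language models are trained on web text.",
    "Large language models are trained on web text.",    # exact duplicate to be removed
    "Transformers replaced recurrence with self-attention.",
]

# Deduplicate: keep only the first copy of each document, identified by its hash.
seen, deduped = set(), []
for doc in documents:
    digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
    if digest not in seen:
        seen.add(digest)
        deduped.append(doc)

# Naive tokenization: split on whitespace and map each token to an integer ID.
vocab = {}
token_ids = [[vocab.setdefault(tok, len(vocab)) for tok in doc.split()] for doc in deduped]

print(len(documents), "->", len(deduped), "documents after deduplication")
print(token_ids[0])   # integer IDs are what the model actually consumes
```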
09 Model Architecture & Fine-Tuning
● Transformers use self-attention to model relationships between all words in a sequence.
● Self-attention alone is order-agnostic, so positional encoding is added to inject word order (see the sketch after this list).
● Supervised Fine-Tuning (SFT): adapts the model to specific tasks or instruction data.
● RLHF (Reinforcement Learning from Human Feedback): aligns outputs with human preferences.
● DPO (Direct Preference Optimization): a simpler alternative that optimizes directly on preference data.
● Decoding strategies (see the sampling sketch after this list):
○ Greedy, Beam Search, Top-k, Top-p
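A minimal sketch of sinusoidal positional encoding, as used in the original Transformer (illustration only; the sequence length and model dimension here are arbitrary).

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # even embedding dimensions
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even indices: sine
    pe[:, 1::2] = np.cos(angles)                       # odd indices: cosine
    return pe

# Added to the token embeddings so each position gets a distinct, order-aware signature.
print(sinusoidal_positional_encoding(seq_len=4, d_model=8).round(2))
```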
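And a minimal sketch of top-k and top-p (nucleus) sampling over a toy next-token distribution (my own example; the vocabulary and probabilities are made up).

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "mat", "ran"]
probs = np.array([0.42, 0.25, 0.18, 0.10, 0.05])   # toy next-token distribution

def top_k_sample(probs, k=3):
    """Keep only the k most likely tokens, renormalize, then sample."""
    top = np.argsort(probs)[-k:]
    masked = np.zeros_like(probs)
    masked[top] = probs[top]
    return rng.choice(len(probs), p=masked / masked.sum())

def top_p_sample(probs, p=0.8):
    """Keep the smallest set of tokens whose cumulative probability reaches p, then sample."""
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    keep = order[:cutoff]
    masked = np.zeros_like(probs)
    masked[keep] = probs[keep]
    return rng.choice(len(probs), p=masked / masked.sum())

print("top-k:", vocab[top_k_sample(probs)])
print("top-p:", vocab[top_p_sample(probs)])
```

Greedy decoding would simply take `np.argmax(probs)` every step; beam search keeps several partial sequences and extends the highest-scoring ones.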
10 LLM Augmentation
● RAG (Retrieval-Augmented Generation):
○ Adds external knowledge by retrieving relevant context at query time and including it in the prompt (a sketch follows this list)
● Tool Use:
○ LLMs can call APIs, use calculators, and run code
● Multimodal Extensions:
○ LLMs that handle images, audio, and video
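A minimal retrieval-augmented generation sketch (my own toy example): documents are scored with a simple word-overlap retriever and the best match is prepended to the prompt. A real system would use dense embeddings, a vector store, and an actual LLM call.

```python
# Toy RAG pipeline: retrieve the most relevant document, then build a grounded prompt.
documents = [
    "LLaMA was released by Meta in 2023 with models from 7B to 65B parameters.",
    "PaLM was trained by Google using the Pathways infrastructure.",
    "GPT-3 has 175 billion parameters and popularized few-shot prompting.",
]

def retrieve(query, docs):
    """Pick the document sharing the most words with the query (stand-in for embedding search)."""
    q_words = set(query.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(query, docs):
    context = retrieve(query, docs)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer using only the context above:"

prompt = build_prompt("Who released LLaMA and when?", documents)
print(prompt)   # this grounded prompt would then be sent to the LLM of your choice
```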
11 Challenges & Future
● Challenges:
○ LLMs can generate incorrect or
made-up information.
○ Outputs can reflect training data
bias, raising concerns.
○ Compute Cost: Training and
inference require massive
resources
○ Closed-Source Limitations: many leading models cannot be freely inspected or reproduced
● Future Directions:
○ Multimodal LLMs: Text + vision
+ audio + video in one model
○ LLMs with planning, memory,
and long-term goals
○ Long-Context Models: Extend input size from 2k → 100k+ tokens
○ More Open-Source Efforts
12 Conclusion
● LLMs like GPT, LLaMA, and PaLM represent a major shift in AI.
● They are built using transformers, massive datasets, and fine-tuning techniques.
● LLMs are increasingly used in real-world tools.
● Despite challenges such as bias and hallucination, research is rapidly evolving.
● The future is multimodal, agentic, and more open-source.
