Intro to LLMs
Loic Merckel
September 2023
linkedin.com/in/merckel
1966: ELIZA
Image source: en.wikipedia.org/wiki/ELIZA#/media/File:ELIZA_conversation.png
“While ELIZA was capable of
engaging in discourse, it
could not converse with true
understanding. However,
many early users were
convinced of ELIZA's
intelligence and
understanding, despite
Weizenbaum's insistence to
the contrary.”
Source: en.wikipedia.org/wiki/ELIZA (and
references therein).
2005: SCIgen - An Automatic CS Paper Generator
nature.com/articles/d41586-021-01436-7
news.mit.edu/2015/how-three-mit-students-fooled-scientific-journals-0414
A project using rather rudimentary technology that aimed to "maximize amusement, rather than coherence" still
causes trouble today...
pdos.csail.mit.edu/archive/scigen
2017: Google Revolutionized Text Generation
■ Vaswani et al. (2017), Attention Is All You Need (doi.org/10.48550/arXiv.1706.03762)
■ openai.com/research/better-language-models
Image generated with DALL.E: “A small robot standing on the
shoulder of a giant robot” (and slightly modified with The Gimp)
OpenAI’s Generative Pre-trained
Transformer (DALL.E, 2021; ChatGPT,
2022), as the name suggests, is built on
Transformers.
Google introduced the Transformer,
which rapidly became the state-of-the-art
approach to solving most NLP problems.
● Kiela et al. (2021), Dynabench: Rethinking Benchmarking in NLP: arxiv.org/abs/2104.14337
● Roser (2022), The brief history of artificial intelligence: The world has changed fast – what might be next?: ourworldindata.org/brief-history-of-ai
Transformers
2017
Text and shapes in blue have been added to the original work by Max Roser.
What Are Transformers?
Source: Vaswani et al. (2017), Attention Is All You Need
(doi.org/10.48550/arXiv.1706.03762)
Generative (deep learning) models for understanding and generating text,
images, and many other types of data.
Transformers analyze chunks of data, called "tokens," and learn to predict
the next token in a sequence, based on previous and, if available, following
tokens.
Auto-regression means that the model's output, such as the prediction of a
word in a sentence, is influenced by the previous words it has generated
(a minimal code sketch follows).
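The auto-regressive loop can be made concrete in a few lines. A minimal sketch, assuming the Hugging Face transformers library and GPT-2 as a stand-in small Transformer (both are illustrative choices, not models discussed in these slides):

```python
# Minimal auto-regressive (greedy) next-token generation sketch.
# GPT-2 is used only as a small, freely available Transformer; any causal LM would do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

tokens = tokenizer("The Transformer architecture was introduced in", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(tokens).logits            # scores for every token in the vocabulary
        next_id = logits[0, -1].argmax()         # greedy pick of the most likely next token
        tokens = torch.cat([tokens, next_id.view(1, 1)], dim=1)  # the output is fed back as input

print(tokenizer.decode(tokens[0]))
```

Each generated token is appended to the input before the next prediction, which is exactly the auto-regressive behavior described above.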
Music—MusicLM (Google) and Jukebox (OpenAI) generate music from text.
Image—Imagen (Google) and DALL.E (OpenAI) generate novel images from text.
Text—OpenAI’s GPT has become widely known, but other players have similar technology
(including Google, Meta, Anthropic and others).
Others—Recommender (movies, books, flight destinations), drug discovery…
Models that learn from a given dataset how to
generate new data instances.
2022: ChatGPT
“ChatGPT, the popular chatbot
from OpenAI, is estimated to have
reached 100 million monthly
active users in January, just two
months after launch, making it the
fastest-growing consumer
application in history”
statista.com/chart/29174/time-to-one-million-users
Reuters, Feb 1, 2023
https://reut.rs/3yQNlGo
The Mushrooming of Transformer-Based LLMs
PaLM (540b), LaMDA
(137b) and others (Bard
relies on LaMDA)
OPT-IML (175b), Galactica
(120b), BlenderBot3
(175b), Llama 2 (70b)
ERNIE 3.0 Titan (260b)
GPT-3 (175b), GPT-3.5 (?b),
GPT-4 (?b)
BLOOM (176b)
PanGu-𝛼 (200b)
Jurassic-1 (178b), Jurassic-2 (?b)
Exaone (300b)
Megatron-Turing NLG (530b)
(It appears that all those models rely only on
transformer-based decoders)
Source:
github.com/Mooler0410/LLMsPracti
calGuide
Now What?
In Finance…
bloomberg.com/news/articles/2023-03-07/griffin-says-trying-to-negotiate-enterprise-wide-chatgpt-license bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance
AI Mentions Boost Stock Prices
● AI-mentioning companies:
+4.6% avg. stock price
increase (nearly double that of
non-mentioning companies).
● In general, 67% of companies
that mentioned AI observed an
increase in their stock prices
→ +8.5% on average.
● Tech companies:
71% → +11.9% on avg.
● Non-tech companies:
65% → +6.8% on avg.
- Mentions of "AI" and related terms (machine learning, automation, robots, etc.).
- S&P 500 companies in 2023.
- 3-day change from the date the earnings call transcript was published. Source: wallstreetzen.com/blog/ai-mention-moves-stock-prices-2023
GPU Demand Skyrockets
Before LLMs, GPUs were primarily needed for training, and
CPUs were used for inference. However, with the emergence
of LLMs, GPUs have become almost essential for both tasks.
Paraphrasing Brannin McBee, co-founder of CoreWeave, in
Bloomberg Podcast*:
While you may train the model using 10,000 GPUs, the real
challenge arises when you need 1 million GPUs to meet the
entire inference demand. This surge in demand is expected
during the initial one to two years after the launch, and it's likely
to keep growing thereafter.
* How to Build the Ultimate GPU Cloud to Power AI | Odd Lots (youtube.com/watch?v=9OOn6u6GIqk&t=1308s)
Enhancing Productivity With Generative AI?
nature.com/articles/d41586-023-02270-9
science.org/doi/10.1126/science.adh2586
Limitations
Beware of “Hallucinations,” Which Remain Very Real
“Hallucinations” are “confident
statements that are not true.”¹
For the moment, this
phenomenon inexorably
affects all known LLMs.
1: fr.wikipedia.org/wiki/Hallucination_(intelligence_artificielle)
Yves Montand in “Le Cercle Rouge” during an attack of delirium tremens
This thing probably doesn't exist.
Concrete
Hallucinations (GPT-4)
We asked ChatGPT the first part of the third
question of the British Mathematical Olympiad
1977: bmos.ukmt.org.uk/home/bmo-1977.pdf
Is that so? Although not an obvious
hallucination, it may remind us of Fermat’s
lack of space in the margin to give the proof
of his last theorem… Perhaps here there is a
lack of tokens?
Here is a total hallucination: this statement is
evidently false.
Perhaps it meant “the
product of two negative
numbers”
Here is a total hallucination: this statement is
evidently false. (Although in this case the
inequality is indeed clearly true.)
The Saga of the Lawyer Who Used ChatGPT
nytimes.com/2023/06/08/nyregion/lawyer-chatgpt-sanctions.html
nytimes.com/2023/05/27/nyregion/avianca-airline-lawsuit-chatgpt.html
nytimes.com/2023/06/22/nyregion/lawyers-chatgpt-schwartz-loduca.html
ChatGPT: Achieving Human-Level Performance in
Professional and Academic Benchmarks
● GPT-4's performance in recent tests is
undeniably impressive.
● Study conducted by OpenAI
(openai.com/papers/gpt-4.pdf).
● Most of those tests mainly focus on high
school-level content.
● Many candidates prepare for them through
test-prep courses and resources.
● By contrast, university exams typically
require a deeper understanding of course
material and critical thinking skills.
● Uniform Bar Exam: Worth noting, but
potential overestimation concerns (see
dx.doi.org/10.2139/ssrn.4441311).
Exploring the MIT Mathematics and EECS Curriculum Using
Large Language Models
Published on Jun 15, 2023
Authors: Sarah J. Zhang, Samuel Florin, Ariel N. Lee, Eamon Niknafs, Andrei Marginean, Annie Wang, Keith
Tyser, Zad Chin, Yann Hicke, Nikhil Singh, Madeleine Udell, Yoon Kim, Tonio Buonassisi, Armando
Solar-Lezama, Iddo Drori
Abstract
We curate a comprehensive dataset of 4,550 questions and solutions from problem sets,
midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and
Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of
large language models to fulfill the graduation requirements for any MIT major in Mathematics
and EECS. Our results demonstrate that GPT-3.5 successfully solves a third of the entire MIT
curriculum, while GPT-4, with prompt engineering, achieves a perfect solve rate on a test set
excluding questions based on images. We fine-tune an open-source large language model on
this dataset. We employ GPT-4 to automatically grade model responses, providing a detailed
performance breakdown by course, question, and answer type. By embedding questions in a
low-dimensional space, we explore the relationships between questions, topics, and classes and
discover which questions and classes are required for solving other questions and classes
through few-shot learning. Our analysis offers valuable insights into course prerequisites and
curriculum design, highlighting language models' potential for learning and improving
Mathematics and EECS education.
Source: arxiv.org/abs/2306.08997
i.e., GPT-4
scored 100% on
MIT EECS
Curriculum
(Electrical
Engineering and
Computer
Science)
“No, GPT4 can’t ace MIT”
Three MIT undergrads have debunked the myth.
- 4% of the questions were unsolvable. (How did GPT-4 achieve 100%?)
- Information leak in some few-shot prompts: for those, the answer was
quasi-given in the question.
- The automatic grading using GPT-4 itself has some severe issues: a prompt
cascade reprompted (many times) when the given answer was deemed
incorrect; 16% of the questions were multiple-choice questions, hence a
quasi-guaranteed correct response.
- Bugs were found in the research script that raise serious questions regarding the
soundness of the study.
Source: flower-nutria-41d.notion.site/No-GPT4-can-t-ace-MIT-b27e6796ab5a48368127a98216c76864
Note: The paper has since been withdrawn (see official statement at people.csail.mit.edu/asolar/CoursesPaperStatement.pdf)
Chemistry May Not Be ChatGPT’s Cup of Tea
A study conducted by three researchers at the University of
Hertfordshire (UK) showed that ChatGPT struggles with
chemistry.
Real exams were used, and the authors note that “[a] well-written
question item aims to create intellectual challenge and to require
interpretation and inquiry. Questions that cannot be easily
‘Googled’ or easily answered through a single click in an
internet search engine is a focus.”
“The overall grade on the year 1 paper calculated from the top
four graded answers would be 34.1%, which does not meet the
pass criteria. The overall grade on the year 2 paper would be
18.3%, which does not meet the pass criteria.”
Source: Fergus et al., 2023, Evaluating Academic Answers Generated Using ChatGPT (pubs.acs.org/doi/10.1021/acs.jchemed.3c00087)
The “Drift” Phenomenon
Sources:
- wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0
- Chen et al., 2023, arxiv.org/abs/2307.09009
● New research from Stanford and UC Berkeley
highlights a fundamental challenge in AI
development: "drift."
● Drift occurs when improving one aspect of
complex AI models leads to a decline in
performance in other areas.
● ChatGPT has shown deterioration in basic math
operations despite advancements in other tasks.
● GPT-4 exhibits reduced responsiveness to
chain-of-thought prompting (may be intended to
mitigate potential misuse with malicious
prompts).
The “behavior of the ‘same’ LLM service can
change substantially in a relatively short amount of
time, highlighting the need for continuous monitoring
of LLMs” (Chen et al., 2023).
Techniques for Tailoring LLMs to
Specific Problems
Prompt Engineering
Fine-Tuning
Reinforcement Learning From Human Feedback (RLHF)
First We Must Have a Problem to Solve…
Source: DeepLearning.AI, licensed under CC BY-SA 2.0
Then We Need a Model
Commercial APIs
- Google, OpenAI, Anthropic, Microsoft...
- Privacy concerns may arise.
- No specific hardware requirement.
- Prompt engineering (OpenAI offers prompt fine-tuning).
Use a foundation model (many open-source models are available)
- As it is (prompt engineering),
- or fine-tuned (either full or parameter-efficient fine-tuning).
- May require specific hardware/infrastructure for hosting, fine-tuning, and
inference.
Train a model from scratch
- Requires huge resources (both data and computing power).
- (e.g., BloombergGPT, arxiv.org/abs/2303.17564.)
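As an illustration of the first route (commercial APIs), here is a minimal sketch using OpenAI's Python package in its 2023 (v0.x) form; the model name, prompt, and parameters are assumptions, and the client interface has changed in later releases:

```python
# Minimal "commercial API" sketch (OpenAI Python package v0.x, circa 2023).
import openai  # assumes OPENAI_API_KEY is set in the environment

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Act as a mathematics teacher."},  # role assignment
        {"role": "user", "content": "Why is the product of two negative numbers positive?"},
    ],
    temperature=0.2,
)
print(response["choices"][0]["message"]["content"])
```

No specific hardware is required on the caller's side, but the prompt (and thus potentially private data) leaves the organization, which is where the privacy concerns mentioned above arise.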
A Plethora of Open
Source Pre-Trained
Models
huggingface.co/models
Models should be selected
depending on:
● The problem at hand.
● The strength of the model.
● The operating costs (larger
models require more
resources).
● Other considerations (e.g.,
license).
Prompt Engineering: “Query Crafting”
Improving the output with actions like phrasing
queries, specifying styles, providing context, or
assigning roles (e.g., 'Act as a mathematics
teacher') (Wikipedia, 2023).
Some hints can be found in OpenAI’s “GPT best
practices” (OpenAI, 2023).
Chain-of-thought: a popular technique consisting
of “guiding [LLMs] to produce a sequence of
intermediate steps before giving the final answer”
(Wei et al., 2022); a small illustrative prompt is sketched below.
Sources:
- Wei, J. et al., 2022, Emergent Abilities of Large Language Models, arxiv.org/abs/2206.07682
- OpenAI, 2023, platform.openai.com/docs/guides/gpt-best-practices/six-strategies-for-getting-better-results
- Wikipedia, 2023, Prompt Engineering, en.wikipedia.org/wiki/Prompt_engineering
(graph from Wei et al., 2022)
About GSM8K benchmark: arxiv.org/abs/2110.14168
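A minimal sketch of what a chain-of-thought prompt can look like; the exemplar below is invented for illustration (in the spirit of Wei et al., 2022) and is not taken from GSM8K:

```python
# A hand-written chain-of-thought exemplar followed by the question to solve.
cot_prompt = """Q: A bakery sold 23 cakes in the morning and 18 in the afternoon.
Each cake costs 4 euros. How much money did the bakery make?
A: Let's think step by step. It sold 23 + 18 = 41 cakes in total.
At 4 euros each, that is 41 * 4 = 164 euros. The answer is 164.

Q: A train travels 60 km in the first hour and 75 km in the second hour.
How far does it travel in total?
A: Let's think step by step."""

# The exemplar shows the model how to lay out intermediate steps before the
# final answer; the model is expected to complete the last answer in the same style.
print(cot_prompt)
```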
Prompt Engineering: In-Context Learning (ICL)
In-Context Learning (ICL) consists of providing “a few input-output
examples in the model’s context (input) as a preamble
before asking the model to perform the task for an unseen
inference-time example” (Wei et al., 2022).
It is a kind of “ephemeral supervised learning.”
- Zero-shot prompting or zero-shot learning: no example
given (works for the largest LLMs; smaller ones may struggle).
- One-shot prompting: one example provided.
- Few-shot prompting: a few examples (typically 3–6).
⚠ Context window limits (e.g., 4096 tokens).
Tweet: @lufthansa Please find our
missing luggage!!
Sentiment: negative
Tweet: Will be on LH to FRA very soon.
Cheers!
Sentiment: positive
Tweet: Refused to compensate me for 2
days cancelled flights . Joke of a airline
Sentiment:
→ LLM output: negative
(Example of an input and output for two-shot prompting)
Source: Wei, J. et al., 2022, Emergent Abilities of Large Language Models, arxiv.org/abs/2206.07682
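The two-shot example above can be sent to a model as a single prompt. A minimal sketch, assuming the 2023 (v0.x) OpenAI Python package; the model name and parameters are illustrative:

```python
# Two-shot sentiment classification, mirroring the example above.
import openai

few_shot = (
    "Tweet: @lufthansa Please find our missing luggage!!\nSentiment: negative\n\n"
    "Tweet: Will be on LH to FRA very soon. Cheers!\nSentiment: positive\n\n"
    "Tweet: Refused to compensate me for 2 days cancelled flights. Joke of a airline\nSentiment:"
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": few_shot}],
    max_tokens=2,     # only the label is needed; the examples themselves consume context-window tokens
    temperature=0,
)
print(response["choices"][0]["message"]["content"].strip())  # expected: "negative"
```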
Fine-Tuning: Introduction
Few-shot learning:
- May not be sufficient for smaller models.
- Consumes tokens from the context window.
Fine-tuning is a supervised learning process
that leads to a new model (in contrast with
in-context learning, which is “ephemeral”).
Task-specific prompt-completion pair data are
required (a sketch follows the diagram below).
Base LLM + task-specific prompt-completion pairs
(prompt_1, completion_1), (prompt_2, completion_2), …, (prompt_n, completion_n)
→ Fine-tuned LLM
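A minimal sketch of what such task-specific prompt-completion pairs might look like when serialized as JSONL; the records and the file name are illustrative assumptions:

```python
# Build a small JSONL file of prompt-completion pairs for supervised fine-tuning.
import json

pairs = [
    {"prompt": "Summarize: The annual report states that revenue grew by 12% ...",
     "completion": "Revenue grew 12% year over year."},
    {"prompt": "Classify the sentiment of this tweet: Joke of a airline",
     "completion": "negative"},
]

with open("instructions.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        f.write(json.dumps(pair, ensure_ascii=False) + "\n")

# Unlike in-context examples, these pairs are used to update the model's weights,
# producing a new, fine-tuned model whose behavior persists across prompts.
```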
Full Fine-Tuning: Updating All Parameters
Fine-tuning very often means “instruction fine-tuning.”
Instruction fine-tuning: each prompt-completion pair includes a specific
instruction (summarize this, translate that, classify this tweet, …).
● Fine-tuning on a single task (e.g., summarization) may lead to a phenomenon
referred to as “catastrophic forgetting” (arxiv.org/pdf/1911.00202), where the
model loses its abilities on other tasks (this may not be a business issue, though).
● Fine-tuning on multiple tasks (e.g., summarization, translation, classification, …).
This requires a lot more training data. (E.g., see FLAN in Wei et al., 2022.)
Full fine-tuning is extremely resource-demanding, even more so for large models.
Source: Wei et al., 2022, Finetuned Language Models Are Zero-Shot Learners. arxiv.org/abs/2109.01652
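A minimal full fine-tuning sketch with Hugging Face Transformers, reusing the hypothetical instructions.jsonl file from the previous slide; the model name and hyperparameters are assumptions, not recommendations:

```python
# Full instruction fine-tuning sketch: every parameter of the base model is updated.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "EleutherAI/pythia-410m"               # assumption: any small causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

data = load_dataset("json", data_files="instructions.jsonl", split="train")

def tokenize(example):
    # Concatenate prompt and completion into one training sequence.
    return tokenizer(example["prompt"] + "\n" + example["completion"],
                     truncation=True, max_length=512)

tokenized = data.map(tokenize, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="full-ft", per_device_train_batch_size=4,
                           num_train_epochs=1, learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # labels = shifted inputs
)
trainer.train()  # updates all weights, hence the heavy resource requirements
```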
Parameter Efficient Fine-Tuning (PEFT)
Unlike full fine-tuning, PEFT preserves the vast majority of the weights of the original
model.
● Less prone to “catastrophic forgetting” on a single task.
● Often a single GPU is enough.
Three methods:
● Selective—subset of initial params to fine-tune.
● Reparameterization—reparameterize model weights using a low-rank
representation, e.g., LoRA (Hu et al., 2021).
● Additive—add trainable layers or parameters to the model; two approaches:
- Adapters: add new trainable layers to the architecture of the model.
- Soft prompts: focus on manipulating the input (this is not prompt engineering).
Source:
- coursera.org/learn/generative-ai-with-llms/lecture/rCE9r/parameter-efficient-fine-tuning-peft
- Hu et al., 2021, LoRA: Low-Rank Adaptation of Large Language Models. arxiv.org/abs/2106.09685
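A minimal LoRA sketch with the Hugging Face PEFT library; the base model, rank, and target module names are illustrative assumptions (the module names depend on the architecture):

```python
# LoRA: freeze the original weights and train small low-rank update matrices instead.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")  # assumption

lora_cfg = LoraConfig(
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["query_key_value"],   # attention projection names for this architecture
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of the parameters are trainable
# The wrapped model can then be passed to the same Trainer as in the full fine-tuning
# sketch; a single GPU is often enough, and the frozen base weights limit forgetting.
```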
The OpenAI API offers fine-tuning for gpt-3.5-turbo, but not “yet” for GPT-4.
platform.openai.com/docs/guides/fine-tuning
Fine-Tuning With
OpenAI GPT
(PEFT)
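A minimal sketch of launching such a fine-tuning job through the OpenAI API (Python package v0.x, as documented in 2023); the file name, its chat-formatted content, and the printed fields are assumptions, and the interface has evolved since:

```python
# Upload chat-formatted training data and start a gpt-3.5-turbo fine-tuning job.
import openai

# training.jsonl: one {"messages": [{"role": "user", ...}, {"role": "assistant", ...}]} object per line.
uploaded = openai.File.create(file=open("training.jsonl", "rb"), purpose="fine-tune")

job = openai.FineTuningJob.create(
    training_file=uploaded.id,
    model="gpt-3.5-turbo",   # not (yet) available for GPT-4 at the time of writing
)
print(job.id, job.status)    # the job runs asynchronously on OpenAI's side
```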
Reinforcement Learning From Human Feedback
LLMs are trained on web data that contains a lot of irrelevant material (unhelpful) and, worse,
abundant false (dishonest) and/or harmful information, e.g.,
● Potentially dangerous false medical advice.
● Valid techniques for illegal activities (hacking, deceiving, building weapons, …).
HHH (Helpful, Honest & Harmless) alignment (Askell et al., 2021): ensuring that the
model's behavior and outputs are consistent with human values, intentions, and ethical
standards.
Reinforcement Learning from Human Feedback, or RLHF (Casper et al., 2023)
● “is a technique for training AI systems to align with human goals.”
● “[It] has emerged as the central method used to finetune state-of-the-art [LLMs].”
● It relies on human judgment and consensus.
Source:
- Casper et al., 2023, Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. arxiv.org/abs/2307.15217
- Ziegler et al., 2019, Fine-Tuning Language Models from Human Preferences. arxiv.org/abs/1909.08593
- Askell et al., 2021, A General Language Assistant as a Laboratory for Alignment. arxiv.org/abs/2112.00861
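At the core of RLHF is a reward model trained on human preference comparisons (one completion preferred over another, as in the Altman quote on the next slide). A minimal, purely illustrative sketch of the pairwise loss such a reward model is typically trained with:

```python
# Pairwise preference loss for a reward model: push r(chosen) above r(rejected).
import torch
import torch.nn.functional as F

def pairwise_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected) is minimized when the human-preferred
    # completion gets the higher scalar reward.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy scores a reward model might assign to a batch of three comparisons.
chosen = torch.tensor([1.2, 0.3, 2.1])
rejected = torch.tensor([0.4, 0.5, 1.0])
print(pairwise_loss(chosen, rejected))

# The trained reward model then provides the reward signal used to further
# fine-tune the LLM with reinforcement learning (e.g., PPO).
```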
What Is RLHF by Sam Altman
5:59
What is RLHF? Reinforcement Learning with Human Feedback, …
6:07
… So, we trained these models on a lot of text data and, in that process, they
learned the underlying, …. And they can do amazing things.
6:26
But when you first play with that base model, that we call it, after you finish
training, … it can do a lot of, you know, there's knowledge in there. But it's not
very useful or, at least, it's not easy to use, let's say. And RLHF is how we
take some human feedback,
6:45
the simplest version of this is show two outputs, ask which one is better
than the other,
6:50
which one the human raters prefer, and then feed that back into the model
with reinforcement learning.
6:56
And that process works remarkably well with, in my opinion, remarkably little
data to make the model more useful. So, RLHF is how we align the model to
what humans want it to do.
Sam Altman: OpenAI CEO on
GPT-4, ChatGPT, and the Future of
AI | Lex Fridman Podcast #367
(youtu.be/L_Guz73e6fw?si=vfkdtN
CyrQa1RzZR&t=359)
Source: Liu et al., 2022, Aligning Generative Language Models with Human Values. aclanthology.org/2022.findings-naacl.18
RLHF: Example of Alignment Tasks
Performance Evaluation
Assessing and Comparing LLMs
Metrics while training the model—ROUGE (summarization) or BLEU (translation); a small sketch follows the list below.
Benchmarks—A non-exhaustive list:
- ARC (Abstraction and Reasoning Corpus, arxiv.org/pdf/2305.18354),
- HellaSwag (arxiv.org/abs/1905.07830),
- TruthfulQA (arxiv.org/abs/2109.07958),
- GLUE & SuperGLUE (General Language Understanding Evaluation, gluebenchmark.com),
- HELM (Holistic Evaluation of Language Models, crfm.stanford.edu/helm),
- MMLU (Massive Multitask Language Understanding, arxiv.org/abs/2009.03300),
- BIG-bench (arxiv.org/pdf/2206.04615).
Others—“Auto-Eval of Question-Answering Tasks”
(blog.langchain.dev/auto-eval-of-question-answering-tasks).
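A minimal sketch of computing ROUGE and BLEU with the Hugging Face evaluate library; the candidate and reference sentences are made up for illustration:

```python
# Compare a generated summary/translation against a reference with ROUGE and BLEU.
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

predictions = ["the cat sat on the mat"]
references = ["a cat was sitting on the mat"]

print(rouge.compute(predictions=predictions, references=references))
print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
```

These overlap-based metrics are useful during training, but the benchmarks listed above are what is usually reported when comparing LLMs.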
Source: Wu et al., 2023,
BloombergGPT: A Large Language
Model for Finance.
arxiv.org/abs/2303.17564 (Table 13:
“BIG-bench hard results using
standard 3-shot prompting”)
Source: Touvron et al., 2023, Llama 2: Open Foundation and Fine-Tuned Chat Models,
scontent-fra3-1.xx.fbcdn.net/v/t39.2365-6/10000000_662098952474184_2584067087619170692_n.pdf
Application Example:
Conversing With Annual Reports
Question ChatGPT About the Latest Financial
Reports?
—blog.langchain.dev/tutorial-
chatgpt-over-your-data
“[ChatGPT] doesn’t know about
your private data, it doesn’t know
about recent sources of data.
Wouldn’t it be useful if it did?”
Workflow Overview
Question
Answer
« Quels vont être les dividendes payés par action par le Groupe Crit ? »
(“What dividends per share will Groupe Crit pay?”)
« Le Groupe CRIT proposera lors de sa prochaine Assemblée Générale, le 9
juin 2023, le versement d'un dividende exceptionnel de 3,5 € par action. »
(“At its next Annual General Meeting, on June 9, 2023, Groupe CRIT will propose the
payment of an exceptional dividend of €3.5 per share.”)
The example (the question and associated
answer) is a real example (the LLM was
“gpt-3.5-turbo” from OpenAI)
Technique described in: Lewis et al., 2020.
Retrieval-augmented generation for knowledge-intensive
nlp tasks. (doi.org/10.48550/arXiv.2005.11401)
Workflow: (1) split the document into chunks; (2) compute embeddings;
(3) store them in a vector store. At query time, extract the relevant
information (“context”) from the vector store, generate a prompt
accordingly (“question + context”), and pass it to the LLM
(a code sketch follows).
Preliminary Prototype
Financial reports retrieved directly from the French AMF (“Autorité
des marchés financiers”) via their API (info-financiere.fr).
XHTML document in French.
The question and answer are in English (they would be in French
had the question been asked in French).
Except where otherwise noted, this work is licensed under
https://creativecommons.org/licenses/by/4.0/
619.io