How Does Generative AI Actually Work?
(a quick semi-technical introduction to Large Language Models)
William H. Press
Department of Computer Science
The University of Texas at Austin
May 12, 2023
What word comes after "Happy…"? Is it "aardvark"? Or "birthday"?
Simple probabilities (but trillions of them) are
at the root of all large language models (LLMs)
Me: ChatGPT, I want to explain sequential word guessing to someone. If a
sentence starts with "The boy" can you tell me what your next 10
sequential guesses of words would be, one word at a time?
Sure! Here are 10 sequential word guesses, one at a time, based on the sentence
starting with "The boy":
1. went, 2. to, 3. the, 4. park, 5. to, 6. play, 7. with, 8. his, 9. friends, 10. and
Keep in mind that as a language model, my guesses are based on patterns and
probabilities derived from the text corpus I was trained on. The actual guesses may
vary depending on the specific context and the model version being used.
What if the sentence instead begins with "The secret behind GPT-4 is"?
Here are 10 sequential word guesses:
1. "its", 2. "ability", 3. "to", 4. "understand", 5. "and", 6. "generate", 7. "human-
like", 8. "text", 9. "based", 10. "on“
Again, these are just guesses based on common patterns and do not represent an
actual sentence.
The distant ancestor of all chatbots is sequential word-
guessing in natural languages. Hence “generative”.
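To make sequential word-guessing concrete, here is a minimal Python sketch. The probability table is invented for illustration; a real LLM computes such probabilities on the fly from its learned weights, over a vocabulary of tens of thousands of tokens and for any context.

# Toy next-word model: greedy sequential guessing from a hand-made
# probability table (illustrative numbers only, not from any real model).
toy_model = {
    "The boy":         {"went": 0.31, "was": 0.22, "ran": 0.11},
    "The boy went":    {"to": 0.55, "home": 0.18, "outside": 0.09},
    "The boy went to": {"the": 0.48, "school": 0.21, "his": 0.12},
}

def guess_next(context: str) -> str:
    """Return the most probable next word for a context in the table."""
    candidates = toy_model[context]
    return max(candidates, key=candidates.get)

context = "The boy"
for _ in range(3):                    # guess three words, one at a time
    word = guess_next(context)
    print(f"{context!r} -> {word!r}")
    context += " " + word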
LLMs as “lossy text compression”
• Represent training corpus more compactly by finding and
encoding its structures and relationships
• eliminate redundancy at all levels: syntactic, semantic, multiple
sources, etc.
• the result encoded in ~10^9–10^12 matrix "weights"
• “Lossy” because information is irretrievably lost
• prompts are answered by (sort of) decompressing into highly
probable responses that could have been in the training data but,
in general, weren't there verbatim
• The decompressed data maintains accuracy when…
• it is “common sense” or “conventional wisdom”
• because there is then huge redundancy in the training data
• But can be wildly inaccurate (like “digital artifacts” in a
defective video) if query is not well represented in the
compressed training corpus
• e.g., most probable answer comes from one (wrong) document
• or variants of a widely circulated conspiracy theory
• if decompressing from no germane data at all, it just makes things
up ("hallucinates") to produce the most probable response
https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web
GPT-4: OpenAI’s latest released Large Language Model
• OpenAI isn’t actually open! Can for many purposes be
thought of as practically a Microsoft subsidiary.
• Microsoft is said to provide the unique hardware infrastructure for
OpenAI algorithm development.
• GPT = Generative Pre-trained Transformer
• Thought to have > 5×10^11 trainable parameters.
• GPT-3 had 1.75×10^11
• Trained on > several terabytes of language data
• Training cost claimed to be $100 million
• but this might include amortized R&D
• once trained, cost per query is millicents per token
• I will highlight three key elements of “secret sauce”:
• 1. transformer architecture
• 2. huge scale of parameter space and training corpus
• 3. “RLHF” Reinforcement Learning from Human Feedback
• mostly not reported on
Key #1: Transformer architecture. It is a distant
descendant of document query concepts
Document retrieval:
• Input text projected onto matrix of
possible queries
• Matrix multiply to cross queries
with keys (e.g., keywords)
• Matrix multiply to map result from
keys to values (e.g., documents)
• The brilliant idea of Vaswani et al.
(2017, "Attention Is All You Need") is to
map all of Q, K, V from the same input.
• This is “Self-Attention”
• And have all of Q, K, V learned.
• Many layers and heads allow attention to many
different levels of structure
simultaneously
• This is "Multi-headed" attention (a minimal sketch follows this slide)
https://dugas.ch/artificial_curiosity/GPT_architecture.html
[Figure: input processing (encoder) → output processing (decoder); ~10^3 layers?]
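Below is a minimal NumPy sketch of a single self-attention head (sizes and random weights are purely illustrative, not GPT-4's). The key point of Vaswani et al.: Q, K, and V are all projected from the same input, and the learned projections decide what attends to what.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """One self-attention head: Q, K, V all come from the same input X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # project the SAME input three ways
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # every token scored against every token
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: attention probabilities
    return weights @ V                               # output: probability-weighted mix of values

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 5, 16, 8                 # illustrative sizes only
X = rng.normal(size=(n_tokens, d_model))             # embedded input tokens
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (5, 8)

A "multi-headed" layer runs many such heads in parallel with different learned projections and concatenates the results; the full model stacks many such layers.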
Key #2: Sheer scale: Only a few dare to call it
emergence, but the gain-of-function is striking
• Transformer parameters:
• trillion parameters =(?) 1000 parallel instances of a billion
• a billion parameters in an instance =(?) 10^4 each for query space, key space,
value space (multiplied two at a time) + "glue"
• could think of it as looking at every token list 10^7 ways in formulating the
next response (see the parameter-count sketch after this slide)
• "stateless": looks at the whole previous dialogue as a new token list, maximum
length 32,768 tokens
• Training corpus parameters:
• many terabytes?
• ~1000x number of words a human hears or reads in a lifetime
• Many proprietary tricks:
• how to propagate gradients through the huge scale?
• how to maximize parallelism in training?
• special hardware?
• rumored that GPT-5 training is on hold because not enough GPUs are
obtainable in the world.
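As a sanity check on the scale arithmetic, here is a rough parameter count for a decoder-only transformer. The formula and the GPT-3-like sizes (d_model = 12288, 96 layers) match GPT-3's published configuration; GPT-4's sizes are not public, so anything beyond that is guesswork.

def transformer_params(d_model, n_layers, ff_mult=4, vocab=50_000):
    """Rough weight count for a decoder-only transformer (biases etc. ignored)."""
    attn = 4 * d_model * d_model              # Wq, Wk, Wv plus the output projection
    ffn = 2 * d_model * (ff_mult * d_model)   # the feed-forward "glue" (two matrices)
    embed = vocab * d_model                   # token-embedding table
    return n_layers * (attn + ffn) + embed

# GPT-3-like sizes give ~1.7e11, close to the 1.75e11 quoted above.
print(f"{transformer_params(12_288, 96):.2e}")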
Key #3: Reinforcement Learning from
Human Feedback (RLHF)
• Said to involve 40 contractors with (each?) hundreds of workers.
• 1. Humans generate an initial dataset of “typical” queries and “good” responses.
• 2. Humans rank a separate dataset of responses (some good, some bad).
• 3. The reward model is trained on this data. (A minimal sketch of this step follows this slide.)
• 4. The language model trains itself to satisfy the reward model. (How often?)
• 5. The reward model is retrained with (free!) input from hundreds of millions of
users. (How often?)
[Figure: user prompt → many output guesses → "most-human" output, with human feedback selecting among the guesses]
https://aisupremacy.substack.com/p/
what-is-reinforcement-learning-with
https://www.bradknox.net/
human-reward/tamer/
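A minimal sketch of step 3, the reward model, under simplifying assumptions of my own (a linear reward over a pre-computed feature vector for each response; OpenAI's actual reward model is itself a large transformer). The training signal is the pairwise ranking objective: a human-preferred response should receive a higher reward than the rejected one.

import numpy as np

rng = np.random.default_rng(1)
d = 32                                   # assumed size of a response-feature vector
w = np.zeros(d)                          # linear reward model: reward = w . features

# Pretend features of human-ranked pairs: one preferred, one rejected response each.
preferred = rng.normal(0.5, 1.0, size=(200, d))
rejected  = rng.normal(0.0, 1.0, size=(200, d))

lr = 0.05
for _ in range(200):                     # gradient ascent on the pairwise ranking log-likelihood
    margin = (preferred - rejected) @ w                  # reward gap for each pair
    p = 1.0 / (1.0 + np.exp(-margin))                    # P(preferred ranked above rejected)
    w += lr * ((1.0 - p)[:, None] * (preferred - rejected)).mean(axis=0)

print("P(preferred beats rejected):", p.mean())          # rises toward 1 as the model fits the rankings

Step 4 then adjusts the language model itself (e.g., with Proximal Policy Optimization, as in the ChatGPT answer on the next slide) so that its outputs score well under this reward model.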
Me: ChatGPT, When GPT-3 or GPT-4 uses RLHF, is it a separate model, or
does RLHF train the GPT weights directly?
In typical usage, RLHF doesn't directly train the base GPT model. The GPT
model is first pretrained on a large corpus of text data, learning to predict
the next word in a sentence. This pretraining process is unsupervised and
uses a variant of the Transformer architecture to learn patterns and
structures in the data.
After this pretraining, the model can be fine-tuned using reinforcement
learning from human feedback. In this process, human evaluators rank
different responses generated by the model, and these rankings are used
to create a reward model. The model is then fine-tuned to optimize this
reward model, typically using a method like Proximal Policy Optimization.
In this way, RLHF is a separate step that fine-tunes the pretrained GPT
model, rather than directly training the initial GPT weights. However,
please note that this information is based on the methodology up to
September 2021, and OpenAI's approach may have evolved since then.
“There is no moat?” (Some insiders think that the technology
can be miniaturized and will leak via open source availability to
individuals and thence startups.)
• Meta/Facebook’s entire LLM was leaked online in March, 2023.
• and later released publicly
• A May, 2023, leaked Google document (one engineer’s private
view):
• People are now running foundation models on a Pixel 6 at 5 tokens/s
• You can fine-tune a personalized AI on your laptop in an evening.
• Open-source models are [he says] faster, more customizable, more
private, and pound-for-pound more capable.
• “They are doing things with $100 and 13B params that we struggle with
at $10M and 540B.”
• LoRA ("Low-Rank Adaptation": low-rank factorization of the updates to large models) saves a factor of ~10^3 (a minimal sketch follows this slide)
• and adapters are composable to train new models, no need to go back to the full model
• Big companies might be held liable for copyright infringement on
training data, or have to buy it
• Open-source leaked models might (de-facto) not be
• Scenario: A bunch of (LoRA?) matrices appear on a Discord server
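A minimal sketch of the idea behind LoRA, with illustrative sizes of my own choosing (not any released model's). A frozen weight matrix W gets a trainable update A·B of low rank r, so only the two thin factors need to be trained, stored, or shared; this is roughly where the large savings factor and the "matrices on a Discord server" scenario come from (512× at these sizes, approaching ~10^3 at GPT-scale widths).

import numpy as np

rng = np.random.default_rng(2)
d, r = 4096, 4                                # illustrative: hidden size d, adapter rank r

W = rng.normal(size=(d, d))                   # frozen base-model weight (never updated)
A = rng.normal(scale=0.01, size=(d, r))       # trainable low-rank factor
B = np.zeros((r, d))                          # trainable low-rank factor (starts at zero,
                                              # so the adapter initially changes nothing)

def adapted_forward(x):
    """Forward pass with the low-rank update W + A@B applied on the fly."""
    return x @ W + (x @ A) @ B                # never materializes the full d x d update

full, lora = d * d, d * r + r * d
print(f"trainable: {lora:,} vs frozen: {full:,}  (~{full // lora}x fewer)")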
Me: ChatGPT, To what extent can LLMs be miniaturized, for example by low
rank factorization?
As of my last update in September 2021, there's been some promising
research showing that low-rank factorization can be used to compress neural
networks, including Transformer-based models. Weight matrices in the
Transformer's self-attention layers can be approximated well with low-rank
matrices, which can significantly reduce the model's size with minimal loss in
performance.
Low-rank factorization is just one of many techniques for model
compression. Others include knowledge distillation, where a smaller model is
trained to mimic the behavior of a larger model, and pruning, where less
important parts of the model are removed.
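To illustrate the compression claim in the answer above, here is a minimal sketch (synthetic matrix, illustrative sizes) of approximating a weight matrix with a truncated SVD. Real compression work picks ranks per layer and usually fine-tunes afterwards; the point here is only that a nearly low-rank matrix can be stored as two thin factors.

import numpy as np

rng = np.random.default_rng(3)
d, rank = 1024, 64                                   # illustrative sizes

# Synthetic "weight matrix" that is nearly low-rank, as trained layers often are.
W = rng.normal(size=(d, rank)) @ rng.normal(size=(rank, d)) + 0.01 * rng.normal(size=(d, d))

U, s, Vt = np.linalg.svd(W, full_matrices=False)     # exact factorization
W_approx = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]   # keep only the top `rank` singular values

stored_full = W.size
stored_low = U[:, :rank].size + rank + Vt[:rank].size
rel_err = np.linalg.norm(W - W_approx) / np.linalg.norm(W)
print(f"storage {stored_low:,} vs {stored_full:,}; relative error {rel_err:.3f}")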
Me: ChatGPT, Is LLM training data subject to copyright and thus needs to be
licensed?
The use of copyrighted material in training LLMs may be considered a
copyright infringement. The models learn by consuming vast amounts of text
data which may include copyrighted works. By using this information to
generate new text, the models are indirectly reproducing and repurposing
parts of the copyrighted works they were trained on, albeit in a non-
verbatim manner. Some argue that this constitutes a derivative work, which
is a right reserved for the copyright holder under U.S. Copyright Law and
international copyright agreements. The copyright holder has exclusive rights
to reproduce and create derivative works from their material.
The counter-argument hinges on the doctrine of fair use. Advocates can
argue that LLMs, which generate new, original text, use copyrighted material
in a transformative manner, which is a crucial consideration in fair use
determinations. The vast amount of training data also means any specific
copyrighted work constitutes an insubstantial portion of the input. Lastly, this
use doesn't harm the market for the original works but may encourage
further knowledge creation. However, as of September 2021, these complex
legal questions lack clear legal precedent and interpretations may vary across
jurisdictions.
The LLM Value Chain
[Figure courtesy of W. Dally]
It may turn out that most value lies with the end-service provider,
who specializes the model and knows the end-customer's needs.
Thank you.