Large Language Models Are
Reasoning Teachers
Namgyu Ho Laura Schmid Se-Young Yun
KAIST AI
🧑🏫
Short Summary
§ Chain-of-thought (CoT) reasoning [Wei 2022] enables complex reasoning
… in huge models with over 100B 🤯 parameters (over 400GB VRAM 💰).
§ We use GPT-3 175B as a reasoning teacher 🧑🏫
to teach smaller students with 70M‒6.7B parameters.
§ Diverse reasoning ✨ is a simple way to boost teaching.
§ Extensive analysis 🕵 on the emergence of reasoning.
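The step from "over 100B parameters" to "over 400GB VRAM" is consistent with simple arithmetic if one assumes 4 bytes per parameter (fp32). The slides do not state the precision, so that figure is an assumption here. A minimal sketch that counts weights only (no activations, caches, or optimizer state); the function name is hypothetical:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 4) -> float:
    """Approximate memory for model weights only, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Precision choices below are assumptions, not figures from the slides.
print(weight_memory_gb(100e9))     # 100B parameters, 4 bytes each
print(weight_memory_gb(175e9, 2))  # 175B teacher, 2 bytes each
print(weight_memory_gb(6.7e9, 2))  # largest student size mentioned, 2 bytes each
```

Under the same assumptions, the student sizes in the summary (70M to 6.7B) fit in a small fraction of the teacher's footprint, which is the motivation for distilling reasoning into them.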
Introduction
§ Background: chain-of-thought (CoT) prompting [Wei 2022] elicits step-by-step
solutions to complex reasoning tasks.
§ Standard prompting is insufficient.
§ Limitation: CoT prompting is only applicable to very large models such as
GPT-3 175B and PaLM.
§ Solution: apply CoT prompting to very large models to generate training data
on complex reasoning for smaller models.
Method: Fine-tune-CoT
Original Sample
Question
A pet store had 56 puppies. In one day they sold 24 of them and put the
rest into cages with 4 in each cage. How many cages did they use?
Answer
8
Step 1. Reasoning Generation (Large 175B Teacher Model)
Prompt (Zero-shot-CoT)
Q: A pet store had 56 puppies. In one day they sold 24 of them and put the
rest into cages with 4 in each cage. How many cages did they use?
A: Let’s think step by step.
Completion (Generated)
The store started with 56 puppies. 24 of them were sold, so that means that
there are now 32 puppies left. Since there are 4 puppies in each cage, that
means that the store now has 8 cages.
Step 2. Curation
Reasoning Sample (Curated)
Prompt
A pet store had 56 puppies. In one day they sold 24 of them and put the
rest into cages with 4 in each cage. How many cages did they use? ###
Completion
The store started with 56 puppies. 24 of them were sold, so that means that
there are now 32 puppies left. Since there are 4 puppies in each cage, that
means that the store now has 8 cages.
--> 8 END
Step 3. Fine-tuning (Small Student Model)
Dataset (curated reasoning samples)
Diverse Reasoning ✨
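The three steps on the method slide can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes curation keeps a completion only when its final number matches the reference answer, the answer parser is a hypothetical heuristic, and `canned_teacher` is a stand-in that returns the slide's example completion instead of calling a real 175B model. The `###` and `--> ... END` delimiters are copied from the curated sample shown on the slide.

```python
import re
from typing import Callable, List, Optional, Tuple

COT_TRIGGER = "Let's think step by step."


def build_teacher_prompt(question: str) -> str:
    """Step 1: Zero-shot-CoT prompt in the layout shown on the slide."""
    return f"Q: {question}\nA: {COT_TRIGGER}"


def extract_final_number(completion: str) -> Optional[str]:
    """Hypothetical parser: treat the last number in the text as the answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def curate(question: str, completion: str, gold: str) -> Optional[dict]:
    """Step 2: keep the sample only if the final answer matches (assumed rule)."""
    if extract_final_number(completion) != gold:
        return None
    return {
        "prompt": f"{question} ###",
        "completion": f"{completion.strip()}\n--> {gold} END",
    }


def make_dataset(
    samples: List[Tuple[str, str]], teacher: Callable[[str], str]
) -> List[dict]:
    """Build the fine-tuning dataset consumed in Step 3."""
    dataset = []
    for question, gold in samples:
        completion = teacher(build_teacher_prompt(question))
        sample = curate(question, completion, gold)
        if sample is not None:
            dataset.append(sample)
    return dataset


QUESTION = (
    "A pet store had 56 puppies. In one day they sold 24 of them and put "
    "the rest into cages with 4 in each cage. How many cages did they use?"
)


def canned_teacher(prompt: str) -> str:
    """Stand-in for the teacher model; returns the slide's example completion."""
    return (
        "The store started with 56 puppies. 24 of them were sold, so that "
        "means that there are now 32 puppies left. Since there are 4 puppies "
        "in each cage, that means that the store now has 8 cages."
    )


dataset = make_dataset([(QUESTION, "8")], canned_teacher)
print(dataset[0]["prompt"])
print(dataset[0]["completion"])
```

The resulting prompt/completion pairs are what a student model would be fine-tuned on; the fine-tuning call itself is omitted because the slides do not specify the training setup.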
Results
§ Performance Scalability
1. Diverse reasoning
2. Dataset size
3. Teacher performance
4. Student model scale
Results
§ Fine-tune-CoT enables significant reasoning capabilities in small models.
§ Diverse reasoning boosts performance substantially.
§ Performance is highly scalable under Fine-tune-CoT.
§ Tradeoffs must be considered between
§ Development-time cost: diverse reasoning, dataset size, teacher model
§ Inference-time cost: student model
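Diverse reasoning can be sketched as sampling several completions per question from the teacher and keeping the distinct ones that reach the reference answer. The sketch below is illustrative only: `toy_teacher` is a hypothetical stand-in for a stochastic teacher call, `D` (samples per question) is a name chosen here, and the filtering rule is an assumption. It also makes the development-time cost visible, since teacher calls grow as dataset size × D.

```python
import random
from typing import Callable, List, Tuple

Teacher = Callable[[str, random.Random], Tuple[str, str]]


def diverse_reasoning(
    samples: List[Tuple[str, str]], sample_teacher: Teacher, D: int, seed: int = 0
) -> Tuple[List[dict], int]:
    """Return (curated samples, number of teacher calls made)."""
    rng = random.Random(seed)
    curated: List[dict] = []
    calls = 0
    for question, gold in samples:
        seen = set()  # rationales already kept for this question
        for _ in range(D):
            rationale, answer = sample_teacher(question, rng)
            calls += 1
            # Keep only correct, not-yet-seen rationales (assumed rule).
            if answer == gold and rationale not in seen:
                seen.add(rationale)
                curated.append(
                    {
                        "prompt": f"{question} ###",
                        "completion": f"{rationale}\n--> {answer} END",
                    }
                )
    return curated, calls


# Canned (rationale, answer) pairs standing in for sampled teacher outputs.
CANNED = [
    ("56 - 24 = 32 puppies remain. 32 / 4 = 8 cages.", "8"),
    ("24 were sold, leaving 32. Four per cage gives 8 cages.", "8"),
    ("56 / 4 = 14 cages.", "14"),  # an incorrect sample that gets filtered out
]


def toy_teacher(question: str, rng: random.Random) -> Tuple[str, str]:
    """Hypothetical stochastic teacher: picks one canned output at random."""
    return rng.choice(CANNED)


curated, calls = diverse_reasoning([("How many cages did they use?", "8")], toy_teacher, D=8)
print(f"{calls} teacher calls, {len(curated)} distinct correct rationales kept")
```

Raising `D` buys more varied training rationales per question at a proportional increase in teacher calls, while leaving the student's inference cost unchanged.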
(Analysis & Discussion)
§ Cost analysis of data acquisition
§ How to filter teacher reasoning samples. Do we need to?
§ Emergence of reasoning in small language models
§ Distillation of emergent abilities
§ Connection with knowledge distillation
Takeaways
§ Simple distillation can transfer 🧚 reasoning abilities from very large teachers
to small students <1B for a single domain.
§ What about other emergent abilities?
§ Fine-tune-CoT with diverse reasoning is an accessible and effective approach
which is highly scalable.
§ Distillation poses a tradeoff between development costs and inference
cost/quality.
Paper
§ Why does reasoning
emerge in small models
§ Results on GPT-2, T5
Code
§ All code and data
§ $1000+ worth of teacher data
with ❤ from OSI LAB @ KAIST.

Large Language Models Are Reasoning Teachers

  • 1. Large Language Models Are Reasoning Teachers
      Namgyu Ho, Laura Schmid, Se-Young Yun (KAIST AI) 🧑🏫
  • 3. Short Summary
      § Chain-of-thought (CoT) reasoning [Wei 2022] enables complex reasoning … in huge models with over 400GB VRAM 💰.
  • 6. Short Summary
      § Chain-of-thought (CoT) reasoning [Wei 2022] enables complex reasoning … in huge models with over 100B 🤯 parameters.
      § We use GPT-3 175B as a reasoning teacher 🧑🏫 to teach smaller students with 70M‒6.7B parameters.
      § Diverse reasoning ✨ is a simple way to boost teaching.
      § Extensive analysis 🕵 on the emergence of reasoning.
  • 9. Introduction
      § Background: chain-of-thought (CoT) prompting [Wei 2022] prompts models to solve complex reasoning tasks step by step.
      § Standard prompting is insufficient.
      § Limitation: CoT prompting is only applicable to very large models such as GPT-3 175B and PaLM.
      § Solution: apply CoT prompting on very large models to generate training data on complex reasoning for smaller models.
  • 11. Method: Fine-tune-CoT
      Step 1. Reasoning Generation (Large 175B Teacher Model)
        Original Sample
          Question: A pet store had 56 puppies. In one day they sold 24 of them and put the rest into cages with 4 in each cage. How many cages did they use?
          Answer: 8
        Prompt (Zero-shot-CoT)
          Q: A pet store had 56 puppies. In one day they sold 24 of them and put the rest into cages with 4 in each cage. How many cages did they use?
          A: Let’s think step by step.
        Completion (Generated)
          The store started with 56 puppies. 24 of them were sold, so that means that there are now 32 puppies left. Since there are 4 puppies in each cage, that means that the store now has 8 cages.
      Step 2. Curation
        Reasoning Sample (Curated)
          Prompt: A pet store had 56 puppies. In one day they sold 24 of them and put the rest into cages with 4 in each cage. How many cages did they use? ###
          Completion: The store started with 56 puppies. 24 of them were sold, so that means that there are now 32 puppies left. Since there are 4 puppies in each cage, that means that the store now has 8 cages. --> 8 END
      Step 3. Fine-tuning (Small Student Model, trained on the Dataset of curated samples)
      Diverse Reasoning ✨
  • 16. Results
      § Performance Scalability
        1. Diverse reasoning
        2. Dataset size
        3. Teacher performance
        4. Student model scale
  • 21. Results
      § Fine-tune-CoT enables significant reasoning capabilities in small models.
      § Diverse reasoning boosts performance substantially.
      § Performance is highly scalable under Fine-tune-CoT.
      § Tradeoffs must be considered between
        § Development-time cost: diverse reasoning, dataset size, teacher model
        § Inference-time cost: student model
  • 22. (Analysis & Discussion)
      § Cost analysis of data acquisition
      § How to filter teacher reasoning samples. Do we need to?
      § Emergence of reasoning in small language models
      § Distillation of emergent abilities
      § Connection with knowledge distillation
  • 23. Takeaways
      § Simple distillation can transfer 🧚 reasoning abilities from very large teachers to small students (<1B parameters) for a single domain.
        § What about other emergent abilities?
      § Fine-tune-CoT with diverse reasoning is an accessible and effective approach which is highly scalable.
      § Distillation poses a tradeoff between development costs and inference cost/quality.
  • 24. Large Language Models Are Reasoning Teachers
      Namgyu Ho, Laura Schmid, Se-Young Yun (KAIST AI) 🧑🏫
      Paper
        § Why does reasoning emerge in small models?
        § Results on GPT-2, T5
      Code
        § All code and data
        § $1000+ worth of teacher data
      with ❤ from OSI LAB @ KAIST.
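
The three steps on the "Method: Fine-tune-CoT" slide can be illustrated with a minimal Python sketch. It is not the authors' released code. The names `Sample`, `zero_shot_cot_prompt`, `extract_final_number` and `curate` are hypothetical, and taking the last number in a completion as the predicted answer is a simplifying assumption made only for this example. The sketch builds the Zero-shot-CoT prompt, keeps only teacher completions whose final answer matches the gold answer, and formats each kept completion with the `###` and `--> ... END` delimiters shown on the slide. Passing several teacher completions for one question corresponds to diverse reasoning.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    question: str
    answer: str  # gold answer from the original dataset, e.g. "8"


def zero_shot_cot_prompt(question: str) -> str:
    # Step 1 prompt that would be sent to the large teacher model.
    return f"Q: {question}\nA: Let's think step by step."


def extract_final_number(completion: str) -> Optional[str]:
    # Assumption for this sketch: the last number in the text is the answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def curate(sample: Sample, completions: list[str]) -> list[dict[str, str]]:
    # Step 2: keep distinct completions that reach the gold answer and
    # reformat them as prompt/completion pairs for fine-tuning (Step 3).
    curated = []
    seen = set()
    for text in completions:
        text = text.strip()
        if text in seen:
            continue  # drop verbatim duplicates among the teacher outputs
        if extract_final_number(text) != sample.answer:
            continue  # drop reasoning that ends in a wrong answer
        seen.add(text)
        curated.append(
            {
                "prompt": f"{sample.question} ###",
                "completion": f" {text} --> {sample.answer} END",
            }
        )
    return curated
```

With the pet-store question from the slide and gold answer "8", a completion ending in "8 cages." is kept, while one ending in a different number is discarded. The student is then fine-tuned on the resulting prompt/completion pairs.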