Scaling Instruction-Finetuned Language Models
Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, and 30 more authors
Google Research
arXiv:2210.11416 (2022)
Presenters: 박산희, 조해창, 변현정, 이세현
1
Motivation
2
Motivation
3
General ability on unseen tasks
Low resource and efficient scaling
Large Language Models
Motivation
4
[Diagram: Large Language Models → (Pathway …., x-shot tuning) → Held-out Tasks]
General ability on unseen tasks
Low resource and efficient scaling
Introduction
5
In AI, a generalized model that can answer unseen tasks is an important goal.
Finetuning language models on a collection of tasks phrased as instructions enables the model to respond better to instructions and reduces the need for few-shot exemplars.
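As a rough illustration, "phrased as instructions" means wrapping each raw example in a natural-language template. The sketch below uses a hypothetical NLI template (the paper's actual templates live in the google-research/FLAN repository):

```python
# Minimal sketch of phrasing one NLI example as an instruction.
# This template is illustrative, not one of the paper's actual templates.

def to_instruction(premise: str, hypothesis: str) -> str:
    return (
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis? Answer yes, no, or maybe."
    )

prompt = to_instruction("The dog is sleeping on the porch.", "An animal is resting.")
print(prompt)  # The model is finetuned to map this prompt to the target "yes".
```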
Introduction
6
The need for scaling in instruction finetuning: the number of tasks and the size of the model
MMLU Progress
Image reference : https://www.youtube.com/watch?v=QdwETwqyREY
FLAN (Jason Wei et al., ICLR 2022)
Instruction tuning: finetuning language models on a collection of datasets described via instructions
FLAN (Jason Wei et al., ICLR 2022)
A pretrained language model is instruction-tuned on the mixture of all datasets, with examples from each dataset.
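One common way to build such a mixture is examples-proportional sampling with a per-dataset cap, as in T5-style multi-task training; a minimal sketch, with made-up dataset names, sizes, and cap:

```python
import random

# Examples-proportional mixing with a cap, so that very large datasets
# do not dominate the finetuning mixture. All numbers are illustrative.
dataset_sizes = {"anli": 100_000, "boolq": 9_000, "wmt_en_de": 4_000_000}
CAP = 30_000

weights = {name: min(size, CAP) for name, size in dataset_sizes.items()}

def sample_source() -> str:
    """Choose which dataset the next finetuning example is drawn from."""
    names, w = zip(*weights.items())
    return random.choices(names, weights=w)[0]

counts = {name: 0 for name in weights}
for _ in range(10_000):
    counts[sample_source()] += 1
print(counts)  # wmt_en_de is capped at the same weight as a 30k-example set
```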
Flan Finetuning
9
473 datasets, 146 task categories, 1,836 total tasks: a combination of tasks from Flan (ICLR 2022), T0, and Natural Instructions.
Flan Finetuning
10
473 datasets, 146 task categories, 1,836 total tasks: a combination of tasks from Flan (ICLR 2022), T0, and Natural Instructions.
Flan Finetuning
11
Chain-of-thought finetuning mixture: the dataset formats comprise four combinations: with/without exemplars × with/without CoT.
→ Finetuning on CoT annotations improves performance on unseen reasoning tasks.
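A minimal sketch of the four formats (the field names and the "The answer is" pattern are assumptions for illustration, not the paper's exact schema):

```python
# Build one finetuning example in any of the four formats:
# {zero-shot, few-shot} x {without CoT, with CoT}.

def format_example(question, answer, rationale=None, exemplars=(), use_cot=False):
    parts = []
    for ex in exemplars:  # an empty tuple means zero-shot (no exemplars)
        if use_cot:
            ex_target = f"{ex['rationale']} The answer is {ex['answer']}."
        else:
            ex_target = ex["answer"]
        parts.append(f"Q: {ex['question']}\nA: {ex_target}")
    parts.append(f"Q: {question}\nA:")
    target = f"{rationale} The answer is {answer}." if use_cot else answer
    return "\n\n".join(parts), target

exemplar = {"question": "2 + 2?", "rationale": "2 plus 2 equals 4.", "answer": "4"}
for shots in ((), (exemplar,)):          # without / with exemplars
    for cot in (False, True):            # without / with CoT
        prompt, target = format_example(
            "3 + 5?", "8", rationale="3 plus 5 equals 8.",
            exemplars=shots, use_cot=cot,
        )
        print(prompt, "->", target, "\n")
```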
Flan Finetuning
12
Finetuning procedure: T5, PaLM, and U-PaLM
Evaluation Protocol
MMLU, BBH, TyDiQA, MGSM
13
MMLU (Massive Multitask Language Understanding)
Various branches of knowledge (humanities, social sciences, hard sciences, and other areas that are important for some people to learn): 57 tasks, 15,908 questions.
Human-level accuracy is 34.5% for unspecialized humans (Amazon Mechanical Turk) and 89.8% for expert-level humans.
BIG-Bench (Beyond the Imitation Game Benchmark)
200 diverse text-based tasks, such as mathematics, commonsense reasoning, and question answering. BIG-Bench Hard (BBH) is a subset of 23 particularly challenging BIG-Bench tasks: multiple-choice QA, open-domain QA, multi-label classification, etc.
Evaluation Protocol
MMLU, BBH, TyDiQA, MGSM
14
TyDiQA
Question answering across 8 typologically diverse languages.
MGSM
Multilingual benchmark of mathematics problems translated into 10 languages.
Code is released at https://github.com/google-research/FLAN
Scaling to 540B parameters and 1.8K tasks
15
[Figure: scaling curves, with annotated instruction-finetuning gains of 9.4% and 15.5%.]
Scaling the number of finetuning tasks also improves performance, but the gains are small after 282 tasks.
→ Conjectures: the additional tasks are not diverse enough, and the model has already gained most of the relevant knowledge during pretraining.
Scaling the model size even further might improve performance.
Experiments are run on three PaLM model sizes; the evaluation metric is few-shot prompted accuracy (exact match).
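A minimal sketch of that exact-match metric (the normalization here, lowercasing and whitespace collapsing, is an assumption; exact details vary by benchmark):

```python
def exact_match_accuracy(predictions, references):
    """Fraction of model predictions that exactly match the reference answer."""
    norm = lambda s: " ".join(s.lower().split())  # assumed light normalization
    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)

print(exact_match_accuracy(["Paris", " rome "], ["paris", "Paris"]))  # 0.5
```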
Scaling to 540B parameters and 1.8K tasks
16
Task mixtures are added in the order CoT → Muffin → T0-SF → NIV2.
Increasing the number of tasks in the finetuning data improves the performance of Flan-PaLM.
CoT tuning has the largest effect on MGSM-like math word problems.
4. Finetuning with chain-of-thought annotations
17
Reasoning ability with chain-of-thought data
New state-of-the-art
4. Finetuning with chain-of-thought annotations
18
Reasoning ability with chain-of-thought data
[Figure: held-out non-CoT benchmarks (MMLU, BBH, TyDiQA) and held-out CoT benchmarks (MMLU, BBH, MGSM), comparing CoT+non-CoT finetuning, non-CoT finetuning, CoT finetuning, and no finetuning.]
They stratify the evaluation into held-out CoT benchmarks and held-out non-CoT benchmarks.
It is critical to finetune on some CoT examples in order to maintain reasoning abilities.
4. Finetuning with chain-of-thought annotations
19
Unlocking zero-shot reasoning
Instruction finetuning on CoT data, both with and without exemplars, improves CoT reasoning performance in the zero-shot setting.
On BBH, finetuning Flan-PaLM on some CoT datasets enables zero-shot CoT reasoning on unseen tasks.
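Concretely, zero-shot CoT only appends a trigger phrase to the prompt; a sketch, using the "Let's think step by step." phrase from Kojima et al. (2022):

```python
question = (
    "A juggler has 16 balls. Half of the balls are golf balls, and half of "
    "the golf balls are blue. How many blue golf balls are there?"
)

direct_prompt = f"Q: {question}\nA:"                            # answer only
zero_shot_cot = f"Q: {question}\nA: Let's think step by step."  # CoT trigger

# Without CoT finetuning, models typically need few-shot CoT exemplars;
# after Flan finetuning, the trigger alone elicits step-by-step reasoning.
print(zero_shot_cot)
```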
Putting it all together
20
Results on the generality of instruction finetuning across several models.
[Table: instruction finetuning improves every model family; combining it with UL2R (Flan-U-PaLM) achieves the best results, well above GPT-3 175B (43.9% on MMLU).]
Usability evaluation of open-ended generation
21
Evaluation of human preferences among open-ended responses
• 190 evaluation examples were created: questions in the challenging categories of creativity, reasoning over contexts, complex reasoning, planning, and explanation.
• In the zero-shot setting, Flan-PaLM is preferred by a large margin. When a CoT trigger phrase is used, rater preference for Flan-PaLM over PaLM increases by around 10%.
Takeaways
22
• Scaling curves for instruction finetuning
• Scaling up the number of finetuning tasks and the model size (prior work: 1.6K tasks on a 3B model, or 62 tasks on a 137B model → this work: 1.8K tasks on a 540B model)
→ Scaling both the model size and the number of finetuning tasks is expected to continue improving performance.
• CoT finetuning is critical for reasoning abilities
• Shows that CoT finetuning a large model improves performance on held-out tasks
→ In contrast to prior work, CoT combined with Flan improves performance on held-out (unseen) tasks
• Instruction finetuning generalizes across models
• In Section 5, instruction finetuning yields improvements across different model architectures.
→ We can use Flan-T5 and Flan-U-PaLM!
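For instance, the released Flan-T5 checkpoints load directly with Hugging Face transformers; a sketch (the checkpoint size and prompt are arbitrary choices):

```python
# Requires: pip install transformers sentencepiece torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

inputs = tokenizer(
    "Answer the following yes/no question. Can a dog drive a car?",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # e.g. "no"
```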
Takeaways
23
• Instruction finetuning improves usability and mitigates some potential harms
• Shown by human preference in open-ended evaluations.
→ Similarly to InstructGPT, the finetuned models produce output that is better aligned with human preferences.
• Instruction finetuning is relatively compute-efficient
• Instruction finetuning PaLM 540B requires only 0.2% of the pre-training compute.
→ This motivates developing techniques that leverage existing checkpoints.
→ It does not change the inference cost of the model.
Discussion
24
Haechang: The chain-of-thought instruction annotation process is not described in detail.
Sanhee: The chain-of-thought + instruction experiments showed improved reasoning ability, but the paper seems to lack guidance and experiments on the quality of the chain-of-thought prompts and on an engineering approach to their optimal number.
25
References
26
Q & A
27
Thank You
Appendix
Task ratio per mixtures
28
Appendix
Task Mixtures: NIV2
29
Natural Instructions V2
Appendix
30
Prompt Style
Appendix
FLAN (Jason Wei, ICLR 2022)
Chain-of-Thought (CoT) (Jason Wei, NeurIPS 2022)
CoT prompting is a gradient-free technique for inducing LLMs to produce intermediate reasoning steps that lead to the final answer.
33
[Figure: a few-shot prompt with a CoT exemplar.]