Paper review summary
Made by Seoung-Ho Choi
Content
• GPT-1 (Improving Language Understanding by Generative Pre-Training)
• GPT-2 (Language Models are Unsupervised Multitask Learners)
• GPT-3 (Language Models are Few-Shot Learners)
GPT-1
Improving Language Understanding by Generative Pre-Training (A. Radford et al., 2018)
• Goal:
• We demonstrate that large gains on natural language understanding tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text.
• In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture.
• Introduction:
• Problem: Models that can leverage linguistic information from unlabeled data provide a valuable alternative to gathering more annotation, which can be time-consuming and expensive
• Issue 1: It is unclear what type of optimization objectives are most effective at learning text representations that are useful for transfer
• Issue 2: There is no consensus on the most effective way to transfer these learned representations to the target task
• Motivation:
• We explore a semi-supervised approach for language understanding tasks using a combination of unsupervised pre-training and supervised fine-tuning.
• Contribution:
• Our goal is to learn a universal representation that transfers with little adaptation to a wide range of tasks
Improving Language Understanding by Generative Pre-Training (A. Radford et al., 2018)
• Proposed Method
• Our training procedure consists of two stages.
• The first stage is learning a high-capacity language model on a large corpus of text. This is followed by a fine-tuning stage, where we adapt the model to a discriminative task with labeled data.
• Unsupervised pre-training
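For reference, the first-stage objective is the paper's standard left-to-right language modeling likelihood over the unlabeled token corpus U = {u_1, ..., u_n}, with context window k and Transformer parameters \Theta:

L_1(U) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)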
Improving Language Understanding by Generative Pre-Training (A. Radford et al., 2018)
• Proposed Method
• Our training procedure consists of two stages.
• The first stage is learning a high-capacity language model on a large corpus of text. This is followed by a fine-tuning stage, where we adapt the model to a discriminative task with labeled data.
• Supervised fine-tuning
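For reference, the second-stage objective maximizes the labeled-task likelihood over dataset C; the paper also keeps language modeling as an auxiliary fine-tuning objective, weighted by \lambda, which it reports improves generalization of the supervised model and accelerates convergence:

L_2(C) = \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m)

L_3(C) = L_2(C) + \lambda \cdot L_1(C)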
Figure: (left) Transformer architecture and training objectives; (right) input transformations for fine-tuning on different tasks
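A minimal sketch of these input transformations, assuming illustrative special-token names (the paper actually uses randomly initialized start, delimiter, and extract embeddings, not these literal strings):

# Sketch of GPT-1-style input transformations for fine-tuning.
# All structured inputs become a single token sequence, so the
# pre-trained Transformer needs no architectural changes.

START, DELIM, EXTRACT = "<s>", "<delim>", "<extract>"  # illustrative names

def classification_input(text):
    # Single-sequence tasks: wrap the text directly.
    return [START, text, EXTRACT]

def entailment_input(premise, hypothesis):
    # Concatenate premise and hypothesis around a delimiter.
    return [START, premise, DELIM, hypothesis, EXTRACT]

def similarity_inputs(text_a, text_b):
    # No inherent ordering, so both orderings are processed and their
    # final Transformer states are added element-wise downstream.
    return [
        [START, text_a, DELIM, text_b, EXTRACT],
        [START, text_b, DELIM, text_a, EXTRACT],
    ]

def multiple_choice_inputs(context, answers):
    # One sequence per candidate answer; each is scored independently
    # and the scores are normalized with a softmax.
    return [[START, context, DELIM, a, EXTRACT] for a in answers]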
Improving Language Understanding by Generative Pre-Training (A. Radford et al., 2018)
• Experiment
• We evaluate our approach on four types of language understanding tasks:
• natural language inference, question answering, semantic similarity, and text classification
• Experiments across multiple evaluation measures, compared against state-of-the-art methods
• Analysis
• Impact of the number of layers transferred
• Effect of transferring an increasing number of layers from the pre-trained language model
• Plot of the evolution of zero-shot performance on different tasks as a function of LM pre-training updates
• Zero-shot Behaviors
• We’d like to better understand why language model pre-training of transformers is effective
• Zero-shot := learning useful pattern recognition even with little or no training data
• Ablation studies
• We examine the performance of our method without the auxiliary LM objective during fine-tuning
• We analyze the effect of the Transformer by comparing it with a single-layer 2048-unit LSTM in the same framework
• We also compare with our Transformer architecture trained directly on the supervised target tasks, without pre-training
• Conclusion
• We introduced a framework for achieving strong natural language understanding with a single task-agnostic model through generative pre-training and discriminative fine-tuning
• By pre-training on a diverse corpus with long stretches of contiguous text, our model acquires significant world knowledge and the ability to process long-range dependencies, which transfer successfully to discriminative tasks such as question answering, semantic similarity assessment, entailment determination, and text classification, improving the state of the art on 9 of the 12 datasets studied
GPT-2
Language Models are Unsupervised Multitask Learners (A. Radford et al., 2019)
• Goal:
• We demonstrate that language models begin to learn NLP tasks such as question answering, machine translation, reading comprehension, and summarization without any explicit supervision when trained on a new dataset of millions of webpages called WebText
• Introduction
• Problem: Our suspicion is that the prevalence of single-task training on single-domain datasets is a major contributor to the lack of generalization observed in current language model systems
• Motivation
• Effect of task conditioning:
• Existing: a single task is learned by estimating p(output | input)
• Adding condition: a general, multitask system should also condition on the task, estimating p(output | input, task)
• Convergence of multitask learning:
• Since the supervised objective is the unsupervised objective evaluated on a subset of the sequence, the global minimum of the unsupervised objective is also the global minimum of the supervised objective
• A sufficiently capable language model can therefore perform unsupervised multitask learning
• Contribution
• We demonstrate that language models begin to learn these tasks without any explicit supervision
• When conditioned on a document plus questions, the language model generates the answers
• GPT-2 is a 1.5B parameter Transformer that achieves state-of-the-art results on 7 out of 8 tested language modeling datasets in a zero-shot setting
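A minimal sketch of zero-shot task conditioning purely through the input text: the "TL;DR:" summarization hint and the "english sentence = french sentence" translation format are from the paper, while the exact question-answering layout is an illustrative assumption:

# Zero-shot task specification via prompt formatting alone; GPT-2
# infers the task from the shape of the conditioning text.

def summarization_prompt(article):
    # The paper induces summarization by appending "TL;DR:".
    return article + "\nTL;DR:"

def qa_prompt(document, question):
    # Conditioning on a document plus a question (layout illustrative).
    return document + "\nQ: " + question + "\nA:"

def translation_prompt(example_pairs, english_sentence):
    # A context of "english sentence = french sentence" example pairs,
    # then an unfinished pair for the model to complete.
    lines = [f"{en} = {fr}" for en, fr in example_pairs]
    lines.append(f"{english_sentence} =")
    return "\n".join(lines)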
Language Models are Unsupervised Multitask Learners (A. Radford et al., 2019)
• Proposed Method
• Training dataset
• Our approach motivates building as large and diverse a dataset as possible in order to collect natural language demonstrations of tasks in as varied domains and contexts as possible
• Input representation
• Byte-level BPE combines the empirical benefits of word-level LMs with the generality of byte-level approaches (see the sketch after this list)
• Model
• Largely the GPT-1 Transformer, with layer normalization moved to the input of each sub-block (as in a pre-activation residual network) and an additional layer norm added after the final self-attention block
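A minimal sketch of the byte-level idea, with a toy merge table (the real model learns roughly 50k merges from WebText and restricts merging across character categories):

# Byte-level input: any Unicode text maps onto a 256-symbol base
# alphabet of UTF-8 bytes, so no token is ever out of vocabulary.
text = "Héllo 世界"
byte_ids = list(text.encode("utf-8"))  # e.g. [72, 195, 169, ...]

# BPE then merges frequent pairs into larger units; this one-rule
# merge table is hypothetical, purely for illustration.
merges = {(72, 195): 256}

def apply_merges(ids, merges):
    out, i = [], 0
    while i < len(ids):
        pair = tuple(ids[i:i + 2])
        if pair in merges:
            out.append(merges[pair])
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

token_ids = apply_merges(byte_ids, merges)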
Language Models are Unsupervised Multitask Learners (A. Radford et al., 2019)
• Experiment
• Zero-shot, out-of-distribution evaluation of WebText LMs on standard language modeling benchmarks
• Performance on different categories of words using the Children's Book Test dataset
• Long-range dependency modeling using the LAMBADA dataset
• Tasks
• Reading Comprehension, Summarization, Translation, Question Answering
• Generalization vs. Memorization
• Text memorization, model capacity, diversity, robustness: which of these matters? (a sketch of the paper's overlap check follows the Conclusion below)
• Discussion
• Much research has been dedicated to learning (Hill et al., 2016), understanding (Levy and Goldberg, 2014), and critically evaluating (Wieting and Kiela, 2019) the representations of both supervised and unsupervised pre-training methods
• Conclusion
• GPT-2 zero-shots to state-of-the-art performance on 7 out of 8 tested language modeling datasets
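As referenced under Generalization vs. Memorization above, a minimal sketch of the paper's overlap analysis: 8-grams of WebText training text go into Bloom filters, and test sets are scored by the fraction of their 8-grams also seen in training (a plain Python set stands in for the Bloom filter here):

# Toy version of the 8-gram overlap check used to separate
# generalization from memorization.
N = 8

def ngrams(tokens, n=N):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_rate(test_tokens, train_ngram_set, n=N):
    test = ngrams(test_tokens, n)
    return len(test & train_ngram_set) / len(test) if test else 0.0

# Usage with toy data: 50% of the test 8-grams appear in training.
train_set = ngrams("a b c d e f g h i j".split())
print(overlap_rate("c d e f g h i j k".split(), train_set))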
Reference
• A. Radford et al., Improving Language Understanding by Generative Pre-Training, 2018.
• A. Radford et al., Language Models are Unsupervised Multitask Learners, 2019.
• T. Brown et al., Language Models are Few-Shot Learners, arXiv:2005.14165v4, 2020.
