Improving Language Understanding
by Generative Pre-Training
Presenter: 박지민
(kpdpkp@gmail.com)
Authors: Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever (OpenAI)
Why GPT?
Why GPT? 1) Pre-trained models in NLP
ELMo
February 2018
Video link: https://www.youtube.com/watch?v=XMJ8VxgUzTc&fbclid=IwAR2xgRshy2wu0nnHXGy-pFOeooiXNPUMz1FViTBaeBh13zks5aQw3mOdDmo
GPT, an AI that writes news articles: https://www.facebook.com/GPTisAI/?eid=ARBwuiCcaa0rXwHUEXnXizCAwfG18ydgmjND-4x6nFJVluhxkwcg4bxbS1yn4yY8gXVgANwIuth6oqNm
Why GPT? 2) Generative models
TabNine
https://www.youtube.com/watch?v=lssDX_zOwOo
Motivation
Goal of the paper
Use unsupervised learning on unlabeled text to solve a wide range of specific tasks
“Our setup does not require these target tasks to be in the same domain as the unlabeled corpus.”
Why is this worth doing?
1) Domain-specific labeled data is scarce
2) Collecting labeled data takes significant time and money
3) Learning a universal representation
Helps improve performance even when supervised learning is already possible
e.g., word embeddings
Challenges
1. Which optimization objective is most effective?
- Language modeling, machine translation, discourse coherence
2. No consensus on how to transfer learned representations
- Modify the architecture per task (ELMo)
- Use task-aware inputs → minimize changes to the architecture
- Use an intricate learning scheme (ULMFiT)
- Add an auxiliary learning objective
Framework
Generative
(Pre-Training)
● Transformer Decoder
● Language Model
Discriminative
(Fine-Tuning)
● Input Transformation
● Auxiliary objective
Pre-Training Stage
Embedding
● BPE with 40,000 merges
● Learned position embeddings instead of sinusoidal
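A minimal sketch of this embedding layer in PyTorch (the class name and exact sizes are illustrative; GPT-1 used 768-dimensional states and a 512-token context):

import torch
import torch.nn as nn

class GPTEmbedding(nn.Module):
    # Token embedding plus *learned* position embedding (no sinusoids);
    # the position table is trained like any other weight.
    def __init__(self, vocab_size=40000, ctx_len=512, d_model=768):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(ctx_len, d_model)

    def forward(self, token_ids):  # token_ids: (batch, seq) of BPE ids
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.tok_emb(token_ids) + self.pos_emb(positions)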
Language Modeling
● Transformer Decoder
● Differences from BERT
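Concretely, pre-training maximizes the standard left-to-right language modeling likelihood over the unlabeled corpus U = {u_1, ..., u_n} with context window size k (objective L_1 in the paper):

L_1(\mathcal{U}) = \sum_{i} \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)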
GPT vs Transformer
Fine-Tuning Stage
Avoid changes to the model!
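Fine-tuning adds only a linear output layer (plus delimiter token embeddings). The paper's supervised objective L_2 over a labeled corpus C, and the auxiliary variant L_3 that keeps language modeling as an extra loss with weight λ, are:

L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m)

L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C})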
Textual Entailment
https://openai.com/blog/language-unsupervised/
Entail / Contradict / Neutral
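A rough sketch of the entailment input transformation: premise and hypothesis are packed into one sequence around learned start, delimiter, and extract tokens, and a single linear layer reads the transformer state at the extract position. The helper below is hypothetical; <s>, $, and <e> stand in for the paper's learned special tokens.

# Hypothetical helper: build the single sequence fed to the transformer
# for one premise/hypothesis pair. start/delim/extract are the ids of
# the learned <s>, $, <e> tokens.
def entailment_input(premise_ids, hypothesis_ids, start, delim, extract):
    return [start] + premise_ids + [delim] + hypothesis_ids + [extract]

# The hidden state at the <e> position goes through one linear layer:
#   logits = W_y @ h_extract  ->  softmax over {entail, contradict, neutral}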
Story Cloze Test
https://openai.com/blog/language-unsupervised/
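Multiple-choice tasks like Story Cloze use the same idea: each candidate ending is packed with the context into its own sequence and scored by an independent forward pass, with a softmax over the per-candidate scores. A sketch, reusing the hypothetical token conventions above:

def story_cloze_inputs(context_ids, candidate_ending_ids, start, delim, extract):
    # One sequence (and one transformer pass) per candidate ending; the
    # linear head yields one scalar per sequence, normalized across candidates.
    return [[start] + context_ids + [delim] + ending + [extract]
            for ending in candidate_ending_ids]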
Performance
At the time of publication, SOTA on 9 of the 12 datasets evaluated
○ Leading scores on NLI; presumably it also did well on the other tasks, though I did not dig into them.
○ On RTE, a very small dataset (2,490 examples), a BiLSTM still wins.
○ Handles long-range contexts effectively.
○ Large gain on the Corpus of Linguistic Acceptability (grammaticality judgments, a check of learned linguistic bias).
○ Learns well across dataset sizes, from small (5.7K examples) to large (550K).
Analysis
Zero-shot Behavior (w/o fine-tuning)
For sentiment analysis, the token "very" is appended to every example and the LM's output is restricted to the two words "positive" and "negative".
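A hedged sketch of that heuristic (model and tokenizer here are generic stand-ins, not the released GPT-1 code; the point is simply to compare two next-token probabilities and ignore the rest of the vocabulary):

import torch

def zero_shot_sentiment(model, tokenizer, review):
    # Append "very", take the LM's next-token logits, and keep only the
    # entries for "positive" and "negative".
    ids = torch.tensor([tokenizer.encode(review + " very")])
    next_token_logits = model(ids)[0, -1]
    pos = next_token_logits[tokenizer.encode(" positive")[0]]
    neg = next_token_logits[tokenizer.encode(" negative")[0]]
    return "positive" if pos > neg else "negative"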
Why is LM pre-training of a Transformer so effective?
Our hypothesis
1) The underlying generative model learns to perform many of the tasks we evaluate on in order to improve its language modeling capability.
2) The more structured attentional memory of the transformer assists in transfer compared to LSTMs.
Ablation Studies
1) The auxiliary objective helps more on larger datasets.
2) The Transformer outperforms an LSTM.
3) Without pre-training, performance drops sharply.
My personal take: the Transformer had been out for a while by then, but apparently it was not easy to put to good use.
FYI
● Worried about potential misuse, OpenAI released GPT-2 only in a limited, staged fashion
● Was that the right call? (Or does it just instill fear in the public?)