BART
Denoising Sequence-to-Sequence Pre-training
for Natural Language Generation, Translation, and Comprehension
NLP Team: 김은희 박희수 백지윤 주정헌 진명훈 (presenter)
🤗
👶 Lifelog of BART
https://arxiv.org/abs/1910.13461
https://www.aclweb.org/anthology/2020.acl-main.703/
https://github.com/pytorch/fairseq/tree/master/examples/bart
🐣 BART Architecture & Denoising Objective
🐥 BART Performance @ NLP Tasks
🦅 KoBART Summarization Example
Transformer’s Encoder and Decoder
Seq2Seq Model trained with denoising as pre-training objective (Q&A)
Fine-tuning BART
Comparing Pre-training Objectives
Large-scale Pre-training Experiments
Qualitative Analysis (Q&A)
Introduction
Summarization test results (Q&A)
🤔 What is BART?
Sequence-to-Sequence model trained with denoising as pretraining objective
https://www.aclweb.org/anthology/2020.acl-main.703/
🐣 BART Architecture & Denoising Objective
https://jeddy92.github.io/JEddy92.github.io/ts_seq2seq_intro/
Seq2Seq: a model that takes an input sequence and produces an output sequence!! (AutoEncoder)
🐣 BART Architecture & Denoising Objective
https://arxiv.org/pdf/1706.03762.pdf
BART builds its Encoder-Decoder from Transformers! (BERT-style encoder, GPT-style decoder)
🐣 BART Architecture & Denoising Objective
Chickens cackle and _____ quack
Chickens cackle and ducks quack
https://www.deviantart.com/mighty355/art/Bart-Simpson-01-483298269
https://www.gwern.net/docs/psychology/writing/1953-taylor.pdf
Sequence-to-Sequence model trained with denoising as pretraining objective
🐣 BART Architecture & Denoising Objective
Self-supervised methods have achieved remarkable results across a wide range of NLP tasks
• Word2Vec, ELMo, BERT, SpanBERT, XLNet, RoBERTa
The most successful of these approaches are variants of MLM (Masked Language Models), inspired by the cloze task
• Wilson L. Taylor. 1953. Cloze procedure: A new tool for measuring readability. Journalism Bulletin
• XLNet improves the order in which masked tokens are predicted (permutation operation)
• SpanBERT improves the distribution of masked tokens (many NLP tasks require reasoning over relations between text spans)
• UniLM improves the context used to predict masked tokens
However, these existing methods focus on particular types of end tasks, which limits their general applicability
• XLNet, SpanBERT, UniLM
BERT BART
https://pngio.com/PNG/a127850-bert-png.html
https://www.pngwing.com/en/free-png-dkxeal
🐣 BART Architecture & Denoising Objective
BERT
✓ Built on the Transformer encoder
✓ Trained with the cloze-inspired MLM pre-training objective
✓ together with the Next Sentence Prediction pre-training objective
✓ Follow-up work such as ALBERT and RoBERTa shows MLM is the real workhorse!
✓ Per the XLNet paper, BERT is based on denoising auto-encoding
$$\log p_\theta(\bar{x} \mid \hat{x}) \approx \sum_{t=1}^{T} m_t \log p_\theta(x_t \mid \hat{x})$$
✓ Independence Assumption
All masked tokens are reconstructed independently of one another
✓ Input Noise
Pretrain-finetune discrepancy: [MASK] appears during pre-training but never at fine-tuning
✓ XLNet tries to resolve these drawbacks with an autoregressive LM:
PLM (Permutation Language Model)
https://arxiv.org/pdf/1906.08237.pdf
🐣 BART Architecture & Denoising Objective
BART
✓ BART is an Encoder-Decoder model! (Transformers)
✓ Bidirectional Transformer encoder (like BERT)
• Each decoder layer performs cross-attention over the encoder output
• No additional FFN before word prediction
✓ Autoregressive Transformer decoder (like GPT)
• Uses GeLUs instead of ReLU
• Parameters initialized from $\mathcal{N}(0, 0.02)$
✓ Can fix BERT's weaknesses
✓ Can adopt the existing MLM variants (XLNet, SpanBERT, UniLM)
✓ Noise flexibility: arbitrary transformations can be applied to the original text
Encoder: Base 6 layers / Large 12 layers
Decoder: Base 6 layers / Large 12 layers
🐣 BART Architecture & Denoising Objective
So what kinds of noise were applied??
🐣 BART Architecture & Denoising Objective
Token Masking
As in BERT (Devlin et al., 2019),
random tokens are sampled and replaced with [MASK] tokens.
Original: I met her yesterday. The date with her was ever so sweet.
Noised: I met [MASK] yesterday. The date with [MASK] was ever so sweet.
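A minimal Python sketch of this transform (a toy illustration, not the fairseq implementation; whitespace tokenization and the mask_ratio value are assumptions):

import random

def token_masking(tokens, mask_ratio=0.15, mask_token="[MASK]"):
    # Sample distinct positions and replace each sampled token with [MASK].
    n_mask = max(1, int(len(tokens) * mask_ratio))
    positions = set(random.sample(range(len(tokens)), n_mask))
    return [mask_token if i in positions else tok for i, tok in enumerate(tokens)]

print(token_masking("I met her yesterday .".split()))
# e.g. ['I', 'met', '[MASK]', 'yesterday', '.']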
🐣 BART Architecture & Denoising Objective
Token Deletion
Original: I met her yesterday. The date with her was ever so sweet.
Noised: I met yesterday. The date was ever so sweet.
Tokens are deleted from the input at random.
Unlike Token Masking, the model must decide which positions are missing a token.
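The same toy setup, sketched for token deletion (note that nothing marks the deleted positions):

import random

def token_deletion(tokens, delete_ratio=0.15):
    # Drop tokens at random; the model must infer where tokens went missing.
    kept = [tok for tok in tokens if random.random() >= delete_ratio]
    return kept or tokens[:1]  # keep at least one token

print(token_deletion("I met her yesterday .".split()))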
🐣 BART Architecture & Denoising Objective
Text Infilling
Original: I met her yesterday. The date with her was ever so sweet.
Noised: I [MASK] met her yesterday. [MASK] was ever so sweet.
Text spans are sampled with span lengths drawn from a Poisson(λ = 3) distribution,
and each span is replaced with a single [MASK] token.
A span of length 0 corresponds to inserting a [MASK] token.
This teaches the model to predict how many tokens are missing from a span.
Reportedly inspired by SpanBERT (Joshi et al., 2019).
https://towardsdatascience.com/the-poisson-distribution-and-poisson-process-explained-4e2cb17d459
$\lambda = \mathbb{E}[X] = \mathrm{Var}(X)$
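A sketch of text infilling under these rules, using numpy's Poisson sampler (the number of spans and the toy tokenization are assumptions):

import numpy as np

def text_infilling(tokens, n_spans=2, lam=3.0, mask_token="[MASK]"):
    # Draw a span length from Poisson(lam), then replace the whole span
    # with a single [MASK]; a length-0 draw just inserts a [MASK].
    out = list(tokens)
    for _ in range(n_spans):
        length = int(np.random.poisson(lam))
        start = int(np.random.randint(0, max(1, len(out) - length + 1)))
        out[start:start + length] = [mask_token]
    return out

print(text_infilling("I met her yesterday . The date was ever so sweet .".split()))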
🐣 BART Architecture & Denoising Objective
Sentence Permutation
Original: I met her yesterday. The date with her was ever so sweet.
Noised: The date with her was ever so sweet. I met her yesterday.
The document is split into sentences at full stops,
and the sentences are shuffled into a random order.
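A sketch of sentence permutation (naive full-stop splitting; a real implementation would use proper sentence boundaries):

import random

def sentence_permutation(document):
    # Split on full stops, shuffle the sentences, and rejoin.
    sentences = [s.strip() + "." for s in document.split(".") if s.strip()]
    random.shuffle(sentences)
    return " ".join(sentences)

print(sentence_permutation("I met her yesterday. The date with her was ever so sweet."))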
🐣 BART Architecture & Denoising Objective
Document Rotation
A token is chosen uniformly at random, and the document is rotated so that it begins with that token.
This task trains the model to identify the start of the document.
Original: I met her yesterday. The date with her was ever so sweet.
Noised: yesterday. The date with her was ever so sweet. I met her
https://github.com/huggingface/transformers/issues/5428
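And a sketch of document rotation over a token list:

import random

def document_rotation(tokens):
    # Pick a pivot token uniformly at random and rotate the document
    # so that it starts there; the model learns to find the true start.
    pivot = random.randrange(len(tokens))
    return tokens[pivot:] + tokens[:pivot]

print(document_rotation("I met her yesterday .".split()))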
🐣 BART Architecture & Denoising Objective
Noise Flexibility
https://github.com/pytorch/fairseq/blob/master/fairseq/tasks/denoising.py
https://github.com/pytorch/fairseq/blob/master/fairseq/data/denoising_dataset.py
https://github.com/pytorch/fairseq/issues/2303
https://github.com/huggingface/transformers/issues/1676
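Noise flexibility just means these corruptions compose; a minimal sketch of chaining them (fairseq's DenoisingDataset, linked above, implements this with its own hyperparameters):

def corrupt(tokens, transforms):
    # Apply each noising function in turn to one tokenized example.
    for transform in transforms:
        tokens = transform(tokens)
    return tokens

# e.g., rotation followed by masking, reusing the sketches above:
# noised = corrupt(tokens, [document_rotation, token_masking])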
🤗
🐣 BART Architecture & Denoising Objective
https://github.com/sshleifer
🐣 BART Architecture & Denoising Objective
https://github.com/huggingface/transformers/pull/2745
TODO
• Pretrained weight conversion
• Some unit testing
• Inference results identical to the fairseq version
• Encoder, Decoder arguments
• Docstrings
• Comments for code readers
Future PRs
• Pre-training objective example
• Implement BartForSummarization
🐣 BART Architecture & Denoising Objective
https://huggingface.co/transformers/model_doc/bart.html
https://github.com/pytorch/fairseq/tree/master/examples/bart
FAIRSEQ Hugging Face
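Both codebases ship pretrained checkpoints; a short loading sketch (checkpoint ids as documented on the pages linked above):

# fairseq: load bart.large through torch.hub
import torch
bart = torch.hub.load('pytorch/fairseq', 'bart.large')
bart.eval()

# Hugging Face transformers: the facebook/bart-large checkpoint
from transformers import BartTokenizer, BartModel
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
model = BartModel.from_pretrained('facebook/bart-large')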
🐣 BART Architecture & Denoising Objective
https://github.com/huggingface/transformers/blob/master/src/transformers/models/bart/modeling_bart.py
[Diagram: tensor shapes through BART, following modeling_bart.py]
corrupt: original document ids (bsz, seq_len) → noised encoder input ids (bsz, seq_len)
Encoder: input ids (bsz, seq_len) → Embedded (bsz, seq_len, d) → hidden states (bsz, seq_len, d)
Decoder: input ids (bsz, seq_len) → Embedded (bsz, seq_len, d) → hidden states (bsz, seq_len, d) → LM logits (bsz, seq_len, |V|)
🐣 BART Architecture & Denoising Objective
The reconstruction loss compares the Decoder's Output against the Original Document:
$$\mathrm{loss}(x, \mathrm{class}) = -\log \frac{\exp(x[\mathrm{class}])}{\sum_j \exp(x[j])} = -x[\mathrm{class}] + \log \sum_j \exp(x[j])$$
https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
BART is trained by corrupting documents and then
optimizing a reconstruction loss – the cross-entropy
between the decoder’s output and the original document
The logits (bsz, seq_len, |V|) are flattened to (bsz * seq_len, |V|) and the target ids to (bsz * seq_len) before computing the cross-entropy.
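A runnable sketch of this reconstruction loss, with random tensors standing in for the decoder output and the original document ids:

import torch
import torch.nn.functional as F

bsz, seq_len, vocab_size = 2, 8, 50265
logits = torch.randn(bsz, seq_len, vocab_size)          # decoder's output
labels = torch.randint(0, vocab_size, (bsz, seq_len))   # original document ids

# Flatten to (bsz * seq_len, |V|) and (bsz * seq_len), then cross-entropy.
loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))
print(loss.item())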
🐣 Fine-tuning BART
Sentence Classification Tasks
Sequence Generation Tasks
Machine Translation Tasks
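For sentence classification, transformers packages this recipe as BartForSequenceClassification; a usage sketch (num_labels=3 is an assumption matching MNLI's three classes):

from transformers import BartTokenizer, BartForSequenceClassification

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
model = BartForSequenceClassification.from_pretrained('facebook/bart-large', num_labels=3)

# The same input is fed to encoder and decoder; the representation of the
# final (EOS) decoder token feeds the classification head.
inputs = tokenizer("I met her yesterday.", return_tensors="pt")
logits = model(**inputs).logits  # shape (1, num_labels)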
🐥 BART Performance @ NLP Tasks
Comparing Pre-training Objectives
Experiments were run at both base-model and large-model scale.
Base-size models are compared across several tasks:
6 encoder layers, 6 decoder layers, hidden size 768
Model: Description
• Language Model (CLM): a left-to-right Transformer LM, trained similarly to GPT
• Permuted Language Model (PLM): based on XLNet; 1/6 of the tokens are sampled and generated autoregressively
• Masked Language Model (MLM): following BERT, 15% of tokens are replaced with [MASK]
• Multitask Masked Language Model: UniLM-style additional self-attention masks; 1/6 left-to-right, 1/6 right-to-left, 1/3 unmasked, and for the remaining 1/3 the first 50% of tokens unmasked with a left-to-right mask for the rest
• Masked Seq-to-Seq (MASS): a span containing 50% of the tokens is masked, and a seq-to-seq model is trained to predict the masked tokens
Task: Description
• SQuAD: question answering over Wikipedia paragraphs; the question concatenated with its context is used as input to BART's encoder and also passed to the decoder
• MNLI: the fine-tuned model concatenates the two sentences with an appended EOS token and passes them to the BART encoder and decoder; the EOS token representation is used to classify the sentence relation
• ELI5: long-form abstractive question/answer data; answers are generated conditioned on the concatenation of the question and supporting documents
• XSum: news summarization data with highly abstractive summaries
• ConvAI2: dialogue response generation conditioned on context and a persona
• CNN/DM: news summarization data
🐥 BART Performance @ NLP Tasks
Comparing Pre-training Objectives
🐥 BART Performance @ NLP Tasks
Large-scale Pre-training Experiments
• encoder and decoder => 12 layers each (hidden size 1,024)
• batch size => 8,000
• train steps => 500,000
• tokenizing method => BPE
• Text Infilling + Sentence Shuffling => mask 30% of tokens,
permute all sentences
• final 10% of train steps => dropout off
• pre-training data => 160GB (news + books + stories + web text)
https://github.com/modulabs/beyondBERT/issues/6
🐥 BART Performance @ NLP Tasks
Large-scale Pre-training Experiments
🐥 BART Performance @ NLP Tasks
Large-scale Pre-training Experiments
🐥 BART Performance @ NLP Tasks
https://github.com/modulabs/beyondBERT/issues/6
The model goes beyond the input and draws on background knowledge:
• it states that the source document is a study in the
journal Science (1st example)
• it states that PG&E is located in California (5th example)
🦅 KoBART Summarization Example
https://github.com/SKT-AI/KoBART
🦅 KoBART Summarization Example
https://github.com/haven-jeon/KoBART-chatbot
https://github.com/seujung/KoBART-summarization
https://github.com/SKT-AI/KoBART/tree/main/examples
https://github.com/seujung/KoBART-translation
🦅 KoBART Summarization Example
http://20.194.43.11:7874/
https://news.naver.com/main/read.nhn?mode=LSD&mid=shm&sid1=102&oid=119&aid=0002453221
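A usage sketch for KoBART summarization via transformers (the checkpoint id below is an assumption; check the repositories above for the released weights):

from transformers import PreTrainedTokenizerFast, BartForConditionalGeneration

model_id = "gogamza/kobart-summarization"  # assumed checkpoint name
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_id)
model = BartForConditionalGeneration.from_pretrained(model_id)

text = "..."  # the body of a Korean news article
input_ids = tokenizer.encode(text, return_tensors="pt")
summary_ids = model.generate(input_ids, max_length=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))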