A Multiscale Visualization of
Attention in the Transformer Model
Deep Learning Paper Reading Group
NLP Team: 백지윤, 진명훈
Presenter: 백지윤
Jesse Vig
Palo Alto Research Center
2019 ACL
Contents
• 1 Introduction: Transformer, BERT, GPT

• 2 Visualization Tool: Attention-head View, Model View, Neuron View

• 3 Use Case

• 4 Conclusion
1. Introduction - Transformer
Transformer's key principle: self-attention
[Figure: attention scores are passed through a softmax to produce the attention weights α1, α2, α3]
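A minimal sketch of single-head scaled dot-product self-attention, written in plain Python so it runs without a deep-learning framework. The vectors are toy values (an assumption, not numbers from the slides), and queries, keys, and values are all set to the raw inputs, skipping the learned projections; the softmax produces the weights α1, α2, α3 that mix the value vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def self_attention(queries, keys, values):
    """Scaled dot-product attention for one head.

    Returns (outputs, weights): weights[i] holds the alphas that query i assigns
    to every position; outputs[i] is the alpha-weighted sum of the value vectors.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        alphas = softmax([dot(q, k) / math.sqrt(d_k) for k in keys])
        out = [sum(a * v[j] for a, v in zip(alphas, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        weights.append(alphas)
    return outputs, weights


# Hypothetical 2-dimensional vectors for three tokens.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(x, x, x)  # Q = K = V = x (no projections)
print([round(a, 3) for a in weights[0]])   # alpha_1, alpha_2, alpha_3 for token 1
```

Each row of `weights` sums to 1, which is what lets the outputs be read as weighted averages of the value vectors.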
[Figure: input a1 passes through a self-attention "layer"; inputs a1, a2, a3 are processed the same way, and the same process continues through stacked self-attention layers]
Transformer
• The actual Transformer uses many attention heads for each word, rather than the single head shown before
• In the decoder's encoder-decoder attention, the key and value vectors come from the encoder output, while the queries come from the decoder
• There are other details to cover (positional encoding, etc.), which I will go over on my own later
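To make the encoder-decoder bullet concrete, here is a minimal single-head sketch of cross-attention: queries come from the decoder states, while keys and values both come from the encoder output. Learned projections are omitted (a simplification), and the vectors and function names are illustrative, not from any library.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(dec_states, enc_out):
    """Single-head cross-attention without learned projections.

    Queries = decoder states; keys = values = encoder output.
    Returns one output vector and one weight row per decoder position.
    """
    d = len(enc_out[0])
    outputs, weights = [], []
    for q in dec_states:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in enc_out]
        alphas = softmax(scores)
        outputs.append([sum(a * v[j] for a, v in zip(alphas, enc_out)) for j in range(d)])
        weights.append(alphas)
    return outputs, weights


enc_out = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]  # 4 source tokens
dec_states = [[0.2, 0.8], [0.9, 0.1]]                        # 2 target tokens
out, w = cross_attention(dec_states, enc_out)
print(len(out), len(w[0]))  # 2 4
```

Each target token gets one output vector, and its weight row spans the source sentence, so source and target lengths are free to differ.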
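• For example, if the embedding dimension of each word is 8, the sequence length is 3, and the number of heads is 4, then each head gets 8 / 4 = 2 dimensions, so the final shape will be (N, 3, 4, 2), where N is the batch size
• The decoder's key and value vectors come from the encoder: the encoder output enc_src is passed into the decoder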
enc_src = self.encoder(src, src_mask)
out = self.decoder(trg, enc_src, src_mask, trg_mask)
# code inside the Decoder:
def forward(x, enc_out, enc_out...)
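The shape arithmetic in the first bullet can be checked with a small sketch: each token's 8-dimensional vector (after the linear projections, omitted here) is cut into 4 heads of 8 / 4 = 2 dimensions, giving shape (N, 3, 4, 2). Plain nested lists stand in for a tensor here, and `split_heads` and `shape` are hypothetical helper names.

```python
def split_heads(batch, num_heads):
    """Split (N, seq_len, embed_dim) nested lists into (N, seq_len, num_heads, head_dim)."""
    embed_dim = len(batch[0][0])
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    head_dim = embed_dim // num_heads
    return [
        [[token[h * head_dim:(h + 1) * head_dim] for h in range(num_heads)]
         for token in sentence]
        for sentence in batch
    ]


def shape(nested):
    """Shape of a regular nested list, e.g. (N, 3, 4, 2)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


N, seq_len, embed_dim, num_heads = 2, 3, 8, 4
x = [[[float(i) for i in range(embed_dim)] for _ in range(seq_len)] for _ in range(N)]
heads = split_heads(x, num_heads)
print(shape(heads))  # (2, 3, 4, 2)
```

In PyTorch the same split is typically a single `x.view(N, seq_len, num_heads, head_dim)` (or `reshape`) call; many implementations then transpose so the head dimension comes before the sequence dimension.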
BERT
• BERT: Transformer encoder + fully connected layer
• To become a smart language model, BERT is pre-trained on two difficult tasks at the same time: masked language modeling (MLM) and next sentence prediction (NSP)
• After acquiring this linguistic ability, BERT is ready for fine-tuning
[CLS] I want to be a [MASK] [SEP]
Tomorrow will be rainy.
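A rough sketch of how the two pre-training inputs in the example could be built: a sentence pair joined with `[CLS]`/`[SEP]` for NSP, and random masking for MLM. The 15% selection rate with the 80/10/10 split ([MASK] / random token / unchanged) follows the original BERT recipe; the whitespace tokenization, the toy vocabulary, the filled-in word "doctor", and the function names are hypothetical simplifications (real BERT uses WordPiece).

```python
import random

MASK_RATE = 0.15  # fraction of tokens selected for prediction


def make_nsp_pair(sent_a, sent_b):
    """Join two sentences into one NSP input with BERT-style special tokens."""
    return ["[CLS]"] + sent_a.split() + ["[SEP]"] + sent_b.split() + ["[SEP]"]


def mask_tokens(tokens, vocab, rng):
    """Return (masked_tokens, labels); labels are None where no prediction is made."""
    masked, labels = [], []
    for tok in tokens:
        if tok in ("[CLS]", "[SEP]") or rng.random() >= MASK_RATE:
            masked.append(tok)
            labels.append(None)
            continue
        labels.append(tok)  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            masked.append("[MASK]")
        elif r < 0.9:
            masked.append(rng.choice(vocab))  # random replacement
        else:
            masked.append(tok)  # left unchanged
    return masked, labels


rng = random.Random(0)
pair = make_nsp_pair("I want to be a doctor", "Tomorrow will be rainy.")
vocab = ["doctor", "rainy", "teacher", "sunny"]
masked, labels = mask_tokens(pair, vocab, rng)
print(" ".join(masked))
```

For NSP, the model additionally predicts whether the second sentence actually follows the first; that label is not modeled in this sketch.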
GPT
• GPT: Transformer decoder + fully connected layer
• To become a fluent language model, GPT is trained on one important task: predicting the next word from the words before it
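GPT-2
Input: <START> 나는 학교에 ("<START> I / to school")
Target: 나는 학교에 간다 ("I / to school / go", i.e. "I go to school")
print(generate_sent("이때", gpt_model, greedy=True))
>> "이 때문에 일부 전문가들은 ... " ("Because of this, some experts ...")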
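`generate_sent` and `gpt_model` come from the presenters' own code, which the slides do not show. As a hedged sketch of what greedy generation does in general: feed the sequence so far, take the single most likely next token, append it, and stop at an end token or a length limit. The toy table below stands in for a trained language model and is purely hypothetical; unlike a real GPT, it looks only at the last token.

```python
# Hypothetical next-token probabilities standing in for a trained GPT model.
TOY_MODEL = {
    "<START>": {"I": 0.9, "school": 0.1},
    "I": {"go": 0.6, "am": 0.4},
    "go": {"to": 0.8, "home": 0.2},
    "to": {"school": 0.7, "work": 0.3},
    "school": {"<END>": 1.0},
}


def next_token_probs(tokens):
    """Distribution over the next token (toy: conditions on the last token only)."""
    return TOY_MODEL.get(tokens[-1], {"<END>": 1.0})


def greedy_generate(prompt, max_new_tokens=10):
    tokens = ["<START>"] + prompt.split()
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        best = max(probs, key=probs.get)  # greedy: always take the argmax
        if best == "<END>":
            break
        tokens.append(best)
    return " ".join(tokens[1:])


print(greedy_generate("I"))  # I go to school
```

Greedy decoding is deterministic and simple, but it can miss higher-probability sentences overall; sampling or beam search are the usual alternatives.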
2. Why a Visualization Tool, and Its Challenges
• One advantage of attention is that it can help interpret a model: visualizing the attention weights shows how the model assigns weight to different input elements
• One challenge in visualizing attention in the Transformer is that it uses a multi-layer, multi-head attention mechanism. For example, a model with 24 layers and 16 heads already has 24 × 16 = 384 unique attention structures!
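The count above follows from keeping one attention matrix per (layer, head) pair. A minimal sketch, assuming attention is stored as nested lists indexed `[layer][head][query][key]` (the layout is an assumption, and `SEQ_LEN` and the uniform weights are placeholders):

```python
NUM_LAYERS, NUM_HEADS, SEQ_LEN = 24, 16, 5  # 24 layers x 16 heads; SEQ_LEN is arbitrary

# Placeholder weights: every query attends uniformly to every key.
attention = [[[[1.0 / SEQ_LEN] * SEQ_LEN for _ in range(SEQ_LEN)]
              for _ in range(NUM_HEADS)]
             for _ in range(NUM_LAYERS)]

# Each (layer, head) pair holds one SEQ_LEN x SEQ_LEN attention matrix.
num_structures = sum(len(heads) for heads in attention)
print(num_structures)  # 384
```

Under this layout, an attention-head view roughly corresponds to picking one `attention[layer][head]` matrix, while a model view lays out all `NUM_LAYERS x NUM_HEADS` matrices as a grid.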
Attention-head View
Use Case : Detecting Model Bias
The doctor asked the nurse a question. He asked her if she ever had a heart attack.
The doctor asked the nurse a question. She said "I'm not sure what you're talking about."
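In the head view, one can read off which earlier token a pronoun attends to most. A sketch of that lookup for a single head, assuming the pronoun's row of attention weights is already available; the numbers below are made up for illustration and are not results from the paper.

```python
def top_attended(tokens, weights, k=2):
    """Return the k (token, weight) pairs with the highest attention weight."""
    ranked = sorted(zip(tokens, weights), key=lambda tw: tw[1], reverse=True)
    return ranked[:k]


tokens = ["The", "doctor", "asked", "the", "nurse", "a", "question", ".", "She"]
# Hypothetical attention row for the final token "She" in one (layer, head).
she_row = [0.05, 0.15, 0.05, 0.05, 0.45, 0.05, 0.05, 0.05, 0.10]

for token, weight in top_attended(tokens, she_row):
    print(f"{token}: {weight:.2f}")
```

Comparing the row for "He" with the row for "She" in the same head is one way to spot a gendered association between pronouns and occupations.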
Model View
The model view can be especially useful for tasks such as paraphrase detection.
Neuron View
Positive values: blue; negative values: orange
Color saturation: magnitude of the value
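The neuron view breaks the attention score q·k into its elementwise terms q_i × k_i. A sketch of that decomposition together with the color convention above (blue for positive, orange for negative, saturation proportional to magnitude). The vectors are toy values, and normalizing saturation by the largest magnitude is an assumption about how such a scale could be set.

```python
def neuron_view_terms(q, k):
    """Elementwise products q_i * k_i; their sum is the dot product q . k."""
    return [qi * ki for qi, ki in zip(q, k)]


def to_color(value, max_abs):
    """Map a value to (hue, saturation): blue if positive, orange if negative."""
    hue = "blue" if value >= 0 else "orange"
    saturation = abs(value) / max_abs if max_abs > 0 else 0.0
    return hue, round(saturation, 2)


q = [0.5, -1.0, 2.0, 0.0]
k = [1.0, 1.0, 0.5, 3.0]
terms = neuron_view_terms(q, k)
score = sum(terms)
max_abs = max(abs(t) for t in terms)
for i, t in enumerate(terms):
    print(i, t, to_color(t, max_abs))
print("q . k =", score)
```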
• The attention weights appear to be largely independent of the content of the input text, since all the query vectors have very similar values
• A small number of neuron positions appear to be mostly responsible for this distance-decaying attention pattern (attention that decreases with distance from the current token)
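One way to look for the "small number of neuron positions" mentioned above is to rank dimensions by how much their q_i × k_i terms change the score between a nearby key and a distant key. A sketch under that assumption, with toy vectors constructed so that dimension 2 carries the distance signal; it is an illustration, not the procedure used in the paper.

```python
def contributions(q, k_near, k_far):
    """Per-dimension difference in score contribution between a near and a far key."""
    return [qi * (kn - kf) for qi, kn, kf in zip(q, k_near, k_far)]


def top_dimensions(contribs, n=1):
    """Indices of the n dimensions with the largest absolute contribution."""
    order = sorted(range(len(contribs)), key=lambda i: abs(contribs[i]), reverse=True)
    return order[:n]


q = [0.1, 0.2, 3.0, -0.1]        # toy query: large weight on dimension 2
k_near = [0.5, 0.4, 1.0, 0.3]    # key of an adjacent token
k_far = [0.6, 0.5, -1.0, 0.2]    # key of a distant token
diffs = contributions(q, k_near, k_far)
print(top_dimensions(diffs, n=1))  # [2]
```

The differences sum to q·k_near minus q·k_far, so the top-ranked dimensions are the ones driving the near-versus-far preference.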
Use Case
• Model intervention: for example, one might prefer a slower decay rate for scientific text than for a children's story. Other heads may afford different types of interventions.
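To illustrate what a "slower decay rate" means, here is a toy distance-decaying head in which the score for a key at distance d is -rate × d, followed by a softmax. Lowering `rate` spreads attention further back. The linear score and the `rate` parameter are assumptions for illustration; in a real model one would have to act on the responsible neurons rather than on a single knob.

```python
import math


def decaying_attention(query_pos, rate):
    """Softmax over keys 0..query_pos with score -rate * distance (toy head)."""
    scores = [-rate * (query_pos - j) for j in range(query_pos + 1)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


fast = decaying_attention(query_pos=5, rate=2.0)  # e.g. a children's story
slow = decaying_attention(query_pos=5, rate=0.3)  # e.g. scientific text
print([round(w, 3) for w in fast])
print([round(w, 3) for w in slow])
```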
4 Conclusion
• To me, the paper was visually pleasing. 

• However, I would respectfully suggest that the paper could have explained in more detail how each weight and computed value was extracted.

• I find the tool very useful, since it can help me look inside the black box when a model's output differs from what I expect. (Plus, I have already found many posts that explain the Transformer in depth using images from this tool.)
4 Related works
• Llion Jones. 2017. Tensor2tensor transformer visualization 

• Interactive visualization and manipulation of attention-based neural machine
translation 

• Visual interrogation of attention-based models for natural language inference
and machine comprehension

More Related Content

What's hot

FIWARE Global Summit - FogFlow, a new GE for IoT Edge Computing
FIWARE Global Summit - FogFlow, a new GE for IoT Edge ComputingFIWARE Global Summit - FogFlow, a new GE for IoT Edge Computing
FIWARE Global Summit - FogFlow, a new GE for IoT Edge Computing
FIWARE
 
【2017年4月時点】Oracle Essbase 概要
【2017年4月時点】Oracle Essbase 概要【2017年4月時点】Oracle Essbase 概要
【2017年4月時点】Oracle Essbase 概要
オラクルエンジニア通信
 
IBM Integrated Analytics System ユーザー利用ガイド 20180213
IBM Integrated Analytics System ユーザー利用ガイド 20180213IBM Integrated Analytics System ユーザー利用ガイド 20180213
IBM Integrated Analytics System ユーザー利用ガイド 20180213
IBM Analytics Japan
 
外部データラッパによる PostgreSQL の拡張
外部データラッパによる PostgreSQL の拡張外部データラッパによる PostgreSQL の拡張
外部データラッパによる PostgreSQL の拡張
Shigeru Hanada
 
Oracle Container Engine for Kubernetes (OKE) ご紹介 [2021年5月版]
Oracle Container Engine for Kubernetes (OKE) ご紹介 [2021年5月版]Oracle Container Engine for Kubernetes (OKE) ご紹介 [2021年5月版]
Oracle Container Engine for Kubernetes (OKE) ご紹介 [2021年5月版]
オラクルエンジニア通信
 
RISC-Vの可能性
RISC-Vの可能性RISC-Vの可能性
RISC-Vの可能性
たけおか しょうぞう
 
openSUSEで最強仮想環境をつくろう - ゲーミングから仮想通貨まで - OSC名古屋2017セミナー資料
openSUSEで最強仮想環境をつくろう - ゲーミングから仮想通貨まで - OSC名古屋2017セミナー資料openSUSEで最強仮想環境をつくろう - ゲーミングから仮想通貨まで - OSC名古屋2017セミナー資料
openSUSEで最強仮想環境をつくろう - ゲーミングから仮想通貨まで - OSC名古屋2017セミナー資料
zgock
 
cloudpack負荷職人結果レポート(サンプル)
cloudpack負荷職人結果レポート(サンプル)cloudpack負荷職人結果レポート(サンプル)
cloudpack負荷職人結果レポート(サンプル)
iret, Inc.
 
VMware vSphere Networking deep dive
VMware vSphere Networking deep diveVMware vSphere Networking deep dive
VMware vSphere Networking deep dive
Vepsun Technologies
 
vSphere 7 へのアップグレードについて
vSphere 7 へのアップグレードについてvSphere 7 へのアップグレードについて
vSphere 7 へのアップグレードについて
富士通クラウドテクノロジーズ株式会社
 
クラウドのコストを大幅削減!事例から見るクラウド間移行の効果(Oracle Cloudウェビナーシリーズ: 2020年7月8日)
クラウドのコストを大幅削減!事例から見るクラウド間移行の効果(Oracle Cloudウェビナーシリーズ: 2020年7月8日)クラウドのコストを大幅削減!事例から見るクラウド間移行の効果(Oracle Cloudウェビナーシリーズ: 2020年7月8日)
クラウドのコストを大幅削減!事例から見るクラウド間移行の効果(Oracle Cloudウェビナーシリーズ: 2020年7月8日)
オラクルエンジニア通信
 
Database migration from Sybase ASE to PostgreSQL @2013.pgconf.eu
Database migration from Sybase ASE to PostgreSQL @2013.pgconf.euDatabase migration from Sybase ASE to PostgreSQL @2013.pgconf.eu
Database migration from Sybase ASE to PostgreSQL @2013.pgconf.eu
aldaschwede80
 
PostgreSQLでスケールアウト
PostgreSQLでスケールアウトPostgreSQLでスケールアウト
PostgreSQLでスケールアウト
Masahiko Sawada
 
プログラムを自動生成する技術 ~ Programming by Example ~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
プログラムを自動生成する技術 ~ Programming by Example ~(NTTデータ テクノロジーカンファレンス 2020 発表資料)プログラムを自動生成する技術 ~ Programming by Example ~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
プログラムを自動生成する技術 ~ Programming by Example ~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
NTT DATA Technology & Innovation
 
Red Hat OpenShift Container Storage
Red Hat OpenShift Container StorageRed Hat OpenShift Container Storage
Red Hat OpenShift Container Storage
Takuya Utsunomiya
 
Greenplum and Kafka: Real-time Streaming to Greenplum - Greenplum Summit 2019
Greenplum and Kafka: Real-time Streaming to Greenplum - Greenplum Summit 2019Greenplum and Kafka: Real-time Streaming to Greenplum - Greenplum Summit 2019
Greenplum and Kafka: Real-time Streaming to Greenplum - Greenplum Summit 2019
VMware Tanzu
 
Horizon Cloud on Microsoft Azure 概要 (2018年4月版)
Horizon Cloud on Microsoft Azure 概要 (2018年4月版)Horizon Cloud on Microsoft Azure 概要 (2018年4月版)
Horizon Cloud on Microsoft Azure 概要 (2018年4月版)
Takamasa Maejima
 
Apache ignite v1.3
Apache ignite v1.3Apache ignite v1.3
Apache ignite v1.3
Klearchos Klearchou
 
昨今のストレージ選定のポイントとCephStorageの特徴
昨今のストレージ選定のポイントとCephStorageの特徴昨今のストレージ選定のポイントとCephStorageの特徴
昨今のストレージ選定のポイントとCephStorageの特徴
Takuya Utsunomiya
 
Emc data domain
Emc data domainEmc data domain
Emc data domain
solarisyougood
 

What's hot (20)

FIWARE Global Summit - FogFlow, a new GE for IoT Edge Computing
FIWARE Global Summit - FogFlow, a new GE for IoT Edge ComputingFIWARE Global Summit - FogFlow, a new GE for IoT Edge Computing
FIWARE Global Summit - FogFlow, a new GE for IoT Edge Computing
 
【2017年4月時点】Oracle Essbase 概要
【2017年4月時点】Oracle Essbase 概要【2017年4月時点】Oracle Essbase 概要
【2017年4月時点】Oracle Essbase 概要
 
IBM Integrated Analytics System ユーザー利用ガイド 20180213
IBM Integrated Analytics System ユーザー利用ガイド 20180213IBM Integrated Analytics System ユーザー利用ガイド 20180213
IBM Integrated Analytics System ユーザー利用ガイド 20180213
 
外部データラッパによる PostgreSQL の拡張
外部データラッパによる PostgreSQL の拡張外部データラッパによる PostgreSQL の拡張
外部データラッパによる PostgreSQL の拡張
 
Oracle Container Engine for Kubernetes (OKE) ご紹介 [2021年5月版]
Oracle Container Engine for Kubernetes (OKE) ご紹介 [2021年5月版]Oracle Container Engine for Kubernetes (OKE) ご紹介 [2021年5月版]
Oracle Container Engine for Kubernetes (OKE) ご紹介 [2021年5月版]
 
RISC-Vの可能性
RISC-Vの可能性RISC-Vの可能性
RISC-Vの可能性
 
openSUSEで最強仮想環境をつくろう - ゲーミングから仮想通貨まで - OSC名古屋2017セミナー資料
openSUSEで最強仮想環境をつくろう - ゲーミングから仮想通貨まで - OSC名古屋2017セミナー資料openSUSEで最強仮想環境をつくろう - ゲーミングから仮想通貨まで - OSC名古屋2017セミナー資料
openSUSEで最強仮想環境をつくろう - ゲーミングから仮想通貨まで - OSC名古屋2017セミナー資料
 
cloudpack負荷職人結果レポート(サンプル)
cloudpack負荷職人結果レポート(サンプル)cloudpack負荷職人結果レポート(サンプル)
cloudpack負荷職人結果レポート(サンプル)
 
VMware vSphere Networking deep dive
VMware vSphere Networking deep diveVMware vSphere Networking deep dive
VMware vSphere Networking deep dive
 
vSphere 7 へのアップグレードについて
vSphere 7 へのアップグレードについてvSphere 7 へのアップグレードについて
vSphere 7 へのアップグレードについて
 
クラウドのコストを大幅削減!事例から見るクラウド間移行の効果(Oracle Cloudウェビナーシリーズ: 2020年7月8日)
クラウドのコストを大幅削減!事例から見るクラウド間移行の効果(Oracle Cloudウェビナーシリーズ: 2020年7月8日)クラウドのコストを大幅削減!事例から見るクラウド間移行の効果(Oracle Cloudウェビナーシリーズ: 2020年7月8日)
クラウドのコストを大幅削減!事例から見るクラウド間移行の効果(Oracle Cloudウェビナーシリーズ: 2020年7月8日)
 
Database migration from Sybase ASE to PostgreSQL @2013.pgconf.eu
Database migration from Sybase ASE to PostgreSQL @2013.pgconf.euDatabase migration from Sybase ASE to PostgreSQL @2013.pgconf.eu
Database migration from Sybase ASE to PostgreSQL @2013.pgconf.eu
 
PostgreSQLでスケールアウト
PostgreSQLでスケールアウトPostgreSQLでスケールアウト
PostgreSQLでスケールアウト
 
プログラムを自動生成する技術 ~ Programming by Example ~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
プログラムを自動生成する技術 ~ Programming by Example ~(NTTデータ テクノロジーカンファレンス 2020 発表資料)プログラムを自動生成する技術 ~ Programming by Example ~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
プログラムを自動生成する技術 ~ Programming by Example ~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
 
Red Hat OpenShift Container Storage
Red Hat OpenShift Container StorageRed Hat OpenShift Container Storage
Red Hat OpenShift Container Storage
 
Greenplum and Kafka: Real-time Streaming to Greenplum - Greenplum Summit 2019
Greenplum and Kafka: Real-time Streaming to Greenplum - Greenplum Summit 2019Greenplum and Kafka: Real-time Streaming to Greenplum - Greenplum Summit 2019
Greenplum and Kafka: Real-time Streaming to Greenplum - Greenplum Summit 2019
 
Horizon Cloud on Microsoft Azure 概要 (2018年4月版)
Horizon Cloud on Microsoft Azure 概要 (2018年4月版)Horizon Cloud on Microsoft Azure 概要 (2018年4月版)
Horizon Cloud on Microsoft Azure 概要 (2018年4月版)
 
Apache ignite v1.3
Apache ignite v1.3Apache ignite v1.3
Apache ignite v1.3
 
昨今のストレージ選定のポイントとCephStorageの特徴
昨今のストレージ選定のポイントとCephStorageの特徴昨今のストレージ選定のポイントとCephStorageの特徴
昨今のストレージ選定のポイントとCephStorageの特徴
 
Emc data domain
Emc data domainEmc data domain
Emc data domain
 

Similar to A Multiscale Visualization of Attention in the Transformer Model

Deep learning introduction
Deep learning introductionDeep learning introduction
Deep learning introduction
Adwait Bhave
 
tensorflow.pptx
tensorflow.pptxtensorflow.pptx
tensorflow.pptx
JoanJeremiah
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Xavier Amatriain
 
Deep learning with tensorflow
Deep learning with tensorflowDeep learning with tensorflow
Deep learning with tensorflow
Charmi Chokshi
 
Deep learning for real life applications
Deep learning for real life applicationsDeep learning for real life applications
Deep learning for real life applications
Anas Arram, Ph.D
 
10 Things I Wish I Dad Known Before Scaling Deep Learning Solutions
10 Things I Wish I Dad Known Before Scaling Deep Learning Solutions10 Things I Wish I Dad Known Before Scaling Deep Learning Solutions
10 Things I Wish I Dad Known Before Scaling Deep Learning Solutions
Jesus Rodriguez
 
Core Methods In Educational Data Mining
Core Methods In Educational Data MiningCore Methods In Educational Data Mining
Core Methods In Educational Data Mining
ebelani
 
Natural Language Processing Advancements By Deep Learning - A Survey
Natural Language Processing Advancements By Deep Learning - A SurveyNatural Language Processing Advancements By Deep Learning - A Survey
Natural Language Processing Advancements By Deep Learning - A Survey
AkshayaNagarajan10
 
The zen of predictive modelling
The zen of predictive modellingThe zen of predictive modelling
The zen of predictive modelling
Quinton Anderson
 
Learning to Translate with Joey NMT
Learning to Translate with Joey NMTLearning to Translate with Joey NMT
Learning to Translate with Joey NMT
Julia Kreutzer
 
Deep learning - a primer
Deep learning - a primerDeep learning - a primer
Deep learning - a primer
Uwe Friedrichsen
 
Deep learning - a primer
Deep learning - a primerDeep learning - a primer
Deep learning - a primer
Shirin Elsinghorst
 
Automating Tinder w/ Eigenfaces and StanfordNLP
Automating Tinder w/ Eigenfaces and StanfordNLPAutomating Tinder w/ Eigenfaces and StanfordNLP
Automating Tinder w/ Eigenfaces and StanfordNLP
Justin Long
 
Keras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learningKeras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learning
Dr. Ananth Krishnamoorthy
 
CPP19 - Revision
CPP19 - RevisionCPP19 - Revision
CPP19 - Revision
Michael Heron
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
Yuta Niki
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
doppenhe
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Xavier Amatriain
 
Training machine learning deep learning 2017
Training machine learning deep learning 2017Training machine learning deep learning 2017
Training machine learning deep learning 2017
Iwan Sofana
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
Zeynep Su Kurultay
 

Similar to A Multiscale Visualization of Attention in the Transformer Model (20)

Deep learning introduction
Deep learning introductionDeep learning introduction
Deep learning introduction
 
tensorflow.pptx
tensorflow.pptxtensorflow.pptx
tensorflow.pptx
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
 
Deep learning with tensorflow
Deep learning with tensorflowDeep learning with tensorflow
Deep learning with tensorflow
 
Deep learning for real life applications
Deep learning for real life applicationsDeep learning for real life applications
Deep learning for real life applications
 
10 Things I Wish I Dad Known Before Scaling Deep Learning Solutions
10 Things I Wish I Dad Known Before Scaling Deep Learning Solutions10 Things I Wish I Dad Known Before Scaling Deep Learning Solutions
10 Things I Wish I Dad Known Before Scaling Deep Learning Solutions
 
Core Methods In Educational Data Mining
Core Methods In Educational Data MiningCore Methods In Educational Data Mining
Core Methods In Educational Data Mining
 
Natural Language Processing Advancements By Deep Learning - A Survey
Natural Language Processing Advancements By Deep Learning - A SurveyNatural Language Processing Advancements By Deep Learning - A Survey
Natural Language Processing Advancements By Deep Learning - A Survey
 
The zen of predictive modelling
The zen of predictive modellingThe zen of predictive modelling
The zen of predictive modelling
 
Learning to Translate with Joey NMT
Learning to Translate with Joey NMTLearning to Translate with Joey NMT
Learning to Translate with Joey NMT
 
Deep learning - a primer
Deep learning - a primerDeep learning - a primer
Deep learning - a primer
 
Deep learning - a primer
Deep learning - a primerDeep learning - a primer
Deep learning - a primer
 
Automating Tinder w/ Eigenfaces and StanfordNLP
Automating Tinder w/ Eigenfaces and StanfordNLPAutomating Tinder w/ Eigenfaces and StanfordNLP
Automating Tinder w/ Eigenfaces and StanfordNLP
 
Keras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learningKeras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learning
 
CPP19 - Revision
CPP19 - RevisionCPP19 - Revision
CPP19 - Revision
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
 
Training machine learning deep learning 2017
Training machine learning deep learning 2017Training machine learning deep learning 2017
Training machine learning deep learning 2017
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 

More from taeseon ryu

VoxelNet
VoxelNetVoxelNet
VoxelNet
taeseon ryu
 
OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...
taeseon ryu
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splatting
taeseon ryu
 
JetsonTX2 Python
 JetsonTX2 Python  JetsonTX2 Python
JetsonTX2 Python
taeseon ryu
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptx
taeseon ryu
 
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
taeseon ryu
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
taeseon ryu
 
YOLO V6
YOLO V6YOLO V6
YOLO V6
taeseon ryu
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories
taeseon ryu
 
RL_UpsideDown
RL_UpsideDownRL_UpsideDown
RL_UpsideDown
taeseon ryu
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extraction
taeseon ryu
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learning
taeseon ryu
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
taeseon ryu
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuning
taeseon ryu
 
mPLUG
mPLUGmPLUG
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
taeseon ryu
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
taeseon ryu
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithm
taeseon ryu
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networks
taeseon ryu
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarization
taeseon ryu
 

More from taeseon ryu (20)

VoxelNet
VoxelNetVoxelNet
VoxelNet
 
OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splatting
 
JetsonTX2 Python
 JetsonTX2 Python  JetsonTX2 Python
JetsonTX2 Python
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptx
 
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
 
YOLO V6
YOLO V6YOLO V6
YOLO V6
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories
 
RL_UpsideDown
RL_UpsideDownRL_UpsideDown
RL_UpsideDown
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extraction
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learning
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuning
 
mPLUG
mPLUGmPLUG
mPLUG
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithm
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networks
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarization
 

Recently uploaded

一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 

Recently uploaded (20)

一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
A Multiscale Visualization of Attention in the Transformer Model

  • 1. A Multiscale Visualization of Attention in the Transformer Model (Jesse Vig, Palo Alto Research Center; ACL 2019). Deep Learning Paper Reading Group, NLP team: 백지윤, 진명훈. Presenter: 백지윤.
  • 2. Contents
    • 1 Introduction: Transformer, BERT, GPT
    • 2 Visualization Tool: Attention-Head View, Model View, Neuron View
    • 3 Use Cases
    • 4 Conclusion
  • 3. 1. Introduction: Transformer
    • The Transformer's key principle: self-attention
    • [Figure: a softmax over the attention scores produces the weights α1, α2, α3]
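To make the softmax figure concrete, here is a minimal single-head self-attention sketch in plain Python. The projection matrices `w_q`, `w_k`, `w_v` and the toy embeddings are illustrative assumptions, not values from the paper. Each token's scores against all keys are scaled by √d_k and passed through a softmax to get its weights (α1, α2, α3), and the output is the weighted sum of the value vectors.

```python
import math

def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both given as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(scores):
    """Numerically stable softmax over one row of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x holds one embedding per token (n x d_model). Returns (outputs, weights),
    where weights[i][j] is how strongly token i attends to token j.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    weights = []
    for q_i in q:
        # Score this query against every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k]
        weights.append(softmax(scores))  # alpha_1, alpha_2, alpha_3 for this token
    # Each output is the attention-weighted sum of the value vectors.
    outputs = [[sum(w * v_j[c] for w, v_j in zip(row, v)) for c in range(len(v[0]))]
               for row in weights]
    return outputs, weights

# Toy input: three tokens (a1, a2, a3) with 2-dimensional embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, alphas = self_attention(x, identity, identity, identity)
```

Identity projections are used only to keep the numbers easy to follow; in a trained model they are learned matrices.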
  • 4. 1. Introduction: Transformer
    • [Figure: the input a1 passes through a self-attention "layer"]
  • 5. [Figure: a1, a2, a3 pass through a self-attention "layer" whose outputs feed the next self-attention "layer"; the same process continues]
  • 6. Transformer
    • The actual Transformer uses several attention heads per word rather than the single head described so far.
    • In the decoder's encoder-decoder attention, the key and value vectors come from the encoder output, while the queries come from the decoder.
    • There are other details to cover (positional encoding, etc.), which I will study later on my own!
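The point that the decoder's keys and values come from the encoder can be shown with a small cross-attention sketch. It is a simplified illustration under stated assumptions: the learned query/key/value projections are omitted (treated as identity), and the encoder and decoder states are made-up 2-dimensional vectors. The key observation is the shape of the weights: one row per target (decoder) token and one column per source (encoder) token.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one row of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(decoder_states, encoder_out):
    """Encoder-decoder attention: queries come from the decoder, keys and values
    from the encoder output. Learned projections are omitted (identity) for brevity."""
    d_k = len(encoder_out[0])
    outputs, weights = [], []
    for q in decoder_states:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in encoder_out]
        alpha = softmax(scores)  # one weight per source token
        weights.append(alpha)
        # Mix the encoder's value vectors with those weights.
        outputs.append([sum(w * v[c] for w, v in zip(alpha, encoder_out)) for c in range(d_k)])
    return outputs, weights

encoder_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 source tokens
decoder_states = [[2.0, 0.0], [0.0, 2.0]]            # 2 target tokens so far
outputs, weights = cross_attention(decoder_states, encoder_out)
# weights is 2 x 3: one row per target token, one column per source token.
```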
  • 7. Transformer
    • For example, if each word's embedding dimension is 8, the sequence length is 3, and there are 4 heads, each head gets 8 / 4 = 2 dimensions, so the final shape is (N, 3, 4, 2), where N is the batch size.
    • In the decoder's encoder-decoder attention, the key and value vectors come from the encoder:
      enc_src = self.encoder(src, src_mask)
      out = self.decoder(trg, enc_src, src_mask, trg_mask)
      # code inside the Decoder:
      def forward(x, enc_out, enc_out...)
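The (N, 3, 4, 2) shape follows from slicing each 8-dimensional embedding into 4 consecutive chunks of 2. The sketch below reproduces that split with nested lists; the helper names `split_heads` and `shape` are hypothetical, and the batch size N = 2 is an arbitrary choice. Many implementations then transpose to (N, heads, seq_len, head_dim) so that each head attends independently, which is not shown here.

```python
def split_heads(x, num_heads):
    """Reshape (N, seq_len, d_model) nested lists into (N, seq_len, num_heads, head_dim)."""
    d_model = len(x[0][0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    head_dim = d_model // num_heads
    # Head h receives dimensions [h * head_dim, (h + 1) * head_dim) of each token.
    return [[[token[h * head_dim:(h + 1) * head_dim] for h in range(num_heads)]
             for token in seq] for seq in x]

def shape(t):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)

N = 2  # batch size (arbitrary)
x = [[[float(i) for i in range(8)] for _ in range(3)] for _ in range(N)]  # (N, 3, 8)
heads = split_heads(x, num_heads=4)
print(shape(heads))  # (2, 3, 4, 2)
```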
  • 8. BERT
    • BERT = Transformer encoder + fully connected layer
    • To become a strong language model, BERT is pre-trained on two tasks at the same time: masked language modeling (MLM) and next sentence prediction (NSP).
    • After acquiring this linguistic ability, BERT is ready for fine-tuning.
    • Example input: [CLS] I want to be a [MASK] [SEP] Tomorrow will be rainy. [SEP]
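A sketch of how such an input could be prepared: the sentence pair is packed as [CLS] A [SEP] B [SEP] with segment ids (0 for sentence A, 1 for sentence B), and tokens are corrupted for MLM. The 15% selection rate and the 80% [MASK] / 10% random token / 10% unchanged split follow the BERT paper's recipe; the whitespace tokenization, the tiny vocabulary, and the word "teacher" (a made-up stand-in for the masked word) are assumptions for illustration only.

```python
import random

SPECIAL = {"[CLS]", "[SEP]"}

def build_bert_input(sent_a, sent_b):
    """Pack a sentence pair as [CLS] A [SEP] B [SEP] with segment ids (0 for A, 1 for B)."""
    tokens = ["[CLS]"] + sent_a + ["[SEP]"] + sent_b + ["[SEP]"]
    segments = [0] * (len(sent_a) + 2) + [1] * (len(sent_b) + 1)
    return tokens, segments

def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """BERT-style MLM corruption of non-special tokens.

    Each token is selected with probability mask_prob; a selected token becomes
    [MASK] 80% of the time, a random vocab token 10%, and stays unchanged 10%.
    labels holds the original token at selected positions and None elsewhere.
    """
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in SPECIAL or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must predict this original token
        r = rng.random()
        if r < 0.8:
            corrupted[i] = "[MASK]"
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)
        # else: keep the original token
    return corrupted, labels

tokens, segments = build_bert_input("I want to be a teacher".split(),
                                    "Tomorrow will be rainy .".split())
corrupted, labels = mask_tokens(tokens, vocab=["school", "rainy", "want"], rng=random.Random(0))
```

The NSP label (whether sentence B really follows sentence A) is a separate binary target and is not modeled in this sketch.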
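  • 9. GPT
    • GPT = Transformer decoder + fully connected layer
    • To become a fluent language model, GPT is pre-trained on one task: predicting the next token.
    • GPT-2 example: given <START> 나는 학교에 ("I ... to school"), the model continues with 나는 학교에 간다 ("I go to school").
    • print(generate_sent("이때", gpt_model, greedy=True))
      >> "이 때문에 일부 전문가들은 ... " ("Because of this, some experts ...")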
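Greedy generation, as in the `greedy=True` call above, repeatedly appends the single highest-scoring next token. The sketch below is not the presenter's `generate_sent`; `greedy_generate`, `toy_scores`, and the bigram table with its scores are hypothetical stand-ins. A real GPT conditions on the whole prefix, whereas this toy model only looks at the last token.

```python
def greedy_generate(next_token_scores, prompt, max_new_tokens=10, eos="<END>"):
    """Append the highest-scoring next token until eos or max_new_tokens."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)  # {candidate_token: score}
        if not scores:
            break
        best = max(scores, key=scores.get)  # greedy: always take the argmax
        if best == eos:
            break
        tokens.append(best)
    return tokens

# Made-up bigram scores standing in for a trained language model.
TOY_BIGRAMS = {
    "<START>": {"나는": 0.9, "너는": 0.1},
    "나는": {"학교에": 0.8, "집에": 0.2},
    "학교에": {"간다": 0.7, "있다": 0.3},
    "간다": {"<END>": 1.0},
}

def toy_scores(prefix):
    """Score candidates using only the last token of the prefix."""
    return TOY_BIGRAMS.get(prefix[-1], {})

print(greedy_generate(toy_scores, ["<START>", "나는"]))
# ['<START>', '나는', '학교에', '간다']
```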
  • 10. 2. Why a Visualization Tool? Challenges
    • An advantage of attention is that it can help interpret a model: visualizing it shows how the model assigns weight to different input elements.
    • One challenge in visualizing attention in the Transformer is its multi-layer, multi-head attention mechanism. Ex) 24 layers and 16 heads -> 24 * 16 = 384 unique attention structures already!
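The 384 figure counts attention maps; each map is itself a seq_len × seq_len grid of weights, which is why the amount to display grows quickly. The sketch below extends the arithmetic; the helper name `attention_inventory` and the sequence length of 10 are illustrative assumptions. For a left-to-right (causal) model such as GPT, only the lower triangle of each map is non-zero.

```python
def attention_inventory(num_layers, num_heads, seq_len, causal=False):
    """Count attention maps (one per layer-head pair) and the attention weights
    a full visualization would have to display."""
    num_maps = num_layers * num_heads
    if causal:
        # With a causal mask, token i only attends to tokens 0..i.
        weights_per_map = seq_len * (seq_len + 1) // 2
    else:
        # Without a mask, every token attends to every token.
        weights_per_map = seq_len * seq_len
    return num_maps, num_maps * weights_per_map

print(attention_inventory(24, 16, seq_len=10))               # (384, 38400)
print(attention_inventory(24, 16, seq_len=10, causal=True))  # (384, 21120)
```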
  • 14. Use Case: Detecting Model Bias
    • "The doctor asked the nurse a question. He asked her if she ever had a heart attack."
    • "The doctor asked the nurse a question. She said, 'I'm not sure what you're talking about.'"
  • 15. Model View
    • The Model View can be especially useful for paraphrase detection tasks.
  • 16. Neuron View
    • Positive values: blue; negative values: orange
    • Color saturation: magnitude of the value
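The color encoding can be sketched as a function from a value to an RGB triple: the sign picks the hue and the relative magnitude sets how far the color moves away from white. This is an assumed scheme for illustration, not the tool's actual palette; the name `value_to_rgb`, the normalization by `max_abs`, and the RGB values (pure blue, and CSS "orange" = (255, 165, 0)) are choices made here.

```python
def value_to_rgb(value, max_abs):
    """Map a neuron value to an RGB color: blue if positive, orange if negative,
    with saturation proportional to |value| / max_abs (white means zero)."""
    if max_abs <= 0:
        return (255, 255, 255)
    strength = min(abs(value) / max_abs, 1.0)
    base = (0, 0, 255) if value > 0 else (255, 165, 0)
    # Blend from white (strength 0) toward the full base color (strength 1).
    return tuple(round(255 + (c - 255) * strength) for c in base)

vector = [0.8, -0.4, 0.0, -1.2]
max_abs = max(abs(v) for v in vector)
colors = [value_to_rgb(v, max_abs) for v in vector]
```

Normalizing by the largest magnitude in the vector keeps the strongest neuron fully saturated, so small values fade toward white.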
  • 18. Neuron View
    • The attention weights appear to be largely independent of the content of the input text, since all the query vectors have very similar values.
    • A small number of neuron positions appear to be mostly responsible for this distance-decaying attention pattern.
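Both observations can be checked with simple arithmetic on query and key vectors. The dot product q · k is a sum of per-dimension terms q[i] * k[i], so ranking those terms shows which neuron positions drive an attention score, and cosine similarity measures how alike two query vectors are. The vectors below are made up for illustration, and the helper names are hypothetical.

```python
import math

def dimension_contributions(q, k, top_n=2):
    """Split q.k into per-dimension terms q[i] * k[i] and report the top_n
    dimensions by absolute contribution and their share of the total."""
    terms = [a * b for a, b in zip(q, k)]
    order = sorted(range(len(terms)), key=lambda i: abs(terms[i]), reverse=True)
    top = order[:top_n]
    total = sum(abs(t) for t in terms)
    share = sum(abs(terms[i]) for i in top) / total if total else 0.0
    return terms, top, share

def cosine_similarity(u, v):
    """Cosine similarity; values near 1 mean two query vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Made-up 4-dimensional query and key vectors where dimension 1 dominates.
q = [0.1, 2.0, -0.2, 0.05]
k = [0.3, 1.5, 0.1, -0.4]
terms, top, share = dimension_contributions(q, k)
print(top)              # [1, 0]
print(round(share, 3))  # 0.987
```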
  • 19. Use Case
    • Model intervention: e.g., one might prefer a slower decay rate for a scientific text than for a children's story. Other heads may afford different types of interventions.
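What a "decay rate" means can be illustrated with a toy causal attention pattern whose scores fall linearly with distance, -decay_rate * (i - j), before the softmax. This is an assumed model for illustration, not how the real head computes its weights or how an intervention would be implemented; the name `decaying_attention` and both rates are hypothetical. A smaller rate spreads weight further back in the text.

```python
import math

def decaying_attention(seq_len, decay_rate):
    """Toy causal attention: token i scores earlier token j as -decay_rate * (i - j),
    so after the softmax the weights fall off with distance."""
    weights = []
    for i in range(seq_len):
        scores = [-decay_rate * (i - j) for j in range(i + 1)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

fast = decaying_attention(5, decay_rate=2.0)  # e.g., a children's story
slow = decaying_attention(5, decay_rate=0.5)  # e.g., a scientific text
# Weight the last token puts on the first token is larger under slow decay.
print(fast[-1][0] < slow[-1][0])  # True
```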
  • 20. 4 Conclusion
    • To me, the paper was visually pleasing.
    • However, I would cautiously suggest that it might have been better to explain in more detail how the authors extracted each weight and computed the displayed values.
    • I find the tool very useful, since it can help me look inside the black box when a model's result differs from what I expect. (Plus, I have already found many posts that explain the Transformer in depth using images from the tool's site.)
  • 21. 4 Related Works
    • Llion Jones. 2017. Tensor2Tensor Transformer visualization.
    • Interactive visualization and manipulation of attention-based neural machine translation.
    • Visual interrogation of attention-based models for natural language inference and machine comprehension.