An introduction to the Transformers architecture and BERT - Suman Debnath
The transformer is one of the most popular state-of-the-art (SOTA) deep learning architectures, used mostly for natural language processing (NLP) tasks. Since its advent, the transformer has replaced RNNs and LSTMs for many tasks. It also marked a major breakthrough in NLP and paved the way for revolutionary new architectures such as BERT.
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Survey of Attention mechanism & Use in Computer Vision - SwatiNarkhede1
This presentation contains an overview of attention models. It also covers the stand-alone self-attention model used for computer vision tasks.
Slides reviewing the paper:
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." In Advances in Neural Information Processing Systems, pp. 6000-6010. 2017.
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single state-of-the-art model by 0.7 BLEU, achieving a BLEU score of 41.1.
This is material prepared for a lab seminar about the "Transformer", the architecture underlying much of recent NLP x Deep Learning research. Citations of the reference materials are intended to be accurate; please point out any errors.
PR-409: Denoising Diffusion Probabilistic Models - Hyeongmin Lee
This paper is Denoising Diffusion Probabilistic Models (DDPM), the work that first popularized the currently trending diffusion models. It elegantly resolved several practical issues of diffusion, originally proposed at ICML 2015, and kicked off the current wave. We will look at the different branches of generative models, diffusion itself, and what DDPM changed.
Paper link: https://arxiv.org/abs/2006.11239
Video link: https://youtu.be/1j0W_lu55nc
Introduction For seq2seq (sequence to sequence) and RNN - Hye-min Ahn
These are my slides introducing the sequence-to-sequence model and the Recurrent Neural Network (RNN) to my laboratory colleagues.
Hyemin Ahn, @CPSLAB, Seoul National University (SNU)
Attention Mechanism in Language Understanding and its Applications - Artifacia
This is the presentation from our AI Meet March 2017 on Attention Mechanism in Language Understanding and its Applications.
You can join Artifacia AI Meet Bangalore Group: https://www.meetup.com/Artifacia-AI-Meet/
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
Week 9: The neural basis of consciousness: dissociation of consciousness &... - Nao (Naotsugu) Tsuchiya
A 12-week lecture series on "the neural basis of consciousness" by Prof Nao Tsuchiya, given at third-year undergraduate level. No prerequisites.
Contents:
1) What are the logic and evidence of experiments which demonstrate dissociation between attention and consciousness?
2) How do they manipulate & assess consciousness?
3) How do they manipulate & assess attention?
Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Represe... - Numenta
Most current deep neural networks learn from a static data set without active interaction with the world. We take a look at how learning through a closed loop between action and perception affects the representations learned in a DNN. We demonstrate how these representations are significantly different from those of DNNs that learn supervised or unsupervised from a static dataset without interaction. These representations are much sparser and encode meaningful content in an efficient way. Even an agent that learned without any external supervision, purely through curious interaction with the world, acquires encodings of the high-dimensional visual input that enable it to recognize objects using only a handful of labeled examples. Our results highlight the capabilities that emerge from letting DNNs learn more like biological brains, through sensorimotor interaction with the world.
For more:
Week 8: The neural basis of consciousness: consciousness vs. attention - Nao (Naotsugu) Tsuchiya
A 12-week lecture series on "the neural basis of consciousness" by Prof Nao Tsuchiya, given at third-year undergraduate level. No prerequisites.
Contents:
1) How can we define “attention”?
2) What are the paradigms to manipulate attention?
3) What are the neuronal mechanisms of attention?
4) How can we explain the relationship between attention and consciousness?
Can Marketers Get to Grips with the Human Condition? - Klaxon
On 20th October we explored how to employ neuroscience research techniques to drive marketing performance.
Our industry experts included:
Thom Noble, CEO, NeuroStrata
Mev Bertrand, Research Manager, Neuro-Insight
Will Nicholson, Managing Director, The Vision Network
The Building Blocks of QuestDB, a Time Series Database - javier ramirez
Talk delivered at Valencia Codes Meetup, June 2024.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Adjusting primitives for graph: SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank. Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Learn SQL from basic queries to advanced queries - manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with the NJ Gen AI Meetup Lead, Prasad, and Procure.FYI's Co-Founder.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... - Subhajit Sahu
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Adjusting OpenMP PageRank: SHORT REPORT / NOTES - Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
Unleashing the Power of Data: Choosing a Trusted Analytics Platform.pdf - Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
6. Attention, also referred to as enthrallment, is the behavioral and cognitive process of selectively concentrating on a discrete aspect of information, whether deemed subjective or objective, while ignoring other perceivable information. It is a state of arousal. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneous objects or trains of thought. Focalization, the concentration of consciousness, is of its essence. Attention, or enthrallment, has also been described as the allocation of limited cognitive processing resources.
7. Attention, also referred to as enthrallment, is the behavioral and cognitive process of selectively concentrating on a discrete aspect of information, whether deemed subjective or objective, while ignoring other perceivable information. It is a state of arousal. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneous objects or trains of thought. Focalization, the concentration of consciousness, is of its essence. Attention, or enthrallment, has also been described as the allocation of limited cognitive processing resources.
8. Attention, also referred to as enthrallment, is the behavioral and cognitive process of selectively concentrating on a discrete aspect of information, whether deemed subjective or objective, while ignoring other perceivable information. It is a state of arousal. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneous objects or trains of thought. Focalization, the concentration of consciousness, is of its essence. Attention, or enthrallment, has also been described as the allocation of limited cognitive processing resources.
Recurrent Neural Network
[Figure: the example text above is processed token by token through recurrent hidden states]
Problem: non-parallel computation, no long-range dependencies
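To make the non-parallel computation concrete, here is a minimal NumPy sketch (toy sizes, not from the slides) of a vanilla RNN step loop: each hidden state depends on the previous one, so time steps cannot be processed in parallel, and information from early tokens must survive many updates to influence later ones.

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_h, T = 8, 16, 50                       # assumed toy sizes
    W_xh = rng.normal(scale=0.1, size=(d_in, d_h))
    W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
    x = rng.normal(size=(T, d_in))                 # a sequence of T token vectors

    h = np.zeros(d_h)
    for t in range(T):                             # strictly sequential: no parallelism over t
        h = np.tanh(x[t] @ W_xh + h @ W_hh)        # h_t depends on h_{t-1}
    print(h.shape)                                 # (16,) final hidden state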
9. Attention, also referred to as enthrallment, is the behavioral and cognitive process of selectively concentrating on a discrete aspect of information, whether deemed subjective or objective, while ignoring other perceivable information. It is a state of arousal. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneous objects or trains of thought. Focalization, the concentration of consciousness, is of its essence. Attention, or enthrallment, has also been described as the allocation of limited cognitive processing resources.
Convolutional Neural Network
[Figure: a convolutional filter slides over the example text, so each output only sees a local window of words]
Problem: no long-range dependencies, computationally inefficient
10. Attention, also referred to as enthrallment, is the behavioral and cognitive process of selectively concentrating on a discrete aspect of information, whether deemed subjective or objective, while ignoring other perceivable information. It is a state of arousal. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneous objects or trains of thought. Focalization, the concentration of consciousness, is of its essence. Attention, or enthrallment, has also been described as the allocation of limited cognitive processing resources.
Attention mechanism
Parallel computation, long-range dependencies, explainable
12. Attention mechanism
Fig. from Vaswani et al. Attention is all you need. arXiv. 2017
1. Compute the similarity between Q and K.
2. Normalize so that excessively large values do not dominate.
13. Attention mechanism
Fig. from Vaswani et al. Attention is all you need. arXiv. 2017
1. Compute the similarity between Q and K.
2. Normalize so that excessively large values do not dominate.
3. Similarities → weights (summing to 1).
14. Attention mechanism
Fig. from Vaswani et al. Attention is all you need. arXiv. 2017
1. Compute the similarity between Q and K.
2. Normalize so that excessively large values do not dominate.
3. Similarities → weights (summing to 1).
4. Multiply the weights by V.
15. Attention mechanism
Fig. from Vaswani et al. Attention is all you need. arXiv. 2017
The information {K: V} will be related to some query Q. We can use this to compute the similarity between K and Q and apply it to V. That way, more of the information in V that is directly relevant to Q gets passed along.
1. Compute the similarity between Q and K.
2. Normalize so that excessively large values do not dominate.
3. Similarities → weights (summing to 1).
4. Multiply the weights by V.
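To make the four steps concrete, here is a minimal NumPy sketch of scaled dot-product attention; the shapes and variable names are illustrative, not taken from the slides.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T                  # 1. similarity between Q and K
        scores = scores / np.sqrt(d_k)    # 2. scale so large values do not dominate
        weights = softmax(scores)         # 3. similarities -> weights summing to 1
        return weights @ V, weights       # 4. weighted sum over V

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(3, 4))    # 3 queries of dim 4
    K = rng.normal(size=(5, 4))    # 5 keys of dim 4
    V = rng.normal(size=(5, 8))    # 5 values of dim 8
    out, w = scaled_dot_product_attention(Q, K, V)
    print(out.shape, w.sum(axis=-1))   # (3, 8); each row of weights sums to 1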
16. e.g. Attention mechanism with Seq2Seq
(Machine translation, Encoder-Decoder, Attention)
[Diagram: an encoder RNN feeding a decoder RNN]
The encoder propagates information depending on the hidden state at the previous step t and the input at the current step t.
The final encoder state is passed to the decoder.
The decoder propagates information depending only on the information from the previous step t.
17. e.g. Attention mechanism with Seq2Seq
(Machine translation, Encoder-Decoder, Attention)
[Diagram: an attention module connects every encoder state to the decoder and is combined (⊕) with the decoder input]
Attention
Long-range dependency
18. e.g. Attention mechanism with Seq2Seq
Fig. from Bahdanau et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR. 2015
(Machine translation, Encoder-Decoder, Attention)
Attention
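A hedged NumPy sketch of the decoder-side attention step in this seq2seq setup, roughly in the spirit of Bahdanau et al. (2015): the current decoder state is scored against every encoder hidden state, the scores become weights, and their weighted sum is the context vector. The additive-scoring parameters (W_s, W_h, v) and all sizes are illustrative assumptions, not the paper's code.

    import numpy as np

    def softmax(x):
        x = x - x.max()
        e = np.exp(x)
        return e / e.sum()

    rng = np.random.default_rng(0)
    T, d = 6, 16
    enc_states = rng.normal(size=(T, d))   # encoder hidden states for T source tokens
    dec_state = rng.normal(size=(d,))      # current decoder hidden state

    W_s = rng.normal(scale=0.1, size=(d, d))
    W_h = rng.normal(scale=0.1, size=(d, d))
    v = rng.normal(scale=0.1, size=(d,))

    scores = np.tanh(dec_state @ W_s + enc_states @ W_h) @ v   # additive scores, shape (T,)
    alphas = softmax(scores)                                    # attention weights over the source
    context = alphas @ enc_states                               # context vector, shape (d,)
    print(alphas.round(2), context.shape)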
19. e.g. Style-token
Fig. from Wang et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis. arXiv. 2018
(Text to speech, Encoder-Decoder, Style transfer, Attention)
[Diagram: a reference encoder (Encoder2) attends over a bank of randomly initialized global style tokens (GST); the resulting style embedding is combined (⊕) with the text encoder (Encoder1) output and fed to the decoder]
Demo: https://google.github.io/tacotron/publications/global_style_tokens/
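A minimal sketch of the global-style-token idea under assumed shapes (the paper uses multi-head attention; a single dot-product scoring is used here for brevity): the reference encoder output acts as the query over a small bank of randomly initialized, learnable style tokens, and the weighted sum is the style embedding that conditions the decoder.

    import numpy as np

    def softmax(x):
        x = x - x.max()
        e = np.exp(x)
        return e / e.sum()

    rng = np.random.default_rng(0)
    n_tokens, d = 10, 32
    style_tokens = rng.normal(size=(n_tokens, d))   # GST bank (random init, learned in training)
    ref_embedding = rng.normal(size=(d,))           # output of the reference encoder (Encoder2)

    weights = softmax(style_tokens @ ref_embedding / np.sqrt(d))   # attention over the token bank
    style_embedding = weights @ style_tokens                       # combined with the text encoding
    print(weights.round(2), style_embedding.shape)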
23. Self-attention
Fig. from Wang et al. Non-local neural networks. arXiv. 2017
1. Compute the similarity between pixels i and j.
24. Self-attention
Fig. from Wang et al. Non-local neural networks. arXiv. 2017
1. Compute the similarity between pixels i and j.
2. Multiply by the value of pixel j.
25. Self-attention
Fig. from Wang et al. Non-local neural networks. arXiv. 2017
1. Compute the similarity between pixels i and j.
2. Multiply by the value of pixel j.
3. Normalization term.
26. Self-attention
Fig. from Wang et al. Non-local neural networks. arXiv. 2017
The information at positions i and j will be related to each other. We compute a similarity for every pair of positions and use it as a weight. That way, relationships between all positions can be learned (long-range dependency!).
1. Compute the similarity between pixels i and j.
2. Multiply by the value of pixel j.
3. Normalization term.
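The same idea written as a minimal NumPy sketch of a non-local (self-attention) operation over a flattened feature map, in the spirit of Wang et al. (2017); the embedded-Gaussian scoring, the projection names, and the toy sizes are assumptions.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    rng = np.random.default_rng(0)
    H, W, C = 4, 4, 8
    x = rng.normal(size=(H * W, C))           # flattened H*W pixel features

    W_theta = rng.normal(scale=0.1, size=(C, C))
    W_phi = rng.normal(scale=0.1, size=(C, C))
    W_g = rng.normal(scale=0.1, size=(C, C))

    theta, phi, g = x @ W_theta, x @ W_phi, x @ W_g
    sim = theta @ phi.T                       # 1. similarity between pixels i and j
    attn = softmax(sim)                       # 3. normalization term (each row sums to 1)
    y = attn @ g                              # 2. weighted sum over the values of pixels j
    out = x + y                               # residual connection, as in non-local blocks
    print(out.shape)                          # (16, 8)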
27. e.g. Self-Attention GAN
(Image generation, GAN, Self-attention)
[Architecture diagram: Generator: latent z → transposed-convolution blocks with a self-attention layer → generated image x'. Discriminator: image x → convolution blocks with a self-attention layer → fully connected layer → real/fake probability.]
28. e.g. Self-Attention GAN
(Image generation, GAN, Self-attention)
Fig. from Zhang et al. Self-Attention Generative Adversarial Networks. arXiv. 2018
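A hedged sketch of how such a self-attention block can sit inside a SAGAN-style generator or discriminator: attention over all spatial positions of a conv feature map, added back through a learnable residual scale gamma (initialized to 0 in the paper, so the block starts as a pass-through). Layer sizes and names are illustrative only.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention_block(feat, W_f, W_g, W_h, gamma):
        # feat: (N, C) flattened conv feature map -> (N, C)
        f, g, h = feat @ W_f, feat @ W_g, feat @ W_h
        attn = softmax(f @ g.T)              # pairwise attention over all positions
        return gamma * (attn @ h) + feat     # gamma starts at 0: block begins as identity

    rng = np.random.default_rng(0)
    C = 8
    feat = rng.normal(size=(16 * 16, C))     # e.g. conv features of a 16x16 map
    W_f, W_g, W_h = (rng.normal(scale=0.1, size=(C, C)) for _ in range(3))
    out = self_attention_block(feat, W_f, W_g, W_h, gamma=0.0)
    print(np.allclose(out, feat))            # True: with gamma=0 the block is a pass-through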
31. Reference
- Bahdanau et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR. 2015
- Wang et al. Non-local Neural Networks. arXiv. 2017
- Vaswani et al. Attention Is All You Need. arXiv. 2017
- Wang et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis. arXiv. 2018
- Zhang et al. Self-Attention Generative Adversarial Networks. arXiv. 2018
- Blog post explaining "Attention Is All You Need" (https://mchromiak.github.io/articles/2017/Sep/12/Transformer-Attention-is-all-you-need/)
- Video explaining "Attention Is All You Need" (https://www.youtube.com/watch?v=iDulhoQ2pro)