Slide 3
Visual Attention in Human Perception
Visual attention in human perception is based on the dynamics between recognition and selection.
(Image source: John Henderson and Taylor Hayes, UC Davis)
Slide 4
Timeline:
Visual Attention in Human Perception (since the 1990s)
Visual Attention in Image Captioning (since 2014)
Slides 6-8
Visual Attention in Image Captioning
Encoder: CNN. Convolutional features entangle spatial with semantic information.
Decoder: RNN. Visual attention makes it possible to recover the spatial relations of the current outputs.
[Figure: CNN encoder and RNN decoder connected by visual attention]
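To make the mechanism concrete, here is a minimal NumPy sketch of one soft visual-attention step in the spirit of Show, Attend and Tell (Xu et al., 2015). The additive scoring function and the weight names (W_f, W_h, v) are illustrative assumptions, not the deck's exact parameterization.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def visual_attention(features, hidden, W_f, W_h, v):
    """One soft visual-attention step: score every spatial location of
    the CNN feature map against the current RNN decoder state, then
    return the attention-weighted context vector."""
    # features: (L, D) -- L spatial locations, D channels
    # hidden:   (H,)   -- current decoder hidden state
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v  # (L,) additive scores
    alpha = softmax(scores)        # attention weights over image locations
    context = alpha @ features     # (D,) spatially focused feature vector
    return context, alpha

# Toy shapes: a 7x7 feature map with 512 channels, hidden size 256.
rng = np.random.default_rng(0)
L, D, H, A = 49, 512, 256, 128
context, alpha = visual_attention(
    rng.normal(size=(L, D)), rng.normal(size=H),
    rng.normal(size=(D, A)), rng.normal(size=(H, A)), rng.normal(size=A))
print(context.shape, round(alpha.sum(), 6))  # (512,) 1.0
```

The attention weights alpha are what gets visualized as a heat map over the image while each caption word is generated.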
Slide 9
Timeline:
Visual Attention in Human Perception (since the 1990s)
Visual Attention in Image Captioning (since 2014)
Global Attention in Machine Translation (since 2015)
Slides 11-13
Global Attention in Machine Translation
Encoder: RNN. The sequence of hidden states encodes the context-building stack.
Decoder: RNN. Global attention makes it possible to recover the context of the current outputs.
[Figure: RNN encoder and RNN decoder with global attention over the encoder states]
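As a sketch of what "recovering the context" means here: a global attention step scores the current decoder state against every encoder hidden state and mixes the whole source sequence into one context vector. The dot-product score used below is one common choice (Luong-style) and is an assumption, not necessarily the variant the slide depicts.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def global_attention(encoder_states, decoder_state):
    """Global attention: compare the current decoder state with every
    encoder hidden state and mix the whole source sequence into one
    context vector for the current output."""
    # encoder_states: (T, H) -- one hidden state per source token
    # decoder_state:  (H,)   -- current target-side hidden state
    scores = encoder_states @ decoder_state  # (T,) dot-product scores
    alpha = softmax(scores)                  # weights over source positions
    context = alpha @ encoder_states         # (H,) context vector
    return context, alpha

# Toy usage: 10 source tokens, hidden size 64.
rng = np.random.default_rng(0)
context, alpha = global_attention(rng.normal(size=(10, 64)),
                                  rng.normal(size=64))
print(context.shape, alpha.argmax())  # (64,) index of the attended token
```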
Slide 14
Timeline:
Visual Attention in Human Perception (since the 1990s)
Visual Attention in Image Captioning (since 2014)
Global Attention in Machine Translation (since 2015)
Multi-Step Attention in Natural Language Processing (since 2016)
Slides 16-17
Multi-Step Attention in Natural Language Processing
Encoder: CNN. The sequence of convolutional features encodes the context-building stack.
Decoder: CNN. Multi-step attention makes it possible to recover the current context at each deconvolutional step.
[Figure: convolutional encoder and decoder with multi-step attention]
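A rough NumPy sketch of the idea, assuming dot-product scoring: in convolutional sequence-to-sequence models (in the spirit of Gehring et al.), every decoder layer runs its own attention pass over the encoder outputs, so the context is recomputed at each step of the stack rather than once for the whole decoder.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_step_attention(encoder_states, decoder_layer_states):
    """Multi-step attention: each decoder layer attends to the encoder
    outputs separately, yielding one context vector per layer instead
    of a single attention pass for the whole decoder."""
    # encoder_states:       (T, H)
    # decoder_layer_states: list of (H,) states, one per decoder layer
    contexts = []
    for h in decoder_layer_states:
        alpha = softmax(encoder_states @ h)   # this layer's own weights
        contexts.append(alpha @ encoder_states)
    return np.stack(contexts)                 # (num_layers, H)

# Toy usage: 10 source positions, hidden size 64, 3 decoder layers.
rng = np.random.default_rng(0)
ctx = multi_step_attention(rng.normal(size=(10, 64)),
                           [rng.normal(size=64) for _ in range(3)])
print(ctx.shape)  # (3, 64)
```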
Slide 18
Timeline:
Visual Attention in Human Perception (since the 1990s)
Visual Attention in Image Captioning (since 2014)
Global Attention in Machine Translation (since 2015)
Multi-Step Attention in Natural Language Processing (since 2016)
Self-Attention in Natural Language Processing (since 2017)
Slides 19-22
Self-Attention in Natural Language Processing
Encoder: stacked ANN. The sequence of hidden features captures the encoder's context-building stack.
Decoder: stacked ANN. Multi-step attention unrolls the context stack to the decoder.
Self-attention aggregates context dependencies within the inputs.
[Figure: stacked encoder and decoder architecture with self-attention]
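The self-attention step itself can be sketched as scaled dot-product attention (Vaswani et al., 2017): queries, keys and values are all projections of the same input sequence, so every position aggregates context from every other position. The projection matrices and toy shapes below are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention: queries, keys and values are
    all projections of the same inputs, so every position aggregates
    context from every other position in the sequence."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v       # (T, d) each
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (T, T) pairwise scores
    alpha = softmax(scores)                   # each row sums to 1
    return alpha @ V                          # (T, d) contextual features

# Toy usage: 8 tokens with 32-dimensional embeddings, head size 16.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 32))
out = self_attention(X, *(rng.normal(size=(32, 16)) for _ in range(3)))
print(out.shape)  # (8, 16)
```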
Slides 24-26
Summary
#1 Human perception is based on the dynamics between selection and recognition.
#2 Attention mechanisms imitate this behaviour by using intermediate states that entangle context with semantic information.
#3 Incorporating context provides dynamic, context-specific features and therefore improves model performance.
#4 Finally, attention mechanisms can also help us understand the decisions of deep networks.