The document discusses vision transformers and attention mechanisms in computer vision. It summarizes recent work on applying transformers, originally developed for natural language processing, to vision tasks, including Vision Transformers, which treat an image as a sequence and apply self-attention to it. It also reviews key papers on attention mechanisms and the Transformer architecture.
22. Attention is all you Need
Basics explained
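[Figure: complete attention diagram. Query Q = Y (rows y1, y2, y3); key K^T = X^T (columns x1, x2, x3); value V = X (rows x1, x2, x3). The attention map Q·K^T has rows y1, y2, y3 and columns x1, x2, x3, and is multiplied by X to give the output.]
Output = (Q·K^T)·V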
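The formula on this slide can be sketched in a few lines of NumPy. This is only an illustration of the slide's simplified form, not the full method of the paper: the function name `unnormalized_attention` and the toy values of `Y` and `X` are assumptions made up for the example, and the scaling and softmax steps of the paper's scaled dot-product attention are left out because the slide leaves them out.

```python
import numpy as np

def unnormalized_attention(Y, X):
    """Compute (Q K^T) V with Q = Y and K = V = X, as drawn on the slide."""
    Q, K, V = Y, X, X
    attention_map = Q @ K.T      # one row per query y_i, one column per key x_j
    output = attention_map @ V   # each output row is a weighted sum of the rows of X
    return attention_map, output

# Hypothetical toy inputs: 3 queries and 3 values, each of dimension 2.
Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

attention_map, output = unnormalized_attention(Y, X)
print(attention_map.shape)  # (3, 3)
print(output.shape)         # (3, 2)
```

The attention map has one row per query and one column per key, so its shape depends only on how many y and x vectors there are, not on their dimension.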
23. Attention is all you Need
Basics explained
[Figure, build step: query Y (rows y1, y2, y3), value X (rows x1, x2, x3), and key X^T (columns x1, x2, x3).]
24. Attention is all you Need
Basics explained
[Figure, build step: as on the previous slide, with the query Y labelled Q and the key X^T labelled K^T.]
25. Attention is all you Need
Basics explained
[Figure, build step: Q (rows y1, y2, y3) is multiplied by K^T (columns x1, x2, x3) to give the attention map, with rows y1, y2, y3 and columns x1, x2, x3.]
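One way to read the attention map on this slide, written out entry by entry. This is a sketch under assumptions the slides do not state: each y_i and x_j is taken to be a row vector of the same dimension, similarity is the plain dot product with no normalization, and the symbol A for the attention map is introduced here only for illustration.

```latex
A = Q K^{\top} =
\begin{pmatrix}
y_1 \cdot x_1 & y_1 \cdot x_2 & y_1 \cdot x_3 \\
y_2 \cdot x_1 & y_2 \cdot x_2 & y_2 \cdot x_3 \\
y_3 \cdot x_1 & y_3 \cdot x_2 & y_3 \cdot x_3
\end{pmatrix},
\qquad A_{ij} = y_i \cdot x_j .
```

Row i of the map therefore holds the scores of query y_i against every key x_1, x_2, x_3.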
26. Attention is all you Need
Basics explained
[Figure, translation example: the query Y holds the tokens ‘Je’, ‘suis’, ‘leo’; the key X^T and value X hold the tokens ‘I’, ‘am’, ‘Leo’. The attention map Q·K^T has rows ‘Je’, ‘suis’, ‘leo’ and columns ‘I’, ‘am’, ‘Leo’.]
27. Attention is all you Need
Basics explained
[Figure, build step: the attention map Q·K^T (rows y1, y2, y3; columns x1, x2, x3) is placed next to the value X (rows x1, x2, x3), ready to be multiplied.]
28. Attention is all you Need
Basics explained
[Figure, final build step: the attention map Q·K^T (rows y1, y2, y3; columns x1, x2, x3) is multiplied by the value V = X to give the output.]
Output = (Q·K^T)·V
29. Attention is all you Need
Basics explained
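[Figure, translation example with output: the query Y holds ‘Je’, ‘suis’, ‘leo’; the key X^T and value X hold ‘I’, ‘am’, ‘Leo’. The attention map (rows ‘Je’, ‘suis’, ‘leo’; columns ‘I’, ‘am’, ‘Leo’) is multiplied by V to give the output.]
Output = (Q·K^T)·V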
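The translation example can be made concrete with a small NumPy sketch. Two things here go beyond the slide and should be read as assumptions. First, the embeddings below are hand-made so that each French token lines up with one English token; they are hypothetical and not learned from data. Second, the sketch uses the scaled, softmax-normalized form from the paper, softmax(Q·K^T / sqrt(d_k))·V, rather than the slide's simplified (Q·K^T)·V, so that each row of the attention map sums to 1.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return the normalized attention map and the attended output."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights, weights @ V

source_tokens = ["I", "am", "Leo"]      # rows of X (key and value)
target_tokens = ["Je", "suis", "leo"]   # rows of Y (query)

# Hypothetical embeddings, chosen by hand so that token i of the target
# is most similar to token i of the source. Real embeddings are learned.
X = 4.0 * np.eye(3)
Y = 4.0 * np.eye(3)

weights, output = scaled_dot_product_attention(Y, X, X)

for token, row in zip(target_tokens, weights):
    best = source_tokens[int(row.argmax())]
    print(f"{token!r} attends most to {best!r}")
```

With these hand-made embeddings each French token puts almost all of its weight on the English token in the same position, which is the pattern the attention map on the slide is meant to suggest.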