Visual transformers: Attention is all you need for computer vision

The document discusses visual transformers and attention mechanisms in computer vision. It summarizes recent work on applying transformers, originally developed for natural language processing, to vision tasks, including Vision Transformers, which treat an image as a sequence of patches and apply self-attention. It reviews key papers on attention mechanisms, the Transformer architecture, and Vision Transformers.

Visual transformers
Leo Pauly
PhD student | Visual AI
Advisors: Prof. David Hogg, Prof. Raul Fuentes
University of Leeds, UK

Sutskever et al., NeurIPS 2014
Bahdanau et al., ICLR 2015
Vaswani et al., NeurIPS 2017
Dosovitskiy et al., ICLR 2021

Attention Mechanism
y_i = RNN(y_{i-1}, c, s_{i-1})
[Figure: encoder-decoder RNN with context vector c, states s1, s2 and outputs y0, y1, y2, y3]
Bahdanau et al., ICLR 2015

Problems with a single context vector:
• Bottleneck at the context vector (c)
• Information loss
• Backpropagation issues

Attention Mechanism
Without attention: y_i = RNN(y_{i-1}, c, s_{i-1})
With attention: y_i = RNN(y_{i-1}, c_i, s_{i-1})
c_i = f(h_j), j = 1…T_x
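
The slide leaves f unspecified. In Bahdanau et al., c_i is a softmax-weighted sum of the encoder states h_j, with weights scored from the previous decoder state s_{i-1}. A minimal NumPy sketch of that step follows; the parameter names W_s, W_h, v and all dimensions are assumptions chosen for illustration, not values from the slides.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def context_vector(s_prev, H, W_s, W_h, v):
    """Compute c_i from the previous decoder state and all encoder states.

    s_prev: (d_s,) previous decoder state s_{i-1}
    H:      (T_x, d_h) encoder hidden states h_1 ... h_{T_x}
    W_s, W_h, v: hypothetical learned parameters of the additive score
    """
    # Alignment score e_ij = v^T tanh(W_s s_{i-1} + W_h h_j), one per source position j
    scores = np.tanh(s_prev @ W_s + H @ W_h) @ v   # shape (T_x,)
    alpha = softmax(scores)                         # attention weights, sum to 1
    c_i = alpha @ H                                 # weighted sum of encoder states
    return c_i, alpha

# Illustrative sizes only
rng = np.random.default_rng(0)
T_x, d_h, d_s, d_a = 5, 4, 3, 6
H = rng.normal(size=(T_x, d_h))
s_prev = rng.normal(size=d_s)
W_s = rng.normal(size=(d_s, d_a))
W_h = rng.normal(size=(d_h, d_a))
v = rng.normal(size=d_a)

c_i, alpha = context_vector(s_prev, H, W_s, W_h, v)
print("attention weights:", np.round(alpha, 3))
print("context vector shape:", c_i.shape)
```

Because a fresh c_i is computed at every decoding step, the decoder no longer depends on one fixed vector c for the whole input sequence.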

Attention Mechanism
y_i = RNN(y_{i-1}, c_i, s_{i-1})
Figure from, and more reading at: https://medium.datadriveninvestor.com/attention-in-rnns-321fbcd64f05

Attention Mechanism
[Figure: attention example, labelled with input x and output y]
Figure from: https://trungtran.io/2019/03/29/neural-machine-translation-with-attention-mechanism/

Attention is all you Need
Vaswani et al., NeurIPS 2017

Attention is all you Need
• Scaled dot-product attention
• Multi-head attention
• Self-attention
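
As a rough illustration of the second bullet, the sketch below runs scaled dot-product attention in several heads in parallel and concatenates the results. The projection matrices W_q, W_k, W_v, W_o and every dimension are hypothetical placeholders (random here, learned in a real model); biases, masking and dropout are left out.

```python
import numpy as np

def softmax_rows(z):
    # Softmax over the last axis, numerically stable
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X_q, X_kv, W_q, W_k, W_v, W_o, num_heads):
    """X_q: (n_q, d_model) query tokens; X_kv: (n_kv, d_model) key/value tokens."""
    n_q, d_model = X_q.shape
    n_kv = X_kv.shape[0]
    d_head = d_model // num_heads

    def split(M, n):
        # (n, d_model) -> (num_heads, n, d_head)
        return M.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    Q = split(X_q @ W_q, n_q)
    K = split(X_kv @ W_k, n_kv)
    V = split(X_kv @ W_v, n_kv)

    # Scaled dot-product attention, computed independently in each head
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, n_q, n_kv)
    weights = softmax_rows(scores)
    heads = weights @ V                                    # (heads, n_q, d_head)

    # Concatenate the heads and mix them with the output projection
    concat = heads.transpose(1, 0, 2).reshape(n_q, d_model)
    return concat @ W_o, weights

rng = np.random.default_rng(0)
d_model, num_heads = 8, 2
Y = rng.normal(size=(3, d_model))   # 3 query tokens
X = rng.normal(size=(5, d_model))   # 5 key/value tokens
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

out, weights = multi_head_attention(Y, X, W_q, W_k, W_v, W_o, num_heads)
print(out.shape, weights.shape)
```

Passing the same array for X_q and X_kv turns this into multi-head self-attention.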

Attention is all you Need
Basics explained
Q = Y (Query): rows y1, y2, y3
K^T = X^T (Key^T): columns x1, x2, x3
V = X (Value): rows x1, x2, x3
Attention Map = Q K^T (rows y1, y2, y3; columns x1, x2, x3)
Output = (Q K^T) V
Attention is all you Need
Basics explained (example)
Q = Y (Query): ‘Je’, ‘suis’, ‘leo’
K^T = X^T (Key^T): ‘I’, ‘am’, ‘Leo’
V = X (Value): ‘I’, ‘am’, ‘Leo’
Attention Map = Q K^T (rows ‘Je’, ‘suis’, ‘leo’; columns ‘I’, ‘am’, ‘Leo’)
Output = (Q K^T) V

Attention is all you Need
Self-attention: Query, Key and Value all come from the same input X
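
In self-attention the query, key and value are all derived from the same sequence X. The sketch below assumes hypothetical projection matrices W_q, W_k, W_v (random here, learned in practice) and also checks one consequence: without position information, permuting the input tokens simply permutes the outputs in the same way.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    # Query, key and value are three projections of the same input X
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (n, n) token-to-token scores
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
n, d = 4, 6
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

out, weights = self_attention(X, W_q, W_k, W_v)

# Shuffle the tokens: the outputs are shuffled identically
perm = np.array([2, 0, 3, 1])
out_perm, _ = self_attention(X[perm], W_q, W_k, W_v)
print("permutation-equivariant:", np.allclose(out_perm, out[perm]))
```

This order-blindness is one reason Transformer models add position embeddings to their inputs.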

Attention is all you Need
Transformer Architecture

Vision Transformers
Dosovitskiy et al., ICLR 2021

Vision Transformers
Insights
• Transformers vs CNNs: is it worth the hype? (Ref: https://youtu.be/TvVc1e_4648)
    MaaS?
    Higher resolutions?
• Can we do unsupervised or self-supervised pre-training? (Goyal et al., arXiv 2021)
• Architecture-level unification across domains: multi-modal AI systems

Vision Transformers
x, x_p = x_1 … x_N
z_0, z'_l, z_l (L times)
y
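
The symbols on these slides follow the notation of Dosovitskiy et al.: the image x is cut into N flattened patches x_p, the embedded patches plus a class token and position embeddings form z_0, each of the L encoder layers produces z'_l (after attention) and z_l (after the MLP), and y is the normalized class token from the last layer. The NumPy sketch below walks through that forward pass under simplifying assumptions: single-head attention, no biases, no learnable LayerNorm parameters, random weights, and tiny sizes picked only for illustration. The helper names (`patchify`, `init_params`, `vit_forward`) are hypothetical.

```python
import numpy as np

def layer_norm(z, eps=1e-6):
    # Normalization only; the learnable scale and shift are omitted for brevity
    mu = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

def gelu(z):
    # tanh approximation of the GELU non-linearity
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))

def self_attention(z, W_q, W_k, W_v):
    # Single-head scaled dot-product self-attention (ViT itself uses multiple heads)
    Q, K, V = z @ W_q, z @ W_k, z @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def patchify(x, P):
    """Split an (H, W, C) image into N = (H/P)*(W/P) flattened P*P*C patches."""
    H, W, C = x.shape
    x = x.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, P * P * C)

def init_params(rng, N, patch_dim, D, hidden, L):
    # Random stand-ins for learned weights
    def w(*shape):
        return rng.normal(scale=0.02, size=shape)
    layers = [{"attn": (w(D, D), w(D, D), w(D, D)),
               "W1": w(D, hidden), "W2": w(hidden, D)} for _ in range(L)]
    return {"E": w(patch_dim, D), "cls": w(1, D), "pos": w(N + 1, D), "layers": layers}

def vit_forward(x, params, P):
    x_p = patchify(x, P)                                      # x_p = x_1 ... x_N
    tokens = x_p @ params["E"]                                # linear patch embedding
    z = np.vstack([params["cls"], tokens]) + params["pos"]    # z_0
    for layer in params["layers"]:                            # repeated L times
        z_prime = self_attention(layer_norm(z), *layer["attn"]) + z   # z'_l
        mlp = gelu(layer_norm(z_prime) @ layer["W1"]) @ layer["W2"]
        z = mlp + z_prime                                     # z_l
    return layer_norm(z[0])                                   # y: class-token output

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8, 3))        # toy 8x8 RGB image
P, D = 4, 16
N = (8 // P) * (8 // P)                   # 4 patches
params = init_params(rng, N=N, patch_dim=P * P * 3, D=D, hidden=32, L=2)
y = vit_forward(image, params, P)
print("patches:", patchify(image, P).shape, "output y:", y.shape)
```

In a full model, y would be passed to a classification head to produce class scores.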

Vision Transformers
Results

Q&A