Visual transformers
Leo Pauly
PhD student | Visual AI
Advisors: Prof. David Hogg, Prof. Raul Fuentes
University of Leeds, UK
Dosovitskiy et al., ICLR 2021
Vaswani et al., NeurIPS 2017
Bahdanau et al., ICLR 2015
Sutskever et al., NeurIPS 2014
Attention Mechanism
$y_i = \mathrm{RNN}(y_{i-1}, c, s_{i-1})$
[Figure: RNN decoder with states s1, s2, a single context vector c, and outputs y0, y1, y2, y3]
Bahdanau et al., ICLR 2015
• Bottleneck at the context vector (c)
• Information loss
• Backpropagation issues
Attention Mechanism
Without attention: $y_i = \mathrm{RNN}(y_{i-1}, c, s_{i-1})$
With attention: $y_i = \mathrm{RNN}(y_{i-1}, c_i, s_{i-1})$, where $c_i = f(h_j)$, $j = 1, \dots, T_x$
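The slide leaves $f$ unspecified. As a minimal sketch, assuming the additive scoring function of Bahdanau et al., the context vector $c_i$ can be computed as a softmax-weighted sum of the encoder states $h_j$. The parameter names `W_s`, `W_h` and `v` and all dimensions below are illustrative assumptions, not taken from the slides, and the weights are random rather than learned.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def context_vector(s_prev, h, W_s, W_h, v):
    """Additive attention for a single decoder step i.

    s_prev: previous decoder state s_{i-1}, shape (d_s,)
    h:      encoder states h_1..h_Tx, shape (Tx, d_h)
    W_s, W_h, v: hypothetical parameters (random here, learned in practice)
    """
    # One alignment score per source position j.
    scores = np.tanh(s_prev @ W_s + h @ W_h) @ v   # shape (Tx,)
    alpha = softmax(scores)                        # attention weights, sum to 1
    c_i = alpha @ h                                # weighted sum of h_j, shape (d_h,)
    return c_i, alpha

rng = np.random.default_rng(0)
Tx, d_h, d_s, d_a = 4, 6, 5, 8                     # illustrative sizes
h = rng.normal(size=(Tx, d_h))
s_prev = rng.normal(size=d_s)
W_s = rng.normal(size=(d_s, d_a))
W_h = rng.normal(size=(d_h, d_a))
v = rng.normal(size=d_a)

c_i, alpha = context_vector(s_prev, h, W_s, W_h, v)
print(alpha.shape, c_i.shape)
```

Because a new `c_i` is computed at every decoder step, the decoder no longer depends on one fixed vector `c`, which is the bottleneck named in the bullets above.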
Figure from: https://medium.datadriveninvestor.com/attention-in-rnns-321fbcd64f05
More reading: https://medium.datadriveninvestor.com/attention-in-rnns-321fbcd64f05
Attention Mechanism
Figure from: https://trungtran.io/2019/03/29/neural-machine-translation-with-attention-mechanism/
Attention is all you Need
Vaswani et al., NeurIPS 2017
• Scaled dot product attention
• Multi-headed attention
• Self attention
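For reference, the first two bullets correspond to the following definitions from Vaswani et al.; the symbols $d_k$ (key dimension), $h$ (number of heads) and the projection matrices $W_i^Q$, $W_i^K$, $W_i^V$, $W^O$ come from that paper, not from these slides. Self attention is the special case in which $Q$, $K$ and $V$ are all derived from the same input sequence.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O},
\qquad
\mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V})
```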
Attention is all you Need
Basics explained
Y (Query): y1, y2, y3
X (Value): x1, x2, x3
$X^\top$ (Key$^\top$): x1 x2 x3
Attention Map $= Q K^\top$ (rows y1, y2, y3; columns x1, x2, x3)
Output $= (Q K^\top)\, V$
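The Output = (Q.KT).V picture above can be sketched in a few lines of NumPy. The slide omits the softmax and the $1/\sqrt{d_k}$ scaling used in Vaswani et al.; this sketch includes both, with scaling behind a flag. The function name `attention`, the random inputs and the feature size 4 are illustrative assumptions.

```python
import numpy as np

def attention(Q, K, V, scale=True):
    """Dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T                                 # one row per query, one column per key
    if scale:
        scores = scores / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # the attention map
    return weights @ V, weights

rng = np.random.default_rng(0)
Y = rng.normal(size=(3, 4))   # queries y1, y2, y3
X = rng.normal(size=(3, 4))   # x1, x2, x3, used as both keys and values

output, attn_map = attention(Q=Y, K=X, V=X)
print(attn_map.shape, output.shape)
```

Each row of `attn_map` says how much one query `y_i` draws from each `x_j`; the output row for `y_i` is the corresponding weighted mix of the values.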
Basics explained: example
Y (Query): ‘Je’, ‘suis’, ‘leo’
X (Value), $X^\top$ (Key$^\top$): ‘I’, ‘am’, ‘Leo’
Attention Map $= Q K^\top$ (rows ‘Je’, ‘suis’, ‘leo’; columns ‘I’, ‘am’, ‘Leo’)
Attention is all you Need
Self attention: Query, Key and Value all come from the same input X
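As a rough sketch of how self attention combines with multiple heads, the example below projects one input `X` into queries, keys and values, splits them into heads, and applies scaled dot-product attention per head. The function name, the weight matrices `W_q`, `W_k`, `W_v`, `W_o` and all sizes are assumptions chosen for illustration; the weights are random, not trained.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """Multi-head self attention over a sequence X of shape (n, d_model)."""
    n, d_model = X.shape
    d_head = d_model // num_heads

    # Self attention: queries, keys and values are projections of the same X.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v

    def split(M):
        # (n, d_model) -> (num_heads, n, d_head)
        return M.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    Qh, Kh, Vh = split(Q), split(K), split(V)
    weights = softmax(Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head))  # (heads, n, n)
    heads = weights @ Vh                                             # (heads, n, d_head)

    # Concatenate the heads again and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)
    return concat @ W_o, weights

rng = np.random.default_rng(0)
n, d_model, num_heads = 3, 8, 2                    # illustrative sizes
X = rng.normal(size=(n, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

out, weights = multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads)
print(out.shape, weights.shape)
```

Each head produces its own n-by-n attention map over the same sequence, so different heads are free to attend to different positions.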
Attention is all you Need
Transformer Architecture
Vision Transformers
Dosovitskiy et al., ICLR 2021
The input image $x$ is split into a sequence of $N$ patches: $x_p = x_1, \dots, x_N$
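The patch-splitting step can be illustrated with a short NumPy sketch. It assumes an image stored as height x width x channels and a square patch size that divides both sides evenly; the function name `image_to_patches` and the 32x32 toy image are hypothetical. The learned linear projection and position embeddings that follow in the paper are not shown.

```python
import numpy as np

def image_to_patches(x, patch_size):
    """Split an (H, W, C) image into N flattened, non-overlapping patches.

    Returns an array of shape (N, P*P*C) with N = (H/P) * (W/P).
    """
    H, W, C = x.shape
    P = patch_size
    assert H % P == 0 and W % P == 0, "patch size must divide the image size"
    # (H/P, P, W/P, P, C) -> (H/P, W/P, P, P, C) -> (N, P*P*C)
    patches = x.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, P * P * C)

# Toy 32x32 RGB image with 16x16 patches, giving N = 4 patches.
x = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
x_p = image_to_patches(x, patch_size=16)
print(x_p.shape)
```

Patches come out in row-major order (left to right, then top to bottom), so `x_p[0]` is the top-left patch and `x_p[-1]` the bottom-right one.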
Transformer encoder: $z_0 \rightarrow z'_l \rightarrow z_l$, repeated $L$ times
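The symbols on this slide match the encoder equations in Dosovitskiy et al., reproduced here for reference. $E$ (patch projection), $E_{pos}$ (position embeddings), $x_{class}$ (class token), MSA (multi-head self attention), MLP and LN (layer norm) are that paper's notation rather than anything defined in the slides.

```latex
z_0 = [\,x_{class};\; x_p^{1} E;\; x_p^{2} E;\; \dots;\; x_p^{N} E\,] + E_{pos}

z'_l = \mathrm{MSA}(\mathrm{LN}(z_{l-1})) + z_{l-1}, \qquad l = 1, \dots, L

z_l = \mathrm{MLP}(\mathrm{LN}(z'_l)) + z'_l, \qquad l = 1, \dots, L

y = \mathrm{LN}(z_L^{0})
```

Here $z_L^{0}$ is the class-token position of the final layer's output, which is what the classification head reads.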
Output: $y$
Vision Transformers
Results
Insights
• Transformers vs CNNs: is it worth the hype?
  MaaS?
  Higher resolutions?
Ref: https://youtu.be/TvVc1e_4648
• Can we do unsupervised or self-supervised pre-training? (Goyal et al., arXiv 2021)
• Architecture-level unification across domains: multi-modal AI systems
Q !

Visual transformers: Attention is all you need for computer vision