SlideShare a Scribd company logo
1 of 31
Download to read offline
[course site]
Attention Models
Day 4 Lecture 6
Amaia Salvador
Attention Models: Motivation
Image:
H x W x 3
bird
The whole input volume is used to predict the output...
2
Attention Models: Motivation
Image:
H x W x 3
bird
The whole input volume is used to predict the output...
...despite the fact that not all pixels are equally important
3
Attention Models: Motivation
Attention models can
relieve computational burden
Helpful when processing big
images !
4
Attention Models: Motivation
Attention models can
relieve computational burden
Helpful when processing big
images !
5
bird
Encoder & Decoder
6
Kyunghyun Cho, “Introduction to Neural
Machine Translation with GPUs” (2015)
From previous lecture...
The whole input sentence
is used to produce the translation
Attention Models
7
Bahdanau et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
Attention Models
A bird flying over a body of water
Idea: Focus in different parts of the input as you make/refine predictions in time
E.g.: Image Captioning
8
LSTM Decoder
LSTMLSTM LSTM
CNN LSTM
A bird flying
...
<EOS>
The LSTM decoder “sees” the input only at the beginning !
Features:
D
9
...
Attention for Image Captioning
CNN
Image:
H x W x 3
Features:
L x D
Slide Credit: CS231n 10
Attention for Image Captioning
CNN
Image:
H x W x 3
Features:
L x D
h0
a1
Slide Credit: CS231n
Attention
weights (LxD)
11
Attention for Image Captioning
Slide Credit: CS231n
CNN
Image:
H x W x 3
Features:
L x D
h0
a1
z1
Weighted
combination
of features
y1
h1
First word
Attention
weights (LxD)
a2 y2
Weighted
features: D
predicted
word
12
Attention for Image Captioning
CNN
Image:
H x W x 3
Features:
L x D
h0
a1
z1
Weighted
combination
of features
y1
h1
First word
a2 y2
h2
a3 y3
z2 y2
Weighted
features: D
predicted
word
Attention
weights (LxD)
Slide Credit: CS231n 13
Attention for Image Captioning
Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015
14
Attention for Image Captioning
Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015
15
Attention for Image Captioning
Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015
16
Soft Attention
Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015
CNN
Image:
H x W x 3
Grid of features
(Each
D-dimensional)
a b
c d
pa
pb
pc
pd
Distribution over
grid locations
pa
+ pb
+ pc
+ pc
= 1
Soft attention:
Summarize ALL locations
z = pa
a+ pb
b + pc
c + pd
d
Derivative dz/dp is nice!
Train with gradient descent
Context vector
z
(D-dimensional)
From
RNN:
Slide Credit: CS231n 17
Soft Attention
Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015
CNN
Image:
H x W x 3
Grid of features
(Each
D-dimensional)
a b
c d
pa
pb
pc
pd
Distribution over
grid locations
pa
+ pb
+ pc
+ pc
= 1
Soft attention:
Summarize ALL locations
z = pa
a+ pb
b + pc
c + pd
d
Differentiable function
Train with gradient descent
Context vector
z
(D-dimensional)
From
RNN:
Slide Credit: CS231n
● Still uses the whole input !
● Constrained to fix grid
18
Hard attention
Input image:
H x W x 3
Box Coordinates:
(xc, yc, w, h)
Cropped and
rescaled image:
X x Y x 3
Not a differentiable function !
Can’t train with backprop :(
19
Hard attention:
Sample a subset
of the input
need reinforcement learning
Gradient is 0 almost everywhere
Gradient is undefined at x = 0
Hard attention
Gregor et al. DRAW: A Recurrent Neural Network For Image Generation. ICML 2015
Generate images by attending to
arbitrary regions of the output
Classify images by attending to
arbitrary regions of the input
20
Hard attention
Gregor et al. DRAW: A Recurrent Neural Network For Image Generation. ICML 2015 21
Hard attention
22Graves. Generating Sequences with Recurrent Neural Networks. arXiv 2013
Read text, generate handwriting using an RNN that attends at
different arbitrary regions over time
GENERATED
REAL
Hard attention
Input image:
H x W x 3
Box Coordinates:
(xc, yc, w, h)
Cropped and
rescaled image:
X x Y x 3
CNN
bird
Not a differentiable function !
Can’t train with backprop :( 23
Spatial Transformer Networks
Input image:
H x W x 3
Box Coordinates:
(xc, yc, w, h)
Cropped and
rescaled image:
X x Y x 3
CNN
bird
Jaderberg et al. Spatial Transformer Networks. NIPS 2015
Not a differentiable function !
Can’t train with backprop :(
Make it differentiable
Train with backprop :) 24
Spatial Transformer Networks
Jaderberg et al. Spatial Transformer Networks. NIPS 2015
Input image:
H x W x 3
Box Coordinates:
(xc, yc, w, h)
Cropped and
rescaled image:
X x Y x 3
Can we make this
function differentiable?
Idea: Function mapping
pixel coordinates (xt, yt) of
output to pixel coordinates
(xs, ys) of input
Slide Credit: CS231n
Repeat for all pixels
in output to get a
sampling grid
Then use bilinear
interpolation to
compute output
Network
attends to
input by
predicting
25
Spatial Transformer Networks
Jaderberg et al. Spatial Transformer Networks. NIPS 2015
Easy to incorporate in any network, anywhere !
Differentiable module
Insert spatial transformers into a
classification network and it learns
to attend and transform the input
26
Spatial Transformer Networks
Jaderberg et al. Spatial Transformer Networks. NIPS 2015
27
Fine-grained classification
Visual Attention
Zhu et al. Visual7w: Grounded Question Answering in Images. arXiv 2016
Visual Question Answering
28
Visual Attention
Sharma et al. Action Recognition Using Visual Attention. arXiv 2016
Kuen et al. Recurrent Attentional Networks for Saliency Detection. CVPR 2016
Salient Object DetectionAction Recognition in Videos
29
Other examples
Chen et al. Attention to Scale: Scale-aware Semantic Image Segmentation. CVPR 2016
You et al. Image Captioning with Semantic Attention. CVPR 2016
Attention to scale for
semantic segmentation
Semantic attention
For image captioning
30
Resources
● CS231n Lecture @ Stanford [slides][video]
● More on Reinforcement Learning
● Soft vs Hard attention
● Handwriting generation demo
● Spatial Transformer Networks - Slides & Video by Victor Campos
● Attention implementations:
○ Seq2seq in Keras
○ DRAW & Spatial Transformers in Keras
○ DRAW in Lasagne
○ DRAW in Tensorflow
31

More Related Content

What's hot

[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You NeedDaiki Tanaka
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual IntroductionLukas Masuch
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Basit Rafiq
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningMohamed Loey
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative ModelsMLReview
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNNShuai Zhang
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkFerdous ahmed
 
Introduction to Interpretable Machine Learning
Introduction to Interpretable Machine LearningIntroduction to Interpretable Machine Learning
Introduction to Interpretable Machine LearningNguyen Giang
 
Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10) Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10) Larry Guo
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networksYunjey Choi
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetSungminYou
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learningJörgen Sandig
 
An introduction to Deep Learning
An introduction to Deep LearningAn introduction to Deep Learning
An introduction to Deep LearningJulien SIMON
 
Transfer learning-presentation
Transfer learning-presentationTransfer learning-presentation
Transfer learning-presentationBushra Jbawi
 
Attention in Deep Learning
Attention in Deep LearningAttention in Deep Learning
Attention in Deep Learning健程 杨
 
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AIIntro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AISeth Grimes
 
Introduction to Generative Adversarial Networks (GAN) with Apache MXNet
Introduction to Generative Adversarial Networks (GAN) with Apache MXNetIntroduction to Generative Adversarial Networks (GAN) with Apache MXNet
Introduction to Generative Adversarial Networks (GAN) with Apache MXNetAmazon Web Services
 
Introduction to Graph neural networks @ Vienna Deep Learning meetup
Introduction to Graph neural networks @  Vienna Deep Learning meetupIntroduction to Graph neural networks @  Vienna Deep Learning meetup
Introduction to Graph neural networks @ Vienna Deep Learning meetupLiad Magen
 

What's hot (20)

1.Introduction to deep learning
1.Introduction to deep learning1.Introduction to deep learning
1.Introduction to deep learning
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You Need
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative Models
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Introduction to Interpretable Machine Learning
Introduction to Interpretable Machine LearningIntroduction to Interpretable Machine Learning
Introduction to Interpretable Machine Learning
 
Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10) Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10)
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learning
 
An introduction to Deep Learning
An introduction to Deep LearningAn introduction to Deep Learning
An introduction to Deep Learning
 
Transfer learning-presentation
Transfer learning-presentationTransfer learning-presentation
Transfer learning-presentation
 
Attention in Deep Learning
Attention in Deep LearningAttention in Deep Learning
Attention in Deep Learning
 
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AIIntro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
 
Introduction to Generative Adversarial Networks (GAN) with Apache MXNet
Introduction to Generative Adversarial Networks (GAN) with Apache MXNetIntroduction to Generative Adversarial Networks (GAN) with Apache MXNet
Introduction to Generative Adversarial Networks (GAN) with Apache MXNet
 
Introduction to Graph neural networks @ Vienna Deep Learning meetup
Introduction to Graph neural networks @  Vienna Deep Learning meetupIntroduction to Graph neural networks @  Vienna Deep Learning meetup
Introduction to Graph neural networks @ Vienna Deep Learning meetup
 

Similar to Attention Models Day 4 Lecture

CS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and CullingCS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and CullingMark Kilgard
 
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)Pierre Schaus
 
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Making BIG DATA smaller
Making BIG DATA smallerMaking BIG DATA smaller
Making BIG DATA smallerTony Tran
 
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineFast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineSoma Boubou
 
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationGeoffrey Fox
 
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...Universitat Politècnica de Catalunya
 
Language Language Models (in 2023) - OpenAI
Language Language Models (in 2023) - OpenAILanguage Language Models (in 2023) - OpenAI
Language Language Models (in 2023) - OpenAISamuelButler15
 
f37-book-intarch-pres-pt1.ppt
f37-book-intarch-pres-pt1.pptf37-book-intarch-pres-pt1.ppt
f37-book-intarch-pres-pt1.pptTimothy Paul
 
CD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdf
CD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdfCD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdf
CD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdfRajJain516913
 
convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningssusere5ddd6
 
Machine learning for document analysis and understanding
Machine learning for document analysis and understandingMachine learning for document analysis and understanding
Machine learning for document analysis and understandingSeiichi Uchida
 
f37-book-intarch-pres-pt1.ppt
f37-book-intarch-pres-pt1.pptf37-book-intarch-pres-pt1.ppt
f37-book-intarch-pres-pt1.pptrickjones250264
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningCastLabKAIST
 

Similar to Attention Models Day 4 Lecture (20)

Backpropagation for Deep Learning
Backpropagation for Deep LearningBackpropagation for Deep Learning
Backpropagation for Deep Learning
 
CS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and CullingCS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and Culling
 
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
 
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
 
Log polar coordinates
Log polar coordinatesLog polar coordinates
Log polar coordinates
 
Making BIG DATA smaller
Making BIG DATA smallerMaking BIG DATA smaller
Making BIG DATA smaller
 
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineFast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
 
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
 
Language Language Models (in 2023) - OpenAI
Language Language Models (in 2023) - OpenAILanguage Language Models (in 2023) - OpenAI
Language Language Models (in 2023) - OpenAI
 
f37-book-intarch-pres-pt1.ppt
f37-book-intarch-pres-pt1.pptf37-book-intarch-pres-pt1.ppt
f37-book-intarch-pres-pt1.ppt
 
f37-book-intarch-pres-pt1.ppt
f37-book-intarch-pres-pt1.pptf37-book-intarch-pres-pt1.ppt
f37-book-intarch-pres-pt1.ppt
 
CD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdf
CD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdfCD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdf
CD504 CGM_Lab Manual_004e08d3838702ed11fc6d03cc82f7be.pdf
 
convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learning
 
Machine learning for document analysis and understanding
Machine learning for document analysis and understandingMachine learning for document analysis and understanding
Machine learning for document analysis and understanding
 
f37-book-intarch-pres-pt1.ppt
f37-book-intarch-pres-pt1.pptf37-book-intarch-pres-pt1.ppt
f37-book-intarch-pres-pt1.ppt
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 

More from Universitat Politècnica de Catalunya

The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 

More from Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 

Recently uploaded

Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 

Recently uploaded (20)

Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 

Attention Models Day 4 Lecture

  • 1. [course site] Attention Models Day 4 Lecture 6 Amaia Salvador
  • 2. Attention Models: Motivation Image: H x W x 3 bird The whole input volume is used to predict the output... 2
  • 3. Attention Models: Motivation Image: H x W x 3 bird The whole input volume is used to predict the output... ...despite the fact that not all pixels are equally important 3
  • 4. Attention Models: Motivation Attention models can relieve computational burden Helpful when processing big images ! 4
  • 5. Attention Models: Motivation Attention models can relieve computational burden Helpful when processing big images ! 5 bird
  • 6. Encoder & Decoder 6 Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015) From previous lecture... The whole input sentence is used to produce the translation
  • 7. Attention Models 7 Bahdanau et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015 Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
  • 8. Attention Models A bird flying over a body of water Idea: Focus in different parts of the input as you make/refine predictions in time E.g.: Image Captioning 8
  • 9. LSTM Decoder LSTMLSTM LSTM CNN LSTM A bird flying ... <EOS> The LSTM decoder “sees” the input only at the beginning ! Features: D 9 ...
  • 10. Attention for Image Captioning CNN Image: H x W x 3 Features: L x D Slide Credit: CS231n 10
  • 11. Attention for Image Captioning CNN Image: H x W x 3 Features: L x D h0 a1 Slide Credit: CS231n Attention weights (LxD) 11
  • 12. Attention for Image Captioning Slide Credit: CS231n CNN Image: H x W x 3 Features: L x D h0 a1 z1 Weighted combination of features y1 h1 First word Attention weights (LxD) a2 y2 Weighted features: D predicted word 12
  • 13. Attention for Image Captioning CNN Image: H x W x 3 Features: L x D h0 a1 z1 Weighted combination of features y1 h1 First word a2 y2 h2 a3 y3 z2 y2 Weighted features: D predicted word Attention weights (LxD) Slide Credit: CS231n 13
  • 14. Attention for Image Captioning Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015 14
  • 15. Attention for Image Captioning Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015 15
  • 16. Attention for Image Captioning Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015 16
  • 17. Soft Attention Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015 CNN Image: H x W x 3 Grid of features (Each D-dimensional) a b c d pa pb pc pd Distribution over grid locations pa + pb + pc + pc = 1 Soft attention: Summarize ALL locations z = pa a+ pb b + pc c + pd d Derivative dz/dp is nice! Train with gradient descent Context vector z (D-dimensional) From RNN: Slide Credit: CS231n 17
  • 18. Soft Attention Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015 CNN Image: H x W x 3 Grid of features (Each D-dimensional) a b c d pa pb pc pd Distribution over grid locations pa + pb + pc + pc = 1 Soft attention: Summarize ALL locations z = pa a+ pb b + pc c + pd d Differentiable function Train with gradient descent Context vector z (D-dimensional) From RNN: Slide Credit: CS231n ● Still uses the whole input ! ● Constrained to fix grid 18
  • 19. Hard attention Input image: H x W x 3 Box Coordinates: (xc, yc, w, h) Cropped and rescaled image: X x Y x 3 Not a differentiable function ! Can’t train with backprop :( 19 Hard attention: Sample a subset of the input need reinforcement learning Gradient is 0 almost everywhere Gradient is undefined at x = 0
  • 20. Hard attention Gregor et al. DRAW: A Recurrent Neural Network For Image Generation. ICML 2015 Generate images by attending to arbitrary regions of the output Classify images by attending to arbitrary regions of the input 20
  • 21. Hard attention Gregor et al. DRAW: A Recurrent Neural Network For Image Generation. ICML 2015 21
  • 22. Hard attention 22Graves. Generating Sequences with Recurrent Neural Networks. arXiv 2013 Read text, generate handwriting using an RNN that attends at different arbitrary regions over time GENERATED REAL
  • 23. Hard attention Input image: H x W x 3 Box Coordinates: (xc, yc, w, h) Cropped and rescaled image: X x Y x 3 CNN bird Not a differentiable function ! Can’t train with backprop :( 23
  • 24. Spatial Transformer Networks Input image: H x W x 3 Box Coordinates: (xc, yc, w, h) Cropped and rescaled image: X x Y x 3 CNN bird Jaderberg et al. Spatial Transformer Networks. NIPS 2015 Not a differentiable function ! Can’t train with backprop :( Make it differentiable Train with backprop :) 24
  • 25. Spatial Transformer Networks Jaderberg et al. Spatial Transformer Networks. NIPS 2015 Input image: H x W x 3 Box Coordinates: (xc, yc, w, h) Cropped and rescaled image: X x Y x 3 Can we make this function differentiable? Idea: Function mapping pixel coordinates (xt, yt) of output to pixel coordinates (xs, ys) of input Slide Credit: CS231n Repeat for all pixels in output to get a sampling grid Then use bilinear interpolation to compute output Network attends to input by predicting 25
  • 26. Spatial Transformer Networks Jaderberg et al. Spatial Transformer Networks. NIPS 2015 Easy to incorporate in any network, anywhere ! Differentiable module Insert spatial transformers into a classification network and it learns to attend and transform the input 26
  • 27. Spatial Transformer Networks Jaderberg et al. Spatial Transformer Networks. NIPS 2015 27 Fine-grained classification
  • 28. Visual Attention Zhu et al. Visual7w: Grounded Question Answering in Images. arXiv 2016 Visual Question Answering 28
  • 29. Visual Attention Sharma et al. Action Recognition Using Visual Attention. arXiv 2016 Kuen et al. Recurrent Attentional Networks for Saliency Detection. CVPR 2016 Salient Object DetectionAction Recognition in Videos 29
  • 30. Other examples Chen et al. Attention to Scale: Scale-aware Semantic Image Segmentation. CVPR 2016 You et al. Image Captioning with Semantic Attention. CVPR 2016 Attention to scale for semantic segmentation Semantic attention For image captioning 30
  • 31. Resources ● CS231n Lecture @ Stanford [slides][video] ● More on Reinforcement Learning ● Soft vs Hard attention ● Handwriting generation demo ● Spatial Transformer Networks - Slides & Video by Victor Campos ● Attention implementations: ○ Seq2seq in Keras ○ DRAW & Spatial Transformers in Keras ○ DRAW in Lasagne ○ DRAW in Tensorflow 31