The Importance of Time in
Visual Attention Models
Author: Marta Coll Pol
Advisors: Kevin McGuinness and Xavier Giró-i-Nieto
2
INTRODUCTION
Visual Attention Models
● What is visual attention?
○ A visual system mechanism that allows humans to selectively process visual information.
● What is a visual attention model?
○ A computational model that predicts which image regions attract human attention.
Source: https://algorithmia.com/algorithms/deeplearning/SalNet/docs
3
Saliency Information Representations
4
Motivation
Consistency in fixation selection between participants as
a function of fixation number [1].
5
[1] Benjamin W. Tatler, Roland J. Baddeley, and Iain D. Gilchrist. Visual correlates of fixation selection: Effects of scale and time. Vision Research, 45(5):643-659, 2005.
Motivation
● Hypothesis I: Visual attention models have difficulties predicting later fixation points.
● Hypothesis II: Giving less weight to later fixations in the ground-truth maps used for training could improve visual attention models’ performance.
6
7
DATASETS OF
TEMPORALLY
SORTED FIXATIONS
Datasets
● iSUN:
○ Data collection: eye-tracking [2].
● SALICON:
○ Data collection: mouse-tracking [3].
8
Saliency ground-truth data for one observer in a given image:
[2] Yinda Zhang, Fisher Yu, Shuran Song, Pingmei Xu, Ari Seff, and Jianxiong Xiao. Large-scale scene understanding challenge: Eye tracking saliency estimation.
[3] Ming Jiang, Shengsheng Huang, Juanyong Duan, and Qi Zhao. Salicon: Saliency in context. In Computer Vision and Pattern Recognition
(CVPR), 2015 IEEE Conference on, pages 1072-1080. IEEE, 2015.
Fixation points in order of visualization
iSUN:
● Locations to fixations: mean-shift clustering.
● We used timestamps to retrieve the fixation points in order of visualization.
9
Mean-shift applied to one observer's gaze locations for a given image. Colored dots are locations assigned to different clusters; red crosses are the cluster means (centroids), i.e. the fixation points.
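The clustering and ordering step above can be sketched as follows. This is an illustrative numpy implementation, not the thesis code; the bandwidth and the way timestamps are attached to clusters are assumptions.

```python
import numpy as np

def mean_shift_fixations(locations, timestamps, bandwidth=30.0, n_iter=50):
    """Cluster raw gaze locations into fixation points (illustrative sketch).

    locations:  (N, 2) gaze samples for one observer and image.
    timestamps: (N,) sample times, used to order the resulting fixations.
    """
    modes = locations.astype(float).copy()
    for _ in range(n_iter):
        for i in range(len(modes)):
            d = np.linalg.norm(locations - modes[i], axis=1)
            neigh = locations[d < bandwidth]
            if len(neigh):
                modes[i] = neigh.mean(axis=0)  # shift toward the local mean
    # merge converged modes into centroids, remembering each cluster's
    # earliest timestamp so fixations can be sorted by visualization order
    centroids, first_t = [], []
    for i, m in enumerate(modes):
        for j, c in enumerate(centroids):
            if np.linalg.norm(m - c) < bandwidth:
                first_t[j] = min(first_t[j], timestamps[i])
                break
        else:
            centroids.append(m)
            first_t.append(timestamps[i])
    order = np.argsort(first_t)  # earliest-seen cluster first
    return np.array(centroids)[order]
```

The centroid of each cluster is the fixation point; sorting by the earliest timestamp in each cluster recovers the order of visualization.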
Fixation points in order of visualization
SALICON:
● Locations to fixations: samples with high mouse-movement velocity are excluded, keeping the fixation points.
● We checked whether fixation points are stored in order of visualization in the dataset.
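A velocity filter of this kind can be sketched as below; the threshold is an illustrative value, not the one used to build SALICON.

```python
import numpy as np

def drop_fast_samples(xy, t, v_max=500.0):
    """Keep mouse samples whose instantaneous velocity stays below v_max px/s.

    xy: (N, 2) mouse positions; t: (N,) timestamps in seconds.
    v_max is an assumed threshold for illustration.
    """
    # velocity between consecutive samples (pixels / second)
    v = np.linalg.norm(np.diff(xy, axis=0), axis=1) / np.diff(t)
    keep = np.concatenate([[True], v < v_max])  # first sample has no velocity
    return xy[keep], t[keep]
```

Samples reached during a fast mouse movement are dropped, while slow-moving samples (the fixations) survive.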
10
Fixation points in order of visualization : SALICON
11
(Two example images with their fixation points labeled 1-8 in order of visualization.)
12
TEMPORALLY
WEIGHTED
SALIENCY
Temporally Weighted Saliency Maps (TWSM)
● Weighting as a function of visualization order.
● Weighting function inspired by the consistency graph [1].
Weighting function:
13
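The slide shows the weighting function as a figure. As an illustrative sketch, assume an exponential decay over fixation order; the actual function in the thesis is fit to the consistency graph and may differ.

```python
import numpy as np

def temporally_weighted_map(fixations, shape, tau=3.0, sigma=19.0):
    """Build a saliency map in which later fixations count less.

    fixations: (x, y) points in order of visualization.
    w_k = exp(-k / tau) is an assumed stand-in for the thesis'
    weighting function; sigma is an assumed Gaussian blur width.
    """
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    sal = np.zeros(shape)
    for k, (x, y) in enumerate(fixations):
        w = np.exp(-k / tau)  # earlier fixations get larger weights
        sal += w * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return sal / sal.max()
```

With this construction the map value is highest around the first fixation, so a wSM encodes not only where observers looked but also when.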
TWSM : Choosing a weighting function parameter
14
Consistency in fixation selection between participants as a function of
fixation number
Saliency Prediction Metrics
● Area Under ROC Curve (AUC) Judd
● Kullback-Leibler Divergence (KLdiv)
● Pearson correlation coefficient (CC)
15
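Two of these metrics can be sketched directly. These are common definitions for saliency maps; the exact benchmark implementations (e.g. epsilon handling) may differ in detail.

```python
import numpy as np

def kldiv(pred, gt, eps=1e-8):
    """KL divergence between a ground-truth map and a prediction (lower is better)."""
    p = pred / pred.sum()  # treat both maps as probability distributions
    q = gt / gt.sum()
    return float(np.sum(q * np.log(eps + q / (p + eps))))

def cc(pred, gt):
    """Pearson correlation coefficient between the two maps (higher is better)."""
    p = (pred - pred.mean()) / pred.std()
    q = (gt - gt.mean()) / gt.std()
    return float((p * q).mean())
```

A perfect prediction gives KLdiv close to 0 and CC equal to 1.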
Model: MLNet
Architecture:
CNN + encoding network (produces an intermediate saliency map) + prior-learning network = final saliency map
16
[2] Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, and Rita Cucchiara. A deep multi-level network for saliency prediction. ICPR 2016.
Model: Different MLNet versions
17
Model Training Temporal Weighting
MLNet Cornia et al. [2] ❌ (nSM)
nMLNet (baseline) Ours ❌ (nSM)
wMLNet Ours ✅ (wSM)
[2] Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, and Rita Cucchiara. A deep multi-level network for saliency prediction. ICPR 2016.
Model: Replicating MLNet Results (nMLNet)
18
SALICON 2015 (Validation set) AUC Judd
MLNet 0.886
nMLNet (baseline) 0.814
SALIENCY PREDICTION
Experiments
19
Experiment: Do visual attention models have difficulties predicting later
fixations?
Hypothesis I: Visual attention models have difficulties predicting later
fixations.
● Evaluate MLNet’s predicted maps for the iSUN training set using two types
of ground-truth: nSMs and wSMs.
20
Results: Do visual attention models have difficulties predicting later fixations?
Maps that scored substantially better when evaluated using wSMs:
● Metric: KLdiv.
21
Results: Do visual attention models have difficulties predicting later fixations?
22
Results: Do visual attention models have difficulties predicting later fixations?
Maps that scored substantially better when evaluated using nSMs:
● Metric: KLdiv.
● Scores are generally poor for both map types, so we could not draw further conclusions.
23
Experiment: Study of the effect of using Weighted Saliency Maps on a visual attention model's performance
Hypothesis II: Giving less weight to later fixations in the ground-truth maps used for training can improve a model's performance.
● We trained nMLNet and wMLNet on the SALICON dataset and evaluated their performance.
24
Results: Study of the effect of using Weighted Saliency Maps on a visual attention model's performance
AUC Judd
Ground-truth nMLNet wMLNet
nSM 0.814 0.816

Kullback-Leibler Divergence (KLdiv)
Ground-truth nMLNet wMLNet
nSM 1.332 1.039
wSM 1.493 1.136

Pearson's Correlation Coefficient (CC)
Ground-truth nMLNet wMLNet
nSM 0.534 0.539
wSM 0.509 0.517
25
VISUAL SEARCH
Experiments
26
SalBoW
● SalBoW is a retrieval framework that uses saliency maps to weight the contribution of local convolutional representations for the instance search task.
● The model is divided into two parts:
○ CNN + k-means = assignment map (visual vocabulary)
○ Saliency prediction model
● Model’s output: after weighting, a Bag of Words is built to form the final image representation.
27
SalBoW pipeline with saliency weighting [3].
[3] Eva Mohedano, Kevin McGuinness, Xavier Giro-i Nieto, and Noel E O'Connor. Saliency weighted convolutional features for instance
search. arXiv preprint arXiv:1711.10795, 2017.
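The saliency-weighted Bag of Words step can be sketched as follows; the shapes and function name are illustrative assumptions, not the SalBoW code.

```python
import numpy as np

def salbow_descriptor(assignments, saliency, n_words):
    """Saliency-weighted Bag of Words over an assignment map (sketch).

    assignments: (H, W) visual-word index of each local CNN feature.
    saliency:    (H, W) predicted saliency map, weighting each feature's vote.
    """
    hist = np.zeros(n_words)
    # each spatial location votes for its visual word, scaled by saliency
    np.add.at(hist, assignments.ravel(), saliency.ravel())
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist  # L2-normalized image descriptor
```

Features in salient regions thus contribute more to the final image representation, which is where swapping nSMs for wSMs can change retrieval results.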
Results
28
Mean Average Precision (mAP)
Saliency Maps type mAP
nSM 0.6730
wSM 0.6743
29
CONCLUSIONS
Conclusions
30
Experiment: Do visual attention models have difficulties predicting
later fixations?
Conclusions
31
Experiment: Study of the effect of using Weighted Saliency Maps on a visual attention model's performance.
AUC Judd
Ground-truth nMLNet wMLNet
nSM 0.814 0.816

Kullback-Leibler Divergence (KLdiv)
Ground-truth nMLNet wMLNet
nSM 1.332 1.039
wSM 1.493 1.136

Pearson's Correlation Coefficient (CC)
Ground-truth nMLNet wMLNet
nSM 0.534 0.539
wSM 0.509 0.517
Conclusions
32
Mean Average Precision (mAP)
Saliency Maps type mAP
nSM 0.6730
wSM 0.6743
Experiment: Visual search task.
Conclusions : Future Work
New evaluation metric:
● Take into account what visual attention models can realistically predict.
Ways of improving Temporally Weighted Saliency Maps:
● Weighting as a function of both order and the time spent on each fixation point.
● Identifying the fixation points that are common across observers for each specific image.
33
Project’s Code
● The code for this project can be found at: https://github.com/imatge-upc/saliency-2018-timeweight
● The model and all third-party code used in this project are open-source.
34
Any Questions?
35
Quick Introduction to Deep Learning
Neural Networks:
● Forward propagation
● Loss function: a metric that evaluates the model’s performance.
● Back propagation
● The goal is to minimize the loss function.
Source: http://cs231n.github.io/neural-networks-1/
Three-layer neural network (two hidden layers of four neurons
each and a single output), with three inputs.
36
Convolutional Neural Networks
Convolutional Neural Network (CNN) architectures receive a multi-channel image as input and are usually structured as a set of convolutional layers, each followed by a non-linear operation (usually ReLU) and sometimes by a pooling layer (usually max-pooling). Fully connected layers are typically used at the end of the network.
37
Source: http://www.mdpi.com/1099-4300/19/6/242
Convolutional Neural Networks
Source: http://cs231n.github.io/convolutional-networks/
38
39
BUDGET
Budget
40
● This thesis used one GPU with 11 GB of GDDR SDRAM (Graphics Double Data Rate Synchronous Dynamic RAM) and about 30 GB of regular RAM. The most similar resources on Amazon Web Services are the EC2 p2.xlarge instance, which provides 1 GPU with 12 GB of GDDR SDRAM and 1 CPU with 61 GB of RAM. The cost of this service for the duration of the project amounts to 1647€.
● Open-source software.
● No maintenance costs, as the project is a comparative study.
Results: Do visual attention models have difficulties predicting later fixations?
41
● Evaluation performed using the KLdiv metric.
● For each image, we computed the distance between the two evaluations' scores as the absolute difference |a - b|.
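The per-image comparison amounts to a simple gap ranking; the score values below are hypothetical, for illustration only.

```python
import numpy as np

# Hypothetical per-image KLdiv scores of the same predicted maps,
# evaluated against the two ground-truth types (illustrative values).
kl_vs_nsm = np.array([1.2, 0.8, 2.5, 1.0])
kl_vs_wsm = np.array([0.9, 1.1, 1.3, 1.0])

gap = np.abs(kl_vs_nsm - kl_vs_wsm)  # |a - b| per image
order = np.argsort(-gap)             # images with the largest disagreement first
```

Images at the top of `order` are those where the choice of ground-truth (nSM vs. wSM) changes the evaluation most.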

More Related Content

What's hot

What's hot (20)

Convolutional Neural Network (CNN) - image recognition
Convolutional Neural Network (CNN)  - image recognitionConvolutional Neural Network (CNN)  - image recognition
Convolutional Neural Network (CNN) - image recognition
 
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
 
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
Semantic Segmentation - Fully Convolutional Networks for Semantic SegmentationSemantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
 
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
 
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
 
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
 
A beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsA beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trends
 
Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018
Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018
Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018
 
V2 v posenet
V2 v posenetV2 v posenet
V2 v posenet
 
Decomposing image generation into layout priction and conditional synthesis
Decomposing image generation into layout priction and conditional synthesisDecomposing image generation into layout priction and conditional synthesis
Decomposing image generation into layout priction and conditional synthesis
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural Network
 
[Paper] Multiscale Vision Transformers(MVit)
[Paper] Multiscale Vision Transformers(MVit)[Paper] Multiscale Vision Transformers(MVit)
[Paper] Multiscale Vision Transformers(MVit)
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
 
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
 
Convolutional neural networks
Convolutional neural networks Convolutional neural networks
Convolutional neural networks
 
Deep Learning for Computer Vision: Segmentation (UPC 2016)
Deep Learning for Computer Vision: Segmentation (UPC 2016)Deep Learning for Computer Vision: Segmentation (UPC 2016)
Deep Learning for Computer Vision: Segmentation (UPC 2016)
 
Parn pyramidal+affine+regression+networks+for+dense+semantic+correspondence
Parn pyramidal+affine+regression+networks+for+dense+semantic+correspondenceParn pyramidal+affine+regression+networks+for+dense+semantic+correspondence
Parn pyramidal+affine+regression+networks+for+dense+semantic+correspondence
 
Recent Object Detection Research & Person Detection
Recent Object Detection Research & Person DetectionRecent Object Detection Research & Person Detection
Recent Object Detection Research & Person Detection
 
Cnn
CnnCnn
Cnn
 

Similar to The Importance of Time in Visual Attention Models

NUMBER PLATE IMAGE DETECTION FOR FAST MOTION VEHICLES USING BLUR KERNEL ESTIM...
NUMBER PLATE IMAGE DETECTION FOR FAST MOTION VEHICLES USING BLUR KERNEL ESTIM...NUMBER PLATE IMAGE DETECTION FOR FAST MOTION VEHICLES USING BLUR KERNEL ESTIM...
NUMBER PLATE IMAGE DETECTION FOR FAST MOTION VEHICLES USING BLUR KERNEL ESTIM...
paperpublications3
 

Similar to The Importance of Time in Visual Attention Models (20)

Obscenity Detection in Images
Obscenity Detection in ImagesObscenity Detection in Images
Obscenity Detection in Images
 
Deep Reinforcement Learning for Visual Navigation
Deep Reinforcement Learning for Visual NavigationDeep Reinforcement Learning for Visual Navigation
Deep Reinforcement Learning for Visual Navigation
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in Vision
 
REVIEW ON OBJECT DETECTION WITH CNN
REVIEW ON OBJECT DETECTION WITH CNNREVIEW ON OBJECT DETECTION WITH CNN
REVIEW ON OBJECT DETECTION WITH CNN
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
 
NUMBER PLATE IMAGE DETECTION FOR FAST MOTION VEHICLES USING BLUR KERNEL ESTIM...
NUMBER PLATE IMAGE DETECTION FOR FAST MOTION VEHICLES USING BLUR KERNEL ESTIM...NUMBER PLATE IMAGE DETECTION FOR FAST MOTION VEHICLES USING BLUR KERNEL ESTIM...
NUMBER PLATE IMAGE DETECTION FOR FAST MOTION VEHICLES USING BLUR KERNEL ESTIM...
 
28 01-2021-05
28 01-2021-0528 01-2021-05
28 01-2021-05
 
ExplainableAI.pptx
ExplainableAI.pptxExplainableAI.pptx
ExplainableAI.pptx
 
GNR638_Course Project for spring semester
GNR638_Course Project for spring semesterGNR638_Course Project for spring semester
GNR638_Course Project for spring semester
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentation
 
[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptx[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptx
 
Towards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networksTowards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networks
 
Can AI say from our eyes when we read relevant information?
Can AI say from our eyes when we read relevant information?Can AI say from our eyes when we read relevant information?
Can AI say from our eyes when we read relevant information?
 
ImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural NetworksImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural Networks
 
CNN
CNNCNN
CNN
 
RunPool: A Dynamic Pooling Layer for Convolution Neural Network
RunPool: A Dynamic Pooling Layer for Convolution Neural NetworkRunPool: A Dynamic Pooling Layer for Convolution Neural Network
RunPool: A Dynamic Pooling Layer for Convolution Neural Network
 
Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detec...
Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detec...Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detec...
Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detec...
 
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...
 
Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용
 
DEEP LEARNING BASED TARGET TRACKING AND CLASSIFICATION DIRECTLY IN COMPRESSIV...
DEEP LEARNING BASED TARGET TRACKING AND CLASSIFICATION DIRECTLY IN COMPRESSIV...DEEP LEARNING BASED TARGET TRACKING AND CLASSIFICATION DIRECTLY IN COMPRESSIV...
DEEP LEARNING BASED TARGET TRACKING AND CLASSIFICATION DIRECTLY IN COMPRESSIV...
 

More from Universitat Politècnica de Catalunya

Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Universitat Politècnica de Catalunya
 

More from Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 

Recently uploaded

Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
DilipVasan
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
pyhepag
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
pyhepag
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
cyebo
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
pyhepag
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
pyhepag
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
cyebo
 

Recently uploaded (20)

2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
 
how can i exchange pi coins for others currency like Bitcoin
how can i exchange pi coins for others currency like Bitcoinhow can i exchange pi coins for others currency like Bitcoin
how can i exchange pi coins for others currency like Bitcoin
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptx
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptxMALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptx
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptx
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp online
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 

The Importance of Time in Visual Attention Models

  • 1. The Importance of Time in Visual Attention Models Author: Marta Coll Pol Advisors: Kevin Mc Guinness and Xavier Giró-i-Nieto
  • 3. Visual Attention Models ● What is visual attention? ○ Visual System Mechanism: allows humans to selectively process visual information. ● What is a visual attention model? ○ Predictive models that aim to predict regions that attract human attention. Source: https://algorithmia.com/algorithms/deeplearning/SalNet/docs 3
  • 5. Motivation Consistency in fixation selection between participants as a function of fixation number [1]. 5 [1] Benjamin W Tatler, Roland J Baddeley, and Iain D Gilchrist. Visual correlates of fixation selection: Effects of scale and time. Vision research, 45(5):643-659, 2005.
  • 6. Motivation ● Hypothesis I: Visual Attention models have difficulties predicting later fixation points. ● Hypothesis II: Adding less weight to later fixations in the ground-truth maps used for training, could help improving visual attention models’ performance. 6
  • 8. Datasets ● iSUN: ○ Data collection: eye-tracking [2]. ● SALICON: ○ Data collection: mouse-tracking [3]. 8 Saliency ground-truth data for one observer in a given image: [2] Yinda Zhang, Fisher Yu, Shuran Song, Pingmei Xu, Ari Se, and Jianxiong Xiao. Large-scale scene understanding challenge: Eye tracking saliency estimation. [3] Ming Jiang, Shengsheng Huang, Juanyong Duan, and Qi Zhao. Salicon: Saliency in context. In Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, pages 1072-1080. IEEE, 2015.
  • 9. Fixation points in order of visualization iSUN: ● Locations to Fixations: Mean-shift. ● We used timestamps as a weight to retrieve fixation points in order. 9 Mean-shift applied to the locations made for a given image by an observer. Colored dots are locations assigned to different clusters. Red Crosses are the mean of each cluster (centroid), Fixation points.
  • 10. Fixation points in order of visualization SALICON: ● Locations to Fixations: Samples with high mouse-moving velocity are excluded while keeping fixation points. ● An observation was made to see if fixation points are given in order of visualization in the dataset. 10
  • 11. Fixation points in order of visualization : SALICON 11 1 23 4 5 6 7 8 1 23 4 5 6 7 8
  • 13. Temporally Weighted Saliency Maps (TWSM) ● Weighting in function of visualization order. ● Weighting function inspired in consistency graph. Weighting function: 13
  • 14. TWSM : Choosing a weighting function parameter 14 Consistency in fixation selection between participants as a function of fixation number
  • 15. Saliency Prediction Metrics ● Area Under ROC Curve (AUC) Judd ● Kullback-Leibler Divergence (KLdiv) ● Pearson correlation coefficient (CC) 15
  • 16. Model: MLNet Architecture: CNN + Encoding Network ( produces a temporary saliency map) + Prior learning network = Final Saliency Map 16 [2] Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, and Rita Cucchiara. A deep multi-level network for saliency prediction. ICPR 2016.
  • 17. Model: Different MLNet versions 17 Model Training Temporal Weighting MLNet Cornia et el [2] ❌ (nSM) nMLNet (baseline) Ours ❌ (nSM) wMLNet Ours ✅ (wSM) [2] Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, and Rita Cucchiara. A deep multi-level network for saliency prediction. ICPR 2016.
  • 18. Model: Replicating MLNet Results (nMLNet) 18 SALICON 2015 (Validation set) AUC Judd MLNet 0.886 nMLNet (baseline) 0.814
  • 20. Experiment: Do visual attention models have difficulties predicting later fixations? Hypothesis I: Visual attention models have difficulties predicting later fixations. ● Evaluate MLNet’s predicted maps for the iSUN training set using two types of ground-truth: nSMs and wSMs. 20
  • 21. Results: Do visual attention models have difficulties predicting later fixations? Maps that scored substantially better when evaluated using wSM. ● Metric: KLdiv.
  • 22. Results: Do visual attention models have difficulties predicting later fixations?
  • 23. Results: Do visual attention models have difficulties predicting later fixations? Maps that scored substantially better when evaluated using nSM. ● Metric: KLdiv. ● Scores are generally poor for both map types, so we could not draw further conclusions.
  • 24. Experiment: Study of the effect of using Weighted Saliency Maps on a visual attention model's performance. Hypothesis II: Adding less weight to later fixations in the ground-truth maps used for training can improve the model's performance. ● We trained nMLNet and wMLNet on the SALICON dataset and evaluated their performance.
  • 25. Results: Study of the effect of using Weighted Saliency Maps on a visual attention model's performance
AUC Judd:
Ground-truth | nMLNet | wMLNet
nSM | 0.814 | 0.816
Kullback-Leibler Divergence (KLdiv):
Ground-truth | nMLNet | wMLNet
nSM | 1.332 | 1.039
wSM | 1.493 | 1.136
Pearson's Correlation Coefficient (CC):
Ground-truth | nMLNet | wMLNet
nSM | 0.534 | 0.539
wSM | 0.509 | 0.517
  • 27. SalBoW ● SalBoW is a retrieval framework that uses saliency maps to weight the contribution of local convolutional representations for the instance search task. ● The model is divided into two parts: ○ CNN + K-means = Assignment Map (visual vocabulary) ○ Saliency prediction model ● Model's output: after weighting, a Bag of Words is built to form the final image representation. Figure: SalBoW pipeline with saliency weighting [3]. [3] Eva Mohedano, Kevin McGuinness, Xavier Giro-i-Nieto, and Noel E. O'Connor. Saliency weighted convolutional features for instance search. arXiv preprint arXiv:1711.10795, 2017.
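The saliency-weighting step can be sketched as below: each spatial location votes for its assigned visual word with a weight equal to the saliency value at that location. The function name, the `vocab_size` parameter, and the L2 normalization are illustrative assumptions about the pipeline, not SalBoW's exact implementation.

```python
import numpy as np

def saliency_weighted_bow(assignment_map, saliency_map, vocab_size):
    """Accumulate, for each visual word, the saliency mass of the spatial
    locations assigned to it, then L2-normalize the histogram."""
    bow = np.zeros(vocab_size)
    # Unbuffered accumulation: repeated word indices add up correctly.
    np.add.at(bow, assignment_map.ravel(), saliency_map.ravel())
    norm = np.linalg.norm(bow)
    return bow / norm if norm > 0 else bow
```

Words that fall on salient regions thus dominate the final image representation, which is the mechanism behind the mAP comparison on the next slide.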
  • 28. Results. Mean Average Precision (mAP): nSM 0.6730 | wSM 0.6743
  • 30. Conclusions. Experiment: Do visual attention models have difficulties predicting later fixations?
  • 31. Conclusions. Experiment: Study of the effect of using Weighted Saliency Maps on a visual attention model's performance.
AUC Judd:
Ground-truth | nMLNet | wMLNet
nSM | 0.814 | 0.816
Kullback-Leibler Divergence (KLdiv):
Ground-truth | nMLNet | wMLNet
nSM | 1.332 | 1.039
wSM | 1.493 | 1.136
Pearson's Correlation Coefficient (CC):
Ground-truth | nMLNet | wMLNet
nSM | 0.534 | 0.539
wSM | 0.509 | 0.517
  • 32. Conclusions. Experiment: Visual search task. Mean Average Precision (mAP): nSM 0.6730 | wSM 0.6743
  • 33. Conclusions: Future Work. New evaluation metric: ● Take into account what visual attention models can actually predict. Ways of improving Temporally Weighted Saliency Maps: ● Weighting as a function of both order and the time spent on each fixation point. ● Looking for fixation points that are common across observers for each specific image.
  • 34. Project's Code ● The code for this project can be found at: https://github.com/imatge-upc/saliency-2018-timeweight ● The model and all third-party code used in this project are open-source.
  • 36. Quick Introduction to Deep Learning. Neural Networks: ● Forward propagation ● Loss function: a metric to evaluate a model's performance. ● Back propagation ● The goal is to minimize the loss function. Source: http://cs231n.github.io/neural-networks-1/ Figure: three-layer neural network (two hidden layers of four neurons each and a single output), with three inputs.
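These ingredients fit in a few lines. Below is a minimal NumPy sketch that trains a single linear neuron by gradient descent on a mean-squared-error loss; the data, the target weights, and the learning rate are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 3))             # 100 samples, 3 inputs
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                       # targets from a known linear rule

w = np.zeros(3)
lr = 0.5
for _ in range(500):
    pred = X @ w                               # forward propagation
    loss = ((pred - y) ** 2).mean()            # loss function (MSE)
    grad = 2.0 * X.T @ (pred - y) / len(y)     # back propagation (gradient)
    w -= lr * grad                             # step that minimizes the loss
```

After training, `w` recovers the true weights and the loss is near zero; deep networks repeat this same loop with many layers and the chain rule for the gradients.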
  • 37. Convolutional Neural Networks. Convolutional Neural Network (CNN) architectures take a multi-channel image as input and are typically structured as a stack of convolutional layers, each followed by a non-linear operation (usually ReLU) and sometimes by a pooling layer (usually max-pooling). Fully connected layers are usually placed at the end of the network. Source: http://www.mdpi.com/1099-4300/19/6/242
  • 38. Convolutional Neural Networks. Source: http://cs231n.github.io/convolutional-networks/
  • 40. Budget ● The resources used for this thesis were one GPU with 11 GB of GDDR SDRAM (Graphics Double Data Rate Synchronous Dynamic RAM) and about 30 GB of regular RAM. The closest match on Amazon Web Services is the EC2 p2.xlarge instance, which provides 1 GPU, 12 GB of GDDR SDRAM, and 1 CPU with 61 GB of RAM. The cost of this service for the duration of the project amounts to €1,647. ● Open-source software. ● No maintenance costs, since the project is a comparative study.
  • 41. Results: Do visual attention models have difficulties predicting later fixations? ● Evaluation performed using the KLdiv metric. ● For each image, we computed the distance between the two evaluations' scores as the absolute difference |a-b|.