Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019

Video Saliency Prediction
with Deep Neural Networks
Author:
Juan José Nieto
Advisors:
Eva Mohedano
Xavier Giró-i-Nieto
Kevin McGuinness
Barcelona, 5 February 2019

INDEX
1. Saliency Prediction
2. Objectives
3. Datasets and models
4. Proposal
5. Environment
6. Experiments
7. Conclusions
8. Future development

*Slides from DLCV Seminar by Kevin McGuinness

Where we look

Objectives
● Understand what saliency models are and how they work. Study the state of the art.
● Set a baseline model based on SalGAN on the DHF1K dataset.
● Explore complementary modalities to explicitly model time dynamics as an input for SalGAN.

Datasets and models
Model    Dataset
SalGAN   SALICON
ACLNet   DHF1K
DeepVS   LEDOV

SalGAN
Image source and paper: Pan, J., Ferrer, C.C., McGuinness, K., O'Connor, N.E., Torres, J., Sayrol, E. and Giro-i-Nieto, X., 2017. SalGAN: Visual saliency prediction with generative adversarial networks. arXiv preprint arXiv:1701.01081.

SALICON
Image source: http://salicon.net/explore/
● Mouse-movement
● General and task-free
● 10K TRAINING
● 5K VALIDATION
● 5K TEST
● Gaussian width mask 24 pixels
Paper: Jiang, M., Huang, S., Duan, J. and Zhao, Q., 2015. Salicon: Saliency in context. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1072-1080).

ACLNet
Image source and paper: Wang, W., Shen, J., Guo, F., Cheng, M.M. and Borji, A., 2018, January. Revisiting Video Saliency: A Large-scale Benchmark and a New Model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4894-4903).

DHF1K
Images source: DHF1K dataset.
● Eye-tracker
● General and task-free
● 600 VIDEOS TRAINING
● 100 VIDEOS VALIDATION
● 300 VIDEOS TEST
● Gaussian width mask 30 pixels
Paper: Wang, W., Shen, J., Guo, F., Cheng, M.M. and Borji, A., 2018, January. Revisiting Video Saliency: A Large-scale Benchmark and a New Model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4894-4903).

DeepVS
Image source and paper: Jiang, L., Xu, M. and Wang, Z., 2017. Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM. arXiv preprint arXiv:1709.06316.

LEDOV
Images source: LEDOV dataset.
● Eye-tracker
● General and task-free
● 436 VIDEOS TRAINING
● 41 VIDEOS VALIDATION
● 41 VIDEOS TEST
● Gaussian width mask 40 pixels

Methodology: SalBCE
[Figure: network input channels: R, G, B, plus depth, CoordConv or optical flow.]
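One common way to feed an extra signal to a network pretrained on RGB images is to stack it as a fourth input channel and widen the first convolution to match. The NumPy sketch below assumes that approach; it is not taken from the SalBCE code, and the names `coord_channel`, `stack_inputs` and `widen_first_conv` are hypothetical. The added filter slice is initialised to zero, so the widened layer initially gives the same response as the RGB-only layer.

```python
import numpy as np

def coord_channel(height, width, axis="x"):
    """Return an (H, W) map of pixel coordinates scaled to [-1, 1]."""
    if axis == "x":
        line = np.linspace(-1.0, 1.0, width)
        return np.tile(line, (height, 1))
    line = np.linspace(-1.0, 1.0, height)
    return np.tile(line[:, None], (1, width))

def stack_inputs(rgb, extra):
    """Stack an (H, W) extra signal onto a (3, H, W) RGB image -> (4, H, W)."""
    return np.concatenate([rgb, extra[None]], axis=0)

def widen_first_conv(weight, n_extra=1):
    """Append zero-initialised input slices to an (out, in, kH, kW) filter bank."""
    out_ch, _, kh, kw = weight.shape
    zeros = np.zeros((out_ch, n_extra, kh, kw), dtype=weight.dtype)
    return np.concatenate([weight, zeros], axis=1)

# Random data standing in for an image and for pretrained first-layer filters.
rng = np.random.default_rng(0)
rgb = rng.random((3, 6, 8))
w_rgb = rng.standard_normal((64, 3, 3, 3))

x4 = stack_inputs(rgb, coord_channel(6, 8, axis="x"))
w4 = widen_first_conv(w_rgb)

# Response of every filter at the top-left 3x3 patch, before and after widening.
resp_rgb = np.einsum("oikl,ikl->o", w_rgb, rgb[:, :3, :3])
resp_4ch = np.einsum("oikl,ikl->o", w4, x4[:, :3, :3])
```

A depth or optical-flow map of shape (H, W) could be passed to `stack_inputs` in the same way as the coordinate map. Zero initialisation is only one option; the extra slice could also start from small random values.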

[Figure: project directory layout. Labels in slide order:]
/home
code/, dataset/, saliency_maps/
/juanjo
DATASETS/, SALIENCY_MAPS/, SALGAN/, TRAINED_MODELS/
docker/, src/, Makefile, .git, .gitignore, LICENSE, README.md
Dockerfile, docker-composeE.yml, docker-composeJJ.yml, requirements.txt
dataloader, trained_models, utils, salgan_dhf1k, salgan_ledov, ...
model_dataset_config/, model_dataset_config/, ...
DHF1K, LEDOV, SALICON, ...

Experiments
1. Checking evaluation
2. Setting baseline model
3. Transfer learning to DHF1K
4. Adding extra input signals

Checking evaluation
Minimum AUC-Judd according to ACLNet: video number 175, metric value 0.6630850.
[Figure panels: Ground-truth, SalGAN (0.775), ACL (0.762)]
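AUC-Judd treats the saliency map as a classifier of fixated versus non-fixated pixels. The following is a minimal NumPy sketch of the commonly used definition, in which the saliency value at each fixated pixel serves as a threshold. It omits the tie-breaking jitter and map resizing found in some reference implementations, so its values may differ slightly from the ones reported here. The function name and the toy arrays are illustrative only.

```python
import numpy as np

def auc_judd(saliency, fixations):
    """Area under the ROC curve, using fixated saliency values as thresholds."""
    s = np.asarray(saliency, dtype=np.float64).ravel()
    f = np.asarray(fixations).ravel() > 0
    n_fix, n_pix = int(f.sum()), s.size
    if n_fix == 0 or n_fix == n_pix:
        return float("nan")  # the ROC curve needs both classes

    thresholds = np.sort(s[f])[::-1]  # highest fixated value first
    tp = np.zeros(n_fix + 2)
    fp = np.zeros(n_fix + 2)
    tp[-1] = fp[-1] = 1.0  # curve runs from (0, 0) to (1, 1)
    for i, t in enumerate(thresholds, start=1):
        above = np.count_nonzero(s >= t)       # pixels at or above the threshold
        tp[i] = i / n_fix                      # share of fixations recovered
        fp[i] = (above - i) / (n_pix - n_fix)  # share of other pixels let through
    # Trapezoidal rule over the (fp, tp) points.
    return float(np.sum((fp[1:] - fp[:-1]) * (tp[1:] + tp[:-1]) / 2.0))

# Toy 2x2 example: the two highest saliency values sit on the fixated pixels.
pred = np.array([[0.9, 0.1], [0.2, 0.8]])
hits = np.array([[1, 0], [0, 1]])
misses = np.array([[0, 1], [1, 0]])
score_hits = auc_judd(pred, hits)
score_misses = auc_judd(pred, misses)
```

For a video, a per-video score could be obtained by averaging the per-frame values, which is an assumption about the evaluation protocol rather than something stated on the slide.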


Setting baseline model
(A) Baseline Lasagne
(B) Baseline PyTorch, 3 epochs, SALICON
(C) 17 epochs with data augmentation
(D) 27 epochs, data augmentation (+rotate)
(E) Nesterov, patience=5

VALIDATION   (A)     (B)     (C)     (D)     (E)
AUC_JUDD     0.856   0.763   0.861   0.862   0.863
AUC_SHUF     0.814   0.709   0.832   0.500   0.830
NSS          1.767   1.198   1.757   1.781   1.789
CC           0.843   0.555   0.859   0.862   0.866
SIM          0.726   0.536   0.751   0.757   0.761
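The NSS, CC and SIM rows above follow standard definitions that are short enough to write out. The sketch below uses NumPy and assumes non-constant, non-negative maps; some implementations also min-max normalise the maps before computing SIM, which this sketch does not do. The function names and toy arrays are illustrative and are not taken from the project code.

```python
import numpy as np

def nss(saliency, fixations):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels."""
    s = np.asarray(saliency, dtype=np.float64)
    z = (s - s.mean()) / s.std()
    return float(z[np.asarray(fixations) > 0].mean())

def cc(saliency, gt_map):
    """Pearson correlation between predicted and ground-truth saliency maps."""
    a = np.asarray(saliency, dtype=np.float64).ravel()
    b = np.asarray(gt_map, dtype=np.float64).ravel()
    return float(np.corrcoef(a, b)[0, 1])

def sim(saliency, gt_map):
    """Histogram intersection of the two maps, each normalised to sum to 1."""
    a = np.asarray(saliency, dtype=np.float64)
    b = np.asarray(gt_map, dtype=np.float64)
    return float(np.minimum(a / a.sum(), b / b.sum()).sum())

# Toy 2x2 example.
pred = np.array([[0.9, 0.1], [0.2, 0.8]])
fix = np.array([[1, 0], [0, 1]])
nss_value = nss(pred, fix)
cc_self = cc(pred, pred)
sim_self = sim(pred, pred)
sim_disjoint = sim(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]))
```

NSS is computed against the binary fixation map, while CC and SIM compare against the blurred ground-truth saliency map, which is why the functions take different second arguments.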


Transfer learning to DHF1K
VALIDATION   Baseline   Fine-tuned baseline
AUC_JUDD     0.872      0.880
AUC_SHUF     0.666      0.632
NSS          2.035      2.285
CC           0.379      0.420
SIM          0.267      0.339


Adding extra input signals
(A) ACLNet
(B) Baseline, trained on SALICON
(C) RGB fine-tuning, trained on DHF1K
(D) RGB and depth fine-tuning, trained on DHF1K
(E) RGB and CoordConv, trained on DHF1K

VALIDATION   (A)     (B)     (C)     (D)     (E)
AUC_JUDD     0.89    0.872   0.880   0.895   0.866
AUC_SHUF     0.601   0.666   0.632   0.648   0.629
NSS          2.354   2.035   2.285   2.524   2.072
CC           0.434   0.379   0.420   0.463   0.389
SIM          0.315   0.267   0.339   0.351   0.304

Conclusions
● Good environment. The project code is publicly available at https://github.com/juanjo3ns/SalBCE.
● Study of state-of-the-art models. Ranked second on the public leaderboard with SalGAN.
● Implementation of a PyTorch version of SalGAN with equivalent performance when fine-tuned on SALICON.
● Boosted the performance of the baseline PyTorch model for predicting saliency in videos. The baseline model is fine-tuned on the DHF1K dataset using RGB information, RGB + depth, and RGB + coordinate information.

Future Work
● LSTM
● Optical flow
● Combine with depth and coordconv in different streams
