
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019


https://imatge.upc.edu/web/publications/video-saliency-prediction-deep-neural-networks

Saliency prediction is a topic undergoing intense study in computer vision, with a broad range of applications. It consists of predicting where a human observer's attention will land in an image or a video. Our work is based on a deep neural network named SalGAN, which was trained on a saliency-annotated dataset of static images. In this thesis we investigate different approaches for extending SalGAN to the video domain. To this end, we use the recently proposed saliency-annotated video dataset DHF1K to train and evaluate our models. The obtained results indicate that techniques such as depth estimation or CoordConv can effectively be used as additional modalities to enhance the static-image saliency predictions obtained with SalGAN, achieving encouraging results on the DHF1K benchmark. Our work is based on PyTorch and is publicly available.



  1. 1. Video Saliency Prediction with Deep Neural Networks Author: Juan José Nieto Advisors: Eva Mohedano, Xavier Giró-i-Nieto, Kevin McGuinness. Barcelona, 5 February 2019 1
  2. 2. INDEX 1. Saliency Prediction 2. Objectives 3. Datasets and models 4. Proposal 5. Environment 6. Experiments 7. Conclusions 8. Future development 2
  3. 3. *Slides from DLCV Seminar by Kevin McGuinness 3
  4. 4. 4
  5. 5. *Slides from DLCV Seminar by Kevin McGuinness 5
  6. 6. Where we look *Slides from DLCV Seminar by Kevin McGuinness 6
  7. 7. Objectives ● Understand what saliency models are and how they work; study the state of the art. ● Set a baseline model based on SalGAN on DHF1K. ● Explore complementary modalities that explicitly model temporal dynamics as additional inputs to SalGAN. 7
  8. 8. Datasets and models SalGAN SALICON ACLNet DHF1K DeepVS LEDOV 8
  9. 9. SalGAN Image source and paper: Pan, J., Ferrer, C.C., McGuinness, K., O'Connor, N.E., Torres, J., Sayrol, E. and Giro-i-Nieto, X., 2017. Salgan: Visual saliency prediction with generative adversarial networks. arXiv preprint arXiv:1701.01081. 9
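SalGAN trains its generator with a pixel-wise binary cross-entropy content loss combined with an adversarial term from a discriminator that scores the (image, predicted map) pair. A minimal PyTorch sketch of the combined generator objective; the tensor shapes and the `disc_out` placeholder are illustrative, and the `alpha` weighting follows the paper's reported setting but should be treated as an assumption:

```python
import torch
import torch.nn.functional as F

def salgan_generator_loss(pred, target, disc_out, alpha=0.005):
    """SalGAN-style generator objective: pixel-wise BCE content loss,
    plus an adversarial term pushing the discriminator to label the
    predicted map as real. `alpha` weights the content term."""
    bce = F.binary_cross_entropy(pred, target)
    adv = F.binary_cross_entropy(disc_out, torch.ones_like(disc_out))
    return alpha * bce + adv

# toy shapes: batch of 2 predicted saliency maps in [0, 1],
# plus a placeholder discriminator score per image
pred = torch.rand(2, 1, 64, 64)
target = torch.rand(2, 1, 64, 64)
disc_out = torch.rand(2, 1)
loss = salgan_generator_loss(pred, target, disc_out)
```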
  10. 10. SALICON ● Mouse movements as attention proxy ● General and task-free ● 10K training / 5K validation / 5K test images ● Gaussian mask width: 24 pixels. Image source: http://salicon.net/explore/ Paper: Jiang, M., Huang, S., Duan, J. and Zhao, Q., 2015. SALICON: Saliency in context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1072-1080). 10
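Datasets like SALICON and DHF1K build their continuous ground-truth maps by spreading a Gaussian around each binary fixation point (mask widths of 24 and 30 pixels respectively). A NumPy sketch of that construction; the `fixations_to_map` helper is hypothetical, not the datasets' actual tooling:

```python
import numpy as np

def fixations_to_map(fixations, height, width, sigma=24.0):
    """Turn a list of (row, col) fixation points into a continuous
    saliency map by summing one isotropic Gaussian per fixation,
    then normalising the result to [0, 1]."""
    ys, xs = np.mgrid[0:height, 0:width]
    sal = np.zeros((height, width), dtype=np.float64)
    for (r, c) in fixations:
        sal += np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2 * sigma ** 2))
    return sal / sal.max()

# two fixations on a 96x128 frame, SALICON-style sigma
gt = fixations_to_map([(50, 50), (20, 80)], height=96, width=128, sigma=24.0)
```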
  11. 11. ACLNet Image source and paper: Wang, W., Shen, J., Guo, F., Cheng, M.M. and Borji, A., 2018, January. Revisiting Video Saliency: A Large-scale Benchmark and a New Model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4894-4903). 11
  12. 12. DHF1K ● Eye-tracker ● General and task-free ● 600 videos training / 100 validation / 300 test ● Gaussian mask width: 30 pixels. Images source: DHF1K dataset. Paper: Wang, W., Shen, J., Guo, F., Cheng, M.M. and Borji, A., 2018. Revisiting Video Saliency: A Large-scale Benchmark and a New Model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4894-4903). 12
  13. 13. DeepVS 13 Image source and paper: Jiang, L., Xu, M. and Wang, Z., 2017. Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM. arXiv preprint arXiv:1709.06316.
  14. 14. LEDOV ● Eye-tracker ● General and task-free ● 436 videos training / 41 validation / 41 test ● Gaussian mask width: 40 pixels. Images source: LEDOV dataset. 14
  15. 15. Methodology: SalBCE (architecture diagram). Input: an RGB frame plus one extra modality (depth, CoordConv, or optical flow). 15
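A common way to feed a SalGAN-style encoder the extra modality (depth, CoordConv, or optical flow) is to widen its first convolution from 3 to 3+k input channels, copying the pretrained RGB filters and zero-initialising the new ones so behaviour is unchanged at the start of fine-tuning. A hedged sketch; the helper and layer below are illustrative, not SalBCE's actual modules:

```python
import torch
import torch.nn as nn

def widen_first_conv(conv, extra_channels):
    """Return a new Conv2d accepting `extra_channels` additional input
    planes, reusing the pretrained weights for the original channels
    and starting the new channels at zero (neutral behaviour)."""
    new = nn.Conv2d(conv.in_channels + extra_channels, conv.out_channels,
                    kernel_size=conv.kernel_size, stride=conv.stride,
                    padding=conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        new.weight[:, :conv.in_channels] = conv.weight  # keep RGB filters
        new.weight[:, conv.in_channels:] = 0.0          # new modality starts neutral
        if conv.bias is not None:
            new.bias.copy_(conv.bias)
    return new

# e.g. RGB + 1 depth channel
pretrained = nn.Conv2d(3, 64, kernel_size=3, padding=1)
rgbd_conv = widen_first_conv(pretrained, extra_channels=1)
x = torch.rand(1, 4, 64, 64)
out = rgbd_conv(x)
```

Because the new channels start at zero, the widened layer initially reproduces the pretrained RGB response exactly, so fine-tuning starts from the static-image baseline.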
  16. 16. Environment (directory-tree figure). Repository layout: docker/ and src/ directories with a Makefile, Dockerfile, docker-compose files, requirements.txt, .gitignore, LICENSE and README.md; src/ holds dataloader, trained_models, utils, salgan_dhf1k and salgan_ledov packages with per-model/dataset configuration for DHF1K, LEDOV and SALICON; data lives under /home/juanjo in DATASETS/, SALIENCY_MAPS/, SALGAN/ and TRAINED_MODELS/, mounted as code/, dataset/ and saliency_maps/. 16
  17. 17. Experiments Checking evaluation Setting baseline model Transfer Learning to DHF1k Adding extra input signals 17
  18. 18. Checking evaluation. According to ACLNet, video 175 has the minimum AUC-Judd, with a metric value of 0.663. Ground truth vs. SalGAN (0.775) vs. ACL (0.762). 18
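AUC-Judd, the metric compared above, thresholds the predicted map at the saliency value of each fixated pixel and integrates the resulting ROC curve. A simplified NumPy sketch; note that the false-positive rate here is computed over all pixels, whereas the MIT benchmark implementation excludes fixated ones:

```python
import numpy as np

def auc_judd(sal, fixations):
    """AUC-Judd: use each fixated pixel's saliency value as a threshold.
    TP rate = fraction of fixations at or above the threshold;
    FP rate = fraction of all pixels at or above it (simplification)."""
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)
    fix_vals = sal[fixations > 0]
    thresholds = np.sort(np.unique(fix_vals))[::-1]
    tp, fp = [0.0], [0.0]
    for t in thresholds:
        tp.append(float((fix_vals >= t).mean()))
        fp.append(float((sal >= t).mean()))
    tp.append(1.0)
    fp.append(1.0)
    tp, fp = np.array(tp), np.array(fp)
    # trapezoidal integration of the ROC curve
    return float(np.sum(np.diff(fp) * (tp[1:] + tp[:-1]) / 2.0))

# a prediction that puts all its mass on the fixated pixels scores near 1
rng = np.random.default_rng(0)
fix = np.zeros((32, 32))
fix[10, 10] = fix[20, 5] = 1
pred = 0.1 * rng.random((32, 32))
pred[fix > 0] = 1.0
score = auc_judd(pred, fix)
```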
  19. 19. Experiments Checking evaluation Setting baseline model Transfer Learning to DHF1k Adding extra input signals 19
  20. 20. Setting baseline model (validation)
      Metric    | Baseline Lasagne | Baseline PyTorch (3 epochs, SALICON) | 17 epochs, data augmentation | 27 epochs, data augm. (+rotate) | Nesterov, patience=5
      AUC-Judd  | 0.856 | 0.763 | 0.861 | 0.862 | 0.863
      AUC-Shuff | 0.814 | 0.709 | 0.832 | 0.500 | 0.830
      NSS       | 1.767 | 1.198 | 1.757 | 1.781 | 1.789
      CC        | 0.843 | 0.555 | 0.859 | 0.862 | 0.866
      SIM       | 0.726 | 0.536 | 0.751 | 0.757 | 0.761
      20
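The last configuration in the table uses SGD with Nesterov momentum and early stopping with patience 5. A generic sketch of that schedule; the model and data below are toy placeholders, and the training loss stands in for a proper validation loss:

```python
import torch
import torch.nn as nn

def train_with_patience(model, loader, epochs=50, patience=5, lr=1e-3):
    """SGD with Nesterov momentum; stop once the monitored loss has
    failed to improve for `patience` consecutive epochs."""
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          momentum=0.9, nesterov=True)
    best, bad_epochs = float("inf"), 0
    for epoch in range(epochs):
        total = 0.0
        for x, y in loader:
            opt.zero_grad()
            loss = nn.functional.binary_cross_entropy(
                torch.sigmoid(model(x)), y)
            loss.backward()
            opt.step()
            total += loss.item()
        if total < best - 1e-6:
            best, bad_epochs = total, 0   # improvement: reset counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:    # ran out of patience
                break
    return best

# toy run: a single conv layer on one random batch
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
data = [(torch.rand(2, 3, 16, 16), torch.rand(2, 1, 16, 16))]
final = train_with_patience(model, data, epochs=10)
```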
  21. 21. Experiments Setting baseline model Transfer Learning to DHF1k Adding extra input signals Checking evaluation 21
  22. 22. Transfer learning to DHF1K (validation)
      Metric    | Baseline | Fine-tuned baseline
      AUC-Judd  | 0.872 | 0.880
      AUC-Shuff | 0.666 | 0.632
      NSS       | 2.035 | 2.285
      CC        | 0.379 | 0.420
      SIM       | 0.267 | 0.339
      22
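The remaining metrics reported in these tables are straightforward to state: NSS is the mean of the standardised prediction at fixation points, CC the Pearson correlation between predicted and ground-truth maps, and SIM the histogram intersection of the two maps treated as distributions. A NumPy sketch:

```python
import numpy as np

def nss(sal, fixations):
    """Normalized Scanpath Saliency: z-score the map, average at fixations."""
    z = (sal - sal.mean()) / (sal.std() + 1e-12)
    return z[fixations > 0].mean()

def cc(sal, gt):
    """Linear correlation coefficient between two saliency maps."""
    return np.corrcoef(sal.ravel(), gt.ravel())[0, 1]

def sim(sal, gt):
    """Similarity: sum of element-wise minima of the two maps,
    each normalised to a probability distribution."""
    p = sal / (sal.sum() + 1e-12)
    q = gt / (gt.sum() + 1e-12)
    return np.minimum(p, q).sum()

# sanity check: a map compared against itself
gt = np.random.default_rng(0).random((32, 32))
perfect_cc = cc(gt, gt)    # identical maps correlate perfectly
perfect_sim = sim(gt, gt)  # and have full histogram intersection
```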
  23. 23. Experiments Setting baseline model Transfer Learning to DHF1k Adding extra input signals Checking evaluation 23
  24. 24. Adding extra input signals (validation)
      Metric    | ACLNet | Baseline (trained on SALICON) | RGB fine-tuning (DHF1K) | RGB + depth fine-tuning | RGB + CoordConv
      AUC-Judd  | 0.89  | 0.872 | 0.880 | 0.895 | 0.866
      AUC-Shuff | 0.601 | 0.666 | 0.632 | 0.648 | 0.629
      NSS       | 2.354 | 2.035 | 2.285 | 2.524 | 2.072
      CC        | 0.434 | 0.379 | 0.420 | 0.463 | 0.389
      SIM       | 0.315 | 0.267 | 0.339 | 0.351 | 0.304
      24
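CoordConv, used in the last column, appends normalised x and y coordinate channels to the input so that convolutions can condition on position, which is useful given the strong centre bias of human fixations. A sketch of the channel construction (shapes are illustrative):

```python
import numpy as np

def coord_channels(height, width):
    """Two extra input planes holding x and y positions scaled to [-1, 1],
    in the spirit of the CoordConv paper (Liu et al., 2018)."""
    ys = np.linspace(-1.0, 1.0, height)[:, None].repeat(width, axis=1)
    xs = np.linspace(-1.0, 1.0, width)[None, :].repeat(height, axis=0)
    return np.stack([xs, ys])  # shape: (2, height, width)

coords = coord_channels(192, 256)
# stacked with an RGB frame this yields a 5-channel network input
rgb = np.random.default_rng(0).random((3, 192, 256))
rgbxy = np.concatenate([rgb, coords])
```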
  25. 25. Conclusions ● Good working environment; the project code is publicly available at https://github.com/juanjo3ns/SalBCE. ● Studied state-of-the-art models; SalGAN ranked second in the public leaderboard. ● Implemented a PyTorch version of SalGAN with performance equivalent to the original when fine-tuned on SALICON. ● Boosted the performance of the baseline PyTorch model for video saliency prediction by fine-tuning on the DHF1K dataset with RGB, RGB + depth, and RGB + coordinate information. 25
  26. 26. Future Work ● LSTM ● Optical flow ● Combine with depth and coordconv in different streams 26
  27. 27. Thank you Q&A 27
