Layer-wise CNN Surgery for Visual Sentiment Prediction
Víctor Campos, Xavier Giró, Amaia Salvador, Brendan Jou
July 20th 2015

https://imatge.upc.edu/web/publications/layer-wise-cnn-surgery-visual-sentiment-prediction
Visual media are powerful means of expressing emotions and sentiments. The constant generation of new content in social networks highlights the need for automated visual sentiment analysis tools. While Convolutional Neural Networks (CNNs) have established a new state of the art in several vision problems, their application to sentiment analysis is mostly unexplored, and there are few studies on how to design CNNs for this purpose. In this work, we study the suitability of fine-tuning a CNN for visual sentiment prediction and explore performance-boosting techniques within this deep learning setting. Finally, we provide a deep-dive analysis of a benchmark, state-of-the-art network architecture to gain insight into how to design patterns for CNNs on the task of visual sentiment prediction.

  1. Layer-wise CNN Surgery for Visual Sentiment Prediction. Víctor Campos, Xavier Giró, Amaia Salvador, Brendan Jou. July 20th 2015
  2. Outline: 1. Introduction 2. Related work 3. Methodology and results 4. Conclusions 5. Future work
  3. Introduction: motivation
  4. Introduction: motivation
  5. Introduction: motivation
  6. Introduction: problem definition ▷ What? ▷ How?
  7. Introduction: problem definition ▷ What? Predict the sentiment that an image provokes in a human ▷ How?
  8. Introduction: problem definition ▷ What? Predict the sentiment that an image provokes in a human ▷ How?
  9. Introduction: problem definition ▷ What? Predict the sentiment that an image provokes in a human ▷ How? Using Convolutional Neural Networks (CNNs)
  10. Introduction: example (CNN)
  11. Introduction: example (CNN)
  12. Outline: 1. Introduction 2. Related work 3. Methodology and results 4. Conclusions 5. Future work
  13. Related work: low-level descriptors. Siersdorfer, S., Minack, E., Deng, F., & Hare, J. (2010, October). Analyzing and predicting sentiment of images on the social web. In Proceedings of the international conference on Multimedia (pp. 715-718). ACM. Machajdik, J., & Hanbury, A. (2010, October). Affective image classification using features inspired by psychology and art theory. In Proceedings of the international conference on Multimedia (pp. 83-92). ACM.
  14. Related work: SentiBank. Borth, D., Ji, R., Chen, T., Breuel, T., & Chang, S. F. (2013, October). Large-scale visual sentiment ontology and detectors using adjective noun pairs. In Proceedings of the 21st ACM international conference on Multimedia (pp. 223-232). ACM.
  15. Related work: CNNs for sentiment prediction. You, Q., Luo, J., Jin, H., & Yang, J. (2015). Robust image sentiment analysis using progressively trained and domain transferred deep networks. In The Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI).
  16. Outline: 1. Introduction 2. Related work 3. Methodology and results (a. Convolutional Neural Networks, b. Datasets, c. Experimental setup and results) 4. Conclusions 5. Future work
  17. Convolutional Neural Networks. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS.
  18. Outline: 1. Introduction 2. Related work 3. Methodology and results (a. Convolutional Neural Networks, b. Datasets, c. Experimental setup and results) 4. Conclusions 5. Future work
  19. Datasets. Flickr: authors Borth et al. (2013), size ~500k, annotation method textual tags. Twitter: authors You et al. (2015), size 1269, annotation method 5 human annotators.
  20. Datasets: Size, Flickr dataset, Quality of the annotations, Twitter dataset
  21. Datasets: Size, Flickr dataset, Quality of the annotations, Twitter dataset
  22. Outline: 1. Introduction 2. Related work 3. Methodology and results (a. Convolutional Neural Networks, b. Datasets, c. Experimental setup and results) 4. Conclusions 5. Future work
  23. Experimental setup: 5-fold cross-validation. Dataset
  24. Experimental setup: 5-fold cross-validation
  25. Experimental setup: 5-fold cross-validation. Train, Test
  26. Experimental setup: 5-fold cross-validation. Train, Test. Mean ± Std. Dev.
  27. Experimental setup: CNN. Architecture: CaffeNet
  28. Experimental setup: CNN. Architecture: CaffeNet. Software: [Jia’14]
  29. Experimental setup: CNN. Pre-trained model. Architecture: CaffeNet. Software: [Jia’14]
  30. Experimental setup: outline. 1. Fine-tuning CaffeNet 2. Layer by layer analysis 3. Layer ablation 4. Layer addition
  31. Fine-tuning CaffeNet
  32. Fine-tuning CaffeNet
  33. Fine-tuning CaffeNet
  34. Fine-tuning CaffeNet. Pre-trained model
  35. Data augmentation (oversampling). CNN
  36. Data augmentation (oversampling). CNN
  37. Data augmentation (oversampling). CNN
  38. Data augmentation (oversampling). CNN
  39. Data augmentation (oversampling). CNN
  40. Data augmentation (oversampling). CNN
  41. Data augmentation (oversampling). CNN
  42. Fine-tuning CaffeNet
  43. Experimental setup: outline. 1. Fine-tuning CaffeNet 2. Layer by layer analysis 3. Layer ablation 4. Layer addition
  44. Layer by layer analysis
  45. Layer by layer analysis
  46. Experimental setup: outline. 1. Fine-tuning CaffeNet 2. Layer by layer analysis 3. Layer ablation 4. Layer addition
  47. Layer ablation: Raw ablation; 2-neuron on top
  48. Layer ablation
  49. Layer ablation
  50. Layer ablation: ~16M params (~25%)
  51. Experimental setup: outline. 1. Fine-tuning CaffeNet 2. Layer by layer analysis 3. Layer ablation 4. Layer addition
  52. Layer addition
  53. Layer addition
  54. Outline: 1. Introduction 2. Related work 3. Methodology and results 4. Conclusions 5. Future work
  55. Conclusions. Pre-trained model
  56. Conclusions. CNN
  57. Conclusions
  58. Outline: 1. Introduction 2. Related work 3. Methodology and results 4. Conclusions 5. Future work
  59. Future work: Size, Flickr dataset, Quality of the annotations, Twitter dataset
  60. Future work: Size, Flickr dataset, Quality of the annotations, Twitter dataset, New Flickr dataset
  61. Experimental setup: introduction. Model. Architecture: CaffeNet. Software: [Jia’14]. Dataset: [Jou’15]
  62. (no text)
  63. Acknowledgements: Financial support, Technical support. Albert Gil, Josep Pujal
  64. Evaluation metric: accuracy
  65. Top-5 scores
  66. Receptive fields visualization. CONV5, unit 49; CONV5, unit 51
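
The slides name the evaluation protocol (5-fold cross-validation, reported as mean ± standard deviation, with accuracy as the metric) without spelling it out. A minimal Python sketch of that protocol follows. It is an illustration under assumptions, not the authors' code: the names `k_fold_indices`, `cross_validate` and `majority_class_trainer` are hypothetical, the toy trainer stands in for fine-tuning a CNN, the labels are made up, and the use of the sample standard deviation is a guess, since the slides do not say which estimator was used.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def cross_validate(samples, labels, train_fn, k=5, seed=0):
    """Return (mean, std) of test accuracy over k train/test splits.

    train_fn is a hypothetical callback: it receives the training samples
    and labels and returns a function mapping one sample to a label.
    """
    folds = k_fold_indices(len(samples), k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        predict = train_fn([samples[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        preds = [predict(samples[i]) for i in test_idx]
        scores.append(accuracy(preds, [labels[i] for i in test_idx]))
    # Assumption: sample standard deviation across the k folds.
    return statistics.mean(scores), statistics.stdev(scores)


def majority_class_trainer(train_samples, train_labels):
    """Toy stand-in for a trained model: always predict the most common label."""
    majority = statistics.mode(train_labels)
    return lambda sample: majority


if __name__ == "__main__":
    toy_samples = list(range(20))
    toy_labels = [1] * 14 + [0] * 6  # made-up labels: 1 = positive, 0 = negative
    mean_acc, std_acc = cross_validate(toy_samples, toy_labels,
                                       majority_class_trainer)
    print(f"accuracy: {mean_acc:.3f} ± {std_acc:.3f}")
```

Each sample is held out exactly once, so every image contributes to one test score. Reporting the spread across folds matters most for a small dataset such as the 1269-image Twitter set, where a single train/test split could be unrepresentative.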
