In this presentation, we present EURECOM's approach to address the MediaEval 2017 Predicting Media Interestingness Task.
We developed models for both the image and video subtasks. In particular, we investigate the usage of media genre information (i.e., drama, horror, etc.) to predict interestingness. Our approach is related to the affective impact of media content and is shown to be effective in predicting interestingness for both video shots and key-frames.
This approach obtained the best performance on the video task (in terms of MAP and MAP@10).
Media Genre Inference for Predicting Media Interestingness
1. EURECOM @MediaEval 2017:
Media Genre Inference for
Predicting Media Interestingness
O. Ben-Ahmed, J. Wacker, A. Gaballo, B. Huet
EURECOM
Sophia Antipolis, France
2. Introduction
Predicting Media Interestingness (PMI)
automatically analyze media data
identify the most attractive content
Content based approaches
gap between low-level features
and high-level human perception
Our proposal
Address PMI in association with
Media Genre
13/09/2017 MediaEval2017 Dublin - B. HUET, EURECOM - p 2
http://www.dailyherald.com/article/20110627/entlife/706279989/
3. Why Genre Inference for PMI
Motivation
Interestingness is highly correlated with data emotional content
Affective representation of data content
Hypothesis
Emotional impact of movie genre can be a factor for interestingness
of a video fragment
Method
Mid-level representation based on media genre prediction for PMI
– Represent each video fragment/image as a distribution of genres
13/09/2017 MediaEval2017 Dublin - B. HUET, EURECOM - p 3
5. Media Genre Prediction
Visual Branch
Middle frame selection
Deep CNN for features extraction
DNN classifier
Audio Branch
Audio extraction : OpenSmile
Deep features extraction : Soundnet
SVM classifier
VGG Architecture
13/09/2017 MediaEval2017 Dublin - B. HUET, EURECOM - p 5
| X
| X
6. Media Genre Prediction Example
Visual Audio Audio-Visual
Action
Drama
Horror
Romance
Sci-fi
13/09/2017 MediaEval2017 Dublin - B. HUET, EURECOM - p 6
7. Media Genre Prediction Example
Visual Audio Audio-Visual
Action
Drama
Horror
Romance
Sci-fi
2.5% 33,34% 17.92%
0% 17,78% 8.89%
0.37% 2.61% 1.49%
0% 3.87% 1.93%
97.12% 42,40% 69,76%
13/09/2017 MediaEval2017 Dublin - B. HUET, EURECOM - p 7
8. Media Genre Prediction Example
Visual Audio Audio-Visual
Action
Drama
Horror
Romance
Sci-fi
Interestingness : 1
Rank : 1
2.5% 33,34% 17.92%
0% 17,78% 8.89%
0.37% 2.61% 1.49%
0% 3.87% 1.93%
97.12% 42,40% 69,76%
13/09/2017 MediaEval2017 Dublin - B. HUET, EURECOM - p 8
9. Interestingness Classification
Features vectors
probability vector for the genre distribution for the video/image
Classifier
Binary SVM,
Taking into account the confidence score in training
Image subtask
Visual genre vector
Video subtask
Visual genre vector
Audio-Visual genre vector
– Mean of visual- and audio-based genre vectors probabilities
13/09/2017 MediaEval2017 Dublin - B. HUET, EURECOM - p 9
10. Experiments and Results
TASK RUN SVM CLASSIFIER MAP MAP@10
IMAGE 1 Sigmoid kernel 0.2029 0.0587
2 Linear kernel 0.2016 0.0579
VIDEO 1 Sigmoid kernel gamma=0.5, C=100 0.2034 0.0717
2 Polynomial kernel degree=3 0.1960 0.0732
3 Polynomial kernel degree=2 0.1964 0.0640
4 Sigmoid kernel gamma=0.2, C=100 0.2094 0.0827
5 Sigmoid kernel gamma=0.3 , C=100 0.2002 0.0774
13/09/2017 MediaEval2017 Dublin - B. HUET, EURECOM - p 10
11. Experiments and Results
TASK RUN SVM CLASSIFIER MAP MAP@10
IMAGE 1 Sigmoid kernel 0.2029 0.0587
2 Linear kernel 0.2016 0.0579
VIDEO 1 Sigmoid kernel gamma=0.5, C=100 0.2034 0.0717
2 Polynomial kernel degree=3 0.1960 0.0732
3 Polynomial kernel degree=2 0.1964 0.0640
4 Sigmoid kernel gamma=0.2, C=100 0.2094 0.0827
5 Sigmoid kernel gamma=0.3 , C=100 0.2002 0.0774
13/09/2017 MediaEval2017 Dublin - B. HUET, EURECOM - p 11
12. Conclusions and Future Work
We proposed a Genre Recognition System as a mid-
level representation for Predicting Media
Interestingness
Deep Audio and Visual Features for Genre Recognition
SVM Classifier for Predicting Media Interestingness
Best Results:
20,29 MAP for Image and 20,94 MAP for Video (on test set).
Audio brings limited additional information for PMI
13/09/2017 MediaEval2017 Dublin - B. HUET, EURECOM - p 12
13. Conclusions and Future Work
We proposed a Genre Recognition System as a mid-
level representation for Predicting Media
Interestingness
Deep Audio and Visual Features for Genre Recognition
SVM Classifier for Predicting Media Interestingness
Best Results:
20,29 MAP for Image and 20,94 MAP for Video (on test set).
Audio brings limited additional information for PMI
Joint learning of audio-visual features
Integration of temporal information
13/09/2017 MediaEval2017 Dublin - B. HUET, EURECOM - p 13