Presenter: Samuel G. Fadel
Jurandy Almeida. UNIFESP at MediaEval 2016: Predicting Media Interestingness Task. In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, 2016. CEUR-WS.org.
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_28.pdf
Video: https://youtu.be/YLthKNczlcA
Abstract: This paper describes the approach proposed by UNIFESP for the video subtask of the MediaEval 2016 Predicting Media Interestingness Task. The approach combines learning-to-rank algorithms to predict the interestingness of videos from their visual content.
MediaEval 2016 - UNIFESP Predicting Media Interestingness Task
1. UNIFESP at MediaEval 2016:
Predicting Media Interestingness Task
Jurandy Almeida
GIBIS Lab, Institute of Science and Technology, Federal University of São Paulo – UNIFESP
jurandy.almeida@unifesp.br
MediaEval’16 – Hilversum, Netherlands – October 20-21 – 2016
2. Predicting Media Interestingness Task
This work was developed for the MediaEval 2016 Predicting Media Interestingness Task,
for its video subtask only.
The goal is to automatically select the most interesting video segments
according to a common viewer.
The focus is on features derived from audio-visual content or associated textual
information.
3. Available Resources
Table: Resources made available for the task.
Resources   Textual   Visual
Used        none      Videos
Not Used    Title     Low-Level and Mid-Level Features
4. Proposed Approach
It relies on combining learning-to-rank
algorithms and exploiting only visual
information:
1. A simple, yet effective, histogram of
motion patterns is used for
processing visual information.
2. A majority voting scheme is used
for combining machine-learned
rankers and predicting the
interestingness of videos.
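[Figure: the input is passed to N rankers (R1, R2, ..., RN); their outputs (O1, O2, ..., ON) go through a "Combining Rankings" step that produces the final output ô.]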
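The combination step can be sketched as follows. This is a minimal illustration, assuming each ranker Ri emits a binary label per video segment (1 = interesting, 0 = not interesting). The function name `majority_vote` and the tie-breaking rule are assumptions made for the example, not details given on the slides.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-ranker label lists into one label per segment.

    predictions: list of N lists (one per ranker), each holding one
    label per video segment, in the same segment order.
    """
    if not predictions:
        return []
    n_segments = len(predictions[0])
    if any(len(p) != n_segments for p in predictions):
        raise ValueError("all rankers must label the same segments")
    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        # Assumption: ties are broken in favour of the smallest label
        # (0 = not interesting); the slides do not specify a tie rule.
        winners = sorted(label for label, c in counts.items() if c == top)
        combined.append(winners[0])
    return combined

# Hypothetical outputs O1, O2, O3 of three rankers for four segments.
o1 = [1, 0, 1, 0]
o2 = [1, 1, 0, 0]
o3 = [0, 1, 1, 0]
print(majority_vote([o1, o2, o3]))  # [1, 1, 1, 0]
```

With an odd number of rankers and binary labels no tie can occur, so the tie rule only matters for an even number of rankers.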
5. Visual Features
Low-Level and Mid-Level Features: Not used
Applying an algorithm to encode visual properties from video segments.
“Comparison of Video Sequences with Histograms of Motion Patterns”.
J. Almeida, N. J. Leite, and R. S. Torres.
IEEE International Conference on Image Processing (ICIP), 2011.
It relies on three steps:
1. partial decoding;
2. feature extraction;
3. signature generation.
6. Visual Features
Histograms of Motion Patterns (HMP) [1]
[Figure: HMP pipeline. 1: Partial Decoding (video frames, I-frames, macroblock, pixel block, DC coefficient). 2: Feature Extraction (time series of macroblocks over previous, current, and next frames; temporal and spatial comparisons yield a binary motion pattern, e.g. 0101100110010011). 3: Signature Generation (histogram distribution of motion patterns).]
[1] J. Almeida, N. J. Leite, and R. S. Torres. “Comparison of Video Sequences with Histograms of Motion Patterns”. In: ICIP. 2011, pp. 3673–3676.
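The idea of turning block-level comparisons into a histogram can be illustrated with a deliberately simplified sketch. It is not the HMP descriptor itself, which is defined in the cited ICIP 2011 paper: here each block's DC value is compared only with the same block in the previous and next frames, giving a 2-bit pattern instead of the longer pattern shown on the slide. The function names and the toy DC values are assumptions for illustration.

```python
def motion_pattern(prev, curr, nxt):
    """Encode how one block's DC value compares with its temporal
    neighbours as a 2-bit integer (assumed simplification)."""
    bit_prev = 1 if curr >= prev else 0
    bit_next = 1 if curr >= nxt else 0
    return (bit_prev << 1) | bit_next

def pattern_histogram(dc_series):
    """Build a normalised histogram of 2-bit patterns.

    dc_series: list of frames; each frame is a list of DC values,
    one per block, with blocks in the same order in every frame.
    """
    hist = [0] * 4
    # The first and last frames lack a previous or next neighbour.
    for t in range(1, len(dc_series) - 1):
        for b in range(len(dc_series[t])):
            p = motion_pattern(dc_series[t - 1][b],
                               dc_series[t][b],
                               dc_series[t + 1][b])
            hist[p] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 4

# Toy example: three frames with two blocks each (hypothetical values).
frames = [[106, 100], [111, 88], [91, 94]]
print(pattern_histogram(frames))  # [0.5, 0.0, 0.0, 0.5]
```

The resulting fixed-length histogram is what makes segments of different durations comparable by a ranker.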
7. Learning to Rank Strategies
Ranking SVM [2]
Use the traditional SVM classifier to learn a ranking function.
RankNet [3]
Probability distribution metrics as cost functions to be optimized.
RankBoost [4]
Regression error on weighted distributions of pairwise rankings.
ListNet [5]
Extension of RankNet that uses a ranked list instead of pairwise rankings.
Majority Voting [6]
The label with the most votes is selected as the label for a given instance.
[2] T. Joachims. “Training linear SVMs in linear time”. In: ACM SIGKDD. 2006, pp. 217–226.
[3] C. J. C. Burges et al. “Learning to rank using gradient descent”. In: ICML. 2005, pp. 89–96.
[4] Y. Freund et al. “An Efficient Boosting Algorithm for Combining Preferences”. In: Journal of Machine Learning Research 4 (2003), pp. 933–969.
[5] Z. Cao et al. “Learning to rank: from pairwise approach to listwise approach”. In: ICML. 2007, pp. 129–136.
[6] L. Lam and C. Y. Suen. “Application of majority voting to pattern recognition: an analysis of its behavior and performance”. In: IEEE Trans. Systems, Man, and Cybernetics, Part A 27.5 (1997), pp. 553–568.
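The pairwise idea behind Ranking SVM can be sketched as follows: every pair of items with different target scores becomes a difference vector, and a linear model is trained so that each difference scores above a margin. This is a minimal sketch that uses plain subgradient descent on a regularised hinge loss rather than an SVM solver; it is not the implementation used in the paper, and all function names, hyperparameters, and toy data are assumptions.

```python
import random

def pairwise_differences(features, scores):
    """Return x_i - x_j for every pair where item i should rank above j."""
    diffs = []
    for i in range(len(features)):
        for j in range(len(features)):
            if scores[i] > scores[j]:
                diffs.append([a - b for a, b in zip(features[i], features[j])])
    return diffs

def train_linear_ranker(features, scores, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Minimise a regularised pairwise hinge loss by subgradient descent."""
    rng = random.Random(seed)
    diffs = pairwise_differences(features, scores)
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            # L2 regularisation shrinks the weights at every step.
            w = [wi * (1 - lr * reg) for wi in w]
            if margin < 1:
                # The pair violates the margin: move w towards d.
                w = [wi + lr * di for wi, di in zip(w, d)]
    return w

def rank(features, w):
    """Return item indices ordered from highest to lowest predicted score."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda i: -scores[i])

# Toy data: three items with 2-D features and target scores (hypothetical).
X = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
y = [2, 1, 0]
w = train_linear_ranker(X, y)
print(rank(X, w))  # [0, 1, 2]
```

RankNet and ListNet replace the hinge loss with probabilistic losses over pairs or whole lists, but the interface stays the same: features in, a scoring function out.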
8. Experimental Protocol
Validation: 4-fold cross validation
Development data: 5,054 video segments from 52 movie trailers
Test data: 2,342 video segments from 26 movie trailers
Evaluation metric: Mean Average Precision (MAP)
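As a reminder of how MAP is computed, here is a minimal sketch. It assumes binary ground-truth labels (1 = interesting) listed in the order produced by a ranker, and one ranked list per trailer; the official evaluation tool of the task may differ in details, and the function names and toy lists are hypothetical.

```python
def average_precision(ranked_labels):
    """Average precision of one ranked list of 0/1 labels (best first)."""
    hits = 0
    precision_sum = 0.0
    for k, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            # Precision at the rank of each relevant item.
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0

def mean_average_precision(rankings):
    """Mean of the per-list average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Two hypothetical trailers, segments already sorted by predicted score.
trailer_a = [1, 0, 1, 0]   # AP = (1/1 + 2/3) / 2
trailer_b = [0, 1]         # AP = 1/2
print(round(mean_average_precision([trailer_a, trailer_b]), 4))  # 0.6667
```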
13. Conclusions
Remarks
The proposed approach has explored only visual properties. Different
learning-to-rank strategies were considered, including a fusion of all of them.
Findings
The results obtained indicate that the proposed approach is promising.
Combining learning-to-rank algorithms can contribute to better results.
Future work
Investigate a smarter strategy for combining learning-to-rank algorithms,
and consider other information sources to include more features semantically
related to visual content.
14. Acknowledgements
Organizers of Predicting Media Interestingness Task and MediaEval 2016
Brazilian funding agencies
FAPESP, CAPES, and CNPq
15. Thank you!!!
Thank you for your attention!!!
If you have any questions, do not hesitate to contact me:
Jurandy Almeida (jurandy.almeida@unifesp.br)