The document describes a system developed by UNIFESP for the MediaEval 2016 Predicting Media Interestingness task. The system:
1. Uses histograms of motion patterns to extract visual features from video segments.
2. Employs various learning-to-rank algorithms like Ranking SVM, RankNet, RankBoost and ListNet to predict interestingness.
3. Uses a majority voting scheme to combine the rankings from different algorithms and improve the prediction results.
UNIFESP Predicting Media Interestingness Using Motion Histograms
UNIFESP at MediaEval 2016: Predicting Media Interestingness Task
Jurandy Almeida
GIBIS Lab, Institute of Science and Technology, Federal University of São Paulo – UNIFESP
jurandy.almeida@unifesp.br
Introduction
• Developed for the MediaEval 2016 Predicting Media Interestingness Task, and for its video subtask only.
• The goal is to automatically select the most interesting video segments according to a common viewer.
• The focus is on features derived from audio-visual content or associated textual information.
Proposed Approach
It relies on combining learning-to-rank algorithms and exploiting visual information:
1. A simple histogram of motion patterns is used for processing visual information.
2. A majority voting scheme is used for combining machine-learned rankers and predicting the interestingness of videos.
Visual Features
• Low-Level & Mid-Level Features: Not used
• An algorithm is applied to encode visual properties from video segments:
– “Comparison of Video Sequences with Histograms of Motion Patterns” [1].
• It relies on three steps:
1. partial decoding;
2. feature extraction;
3. signature generation.
[Figure: Histograms of Motion Patterns (HMP). 1: Partial Decoding (video frames, I-frames, macroblocks, pixel blocks, DC coefficients). 2: Feature Extraction (time series of macroblocks; temporal and spatial comparisons across the previous, current, and next frames yield a binary motion pattern, e.g. 0101100110010011). 3: Signature Generation (histogram distribution of motion patterns).]
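The feature extraction and signature generation steps can be illustrated with a small sketch. It is not the HMP algorithm of [1]: it assumes the partial decoding step has already produced an array of per-block DC coefficients, and it uses a deliberately simplified 6-bit pattern (two temporal and four spatial comparisons) instead of the 16-bit pattern shown in the figure. The function name `motion_pattern_histogram` and the neighbourhood choice are hypothetical.

```python
import numpy as np


def motion_pattern_histogram(dc_frames):
    """Simplified motion-pattern histogram (illustrative, not the HMP of [1]).

    dc_frames: array of shape (T, H, W) with one DC coefficient per block,
    for T frames; each dimension must be at least 3.
    Returns a normalized histogram over 2**6 = 64 binary patterns.
    """
    dc = np.asarray(dc_frames, dtype=float)
    if dc.ndim != 3 or min(dc.shape) < 3:
        raise ValueError("expected a (T, H, W) array with T, H, W >= 3")

    # Interior blocks only, so every block has all six neighbours.
    cur = dc[1:-1, 1:-1, 1:-1]
    neighbours = [
        dc[:-2, 1:-1, 1:-1],   # same block, previous frame (temporal)
        dc[2:, 1:-1, 1:-1],    # same block, next frame (temporal)
        dc[1:-1, :-2, 1:-1],   # block above (spatial)
        dc[1:-1, 2:, 1:-1],    # block below (spatial)
        dc[1:-1, 1:-1, :-2],   # block to the left (spatial)
        dc[1:-1, 1:-1, 2:],    # block to the right (spatial)
    ]

    # One bit per comparison: 1 if the neighbour is brighter than the block.
    codes = np.zeros(cur.shape, dtype=np.int64)
    for bit, nb in enumerate(neighbours):
        codes |= (nb > cur).astype(np.int64) << bit

    # Signature: relative frequency of each pattern in the segment.
    hist = np.bincount(codes.ravel(), minlength=64).astype(float)
    return hist / hist.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signature = motion_pattern_histogram(rng.integers(0, 256, size=(10, 8, 8)))
    print(signature.shape)
```

The resulting fixed-length vector is the kind of signature that could be fed to the rankers; the real descriptor differs in how patterns are formed and counted.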
Learning to Rank Strategies
• Ranking SVM [5]: Use the traditional SVM classifier to learn a ranking function.
• RankNet [2]: Probability distribution metrics as cost functions to be optimized.
• RankBoost [4]: Regression error on weighted distributions of pairwise rankings.
• ListNet [3]: Extension of RankNet that uses a ranked list instead of pairwise rankings.
• Majority Voting [6]: The label with the most votes is selected as the label for a given instance.
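[Figure: Combining rankings. The input is passed to the rankers R1, R2, ..., RN, which produce the outputs O1, O2, ..., ON; these are combined into the final output ô.]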
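The fusion step can be sketched as follows, assuming each ranker's output has already been turned into one label per video segment (for example, interesting or not). How the labels are derived from the rankings and how ties are broken are not stated here, so the function name `majority_vote` and the tie-breaking rule below are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(votes_per_ranker):
    """Fuse per-instance labels from several rankers by majority voting.

    votes_per_ranker[r][i] is the label that ranker r assigns to instance i.
    Ties are broken in favour of the earliest ranker whose label reaches the
    top count (an assumption made for this sketch).
    """
    if not votes_per_ranker:
        raise ValueError("at least one ranker is required")
    n = len(votes_per_ranker[0])
    if any(len(votes) != n for votes in votes_per_ranker):
        raise ValueError("all rankers must label the same instances")

    fused = []
    for instance_votes in zip(*votes_per_ranker):
        counts = Counter(instance_votes)
        top = max(counts.values())
        fused.append(next(v for v in instance_votes if counts[v] == top))
    return fused


if __name__ == "__main__":
    # Three hypothetical rankers labelling three segments (1 = interesting).
    outputs = [
        [1, 0, 0],
        [1, 1, 0],
        [0, 1, 0],
    ]
    print(majority_vote(outputs))  # [1, 1, 0]
```

With an odd number of rankers and binary labels no tie can occur; with five rankers, as in Run 5, a segment would need at least three votes to be labelled interesting under this sketch.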
Experimental Protocol
• 4-fold cross validation
• Development data
– 5,054 videos from 52 movie trailers
• Test data
– 2,342 videos from 26 movie trailers
• Mean Average Precision (MAP)
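For reference, Mean Average Precision can be computed as in the sketch below. It assumes binary relevance labels listed in ranked order and one ranked list per movie trailer, which matches the per-trailer AP reported in the results; the official evaluation tool may differ in details such as tie handling. The function names are illustrative.

```python
def average_precision(ranked_labels):
    """Average precision of one ranked list of binary labels (1 = interesting).

    ranked_labels is ordered from the top-ranked segment to the last one.
    Returns 0.0 when the list contains no interesting segment.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the average precision over several ranked lists."""
    if not rankings:
        raise ValueError("at least one ranking is required")
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Two hypothetical trailers with their segments in predicted order.
    trailer_a = [1, 0, 1, 0]   # AP = (1/1 + 2/3) / 2
    trailer_b = [0, 1]         # AP = 1/2
    print(mean_average_precision([trailer_a, trailer_b]))
```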
Configurations of Runs
Run  Learning-to-Rank Strategy
1    Ranking SVM
2    RankNet
3    RankBoost
4    ListNet
5    Majority Voting
Experimental Results
[Figure: two bar charts of MAP (%) for Ranking SVM, RankNet, RankBoost, ListNet, and Majority Voting. Left: results obtained on the development data. Right: results of the official submitted runs. Bar labels recovered from the figure: 18.15, 16.17, 16.17, 16.56, 14.35.]
[Figure: AP per movie trailer achieved in each run. Average Precision (%) for video-52 to video-77, with one series each for Ranking SVM, RankNet, RankBoost, ListNet, and Majority Voting.]
The learning-to-rank algorithms provide complementary information that can be combined by fusion techniques aiming at producing better results.
Remarks
• The proposed approach has explored only visual properties. Different learning-to-rank strategies were considered, including a fusion of all of them.
• Results demonstrate that the proposed approach is promising. Combining learning-to-rank algorithms can contribute to better results.
Future Work
Future work includes investigating a smarter strategy for combining learning-to-rank algorithms and considering other information sources to include more features semantically related to visual content.
Acknowledgements
This research was supported by Brazilian agencies FAPESP, CAPES, and CNPq.
References
[1] J. Almeida, N. J. Leite, and R. S. Torres. Comparison of video sequences with Histograms of Motion Patterns. In ICIP, pages 3673–3676, 2011.
[2] C. J. C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. N. Hullender. Learning to rank using gradient descent. In ICML, pages 89–96, 2005.
[3] Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, and H. Li. Learning to rank: from pairwise approach to listwise approach. In ICML, pages 129–136, 2007.
[4] Y. Freund, R. D. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933–969, 2003.
[5] T. Joachims. Training linear SVMs in linear time. In ACM SIGKDD, pages 217–226, 2006.
[6] L. Lam and C. Y. Suen. Application of majority voting to pattern recognition: an analysis of its behavior and performance. IEEE Trans. Systems, Man, and Cybernetics, Part A, 27(5):553–568, 1997.