Presenter: Emmanuel Dellandréa, Ecole Centrale de Lyon, France
Paper: http://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_5.pdf
Video: https://youtu.be/nOC83JXWS2E
Authors: Emmanuel Dellandrea, Martijn Huigsloot, Liming Chen, Yoann Baveye, Mats Sjoberg
Abstract: This paper provides a description of the MediaEval 2017 “Emotional Impact of Movies” task, which builds on previous years’ editions. In this year’s task, participants are expected to create systems that automatically predict the emotional impact that video content will have on viewers, in terms of valence, arousal and fear. Here we describe the use case, task challenges, dataset and ground truth, task run requirements, and evaluation metrics.
3. Context
An evolution of previous years’ tasks on violence, affect and emotion prediction from videos
Applications:
Personalized content delivery
Movie recommendation
Video editing supervision
Video summarization
Protection of children from potentially harmful content
MediaEval'17, 13-15 September 2017, Dublin, Ireland
4. Task description
Goal: Deploy multimedia features and models to
automatically predict the emotional impact of movies
Emotion considered in terms of induced valence,
arousal and fear
Long movies are considered, and the emotional impact has to be predicted for consecutive 10-second segments sliding over the whole movie with a shift of 5 seconds
Local prediction of emotion
Should make it possible to benefit from the audio-visual context and temporal dependencies
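The segmentation described above (10-second windows with a 5-second shift) can be sketched as follows; this is a minimal illustration of the windowing scheme, not official task code:

```python
def segment_starts(duration_s, window_s=10, shift_s=5):
    """Start times (in seconds) of consecutive windows sliding over a movie.

    Only windows that fit entirely within the movie are kept.
    """
    starts = []
    t = 0
    while t + window_s <= duration_s:
        starts.append(t)
        t += shift_s
    return starts

# e.g. a 30-second clip yields five overlapping 10 s segments
print(segment_starts(30))  # [0, 5, 10, 15, 20]
```

Because consecutive windows overlap by 5 seconds, a model can exploit the shared audio-visual context between neighboring segments.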
6. Run submissions and evaluation
Up to 5 runs for each subtask
Models can rely on the features provided by the organizers or on any other external data
Standard evaluation metrics:
Valence/arousal prediction subtask (regression problem): Mean Squared Error, Pearson’s Correlation Coefficient
Fear prediction subtask (binary classification problem): Accuracy, Precision, Recall and F1-score
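As a minimal sketch, these standard metrics can be written in plain Python (illustrative only; participants would typically use library implementations such as scikit-learn):

```python
import math

def mse(y_true, y_pred):
    """Mean Squared Error for the valence/arousal regression subtask."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson(y_true, y_pred):
    """Pearson's Correlation Coefficient between predictions and ground truth."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def fear_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall, F1 for binary fear prediction (1 = fear)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1
```

Note that for an unbalanced fear/no-fear distribution (see the conclusion), accuracy alone is misleading, which is why precision, recall and F1-score are also reported.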
7. Dataset: LIRIS-ACCEDE
Development set:
30 movies selected among 160 movies under Creative Commons licenses
Duration between 117s and 4,566s (total duration: ~7 hours)
Continuous induced valence and arousal self-assessments
Test set:
14 other movies selected among the same set of 160 movies
Duration between 210s and 6,260s (total duration: ~8 hours)
Audio and visual features provided:
1582 general-purpose audio features
11 types of visual features (VGG16, LBP, ACC, Tamura, ...)
LIRIS-ACCEDE available at:
http://liris-accede.ec-lyon.fr
24. Conclusion
Participants’ approaches provided encouraging results (better than last year for valence/arousal prediction)
Arousal generally better predicted than valence
(consistent with the literature)
Some submissions rely on features/models to cope with
temporal dependencies
Only half of the registered participants submitted runs
→ is the task too difficult?
Both subtasks remain particularly challenging
High subjectivity of emotions
Unbalanced data for fear prediction