Paper: http://ceur-ws.org/Vol-2882/paper52.pdf
Janadhip Jacutprakart, Rukiye Savran Kiziltepe, John Q. Gan, Giorgos Papanastasiou and Alba G. Seco de Herrera: Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we present the approach and the main results of the Essex-NLIP team's participation in the MediaEval 2020 Predicting Media Memorability task. The task requires participants to build systems that predict short-term and long-term memorability scores for the real-world video samples provided. Our approach focuses on colour-based visual features as well as the video annotation metadata, and we also explored hyper-parameter tuning. Despite the simplicity of the methodology, our approach achieves competitive results. We investigated the use of different visual features and assessed memorability prediction performance across various regression models, selecting Random Forest regression as our final model for predicting the memorability of videos.
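The final model described in the abstract is a Random Forest regressor fitted on visual feature vectors. A minimal sketch of that setup with scikit-learn is shown below; the synthetic feature matrix, score vector, and the use of Spearman's rank correlation as the validation measure are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch only: Random Forest regression on per-video feature vectors to
# predict memorability scores. The random data below stands in for real
# features (e.g. flattened colour histograms) and real memorability labels.
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((200, 64))   # placeholder feature vectors, one row per video
y = rng.random(200)         # placeholder memorability scores in [0, 1]

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Rank correlation between predicted and held-out scores (an assumed
# evaluation choice here; rankings are what memorability tasks compare).
rho, _ = spearmanr(y_val, model.predict(X_val))
```

Hyper-parameters such as `n_estimators` would be the natural targets of the tuning mentioned above.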
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
1. MediaEval 2020
Predicting Media Memorability
Janadhip Jacutprakart, Rukiye Savran Kiziltepe, John Q. Gan,
Alba García Seco de Herrera, Giorgos Papanastasiou
School of Computer Science and Electronic Engineering
University of Essex
15 December 2020
https://essexnlip.uk/
2. Features
AlexNetFC7
HOG
HSVHist
RGBHist
LBP
VGGFC7
C3D
Descriptive*
Text descriptions
Annotations
Response time
Key press
Video position
Short-term score
Long-term score
5. Conclusion
Achieved the highest score with colour-based features and metadata from the video-position annotation on the development set.
Competitive results can be achieved with a simple approach; the video position and the number of annotations were found to influence the memorability score.
MediaEval is a benchmarking initiative dedicated to evaluating new algorithms for multimedia access and retrieval.
The main purpose of this system is to automatically identify whether a video will remain fresh in our memory for a
period of time. Remembering videos is a key aspect of advertisement, entertainment, and recommendation systems: we
are highly likely to talk about a video that remains fresh in our memory and to share its contents with others.
Creating memorable video content is crucial for generating consumer impact, engaging entertainment, and profitable
marketing campaigns. Understanding and predicting memorability as a function of video features is therefore important
for computational video analysis tasks.
The memorability dataset comprises 10,000 short soundless videos, split into 8,000 videos for the development set and
2,000 videos for the test set. The videos, each 7 seconds long, were extracted from raw footage used by professionals
when creating content.
The features are stored in individual folders per feature type and in individual CSV files per sample. For example, the Features folder contains 7 sub-folders, one per feature:
AlexNetFC7 (image-level feature)
HOG (image-level feature)
HSVHist (image-level feature)
RGBHist (image-level feature)
LBP (image-level feature)
VGGFC7 (image-level feature)
C3D (video-level feature)
For the image-level features, we extract features from 3 frames of each video, each stored in an individual file whose filename is composed as follows: <video_id>-<frame_no>.csv. The 3 frames per video are the first, the middle and the last frame of the movie. For example, for video_id 8 we extract the following AlexNet feature files (please keep in mind that the same structure applies to all the image-level feature folders):
AlexNetFC7/00008-000.csv : AlexNetFC7 feature for video_id = 8, frame_no = 0 (first frame)
AlexNetFC7/00008-098.csv : AlexNetFC7 feature for video_id = 8, frame_no = 98 (middle frame)
AlexNetFC7/00008-195.csv : AlexNetFC7 feature for video_id = 8, frame_no = 195 (last frame)
...
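The image-level filename convention above (zero-padded video id, dash, zero-padded frame number) can be turned into paths programmatically. The helper below is hypothetical, not part of the released dataset tooling; the directory name and frame numbers are taken from the example above.

```python
# Hypothetical helper illustrating the <video_id>-<frame_no>.csv convention
# for image-level feature files; paths are examples, not guaranteed layout.
from pathlib import Path

def frame_feature_paths(feature_dir, video_id, frame_nos):
    """Build the CSV paths for the extracted frames of one video.

    video_id is zero-padded to 5 digits and frame_no to 3 digits,
    matching the filenames listed above.
    """
    return [Path(feature_dir) / f"{video_id:05d}-{frame_no:03d}.csv"
            for frame_no in frame_nos]

# First, middle and last frame of video_id 8, as in the example above.
paths = frame_feature_paths("AlexNetFC7", 8, [0, 98, 195])
```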
For the video-level features, we extract 1 feature per video, where the filenames are composed as follows: <video_id>.mp4.csv. Using the same video_id 8 as an example, we extract the following C3D feature file:
C3D/00008.mp4.csv : C3D features for video_id = 8
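The video-level convention can be handled the same way. The sketch below builds the `<video_id>.mp4.csv` path and reads a one-row feature CSV into floats; both functions are illustrative helpers assumed here, not part of the dataset release.

```python
# Hypothetical helpers for the video-level <video_id>.mp4.csv convention.
import csv
from pathlib import Path

def c3d_feature_path(feature_dir, video_id):
    """Build the path of the video-level feature CSV for one video."""
    return Path(feature_dir) / f"{video_id:05d}.mp4.csv"

def load_feature_vector(path):
    """Read a one-row feature CSV into a list of floats."""
    with open(path, newline="") as f:
        row = next(csv.reader(f))
    return [float(v) for v in row]
```

For video_id 8, `c3d_feature_path("C3D", 8)` yields the `C3D/00008.mp4.csv` path listed above.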