MediaEval 2016 - ETH-CVL: Textual-Visual Embeddings and Video2GIF for Video Interestingness

Presenter: Michael Gygli
ETH-CVL @ MediaEval 2016: Textual-Visual Embeddings and Video2GIF for Video Interestingness
In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016), by Arun B. Vasudevan, Michael Gygli, Anna Volokitin, Luc Van Gool
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_24.pdf
Video: https://youtu.be/8qe-NIPSD-4
Abstract: This paper presents the methods that underlie our submission to the Predicting Media Interestingness Task at MediaEval 2016. Our contribution relies on two main approaches: (i) a similarity metric between image and text and (ii) a generic video highlight detector. In particular, we develop a method for learning the similarity of text and images by projecting them into the same embedding space. This embedding allows us to find video frames that are both canonical and relevant w.r.t. the title of the video. We present the results of different configurations and give insights into when our best-performing method works well and where it has difficulties.

ETH-CVL @ MediaEval 2016: Textual-Visual
Embeddings and Video2GIF for Video Interestingness
Michael Gygli, PhD student @ Computer Vision Lab, ETH Zurich
Work with Arun Balajee Vasudevan, Anna Volokitin and Luc Van Gool

Content
▪ Overview
▪ Textual-visual embedding
▪ Video2GIF
▪ Results
Michael Gygli, PhD student @ ETH Zurich, 06/21/2016

Finding the most interesting and relevant content
▪ Two dominant approaches
▪ Use of textual information (title, category) to obtain a video specific
model, often using web image priors
E.g. [Khosla et al. CVPR 13], [Liu et al. IJCAI 09], [Song et al. CVPR 15]
▪ Supervised learning with generic features and large training set
E.g. [Potapov et al. ECCV 2014], [Sun et al. ECCV 2014], [Gygli et al.
CVPR 16]
▪ Some methods use both
E.g. [Liu et al. CVPR 15]

Multi-task deep visual-semantic embedding for video
thumbnail selection [Liu et al. CVPR 15]
▪ Use Bing image search data (query, image, # of clicks) to learn a
joint embedding space for images and text
▪ Compute frame relevance as cosine similarity between the query or
title embedding and the frame embedding
[Liu et al. CVPR 2015]
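
The relevance computation described on this slide can be sketched in a few lines. This is a minimal illustration, assuming the title and frame embeddings are already available as plain lists of floats; the function names and the toy 3-dimensional vectors are hypothetical and not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


def rank_frames(title_embedding, frame_embeddings):
    """Return (frame_index, relevance) pairs, most relevant frame first."""
    scored = [
        (index, cosine_similarity(title_embedding, frame))
        for index, frame in enumerate(frame_embeddings)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings, made up for illustration only.
title = [1.0, 0.0, 1.0]
frames = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
ranking = rank_frames(title, frames)
```

Because cosine similarity ignores vector length, the second toy frame (a scaled copy of the title vector) ranks first.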

Our contribution with improvements over Liu et al.
● Siamese network
● Text Embedding Model
○ Words encoded through word2vec
○ LSTM to obtain fixed length embedding
● Convolutional Neural Network model
○ Fine-tuned VGG-19
● Training data: triplets (query+, image+, image-) used to learn a ranking
E.g. (“cat”, [positive image], [negative image])

Video2GIF: Automatic Generation of Animated GIFs
from Video [Gygli et al. CVPR 16]
Approach
▪ Work with segments as units
▪ Obtained through change-point detection [Song et al. CVPR 15]
▪ Train a deep neural network for ranking segments
...
[Figure: example video with its highest-scoring and lowest-scoring segments]

Video2GIF: Automatic Generation of Animated GIFs
from Video
Approach
▪ Train a deep neural network for ranking
segments
▪ Built on C3D network [Tran et al. ICCV
2015]
▪ Objective: score positives higher than negatives, i.e. h(s+) > h(s-)
h: scoring function
s+: positive segment
s-: negative segment

Video2GIF: Dataset
Available on github.com/gyglim/video2gif_dataset
▪ Large-scale training data: GIFs created from YouTube videos
▪ Align GIF back to video
▪ This part defines a positive example
▪ Assume non-selected parts are less interesting than selected part
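
The labelling idea on this slide can be sketched as follows: segments covered by the aligned GIF become positives, untouched segments become negatives, and every (positive, negative) pair is a training example for the objective of scoring positives higher than negatives. The overlap threshold, the handling of partially overlapping segments and all function names are assumptions made for illustration.

```python
def label_segments(segments, gif_start, gif_end, min_overlap=0.5):
    """Split (start, end) segments into positives and negatives.

    A segment is positive if at least `min_overlap` of it lies inside the
    aligned GIF, and negative if it does not overlap the GIF at all.
    Partially overlapping segments are dropped as ambiguous (an assumption).
    """
    positives, negatives = [], []
    for start, end in segments:
        overlap = max(0.0, min(end, gif_end) - max(start, gif_start))
        if overlap / (end - start) >= min_overlap:
            positives.append((start, end))
        elif overlap == 0.0:
            negatives.append((start, end))
    return positives, negatives


def training_pairs(positives, negatives):
    """All (s+, s-) pairs from one video."""
    return [(pos, neg) for pos in positives for neg in negatives]


def fraction_correctly_ranked(h, pairs):
    """Share of pairs for which the scoring function satisfies h(s+) > h(s-)."""
    if not pairs:
        return 0.0
    return sum(1 for pos, neg in pairs if h(pos) > h(neg)) / len(pairs)


# Four 5-second segments; the GIF was aligned to seconds 6..10 of the video.
segments = [(0.0, 5.0), (5.0, 10.0), (10.0, 15.0), (15.0, 20.0)]
positives, negatives = label_segments(segments, gif_start=6.0, gif_end=10.0)
pairs = training_pairs(positives, negatives)


def toy_h(segment):
    """Stand-in for the trained network: prefers segments centred near t = 8 s."""
    return -abs((segment[0] + segment[1]) / 2.0 - 8.0)


accuracy = fraction_correctly_ranked(toy_h, pairs)
```

In the real system `h` would be the trained ranking network; the hand-written `toy_h` only makes the sketch runnable.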

Results
Task    Run   mAP
Image   1     0.1866
Image   2     0.1952
Image   3     0.1858
Video   1     0.1362
Video   2     0.1574
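
The mAP column reports mean average precision. As a reminder of how that metric is defined, here is a minimal sketch of the textbook computation; it is not the task's official evaluation tool, which may differ in details such as tie handling, and the toy scores and labels are made up.

```python
def average_precision(scores, labels):
    """Average precision for one video.

    `scores` are predicted interestingness values, `labels` are 1 for
    interesting items and 0 otherwise. Precision is averaged over the
    ranks at which interesting items appear.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, index in enumerate(order, start=1):
        if labels[index]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_video):
    """Mean of the per-video average precisions."""
    return sum(average_precision(s, l) for s, l in per_video) / len(per_video)


# Two toy videos with (scores, labels) each.
video_a = ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
video_b = ([0.2, 0.7], [1, 0])
ap_a = average_precision(*video_a)
ap_b = average_precision(*video_b)
map_score = mean_average_precision([video_a, video_b])
```

For the first toy video the interesting items sit at ranks 1 and 3, giving precisions 1/1 and 2/3; for the second the single interesting item sits at rank 2, giving 1/2.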
Frame-based (Image task):
• Run 1: Visual-semantic embedding trained on Clickture dataset
• Run 2: As Run 1, with fine-tuning on development set
• Run 3: As Run 1, but trained on a larger subset of Clickture
Segment-based (Video task):
• Run 1: Video2GIF
• Run 2: Averaged score of Visual-semantic embedding and Video2GIF
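
Averaging the two models' scores, as in the second segment-based run, is a simple form of late fusion. The sketch below assumes both models already provide one score per segment. The slides do not say whether scores were rescaled before averaging, so the optional min-max normalization is an assumption, and all names are hypothetical.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        return [0.0 for _ in scores]
    return [(s - lowest) / (highest - lowest) for s in scores]


def fuse_by_averaging(embedding_scores, video2gif_scores, normalize=False):
    """Average two per-segment score lists, optionally after min-max scaling."""
    if len(embedding_scores) != len(video2gif_scores):
        raise ValueError("both models must score the same segments")
    if normalize:
        embedding_scores = min_max(embedding_scores)
        video2gif_scores = min_max(video2gif_scores)
    return [(a + b) / 2.0 for a, b in zip(embedding_scores, video2gif_scores)]


# Toy scores for three segments; the two models use very different scales.
embedding_scores = [0.2, 0.8, 0.5]
video2gif_scores = [4.0, 2.0, 5.0]

plain = fuse_by_averaging(embedding_scores, video2gif_scores)
fused = fuse_by_averaging(embedding_scores, video2gif_scores, normalize=True)
best_segment = max(range(len(fused)), key=lambda i: fused[i])
```

Without rescaling, the model with the larger score range dominates the average, which is why the normalized variant is included here as an option.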

Qualitative Results - Run 2
Title: Captives [images: predicted best frame vs. true best frame]
Title: After Earth [images: predicted best frame vs. true best frame]

More information
▪ Paper Video2GIF: Automatic Generation of Animated GIFs from Video.
M. Gygli, Y. Song, L. Cao, CVPR 2016
▪ Slides on Video Summarization as Subset Selection @ Tutorial on
Optimization Algorithms for Subset Selection and Summarization in Large
Data Sets, CVPR 2016
https://t.co/mQIpxMab3v
▪ Demo website for Video2GIF: http://video2gif.info/autogif
▪ Paper Multi-task deep visual-semantic embedding for video thumbnail
selection. W. Liu, T. Mei, Y. Zhang, C. Che, and J. Luo, CVPR 2015

MediaEval 2016 - ETH-CVL: Textual-Visual Embeddings and Video2GIF for Video Interestingness

1. ETH-CVL @ MediaEval 2016: Textual-Visual Embeddings and Video2GIF for Video Interestingness
Michael Gygli, PhD student @ Computer Vision Lab, ETH Zurich
Work with Arun Balajee Vasudevan, Anna Volokitin and Luc Van Gool

2. Content
▪ Overview
▪ Textual-visual embedding
▪ Video2GIF
▪ Results
Michael Gygli, PhD student @ ETH Zurich, 06/21/2016

3. Finding the most interesting and relevant content
▪ Two dominant approaches
  ▪ Use of textual information (title, category) to obtain a video-specific model, often using web image priors
    E.g. [Khosla et al. CVPR 13], [Liu et al. IJCAI 09], [Song et al. CVPR 15]
  ▪ Supervised learning with generic features and a large training set
    E.g. [Potapov et al. ECCV 2014], [Sun et al. ECCV 2014], [Gygli et al. CVPR 16]
▪ Some methods use both
  E.g. [Liu et al. CVPR 15]

4. Textual-visual embedding

5. Multi-task deep visual-semantic embedding for video thumbnail selection [Liu et al. CVPR 15]
▪ Use Bing image search data (query, image, # of clicks) to learn a joint embedding space for images and text
▪ Compute frame relevance as the cosine similarity between the query or title embedding and the frame embedding
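The relevance computation on this slide can be sketched as follows. This is a minimal illustration, assuming the title and frame embeddings are already available as plain vectors; the function name `frame_relevance` and the toy vectors are hypothetical and not taken from the slides or from Liu et al.

```python
import numpy as np

def frame_relevance(title_embedding, frame_embeddings):
    """Cosine similarity between one title vector and each frame vector."""
    title = np.asarray(title_embedding, dtype=float)
    frames = np.asarray(frame_embeddings, dtype=float)
    title_norm = np.linalg.norm(title)
    frame_norms = np.linalg.norm(frames, axis=1)
    # Small epsilon guards against division by zero for all-zero vectors.
    return frames @ title / (frame_norms * title_norm + 1e-12)

# Toy 2-D embeddings (made up for illustration only).
title_vec = np.array([1.0, 0.0])
frame_vecs = np.array([[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]])

scores = frame_relevance(title_vec, frame_vecs)
best_frame = int(np.argmax(scores))  # index of the most title-relevant frame
```

Because cosine similarity ignores vector length, a frame whose embedding points in the same direction as the title scores highest regardless of its magnitude.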

6. Our contribution with improvements over Liu et al.
● Siamese network
● Text embedding model
  ○ Words encoded through word2vec
  ○ LSTM to obtain a fixed-length embedding
● Convolutional neural network model
  ○ Fine-tuned VGG-19
● Training data based on learning a ranking over triplets (query, image+, image-), e.g. (“cat”, [image], [image])
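One common way to learn such a ranking from (query, image+, image-) triplets is a margin-based hinge loss on cosine similarities. The sketch below is an assumption for illustration: the slides do not state the exact loss, and the function names, the margin value and the toy vectors are all hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def triplet_ranking_loss(query, image_pos, image_neg, margin=0.1):
    """Hinge loss: zero once the positive image beats the negative by `margin`."""
    return max(0.0, margin - cosine(query, image_pos) + cosine(query, image_neg))

# Toy embeddings (made up): `good` is aligned with the query, `bad` is orthogonal.
query = np.array([1.0, 0.0])
good = np.array([1.0, 0.0])
bad = np.array([0.0, 1.0])

loss_ok = triplet_ranking_loss(query, good, bad)   # triplet already ordered correctly
loss_bad = triplet_ranking_loss(query, bad, good)  # triplet ordered the wrong way
```

In a Siamese setup, the gradient of this loss would be propagated into both the text branch and the image branch, so that matching queries and images move closer together in the shared space.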

7. Video2GIF

8. Video2GIF: Automatic Generation of Animated GIFs from Video [Gygli et al. CVPR 16]
Approach
▪ Work with segments as units
  ▪ Obtained through change-point detection [Song et al. CVPR 15]
▪ Train a deep neural network for ranking segments
[Figure: example video with its highest-scoring and lowest-scoring segments]

9. Video2GIF: Automatic Generation of Animated GIFs from Video
Approach
▪ Train a deep neural network for ranking segments
  ▪ Built on the C3D network [Tran et al. ICCV 2015]
  ▪ Objective: score positives higher than negatives
    h: scoring function
    s+: positive segment
    s-: negative segment
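Written out with the symbols defined on this slide, the objective asks the scoring function to order each training pair correctly. The hinge surrogate in the second line, with margin 1, is only one standard way to turn that constraint into a trainable loss and is an assumption here; the loss used in [Gygli et al. CVPR 16] may differ.

```latex
% Ranking constraint stated on the slide
\[
  h(s^{+}) > h(s^{-}) \qquad \text{for every training pair } (s^{+}, s^{-})
\]
% Assumed hinge surrogate (illustrative, not taken from the slides)
\[
  \ell(s^{+}, s^{-}) = \max\bigl(0,\; 1 - h(s^{+}) + h(s^{-})\bigr)
\]
```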

10. Video2GIF: Dataset
Available on github.com/gyglim/video2gif_dataset
▪ Large-scale training data: GIFs created from YouTube videos
▪ Align GIF back to video
  ▪ This part defines a positive example
▪ Assume non-selected parts are less interesting than the selected part

11. Results

Task    Run    mAP
Image   1      0.1866
Image   2      0.1952
Image   3      0.1858
Video   1      0.1362
Video   2      0.1574

Frame-based:
• Run 1: Visual-semantic embedding trained on Clickture dataset
• Run 2: As Run 1, with fine-tuning on development set
• Run 3: As Run 1, but trained on a larger subset of Clickture
Segment-based:
• Run 1: Video2GIF
• Run 2: Averaged score of Visual-semantic embedding and Video2GIF
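The score averaging of segment-based Run 2 and the reported metric can be sketched as follows. This is a minimal illustration, assuming both models produce one score per segment on comparable scales; the function names, scores and labels are made up and do not reproduce the submitted system or its results. mAP is the mean of the per-video average precision computed here.

```python
import numpy as np

def average_precision(labels, scores):
    """Average precision of a ranking: mean precision at each relevant item."""
    labels = np.asarray(labels)
    # Sort items by descending score; a stable sort keeps ties in input order.
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranked = labels[order]
    if ranked.sum() == 0:
        return 0.0
    hits = np.cumsum(ranked)
    ranks = np.arange(1, len(ranked) + 1)
    precision_at_hits = hits[ranked == 1] / ranks[ranked == 1]
    return float(precision_at_hits.mean())

def fuse(scores_a, scores_b):
    """Element-wise average of two per-segment score lists."""
    return (np.asarray(scores_a, dtype=float) + np.asarray(scores_b, dtype=float)) / 2.0

# Hypothetical per-segment scores for one video (1 = interesting, 0 = not).
embedding_scores = [0.9, 0.3, 0.6, 0.1]
video2gif_scores = [0.1, 0.9, 0.8, 0.3]
labels = [0, 1, 1, 0]

fused = fuse(embedding_scores, video2gif_scores)
ap = average_precision(labels, fused)
```

If the two models score on very different scales, a plain average lets one of them dominate, so some form of score normalization before fusing may be needed in practice.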

12. Qualitative Results - Run 2
[Figure: predicted best frame vs. true best frame for the videos titled "Captives" and "After Earth"]

13. More information
▪ Paper: Video2GIF: Automatic Generation of Animated GIFs from Video. M. Gygli, Y. Song, L. Cao, CVPR 2016
▪ Slides on Video Summarization as Subset Selection @ Tutorial on Optimization Algorithms for Subset Selection and Summarization in Large Data Sets, CVPR 2016: https://t.co/mQIpxMab3v
▪ Demo website for Video2GIF: http://video2gif.info/autogif
▪ Paper: Multi-task deep visual-semantic embedding for video thumbnail selection. W. Liu, T. Mei, Y. Zhang, C. Che, and J. Luo, CVPR 2015

14. Questions?
