MediaEval 2015 - GTM-UVigo Systems for Person Discovery Task at MediaEval 2015


In this paper, we present the systems developed by the GTM-UVigo team for the Multimedia Person Discovery in Broadcast TV task at MediaEval 2015. The systems propose two different strategies for person discovery in audio through speaker diarization (one based on an online clustering strategy with error correction using OCR information, the other based on agglomerative hierarchical clustering), as well as intra-shot and inter-shot strategies for face clustering. http://ceur-ws.org/Vol-1436/ http://www.multimediaeval.org

GTM-UVigo Systems for Person Discovery Task
at MediaEval 2015
Paula López Otero, Rosalía Barros, Laura Docío Fernández, Elisardo González Agulla, José Luis Alba Castro, Carmen García Mateo
López Otero, Barros et al.: GTM-UVigo Systems for Person Discovery Task at MediaEval 2015

Main contributions
Error correction in speaker diarization using written names
Face tracking correction using quality scores
Visual Voice activity detection

Speaker diarization + written names
Speech activity detection
Speaker segmentation
Speaker clustering
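The abstract names agglomerative hierarchical clustering as one of the two speaker diarization strategies. The following is a minimal sketch of that general technique, not the authors' implementation: the 2-D toy vectors, the Euclidean distance, the average linkage, and the `stop_threshold` value are all assumptions chosen for illustration, since the slides do not specify the features or the stopping criterion.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, feats):
    """Mean pairwise distance between the segments of two clusters."""
    total = sum(euclidean(feats[i], feats[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clustering(feats, stop_threshold):
    """Merge the closest pair of clusters until none is closer than stop_threshold.

    feats: one feature vector per speech segment (toy stand-in, hypothetical).
    Returns a sorted list of clusters, each a sorted list of segment indices.
    """
    clusters = [[i] for i in range(len(feats))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        (a, b), dist = min(
            (((a, b), average_linkage(clusters[a], clusters[b], feats))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > stop_threshold:
            break  # remaining clusters are treated as different speakers
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(clusters)


# Five toy segments: three near the origin, two near (5, 5).
segments = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.2)]
speaker_clusters = agglomerative_clustering(segments, stop_threshold=1.0)
print(speaker_clusters)
```

In this toy setting each resulting cluster would correspond to one hypothesized speaker; a real system would replace the vectors and the distance with speaker-specific representations.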

Face diarization + shot segmentation
Face detection and tracking
Quality filter
Visual voice activity detection
Face recognition

Results
            REPERE                          INA
            EwMAP     MAP       C           EwMAP     MAP       C
fusion      75.76 %   77.10 %   78.03 %     80.34 %   80.61 %   92.42 %
audio       69.37 %   70.90 %   78.48 %     89.38 %   89.76 %   97.34 %
video       73.94 %   75.29 %   78.03 %     80.66 %   80.94 %   92.46 %
baseline    63.58 %   63.93 %   71.75 %     78.35 %   78.64 %   92.71 %
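To make the comparison with the baseline easier to read, the short sketch below computes the absolute gain in percentage points for each system and corpus. The numbers are copied from the Results slide; the dictionary layout and the helper names `gain_over_baseline` and `best_system` are hypothetical and only serve this illustration.

```python
# Scores in % copied from the Results slide, ordered as (EwMAP, MAP, C).
results = {
    "fusion":   {"REPERE": (75.76, 77.10, 78.03), "INA": (80.34, 80.61, 92.42)},
    "audio":    {"REPERE": (69.37, 70.90, 78.48), "INA": (89.38, 89.76, 97.34)},
    "video":    {"REPERE": (73.94, 75.29, 78.03), "INA": (80.66, 80.94, 92.46)},
    "baseline": {"REPERE": (63.58, 63.93, 71.75), "INA": (78.35, 78.64, 92.71)},
}
METRICS = ("EwMAP", "MAP", "C")


def gain_over_baseline(system, corpus, metric="EwMAP"):
    """Absolute gain over the baseline, in percentage points."""
    i = METRICS.index(metric)
    return round(results[system][corpus][i] - results["baseline"][corpus][i], 2)


def best_system(corpus, metric="EwMAP"):
    """Name of the submitted system with the highest score on one corpus."""
    i = METRICS.index(metric)
    candidates = [name for name in results if name != "baseline"]
    return max(candidates, key=lambda name: results[name][corpus][i])


for corpus in ("REPERE", "INA"):
    for system in ("fusion", "audio", "video"):
        print(corpus, system, gain_over_baseline(system, corpus))
    print(corpus, "best by EwMAP:", best_system(corpus))
```

On EwMAP, the fusion run is the strongest on REPERE, while the audio-only run is the strongest on INA.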

Conclusions
Difficult scenarios:
Audio: background music, noise.
Video: face pose and distance to the camera, video quality.
Face approaches work better in REPERE, but the speech approach works better in INA.
Future work: finding a smarter way to combine speech and video.


