The MediaEval 2015 Affective Impact of Movies Task challenged participants to automatically find violent scenes in a set of videos and, additionally, to predict the affective impact that video content will have on viewers. We propose the use of several multimodal descriptors (visual, motion, and auditory features) and fuse their predictions to detect violent or affective content. Our best-performing run with regard to the official metric achieved a MAP of 0.1419 on the violence detection task, and accuracies of 45.038% for arousal estimation and 36.123% for valence estimation.
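The fusion step described above can be illustrated as a simple late fusion of per-modality scores. This is only a minimal sketch: the weights, score values, and function names below are made-up examples, not the ones used in the submitted runs.

```python
import numpy as np

def late_fusion(scores, weights):
    """Fuse per-modality probability scores into one score per segment
    by a weighted average (weights need not sum to one)."""
    total = sum(weights.values())
    return sum(weights[m] * np.asarray(scores[m]) for m in scores) / total

# Example: three modalities scoring four video segments for violent content.
scores = {
    "visual": [0.9, 0.2, 0.4, 0.7],
    "motion": [0.8, 0.1, 0.5, 0.6],
    "audio":  [0.7, 0.3, 0.3, 0.9],
}
weights = {"visual": 0.5, "motion": 0.3, "audio": 0.2}
fused = late_fusion(scores, weights)   # one fused score per segment
```

In practice the weights would be tuned on a validation set rather than fixed by hand.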
http://ceur-ws.org/Vol-1436/
http://www.multimediaeval.org
CHI 2016: An EEG-based Approach for Evaluating Graphic Icons from the Perspec...
Fu-Yin Cherng
Graphic icons play an increasingly important role in interface design due to the proliferation of digital devices in recent years. Their ability to express information in a universal fashion allows us to immediately interact with new applications, systems, and devices. Icons can, however, cause user confusion and frustration if designed poorly. Several studies have evaluated icons using behavioral-performance metrics such as reaction time as well as self-report methods. However, determining the usability of icons based on behavioral measures alone is not straightforward, because users’ interpretations of the meaning of icons involve various cognitive processes and perceptual mechanisms. Moreover, these perceptual mechanisms are affected not only by the icons themselves, but by usage scenarios. Thus, we need a means of sensitively and continuously measuring users’ different cognitive processes when they are interacting with icons. In this study, we propose an EEG-based approach to icon evaluation, in which users’ EEG signals are measured in multiple usage scenarios. Based on a combination of EEG and behavioral results, we provide a novel interpretation of the participants’ perception during these tasks, and identify some important implications for icon design.
Meta learned Confidence for Few-shot Learning
KIMMINHA3
"Meta learned Confidence for Few-shot Learning" was presented at CVPR in 2020.
Few-shot learning is an important challenge under data scarcity.
When plenty of unlabeled data is available alongside the scarce labeled data, common approaches include:
a) leveraging a nearest-neighbor graph, and
b) using predicted soft or hard labels on unlabeled samples to update the class prototype.
In both cases, however, the model confidence may be unreliable, which may lead to incorrect predictions.
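The paper itself meta-learns the confidence measure; purely as an illustration of approach (b), here is a generic transductive refinement step in which soft labels on unlabeled samples update the class prototypes. All names and values below are our own toy construction, not the paper's method.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def refine_prototypes(prototypes, unlabeled, support, support_labels, temp=1.0):
    """One refinement step: predict soft labels for unlabeled samples from
    their negative squared distance to each prototype, then recompute each
    prototype as the confidence-weighted mean of support + unlabeled points."""
    d2 = ((unlabeled[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    soft = softmax(-d2 / temp, axis=1)       # (num_unlabeled, num_classes)
    pts = np.concatenate([support, unlabeled], axis=0)
    new_protos = []
    for c in range(prototypes.shape[0]):
        hard = (support_labels == c).astype(float)  # one-hot support weights
        w = np.concatenate([hard, soft[:, c]])      # soft unlabeled weights
        new_protos.append((w[:, None] * pts).sum(0) / w.sum())
    return np.stack(new_protos)

# Toy 2-class, 2-D episode: each prototype drifts toward the nearby
# unlabeled point, since that point gets a confident soft label.
protos = np.array([[0.0, 0.0], [10.0, 10.0]])
support = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = np.array([0, 1])
unlabeled = np.array([[1.0, 1.0], [9.0, 9.0]])
refined = refine_prototypes(protos, unlabeled, support, labels)
```

When a soft label is wrong but confident, the prototype drifts toward the wrong cluster, which is exactly the unreliable-confidence failure mode noted above.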
This paper describes a privacy protection method based on a false coloring approach for the Drone Protect Task of MediaEval 2015. The aim is to obscure regions of a video that are privacy sensitive without sacrificing intelligibility and pleasantness. False coloring uses a color palette to transform the original colors of pixels into a different set of colors in which private information is harder to recognize. The method can be applied globally to an entire frame of the video or to a specific region of interest (ROI). The privacy protected output is expected to remain pleasant, and, when needed, a close approximation of the original input can be recovered. Benchmarking evaluations on the mini-drone dataset show promising results, especially for the intelligibility and pleasantness criteria.
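The core palette transform can be sketched as a per-pixel lookup table applied inside a region of interest. The particular palette below is an arbitrary invertible example of ours, not the paper's; recoverability depends on the palette mapping being invertible and kept private.

```python
import numpy as np

def false_color_roi(gray, palette, roi):
    """Apply a 256-entry RGB palette to the grayscale pixels inside
    roi = (y0, y1, x0, x1); pixels outside the ROI pass through as gray."""
    y0, y1, x0, x1 = roi
    out = np.stack([gray] * 3, axis=-1)              # pass-through RGB frame
    out[y0:y1, x0:x1] = palette[gray[y0:y1, x0:x1]]  # lookup-table recoloring
    return out

# An example invertible palette: each gray level maps to a unique RGB triple.
palette = np.zeros((256, 3), dtype=np.uint8)
palette[:, 0] = (np.arange(256) * 7) % 256    # scrambled red channel
palette[:, 1] = np.arange(256)[::-1]          # inverted green channel
palette[:, 2] = (np.arange(256) * 13) % 256   # scrambled blue channel

frame = np.tile(np.arange(64, dtype=np.uint8), (64, 1))  # toy gradient frame
protected = false_color_roi(frame, palette, (16, 48, 16, 48))
```

Because the green channel alone is a bijection of the gray level, the original ROI intensities can be recovered exactly by whoever holds the palette.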
MediaEval 2015 - Exploring Microblog Activity for the Prediction of Hyperlink...
multimediaeval
In this paper, we present a social media based approach to finding anchors in video archives. We use social activity on Twitter to find topics that people have questions about in order to select suitable anchors. The experiments were carried out on the MediaEval Search and Anchoring in Video Archives Task (SAVA) data set, consisting of 68 hours of BBC video content broadcast in 2008. The performance of our relatively simple but straightforward method seems sufficiently promising to pursue further research.
MediaEval 2015 - Query by Example Search on Speech Task
multimediaeval
In this paper, we describe the “Query by Example Search on Speech Task” (QUESST), held as part of the MediaEval 2015 evaluation campaign. As in previous years, the proposed task requires performing language-independent audio search in a low resource scenario. This year, the task has been designed to get as close as possible to a practical use case scenario, in which a user would like to retrieve, using speech, utterances containing a given word or short sentence, including those with limited inflectional variations of words, some filler content and/or word re-orderings. The task also stresses acoustic mismatch caused by noise and reverberation.
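Query-by-example search of this kind is commonly implemented with subsequence dynamic time warping over frame-level features. The following is a minimal sketch with toy 1-D feature values (real QUESST systems use phoneme posteriors or bottleneck features), not the task's official baseline.

```python
import numpy as np

def subsequence_dtw_cost(query, utterance):
    """Minimum length-normalized DTW cost of aligning `query` anywhere
    inside `utterance` (start and end are free on the utterance axis)."""
    n, m = len(query), len(utterance)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = 0.0                              # free start in the utterance
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - utterance[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, 1:].min() / n                  # free end; normalize by length

query = [1.0, 2.0, 3.0]
match = [9.0, 1.0, 2.0, 3.0, 9.0]     # utterance containing the query exactly
no_match = [5.0, 5.0, 5.0, 5.0, 5.0]  # utterance unrelated to the query
```

Utterances are then ranked by this cost; a detection threshold on the normalized cost yields the hard yes/no decisions that metrics like Cnxe and TWV evaluate.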
MediaEval 2015 - The C@merata Task at MediaEval 2015: Natural Language Querie...
multimediaeval
This was the second year of the C@merata task, which relates natural language processing to music information retrieval. Each participant builds a system that takes as input a query and a music score and produces as output one or more matching passages in the score. This year, the questions were more difficult and the scores were more complex. The participants were the same as last year, and once again CLAS performed best, with a Beat F-Score of 0.620.
MediaEval 2015 - The NNI Query-by-Example System for MediaEval 2015
multimediaeval
This paper describes the system developed by the NNI team for the Query-by-Example Search on Speech Task (QUESST) in the MediaEval 2015 evaluation. Our submitted system mainly used bottleneck features/stacked bottleneck features (BNF/SBNF) trained from various resources. We investigated noise robustness techniques to deal with this year's noisy data. The submitted system obtained an actual normalized cross entropy (actCnxe) of 0.761 and an actual Term Weighted Value (actTWV) of 0.270 on all types of queries of the evaluation data.
Presenter: Jaeyoung Choi
The Placing Task at MediaEval 2016 In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Jaeyoung Choi, Claudia Hauff, Olivier Van Laere, Bart Thomee
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_7.pdf
Video: https://youtu.be/8646sUGqMbg
Abstract: The seventh edition of the Placing Task at MediaEval focuses on two challenges: (1) estimation-based placing, which addresses estimating the geographic location where a photo or video was taken, and (2) verification-based placing, which addresses verifying whether a photo or video was indeed taken at a pre-specified geographic location. Like the previous edition, we made the organizer baselines for both subtasks available as open source code, and published a live leaderboard that allows the participants to gain insights into the effectiveness of their approaches compared to the official baselines and in relation to each other at an early stage, before the actual run submissions are due.
MediaEval 2016 - UNED-UV @ Retrieving Diverse Social Images Task
multimediaeval
Ana Garcia Serrano
UNED-UV @ Retrieving Diverse Social Images Task In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Angel C. González, Xaro B. Garcia, Ana García-Serrano, Esther de Ves Cuenca
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_17.pdf
Video: https://youtu.be/EyuN7IA1HFk
Abstract: This paper details the participation of the UNED-UV group at the MediaEval 2016 Retrieving Diverse Social Images Task using a multimodal approach. Several Local Logistic Regression models, which use the visual low-level features, estimate the relevance probability for all the images in the dataset. The images are then ranked by selecting the highest-probability image from each of the textual clusters. These textual clusters are generated with a textual algorithm based on Formal Concept Analysis (FCA) and Hierarchical Agglomerative Clustering (HAC) to detect the latent topics addressed, and the images are finally diversified according to the detected topics.
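The ranking strategy described in the abstract (take the most probable image from each textual cluster in turn) can be sketched as a round-robin over clusters. This is our own minimal reconstruction with made-up data, not the authors' implementation.

```python
def diversify(probs, clusters):
    """Order image indices by cluster round-robin, best-first: clusters are
    visited in order of their top image's relevance probability, and each
    visit takes that cluster's next-best unused image."""
    by_cluster = {}
    for idx, c in enumerate(clusters):
        by_cluster.setdefault(c, []).append(idx)
    for c in by_cluster:
        by_cluster[c].sort(key=lambda i: -probs[i])   # best first per cluster
    order = sorted(by_cluster, key=lambda c: -probs[by_cluster[c][0]])
    ranking, depth = [], 0
    while len(ranking) < len(probs):
        for c in order:
            if depth < len(by_cluster[c]):
                ranking.append(by_cluster[c][depth])
        depth += 1
    return ranking

# Five images, relevance probabilities, and their textual cluster ids.
probs    = [0.9, 0.8, 0.7, 0.6, 0.5]
clusters = [0,   0,   1,   1,   2]
ranking = diversify(probs, clusters)   # one image per cluster, then repeat
```

The first pass over the clusters maximizes diversity at the top of the list, which is what the task's cluster-recall metric rewards.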
MediaEval 2016 - Simula Team @ Context of Experience Task
multimediaeval
Presenter: Konstantin Pogorelov
Simula @ MediaEval 2016 Context of Experience Task In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Carsten Griwodz
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_53.pdf
Video: https://youtu.be/FTIeGpHhURU
Abstract: This paper presents our approach for the Context of Multimedia Experience Task of the MediaEval 2016 Benchmark. We present different analyses of the given data using different subsets of the data sources and combinations of them. Our approach gives a baseline evaluation indicating that metadata-based approaches work well, but that visual features can also provide useful information for the given problem.
MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...
multimediaeval
Presenter: Bogdan Boteanu
LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-Relevance Feedback Diversification Perspective In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Bogdan Boteanu, Mihai G. Constantin, Bogdan Ionescu
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_20.pdf
Video: https://youtu.be/mDI8Z31p7TY
Abstract: In this paper we present the results achieved during the 2016 MediaEval Retrieving Diverse Social Images Task, using an approach based on pseudo-relevance feedback in which human feedback is replaced by an automatic selection of images. The proposed approach is designed to prioritize the diversification of the results, in contrast to most existing techniques, which address only relevance. Diversification is achieved by a hierarchical clustering scheme followed by a diversification strategy. The methods are tested on the benchmarking data and the results are analyzed. Insights for future work conclude the paper.
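To illustrate the pseudo-relevance feedback idea (automatic selection standing in for human feedback), here is a generic Rocchio-style sketch: the top of the initial ranking is treated as pseudo-relevant, the bottom as pseudo-non-relevant, and everything is re-scored against the updated query. This is a textbook stand-in of ours, not the authors' hierarchical-clustering pipeline; all parameter values are illustrative.

```python
import numpy as np

def prf_rescore(features, initial_ranking, query,
                n_pos=2, n_neg=1, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style update: move the query vector toward pseudo-relevant
    items and away from pseudo-non-relevant ones, then rank all items by
    cosine similarity to the updated query."""
    pos = features[initial_ranking[:n_pos]].mean(axis=0)
    neg = features[initial_ranking[-n_neg:]].mean(axis=0)
    new_query = alpha * query + beta * pos - gamma * neg
    sims = features @ new_query / (
        np.linalg.norm(features, axis=1) * np.linalg.norm(new_query) + 1e-12)
    return list(np.argsort(-sims))

# Four images described by 2-D feature vectors; item 3 is an outlier.
features = np.array([[1.0, 0.0],
                     [0.9, 0.1],
                     [0.0, 1.0],
                     [-1.0, 0.0]])
query = np.array([1.0, 0.0])
reranked = prf_rescore(features, [0, 1, 2, 3], query)
```

The automatic selection of pseudo-positives is exactly the point at which a diversification strategy (such as the hierarchical clustering mentioned above) can be plugged in instead of a plain top-k cut.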
MediaEval 2016 - ININ Submission to Zero Cost ASR Task
multimediaeval
Presenter: Tejas Godambe
ININ Submission to Zero Cost ASR Task at MediaEval 2016 In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Tejas Godambe, Naresh Kumar, Pavan Kumar, Veera Raghavendra, Aravind Ganapathiraju
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_31.pdf
Video: https://youtu.be/e70xRjsUUts
Abstract: This paper details the experiments conducted to train the best-performing Vietnamese speech recognition system possible using public domain data only, as part of the Zero Cost task at MediaEval 2016. We explored techniques related to audio preprocessing, use of the speaker’s pitch information, and data perturbation for building a subspace Gaussian mixture acoustic model, which is known for estimating robust parameters when the amount of data is small, as well as unsupervised adaptation, RNN language model based lattice rescoring, and system combination using the ROVER technique.
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
multimediaeval
Presenter: Giorgos Kordopatis-Zilos
Placing Images with Refined Language Models and Similarity Search with PCA-reduced VGG Features In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Giorgos Kordopatis-Zilos, Adrian Popescu, Symeon Papadopoulos, Yiannis Kompatsiaris
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_13.pdf
Video: https://youtu.be/WR4I3CWjcR4
Abstract: We describe the participation of the CERTH/CEA-LIST team in the MediaEval 2016 Placing Task. We submitted five runs to the estimation-based sub-task: one based only on text by employing a Language Model-based approach with several refinements, one based on visual content, using geospatial clustering over the most visually similar images, and three based on a hybrid scheme exploiting both visual and textual cues from the multimedia items, trained on datasets of different size and origin. The best results were obtained by a hybrid approach trained with external training data and using two publicly available gazetteers.
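The text-only run's Language Model-based approach can be illustrated in its basic form: the world is divided into grid cells, a tag distribution is learned per cell from training photos, and a test photo is assigned to the cell maximizing the smoothed likelihood of its tags. The data, cell names, and add-one smoothing below are our illustrative choices, not the team's refinements.

```python
import math
from collections import defaultdict

def train_cell_models(training):
    """training: list of (cell_id, tag_list). Returns per-cell tag counts,
    per-cell totals, and the vocabulary size."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    vocab = set()
    for cell, tags in training:
        for t in tags:
            counts[cell][t] += 1
            totals[cell] += 1
            vocab.add(t)
    return counts, totals, len(vocab)

def place(tags, counts, totals, vocab_size):
    """Return the cell with the highest add-one-smoothed log-likelihood."""
    best_cell, best_ll = None, -math.inf
    for cell in counts:
        ll = sum(math.log((counts[cell][t] + 1) / (totals[cell] + vocab_size))
                 for t in tags)
        if ll > best_ll:
            best_cell, best_ll = cell, ll
    return best_cell

# Toy training data: photos with tags, labeled by the cell they came from.
training = [
    ("paris",  ["eiffel", "louvre", "seine"]),
    ("paris",  ["eiffel", "cafe"]),
    ("london", ["thames", "bigben", "cafe"]),
]
counts, totals, V = train_cell_models(training)
```

A hybrid run such as the ones described above would combine this textual score with visual-similarity evidence before picking the final cell.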
Presenter: Claire-Hélène Demarty
MediaEval 2016 Predicting Media Interestingness Task In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Claire-Hélène Demarty, Mats Sjöberg, Bogdan Ionescu, Thanh-Toan Do, Hanli Wang, Ngoc Q. K. Duong, and Frédéric Lefebvre
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_1.pdf
Video: https://youtu.be/rAarQaEr9-w
Abstract: This paper provides an overview of the Predicting Media Interestingness task that is organized as part of the MediaEval 2016 Benchmarking Initiative for Multimedia Evaluation. The task, which is running for the first year, expects participants to create systems that automatically select images and video segments that are considered to be the most interesting for a common viewer. In this paper, we present the task use case and challenges, the proposed data set and ground truth, the required participant runs and the evaluation metrics.
MediaEval 2016 - UNIFESP Predicting Media Interestingness Task
multimediaeval
Presenter: Samuel G. Fadel
UNIFESP at MediaEval 2016: Predicting Media Interestingness Task In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Jurandy Almeida
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_28.pdf
Video: https://youtu.be/YLthKNczlcA
Abstract: This paper describes the approach proposed by UNIFESP for the MediaEval 2016 Predicting Media Interestingness Task, addressing its video subtask only. The proposed approach is based on combining learning-to-rank algorithms for predicting the interestingness of videos from their visual content.
Video Key-Frame Extraction using Unsupervised Clustering and Mutual Comparison
CSCJournals
Key-frame extraction is one of the important steps in semantic concept based video indexing and retrieval, and the accuracy of video concept detection highly depends on the effectiveness of the key-frame extraction method. Therefore, extracting key-frames efficiently and effectively from video shots is considered a very challenging research problem in video retrieval systems. One of many approaches to extract key-frames from a shot is to make use of unsupervised clustering: depending on the salient content of the shot and the results of clustering, key-frames can be extracted. Usually, however, because of the visual complexity and/or the content of the video shot, the output contains near-duplicate or repetitive key-frames with the same semantic content, and hence the accuracy of key-frame extraction decreases. In an attempt to improve accuracy, we propose a novel key-frame extraction method based on unsupervised clustering and mutual comparison, in which we assign a 70% weight to the color component (HSV histogram) and 30% to texture (GLCM) when computing a combined frame similarity index used for clustering. We then apply a mutual comparison of the key-frames extracted from the output of the clustering, in which each key-frame is compared with every other to remove near-duplicate key-frames. The proposed algorithm is computationally simple and able to detect non-redundant, unique key-frames for the shot, thereby improving the concept detection rate. Its efficiency and effectiveness are validated on open database videos.
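The combined 70/30 similarity index and the mutual-comparison step can be sketched as follows. The histogram-intersection measure, the 0.85 threshold, and the toy histograms are our illustrative choices (the paper fixes only the 70%/30% weighting of HSV color vs. GLCM texture).

```python
import numpy as np

def hist_intersection(h1, h2):
    """Similarity in [0, 1] between two L1-normalized histograms."""
    return float(np.minimum(h1, h2).sum())

def frame_similarity(c1, t1, c2, t2, w_color=0.7, w_tex=0.3):
    """Combined frame similarity index: 70% color, 30% texture."""
    return (w_color * hist_intersection(c1, c2)
            + w_tex * hist_intersection(t1, t2))

def prune_near_duplicates(frames, threshold=0.85):
    """frames: list of (color_hist, texture_hist) per candidate key-frame.
    Mutual comparison: keep a frame only if it is not too similar to any
    key-frame already kept."""
    kept = []
    for color, tex in frames:
        if all(frame_similarity(color, tex, c2, t2) < threshold
               for c2, t2 in kept):
            kept.append((color, tex))
    return kept

a = (np.array([0.5, 0.5, 0.0]), np.array([1.0, 0.0]))
b = (np.array([0.5, 0.4, 0.1]), np.array([0.9, 0.1]))  # near duplicate of a
c = (np.array([0.0, 0.1, 0.9]), np.array([0.0, 1.0]))  # visually distinct
unique = prune_near_duplicates([a, b, c])               # b gets pruned
```

In the full method the candidate frames would come from per-cluster representatives rather than being listed by hand.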
State-of-the-art semantic segmentation on AUV imagery for plant species classification.
Work done during an internship. It led to the conclusion that the labelling was too approximate for semantic segmentation to work well.
Full details: https://imatge.upc.edu/web/publications/visual-instance-mining-news-videos-using-graph-based-approach
Author: David Almendros-Gutiérrez
Advisors: Xavier Giró-i-Nieto (UPC) and Horst Eidenberger (TU Wien)
Degree: Telecommunications Engineering (5 years) at Telecom BCN-ETSETB (UPC)
The aim of this thesis is to design a tool that performs visual instance mining for news video summarization, that is, to extract the relevant content of the video in order to recognize the storyline of the news.
First, the video is sampled to obtain frames at a desired rate. Then, different relevant contents are detected in each frame, focusing on faces, text, and several objects that the user can select. Next, we use a graph-based clustering method to recognize them with high accuracy and select the most representative ones for the visual summary. Furthermore, a graphical user interface in Wt was developed to create an online demo to test the application.
During the development of the application we tested the tool with the CCMA dataset. We prepared a web-based survey based on four results from this dataset to gather the users' opinions. We also validated our visual instance mining results by comparing them with those obtained by applying an algorithm developed at Columbia University for video summarization. We ran the algorithm on a dataset of a few videos on two events: the 'Boston bombings' and the 'search for the Malaysian Airlines flight'. We carried out another web-based survey in which users could compare our approach with this related work. With these surveys we analyze whether our tool fulfills the requirements we set.
We can conclude that our system extracts visual instances that show the most relevant content of news videos and can be used to summarize these videos effectively.
Event recognition image & video segmentation
eSAT Journals
Abstract: This paper gives a clear look at the segmentation process at the basic level. Segmentation is done at multiple levels so that different results are obtained. Segmentation based on relative motion descriptors gives a clear picture of the segmentation performed on a given input video. Relative motion computation and histogram incrementation are used to evaluate this approach. We also give an overview of related research on how segmentation can be done for both images and videos. Keywords: Image Segmentation, Video Segmentation.
From sensor readings to prediction: on the process of developing practical so...Manuel Martín
Automatic data acquisition systems provide large amounts of streaming data generated by physical sensors. This data forms an input to computational models (soft sensors) routinely used for monitoring and control of industrial processes, traffic patterns, environment and natural hazards, and many more. The majority of these models assume that the data comes in a cleaned and pre-processed form, ready to be fed directly into a predictive model. In practice, to ensure appropriate data quality, most of the modelling efforts concentrate on preparing data from raw sensor readings to be used as model inputs. This study analyzes the process of data preparation for predictive models with streaming sensor data. We present the challenges of data preparation as a four-step process, identify the key challenges in each step, and provide recommendations for handling these issues. The discussion is focused on the approaches that are less commonly used, while, based on our experience, may contribute particularly well to solving practical soft sensor tasks. Our arguments are illustrated with a case study in the chemical production industry.
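As a loose, hypothetical illustration of the kind of preparation pipeline discussed (not the study's actual four steps, which it defines itself), a streaming-sensor preprocessor might align readings to a regular grid, fill gaps, clip outliers, and derive a model input. All step choices and thresholds below are our assumptions.

```python
import numpy as np

def prepare(timestamps, values, step=1.0, z_clip=3.0):
    """Toy preparation pipeline for irregular, noisy sensor readings."""
    # 1) + 2) resample onto a regular time grid, forward-filling gaps with
    #         the nearest preceding reading
    grid = np.arange(timestamps[0], timestamps[-1] + step, step)
    idx = np.searchsorted(timestamps, grid, side="right") - 1
    resampled = np.asarray(values, dtype=float)[idx]
    # 3) clip outliers beyond z_clip standard deviations of the window
    mu, sigma = resampled.mean(), resampled.std()
    cleaned = np.clip(resampled, mu - z_clip * sigma, mu + z_clip * sigma)
    # 4) derive a simple model input: rolling mean over 3 samples
    features = np.convolve(cleaned, np.ones(3) / 3, mode="valid")
    return cleaned, features

# Irregular readings: the sample at t=3 is missing, the one at t=4 is a spike.
t = np.array([0.0, 1.0, 2.0, 4.0, 5.0])
v = np.array([10.0, 11.0, 12.0, 500.0, 13.0])
cleaned, features = prepare(t, v, step=1.0, z_clip=1.0)
```

Real soft-sensor pipelines would compute the clipping statistics on a sliding window rather than the whole series, since streaming data arrives incrementally.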
Presenter: Jaeyoung Choi
The Placing Task at MediaEval 2016 In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Jaeyoung Choi, Claudia Hauff, Olivier Van Laere, Bart Thomee
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_7.pdf
Video: https://youtu.be/8646sUGqMbg
Abstract: The seventh edition of the Placing Task at MediaEval focuses on two challenges: (1) estimation-based placing, which addresses estimating the geographic location where a photo or video was taken, and (2) verification-based placing, which addresses verifying whether a photo or video was indeed taken at a pre-specified geographic location. Like the previous edition, we made the organizer baselines for both subtasks available as open source code, and published a live leaderboard that allows the participants to gain insights into the effectiveness of their approaches compared to the official baselines and in relation to each other at an early stage, before the actual run submissions are due.
MediaEval 2016 - UNED-UV @ Retrieving Diverse Social Images Taskmultimediaeval
Ana Garcia Serrano
UNED-UV @ Retrieving Diverse Social Images Task In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Angel C. González, Xaro B. Garcia, Ana García-Serrano, Esther de Ves Cuenca
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_17.pdf
Video: https://youtu.be/EyuN7IA1HFk
Abstract: This paper details the participation of the UNED-UV group at the MediaEval 2016 Retrieving Diverse Social Images Task using a multimodal approach. Several Local Logistic Regression models, which use the visual low-level features, estimate the relevance probability for all the images in the dataset. Then, the images are ranked by selecting the highest probability image at each of the textual clusters. These textual clusters are generated by making use of a textual algorithm based on Formal Concept Analysis (FCA) and Hierarchical Agglomerative Clustering (HAC) to detect the latent topics addressed. The images will be then diversified according to detected topics.
MediaEval 2016 - Simula Team @ Context of Experience Taskmultimediaeval
Presenter: Konstantin Pogorelov
Simula @ MediaEval 2016 Context of Experience Task In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Carsten Griwodz
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_53.pdf
Video: https://youtu.be/FTIeGpHhURU
Abstract: This paper presents our approach for the Context of Multimedia Experience Task of the MediaEval 2016 Benchmark. We present different analyses of the given data using different subsets of data sources and combinations of it. Our approach gives a baseline evaluation indicating that metadata approaches work well but that also visual features can provide useful information for the given problem to solve.
MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...multimediaeval
Presenter: Bogdan Boteanu,
LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-Relevance Feedback Diversification Perspective In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Bogdan Boteanu, Mihai G. Constantin, Bogdan Ionescu
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_20.pdf
Video: https://youtu.be/mDI8Z31p7TY
Abstract: In this paper we present the results achieved during the 2016 MediaEval Retrieving Diverse Social Images Task, using an approach based on pseudo-relevance feedback, in which human feedback is replaced by an automatic selection of images. The proposed approach is designed to have in priority the diversification of the results, in contrast to most of the existing techniques that address only the relevance. Diversification is achieved by exploiting a hierarchical clustering scheme followed by a diversification strategy. Methods are tested on the benchmarking data and results are analyzed. Insights for future work conclude the paper.
MediaEval 2016 - ININ Submission to Zero Cost ASR Taskmultimediaeval
Presenter: Tejas Godambe
ININ Submission to Zero Cost ASR Task at MediaEval 2016 In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Tejas Godambe, Naresh Kumar, Pavan Kumar, Veera Raghavendra, Aravind Ganapathiraju
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_31.pdf
Video: https://youtu.be/e70xRjsUUts
Abstract: This paper details the experiments conducted to train an as good performing Vietnamese speech recognition system as possible using public domain data only, as a part of the Zero Cost task at MediEval 2016. We explored techniques related to audio preprocessing, use of speaker’s pitch information, data perturbation, for building subspace Gaussian mixture acoustic model which is known for estimating robust parameters when the amount of data is less, and also unsupervised adaptation, RNN language model based lattice rescoring and system combination using ROVER technique.
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...multimediaeval
Presenter: Giorgos Kordopatis-Zilos
Placing Images with Refined Language Models and Similarity Search with PCA-reduced VGG Features In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Giorgos Kordopatis-Zilos, Adrian Popescu, Symeon Papadopoulos, Yiannis Kompatsiaris
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_13.pdf
Video: https://youtu.be/WR4I3CWjcR4
Abstract: We describe the participation of the CERTH/CEA-LIST team in the MediaEval 2016 Placing Task. We submitted five runs to the estimation-based sub-task: one based only on text by employing a Language Model-based approach with several refinements, one based on visual content, using geospatial clustering over the most visually similar images, and three based on a hybrid scheme exploiting both visual and textual cues from the multimedia items, trained on datasets of different size and origin. The best results were obtained by a hybrid approach trained with external training data and using two publicly available gazetteers.
Presenter: Claire-Hélène Demarty
MediaEval 2016 Predicting Media Interestingness Task In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Claire-Hélène Demarty, Mats Sjöberg, Bogdan Ionescu, Thanh-Toan Do, Hanli Wang, Ngoc Q. K. Duong, and Frédéric Lefebvre
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_1.pdf
Video: https://youtu.be/rAarQaEr9-w
Abstract: This paper provides an overview of the Predicting Media Interestingness task that is organized as part of the MediaEval 2016 Benchmarking Initiative for Multimedia Evaluation. The task, which is running for the first year, expects participants to create systems that automatically select images and video segments that are considered to be the most interesting for a common viewer. In this paper, we present the task use case and challenges, the proposed data set and ground truth, the required participant runs and the evaluation metrics.
MediaEval 2016 - UNIFESP Predicting Media Interestingness Task
Presenter: Samuel G. Fadel
UNIFESP at MediaEval 2016: Predicting Media Interestingness Task In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Jurandy Almeida
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_28.pdf
Video: https://youtu.be/YLthKNczlcA
Abstract: This paper describes the approach proposed by UNIFESP for the MediaEval 2016 Predicting Media Interestingness Task and for its video subtask only. The proposed approach is based on combining learning-to-rank algorithms for predicting the interestingness of videos by their visual content.
Video Key-Frame Extraction using Unsupervised Clustering and Mutual Comparison
Key-frame extraction is one of the important steps in semantic-concept-based video indexing and retrieval, and the accuracy of video concept detection depends heavily on the effectiveness of the key-frame extraction method. Extracting key-frames efficiently and effectively from video shots is therefore considered a very challenging research problem in video retrieval systems. One of many approaches to extracting key-frames from a shot is unsupervised clustering: depending on the salient content of the shot and the clustering results, key-frames can be extracted. Usually, however, because of the visual complexity and/or content of the video shot, the output tends to contain near-duplicate or repetitive key-frames with the same semantic content, which decreases the accuracy of key-frame extraction. In an attempt to improve accuracy, we propose a novel key-frame extraction method based on unsupervised clustering and mutual comparison, in which we assign a 70% weight to the colour component (HSV histogram) and 30% to texture (GLCM) when computing the combined frame similarity index used for clustering. We then mutually compare the extracted key-frames, matching each key-frame against every other to remove near-duplicates. The proposed algorithm is computationally simple and detects non-redundant, unique key-frames for the shot, thereby improving the concept detection rate. Its efficiency and effectiveness are validated on open-database videos.
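As an illustration of the combined frame similarity index described above, the sketch below mixes an HSV-histogram colour similarity (70%) with a simplified GLCM texture term (30%). This is a minimal reconstruction under our own assumptions (histogram intersection for colour, a one-offset GLCM contrast for texture), not the authors' implementation:

```python
import numpy as np

def hsv_hist_similarity(h1, h2):
    # Histogram intersection on normalized HSV histograms (each sums to 1).
    return np.minimum(h1, h2).sum()

def glcm_contrast(gray, levels=8):
    # Simplified GLCM: co-occurrence of quantized gray levels with the
    # right-hand neighbour; contrast = sum P(i, j) * (i - j)^2.
    q = np.floor(gray * (levels - 1)).astype(int)
    glcm = np.zeros((levels, levels))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[i, j] += 1
    glcm /= glcm.sum()
    idx = np.arange(levels)
    return (glcm * (idx[:, None] - idx[None, :]) ** 2).sum()

def combined_similarity(f1, f2):
    # 70% weight on colour (HSV histogram intersection),
    # 30% on texture (agreement of GLCM contrast values).
    color_sim = hsv_hist_similarity(f1["hsv_hist"], f2["hsv_hist"])
    c1, c2 = glcm_contrast(f1["gray"]), glcm_contrast(f2["gray"])
    texture_sim = 1.0 - abs(c1 - c2) / max(c1, c2, 1e-9)
    return 0.7 * color_sim + 0.3 * texture_sim
```

In the mutual comparison step, pairs of extracted key-frames whose combined similarity exceeds a threshold would be treated as near-duplicates, keeping only one of each pair.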
State-of-the-art semantic segmentation applied to AUV imagery for plant species classification.
Work carried out during an internship. It led to the conclusion that the labelling was too approximate for semantic segmentation to work well.
Full details: https://imatge.upc.edu/web/publications/visual-instance-mining-news-videos-using-graph-based-approach
Author: David Almendros-Gutiérrez
Advisors: Xavier Giró-i-Nieto (UPC) and Horst Eidenberger (TU Wien)
Degree: Telecommunications Engineering (5 years) at Telecom BCN-ETSETB (UPC)
The aim of this thesis is to design a tool that performs visual instance mining for news video summarization, that is, extracting the relevant content of the video so that the storyline of the news can be recognized.
Initially, the video is sampled to obtain frames at a desired rate. Different relevant contents are then detected in each frame, focusing on faces, text and several objects that the user can select. Next, we use a graph-based clustering method to recognize them with high accuracy and select the most representative ones for the visual summary. Furthermore, a graphical user interface in Wt was developed to create an online demo for testing the application.
During development we tested the tool on the CCMA dataset. We prepared a web-based survey based on four results from this dataset to gather users' opinions. We also validated our visual instance mining results by comparing them with those of an algorithm for video summarization developed at Columbia University, which we ran on a dataset of a few videos covering two events: the 'Boston bombings' and the 'search of the Malaysian airlines flight'. We carried out another web-based survey in which users could compare our approach with this related work. With these surveys we analyze whether our tool fulfills the requirements we set.
We conclude that our system extracts visual instances that show the most relevant content of news videos and can be used to summarize these videos effectively.
Event recognition image & video segmentation
Abstract: This paper examines the segmentation process at a basic level. Segmentation is performed at multiple levels, yielding different results at each. Segmenting relative motion descriptors gives a clear picture of the segmentation applied to a given input video; relative motion computation and histogram incrementation are used to evaluate this approach. We also survey related research on how segmentation can be performed for both images and videos. Keywords: Image Segmentation, Video Segmentation.
From sensor readings to prediction: on the process of developing practical so...Manuel Martín
Automatic data acquisition systems provide large amounts of streaming data generated by physical sensors. This data forms an input to computational models (soft sensors) routinely used for monitoring and control of industrial processes, traffic patterns, environment and natural hazards, and many more. The majority of these models assume that the data comes in a cleaned and pre-processed form, ready to be fed directly into a predictive model. In practice, to ensure appropriate data quality, most of the modelling efforts concentrate on preparing data from raw sensor readings to be used as model inputs. This study analyzes the process of data preparation for predictive models with streaming sensor data. We present the challenges of data preparation as a four-step process, identify the key challenges in each step, and provide recommendations for handling these issues. The discussion is focused on the approaches that are less commonly used, while, based on our experience, may contribute particularly well to solving practical soft sensor tasks. Our arguments are illustrated with a case study in the chemical production industry.
https://imatge.upc.edu/web/publications/keyframe-based-video-summarization-designer
This Final Degree Work extends two previous projects: it improves the video keyframe extraction module of one of them, called Designer Master, by integrating the algorithms developed in the other, Object Maps.
First, the proposed solution is explained. It consists of a shot detection method in which the input video is sampled uniformly, a cumulative pixel-to-pixel difference is then computed, and a classifier decides which frames are keyframes.
Finally, to validate our approach we conducted a user study in which both applications were compared. Users were asked to complete a survey on different summaries created with the original application and with the one developed in this project. The results show that the improved keyframe extraction module slightly improves the application's performance and the quality of the generated summaries.
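The shot detection step described above (uniform sampling followed by cumulative pixel-to-pixel differences) can be sketched roughly as follows. The sampling rate and fixed threshold are illustrative assumptions; the original work uses a trained classifier rather than a fixed threshold:

```python
import numpy as np

def detect_shot_boundaries(frames, sample_rate=1, threshold=0.25):
    """Uniformly sample frames, compute the mean absolute pixel-to-pixel
    difference between consecutive sampled frames, and mark a shot boundary
    wherever the difference exceeds a threshold. Frames are float arrays
    with values in [0, 1]; `threshold` is an assumed value, standing in
    for the classifier used in the original project."""
    sampled = frames[::sample_rate]
    boundaries = []
    for i in range(1, len(sampled)):
        diff = np.abs(sampled[i] - sampled[i - 1]).mean()
        if diff > threshold:
            boundaries.append(i * sample_rate)
    return boundaries
```

A keyframe could then be picked from each detected shot, e.g. its middle frame.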
Practical Approaches to Target Detection in Long Range and Low Quality Infrar...
It is challenging to detect vehicles in long-range, low-quality infrared videos using deep learning techniques such as You Only Look Once (YOLO), mainly because of small target size: small targets do not have detailed texture information. This paper focuses on practical approaches to target detection in infrared videos using deep learning techniques. We first investigated a newer version of You Only Look Once (YOLOv4). We then proposed a practical and effective approach of training the YOLO model on videos from longer ranges. Experimental results on real infrared videos covering ranges from 1000 m to 3500 m demonstrated large performance improvements. In particular, the average detection percentage over the six ranges from 1000 m to 3500 m improved from 54% when training on the 1500 m videos to 95% when training on the 3000 m videos.
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Paper: http://ceur-ws.org/Vol-2882/paper62.pdf
YouTube: https://youtu.be/gV-rvV3iFDA
Pierre-Etienne Martin, Jenny Benois-Pineau, Boris Mansencal, Renaud Péteri and Julien Morlier : Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal CNN for MediaEval 2020. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This work presents a method for classifying table tennis strokes using spatio-temporal convolutional neural networks. The fine-grained classification is performed on trimmed video segments recorded at 120 fps, with different players performing in natural conditions. From these segments, the frames are extracted, their optical flow is computed and the pose of the player is estimated; a region of interest is inferred from the optical flow amplitude. A three-stream spatio-temporal convolutional neural network using a combination of these modalities and 3D attention mechanisms is presented to perform the classification.
Presented by: Pierre-Etienne Martin
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
Paper: http://ceur-ws.org/Vol-2882/paper50.pdf
Hai Nguyen-Truong, San Cao, N. A. Khoa Nguyen, Bang-Dang Pham, Hieu Dao, Minh-Quan Le, Hoang-Phuc Nguyen-Dinh, Hai-Dang Nguyen and Minh-Triet Tran : HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table Tennis Strokes Classification Task. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The Sports Video Classification task in the MediaEval 2020 challenge focuses on classifying different types of table tennis strokes in video segments. In this task, we, the HCMUS team, perform multiple experiments combining models such as SlowFast, optical flow, DensePose, R(2+1)D and Channel-Separated Convolutional Networks to classify 21 types of table tennis strokes from video segments. In total, we submit eight runs corresponding to five different models with different sets of hyper-parameters. In addition, we apply some pre-processing techniques to the dataset so that our models learn and classify more accurately. According to the evaluation results, our best method outperforms the other teams': our best run achieves 31.35% global accuracy, and all of our methods show promising results in terms of local and global accuracy for action recognition tasks.
Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Paper: http://ceur-ws.org/Vol-2882/paper2.pdf
YouTube: https://youtu.be/-bRL868b8ys
Pierre-Etienne Martin, Jenny Benois-Pineau, Boris Mansencal, Renaud Péteri, Laurent Mascarilla, Jordan Calandre and Julien Morlier : Sports Video Classification: Classification of Strokes in Table Tennis for MediaEval 2020. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Fine-grained action classification raises new challenges compared to classical action classification problems. Sport video analysis is a very popular research topic, owing to the variety of application areas, ranging from multimedia intelligent devices with user-tailored digests up to analysis of athletes' performances. Running since 2019 as part of MediaEval, we offer a task that consists of classifying table tennis strokes from videos recorded in natural conditions at the University of Bordeaux. The aim is to build tools for teachers, coaches and players to analyse table tennis games. Such tools could lead to automatic profiling of players and adaptation of their training to improve their skills more efficiently.
Presented by: Pierre-Etienne Martin
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Paper: http://ceur-ws.org/Vol-2882/paper61.pdf
YouTube: https://youtu.be/brmI4g3jLS4
Ricardo Kleinlein, Cristina Luna-Jiménez, Fernando Fernández-Martínez and Zoraida Callejas : Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention and LSTM Models. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper reports on the GTH-UPM team's experience in the Predicting Media Memorability task at MediaEval 2020. Teams were asked to predict both short-term and long-term memorability scores, where the score measures whether a video endures in a viewer's memory. Our proposed system relies on a late fusion of the scores predicted by three sequential models, each trained on a different modality: video captions, aural embeddings and visual optical-flow-based vectors. Whereas the single-modality models show a low or zero Spearman correlation coefficient, their combination considerably boosts performance on development data, up to 0.2 in the short-term memorability prediction subtask and 0.19 in the long-term subtask. However, performance on test data drops to 0.016 and -0.041, respectively.
Presented by: Ricardo Kleinlein
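The late-fusion scheme described in the abstract above (combining scores from independently trained per-modality models) can be illustrated as a weighted average of predictions. The weights here are placeholders, as the paper does not state its fusion coefficients:

```python
import numpy as np

def late_fusion(scores, weights=None):
    """Late fusion of per-modality memorability scores: a weighted average
    of the predictions of independently trained models. `scores` has shape
    (n_modalities, n_videos); `weights` is an assumption (uniform by
    default), not the fusion used in the paper."""
    scores = np.asarray(scores, dtype=float)
    if weights is None:
        weights = np.ones(len(scores)) / len(scores)
    weights = np.asarray(weights, dtype=float)
    return weights @ scores / weights.sum()
```

In practice the weights could be tuned on development data, e.g. to maximize Spearman correlation with the ground-truth scores.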
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Paper: http://ceur-ws.org/Vol-2882/paper52.pdf
Janadhip Jacutprakart, Rukiye Savran Kiziltepe, John Q. Gan, Giorgos Papanastasiou and Alba G. Seco de Herrera : Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we present the approach and main results of the Essex NLIP team's participation in the MediaEval 2020 Predicting Media Memorability task. The task requires participants to build systems that can predict short-term and long-term memorability scores on the real-world video samples provided. Our approach focuses on colour-based visual features and on the video annotation metadata; in addition, hyper-parameter tuning was explored. Despite the simplicity of the methodology, our approach achieves competitive results. We investigated the use of different visual features and assessed the memorability predictions of various regression models, with Random Forest regression as our final model for predicting the memorability of videos.
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Paper: http://ceur-ws.org/Vol-2882/paper6.pdf
YouTube: https://youtu.be/ySGGu_4vaxs
Alba García Seco De Herrera, Rukiye Savran Kiziltepe, Jon Chamberlain, Mihai Gabriel Constantin, Claire-Hélène Demarty, Faiyaz Doctor, Bogdan Ionescu and Alan F. Smeaton : Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a Video Memorable? Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper describes the MediaEval 2020 Predicting Media Memorability task. First proposed at MediaEval 2018, the Predicting Media Memorability task is in its third edition this year, as the prediction of short-term and long-term video memorability (VM) remains a challenging task. In 2020, the format remained the same as in previous editions. This year the videos are a subset of the TRECVid 2019 Video to Text dataset, containing more action-rich video content compared with the 2019 task. This paper describes the main aspects of the task, including its main characteristics, the collection, the ground-truth dataset, the evaluation metrics and the requirements for run submission.
Presented by: Rukiye Savran Kiziltepe
Fooling an Automatic Image Quality Estimator
Paper: http://ceur-ws.org/Vol-2882/paper45.pdf
Benoit Bonnet, Teddy Furon and Patrick Bas : Fooling an Automatic Image Quality Estimator. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper we present our work on the 2020 MediaEval task "Pixel Privacy: Quality Camouflage for Social Images". Blind Image Quality Assessment (BIQA) is a classifier that returns a quality score for any given image. Our task is to modify an image so as to decrease its BIQA score while maintaining good perceived quality. Since BIQA is a deep neural network, we took an adversarial attack approach to the problem.
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Paper: http://ceur-ws.org/Vol-2882/paper16.pdf
YouTube: https://youtu.be/ix_b9K7j72w
Zhengyu Zhao : Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable Color Filter. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper presents the submission of our RU-DS team to the Pixel Privacy Task 2020. We propose to fool the blind image quality assessment model by transforming images based on optimizing a human-understandable color filter. In contrast to the common work that relies on small, $L_p$-bounded additive pixel perturbations, our approach yields large yet smooth perturbations. Experimental results demonstrate that in the specific context of this task, our approach is able to achieve strong adversarial effects, but has to sacrifice the image appeal.
Presented by: Zhengyu Zhao
Pixel Privacy: Quality Camouflage for Social Images
Paper: http://ceur-ws.org/Vol-2882/paper77.pdf
YouTube: https://youtu.be/8Rr4KknGSac
Zhuoran Liu, Zhengyu Zhao, Martha Larson and Laurent Amsaleg : Pixel Privacy: Quality Camouflage for Social Images. Proc. of MediaEval 2020, 14-15 December 2020, Online.
High-quality social images shared online can be misappropriated for unauthorized goals, where the quality filtering step is commonly carried out by automatic Blind Image Quality Assessment (BIQA) algorithms. Pixel Privacy benchmarks privacy-protective approaches that protect privacy-sensitive images against unethical computer vision algorithms. In the 2020 task, participants are encouraged to develop camouflage methods that can effectively decrease the BIQA quality score of high-quality images while maintaining image appeal. The camouflaged images need to be either imperceptible to the human eye or a visible enhancement.
Presented by: Zhuoran Liu
HCMUS at MediaEval 2020: Image-Text Fusion for Automatic News-Images Re-Matching
Paper: http://ceur-ws.org/Vol-2882/paper73.pdf
YouTube: https://youtu.be/TadJ6y7xZeA
Thuc Nguyen-Quang, Tuan-Duy Nguyen, Thang-Long Nguyen-Ho, Anh-Kiet Duong, Xuan-Nhat Hoang, Vinh-Thuyen Nguyen-Truong, Hai-Dang Nguyen and Minh-Triet Tran : HCMUS at MediaEval 2020: Image-Text Fusion for Automatic News-Images Re-Matching. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Matching text and images based on their semantics plays an important role in cross-media retrieval. However, text and images in articles have a complex connection. In the context of the MediaEval 2020 challenge, we propose three multi-modal methods for mapping the text and images of news articles into a shared space in order to perform efficient cross-retrieval. Our methods show systematic improvement and validate our hypotheses, with the best-performing method reaching a recall@100 score of 0.2064.
Presented by: Thuc Nguyen-Quang
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Paper: http://ceur-ws.org/Vol-2882/paper72.pdf
Sabarinathan D and Suganya Ramamoorthy : Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attention Unit. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Colorectal cancer is the third most common cancer worldwide. Identifying colorectal cancer in its early stages remains a challenging problem, and the risk of colorectal cancer can be reduced by early diagnosis of polyps during a colonoscopy. The disease and its symptoms vary widely, fall into different categories, and small variations in symptoms may indicate a much higher risk, so doctors and medical analysts need continuously updated knowledge. Motivated by these issues, the main objective of this paper is to develop a multi-supervision net algorithm for segmenting polyps on a comprehensive dataset. We used the Medico polyp challenge dataset, which consists of 1000 segmented polyp images from the gastrointestinal tract. We used EfficientNet-B4 as the pre-trained architecture in the multi-supervision net, and the model is trained with multiple output layers. We present quantitative results on the colorectal dataset and achieved good results in all performance metrics. The experimental results show that the proposed model is robust and segments polyps with a good level of accuracy on a comprehensive dataset across metrics such as Dice coefficient, Recall, Precision and F2.
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
Paper: http://ceur-ws.org/Vol-2882/paper47.pdf
YouTube: https://youtu.be/vMsM4zg2-JY
Tien-Phat Nguyen, Tan-Cong Nguyen, Gia-Han Diep, Minh-Quan Le, Hoang-Phuc Nguyen-Dinh, Hai-Dang Nguyen and Minh-Triet Tran : HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ for Polyps Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The Medico task at MediaEval 2020 explores the challenge of building accurate and high-performance algorithms to detect all types of polyps in endoscopic images. We proposed different approaches leveraging the advantages of either the ResUnet++ or the PraNet model to efficiently segment polyps in colonoscopy images, with modifications to the network structure, parameters and training strategies to tackle various observed characteristics of the given dataset. Our methods outperform the other teams' methods in both accuracy and efficiency: after the evaluation, we rank second in task 1 (with a Jaccard index of 0.777 and the best Precision and Accuracy scores) and first in task 2 (with 67.52 FPS and a Jaccard index of 0.658).
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
Paper: http://ceur-ws.org/Vol-2882/paper31.pdf
Syed Muhammad Faraz Ali, Muhammad Taha Khan, Syed Unaiz Haider, Talha Ahmed, Zeshan Khan and Muhammad Atif Tahir : Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Intestinal Tract. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Identification of polyps in endoscopic images is critical for the diagnosis of colon cancer. Finding the exact shape and size of polyps requires the segmentation of endoscopic images. This research explores the advantage of using depth-wise separable convolution in the atrous convolution of the ResUNet++ architecture. Deep atrous spatial pyramid pooling was also implemented on the ResUNet++ architecture. The results show that architecture with separable convolution has a smaller size and fewer GFLOPs without degrading the performance too much.
Deep Conditional Adversarial learning for polyp Segmentation
Paper: http://ceur-ws.org/Vol-2882/paper22.pdf
Debapriya Banik and Debotosh Bhattacharjee : Deep Conditional Adversarial learning for polyp Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This approach addresses the Medico automatic polyp segmentation challenge, part of MediaEval 2020. We propose a deep conditional adversarial learning network for the automatic polyp segmentation task. The network comprises two interdependent models, a generator and a discriminator. The generator is an FCN employed to predict the polyp mask, while the discriminator enforces that the segmentation be as similar as possible to the real segmented mask (ground truth). Our proposed model achieved competitive results on the test dataset provided by the organizers of the challenge.
A Temporal-Spatial Attention Model for Medical Image Detection
Paper: http://ceur-ws.org/Vol-2882/paper21.pdf
Hwang Maxwell, Wu Cai, Hwang Kao-Shing, Xu Yong Si and Wu Chien-Hsing : A Temporal-Spatial Attention Model for Medical Image Detection. Proc. of MediaEval 2020, 14-15 December 2020, Online.
A local region model with attentive temporal-spatial pathways is proposed for automatically learning various target structures. The attentive spatial pathway highlights the salient region to generate bounding boxes and ignores irrelevant regions in an input image. The proposed attention mechanism allows efficient object localization, and the overall predictive performance increases because there are fewer false positives in object detection for medical images with manual annotations. The experimental results show that the proposed models consistently increase the base architectures' predictive performance for different datasets and training sizes without undue computational cost.
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
Paper: http://ceur-ws.org/Vol-2882/paper20.pdf
YouTube: https://youtu.be/CVelQl5Luf0
Quoc-Huy Trinh, Minh-Van Nguyen, Thiet-Gia Huynh and Minh-Triet Tran : HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Network and UNet for Polyps Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The Medico multimedia task focuses on developing an efficient and accurate framework for computer-aided diagnosis systems performing automatic polyp segmentation, detecting all types of polyps in endoscopic images of the gastrointestinal (GI) tract. We, the HCMUS team, propose a solution that combines residual modules, inception modules and an adaptive convolutional neural network with the UNet model and PraNet to semantically segment all types of polyps in endoscopic images. We submit multiple runs with different architectures and parameters. Our methods show promising results in accuracy and efficiency across multiple experiments.
Fine-tuning for Polyp Segmentation with Attention
Paper: http://ceur-ws.org/Vol-2882/paper15.pdf
Rabindra Khadka : Transfer of Knowledge: Fine-tuning for Polyp Segmentation with Attention. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper describes how the transfer of prior knowledge, combined with attention mechanisms, can effectively tackle segmentation tasks. A UNet model pretrained on a brain MRI dataset was fine-tuned on the polyp dataset, with an attention mechanism integrated to focus on the relevant regions of the input images. The architecture is evaluated on 200 validation images using the intersection-over-union and Dice scores between the ground truth and the predicted region. The model demonstrates promising results with computational efficiency.
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
Paper: http://ceur-ws.org/Vol-2882/paper12.pdf
Adrian Krenzer and Frank Puppe : Bigger Networks are not Always Better: Deep Convolutional Neural Networks for Automated Polyp Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper presents our team's (AI-JMU) approach to the Medico automated polyp segmentation challenge. We consider deep convolutional neural networks to be well suited for this task. To determine the best architecture, we test and compare state-of-the-art backbones and two different heads. We achieve a Jaccard index of 73.74% on the challenge test set. We further demonstrate that bigger networks do not always perform better, while growing network size always increases computational complexity.
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
Paper: http://ceur-ws.org/Vol-2882/paper51.pdf
Amel Ksibi, Amina Salhi, Ala Alluhaidan and Sahar A. El-Rahman : Insights for wellbeing: Predicting Personal Air Quality Index using Regression Approach. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Providing air pollution information to individuals enables them to understand the air quality of their living environments, so the association between people’s wellbeing and the properties of the surrounding environment is an essential area of investigation. This paper predicts air quality by harvesting public/open data and leveraging them to compute a personal Air Quality Index. Such data are usually incomplete; to cope with missing data, we applied the KNN imputation method. To predict the personal Air Quality Index, we apply a voting regression approach based on three base regressors: a Gradient Boosting regressor, a Random Forest regressor and a linear regressor. Evaluating the experimental results with the RMSE metric, we obtained an average score of 35.39 for Walker and 51.16 for Car.
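The pipeline described above (KNN imputation followed by a voting regressor over Gradient Boosting, Random Forest and linear base models) can be sketched with scikit-learn. The synthetic data and default hyper-parameters below are illustrative assumptions, not the authors' setup:

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              VotingRegressor)
from sklearn.linear_model import LinearRegression

# Toy sensor matrix (e.g. pollutant readings) with an assumed linear AQI target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 4))
y = X @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(0, 1, 200)
X[rng.random(X.shape) < 0.1] = np.nan  # ~10% missing readings

# Step 1: fill missing readings with KNN imputation.
X_filled = KNNImputer(n_neighbors=5).fit_transform(X)

# Step 2: voting regression over the three base regressors named in the paper.
model = VotingRegressor([
    ("gb", GradientBoostingRegressor(random_state=0)),
    ("rf", RandomForestRegressor(random_state=0)),
    ("lr", LinearRegression()),
])
model.fit(X_filled, y)
rmse = np.sqrt(np.mean((model.predict(X_filled) - y) ** 2))
```

`VotingRegressor` averages the predictions of its fitted base estimators, which is the "voting regression approach" the abstract refers to.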
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
Paper: http://ceur-ws.org/Vol-2882/paper40.pdf
YouTube: https://youtu.be/SL5Hvu1mARY
Trung-Quan Nguyen, Dang-Hieu Nguyen and Loc Tai Tan Nguyen : Use Visual Features From Surrounding Scenes to Improve Personal Air Quality Data Prediction Performance. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we propose a method to predict the personal air quality index of an area by combining the levels of the pollutants PM2.5, NO2 and O3, measured at the weather stations near that area, with photos of the surrounding scenes taken there. Our approach uses the Inverse Distance Weighted (IDW) technique to estimate missing air pollutant levels, then uses regression to integrate visual features from the photos to refine the predicted values, from which we calculate the Air Quality Index (AQI). The results show that the proposed method may not improve prediction performance in some cases.
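The Inverse Distance Weighted estimation mentioned above can be sketched as follows. This is the standard IDW formula with an assumed power parameter, not the authors' exact code:

```python
import numpy as np

def idw_estimate(station_coords, station_values, query, power=2, eps=1e-12):
    """Inverse Distance Weighted estimate of a pollutant level at `query`
    from nearby station readings: each station's value is weighted by
    1 / distance**power. `power=2` is a common default, assumed here."""
    d = np.linalg.norm(station_coords - query, axis=1)
    if np.any(d < eps):  # query coincides with a station: use its reading
        return float(station_values[np.argmin(d)])
    w = 1.0 / d ** power
    return float(np.sum(w * station_values) / np.sum(w))
```

For example, a point equidistant from two stations receives the mean of their readings, while points closer to one station are pulled toward its value.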
MediaEval 2015 - RFA at MediaEval 2015 Affective Impact of Movies Task: A Multimodal Approach
1. MediaEval 2015
RFA at MediaEval 2015 Affective Impact of Movies Task: A Multimodal Approach
The 2015 Affective Impact of Movies Task (includes Violent Scenes Detection)
Ionuț Mironică¹ (imironica@imag.pub.ro)
Bogdan Ionescu¹ (bionescu@imag.pub.ro)
Mats Sjöberg² (mats.sjoberg@helsinki.fi)
Markus Schedl³ (markus.schedl@jku.at)
Marcin Skowron⁴ (marcin.skowron@ofai.at)
¹ University POLITEHNICA of Bucharest, Romania
² Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland
³ Johannes Kepler University Linz, Austria
⁴ Austrian Research Institute for Artificial Intelligence, Vienna, Austria
2. MediaEval 2015 2
Presentation outline
• Global approach
• Video content description
• Experimental results
• Conclusions
3. MediaEval 2015 3
Global Approach
> challenge: find a way to assign violence estimation tags to unknown videos;
> approach: machine learning paradigm;
[Diagram: extract features from the labeled data, train a classifier, then estimate tags for the new, unlabeled videos]
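The labeled-data / train / estimate loop on this slide can be illustrated with a toy SVM classifier. The features and class separation below are synthetic and purely illustrative; the actual descriptors and classifier settings are described on the following slides:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# toy global video descriptors for the labeled training set
X_train = np.vstack([rng.normal(0, 1, (20, 8)),   # class 0: non-violent
                     rng.normal(3, 1, (20, 8))])  # class 1: violent
y_train = np.array([0] * 20 + [1] * 20)
X_new = rng.normal(3, 1, (5, 8))                  # unlabeled test videos

# train on the labeled data, then estimate tags for the new videos
clf = SVC(kernel="rbf").fit(X_train, y_train)
tags = clf.predict(X_new)
```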
4. MediaEval 2015 4
Global Approach
• We focus on:
objective 1: go multimodal (visual, audio, motion)
objective 2: test a broad range of frame aggregation techniques
objective 3: test several fusion techniques
5. MediaEval 2015 5
Global Approach
Pipeline: extract features → frame aggregation → global video features → train classifier
• Bag of Words
• Fisher kernel
• Vector of Locally Aggregated Descriptors (VLAD)
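As one example of the frame aggregation step, a VLAD encoding over a small codebook can be sketched in NumPy. This is the standard VLAD formulation with power and L2 normalization on synthetic data; the deck's runs actually use a modified VLAD variant:

```python
import numpy as np

def vlad(frame_descriptors, centroids):
    """Aggregate per-frame descriptors into one global VLAD video vector."""
    # assign each descriptor to its nearest codebook centroid
    d2 = ((frame_descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    K, D = centroids.shape
    v = np.zeros((K, D))
    for k in range(K):
        members = frame_descriptors[assign == k]
        if len(members):
            v[k] = (members - centroids[k]).sum(axis=0)  # residual sum per cell
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))                  # power normalization
    n = np.linalg.norm(v)
    return v / n if n > 0 else v                         # L2 normalization
```

The output is a fixed-length K*D vector regardless of how many frames the video has, which is what lets a standard classifier be trained on whole videos.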
6. MediaEval 2015 6
Video Content Description
Audio features
• Block-based audio features [K. Seyerlehner et al., SMC, 2010]
[Diagram: per-frame audio features f1, f2, …, fn over time]
Motion features
• 3D-HoG / 3D-HOF features [J. Uijlings et al., IJMIR, 2014]
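A minimal sketch of the histogram-of-optical-flow idea behind HOF descriptors: quantize per-pixel flow orientation into bins, weighted by flow magnitude. This illustrates the principle only; it is not the actual 3D-HOF implementation of Uijlings et al.:

```python
import numpy as np

def hof_histogram(flow_x, flow_y, n_bins=8):
    """Orientation histogram of a dense flow field, weighted by magnitude."""
    ang = np.arctan2(flow_y, flow_x)                   # orientation in [-pi, pi]
    mag = np.hypot(flow_x, flow_y)                     # per-pixel flow magnitude
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    s = hist.sum()
    return hist / s if s > 0 else hist                 # L1-normalized histogram
```

A field where all flow vectors point in one direction produces a histogram with all its mass in a single bin.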
7. MediaEval 2015 7
Video Content Description
Visual features
• ColorSIFT features [K. E. A. van de Sande et al., TPAMI, 2010]
• CENsus Transform hISTogram (CENTRIST) [J. Wu et al., TPAMI, 2011]
• CNN features [A. Krizhevsky et al., NIPS, 2012]
8. MediaEval 2015 8
Evaluation
(1) Performance on the Violent Scenes Detection task
- the best performance is obtained with the Fisher kernel and CNN visual features
- fusing all the features together did not improve the results over the FK-CNN run alone

Run | Description | MAP
Run 1 | Average over audio descriptors & nonlinear SVM | 0.0485
Run 2 | Average over visual features & nonlinear SVM | 0.0452
Run 3 | Modified VLAD with motion features & linear SVM | 0.0768
Run 4 | Fisher kernel with CNN visual features | 0.1419
Run 5 | Late fusion of all the previous runs | 0.0824
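The official metric, MAP, is the mean over queries of the average precision of each ranked list. A minimal AP sketch using the standard definition (not the task's exact evaluation script):

```python
import numpy as np

def average_precision(scores, labels):
    """AP: mean of precision@k taken at the rank of each relevant item."""
    order = np.argsort(-np.asarray(scores))        # rank by descending score
    rel = np.asarray(labels)[order]                # 1 = violent, 0 = not
    hits = np.cumsum(rel)                          # relevant items seen so far
    ranks = np.arange(1, len(rel) + 1)
    prec_at_hits = hits[rel == 1] / ranks[rel == 1]
    return float(prec_at_hits.mean()) if rel.sum() else 0.0
```

A ranking that places a non-relevant item between two relevant ones, e.g. labels [1, 0, 1] in score order, yields AP = (1/1 + 2/3) / 2 ≈ 0.833.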
9. MediaEval 2015 9
Evaluation
(2) Performance on the Emotional Impact of Movies task

Run | Description | Valence accuracy | Arousal accuracy
Run 1 | Average over audio descriptors & nonlinear SVM | 33.032% | 45.038%
Run 2 | Average over visual features & nonlinear SVM | 36.123% | 34.104%
Run 3 | Modified VLAD with motion features & linear SVM | 29.731% | 39.865%
Run 4 | Fisher kernel with CNN visual features | 30.320% | 44.365%
Run 5 | Late fusion of all the previous runs | 29.752% | 37.595%
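The Run 5 late fusion can be sketched as a weighted average of the per-class score matrices produced by the individual runs; the helper name and uniform weighting below are illustrative, not necessarily the scheme used in the submission:

```python
import numpy as np

def late_fusion(run_scores, weights=None):
    """Fuse per-run score matrices of shape (n_videos, n_classes)."""
    stack = np.stack(run_scores)                 # (n_runs, n_videos, n_classes)
    w = np.ones(len(run_scores)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()                              # normalize run weights
    fused = np.tensordot(w, stack, axes=1)       # weighted mean over runs
    return fused.argmax(axis=1), fused           # fused labels and scores
```

With two runs scoring one video as [0.9, 0.1] and [0.4, 0.6], the uniform fusion gives [0.65, 0.35], so the fused label is class 0.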
10. MediaEval 2015 10
Conclusions
• we obtained the best results on the violence detection task using motion and visual features;
• on the other hand, we obtained the best results on the affect task using the audio features only;
• the visual / motion features obtained lower results for both valence and arousal predictions.