Paper: http://ceur-ws.org/Vol-2882/paper47.pdf
YouTube: https://youtu.be/vMsM4zg2-JY
Tien-Phat Nguyen, Tan-Cong Nguyen, Gia-Han Diep, Minh-Quan Le, Hoang-Phuc Nguyen-Dinh, Hai-Dang Nguyen and Minh-Triet Tran : HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ for Polyps Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The Medico task at MediaEval 2020 explores the challenge of building accurate and high-performance algorithms to detect all types of polyps in endoscopic images. We proposed different approaches leveraging the advantages of either the ResUnet++ or the PraNet model to efficiently segment polyps in colonoscopy images, with modifications to the network structure, parameters, and training strategies to tackle various observed characteristics of the given dataset. Our methods outperform the other teams' methods in both accuracy and efficiency. After the evaluation, we placed in the top 2 for task 1 (with a Jaccard index of 0.777 and the best Precision and Accuracy scores) and top 1 for task 2 (with 67.52 FPS and a Jaccard index of 0.658).
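For readers unfamiliar with the reported metric, the Jaccard index is the intersection-over-union between a predicted polyp mask and its ground truth. A minimal sketch for binary numpy masks (not the task's official scoring code):

```python
import numpy as np

def jaccard_index(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Intersection-over-union between two binary segmentation masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else float(intersection / union)
```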
HCMUS at MediaEval 2020: Image-Text Fusion for Automatic News-Images Re-Matching (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper73.pdf
YouTube: https://youtu.be/TadJ6y7xZeA
Thuc Nguyen-Quang, Tuan-Duy Nguyen, Thang-Long Nguyen-Ho, Anh-Kiet Duong, Xuan-Nhat Hoang, Vinh-Thuyen Nguyen-Truong, Hai-Dang Nguyen and Minh-Triet Tran : HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Matching text and images based on their semantics plays an important role in cross-media retrieval. However, text and images in articles have a complex connection. In the context of the MediaEval 2020 challenge, we propose three multi-modal methods for mapping text and images of news articles into a shared space in order to perform efficient cross-retrieval. Our methods show systematic improvement and validate our hypotheses, while the best-performing method reaches a recall@100 score of 0.2064.
Presented by: Thuc Nguyen-Quang
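For context on the recall@100 figure above: in this cross-modal setting it measures how often the matching image appears among the 100 images nearest to a text query in the shared space. A minimal sketch under that assumption, with hypothetical pre-computed, L2-normalized embeddings (not the team's pipeline):

```python
import numpy as np

def recall_at_k(text_emb: np.ndarray, image_emb: np.ndarray, k: int = 100) -> float:
    """text_emb[i] and image_emb[i] form the matching pair.
    Returns the fraction of text queries whose true image is in the top-k results."""
    sims = text_emb @ image_emb.T               # cosine similarities (N_text x N_image)
    topk = np.argsort(-sims, axis=1)[:, :k]     # indices of the k most similar images
    hits = (topk == np.arange(len(text_emb))[:, None]).any(axis=1)
    return float(hits.mean())
```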
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int... (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper31.pdf
Syed Muhammad Faraz Ali, Muhammad Taha Khan, Syed Unaiz Haider, Talha Ahmed, Zeshan Khan and Muhammad Atif Tahir : Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Intestinal Tract. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Identification of polyps in endoscopic images is critical for the diagnosis of colon cancer. Finding the exact shape and size of polyps requires the segmentation of endoscopic images. This research explores the advantage of using depth-wise separable convolutions in the atrous convolutions of the ResUNet++ architecture. Deep atrous spatial pyramid pooling was also implemented on the ResUNet++ architecture. The results show that the architecture with separable convolutions has a smaller size and fewer GFLOPs without degrading performance too much.
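To make the swapped-in building block concrete: a depthwise separable atrous convolution factors a dilated k x k convolution into a per-channel dilated convolution followed by a 1x1 pointwise convolution, which is where the parameter and GFLOP savings come from. A minimal PyTorch sketch, illustrative only and not the authors' ResUNet++ code:

```python
import torch.nn as nn

class DepthwiseSeparableAtrousConv(nn.Module):
    """Dilated k x k depthwise conv (one filter per channel) + 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=2):
        super().__init__()
        pad = dilation * (kernel_size - 1) // 2   # keep spatial size unchanged
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, padding=pad,
                                   dilation=dilation, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```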
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw... (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper20.pdf
YouTube: https://youtu.be/CVelQl5Luf0
Quoc-Huy Trinh, Minh-Van Nguyen, Thiet-Gia Huynh and Minh-Triet Tran : HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Network and UNet for Polyps Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The Medico: Multimedia Task focuses on developing an efficient and accurate framework for computer-aided diagnosis systems that automatically segment all types of polyps in endoscopic images of the gastrointestinal (GI) tract. We, the HCMUS team, approach the task with a solution that combines Residual modules, Inception modules and an adaptive convolutional neural network with the UNet model and PraNet to semantically segment all types of polyps in endoscopic images. We submit multiple runs with different architectures and parameters in our models. Our methods show promising results in accuracy and efficiency across multiple experiments.
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table... (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper50.pdf
Hai Nguyen-Truong, San Cao, N. A. Khoa Nguyen, Bang-Dang Pham, Hieu Dao, Minh-Quan Le, Hoang-Phuc Nguyen-Dinh, Hai-Dang Nguyen and Minh-Triet Tran : HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table Tennis Strokes Classification Task. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The Sports Video Classification task in the MediaEval 2020 challenge focuses on classifying different types of table tennis strokes in video segments. In this task, we, the HCMUS team, perform multiple experiments combining models such as SlowFast, optical flow, DensePose, R(2+1)D, and Channel-Separated Convolutional Networks to classify 21 types of table tennis strokes from video segments. In total, we submit eight runs corresponding to five different models with different sets of hyper-parameters for each model. In addition, we apply some pre-processing techniques to the dataset so that our models learn and classify more accurately. According to the evaluation results, one of our methods outperforms those of the other teams. In particular, our best run achieves 31.35% global accuracy, and all of our methods show promising results in terms of local and global accuracy for action recognition tasks.
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal... (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper62.pdf
YouTube: https://youtu.be/gV-rvV3iFDA
Pierre-Etienne Martin, Jenny Benois-Pineau, Boris Mansencal, Renaud Péteri and Julien Morlier : Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal CNN for MediaEval 2020. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This work presents a method for classifying table tennis strokes using spatio-temporal convolutional neural networks. The fine-grained classification is performed on trimmed video segments recorded at 120 fps with different players performing in natural conditions. From those segments, the frames are extracted, their optical flow is computed, and the pose of the player is estimated. From the optical flow amplitude, a region of interest is inferred. A three-stream spatio-temporal convolutional neural network using a combination of those modalities and 3D attention mechanisms is presented to perform the classification.
Presented by: Pierre-Etienne Martin
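As a rough illustration of the flow-based region of interest mentioned in the abstract, the sketch below computes dense optical flow with OpenCV's Farnebäck method and bounds the high-motion pixels; it is a simplified stand-in, not the authors' pipeline:

```python
import cv2
import numpy as np

def motion_roi(prev_gray: np.ndarray, next_gray: np.ndarray, quantile: float = 0.95):
    """Estimate dense optical flow and return a bounding box around high-motion pixels."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)          # per-pixel flow amplitude
    ys, xs = np.where(magnitude >= np.quantile(magnitude, quantile))
    if len(xs) == 0:                                  # no significant motion found
        return None
    return xs.min(), ys.min(), xs.max(), ys.max()     # (x1, y1, x2, y2)
```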
Personal Air Quality Index Prediction Using Inverse Distance Weighting Method (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper39.pdf
YouTube: https://youtu.be/3r_oSguFPVM
Trung-Quan Nguyen, Dang-Hieu Nguyen and Loc Tai Tan Nguyen : Personal Air Quality Index Prediction Using Inverse Distance Weighting Method. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we propose a method to predict the personal air quality index in an area by only using the levels of the following pollutants: PM2.5, NO2, O3. All of them are measured from the nearby weather stations of that area. Our approach uses one of the most well-known interpolation methods in spatial analysis, the Inverse Distance Weighted (IDW) technique, to estimate the missing air pollutant levels. After that, we can use those levels to calculate the Air Quality Index (AQI). The results show that the proposed method is suitable for the prediction of those air pollutant levels.
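To make the interpolation step concrete: inverse distance weighting estimates a pollutant level at an unmeasured location as a distance-weighted average of nearby station readings. A minimal sketch with made-up station coordinates (the paper's station selection and AQI formula are not reproduced here):

```python
import numpy as np

def idw_estimate(target_xy, station_xy, station_values, power=2.0):
    """Inverse Distance Weighting: weight each station reading by 1 / distance**power."""
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    dists = np.linalg.norm(station_xy - np.asarray(target_xy, dtype=float), axis=1)
    if np.any(dists == 0):                    # target coincides with a station
        return float(station_values[np.argmin(dists)])
    weights = 1.0 / dists ** power
    return float(np.sum(weights * station_values) / np.sum(weights))

# Hypothetical example: estimate PM2.5 at a location from three nearby stations.
pm25 = idw_estimate((10.77, 106.70),
                    [(10.80, 106.65), (10.75, 106.72), (10.70, 106.68)],
                    [35.0, 42.0, 50.0])
```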
Big data fusion and parametrization for strategic transport models (Luuk Brederode)
Presentation at the European transport conference 2019 (Dublin);
also presented at the 6th International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS) Krakow, Poland (2019).
Accompanying paper: https://doi.org/10.1109/MTITS.2019.8883333
A LOCALITY SENSITIVE LOW-RANK MODEL FOR IMAGE TAG COMPLETION (Nexgen Technology)
In February 2017, the University of Sherbrooke invited me to talk about deep learning and my professional experience to students of the Master's program. These slides are extracted from my original presentation.
Most existing high-performance co-segmentation algorithms are usually complex due to the way of co-labelling a set of images as well as the common need to fine-tune a few parameters for effective co-segmentation. In this paper, instead of following the conventional way of co-labelling multiple images, we propose to first exploit inter-image information through co-saliency, and then perform single-image segmentation on each individual image. To make the system robust and to avoid heavy dependence on one single saliency extraction method, we propose to apply multiple existing saliency extraction methods to each image to obtain diverse saliency maps. Our major contribution lies in the proposed method that fuses the obtained diverse saliency maps by exploiting the inter-image information, which we call saliency co-fusion. Experiments on five benchmark datasets with eight saliency extraction methods show that our saliency co-fusion based approach achieves competitive performance even without parameter fine-tuning when compared with the state-of-the-art methods.
This presentation compares two wavelet image compression techniques, STW and SPIHT. The compression is performed on a black & white image using the MATLAB wavelet tool. PSNR, MSE, CR, and compressed size are used as the comparison parameters.
Presenter: Jeong-Mo Hong (Professor, Dongguk University)
Date: 18.5.
The latest machine learning technologies, represented by deep learning, are opening breakthroughs toward artificial intelligence software across a vast range of application areas, and they are expected to play a particularly large role in applications related to image processing and computer graphics. This seminar looks at how deep learning techniques are evolving, with a focus on three-dimensional geometric data, and considers the impact on related industries and possible ways to respond.
Professor Jeong-Mo Hong has been with the Department of Computer Science and Engineering at Dongguk University since 2008. He completed his bachelor's and master's degrees in mechanical engineering at KAIST; during his master's he studied virtual-reality simulators of the kind now called 4D and developed a virtual-experience simulation game for a ride-on robot. After earning a Ph.D. in computer science at Korea University for research on fluid simulation for visual effects, he carried out full-scale VFX research on destruction, explosions and flames as a researcher at Stanford University. He has put considerable effort into industry-academia collaboration, serving as a technical advisor on numerous productions including 'Haeundae', 'Sector 7' and 'Detective Dee 2'. Expanding his research into digital manufacturing, he developed the modeling software '리쏘피아', which continues to be used by 3D-printer users and entrepreneurs around the world. Through this work he came to feel the limits of traditional software techniques and is now seeking breakthroughs in modeling and content creation using deep learning and machine learning. He has published the video lecture series 'Deep Learning with C++' ('C++로 배우는 딥러닝') and proactively brings the latest techniques into his university courses, working to train advanced software engineers for the era of the fourth industrial revolution.
We introduce a new algorithm for image segmentation based on crowdsourcing through a game: Ask'nSeek. The game provides information on the objects of an image, in the form of clicks that are either on the object or on the background. These logs are then used to determine the best segmentation for an object among a set of candidates generated by the state-of-the-art CPMC algorithm. We also introduce a simulator that allows the generation of game logs and therefore gives insight into the number of games needed on an image to obtain an acceptable segmentation.
Presented by Amaia Salvador in CrowdMM 2013 (http://crowdmm.org/).
More info:
https://imatge.upc.edu/web/publications/crowdsourced-object-segmentation-game-0
FACE RECOGNITION USING PRINCIPAL COMPONENT ANALYSIS WITH MEDIAN FOR NORMALIZA... (csandit)
Recognizing faces helps to name the various subjects present in an image. This work focuses on labeling faces in an image that includes human faces of various age groups (a heterogeneous set). Principal component analysis finds the mean of the data set and subtracts the mean value from the data set with the intention of normalizing the data. Normalization, with respect to an image, is the removal of common features from the data set. This work brings in the novel idea of deploying the median, another measure of central tendency, for normalization rather than the mean. The work was implemented using MATLAB. Results show that the median is the best measure for normalization for a heterogeneous data set that gives rise to outliers.
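A minimal numpy sketch of the idea described above, centring the flattened face vectors with the median instead of the mean before extracting the principal components; variable names are illustrative and this is not the authors' MATLAB code:

```python
import numpy as np

def pca_with_median_centering(faces: np.ndarray, n_components: int):
    """faces: (n_samples, n_pixels) matrix of flattened face images.
    Centre with the median (instead of the mean) and return the top eigenfaces."""
    median_face = np.median(faces, axis=0)
    centered = faces - median_face                     # median-based normalization
    # Principal directions via SVD of the centred data matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = vt[:n_components]
    projections = centered @ eigenfaces.T              # features used for matching
    return median_face, eigenfaces, projections
```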
Image Contrast Enhancement for Brightness Preservation Based on Dynamic Stret... (CSCJournals)
Histogram equalization is an efficient process often employed in consumer electronic systems for image contrast enhancement. In addition to an increase in contrast, it is also required to preserve the mean brightness of an image in order to convey the true scene information to the viewer. A conventional approach is to separate the image into sub-images and then process independently by histogram equalization towards a modified profile. However, due to the variations in image contents, the histogram separation threshold greatly influences the level of shift in mean brightness with respect to the uniform histogram in the equalization process. Therefore, the choice of a proper threshold, to separate the input image into sub-images, is very critical in order to preserve the mean brightness of the output image. In this research work, a dynamic range stretching approach is adopted to reduce the shift in output image mean brightness. Moreover, the computationally efficient golden section search algorithm is applied to obtain a proper separation into sub-images to preserve the mean brightness. Experiments were carried out on a large number of color images of natural scenes. Results, as compared to current available approaches, showed that the proposed method performed satisfactorily in terms of mean brightness preservation and enhancement in image contrast.
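For reference, the conventional step this work builds on, separating the image at a threshold and equalizing each sub-image within its own intensity range, can be sketched as below; the paper's dynamic range stretching and golden section search are not reproduced here:

```python
import numpy as np

def bi_histogram_equalize(gray: np.ndarray, threshold: int) -> np.ndarray:
    """Equalize the sub-image at/below the threshold and the sub-image above it
    separately, each mapped back into its own intensity range."""
    out = np.zeros_like(gray)
    for lo, hi in [(0, threshold), (threshold + 1, 255)]:
        mask = (gray >= lo) & (gray <= hi)
        vals = gray[mask].astype(np.int64)
        if vals.size == 0:
            continue
        hist = np.bincount(vals - lo, minlength=hi - lo + 1)
        cdf = np.cumsum(hist) / vals.size
        out[mask] = (lo + np.round(cdf[vals - lo] * (hi - lo))).astype(gray.dtype)
    return out
```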
Introduction to Model-Based Machine Learning (Daniel Emaasit)
The field of machine learning has seen the development of thousands of learning algorithms. Typically, scientists choose from these algorithms to solve specific problems, their choices often being limited by their familiarity with the algorithms. In this classical/traditional framework of machine learning, scientists are constrained to making certain assumptions so as to use an existing algorithm. This is in contrast to the model-based machine learning approach, which seeks to create a bespoke solution tailored to each new problem.
An Enhanced Model for Inpainting on Digital Images Using Dynamic Masking (Md. Shohel Rana)
Given an image with significant portions missing or damaged, the damaged regions to be inpainted are detected dynamically and the missing regions are reconstituted with data consistent with the rest of the image. The proposed method restores damaged areas of the image with reduced processing time and without blurring the output.
Classifying hot water chemistry: Application of multivariate statistics (Dasapta Erwin Irawan)
The following paper is a trial application of multivariate analysis (regression tree, principal component analysis, and cluster analysis) for classifying hot water chemistry. The number of samples analysed was 11 (including three cold water samples), taken from three Gorontalo geothermal sites (Boalemo, Pohuwato, and Gorontalo Regency).
The regression tree technique failed to read the data structure due to collinearity effects, so PCA and cluster analysis were applied instead. We used open-source R statistical packages for the calculations.
This technique classifies the hot water samples into three major clusters: cluster 1 (hot water from Diloniyohu-Boalemo), cluster 2 (combining hot water from Tungo and Dulangeya-Boalemo and cold water from Dulangeya-Boalemo), and cluster 3 (cold water from Pohuwato and Diloniyohu-Boalemo). According to the results, the hot water from Boalemo comprises two systems, a distinct geothermal system and a system mixing with meteoric water, while the hot water from Pohuwato shows little or no mixing with meteoric water.
The statistical approach is able to detect closed and open geothermal systems based on the data structure. This robust method should be applied to more geothermal systems with larger datasets to assess its performance.
MediaEval 2016 - UNIFESP Predicting Media Interestingness Task (multimediaeval)
Presenter: Samuel G. Fadel
UNIFESP at MediaEval 2016: Predicting Media Interestingness Task In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Jurandy Almeida
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_28.pdf
Video: https://youtu.be/YLthKNczlcA
Abstract: This paper describes the approach proposed by UNIFESP for the MediaEval 2016 Predicting Media Interestingness Task and for its video subtask only. The proposed approach is based on combining learning-to-rank algorithms for predicting the interestingness of videos by their visual content.
MediaEval 2016 - MLPBOON Predicting Media Interestingness System (multimediaeval)
Presenter: Jayneel Parekh
The MLPBOON Predicting Media Interestingness System for MediaEval 2016 In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Jayneel Parekh, Sanjeel Parekh
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_25.pdf
Video: https://youtu.be/nAnrdYiy7nc
Abstract: This paper describes the system developed by team MLPBOON for MediaEval 2016 Predicting Media Interestingness Image Subtask. After experimenting with various features and classifiers on the development dataset, our final system involves use of CNN features (fc7 layer of AlexNet) for the input representation and logistic regression as the classifier. For the proposed method, the MAP for the best run reaches a value of 0.229.
MediaEval 2016 - HUCVL Predicting Interesting Key Frames with Deep Models (multimediaeval)
Presenter: Göksu Erdoğan
HUCVL at MediaEval 2016: Predicting Interesting Key Frames with Deep Models In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Goksu Erdogan, Aykut Erdem, Erkut Erdem
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_18.pdf
Video: https://youtu.be/A--6O2v81Cw
Abstract: In MediaEval 2016, we focus on the image interestingness subtask which involves predicting interesting key frames of a video in the form of a movie trailer. We specifically propose three different deep models for this subtask. The first two models are based on fine-tuning two pretrained models, namely AlexNet and MemNet, where we cast the interestingness prediction as a regression problem. Our third deep model, on the other hand, depends on a triplet network which is comprised of three instances of the same feedforward network with shared weights, and trained according to a triplet ranking loss. Our experiments demonstrate that all these models provide relatively similar and promising results on the image interestingness subtask.
Template matching is a basic method in image analysis to extract useful information from images. In this paper, we suggest a new method for pattern matching. Our method transforms the template image from a two-dimensional image into a one-dimensional vector. All sub-windows (of the same size as the template) in the reference image are likewise transformed into one-dimensional vectors. Three similarity measures, SAD, SSD, and Euclidean distance, are used to compute the likeness between the template and all sub-windows in the reference image to find the best match. The experimental results show the superior performance of the proposed method over conventional methods on various templates of different sizes.
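A minimal sketch of the matching scheme described above: each sub-window is flattened to a one-dimensional vector and scored against the flattened template with SAD, SSD, or Euclidean distance (an illustrative implementation, not the authors' code):

```python
import numpy as np

def best_match(reference: np.ndarray, template: np.ndarray, measure: str = "sad"):
    """Slide the template over the reference image; flatten each sub-window to a
    1-D vector and score it against the flattened template. Returns ((row, col), score)."""
    th, tw = template.shape
    t = template.astype(np.float64).ravel()
    best_score, best_pos = np.inf, (0, 0)
    for r in range(reference.shape[0] - th + 1):
        for c in range(reference.shape[1] - tw + 1):
            w = reference[r:r + th, c:c + tw].astype(np.float64).ravel()
            d = w - t
            if measure == "sad":            # sum of absolute differences
                score = np.abs(d).sum()
            elif measure == "ssd":          # sum of squared differences
                score = (d ** 2).sum()
            else:                           # Euclidean distance
                score = np.sqrt((d ** 2).sum())
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```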
Modeling Crude Oil Prices (CPO) using General Regression Neural Network (GRNN) (AI Publications)
Modeling a time series is often associated with forecasting certain characteristics in the next period. One of the forecasting methods being developed nowadays uses artificial neural networks. Using a neural network for time-series forecasting can be a good solution, but the problem is choosing the right network architecture and training method. The General Regression Neural Network (GRNN) is a radial-basis network model used to approximate a function. GRNN is a neural network model with a fast solution, because no iterative estimation of the weights is needed. The model has a network architecture in which the number of units in the pattern layer matches the number of input data points. One application of GRNN is predicting crude oil prices with a GRNN model. From training and testing on the data, the RMSE obtained was 1.9355 for testing and 1.1048 for training. The model is suitable for giving fairly accurate predictions, as shown by how close the targets are to the outputs.
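A minimal sketch of the GRNN prediction rule the abstract refers to, where each training sample acts as one pattern-layer unit and the output is a Gaussian-kernel-weighted average of the training targets; the smoothing parameter here is arbitrary, not the paper's setting:

```python
import numpy as np

def grnn_predict(x_train: np.ndarray, y_train: np.ndarray,
                 x_query: np.ndarray, sigma: float = 0.5) -> np.ndarray:
    """General Regression Neural Network: one pattern unit per training sample,
    no iterative weight training."""
    # Squared distances between each query point and each training point.
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-d2 / (2.0 * sigma ** 2))            # kernel activations
    return (k @ y_train) / k.sum(axis=1)            # weighted average of targets
```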
Machine Learning techniques for the Task Planning of the Ambulance Rescue Team (Francesco Cucari)
The RoboCup Rescue simulation models an earthquake in an urban centre presented in the form of a map. The goal of this project is to develop a machine learning technique able to predict the expected time of death (ETD) of civilians and use it in the task planning of the ambulance team in order to save the maximum number of civilians.
Despite the availability of radiology devices in some health care centers, thorax diseases are considered one of the most common health problems, especially in rural areas. By exploiting the power of the Internet of Things and dedicated platforms to analyze a large volume of medical data, a patient's health could be assessed earlier. In this paper, the proposed model is based on a pre-trained ResNet-50 for diagnosing thorax diseases. Chest X-ray images are cropped to extract the rib-cage part of the chest radiographs. ResNet-50 was re-trained on the ChestX-ray14 dataset, where chest radiograph images are fed into the model to determine whether the person is healthy or not. In the case of an unhealthy patient, the model can classify the disease into one of fourteen chest diseases. The results show the ability of ResNet-50 to achieve impressive performance in classifying thorax diseases.
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S... (multimediaeval)
Presenter: Giorgos Kordopatis-Zilos
Placing Images with Refined Language Models and Similarity Search with PCA-reduced VGG Features In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands, October 20-21, CEUR-WS.org (2016) by Giorgos Kordopatis-Zilos, Adrian Popescu, Symeon Papadopoulos, Yiannis Kompatsiaris
Paper: http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_13.pdf
Video: https://youtu.be/WR4I3CWjcR4
Abstract: We describe the participation of the CERTH/CEA-LIST team in the MediaEval 2016 Placing Task. We submitted five runs to the estimation-based sub-task: one based only on text by employing a Language Model-based approach with several refinements, one based on visual content, using geospatial clustering over the most visually similar images, and three based on a hybrid scheme exploiting both visual and textual cues from the multimedia items, trained on datasets of different size and origin. The best results were obtained by a hybrid approach trained with external training data and using two publicly available gazetteers.
This article presents a new algorithm for long-term tracking of moving objects. We try to overcome some potential difficulties, first through a comparative study of methods for measuring the difference and the similarity between the template and the source image. In the second part, an improvement of the best method allows us to follow the target robustly. This method also allows us to effectively overcome the problems of geometric deformation, partial occlusion, and recovery after the target leaves the field of view. The originality of our algorithm is based on a new model that does not depend on a probabilistic process and does not require prior data-based detection. Experimental results on several difficult video sequences have shown performance advantages over many recent trackers. The developed algorithm can be employed in several applications such as video surveillance, active vision, or industrial visual servoing.
TRANSFER LEARNING BASED IMAGE VISUALIZATION USING CNN (gerogepatton)
Image classification is a popular machine-learning-based application of deep learning. Deep learning techniques are popular because they can effectively operate on image data at large scale. In this paper, a CNN model was designed to better classify images. We use the feature-extraction part of the Inception v3 model to compute feature vectors and retrain the classification layer on these feature vectors. Using this transfer-learning mechanism, the classification layer of the CNN model was trained with 20 classes of the Caltech101 image dataset and the 17 classes of the Oxford 17 Flower image dataset. After training, the network was evaluated with test images from the Oxford 17 Flower dataset and the Caltech101 dataset. The mean testing precision of the neural network architecture was 98% with the Caltech101 dataset and 92.27% with the Oxford 17 Flower image dataset.
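A minimal Keras sketch of the transfer-learning recipe described above: a frozen Inception v3 backbone used as a feature extractor with only a new classification layer trained on top. The class count and the commented-out data loading are placeholders, not the authors' exact setup:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

NUM_CLASSES = 20   # e.g. the 20 Caltech101 classes mentioned in the abstract

# Frozen Inception v3 backbone used purely as a feature extractor.
backbone = InceptionV3(weights="imagenet", include_top=False, pooling="avg",
                       input_shape=(299, 299, 3))
backbone.trainable = False

# Only this classification head is trained (the transfer-learning step).
model = models.Sequential([
    backbone,
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, validation_data=(val_images, val_labels))
```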
Presentation of the joint participation between CERTH and CEA LIST in the 2015 edition of the MediaEval Placing Task in Wurzen, Germany, September 14-15, 2015.
Realtime face matching and gender prediction based on deep learning (IJECEIAES)
Face analysis is an essential topic in computer vision, dealing with human faces for recognition or prediction tasks. The face is one of the easiest ways to distinguish people's identities. Face recognition is a type of personal identification system that employs a person's personal traits to determine their identity. A human face recognition scheme generally consists of four steps, namely face detection, alignment, representation, and verification. In this paper, we propose to extract information from the human face for several tasks based on a recent advanced deep learning framework. The proposed approach outperforms the state-of-the-art results.
FACE RECOGNITION USING DIFFERENT LOCAL FEATURES WITH DIFFERENT DISTANCE TECHN... (IJCSEIT Journal)
A face recognition system using different local features with different distance measures is proposed in this paper. The proposed method is fast and gives accurate detection. The feature vector is based on eigenvalues, eigenvectors, and diagonal vectors of sub-images. Images are partitioned into sub-images to detect local features, and the sub-partitions are rearranged into vertical and horizontal matrices. Eigenvalues, eigenvectors, and diagonal vectors are computed for these matrices, and a global feature vector is generated for face recognition. Experiments are performed on the benchmark YALE face database. Results indicate that the proposed method gives better recognition performance in terms of average recognition rate and retrieval time compared to existing methods.
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS (csandit)
The ability to mine and extract useful information automatically from large datasets has been a common concern for organizations (with large datasets) over the last few decades. On the internet, data is increasing rapidly, and consequently the capacity to collect and store very large data is growing significantly. Existing clustering algorithms are not always efficient and accurate in solving clustering problems for large datasets, and the development of accurate and fast data classification algorithms for very large-scale datasets is still a challenge. In this paper, various algorithms and techniques, in particular an approach using a non-smooth optimization formulation of the clustering problem, are proposed for solving the minimum sum-of-squares clustering problem in very large datasets. This research also develops an accurate and real-time L2-DC algorithm, based on the incremental approach, to solve the minimum sum-of-squares clustering problem.
Sports Video Classification: Classification of Strokes in Table Tennis for Me... (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper2.pdf
YouTube: https://youtu.be/-bRL868b8ys
Pierre-Etienne Martin, Jenny Benois-Pineau, Boris Mansencal, Renaud Péteri, Laurent Mascarilla, Jordan Calandre and Julien Morlier : Sports Video Classification: Classification of Strokes in Table Tennis for MediaEval 2020. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Fine-grained action classification raises new challenges compared to classical action classification problems. Sports video analysis is a very popular research topic, due to the variety of application areas, ranging from multimedia intelligent devices with user-tailored digests up to the analysis of athletes' performances. Running since 2019 as a part of MediaEval, we offer a task which consists of classifying table tennis strokes from videos recorded in natural conditions at the University of Bordeaux. The aim is to build tools for teachers, coaches and players to analyse table tennis games. Such tools could lead to automatic profiling of the players and adaptation of their training to improve their sport skills more efficiently.
Presented by: Pierre-Etienne Martin
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention... (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper61.pdf
YouTube: https://youtu.be/brmI4g3jLS4
Ricardo Kleinlein, Cristina Luna-Jiménez, Fernando Fernández-Martínez and Zoraida Callejas : Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention and LSTM Models. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper reports on the GTH-UPM team's experience in the Predicting Media Memorability task at MediaEval 2020. Teams were asked to predict both short-term and long-term memorability scores, understanding such a score as a measure of whether a video endures in a viewer's memory or not. Our proposed system relies on a late fusion of the scores predicted by three sequential models, each trained on a different modality: video captions, aural embeddings and visual optical-flow-based vectors. Whereas single-modality models show a low or zero Spearman correlation coefficient, their combination considerably boosts performance over development data, up to 0.2 in the short-term memorability prediction subtask and 0.19 in the long-term subtask. However, performance over test data drops to 0.016 and -0.041, respectively.
Presented by: Ricardo Kleinlein
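A minimal sketch of the two ingredients named in the abstract, a weighted late fusion of per-modality memorability scores and the Spearman correlation used to evaluate them; the weights and score values below are placeholders, not the team's tuned configuration:

```python
import numpy as np
from scipy.stats import spearmanr

def late_fusion(scores_by_modality, weights):
    """Weighted average of the memorability scores predicted by each modality."""
    weights = np.asarray(weights, dtype=float)
    stacked = np.stack(scores_by_modality, axis=0)          # (n_modalities, n_videos)
    return (weights[:, None] * stacked).sum(axis=0) / weights.sum()

# Placeholder predictions from caption, audio and optical-flow models for 3 videos.
fused = late_fusion([np.array([0.82, 0.64, 0.91]),
                     np.array([0.75, 0.70, 0.88]),
                     np.array([0.80, 0.60, 0.95])], weights=[0.5, 0.25, 0.25])
rho, _ = spearmanr(fused, np.array([0.85, 0.62, 0.93]))     # against ground truth
```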
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper52.pdf
Janadhip Jacutprakart, Rukiye Savran Kiziltepe, John Q. Gan, Giorgos Papanastasiou and Alba G. Seco de Herrera : Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we present our approach and the main results from the Essex NLIP team's participation in the MediaEval 2020 Predicting Media Memorability task. The task requires participants to build systems that can predict short-term and long-term memorability scores on the real-world video samples provided. The focus of our approach is on the use of colour-based visual features as well as the video annotation metadata; in addition, hyper-parameter tuning was explored. Despite the simplicity of the methodology, our approach achieves competitive results. We investigated the use of different visual features and assessed the performance of the memorability predictions through various regression models, with Random Forest regression as our final model for predicting the memorability of videos.
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V... (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper6.pdf
YouTube: https://youtu.be/ySGGu_4vaxs
Alba García Seco De Herrera, Rukiye Savran Kiziltepe, Jon Chamberlain, Mihai Gabriel Constantin, Claire-Hélène Demarty, Faiyaz Doctor, Bogdan Ionescu and Alan F. Smeaton : Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a Video Memorable? Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper describes the MediaEval 2020 Predicting Media Memorability task. After first being proposed at MediaEval 2018, the Predicting Media Memorability task is in its 3rd edition this year, as the prediction of short-term and long-term video memorability (VM) remains a challenging task. In 2020, the format remained the same as in previous editions. This year the videos are a subset of the TRECVid 2019 Video to Text dataset, containing more action-rich video content as compared with the 2019 task. In this paper, a description of some aspects of this task is provided, including its main characteristics, a description of the collection, the ground truth dataset, the evaluation metrics, and the requirements for run submission.
Presented by: Rukiye Savran Kiziltepe
Fooling an Automatic Image Quality Estimator (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper45.pdf
Benoit Bonnet, Teddy Furon and Patrick Bas : Fooling an Automatic Image Quality Estimator. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper we present our work on the 2020 MediaEval task "Pixel Privacy: Quality Camouflage for Social Images". Blind Image Quality Assessment (BIQA) is a classifier that returns a quality score for any given image. Our task is to modify an image to decrease its BIQA score while maintaining a good perceived quality. Since BIQA is a deep neural network, we took an adversarial-attack approach to the problem.
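A minimal sketch of the gradient-based idea: treat a differentiable quality estimator as the target and take small signed-gradient steps that lower its score while keeping the perturbation within an L-infinity budget. The `biqa_model` below is a placeholder for any differentiable BIQA network, and this is not the authors' exact attack:

```python
import torch

def quality_attack(image: torch.Tensor, biqa_model: torch.nn.Module,
                   epsilon: float = 2.0 / 255, steps: int = 10) -> torch.Tensor:
    """Iteratively nudge the image to decrease the predicted quality score,
    keeping the perturbation within an L-infinity budget of epsilon."""
    original = image.detach()
    adv = original.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        score = biqa_model(adv.unsqueeze(0)).squeeze()   # predicted quality (scalar)
        grad = torch.autograd.grad(score, adv)[0]
        with torch.no_grad():
            adv = adv - (epsilon / steps) * grad.sign()  # step that lowers the score
            adv = original + (adv - original).clamp(-epsilon, epsilon)
            adv = adv.clamp(0.0, 1.0)                    # keep a valid image
    return adv.detach()
```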
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C... (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper16.pdf
YouTube: https://youtu.be/ix_b9K7j72w
Zhengyu Zhao : Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable Color Filter. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper presents the submission of our RU-DS team to the Pixel Privacy Task 2020. We propose to fool the blind image quality assessment model by transforming images based on optimizing a human-understandable color filter. In contrast to the common work that relies on small, $L_p$-bounded additive pixel perturbations, our approach yields large yet smooth perturbations. Experimental results demonstrate that in the specific context of this task, our approach is able to achieve strong adversarial effects, but has to sacrifice the image appeal.
Presented by: Zhengyu Zhao
Pixel Privacy: Quality Camouflage for Social Images (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper77.pdf
YouTube: https://youtu.be/8Rr4KknGSac
Zhuoran Liu, Zhengyu Zhao, Martha Larson and Laurent Amsaleg : Pixel Privacy: Quality Camouflage for Social Images. Proc. of MediaEval 2020, 14-15 December 2020, Online.
High-quality social images shared online can be misappropriated for unauthorized goals, where the quality filtering step is commonly carried out by automatic Blind Image Quality Assessment (BIQA) algorithms. Pixel Privacy benchmarks privacy-protective approaches that protect privacy-sensitive images against unethical computer vision algorithms. In the 2020 task, participants are encouraged to develop camouflage methods that effectively decrease the BIQA quality score of high-quality images while maintaining image appeal. The camouflage needs to be either imperceptible to the human eye or applied as a visible enhancement.
Presented by: Zhuoran Liu
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper72.pdf
Sabarinathan D and Suganya Ramamoorthy : Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attention Unit. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Colorectal cancer is the third most common cause of cancer worldwide, and identifying it in its early stages remains a challenging problem for the medical industry. The risk of colorectal cancer can be reduced by early detection of polyps during a colonoscopy, but the disease and its symptoms vary widely and demand continuous updating of knowledge by doctors and medical analysts; the diseases fall into different categories, and small variations in symptoms may indicate a much higher risk. Inspired by these issues, the main objective of this paper is to develop a multi-supervision net algorithm for segmenting polyps on a comprehensive dataset. We use the Medico polyp challenge dataset, which consists of 1000 segmented polyp images from the gastrointestinal tract. We propose EfficientNet-B4 as the pre-trained backbone in the multi-supervision net, and the model is trained with multiple output layers. We present quantitative results on the colorectal dataset and achieve good results on all performance metrics. The experimental results show that the proposed model is robust and segments polyps with a good level of accuracy on a comprehensive dataset across metrics such as Dice coefficient, Recall, Precision, and F2.
Deep Conditional Adversarial learning for polyp Segmentationmultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper22.pdf
Debapriya Banik and Debotosh Bhattacharjee : Deep Conditional Adversarial learning for polyp Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This approach addresses the Medico automatic polyp segmentation challenge, which is a part of MediaEval 2020. We propose a deep conditional adversarial learning network for the automatic polyp segmentation task. The network comprises two interdependent models, namely a generator and a discriminator. The generator is an FCN employed to predict the polyp mask, while the discriminator enforces the segmentation to be as similar as possible to the real segmented mask (ground truth). Our proposed model achieved competitive results on the test dataset provided by the organizers of the challenge.
A Temporal-Spatial Attention Model for Medical Image Detectionmultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper21.pdf
Hwang Maxwell, Wu Cai, Hwang Kao-Shing, Xu Yong Si and Wu Chien-Hsing : A Temporal-Spatial Attention Model for Medical Image Detection. Proc. of MediaEval 2020, 14-15 December 2020, Online.
A local region model with attentive temporal-spatial pathways is proposed for automatically learning various target structures. The attentive spatial pathway highlights the salient region to generate bounding boxes and ignores irrelevant regions in the input image. The proposed attention mechanism allows efficient object localization and increases overall predictive performance by reducing false positives in object detection for manually annotated medical images. The experimental results show that the proposed models consistently improve the base architectures' predictive performance across different datasets and training sizes without undue computational cost.
Fine-tuning for Polyp Segmentation with Attentionmultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper15.pdf
Rabindra Khadka : Transfer of Knowledge: Fine-tuning for Polyp Segmentation with Attention. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper describes how the transfer of prior knowledge, helped by an attention mechanism, can effectively take on segmentation tasks. A UNet model pretrained on a brain MRI dataset was fine-tuned with the polyp dataset. An attention mechanism was integrated to focus on relevant regions in the input images. The implemented architecture is evaluated on 200 validation images using the intersection over union and Dice score between the ground truth and the predicted region. The model demonstrates promising results with computational efficiency.
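The summary above does not specify the exact attention design, so the following is a minimal sketch of an additive attention gate in the style of Attention U-Net, with illustrative channel sizes; it is not the author's implementation.

# Hedged sketch of an additive attention gate applied to a UNet skip connection.
# Channel sizes and this particular gating form are assumptions for illustration.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, gate_ch, skip_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.w_x = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, gate, skip):
        # attention coefficients from decoder (gate) and encoder (skip) features
        attn = self.psi(torch.relu(self.w_g(gate) + self.w_x(skip)))
        return skip * attn          # suppress irrelevant regions in the skip connection

g = torch.randn(1, 64, 32, 32)      # decoder feature map
x = torch.randn(1, 64, 32, 32)      # encoder skip feature map of the same spatial size
print(AttentionGate(64, 64, 32)(g, x).shape)   # torch.Size([1, 64, 32, 32])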
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper12.pdf
Adrian Krenzer and Frank Puppe : Bigger Networks are not Always Better: Deep Convolutional Neural Networks for Automated Polyp Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper presents our team's (AI-JMU) approach to the Medico automated polyp segmentation challenge. We consider deep convolutional neural networks to be well suited for this task. To determine the best architecture we test and compare state-of-the-art backbones and two different heads. Finally, we achieve a Jaccard index of 73.74% on the challenge test set. We further demonstrate that bigger networks do not always perform better. However, growing network size always increases the computational complexity.
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper51.pdf
Amel Ksibi, Amina Salhi, Ala Alluhaidan and Sahar A. El-Rahman : Insights for wellbeing: Predicting Personal Air Quality Index using Regression Approach. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Providing air pollution information to individuals enables them to understand the air quality of their living environments. Thus, the association between people’s wellbeing and the properties of the surrounding environment is an essential area of investigation. This paper proposes air quality prediction by harvesting public/open data and leveraging them to obtain the Personal Air Quality Index. These data are usually incomplete, so to cope with missing values we applied the KNN imputation method. To predict the Personal Air Quality Index, we apply a voting regression approach based on three base regressors: a Gradient Boosting regressor, a Random Forest regressor, and a linear regressor. Evaluating the experimental results using the RMSE metric, we obtained average scores of 35.39 for Walker and 51.16 for Car.
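As a rough sketch of the pipeline described above (KNN imputation followed by a voting ensemble of the three base regressors), the snippet below uses scikit-learn; the feature columns and hyper-parameters are illustrative assumptions, not values from the paper.

# Hedged sketch: KNN imputation of missing sensor values, then a voting
# ensemble of Gradient Boosting, Random Forest, and linear regression.
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X = np.array([[12.0, 35.0, np.nan], [8.0, np.nan, 21.0], [15.0, 42.0, 30.0]])  # e.g. PM2.5, NO2, O3 readings
y = np.array([55.0, 40.0, 70.0])                                               # personal AQI labels

model = make_pipeline(
    KNNImputer(n_neighbors=2),          # fill missing pollutant readings from nearest samples
    VotingRegressor([
        ("gbr", GradientBoostingRegressor()),
        ("rf", RandomForestRegressor()),
        ("lr", LinearRegression()),
    ]),                                 # average the three base predictions
)
model.fit(X, y)
print(model.predict([[10.0, np.nan, 25.0]]))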
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper40.pdf
YouTube: https://youtu.be/SL5Hvu1mARY
Trung-Quan Nguyen, Dang-Hieu Nguyen and Loc Tai Tan Nguyen : Use Visual Features From Surrounding Scenes to Improve Personal Air Quality Data Prediction Performance. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we propose a method to predict the personal air quality index in an area by combining the levels of the following pollutants: PM2.5, NO2, and O3, measured at the weather stations near that area, with photos of the surrounding scenes taken in that area. Our approach uses the Inverse Distance Weighted (IDW) technique to estimate the missing air pollutant levels and then uses regression to integrate visual features from the photos taken in order to refine the predicted values. After that, we use those values to calculate the Air Quality Index (AQI). The results show that the proposed method may not improve the prediction performance in some cases.
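A minimal sketch of the Inverse Distance Weighted (IDW) estimation mentioned above; the station coordinates, readings, and power parameter are illustrative, not data from the paper.

# Minimal IDW sketch: estimate a pollutant level at a target location from
# surrounding stations, with closer stations weighted more heavily.
import numpy as np

def idw(target, station_coords, station_values, power=2.0):
    d = np.linalg.norm(station_coords - target, axis=1)
    if np.any(d == 0):                        # exact hit on a station
        return float(station_values[np.argmin(d)])
    w = 1.0 / d ** power                       # inverse-distance weights
    return float(np.sum(w * station_values) / np.sum(w))

stations = np.array([[10.76, 106.66], [10.80, 106.70], [10.73, 106.72]])  # lat, lon (illustrative)
pm25 = np.array([38.0, 52.0, 45.0])                                        # PM2.5 at each station
print(idw(np.array([10.77, 106.68]), stations, pm25))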
Overview of MediaEval 2020 Insights for Wellbeing: Multimodal Personal Health...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper11.pdf
YouTube: https://youtu.be/fBPuacAZkxs
Minh-Son Dao, Peijiang Zhao, Thanh Nguyen, Thanh Binh Nguyen, Duc Tien Dang Nguyen and Cathal Gurrin : Overview of MediaEval 2020 Insights for Wellbeing: Multimodal Personal Health Lifelog Data Analysis. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper provides a description of the MediaEval 2020 "Multimodal Personal Health Lifelog Data Analysis" task. The purpose of this task is to develop approaches that process environmental data to obtain insights about personal wellbeing. Establishing the association between people’s wellbeing and the properties of the surrounding environment is vital for numerous research areas. Our task focuses on the internal associations of heterogeneous data. Participants create systems that derive health- and wellbeing-relevant insights from multimodal lifelog data in order to tackle two challenging subtasks. The first subtask investigates whether public/open data can be used to predict personal air pollution data. The second subtask develops approaches to predict the personal air quality index (AQI) using images captured by people (plus GAQD). This task targets (but is not limited to) researchers in the areas of multimedia information retrieval, machine learning, AI, data science, event-based processing and analysis, multimodal multimedia content analysis, lifelog data analysis, urban computing, environmental science, and atmospheric science.
Presented by: Peijiang Zhao
Ensemble based method for the classification of flooding event using social m...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper37.pdf
YouTube: https://youtu.be/4ROoOzdQzEI
Muhammad Hanif, Huzaifa Joozer, Muhammad Atif Tahir and Muhammad Rafi : Ensemble based method for the classification of flooding event using social media data. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper presents the method proposed and implemented by team FAST-NU-DS in "The Flood-related Multimedia Task at MediaEval 2020". The task includes tweets in the Italian language, collected during floods between 2017 and 2019. The proposed method uses the text of a tweet and its associated image for binary classification, identifying whether or not the particular tweet refers to a flood incident. We designed an ensemble-based method for classifying tweets on the basis of textual data, visual data, and the combination of both. For visual data, we used data augmentation to oversample the minority class and applied stratified random sampling for the selection of the input. Moreover, Visual Geometry Group (VGG16) convolutional neural networks, pretrained on ImageNet and Places365, were used. For classification of textual data, Term Frequency-Inverse Document Frequency (TF-IDF) was used for feature representation and a Multinomial Naive Bayes classifier for class prediction. The image and text predictions are combined for the prediction of each instance. Evaluation of the method yielded F1-scores of 36.31%, 20.76%, and 27.86% for text, image, and the combination of both, respectively.
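A hedged sketch of the textual branch described above (TF-IDF features with a Multinomial Naive Bayes classifier); the example tweets are invented placeholders, not data from the task.

# Hedged sketch of the TF-IDF + Multinomial Naive Bayes text classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = [
    "alluvione nel centro città, strade allagate",   # flood-related
    "bella giornata di sole al parco",               # not flood-related
    "fiume esondato, molte case evacuate",           # flood-related
    "partita rinviata per motivi organizzativi",     # not flood-related
]
labels = [1, 0, 1, 0]

text_clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
text_clf.fit(tweets, labels)
print(text_clf.predict(["strada chiusa per allagamento"]))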
Presented by: Muhammad Hanif
Flood Detection via Twitter Streams using Textual and Visual Featuresmultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper35.pdf
Firoj Alam, Zohaib Hassan, Kashif Ahmad, Asma Gul, Michael Reiglar, Nicola Conci and Ala Al-Fuqaha : Flood Detection via Twitter Streams using Textual and Visual Features. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The paper presents our proposed solutions for the MediaEval 2020 Flood-Related Multimedia Task, which aims to analyze and detect flooding events in multimedia content shared over Twitter. In total, we proposed four different solutions: a multi-modal solution combining textual and visual information for the mandatory run, and three single-modal image- and text-based solutions as optional runs. In the multi-modal method, we rely on a supervised multimodal bitransformer model that combines textual and visual features in an early fusion, achieving a micro F1-score of .859 on the development data set. For text-based flood event detection, we use a transformer network (i.e., a pretrained Italian BERT model), achieving an F1-score of .853. For image-based solutions, we employed multiple deep models, pre-trained on both the ImageNet and Places data sets, individually and combined in an early fusion, achieving F1-scores of .816 and .805 on the development set, respectively.
Floods Detection in Twitter Text and Imagesmultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper34.pdf
YouTube: https://youtu.be/3f_Q1WeulbI
Naina Said, Kashif Ahmad, Asma Gul, Nasir Ahmad and Ala Al-Fuqaha : Floods Detection in Twitter Text and Images. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we present our methods for the MediaEval 2020 Flood Related Multimedia task, which aims to analyze and combine textual and visual content from social media for the detection of real-world flooding events. The task mainly focuses on identifying flood-related tweets relevant to a specific area. We propose several schemes to address the challenge. For text-based flood event detection, we use three different methods, relying on Bag of Words (BOW) and an Italian version of BERT, individually and in combination, achieving F1-scores of 0.77, 0.68, and 0.70 on the development set, respectively. For the visual analysis, we rely on features extracted via multiple state-of-the-art deep models pre-trained on ImageNet. The extracted features are then used to train multiple individual classifiers whose scores are combined in a late fusion manner, achieving an F1-score of 0.75. For our mandatory multi-modal run, we combine the classification scores obtained with the best textual and visual schemes in a late fusion manner. Overall, better results are obtained with the multimodal scheme, achieving an F1-score of 0.80 on the development set.
Presented by: Naina Said
Flood Detection in Twitter Using a Novel Learning Method for Neural Networksmultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper19.pdf
Rabiul Islam Jony and Alan Woodley : Flood Detection in Twitter Using a Novel Learning Method for Neural Networks. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper we use a novel backpropagation technique, Direct Backpropagation (DBP), to train a neural network and use it to detect flooding in Twitter posts. We use the textual information from the tweets and the visual features from the associated images to classify the posts into two categories, flood (1) and no-flood (0). We also fuse these two modalities using fusion methods for the classification. For the classification task we employ a neural network that we train using our proposed method instead of the typical backpropagation method. This work has been done in the context of the MediaEval 2020 Flood-Related Multimedia Task.
The Flood-related Multimedia Task at MediaEval 2020multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper5.pdf
YouTube: https://youtu.be/s2aVeW3ig0s
Stelios Andreadis, Ilias Gialampoukidis, Anastasios Karakostas, Stefanos Vrochidis, Ioannis Kompatsiaris, Roberto Fiorin, Daniele Norbiato and Michele Ferri : The Flood-related Multimedia Task at MediaEval 2020. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper provides a description of the Flood-related Multimedia Task at MediaEval 2020. The primary goal of the task is to analyse and combine textual and visual content from social media data that reflect real-world events. The focus is on natural disasters and especially on flooding incidents, which are frequent around the globe and have large social consequences for communities and individuals. In particular, the task requires participants to identify Twitter posts that are relevant to flood events in a specific area of interest, based on their text and images. The automatic classification of posts as relevant or not relevant will essentially improve the quality of retrieved social media data, so that they can play a more valuable role in emergency management.
Presented by: Ilias Gialampoukidis
Cancer cell metabolism: special Reference to Lactate PathwayAADYARAJPANDEY1
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy they need to function.
Energy is stored in the bonds of glucose, and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two molecules of a smaller chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis, the Krebs cycle, and oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
IN CANCER CELLS:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per glucose molecule instead of the roughly 36 ATPs healthy cells gain. As a result, cancer cells need to use many more sugar molecules to get enough energy to survive.
Introduction to the Warburg effect:
WARBURG EFFECT: Cancer cells are usually highly glycolytic (glucose addiction) and take up more glucose from outside than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the 1931 Nobel Prize in Physiology or Medicine for his "discovery of the nature and mode of action of the respiratory enzyme".
WARBURG EFFECT: The tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg observed that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4−0.9 µm) and novel JWST images with 14 filters spanning 0.8−5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3−31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5−15. These objects show compact half-light radii of R1/2 ∼ 50−200 pc, stellar masses of M⋆ ∼ 10^7−10^8 M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr^−1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
Seminar of U.V. Spectroscopy by SAMIR PANDASAMIR PANDA
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light absorbed by the analyte.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Multi-source connectivity as the driver of solar wind variability in the heli...Sérgio Sacani
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous, plasma streams from coronal holes and slow-speed, highly variable, streams whose source regions are under debate. A key goal of ESA/NASA’s Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
Richard's entangled adventures in wonderlandRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
These systems monitor common gases, weather parameters, and particulates.
This PDF is about schizophrenia.
3. Overview
MEDICO TASK [1]
- Segment all types of polyps in colonoscopy images
[1] Debesh Jha et al. “Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation”. MediaEval 2020 Workshop. 2020.
4. Overview
TASK 1: Polyp Segmentation
- Data:
  - Training set: Kvasir-SEG [1], a dataset with 1,000 images
  - Test set: 160 images
- Evaluated by Jaccard index
[1] Debesh Jha et al. “Kvasir-SEG: A Segmented Polyp Dataset”. In International Conference on Multimedia Modeling. 2020.
5. Overview
TASK 2: Algorithm efficiency
- Same as Task 1, but with a focus on the speed of the algorithm
- Participants are required to submit a Docker image
- Evaluated by frames per second rate and Jaccard index
6. Approach 1: PraNet with Training Signal Annealing
- PraNet [1]: We use PraNet with a Res2Net [3] backbone as the main segmentation algorithm.
- Training Signal Annealing [2]: We apply the TSA strategy to prevent overfitting during training.
- Pre/post-processing methods: We use random rotation and high-boost filtering for augmentation.
[1] D.-P. Fan et al. “PraNet: Parallel Reverse Attention Network for Polyp Segmentation”. MICCAI 2020.
[2] Xie, Qizhe, et al. “Unsupervised Data Augmentation for Consistency Training”. 2019.
[3] Gao, Shanghua, et al. “Res2Net: A New Multi-scale Backbone Architecture”. IEEE 2019.
7. Approach 1: PraNet with Training Signal Annealing
Training Signal Annealing
- Images with a high score (Dice coefficient) above a threshold are weighted less than the others.
- Threshold formula (see the sketch below), where:
  T: total number of training steps
  n_t: threshold at step t
  K: 2
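The slide lists only the variables of the threshold schedule, so the following is a minimal sketch of the TSA threshold as defined by Xie et al. [2]; which of the three published schedules (log, linear, exponential) the authors used is an assumption here, and K = 2 follows the legend above.

# Hedged sketch of a Training Signal Annealing threshold schedule following
# Xie et al. [2]. The choice of the exponential schedule is an assumption;
# with K = 2 the threshold n_t rises from about 1/K towards 1 over training,
# so well-fitted (high-Dice) samples contribute less early on.
import math

def tsa_threshold(t, T, K=2, schedule="exp"):
    progress = t / T
    if schedule == "log":
        alpha = 1.0 - math.exp(-progress * 5.0)
    elif schedule == "linear":
        alpha = progress
    else:  # exponential schedule
        alpha = math.exp((progress - 1.0) * 5.0)
    return alpha * (1.0 - 1.0 / K) + 1.0 / K

print([round(tsa_threshold(t, 100), 3) for t in (0, 50, 100)])  # 0.503, 0.541, 1.0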
8. Approach 1: PraNet with Training Signal Annealing
Preprocessing method: High-boost filtering
- Aims to enhance the polyp texture by emphasizing high-frequency components (see the sketch below).
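As a rough illustration of the preprocessing step above, here is a minimal high-boost filtering sketch (output = input + k · (input − blurred)); the kernel size and boost factor are illustrative assumptions, not the authors' settings.

# Minimal high-boost filtering sketch: add back an amplified high-frequency
# component (the input minus its blurred version).
import cv2
import numpy as np

def high_boost(image, k=1.5, ksize=(9, 9), sigma=3):
    blurred = cv2.GaussianBlur(image, ksize, sigma)                 # low-frequency part
    mask = image.astype(np.float32) - blurred.astype(np.float32)    # high-frequency part
    boosted = image.astype(np.float32) + k * mask
    return np.clip(boosted, 0, 255).astype(np.uint8)

img = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)          # stand-in for a colonoscopy frame
enhanced = high_boost(img)
print(enhanced.shape, enhanced.dtype)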
9. Approach 2: ResUnet++ with Triple-Path and Geodesic Distance
- Base architecture: ResUnet++ [1]
- Triple-path input: aggregate three versions of the enhanced input image
- Guide map: integrate a distance-map layer using the Geodesic Distance Transform [2] as a guide mask
[1] Jha, Debesh, et al. “ResUNet++: An Advanced Architecture for Medical Image Segmentation”. IEEE 2020.
[2] G. Wang et al. “DeepIGeoS: A Deep Interactive Geodesic Framework for Medical Image Segmentation”. IEEE 2019.
10. Approach 2: ResUnet++ with Triple-Path and Geodesic Distance
Triple-path input
- Create two new enhanced images using CLAHE and an equalization transform.
- The three versions pass through separate convolution layers, are then concatenated, and are passed into a summarizing convolution layer (see the sketch below).
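A hedged sketch of preparing the three input versions mentioned above (original, CLAHE-enhanced, histogram-equalized); the CLAHE parameters and the LAB-luminance handling are illustrative choices, and the model's fusion convolutions are not shown.

# Hedged sketch of the triple-path input preparation.
import cv2
import numpy as np

def triple_path_inputs(bgr):
    # work on the luminance channel so colour is preserved
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)

    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l_clahe = clahe.apply(l)
    l_equal = cv2.equalizeHist(l)

    clahe_img = cv2.cvtColor(cv2.merge([l_clahe, a, b]), cv2.COLOR_LAB2BGR)
    equal_img = cv2.cvtColor(cv2.merge([l_equal, a, b]), cv2.COLOR_LAB2BGR)
    return bgr, clahe_img, equal_img   # three parallel inputs for the network

frame = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)   # stand-in for a colonoscopy frame
paths = triple_path_inputs(frame)
print([p.shape for p in paths])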
11. Approach 2: ResUnet++ with Triple-Path and Geodesic Distance
Guide map with Geodesic Distance
- Pixels close to the boundary should be treated differently from pixels inside the polyps.
- Calculate the geodesic distance map of the image based on the center points of the polyps.
- Integrate the distance map into the original prediction layer of the ResUnet++ architecture (see the sketch below).
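A hedged sketch of building a guide map from polyp center points; a true geodesic distance transform (as in DeepIGeoS [2]) also follows image gradients, so the plain Euclidean distance transform below is only a simplified stand-in, and the normalization is an illustrative choice.

# Hedged sketch: distance-based guide map from polyp center points, using a
# Euclidean distance transform as a simplified stand-in for the geodesic one.
import numpy as np
from scipy.ndimage import distance_transform_edt

def guide_map(shape, centers):
    seeds = np.ones(shape, dtype=bool)
    for (r, c) in centers:
        seeds[r, c] = False                      # zero-distance seed at each polyp center
    dist = distance_transform_edt(seeds)         # distance of every pixel to the nearest center
    return 1.0 - dist / dist.max()               # high values near centers, low far away

gm = guide_map((256, 256), [(120, 140)])
print(gm.shape, float(gm.max()), round(float(gm.min()), 3))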
12. Result
1. Medico polyp segmentation task 1
- Submitted 5 runs:
➢ Run 1: PraNet with the strategy discussed in approach 1
➢ Run 2: Train and ensemble five improved ResUnet++ models from approach 2
➢ Run 3: Ensemble of the results from runs 1 and 2
➢ Run 4: Continue training Run 2 for a few more epochs on the full dataset, including the validation set
➢ Run 5: Single model with the highest validation score from approach 2
13. Result
1. Medico polyp segmentation task 1
- Result:
Table: Results of the Medico polyp segmentation task 1
14. Result
2. Medico polyp segmentation task 2
- Submitted 2 runs:
➢ Run 1: The model with the best validation result from approach 1
➢ Run 2: Likewise, the model with the highest validation score from approach 2
15. Result
2. Medico polyp segmentation task 2
- Result:
Table: Results of the Medico polyp segmentation task 2
16. Conclusion
- We proposed different methods leveraging the advantages of either the ResUnet++ or the PraNet model to efficiently segment polyps in colonoscopy images.
- Our suggested approaches show significant improvements in the results, for both accuracy and efficiency.