Paper: http://ceur-ws.org/Vol-2882/paper73.pdf
YouTube: https://youtu.be/TadJ6y7xZeA
Thuc Nguyen-Quang, Tuan-Duy Nguyen, Thang-Long Nguyen-Ho, Anh-Kiet Duong, Xuan-Nhat Hoang, Vinh-Thuyen Nguyen-Truong, Hai-Dang Nguyen and Minh-Triet Tran: HCMUS at MediaEval 2020: Image-Text Fusion for Automatic News-Images Re-Matching. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Matching text and images based on their semantics plays an important role in cross-media retrieval. However, the connection between the text and images of an article is complex. In the context of the MediaEval 2020 Challenge, we propose three multi-modal methods that map the text and images of news articles into a shared space in order to perform efficient cross-retrieval. Our methods show systematic improvement and validate our hypotheses, with the best-performing method reaching a recall@100 score of 0.2064.
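The recall@100 metric above can be made concrete with a minimal sketch. It assumes a simplified evaluation setup (one ranked list of candidate ids per text query, a single relevant id per query), which is an assumption for illustration rather than the task's exact protocol:

```python
def recall_at_k(rankings, truths, k=100):
    """rankings: one ranked list of candidate ids per query;
    truths: the single relevant id for each query.
    Returns the fraction of queries whose ground truth is in the top k."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)
```

For example, with two queries where only the first has its ground truth in the top 2, `recall_at_k([[1, 2, 3], [4, 5, 6]], [2, 9], k=2)` gives 0.5.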
Presented by: Thuc Nguyen-Quang
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ... (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper47.pdf
YouTube: https://youtu.be/vMsM4zg2-JY
Tien-Phat Nguyen, Tan-Cong Nguyen, Gia-Han Diep, Minh-Quan Le, Hoang-Phuc Nguyen-Dinh, Hai-Dang Nguyen and Minh-Triet Tran : HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ for Polyps Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The Medico task at MediaEval 2020 explores the challenge of building accurate and high-performance algorithms to detect all types of polyps in endoscopic images. We propose different approaches that leverage the advantages of either the ResUnet++ or the PraNet model to efficiently segment polyps in colonoscopy images, with modifications to the network structure, parameters, and training strategies to tackle various observed characteristics of the given dataset. Our methods outperform those of the other teams in both accuracy and efficiency: after the evaluation, we rank second in task 1 (with a Jaccard index of 0.777 and the best Precision and Accuracy scores) and first in task 2 (with 67.52 FPS and a Jaccard index of 0.658).
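The Jaccard index reported above is the standard intersection-over-union between predicted and ground-truth masks; a minimal sketch on flat binary masks:

```python
def jaccard(pred, target):
    """Jaccard index (IoU) between two flat binary masks (iterables of 0/1).
    Two empty masks are treated as a perfect match."""
    inter = sum(p & t for p, t in zip(pred, target))
    union = sum(p | t for p, t in zip(pred, target))
    return inter / union if union else 1.0
```

For instance, masks `[1, 1, 0, 0]` and `[1, 0, 1, 0]` overlap in one pixel out of three covered, giving 1/3.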
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int... (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper31.pdf
Syed Muhammad Faraz Ali, Muhammad Taha Khan, Syed Unaiz Haider, Talha Ahmed, Zeshan Khan and Muhammad Atif Tahir : Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Intestinal Tract. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Identification of polyps in endoscopic images is critical for the diagnosis of colon cancer. Finding the exact shape and size of polyps requires segmentation of the endoscopic images. This research explores the advantage of using depth-wise separable convolution in the atrous convolutions of the ResUNet++ architecture. Deep atrous spatial pyramid pooling was also implemented on top of ResUNet++. The results show that the architecture with separable convolution is smaller in size and requires fewer GFLOPs without degrading performance too much.
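The size saving from depth-wise separable convolution follows directly from its parameter count: a standard k x k convolution couples every input channel to every output channel, while the separable version splits this into a per-channel spatial filter plus a 1 x 1 pointwise mix. A back-of-the-envelope sketch (weights only, biases ignored):

```python
def conv_params(c_in, c_out, k):
    # Standard convolution: one k x k filter per (input, output) channel pair.
    return c_in * c_out * k * k

def separable_params(c_in, c_out, k):
    # Depthwise: one k x k filter per input channel;
    # pointwise: a 1 x 1 convolution mixing channels.
    return c_in * k * k + c_in * c_out
```

For c_in = 64, c_out = 128, k = 3 this gives 73,728 versus 8,768 weights, roughly an 8.4x reduction, which is the effect the abstract describes.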
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table... (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper50.pdf
Hai Nguyen-Truong, San Cao, N. A. Khoa Nguyen, Bang-Dang Pham, Hieu Dao, Minh-Quan Le, Hoang-Phuc Nguyen-Dinh, Hai-Dang Nguyen and Minh-Triet Tran : HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table Tennis Strokes Classification Task. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The Sports Video Classification Task in the Multimedia Evaluation 2020 Challenge focuses on classifying different types of table tennis strokes in video segments. In this task, we, the HCMUS Team, perform multiple experiments with a combination of models, including SlowFast, Optical Flow, DensePose, R(2+1)D, and Channel-Separated Convolutional Networks, to classify 21 types of table tennis strokes from video segments. In total, we submit eight runs corresponding to five different models, each with its own set of hyper-parameters. In addition, we apply several pre-processing techniques to the dataset so that our models learn and classify more accurately. According to the evaluation results, one of our methods outperforms those of the other teams. In particular, our best run achieves 31.35% global accuracy, and all of our methods show promising results in terms of local and global accuracy for action recognition tasks.
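The abstract does not specify how the ensemble combines the models' outputs; one common late-fusion choice, shown here purely as an illustrative sketch, is to average the per-model class probabilities and take the arg-max:

```python
def ensemble_predict(model_probs):
    """model_probs: one class-probability vector per model, all of equal length.
    Averages the probabilities across models and returns the arg-max class."""
    n_classes = len(model_probs[0])
    avg = [sum(p[c] for p in model_probs) / len(model_probs)
           for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)
```

With three models voting `[0.6, 0.4]`, `[0.2, 0.8]`, `[0.3, 0.7]` over two stroke classes, the averaged probabilities favor class 1.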
Development of stereo matching algorithm based on sum of absolute RGB color d... (IJECEIAES)
This article presents a local stereo matching algorithm built around block matching and two edge-preserving filters. Fundamentally, the matching process consists of several stages that produce the disparity, or depth, map. The central challenge of the matching process is finding accurate corresponding points between the two images. Hence, this article proposes a stereo matching algorithm using an improved Sum of Absolute RGB Differences (SAD), gradient matching, and an edge-preserving filter, the Bilateral Filter (BF), to raise accuracy. SAD and gradient matching are applied in the first stage to obtain a preliminary correspondence result; a BF then acts as an edge-preserving filter to remove noise from that stage. A second BF is used at the last stage to improve the final disparity map and sharpen object boundaries. Experimental analysis and validation use the Middlebury standard benchmarking evaluation system. Based on the results, the proposed work increases accuracy while preserving object edges. To position the proposed work against currently available methods, quantitative measurements were made against other existing methods, and they show that the work in this article performs much better.
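The first-stage SAD block matching described above can be sketched in a few lines for grayscale images (fixed window, brute-force search, no gradient term or bilateral filtering; all names are illustrative):

```python
import numpy as np

def sad_disparity(left, right, block=3, max_disp=5):
    """Brute-force SAD block matching on grayscale images.
    Returns an integer disparity map (borders left at zero)."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.int64)
            best_cost, best_d = None, 0
            for d in range(min(max_disp, x - half) + 1):
                # Candidate window in the right image, shifted left by d.
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1].astype(np.int64)
                cost = np.abs(patch - cand).sum()
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

On a synthetic pair where the right image is the left shifted by two pixels, interior disparities come out as 2, which is the expected correspondence.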
Detection of leaf diseases and classification using digital image processing (Naeem Shehzad)
This presentation shows how to detect leaf diseases using the k-means algorithm, the gray-level co-occurrence matrix (GLCM), and a support vector machine (SVM), with complete results.
The material is presented in a convenient way; I hope you find it easy to follow.
Thank you
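As a rough illustration of the k-means step in such a pipeline, here is a minimal 1-D sketch that clusters pixel intensities into k groups. A real pipeline would cluster in a color space and then extract GLCM texture features from the diseased cluster; the function name and defaults here are illustrative:

```python
import numpy as np

def kmeans_1d(pixels, k=2, iters=20, seed=0):
    """Cluster 1-D pixel intensities into k groups with plain k-means."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct pixel values.
    centers = rng.choice(pixels, size=k, replace=False).astype(float)
    for _ in range(iters):
        # Assign each pixel to its nearest center, then recompute the centers.
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean()
    return labels, centers
```

On intensities with two well-separated groups (around 11 and around 200), the labels split exactly along those groups.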
Image Contrast Enhancement for Brightness Preservation Based on Dynamic Stret... (CSCJournals)
Histogram equalization is an efficient process often employed in consumer electronic systems for image contrast enhancement. In addition to an increase in contrast, it is also required to preserve the mean brightness of an image in order to convey the true scene information to the viewer. A conventional approach is to separate the image into sub-images and then process independently by histogram equalization towards a modified profile. However, due to the variations in image contents, the histogram separation threshold greatly influences the level of shift in mean brightness with respect to the uniform histogram in the equalization process. Therefore, the choice of a proper threshold, to separate the input image into sub-images, is very critical in order to preserve the mean brightness of the output image. In this research work, a dynamic range stretching approach is adopted to reduce the shift in output image mean brightness. Moreover, the computationally efficient golden section search algorithm is applied to obtain a proper separation into sub-images to preserve the mean brightness. Experiments were carried out on a large number of color images of natural scenes. Results, as compared to current available approaches, showed that the proposed method performed satisfactorily in terms of mean brightness preservation and enhancement in image contrast.
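The mean-split idea behind such methods can be sketched in a few lines: equalize the sub-histogram on each side of the image mean, mapping each into its own half of the output range. This is a simplified BBHE-style sketch, not this paper's dynamic stretching or golden-section search:

```python
import numpy as np

def mean_split_equalize(img):
    """Split a uint8 image at its mean and equalize each sub-histogram
    within its own range, so each side stays on its side of the mean."""
    m = int(img.mean())
    out = np.empty_like(img)
    for lo, hi, mask in [(0, m, img <= m), (m + 1, 255, img > m)]:
        vals = img[mask]
        if vals.size == 0:
            continue
        hist, _ = np.histogram(vals, bins=256, range=(0, 256))
        cdf = hist.cumsum() / vals.size
        # Lookup table stretching this sub-histogram over [lo, hi].
        lut = (lo + cdf * (hi - lo)).astype(img.dtype)
        out[mask] = lut[vals]
    return out
```

For the toy image `[[10, 20], [200, 250]]` (mean 120), the dark pixels are stretched over [0, 120] and the bright ones over [121, 255], yielding `[[60, 120], [188, 255]]`.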
Image Fusion and Image Quality Assessment of Fused Images (CSCJournals)
Accurate diagnosis of tumor extent is important in radiotherapy. This paper presents the fusion of PET and MRI images. Multi-sensor image fusion is the process of combining information from two or more images into a single image; the resulting image contains more information than any of the individual inputs. PET delivers high-resolution molecular imaging, with resolution down to 2.5 mm full width at half maximum (FWHM), which allows us to observe the brain's molecular changes using specific reporter genes and probes. On the other hand, 7.0 T MRI, with sub-millimeter resolution of the cortical areas down to 250 µm, allows us to visualize the fine details of the brainstem as well as many cortical and sub-cortical areas. A PET-MRI fusion imaging system therefore provides complete information for neurological diseases as well as cognitive neuroscience. The paper presents PCA-based image fusion and also a fusion algorithm based on the wavelet transform to improve resolution: the two images to be fused are first decomposed into sub-images of different frequencies, the information is fused, and the sub-images are finally reconstructed into a result image with richer information. We also propose image fusion in Radon space. The paper assesses image fusion by measuring the quantity of enhanced information in fused images, using entropy, mean, standard deviation, Fusion Mutual Information, cross-correlation, Mutual Information, Root Mean Square Error, Universal Image Quality Index, and Relative Shift in Mean to compare fused image quality. Comparative evaluation of fused images is a critical step in judging the relative performance of different image fusion algorithms. In this paper, we also propose an image quality metric based on the human visual system (HVS).
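In its simplest form, the PCA fusion mentioned above weights the two source images by the components of the principal eigenvector of their joint covariance. A minimal sketch (registration, multiscale handling, and the Radon-space variant are omitted):

```python
import numpy as np

def pca_fuse(img_a, img_b):
    """Fuse two same-shape images with weights taken from the principal
    eigenvector of the 2x2 covariance of their pixel vectors."""
    data = np.stack([img_a.ravel(), img_b.ravel()]).astype(float)
    cov = np.cov(data)
    vals, vecs = np.linalg.eigh(cov)
    v = np.abs(vecs[:, np.argmax(vals)])
    w = v / v.sum()                      # non-negative weights summing to 1
    return w[0] * img_a + w[1] * img_b
```

Because the weights are non-negative and sum to one, the fused image lies pixelwise between the two inputs, which is a quick sanity check on the output.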
Development and Comparison of Image Fusion Techniques for CT&MRI Images (IJERA Editor)
Image processing techniques primarily focus on enhancing the quality of an image or a set of images to derive the maximum information from them. Image fusion is a technique for producing a superior-quality image from a set of available images: the process of combining relevant information from two or more images into a single image, where the resulting image is more informative and complete than any of the inputs. A lot of research is being done in this field, encompassing computer vision, automatic object detection, image processing, parallel and distributed processing, robotics, and remote sensing. This project explains the theoretical and implementation issues of seven image fusion algorithms and presents experimental results for each. The fusion algorithms are assessed based on the study and development of several image quality metrics.
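Two of the simplest quality metrics used in such assessments, entropy (the information content of a fused image) and root-mean-square error against a reference, can be sketched as:

```python
import numpy as np

def entropy(img):
    """Shannon entropy (bits) of a uint8 image's intensity histogram."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins before taking logs
    return float(-(p * np.log2(p)).sum())

def rmse(a, b):
    """Root-mean-square error between two same-shape images."""
    return float(np.sqrt(np.mean((a.astype(float) - b.astype(float)) ** 2)))
```

A constant image has zero entropy, and an image compared with itself has zero RMSE, which makes both metrics easy to sanity-check.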
Most existing high-performance co-segmentation algorithms are complex, both because of the way they co-label a set of images and because they commonly need a few parameters fine-tuned for effective co-segmentation. In this paper, instead of following the conventional approach of co-labelling multiple images, we propose to first exploit inter-image information through co-saliency, and then perform single-image segmentation on each individual image. To make the system robust and to avoid heavy dependence on any single saliency extraction method, we propose to apply multiple existing saliency extraction methods to each image to obtain diverse saliency maps. Our major contribution lies in the proposed method that fuses the obtained diverse saliency maps by exploiting the inter-image information, which we call saliency co-fusion. Experiments on five benchmark datasets with eight saliency extraction methods show that our saliency co-fusion based approach achieves competitive performance even without parameter fine-tuning when compared with state-of-the-art methods.
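The shape of map-level fusion can be caricatured with a simple self-consistency weighting: weight each saliency map by its correlation with the pixelwise mean map. This is not the paper's actual co-fusion, which exploits inter-image cues, but only an illustrative single-image stand-in:

```python
import numpy as np

def fuse_saliency_maps(maps):
    """Fuse a list of same-shape saliency maps in [0, 1] by weighting each
    map by its (clamped) correlation with the pixelwise mean map."""
    stack = np.stack([m.ravel() for m in maps]).astype(float)
    mean = stack.mean(axis=0)
    w = np.array([max(np.corrcoef(row, mean)[0, 1], 0.0) for row in stack])
    w = w / w.sum() if w.sum() > 0 else np.full(len(maps), 1 / len(maps))
    fused = (w[:, None] * stack).sum(axis=0)
    return fused.reshape(maps[0].shape)
```

Since the weights are non-negative and sum to one, the fused map stays in [0, 1] whenever the inputs do.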
A survey on methods and applications of meta-learning with GNNs (Shreya Goyal)
This survey provides a comprehensive review of works that combine graph neural networks (GNNs) and meta-learning, along with a summary of the methods and applications in each category. The application of meta-learning to GNNs is a growing and exciting field, and many graph problems stand to benefit immensely from the combination of the two approaches.
This article presents a new algorithm for long-term tracking of moving objects. We address some of the main difficulties first through a comparative study of methods for measuring the difference and the similarity between the template and the source image. In the second part, an improvement of the best method allows us to follow the target robustly. The method also effectively handles geometric deformation, partial occlusion, and recovery after the target leaves the field of view. The originality of our algorithm lies in a new model that neither depends on a probabilistic process nor requires prior data-based detection. Experimental results on several difficult video sequences demonstrate performance advantages over many recent trackers. The algorithm can be employed in applications such as video surveillance, active vision, and industrial visual servoing.
Adaptive threshold for moving objects detection using gaussian mixture model (TELKOMNIKA JOURNAL)
Moving object detection is an important task in video surveillance systems. Automatically defining the threshold that separates a moving object from the background within a video is challenging. This study proposes a Gaussian mixture model (GMM) as the thresholding strategy for moving object detection. The performance of the proposed method is compared to the Otsu algorithm and a gray-level threshold as baseline methods, using mean square error (MSE) and peak signal-to-noise ratio (PSNR), evaluated on a human video dataset. The average MSE is 257.18 for the GMM, 595.36 for Otsu, and 645.39 for the gray threshold, so the GMM's MSE is the lowest; the average PSNR is 24.71 for the GMM, 20.66 for Otsu, and 19.35 for the gray threshold, so the GMM's PSNR is the highest. The proposed method thus outperforms the baselines in terms of detection error.
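The two evaluation metrics reported above are standard and easy to reproduce; note that PSNR is just MSE re-expressed on a log scale against the peak intensity:

```python
import numpy as np

def mse(a, b):
    """Mean square error between two same-shape images."""
    return float(np.mean((a.astype(float) - b.astype(float)) ** 2))

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; identical images give infinity."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10 * np.log10(peak ** 2 / m)
```

The maximally wrong case (all-zero versus all-255 images) has MSE equal to peak squared, so its PSNR is exactly 0 dB.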
A novel method is proposed for image segmentation based on probabilistic field theory. The model assumes that all the pixels of an image, together with some unknown parameters, form a field; according to this model, the pixel labels are generated by a compound function of the field. The main novelty of the model is that it considers both the features of the pixels and the interdependence among them. The parameters are generated by a novel spatially variant mixture model and estimated by an expectation-maximization (EM)-based algorithm; thus, we simultaneously impose spatial smoothness as prior knowledge. Numerical experiments are presented in which the proposed method and other mixture-model-based methods were tested on synthetic and real-world images. The experimental results demonstrate that our algorithm achieves competitive performance compared to other methods.
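The EM estimation step can be illustrated on the simplest case, a two-component 1-D Gaussian mixture. This toy version has no spatial prior, so it omits the spatially variant part of the model above:

```python
import math

def em_gmm_1d(xs, iters=50):
    """Fit a two-component 1-D Gaussian mixture to xs by EM.
    Returns per-component means, variances, and mixing weights."""
    mu = [min(xs), max(xs)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            d = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = sum(d)
            resp.append([di / s for di in d])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, xs)) / nk, 1e-6)
    return mu, var, pi
```

On data drawn around 0 and around 5 in equal proportion, the fitted means land near those centers and the mixing weights near one half each.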
Cognitive radio networks enable a more efficient use of the radioelectric spectrum through dynamic access. Decentralized cognitive radio networks have gained popularity due to their advantages over centralized networks. The purpose of this article is to propose the collaboration between secondary users for cognitive Wi-Fi networks, in the form of two multi-criteria decision-making algorithms known as TOPSIS and VIKOR and assess their performance in terms of the number of failed handoffs. The comparative analysis is established under four different scenarios, according to the service class and the traffic level, within the Wi-Fi frequency band. The results show the performance evaluation obtained through simulations and experimental measurements, where the VIKOR algorithm has a better performance in terms of failed handoffs under different scenarios and collaboration levels.
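A minimal TOPSIS sketch shows the mechanics of the first algorithm named above: vector-normalize the decision matrix, weight it, and score each alternative by its relative distance to the ideal and anti-ideal solutions (VIKOR differs in its aggregation but takes the same inputs). The criteria and weights here are illustrative, not the article's handoff scenarios:

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """matrix: alternatives x criteria; weights: per-criterion importance;
    benefit[j]: True if larger values of criterion j are better.
    Returns closeness scores in [0, 1]; higher is better."""
    m = np.asarray(matrix, dtype=float)
    norm = m / np.sqrt((m ** 2).sum(axis=0))         # vector normalization
    v = norm * np.asarray(weights, dtype=float)
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.sqrt(((v - ideal) ** 2).sum(axis=1))  # distance to ideal
    d_neg = np.sqrt(((v - anti) ** 2).sum(axis=1))   # distance to anti-ideal
    return d_neg / (d_pos + d_neg)
```

An alternative that dominates on every benefit criterion coincides with the ideal solution and scores exactly 1.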
A Locality Sensitive Low-Rank Model for Image Tag Completion (Nexgen Technology)
Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance... (idescitation)
This paper presents a comparison of two popular clustering algorithms for breast DCE-MRI segmentation. Magnetic resonance imaging (MRI) is an advanced medical imaging technique providing rich information about human soft-tissue anatomy. The goal of breast magnetic resonance image segmentation is to accurately identify the principal mass or lesion structures in these image volumes. Many methods exist to segment breast DCE-MR images. One of these, the K-means clustering procedure, provides effective solutions in many science and engineering fields; it is especially popular in pattern classification and signal processing, and can segment breast DCE-MRI with high precision. The artificial bee colony (ABC) algorithm is a new, very simple, and robust population-based optimization algorithm inspired by the intelligent behavior of honey bee swarms. This paper compares the performance of the two segmentation techniques on breast DCE-MR images; the experiments use real dynamic contrast-enhanced magnetic resonance images (DCE-MRI). Results show that the artificial bee colony algorithm performs better in terms of segmentation accuracy, robustness, and speed of computation.
In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation. With a novel attentional generative network, the AttnGAN can synthesize fine-grained details in different sub-regions of the image by paying attention to the relevant words in the natural language description. In addition, a deep attentional multimodal similarity model is proposed to compute a fine-grained image-text matching loss for training the generator. The proposed AttnGAN significantly outperforms the previous state of the art, boosting the best reported inception score by 14.14% on the CUB dataset and 170.25% on the more challenging COCO dataset. A detailed analysis is also performed by visualizing the attention layers of the AttnGAN. It shows, for the first time, that the layered attentional GAN is able to automatically select the conditioning at the word level for generating different parts of the image.
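The word-level attention at the heart of AttnGAN can be sketched as a softmax over region-word similarities; this is plain dot-product attention in NumPy, whereas AttnGAN itself applies learned projections before this step:

```python
import numpy as np

def word_attention(regions, words):
    """regions: (R, d) image sub-region features; words: (T, d) word features.
    Each region attends over the words of the description; returns the
    per-region word-context vectors and the attention weights."""
    scores = regions @ words.T                          # (R, T) similarities
    scores = scores - scores.max(axis=1, keepdims=True) # softmax stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)             # normalize over words
    context = attn @ words                              # (R, d) contexts
    return context, attn
```

When region and word features are near-orthogonal and each region aligns with one word, the attention matrix approaches the identity and each context vector approaches its matching word feature.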
Adversarial Variational Autoencoders to extend and improve generative model -... (Loc Nguyen)
Generative artificial intelligence (GenAI) has been developing rapidly, with remarkable achievements such as ChatGPT and Bard. The deep generative model (DGM) is a branch of GenAI that excels at generating raster data such as images and sound, thanks to the strengths of deep neural networks (DNNs) in inference and recognition. The built-in inference mechanism of a DNN, which simulates the synaptic plasticity of the human neural network, fosters the generation ability of a DGM, producing surprising results with the support of statistical flexibility. Two popular approaches to DGMs are Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN). VAE and GAN each have their own strong points, although both rest on the same underlying statistical theory and on the considerable complexity of DNN hidden layers, which act as effective encoding/decoding functions without concrete specifications. In this research, I try to unify VAE and GAN into a consistent, consolidated model called the Adversarial Variational Autoencoder (AVA), in which VAE and GAN complement each other: the VAE is a good data generator, encoding data via the elegant machinery of Kullback-Leibler divergence, while the GAN provides an important mechanism for assessing whether data is realistic or fake. In other words, AVA aims to improve the accuracy of generative models, and it also extends the functionality of simple generative models. Methodologically, this research combines applied mathematical concepts with careful computer programming techniques in order to implement and solve complicated problems as simply as possible.
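One concrete ingredient of the VAE half of such a model is the closed-form Kullback-Leibler term between the encoder's diagonal Gaussian and the standard-normal prior, which can be written in a few lines (a standard VAE identity, not AVA-specific):

```python
import math

def vae_kl(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian,
    given per-dimension means and log-variances."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))
```

The term vanishes exactly when the encoder matches the prior (zero mean, unit variance) and grows as the posterior drifts away, which is what makes it a useful regularizer in the VAE objective.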
Detection of leaf diseases and classification using digital image processingNaeem Shehzad
In this presentation you can learn how to find leaf disease using k mean algorithm and gray level co-occurrence matrix and support vector machine with complete results.
In this presentation , I mention all the data in very convenient way . I hope you can take it easy.
Thank you
Image Contrast Enhancement for Brightness Preservation Based on Dynamic Stret...CSCJournals
Histogram equalization is an efficient process often employed in consumer electronic systems for image contrast enhancement. In addition to an increase in contrast, it is also required to preserve the mean brightness of an image in order to convey the true scene information to the viewer. A conventional approach is to separate the image into sub-images and then process independently by histogram equalization towards a modified profile. However, due to the variations in image contents, the histogram separation threshold greatly influences the level of shift in mean brightness with respect to the uniform histogram in the equalization process. Therefore, the choice of a proper threshold, to separate the input image into sub-images, is very critical in order to preserve the mean brightness of the output image. In this research work, a dynamic range stretching approach is adopted to reduce the shift in output image mean brightness. Moreover, the computationally efficient golden section search algorithm is applied to obtain a proper separation into sub-images to preserve the mean brightness. Experiments were carried out on a large number of color images of natural scenes. Results, as compared to current available approaches, showed that the proposed method performed satisfactorily in terms of mean brightness preservation and enhancement in image contrast.
Image Fusion and Image Quality Assessment of Fused ImagesCSCJournals
Accurate diagnosis of tumor extent is important in radiotherapy. This paper presents the use of image fusion of PET and MRI image. Multi-sensor image fusion is the process of combining information from two or more images into a single image. The resulting image contains more information as compared to individual images. PET delivers high-resolution molecular imaging with a resolution down to 2.5 mm full width at half maximum (FWHM), which allows us to observe the brain\'s molecular changes using the specific reporter genes and probes. On the other hand, the 7.0 T-MRI, with sub-millimeter resolution images of the cortical areas down to 250 m, allows us to visualize the fine details of the brainstem areas as well as the many cortical and sub-cortical areas. The PET-MRI fusion imaging system provides complete information on neurological diseases as well as cognitive neurosciences. The paper presents PCA based image fusion and also focuses on image fusion algorithm based on wavelet transform to improve resolution of the images in which two images to be fused are firstly decomposed into sub-images with different frequency and then the information fusion is performed and finally these sub-images are reconstructed into result image with plentiful information. . We also propose image fusion in Radon space. This paper presents assessment of image fusion by measuring the quantity of enhanced information in fused images. We use entropy, mean, standard deviation and Fusion Mutual Information, cross correlation , Mutual Information Root Mean Square Error, Universal Image Quality Index and Relative shift in mean to compare fused image quality. Comparative evaluation of fused images is a critical step to evaluate the relative performance of different image fusion algorithms. In this paper, we also propose image quality metric based on the human vision system (HVS).
Development and Comparison of Image Fusion Techniques for CT&MRI ImagesIJERA Editor
Image processing techniques primarily focus upon enhancing the quality of an image or a set ofimages to derive
the maximum information from them. Image Fusion is a technique of producing a superior quality image from a
set of available images. It is the process of combining relevant information from two or more images into a
single image wherein the resulting image will be more informative and complete than any of the input images. A
lot of research is being done in this field encompassing areas of Computer Vision, Automatic object detection,
Image processing, parallel and distributed processing, Robotics and remote sensing. This project paves way to
explain the theoretical and implementation issues of seven image fusion algorithms and the experimental results
of the same. The fusion algorithms would be assessed based on the study and development of some image
quality metrics
Most existing high-performance co-segmentation algorithms are complex, both because of the way they co-label a set of images and because they commonly need a few parameters fine-tuned for effective co-segmentation. In this paper, instead of following the conventional approach of co-labelling multiple images, we propose to first exploit inter-image information through co-saliency, and then perform single-image segmentation on each individual image. To make the system robust and to avoid heavy dependence on any single saliency extraction method, we apply multiple existing saliency extraction methods to each image to obtain diverse saliency maps. Our major contribution is a method that fuses these diverse saliency maps by exploiting inter-image information, which we call saliency co-fusion. Experiments on five benchmark datasets with eight saliency extraction methods show that our saliency co-fusion approach achieves competitive performance compared with state-of-the-art methods, even without parameter fine-tuning.
A survey on methods and applications of meta-learning with GNNs - Shreya Goyal
This survey provides a comprehensive review of works that combine graph neural networks (GNNs) and meta-learning, along with a thorough summary of methods and applications in each category. The application of meta-learning to GNNs is a growing and exciting field; many graph problems stand to benefit immensely from combining the two approaches.
This article presents a new algorithm for long-term tracking of moving objects. We first address some potential difficulties through a comparative study of methods for measuring the difference and the similarity between the template and the source image. In the second part, an improvement of the best method allows us to follow the target robustly. The method also effectively overcomes the problems of geometric deformation, partial occlusion, and recovery after the target leaves the field of view. The originality of our algorithm lies in a new model that does not depend on a probabilistic process and does not require prior data-based detection. Experimental results on several difficult video sequences demonstrate performance advantages over many recent trackers. The algorithm can be employed in applications such as video surveillance, active vision, and industrial visual servoing.
Adaptive threshold for moving objects detection using gaussian mixture model - TELKOMNIKA JOURNAL
Moving object detection is an important task in video surveillance systems. Automatically defining the threshold that separates a moving object from the background within a video is challenging. This study proposes a Gaussian mixture model (GMM) as a thresholding strategy for moving object detection. The performance of the proposed method is compared against the Otsu algorithm and a gray threshold as baseline methods, using mean square error (MSE) and peak signal-to-noise ratio (PSNR), evaluated on a human video dataset. The average MSE is 257.18 for GMM, 595.36 for Otsu and 645.39 for the gray threshold, so the GMM's MSE is lower than both baselines. The average PSNR is 24.71 for GMM, 20.66 for Otsu and 19.35 for the gray threshold, so the GMM's PSNR is higher than both. The proposed method thus outperforms the baseline methods in terms of detection error.
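The MSE and PSNR comparison described above can be sketched as follows (a minimal pure-Python version over nested-list grayscale images; the GMM thresholding itself is out of scope here):

```python
import math

def mse(ref, est):
    """Mean square error between a reference image and an estimate."""
    n = sum(len(row) for row in ref)
    return sum((a - b) ** 2
               for ra, rb in zip(ref, est)
               for a, b in zip(ra, rb)) / n

def psnr(ref, est, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    m = mse(ref, est)
    return float("inf") if m == 0 else 10 * math.log10(max_val ** 2 / m)
```

Lower MSE and higher PSNR both indicate that the detected foreground mask is closer to the reference, which is exactly the comparison reported in the abstract.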
A novel method is proposed for image segmentation based on probabilistic field theory. The model assumes that all the pixels of an image, together with some unknown parameters, form a field. According to this model, the pixel labels are generated by a compound function of the field. The main novelty of the model is that it considers both the features of individual pixels and the interdependence among pixels. The parameters are generated by a novel spatially variant mixture model and estimated by an expectation-maximization (EM)-based algorithm; we thereby simultaneously impose spatial smoothness as prior knowledge. Numerical experiments are presented in which the proposed method and other mixture-model-based methods were tested on synthetic and real-world images. The results demonstrate that our algorithm achieves competitive performance compared to other methods.
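As a minimal illustration of EM for mixture models, the sketch below fits a two-component 1-D Gaussian mixture. The paper's model is spatially variant and operates on pixel features, so this is only the core EM loop, not the proposed algorithm:

```python
import math

def em_gmm_1d(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative only)."""
    mu = [min(data), max(data)]          # crude initialization
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate means, variances, and mixing weights
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, data)) / nk, 1e-6)
            pi[k] = nk / len(data)
    return mu, var, pi
```

A spatially variant model replaces the global weights `pi` with per-pixel weights coupled to their neighbors, which is where the smoothness prior enters.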
Cognitive radio networks enable more efficient use of the radioelectric spectrum through dynamic access. Decentralized cognitive radio networks have gained popularity due to their advantages over centralized networks. This article proposes collaboration between secondary users in cognitive Wi-Fi networks, implemented through two multi-criteria decision-making algorithms, TOPSIS and VIKOR, and assesses their performance in terms of the number of failed handoffs. The comparative analysis covers four different scenarios, according to service class and traffic level, within the Wi-Fi frequency band. The results, obtained through simulations and experimental measurements, show that the VIKOR algorithm performs better in terms of failed handoffs across the different scenarios and collaboration levels.
A LOCALITY SENSITIVE LOW-RANK MODEL FOR IMAGE TAG COMPLETION - Nexgen Technology
International Journal of Engineering and Science Invention (IJESI) - inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science and technology, including new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in the journal can be accessed online.
Quantitative Comparison of Artificial Honey Bee Colony Clustering and Enhance... - idescitation
This paper introduces a comparison of two popular clustering algorithms for breast DCE-MRI segmentation. Magnetic resonance imaging (MRI) is an advanced medical imaging technique providing rich information about human soft-tissue anatomy. The goal of breast magnetic resonance image segmentation is to accurately identify the principal mass or lesion structures in these image volumes. Many methods exist to segment breast DCE-MR images. One of these, the K-means clustering procedure, provides effective solutions in many science and engineering fields; it is especially popular in pattern classification and signal processing, and can segment breast DCE-MRI with high precision. The artificial bee colony (ABC) algorithm is a new, very simple, and robust population-based optimization algorithm inspired by the intelligent behavior of honey bee swarms. This paper compares the performance of the two segmentation techniques on breast DCE-MR images; the experiments use real dynamic contrast-enhanced magnetic resonance images (DCE-MRI). Results show that the artificial bee colony algorithm performs better in terms of segmentation accuracy, robustness, and speed of computation.
In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation. With a novel attentional generative network, the AttnGAN can synthesize fine-grained details in different sub-regions of the image by paying attention to the relevant words in the natural language description. In addition, a deep attentional multimodal similarity model is proposed to compute a fine-grained image-text matching loss for training the generator. The proposed AttnGAN significantly outperforms the previous state of the art, boosting the best reported inception score by 14.14% on the CUB dataset and 170.25% on the more challenging COCO dataset. A detailed analysis is also performed by visualizing the attention layers of the AttnGAN. For the first time, it shows that the layered attentional GAN is able to automatically select the condition at the word level for generating different parts of the image.
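The word-level attention underlying such models can be illustrated with a bare dot-product attention step: score each word vector against a region query, softmax the scores, and form a weighted context vector. The vectors here are toy values, not AttnGAN's actual sub-region features:

```python
import math

def attend(query, words):
    """Dot-product attention of one region query over a list of word vectors.

    Returns the softmax weights and the weighted context vector."""
    scores = [sum(q * w for q, w in zip(query, vec)) for vec in words]
    m = max(scores)                          # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = [sum(wt * vec[i] for wt, vec in zip(weights, words))
               for i in range(len(query))]
    return weights, context
```

In AttnGAN this happens per image sub-region, so each region attends to the words most relevant to the detail it must synthesize.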
Adversarial Variational Autoencoders to extend and improve generative model... - Loc Nguyen
Generative artificial intelligence (GenAI) has been developing with many incredible achievements like ChatGPT and Bard. The deep generative model (DGM) is a branch of GenAI that is preeminent in generating raster data such as images and sound, owing to the strengths of deep neural networks (DNNs) in inference and recognition. The built-in inference mechanism of a DNN, which simulates the synaptic plasticity of the human neural network, fosters the generation ability of a DGM, producing surprising results with the support of statistical flexibility. Two popular approaches to DGMs are Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN). Both VAE and GAN have their own strong points, although they share an underlying statistical theory as well as considerable complexity in the hidden layers of the DNN, which becomes an effective encoding/decoding function without a concrete specification. In this research, I try to unify VAE and GAN into a consistent and consolidated model called Adversarial Variational Autoencoders (AVA), in which VAE and GAN complement each other: the VAE is a good data generator, encoding data via the excellent ideology of Kullback-Leibler divergence, while the GAN is a significantly important method for assessing whether data is realistic or fake. In other words, AVA aims to improve the accuracy of generative models, and it also extends the functionality of simple generative models. Methodologically, this research focuses on combining applied mathematical concepts with skillful computer programming techniques to implement and solve complicated problems as simply as possible.
International Journal of Engineering Research and Development - IJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Improved wolf algorithm on document images detection using optimum mean techn... - journalBEEI
Detecting text in handwritten historical documents provides high-level features for the challenging problem of handwriting recognition. Such handwriting often contains noise, faint or incomplete strokes, strokes with gaps, and competing lines when embedded in a table or form, making it unsuitable for local line-following algorithms or the associated binarization schemes. In this paper, a method based on an optimum threshold value, named the Optimum Mean method, is presented. The Wolf method fails to detect thin text in non-uniform input images; the proposed method overcomes this problem by deriving a maximum threshold value using the optimum mean. Based on the evaluation, the proposed method obtained a higher F-measure (74.53) and PSNR (14.77) and the lowest NRM (0.11) compared to the Wolf method. In conclusion, the proposed method successfully and effectively solves the Wolf method's problem by producing a high-quality output image.
An effective RGB color selection for complex 3D object structure in scene gra... - IJECEIAES
The goal of our project is to develop a complete, fully detailed 3D interactive model of the human body and its systems, allowing the user to interact in 3D with all the elements of each system, in order to teach students human anatomy. Some organs that carry a great deal of anatomical detail, such as the brain, lungs, liver, and heart, need to be described accurately and in minute detail. These organs need to carry all the detailed medical information required to learn how to operate on them, and should allow the user to add careful and precise markings indicating the operative landmarks at the surgery location. Adding so many different items of information is challenging when the area to which the information must be attached is very detailed and overlaps with other medical information for the same region. Existing tagging methods did not give us enough locations to attach the information to. Our solution combines a variety of tagging methods, marking regions by selecting RGB color areas drawn in the texture on the complex 3D object structure. It then relies on those RGB color codes to tag IDs and create relational tables that store the information related to specific areas of the anatomy. With this marking method, the entire set of color values (R, G, B) can be used to identify a set of anatomic regions, which also makes it possible to define multiple overlapping regions.
IMAGE GENERATION WITH GANS-BASED TECHNIQUES: A SURVEY - ijcsit
In recent years, frameworks that employ Generative Adversarial Networks (GANs) have achieved immense results in many fields, especially image generation, both because of their ability to create highly realistic, sharp images and because they can train on huge data sets. However, successfully training GANs is a notoriously difficult task when high-resolution images are required. In this article, we discuss five applicable and fascinating areas of image synthesis based on state-of-the-art GAN techniques: Text-to-Image Synthesis, Image-to-Image Translation, Face Manipulation, 3D Image Synthesis, and DeepMasterPrints. We provide a detailed review of current GAN-based image generation models with their advantages and disadvantages. The publications in each section show that GAN-based algorithms are growing fast, and their constant improvement, whether in the same field or in others, will solve complicated image generation tasks in the future.
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S... - ijsc
Assigning a submitted text to one of several predetermined categories is required when dealing with application-oriented texts. There are many approaches to this problem, including neural network algorithms. This article explores the use of neural networks to sort news articles by category. Two word-vectorization algorithms are used: the Bag of Words (BOW) and the word2vec distributive semantic model. In this work, the BOW model was applied to an FNN, whereas the word2vec model was applied to a CNN. We measured classification accuracy when applying these methods to ad-text datasets. The experimental results show that the two models achieve quite comparable accuracy. However, the word2vec encoding used with the CNN produced more relevant results with regard to the texts' semantics. Moreover, the trained CNN based on the word2vec architecture produces a compact feature map in its last convolutional layer, which can then be reused for text representation, i.e., using the CNN as a text encoder and for transfer learning.
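A minimal Bag-of-Words vectorizer like the one feeding the FNN can be sketched as follows (tokenization is assumed to have been done already; real pipelines add lowercasing, stop-word removal, and so on):

```python
from collections import Counter

def bag_of_words(docs):
    """Count-vectorize tokenized documents over a shared sorted vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        v = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            v[index[tok]] = count
        vectors.append(v)
    return vocab, vectors
```

Unlike word2vec, these vectors carry no semantic similarity between words, which is the contrast the experiments above explore.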
Ensemble based method for the classification of flooding event using social m... - multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper37.pdf
YouTube: https://youtu.be/4ROoOzdQzEI
Muhammad Hanif, Huzaifa Joozer, Muhammad Atif Tahir and Muhammad Rafi : Ensemble based method for the classification of flooding event using social media data. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper presents the method proposed and implemented by team FAST-NU-DS in "The Flood-related Multimedia Task at MediaEval 2020". The task provides tweets in the Italian language, extracted during floods between 2017 and 2019. The proposed method uses the text of each tweet and its associated image for binary classification, identifying whether or not a particular tweet is about a flood incident. An ensemble is designed for classifying tweets on the basis of textual data, visual data, and their combination. For visual data, the method uses data augmentation to oversample the minority class and applies stratified random sampling for input selection; a Visual Geometry Group (VGG16) convolutional neural network, pretrained on ImageNet and Places365, is used. For textual data, Term Frequency-Inverse Document Frequency (TF-IDF) is used for feature representation and a Multinomial Naive Bayes classifier for class prediction. The image and text predictions are combined for the final prediction of each instance. The evaluation revealed F1-scores of 36.31%, 20.76% and 27.86% for text, image, and the combination of both, respectively.
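The TF-IDF feature representation mentioned above can be sketched as follows (a smoothed variant; the exact weighting and tokenization used by the team are not specified in the abstract):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Smoothed TF-IDF weights per tokenized document.

    tf = term count / doc length; idf = log((1+N) / (1+df))."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({tok: (c / len(doc)) * math.log((1 + n) / (1 + df[tok]))
                        for tok, c in tf.items()})
    return weights
```

Terms appearing in every document get weight zero under this smoothing, so rare, discriminative words (e.g. flood-related vocabulary) dominate the representation fed to the Naive Bayes classifier.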
Presented by: Muhammad Hanif
A method for semantic-based image retrieval using hierarchical clustering tre... - TELKOMNIKA JOURNAL
Semantic extraction for images is an urgent problem applied in many different semantic retrieval systems. In this paper, a semantic-based image retrieval (SBIR) system is proposed based on combining a growth partitioning tree (GP-Tree), built in the authors' previous work, with a self-organizing map (SOM) network and a neighbor graph (together called the SgGP-Tree) to improve accuracy. For each query image, a set of similar images is retrieved from the SgGP-Tree, and a set of visual words is extracted from the classes obtained by mask region-based convolutional neural networks (R-CNN), as the basis for querying the semantics of input images over an ontology using the simple protocol and resource description framework query language (SPARQL). The experiments were performed on the ImageCLEF and MS-COCO image datasets, with precision values of 0.898453 and 0.875467, respectively. Compared with related works on the same datasets, these results show the effectiveness of the proposed methods.
Channel and spatial attention mechanism for fashion image captioning - IJECEIAES
Image captioning aims to automatically generate one or more descriptive sentences for a given input image. Most existing captioning methods use an encoder-decoder model that mainly focuses on recognizing and capturing the relationships between objects in the input image. However, when generating captions for fashion images, it is important not only to describe the items and their relationships, but also to mention the attribute features of clothes (shape, texture, style, fabric, and more). In this study, a novel model is proposed for the fashion image captioning task that captures not only the items and their relationships, but also their attribute features. Two different attention mechanisms (spatial attention and channel-wise attention) are incorporated into the traditional encoder-decoder model, which dynamically interprets the caption sentence over the multi-layer feature map and along the depth dimension of the feature map. We evaluate the proposed architecture on Fashion-Gen using three different metrics (CIDEr, ROUGE-L, and BLEU-1), achieving scores of 89.7, 50.6 and 45.6, respectively. The experiments show that the proposed method significantly improves fashion-image captioning and outperforms other state-of-the-art image captioning methods.
Knowledge maps for e-learning - Jae Hwa Lee, Aviv Segev
Maps such as concept maps and knowledge maps are often used as learning materials. These maps have nodes and links: nodes as key concepts and links as relationships between key concepts. From a map, the user can recognize the important concepts and the relationships between them. Building concept or knowledge maps requires domain experts; since these experts are hard to obtain, the cost of map creation is high. In this study, an attempt was made to automatically build a domain knowledge map for e-learning using text mining techniques. From a set of documents about a specific topic, keywords are extracted using the TF/IDF algorithm. A domain knowledge map (K-map) is based on ranking pairs of keywords according to the number of appearances in a sentence and the number of words in a sentence. The experiments analyzed the number of relations required to identify the important ideas in the text. In addition, the experiments compared K-map learning to document learning and found that the K-map identifies the more important ideas.
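The keyword-pair ranking step can be sketched as follows (counting only sentence-level co-occurrence; the paper additionally weights by sentence length, which is omitted here):

```python
from collections import Counter
from itertools import combinations

def rank_keyword_pairs(sentences, keywords):
    """Rank keyword pairs by how often they co-occur in a sentence.

    `sentences` is a list of tokenized sentences; `keywords` the extracted terms."""
    kw = set(keywords)
    counts = Counter()
    for sent in sentences:
        present = sorted(kw & set(sent))
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts.most_common()
```

The top-ranked pairs become the links of the K-map, with the keywords themselves as nodes.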
Template matching is a basic method in image analysis for extracting useful information from images. In this paper, we suggest a new method for pattern matching. Our method transforms the template image from a two-dimensional image into a one-dimensional vector; likewise, all sub-windows (of the same size as the template) in the reference image are transformed into one-dimensional vectors. Three similarity measures, SAD, SSD, and Euclidean distance, are used to compute the likeness between the template and all sub-windows in the reference image to find the best match. The experimental results show the superior performance of the proposed method over conventional methods on various templates of different sizes.
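The sliding-window matching can be sketched on flattened (one-dimensional) vectors, following the paper's 2D-to-1D transformation; SAD is shown here, and SSD or Euclidean distance differ only in the per-element term:

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_match(template, reference):
    """Slide the flattened template over the reference vector.

    Returns the offset with the minimal SAD, i.e. the best match position."""
    best, best_off = float("inf"), -1
    for off in range(len(reference) - len(template) + 1):
        d = sad(template, reference[off:off + len(template)])
        if d < best:
            best, best_off = d, off
    return best_off
```

Replacing `abs(x - y)` with `(x - y) ** 2` gives SSD, and taking its square root gives the Euclidean measure; the argmin is unchanged between SSD and Euclidean.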
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal... - multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper62.pdf
YouTube: https://youtu.be/gV-rvV3iFDA
Pierre-Etienne Martin, Jenny Benois-Pineau, Boris Mansencal, Renaud Péteri and Julien Morlier : Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal CNN for MediaEval 2020. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This work presents a method for classifying table tennis strokes using spatio-temporal convolutional neural networks. The fine-grained classification is performed on trimmed video segments recorded at 120 fps with different players performing in natural conditions. From those segments, the frames are extracted, their optical flow is computed, and the pose of the player is estimated. From the optical flow amplitude, a region of interest is inferred. A three-stream spatio-temporal convolutional neural network using a combination of those modalities and 3D attention mechanisms is presented to perform the classification.
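Inferring a region of interest from the optical-flow amplitude can be sketched as a simple bounding box over above-threshold pixels; the threshold value and any padding are assumptions, as the abstract does not specify them:

```python
def flow_roi(amplitude, thresh):
    """Bounding box (r0, c0, r1, c1) of pixels whose flow amplitude exceeds thresh.

    `amplitude` is a 2-D nested list; returns None if nothing moves."""
    coords = [(i, j) for i, row in enumerate(amplitude)
              for j, a in enumerate(row) if a > thresh]
    if not coords:
        return None
    rows = [i for i, _ in coords]
    cols = [j for _, j in coords]
    return (min(rows), min(cols), max(rows), max(cols))
```

Cropping the frames to this box focuses the network on the moving player rather than the static background.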
Presented by: Pierre-Etienne Martin
Sports Video Classification: Classification of Strokes in Table Tennis for Me... - multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper2.pdf
YouTube: https://youtu.be/-bRL868b8ys
Pierre-Etienne Martin, Jenny Benois-Pineau, Boris Mansencal, Renaud Péteri, Laurent Mascarilla, Jordan Calandre and Julien Morlier : Sports Video Classification: Classification of Strokes in Table Tennis for MediaEval 2020. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Fine-grained action classification raises new challenges compared with classical action classification problems. Sports video analysis is a very popular research topic, due to the variety of application areas, ranging from multimedia intelligent devices with user-tailored digests up to the analysis of athletes' performances. Running since 2019 as a part of MediaEval, we offer a task that consists of classifying table tennis strokes from videos recorded in natural conditions at the University of Bordeaux. The aim is to build tools for teachers, coaches, and players to analyse table tennis games. Such tools could lead to automatic profiling of a player and adaptation of their training to improve their sports skills more efficiently.
Presented by: Pierre-Etienne Martin
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention... - multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper61.pdf
YouTube: https://youtu.be/brmI4g3jLS4
Ricardo Kleinlein, Cristina Luna-Jiménez, Fernando Fernández-Martínez and Zoraida Callejas : Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention and LSTM Models. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper reports on the GTH-UPM team's experience in the Predicting Media Memorability task at MediaEval 2020. Teams were asked to predict both short-term and long-term memorability scores, where the score measures whether or not a video endures in a viewer's memory. Our proposed system relies on a late fusion of the scores predicted by three sequential models, each trained on a different modality: video captions, aural embeddings, and visual optical-flow-based vectors. Whereas the single-modality models show a low or zero Spearman correlation coefficient, their combination considerably boosts performance on development data, up to 0.2 in the short-term memorability prediction subtask and 0.19 in the long-term subtask. However, performance on test data drops to 0.016 and -0.041, respectively.
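The Spearman correlation used to score these subtasks can be sketched as follows (no tie handling, which a full evaluation would need; this is the rank-based analogue of Pearson correlation):

```python
def spearman(x, y):
    """Spearman rank correlation between two score lists (assumes no ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

A value near 0, as the single-modality models show, means the predicted ordering of videos is essentially unrelated to the true memorability ordering.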
Presented by: Ricardo Kleinlein
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task - multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper52.pdf
Janadhip Jacutprakart, Rukiye Savran Kiziltepe, John Q. Gan, Giorgos Papanastasiou and Alba G. Seco de Herrera : Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we present our methods and main results from the Essex NLIP Team's participation in the MediaEval 2020 Predicting Media Memorability task. The task requires participants to build systems that can predict short-term and long-term memorability scores on the real-world video samples provided. Our approach focuses on colour-based visual features as well as the video annotation metadata; in addition, hyper-parameter tuning was explored. Despite the simplicity of the methodology, our approach achieves competitive results. We investigated the use of different visual features and assessed the performance of memorability-score prediction through various regression models, with Random Forest regression as our final model for predicting the memorability of videos.
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V... - multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper6.pdf
YouTube: https://youtu.be/ySGGu_4vaxs
Alba García Seco De Herrera, Rukiye Savran Kiziltepe, Jon Chamberlain, Mihai Gabriel Constantin, Claire-Hélène Demarty, Faiyaz Doctor, Bogdan Ionescu and Alan F. Smeaton : Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a Video Memorable? Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper describes the MediaEval 2020 Predicting Media Memorability task. First proposed at MediaEval 2018, the task is in its 3rd edition this year, as the prediction of short-term and long-term video memorability (VM) remains challenging. In 2020, the format remained the same as in previous editions. This year the videos are a subset of the TRECVid 2019 Video to Text dataset, containing more action-rich video content compared with the 2019 task. This paper describes the main aspects of the task, including its main characteristics, the collection, the ground truth dataset, the evaluation metrics, and the requirements for run submission.
Presented by: Rukiye Savran Kiziltepe
Fooling an Automatic Image Quality Estimator - multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper45.pdf
Benoit Bonnet, Teddy Furon and Patrick Bas : Fooling an Automatic Image Quality Estimator. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper we present our work on the 2020 MediaEval task "Pixel Privacy: Quality Camouflage for Social Images". Blind Image Quality Assessment (BIQA) is a classifier that returns a quality score for any given image. Our task is to modify an image to decrease its BIQA score while maintaining a good perceived quality. Since BIQA is a deep neural network, we took an adversarial attack approach to the problem.
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C... - multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper16.pdf
YouTube: https://youtu.be/ix_b9K7j72w
Zhengyu Zhao : Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable Color Filter. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper presents the submission of our RU-DS team to the Pixel Privacy Task 2020. We propose to fool the blind image quality assessment model by transforming images based on optimizing a human-understandable color filter. In contrast to the common work that relies on small, $L_p$-bounded additive pixel perturbations, our approach yields large yet smooth perturbations. Experimental results demonstrate that in the specific context of this task, our approach is able to achieve strong adversarial effects, but has to sacrifice the image appeal.
Presented by: Zhengyu Zhao
Pixel Privacy: Quality Camouflage for Social Images - multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper77.pdf
YouTube: https://youtu.be/8Rr4KknGSac
Zhuoran Liu, Zhengyu Zhao, Martha Larson and Laurent Amsaleg : Pixel Privacy: Quality Camouflage for Social Images. Proc. of MediaEval 2020, 14-15 December 2020, Online.
High-quality social images shared online can be misappropriated for unauthorized purposes, where the quality filtering step is commonly carried out by automatic Blind Image Quality Assessment (BIQA) algorithms. Pixel Privacy benchmarks privacy-protective approaches that protect privacy-sensitive images against unethical computer vision algorithms. In the 2020 task, participants are encouraged to develop camouflage methods that effectively decrease the BIQA quality score of high-quality images while maintaining image appeal. The camouflage needs to be either imperceptible to the human eye or a visible enhancement.
Presented by: Zhuoran Liu
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Paper: http://ceur-ws.org/Vol-2882/paper72.pdf
Sabarinathan D and Suganya Ramamoorthy : Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attention Unit. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Colorectal cancer is the third most common cancer worldwide, and identifying it in its early stages remains a challenging problem; the risk can be reduced by early diagnosis of polyps during colonoscopy. The symptoms vary widely and fall into different categories, and a small variation in symptoms may indicate a much higher risk, so doctors and medical analysts constantly need to update their knowledge. Motivated by these issues, the main objective of this paper is to develop a multi-supervision net algorithm for segmenting polyps on a comprehensive dataset. We use the Medico polyp challenge dataset, which consists of 1000 segmented polyp images from the gastrointestinal tract. We propose EfficientNet-B4 as the pre-trained backbone in the multi-supervision net, and the model is trained with multiple output layers. We present quantitative results on the colorectal dataset and achieve good results on all performance metrics. The experimental results show that the proposed model is robust and segments polyps accurately on a comprehensive dataset across metrics such as Dice coefficient, Recall, Precision, and F2.
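The segmentation metrics named above (Dice coefficient, Jaccard index) can be computed directly from binary masks; a minimal NumPy sketch, not the task's official evaluation code:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks (1 = polyp, 0 = background)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)

def jaccard_index(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """Jaccard index (IoU); related to Dice by J = D / (2 - D)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return (intersection + eps) / (union + eps)
```

The small epsilon keeps both metrics defined when a frame contains no polyp pixels at all.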
Deep Conditional Adversarial learning for polyp Segmentation
Paper: http://ceur-ws.org/Vol-2882/paper22.pdf
Debapriya Banik and Debotosh Bhattacharjee : Deep Conditional Adversarial learning for polyp Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This approach addresses the Medico automatic polyp segmentation challenge, which is part of MediaEval 2020. We have proposed a deep conditional adversarial learning based network for the automatic polyp segmentation task. The network comprises two interdependent models, a generator and a discriminator. The generator network is an FCN employed to predict the polyp mask, while the discriminator enforces the segmentation to be as similar as possible to the real segmented mask (ground truth). Our proposed model achieved competitive results on the test dataset provided by the challenge organizers.
A Temporal-Spatial Attention Model for Medical Image Detection
Paper: http://ceur-ws.org/Vol-2882/paper21.pdf
Hwang Maxwell, Wu Cai, Hwang Kao-Shing, Xu Yong Si and Wu Chien-Hsing : A Temporal-Spatial Attention Model for Medical Image Detection. Proc. of MediaEval 2020, 14-15 December 2020, Online.
A local region model with attentive temporal-spatial pathways is proposed for automatically learning various target structures. The attentive spatial pathway highlights the salient region to generate bounding boxes and ignores irrelevant regions in an input image. The proposed attention mechanism allows efficient object localization, and the overall predictive performance is increased because there are fewer false positives for the object detection task on medical images with manual annotations. The experimental results show that the proposed models consistently increase the base architectures' predictive performance across datasets and training sizes without undue computational cost.
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
Paper: http://ceur-ws.org/Vol-2882/paper20.pdf
YouTube: https://youtu.be/CVelQl5Luf0
Quoc-Huy Trinh, Minh-Van Nguyen, Thiet-Gia Huynh and Minh-Triet Tran : HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Network and UNet for Polyps Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The Medico: Multimedia Task focuses on developing an efficient and accurate framework for computer-aided diagnosis systems for automatic polyp segmentation, detecting all types of polyps in endoscopic images of the gastrointestinal (GI) tract. Our HCMUS team approaches a solution that combines a Residual module, an Inception module, and an adaptive convolutional neural network with the UNet model and PraNet to semantically segment all types of polyps in endoscopic images. We submit multiple runs with different architectures and parameters in our model. Our methods show promising accuracy and efficiency across multiple experiments.
Fine-tuning for Polyp Segmentation with Attention
Paper: http://ceur-ws.org/Vol-2882/paper15.pdf
Rabindra Khadka : Transfer of Knowledge: Fine-tuning for Polyp Segmentation with Attention. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper describes how the transfer of prior knowledge can effectively take on segmentation tasks with the help of attention mechanisms. The UNet model pretrained on a brain MRI dataset was fine-tuned on the polyp dataset. An attention mechanism was integrated to focus on relevant regions in the input images. The implemented architecture is evaluated on 200 validation images based on intersection over union and Dice score between the ground truth and the predicted region. The model demonstrates a promising result with computational efficiency.
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
Paper: http://ceur-ws.org/Vol-2882/paper12.pdf
Adrian Krenzer and Frank Puppe : Bigger Networks are not Always Better: Deep Convolutional Neural Networks for Automated Polyp Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper presents our team's (AI-JMU) approach to the Medico automated polyp segmentation challenge. We consider deep convolutional neural networks to be well suited for this task. To determine the best architecture, we test and compare state-of-the-art backbones and two different heads. Finally, we achieve a Jaccard index of 73.74% on the challenge test set. We further demonstrate that bigger networks do not always perform better; however, growing network size always increases the computational complexity.
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
Paper: http://ceur-ws.org/Vol-2882/paper51.pdf
Amel Ksibi, Amina Salhi, Ala Alluhaidan and Sahar A. El-Rahman : Insights for wellbeing: Predicting Personal Air Quality Index using Regression Approach. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Providing air pollution information to individuals enables them to understand the air quality of their living environments. Thus, the association between people’s wellbeing and the properties of the surrounding environment is an essential area of investigation. This paper proposes air quality prediction through harvesting public/open data and leveraging them to derive the Personal Air Quality Index. These data are usually incomplete; to cope with the problem of missing data, we applied the KNN imputation method. To predict the Personal Air Quality Index, we apply a voting regression approach based on three base regressors: a Gradient Boosting regressor, a Random Forest regressor, and a linear regressor. Evaluating the experimental results using the RMSE metric, we obtained an average score of 35.39 for Walker and 51.16 for Car.
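The described pipeline (KNN imputation followed by a voting regressor over Gradient Boosting, Random Forest, and linear regression) can be sketched with scikit-learn; the synthetic matrix below is a stand-in for the real pollutant/weather features, which are an assumption here:

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((100, 4))                        # toy sensor features
X[rng.random(X.shape) < 0.1] = np.nan           # knock out ~10% of readings
y = rng.random(100) * 100                       # synthetic personal AQI targets

# Fill gaps using the K nearest complete rows
X_filled = KNNImputer(n_neighbors=5).fit_transform(X)

# Voting regressor averaging the three base regressors' predictions
voter = VotingRegressor([
    ("gb", GradientBoostingRegressor()),
    ("rf", RandomForestRegressor(n_estimators=50)),
    ("lr", LinearRegression()),
])
voter.fit(X_filled, y)
pred = voter.predict(X_filled)
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))  # the paper's evaluation metric
```

On real data the evaluation would of course use held-out samples rather than the training matrix.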
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
Paper: http://ceur-ws.org/Vol-2882/paper40.pdf
YouTube: https://youtu.be/SL5Hvu1mARY
Trung-Quan Nguyen, Dang-Hieu Nguyen and Loc Tai Tan Nguyen : Use Visual Features From Surrounding Scenes to Improve Personal Air Quality Data Prediction Performance. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we propose a method to predict the personal air quality index in an area by using the combination of the levels of the following pollutants: PM2.5, NO2, and O3, measured from the nearby weather stations of that area, and the photos of surrounding scenes taken in that area. Our approach uses the Inverse Distance Weighted (IDW) technique to estimate the missing air pollutant levels and then uses regression to integrate visual features from the taken photos to optimize the predicted values. After that, we can use those values to calculate the Air Quality Index (AQI). The results show that the proposed method may not improve the performance of the prediction in some cases.
Personal Air Quality Index Prediction Using Inverse Distance Weighting Method
Paper: http://ceur-ws.org/Vol-2882/paper39.pdf
YouTube: https://youtu.be/3r_oSguFPVM
Trung-Quan Nguyen, Dang-Hieu Nguyen and Loc Tai Tan Nguyen : Personal Air Quality Index Prediction Using Inverse Distance Weighting Method. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we propose a method to predict the personal air quality index in an area by only using the levels of the following pollutants: PM2.5, NO2, O3. All of them are measured from the nearby weather stations of that area. Our approach uses one of the most well-known interpolation methods in spatial analysis, the Inverse Distance Weighted (IDW) technique, to estimate the missing air pollutant levels. After that, we can use those levels to calculate the Air Quality Index (AQI). The results show that the proposed method is suitable for the prediction of those air pollutant levels.
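The IDW interpolation step can be sketched as follows; the function name, the power parameter, and the planar distance are illustrative assumptions, not the authors' exact formulation:

```python
import numpy as np

def idw_estimate(stations: np.ndarray, values: np.ndarray,
                 point: np.ndarray, power: float = 2.0) -> float:
    """Inverse Distance Weighted estimate of a pollutant level at `point`.

    stations: (n, 2) coordinates of stations with known readings
    values:   (n,)   pollutant levels (e.g. PM2.5) at those stations
    """
    dists = np.linalg.norm(stations - point, axis=1)
    if np.any(dists == 0):                       # query point sits on a station
        return float(values[dists == 0][0])
    weights = 1.0 / dists ** power               # nearer stations weigh more
    return float(np.sum(weights * values) / np.sum(weights))
```

The interpolated pollutant levels would then feed the standard AQI formula for the final index.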
Overview of MediaEval 2020 Insights for Wellbeing: Multimodal Personal Health...
Paper: http://ceur-ws.org/Vol-2882/paper11.pdf
YouTube: https://youtu.be/fBPuacAZkxs
Minh-Son Dao, Peijiang Zhao, Thanh Nguyen, Thanh Binh Nguyen, Duc Tien Dang Nguyen and Cathal Gurrin : Overview of MediaEval 2020 Insights for Wellbeing: Multimodal Personal Health Lifelog Data Analysis. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper provides a description of the MediaEval 2020 “Multimodal personal health lifelog data analysis" task. The purpose of this task is to develop approaches that process environment data to obtain insights about personal wellbeing. Establishing the association between people’s wellbeing and the properties of the surrounding environment is vital for numerous research areas. Our task focuses on the internal associations of heterogeneous data. Participants create systems that derive insights from multimodal lifelog data that are important for health and wellbeing to tackle two challenging subtasks. The first task is to investigate whether we can use public/open data to predict personal air pollution data. The second task is to develop approaches to predict the personal air quality index (AQI) using images captured by people (plus GAQD). This task targets (but is not limited to) researchers in the areas of multimedia information retrieval, machine learning, AI, data science, event-based processing and analysis, multimodal multimedia content analysis, lifelog data analysis, urban computing, environmental science, and atmospheric science.
Presented by: Peijiang Zhao
Flood Detection via Twitter Streams using Textual and Visual Features
Paper: http://ceur-ws.org/Vol-2882/paper35.pdf
Firoj Alam, Zohaib Hassan, Kashif Ahmad, Asma Gul, Michael Reiglar, Nicola Conci and Ala Al-Fuqaha : Flood Detection via Twitter Streams using Textual and Visual Features. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The paper presents our proposed solutions for the MediaEval 2020 Flood-Related Multimedia Task, which aims to analyze and detect flooding events in multimedia content shared over Twitter. In total, we proposed four different solutions: a multi-modal solution combining textual and visual information for the mandatory run, and three single-modal image- and text-based solutions as optional runs. In the multi-modal method, we rely on a supervised multimodal bitransformer model that combines textual and visual features in an early fusion, achieving a micro F1-score of .859 on the development data set. For the text-based flood events detection, we use a transformer network (i.e., a pretrained Italian BERT model) achieving an F1-score of .853. For the image-based solutions, we employed multiple deep models, pre-trained on both the ImageNet and Places data sets, individually and combined in an early fusion, achieving F1-scores of .816 and .805 on the development set, respectively.
Floods Detection in Twitter Text and Images
Paper: http://ceur-ws.org/Vol-2882/paper34.pdf
YouTube: https://youtu.be/3f_Q1WeulbI
Naina Said, Kashif Ahmad, Asma Gul, Nasir Ahmad and Ala Al-Fuqaha : Floods Detection in Twitter Text and Images. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we present our methods for the MediaEval 2020 Flood Related Multimedia task, which aims to analyze and combine textual and visual content from social media for the detection of real-world flooding events. The task mainly focuses on identifying flood-related tweets relevant to a specific area. We propose several schemes to address the challenge. For text-based flood events detection, we use three different methods, relying on Bag of Words (BoW) and an Italian version of BERT, individually and in combination, achieving F1-scores of 0.77, 0.68, and 0.70 on the development set, respectively. For the visual analysis, we rely on features extracted via multiple state-of-the-art deep models pre-trained on ImageNet. The extracted features are then used to train multiple individual classifiers whose scores are combined in a late fusion manner, achieving an F1-score of 0.75. For our mandatory multi-modal run, we combine the classification scores obtained with the best textual and visual schemes in a late fusion manner. Overall, better results are obtained with the multimodal scheme, achieving an F1-score of 0.80 on the development set.
Presented by: Naina Said
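The late-fusion step described in the abstract (combining per-modality classification scores) can be illustrated with a small weighted-average sketch; the equal weights and toy probabilities are assumptions, not the paper's tuned values:

```python
import numpy as np

def late_fusion(text_scores: np.ndarray, image_scores: np.ndarray,
                w_text: float = 0.5, w_image: float = 0.5) -> np.ndarray:
    """Weighted average of per-tweet flood probabilities from two modalities."""
    return w_text * text_scores + w_image * image_scores

text_p = np.array([0.9, 0.2, 0.6])    # toy text-classifier probabilities
image_p = np.array([0.7, 0.4, 0.1])   # toy image-classifier probabilities
fused = late_fusion(text_p, image_p)
labels = (fused >= 0.5).astype(int)   # final flood / no-flood decision
```

Late fusion keeps each modality's classifier independent, so a missing image or empty text simply drops one term from the average.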
HCMUS at MediaEval 2020: Image-Text Fusion for Automatic News-Images Re-Matching
1. HCMUS at MediaEval 2020: Image-Text Fusion for Automatic
News-Images Re-Matching
Thuc Nguyen-Quang 1,3, Tuan-Duy H. Nguyen1,3, Thang-Long Nguyen-Ho1,3,
Anh-Kiet Duong1,3, Xuan-Nhat Hoang1,3, Vinh-Thuyen Nguyen-Truong1,3,
Hai-Dang Nguyen1,3, Minh-Triet Tran1,2,3
1University of Science, VNU-HCM,
2John von Neumann Institute, VNU-HCM,
3Vietnam National University, Ho Chi Minh city, Vietnam
December 14-15, 2020
T. Nguyen-Quang et al. HCMUS at MediaEval 2020
2. Outline
1 Introduction
2 Methods
Metric Learning
Image-Text Matching via Categorization
Image-Text Fusion with Image Captioning and
Contextual Embeddings
Image-Text Fusion with Knowledge Graph-based
Contextual Embeddings
Graph-based Face-Name Matching
Ensemble
3 Results
4 Conclusion and future works
5 Bibliography
4. Introduction
Introduction
We mainly focus on fusing cross-modal embedded information, extracted as:
Simple set intersection
Deep neural features
Knowledge-graph-enhanced neural features
5. Introduction
Introduction
M1 Metric Learning
M2 Image-Text Matching via Categorization
M3 Image-Text Fusion with Image Captioning and Contextual Embeddings
M4 Image-Text Fusion with Knowledge Graph-based Contextual Embeddings
M5 Graph-based Face-Name Matching
7. Methods • § Metric Learning
Metric Learning
Using a Triplet Loss model to project embeddings of image-text pairs to bases of
significant similarity.
Title texts are embedded with BERT.
Image embeddings:
Global context embedding: EfficientNet
Local context embedding: Top-k bottom-up-attention objects passed to a
self-attention sequential model.
8. Methods • § Image-Text Matching via Categorization
Image-Text Matching via Categorization
Categorizing images and texts with two gradient boosting decision trees.
Target categories extracted from URLs:
nrw
kultur
region
panorama
sport
wirtschaft
koeln
ratgeber
politik
unknown
9. Methods • § Image-Text Matching via Categorization
Image-Text Matching via Categorization
Augment and extract image features with VGG16, InceptionResNetV2, MobileNetV2,
EfficientNetB1-7, Xception, ResNet152V2, NASNetLarge, DenseNet201.
Texts are mapped to BERT and ELECTRA contextual embeddings.
An iterative ranking method that takes into account the order of matched categories:
At the k-th iteration, find the top-k categories for each image and the top-k categories
for each article.
For each article: candidate images are those whose top-k categories intersect those
of the article.
Sequentially concatenate the k candidate lists, then append the remaining images at
the tail to make the final ranked list.
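The iterative category-intersection ranking can be sketched like this; the function name and the toy category distributions are illustrative assumptions:

```python
import numpy as np

def rank_by_category(article_probs: np.ndarray, image_probs: np.ndarray) -> list:
    """Iterative ranking by category intersection for one article.

    article_probs: (C,) predicted category distribution for the article
    image_probs:   (N, C) predicted category distributions for all images
    Returns image indices ordered from best to worst match.
    """
    n_images, n_cats = image_probs.shape
    ranked, seen = [], set()
    for k in range(1, n_cats + 1):
        art_topk = set(np.argsort(-article_probs)[:k])
        for i in range(n_images):
            if i in seen:
                continue
            img_topk = set(np.argsort(-image_probs[i])[:k])
            if art_topk & img_topk:        # shared top-k category: add to list
                ranked.append(i)
                seen.add(i)
    # safety net: append any images never matched
    ranked += [i for i in range(n_images) if i not in seen]
    return ranked
```

Images matched at a small k (strong category agreement) land near the head of the list, which realizes the "order of matched categories" idea above.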
10. Methods • § Image-Text Fusion with Image Captioning and Contextual Embeddings
Image-Text Fusion with Image Captioning and Contextual
Embeddings
We hypothesize that the description of the image is semantically similar to the title.
The captioning model consists of three parts:
Image feature extractor: we use EfficientNet (Tan and Le, EfficientNet:
Rethinking Model Scaling for Convolutional Neural Networks) for feature
extraction; the feature has shape (8, 8, 2048).
Feature encoder: the features pass through a fully connected layer, yielding a
256-dimensional vector.
Decoder: to generate the caption, we use Bahdanau attention (Bahdanau, Cho,
and Bengio, Neural Machine Translation by Jointly Learning to Align and
Translate) and a GRU to predict the next word.
11. Methods • § Image-Text Fusion with Image Captioning and Contextual Embeddings
Image-Text Fusion with Image Captioning and Contextual
Embeddings
To represent the caption and the title as vectors, we use RoBERTa and doc2vec. Then
we compute their similarity via:
S_total = S_wiki + S_apnews + S_RoBERTa + (1 − D_fuzzy) + (1 − D_partial)
where S_wiki, S_apnews, S_RoBERTa are the cosine similarities of the two vectors
generated by enwiki_dbow, apnews_dbow, and RoBERTa, respectively, and
D_fuzzy, D_partial are the fuzzywuzzy full and partial ratio distances, respectively.
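A sketch of the combined score, using the standard library's difflib as a stand-in for the fuzzywuzzy full and partial ratios and placeholder embedding dictionaries (the model names and helper functions are assumptions):

```python
import difflib
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def partial_ratio(a: str, b: str) -> float:
    """Crude stand-in for fuzzywuzzy's partial ratio: best window match."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if not short:
        return 0.0
    return max(
        difflib.SequenceMatcher(None, short, long_[i:i + len(short)]).ratio()
        for i in range(len(long_) - len(short) + 1)
    )

def total_similarity(caption: str, title: str,
                     cap_vecs: dict, title_vecs: dict) -> float:
    """S_total = S_wiki + S_apnews + S_RoBERTa + (1 - D_fuzzy) + (1 - D_partial)."""
    s_embed = sum(cosine(cap_vecs[m], title_vecs[m])
                  for m in ("wiki", "apnews", "roberta"))
    s_fuzzy = difflib.SequenceMatcher(None, caption, title).ratio()  # = 1 - D_fuzzy
    return s_embed + s_fuzzy + partial_ratio(caption, title)         # + (1 - D_partial)
```

With identical caption and title and identical embeddings, the score reaches its maximum of 5.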
12. Methods • § Image-Text Fusion with Knowledge Graph-based Contextual Embeddings
Image-Text Fusion with Knowledge Graph-based Contextual
Embeddings
To account for high-level semantics, we exploit the BabelNet knowledge graph.
For articles:
Link textual entities in the text to their synsets in the WordNet subset of
BabelNet using the EWISER word sense disambiguator.
Use the mean of the SensEmBERT+LMMS embeddings corresponding to these
extracted synsets to represent the text.
For images:
Use TResNet-L with Asymmetric Loss (ASL), pre-trained on OpenImagesV6, to
extract multi-label predictions from images.
Map the concatenated labels to SensEmBERT+LMMS synset embeddings, as for
the texts.
13. Methods • § Image-Text Fusion with Knowledge Graph-based Contextual Embeddings
Image-Text Fusion with Knowledge Graph-based Contextual
Embeddings
Train a canonical correlation analysis (CCA) on the training set to project cross-modal
embeddings to bases of significant similarity.
Finally, rank all images in the test set by the L2 distance between the transformed
embeddings.
14. Methods • § Graph-based Face-Name Matching
Graph-based Face-Name Matching
In many instances, the publisher uses a portrait of somebody mentioned in the text.
Person name extraction: we use entity-fishing to automatically extract people’s
names from the text.
Face encoding: we use the open-source face_recognition library to detect faces and
represent each as a 128-dimensional vector.
We connect each person mentioned in the articles with the features extracted from
accompanying faces in the training set.
During testing, we encode the faces in the image and aggregate the number of
matched faces connected to the people mentioned in the text.
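The face-name aggregation step can be sketched as follows; the `known_faces` graph, the names, and the random vectors standing in for real 128-d encodings are assumptions, while the 0.6 tolerance mirrors the face_recognition library's default distance threshold:

```python
import numpy as np

# name -> list of 128-d face encodings collected from the training set
# (real encodings would come from the face_recognition library)
rng = np.random.default_rng(0)
known_faces = {
    "Alice": [rng.standard_normal(128)],
    "Bob": [rng.standard_normal(128)],
}

def match_score(face_encoding: np.ndarray, names_in_text: list,
                tolerance: float = 0.6) -> int:
    """Count known faces of the mentioned people that match the photo's face."""
    matches = 0
    for name in names_in_text:
        for known in known_faces.get(name, []):
            # Euclidean distance, the same metric face_recognition compares with
            if np.linalg.norm(known - face_encoding) < tolerance:
                matches += 1
    return matches
```

Candidate images would then be ranked by this aggregated match count against the article's extracted names.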
15. Methods • § Ensemble
Ensemble
The Ensemble submission combines all described methods, weighting each model
based on its performance. The final ranking of a candidate image is:
R_Ensemble = w1·R_Caption + w2·R_Triplet + w3·R_Face + w4·R_KG-Fusion
where R_Caption, R_Triplet, R_Face, R_KG-Fusion are the ranks of the image produced by
the respective methods.
The weighting factors are empirically chosen as w1 = w4 = 1, w2 = 0.02, and w3 = 0.25.
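The weighted rank combination can be sketched directly; the toy per-method ranks below are assumptions, while the weights are the slide's empirical values:

```python
import numpy as np

def ensemble_rank(ranks: dict, weights: dict) -> np.ndarray:
    """Weighted sum of per-method ranks; lower combined score = better image."""
    combined = sum(weights[m] * ranks[m] for m in weights)
    return np.argsort(combined)            # image indices, best first

# Toy ranks (position of each image in each method's list)
ranks = {
    "caption": np.array([0, 1, 2]),
    "triplet": np.array([2, 0, 1]),
    "face":    np.array([1, 2, 0]),
    "kg":      np.array([0, 2, 1]),
}
weights = {"caption": 1.0, "triplet": 0.02, "face": 0.25, "kg": 1.0}
final_order = ensemble_rank(ranks, weights)
```

Because ranks (not scores) are combined, each method contributes on a comparable scale regardless of how its raw similarity values are distributed.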
18. Conclusion and future works
Conclusion
Our methods systematically improve performance on the recall@100 metric.
Consistent results, i.e., high-ranking images are of relevance to queried articles.
19. Conclusion and future works
Conclusion
Incorporating high-level semantics increases performance.
System builders should use multiple methods to handle different aspects of the
complex image-text multimodal relation.
20. Conclusion and future works
Future Works
Investigate better fusion methods.
Thorough ablation study for proposed methods.
Enhance the dataset for thorough evaluation with information retrieval metrics
like NDCG.
21. Bibliography
References I
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. 2016. arXiv:
1409.0473 [cs.CL].
Ben-Baruch, Emanuel et al. “Asymmetric Loss For Multi-Label Classification”. In: arXiv preprint arXiv:2009.14119 (2020).
Bevilacqua, Michele and Roberto Navigli. “Breaking through the 80% glass ceiling: Raising the state of the art in Word Sense Disambiguation by
incorporating knowledge graph information”. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020,
pp. 2854–2864.
Bollacker, Kurt et al. “Freebase: a collaboratively created graph database for structuring human knowledge”. In: Proceedings of the 2008 ACM
SIGMOD international conference on Management of data. 2008, pp. 1247–1250.
Chan, Branden, Timo Möller, Malte Pietsch, and Tanay Soni. “Model from https://huggingface.co/bert-base-german-cased”. In: (2020).
Chan, Branden, Stefan Schweter, and Timo Möller. German’s Next Language Model. 2020. arXiv: 2010.10906 [cs.CL].
Chollet, François. “Xception: Deep learning with depthwise separable convolutions”. In: Proceedings of the IEEE conference on computer vision and
pattern recognition. 2017, pp. 1251–1258.
dbmdz. “Model from https://huggingface.co/dbmdz/bert-base-german-uncased”. In: (2020).
Devlin, Jacob et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2019. arXiv: 1810.04805 [cs.CL].
Geitgey, Adam. Face Recognition. 2018. url: https://github.com/ageitgey/face_recognition.
He, Kaiming et al. “Identity mappings in deep residual networks”. In: European conference on computer vision. Springer. 2016, pp. 630–645.
Hoffer, Elad and Nir Ailon. Deep metric learning using Triplet network. 2018. arXiv: 1412.6622 [cs.LG].
Hossain, MD Zakir et al. “A comprehensive survey of deep learning for image captioning”. In: ACM Computing Surveys (CSUR) 51.6 (2019),
pp. 1–36.
Huang, Gao et al. “Densely connected convolutional networks”. In: Proceedings of the IEEE conference on computer vision and pattern recognition.
2017, pp. 4700–4708.
22. Bibliography
References II
Ke, Guolin et al. “LightGBM: A Highly Efficient Gradient Boosting Decision Tree”. In: Advances in Neural Information Processing Systems. Ed. by
I. Guyon et al. Vol. 30. Curran Associates, Inc., 2017, pp. 3146–3154. url:
https://proceedings.neurips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf.
Kille, Benjamin, Andreas Lommatzsch, and Özlem Özgöbek. “News Images in MediaEval 2020”. In: Proc. of the MediaEval 2020 Workshop. Online.
2020.
King, Davis E. dlib-models. 2018. url: https://github.com/davisking/dlib-models.
Kuznetsova, Alina et al. “The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale”. In:
IJCV (2020).
Lau, Jey Han and Timothy Baldwin. An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation. 2016. arXiv:
1607.05368 [cs.CL].
Lopez, Patrice. Entity Fishing. 2020. url: https://github.com/kermitt2/entity-fishing.
“Model from https://huggingface.co/german-nlp-group/electra-base-german-uncased”. In: (2020).
“Model from https://huggingface.co/T-Systems-onsite/bert-german-dbmdz-uncased-sentence-stsb”. In: (2020).
Navigli, Roberto and Simone Paolo Ponzetto. “BabelNet: Building a very large multilingual semantic network”. In: Proceedings of the 48th annual
meeting of the association for computational linguistics. 2010, pp. 216–225.
Oostdijk, NHJ et al. “The Connection between the Text and Images of News Articles: New Insights for Multimedia Analysis”. In: (2020).
Ridnik, Tal et al. “TResNet: High Performance GPU-Dedicated Architecture”. In: arXiv preprint arXiv:2003.13630 (2020).
Sandler, Mark et al. “Mobilenetv2: Inverted residuals and linear bottlenecks”. In: Proceedings of the IEEE conference on computer vision and pattern
recognition. 2018, pp. 4510–4520.
Simonyan, Karen and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition”. In: arXiv preprint arXiv:1409.1556
(2014).
23. Bibliography
References III
Szegedy, Christian et al. “Inception-v4, inception-resnet and the impact of residual connections on learning”. In: arXiv preprint arXiv:1602.07261
(2016).
Tan, Mingxing and Quoc V. Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. 2020. arXiv: 1905.11946 [cs.LG].
Xu, Kelvin et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. 2016. arXiv: 1502.03044 [cs.LG].
Zoph, Barret et al. “Learning transferable architectures for scalable image recognition”. In: Proceedings of the IEEE conference on computer vision
and pattern recognition. 2018, pp. 8697–8710.